US007501286B2

(12) United States Patent
(10) Patent No.: US 7,501,286 B2
Gygi et al.
(45) Date of Patent: Mar. 10, 2009

(54) ABSOLUTE QUANTIFICATION OF PROTEINS AND MODIFIED FORMS THEREOF BY MULTISTAGE MASS SPECTROMETRY

(75) Inventors: Steven P. Gygi, Foxboro, MA (US); Scott Anthony Gerber, Brookline, MA (US)

(73) Assignee: President and Fellows of Harvard College, Cambridge, MA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 323 days.

(21) Appl. No.: 10/781,047

(22) Filed: Feb. 17, 2004

(65) Prior Publication Data
US 2004/0229283 A1, Nov. 18, 2004

(51) Int. Cl.
G01N 24/00 (2006.01)

(52) U.S. Cl.: 436/173

(58) Field of Classification Search: None
See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
4,572,915 A 2/1986 Crooks
5,532,002 A 7/1996 Story
2002/0123055 A1 9/2002 Estell et al.

FOREIGN PATENT DOCUMENTS
WO WO 00/11208 3/2000
WO WO 02/084250 A2 10/2002

OTHER PUBLICATIONS
Gerber et al., "Direct profiling of multiple enzyme activities in human cell lysates by affinity chromatography/electrospray ionization mass spectrometry: application to clinical enzymology", Anal. Chem. 2001, 73:1651-1657.*
Sannolo, et al., "Biomonitoring of Human Exposure to Methyl Bromide by Isotope Dilution Mass Spectrometry of Adducts", Journal of Mass Spectrometry, J. Mass Spectrom. 34, 1028-1032 (1999).
Zhou, et al., XP-002974320, "A systematic approach to the analysis of protein phosphorylation", Nature Biotechnology, Apr. 2001, vol. 19, pp. 375-378.
Vinale, et al., "Development of a Stable Isotope Dilution Assay for an Accurate Quantification of Protein-Bound Nε-(1-Deoxy-D-fructos-1-yl)-L-lysine Using a 13C-Labeled Internal Standard", J. Agric. Food Chem. 1999, 47, 5084-5092.
Desiderio, "Mass spectrometric analysis of neuropeptidergic systems in the human pituitary and cerebrospinal fluid", Journal of Chromatography B, 731 (1999) 3-22.
Barr, et al., "Isotope dilution-mass spectrometric quantification of specific proteins: model application with apolipoprotein A-I", Clinical Chemistry 42:10, 1676-1682 (1996).
Goshe, et al., "Phosphoprotein Isotope-Coded Affinity Tag Approach for Isolating and Quantitating Phosphopeptides in Proteome-Wide Analyses", Anal. Chem. 2001, 73, 2578-2586.
Gygi, et al., "Quantitative analysis of complex protein mixtures using isotope-coded affinity tags", Nature Biotechnology, vol. 17, Oct. 1999, pp. 994-955.

* cited by examiner

Primary Examiner: Jon P Weber
Assistant Examiner: Bin Shen
(74) Attorney, Agent, or Firm: George W. Neuner; Gregory B. Butler, Esq.; Edwards Angell Palmer & Dodge LLP

(57) ABSTRACT

The invention provides reagents, kits and methods for detecting and/or quantifying proteins in complex mixtures, such as a lysate. The methods can be used in high throughput assays to profile cellular proteomes. In one aspect, the invention provides a peptide internal standard labeled with a stable isotope and corresponding in sequence to the amino acid sequence of a subsequence of a target polypeptide. In another aspect, the peptide internal standard is labeled at a modified amino acid residue and is used to determine the presence of, and/or quantitate the amount of, a particular modified form of a protein.

18 Claims, 7 Drawing Sheets

U.S. Patent Mar. 10, 2009 Sheet 1 of 7 US 7,501,286 B2

FIG. 1: Target protein (...RLSFVFGGTDEK...) → tracer (IS) peptide synthesis → H2N-LSFVF*GGTDEK-COOH
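The tracer in FIG. 1 is the tryptic peptide that follows the arginine in ...R|LSFVFGGTDEK. The following is a minimal Python sketch of that design step, assuming the usual simplified trypsin rule (cleave after K or R, but not before P) and the selection preferences stated in the Detailed Description (roughly 7 to 15 residues, no cysteine, tryptophan or methionine, proline-containing peptides favored). The function names are hypothetical, and the input is only the partial sequence drawn in the figure, so the leading "R" is a digestion artifact of that fragment.

```python
import re

# Residues the text recommends avoiding in a standard (chemically reactive in MS).
AVOID = set("CWM")


def trypsin_digest(sequence: str) -> list[str]:
    """Cleave C-terminal to K or R unless the next residue is P.

    Simplified trypsin rule; missed cleavages are ignored in this sketch.
    """
    # Zero-width split: after K/R and not followed by P (needs Python 3.7+).
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def is_candidate(peptide: str, min_len: int = 7, max_len: int = 15) -> bool:
    """Apply the length window and residue exclusions."""
    return min_len <= len(peptide) <= max_len and not (AVOID & set(peptide))


def candidate_standards(sequence: str) -> list[str]:
    """Return candidate tryptic peptides, proline-containing ones first."""
    candidates = [p for p in trypsin_digest(sequence) if is_candidate(p)]
    # False sorts before True, so peptides that contain P come first.
    return sorted(candidates, key=lambda p: "P" not in p)


if __name__ == "__main__":
    # Only the partial sequence shown in FIG. 1 is available here.
    fragment = "RLSFVFGGTDEK"
    print(trypsin_digest(fragment))
    print(candidate_standards(fragment))
```

A real selection would also check that the peptide is unique to the target protein, the first criterion in the text, which requires a proteome sequence database and is left out of this sketch.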

FIG. 2: MS spectrum showing the (M+2H)2+ ion.
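FIG. 2 characterizes the standard by the mass-to-charge ratio of its doubly protonated ion, (M+2H)2+. For a peptide of neutral monoisotopic mass M carrying z protons, m/z = (M + z·m_proton)/z. The sketch below uses standard monoisotopic residue masses; the label is an assumption, not something the figure states: it treats the starred Phe of FIG. 1 as a hypothetical uniformly 13C/15N-labeled phenylalanine adding about 10.0272 Da.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once for the free N- and C-termini
PROTON = 1.007276  # proton mass (Da)

# Hypothetical label: 13C9,15N1-phenylalanine, about +10.0272 Da.
HEAVY_PHE_SHIFT = 10.02723


def peptide_mass(sequence: str, label_shift: float = 0.0) -> float:
    """Neutral monoisotopic mass M of a linear, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER + label_shift


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


native = peptide_mass("LSFVFGGTDEK")
heavy = peptide_mass("LSFVFGGTDEK", HEAVY_PHE_SHIFT)
for name, m in (("native", native), ("labeled", heavy)):
    print(f"{name}: M = {m:.4f} Da, (M+2H)2+ m/z = {mz(m, 2):.4f}")
```

At z = 2 the native and labeled ions are separated by half the label shift (about 5.01 m/z units), which is why the pair is easy to resolve while the two peptides still co-elute.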

FIGS. 3A and 3B: MS/MS (FIG. 3A) and MS/MS/MS (FIG. 3B) spectra, relative abundance vs. m/z.
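FIGS. 3A and 3B read the peptide signature from fragment ions. In collision-induced fragmentation the most common ions are b ions (N-terminal pieces) and y ions (C-terminal pieces). The sketch below assumes singly charged b/y ions only, and the same hypothetical heavy-Phe label (+10.0272 Da) at position 5 of LSFVF*GGTDEK. It lists which fragments shift between standard and native peptide; fragments that do not contain the labeled residue appear at identical m/z in both.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def fragment_ions(sequence, label_pos=None, label_shift=0.0):
    """Singly charged b- and y-ion m/z values for a linear peptide.

    label_pos is the 0-based index of the (hypothetically) labeled residue.
    """
    masses = [RESIDUE_MASS[aa] for aa in sequence]
    if label_pos is not None:
        masses[label_pos] += label_shift
    n = len(masses)
    ions = {}
    for i in range(1, n):
        ions[f"b{i}"] = sum(masses[:i]) + PROTON              # N-terminal fragment
        ions[f"y{i}"] = sum(masses[n - i:]) + WATER + PROTON  # C-terminal fragment
    return ions


def shifted_ions(sequence, label_pos, label_shift):
    """Names of fragment ions whose m/z differs between standard and native."""
    native = fragment_ions(sequence)
    heavy = fragment_ions(sequence, label_pos, label_shift)
    names = [k for k in native if abs(heavy[k] - native[k]) > 1e-6]
    return sorted(names, key=lambda k: (k[0], int(k[1:])))


if __name__ == "__main__":
    # LSFVF*GGTDEK (FIG. 1): the starred Phe is index 4.
    print(shifted_ions("LSFVFGGTDEK", 4, 10.02723))
```

Placing the label near the middle of the peptide makes both b and y series carry some shifted ions, so either series can supply signature ions for the standard.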

FIG. 4A: Biological sample → homogenize → cell lysate (protein) → add internal standard (e.g., 10 fmol) → trypsinize → digested cell lysate

FIG. 4B: LC-MS/MS/MS signal vs. time; ratio = absolute quantification
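FIG. 4 reduces absolute quantification to one ratio: a known amount of labeled standard (e.g., 10 fmol) is added before digestion, so the native amount is the spiked amount scaled by the native-to-standard signal ratio (the inverse of the labeled-to-unlabeled ratio named in the figure description). Below is a minimal Python sketch, assuming both peptides have the same response factor (they co-elute and differ only isotopically) and that signature-ion peak areas are already integrated. Summing several fragment ions before taking the ratio is an illustrative choice, not a procedure stated in the patent.

```python
from dataclasses import dataclass


@dataclass
class IonPair:
    """Integrated peak areas for one signature fragment ion."""
    native_area: float    # unlabeled peptide from the sample
    standard_area: float  # labeled peptide internal standard


def absolute_amount(pairs: list[IonPair], spiked_fmol: float) -> float:
    """Native peptide amount (fmol) = spiked amount x native/standard ratio.

    Areas are summed over all supplied fragment ions before the ratio is taken.
    """
    native = sum(p.native_area for p in pairs)
    standard = sum(p.standard_area for p in pairs)
    if standard <= 0:
        raise ValueError("internal standard signal must be positive")
    return spiked_fmol * native / standard


if __name__ == "__main__":
    # Hypothetical peak areas; 10 fmol spike as in FIG. 4A.
    ions = [IonPair(4.2e4, 8.4e4), IonPair(1.1e4, 2.2e4)]
    print(f"{absolute_amount(ions, spiked_fmol=10.0):.2f} fmol")
```

Because the standard is spiked before trypsinization, losses during later handling affect native and labeled peptides equally and cancel in the ratio, under the stated assumption of equal behavior.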


US 7,501,286 B2

ABSOLUTE QUANTIFICATION OF PROTEINS AND MODIFIED FORMS THEREOF BY MULTISTAGE MASS SPECTROMETRY

GOVERNMENT GRANTS

At least part of the work contained in this application was performed under government grant HG00041 from the National Institutes of Health, U.S. Department of Health and Human Services. The government may have certain rights in this invention.

RELATED APPLICATIONS

This application claims priority under 35 U.S.C. § 120 to PCT/US02/025778, filed Aug. 14, 2002, published in English, which claims priority under 35 U.S.C. § 119(e) to United States Provisional Application Ser. No. 60/312,279, filed Aug. 14, 2001.

FIELD OF THE INVENTION

This invention provides methods, reagents and kits for obtaining absolute quantification of proteins and their modifications directly from cell lysates. In particular, the invention provides peptide internal standards for use in high performance liquid chromatography (HPLC) with online detection by multistage mass spectrometry (MS).

BACKGROUND OF THE INVENTION

There is a need for novel methods for the quantification of proteins and modified proteins from cell lysates. The current standard for protein detection (quantification) is based on immunoreactive detection (Western analysis). However, this technique requires the availability of an appropriately specific antibody. In addition, many antibodies only recognize proteins in an unfolded (denatured) form, cross-reactivity can be severely limiting, and quantification is generally relative.

The development of methods and instrumentation for automated, data-dependent electrospray ionization (ESI) tandem mass spectrometry (MS/MS) in conjunction with microcapillary liquid chromatography (LC) and database searching has significantly increased the sensitivity and speed of the identification of gel-separated proteins. Microcapillary LC-MS/MS has been used successfully for the large-scale identification of individual proteins directly from mixtures without gel electrophoretic separation (Link et al., 1999; Opitek et al., 1997). However, while these approaches dramatically accelerate protein identification, the quantities of the analyzed proteins cannot be easily determined, and these methods have not been shown to substantially alleviate the dynamic range problem also encountered by the 2DE/MS/MS approach. Therefore, low abundance proteins in complex samples are also difficult to analyze by the microcapillary LC/MS/MS method without prior enrichment.

There is thus a need for methods for the accurate comparison of protein expression levels between cells in two different states, particularly for comparison of low abundance proteins.

Another methodology has recently been described. ICAT™ reagent technology makes use of a class of chemical reagents called isotope coded affinity tags (ICAT). These reagents exist in isotopically heavy and light forms which are chemically identical with the exception of eight deuterium or hydrogen atoms, respectively. Proteins from two cell lysates can be labeled independently with one or the other ICAT reagent at cysteinyl residues. After mixing and proteolysing the lysates, the ICAT-labeled peptides are isolated by affinity to a biotin molecule incorporated into each ICAT reagent. ICAT-labeled peptides are analyzed by LC-MS/MS, where they elute as heavy and light pairs of peptides. Quantification is performed by determining the relative expression ratio relating to the amount of each ICAT-labeled peptide pair in the sample.

Identification of each ICAT-labeled peptide is performed by a second stage of mass spectrometry (MS/MS) and sequence database searching. The end result is relative protein expression ratios on a large scale. The major drawbacks of this technique are that 1) quantification is only relative; 2) specialized chemistry is required; 3) database searches are hindered by the presence of the large ICAT reagent molecule; and 4) relative amounts of posttranslationally modified (e.g., phosphorylated) proteins are invisible to the analysis.

SUMMARY

The present invention provides reagents, kits, and methods for accurate quantification of proteins, and methods for using the same. The reagents, kits, and methods of the invention are useful for rapid, high throughput analysis of proteomes.

In one aspect, the invention provides a method for generating a peptide internal standard. The method comprises identifying a real or predicted peptide digestion product of a target polypeptide, determining the amino acid sequence of the peptide digestion product, and synthesizing a peptide having that amino acid sequence. The peptide is labeled with a mass-altering label (e.g., by incorporating labeled amino acid residues during the synthesis process) and fragmented (e.g., by multistage mass spectrometry). Preferably, the label is a stable isotope. A peptide signature diagnostic of the peptide is determined after one or more rounds of fragmenting, and the signature is used to identify the presence and/or quantity of a peptide of identical amino acid sequence in a sample.

Preferably, a labeled peptide is provided which co-elutes with an unlabeled peptide having the same amino acid sequence (i.e., a target peptide) in a chromatographic separation procedure (e.g., HPLC).

In one aspect, the mass-altering label is part of a peptide comprising a modification, and the peptide is fragmented to determine a peptide signature diagnostic of such a modified peptide. The modified residue in the peptide internal standard comprises a phosphorylated residue, a glycosylated residue, an acetylated residue, a ubiquitinated residue, a ribosylated residue, a farnesylated residue, or another modification found in a cellular protein. In one aspect, panels of peptide internal standards are generated corresponding to (i.e., diagnostic of) different modified forms of the same protein.

Peptide internal standards corresponding to different peptide subsequences of a single target protein also can be generated to provide redundant controls in a quantitative assay. In one aspect, different peptide internal standards corresponding to the same target protein are generated and differentially labeled (e.g., peptides are labeled at multiple sites to vary the amount of heavy label associated with a given peptide).

In another aspect, a panel of peptide internal standards corresponding to different amino acid subsequences of a single protein is used to scan for mutations in that protein. In a further aspect, peptide internal standards corresponding to different variant sequences of a single amino acid subsequence of a single protein are provided. A match between a peptide internal standard and a target peptide in a sample indicates the presence of a variant sequence in the sample. In one aspect, the multiple peptide internal standards corresponding to variant sequences are differentially labeled.

In a further aspect, a panel of peptide internal standards corresponding to amino acid subsequences of different proteins in a molecular pathway is generated. Molecular pathways include, but are not limited to, signal transduction pathways, cell cycle pathways, metabolic pathways, blood clotting pathways, and the like. In one aspect, the panel includes peptide standards which correspond to different modified forms of one or more proteins in a pathway, and the panel is used to determine the presence and/or quantity of the activated or inactivated form of a pathway protein.

The invention also provides a method for determining the presence and/or quantity of a target polypeptide in at least one mixture of different polypeptides. The method comprises providing a mixture of different polypeptides and spiking the mixture with a known quantity of a peptide internal standard labeled with a mass-altering label. Preferably, the labeled peptide internal standard comprises a subsequence of the target polypeptide and possesses a known peptide fragment signature diagnostic of the presence of the peptide subsequence. The spiked mixture is treated with a protease activity to generate a plurality of peptides, including the labeled peptide internal standard and peptides corresponding to the target polypeptide. Preferably, a chromatographic separation step is performed to isolate the labeled peptide internal standard and any target peptide present in the spiked mixture which comprises the same amino acid sequence as the standard. Preferably, the internal standard and target peptide co-elute with each other.

The labeled peptide internal standard and target peptide are fragmented (e.g., using multistage mass spectrometry) and the ratio of labeled fragments to unlabeled fragments is determined. The quantity of the target polypeptide can be calculated using both the ratio and the known quantity of the labeled internal standard. The mixtures of different polypeptides can include, but are not limited to, such complex mixtures as a crude fermenter solution, a cell-free culture fluid, a cell or tissue extract, a blood sample, a plasma sample, a lymph sample, a cell or tissue lysate, a mixture comprising at least about 100 different polypeptides, at least about 1000 different polypeptides, or at least about 100,000 different polypeptides, or a mixture comprising substantially the entire complement of proteins in a cell or tissue. In one preferred aspect, the method is used to determine the presence and/or quantity of one or more target polypeptides directly from one or more cell lysates, i.e., without separating proteins from other cellular components or eliminating other cellular components.

In one aspect, the presence and/or quantity of target polypeptide in a mixture is diagnostic of a cell state. In another aspect, the cell state is representative of an abnormal physiological response, for example, a physiological response which is diagnostic of a disease. In a further aspect, the cell state is a state of differentiation or represents a cell which has been exposed to a condition or agent (e.g., a drug, a therapeutic agent, a potential toxin). In one aspect, the method is used to diagnose the presence or risk of a disease. In another aspect, the method is used to identify a condition or agent which produces a selected cell state (e.g., to identify an agent which returns one or more diagnostic parameters of a cell state to normal).

In a further aspect, the method comprises determining the presence and/or quantity of target peptides in at least two mixtures. In another aspect, one mixture is from a cell having a first cell state and the second mixture is from a cell having a second cell state. In a further aspect, the first cell is a normal cell and the second cell is from a patient with a disease. In still a further aspect, the first cell is exposed to a condition and/or treated with an agent and the second cell is not exposed and/or treated. Preferably, the first and second mixtures are evaluated in parallel.

Alternatively, the two mixtures can be from identical samples or cells. In one aspect, a labeled peptide internal standard is provided in different known amounts in each mixture. In another aspect, pairs of labeled peptide internal standards are provided, each comprising mass-altering labels which differ in mass, e.g., by including different amounts of a heavy isotope in each peptide.

The invention also provides a method of determining the presence and/or quantity of a modification in a target polypeptide. Preferably, the label in the internal standard is part of a peptide comprising a modified amino acid residue or an amino acid residue which is predicted to be modified in a target polypeptide. In one aspect, the presence of the modification reflects the activity of a target polypeptide and the assay is used to detect the presence and/or quantity of an active polypeptide. The method is advantageous in enabling detection of small quantities of polypeptide (e.g., about 1 part per million (ppm) or less than about 0.001% of total cellular protein).

The invention additionally provides a method for scanning for mutations in a protein sequence using panels of peptide internal standards corresponding to different variant forms of a single sequence, or multiple peptide internal standards representing different amino acid subsequences of a protein. In the first scenario, a match to a variant peptide internal standard in a sample indicates the presence of the variant in the sample. In the second scenario, a lack of match to one peptide internal standard together with matches to one or more other peptide internal standards indicates the presence of a mutation in the amino acid sequence corresponding to the mismatched peptide.

In a further aspect, the invention provides a method for profiling the activity of a molecular pathway using panels of peptide internal standards corresponding to different pathway proteins and/or to different modified forms of the proteins. The presence and/or quantity of the proteins can be used to profile the function of a pathway in a particular cell. In one aspect, the pathway is one or more of a signal transduction pathway, a cell cycle pathway, a metabolic pathway, a blood clotting pathway and the like. The coordinate function of multiple pathways can be evaluated using a plurality of panels of standards. Similarly, the peptide internal standards can be used to assay for the presence of multiple diseases or pathological conditions by providing a panel of peptide internal standards which comprises peptide internal standards diagnostic of different diseases.

The invention further provides reagents useful for performing the method. In one aspect, a reagent according to the invention comprises a peptide internal standard labeled with a stable isotope. Preferably, the standard has a unique peptide fragmentation signature diagnostic of the peptide. The peptide is a subsequence of a known protein and can be used to identify the presence of and/or quantify the protein in a sample, such as a cell lysate. In one aspect, the peptide internal standard comprises a label associated with a modified amino acid residue, such as a phosphorylated amino acid residue, a glycosylated amino acid residue, an acetylated amino acid residue, a farnesylated residue, a ribosylated residue, and the like. In another aspect, a pair of reagents is provided: a peptide internal standard corresponding to a modified peptide and a peptide internal standard corresponding to a peptide identical in sequence but not modified.

In one aspect, panels of peptide internal standards representing different variant forms of a single amino acid subsequence of a polypeptide are provided. In another aspect, panels of peptide internal standards corresponding to different amino acid subsequences of a single polypeptide are provided. In a further aspect, panels of peptide internal standards are provided which correspond to different proteins in a molecular pathway (e.g., a signal transduction pathway, a cell cycle pathway, a metabolic pathway, a blood clotting pathway and the like). In still a further aspect, peptide internal standards corresponding to different modified forms of one or more proteins in a pathway are provided. In still a further aspect, panels of peptide internal standards are provided which correspond to proteins diagnostic of different diseases, allowing a mixture of peptide internal standards to be used to test for the presence of multiple diseases in a single assay.

The invention additionally provides kits comprising one or more peptide internal standards labeled with a stable isotope. In one aspect, a kit comprises peptide internal standards comprising different peptide subsequences from a single known protein. In another aspect, the kit comprises peptide internal standards corresponding to different variant forms of the same amino acid subsequence of a target polypeptide. In still another aspect, the kit comprises peptide internal standards corresponding to different known or predicted modified forms of a polypeptide. In a further aspect, the kit comprises peptide internal standards corresponding to sets of related proteins, e.g., proteins involved in a molecular pathway (a signal transduction pathway, a cell cycle, etc.), and/or to different modified forms of proteins in the pathway. In still a further aspect, a kit comprises a labeled peptide internal standard as described above and software for performing multistage mass spectrometry. The kit may also include a means for obtaining access to a database comprising data files which include data relating to the mass spectra of fragmented peptide ions generated from peptide internal standards. The means for obtaining access can be provided in the form of a URL and/or identification number for accessing a database, or in the form of a computer program product comprising the data files. In one aspect, the kit comprises a computer program product which is capable of instructing a processor to perform any of the methods described above.

BRIEF DESCRIPTION OF THE FIGURES

The objects and features of the invention can be better understood with reference to the following detailed description and accompanying drawings.

FIG. 1 is a schematic diagram illustrating a method for generating a peptide internal standard for a protein or modified protein to be detected and/or quantified (Peptides shown are disclosed as SEQ ID NOS 4 and 5, respectively, in order of appearance).

FIG. 2 illustrates characterization of peptide internal standards by mass-to-charge ratio and retention time in reverse phase chromatography according to one aspect of the invention.

FIGS. 3A and B show characterization of a peptide signature by multistage mass spectrometry. FIG. 3A shows a signature obtained after a second stage of mass spectrometry. FIG. 3B shows a signature obtained after performing a third stage of mass spectrometry.

FIGS. 4A and B illustrate steps in a method for absolute quantitation of proteins in a complex mixture of proteins. FIG. 4A shows sample processing steps in which a cell lysate is spiked with a known amount of a labeled peptide internal standard according to the invention. FIG. 4B shows mass spectra of a labeled peptide internal standard and the corresponding unlabeled peptide in the sample. The ratio of labeled to unlabeled peptide provides a means to quantify the amount of unlabeled peptide in the sample.

FIG. 5A shows a peptide internal standard suitable for use in detecting and/or quantitating a protein comprising the amino acid sequence GFTALK (SEQ ID NO: 1). The upper panel of the Figure shows the native tryptic peptide. The lower portion of the Figure shows a peptide internal standard corresponding to this peptide which comprises a stable isotope (13C). As can be seen from the Figure, the stable isotope provides a characteristic mass difference between the two peptides without altering the essential chemical structure of the peptide. FIG. 5B shows a peptide internal standard suitable for use in detecting a phosphorylated form of a protein comprising the amino acid sequence GFTALK (SEQ ID NO: 1). FIG. 5C shows a peptide internal standard suitable for use in detecting a methylated form of the amino acid sequence GFTALK (SEQ ID NO: 1).

FIG. 6 shows diagnostic peptide fragmentation signatures obtained for two peptides comprising the sequences ALELFR (SEQ ID NO: 2) and LFTGHPETLEK (SEQ ID NO: 3), respectively, from the myoglobin protein. Each peptide produces a characteristic signature ion that can be used to detect and/or quantify myoglobin in a sample of cellular proteins. Providing both peptide internal standards together in an assay can provide an additional control for quantification.

DETAILED DESCRIPTION

The invention provides reagents, kits and methods for detecting and/or quantifying proteins in complex mixtures, such as a cell lysate. The methods can be used in high throughput assays to profile cellular proteomes.

Definitions

The following definitions are provided for specific terms which are used in the following written description.

As used in the specification and claims, the singular forms "a", "an" and "the" include plural references unless the context clearly dictates otherwise. For example, the term "a cell" includes a plurality of cells, including mixtures thereof. The term "a protein" includes a plurality of proteins.

"Protein", as used herein, means any protein, including, but not limited to, peptides, enzymes, glycoproteins, hormones, receptors, antigens, antibodies, growth factors, etc., without limitation. Presently preferred proteins include those comprised of at least 25 amino acid residues, more preferably at least 35 amino acid residues and still more preferably at least 50 amino acid residues. The terms "polypeptide" and "protein" are generally used interchangeably herein to refer to a polymer of amino acid residues.

As used herein, the term "peptide" refers to a compound of two or more subunit amino acids. The subunits are linked by peptide bonds.

As used herein, a "target protein" or a "target polypeptide" is a protein or polypeptide whose presence or amount is being determined in a protein sample. The protein/polypeptide may be a known protein (i.e., previously isolated and purified) or a putative protein (i.e., predicted to exist on the basis of an open reading frame in a nucleic acid sequence).

As used herein, a "protease activity" is an activity which cleaves amide bonds in a protein or polypeptide. The activity may be implemented by an enzyme such as a protease or by a chemical agent, such as CNBr.

As used herein, "a protease cleavage site" is an amide bond which is broken by the action of a protease activity.

As used herein, a "labeled peptide internal standard" refers to a synthetic peptide which corresponds in sequence to the amino acid subsequence of a known protein, or of a putative protein predicted to exist on the basis of an open reading frame in a nucleic acid sequence, and which is labeled by a mass-altering label such as a stable isotope. The boundaries of a labeled peptide internal standard are governed by protease cleavage sites in the protein (e.g., sites of protease digestion or sites of cleavage by a chemical agent such as CNBr). Protease cleavage sites may be predicted cleavage sites (determined based on the primary amino acid sequence of a protein and/or on the presence or absence of predicted protein modifications, using a software modeling program) or may be empirically determined (e.g., by digesting a protein and sequencing peptide fragments of the protein). In one aspect, a labeled peptide internal standard includes a modified amino acid residue.

"Percent identity" and "similarity" between two sequences can be determined using a mathematical algorithm (see, e.g., Computational Molecular Biology, Lesk, A. M., ed., Oxford University Press, New York, 1988; Biocomputing: Informatics and Genome Projects, Smith, D. W., ed., Academic Press, New York, 1993; Computer Analysis of Sequence Data, Part 1, Griffin, A. M., and Griffin, H. G., eds., Humana Press, New Jersey, 1994; Sequence Analysis in Molecular Biology, von Heijne, G., Academic Press, 1987; and Sequence Analysis Primer, Gribskov, M. and Devereux, J., eds., M. Stockton Press, New York, 1991). For example, the percent identity between two amino acid sequences can be determined using the Needleman and Wunsch algorithm (J. Mol. Biol. (48): 444-453, 1970), which is part of the GAP program in the GCG software package (available at http://www.gcg.com), by the local homology algorithm of Smith & Waterman (Adv. Appl. Math. 2: 482, 1981), by the search for similarity methods of Pearson & Lipman (Proc. Natl. Acad. Sci. USA 85: 2444, 1988) and Altschul, et al. (Nucleic Acids Res. 25(17): 3389-3402, 1997), by computerized implementations of these algorithms (GAP, BESTFIT, FASTA, and BLAST in the Wisconsin Genetics Software Package, available from Genetics Computer Group, 575 Science Dr., Madison, Wis.), or by manual alignment and visual inspection (see, e.g., Ausubel et al., supra). Gap parameters can be modified to suit a user's needs. For example, when employing the GCG software package, a NWSgapdna.CMP matrix and a gap weight of 40, 50, 60, 70, or 80 and a length weight of 1, 2, 3, 4, 5, or 6 can be used. Exemplary gap weights using a BLOSUM62 matrix or a PAM250 matrix are 16, 14, 12, 10, 8, 6, or 4, while exemplary length weights are 1, 2, 3, 4, 5, or 6. The percent identity between two amino acid or nucleotide sequences also can be determined using the algorithm of E. Myers and W. Miller (CABIOS 4: 11-17, 1989), which has been incorporated into the ALIGN program (version 2.0), using a PAM120 weight residue table, a gap length penalty of 12 and a gap penalty of 4.

As used herein, "a peptide fragmentation signature" refers to the distribution of mass-to-charge ratios of fragmented peptide ions obtained from fragmenting a peptide, for example, by collision-induced dissociation, ECD, LID, PSD, IRMPD, SID, and other fragmentation methods. A peptide fragmentation signature which is "diagnostic", or a "diagnostic signature", of a target protein or target polypeptide is one which is reproducibly observed when a peptide digestion product of a target protein/polypeptide, identical in sequence to the peptide portion of a peptide internal standard, is fragmented, and which differs from the fragmentation pattern of the peptide internal standard only by the mass of the mass-altering label. Preferably, a diagnostic signature is unique to the target protein (i.e., the specificity of the assay is at least about 95%, at least about 99%, and preferably approaches 100%).

A "relational database" as used herein means a database in which different tables and categories of the database are related to one another through at least one common attribute, and which is used for organizing and retrieving data.

The term "external database" as used herein refers to publicly available databases that are not a relational part of the internal database, such as GenBank and Blocks.

As used herein, an "expression profile" refers to measurement of a plurality of cellular constituents that indicate aspects of the biological state of a cell. Such measurements may include, e.g., abundances of proteins or modified forms thereof.

As used herein, a "cell state profile" refers to values of measurements of levels of one or more proteins in the cell. Preferably, such values are obtained by determining the amount of peptides in a sample having the same peptide fragmentation signatures as those of peptide internal standards corresponding to the one or more proteins. A "diagnostic profile" refers to values that are diagnostic of a particular cell state, such that when substantially the same values are observed in a cell, that cell may be determined to have the cell state. For example, in one aspect, a cell state profile comprises the value of a measurement of p53 expression in a cell. A diagnostic profile would be a value which is significantly higher than the value determined for a normal cell, and such a profile would be diagnostic of a tumor cell. A "test cell state profile" is a profile which is unknown or being verified.

As used herein, a processor that "receives a diagnostic profile" receives data relating to the values diagnostic of a particular cell state. For example, the processor may receive the values by accessing a database where such values are stored through a server in communication with the processor.

Labeled Peptide Internal Standards

The invention provides labeled peptide internal standards for use in determining the presence of, and/or quantifying the amount of, a target protein in a sample which comprises an amino acid subsequence identical to the peptide portion of the internal standard. Peptide internal standards are generated by examining the primary amino acid sequence of a protein and synthesizing a peptide comprising the same sequence as an amino acid subsequence of the protein (see, e.g., FIG. 1). In one aspect, the peptide's boundaries are determined by predicting the cleavage sites of a protease. In another aspect, a protein is digested by the protease and the actual sequence of one or more peptide fragments is determined. Suitable proteases include, but are not limited to, one or more of: serine proteases (e.g., trypsin, hepsin, SCCE, TADG12, TADG14); metalloproteases (e.g., PUMP-1); chymotrypsin; cathepsin; pepsin; elastase; pronase; Arg-C; Asp-N; Glu-C; Lys-C; carboxypeptidases A, B, and/or C; dispase; thermolysin; cysteine proteases such as gingipains; and the like. Proteases may be isolated from cells or obtained through recombinant techniques. Chemical agents with a protease activity also can be used (e.g., CNBr).

The target protein can be a known protein or a protein predicted to exist on the basis of an open reading frame in a nucleic acid sequence. Such open reading frames can be identified from a database of sequences including, but not limited to, the GenBank database, the EMBL data library, the Protein Sequence Database and PIR-International, SWISS-PROT, the ExPASy proteomics server of the Swiss Institute of Bioinformatics (SIB), and databases described in PCT/US01/25884. Predicted cleavage sites also can be identified through modeling software, such as MS-Digest (available at http://prospector.ucsf.edu/). Predicted sites of protein modification also can be determined using software packages such as Scansite, FindMod, NetOGlyc (for prediction of type-O-glycosylation sequences), YinOYang (for prediction of O-β-GlcNAc attachment sites), big-PI Predictor (for prediction of GPI modifications), NetPhos (for prediction of Ser, Thr, and Tyr phosphorylation sites), NMT (for prediction of N-terminal N-myristoylation) and Sulfinator (for prediction of tyrosine sulfation sites), which are accessible through http://au.expasy.org/tools/#ptm, for example.

A peptide sequence within a target protein is selected according to one or more criteria to optimize the use of the peptide as an internal standard. Preferably, the size of the peptide is selected to minimize the chances that the peptide sequence will be repeated elsewhere in other non-target proteins. Preferably, therefore, a peptide is at least about 6 amino acids. The size of the peptide is also optimized to maximize ionization frequency. Thus, peptides longer than about 20 amino acids are not preferred. In one aspect, an optimal peptide ranges from about 6 amino acids to about 20 amino acids, and preferably from about 7 amino acids to about 15 amino acids.

A peptide sequence is also selected which is not likely to be chemically reactive during mass spectrometry. Thus, peptide sequences which comprise cysteine, tryptophan or methionine residues are avoided.

Peptides also are selected based on the presence of one or more bonds that preferentially fragment. For example, because peptides will preferentially fragment at proline residues, intense fragment ions may be produced at proline. Therefore, in one aspect of the invention, a peptide is selected from a region of a protein comprising a proline amino acid residue.

shift fragment masses produced by MS analysis to regions of the spectrum with low background. The ion mass signature component is the portion of the labeling moiety which preferably exhibits a unique ion mass signature in mass spectrometric analyses. The sum of the masses of the constituent atoms of the label is preferably uniquely different from the fragments of all the possible amino acids. As a result, the labeled amino acids and peptides are readily distinguished from unlabeled amino acids and peptides by their ion/mass pattern in the resulting mass spectrum. In a preferred embodiment, the ion mass signature component imparts a mass to a protein fragment produced during mass spectrometric fragmentation that does not match the residue mass for any of the 20 natural amino acids.

The label should be robust under the fragmentation conditions of MS and not undergo unfavorable fragmentation. Labeling chemistry should be efficient under a range of conditions, particularly denaturing conditions, and the labeled tag preferably remains soluble in the MS buffer system of choice. Preferably, the label does not suppress the ionization efficiency of the protein. More preferably, the label does not alter the ionization efficiency of the protein and is not otherwise chemically reactive. Alternatively, or additionally, the label contains a mixture of two or more isotopically distinct species to generate a unique mass spectrometric pattern at each labeled fragment position.

In one preferred aspect, peptide internal standards comprise mass-altering labels which are stable isotopes. In certain preferred embodiments, the method utilizes isotopes of hydrogen, nitrogen, oxygen, carbon, or sulfur. Suitable isotopes include, but are not limited to, 2H, 13C, 15N, 17O, 18O, or 34S. In another aspect, pairs of peptide internal standards can be provided, comprising identical peptide portions but distinguishable labels (e.g., peptides may be labeled at multiple sites to provide different heavy forms of the peptide). Multiple labeled amino acids may be incorporated in a peptide during the synthesis process. In another aspect, the label is part of a peptide comprising a modified amino acid residue, such as a phosphorylated residue (see, e.g., FIG. 5B), a gly
40 cosylated residue, an acetylated residue, a ribosylated resi In another aspect, a peptide is selected from a region of a due, or a farnesylated residue, a methilyated residue (see, e.g., protein which is not expected or not known to be modified, so FIG.5C). In this embodiment, pairs or larger sets of peptide that the peptide internal standard can be used to determine the internal standards corresponding to modified and unmodified quantity of all forms of the protein. However, in a further peptides also can be produced. In one aspect, such a pair/set is aspect, the peptide internal standard does include an amino 45 differentially labeled. acid residue which is expected to, or is known to be modified, Peptide internal standards are characterized according to to provide an internal standard to quantify only the modified their mass-to-charge ratio (m/z) and preferably, also accord form the protein (see, e.g., FIGS. 5B and 5C). Peptide stan ing to their retention time on a chromatographic column (e.g., dards representing modified (e.g., FIGS. 5B and 5C) and such as an HPLC column). Internal standards are selected unmodified forms of a protein (see, e.g., FIG.5A) can be used 50 which co-elute with peptides of identical sequence but which together, to determine the extent of protein modification in a are not labeled (see, e.g., FIG. 2). particular sample of proteins, i.e., to determine what fraction The peptide internal standard is then analyzed by frag of the total amount of protein is represented by the modified menting the peptide. Fragmentation can be achieved by form. inducing ion/molecule collisions by a process known as col The peptide is synthesized using one or more labeled 55 lision-induced dissociation (CID) (also known as collision amino acids (i.e., the label is actually part of the peptide) or activated dissociation (CAD)). Collision-induced dissocia less preferably, labels may be attached after synthesis. 
By tion is accomplished by selecting a peptide ion of interest with providing the label as part of the peptide (see, e.g., FIGS. a mass analyzer and introducing that ion into a collision cell. 5A-5C), there are minimal differences in the chemical struc The selected ion then collides with a collision gas (typically ture of a peptide internal standard and a native peptide 60 argon or helium) resulting in fragmentation. Generally, any obtained from the digestion of a target protein with a protease method that is capable of fragmenting a peptide is encom activity. Further, because the peptide is synthesized, it is passed within the scope of the present invention. In addition unnecessary to separate and/or purify the peptide from other to CID, other fragmentation methods include, but are not cellular proteins. limited to, surface induced dissociation (SID) (James and Preferably, the label is a mass-altering label. The type of 65 Wilkins, Anal. Chem. 62: 1295-1299, 1990; and Williams, et label selected is generally based on the following consider al., J Amer. Soc. Mass Spectrom. 1: 413-416, 1990), black ations: The mass of the label should preferably be unique to body infrared radiative dissociation (BIRD); electron capture US 7,501,286 B2 11 12 dissociation (ECD) (Zubarev, et al., J. Am. Chem. Soc. 120: Peptide internal standards may also be used to scan for 3265-3266, 1998); post-source decay (PSD), LID, and the mutations in proteins including, but not limited to, BRCA1, like. BRCA2, CFTR, p53, blood group antigens, HLA proteins, The fragments are then analyzed to obtain a fragmention MHC proteins, G-Protein Coupled Receptors, apolipoprotein spectrum. One suitable way to do this is by CID in multistage E. kinases (e.g., such as hCds1, MTKs, PTK, CDKs, STKs, mass spectrometry (MS). Traditionally used to characterize CaMs, and the like) (see, e.g., U.S. Pat. No. 
6,426.206), the structure of a peptide and/or to obtain sequence informa phosphatases, human drug metabolizing proteins, viral pro tion, it is a discovery of the present invention, that MS" teins such as a viral envelope proteins (e.g., HIV envelope provides enhanced sensitivity in methods for quantitating proteins), transporter proteins, and the like. absolute amounts of proteins. Thus, in one aspect, peptide 10 In a further aspect, peptides corresponding to different internal standards are generated for low abundance proteins modified forms of a protein are synthesized, providing inter (e.g., below 2000 copies/cell). nal standards to detect and/or quantitate changes in protein Preferably, a peptide internal standard is analyzed by at modifications in different cell states. In still a further aspect, least two stages of mass spectrometry to determine the frag peptide internal standards are generated which correspond to mentation pattern of the peptide and to identify a peptide 15 different proteins in a molecular pathway and/or modified fragmentation signature (see, e.g., FIG.3A). More preferably, forms of Such proteins (e.g., proteins in a signal transduction a peptide signature is obtained in which peptide fragments pathway, cell cycle, metabolic pathway, blood clotting path have significant differences in m/z ratios to enable peaks way, etc.) providing panels of internal standards to evaluate corresponding to each fragment to be well separated. Still the regulated expression of proteins and/or the activity of more preferably, signatures are unique, i.e., diagnostic of a proteins in a particular pathway. Combinations of the above peptide being identified and comprising minimal overlap described internal standards can be used in a given assay. with fragmentation patterns of peptides with different amino acid sequences. 
If a suitable fragment signature is not Methods of Using Peptide Internal Standards obtained at the first stage, additional stages of mass spectrom The labeled peptide internal standards according to the etry are performed until a unique signature is obtained (see, 25 invention can be used to facilitate quantitative determination e.g., FIG. 3B). of the relative amounts of proteins in different samples. Also, Fragment ions in the MS/MS and MS spectra are gener the use of differentially isotopically labeled reagents as inter ally highly specific and diagnostic for peptides of interest. In nal standards facilitates quantitative determination of the contrast, to prior art methods, the identification of peptide absolute amounts of one or more proteins present in a single diagnostic signatures provides for a way to perform highly 30 sample. Samples that can be analyzed by method of the inven selective analysis of a complex protein mixture, such as a tion include, but are not limited to, cell homogenates; cell cellular lysate in which there may be greater than about 100, fractions; biological fluids, including, but not limited to urine, about 1000, about 10,000, or even about 100,000 different blood, and cerebrospinal fluid; tissue homogenates; tears; kinds of proteins. Thus, while conventional mass spectros feces; saliva; lavage fluids such as lung or peritoneal lavages; copy would not be able to distinguish between peptides with 35 and generally, any mixture of biomolecules, e.g., Such as different sequences but similar m/z ratios (which would tend mixtures including proteins and one or more of lipids, carbo to co-elute with any labeled Standard being analyzed), the use hydrates, and nucleic acids such as obtained partial or com of peptide fragmentation methods and multistage mass spec plete fractionation of cell or tissue homogenates. trometry in conjunction with LC methods, provide a way to Preferably, a proteome is analyzed. 
By a proteome is detect and quantitate target proteins which are only a small 40 intended at least about 20% of total protein coming from a fraction of a complex mixture (e.g., present in less than 2000 biological sample source, usually at least about 40%, more copies per cell or less than about 0.001% of total cellular usually at least about 75%, and generally 90% or more, up to protein) through these diagnostic signatures. and including all of the protein obtainable from the source. Multiple peptide Subsequences of a single protein may be Thus, the proteome may be present in an intact cell, a lysate, synthesized, labeled, and fragmented to identify optimal frag 45 a microsomal fraction, an organelle, a partially extracted mentation signatures. However, in one aspect at least two lysate, biological fluid, and the like. The proteome will be a different peptides are used as internal standards to identify/ mixture of proteins, generally having at least about 20 differ quantify a single protein, providing an internal redundancy to ent proteins, usually at least about 50 different proteins and in any quantitation system (see, e.g., as shown in FIG. 6). In most cases, about 100 different proteins, about 1000 different another aspect, peptide internal standards are synthesized 50 proteins, about 10,000 different proteins, about 100,000 dif which correspond to a single amino acid Subsequence of a ferent proteins, or more. In one aspect, a proteome comprises target polypeptide but which vary in one or more amino acids. substantially all of the proteins in a cell. In one preferred The peptide internal standards may correspond to known aspect, as shown in FIG. 4A, a complex mixture of cellular variants or mutations in the target polypeptide or can be proteins is evaluated directly from a cell lysate, i.e., without randomly varied to identify all possible mutations in an amino 55 any steps to separate and/or purify and/or eliminate cellular acid sequence. 
components or cellular debris. In one preferred aspect, peptide internal standards corre While the methods described herein are compatible with sponding to proteins expressed from nucleic acids compris any biochemical, immunological or cell biological fraction ing single nucleotide polymorphisms are synthesized to iden ation methods that reduce sample complexity and enrich for tify variant proteins encoded by Such nucleic acids. Thus, 60 proteins of low abundance, it is a particular advantage of the peptide internal standards can be generated corresponding to method that it can be used to detect and quantitate peptides in SNPs which map to coding regions of genes and can be used complex mixtures of polypeptides, such as cell lysates. to identify and quantify variant protein sequences on an indi Unlike methods in the prior art, because the present invention vidual or population level. SNP sequences can be accessed detects diagnostic signatures that are highly selective for indi through The Human SNP database retrieved from http:// 65 vidual peptides, the quantities of Such peptides can be dis www-genome.wi.mit.edu/SNP/human/index.html on-line, cerned even in a mixture of peptides of similar mass/charge retrieved on 2004-02-17. ratios. US 7,501,286 B2 13 14 Generally, the sample will have at least about 0.01 mg of standard is provided in different known amounts in each protein, at least about 0.05 mg, and usually at least about 1 mg mixture. In another aspect, pairs of labeled peptide internal of protein or 10 mg of protein or more, typically at a concen standards are provided each comprising mass-altering labels tration in the range of about 0.1-10 mg/mi. The sample may that differ in mass. For example, differentially labeled pep be adjusted to the appropriate buffer concentration and pH, if 5 tides may be generated by incorporating different amounts of desired. a heavy label into each peptide varying the number of sites In one aspect, as shown in FIG. 
4A, a known amount of a within the peptides labeled by a heavy isotope. labeled peptide internal standard corresponding to a target The invention also provides a method of determining the protein to be detected and/or quantitated, is added to a sample presence of and/or quantity of a modification in a target such as a cell lysate. Preferably, about 10 femtomoles is 10 polypeptide. Preferably, the label in the internal standard is spiked into the sample. The sample is contacted with a pro attached to a peptide comprising a modified amino acid resi tease activity (e.g., one or more proteases or appropriate due or to an amino acid residue that is predicted to be modi chemical agent(s) are added to the sample) and the spiked fied in a target polypeptide. In one aspect, multiple internal sample is incubated for a suitable period of time to allow standards representing different modified forms of a single peptide digestion. If the target protein is present in the sample, 15 protein and/or peptides representing different modified the digestion step should liberate a target peptide identical in regions of the protein are added to a sample and correspond sequence to the peptide portion of the internal standard and ing target peptides (bearing the same modifications) are the amount of target peptides so liberated from target proteins detected and/or quantified. Preferably, standards representing in the sample should be proportional to the amount of target both modified and unmodified forms of a protein are provided protein in the sample. in order to compare the amount of modified protein observed Preferably, a separation procedure is performed to separate to the total amount of protein in a sample. a labeled peptide internal standard and corresponding target In another aspect, peptide internal standards comprising peptide from other peptides in the sample. 
Representative different peptides from a single protein are added in known examples include high-pressure liquid chromatography amounts to a sample to provide additional controls or to scan (HPLC), Reverse Phase-High Pressure Liquid Chromatogra 25 for mutations in different regions of a protein. In a further phy (RP-HPLC), electrophoresis (e.g., capillary electro aspect, peptides corresponding to a single amino acid Subse phoresis), anion or cation exchange chromatography, and quence in a protein but representing different variant forms of open-column chromatography. Preferred is microcapillary the protein are added to a sample as a means of detecting liquid chromatography. As discussed above, internal stan and/or quantifying a particular variant form of the protein. dards are selected so that they co-elute with their correspond 30 In still another aspect, peptide internal standards are added ing target peptides as pairs of peptides that differ only in the to a sample that represents different proteins in a molecular mass contributed by the mass-altering label. pathway (e.g., a signal transduction pathway, a cell cycle, a Each peptide then is examined by monitoring of a selected metabolic pathway, a blood clotting pathway) and/or differ reaction in the mass spectrometer. This involves using the ent modified forms of such proteins. In this aspect, the func prior knowledge gained by the characterization of the peptide 35 tion of the pathway is evaluated by monitoring the presence, internal standard and then requiring the mass spectrometer to absence or quantity of particular pathway proteins and/or continuously monitor a specific ion in the MS/MS or MS" their modified forms. Multiple pathways may be evaluated at spectrum for both the peptide of interest and the internal a time by combining mixtures of different pathway peptide standard. After elution, the areas-under-the-curve (AUC) for internal standards. 
both the peptide internal standard and target peptide peaks are 40 In a further aspect, peptide internal standards represent calculated (see, e.g., FIG. 4B). The ratio of the two areas proteins and/or modified forms thereof whose presence is provides the absolute quantification that can be normalized diagnostic of a particular tissue type (e.g., neural proteins, for the number of cells used in the analysis and the protein's cardiac proteins, skin proteins, lung proteins, liver proteins, molecular weight, to provide the precise number of copies of pancreatic proteins, kidney proteins, proteins characteristic the protein per cell. 45 In one aspect, the presence and/or quantity of target of reproductive organs, etc.). These can be used separately or polypeptide in a mixture is diagnostic of a cell state. In in combination to perform tissue-typing analysis. another aspect, the cell state is representative of an abnormal Peptide internal standards may represent proteins or modi physiological response, for example, a physiological fied forms thereof whose presence is characteristic of a par response which is diagnostic of a disease. In a further aspect, 50 ticular genotype (e.g., such as HLA proteins, blood group the cell state is a state of differentiation or represents a cell proteins, proteins characteristic of a particular pedigree, etc.). which has been exposed to a condition or agent (e.g., a drug, These can be used separately or in combination to perform a therapeutic agent, a potential toxin). Preferably, protein forensic analyses, for example. quantities identified are compared to a reference quantity In one aspect, peptide internal standards are used in prena obtained from a reference sample (e.g., a sample from a 55 tal testing to detect the presence of a congenital disease or to normal patient, a sample not exposed to a condition or agent, quantitate protein levels diagnostic of a chromosomal abnor etc.). mality. 
In another aspect, the method comprises determining the Peptide internal standards may represent proteins or modi presence and/or quantity of target peptides in at least two fied forms thereof whose presence is characteristic of particu mixtures. In still another aspect, one mixture is from a cell 60 lar diseases. Such peptides may correspond to target proteins having a first cell state and the second mixture is from a cell diagnostic of neurological disease (e.g., neurodegenerative having a second cell state. In a further aspect, the first cell is diseases, including, but not limited to, Alzheimer's disease; a normal cell and the second cell is from a patient with a amyotrophic lateral Sclerosis; dementia, depression; Down's disease. Preferably, first and second mixtures are evaluated in syndrome; Huntington's disease; peripheral neuropathy; parallel. 65 multiple Sclerosis; neurofibromatosis; Parkinson's disease; Alternatively, the two mixtures can be from identical and Schizophrenia). These standards can be used separately or samples or cells. In one aspect, the labeled peptide internal in combination to diagnose a neurological disease. US 7,501,286 B2 15 16 Preferably, sets of internal standards are used so that diag amino acids; antigens; cells, cell nuclei, organelles, portions nostic fragmentation signatures can be evaluated for a num of cell membranes; viruses; receptors; modulators of recep ber of different diseases in a single assay. 
Thus, a sample may tors (e.g., agonists, antagonists, and the like); enzymes: be obtained from a patient who presents with general Symp enzyme modulators (e.g., Such as inhibitors, cofactors, and toms associated with a neurological disease, and a peptide the like); enzyme Substrates; hormones; nucleic acids (e.g., internal standard mixture comprising internal standards for Such as oligonucleotides; polynucleotides; genes, cDNAS: proteins diagnostic of different neurological diseases can be RNA; antisense molecules, ribozymes, aptamers), and com added to the sample. The sample is contacted with a protease binations thereof. Compounds also can be obtained from activity and peptide fractions are obtained, e.g., such as by synthetic libraries from drug companies and other commer HPLC. Peptide ions are subsequently fragmented as 10 cially available sources known in the art (e.g., including, but described above to detect any diagnostic fragmentation sig not limited, to the LEADOUESTR, library of screening com natures present characteristic of a particular disease. The pounds) or can be generated through combinatorial synthesis uniqueness of the fragmentation signature thus allows a spe using methods well known in the art. In one aspect, a com cific diagnosis to be obtained while testing for a plurality of pound is identified as a modulating agent if it alters the site of different types of diseases. The peptide internal standard mix 15 modification of a polypeptide and/or if it alters the amount of ture may include a peptide internal standard corresponding to modification by an amount that is significantly different from a control target protein, Such as a constitutively expressed the amount observed in a control cell (e.g., not treated with protein of known abundance. A negative standard (e.g., Such compound) (setting p values to<0.05). 
In another aspect, a as a peptide internal standard corresponding to a plant pro compound is identified as a modulating agent, if it alters the tein) may also be provided. amount of the polypeptide (whether modified or not). Similarly, peptide internal standards can be used to diag Compounds identified as modulating agents are used in nose an immune disease, including, but not limited to, methods of treatment of pathologies associated with abnor acquired immunodeficiency syndrome (AIDS). Addison's mal sites/levels of modification or abnormal levels or types of disease; adult respiratory distress syndrome; allergies; anky protein. For administration to a patient, one or more Such losing spondylitis; amyloidosis; anemia; asthma, atheroscle 25 compounds are generally formulated as a pharmaceutical rosis; autoimmune hemolytic anemia; autoimmune thyroidi composition. Preferably, a pharmaceutical composition is a tis; bronchitis; cholecystitis; contact dermatitis; Crohn's sterile aqueous or non-aqueous solution, Suspension or emul disease; atopic dermatitis; dermatomyositis; diabetes melli Sion, which additionally comprises a physiologically accept tus; emphysema; episodic lymphopenia with lymphocytotox able carrier (i.e., a non-toxic material that does not interfere ins; erythroblastosis fetalis; erythema nodosum, atrophic gas 30 with the activity of the active ingredient). More preferably, tritis; glomerulonephritis; Goodpasture’s syndrome; gout: the composition also is non-pyrogenic and free of viruses or Graves disease: Hashimoto's thyroiditis; hypereosinophilia; other microorganisms. Any suitable carrier known to those of irritable bowel syndrome; myasthenia gravis; myocardial or ordinary skill in the art may be used. Representative carriers pericardial inflammation; osteoarthritis; osteoporosis; pan include, but are not limited to: physiological saline Solutions, creatitis; and polymyositis. 
35 gelatin, water, alcohols, natural or synthetic oils, saccharide Similarly, peptide internal standards can be used to char Solutions, glycols, injectable organic esters such as ethyl ole acterize infectious diseases, respiratory diseases, reproduc ate or a combination of such materials. Optionally, a pharma tive diseases, gastrointestinal diseases, dermatological dis ceutical composition may additionally contain preservatives eases, hematological diseases, cardiovascular diseases, and/or other additives such as, for example, antimicrobial endocrine diseases, urological diseases, and the like. 40 agents, anti-oxidants, chelating agents and/or inert gases, Because peptide internal standards provide diagnostic and/or other active ingredients. fragmentation signatures for detecting and/or quantitating Routes and frequency of administration, as well doses, will proteins or modified forms thereof, changes in the presence or vary from patient to patient. In general, the pharmaceutical amounts of Such fragmentation signatures in a sample of compositions is administered intravenously, intraperito proteins from a cell (e.g., Such as a cell lyState), as discussed 45 neally, intramuscularly, Subcutaneously, intracavity or trans above, can be diagnostic of a cell state. In one aspect, a single dermally. Between 1 and 6 doses is administered daily. A fragmentation signature from a peptide internal standard is Suitable dose is an amount that is sufficient to show improve diagnostic. In other aspects, sets of fragmentation signatures ment in the symptoms of a patient afflicted with a disease are diagnostic and multiple peptide internal standards are associated anaberrant modification state oran abnormal level spiked into a sample to evaluate changes in cell state. 50 or type of a protein. 
Such improvement may be detected by In one preferred embodiment, changes in cell state are monitoring appropriate clinical or biochemical endpoints as evaluated after exposure of the cell to a compound. Com is known in the art. In general, the amount of a modulating pounds are selected which are capable of normalizing a cell agent present in a dose, or produced in situ by DNA present in state, e.g., by selecting for compounds which alter fragmen a dose (e.g., where the modulating agent is a polypeptide or tation signatures from those characteristic of abnormal physi 55 peptide encoded by the DNA), ranges from about 1 g to ological responses to those representative of a normal cell. about 100 mg per kg of host. Suitable dose sizes will vary with For example, a three way comparison of healthy, diseased, the size of the patient, but will typically range from about 10 and treated diseased individuals can identify which com mL to about 500 mL for 10-60 kg animal. A patient can be a pounds are able to restore a disease cell state to a one that mammal. Such as a human, or a domestic animal. more closely resembles a normal cell state. This can be used 60 to Screen for drugs or other therapeutic agents, to monitor the Computer Systems and Databases efficacy of treatment, and to detect or predict the occurrence The invention also provides methods for generating a data of side effects, whether in a clinical trial or in routine treat base comprising data files for storing information relating to ment, and to identify protein targets which are more impor diagnostic fragmentation signatures for peptide internal stan tant to the manifestation and treatment of a disease. 65 dards. 
Compounds which can be evaluated include, but are not limited to: drugs; toxins; proteins; polypeptides; peptides;

Preferably, data in the data files include one or more peptide fragmentation signatures characteristic or diagnostic of a cell state (e.g., such as a state which is characteristic of a disease, a normal physiological response, a developmental process, exposure to a therapeutic agent, exposure to a toxic agent or a potentially toxic agent, and/or exposure to a condition). Data in the data files also preferably include values corresponding to the level of proteins corresponding to the peptide fragmentation signatures found in a particular cell state.

In one aspect, for a cell state determined by the differential expression of at least one protein, a data file corresponding to the cell state will minimally comprise data relating to the mass spectra observed after peptide fragmentation of a peptide internal standard diagnostic of the protein. Preferably, the data file will include a value corresponding to the level of the protein in a cell having the cell state. For example, a tumor cell state is associated with the overexpression of p53 (see, e.g., Kern, et al., Int. J. Oncol. 21(2): 243-9, 2001). The data file will comprise mass spectral data observed after fragmentation of a labeled peptide internal standard corresponding to a subsequence of p53. Preferably, the data file also comprises a value relating to the level of p53 in a tumor cell. The value may be expressed as a relative value (e.g., a ratio of the level of p53 in the tumor cell to the level of p53 in a normal cell) or as an absolute value (e.g., expressed in nM or as a % of total cellular proteins).

Preferably, the data files also include information relating to the presence or amount of a modified form of a target polypeptide in at least one cell and to mass spectral data diagnostic of the modified form (i.e., peak data for a fragmented peptide internal standard which corresponds to the modified form). More preferably, the data files also comprise spectral data diagnostic of the unmodified form as well as data corresponding to the level of the unmodified form.

In one aspect, the database also comprises data relating to the source of a cell whose cell state is being evaluated. For example, the database comprises data relating to identifying characteristics of a patient from whom the cell is derived.

The invention further provides a computer memory comprising data files for storing information relating to the diagnostic fragmentation signatures of peptide internal standards. In one preferred aspect, the database comprises peptide diagnostic signatures, e.g., mass spectral data obtained after fragmentation of one or more peptide internal standards, which can be used to identify a cell having a particular cell state. More preferably, the database includes data relating to a plurality of cell state profiles, i.e., data relating to levels of target proteins identified by the peptide internal standards in a plurality of cells having different cell states. For example, profiles of disease states may be included in the database and these profiles will include measurements of levels of one or more proteins, or modified forms thereof, characteristic of the disease state. Profiles of cells exposed to different compounds include measurements of levels of proteins or modified forms thereof characteristic of the response(s) of the cells to the compounds. In one aspect, the measurements are obtained by performing any of the methods described above.

Preferably, the database is in electronic form and the cell state profiles, which are also in electronic form, provide measurements of levels of a plurality of proteins in a cell or cells of one or more subjects. In one aspect, the database comprises measurements of more than about 5, more than about 10, more than about 30, more than about 50, more than about 100, more than about 500, more than about 1000, more than about 10,000, or more than about 100,000 proteins in a cell, i.e., the database comprises data relating to the proteome of a cell. The measurements represent levels of modified and/or unmodified forms of the proteins. In one aspect, the measurements also include data regarding the site of protein modifications in one or more proteins in a cell.

In one preferred aspect, cell state profiles comprise quantitative data relating to target proteins and/or modified forms thereof obtained by using one or more of the methods described above.

A variety of data storage structures are available for creating a computer readable medium or memory comprising data files of the database. The choice of the data storage structure will generally be based on the means chosen to access the stored information. For example, the data can be stored in a word processing text file, formatted in commercially-available software such as WordPerfect and Microsoft Word, or represented in the form of an ASCII file, stored in a database application, such as DB2, Sybase, Oracle, or the like. The skilled artisan can readily adapt any number of data processor structuring formats (e.g., text files, pdf files, or database structures) in order to obtain computer readable medium or a memory having recorded thereon data relating to diagnostic fragmentation signatures, e.g., such as mass spectral data obtained after fragmentation of the peptide internal standards, and protein levels.

Correlations between a particular diagnostic signature observed and a cell state (e.g., a disease, genotype, tissue type, etc.) may be known or may be identified using the database described above and suitable statistical programs, expert systems, and/or data mining systems, as are known in the art.

In another aspect, the invention provides a computer system comprising: a database having data files containing information identifying diagnostic fragmentation signatures (e.g., mass spectral peaks) as corresponding to particular peptide internal standards which in turn are identified as corresponding to particular target proteins. Preferably, the data files also comprise information for relating the diagnostic fragmentation signatures so identified to one or more cell states, e.g., where the target protein corresponding to the peptide internal standard is diagnostic of a cell state, the peptide internal standard and fragmentation signature are also identified within the data file as being diagnostic of a cell state. In one preferred aspect, the system further comprises a user interface allowing a user to selectively view information relating to a diagnostic fragmentation signature and to obtain information about a cell state. The interface may comprise links allowing a user to access different portions of the database by selecting the links (e.g., by moving a cursor to the link and clicking a mouse or by using a keystroke on a keypad). The interface may additionally display fields for entering information relating to a sample being evaluated.

Still more preferably, the system is capable of comparing diagnostic fragmentation signatures of known peptide internal standards to mass spectral data obtained for peptides in a sample spiked with one or more internal standards in order to determine and/or quantify levels of target proteins corresponding to the standards in the sample. When a match is identified, the system may also provide information regarding the cell state for which the peptide internal standard is diagnostic (i.e., the system will identify the source of the cell, the compound to which a cell has been exposed, and/or a disease which the cell is responding to). In some aspects, sets of peptide internal standards are evaluated, as only the set will be diagnostic.

The system may also be used to collect and categorize peptide fragmentation signatures for different types of cell states to identify sets of peptide internal standards characteristic of particular cell states. In this aspect, preferably, the system comprises a relational database. More preferably, the system further comprises an expert system for identifying sets of peptide internal standards that are diagnostic of different cell states. In one aspect, the system is capable of clustering related information. Suitable clustering programs are known in the art and are described in, for example, U.S. Pat. No. 6,303,297.

The system preferably comprises a means for linking a database comprising data files of diagnostic fragmentation signatures to other databases, e.g., such as genomic databases, pharmacological databases, patient databases, proteomic databases, and the like.

Preferably, the system comprises in combination, a data entry means, a display means (e.g., graphic user interface); a programmable central processing unit; and a data storage means comprising the data files and information described above, electronically stored in a relational database.

Preferably, the central processing unit comprises an operating system for managing a computer and its network interconnections. This operating system can be, for example, of the Microsoft Windows® family, such as Windows 95, Windows 98, or Windows NT, or any newly developed Windows program. A software component representing common languages may be provided. Preferred languages include C/C++, and JAVA®. In one aspect, methods of this invention are programmed in software packages which allow symbolic entry of equations, high-level specification of processing, and statistical evaluations.

Reagents and Kits

The invention further provides reagents useful for performing the method. In one aspect, a reagent according to the invention comprises a peptide internal standard labeled with a stable isotope. Preferably, the standard has a unique peptide fragmentation signature diagnostic of the peptide. The peptide is a subsequence of a known protein and can be used to identify the presence of and/or quantify the protein in a sample, such as a cell lysate.

The invention additionally provides kits comprising one or more peptide internal standards labeled with a stable isotope or reagents suitable for performing such labeling. In certain preferred embodiments, the method utilizes isotopes of hydrogen, nitrogen, oxygen, carbon, or sulfur. Suitable isotopes include, but are not limited to, ²H, ¹³C, ¹⁵N, ¹⁷O, ¹⁸O, or ³⁴S. In another aspect, pairs of peptide internal standards are provided, comprising identical peptide portions but distinguishable labels, e.g., peptides may be labeled at multiple sites to provide different heavy forms of the peptide. Pairs of peptide internal standards corresponding to modified and unmodified peptides also can be provided.

In one aspect, a kit comprises peptide internal standards comprising different peptide subsequences from a single known protein. In another aspect, the kit comprises peptide internal standards corresponding to different known or predicted modified forms of a polypeptide. In a further aspect, the kit comprises peptide internal standards corresponding to sets of related proteins, e.g., such as proteins involved in a molecular pathway (a signal transduction pathway, a cell cycle, etc.), or which are diagnostic of particular disease states, developmental stages, tissue types, genotypes, etc. Peptide internal standards corresponding to a set may be provided in separate containers or as a mixture or “cocktail” of peptide internal standards.

In one aspect, a plurality of peptide internal standards representing a MAPK signal transduction pathway is provided. Preferably, the kit comprises at least two, at least about 5, at least about 10 or more, of peptide internal standards corresponding to any of MAPK, GRB2, mSOS, ras, raf, MEK, p85, KHS1, GCK1, HPK1, MEKK 1-5, ELK1, c-JUN, ATF-2, 3APK, MLK1-4, PAK, MKK, p38, a SAPK subunit, , and one or more inflammatory cytokines.

In another aspect, a set of peptide internal standards is provided which comprises at least about two, at least about 5 or more, of peptide internal standards which correspond to proteins selected from the group including, but not limited to, PLC isoenzymes, phosphatidylinositol 3-kinase (PI-3 kinase), an actin-binding protein, a phospholipase D isoform (PLD), and receptor and nonreceptor PTKs.

In another aspect, a set of peptide internal standards is provided which comprises at least about 2, at least about 5, or more, of peptide internal standards which correspond to proteins involved in a JAK signaling pathway, e.g., such as one or more of JAK 1-3, a STAT protein, IL-2, TYK2, CD4, IL-4, CD45, a type I interferon (IFN) receptor complex protein, an IFN subunit, and the like.

In a further aspect, a set of peptide internal standards is provided which comprises at least about 2, at least about 5, or more of peptide internal standards which correspond to cytokines. Preferably, such a set comprises standards selected from the group including, but not limited to, pro- and anti-inflammatory cytokines (which may each comprise their own set or which may be provided as a mixed set of peptide internal standards).

In still another aspect, a set of peptide internal standards is provided which comprises a peptide diagnostic of a cellular differentiation antigen or CD. Such kits are useful for tissue typing.

In one aspect, peptides corresponding to known variants or mutations in a target polypeptide, or which are randomly varied to identify all possible mutations in an amino acid sequence, are provided in the kit. In a preferred aspect, peptide internal standards corresponding to proteins expressed from nucleic acids comprising single nucleotide polymorphisms are provided.

Peptide internal standards may include peptides corresponding to variant proteins selected from the group consisting of BRCA1; BRCA2; CFTR; p53; a JAK protein; a STAT protein; blood group antigens; HLA proteins; MHC proteins; G-Protein Coupled Receptors; apolipoprotein E; kinases (e.g., such as hCds1, MTKs, PTK, CDKs, STKs, CaMs, and the like) (see, e.g., U.S. Pat. No. 6,426,206); phosphatases; human drug metabolizing proteins; viral proteins, including but not limited to viral envelope proteins (e.g., an HIV envelope protein); transporter proteins; and the like.

In one aspect, the peptide internal standard comprises a label associated with a modified amino acid residue, such as a phosphorylated amino acid residue, a glycosylated amino acid residue, an acetylated amino acid residue, a farnesylated residue, a ribosylated residue, and the like. In another aspect, a pair of reagents is provided: a peptide internal standard corresponding to a modified peptide and a peptide internal standard corresponding to a peptide identical in sequence but not modified.

In another aspect, one or more control peptide internal standards are provided. For example, a positive control may be a peptide internal standard corresponding to a constitutively expressed protein, while a negative peptide internal standard may be provided corresponding to a protein known not to be expressed in a particular cell or species being evaluated. For example, in a kit comprising peptide internal standards for evaluating a cell state in a human being, a plant peptide internal standard may be provided.

In still another aspect, a kit comprises a labeled peptide internal standard as described above and software for analyzing mass spectra (e.g., such as SEQUEST).

Preferably, the kit also comprises a means for providing access to a computer memory comprising data files storing information relating to the diagnostic fragmentation signatures of one or more peptide internal standards. Access may be in the form of a computer readable program product comprising the memory, or in the form of a URL and/or password for accessing an internet site for connecting a user to such a memory. In another aspect, the kit comprises diagnostic fragmentation signatures (e.g., such as mass spectral data) in electronic or written form, and/or comprises data, in electronic or written form, relating to amounts of target proteins characteristic of one or more different cell states and corresponding to peptides which produce the fragmentation signatures.

The kit may further comprise expression analysis software on computer readable medium, which is capable of being encoded in a memory of a computer having a processor and capable of causing the processor to perform a method comprising: determining a test cell state profile from peptide fragmentation patterns in a test sample comprising a cell with an unknown cell state or a cell state being verified; receiving a diagnostic profile characteristic of a known cell state; and comparing the test cell state profile with the diagnostic profile.

In one aspect, the test cell state profile comprises values of levels of peptides in a test sample that correspond to one or more peptide internal standards provided in the kit. The diagnostic profile comprises measured levels of the one or more peptides in a sample having the known cell state (e.g., a cell state corresponding to a normal physiological response or to an abnormal physiological response, such as a disease).

Preferably, the software enables a processor to receive a plurality of diagnostic profiles and to select a diagnostic profile that most closely resembles or “matches” the profile obtained for the test cell state profile by matching values of levels of proteins determined in the test sample to values in a diagnostic profile, to identify substantially all of a diagnostic profile which matches the test cell state profile.

Substantially all of a diagnostic profile is matched by a test cell state profile when most of the cellular constituents (e.g., proteins in the proteome) which are diagnostic of the cell state are found to have substantially the same value in the two profiles within a margin provided by experimental error. Preferably, at least about 75%, at least about 80%, at least about 85%, at least about 90%, or at least about 95% of the diagnostic proteins can be matched. Where only one or a few proteins (e.g., fewer than 10) are used to establish a diagnostic profile, preferably all of the proteins have substantially the same value.

Variations, modifications, and other implementations of what is described herein will occur to those of ordinary skill in the art without departing from the spirit and scope of the invention as described and claimed herein and such variations, modifications, and implementations are encompassed within the scope of the invention.

All of the references identified hereinabove are expressly incorporated herein by reference.
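The data files described above pair each peptide internal standard with its target protein, its diagnostic fragmentation signature, and protein levels for one or more cell states, where a level may be stored as an absolute value or expressed relative to a reference state (e.g., tumor versus normal). A relational layout is one way to hold such files. The following Python sketch uses the standard-library sqlite3 module with a hypothetical schema: every table and column name, the placeholder peptide "PEPTIDEK", and the placeholder levels are illustrative assumptions, not details given in this patent.

```python
import sqlite3
from typing import Optional

# Hypothetical schema; table and column names are illustrative only.
SCHEMA = """
CREATE TABLE standard (
    standard_id    INTEGER PRIMARY KEY,
    peptide_seq    TEXT NOT NULL,              -- subsequence of the target
    target_protein TEXT NOT NULL,
    modified       INTEGER NOT NULL DEFAULT 0  -- 1 = modified form
);
CREATE TABLE fragment_peak (                   -- diagnostic signature peaks
    standard_id INTEGER NOT NULL REFERENCES standard(standard_id),
    mz          REAL NOT NULL,
    intensity   REAL NOT NULL
);
CREATE TABLE cell_state (
    state_id INTEGER PRIMARY KEY,
    name     TEXT NOT NULL UNIQUE
);
CREATE TABLE protein_level (                   -- absolute level per state
    state_id    INTEGER NOT NULL REFERENCES cell_state(state_id),
    standard_id INTEGER NOT NULL REFERENCES standard(standard_id),
    level_nM    REAL NOT NULL,
    PRIMARY KEY (state_id, standard_id)
);
"""


def relative_level(conn: sqlite3.Connection, standard_id: int,
                   state: str, reference: str) -> Optional[float]:
    """Level in `state` divided by level in `reference`, or None if missing."""
    row = conn.execute(
        """SELECT a.level_nM / b.level_nM
           FROM protein_level AS a
           JOIN cell_state AS sa ON sa.state_id = a.state_id
           JOIN protein_level AS b ON b.standard_id = a.standard_id
           JOIN cell_state AS sb ON sb.state_id = b.state_id
           WHERE a.standard_id = ? AND sa.name = ? AND sb.name = ?""",
        (standard_id, state, reference),
    ).fetchone()
    return None if row is None else row[0]


# Placeholder entries (not measured data) for a hypothetical protein X.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO standard VALUES (1, 'PEPTIDEK', 'protein X', 0)")
conn.executemany("INSERT INTO cell_state VALUES (?, ?)",
                 [(1, "tumor"), (2, "normal")])
conn.executemany("INSERT INTO protein_level VALUES (?, ?, ?)",
                 [(1, 1, 12.0), (2, 1, 3.0)])
print(relative_level(conn, 1, "tumor", "normal"))
```

Storing absolute levels and deriving ratios at query time keeps both forms of the value available from one table; a real system could just as well store precomputed ratios.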

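The profile-matching criterion described above for the kit's expression analysis software can be sketched in Python as follows. The sketch assumes profiles are dictionaries mapping a protein (or its peptide internal standard) to a measured level, and it stands in for "a margin provided by experimental error" with a relative tolerance passed to `math.isclose`; the default tolerance of 0.2 and the function names are hypothetical choices. Each diagnostic profile is scored by the fraction of its proteins matched; profiles built from fewer than 10 proteins must match completely, larger ones at least 75%, and the highest-scoring qualifying profile is returned.

```python
import math
from typing import Dict, Optional, Tuple

Profile = Dict[str, float]  # protein (or its peptide standard) -> level


def fraction_matched(test: Profile, diagnostic: Profile, rel_tol: float) -> float:
    """Fraction of diagnostic proteins whose test level agrees within rel_tol."""
    if not diagnostic:
        return 0.0
    hits = sum(
        1
        for protein, expected in diagnostic.items()
        if protein in test
        and math.isclose(test[protein], expected, rel_tol=rel_tol)
    )
    return hits / len(diagnostic)


def best_match(test: Profile, diagnostics: Dict[str, Profile],
               rel_tol: float = 0.2, min_fraction: float = 0.75,
               small_profile: int = 10) -> Optional[Tuple[str, float]]:
    """Return (cell state, fraction matched) for the best qualifying profile."""
    best: Optional[Tuple[str, float]] = None
    for state, diagnostic in diagnostics.items():
        frac = fraction_matched(test, diagnostic, rel_tol)
        # Profiles built from fewer than `small_profile` proteins must match fully.
        required = 1.0 if len(diagnostic) < small_profile else min_fraction
        if frac >= required and (best is None or frac > best[1]):
            best = (state, frac)
    return best


diagnostics = {  # placeholder profiles, not measured data
    "state A": {"P1": 10.0, "P2": 5.0},
    "state B": {"P1": 2.0, "P2": 5.0},
}
print(best_match({"P1": 10.5, "P2": 4.8}, diagnostics))
```

The stricter thresholds named above (80%, 85%, 90%, 95%) can be passed as `min_fraction`. Because `math.isclose` scales the tolerance by the larger of the two values, the comparison is symmetric in the test and diagnostic levels.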
SEQUENCE LISTING

<160> NUMBER OF SEQ ID NOS: 5

<210> SEQ ID NO 1
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (3)
<223> OTHER INFORMATION: May be (pT)
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (6)
<223> OTHER INFORMATION: May be (mK) and/or radiolabeled

<400> SEQUENCE: 1

Gly Phe Thr Ala Leu Lys
1               5

<210> SEQ ID NO 2
<211> LENGTH: 6
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide

<400> SEQUENCE: 2

Ala Leu Glu Leu Phe Arg
1               5

<210> SEQ ID NO 3
<211> LENGTH: 11
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide

<400> SEQUENCE: 3

Leu Phe Thr Gly His Pro Glu Thr Leu Glu Lys
1               5                   10

<210> SEQ ID NO 4
<211> LENGTH: 12
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide

<400> SEQUENCE: 4

Arg Leu Ser Phe Val Phe Gly Gly Thr Asp Glu Lys
1               5                   10

<210> SEQ ID NO 5
<211> LENGTH: 11
<212> TYPE: PRT
<213> ORGANISM: Artificial Sequence
<220> FEATURE:
<223> OTHER INFORMATION: Description of Artificial Sequence: Synthetic peptide
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (3)
<223> OTHER INFORMATION: Radiolabeled Phe
<220> FEATURE:
<221> NAME/KEY: MOD_RES
<222> LOCATION: (5)
<223> OTHER INFORMATION: Radiolabeled Phe

<400> SEQUENCE: 5

Leu Ser Phe Val Phe Gly Gly Thr Asp Glu Lys
1               5                   10
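The sequences listed above are short synthetic peptides, and SEQ ID NO 5 is SEQ ID NO 4 without its N-terminal Arg, which is consistent with cleavage after that Arg by a protease such as trypsin. The Python sketch below illustrates, under stated assumptions, how a candidate standard might be predicted as a digestion product and how its light and heavy forms differ in mass. It assumes the common trypsin rule (cleave after Lys or Arg, not before Pro, no missed cleavages), monoisotopic residue masses, and a hypothetical ¹³C₆¹⁵N₁-leucine label built from the heavy-minus-light mass differences of the stable isotopes named in the Reagents and Kits discussion. The input sequence is SEQ ID NOs 3 and 4 joined end to end purely for illustration; it is not a real protein.

```python
import re

# Monoisotopic residue masses (Da), limited to residues in the listed peptides.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "D": 115.02694,
    "K": 128.09496, "E": 129.04259, "H": 137.05891, "F": 147.06841,
    "R": 156.10111,
}
WATER = 18.010565   # added once per peptide for the termini
PROTON = 1.007276

# Heavy-minus-light mass differences (Da) for common stable isotopes.
ISOTOPE_DELTA = {
    "2H": 1.0062767, "13C": 1.0033548, "15N": 0.9970349,
    "17O": 1.0042171, "18O": 2.0042450, "34S": 1.9957958,
}
# Hypothetical label: leucine with six 13C and one 15N atoms.
HEAVY_LEU = {"13C": 6, "15N": 1}


def label_shift(composition: dict) -> float:
    """Mass added by replacing light atoms with the given heavy isotopes."""
    return sum(ISOTOPE_DELTA[iso] * count for iso, count in composition.items())


def tryptic_digest(protein: str) -> list:
    """Cleave after K or R unless followed by P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide: str, heavy_leu: int = 0) -> float:
    """Neutral monoisotopic mass, with `heavy_leu` Leu residues labeled."""
    if heavy_leu > peptide.count("L"):
        raise ValueError("more labeled Leu residues than Leu in the peptide")
    light = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return light + heavy_leu * label_shift(HEAVY_LEU)


def mz(mass: float, charge: int) -> float:
    """m/z of the [M + zH]z+ ion."""
    return (mass + charge * PROTON) / charge


# SEQ ID NOs 3 and 4 joined end to end (illustrative input, not a real protein).
for pep in tryptic_digest("LFTGHPETLEK" + "RLSFVFGGTDEK"):
    if "L" in pep:
        print(pep, round(mz(peptide_mass(pep), 2), 4),
              round(mz(peptide_mass(pep, heavy_leu=1), 2), 4))
```

This requires Python 3.7 or later, where `re.split` accepts zero-width patterns. Digesting the joined sequence yields LFTGHPETLEK (SEQ ID NO 3), a lone R, and LSFVFGGTDEK (SEQ ID NO 5); the light and heavy doubly charged forms of each Leu-containing product then differ by half the label shift.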

What is claimed is:

1. A method for determining the presence and/or quantity of a modified target polypeptide in at least one mixture of different polypeptides, comprising:
a) providing a mixture of different polypeptides;
b) adding a known quantity of a single peptide internal standard labeled with a mass-altering label, thereby generating a spiked mixture, wherein the labeled peptide internal standard comprises a subsequence of the target polypeptide and wherein the labeled peptide internal standard possesses a known peptide fragment signature diagnostic of the presence of the peptide;
c) treating the spiked mixture with a protease activity to generate a plurality of peptides including the labeled peptide internal standard and peptides corresponding to the target polypeptide;
d) fragmenting the labeled peptide internal standard and any target peptide present in the spiked mixture comprising the same amino acid sequence as the labeled peptide internal standard;
e) determining the ratio of labeled fragments to unlabeled fragments; and
f) calculating from the ratio and the known quantity of the labeled internal standard, the quantity of the target polypeptide in the mixture.

2. The method of claim 1, wherein the fragmenting is performed by multistage mass spectrometry.

3. The method of claim 1, further comprising separating peptides obtained in step (c) using a chromatography step.

4. The method according to claim 3, wherein the chromatography step comprises performing HPLC.

5. The method according to claim 4, wherein the labeled peptide internal standard and target peptide comprising the same amino acid sequences as the labeled peptide internal standard are co-eluted during separation.

6. The method according to claim 1, wherein the mixture of different polypeptides is selected from the group consisting of a crude fermenter solution, a cell-free culture fluid, a cell or tissue extract, blood sample, a plasma sample, a lymph sample, a cell or tissue lysate; a mixture comprising at least about 100 different polypeptides; a mixture comprising substantially the entire complement of proteins in a cell or tissue.

7. The method according to claim 1, wherein the peptide internal standard is labeled using a stable isotope.

8. The method according to claim 1, wherein the labeled peptide internal standard is produced according to a method for generating a peptide internal standard, comprising:
a) identifying a real or predicted peptide digestion product of a target polypeptide;
b) determining the amino acid sequence of the peptide;
c) synthesizing a peptide comprising the amino acid sequence of the peptide digestion product;
d) labeling the peptide with a mass-altering label;
e) fragmenting the peptide and identifying a peptide signature diagnostic of the peptide.

9. The method according to claim 1, wherein the presence and/or quantity of target polypeptide is diagnostic of a cell state.

10. The method according to claim 9, wherein the cell state is representative of an abnormal physiological response.

11. The method according to claim 10, wherein the abnormal physiological response is diagnostic of a disease.

12. The method according to claim 9, wherein the cell state is a state of differentiation.

13. The method according to claim 1, further comprising determining the presence and/or quantity of target peptides in at least two mixtures.

14. The method according to claim 13, wherein one mixture is from a cell having a first cell state and the second mixture is from a cell having a second cell state.

15. The method according to claim 14, wherein the first cell is a normal cell and the second cell is from a patient with a disease.

16. The method according to claim 13, wherein the determining is done in parallel.

17. The method according to claim 13, wherein the two mixtures are the same and the labeled peptide internal standard is provided in different known amounts in each mixture.

18. The method according to claim 13, wherein the labeled peptide internal standard in each mixture comprises the same peptide but different labels.
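Steps (e) and (f) of claim 1 reduce to a proportion: with a known quantity of labeled standard spiked in, the quantity of target equals that known quantity divided by the ratio of labeled to unlabeled fragment signal. The Python sketch below sums the intensities of fragment ions observed in both forms before taking the ratio. The summation, the use of fragment-ion names (e.g., "y4") as keys, the function name, and the assumption that labeled and unlabeled forms give equal signal per unit amount are illustrative choices, not requirements stated in the claims.

```python
from typing import Dict


def quantify_target(light_fragments: Dict[str, float],
                    heavy_fragments: Dict[str, float],
                    spiked_amount: float) -> float:
    """Return the amount of unlabeled target peptide, in the units of
    `spiked_amount`.

    Keys are fragment-ion names (e.g. "y4"); values are measured intensities
    for the unlabeled (light) and labeled (heavy) forms. Only ions seen in
    both forms are compared, and equal response per unit amount is assumed.
    """
    shared = light_fragments.keys() & heavy_fragments.keys()
    if not shared:
        raise ValueError("no fragment ions common to both forms")
    light = sum(light_fragments[ion] for ion in shared)
    heavy = sum(heavy_fragments[ion] for ion in shared)
    if heavy <= 0:
        raise ValueError("labeled-standard signal must be positive")
    if light == 0:
        return 0.0  # no unlabeled signal: target not detected
    labeled_to_unlabeled = heavy / light          # claim 1, step (e)
    return spiked_amount / labeled_to_unlabeled   # claim 1, step (f)


# Illustrative numbers only: 50 fmol of standard spiked in.
amount = quantify_target({"y3": 200.0, "y4": 300.0, "b2": 50.0},
                         {"y3": 400.0, "y4": 600.0}, spiked_amount=50.0)
print(amount)  # 25.0
```

For the arrangement of claim 17, running the same calculation on one mixture spiked with two different known amounts of standard would be expected to give estimates that agree within experimental error, which can serve as a simple consistency check.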