O-Linked Glycopeptide Analysis Marshall Bern, Ph.D., Protein Metrics Inc
APPLICATION NOTE
Byonic™: O-Linked Glycopeptide Analysis
Marshall Bern, Ph.D., Protein Metrics Inc. (www.proteinmetrics.com)
October 2014

Benefits summary
Best practices for coping with the special challenges posed by searching for O-glycopeptides.
How-to for using Byonic to effectively search for O-linked glycopeptides.

Method
O-linked glycosylation presents more analytical problems than almost any other post-translational modification. Mucin-type domains may have 10 or more closely spaced modification sites (serines and threonines), heavily decorated with an assortment of O-glycans. Peptides from such domains are difficult to digest, ionize, and fragment, and the resulting mass spectra are difficult to analyze due to the large number of possible “peptiforms” (a peptide along with its modification state). Searching for, say, 8 different O-glycan compositions on each of 10 potential sites involves an enormous search: each site is either unmodified or carries one of the 8 compositions, giving 9^10 ≈ 3.5 billion combinations. In this application note, we show how to cope with the special challenges posed by O-glycopeptides.

As described in the N-glycosylation application note, Byonic identifies glycopeptides to the level of peptide sequence and glycan composition. A glycan composition is given by a string such as HexNAc(1)Hex(1)NeuAc(2), which specifies the monosaccharide composition, but does not distinguish isomers such as GlcNAc and GalNAc, nor identify the branching structure (called “topology” or “cartoon”) and linkage information (positions and stereochemistry of the glycosidic bonds). In this particular case, however, the most likely topology is the one shown top middle in Figure 1. By convention the reducing end of a glycan, that is, the monosaccharide attached to the protein, is shown to the right.

Figure 1. GlcNAc (lower left) along with eight common GalNAc-initiated O-glycans with compositions HexNAc(1), HexNAc(1)Hex(1), etc.
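The search-space size quoted in the Method section follows from treating each potential site as independent: a site is either unmodified or carries one of the candidate glycan compositions. The Python sketch below reproduces that count. The function name peptiform_count is hypothetical and is not part of Byonic, and the count assumes no cap on the number of modified sites per peptide.

```python
def peptiform_count(n_sites: int, n_compositions: int) -> int:
    """Count peptiforms when every site is independently either
    unmodified or carries one of n_compositions glycan compositions.

    Illustrative helper only (not a Byonic function); assumes there is
    no limit on how many sites may be modified at once.
    """
    if n_sites < 0 or n_compositions < 0:
        raise ValueError("counts must be non-negative")
    # Each site has (n_compositions + 1) states: unmodified or one glycan.
    return (n_compositions + 1) ** n_sites


if __name__ == "__main__":
    # 8 O-glycan compositions on each of 10 serine/threonine sites.
    print(peptiform_count(10, 8))  # 3486784401, i.e. about 3.5 billion
```

Because the count grows exponentially in the number of sites, halving the number of sites per peptide shrinks the search far more than halving the number of glycan compositions.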
As described in the N-glycosylation application note, Byonic’s Glycans tab offers three different ways to set glycan modifications: a glycan database in a simple text format, a menu for entering the glycan composition, and a free text format, which allows complete generality (for example, glycans on unusual residues such as cysteine or tyrosine).

O-glycosylation analysis poses two challenges not usually present in N-glycosylation analysis: site localization and combinatorial explosion.

For site localization, Byonic provides a statistic called “Delta Mod Score”, which is the drop in Byonic score from the top-scoring peptiform to the second-best peptiform. A Delta Mod below about 20.0 means that the identification is uncertain in some way, usually in the placement of a modification. A Delta Mod above about 40.0 means that the identification is significantly better than any other scored candidate, and assuming Byonic scored every possible peptiform, the identification should be correct in every detail. Of course, manual validation is always advisable for interesting peptiforms.

There are a number of ways to cope with combinatorial explosion. Generally, one should keep the glycan database as small as possible, searching only what is relevant. In addition, there are three factors to consider.

1. First and foremost, it is best to use a focused protein database (see the application note) for searches with many modifications.

2. Second, concentrate the search on the most likely peptiforms; for example, it may be best to search only fully tryptic peptides with no missed cleavages. Allowing a missed cleavage may combine two peptides with 5 potential O-glycosylation sites each (which, if 8 O-glycan compositions are under consideration, gives a search space of 2 × 9^5 ≈ 118,000 peptiforms) into one peptide with 10 potential sites (and 9^10 ≈ 3.5 billion peptiforms).

3. Third, if the search is still too large, combine O-glycan compositions into O-glycan sums. For example, the peptide FGVSSSSSGPSQTLTSTGNFK from Figure 3, which has 10 serines and threonines, could be searched for all O-GlcNAc peptiforms with the modification rule
    HexNAc / 203.079373 @ S,T | common10
and a Total common max setting of 10, giving 2^10 = 1024 peptiforms, or could be searched less exactly with ten modification rules:
    HexNAc / 203.079373 @ S,T | rare1, 2HexNAcs / 406.158746 @ S,T | rare1, … and so forth.
With a Total rare max setting of 1, the latter approach produces only 101 peptiforms (ten rules on each of ten sites, plus the unmodified peptide) for a ten-fold speed-up. The latter approach, however, gives up hope of fully automatic site localization, and just aims to identify an O-glycopeptide to the level of peptide identity and total mass of O-glycans.

Example
Figure 2 shows the modification settings for a search of a sample enriched for O-GlcNAc. The search included phosphorylation, because many O-GlcNAc proteins are also phosphoproteins. Figure 3 shows an example identification from this search: an O-GlcNAcylated peptide from Nuclear pore complex protein Nup153. The O-GlcNAc has uncertain placement with HCD but confident placement with ETD fragmentation. This situation is typical, because glycosidic bonds break easily with collisional fragmentation, and O-glycans either fly off completely or lose monosaccharides. Because the initial bond is through O (and hence weak), O-glycopeptides also do not give Y1 (Peptide + HexNAc) ions as reliably as N-glycopeptides do.

Figure 2. Modification settings for an O-GlcNAc / phosphorylation search on a sample with a number of sample preparation artifacts. The search, allowing non-specific N-terminus, against a focused database with ~250 target proteins, takes 3 minutes. The same search with Total common max set to 4 takes 13 minutes and gives only 1% more PSMs.

Figure 3.
These spectra show successive HCD and ETD scans of the same precursor ion. The HCD spectrum (top) has no peaks to place the O-GlcNAc modification, so Byonic gives a Delta Mod score of 0.0, but the ETD spectrum (bottom) includes four small flanking peaks (c3, c4, z17, and z18) and Byonic gives it a Delta Mod score of 36.9. Byonic uses ~ to denote peaks with labile modifications off (glycosylation, sulfation, phosphorylation, and carboxylation), so in the HCD spectrum y4 and ~y5 differ by 101 Da, the mass of unmodified threonine.

Figure 4 shows the glycan modifications for a search on human IgA, which has a hinge peptide with 12 serines and threonines. Byonic cannot currently search for all possible O-glycans on all possible sites, so we must lower our goals to identifying the total O-glycosylation composition and resolving only a few O-glycans. This is not only a software limitation, however, because peptide fragmentation is rarely good enough to completely resolve more than two or three O-glycans. Figure 5 gives a typical result: resolution of one glycan and partial resolution of others. Byonic does, however, identify 40 different total glycosylation compositions, up to HexNAc(11)Hex(11)NeuAc(1), a mass of 4308 Da.

Figure 4. Glycan modifications used in a search of an IgA sample. Here we use a 100-protein database (due to impure sample) and Byonic’s option to set protein-specific PTMs. Total common max of 4 gives a 1 hour search.

Figure 5. An ETD Orbitrap spectrum from an IgA sample that Byonic matched to K.HYTNPSQDVTVPC[+57]PVPS[+365]T[+365]PPTPS[+656]PST[+365]PPTPSPSC[+57]C[+57]HPR. The modification placements are incompletely resolved due to lack of peaks. The zoom shows z13 and z14. Both of these peaks show interference from other peaks, yet give evidence that ST[+365]PPTP is correct.

Protein Metrics Inc.
San Carlos, CA www.proteinmetrics.com
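The glycan masses reported for the IgA search (the [+365] and [+656] labels in Figure 5 and the 4308 Da total for HexNAc(11)Hex(11)NeuAc(1)) can be cross-checked from the composition strings. The Python sketch below does this under two assumptions that do not come from the application note: the residue masses are standard monoisotopic values, and the helper names parse_composition and composition_mass are hypothetical, not Byonic functions.

```python
import re
from typing import Dict

# Standard monoisotopic residue masses in Da (assumed values, not taken
# from the application note).
RESIDUE_MASS = {
    "HexNAc": 203.079373,
    "Hex": 162.052824,
    "NeuAc": 291.095417,
}

_TERM = re.compile(r"([A-Za-z]+)\((\d+)\)")


def parse_composition(text: str) -> Dict[str, int]:
    """Parse a string such as 'HexNAc(1)Hex(1)NeuAc(2)' into {name: count}."""
    terms = _TERM.findall(text)
    # Reject input containing anything the pattern did not consume.
    if "".join(f"{name}({count})" for name, count in terms) != text:
        raise ValueError(f"unrecognized composition: {text!r}")
    counts: Dict[str, int] = {}
    for name, count in terms:
        if name not in RESIDUE_MASS:
            raise ValueError(f"unknown monosaccharide: {name!r}")
        counts[name] = counts.get(name, 0) + int(count)
    return counts


def composition_mass(text: str) -> float:
    """Sum the residue masses for a glycan composition string."""
    return sum(RESIDUE_MASS[name] * n for name, n in parse_composition(text).items())


if __name__ == "__main__":
    print(f"{composition_mass('HexNAc(1)Hex(1)'):.2f}")            # 365.13
    print(f"{composition_mass('HexNAc(1)Hex(1)NeuAc(1)'):.2f}")    # 656.23
    print(f"{composition_mass('HexNAc(11)Hex(11)NeuAc(1)'):.2f}")  # 4307.55
```

Under these assumed masses the three compositions round to 365, 656, and 4308 Da, which agrees with the values quoted in the note.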