Tandem mass spectrometry identifies many mouse brain O-GlcNAcylated proteins including EGF domain-specific O-GlcNAc transferase targets

Joshua F. Alfaroa, Cheng-Xin Gongb, Matthew E. Monroea, Joshua T. Aldricha, Therese R. W. Claussa, Samuel O. Purvinea, Zihao Wangc, David G. Camp IIa, Jeffrey Shabanowitzd, Pamela Stanleye, Gerald W. Hartc, Donald F. Huntd, Feng Yanga,1, and Richard D. Smitha,1

aPacific Northwest National Laboratory, Richland, WA 99352; bDepartment of Neurochemistry, New York State Institute for Basic Research in Developmental Disabilities, Staten Island, NY 10314; cDepartment of Biological Chemistry, Johns Hopkins University School of Medicine, Baltimore, MD 21205; dDepartment of Chemistry, University of Virginia, Charlottesville, VA 22904; and eDepartment of Cell Biology, Albert Einstein College of Medicine, New York, NY 10461

Edited by Richard L. Huganir, The Johns Hopkins University School of Medicine, Baltimore, MD, and approved March 20, 2012 (received for review January 11, 2012)

O-linked N-acetylglucosamine (O-GlcNAc) is a reversible posttranslational modification of Ser and Thr residues on cytosolic and nuclear proteins of higher eukaryotes catalyzed by O-GlcNAc transferase (OGT). O-GlcNAc has recently been found on Notch1 extracellular domain catalyzed by EGF domain-specific OGT. Aberrant O-GlcNAc modification of brain proteins has been linked to Alzheimer’s disease (AD). However, understanding specific functions of O-GlcNAcylation in AD has been impeded by the difficulty in characterization of O-GlcNAc sites on proteins. In this study, we modified a chemical/enzymatic photochemical cleavage approach for enriching O-GlcNAcylated peptides in samples containing ∼100 μg of tryptic peptides from mouse cerebrocortical brain tissue. A total of 274 O-GlcNAcylated proteins were identified. Of these, 168 were not previously known to be modified by O-GlcNAc. Overall, 458 O-GlcNAc sites in 195 proteins were identified. Many of the modified residues are either known phosphorylation sites or located proximal to known phosphorylation sites. These findings support the proposed regulatory cross-talk between O-GlcNAcylation and phosphorylation. This study produced the most comprehensive O-GlcNAc proteome of mammalian brain tissue with both protein identification and O-GlcNAc site assignment. Interestingly, we observed O-β-GlcNAc on EGF-like repeats in the extracellular domains of five membrane proteins, expanding the evidence for extracellular O-GlcNAcylation by the EGF domain-specific OGT. We also report a GlcNAc-β-1,3-Fuc-α-1-O-Thr modification on the EGF-like repeat of the versican core protein, a proposed substrate of Fringe β-1,3-N-acetylglucosaminyltransferases.

chemical/enzymatic photochemical cleavage enrichment | glycosylation | mouse cerebral cortex

Author contributions: J.F.A. and F.Y. designed research; J.F.A., T.R.W.C., and F.Y. performed research; J.F.A., M.E.M., J.T.A., S.O.P., Z.W., F.Y., and R.D.S. contributed new reagents/analytic tools; J.F.A., J.S., P.S., G.W.H., D.F.H., and F.Y. analyzed data; and J.F.A., C.-X.G., D.G.C., P.S., G.W.H., D.F.H., F.Y., and R.D.S. wrote the paper.
The authors declare no conflict of interest.
This article is a PNAS Direct Submission.
1To whom correspondence may be addressed. E-mail: [email protected] or rds@pnnl.gov.
This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.1073/pnas.1200425109/-/DCSupplemental.

O-linked N-acetylglucosamine (O-GlcNAc) attached to single Ser and Thr residues of cytosolic and nuclear proteins is a reversible posttranslational modification (PTM) found in some bacteria, some protozoans, filamentous fungi, viruses, and all higher eukaryotes. The enzymes that catalyze the dynamic cycling of O-GlcNAc modification, O-GlcNAc transferase (OGT) and O-GlcNAc hexosaminidase (OGA), are more highly expressed in the pancreas and brain than in other tissues (1, 2). In addition, many proteins involved in neuronal communications, synaptic transmission, and synaptic plasticity are O-GlcNAcylated (3, 4), suggesting a role for this modification in brain function. O-GlcNAc cycling is highly sensitive to nutrients and stress and is regulated by nearly every metabolic pathway. Aberrant O-GlcNAc modification has been linked to Alzheimer’s disease (AD) (5, 6) in which brain glucose metabolism is impaired (7). The reduced O-GlcNAcylation in AD contributes to hyperphosphorylation of tau protein and formation of the neurofibrillary tangles characteristic of AD and related neurodegenerative disorders (8, 9).

Current understanding of the function of O-GlcNAcylation in neurodegeneration has been impeded by difficulties in identifying this modification, even using MS (10, 11). First, the substoichiometric levels of O-GlcNAc modification at given sites on substrates necessitate enrichment of O-GlcNAcylated proteins or peptides before sequence analysis by MS (11). Additionally, identifying specific Ser and Thr residues that are O-GlcNAcylated is difficult, because O-GlcNAc is readily lost as an oxonium ion during collision-induced dissociation (CID), a widely used fragmentation mode for peptide sequencing by MS (12). Alternative higher-energy collisional dissociation (HCD) and electron transfer dissociation (ETD) MS methods have improved detection or have facilitated site-specific identification, but challenges remain (13).

Several methods to enrich O-GlcNAcylated proteins or peptides have led to the identification of a limited number of O-GlcNAcylation sites by MS. For example, lectin weak-affinity chromatography (4, 10, 14) enabled identification of up to 142 O-GlcNAcylation sites in 62 proteins from mouse embryonic stem cells (15). Immunoprecipitation, using O-GlcNAc-specific monoclonal antibodies (13, 16), identified 83 O-GlcNAcylated sites from a HEK293T cell extract (13). A recent metabolic labeling study that used alkyne-modified GlcNAc incorporated into OGT substrates and Cu(I)-catalyzed [3 + 2] azide–alkyne cycloaddition (CuAAC) to a chemically cleavable biotin–azide probe enabled identification of 374 putative O-GlcNAc modified proteins, but yielded no information on sites of O-GlcNAc modification (17). In addition to limited coverage of O-GlcNAcylation sites, a drawback to using these approaches is that they typically require milligram quantities of protein (4, 10, 13, 14, 16) or are limited to cultured cells (17), which makes them generally ill-suited for clinically derived tissue samples often available in small amounts.

The chemical/enzymatic photochemical cleavage (CEPC) method (18, 19) improves upon the highly selective chemical/enzymatic approaches for O-GlcNAcylated proteins/peptides enrichment (20, 21) and increases analytical sensitivity by introducing a photochemical cleavable-biotin probe that allows efficient release of enriched peptides from the avidin affinity column. In this method (Fig. 1A), O-GlcNAcylated peptides are first enzymatically labeled with azidogalactosamine (GalNAz). The free azido group in the GalNAz is then conjugated to the alkyne group in a photocleavable biotin probe (PC-PEG-biotin-alkyne) through CuAAC. The biotinylated peptides are enriched using avidin affinity chromatography, and subsequently released through photochemical cleavage. O-GlcNAc-modified peptides enriched by this method are tagged with a basic aminomethyltriazolacetylgalactosamine (AMT-

www.pnas.org/cgi/doi/10.1073/pnas.1200425109 | PNAS Early Edition

[Fig. 1 graphic, panels A and B. Labels recovered from the graphic: WT mouse, 20 mg cerebral cortical tissue; 3xTg-AD mouse, 20 mg cerebral cortical tissue; lysis and trypsin digestion; 600 µg desalted peptides; GalNAz transfer by GalT1 Y289L, PNGase F treatment, CIP; desalting; labeling with biotin-PEG-PC-alkyne; 1/3 saved, 1/3 original wash method, 1/3 modified wash method; SCX, biotin-avidin enrichment; UV cleavage; ½ sample per analysis (CID-ETD, HCD-CID/ETD, HCD-CID/ETD).]

Identifications per LC-MS/MS analysis, in the order shown in the graphic (WT, 3xTg-AD for each of the three analyses):
O-GlcNAc sites: 358, 152; 160, 135; 182, 139
O-GlcNAc proteins: 249, 133; 126, 104; 144, 122

Fig. 1. Overview of the CEPC method (A) and experimental work flow (B) used in this study for the identification of O-GlcNAc proteins and modification sites. (A) Steps modified in the enrichment strategy are depicted in dashed boxes. PNGase F and calf intestine phosphatase (CIP) are added to the reaction mixture to ensure selective and complete derivatization with GalNAz (18). During the CuAAC reaction, Cu(I) is stabilized with TBTA. (B) The original and modified wash methods for the biotin-avidin enrichment step were compared using peptides from one WT mouse and one 3xTg-AD mouse (female, 1 y old). The number of O-GlcNAc sites and proteins identified from LC-MS/MS analysis of individual samples is depicted at the bottom of the figure.

GalNAc) that facilitates ETD identification and site localization of O-GlcNAc–modified peptides (10, 18, 19, 21). This approach enabled identification of 141 O-GlcNAcylation sites in 64 proteins from <15 μg of spindle and midbody proteins that were enriched from HeLa cells (18, 19). However, to date, this method has not been used for complex tissues or global proteomic analyses. In addition, there remain challenges in reducing contaminants from the CuAAC reaction, which are detrimental to liquid chromatography (LC)-MS measurements.

In the study reported herein, we modified the CEPC protocol and used it to enrich O-GlcNAcylated peptides from tissue samples obtained from the brain cortex of one WT mouse and one 3xTg-AD model (22) mouse. CEPC enrichment of ∼100 μg mouse brain cortex peptides from ∼3 mg WT tissue resulted in the identification of 249 O-GlcNAcylated proteins and 358 O-GlcNAc site assignments from a single LC-MS/MS analysis, using an alternating CID/ETD approach. Enrichment of six 100-μg samples (three each for WT and 3xTg-AD mice) enabled identification of 274 O-GlcNAcylated proteins and 458 O-GlcNAc sites, many of which were previously unknown. Important findings include data that support extensive cross-talk between O-GlcNAcylated and phosphorylated proteins involved in cerebrocortical processes, and unexpected sites of extracellular O-GlcNAc modification. Specifically, we observed O-GlcNAc on a secreted cytokine AIMP1, on EGF-like repeats in the extracellular domain of five other proteins, and on GlcNAc-β-1,3-Fuc-O-EGF of versican that reflects the action of a Fringe β-1,3-GlcNAc transferase.

Results and Discussion

Modified CEPC Approach for O-GlcNAc Peptide Enrichment. To maximize sensitivity and selectivity in the CEPC O-GlcNAc peptide enrichment method (Fig. 1A), we modified the enzymatic reactions (SI Methods and SI Results and Discussion) to ensure high yields in transfer of GalNAz to O-GlcNAcylated peptides by the mutant β-1,4-galactosyltransferase (GalT1Y289L), and effective removal of N-glycans that may contain terminal GlcNAc by peptide:N-glycosidase F (PNGase F). We also incorporated additional aqueous washes and an organic wash [70% (vol/vol) methanol in water] during avidin affinity enrichment (dashed boxes in Fig. 1A; workflow in Fig. 1B). The added washes increased the number of O-GlcNAc protein identifications by ∼16% (Fig. 1B) and reduced the large amount of hydrophobic Tris[(1-benzyl-1H-1,2,3-triazol-4-yl)methyl]amine (TBTA) that remained from the CuAAC reaction (Fig. S1).

Evaluation of MS Fragmentation Methods for Identifying CEPC-Enriched Tagged O-GlcNAcylated Peptides. In a parallel effort we evaluated the capability of individual MS/MS methods (HCD, CID, and ETD) for identifying tagged O-GlcNAc (AMT-GalNAz-GlcNAc modified) in 100 μg WT mouse cerebrocortical peptides enriched using CEPC. Of the three fragmentation methods, only ETD provided information regarding the location of the modification sites. The basic tag (AMT-GalNAz) increased peptide fragmentation efficiency of both ETD and HCD compared with that for untagged, native O-GlcNAc peptides (13). Overall, ETD generated the greatest number of O-GlcNAcylated peptides, followed by HCD and CID (Fig. S2). In addition, the AMT-GalNAz-GlcNAc modification produced three major diagnostic oxonium fragment ions for the intact sugars (204.09, 300.13, and 503.21 m/z) compared with one (204.09 for HexNAc) from untagged, native O-GlcNAc peptides. The AMT-GalNAz tag allowed CID to detect at least two of the three major oxonium ions from intact sugars (Fig. S3) despite the one-third low molecular-weight cutoff rule (23). Compared with CID, HCD consistently detected the three major diagnostic ions with higher mass accuracy and generally higher intensity (Fig. S4). The results of this evaluation indicate that combining tandem MS methods to obtain both site-specific (ETD) and diagnostic (HCD or CID) information can increase the confidence of O-GlcNAcylated peptide identification and site localization, consistent with previous studies (10, 13). In addition, alternating CID-ETD analyses have been advocated as a superior method for identification of labile phosphopeptides (24). Our study further demonstrates the need to combine the orthogonal methods of fragmentation for assigning labile PTMs.

Known and Previously Unreported O-GlcNAcylated Proteins and Sites. In this study, we analyzed cerebrocortical tissue from one 3xTg-AD mouse and one WT mouse (both female, 1 y old). The 3xTg-AD mouse, which overexpresses mutated amyloid-β precursor protein, tau, and presenilin-1, is a commonly used model of AD (22). Three samples from each mouse were analyzed (Fig. 1B).
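As a rough arithmetic check on the ∼16% gain attributed to the added washes, the protein counts in Fig. 1B can be compared directly. The sketch below assumes (the text does not state this mapping explicitly) that the second and third WT/3xTg-AD count pairs in Fig. 1B correspond to the original and modified wash methods, respectively; the variable and function names are illustrative only.

```python
# O-GlcNAc protein counts read from Fig. 1B.
# ASSUMPTION: (126, 104) is the original wash method and (144, 122) is the
# modified wash method; the figure residue does not label this explicitly.
original = {"WT": 126, "3xTg-AD": 104}
modified = {"WT": 144, "3xTg-AD": 122}


def percent_increase(before: int, after: int) -> float:
    """Relative gain of `after` over `before`, in percent."""
    return 100.0 * (after - before) / before


# Gain for each mouse, then the simple mean of the two gains.
per_mouse = {m: percent_increase(original[m], modified[m]) for m in original}
mean_gain = sum(per_mouse.values()) / len(per_mouse)

for mouse, gain in per_mouse.items():
    print(f"{mouse}: +{gain:.1f}%")
print(f"mean: +{mean_gain:.1f}%")
```

Under that assumption the gains are about 14% for the WT sample and 17% for the 3xTg-AD sample, which averages close to the ∼16% quoted in the text.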

We identified 1,575 unique O-GlcNAcylated peptides (Table S1) corresponding to 555 unique peptide sequences (Dataset S1) and 274 O-GlcNAcylated unique proteins (Dataset S2). Overall, 458 unambiguous O-GlcNAc sites (≤5% false localization rate; Dataset S3) were assigned to 195 proteins, tripling the number of O-GlcNAc sites reported in any single study (15, 19). Analysis of the sequence around these O-GlcNAc sites revealed several statistically significant (P ≤ 1E-6) motifs for O-GlcNAcylation (Fig. S5), among which P-X-gT-X-A and P-V-gS are enriched 14- and 23-fold, respectively, compared with a dynamic statistical background of the entire mouse protein database. Some of the motif sequences agree with the previously reported OGT preferred sequence P/V-P/V-V-gS/T-S/T (10, 19). Ontology analysis of the 274 O-GlcNAcylated proteins supports the involvement of O-GlcNAcylation in numerous cellular functions in the brain, such as cytoskeleton organization, neurogenesis, synaptic transmission, learning, and memory (Dataset S4).

We localized known and previously unreported O-GlcNAc sites to specific residues in many of the 106 previously reported O-GlcNAcylated proteins, which include numerous neuronal proteins implicated in AD, such as synapsin I and II, synaptopodin, α-synuclein, several microtubule-associated proteins (Dataset S5), and proteins involved in neurogenesis. Because neurogenesis is impaired in AD (25), the O-GlcNAcylation status of these identified proteins (Dataset S4) may play a role in AD. Comparison of the O-GlcNAcylated peptides obtained from 3xTg-AD vs. WT cerebrocortical tissue showed that fewer (179 vs. 259) O-GlcNAcylated proteins were identified from the 3xTg-AD mouse (Fig. 1 and Fig. S6). These results are consistent with previous observations of down-regulation of brain protein O-GlcNAcylation in AD due to metabolic impairment (8). In addition, some of the O-GlcNAcylated proteins present only in the 3xTg-AD tissue may represent aberrantly O-GlcNAcylated proteins (Dataset S2). Future quantitative proteomic studies that analyze brain tissue from several WT and 3xTg-AD mice will help to suggest roles for O-GlcNAcylation in AD.

We also identified 168 O-GlcNAcylated proteins previously unknown to be O-GlcNAc-modified, including 114 proteins (Table 1) that have confidently localized O-GlcNAc sites and 54 proteins (Table 2) that have defined tryptic peptide regions for the O-GlcNAc modification, but their modification residues are ambiguous. Many of these O-GlcNAc-proteins are cytoskeleton

Table 1. Identified O-GlcNAcylated proteins previously unknown to be O-GlcNAc-modified with confidently localized sites (≤5% false localization rate)

| Accession | Gene symbol | Site | Accession | Gene symbol | Site | Accession | Gene symbol | Site |
|---|---|---|---|---|---|---|---|---|
| P51141 | Dvl1 | S383 | Q91Z69 | Srgap1 | S982 | Q80U40 | Rimbp2 | S681, T683 |
| P62484 | Abi2 | T297 | O08553 | Dpysl2 | S507 | Q80UY2 | Kcmf1 | S262 |
| Q03173 | Enah | S362 | O08599 | Stxbp1 | S511 | Q80X80 | C2cd2l | T438, T447 |
| Q2PFD7 | Psd3 | S245 | O54967 | Tnk2 | T832, T833 | Q811L6 | Mast4 | S2165 |
| Q4JIM5 | Abl2 | T872 | P0C090 | Rc3h2 | S592 | Q8BG95 | Ppp1r12b | T542 |
| Q811P8 | Arhgap32 | S1027 | P0C7T6 | Atxn1l | S40 | Q8BHL3 | Tbc1d10b | T43, T44, T162 |
| Q8BJ42 | Dlgap2 | T633 | P0CG14 | Chtf8 | S381 | Q8BJM5 | Slc30a6 | T374 |
| Q8BL65 | Ablim2 | T363, S373, S381, S412 | P28652 | Camk2b | T325 | Q8BU25 | Pamr1 | T267 |
| Q8BRT1 | Clasp2 | T476 | P31230 | Aimp1 | T91 | Q8BXL9 | Iffo1 | T175 |
| Q8CH77 | Nav1 | T543, T617, T619 | P55937 | Golga3 | T207 | Q8C0T5 | Sipa1l1 | T1402, S1403 |
| Q8VDQ8 | Sirt2 | T366 | P59644 | Inpp5j | S117 | Q8CDG3 | Vcpip1 | T1072 |
| Q8VHG2 | Amot | T196 | P59764 | Dock4 | T1806 | Q8CFE4 | Scyl2 | S741 |
| Q922J3 | Clip1 | T150 | P97789 | Xrn1 | S1668 | Q8CGI1 | Fam193a | T706 |
| Q9QXS1 | Plec | T2762 | Q05793 | Hspg2 | T847 | Q8K021 | Scamp1 | T59 |
| Q9QYC0 | Add1 | T11, T16, T17, T540, S557, T558, T559 | Q2M3X8 | Phactr1 | S337 | Q8K3X4 | Irf2bpl | S159 |
| Q9WUM3 | Coro1b | S421 | Q2VWQ2 | Nell1 | T542 | Q8R1X6 | Spg20 | T478 |
| P06537 | Nr3c1 | S43 | Q3UH68 | Limch1 | T506 | Q8R3Y5 | | S354 |
| P0C6A2 | Mamld1 | S253 | Q3UNH4 | Gprin1 | T343 | Q8VHW2 | Cacng8 | T381 |
| P16951 | Atf2 | T272 | Q3V0G7 | Garnl3 | S905 | Q91V09 | Wdr13 | S140 |
| P42227 | Stat3 | T717 | Q499E5 | Stox2 | T863 | Q91X58 | Zfand2b | T167 |
| P58462 | Foxp1 | T446 | Q4VAA2 | Cdv3 | T178 | Q91XV3 | Basp1 | S169 |
| P70365 | Ncoa1 | T401 | Q571K4 | Tab3 | T385, S412 | Q91Z67 | Srgap2 | S990 |
| Q02780 | Nfia | T362 | Q5FWH2 | Unkl | T459 | Q922Y1 | Ubxn1 | T192 |
| Q3UCQ1 | Foxk2 | S540 | Q5QNQ6 | Osbp2 | T140 | Q99KN9 | Clint1 | S328 |
| Q61026 | Ncoa2 | T964 | Q5SRX1 | Tom1l2 | T187, T188 | Q9DAI6 | Fam135b | T989 |
| Q62441 | Tle4 | T330 | Q5SUE8 | Ankrd40 | S198, T199 | Q9DAM7 | | T69, T72 |
| Q64336 | Tbr1 | S647 | Q61001 | Lama5 | S2140 | Q9DCT8 | Crip2 | T88 |
| Q68ED7 | Crtc1 | T417 | Q62419 | Sh3gl1 | T284 | Q9EPN1 | Nbea | T1276, T1796, T1797 |
| Q6NXK2 | Znf532 | S455 | Q64332 | Syn2 | S79 | Q9ERQ3 | Znf704 | T468 |
| Q8BT14 | Cnot4 | S316, T573 | Q68FF7 | Slain1 | T411 | Q9EST3 | Eif4enif1 | S416 |
| Q8BW22 | Ss18l1 | T48 | Q68FH0 | Pkp4 | S225, S226, S1087 | Q9JI46 | Nudt3 | T159 |
| Q91W39 | Ncoa5 | T521 | Q69ZI1 | Sh3rf1 | T512 | Q9QWY8 | Asap1 | T823 |
| P70392 | Rasgrf2 | S763 | Q6A058 | Armcx2 | T328 | Q9QWZ1 | Rad1 | T232 |
| P83510 | Tnik | S539, T577 | Q6A0A2 | Larp4b | T51 | Q9QY01 | Ulk2 | T613, T727 |
| Q80YA9 | Cnksr2 | S329 | Q6NXJ0 | Wwc2 | S520, T528 | Q9QZR5 | Hipk2 | S1009 |
| Q8CF89 | Tab1 | S393 | Q6PFX7 | | T468 | Q9R0Z9 | Dlc1 | S174 |
| Q8CHG7 | Rapgef2 | S807, T1006, T1007 | Q6RHR9 | Magi1 | T1093, T1094 | Q9WTS4 | Odz1 | S237, T685 |
| Q91WJ0 | Frs3 | S439 | Q80TN7 | Nav3 | S1210 | Q9WUU8 | Tnip1 | T103 |

Proteins are shaded in different colors according to their functional category. Blue, cytoskeleton; red, regulation of transcription; brown, signaling; unshaded, other; underlined entries, kinases. Protein names are included in Table S2.

Table 2. Identified O-GlcNAcylated proteins previously unknown to be O-GlcNAc-modified with modification site localized to a specific tryptic peptide region

| Accession | Gene symbol | O-GlcNAcylation region* | Accession | Gene symbol | O-GlcNAcylation region* | Accession | Gene symbol | O-GlcNAcylation region* |
|---|---|---|---|---|---|---|---|---|
| A2AHC3 | Camsap1 | 370–396 | Q62073 | Map3k7 | 373–387 | Q7TN29 | Smap2 | 180–202 |
| A2AKB4 | Frmpd1 | 1206–1230 | Q6PAJ1 | Bcr | 133–155 | Q7TPM1 | Prrc2b | 1293–1311 |
| P97434 | Mprip | 175–201 | Q6PGG2 | Gmip | 578–591 | Q80TL4 | Kiaa1045 | 328–355 |
| Q6DFV3 | Arhgap21 | 312–331 | Q9QZS8 | Sh2d3c | 420–439 | Q80TZ3 | Dnajc6 | 600–640 |
| Q6PFD5 | Dlgap3 | 552–575 | B1AZP2 | Dlgap4 | 258–290 | Q80U78 | Pum1 | 798–817 |
| Q8BIE6 | Frmd4a | 926–953 | O08919 | Numbl | 243–292 | Q80VP1 | Epn1 | 468–498 |
| Q9JL04 | Fmn2 | 243–264 | P13595 | Ncam1 | 894–938 | Q80WC7 | Agfg2 | 178–204, 445–479 |
| Q9Z1K7 | Apc2 | 2080–2096 | P63250 | Kcnj3 | 6–40 | Q80YR4 | Znf598 | 560–581 |
| A2A884 | Hivep3 | 973–991 | Q03141 | Mark3 | 536–552 | Q8BXR9 | Osbpl6 | 193–219 |
| P42128 | Foxk1 | 570–606 | Q3UHC0 | Tnrc6c | 748–773 | Q8BZB3 | | 377–401 |
| P45481 | Crebbp | 135–156 | Q3UIL6 | Plekha7 | 112–138 | Q8C120 | Sh3rf3 | 588–624 |
| Q61818 | Rai1 | 531–550 | Q3UQN2 | Fcho2 | 476–504 | Q8CG79 | Tp53bp2 | 323–343 |
| Q80TZ9 | Rere | 1209–1246 | Q4G0F8 | Ubn1 | 988–1009 | Q8R361 | Rab11fip5 | 538–550 |
| Q8CHY6 | Gatad2a | 311–339 | Q501J7 | Phactr4 | 190–213 | Q8VIG0 | Zcchc14 | 615–645 |
| Q9JL19 | Ncoa6 | 1218–1246 | Q5SWP3 | Nacad | 1190–1210 | Q91YD3 | Dcp1a | 497–533 |
| O35099 | Map3k5 | 1216–1246 | Q69ZH9 | Arhgap23 | 559–591 | Q9CR95 | Necap1 | 195–217 |
| P97379 | G3bp2 | 225–252 | Q69ZR2 | Hectd1 | 1339–1370 | Q9DBG5 | Plin3 | 69–84 |
| Q3UHD9 | Agap2 | 73–111, 226–258 | Q6NS60 | Fbxo41 | 377–392 | Q9EP53 | Tsc1 | 1057–1079 |

Proteins are shaded in different colors according to their functional category. Blue, cytoskeleton; red, regulation of transcription; brown, signaling; unshaded, other; underlined entries, kinases; bold, peptide sequences containing the Asp-Xaa-Ser/Thr motif. Protein names are included in Table S3.
*Tryptic peptide residue range.

proteins, signaling proteins, or proteins involved in the regulation of transcription, all of which are classes of proteins known to be O-GlcNAcylated (26, 27). We observed three O-GlcNAcylation sites (T1276, T1796, and T1797) on neurobeachin (Nbea), a protein implicated in membrane protein traffic and autism, and required for the formation and functioning of central synapses (28). This protein is a known phosphoprotein and binds to protein kinase A; however, its O-GlcNAcylation status was previously unknown (29). Literature provides examples of cross-talk between O-GlcNAcylation and phosphorylation (19, 26, 30), as well as examples of O-GlcNAc regulating ubiquitination through E1 ubiquitin-activating enzyme (26, 31). We observed many O-GlcNAcylated proteins previously shown to be involved in the cycling of either phosphorylation or ubiquitination, which provides indirect evidence of cross-talk between these modifications. These proteins were not previously reported to be O-GlcNAc-modified and include 13 kinases (underlined entries in Tables 1 and 2), one putative phosphatase (Dnajc6), three proteins involved in phosphatase regulation (Phactr1, Mprip, and Ppp1r12b), three E3 ubiquitin-protein ligases (SH3RF1, HECTD1, and KCMF1), and one deubiquitinating protein (VCPIP1; Tables 1 and 2).

All previously identified O-GlcNAcylated proteins are also known to be phosphorylated (26), which supports cross-talk between the two modifications. Of the 274 O-GlcNAc-modified proteins identified in this study, 268 (∼98%) are also known phosphoproteins (32) (Dataset S2). We found that ∼24% of the identified O-GlcNAc sites have either reciprocal or proximal (±10 aa residues) phosphorylation sites (Dataset S3), which again suggests possible cross-talk between phosphorylation and O-GlcNAcylation in cerebrocortical processes. Some of these potential cross-talk cases that may regulate protein interactions (e.g., EMSY with BRCA2 and HCFC1 with SIN3A) were also observed in HeLa cells (19). Our findings include two mapped O-GlcNAc sites (S539 and T577) on TNIK, a serine/threonine kinase previously unknown to be O-GlcNAcylated. These O-GlcNAc sites (S539 and T577) with proximal known phosphorylation sites (S541, S545, and T552) (33) on TNIK are all located within its predicted (34) binding region to NEDD4-1, an E3 ubiquitin ligase. O-GlcNAcylation of human NEDD4-1 was recently reported (17), but with no site localization. We identified an O-GlcNAc site (T375) that is located within the experimentally determined TNIK binding region of NEDD4-1 (35). A known (S381) and a predicted (S385) (34) proximal phosphorylation site on NEDD4-1 also occur within this binding domain. NEDD4-1, TNIK, and Rap2A are known to form a complex that regulates NEDD4-1–mediated Rap2A ubiquitination (35). Together, our findings suggest that cross-talk between phosphorylation and O-GlcNAcylation may be involved in the NEDD4-1/TNIK/Rap2A (35) signaling pathway that regulates neurite growth. We also suggest that this cross-talk may extend to ubiquitination, given that TNIK is required in this complex to enable ubiquitination of Rap2A (35) and that its interaction with NEDD4-1 may be regulated by O-GlcNAcylation/phosphorylation.

O-GlcNAcylation on a Secreted Protein and on the Extracellular Domains of Membrane Proteins. Fig. 2 shows subcellular localization of the 274 identified O-GlcNAcylated proteins. Previous studies have demonstrated that O-GlcNAc modifies numerous nuclear and cytoplasmic proteins (26). Unexpectedly, we also identified O-GlcNAc modification on the extracellular EGF domain of five membrane proteins (Table 3), and on one secreted cytokine. Fig. S3 shows modification and site assignments derived from MS/MS spectra for the tryptic peptide CACLAGYTGQR from the EGF domain of Pamr1. To our knowledge, there are only two previous reports of extracellular O-GlcNAcylation, i.e., the extracellular domain of NOTCH and Dumpy in Drosophila (36, 37).

[Fig. 2 graphic: bar chart; y axis, % of O-GlcNAc proteins (0 to 30).]

Fig. 2. Cellular component annotation of identified O-GlcNAcylated proteins.

The O-GlcNAc transferase that attaches O-β-GlcNAc to NOTCH is a distinct enzyme that is genetically unrelated to OGT (36). It resides in the endoplasmic reticulum (ER) within the secretory pathway and is termed EGF domain-specific O-GlcNAc transferase (EOGT) (36, 37). EOGT is conserved

Table 3. O-GlcNAc modification sites on a secreted protein and on the extracellular domains of membrane proteins

| Scan | Accession | Gene symbol | Modification site | Motif sequence | Localization |
|---|---|---|---|---|---|
| ETD | P31230 | Aimp1 | T91 | PLQTNCTASESVV | Interaction region with HSP90B1 |
| ETD | Q05793 | Hspg2 | T847 | ACAPGYTGRRCES | EGF-like 3 domain |
| ETD | Q2VWQ2 | Nell1 | T542 | VCPSGFTGSHCEK | EGF-like 4 domain |
| ETD | Q61001 | Lama5 | S2140 | TCPPGLSGERCDT | EGF-like 22 domain |
| ETD | Q8BU25 | Pamr1 | T267 | ACLAGYTGQRCEN | EGF-like domain |
| HCD* | O35516 | Notch2 | T673? | VCSPGFTGQRCNI | EGF-like 17 domain |

*The O-GlcNAc site identified by HCD is ambiguous due to the labile nature of the modification under this collision mode condition.
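The EGF-repeat motif CXXGXS/TGXXC that the text associates with these extracellular sites can be checked against the motif sequences listed in Table 3. The following is a minimal sketch, assuming X stands for any residue; the regular expression, the dictionary, and the helper name `motif_site` are illustrative and are not taken from the authors' analysis pipeline.

```python
import re

# CXXGX[S/T]GXXC with X = any residue (assumption); the captured S/T is the
# candidate O-GlcNAc site.
EOGT_MOTIF = re.compile(r"C..G.([ST])G..C")

# Motif sequences copied from Table 3 (gene symbol -> 13-residue window).
table3 = {
    "Aimp1": "PLQTNCTASESVV",
    "Hspg2": "ACAPGYTGRRCES",
    "Nell1": "VCPSGFTGSHCEK",
    "Lama5": "TCPPGLSGERCDT",
    "Pamr1": "ACLAGYTGQRCEN",
    "Notch2": "VCSPGFTGQRCNI",
}


def motif_site(window: str):
    """Return the 0-based index of the S/T inside the motif, or None."""
    match = EOGT_MOTIF.search(window)
    return match.start(1) if match else None


hits = {gene: motif_site(seq) for gene, seq in table3.items()}
for gene, idx in hits.items():
    print(gene, "no motif" if idx is None else f"S/T at window index {idx}")
```

With these inputs, the five EGF-like domain windows match with the S/T at the center of the 13-residue window (index 6), while the Aimp1 window does not match, in line with the statement that AIMP1 lacks an EGF repeat.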

from Drosophila to mammals (37, 38). There is no evidence that Summary. To our knowledge, the present study has produced the intracellular forms of OGT or O-GlcNAcase occur in the ex- most comprehensive O-GlcNAc proteome to date in terms of tracellular or luminal spaces. Four O-GlcNAcylated peptides both protein identifications (274) and O-GlcNAc site assign- from these proteins were located within an EGF-like domain, ments (458) for mouse brain tissue, and used much smaller sharing a similar motif sequence CXXGXS/TGXXC to the samples (∼100 μg tissue peptide per enrichment) than in pre- reported extracellular O-GlcNAcylation on NOTCH and Dumpy vious studies (10, 15, 20). Our studies suggest roles for extensive in Drosophila and Notch1 in mammals (36–38). We also identified cross-talk between O-GlcNAc and other posttranslational mod- an O-GlcNAc site on the EGF-like domain of NOTCH2 protein of ifications in not only the regulation of normal neuronal func- the Notch signaling pathway, which we infer is on the T residue in tions, but also in the etiology of neurodegeneration. We dem- the YSCVCSPGFTGQR sequence, consistent with the CXXGXS/ onstrate the suitability of the CEPC method for global proteomic O TGXXC motif (Table 3). Our findings suggest that this EOGT (36) analysis of -GlcNAcylated peptides, and the potential for rapidly O has additional substrates. In fact, we determined that 91 mouse expanding the -GlcNAc proteome in brain and other tissues. proteins, 104 human proteins, and 18 Drosophila proteins contain This approach will enable high-resolution spatial mapping of O-GlcNAcylation patterns in brain samples from laser-capture BIOCHEMISTRY the CXXGXS/TGXXC motif (Datasets S6, S7,andS8). These O proteins are involved in the Notch signaling pathway, extracellular microdissection to gain insights into roles for -GlcNAcylation matrix (ECM)-receptor interactions, and other signaling pathways. in neurodegenerative disease. 
Among the proteins in both the human and mouse proteome that Methods contain this motif, ∼30% are localized within the ECM. The im- portance of this modification within the ECM has been demon- Sample Preparation. Mouse cerebrocortical tissue was homogenized in a strated in Drosophila, where loss of EOGT causes defects in the solution containing 6 M guanidine HCl, 10 mM DTT, 50 mM ammonium apical ECM (37). The proteins containing the CXXGXS/TGXXC bicarbonate [NH4HCO3 (pH 8.1)] with 1% (vol/vol) phosphatase inhibitor mixture 2% (vol/vol) (Sigma), and 100 nM PUGNAc. Details of protein motif appear conserved across species, with 83 proteins conserved digestion are described in SI Methods. between human and mouse, and 15 of 18 Drosophila proteins having orthologs in both human and mouse. O O -GlcNAc Enrichment. Enrichment was performed as previously described (18) We mapped an -GlcNAc site (T91) on AIMP1 that is known except where specified otherwise in SI Methods. to regulate angiogenesis, inflammation, and wound healing (39, O fi 40), and report -GlcNAc modi cation on a secreted cytokine. A LC-MS/MS Analysis. Samples were analyzed using a LTQ Orbitrap Velos MS recent study (41) showed AIMP1 plays a glucagon-like role in (Thermo Scientific) coupled to an automated dual-column metal-free nanoLC α glucose homeostasis and its secretion is induced by TNF or heat platform (49). Details of the separation and mass spectrometer parameters shock (40, 42). Motif analysis indicates two proximal phosphory- are described in SI Methods. lation sites (S99 and S107) of T91 are potential targets of the intracellular kinase MAPK. In addition, because AIMP1 is present Data Analysis. We used two database search engines, SEQUEST and Protein in multiple subcellular locations besides the extracellular space Prospector, to obtain more comprehensive peptide identifications. Data (34) and lacks an EGF repeat, the AIMP1 form we detected from were searched against a decoy protein database. 
mouse cerebrocortical tissue is likely cytosolic, and the T91 is the substrate of OGT instead of EOGT. The O-GlcNAc site (T91), together with its reported five proximal phosphorylation sites (S99, S101, T105, S107, and S138), are all located within the HSP90B1 interaction region of AIMP1. The potential cross-talk between these sites may be relevant to AIMP1 and HSP90B1 interactions that regulate KDELR1-mediated retention of HSP90B1/gp96 in the endoplasmic reticulum (34).

Interestingly, we also identified a GlcNAc-β-1,3-Fuc-α-1-O-Thr site (T3103) in the EGF-like domain 2 (EGF2) of versican core protein (Fig. S3) within the tryptic peptide sequence NGAT#CVDGFNTFR (# indicates the modification). The addition of O-fucose to EGF repeats is catalyzed by Pofut1 (43), and elongation of the monosaccharide is initiated by Fringe, an O-fucose–specific β-1,3-N-acetylglucosaminyltransferase (44, 45) to form the disaccharide modification we detected. GlcNAc-β-1,3-Fuc may be an intermediate species before subsequent elongation by galactosyltransferase and sialyltransferase to form a tetrasaccharide (46). The Thr modified by O-fucose is in a predicted consensus site (C²XXXXS/TC³) between C2 and C3 of EGF repeats for O-fucosylation (47, 48) and within EGF2 (C²RNGATC³) of versican.

Fully tryptic peptide identifications were filtered in a way that no reversed hits were left, with an estimated zero false discovery rate (Datasets S9 and S10) (50). All identifications were within ±5 ppm measured mass accuracy and required observation of oxonium ion fragments (204.0872, 300.1308, and 503.2101 m/z for HCD; 300.1308 and 503.2101 m/z for CID) in their corresponding HCD or CID scans with a mass tolerance of 0.0025 Da for HCD and 0.3 Da for CID, which were extracted using our updated MASIC software (51). Ascore (52) and SLIP score (built into the Protein Prospector search engine) (53) were used to estimate the confidence of O-GlcNAc modification site assignment for SEQUEST and Protein Prospector search results, respectively. See SI Methods and SI Results and Discussion for more details. All of the peptides identified from NEDD4-1, TNIK, AIMP1, and EGF-like repeats in the extracellular domain of six proteins were manually confirmed by authors J.F.A., F.Y., and J.S. The six N-linked GlcNAc peptides (Dataset S11) were also manually confirmed.

Bioinformatics. Gene ontology annotation, cellular component, and biological process was performed using the Software Tool for Researching Annotations of Proteins, or STRAP (54). The pathway analysis was performed using DAVID Bioinformatics Resources 6.7 as previously described (55). Briefly, all identified O-GlcNAc modified proteins were queried against the mouse proteome as a background. The statistical enrichment was calculated for KEGG pathways identified from a protein list obtained during this study, and pathways with P ≤ 0.05 were reported as significant. Prealigned sequence was generated with six residues on either side of all of the unambiguous O-GlcNAc sites. The sequence was subject to motif analysis online by motif-x (http://motif-x.med.harvard.edu/) (56), and the motifs were built through comparison with a dynamic statistical background of mouse protein database. The occurrences threshold was set at 20, and the significance (P value) was 1E-6.

ACKNOWLEDGMENTS. We thank Dr. Joshua Adkins at Pacific Northwest National Laboratory (PNNL) for helpful suggestions regarding the manuscript, Robert Chalkley (University of California, San Francisco) for help with using Protein Prospector, and Ronald J. Moore for discussions regarding MS analysis. This work was funded by PNNL Laboratory Directed Research Development funding (to F.Y.); three National Institutes of Health (NIH) grants (to R.D.S.), National Center for Research Resources Grant 5P41RR018522-10, National Institute of General Medical Sciences Grants 8 P41 GM103493-10 and AG027429; NIH Grant GM 037537 (to D.F.H); NIH Grants N01-HV-00240, R01 CA42486, and P01HL107153 (to G.W.H); AG027429 and TW008123 (to C.-X.G.); and NIH National Cancer Institute Grant R01 36434 (to P.S.). Samples were analyzed using capabilities developed under the support of the NIH National Center for Research Resources Grant RR018522 and the US Department of Energy Biological and Environmental Research (DOE/BER). Work was performed in the Environmental Molecular Science Laboratory, a DOE/BER national scientific user facility at PNNL in Richland, WA. PNNL is operated for the DOE by Battelle under Contract DE-AC05-76RLO-1830.
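The identification criteria stated in the Methods (precursor mass accuracy within ±5 ppm, plus diagnostic oxonium ions matched at 0.0025 Da for HCD or 0.3 Da for CID) can be illustrated with a minimal Python sketch. This is not the MASIC-based workflow used in the study: the function names are hypothetical, and requiring every listed oxonium ion (rather than at least one) is an assumption, exposed here as the `require_all` parameter.

```python
# Diagnostic oxonium ion m/z values and tolerances as quoted in the Methods.
OXONIUM_MZ = {
    "HCD": (204.0872, 300.1308, 503.2101),
    "CID": (300.1308, 503.2101),
}
TOLERANCE_DA = {"HCD": 0.0025, "CID": 0.3}


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def has_oxonium_ions(fragment_mzs, mode: str, require_all: bool = True) -> bool:
    """Check a fragment list for diagnostic ions within the mode's tolerance.

    require_all=True is an assumption of this sketch; the text does not say
    whether all ions or only one had to be observed.
    """
    tol = TOLERANCE_DA[mode]
    hits = [
        any(abs(mz - target) <= tol for mz in fragment_mzs)
        for target in OXONIUM_MZ[mode]
    ]
    return all(hits) if require_all else any(hits)


def passes_filters(measured_mz, theoretical_mz, fragment_mzs, mode,
                   max_ppm: float = 5.0) -> bool:
    """Combine the precursor ppm filter with the oxonium ion requirement."""
    within_ppm = abs(ppm_error(measured_mz, theoretical_mz)) <= max_ppm
    return within_ppm and has_oxonium_ions(fragment_mzs, mode)
```

For example, `passes_filters(1000.004, 1000.0, [204.0880, 300.1300, 503.2110], "HCD")` accepts a spectrum with a 4 ppm precursor error and all three HCD ions within tolerance, whereas the same spectrum with a 10 ppm error is rejected.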

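The pre-alignment step described under Bioinformatics (six residues on either side of each unambiguous O-GlcNAc site, giving 13-residue windows for motif-x) can be sketched as follows. The function name and the handling of sites near a protein terminus are assumptions of this sketch; the text does not state whether such sites were padded or dropped, so both options are shown.

```python
from typing import Optional


def site_window(sequence: str, position: int, flank: int = 6,
                pad: Optional[str] = None) -> Optional[str]:
    """Return the residues centered on a 1-based modification site.

    If the site lies closer than `flank` residues to a terminus, the window
    is padded with `pad` (e.g. "X") or, when pad is None, the site is skipped
    and None is returned. Both behaviors are assumptions of this sketch.
    """
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError("site position lies outside the sequence")
    if sequence[index] not in "ST":
        raise ValueError("O-GlcNAc sites are expected on Ser or Thr")

    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    if len(left) < flank or len(right) < flank:
        if pad is None:
            return None
        left = pad * (flank - len(left)) + left
        right = right + pad * (flank - len(right))
    return left + sequence[index] + right


# Hypothetical sequence: the versican tryptic peptide from the text,
# flanked by alanines purely for illustration.
example = "AAAAAANGATCVDGFNTFRAAAAAA"
window = site_window(example, 10)  # modified Thr is residue 10 here
```

Fixed-width windows keep the modified residue at the same column in every entry, which is what allows a motif tool to compare positions across sites.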
1. Kreppel LK, Blomberg MA, Hart GW (1997) Dynamic glycosylation of nuclear and cytosolic proteins. Cloning and characterization of a unique O-GlcNAc transferase with multiple tetratricopeptide repeats. J Biol Chem 272:9308–9315.
2. Okuyama R, Marshall S (2003) UDP-N-acetylglucosaminyl transferase (OGT) in brain tissue: Temperature sensitivity and subcellular distribution of cytosolic and nuclear enzyme. J Neurochem 86:1271–1280.
3. Murrey HE, Hsieh-Wilson LC (2008) The chemical neurobiology of carbohydrates. Chem Rev 108:1708–1731.
4. Vosseller K, et al. (2006) O-linked N-acetylglucosamine proteomics of postsynaptic density preparations using lectin weak affinity chromatography and mass spectrometry. Mol Cell Proteomics 5:923–934.
5. Dias WB, Hart GW (2007) O-GlcNAc modification in diabetes and Alzheimer’s disease. Mol Biosyst 3:766–772.
6. Gong CX, Liu F, Grundke-Iqbal I, Iqbal K (2006) Impaired brain glucose metabolism leads to Alzheimer neurofibrillary degeneration through a decrease in tau O-GlcNAcylation. J Alzheimers Dis 9:1–12.
7. Hoyer S (2004) Causes and consequences of disturbances of cerebral glucose metabolism in sporadic Alzheimer disease: Therapeutic implications. Adv Exp Med Biol 541:135–152.
8. Liu F, et al. (2009) Reduced O-GlcNAcylation links lower brain glucose metabolism and tau pathology in Alzheimer’s disease. Brain 132:1820–1832.
9. Yuzwa SA, et al. (2008) A potent mechanism-inspired O-GlcNAcase inhibitor that blocks phosphorylation of tau in vivo. Nat Chem Biol 4:483–490.
10. Chalkley RJ, Thalhammer A, Schoepfer R, Burlingame AL (2009) Identification of protein O-GlcNAcylation sites using electron transfer dissociation mass spectrometry on native peptides. Proc Natl Acad Sci USA 106:8894–8899.
11. Wang Z, Hart G (2008) Glycomic approaches to study GlcNAcylation: Protein identification, site-mapping, and site-specific O-GlcNAc quantitation. Clin Proteomics 4:5–13.
12. Greis KD, et al. (1996) Selective detection and site-analysis of O-GlcNAc-modified glycopeptides by beta-elimination and tandem electrospray mass spectrometry. Anal Biochem 234:38–49.
13. Zhao P, et al. (2011) Combining high-energy C-trap dissociation and electron transfer dissociation for protein O-GlcNAc modification site assignment. J Proteome Res 10:4088–4104.
14. Vosseller K, Wells L, Hart GW (2001) Nucleocytoplasmic O-glycosylation: O-GlcNAc and functional proteomics. Biochimie 83:575–581.
15. Myers SA, Panning B, Burlingame AL (2011) Polycomb repressive complex 2 is necessary for the normal site-specific O-GlcNAc distribution in mouse embryonic stem cells. Proc Natl Acad Sci USA 108:9490–9495.
16. Teo CF, et al. (2010) Glycopeptide-specific monoclonal antibodies suggest new roles for O-GlcNAc. Nat Chem Biol 6:338–343.
17. Zaro BW, Yang YY, Hang HC, Pratt MR (2011) Chemical reporters for fluorescent detection and identification of O-GlcNAc-modified proteins reveal glycosylation of the ubiquitin ligase NEDD4-1. Proc Natl Acad Sci USA 108:8146–8151.
18. Wang ZH, et al. (2010) Enrichment and site mapping of O-linked N-acetylglucosamine by a combination of chemical/enzymatic tagging, photochemical cleavage, and electron transfer dissociation mass spectrometry. Mol Cell Proteomics 9:153–160.
19. Wang Z, et al. (2010) Extensive crosstalk between O-GlcNAcylation and phosphorylation regulates cytokinesis. Sci Signal 3:ra2.
20. Rexach JE, Clark PM, Hsieh-Wilson LC (2008) Chemical approaches to understanding O-GlcNAc glycosylation in the brain. Nat Chem Biol 4:97–106.
21. Khidekel N, et al. (2007) Probing the dynamics of O-GlcNAc glycosylation in the brain using quantitative proteomics. Nat Chem Biol 3:339–348.
22. Oddo S, et al. (2003) Triple-transgenic model of Alzheimer’s disease with plaques and tangles: Intracellular Abeta and synaptic dysfunction. Neuron 39:409–421.
23. Cunningham C, Jr., Glish GL, Burinsky DJ (2006) High amplitude short time excitation: A method to form and detect low mass product ions in a quadrupole ion trap mass spectrometer. J Am Soc Mass Spectrom 17:81–84.
24. Kim MS, Zhong J, Kandasamy K, Delanghe B, Pandey A (2011) Systematic evaluation of alternating CID and ETD fragmentation for phosphorylated peptides. Proteomics 11:2568–2572.
25. Lazarov O, Mattson MP, Peterson DA, Pimplikar SW, van Praag H (2010) When neurogenesis encounters aging and disease. Trends Neurosci 33:569–579.
26. Butkinaree C, Park K, Hart GW (2010) O-linked beta-N-acetylglucosamine (O-GlcNAc): Extensive crosstalk with phosphorylation to regulate signaling and transcription in response to nutrients and stress. Biochim Biophys Acta 1800:96–106.
27. Hart GW, Slawson C, Ramirez-Correa G, Lagerlof O (2011) Cross talk between O-GlcNAcylation and phosphorylation: Roles in signaling, transcription, and chronic disease. Annu Rev Biochem 80:825–858.
28. Medrihan L, et al. (2009) Neurobeachin, a protein implicated in membrane protein traffic and autism, is required for the formation and functioning of central synapses. J Physiol 587:5095–5106.
29. Wang XL, et al. (2000) Neurobeachin: A protein kinase A-anchoring, beige/Chediak-higashi protein homolog implicated in neuronal membrane traffic. J Neurosci 20:8551–8565.
30. Hunter T (2007) The age of crosstalk: Phosphorylation, ubiquitination, and beyond. Mol Cell 28:730–738.
31. Guinez C, et al. (2008) Protein ubiquitination is modulated by O-GlcNAc glycosylation. FASEB J 22:2901–2911.
32. Hornbeck PV, Chabra I, Kornhauser JM, Skrzypek E, Zhang B (2004) PhosphoSite: A bioinformatics resource dedicated to physiological protein phosphorylation. Proteomics 4:1551–1561.
33. Huttlin EL, et al. (2010) A tissue-specific atlas of mouse protein phosphorylation and expression. Cell 143:1174–1189.
34. Apweiler R, et al.; UniProt Consortium (2010) The Universal Protein Resource (UniProt) in 2010. Nucleic Acids Res 38(Database issue):D142–D148.
35. Kawabe H, et al. (2010) Regulation of Rap2A by the ubiquitin ligase Nedd4-1 controls neurite development. Neuron 65:358–372.
36. Matsuura A, et al. (2008) O-linked N-acetylglucosamine is present on the extracellular domain of notch receptors. J Biol Chem 283:35486–35495.
37. Sakaidani Y, et al. (2011) O-linked-N-acetylglucosamine on extracellular protein domains mediates epithelial cell-matrix interactions. Nat Commun 2:583.
38. Sakaidani Y, et al. (2012) O-linked-N-acetylglucosamine modification of mammalian Notch receptors by an atypical O-GlcNAc transferase Eogt1. Biochem Biophys Res Commun 419(1):14–19.
39. Ko YG, et al. (2001) A cofactor of tRNA synthetase, p43, is secreted to up-regulate proinflammatory genes. J Biol Chem 276:23028–23033.
40. Park SG, et al. (2005) The novel cytokine p43 stimulates dermal fibroblast proliferation and wound repair. Am J Pathol 166:387–398.
41. Park SG, et al. (2006) Hormonal activity of AIMP1/p43 for glucose homeostasis. Proc Natl Acad Sci USA 103:14913–14918.
42. Barnett G, et al. (2000) Prostate adenocarcinoma cells release the novel proinflammatory polypeptide EMAP-II in response to stress. Cancer Res 60:2850–2857.
43. Wang Y, et al. (2001) Modification of epidermal growth factor-like repeats with O-fucose. Molecular cloning and expression of a novel GDP-fucose protein O-fucosyltransferase. J Biol Chem 276:40338–40345.
44. Brucker K, Perez L, Clausen H, Cohen S (2000) Glycosyltransferase activity of fringe modulates Notch-Delta interactions. Nature 406:411–415, and erratum (2000) 407:654.
45. Moloney DJ, et al. (2000) Fringe is a glycosyltransferase that modifies Notch. Nature 406:369–375.
46. Moloney DJ, et al. (2000) Mammalian Notch1 is modified with two unusual forms of O-linked glycosylation found on epidermal growth factor-like modules. J Biol Chem 275:9604–9611.
47. Rampal R, Luther KB, Haltiwanger RS (2007) Notch signaling in normal and disease states: Possible therapies related to glycosylation. Curr Mol Med 7:427–445.
48. Rana NA, Haltiwanger RS (2011) Fringe benefits: Functional and structural impacts of O-glycosylation on the extracellular domain of Notch receptors. Curr Opin Struct Biol 21:583–589.
49. Zhao R, et al. (2009) Automated metal-free multiple-column nanoLC for improved phosphopeptide analysis sensitivity and throughput. J Chromatogr B Analyt Technol Biomed Life Sci 877:663–670.
50. Elias JE, Haas W, Faherty BK, Gygi SP (2005) Comparative evaluation of mass spectrometry platforms used in large-scale proteomics investigations. Nat Methods 2:667–675.
51. Monroe ME, Shaw JL, Daly DS, Adkins JN, Smith RD (2008) MASIC: A software program for fast quantitation and flexible visualization of chromatographic profiles from detected LC-MS(/MS) features. Comput Biol Chem 32:215–217.
52. Beausoleil SA, Villén J, Gerber SA, Rush J, Gygi SP (2006) A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat Biotechnol 24:1285–1292.
53. Chalkley RJ, Baker PR, Trinidad JC (2011) Modification site localization scoring integrated into a search engine. Mol Cell Proteomics 10(7):1–9.
54. Bhatia VN, Perlman DH, Costello CE, McComb ME (2009) Software tool for researching annotations of proteins: Open-source protein annotation software with data visualization. Anal Chem 81:9819–9823.
55. Huang W, Sherman BT, Lempicki RA (2009) Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 37:1–13.
56. Schwartz D, Gygi SP (2005) An iterative statistical approach to the identification of protein phosphorylation motifs from large-scale data sets. Nat Biotechnol 23:1391–1398.

www.pnas.org/cgi/doi/10.1073/pnas.1200425109