ABSTRACT

KALMAR, JACLYN. Development of Innovative Strategies for the Analyses of Complex Biological Systems Using Mass Spectrometry. (Under the direction of Dr. David C. Muddiman).

Mass spectrometry (MS) is a powerful analytical tool due to its versatility, specificity, and sensitivity. MS has enabled the large-scale collection of molecular-level –omics data, which, in turn, has provided deep insights into various complex biological systems. This work discusses new strategies for the analysis of biological molecules thought to be involved in the pathogenicity of rice blast disease and Alzheimer’s disease.

Infrared Matrix-Assisted Laser Desorption Electrospray Ionization (IR-MALDESI) mass spectrometry imaging was used to identify meta-metabolomic features of Magnaporthe oryzae infected barley leaves. Three separate sets of barley were inoculated with wild type (WT) M. oryzae, an F-box E3 ligase knockout (E3 ligase KO) M. oryzae, or a control solution. Over the course of the infection, each treatment was imaged using an advanced polarity switching method, allowing the detection of low and high molecular weight compounds that ionize in positive or negative polarity. Serotonin, a barley defense metabolite, was putatively identified using MS1 data and then confirmed with tandem mass spectrometry fragmentation patterns. Metabolites in the melanin pathway, which is important for infection development in M. oryzae, were also identified using MS1 data but could not be confirmed due to their low abundances. Molecules related to the pathogenicity of the fungus were found only in the samples treated with WT M. oryzae, whereas those treated with the genetically modified strain displayed no metabolic changes related to a fungal infection.

A label-free, quantitative proteomic analysis was performed on mycelia from the same WT and E3 ligase KO strains of M. oryzae. The post-translational modifications (PTMs) thought to be involved with the knockout protein, phosphorylation and ubiquitination, were also investigated. A total of 4,432 proteins were identified in the WT and E3 ligase KO samples. Eighty-nine proteins were increased and 69 proteins were decreased in the E3 ligase KO strain. Sixty proteins were unique to the WT strain, 13 of which had both phosphorylation and ubiquitination PTMs. Seventy-one proteins were unique to the E3 ligase KO strain, 24 of which had both phosphorylation and ubiquitination PTMs. Several proteins were associated with key biological processes; these greatly assisted in the selection of genes for future functional studies and in enabling mechanistic insight related to virulence.

The inclusion of system suitability workflows to ensure optimal and reproducible data collection is critically important. We created mixed system suitability samples by evaluating the addition of the 6 × 5 Promega Reference mix, the 7 × 5 Pierce™ System Suitability Standard, or both to commercially available HeLa cell lysate. These system suitability mixtures provided the capability of monitoring identification (e.g., peptide spectral matches and peptide identifications), identification-free (e.g., mass measurement accuracy and chromatography), and quantitative (limit of detection (LOD), limit of quantification (LOQ), and dynamic range) metrics from a single injection.

The analysis of N-linked glycans using MS presents significant challenges owing to their hydrophilic nature. To address these difficulties, a variety of derivatization methods have been developed to improve ionization and detection sensitivity. The Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™ strategy for labeling glycans has been utilized in the analysis of N-linked glycans. The protocol for INLIGHT™ derivatization and subsequent analysis was investigated. Optimization of the modified method resulted in 20-100 times greater peak areas for the N-linked glycans detected in fetuin and horseradish peroxidase, including the identification of the low-abundance glycans (Fuc)1(Gal)2(GlcNAc)4(Man)3(NeuAc)1 and (Gal)3(GlcNAc)5(Man)3(NeuAc)3.

The analysis of glycans has been hindered by a lack of suitable data-processing software. GlycoHunter, user-friendly software created in MATLAB, enables researchers to accurately and efficiently process MS1 glycomics data in which natural (NAT) and stable-isotope-labeled (SIL) pairs are generated for relative quantification, including, but not limited to, INLIGHT™ data.

© Copyright 2021 by Jaclyn Kalmar

All Rights Reserved

Development of Innovative Strategies for the Analyses of Complex Biological Systems Using Mass Spectrometry.

by Jaclyn Kalmar

A dissertation submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Chemistry

Raleigh, North Carolina 2021

APPROVED BY:

Dr. David C. Muddiman, Committee Chair

Dr. Michael S. Bereman

Dr. Erin S. Baker

Dr. Edmond F. Bowden

Dr. Ralph A. Dean

DEDICATION

To my husband Ben, my love and best friend. Thank you for all of your encouragement. I could not have done this without you.


BIOGRAPHY

Jaclyn (Jackie) Gowen Kalmar was born in Harborcreek, Pennsylvania, on the shores of Lake Erie. Jackie loved science and thought her mission in life was to become a surgeon. In high school, she began volunteering at a local hospital through a student volunteer program and was placed in the operating room, which turned out to be one of the most influential summers of her life. She observed a few procedures in the operating room, and when a surgeon realized she was a student, he gave her a full tour of the opened thoracic cavity. From that first surgical experience until halfway through college, Jackie was riveted by the idea of becoming a surgeon. Her passion led her to take a job as a patient transporter at the same hospital so she could interact with various clinical personnel and see the inner workings of healthcare. But, as fate would have it, her fascination with chemistry was also growing and eventually won her heart and mind. During the rest of her college experience, Jackie joined biophysical chemist Dr. Mary Grace Galinato’s research team and worked on analyzing the kinetics of newly reconstituted nitrate reductase enzymes. She also had the opportunity to work with Dr. Bruce Wittmershaus, a physics professor, investigating new fluorescent materials for luminescent solar concentrators. Jackie also ventured into industry, completing an internship in the biodiesel industry testing product quality and developing improvements to the processing line, starting with its raw materials. Jackie received a BS in Chemistry with a minor in Biology from Pennsylvania State University in 2016 and went on to pursue a Ph.D. in Analytical Chemistry at North Carolina State University under the direction of Dr. David Muddiman.


TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
LIST OF PUBLICATIONS
Chapter 1: Introduction
1.1. Systems Biology
1.2. Rice Blast Disease
1.3. Alzheimer’s Disease
1.4. Metabolomics
1.5. Proteomics
1.6. Glycomics
1.7. Mass Spectrometry
1.8. Separation and Ionization
1.9. Electrospray Ionization
1.10. Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI)
1.11. Fourier Transform Mass Spectrometry
1.12. Research Synopsis
1.13. Literature Cited
Chapter 2: Investigating Host-Pathogen Meta-Metabolic Interactions of Magnaporthe oryzae Infected Barley Using Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Mass Spectrometry
2.1. Introduction
2.2. Materials and Methods
2.2.1. Fungal Growth
2.2.2. Barley and Inoculation
2.2.3. IR-MALDESI Mass Spectrometry Imaging
2.2.4. Feature Identification
2.2.5. Follow-on Targeted Tandem Mass Spectrometry
2.3. Results and Discussion
2.4. Conclusions
2.5. Acknowledgements
2.6. Literature Cited
Chapter 3: Comparative Proteomic Analysis of Wild Type and Mutant Lacking a SCF E3 Ligase F-Box Protein in Magnaporthe oryzae
3.1. Introduction
3.2. Materials and Methods
3.2.1. Sample Preparation
3.2.2. Protein Digestion, Alkylation, and Desalting
3.2.3. NanoLC-MS/MS Using HRAM MS Platform Technologies
3.2.4. Data Analysis
3.3. Results and Discussion
3.3.1. E3 Ligase protein expressed in WT but not KO strain
3.3.2. Technical analysis of protein profiles for WT and gene KO strains
3.3.3. Proteome comparison of WT and E3 ligase KO strains
3.3.3.1. Occurrence of both ubiquitination and phosphorylation on proteins
3.4. Conclusions
3.5. Acknowledgements
3.6. Literature Cited
Chapter 4: Simultaneous Monitoring of Identification, Identification-Free, and Quantitative Metrics to Assess Systems Suitability in LC-MS/MS Based Proteomic Experiments
4.1. Introduction
4.2. Materials and Methods
4.2.1. Materials
4.2.2. Sample Preparation
4.2.3. NanoLC-MS/MS
4.2.4. Data Analysis
4.3. Results and Discussion
4.4. Conclusions
4.5. Acknowledgements
4.6. Literature Cited
Chapter 5: Enhanced Protocol for Quantitative N-linked Glycomics Analysis Using Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™
5.1. Introduction
5.2. Experimental
5.2.1. Materials
5.2.2. Modified N-linked Glycan Preparations Protocol
5.2.3. INLIGHT™ Derivatization of N-linked Glycans
5.2.4. NanoLC-MS/MS Analysis
5.2.5. UHPLC MS/MS Analysis
5.2.6. Data Analysis
5.3. Results and Discussion
5.4. Conclusions
5.5. Acknowledgements
5.6. Compliance with Ethical Standards
5.7. Literature Cited
Chapter 6: GlycoHunter: An Open-Source Software for the Detection, Identification, and Relative Quantification of INLIGHT™ Labeled N-linked Glycans
6.1. Introduction
6.2. Experimental
6.3. Results and Discussion
6.4. Conclusions
6.5. Acknowledgements
6.6. Literature Cited
Appendices
Appendix A. Supplemental Materials for Chapter 3
A.1. Literature Cited
Appendix B. Supplemental Materials for Chapter 4
Appendix C. Supplemental Materials for Chapter 5
Appendix D. Supplemental Materials for Chapter 6

LIST OF TABLES

Table 3.1 Proteins up- and down-regulated and unique to the WT or E3 Ligase KO mycelia samples
Table 4.1 Sample contents for the mixed system suitability standard study
Table 5.1 Differences between 2 derivatization conditions used for labeling of released biological N-linked glycans
Table A.1 Proteins with increased abundance in the E3 Ligase KO samples
Table A.2 Proteins with decreased abundance in the E3 Ligase KO samples
Table A.3 Proteins unique to the WT samples
Table A.4 Proteins unique to the E3 Ligase KO samples
Table A.5 Post-translational modification sites
Table A.6 List of hypothetical proteins with Pfam annotation
Table C.1 Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™ derivatized N-linked glycans and maintenance of the Porous Graphitic Carbon (PGC) column
Table C.2 Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™ derivatized N-linked glycans using Hydrophilic Interaction Chromatography (HILIC) column
Table C.3 Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™ derivatized N-linked glycans using reversed-phase C18 column
Table C.4 N-linked glycan identifications in the fetuin samples
Table C.5 N-linked glycan identifications in the horseradish peroxidase
Table D.1 Operation Timing for Loading Data into GlycoHunter
Table D.2 Operation Timing for Finding Peak Pairs using GlycoHunter
Table D.3 Operation Timing for Exporting Data into Excel files and Skyline Transition Lists from GlycoHunter
Table D.4 Glycan Features identified using GlycoHunter (GH) then Skyline

LIST OF FIGURES

Figure 1.1 Asexual disease cycle of M. oryzae
Figure 1.2 N-glycosylated proteins involved in the progression of AD
Figure 1.3 Structures of the most common monosaccharide units
Figure 1.4 N-linked glycosylation process and post-processing through the endoplasmic reticulum and Golgi apparatus
Figure 1.5 Schematic of the mechanism of Electrospray Ionization (ESI) and the hydrophobic ion effect using N-linked glycans as an example
Figure 1.6 Schematic of the mechanism of Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI)
Figure 1.7 Schematic of the Thermo Fisher Q-Exactive series mass spectrometers and a schematic of Orbitrap current analysis and subsequent mathematical transformations resulting in m/z spectra
Figure 2.1 Extended polarity switching acquisition scheme
Figure 2.2 Representation of the extended polarity switching during data collection and analysis
Figure 2.3 Positive ionization and negative ionization mode generated the same feature
Figure 2.4 Experimental MS1 data and MS2 data for putative identification of serotonin
Figure 2.5 Fungal biosynthesis of melanin with corresponding MSI images
Figure 2.6 Longitudinal comparison of WT, E3 ligase KO, and Control
Figure 3.1 Experimental workflow for the analysis of WT and E3 Ligase KO M. oryzae
Figure 3.2 LC-MS/MS evidence of the E3 Ligase KO
Figure 3.3 Normalized spectral counting scatterplots of WT and E3 ligase KO technical replicates
Figure 3.4 Proteins discovered in the WT and E3 ligase KO samples
Figure 4.1 Adaptation of both reference mix protocols
Figure 4.2 Chromatograms of the 6 × 5 mix, the 7 × 5 mix and a combined sample of both mixes
Figure 4.3 Example normalized standard curves for the 6 × 5 mix and the 7 × 5 mix and the calculated LOD and LOQ values for each peptide as well as the whole sample average
Figure 4.4 Mass measurement accuracy box plots for all peptides in the 6 × 5 mix and the 7 × 5 mix spiked into 200 ng HeLa cell lysate by concentration
Figure 4.5 The average number of HeLa proteins identified and peptide spectral matches (PSMs) obtained from each sample composition
Figure 5.1 Modified FANGS workflow for glycan sample preparation
Figure 5.2 PNGase F enzyme comparisons
Figure 5.3 P2PGN concentration optimization for maximum peak area
Figure 5.4 Sensitivity enhancement utilizing the modified method
Figure 5.5 LC separation and sensitivity optimization for INLIGHT™ derivatized glycans
Figure 6.1 GlycoHunter graphical user interface (GUI)
Figure 6.2 GlycoHunter informatics workflow
Figure 6.3 Export Peaks GUI and informatics workflow
Figure 6.4 Identified paired features using GlycoHunter compared to values found manually from previous analysis and an example mass spectrum of a glycan
Figure A.1 Unnormalized spectral counting scatterplots
Figure B.1 Normalized standard curves for the remaining peptides in the 6 × 5 mix
Figure B.2 Normalized standard curves for the remaining peptides in the 7 × 5 mix
Figure D.1 GlycoHunter Peak Identification search parameters in the Excel export file
Figure D.2 GlycoHunter peak pair identification results
Figure D.3 GlycoHunter Peak Pair identifications grouped by charge state
Figure D.4 Example Skyline transition list exported from GlycoHunter

LIST OF PUBLICATIONS

1. Kalmar, J.G., Oh, Y., Dean, R.A., Muddiman, D.C. “Investigating host-pathogen meta-metabolic interactions of Magnaporthe oryzae infected barley using infrared matrix-assisted laser desorption electrospray ionization mass spectrometry.” Anal. Bioanal. Chem. 2020, 412, 139-147.

2. Kalmar, J.G., Oh, Y., Dean, R.A., Muddiman, D.C. “Comparative Proteomic Analysis of Wild Type and Mutant Lacking a SCF E3 Ligase F-box Protein in Magnaporthe oryzae.” J. Proteome Res. 2020, 19 (9), 3761-3768.

3. Kalmar, J.G., Bereman, M.S., Muddiman, D.C. “Simultaneous Monitoring of Identification, Identification-Free, and Quantitative Metrics to Assess Systems Suitability in LC-MS/MS Based Proteomic Experiments.” in final preparation.

4. Kalmar, J.G., Butler, K.E., Baker, E.S., Muddiman, D.C. “Enhanced Protocol for Quantitative N-linked Glycomics Analysis Using Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™.” Anal. Bioanal. Chem. 2020, 412 (27), 7569-7579.

5. Kalmar, J.G., Garrard, K.P., Muddiman, D.C. “GlycoHunter: An Open-Source Software for the Detection, Identification, and Relative Quantification of INLIGHT™ Labelled N-linked Glycans.” J. Proteome Res. 2020, submitted.


Chapter 1

Introduction

1.1 Systems Biology

Systems-level biology is frequently described as an effort to understand the bigger picture (e.g., entire organisms, whole tissues, or even specific cell types) by studying the individual pieces of that system. The analysis of these ‘pieces’ is typically done at the molecular level and includes thorough investigations into four major areas: DNA, RNA, proteins, and metabolites. These areas of study are often referred to as different –omic fields, such as genomics, transcriptomics, proteomics, and metabolomics. Many additional –omic fields have emerged recently; however, they stem from these four main areas, either as specific subclasses or as studies of the interactions between them. Newer approaches such as high-throughput analysis have allowed researchers to collect immense amounts of data in each of these areas, leading to discoveries about the structure, function, and changes of these molecules within organisms. This work discusses new strategies for the analysis of metabolites and proteins in rice blast disease and of changes in N-linked glycans associated with Alzheimer’s disease.

1.2 Rice Blast Disease

Rice (Oryza sativa) is, along with corn and wheat, one of the most widely consumed cereal crops in the world. Rice is a staple food in many countries, especially in Asia1, and over 400 million tons of rice are milled worldwide each year.

Figure 1.1. Asexual disease cycle of M. oryzae. Figure recreated from Dean et al.4

Unfortunately, 10-30% of the annual crop is destroyed by the rice blast pathogen, Magnaporthe oryzae2. M. oryzae is a filamentous fungus that is the primary cause of rice blast disease. In addition to destroying large amounts of rice, it can also infect other cereal crops, including millet, wheat, and barley3. Figure 1.1 shows the M. oryzae infection process, which begins with the attachment of the spore, known as the conidium, to the host surface4. The spore then germinates and forms a specialized structure known as the appressorium. The appressorium develops a melanized cell wall and builds significant turgor pressure as infectious material from the conidium accumulates within it4. When the pressure is high enough, an infection peg is forced into the host plant cell, initiating the infection and, ultimately, the death of the plant. Under favorable conditions, the infection forms lesions that can produce thousands of conidia to continue the cycle. The lesions can continue to produce conidia for over twenty days, and the disease cycle from conidium to conidium can be completed within a week.


The M. oryzae protein MGG_13065, an F-box protein of the SCF E3 ubiquitin ligase complex, was identified as important for appressorium development and pathogenicity. This protein is part of the SCF E3 complex in the ubiquitin-mediated proteolysis pathway5,6. Owing to the leucine-rich repeat (LRR) region in its sequence, it interacts with phosphorylated sites on target proteins, bringing them into close proximity to the E2 ubiquitin-conjugating enzyme so that they can be tagged with a single ubiquitin or, through repeated tagging, with a polyubiquitin chain. These characteristics make MGG_13065 of great interest for understanding how the ubiquitin-mediated proteolysis pathway is involved in virulence. The genome of M. oryzae was sequenced in 2005, when the biological and chemical understanding of the pathogen-host relationship was lacking; the genome sequence provides a valuable template for the many –omic analyses aimed at understanding the pathogenic features of the fungus4,7.

1.3 Alzheimer’s Disease

Alzheimer’s disease (AD) is the 6th leading cause of death in the United States. There are roughly 5.7 million people living with AD in the US, 96% of whom are 65 years or older. AD is a neurodegenerative disease that results in changes in memory, thinking, and behavior. Its primary symptom is abnormal memory loss. Other symptoms, such as confusion, problems with reading and writing, personality changes, and difficulty organizing thoughts, can also be signs of AD. Unfortunately, only the symptoms of AD can be treated; the disease itself cannot currently be prevented, slowed, or cured8.


AD is defined by the regional accumulation of amyloid plaques and neurofibrillary tangles in the brain, accompanied by extensive degeneration of brain tissue. The extracellular plaques are composed primarily of beta-amyloid (Aβ) deposits, whereas a pathologic form of tau, called paired helical filament (PHF) tau, is the primary protein component of the largely intraneuronal neurofibrillary degeneration. Aβ plaques form when the amyloid precursor protein (APP) is processed through the β-cleavage pathway, in which APP is cleaved first by β-site APP cleaving enzyme 1 (BACE1) and then by γ-secretase, yielding the toxic Aβ peptide9,10. Under normal conditions, APP instead undergoes α-cleavage by α-secretase (ADAM10) followed by γ-secretase cleavage, yielding the non-toxic p3 peptide9,11. APP, BACE1, ADAM10, and other significant proteins related to the pathogenicity of AD are all N-glycosylated (Figure 1.2). Glycosylation of tau changes its conformation and is thought to facilitate its hyperphosphorylation by altering its susceptibility to phosphorylation12–14. Genetic and proteomic studies have elucidated multiple genetic risk factors and other proteins relevant to AD progression, suggesting that there is far more left to understand about the molecular basis of AD15,16, including abnormalities caused by the dysregulation of glycosylation events.


Figure 1.2. Some of the N-glycosylated proteins involved in the progression of AD. This figure has been recreated and modified from Kizuka et al.9

1.4 Metabolomics

Metabolomics is the large-scale study of exogenous and endogenous small-molecule intermediates and products of an organism’s metabolism17. There are two main types of metabolites: primary and secondary. Primary metabolites are molecules essential for the function of the organism, whereas secondary metabolites are important but not essential to its functions. Metabolites can vary with physiological state, between organisms, and in overall abundance; however, they are thought to reflect the observed phenotype more closely than other –omic measurements18,19. Multiple types of instrumentation are used in metabolomics, including infrared (IR) spectroscopy, nuclear magnetic resonance (NMR) spectroscopy, and mass spectrometry20. The two most common strategies used with these instruments are called metabolic fingerprinting and metabolic profiling. Metabolic fingerprinting is an untargeted approach that gathers broad information and can be used to define overall biological changes20. Metabolic profiling is a targeted approach in which a predefined list of metabolites, usually from specific biological pathways or previously defined biomarkers, is identified and quantified20. Using metabolomics to study M. oryzae has led to the discovery of metabolites that are specific to the pathogenesis of the fungus, including glutamine21, oxylipins22,23, pyriculol24, and trehalose25.
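To make the targeted (profiling) strategy concrete, the sketch below matches observed m/z values against a short, predefined target list within a parts-per-million (ppm) tolerance. It is a minimal illustration under stated assumptions, not the workflow used in this work: the target list, the 5 ppm tolerance, the function names, and the restriction to singly protonated [M+H]+ ions are all hypothetical choices. The neutral monoisotopic masses were calculated from the molecular formulas of serotonin (C10H12N2O) and trehalose (C12H22O11).

```python
# Minimal sketch of targeted metabolic profiling: match observed m/z values
# to a predefined target list within a ppm tolerance.
# Target list, tolerance, and [M+H]+-only matching are illustrative assumptions.

PROTON = 1.007276  # mass of a proton (Da)

# Hypothetical target list: name -> neutral monoisotopic mass (Da)
TARGETS = {
    "serotonin": 176.094963,  # C10H12N2O
    "trehalose": 342.116212,  # C12H22O11
}


def ppm_error(observed: float, theoretical: float) -> float:
    """Mass measurement error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def match_targets(observed_mz, targets=TARGETS, tol_ppm=5.0):
    """Return (observed m/z, target name, ppm error) for [M+H]+ matches within tol_ppm."""
    hits = []
    for mz in observed_mz:
        for name, neutral_mass in targets.items():
            theoretical_mz = neutral_mass + PROTON  # singly protonated ion, z = 1
            err = ppm_error(mz, theoretical_mz)
            if abs(err) <= tol_ppm:
                hits.append((mz, name, round(err, 2)))
    return hits


print(match_targets([177.1022, 200.0000, 343.1235]))
```

A real profiling workflow would also consider other adducts (e.g., [M+Na]+ or [M−H]−), isotope patterns, and MS/MS confirmation, as was done for serotonin in this work.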

1.5 Proteomics

Coined in 1994 by Marc Wilkins as an analogy to genomics, the study of an organism’s genes, proteomics is the large-scale study of the function, localization, interactions, and modifications of proteins in organisms and how they change in time and space26,27. Identifying and understanding the proteins of even a single organism is a non-trivial task. The complexity and diversity of a proteome can begin to be appreciated by considering how proteins are synthesized. The most recent count of protein-coding genes in the human genome is roughly 19,000 on 23 pairs of chromosomes28. By comparison, the M. oryzae genome consists of over 12,000 coding and non-coding genes on 7 chromosomes29. These thousands of genes are transcribed into mRNA, which can be further diversified through alternative splicing30. The resulting, extremely diverse mRNAs can then be translated into thousands of proteins, which can be further modified with over 200 types of post-translational modifications (PTMs)31.

Each individual protein with a specific sequence and PTM status is referred to as a proteoform32.

There are many ways to study proteins, including immunoassays, electrophoresis, nuclear magnetic resonance, X-ray crystallography, and, most notably, mass spectrometry. Mass spectrometry-based proteomics is often done using one of two fundamental strategies, characterized as top-down or bottom-up. Top-down proteomics keeps the protein intact and uses MS/MS fragmentation of the intact protein to interpret the various proteoforms. Bottom-up proteomics is the most commonly used strategy to identify proteins and their associated modifications using mass spectrometry. In this approach, proteins are first digested with a protease (e.g., trypsin or chymotrypsin)33 into peptides, which are then fragmented by MS/MS. In both workflows, the protein is identified from the precursor mass and the identities of the fragment ions. Proteomics has been demonstrated to be a viable tool for studying M. oryzae. For example, the field has helped determine that G-proteins34, myosins35, calpains36, and cAMP-dependent protein kinases37,38, along with specific PTMs such as ubiquitination39–41, glycosylation42–44, and phosphorylation45,46, are all important factors in regulating the pathogenicity of the fungus.
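The digestion step of the bottom-up workflow can be illustrated with an in silico trypsin digest. The sketch below assumes the commonly used simplified rule that trypsin cleaves C-terminal to lysine (K) or arginine (R) unless the next residue is proline (P); the function name, the missed-cleavage option, and the test sequence are hypothetical and chosen only for illustration.

```python
import re


def tryptic_digest(sequence: str, missed_cleavages: int = 0):
    """In silico trypsin digest using the simplified K/R-not-before-P rule.

    Returns peptides containing up to `missed_cleavages` uncleaved K/R sites.
    """
    # Zero-width split point: after K or R, but not when the next residue is P.
    # Filtering removes the empty string left by a cleavage site at the C-terminus.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to (missed_cleavages + 1) consecutive fully cleaved fragments.
        for j in range(i, min(i + missed_cleavages + 1, len(fragments))):
            peptides.append("".join(fragments[i : j + 1]))
    return peptides


# Arbitrary test sequence: the R in "LLRP" is not cleaved because P follows it.
print(tryptic_digest("MKWVTFISLLRPLFSSAYSRGVFR"))
```

Splitting on a zero-width pattern keeps the cleavage residue attached to the N-terminal peptide, which matches how tryptic peptides end in K or R.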

1.6 Glycomics

Glycosylation of proteins is a crucial co- and post-translational modification associated with intra- and extracellular communication47, protein folding and stability48, and protein trafficking49. Glycans are chains of linked monosaccharide units composed primarily of the monomers shown in Figure 1.3. Because glycan structures can become very large and complex, the monomers are often denoted using shapes and colors. Examples of the shapes and colors associated with their corresponding monomer structures50,51 can also be found in Figure 1.3 and throughout the remainder of this document. There are two main types of glycans: O-linked and N-linked. N-linked glycans are covalently bound through a glycosidic bond to the side-chain nitrogen atom of an asparagine (N) residue, which typically resides in the sequon N-X-S/T, where X is any amino acid except proline52. O-linked glycans are covalently bound to the side-chain oxygen atom of serine (S), threonine (T), and sometimes tyrosine (Y) residues53. N-linked glycans are of great interest for assessing disease onset and progression because disruptions in N-glycosylation have been associated with neurodegenerative disorders54,55, inflammation-based diseases56, and numerous types of cancer49,57, among many others. More than 70% of proteins found in eukaryotic systems contain the N-X-S/T sequon necessary for N-glycosylation58, and, as a consequence, more than half of proteins are decorated with N-linked glycans59.
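Scanning a protein sequence for potential N-glycosylation sites is a simple pattern-matching problem. The sketch below is a hypothetical helper, not any specific published tool: it reports every asparagine that begins an N-X-S/T sequon with X not equal to proline. A zero-width lookahead is used so that overlapping sequons (e.g., in "NNST") are all reported. It deliberately ignores further refinements to the motif, and the presence of a sequon indicates only a potential site, since not every sequon is occupied.

```python
import re

# N-X-S/T sequon, where X is any residue except proline.
# Wrapping the motif in a lookahead makes overlapping sequons reportable.
SEQUON = re.compile(r"(?=(N[^P][ST]))")


def find_sequons(sequence: str):
    """Return 1-based positions of asparagines that start an N-X-S/T sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


print(find_sequons("AANGSAANPTA"))  # N at position 8 is skipped because X is P
print(find_sequons("NNST"))         # two overlapping sequons
```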

Figure 1.3. Structures of the most common monosaccharide units.

N-linked glycans begin assembly in the cytoplasm when an N-acetylglucosamine (GlcNAc) monosaccharide is bound to the lipid carrier dolichol phosphate52. Monosaccharides are then sequentially added by a series of glycosyltransferases found in the cytoplasm and the endoplasmic reticulum (ER). Further modifications in the ER and Golgi apparatus result in three main types of N-linked glycans: high mannose, hybrid, and complex52 (Figure 1.4). Regardless of the category into which the final structure falls, all N-linked glycans contain a core structure composed of two GlcNAc units and three mannose units, as seen throughout Figure 1.4.
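Because every N-linked glycan shares the same core, a useful worked example is the monoisotopic mass of the core itself. The sketch below, written under stated assumptions, sums dehydrated monosaccharide residue masses (computed from their elemental formulas) and adds one water for the free reducing end. Isomeric monosaccharides such as mannose and galactose share the same hexose mass and cannot be distinguished by mass alone. It computes underivatized, neutral masses only; derivatization tags such as INLIGHT™ would add a label mass that is not modeled here. The composition-string format mirrors the notation used in the abstract, and the function and dictionary names are hypothetical.

```python
import re

# Monoisotopic element masses (Da)
MASS = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.99491462}


def formula_mass(counts):
    """Monoisotopic mass of an elemental composition, e.g. {'H': 2, 'O': 1}."""
    return sum(MASS[el] * n for el, n in counts.items())


# Residue (dehydrated) compositions of common monosaccharide classes
RESIDUES = {
    "Hex": {"C": 6, "H": 10, "O": 5},               # Man, Gal, Glc
    "HexNAc": {"C": 8, "H": 13, "N": 1, "O": 5},    # GlcNAc, GalNAc
    "dHex": {"C": 6, "H": 10, "O": 4},              # Fuc
    "NeuAc": {"C": 11, "H": 17, "N": 1, "O": 8},
}
ALIASES = {"Man": "Hex", "Gal": "Hex", "Glc": "Hex", "GlcNAc": "HexNAc",
           "GalNAc": "HexNAc", "Fuc": "dHex", "NeuAc": "NeuAc"}
WATER = formula_mass({"H": 2, "O": 1})


def glycan_mass(composition: str) -> float:
    """Neutral monoisotopic mass of a free glycan, e.g. '(GlcNAc)2(Man)3'."""
    total = WATER  # one water for the free reducing end
    for name, count in re.findall(r"\((\w+)\)(\d+)", composition):
        total += formula_mass(RESIDUES[ALIASES[name]]) * int(count)
    return total


print(round(glycan_mass("(GlcNAc)2(Man)3"), 4))  # N-glycan core → 910.3278
```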

Figure 1.4. N-linked glycosylation process and post-processing through the endoplasmic reticulum and Golgi apparatus, resulting in three major types of N-linked glycans: high mannose, hybrid, and complex.

Many methods exist for the removal and purification of N-linked glycans from glycoproteins; one of the most common uses the enzyme peptide-N-glycosidase F (PNGase F) for glycan release60. PNGase F cleaves the linkage between the innermost GlcNAc of the glycan and the side-chain nitrogen of asparagine (N), converting the asparagine to aspartic acid (D) and leaving the glycan structure intact52. This cleavage and conversion results in a +0.984 Da mass shift on the peptide, allowing the assessment of potential glycosylation sites on the peptide as well as complete characterization of the released glycan. The released glycans can then be analyzed, either directly or following chemical derivatization, by chromatography coupled with optical detectors or mass spectrometry (MS). MS-based approaches have become highly preferred for glycan analysis because they allow structural elucidation by both MS and MS/MS61,62.
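The mass shift produced by the PNGase F conversion follows directly from its chemistry: the side-chain amide group (-NH2) of asparagine becomes the carboxylic acid group (-OH) of aspartic acid, a net gain of one oxygen and a loss of one nitrogen and one hydrogen. The short calculation below, which uses standard monoisotopic element masses, reproduces that value and shows the corresponding m/z shift for precursors of different charge states; the helper name is illustrative only.

```python
# Monoisotopic element masses (Da)
H, N, O = 1.00782503, 14.0030740, 15.99491462

# Asn -> Asp: side-chain -NH2 becomes -OH, i.e. gain O, lose N and H
DEAMIDATION_SHIFT = O - N - H


def mz_shift(mass_shift_da: float, charge: int) -> float:
    """Observed m/z shift for a precursor carrying `charge` charges."""
    return mass_shift_da / charge


print(f"mass shift: +{DEAMIDATION_SHIFT:.3f} Da")
for z in (1, 2, 3):
    print(f"z = {z}: +{mz_shift(DEAMIDATION_SHIFT, z):.3f} m/z")
```

Because the shift is divided by the charge state, a multiply charged precursor shows a proportionally smaller m/z offset, which is why high mass accuracy is helpful when assigning former glycosylation sites.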

1.7 Mass Spectrometry

Mass spectrometry (MS) is a powerful analytical tool across many fields of science due to its versatility, specificity, and sensitivity, especially when working with complex mixtures of significant chemical diversity. Fundamentally, MS involves the separation and ionization of analytes followed by mass analysis based on their mass-to-charge ratio (m/z). The separation, ionization, and mass analysis methods (specifically the Orbitrap mass analyzer) pertaining to the completed analyses are discussed below.

1.8 Separation and Ionization Methods

Generally, metabolomics, proteomics and glycomics include complex mixtures from the samples of interest and require separation techniques for their identifications.

The gold standard for these analyses is high-performance liquid chromatography (HPLC) coupled to a mass spectrometer. HPLC separates complex mixtures through partitioning interactions between a flowing mobile phase and a stationary phase, which can have a variety of chemistries. The most commonly used stationary phase for metabolomics and proteomics consists of alkyl chains of varying length, with C18 being the most popular; this mode is referred to as reversed-phase liquid chromatography (RPLC). To maximize identifications, solvent composition, gradient length, and column length need to be optimized. Integrating complementary separation techniques (e.g., hydrophilic interaction liquid chromatography) provides additional information, because polar molecules interact poorly with the hydrophobic carbon chains of RPLC phases and often elute unretained.

Glycans are difficult to ionize owing to their innate hydrophilicity and the hydrophobic bias of electrospray droplets63, and they are often separated on other types of stationary phases, including hydrophilic interaction liquid chromatography (HILIC)64 or porous graphitic carbon (PGC) chromatography65. In addition, a variety of derivatization methods have been introduced to facilitate glycan analysis, including protection of the hydroxyl groups, as in permethylation66, or the addition of chemical labels to the reducing end of the sugar, as in the Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT) strategy67–70. Derivatization reagents increase the hydrophobicity of glycans by expanding their non-polar surface area (NPSA), which significantly enhances their surface activity, improving separations and increasing their likelihood of ionization. Overall, LC is integral to most metabolomic, proteomic, and glycomic workflows because it separates complex mixtures in an automated, reproducible, and efficient manner.
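As a small worked example of how derivatization changes the mass that is measured, the sketch below computes the mass added by permethylation66. It assumes every derivatized site is an exchangeable hydrogen (O-H or N-H) replaced by a methyl group, which is a net gain of one CH2 unit per site. The function name and the use of free glucose as the example are illustrative only; INLIGHT tagging adds a different, tag-specific mass and is not modeled here.

```python
# Monoisotopic masses (Da) of the atoms involved.
MASS_C = 12.0
MASS_H = 1.00782503207

# Permethylation replaces an exchangeable hydrogen (O-H or N-H) with CH3,
# so each derivatized site adds one CH2 unit to the molecule.
CH2 = MASS_C + 2 * MASS_H  # about 14.01565 Da


def permethylated_mass(native_mass: float, n_sites: int) -> float:
    """Monoisotopic mass after permethylating n_sites exchangeable hydrogens."""
    if n_sites < 0:
        raise ValueError("n_sites must be non-negative")
    return native_mass + n_sites * CH2


# Free glucose (C6H12O6, 180.06339 Da) has five hydroxyl groups;
# fully permethylated it becomes C11H22O6.
print(round(permethylated_mass(180.06339, 5), 4))  # 250.1416
```

The same arithmetic explains why permethylated glycans appear at substantially higher m/z than their native forms.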

1.9 Electrospray Ionization

Electrospray ionization (ESI) is a soft ionization technique for biomolecules developed by John Fenn and coworkers71, and it made the direct coupling of LC to MS possible71. During the continuous flow of a dilute analyte solution through the emitter, the applied potential drives the oxidation (Eq. 1.1, positive ion mode) or the reduction (Eq. 1.2, negative ion mode) of water, generating the excess H+ or OH− ions that charge the solution.

$$2\,\mathrm{H_2O} \rightarrow \mathrm{O_2} + 4\,\mathrm{H^+} + 4\,e^- \qquad \text{(Equation 1.1)}$$

$$2\,\mathrm{H_2O} + 2\,e^- \rightarrow \mathrm{H_2} + 2\,\mathrm{OH^-} \qquad \text{(Equation 1.2)}$$

Charge separation at the liquid surface causes charge to build up, which elongates the solution at the emitter tip into a cone called the Taylor cone.

Once the charge exceeds the Rayleigh limit (Eq. 1.3), Coulombic fission occurs and charged droplets are expelled from the tip of the Taylor cone.

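$$q_R = \sqrt{64\pi^2 \varepsilon_0 \gamma r^3} \qquad \text{(Equation 1.3)}$$

where q_R is the maximum charge a droplet can carry, ε0 is the permittivity of vacuum, γ is the surface tension of the solvent, and r is the droplet radius.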
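Eq. 1.3 can be evaluated directly to get a feel for the numbers. The sketch below assumes a pure-water droplet (surface tension of about 0.0728 N/m near room temperature); real ESI solvents containing organic modifier have a lower surface tension and therefore a lower limit. Because q_R scales with r^(3/2), halving the radius lowers the maximum charge by a factor of about 2.8, which is why evaporating droplets repeatedly reach the limit and undergo fission. The function and constant names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12      # vacuum permittivity, F/m
E_CHARGE = 1.602176634e-19   # elementary charge, C
GAMMA_WATER = 0.0728         # surface tension of water near 20 °C, N/m (assumed)


def rayleigh_limit(radius_m: float, gamma: float = GAMMA_WATER) -> float:
    """Maximum charge (C) a droplet of the given radius (m) can hold, Eq. 1.3."""
    return math.sqrt(64 * math.pi**2 * EPS0 * gamma * radius_m**3)


for r in (1e-6, 0.5e-6, 0.1e-6):
    q = rayleigh_limit(r)
    print(f"r = {r * 1e6:.1f} um -> q_R = {q:.3e} C, "
          f"about {q / E_CHARGE:,.0f} elementary charges")
```

For a 1 μm water droplet this gives roughly 1.3 × 10⁵ elementary charges.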

The potential difference between the emitter and the MS inlet drives the charged droplets toward the MS. As they travel through air, the droplets lose solvent by evaporation, and once a droplet again reaches the Rayleigh limit, it undergoes Coulombic fission and ejects highly charged progeny droplets. These droplets undergo the same process, producing smaller and smaller droplets until, by the time they reach the MS inlet, they are completely desolvated, leaving only the charged analyte. Two accepted models of ion formation are the Charged Residue Model proposed by Dole72 in 1968 and the Ion Evaporation Model proposed by Iribarne and Thomson73 in 1976. The hydrophobicity of a molecule can influence which mechanism it follows, an effect often referred to as hydrophobic bias: hydrophilic molecules, such as glycans, tend to remain solvated longer than hydrophobic molecules. Hydrophobic molecules are instead drawn to the droplet surface, where the excess charge resides; as the droplets evaporate and Coulombic fission occurs, these hydrophobic species are expelled from the droplet surface and become ionized in a manner similar to the Ion Evaporation Model. Studies of various classes of molecules, such as peptides74–76, nucleic acids63, and glycans70, have all reported similar findings. However, the exact mechanism of ion formation from charged droplets is still debated today. A schematic of ESI, adapted from Hecht77, is shown in Figure 1.5.
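The repeated evaporation and fission cycle described above can be illustrated with a deliberately simplified model. The sketch assumes a pure-water droplet that evaporates in small fixed steps at constant charge, and that each fission sheds about 2% of the parent's mass but about 15% of its charge (approximate values often quoted for ESI progeny droplets). All names and parameter values are illustrative assumptions, not taken from this work, and only the parent droplet is tracked.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
GAMMA = 0.0728            # surface tension of water, N/m (assumed)


def rayleigh_limit(r: float) -> float:
    """Eq. 1.3: maximum charge (C) on a droplet of radius r (m)."""
    return 8 * math.pi * math.sqrt(EPS0 * GAMMA * r**3)


def fission_cascade(r0: float, r_min: float = 10e-9,
                    start_fraction: float = 0.8,
                    evap_step: float = 0.99,
                    mass_loss: float = 0.02,
                    charge_loss: float = 0.15):
    """Toy model of one parent droplet shrinking by evaporation.

    Charge is conserved while solvent evaporates; whenever the charge reaches
    the Rayleigh limit, a fission event removes a fixed fraction of the mass
    and charge (carried off by progeny droplets, which are not tracked).
    Returns a list of (radius, charge) just after each fission event.
    """
    r = r0
    q = start_fraction * rayleigh_limit(r0)
    events = []
    while r > r_min:
        r *= evap_step                       # evaporation: radius shrinks, charge kept
        if q >= rayleigh_limit(r):           # Rayleigh limit reached -> fission
            q *= 1 - charge_loss
            r *= (1 - mass_loss) ** (1 / 3)  # lost volume -> smaller radius
            events.append((r, q))
    return events


events = fission_cascade(1e-6)
print(len(events), "fission events before the parent drops below 10 nm")
```

Because the progeny leave with a much larger share of the charge than of the mass, they are highly charged for their size, which matches the picture of ever smaller, highly charged droplets. The model does not attempt to decide between the Charged Residue and Ion Evaporation mechanisms.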

Figure 1.5. Schematic of the mechanism of Electrospray Ionization (ESI) and the hydrophobic ion effect using N-linked glycans as an example. This figure was modified with permission from Elizabeth Hecht77.

1.10 Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI)

Matrix-assisted laser desorption electrospray ionization (MALDESI) is an ionization technique developed by Muddiman and coworkers78 in 2006. It operates at atmospheric pressure and combines matrix-assisted laser desorption/ionization (MALDI) and ESI. MALDI, introduced by Tanaka79 and by Karas and Hillenkamp80, is an ionization technique in which analytes are embedded in an energy-absorbing matrix and then irradiated by laser pulses. The matrix absorbs most of the laser energy, causing desorption of neutral and charged matrix and analyte molecules. A series of charge-transfer reactions between matrix and analyte molecules in the desorbed plume then produces charged analyte species. Like MALDI, MALDESI uses a laser, either ultraviolet (UV) or infrared (IR), to ablate the tissue of interest. In IR-MALDESI, a 2.94 μm laser excites the O–H stretching mode of water to desorb material. The desorbed neutral material then partitions into an orthogonal electrospray and is subsequently ionized in an ESI-like fashion, as described above. MALDESI is therefore a useful tool for mass spectrometry imaging (MSI) of biological samples, providing spatially resolved MS data.
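Because each laser position in an IR-MALDESI raster yields its own mass spectrum, an ion image is simply the abundance of one m/z value mapped back onto the sampling grid. The sketch below assumes the spectra have already been read into memory as a dictionary keyed by (row, column); the data layout, the 5 ppm tolerance, the target m/z, and the values are hypothetical and do not reflect any particular imaging software.

```python
from typing import Dict, List, Tuple

Spectrum = Tuple[List[float], List[float]]   # (m/z values, intensities)


def ion_image(pixels: Dict[Tuple[int, int], Spectrum],
              n_rows: int, n_cols: int,
              target_mz: float, tol_ppm: float = 5.0) -> List[List[float]]:
    """Sum the intensity within +/- tol_ppm of target_mz for every pixel."""
    window = target_mz * tol_ppm * 1e-6
    image = [[0.0] * n_cols for _ in range(n_rows)]
    for (row, col), (mzs, intensities) in pixels.items():
        image[row][col] = sum(
            (inten for mz, inten in zip(mzs, intensities)
             if abs(mz - target_mz) <= window),
            0.0,
        )
    return image


# Hypothetical 2 x 2 raster; m/z 300.0 is an arbitrary target.
pixels = {
    (0, 0): ([300.0005, 410.2], [1000.0, 50.0]),
    (0, 1): ([300.0100], [800.0]),           # about 33 ppm away: excluded at 5 ppm
    (1, 0): ([299.9995, 300.0003], [200.0, 300.0]),
    (1, 1): ([], []),                         # no peaks recorded in this pixel
}
print(ion_image(pixels, 2, 2, target_mz=300.0))  # [[1000.0, 0.0], [500.0, 0.0]]
```

The ppm window matters: with high-resolving-power data, a narrow tolerance keeps neighboring peaks (such as the one 33 ppm away above) out of the image.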

Figure 1.6. Schematic of the mechanism of Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI). This figure was modified with permission from Milad Nazari40.


1.11 Fourier Transform Mass Spectrometry (FTMS)

Regardless of the ionization method, the ionized analytes are measured with a mass analyzer according to their mass-to-charge ratio (m/z). Although there is a variety of mass analyzers, including time-of-flight (TOF), magnetic sector, quadrupole mass filter, and ion trap analyzers, only two are based on the Fourier transform: the Fourier transform ion cyclotron resonance (FT-ICR) and the Orbitrap mass analyzers. Both provide high resolving power and accurate mass measurements. The experiments described below used two Thermo Q-Exactive series instruments, the Q-Exactive High Field (HF) and the Q-Exactive Plus.

Although the two instruments differ slightly, their main architecture is the same, as shown in Figure 1.7A.

Ions enter through a heated capillary inlet and are focused through a series of ring electrodes referred to as the S-lens. The ions then pass through the bent flatapole, where neutral species are lost because they cannot follow the bend. The ions are then transferred to the quadrupole, where m/z selection occurs. The selected ions are accumulated in the C-trap under the control of two settings: the automated gain control (AGC) target, which limits the number of ions collected to an optimized amount, and the maximum injection time (IT), which sets how long ions can accumulate before being sent for analysis. These two settings are interdependent: the C-trap fills with ions until either the AGC target or the maximum IT is reached, whichever comes first. In the C-trap, the ions are collisionally cooled by interaction with nitrogen gas. From the C-trap, the ions can be sent either to the higher-energy collisional dissociation (HCD) cell for fragmentation or to the Orbitrap for mass analysis.
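The interplay between AGC and injection time described above amounts to a "whichever comes first" rule. The sketch assumes a constant, known ion arrival rate; on real instruments the ion current is estimated rather than known, so this is a simplified illustration, and the function and parameter names are hypothetical rather than instrument settings.

```python
def c_trap_fill(ion_flux: float, agc_target: float, max_it_ms: float):
    """Return (injection time in ms, ions accumulated, limiting setting).

    Simplified model: ions arrive at a constant rate (ion_flux, ions/s) and
    accumulation stops at whichever of the AGC target or maximum IT is hit first.
    """
    if ion_flux <= 0:
        return max_it_ms, 0.0, "IT"
    time_to_target_ms = agc_target / ion_flux * 1e3
    if time_to_target_ms <= max_it_ms:
        return time_to_target_ms, float(agc_target), "AGC"
    return max_it_ms, ion_flux * max_it_ms / 1e3, "IT"


# Intense signal: the AGC target is reached after 10 ms.
print(c_trap_fill(ion_flux=1e8, agc_target=1e6, max_it_ms=100))
# Weak signal: the maximum IT is reached first, with only 1e5 ions collected.
print(c_trap_fill(ion_flux=1e6, agc_target=1e6, max_it_ms=100))
```

For abundant analytes the AGC target therefore limits the fill, while for weak signals the maximum IT sets the number of ions that reach the Orbitrap.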


The Orbitrap mass analyzer consists of a spindle-shaped central electrode surrounded by bell-shaped outer electrodes. Once a packet of ions is injected into the Orbitrap, the electrostatic field causes the ions to rotate around the central electrode while oscillating along its axis81–83 (Fig. 1.7B). The outer electrodes detect the image current induced by these axial oscillations, which is recorded as a time-domain transient composed of sinusoidal components. The transient is converted to the frequency domain using a Fourier transform, and each frequency is then used to calculate m/z according to the equation shown in Figure 1.7B. Resolving power of up to 280,000 and mass measurement accuracy better than 3 ppm can be achieved on either the Q-Exactive HF or the Q-Exactive Plus.
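The link between axial frequency and m/z can be illustrated with a synthetic transient. In the Orbitrap the angular frequency of axial oscillation is ω = √(k/(m/z)), so m/z is proportional to 1/f². The sketch lumps k and the 2π factor into a single hypothetical calibration constant CAL_C, chosen only so that the frequencies fall in a convenient range. It uses a plain rectangular-window FFT and picks the strongest bins, whereas real Orbitrap processing adds apodization, peak interpolation, and calibration; the sampling parameters are assumptions.

```python
import numpy as np

CAL_C = 1.28e14   # hypothetical calibration constant (m/z * Hz^2): m/z 200 -> 800 kHz
FS = 2**22        # sampling rate of the simulated transient, Hz (assumed)
N = 2**16         # number of samples (about 15.6 ms of signal)


def freq_from_mz(mz):
    """Axial frequency in Hz: f is proportional to 1/sqrt(m/z)."""
    return np.sqrt(CAL_C / np.asarray(mz, dtype=float))


def mz_from_freq(f):
    """Invert the relation: m/z = CAL_C / f**2."""
    return CAL_C / np.asarray(f, dtype=float) ** 2


def simulate_transient(mzs, amplitudes):
    """Sum of sinusoids, one per ion population, as picked up by the outer electrodes."""
    t = np.arange(N) / FS
    signal = np.zeros(N)
    for mz, amp in zip(mzs, amplitudes):
        signal += amp * np.sin(2 * np.pi * freq_from_mz(mz) * t)
    return signal


def transient_to_mz(signal, n_peaks):
    """Fourier transform the transient and convert the strongest bins to m/z."""
    magnitude = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1 / FS)
    strongest = np.argsort(magnitude)[-n_peaks:]
    return np.sort(mz_from_freq(freqs[strongest]))


transient = simulate_transient([200.0, 400.0], [1.0, 0.5])
print(transient_to_mz(transient, n_peaks=2))
```

The small error at m/z 400 comes from the finite frequency-bin width (64 Hz here); longer transients give narrower bins, which is why resolving power in the Orbitrap grows with acquisition time.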

Figure 1.7. A) Schematic of the Thermo Fisher Q-Exactive series mass spectrometers. B) Schematic of Orbitrap current analysis and subsequent mathematical transformations resulting in m/z spectra. This figure was adapted from Makarov and coworkers81.

1.12 Research Synopsis

This dissertation focuses on the analysis of biological molecules thought to play major roles in the pathogenicity of two diseases: rice blast disease and Alzheimer’s disease. Chapter 2 describes the mass spectrometry imaging analysis of barley leaves infected with either wild-type M. oryzae or a genetically modified M. oryzae strain in which MGG_13065 was knocked out. Because of the timing of the MALDESI ionization event relative to data collection by the mass spectrometer, the m/z range of each scan is limited to a span whose upper bound is no more than four times its lower bound (e.g., m/z 100–400 or m/z 250–1000). This led to the development of an extended polarity switching method to detect as many metabolites as possible within the range of m/z 100–1200. Several analytes were detected in both positive and negative mode, which helped narrow down the candidates for putative identification. Molecules related to the pathogenicity of the fungus were found only in the samples treated with wild-type M. oryzae, whereas the barley leaves treated with the genetically modified strain displayed no metabolic changes related to a fungal infection. Chapter 3 describes the comparative proteomic analysis of the mycelia of the same strains used in Chapter 2, comparing wild-type M. oryzae with the E3 ligase knockout.

The post-translational modifications thought to be involved with the knockout protein, phosphorylation and ubiquitination, were also investigated.
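The factor-of-four m/z range restriction described for Chapter 2 can be turned into a simple planning step. The sketch below greedily splits a target range into contiguous, non-overlapping windows that respect that limit and pairs each window with both polarities. It is a hypothetical helper for illustration; the windows, any overlaps, and the scan order actually used in the extended polarity switching method are not reproduced here.

```python
def scan_windows(low_mz: float, high_mz: float, max_ratio: float = 4.0):
    """Split [low_mz, high_mz] into contiguous windows whose upper/lower ratio
    does not exceed max_ratio (greedy, no overlap)."""
    if low_mz <= 0 or high_mz <= low_mz:
        raise ValueError("need 0 < low_mz < high_mz")
    windows = []
    start = low_mz
    while start < high_mz:
        end = min(start * max_ratio, high_mz)
        windows.append((start, end))
        start = end
    return windows


def scan_events(low_mz, high_mz, polarities=("positive", "negative")):
    """Pair every window with every polarity to list the scan events needed."""
    return [(pol, win) for win in scan_windows(low_mz, high_mz) for pol in polarities]


print(scan_windows(100, 1200))   # [(100, 400.0), (400.0, 1200)]
for event in scan_events(100, 1200):
    print(event)
```

Under these assumptions, covering m/z 100–1200 in both polarities requires four scan events per pixel.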

Chapter 4 discusses the creation of mixed system suitability samples from commercially available materials, including the Pierce™ 7 × 5 LC-MS/MS System Suitability Standard and the Promega 6 × 5 LC-MS/MS Peptide Reference Mix spiked into the Pierce™ HeLa digest standard. The 7 × 5 and 6 × 5 mixes contain series of peptide isotopologues at increasing concentrations. These peptides and isotopologues allowed LC metrics (e.g., retention times (RTs) and peak widths at full width at half maximum (FWHM)) to be evaluated. From the isotopologue information, we were also able to monitor other metrics, including mass measurement accuracy (MMA), dynamic range, limit of detection (LOD), and limit of quantification (LOQ). Using the HeLa digest, we were able to monitor changes in protein IDs as well as in the number of peptide-spectrum matches (PSMs) within a whole proteome.
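Two of the metrics named above reduce to short calculations. The sketch computes MMA in ppm from a measured and a theoretical m/z, and estimates FWHM from a sampled chromatographic trace by linear interpolation at half height. It assumes a single, baseline-resolved peak and checks itself against a synthetic Gaussian of known width (for a Gaussian, FWHM = 2√(2 ln 2)σ ≈ 2.355σ); the function names and values are illustrative, not results from Chapter 4.

```python
import math


def mma_ppm(measured_mz: float, theoretical_mz: float) -> float:
    """Mass measurement accuracy in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def fwhm(times, intensities):
    """Full width at half maximum of a single, baseline-resolved peak,
    using linear interpolation between the points that bracket half height."""
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    half = intensities[apex] / 2.0
    if half <= 0:
        raise ValueError("trace has no positive peak")
    i = apex
    while i > 0 and intensities[i] > half:                      # leading edge
        i -= 1
    j = apex
    while j < len(intensities) - 1 and intensities[j] > half:   # trailing edge
        j += 1
    if intensities[i] > half or intensities[j] > half:
        raise ValueError("peak does not return below half height on both sides")
    # Linear interpolation of the half-height crossing on each side.
    left = times[i] + (half - intensities[i]) * (
        times[i + 1] - times[i]) / (intensities[i + 1] - intensities[i])
    right = times[j - 1] + (half - intensities[j - 1]) * (
        times[j] - times[j - 1]) / (intensities[j] - intensities[j - 1])
    return right - left


sigma = 2.0   # synthetic peak width parameter, seconds
times = [k * 0.01 - 20.0 for k in range(4001)]
trace = [math.exp(-t * t / (2 * sigma * sigma)) for t in times]
print(round(fwhm(times, trace), 3))        # close to 2*sqrt(2 ln 2)*sigma
print(round(mma_ppm(500.0025, 500.0), 2))  # 5.0
```

With sparsely sampled chromatographic peaks the linear interpolation becomes coarser, so the number of points across a peak affects the FWHM estimate.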

Chapter 5 elaborates on the improvements made to the chemical derivatization protocol for Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT™). The INLIGHT™ tags add a non-polar moiety to the reducing end of the glycans, which improves both separation and ionization. Optimizing the protocol enabled the detection of previously unseen glycans, which are thought to have gone undetected because of ion suppression from excess tag. In future studies, this protocol will be used for the analysis of N-linked glycans related to Alzheimer’s disease. Chapter 6 further investigates the data collected for Chapter 5 using GlycoHunter, software developed to search full-scan mass spectral data for pairs of peaks corresponding to glycans labeled with the natural and the stable-isotope-labeled tags, and to perform relative quantification from the abundances of each pair. How the software works is elaborated upon fully in that chapter.
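The pair-finding idea behind this kind of analysis can be illustrated with a generic mass-difference search. This is not GlycoHunter's actual algorithm, which is described in Chapter 6; it is a naive sketch that assumes a peak list of (m/z, intensity) tuples, a hypothetical light/heavy mass difference DELTA (a placeholder, not the documented INLIGHT™ value), charge states 1 to 3, and a 10 ppm tolerance. The reported ratio is simply the light intensity divided by the heavy intensity.

```python
def find_label_pairs(peaks, delta_mass, charges=(1, 2, 3), tol_ppm=10.0):
    """Return (light_mz, heavy_mz, charge, light/heavy ratio) for every pair of
    peaks separated by delta_mass / z within tol_ppm.

    peaks: list of (m/z, intensity) tuples from one full-scan spectrum.
    """
    pairs = []
    for light_mz, light_int in peaks:
        for z in charges:
            expected = light_mz + delta_mass / z
            tol = expected * tol_ppm * 1e-6
            for heavy_mz, heavy_int in peaks:
                if abs(heavy_mz - expected) <= tol and heavy_int > 0:
                    pairs.append((light_mz, heavy_mz, z, light_int / heavy_int))
    return pairs


DELTA = 6.0201  # hypothetical light/heavy tag mass difference, Da

spectrum = [
    (1000.0, 2.0e5), (1006.0201, 1.0e5),   # singly charged pair, ratio 2:1
    (800.0, 5.0e4), (803.01005, 5.0e4),    # doubly charged pair, ratio 1:1
]
for pair in find_label_pairs(spectrum, DELTA):
    print(pair)
```

A practical tool would also need to handle isotope envelopes, chromatographic coelution where LC is used, and ambiguous or overlapping matches; the nested loops here are quadratic in the number of peaks and are meant only to make the logic readable.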

The Appendices contain the supplemental materials for each chapter, including a step-by-step protocol for the new INLIGHT™ derivatization.


1.13 LITERATURE CITED

(1) Umadevi, M.; Pushpa, R.; Sampathkumar, K. P.; Bhowmik, D. Rice-Traditional

Medicinal Plant in India. J. Pharmacogn. Phytochem. 2012, 1 (1).

(2) Greer, C. A.; Webster, R. K. Occurrence, Distribution, Epidemiology, Cultivar

Reaction, and Management of Rice Blast Disease in California. Plant Dis. 2001,

85 (10), 1096–1102.

(3) Couch, B. C.; Kohn, L. M. A Multilocus Gene Genealogy Concordant with Host

Preference Indicates Segregation of a New Species, Magnaporthe Oryzae, from

M. Grisea. Mycologia 2002, 94 (4), 683.

(4) Dean, R. A.; Talbot, N. J.; Ebbole, D. J.; Farman, M. L.; Mitchell, T. K.; Orbach, M.

J.; Thon, M.; Kulkarni, R.; Xu, J.-R.; Pan, H.; Read, N. D.; Lee, Y.-H.; Carbone, I.;

Brown, D.; Oh, Y. Y.; Donofrio, N.; Jeong, J. S.; Soanes, D. M.; Djonovic, S.;

Kolomiets, E.; Rehmeyer, C.; Li, W.; Harding, M.; Kim, S.; Lebrun, M.-H.; Bohnert,

H.; Coughlan, S.; Butler, J.; Calvo, S.; Ma, L.-J.; Nicol, R.; Purcell, S.; Nusbaum,

C.; Galagan, J. E.; Birren, B. W. The Genome Sequence of the Rice Blast Fungus

Magnaporthe Grisea. Nature 2005, 434 (7036), 980–986.

(5) Liu, T.-B.; Xue, C. The Ubiquitin-Proteasome System and F-Box Proteins in

Pathogenic Fungi. Mycobiology 2011, 39 (4), 243–248.

(6) Gorelik, M.; Orlicky, S.; Sartori, M. A.; Tang, X.; Marcon, E.; Kurinov, I.;

Greenblatt, J. F.; Tyers, M.; Moffat, J.; Sicheri, F.; Sidhu, S. S. Inhibition of SCF

Ubiquitin Ligases by Engineered Ubiquitin Variants That Target the Cul1 Binding

Site on the Skp1–F-Box Interface. Proc. Natl. Acad. Sci. 2016.


(7) Kim, Y.; Nandakumar, M. P.; Marten, M. R. Proteomics of Filamentous Fungi.

Trends Biotechnol. 2007, 25 (9), 395–400.

(8) Alzheimer’s Association. 2018 ALZHEIMER’S DISEASE FACTS AND FIGURES;

2018.

(9) Kizuka, Y.; Kitazume, S.; Taniguchi, N. N-Glycan and Alzheimer’s Disease.

Biochimica et Biophysica Acta - General Subjects. Elsevier B.V. October 1, 2017,

pp 2447–2454.

(10) Haass, C.; Kaether, C.; Thinakaran, G.; Sisodia, S. Trafficking and Proteolytic

Processing of APP. Cold Spring Harb. Perspect. Med. 2012, 2 (5), a006270.

(11) Lichtenthaler, S. F. Alpha-Secretase in Alzheimer’s Disease: Molecular Identity,

Regulation and Therapeutic Potential. Journal of Neurochemistry. John Wiley &

Sons, Ltd January 1, 2011, pp 10–21.

(12) Liu, F.; Zaidi, T.; Iqbal, K.; Grundke-Iqbal, I.; Merkle, R. K.; Gong, C.-X. Role of

Glycosylation in Hyperphosphorylation of Tau in Alzheimer’s Disease. FEBS Lett.

2002, 512 (1–3), 101–106. https://doi.org/10.1016/S0014-5793(02)02228-7.

(13) Gong, C.-X.; Liu, F.; Grundke-Iqbal, I.; Iqbal, K. Post-Translational Modifications

of Tau Protein in Alzheimer’s Disease. J. Neural Transm. 2005, 112 (6), 813–838.

(14) Liu, F.; Zaidi, T.; Iqbal, K.; Grundke-Iqbal, I.; Gong, C.-X. Aberrant Glycosylation

Modulates Phosphorylation of Tau by Protein Kinase A and Dephosphorylation of

Tau by Protein Phosphatase 2A and 5. Neuroscience 2002, 115 (3), 829–837.

(15) Masters, C. L.; Bateman, R.; Blennow, K.; Rowe, C. C.; Sperling, R. A.;


Cummings, J. L. Alzheimer’s Disease. Nat. Rev. Dis. Prim. 2015, 1, 15056.

(16) Schedin-Weiss, S.; Winblad, B.; Tjernberg, L. O. The Role of Protein

Glycosylation in Alzheimer Disease. FEBS J. 2014, 281 (1), 46–62.

(17) Oliver, S. G.; Winson, M. K.; Kell, D. B.; Baganz, F. Systematic Functional

Analysis of the Yeast Genome. Trends Biotechnol. 1998, 16 (9), 373–378.

(18) Fiehn, O. Metabolomics – the Link between Genotypes and Phenotypes. Plant

Mol. Biol. 2002, 48 (1/2), 155–171.

(19) Zampieri, M.; Sauer, U. Metabolomics-Driven Understanding of Genotype-

Phenotype Relations in Model Organisms. Curr. Opin. Syst. Biol. 2017, 6, 28–36.

(20) Ellis, D. I.; Dunn, W. B.; Griffin, J. L.; Allwood, J. W.; Goodacre, R.

Metabolic Fingerprinting as a Diagnostic Tool.

Pharmacogenomics 2007, 8 (9), 1243–1266.

(21) Huang, H.; Nguyen Thi Thu, T.; He, X.; Gravot, A.; Bernillon, S.; Ballini, E.; Morel,

J.-B. Increase of Fungal Pathogenicity and Role of Plant Glutamine in Nitrogen-

Induced Susceptibility (NIS) To Rice Blast. Front. Plant Sci. 2017, 8, 265.

(22) Yara, A.; Yaeno, T.; Montillet, J.-L.; Hasegawa, M.; Seo, S.; Kusumi, K.; Iba, K.

Enhancement of Disease Resistance to Magnaporthe Grisea in Rice by

Accumulation of Hydroxy Linoleic Acid. Biochem. Biophys. Res. Commun. 2008,

370 (2), 344–347.

(23) Wennman, A.; Jernerén, F.; Magnuson, A.; Oliw, E. H. Expression and

Characterization of Manganese Lipoxygenase of the Rice Blast Fungus Reveals


Prominent Sequential Lipoxygenation of α-Linolenic Acid. Arch. Biochem.

Biophys. 2015, 583, 87–95.

(24) Jacob, S.; Grötsch, T.; Foster, A. J.; Schüffler, A.; Rieger, P. H.; Sandjo, L. P.;

Liermann, J. C.; Opatz, T.; Thines, E. Unravelling the Biosynthesis of Pyriculol in

the Rice Blast Fungus Magnaporthe Oryzae. Microbiology 2017, 163 (4), 541–

553.

(25) Foster, A. J.; Jenkinson, J. M.; Talbot, N. J. Trehalose Synthesis and Metabolism

Are Required at Different Stages of Plant Infection by Magnaporthe Grisea.

EMBO J. 2003, 22 (2), 225–235.

(26) Wasinger, V. C.; Cordwell, S. J.; Cerpa-Poljak, A.; Yan, J. X.; Gooley, A. A.;

Wilkins, M. R.; Duncan, M. W.; Harris, R.; Williams, K. L.; Humphery-Smith, I.

Progress with Gene-Product Mapping of the Mollicutes: Mycoplasma Genitalium.

Electrophoresis 1995, 16 (1), 1090–1094.

(27) Graves, P. R.; Haystead, T. A. J. Molecular Biologist’s Guide to Proteomics.

Microbiol. Mol. Biol. Rev. 2002, 66 (1), 39–63.

(28) Piovesan, A.; Antonaros, F.; Vitale, L.; Strippoli, P.; Pelleri, M. C.; Caracausi, M.

Human Protein-Coding Genes and Gene Feature Statistics in 2019. BMC Res.

Notes 2019, 12 (1), 315.

(29) Kim, S.; Park, J.; Park, S.-Y.; Mitchell, T. K.; Lee, Y.-H. Identification and Analysis

of in Planta Expressed Genes of Magnaporthe Oryzae. BMC Genomics 2010, 11

(1), 104.


(30) Black, D. L. Mechanisms of Alternative Pre-Messenger RNA Splicing. Annu. Rev.

Biochem. 2003, 72 (1), 291–336.

(31) Duan, G.; Walther, D. The Roles of Post-Translational Modifications in the

Context of Protein Interaction Networks. PLoS Comput. Biol. 2015, 11 (2),

e1004049.

(32) Smith, L. M.; Kelleher, N. L.; Consortium for Top Down Proteomics; Linial, M.; Goodlett, D.;

Langridge-Smith, P.; Goo, Y. A.; Safford, G.; Bonilla, L.; Kruppa, G.; Zubarev, R.;

Rontree, J.; Chamot-Rooke, J.; Garavelli, J.; Heck, A.; Loo, J.; Penque, D.;

Hornshaw, M.; Hendrickson, C.; Pasa-Tolic, L.; Borchers, C.; Chan, D.; Young,

N.; Agar, J.; Masselon, C.; Gross, M.; McLafferty, F.; Tsybin, Y.; Ge, Y.;

Sanders, I.; Langridge, J.; Whitelegge, J.; Marshall, A. Proteoform: A Single

Term Describing Protein Complexity. Nat. Methods 2013, 10 (3), 186–187.

(33) Giansanti, P.; Tsiatsiani, L.; Low, T. Y.; Heck, A. J. R. Six Alternative Proteases

for Mass Spectrometry–Based Proteomics beyond Trypsin. Nat. Protoc. 2016, 11

(5), 993–1006.

(34) Liu, S.; Dean, R. A. G Protein α Subunit Genes Control Growth, Development,

and Pathogenicity of Magnaporthe Grisea. Mol. Plant-Microbe Interact. 1997, 10

(9), 1075–1086.

(35) Guo, M.; Tan, L.; Nie, X.; Zhang, Z. A Class-II Myosin Is Required for Growth,

Conidiation, Cell Wall Integrity and Pathogenicity of Magnaporthe Oryzae.

Virulence 2017, 8 (7), 1335–1354.

(36) Liu, X.-H.; Ning, G.-A.; Huang, L.-Y.; Zhao, Y.-H.; Dong, B.; Lu, J.-P.; Lin, F.-C.


Calpains Are Involved in Asexual and Sexual Development, Cell Wall Integrity

and Pathogenicity of the Rice Blast Fungus. Sci. Rep. 2016, 6, 31204.

(37) Mitchell, T. K.; Dean, R. A. The CAMP-Dependent Protein Kinase Catalytic

Subunit Is Required for Appressorium Formation and Pathogenesis by the Rice

Blast Pathogen Magnaporthe Grisea. Plant Cell 1995, 7 (11), 1869–1878.

(38) Lee, Y. H.; Dean, R. A. CAMP Regulates Infection Structure Formation in the

Plant Pathogenic Fungus Magnaporthe Grisea. Plant Cell 1993, 5 (6), 693–700.

(39) Oh, Y.; Franck, W. L.; Han, S.-O.; Shows, A.; Gokce, E.; Muddiman, D. C.; Dean,

R. A. Polyubiquitin Is Required for Growth, Development and Pathogenicity in the

Rice Blast Fungus Magnaporthe Oryzae. PLoS One 2012, 7 (8), e42868.

(40) Moon, J.; Parry, G.; Estelle, M. The Ubiquitin-Proteasome Pathway and Plant

Development. Plant Cell 2004, 16 (12), 3181–3195.

(41) Parker, J.; Oh, Y.; Moazami, Y.; Pierce, J. G.; Loziuk, P. L.; Dean, R. A.;

Muddiman, D. C. Examining Ubiquitinated Peptide Enrichment Efficiency through

an Epitope Labeled Protein. Anal. Biochem. 2016, 512, 114–119.

(42) Deshpande, N.; Wilkins, M. R.; Packer, N.; Nevalainen, H. Protein Glycosylation

Pathways in Filamentous Fungi. Glycobiology 2008, 18 (8), 626–637.

(43) Chen, X.-L.; Shi, T.; Yang, J.; Shi, W.; Gao, X.; Chen, D.; Xu, X.; Xu, J.-R.; Talbot,

N. J.; Peng, Y.-L. N-Glycosylation of Effector Proteins by an α-1,3-

Mannosyltransferase Is Required for the Rice Blast Fungus to Evade Host Innate

Immunity. Plant Cell 2014, 26 (3), 1360–1376.


(44) Yang, T.; Stoopen, G.; Yalpani, N.; Vervoort, J.; De Vos, R.; Voster, A.;

Verstappen, F. W. A.; Bouwmeester, H. J.; Jongsma, M. A. Metabolic Engineering

of Geranic Acid in to Achieve Fungal Resistance Is Compromised by Novel

Glycosylation Patterns. Metab. Eng. 2011, 13, 414–425.

(45) Franck, W. L.; Gokce, E.; Randall, S. M.; Oh, Y.; Eyre, A.; Muddiman, D. C.;

Dean, R. A. Phosphoproteome Analysis Links Protein Phosphorylation to Cellular

Remodeling and Metabolic Adaptation during Magnaporthe Oryzae Appressorium

Development. J. Proteome Res. 2015, 14 (6), 2408–2424.

(46) Akutsu, M.; Dikic, I.; Bremm, A. Ubiquitin Chain Diversity at a Glance. J. Cell Sci.

2016, 129 (5), 875–880.

(47) Parker, R. B.; Kohler, J. J. Regulation of Intracellular Signaling by Extracellular

Glycan Remodeling. ACS Chem. Biol. 2010, 5 (1), 35–46.

(48) Jayaprakash, N. G.; Surolia, A. Role of Glycosylation in Nucleating Protein

Folding and Stability. Biochem. J. 2017, 474 (14), 2333–2347.

(49) Bard, F.; Chia, J. Cracking the Glycome Encoder: Signaling, Trafficking, and

Glycosylation. Trends Cell Biol. 2016, 26 (5), 379–388.

(50) Varki, A.; Cummings, R. D.; Aebi, M.; Packer, N. H.; Seeberger, P. H.; Esko, J.

D.; Stanley, P.; Hart, G.; Darvill, A.; Kinoshita, T.; Prestegard, J. J.; Schnaar, R.

L.; Freeze, H. H.; Marth, J. D.; Bertozzi, C. R.; Etzler, M. E.; Frank, M.;

Vliegenthart, J. F. G.; Lütteke, T.; Perez, S.; Bolton, E.; Rudd, P.; Paulson, J.;

Kanehisa, M.; Toukach, P.; Aoki-Kinoshita, K. F.; Dell, A.; Narimatsu, H.; York,

W.; Taniguchi, N.; Kornfeld, S. Symbol Nomenclature for Graphical


Representations of Glycans. Glycobiology 2015, 25 (12), 1323–1324.

(51) Neelamegham, S.; Aoki-Kinoshita, K.; Bolton, E.; Frank, M.; Lisacek, F.; Lütteke,

T.; O’Boyle, N.; Packer, N. H.; Stanley, P.; Toukach, P.; Varki, A.; Woods, R. J.

Updates to the Symbol Nomenclature for Glycans Guidelines. Glycobiology 2019,

29 (9), 620–624.

(52) Stanley, P.; Taniguchi, N.; Aebi, M. Chapter 9. N-Glycans, Essentials of

Glycobiology, 2nd Edition. Essentials Glycobiol. 2017, 1–14.

(53) Zarschler, K.; Janesch, B.; Pabst, M.; Altmann, F.; Messner, P.; Schäffer, C.

Protein Tyrosine O-Glycosylation-A Rather Unexplored Prokaryotic Glycosylation

System. Glycobiology 2010, 20 (6), 787–798.

(54) Cho, B. G.; Veillon, L.; Mechref, Y. N-Glycan Profile of Cerebrospinal Fluids from

Alzheimer’s Disease Patients Using Liquid Chromatography with Mass

Spectrometry. J. Proteome Res. 2019, 18 (10), 3770–3779.

(55) Abou-Abbass, H.; Abou-El-Hassan, H.; Bahmad, H.; Zibara, K.; Zebian, A.;

Youssef, R.; Ismail, J.; Zhu, R.; Zhou, S.; Dong, X.; Nasser, M.; Bahmad, M.;

Darwish, H.; Mechref, Y.; Kobeissy, F. Glycosylation and Other PTMs Alterations

in Neurodegenerative Diseases: Current Status and Future Role in Neurotrauma.

Electrophoresis. Wiley-VCH Verlag June 1, 2016, pp 1549–1561.

(56) McCarthy, C.; Saldova, R.; Wormald, M. R.; Rudd, P. M.; McElvaney, N. G.;

Reeves, E. P. The Role and Importance of Glycosylation of Acute Phase Proteins

with Focus on Alpha-1 Antitrypsin in Acute and Chronic Inflammatory Conditions.

J. Proteome Res. 2014, 13 (7), 3131–3143.


(57) Wang, H.; Ramakrishnan, A.; Fletcher, S.; Prochownik, E. V.

Protein Glycosylation in Cancer. Annu. Rev. Pathol. 2015, 2 (2), 473–510.

(58) Stanley, P.; Taniguchi, N.; Aebi, M. N-Glycans. Essentials Glycobiol. 2017, 1–14.

(59) Apweiler, R.; Hermjakob, H.; Sharon, N. On the Frequency of Protein

Glycosylation, as Deduced from Analysis of the SWISS-PROT Database.

Biochim. Biophys. Acta 1999, 1473 (1), 4–8.

(60) Tarentino, A. L.; Gomez,

C. M.; Plummer, T. H. Deglycosylation of Asparagine-Linked Glycans by Peptide:

N-Glycosidase F. Biochemistry 1985, 24 (17), 4665–4671.

(61) Han, L.; Costello, C. E. Mass Spectrometry of Glycans. Biochem. 2013, 78 (7),

710–720.

(62) Wada, Y.; Azadi, P.; Costello, C. E.; Dell, A.; Dwek, R. A.; Geyer, H.; Geyer, R.;

Kakehi, K.; Karlsson, N. G.; Kato, K.; Kawasaki, N.; Khoo, K. H.; Kim, S.; Kondo,

A.; Lattova, E.; Mechref, Y.; Miyoshi, E.; Nakamura, K.; Narimatsu, H.; Novotny,

M. V.; Packer, N. H.; Perreault, H.; Peter-Katalinić, J.; Pohlentz, G.; Reinhold, V.

N.; Rudd, P. M.; Suzuki, A.; Taniguchi, N. Comparison of the Methods for Profiling

Glycoprotein Glycans - HUPO Human Disease Glycomics/Proteome Initiative

Multi-Institutional Study. Glycobiology 2007, 17 (4), 411–422.

(63) Null, A. P.; Nepomuceno, A. I.; Muddiman, D. C. Implications of Hydrophobicity

and Free Energy of Solvation for Characterization of Nucleic Acids by

Electrospray Ionization Mass Spectrometry. Anal. Chem. 2003, 75 (6), 1331–

1339.

(64) Nováková, L.; Havlíková, L.; Vlčková, H. Hydrophilic Interaction Chromatography


of Polar and Ionizable Compounds by UHPLC. TrAC Trends Anal. Chem. 2014,

63, 55–64.

(65) West, C.; Elfakir, C.; Lafosse, M. Porous Graphitic Carbon: A Versatile Stationary

Phase for Liquid Chromatography. Journal of Chromatography A. May 2010, pp

3201–3216.

(66) Baldwin, M. A.; Stahl, N.; Reinders, L. G.; Gibson, B. W.; Prusiner, S. B.;

Burlingame, A. L. Permethylation and Tandem Mass Spectrometry of

Oligosaccharides Having Free Hexosamine: Analysis of the Glycoinositol

Phospholipid Anchor Glycan from the Scrapie Prion Protein. Anal. Biochem. 1990,

191 (1), 174–182.

(67) Walker, S. H.; Taylor, A. D.; Muddiman, D. C. Individuality Normalization When

Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT): A Novel Glycan-

Relative Quantification Strategy. J. Am. Soc. Mass Spectrom. 2013, 24 (9), 1376–

1384.

(68) Hecht, E. S.; McCord, J. P.; Muddiman, D. C. Definitive Screening Design

Optimization of Mass Spectrometry Parameters for Sensitive Comparison of Filter

and Solid Phase Extraction Purified, INLIGHT Plasma N-Glycans. Anal. Chem.

2015, 87 (14), 7305–7312.

(69) Hecht, E. S.; McCord, J. P.; Muddiman, D. C. A Quantitative Glycomics and

Proteomics Combined Purification Strategy. J. Vis. Exp. 2016, No. 109, e53735.

(70) Kalmar, J. G.; Butler, K. E.; Baker, E. S.; Muddiman, D. C. Enhanced Protocol for

Quantitative N-Linked Glycomics Analysis Using Individuality Normalization When


Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT™). Anal. Bioanal. Chem.

2020, 1–11.

(71) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Electrospray

Ionization for Mass Spectrometry of Large Biomolecules. Science 1989, 246

(4926), 64–71.

(72) Dole, M.; Hines, R. L.; Mack, L. L.; Mobley, R. C.; Ferguson, L. D.; Alice, M. B.

Gas Phase Macroions. Macromolecules 1968, 1 (1), 96–97.

(73) Iribarne, J. V.; Thomson, B. A. On the Evaporation of Small Ions from Charged

Droplets. J. Chem. Phys. 1976, 64 (6), 2287.

(74) Cech, N. B.; Enke, C. G. Relating Electrospray Ionization Response to Nonpolar

Character of Small Peptides. Anal. Chem. 2000, 72 (13), 2717–2723.

(75) Zhan, D.; Fenn, J. B. Gas Phase Hydration of Electrospray Ions from Small

Peptides. Int. J. Mass Spectrom. 2002, 219 (1), 1–10.

(76) Gordon, E. F.; Mansoori, B. A.; Carroll, C. F.; Muddiman, D. C. Hydropathic

Influences on the Quantification of Equine Heart Cytochrome c Using Relative Ion

Abundance Measurements by Electrospray Ionization Fourier Transform Ion

Cyclotron Resonance Mass Spectrometry. J. Mass Spectrom. 1999, 34 (10),

1055–1062.

(77) Hecht, E. S. Development of Mass Spectrometry Molecular Analysis Strategies

and Chemistries for the Quantitative Measurement of Glycans in Complex

Biological Systems. North Carolina State University, Raleigh, NC, 2017.


(78) Sampson, J. S.; Hawkridge, A. M.; Muddiman, D. C. Generation and Detection of

Multiply-Charged Peptides and Proteins by Matrix-Assisted Laser Desorption

Electrospray Ionization (MALDESI) Fourier Transform Ion Cyclotron Resonance

Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2006, 17 (12), 1712–1716.

(79) Tanaka, K.; Waki, H.; Ido, Y.; Akita, S.; Yoshida, Y.; Yoshida, T.; Matsuo, T.

Protein and Polymer Analyses up to m/z 100 000 by Laser Ionization Time-of-Flight Mass Spectrometry. Rapid Commun. Mass Spectrom. 1988, 2 (8), 151–

153.

(80) Karas, M.; Hillenkamp, F. Laser Desorption Ionization of Proteins with Molecular

Masses Exceeding 10,000 Daltons. Anal. Chem. 1988, 60 (20), 2299–2301.

(81) Scigelova, M.; Makarov, A. Fundamentals and Advances of Orbitrap Mass

Spectrometry. In Encyclopedia of Analytical Chemistry; John Wiley & Sons, Ltd:

Chichester, UK, 2000; pp 1–36.

(82) Makarov, A. Electrostatic Axially Harmonic Orbital Trapping: A High-Performance Technique of Mass Analysis. Anal. Chem. 2000, 72 (6), 1156–1162.

(83) Zubarev, R. A.; Makarov, A. Orbitrap Mass Spectrometry. Anal. Chem. 2013, 85

(11), 5288–5296.


Chapter 2

Investigating Host-Pathogen Meta-Metabolic Interactions of Magnaporthe oryzae Infected Barley using Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Mass Spectrometry

Reused with permission from: Kalmar, J. G.; Oh, Y.; Dean, R. A.; Muddiman, D. C. Anal. Bioanal. Chem. 2020, 410, 139–147. © Springer-Verlag GmbH Germany 2019

2.1 INTRODUCTION

Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI)1 is an ambient ionization method that has been used extensively for mass spectrometry imaging (MSI) of biological samples2–6. It is a hybrid of the matrix-assisted laser desorption ionization (MALDI) and electrospray ionization (ESI)7 methods. Similar to MALDI, MALDESI uses a laser, either ultraviolet (UV) or infrared (IR), to ablate the tissue of interest. In this work, a 2.94 μm laser excites the O–H stretching mode of water to ablate material from the sample, which subsequently partitions into an orthogonal electrospray for ionization.
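A quick unit conversion shows why a 2.94 μm laser suits water-rich tissue: the corresponding wavenumber, about 3400 cm⁻¹, lies near the maximum of the broad O–H stretching absorption band of liquid water. The sketch below only converts wavelength to photon energy and wavenumber using exact SI constants; the function name is illustrative.

```python
H = 6.62607015e-34       # Planck constant, J s
C_LIGHT = 299_792_458.0  # speed of light, m/s
J_PER_EV = 1.602176634e-19


def photon_properties(wavelength_m: float):
    """Return (photon energy in eV, wavenumber in cm^-1) for a wavelength in metres."""
    energy_ev = H * C_LIGHT / wavelength_m / J_PER_EV
    wavenumber_cm = 1.0 / (wavelength_m * 100.0)   # metres -> centimetres
    return energy_ev, wavenumber_cm


energy, wavenumber = photon_properties(2.94e-6)
print(f"{energy:.3f} eV, {wavenumber:.0f} cm^-1")  # 0.422 eV, 3401 cm^-1
```

Each photon carries far too little energy to ionize analytes directly; instead, rapid heating of the absorbing water drives ablation, and ionization is left to the electrospray.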

One application of IR-MALDESI MSI is metabolomics, the study of exogenous and endogenous metabolites, which are small-molecule end products of enzymatic processes that occur in an organism. Metabolites are of two types, primary and secondary. Primary metabolites are essential for the function of the organism, whereas secondary metabolites serve a variety of functions, including defense and interspecies interactions, but are not essential to the organism’s function. The metabolome is very chemically diverse and thus requires analysis in both positive and negative electrospray polarities8. To advance our understanding of plants and how they interact with their environment, including their responses to pathogens, mass spectrometry imaging is a powerful strategy for capturing this biological information9,10.
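Detecting the same compound in both polarities is useful because the protonated and deprotonated ions of one neutral molecule are separated by exactly two proton masses. The sketch below computes the monoisotopic [M+H]+ and [M−H]− m/z values from a molecular formula, using serotonin (C10H12N2O) only as an example. It handles simple formulas of the listed elements, ignores other adducts (e.g., [M+Na]+ or [M+Cl]−) and isotopes, and its helper names are hypothetical.

```python
import re

# Monoisotopic atomic masses (Da) for common elements in metabolites.
MONO = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048,
        "O": 15.99491461956, "P": 30.97376163, "S": 31.97207100}
PROTON = 1.007276466621  # mass of a proton, Da


def monoisotopic_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C10H12N2O'."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


def expected_ions(formula: str) -> dict:
    """Singly charged [M+H]+ and [M-H]- m/z values for a neutral molecule."""
    m = monoisotopic_mass(formula)
    return {"[M+H]+": m + PROTON, "[M-H]-": m - PROTON}


for ion, mz in expected_ions("C10H12N2O").items():   # serotonin
    print(f"{ion}: {mz:.4f}")
```

Finding a positive-mode feature and a negative-mode feature whose m/z values differ by about 2.0146 supports assigning both to the same neutral mass, which narrows the list of candidate identities.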

Rice (Oryza sativa) is a cereal crop and a staple food for many countries across the globe, especially in Asia11. Worldwide, over 400 million tons of rice are milled each year; unfortunately, 10–30% of the annual crop is destroyed by the rice blast pathogen, Magnaporthe oryzae12. M. oryzae is a filamentous fungus and the primary cause of rice blast disease. Besides destroying large amounts of rice, it can also infect other cereal plants, including millet, wheat, and barley13. The M. oryzae infection begins with the attachment of a spore, known as the conidium, to the host surface14. After germination, the spore forms a specialized structure known as the appressorium, whose melanized cell wall allows significant turgor pressure to build up14.

When the pressure is high enough, a penetration peg is forced into the host plant cell, initiating the infection and ultimately leading to the death of the plant.

In previous work15, it was found that the F-box E3 ligase MGG_13065 is crucial for the development of the appressorium and for pathogenicity; the strain lacking this gene is referred to here as the E3 ligase knockout (E3 ligase KO). This protein is part of the ubiquitin-mediated proteolysis pathway, which post-translationally modifies proteins with ubiquitin and directs them to the proteasome for degradation. Because the F-box E3 ligase is absent in the KO strain, biological insights can be gained by determining which metabolites involved in the infection process differ between the Wild Type (WT) infection and the E3 ligase KO infection of barley. M. oryzae is a seminal model organism for studying host-pathogen interactions14,16 and is thus a logical starting point for measuring, in a spatially resolved manner, the metabolites that occur during the infectious process. Moreover, measuring post-infection metabolic changes longitudinally is a powerful means of understanding how the host responds to the infection.

2.2 MATERIALS AND METHODS

2.2.1 Fungal Growth

Wild type and E3 ligase KO strains of Magnaporthe oryzae 70-15 were cultured on oatmeal agar plates and incubated at 25-28 °C for 1 week. The reactivated fungus was then transferred to minimal media to increase spore production and grown in the incubator for another week. The fungal spores were collected into Falcon tubes, and the final concentration of each spore solution was 1 × 10⁴ spores/mL.

2.2.2 Barley and Inoculation

Barley was used as a model host organism to develop this approach because M. oryzae can infect it, it is simple to grow, and it has a short growing period.

Ten barley plants were grown in total. Barley seeds were planted in simple potting soil saturated with water, and the plants were left to grow for a week before inoculation with the respective fungal spore solution. Three groups of barley plants, each comprising 3-4 plants, were placed into white plastic garbage bags that were labelled accordingly. One group was treated with the WT M. oryzae 70-15 strain, one with the E3 ligase KO M. oryzae 70-15 strain, and the last with a control solution of 0.02% Tween 20. The bags were then closed tightly to form separate humid environments that allowed the fungi to grow and infect the plants. The plants were left in these humid conditions for 2 hours before initial sampling to allow the solutions to settle onto the leaf blade surfaces.

2.2.3 IR-MALDESI Mass Spectrometry Imaging

A 2 cm section of inoculated barley leaf blade was removed from each group of plants and attached to a microscope slide with double-sided tape. Samples were freshly collected on each sampling day over the course of 8 days, except on days 2 and 3, because lesions take approximately 72 hours to become visible17.

For each day, IR-MALDESI was set up for analysis by establishing a stable electrospray from a 50:50 methanol/water solution containing 0.2% formic acid at a flow rate of 2.0 μL/min and a voltage of 4.0 kV. Due to the pulsed nature of IR-MALDESI, automatic gain control (AGC) was turned off and the injection time (IT) was fixed at 75 ms. A 3.2 mm by 3.4 mm region of interest (ROI) was used for all samples except on day 1, when the ROI was set to 3.2 mm by 2.0 mm for a baseline analysis of inoculated barley blades with no visible lesions. The sample tissue in the ROI, typically surrounding an infection lesion, was completely ablated by 10 burst-mode pulses from a 2.94 μm infrared laser18. Because the laser beam was oval, step sizes of 50 μm in the x direction and 100 μm in the y direction were used.

All images were collected using an ROI whose number of pixels in the x direction was an integer multiple of four, so that each row contained only complete cycles of the extended polarity switching scheme shown in Figure 2.1. Previously imaged areas could not be reused because the laser quantitatively ablated the tissue.
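The ROI constraint can be checked with simple arithmetic. The short Python sketch below converts ROI dimensions and step sizes into numbers of laser positions and tests whether the x count divides evenly into the four analytical acquisitions of Figure 2.1. The helper name `roi_pixels` is hypothetical; only the ROI sizes and the 50 μm × 100 μm step sizes come from the text, and the sketch assumes one analytical scan per x position.

```python
from __future__ import annotations


def roi_pixels(width_um: float, height_um: float,
               step_x_um: float = 50.0, step_y_um: float = 100.0,
               n_modes: int = 4) -> tuple[int, int, bool]:
    """Return (x positions, y positions, True if x positions form whole cycles).

    Hypothetical helper: assumes one analytical scan per x position and
    n_modes analytical scans (+L, +H, -H, -L) per polarity-switching cycle.
    """
    nx = round(width_um / step_x_um)
    ny = round(height_um / step_y_um)
    return nx, ny, nx % n_modes == 0


# Standard ROI (3.2 mm x 3.4 mm) and day-1 baseline ROI (3.2 mm x 2.0 mm), in micrometres.
for label, (w, h) in {"standard": (3200, 3400), "day 1": (3200, 2000)}.items():
    nx, ny, whole = roi_pixels(w, h)
    print(f"{label}: {nx} x {ny} positions, whole cycles in x: {whole}")
```

Both ROIs give 64 positions in x, i.e., 16 complete four-scan cycles per row, whereas a width of, say, 3.25 mm (65 positions) would leave a partial cycle at the end of each row.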

[Figure 2.1 scan order: (+) m/z 100-400 | (+) m/z 300-1200 | (-) m/z 300-1200 | (-) m/z 100-400]

Figure 2.1. Extended polarity switching acquisition scheme8 in which two m/z ranges, 100-400 Th and 300-1200 Th, are analyzed in both positive and negative polarities. The blank boxes represent “equilibration” scans, each occurring after an analytical scan; analytical scans are labelled with their m/z range and polarity.

Figure 2.1 depicts the sequence of events used for these studies. The data were collected using a modified polarity switching method that measured two m/z ranges, 100-400 Th and 300-1200 Th, in the positive-ion mode and the same m/z ranges in the negative-ion mode. In the instrument method, each analytical IR-MALDESI measurement was collected at a resolving power of 140 k (full width at half maximum, FWHM, at m/z 200) and was followed by a lower resolving power (70 k) spectrum, inserted to give the mass spectrometer time to equilibrate to the next set of parameters. A previous study on polarity switching IR-MALDESI imaging8 found that a delay is needed to stabilize the electrospray when the analyzing polarity is changed; this delay is referred to as an equilibration scan. An equilibration scan uses the next analytical scan's settings but a lower resolving power to decrease analysis time. Ideally, an equilibration scan would occur only when the polarity changes; however, because of how the Handshake N-th Counter setting in the Q Exactive tune file functions, an equilibration scan must also be added after every m/z range (data not shown). The Handshake N-th Counter sets the number of consecutive scans before the next contact closure signal (TTL pulse) is required. For example, if two positive-ion mode scans, followed by an equilibration scan and then two negative-ion mode scans, were programmed, the timing of the contact closure would not align with the correct scan.

The acquisition method shown in Figure 2.1 was designed and tested to capture as many of the chemically diverse metabolites in this pathogen/host interaction as possible, in terms of both their m/z and their functionality (e.g., acidic and basic molecules). Two m/z ranges are needed because if the ratio of the high m/z limit to the low m/z limit within a range is greater than four, the Q Exactive Plus divides the total ion injection into multiple ion injections, and ions are lost in the small delay between the collections. Because laser ablation is transient, ions are collected in the first subscan and few or none are collected in the second. The low m/z range, 100-400 Th, was chosen to identify small metabolites during the infection; primary metabolites, including some amino acids, vitamins, organic acids, and nucleosides, are often smaller in mass.

Secondary metabolites can also be small (e.g., toxins and gibberellins), but they are typically larger in mass and require the second m/z range, 300-1200 Th. The dual polarities also allowed us to sample as many metabolites in those m/z ranges as possible, as some ionize in positive-ion mode (basic compounds) while others ionize in negative-ion mode (acidic compounds).
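The scan cycle and the range-width rule described above can be written out explicitly. The Python sketch below lists the analytical scans in the order of Figure 2.1, follows each one with an equilibration scan that takes the next analytical scan's polarity and m/z range at the lower resolving power (140 k and 70 k, as stated in the text), and checks that each range's high-to-low m/z ratio does not exceed four, the limit given above for the Q Exactive Plus. The `Scan` class and the function names are hypothetical illustrations; this is not an instrument method file.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Scan:
    polarity: str          # "+" or "-"
    low_mz: float
    high_mz: float
    resolving_power: int   # FWHM at m/z 200
    analytical: bool       # False for equilibration scans


# Analytical scans in the order of Figure 2.1: +L, +H, -H, -L.
ANALYTICAL = [("+", 100, 400), ("+", 300, 1200), ("-", 300, 1200), ("-", 100, 400)]


def build_cycle(ranges=ANALYTICAL, rp_analytical=140_000, rp_equil=70_000) -> list[Scan]:
    """Follow every analytical scan with an equilibration scan that uses the
    *next* analytical scan's polarity and m/z range at lower resolving power."""
    cycle = []
    for i, (pol, lo, hi) in enumerate(ranges):
        cycle.append(Scan(pol, lo, hi, rp_analytical, True))
        nxt_pol, nxt_lo, nxt_hi = ranges[(i + 1) % len(ranges)]  # wraps to +L
        cycle.append(Scan(nxt_pol, nxt_lo, nxt_hi, rp_equil, False))
    return cycle


def single_injection(lo: float, hi: float, max_ratio: float = 4.0) -> bool:
    """True if the high-to-low m/z ratio stays within the stated limit."""
    return hi / lo <= max_ratio


cycle = build_cycle()
for s in cycle:
    kind = "analytical" if s.analytical else "equilibration"
    print(f"{s.polarity} m/z {s.low_mz}-{s.high_mz} @ {s.resolving_power} ({kind})")

# Both chosen ranges sit exactly at a ratio of four; a single 100-1200 range would not.
print(single_injection(100, 400), single_injection(300, 1200), single_injection(100, 1200))
```

Splitting 100-1200 Th into two overlapping ranges with a ratio of exactly four is what keeps each scan within the single-injection limit.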


Figure 2.2. Representation of the extended polarity switching during data collection (spatial resolution of 50 μm by 100 μm) and analysis. The center image is overlaid onto an optical image of the sampled ROI collected on Day 8 of the WT infection. In the parsed data, each pixel in the x direction spans one acquisition of both m/z ranges in both polarities, so the parsed pixel width is 4× the collected step size (200 μm). The resolution in the y direction remains as collected. +L is the 100-400 Th range in positive-ion mode and -L is the 100-400 Th range in negative-ion mode. +H is the 300-1200 Th range in positive-ion mode and -H is the 300-1200 Th range in negative-ion mode.

The center image in Figure 2.2 is a composite of four ion images collected simultaneously using the alternating acquisition method described in Figure 2.1. Because the four acquisitions are interleaved across the whole ROI, information for every m/z range and polarity is collected in a single image rather than by sampling each m/z range and polarity on a separate tissue piece.

This drastically reduced data collection time while still capturing the requisite information, albeit at a lower spatial resolution in the x direction. MSiReader19,20 was used to visualize the parsed data for each m/z range and polarity individually and to find compounds that form images of interest, such as ion distributions correlated positively or negatively with the visible lesion(s). The parsed data were then used to analyze features on individual days of the experiment, or stitched together to track changes over the longitudinal study.
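The parsing step can be illustrated with a small Python sketch. It assumes that each row of the raw acquisition is a sequence of analytical scans in the repeating order +L, +H, -H, -L (equilibration scans already discarded) and that every row starts a new cycle, which is what the multiple-of-four ROI width guarantees. Under those assumptions, x position x belongs to mode x mod 4 and becomes parsed pixel x // 4 of that mode's image. The function `demultiplex` and the reduction of each scan to a single intensity value are hypothetical simplifications; in this work the parsing was done with the imzML converter.

```python
from __future__ import annotations

MODES = ("+L", "+H", "-H", "-L")  # acquisition order from Figure 2.1


def demultiplex(raw_rows: list[list[float]], modes=MODES) -> dict[str, list[list[float]]]:
    """Split interleaved rows into one image per polarity/m/z-range mode.

    raw_rows[y][x] holds a value (e.g., one ion's intensity) from the analytical
    scan acquired at laser position x in row y.
    """
    n = len(modes)
    images: dict[str, list[list[float]]] = {m: [] for m in modes}
    for row in raw_rows:
        if len(row) % n:
            raise ValueError("row length must be a multiple of the number of modes")
        for k, mode in enumerate(modes):
            images[mode].append(row[k::n])  # every n-th scan, starting at offset k
    return images


# A toy 2-row acquisition with 8 positions per row (two complete cycles).
raw = [[1, 10, 100, 1000, 2, 20, 200, 2000],
       [3, 30, 300, 3000, 4, 40, 400, 4000]]
for mode, img in demultiplex(raw).items():
    print(mode, img)
```

Each output image has one quarter of the raw x positions, which corresponds to the 200 μm parsed pixel width described for Figure 2.2.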

2.2.4 Feature Identification

To analyze the collected IR-MALDESI images, we used MSiReader (version 1.01)19,20. To transform raw data collected with the extended polarity switching method into the correct format for analysis, the raw data were first converted to mzML21 files and then parsed manually using the imzML converter22,23. imzML files associate each spectrum with its x and y position, allowing the raw data to be rendered as images. Candidate features were generated using the MSiPeakfinder tool and the structural similarity (SSIM) algorithm24,25 as implemented in MSiReader. The SSIM parameters were kept at their defaults24: a Gaussian radius of 1.5 and exponents α = β = γ = 1.
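For readers unfamiliar with SSIM, the Python sketch below computes the structural similarity index between two equally sized ion images with α = β = γ = 1, using the constants K1 = 0.01 and K2 = 0.03 from Wang et al.25 and intensities scaled to [0, 1]. It is a deliberately simplified, single-window (global) version; MSiReader applies a Gaussian-weighted local window (radius 1.5 here), so its values will differ. The function name `global_ssim` is hypothetical.

```python
from __future__ import annotations


def global_ssim(img_a: list[list[float]], img_b: list[list[float]],
                dynamic_range: float = 1.0, k1: float = 0.01, k2: float = 0.03) -> float:
    """Single-window SSIM with alpha = beta = gamma = 1 (and C3 = C2 / 2).

    Both images must contain the same number of pixels (at least two);
    intensities are assumed to be scaled to [0, dynamic_range].
    """
    a = [v for row in img_a for v in row]
    b = [v for row in img_b for v in row]
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("images must have the same size and at least two pixels")
    n = len(a)
    mu_a, mu_b = sum(a) / n, sum(b) / n
    var_a = sum((x - mu_a) * (x - mu_a) for x in a) / (n - 1)
    var_b = sum((y - mu_b) * (y - mu_b) for y in b) / (n - 1)
    cov = sum((x - mu_a) * (y - mu_b) for x, y in zip(a, b)) / (n - 1)
    c1 = (k1 * dynamic_range) ** 2
    c2 = (k2 * dynamic_range) ** 2
    luminance = (2 * mu_a * mu_b + c1) / (mu_a * mu_a + mu_b * mu_b + c1)
    # With beta = gamma = 1 and C3 = C2 / 2, contrast and structure combine into one term.
    contrast_structure = (2 * cov + c2) / (var_a + var_b + c2)
    return luminance * contrast_structure


# Toy 3 x 3 "ion images": a signal centred on a lesion, and its inverse.
lesion = [[0.0, 0.2, 0.0], [0.2, 1.0, 0.2], [0.0, 0.2, 0.0]]
inverse = [[1.0 - v for v in row] for row in lesion]
print(global_ssim(lesion, lesion))   # identical images score 1.0
print(global_ssim(lesion, inverse))  # anti-correlated images score below zero
```

A negative score for the inverted image mirrors the idea of searching for ion distributions that are negatively correlated with a lesion.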


Features identified with either the MSiPeakfinder or SSIM algorithms were searched against compound databases, including METLIN26, KEGG27–29, and PlantCyc30, and were further examined by manual structure elucidation using carbon estimation, sulfur counting, spectral accuracy31, and targeted tandem MS fragmentation32.

2.2.5 Follow-on targeted tandem mass spectrometry

A set of m/z values of interest from the initial study was chosen for targeted tandem mass spectrometry analysis. New samples inoculated with the WT fungus were prepared as described above and allowed to infect the barley for 7 days. Each m/z value, in a single polarity and mass range, was imaged across a whole lesion. The data were collected using the AIF-MS/MS setting in the Thermo Tune software, with scans centered at the m/z of interest and a ±2 Th tolerance window. In this method, each IR-MALDESI measurement was collected at a resolving power of 140 k (FWHM at m/z 200), and the HCD normalized collision energy was set to 20%.

Fragmentation patterns were compared with those available online from METLIN26 and MetFrag33.

2.3 RESULTS AND DISCUSSION

Magnaporthe oryzae is thought to produce a diverse array of secondary metabolites that are important during infection of the plant34,35. While experiencing biotic stress from the fungal attack, the plant also produces defense metabolites. Covering these diverse metabolites, from both the fungus and the host plant, across multiple mass ranges and ionization polarities is important for understanding how both metabolisms change as the infection progresses. Such understanding could reveal possible antifungal targets or genetic modifications that make crops more resilient.

Figure 2.3. A) Positive ionization mode and B) negative ionization mode generated the same feature. Two protons are equivalent to a 2.0146 Da mass change, and the difference between the m/z values in A and B is 2.0148 Da. Given that the mass measurement accuracy of the instrument is under 3 ppm, this strongly supports that both images represent the same feature. C) Corresponding optical images, including the ROI, for the days of the infection. Days 2 and 3 were not sampled due to their lack of visible lesions.

One benefit of using this extended polarity switching method for untargeted metabolite imaging studies is that it detected both low and high molecular weight features that ionize in both positive and negative mode. The ability of a compound to ionize in both modes suggests that the molecule contains at least one acidic and one basic site. The images in Figure 2.3 were discovered using the MSiPeakfinder tool and then compared after similar distributions of the features were noticed. Because Figure 2.3A was collected in positive mode and Figure 2.3B in negative mode, the two m/z values should differ by the mass of two protons if they arise from the same molecule ([M+H]+ and [M-H]-). Finding features that ionize in both modes allowed us to narrow down the list of possible identifications, because both spectra could be used for manual structural elucidation, as shown in Figure 2.4A.
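The two-proton check is simple arithmetic: for a neutral molecule M, the [M+H]+ and [M-H]- ions differ by twice the proton mass (2 × 1.00727646 Da ≈ 2.0146 Da). The Python sketch below converts each m/z back to the neutral mass it implies and reports the disagreement in ppm, applying the 3 ppm figure quoted for the instrument as a tolerance. The function names and the choice to express the tolerance on the neutral mass are illustrative assumptions, and because the measured m/z values behind Figure 2.3 are not given in the text, the example uses theoretical values for serotonin (C10H12N2O) rather than data.

```python
from __future__ import annotations

PROTON_MASS = 1.00727646  # Da


def polarity_pair_error(mz_pos: float, mz_neg: float) -> tuple[float, float]:
    """Compare an [M+H]+ m/z with an [M-H]- m/z.

    Returns (observed m/z difference, ppm disagreement of the implied neutral masses).
    """
    m_from_pos = mz_pos - PROTON_MASS  # neutral mass implied by [M+H]+
    m_from_neg = mz_neg + PROTON_MASS  # neutral mass implied by [M-H]-
    error_ppm = (m_from_pos - m_from_neg) / m_from_neg * 1e6
    return mz_pos - mz_neg, error_ppm


def same_neutral(mz_pos: float, mz_neg: float, tol_ppm: float = 3.0) -> bool:
    """True if both ions are consistent with one neutral mass within tol_ppm."""
    return abs(polarity_pair_error(mz_pos, mz_neg)[1]) <= tol_ppm


# Theoretical (not measured) ions of serotonin, monoisotopic mass 176.094963 Da.
mz_pos = 176.094963 + PROTON_MASS
mz_neg = 176.094963 - PROTON_MASS
print(2 * PROTON_MASS, polarity_pair_error(mz_pos, mz_neg), same_neutral(mz_pos, mz_neg))
```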


Figure 2.4. A) Experimental MS1 data (accurate mass measurements) and C and S counting for putative assignment of this feature as serotonin. Mass measurement accuracy (MMA) is used to limit the number of potential candidates based solely on the experimental m/z values. Moreover, the peaks in the isotopic cluster can be confidently assigned based on their respective m/z values; mass measurement errors of 1.13 ppm, 1.23 ppm, and 1.12 ppm indicate that this isotopic distribution belongs to this feature. B) Experimental MS2 data for putative identification of serotonin at an HCD normalized collision energy of 20%. Fragments were identified by comparing them to the product ion spectra found in METLIN26.

An example of manual structure elucidation for putative identification is shown in Figure 2.4, where an experimental isotopic distribution was identified for serotonin.

Serotonin (C10H12N2O), a tryptophan derivative, is a metabolite produced in cells near the infection penetration site. It has a variety of roles in plants, including fortifying cell walls, and it may act as a physical barrier during the pathogenesis of M. oryzae36–38. As seen in Figure 2.4A, the experimental isotopic distribution for this feature was found at m/z 177.1024 (100%), m/z 178.1058 (10.6%), and m/z 179.1091 (0.44%) for M, M+1, and M+2, respectively. Using the carbon estimation equation in Figure 2.4A and the ~1.1% natural abundance of 13C reported by NIST and IUPAC39, the feature was estimated to contain 10 carbons. No sulfur M+2 peak was identified in the experimental data at m/z 179.0982, which excludes the presence of sulfur in this compound. On the basis of MMA (3 ppm or below is deemed acceptable for the Q Exactive Plus Orbitrap mass analyzer) and spectral accuracy31, these data putatively identify this compound as serotonin. Owing to the distribution of this feature, its putative identification as serotonin, and the significance of serotonin in plant defense during an infection, it was chosen for further analysis via tandem mass spectrometry. The product ion spectrum shown in Figure 2.4B was first analyzed with MetFrag to identify compounds from KEGG with the same mass as the feature identified with the MS1 data. There were 3 candidates on the list: serotonin, N-hydroxytryptamine, and 5-(N-methyl-4,5-dihydro-1H-pyrrol-2-yl)pyridin-2-ol. All three candidates could be ionized in both positive and negative mode and have 10 carbons. Any candidate that could not ionize in both polarities or did not have between 9 and 11 carbons, based on the carbon estimate, was removed from consideration. Agreement between the calculated and experimental fragment m/z values strongly suggested that the feature was serotonin or N-hydroxytryptamine. Both m/z 115.0544 and m/z 132.0808 could have belonged to either candidate; however, the position of the hydroxyl group in the fragment at m/z 160.0756 allowed the two candidates to be distinguished, resulting in confident identification of this feature as serotonin.
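The isotope reasoning above can be reproduced with a short Python sketch. It estimates the carbon count as the M+1 relative abundance divided by the ~1.1% natural abundance of 13C (a common simplification that ignores the smaller 15N, 2H, and 17O contributions, so it may differ slightly from the equation in Figure 2.4A), and it computes where the 13C2 and 34S contributions to the M+2 peak would fall. The isotope mass differences are standard values; the function names are hypothetical.

```python
from __future__ import annotations

C13_ABUNDANCE = 0.011   # ~1.1 % natural abundance of 13C (simplified)
DELTA_13C = 1.003355    # Da, mass difference 13C - 12C
DELTA_34S = 1.995796    # Da, mass difference 34S - 32S


def estimate_carbons(rel_m1_percent: float) -> float:
    """Approximate carbon count from the M+1 abundance relative to M (= 100 %)."""
    return rel_m1_percent / 100 / C13_ABUNDANCE


def m2_positions(mono_mz: float, charge: int = 1) -> dict[str, float]:
    """Expected m/z of the 13C2 and 34S contributions to the M+2 peak."""
    return {"13C2": mono_mz + 2 * DELTA_13C / charge,
            "34S": mono_mz + DELTA_34S / charge}


n_c = estimate_carbons(10.6)  # serotonin feature: M+1 observed at 10.6 %
window = (round(n_c) - 1, round(n_c) + 1)
print(f"estimated carbons: {n_c:.1f} -> keep candidates with {window[0]}-{window[1]} C")
for label, mz in m2_positions(177.1024).items():
    print(f"{label} M+2 expected at m/z {mz:.4f}")
```

The 13C2 position lands on the observed m/z 179.1091 peak, and the 34S position reproduces the m/z 179.0982 value at which no peak was found, consistent with the ±1 carbon window of 9-11 used to filter candidates.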

Serotonin, along with other pertinent phenolic secondary metabolites, is important for the defense of barley against the attacking M. oryzae. However, the fungus also produces compounds to facilitate entry into and invasion of its host. Although many biological pathways may contribute to this function, the biosynthesis of melanin, as seen in Figure 2.5, plays a crucial role in the pathogenicity of M. oryzae. Melanin is also a significant component of the fungal cell wall.

Figure 2.5. Fungal biosynthesis of melanin with corresponding MSI images. The images were found by analyzing the negative-ion mode 100-400 Th range on Day 5 of the longitudinal study. All m/z values correspond to [M-H]- ions within a mass tolerance of ±2.5 ppm.

During appressorium formation and hyphal invasion, the deposition of a melanin layer is critical for successful infection. Figure 2.5 shows the metabolites within the melanin pathway together with images at their respective m/z values. These metabolites were observed in negative ionization mode because they lack sites available for protonation. The images were obtained from the -L data from day four of the study, and the metabolites were putatively identified using METLIN m/z matching and MS1 manual elucidation as described for Figure 2.4A.
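The METLIN m/z matching used here amounts to computing the theoretical [M-H]- m/z of each candidate and asking whether a measured peak lies within the ±2.5 ppm tolerance. The Python sketch below does this from elemental formulas using standard monoisotopic masses, with 1,8-dihydroxynaphthalene (C10H8O2) and scytalone (C10H10O3) as examples. The minimal formula parser (no brackets, charges, or isotopes) and the function names are hypothetical.

```python
from __future__ import annotations
import re

MONO = {"C": 12.0, "H": 1.00782503, "N": 14.00307401,
        "O": 15.99491462, "S": 31.97207117}  # monoisotopic masses, Da
PROTON_MASS = 1.00727646  # Da


def mono_mass(formula: str) -> float:
    """Monoisotopic mass of a simple formula such as 'C10H8O2' (no brackets)."""
    mass = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        mass += MONO[element] * (int(count) if count else 1)
    return mass


def deprotonated_mz(formula: str) -> float:
    """Theoretical m/z of [M-H]-: remove one proton (H+), not one hydrogen atom."""
    return mono_mass(formula) - PROTON_MASS


def ppm_window(mz: float, tol_ppm: float = 2.5) -> tuple[float, float]:
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def ppm_error(measured: float, theoretical: float) -> float:
    return (measured - theoretical) / theoretical * 1e6


for name, formula in [("1,8-dihydroxynaphthalene", "C10H8O2"), ("scytalone", "C10H10O3")]:
    mz = deprotonated_mz(formula)
    lo, hi = ppm_window(mz)
    print(f"{name}: [M-H]- m/z {mz:.4f} (±2.5 ppm window {lo:.4f}-{hi:.4f})")
```

Applied to the m/z 159.0452 value reported for dihydroxynaphthalene in Figure 2.6, `ppm_error(159.0452, deprotonated_mz("C10H8O2"))` gives an error well inside the ±2.5 ppm tolerance.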

As described in the methods, the data from individual days were combined to visualize longitudinal changes in a single metabolite. Using the imzML converter, we stitched the +L, +H, -H, and -L datasets from each day into single longitudinal images, as seen in Figure 2.6.
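Stitching daily images into one longitudinal panel can be sketched as horizontal concatenation of the per-day parsed images after padding them to a common height. Padding matters here because the day-1 ROI (3.2 mm × 2.0 mm) is shorter than the later ROIs (3.2 mm × 3.4 mm); at 200 μm × 100 μm parsed pixels these are 16 × 20 and 16 × 34 pixel images. The Python function below, its zero fill value, the one-column separator, and top alignment are illustrative assumptions, not the behavior of the imzML converter.

```python
from __future__ import annotations


def stitch_days(day_images: list[list[list[float]]], fill: float = 0.0,
                gap: int = 1) -> list[list[float]]:
    """Concatenate per-day images left to right.

    Shorter images are padded at the bottom with `fill`, and `gap` fill-valued
    columns separate consecutive days.
    """
    height = max(len(img) for img in day_images)
    stitched: list[list[float]] = [[] for _ in range(height)]
    for d, img in enumerate(day_images):
        width = len(img[0]) if img else 0
        for y in range(height):
            row = img[y] if y < len(img) else [fill] * width
            if d > 0:
                stitched[y].extend([fill] * gap)
            stitched[y].extend(row)
    return stitched


# Parsed sizes implied by the text: 3.2 mm / 200 um = 16 columns; 2.0 or 3.4 mm / 100 um rows.
day1 = [[1.0] * 16 for _ in range(20)]
day4 = [[2.0] * 16 for _ in range(34)]
panel = stitch_days([day1, day4])
print(len(panel), "rows x", len(panel[0]), "columns")
```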


Figure 2.6. Longitudinal comparison of WT, E3 ligase KO, and Control. A) Optical images of barley blade ROIs. B) MS images of dihydroxynaphthalene (m/z 159.0452), an important metabolite in the melanin pathway.

The optical images collected for each ROI are shown in Figure 2.6A, and the combined -L data are shown in Figure 2.6B, for each treatment (WT, E3 ligase KO, and Control) as a function of the day the leaves were sampled. The top rows of Figures 2.6A and 2.6B contain the WT M. oryzae treated barley blade samples. In the WT-treated samples, lesions were first observed on the fourth day after inoculation. On days five through eight, clear lesions (optical images, Figure 2.6A) were imaged in each ROI. According to the MS images, dihydroxynaphthalene is largely absent until day 5, suggesting limited formation of the melanized appressorium and invasive hyphae before day 5. On days five through eight, there was clear evidence of scytalone in each imaged ROI, supporting an active melanin pathway on each day and matching what is observed in the optical images.

The middle rows of Figures 2.6A and 2.6B contain the E3 ligase KO M. oryzae treated barley blade samples. Importantly, there are no visible infection lesions on any day. The E3 ligase KO fungus is unable to produce the appressorium needed to infect the plant, so the E3 ligase KO images look similar to the Control images.

Little scytalone is present on any day, especially in a lesion-like distribution. Because the E3 ligase KO fungus does not form an appressorium, metabolites of the melanin pathway may not be detected. Furthermore, because the fungus did not successfully colonize the plant, more plant-based metabolites relative to fungal metabolites were observed.

The bottom rows of Figures 2.6A and 2.6B contain the Control treated barley blade samples. As noted, there are no visible infection lesions on any day, and there is also little scytalone on any day in this series, especially in a lesion-like distribution. This is expected, as no fungal treatment was introduced to these samples and therefore no appressoria formed. Importantly, this negative control supports that no cross-contamination with the WT occurred.

2.4 CONCLUSIONS

An infection of barley by WT M. oryzae and E3 ligase KO M. oryzae was thoroughly investigated using an IR-MALDESI imaging system coupled to a high resolution accurate mass (HRAM) MS platform. A modified polarity switching method was developed to capture as many metabolites as possible and to determine compounds associated with both barley and M. oryzae during the infection. Comparisons were made among the WT, E3 ligase KO, and control treated plants. Some biologically significant features, such as serotonin and metabolites in the melanin pathway, were identified in the WT-infected plants by searching databases and verifying with manual elucidation and, for serotonin, tandem mass spectrometry analysis.

The plants infected with the E3 ligase KO M. oryzae did not show features suggesting infection-related metabolites or plant response metabolites. Many features with interesting distributions were detected using accurate mass measurements; however, they remain unidentified because their low abundance prevented the use of spectral accuracy to determine their elemental compositions more confidently.

2.5 ACKNOWLEDGMENTS

The mass spectrometry measurements were carried out in the Molecular Education, Technology, and Research Innovation Center (METRIC) at North Carolina State University. The authors gratefully acknowledge the financial support received from the U.S. Department of Agriculture (National Institute of Food and Agriculture) Grant Number 2014-67013-21722, and North Carolina State University.

2.6 LITERATURE CITED

(1) Sampson, J. S.; Hawkridge, A. M.; Muddiman, D. C. Generation and Detection of Multiply-Charged Peptides and Proteins by Matrix-Assisted Laser Desorption Electrospray Ionization (MALDESI) Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2006, 17 (12), 1712–1716. https://doi.org/10.1016/j.jasms.2006.08.003.

(2) Judd, R.; Bagley, M. C.; Li, M.; Zhu, Y.; Lei, C.; Yuzuak, S.; Ekelöf, M.; Pu, G.; Zhao, X.; Muddiman, D. C.; Xie, D.-Y. Artemisinin Biosynthesis in Non-Glandular Trichome Cells of Artemisia Annua. Mol. Plant 2019, 12 (5), 704–714. https://doi.org/10.1016/J.MOLP.2019.02.011.

(3) Bokhart, M. T.; Muddiman, D. C. Infrared Matrix-Assisted Laser Desorption Electrospray Ionization Mass Spectrometry Imaging Analysis of Biospecimens. Analyst 2016, 141 (18), 5236–5245. https://doi.org/10.1039/C6AN01189F.

(4) Nazari, M.; Bokhart, M. T.; Muddiman, D. C. Whole-Body Mass Spectrometry Imaging by Infrared Matrix-Assisted Laser Desorption Electrospray Ionization (IR-MALDESI). J. Vis. Exp. 2016, No. 109, e53942. https://doi.org/10.3791/53942.

(5) Loziuk, P.; Meier, F.; Johnson, C.; Ghashghaei, H. T.; Muddiman, D. C. TransOmic Analysis of Forebrain Sections in Sp2 Conditional Knockout Embryonic Mice Using IR-MALDESI Imaging of Lipids and LC-MS/MS Label-Free Proteomics. Anal. Bioanal. Chem. 2016, 408 (13), 3453–3474. https://doi.org/10.1007/s00216-016-9421-3.

(6) Robichaud, G.; Barry, J. A.; Muddiman, D. C. IR-MALDESI Mass Spectrometry Imaging of Biological Tissue Sections Using Ice as a Matrix. J. Am. Soc. Mass Spectrom. 2014, 25 (3), 319–328. https://doi.org/10.1007/s13361-013-0787-6.

(7) Fenn, J. B.; Mann, M.; Meng, C. K.; Wong, S. F.; Whitehouse, C. M. Electrospray Ionization for Mass Spectrometry of Large Biomolecules. Science 1989, 246 (4926), 64–71.

(8) Nazari, M.; Muddiman, D. C. Polarity Switching Mass Spectrometry Imaging of Healthy and Cancerous Hen Ovarian Tissue Sections by Infrared Matrix-Assisted Laser Desorption Electrospray Ionization (IR-MALDESI). Analyst 2015, 141, 595. https://doi.org/10.1039/c5an01513h.

(9) Boughton, B. A.; Thinagaran, D.; Sarabia, D.; Bacic, A.; Roessner, U. Mass Spectrometry Imaging for Plant Biology: A Review. Phytochem. Rev. 2016, 15, 445–488. https://doi.org/10.1007/s11101-015-9440-2.

(10) Lee, Y. J.; Perdian, D. C.; Song, Z.; Yeung, E. S.; Nikolau, B. J. Use of Mass Spectrometry for Imaging Metabolites in Plants. Plant J. 2012, 70 (1), 81–95. https://doi.org/10.1111/j.1365-313X.2012.04899.x.

(11) Umadevi, M.; Pushpa, R.; Sampathkumar, K. P.; Bhowmik, D. Rice-Traditional Medicinal Plant in India. J. Pharmacogn. Phytochem. 2012, 1 (1).

(12) Greer, C. A.; Webster, R. K. Occurrence, Distribution, Epidemiology, Cultivar Reaction, and Management of Rice Blast Disease in California. Plant Dis. 2001, 85 (10), 1096.

(13) Couch, B. C.; Kohn, L. M. A Multilocus Gene Genealogy Concordant with Host Preference Indicates Segregation of a New Species, Magnaporthe Oryzae, from M. Grisea. Mycologia 2002, 94 (4), 683. https://doi.org/10.2307/3761719.

(14) Dean, R. A.; Talbot, N. J.; Ebbole, D. J.; Farman, M. L.; Mitchell, T. K.; Orbach, M. J.; Thon, M.; Kulkarni, R.; Xu, J.-R.; Pan, H.; Read, N. D.; Lee, Y.-H.; Carbone, I.; Brown, D.; Oh, Y. Y.; Donofrio, N.; Jeong, J. S.; Soanes, D. M.; Djonovic, S.; Kolomiets, E.; Rehmeyer, C.; Li, W.; Harding, M.; Kim, S.; Lebrun, M.-H.; Bohnert, H.; Coughlan, S.; Butler, J.; Calvo, S.; Ma, L.-J.; Nicol, R.; Purcell, S.; Nusbaum, C.; Galagan, J. E.; Birren, B. W. The Genome Sequence of the Rice Blast Fungus Magnaporthe Grisea. Nature 2005, 434 (7036), 980–986. https://doi.org/10.1038/nature03449.

(15) Oh, Y.; Franck, W.; Dean, R. Functional Analysis of Protein Ubiquitination in the Rice Blast Fungus Magnaporthe Oryzae. In 12th European Conference on Fungal Genetics; Seville, Spain, 2014; p P103.

(16) Talbot, N. J. On the Trail of a Cereal Killer: Exploring the Biology of Magnaporthe Grisea. Annu. Rev. Microbiol. 2003, 57 (1), 177–202. https://doi.org/10.1146/annurev.micro.57.030502.090957.

(17) Talbot, N. J.; Wilson, R. A. Under Pressure: Investigating the Biology of Plant Infection by Magnaporthe Oryzae. 2009.

(18) Ekelöf, M.; Manni, J.; Nazari, M.; Bokhart, M.; Muddiman, D. C. Characterization of a Novel Miniaturized Burst-Mode Infrared Laser System for IR-MALDESI Mass Spectrometry Imaging. Anal. Bioanal. Chem. 2018, 410 (9), 2395–2402. https://doi.org/10.1007/s00216-018-0918-9.


(19) Robichaud, G.; Garrard, K. P.; Barry, J. A.; Muddiman, D. C. MSiReader: An Open-Source Interface to View and Analyze High Resolving Power MS Imaging Files on Matlab Platform. J. Am. Soc. Mass Spectrom. 2013, 24 (5), 718–721. https://doi.org/10.1007/s13361-013-0607-z.

(20) Bokhart, M. T.; Nazari, M.; Garrard, K. P.; Muddiman, D. C. MSiReader v1.0: Evolving Open-Source Mass Spectrometry Imaging Software for Targeted and Untargeted Analyses. J. Am. Soc. Mass Spectrom. 2018, 29 (1), 8–16. https://doi.org/10.1007/s13361-017-1809-6.

(21) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P. ProteoWizard: Open Source Software for Rapid Proteomics Tools Development. Bioinformatics 2008, 24 (21), 2534–2536. https://doi.org/10.1093/bioinformatics/btn323.

(22) Schramm, T.; Hester, A.; Klinkert, I.; Both, J.-P.; Heeren, R. M. A.; Brunelle, A.; Laprévote, O.; Desbenoit, N.; Robbe, M.-F.; Stoeckli, M.; Spengler, B.; Römpp, A. ImzML — A Common Data Format for the Flexible Exchange and Processing of Mass Spectrometry Imaging Data. J. Proteomics 2012, 75 (16), 5106–5110. https://doi.org/10.1016/j.jprot.2012.07.026.

(23) Race, A. M.; Styles, I. B.; Bunch, J. Inclusive Sharing of Mass Spectrometry Imaging Data Requires a Converter for All. J. Proteomics 2012, 75 (16), 5111–5112. https://doi.org/10.1016/j.jprot.2012.05.035.

(24) Ekelöf, M.; Garrard, K. P.; Judd, R.; Rosen, E. P.; Xie, D.-Y.; Kashuba, A. D. M.; Muddiman, D. C. Evaluation of Digital Image Recognition Methods for Mass Spectrometry Imaging Data Analysis. J. Am. Soc. Mass Spectrom. 2018, 29 (12), 2467–2470. https://doi.org/10.1007/s13361-018-2073-0.

(25) Wang, Z.; Bovik, A. C.; Sheikh, H. R.; Simoncelli, E. P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process. 2004, 13 (4), 600–612. https://doi.org/10.1109/TIP.2003.819861.

(26) Smith, C. A.; O’Maille, G.; Want, E. J.; Qin, C.; Trauger, S. A.; Brandon, T. R.; Custodio, D. E.; Abagyan, R.; Siuzdak, G. METLIN: A Metabolite Mass Spectral Database. Ther. Drug Monit. 2005, 27 (6), 747–751.

(27) Kanehisa, M.; Furumichi, M.; Tanabe, M.; Sato, Y.; Morishima, K. KEGG: New Perspectives on Genomes, Pathways, Diseases and Drugs. Nucleic Acids Res. 2017, 45 (D1), D353–D361. https://doi.org/10.1093/nar/gkw1092.

(28) Kanehisa, M.; Goto, S. KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28 (1), 27–30. https://doi.org/10.1093/nar/28.1.27.

(29) Kanehisa, M.; Sato, Y.; Kawashima, M.; Furumichi, M.; Tanabe, M. KEGG as a Reference Resource for Gene and Protein Annotation. Nucleic Acids Res. 2016, 44 (D1), D457–D462. https://doi.org/10.1093/nar/gkv1070.

(30) Schläpfer, P.; Zhang, P.; Wang, C.; Kim, T.; Banf, M.; Chae, L.; Dreher, K.; Chavali, A. K.; Nilo-Poyanco, R.; Bernard, T.; Kahn, D.; Rhee, S. Y. Genome-Wide Prediction of Metabolic Enzymes, Pathways, and Gene Clusters in Plants. Plant Physiol. 2017, 173 (4), 2041–2059.

(31) Khodjaniyazova, S.; Nazari, M.; Garrard, K. P.; Matos, M. P. V.; Jackson, G. P.; Muddiman, D. C. Characterization of the Spectral Accuracy of an Orbitrap Mass Analyzer Using Isotope Ratio Mass Spectrometry. Anal. Chem. 2017. https://doi.org/10.1021/acs.analchem.7b03983.

(32) Barry, J. A.; Robichaud, G.; Bokhart, M. T.; Thompson, C.; Sykes, C.; Kashuba, A. D. M.; Muddiman, D. C. Mapping Antiretroviral Drugs in Tissue by IR-MALDESI MSI Coupled to the Q Exactive and Comparison with LC-MS/MS SRM Assay. J. Am. Soc. Mass Spectrom. 2014, 25 (12), 2038–2047. https://doi.org/10.1007/s13361-014-0884-1.

(33) Ruttkies, C.; Schymanski, E. L.; Wolf, S.; Hollender, J.; Neumann, S. MetFrag Relaunched: Incorporating Strategies beyond in Silico Fragmentation. J. Cheminform. 2016, 8 (1), 3. https://doi.org/10.1186/s13321-016-0115-9.

(34) Jacob, S.; Grötsch, T.; Foster, A. J.; Schüffler, A.; Rieger, P. H.; Sandjo, L. P.; Liermann, J. C.; Opatz, T.; Thines, E. Unravelling the Biosynthesis of Pyriculol in the Rice Blast Fungus Magnaporthe Oryzae. Microbiology 2017, 163 (4), 541–553. https://doi.org/10.1099/mic.0.000396.

(35) Horbach, R.; Navarro-Quesada, A. R.; Knogge, W.; Deising, H. B. When and How to Kill a Plant Cell: Infection Strategies of Plant Pathogenic Fungi. J. Plant Physiol. 2011, 168 (1), 51–62. https://doi.org/10.1016/J.JPLPH.2010.06.014.

(36) Hayashi, K.; Fujita, Y.; Ashizawa, T.; Suzuki, F.; Nagamura, Y.; Hayano-Saito, Y. Serotonin Attenuates Biotic Stress and Leads to Lesion Browning Caused by a Hypersensitive Response to Magnaporthe Oryzae Penetration in Rice. Plant J. 2016, 85 (1), 46–56. https://doi.org/10.1111/tpj.13083.

(37) Ishihara, A.; Hashimoto, Y.; Tanaka, C.; Dubouzet, J. G.; Nakao, T.; Matsuda, F.; Nishioka, T.; Miyagawa, H.; Wakasa, K. The Tryptophan Pathway Is Involved in the Defense Responses of Rice against Pathogenic Infection via Serotonin Production. Plant J. 2008, 54 (3), 481–495. https://doi.org/10.1111/j.1365-313X.2008.03441.x.

(38) Fujiwara, T.; Maisonneuve, S.; Isshiki, M.; Mizutani, M.; Chen, L.; Wong, H. L.; Kawasaki, T.; Shimamoto, K. Sekiguchi Lesion Gene Encodes a Cytochrome P450 Monooxygenase That Catalyzes Conversion of Tryptamine to Serotonin in Rice. J. Biol. Chem. 2010, 285 (15), 11308–11313. https://doi.org/10.1074/jbc.M109.091371.

(39) Berglund, M.; Wieser, M. E. Isotopic Compositions of the Elements 2009 (IUPAC Technical Report). Pure Appl. Chem. 2011, 83 (2), 397–410. https://doi.org/10.1351/PAC-REP-10-06-02.


Chapter 3

Comparative Proteomic Analysis of Wild Type and Mutant Lacking a SCF E3 Ligase F-box Protein in Magnaporthe oryzae

Reused with permission from: Kalmar, J. G.; Oh, Y.; Dean, R. A.; Muddiman, D. C. J. Proteome Res. 2020, 19 (9), 3761-3768. © American Chemical Society 2020

3.1 INTRODUCTION

Proteomics is a diverse field that provides a unique opportunity to understand the regulation, expression, and function of proteins in an organism. Proteomics is now a viable research strategy for generating new insight into fungal biology as a result of the recent sequencing of many fungal genomes1–7. Magnaporthe oryzae (M. oryzae) is a pathogenic, filamentous fungus that is a primary cause of rice blast disease and is widely regarded as a key model for studying host-pathogen interactions. Rice blast disease causes losses of 10-30% of the annual rice harvest8. The genome of M. oryzae was sequenced in 2005, providing a foundation for many additional ‘OMIC’ analyses aimed at understanding its pathogenic features5,9. Proteomics has gained considerable interest in recent years, in large part due to technical advances for evaluating biological pathways, protein-protein interactions, disease targets, and biomarkers. Protein function is known to be affected by various structural modifications; thus, evaluation of post-translational modifications (PTMs) is vital10–13. Phosphorylation is a very common PTM that occurs predominantly on serine, threonine, and tyrosine residues, and more recent studies have demonstrated that it also occurs on lysine residues14. It has been linked to protein activation and deactivation in signaling cascades and to other functions such as regulating the cell cycle and protein-protein interactions. Recently, phosphorylation has also been associated with protein degradation within the ubiquitin-proteasome system15,16. Ubiquitination is another common PTM that occurs mainly on lysine residues, with the primary role of sending proteins to the proteasome for degradation. Other functions include cellular localization, protein-protein interaction, NF-κB activation, and DNA repair17–20. It was recently observed that the ubiquitin-proteasome system is linked to plant host-pathogen interactions in some cases21–26.

The M. oryzae protein MGG_13065, an F-box protein of the SCF E3 ubiquitin ligase complex, was identified as important for growth and development, including the infection process27. This protein is part of the SCF E3 complex in the ubiquitin-mediated proteolysis pathway16,28. Because its sequence contains a leucine-rich repeat (LRR) region, it interacts with phosphorylated sites of target proteins, bringing them into close proximity to the E2 ubiquitin-conjugating enzyme so that they can be tagged with ubiquitin. This tagging can occur multiple times, resulting in a polyubiquitin chain. Furthermore, the connectivity of the ubiquitin units in the polyubiquitin chain is thought to determine the fate of the targeted protein20.

These characteristics make MGG_13065 of great interest for better understanding how the ubiquitin-mediated proteolysis pathway is involved in virulence.

To begin to understand how this E3 ligase regulates ubiquitination and development in M. oryzae, we compared the wild type (WT) M. oryzae 70-15 strain with an MGG_13065 gene knockout (E3 ligase KO) strain. To ensure sufficient biological tissue for these studies, we examined mycelia from liquid cultures. We then evaluated differences in overall mycelial protein expression and in phosphorylation and ubiquitination levels using label-free quantitative (LFQ) global proteomics. Monitoring global changes in relative protein abundance can assist in selecting future gene knockout targets and enable mechanistic studies related to virulence.

3.2 MATERIALS AND METHODS

3.2.1 Sample Preparation

Fungal conidia from the M. oryzae wild type 70-15 strain and the MGG_13065 deletion mutant were harvested from 1-week-old V8 agar plates and inoculated into complete liquid medium (10 g sucrose, 1 mL A. nidulans trace elements, 6 g casein acid hydrolysate, and 6 g yeast extract in 1 L). The cultures were grown at 28 °C on a 200 rpm shaker for 3 days. The mycelial tissue was then collected on sterile filter paper, washed three times with sterile distilled water, and divided into 4 equal pieces. Biological replicates were grown by placing one piece of mycelial tissue into each of four flasks containing complete liquid medium. These cultures were grown at 28 °C on a 200 rpm shaker. After 24 h, the mycelial mats were collected, washed with sterile distilled water, and ground into powder in liquid nitrogen. Proteins were extracted using urea lysis buffer containing 8 M urea, 50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 1 mM EDTA (pH 8.0), 1 mM PMSF, 50 µM PR-619, and one cOmplete™ ULTRA Protease Inhibitor Cocktail tablet (Roche, Germany) per 50 mL. The lysate was clarified by centrifugation at 16,000 × g for 15 min. Protein concentration was estimated using the Pierce™ Coomassie (Bradford) assay kit (Thermo Fisher Scientific, Waltham, MA). Samples were stored at −80 °C.


3.2.2 Protein Digestion, Alkylation and De-salting

Proteins in each sample were digested using the filter-aided sample preparation (FASP) method29. Briefly, 250 µg of protein was pipetted onto a 10 kDa centrifugal filter unit, diluted 2-fold with 100 mM dithiothreitol (DTT) in 50 mM Tris buffer (pH 7.0), and incubated at 56 °C for 30 min. The samples were then alkylated with N-ethylmaleimide (NEM; 10 mM final concentration)30 in 50 mM Tris buffer (pH 7.0) and incubated in the dark at room temperature for 30 min. The samples were then concentrated by centrifuging the units for 15 min at 14,000 × g at 20 °C.

The sample in the filter was then washed three times with 400 μL of 8 M urea in 50 mM Tris buffer (pH 7.0), followed by three washes with 400 μL of 2 M urea and 10 mM CaCl2 in 50 mM Tris buffer (pH 7.0). After the final wash, each filter unit was moved to a new collection tube. Each sample was digested overnight at 37 °C with trypsin at a 1:50 enzyme:protein ratio in 2 M urea, 10 mM CaCl2, and 50 mM Tris buffer (pH 7.0). The reaction was quenched by adding 50 µL of 1% formic acid and 0.001% Zwittergent 3-16, and the samples were centrifuged for 15 min at 14,000 × g at 20 °C. The filters were then washed with 400 µL of the quenching solution and centrifuged for 15 min at 14,000 × g into the same collection tube. Finally, each sample was desalted using C18 StageTips31.
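The reagent amounts in this protocol follow from simple arithmetic. The sketch below is a minimal illustration, not the procedure used in this work: the helper name `plan_fasp` and the 2.5 µg/µL input are hypothetical, the Bradford result is assumed to be expressed in µg/µL, and the 2-fold dilution is assumed to be made by adding 100 mM DTT solution to the lysate.

```python
from dataclasses import dataclass


@dataclass
class FaspPlan:
    lysate_volume_ul: float  # lysate volume holding the target protein mass
    dtt_volume_ul: float     # volume of DTT stock added for the dilution
    final_dtt_mm: float      # DTT concentration after dilution
    trypsin_ug: float        # trypsin mass at the chosen enzyme:protein ratio


def plan_fasp(protein_conc_ug_per_ul: float,
              protein_ug: float = 250.0,
              dtt_stock_mm: float = 100.0,
              dilution_factor: float = 2.0,
              enzyme_to_protein: float = 1 / 50) -> FaspPlan:
    """Hypothetical helper: reagent amounts for one FASP digestion."""
    if protein_conc_ug_per_ul <= 0 or dilution_factor <= 1:
        raise ValueError("concentration must be > 0 and dilution factor > 1")
    lysate_ul = protein_ug / protein_conc_ug_per_ul
    # An n-fold dilution with DTT stock adds (n - 1) volumes of stock, so the
    # stock ends up at (n - 1)/n of its original concentration.
    dtt_ul = lysate_ul * (dilution_factor - 1)
    final_dtt = dtt_stock_mm * (dilution_factor - 1) / dilution_factor
    return FaspPlan(lysate_ul, dtt_ul, final_dtt, protein_ug * enzyme_to_protein)


print(plan_fasp(protein_conc_ug_per_ul=2.5))  # hypothetical Bradford result
```

With these inputs, 100 µL of lysate carries the 250 µg load, 100 µL of 100 mM DTT brings the reductant to 50 mM, and a 1:50 ratio calls for 5 µg of trypsin.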

3.2.3 NanoLC-MS/MS using HRAM MS Platform Technology

Global proteomic measurements were performed using reversed-phase nanoLC-MS/MS on an Easy nLC 1200 (Thermo Fisher Scientific, Waltham, MA) coupled to a high-field quadrupole-Orbitrap mass spectrometer (Q Exactive HF, Thermo Fisher Scientific, Bremen, Germany). The tryptic peptides were loaded onto a 20 cm × 75 µm i.d. PicoFrit column (New Objective, Woburn, MA) packed with Phenomenex C18 2.6 µm stationary phase and eluted at a flow rate of 300 nL/min with a 240 min linear gradient (5–35% B). Solvents used to prepare the mobile phases were purchased from Fisher Scientific. Mobile phase A consisted of 98% water, 2% acetonitrile, and 0.01% formic acid. Mobile phase B consisted of 80% acetonitrile, 20% water, and 0.01% formic acid.
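Because mobile phase B is only 80% acetonitrile and A already contains 2%, the programmed %B is not the same as the organic content of the eluent. The short sketch below makes that explicit. It assumes the gradient starts at t = 0, that solvent volumes simply add, and it ignores system dwell volume, equilibration and any wash step, none of which are described here.

```python
def percent_b(t_min: float, start_b: float = 5.0, end_b: float = 35.0,
              gradient_min: float = 240.0) -> float:
    """%B at time t for a linear gradient, clamped to the gradient window."""
    frac = min(max(t_min / gradient_min, 0.0), 1.0)
    return start_b + (end_b - start_b) * frac


def percent_acetonitrile(b_percent: float, acn_in_a: float = 2.0,
                         acn_in_b: float = 80.0) -> float:
    """Approximate % acetonitrile in the eluent, assuming volumes simply add."""
    b = b_percent / 100.0
    return acn_in_a * (1.0 - b) + acn_in_b * b


for t in (0, 60, 120, 180, 240):
    b = percent_b(t)
    print(f"{t:>3} min: {b:4.1f}% B -> {percent_acetonitrile(b):4.1f}% acetonitrile")
```

Under these assumptions the 5–35% B gradient corresponds to roughly 5.9–29.3% acetonitrile.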

Mass spectra were acquired in Top20 data-dependent acquisition (DDA) mode over an m/z range of 375–1600. The resolving power (full width at half maximum at m/z 200) was set to 120,000 for full MS and 15,000 for MS/MS acquisition. The AGC targets for MS and MS/MS were set to 3E6 and 2E5, respectively, and the maximum injection times were set to 50 ms and 60 ms, respectively. Higher-energy collisional dissociation (HCD) was performed at a normalized collision energy of 27. Precursor ions were isolated with a 1.5 m/z isolation window, and dynamic exclusion was set to 20 s. Additionally, charge-state exclusion was used to prevent unassigned and 1+ charge states from being selected for fragmentation.
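The precursor-selection rules above (Top20, charge-state exclusion, 20 s dynamic exclusion) can be sketched as a simplified model. This is not the Q Exactive HF firmware logic, which also handles isotope exclusion, apex triggering and other details; the class names, the 10 ppm exclusion tolerance and the small survey list are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Precursor:
    mz: float
    charge: Optional[int]  # None means the charge state could not be assigned
    intensity: float


@dataclass
class TopNSelector:
    top_n: int = 20
    exclusion_s: float = 20.0
    tol_ppm: float = 10.0  # hypothetical m/z tolerance for exclusion matching
    excluded: List[Tuple[float, float]] = field(default_factory=list)  # (m/z, expiry)

    def _is_excluded(self, mz: float, now: float) -> bool:
        # Forget entries whose exclusion window has elapsed, then match by ppm.
        self.excluded = [(m, exp) for m, exp in self.excluded if exp > now]
        return any(abs(mz - m) / m * 1e6 <= self.tol_ppm for m, _ in self.excluded)

    def select(self, survey: List[Precursor], now: float) -> List[Precursor]:
        """Pick up to top_n precursors from one MS1 survey scan at time `now` (s)."""
        chosen = []
        for p in sorted(survey, key=lambda p: p.intensity, reverse=True):
            if p.charge is None or p.charge == 1:  # charge-state exclusion
                continue
            if self._is_excluded(p.mz, now):        # dynamic exclusion
                continue
            chosen.append(p)
            self.excluded.append((p.mz, now + self.exclusion_s))
            if len(chosen) == self.top_n:
                break
        return chosen


survey = [Precursor(667.8023, 2, 1e6), Precursor(500.25, 1, 5e6),
          Precursor(700.10, None, 4e6), Precursor(800.40, 3, 2e5),
          Precursor(900.10, 2, 1e5)]
sel = TopNSelector(top_n=2)
for now in (0.0, 5.0, 25.0):
    print(now, [p.mz for p in sel.select(survey, now)])
```

In this toy run the most intense ions are skipped because they are 1+ or unassigned; ions fragmented at 0 s are blocked at 5 s and become eligible again once their 20 s window has expired. This stochastic, intensity-driven sampling is also why low-abundance proteins are identified inconsistently in DDA.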

3.2.4 Data Analysis

The LC-MS/MS .RAW files were searched against the concatenated target-reverse M. oryzae database MG8 (Magnaporthe comparative Sequencing Project, Broad Institute of Harvard and MIT, http://www.broadinstitute.org), and peptides were identified using the Sequest HT algorithm in Proteome Discoverer 2.1 (PD 2.1; Thermo Fisher Scientific, Waltham, MA). Searches allowed up to 4 missed trypsin cleavages, with a precursor mass tolerance of 5 ppm and a fragment mass tolerance of 0.02 Da. N-ethylmaleimide on cysteine was set as a fixed modification. Variable modifications included two ubiquitination tags, Gly-Gly on lysine, serine, and threonine and the missed-cleavage tag Leu-Arg-Gly-Gly on lysine, as well as oxidation of methionine and phosphorylation of serine, threonine, and tyrosine. The maximum number of modifications per peptide was set to 4. False discovery rates were calculated using the Percolator node and set to a maximum of 1%. One unique peptide sequence was required for protein identification.
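The missed-cleavage setting determines which candidate peptides the search engine considers. As a minimal sketch, the function below enumerates tryptic peptides with up to a given number of missed cleavages, assuming the common trypsin rule (cleave C-terminal to K or R unless the next residue is P); the exact enzyme definition configured in PD 2.1 is not stated here, and the input sequence is hypothetical. The same rule explains the two ubiquitin tags: trypsin cleavage after Arg74 of ubiquitin leaves a Gly-Gly remnant on the modified residue, whereas a missed cleavage there with cleavage after Arg72 leaves Leu-Arg-Gly-Gly.

```python
import re
from typing import List


def tryptic_peptides(protein: str, max_missed: int = 4, min_len: int = 1) -> List[str]:
    """Enumerate tryptic peptides with up to `max_missed` missed cleavages.

    Assumes the common trypsin rule: cleave C-terminal to K or R unless the
    next residue is P.
    """
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", protein)]
    bounds = sorted(set([0, *sites, len(protein)]))
    peptides = []
    for i in range(len(bounds) - 1):
        # protein[bounds[i]:bounds[j]] skips j - i - 1 cleavage sites
        for j in range(i + 1, min(i + 2 + max_missed, len(bounds))):
            peptide = protein[bounds[i]:bounds[j]]
            if len(peptide) >= min_len:
                peptides.append(peptide)
    return peptides


print(tryptic_peptides("GASKLLRPVMRTE", max_missed=1))
# → ['GASK', 'GASKLLRPVMR', 'LLRPVMR', 'LLRPVMRTE', 'TE']
```

Real search engines additionally restrict peptides by length and precursor mass, so this only illustrates how the candidate space grows with the missed-cleavage allowance.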

PD 2.1 searches the .RAW MS/MS spectra, using the search parameters above, against a FASTA database to determine the amino acid sequences of peptides and maps each distinct sequence to a protein sequence. A peptide that maps to only one protein is referred to as a unique peptide; a shared peptide was treated as a razor peptide (i.e., assigned to the protein for which the most evidence exists). LFQ of the proteins was carried out using peptide spectral matches (PSMs), also known as spectral counts (SpCs), which were normalized across biological and technical replicates using the total normalization technique previously described by Gokce and coworkers32. Proteins whose normalized spectral counts (NSpC) totaled 4 or less across all biological and technical replicates were not considered. Statistical significance was determined using an unpaired Student’s t-test between WT and KO samples with a cutoff of p ≤ 0.05. Fold changes in protein expression were calculated by dividing the average normalized spectral counts of each protein in the KO samples by its average normalized spectral counts in the WT samples. Gokce and coworkers previously suggested defining differential proteins by a ≥ 2-fold change and a p-value less than 0.05 in average normalized spectral counts to reduce the false-positive rate (FPR), based on the number of proteins found32.
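The filtering and testing rules above can be combined into one per-protein decision. The sketch below is a hedged illustration, not the PD 2.1 or Gokce et al. implementation. It assumes each protein has one NSpC value per injection for each strain, treats a KO/WT ratio ≤ 0.5 as a 2-fold decrease, and sets aside proteins seen in only one strain (which the text handles separately). To stay dependency-free, the two-tailed p-value of Student's t uses the standard closed-form series for integer degrees of freedom; the NSpC values are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def t_two_tailed_p(t: float, df: int) -> float:
    """Two-tailed p-value of Student's t for integer df (closed-form series)."""
    theta = math.atan(abs(t) / math.sqrt(df))
    c2 = math.cos(theta) ** 2
    if df % 2 == 0:
        term = total = 1.0
        for k in range(1, df // 2):
            term *= c2 * (2 * k - 1) / (2 * k)
            total += term
        a = math.sin(theta) * total
    else:
        a = theta
        if df > 1:
            term = total = 1.0
            for k in range(1, (df - 1) // 2):
                term *= c2 * (2 * k) / (2 * k + 1)
                total += term
            a += math.sin(theta) * math.cos(theta) * total
        a *= 2 / math.pi
    return max(0.0, 1.0 - a)  # a = P(|T| < |t|)


def student_t_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Unpaired, equal-variance (Student's) t-test; returns (t, p)."""
    nx, ny = len(x), len(y)
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    se = math.sqrt(pooled * (1 / nx + 1 / ny))
    diff = mean(x) - mean(y)
    if se == 0:
        return (0.0, 1.0) if diff == 0 else (math.copysign(math.inf, diff), 0.0)
    t = diff / se
    return t, t_two_tailed_p(t, nx + ny - 2)


def classify(wt: Sequence[float], ko: Sequence[float], min_total: float = 5,
             fc_cut: float = 2.0, p_cut: float = 0.05) -> str:
    """Label one protein from its per-injection NSpC in WT and KO."""
    if sum(wt) + sum(ko) < min_total:  # totals below 5 NSpC are not considered
        return "filtered"
    if mean(wt) == 0 or mean(ko) == 0:
        return "specific to one strain"  # handled separately, no fold change
    fold_change = mean(ko) / mean(wt)  # KO average / WT average
    _, p = student_t_test(ko, wt)
    if p <= p_cut and fold_change >= fc_cut:
        return "increased in KO"
    if p <= p_cut and fold_change <= 1 / fc_cut:
        return "decreased in KO"
    return "not significant"


wt = [10, 11, 9, 10, 12, 10, 9, 11]   # hypothetical NSpC, 8 WT injections
ko = [25, 22, 24, 26, 23, 25, 24, 22]  # hypothetical NSpC, 8 KO injections
print(classify(wt, ko), "|", classify(ko, wt))
```

Whether the t-test is run on all injections or on averaged biological replicates changes the degrees of freedom and is not specified here; the functions accept either.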

3.3 RESULTS AND DISCUSSION

Here we compared the protein profiles and PTMs of the WT strain with those of a strain lacking MGG_13065 (E3 ligase F-box protein, hereafter the E3 ligase protein) under mycelial conditions. Four biological replicates of both the WT and E3 ligase KO fungal strains grown in liquid culture were evaluated, and each sample was injected twice (henceforth referred to as technical replicates). All 16 injections were analyzed by nanoLC-MS/MS using a high-resolution accurate mass (HRAM) mass spectrometer, as depicted in Figure 3.1. Data analysis was then conducted with a ≥ 5 NSpC cutoff, and observed peptides were mapped to their respective proteins.


Figure 3.1: Experimental workflow for the analysis of WT and E3 Ligase KO M. oryzae.

3.3.1 E3 ligase protein expressed in WT but not KO strain

Initially, we demonstrated the expression of the E3 ligase protein in the WT strain and its absence from the E3 ligase KO mutant strain, as shown in Figure 3.2. The overall peptide density of the WT and E3 ligase KO samples was similar.

Total ion chromatograms (TICs) were plotted for both the WT and E3 ligase KO samples (Figure 3.2A and 3.2B). These TICs were comparable, suggesting that the genetic knockout did not drastically change the overall samples and allowing an in-depth inspection of biological changes in peptide abundance as a function of retention time.


Examination of the WT mass spectra revealed five doubly charged unique peptides for MGG_13065, the E3 ligase protein (Figure 3.2H). One unique peptide had an m/z of 667.8023 and a retention time of 65.05 minutes. As shown in Figure 3.2C, the chromatographic peaks for the doubly charged precursor ion [M]2+, [M+1]2+ (precursor ion with one 13C), and [M+2]2+ (precursor ion with two 13C) all align at the expected retention time. In contrast, in Figure 3.2D the chromatographic peak for the precursor ion is nearly undetectable and is shifted about 0.45 minutes later, to 65.50 minutes, while the [M+2]2+ ion has a small peak at 65.50 minutes but otherwise elutes closer to 67 minutes.

To substantiate the absence of the E3 ligase F-box protein in the KO sample, the mass spectrum was examined at the retention time of the peptide, as shown in Figure 3.2E for the WT sample and Figure 3.2F for the E3 ligase KO sample. In Figure 3.2E, the precursor spectrum shows the isotopic distribution starting at m/z 667.8023, which was selected for fragmentation; the isotopic spacing of approximately 0.5 m/z indicates a doubly charged ion. In Figure 3.2F, there is no discernible isotopic distribution, and the ion was not selected for fragmentation because it was not detected.
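The charge-state argument rests on isotope spacing: adjacent 13C isotopologues differ by about 1.003355 Da, so a z = 2 ion shows peaks about 0.502 m/z apart. The sketch below, assuming the precursor is a protonated [M+2H]2+ ion and considering only 13C (no 15N or 34S fine structure), predicts the [M+1]2+ and [M+2]2+ peaks for the m/z 667.8023 peptide and back-calculates the charge and neutral mass.

```python
PROTON_MASS = 1.007276    # Da
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def isotope_mzs(mono_mz: float, charge: int, n_peaks: int = 3) -> list:
    """m/z of the monoisotopic peak and its first 13C isotopologues."""
    return [mono_mz + k * C13_C12_DELTA / charge for k in range(n_peaks)]


def charge_from_spacing(spacing_mz: float) -> int:
    """Infer charge from the observed m/z gap between adjacent isotope peaks."""
    return round(C13_C12_DELTA / spacing_mz)


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral monoisotopic mass of an [M+zH]z+ ion."""
    return (mz - PROTON_MASS) * charge


peaks = isotope_mzs(667.8023, charge=2)
print([round(p, 4) for p in peaks])
print(charge_from_spacing(peaks[1] - peaks[0]), round(neutral_mass(667.8023, 2), 4))
```

Under these assumptions the [M+1]2+ and [M+2]2+ peaks fall near m/z 668.304 and 668.806, and the neutral peptide mass is about 1333.590 Da.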

To confirm identification of the unique peptide shown in Figure 3.2H, the tandem (MS/MS) mass spectrum (Figure 3.2G) was inspected for the corresponding y- and b-type fragment ions. This MS/MS spectrum is labeled with the observed y-ion series corresponding to the fragments shown in Figure 3.2H. The E3 ligase protein was found in every WT injection, averaging 2 SpC per injection, and was not observed in any E3 ligase KO injection. Collectively, the extracted ion chromatograms and mass spectral data in Figure 3.2 provide multiple lines of evidence that the E3 ligase protein is expressed only in the WT M. oryzae and not in the E3 ligase KO M. oryzae. This allows the differences between the two proteomes to be characterized in order to elucidate the biological mechanism of virulence related to this protein and to identify future knockout targets.
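Matching an MS/MS spectrum amounts to comparing observed peaks with the b- and y-ion ladders predicted from the candidate sequence. The sequence of the MGG_13065 peptide is not given in the text, so the sketch below uses the hypothetical peptide PEPTIDEK. It computes singly charged b and y ions from standard monoisotopic residue masses, treats cysteine as NEM-alkylated (the fixed modification in Section 3.2.4), and ignores variable modifications.

```python
from typing import List, Tuple

PROTON = 1.007276
WATER = 18.010565
NEM = 125.047679  # N-ethylmaleimide added to cysteine (fixed modification)

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE = {
    "G": 57.021464, "A": 71.037114, "S": 87.032028, "P": 97.052764,
    "V": 99.068414, "T": 101.047679, "C": 103.009185 + NEM, "L": 113.084064,
    "I": 113.084064, "N": 114.042927, "D": 115.026943, "Q": 128.058578,
    "K": 128.094963, "E": 129.042593, "M": 131.040485, "H": 137.058912,
    "F": 147.068414, "R": 156.101111, "Y": 163.063329, "W": 186.079313,
}


def b_y_ions(peptide: str) -> Tuple[List[float], List[float]]:
    """Singly charged b1..b(n-1) and y1..y(n-1) m/z values."""
    masses = [RESIDUE[aa] for aa in peptide]
    b, y = [], []
    for i in range(1, len(peptide)):
        b.append(sum(masses[:i]) + PROTON)           # N-terminal fragments
        y.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragments
    return b, y


def precursor_mz(peptide: str, charge: int) -> float:
    """m/z of the [M+zH]z+ precursor."""
    return (sum(RESIDUE[aa] for aa in peptide) + WATER + charge * PROTON) / charge


b_ions, y_ions = b_y_ions("PEPTIDEK")  # hypothetical peptide
print([round(m, 4) for m in y_ions])
print(round(precursor_mz("PEPTIDEK", 2), 4))
```

A spectrum is usually considered convincing when a continuous run of predicted y (or b) ions is observed within the fragment tolerance (0.02 Da in this search), as with the labeled y-ion series in Figure 3.2G.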


Figure 3.2. LC-MS/MS evidence of the E3 ligase KO. Representative chromatograms of the (A) WT and (B) E3 ligase KO analyses. Extracted ion chromatograms of a representative peptide of the E3 ligase F-box protein in the (C) WT and (D) E3 ligase KO samples. Mass spectra at the retention time of the peptide of interest in the WT (E) and E3 ligase KO (F) samples. The blue lines indicate the precursor ion (A), the orange lines the [A+1]2+ ion (A plus one 13C), and the grey lines the [A+2]2+ ion (A with two 13C). The MS/MS spectrum (G) with the corresponding y-ion fragmentation pattern of the E3 ligase F-box protein unique peptide shown in (H).


3.3.2 Technical analysis of protein profiles for WT and gene KO strains

To enable accurate comparison of the two experimental conditions, normalization was performed. In the normalization procedure, the SpCs were adjusted to the same relative scale to reduce variance between the biological and technical replicates and to distinguish overall changes between the WT and the E3 ligase KO. Total normalization was performed using Equation 3.1, as previously described32. The average normalization coefficient across all biological and technical injections was 1.203, suggesting low variability between the sample injections prior to normalization.

$$\text{Normalization Coeff} = \frac{\text{highest number of spectral counts}}{\text{individual sample spectral counts}} \qquad (3.1)$$
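Equation 3.1 can be applied as a sketch, assuming "highest number of spectral counts" means the largest total SpC among all injections and "individual sample spectral counts" means the total SpC of the injection being normalized. The function name, injection labels, protein labels and counts below are hypothetical. The mean of the returned coefficients corresponds to the quantity reported as 1.203 above.

```python
from statistics import mean
from typing import Dict, Tuple

Counts = Dict[str, Dict[str, float]]  # injection -> protein -> spectral count


def total_normalize(counts: Counts) -> Tuple[Dict[str, float], Counts]:
    """Scale each injection so its total SpC matches the largest total (Eq. 3.1)."""
    totals = {inj: sum(per_protein.values()) for inj, per_protein in counts.items()}
    highest = max(totals.values())
    coeffs = {inj: highest / total for inj, total in totals.items()}
    normalized = {inj: {prot: spc * coeffs[inj] for prot, spc in per_protein.items()}
                  for inj, per_protein in counts.items()}
    return coeffs, normalized


counts = {  # hypothetical spectral counts for two injections
    "WT_bio1_inj1": {"protein_1": 10, "protein_2": 30},
    "KO_bio1_inj1": {"protein_1": 8, "protein_2": 12},
}
coeffs, nspc = total_normalize(counts)
print(coeffs, nspc["KO_bio1_inj1"], mean(coeffs.values()))
```

Because every coefficient is at least 1, a mean coefficient close to 1 indicates that the injections already had similar total spectral counts before scaling.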

The graphs in Figure 3.3 show the deviations of protein NSpCs after normalization at the biological and technical replicate levels. A Pearson correlation coefficient (r) of 1 indicates a perfect positive linear correlation between the biological or technical replicates. Comparing technical replicate NSpCs between injections 1 and 2 allows examination of the analytical variance, while comparing biological replicate NSpCs with the total NSpCs allows inspection of the biological variance between samples.
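The Pearson correlation coefficient used for these comparisons can be computed directly from paired per-protein values. A minimal sketch follows; the NSpC vectors are hypothetical stand-ins for the technical replicate 1 and replicate 2 averages.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-protein mean NSpC for technical replicate 1 vs. replicate 2.
inj1 = [120.0, 45.5, 8.0, 300.2, 15.1]
inj2 = [118.4, 47.0, 7.5, 296.9, 16.0]
print(round(pearson_r(inj1, inj2), 4))
```

Note that r = 1 only indicates a perfect linear relationship: two replicates that differ by a constant scale factor would also give r = 1, which is why the slope of the fitted line is informative as well.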


Figure 3.3. Normalized spectral counting scatterplots of (A) WT and (B) E3 ligase KO technical replicates. To obtain an accurate comparison of each injection, the average normalized spectral counts for all first technical replicates were plotted against the average normalized spectral counts for all second technical replicates per protein. The WT and E3 ligase KO biological replicates were then compared, first for the (C) WT samples and then for the (D) E3 ligase KO samples. The average number of normalized spectral counts per protein in a biological sample was plotted against the average total number of spectral counts across all biological samples.


The technical replicate graphs in Figure 3.3A and 3.3B suggest very little difference between injections for both the WT and E3 ligase KO samples. In these panels, the average NSpC of technical replicate 1 of each biological replicate for a given protein was plotted against the average NSpC of technical replicate 2 of each biological replicate for that protein.

The technical replicate information for the WT samples can be found in Figure 3.3A. The Pearson correlation coefficient was 0.996 both before normalization (Fig. A.1A) and after normalization. The technical replicate information for the E3 ligase KO samples can be found in Figure 3.3B. The slope was 0.995 both before normalization (Fig. A.1B) and after normalization. These data suggest that the technical replicate injections were nearly identical in the information obtained and contributed little variation to our biological study.

The normalized biological replicate information for the WT and E3 ligase KO samples can be found in Figure 3.3C and 3.3D, where the average NSpC of a biological replicate for a given protein was plotted against the average NSpC across all replicates (technical and biological) for that protein. For the WT samples, normalized Pearson correlation coefficients ranged from 0.990 to 0.99, slightly above those observed for the E3 ligase KO samples, which ranged from 0.964 to 0.993.

Normalization between the two sample types resulted in compensating changes to the Pearson correlation coefficients; before normalization, they ranged from 0.984 to 0.992 for the WT samples and from 0.964 to 0.993 for the E3 ligase KO samples (Fig. A.1C and A.1D).


The differences in Pearson correlation coefficients suggest slight variation between biological replicates in both the WT and E3 ligase KO samples, which is not unexpected. Small deviations from 1.0 are likely due to genuine biological differences between the samples and very minor changes in chromatography due to randomized sample injections. No large deviations or extreme Pearson correlation coefficient values were noted that would suggest sample contamination or accidental differences in collection points.

Figure 3.4. (A) Venn diagram of the total number of unique proteins discovered in all samples (before applying the cutoff). (B) Histogram of the normalized spectral counts of the proteins found exclusively in the WT fungal samples. (C) Volcano plot comparing the normalized spectral counts of the overlapping proteins. Proteins with a statistically significant increase in abundance in the E3 ligase KO samples are in the green box, while proteins with a statistically significant decrease in abundance are in the red box. (D) Histogram of the normalized spectral counts of the proteins found only in the E3 ligase KO fungal samples.


3.3.3 Proteome comparison of WT and E3 ligase KO strains

Overall, 5,601 proteins were identified in this study: 5,078 in the WT samples and 5,082 in the E3 ligase KO samples, with 4,559 proteins common to both sets of samples, as shown in Figure 3.4A. After the NSpC ≥ 5 cutoff was applied, these totals were reduced to 4,432 total unique proteins, 4,360 proteins in the WT samples, and 4,372 proteins in the E3 ligase KO samples, and the number of proteins shared by both sets of samples was reduced to 4,300.
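The overall totals follow from inclusion–exclusion on the two protein sets, |WT ∪ KO| = |WT| + |KO| − |WT ∩ KO|. The sketch below checks the reported counts and shows the equivalent set-based tally that would produce a Venn diagram like Figure 3.4A; the protein identifiers in the set example are hypothetical.

```python
from typing import Dict, Iterable


def union_size(n_a: int, n_b: int, n_shared: int) -> int:
    """Size of A ∪ B from |A|, |B| and |A ∩ B| (inclusion-exclusion)."""
    return n_a + n_b - n_shared


def venn_from_ids(wt_ids: Iterable[str], ko_ids: Iterable[str]) -> Dict[str, int]:
    """Venn-diagram counts from two collections of protein identifiers."""
    wt, ko = set(wt_ids), set(ko_ids)
    return {"WT only": len(wt - ko), "KO only": len(ko - wt),
            "shared": len(wt & ko), "total": len(wt | ko)}


# Counts reported in the text (WT, E3 ligase KO, shared).
print(union_size(5078, 5082, 4559))  # before the NSpC cutoff → 5601
print(union_size(4360, 4372, 4300))  # after the NSpC >= 5 cutoff → 4432
print(venn_from_ids(["p1", "p2", "p3"], ["p2", "p3", "p4", "p5"]))  # hypothetical IDs
```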

We first identified proteins specific to each sample type. Figures 3.4B and 3.4D show histograms, binned in groups of 5 NSpCs, of the number of proteins found only in the WT samples or only in the E3 ligase KO samples. Of these, 420 proteins found only in the WT samples and 452 proteins found only in the E3 ligase KO samples had <5 NSpC. These low-abundance proteins may be common proteins near the limit of detection that were sampled inconsistently due to the nature of DDA, and thus they were removed from further consideration. Fifty-nine and 71 proteins with ≥ 5 NSpCs were found only in the WT or the E3 ligase KO strain (Tables A.3 & A.4), respectively. Of these, 24 of the 59 unique proteins in the WT strain and 29 of the 71 unique proteins in the E3 ligase KO strain could be annotated with a putative biological function using Pfam33 (Table 3.1). A list of these putative functions can be found in Table A.6.


Table 3.1. Proteins up- and down-regulated in, and unique to, the WT or E3 ligase KO mycelia samples. The numbers in parentheses are the subset of each group with protein annotation; the remainder are hypothetical proteins with no known name or function.

Modification | Increased in E3 ligase KO | Decreased in E3 ligase KO | Specific to WT | Specific to E3 ligase KO
Unmodified | 62 (51) | 61 (49) | 28 (20) | 32 (23)
Phosphorylated | 8 (6) | 1 (1) | 5 (1) | 7 (2)
Ubiquitinated | 9 (6) | 3 (2) | 13 (7) | 9 (3)
Phosphorylated & Ubiquitinated | 2 (1) | 0 | 13 (7) | 23 (14)
Total | 81 (64) | 65 (52) | 59 (35) | 71 (42)

Second, we identified proteins that showed differential accumulation. Figure 3.4C is a volcano plot of the 4,559 overlapping proteins. Volcano plots are traditionally used in microarray analyses but have recently been applied to large proteomics data sets to identify proteins with significant changes in accumulation based on fold change and p-value34. A two-tailed Student’s t-test was used to generate a p-value for each protein. For this study, a p-value cutoff of 0.05 and a fold change of 2 or more were chosen to define a significant increase or decrease in protein abundance. In previous work, a spectral count cutoff of 5 was chosen to ensure accurate quantification by removing proteins considered to be below the limit of quantification. More specifically, Gokce and coworkers showed that, for a study with this number of proteins, a cutoff of 5 NSpC combined with a fold change of ≥ 2 and a p-value of ≤ 0.05 gave a false-positive rate (FPR) of approximately 5%, lower than the typically used maximum FPR of 10%32. Figure 3.4C shows a total of 80 proteins with a statistically significant increase in abundance in the E3 ligase KO samples, 16 of which were hypothetical proteins with no Pfam annotation, and 65 proteins with a statistically significant decrease in abundance, of which 13 were hypothetical proteins with no Pfam annotation33 (Table 3.1 and Tables A.1, A.2, & A.6). In sum, from the combined data on differential expression and sample-specific proteins, we identified 275 proteins of interest, of which 124 were associated with the WT sample and 151 with the E3 ligase KO mutant.

3.3.3.1 Occurrence of both ubiquitination and phosphorylation on proteins

We analyzed the occurrence of both ubiquitination and phosphorylation on the differential or unique proteins because of the relationship between these PTMs within the SCF E3 ligase. A comprehensive list of these proteins with their respective peptide and PTM site information can be found in Table A.5. Phosphorylation of the target protein is required for it to bind to the LRR of the F-box protein within the SCF complex16,35–37. This brings the target protein into proximity of the E2 enzyme so that it can be ubiquitinated by the E3 ligase, or so that ubiquitin can be added to an already existing ubiquitin chain. Once polyubiquitination occurs, the proteins are typically directed to the 26S proteasome for degradation. Here, we hypothesized that without the E3 ligase F-box protein, phosphorylated target proteins would accumulate in the KO mutant because they could not be ubiquitinated and degraded. Furthermore, absence of the E3 ligase may result in reduced ubiquitination overall in the mutant strain.

Comparison of abundance and modifications between the WT and KO mutant proteins provided putative candidate E3 ligase target proteins that were uniquely expressed or significantly enriched in the mutant and were also phosphorylated. These proteins included those involved in autophagy (MGG_09262, autophagy protein 5), carbohydrate metabolism (MGG_08985, beta-xylosidase; MGG_08919, UDP-glucose, sterol transferase), kinase activity (MGG_08174, mitogen-activated protein kinase organizer 1), sulfate transport (MGG_09838, sulfate transporter 4.1), and DNA replication (MGG_07222, DNA polymerase epsilon subunit B).

In addition, a number of proteins, including a polyketide synthase/peptide synthetase (MGG_12477), a lipase (MGG_00314), a CMGC/SRPK protein kinase (MGG_10596), a MGG_10596 (MGG_13493), and a C6 zinc finger domain-containing protein (MGG_06778), were more abundant in the wild type samples. These data suggest that during mycelial growth, various biological processes are regulated by E3 ligase activity, and these proteins therefore represent putative targets for further investigation.

3.4 CONCLUSIONS

A label-free global proteomics experiment was used to determine the changes in the mycelial proteome between WT and E3 ligase KO strains of M. oryzae. Sixteen injections from 8 biological samples were analyzed using LC-MS/MS. The E3 ligase F-box protein was absent in the E3 ligase KO samples but present in the WT samples, confirming the gene knockout. The remaining 4,432 proteins were then compared. One hundred and twenty-four proteins showed a statistically significant increase in abundance in, or were specific to, the wild type, compared with 151 proteins with increased abundance in, or specific to, the E3 ligase KO samples. Proteins of interest included MGG_09262 (autophagy protein 5), MGG_08985 (beta-xylosidase), MGG_08919 (UDP-glucose, sterol transferase), MGG_08174 (mitogen-activated protein kinase organizer 1), MGG_09838 (sulfate transporter 4.1), MGG_07222 (DNA polymerase epsilon subunit B), MGG_12477, MGG_00314, MGG_10596, MGG_13493, and MGG_06778, which are involved in various biological pathways ranging from autophagy to kinase activity.

Biological pathway analysis proved difficult, in part because of the limited number of proteins of interest identified in this study and because more than 50% of the significant proteins were considered hypothetical and lacked meaningful functional annotation. In addition, based on the label-free global proteomics analysis, we were unable to confirm or exclude the involvement of phosphorylated and ubiquitinated proteins in a biological pathway linked to pathogenicity in M. oryzae. We suggest that a pull-down of both phosphorylated and ubiquitinated proteins may provide more in-depth data for studying these PTMs directly.

3.5 ACKNOWLEDGMENTS

This work was performed in part by the Molecular Education, Technology and Research Innovation Center (METRIC) at NC State University, which is supported by the State of North Carolina. The authors gratefully acknowledge the financial support received from the U.S. Department of Agriculture (National Institute of Food and Agriculture) Grant Number 2014-67013-21722, and North Carolina State University.


3.6 LITERATURE CITED

(1) Payne, G. A.; Nierman, W. C.; Wortman, J. R.; Pritchard, B. L.; Brown, D.; Dean, R. A.; Bhatnagar, D.; Cleveland, T. E.; Machida, M.; Yu, J. Whole Genome Comparison of Aspergillus Flavus and A. Oryzae. Med. Mycol. 2006, 44 (s1), 9–11. https://doi.org/10.1080/13693780600835716.

(2) Nierman, W. C.; Pain, A.; Anderson, M. J.; Wortman, J. R.; Kim, H. S.; Arroyo, J.; Berriman, M.; Abe, K.; Archer, D. B.; Bermejo, C.; Bennett, J.; Bowyer, P.; Chen, D.; Collins, M.; Coulsen, R.; Davies, R.; Dyer, P. S.; Farman, M.; Fedorova, N.; Fedorova, N.; Feldblyum, T. V.; Fischer, R.; Fosker, N.; Fraser, A.; García, J. L.; García, M. J.; Goble, A.; Goldman, G. H.; Gomi, K.; Griffith-Jones, S.; Gwilliam, R.; Haas, B.; Haas, H.; Harris, D.; Horiuchi, H.; Huang, J.; Humphray, S.; Jiménez, J.; Keller, N.; Khouri, H.; Kitamoto, K.; Kobayashi, T.; Konzack, S.; Kulkarni, R.; Kumagai, T.; Lafton, A.; Latgé, J.-P.; Li, W.; Lord, A.; Lu, C.; Majoros, W. H.; May, G. S.; Miller, B. L.; Mohamoud, Y.; Molina, M.; Monod, M.; Mouyna, I.; Mulligan, S.; Murphy, L.; O’Neil, S.; Paulsen, I.; Peñalva, M. A.; Pertea, M.; Price, C.; Pritchard, B. L.; Quail, M. A.; Rabbinowitsch, E.; Rawlins, N.; Rajandream, M.-A.; Reichard, U.; Renauld, H.; Robson, G. D.; de Córdoba, S. R.; Rodríguez-Peña, J. M.; Ronning, C. M.; Rutter, S.; Salzberg, S. L.; Sanchez, M.; Sánchez-Ferrero, J. C.; Saunders, D.; Seeger, K.; Squares, R.; Squares, S.; Takeuchi, M.; Tekaia, F.; Turner, G.; de Aldana, C. R. V.; Weidman, J.; White, O.; Woodward, J.; Yu, J.-H.; Fraser, C.; Galagan, J. E.; Asai, K.; Machida, M.; Hall, N.; Barrell, B.; Denning, D. W. Genomic Sequence of the Pathogenic and Allergenic Filamentous Fungus Aspergillus Fumigatus. Nature 2005, 438 (7071), 1151–1156. https://doi.org/10.1038/nature04332.

(3) Machida, M.; Asai, K.; Sano, M.; Tanaka, T.; Kumagai, T.; Terai, G.; Kusumoto, K.-I.; Arima, T.; Akita, O.; Kashiwagi, Y.; Abe, K.; Gomi, K.; Horiuchi, H.; Kitamoto, K.; Kobayashi, T.; Takeuchi, M.; Denning, D. W.; Galagan, J. E.; Nierman, W. C.; Yu, J.; Archer, D. B.; Bennett, J. W.; Bhatnagar, D.; Cleveland, T. E.; Fedorova, N. D.; Gotoh, O.; Horikawa, H.; Hosoyama, A.; Ichinomiya, M.; Igarashi, R.; Iwashita, K.; Juvvadi, P. R.; Kato, M.; Kato, Y.; Kin, T.; Kokubun, A.; Maeda, H.; Maeyama, N.; Maruyama, J.; Nagasaki, H.; Nakajima, T.; Oda, K.; Okada, K.; Paulsen, I.; Sakamoto, K.; Sawano, T.; Takahashi, M.; Takase, K.; Terabayashi, Y.; Wortman, J. R.; Yamada, O.; Yamagata, Y.; Anazawa, H.; Hata, Y.; Koide, Y.; Komori, T.; Koyama, Y.; Minetoki, T.; Suharnan, S.; Tanaka, A.; Isono, K.; Kuhara, S.; Ogasawara, N.; Kikuchi, H. Genome Sequencing and Analysis of Aspergillus Oryzae. Nature 2005, 438 (7071), 1157–1161. https://doi.org/10.1038/nature04300.

(4) Brown, D. W.; Cheung, F.; Proctor, R. H.; Butchko, R. A. E.; Zheng, L.; Lee, Y.; Utterback, T.; Smith, S.; Feldblyum, T.; Glenn, A. E.; Plattner, R. D.; Kendra, D. F.; Town, C. D.; Whitelaw, C. A. Comparative Analysis of 87,000 Expressed Sequence Tags from the Fumonisin-Producing Fungus Fusarium Verticillioides. Fungal Genet. Biol. 2005, 42 (10), 848–861. https://doi.org/10.1016/J.FGB.2005.06.001.

(5) Dean, R. A.; Talbot, N. J.; Ebbole, D. J.; Farman, M. L.; Mitchell, T. K.; Orbach, M. J.; Thon, M.; Kulkarni, R.; Xu, J.-R.; Pan, H.; Read, N. D.; Lee, Y.-H.; Carbone, I.; Brown, D.; Oh, Y. Y.; Donofrio, N.; Jeong, J. S.; Soanes, D. M.; Djonovic, S.; Kolomiets, E.; Rehmeyer, C.; Li, W.; Harding, M.; Kim, S.; Lebrun, M.-H.; Bohnert, H.; Coughlan, S.; Butler, J.; Calvo, S.; Ma, L.-J.; Nicol, R.; Purcell, S.; Nusbaum, C.; Galagan, J. E.; Birren, B. W. The Genome Sequence of the Rice Blast Fungus Magnaporthe Grisea. Nature 2005, 434 (7036), 980–986. https://doi.org/10.1038/nature03449.

(6) Galagan, J. E.; Calvo, S. E.; Borkovich, K. A.; Selker, E. U.; Read, N. D.; Jaffe, D.; FitzHugh, W.; Ma, L.-J.; Smirnov, S.; Purcell, S.; Rehman, B.; Elkins, T.; Engels, R.; Wang, S.; Nielsen, C. B.; Butler, J.; Endrizzi, M.; Qui, D.; Ianakiev, P.; Bell-Pedersen, D.; Nelson, M. A.; Werner-Washburne, M.; Selitrennikoff, C. P.; Kinsey, J. A.; Braun, E. L.; Zelter, A.; Schulte, U.; Kothe, G. O.; Jedd, G.; Mewes, W.; Staben, C.; Marcotte, E.; Greenberg, D.; Roy, A.; Foley, K.; Naylor, J.; Stange-Thomann, N.; Barrett, R.; Gnerre, S.; Kamal, M.; Kamvysselis, M.; Mauceli, E.; Bielke, C.; Rudd, S.; Frishman, D.; Krystofova, S.; Rasmussen, C.; Metzenberg, R. L.; Perkins, D. D.; Kroken, S.; Cogoni, C.; Macino, G.; Catcheside, D.; Li, W.; Pratt, R. J.; Osmani, S. A.; DeSouza, C. P. C.; Glass, L.; Orbach, M. J.; Berglund, J. A.; Voelker, R.; Yarden, O.; Plamann, M.; Seiler, S.; Dunlap, J.; Radford, A.; Aramayo, R.; Natvig, D. O.; Alex, L. A.; Mannhaupt, G.; Ebbole, D. J.; Freitag, M.; Paulsen, I.; Sachs, M. S.; Lander, E. S.; Nusbaum, C.; Birren, B. The Genome Sequence of the Filamentous Fungus Neurospora Crassa. Nature 2003, 422 (6934), 859–868. https://doi.org/10.1038/nature01554.

(7) Martinez, D.; Larrondo, L. F.; Putnam, N.; Gelpke, M. D. S.; Huang, K.; Chapman, J.; Helfenbein, K. G.; Ramaiya, P.; Detter, J. C.; Larimer, F.; Coutinho, P. M.; Henrissat, B.; Berka, R.; Cullen, D.; Rokhsar, D. Genome Sequence of the Lignocellulose Degrading Fungus Phanerochaete Chrysosporium Strain RP78. Nat. Biotechnol. 2004, 22 (6), 695–700. https://doi.org/10.1038/nbt967.

(8) Greer, C. A.; Webster, R. K. Occurrence, Distribution, Epidemiology, Cultivar Reaction, and Management of Rice Blast Disease in California. Plant Dis. 2001, 85 (10), 1096.

(9) Kim, Y.; Nandakumar, M. P.; Marten, M. R. Proteomics of Filamentous Fungi. Trends Biotechnol. 2007, 25 (9), 395–400. https://doi.org/10.1016/J.TIBTECH.2007.07.008.

(10) Compton, P. D.; Kelleher, N. L.; Gunawardena, J. Estimating the Distribution of Protein Post-Translational Modification States by Mass-Spectrometry. J. Proteome Res. 2018, acs.jproteome.8b00150. https://doi.org/10.1021/acs.jproteome.8b00150.

(11) Xu, G.; Jaffrey, S. R. Proteomic Identification of Protein Ubiquitination Events. Biotechnol. Genet. Eng. Rev. 2013, 29 (1), 73–109. https://doi.org/10.1080/02648725.2013.801232.

(12) Leach, M. D.; Brown, A. J. P. Posttranslational Modifications of Proteins in the Pathobiology of Medically Relevant Fungi. Eukaryot. Cell 2012, 11 (2), 98–108. https://doi.org/10.1128/EC.05238-11.

(13) Aebersold, R.; Mann, M. Mass Spectrometry-Based Proteomics. Nature 2003, 422 (6928), 198–207. https://doi.org/10.1038/nature01511.

(14) Cieśla, J.; Frączyk, T.; Rode, W. Phosphorylation of Basic Amino Acid Residues in Proteins: Important but Easily Missed; 2011.

(15) Wilkinson, K. D. Ubiquitination and Deubiquitination: Targeting of Proteins for Degradation by the Proteasome. Semin. Cell Dev. Biol. 2000, 11 (3), 141–148. https://doi.org/10.1006/SCDB.2000.0164.

(16) Liu, T.-B.; Xue, C. The Ubiquitin-Proteasome System and F-Box Proteins in Pathogenic Fungi. Mycobiology 2011, 39 (4), 243–248. https://doi.org/10.5941/MYCO.2011.39.4.243.

(17) Ubiquitin: Structures, Functions, Mechanisms. Biochim. Biophys. Acta - Mol. Cell Res. 2004, 1695 (1–3), 55–72. https://doi.org/10.1016/J.BBAMCR.2004.09.019.

(18) Hershko, A.; Ciechanover, A. The Ubiquitin System. Annu. Rev. Biochem. 1998, 67 (1), 425–479. https://doi.org/10.1146/annurev.biochem.67.1.425.

(19) Callis, J. The Ubiquitination Machinery of the Ubiquitin System. Arab. B. 2014, 12, e0174. https://doi.org/10.1199/tab.0174.

(20) Akutsu, M.; Dikic, I.; Bremm, A. Ubiquitin Chain Diversity at a Glance. J. Cell Sci. 2016, 129 (5), 875–880. https://doi.org/10.1242/jcs.183954.

(21) Moon, J.; Parry, G.; Estelle, M. The Ubiquitin-Proteasome Pathway and Plant Development. Plant Cell 2004, 16 (12), 3181–3195. https://doi.org/10.1105/tpc.104.161220.

(22) Sullivan, J. A.; Shirasu, K.; Deng, X. W. The Diverse Roles of Ubiquitin and the 26S Proteasome in the Life of Plants. Nat. Rev. Genet. 2003, 4 (12), 948–958. https://doi.org/10.1038/nrg1228.


(23) Belknap, W. R.; Garbarino, J. E. The Role of Ubiquitin in Plant Senescence and Stress Responses. Trends Plant Sci. 1996, 1 (10), 331–335. https://doi.org/10.1016/S1360-1385(96)82593-0.

(24) Devoto, A.; Muskett, P. R.; Shirasu, K. Role of Ubiquitination in the Regulation of Plant Defence against Pathogens. Curr. Opin. Plant Biol. 2003, 6 (4), 307–311. https://doi.org/10.1016/S1369-5266(03)00060-8.

(25) Sharma, B.; Joshi, D.; Yadav, P. K.; Gupta, A. K.; Bhatt, T. K. Role of Ubiquitin-Mediated Degradation System in Plant Biology. Front. Plant Sci. 2016, 7, 806. https://doi.org/10.3389/fpls.2016.00806.

(26) Oh, Y.; Franck, W. L.; Han, S.-O.; Shows, A.; Gokce, E.; Muddiman, D. C.; Dean, R. A. Polyubiquitin Is Required for Growth, Development and Pathogenicity in the Rice Blast Fungus Magnaporthe Oryzae. PLoS One 2012, 7 (8), e42868. https://doi.org/10.1371/journal.pone.0042868.

(27) Guo, M.; Gao, F.; Zhu, X.; Nie, X.; Pan, Y. M.; Gao, Z. MoGrr1, a Novel F-Box Protein, Is Involved in Conidiogenesis and Cell Wall Integrity and Is Critical for the Full Virulence of Magnaporthe Oryzae. Appl. Microbiol. Biotechnol. 2015, 99 (19), 8075–8088. https://doi.org/10.1007/s00253-015-6820-x.

(28) Gorelik, M.; Orlicky, S.; Sartori, M. A.; Tang, X.; Marcon, E.; Kurinov, I.; Greenblatt, J. F.; Tyers, M.; Moffat, J.; Sicheri, F.; Sidhu, S. S. Inhibition of SCF Ubiquitin Ligases by Engineered Ubiquitin Variants That Target the Cul1 Binding Site on the Skp1–F-Box Interface. Proc. Natl. Acad. Sci. 2016. https://doi.org/10.1073/pnas.1519389113.


(29) Wiśniewski, J. R.; Zougman, A.; Nagaraj, N.; Mann, M. Universal Sample Preparation Method for Proteome Analysis. Nat. Methods 2009, 6 (5), 359–362. https://doi.org/10.1038/nmeth.1322.

(30) Nielsen, M. L.; Vermeulen, M.; Bonaldi, T.; Cox, J.; Moroder, L.; Mann, M. Iodoacetamide-Induced Artifact Mimics Ubiquitination in Mass Spectrometry. Nat. Methods 2008, 5 (6), 459–460. https://doi.org/10.1038/nmeth0608-459.

(31) Rappsilber, J.; Mann, M.; Ishihama, Y. Protocol for Micro-Purification, Enrichment, Pre-Fractionation and Storage of Peptides for Proteomics Using StageTips. Nat. Protoc. 2007, 2 (8), 1896–1906. https://doi.org/10.1038/nprot.2007.261.

(32) Gokce, E.; Shuford, C. M.; Franck, W. L.; Dean, R. A.; Muddiman, D. C. Evaluation of Normalization Methods on GeLC-MS/MS Label-Free Spectral Counting Data to Correct for Variation during Proteomic Workflows. J. Am. Soc. Mass Spectrom. 2011, 22 (12), 2199–2208. https://doi.org/10.1007/s13361-011-0237-2.

(33) El-Gebali, S.; Mistry, J.; Bateman, A.; Eddy, S. R.; Luciani, A.; Potter, S. C.; Qureshi, M.; Richardson, L. J.; Salazar, G. A.; Smart, A.; Sonnhammer, E. L. L.; Hirsh, L.; Paladin, L.; Piovesan, D.; Tosatto, S. C. E.; Finn, R. D. The Pfam Protein Families Database in 2019. Nucleic Acids Res. 2018, 47, 427–432. https://doi.org/10.1093/nar/gky995.

(34) Li, W. Volcano Plots In Analyzing Differential Expressions With mRNA Microarrays. J. Bioinform. Comput. Biol. 2012, 10 (06), 1231003. https://doi.org/10.1142/S0219720012310038.


(35) Chang, B.; Partha, S.; Hofmann, K.; Lei, M.; Goebl, M.; Harper, J. W.; Elledge, S. J. SKP1 Connects Cell Cycle Regulators to the Ubiquitin Proteolysis Machinery through a Novel Motif, the F-Box. Cell 1996, 86 (2), 263–274. https://doi.org/10.1016/S0092-8674(00)80098-7.

(36) Feldman, R. M. R.; Correll, C. C.; Kaplan, K. B.; Deshaies, R. J. A Complex of Cdc4p, Skp1p, and Cdc53p/Cullin Catalyzes Ubiquitination of the Phosphorylated CDK Inhibitor Sic1p. Cell 1997, 91 (2), 221–230. https://doi.org/10.1016/S0092-8674(00)80404-3.

(37) Skowyra, D.; Craig, K. L.; Tyers, M.; Elledge, S. J.; Harper, J. W. F-Box Proteins Are Receptors That Recruit Phosphorylated Substrates to the SCF Ubiquitin-Ligase Complex. Cell 1997, 91 (2), 209–219. https://doi.org/10.1016/S0092-8674(00)80403-1.


Chapter 4

Simultaneous Monitoring of Identification, Identification-Free, and Quantitative Metrics to Assess Systems Suitability in LC-MS/MS Based Proteomic Experiments

4.1 INTRODUCTION

Proteomics is the systematic study of a proteome; it is a complex biological field that provides significant insight into how an organism functions at the molecular level. Liquid chromatography tandem mass spectrometry (LC-MS/MS) is the premier analytical tool for proteomic investigations owing to its high sensitivity and unparalleled molecular specificity. As technologies continue to advance and experimental designs become more complex, data acquisition for a single project can take place over the course of months. Consequently, including a system suitability workflow to ensure optimal and reproducible data collection is critically important.

To quantitatively monitor the reproducibility of the data being collected, performance metrics of the LC-MS/MS system need to be part of the experimental design. One approach is the use of system suitability standards to ensure that the instruments maintain an acceptable level of performance based on varied criteria. System suitability standards are commonly used to assess inter- and intra-laboratory reproducibility, for example between operators, between laboratories, and within core facilities and pharmaceutical companies. There are three main types of system suitability standards used in modern proteomics research: simple, complex, and mixed1.

Simple system suitability standards generally consist of peptide digests of a few proteins (e.g., bovine serum albumin (BSA) or cytochrome C)1,2. They are used to track identification-free metrics that monitor LC performance (e.g., retention times (RT) and peak widths) and MS performance (e.g., mass measurement accuracy (MMA) and peptide abundance). These system suitability samples are not ideal, however, because they do not represent the complexity of a proteome. Complex system suitability standards are cellular lysates1–3. They are primarily used to evaluate MS performance by tracking identification metrics, including the numbers of peptide spectral matches (PSMs) and of peptide and protein identifications. While these complex system suitability standards have the benefit of monitoring both the LC-MS and bioinformatic pipelines, changes in performance may not be identified readily because of the effort required for database searching.

Because a complex system suitability standard is itself a proteome, using it essentially relies on an “unknown” to define MS performance attributes on a qualitative level. Mixed system suitability standards combine simple and complex system suitability standards, providing the benefits of both at the same time1. Because the two components are analyzed together, mixed standards require less instrument time than running each separately, while providing information on a variety of metrics, including RT, peak width, MMA, peptide abundance (peak areas (PA) or PSMs), and the numbers of protein IDs and protein groups.

A more recent and innovative approach is the introduction of stable isotope labeled (SIL) peptides (isotopologues) spiked at five different concentrations, which allows both quantitative metrics (limit of detection (LOD) and limit of quantification (LOQ)) and ID-free metrics to be monitored as a function of dynamic range during QC analysis4. Because the isotopologues share the same peptide sequence, they have the same retention times, but their m/z values differ because the overall mass increases with increasing incorporation of 13C and 15N.
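The m/z offset between isotopologues follows directly from the number of heavy atoms each one carries. The Python sketch below computes the monoisotopic m/z of a peptide in which chosen residues are uniformly 13C/15N-labeled. The residue masses and carbon/nitrogen counts are standard values; which residues are actually labeled in the commercial mixes is not stated here, so the labeled position in the example (the C-terminal arginine of LASVSVSR) is hypothetical.

```python
# Monoisotopic residue mass (Da), number of C atoms, number of N atoms.
RESIDUES = {
    "G": (57.02146, 2, 1), "A": (71.03711, 3, 1), "S": (87.03203, 3, 1),
    "P": (97.05276, 5, 1), "V": (99.06841, 5, 1), "T": (101.04768, 4, 1),
    "C": (103.00919, 3, 1), "L": (113.08406, 6, 1), "I": (113.08406, 6, 1),
    "N": (114.04293, 4, 2), "D": (115.02694, 4, 1), "Q": (128.05858, 5, 2),
    "K": (128.09496, 6, 2), "E": (129.04259, 5, 1), "M": (131.04049, 5, 1),
    "H": (137.05891, 6, 3), "F": (147.06841, 9, 1), "R": (156.10111, 6, 4),
    "Y": (163.06333, 9, 1), "W": (186.07931, 11, 2),
}
WATER = 18.010565      # H2O added for the free peptide termini
PROTON = 1.007276      # proton mass
C13_SHIFT = 1.0033548  # 13C minus 12C
N15_SHIFT = 0.9970349  # 15N minus 14N


def peptide_mass(sequence, heavy_positions=()):
    """Neutral monoisotopic mass. Residues at heavy_positions (0-based) are
    assumed to carry 13C on every carbon and 15N on every nitrogen."""
    mass = WATER
    for i, aa in enumerate(sequence):
        residue_mass, n_carbon, n_nitrogen = RESIDUES[aa]
        mass += residue_mass
        if i in heavy_positions:
            mass += n_carbon * C13_SHIFT + n_nitrogen * N15_SHIFT
    return mass


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    light = peptide_mass("LASVSVSR")
    heavy = peptide_mass("LASVSVSR", heavy_positions={7})  # hypothetical heavy Arg
    print(f"light 2+: {mz(light, 2):.4f}  heavy 2+: {mz(heavy, 2):.4f}")
    print(f"mass shift: {heavy - light:.4f} Da")
```

For a doubly charged ion, a fully labeled arginine (about +10.008 Da) moves the peak by about 5 m/z units, so co-eluting isotopologues are still resolved in MS1.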

Herein we report the analysis of commercially available peptide reference mixes (Pierce™ 7 × 5 LC-MS/MS System Suitability Standard5 and Promega 6 × 5 LC-MS/MS Peptide Reference Mix6) spiked into HeLa cell lysate as a new mixed system suitability standard, in which ID, ID-free, and quantitative metrics can be monitored from a single injection. The LC metrics include RTs and peak widths at full width at half maximum (FWHM). The isotopologues within the mixes were used to monitor ID-free and quantitative changes (e.g., MMA, LOD, LOQ, and dynamic range). Using the HeLa lysate, we monitored changes in protein IDs as well as the number of PSMs within a whole proteome. These metrics were monitored over a series of 34 injections that differed only in sample composition (e.g., 6 × 5 mix and HeLa; 7 × 5 mix and HeLa; and 6 × 5 mix, 7 × 5 mix, and HeLa). Importantly, the mixed system suitability standards are created from commercially available products, enabling adoption by the community.

We foresee these standards serving not only to evaluate system suitability but also as a powerful tool for benchmarking inter-instrument performance.

4.2 MATERIALS AND METHODS

4.2.1 Materials

Pierce™ HeLa Digest Standard (HeLa) and the Pierce™ LC-MS/MS System Suitability Standard (7 × 5 mix) were purchased from Thermo Fisher Scientific (Rockford, IL). The 6 × 5 LC-MS/MS Peptide Reference Mix (6 × 5 mix) was purchased from Promega (Madison, WI). LC-MS grade water, acetonitrile, and formic acid were purchased from Fisher Scientific (Hampton, NH).

4.2.2 Sample Preparation:

The HeLa standard was reconstituted in 100 µL of LC-MS grade water and 1% formic acid, resulting in a final concentration of 200 ng/µL. The 6 × 5 mix was reconstituted in 50 µL of LC-MS grade water, resulting in a final concentration of 500 fmol/µL for the heaviest peptides. The 7 × 5 mix was reconstituted in 25 µL of LC-MS grade water and 37.5 µL of LC-MS grade water with 1% formic acid, resulting in a concentration of 400 fmol/µL for the heaviest peptides. These stock solutions were further combined according to the sample contents in Table 4.1 so that the concentrations injected on column were 200 ng/µL of HeLa, 500 fmol/µL of the heaviest peptide of the 6 × 5 mix, and 200 fmol/µL of the heaviest peptide of the 7 × 5 mix.

Table 4.1. Sample contents for the mixed system suitability standard study


4.2.3 NanoLC-MS/MS:

These studies were performed using reversed-phase nano-LC-MS/MS via an Easy nLC 1000 (Thermo Fisher Scientific, Waltham, MA) coupled to a Q Exactive high field quadrupole Orbitrap mass spectrometer (Q Exactive HF-X, Thermo Fisher Scientific, Bremen, Germany). One, two, or three µL of the samples were loaded onto a C18 Acclaim™ PepMap™ trap column (2 cm × 75 μm, 3 μm) (Thermo Fisher Scientific, West Palm Beach, FL) connected to a C18 Acclaim™ PepMap™ analytical column (25 cm × 75 μm, 3 μm) (Thermo Fisher Scientific, West Palm Beach, FL). Mobile phase A (MPA) consisted of 98% water, 2% acetonitrile, and 0.01% formic acid. Mobile phase B (MPB) consisted of 98% acetonitrile, 2% water, and 0.01% formic acid. The peptides were eluted using a 50-minute linear gradient (2–40% MPB) at a flow rate of 300 nL/minute.

Mass spectra were collected for positive ions over an m/z range of 375 to 1600 using a Top40 data-dependent acquisition (DDA) method. MS1 acquisition parameters included a resolving power (RP) of 120,000 (FWHM at m/z 200), an automatic gain control (AGC) target of 3E6, and a maximum injection time (IT) of 100 ms. MS/MS acquisition parameters included an RP of 7,500 (FWHM), an AGC target of 1E5, and an IT of 18 ms, 50 ms, or 200 ms. Precursor ions were isolated within a 1.5 m/z window, and dynamic exclusion was set to 20 seconds. Charge-state exclusion was used to prevent unassigned and 1+ charge states from being selected for fragmentation. Higher-energy collisional dissociation was performed at a normalized collision energy of 27.


4.2.4 Data Analysis:

The LC-MS/MS .RAW files containing HeLa were searched against a concatenated Homo sapiens protein database, and peptides were identified using the Sequest HT algorithm in Proteome Discoverer 2.2 (PD 2.2) (Thermo Fisher Scientific, Waltham, MA).

PD 2.2 searched the .RAW MS/MS spectra, using the specified search parameters, against this database to determine peptide amino acid sequences and map each distinct sequence to a protein sequence. The search parameters specific to this experiment allowed up to three missed trypsin cleavage sites and used a precursor mass tolerance of 10 ppm and a fragment tolerance of 0.02 Da. Carbamidomethylation of cysteine was defined as a fixed modification. Variable modifications included oxidation of methionine (M) and two N-terminal modifications: acetylation, and loss of M with the addition of an acetyl group. The maximum number of modifications per peptide was set to 3. False discovery rates were calculated using the Percolator node and set at a maximum of 1%.
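Percolator rescores PSMs with a semi-supervised classifier before estimating error rates; that step is not reproduced here. The Python sketch below shows only the simpler target-decoy estimate that underlies a 1% FDR cutoff, as an illustration of the idea rather than Percolator's algorithm. The `(score, is_decoy)` tuple layout and the scores themselves are assumptions.

```python
def target_decoy_qvalues(psms):
    """Return a q-value for each PSM, in input order.

    psms: list of (score, is_decoy) tuples; a higher score means a better match.
    FDR at a score cutoff is estimated as (#decoys) / (#targets) at or above it.
    """
    order = sorted(range(len(psms)), key=lambda i: psms[i][0], reverse=True)
    fdr = [0.0] * len(psms)
    targets = decoys = 0
    for i in order:
        if psms[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # q-value: lowest FDR over all cutoffs that still include this PSM,
    # i.e. this cutoff or any lower (more permissive) one.
    qvalues = [0.0] * len(psms)
    running_min = float("inf")
    for i in reversed(order):
        running_min = min(running_min, fdr[i])
        qvalues[i] = running_min
    return qvalues


def accepted_targets(psms, max_q=0.01):
    """Indices of target PSMs passing the q-value cutoff (1% FDR by default)."""
    q = target_decoy_qvalues(psms)
    return [i for i, (_, is_decoy) in enumerate(psms) if not is_decoy and q[i] <= max_q]
```

Taking the running minimum from the bottom of the ranked list makes the q-values monotone in score, so a single cutoff on q is equivalent to a single score threshold.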

The peptides and their isotopologues from samples containing the Promega 6 × 5 and Pierce™ 7 × 5 mixes were analyzed using MS1 filtering in Skyline. Retention times, peak widths at full width at half maximum (FWHM), mass measurement accuracy, and peak abundances were all extracted using Skyline7,8. The abundances of the isotopologues were used to calculate the limit of detection (LOD) and limit of quantification (LOQ). The LOD was calculated by performing a linear regression on the average abundances of the isotopologues, setting y equal to the upper 95% confidence limit of the average isotopologue abundance, and solving for the concentration. The LOQ was defined as 3.3 times the LOD.
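The LOD/LOQ arithmetic can be sketched in Python as follows, assuming an ordinary least-squares line through the mean abundance at each concentration and a threshold equal to the upper limit of a two-sided 95% confidence interval of a mean. The text does not specify which set of replicate abundances defines that limit, so `reference_replicates` is a hypothetical input supplied by the user; the 3.3 × LOD rule for the LOQ follows the text.

```python
import math

# Two-sided 95% Student's t critical values, keyed by degrees of freedom (n - 1).
T95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
       6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def upper_95_limit(reference_replicates):
    """Upper bound of the two-sided 95% CI of the mean (2 to 11 replicates)."""
    n = len(reference_replicates)
    mean = sum(reference_replicates) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in reference_replicates) / (n - 1))
    return mean + T95[n - 1] * sd / math.sqrt(n)


def lod_loq(concentrations, mean_abundances, threshold):
    """Solve the calibration line for the concentration whose response equals
    `threshold` (the LOD); the LOQ is 3.3 x LOD, as defined in the text."""
    slope, intercept = linear_fit(concentrations, mean_abundances)
    lod = (threshold - intercept) / slope
    return lod, 3.3 * lod
```

In use, one would call `lod_loq(conc, means, upper_95_limit(replicates))`, where `replicates` holds the seven per-injection abundances for whichever reference level is chosen.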


4.3 RESULTS AND DISCUSSION

In this study, the 6 × 5 LC-MS/MS Peptide Reference Mix6 (6 × 5 mix) and the Pierce™ LC-MS/MS System Suitability Standard5 (7 × 5 mix) isotopologue mixtures (Figure 4.1) were spiked into a complex HeLa cell lysate. The 6 × 5 mix includes six peptides (VTSGSTSTSR, LASVSVSR, YVYVADVAAK, VVGGLVALR, LLSLGAGEFK, and LGFTDLFSK) that range from 8 to 10 amino acids in length and from 5.8 to 9.8 in isoelectric point (pI), and that span a range of m/z values. These peptides also span grand average of hydropathy (GRAVY)9 scores from -0.6 to 1.9 and nonpolar surface areas (NPSA)10 from 765 Å² to 1140 Å². Each peptide comes as a series of five isotopologues, beginning with one SIL amino acid and increasing to five or six SIL amino acids. The concentration increases with the number of SIL amino acids in tenfold steps from 0.05 fmol to 500 fmol, so the isotopologue with five or six SIL amino acids has the highest concentration.

The 7 × 5 mix includes seven peptides (GISNEGQNASIK, IGDYAGIK, TASEFDSAIAQDK, ELGQSGVDTYLQTK, SFANQPLEVVYSK, LTILEELR, and ELASGLSFPVGFK) that range from 8 to 14 amino acids in length and from 4 to 6.1 in pI, and that span a range of m/z values. These peptides also span GRAVY9 scores from -0.8 to 0.6 and NPSAs10 from 943 Å² to 1460 Å². Each of these peptides also comes as a series of five isotopologues, beginning with a light peptide (no SIL amino acids) and increasing to four SIL amino acids.


Figure 4.1. This figure is adapted from both reference mix protocols5,6. The top shows the retention time order of each isotopologue in the reference mixtures from Promega and Pierce. The charts provide information about the individual peptides in the mixes. The bottom shows examples of how the SIL amino acids are incorporated into each peptide (red and underlined) and gives the concentration of each isotopologue in the samples.

First, the liquid chromatography metrics of the reference mixes (retention time and peak width stability) were evaluated. Figure 4.2 shows example chromatograms of the most abundant isotopologues of each mix and of a combined sample.

In Figure 4.2A, the chromatogram shows the 6 × 5 mix peptides eluting across the gradient in the following order: VTSGSTSTSR, LASVSVSR, YVYVADVAAK, VVGGLVALR, LLSLGAGEFK, and LGFTDLFSK. These peptides and their respective isotopologues had RTs of 1.03 ± 0.66 minutes, 16.27 ± 0.24 minutes, 21.79 ± 0.22 minutes, 24.07 ± 0.19 minutes, 26.60 ± 0.22 minutes, and 33.21 ± 0.29 minutes, respectively. Peak widths at FWHM were 0.16 ± 0.21 minutes, 0.09 ± 0.04 minutes, 0.09 ± 0.03 minutes, 0.10 ± 0.04 minutes, 0.08 ± 0.03 minutes, and 0.10 ± 0.04 minutes, in the same order. The average peak width of all peptides in the mix over the course of the study was 0.10 ± 0.07 minutes. The first peptide, VTSGSTSTSR, is very hydrophilic, and the majority of its isotopologues were found in only 3 of the 7 analyses, at very low abundances, while the remaining five peptides and their isotopologues were found consistently.
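Skyline reports the FWHM directly. For readers who want to reproduce the metric from an exported chromatogram, this Python sketch estimates it from a single sampled peak by linearly interpolating the two half-height crossings. It assumes one baseline-resolved peak per trace and is not Skyline's own implementation.

```python
def fwhm(times, intensities):
    """Full width at half maximum of a single sampled peak (same units as times)."""
    if len(times) != len(intensities) or len(times) < 3:
        raise ValueError("need matching time/intensity lists with at least 3 points")
    apex = max(range(len(intensities)), key=intensities.__getitem__)
    half = intensities[apex] / 2.0

    # Walk left from the apex to the first point below half height.
    i = apex
    while i > 0 and intensities[i] >= half:
        i -= 1
    if intensities[i] >= half:
        raise ValueError("peak does not fall below half height on the leading edge")
    # Interpolate between point i (below half) and point i + 1 (at or above half).
    left = times[i] + (half - intensities[i]) * (times[i + 1] - times[i]) / (
        intensities[i + 1] - intensities[i])

    # Walk right from the apex to the first point below half height.
    j = apex
    while j < len(intensities) - 1 and intensities[j] >= half:
        j += 1
    if intensities[j] >= half:
        raise ValueError("peak does not fall below half height on the trailing edge")
    # Interpolate between point j - 1 (at or above half) and point j (below half).
    right = times[j - 1] + (intensities[j - 1] - half) * (times[j] - times[j - 1]) / (
        intensities[j - 1] - intensities[j])

    return right - left
```

Applied to each isotopologue's extracted-ion chromatogram across injections, the resulting widths could be summarized as mean ± SD per peptide, in the form reported above.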

In Figure 4.2B, the chromatogram shows the 7 × 5 mix peptides eluting consistently across the gradient in the following order: GISNEGQNASIK, IGDYAGIK, TASEFDSAIAQDK, ELGQSGVDTYLQTK, SFANQPLEVVYSK, LTILEELR, and ELASGLSFPVGFK. These peptides and their respective isotopologues had RTs of 15.04 ± 0.34 minutes, 19.36 ± 0.24 minutes, 21.60 ± 0.21 minutes, 24.46 ± 0.22 minutes, 27.23 ± 0.25 minutes, 29.36 ± 0.24 minutes, and 32.75 ± 0.27 minutes, respectively. Peak widths at FWHM were 0.07 ± 0.05 minutes, 0.08 ± 0.03 minutes, 0.10 ± 0.04 minutes, 0.08 ± 0.02 minutes, 0.11 ± 0.04 minutes, 0.10 ± 0.04 minutes, and 0.14 ± 0.04 minutes, in the same order. The average peak width of all peptides in the mix over the course of the study was 0.10 ± 0.04 minutes. All seven peptides, and the majority of their isotopologues, were found in every analysis.

Because the peptides in each mix are different, we examined whether combining the mixes offered any benefit, such as covering more of the gradient. As shown in Figure 4.2C, combining the mixes led to no enhancement, as the range of the gradient covered by the peptides did not increase substantially. Peptide SFANQPLEVVYSK from the 7 × 5 mix slightly overlapped peptide YVYVADVAAK from the 6 × 5 mix, and peptides TASEFDSAIAQDK from the 7 × 5 mix and VVGGLVALR from the 6 × 5 mix completely overlapped during elution. LLSLGAGEFK from the 6 × 5 mix was detected at much lower levels than its overlapping peptide, ELGQSGVDTYLQTK, from the 7 × 5 mix. The hydrophilic peptide VTSGSTSTSR from the 6 × 5 mix was detected at very low levels in the combined sample analyses.


Figure 4.2. Chromatograms of (A) the 6 × 5 mix, (B) the 7 × 5 mix, and (C) a combined sample of both mixes. Peptides VTSGSTSTSR and LLSLGAGEFK from the 6 × 5 mix were identified at low abundances, if at all, in both the individual and combined samples. All seven peptides from the 7 × 5 mix were detected in every sample.


As mentioned previously, both the 6 × 5 mix and the 7 × 5 mix contain increasing concentrations of the peptides as the number of SIL amino acids in the isotopologues increases. The concentrations of the isotopologues range from 0.05 fmol to 500 fmol in the 6 × 5 mix and from 0.13 fmol to 200 fmol in the 7 × 5 mix. This essentially provides a standard curve for each peptide, as the isotopologues are present in every analysis at known amounts. These standard curves allow the LOD and LOQ to be calculated and the dynamic range of each analysis to be observed. Figure 4.3 contains examples of these standard curves when the mixes are combined with the HeLa cell lysate, as well as the LOD and LOQ calculated for each peptide. The abundances for each isotopologue were normalized to the most abundant isotopologue of each peptide across all sample injections. Figure 4.3A shows an example standard curve for the most abundant peptide in the 6 × 5 mix and HeLa combination sample, YVYVADVAAK. As mentioned previously, peptide VTSGSTSTSR is extremely hydrophilic and was found in only four of the seven injections, so its results were inconclusive. The highest-concentration isotopologue of LLSLGAGEFK was difficult to identify in this experiment for an unknown reason; this outlier was therefore removed from the analysis, and only the isotopologues with concentrations from 0.05 fmol to 50 fmol were included. The LODs calculated for each of the peptides analyzed individually were 4.02 fmol, 3.80 fmol, 2.13 fmol, 0.49 fmol, and 5.22 fmol for peptides LASVSVSR, YVYVADVAAK, VVGGLVALR, LLSLGAGEFK, and LGFTDLFSK, respectively. The combined LOD and LOQ of all peptides, except the outlier VTSGSTSTSR and the 500 fmol isotopologue of LLSLGAGEFK, were calculated by performing a linear regression using the average normalized abundances of all isotopologues at each concentration. The LOD was calculated to be 4.15 fmol, just below the third-lowest isotopologue concentration in the mix, and the LOQ was calculated to be 13.69 fmol. The R² values of the standard curves (Figure B.1) for the four peptides with all isotopologue concentrations were ≥ 0.99. This suggests that these peptides are highly reproducible across injections of the same sample. The R² value for LLSLGAGEFK, excluding the response from the 500 fmol isotopologue, was also 0.99.
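The normalization and goodness-of-fit steps described above can be sketched in Python as follows. The data layout (a dictionary from concentration to per-injection abundances for one peptide) is an assumption, and the fit is an ordinary least-squares line on the linear concentration scale, which may differ from the exact fitting options used to produce Figures B.1 and B.2.

```python
def normalize_to_max(abundances):
    """Scale one peptide's isotopologue abundances to its most abundant value.

    abundances: {concentration_fmol: [abundance in each injection]}
    The maximum is taken over all isotopologues and all injections.
    """
    peak = max(max(values) for values in abundances.values())
    return {conc: [v / peak for v in values] for conc, values in abundances.items()}


def mean_curve(normalized):
    """Average normalized abundance at each concentration, sorted by concentration."""
    concs = sorted(normalized)
    return concs, [sum(normalized[c]) / len(normalized[c]) for c in concs]


def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)
```

Dropping an outlier level, as was done for the 500 fmol isotopologue of LLSLGAGEFK, amounts to deleting that key from the dictionary before calling `mean_curve`.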

Figure 4.3B shows an example standard curve for the most abundant peptide, ELGQSGVDTYLQTK, in the 7 × 5 mix and HeLa combination sample. The LODs calculated for each of the peptides analyzed individually were 207.74 fmol, 0.23 fmol, 2.86 fmol, 2.08 fmol, 1.62 fmol, 13.22 fmol, and 1.84 fmol for peptides GISNEGQNASIK, IGDYAGIK, TASEFDSAIAQDK, ELGQSGVDTYLQTK, SFANQPLEVVYSK, LTILEELR, and ELASGLSFPVGFK, respectively. The combined LOD of all peptides, except the outlier GISNEGQNASIK, was 1.69 fmol, just below the third-lowest isotopologue concentration, and the LOQ was calculated to be 5.57 fmol. The R² values of the standard curves (Figure B.2) were ≥ 0.99 for all peptides except GISNEGQNASIK, which was 0.39. This also suggests that most of these peptides are highly reproducible across injections, even when combined with HeLa cell lysate.


Figure 4.3. Example normalized standard curves for (A) the 6 × 5 mix and (B) the 7 × 5 mix, and the calculated LOD and LOQ values for each peptide as well as the whole-sample average. The LOD and LOQ calculations across all peptides in each mix excluded the outlier peptide VTSGSTSTSR and the 500 fmol isotopologue of LLSLGAGEFK in the 6 × 5 mix, and GISNEGQNASIK in the 7 × 5 mix.

The isotopologues also have the added benefit of increasing mass with increasing incorporation of SIL amino acids. This allows MMA to be tracked as a function of dynamic range across the gradient. MMA values were extracted from Skyline for each isotopologue and then analyzed in Excel. Figure 4.4 depicts a series of boxplots of the MMA of all peptides at each concentration on column. Q Exactive Orbitrap mass spectrometers are expected to measure m/z within a window of ± 3 ppm (red dashed lines) when a lock mass is used. Figure 4.4A depicts the changes in MMA for the peptides of the 6 × 5 mix. The isotopologues at the three highest concentrations fall well within the 3 ppm error limit, with only one outlier each at 500 fmol and 5 fmol. The peptides included in the mixes at 0.5 and 0.05 fmol show more variability in MMA, with the highest variability at 0.05 fmol. The higher variability at these small amounts is consistent with the calculated LOD of this sample and suggests that the dynamic range of the mixed system suitability standard containing both the 6 × 5 mix and HeLa spans over three orders of magnitude but not four. Figure 4.4B depicts the changes in MMA for the peptides of the 7 × 5 mix. The isotopologues at the two highest concentrations fall within the 3 ppm error limit, with only one outlier at 20 fmol. The peptides included in the mix at concentrations lower than 2 fmol show more variability in MMA, with the highest variability at 0.13 fmol. The higher variability at these concentrations is also consistent with the calculated LOD of this sample, suggesting that the dynamic range of the mixed system suitability standard containing both the Pierce 7 × 5 mix and HeLa also spans over three orders of magnitude but not four.
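MMA is the relative difference between the measured and theoretical m/z, expressed in ppm. The Python sketch below, offered as an alternative to the Excel workflow and under an assumed input layout of `(concentration, observed m/z, theoretical m/z)` records, computes per-isotopologue MMA and, for each on-column amount, the median error and the fraction of measurements inside the ±3 ppm window.

```python
from collections import defaultdict
from statistics import median


def mma_ppm(observed_mz, theoretical_mz):
    """Mass measurement accuracy in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def summarize_by_concentration(records, tolerance_ppm=3.0):
    """records: iterable of (concentration_fmol, observed_mz, theoretical_mz).

    Returns {concentration: (median_ppm, fraction_within_tolerance)},
    ordered from highest to lowest concentration.
    """
    errors = defaultdict(list)
    for conc, observed, theoretical in records:
        errors[conc].append(mma_ppm(observed, theoretical))
    summary = {}
    for conc in sorted(errors, reverse=True):
        values = errors[conc]
        within = sum(1 for e in values if abs(e) <= tolerance_ppm)
        summary[conc] = (median(values), within / len(values))
    return summary
```

A falling `fraction_within_tolerance` at the lowest amounts is the numerical counterpart of the widening boxplots described above.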


Figure 4.4. Mass measurement accuracy box plots for all peptides in (A) the 6 × 5 mix and (B) the 7 × 5 mix spiked into 200 ng of HeLa cell lysate, by concentration. The red dashed lines indicate ±3 ppm, the maximum expected error for a Q Exactive instrument.

Finally, we evaluated the metrics associated with mass spectrometer performance by examining the number of protein identifications and peptide spectral matches (PSMs) for the HeLa cell lysate portion of the samples, as seen in Figure 4.5.


Figure 4.5A compares the number of protein identifications obtained with the current method (1 µg of HeLa on a 140-minute gradient) with those obtained from the mixed samples, and Figure 4.5B shows the same comparison for the MS/MS events. Because each mass spectrometer and lab is different, the accepted average numbers of protein identifications and MS/MS events are arbitrary thresholds that indicate the instrument is running correctly. In our lab, the accepted thresholds are 3,000 protein identifications and 28,000 PSMs. A result below either threshold with the standard protocol (1 µg of HeLa, 140-minute gradient) indicates an issue with the mass spectrometer. To confirm that the mass spectrometer was working correctly for this experiment, the standard protocol was run, and an average of 3,400 proteins were identified from 32,024 PSMs. This indicated that the mass spectrometer was in good working order and allowed the new sample procedures to be used to set new accepted thresholds for both metrics.

To establish new arbitrary thresholds for less injected protein and a shorter gradient, 200 ng samples of HeLa lysate were analyzed without any spiked-in mixtures. These injections identified an average of 1,451 proteins from 6,649 PSMs. A ratio calculation was used to set the new acceptable identification limits at 1,250 proteins and 5,800 PSMs. The 6 × 5 mix in 200 ng of HeLa samples averaged 1,417 protein identifications from 6,639 PSMs, well above the new thresholds. The 7 × 5 mix in 200 ng of HeLa samples averaged 1,283 protein identifications from 5,548 PSMs. Compared with the new thresholds, the protein identifications were above the accepted value, while the PSMs were roughly 250 lower than expected. The lower values are likely due to unknown interferences from the peptide mixture, as it had not been tested previously.
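The ratio calculation is not spelled out in the text. One plausible reading, sketched below in Python as an assumption, scales each 200 ng average by the threshold-to-observed ratio measured with the standard 1 µg protocol. This gives about 1,280 proteins and 5,814 PSMs, close to the 1,250 and 5,800 limits used above, which appear to have been rounded down conservatively; that rounding step is likewise assumed.

```python
def scaled_threshold(reference_threshold, reference_observed, new_observed):
    """Scale a new-method average by the threshold/observed ratio of the reference method."""
    return new_observed * reference_threshold / reference_observed


# Reference protocol (1 ug HeLa, 140-min gradient): thresholds vs. observed averages.
protein_limit = scaled_threshold(3000, 3400, 1451)   # about 1,280 proteins
psm_limit = scaled_threshold(28000, 32024, 6649)     # about 5,814 PSMs

if __name__ == "__main__":
    print(f"proteins: {protein_limit:.0f}, PSMs: {psm_limit:.0f}")
```

Keeping the same threshold-to-observed ratio preserves the safety margin the lab already applies to the 1 µg method when moving to the 200 ng, 50-minute runs.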

Figure 4.5. (A) The average number of HeLa proteins identified in each sample type. (B) The average number of peptide spectral matches (PSMs) obtained from each sample composition. Error bars represent the standard error of the seven injections of each sample type. The samples containing 1 µg of HeLa lysate were analyzed using a 140-minute gradient for comparison to the current system suitability standard procedures.


4.4 CONCLUSIONS

We have reported the development of a novel mixed system suitability standard to evaluate both LC and MS performance simultaneously. Because both the 6 × 5 and 7 × 5 mixes include isotopologues at increasing concentrations, we were also able to evaluate the LOD, LOQ, MMA, and dynamic range of the samples. Five of the six peptides in the 6 × 5 mix were consistently observed across the gradient, but the mix produced high LOD and LOQ values because of inconsistencies with two of the peptides. However, when these outliers were removed, the LOD was calculated to be just below the third concentration of the isotopologues in the mix. This, along with the MMA variability, suggests that even when spiked into HeLa cell lysate, the dynamic range of the sample was over three orders of magnitude. The 6 × 5 mix and HeLa combination also yielded numbers of proteins and PSMs well above the new accepted values. All seven peptides in the 7 × 5 mix were consistently observed across the gradient and yielded low LOD and LOQ values, even with slight inconsistencies in the detection of one of the peptides. With this outlier peptide removed, the LOD fell just below the third concentration of the isotopologues in the mix. This, along with the MMA variability, indicates that when spiked into HeLa cell lysate, the dynamic range of the sample also reaches just over three orders of magnitude. In the sample containing both the 7 × 5 mix and HeLa lysate, the number of detected HeLa proteins fell just above the new accepted value, while the number of PSMs fell roughly 250 below it. Given these results, the best mixed system suitability sample from this study is the Promega 6 × 5 Reference Mix spiked into 200 ng of HeLa cell lysate, as it does not interfere with the HeLa cell lysate results.


4.5 ACKNOWLEDGMENTS

The mass spectrometry measurements were carried out in the Molecular Education, Technology, and Research Innovation Center (METRIC) at North Carolina State University. The authors gratefully acknowledge the financial support received from North Carolina State University.


4.6 LITERATURE CITED

(1) Bittremieux, W.; Tabb, D. L.; Impens, F.; Staes, A.; Timmerman, E.; Martens, L.; Laukens, K. Quality Control in Mass Spectrometry-Based Proteomics. Mass Spectrom. Rev. 2018, 37 (5), 697–711.

(2) Bereman, M. S. Tools for Monitoring System Suitability in LC MS/MS Centric Proteomic Experiments. Proteomics 2015, 15 (5–6), 891–902.

(3) Beri, J.; Rosenblatt, M. M.; Strauss, E.; Urh, M.; Bereman, M. S. Reagent for Evaluating Liquid Chromatography–Tandem Mass Spectrometry (LC-MS/MS) Performance in Bottom-Up Proteomic Experiments. Anal. Chem. 2015, 87 (23), 11635–11640.

(4) Burkhart, J. M.; Premsler, T.; Sickmann, A. Quality Control of Nano-LC-MS Systems Using Stable Isotope-Coded Peptides. Proteomics 2011, 11 (6), 1049–1057.

(5) User Guide: Pierce LC-MS/MS System Suitability Standard (7 x 5 Mix). https://www.thermofisher.com/document-connect/document-connect.html?url=https%3A%2F%2Fassets.thermofisher.com%2FTFS-Assets%2FLSG%2Fmanuals%2FMAN0018020_2162731_PierceSystemStdMix_UG.pdf&title=VXNlciBHdWlkZTogUGllcmNlIExDLU1TL01TIFN5c3RlbSBTdWl0YWJpbGl0eSBTdGFuZGFyZCAoNyB4IDUgTWl4KQ==

(6) Technical Manual: 6 × 5 LC-MS/MS Peptide Reference Mix. https://www.promega.com/-/media/files/resources/protocols/technical-manuals/101/6-x-5-lc-ms-ms-peptide-reference-mix-protocol.pdf


(7) Bereman, M. S.; Beri, J.; Sharma, V.; Nathe, C.; Eckels, J.; MacLean, B.; MacCoss, M. J. An Automated Pipeline to Monitor System Performance in Liquid Chromatography–Tandem Mass Spectrometry Proteomic Experiments. 2016.

(8) MacLean, B. X.; Pratt, B. S.; Egertson, J. D.; MacCoss, M. J.; Smith, R. D.; Baker, E. S. Using Skyline to Analyze Data-Containing Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry Dimensions. J. Am. Soc. Mass Spectrom. 2018, 29 (11), 2182–2188.

(9) Kyte, J.; Doolittle, R. F. A Simple Method for Displaying the Hydropathic Character of a Protein. J. Mol. Biol. 1982, 157 (1), 105–132.

(10) Karplus, P. A. Hydrophobicity Regained. Protein Sci. 1997, 6, 1302–1307.


Chapter 5

Enhanced Protocol for Quantitative N-linked Glycomics Analysis Using Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™

Reused with permission from: Kalmar, J. G.; Butler, K. E.; Baker, E. S.; Muddiman, D. C. Anal. Bioanal. Chem. 2020, 412, 7569–7579. © Springer-Verlag GmbH Germany 2020

5.1 INTRODUCTION

Glycosylation of proteins is a crucial post-translational modification associated with intra- and extracellular communication1, protein folding and stability2, and protein trafficking3. Because glycan properties, including protein binding locations, monomer linkages, and the site and timing of synthesis, largely dictate their functional roles, further investigation is needed. N-linked glycan studies are of great interest for assessing disease onset and progression, as disruption of N-glycosylation has been associated with neurodegenerative disorders4,5, inflammation-based diseases6, and numerous types of cancer3,7, among many others. In proteins containing the motif asparagine-X-serine/threonine, where X is any amino acid except proline, N-linked glycans are covalently bound to the asparagine (Asn) residue through a glycosidic bond8. More than 70% of proteins found in eukaryotic systems contain this sequon, which is necessary for N-glycosylation9, and as a consequence, more than half of proteins are decorated with N-linked glycans10. While many methods exist for the removal and purification of N-linked glycans from glycoproteins, one of the most common uses the enzyme peptide-N-glycosidase F (PNGase F) for glycan release11. PNGase F cleaves N-linked glycans from glycoproteins at the side-chain amide nitrogen of Asn, converting Asn to aspartic acid (Asp) and leaving the glycan structure intact8. This cleavage and conversion results in a +0.984 Da mass shift, allowing the assessment of potential glycosylation sites on the peptide as well as complete characterization of the released glycan.
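The size of this shift follows from the change in the side chain: the amide (-CONH2) of Asn becomes the carboxylic acid (-COOH) of Asp, which amounts to a net gain of one oxygen and a loss of one nitrogen and one hydrogen. The minimal Python sketch below does that arithmetic. It assumes standard monoisotopic atomic masses, which are reference values and not measurements from this study.

```python
# Standard monoisotopic atomic masses (u); reference values, not study data
MONO = {"H": 1.00782503, "N": 14.00307401, "O": 15.99491462}


def deamidation_shift() -> float:
    """Mass change when PNGase F converts Asn to Asp.

    The side-chain -CONH2 becomes -COOH: +1 O, -1 N, -1 H.
    """
    return MONO["O"] - MONO["N"] - MONO["H"]


if __name__ == "__main__":
    print(f"Asn -> Asp shift: {deamidation_shift():+.4f} Da")
```

Running it prints a shift of +0.9840 Da.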

The released glycans can then be analyzed, either directly or after chemical derivatization, by chromatography coupled to optical detectors or to mass spectrometry (MS).

The use of MS-based approaches for glycan analysis has become highly preferable, as it allows structural elucidation by MS and MS/MS12,13. To aid these evaluations, MS analysis is commonly preceded by liquid chromatography (LC) separation techniques such as hydrophilic interaction liquid chromatography (HILIC)14 or porous graphitic carbon (PGC) chromatography15, followed by electrospray ionization (ESI). Glycans are difficult to ionize because of their innate hydrophilicity and the hydrophobic bias of electrospray droplets16. However, a variety of derivatization methods have been employed to facilitate glycan analysis, including protection of the hydroxyl groups, as in permethylation17, or addition of chemical labels to the reducing end of the sugar after cleavage from the protein. Permethylation is commonly used to derivatize O-linked glycans, as they do not contain a reducing end after cleavage without further chemical modification. The most popular reagents for the derivatization of N-linked glycans are 2-aminobenzoic acid (2-AA) and 2-aminobenzamide (2-AB)18, which rely on reductive amination and are fluorescent.

Other popular reducing-end tags rely on different chemistries, such as Michael addition with 1-phenyl-3-methyl-5-pyrazolone19 or oxime formation with aminooxy Tandem Mass Tags20. Derivatization reagents can increase the hydrophobicity of glycans by expanding their non-polar surface area (NPSA). This significantly enhances their surface activity for better separations and increases their likelihood of ionization, while also providing diagnostic peaks in the mass spectrum for deep structural elucidation. The increased hydrophobicity also allows the use of reversed-phase liquid chromatography (RPLC). Though effective, these derivatization methods have challenges, including variable reaction efficiencies and additional cleanup steps, that can lengthen sample preparation and increase analytical variability21.

The Individuality Normalization when Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™ strategy has previously been used to increase the hydrophobicity of glycans by adding 4-phenethylbenzohydrazide (P2PGN) tags with large NPSA. These tags improve ionization efficiency and enable analysis by RPLC-MS16,21–23. Furthermore, the reaction efficiency is >95%, leading to almost complete labeling of the glycans present in a sample21. The INLIGHT™ strategy also facilitates relative quantitative analysis of glycans in biological samples, as it employs either a natural (NAT, ¹²C₆) or stable-isotope-labeled (SIL, ¹³C₆) phenyl ring23. Co-elution of the NAT- and SIL-derivatized glycans is crucial for their identification and relative quantification. Because the two labeled reagents are added in equal abundance and have the same reaction efficiency, relative quantification can be readily performed, even in complex matrices. This method has been used in conjunction with filter-aided N-glycan separation (FANGS)24,25 and applied to the analysis of N-linked glycans in many types of biological samples, ranging from fetuin to a large-scale ovarian cancer study in human plasma23,26.

The INLIGHT™ strategy has also been applied to the analysis of O-linked glycans27. Evaluating O-linked glycans is very challenging, since only one enzyme is known for the specific cleavage of mucin-type core 1 glycans28. Thus, their sample preparation differs considerably from that used for N-linked glycans. Chemical cleavage from the protein occurs via hydrazinolysis29 or β-elimination30, and subsequent chemical derivatization steps are needed to form the aldehyde required for labeling with the P2PGN reagent31. The O-linked derivatization method employs more acidic reaction conditions and requires a lower concentration of the P2PGN reagent (0.10 µg/µL, as compared to 1 µg/µL, or 200 µg per sample, in previous procedures). It also uses a shorter reaction time and a lower reaction temperature, which help prevent the loss of sialic acid and fucose compared with the current N-linked derivatization method27.
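The SIL tag differs from the NAT tag only by the six ¹³C atoms of its phenyl ring. As a result, every NAT/SIL pair is offset by a fixed mass of 6 × 1.00335 ≈ 6.020 Da, which appears as an m/z spacing of 6.020/z, and the NAT/SIL peak-area ratio reports the relative abundance of the glycan in the two combined samples. The minimal sketch below assumes a single tag per glycan (one reducing end) and ions of charge z. The function names are illustrative and are not part of any INLIGHT™ software.

```python
C13_C12_DELTA = 1.00335484   # mass difference between 13C and 12C (u)
N_LABELED_CARBONS = 6        # 13C6 phenyl ring in the SIL tag


def sil_partner_mz(nat_mz: float, charge: int, n_tags: int = 1) -> float:
    """Expected m/z of the SIL partner of a NAT-labeled glycan ion."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    delta_mass = n_tags * N_LABELED_CARBONS * C13_C12_DELTA
    return nat_mz + delta_mass / charge


def nat_sil_ratio(nat_area: float, sil_area: float) -> float:
    """Relative abundance NAT/SIL; 1.0 means equal amounts in both samples."""
    if sil_area <= 0:
        raise ValueError("SIL peak area must be positive")
    return nat_area / sil_area
```

For example, a doubly charged NAT-labeled ion at m/z 1000.0000 would have its SIL partner near m/z 1003.0101. Because the NAT and SIL samples are combined 1:1, a ratio near 1.0 indicates equal glycan abundance in the two samples.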

Herein, we report on improvements to the sample preparation and detection of N-linked glycans using the INLIGHT™ strategy. We also provide mechanistic insights as to why this approach outperforms previous work.

5.2 EXPERIMENTAL

5.2.1 Materials

Peptide N-glycosidase F (PNGase F) was purchased from New England Biolabs (Ipswich, MA) and Bulldog Bio (Portsmouth, NH). The natural (NAT) and stable-isotope-labeled (SIL) P2PGN hydrazide reagents were purchased from Cambridge Isotope Laboratories (Andover, MA). LC-MS grade solvents (water, acetonitrile, and methanol), ammonium bicarbonate, ammonium acetate, and formic acid were purchased from Fisher Scientific (Hampton, NH). Human plasma (K2 EDTA, male and female) was purchased from Golden West Biologicals (Temecula, CA). All other reagents, including the glycoprotein standards fetuin and horseradish peroxidase, were purchased from Sigma-Aldrich (St. Louis, MO).


5.2.2 Modified N-linked Glycan Preparation Protocol

A detailed, step-by-step protocol covering the cleavage and derivatization of N-linked glycans, adapted from Hecht et al.32, can be found in Appendix C. Glycoprotein samples were prepared using a modified filter-aided N-glycan separation procedure outlined previously25,32. Briefly, 250 μg of protein was loaded onto 10 kDa molecular weight cutoff filters. Proteins were denatured by adding 2 μL of dithiothreitol (DTT) to the filter and then diluted in 200 µL of digestion buffer (100 mM ammonium bicarbonate, pH 7.0). The centrifuge tubes were capped, lightly vortexed, and incubated at 56 °C for 30 minutes. Following denaturation, the samples were alkylated by adding 50 μL of 1 M iodoacetamide and incubating at 37 °C for 60 minutes. The alkylated, denatured proteins were then concentrated onto the filters by centrifugation at 14,000 × g for 40 minutes. Next, the samples were washed with 100 μL of digestion buffer and centrifuged at 14,000 × g for 20 minutes. This step was repeated twice more for a total of three washes. The filters were transferred to new, clean collection tubes prior to enzymatic release of the glycans. Release was accomplished by adding 2 µL of PNGase F (1,000 units), from either New England Biolabs (NEB) or Bulldog Bio (BB), to the filter. An additional 98 μL of digestion buffer was added to the filters, bringing the total solution volume to 100 μL. The samples were mixed by pipetting up and down on the filter and incubated at 37 °C for 18 hours. The glycans were eluted from the filters by centrifugation at 14,000 × g for 20 minutes. The filter was then washed with 100 μL of 100 mM ammonium bicarbonate buffer and centrifuged again at 14,000 × g for 20 minutes. This washing step was performed twice more for a total of three washes. The samples were then stored at -80 °C until fully frozen (~30 min) and dried to completion in a vacuum concentrator.

Figure 5.1. Modified FANGS workflow for glycan sample preparation. A) Fetuin and horseradish peroxidase samples were added to molecular weight cutoff filters. B) The proteins were denatured using DTT and then alkylated with iodoacetamide. C) The glycans were cleaved by addition of PNGase F purchased from either New England Biolabs or Bulldog Bio. D) The glycans were eluted and E) derivatized with either natural (NAT) or stable-isotope-labeled (SIL) INLIGHT™ tags using one of two derivatization methods. F) The NAT and SIL pairs were then combined 1:1 before analysis by LC-MS/MS.

5.2.3 INLIGHT™ Derivatization of N-Linked Glycans

A standard and a modified method were investigated for the derivatization of N-linked glycans (Table 5.1). The standard method has been described extensively23,25,32 for the analysis of N-linked glycans released from biological samples. Briefly, the NAT and SIL INLIGHT™ reagents were suspended in 1 mL of 3:1 (v:v) methanol:acetic acid for a final concentration of 1 μg/μL. Then 200 μL (200 μg) of the NAT or SIL reagent was added to the respective samples. The samples were lightly vortexed, briefly centrifuged (~5 s), and incubated for 3 hours at 56 °C to allow the glycans to react with the labeling reagent.

The modified method has previously been described as the optimal INLIGHT™ derivatization conditions for labeling O-linked glycans27. Briefly, the NAT and SIL INLIGHT™ reagents were suspended in 1 mL of methanol. Ten microliters of the NAT or SIL tag were then added to the respective samples and diluted with 45 μL of methanol and 45 μL of acetic acid. This yielded a final concentration of 0.1 μg/μL of P2PGN reagent in 55:45 (v:v) methanol:acetic acid. These samples were then incubated at 37 °C for 1.75 hours. Reactions from both methods were quenched by drying to completion in a vacuum concentrator at 55 °C, and the dried, derivatized glycans were stored at -20 °C until LC-MS/MS analysis.

To optimize the reagent concentration, samples were also prepared using increasing amounts of P2PGN reagent while maintaining the solution composition, incubation temperature, and incubation length. The P2PGN concentrations investigated were 0.001, 0.01, 0.05, 0.1, 0.25, 0.50, 0.75, and 1 µg/µL.
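The final reagent concentration and solvent composition of each method follow directly from the volumes above. The short Python sketch below makes that mixing arithmetic explicit. It assumes the methanol stock for the modified method is 1 µg/µL: the stock concentration is not stated for that method, but 1 µg/µL is what yields the reported 0.1 µg/µL final concentration. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Addition:
    """One liquid added to the dried glycan sample."""
    volume_ul: float
    methanol_frac: float       # volume fraction of methanol
    acetic_frac: float         # volume fraction of acetic acid
    reagent_ug_per_ul: float   # P2PGN concentration in this liquid


def final_mixture(additions: list) -> tuple:
    """Return (P2PGN µg/µL, % methanol, % acetic acid) of the combined solution."""
    total = sum(a.volume_ul for a in additions)
    reagent = sum(a.volume_ul * a.reagent_ug_per_ul for a in additions) / total
    meoh = 100 * sum(a.volume_ul * a.methanol_frac for a in additions) / total
    acoh = 100 * sum(a.volume_ul * a.acetic_frac for a in additions) / total
    return reagent, meoh, acoh


# Modified method: 10 µL stock (assumed 1 µg/µL in methanol) + 45 µL MeOH + 45 µL AcOH
modified = [
    Addition(10, 1.0, 0.0, 1.0),
    Addition(45, 1.0, 0.0, 0.0),
    Addition(45, 0.0, 1.0, 0.0),
]
# Standard method: 200 µL of 1 µg/µL reagent in 3:1 (v:v) methanol:acetic acid
standard = [Addition(200, 0.75, 0.25, 1.0)]

if __name__ == "__main__":
    print("modified:", final_mixture(modified))   # (0.1, 55.0, 45.0)
    print("standard:", final_mixture(standard))   # (1.0, 75.0, 25.0)
```

Under this assumption, the modified method uses 10 µg of reagent per sample, compared with 200 µg for the standard method. That is a 20-fold reduction in the tag that must later be kept out of the ion source.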

Table 5.1. Differences between the two derivatization conditions used for labeling released biological N-linked glycans.


5.2.4 Nano-LC MS/MS Analysis

Prior to nanoLC-MS/MS analysis, the dried, derivatized maltoheptaose, bovine fetuin, and horseradish peroxidase glycan samples were reconstituted in 50 μL of LC-MS grade water and pipetted up and down to ensure suspension of the glycans. The resuspended glycans were briefly vortexed and centrifuged at 14,000 × g for 5 minutes. The supernatant was then carefully transferred to new centrifuge tubes to minimize the amount of excess tag in the samples during analysis. Equimolar NAT- and SIL-derivatized samples were combined 1:1 (v:v).

Analysis of the combined NAT- and SIL-labeled glycan samples was performed by RP nanoLC-MS/MS using an EASY-nLC 1200 (Thermo Fisher Scientific, Waltham, MA) coupled in-line to a high-field quadrupole Orbitrap mass spectrometer (Q Exactive HF-X, Thermo Fisher Scientific, Bremen, Germany). Five microliters of glycan sample were loaded onto a C18 Acclaim™ PepMap™ trap column (2 cm × 75 μm, 3 μm) (Thermo Fisher Scientific, West Palm Beach, FL) connected to an EASY-Spray column (25 cm × 75 μm, 2 μm) (Thermo Fisher Scientific, West Palm Beach, FL). Mobile phase A (MPA) consisted of 98% water, 2% acetonitrile, and 0.1% formic acid, while mobile phase B (MPB) was 80% acetonitrile, 20% water, and 0.1% formic acid. The glycans were eluted using a gradient that ramped from 5% to 35% MPB in 2 minutes, then from 35% to 70% MPB over 40 minutes, and finally to 95% MPB in 1 minute. The gradient was then held at 95% MPB for 6 minutes before returning to 5% MPB in 1 minute, followed by a 10-minute hold for re-equilibration. The flow rate was 300 nL/min, and a spray voltage of 1.8 kV was applied.
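Written as time/%MPB breakpoints, the nanoLC program above totals 60 minutes per injection (2 + 40 + 1 + 6 + 1 + 10). The small Python sketch below interpolates %MPB at any time point. It assumes linear ramps between breakpoints and a start at 5% MPB at injection. `GRADIENT` and `percent_b` are hypothetical names, not instrument method fields.

```python
from bisect import bisect_right

# (time in min, %MPB) breakpoints for the nanoLC gradient described above
GRADIENT = [(0, 5), (2, 35), (42, 70), (43, 95), (49, 95), (50, 5), (60, 5)]


def percent_b(t_min: float, program=GRADIENT) -> float:
    """Linearly interpolate %MPB at time t_min (clamped to the program ends)."""
    times = [t for t, _ in program]
    if t_min <= times[0]:
        return program[0][1]
    if t_min >= times[-1]:
        return program[-1][1]
    i = bisect_right(times, t_min) - 1          # segment containing t_min
    (t0, b0), (t1, b1) = program[i], program[i + 1]
    return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)


if __name__ == "__main__":
    for t in (0, 2, 22, 42, 45, 55):
        print(f"{t:>4} min: {percent_b(t):5.1f} %B")
```

Encoding the program as breakpoints keeps the shallow 35–70% separation segment, which spans most of the run, easy to compare with the much shorter wash and re-equilibration steps.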


MS measurements were performed in Top12 data-dependent acquisition (DDA) mode over an m/z range of 300–2000. The resolving power was set to 60,000 (FWHM at m/z 200) for full MS1 acquisition and 15,000 (at m/z 200) for MS/MS acquisition. AGC targets for MS1 and MS/MS were set to 5 × 10⁵ and 5 × 10⁴, with maximum injection times of 64 ms and 100 ms, respectively. Higher-energy collisional dissociation (HCD) was performed at a normalized collision energy of 27. Precursor ions were isolated with a 1.4 m/z window, and the dynamic exclusion window was set to 15 seconds. Each sample was injected in triplicate to assess run-to-run reproducibility.
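Resolving power settings are quoted at m/z 200 because, in an Orbitrap analyzer with a fixed transient length, resolving power falls off approximately with the inverse square root of m/z. The rough Python sketch below shows what the 60,000 (MS1) setting implies at higher m/z. It assumes this idealized scaling and ignores instrument-specific effects. The function name is hypothetical.

```python
import math


def orbitrap_resolving_power(r_ref: float, mz: float, mz_ref: float = 200.0) -> float:
    """Approximate resolving power at m/z, scaling as (mz_ref / mz) ** 0.5."""
    if mz <= 0:
        raise ValueError("m/z must be positive")
    return r_ref * math.sqrt(mz_ref / mz)


if __name__ == "__main__":
    for mz in (200, 800, 2000):
        print(mz, round(orbitrap_resolving_power(60_000, mz)))
```

Under this approximation, the 60,000 setting corresponds to roughly 30,000 at m/z 800. This is still ample to resolve the NAT/SIL isotopic envelopes of the derivatized glycans.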

5.2.5 UHPLC MS/MS Analysis

Prior to LC-MS/MS analysis, 25 μL of LC-MS grade water was added to the dried, derivatized maltoheptaose, bovine fetuin, and pooled human plasma glycan samples, which were pipetted up and down to ensure the glycans were suspended. The resuspended glycans were briefly vortexed and then centrifuged at 14,000 × g for 5 minutes. To reduce the amount of unreacted tag in the samples, the supernatant was carefully removed and equal volumes were combined (1:1 NAT:SIL) in a new centrifuge tube.

Analysis of the combined NAT- and SIL-labeled glycan samples was performed with five distinct stationary phase/mobile phase combinations. Separations used a Vanquish UHPLC (Thermo Fisher Scientific, Waltham, MA) coupled in-line to an ultra-high-field Orbitrap Tribrid mass spectrometer (Orbitrap ID-X Tribrid, Thermo Fisher Scientific, Bremen, Germany). RPLC separations were carried out using a Waters (Milford, MA) UPLC BEH C18 (1.7 µm, 2.1 × 100 mm) column. HILIC separations used an Agilent (Santa Clara, CA) Zorbax HILIC RRHD (1.8 µm, 2.1 × 100 mm) column, and PGC separations were carried out on a Thermo Scientific (Waltham, MA) Hypercarb™ porous graphitic carbon (3 µm, 1 × 100 mm) column. Ten microliters of each glycan sample were loaded onto the column and eluted using gradient conditions optimized for each stationary phase/mobile phase combination. These conditions can be found in Tables C.1–C.3.

MS measurements were performed in DDA mode over an m/z range of 400–2000. For full MS1 acquisition, the resolving power was set to 120,000 (FWHM at m/z 200), the automatic gain control (AGC) target was set to 100%, and the maximum injection time (IT) was set to 50 ms. MS/MS acquisition parameters included a resolving power of 30,000 (at m/z 200), an AGC target set to predicted, and a maximum IT of 100 ms. Higher-energy collisional dissociation was performed at a normalized collision energy of 35%. Precursor ions were isolated with a 1.5 m/z window, and the dynamic exclusion window was set to 15 seconds. Each sample was injected in triplicate to assess run-to-run reproducibility.

5.2.6 Data Analysis

Glycans were identified by accurate mass (mass measurement accuracy ≤ 3 ppm) and manually evaluated using Qual Browser in Thermo Scientific Xcalibur software and in Skyline. The observed potential glycans were compared with accurate glycan masses reported in the literature23,33–37. A transition list comprising the exact masses of the tagged glycans was curated in Skyline and used to verify co-elution of the NAT- and SIL-labeled glycans. Glycans were quantified using the peak integration function in Skyline38,39. Normalization was then performed to account for variability within each experiment: the quantified data were normalized to the maximum base peak (BP) ion abundance among the injections within the same experiment. Normalization coefficients were generated by dividing this maximum base peak ion abundance by the base peak abundance of each individual injection. Each injection's data were then multiplied by the resulting coefficient.
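The two numerical steps in this data analysis are the ≤3 ppm accurate-mass filter and the base-peak normalization. The Python sketch below illustrates both. It is a reimplementation under stated assumptions: the function names and data layout are hypothetical, and in this study these steps were carried out in Xcalibur and Skyline.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Mass measurement accuracy in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def passes_mma(observed_mz: float, theoretical_mz: float, tol_ppm: float = 3.0) -> bool:
    """True if the observed m/z is within tol_ppm of the theoretical m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def base_peak_normalize(peak_areas: dict, base_peaks: dict) -> dict:
    """Scale each injection's peak areas by max(base peak) / its own base peak.

    peak_areas: {injection: {glycan: area}}
    base_peaks: {injection: base peak ion abundance}
    """
    max_bp = max(base_peaks.values())
    normalized = {}
    for inj, areas in peak_areas.items():
        coeff = max_bp / base_peaks[inj]   # normalization coefficient (>= 1)
        normalized[inj] = {glycan: area * coeff for glycan, area in areas.items()}
    return normalized
```

Because each coefficient is at least 1, the injection with the highest base peak is left unchanged, and every other injection is scaled up to a common intensity scale.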

5.3 RESULTS AND DISCUSSION

In this study, we investigated several facets of the LC-MS/MS analysis of INLIGHT™-derivatized glycans. The first step assessed the enzymatic cleavage of N-linked glycans from known glycoproteins using two sources of PNGase F. The first was the nonrecombinant, in-solution PNGase F enzyme (New England Biolabs), which has facilitated numerous glycan identifications in complex biological samples23,25,26,32,40,41. In contrast, the lyophilized, mutant recombinant form of PNGase F (Bulldog Bio) has primarily been implemented in matrix-assisted laser desorption/ionization (MALDI) MS imaging protocols for analyzing glycans42–47. To assess run-to-run reproducibility and determine whether discrepancies existed between the two enzymes, fetuin and horseradish peroxidase were evaluated with both PNGase F sources in triplicate.

While minor background changes were observed within each run, no differences in overall ion abundances or numbers of LC peaks were observed for either glycoprotein. Figure 5.2A shows representative base peak chromatograms (BPCs) for the glycans enzymatically released from fetuin. The insets in Figure 5.2A show the mass spectra summed over the retention window for INLIGHT™-derivatized glycans released using either the recombinant (BB, top) or nonrecombinant (NEB, bottom) PNGase F. The mass spectra were similar in both the number of NAT/SIL paired species present and the relative peak abundances, as shown in Figure 5.2B for the INLIGHT™-derivatized glycan GlcNAc2Man5. Comparisons of the absolute spectral abundances of the NAT and SIL peaks showed similar ion abundances regardless of the enzyme used. For both glycoprotein samples, the same derivatized glycans were identified regardless of the PNGase F source (Tables C.4 and C.5). This indicates that there was no detectable bias between the nonrecombinant, in-solution enzyme from New England Biolabs and the mutant recombinant, lyophilized enzyme from Bulldog Bio. Thus, either PNGase F source can be used in the FANGS-INLIGHT™ sample preparation strategy with comparable results.


Figure 5.2. PNGase F enzyme comparisons. A) Representative base peak chromatograms (BPCs) for glycans cleaved from fetuin using Bulldog Bio (BB) PNGase F (top) and New England Biolabs (NEB) PNGase F (bottom). The mass spectra from the LC window in which most glycans elute (8.0–12.0 min) are displayed. B) The ion abundances of the observed glycan (GlcNAc)2(Man)5 in each of the representative mass spectra were comparable for both sources of PNGase F.


Modification of the INLIGHT™ derivatization method was investigated next. The standard method for the derivatization of N-linked glycans, previously considered the optimal conditions for their analysis, was compared with the modifications reported by King et al. In that work, less INLIGHT™ reagent, more acidic conditions, a shorter reaction time, and a lower temperature during derivatization decreased sialic acid peeling and fucose loss for O-linked glycans. Because King et al. used different sample preparation steps prior to INLIGHT™ derivatization, further investigation was necessary. Using the same optimized temperature, solvent composition, and reaction time as King et al., the concentration of P2PGN was varied from 1 ng/µL to 1 µg/µL and reacted with maltoheptaose (Figure 5.3) and with N-linked glycans cleaved from bovine fetuin. Triplicate injections were performed to assess inter-run variability, and the NAT and SIL peak areas were compared. The maltoheptaose data are shown in Figure 5.3 because this standard requires only derivatization prior to RPLC-MS/MS analysis; therefore, the potential for sample loss from FANGS is minimal. The highest signal was observed for the reaction using 0.1 µg/µL INLIGHT™ reagent. Reagent concentrations below 0.1 µg/µL yielded lower abundances, which was attributed to a significant decrease in reaction efficiency. Conversely, samples labeled with >0.1 µg/µL reagent also had lower abundances. This was attributed to ionization suppression by excess tag, which remains in the sample and competes with the labeled glycans for charge during ESI. The unreacted tag was filtered out by the quadrupole so that it would not overfill the C-trap and exclude the ions of interest from detection, so its suppressive effect arises at the ionization stage. N-linked glycans from bovine fetuin were also analyzed and provided similar results.
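For each P2PGN concentration, the concentration screen comes down to computing the mean and standard error of the triplicate normalized peak areas and then identifying the concentration with the largest mean. The minimal Python sketch below shows that summary step. The function name is hypothetical, and no measured values from this study are included.

```python
import math
from statistics import mean, stdev


def summarize_screen(areas_by_conc: dict) -> tuple:
    """Return ({conc: (mean, standard error)}, concentration with the highest mean).

    areas_by_conc maps a P2PGN concentration (µg/µL) to its replicate
    normalized peak areas (e.g., three injections).
    """
    summary = {}
    for conc, areas in areas_by_conc.items():
        if len(areas) < 2:
            raise ValueError("need at least two replicates for a standard error")
        se = stdev(areas) / math.sqrt(len(areas))   # standard error of the mean
        summary[conc] = (mean(areas), se)
    best = max(summary, key=lambda c: summary[c][0])
    return summary, best
```

This picks the optimum on signal alone. As Figure 5.3 indicates, the NAT/SIL ratio at each concentration should also be checked for equivalence before a concentration is adopted.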

Figure 5.3. P2PGN concentration optimization for maximum peak area and equivalent NAT/SIL ratios, derivatizing 50 µg of maltoheptaose in each sample. Average peak areas of INLIGHT™-derivatized maltoheptaose are normalized to the maximum base peak ion abundance. Error bars represent the standard error of the individual peak areas in the triplicate analyses.

Comparing samples prepared using the standard protocol with those prepared using the modified protocol and 0.1 µg/µL P2PGN (Figure 5.4) revealed a substantial increase in the relative abundances of glycans. Figure 5.4A depicts extracted ion chromatograms for the NAT- and SIL-derivatized glycan (Man)5(GlcNAc)2 from bovine fetuin using both methods, while Figure 5.4B depicts (Xyl)1(GlcNAc)2(Man)3 from horseradish peroxidase. These data were normalized to the maximum base peak ion abundance within this experiment. This places all injections on a common intensity scale, so that the difference in signal between the two methods is not confounded by run-to-run variation in overall signal. The lower ion abundances observed with the standard protocol are attributed to ion suppression from the excess (unreacted) tag, which bleeds from the column and takes a significant amount of charge from the electrospray droplets. As previously mentioned, the unreacted tag is not observed in the mass spectrum because it is purposefully filtered out by the quadrupole to allow the C-trap to be filled with the labeled glycans.

The average peak area for (Man)5(GlcNAc)2 increased ~96-fold compared with the standard method, and that for (Xyl)1(GlcNAc)2(Man)3 increased ~23-fold. The difference between these two gains reflects the innate ionization differences between the two representative N-linked glycans. Both gains, however, represent a signal increase of more than an order of magnitude, rather than the four-fold increase originally reported when the P2PGN tag was first characterized21.
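Fold changes like these are easier to compare on a log scale, where one order of magnitude corresponds to a log10 fold change of 1. The short Python sketch below converts the reported ~96-fold and ~23-fold increases and the originally reported four-fold increase. The function names are illustrative.

```python
import math


def fold_change(area_modified: float, area_standard: float) -> float:
    """Ratio of normalized peak areas, modified vs. standard method."""
    return area_modified / area_standard


def orders_of_magnitude(fold: float) -> float:
    """log10 of a fold change; 1.0 corresponds to a ten-fold increase."""
    return math.log10(fold)


if __name__ == "__main__":
    for label, fold in [("(Man)5(GlcNAc)2", 96),
                        ("(Xyl)1(GlcNAc)2(Man)3", 23),
                        ("original P2PGN report", 4)]:
        print(f"{label}: {fold}x = 10^{orders_of_magnitude(fold):.2f}")
```

Both glycans exceed 10^1 (about 10^1.98 and 10^1.36), whereas the original four-fold increase corresponds to only about 10^0.60.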

It was also apparent across sample types that excess derivatization reagent, even when its carryover is carefully minimized during sample preparation, impairs the ionization and detection of low-abundance glycans. For instance, the glycans (Fuc)1(Gal)2(GlcNAc)4(Man)3(NeuAc)1 and (Gal)3(GlcNAc)5(Man)3(NeuAc)3 from fetuin were identified only in samples prepared using the modified method (Table C.4).


Figure 5.4. Sensitivity enhancement using the modified method. Extracted ion chromatograms and average peak areas, normalized to the maximum base peak abundance, for A) (Man)5(GlcNAc)2 in fetuin and B) (Xyl)1(GlcNAc)2(Man)3 in horseradish peroxidase. The observed glycans were derivatized using the modified and standard methods described in Table 5.1. Deviations from the 1:1 ratio are likely due to systematic sample preparation errors and to a slight bias toward increased signal from the heavy-labeled glycan previously noted by Walker et al.48 Error bars were derived from the peak areas in the triplicate analyses.


Figure 5.5. LC separation and sensitivity optimization for INLIGHT™-derivatized glycans. Three stationary phases were evaluated with their most common mobile phases, all containing 0.1% formic acid: A) porous graphitic carbon (PGC), B) hydrophilic interaction chromatography (HILIC), and C) reversed-phase liquid chromatography (RPLC). Maltoheptaose and N-linked glycans from bovine fetuin and pooled human plasma were used for the evaluation because of their increasing complexity and the glycans known from the literature. The chromatograms represent the separation of the INLIGHT™-labeled N-linked glycans derived from fetuin. Representative theoretical isotopic distributions for the NAT and SIL forms, calculated using enviPat49, are overlaid with the experimental data, demonstrating how spectral accuracy can aid in glycan identification. The best LC system for separating and identifying INLIGHT™-derivatized N-linked glycans was RPLC using a C18 stationary phase, with mobile phase A comprising 98% water, 2% acetonitrile (ACN), and 0.1% formic acid and mobile phase B comprising 98% ACN, 2% water, and 0.1% formic acid.


The final optimization evaluated LC separation parameters for the INLIGHT™-derivatized N-linked glycans. Numerous LC separation techniques exist for derivatized N-linked glycans, although the stationary phases, mobile phases, and separation conditions vary greatly. The most commonly employed stationary phases are PGC, HILIC, and RPLC, so each was evaluated. Mobile phase compositions were chosen based on those commonly employed when separating glycans with each stationary phase. To determine which LC setup best separated the INLIGHT™-derivatized N-linked glycans, derivatized maltoheptaose, bovine fetuin, and pooled human plasma glycans were analyzed with multiple columns and mobile phases (Figure 5.5).

In the LC evaluations for all stationary phases, the composition of MPA was varied while the MPB composition was kept constant (98% acetonitrile, 2% water, and 0.1% formic acid). For PGC, MPA normally consists of 10 mM ammonium bicarbonate, 10 mM ammonium acetate, or water with small amounts of acetonitrile and formic acid50–56. Two MPA compositions were therefore tested: 10 mM ammonium acetate with 0.1% formic acid, and 98% water, 2% acetonitrile, and 0.1% formic acid (Figure 5.5A). Because a 95% methanol washing step helps maintain the integrity of the PGC stationary phase, a blank sample of each MPA was run after every sample with a third mobile phase (MPC) of 95% methanol to regenerate the analytical column56. HILIC mobile phases for derivatized glycans typically use various concentrations of ammonium formate or acetonitrile for MPA, while MPB consists primarily of acetonitrile34,57–60. The OptaMax NG ionization source used on the Orbitrap ID-X in this study does not function optimally with ion-pairing reagents or other additives at concentrations exceeding 10 mM. Thus, the two MPA compositions selected for HILIC analysis were 10 mM ammonium formate in water with 0.1% formic acid, and 98% water, 2% acetonitrile, and 0.1% formic acid (Figure 5.5B). Finally, RPLC typically uses an MPA composed primarily of water and an MPB composed mainly of acetonitrile23,25,27,61–63. Accordingly, 98% water, 2% acetonitrile, and 0.1% formic acid was used as MPA (Figure 5.5C).

To assess the retention of INLIGHT™-derivatized samples, derivatized maltoheptaose was first analyzed with each setup. INLIGHT™-derivatized maltoheptaose, maltohexaose, maltopentaose, and maltotetraose were retained in all cases except on the HILIC column with the water and acetonitrile mobile phases. N-linked glycans enzymatically removed from bovine fetuin were analyzed next with each LC setup to observe the retention of the derivatized glycans. Glycans were considered identified only when the NAT and SIL versions co-eluted and the mass spectrum contained both isotopic distributions. No INLIGHT™-derivatized N-linked glycans were retained on the PGC column with either mobile phase system. Four INLIGHT™-derivatized N-linked glycans from bovine fetuin were retained on the HILIC column with the water and acetonitrile mobile phases, while five were retained with the ammonium formate and acetonitrile mobile phases. The highest number of INLIGHT™-derivatized N-linked glycans, 18, were retained on the C18 column. For chromatographic conditions under which glycans were retained, the gradient and method were optimized to further improve the separation (Tables C.1–C.3).
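The identification criterion above requires the NAT and SIL forms to co-elute, with both isotopic distributions present. This can be expressed as a simple pairing rule, sketched below in Python. The sketch assumes each detected feature is summarized by its monoisotopic m/z, charge, and apex retention time. The 3 ppm tolerance mirrors the mass accuracy criterion used for identification, while the 0.1 min retention-time tolerance and all names are illustrative assumptions rather than settings from this study.

```python
from dataclasses import dataclass

SIL_SHIFT = 6 * 1.00335484   # mass difference of the 13C6 vs 12C6 phenyl ring (u)


@dataclass(frozen=True)
class Feature:
    mz: float       # monoisotopic m/z
    charge: int
    rt_min: float   # apex retention time (min)


def is_nat_sil_pair(nat: Feature, sil: Feature,
                    tol_ppm: float = 3.0, tol_rt: float = 0.1) -> bool:
    """True if `sil` matches the 13C6-labeled partner of `nat` and the two co-elute."""
    if nat.charge != sil.charge:
        return False
    expected = nat.mz + SIL_SHIFT / nat.charge
    ppm = abs(sil.mz - expected) / expected * 1e6
    return ppm <= tol_ppm and abs(sil.rt_min - nat.rt_min) <= tol_rt


def find_pairs(features: list) -> list:
    """Return all (NAT, SIL) feature pairs that satisfy the co-elution rule."""
    return [(a, b) for a in features for b in features if is_nat_sil_pair(a, b)]
```

Because ¹³C labeling, unlike deuterium labeling, does not measurably shift retention, a tight retention-time window is a reasonable way to reject coincidental m/z matches.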

Finally, N-linked glycans enzymatically removed from human pooled plasma proteins were analyzed using the optimized gradient for each LC system. INLIGHT™ derivatized N-linked glycans were not retained on the PGC column using either mobile

128

phase systems, nor were any detected following separation via the HILIC column with the water and acetonitrile mobile phases. However, 21 INLIGHT™ derivatized N-linked glycans were retained on the HILIC column when using ammonium formate and acetonitrile and 57 were retained and subsequently detected using the RPLC system.

Thus, the RPLC setup provided the best separation of INLIGHT™ derivatized N-linked glycans, with the majority of detected glycans eluting between 6 and 9 minutes of the 30-minute gradient (Table C.3). Although most glycans eluted early in this gradient, it was necessary to keep the gradient at 30 minutes because shortening it prevented efficient separation of the derivatized glycans.

5.4 CONCLUSIONS

Three steps of the INLIGHT™ derivatized N-linked glycan pipeline were investigated and optimized. First, a comparison of recombinant and nonrecombinant PNGase F showed no difference in the number of glycans observed. Second, modifications to the INLIGHT™ derivatization step, including temperature optimization and changes in solvent composition, reaction time, and tag concentration, increased the signal of the N-linked glycans detected in fetuin and horseradish peroxidase by over an order of magnitude compared with the standard method. The modified method not only significantly increased the signal for N-linked glycans but also improved sample throughput by decreasing the time required for sample preparation, and its lower reaction temperature minimized the chances of sialic acid peeling and fucose loss. Finally, LC setup optimization showed that a C18 RPLC column with an MPA of 98% water, 2% acetonitrile, and 0.1% formic acid and an MPB of 98% acetonitrile, 2% water, and 0.1% formic acid enabled the best retention and separation of INLIGHT™ derivatized N-linked glycans compared with two PGC and two HILIC setups. These optimized methods were demonstrated on a complex sample (human plasma) and provided the best selectivity and sensitivity for INLIGHT™ derivatized N-linked glycan analyses.

5.5 ACKNOWLEDGMENTS

This work was performed in part by the Molecular Education, Technology and Research Innovation Center (METRIC) at NC State University, which is supported by the State of North Carolina. The authors gratefully acknowledge financial support from the National Institute on Aging at the National Institutes of Health (R56AG063885) and North Carolina State University.

5.6 COMPLIANCE WITH ETHICAL STANDARDS

The pooled male and female human plasma samples used in these studies were sourced from a licensed entity, Golden West Biologicals (Temecula, CA). Golden West Biologicals de-identified the samples with respect to which participants made up the male and female pools. Upon arrival, and for this study, we further pooled the male and female plasma 1:1 (v/v) prior to analysis. No data in our study can be linked to any human subject.


5.7 LITERATURE CITED

(1) Parker, R. B.; Kohler, J. J. Regulation of Intracellular Signaling by Extracellular Glycan Remodeling. ACS Chem. Biol. 2010, 5 (1), 35–46. https://doi.org/10.1038/jid.2014.371.

(2) Jayaprakash, N. G.; Surolia, A. Role of Glycosylation in Nucleating Protein Folding and Stability. Biochem. J. 2017, 474 (14), 2333–2347. https://doi.org/10.1042/BCJ20170111.

(3) Bard, F.; Chia, J. Cracking the Glycome Encoder: Signaling, Trafficking, and Glycosylation. Trends Cell Biol. 2016, 26 (5), 379–388. https://doi.org/10.1016/j.tcb.2015.12.004.

(4) Cho, B. G.; Veillon, L.; Mechref, Y. N-Glycan Profile of Cerebrospinal Fluids from Alzheimer’s Disease Patients Using Liquid Chromatography with Mass Spectrometry. J. Proteome Res. 2019, 18 (10), 3770–3779. https://doi.org/10.1021/acs.jproteome.9b00504.

(5) Abou-Abbass, H.; Abou-El-Hassan, H.; Bahmad, H.; Zibara, K.; Zebian, A.; Youssef, R.; Ismail, J.; Zhu, R.; Zhou, S.; Dong, X.; Nasser, M.; Bahmad, M.; Darwish, H.; Mechref, Y.; Kobeissy, F. Glycosylation and Other PTMs Alterations in Neurodegenerative Diseases: Current Status and Future Role in Neurotrauma. Electrophoresis 2016, 1549–1561. https://doi.org/10.1002/elps.201500585.

(6) McCarthy, C.; Saldova, R.; Wormald, M. R.; Rudd, P. M.; McElvaney, N. G.; Reeves, E. P. The Role and Importance of Glycosylation of Acute Phase Proteins with Focus on Alpha-1 Antitrypsin in Acute and Chronic Inflammatory Conditions. J. Proteome Res. 2014, 13 (7), 3131–3143. https://doi.org/10.1021/pr500146y.

(7) Wang, H.; Ramakrishnan, A.; Fletcher, S.; Prochownik, E. V.; Genetics, M. Protein Glycosylation in Cancer. Annu. Rev. Pathol. 2015, 2 (2), 473–510. https://doi.org/10.14440/jbm.2015.54.A.

(8) Stanley, P.; Taniguchi, N.; Aebi, M. Chapter 9. N-Glycans, Essentials of Glycobiology, 2nd Edition. Essentials Glycobiol. 2017, 1–14. https://doi.org/10.1101/glycobiology.3e.009.

(9) Stanley, P.; Taniguchi, N.; Aebi, M. N-Glycans. Essentials Glycobiol. 2017, 1–14. https://doi.org/10.1101/glycobiology.3e.009.

(10) Apweiler, R.; Hermjakob, H.; Sharon, N. On the Frequency of Protein Glycosylation, as Deduced from Analysis of the SWISS-PROT Database. Biochim. Biophys. Acta 1999, 1473 (1), 4–8. https://doi.org/10.1097/00013611-198607000-00004.

(11) Tarentino, A. L.; Gomez, C. M.; Plummer, T. H. Deglycosylation of Asparagine-Linked Glycans by Peptide:N-Glycosidase F. Biochemistry 1985, 24 (17), 4665–4671. https://doi.org/10.1021/bi00338a028.

(12) Han, L.; Costello, C. E. Mass Spectrometry of Glycans. Biochemistry (Mosc.) 2013, 78 (7), 710–720. https://doi.org/10.1134/S0006297913070031.

(13) Wada, Y.; Azadi, P.; Costello, C. E.; Dell, A.; Dwek, R. A.; Geyer, H.; Geyer, R.; Kakehi, K.; Karlsson, N. G.; Kato, K.; Kawasaki, N.; Khoo, K. H.; Kim, S.; Kondo, A.; Lattova, E.; Mechref, Y.; Miyoshi, E.; Nakamura, K.; Narimatsu, H.; Novotny, M. V.; Packer, N. H.; Perreault, H.; Peter-Katalinić, J.; Pohlentz, G.; Reinhold, V. N.; Rudd, P. M.; Suzuki, A.; Taniguchi, N. Comparison of the Methods for Profiling Glycoprotein Glycans - HUPO Human Disease Glycomics/Proteome Initiative Multi-Institutional Study. Glycobiology 2007, 17 (4), 411–422. https://doi.org/10.1093/glycob/cwl086.

(14) Nováková, L.; Havlíková, L.; Vlčková, H. Hydrophilic Interaction Chromatography of Polar and Ionizable Compounds by UHPLC. TrAC Trends Anal. Chem. 2014, 63, 55–64. https://doi.org/10.1016/J.TRAC.2014.08.004.

(15) West, C.; Elfakir, C.; Lafosse, M. Porous Graphitic Carbon: A Versatile Stationary Phase for Liquid Chromatography. J. Chromatogr. A 2010, 3201–3216. https://doi.org/10.1016/j.chroma.2009.09.052.

(16) Null, A. P.; Nepomuceno, A. I.; Muddiman, D. C. Implications of Hydrophobicity and Free Energy of Solvation for Characterization of Nucleic Acids by Electrospray Ionization Mass Spectrometry. Anal. Chem. 2003, 75 (6), 1331–1339. https://doi.org/10.1021/ac026217o.

(17) Baldwin, M. A.; Stahl, N.; Reinders, L. G.; Gibson, B. W.; Prusiner, S. B.; Burlingame, A. L. Permethylation and Tandem Mass Spectrometry of Oligosaccharides Having Free Hexosamine: Analysis of the Glycoinositol Phospholipid Anchor Glycan from the Scrapie Prion Protein. Anal. Biochem. 1990, 191 (1), 174–182. https://doi.org/10.1016/0003-2697(90)90405-X.

(18) Bigge, J. C.; Patel, T. P.; Bruce, J. A.; Goulding, P. N.; Charles, S. M.; Parekh, R. B. Nonselective and Efficient Fluorescent Labeling of Glycans Using 2-Aminobenzamide and Anthranilic Acid. Anal. Biochem. 1995, 229–238. https://doi.org/10.1006/abio.1995.1468.

(19) Ruhaak, L. R.; Zauner, G.; Huhn, C.; Bruggink, C.; Deelder, A. M.; Wuhrer, M. Glycan Labeling Strategies and Their Use in Identification and Quantification. Anal. Bioanal. Chem. 2010, 3457–3481. https://doi.org/10.1007/s00216-010-3532-z.

(20) Hahne, H.; Neubert, P.; Kuhn, K.; Etienne, C.; Bomgarden, R.; Rogers, J. C.; Kuster, B. Carbonyl-Reactive Tandem Mass Tags for the Proteome-Wide Quantification of N-Linked Glycans. Anal. Chem. 2012, 84 (8), 3716–3724. https://doi.org/10.1021/ac300197c.

(21) Walker, S. H.; Lilley, L. M.; Enamorado, M. F.; Comins, D. L.; Muddiman, D. C. Hydrophobic Derivatization of N-Linked Glycans for Increased Ion Abundance in Electrospray Ionization Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2011, 22 (8), 1309–1317. https://doi.org/10.1007/s13361-011-0140-x.

(22) Walker, S. H.; Carlisle, B. C.; Muddiman, D. C. Systematic Comparison of Reverse Phase and Hydrophilic Interaction Liquid Chromatography Platforms for the Analysis of N-Linked Glycans. Anal. Chem. 2012, 84 (19), 8198–8206. https://doi.org/10.1021/ac3012494.

(23) Walker, S. H.; Taylor, A. D.; Muddiman, D. C. Individuality Normalization When Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT): A Novel Glycan-Relative Quantification Strategy. J. Am. Soc. Mass Spectrom. 2013, 24 (9), 1376–1384. https://doi.org/10.1007/s13361-013-0681-2.

(24) Abdul Rahman, S.; Bergström, E.; Watson, C. J.; Wilson, K. M.; Ashford, D. A.; Thomas, J. R.; Ungar, D.; Thomas-Oates, J. E. Filter-Aided N-Glycan Separation (FANGS): A Convenient Sample Preparation Method for Mass Spectrometric N-Glycan Profiling. J. Proteome Res. 2014, 13 (3), 1167–1176. https://doi.org/10.1021/pr401043r.

(25) Hecht, E. S.; McCord, J. P.; Muddiman, D. C. Definitive Screening Design Optimization of Mass Spectrometry Parameters for Sensitive Comparison of Filter and Solid Phase Extraction Purified, INLIGHT Plasma N-Glycans. Anal. Chem. 2015, 87 (14), 7305–7312. https://doi.org/10.1021/acs.analchem.5b01609.

(26) Hecht, E. S.; Scholl, E. H.; Walker, S. H.; Taylor, A. D.; Cliby, W. A.; Motsinger-Reif, A. A.; Muddiman, D. C. Relative Quantification and Higher-Order Modeling of the Plasma Glycan Cancer Burden Ratio in Ovarian Cancer Case-Control Samples. J. Proteome Res. 2015, 14 (10), 4394–4401. https://doi.org/10.1021/acs.jproteome.5b00703.

(27) King, S. R.; Hecht, E. S.; Muddiman, D. C. Demonstration of Hydrazide Tagging for O-Glycans and a Central Composite Design of Experiments Optimization Using the INLIGHT™ Reagent. Anal. Bioanal. Chem. 2018, 410 (5), 1409–1415. https://doi.org/10.1007/s00216-017-0828-2.

(28) Fujita, K.; Oura, F.; Nagamine, N.; Katayama, T.; Hiratake, J.; Sakata, K.; Kumagai, H.; Yamamoto, K. Identification and Molecular Cloning of a Novel Glycoside Hydrolase Family of Core 1 Type O-Glycan-Specific Endo-α-N-Acetylgalactosaminidase from Bifidobacterium longum. J. Biol. Chem. 2005, 280 (45), 37415–37422. https://doi.org/10.1074/jbc.M506874200.

(29) Kuraya, N.; Hase, S. Release of O-Linked Sugar Chains from Glycoproteins with Anhydrous Hydrazine and Pyridylamination of the Sugar Chains with Improved Reaction Conditions. J. Biochem. 1992, 112 (1), 122–126.

(30) Taylor, A. M.; Holst, O.; Thomas-Oates, J. Mass Spectrometric Profiling of O-Linked Glycans Released Directly from Glycoproteins in Gels Using in-Gel Reductive β-Elimination. Proteomics 2006, 6 (10), 2936–2946. https://doi.org/10.1002/pmic.200500331.

(31) Bereman, M. S.; Comins, D. L.; Muddiman, D. C. Increasing the Hydrophobicity and Electrospray Response of Glycans through Derivatization with Novel Cationic Hydrazides. Chem. Commun. 2010, 46 (2), 237–239. https://doi.org/10.1039/b915589a.

(32) Hecht, E. S.; McCord, J. P.; Muddiman, D. C. A Quantitative Glycomics and Proteomics Combined Purification Strategy. J. Vis. Exp. 2016, No. 109, e53735. https://doi.org/10.3791/53735.

(33) Sun, X.; Tao, L.; Yi, L.; Ouyang, Y.; Xu, N.; Li, D.; Linhardt, R. J.; Zhang, Z. N-Glycans Released from Glycoproteins Using a Commercial Kit and Comprehensively Analyzed with a Hypothetical Database. J. Pharm. Anal. 2017, 7 (2), 87–94. https://doi.org/10.1016/j.jpha.2017.01.004.

(34) Melmer, M.; Stangler, T.; Premstaller, A.; Lindner, W. Comparison of Hydrophilic-Interaction, Reversed-Phase and Porous Graphitic Carbon Chromatography for Glycan Analysis. J. Chromatogr. A 2011, 1218 (1), 118–123. https://doi.org/10.1016/j.chroma.2010.10.122.

(35) Ding, W.; Nothaft, H.; Szymanski, C. M.; Kelly, J. Identification and Quantification of Glycoproteins Using Ion-Pairing Normal-Phase Liquid Chromatography and Mass Spectrometry. Mol. Cell. Proteomics 2009, 8 (9), 2170–2185. https://doi.org/10.1074/mcp.M900088-MCP200.

(36) Gray, J. S.; Yang, B. Y.; Montgomery, R. Heterogeneity of Glycans at Each N-Glycosylation Site of Horseradish Peroxidase. Carbohydr. Res. 1998, 311 (1–2), 61–69. https://doi.org/10.1016/s0008-6215(98)00209-2.

(37) Yang, B. Y.; Gray, J. S. S.; Montgomery, R. The Glycans of Horseradish Peroxidase. Carbohydr. Res. 1996, 287 (2), 203–212. https://doi.org/10.1016/0008-6215(96)00073-0.

(38) MacLean, B.; Tomazela, D. M.; Shulman, N.; Chambers, M.; Finney, G. L.; Frewen, B.; Kern, R.; Tabb, D. L.; Liebler, D. C.; MacCoss, M. J. Skyline: An Open Source Document Editor for Creating and Analyzing Targeted Proteomics Experiments. Bioinformatics 2010, 26 (7), 966–968. https://doi.org/10.1093/bioinformatics/btq054.

(39) Loziuk, P. L.; Hecht, E. S.; Muddiman, D. C. N-Linked Glycosite Profiling and Use of Skyline as a Platform for Characterization and Relative Quantification of Glycans in Differentiating Xylem of Populus trichocarpa. Anal. Bioanal. Chem. 2017, 409 (2), 487–497. https://doi.org/10.1007/s00216-016-9776-5.

(40) Hecht, E. S.; Loziuk, P. L.; Muddiman, D. C. Xylose Migration During Tandem Mass Spectrometry of N-Linked Glycans. J. Am. Soc. Mass Spectrom. 2017, 28 (4). https://doi.org/10.1007/s13361-016-1588-5.

(41) Niedzwiecki, M. M.; Walker, D. I.; Howell, J. C.; Watts, K. D.; Jones, D. P.; Miller, G. W.; Hu, W. T. High-Resolution Metabolomic Profiling of Alzheimer’s Disease in Plasma. Ann. Clin. Transl. Neurol. 2019, acn3.50956. https://doi.org/10.1002/acn3.50956.

(42) Holst, S.; Heijs, B.; de Haan, N.; van Zeijl, R. J. M.; Briaire-de Bruijn, I. H.; van Pelt, G. W.; Mehta, A. S.; Angel, P. M.; Mesker, W. E.; Tollenaar, R. A.; Drake, R. R.; Bovée, J. V. M. G.; McDonnell, L. A.; Wuhrer, M. Linkage-Specific in Situ Sialic Acid Derivatization for N-Glycan Mass Spectrometry Imaging of Formalin-Fixed Paraffin-Embedded Tissues. Anal. Chem. 2016, 88 (11), 5904–5913. https://doi.org/10.1021/acs.analchem.6b00819.

(43) West, C. A.; Wang, M.; Herrera, H.; Liang, H.; Black, A.; Angel, P. M.; Drake, R. R.; Mehta, A. S. N-Linked Glycan Branching and Fucosylation Are Increased Directly in HCC Tissue as Determined through in Situ Glycan Imaging. J. Proteome Res. 2018, 17 (10), 3454–3462. https://doi.org/10.1021/acs.jproteome.8b00323.

(44) Powers, T.; Holst, S.; Wuhrer, M.; Mehta, A.; Drake, R. Two-Dimensional N-Glycan Distribution Mapping of Hepatocellular Carcinoma Tissues by MALDI-Imaging Mass Spectrometry. Biomolecules 2015, 5 (4), 2554–2572. https://doi.org/10.3390/biom5042554.

(45) Angel, P. M.; Saunders, J.; Clift, C. L.; White-Gilbertson, S.; Voelkel-Johnson, C.; Yeh, E.; Mehta, A.; Drake, R. R. A Rapid Array-Based Approach to N-Glycan Profiling of Cultured Cells. J. Proteome Res. 2019, 18 (10), 3630–3639. https://doi.org/10.1021/acs.jproteome.9b00303.

(46) Mehta, A.; Comunale, M. A.; Rawat, S.; Casciano, J. C.; Lamontagne, J.; Herrera, H.; Ramanathan, A.; Betesh, L.; Wang, M.; Norton, P.; Steel, L. F.; Bouchard, M. J. Intrinsic Hepatocyte Dedifferentiation Is Accompanied by Upregulation of Mesenchymal Markers, Protein Sialylation and Core Alpha 1,6 Linked Fucosylation. Sci. Rep. 2016, 6. https://doi.org/10.1038/srep27965.

(47) Granger, B. L. Propeptide Genesis by Kex2-Dependent Cleavage of Yeast Wall Protein 1 (Ywp1) of Candida albicans. PLoS One 2018, 13 (11), e0207955. https://doi.org/10.1371/journal.pone.0207955.

(48) Walker, S. H.; Budhathoki-Uprety, J.; Novak, B. M.; Muddiman, D. C. Stable-Isotope Labeled Hydrophobic Hydrazide Reagents for the Relative Quantification of N-Linked Glycans by Electrospray Ionization Mass Spectrometry. Anal. Chem. 2011, 83 (17), 6738–6745. https://doi.org/10.1021/ac201376q.

(49) Loos, M.; Gerber, C.; Corona, F.; Hollender, J.; Singer, H. Accelerated Isotope Fine Structure Calculation Using Pruned Transition Trees. Anal. Chem. 2015, 87 (11), 5738–5744. https://doi.org/10.1021/acs.analchem.5b00941.

(50) Chu, C. S.; Niñonuevo, M. R.; Clowers, B. H.; Perkins, P. D.; An, H. J.; Yin, H.; Killeen, K.; Miyamoto, S.; Grimm, R.; Lebrilla, C. B. Profile of Native N-Linked Glycan Structures from Human Serum Using High Performance Liquid Chromatography on a Microfluidic Chip and Time-of-Flight Mass Spectrometry. Proteomics 2009, 9 (7), 1939–1951. https://doi.org/10.1002/pmic.200800249.

(51) Ashwood, C.; Lin, C. H.; Thaysen-Andersen, M.; Packer, N. H. Discrimination of Isomers of Released N- and O-Glycans Using Diagnostic Product Ions in Negative Ion PGC-LC-ESI-MS/MS. J. Am. Soc. Mass Spectrom. 2018, 29 (6), 1194–1209. https://doi.org/10.1007/s13361-018-1932-z.

(52) Ashwood, C.; Pratt, B.; Maclean, B. X.; Gundry, R. L.; Packer, N. H. Standardization of PGC-LC-MS-Based Glycomics for Sample Specific Glycotyping. Analyst 2019, 144 (11), 3601–3612. https://doi.org/10.1039/c9an00486f.

(53) Song, T.; Ozcan, S.; Becker, A.; Lebrilla, C. B. In-Depth Method for the Characterization of Glycosylation in Manufactured Recombinant Monoclonal Antibody Drugs. Anal. Chem. 2014, 86 (12), 5661–5666. https://doi.org/10.1021/ac501102t.

(54) Song, T.; Aldredge, D.; Lebrilla, C. B. A Method for In-Depth Structural Annotation of Human Serum Glycans That Yields Biological Variations. Anal. Chem. 2015, 87 (15), 7754–7762. https://doi.org/10.1021/acs.analchem.5b01340.

(55) Abrahams, J. L.; Campbell, M. P.; Packer, N. H. Building a PGC-LC-MS N-Glycan Retention Library and Elution Mapping Resource. Glycoconj. J. 2018, 35 (1), 15–29. https://doi.org/10.1007/s10719-017-9793-4.

(56) Bapiro, T. E.; Richards, F. M.; Jodrell, D. I. Understanding the Complexity of Porous Graphitic Carbon (PGC) Chromatography: Modulation of Mobile-Stationary Phase Interactions Overcomes Loss of Retention and Reduces Variability. Anal. Chem. 2016, 88, 6190–6194. https://doi.org/10.1021/acs.analchem.6b01167.

(57) Yamaguchi, Y.; Nishima, W.; Re, S.; Sugita, Y. Confident Identification of Isomeric N-Glycan Structures by Combined Ion Mobility Mass Spectrometry and Hydrophilic Interaction Liquid Chromatography. Rapid Commun. Mass Spectrom. 2012, 26 (24), 2877–2884. https://doi.org/10.1002/rcm.6412.

(58) Zhao, J.; Li, S.; Li, C.; Wu, S.-L.; Xu, W.; Chen, Y.; Shameem, M.; Richardson, D.; Li, H. Identification of Low Abundant Isomeric N-Glycan Structures in Biological Therapeutics by LC/MS. Anal. Chem. 2016, 88 (14), 7049–7059. https://doi.org/10.1021/acs.analchem.6b00636.

(59) Szabo, Z.; Guttman, A.; Karger, B. L. Rapid Release of N-Linked Glycans from Glycoproteins by Pressure-Cycling Technology. Anal. Chem. 2010, 82 (6), 2588–2593. https://doi.org/10.1021/ac100098e.

(60) Largy, E.; Cantais, F.; Van Vyncht, G.; Beck, A.; Delobel, A. Orthogonal Liquid Chromatography–Mass Spectrometry Methods for the Comprehensive Characterization of Therapeutic Glycoproteins, from Released Glycans to Intact Protein Level. J. Chromatogr. A 2017, 1498, 128–146. https://doi.org/10.1016/j.chroma.2017.02.072.

(61) Higel, F.; Demelbauer, U.; Seidl, A.; Friess, W.; Sörgel, F. Reversed-Phase Liquid-Chromatographic Mass Spectrometric N-Glycan Analysis of Biopharmaceuticals. Anal. Bioanal. Chem. 2013, 405 (8), 2481–2493. https://doi.org/10.1007/s00216-012-6690-3.

(62) Prater, B. D.; Connelly, H. M.; Qin, Q.; Cockrill, S. L. High-Throughput Immunoglobulin G N-Glycan Characterization Using Rapid Resolution Reverse-Phase Chromatography Tandem Mass Spectrometry. Anal. Biochem. 2009, 385 (1), 69–79. https://doi.org/10.1016/j.ab.2008.10.023.

(63) Chen, X.; Flynn, G. C. Analysis of N-Glycans from Recombinant Immunoglobulin G by On-Line Reversed-Phase High-Performance Liquid Chromatography/Mass Spectrometry. Anal. Biochem. 2007, 370 (2), 147–161.


Chapter 6

GlycoHunter: An Open-Source Software for the Detection and Relative Quantification of INLIGHT™ Labelled N-linked Glycans

Submitted: Kalmar, J. G.; Garrard, K.; Muddiman, D. C. J. Proteome Res. 2020. © American Chemical Society 2020

6.1 INTRODUCTION

Glycosylation is a key co- and post-translational protein modification that occurs on over 50% of proteins and has profound implications for biological pathways, homeostasis, and biotherapeutic processing1–4. Despite its established roles in cancer biology, cellular and enzymatic functions5,6, diseases7, viral infection mechanisms8, and other widespread activities, the biological implications of glycosylation remain largely unknown. The two main types of glycosylation are named according to the atom to which the glycan is attached within the protein. N-linked glycans are covalently bound through a glycosidic bond to the side-chain nitrogen atom of an asparagine (Asn) residue within the motif asparagine-X-serine/threonine, where X is any amino acid except proline (Pro)9.

O-linked glycans are covalently bound to the side-chain oxygen atoms of serine (Ser), threonine (Thr), and occasionally tyrosine (Tyr)10,11. Glycans are inherently hydrophilic, which makes them difficult to separate by reversed-phase liquid chromatography and to ionize, and their many structural isomers are difficult to differentiate12. Because of the diversity of glycan structures and the complexity of glycan-protein interactions, no single analytical technique can fully characterize the glycome. Glycomics therefore often requires multiple, orthogonal analytical techniques, including advanced solution- and gas-phase separations and bioinformatic tools.
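The N-linked sequon rule described above can be expressed as a simple pattern search. The Python sketch below is illustrative only and is not part of GlycoHunter; the function name `find_n_sequons` is hypothetical. It reports every Asn that begins an N-X-S/T motif in which X is not proline. Such a motif marks only a potential site, because not every sequon is occupied.

```python
import re

# N-X-S/T sequon in which X is any residue except proline. The lookahead makes
# each match zero-width, so overlapping sequons (e.g. "NNST") are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def find_n_sequons(sequence: str) -> list[int]:
    """Return 1-based positions of Asn residues that start an N-X-S/T (X != P) motif."""
    return [m.start() + 1 for m in SEQUON.finditer(sequence.upper())]


if __name__ == "__main__":
    # N3 (N-G-T) and N12 (N-V-S) qualify; N8 (N-P-S) is excluded by the proline rule.
    print(find_n_sequons("MKNGTAANPSLNVS"))  # [3, 12]
```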


Chemical derivatization is responsible for some of the greatest analytical gains in glycomics because it can aid separations, increase ionization efficiency, and provide a means for relative quantification. Derivatization reagents employ diverse chemistries. Some react with the sugar side chains, such as permethylation13, peracetylation14, and esterification15. Others react stoichiometrically with the reducing-terminal N-acetylglucosamine generated upon glycan cleavage, such as reductive amination16, hydrazide17, aminooxy18, or carbamate reactions19. These reagents have given rise to a variety of effective strategies for stable isotope relative quantification, in which a mass difference is produced by incorporating light and heavy stable-isotope labels (SIL) of the same tag during sample preparation.

18 Heavy water (D2 O) is incorporated during enzymatic cleavage from the protein or using glycan reducing end dual isotopic labeling (GREDIL)20. Isobaric aldehyde reactive tags (iARTs)21, Isotopic Detection of Aminosugars With Glutamine (IDAWG), and aminooxy tandem mass tags (TMT)18, incorporate nitrogen-14 and nitrogen-15. Hydrogen and deuterium isotopes are also commonly used when creating new isotopic tags.

Deuterated tags used to identify glycans are found in some permethylation methods22, 1-phenyl-3-methyl-5-pyrazolone (PMP)23, p-toluidine24, and isotopic amine tags25,26.

The most common isotopes incorporated into derivatization tags are carbon-12 (¹²C) in the light tag and carbon-13 (¹³C) in the heavy tag. This pair can be found in methods such as quantification by isobaric labeling (QUIBL)27 and ¹²CH₃I/¹³CH₃I permethylation strategies28, glycan reductive isotope labeling (GRIL)29, 2-aminobenzoic acid (2-AA)30, duplex stable isotope labeling (DuSIL)31, and individuality normalization when labeling with isotopic glycan hydrazide tags (INLIGHT™)32. The INLIGHT™ strategy uses both a natural (NAT, ¹²C₆) and a stable-isotope-labeled (SIL, ¹³C₆) phenyl ring. It yields reproducible, quantifiable data with increased sensitivity because of the significant increase in nonpolar surface area32. INLIGHT™ has a reaction efficiency of >95%33 and has recently been shown to increase the detected ion abundance of glycans up to 100-fold compared with the previous reaction conditions34.

The analysis of glycans has been hindered by a lack of software; however, significant strides have been made in the past decade. Tools for searching glycomics data (e.g., GlycoWorkbench, GlycoFragment, GlycoSearchMS, SimGlycan, and GlySeeker) use a manually curated peak list or a list of theoretically calculated m/z values for glycan structures. These values are then searched against the MS or MS/MS data to find matching m/z values within a set tolerance35–38. However, these lists often do not include the masses of the tags mentioned previously. All previous studies using the INLIGHT™ strategy required N-linked glycans to be identified through manual analysis because no suitable software was available21,39–41. Searching for light and heavy pairs can aid in the identification of glycans. However, the search is tedious because it is often done manually and many of the pairs have overlapping peaks. The glycomics community would benefit greatly from a tool that curates a list of all glycan m/z values in the data separated by the specific fixed mass difference that the chemical tags impart on the native glycan.

Herein, we report the development of GlycoHunter, an open-source software tool designed to identify and perform relative quantification of INLIGHT™-labeled N-linked glycans. GlycoHunter searches MS1 data for m/z values from either a manually curated list or a peak list of glycans generated from user-defined settings. These settings include retention time and mass (i.e., m/z) windows and offsets, along with experiment- and instrument-specific tolerances. The resulting peak lists of glycan m/z values can then be plotted as a heatmap or 3D stem plot, or exported for further analysis in Skyline42,43 or Excel. The software presented here enables researchers to process and analyze large, complex glycomics data sets accurately and efficiently.

6.2 EXPERIMENTAL

The data used in this manuscript are N-linked glycans cleaved from fetuin (Sigma-Aldrich). The N-linked glycan samples were prepared using a modified filter-aided sample preparation procedure followed by derivatization using the INLIGHT™ strategy, as outlined in Kalmar et al.34 These glycans were then analyzed using a Thermo Scientific™ EASY-nLC™ 1200 system coupled to a Q Exactive HF-X mass spectrometer. GlycoHunter accepts two widely recognized file formats: imzML and mzXML. Data generated with the Q Exactive HF-X (.RAW files) were converted to mzML files using the MSConvert v3 tool from ProteoWizard44. The mzML files were then converted to imzML format using an imzML converter45. Performance measures for all GlycoHunter functions were collected using MATLAB R2020a on a Lenovo W540 laptop (Intel i7-4700MQ 2.40 GHz processor, 32 GB RAM, 500 GB mSATA SSD, NVIDIA Quadro K1100M graphics card) and are reported in Tables D.1–D.3. The manually collected list was compared with the GlycoHunter-generated peak pair list using the MSi Database tool in the MSiReader46,47 software package.


6.3 RESULTS AND DISCUSSION

GlycoHunter, created in MATLAB48, uses some of the core functions found in MSiReader, including file loading, peak-picking algorithms, and export of data to Excel files. It is intended to relieve the bioinformatic 'bottleneck' caused by the amount of data generated in a glycomics experiment. GlycoHunter does this by identifying, in MS1 data, all peak pairs associated with NAT- and SIL-labeled N-linked glycans, particularly those derivatized with INLIGHT™. Previously, data analysis was carried out manually and/or by comparison with peak lists curated from extensive literature searches. This approach limited efforts to identify new glycan structures in biological analyses. GlycoHunter enables users to identify glycans based on the distinct mass difference between the NAT- and SIL-labeled glycans.
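The core idea of pairing NAT- and SIL-labeled peaks by a fixed mass difference can be sketched in a few lines. This is a hypothetical Python illustration, not GlycoHunter's MATLAB implementation. The function `find_pairs`, the peak-list layout, and the use of a single absolute tolerance in Th are assumptions. The default offsets are the INLIGHT™ values quoted later in this chapter, and 0.05 is the m/z tolerance used in this study.

```python
from bisect import bisect_left

# Default INLIGHT NAT/SIL spacings (Th) for charge states +1 to +3, as quoted in this chapter.
DEFAULT_OFFSETS = {1: 6.0201, 2: 3.0101, 3: 2.0068}


def find_pairs(peaks, offsets=DEFAULT_OFFSETS, tol=0.05):
    """Return (nat_mz, sil_mz, charge) tuples for peaks separated by a charge-specific offset.

    peaks  : list of (mz, intensity) tuples from one centroided MS1 scan
    offsets: charge state -> expected SIL minus NAT m/z difference (Th)
    tol    : allowed deviation from each offset (Th)
    """
    peaks = sorted(peaks)
    mzs = [mz for mz, _ in peaks]
    pairs = []
    for nat_mz, _ in peaks:
        for z, off in offsets.items():
            target = nat_mz + off
            # Binary search for the first peak at or above the lower edge of the window.
            i = bisect_left(mzs, target - tol)
            while i < len(mzs) and mzs[i] <= target + tol:
                pairs.append((nat_mz, mzs[i], z))
                i += 1
    return pairs
```

Because the sketch reports every candidate, some matches will be spurious. For a +1 ion, the M+2 isotope lies about 2.0067 Th above the monoisotopic peak, which falls inside the +3 offset window. Ambiguities like this are why the charge state of a pair should be confirmed from its isotopic spacing.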

The GlycoHunter graphical user interface (GUI), shown in Figure 6.1, displays the total ion chromatogram of the loaded file and provides brief instructions for using the software. It also allows users to define regions of interest by specifying a retention time window and/or m/z range. Users can also set a distinct mass offset, which allows any labeling strategy to be used, choose the algorithm used to identify peaks of interest, and set tolerances specific to the instrument used for the experiment.


Figure 6.1. GlycoHunter graphical user interface (GUI), which displays the total ion chromatogram (TIC) and includes basic instructions for using the software.

To begin processing mass spectrometry data in GlycoHunter, the native instrument files must be converted to imzML or mzXML format so they can be loaded into the software (Figure 6.2). Conversion to imzML requires an extra step, but data set load times are much faster (see Tables D.1–D.3). For both formats, the only processing before peak-pair identification is the removal of zero-abundance values and empty scans; no other filtering or binning is performed. Next, the user can upload Excel or text files containing m/z and scan lists curated from previous analyses or from the literature. Clicking either check box in the Custom Filters panel (see Figure 6.1) opens a file dialog in which the user selects a file. If an Excel file is chosen, the user is prompted to select the worksheet and column containing the filter data. If a text file with more than one column is chosen, the user is prompted to select a column. Context menus on the filter check boxes allow the user to remove a filter and copy the filter list to the clipboard. For the peak list, they also let the user specify the ppm tolerance allowed when matching an m/z in the filter with m/z values in a scan.
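The preprocessing and custom peak-list filtering described above can be sketched as follows. The data layout is a hypothetical assumption made for illustration: a dictionary that maps scan numbers to lists of (m/z, intensity) tuples. The function names are likewise hypothetical, and GlycoHunter's own MATLAB implementation may differ. The ppm error is computed relative to the filter m/z.

```python
def clean_scans(scans):
    """Drop zero-abundance points, then drop any scan left empty.

    scans: dict mapping scan number -> list of (mz, intensity) tuples (hypothetical layout).
    """
    cleaned = {}
    for scan_no, points in scans.items():
        kept = [(mz, inten) for mz, inten in points if inten > 0]
        if kept:
            cleaned[scan_no] = kept
    return cleaned


def ppm_error(observed, expected):
    """Signed mass error of observed relative to expected, in ppm."""
    return (observed - expected) / expected * 1e6


def apply_peak_list(scans, targets, ppm_tol, scan_list=None):
    """Keep points within ppm_tol of any target m/z, optionally only in scans from scan_list."""
    filtered = {}
    for scan_no, points in scans.items():
        if scan_list is not None and scan_no not in scan_list:
            continue
        kept = [(mz, inten) for mz, inten in points
                if any(abs(ppm_error(mz, t)) <= ppm_tol for t in targets)]
        if kept:
            filtered[scan_no] = kept
    return filtered
```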

This tool is particularly useful for optimizing analytical methods, comparing previous results with new data sets, or rapidly analyzing targeted experiments. In addition to narrowing the search with peak and scan lists, GlycoHunter can restrict the search to a specific retention time window or MS1 m/z range. This is useful when a user knows where the ions of interest elute or wants to focus on a specific m/z range. Reducing the search space also shortens analysis time. The default retention time and m/z settings encompass the entire run length and MS1 m/z range of the loaded file.
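A retention time and m/z region of interest, with defaults that span the whole loaded file, might look like the following hypothetical sketch. The scan dictionary keys `rt` and `peaks` and the function name `region_of_interest` are assumptions made for illustration. Scans inside the retention time window are kept even if no peaks remain in the m/z window.

```python
def region_of_interest(scans, rt_window=None, mz_window=None):
    """Restrict scans to a retention-time window and an m/z window.

    scans: list of dicts with hypothetical keys "rt" (min) and "peaks" [(mz, intensity), ...].
    When a window is None it defaults to the full run length or full m/z range of the data.
    """
    if not scans:
        return []
    if rt_window is None:
        rt_window = (min(s["rt"] for s in scans), max(s["rt"] for s in scans))
    if mz_window is None:
        all_mz = [mz for s in scans for mz, _ in s["peaks"]]
        mz_window = (min(all_mz), max(all_mz)) if all_mz else (0.0, float("inf"))
    rt_lo, rt_hi = rt_window
    mz_lo, mz_hi = mz_window
    out = []
    for s in scans:
        if rt_lo <= s["rt"] <= rt_hi:
            peaks = [(mz, inten) for mz, inten in s["peaks"] if mz_lo <= mz <= mz_hi]
            out.append({"rt": s["rt"], "peaks": peaks})
    return out
```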


Figure 6.2. GlycoHunter informatics workflow. Data files should be converted to imzML or mzXML format before the data are loaded into the software. Once the data are loaded, either a targeted peak list can be used to search the data, or an untargeted list can be generated by defining the region of interest, mass offset values, peak identification algorithm, and experiment-specific tolerances.


Users can choose which charge states to consider in their analysis. Within the application, they can also define the m/z offset for each charge state based on the mass difference between the NAT and SIL tags used in their experiment. The charge states displayed in the application can also be changed in the .INI file distributed with the software. Because these offsets are arbitrary m/z values selected by the user, GlycoHunter can search for any mass offset of interest. By default, the offset for each charge state is the NAT/SIL mass difference for INLIGHT™ at that charge state. The m/z difference between NAT and SIL INLIGHT™-tagged glycans is 6.0201 Th at +1, 3.0101 Th at +2, and 2.0068 Th at +3. The charge state of a peak pair can be confirmed from its isotopic spacing.
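The default INLIGHT™ offsets follow from the six ¹³C atoms in the SIL phenyl ring. Each ¹³C adds about 1.00335 Da relative to ¹²C, and the m/z difference scales as 1/z. The sketch below assumes one tag per glycan and uses the ¹³C–¹²C mass difference as the spacing between adjacent isotopic peaks; the function names are hypothetical. Computed this way, the +3 offset rounds to 2.0067 Th. This is 0.0001 Th below the 2.0068 Th default quoted above, a difference far smaller than the 0.05 m/z tolerance used in this study.

```python
# Exact mass of 13C minus that of 12C (Da); 12C is 12 Da by definition.
C13_MINUS_C12 = 13.00335483507 - 12.0


def label_offset(charge: int, n_labeled_carbons: int = 6) -> float:
    """m/z spacing between the NAT (12C6) and SIL (13C6) forms of a singly tagged glycan."""
    return n_labeled_carbons * C13_MINUS_C12 / charge


def charge_from_spacing(spacing: float, max_charge: int = 6, tol: float = 0.01):
    """Infer charge from the m/z spacing of adjacent isotopic peaks (assumed 13C-dominated).

    Returns None when no charge state up to max_charge fits within tol (Th).
    """
    best = min(range(1, max_charge + 1), key=lambda z: abs(spacing - C13_MINUS_C12 / z))
    return best if abs(spacing - C13_MINUS_C12 / best) <= tol else None


if __name__ == "__main__":
    for z in (1, 2, 3):
        print(f"+{z}: {label_offset(z):.4f} Th")  # 6.0201, 3.0101, 2.0067
```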

GlycoHunter offers three peak-picking algorithms: Parabolic Centroid, MS Peaks, and Local Max. Each is explained in the GlycoHunter user manual. Briefly, the parabolic centroid algorithm calculates the centroid location of a peak apex and its height. It does this by identifying the local maximum and two points on either side of the maximum and fitting a parabola to them. This algorithm originated with Comisarow and Marshall in their work on interpreting FTMS spectra49,50 and works similarly to the algorithm used by the Xcalibur software from Thermo Fisher Scientific. MS Peaks uses the mspeaks function provided in the MATLAB Bioinformatics Toolbox48. As described in the MATLAB documentation48, mspeaks searches the raw data and creates a list of centroided peaks using the discrete wavelet transform51. The third algorithm, Local Max, determines the m/z of a peak from the maximum within a region of data. The authors found that the parabolic centroid algorithm performed best because it selects peaks in the same way the data sets were originally curated and is essentially the same algorithm used by the instrument vendor.

Additional parameters should be set using knowledge specific to the experiment and instrument. Their meanings are illustrated in the annotated plot above the Instructions panel of the GlycoHunter GUI in Figure 6.1. The Peak Window parameter defines the m/z range, in parts per million (ppm) or m/z (Th), within which putative monoisotopic peaks must lie. This window should reflect the mass measurement error of the instrument, because that error affects the putative identification of features. Many Orbitrap systems, including the Q Exactive HF-X, achieve a mass measurement error below 3 ppm. In this study, the authors used a Peak Window of 6 ppm, which allows 3 ppm of error on either side of the expected m/z for glycans of interest. The m/z Tolerance parameter is used to find a match at each selected Charge State offset. The search covers every putative monoisotopic peak in every scan within the retention time region of interest. The parameter defines the allowed deviation, in ppm or m/z, from the exact m/z offset given for each Charge State. In this study, the authors used Xcalibur to determine an average tolerance of roughly 0.05 m/z for the mass spectral data. These settings were used to analyze all files in this study.
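The Peak Window is given in ppm and the offset tolerance used here is in Th, so it helps to convert between the two. The helper names in this small sketch are hypothetical.

```python
def ppm_to_th(mz, ppm):
    """Absolute m/z width (Th) corresponding to a relative error in ppm at a given m/z."""
    return mz * ppm * 1e-6


def th_to_ppm(mz, delta_th):
    """Relative width (ppm) of an absolute m/z width at a given m/z."""
    return delta_th / mz * 1e6


def in_peak_window(observed_mz, expected_mz, window_ppm=6.0):
    """True if observed_mz lies inside a window of total width window_ppm
    centred on expected_mz, i.e. +/- window_ppm / 2."""
    return abs(observed_mz - expected_mz) <= ppm_to_th(expected_mz, window_ppm / 2)
```

At m/z 1000, the ±3 ppm half-window is only 0.003 Th. The 0.05 Th offset tolerance corresponds to 50 ppm at the same m/z, so the pairing step is considerably more permissive than monoisotopic peak selection.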

The abundance threshold is a signal threshold, in arbitrary units (a.u.), applied to the peaks found by any of the algorithms and to all matching pairs. For this experiment, a threshold of 10,000 a.u. was chosen because manual inspection showed that signals below this level were largely noise. The occurrence threshold is the minimum number of times a specific peak pair must occur to be considered a feature. Glycans, for example, are typically observed in multiple scans across the width of a chromatographic peak. Occurrence thresholds from 1 to 20 were evaluated, and the optimal minimum for these data sets was 5. This was the smallest number of MS1 scans observed across a chromatographic peak containing a known glycan moiety.

Once the files are loaded and the search parameters are defined, the user clicks the Find Peak Pairs button to continue the GlycoHunter data processing workflow shown in Figure 6.2. In each mass spectrum, GlycoHunter identifies peaks according to the peak picking algorithm, peak window, and abundance threshold settings and lists them. It then pairs these peaks using the mass offset for each chosen charge state within the m/z tolerance window. Next, GlycoHunter groups duplicate pairs by m/z and counts each duplicate as an occurrence. The user-defined occurrence threshold then filters the list of peak pairs. Finally, heavy-to-light (H:L) abundance ratios are calculated for each occurrence and for the average of each group. INLIGHT™-derivatized glycans mixed 1:1 prior to analysis generally have an H:L ratio close to 1. The status bar shows the progress of the search, and the Peak Pairs Found box is updated each time a pair is identified, even if it is a duplicate. The sequential number of the scan being processed and its retention time are displayed above the status bar and updated in real time. The search can be terminated at any time by clicking the STOP button, and the results obtained so far are retained. When the search is finished, a message box displays the final number of unique pairs identified.
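The pairing, grouping, and ratio steps can be sketched as follows. This is a simplified illustration and not GlycoHunter's implementation. It assumes centroided peaks for each scan and keeps the closest partner within the tolerance. It also groups occurrences of the same charge state whose light m/z lies within a hypothetical `group_tol` of the previous occurrence, because the text does not specify GlycoHunter's exact grouping rule. The defaults mirror the settings used in this study: 0.05 Th tolerance, 10,000 a.u., and five occurrences.

```python
from collections import namedtuple
from statistics import mean

# One matched light/heavy pair observed in a single scan.
Occurrence = namedtuple("Occurrence", "scan charge light_mz heavy_mz light_ab heavy_ab")


def find_peak_pairs(scans, offsets, tol=0.05, min_abundance=10_000,
                    min_occurrences=5, group_tol=0.01):
    """scans: list of (scan_number, peaks), peaks = list of centroided (mz, abundance).
    offsets: {charge: expected heavy-minus-light m/z offset in Th}."""
    # 1) Within each scan: apply the abundance threshold, then pair peaks by offset.
    occurrences = []
    for scan, peaks in scans:
        peaks = [(mz, ab) for mz, ab in peaks if ab >= min_abundance]
        for light_mz, light_ab in peaks:
            for z, offset in offsets.items():
                target = light_mz + offset
                partners = [(mz, ab) for mz, ab in peaks if abs(mz - target) <= tol]
                if partners:
                    # Keep the partner closest to the expected heavy m/z.
                    heavy_mz, heavy_ab = min(partners, key=lambda p: abs(p[0] - target))
                    occurrences.append(
                        Occurrence(scan, z, light_mz, heavy_mz, light_ab, heavy_ab))

    # 2) Group duplicates across scans: same charge, light m/z within group_tol
    #    of the previous occurrence after sorting (a simple chained grouping).
    occurrences.sort(key=lambda o: (o.charge, o.light_mz))
    groups = []
    for occ in occurrences:
        prev = groups[-1][-1] if groups else None
        if prev and prev.charge == occ.charge and occ.light_mz - prev.light_mz <= group_tol:
            groups[-1].append(occ)
        else:
            groups.append([occ])

    # 3) Occurrence threshold, then per-occurrence and group-average H:L ratios.
    results = []
    for g in groups:
        if len(g) < min_occurrences:
            continue
        light_avg = mean(o.light_ab for o in g)
        heavy_avg = mean(o.heavy_ab for o in g)
        results.append({
            "charge": g[0].charge,
            "light_mz": mean(o.light_mz for o in g),
            "heavy_mz": mean(o.heavy_mz for o in g),
            "occurrences": len(g),
            "per_scan_HL": [o.heavy_ab / o.light_ab for o in g],
            "mean_HL": heavy_avg / light_avg,
        })
    return results
```

The sketch reports `mean_HL` as the ratio of the averaged abundances, which matches how GlycoHunter's group H:L values are described later in this chapter. `per_scan_HL` retains the per-occurrence ratios.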


The data generated by the peak pair finder can then be exported by clicking the Export Peaks button. This opens the Export Peaks GUI (Figure 6.3A), where users choose the information they would like to visualize. The GUI also provides other export options, such as Excel worksheets and a Skyline transition list. Following the workflow in Figure 6.3B, the user can define the data to export (results, peak pairs, and abundance matrix) for each charge state. The user can further select regions of interest using the same retention time and m/z range options set on the main GlycoHunter GUI. Here, the default values span the identified pairs rather than the entire data set (i.e., the lowest and highest retention time and m/z of the peak pairs found). The abundance matrix can be plotted as a heat map or a 3D stem plot. These plots can help identify where the peak pairs elute within the chromatogram or reveal any clustering by retention time or m/z. An example of the 3D stem plot is shown in Figure 6.3B.

Because abundance matrices can be quite large, GlycoHunter estimates their memory requirements and queries MATLAB for the available RAM. If there is insufficient memory, the abundance matrix plot and export options are grayed out and unavailable. Similarly, the limits on the number of rows, columns, and cells in an Excel worksheet are checked before an output file is created. Parameters in the .INI file allow the user to set stricter limits on export file size and abundance plots.
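The following is a rough sketch of the two feasibility checks. It assumes the abundance matrix is a dense scans × pairs array of 8-byte doubles and uses the .xlsx worksheet limits of 1,048,576 rows and 16,384 columns. The matrix layout and function names are assumptions. With the per-file averages reported for the fetuin files (32,640 scans and 3,147 pairs), such a matrix would need about 0.82 GB.

```python
EXCEL_MAX_ROWS = 1_048_576  # rows per .xlsx worksheet
EXCEL_MAX_COLS = 16_384     # columns per .xlsx worksheet (A through XFD)


def abundance_matrix_bytes(n_scans, n_pairs, bytes_per_value=8):
    """Memory of a dense scans x pairs matrix of double-precision abundances."""
    return n_scans * n_pairs * bytes_per_value


def can_plot_or_export(n_scans, n_pairs, available_bytes, header_rows=1, header_cols=1):
    """Two checks analogous to GlycoHunter's: enough RAM, and fits in one worksheet."""
    fits_memory = abundance_matrix_bytes(n_scans, n_pairs) <= available_bytes
    fits_excel = (n_scans + header_rows <= EXCEL_MAX_ROWS
                  and n_pairs + header_cols <= EXCEL_MAX_COLS)
    return fits_memory and fits_excel
```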


Figure 6.3. (A) The Export Peaks GUI allows users to visualize or export results detected in specific chromatographic regions, mass ranges, and charge states. Following the workflow in (B), the user can generate multiple plots for different regions of interest and charge states. The user can also export peaks and results, or a Skyline transition file that can be imported directly for further analysis.

As mentioned previously, the user can also export the generated data to an Excel file (Figures D.1-D.3) and/or a Skyline transition file (Figure D.4) for further analysis. The exported Excel files include:

- the GlycoHunter settings
- the m/z values of the peak pairs
- the number of occurrences of each pair
- the individual abundance at each m/z value
- the H:L abundance ratios

The normalized abundance ratios are calculated using Equation 6.1, where int is the peak abundance and Hi and Lo denote the heavy and light peaks of each pair.

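$$\mathrm{normHL} = \frac{\mathrm{int}_{\mathrm{Hi}}}{\mathrm{int}_{\mathrm{Lo}}} \times \frac{\sum \mathrm{int}_{\mathrm{Lo}}}{\sum \mathrm{int}_{\mathrm{Hi}}} \qquad \text{(Equation 6.1)}$$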

Further analysis can be completed using the data from this exported information.
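The following is a minimal sketch of Equation 6.1 applied to every pair. It assumes the sums run over all pairs being exported. Multiplying by ΣLo/ΣHi rescales the heavy abundances so that their total equals the light total. This removes a global mixing or labeling imbalance before individual ratios are compared.

```python
def normalized_hl(light, heavy):
    """Equation 6.1 for every pair: (int_Hi / int_Lo) * (sum(int_Lo) / sum(int_Hi)).

    light, heavy: abundances of the light and heavy peak of each pair, in the same order.
    The sums are assumed to run over all pairs passed in.
    """
    if len(light) != len(heavy):
        raise ValueError("light and heavy must have the same length")
    scale = sum(light) / sum(heavy)  # global correction for unequal total heavy/light signal
    return [h / l * scale for l, h in zip(light, heavy)]
```

For example, if every heavy peak is exactly twice its light partner, all normalized ratios equal 1.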

The Skyline transition file includes the molecule list name, precursor name, precursor charge, and precursor m/z. The molecule list name is the pair ID number generated in the GlycoHunter results. The precursor name is either light or heavy, depending on the precursor m/z within a pair. The precursor charge is the charge state of the m/z offset used to identify the pair. The authors highly recommend using Skyline to further annotate the list of identified peak pairs, as it allows rapid analysis of large amounts of data.
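The following sketch writes the four columns described above as a CSV transition list using Python's `csv` module. The header strings are assumed to match the column names Skyline recognizes for small-molecule transition lists and should be checked against the Skyline documentation. The four-decimal m/z formatting and the function name are also assumptions.

```python
import csv
import io


def skyline_transition_csv(pairs):
    """pairs: iterable of (pair_id, charge, light_mz, heavy_mz).
    Returns CSV text with one light and one heavy precursor row per pair."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    # Header names assumed to match Skyline's small-molecule column names.
    writer.writerow(["Molecule List Name", "Precursor Name",
                     "Precursor Charge", "Precursor m/z"])
    for pair_id, charge, light_mz, heavy_mz in pairs:
        for name, mz in (("light", light_mz), ("heavy", heavy_mz)):
            writer.writerow([pair_id, name, charge, f"{mz:.4f}"])
    return buf.getvalue()
```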

Using both the untargeted search method and the peak list option with the settings described above, GlycoHunter searched an average of 32,640 scans per file and identified an average of 3,147 unique peak pairs in the fetuin files. The pairs were then analyzed in Skyline using the chromatogram comparison, retention time comparison, mass accuracy, and full scan tools for mass spectral confirmation42,52. This analysis identified 359 paired features based on co-elution and on each precursor m/z mapping to the monoisotopic peak of the corresponding isotopic distribution (Figure 6.4A). Forty-nine of the 359 unique pairs were determined to correspond to N-linked glycans or free sugar moieties based on precursor mass (Table D.4). These results were compared with the glycans previously identified by manually searching the data against a curated list of m/z values from the literature34, and all 18 of those glycans were found. An example of an N-linked glycan found by both manual analysis and GlycoHunter analysis is shown in Figure 6.4B. For the manual analysis, abundances were taken from the averaged mass spectrum of the most abundant chromatographic peak, and the H:L ratio was calculated from them. The abundance reported by GlycoHunter is the average abundance over every occurrence of a peak, and its H:L ratio is calculated from those averages.


Figure 6.4. (A) Paired features identified using GlycoHunter compared with those found manually in a previous analysis34. Thirty-one glycans not previously identified in these data were found using GlycoHunter. They were then verified using Skyline and by fitting combinations of carbohydrate monomers until the masses were within ±3 ppm. The list of putative identifications can be found in Table D.4. (B) An example mass spectrum and the resulting data for a glycan found by both manual analysis and GlycoHunter. The literature identifies this glycan as (Fuc)1(Gal)2(GlcNAc)4(Man)3, or FH5N4.
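The composition fitting described in this caption can be sketched as a brute-force search over monosaccharide counts. In this notation, Gal and Man are both hexoses (H) and GlcNAc is a HexNAc (N). FH5N4 therefore corresponds to Fuc1Hex5HexNAc4, which has a neutral monoisotopic mass of about 1786.650 Da as a free glycan. The sketch assumes the INLIGHT™ tag and charge carriers have already been subtracted to give a neutral glycan mass. The residue masses are standard monoisotopic values. The search limits and function names are hypothetical.

```python
from itertools import product

# Standard monoisotopic residue masses (Da) of the monosaccharide building blocks.
RESIDUE_MASS = {
    "Hex": 162.052824,     # e.g. Gal, Man
    "HexNAc": 203.079373,  # e.g. GlcNAc
    "Fuc": 146.057909,     # deoxyhexose
    "NeuAc": 291.095417,   # sialic acid
}
WATER = 18.010565  # added once for the free reducing-end glycan


def glycan_mass(composition):
    """Neutral monoisotopic mass of a free glycan, e.g. {"Hex": 5, "HexNAc": 4, "Fuc": 1}."""
    return sum(RESIDUE_MASS[name] * n for name, n in composition.items()) + WATER


def match_compositions(neutral_mass, ppm=3.0, max_counts=None):
    """Return every (composition, error_ppm) within +/- ppm of neutral_mass."""
    max_counts = max_counts or {"Hex": 12, "HexNAc": 8, "Fuc": 4, "NeuAc": 4}
    names = list(max_counts)
    hits = []
    for counts in product(*(range(max_counts[n] + 1) for n in names)):
        composition = dict(zip(names, counts))
        theoretical = glycan_mass(composition)
        error_ppm = (neutral_mass - theoretical) / theoretical * 1e6
        if abs(error_ppm) <= ppm:
            hits.append((composition, error_ppm))
    return hits
```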


6.4 CONCLUSIONS

We present GlycoHunter, a user-friendly tool created in MATLAB to improve the identification of peak pairs corresponding to NAT- and SIL-labeled species in glycomics data. GlycoHunter accepts any data file converted to the imzML or mzXML format, and its search parameters can be tailored to the user’s specific experiment and instrumentation. The results can then be viewed and exported for further analysis. GlycoHunter cannot yet distinguish isotopic distributions. Despite this limitation, it rapidly detects many more peak pairs than were reported in the literature. It can quickly become an invaluable resource for researchers searching any data with stable isotope labeled pairs. Users can download the software for free from https://glycohunter.wordpress.ncsu.edu/ 53 or from the GlycoHunter section of the MSiReader website54. Users can also contact us through the website to report problems or offer suggestions.

6.5 ACKNOWLEDGMENTS

We acknowledge current and former members of the research team for their contributions to this work. This research was performed in part by the Molecular Education, Technology and Research Innovation Center (METRIC) at NC State University, which is supported by the State of North Carolina. The authors gratefully acknowledge the financial support received from the National Institute on Aging at the National Institutes of Health (Grant Number R56AG063885).


6.6 LITERATURE CITED

(1) Apweiler, R.; Hermjakob, H.; Sharon, N. On the Frequency of Protein Glycosylation, as Deduced from Analysis of the SWISS-PROT Database. Biochim. Biophys. Acta - Gen. Subj. 1999, 1473 (1), 4–8.

(2) Hebert, D. N.; Lamriben, L.; Powers, E. T.; Kelly, J. W. The Intrinsic and Extrinsic Effects of N-Linked Glycans on Glycoproteostasis. Nat. Chem. Biol. 2014, 10 (11), 902–910.

(3) Furukawa, K.; Ohkawa, Y.; Yamauchi, Y.; Hamamura, K.; Ohmi, Y.; Furukawa, K. Fine Tuning of Cell Signals by Glycosylation. J. Biochem. 2012, 151 (6), 573–578.

(4) Preston, R. J. S.; Rawley, O.; Gleeson, E. M.; O’Donnell, J. S.; Yamaguchi, S.; Mosesson, M. W.; Meh, D. A.; DiOrio, J. P.; Takahashi, N.; Takahashi, H.; Nagai, K.; Matsuda, M. Elucidating the Role of Carbohydrate Determinants in Regulating Hemostasis: Insights and Opportunities. Blood 2013, 121 (19), 3801–3810.

(5) Gu, J.; Isaji, T.; Xu, Q.; Kariya, Y.; Gu, W.; Fukuda, T.; Du, Y. Potential Roles of N-Glycosylation in Cell Adhesion. Glycoconj. J. 2012, 29 (8–9), 599–607.

(6) Ryšlavá, H.; Doubnerová, V.; Kavan, D.; Vaněk, O. Effect of Posttranslational Modifications on Enzyme Function and Assembly. J. Proteomics 2013, 92, 80–109.

(7) Hasnain, S. Z.; Gallagher, A. L.; Grencis, R. K.; Thornton, D. J. A New Role for Mucins in Immunity: Insights from Gastrointestinal Nematode Infection. Int. J. Biochem. Cell Biol. 2013, 45 (2), 364–374.

(8) Vigerust, D. J.; Shepherd, V. L. Virus Glycosylation: Role in Virulence and Immune Interactions. Trends Microbiol. 2007, 15 (5), 211–218.

(9) Stanley, P.; Taniguchi, N.; Aebi, M. Chapter 9. N-Glycans, Essentials of Glycobiology, 2nd Edition. Essentials Glycobiol. 2017, 1–14.

(10) Holt, G. D.; Hart, G. W. The Subcellular Distribution of Terminal N-Acetylglucosamine Moieties. J. Biol. Chem. 1986, 261 (17), 8049–8057.

(11) Bock, K.; Schuster-Kolbe, J.; Altman, E.; Allmaier, G.; Stahl, B.; Christian, R.; Sleytr, U. B.; Messner, P. Primary Structure of the O-Glycosidically Linked Glycan Chain of the Crystalline Surface Layer Glycoprotein of Thermoanaerobacter Thermohydrosulfuricus L111-69. Galactosyl Tyrosine as a Novel Linkage Unit. J. Biol. Chem. 1994, 269 (10), 7137–7144.

(12) Lu, H.; Zhang, Y.; Yang, P. Advancements in Mass Spectrometry-Based Glycoproteomics and Glycomics. Natl. Sci. Rev. 2016, 3 (3), 345–364.

(13) Baldwin, M. A.; Stahl, N.; Reinders, L. G.; Gibson, B. W.; Prusiner, S. B.; Burlingame, A. L. Permethylation and Tandem Mass Spectrometry of Oligosaccharides Having Free Hexosamine: Analysis of the Glycoinositol Phospholipid Anchor Glycan from the Scrapie Prion Protein. Anal. Biochem. 1990, 191 (1), 174–182.

(14) Dell, A. [35] Preparation and Desorption Mass Spectrometry of Permethyl and Peracetyl Derivatives of Oligosaccharides. Methods Enzymol. 1990, 193, 647–660.

(15) Powell, A. K.; Harvey, D. J. Stabilization of Sialic Acids in N-Linked Oligosaccharides and Gangliosides for Analysis by Positive Ion Matrix-Assisted Laser Desorption/Ionization Mass Spectrometry. Rapid Commun. Mass Spectrom. 1996, 10 (9), 1027–1032.

(16) Hase, S.; Hara, S.; Matsushima, Y. Tagging of Sugars with a Fluorescent Compound, 2-Aminopyridine. J. Biochem. 1979, 85 (1), 217–220.

(17) Bendiak, B.; Salyan, M. E.; Pantoja, M. Sequential Removal of Monosaccharides from the Reducing End of Oligosaccharides. I. A Reaction between Hydrazine and Sugars Having a Glycosidic Substituent on a Carbon Atom Adjacent to the Carbonyl Group. Tetrahedron Lett. 1994, 35 (5), 685–688.

(18) Hahne, H.; Neubert, P.; Kuhn, K.; Etienne, C.; Bomgarden, R.; Rogers, J. C.; Kuster, B. Carbonyl-Reactive Tandem Mass Tags for the Proteome-Wide Quantification of N-Linked Glycans. Anal. Chem. 2012, 84 (8), 3716–3724.

(19) Ullmer, R.; Plematl, A.; Rizzi, A. Derivatization by 6-Aminoquinolyl-N-Hydroxysuccinimidyl Carbamate for Enhancing the Ionization Yield of Small Peptides and Glycopeptides in Matrix-Assisted Laser Desorption/Ionization and Electrospray Ionization Mass Spectrometry. Rapid Commun. Mass Spectrom. 2006, 20 (9), 1469–1479.

(20) Cao, W.; Zhang, W.; Huang, J.; Jiang, B.; Zhang, L.; Yang, P. Glycan Reducing End Dual Isotopic Labeling (GREDIL) for Mass Spectrometry-Based Quantitative N-Glycomics. Chem. Commun. 2015, 51 (71), 13603–13606.

(21) Yang, S.; Yuan, W.; Yang, W.; Zhou, J.; Harlan, R.; Edwards, J.; Li, S.; Zhang, H. Glycan Analysis by Isobaric Aldehyde Reactive Tags and Mass Spectrometry. Anal. Chem. 2013, 85 (17), 8188–8195.

(22) Kang, P.; Mechref, Y.; Kyselova, Z.; Goetz, J. A.; Novotny, M. V. Comparative Glycomic Mapping through Quantitative Permethylation and Stable-Isotope Labeling. Anal. Chem. 2007, 79 (16), 6064–6073.

(23) Sić, S.; Maier, N. M.; Rizzi, A. M. Quantitative Fingerprinting of O-Linked Glycans Released from Proteins Using Isotopic Coded Labeling with Deuterated 1-Phenyl-3-Methyl-5-Pyrazolone. J. Chromatogr. A 2015, 1408, 93–100.

(24) Shah, P.; Yang, S.; Sun, S.; Aiyetan, P.; Yarema, K. J.; Zhang, H. Mass Spectrometric Analysis of Sialylated Glycans with Use of Solid-Phase Labeling of Sialic Acids. Anal. Chem. 2013, 85 (7), 3606–3613.

(25) Yuan, J.; Hashii, N.; Kawasaki, N.; Itoh, S.; Kawanishi, T.; Hayakawa, T. Isotope Tag Method for Quantitative Analysis of Carbohydrates by Liquid Chromatography–Mass Spectrometry. J. Chromatogr. A 2005, 1067 (1–2), 145–152.

(26) Bowman, M. J.; Zaia, J. Tags for the Stable Isotopic Labeling of Carbohydrates and Quantitative Analysis by Mass Spectrometry. Anal. Chem. 2007, 79 (15), 5777–5784.

(27) Botelho, J. C.; Atwood, J. A.; Cheng, L.; Alvarez-Manilla, G.; York, W. S.; Orlando, R. Quantification by Isobaric Labeling (QUIBL) for the Comparative Glycomic Study of O-Linked Glycans. Int. J. Mass Spectrom. 2008, 278 (2–3), 137–142.

(28) Alvarez-Manilla, G.; Warren, N. L.; Abney, T.; Atwood III, J.; Azadi, P.; York, W. S.; Pierce, M.; Orlando, R. Tools for Glycomics: Relative Quantitation of Glycans by Isotopic Permethylation Using 13CH3I. Glycobiology 2007, 17 (7), 677–687.

(29) Xia, B.; Feasley, C. L.; Sachdev, G. P.; Smith, D. F.; Cummings, R. D. Glycan Reductive Isotope Labeling for Quantitative Glycomics. Anal. Biochem. 2009, 387 (2), 162–170.

(30) Váradi, C.; Mittermayr, S.; Millán-Martín, S.; Bones, J. Quantitative Twoplex Glycan Analysis Using 12C6 and 13C6 Stable Isotope 2-Aminobenzoic Acid Labelling and Capillary Electrophoresis Mass Spectrometry. Anal. Bioanal. Chem. 2016, 408 (30), 8691–8700.

(31) Wei, L.; Cai, Y.; Yang, L.; Zhang, Y.; Lu, H. Duplex Stable Isotope Labeling (DuSIL) for Simultaneous Quantitation and Distinction of Sialylated and Neutral N-Glycans by MALDI-MS. Anal. Chem. 2018, 90 (17), 10442–10449.

(32) Walker, S. H.; Taylor, A. D.; Muddiman, D. C. Individuality Normalization When Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT): A Novel Glycan-Relative Quantification Strategy. J. Am. Soc. Mass Spectrom. 2013, 24 (9), 1376–1384.

(33) Walker, S. H.; Lilley, L. M.; Enamorado, M. F.; Comins, D. L.; Muddiman, D. C. Hydrophobic Derivatization of N-Linked Glycans for Increased Ion Abundance in Electrospray Ionization Mass Spectrometry. J. Am. Soc. Mass Spectrom. 2011, 22 (8), 1309–1317.

(34) Kalmar, J. G.; Butler, K. E.; Baker, E. S.; Muddiman, D. C. Enhanced Protocol for Quantitative N-Linked Glycomics Analysis Using Individuality Normalization When Labeling with Isotopic Glycan Hydrazide Tags (INLIGHT)™. Anal. Bioanal. Chem. 2020, 1–11.

(35) Ceroni, A.; Maass, K.; Geyer, H.; Geyer, R.; Dell, A.; Haslam, S. M. GlycoWorkbench: A Tool for the Computer-Assisted Annotation of Mass Spectra of Glycans. J. Proteome Res. 2008, 7 (4), 1650–1659.

(36) Lohmann, K. K.; von der Lieth, C.-W. GlycoFragment and GlycoSearchMS: Web Tools to Support the Interpretation of Mass Spectra of Complex Carbohydrates. Nucleic Acids Res. 2004, 32, 261–266.

(37) Meitei, N. S.; Apte, A.; Snovida, S. I.; Rogers, J. C.; Saba, J. Automating Mass Spectrometry-Based Quantitative Glycomics Using Aminoxy Tandem Mass Tag Reagents with SimGlycan. J. Proteomics 2015, 127, 211–222.

(38) Xiao, K.; Wang, Y.; Shen, Y.; Han, Y.; Tian, Z. Large-Scale Identification and Visualization of N-Glycans with Primary Structures Using GlySeeker. Rapid Commun. Mass Spectrom. 2018, 32 (2), 142–148.

(39) De Leoz, M. L. A.; Duewer, D. L.; Fung, A.; Liu, L.; Yau, H. K.; Potter, O.; Staples, G. O.; Furuki, K.; Frenkel, R.; Hu, Y.; Sosic, Z.; Zhang, P.; Altmann, F.; Grunwald-Grube, C.; Shao, C.; Zaia, J.; Evers, W.; Pengelley, S.; Suckau, D.; Wiechmann, A.; Resemann, A.; Jabs, W.; Beck, A.; Froehlich, J. W.; Huang, C.; Li, Y.; Liu, Y.; Sun, S.; Wang, Y.; Seo, Y.; An, H. J.; Reichardt, N. C.; Ruiz, J. E.; Archer-Hartmann, S.; Azadi, P.; Bell, L.; Lakos, Z.; An, Y.; Cipollo, J. F.; Pucic-Bakovic, M.; Štambuk, J.; Lauc, G.; Li, X.; Wang, P. G.; Bock, A.; Hennig, R.; Rapp, E.; Creskey, M.; Cyr, T. D.; Nakano, M.; Sugiyama, T.; Leung, P. K. A.; Link-Lenczowski, P.; Jaworek, J.; Yang, S.; Zhang, H.; Kelly, T.; Klapoetke, S.; Cao, R.; Kim, J. Y.; Lee, H. K.; Lee, J. Y.; Yoo, J. S.; Kim, S. R.; Suh, S. K.; De Haan, N.; Falck, D.; Lageveen-Kammeijer, G. S. M.; Wuhrer, M.; Emery, R. J.; Kozak, R. P.; Liew, L. P.; Royle, L.; Urbanowicz, P. A.; Packer, N. H.; Song, X.; Everest-Dass, A.; Lattová, E.; Cajic, S.; Alagesan, K.; Kolarich, D.; Kasali, T.; Lindo, V.; Chen, Y.; Goswami, K.; Gau, B.; Amunugama, R.; Jones, R.; Stroop, C. J. M.; Kato, K.; Yagi, H.; Kondo, S.; Yuen, C. T.; Harazono, A.; Shi, X.; Magnelli, P. E.; Kasper, B. T.; Mahal, L.; Harvey, D. J.; O’Flaherty, R.; Rudd, P. M.; Saldova, R.; Hecht, E. S.; Muddiman, D. C.; Kang, J.; Bhoskar, P.; Menard, D.; Saati, A.; Merle, C.; Mast, S.; Tep, S.; Truong, J.; Nishikaze, T.; Sekiya, S.; Shafer, A.; Funaoka, S.; Toyoda, M.; De Vreugd, P.; Caron, C.; Pradhan, P.; Tan, N. C.; Mechref, Y.; Patil, S.; Rohrer, J. S.; Chakrabarti, R.; Dadke, D.; Lahori, M.; Zou, C.; Cairo, C.; Reiz, B.; Whittal, R. M.; Lebrilla, C. B.; Wu, L.; Guttman, A.; Szigeti, M.; Kremkow, B. G.; Lee, K. H.; Sihlbom, C.; Adamczyk, B.; Jin, C.; Karlsson, N. G.; Örnros, J.; Larson, G.; Nilsson, J.; Meyer, B.; Wiegandt, A.; Komatsu, E.; Perreault, H.; Bodnar, E. D.; Said, N.; Francois, Y. N.; Leize-Wagner, E.; Maier, S.; Zeck, A.; Heck, A. J. R.; Yang, Y.; Haselberg, R.; Yu, Y. Q.; Alley, W.; Leone, J. W.; Yuan, H.; Stein, S. E. NIST Interlaboratory Study on Glycosylation Analysis of Monoclonal Antibodies: Comparison of Results from Diverse Analytical Methods. Mol. Cell. Proteomics 2020, 19 (1), 11–30.

(40) Hecht, E. S.; Loziuk, P. L.; Muddiman, D. C. Xylose Migration During Tandem Mass Spectrometry of N-Linked Glycans. J. Am. Soc. Mass Spectrom. 2017, 28 (4).

(41) Hecht, E. S.; Scholl, E. H.; Walker, S. H.; Taylor, A. D.; Cliby, W. A.; Motsinger-Reif, A. A.; Muddiman, D. C. Relative Quantification and Higher-Order Modeling of the Plasma Glycan Cancer Burden Ratio in Ovarian Cancer Case-Control Samples. J. Proteome Res. 2015, 14 (10), 4394–4401.

(42) Pino, L. K.; Searle, B. C.; Bollinger, J. G.; Nunn, B.; MacLean, B.; MacCoss, M. J. The Skyline Ecosystem: Informatics for Quantitative Mass Spectrometry Proteomics. Mass Spectrom. Rev. 2017, 39 (3), 229–244.

(43) MacLean, B. X.; Pratt, B. S.; Egertson, J. D.; MacCoss, M. J.; Smith, R. D.; Baker, E. S. Using Skyline to Analyze Data-Containing Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry Dimensions. J. Am. Soc. Mass Spectrom. 2018, 29 (11), 2182–2188.

(44) Chambers, M. C.; MacLean, B.; Burke, R.; Amodei, D.; Ruderman, D. L.; Neumann, S.; Gatto, L.; Fischer, B.; Pratt, B.; Egertson, J.; Hoff, K.; Kessner, D.; Tasman, N.; Shulman, N.; Frewen, B.; Baker, T. A.; Brusniak, M. Y.; Paulse, C.; Creasy, D.; Flashner, L.; Kani, K.; Moulding, C.; Seymour, S. L.; Nuwaysir, L. M.; Lefebvre, B.; Kuhlmann, F.; Roark, J.; Rainer, P.; Detlev, S.; Hemenway, T.; Huhmer, A.; Langridge, J.; Connolly, B.; Chadick, T.; Holly, K.; Eckels, J.; Deutsch, E. W.; Moritz, R. L.; Katz, J. E.; Agus, D. B.; MacCoss, M.; Tabb, D. L.; Mallick, P. A Cross-Platform Toolkit for Mass Spectrometry and Proteomics. Nat. Biotechnol. 2012, 30 (10), 918–920.

(45) Race, A. M.; Styles, I. B.; Bunch, J. Inclusive Sharing of Mass Spectrometry Imaging Data Requires a Converter for All. J. Proteomics 2012, 75 (16), 5111–5112. https://doi.org/10.1016/j.jprot.2012.05.035.

(46) Robichaud, G.; Garrard, K. P.; Barry, J. A.; Muddiman, D. C. MSiReader: An Open-Source Interface to View and Analyze High Resolving Power MS Imaging Files on Matlab Platform. J. Am. Soc. Mass Spectrom. 2013, 24 (5), 718–721.

(47) Bokhart, M. T.; Nazari, M.; Garrard, K. P.; Muddiman, D. C. MSiReader v1.0: Evolving Open-Source Mass Spectrometry Imaging Software for Targeted and Untargeted Analyses. J. Am. Soc. Mass Spectrom. 2018, 29 (1), 8–16.

(48) The MathWorks https://www.mathworks.com.

(49) Giancaspro, C.; Comisarow, M. B. Exact Interpolation of Fourier Transform Spectra. Appl. Spectrosc. 1983, 37 (2), 153–166.

(50) Marshall, A. G.; Verdun, F. R.; Ricca, T. L. Beating the Nyquist Limit by Means of Interleaved Alternated Delay Sampling: Extension of Lower Mass Limit in Direct-Mode Fourier Transform Ion Cyclotron Resonance Mass Spectrometry. Appl. Spectrosc. 1988, 42 (2), 199–203.

(51) Coombes, K. R.; Tsavachidis, S.; Morris, J. S.; Baggerly, K. A.; Hung, M. C.; Kuerer, H. M. Improved Peak Detection and Quantification of Mass Spectrometry Data Acquired from Surface-Enhanced Laser Desorption and Ionization by Denoising Spectra with the Undecimated Discrete Wavelet Transform. Proteomics 2005, 5 (16), 4107–4117.

(52) MacLean, B. X.; Pratt, B. S.; Egertson, J. D.; MacCoss, M. J.; Smith, R. D.; Baker, E. S. Using Skyline to Analyze Data-Containing Liquid Chromatography, Ion Mobility Spectrometry, and Mass Spectrometry Dimensions. J. Am. Soc. Mass Spectrom. 2018, 29 (11), 2182–2188.

(53) GlycoHunter https://glycohunter.wordpress.ncsu.edu/

(54) MSiReader https://msireader.wordpress.ncsu.edu/


Appendices


Appendix A: Supplemental Materials for Chapter 3

Figure A.1. Unnormalized spectral counting scatterplots of (A) WT and (B) E3 ligase KO technical replicates. To compare injections, the average unnormalized spectral count per protein across all first technical replicates was plotted against the corresponding average across all second technical replicates. Biological replicates were then compared for (C) the WT samples and (D) the E3 ligase KO samples. In these panels, the average number of unnormalized spectral counts per protein in each biological sample was plotted against the average total number of spectral counts across all biological samples.

Table A.1. Proteins with increased abundance in the E3 ligase knockout samples.

Total Fold Accession Description p-value NSpC Change MGG_01364 | Magnaporthe oryzae 70-15 MGG_01364T0U 85.4015 2.4748 0.0042 hypothetical protein (1012 aa) MGG_02398 | Magnaporthe oryzae 70-15 MGG_02398T0U 195.1320 4.2133 0.0006 DENN domain-containing protein (906 aa) MGG_03223 | Magnaporthe oryzae 70-15 MGG_03223T0U 79.6347 2.0121 0.0275 hypothetical protein (2138 aa) MGG_05937 | Magnaporthe oryzae 70-15 MGG_05937T0U 13.3350 11.7463 0.0203 dynamin family protein (970 aa) MGG_05981 | Magnaporthe oryzae 70-15 MGG_05981T0U glutamine amidotransferase subunit pdxT 146.4937 2.3061 0.0001 (247 aa) MGG_06026 | Magnaporthe oryzae 70-15 MGG_06026T0U 61.9409 2.4578 0.0087 hypothetical protein (635 aa) MGG_08290 | Magnaporthe oryzae 70-15 MGG_08290T0U 18.9516 6.4538 0.0103 hypothetical protein (428 aa) MGG_09215 | Magnaporthe oryzae 70-15 MGG_09215T0U 33.4047 2.4569 0.0469 hypothetical protein (1677 aa) MGG_12004 | Magnaporthe oryzae 70-15 MGG_12004T0U 50.4159 2.0109 0.0451 C2HC5 finger protein (552 aa) MGG_02701 | Magnaporthe oryzae 70-15 MGG_02701T0P 44.1878 2.5453 0.0351 hypothetical protein (1113 aa) MGG_06404 | Magnaporthe oryzae 70-15 MGG_06404T0P 39.0019 2.5126 0.0059 hypothetical protein (399 aa) MGG_07222 | Magnaporthe oryzae 70-15 MGG_07222T0P 11.3636 9.8619 0.0363 DNA polymerase epsilon subunit B (866 aa) MGG_08267 | Magnaporthe oryzae 70-15 MGG_08267T0P 58.1117 3.6401 0.0042 hypothetical protein (541 aa) MGG_09394 | Magnaporthe oryzae 70-15 MGG_09394T0P 44.5931 2.8984 0.0379 hypothetical protein (350 aa) MGG_09838 | Magnaporthe oryzae 70-15 MGG_09838T0P 76.1882 3.3363 0.0195 sulfate transporter 4.1 (801 aa) MGG_11100 | Magnaporthe oryzae 70-15 MGG_11100T0P 14.5478 4.7794 0.0220 hypothetical protein (656 aa) MGG_11899 | Magnaporthe oryzae 70-15 MGG_11899T0P 47.5260 2.6763 0.0328 SH3 domain-containing protein (1013 aa) MGG_03192 | Magnaporthe oryzae 70-15 MGG_03192T0UP RNA polymerase II transcription factor B 77.5580 2.0489 0.0304 subunit 1 (654 aa) MGG_03286 | 
Magnaporthe oryzae 70-15 MGG_03286T0UP 70.2955 7.7137 0.0008 hypothetical protein (1175 aa) MGG_00583 | Magnaporthe oryzae 70-15 MGG_00583T0 10.3781 9.1663 0.0010 beta-galactosidase (1072 aa) MGG_00602 | Magnaporthe oryzae 70-15 MGG_00602T0 40.0262 16.1631 0.0002 cross-pathway control protein 1 (240 aa) MGG_00669 | Magnaporthe oryzae 70-15 MGG_00669T0 22.1006 3.1110 0.0331 hypothetical protein (555 aa)

172

Table A.1. Continued MGG_00859 | Magnaporthe oryzae 70-15 MGG_00859T0 52.1920 4.5748 0.0002 ariadne-1 (523 aa) MGG_01174 | Magnaporthe oryzae 70-15 MGG_01174T0 19.0049 4.3647 0.0150 hypothetical protein (527 aa) MGG_01381 | Magnaporthe oryzae 70-15 MGG_01381T0 15.2448 6.4668 0.0124 calcium permease (1159 aa) MGG_01665 | Magnaporthe oryzae 70-15 MGG_01665T0 12.6940 3.9657 0.0358 hypothetical protein (411 aa) MGG_01823 | Magnaporthe oryzae 70-15 MGG_01823T0 22.2452 2.3804 0.0467 LAlv9 family protein (812 aa) MGG_02123 | Magnaporthe oryzae 70-15 MGG_02123T0 8.2274 7.0595 0.0070 hypothetical protein (747 aa) MGG_02459 | Magnaporthe oryzae 70-15 MGG_02459T0 18.1937 4.4982 0.0001 DNA polymerase delta small subunit (498 aa) MGG_02891 | Magnaporthe oryzae 70-15 MGG_02891T0 55.7704 2.1981 0.0045 SH3 domain signaling protein (493 aa) MGG_02992 | Magnaporthe oryzae 70-15 MGG_02992T0 9.9156 8.9156 0.0097 hypothetical protein (605 aa) MGG_03150 | Magnaporthe oryzae 70-15 MGG_03150T0 cytosolic iron-sulfur protein assembly protein 47.9875 2.2313 0.0354 1 (447 aa) MGG_03292 | Magnaporthe oryzae 70-15 MGG_03292T0 30.8299 3.4458 0.0038 hypothetical protein (795 aa) MGG_03321 | Magnaporthe oryzae 70-15 MGG_03321T0 25.7635 2.6236 0.0191 hypothetical protein (432 aa) MGG_03416 | Magnaporthe oryzae 70-15 MGG_03416T0 148.3183 2.2084 0.0038 acetyltransferase (201 aa) MGG_03558 | Magnaporthe oryzae 70-15 PH MGG_03558T0 18.6657 3.0935 0.0361 domain-containing protein (932 aa) MGG_03569 | Magnaporthe oryzae 70-15 MGG_03569T0 56.3258 4.1761 0.0004 fatty acid desaturase (569 aa) MGG_03900 | Magnaporthe oryzae 70-15 MGG_03900T0 331.3938 2.0286 0.0054 aldehyde dehydrogenase (497 aa) MGG_04107 | Magnaporthe oryzae 70-15 MGG_04107T0 15.7002 2.6606 0.0427 SET domain-containing protein 8 (479 aa) MGG_04146 | Magnaporthe oryzae 70-15 MGG_04146T0 10.6710 3.6636 0.0344 hypothetical protein (675 aa) MGG_04327 | Magnaporthe oryzae 70-15 MGG_04327T0 9.5873 8.1641 0.0303 hypothetical protein (1241 aa) 
MGG_04398 | Magnaporthe oryzae 70-15 6- MGG_04398T0 42.1599 2.0394 0.0327 phosphogluconate dehydrogenase (334 aa) MGG_04469 | Magnaporthe oryzae 70-15 MGG_04469T0 15.2408 4.9209 0.0036 cytochrome P450 97B3 (603 aa) MGG_04843 | Magnaporthe oryzae 70-15 MGG_04843T0 19.4048 2.4866 0.0293 hypothetical protein (611 aa) MGG_05178 | Magnaporthe oryzae 70-15 AF- MGG_05178T0 9.6411 6.4974 0.0461 9 (264 aa) MGG_05256 | Magnaporthe oryzae 70-15 MGG_05256T0 vacuolar protein sorting-associated protein 31.8244 2.1065 0.0486 (830 aa) MGG_05284 | Magnaporthe oryzae 70-15 MGG_05284T0 11.4356 7.8929 0.0273 cytosolic regulator Pianissimo (1397 aa)

173

Table A.1. Continued
MGG_06799T0; MGG_06799 | Magnaporthe oryzae 70-15 hypothetical protein (796 aa); 32.0775; 7.3486; 0.0192
MGG_06925T0; MGG_06925 | Magnaporthe oryzae 70-15 hypothetical protein (1337 aa); 18.8051; 2.6881; 0.0244
MGG_07307T0; MGG_07307 | Magnaporthe oryzae 70-15 NADH-cytochrome b5 reductase 2 (309 aa); 85.7192; 3.2424; 0.0157
MGG_07573T0; MGG_07573 | Magnaporthe oryzae 70-15 hypothetical protein (1060 aa); 12.5164; 5.1170; 0.0231
MGG_07890T0; MGG_07890 | Magnaporthe oryzae 70-15 aldehyde dehydrogenase 3I1 (528 aa); 44.8820; 2.8234; 0.0104
MGG_07981T0; MGG_07981 | Magnaporthe oryzae 70-15 aminopeptidase ypdF (492 aa); 23.3258; 2.7871; 0.0061
MGG_08114T0; MGG_08114 | Magnaporthe oryzae 70-15 hypothetical protein (856 aa); 10.1719; 8.7228; 0.0089
MGG_08156T0; MGG_08156 | Magnaporthe oryzae 70-15 kinesin-II 85 kDa subunit (615 aa); 11.8046; 6.9075; 0.0158
MGG_08293T0; MGG_08293 | Magnaporthe oryzae 70-15 salicylate hydroxylase (472 aa); 8.9223; 7.5284; 0.0110
MGG_08463T0; MGG_08463 | Magnaporthe oryzae 70-15 transcription factor MBP1 (716 aa); 19.4364; 3.2068; 0.0349
MGG_08615T0; MGG_08615 | Magnaporthe oryzae 70-15 dihydrodipicolinate synthase (330 aa); 18.6364; 11.4839; 0.0054
MGG_08841T0; MGG_08841 | Magnaporthe oryzae 70-15 hypothetical protein (1037 aa); 58.5108; 3.1425; 0.0062
MGG_08860T0; MGG_08860 | Magnaporthe oryzae 70-15 exosome complex exonuclease RRP45 (294 aa); 40.2269; 2.0543; 0.0460
MGG_08914T0; MGG_08914 | Magnaporthe oryzae 70-15 hypothetical protein (1103 aa); 8.2271; 5.3867; 0.0485
MGG_09071T0; MGG_09071 | Magnaporthe oryzae 70-15 aminobenzoyl-glutamate utilization protein B (410 aa); 78.9423; 2.2481; 0.0015
MGG_09283T0; MGG_09283 | Magnaporthe oryzae 70-15 NEDD8-activating enzyme E1 catalytic subunit (435 aa); 34.6154; 2.3797; 0.0229
MGG_10157T0; MGG_10157 | Magnaporthe oryzae 70-15 hypothetical protein (727 aa); 12.8727; 4.5804; 0.0235
MGG_10189T0; MGG_10189 | Magnaporthe oryzae 70-15 beta-glucosidase A (513 aa); 57.7745; 2.1419; 0.0191
MGG_10327T0; MGG_10327 | Magnaporthe oryzae 70-15 hypothetical protein (730 aa); 485.4915; 2.2794; 8.09E-06
MGG_10528T0; MGG_10528 | Magnaporthe oryzae 70-15 zinc binuclear cluster-type protein (1032 aa); 31.4620; 2.3303; 0.0060
MGG_10529T0; MGG_10529 | Magnaporthe oryzae 70-15 fungal specific transcription factor domain-containing protein (791 aa); 12.5233; 3.9032; 0.0488
MGG_11322T0; MGG_11322 | Magnaporthe oryzae 70-15 dephospho-CoA kinase (278 aa); 40.0425; 2.5859; 0.0192
MGG_11426T0; MGG_11426 | Magnaporthe oryzae 70-15 5'-3' exoribonuclease 2 (1021 aa); 132.6259; 2.1179; 0.0031
MGG_11575T0; MGG_11575 | Magnaporthe oryzae 70-15 hypothetical protein (248 aa); 115.0971; 2.1933; 0.0035

Table A.1. Continued
MGG_13058T0; MGG_13058 | Magnaporthe oryzae 70-15 hypothetical protein (828 aa); 75.0941; 49.1844; 0.0002
MGG_13191T0; MGG_13191 | Magnaporthe oryzae 70-15 hypothetical protein (184 aa); 45.0759; 2.3589; 0.0127
MGG_14266T0; MGG_14266 | Magnaporthe oryzae 70-15 ubiquitin-conjugating enzyme (261 aa); 38.3065; 3.3165; 0.0464
MGG_14270T0; MGG_14270 | Magnaporthe oryzae 70-15 hypothetical protein (366 aa); 22.4262; 2.4033; 0.0426
MGG_16442T0; MGG_16442 | Magnaporthe oryzae 70-15 hypothetical protein (277 aa); 18.9157; 3.9202; 0.0072
MGG_17680T0; MGG_17680 | Magnaporthe oryzae 70-15 hypothetical protein (117 aa); 35.6630; 2.4375; 0.0301
MGG_17699T0; MGG_17699 | Magnaporthe oryzae 70-15 aminodeoxychorismate synthase (879 aa); 13.9315; 4.9680; 0.0077
MGG_17701T0; MGG_17701 | Magnaporthe oryzae 70-15 hypothetical protein (357 aa); 12.0949; 3.0038; 0.0400

Table A.2. Proteins with decreased abundance in the E3 Ligase knockout samples.

Accession; Description; Total NSpC; Fold Change; p-value
MGG_02055T0U; MGG_02055 | Magnaporthe oryzae 70-15 hypothetical protein (616 aa); 52.2216; 0.2763; 0.0072
MGG_08005T0U; MGG_08005 | Magnaporthe oryzae 70-15 phosphoinositide 3-phosphate phosphatase (659 aa); 56.6775; 0.4887; 0.0107
MGG_12999T0U; MGG_12999 | Magnaporthe oryzae 70-15 acetoacetate-CoA ligase (705 aa); 93.0919; 0.4441; 0.0455
MGG_05257T0P; MGG_05257 | Magnaporthe oryzae 70-15 RING finger protein (738 aa); 23.1149; 0.3088; 0.0356
MGG_00070T0; MGG_00070 | Magnaporthe oryzae 70-15 hypothetical protein (235 aa); 10.8896; 19.4507; 3.15E-06
MGG_00143T0; MGG_00143 | Magnaporthe oryzae 70-15 hypothetical protein (396 aa); 39.1310; 17.4952; 3.73E-05
MGG_00504T0; MGG_00504 | Magnaporthe oryzae 70-15 zinc finger protein 740 (676 aa); 31.5709; 4.7826; 0.0001
MGG_00683T0; MGG_00683 | Magnaporthe oryzae 70-15 endoplasmic oxidoreductin-1 (607 aa); 71.1382; 12.4125; 0.0001
MGG_00835T0; MGG_00835 | Magnaporthe oryzae 70-15 flotillin domain-containing protein (503 aa); 15.7801; 14.1606; 0.0002
MGG_00851T0; MGG_00851 | Magnaporthe oryzae 70-15 hypothetical protein (426 aa); 17.7961; 10.1686; 0.0003
MGG_01069T0; MGG_01069 | Magnaporthe oryzae 70-15 hypothetical protein (104 aa); 42.0158; 16.1540; 0.0012
MGG_01305T0; MGG_01305 | Magnaporthe oryzae 70-15 hypothetical protein (144 aa); 119.4348; 9.8019; 0.0008

Table A.2. Continued
MGG_01708T0; MGG_01708 | Magnaporthe oryzae 70-15 AN1-type zinc finger protein 1 (324 aa); 9.9272; 1.3623; 0.0014
MGG_02109T0; MGG_02109 | Magnaporthe oryzae 70-15 hypothetical protein (92 aa); 15.4641; 3.3675; 0.0009
MGG_02537T0; MGG_02537 | Magnaporthe oryzae 70-15 hypothetical protein (869 aa); 25.4721; 7.2297; 0.0031
MGG_02944T0; MGG_02944 | Magnaporthe oryzae 70-15 HAL protein kinase (851 aa); 70.4324; 1.2114; 0.0037
MGG_03000T0; MGG_03000 | Magnaporthe oryzae 70-15 hypothetical protein (592 aa); 37.2745; 2.2174; 0.0057
MGG_03097T0; MGG_03097 | Magnaporthe oryzae 70-15 oxidoreductase (445 aa); 30.3224; 1.7792; 0.0068
MGG_03203T0; MGG_03203 | Magnaporthe oryzae 70-15 54S ribosomal protein L16 (252 aa); 31.9276; 1.8477; 0.0063
MGG_03462T0; MGG_03462 | Magnaporthe oryzae 70-15 hypothetical protein (585 aa); 71.9726; 6.3567; 0.0080
MGG_03823T0; MGG_03823 | Magnaporthe oryzae 70-15 NADH oxidase (419 aa); 114.1477; 6.4173; 0.0058
MGG_03863T0; MGG_03863 | Magnaporthe oryzae 70-15 hypothetical protein (187 aa); 36.9956; 3.2941; 0.0061
MGG_04212T0; MGG_04212 | Magnaporthe oryzae 70-15 L-ornithine 5-monooxygenase (L-ornithine N(5)-oxygenase) (564 aa); 151.9231; 3.0445; 0.0081
MGG_04344T0; MGG_04344 | Magnaporthe oryzae 70-15 hypothetical protein (563 aa); 22.7061; 5.1406; 0.0091
MGG_04677T0; MGG_04677 | Magnaporthe oryzae 70-15 hypothetical protein (646 aa); 10.7469; 10.0045; 0.0129
MGG_04736T0; MGG_04736 | Magnaporthe oryzae 70-15 hypothetical protein (304 aa); 8.9216; 1.8899; 0.0129
MGG_04973T0; MGG_04973 | Magnaporthe oryzae 70-15 carbonate dehydratase (273 aa); 18.1819; 2.1930; 0.0107
MGG_05310T0; MGG_05310 | Magnaporthe oryzae 70-15 sideroflexin-5 (341 aa); 57.7183; 2.9383; 0.0128
MGG_05674T0; MGG_05674 | Magnaporthe oryzae 70-15 hypothetical protein (411 aa); 13.5068; 0.9480; 0.0126
MGG_05766T0; MGG_05766 | Magnaporthe oryzae 70-15 hypothetical protein (288 aa); 70.1020; 1.6110; 0.0139
MGG_05815T0; MGG_05815 | Magnaporthe oryzae 70-15 pre-rRNA-processing protein ESF2 (331 aa); 36.8337; 4.6989; 0.0190
MGG_05942T0; MGG_05942 | Magnaporthe oryzae 70-15 flavin containing monooxygenase 9 (533 aa); 8.1108; 1.6052; 0.0229
MGG_06151T0; MGG_06151 | Magnaporthe oryzae 70-15 cytosolic Fe-S cluster assembly factor NBP35 (344 aa); 14.5954; 3.1531; 0.0173
MGG_06285T0; MGG_06285 | Magnaporthe oryzae 70-15 hypothetical protein (978 aa); 18.2848; 1.7124; 0.0214
MGG_06545T0; MGG_06545 | Magnaporthe oryzae 70-15 hypothetical protein (815 aa); 48.4118; 3.0093; 0.0204
MGG_06906T0; MGG_06906 | Magnaporthe oryzae 70-15 hypothetical protein (568 aa); 21.5556; 3.8217; 0.0188

Table A.2. Continued
MGG_07223T0; MGG_07223 | Magnaporthe oryzae 70-15 altered inheritance-mitochondria protein 31 (214 aa); 56.7933; 6.0552; 0.0170
MGG_07369T0; MGG_07369 | Magnaporthe oryzae 70-15 hypothetical protein (412 aa); 170.7811; 3.6872; 0.0289
MGG_07394T0; MGG_07394 | Magnaporthe oryzae 70-15 hypothetical protein (116 aa); 33.1112; 4.8728; 0.0214
MGG_07697T0; MGG_07697 | Magnaporthe oryzae 70-15 superoxide dismutase (214 aa); 34.8546; 3.3578; 0.0195
MGG_07935T0; MGG_07935 | Magnaporthe oryzae 70-15 galactonate dehydratase (384 aa); 118.7296; 4.5993; 0.0206
MGG_08083T0; MGG_08083 | Magnaporthe oryzae 70-15 hypothetical protein (546 aa); 8.8913; 3.5701; 0.0231
MGG_08163T0; MGG_08163 | Magnaporthe oryzae 70-15 30S ribosomal protein S14p/S29e (57 aa); 49.1428; 2.9456; 0.0284
MGG_08201T0; MGG_08201 | Magnaporthe oryzae 70-15 hypothetical protein (137 aa); 23.6339; 4.8492; 0.0249
MGG_08308T0; MGG_08308 | Magnaporthe oryzae 70-15 hypothetical protein (282 aa); 18.1492; 2.5263; 0.0331
MGG_08577T0; MGG_08577 | Magnaporthe oryzae 70-15 hypothetical protein (432 aa); 50.9072; 4.1708; 0.0255
MGG_08641T0; MGG_08641 | Magnaporthe oryzae 70-15 pre-mRNA-splicing factor CWC2 (395 aa); 46.4191; 2.9320; 0.0337
MGG_08688T0; MGG_08688 | Magnaporthe oryzae 70-15 mitochondrial inner membrane protein (550 aa); 46.7370; 0.8839; 0.0335
MGG_08795T0; MGG_08795 | Magnaporthe oryzae 70-15 hypothetical protein (595 aa); 52.8734; 0.9850; 0.0383
MGG_09514T0; MGG_09514 | Magnaporthe oryzae 70-15 methyltransferase OMS1 (487 aa); 13.0378; 6.0894; 0.0387
MGG_10061T0; MGG_10061 | Magnaporthe oryzae 70-15 catalase-1 (744 aa); 153.3026; 1.6070; 0.0384
MGG_10111T0; MGG_10111 | Magnaporthe oryzae 70-15 glucose and ribitol dehydrogenase (347 aa); 41.9779; 1.8394; 0.0313
MGG_10274T0; MGG_10274 | Magnaporthe oryzae 70-15 hypothetical protein (255 aa); 219.8714; 2.7055; 0.0318
MGG_10441T0; MGG_10441 | Magnaporthe oryzae 70-15 lipase 2 (338 aa); 32.5816; 1.7569; 0.0339
MGG_11130T0; MGG_11130 | Magnaporthe oryzae 70-15 hypothetical protein (713 aa); 58.9006; 1.3707; 0.0355
MGG_12745T0; MGG_12745 | Magnaporthe oryzae 70-15 hypothetical protein (507 aa); 39.7800; 1.4013; 0.0377
MGG_13774T0; MGG_13774 | Magnaporthe oryzae 70-15 hypothetical protein (340 aa); 17.3119; 1.9225; 0.0475
MGG_14678T0; MGG_14678 | Magnaporthe oryzae 70-15 hypothetical protein (409 aa); 7.8653; 1.0485; 0.0367
MGG_15688T0; MGG_15688 | Magnaporthe oryzae 70-15 hypothetical protein (692 aa); 13.9507; 5.0225; 0.0425
MGG_16374T0; MGG_16374 | Magnaporthe oryzae 70-15 hypothetical protein (351 aa); 33.6181; 0.9480; 0.0409
MGG_17476T0; MGG_17476 | Magnaporthe oryzae 70-15 hypothetical protein (253 aa); 83.3382; 4.0367; 0.0484

Table A.2. Continued
MGG_17595T0; MGG_17595 | Magnaporthe oryzae 70-15 hypothetical protein (510 aa); 21.9518; 4.1662; 0.0437
MGG_17713T0; MGG_17713 | Magnaporthe oryzae 70-15 hypothetical protein (637 aa); 17.1908; 0.8205; 0.0452
MGG_17823T0; MGG_17823 | Magnaporthe oryzae 70-15 cyanide hydratase (357 aa); 109.7235; 4.1150; 0.0458
MGG_17829T0; MGG_17829 | Magnaporthe oryzae 70-15 hypothetical protein (363 aa); 59.7701; 1.9857; 0.0476

Table A.3. Proteins unique to the Wild Type samples.

Accession; Description; Total NSpC
MGG_01080T0U; MGG_01080 | Magnaporthe oryzae 70-15 hypothetical protein (1465 aa); 5.1437
MGG_07393T0U; MGG_07393 | Magnaporthe oryzae 70-15 histone-lysine N-methyltransferase SET9 (886 aa); 5.1437
MGG_07953T0U; MGG_07953 | Magnaporthe oryzae 70-15 bifunctional P-450:NADPH-P450 reductase (1096 aa); 20.3819
MGG_07960T0U; MGG_07960 | Magnaporthe oryzae 70-15 hypothetical protein (401 aa); 5.1482
MGG_08912T0U; MGG_08912 | Magnaporthe oryzae 70-15 hypothetical protein (446 aa); 5.1437
MGG_09384T0U; MGG_09384 | Magnaporthe oryzae 70-15 hypothetical protein (514 aa); 10.3053
MGG_09710T0U; MGG_09710 | Magnaporthe oryzae 70-15 hypothetical protein (294 aa); 7.7290
MGG_10322T0U; MGG_10322 | Magnaporthe oryzae 70-15 cortical patch protein (255 aa); 7.7156
MGG_12009T0U; MGG_12009 | Magnaporthe oryzae 70-15 hypothetical protein (858 aa); 13.0625
MGG_12447T0U; MGG_12447 | Magnaporthe oryzae 70-15 polyketide synthase/peptide synthetase (4035 aa); 5.0728
MGG_13065T0U; MGG_13065 | Magnaporthe oryzae 70-15 SCF E3 ubiquitin ligase complex F-box protein grrA (785 aa); 16.6687
MGG_16007T0U; MGG_16007 | Magnaporthe oryzae 70-15 hypothetical protein (200 aa); 5.1437
MGG_17263T0U; MGG_17263 | Magnaporthe oryzae 70-15 hypothetical protein (308 aa); 6.0000
MGG_01996T1P; MGG_01996 | Magnaporthe oryzae 70-15 hypothetical protein (1015 aa); 8.1667
MGG_06699T0P; MGG_06699 | Magnaporthe oryzae 70-15 hypothetical protein (480 aa); 9.5569
MGG_06778T0P; MGG_06778 | Magnaporthe oryzae 70-15 C6 zinc finger domain-containing protein (725 aa); 10.2254
MGG_07599T0P; MGG_07599 | Magnaporthe oryzae 70-15 hypothetical protein (141 aa); 10.0951
MGG_14758T0P; MGG_14758 | Magnaporthe oryzae 70-15 hypothetical protein (190 aa); 11.4847
MGG_00314T0UP; MGG_00314 | Magnaporthe oryzae 70-15 lipase 4 (599 aa); 7.7290

Table A.3. Continued
MGG_00944T0UP; MGG_00944 | Magnaporthe oryzae 70-15 hypothetical protein (480 aa); 11.5734
MGG_01638T0UP; MGG_01638 | Magnaporthe oryzae 70-15 sodium/calcium exchanger protein (1005 aa); 7.5499
MGG_02309T0UP; MGG_02309 | Magnaporthe oryzae 70-15 carboxypeptidase S1 (635 aa); 15.4579
MGG_03624T0UP; MGG_03624 | Magnaporthe oryzae 70-15 hypothetical protein (230 aa); 10.0000
MGG_04329T0UP; MGG_04329 | Magnaporthe oryzae 70-15 hypothetical protein (516 aa); 5.3298
MGG_07586T0UP; MGG_07586 | Magnaporthe oryzae 70-15 trichothecene C-15 hydroxylase (167 aa); 6.1250
MGG_07817T0UP; MGG_07817 | Magnaporthe oryzae 70-15 hypothetical protein (602 aa); 5.0728
MGG_07972T0UP; MGG_07972 | Magnaporthe oryzae 70-15 hypothetical protein (292 aa); 5.9855
MGG_08348T0UP; MGG_08348 | Magnaporthe oryzae 70-15 hypothetical protein (82 aa); 5.1437
MGG_09236T0UP; MGG_09236 | Magnaporthe oryzae 70-15 hypothetical protein (857 aa); 11.9426
MGG_12530T0UP; MGG_12530 | Magnaporthe oryzae 70-15 histidine kinase G7 (1272 aa); 6.4891
MGG_13542T0UP; MGG_13542 | Magnaporthe oryzae 70-15 DEAD_2 domain-containing protein (922 aa); 5.1437
MGG_00052T0; MGG_00052 | Magnaporthe oryzae 70-15 hypothetical protein (225 aa); 7.1753
MGG_00521T0; MGG_00521 | Magnaporthe oryzae 70-15 hypothetical protein (874 aa); 6.2960
MGG_00532T0; MGG_00532 | Magnaporthe oryzae 70-15 hypothetical protein (530 aa); 14.8194
MGG_01814T0; MGG_01814 | Magnaporthe oryzae 70-15 hypothetical protein (460 aa); 5.0562
MGG_01941T0; MGG_01941 | Magnaporthe oryzae 70-15 FAD binding domain-containing protein (521 aa); 11.0618
MGG_03070T0; MGG_03070 | Magnaporthe oryzae 70-15 epoxide hydrolase domain-containing protein (411 aa); 87.2954
MGG_03347T0; MGG_03347 | Magnaporthe oryzae 70-15 hypothetical protein (275 aa); 9.7565
MGG_04346T0; MGG_04346 | Magnaporthe oryzae 70-15 sterol 24-C-methyltransferase (391 aa); 27.4567
MGG_05005T0; MGG_05005 | Magnaporthe oryzae 70-15 LMBR1 domain-containing protein (728 aa); 5.5433
MGG_05022T0; MGG_05022 | Magnaporthe oryzae 70-15 hypothetical protein (322 aa); 11.0813
MGG_05033T0; MGG_05033 | Magnaporthe oryzae 70-15 hypothetical protein (743 aa); 6.0309
MGG_05139T0; MGG_05139 | Magnaporthe oryzae 70-15 hypothetical protein (477 aa); 6.9786
MGG_05272T0; MGG_05272 | Magnaporthe oryzae 70-15 hypothetical protein (785 aa); 7.4783

Table A.3. Continued
MGG_05756T0; MGG_05756 | Magnaporthe oryzae 70-15 hypothetical protein (250 aa); 5.9784
MGG_05968T0; MGG_05968 | Magnaporthe oryzae 70-15 endoribonuclease L-PSP (137 aa); 8.0859
MGG_06893T0; MGG_06893 | Magnaporthe oryzae 70-15 hypothetical protein (548 aa); 5.0816
MGG_10156T0; MGG_10156 | Magnaporthe oryzae 70-15 hypothetical protein (185 aa); 14.1536
MGG_10252T0; MGG_10252 | Magnaporthe oryzae 70-15 hypothetical protein (487 aa); 328.0972
MGG_10533T0; MGG_10533 | Magnaporthe oryzae 70-15 agmatinase 1 (424 aa); 30.9978
MGG_10596T1; MGG_10596 | Magnaporthe oryzae 70-15 CMGC/SRPK protein kinase (534 aa); 15.6928
MGG_11431T0; MGG_11431 | Magnaporthe oryzae 70-15 cytochrome c oxidase assembly protein COX11 (261 aa); 5.3099
MGG_12124T0; MGG_12124 | Magnaporthe oryzae 70-15 hypothetical protein (54 aa); 10.5533
MGG_13185T0; MGG_13185 | Magnaporthe oryzae 70-15 inositol-3-phosphate synthase (553 aa); 14.1875
MGG_13455T0; MGG_13455 | Magnaporthe oryzae 70-15 hypothetical protein (469 aa); 5.0136
MGG_13493T0; MGG_13493 | Magnaporthe oryzae 70-15 F-box protein (561 aa); 5.5433
MGG_16418T0; MGG_16418 | Magnaporthe oryzae 70-15 hypothetical protein (220 aa); 5.6696
MGG_17677T0; MGG_17677 | Magnaporthe oryzae 70-15 hypothetical protein (618 aa); 5.0527
MGG_17979T0; MGG_17979 | Magnaporthe oryzae 70-15 hypothetical protein (145 aa); 10.1042

Table A.4. Proteins unique to the E3 Ligase knockout samples.

Accession; Description; Total NSpC
MGG_02114T0U; MGG_02114 | Magnaporthe oryzae 70-15 hypothetical protein (813 aa); 6.2703
MGG_02377T0U; MGG_02377 | Magnaporthe oryzae 70-15 hypothetical protein (884 aa); 7.9145
MGG_06430T0U; MGG_06430 | Magnaporthe oryzae 70-15 hypothetical protein (565 aa); 9.0409
MGG_07598T0U; MGG_07598 | Magnaporthe oryzae 70-15 hypothetical protein (200 aa); 5.2048
MGG_07905T0U; MGG_07905 | Magnaporthe oryzae 70-15 hypothetical protein (1974 aa); 5.4579
MGG_08467T0U; MGG_08467 | Magnaporthe oryzae 70-15 hypothetical protein (412 aa); 5.2048
MGG_08697T0U; MGG_08697 | Magnaporthe oryzae 70-15 hypothetical protein (1249 aa); 6.5060
MGG_13692T0U; MGG_13692 | Magnaporthe oryzae 70-15 hypothetical protein (87 aa); 13.1909

Table A.4. Continued
MGG_17625T0U; MGG_17625 | Magnaporthe oryzae 70-15 hypothetical protein (814 aa); 5.7229
MGG_05047T0P; MGG_05047 | Magnaporthe oryzae 70-15 hypothetical protein (883 aa); 14.5125
MGG_08007T0P; MGG_08007 | Magnaporthe oryzae 70-15 hypothetical protein (254 aa); 7.2163
MGG_08811T0P; MGG_08811 | Magnaporthe oryzae 70-15 hypothetical protein (320 aa); 5.5448
MGG_08946T0P; MGG_08946 | Magnaporthe oryzae 70-15 hypothetical protein (237 aa); 11.7286
MGG_08985T0P; MGG_08985 | Magnaporthe oryzae 70-15 beta-xylosidase (848 aa); 5.2763
MGG_11606T0P; MGG_11606 | Magnaporthe oryzae 70-15 hypothetical protein (343 aa); 18.2571
MGG_14575T0P; MGG_14575 | Magnaporthe oryzae 70-15 hypothetical protein (514 aa); 5.7277
MGG_00072T0UP; MGG_00072 | Magnaporthe oryzae 70-15 isoamyl alcohol oxidase (600 aa); 5.2286
MGG_02342T0UP; MGG_02342 | Magnaporthe oryzae 70-15 hypothetical protein (853 aa); 11.7644
MGG_02408T0UP; MGG_02408 | Magnaporthe oryzae 70-15 hypothetical protein (929 aa); 8.7239
MGG_02508T0UP; MGG_02508 | Magnaporthe oryzae 70-15 hypothetical protein (269 aa); 5.9899
MGG_02988T0UP; MGG_02988 | Magnaporthe oryzae 70-15 hypothetical protein (384 aa); 11.8718
MGG_04685T0UP; MGG_04685 | Magnaporthe oryzae 70-15 hypothetical protein (1459 aa); 9.2336
MGG_04929T0UP; MGG_04929 | Magnaporthe oryzae 70-15 hypothetical protein (750 aa); 8.9518
MGG_05798T0UP; MGG_05798 | Magnaporthe oryzae 70-15 hypothetical protein (217 aa); 5.2048
MGG_05938T0UP; MGG_05938 | Magnaporthe oryzae 70-15 2,2-dialkylglycine decarboxylase (452 aa); 5.2286
MGG_07476T0UP; MGG_07476 | Magnaporthe oryzae 70-15 hypothetical protein (872 aa); 5.1973
MGG_07975T0UP; MGG_07975 | Magnaporthe oryzae 70-15 hypothetical protein (193 aa); 5.2763
MGG_08174T0UP; MGG_08174 | Magnaporthe oryzae 70-15 mitogen-activated protein kinase organizer 1 (358 aa); 5.2406
MGG_08211T0UP; MGG_08211 | Magnaporthe oryzae 70-15 hypothetical protein (572 aa); 7.8072
MGG_08919T0UP; MGG_08919 | Magnaporthe oryzae 70-15 UDP-glucose,sterol transferase (1324 aa); 11.0897
MGG_09185T0UP; MGG_09185 | Magnaporthe oryzae 70-15 hypothetical protein (1228 aa); 7.1879
MGG_09262T0UP; MGG_09262 | Magnaporthe oryzae 70-15 autophagy protein 5 (315 aa); 7.2644
MGG_09537T0UP; MGG_09537 | Magnaporthe oryzae 70-15 MSF1 domain-containing protein (233 aa); 21.5637
MGG_10001T0UP; MGG_10001 | Magnaporthe oryzae 70-15 hypothetical protein (265 aa); 5.9899

Table A.4. Continued
MGG_10086T0UP; MGG_10086 | Magnaporthe oryzae 70-15 hypothetical protein (129 aa); 5.2763
MGG_11230T0UP; MGG_11230 | Magnaporthe oryzae 70-15 hypothetical protein (601 aa); 5.9907
MGG_12714T0UP; MGG_12714 | Magnaporthe oryzae 70-15 hypothetical protein (310 aa); 6.2681
MGG_12858T0UP; MGG_12858 | Magnaporthe oryzae 70-15 hypothetical protein (254 aa); 11.6665
MGG_14578T0UP; MGG_14578 | Magnaporthe oryzae 70-15 hypothetical protein (285 aa); 8.9161
MGG_00199T0; MGG_00199 | Magnaporthe oryzae 70-15 hypothetical protein (1296 aa); 8.6501
MGG_00569T0; MGG_00569 | Magnaporthe oryzae 70-15 hypothetical protein (772 aa); 9.9302
MGG_02648T0; MGG_02648 | Magnaporthe oryzae 70-15 interferon-induced GTP-binding protein Mx (734 aa); 57.3946
MGG_03098T0; MGG_03098 | Magnaporthe oryzae 70-15 thiazole biosynthetic enzyme (328 aa); 7.1729
MGG_03162T0; MGG_03162 | Magnaporthe oryzae 70-15 hypothetical protein (1010 aa); 16.4886
MGG_03542T0; MGG_03542 | Magnaporthe oryzae 70-15 metallopeptidase (807 aa); 40.5928
MGG_03567T0; MGG_03567 | Magnaporthe oryzae 70-15 hypothetical protein (317 aa); 6.5238
MGG_04133T0; MGG_04133 | Magnaporthe oryzae 70-15 histone acetyltransferase esa-1 (535 aa); 5.1075
MGG_04196T0; MGG_04196 | Magnaporthe oryzae 70-15 hypothetical protein (1132 aa); 25.8386
MGG_04559T0; MGG_04559 | Magnaporthe oryzae 70-15 exocyst complex component protein (852 aa); 8.0683
MGG_04803T0; MGG_04803 | Magnaporthe oryzae 70-15 transmembrane and coiled-coil domain-containing protein 4 (1375 aa); 12.4507
MGG_05281T1; MGG_05281 | Magnaporthe oryzae 70-15 superoxide dismutase copper chaperone Lys7 (267 aa); 9.1779
MGG_05785T0; MGG_05785 | Magnaporthe oryzae 70-15 levanase (660 aa); 6.3435
MGG_06010T0; MGG_06010 | Magnaporthe oryzae 70-15 hypothetical protein (125 aa); 6.1671
MGG_06324T0; MGG_06324 | Magnaporthe oryzae 70-15 dTDP-D-glucose 4,6-dehydratase (425 aa); 5.5861
MGG_06482T1; MGG_06482 | Magnaporthe oryzae 70-15 STE/STE7 protein kinase (516 aa); 12.4312
MGG_07408T0; MGG_07408 | Magnaporthe oryzae 70-15 NACHT and TPR domain-containing protein (2126 aa); 10.7315
MGG_08139T0; MGG_08139 | Magnaporthe oryzae 70-15 methionine aminopeptidase 1 (396 aa); 67.5144
MGG_08341T0; MGG_08341 | Magnaporthe oryzae 70-15 hypothetical protein (250 aa); 6.7707
MGG_08699T0; MGG_08699 | Magnaporthe oryzae 70-15 hypothetical protein (219 aa); 5.8875
MGG_08949T0; MGG_08949 | Magnaporthe oryzae 70-15 hypothetical protein (411 aa); 7.2644

Table A.4. Continued
MGG_09777T0; MGG_09777 | Magnaporthe oryzae 70-15 trichothecene 3-O-acetyltransferase (474 aa); 5.7312
MGG_10359T0; MGG_10359 | Magnaporthe oryzae 70-15 hypothetical protein (416 aa); 5.6160
MGG_10716T0; MGG_10716 | Magnaporthe oryzae 70-15 hypothetical protein (180 aa); 9.5669
MGG_11475T0; MGG_11475 | Magnaporthe oryzae 70-15 hypothetical protein (1029 aa); 6.2467
MGG_11761T0; MGG_11761 | Magnaporthe oryzae 70-15 hypothetical protein (619 aa); 7.0266
MGG_11907T0; MGG_11907 | Magnaporthe oryzae 70-15 hypothetical protein (461 aa); 6.0028
MGG_13796T0; MGG_13796 | Magnaporthe oryzae 70-15 stress responsive A/B barrel domain-containing protein (111 aa); 8.0652
MGG_14388T0; MGG_14388 | Magnaporthe oryzae 70-15 hypothetical protein (202 aa); 19.4916
MGG_15612T0; MGG_15612 | Magnaporthe oryzae 70-15 hypothetical protein (1340 aa); 18.6143
MGG_15825T0; MGG_15825 | Magnaporthe oryzae 70-15 hypothetical protein (593 aa); 6.6759
MGG_17682T0; MGG_17682 | Magnaporthe oryzae 70-15 hypothetical protein (150 aa); 6.4147

Accession suffixes: U, ubiquitinated; P, phosphorylated; UP, ubiquitinated and phosphorylated.
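When these tables are reused, the suffix convention in this legend can be applied programmatically to split each accession into its transcript identifier and PTM annotation. The Python sketch below is a hypothetical helper and not part of the original analysis workflow. The function name `decode_accession` and the regular expression are assumptions. Both are based only on the accession formats visible in Tables A.2 to A.4, such as MGG_00314T0UP and MGG_01996T1P.

```python
import re

# Legend for Tables A.1-A.4: U = ubiquitinated, P = phosphorylated,
# UP = ubiquitinated and phosphorylated. No suffix means no PTM annotation.
SUFFIX_LEGEND: dict[str, tuple[str, ...]] = {
    "U": ("ubiquitinated",),
    "P": ("phosphorylated",),
    "UP": ("ubiquitinated", "phosphorylated"),
}

# Assumed accession shape (hypothetical pattern): gene ID, transcript number,
# optional PTM suffix, e.g. MGG_00314T0UP or MGG_01996T1P.
ACCESSION_RE = re.compile(r"^(MGG_\d+T\d+)(UP|U|P)?$")


def decode_accession(label: str) -> tuple[str, tuple[str, ...]]:
    """Split a table accession into its transcript ID and PTM annotations."""
    match = ACCESSION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized accession: {label!r}")
    transcript, suffix = match.groups()
    # suffix is None when the accession carries no PTM annotation.
    return transcript, SUFFIX_LEGEND.get(suffix, ())


if __name__ == "__main__":
    for label in ("MGG_00314T0UP", "MGG_01996T1P", "MGG_06799T0"):
        print(label, decode_accession(label))
```

For example, `decode_accession("MGG_00314T0UP")` returns `("MGG_00314T0", ("ubiquitinated", "phosphorylated"))`. An accession without a suffix, such as MGG_06799T0, returns an empty annotation tuple.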

Table A.5. Post-translational modification sites of proteins listed in Tables A.1-A.4.

Accession | Peptides | Modification (position)
MGG_01364T0 | SRPVDPSLPQGDLR | 2xGG [S1; S7]
MGG_02398T0 | AVFGGAVK | 1xGG [K8]
MGG_03223T0 | EDVVAIETILR | 1xGG [T8]
MGG_05937T0 | TPTMTSR | 2xGG [T1; T3]
MGG_05981T0 | MTMTEVK | 2xOxidation [M1; M3]; 1xGG [T4]
MGG_06026T0 | FTSVPVER | 1xGG [S3]
MGG_08290T0 | TATITR | 1xGG [T1]
MGG_09215T0 | VELPTTNAQTVR | 1xGG [T]
MGG_12004T0 | QVLSIDVVGGK | 1xGG [S4]
(accession not listed) | SHSDAATRPIIVTTTK | 1xPhospho [S3]
MGG_02701T0 | RLSGSLEVPNSPLR | 1xPhospho [S3]
MGG_06404T0 | EACCALLLGNSTTSLAALTLSETAK | 2xNethylmaleimide [C3; C4]; 1xPhospho [T23]
MGG_07222T0 | AGNGGVIVDGASPILKDILK | 1xPhospho [S12]
MGG_08267T0 | TGTPIWAIIFDLEGGAINDVPMNATAYAHR | 1xOxidation [M22]; 1xPhospho [Y/T]
MGG_09394T0 | TSSTLSLTAR | 1xPhospho [T/S]
MGG_09838T0 | NAASETTSLLR | 1xPhospho [S/T]

Table A.5. Continued
MGG_11100T0 | GSSVSGGVSGHLAK | 1xPhospho [S]
MGG_11899T0 | SASPNPYTASER | 1xPhospho [S3]
MGG_03192T0 | SSTPQPATNGNAANGSAAVSFASTASNNASK | 1xPhospho [T/S]; 1xGG [S20]
MGG_03286T0 | SSLGSVNADGVSDRPDSSK | 2xPhospho [S2; S]
MGG_03286T0 | AMQMSSPAPSVFSTPR | 2xGG [S5; S6]
MGG_02055T0 | QTVNASQIVRSHGISR | 2xGG [T2; S15]
MGG_08005T0 | SETATTIGAENGASSSAASLPAEVSGGSK | 3xGG [T3; T5; S28]
MGG_12999T0 | VEDLVAPK | 1xGG [K8]
MGG_12999T0 | QTGELAAALAGR | 1xGG [T2]
MGG_05257T0 | SATPSHATSMPMPIPR | 1xPhospho [S/T]
MGG_01080T0 | ENADPASPTTPARKVTLAEYQAAK | 1xGG [K/T]
MGG_07393T0 | AEVSETSTITVVNSVEGDSTSVIAGEVAR | 1xGG [T/S]; 1xLeuArgGlyGly [S/T]
MGG_07953T0 | LVTSGVSTLR | 2xGG [S4; S7]
MGG_07960T0 | IQLIDPGSDTLDEAPTVIR | 1xGG [T/S]
MGG_08912T0 | IELVPTSGAR | 1xGG [S/T]
MGG_09384T0 | SLTAATAQTAAAVR | 2xGG [S/T]; 1xLeuArgGlyGly [T/S]
MGG_09710T0 | AAQQCSASQTSPSASMPTSTPTMVGMPK | 2xGG [T20; T]; 1xNethylmaleimide [C5]; 1xOxidation [M]
MGG_10322T0 | LGAAMAGLVAGTALFFNTLAVCLMTATFVK | 1xGG [K/T]; 2xOxidation [M5; M24]; 1xNethylmaleimide [C22]
MGG_12009T0 | GMSIASLEAPTTPTSIGESALPAKHQSVMSK | 2xGG [S15; S19]; 1xLeuArgGlyGly [K/S]
MGG_12447T0 | APSAAAVFYTSGSTGVPK | 1xGG [T/S]
MGG_13065T0 | AITDTAVYAISK | 1xGG [T5]
MGG_16007T0 | WLLNPATAAQTGTGMATATAETAFLATGY | 2xGG [T]
MGG_17263T0 | TCSTTVFLPPIQPDALMMPTPSLGNWPVHIGQSSPRAR | 1xGG [T/S]; 1xNethylmaleimide [C2]; 2xOxidation [M17; M18]
MGG_01996T1 | YRLTAGYYGMGQQLR | 1xPhospho [Y/T]
MGG_06699T0 | SSPTADYPGSMR | 1xPhospho [S]
MGG_06778T0 | SGTGGPLPVDSGSPNLAGR | 1xPhospho [S/T]
MGG_07599T0 | QATAQHESLPNALTPGK | 1xPhospho [T14]
MGG_14758T0 | DSWTSLASEAPAAVR | 1xPhospho [S/T]
MGG_00314T0 | ASILSAFSAVFLTVAGSQVIR | 2xGG [S/T]; 1xPhospho [S/T]
MGG_00944T0 | TTTNNNGGQMVPPPAPTSGQTDSHSMVYQHIQETASKR | 1xGG [S/K]; 1xOxidation [M10]; 1xPhospho [T]
MGG_01638T0 | LGYPVMALAACFGGPMLNILLGIGIGGAWMTTK | 1xGG [K/T]; 2xPhospho [Y3; T32]; 1xNethylmaleimide [C11]
MGG_02309T0 | SHGTLPSVWTFPGPSSAPR | 1xPhospho [S]; 1xLeuArgGlyGly [S1]
MGG_02309T0 | TLAPGSPIPPTNTTFR | 1xGG [T]; 1xPhospho [T]; 1xLeuArgGlyGly [T]
MGG_03624T0 | GVSAPSRPRCGSSPSRPGPFATFAICMMTTGTITLLSACK | 1xGG [S/T]; 1xPhospho [S/T]; 3xNethylmaleimide [C10; C26; C39]
MGG_04329T0 | GSMLGPALGTCNGISALGPVIGGAMALGTSGSK | 1xGG [S2]; 1xPhospho [T10]; 1xNethylmaleimide [C11]; 1xOxidation [M3]
MGG_07586T0 | IAAPSATTIVGHYIR | 1xPhospho [T/S]; 1xGG [S/T]
MGG_07817T0 | HPRSTLPTRPTR | 1xGG [T/S]; 1xPhospho [T]
MGG_07972T0 | IETSTAPPSAGSVK | 2xGG [S9; S]; 1xPhospho [T/S]

Table A.5. Continued
MGG_08348T0 | ISVAAISMALLSLASAAPATQAAPIEAR | 1xGG [S]; 1xPhospho [T20]
MGG_09236T0 | TAASRPESSLGVSELPR | 1xGG [S/T]; 1xPhospho [T/S]
MGG_12530T0 | EGSVSALGPADAPTK | 1xGG [S3]
MGG_12530T0 | GSADPALTTFAELAVLR | 1xPhospho [T/S]
MGG_13542T0 | MEYIDSSTVAGLLDKQNGTTAGDVSARESTPSATVTKPPSTITR | 2xGG [S7; S25]; 1xPhospho [S/T]
MGG_02114T0 | TYIGIAIR | 1xGG [T1]
MGG_02377T0 | VEHAALPPRAGLSPASAAAGPAGEGSVTGSPSVGSAK | 1xGG [S/T/K]
MGG_06430T0 | MSETSALLPSTR | 1xGG [S/T]; 1xLeuArgGlyGly [S/T]
MGG_07598T0 | STGSITTSAVPIATAGAGR | 1xGG [S/T]
MGG_07905T0 | STPEIIGVFIR | 1xGG [T/S]
MGG_08467T0 | GGKAPGAPAQSTTNTNKPAVGSTGTTSPAPNLACAPGANK | 2xGG [T/K/S]; 1xNethylmaleimide [C34]
MGG_08697T0 | SASKSVTVHLGEGYSRTK | 1xGG [S/T/K]
MGG_13692T0 | SPVSPDYDLLSMPPPPSTGITSPLMADAGEAEFVDGR | 1xGG [T/S]; 1xOxidation [M]
MGG_17625T0 | SLKISADSPPR | 1xGG [K3]; 1xLeuArgGlyGly [S1]
MGG_05047T0 | MTAMHLGPPQPSHLLFSPGTSPNAPGGR | 1xPhospho [S/T]; 1xOxidation [M]
MGG_08007T0 | DFLVDAPALDTSGPADVVR | 1xPhospho [T/S]
MGG_08811T0 | LSSDVSVWASTSWSSVK | 1xPhospho [T/S]
MGG_08946T0 | QTSPPPGSYPGAGAGSSRPSR | 1xPhospho [S/T/Y]
MGG_08985T0 | STTSSYAALAFVR | 1xPhospho [S/T/Y]
MGG_11606T0 | EVVTMTNTITVTVQSTVTETDANVPATTALSFTQR | 1xPhospho [T/S]; 1xOxidation [M5]
MGG_14575T0 | MSSPVNMSDPTAVAAMMAEQAAQMAAVK | 1xOxidation [M1]; 1xPhospho [S/T]
MGG_00072T0 | TFPDDKPVSVAQVTLGSNGDNKTAK | 1xGG [T/S/K]; 1xPhospho [S/T]
MGG_02342T0 | TAAAGASSSAPAPKPSGPK | 1xPhospho [S16]; 1xLeuArgGlyGly [S]
MGG_02408T0 | GSQMNSIDTSNSLIPTLPTTIR | 1xGG [T/S]
MGG_02408T0 | SSMVLLTYLDELDAR | 1xPhospho [S]; 1xLeuArgGlyGly [S]
MGG_02508T0 | TITHTSLGGTSLPAR | 1xPhospho [S/T]; 1xLeuArgGlyGly [T1]
MGG_02988T0 | HSHVLATFGGLMVVVEANLNALGVALAANSSSGPDQQALGK | 1xGG [S/K]; 1xPhospho [S]
MGG_04685T0 | VGAGSGSGSGSGFDAAQFGTADTGADFIPGTHETAIR | 1xGG [S5]; 1xPhospho [S/T]
MGG_04929T0 | SSVSSGQAVLDSYPR | 1xGG [S]; 2xPhospho [S]
MGG_05798T0 | MKVSVTLATLAAVVSTVVAAPTAILEAR | 1xGG [T22]; 2xPhospho [T/S]
MGG_05938T0 | MMELGVSANLATLASFGGAFR | 1xGG [S7]; 1xOxidation [M]; 1xPhospho [T/S]
MGG_07476T0 | DHFSPTPTVGATGSSPALLSPPPK | 1xPhospho [S/T]; 1xGG [K24]
MGG_07975T0 | TASTMTSAATAAQTAAASSATGAASSAGAASSAVPAAGAGAK | 1xGG [K/S]; 1xPhospho [S]; 1xOxidation [M5]
MGG_08174T0 | TVFLWDVATATTLR | 1xGG [T9]; 2xPhospho [T11; T12]
MGG_08211T0 | ARTSTAAGDLDAATPSAVSAAPATK | 1xPhospho [T/S]; 1xLeuArgGlyGly [T/S]
MGG_08919T0 | LSISVNDTSSTGYLAK | 1xGG [S/T]; 2xPhospho [T/S/Y]
MGG_09185T0 | NVEPPQGTSAADVAGAFGFGLTESLTR | 1xPhospho [T/S]; 1xGG [S/T]
MGG_09262T0 | IYIPSSGGGGAGGATPAGSFR | 1xPhospho [S/T]; 1xLeuArgGlyGly [T/S]

Table A.5. Continued
MGG_09537T0 | ASYVLETSIVDIR | 1xPhospho [S/Y/T]; 1xGG [T/S]
MGG_10001T0 | FSSVFALSALIQAAVSAPIPSGTGTGNAVNAITDAATDSTPLK | 2xPhospho [S/T]; 1xGG [K/T/S]
MGG_10086T0 | LLGVLTLAVSVAAISLDSGVSK | 1xGG [T/K/S]; 2xPhospho [S/T]
MGG_11230T0 | YEAAGGFSLIKGRAVPNGPPSFFPPAGSAYASDPVAK | 1xGG [S8]; 1xPhospho [Y1]
MGG_12714T0 | GQDALVSTVGATGLAGQDNMVR | 1xGG [S7]; 1xPhospho [T/S]
MGG_12858T0 | EIFSSVDGTVPPVTTGATR | 1xLeuArgGlyGly [T18]
MGG_12858T0 | LSGSTQVSNYFTATGTQGPGPISGSATAATSRPAETETRGTTSSSR | 1xGG [S/T]; 1xPhospho [S/T]
MGG_14578T0 | HGEEASVTVLTFSIDTGTPR | 1xGG [T/S]; 2xPhospho [S/T]

Table A.6. List of hypothetical proteins in Tables A.1-A.4 with Pfam1 annotation compared to Pyricularia oryzae.

Accession; Description; UniProt Entry; Pfam Domains
MGG_01364T0; MGG_01364 | Magnaporthe oryzae 70-15 hypothetical protein (1012 aa); G4MZ22; Uncharacterized protein
MGG_03223T0; MGG_03223 | Magnaporthe oryzae 70-15 hypothetical protein (2138 aa); G4N9Y5; Uncharacterized protein
MGG_06026T0; MGG_06026 | Magnaporthe oryzae 70-15 hypothetical protein (635 aa); G4N4V6; PAS domain-containing protein
MGG_08290T0; MGG_08290 | Magnaporthe oryzae 70-15 hypothetical protein (428 aa); G4MX87; Pyr_redox_2 domain-containing protein
MGG_09215T0; MGG_09215 | Magnaporthe oryzae 70-15 hypothetical protein (1677 aa); G4MPS6; Uncharacterized protein
MGG_01364T0; MGG_01364 | Magnaporthe oryzae 70-15 hypothetical protein (1012 aa); G4MZ22; Uncharacterized protein
MGG_02701T0; MGG_02701 | Magnaporthe oryzae 70-15 hypothetical protein (1113 aa); G4NJA2; Uncharacterized protein
MGG_06404T0; MGG_06404 | Magnaporthe oryzae 70-15 hypothetical protein (399 aa); G4N7I9; Cytoplasmic tRNA 2-thiolation protein 2
MGG_08267T0; MGG_08267 | Magnaporthe oryzae 70-15 hypothetical protein (541 aa); G4MXB3; FAD-binding PCMH-type domain-containing protein
MGG_09394T0; MGG_09394 | Magnaporthe oryzae 70-15 hypothetical protein (350 aa); G4NHZ0; Macro domain-containing protein
MGG_11100T0; MGG_11100 | Magnaporthe oryzae 70-15 hypothetical protein (656 aa); G4N056; DUF4484 domain-containing protein

Table A.6. Continued
MGG_03286T0; MGG_03286 | Magnaporthe oryzae 70-15 hypothetical protein (1175 aa); G4N9G4; Uncharacterized protein
MGG_00669T0; MGG_00669 | Magnaporthe oryzae 70-15 hypothetical protein (555 aa); G4NB01; Uncharacterized protein
MGG_01174T0; MGG_01174 | Magnaporthe oryzae 70-15 hypothetical protein (527 aa); G4MWK3; Uncharacterized protein
MGG_01665T0; MGG_01665 | Magnaporthe oryzae 70-15 hypothetical protein (411 aa); A4RK04; Protein FYV10
MGG_02123T0; MGG_02123 | Magnaporthe oryzae 70-15 hypothetical protein (747 aa); G4MNR7; AAA domain-containing protein
MGG_02483T0; MGG_02483 | Magnaporthe oryzae 70-15 hypothetical protein (447 aa); G4MRZ8; RNase H2 complex component
MGG_02992T0; MGG_02992 | Magnaporthe oryzae 70-15 hypothetical protein (605 aa); G4NL24; G domain-containing protein
MGG_03292T0; MGG_03292 | Magnaporthe oryzae 70-15 hypothetical protein (795 aa); G4N9F8; ANK_REP_REGION domain-containing protein
MGG_03321T0; MGG_03321 | Magnaporthe oryzae 70-15 hypothetical protein (432 aa); G4N973; Uncharacterized protein
MGG_03914T0; MGG_03914 | Magnaporthe oryzae 70-15 hypothetical protein (295 aa); G4NH80; Uncharacterized protein
MGG_04146T0; MGG_04146 | Magnaporthe oryzae 70-15 hypothetical protein (675 aa); G4NIR2; ATP-binding domain
MGG_04327T0; MGG_04327 | Magnaporthe oryzae 70-15 hypothetical protein (1241 aa); G4NGF6; MYND-type domain-containing protein
MGG_04843T0; MGG_04843 | Magnaporthe oryzae 70-15 hypothetical protein (611 aa); G4N2A1; Pyriculol/pyriculariol biosynthesis cluster transcription factor 1
MGG_06799T0; MGG_06799 | Magnaporthe oryzae 70-15 hypothetical protein (796 aa); G4MM18; DUF2828 domain-containing protein
MGG_06925T0; MGG_06925 | Magnaporthe oryzae 70-15 hypothetical protein (1337 aa); G4MND8; Cation-transporting ATPase
MGG_07573T0; MGG_07573 | Magnaporthe oryzae 70-15 hypothetical protein (1060 aa); G4N265; Calpain catalytic domain-containing protein
MGG_08114T0; MGG_08114 | Magnaporthe oryzae 70-15 hypothetical protein (856 aa); G4MYC8; Uncharacterized protein

Table A.6. Continued
MGG_08841T0; MGG_08841 | Magnaporthe oryzae 70-15 hypothetical protein (1037 aa); G4MV38; Cnd3 domain-containing protein
MGG_08914T0; MGG_08914 | Magnaporthe oryzae 70-15 hypothetical protein (1103 aa); G4MVM7; RelA_SpoT domain-containing protein
MGG_10157T0; MGG_10157 | Magnaporthe oryzae 70-15 hypothetical protein (727 aa); G4MUL6; ATP-dependent DNA helicase II subunit 2
MGG_10327T0; MGG_10327 | Magnaporthe oryzae 70-15 hypothetical protein (730 aa); G4NJT2; AAA domain-containing protein
MGG_11575T0; MGG_11575 | Magnaporthe oryzae 70-15 hypothetical protein (248 aa); G4NC13; Uncharacterized protein
MGG_13058T0; MGG_13058 | Magnaporthe oryzae 70-15 hypothetical protein (828 aa); G4MLQ1; Uncharacterized protein
MGG_13191T0; MGG_13191 | Magnaporthe oryzae 70-15 hypothetical protein (184 aa); G4N009; N-acetyltransferase domain-containing protein
MGG_14270T0; MGG_14270 | Magnaporthe oryzae 70-15 hypothetical protein (366 aa); G4MLG2; Uncharacterized protein
MGG_16442T0; MGG_16442 | Magnaporthe oryzae 70-15 hypothetical protein (277 aa); G4MP08; Protein-lysine N-methyltransferase EFM4
MGG_17680T0; MGG_17680 | Magnaporthe oryzae 70-15 hypothetical protein (117 aa); G4NIQ4; Uncharacterized protein
MGG_17701T0; MGG_17701 | Magnaporthe oryzae 70-15 hypothetical protein (357 aa); G4NGR7; Elongator subunit Iki1
MGG_02055T0; MGG_02055 | Magnaporthe oryzae 70-15 hypothetical protein (616 aa); G4MN79; Uncharacterized protein
MGG_00070T0; MGG_00070 | Magnaporthe oryzae 70-15 hypothetical protein (235 aa); G4NER5; Zn(2)-C6 fungal-type domain-containing protein
MGG_00143T0; MGG_00143 | Magnaporthe oryzae 70-15 hypothetical protein (396 aa); G4NE73; zf-LYAR domain-containing protein
MGG_00851T0; MGG_00851 | Magnaporthe oryzae 70-15 hypothetical protein (426 aa); G4NE08; Ribosome biogenesis protein TSR3
MGG_01069T0; MGG_01069 | Magnaporthe oryzae 70-15 hypothetical protein (104 aa); G4NCI9; Uncharacterized protein
MGG_01305T0; MGG_01305 | Magnaporthe oryzae 70-15 hypothetical protein (144 aa); G4MYJ6; UCR_hinge domain-containing protein

188

Table A.6. Continued MGG_02109 | Magnaporthe MGG_02109T0 oryzae 70-15 hypothetical G4MNQ1 Uncharacterized protein protein (92 aa) MGG_02537 | Magnaporthe MGG_02537T0 oryzae 70-15 hypothetical G4NK40 Uncharacterized protein protein (869 aa) MGG_03000 | Magnaporthe MGG_03000T0 oryzae 70-15 hypothetical G5EI50 Uncharacterized protein protein (592 aa) MGG_03462 | Magnaporthe Amidase domain-containing MGG_03462T0 oryzae 70-15 hypothetical G4N898 protein protein (585 aa) MGG_03863 | Magnaporthe MGG_03863T0 oryzae 70-15 hypothetical G4NHF2 Uncharacterized protein protein (187 aa) MGG_04344 | Magnaporthe MGG_04344T0 oryzae 70-15 hypothetical G4NGD7 Uncharacterized protein protein (563 aa) MGG_04677 | Magnaporthe Glyco_trans_2-like domain- MGG_04677T0 oryzae 70-15 hypothetical G4MRC2 containing protein protein (646 aa) MGG_04736 | Magnaporthe F420_oxidored domain- MGG_04736T0 oryzae 70-15 hypothetical G4MQW1 containing protein protein (304 aa) MGG_05674 | Magnaporthe MGG_05674T0 oryzae 70-15 hypothetical G4MP12 Magnesium transporter protein (411 aa) MGG_05766 | Magnaporthe MGG_05766T0 oryzae 70-15 hypothetical G4MPZ2 Uncharacterized protein protein (288 aa) MGG_06285 | Magnaporthe Homeobox domain- MGG_06285T0 oryzae 70-15 hypothetical G4N881 containing protein protein (978 aa) MGG_06545 | Magnaporthe MGG_06545T0 oryzae 70-15 hypothetical G4N6K9 Uncharacterized protein protein (815 aa) MGG_06906 | Magnaporthe MGG_06906T0 oryzae 70-15 hypothetical G4MN06 Zinc carboxypeptidase protein (568 aa) MGG_07369 | Magnaporthe MGG_07369T0 oryzae 70-15 hypothetical G4MVH7 Uncharacterized protein protein (412 aa) MGG_07394 | Magnaporthe DNA-directed RNA MGG_07394T0 oryzae 70-15 hypothetical G4N0C7 polymerase subunit protein (116 aa) MGG_08083 | Magnaporthe MGG_08083T0 oryzae 70-15 hypothetical G4MY96 GLE1-like protein protein (546 aa) MGG_08201 | Magnaporthe GFA domain-containing MGG_08201T0 oryzae 70-15 hypothetical G4MZA0 protein protein (137 aa)

189

Table A.6. Continued MGG_08308 | Magnaporthe Nuclear pore complex MGG_08308T0 oryzae 70-15 hypothetical G4MWV7 component protein (282 aa) MGG_08577 | Magnaporthe Glyco_hydro_cc domain- MGG_08577T0 oryzae 70-15 hypothetical G4N658 containing protein protein (432 aa) MGG_08795 | Magnaporthe MGG_08795T0 oryzae 70-15 hypothetical G4NFK3 Cytochrome P450 protein (595 aa) MGG_10274 | Magnaporthe MGG_10274T0 oryzae 70-15 hypothetical Q2KEJ5 short-chain dehydrogenase protein (255 aa) MGG_11130 | Magnaporthe MGG_11130T0 oryzae 70-15 hypothetical G4MW77 Uncharacterized protein protein (713 aa) MGG_12745 | Magnaporthe tRNA_int_end_N2 domain- MGG_12745T0 oryzae 70-15 hypothetical G4N9W5 containing protein protein (507 aa) MGG_13774 | Magnaporthe MGG_13774T0 oryzae 70-15 hypothetical G4MR90 general transcription factor protein (340 aa) MGG_14678 | Magnaporthe DUF1640 domain- MGG_14678T0 oryzae 70-15 hypothetical G4NBT7 containing protein protein (409 aa) MGG_15688 | Magnaporthe Amino_oxidase domain- MGG_15688T0 oryzae 70-15 hypothetical G4MZ51 containing protein protein (692 aa) MGG_16374 | Magnaporthe NAD(P)-bd_dom domain- MGG_16374T0 oryzae 70-15 hypothetical G4MM59 containing protein protein (351 aa) MGG_17476 | Magnaporthe DLH domain-containing MGG_17476T0 oryzae 70-15 hypothetical G4NCY8 protein protein (253 aa) MGG_17595 | Magnaporthe DAO domain-containing MGG_17595T0 oryzae 70-15 hypothetical G4NG27 protein protein (510 aa) MGG_17713 | Magnaporthe MGG_17713T0 oryzae 70-15 hypothetical G4NGY9 Uncharacterized protein protein (637 aa) MGG_17829 | Magnaporthe Malate/L-lactate MGG_17829T0 oryzae 70-15 hypothetical G4NID5 dehydrogenase protein (363 aa) MGG_01080 | Magnaporthe Chromo domain-containing MGG_01080T0 oryzae 70-15 hypothetical G4NCH7 protein protein (1465 aa) MGG_07960 | Magnaporthe MGG_07960T0 oryzae 70-15 hypothetical G4N2U9 Uncharacterized protein protein (401 aa) MGG_08912 | Magnaporthe RING-type domain- MGG_08912T0 oryzae 70-15 hypothetical G4MVM5 containing 
protein protein (446 aa)

190

Table A.6. Continued MGG_09384 | Magnaporthe MGG_09384T0 oryzae 70-15 hypothetical G4NI00 Uncharacterized protein protein (514 aa) MGG_09710 | Magnaporthe MGG_09710T0 oryzae 70-15 hypothetical G4NAI4 Uncharacterized protein protein (294 aa) MGG_12009 | Magnaporthe MGG_12009T0 oryzae 70-15 hypothetical G4NI20 Uncharacterized protein protein (858 aa) MGG_16007 | Magnaporthe MGG_16007T0 oryzae 70-15 hypothetical G4MMU3 Uncharacterized protein protein (200 aa) MGG_17263 | Magnaporthe MGG_17263T0 oryzae 70-15 hypothetical G4N9P9 Uncharacterized protein protein (308 aa) MGG_01996 | Magnaporthe MGG_01996T1 oryzae 70-15 hypothetical G4MME0 Uncharacterized protein protein (1015 aa) MGG_06699 | Magnaporthe MGG_06699T0 oryzae 70-15 hypothetical G4ML61 Uncharacterized protein protein (480 aa) MGG_07599 | Magnaporthe MGG_07599T0 oryzae 70-15 hypothetical G4N2K0 Uncharacterized protein protein (141 aa) MGG_14758 | Magnaporthe MGG_14758T0 oryzae 70-15 hypothetical G4MV81 Uncharacterized protein protein (190 aa) MGG_00944 | Magnaporthe MGG_00944T0 oryzae 70-15 hypothetical G4NDD3 Uncharacterized protein protein (480 aa) MGG_03624 | Magnaporthe MGG_03624T0 oryzae 70-15 hypothetical G4N735 Uncharacterized protein protein (230 aa) MGG_04329 | Magnaporthe MFS domain-containing MGG_04329T0 oryzae 70-15 hypothetical G4NGF4 protein protein (516 aa) MGG_07817 | Magnaporthe MGG_07817T0 oryzae 70-15 hypothetical G4N1F9 Uncharacterized protein protein (602 aa) MGG_07972 | Magnaporthe MGG_07972T0 oryzae 70-15 hypothetical G4N2W3 Uncharacterized protein protein (292 aa) MGG_08348 | Magnaporthe MGG_08348T0 oryzae 70-15 hypothetical G4MWE7 Uncharacterized protein protein (82 aa) MGG_09236 | Magnaporthe MGG_09236T0 oryzae 70-15 hypothetical G4MPU9 Uncharacterized protein protein (857 aa) MGG_00052 | Magnaporthe MGG_00052T0 oryzae 70-15 hypothetical G4NET4 Uncharacterized protein protein (225 aa)

191

Table A.6. Continued MGG_00521 | Magnaporthe Tr-type G domain- MGG_00521T0 oryzae 70-15 hypothetical G4NBP4 containing protein protein (874 aa) MGG_00532 | Magnaporthe G_PROTEIN_RECEP_F2_4 MGG_00532T0 oryzae 70-15 hypothetical G4NBI3 domain-containing protein protein (530 aa) MGG_01814 | Magnaporthe MGG_01814T0 oryzae 70-15 hypothetical G4MW59 Uncharacterized protein protein (460 aa) MGG_03347 | Magnaporthe MGG_03347T0 oryzae 70-15 hypothetical G4N8Z2 Uncharacterized protein protein (275 aa) MGG_05022 | Magnaporthe NmrA domain-containing MGG_05022T0 oryzae 70-15 hypothetical G4N3U8 protein protein (322 aa) MGG_05033 | Magnaporthe Zn(2)-C6 fungal-type MGG_05033T0 oryzae 70-15 hypothetical G4N3W1 domain-containing protein protein (743 aa) MGG_05139 | Magnaporthe MGG_05139T0 oryzae 70-15 hypothetical G4N4P7 Uncharacterized protein protein (477 aa) MGG_05272 | Magnaporthe Microtubule associated MGG_05272T0 oryzae 70-15 hypothetical G4N5N4 protein protein (785 aa) MGG_05756 | Magnaporthe RNA cap guanine-N2 MGG_05756T0 oryzae 70-15 hypothetical G4MPX8 methyltransferase protein (250 aa) MGG_06893 | Magnaporthe DUF2838 domain- MGG_06893T0 oryzae 70-15 hypothetical G4MMY7 containing protein protein (548 aa) MGG_10156 | Magnaporthe DUF4149 domain- MGG_10156T0 oryzae 70-15 hypothetical G4MUA7 containing protein protein (185 aa) MGG_10252 | Magnaporthe upin Super family domain MGG_10252T0 oryzae 70-15 hypothetical Q2KEM0 containing protein protein (487 aa) MGG_12124 | Magnaporthe TOM domain containing MGG_12124T0 oryzae 70-15 hypothetical G4NH05 protein protein (54 aa) MGG_13455 | Magnaporthe Mannan endo-1,6-alpha- MGG_13455T0 oryzae 70-15 hypothetical G4N3E1 mannosidase protein (469 aa) MGG_16418 | Magnaporthe MGG_16418T0 oryzae 70-15 hypothetical G4MN62 Uncharacterized protein protein (220 aa) MGG_17677 | Magnaporthe MFS domain-containing MGG_17677T0 oryzae 70-15 hypothetical G4NIP9 protein protein (618 aa) MGG_17979 | Magnaporthe DUF2076 domain- MGG_17979T0 oryzae 70-15 
hypothetical G4NJ62 containing protein protein (145 aa)

192

Table A.6. Continued MGG_02114 | Magnaporthe Dynamin GTPase effector MGG_02114T0 oryzae 70-15 hypothetical G4MNQ6 domain protein (813 aa) MGG_02377 | Magnaporthe Zn(2)-C6 fungal-type MGG_02377T0 oryzae 70-15 hypothetical G4MR05 domain-containing protein protein (884 aa) MGG_06430 | Magnaporthe Lon protease domain MGG_06430T0 oryzae 70-15 hypothetical G4N7A8 containing protein protein (565 aa) MGG_07598 | Magnaporthe MGG_07598T0 oryzae 70-15 hypothetical G4N2J9 Uncharacterized protein protein (200 aa) MGG_07905 | Magnaporthe MGG_07905T0 oryzae 70-15 hypothetical G4N2D5 Uncharacterized protein protein (1974 aa) MGG_08467 | Magnaporthe MGG_08467T0 oryzae 70-15 hypothetical G4NAL0 Uncharacterized protein protein (412 aa) MGG_08697 | Magnaporthe MGG_08697T0 oryzae 70-15 hypothetical G4NFV9 Uncharacterized protein protein (1249 aa) MGG_13692 | Magnaporthe MGG_13692T0 oryzae 70-15 hypothetical G4MMV8 Uncharacterized protein protein (87 aa) MGG_17625 | Magnaporthe MGG_17625T0 oryzae 70-15 hypothetical G4NGC3 Uncharacterized protein protein (814 aa) MGG_05047 | Magnaporthe MGG_05047T0 oryzae 70-15 hypothetical G4N429 Uncharacterized protein protein (883 aa) MGG_08007 | Magnaporthe hydrolase-1 domain- MGG_08007T0 oryzae 70-15 hypothetical G4MX53AB containing protein protein (254 aa) MGG_08811 | Magnaporthe MGG_08811T0 oryzae 70-15 hypothetical G4NFI6 Uncharacterized protein protein (320 aa) MGG_08946 | Magnaporthe MGG_08946T0 oryzae 70-15 hypothetical G4MW23 Uncharacterized protein protein (237 aa) MGG_11606 | Magnaporthe MGG_11606T0 oryzae 70-15 hypothetical G4ND76 Uncharacterized protein protein (343 aa) MGG_14575 | Magnaporthe MGG_14575T0 oryzae 70-15 hypothetical G4MQY6 Uncharacterized protein protein (514 aa) MGG_02342 | Magnaporthe MGG_02342T0 oryzae 70-15 hypothetical G4MQK7 Uncharacterized protein protein (853 aa) MGG_02408 | Magnaporthe Zn(2)-C6 fungal-type MGG_02408T0 oryzae 70-15 hypothetical G4MRF4 domain-containing protein protein (929 aa)

193

Table A.6. Continued MGG_02508 | Magnaporthe Tyrosinase_Cu-bd domain- MGG_02508T0 oryzae 70-15 hypothetical G4MKB2 containing protein protein (269 aa) MGG_02988 | Magnaporthe MGG_02988T0 oryzae 70-15 hypothetical G4NL30 Uncharacterized protein protein (384 aa) MGG_04685 | Magnaporthe MIT domain-containing MGG_04685T0 oryzae 70-15 hypothetical G4MRB0 protein protein (1459 aa) MGG_04929 | Magnaporthe MGG_04929T0 oryzae 70-15 hypothetical G4N357 Uncharacterized protein protein (750 aa) MGG_05798 | Magnaporthe MGG_05798T0 oryzae 70-15 hypothetical G4N0H2 Cutinase protein (217 aa) MGG_07476 | Magnaporthe MGG_07476T0 oryzae 70-15 hypothetical G4N196 Eisosome protein protein (872 aa) MGG_07975 | Magnaporthe MGG_07975T0 oryzae 70-15 hypothetical G4N2W6 Uncharacterized protein protein (193 aa) MGG_08211 | Magnaporthe MGG_08211T0 oryzae 70-15 hypothetical G4MZM2 Mis12-Mtw1 protein protein (572 aa) MGG_09185 | Magnaporthe MGG_09185T0 oryzae 70-15 hypothetical G4MPD1 Uncharacterized protein protein (1228 aa) MGG_10001 | Magnaporthe MGG_10001T0 oryzae 70-15 hypothetical G4N9E5 Uncharacterized protein protein (265 aa) MGG_10086 | Magnaporthe MGG_10086T0 oryzae 70-15 hypothetical - protein (129 aa) MGG_11230 | Magnaporthe MGG_11230T0 oryzae 70-15 hypothetical G4MYM5 Uncharacterized protein protein (601 aa) MGG_12714 | Magnaporthe NmrA domain-containing MGG_12714T0 oryzae 70-15 hypothetical G4N8W7 protein protein (310 aa) MGG_12858 | Magnaporthe MGG_12858T0 oryzae 70-15 hypothetical G4N840 Uncharacterized protein protein (254 aa) MGG_14578 | Magnaporthe MGG_14578T0 oryzae 70-15 hypothetical G4MRH4 Uncharacterized protein protein (285 aa) MGG_00199 | Magnaporthe RasGEF domain RasGEF domain containing MGG_00199T0 oryzae 70-15 hypothetical containing protein protein protein (1296 aa) MGG_00569 | Magnaporthe HET domain- HET domain-containing MGG_00569T0 oryzae 70-15 hypothetical containing protein protein protein (772 aa)

194

Table A.6. Continued MGG_03162 | Magnaporthe Uncharacterized MGG_03162T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (1010 aa) MGG_03567 | Magnaporthe Uncharacterized MGG_03567T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (317 aa) MGG_04196 | Magnaporthe Rab-GAP TBC Rab-GAP TBC domain- MGG_04196T0 oryzae 70-15 hypothetical domain-containing containing protein protein (1132 aa) protein MGG_06010 | Magnaporthe DUF952 domain DUF952 domain containing MGG_06010T0 oryzae 70-15 hypothetical containing protein protein protein (125 aa) Mediator of RNA MGG_08341 | Magnaporthe Mediator of RNA polymerase II MGG_08341T0 oryzae 70-15 hypothetical polymerase II transcription transcription protein (250 aa) subunit 7 subunit 7 MGG_08699 | Magnaporthe Nudix hydrolase Nudix hydrolase domain- MGG_08699T0 oryzae 70-15 hypothetical domain-containing containing protein protein (219 aa) protein MGG_08949 | Magnaporthe Uncharacterized MGG_08949T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (411 aa) MGG_10359 | Magnaporthe Uncharacterized MGG_10359T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (416 aa) MGG_10716 | Magnaporthe Uncharacterized MGG_10716T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (180 aa) MGG_11475 | Magnaporthe alpha-1,2- MGG_11475T0 oryzae 70-15 hypothetical alpha-1,2-Mannosidase Mannosidase protein (1029 aa) MGG_11761 | Magnaporthe Uncharacterized MGG_11761T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (619 aa) MGG_11907 | Magnaporthe Myosin-like coiled- Myosin-like coiled-coil MGG_11907T0 oryzae 70-15 hypothetical coil protein protein protein (461 aa) MGG_14388 | Magnaporthe Uncharacterized MGG_14388T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (202 aa) MGG_15612 | Magnaporthe Methyltransf_11 Methyltransf_11 domain- MGG_15612T0 oryzae 70-15 hypothetical domain-containing containing protein protein (1340 aa) protein 
MGG_15825 | Magnaporthe Uncharacterized MGG_15825T0 oryzae 70-15 hypothetical Uncharacterized protein protein protein (593 aa) MGG_17682 | Magnaporthe Ribonuclease H2 Ribonuclease H2 non- MGG_17682T0 oryzae 70-15 hypothetical non-catalytic catalytic subunit protein (150 aa) subunit

DUF – Domain of Unknown Function


A.1 LITERATURE CITED

(1) El-Gebali, S.; Mistry, J.; Bateman, A.; Eddy, S. R.; Luciani, A.; Potter, S. C.; Qureshi, M.; Richardson, L. J.; Salazar, G. A.; Smart, A.; Sonnhammer, E. L. L.; Hirsh, L.; Paladin, L.; Piovesan, D.; Tosatto, S. C. E.; Finn, R. D. The Pfam Protein Families Database in 2019. Nucleic Acids Res. 2018, 47, 427–432. https://doi.org/10.1093/nar/gky995.


Appendix B: Supplemental Materials for Chapter 4

Figure B.1. Normalized standard curves for the remaining peptides in the 6 × 5 mix.

The 500 fmol isotopologue of LLSLGAGEFK was difficult to identify for unknown reasons and was therefore removed from all analyses.


Figure B.2. Normalized standard curves for the remaining peptides in the 7 × 5 mix.


Appendix C: Supplemental Materials for Chapter 5

Table C.1. Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™-derivatized N-linked glycans and for maintenance of the porous graphitic carbon (PGC) column.

Table C.2. Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™-derivatized N-linked glycans using a hydrophilic interaction chromatography (HILIC) column.


Table C.3. Optimized gradient and mobile phase conditions used for the analysis of INLIGHT™-derivatized N-linked glycans using a reversed-phase C18 column.


Table C.4. N-linked glycan identifications in the fetuin samples, based on previous literature and listed in order of neutral mass from top to bottom.

*Identified only in samples prepared with the new derivatization protocol.


Table C.5. N-linked glycan identifications in the horseradish peroxidase samples, based on previous literature and listed in order of neutral mass from top to bottom.

The protocol described in the manuscript was developed using maltoheptaose and N-linked glycans cleaved from well-characterized glycoprotein standards (bovine fetuin and horseradish peroxidase) and was then extended to N-linked glycans cleaved from proteins in a more complex system (human plasma). This approach of comparing a sample to itself worked well for this fundamental study and can also serve as a quality control (QC). However, the step-by-step protocol below is written for carrying out a relative quantification study of N-linked glycans using the INLIGHT™ strategy to address a biological hypothesis.


Protocol for Enzymatic Cleavage and Derivatization of Glycans with INLIGHT™ P2GPN

(Updated from Hecht et al., DOI: 10.3791/53735)

MATERIALS

10 kDa Molecular weight cut-off filters (Sigma Aldrich, UFC5003BK)

Acetic acid (Fisher Scientific, A11350)

100 mM Ammonium bicarbonate, pH 7 (Fisher Scientific, A643500)

1 M Dithiothreitol (Sigma Aldrich, 646563-10X)

LC-MS grade water (Fisher Scientific, W6-4)

NAT/SIL P2GPN INLIGHT™ reagents (Cambridge Isotope Laboratories, GTK-1000)

1 M Iodoacetamide (Sigma Aldrich, A3221-10VL)

LC-MS grade methanol (Fisher Scientific, A456-4)

Glycerol-free PNGase F (Bulldog Bio, NZPP010LY)

PROCEDURE

1. Experimental design: identify samples to pair; these will later be combined 1:1 (v/v) for relative quantification following derivatization with the NAT/SIL INLIGHT™ reagents.

2. Use a Bradford assay or a bicinchoninic acid (BCA) assay to measure the protein concentration of each sample.

3. Add 250 µg of protein from each sample to separate 10 kDa MWCO filters.

4. Denature the proteins by reducing their disulfide bonds: add 2 µL of 1 M dithiothreitol (DTT) to each filter.

5. Dilute with 200 µL 100 mM ammonium bicarbonate.

6. Incubate the samples in an oven for 30 minutes at 56 °C.


7. To keep the reduced proteins from re-forming disulfide bonds, alkylate the free thiol groups by adding 50 µL of 1 M iodoacetamide to the filters.

8. Incubate for 60 minutes at 37 °C.

9. Concentrate the proteins by centrifuging the samples at 14,000 × g for 40 minutes.

10. Wash the samples by adding 100 µL of 100 mM ammonium bicarbonate to the filters and then centrifuging for 20 minutes.

11. Repeat Step 10 twice more for a total of three washes.

12. Remove the filters and place them into new centrifuge tubes. Discard the old centrifuge tubes containing the flow-through.

13. *CRITICAL STEP* Enzymatically cleave the N-linked glycans from the proteins by adding 2 µL of PNGase F (1,000 units) to the filter.

14. Dilute each sample with 98 µL of 100 mM ammonium bicarbonate and mix by pipetting up and down.

15. Incubate the samples in an oven for 18 hours at 37 °C.

16. Elute the cleaved glycans by centrifuging at 14,000 × g for 20 minutes.

17. Continue elution by adding 100 µL of 100 mM ammonium bicarbonate and centrifuging at 14,000 × g for 20 minutes.

18. Repeat Step 17 twice more, so that Step 17 is performed three times in total.

19. Discard the filters.

20. Place the centrifuge tubes in a -80 °C freezer for 30 minutes or until frozen.


21. Once the samples are completely frozen, place the centrifuge tubes in a vacuum concentrator at room temperature for 4 hours or until completely dry. *PAUSE POINT* At this point, the samples can be stored at -20 °C for up to six months.

22. In separate tubes, reconstitute the natural (NAT) and stable-isotope-labeled (SIL) INLIGHT™ reagents, each in 1.00 mL of LC-MS grade methanol (final concentration of each reagent: 1.00 µg/µL), and vortex for 10 minutes to ensure complete solubilization of the reagents.

23. *CRITICAL STEP* Derivatize each sample pair by adding 10 µL of NAT reagent to one sample and 10 µL of SIL reagent to the other.

24. *CRITICAL STEP* Dilute each sample with 45 µL of LC-MS grade methanol and 45 µL of acetic acid, resulting in a tagging solution of 0.1 µg/µL in 55:45 (v/v) methanol to acetic acid.

25. Incubate samples at 37 °C for 1 hour and 45 minutes.

26. Immediately place samples in the vacuum concentrator set to a temperature of 55 °C.

27. *CRITICAL STEP* Dry for 1 hour or until completely dry. This quenches the derivatization reaction.

a. *PAUSE POINT* The completely dry samples can be stored at -20 °C for up to 6 months.

28. Prior to nano-LC-MS/MS analysis, reconstitute the NAT-derivatized sample and the SIL-derivatized sample, each in 50 µL of LC-MS grade water. Pipet up and down to fully suspend the labeled glycans.


a. If preparing for UHPLC-MS/MS, reconstitute the samples in 25 µL of LC-MS grade water instead, to increase the injected amount for optimal ion abundance during analysis.

29. *CRITICAL STEP* Centrifuge samples for 5 minutes at 14,000 × g to pellet the excess tag. The unreacted tag will be at the bottom of the tube, but it may not be visible to the eye.

30. Carefully remove the supernatant, avoiding the bottom of the centrifuge tube, and transfer it into a new sample tube.

31. Combine each pair of NAT- and SIL-derivatized samples 1:1 by volume for relative quantification. Vortex the combined sample for 3 minutes.

a. For nano-LC-MS/MS analysis, inject 5 µL on column.

b. For UHPLC-MS/MS analysis, inject 10 µL on column.
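The volumes in Steps 22-24 and 28-31 can be cross-checked with a few lines of arithmetic. The Python sketch below is not part of the published protocol: it assumes that the dried glycan sample adds no volume, that the reagent stock (made up in methanol in Step 22) counts toward the methanol fraction, and that the full reconstituted volume is carried through Steps 29-31. The function names are hypothetical.

```python
# Hypothetical helpers: arithmetic check of Steps 22-24 and 28-31.
# Assumes the dried sample adds no volume and all volumes transfer fully.

def tagging_solution(stock_ug_per_ul=1.00, reagent_ul=10.0,
                     methanol_ul=45.0, acetic_acid_ul=45.0):
    """Final tag concentration (ug/uL) and % methanol / % acetic acid (v/v)."""
    total_ul = reagent_ul + methanol_ul + acetic_acid_ul
    tag_ug = stock_ug_per_ul * reagent_ul
    # The reagent stock is dissolved in methanol (Step 22), so it adds to the methanol share.
    methanol_pct = 100 * (reagent_ul + methanol_ul) / total_ul
    acetic_pct = 100 * acetic_acid_ul / total_ul
    return tag_ug / total_ul, methanol_pct, acetic_pct

def injected_fraction(reconstitution_ul, injection_ul):
    """Fraction of each derivatized sample put on column after the NAT and SIL
    samples (each reconstituted in reconstitution_ul) are mixed 1:1 by volume."""
    combined_ul = 2 * reconstitution_ul
    return injection_ul / combined_ul

conc, meoh, acoh = tagging_solution()
print(f"tagging solution: {conc:.2f} ug/uL, {meoh:.0f}:{acoh:.0f} methanol:acetic acid")
print(f"nano-LC (50 uL, inject 5 uL): {injected_fraction(50, 5):.0%} of each sample")
print(f"UHPLC (25 uL, inject 10 uL): {injected_fraction(25, 10):.0%} of each sample")
```

Under these assumptions the tagging solution works out to 0.10 µg/µL in 55:45 methanol:acetic acid, matching Step 24, and the smaller reconstitution volume in Step 28a combined with the larger injection in Step 31b puts four times as large a fraction of each sample on column (20% versus 5%).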


Appendix D: Supplemental Materials for Chapter 6

Table D.1. Operation Timing for Loading Data into GlycoHunter.

File Type | File Size* (scans) | Load Number | Build ibh (min:sec) | Load Time (min:sec) | Average (min:sec)
imzML | 32942 | 1st load | 0:44.27 | 0:51.27, -, - | 0:51.27
imzML | 32942 | Nth load | - | 0:05.85, 0:05.81, 0:05.48 | 0:05.71
mzXML | 32812 | Nth load | N/A | 4:17.87, 4:15.23, 4:21.96 | 4:18.35

*Scan numbers differ because of formatting changes during file conversion.
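The Average columns in Tables D.1 and D.2 are consistent with the arithmetic mean of the timed runs; for example, 4:17.87, 4:15.23 and 4:21.96 average to 4:18.35. A small helper for reproducing these values is sketched below. The function names are hypothetical, and the result is rounded to hundredths of a second to match the tables.

```python
from typing import List

def to_seconds(t: str) -> float:
    """Convert a 'min:sec' string such as '4:17.87' to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)

def to_min_sec(total: float) -> str:
    """Format seconds as 'min:sec' with two decimals, e.g. '4:18.35'."""
    total = round(total, 2)  # round first so 59.996 s becomes 1:00.00, not 0:60.00
    minutes, seconds = divmod(total, 60)
    return f"{int(minutes)}:{seconds:05.2f}"

def average_time(times: List[str]) -> str:
    """Arithmetic mean of replicate timings, returned in 'min:sec' form."""
    values = [to_seconds(t) for t in times]
    return to_min_sec(sum(values) / len(values))

print(average_time(["4:17.87", "4:15.23", "4:21.96"]))  # 4:18.35
print(average_time(["0:22.92", "0:22.71", "0:22.74"]))  # 0:22.79
```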

Table D.2. Operation Timing for Finding Peak Pairs using GlycoHunter.

Peak Id Algorithm | m/z Tolerance | Peak Window | Abundance Threshold | Occurrence | Peak Pairs Found | Unique Pairs | Time (min:sec) | Average (min:sec)
Local Max | 0.1 | 0.1 | 1000 | 4 | 1112 | 118 | 0:22.92, 0:22.71, 0:22.74 | 0:22.79
Local Max | 0.1 | 0.1 | 1 | 1 | 5192 | 3992 | 0:22.69, 0:22.71, 0:22.66 | 0:22.69
Parabolic Centroid | 0.1 | 0.1 | 1000 | 4 | 2238 | 341 | 0:23.96, 0:23.68, 0:24.19 | 0:23.94
Parabolic Centroid | 0.1 | 0.1 | 1 | 1 | 11325 | 7185 | 0:24.19, 0:23.79, 0:23.72 | 0:23.90
MS Peaks | 0.1 | 0.1 | 1000 | 4 | 1103 | 187 | 3:34.94, 3:36.32, 3:36.22 | 3:35.83
MS Peaks | 0.1 | 0.1 | 1 | 1 | 5708 | 3868 | 3:39.94, 3:41.80, 3:39.92 | 3:40.55
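Table D.2 compares three peak identification options (Local Max, Parabolic Centroid and MS Peaks), but the tables do not describe how they work. As an illustration only, the sketch below shows generic three-point parabolic centroiding of a profile spectrum, the textbook technique the "Parabolic Centroid" label suggests. It is not GlycoHunter's implementation; `parabolic_centroids` and its `threshold` argument (loosely analogous to the Abundance Threshold column) are hypothetical.

```python
from typing import List, Tuple

def parabolic_centroids(mz: List[float], intensity: List[float],
                        threshold: float = 0.0) -> List[Tuple[float, float]]:
    """Locate local maxima in a profile spectrum and refine each m/z with the
    vertex of the parabola through the maximum and its two neighbours.

    Assumes `mz` is strictly increasing; spacing need not be uniform.
    Returns (centroid m/z, intensity of the highest profile point) pairs."""
    peaks = []
    for i in range(1, len(mz) - 1):
        y0, y1, y2 = intensity[i - 1], intensity[i], intensity[i + 1]
        # Keep only local maxima at or above the abundance threshold.
        if not (y1 > y0 and y1 >= y2 and y1 >= threshold):
            continue
        x0, x1, x2 = mz[i - 1], mz[i], mz[i + 1]
        a = (x1 - x0) * (y1 - y2)  # >= 0 for a local maximum
        b = (x1 - x2) * (y1 - y0)  # < 0 for a local maximum
        # Vertex of the parabola through the three points; a - b is always > 0 here.
        vertex = x1 - 0.5 * ((x1 - x0) * a - (x1 - x2) * b) / (a - b)
        peaks.append((vertex, y1))
    return peaks

profile_mz = [100.0, 100.1, 100.2, 100.3, 100.4]
profile_int = [471, 831, 991, 951, 711]
print(parabolic_centroids(profile_mz, profile_int))  # one peak near m/z 100.23
```

In this five-point example the only local maximum is the point at m/z 100.2; because its right-hand neighbour is more intense than its left-hand one, the fitted vertex shifts the centroid to about m/z 100.23.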


Table D.3. Operation Timing for Exporting Data into Excel files and Skyline Transition Lists from GlycoHunter.

Peak Id Algorithm | Abundance Matrix Size¹ | Build Abundance Matrix (min:sec) | Plot Heatmap (min:sec) | Export All² (min:sec) | Export All without Abundance Matrix (min:sec)
Local Max | 1114 × 378 | 0:00.29 | 0:02.00 | 0:06.36 | 0:04.12
Local Max | 5194 × 6786 | 0:00.41 | 0:07.45 | 2:19.29 | 0:07.40
Parabolic Centroid | 2240 × 684 | 0:00.27 | 0:02.80 | 0:08.95 | 0:04.87
Parabolic Centroid | 11327 × 14374 | 0:00.82 | 0:15.64 | Failed³ | 0:10.19
MS Peaks | 1105 × 376 | 0:00.26 | 0:01.68 | 0:06.38 | 0:04.05
MS Peaks | 5710 × 7738 | 0:00.42 | 0:08.03 | 2:30.53 | 0:07.45


Table D.4. Glycan features identified using GlycoHunter (GH) followed by Skyline, compared with previous results from the same files in which the m/z values were curated from the literature and found manually (L/M).

Carbohydrate features NAT m/z SIL m/z Charge Adduct ID Platform

N2 661.3082 667.3283 1 H+ GH

HNA2 1202.4743 1208.4928 1 H+ GH

HNA 911.3776 917.3979 1 H+ GH

HN2 805.3507 811.3710 1 +H-H2O GH

HN2 823.3608 829.3809 1 H+ GH, L/M

H5N2 1453.5632 1459.5840 1 +H-H2O GH

H5N2 1471.5744 1477.5949 1 H+ GH, L/M

H4 903.3608 909.3810 1 H+ GH

H3 741.3080 747.3280 1 H+ GH

H2N2 985.4143 991.4346 1 H+ GH

FHN 766.3379 772.3582 1 H+ GH

HN2 412.1840 415.1941 2 2H+ GH, L/M

H8N5 1283.9888 1286.9993 2 2H+ GH

H8N2 979.3702 982.3805 2 2H+ GH

H7N6 1304.5012 1307.5113 2 2H+ GH, L/M

H7N4A 1246.9695 1260.4355 2 2H+ GH

H7N4 1101.4225 1104.4334 2 2H+ GH

H7N3 999.8827 1002.8930 2 2H+ GH

H7N2 898.3433 901.3535 2 2H+ GH

H6N5A3 1558.5780 1561.5880 2 2H+ GH, L/M

H6N5A2 1413.0303 1416.0403 2 2H+ GH, L/M

H6N4A 1165.9450 1168.9548 2 2H+ GH


H6N5A 1267.4848 1286.9993 2 2H+ GH, L/M

H6N4 1020.3962 1023.4064 2 2H+ GH

H6N3 918.8562 921.8666 2 2H+ GH

H6N2 817.3160 820.3261 2 2H+ GH, L/M

H5N5A 1267.4826 1270.4926 2 2H+ GH, L/M

H5N4A2 1230.4642 1233.4743 2 2H+ GH, L/M

H5N4A 1084.9165 1087.9266 2 2H+ GH, L/M

H5N4 939.3690 942.3791 2 2H+ GH, L/M

H5N3A 983.3779 986.3882 2 2H+ GH

H5N3 837.8297 840.8398 2 2H+ GH

H5N2 736.2909 739.3006 2 2H+ GH, L/M

H4N3A 902.3512 905.3613 2 2H+ GH

H4N3 756.8029 759.8130 2 2H+ GH, L/M

H4N2 655.2635 658.2734 2 2H+ GH

H3N4 777.3162 780.3263 2 2H+ GH, L/M

H3N3 675.7769 678.7871 2 2H+ GH

H3N2 574.2368 577.2469 2 2H+ GH, L/M

H3N10 1368.5472 1371.5565 2 2[+H-H2O] GH

FH8N5 1357.0206 1360.0307 2 2H+ GH

FH6N5A2 1486.0620 1489.0714 2 2H+ GH

FH6N4 1093.4238 1096.4339 2 2H+ GH

FH5N5 1113.9377 1116.9477 2 2H+ GH, L/M

FH5N4A2 1303.4960 1354.5515 2 2H+ GH, L/M

FH5N4A 1157.9469 1160.9570 2 2H+ GH

FH5N4 1012.3980 1015.4080 2 2H+ GH, L/M

FH4N5 1032.9123 1035.9230 2 2H+ GH


FH4N4 931.3716 934.3817 2 2H+ GH

FH3N7 1173.9426 1176.9523 2 H+ K+ GH

FH3N5 951.8849 954.8949 2 2H+ GH, L/M

FH3N4 850.3452 853.3552 2 2H+ GH, L/M

H7N5A2 996.3762 998.3833 3 3H+ GH
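Each NAT/SIL pair in Table D.4 should be separated by a fixed mass offset divided by the charge. From the tabulated values this offset is about 6.020 Da (for example, 667.3283 - 661.3082 for N2 at z = 1), which is consistent with six 13C atoms in the SIL tag; that composition is an inference from the table, not a statement from the text. The hypothetical helper below uses this offset to flag rows whose SIL m/z does not fit, with an assumed tolerance of 0.01 m/z.

```python
from typing import Iterable, List, Tuple

# Assumed label: six 13C atoms; the 13C - 12C mass difference is 1.0033548 Da.
SIL_SHIFT_DA = 6 * 1.0033548

def flag_inconsistent_pairs(rows: Iterable[Tuple[str, float, float, int]],
                            tol: float = 0.01) -> List[str]:
    """Return the features whose SIL m/z is not within `tol` (m/z units)
    of NAT m/z + SIL_SHIFT_DA / charge."""
    flagged = []
    for feature, nat_mz, sil_mz, charge in rows:
        expected = nat_mz + SIL_SHIFT_DA / charge
        if abs(sil_mz - expected) > tol:
            flagged.append(feature)
    return flagged

# A few rows transcribed from Table D.4: (feature, NAT m/z, SIL m/z, charge).
rows = [
    ("N2", 661.3082, 667.3283, 1),
    ("HN2", 412.1840, 415.1941, 2),
    ("H7N4A", 1246.9695, 1260.4355, 2),
    ("H6N5A", 1267.4848, 1286.9993, 2),
    ("FH5N4A2", 1303.4960, 1354.5515, 2),
    ("H7N5A2", 996.3762, 998.3833, 3),
]
print(flag_inconsistent_pairs(rows))  # ['H7N4A', 'H6N5A', 'FH5N4A2']
```

Applied to every row of Table D.4, this check flags only H7N4A, H6N5A and FH5N4A2; for those entries the listed SIL m/z does not sit at the expected offset from the NAT m/z, so they may be worth rechecking against the raw data.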


Figure D.1. GlycoHunter Peak Identification search parameters in the Excel export file.


Figure D.2. GlycoHunter peak pair identification results in the Excel export file, including detailed information about where each peak pair was identified.


Figure D.3. GlycoHunter Peak Pair identifications grouped by charge state in the Excel export file.


Figure D.4. Example Skyline transition list exported from GlycoHunter.
