Direct Data-Independent Acquisition (Direct DIA) Enables Substantially Improved Label-Free Quantitative Proteomics in Arabidopsis

bioRxiv preprint doi: https://doi.org/10.1101/2020.11.07.372276; this version posted November 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Direct data-independent acquisition (direct DIA) enables substantially improved label-free quantitative proteomics in Arabidopsis Devang Mehta1, Sabine Scandola1 and R. Glen Uhrig1 # 1 Abstract 2 The last decade has seen significant advances in the application of 3 quantitative mass spectrometry-based proteomics technologies to tackle 4 important questions in plant biology. This has included the use of both 5 labelled and label-free quantitative liquid-chromatography mass 6 spectrometry (LC-MS) strategies in model1,2 and non-model plants3. While 7 chemical labelling-based workflows (e.g. iTRAQ and TMT) are generally 8 considered to possess high quantitative accuracy, they nonetheless suffer 1 Department of Biological 4,5 9 Sciences, University of from ratio distortion and sample interference issues , while being less 10 Alberta, Edmonton T6G 2E9, cost-effective and offering less throughput than label-free approaches. 11 Alberta, Canada Consequently, label free quantification (LFQ) has been widely used in 12 comparative quantitative experiments profiling the native6 and post- 13 translationally modified (PTM-ome)7,8 proteomes of plants. However, LFQ 14 shotgun proteomics studies in plants have so far, almost universally, used 15 data-dependent acquisition (DDA) for tandem MS (MS/MS) analysis. Here, #Correspondence 16 Dr. R. Glen Uhrig we systematically compare and benchmark a state-of-the-art DDA LFQ 17 [email protected] workflow for plants against a new direct data-independent acquisition 18 (direct DIA) method9. Our study demonstrates several advantages of direct 19 DIA and establishes it as the method of choice for quantitative proteomics 20 Keywords on plant tissue. We also applied direct DIA to perform a quantitative 21 Arabidopsis thaliana, cell proteomic comparison of dark and light grown Arabidopsis cell cultures, 22 culture, proteome, quantitative providing a critical resource for future plant interactome studies using this 23 proteomics, data dependent well-established biochemistry platform. acquisition, data independent 24 acquisition, mass spectrometry Introduction 25 In a typical DDA workflow, elution groups of digested peptide ions 1 26 (precursor ions) are first analysed at the MS level using a high-resolution 27 mass analyser (such as modern Orbitrap devices). Subsequently, selected Funding 28 precursor ions are isolated and fragmented, generating MS2 spectra that This work was funded by the 29 deduce the sequence of the precursor peptide. For each MS1 scan usually National Science and 2 30 Engineering Research Council around 10-12 MS scans are performed after which the instrument cycles to 1 31 of Canada (NSERC) and the the next MS scan and the cycle repeats. While this “TopN” approach enables 32 Canadian Foundation for identification of precursors spanning the entire mass range, the 33 Innovation (CFI). fragmentation of semi-stochastically selected precursor ions (generally, 1 bioRxiv preprint doi: https://doi.org/10.1101/2020.11.07.372276; this version posted November 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Data-dependent & independent acquisition analysis of Arabidopsis proteome Mehta et al., 2020 Figure 1: Experimental workflow and summary results. Total protein was isolated using denaturing extraction from light and dark grown Arabidopsis cells and digested using trypsin. Peptides were desalted and subject to LC-MS/MS using two different acquisition modes. Ion maps showing a single MS1 scan and subsequent MS2 scans are presented to illustrate differences in acquisition schemes. Raw data was analyzed by MaxQuant & Perseus for data-dependent acquisition (DDA) analysis and by Spectronaut for data- independent acquisition (DIA) analysis using spectral libraries created from both acquisitions, and for direct DIA analysis. Counts of FDR-filtered (0.01) peptide spectrum matches (PSMs)/precursors, peptides, and protein groups for each analysis type are shown. 34 more intense ions) limits the reproducibility of individual DDA runs, results 35 in missing values between replicate runs, and biases quantitation toward 36 more abundant peptides10. These limitations are also true for faster, hybrid 37 Orbitrap devices which are still only able to use 1% of the full ion current for 38 mass analysis, hence missing out on lower abundant species11. This is 39 particularly disadvantageous for label-free workflows and samples with a 40 high protein dynamic range, such as human plasma and photosynthetic 41 tissue. 42 In order to address these limitations, several data-independent acquisition 43 (DIA) workflows have been pioneered, famously exemplified by Sequential 44 Window Acquisition of All Theoretical Mass Spectra (SWATH-MS)12,13. In DIA 45 workflows, specific, often overlapping, m/z windows spanning a defined 46 mass range are used to sub-select groups of precursors for fragmentation 47 and MS2 analysis. As a result, complete fragmentation of all precursors in 48 that window follows MS1 scans resulting in a more reproducible and 49 complete analysis. A major disadvantage of DIA workflows however, is that 50 each MS2 scan contains multiplexed spectra from several precursor ions 51 making accurate identification of peptides difficult. Traditionally, this has 52 been addressed through the use of global or project-specific spectral- 53 libraries obtained from a fractionated, high-resolution DDA survey of all 54 samples—adding to experimental labour and instrumentation analysis 2 bioRxiv preprint doi: https://doi.org/10.1101/2020.11.07.372276; this version posted November 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission. Data-dependent & independent acquisition analysis of Arabidopsis proteome Mehta et al., 2020 55 time. More recently, alternative approaches have been developed that avoid 56 the use of spectral libraries and instead use “pseudo-spectra” derived from 57 DIA runs that are then searched in a spectrum-centric approach analogous 58 to conventional DDA searches14–16. Improvements in such library-free DIA 59 approaches have included the incorporation of high precision indexed 60 Retention Time (iRT) prediction17 and the use of deep-learning 61 approaches18–20. Direct DIA (an implementation of a library-free DIA 62 method; Biognosys AG) and a hybrid (direct DIA in combination with 63 library-based DIA) approach has been recently used to quantify more than 64 10,000 proteins in human tissue21 and reproducibly identify >10,000 65 phosphosites across hundreds of human retinal pigment epithelial-1 cell 66 line samples9. 67 Direct DIA combines the advantages of DIA for reproducible quantification 68 of proteins in complex mixtures with high dynamic range, with the ease of 69 use of earlier DDA methodologies. Hence, a systematic comparison of these 70 two technologies for LFQ proteomics is essential to define best practice in 71 plant proteomics. In order to execute this analysis, we compared the 72 proteomes of light- and dark-grown Arabidopsis suspension cells. 73 Arabidopsis suspension cells are a long-established platform for plant 74 biochemistry and have recently seen a resurgence in popularity due to their 75 utility in facilitating protein interactomic experimentation using 76 technologies such as tandem affinity purification-mass spectrometry 22–26, 77 nucleic acid crosslinking27, and proximity labelling (e.g. TurboID)28. Despite 78 this, no existing resource profiling the basal differences in proteomes of 79 Arabidopsis cells grown in light or dark exists—a fundamental requirement 80 to determine the choice of growth conditions for protein interactomics 81 experiments and targeted proteomic assays in this system. 82 Results & Discussion 83 We performed total protein extraction under denaturing conditions from 84 Arabidopsis (cv. Ler) suspension cells grown for five days in either constant 85 light or dark. Trypsin digestion of the extracted proteome was performed 86 using an automated sample preparation protocol and 1ug of digested peptide 87 was analysed using an Orbitrap Fusion Lumos mass spectrometer operated 88 in either DDA or DIA acquisition mode over 120-minute gradients. A total of 89 8 injections (4 light & 4 dark) per analysis type were carried out. DDA data 90 processing was performed using MaxQuant, while DIA data processing was 91 performed using Spectronaut v14 (Biognosys AG.). For DIA analysis, both 92 hybrid (library + direct DIA) and direct DIA analysis was performed. The 93 hybrid analysis was performed by first creating a spectral library from DDA 94 raw files using the Pulsar search engine implemented in Spectronaut, 95 followed by a peptide-centric DIA analysis with DIA raw output files. Direct 96 DIA was performed directly on raw DIA files as implemented in Spectronaut. 97 The entire workflow is depicted in Figure 1. Both hybrid DIA and direct DIA 3 bioRxiv preprint doi: https://doi.org/10.1101/2020.11.07.372276; this version posted November 8, 2020. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed

Direct Data-Independent Acquisition (Direct DIA) Enables Substantially Improved Label-Free Quantitative Proteomics in Arabidopsis

Standard Flow Multiplexed Proteomics (Sflompro) – an Accessible and Cost-Effective Alternative to Nanolc Workflows

W Yang Et Al. Proteomic Approaches to the Analysis of Multiprotein

Proteomics in Forensic Analysis: Applications for Human Samples

Challenges in Clinical Metaproteomics Highlighted by the Analysis of Acute Leukemia Patients with Gut Colonization by Multidrug-Resistant Enterobacteriaceae

Improving Data-Independent Acquisition Proteomics 12 July 2021

The Power of Three in Cannabis Shotgun Proteomics: Proteases, Databases and Search Engines

Phosphoproteomics for the Masses

Leveraging Shotgun Proteomics for Optimised Interpretation of Data- Independent Acquisition Data: Identification of Diagnostic Biomarkers for Paediatric Tuberculosis

2. Introduction to Shot Gun Proteomics

Proteomics in Non-Model Organisms: a New Analytical Frontier

Challenges and Perspectives of Metaproteomic Data Analysis

CAMPI): a Multi-Lab Comparison of Established Workflows