Open Source Proteomics Tools Along with Existing Commercial Software

Proteomics software available in the public domain.
Pratik Jagtap, Minnesota Supercomputing Institute

Two-Dimensional gel electrophoresis
[Figure: 2D gel; axes pI and Mw]
Proteins are resolved based on their isoelectric point (using isoelectric focusing) and then molecular weight (using SDS-PAGE). Gels are compared, and differentially expressed proteins are excised and identified.

Proteomics Fifteen Years Ago…
Workflow steps: Mass Spectrometry; Data Extraction; Search algorithm; Analysis software that correlates the protein ID to the excised gel spot.

Two-Dimensional gel electrophoresis
2DGE: High molecular weight proteins, low molecular weight proteins, proteins with extreme isoelectric points, and membrane proteins were underrepresented in the analysis.

Multi-Dimensional Protein Identification Technology

Proteomics workflow
Steps (mass spectrometry): Protein; Peptide; Fragmentation; Mass spectrum; Search against database.

Mass Spectrometers & data formats
Instrument vendors (acquisition software / raw format):
• Thermofinnigan: Xcalibur / .raw
• Life Technologies: Analyst / .wiff ; .t2d
• Waters: Masslynx / .raw
• Bruker: .baf
Search programs (file formats):
• Sequest: .dta, .out
• ProteinPilot: .t2d, .group
• X! Tandem: .xml
• OMSSA: .xml, .omx
• Mascot: .mgf, .dat
Open formats: mzXML, mzData, mzML, pepXML, protXML

Proteo-Informatics
Overview of topics: Mass Spectrometry; Data Extraction; Data Conversion; Search algorithm; De novo Tools; Spectral Matching; Statistical validation of peptide and protein identifications; Quantitative Tools; Targeted Proteomics; Data Dissemination.

Data extraction

ReAdW
http://www.ionsource.com/functional_reviews/readw/t2x_update_readw.htm
ReAdW converts Xcalibur .raw files to the universal mzXML format.

T2D Extractor
https://www.prime-sdms.org/PRIMEInstallationSite/MSViewer/T2DExtractor.zip
• A tool that can access the Applied Biosystems MALDI-TOF/TOF 4700 and 4800 database and can extract T2D files as well as peak lists.
• It can be used to extract individual spectra, runs, or entire spotsets. MS/MS peak lists are provided in .mgf format.
• Runs on the Java 1.5 platform.
• LCMS Peaklist Extractor: Batch mode tool for extracting concatenated .mgf peak list files.
• Quantitation Extractor: Batch mode tool for extracting areas for peaks in MS/MS spectra.

Mass Spectrometers & data formats
Instrument vendors (acquisition software / raw format):
• Thermofinnigan: Xcalibur / .raw
• Life Technologies: Analyst / .wiff ; .t2d
• Waters: Masslynx / .raw
• Bruker: .baf
Search programs (file formats):
• Mascot: .mgf, .dat
• Sequest: .dta, .out
• X! Tandem: .xml
• OMSSA: .xml, .omx
Open formats: mzXML, mzData, mzML, pepXML, protXML

Data conversion

mzXML2Other
http://www.proteomecommons.org/current/522/
Converter from mzXML to Sequest .dta, Mascot generic and Micromass .pkl formats.

Peak List Conversion Utility (Java Web Start)
https://proteomecommons.org/tool.jsp?i=1012
The ProteomeCommons.org IO Framework's tool for converting peak list and spectrum files between different formats. The tool can also merge multiple peak lists into a single concatenated peak list. The tool uses Java Web Start and runs locally on your computer.

MassMatrix File Conversion Tools
http://searcher.rrc.uic.edu/mm-docs/downloads/MM_File_Conversion_1p0.exe
These tools convert between common input formats: .RAW, .mzXML, .MGF.

Search algorithm

X! Tandem & the GPM
http://www.thegpm.org/TANDEM/index.html
• X! Tandem can be used as a web-based application or deployed locally using precompiled binaries and FASTA-formatted files.
• X! Tandem takes input in .xml format and writes output in .xml format.
• The data analysis components consist of the input file; FASTA and taxonomy files; parameters; and output.
• Central axiom: "For each identifiable protein, there is at least one detectable tryptic peptide."
• It searches extensively for modified and non-enzymatic peptides only on proteins that have already been identified.
• How far is the top-scoring match from the rest of the pack? Uses an E-value. Much faster than Sequest's Xcorr.
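The central axiom above depends on predicting tryptic peptides from protein sequences. As a rough illustration only, the following is a minimal Python sketch of in silico tryptic digestion, assuming the commonly used rule of cleaving C-terminal to K or R except when the next residue is P. The function name `tryptic_digest`, the `missed_cleavages` parameter and the example sequence are hypothetical and are not taken from X! Tandem or any other search engine.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return tryptic peptides of a protein sequence (hypothetical helper).

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    sequence = sequence.upper()
    if not sequence:
        return []

    # Collect cleavage positions, bracketed by the sequence start and end.
    sites = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    # Join neighbouring fragments to allow for missed cleavages.
    peptides = []
    for start in range(len(sites) - 1):
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


if __name__ == "__main__":
    # Made-up sequence, for illustration only.
    for peptide in tryptic_digest("MKWVTFISLLRPAGKDE", missed_cleavages=1):
        print(peptide)
```

Real search engines also filter peptides by length and mass and handle further enzyme rules; this sketch leaves those steps out.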
The Global Proteome Machine Organization
• X!Hunter
• X! P3
• Common

OMSSA
http://pubchem.ncbi.nlm.nih.gov/omssa/
• OMSSA takes experimental MS/MS spectra, filters noise peaks, extracts m/z values, and then compares these m/z values to calculated m/z values derived from peptides produced by an in silico digestion of a protein sequence library.
• Calculates an E-value as a discriminant score.
• The E-value for a hit is the expected number of random hits from a search library to a given spectrum, such that the random hits have an equal or better score than the hit.
• It uses classical hypothesis testing based on the type of statistical model that is used in BLAST.
• Faster; runs on all platforms.

MaxQuant
http://www.maxquant.org/
• MaxQuant is an integrated suite of algorithms specifically developed for high-resolution, quantitative MS data.
• MaxQuant detects peaks, isotope clusters and stable amino acid isotope-labeled (SILAC) peptide pairs as three-dimensional objects in m/z, elution time and signal intensity space.
• By integrating multiple mass measurements, mass accuracy in the p.p.b. range is achieved.
• MaxQuant quantifies several hundred thousand peptides per SILAC-proteome experiment.

De novo tools

De novo analysis
Steps: Protein; Peptide; Fragmentation; Mass spectrum; Search against database.
• De novo analysis: generate a sequence from the spectrum and match it against a database by using BLAST.

PepNovo
http://peptide.ucsd.edu/pepnovo.html
• PepNovo is software for de novo sequencing of peptides from mass spectra.
• PepNovo uses a probabilistic network to model the peptide fragmentation events in a mass spectrometer.
• In addition, it uses a likelihood ratio hypothesis test to determine whether the peaks observed in the mass spectrum are more likely to have been produced under the fragmentation model than under a probabilistic model that treats the appearance of peaks as random events.

Lutefisk
http://sourceforge.net/projects/lutefiskxp
• LUTEFISK uses a graph theory approach for de novo peptide sequence determinations from low-energy collision-induced dissociation (CID) data of tryptic peptides.
• Lutefisk converts all of the ions into their corresponding b-ion masses by making N- and C-terminal "evidence lists" that contain evidence for cleavage at every possible b-ion mass. Once the sequence spectrum has been established, the program proceeds by tracing sequences starting at the N-terminus.
• The highest ranked sequences are subjected to a cross-correlation analysis, and scores are combined and normalized to produce a final score and ranking.

Spectral matching

X! Hunter
http://www.thegpm.org
• X! Hunter is a search engine that compares experimentally observed spectra directly with a library of spectra that have been confidently assigned to a particular peptide sequence (an Annotated Spectrum Library, or ASL).
• It can identify proteins using information from the large number of spectra in the GPMDB database.
• Creation of ASLs: 1) Confident assignments for human and yeast peptides were extracted from GPMDB. 2) Replicate observations of the same peptide were averaged together and a final list of averaged peptide spectra was produced.
• Because the sequence modifications and cleavage sites for the peptides in the sequence library are already known, it is not necessary to specify as many parameters for this type of search as in more conventional search engines.
• This type of pattern matching tool is ideal for applications such as biomarker discovery.
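Spectral library searching of the kind described above compares an observed spectrum with annotated library spectra. The following is a minimal Python sketch of that idea, assuming a simple binned cosine (normalized dot product) score. This is a generic similarity measure chosen for illustration and is not X! Hunter's actual scoring. The function names `bin_spectrum`, `cosine_score` and `best_library_match`, the 1.0 m/z bin width, and all peak values are hypothetical.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum the intensities of (m/z, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz // bin_width)] += intensity
    return binned


def cosine_score(peaks_a, peaks_b, bin_width=1.0):
    """Cosine similarity of two binned spectra (0.0 = no shared bins)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b[key] for key, value in a.items() if key in b)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(observed, library, bin_width=1.0):
    """Return (peptide, score) for the best-scoring library entry, or None."""
    if not library:
        return None
    scored = ((peptide, cosine_score(observed, peaks, bin_width))
              for peptide, peaks in library.items())
    return max(scored, key=lambda item: item[1])


if __name__ == "__main__":
    # Made-up library and query spectra, for illustration only.
    library = {
        "PEPTIDEA": [(175.1, 50.0), (304.2, 100.0), (417.2, 30.0)],
        "PEPTIDEB": [(147.1, 80.0), (260.2, 40.0), (389.3, 90.0)],
    }
    observed = [(175.2, 45.0), (304.1, 95.0), (417.3, 20.0), (500.0, 5.0)]
    print(best_library_match(observed, library))
```

Because the library spectra already carry their peptide assignments, the search needs only the spectra and a similarity threshold, which matches the point above about needing fewer search parameters.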
MS-Clustering
http://proteomics.bioprojects.org/MassSpec
MS-Clustering of MS/MS spectra takes advantage of dataset redundancy by identifying multiple spectra of the same peptide and replacing them with a single representative spectrum. Analyzing only representative spectra results in a significant speed-up of MS/MS database searches. Large MS/MS data sets (over 10 million spectra) were reduced to smaller data sets, and searching them resulted in a higher number of peptide identifications than regular nonclustered searches.

Statistical validation of peptide and protein identifications

Trans-Proteomic Pipeline
Trans-Proteomic Pipeline (TPP) is a data analysis pipeline for the analysis of LC/MS/MS proteomics data. TPP includes modules for validation of database search results, quantitation of isotopically labeled samples, and validation of protein identifications, as well as tools for viewing raw LC/MS data, peptide identification results, and protein identification results. The XML backbone of this pipeline enables a uniform analysis of LC/MS/MS data generated by a wide variety of mass spectrometer types and assigned to peptides using a wide variety of database search engines.

PepArML
http://mac.softpedia.com/get/Math-Scientific/PepArML.shtml
[Diagram labels: X!Tandem; Mascot; OMSSA; Other; Feature extraction; PepArML]
A model-free, result-combining peptide identification arbiter via machine learning.

Quantitative tools

iTRAQ™: Isobaric Tags for Relative and Absolute Quantification. Isobaric Tag (Total
