
PROTEOGENOMICS & METAPROTEOMICS USING THE GALAXY PLATFORM
PRATIK JAGTAP
University of Minnesota, St. Paul / Minneapolis, Minnesota, USA

MINNEAPOLIS TO HYDERABAD

MINNEAPOLIS, MINNESOTA
Center for Mass Spectrometry and Proteomics
http://cbs.umn.edu/cmsp/home

• PROTEOMICS
• PROTEOGENOMICS
• CHALLENGES
• BIOLOGICAL INSIGHTS
• METAPROTEOMICS
• GALAXYP
• LINKS AND ACKNOWLEDGMENTS

GENOMIC AND PROTEOMIC DATABASES
Finished and Published Genomes
• 3551 Bacterial Genomes.
• 211 Archaeal Genomes.
• 58 Eukaryal Genomes.
• 3363 Viral Genomes.
http://www.genomesonline.org/index

DEFINING PROTEOMICS: LOOKING WITHIN
Figure labels: Mass spectrum; Peptide Spectral Match; Reference Protein Database from genomic annotation.

PROTEOMICS WORKFLOW
Figure labels: Search Database; Peaklist Generation; Database Search; Peptide Spectral Match: Statistical Validation; Protein Inference.
Eng et al 2011 Mol Cell Proteomics. 10(11): R111.009522.

PROTEOGENOMICS
Nat Methods. 11(11): 1114–1125.

DEFINING PROTEOGENOMICS: LOOKING WITHIN AND WITHOUT
Figure labels: Mass spectrum; Reference Protein Database from genomic annotation; RNASeq data; Genome six-frame translation; cDNA three-frame translation.

PROTEOGENOMICS: BIOINFORMATIC CHALLENGES
• Large database sizes (6-frame and 3-frame translation and metagenomic databases).
• False Discovery Rate (FDR) estimation strategies (for novel peptides).
• False-positive sources and their elimination.
• Validation of the peptide identification. (Search using BLAST-P)
• PSM evaluation / targeted proteomics of identified peptides.
• Genomic localization.
• Validating biological interpretation.
• Disparate tools and numerous processing steps.
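
To make the first challenge above concrete, here is a minimal Python sketch of six-frame translation, the step that turns one nucleotide sequence into six candidate protein sequences and so inflates the search database. It is an illustration only, not the Galaxy-P tool: the function names (translate, reverse_complement, six_frame_translate) are hypothetical, it assumes the standard genetic code, and it marks codons containing ambiguous bases as "X".

```python
# Illustrative sketch only; names are hypothetical, not Galaxy-P code.
# Assumes the standard genetic code (NCBI translation table 1).

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Build the codon table from the conventional TCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    # Codons with ambiguous bases (e.g. N) are reported as 'X'.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def reverse_complement(dna):
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def six_frame_translate(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to translations."""
    dna = dna.upper()
    reverse = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse[offset:])
    return frames


if __name__ == "__main__":
    for frame, protein in six_frame_translate("ATGGCCTAA").items():
        print(frame, protein)
```

Every input sequence yields six translations, most of which are not real proteins, which is one reason the search space and the FDR estimation for novel peptides become harder than in a search against an annotated reference database. A three-frame translation of cDNA would use only the forward frames.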

PROTEOGENOMICS: STEPS INVOLVED
Figure labels: ~2 million proteins; ~10,000 proteins; ~5,000 proteins; ~1,000 peptides; ~100 peptides; ~50 peptides.

3D FRACTIONATED SALIVARY SUPERNATANT
• 6 individuals
• Salivary supernatant
• Digested O/N with trypsin
• Hexapeptide library enrichment (ProteoMiner™) followed by SCX / IEF and LC-MS (200 fractions)
• Thermofinnigan Orbitrap (Orbi MS, MS/MS LTQ)
• 200 RAW files
The dataset was searched against a FASTA database with human proteins, contaminant proteins, a 3-frame translated cDNA database from Ensembl and the Human Oral Microbiome database (HOMD).
INPUTS: PEAKLISTS and SEARCH db
Bandhakavi et al (2009) J. Prot. Res

PROTEOGENOMICS WORKFLOW
Galaxy-P provides an integrated platform for every step of proteogenomic analysis.
• Build target database: download and translate EST databases.
• Numerous tools for identification and text manipulation.
• Workflow utilizing BLAST to identify novel peptides.
• Tool to assess peptide-spectrum matches and visualize spectra.
• Visualize identified peptides on the genome.
• 140 steps: seamless, integrated proteogenomic workflow.
Flexible and accessible workflows for improved proteogenomic analysis using Galaxy framework. J. Proteome Res. (2014) DOI: 10.1021/pr500812t
Link: z.umn.edu/pgfirstlook

NOVEL PROTEOFORMS: PRB1 and PRB2 region

PROTEOGENOMICS WORKFLOW
Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J Proteome Res. (2014) 13(12):5898-908. doi: 10.1021/pr500812t

HIBERNATION PROTEOGENOMICS
Tracing of core body temperature (Tb, black line) from a single animal measured by a surgically implanted transmitter, along with the controlled ambient temperature (blue line) over the course of the hibernation season.
* TOR (Torpor), J-IBA (January IBA), M-IBA (March IBA)

HIBERNATION PROTEOGENOMICS
• The datasets were run in triplicate and searched against a proteomic database derived from RNASeq data.
• Differentially expressed genes from RNASeq data and differentially expressed proteins from iTRAQ data were compared.
  - OCT shows the highest correlation between protein and transcript expression.
• Functional analysis of differentially expressed proteins revealed that:
  - Protein expression in APRIL relative to AUGUST shows increased mitochondrial function and regulation of muscle contraction.
  - Protein expression in OCTOBER relative to AUGUST reveals increased hypoxia tolerance and hypertrophy at a time of decreased metabolism.
  - Protein expression in hibernation relative to AUGUST highlights fatty acid and ketone metabolism and altered calcium handling and contractile function in the heart.
• 162 novel peptide sequences were identified in all three replicates.

METAPROTEOMICS / COMMUNITY PROTEOMICS / MICROBIOMES
“The large-scale characterization of the entire protein complement of environmental microbiota at a given point in time”
Wilmes and Bond (2004) Environ. Microbiol. 6, 911–920.
“Through the application of metaproteomics to different microbial consortia over the past decade, we have learnt much about key functional traits in the various environmental settings where they occur.”
Wilmes P, Heintz-Buschart A, Bond PL. (2015) Proteomics. doi: 10.1002/pmic.201500183.

DEFINING METAPROTEOMICS: LOOKING WITHIN AND WITHOUT
Figure labels: Mass spectrum; Reference Protein Database from genomic annotation; RNASeq data; Genome six-frame translation; cDNA three-frame translation; Metagenomic sequences.

DEFINING METAPROTEOMICS
Figure labels: Mass spectrum; Metagenomic sequences.

MATCHED PAIR OF CONTROL VERSUS LESION
• Oral premalignant lesion (OPML) versus control
• Digested O/N with trypsin
• SCX fractionation and LC-MS (7 fractions)
• Thermofinnigan Orbitrap (Orbi MS, MS/MS LTQ)
• 7 RAW files each
The dataset was searched against a FASTA database with human proteins, contaminant proteins, a 3-frame translated cDNA database from Ensembl and the Human Oral Microbiome database (HOMD).
INPUTS: PEAKLISTS and SEARCH db
Kooren et al (2010) Clin. Prot

TAXONOMIC AND FUNCTIONAL ANALYSIS

DEFINING METAPROTEOMICS: STEPS INVOLVED
Metaproteomic analysis using the Galaxy framework. Proteomics. (2015) doi: 10.1002/pmic.201500074.

METAPROTEOMICS: BIOLOGICAL INSIGHTS

METAPROTEOMICS OF CHILDHOOD CARIES
Figure labels: Sucrose; No Sucrose.
Prof. Joel Rudney
• In vitro investigation of sucrose-induced changes in the metaproteomes of children with caries.
• Major shifts in taxonomy and function in paired microcosm oral biofilms grown without sucrose (NS) and with sucrose (WS). Twelve replicates have been analyzed.
• SEED analysis of oral microcosm biofilms showed characteristic NS and WS patterns of protein expression that were highly conserved across taxonomically diverse communities. Moreover, many of the proteins that differed between each pH phenotype had functions that would act to promote maintenance of neutral pH under NS conditions, or acid production and tolerance under WS conditions.
• Targeted proteomic approaches can then be used to determine whether those proteins are also expressed when plaque is exposed to sucrose in the mouth. In that case, it may be possible to define a set of dysbiosis biomarkers that could be used to detect at-risk tooth surfaces before the development of overt carious lesions.

GALAXY PLATFORM
Goecks J et al Genome Biol. 2010;11(8):R86.
Benefits of Galaxy
• A web-based bioinformatics data analysis platform.
• Software accessibility and usability.
• Share-ability of tools, workflows and histories.
• Reproducibility and ability to test and compare results after using multiple parameters.
• Software tools can be used in a sequential manner to generate analytical workflows that can be reused, shared and creatively modified for multiple studies.

GALAXY-P: IMPLEMENTATION OF PROTEOMICS TOOLS WITHIN GALAXY ENVIRONMENT.
Screenshot labels: TOOLS; CENTRAL PANE; HISTORY; INPUT; OUTPUT.

TOOLS & WORKFLOWS
• Software tools can be used in a sequential manner to generate analytical workflows that can be reused, shared and creatively modified for multiple studies. For example, Protein Database Downloader downloads UniProt protein FASTA databases of various organisms.

RNASeq DERIVED PROTEOMIC DATABASES
“Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations.” Sheynkman G et al BMC Genomics. doi: 10.1186/1471-2164-15-703.
Gloria Sheynkman, James Johnson

PROTEOGENOMICS WORKFLOW
Galaxy-P provides an integrated platform for every step of proteogenomic analysis.
• Build target database: download and translate EST databases or perform gene prediction with Augustus.
• Numerous tools for identification and text manipulation.
• Workflow utilizing BLAST to identify novel peptides.
• Tool to assess peptide-spectrum matches and visualize spectra.
• Visualize identified peptides on the genome.
• 140 steps: seamless, integrated proteogenomic workflow.
Flexible and accessible workflows for improved proteogenomic analysis using Galaxy framework. J. Proteome Res. (2014) DOI: 10.1021/pr500812t
Link: z.umn.edu/pgfirstlook

PSM EVALUATION

GENOME VISUALIZATION USING IGV BROWSER

GALAXYP: ONGOING PROJECTS

REPERTOIRE OF WORKFLOWS
• Sharing of analytical workflows that can be reused and creatively modified for multiple studies.
• Workflows for metaproteomics, quantitative proteomics, proteogenomics and RNASeq are being developed, shared and used.

COMMUNITY-BASED SOFTWARE DEVELOPMENT
• A community-based software development model should prove effective for future implementation, testing and continued improvement of command-line driven software tools.
• We plan to offer the many functionalities of MS-GF+ and PeptideShaker in Galaxy, along with opportunities for integration with other software tools via use of workflows.

COMMUNITY-BASED SOFTWARE DEVELOPMENT
Figure labels: Software Developers; SearchGUI / PeptideShaker; Galaxy; Improvements; Wrapper to the software tool; USER FORUM / GITHUB; Users test the tools and provide feedback; Software tool to