PANTHER Tutorial 2011.Pptx
PANTHER Classification System version 7
Huaiyu Mi
Department of Preventive Medicine
Keck School of Medicine
University of Southern California, USA
August 27, 2011, ICSB Tutorial, Heidelberg, Germany

Outline
• PANTHER Background
  – How is PANTHER built?
• PANTHER Website at a Glance
  – Brief overview of all PANTHER pages
• PANTHER Basic Functionalities
• PANTHER Tools
  – Tutorial on tool usage

PANTHER BACKGROUND

PANTHER Database

What’s new in PANTHER 7.0?
• Whole genome sequence coverage from 48 organisms.
• New tree building algorithm (GIGA) for improved phylogenetic relationships of genes and families.
• Improved Hidden Markov Models.
• Improved ortholog identification.
• GO slim and PANTHER protein class implemented for classifying genes and families.
• Expanded sets of genomes and sequence identifiers for PANTHER tools.
• PANTHER Pathway diagrams in SBGN.

PANTHER PROTEIN LIBRARY

What is PANTHER?
PANTHER library (PANTHER/LIB)
• a family tree
• a multiple sequence alignment
• an HMM
PANTHER GO slim and Protein Class
• Molecular function
• Biological process
• Cellular component
• Protein class
Diagram labels: Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic trees; Multisequence alignments

Building PANTHER Protein Family Library
Flow diagram steps: Select sequences -> Build clusters -> Build MSA -> Build trees -> Build HMMs
Other diagram labels: Curation; PANTHER Protein Library; PANTHER GO slim and Protein Class ontology

Complete Gene Sets
• 12 GO Reference Genomes
• 36 other genomes to help reconstruct evolutionary history
  – 14 bacterial genomes
  – 2 archaeal genomes
  – 2 fungal genomes
  – 2 plant genomes
  – 1 amoebozoan genome
  – 3 protist genomes
  – 2 protostome genomes
  – 10 deuterostome genomes

“Standard” set of protein coding genes and corresponding protein sequences
Flow diagram steps: Get list of genes in each genome -> Get list of all protein products from given source -> Get mapping of each protein product to UniProt
• 48 genomes
• Sources of genes
  – MOD
  – ENSEMBL
  – NCBI (Entrez)
• Sources of protein sequences
  – UniProt
  – NCBI (RefSeq)
  – ENSEMBL
• One protein is selected for each gene.
Flow diagram, final step: Select one “representative” protein for each gene

Building Clusters and MSA
Flow diagram: Score against PANTHER 6.1 HMM library -> Hit an HMM? If yes: Family cluster. If no: InterPro for additional clusters. -> MSA by MAFFT
• Family and subfamily IDs (PTHRxxxxx:SFx) are tracked as much as possible.
• New IDs are assigned if necessary.
• In PANTHER 7.2 (to be released at the end of 2011), all clusters with at least one sequence from the 12 MODs will be included in the library.
• MSAs are built with MAFFT, a freely available multiple sequence alignment software package (Katoh, Nucleic Acids Res., 30:3059-3066)

GIGA
• An algorithm that makes phylogenetic inferences under the constraint of the species tree.
• Uses sequence-based distance from the multiple sequence alignment at each step.
  – Speciation
  – Duplication
  – Ortholog group (subfamily)
Thomas, 2010, BMC Bioinformatics, 11:312

Phylogenetic inferences based on species tree
Figure labels: speciation; “Fixed differences” between species

Speciation event
Tree figure; leaf species: human, chimpanzee, mouse, rat, cow, horse, chicken, frog, mosquito, fruit fly, worm, yeast

Duplication event
Tree figure in which human and chimpanzee each appear twice; other leaf species: mouse, rat, cow, horse, chicken, frog, mosquito, fruit fly, worm, yeast

PANTHER Phylogenetic Tree
Tree from PTHR11537
• Green node: speciation
• Yellow node: duplication
• Blue diamond: subfamily

PANTHER Protein Library Building
Flow diagram: 600,000 sequences from 48 organisms -> 400,000 sequences in 6594 family clusters -> Curation -> 62,972 subfamilies annotated with GO terms and PANTHER pathways

Tree Representation of Subfamilies

MSA

PANTHER Ontology in Tree

PANTHER in InterPro

PANTHER in FlyBase

PANTHER and Gene Ontology Reference Genome Project

PANTHER PATHWAY

Goals
• Go beyond individual protein.
• To understand how multiple proteins work together in a complex system.
• To build an integrated infrastructure with expert-curated pathways.
• To help establish a standard that will enable the content to be used across a large number of software applications.
• The system should allow users to:
  – Predict gene and protein functions
  – Analyze research data
  – Navigate or browse literature
  – Design new experiments

Biological process ontology vs. Pathway

Phylogenetic relationships help pathway building
Diagram label: >40,000 orthologous trees

Two approaches to build pathway databases
• Bottom-up
  – Start from individual protein/reaction
  – Build species-specific pathways (or partial pathways)
  – Infer to other organisms based on orthologue mapping
  – Generate a more comprehensive pathway map
  – Example databases: MetaCyc and Reactome
• Top-down
  – Start with pathways at the conceptual level, usually based on review papers or textbooks
  – Build a comprehensive pathway map
  – Assign protein sequences to the pathway

PANTHER Pathway Data Structure
• A pathway diagram
• Curate the pathway
• Display the pathway
• Unambiguous graphical representation of pathway data
• Structured data for pathway
• Link pathway classes to the sequence database
Diagram labels: PANTHER Pathway; Pathway; Reaction; Pathway Molecule Classes; Cell type/Cellular location; Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic tree; Multisequence alignment; PANTHER library

PANTHER Pathway Data Structure
Reaction
• Catalysis
• Transition
• Transcription and translation activation/inhibition
• Activation / Inhibition
• Phosphorylation / dephosphorylation
• Complex formation
• Transportation
• Upstream / downstream
Cell type/Cellular location
• Nucleus
• Mitochondria
• Cytoplasm
• Nerve terminal
• Lymphocyte
• Astrocytes
Pathway Molecule Classes
• Proteins: receptor, kinase
• Genes: receptor gene, kinase gene
• Simple molecules: glucose, pyruvate
• Ions: calcium ion
• Phenotypes: stress, glucose deprivation
• This entity is also used to link out to other pathways.
Diagram labels: PANTHER pathway; Pathway; Sequences; PANTHER subfamily HMM models; Statistical models (HMM); Phylogenetic tree; Multisequence alignment; PANTHER library

CellDesigner

Pathway Curation Process
Flow diagram labels: Identify pathways to curate; Identify curators; CellDesigner pathway diagrams; SBML parser; Pathway index; PANTHER library; Pathway DB; Pathway curation; Web infrastructure; PANTHER database with library sequences associated to pathways; Pathway diagram applet; Web delivery

Activity flow view

Standard view

SBGN-PD view

History of PANTHER
• 1998: Project was launched at Molecular Application Group.
• 1999: Acquired by Celera Genomics.
• 2000: PANTHER 1 released in Celera Discovery Systems (CDS).
• 2001: PANTHER 2 released; it was used in the annotation of the first published human genome (Celera).
• 2002: PANTHER 3 released. PANTHER annotations integrated in FlyBase. Moved to ABI.
• 2003: PANTHER 4 released with the public release of the PANTHER Classification System.
• 2005: PANTHER 5 released with PANTHER Pathway and analysis tool. Established collaboration with InterPro.
• 2006: PANTHER 6 released. Moved to SRI.
• 2010: PANTHER 7 released.
• 2011: Moved to USC.

User Statistics
• 12,000 visits per month
• From over 90 countries and territories, with USA, India, UK, Germany, China, Japan, Canada, France, Australia and the Netherlands in the top 10.
• 130,000 page views per month
• Cited in 2280 scientific papers (up to August 2011)

PANTHER Statistics
• 48 organisms
• 400,000 genes
• 62,972 subfamilies
• 6,594 families
• 165 pathways

PANTHER WEBSITE AT A GLANCE
Home page callouts:
• Main menu tabs give access to each subject's main page.
• PANTHER keyword search and HMM score.
• Quick links to popular PANTHER functionalities.
• PANTHER news and publications.

PANTHER Website Pages
• List page
  – Gene list page
  – Family/subfamily list page
  – Ontology or pathway list page
• Information detail page
  – Gene detail page
  – Family/subfamily detail page
  – Pathway description page
  – Pathway molecule class detail page
  – Ontology term detail page
• Graph and diagram page
  – Pie chart
  – Pathway diagram
  – Tree viewer

PANTHER Gene List Page
• Click to view the pie chart.
• Choose an organism to display your gene list.
• Sort the list by clicking the column name.
• Collapse the column(s) by clicking on the “x” icon.
• Convert the gene list to another list type.
• Export the list to
  o Workspace
    – Need to register an account
  o File on your computer
  o Text on the website

Gene Detail Page
• Information is divided into 3 sections
  – General information about the gene
    • Including IDs, names, gene symbol, alternative IDs, etc.
  – PANTHER classification of the gene
    • PANTHER family and subfamily information
    • Links to view the tree and MSA
    • PANTHER GO slim and protein class
  – Orthologs of the gene

Gene Detail Page
• Columns
  – ID: Unique gene identifiers in PANTHER.
  – Organism: The modern-day organisms in which the ortholog is found. For paralogs, the organism column gives the two speciation events between which the duplication that generated the paralogous genes occurred. “ND” means “not determined”. Thus different paralogs can be distinguished by how long ago the relevant duplications occurred.
  – Type
    • LDO: least diverged ortholog
    • O: other, more diverged orthologs (in case of gene duplication)
    • P: paralogs
• Orthologs are genes that can be traced to the same gene in the genome of their most recent common ancestor species.
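
The ortholog/paralog distinction above can be illustrated with a toy gene tree. The following is a minimal Python sketch, not PANTHER's implementation. It assumes a tree whose internal nodes are already labeled as speciation or duplication events (as in the green and yellow nodes of the PANTHER tree viewer). Two genes are treated as orthologs when their most recent common ancestor node is a speciation, and as paralogs when it is a duplication. Picking the LDO as the ortholog with the smallest branch-length distance to the query is a simplifying assumption made here for illustration. All names (Node, classify, HUMAN_A, and so on) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    name: str
    event: Optional[str] = None    # "speciation" or "duplication" (internal nodes)
    species: Optional[str] = None  # set for leaves (genes) only
    length: float = 0.0            # branch length to the parent node
    children: list = field(default_factory=list)
    parent: Optional["Node"] = None

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    """Path from a node up to the root, starting with the node itself."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mrca(a, b):
    """Most recent common ancestor node of two nodes in the same tree."""
    seen = {id(n) for n in ancestors(a)}
    for n in ancestors(b):
        if id(n) in seen:
            return n
    raise ValueError("nodes are not in the same tree")


def distance(a, b):
    """Sum of branch lengths on the path between two nodes."""
    top = mrca(a, b)
    total = 0.0
    for n in (a, b):
        while n is not top:
            total += n.length
            n = n.parent
    return total


def relation(a, b):
    """Ortholog if the genes split at a speciation, paralog if at a duplication."""
    return "ortholog" if mrca(a, b).event == "speciation" else "paralog"


def classify(query, leaves):
    """Label each other leaf as LDO, O or P relative to the query gene.

    Assumption: within each species, the ortholog closest to the query
    (by branch length) is reported as the least diverged ortholog.
    """
    labels, closest = {}, {}
    for leaf in leaves:
        if leaf is query:
            continue
        if relation(query, leaf) == "paralog":
            labels[leaf.name] = "P"
            continue
        labels[leaf.name] = "O"
        best = closest.get(leaf.species)
        if best is None or distance(query, leaf) < distance(query, best):
            closest[leaf.species] = leaf
    for leaf in closest.values():
        labels[leaf.name] = "LDO"
    return labels


# Toy tree: an old duplication, then a human/mouse speciation,
# then a mouse-specific duplication.
root = Node("D0", event="duplication")
s1 = root.add(Node("S1", event="speciation", length=0.2))
human_b = root.add(Node("HUMAN_B", species="human", length=0.6))
human_a = s1.add(Node("HUMAN_A", species="human", length=0.1))
d1 = s1.add(Node("D1", event="duplication", length=0.1))
mouse_a1 = d1.add(Node("MOUSE_A1", species="mouse", length=0.1))
mouse_a2 = d1.add(Node("MOUSE_A2", species="mouse", length=0.5))

labels = classify(human_a, [human_a, human_b, mouse_a1, mouse_a2])
print(labels)
```

In this toy tree, both mouse genes trace back to the same gene at the human/mouse speciation, so both are orthologs of HUMAN_A; the shorter-branch copy is labeled LDO and the other O. HUMAN_B descends from the older duplication and is labeled P.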