Gene Ontology and Pathways
Ståle Nygård [email protected]
Bioinformatics Core Facility, Oslo University Hospital/University of Oslo

So: here you are

Gene lists
• Long list of differentially expressed genes
• Possibly hundreds of papers describing the functions of the genes
• Misleading names
• Different names in different organisms

Genes seldom operate on their own
– Genes are by nature not independent. Biologically related genes will often show expression changes together.
– Trends supported by several genes in a group give more power to statistical tests than a test for an individual gene.
– We need predefined groups of biologically related genes to help process our list for systematic changes.

Ontologies
• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)

Gene Ontology (GO)
• Why Gene Ontology?
– Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
– Facilitate communication between people and organizations.
– Improve interoperability between systems.

Goal of GO Consortium (http://www.geneontology.org/)
• Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
• Describe gene products using vocabulary terms (annotation).
• Develop tools:
– to query and modify the vocabularies and annotations

How does GO work?
What information might we want to capture about a gene product?
• What does the gene product do?
• Why does it perform these activities?
• Where does it act?

The Gene Ontology (GO)
– Molecular function:
• Gene product at biochemical level.
– Biological process:
• Cellular events to which the gene product contributes.
– Cellular component:
• Location or complex of gene/protein.
Molecular Function
• Activities or "jobs" of a gene product
• Examples: insulin binding, insulin transport activity

Biological Process
• A commonly recognized series of events
• Example: cell division

Cellular Component
• Where a gene product acts

Content of GO (as of May 2010)
Molecular Function     8,731 terms
Biological Process    19,022 terms
Cellular Component     2,737 terms
Total                 30,490 terms
Obsolete terms: 1,434

GO Annotation
• Association between a gene product and applicable GO terms
• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
• Made by manual or automated methods.

GO Annotation
• Database object: gene or gene product
• GO term ID
• Evidence supporting annotation
• Reference: publication or computational method

Overrepresentation of GO terms
• We have a subset of genes
– List of differentially expressed genes
– List of genes that cluster together
• Which biological processes do these genes take part in?
• Is the number of genes belonging to a particular biological process over-represented, compared to what could be expected?

Gene Ontology Tools
• eGON (from NTNU, www.genetools.no)
• GSEA
• DAVID
• EASE
• TopGO
• GOstat
• + many more

Question: which cellular biological processes occur?
[Figure: human fibroblasts, 24 h time course after thymidine-block release; x-axis in hours, 0 to 24]

Questions
• What is the function of up-regulated genes?
• What is the function of down-regulated genes?
[Figure: the same time course; x-axis in hours, 0 to 24]
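The over-representation question above is commonly answered with a one-sided hypergeometric test (equivalent to a one-sided Fisher's exact test). The sketch below is a minimal illustration of that idea, not the method of any particular tool listed in these slides. The function name `hypergeom_enrichment_p` and all counts in the usage example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_array, n_term, n_list, n_overlap):
    """Probability of seeing at least n_overlap genes with a GO term
    in a list of n_list genes drawn without replacement from an array
    of n_array genes, of which n_term are annotated with the term."""
    max_overlap = min(n_term, n_list)
    total = comb(n_array, n_list)
    # Sum the hypergeometric probabilities over the upper tail.
    tail = sum(
        comb(n_term, k) * comb(n_array - n_term, n_list - k)
        for k in range(n_overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 10,000 genes on the array, 200 of them annotated
# with the term, 150 genes in our list, 12 of those carrying the term.
# By chance we would expect about 150 * 200 / 10000 = 3 such genes.
p_value = hypergeom_enrichment_p(10000, 200, 150, 12)
print(f"p = {p_value:.3g}")
```

In practice this test is repeated for every GO term, so the resulting p-values need a multiple-testing correction before terms are ranked by significance.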
Human fibroblasts, 24 h time course after thymidine-block release
• 173 genes up-regulated at 0-4 hours, compared to all genes on the array
• 146 genes down-regulated at 0-4 hours, compared to all genes on the array
[Figure: GO terms ordered by significance, placed along the time axis (hours, 0 to 24). Labels in the figure: homeostasis, cell adhesion, lipid transport, chemotaxis, amino acid metabolism, lipid metabolism, response to stress, cell signaling, S-phase, ion transport, apoptosis, cell cycle arrest, apoptosis]

Biological pathways

Types of pathways
• Metabolic pathways
– convert raw materials from the environment into value-added products and recycle or dispose of intracellular materials
• Signaling pathways
– convert a mechanical/chemical stimulus to a cell into a specific cellular response
• Regulatory pathways
– alter the output of the genetic program through transcriptional and translational regulation
• Signaling, regulatory and metabolic events are often linked

Types of pathway representations
• Cartoons
– Textbooks
– BioCarta
• Circuit diagrams
– KEGG
– Reactome
– GeneRIFs
• Computational networks
– SBML models
– Transcription factor networks

KEGG
• A large collection of signaling, metabolic and regulatory pathways
• Organised by separate pathways with hand-drawn diagrams
• Academic (freely available)
• The pathways can be used to look for overrepresentation or enrichment
• Can be used to visually check for path-ness or direction
[Figure: TGF-beta signalling pathway in KEGG]
[Figure: the same pathway in BioCarta]

GO vs. Pathways
GO                                      Pathways
• Overview                              • Detail view
• Can handle a large number of genes    • Focused sets of genes
• Many genes annotated                  • Scattered data sources
• Every gene considered on its own      • Focuses on interactions between genes

Network construction
• Information about established pathways (e.g. in KEGG) is not at all complete
• Pathways interact and depend on context
• An alternative to using established pathways is to construct networks from the data.

Network construction
• Networks can be inferred from
– correlation in the data (recall gene clustering) and/or
– interaction databases:
• Protein-protein interactions: BioGRID, IntAct, DIP, HPRD and others
• Transcription factor databases: TRANSFAC, JASPAR and others
• Literature: PubGene

Network construction: case study
Mice with the chemokine receptor CXCR5 knocked out develop dilated hypertrophy after banding of the aorta (AB).
[Figure: WT AB vs. CXCR5 KO AB]

Microarray study
Group      n
WT SHAM    3
KO SHAM    3
WT AB      4
KO AB      4
Aim of study: find the molecular mechanism behind the altered phenotype of the heart.

Network construction using prior knowledge
This method constructs a network of interacting genes based on literature-reported interactions, protein-protein interactions and correlations in the data.

Results
• FMOD (fibromodulin): …may regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix
• CXCL13: B lymphocyte chemoattractant
• Thbs1 (thrombospondin 1): adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions
• Fn1 (fibronectin 1): extracellular matrix glycoprotein that binds to membrane-spanning receptor proteins called integrins
• Tgfb2 (transforming growth factor, beta 2): extracellular glycosylated protein
• Spp1 (secreted phosphoprotein 1): cytokine, probably important to cell-matrix interaction
• Thbs4 (thrombospondin 4)
• Lox (lysyl oxidase): extracellular copper enzyme that initiates the crosslinking of collagens and elastin
• Col14a1 (collagen, type XIV, alpha 1)
KO AB vs. KO SHAM
The method finds a cluster of differentially expressed genes localized to the extracellular matrix.

Conclusion
• GO is the world map of molecular biology
• Pathways provide more detailed information
• Network construction using interaction databases can reveal information beyond classical pathways

Questions?
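As a supplement to the network construction slides: the simplest way to infer a network from "correlation in the data" is to connect two genes whenever their expression profiles are strongly correlated across samples. The sketch below illustrates only that idea. It is not the prior-knowledge method used in the case study, and the gene labels, expression values and threshold are all hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_network(expression, threshold=0.9):
    """Return edges (gene1, gene2, r) for all pairs with |r| >= threshold."""
    edges = []
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, r))
    return edges


# Hypothetical expression values for three genes in four samples.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.1, 3.9, 6.2, 8.0],
    "geneC": [4.0, 1.0, 3.5, 2.0],
}
edges = correlation_network(expression, threshold=0.9)
for g1, g2, r in edges:
    print(g1, g2, round(r, 3))
```

With only a handful of samples per group, as in the microarray study above, correlations alone are noisy, which is one reason to combine them with interaction databases and literature-reported interactions.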