Gene ontology and pathways

Ståle Nygård [email protected]

Bioinformatics Core Facility, Oslo University Hospital/University of Oslo

So: here you are

Gene lists

• Long list of differentially expressed genes
• Possibly hundreds of papers describing the functions of the genes
• Misleading names
• Different names in different

Genes seldom operate on their own

-Genes are by nature not independent. Biologically related genes will often show expression changes together

-Trends supported by several genes in a group give more power to statistical tests than a test for an individual gene

-Need predefined groups of biologically related genes to help process our list for systematic changes.

Ontologies

• Gene Ontology (GO)
• Sequence Ontology (SO) (sequence features)
• Phenotype and Trait Ontology (PATO)
• Taxon (NCBI)
• Anatomy (Penn)
• Disease (ICD9)
• Developmental stage (multiple sources)

Gene Ontology (GO)

• Why Gene Ontology?
  – Produce a controlled vocabulary describing aspects of molecular biology that can be applied to all organisms.
  – Facilitate communication between people and organizations.
  – Improve interoperability between systems.

Goal of GO Consortium

(http://www.geneontology.org/)

• Produce a controlled vocabulary describing aspects of molecular biology that could be applied to all organisms.
• Describe gene products using vocabulary terms (annotation).
• Develop tools:
  – to query and modify the vocabularies and annotations

How does GO work?

What information might we want to capture about a gene product?

• What does the gene product do?
• Why does it perform these activities?
• Where does it act?

The Gene Ontology (GO)

– Molecular function:
  • The gene product's activity at the biochemical level.

– Biological process:
  • Cellular events to which the gene product contributes.

– Cellular component:
  • Location or complex of the gene product.

Molecular Function

• activities or “jobs” of a gene product

[Figure labels: insulin binding, transport activity]

Biological Process

• a commonly recognized series of events

[Figure label: division]

Cellular Component

• where a gene product acts

Content of GO

Molecular Function    8,731 terms
Biological Process   19,022 terms
Cellular Component    2,737 terms

Total                30,490 terms

Obsolete terms: 1,434

As of May 2010

GO Annotation

• Association between a gene product and applicable GO terms
• Provided by member databases. Collaborating databases annotate their gene products (or genes) with GO terms, providing references and indicating what kind of evidence is available to support the annotations.
• Made by manual or automated methods.

GO Annotation

• Database object: gene or gene product
• GO term ID
• Evidence supporting annotation
• Reference
  – publication or computational method

Overrepresentation of GO terms

• We have a subset of genes
  – List of differentially expressed genes
  – List of genes that cluster together

• Which biological processes do these genes take part in?

• Is there an over-representation of the number of genes belonging to a particular biological process, compared to what could be expected?
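One common way to answer the over-representation question is an upper-tail hypergeometric test (equivalent to a one-sided Fisher's exact test). The slides do not say which test the listed tools use, so the following is only a minimal sketch under that assumption. The gene names, the toy annotation table and the helper names `overrepresentation_p` and `rank_terms` are invented for illustration; the two GO IDs are real terms (GO:0007049 cell cycle, GO:0007155 cell adhesion).

```python
from collections import defaultdict
from math import comb


def overrepresentation_p(n_universe, n_term, n_list, n_overlap):
    """Upper-tail hypergeometric probability P(X >= n_overlap).

    n_universe: number of genes on the array (the background)
    n_term:     background genes annotated to the GO term
    n_list:     genes in our list (e.g. the up-regulated genes)
    n_overlap:  genes in our list that are annotated to the term
    """
    total = comb(n_universe, n_list)
    upper = min(n_term, n_list)
    tail = sum(
        comb(n_term, k) * comb(n_universe - n_term, n_list - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


def rank_terms(annotations, universe, gene_list):
    """Return (p_value, term, overlap, term_size) tuples, smallest p first."""
    universe = set(universe)
    selected = set(gene_list) & universe
    genes_by_term = defaultdict(set)
    for gene, term, _evidence in annotations:
        if gene in universe:
            genes_by_term[term].add(gene)
    results = []
    for term, members in genes_by_term.items():
        overlap = len(members & selected)
        p = overrepresentation_p(len(universe), len(members), len(selected), overlap)
        results.append((p, term, overlap, len(members)))
    return sorted(results)


# Toy data, invented for illustration only.
# Each annotation is (database object, GO term ID, evidence code).
universe = [f"g{i}" for i in range(1, 21)]
annotations = (
    [(f"g{i}", "GO:0007049", "IEA") for i in range(1, 6)]       # cell cycle
    + [(f"g{i}", "GO:0007155", "IEA") for i in range(6, 16)]    # cell adhesion
)
gene_list = ["g1", "g2", "g3", "g4", "g10", "g11"]

ranked = rank_terms(annotations, universe, gene_list)
for p, term, overlap, size in ranked:
    print(f"{term}: {overlap}/{size} annotated genes in list, p = {p:.4f}")
```

The background here is every gene on the array, which matches the "compared to all genes on the array" comparison used in the fibroblast example; a different background would change the p-values.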

Gene Ontology Tools

• eGON (from NTNU, www.genetools.no)
• GSEA
• DAVID
• EASE
• topGO
• GOstat
• + many more

Question: which cellular biological processes occur?

[Figure: time axis from 0 to 24 hours in 2-hour steps; human fibroblasts, 24 h time course, thymidine-block release]

Questions

What is the function of the up-regulated genes?

What is the function of the down-regulated genes?

[Figure: time axis from 0 to 24 hours in 2-hour steps; human fibroblasts, 24 h time course, thymidine-block release]

173 genes up-regulated 0-4 hours compared to all genes on the array

Ordered by significance:

146 genes down-regulated 0-4 hours compared to all genes on the array

[Figure: functional categories placed along the time axis from 0 to 24 hours (human fibroblasts, 24 h time course, thymidine-block release). Labels in the order they appear: homeostasis; cell adhesion; lipid transport; chemotaxis; amino acid; lipid metabolism; response to stress; cell signaling; S-phase; ion transport; apoptosis; cell cycle arrest; apoptosis]

Biological pathways

Types of pathways

• Metabolic pathways
  – convert raw materials from the environment into value-added products and recycle or dispose of intracellular materials
• Signaling pathways
  – convert a mechanical or chemical stimulus to a cell into a specific cellular response
• Regulatory pathways
  – alter the output of the genetic program through transcriptional and translational regulation

• Signaling, regulatory and metabolic events are often linked

[Figure: linked Signaling, Regulatory and Metabolic pathways]

Types of pathway representations

• Cartoons
  – Textbooks
  – Biocarta
• Circuit diagrams
  – KEGG
  – GeneRIFs
• Computational networks
  – SBML models
  – factor networks

KEGG

• A collection of signaling, metabolic and regulatory pathways
• Organised as separate pathways with hand-drawn diagrams
• Academic (freely available)
• The pathways can be used to look for overrepresentation or enrichment
• Can be used to visually check for path-ness or direction
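The direction check mentioned above is normally done by eye on the pathway diagram. As a rough numeric counterpart, the members of one pathway can be tallied by direction of change. This is a minimal sketch, not a KEGG feature: the function name `direction_summary`, the gene names, the fold changes and the cutoff of 1.0 on the log2 scale are all assumptions made for illustration.

```python
def direction_summary(log_fold_changes, pathway_genes, threshold=1.0):
    """Count pathway members that go up, go down, or stay unchanged.

    log_fold_changes: dict mapping gene -> log2 fold change
    pathway_genes:    iterable of genes belonging to one pathway
    threshold:        absolute log2 fold change counted as a change
                      (an assumed cutoff, not taken from the slides)
    """
    counts = {"up": 0, "down": 0, "unchanged": 0, "not measured": 0}
    for gene in pathway_genes:
        if gene not in log_fold_changes:
            counts["not measured"] += 1
        elif log_fold_changes[gene] >= threshold:
            counts["up"] += 1
        elif log_fold_changes[gene] <= -threshold:
            counts["down"] += 1
        else:
            counts["unchanged"] += 1
    return counts


# Hypothetical pathway membership and fold changes, for illustration only.
pathway = ["geneA", "geneB", "geneC", "geneD", "geneE"]
log_fold_changes = {
    "geneA": 2.1,
    "geneB": 1.4,
    "geneC": -0.2,
    "geneD": 1.8,
    "geneX": -3.0,  # measured, but not a member of this pathway
}

summary = direction_summary(log_fold_changes, pathway)
print(summary)
```

A pathway whose changed members mostly move in one direction is a candidate for closer inspection on the diagram itself; the tally does not replace looking at where in the pathway the changes sit.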

TGF-beta signalling pathway

Same pathway in Biocarta

GO vs. Pathways

GO:
• Overview
• Can handle a large number of genes
• Many genes annotated
• Every gene considered on its own

Pathways:
• Detail view
• Focused sets of genes
• Scattered data sources
• Focuses on interactions between genes

Network construction

• Information about established pathways (e.g. in KEGG) is far from complete
• Pathways interact and depend on context
• An alternative approach to using established pathways is to construct networks from the data.

Network construction

• Networks can be inferred from
  – correlation in the data (recall gene clustering) and/or
  – interaction databases:
    • Protein-protein interactions: BioGRID, IntAct, DIP, HPRD ++
    • Transcription factor databases: TRANSFAC, JASPAR ++
    • Literature: PubGene
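The two sources of edges listed above can be combined in many ways. The sketch below takes the union of (a) gene pairs whose expression profiles are strongly correlated and (b) database-reported interactions between measured genes, and then reads off connected clusters. The correlation cutoff of 0.9, the choice of a union rather than an intersection, the gene names and the toy expression values are all assumptions for illustration. No real database is queried; `reported_interactions` stands in for pairs exported from a resource such as BioGRID.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_edges(expression, cutoff=0.9):
    """Edges between gene pairs whose absolute correlation reaches the cutoff."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= cutoff:
            edges.add(frozenset((g1, g2)))
    return edges


def database_edges(interactions, measured_genes):
    """Keep reported interactions where both partners were measured."""
    return {
        frozenset((a, b))
        for a, b in interactions
        if a != b and a in measured_genes and b in measured_genes
    }


def connected_components(edges):
    """Group genes into clusters joined by at least one path of edges."""
    neighbours = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, components = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbours[node] - component)
        seen |= component
        components.append(component)
    return components


# Toy expression profiles (four samples per gene), invented for illustration.
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.1, 5.9, 8.0],
    "geneC": [4.0, 3.0, 2.1, 1.0],
    "geneD": [1.0, 3.0, 1.0, 3.0],
    "geneE": [2.0, 2.0, 5.0, 1.0],
}
# Hypothetical database-reported interactions; geneF was not measured.
reported_interactions = [("geneC", "geneD"), ("geneE", "geneF")]

from_data = correlation_edges(expression)
from_databases = database_edges(reported_interactions, set(expression))
network = from_data | from_databases
clusters = connected_components(network)
print(clusters)
```

Using the absolute correlation means anti-correlated genes are linked as well. Whether that is appropriate, and whether database edges should add to or filter the correlation edges, depends on the question being asked.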

Network construction: case study

[Figure panels: WT AB; CXCR5 KO AB]

Mice with the chemokine receptor CXCR5 knocked out develop dilated hypertrophy after banding of the aorta (AB).

Microarray study

WT SHAM   KO SHAM   WT AB   KO AB
(n=3)     (n=3)     (n=4)   (n=4)

Aim of study: Find the molecular mechanism behind the altered phenotype of the heart.

Network construction using prior knowledge

This method constructs a network of interacting genes based on literature-reported interactions, protein-protein interactions and correlations in the data.

Results

[Figure: gene network, KO AB vs KO SHAM, with the following gene annotations]

• FMOD - …may regulate TGF-beta activities by sequestering TGF-beta into the extracellular matrix
• CXCL13 - B lymphocyte chemoattractant
• Thbs1 - Adhesive glycoprotein that mediates cell-to-cell and cell-to-matrix interactions.
• Fn1 - fibronectin 1. Extracellular matrix glycoprotein that binds to membrane-spanning receptor proteins called integrins.
• Tgfb2 - transforming growth factor, beta 2. Extracellular glycosylated protein.
• Spp1 - secreted phosphoprotein 1. Cytokine. Probably important to cell-matrix interaction.
• Thbs4 - thrombospondin 4
• Lox - lysyl oxidase. Extracellular copper enzyme that initiates the crosslinking of collagens and elastin.
• Col14a1 - Collagen, type XIV, alpha 1

The method finds a cluster of differentially expressed genes localized to the extracellular matrix.

Conclusion

• GO is the world map of molecular biology

• Pathways provide more detailed information

• Network construction using interaction databases can reveal information beyond classical pathways

Questions?