ArrayExpress Expression Atlas

Investigating gene expression patterns

The Bioinformatics Roadshow – Rotterdam Erasmus Postgraduate school Molecular Medicine

Ibrahim Emam, Functional Genomics Group, European Bioinformatics Institute

ArrayExpress – two databases

May 2, 2012

How we view experiments…

• Consider one experiment in which the effect of a particular compound treatment is assayed in two different strains and four different tissue types.

Examine profile of Saa4 in one experiment

• With respect to Compound Treatment

Examine profile of Saa4 in one experiment

• With respect to Genotype

Examine profile of Saa4 in one experiment

• With respect to Organism Part (Tissue)

Atlas construction

Atlas construction - example

Meta-analysis framework

• For each experiment:
  • Identify genes that are differentially expressed between groups of samples
  • A gene is significantly differentially expressed if the combined F-statistic derived from all pairwise comparisons of the means of the gene's expression levels across factors has a sufficiently small adjusted p-value
  • Score every condition/gene/experiment triplet; this score gives the likelihood that the gene is differentially expressed for this condition in the given experiment
  • Correct these scores for multiple testing and apply a cut-off: differentially expressed, yes/no
• Repeat for all experiments
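The multiple-testing correction and cut-off step can be sketched as follows. The slides do not name the correction method, so this minimal Python sketch assumes the Benjamini-Hochberg procedure and a cut-off of 0.05. The function names and the toy p-values are hypothetical and are not taken from the Atlas pipeline.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values, in input order.

    Assumption: BH is used here only as a common example of a
    multiple-testing correction; the source does not specify the method.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def call_differential_expression(p_values, cutoff=0.05):
    """Turn raw p-values into yes/no calls after correction."""
    return [q <= cutoff for q in benjamini_hochberg(p_values)]


# Hypothetical raw p-values for four genes in one experiment/condition.
toy_p_values = [0.001, 0.02, 0.04, 0.5]
print(benjamini_hochberg(toy_p_values))
print(call_differential_expression(toy_p_values))
```

Note that a raw p-value below the cut-off (the third gene in the toy data) can fail the cut-off once the correction is applied, which is the reason for correcting before making the yes/no call.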

Meta-analysis framework (cont’d)

• For every condition-gene pair, count the number of experiments in which it is differentially expressed

• The result is a two-dimensional matrix where rows correspond to genes and columns correspond to conditions, rather than samples.

• The matrix entries are p-values together with a sign, indicating the significance and direction of differential expression
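As an illustration of the counting step, the following minimal Python sketch tallies, for each gene-condition pair, the number of experiments in which the gene was called over- or under-expressed. The experiment identifiers, gene names and calls are made up for illustration, and the sketch keeps only counts and direction, not the p-values.

```python
from collections import defaultdict

# Hypothetical per-experiment calls: (experiment, gene, condition, direction).
# direction is +1 for over-expressed and -1 for under-expressed.
toy_calls = [
    ("EXP-1", "geneA", "liver", +1),
    ("EXP-2", "geneA", "liver", +1),
    ("EXP-2", "geneA", "kidney", -1),
    ("EXP-3", "geneA", "liver", -1),
    ("EXP-3", "geneB", "kidney", +1),
]


def build_count_matrix(calls):
    """Count up/down calls per (gene, condition) pair across experiments."""
    matrix = defaultdict(lambda: {"up": 0, "down": 0})
    for _experiment, gene, condition, direction in calls:
        key = "up" if direction > 0 else "down"
        matrix[(gene, condition)][key] += 1
    return dict(matrix)


matrix = build_count_matrix(toy_calls)
for (gene, condition), counts in sorted(matrix.items()):
    print(gene, condition, counts)
```

Keying the result by (gene, condition) rather than by sample mirrors the matrix layout described above: one row per gene, one column per condition.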

Gene Expression Atlas Matrix

[Figure: the gene expression matrix. Columns are conditions (with sample annotation), rows are genes (with gene annotations), and cells hold gene expression levels.]

Saa4 in E-AFMX-4

Saa4 in E-MEXP-114

Saa4 in E-MEXP-748

Experiment selection criteria

• The criteria we use for selecting experiments for inclusion in the Atlas are as follows:

• Array designs relating to the experiment must be provided to enable re-annotation using Ensembl or UniProt (or have the potential for this to be done)
• High MIAME scores
• Experiment must have 6 or more hybridizations
• Sufficient replication and large sample size
• Experimental factors (EF) and experimental factor values (EFV) must be well annotated
• Adequate sample annotation must be provided
• Processed data must be provided, or raw data which can be renormalized must be available

Gene Expression Atlas – when to use it

• Find out whether the expression of a gene (or of a group of genes sharing a gene attribute, e.g. a GO term) changes across the experiments available in the Expression Atlas

• Discover which genes are differentially expressed in a particular biological condition that you are interested in

Gene Expression Atlas Database

• Provides a gene/condition centric view of data in ArrayExpress

• Queries are optimized for summary, meta-analytical gene expression results (over-expressed/under-expressed) across all experiments, for any condition and any gene

• Use cases
  • Search for all genes over/under-expressed in a condition/set of conditions
  • Search for a gene/set of genes over/under-expressed in a condition/set of conditions
  • View all summary expression results for a specific gene
  • View gene expression patterns in a particular experiment

Gene Expression Atlas

Atlas home page: http://www.ebi.ac.uk/gxa/

Query for genes

Restrict query by direction of differential expression

Query for conditions

The ‘advanced query’ option allows building more complex queries

Atlas searching fields – auto suggest

• The ‘Genes’ and ‘Conditions’ search fields

Scenario

• Imagine you are a scientist working in a drug discovery laboratory developing new therapies for neurodegenerative diseases.

You want to find human genes involved in the disease that could possibly be targets for drug therapy.

You have recently read a paper stating that 'glutamate receptors are important in neurodegeneration', so you are particularly interested in finding signaling receptors containing an NMDA domain (a particular class of glutamate receptor) that are deregulated in neurodegenerative disease.

Search for genes

Start typing your query in the ‘gene search box’

Interested in genes involved in receptor activity: use the GO term ‘receptor activity’ in the genes search box

Auto-suggest will display all matching gene properties available in the Atlas

Add experimental conditions to your search

Start typing ‘nervous system disease’ in the 'Conditions' box and see whether any EFO term matches your search criteria

Search results – heatmap view

Columns: Conditions in EFO terms

Rows: Genes

Heatmap cell: expression and number of times the gene is up- or down-regulated

Advanced query

Search whether some of these genes encode a protein carrying an NMDA receptor domain

Clicking on ‘advanced query’ will expand the query window to add more query items

Choosing ‘InterPro Term’ from the ‘gene property’ drop-down will add a new query item

Gene Expression Atlas Views

• Search results views
  • Heatmap view
  • List view
• Gene View
• Experiment View
• Download results

Search results – heatmap view

Click on heatmap cell

Plots of experiments supporting the selected gene-condition pair will be shown

p-value of significance of differential expression

Search results – list view

Each row represents a gene-condition pair

Expanding the row displays thumbnail plots for corresponding experiments

Refine query

Download search results

• Download a tab-delimited file of your search results
• Keep track of all downloads per session

Gene view

Terms and external databases cross-references

List of experiments showing differential expression for this gene. Clicking on a particular factor on the heatmap will filter only experiments showing differential expression for that factor

Anatomogram showing gene expression in different tissues

Expression summary heatmap listing all conditions in which the gene was observed to be differentially expressed

A word of caution

Differential expression of a gene in a certain condition was calculated in the context of individual experiments

When we say this gene is over-expressed in kidney in 10 experiments, we are not suggesting it is a kidney-specific gene.

It means that in each of those experiments the gene was differentially expressed in kidney compared to the other conditions studied in that same experiment

Gene view plots

Click on liver from expression summary

Liver samples clearly show a POTENTIAL expression specificity for this gene

Gene view expression summary

Shows all conditions where this gene has been differentially expressed

Atlas Experiment View

• Plot the expression of genes in a particular experiment showing the different conditions and experimental factors studied in this experiment

• Show the top differentially expressed (DE) genes for an experiment and plot their expression patterns

• Search for gene(s) of interest to examine their behavior in this experiment

• Identify sample properties

Atlas experiment view

Three sections: Plot, Genes, Samples

Experiment box plot

Hovering on bars will show summary statistics for the gene expression

Box plot

Displays graphically the so-called 5-number summary of a dataset

The summary consists of the median, the upper and lower quartiles, the minimum and maximum (the range), and, possibly, individual extreme values
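The five-number summary behind a box plot can be computed with Python's standard library, as in this minimal sketch. The expression values are made up, and flagging extreme values with the 1.5 × IQR rule is an assumption (a common box plot convention), not something the slides state about the Atlas plots.

```python
import statistics


def five_number_summary(values):
    """Return the minimum, lower quartile, median, upper quartile and maximum."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


def extreme_values(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    Assumption: the 1.5 * IQR rule is one common convention for
    marking individual extreme values on a box plot.
    """
    summary = five_number_summary(values)
    iqr = summary["q3"] - summary["q1"]
    low = summary["q1"] - k * iqr
    high = summary["q3"] + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical expression values for one gene in one condition.
toy_expression = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(five_number_summary(toy_expression))
print(extreme_values(toy_expression))
```

Different tools interpolate quartiles slightly differently, so the quartile values for small samples may not match another plotting package exactly.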

Experiment line plot

Samples are grouped by condition

Clicking on an EF re-plots the same gene with the samples grouped by that factor's conditions

Experiment line plot

Hovering over a sample will display all its properties

Condensed experiment plots

Zoom in/out of plot to see all conditions

Experiment view – HTS data

Clicking on “genome view” will show the sequence in the Ensembl genome browser

Adding genes to plot

Select genes by EF

Search for gene,

list of top DE genes (default)

Add or remove a gene from the plot by clicking the small + / - next to the gene name

Viewing sample attributes

Clicking on experiment design

Export to tab delimited file

Summary

• The Gene Expression Atlas is a database that provides information about gene expression patterns within different biological conditions

• Search for differentially expressed genes either:
  • by gene name or gene attribute(s) (e.g. GO terms)
  • by biological conditions (e.g. diseases, organism parts, cell types)
  • by using both gene(s) and condition(s)

• Different output views used in the Atlas:
  • the gene page
  • the experiment page
  • the heatmap/list views

That’s all folks! Questions?