Tools for Gene Enrichment Analysis: DAVID, WebGestalt, and GSEA

Rolando Garcia-Milian
Biomedical Sciences Research Support

Contents

The Database for Annotation, Visualization and Integrated Discovery (DAVID)
“Web-based Gene Set Analysis Toolkit” (WebGestalt)
Gene Set Enrichment Analysis (GSEA)
    File formatting
References
Glossary of terms and databases

For this demo, we will use the Gene Expression Omnibus (GEO) Series GSE15947 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15947) (Kovalenko, Zhang, Cui, Clinton, & Fleet, 2010) as the qualitative gene enrichment analysis example. All screenshots were taken between January and February 2015.

GEO Dataset: GSE15947
Platform: [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array

We will analyze the effect of a 6-hour treatment with 1,25-dihydroxyvitamin D (100 nM) on the transcript profile of proliferating RWPE1 cells, an immortalized, non-tumorigenic prostate epithelial cell line.

NOTE: You can go directly to the gene enrichment analysis tools (starting with the DAVID section) without obtaining the gene list from GEO; the steps below only show the origin of the gene list used in this demo.

GSE15947 was analyzed with GEO2R directly from the Series page.
Go to the Series page by clicking on the link http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15947. Scroll down the page and click on the “Analyze with GEO2R” link.

Click on the “Define groups” link and, inside the box under “Enter a group name”, enter a name for the control group and for the group treated with vitamin D for 6 hours (CTR and VITD in this example). Select the samples that belong to the CTR group and those that belong to the VITD group.

Scroll down and click on the “Top 250” button located under the “GEO2R” tab to run the analysis.

Once NCBI returns the results of the analysis, click on the “Save all results” link. This will open a new tab with all results (this might take several minutes). Save the results by clicking on “File” > “Save As”. The results will be saved as a *.txt file. Open Excel and import this file.

For this demo, the differentially expressed genes (experimental versus control samples) were selected using arbitrary cutoffs: a P-value ≤ 0.05 and a log2 fold change ≥ 1. The resulting gene list is shown below.
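The cutoff step described above can also be done outside Excel. The following is a minimal Python sketch, not the procedure used to produce the handout's list. It assumes the saved GEO2R file is tab-delimited and has columns named `P.Value`, `logFC`, and `Gene.symbol`; check the header of your own file, since these names are an assumption here. The sample rows are made up for illustration, and the function name `select_genes` is hypothetical.

```python
import csv
import io

# Hypothetical excerpt of a GEO2R export. The column names are assumed and
# the values are invented for illustration; they are not real results.
SAMPLE = (
    "ID\tadj.P.Val\tP.Value\tt\tB\tlogFC\tGene.symbol\n"
    "probe_1\t0.01\t0.0001\t9.1\t4.2\t3.2\tCYP24A1\n"
    "probe_2\t0.04\t0.003\t5.0\t1.1\t1.4\tIGFBP3\n"
    "probe_3\t0.90\t0.6\t0.5\t-6.0\t0.1\tGAPDH\n"
    "probe_4\t0.02\t0.001\t6.3\t2.0\t2.0\t\n"
    "probe_5\t0.03\t0.002\t5.8\t1.5\t2.5\tCYP24A1\n"
    "probe_6\t0.20\t0.04\t2.2\t-3.0\t0.8\tKLF4\n"
)


def select_genes(handle, p_max=0.05, lfc_min=1.0):
    """Return the sorted, unique gene symbols that pass both cutoffs."""
    genes = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        symbol = row["Gene.symbol"].strip()
        if not symbol:
            continue  # probe without an annotated gene symbol
        if float(row["P.Value"]) <= p_max and float(row["logFC"]) >= lfc_min:
            genes.add(symbol)  # several probes may map to the same gene
    return sorted(genes)


gene_list = select_genes(io.StringIO(SAMPLE))
print("\n".join(gene_list))
```

To run this on a real export, replace `io.StringIO(SAMPLE)` with an open file handle. Note that the sign of the log2 fold change depends on the order in which the groups were defined in GEO2R, so confirm which direction corresponds to VITD versus CTR before applying the cutoff.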
ABCA1 ADRB1 AKR1C1 APCDD1 ARL4D BTBD11 CALML3
ABHD4 AFAP1L2 AKR1C2 ARHGEF16 ARSI C10orf54 CAMK2G
ADAMTS15 AIFM2 ALOX5AP ARHGEF28 BARX2 C1QTNF5 CCDC88C
ADM AKAP12 AMOTL1 ARHGEF37 BCL3 C2CD2 CD14
CD97 EFTUD1 GEM LMCD1 OSR2 RHOF ST3GAL5
CDA EGLN1 GPCPD1 LMO2 P2RY2 RHPN2 SULF1
CEBPB EGR3 GRAMD4 LOC100506718 PADI3 RNF144B SYT12
CGN EHBP1L1 GRK5 LOC284837 PCDH7 RNF24 TACSTD2
CHAC1 ELFN2 HAS3 LOX PDE2A RNF44 TCF7L2
CHRM3 EOMES HBEGF LRIG3 PDGFA RTN4R TFCP2L1
CHST11 ETNK2 HCAR3 LYPD3 PDPN RYR1 TGFB2
CITED2 ETS2 HES1 LYPD5 PER1 SEC14L1 THBD
CITED4 EVA1A HRCT1 MAFB PHACTR3 SEMA3B TINAGL1
CLCF1 EXTL3 ID1 MALL PHLDA1 SEMA3F TMEM37
CLDN1 FAM129A IER3 MARCH3 PITPNC1 SEMA4B TMEM40
CLDN11 FAM20C IER5L MCAM PLAT SEMA6D TMEM79
CLDN23 FAM43A IFITM10 MED24 PLAU SERPINB1 TNFAIP2
CLMN FBLIM1 IGFBP3 METRNL PLCD4 SERPINB13 TNS3
CRABP2 FIBIN IL1B MEX3B PLD6 SERPINB2 TPST1
CRLF1 FJX1 IL6 MFSD2A PLEKHG3 SERPINE1 TRAF4
CST6 FLI1 IL6R MICAL3 PLXNA2 SESN3 TRIM6
CXXC5 FOS INSIG2 MN1 PPP1R3C SH3TC1 TSKU
CYGB FOSL1 IRAK2 MOK PRICKLE1 SHB TSLP
CYP1A1 FOXK1 ISL1 MTSS1 PRR16 SHE TWIST2
CYP24A1 FOXQ1 ITPRIP MYLIP PTAFR SHISA9 TXNRD1
CYP26B1 FST KCNJ15 NANOS1 PTGER4 SLC12A7 UCA1
CYR61 FZD8 KCNJ2 NEFL PTGES SLC22A23 USP2
DENND6B G0S2 KIAA1324L NET1 PTGS2 SLC37A2 VEGFC
DLK2 G6PD KIF26A NFE2L2 PTHLH SLC45A4 VPS37B
DNMBP GADD45A KIF3C NFKBIA PTPN1 SLC46A1 WDSUB1
DUSP1 GATA2 KLF4 NGF RASSF5 SMIM3 WNT7A
DUSP10 GATA6 KLK10 NINJ1 RFFL SNN ZBED2
EDN1 GATSL3 KLK6 NUAK2 RFX2 SNX8 ZFP36
EFNB2 GCLC LAMB3 RGCC SOSTDC1 ZNF436

The Database for Annotation, Visualization and Integrated Discovery (DAVID)
National Institute of Allergy and Infectious Diseases (NIAID), NIH

Open the DAVID (Huang da, Sherman, & Lempicki, 2009a, 2009b) home page (http://david.abcc.ncifcrf.gov/). Click on “Start Analysis” on the top menu bar. A new window will open.

Copy the above list of genes and paste it in the box “A. Paste a list” under “Step 1: Enter Gene List”.

Select “OFFICIAL_GENE_SYMBOL” under “Step 2: Select Identifier”.
Select “Gene List” under “Step 3: List Type” and click on the “Submit List” button.

A warning window will open if more than one species is detected. If this happens, click on the “OK” button and a new page will open. Select “Homo sapiens” from the list of species and click on the “Select Species” button to set it as the background species. Then click on the “Functional Annotation Tool” link located under “Step 2. Analyze above gene list with one of DAVID tools”.

Please note that DAVID recognized only 227 IDs out of the 237 IDs in the user list.

By default, DAVID uses all genes in the genome as the population background for the enrichment calculation. The default background is a good choice for studies with a genome-wide or nearly genome-wide scope. In this case, we will select the DAVID pre-built Affymetrix background (Affymetrix Human Genome U133 Plus 2.0 Array), since it was the platform used in this experiment. Click on the “Background” tab of the left-hand blue menu and select “Human Genome U133 Plus 2 Array”.

Make sure that the list is set to “Homo sapiens” and that the background reads “Current Background: Human Genome U133 Plus 2” by reviewing “Step 1. Successfully submitted gene list”.

Follow DAVID’s “Step 2. Analyze above gene list with one of DAVID tools”. The table in the DAVID FAQ may help you decide which DAVID tool to choose (http://david.abcc.ncifcrf.gov/content.jsp?file=FAQs.html#25).

A detailed explanation of how to interpret DAVID results can be found here: http://david.abcc.ncifcrf.gov/content.jsp?file=functional_annotation.html.

For this demo, we will use the “Functional Annotation Clustering” tool. Click on the “Functional Annotation Clustering” link. A new page will open; click on the “Functional Annotation Clustering” button under the “Combined View for Selected Annotation” section. A new window will open showing the different annotation clusters resulting from the enrichment analysis.
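The choice of background matters because it changes the expected number of list genes in any category. The following Python sketch illustrates this with a plain hypergeometric over-representation test. It is not DAVID's exact calculation (DAVID reports a modified Fisher exact p-value, the EASE score), and apart from the 227 recognized IDs, every number below is a made-up assumption chosen only to show the effect. The function name `hypergeom_upper_tail` is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k annotated genes in a list of
    n genes drawn from a background of N genes, K of which are annotated."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


n_list = 227         # IDs recognized by DAVID in this demo (from the handout)
k_hits = 12          # hypothetical: list genes annotated to one pathway
K_pathway = 300      # hypothetical: background genes annotated to that pathway
N_genome = 20000     # hypothetical genome-wide background size
N_array = 10000      # hypothetical, smaller platform-specific background size

p_genome = hypergeom_upper_tail(k_hits, n_list, K_pathway, N_genome)
p_array = hypergeom_upper_tail(k_hits, n_list, K_pathway, N_array)

print(f"genome-wide background: p = {p_genome:.3g}")
print(f"array background:       p = {p_array:.3g}")
```

With the same list and pathway, the smaller background gives a larger (less significant) p-value. This is why a background that is too broad for the platform can overstate enrichment. In a real analysis the number of annotated genes in the background would also change with the background, which this sketch deliberately holds fixed.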
The “Functional Annotation Clustering” function reduces the redundancy of the annotations by grouping similar annotations together and reporting them as clusters. Click on the “Functional Annotation Clustering”. A new page will open.

Go back to the “Annotation Summary Results” page. To explore the pathways involved, click on the “+” sign next to “Pathways”. A box will open showing the different pathway databases and the results for each database. Click on the “Chart” button next to “KEGG_PATHWAY”. A new page will open showing the enriched pathways for this gene list.

Click on the “Pathways in cancer” link to open this pathway. The enriched genes will be highlighted on the pathway.

NOTE: The presentation that accompanies this handout contains some examples of how to report the DAVID results.

“Web-based Gene Set Analysis Toolkit” (WebGestalt)
http://bioinfo.vanderbilt.edu/webgestalt/

For this demo, we will use the same gene list as in the DAVID demo, generated from the Gene Expression Omnibus Series GSE15947 (http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE15947) (Kovalenko, Zhang, Cui, Clinton, & Fleet, 2010).

Platform: [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array

Go to the WebGestalt (Wang, Duncan, Shi, & Zhang, 2013; Zhang, Kirov, & Snoddy, 2005) home page http://bioinfo.vanderbilt.edu/webgestalt/ and click on the “START” link to begin the analysis. A new web page will open.

“Select the organism of interest” from the drop-down menu depending on your gene list. For this example, we will select human (“hsapiens”).

“Select gene ID type” from the drop-down menu located right under the organism menu. In this example, we will select official gene symbols (“hsapiens_gene_symbol”).

Copy the gene list that we used for the DAVID demo (at the beginning of this handout), paste it into the box under “Upload gene list”, and click on the “Enter” button. A new page will open.
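Before pasting a gene list into DAVID or WebGestalt, it can help to normalize it to one unique symbol per line. The Python sketch below does this and also flags date-like tokens such as "3-Mar", which is how spreadsheet software can silently rewrite symbols like MARCH3 when a file is imported. The function name `clean_gene_list` and the sample input are hypothetical; this is an optional convenience, not part of either tool.

```python
import re

# Tokens such as "3-Mar" or "2-Sep" usually mean a spreadsheet converted a
# gene symbol (for example MARCH3) into a date during import.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(jan|feb|mar|apr|may|jun|jul|aug|sep|oct|nov|dec)$",
    re.IGNORECASE,
)


def clean_gene_list(text):
    """Split pasted text on whitespace, drop duplicates (keeping the first
    occurrence), and set aside date-like tokens for manual review."""
    seen = set()
    symbols = []
    suspicious = []
    for token in text.split():
        if DATE_LIKE.match(token):
            suspicious.append(token)
            continue
        if token not in seen:
            seen.add(token)
            symbols.append(token)
    return symbols, suspicious


# Hypothetical pasted input mixing spaces, tabs, and line breaks.
pasted = "ABCA1 ADRB1\nMARCH3\tABCA1 3-Mar CYP24A1"
symbols, suspicious = clean_gene_list(pasted)

print("\n".join(symbols))            # one symbol per line, ready to paste
print("check manually:", suspicious)
```

Any token reported for manual review should be traced back to the original results file and replaced with the correct symbol before running the enrichment analysis.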
Please note that WebGestalt recognized 234 IDs out of the 237 IDs in the user list.

Under “Enrichment Analysis”, select and click on “GO Analysis” from the drop-down menu.

Under “Select Reference Set for Enrichment Analysis”, select and click on “hsapiens_affy_hg_plus_2”, since the platform used was the Affymetrix Human Genome U133 Plus 2.0 Array. For the purposes of this demo, the rest of the parameters are left at their defaults. Click on the “Run the Enrichment Analysis” button. A new tab will open. Click on the “View results” button.

A new tab will open containing the significantly enriched GO categories under Biological Process, Molecular Function, and Cellular Component, shown as three separate Directed Acyclic Graphs (DAGs) on one page.

In addition, you can run a “GO Slim Classification” analysis by selecting this option from the analysis
