Gene Ontology Analysis with Cytoscape
Introduction to Systems Biology
Exercise 6: Gene Ontology Analysis

Overview:
Gene Ontology (GO) is a useful resource in bioinformatics and systems biology. GO defines a controlled vocabulary of terms for biological process, molecular function, and cellular location, and relates the terms to one another in a hierarchy in which a term may have more than one parent. This exercise outlines the resources available under Cytoscape to perform analyses for the enrichment of gene annotation (GO terms) in networks or sub-networks.

In this exercise you will:
• Learn how to navigate the Cytoscape Gene Ontology wizard to apply GO annotations to Cytoscape nodes.
• Learn how to look for enriched GO categories using the BiNGO plugin.

What you will need:
• The BiNGO plugin, developed by the Computational Biology Division, Dept. of Plant Systems Biology, Flanders Interuniversity Institute for Biotechnology (VIB), described in Maere S, et al. Bioinformatics. 2005 Aug 15;21(16):3448-9. Epub 2005 Jun 21.
• galFiltered.sif, used in the earlier exercises and found under sampleData.

Please download any missing files from http://www.cbs.dtu.dk/courses/27041/exercises/Ex3/

Download and install the BiNGO plugin, as follows:
1. Get BiNGO from the course page (see above) or go to the BiNGO page at http://www.psb.ugent.be/cbd/papers/BiNGO/. (This site also provides documentation on BiNGO.)
2. Unzip the contents of BiNGO.zip to your Cytoscape plugins directory (make sure the BiNGO.jar file is in the plugins directory itself and not in a subdirectory).
3. If you are currently running Cytoscape, exit and restart.

First, we will load the Gene Ontology data into Cytoscape.
1. Start Cytoscape. Under Edit -> Preferences, make sure your default species, “defaultSpeciesName”, is set to “Saccharomyces cerevisiae”. Set it if needed.
2. Restart Cytoscape if you had to set the defaultSpeciesName. Now load the network galFiltered.sif. As you may recall, this file contains a network of yeast (S. cerevisiae) proteins from the galactose pathway.
3. Under the File menu in your Cytoscape Desktop, select Load, then Bio Data Server. Go to the Cytoscape2.1/annotation directory and load the “manifest” file (the annotation directory is in the same location as the plugins directory).
4. Now, we will apply Gene Ontology annotations to the nodes in the network, and browse through them.
5. On the Cytoscape Desktop, click on the Annotation button, shown below:
6. This will bring up a window labeled Annotation, shown below:
7. Click on the + sign next to GO, Biological Process. This should bring up one link per GO level, 1 through 13. These levels correspond to the depth of the tree to work at: higher levels represent more general classifications.
8. Click on 3, and click the button labeled Apply Annotation to All Nodes.
9. On the right-hand panel, an entry should appear labeled GO Biological Process (level 3) with a + sign at the left. Click on the + sign to expand this list.
10. The right-hand panel should now list the Level 3 terms, as shown below:
11. On the right-hand panel, click on the term cell cycle. Notice how this action selects several nodes on the Cytoscape canvas.
12. The Level 3 GO Biological Process annotations are now available to the Node Attribute Browser, and may be mapped as node labels.
13. In the Annotation window, select some other Level 3 GO Biological Process terms, and notice which nodes are selected on the Cytoscape canvas. See if you can find any sections of the network with a concentration of a particular annotation term.

Here we will use the BiNGO plugin to see if GO term enrichment is statistically significant.
1. Select a candidate sub-network of your choosing, or the first neighbors of RAP1 or GAL4 (e.g. Select -> Nodes -> By Name, enter “RAP1” or “GAL4” and click “Search”; then Select -> Nodes -> First neighbors of selected nodes).
2. Select these nodes and all edges into a new network, creating a child network. This will make subsequent steps in this tutorial easier.
3. Select all the nodes in this child network.
4. In the Plugins menu, select BiNGO. This should bring up a window called BiNGO Settings.
5. Fill in your BiNGO Settings as follows:
   1. Give your cluster a short name such as "test".
   2. Leave the Get Cluster from Network box checked.
   3. Under Select a statistical test, select Hypergeometric. Binomial testing is used when the amount of data is very large, but hypergeometric testing is appropriate for most Cytoscape usage scenarios.
   4. Under Select a multiple testing correction, choose Benjamini & Hochberg False Discovery Rate (FDR). This is less conservative than the Bonferroni correction, but still sufficient for most cases.
   5. Under Choose a significance level, enter 0.05. This threshold controls which GO classes are detailed in the output. This is not a conservative threshold, but later you can interactively choose GO classes with lower p-values.
   6. Under Select the categories to be visualized, select Overrepresented after correction. With very few exceptions, this is the setting you will want.
   7. Under Select reference set, select Test cluster versus complete annotation. This will compare your set of nodes to all genes in the yeast genome.
   8. Under Select ontology, select GO Biological Process.
   9. Click Start BiNGO.
6. After a brief pause, a network will appear on your canvas, such as the one shown below.
7. Within this network:
   o Each node represents a GO term, and is labeled accordingly (zoom into the network to see labels).
   o The topology depicts the hierarchy of GO biological processes. Apply the yFiles hierarchical layout to display it.
   o The yellow and orange nodes represent terms with significant enrichment, with darker orange representing a higher degree of significance, as shown by the legend on your screen:
   o White nodes are terms with no significant enrichment, but are included because they have a significant child term. Branches of GO with no significant terms are not shown.
   o The size of each node in a BiNGO graph is proportional to the number of nodes in your query set with that term.
8. Go to the Visual Styles editor (Visualization -> Set Visual Properties -> Define -> Node Label), and look at the available attributes under Map Attribute. You should see several more, including:
   o description_test: the name of the GO biological process.
   o adjustedPValue_test: the p-value for the node, adjusted for multiple hypothesis testing (note that the unadjusted p-value is also there, with the name pValue_test, but this p-value is less useful for most applications).
   o n_test: the number of genes in the yeast genome with this GO term.
   o x_test: the number of nodes that you have selected which have this GO term.
   o N_test: the total number of genes in the yeast genome with GO annotations.
   o X_test: the total number of genes that you have selected.
   These last four quantities are used in the calculation of the adjusted p-value.
9. Select these attributes. Now, select some nodes in your BiNGO graph, and look at their attributes under the Node Attribute Browser.
10. Select some of these terms, and browse through the nodes in your BiNGO graph.
11. Here is a good case for Cytoscape's hiding controls. When we zoom into this network, we see the following: Notice how the region on the right contains two nodes of marginal significance, plus several nodes of no significance included because they are parents of these nodes.
12. Select these nodes. Click on the Hide Selected Region button (shown). These nodes will disappear from the canvas.
13. You can make these nodes visible again with the Show All Nodes and Edges button (below).
14. BiNGO will optionally produce an output file listing the p-values of all nodes with significant enrichment, as follows:
   i. Return to the child network you created previously, and make sure that all nodes are still selected.
   ii. Return to your BiNGO Settings window.
   iii. Specify a new cluster name: test2.
   iv. Near the bottom of the window, tick the option labeled Check box for saving data.
   v. Click on the button labeled Save BiNGO Data file in:, and select a directory for BiNGO's output file.
   vi. Rerun BiNGO.
   vii. In the specified directory, you should now have a file called test2.bgo. Your screen should show a new window titled test2 BiNGO Results. Both of these should summarize your BiNGO parameters, and report on the enrichment of all terms meeting your p-value threshold.
15. Recall that this BiNGO graph reports on the enrichment of a sub-network centered on RAP1, GAL4 or the particular nodes you selected. Recall also that the entire network consists of nodes involved in a single pathway: galactose utilization. So when we look at the enriched GO terms in your BiNGO graph, which terms relate to galactose utilization in general, and which relate to your sub-network specifically? Here, we shall see how to answer that question.
   i. Return to your parent network and verify that the sub-network is still selected.
   ii. Return to your BiNGO Settings window, and choose Test cluster versus network under Select reference set. Specify a new name in the Cluster name: box, "test3".
   iii. Rerun BiNGO.
   iv. Compare the new BiNGO network against the old one. You should see fewer significant GO terms in the new BiNGO network. Which terms are lost? These are probably associated with galactose utilization in general.
   v. Go to the Node Attribute Browser, and click on Select Attributes. Note that the available attributes include adjustedPValue_test (reporting enrichment against the complete genome) and adjustedPValue_test3 (reporting enrichment against the galFiltered network). Select these two attributes for a side-by-side comparison of the p-values of some nodes in the BiNGO graphs.
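
To make the roles of x, n, X and N more concrete, the following Python sketch shows how a hypergeometric enrichment p-value and a Benjamini & Hochberg adjustment can be computed from these four quantities. It is a textbook illustration only and is not taken from BiNGO's own code, so small numerical differences from BiNGO's output are possible. The function names and the example counts are hypothetical and chosen purely for illustration.

```python
from math import comb


def hypergeom_pvalue(x, n, X, N):
    """Probability of seeing at least x genes with a GO term among X
    selected genes, when n of the N annotated genes carry that term
    (sampling without replacement)."""
    upper = min(n, X)
    total = comb(N, X)
    tail = sum(comb(n, k) * comb(N - n, X - k) for k in range(x, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return Benjamini & Hochberg adjusted p-values, in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values never decrease with rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical counts, NOT real yeast annotation numbers:
# X = 20 selected genes out of N = 6000 annotated genes.
X, N = 20, 6000
terms = {
    "term A": (5, 100),  # (x, n): 5 of the 20 selected genes, 100 genome-wide
    "term B": (2, 500),
    "term C": (1, 50),
}

raw = [hypergeom_pvalue(x, n, X, N) for x, n in terms.values()]
adj = benjamini_hochberg(raw)
for name, p, q in zip(terms, raw, adj):
    print(f"{name}: pValue={p:.3g} adjustedPValue={q:.3g}")
```

Changing N and n to counts taken from the network rather than from the whole genome corresponds to the switch from Test cluster versus complete annotation to Test cluster versus network: the same x and X are then judged against a different background.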