How to Use QuickGO and GO Slims for Your Research
George Georghiou
Gene Ontology Annotation Curator
www.ebi.ac.uk

What is gene ontology (GO)?
• A way to capture biological knowledge in a written and computable form
• A set of concepts and their relationships to each other, arranged as a hierarchy from less specific concepts to more specific concepts

Aspects of the Gene Ontology
Three structured ontologies that allow describing gene products in terms of their:
• biological processes
• molecular functions
• cellular components

How can scientists use the gene ontology?
• Access gene product functional information
• Obtain functional information for novel gene products
• Validation of experimental techniques
• Analyse high-throughput genomic or proteomic datasets

QuickGO is a web-based browser for looking at GO annotations
https://www.ebi.ac.uk/QuickGO

How to view all available annotations and filter them

How to read a GO annotation on QuickGO

Gene Product and Symbol list the identifier and gene name

Qualifiers determine the relation of the gene product to the GO term
GO term qualifiers
• modify the interpretation of an annotation
• available qualifiers are:
• colocalizes_with
• contributes_to
• NOT

GO term lists the GO term ID and the aspect of the ontology it represents

Annotations are supported by evidence. We use GO evidence codes that map to ECO codes.

Evidence codes used in GO curation:
IEA  Inferred from Electronic Annotation
IDA  Inferred from Direct Assay
IMP  Inferred from Mutant Phenotype
IPI  Inferred from Physical Interaction
IEP  Inferred from Expression Pattern
IGI  Inferred from Genetic Interaction
ISS  Inferred from Sequence or Structural Similarity
IGC  Inferred from Genomic Context
RCA  Reviewed Computational Analysis
TAS  Traceable Author Statement
NAS  Non-traceable Author Statement
IC   Inferred from Curator judgement
ND   No Data available

IDA:
• Enzyme assays
• In vitro reconstitution (transcription)
• Immunofluorescence
• Cell fractionation & …

TAS:
• In the literature source the original experiments are referenced.

http://www.geneontology.org/GO.evidence.shtml

Reference lists the source of the evidence for the annotation

With/From column is used to list interacting gene products, metabolites, or chemicals

Taxon lists the taxon ID for the gene product

Assigned By lists the group that created the annotation

Annotation extension allows curators to combine GO terms with terms from other ontologies, database IDs, and other GO terms

Filters can be used to help find the annotations you are interested in.

Accessing Gene Product Information
• Pick a protein and get its UniProt identifier (www.uniprot.org), or type its name into the QuickGO search bar.
• For this example, we will use human c-Src (UniProt ID: P12931)
• Let’s filter for manual experimental annotations
• Click on the ‘Evidence’ drop-down menu, select ‘All manual experimental codes’, and click Apply.
• And filter once more for annotations for Src made by UniProt
• Click on the ‘More’ drop-down menu, go to the ‘Assigned By’ filter, and select UniProt. Click Apply.

Using QuickGO to find information about your protein of interest
Click on the annotations to see what has been annotated to human c-Src
Here are all the annotations to human c-Src
You could also filter annotations for your protein of interest here if you enter the ID in the Gene Product ID menu
Let’s filter our annotations for manual experimental evidence
Finally, let’s filter for annotations from only UniProt
Here are our filtered results!
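The same filters can also be expressed as a query URL rather than clicks. The following is a minimal sketch, not an official recipe: the service path and the parameter names (`geneProductId`, `evidenceCode`, `evidenceCodeUsage`, `assignedBy`, `aspect`, `limit`) are assumptions about the QuickGO REST interface and should be checked against the QuickGO API documentation, as should the assumption that ‘All manual experimental codes’ corresponds to ECO:0000269 and its descendants. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the QuickGO annotation search service (verify before use).
QUICKGO_ANNOTATION_SEARCH = "https://www.ebi.ac.uk/QuickGO/services/annotation/search"

# ECO term for "experimental evidence used in manual assertion"; assumed here to
# correspond to the 'All manual experimental codes' filter in the web interface.
MANUAL_EXPERIMENTAL = "ECO:0000269"


def build_params(gene_product_ids, evidence_code=None, assigned_by=None,
                 aspect=None, limit=100):
    """Collect filter settings into a dict of (assumed) query parameters."""
    params = {"geneProductId": ",".join(gene_product_ids), "limit": limit}
    if evidence_code is not None:
        params["evidenceCode"] = evidence_code
        # Also match more specific evidence codes below the one given.
        params["evidenceCodeUsage"] = "descendants"
    if assigned_by is not None:
        params["assignedBy"] = assigned_by
    if aspect is not None:
        params["aspect"] = aspect
    return params


def build_url(params):
    """Turn the parameter dict into a full query URL."""
    return QUICKGO_ANNOTATION_SEARCH + "?" + urlencode(params)


# Human c-Src, manual experimental evidence, annotations assigned by UniProt.
src_params = build_params(["P12931"],
                          evidence_code=MANUAL_EXPERIMENTAL,
                          assigned_by="UniProt")
src_url = build_url(src_params)
print(src_url)
```

The sketch only builds the URL and makes no network request. Adding further filters, such as an aspect, is a matter of passing the corresponding argument.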
Let’s get a summary of what has been annotated by looking at the statistics
The Summary page gives you a basic overview of the number of annotations and gene products
We can find a list of the GO terms that c-Src was annotated to
As well as a breakdown of how many annotations are from which aspect of GO
We can also find out what types of experimental evidence were used for the annotations

Using QuickGO to find information on a novel gene product
• For your next grant, you’re interested in finding a protein from a bacterial species about which little is known experimentally.
• Use the search bar to search for DNA polymerase
• Select the results for gene products
• Let’s take a look at Bacillus cereus
• Select this from the Organism column on the left-hand side
• And now let’s take a look at DNA polymerase I (UniProt ID: A0A0K6J8L8)
• Click on the 25 annotations and examine the evidence codes for the annotations listed.

Search for DNA polymerase
Now select Bacillus cereus from the Organism column
Seeing our filtered results, let’s take a look at DNA polymerase I
Notice that our results only show annotations with evidence code IEA
There are proteins in UniProt that have no manual annotation. Electronic annotations are made using sequence homology to other proteins to give some information about what these genes may do.

Using QuickGO to validate an experimental result
• You’ve done cell fractionation using ultracentrifugation and mass spectrometry and want to verify that the proteins you are looking at are in the right cellular compartment.
• For this example, we’ll use human GAPDH (P04406), Abl kinase (P00519), and Glycogen Synthase Kinase-3 Beta (P49841).
• We only want to see annotations that come from the literature, since we are validating an experiment.
• Use QuickGO to filter for annotations to the UniProt identifiers above, then filter for aspect, then for manual experimental evidence, and finally for PMID references

Use the Gene Product ID drop-down to filter for your proteins of interest
Filter for the Cellular Component aspect
Now filter for experimental evidence
Now filter for PMID references and hit Apply
Let’s take a look at our results
• We see that these proteins are found in either the nucleus or cytoplasm.
• Clicking on the PMID in the Reference column of the table takes you to that paper, so you can check the data for yourself.
• Alternatively, we can perform the same task with a GO slim

What is a GO slim and why are they useful?
• GO slims are cut-down versions of the GO ontologies containing a subset of the terms in the whole GO. They give a broad overview of the ontology content without the detail of the specific fine-grained terms.
• GO slims are particularly useful for giving a summary of the results of GO annotation of a genome, microarray, or cDNA collection when broad classification of gene product function is required.
• GO slims are created by users according to their needs, and may be specific to species or to particular areas of the ontologies.

Analyzing a proteomic dataset
• You’ve done an immunoprecipitation for your protein of interest from a mouse cell line and followed up with mass spectrometry analysis to determine what proteins interact with it. You’ve found 25 proteins that are potential interactors and want to know their biological role.
• For this example, use the pre-defined mouse GO slim (goslim_mouse) and click ‘Add terms to selection’
• Copy and paste the UniProt IDs from GO_Slim_Target.docx into the ‘Gene Product ID’ field and click ‘Add to selection’
• Click on Apply and examine the results by going to the ‘Statistics’ page

How do we set up GO slims?
View, select, and create GO slims
Select the mouse GO slim and click on ‘Add terms to selection’
Paste in the UniProt IDs from the target list and click ‘Add to selection’
GO slim results from our proteomic analysis
GO term in annotation that maps to slimmed term
Term used in slim
Some annotations slim up to two GO terms
View the ‘Slim Statistics’ in the ‘Statistics’ tab

Tips for creating a GO slim
1. When creating a slim for the entire genome, you should try to make sure that it covers as many annotated genes in your set as possible.
• You should be aware of how many genes are annotated but not in your slim, and how many are "unknown" (i.e., annotated only to the root node).
2. For display purposes you usually want to keep the number of terms as small as possible to convey your results. However, you should ensure that the terms you include are specific enough to capture biologically relevant information.
• Many terms (e.g. metabolic process or cellular process) are too general for the purpose of most slim-based analyses.
3. On a related note, if you are using your slim for data analysis (e.g. to summarize an enrichment), you should ensure that the terms are specific enough to demonstrate their relevance to the biological topic of interest.
• For example, lumping all genes involved in transport may mask overrepresentation of transmembrane transport vs. underrepresentation of vesicle-mediated transport in your results set, so you need to ensure that the slim has categories to represent your results effectively.
4. Most current implementations of software to create "GO slims" include the regulates relationship by default, so that (for example) genes involved in regulation of cytokinesis will be included with the set of genes annotated to cytokinesis.

These tips are courtesy of PomBase
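To make the slimming step and tip 4 concrete, here is a small sketch of how an annotated term can be mapped up to slim terms by walking its parent relationships. It is an illustration under stated assumptions, not how QuickGO or any particular slimming tool is implemented. The ontology fragment is a toy: the term names echo the examples above, but the relationships, the term "ion transmembrane transport" as used here, and the helper name `slim_terms` are hypothetical. This simple version reports every slim term reachable from the annotated term.

```python
# Toy ontology fragment: term -> list of (relationship, parent term).
# Illustrative only; not the real GO structure or real GO identifiers.
PARENTS = {
    "ion transmembrane transport": [("is_a", "transmembrane transport"),
                                    ("is_a", "ion transport")],
    "transmembrane transport": [("is_a", "transport")],
    "ion transport": [("is_a", "transport")],
    "vesicle-mediated transport": [("is_a", "transport")],
    "regulation of cytokinesis": [("regulates", "cytokinesis")],
    "cytokinesis": [("is_a", "cell cycle process")],
}

# A hypothetical slim that keeps the transport categories separate (see tip 3).
SLIM = {"transmembrane transport", "ion transport",
        "vesicle-mediated transport", "cytokinesis"}


def slim_terms(term, slim, parents, relations=("is_a", "part_of", "regulates")):
    """Return every slim term reachable from `term` over the allowed relations."""
    found, seen, stack = set(), set(), [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        if current in slim:
            found.add(current)
        for relation, parent in parents.get(current, []):
            if relation in relations:
                stack.append(parent)
    return found


# One annotation can slim up to two slim terms.
two_hits = slim_terms("ion transmembrane transport", SLIM, PARENTS)

# With 'regulates' followed, regulation of cytokinesis is counted under cytokinesis.
with_regulates = slim_terms("regulation of cytokinesis", SLIM, PARENTS)

# Without 'regulates', the same annotation maps to no slim term at all.
without_regulates = slim_terms("regulation of cytokinesis", SLIM, PARENTS,
                               relations=("is_a", "part_of"))

print(sorted(two_hits), sorted(with_regulates), sorted(without_regulates))
```

Whether to follow the regulates relationship is an analysis decision: following it gives broader coverage, while leaving it out keeps genes that regulate a process separate from genes that carry it out.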