Gene Ontology - A Way Forwards
Ruth Lovering, Varsha Khodiyar, Pete Scambler, Mike Hubank, Rolf Apweiler and Philippa Talmud

Centre for Cardiovascular Genetics, UCL Department of Medicine, Rayne Institute, 5 University Street, London WC1E 6JF.
Molecular Medicine Unit, Institute of Child Health, 30 Guilford Street, London WC1N 1EH.
Molecular Hematology and Cancer Biology Unit, Institute of Child Health, 30 Guilford Street, London WC1N 1EH.
European Bioinformatics Institute, Hinxton, Cambridge, CB10 1SD.

Gene Ontology (GO) provides a controlled vocabulary to describe the attributes of genes and gene products in any organism. This resource is proving highly useful for researchers investigating complex phenotypes such as cardiovascular disease, as well as for those interpreting results from high-throughput methodologies. By providing current functional knowledge in a format that can be exploited by high-throughput technologies, the GO Consortium (GOC) provides a freely available key public annotation resource that can help bridge the gap between data collation and data analysis (www.geneontology.org).

The UCL-based GO annotation team aims to work with bench scientists to improve the annotation of human proteins. Improvements in the GO annotation of your favourite genes will lead to an improved public resource for everyone.

For more information about contributing to the annotation, contact [email protected]
www.cardiovasculargeneontology.com
Grant: SP/07/007/23671

Gene Ontology provides a systematic language for the description of gene product attributes in three key domains:
• Biological Process
• Molecular Function
• Cellular Component

Annotation
GO terms are associated with gene products (proteins).

Distribution of Data
GO annotations are available through major biological databases and numerous high-throughput analysis GO tools.

Large number of uses
• Biomarker discovery
• Enhancing annotation of any genome
• Validation of cell separation methodologies
• Identification of disease-associated processes
• Quick access to information about individual proteins
• Validation of automated ways of deriving gene information
• Drug therapies based on process variations between individuals
• Identification of predominant activities within a specific group of proteins
• Identification of common pathways targeted by different pathogens, proteins etc.

How is GO used?

Proteomes and differentially regulated mRNAs can be analysed with GO data to provide an overview of the predominant activities the constituents are involved in, or of where they are normally located [1]. Furthermore, generating hypotheses to explain proteome-wide alterations in response to certain diseases, such as cardiac hypertrophy [2], or stress states, such as hypoxia [3], often relies on GO annotation data. The ability to review experimental results with respect to known functional information has also proved useful when investigators need to select a subset of proteins to analyse in greater depth in order to identify new sets of disease biomarkers [4,5]. GO data also provides an indispensable resource to indicate the success of subcellular enrichment strategies or large-scale confocal microscopy analyses [6,7]. Already, drug treatments are being tailored according to molecular pathway imbalances, detected through individual-specific microarray or proteomic data.

High-throughput technologies and research into multi-factorial diseases are also highlighting how highly investigated proteins in one field of biology are relevant to processes associated with another field of biology. For example, in the central figure, several genes (IL-6, IL-8, STAT3 and TNF-alpha) are associated with the TNF-alpha pro-inflammatory signalling pathway and are also associated with cardiovascular disease.

[Central figure: Inhibitory action of lipoxins on pro-inflammatory TNF-alpha signalling. Source: MetaCore Map, GeneGO, www.genego.com. Labelled nodes include TNF alpha, PTPN11, IKBKG, MAP3K14, CHUK, SFN and the YWHA family, NFKB1A, NFKB1, IL-6, FOXO1, CDKN1B and CCNE1 (Cyclin E1).
Key: Activation, Inhibition, Unspecified; Associated with Cardiovascular Disease; Kinase, Phosphatase, Phospholipase, Protein, Transfactor, Phospholipid, Ligand, Binding protein, Receptor, GPCP, Protein Family; Cytoplasm, Extracellular, Plasma Membrane, Nucleus; B Binding, CR Class relation, CS Complex subunit, IE Influence on expression, +P Phosphorylation, TR Transcription regulation, Z Catalysis.]

References
1. Pasini, E.M., Kirkegaard, M., Mortensen, P., et al. In-depth analysis of the membrane and cytosolic proteome of red blood cells. Blood, 2006, 108: 791-801.
2. Pan, Y., Kislinger, T., Gramolini, A.O., et al. Identification of biochemical adaptations in hyper- or hypocontractile hearts from phospholamban mutant mice by expression proteomics. Proc Natl Acad Sci U S A, 2004, 101: 2241-2246.
3. Boraldi, F., Annovi, G., Carraro, F., et al. Hypoxia influences the cellular cross-talk of human dermal fibroblasts. A proteomic approach. Biochim Biophys Acta, 2007, 1774: 1402-1413.
4. Shi, M., Jin, J., Wang, Y., et al. Mortalin: a protein associated with progression of Parkinson disease? J Neuropathol Exp Neurol, 2008, 67: 117-124.
5. Perco, P., Wilflingseder, J., Bernthaler, A., et al. Biomarker candidates for cardiovascular disease and bone metabolism disorders in chronic kidney disease: A systems biology perspective. J Cell Mol Med, 2008.
6. Kislinger, T., Rahman, K., Radulovic, D., et al. PRISM, a generic large scale proteomic investigation strategy for mammals. Mol Cell Proteomics, 2003, 2: 96-106.
7. Barbe, L., Lundberg, E., Oksvold, P., et al. Toward a confocal subcellular atlas of the human proteome. Mol Cell Proteomics, 2008, 7: 499-508.

Spot the Difference

[Histogram: Number of publications and GO terms associated with lipoxins/TNF-alpha signalling pathway proteins. Axes: Number of Unique GO Terms (0 to 80) and Log Number of Publications (0 to 6), plotted against Gene Symbol. Series: Unique GO terms, Publications.
Gene symbols: IL8, IL6, TNF, SFN, IL1B, JAK1, AKT2, AKT3, PLD1, AKT1, RELA, CDK2, CHUK, RIPK1, IKBKB, STAT3, IKBKG, FPRL1, TRAF2, NFKB1, PDPK1, CCNE1, FOXO1, SOCS1, PRKCZ, TRADD, PIK3R2, PIK3R1, ERLIN1, PIK3CA, PIK3CB, NFKBIE, NFKBIB, NFKBIA, YWHAZ, PIK3CD, YWHAE, YWHAB, PTPN11, YWHAH, YWHAQ, YWHAG, CDKN1B, TNFRSF1, MAP3K14, TNFRSF1, PPAPDC2.]

Completing the annotation of every gene product using Gene Ontology (GO) is a substantial undertaking, especially for highly investigated genes. Consequently, at present, there is wide variation in the quality and quantity of annotations associated with different proteins.

QuickGO (www.ebi.ac.uk/ego) views of the GO terms associated with TNF-alpha, IL-6 and CCNE1 (above) and the histogram, to the right, illustrate the variation in the number of unique GO terms associated with human proteins. This variation is not simply a reflection of the current knowledge about these proteins. Thousands of publications describe TNF-alpha and IL-6, and yet there are over twice as many GO terms associated with TNF-alpha (68) as there are with IL-6 (28). This difference is due to the time constraints facing GO curators. At present there are only 2 projects (funding 4 curators) that prioritise the comprehensive annotation of human genes. IL-1B, IL-6, PTPN11 and TNF-alpha have been prioritised for annotation by the Cardiovascular GO Annotation Initiative; however, of these, only TNF-alpha has been annotated by this project to date.

The quality of annotations also varies between proteins. Proteins annotated mostly through automated methods tend to have more general GO terms (see CCNE1), whereas proteins with annotations made by GO curators, based on published experimental evidence, tend to have more specific GO terms (see TNF-alpha).
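
The two measures discussed above, the number of unique GO terms per protein and the share of annotations made by automated methods, can be computed from a GO annotation file. The following is a minimal sketch only, not the method used to produce the histogram. It assumes tab-separated rows in the GAF layout (gene symbol in column 3, GO ID in column 5, evidence code in column 7) and treats the evidence code IEA (Inferred from Electronic Annotation) as marking an automated annotation. The function name, the gene symbols GENE_A and GENE_B, the identifiers and the GO IDs in the sample are hypothetical placeholders, not real annotation records.

```python
from collections import defaultdict

# Illustrative GAF-style rows, NOT real annotation data.
# GENE_A / GENE_B, the X0000n identifiers and the GO IDs are placeholders.
SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.1",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000001\tPMID:0\tIDA\t\tP",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000002\tPMID:0\tIMP\t\tP",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000001\tPMID:0\tIPI\t\tP",
    "UniProtKB\tX00001\tGENE_A\t\tGO:0000003\tPMID:0\tIEA\t\tC",
    "UniProtKB\tX00002\tGENE_B\t\tGO:0000004\tPMID:0\tIEA\t\tF",
])


def summarise_annotations(gaf_text):
    """Return, per gene symbol, the unique GO term count and the
    fraction of annotation rows carrying the IEA evidence code."""
    terms = defaultdict(set)      # symbol -> set of distinct GO IDs
    total = defaultdict(int)      # symbol -> number of annotation rows
    automated = defaultdict(int)  # symbol -> number of IEA rows

    for line in gaf_text.splitlines():
        # Skip blank lines and GAF header/comment lines (start with "!").
        if not line or line.startswith("!"):
            continue
        cols = line.split("\t")
        if len(cols) < 7:
            continue  # malformed row: not enough columns
        symbol, go_id, evidence = cols[2], cols[4], cols[6]
        terms[symbol].add(go_id)
        total[symbol] += 1
        if evidence == "IEA":
            automated[symbol] += 1

    return {
        symbol: {
            "unique_terms": len(terms[symbol]),
            "automated_fraction": automated[symbol] / total[symbol],
        }
        for symbol in terms
    }


if __name__ == "__main__":
    for symbol, stats in sorted(summarise_annotations(SAMPLE_GAF).items()):
        print(symbol, stats["unique_terms"], stats["automated_fraction"])
```

Run on a real annotation file, a low unique-term count combined with a high automated fraction would flag proteins that may benefit most from manual curation. Counting distinct GO IDs rather than rows stops a term supported by several evidence codes from being counted more than once.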