A Group Database for Systematic Analysis of Microarray Data Madelaine Marchin and Chris Seidel Stowers Institute for Medical Research, Kansas City, Missouri, USA

Introduction Gene Group Database Other databases Often when analysing microarray data, it is easy to focus only on the that show Gene group analysis does not work without groups of genes. We built a web database Other gene group databases have been created. GSEA has MSigDB (Molecular the most difference between two samples. This artifi cially crops the center out of the (http://research.stowers-institute.org/microarray/gene_groups) for people to use with signatures database), which contains groups of genes, including groups based on data, in effect ignoring small changes that could offer valuable insight. group analysis. chromosomal position, signaling pathways, and experimental res Microarray Data Up Genes We do not limit the type of group users can upload. We encourage users to upload anyfrom literature. MSigDB does not currently allow users to upload new groups or groups they have developed. Groups might show enrichment in someone else’s data redistribute the database. Data Analysis set, which could be meaningful to both parties.We originally populated the database L2L is another gene list database.All lists represent experimental gene expression Up Genes with groups from Gene Ontology and L2L (1,2). Here is a screenshot of the download results. Users can contribute their own experimental results.This database has been Down Genes page. included in our Gene Group Database. Our database imposes no limitation on groups, is generally extensible, encourage Down Genes sharing, and serves multiple group analysis platforms.

Instead, gene group analysis (also known as gene set enrichment or iterative group analysis) aims to take predefi ned groups of genes and check their overall values in the Visualization microarray data. If a group’s member genes are showing a change in expression, that group is suddenly very interesting. electron transport untreated / osmotic shock groups extracellular region electron transport extracellular region mitochondrial electron transport, ubiquinol to cytochrome c cAMP−mediated signaling mitochondrial electron transport, ubiquinol to cytochrome c proteasomal −dependent catabolism ATP synthesis coupled proton transport (sensu Eukaryota) phosphoinositide binding mitochondrial matrix signal transduction proteasome regulatory particle, base subcomplex (sensu Eukaryota) endopeptidase activity damaged DNA binding cAMP−mediated signaling external side of plasma membrane mitochondrial genome maintenance protein amino acid dephosphorylation aerobic respiration clathrin binding autophagy oxysterol binding proteasomal ubiquitin−dependent protein catabolism general RNA II factor activity meiosis Group 2 response to stress cytochrome c oxidase complex assembly remodeling ATP synthesis coupled proton transport (sensu Eukaryota) membrane actin binding iron ion homeostasis protein targeting to vacuole Golgi to endosome transport cytokinesis, contractile ring formation calcium ion binding hexose transporter activity mitochondrial large ribosomal subunit phosphoinositide binding Hsp70 protein binding Group 1 integral to membrane nuclear mRNA splicing, via Group 4 peroxisomal membrane ER to Golgi transport vesicle Golgi to vacuole transport Group 3 condensed nuclear chromosome kinetochore FAD binding mitochondrial matrix lipid metabolism peptide alpha−N−acetyltransferase activity chromosome organization and biogenesis (sensu Eukaryota) mitochondrial inner membrane ER to Golgi vesicle−mediated transport sulfur metabolism protein ubiquitination during ubiquitin−dependent protein catabolism signal transduction actin cortical patch nuclear envelope−endoplasmic reticulum network carrier activity 3−5− activity protein binding, bridging vacuolar membrane (sensu Fungi) mitochondrial nucleoid G1/S transition of mitotic proteasome regulatory particle, base subcomplex (sensu Eukaryota) eukaryotic translation initiation factor 3 complex response to DNA damage stimulus Group 5 You can search groups to fi nd all groups containing genes of interest. When users mitochondrion cell cortex of cell tip integral to endoplasmic reticulum membrane meiotic recombination endopeptidase activity cytosolic small ribosomal subunit (sensu Eukaryota) cell tip vesicle−mediated transport double−strand break repair response to oxidative stress actin cytoskeleton organization and biogenesis integral to plasma membrane endosome cell wall biosynthesis (sensu Fungi) damaged DNA binding translation initiation factor activity protein amino acid ribosome meiotic chromosome segregation mitotic spindle checkpoint upload groups, they are asked to “tag” them with various category terms for easy user- anaphase−promoting complex calcium ion homeostasis tricarboxylic acid cycle DNA damage checkpoint external side of plasma membrane activity assembly oligosaccharyl complex ubiquitin binding negative regulation of transcription by glucose nuclear membrane plasma membrane protein ubiquitination G2/M transition of mitotic cell cycle mitochondrial genome maintenance protein binding activity endoplasmic reticulum mitochondrial outer membrane complex COPI vesicle coat generated categories. On the download page, a tag cloud shows the tags that have the protein amino acid dephosphorylation dolichol−linked oligosaccharide biosynthesis Groups of genes can be created for a number of different reasons. Groups of genes nucleus transcription factor TFIID complex DASH complex mismatch repair nuclear chromatin telomere maintenance cytosol aerobic respiration mRNA cleavage cell wall (sensu Fungi) SNARE binding nucleotide−excision repair RNA binding complex endonuclease activity mitochondrial intermembrane space riboflavin biosynthesis most member groups and allows for easy navigation. transcription from RNA polymerase II cell surface clathrin binding mitochondrial protein processing are limited only by the imagination, and could include: acetylation cell wall organization and biogenesis sporulation (sensu Fungi) prefoldin complex vacuolar membrane double−strand break repair via homologous recombination mRNA export from nucleus ER−associated protein catabolism Ctf18 RFC−like complex autophagy cell wall mannoprotein biosynthesis ESCRT III complex GTP binding exocyst phosphoprotein phosphatase activity cytoplasmic mRNA processing body SNAP receptor activity intracellular protein transport late endosome to vacuole transport oxysterol binding de novo pyrimidine base biosynthesis chromosome condensation chromatin silencing at telomere cytosolic large ribosomal subunit (sensu Eukaryota) ergosterol biosynthesis Golgi membrane metallopeptidase activity heme biosynthesis DNA repair general RNA polymerase II transcription factor activity signal recognition particle (sensu Eukaryota) mitochondrial outer membrane • Gene Ontology groups kinetochore Golgi apparatus iron−sulfur cluster assembly cell septum conjugation with cellular fusion mitochondrial small ribosomal subunit proteolysis chromosome segregation translation elongation factor activity barrier septum mRNA catabolism meiosis DNA replication, removal of RNA primer polynucleotide adenylyltransferase activity mRNA processing DNA−dependent ATPase activity protein folding COPII vesicle coat response to cadmium ion cyclin−dependent protein regulator activity cytosolic ribosome (sensu Eukaryota) protein amino acid acetylation • Chromosomal locations nucleolar ribonuclease P complex DNA−directed RNA polymerase III complex attachment of spindle microtubules to kinetochore response to stress glutathione transferase activity actin cortical patch distribution cell wall chitin biosynthesis mRNA cleavage and specificity factor complex chromatin silencing at silent mating−type cassette mRNA binding RNA elongation from RNA polymerase II promoter DNA binding cytochrome c oxidase complex assembly chromosome, telomeric region chromatin silencing homologous chromosome segregation DNA recombination commitment complex cellular response to starvation attachment of spindle microtubules to kinetochore during meiosis I nuclear pore • Pathways posttranslational protein targeting to membrane centromeric DNA binding cellular morphogenesis establishment and/or maintenance of cell polarity (sensu Fungi) ATPase activity transcription initiation from RNA polymerase III promoter histidine biosynthesis anaphase−promoting complex−dependent proteasomal ubiquitin−dependent protein catabolism medial ring Rho guanyl−nucleotide exchange factor activity RNA splicing factor activity, transesterification mechanism spore wall (sensu Fungi) DNA−directed RNA polymerase II, holoenzyme chromatin silencing at centromere protein sumoylation collapsed replication fork processing microtubule cytoskeleton organization and biogenesis barrier septum formation • Groups from experimental results regulation of progression through cell cycle methionine metabolism sphingolipid biosynthesis GTPase activator activity endocytosis horsetail nuclear movement membrane alpha−1,2−galactosyltransferase activity isoleucine biosynthesis endoplasmic reticulum membrane mannosyltransferase activity DNA replication bipolar exocytosis acetyl−CoA biosynthesis from pyruvate calcium ion transport actin binding nitrogen compound metabolism DNA−directed RNA polymerase activity cell cortex chromatin assembly or disassembly • genes targeted by a specifi c microRNA chromatin S−adenosylmethionine−dependent methyltransferase activity cytokinesis, site selection chaperonin−containing T−complex de novo IMP biosynthesis intra−Golgi vesicle−mediated transport iron ion homeostasis GPI anchor biosynthesis histone acetyltransferase complex GTPase activity guanyl−nucleotide exchange factor activity induction of conjugation upon nitrogen starvation potassium ion homeostasis equatorial microtubule organizing center ATP−dependent DNA activity ATP−dependent RNA helicase activity protein targeting to vacuole mating type switching 3−5 exonuclease activity DNA clamp loader activity processing of 20S pre−rRNA 35S primary transcript processing DNA replication checkpoint regulation of transcription, DNA−dependent alpha−amylase activity Golgi to endosome transport alpha DNA polymerase: complex ribosome biogenesis and assembly rRNA processing prospore formation 1,6−beta−glucan biosynthesis cytokinesis If you take every group of genes that you and other people can think of and examine centric alpha−glucan biosynthesis cytokinesis, contractile ring formation ATPase activity, coupled to transmembrane movement of substances 1,3−beta−glucan biosynthesis glutamate catabolism glycolysis cell separation during cytokinesis DNA replication initiation ribosome biogenesis signal transduction during conjugation with cellular fusion microtubule motor activity Tag cloud tRNA modification spindle pole body thiamin biosynthesis calcium ion binding cation homeostasis arginine biosynthesis Search options pyrimidine salvage the behavior of the groups in your data, the results are an easy to analyze list of groups hexose transporter activity fatty acid biosynthesis mitochondrial large ribosomal subunit changing in your experiment that can offer surprising insights. Users can upload and download groups without registering. Users who would like Hsp70 protein binding 0.0 0.2 0.4 0.6 0.8

to keep track of private gene groups may register with a password. Groups can Enrichment Factor Group Analysis be downloaded in your chosen fi le format (currently three formats are offered to wt mut Shown here are results using groups from our database and iGA on data from Elena correspond with iGA, Catmap, and GSEA). A help page explains how to use iGA, Hidalgo, S. pombe untreated vs. Osmotic shock. Shown are the groups that most 012345 012345 500 Random groups of 6 Catmap, and Gene Group Database to analyze data. YKL067W changed from untreated to osmotic shock. YMR271C YMR217W YMR120C Behind the scenes A YML106W The database back-end is a MySQL database with 7 tables to hold information about Summary YGR061C Density YGL234W groups, users, tags, and genes.The website is written in HTML, PHP and javascript. YJL130C -We created a public web database to store gene groups for 0.00 0.10 YIL116W 0 1020304050 user microarray data analysis. id N = 500 Bandwidth = 0.7609 RPI1 email password SNO1 -We used our gene groups to run group analysis on a data set. SNZ1 GO groups of 6 1 SNZ2 m SNZ3 ER lumen vacuolar protein catabolism THI11 THI12 group_user group_id THI13 B user_id

THI2 Density Contact Information THI20 Chris Seidel, Managing Director of Microarray, Stowers Institute THI21 1

0.00 0.10 1 Entity-Relationship Diagram for Gene Groups THI22 [email protected] 0 1020304050 Database. Each shape is a table in the data- THI4 group http://research.stowers-institute.org/microarray/gene_groups THI5 N = 111 Bandwidth = 0.8769 id base, and terms in red are keys. THI80 description For any given group of genes, a distance based group_name date Acknowledgements on expression values can be calculated between m creator m Two groups of genes are shown above, visibility Thanks to Elena Hidalgo for the S. pombe untreated vs. osmotic shock data and as measured in a time course of yeast two conditions, such as wt and mutant. For the downloads 1 Antony Cooper for the ER stress data. induced for expression of either a wt example above, distances between wt ant mutant 1 protein, or a misfolded mutant protein, were calculated for groups containing randomly References group_tag group_gene chosen genes. One can see that groups with random group_id which causes ER stress. Genes in group group_id Ashburner, M., Ball, C. A., Blake, J. A., Botstein, D., Butler, H., & Cherry, J. M. et al. (2000). Nature Genetics, members give a distribution of distances. However tag_id A are roughly the same under the two gene_id 25(1), 25-29. when distances are calculated for groups of genes conditions, whereas genes in group B 1 1 Breitling, R., Amtmann, A., & Herzyk, P. (2004). BMC Bioinformatics, 5. show differential expression in the wt based on Gene Ontology terms, some groups partition themselves far away from the mean. In this case, m m cells relative to the mutant cells. tag gene Breslin, T., Ede´n, P., & Krogh, M. (2004). BMC Bioinformatics, 5. groups relevant to ER stress distinguish themselves. id id Alternative methods of group analysis exist but all tag gene_name Newman, J. C., & Weiner, A. M. (2005). Genome biology, 6 require a source of groups. Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., & Gillette, M. A. et al. (2005).