UCSC Genome Bioinformatics

Kate Rosenbloom
Center for Biomolecular Science and Engineering
University of California, Santa Cruz
GMOD User Interface Caucus
January 18, 2007
http://genome.ucsc.edu

The UCSC Genome Browser Presents Fully Annotated Genomes

Vertebrates
• human
• chimp
• rhesus macaque
• dog
• cow
• mouse
• rat
• opossum
• chicken
• tetraodon, fugu, zebrafish

Invertebrates
• sea squirt
• sea urchin
• fruitfly (12)
• honeybee
• mosquito
• worm (2)
• yeast

And coming soon…
• cat
• platypus
• medaka, stickleback

Hardware

• Under the hood
KiloKluster = 1000 CPUs
-- Red Hat 9, Apache, Parasol
-- 10-Gigabit data transmission
-- dual 866 MHz machines x 500
-- 1 Gb RAM each
Smaller Clusters
-- 100-node cluster: dual Xeon 2.6 GHz
-- 400-node cluster
• Public Site
-- 400-node cluster
NFS
-- 8 machines
-- redundant
-- 12 machines on RAID arrays
-- 64-bit
-- 4 - 8 Gb RAM
-- 8 Gb RAM
-- 20+ Tb storage
-- 1500 Gb storage + 15 blat servers

Data Contributors

• Genbank/DDJ/EMBL contributors
• ENCODE Consortium
• Novartis GNF foundation
• Affymetrix, Perlegen, SNP Consortium
• SwissProt, Ensembl, EBI and NCBI
• Jackson Labs, RGD, Wormbase, Flybase
• Many contributors of gene prediction and other tracks.

High volume data handling

• All Genbank mRNAs loaded and aligned to the genome nightly; all ESTs weekly (24-48 hours to process).

• At least 6000 - 7000 regular users (separate IP addresses daily).

• 2 - 3 million hits a week

• Consistently #1 or #2 user of bandwidth on the UCSC campus

UCSC Bioinformatics Tools

• Genome Browser
• Table Browser
• Gene Sorter
• VisiGene
• Custom Tracks
• BLAT
• Downloads server, DAS server, MySQL access

Genome Browser

Track configuration & description

Table Browser

Gene Sorter

VisiGene (a “virtual microscope”)

ENCODE Browser
http://genome.ucsc.edu/ENCODE

New features: Genomewiki

http://genomewiki.cse.ucsc.edu

New features: Custom track manager

New feature: Track reordering

New features: Comparative genomics

• Gap annotation
• Genomic breaks
• Codon translation at base level

New features (under review): Saving user sessions

New features (in development): Whole genome graphing

SNP association study, prepublication data

GMOD Scenario #1: Search for gene by name…

GMOD Scenario #1: … and view information page

GMOD Scenario #1: … and view information page (2)

GMOD Scenario #1: … and view information page (3)

GMOD Scenario #2 (sort of): Search by keyword

GMOD Scenario #3: Customized report on aspects of gene

• Exon count
• GO terms
• Description

GMOD Scenario #3 Alternate: Customized report on aspects of gene

• Exon count
• GO terms
• Swiss-Prot disease description

GMOD Scenario #3: Customized report on gene, cont.

GMOD Scenario #3: Report on aspects of gene, cont. (2)

• Exon count
• GO terms
• Swiss-Prot disease description

GMOD Scenarios 4 & 5: Bulk queries and external data integration
Compare user gene set to UCSC Known Genes

• How many user genes are not in Known Genes?
• How well conserved across different species are the genes unique to the user gene set?

GMOD Scenarios 4 & 5: Loading external data

GMOD Scenarios 4 & 5: Loading external data, cont.

GMOD Scenarios 4 & 5: Intersection on whole dataset

GMOD Scenarios 4 & 5: Intersection on whole dataset, cont.

Kent’s UI Guidelines

• Keep it reliable
• Keep it fast
• Label everything in plain English
• Put the most commonly used controls on the top of the page
• Keep it as simple as possible (but no simpler)
• Try to make options work together in an orthogonal way
• Remember your users are *intelligent* professionals. Don’t dumb things down; complexity comes with the territory
• Don’t change the site unnecessarily once people have gotten used to it.

User interface challenges: User-configurable ordering

User interface challenges: Track grouping to avoid overload

User interface challenges: Composite tracks to group similar data

User Support and Training

• FAQs: http://genome.cse.ucsc.edu/FAQ/
• Questions? [email protected]
  Archived answers: http://genome.ucsc.edu/contacts.html
• OpenHelix: http://www.openhelix.com/
  – Classes, seminars
  – Free online tutorial
  – Quick reference cards

Thanks!

• UCSC Genome Browser Team:
  – – PI
  – Jim Kent – Browser Concept, BLAT, Team Leader
  – Donna Karolchik – Engineering Mgr, Docs & Training
  – Mark Diekhans, Fan Hsu, Angie Hinrichs, Kate Rosenbloom, Hiram Clawson, Rachel Harte, Heather Trumbower, Galt Barber, Andy Pohl – Engineering
  – Robert Kuhn (mgr), Ann Zweig, Kayla Smith, Brooke Rhead, Archana Thakkapallayil – QA/Support
  – Jorge Garcia, Chester Manuel, Victoria Lin, Erich Weller, Paul Tatarsky – KiloKluster, Sys-admin
• Funding:
  – National Human Genome Research Institute
  – Howard Hughes Medical Institute
  – National Cancer Institute
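
Backup note for GMOD Scenarios 4 & 5: the first question on that slide ("How many user genes are not in Known Genes?") comes down to an interval intersection over the whole dataset. The Python sketch below shows the idea only; it is not how the Table Browser implements intersection. The function name, the tuple layout, and the sample coordinates are all hypothetical, and coordinates are assumed to be 0-based, half-open.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (name, chromosome, start, end), 0-based half-open.
Gene = Tuple[str, str, int, int]


def genes_without_overlap(user_genes: List[Gene], known_genes: List[Gene]) -> List[str]:
    """Return names of user genes that overlap no known gene on the same chromosome."""
    # Group known-gene intervals by chromosome so each user gene is only
    # compared against intervals on its own chromosome.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for _name, chrom, start, end in known_genes:
        by_chrom.setdefault(chrom, []).append((start, end))

    unique = []
    for name, chrom, start, end in user_genes:
        # Two half-open intervals overlap when each starts before the other ends.
        overlaps = any(s < end and start < e for s, e in by_chrom.get(chrom, []))
        if not overlaps:
            unique.append(name)
    return unique


# Made-up sample data, for illustration only.
user_genes = [
    ("userA", "chr1", 100, 500),
    ("userB", "chr1", 2000, 2500),
    ("userC", "chr2", 100, 300),
]
known_genes = [
    ("known1", "chr1", 400, 900),
    ("known2", "chr2", 1000, 1500),
]

not_in_known = genes_without_overlap(user_genes, known_genes)
print(len(not_in_known), not_in_known)
```

The linear scan per chromosome is kept for readability; a genome-scale gene set would want sorted intervals or a binning scheme instead.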