The UCSC Genome Browser Database: 2019 Update Maximilian Haeussler1,*, Ann S
Total Page:16
File Type:pdf, Size:1020Kb
Published online 8 November 2018 Nucleic Acids Research, 2019, Vol. 47, Database issue D853–D858 doi: 10.1093/nar/gky1095 The UCSC Genome Browser database: 2019 update Maximilian Haeussler1,*, Ann S. Zweig1, Cath Tyner1, Matthew L. Speir1,Kate R. Rosenbloom1, Brian J. Raney1, Christopher M. Lee1, Brian T. Lee 1,AngieS.Hinrichs1, Jairo Navarro Gonzalez1, David Gibson1, Mark Diekhans1, Hiram Clawson1, Jonathan Casper1, Galt P. Barber1, David Haussler1,2, Robert M. Kuhn1 and W. James Kent1 1Genomics Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA and 2Howard Hughes Medical Institute, University of California Santa Cruz, Santa Cruz, CA 95064, USA Received September 14, 2018; Revised October 17, 2018; Editorial Decision October 18, 2018; Accepted October 19, 2018 ABSTRACT description and configuration options. A track can beset to one of the four display modes, from hidden, to a single The UCSC Genome Browser (https://genome.ucsc. line (‘dense’), thin lines (‘squish’), normal lines (‘pack’), up edu) is a graphical viewer for exploring genome an- to ‘full’ with one annotation per line (see our new video notations. For almost two decades, the Browser has https://youtu.be/jKix2B3hwnw). The most-used tracks on provided visualization tools for genetics and molec- the human genome are the various gene models, followed ular biology and continues to add new data and fea- by variants and ENCODE annotations. Documentation for tures. This year, we added a new tool that lets users each track is available by right-clicking it, clicking on its interactively arrange existing graphing tracks into name below the image or the vertical gray bar next to the new groups. Other software additions include new track. In addition, files in our public ‘makeDb/doc’ direc- formats for chromosome interactions, a ChIP-Seq tory contain the UNIX commands to build any track (see peak display for track hubs and improved support http://bit.ly/makeDocs). At the top of the documentation page, tracks can be configured. Depending on the track for HGVS. On the annotation side, we have added type, one can change the size, show only certain features, gnomAD, TCGA expression, RefSeq Functional ele- activate coding sequence codon display, etc. To share a com- ments, GTEx eQTLs, CRISPR Guides, SNPpedia and plete track selection and configuration, the user can create created a 30-way primate alignment on the human a stable link using ‘My Data > My Sessions’ and share with genome. Nine assemblies now have RefSeq-mapped collaborators or further create a ‘Public Session’ hosted on gene models. our servers. In addition to the main Genome Browser, the website INTRODUCTION contains related utilities in the ‘Tools’ menu. These util- ities may find sequences in the genome (BLAT, in-silico The University of California at Santa Cruz (UCSC) PCR), add annotations to uploaded variants (Variant An- Genome Browser (1) is a viewer for genome annota- notation Integrator), let users inspect, join and download tions, primarily those from human and mouse genomes. our annotations as text files in various table formats (Ta- The UCSC Genome Browser team has continually added ble Browser, Data Integrator) or map annotations between data and software features to the website since 2001 and genomes (LiftOver and also ‘View > In Other Genomes’). currently hosts 195 assemblies and 105 species (Menu: Other tools show data not directly located on chromo- ‘Genomes > Other Genomes’). We continue to add new somes, like VisiGene for in-situ images and our viewer for species and assemblies, but most of our user base focuses gene/protein interaction networks (‘Tools > Gene Interac- on human and mouse. As a result, most of our annotation tions’). is concentrated on these two species. All genome annotations can also be downloaded as Apart from showing the DNA sequence itself, the text files (‘Downloads > Genome Data’) or retrieved by Genome Browser is primarily used to display genome an- MySQL (‘Downloads > MySQL Access’). Under ‘Down- notations. Most of these are derived in some way from loads > Utilities’, we provide several hundred tools that al- aligning sequences to the genome, created by us or other low users to convert and analyze our genomics data files groups, and they are stored and shown as small rectan- on a UNIX-like operating system. Finally, we provide all gles with defined chromosome:start-end coordinates (i.e. Genome Browser source code along with scripts that help chr10:123 400–123 499) or graphs of annotation density. install these or a Genome Browser with selected assemblies Annotations are organized into tracks, each with a name, *To whom correspondence should be addressed. Tel: +1 831 295 0653; Email: [email protected] C The Author(s) 2018. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. D854 Nucleic Acids Research, 2019, Vol. 47, Database issue on the user’s own server and a completely configured virtual Trans eQTLs, which is hosted on the public GTEx Analysis machine (‘Mirrors > Mirroring Instructions’). track hub at UCSC. More details for these tracks are at A major aim of the group over the last few years has been https://genome.ucsc.edu/cgi-bin/hgGtexTrackSettings?db= the development of new features that allow researchers to hg19&g=gtexEqtlCluster and http://hgdownload.soe.ucsc. show their own annotations on the Genome Browser. Un- edu/hubs/gtexAnalysis/gtexTransEqtl.html. Finally, the der ‘My Data > Custom Tracks’ a user can type in cus- animated GTEx Body Map graphic has been deployed to a tom annotations (e.g. ‘chr1 1 1000 my annotation’) or paste stand-alone web page, freely available for linking or image URLs of next-generation sequencing files, such as BAM, capture: https://genome.ucsc.edu/gtexBodyMap.html. VCF and many other formats. A research group can use ‘Track Hubs’ (2), to create a group of customized tracks, New NCBI RefSeq tracks configure and document them in detail and link to itfrom a research article. Authors of track hubs can contact us at RefSeq (15) is a group at NCBI that manually curates tran- [email protected] to have their hub added to the script sequences. To obtain genome annotations, these se- list of public track hubs. At the moment around 60 groups quences have to be aligned to the genome and can some- have done so, with 11 hubs added in 2018 (see Supple- times differ from it. The UCSC Genome Browser has always mental Table S1), and a broad selection of tracks is avail- provided a BLAT-based mapping (16) and NCBI made able as a result. Examples include cis-regulatory annota- available genome annotations which differ slightly. For this tions created by JASPAR (3), ReMap (4), the Eukaryotic reason, we created the ‘NCBI RefSeq’ track from these orig- Promoter Database (5), ENCODE(6), the Ensembl regu- inal NCBI transcript mappings for hg38/GRCh38, which latory build (7) and Brain and Roadmap Epigenomics (8), are released every two months. This year we have added RNAs from FANTOM5 (9), Mouse Alleles from MGI (10), this track to eleven model organism assemblies: mouse, C. lncRNAs from LNCipedia (11), Cancer genomics (show- elegans, yeast, zebrafish, fruitfly, X. tropicalis andrat.Our ing variants from TCGA (12)) and many others. Most of new ‘NCBI RefSeq’ tracks still include a subtrack with tran- the tracks were created for the human genome, but there script mappings performed by UCSC, both for backwards are also Caenorhabditis elegans WormBase alignments and compatibility and to allow comparison between the NCBI DNA methylation levels in pigs. Researchers can even sub- and UCSC mappings, which may be informative especially mit annotations for genomes we do not host by adding the when they differ. In addition, the UCSC tracks are contin- genome sequence to create an ‘Assembly hub’. More infor- ually updated while the NCBI RefSeq tracks are only re- mation is available at https://genome.ucsc.edu/goldenpath/ leased every few months but as a stable and archived ‘NCBI help/hgTrackHubHelp.html Annotation Release’. At the time of writing the current an- notation release is 109. It should not be confused with the ‘Refseq Release’ which contains only the sequences and is NEW ASSEMBLIES AND ANNOTATIONS currently at release 89. Since the databases for fruit fly, yeast A detailed list of the annotation tracks that were added this and C. elegans (FlyBase, SGD and WormBase, respectively) year is available in Supplemental Table S2. submit to RefSeq, our gene models on these organisms now match the main model database. More information on these NCBI alignment tracks can be found at http://genome.ucsc. Genomes from more organisms and the CHO cell line edu/blog/the-new-ncbi-refseq-tracks-and-you. In the past year, seven new assemblies, three for new species, were added. The new organisms are African clawed frog ‘RefSeq functional elements’ (xenLae2), Lamprey germline (petMar3) and a new assem- bly, the Chinese Hamster ovary cell line CHO-K1 (criGri- The RefSeq group now also curates functional elements CHOV1). Existing genomes that were updated are Xenopus in non-genic regions (17). These include experimentally- tropicalis (xenTro9), pig (susScr11), chimp (panPan2) and verified gene regulatory regions, known structural elements, zebrafish (danRer11). well-characterized DNA replication origins, and clinically- significant sites of DNA recombination and genomic in- stability. We have added them as tracks on the human Genotype-Tissue expression project hg38/GRCh38 and mouse mm10/GRCm38 assemblies. The GTEx Gene track on human genome assemblies displays expression levels of genes in 53 different tissue Ensembl/gencode updates types (13).