Introduction to the UCSC Genome Browser
1 Overview
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
3 What is the UCSC Genome Browser
The UCSC Genome Browser (http://genome.ucsc.edu/) is a graphical visualization tool for exploring genomes and genomic data. It was developed by Jim Kent and is maintained by the UCSC Genome Browser Project Team.
4 Genome Browser like Google Maps for Genomes
[Figure: map with overlay layers labeled Businesses, Bus Stops, Traffic]
5 Genome Browser like Google Maps for Genomes (cont.)
Coordinates (like streets): Position in the DNA sequence
[Figure: Genome Browser view of hg19 chr21:33,026,000-33,039,000 showing UCSC Genes, RNA-seq from ENCODE/Caltech, and Transcription Factor Binding Sites by ChIP-seq from ENCODE/HAIB; track groups labeled Genes, Expression, Regulation]
“Annotation tracks”: everything below the coordinates annotates something notable about that region
6 Why use the UCSC Genome Browser?
§ Visualize genetic and genomic data
– Publicly available data
– Your own data
§ Download data
– Useful if you want a subset of the data
– Common interface for data from many public sources
§ Tools for working with genomic data
– Graphical interface
– Command-line
7 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
8 Genome Browser Home Page
9 Gateway: Start Page
Change Assembly
Change Species
Search Genes
• Gene names, symbols, IDs
• Chromosome number (chr7) or position (chr11:4000-5000)
• Keywords: kinase, receptor
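A position typed into the search box follows a chrom:start-end pattern. The sketch below shows one way to split such a string into its parts in Python. The `Region` type and `parse_position` function are hypothetical helpers written for this illustration and are not part of the Genome Browser.

```python
import re
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int  # 1-based, as typed into the search box
    end: int    # 1-based, inclusive


# Accepts "chr11:4000-5000" as well as "chr21:33,026,000-33,039,000".
_POSITION = re.compile(r"^(chr\w+):([\d,]+)-([\d,]+)$")


def parse_position(text: str) -> Region:
    """Parse a browser-style position string into chrom, start, end."""
    match = _POSITION.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    # Thousands separators are allowed in the browser, so strip them.
    region = Region(chrom, int(start.replace(",", "")), int(end.replace(",", "")))
    if region.start > region.end:
        raise ValueError(f"start is after end: {text!r}")
    return region


print(parse_position("chr11:4000-5000"))
```

A bare chromosome name (chr7) or a keyword (kinase) is rejected by this helper with a `ValueError`, since neither is a position range.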
10 Choosing an Assembly
§ The Genome Browser displays a “reference sequence” for every organism
– genetic variation (SNPs, insertions, deletions, etc.) is annotated in terms of how it differs from the reference
§ The species-specific genome reference sequence is still being fine-tuned, and there are different versions (called “assemblies”)
§ If you have a list of genomic coordinates, they belong to a particular assembly!
11 Gateway: Start Page
• Hit “GO” without entering a search term to go to a default location
• Or the last place you were – the browser “remembers” it
12 Genome Browser View
[Figure labels: Zoom, Navigation, Ideogram, Search, Genome View]
13 Genome Browser View (cont.)
Tracks Category
Change Visibility
14 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
15 Visualization of Track Types
[Figure: Genome Browser view of hg38 chr1:11,097,500-11,102,000 with track types labeled: Gene (GENCODE v24 and NCBI RefSeq; features marked 5’ UTR, Exon, 3’ UTR, Direction), Wiggle (H3K27Ac, DNase clusters, PhyloP conservation), Multiz Alignments of 100 Vertebrates, Simple Nucleotide Polymorphisms (dbSNP 147), and Repeating Regions (RepeatMasker)]
16 Additional options for Changing Images
• Drag Tracks up and down to re-arrange
• Drag-and-select
– Zoom in
– Highlight selected section
• Highlight a section of the Chromosome View to zoom in/out
• Right-click track to change visibility
17 Basic Visibility Example
• Hide: remove track from view
• Dense: all items collapsed on a single line
• Squish: each item on separate line, but 50% height + packed
• Pack: each item on separate line and efficiently stacked
• Full: each item on separate line
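Visibility can also be set from a link rather than by clicking. The sketch below builds a Genome Browser URL that opens a region with chosen visibility modes. It assumes the hgTracks CGI accepts `db`, `position`, and `<track>=<visibility>` URL parameters. The `browser_url` helper is hypothetical, and the track names (`knownGene`, `snp147Common`) are examples that must exist in the chosen assembly.

```python
from urllib.parse import urlencode

# The five visibility modes listed above.
VISIBILITIES = ("hide", "dense", "squish", "pack", "full")


def browser_url(db, position, tracks,
                base="http://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a browser link that sets a visibility mode per track.

    tracks maps a track name to one of VISIBILITIES.
    """
    for track, visibility in tracks.items():
        if visibility not in VISIBILITIES:
            raise ValueError(f"unknown visibility for {track}: {visibility}")
    params = {"db": db, "position": position, **tracks}
    # Keep ':' readable in the position instead of percent-encoding it.
    return f"{base}?{urlencode(params, safe=':')}"


url = browser_url(
    "hg38",
    "chr1:11097500-11102000",
    {"knownGene": "pack", "snp147Common": "dense"},
)
print(url)
```

Opening the printed link should show the region with the gene track in pack mode and the SNP track in dense mode, provided those track names are valid for the assembly.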
18 Mid-Page options to Change Settings
• Reverse 5’ <-> 3’
• Data search
• Reset default track settings
• Hide all tracks
• Slice track view (e.g. only exons)
• Fit browser to window
19 Search for an Item
§ If the name you search for:
– Is a perfect, unique match in a track: the track opens and the item is highlighted
– Matches multiple items or tracks: you see a list of possible matches and select the item/track you want
– Returns an “alternate” chromosome location: there are alternate haplotype chromosomes
20 Display Configuration (demo)
§ Track visibility
– Hide, dense, squish, pack, full (pack is usually a good choice)
– Tracks automatically collapse to squish/dense format when you zoom out far enough
§ Track ordering: drag and drop
§ Track right click menu
– Another way to change the visibility and configure display
§ Track highlighting and zoom
§ Search
21 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
22 Track Details
Configuration
Description
23 Item Details
Click any part of the gene
Item Description page With additional information
24 Get Sequence from Description Page
Genomic Sequence Coding Sequence Translated Protein
Annotation Description Page
Region Page
25 Get DNA Sequence for whole View
26 Details and Sequence (demo)
§ Track Details
§ Item Details
§ Get DNA sequence
– Region
– Item
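The DNA for a region can also be retrieved by a script. The sketch below is a minimal example that assumes the public UCSC REST API at api.genome.ucsc.edu, which is not covered in these slides. It assumes a `getData/sequence` endpoint that takes `genome`, `chrom`, `start`, and `end` and returns JSON with a `dna` field, and it treats `start` as 0-based and `end` as 1-based. Verify both assumptions against the API documentation before relying on them. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the public UCSC REST API (not the BioHPC mirror).
API = "https://api.genome.ucsc.edu/getData/sequence"


def sequence_url(genome, chrom, start, end):
    """Build the request URL for the DNA of one region."""
    if not 0 <= start < end:
        raise ValueError("expected 0 <= start < end")
    query = urlencode(
        {"genome": genome, "chrom": chrom, "start": start, "end": end}
    )
    return f"{API}?{query}"


def fetch_sequence(genome, chrom, start, end):
    """Download the region and return its DNA string (needs network access)."""
    with urlopen(sequence_url(genome, chrom, start, end)) as response:
        return json.load(response)["dna"]


# Illustrative coordinates only; nothing is downloaded here.
print(sequence_url("hg38", "chr13", 32315085, 32315185))
```

Calling `fetch_sequence` with the same arguments would perform the actual download.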
27 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
28 How to Download data
• Method 1: Use the Table Browser to get a subset of data
• Method 2: Entire table
29 Method 1: The Table Browser
30 The Table Browser (Assembly and Table)
Specify the assembly and table
31 The Table Browser (Region)
Specify whole genome, or just a region
32 The Table Browser (Identifiers)
Optional: Get output only for specific item names
33 The Table Browser (Identifiers)
Optional: Get output only for specific item names
34 Table Browser Query (demo)
Retrieve a list of common SNPs in the BRCA2 gene
1. Navigate to the BRCA2 gene locus using the search box
2. Go to “Tools” -> “Table Browser”
3. Select:
– Assembly: GRCh38/hg38
– Group: Variation
– Track: Common SNPs (147)
– Table: snp147Common
– Region: position (these coordinates will be the BRCA2 position)
35 Table Browser Query (demo cont.)
Retrieve a list of common SNPs in the BRCA2 gene
4. Hit “get output”
5. Note that you get every column in the table
6. To limit to a few columns, go back and choose:
– Output format: “selected fields from primary and related tables”
7. Select only these fields
– chrom, chromStart, chromEnd, name, strand, class, func
36 Table Browser (Output)
§ Note: You need to add 1 to table start coordinates to get the position displayed in the Genome Browser
§ See explanations: https://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1
§ http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
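The add-1 rule follows from the two coordinate conventions: tables store 0-based, half-open intervals (chromStart, chromEnd), while the browser shows 1-based, inclusive positions. The sketch below shows the conversion in both directions. The function names and the coordinates are hypothetical and serve only as an illustration.

```python
def table_to_browser(chrom, chrom_start, chrom_end):
    """Convert a 0-based, half-open table interval to a browser position."""
    # Only the start shifts; the end keeps the same number in both systems.
    return f"{chrom}:{chrom_start + 1:,}-{chrom_end:,}"


def browser_to_table(start, end):
    """Convert a 1-based, inclusive browser range to table coordinates."""
    return start - 1, end


# A single-base item: chromEnd - chromStart gives its length (1).
print(table_to_browser("chr13", 32315085, 32315086))
print(browser_to_table(32315086, 32315086))
```

One consequence of the half-open convention is that an item's length is simply chromEnd minus chromStart, with no extra +1.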
37 The Table Browser (File Output)
• Enter a name for the output file
• A pop-up box asks where to save the output
• The output will be a tab-delimited file
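Once saved, the tab-delimited output is easy to post-process. The sketch below reads a file with the fields chosen in the demo and keeps only the SNPs annotated as missense. It assumes the first line is a header starting with "#" followed by the field names. The sample rows, including the rs names, are made up for illustration.

```python
import csv
import io

# Made-up sample standing in for a saved Table Browser output file.
SAMPLE = (
    "#chrom\tchromStart\tchromEnd\tname\tstrand\tclass\tfunc\n"
    "chr13\t32315085\t32315086\trsEXAMPLE1\t+\tsingle\tintron\n"
    "chr13\t32316434\t32316435\trsEXAMPLE2\t+\tsingle\tmissense\n"
)


def read_table(handle):
    """Read tab-delimited rows into dicts keyed by the header fields."""
    header = handle.readline().lstrip("#").rstrip("\n").split("\t")
    reader = csv.DictReader(handle, fieldnames=header, delimiter="\t")
    rows = []
    for row in reader:
        row["chromStart"] = int(row["chromStart"])
        row["chromEnd"] = int(row["chromEnd"])
        rows.append(row)
    return rows


rows = read_table(io.StringIO(SAMPLE))
# func may hold several comma-separated annotations per SNP.
missense = [r["name"] for r in rows if "missense" in r["func"].split(",")]
print(missense)
```

For a real file, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`, where `path` is the name entered for the output file.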
38 Browser settings “stick”
§ Note that settings you make in the Table Browser stay there until you change them
§ If you get stuck, you can reset everything with the link below (under “get output”)
39 Method 2: Entire Table
40 Download Entire Table (Species)
41 Download Entire Table (Assembly and data type)
The genome assembly data
The track data
42 Download Entire Table (Table Names)
§ Use the Table Browser to find table names, and/or see this video: http://genome.ucsc.edu/training/vids/index.html#vid06
43 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
44 Search Sequences (BLAT)
BLAT = BLAST-like Alignment Tool
• Rapid search by indexing the entire genome
• Works best with highly similar matches
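To see why an index makes the search fast, consider the toy sketch below. It indexes the non-overlapping k-mers of a short "genome" once, then looks up every overlapping k-mer of the query to find seed hits. This only illustrates the indexing idea and is not BLAT's actual implementation, which is considerably more involved. The sequences and function names are made up.

```python
from collections import defaultdict


def build_index(genome, k):
    """Index the non-overlapping k-mers of the genome by start offset."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_hits(query, index, k):
    """Return (query_pos, genome_pos) for each query k-mer found in the index."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


genome = "ACGTACGTTAGCCGATAGGCTTAACCGG"
index = build_index(genome, k=4)  # built once, reused for every query
hits = seed_hits("TAGCCGATAGG", index, k=4)
print(hits)
```

Both hits lie on the same diagonal (genome_pos - query_pos = 8), which is the kind of evidence an aligner would then extend into a full alignment. Exact k-mer seeds are also one reason such a search works best on highly similar sequences.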
45 BLAT Search Results
• Browser: shows a graphic view of the BLAT alignment
• Details: shows the query sequence aligned against the matching target sequence
46 Browser View of BLAT Results
47 Details View of BLAT Result
Matching Sequence
48 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
49 Sharing your view (Sessions)
• Requires login
• Lifespan: 4 months
50 Sessions (Naming and Sharing)
List of Saved Sessions
Naming of Session and Permissions
51 Exporting Images for Publication
52 Exporting Images for Publication (File Format)
• Can export PDF and EPS images (vector graphics, which do not pixelate the way screenshots do)
53 Saving Session and Exporting Image (demo)
§ Set Tracks and Save Session
§ Export region to PDF file
54 Contents
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Display and Configuration
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data
55 Visualize your own data
§ Custom Tracks
§ Track Hubs
56 Setting Up Custom Track
1. Format the data set
– Format your data set as a tab-separated file using one of the formats supported by the Genome Browser (e.g. bedGraph, GTF, BED, bigBed, WIG, bigWig, BAM)
2. Define the Genome Browser display characteristics
3. Define the annotation track display characteristics
4. Display the annotation track in the browser
More Information: https://genome.ucsc.edu/goldenPath/help/customTrack.html
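The four steps above can be sketched for a small BED data set as follows. A "browser" line sets the display position, a "track" line sets the track's name, description, and visibility, and the data rows follow. The `custom_track` helper, the track name, and the feature coordinates are hypothetical examples.

```python
def custom_track(name, description, position, bed_rows, visibility="pack"):
    """Return custom track text: browser line, track line, then BED rows."""
    lines = [
        # Step 2: Genome Browser display characteristics.
        f"browser position {position}",
        # Step 3: annotation track display characteristics.
        f'track name="{name}" description="{description}" '
        f"visibility={visibility}",
    ]
    # Step 1: the data itself, one tab-separated BED row per feature
    # (chrom, 0-based start, end, label).
    for chrom, start, end, label in bed_rows:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


text = custom_track(
    "myPeaks",
    "Example peaks",
    "chr21:33031597-33041570",
    [("chr21", 33031596, 33033000, "peak1"),
     ("chr21", 33038000, 33041570, "peak2")],
)
print(text)
```

Step 4 is then a matter of pasting or uploading the resulting text (or a link to it) on the browser's custom track page.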
57 Where to Visualize Custom Tracks
There are two options to visualize Custom Tracks for BioHPC Users.
1. Share a custom track file using a link in the lamella web interface, and visualize on the internal BioHPC UCSC Genome Browser mirror at http://genome.biohpc.swmed.edu
2. Upload data to BioHPC's external file exchange site, share it with a link in the cloud.biohpc.swmed.edu web interface, and visualize on the public UCSC Genome Browser site. This method is not suitable for private or sensitive data.
More Information: https://portal.biohpc.swmed.edu/content/guides/ucsc-genome-browser/
58 Custom Track Demo (BioHPC)
1. Create simple bed file in /project or /work directories
2. Create shared link for the file on the lamella web services
3. Add Custom Track on BioHPC Mirror of UCSC
4. View on Browser
59 What are Track Hubs
§ Web-accessible directories (HTTP or FTP) of genomic data that can be viewed on the UCSC Genome Browser
§ Useful for visualizing a large number of genome-wide data sets
§ Genomic data is stored in compressed, binary indexed formats (e.g. bigBed, bigBarChart, bigGenePred, bigPsl, bigChain, bigMaf, bigWig, BAM, CRAM, HAL or VCF)
More Information: https://genome.ucsc.edu/goldenpath/help/hgTrackHubHelp.html
60 Setting up a Track Hub
1. One or more datasets in binary indexed format (e.g. bigBed, bigBarChart, bigGenePred, bigPsl, bigChain, bigMaf, bigWig, BAM, CRAM, HAL or VCF)
2. A set of text files that specify properties for the track hub and for each of the data tracks within it
3. An Internet-enabled web server or ftp server
More Information: https://genome.ucsc.edu/goldenpath/help/hgTrackHubHelp.html
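As a rough sketch of the text files in item 2, the code below generates a minimal hub.txt, genomes.txt, and trackDb.txt for one bigBed track. The setting names follow the track hub help page linked above. The hub name, e-mail address, track name, and data URL are placeholders, and the `hub_files` helper is hypothetical.

```python
def stanza(fields):
    """Format one block of 'key value' settings, one per line."""
    return "".join(f"{key} {value}\n" for key, value in fields.items())


def hub_files(hub_name, email, genome, track_name, big_data_url):
    """Return a dict of relative file path -> file content for a minimal hub."""
    return {
        # Properties of the hub itself; points at the genomes file.
        "hub.txt": stanza({
            "hub": hub_name,
            "shortLabel": hub_name,
            "longLabel": f"{hub_name} track hub",
            "genomesFile": "genomes.txt",
            "email": email,
        }),
        # One entry per assembly; points at that assembly's trackDb file.
        "genomes.txt": stanza({
            "genome": genome,
            "trackDb": f"{genome}/trackDb.txt",
        }),
        # One entry per data track.
        f"{genome}/trackDb.txt": stanza({
            "track": track_name,
            "bigDataUrl": big_data_url,
            "shortLabel": track_name,
            "longLabel": f"{track_name} (example bigBed track)",
            "type": "bigBed",
        }),
    }


files = hub_files("myHub", "user@example.org", "hg38", "myPeaks", "myPeaks.bb")
for path, content in files.items():
    print(f"--- {path}\n{content}")
```

Writing each entry to the hub directory on the web or FTP server gives the layout the browser expects when it is pointed at hub.txt.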
61 Where to Visualize and Host a Track Hub
BioHPC users can visualize Track Hubs on the internal BioHPC UCSC Genome Browser mirror
§ Upload data to BioHPC's Thunder FTP server
§ More info: https://thunder.biohpc.swmed.edu/manage/instructions/
62 Track Hub Demo (BioHPC)
1. Create a hub directory in /project/thunder_ftp
2. Create simple bed file and convert to bigBed
3. Create hub.txt, genomes.txt and trackDb.txt files
4. Get shared link for hub directory
5. Add Track Hub on BioHPC Mirror of UCSC
6. View on Browser
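Step 2 can be sketched as below, assuming the UCSC `bedToBigBed` command-line utility is installed and a chrom.sizes file for the assembly is available. The converter expects its input sorted by chromosome and then by start position, so the example sorts first. The file names and helper functions are hypothetical, and the command is only assembled here, not run.

```python
def sort_bed(lines):
    """Sort BED rows by chromosome name, then by numeric start position."""
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    records.sort(key=lambda fields: (fields[0], int(fields[1])))
    return ["\t".join(fields) for fields in records]


def bed_to_bigbed_command(bed_path, chrom_sizes_path, out_path):
    """Argument list for: bedToBigBed in.bed chrom.sizes out.bb"""
    return ["bedToBigBed", bed_path, chrom_sizes_path, out_path]


bed = [
    "chr2\t5\t10\tb",
    "chr1\t300\t400\tc",
    "chr1\t20\t30\ta",
]
sorted_bed = sort_bed(bed)
command = bed_to_bigbed_command("peaks.sorted.bed", "hg38.chrom.sizes", "peaks.bb")
print("\n".join(sorted_bed))
print(" ".join(command))
```

After writing `sorted_bed` to peaks.sorted.bed, the argument list can be passed to `subprocess.run(command, check=True)` to perform the conversion.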
63 Other Tools
§ Use LiftOver to convert coordinates of one assembly to another:
– http://genome.ucsc.edu/cgi-bin/hgLiftOver
§ Use the Variant Annotation Integrator to predict the functional effects of SNPs:
– http://genome.ucsc.edu/cgi-bin/hgVai
§ Use Data Hubs to share tracks and assemblies with a wider audience:
– http://genome.ucsc.edu/cgi-bin/hgHubConnect
64 Where to go for Help
§ FAQ and link to archived questions/answers:
– http://genome.ucsc.edu/FAQ/
§ Online training videos and tutorials:
– http://genome.ucsc.edu/training/index.html
– Open Helix videos highly recommended
§ If you are struggling, send an email to [email protected] . . . they answer!
(note that this is a publicly searchable list)
§ For BioHPC-specific mirror questions, send an email to [email protected]
65 WashU Epigenome Browser
Internal BioHPC WashU Epigenome Browser mirror at http://epigenome.biohpc.swmed.edu/
66 Perspective (Genome Browser in 2002)
67 Thank you
Thank you to the UCSC Genome Browser and BioHPC teams!
68 Questions
Questions?