Introduction to the UCSC Genome Browser

1 Overview

Today we’re going to cover:

§ Introduction to the UCSC Genome Browser

§ Navigation and layout

§ Search and Configure

§ Details and Sequences

§ Download data

§ Search by Sequence (BLAT)

§ Share configuration and Export Images

§ Visualize your own data

3 What is the UCSC Genome Browser

The UCSC Genome Browser is a graphical visualization tool for exploring genomes and genomic data. It was developed by Jim Kent and is maintained by the UCSC Genome Browser Project Team.

4 Genome Browser like Google Maps for Genomes

Google Maps layers: Businesses, Bus Stops, Traffic

5 Genome Browser like Google Maps for Genomes (cont.)

Coordinates (like streets): Position in the DNA sequence

[Figure: Genome Browser view of hg19 chr21:33,026,000-33,039,000 (scale bar 5 kb), showing the UCSC Genes track, RNA-seq from ENCODE/Caltech (GM78 and K562), and Transcription Factor Binding Sites by ChIP-seq from ENCODE/HAIB]

“Annotation tracks” (Genes, Expression, Regulation): everything below the coordinates annotates something notable about that region

6 Why use the UCSC Genome Browser?

§ Visualize genetic and genomic data

– Publicly available data

– Your own data

§ Download data

– Useful if you want a subset of the data

– Common interface for data from many public sources

§ Tools for working with genomic data

– Graphical interface

– Command-line

7 Contents

Today we’re going to cover:

§ Introduction to the UCSC Genome Browser

§ Navigation and layout

§ Display and Configuration

§ Details and Sequences

§ Download data

§ Search by Sequence (BLAT)

§ Share configuration and Export Images

§ Visualize your own data

8 Genome Browser Home Page: http://genome.ucsc.edu/

9 Gateway: Start Page

Change Species

Change Assembly

Search Genes

• Gene names, symbols, IDs

• Chromosome number (chr7) or position (chr11:4000-5000)

• Keywords: kinase, receptor
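The position form of a search term (for example chr11:4000-5000) can be handled programmatically. The following is a minimal Python sketch, not part of the Genome Browser itself: the name `parse_position` and the regular expression are illustrative assumptions, and thousands separators are accepted because the browser displays coordinates with commas.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    chrom: str
    start: int  # 1-based, as typed into the search box
    end: int    # inclusive


# Illustrative pattern: "chr11:4000-5000" or "chr21:33,026,000-33,039,000".
_POSITION_RE = re.compile(r"^(chr[\w.]+):([\d,]+)-([\d,]+)$")


def parse_position(text: str) -> Position:
    """Split a browser-style position string into chromosome, start and end."""
    match = _POSITION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-end position: {text!r}")
    chrom, start, end = match.groups()
    start_i = int(start.replace(",", ""))
    end_i = int(end.replace(",", ""))
    if start_i > end_i:
        raise ValueError("start must not exceed end")
    return Position(chrom, start_i, end_i)


print(parse_position("chr11:4000-5000"))
```

Gene names and keywords are resolved by the browser's own search, so this sketch only covers the coordinate form.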

10 Choosing an Assembly

§ The Genome Browser displays a “reference sequence” for every organism

– genetic variation (SNPs, insertions, deletions, etc.) is annotated in terms of how it differs from the reference

§ The species-specific genome reference sequence is still being fine-tuned, and there are different versions (called “assemblies”)

§ If you have a list of genomic coordinates, they belong to a particular assembly!

11 Gateway: Start Page

• Hit “GO” without entering a search term to go to a default location

• Or to the last place you were: the browser “remembers” it

12 Genome Browser View

Labeled areas: Zoom, Navigation, Ideogram, Search, Genome View

13 Genome Browser View (cont.)

Tracks Category

Change Visibility

15 Visualization of Track Types

[Figure: Genome Browser view of hg38 chr1:11,097,500-11,102,000 (scale bar 2 kb) with example track types labeled. Tracks shown: GENCODE v24 Comprehensive Transcript Set (EXOSC10, RP4-635E18.6), NCBI RefSeq gene predictions, OMIM Allelic Variants, GTEx Gene Expression in 53 tissues, Simple Nucleotide Polymorphisms (dbSNP 147) Found in >= 1% of Samples, H3K27Ac Mark on 7 cell lines from ENCODE, DNase I Hypersensitivity Peak Clusters from ENCODE, 100 vertebrates Basewise Conservation by PhyloP, Multiz Alignments of 100 Vertebrates, and Repeating Elements by RepeatMasker. Callout labels: Gene, Feature, Direction, 5’ UTR, 3’ UTR, Wiggle, Regions]

16 Additional options for Changing Images

• Drag Tracks up and down to re-arrange

• Drag-and-Select

– Zoom in

– Highlight selected section

• Highlight Chromosome View to zoom in/out of section

• Right-click track to change visibility

17 Basic Visibility Example

• Hide: remove track from view

• Dense: all items collapsed on a single line

• Squish: each item on separate line, but 50% height + packed

• Pack: each item on separate line and efficiently stacked

• Full: each item on separate line
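Visibility modes can also be requested when linking to a browser view. The Python sketch below builds such a link under stated assumptions: that the `hgTracks` CGI accepts `db`, `position` and `<trackName>=<mode>` URL parameters, and that `knownGene` is a valid track name for the chosen assembly. The helper name `browser_url` is hypothetical, and whether every mode listed on this slide is accepted as a URL value should be checked against the UCSC documentation.

```python
from urllib.parse import urlencode

# Modes as listed on the slide above.
VISIBILITY_MODES = ("hide", "dense", "squish", "pack", "full")


def browser_url(db, position, tracks,
                base="https://genome.ucsc.edu/cgi-bin/hgTracks"):
    """Build a Genome Browser link with per-track visibility settings.

    `tracks` maps a track name to one of VISIBILITY_MODES.
    """
    for name, mode in tracks.items():
        if mode not in VISIBILITY_MODES:
            raise ValueError(f"unknown visibility {mode!r} for track {name!r}")
    params = {"db": db, "position": position, **tracks}
    return f"{base}?{urlencode(params)}"


url = browser_url("hg38", "chr1:11097500-11102000", {"knownGene": "pack"})
print(url)
```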

18 Mid-Page options to Change Settings

• Reverse 5’ <-> 3’

• Data search

• Reset default track settings

• Hide all tracks

• Slice track view (e.g. only ...)

• Fit browser to window

19 Search for an item

§ If the name of the item you search for:

– Is a perfect, unique match in a track: the track will open and the item will be highlighted

– Matches multiple items or tracks: you see a list of possible matches and need to select the item/track you want

– Returns an “alternate” chromosome location: there are alternate haplotype chromosomes

20 Display Configuration (demo)

§ Track visibility

– Hide, dense, squish, pack, full (pack is usually a good choice)

– Tracks automatically collapse to squish/dense format when you zoom out far enough

§ Track ordering: drag and drop

§ Track right click menu

– Another way to change the visibility and configure display

§ Track highlighting and zoom

§ Search

22 Track Details

Configuration

Description

23 Item Details

Click any part of the gene

Item Description page With additional information

24 Get Sequence from Description Page

Genomic Sequence Coding Sequence Translated

Annotation Description Page

Region Page

25 Get DNA Sequence for whole View

26 Details and Sequence (demo)

§ Track Details

§ Item Details

§ Get DNA sequence

– Region

– Item
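The sequence for a region can also be retrieved without the web interface. The Python sketch below assumes the UCSC REST API endpoint `https://api.genome.ucsc.edu/getData/sequence`, its `genome`, `chrom`, `start` and `end` parameters, a JSON response with a `dna` field, and a 0-based start with an exclusive end; all of these are assumptions to verify against the API documentation. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

API_BASE = "https://api.genome.ucsc.edu/getData/sequence"


def sequence_url(genome, chrom, start, end):
    """Build the request URL; start is 0-based, end is exclusive (assumed)."""
    if not 0 <= start < end:
        raise ValueError("need 0 <= start < end")
    return f"{API_BASE}?genome={genome};chrom={chrom};start={start};end={end}"


def fetch_sequence(genome, chrom, start, end):
    """Fetch the DNA string for a region (requires network access)."""
    with urlopen(sequence_url(genome, chrom, start, end), timeout=30) as response:
        payload = json.load(response)
    return payload["dna"]


if __name__ == "__main__":
    # Only prints the URL; call fetch_sequence(...) to actually download.
    print(sequence_url("hg38", "chrM", 4321, 5678))
```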

28 How to Download data

• Method 1: Use the Table Browser to get a subset of data

• Method 2: Entire table

29 Method 1: The Table Browser

30 The Table Browser (Assembly and Table)

Specify the assembly and table

31 The Table Browser (Region)

Specify whole genome, or just a region

33 The Table Browser (Identifiers)

Optional: Get output only for specific item names

34 Table Browser Query (demo)

Retrieve a list of common SNPs in the BRCA2 gene

1. Navigate to the BRCA2 gene locus using the search box

2. Go to “Tools” -> “Table Browser”

3. Select:

– Assembly: GRCh38/hg38

– Group: Variation

– Track: Common SNPs (147)

– Table: snp147Common

– Region: position (these coordinates will be the BRCA2 position)

35 Table Browser Query (demo cont.)

Retrieve a list of common SNPs in the BRCA2 gene

4. Hit “get output”

5. Note that you get every column in the table

6. To limit to a few columns, go back and choose:

– Output format: “selected fields from primary and related tables”

7. Select only these fields

– chrom, chromStart, chromEnd, name, strand, class, func

36 Table Browser (Output)

§ Note: You need to add 1 to table start coordinates to get the position displayed in the Genome Browser

§ See explanations: https://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1

§ http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms
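The add-1 rule follows from the two conventions involved: tables store a 0-based start with an exclusive end, while the browser displays a 1-based start with an inclusive end, so only the start changes. A minimal Python sketch of the conversion (the function names are illustrative, not UCSC utilities):

```python
def table_to_browser(chrom_start, chrom_end):
    """0-based, half-open table coordinates -> 1-based, inclusive display."""
    return chrom_start + 1, chrom_end


def browser_to_table(start, end):
    """1-based, inclusive display coordinates -> 0-based, half-open table."""
    return start - 1, end


def format_position(chrom, chrom_start, chrom_end):
    """Render a table row's coordinates the way the browser shows them."""
    start, end = table_to_browser(chrom_start, chrom_end)
    return f"{chrom}:{start:,}-{end:,}"


# A table row with chromStart=3999, chromEnd=5000 is displayed as:
print(format_position("chr11", 3999, 5000))  # chr11:4,000-5,000
```

The length of a feature is `chromEnd - chromStart` in table coordinates and `end - start + 1` in display coordinates; both give the same value.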

37 The Table Browser (File Output)

• Enter a name for the output file

• A pop-up box asks where to save the output

• Output will be a tab-delimited file
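Once saved, the tab-delimited output can be read with standard tools. The Python sketch below parses the fields selected in the demo (chrom, chromStart, chromEnd, name, strand, class, func). The sample rows are made up for illustration and are not real SNP records; the leading `#` on the header line is an assumption about the output format, and `read_table` is a hypothetical helper.

```python
import csv
import io

# Made-up rows standing in for a saved Table Browser file.
SAMPLE = (
    "#chrom\tchromStart\tchromEnd\tname\tstrand\tclass\tfunc\n"
    "chr13\t100\t101\trsEXAMPLE1\t+\tsingle\tintron\n"
    "chr13\t250\t251\trsEXAMPLE2\t+\tsingle\tmissense\n"
)


def read_table(handle):
    """Read tab-delimited rows into dicts, converting coordinates to int."""
    reader = csv.reader(handle, delimiter="\t")
    header = [name.lstrip("#") for name in next(reader)]
    rows = []
    for fields in reader:
        row = dict(zip(header, fields))
        row["chromStart"] = int(row["chromStart"])
        row["chromEnd"] = int(row["chromEnd"])
        rows.append(row)
    return rows


rows = read_table(io.StringIO(SAMPLE))
missense = [row["name"] for row in rows if "missense" in row["func"]]
print(missense)
```

For a real download, replace `io.StringIO(SAMPLE)` with `open(path, newline="")`.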

38 Browser settings “stick”

§ Note that settings you make in the Table Browser stay there until you change them

§ If you get stuck, you can reset everything with the link below (under “get output”)

39 Method 2: Entire Table

40 Download Entire Table (Species)

41 Download Entire Table (Assembly and data type)

The genome assembly data

The track data

42 Download Entire Table (Table Names)

§ Use the Table Browser to find table names, and/or see this video: http://genome.ucsc.edu/training/vids/index.html#vid06
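Once a table name is known, the location of its full dump can be derived. The Python sketch below assumes the layout of the UCSC download server, `https://hgdownload.soe.ucsc.edu/goldenPath/<assembly>/database/<table>.txt.gz` with a matching `.sql` schema file; treat that layout, and the helper names, as assumptions to confirm on the download pages.

```python
import os
from urllib.request import urlretrieve

DOWNLOAD_BASE = "https://hgdownload.soe.ucsc.edu/goldenPath"


def table_urls(db, table):
    """Return the assumed URLs of a table's data dump and SQL schema."""
    base = f"{DOWNLOAD_BASE}/{db}/database/{table}"
    return {"data": base + ".txt.gz", "schema": base + ".sql"}


def download_table(db, table, dest_dir="."):
    """Download the gzipped table dump (requires network access)."""
    url = table_urls(db, table)["data"]
    dest = os.path.join(dest_dir, os.path.basename(url))
    urlretrieve(url, dest)
    return dest


print(table_urls("hg38", "snp147Common")["data"])
```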

44 Search Sequences (BLAT)

BLAT = BLAST-Like Alignment Tool

• Rapid search by indexing the entire genome

• Works best with highly similar matches
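The idea of searching by index can be illustrated with a toy example. The Python sketch below is not BLAT's implementation: it indexes non-overlapping k-mers of a made-up "genome" string, looks up every overlapping k-mer of the query, and reports exact seed hits. The sequences, the value k=4 and the function names are all illustrative assumptions; a real aligner would go on to extend and chain the seeds.

```python
from collections import Counter, defaultdict


def build_index(genome, k):
    """Index non-overlapping k-mers of the genome by their start offset."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def find_seeds(query, index, k):
    """Return (query_offset, genome_offset) pairs for exact k-mer hits."""
    hits = []
    for q in range(len(query) - k + 1):
        for g in index.get(query[q:q + k], ()):
            hits.append((q, g))
    return hits


genome = "ACGTACGTTTGACCAGTAGGCATCGATCGGATTACAGGCT"  # made-up sequence
query = "CAGTAGGCATCGATCG"                          # copied from genome[13:29]
seeds = find_seeds(query, build_index(genome, 4), 4)

# Seeds that share the same diagonal (genome_offset - query_offset)
# support one consistent placement of the query on the genome.
diagonal, support = Counter(g - q for q, g in seeds).most_common(1)[0]
print(seeds, diagonal, support)
```

Because only exact k-mer matches seed an alignment, a query that differs from the genome at many positions produces few seeds, which is consistent with the note that the search works best for highly similar sequences.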

45 BLAT Search Results

• Browser: shows a graphic view of the BLAT alignment

• Details: shows the query sequence aligned against the matching target sequence

46 Browser View of BLAT Results

47 Details View of BLAT Result

Matching Sequence

49 Sharing your view (Sessions)

• Requires login

• Lifespan: 4 months

50 Sessions (Naming and Sharing)

List of Saved Sessions

Naming of Session and Permissions

51 Exporting Images for Publication

52 Exporting Images for Publication (File Format)

• Can export PDF and EPS images (vector graphics, which do not get pixelated like screenshots)

53 Saving Session and Exporting Image (demo)

§ Set Tracks and Save Session

§ Export region to PDF file

55 Visualize your own data

§ Custom Tracks

§ Track Hubs

56 Setting Up Custom Track

1. Format the data set

– Format your data set in one of the formats supported by the Genome Browser: a text file, typically tab-separated (e.g. BED, bedGraph, GTF, WIG), or an indexed binary file (e.g. bigBed, bigWig, BAM)

2. Define the Genome Browser display characteristics

3. Define the annotation track display characteristics

4. Display the annotation track in the browser

More Information: https://genome.ucsc.edu/goldenPath/help/customTrack.html
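The four steps above can be sketched as a small Python helper that writes a BED custom track: a `browser` line sets the browser display, a `track` line sets the track display, and the data lines follow. The `browser position` and `track name=... description=... visibility=...` syntax is taken from the custom track documentation linked above; the function name, track name and feature coordinates are made up for illustration.

```python
def custom_track_text(position, name, description, features, visibility="pack"):
    """Assemble a BED custom track as text.

    `features` is a list of (chrom, start, end, label) tuples using
    0-based, half-open BED coordinates.
    """
    lines = [
        f"browser position {position}",
        f'track name="{name}" description="{description}" visibility={visibility}',
    ]
    for chrom, start, end, label in features:
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


text = custom_track_text(
    "chr21:33,026,000-33,039,000",
    "demoTrack",
    "Example custom track",
    [("chr21", 33031000, 33032000, "exampleA"),
     ("chr21", 33035000, 33036500, "exampleB")],
)
print(text)
```

The resulting text can be saved to a file, or pasted into the custom track form, for step 4.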

57 Where to Visualize Custom Tracks

There are two options to visualize Custom Tracks for BioHPC Users.

1. Share a custom track file using a link in the lamella web interface, and visualize on the internal BioHPC UCSC Genome Browser mirror at http://genome.biohpc.swmed.edu

2. Upload data to BioHPC's external file exchange site, share it with a link in the cloud.biohpc.swmed.edu web interface, and visualize on the public UCSC Genome Browser site. This method is not suitable for private or sensitive data.

More Information: https://portal.biohpc.swmed.edu/content/guides/ucsc-genome-browser/

58 Custom Track Demo (BioHPC)

1. Create simple bed file in /project or /work directories

2. Create shared link for the file on the lamella web services

3. Add Custom Track on BioHPC Mirror of UCSC

4. View on Browser

59 What are Track Hubs

§ Web-accessible directories (HTTP or FTP) of genomic data that can be viewed on the UCSC Genome Browser

§ Useful for visualizing a large number of genome-wide data sets

§ Genomic data is stored in compressed, indexed binary formats (e.g. bigBed, bigBarChart, bigGenePred, bigPsl, bigChain, bigMaf, bigWig, BAM, CRAM, HAL or VCF)

More Information: https://genome.ucsc.edu/goldenpath/help/hgTrackHubHelp.html

60 Setting up a Track Hub

1. One or more datasets in binary indexed format (e.g. bigBed, bigBarChart, bigGenePred, bigPsl, bigChain, bigMaf, bigWig, BAM, CRAM, HAL or VCF)

2. A set of text files that specify properties for the track hub and for each of the data tracks within it

3. An Internet-enabled web server or ftp server

More Information: https://genome.ucsc.edu/goldenpath/help/hgTrackHubHelp.html

61 Where to Visualize and Host a Track Hub

BioHPC users can visualize Track Hubs on the internal BioHPC UCSC Genome Browser mirror

§ Upload data to BioHPC's Thunder FTP server

§ More info: https://thunder.biohpc.swmed.edu/manage/instructions/

62 Track Hub Demo (BioHPC)

1. Create hub directory in /project/thunder_ftp directory

2. Create simple bed file and convert to bigBed

3. Create hub.txt, genomes.txt and trackDb.txt files

4. Get shared link for hub directory

5. Add Track Hub on BioHPC Mirror of UCSC

6. View on Browser
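The text files from step 3 can be sketched in Python as follows. The setting names (`hub`, `shortLabel`, `longLabel`, `genomesFile`, `email`, `genome`, `trackDb`, `track`, `bigDataUrl`, `type`) and the file name genomes.txt follow the UCSC track hub documentation; the labels, e-mail address, bigBed file name and the `write_hub` helper are placeholders, and the demo writes to a temporary directory rather than /project/thunder_ftp.

```python
import tempfile
from pathlib import Path


def write_hub(hub_dir, hub_name, genome, track_name, big_data_url, email):
    """Write minimal hub.txt, genomes.txt and <genome>/trackDb.txt files."""
    hub_dir = Path(hub_dir)
    genome_dir = hub_dir / genome
    genome_dir.mkdir(parents=True, exist_ok=True)
    (hub_dir / "hub.txt").write_text(
        f"hub {hub_name}\n"
        f"shortLabel {hub_name}\n"
        f"longLabel {hub_name} example hub\n"
        "genomesFile genomes.txt\n"
        f"email {email}\n"
    )
    (hub_dir / "genomes.txt").write_text(
        f"genome {genome}\n"
        f"trackDb {genome}/trackDb.txt\n"
    )
    (genome_dir / "trackDb.txt").write_text(
        f"track {track_name}\n"
        f"bigDataUrl {big_data_url}\n"  # relative to trackDb.txt, or a full URL
        f"shortLabel {track_name}\n"
        f"longLabel {track_name} example bigBed track\n"
        "type bigBed\n"
    )


with tempfile.TemporaryDirectory() as tmp:
    write_hub(tmp, "demoHub", "hg38", "demoTrack", "demo.bb", "user@example.org")
    hub_text = (Path(tmp) / "hub.txt").read_text()
    written = sorted(p.relative_to(tmp).as_posix() for p in Path(tmp).rglob("*.txt"))
print(written)
```

The bigBed file from step 2 (here the placeholder `demo.bb`) would sit next to trackDb.txt in the assembly subdirectory.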

63 Other Tools

§ Use LiftOver to convert coordinates of one assembly to another:

– http://genome.ucsc.edu/cgi-bin/hgLiftOver

§ Use the Variant Annotation Integrator to predict the functional effects of SNPs:

– http://genome.ucsc.edu/cgi-bin/hgVai

§ Use Data Hubs to share tracks and assemblies with a wider audience:

– http://genome.ucsc.edu/cgi-bin/hgHubConnect

64 Where to go for Help

§ FAQ and link to archived questions/answers:

– http://genome.ucsc.edu/FAQ/

§ Online training videos and tutorials:

– http://genome.ucsc.edu/training/index.html

– Open Helix videos highly recommended

§ If you are struggling, send an email to [email protected] . . . they answer!

(note that this is a publicly searchable list)

§ For BioHPC-specific mirror questions, send an email to [email protected]

65 WashU Epigenome Browser

Internal BioHPC WashU Epigenome Browser mirror at http://epigenome.biohpc.swmed.edu/

66 Perspective (Genome Browser in 2002)

67 Thank you

Thank you to the UCSC Genome Browser and BioHPC teams!

68 Questions

Questions?
