UCSC Genome Browser Tutorial

Introduction to the UCSC Genome Browser

Overview
Today we’re going to cover:
§ Introduction to the UCSC Genome Browser
§ Navigation and layout
§ Search and Configure
§ Details and Sequences
§ Download data
§ Search by Sequence (BLAT)
§ Share configuration and Export Images
§ Visualize your own data

http://genome.ucsc.edu/

What is the UCSC Genome Browser?
The UCSC Genome Browser is a graphical visualization tool for exploring the genome and genomic data. It was developed by Jim Kent and is maintained by the UCSC Genome Browser Project Team.

Genome Browser: like Google Maps for Genomes
[Figure labels: Businesses, Bus Stops, Traffic]

Genome Browser: like Google Maps for Genomes (cont.)
Coordinates (like streets): position in the DNA sequence
“Annotation tracks”: everything below the coordinates annotates something notable about that region
[Figure: hg19 view of chr21:33,026,000-33,039,000 with three labeled track groups. Genes: UCSC Genes (RefSeq, GenBank, CCDS, Rfam, tRNAs & Comparative Genomics). Expression: RNA-seq from ENCODE/Caltech (GM78 and K562). Regulation: Transcription Factor Binding Sites by ChIP-seq from ENCODE/HAIB.]

Why use the UCSC Genome Browser?
§ Visualize genetic and genomic data
– Publicly available data
– Your own data
§ Download data
– Useful if you want a subset of the data
– Common interface for data from many public sources
§ Tools for working with genomic data
– Graphical interface
– Command-line

Genome Browser Home Page

Gateway: Start Page
[Figure labels: Change Assembly, Change Species, Search Genes]
Search terms accepted:
• Gene names, symbols, IDs
• Chromosome number (chr7) or position (chr11:4000-5000)
• Keywords: kinase, receptor

Choosing an Assembly
§ The Genome Browser displays a “reference sequence” for every organism
– Genetic variation (SNPs, insertions, deletions, etc.) is annotated in terms of how it differs from the reference
§ The species-specific genome reference sequence is still being fine-tuned, and there are different versions (called “assemblies”)
§ If you have a list of genomic coordinates, they belong to a particular assembly!

Gateway: Start Page
• Hit “GO” without entering a search term to go to a default location
• Or to the last place you were (the browser “remembers” it)

Genome Browser View
[Figure labels: Zoom, Navigation, Ideogram, Search, Genome View]

Genome Browser View (cont.)
[Figure labels: Tracks, Category, Change Visibility]

Visualization of Track Types
[Figure: hg38 view of chr1:11,097,500-11,102,000 (EXOSC10 locus). Tracks shown: GENCODE v24 Comprehensive Transcript Set (only Basic displayed by default); RefSeq gene predictions from NCBI; OMIM Allelic Variants; Gene Expression in 53 tissues from GTEx RNA-seq of 8555 samples (570 donors); Simple Nucleotide Polymorphisms (dbSNP 147) Found in >= 1% of Samples; H3K27Ac Mark (Often Found Near Regulatory Elements) on 7 cell lines from ENCODE; DNase I Hypersensitivity Peak Clusters from ENCODE (95 cell types); 100 vertebrates Basewise Conservation by PhyloP; Multiz Alignments of 100 Vertebrates; Repeating Elements by RepeatMasker. Callouts: Gene, Feature Direction, 5’ UTR, Exon, 3’ UTR, Wiggle, Regions.]

Additional Options for Changing Images
• Drag tracks up and down to re-arrange
• Drag-and-select
• Zoom in
• Highlight selected section
• Highlight Chromosome View to zoom in/out of section
• Right-click track to change visibility

Basic Visibility Example
• Hide: remove track from view
• Dense: all items collapsed on a single line
• Squish: each item on a separate line, but at 50% height and packed
• Pack: each item on a separate line and efficiently stacked
• Full: each item on a separate line

Mid-Page Options to Change Settings
• Reverse 5’ <-> 3’
• Data Search
• Reset Default Track settings
• Hides all tracks
• Slice track view (e.g. only exons)
• Fit Browser to Window

Search an Item
§ If the name of the item you search for:
– Is a perfect unique match in a track: the track will open and the item will be highlighted
– Matches multiple items or tracks: you see a list of possible matches and need to select the item/track you want
– Returns an “alternate” chromosome location: there are alternate haplotype chromosomes

Display Configuration (demo)
§ Track visibility
– Hide, dense, squish, pack, full (pack is usually a good choice)
– Tracks automatically collapse to squish/dense format when you zoom out far enough
§ Track ordering: drag and drop
§ Track right-click menu
– Another way to change the visibility and configure display
§ Track highlighting and zoom
§ Search

Track Details
[Figure labels: Configuration, Description]

Item Details
Click any part of the gene to open the item description page with additional information.

Get Sequence from Description Page
[Figure labels: Genomic Sequence, Coding Sequence, Translated Protein, Annotation Description Page, Region Page]

Get DNA Sequence for Whole View

Details and Sequence (demo)
§ Track Details
§ Item Details
§ Get DNA sequence
– Region
– Item

How to Download Data
• Method 1: Use the Table Browser to get a subset of data
• Method 2: Entire table

Method 1: The Table Browser

The Table Browser (Assembly and Table)
Specify the assembly and table

The Table Browser (Region)
Specify whole genome, or just a region

The Table Browser (Identifiers)
Optional: Get output only for specific item names

Table Browser Query (demo)
Retrieve a list of common SNPs in the BRCA2 gene
1. Navigate to the BRCA2 gene locus using the search box
2. Go to “Tools” -> “Table Browser”
3. Select:
– Assembly: GRCh38/hg38
– Group: Variation
– Track: Common SNPs (147)
– Table: snp147Common
– Region: position (these coordinates will be the BRCA2 position)
4. Hit “get output”
5. Note that you get every column in the table
6. To limit to a few columns, go back and choose:
– Output format: “selected fields from primary and related tables”
7. Select only these fields:
– chrom, chromStart, chromEnd, name, strand, class, func

Table Browser (Output)
§ Note: You need to add 1 to table start coordinates to get the position displayed in the Genome Browser
§ See explanations: https://genome.ucsc.edu/FAQ/FAQtracks.html#tracks1
§ http://genomewiki.ucsc.edu/index.php/Coordinate_Transforms

The Table Browser (File Output)
• Enter name for output file.
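The "add 1 to table start coordinates" note can be sketched in a few lines of Python. This is a minimal illustration, not part of the tutorial: it assumes the table rows use 0-based, half-open chromStart/chromEnd values, so only the start shifts. The function names and the example coordinates are made up for illustration.

```python
def table_to_browser(chrom, chrom_start, chrom_end):
    """Turn table coordinates (0-based start, exclusive end) into a
    browser-style position string (1-based start, inclusive end)."""
    # Zero-length items (e.g. insertions, where start == end) are out of
    # scope for this sketch.
    if chrom_start < 0 or chrom_end <= chrom_start:
        raise ValueError("expected 0 <= chromStart < chromEnd")
    # Only the start moves; the exclusive end already equals the
    # 1-based inclusive end.
    return f"{chrom}:{chrom_start + 1}-{chrom_end}"


def browser_to_table(position):
    """Parse a position such as 'chr11:4,000-5,000' into
    (chrom, chromStart, chromEnd) in table coordinates."""
    chrom, span = position.replace(",", "").split(":")
    start, end = span.split("-")
    return chrom, int(start) - 1, int(end)


if __name__ == "__main__":
    # Hypothetical single-base item, as a SNP row might look in table output.
    print(table_to_browser("chr13", 32315473, 32315474))
    print(browser_to_table("chr11:4000-5000"))
```

Running the helpers in both directions on the same row is a quick way to check that a downloaded list of coordinates and the browser view refer to the same bases.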
