Browsing Genomes with Ensembl
www.ensembl.org
www.ensemblgenomes.org

Coursebook v91
training.ensembl.org/events
Erasmus MC, Rotterdam, 13-14 February 2018

Contents

Introduction to Ensembl
    Synopsis: What can I do with Ensembl?
    Need more help?
    Stay in touch!
    Further reading
Exploring the Ensembl genome browser
    Demo: Homepage, assemblies and species
    Demo: The Location tab and Region in detail view
Genes and transcripts
    Demo: The Gene tab
    Demo: The Transcript tab
Variation
    Demo: Exploring variants in Ensembl
    Demo: The Variant Effect Predictor (VEP)
Comparative genomics
    Demo: Gene trees and homologues
    Demo: Whole genome alignments
Regulation
BioMart
Advanced access
    Demo: Custom data upload: Upload small files
    Demo: Track Hubs
    Demo: REST API
Quick Guide to Databases and Projects

Introduction to Ensembl

Getting started with Ensembl
www.ensembl.org

Ensembl is a project from the EMBL-EBI (European Molecular Biology Laboratory-European Bioinformatics Institute) that focuses on the annotation of vertebrate genomes. Gene sets from model organisms such as Saccharomyces cerevisiae, Caenorhabditis elegans and Drosophila melanogaster are also included for comparative analysis by the Ensembl ‘Compara’ team. Most annotation is updated every two months, and each update is released as a new Ensembl version (such as version 90); the gene sets themselves are determined less frequently. A sister browser at www.ensemblgenomes.org provides access to non-vertebrates, namely bacteria, plants, fungi, metazoa, and protists.

Ensembl provides genes and other annotation such as regulatory regions, conserved base pairs across species, and sequence variations. The Ensembl gene set is based on protein and mRNA evidence in the UniProtKB and NCBI RefSeq databases, along with manual annotation from the VEGA/Havana group. All the data are freely available and can be accessed via the web browser at www.ensembl.org.
Perl programmers can directly access Ensembl databases through Application Programming Interfaces (Perl APIs), while a REST API is available to access Ensembl databases using other programming languages. Gene sequences can be downloaded from the Ensembl browser itself, or through the BioMart web interface, which extracts information from the Ensembl databases without requiring any programming knowledge. Finally, all data can be downloaded via the Ensembl FTP server, and the code is available on GitHub.

Synopsis: What can I do with Ensembl?

● View genes with other annotation along the chromosomes.
● View alternative transcripts (i.e. splice variants) for a given gene.
● Explore homologues and phylogenetic trees across more than 100 species for any gene.
● Compare whole genome alignments and conserved regions across species.
● View microarray probe sequences that match to Ensembl genes.
● View ESTs, clones, mRNA and proteins for any chromosomal region.
● Examine single nucleotide polymorphisms (SNPs) for a gene or chromosomal region.
● View SNPs across strains (rat, mouse), populations (human), or breeds (dog).
● View positions and sequence of mRNAs and proteins that align with Ensembl genes.
● Upload your own data.
● Use BLAST or BLAT against any genome in Ensembl.
● Export sequences or create a table with gene information using BioMart.
● Determine how variants affect genes and transcripts using the Variant Effect Predictor.
● Share Ensembl displays with your colleagues and collaborators.

Need more help?

● Check Ensembl documentation
● Watch video tutorials on YouTube or YouKu
● View the FAQs
● Try some exercises
● Read some publications
● Take our online course
● Watch our webinar series

Stay in touch!
● Email the team with comments or questions at [email protected]
● Follow the Ensembl blog
● Sign up to a mailing list
● Find us on Facebook: https://www.facebook.com/Ensembl.org/
● Follow us on Twitter: @ensembl and @ensemblgenomes

Further reading

Zerbino, Daniel R. et al. Ensembl 2018. Nucleic Acids Research (Database Issue). https://doi.org/10.1093/nar/gkx1098
Kersey, PJ et al. Ensembl Genomes 2018: an integrated omics infrastructure for non-vertebrate species. Nucleic Acids Research (Database Issue). https://doi.org/10.1093/nar/gkx1011

For a complete list of publications, visit:
http://www.ensembl.org/info/about/publications.html
http://ensemblgenomes.org/info/publications

Exploring the Ensembl genome browser

All clickable links are highlighted in blue. All typed and manually submitted text is highlighted in red.

Demo: Homepage, assemblies and species

The front page of Ensembl is found at ensembl.org. It contains lots of information and links to help you navigate Ensembl. At the top right you can see the current release number and what has come out in this release.

To access old releases, scroll to the bottom of the page and click on View in archive site. Click on the links to go to the archives. Alternatively, you can jump quickly to the correct release by adding it to the URL: for example, e78.ensembl.org jumps to release 78.

Click on View full list of all Ensembl species under the ‘All genomes’ heading. Click on the common name of your species of interest to go to the species homepage. We will click on Human.

To find out more about the genome assembly and genebuild, click on More information and statistics. The current genome assembly for human is GRCh38. If you want to see the previous assembly, GRCh37, visit our dedicated site, grch37.ensembl.org.

Let’s take a look at the Ensembl Genomes homepage at ensemblgenomes.org. Click on Ensembl Bacteria, Fungi, Metazoa, Plants or Protists to see their homepages. Each one is colour-coded.
You can navigate most of the taxa in the same way as you would with Ensembl, but Ensembl Bacteria has a large number of genomes, so it needs a slightly different approach. Let’s look at it in more detail.

The search box does not offer a full list of bacterial species, because so many genomes would be hard to navigate. To find a species, start typing the species name into the genome search box. A drop-down list will appear with possible species. For example, to find a strain of Streptococcus pyogenes, start typing the species name. The autocomplete will retrieve useful results as soon as you get to stre.

The drop-down contains various strains of Streptococcus pyogenes. Let’s choose Streptococcus pyogenes AA216. This will take us to another species homepage, where we can explore various features.

Demo: The Location tab and Region in detail view

Start at the Ensembl front page, ensembl.org. You can search for a region by typing it into a search box, but you have to specify the species. Type (or copy and paste) human 4:122868000-122946000 into either search box. Press Enter or click Go to jump directly to the Region in detail page.

Click on the button to view page-specific help. The help pages provide links to Frequently Asked Questions, a Glossary, Video Tutorials, and a form to Contact Helpdesk. There is a help video on this page at http://youtu.be/tTKEvgPUq94.

The Location tab shows three images: Chromosome, Overview and Region. Let’s look at each one in detail.

The first image shows the chromosome. You can jump to a different region by dragging out a box in this image. Drag out a box on the chromosome; a pop-up menu will appear. To move to the region, you could click on Jump to region (### bp). To highlight it, click on Mark region (### bp). For now, we will close the pop-up by clicking on the X in the corner.

The second image shows a 1 Mb region around our selected region.
This view allows you to scroll back and forth along the chromosome. You can also drag out a box and jump to or mark a region. Click on the X to close the pop-up menu.

Click on the Drag/Select button to change the action of your mouse click. Now you can scroll along the chromosome by clicking and dragging within the image. As you do this, you will see the image below grey out and two blue buttons appear. Clicking on Update this image would jump the lower image to the region at the centre of the scrollable image. We want to go back to where we started, so we will click on Reset scrollable image.

The third image is a detailed, configurable view of the region. Click on the Drag/Select option at the top or bottom right to switch the mouse action. On Drag, you can click and drag left or right to move along the genome; the page will reload when you release the mouse button. On Select, you can drag out a box to highlight or zoom in on a region of interest.

With the tool set to Select, drag out a box around an exon and choose Mark region. The highlight will remain in place if you zoom in and out or move around the region. This allows you to keep track of regions or features of interest.

You can edit what you see on this page by clicking on the blue Configure this page menu at the left. This will open a menu that allows you to change the image. Some tracks can be displayed in different styles; more details are in this FAQ: http://www.ensembl.org/Help/Faq?id=335.

Let’s add some tracks to this image. Add:

● Proteins (mammal) from UniProt – Labels
● 1000 Genomes - EUR - short variants (SNPs and indels)

Now click on the tick at top left to save and close the menu. Alternatively, click anywhere outside of the menu. You can now see the tracks in the image.
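
The location string typed in this demo, human 4:122868000-122946000, has a simple structure: a species, a chromosome name, and a start and end coordinate. The Python sketch below is only an illustration and is not part of the coursebook. It parses such a string and reports the region length, assuming Ensembl's 1-based, inclusive coordinates. The names Location, parse_location and overlap_url are hypothetical helpers. The URL it builds assumes the overlap/region endpoint of the Ensembl REST API at rest.ensembl.org, which this section does not cover; check the REST documentation before relying on it.

```python
import re
from typing import NamedTuple
from urllib.parse import urlencode


class Location(NamedTuple):
    """A genomic region as typed into the Ensembl search box (hypothetical helper)."""
    species: str
    chromosome: str
    start: int
    end: int

    @property
    def length(self) -> int:
        # Assumes 1-based, inclusive coordinates, so both ends are counted.
        return self.end - self.start + 1


# Optional species word, then chromosome:start-end (commas allowed in numbers).
_LOCATION = re.compile(r"^(?:(\w+)\s+)?(\w+):([\d,]+)-([\d,]+)$")


def parse_location(text: str, default_species: str = "human") -> Location:
    """Parse strings such as 'human 4:122868000-122946000'."""
    match = _LOCATION.match(text.strip())
    if match is None:
        raise ValueError(f"Not a recognisable location: {text!r}")
    species, chromosome, start, end = match.groups()
    start_bp = int(start.replace(",", ""))
    end_bp = int(end.replace(",", ""))
    if start_bp > end_bp:
        raise ValueError("Start coordinate is larger than end coordinate")
    return Location(species or default_species, chromosome, start_bp, end_bp)


def overlap_url(loc: Location, feature: str = "gene") -> str:
    """Build a REST URL listing features in the region (assumed endpoint)."""
    query = urlencode({"feature": feature, "content-type": "application/json"})
    region = f"{loc.chromosome}:{loc.start}-{loc.end}"
    return f"https://rest.ensembl.org/overlap/region/{loc.species}/{region}?{query}"


loc = parse_location("human 4:122868000-122946000")
print(loc.length)        # length of the demo region in base pairs
print(overlap_url(loc))  # URL that could be opened in a browser or fetched
```

The sketch only builds the URL and does not contact the server, so it runs offline. Fetching the URL, for example with urllib.request, would be the step that actually retrieves the genes shown in the Region in detail view.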