Ensembl Bacteria Coursebook
Browsing Bacterial Genomes
www.ensemblgenomes.org
bacteria.ensembl.org
Coursebook
http://www.ebi.ac.uk/~dstaines/Workshops/2014/Marseille
Marseille – 23rd June 2014

TABLE OF CONTENTS

Introduction to Ensembl Bacteria
  What can I do with Ensembl Bacteria?
Exploring the Ensembl Bacteria genome browser
  Demo: Finding genomes
  Exercise: Finding genomes
  Demo: The Region in detail view
  Exercises: The Region in Detail view
Genes and transcripts
  Demo: The gene tab
  Demo: The transcript tab
  Demo: Searching by sequence
  Exercises: Genes and transcripts
Comparative genomics
  Demo: Gene trees and homologues
  Demo: Gene families
  Exercises: Comparative genomics
Using your own data
  Demo: Viewing your data in the browser
  Demo: The Variant Effect Predictor (VEP)
  Exercises: Using your own data
Downloading data
  Exporting individual data
  Downloading whole genomes
Programming with Ensembl Bacteria
  Getting started with the REST API
  Getting started with the Perl API

Introduction to Ensembl Bacteria

Ensembl Genomes (www.ensemblgenomes.org) is an integrating resource for genome-scale data from non-vertebrate species, provided by the EBI (European Bioinformatics Institute). The project exploits and extends technologies for genome annotation, analysis and dissemination, developed in the context of the vertebrate-focused Ensembl project (www.ensembl.org), and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis.

The Ensembl Genomes project includes the Ensembl Bacteria portal, which provides access to over 10,000 genomes from more than 2,000 species of bacteria and archaea.

What can I do with Ensembl Bacteria?

- Explore more than 10,000 genomes from over 2,000 bacterial and archaeal species.
- View genes with other annotation along the chromosome.
- View detailed annotation for genes.
- Download gene sequences and annotation for individual genes or genomic regions.
- Explore homologues and phylogenetic trees across the taxonomic range for selected genomes.
- Find genes in the same family for any gene.
- Upload your own data for viewing in a genome.
- Find sequences matching your own from any genome.
- Determine how your variants affect genes and transcripts using the Variant Effect Predictor.
- Find genes involved in metabolic pathways.
- Share Ensembl views with your colleagues and collaborators.
- Download data in different formats for any genome.
- Use Ensembl Bacteria data in your own programs and scripts using the REST and Perl APIs.

Need more help?

- Read our documentation
- View the FAQs
- Read some publications

Stay in touch!

- E-mail the team with comments or questions at [email protected]
- Sign up to a mailing list

Further reading

Kersey, PJ et al. Ensembl Genomes 2013: scaling up access to genome-wide data. Nucleic Acids Research 2014, 42 (D1): D546-D552. PMID: 24163254. doi:10.1093/nar/gkt979

Spudich, GM and Fernández-Suárez, XM. Touring Ensembl: A practical guide to genome browsing. BMC Genomics 2010, 11:295 (11 May 2010)

Exploring the Ensembl Bacteria genome browser

Aims

By the end of this section, you should know how to:

- find a genome you’re interested in
- find information about a genome
- browse the genome

Demo: Finding genomes

The front page of Ensembl Bacteria is found at bacteria.ensembl.org. It contains lots of information and links to help you navigate:

[Screenshot: the Ensembl Bacteria front page. Labels: Link back to homepage; Tools; Help; Search; Top bar present on every page; Search for a gene; Information on Ensembl Bacteria; Complete genome list; News]

Whilst there is a full genome list for bacteria, the large number of genomes makes it hard to navigate. To find a particular genome, the easiest way is to start typing the species name into the genome search box. A drop-down list will appear with possible genomes. You can then select your choice.
Alternatively, if you hit enter at this point, you’ll be taken to a search page where you can examine the possible matches.

For example, to find a substrain of Clostridium difficile, type in Clostridium d. The drop-down list contains various strains of Clostridium difficile. Let’s choose Clostridium difficile 630. This will take us to the genome homepage, where we can explore various features.

[Screenshot: the genome homepage. Labels: Search this genome; Annotation examples; Comparative examples; Launch the VEP tool]

Exercise: Finding genomes

Go to Ensembl Bacteria and find the species Belliella baltica. How many coding and non-coding genes does it have?

Demo: The Region in detail view

The Region in Detail view shows the genes and other features present in an area of the genome, and allows you to navigate around the genome.

We’ll search for C. difficile as we did before. On the species homepage there’s a link to a sample region, Chromosome:1889811-1890515. Click on it to jump to the Region in Detail page.

[Screenshot: the Region in Detail page. Labels: Circular chromosome – plasmids may also be shown; 100kb shown in this view]

Let’s look at the chromosome in more detail:

[Screenshot: the circular chromosome image. Labels: Origin of sequencing (sometimes this is also the origin of replication, but not always); Gene density; GC skew; GC content; Our region of interest]

You can jump to a different region of interest by dragging out the handles of the region slice. This gives you a red highlighted slice that you can click on to jump to a new region.

Next, the upper panel in this display shows an overview of the genes in the region, colour-coded by type:

[Screenshot: the upper panel. Labels: Contig; Controls for configuration, sharing and export; Gene; Region to show in detail]

Finally, the lower panel shows a detailed view of the region selected in the red box, including features such as transcripts:

[Screenshot: the lower panel. Labels: Controls for configuration, sharing and export; Contig; Transcript; %GC]

You can find out more about features in either panel by clicking on them.
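The sample location used in this demo can also be handled in a script. The following is a minimal Python sketch, not part of the original course material, that splits a location such as Chromosome:1889811-1890515 into its parts, reports its length, and builds a request URL for the sequence/region endpoint of the Ensembl REST service. The server address https://rest.ensembl.org and the species name clostridium_difficile_630 are assumptions made for illustration; the name the server actually expects for this genome may differ.

```python
from urllib.parse import quote

# Assumption: the public Ensembl REST server; this coursebook does not give its address.
REST_SERVER = "https://rest.ensembl.org"


def parse_region(label):
    """Split a browser location such as 'Chromosome:1889811-1890515'
    into (sequence name, start, end). Thousands separators are accepted."""
    name, _, span = label.rpartition(":")
    start_text, _, end_text = span.partition("-")
    start = int(start_text.replace(",", ""))
    end = int(end_text.replace(",", ""))
    if not name or start < 1 or end < start:
        raise ValueError(f"not a valid region: {label!r}")
    return name, start, end


def region_length(start, end):
    """Coordinates are 1-based and inclusive, so both ends are counted."""
    return end - start + 1


def sequence_url(species, name, start, end):
    """Build a sequence/region request URL (the REST region syntax uses '..')."""
    region = f"{name}:{start}..{end}"
    return f"{REST_SERVER}/sequence/region/{quote(species, safe='')}/{quote(region, safe=':')}"


if __name__ == "__main__":
    # Hypothetical species name, used only for illustration.
    species = "clostridium_difficile_630"
    name, start, end = parse_region("Chromosome:1889811-1890515")
    print(region_length(start, end), "bp")
    print(sequence_url(species, name, start, end))
```

The sketch only builds the URL and leaves the request itself to the reader, so it can be tried without network access.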
There are also options to configure the images in Ensembl Bacteria by clicking the “Configure this page” button. This opens a new panel where you can choose new tracks to show in the display, including repeats, genomic features and translated sequences:

[Screenshot: the configuration panel. Labels: Controls for configuration, sharing and export; Active track; Inactive track]

Exercises: The Region in Detail view

Exercise – Exploring a genomic region in E. coli

(a) Find the genome of E. coli str. K-12 substrain MG1655. Go to the region from 2,781,249 to 2,860,361 bp on the chromosome. What is the name of the sequence read on which this genome is based?
(b) Zoom in on the proV gene.
(c) Turn on the All Repeats track in this view. Are there any repeats near the proV gene?
(d) Create a Share link for this display. Email it to yourself and open the link.
(e) Export the genomic sequence of the region you are looking at in FASTA format.
(f) Turn off all tracks you added to the Region in detail page.

Genes and transcripts

Aims

By the end of this section, you should know how to:

- find out about a particular gene
- search for a gene by keyword or name
- search for a gene by sequence similarity

Demo: The gene tab

If you click on any one of the transcripts in the Region in detail image, a pop-up menu will appear, allowing you to jump directly to that gene or transcript.

[Screenshot: the transcript pop-up menu. Label: Links]

Another way to go to a gene of interest is to search directly for it. You can do this from the gene search box on the home page, from the search box in the top bar, or from the genome-specific search box on the home page for a particular genome. We’re going to use the gene search box on the home page.

We’re going to look at the lacZ gene in E. coli. This gene encodes beta-galactosidase, an enzyme involved in the metabolism of lactose, and part of the well-studied lac operon.
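For readers who want to script a gene lookup as well as use the search box, here is a minimal Python sketch, not part of the original course material, that builds a request URL for the lookup/symbol endpoint of the Ensembl REST service and, optionally, fetches the JSON record. The server address https://rest.ensembl.org and the species name escherichia_coli_str_k_12_substr_mg1655 are assumptions made for illustration; the name the server expects for this genome may differ.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumption: the public Ensembl REST server; this coursebook does not give its address.
REST_SERVER = "https://rest.ensembl.org"


def lookup_symbol_url(species, symbol, expand=False):
    """Build a lookup/symbol request URL for one gene symbol in one genome."""
    path = f"/lookup/symbol/{quote(species, safe='')}/{quote(symbol, safe='')}"
    params = {"content-type": "application/json"}
    if expand:
        # Ask the server to include transcripts in the returned record.
        params["expand"] = "1"
    return f"{REST_SERVER}{path}?{urlencode(params, safe='/')}"


def fetch_json(url):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical species name, used only for illustration.
    species = "escherichia_coli_str_k_12_substr_mg1655"
    print(lookup_symbol_url(species, "lacZ"))
    # Uncomment to query the server; the fields returned depend on the service.
    # print(fetch_json(lookup_symbol_url(species, "lacZ")))
```

Unlike the gene search box, which returns hits from all genomes, this kind of request targets a single genome, so the species name has to be known in advance.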
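From bacteria.ensembl.org, type lacZ into the gene search box and click the Go button:

[Screenshot: the gene search box on the home page. Label: Gene search]

You will get a list of hits from all genomes:

[Screenshot: the search results page. Labels: Result type; Genome filter]

To filter for a given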