Training Materials
• Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ • If you wish to re-use these materials, please credit Ensembl for their creation • If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html
http://training.ensembl.org/events Browsing Genes and Genomes with
Ensembl
Astrid Gall Ensembl Outreach Officer, EMBL-EBI 13th Ensembl browser workshop Erasmus MC, Rotterdam EBI is an Outstation of the European Molecular Biology Laboratory. 13 - 14 February 2018 This Workshop
Today Tomorrow Introduction & Exploring the Comparative genomics Ensembl genome browser Coffee/Tea Coffee/Tea Regulation Genes and transcripts Lunch break Lunch break BioMart Variation Coffee/Tea Coffee/Tea Tools, custom data, track The Variant Effect Predictor (VEP) hubs, advanced access
http://training.ensembl.org/events Structure of the Workshop
Presentation: What the data/tool is How we produce/process the data
Demo: Follow along if Getting the data you want to Using the tool
Exercises: Trying things out for yourself (alone/pairs?) Going beyond the demo Not a test! http://training.ensembl.org/events Extra Exercises Course Materials
training.ensembl.org/events
• Presentation • Coursebook (demos) • Questionbook (exercises) • Copysheet (text file for exercises) • Answerbook (exercise answers)
http://training.ensembl.org/events Objectives
You will learn:
• What Ensembl is • What type of data you can explore in Ensembl • How to navigate the Ensembl browser website • Where to find help and documentation
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Introduction to Ensembl
EBI is an Outstation of the European Molecular Biology Laboratory. Outline
• Genome browsers • Ensembl features and access scales • Ensembl and Ensembl Genomes • Release Cycle • Genome Assembly
http://training.ensembl.org/events Why do we need Genome Browsers?
1977: 1st genome sequenced (5 kb) 2004: Human genome sequence finished (3 Gb)
http://training.ensembl.org/events Why do we need Genome Browsers?
CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCAC TTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTC CGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATT GAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACA GCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGA GTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGCATTGGAGGA ATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAA ATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCAT AAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATA AAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTT TTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACT AAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGA GACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTAC ACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACA TAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCA TATTATCACAACATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCT AAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGA ATATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTC CCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCATTAAACTA TTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCA TTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAAATGTCTACCAAAT TATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATG AGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAGAGATTATTAAAAGAAGTGCTAAA GCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACAC TCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATC TTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTATGACTGTTAGCTA http://training.ensembl.org/events Genome Browsers unlock the Code
https://www.ensembl.org https://www.ncbi.nlm.nih. gov/projects/mapview
https://genome.ucsc.edu http://training.ensembl.org/events Ensembl Features • Genomes and gene builds for >100 species • Variation data • Alignments, gene trees, homologues • Regulatory build (ENCODE) • Gene expression and function
• BioMart (data export) • Tools for data processing, e.g. VEP • Display your own data • Programmatic access via APIs • Completely Open Source (FTP, GitHub) http://training.ensembl.org/events Access Scales
One by one Main browser Mobile site
BioMart REST API Perl API VEP MySQL
Groups FTP Whole genome
http://training.ensembl.org/events Ensembl - 100+ Vertebrate and Model Organism Genomes
http://training.ensembl.org/events Ensembl Genomes - Extending Ensembl across the Taxonomic Space
www.ensembl.org www.ensemblgenomes.org ● Vertebrates ● Bacteria
● Fungi
● Metazoa ● Other representative species ● Plants
● Protists http://training.ensembl.org/events Ensembl and Ensembl Genomes
Ensembl Ensembl Genomes Released 2000 2009 Species Vertebrates (fly, worm and Non-vertebrates (protists, yeast as outgroups) plants, fungi, metazoa, bacteria) Annotation By Ensembl In collaboration with the scientific communities URL www.ensembl.org www.ensemblgenomes.org
http://training.ensembl.org/events Release Cycle New/updated interfaces 9091 AugDec 2017 Updated New regulation genome data assemblies ~3 months Updated variation Underlying data software updates Comparative Genomics including new genes Updated and genomes http://training.ensembl.org/events gene sets How does a Genome Assembly work?
Sequence reads CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA CAGCTGTCCCAGATGAC ACTTAACTTCCCTCCCAGCTGTCC GGGCTCCGCCTTCAGCTC TCCCAGCTGTCCCAGATGACGCCAT AACTTCCCTCCCAGCT CGGCCTTTGGGCTCC TCCGCCTTCAGCTCAAGACTTAACTTC CAGATGACGCC
Match up overlaps
CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA AACTTCCCTCCCAGCT CAGATGACGCC TCCGCCTTCAGCTCAAGACTTAACTTC TCCCAGCTGTCCCAGATGACGCCAT GGGCTCCGCCTTCAGCTC ACTTAACTTCCCTCCCAGCTGTCC CGGCCTTTGGGCTCC CAGCTGTCCCAGATGAC
Contig CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCAT http://training.ensembl.org/events Cloning into Bacterial Artificial Chromosomes (BACs)
BL Cut into fragments & clone into BACs
BL001 BL002 BL003 BL004
Grow in bacterial cultures
http://training.ensembl.org/events Genome Assembly: Tilepath, contigs, scaffolds, chromosome
BACs from different BL001 IM004 individuals assembled, AL002 CM003 including overlaps: Tilepath
Overlaps trimmed to give contigs. A run of contigs with no gaps is a scaffold.
Genetic maps used to assemble scaffolds into a chromosome. http://training.ensembl.org/events Genome Contigs
A contig is a contiguous stretch of DNA sequence that BL102 has been assembled based on sequencing information.
BL AL476
AL CM553
CM IM768
IM http://training.ensembl.org/events Human Genome Assemblies
• GRCh38 (aka hg38) • No gaps. Many rare/private alleles replaced. • www.ensembl.org • Most up-to-date and supported • GRCh37 (aka hg19) • 250 gaps • grch37.ensembl.org • Limited data and software updates • NCBI36 (aka hg18) • 150,000 gaps • ncbi36.ensembl.org • No longer updated http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Exploring the Ensembl Genome Browser:
Homepage, Assemblies & Species
EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at the Ensembl homepage and find information about the genome assemblies and species in Ensembl.
• Demo: Coursebook page 7-13 • Exercises: Questionbook page 3 • Answers: Answerbook page 3-4
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Exploring the Ensembl Genome Browser:
The Location tab & Region in detail view
EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at a region of the human genome, 4:122868000-122946000, and manipulate the view to see the data we are interested in.
• Demo: Coursebook page 13-19 • Exercises: Questionbook page 3-5 • Answers: Answerbook page 4-8
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Genes and
Transcripts
EBI is an Outstation of the European Molecular Biology Laboratory. Outline
• Gene views in Ensembl • Annotation • Automatic • Manual • Imported • Golden transcripts, GENCODE, CCDS • Which transcript do I use? • Ensembl stable IDs • Gene Ontology
http://training.ensembl.org/events Gene Views
Coding exon Intron Non-coding exon Merged transcript
Protein coding transcript
Non-coding transcript
http://training.ensembl.org/events Ensembl and Havana Annotation
Automatic Annotation Manual Annotation
http://training.ensembl.org/events Automatic Annotation
• Genome-wide determination using the Ensembl automated pipeline
• Predictions based on experimental (biological) data
http://training.ensembl.org/events Biological Evidence International Nucleotide Sequence Database Collaboration • cDNAs • ESTs • RNAseq
Protein sequence databases • Swiss-Prot: Manually curated • TrEMBL: Unreviewed translations
NCBI RefSeq • Manually annotated proteins and mRNAs (NP, NM)
http://training.ensembl.org/events Other Species
• Infer genes from homology to other species E.g. predict genes in by mapping cDNAs/proteins from to the genome
• RNAseq data
http://training.ensembl.org/events Manual Annotation
• Genome-wide annotation
• Gene determination on a case-by-case basis by a person
• Uses data from databases and papers
http://training.ensembl.org/events Manual Annotation Advantages Disadvantages • More comprehensive • Slower • More transcripts per gene, • Small scale especially non-coding transcripts • More genes, especially non-coding genes • Require less evidence (quality > thresholds) • More accurate for difficult regions • UTRs • Splice sites • Single exon transcripts • Immunoglobulins http://training.ensembl.org/events Imported Annotation
• We have imported annotation for a small number of species from external sources • Quality ensured by: • Our rigorous QC • Trusted sources
Chinese hamster ovary Ryukyu mouse Shrew mouse
http://training.ensembl.org/events Golden (merged) Transcripts
• Identical annotation
•
• Higher confidence and quality
http://training.ensembl.org/events GENCODE
• The GENCODE gene set is the merged set of:
• Ensembl automatically annotated genes and
• Havana manually annotated genes
• It is the default gene set used by ENCODE, 1000 Genomes and other major projects.
http://training.ensembl.org/events CCDS Transcripts
• Consensus Coding DNA Sequence set • Agreement between EBI, WTSI, UCSC and NCBI
http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi
http://training.ensembl.org/events Which transcript do I use?
Transcript support level: Scored 1-5 for quality (1= best) https://www.ensembl.org/Help/Glossary?id=492 GENCODE basic: Prioritises full-length protein coding transcripts https://www.ensembl.org/Help/Glossary?id=500 APPRIS principal isoform: The major isoform(s) from combining protein structural information, functionally important residues and evidence from cross-species alignments https://www.ensembl.org/Help/Glossary?id=521 +CCDS and +Golden transcripts only http://training.ensembl.org/events Ensembl stable IDs ENSG########### Ensembl Gene ID ENST########### Ensembl Transcript ID ENSP########### Ensembl Peptide ID ENSE########### Ensembl Exon ID ENSR########### Ensembl Regulatory region ID
For non-human species a suffix is added: MUS (Mus musculus) for mouse: ENSMUSG### DAR (Danio rerio) for zebrafish: ENSDARG### https://www.ensembl.org/info/genome/stable_ids/index.html http://training.ensembl.org/events More Information
Aken et al. The Ensembl gene annotation system. Database (Oxford) (2016) doi: 10.1093/database/baw093 http://europepmc.org/articles/PMC4919035
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Why Gene Ontology (GO)?
Multiple terms for Innate Non-specific the same thing immunity immunity
Gene descriptions Complement Cytokines too specific Natural killer cells
Phagocytes Mast cells
http://training.ensembl.org/events GO terms form a Controlled Vocabulary
GO:0045087 - innate immune response
Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.
http://training.ensembl.org/events GO terms are hierarchical GO:0006955 immune response
GO:0045087 GO:0006957 innate immune response GO:0009682 complement activation, induced systemic alternative pathway resistance GO:0035420 GO:0042381 GO:0009814 MAPK cascade involved in defence response, hemolymph coagulation innate immune response incompatible interaction GO:0034342 GO:0001867 GO:0002227 GO:0009616 response to type II complement activation, innate immune response virus induced gene interferon lectin pathway in mucosa silencing GO:0034340 GO:0035006 GO:0002228 GO:0034341 response to type I melanisation defence natural killer cell response to interferon response mediated immunity interferon-gamma GO:0045089 GO:0009626 GO:0045824 GO:0045088 positive reg of innate plant-type negative reg of innate regulation of innate immune response hypersensitive response immune response immune response http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
• We will look at the ESPN gene and find information about the gene and its transcripts.
• Demo: Coursebook page 20-31 • Exercises: Questionbook page 6-8 • Answers: Answerbook page 9-13
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Variation
EBI is an Outstation of the European Molecular Biology Laboratory. Outline
• Classification of variants • Consequences of variants • Species with variation data & variant sources • Browsing variation data • Gene tab • Transcript tab • Location tab • Variant tab • Variant Effect Predictor
http://training.ensembl.org/events Variant Types
1) Small scale in one or few nucleotides of a gene • Single nucleotide polymorphism (SNP) • Small insertions and deletions (DIPs or indels) A G A C T T G A C C T G T C T - A A C T G G A T G A C T T G A C - T G T C T G A A C G G G A
2) Large scale in chromosomal structure (structural variation) • Copy number variants (CNV) • Large deletions/duplications, insertions, translocations
deletion duplication insertion translocation http://training.ensembl.org/events Variant Consequences
CODING CODING Synonymous Non-synonymous Regulatory (Missense)
ATG AAAAAAA
5’ Upstream 5’ UTR Splice site Intronic 3’ UTR 3’ Downstream
http://training.ensembl.org/events Consequence Terms: Sequence Ontology (SO)
http://www.ensembl.org/info/genome/variation/data_description.html#classes http://training.ensembl.org/events SIFT and PolyPhen
• Predict the effect of variants on amino acid sequences using scores based on:
• Level of conservation of the protein • Chemical change in the amino acid • 3D structure and domains (PolyPhen only)
• Only provide predictions, not facts • Predictions will never be as good as experimental validation
http://training.ensembl.org/events SIFT PolyPhen
1 1 Probably damaging 0.2 Possibly damaging 0.1
Tolerated
Benign
0.05 Deleterious 0 0 http://training.ensembl.org/events 22 Species with Variation Data
http://training.ensembl.org/events Species with Variation Data in
Ensembl Genomes
http://training.ensembl.org/events Variant Sources
https://www.ensembl.org/info/docs/variation/sources_documentation.html http://training.ensembl.org/events HapMap Project Genotyping: 1,301 individuals from 11 populations
CHD TSI JPT Chinese Tuscan Japanese CEU NW European CHB GIH Han Chinese Gujarati
MEX ASW LWK Mexican African Luhya YRI Yoruba American MKK Maasai
America Africa Europe East Asia Central-South Asia http://training.ensembl.org/events 1000 Genomes Project Sequencing: 2,500 individuals at 4X coverage
ITU STU FIN GBR CHB CEU IBR JPT GIH TSI CDX
MXL ASW PUR GW PJL CHS BEB MSL YRI LWK CLM ACB D ESN KHV
PEL
America Africa Europe East Asia Central-South Asia http://training.ensembl.org/events Genome Aggregation Database (GnomAD) Allele frequency data for >130,000 individuals from seven populations
http://gnomad.broadinstitute.org/about https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/ http://training.ensembl.org/events Human Population Genetics: Allele Frequencies
• HapMap: Worldwide, genotyping • 1000 Genomes: Worldwide, WGS, healthy individuals • ExAC: Exomes, diseased/healthy individuals, skewed to most studied populations • gnomAD: WGS and exomes, diseased/healthy individuals, skewed to most studied populations • UK10K: UK-wide, exomes, diseased individuals • TOPMed: WGS, diseased individuals
http://training.ensembl.org/events Reference Alleles BL102 BL102
BL AL476
AGTCGTAGCTAGC TAGGCCATAGGCGA
AL T is the allele in the contig used, therefore:
CM553 ● T is the reference allele ● G is the alternate allele
● Alleles are T/G CM IM768 Frequency T = 0.05, frequency G = 0.95 G is the allele in all primates
IM T causes disease susceptibility http://training.ensembl.org/events Allele Strand
AGTCGTAGCTAGC T/GAGGCCATAGGCGA
A/C GCTAGCTACGACT TCGCCTATGGCCT
Exon sequence: TATGGCCTA/CGCTAGC
Alleles in database = T/G Alleles in gene = A/C Alleles = A/C -ve strand or T/G +ve strand
Alleles = A/C or T/G Often lack further information http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
• We will look at the MCM6 gene and find variants in the gene. • We will look at the region of MCM6 and find variants in the region. • We will look at the variant rs4988235 and find information about it.
• Demo: Coursebook page 32-41 • Exercises: Questionbook page 9-10 • Answers: Answerbook page 14-16
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events What is the Variant Effect Predictor (VEP)?
A tool to predict and annotate the functional consequences of variants (SNPs, insertions, deletions, CNVs, other structural variants) Output • Affected gene, transcript Input and protein sequence • Variant coordinates • VCF • Pathogenicity /
• HGVS • Frequency data • Variant IDs • Regulatory consequences
• Splicing consequences
Ensembl resources and data (including Variation) • Literature citations http://training.ensembl.org/events https://www.ensembl.org/info/docs/variation/predicted_data.html VEP Features ● Ensembl Ensembl Regulatory Build: • Over 5,000 species ● RefSeq ● ENCODE ● Merged ● BLUEPRINT • ● GENCODE basic ● NIH Epigenomics Roadmap Consequence predictions Can be limited to specific cell types for different transcript sets ● dbSNP ● COSMIC • Consequence predictions ● ClinVar for regulatory regions ● ESP ● HGMD-Public ● 1000 Genomes ● PhenCode ● ESP • Known variants ● ExAC ● GnomAD
• Allele frequencies Via plugins: Built in: ● CADD ● SIFT & ● FATHMM • Pathogenicity predictions ● PolyPhen ● LRT ● MutationTaster ● Many more! • Phenotype/disease, clinical ● OMIM significance ● Orphanet ● GWAS catalog ● ClinVar http://training.ensembl.org/events Use the VEP!
Large user-base, including: ● Illumina ● CRUK ● Congenica ● Fabric Genomics ● Personalis
https://www.ensembl.org/info/docs/tools/vep/index.html http://training.ensembl.org/events Input Formats
Variant coordinates 1 881907 881906 -/C + 5 140532 140532 T/C + 12 1017956 1017956 T/A + 2 946507 946507 G/C + 14 19584687 19584687 C/T -
HGVS notation ENST00000285667.3:c.1047_1048insC 5:g.140532T>C NM_153681.2:c.7C>T ENSP00000439902.1:p.Ala2233Asp NP_000050.2:p.Ile2285Val
VCF #CHROM POS ID REF ALT 20 14370 rs6054257 G A 20 17330 . T A 20 1110696 rs6040355 A G,T 20 1230237 . T .
Variant IDs rs41293501 COSM327779 rs146120136 FANCD1:c.475G>A rs373400041 http://training.ensembl.org/events The VEP works for many Species
?
+ everything in Plants, Fungi, Metazoa, Protists and Bacteria http://training.ensembl.org/events Set up a Cache • To speed up your VEP script! • Use prebuilt caches for species in Ensembl or • Make your own from GTF and FASTA files - even for genomes not in Ensembl
✓
https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html http://training.ensembl.org/events VEP Plugins
• Add extra functionality • Can extend, filter or manipulate the output • May make use of external data or code • Are available with both web tool and script
http://training.ensembl.org/events More Information
McLaren et al. The Ensembl Variant Effect Predictor. Genome Biology (2016) 17:122 http://europepmc.org/abstract/MED/27268795
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on We have identified four variants on human chromosome nine, an A deletion at 128328461, C->A at 128322349, C->G at 128323079 and G->A at 128322917.
We will use the VEP to determine: • If the variants have been annotated in Ensembl already • Which genes are affected by the variants • If any of these variants affect gene regulation
• Demo: Coursebook page 41-46 • Exercises: Coursebook page 10-11 • Answers: Answerbook page 16-17
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Comparative Genomics
EBI is an Outstation of the European Molecular Biology Laboratory. Outline
• Comparative genomics: Applications and species • Gene trees • Homology predictions • Whole genome alignments • Pairwise • Multiple • Shared synteny
http://training.ensembl.org/events Applications of Comparative Genomics
• Evolutionary analysis • Differences between the genomes of species • Gene function based on homology • Distribution of highly conserved regions
http://training.ensembl.org/events Comparative Analysis by Taxa
Ensembl Ensembl Metazoa Compara Compara
http://training.ensembl.org/events Pan-taxonomic Compara
http://ensemblgenomes.org/info/ http://training.ensembl.org/events genomes?pan_compara=1 Pan-taxonomic Compara
Ensembl Metazoa Compara
http://training.ensembl.org/events Gene Trees
• Representative protein translation of each gene from all species in Ensembl • BLAST, clustering, multiple alignment • Phylogenetic tree for each aligned cluster • Reconciliation against species tree inferred from taxonomy • Orthologue/Paralogue inference • dN/dS values (optional) Orthologues and Paralogues
http://www.ensembl.org/info/docs/compara/h http://training.ensembl.org/events omology_method.html Homology Relationships
Paralogues Orthologues Genes emerged Genes emerged through duplication through speciation e.g. e.g. c1 and c2 c1 and h1 h1 and h2 h2 and m c2 and m
c1 h1 c2 h2 m
One-to-one One-to-many
Speciation Duplication http://training.ensembl.org/events More Information
Herrero J. et al Ensembl comparative genomics resources. Database (Oxford) (2016) doi: 10.1093/database/bav096 http://europepmc.org/abstract/MED/26896847
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
We will look at the BRCA2 gene to find gene trees and homologues.
• Demo: Coursebook page 47-50
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Whole Genome Alignments Pairwise and multiple
• To identify highly conserved regions • Sequences that evolve slowly • Regions likely to be functional • Coding or non-coding sequences • To spot problematic gene predictions • To define syntenic regions
http://training.ensembl.org/events Alignments
• Pairwise alignments with BLASTZ (older) LASTZ-net (newer)
• EPO (Enredo-Pecan-Ortheus) analysis • For selected sets (12 fish, 7 sauropsids, 67 eutherian mammals, 12 primates)
• Mercator-Pecan analysis • For 31 amniota vertebrates (mammals and birds)
http://www.ensembl.org/info/genome/c http://training.ensembl.org/events ompara/analyses.html#pecan Shared Synteny
http://www.ensembl.org/info/docs/ http://training.ensembl.org/events compara/analyses.html Questions?
http://training.ensembl.org/events Hands on
We will look at a region of the human genome, 2:176087000-176202000, which contains the HoxD cluster to find alignments and conserved regions.
• Demo: Coursebook page 50-54 • Exercises: Questionbook page 12-14 • Answers: Answerbook page 18-22
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Ensembl Regulation
EBI is an Outstation of the European Molecular Biology Laboratory. Outline - Ensembl Regulation Annotation of the genome with features that may play a role in transcriptional regulation of genes • Data types & detection methods that inform the Ensembl Regulatory Build • Epigenetic marks • DNA methylation • Histone modifications • Transcription Factor and RNA Pol binding • Open/closed chromatin • Current and future data in Ensembl • Processing the data
http://training.ensembl.org/events Epigenetics
The study of heritable changes in gene function, without changes in the DNA sequence.
These changes regulate gene expression.
http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation
Stem cell
Differentiated cell http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation
http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation
The cells have a different: • Function • Morphology • Gene expression • Epigenome http://training.ensembl.org/events DNA Methylation of a Promoter Region represses Transcription
CHCH CH CH
CH3 3 3 CH3 CH3 3 CH3 3
CC CG CAATCC CGCG ATCTAA CGCGCG GCTAGCATAAG CGCG GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGCGGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGCGCTATCAG
3 3 3 3
3 3 3 3 CH CH CH CH CH CH CH CH
http://training.ensembl.org/events Bisulfite Sequencing To detect DNA Methylation
CH3 CH3 CH3 GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAG
CH3 CH3
Bisulfite treatment
GGUGGGATTGUGUGTTAGATCGCGCGUTTATGUTAGUCGCGUTGATAG
Sequence and compare to reference http://training.ensembl.org/events Histone Modifications
We describe histone modifications using the form: Subunit, Amino acid, Position, Modification, e.g. H3K36me3. http://training.ensembl.org/events Histone Code
Modification Histone
H3K4 H3K9 H3K14 H3K27 H3K79 H4K20 H2BK5
me1
me2
me3
ac
http://training.ensembl.org/events Promoters, Enhancers etc
• Transcription factor (TF)-binding to promoters and enhancers is necessary for transcription • Combinations of epigenetic marks affect the ability and probability of TF-binding at these sites
http://training.ensembl.org/events ChIP-seq To detect Histone Modifications and TF-binding DNA-binding protein DNA Antibody Crosslink protein to DNA
Covalent Unlink, wash, bond purify DNA
Shear DNA
Sequence DNA fragments Pull down with an antibody binding ACGCTGACTAGAATCAATGGCT protein of interest TCTCTTCGCATATGGCTGACTA http://training.ensembl.org/events Open/closed Chromatin
Open chromatin is transcriptionally active. Closed chromatin is inactive.
http://training.ensembl.org/events DNase Sensitivity To assay open/closed Chromatin
DNase treatment
Purify DNA
Sequence and compare to reference http://training.ensembl.org/events Detecting Features involved in Regulation of Genes
Feature Detection Method
DNA methylation Bilsulfite sequencing
Histone modifications ChIP-seq
Transcription factor binding ChIP-seq
Open/closed chromatin DNase sensitivity
http://training.ensembl.org/events Current Data
http://training.ensembl.org/events The Future
http://training.ensembl.org/events A Subset of Cell Types
• Ensembl only displays a subset of available data. • We display cell types that have, at a minimum: • CTCF binding • DNase or FAIRE data • H3K4me3, H3K27me3, H3K36me3 data • We display all TFBS and histone modification data known for these cell types. • We process these data to predict activity.
• You can add further data using track hubs.
http://training.ensembl.org/events Processing the Data
• We take raw data from the various sources. • We predict the positions of regulatory features, such as promoters, enhancers and insulators. • We predict the activity of these features in different cell types.
• You can view all of this in the genome browser.
http://training.ensembl.org/events Raw Data
Transcription Factor A Transcription Factor B Transcription Factor C Histone mod1 Histone mod2 Histone mod3
http://training.ensembl.org/events Searching for Patterns
known promoter
known promoter
known promoter
http://training.ensembl.org/events Segmentation
Transcription Factor A Transcription Factor B Transcription Factor C
Histone mod1 Histone mod2 Histone mod3
http://training.ensembl.org/events MultiCell Features
Cell type 1
Cell type 2
Cell type 3
Cell type 4
http://training.ensembl.org/events Cell-specific Features
MultiCell
Cell type 1
Cell type 2
Cell type 3
Cell type 4
http://training.ensembl.org/events We do not…
• …link promoters/enhancers/insulators or any other regulatory features to genes. We allow you to see what is where and to make your own inferences.
• …link regulatory features to gene expression. We have cell-line specific regulation data and tissue specific expression data – make of it what you will.
Regulatory data is incredibly complex and still in relative infancy. There is no comprehensive database of regulation data.
http://training.ensembl.org/events More Information
Zerbino et al. The Ensembl Regulatory Build Genome Biology (2015) 16:56 http://europepmc.org/abstract/MED/25887522
Zerbino et al. Ensembl regulation resources Database (Oxford) (2016) doi: 10.1093/database/bav119 http://europepmc.org/abstract/MED/26888907
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
• We will look at the region of the gene LIMD2 and find regulatory features. • We will explore what cells types they are active in and what evidence there is to support this.
• Demo: Coursebook page 55-62 • Exercises: Questionbook page 15 • Answers: Answerbook page 23-25
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Data Mining with BioMart
EBI is an Outstation of the European Molecular Biology Laboratory. Outline
● Introduction to BioMart ● The Principle: Four steps ● biomaRt
http://training.ensembl.org/events What is BioMart?
A tool in your browser to:
• Build queries with a few mouse clicks • Export data without programming • Generate custom data tables and files
http://training.ensembl.org/events Why use BioMart?
Because some tasks are too time consuming and/or difficult with the Ensembl browser
• Query multiple things (genes/variants) at once:
• ID conversions • Gene locations • Download sequences
• Export medium to large amounts of data http://training.ensembl.org/events Where can I find BioMart?
www.ensembl.org/biomart/martview
metazoa.ensembl.org/biomart/martview
http://training.ensembl.org/events For which Genomes is BioMart available?
Ensembl Ensembl Fungi (some exceptions) Ensembl Metazoa Ensembl Plants Ensembl Protists (some exceptions)
http://training.ensembl.org/events How do I use BioMart? The four steps
Dataset Filters Attributes Results Choose Narrow Specify what View & database down the to include as export table/ & species dataset your output sequences
http://training.ensembl.org/events Step 1: Dataset
Dataset Filters Attributes Results
Define the database you want to search with your filters: • Genes • Variation • Regulation • Mouse strains
Define the species/mouse strain http://training.ensembl.org/events Step 2: Filters
Dataset Filters Attributes Results
Define a (large) set of genes/variants by combining parameters, e.g.:
• A region Filter 1 Filter 2 • A list of IDs • Function (GO term) • Phenotypes
http://training.ensembl.org/events Get Attributes for these Step 3: Attributes
Dataset Filters Attributes Results
Define the data you want for that set, e.g.: • IDs • Features • Variants • Orthologues/Paralogues • Sequences
http://training.ensembl.org/events Step 4: Results
Dataset Filters Attributes Results
View and export the data table/sequences in a number of formats: • html • csv • tsv • xls • fasta
http://training.ensembl.org/events biomaRt: A Bioconductor package for BioMart
http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html
Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data using the R statistical programming language
• Easy installation in R: source("http://bioconductor.org/biocLite.R") biocLite("biomaRt") • Documentation: http://www.bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/d oc/biomaRt.html http://training.ensembl.org/events More Information
Kinsella R.J. et al Ensembl BioMarts: a hub for data retrieval across taxonomic space Database (Oxford) (2011) doi: 10.1093/database/bar030 http://europepmc.org/abstract/MED/21785142
Smedley, D. et al BioMart – biological queries made easy BMC Genomics (2009) 10:22 http://europepmc.org/abstract/MED/19144180
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
We will look at a set of six Homo sapiens genes, ESPN, MYH9, USH1C, CISD2, THRB and GIPC3, and find out: • Their NCBI gene IDs • Their function via GO terms • Their cDNA sequences
Dataset Filters Attributes Results Gene set Gene names NCBI gene IDs A. gambiae List of GO terms Homogenes sapiens names cDNA sequences
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Hands on
We will look at a set of six Homo sapiens genes, ESPN, MYH9, USH1C, CISD2, THRB and GIPC3, and find out: • Their NCBI gene IDs • Their function via GO terms • Their cDNA sequences
• Demo: Coursebook page 63-67 • Exercises: Questionbook page 16-18 • Answers: Answerbook page 26-31
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Ensembl Tools
EBI is an Outstation of the European Molecular Biology Laboratory. Ensembl Tools
My favourite tool does not I want to work with GRCh38! work with the file I But all my data has been downloaded from the Ensembl generated with GRCh37... FTP site!
Are there any viral sequences in my genome? And where? Is my variant in Linkage Disequilibrium with Using my new PCR other variants? primers, I see more than one band on the gel. Where do my primers bind? And how The VCF files from the 1000 often? Genomes project are huge! It I cannot find the gene I takes too long to download work on, or read them and I ran out of disk about, in Ensembl space… And I am only anymore! interested in the MC1R gene in the GBR population. http://training.ensembl.org/events Ensembl Tools
https://www.ensembl.org/info/docs/tools/index.html http://training.ensembl.org/events BLAST/BLAT To search our genomes for your DNA or protein sequence • Up to 30 sequences against on one or multiple genomes • Output: • Table with standard BLAST/BLAT output • Visualisation on karyotype and query sequence • Link to Location tab viewing the result in Region in detail view • For all species in Ensembl and text, FASTA or sequence ID input
https://www.ensembl.org/Help/View?db=core;id=451 http://training.ensembl.org/events File Chameleon To convert Ensembl files for use with other analysis tools • Files on the Ensembl FTP site are in standard formats, but… … different tools define the standards differently • Your favourite tool may need files in a slightly different format • Reformat files with these adjustments and download them • For all species in Ensembl and GFF3, GTF or FASTA files
http://www.ensembl.org/Homo_sapiens/Tools/FileChameleon?db=cor http://training.ensembl.org/events e Assembly Converter To map (liftover) your data's coordinates to the current assembly [or vice versa]
• Uses CrossMap: http://crossmap.sourceforge.net • Convert coordinates on one genome assembly to another, e.g.
• Only for and BED, GFF, GTF, VCF or WIG files http://training.ensembl.org/events ID History Converter To convert a set of Ensembl IDs from a previous release into their current equivalents • Find what Ensembl IDs map to in the current release and see their history through releases • Only for Ensembl stable IDs starting with ENS • For all species in Ensembl
http://training.ensembl.org/events Linkage Disequilibrium Calculator To calculate LD between variants using genotypes from a selected population
• Calculate pairwise LD: • in a given region • for a given list of variants • for a given variant within a defined window size • Get a table with: • Variant 1 & 2 ID, location, consequence, evidence • r2 and D’ • Currently (!) only for data from 1000 Genomes project phase 3
https://www.ensembl.org/Help/View?db=core;id=587 http://training.ensembl.org/events Ensembl Tools (GRCh37)
All tools from GRCh38 plus:
https://grch37.ensembl.org/info/docs/tools/index.html http://training.ensembl.org/events Data Slicer To get a subset of data from a BAM or VCF file • Uses SAMtools: http://samtools.sourceforge.net and VCFtools: https://vcftools.github.io/specs.html • BAM or VCF files for whole genomes contain all variants and genotypes of all individuals • You may want to analyse a small region, e.g. a gene, and a single population only • Take a slice of a file based on coordinates (BAM or VCF), and filtering by individuals or populations (VCF only) • Use data from 1000 Genomes project phase 1 or 3 or provide URL to your file • Only for and
http://grch37.ensembl.org/Homo_sapiens/Tools/DataSlicer?db=core http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Custom Data
EBI is an Outstation of the European Molecular Biology Laboratory. Visualise your own Data Add custom tracks with your own data
• BED • WIG • BedGraph • BAM/CRAM • GFF2/GTF • BigBed • GFF3 • BigWig • Pairwise interactions (WashU) • VCF • PSL
http://www.ensembl.org/info/website/tutorials/userdata.html http://www.ensembl.org/info/website/upload/index.html#format http://training.ensembl.org/eventss Adding Custom Tracks
• Upload a file to our servers (max 20MB unzipped) from: • Your own computer or • A location on the web (URL-based data) • Attach an indexed file via a URL • E.g. BigWig, BigBed or BAM • The file will be read each time you visit a page that has the track on.
• Tracks can be saved to your account and shared with others • Only trivial security, do not use for sensitive data
http://www.ensembl.org/info/website/upload/index.html#addtrack http://training.ensembl.org/events Hands on
We will upload a small BED file and view it in the browser. The BED format consists of one line per feature, each containing 3-12 columns of data (chrom, chromStart, chromEnd, name…)
chr5 36821632 37091234 P1 chr5 36731476 36978306 P2 chr5 36908552 37108671 P3
• Demo: Coursebook page 68-70
http://www.ensembl.org/info/website/upload/bed.html
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Track Hubs
EBI is an Outstation of the European Molecular Biology Laboratory. Track Hubs
• The Track Hub Registry (https://trackhubregistry.org) is a global centralised collection of publicly accessible track hubs.
• Many large projects (ENCODE, Roadmap Epigenomics, Blueprint) make data available in sets of tracks organised as a single hub.
• Discover and use track hubs with different types of data and view it in Ensembl and/or Create a track hub containing your own data to share with the public and collaborators!
http://www.ensembl.org/info/website/public_trackhubs.html
http://training.ensembl.org/events Hands on
We will add the Blueprint Track Hub and view a number of tracks containing data from the Blueprint project.
• Demo: Coursebook page 71-74
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Advanced Access
EBI is an Outstation of the European Molecular Biology Laboratory. Access Scales
One by one Main browser Mobile site
BioMart REST API Perl API VEP MySQL
Groups FTP Whole genome
http://training.ensembl.org/events Advanced Access to Data in Ensembl
To access data at larger scales: • FTP: Download files of complete Ensembl databases • MySQL: Direct database access • Perl API: Programmatic access • Ensembl Virtual Machine • REST API: Language agnostic access
http://training.ensembl.org/events FTP Site
Files of our complete databases • Genomic DNA, cDNA, CDS, ncRNA and protein sequence (FASTA) • Annotated sequence (EMBL, GenBank) • Gene sets (GTF, GFF3) • Variation (GVF, VCF) • Regulation (GFF, BED, BigWig) • Whole-genome multiple and gene-based multiple alignments (MAF) • Constrained elements (BED) • RNA-Seq files (BAM, BigWig) • MySQL databases
http://training.ensembl.org/events Access our FTP Site ftp://ftp.ensembl.org/pub/
Your favourite FTP Command line FTPclient downloads webpage: https://www.ensembl.org/info/data/ftp/index.html
http://training.ensembl.org/events Files on the FTP Site are large
• Multiple Mb/Gb • Lots of time to download/unzip
• Do you really need this data? • Make sure it is the right file before you download it. • Use a fast (wired) connection.
http://training.ensembl.org/events FTP Site Summary
Skills needed Web-browsing or FTP client use. Handling and parsing the files types. Scalability Complete database only. Speed Many minutes to download/unzip a file. Difficulty to query Files easy to download/unzip. Long-term New files with each release. File types are identical. Sequences? Yes.
http://training.ensembl.org/events Questions?
http://training.ensembl.org/events What is REST?
• Representational state transfer (REST) allows you to query the database using simple URLs giving output in plain text format
E.g. http://rest.ensembl.org/xrefs/symbol/homo_sapiens/BRCA2?con tent-type=application/json gives: [{"type":"gene","id":"ENSG00000139618"},{"type":"gene","id" :"LRG_293"}]
• This means you can write scripts in any language to construct these URLs and read their output http://training.ensembl.org/events REST API
• We have had a Perl API for a long time … … but not everybody works in Perl
• The REST API allows language agnostic access
• Visit rest.ensembl.org for installation, documentation and examples
Use grch37.rest.ensembl.org for GRCh37
http://training.ensembl.org/events Hands on: REST API - Single Endpoint
• I have the coordinates of a protein motif in a protein. I would like to find the coordinates of the motif in the genome.
• I am interested in a coiled-coil domain at position 116-216 in the protein ENSP00000386200.
• Demo: Coursebook page 75-77
http://training.ensembl.org/events Hands on: REST API - Single Endpoint
• I want to get a FASTA sequence for an Ensembl gene ID • I need to find an appropriate endpoint: http://rest.ensembl.org/
http://rest.ensembl.org/sequence/id/ENSG00000157764
gives:
>ENSG00000157764 chromosome:GRCh38:7:140719327:140924764:-1 CGCCTCCCTTCCCCCTCCCCGCCCGACAGCGGCCGCTCGGGCCCCGGCTCTCGGTTATAA GATGGCGGCGCTGAGCGGTGGCGGTGGTGGCGGCGCGGAGCCGGGCCAGGCTCTGTTCAA CGGGGACATGGAGCCCGAGGCCGGCGCCGGCGCCGGCGCCGCGGCCTCTTCGGCTGCGGA http://training.ensembl.org/events Hands on: REST API - Multiple Endpoints
• I want a script that gets a gene name from the command line and prints gene name, ID and sequence.
• There is no one endpoint that does this action, so we have to combine two endpoints with a script.
http://training.ensembl.org/events Hands on: Python Script
#!/usr/bin/env python # decode the json output import json # Get modules needed for script genes = json.loads(get_genes) import sys import urllib # move through the genes one-by-one import urllib2 for gene in genes: import json import time # define the REST query to get the sequence import httplib2, sys from the gene http = httplib2.Http(".cache") ext_get_seq = '/sequence/id/' + gene['id'] + '?';
# Get the gene name from the command line # submit the query gene_name = sys.argv[1] resp, get_seq = http.request(server+ext_get_seq, method="GET", # define the general URL parameters headers={"Content-Type":"application/json"}) server = "http://rest.ensembl.org" # decode the json output # define REST query to get the gene ID from the import json gene name seq = json.loads(get_seq) ext_get_gene = "/xrefs/symbol/homo_sapiens/" + gene_name + "?" # print the gene name, ID and sequence
print '>', gene_name, gene['id'], "\n", # submit the query seq['seq'] resp, get_genes = http.request(server+ext_get_gene, method="GET", headers={"Content-Type":"application/json"}) http://training.ensembl.org/events REST API Summary Skills needed Programming in any language. Understanding of features in Ensembl. Scalability Can query whole database. Speed Minimum query speed 150ms. Time cost per datapoint. Difficulty to query Need to construct URLs. May need scripts to dissect data. Long-term API takes database changes into account. Scripts do not need to change between releases. Sequences? Yes. http://training.ensembl.org/events Questions?
http://training.ensembl.org/events MySQL
Direct database access using MySQL queries
mysql -u anonymous -h ensembldb.ensembl.org
mysql> use homo_sapiens_core_82_38;
mysql> SELECT gene.stable_id FROM gene, xref WHERE gene.display_xref_id = xref.xref_id AND xref.display_label LIKE ’brca2';
https://www.ensembl.org/info/data/mysql.html http://training.ensembl.org/events Perl API
Database querying using Perl scripts
my $gene_adaptor = $registry-> get_adaptor( 'human', 'core', ‘gene' );
my $gene = $gene_adaptor-> fetch_by_display_label( 'brca2' );
print $gene->stable_id, "\n";
https://www.ensembl.org/info/data/api.html http://training.ensembl.org/events Questions?
http://training.ensembl.org/events Feedback
http://training.ensembl.org/events/
http://training.ensembl.org/events Summary
Ensembl is a genome browser which integrates: • Genome annotation • Variation • Regulation • Comparative Genomics
http://training.ensembl.org/events How is all this Data organised?
• Ensembl browser websites • Main website, Pre!, Archive!, GRCh37, Ensembl Genomes • BioMart ‘Data Mining tool’ • Ensembl Database (open source) • Perl API, REST API, MySQL • FTP server • http://www.ensembl.org/info/data/ftp/index.htm l http://training.ensembl.org/events Help and Documentation
Courses https://www.ebi.ac.uk/training/online/ Tutorials www.ensembl.org/info/website/tutorials
Flash animations www.youtube.com/user/EnsemblHelpdesk http://u.youku.com/Ensemblhelpdesk
Email us [email protected] Public mailing lists [email protected] [email protected]
http://training.ensembl.org/events Publications
http://www.ensembl.org/info/about/publications.html
Zerbino D. et al Ensembl 2018 Nucleic Acids Research (2017) gkx1098, doi.org/10.1093/nar/gkx1098 https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkx1098/4634002
Xosé M. Fernández-Suárez and Michael K. Schuster Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in Bioinformatics (2010) 30:1.15.1-1.15.48 http://europepmc.org/abstract/MED/20521244
Giulietta M. Spudich and Xosé M. Fernández-Suárez Touring Ensembl: A practical guide to genome browsing BMC Genomics (2010) 11:295 http://europepmc.org/articles/PMC2894802
and topic-specific publications mentioned throughout the workshop http://training.ensembl.org/events Follow us
www.facebook.com/Ensembl.org @Ensembl www.ensembl.info @AstridGAGall
http://training.ensembl.org/events Ensembl Acknowledgements The Entire Ensembl Team Daniel R. Zerbino1, Premanand Achuthan1, Wasiu Akanni1, M. Ridwan Amode1, Daniel Barrell1,2, 1 1 1 1 1 1 Jyothish Bhai , Konstantinos Billis , Carla Cummins , Astrid Gall , Carlos García Giroń , Laurent Gil , Leo Gordon1, Leanne Haggerty1, Erin Haskell1, Thibaut Hourlier1, Osagie G. Izuogu1, Sophie H. Janacek1, Thomas Juettemann1, Jimmy Kiang To1, Matthew R. Laird1, Ilias Lavidas1, Zhicheng Liu1, Jane E. Loveland1, Thomas Maurel1, William McLaren1, Benjamin Moore1, Jonathan Mudge1, Daniel N. Murphy1, Victoria Newman1, Michael Nuhn1, Denye Ogeh1, Chuang Kee Ong1, Anne Parker1, Mateus Patricio1, Harpreet Singh Riat1, Helen Schuilenburg1, Dan Sheppard1, Helen Sparrow1, Kieron Taylor1, Anja Thormann1, Alessandro Vullo1, Brandon Walts1, Amonida Zadissa1, Adam Frankish1, Sarah E. Hunt1, Myrto Kostadima1, Nicholas Langridge1, Fergal J. Martin1, Matthieu Muffato1, Emily Perry1, Magali Ruffier1, Dan M. Staines1, Stephen J. Trevanion1, Bronwen L. Aken1, Fiona Cunningham1, Andrew Yates1 and Paul Flicek1,3
1European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK, 2Eagle Genomics Ltd., Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK and 3Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK Funding
Co-funded by the European Union http://training.ensembl.org/events Training Materials
• Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ • If you wish to re-use these materials, please credit Ensembl for their creation • If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html
http://training.ensembl.org/events