Training Materials

• Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ • If you wish to re-use these materials, please credit Ensembl for their creation • If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html

http://training.ensembl.org/events Browsing and with

Ensembl

Astrid Gall Ensembl Outreach Officer, EMBL-EBI 13th Ensembl browser workshop Erasmus MC, Rotterdam EBI is an Outstation of the European Molecular Laboratory. 13 - 14 February 2018 This Workshop

Today Tomorrow Introduction & Exploring the Comparative Ensembl browser Coffee/Tea Coffee/Tea Regulation Genes and transcripts Lunch break Lunch break BioMart Variation Coffee/Tea Coffee/Tea Tools, custom data, track The Variant Effect Predictor (VEP) hubs, advanced access

http://training.ensembl.org/events Structure of the Workshop

Presentation: What the data/tool is How we produce/process the data

Demo: Follow along if Getting the data you want to Using the tool

Exercises: Trying things out for yourself (alone/pairs?) Going beyond the demo Not a test! http://training.ensembl.org/events Extra Exercises Course Materials

training.ensembl.org/events

• Presentation • Coursebook (demos) • Questionbook (exercises) • Copysheet (text file for exercises) • Answerbook (exercise answers)

http://training.ensembl.org/events Objectives

You will learn:

• What Ensembl is • What type of data you can explore in Ensembl • How to navigate the Ensembl browser website • Where to find help and documentation

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Introduction to Ensembl

EBI is an Outstation of the European Molecular Biology Laboratory. Outline

• Genome browsers • Ensembl features and access scales • Ensembl and • Release Cycle • Genome Assembly

http://training.ensembl.org/events Why do we need Genome Browsers?

1977: 1st genome sequenced (5 kb) 2004: Human genome sequence finished (3 Gb)

http://training.ensembl.org/events Why do we need Genome Browsers?

CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCAC TTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTC CGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATT GAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACA GCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGA GTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGCATTGGAGGA ATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAA ATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCAT AAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATA AAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTT TTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACT AAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGA GACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTAC ACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACA TAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCA TATTATCACAACATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCT AAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGA ATATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTC CCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCATTAAACTA TTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCA TTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAAATGTCTACCAAAT TATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATG AGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAGAGATTATTAAAAGAAGTGCTAAA GCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACAC TCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATC TTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTATGACTGTTAGCTA http://training.ensembl.org/events Genome Browsers unlock the Code

https://www.ensembl.org https://www.ncbi.nlm.nih. gov/projects/mapview

https://genome.ucsc.edu http://training.ensembl.org/events Ensembl Features • Genomes and builds for >100 species • Variation data • Alignments, gene trees, homologues • Regulatory build (ENCODE) • Gene expression and function

• BioMart (data export) • Tools for data processing, e.g. VEP • Display your own data • Programmatic access via APIs • Completely Open Source (FTP, GitHub) http://training.ensembl.org/events Access Scales

One by one Main browser Mobile site

BioMart REST API Perl API VEP MySQL

Groups FTP Whole genome

http://training.ensembl.org/events Ensembl - 100+ Vertebrate and Model Genomes

http://training.ensembl.org/events Ensembl Genomes - Extending Ensembl across the Taxonomic Space

www.ensembl.org www.ensemblgenomes.org ● Vertebrates ●

● Fungi

● Metazoa ● Other representative species ●

http://training.ensembl.org/events Ensembl and Ensembl Genomes

Ensembl Ensembl Genomes Released 2000 2009 Species Vertebrates (fly, worm and Non-vertebrates (protists, yeast as outgroups) plants, fungi, metazoa, bacteria) Annotation By Ensembl In collaboration with the scientific communities URL www.ensembl.org www.ensemblgenomes.org

http://training.ensembl.org/events Release Cycle New/updated interfaces 9091 AugDec 2017 Updated New regulation genome data assemblies ~3 months Updated variation Underlying data software updates including new genes Updated and genomes http://training.ensembl.org/events gene sets How does a Genome Assembly work?

Sequence reads CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA CAGCTGTCCCAGATGAC ACTTAACTTCCCTCCCAGCTGTCC GGGCTCCGCCTTCAGCTC TCCCAGCTGTCCCAGATGACGCCAT AACTTCCCTCCCAGCT CGGCCTTTGGGCTCC TCCGCCTTCAGCTCAAGACTTAACTTC CAGATGACGCC

Match up overlaps

CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA AACTTCCCTCCCAGCT CAGATGACGCC TCCGCCTTCAGCTCAAGACTTAACTTC TCCCAGCTGTCCCAGATGACGCCAT GGGCTCCGCCTTCAGCTC ACTTAACTTCCCTCCCAGCTGTCC CGGCCTTTGGGCTCC CAGCTGTCCCAGATGAC

Contig CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCAT http://training.ensembl.org/events Cloning into Bacterial Artificial (BACs)

BL Cut into fragments & clone into BACs

BL001 BL002 BL003 BL004

Grow in bacterial cultures

http://training.ensembl.org/events Genome Assembly: Tilepath, contigs, scaffolds,

BACs from different BL001 IM004 individuals assembled, AL002 CM003 including overlaps: Tilepath

Overlaps trimmed to give contigs. A run of contigs with no gaps is a scaffold.

Genetic maps used to assemble scaffolds into a chromosome. http://training.ensembl.org/events Genome Contigs

A contig is a contiguous stretch of DNA sequence that BL102 has been assembled based on sequencing information.

BL AL476

AL CM553

CM IM768

IM http://training.ensembl.org/events Human Genome Assemblies

• GRCh38 (aka hg38) • No gaps. Many rare/private alleles replaced. • www.ensembl.org • Most up-to-date and supported • GRCh37 (aka hg19) • 250 gaps • grch37.ensembl.org • Limited data and software updates • NCBI36 (aka hg18) • 150,000 gaps • ncbi36.ensembl.org • No longer updated http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Exploring the Ensembl Genome Browser:

Homepage, Assemblies & Species

EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at the Ensembl homepage and find information about the genome assemblies and species in Ensembl.

• Demo: Coursebook page 7-13 • Exercises: Questionbook page 3 • Answers: Answerbook page 3-4

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Exploring the Ensembl Genome Browser:

The Location tab & Region in detail view

EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at a region of the human genome, 4:122868000-122946000, and manipulate the view to see the data we are interested in.

• Demo: Coursebook page 13-19 • Exercises: Questionbook page 3-5 • Answers: Answerbook page 4-8

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Genes and

Transcripts

EBI is an Outstation of the European Molecular Biology Laboratory. Outline

• Gene views in Ensembl • Annotation • Automatic • Manual • Imported • Golden transcripts, GENCODE, CCDS • Which transcript do I use? • Ensembl stable IDs • Gene Ontology

http://training.ensembl.org/events Gene Views

Coding Intron Non-coding exon Merged transcript

Protein coding transcript

Non-coding transcript

http://training.ensembl.org/events Ensembl and Havana Annotation

Automatic Annotation Manual Annotation

http://training.ensembl.org/events Automatic Annotation

• Genome-wide determination using the Ensembl automated pipeline

• Predictions based on experimental (biological) data

http://training.ensembl.org/events Biological Evidence International Nucleotide Sequence Database Collaboration • cDNAs • ESTs • RNAseq

Protein sequence databases • Swiss-Prot: Manually curated • TrEMBL: Unreviewed translations

NCBI RefSeq • Manually annotated and mRNAs (NP, NM)

http://training.ensembl.org/events Other Species

• Infer genes from to other species E.g. predict genes in by mapping cDNAs/proteins from to the genome

• RNAseq data

http://training.ensembl.org/events Manual Annotation

• Genome-wide annotation

• Gene determination on a case-by-case basis by a person

• Uses data from databases and papers

http://training.ensembl.org/events Manual Annotation Advantages Disadvantages • More comprehensive • Slower • More transcripts per gene, • Small scale especially non-coding transcripts • More genes, especially non-coding genes • Require less evidence (quality > thresholds) • More accurate for difficult regions • UTRs • Splice sites • Single exon transcripts • Immunoglobulins http://training.ensembl.org/events Imported Annotation

• We have imported annotation for a small number of species from external sources • Quality ensured by: • Our rigorous QC • Trusted sources

Chinese hamster ovary Ryukyu mouse Shrew mouse

http://training.ensembl.org/events Golden (merged) Transcripts

• Identical annotation

• Higher confidence and quality

http://training.ensembl.org/events GENCODE

• The GENCODE gene set is the merged set of:

• Ensembl automatically annotated genes and

• Havana manually annotated genes

• It is the default gene set used by ENCODE, 1000 Genomes and other major projects.

http://training.ensembl.org/events CCDS Transcripts

• Consensus Coding DNA Sequence set • Agreement between EBI, WTSI, UCSC and NCBI

http://www.ncbi.nlm.nih.gov/CCDS/CcdsBrowse.cgi

http://training.ensembl.org/events Which transcript do I use?

Transcript support level: Scored 1-5 for quality (1= best) https://www.ensembl.org/Help/Glossary?id=492 GENCODE basic: Prioritises full-length protein coding transcripts https://www.ensembl.org/Help/Glossary?id=500 APPRIS principal isoform: The major isoform(s) from combining protein structural information, functionally important residues and evidence from cross-species alignments https://www.ensembl.org/Help/Glossary?id=521 +CCDS and +Golden transcripts only http://training.ensembl.org/events Ensembl stable IDs ENSG########### Ensembl Gene ID ENST########### Ensembl Transcript ID ENSP########### Ensembl Peptide ID ENSE########### Ensembl Exon ID ENSR########### Ensembl Regulatory region ID

For non-human species a suffix is added: MUS (Mus musculus) for mouse: ENSMUSG### DAR (Danio rerio) for zebrafish: ENSDARG### https://www.ensembl.org/info/genome/stable_ids/index.html http://training.ensembl.org/events More Information

Aken et al. The Ensembl gene annotation system. Database (Oxford) (2016) doi: 10.1093/database/baw093 http://europepmc.org/articles/PMC4919035

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Why Gene Ontology (GO)?

Multiple terms for Innate Non-specific the same thing immunity immunity

Gene descriptions Complement Cytokines too specific Natural killer cells

Phagocytes Mast cells

http://training.ensembl.org/events GO terms form a Controlled Vocabulary

GO:0045087 - innate immune response

Innate immune responses are defense responses mediated by germline encoded components that directly recognise components of potential pathogens.

http://training.ensembl.org/events GO terms are hierarchical GO:0006955 immune response

GO:0045087 GO:0006957 innate immune response GO:0009682 complement activation, induced systemic alternative pathway resistance GO:0035420 GO:0042381 GO:0009814 MAPK cascade involved in defence response, hemolymph coagulation innate immune response incompatible interaction GO:0034342 GO:0001867 GO:0002227 GO:0009616 response to type II complement activation, innate immune response virus induced gene interferon lectin pathway in mucosa silencing GO:0034340 GO:0035006 GO:0002228 GO:0034341 response to type I melanisation defence natural killer cell response to interferon response mediated immunity interferon-gamma GO:0045089 GO:0009626 GO:0045824 GO:0045088 positive reg of innate -type negative reg of innate regulation of innate immune response hypersensitive response immune response immune response http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

• We will look at the ESPN gene and find information about the gene and its transcripts.

• Demo: Coursebook page 20-31 • Exercises: Questionbook page 6-8 • Answers: Answerbook page 9-13

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Variation

EBI is an Outstation of the European Molecular Biology Laboratory. Outline

• Classification of variants • Consequences of variants • Species with variation data & variant sources • Browsing variation data • Gene tab • Transcript tab • Location tab • Variant tab • Variant Effect Predictor

http://training.ensembl.org/events Variant Types

1) Small scale in one or few nucleotides of a gene • Single nucleotide polymorphism (SNP) • Small insertions and deletions (DIPs or indels) A G A C T T G A C C T G T C T - A A C T G G A T G A C T T G A C - T G T C T G A A C G G G A

2) Large scale in chromosomal structure (structural variation) • Copy number variants (CNV) • Large deletions/duplications, insertions, translocations

deletion duplication insertion translocation http://training.ensembl.org/events Variant Consequences

CODING CODING Synonymous Non-synonymous Regulatory (Missense)

ATG AAAAAAA

5’ Upstream 5’ UTR Splice site Intronic 3’ UTR 3’ Downstream

http://training.ensembl.org/events Consequence Terms: Sequence Ontology (SO)

http://www.ensembl.org/info/genome/variation/data_description.html#classes http://training.ensembl.org/events SIFT and PolyPhen

• Predict the effect of variants on amino acid sequences using scores based on:

• Level of conservation of the protein • Chemical change in the amino acid • 3D structure and domains (PolyPhen only)

• Only provide predictions, not facts • Predictions will never be as good as experimental validation

http://training.ensembl.org/events SIFT PolyPhen

1 1 Probably damaging 0.2 Possibly damaging 0.1

Tolerated

Benign

0.05 Deleterious 0 0 http://training.ensembl.org/events 22 Species with Variation Data

http://training.ensembl.org/events Species with Variation Data in

Ensembl Genomes

http://training.ensembl.org/events Variant Sources

https://www.ensembl.org/info/docs/variation/sources_documentation.html http://training.ensembl.org/events HapMap Project Genotyping: 1,301 individuals from 11 populations

CHD TSI JPT Chinese Tuscan Japanese CEU NW European CHB GIH Han Chinese Gujarati

MEX ASW LWK Mexican African Luhya YRI Yoruba American MKK Maasai

America Africa Europe East Asia Central-South Asia http://training.ensembl.org/events Sequencing: 2,500 individuals at 4X coverage

ITU STU FIN GBR CHB CEU IBR JPT GIH TSI CDX

MXL ASW PUR GW PJL CHS BEB MSL YRI LWK CLM ACB D ESN KHV

PEL

America Africa Europe East Asia Central-South Asia http://training.ensembl.org/events Genome Aggregation Database (GnomAD) Allele frequency data for >130,000 individuals from seven populations

http://gnomad.broadinstitute.org/about https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/ http://training.ensembl.org/events Human Population Genetics: Allele Frequencies

• HapMap: Worldwide, genotyping • 1000 Genomes: Worldwide, WGS, healthy individuals • ExAC: Exomes, diseased/healthy individuals, skewed to most studied populations • gnomAD: WGS and exomes, diseased/healthy individuals, skewed to most studied populations • UK10K: UK-wide, exomes, diseased individuals • TOPMed: WGS, diseased individuals

http://training.ensembl.org/events Reference Alleles BL102 BL102

BL AL476

AGTCGTAGCTAGC TAGGCCATAGGCGA

AL T is the allele in the contig used, therefore:

CM553 ● T is the reference allele ● G is the alternate allele

● Alleles are T/G CM IM768 Frequency T = 0.05, frequency G = 0.95 G is the allele in all primates

IM T causes disease susceptibility http://training.ensembl.org/events Allele Strand

AGTCGTAGCTAGC T/GAGGCCATAGGCGA

A/C GCTAGCTACGACT TCGCCTATGGCCT

Exon sequence: TATGGCCTA/CGCTAGC

Alleles in database = T/G Alleles in gene = A/C Alleles = A/C -ve strand or T/G +ve strand

Alleles = A/C or T/G Often lack further information http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

• We will look at the MCM6 gene and find variants in the gene. • We will look at the region of MCM6 and find variants in the region. • We will look at the variant rs4988235 and find information about it.

• Demo: Coursebook page 32-41 • Exercises: Questionbook page 9-10 • Answers: Answerbook page 14-16

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events What is the Variant Effect Predictor (VEP)?

A tool to predict and annotate the functional consequences of variants (SNPs, insertions, deletions, CNVs, other structural variants) Output • Affected gene, transcript Input and protein sequence • Variant coordinates • VCF • Pathogenicity /

• HGVS • Frequency data • Variant IDs • Regulatory consequences

• Splicing consequences

Ensembl resources and data (including Variation) • Literature citations http://training.ensembl.org/events https://www.ensembl.org/info/docs/variation/predicted_data.html VEP Features ● Ensembl Ensembl Regulatory Build: • Over 5,000 species ● RefSeq ● ENCODE ● Merged ● BLUEPRINT • ● GENCODE basic ● NIH Epigenomics Roadmap Consequence predictions Can be limited to specific cell types for different transcript sets ● dbSNP ● COSMIC • Consequence predictions ● ClinVar for regulatory regions ● ESP ● HGMD-Public ● 1000 Genomes ● PhenCode ● ESP • Known variants ● ExAC ● GnomAD

• Allele frequencies Via plugins: Built in: ● CADD ● SIFT & ● FATHMM • Pathogenicity predictions ● PolyPhen ● LRT ● MutationTaster ● Many more! • Phenotype/disease, clinical ● OMIM significance ● Orphanet ● GWAS catalog ● ClinVar http://training.ensembl.org/events Use the VEP!

Large user-base, including: ● Illumina ● CRUK ● Congenica ● Fabric Genomics ● Personalis

https://www.ensembl.org/info/docs/tools/vep/index.html http://training.ensembl.org/events Input Formats

Variant coordinates 1 881907 881906 -/C + 5 140532 140532 T/C + 12 1017956 1017956 T/A + 2 946507 946507 G/C + 14 19584687 19584687 C/T -

HGVS notation ENST00000285667.3:c.1047_1048insC 5:g.140532T>C NM_153681.2:c.7C>T ENSP00000439902.1:p.Ala2233Asp NP_000050.2:p.Ile2285Val

VCF #CHROM POS ID REF ALT 20 14370 rs6054257 G A 20 17330 . T A 20 1110696 rs6040355 A G,T 20 1230237 . T .

Variant IDs rs41293501 COSM327779 rs146120136 FANCD1:c.475G>A rs373400041 http://training.ensembl.org/events The VEP works for many Species

?

+ everything in Plants, Fungi, Metazoa, Protists and Bacteria http://training.ensembl.org/events Set up a Cache • To speed up your VEP script! • Use prebuilt caches for species in Ensembl or • Make your own from GTF and FASTA files - even for genomes not in Ensembl

https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html http://training.ensembl.org/events VEP Plugins

• Add extra functionality • Can extend, filter or manipulate the output • May make use of external data or code • Are available with both web tool and script

http://training.ensembl.org/events More Information

McLaren et al. The Ensembl Variant Effect Predictor. Genome Biology (2016) 17:122 http://europepmc.org/abstract/MED/27268795

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on We have identified four variants on human chromosome nine, an A deletion at 128328461, C->A at 128322349, C->G at 128323079 and G->A at 128322917.

We will use the VEP to determine: • If the variants have been annotated in Ensembl already • Which genes are affected by the variants • If any of these variants affect gene regulation

• Demo: Coursebook page 41-46 • Exercises: Coursebook page 10-11 • Answers: Answerbook page 16-17

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Comparative Genomics

EBI is an Outstation of the European Molecular Biology Laboratory. Outline

• Comparative genomics: Applications and species • Gene trees • Homology predictions • Whole genome alignments • Pairwise • Multiple • Shared synteny

http://training.ensembl.org/events Applications of Comparative Genomics

• Evolutionary analysis • Differences between the genomes of species • Gene function based on homology • Distribution of highly conserved regions

http://training.ensembl.org/events Comparative Analysis by Taxa

Ensembl Ensembl Metazoa Compara Compara

http://training.ensembl.org/events Pan-taxonomic Compara

http://ensemblgenomes.org/info/ http://training.ensembl.org/events genomes?pan_compara=1 Pan-taxonomic Compara

Ensembl Metazoa Compara

http://training.ensembl.org/events Gene Trees

• Representative protein translation of each gene from all species in Ensembl • BLAST, clustering, multiple alignment • Phylogenetic tree for each aligned cluster • Reconciliation against species tree inferred from • Orthologue/Paralogue inference • dN/dS values (optional) Orthologues and Paralogues

http://www.ensembl.org/info/docs/compara/h http://training.ensembl.org/events omology_method.html Homology Relationships

Paralogues Orthologues Genes emerged Genes emerged through duplication through speciation e.g. e.g. c1 and c2 c1 and h1 h1 and h2 h2 and m c2 and m

c1 h1 c2 h2 m

One-to-one One-to-many

Speciation Duplication http://training.ensembl.org/events More Information

Herrero J. et al Ensembl comparative genomics resources. Database (Oxford) (2016) doi: 10.1093/database/bav096 http://europepmc.org/abstract/MED/26896847

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

We will look at the BRCA2 gene to find gene trees and homologues.

• Demo: Coursebook page 47-50

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Whole Genome Alignments Pairwise and multiple

• To identify highly conserved regions • Sequences that evolve slowly • Regions likely to be functional • Coding or non-coding sequences • To spot problematic gene predictions • To define syntenic regions

http://training.ensembl.org/events Alignments

• Pairwise alignments with BLASTZ (older) LASTZ-net (newer)

• EPO (Enredo-Pecan-Ortheus) analysis • For selected sets (12 fish, 7 sauropsids, 67 eutherian mammals, 12 primates)

• Mercator-Pecan analysis • For 31 amniota vertebrates (mammals and birds)

http://www.ensembl.org/info/genome/c http://training.ensembl.org/events ompara/analyses.html#pecan Shared Synteny

http://www.ensembl.org/info/docs/ http://training.ensembl.org/events compara/analyses.html Questions?

http://training.ensembl.org/events Hands on

We will look at a region of the human genome, 2:176087000-176202000, which contains the HoxD cluster to find alignments and conserved regions.

• Demo: Coursebook page 50-54 • Exercises: Questionbook page 12-14 • Answers: Answerbook page 18-22

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Ensembl Regulation

EBI is an Outstation of the European Molecular Biology Laboratory. Outline - Ensembl Regulation Annotation of the genome with features that may play a role in transcriptional regulation of genes • Data types & detection methods that inform the Ensembl Regulatory Build • Epigenetic marks • DNA methylation • Histone modifications • Factor and RNA Pol binding • Open/closed chromatin • Current and future data in Ensembl • Processing the data

http://training.ensembl.org/events Epigenetics

The study of heritable changes in gene function, without changes in the DNA sequence.

These changes regulate gene expression.

http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation

Stem cell

Differentiated cell http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation

http://training.ensembl.org/events Epigenetic Change -> Cell Differentiation

The cells have a different: • Function • Morphology • Gene expression • Epigenome http://training.ensembl.org/events DNA Methylation of a Promoter Region represses Transcription

CHCH CH CH

CH3 3 3 CH3 CH3 3 CH3 3

CC CG CAATCC CGCG ATCTAA CGCGCG GCTAGCATAAG CGCG GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGCGGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAGCGCTATCAG

3 3 3 3

3 3 3 3 CH CH CH CH CH CH CH CH

http://training.ensembl.org/events Bisulfite Sequencing To detect DNA Methylation

CH3 CH3 CH3 GGCGGGATTGCGCGTTAGATCGCGCGCTTATGCTAGCCGCGCTGATAG

CH3 CH3

Bisulfite treatment

GGUGGGATTGUGUGTTAGATCGCGCGUTTATGUTAGUCGCGUTGATAG

Sequence and compare to reference http://training.ensembl.org/events Histone Modifications

We describe histone modifications using the form: Subunit, Amino acid, Position, Modification, e.g. H3K36me3. http://training.ensembl.org/events Histone Code

Modification Histone

H3K4 H3K9 H3K14 H3K27 H3K79 H4K20 H2BK5

me1

me2

me3

ac

http://training.ensembl.org/events Promoters, Enhancers etc

• Transcription factor (TF)-binding to promoters and enhancers is necessary for transcription • Combinations of epigenetic marks affect the ability and probability of TF-binding at these sites

http://training.ensembl.org/events ChIP-seq To detect Histone Modifications and TF-binding DNA-binding protein DNA Antibody Crosslink protein to DNA

Covalent Unlink, wash, bond purify DNA

Shear DNA

Sequence DNA fragments Pull down with an antibody binding ACGCTGACTAGAATCAATGGCT protein of interest TCTCTTCGCATATGGCTGACTA http://training.ensembl.org/events Open/closed Chromatin

Open chromatin is transcriptionally active. Closed chromatin is inactive.

http://training.ensembl.org/events DNase Sensitivity To assay open/closed Chromatin

DNase treatment

Purify DNA

Sequence and compare to reference http://training.ensembl.org/events Detecting Features involved in Regulation of Genes

Feature Detection Method

DNA methylation Bilsulfite sequencing

Histone modifications ChIP-seq

Transcription factor binding ChIP-seq

Open/closed chromatin DNase sensitivity

http://training.ensembl.org/events Current Data

http://training.ensembl.org/events The Future

http://training.ensembl.org/events A Subset of Cell Types

• Ensembl only displays a subset of available data. • We display cell types that have, at a minimum: • CTCF binding • DNase or FAIRE data • H3K4me3, H3K27me3, H3K36me3 data • We display all TFBS and histone modification data known for these cell types. • We process these data to predict activity.

• You can add further data using track hubs.

http://training.ensembl.org/events Processing the Data

• We take raw data from the various sources. • We predict the positions of regulatory features, such as promoters, enhancers and insulators. • We predict the activity of these features in different cell types.

• You can view all of this in the genome browser.

http://training.ensembl.org/events Raw Data

Transcription Factor A Transcription Factor B Transcription Factor C Histone mod1 Histone mod2 Histone mod3

http://training.ensembl.org/events Searching for Patterns

known promoter

known promoter

known promoter

http://training.ensembl.org/events Segmentation

Transcription Factor A Transcription Factor B Transcription Factor C

Histone mod1 Histone mod2 Histone mod3

http://training.ensembl.org/events MultiCell Features

Cell type 1

Cell type 2

Cell type 3

Cell type 4

http://training.ensembl.org/events Cell-specific Features

MultiCell

Cell type 1

Cell type 2

Cell type 3

Cell type 4

http://training.ensembl.org/events We do not…

• …link promoters/enhancers/insulators or any other regulatory features to genes. We allow you to see what is where and to make your own inferences.

• …link regulatory features to gene expression. We have cell-line specific regulation data and tissue specific expression data – make of it what you will.

Regulatory data is incredibly complex and still in relative infancy. There is no comprehensive database of regulation data.

http://training.ensembl.org/events More Information

Zerbino et al. The Ensembl Regulatory Build Genome Biology (2015) 16:56 http://europepmc.org/abstract/MED/25887522

Zerbino et al. Ensembl regulation resources Database (Oxford) (2016) doi: 10.1093/database/bav119 http://europepmc.org/abstract/MED/26888907

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

• We will look at the region of the gene LIMD2 and find regulatory features. • We will explore what cells types they are active in and what evidence there is to support this.

• Demo: Coursebook page 55-62 • Exercises: Questionbook page 15 • Answers: Answerbook page 23-25

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Data Mining with BioMart

EBI is an Outstation of the European Molecular Biology Laboratory. Outline

● Introduction to BioMart ● The Principle: Four steps ● biomaRt

http://training.ensembl.org/events What is BioMart?

A tool in your browser to:

• Build queries with a few mouse clicks • Export data without programming • Generate custom data tables and files

http://training.ensembl.org/events Why use BioMart?

Because some tasks are too time consuming and/or difficult with the Ensembl browser

• Query multiple things (genes/variants) at once:

• ID conversions • Gene locations • Download sequences

• Export medium to large amounts of data http://training.ensembl.org/events Where can I find BioMart?

www.ensembl.org/biomart/martview

metazoa.ensembl.org/biomart/martview

http://training.ensembl.org/events For which Genomes is BioMart available?

Ensembl Ensembl Fungi (some exceptions) Ensembl Metazoa Ensembl Plants Ensembl Protists (some exceptions)

http://training.ensembl.org/events How do I use BioMart? The four steps

Dataset Filters Attributes Results Choose Narrow Specify what View & database down the to include as export table/ & species dataset your output sequences

http://training.ensembl.org/events Step 1: Dataset

Dataset Filters Attributes Results

Define the database you want to search with your filters: • Genes • Variation • Regulation • Mouse strains

Define the species/mouse strain http://training.ensembl.org/events Step 2: Filters

Dataset Filters Attributes Results

Define a (large) set of genes/variants by combining parameters, e.g.:

• A region Filter 1 Filter 2 • A list of IDs • Function (GO term) • Phenotypes

http://training.ensembl.org/events Get Attributes for these Step 3: Attributes

Dataset Filters Attributes Results

Define the data you want for that set, e.g.: • IDs • Features • Variants • Orthologues/Paralogues • Sequences

http://training.ensembl.org/events Step 4: Results

Dataset Filters Attributes Results

View and export the data table/sequences in a number of formats: • html • csv • tsv • xls • fasta

http://training.ensembl.org/events biomaRt: A Bioconductor package for BioMart

http://www.bioconductor.org/packages/release/bioc/html/biomaRt.html

Bioconductor provides tools for the analysis and comprehension of high-throughput genomic data using the R statistical programming language

• Easy installation in R: source("http://bioconductor.org/biocLite.R") biocLite("biomaRt") • Documentation: http://www.bioconductor.org/packages/release/bioc/vignettes/biomaRt/inst/d oc/biomaRt.html http://training.ensembl.org/events More Information

Kinsella R.J. et al Ensembl BioMarts: a hub for data retrieval across taxonomic space Database (Oxford) (2011) doi: 10.1093/database/bar030 http://europepmc.org/abstract/MED/21785142

Smedley, D. et al BioMart – biological queries made easy BMC Genomics (2009) 10:22 http://europepmc.org/abstract/MED/19144180

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

We will look at a set of six Homo sapiens genes, ESPN, MYH9, USH1C, CISD2, THRB and GIPC3, and find out: • Their NCBI gene IDs • Their function via GO terms • Their cDNA sequences

Dataset Filters Attributes Results Gene set Gene names NCBI gene IDs A. gambiae List of GO terms Homogenes sapiens names cDNA sequences

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Hands on

We will look at a set of six Homo sapiens genes, ESPN, MYH9, USH1C, CISD2, THRB and GIPC3, and find out: • Their NCBI gene IDs • Their function via GO terms • Their cDNA sequences

• Demo: Coursebook page 63-67 • Exercises: Questionbook page 16-18 • Answers: Answerbook page 26-31

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Ensembl Tools

EBI is an Outstation of the European Molecular Biology Laboratory. Ensembl Tools

My favourite tool does not I want to work with GRCh38! work with the file I But all my data has been downloaded from the Ensembl generated with GRCh37... FTP site!

Are there any viral sequences in my genome? And where? Is my variant in Linkage Disequilibrium with Using my new PCR other variants? primers, I see more than one band on the gel. Where do my primers bind? And how The VCF files from the 1000 often? Genomes project are huge! It I cannot find the gene I takes too long to download work on, or read them and I ran out of disk about, in Ensembl space… And I am only anymore! interested in the MC1R gene in the GBR population. http://training.ensembl.org/events Ensembl Tools

https://www.ensembl.org/info/docs/tools/index.html http://training.ensembl.org/events BLAST/BLAT To search our genomes for your DNA or protein sequence • Up to 30 sequences against on one or multiple genomes • Output: • Table with standard BLAST/BLAT output • Visualisation on and query sequence • Link to Location tab viewing the result in Region in detail view • For all species in Ensembl and text, FASTA or sequence ID input

https://www.ensembl.org/Help/View?db=core;id=451 http://training.ensembl.org/events File Chameleon To convert Ensembl files for use with other analysis tools • Files on the Ensembl FTP site are in standard formats, but… … different tools define the standards differently • Your favourite tool may need files in a slightly different format • Reformat files with these adjustments and download them • For all species in Ensembl and GFF3, GTF or FASTA files

http://www.ensembl.org/Homo_sapiens/Tools/FileChameleon?db=cor http://training.ensembl.org/events e Assembly Converter To map (liftover) your data's coordinates to the current assembly [or vice versa]

• Uses CrossMap: http://crossmap.sourceforge.net • Convert coordinates on one genome assembly to another, e.g.

• Only for and BED, GFF, GTF, VCF or WIG files http://training.ensembl.org/events ID History Converter To convert a set of Ensembl IDs from a previous release into their current equivalents • Find what Ensembl IDs map to in the current release and see their history through releases • Only for Ensembl stable IDs starting with ENS • For all species in Ensembl

http://training.ensembl.org/events Linkage Disequilibrium Calculator To calculate LD between variants using genotypes from a selected population

• Calculate pairwise LD: • in a given region • for a given list of variants • for a given variant within a defined window size • Get a table with: • Variant 1 & 2 ID, location, consequence, evidence • r2 and D’ • Currently (!) only for data from 1000 Genomes project phase 3

https://www.ensembl.org/Help/View?db=core;id=587 http://training.ensembl.org/events Ensembl Tools (GRCh37)

All tools from GRCh38 plus:

https://grch37.ensembl.org/info/docs/tools/index.html http://training.ensembl.org/events Data Slicer To get a subset of data from a BAM or VCF file • Uses SAMtools: http://samtools.sourceforge.net and VCFtools: https://vcftools.github.io/specs.html • BAM or VCF files for whole genomes contain all variants and genotypes of all individuals • You may want to analyse a small region, e.g. a gene, and a single population only • Take a slice of a file based on coordinates (BAM or VCF), and filtering by individuals or populations (VCF only) • Use data from 1000 Genomes project phase 1 or 3 or provide URL to your file • Only for and

http://grch37.ensembl.org/Homo_sapiens/Tools/DataSlicer?db=core http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Custom Data

EBI is an Outstation of the European Molecular Biology Laboratory. Visualise your own Data Add custom tracks with your own data

• BED • WIG • BedGraph • BAM/CRAM • GFF2/GTF • BigBed • GFF3 • BigWig • Pairwise interactions (WashU) • VCF • PSL

http://www.ensembl.org/info/website/tutorials/userdata.html http://www.ensembl.org/info/website/upload/index.html#format http://training.ensembl.org/eventss Adding Custom Tracks

• Upload a file to our servers (max 20MB unzipped) from: • Your own computer or • A location on the web (URL-based data) • Attach an indexed file via a URL • E.g. BigWig, BigBed or BAM • The file will be read each time you visit a page that has the track on.

• Tracks can be saved to your account and shared with others • Only trivial security, do not use for sensitive data

http://www.ensembl.org/info/website/upload/index.html#addtrack http://training.ensembl.org/events Hands on

We will upload a small BED file and view it in the browser. The BED format consists of one line per feature, each containing 3-12 columns of data (chrom, chromStart, chromEnd, name…)

chr5 36821632 37091234 P1 chr5 36731476 36978306 P2 chr5 36908552 37108671 P3

• Demo: Coursebook page 68-70

http://www.ensembl.org/info/website/upload/bed.html

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Track Hubs

EBI is an Outstation of the European Molecular Biology Laboratory. Track Hubs

• The Track Hub Registry (https://trackhubregistry.org) is a global centralised collection of publicly accessible track hubs.

• Many large projects (ENCODE, Roadmap Epigenomics, Blueprint) make data available in sets of tracks organised as a single hub.

• Discover and use track hubs with different types of data and view it in Ensembl and/or Create a track hub containing your own data to share with the public and collaborators!

http://www.ensembl.org/info/website/public_trackhubs.html

http://training.ensembl.org/events Hands on

We will add the Blueprint Track Hub and view a number of tracks containing data from the Blueprint project.

• Demo: Coursebook page 71-74

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Advanced Access

EBI is an Outstation of the European Molecular Biology Laboratory. Access Scales

One by one Main browser Mobile site

BioMart REST API Perl API VEP MySQL

Groups FTP Whole genome

http://training.ensembl.org/events Advanced Access to Data in Ensembl

To access data at larger scales: • FTP: Download files of complete Ensembl databases • MySQL: Direct database access • Perl API: Programmatic access • Ensembl Virtual Machine • REST API: Language agnostic access

http://training.ensembl.org/events FTP Site

Files of our complete databases • Genomic DNA, cDNA, CDS, ncRNA and protein sequence (FASTA) • Annotated sequence (EMBL, GenBank) • Gene sets (GTF, GFF3) • Variation (GVF, VCF) • Regulation (GFF, BED, BigWig) • Whole-genome multiple and gene-based multiple alignments (MAF) • Constrained elements (BED) • RNA-Seq files (BAM, BigWig) • MySQL databases

http://training.ensembl.org/events Access our FTP Site ftp://ftp.ensembl.org/pub/

Your favourite FTP Command line FTPclient downloads webpage: https://www.ensembl.org/info/data/ftp/index.html

http://training.ensembl.org/events Files on the FTP Site are large

• Multiple Mb/Gb • Lots of time to download/unzip

• Do you really need this data? • Make sure it is the right file before you download it. • Use a fast (wired) connection.

http://training.ensembl.org/events FTP Site Summary

Skills needed Web-browsing or FTP client use. Handling and parsing the files types. Scalability Complete database only. Speed Many minutes to download/unzip a file. Difficulty to query Files easy to download/unzip. Long-term New files with each release. File types are identical. Sequences? Yes.

http://training.ensembl.org/events Questions?

http://training.ensembl.org/events What is REST?

• Representational state transfer (REST) allows you to query the database using simple URLs giving output in plain text format

E.g. http://rest.ensembl.org/xrefs/symbol/homo_sapiens/BRCA2?con tent-type=application/json gives: [{"type":"gene","id":"ENSG00000139618"},{"type":"gene","id" :"LRG_293"}]

• This means you can write scripts in any language to construct these URLs and read their output http://training.ensembl.org/events REST API

• We have had a Perl API for a long time … … but not everybody works in Perl

• The REST API allows language agnostic access

• Visit rest.ensembl.org for installation, documentation and examples

Use grch37.rest.ensembl.org for GRCh37

http://training.ensembl.org/events Hands on: REST API - Single Endpoint

• I have the coordinates of a protein motif in a protein. I would like to find the coordinates of the motif in the genome.

• I am interested in a coiled-coil domain at position 116-216 in the protein ENSP00000386200.

• Demo: Coursebook page 75-77

http://training.ensembl.org/events Hands on: REST API - Single Endpoint

• I want to get a FASTA sequence for an Ensembl gene ID • I need to find an appropriate endpoint: http://rest.ensembl.org/

http://rest.ensembl.org/sequence/id/ENSG00000157764

gives:

>ENSG00000157764 chromosome:GRCh38:7:140719327:140924764:-1 CGCCTCCCTTCCCCCTCCCCGCCCGACAGCGGCCGCTCGGGCCCCGGCTCTCGGTTATAA GATGGCGGCGCTGAGCGGTGGCGGTGGTGGCGGCGCGGAGCCGGGCCAGGCTCTGTTCAA CGGGGACATGGAGCCCGAGGCCGGCGCCGGCGCCGGCGCCGCGGCCTCTTCGGCTGCGGA http://training.ensembl.org/events Hands on: REST API - Multiple Endpoints

• I want a script that gets a gene name from the command line and prints gene name, ID and sequence.

• There is no one endpoint that does this action, so we have to combine two endpoints with a script.

http://training.ensembl.org/events Hands on: Python Script

#!/usr/bin/env python # decode the json output import json # Get modules needed for script genes = json.loads(get_genes) import sys import urllib # move through the genes one-by-one import urllib2 for gene in genes: import json import time # define the REST query to get the sequence import httplib2, sys from the gene http = httplib2.Http(".cache") ext_get_seq = '/sequence/id/' + gene['id'] + '?';

# Get the gene name from the command line # submit the query gene_name = sys.argv[1] resp, get_seq = http.request(server+ext_get_seq, method="GET", # define the general URL parameters headers={"Content-Type":"application/json"}) server = "http://rest.ensembl.org" # decode the json output # define REST query to get the gene ID from the import json gene name seq = json.loads(get_seq) ext_get_gene = "/xrefs/symbol/homo_sapiens/" + gene_name + "?" # print the gene name, ID and sequence

print '>', gene_name, gene['id'], "\n", # submit the query seq['seq'] resp, get_genes = http.request(server+ext_get_gene, method="GET", headers={"Content-Type":"application/json"}) http://training.ensembl.org/events REST API Summary Skills needed Programming in any language. Understanding of features in Ensembl. Scalability Can query whole database. Speed Minimum query speed 150ms. Time cost per datapoint. Difficulty to query Need to construct URLs. May need scripts to dissect data. Long-term API takes database changes into account. Scripts do not need to change between releases. Sequences? Yes. http://training.ensembl.org/events Questions?

http://training.ensembl.org/events MySQL

Direct database access using MySQL queries

mysql -u anonymous -h ensembldb.ensembl.org

mysql> use homo_sapiens_core_82_38;

mysql> SELECT gene.stable_id FROM gene, xref WHERE gene.display_xref_id = xref.xref_id AND xref.display_label LIKE ’brca2';

https://www.ensembl.org/info/data/mysql.html http://training.ensembl.org/events Perl API

Database querying using Perl scripts

my $gene_adaptor = $registry-> get_adaptor( 'human', 'core', ‘gene' );

my $gene = $gene_adaptor-> fetch_by_display_label( 'brca2' );

print $gene->stable_id, "\n";

https://www.ensembl.org/info/data/api.html http://training.ensembl.org/events Questions?

http://training.ensembl.org/events Feedback

http://training.ensembl.org/events/

http://training.ensembl.org/events Summary

Ensembl is a genome browser which integrates: • Genome annotation • Variation • Regulation • Comparative Genomics

http://training.ensembl.org/events How is all this Data organised?

• Ensembl browser websites • Main website, Pre!, Archive!, GRCh37, Ensembl Genomes • BioMart ‘Data Mining tool’ • Ensembl Database (open source) • Perl API, REST API, MySQL • FTP server • http://www.ensembl.org/info/data/ftp/index.htm l http://training.ensembl.org/events Help and Documentation

Courses https://www.ebi.ac.uk/training/online/ Tutorials www.ensembl.org/info/website/tutorials

Flash animations www.youtube.com/user/EnsemblHelpdesk http://u.youku.com/Ensemblhelpdesk

Email us [email protected] Public mailing lists [email protected] [email protected]

http://training.ensembl.org/events Publications

http://www.ensembl.org/info/about/publications.html

Zerbino D. et al Ensembl 2018 Nucleic Acids Research (2017) gkx1098, doi.org/10.1093/nar/gkx1098 https://academic.oup.com/nar/advance-article/doi/10.1093/nar/gkx1098/4634002

Xosé M. Fernández-Suárez and Michael K. Schuster Using the Ensembl Genome Server to Browse Genomic Sequence Data. Current Protocols in (2010) 30:1.15.1-1.15.48 http://europepmc.org/abstract/MED/20521244

Giulietta M. Spudich and Xosé M. Fernández-Suárez Touring Ensembl: A practical guide to genome browsing BMC Genomics (2010) 11:295 http://europepmc.org/articles/PMC2894802

and topic-specific publications mentioned throughout the workshop http://training.ensembl.org/events Follow us

www.facebook.com/Ensembl.org @Ensembl www.ensembl.info @AstridGAGall

http://training.ensembl.org/events Ensembl Acknowledgements The Entire Ensembl Team Daniel R. Zerbino1, Premanand Achuthan1, Wasiu Akanni1, M. Ridwan Amode1, Daniel Barrell1,2, 1 1 1 1 1 1 Jyothish Bhai , Konstantinos Billis , Carla Cummins , Astrid Gall , Carlos García Giroń , Laurent Gil , Leo Gordon1, Leanne Haggerty1, Erin Haskell1, Thibaut Hourlier1, Osagie G. Izuogu1, Sophie H. Janacek1, Thomas Juettemann1, Jimmy Kiang To1, Matthew R. Laird1, Ilias Lavidas1, Zhicheng Liu1, Jane E. Loveland1, Thomas Maurel1, William McLaren1, Benjamin Moore1, Jonathan Mudge1, Daniel N. Murphy1, Victoria Newman1, Michael Nuhn1, Denye Ogeh1, Chuang Kee Ong1, Anne Parker1, Mateus Patricio1, Harpreet Singh Riat1, Helen Schuilenburg1, Dan Sheppard1, Helen Sparrow1, Kieron Taylor1, Anja Thormann1, Alessandro Vullo1, Brandon Walts1, Amonida Zadissa1, Adam Frankish1, Sarah E. Hunt1, Myrto Kostadima1, Nicholas Langridge1, Fergal J. Martin1, Matthieu Muffato1, Emily Perry1, Magali Ruffier1, Dan M. Staines1, Stephen J. Trevanion1, Bronwen L. Aken1, Fiona Cunningham1, Andrew Yates1 and Paul Flicek1,3

1European Molecular Biology Laboratory, European Bioinformatics Institute, , Hinxton, Cambridge CB10 1SD, UK, 2Eagle Genomics Ltd., Wellcome Genome Campus, Hinxton, Cambridge CB10 1DR, UK and 3Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, UK Funding

Co-funded by the European Union http://training.ensembl.org/events Training Materials

• Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ • If you wish to re-use these materials, please credit Ensembl for their creation • If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html

http://training.ensembl.org/events