Training Materials
Total Page:16
File Type:pdf, Size:1020Kb
Training Materials • Ensembl training materials are protected by a CC BY license http://creativecommons.org/licenses/by/4.0/ • If you wish to re-use these materials, please credit Ensembl for their creation • If you use Ensembl for your work, please cite our papers http://www.ensembl.org/info/about/publications.html http://training.ensembl.org/events Browsing Genes and Genomes with Ensembl Astrid Gall Ensembl Outreach Officer, EMBL-EBI 13th Ensembl browser workshop Erasmus MC, Rotterdam EBI is an Outstation of the European Molecular Biology Laboratory. 13 - 14 February 2018 This Workshop Today Tomorrow Introduction & Exploring the Comparative genomics Ensembl genome browser Coffee/Tea Coffee/Tea Regulation Genes and transcripts Lunch break Lunch break BioMart Variation Coffee/Tea Coffee/Tea Tools, custom data, track The Variant Effect Predictor (VEP) hubs, advanced access http://training.ensembl.org/events Structure of the Workshop Presentation: What the data/tool is How we produce/process the data Demo: Follow along if Getting the data you want to Using the tool Exercises: Trying things out for yourself (alone/pairs?) Going beyond the demo Not a test! http://training.ensembl.org/events Extra Exercises Course Materials training.ensembl.org/events • Presentation • Coursebook (demos) • Questionbook (exercises) • Copysheet (text file for exercises) • Answerbook (exercise answers) http://training.ensembl.org/events Objectives You will learn: • What Ensembl is • What type of data you can explore in Ensembl • How to navigate the Ensembl browser website • Where to find help and documentation http://training.ensembl.org/events Questions? http://training.ensembl.org/events Introduction to Ensembl EBI is an Outstation of the European Molecular Biology Laboratory. Outline • Genome browsers • Ensembl features and access scales • Ensembl and Ensembl Genomes • Release Cycle • Genome Assembly http://training.ensembl.org/events Why do we need Genome Browsers? 1977: 1st genome sequenced (5 kb) 2004: Human genome sequence finished (3 Gb) http://training.ensembl.org/events Why do we need Genome Browsers? CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCATCTGAAATTTCTTGGAAACACGATCAC TTTAACGGAATATTGCTGTTTTGGGGAAGTGTTTTACAGCTGCTGGGCACGCTGTATTTGCCTTACTTAAGCCCCTGGTAATTGCTGTATTC CGAAGACATGCTGATGGGAATTACCAGGCGGCGTTGGTCTCTAACTGGAGCCCTCTGTCCCCACTAGCCACGCGTCACTGGTTAGCGTGATT GAAACTAAATCGTATGAAAATCCTCTTCTCTAGTCGCACTAGCCACGTTTCGAGTGCTTAATGTGGCTAGTGGCACCGGTTTGGACAGCACA GCTGTAAAATGTTCCCATCCTCACAGTAAGCTGTTACCGTTCCAGGAGATGGGACTGAATTAGAATTCAAACAAATTTTCCAGCGCTTCTGA GTTTTACCTCAGTCACATAATAAGGAATGCATCCCTGTGTAAGTGCATTTTGGTCTTCTGTTTTGCAGACTTATTTACCAAGCATTGGAGGA ATATCGTAGGTAAAAATGCCTATTGGATCCAAAGAGAGGCCAACATTTTTTGAAATTTTTAAGACACGCTGCAACAAAGCAGGTATTGACAA ATTTTATATAACTTTATAAATTACACCGAGAAAGTGTTTTCTAAAAAATGCTTGCTAAAAACCCAGTACGTCACAGTGTTGCTTAGAACCAT AAACTGTTCCTTATGTGTGTATAAATCCAGTTAACAACATAATCATCGTTTGCAGGTTAACCACATGATAAATATAGAACGTCTAGTGGATA AAGAGGAAACTGGCCCCTTGACTAGCAGTAGGAACAATTACTAACAAATCAGAAGCATTAATGTTACTTTATGGCAGAAGTTGTCCAACTTT TTGGTTTCAGTACTCCTTATACTCTTAAAAATGATCTAGGACCCCCGGAGTGCTTTTGTTTATGTAGCTTACCATATTAGAAATTTAAAACT AAGAATTTAAGGCTGGGCGTGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGTGGGCGGATCACTTGAGGCCAGAAGTTTGA GACCAGCCTGGCCAACATGGTGAAACCCTATCTCTACTAAAAATACAAAAAATGTGCTGCGTGTGGTGGTGCGTGCCTGTAATCCCAGCTAC ACGGGAGGTGGAGGCAGGAGAATCGCTTGAACCCTGGAGGCAGAGGTTGCAGTGAGCCAAGATCATGCCACTGCACTCTAGCCTGGGCCACA TAGCATGACTCTGTCTCAAAACAAACAAACAAACAAAAAACTAAGAATTTAAAGTTAATTTACTTAAAAATAATGAAAGCTAACCCATTGCA TATTATCACAACATTCTTAGGAAAAATAACTTTTTGAAAACAAGTGAGTGGAATAGTTTTTACATTTTTGCAGTTCTCTTTAATGTCTGGCT AAATAGAGATAGCTGGATTCACTTATCTGTGTCTAATCTGTTATTTTGGTAGAAGTATGTGAAAAAAAATTAACCTCACGTTGAAAAAAGGA ATATTTTAATAGTTTTCAGTTACTTTTTGGTATTTTTCCTTGTACTTTGCATAGATTTTTCAAAGATCTAATAGATATACCATAGGTCTTTC CCATGTCGCAACATCATGCAGTGATTATTTGGAAGATAGTGGTGTTCTGAATTATACAAAGTTTCCAAATATTGATAAATTGCATTAAACTA TTTTAAAAATCTCATTCATTAATACCACCATGGATGTCAGAAAAGTCTTTTAAGATTGGGTAGAAATGAGCCACTGGAAATTCTAATTTTCA TTTGAAAGTTCACATTTTGTCATTGACAACAAACTGTTTTCCTTGCAGCAACAAGATCACTTCATTGATTTGTGAGAAAATGTCTACCAAAT TATTTAAGTTGAAATAACTTTGTCAGCTGTTCTTTCAAGTAAAAATGACTTTTCATTGAAAAAATTGCTTGTTCAGATCACAGCTCAACATG AGTGCTTTTCTAGGCAGTATTGTACTTCAGTATGCAGAAGTGCTTTATGTATGCTTCCTATTTTGTCAGAGATTATTAAAAGAAGTGCTAAA GCATTGAGCTTCGAAATTAATTTTTACTGCTTCATTAGGACATTCTTACATTAAACTGGCATTATTATTACTATTATTTTTAACAAGGACAC TCAGTGGTAAGGAATATAATGGCTACTAGTATTAGTTTGGTGCCACTGCCATAACTCATGCAAATGTGCCAGCAGTTTTACCCAGCATCATC TTTGCACTGTTGATACAAATGTCAACATCATGAAAAAGGGTTGAAAAAAGGAATATTTTAATAGTTTTCAGTTACTTTATGACTGTTAGCTA http://training.ensembl.org/events Genome Browsers unlock the Code https://www.ensembl.org https://www.ncbi.nlm.nih. gov/projects/mapview https://genome.ucsc.edu http://training.ensembl.org/events Ensembl Features • Genomes and gene builds for >100 species • Variation data • Alignments, gene trees, homologues • Regulatory build (ENCODE) • Gene expression and function • BioMart (data export) • Tools for data processing, e.g. VEP • Display your own data • Programmatic access via APIs • Completely Open Source (FTP, GitHub) http://training.ensembl.org/events Access Scales One by one Main browser Mobile site BioMart REST API Perl API VEP MySQL Groups FTP Whole genome http://training.ensembl.org/events Ensembl - 100+ Vertebrate and Model Organism Genomes http://training.ensembl.org/events Ensembl Genomes - Extending Ensembl across the Taxonomic Space www.ensembl.org www.ensemblgenomes.org ● Vertebrates ● Bacteria ● Fungi ● Metazoa ● Other representative species ● Plants ● Protists http://training.ensembl.org/events Ensembl and Ensembl Genomes Ensembl Ensembl Genomes Released 2000 2009 Species Vertebrates (fly, worm and Non-vertebrates (protists, yeast as outgroups) plants, fungi, metazoa, bacteria) Annotation By Ensembl In collaboration with the scientific communities URL www.ensembl.org www.ensemblgenomes.org http://training.ensembl.org/events Release Cycle New/updated interfaces 9091 AugDec 2017 Updated New regulation genome data assemblies ~3 months Updated variation Underlying data software updates Comparative Genomics including new genes Updated and genomes http://training.ensembl.org/events gene sets How does a Genome Assembly work? Sequence reads CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA CAGCTGTCCCAGATGAC ACTTAACTTCCCTCCCAGCTGTCC GGGCTCCGCCTTCAGCTC TCCCAGCTGTCCCAGATGACGCCAT AACTTCCCTCCCAGCT CGGCCTTTGGGCTCC TCCGCCTTCAGCTCAAGACTTAACTTC CAGATGACGCC Match up overlaps CGGCCTTTGGGCTCCGCCTTCAGCTCAAGA AACTTCCCTCCCAGCT CAGATGACGCC TCCGCCTTCAGCTCAAGACTTAACTTC TCCCAGCTGTCCCAGATGACGCCAT GGGCTCCGCCTTCAGCTC ACTTAACTTCCCTCCCAGCTGTCC CGGCCTTTGGGCTCC CAGCTGTCCCAGATGAC Contig CGGCCTTTGGGCTCCGCCTTCAGCTCAAGACTTAACTTCCCTCCCAGCTGTCCCAGATGACGCCAT http://training.ensembl.org/events Cloning into Bacterial Artificial Chromosomes (BACs) BL Cut into fragments & clone into BACs BL001 BL002 BL003 BL004 Grow in bacterial cultures http://training.ensembl.org/events Genome Assembly: Tilepath, contigs, scaffolds, chromosome BACs from different BL001 IM004 individuals assembled, AL002 CM003 including overlaps: Tilepath Overlaps trimmed to give contigs. A run of contigs with no gaps is a scaffold. Genetic maps used to assemble scaffolds into a chromosome. http://training.ensembl.org/events Genome Contigs A contig is a contiguous stretch of DNA sequence that BL102 has been assembled based on sequencing information. BL AL476 AL CM553 CM IM768 IM http://training.ensembl.org/events Human Genome Assemblies • GRCh38 (aka hg38) • No gaps. Many rare/private alleles replaced. • www.ensembl.org • Most up-to-date and supported • GRCh37 (aka hg19) • 250 gaps • grch37.ensembl.org • Limited data and software updates • NCBI36 (aka hg18) • 150,000 gaps • ncbi36.ensembl.org • No longer updated http://training.ensembl.org/events Questions? http://training.ensembl.org/events Exploring the Ensembl Genome Browser: Homepage, Assemblies & Species EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at the Ensembl homepage and find information about the genome assemblies and species in Ensembl. • Demo: Coursebook page 7-13 • Exercises: Questionbook page 3 • Answers: Answerbook page 3-4 http://training.ensembl.org/events Questions? http://training.ensembl.org/events Exploring the Ensembl Genome Browser: The Location tab & Region in detail view EBI is an Outstation of the European Molecular Biology Laboratory. Hands on We will look at a region of the human genome, 4:122868000-122946000, and manipulate the view to see the data we are interested in. • Demo: Coursebook page 13-19 • Exercises: Questionbook page 3-5 • Answers: Answerbook page 4-8 http://training.ensembl.org/events Questions? http://training.ensembl.org/events Genes and Transcripts EBI is an Outstation of the European Molecular Biology Laboratory. Outline • Gene views in Ensembl • Annotation • Automatic • Manual • Imported • Golden transcripts, GENCODE, CCDS • Which transcript do I use? • Ensembl stable IDs • Gene Ontology http://training.ensembl.org/events Gene Views Coding exon Intron Non-coding exon Merged transcript Protein coding transcript Non-coding transcript http://training.ensembl.org/events Ensembl and Havana Annotation Automatic Annotation Manual Annotation http://training.ensembl.org/events Automatic Annotation • Genome-wide determination using the Ensembl automated pipeline • Predictions based on experimental (biological) data http://training.ensembl.org/events Biological Evidence International Nucleotide Sequence Database Collaboration • cDNAs • ESTs • RNAseq Protein sequence databases • Swiss-Prot: Manually curated • TrEMBL: