EBI Resources Introductory Course Zaragoza, Spain, 17-18 September 2013

Exploring Sequences and Browsing: ENA, Ensembl & Ensembl Genomes

Bert Overduin, Ph.D. Vertebrate Genomics Team

European Bioinformatics Institute (EMBL-EBI), European Molecular Biology Laboratory, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom

EBI Bioinformatics Services

Outline

• European Nucleotide Archive Introduction

Demo 1: Exploring an ENA record

• Ensembl & Ensembl Genomes Introduction

Demo 2: Browser basics
Demo 3: Visualising your own data
Demo 4: Variant Effect Predictor (VEP)
Demo 5: BioMart

Goal

To provide a comprehensive record of the world’s nucleotide sequencing information, covering raw sequencing data, sequence assembly information and functional annotation

History

• 1980: EMBL Data Library (EMBL Heidelberg, Germany): world’s first public database of nucleotide sequences
• 1995: EMBL-Bank (EBI Hinxton, UK)
• 2003: Trace Archive: capillary sequencing reads
• 2008: Sequence (formerly: Short) Read Archive (SRA): next generation sequencing reads

INSDC

• International Nucleotide Sequence Database Collaboration
• ENA, NCBI GenBank and DNA Data Bank of Japan
• Data are submitted to one of the databases
• Databases are synchronized on a daily basis
• http://www.insdc.org

Three-tiered data architecture

EMBL-Bank

Sequence Read Archive

Trace Archive

Submitting data

• Many journals and funders require authors to submit their sequences to an INSDC database prior to publication
• Only submit to one INSDC database (ENA, GenBank or DDBJ)
• Unique accession numbers are assigned to all submitted data
• Submitted data can be made public immediately or kept private until the associated work has been published
• Once public, data will be exchanged with NCBI and DDBJ
• Data belong to the submitter and can only be updated with submitter consent
• Preferred: Webin interactive web submission system
• Other tools for e.g. genome projects and large sequencing centers
• http://www.ebi.ac.uk/ena/about/submit_and_update

Retrieving data

• ENA Browser
• Free text search: ENA homepage, EB-eye
• Sequence similarity search: ENA homepage, ENA Sequence Search
• Programmatic data access using REST URLs
• Formats: FASTA, FASTQ, flat file, HTML, XML
• Bulk data download: using FTP or Aspera
• http://www.ebi.ac.uk/ena/about/search_and_browse
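Programmatic access boils down to building a URL for a record and parsing what comes back. The sketch below shows that in Python without touching the network. The URL pattern in `ENA_FASTA_BASE` and the accession `XX000000` are assumptions for illustration and do not come from these slides; check the ENA documentation linked above for the form currently in use.

```python
from urllib.parse import quote

# Assumed base URL (not given in the slides); verify against the ENA documentation.
ENA_FASTA_BASE = "https://www.ebi.ac.uk/ena/browser/api/fasta/"


def ena_fasta_url(accession: str) -> str:
    """Build a REST-style URL that asks for one record in FASTA format."""
    return ENA_FASTA_BASE + quote(accession.strip())


def parse_fasta(text: str) -> dict:
    """Parse FASTA text into a {header: sequence} dictionary."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading '>' from the header.
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {h: "".join(parts) for h, parts in records.items()}


if __name__ == "__main__":
    # 'XX000000' is a placeholder accession, not a real record.
    print(ena_fasta_url("XX000000"))
    print(parse_fasta(">example record\nACGT\nTTGA\n"))
```

The text returned by such a URL could be passed straight to `parse_fasta`; the download itself is left out so the sketch runs offline.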

Demo 1 - Exploring an ENA record

Task:

Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus).

Help

• Data submissions, helpdesk, enquiries [email protected]

• Updates, publication notifications [email protected]

• EBI Train Online: http://www.ebi.ac.uk/training/online/

Acknowledgements

Guy Cochrane, Blaise Alako, Clara Amid, Ana Cerdeño-Tárraga, Iain Cleland, Richard Gibson, Neil Goodgame, Simon Kay, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Riviere, Marc Rossello, Alexander Senf, Nicole Silvester, Petra Ten Hoopen, Dmitriy Smirnov, Ana Toribio, Daniel Vaughan, Vadim Zalunin

Goal

To provide access to genome-scale data from completely sequenced species of scientific interest from across the

History

• 1999: Start of Ensembl project for the Human Genome Project
• 2000: First release of data and web interface
• 2009: First release of Ensembl Genomes
• 2013: Ensembl v73: 71 vertebrate genomes (+ 6 others); Ensembl Genomes v20: 9225 non-vertebrate genomes

Ensembl vs Ensembl Genomes

Ensembl
• Vertebrates (71x, 63x fully supported)
• http://www.ensembl.org
• Annotation in-house by the Ensembl project
• European Bioinformatics Institute & Wellcome Trust Sanger Institute

Ensembl Genomes
• Non-vertebrates: (49x), (26x), (41x), (22x), (>9000x)
• http://www.ensemblgenomes.org
• Annotation by or in collaboration with the scientific community
• European Bioinformatics Institute

Ensembl species of agricultural importance

• alpaca, chicken, cod, cow, duck, horse, pig, sheep, turkey

• banana, Brassica rapa, foxtail millet, grape, potato, rice, sorghum, soybean, wheat

• … but also …

• pathogenic fungi, and

Data

• Genomic sequence
• / transcript / models
• External references
• Mapped cDNAs, microarray probes, BAC clones, cytogenetic bands, repeats, markers etc.
• Variation data: sequence variants, structural variants, phenotypes
• Comparative data: gene trees, orthologues, paralogues; whole genome alignments, syntenic regions
• Regulatory data: “best guess” set of regulatory elements

Ensembl Genebuild

Genome assembly + experimental evidence (cDNAs, proteins and RNAseq) → Ensembl Genebuild pipeline

Access to data

• Web browser: http://www.ensembl.org (with US West, US East and Asia mirrors and Pre! and Archive! sites), http://www.ensemblgenomes.org

• BioMart: http://www.biomart.org

• FTP: ftp.ensembl.org/pub/, ftp.ensemblgenomes.org/pub/

• Public MySQL server (host:port:user): ensembldb.ensembl.org:5306:anonymous, mysql.ebi.ac.uk:4157:anonymous

• Perl API: http://www.ensembl.org/info/docs/api/

• REST API: http://beta.rest.ensembl.org, http://beta.rest.ensemblgenomes.org
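Two of these access routes lend themselves to a short illustration. The sketch below reads the MySQL entries as `host:port:user` triplets (an interpretation of the slide) and turns one into a `mysql` client command line; it also builds a REST URL. The `/lookup/id/<id>` path and the `content-type` query parameter are assumptions about the REST service, not something stated here, and the beta host names date from 2013 and may no longer resolve.

```python
import shlex
from urllib.parse import quote, urlencode


def parse_server(triplet: str) -> dict:
    """Split a 'host:port:user' string as written on the slide."""
    host, port, user = triplet.split(":")
    return {"host": host, "port": int(port), "user": user}


def mysql_command(triplet: str) -> str:
    """Return a mysql client command line for the given public server."""
    server = parse_server(triplet)
    args = [
        "mysql",
        "--host", server["host"],
        "--port", str(server["port"]),
        "--user", server["user"],
    ]
    return " ".join(shlex.quote(arg) for arg in args)


def rest_lookup_url(base: str, stable_id: str) -> str:
    """Build a REST URL for one stable ID.

    The '/lookup/id/' path is an assumed endpoint; consult the REST
    documentation of the server you are using.
    """
    query = urlencode({"content-type": "application/json"})
    return f"{base.rstrip('/')}/lookup/id/{quote(stable_id)}?{query}"


if __name__ == "__main__":
    print(mysql_command("ensembldb.ensembl.org:5306:anonymous"))
    # 'EXAMPLE_STABLE_ID' is a placeholder, not a real identifier.
    print(rest_lookup_url("http://beta.rest.ensembl.org", "EXAMPLE_STABLE_ID"))
```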

Demo 2 - Browser basics

Background:

The melanocyte-stimulating hormone receptor (MSHR or MC1R) is involved in coat colour determination in cow by controlling the relative amounts of eumelanin (black/brown) and phaeomelanin (red/yellow) (Klungland et al. The role of melanocyte-stimulating hormone (MSH) receptor in bovine coat color determination. Mamm Genome. 1995 Sep;6(9):636-9.).

Task:

Explore the gene encoding MSHR in cow.

Visualising your own data

Upload data
• Data saved by Ensembl
• 5 MB limit (and therefore not possible for large file formats)

Attach remote file
• URL-based (HTTP or FTP)
• No size limit

• Uploaded / attached tracks can be saved (to account)
• Uploaded / attached tracks can be shared with other users
• Only trivial security, do not use for sensitive data!

• http://www.ensembl.org/info/website/upload/index.html

Visualising your own data - Possible formats

• BAM: sequence alignments (no upload)
• BED: genes / features
• BedGraph: continuous-valued data
• BigBed: genes / features (no upload)
• BigWig: continuous-valued data (no upload)
• TrackHub: collection of tracks
• Gbrowse: genes / features
• GFF/GTF: genes / features
• PSL: sequence alignments
• VCF: variants (no upload)
• WIG: continuous-valued data
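The choice between uploading and attaching follows from two rules given on these slides: formats marked "(no upload)" must be attached, and anything over the 5 MB limit must be attached as well. A minimal sketch of that decision is shown below. The function name `choose_method` is hypothetical, 5 MB is taken to mean 5 × 1024 × 1024 bytes, and TrackHub is left out because the slide does not say how it is loaded.

```python
# The slide's 5 MB upload limit, assumed here to mean 5 * 1024 * 1024 bytes.
UPLOAD_LIMIT_BYTES = 5 * 1024 * 1024

# Formats marked "(no upload)" on the slide.
ATTACH_ONLY = {"BAM", "BIGBED", "BIGWIG", "VCF"}

# Formats the slide lists without that restriction (TrackHub omitted).
UPLOADABLE = {"BED", "BEDGRAPH", "GBROWSE", "GFF", "GTF", "PSL", "WIG"}


def choose_method(file_format: str, size_bytes: int) -> str:
    """Return 'upload' or 'attach' for a file of the given format and size."""
    fmt = file_format.upper()
    if fmt not in ATTACH_ONLY | UPLOADABLE:
        raise ValueError(f"format not covered by this sketch: {file_format}")
    if fmt in ATTACH_ONLY or size_bytes > UPLOAD_LIMIT_BYTES:
        # Attached files are read from a URL, so no size limit applies.
        return "attach"
    return "upload"


if __name__ == "__main__":
    print(choose_method("BED", 20_000))        # small BED file
    print(choose_method("BAM", 20_000))        # BAM is attach-only
    print(choose_method("WIG", 50_000_000))    # too large to upload
```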

Demo 3 - Visualising your own data

Background:

The file http://www.ebi.ac.uk/~bert/pig_rna.bam contains alignments of Illumina HiSeq2000 RNAseq reads from RNA from pig oocytes. A .bam.bai index file is located in the same directory as the .bam file.
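The index location follows directly from the naming described above: the `.bam.bai` file sits next to the `.bam` file. A small sketch of that rule, with the hypothetical helper name `bam_index_url`:

```python
def bam_index_url(bam_url: str) -> str:
    """Return the URL of the .bam.bai index expected next to a .bam file."""
    if not bam_url.endswith(".bam"):
        raise ValueError("expected a URL ending in .bam")
    # Same directory, same file name, with '.bai' appended.
    return bam_url + ".bai"


if __name__ == "__main__":
    print(bam_index_url("http://www.ebi.ac.uk/~bert/pig_rna.bam"))
```

Checking that this second URL is reachable before attaching the track can save a failed attempt.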

Task:

Attach the .bam file to Ensembl and check the expression of ZAR1 (zygote arrest 1), an oocyte-specific gene that appears to function at the oocyte-to-embryo transition.

The .bam and .bai files were kindly provided by Drs. Jason W. Ross, Max F. Rothschild, Rebecca Boddicker, Cai-Xia Yang and Zhi-Qiang Du, Department of Animal Science and Center for Integrated Animal Genomics, Iowa State University.

Variant Effect Predictor (VEP)

• Predicts functional consequences of known and unknown variants
• For substitutions, insertions, deletions and structural variants
• Web interface (for up to 750 variants), standalone Perl script, Perl API and REST API
• http://www.ensembl.org/info/docs/variation/vep/index.html
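To make the input side concrete, the sketch below writes variants as whitespace-separated lines (chromosome, start, end, allele string, strand, optional identifier) and splits them into batches that respect the 750-variant web limit. That column layout is assumed to match the default VEP input format described in the documentation linked above; the class and function names, and the coordinates in the example, are made up for illustration.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional

WEB_LIMIT = 750  # web interface limit quoted on the slide


@dataclass
class Variant:
    chrom: str
    pos: int                    # 1-based; for insertions, the base right after the insertion point
    ref: str                    # "-" for an insertion
    alt: str                    # "-" for a deletion
    strand: str = "+"
    name: Optional[str] = None  # optional identifier


def to_vep_line(v: Variant) -> str:
    """Format one variant as 'chrom start end ref/alt strand [name]'."""
    if v.ref == "-":
        # Insertion: start is one greater than end.
        start, end = v.pos, v.pos - 1
    else:
        # Substitution or deletion: the span covers the reference allele.
        start, end = v.pos, v.pos + len(v.ref) - 1
    fields = [v.chrom, str(start), str(end), f"{v.ref}/{v.alt}", v.strand]
    if v.name:
        fields.append(v.name)
    return " ".join(fields)


def batches(lines: List[str], size: int = WEB_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` lines."""
    for i in range(0, len(lines), size):
        yield lines[i:i + size]


if __name__ == "__main__":
    # Made-up coordinates, for illustration only.
    variants = [
        Variant("18", 1000, "T", "C"),
        Variant("18", 2000, "-", "AG", name="ins1"),
        Variant("18", 3000, "ACG", "-", name="del1"),
    ]
    for chunk in batches([to_vep_line(v) for v in variants]):
        print("\n".join(chunk))
```

Larger sets would go to the standalone script or the APIs instead of being pasted batch by batch.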

Demo 4 - Variant Effect Predictor

Background:

Variants in the bestrophin 1 (VMD2) gene are associated with various retinal disorders in man. Dog is used as a model to study these. The file http://www.ebi.ac.uk/~bert/dog_variants.txt contains a number of new variants discovered in the bestrophin 1 gene of a Lapponian Herder.

Task:

Determine the effect of the above variants on the dog VMD2 gene.

BioMart

• Data retrieval tool
• Originally developed for Ensembl (EnsMart)
• Now used by many large data resources
• Integrated with several widely used software packages, e.g. Galaxy (https://main.g2.bx.psu.edu/)
• Central portal: http://www.biomart.org

BioMart - 4 Steps

1. Dataset: Choose your dataset and species

Example: All rice genes

2. Filters: Filter your dataset

Example: Only genes that map to a list of differentially regulated microarray probes

3. Attributes: Specify what information you want to output

Example: Ensembl Gene ID, name, description, probe ID

4. Results: Preview and output your results
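The same dataset / filters / attributes structure appears when a BioMart query is written as XML rather than clicked together in the web interface. The sketch below assembles such a document with Python's standard library. The element layout (`Query` > `Dataset` > `Filter` / `Attribute`) reflects the BioMart XML query format as commonly documented; the dataset, filter and attribute names in the example are placeholders, and the `Query` attributes shown are assumptions that a given mart may require to be different.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_query(dataset: str, filters: Dict[str, str], attributes: List[str]) -> str:
    """Assemble a BioMart-style XML query: dataset, then filters, then attributes."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",   # assumed; differs between marts
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    # Step 1: dataset
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    # Step 2: filters
    for name, value in filters.items():
        ET.SubElement(ds, "Filter", {"name": name, "value": value})
    # Step 3: attributes
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    # Step 4 (results) happens on the server once the query is submitted.
    return ET.tostring(query, encoding="unicode")


if __name__ == "__main__":
    # All names below are placeholders, not verified BioMart identifiers.
    print(build_query(
        dataset="example_rice_gene_dataset",
        filters={"example_probe_filter": "probe_1,probe_2"},
        attributes=["example_gene_id", "example_description", "example_probe_id"],
    ))
```

The real dataset, filter and attribute identifiers can be read off the web interface after building the query there once.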

Demo 5 - BioMart

Background:

The file http://www.ebi.ac.uk/~bert/rice_probes.txt contains a list of differentially regulated Affymetrix Rice GeneChip® probe sets as described in the paper "Root-Specific Transcript Profiling of Contrasting Rice Genotypes in Response to Salinity Stress" (Cotsaftis et al. Mol Plant. 2011 Jan;4(1):25-41).

Task:

Generate a list of the genes to which the above probe sets map. Include the Ensembl Gene ID, name, description and probe ID.

Help

• Helpdesk [email protected] [email protected]

• Mailing lists: http://www.ensembl.org/info/about/contact/mailing.html, http://plants.ensembl.org/info/about/contact/mailing.html

• Ensembl YouTube and YouKu (优酷网) channels: http://www.youtube.com/user/EnsemblHelpdesk, http://u.youku.com/user_show/uid_Ensemblhelpdesk

• EBI Train Online: http://www.ebi.ac.uk/training/online/

Staying in touch

• Blog: http://www.ensembl.info

• Facebook: http://www.facebook.com/Ensembl.org

• Twitter: http://twitter.com/Ensembl

Paul Flicek, Stephen M. J. Searle, Ikhlak Ahmed, Bronwen L. Aken, M. Ridwan Amode, Daniel Barrell, Kathryn Beal, Ewan Birney, Simon Brent, Denise Carvalho-Silva, Peter Clapham, Guy Coates, Fiona Cunningham, Ian Dunham, Susan Fairley, Stephen Fitzgerald, Laurent Gil, Carlos García-Girón, Leo Gordon, Jennifer Harrow, Javier Herrero, Thibaut Hourlier, Tim J. P. Hubbard, Sarah Hunt, Nathan Johnson, Thomas Juettemann, Andreas K. Kähäri, Stephen Keenan, Rhoda Kinsella, Monika Komorowska, Eugene Kulesha, Ian Longden, Thomas Maurel, William M. McLaren, Matthieu Muffato, Rishi Nag, Bert Overduin, Anne Parker, Miguel Pignatelli, Bethan Pritchard, Emily Pritchard, Harpreet Singh Riat, Graham R. S. Ritchie, Magali Ruffier, Michael Schuster, Daniel Sheppard, Daniel Sobral, Giulietta Spudich, Kieron Taylor, Anja Thormann, Stephen Trevanion, Simon White, Steven P. Wilder, Andy Yates, Amonida Zadissa

Most recent general paper: Flicek et al. Ensembl 2013. Nucl. Acids Res. doi: 10.1093/nar/gks1236 (2012).

Funding: European Commission Framework Programme 7

Paul Kersey, James Allen, Dan Bolser, Mikkel Christensen, Paul Davies, Paul Derwent, Lee Falin, Christoph Grabmüller, Malcolm Hinsley, Kevin Howe, Daniel Hughes, Jay Humphrey, Arnaud Kerhornou, Yulia Kobhova, Eugene Kulesha, Nick Langridge, Dan Lawson, Manolis Lyrakis, Uma Maheswari, Gareth Maslen, Mark McDowall, Michael Nuhn, Chuang Kee Ong, Michael Paulini, Helder Pedro, Claudia Rato da Silva, Dan Staines, Mary-Ann Tuli, Brandon Walts, Gareth Williams

Most recent general paper: Kersey et al. Ensembl Genomes: an integrative resource for genome-scale data from non-vertebrate species. Nucl. Acids Res. doi: 10.1093/nar/gkr895 (2012).

Funding: European Commission Framework Programme 7