
European Nucleotide Archive (ENA) and Ensembl Browser

Denise Carvalho-Silva

Bioinformatics Roadshow

Rotterdam, May 9th 2012

Presentation: http://www.ebi.ac.uk/~denise/Rotterdam_May_2012/

This morning

Time          Topic
09:00-09:30   ENA (intro)
09:30-10:30   ENA (train online)
10:30-11:00   Coffee break
11:00-11:30   Ensembl browser (intro)
11:30-12:00   Ensembl browser (demo)

European Nucleotide Archive (ENA)

Outline

• Introduction
• Searching ENA data
• Demo (EMBL-Bank and SRA)
• Submitting data to ENA
• Summary
• Train online, exercises and help
• Acknowledgements

Introduction

ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data

Users can:
• search and download data
• submit their own data

Submitted data are validated by automated quality control (QC), manual inspection and curation

1980: EMBL Data Library (EMBL Heidelberg, Germany), the world's first public database of nucleotide sequences

Archive                  Year   Content
EMBL-Bank                1995   Annotated, assembled sequences
European Trace Archive   2008   Raw data (electrophoresis-based machines)
SRA                      2008   Raw data (next-generation sequencing)

Data architecture

ENA provides access to the whole scale of information, from raw reads to high-level functional annotation

[Figure: EMBL-Bank, SRA and Trace archive]

INSDC

• International Nucleotide Sequence Database Collaboration
• Consists of ENA, NCBI GenBank and the DNA Data Bank of Japan (DDBJ)
• Databases are synchronized on a daily basis
• www.insdc.org

Database growth*

[Figure: growth of EMBL-Bank and SRA]

*As of 23/04/12

Search and download data

www.ebi.ac.uk/ena/about/search_and_browse

Free text and sequence search

www.ebi.ac.uk/ena/

Advanced sequence search options

www.ebi.ac.uk/ena/search/#Search

Programmatic data access to ENA

Formats for data retrieval: FASTA, FASTQ, flat file, HTML, XML
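As a minimal sketch of programmatic retrieval, the Python below builds a record URL and parses FASTA text once it has been downloaded. It assumes the current ENA Browser API layout (`https://www.ebi.ac.uk/ena/browser/api/<format>/<accession>`), which postdates this 2012 talk and may change; `record_url` and `parse_fasta` are hypothetical helper names, not part of any ENA library.

```python
from urllib.parse import quote

# Assumption: base URL of the current ENA Browser API (not shown on the 2012 slides).
ENA_BROWSER_API = "https://www.ebi.ac.uk/ena/browser/api"

# Assumption: record formats served by that endpoint. FASTQ reads are
# distributed as files rather than through this endpoint, so they are omitted.
SUPPORTED_FORMATS = {"fasta", "embl", "xml"}


def record_url(accession: str, fmt: str = "fasta") -> str:
    """Build a URL that retrieves one record in the requested format (hypothetical helper)."""
    fmt = fmt.lower()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Percent-encode the accession so unexpected characters cannot break the path.
    return f"{ENA_BROWSER_API}/{fmt}/{quote(accession.strip())}"


def parse_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary (hypothetical helper)."""
    records: dict[str, str] = {}
    header = None
    chunks: list[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            # Sequence lines may be wrapped; join them back together.
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records
```

To fetch a record, one could pass `record_url("<accession>")` to `urllib.request.urlopen` and decode the response before calling `parse_fasta`; the accession is left as a placeholder because the demo slides do not give one.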

www.ebi.ac.uk/ena/about/browser

EBI search tool

www.ebi.ac.uk

EBI search tool (results)

Demo - EMBL-Bank

Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus).

(Image © Mo Hassan)

Demo – SRA

Retrieve and browse the data for this study.

(Image © Robin Hoffman)

Submitting data to ENA

• Journals and funders may require sequences to be submitted to an INSDC database prior to publication
• Submit to one INSDC database only
• Unique accession numbers are assigned
• Submitted data can be made public immediately or kept private until the work is published
• Once public, submitted data are exchanged among INSDC partners
• Data belong to the submitter and can only be updated with the submitter's consent

Submissions

• Manual and automated

• Use Webin (interactive web submission system) for new sequencing projects, assembled sequences and annotation

• For other types of data, different channels are available (e.g. [email protected], FTP, RESTful web-based service)

• www.ebi.ac.uk/ena/about/submit_and_update

What can you do with ENA?

• Archive and distribute your sequence data to wider audiences

• Share your pre-publication data

• Reduce your local hardware requirements for archiving NGS data

• Report novel annotation relating to existing sequence data

• Locate, retrieve and aggregate existing sequence data

• Browse existing sequence and annotation

• Find all sequences/annotation available for your gene of interest

• Find out what is known about your sequence of interest

• Link from nucleotide data through to a host of integrated resources such as Ensembl, UniProt and InterPro

EBI Train online (beginners)

http://www.ebi.ac.uk/training/online/course/european-nucleotide-archive-quick-tour

EBI Train online (intermediate)

http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi

Exercises

http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi

• Guided examples of using ENA

• Finding orang-utan sequence data

• Finding information on transgenic sequences

Help

• Data submissions, helpdesk, enquiries:

[email protected]

• Updates, publication notifications:

[email protected]

Acknowledgements

Clara Amid, Lawrence Bower, Ana Cerdeño-Tárraga, Ying Cheng, Iain Cleland, Nadeem Faruque, Richard Gibson, Neil Goodgame, Christopher Hunter, Mikyung Jang, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Rivière, Marc Rossello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane

EMBL, EU, Wellcome Trust, BBSRC