European Nucleotide Archive (ENA) and Ensembl Genome Browser
Denise Carvalho-Silva
Bioinformatics Roadshow
Rotterdam, May 9th 2012
Presentation: http://www.ebi.ac.uk/~denise/Rotterdam_May_2012/
This morning
Time: Topic
09:00-09:30: ENA (intro)
09:30-10:30: ENA (train online)
10:30-11:00: Coffee Break
11:00-11:30: Ensembl browser (intro)
11:30-12:00: Ensembl browser (demo)
European Nucleotide Archive (ENA)
Outline
• Introduction
• Searching data in ENA
• Demo (EMBL-Bank and SRA)
• Submitting data to ENA
• Summary
• Train online, exercises and help
• Acknowledgements
Introduction
ENA provides a comprehensive, accessible and publicly available repository for nucleotide sequence data
Users can search and download data, and submit their own data
Submitted data are validated by automated QC, manual inspection and curation
1980: EMBL Data Library (EMBL Heidelberg, Germany), the world's first public database of nucleotide sequences
• EMBL-Bank (1995): annotated, assembled sequences
• European Trace Archive (2008): raw data from electrophoresis-based machines
• SRA (Sequence Read Archive, 2008): raw data from next-generation sequencing (NextGenSeq)
Data architecture
ENA provides access to the whole scale of sequencing information, from raw reads to high-level functional annotation
[Figure: EMBL-Bank, SRA and Trace Archive]
INSDC
• International Nucleotide Sequence Database Collaboration
• Consists of ENA, NCBI GenBank and DNA Data Bank of Japan
• Databases are synchronized on a daily basis
• www.insdc.org
Database growth*
[Figure: growth of EMBL-Bank and SRA]
*23/04/12
Search and download data
www.ebi.ac.uk/ena/about/search_and_browse
Free text and sequence search
www.ebi.ac.uk/ena/
Advanced sequence search options
www.ebi.ac.uk/ena/search/#Search
Programmatic data access to ENA
Formats for data retrieval: FASTA, FASTQ, flat file, HTML, XML
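As a rough illustration of programmatic retrieval, the sketch below builds a retrieval URL and parses a FASTA response into records. The base URL and the `<base>/<format>/<accession>` pattern are assumptions that are not stated on the slide and should be checked against the ENA browser documentation. The function names `build_retrieval_url` and `parse_fasta` are hypothetical helpers written for this example, and no network request is made.

```python
from typing import Dict, Optional

# Assumption: base URL of the ENA retrieval service. Confirm it against
# www.ebi.ac.uk/ena/about/browser before relying on it.
ASSUMED_BASE_URL = "https://www.ebi.ac.uk/ena/browser/api"


def build_retrieval_url(accession: str, fmt: str = "fasta",
                        base_url: str = ASSUMED_BASE_URL) -> str:
    """Build a URL of the assumed form <base>/<format>/<accession>."""
    accession = accession.strip()
    if not accession:
        raise ValueError("accession must not be empty")
    return f"{base_url.rstrip('/')}/{fmt}/{accession}"


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into a {header: sequence} dictionary.

    Sequence lines that follow a '>' header are concatenated;
    blank lines are ignored.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            records[header] = ""
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[header] += line
    return records
```

A caller would pass a real accession number to `build_retrieval_url`, fetch the resulting URL with any HTTP client, and hand the response text to `parse_fasta`. FASTQ, flat file and XML responses would each need their own parser.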
www.ebi.ac.uk/ena/about/browser
EBI search tool
www.ebi.ac.uk
EBI search tool (results)
Demo – EMBL-Bank
Retrieve and browse the mitochondrial genome of the cave bear (Ursus spelaeus).
© Mo Hassan
Demo – SRA
Retrieve and browse the data for this study.
© Robin Hoffman
Submitting data to ENA
• Journals and funders may require sequences to be submitted to an INSDC database prior to publication
• Submission to one INSDC database only
• Unique accession numbers are assigned
• Submitted data can be made public immediately or kept private until the work is published
• Once public, submitted data will be exchanged among INSDC partners
• Data belong to the submitter and can only be updated with submitter consent
Submissions
• Manual and automated
• Use Webin (interactive web submission system) for new sequencing projects, assembled sequences and annotation
• For other types of data, different channels are available (e.g. email, FTP, RESTful web-based service)
• www.ebi.ac.uk/ena/about/submit_and_update
What can you do with ENA?
• Archive and distribute your sequence data to wider audiences
• Share your pre-publication data
• Reduce your local hardware requirements for archiving NGS data
• Report novel annotation relating to existing sequence data
• Locate, retrieve and aggregate existing sequence data
• Browse existing sequence and annotation
• Find all sequences/annotation available for your gene of interest
• Find out what is known about your sequence of interest
• Link through from nucleotide data to a host of integrated resources such as Ensembl, UniProt, InterPro
EBI Train online (beginners)
http://www.ebi.ac.uk/training/online/course/european-nucleotide-archive-quick-tour
EBI Train online (intermediate)
http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi
Exercises
http://www.ebi.ac.uk/training/online/course/nucleotide-sequence-data-resources-ebi
• Guided examples of using ENA
• Finding orang-utan sequence data
• Finding information on transgenic sequences
Help
• Data submissions, helpdesk, enquiries:
• Updates, publication notifications:
Acknowledgements
Clara Amid, Ewan Birney, Lawrence Bower, Ana Cerdeño-Tárraga, Ying Cheng, Iain Cleland, Nadeem Faruque, Richard Gibson, Neil Goodgame, Christopher Hunter, Mikyung Jang, Rasko Leinonen, Xin Liu, Arnaud Oisel, Nima Pakseresht, Sheila Plaister, Rajesh Radhakrishnan, Kethi Reddy, Stephane Rivière, Marc Rossello, Alexander Senf, Dimitriy Smirnov, Petra Ten Hoopen, Daniel Vaughan, Robert Vaughan, Vadim Zalunin and Guy Cochrane
EMBL, EU, Wellcome Trust, BBSRC