Guide to Data Resources

The EMBL-European Bioinformatics Institute Guide to data resources
Copyright © 2012 EMBL-European Bioinformatics Institute
Cover image courtesy of Samuel Kerrien

Contents

  • About this guide
  • Genes and genomes
  • Gene expression
  • Proteins and proteomics
  • Macromolecular structures
  • Small molecules
  • Enzymes and reactions
  • Interactions, pathways and networks
  • Patent sequences
  • Literature and text mining
  • Ontologies
  • Tools

About us

The EMBL-European Bioinformatics Institute (EMBL-EBI) maintains one of the world’s most comprehensive collections of freely available biological databases. Our free tools allow you to analyse this information and share it with the research community.

About this guide

This document provides an overview of EMBL-EBI’s core resources. You can find overviews of these resources in Train online.
www.ebi.ac.uk/training/online

Support

Detailed help documentation is available through each of the resources described here. You can also find answers to some of your questions through our homepage:
www.ebi.ac.uk/help

Training

EMBL-EBI offers training at all levels. Our hands-on training courses are held regularly on the Genome Campus in Hinxton, UK and at host institutions throughout the world. Train online guides you through many bioinformatics resources, in your own time and at your own pace.
www.ebi.ac.uk/training

Stay in touch

You can find EMBL-EBI news, service updates and training opportunities on our website, Twitter and Facebook.

  • Website: www.ebi.ac.uk
  • Twitter: @emblebi
  • Facebook: /EMBLEBI

Genes and genomes

Next-generation sequencing has transformed the task of collecting DNA sequence data. We no longer sequence just single genomes but also sets of related genomes in structured studies. This has made possible the 1000 Genomes Project, an international initiative to build a comprehensive catalogue of all common human variation. We also capture genetic variation of medical relevance. Such data require careful attention to access control, ensuring that only authorised scientists can use them, and that they do so in a way that reflects the spirit of consent and confidentiality.

Ensembl
www.ensembl.org
Ensembl produces and maintains both automatic and manually curated annotation on selected eukaryotic genomes. Automatic annotation is based on mRNA and protein information. Ensembl provides valuable insights into variation within and between species, and allows you to compare whole genomes to identify conserved elements. It is integrated with several other important molecular resources, for example UniProt, and can be accessed programmatically. Ensembl is developed as a joint project between EMBL-EBI and the Wellcome Trust Sanger Institute.

Ensembl Genomes
www.ensemblgenomes.org
Ensembl Genomes is a portal to genome-scale data from bacteria, protists, fungi, plants and invertebrate metazoa, through a unified set of interactive and programmatic interfaces. Domain-centric resources (focused on particular areas of scientific interest) are developed in collaboration with the appropriate scientific communities, and integrated in the broad taxonomic context through comparative analysis and standardised annotation.

European Nucleotide Archive
www.ebi.ac.uk/ena
ENA, a member of the International Nucleotide Sequence Database Collaboration, contains all the nucleotide sequences in the public domain and consolidates data from EMBL-Bank, the European Trace Archive and the Sequence Read Archive. The EBI Search returns results from selected annotated sections of the ENA.

European Genome-phenome Archive
www.ebi.ac.uk/ega
EGA allows you to explore datasets from numerous genotype experiments (including case-control, population and family studies) that are supplied by a range of data providers.

Metagenomics Portal
www.ebi.ac.uk/metagenomics
Our Metagenomics service is an automated pipeline for the analysis and archiving of metagenomic data, and provides insights into the functional and metabolic potential of a sample.

Search
www.ebi.ac.uk
Our search service is your gateway to a world of information spanning DNA, genes, functional genomics, proteins, structures, small molecules, enzymes, interactions, pathways, scientific publications and patent sequences.

Gene expression

Genome-wide expression assays, originally using microarrays and more recently high-throughput sequencing, can answer specific biological questions and provide reference data sets. The associated large-scale expression datasets can be used to answer questions that might not be related to the study for which the data were originally generated. For example, a gene expression study that reveals differentially expressed genes characteristic of a particular type of cancer may also reveal candidate genes for therapeutics development. EMBL-EBI’s functional genomics resources provide easy access to gene expression data and related information. Our visualisation tools collate information from multiple data sets and present it in an intuitive way.

ArrayExpress
www.ebi.ac.uk/arrayexpress
The MIAME-standard compliant ArrayExpress Archive stores functional genomics experiments performed using RNA-Seq/ChIP-Seq and array-based technologies.

Expression Atlas
www.ebi.ac.uk/gxa
The Expression Atlas allows you to search for gene expression changes measured in various cell types, organism parts, disease states and many other biological and experimental conditions. It represents a curated subset of the ArrayExpress Archive experiments, and can be downloaded for local use.

R-Workbench
www.ebi.ac.uk/Tools/rcloud
The R-Workbench provides access to R and Bioconductor in the EBI cloud, and rapid access to data stored in EMBL-EBI databases such as ArrayExpress. You can upload your own data and use EMBL-EBI’s infrastructure to perform your data analyses.

Leanne, discovery biologist

Leanne works on discovery and validation of candidate drug targets in a pharmaceutical company. She explains: ‘I do a lot of cloning and expression analysis. I need to find published cDNA sequences and their orthologues, so that I can find appropriate models in which to validate my targets. For discovery, I’d like a simple way to search across all public-domain biological information. A colleague mentioned a gene to me in passing recently and I only caught the name of it. I’d like to be able to find out more about it without having to forage around in lots of different publications.’

EBI resources for Leanne:

  • EBI Search for quick summary reports on specific genes.
  • BLAST to root out vector contamination.
  • The Expression Atlas for transcription profiles and functional genomics.
  • UniProt to learn more about protein function.
  • ChEMBL for druggability of target molecules.

Proteins and proteomics

Our protein and nucleotide data are extensively linked, both at the level of underlying data and through the coordination of web resources. This helps us capture, organise and interpret sequence-related data, providing information freely in a variety of formats including user-friendly web sites. Wherever possible, we collaborate with other groups worldwide with similar aims.

UniProt
www.uniprot.org
This comprehensive resource organises protein sequence information so that you can find splice variants, functional information and links to related resources from a single entry. UniProt is a joint effort with the Swiss Institute of Bioinformatics and Georgetown University. It comprises four components: the UniProt Knowledgebase (UniProtKB), Reference Clusters (UniRef), Archive (UniParc) and Metagenomic and Environmental Sequences (UniMES). UniProtKB is the hub for the collection of functional information on proteins, with accurate, consistent and rich annotation.

InterPro
www.ebi.ac.uk/interpro
InterPro classifies proteins into families and predicts the presence of important domains and sites. InterProScan allows you to query your sequence against the database.

PRIDE
www.ebi.ac.uk/pride
The Proteomics Identifications Database is a centralised, standards-compliant, public repository for proteomics data. It contains protein and peptide identifications and their associated supporting evidence. PRIDE also captures details of post-translational modifications.

Francesco, marine biologist

Francesco is Head of Department in a new collaborative centre for marine biology. His group is doing metagenomic studies of extreme marine environments, but his department has just secured funding to look for novel wound-healing compounds in marine algae. “We’re about to scale up our sequencing efforts severalfold. We need to put some pipelines in place that will enable us to make sense of all the data that we’re generating. The extremophile project is a collaborative effort, and we want to be able to share information with our research network.”

EBI resources for Francesco:

  • European Nucleotide Archive for DNA barcodes to identify organisms in his samples. His team could set up a pipeline to submit their data to ENA and make it publicly available.
  • Ensembl and Ensembl Genomes to characterise organisms by comparing them with a wealth of information on the genomes of many fully sequenced organisms.
  • InterProScan to help identify

Macromolecular structures

Three-dimensional structures give us mechanistic insight into how macromolecules work, and help explain how their functions are disrupted by mutation or interaction with small molecules. Our structural data resources are engineered to reflect the needs of a growing and diverse structural biology community.

PDBe
www.ebi.ac.uk/pdbe
The Protein Data Bank in Europe (PDBe), part of the
