Biomart–Biological Queries Made Easy Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, Arek Kasprzyk

Total Page:16

File Type:pdf, Size:1020Kb

Biomart–Biological Queries Made Easy Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, Arek Kasprzyk BioMart–biological queries made easy Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, Gudmundur Thorisson, Arek Kasprzyk To cite this version: Damian Smedley, Syed Haider, Benoit Ballester, Richard Holland, Darin London, et al.. BioMart– biological queries made easy. BMC Genomics, BioMed Central, 2009, 10 (1), pp.22. 10.1186/1471- 2164-10-22. hal-01615162 HAL Id: hal-01615162 https://hal-amu.archives-ouvertes.fr/hal-01615162 Submitted on 19 Dec 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License BMC Genomics BioMed Central Software Open Access BioMart – biological queries made easy Damian Smedley1, Syed Haider1, Benoit Ballester1, Richard Holland1, Darin London2, Gudmundur Thorisson3 and Arek Kasprzyk*4 Address: 1European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK, 2Institute for Genome Sciences & Policy (IGSP), Duke University CIEMAS, 101 Science Drive, DUMC Box 3382, Durham, NC 27708, USA, 3Department of Genetics, University of Leicester, University Road, Leicester, LE1 7RH, UK and 4Ontario Institute for Cancer Research, MaRS Centre, South Tower, 101 College Street, Suite 800 Toronto, Ontario, M5G 0A3, Canada Email: Damian Smedley - [email protected]; Syed Haider - [email protected]; Benoit Ballester - [email protected]; Richard Holland - [email protected]; Darin London - [email protected]; Gudmundur Thorisson - [email protected]; Arek Kasprzyk* - [email protected] * Corresponding author Published: 14 January 2009 Received: 22 September 2008 Accepted: 14 January 2009 BMC Genomics 2009, 10:22 doi:10.1186/1471-2164-10-22 This article is available from: http://www.biomedcentral.com/1471-2164/10/22 © 2009 Smedley et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Abstract Background: Biologists need to perform complex queries, often across a variety of databases. Typically, each data resource provides an advanced query interface, each of which must be learnt by the biologist before they can begin to query them. Frequently, more than one data source is required and for high-throughput analysis, cutting and pasting results between websites is certainly very time consuming. Therefore, many groups rely on local bioinformatics support to process queries by accessing the resource's programmatic interfaces if they exist. This is not an efficient solution in terms of cost and time. Instead, it would be better if the biologist only had to learn one generic interface. BioMart provides such a solution. Results: BioMart enables scientists to perform advanced querying of biological data sources through a single web interface. The power of the system comes from integrated querying of data sources regardless of their geographical locations. Once these queries have been defined, they may be automated with its "scripting at the click of a button" functionality. BioMart's capabilities are extended by integration with several widely used software packages such as BioConductor, DAS, Galaxy, Cytoscape, Taverna. In this paper, we describe all aspects of BioMart from a user's perspective and demonstrate how it can be used to solve real biological use cases such as SNP selection for candidate gene screening or annotation of microarray results. Conclusion: BioMart is an easy to use, generic and scalable system and therefore, has become an integral part of large data resources including Ensembl, UniProt, HapMap, Wormbase, Gramene, Dictybase, PRIDE, MSD and Reactome. BioMart is freely accessible to use at http:// www.biomart.org. Background to test and drive their research hypotheses. Typically, each In this post-genomics era, data of increasing volume and data source provides an advanced query interface on their complexity is being deposited into databases around the website to satisfy this requirement. However, each site has world. Biologists need to ask complex queries of this data its own solution and subsequently, the user has a learning Page 1 of 12 (page number not for citation purposes) BMC Genomics 2009, 10:22 http://www.biomedcentral.com/1471-2164/10/22 curve before they can start interacting with the data. A fur- tools and cross platform compatibility. These include ther problem the researcher has is that they often need to model organism databases such as Gramene [11], Dicty- query more than one data source, necessitating mastering base [12], Wormbase [13] and RGD (Rat Genome Data- more than one interface and having to cut and paste base) [14] as well as HapMap variation [15], pancreatic results between the sites. If the analysis involves high- expression database [16], Reactome pathways [17] and throughput data, this approach is not usually scalable. To PRIDE proteomic [18] databases (see Table 1 for the full overcome this problem, many groups rely on bioinforma- list). A wide variety of analyses and tasks are possible from ticians who can generate scripts to interact with the vary- the publicly available BioMarts, ranging from SNP (single ing programmatic interfaces of the different data sources. nucleotide polymorphism) selection for candidate gene They also often have to learn a number of different web screening, microarray annotation, cross-species analysis, services or application programmatic interfaces (APIs) for through to recovery of disease links, sequence variations each resource. A preferable solution would be to have and expression patterns. generic software that a biologist can use on top of any data source. BioMart[1] is such a solution. The range of interfaces is designed with both biologists and bioinformaticians in mind. The simplest way of que- BioMart is an open source data management system that rying BioMart is via the web interface called MartView comes with a range of query interfaces that allow users to (either on our central portal [2] or follow the links on our group and refine data based upon many different criteria. main page [1] to the individual sites). Programmatic In addition, the software features a built-in query opti- access is available via a Perl API or BioMart's web services miser for fast data retrieval. A BioMart installation can (MartServices). An important and novel feature of provide domain-specific querying of a single data source BioMart is that it offers "scripting at the click of a button". or function as a one-stop shop (web portal) to a wide A user can generate an API or MartServices script by build- range of BioMarts as our central portal [2] does. All ing up a query on the MartView website followed by a BioMart websites have the same look and feel (only vary- simple click of a button. All the interfaces allow the user ing in colour scheme and branding), which has obvious to build up biological queries by first selecting dataset(s) advantages to users moving between different resources. of interest, then the data to view and/or save (attributes), However, the power of the system comes from integrated some optional restrictions (filters) on the query and querying of the different BioMarts. If any datasets share finally the format for the data. common identifiers (such as Ensembl gene IDs or Uni- prot IDs) or even mappings to a common genome assem- Implementation bly, these can be used to link BioMarts together in Here we will describe a top-level view of the BioMart sys- integrated queries. Additionally, these datasets do not tem as the focus of this paper is on practical use of have to be located on the same server or even at the same BioMart rather than implementation and deployment. geographical location. This distributed solution has many Further documentation on these aspects can be found on advantages; not least of which is the fact that each site can the BioMart website http://www.biomart.org. BioMart is utilise their own domain expertise to deploy their designed around a simple, three tier architecture: BioMart. (i) The first tier consists of one or more relational data- BioMart also has the advantage of being integrated with bases. Each database may consist of one or more "marts", external software packages such as BioConductor [3], the which are schema compliant with BioMart definitions. Distributed Annotation System (DAS) [4], Galaxy [5], Each "mart" may hold a number of different datasets. The Cytoscape [6], Taverna [7]. This enables users to perform BioMart data model is a denormalised, query optimised integrated queries with non-BioMart data sources as well schema and can be deployed using Oracle, MySQL or as detailed analysis of the results. BioMart is also part of Postgres relational database management systems. Each the GMOD (Generic Model Organism Database) [8] suite dataset uses a reversed star model [10] where data map- of tools for building a model organism site. ping 1:1 with the central object being modelled is stored in a main table. Data mapping 1:n is stored in one or Originally developed for the Ensembl genome browser [9] more satellite, dimension tables. Two tools are provided as the EnsMart data warehouse [10], BioMart has now to build and configure the mart databases in the first tier: become a fully generic data integration solution. Although applicable to any type of data, BioMart is partic- • MartBuilder, to construct SQL statements that will trans- ularly suited for advanced searching of the complex form your schema into a mart.
Recommended publications
  • Description of the Biomart Package
    Description of the biomaRt package Steffen Durinck‡∗, Wolfgang Huber¶†, Yves Moreau‡, Bart De Moor‡ July 25, 2006 ‡Department of Electronical Engineering, ESAT-SCD, K.U.Leuven, Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium, http://www.esat.kuleuven.ac.be/~dna/BioI and ¶European Bioinformatics Institute, Hinxton, UK Contents 1 Introduction 2 2 objects 2 2.1 Mart-class..................................... 2 3 Simple biomaRt functions for frequently used queries to Ensembl 3 3.1 Selecting a BioMart database to use . 3 3.1.1 useMart .................................. 3 3.1.2 martDisconnect ............................. 3 3.2 Annotating identifiers with gene information . 4 3.2.1 GOannotation .............................. 6 3.2.2 OMIMannotation ............................ 7 3.2.3 INTERPRO protein domains . 7 3.3 Homologymapping................................ 8 3.4 Identify subsets of genes for further analysis with the getFeature function . 8 3.5 Sequenceinformation............................... 9 3.6 Single Nucleotide Polymorphisms . 9 3.7 Moreexoticfunctions............................... 10 3.7.1 getPossibleXrefs ............................ 10 3.7.2 getXref .................................. 10 ∗Steff[email protected][email protected] 1 4 Advanced data retrieval with BioMart API functions 10 4.1 listMarts...................................... 11 4.2 useMart ...................................... 11 4.3 listDatasets .................................... 11 4.4 useDataset..................................... 12 4.5 Filter, Values and Attributes . 12 5 Local BioMart databases 13 1 Introduction The BioConductor biomaRt package provides and API in R to query BioMart databases such as Ensembl (http://www.ensembl.org), a software system which produces and maintains au- tomatic annotation on metazoan genomes. Two sets of functions are currently implemented. A first set of functions is tailored towards Ensembl and are a set of commonly used queries in microarray data analysis.
    [Show full text]
  • The Biomart User's Guide
    The biomaRt user's guide Steffen Durinck,∗ Wolfgang Hubery September 15, 2015 Contents 1 Introduction 2 2 Selecting a BioMart database and dataset 3 3 How to build a biomaRt query 7 4 Examples of biomaRt queries 9 4.1 Task 1: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes . 9 4.2 Task 2: Annotate a set of EntrezGene identifiers with GO annotation . 9 4.3 Task 3: Retrieve all HUGO gene symbols of genes that are located on chromosomes 17,20 or Y , and are associated with one the following GO terms: "GO:0051330","GO:0000080","GO:0000114","GO:0000082" (here we'll use more than one filter) . 10 4.4 Task 4: Annotate set of idenfiers with INTERPRO protein domain identifiers . 10 4.5 Task 5: Select all Affymetrix identifiers on the hgu133plus2 chip and Ensembl gene identifiers for genes located on chro- mosome 16 between basepair 1100000 and 1250000. 11 4.6 Task 6: Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a "MAP kinase activity" GO term associated with it. 11 ∗[email protected] [email protected] 1 4.7 Task 7: Given a set of EntrezGene identifiers, retrieve 100bp upstream promoter sequences . 12 4.8 Task 8: Retrieve all 5' UTR sequences of all genes that are located on chromosome 3 between the positions 185514033 and 185535839 . 13 4.9 Task 9: Retrieve protein sequences for a given list of Entrez- Gene identifiers . 13 4.10 Task 10: Retrieve known SNPs located on the human chro- mosome 8 between positions 148350 and 148612 .
    [Show full text]
  • Variation Data in Ensembl Plants
    Overview of the next two hours... Before tea After tea Session 1, Browser: Session 2, Tools: ● Introduction ● Variant Effect ○ Ensembl Plants and Predictor plant variation data ○ Introduction and ● Hands-on hands-on ○ Variation in the ● Mining variation Ensembl browser data with BioMart ● Displaying your ○ Introduction and data in Ensembl hands-on Plants Resources These slides: http://tinyurl.com/transplant2015 Course booklet: http://tinyurl.com/transplant2015b ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● → → Currently 39 genomes in Ensembl Plants http://plants.ensembl.org Brassicales Fabales Malpighiales Rosales Solanales Malvales Vitales Zingiberales Poales ... ● ● ● Or all the above! ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Genome Browsers Genome Browser Databases • • Baseline Annotation • • • Transposon • discovery • • Gene prediction • • EST Alignment • PAG 2015 Recent updates (Release 27) http://plants.ensembl.org ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● On to variation... ● ● ● ● ● ● ● ● ● ● Species Samps Pops* ~Variations Arabidopsis 1,610 7 14,000,000 thaliana SV: 14,000 Brachypodium 3 3 300,000 ● distachyon Hordeum vulgare 188 16 24,000,000 Oryza glaberrima 21 1 800,000 Oryza indica 19 19 5,000,000 ● Oryza sativa 416 21 6,000,000 ● Solanum 84 8 71,000,000 lycopersicum ● Sorghum bicolor 3 1 SV: 64,000 Triticum aestivum 248 1 700,000 ✱ Vitis vinifera 17 3 500,000 Zea mays 103 1 51,000,000 Variation Data • • PAG 2015 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Viewing Variation Data PAG 2015 Looking at some tomato variation data... EXERCISES Variation in the Ensembl Browser Exercise 1 – Exploring a SNP in Arabidopsis The arabidopsis ATCDSP32 gene is a chloroplastic, 32 kDa, drought-induced stress protein that is proposed to participate in cell redox homeostasis (GO: 0045454). A. How many variants have been identified in the gene that can cause a change in the protein sequence? B.
    [Show full text]
  • Original Article the Reactome Biomart
    Database, Vol. 2011, Article ID bar031, doi:10.1093/database/bar031 ............................................................................................................................................................................................................................................................................................. Original article The Reactome BioMart Robin A. Haw1, David Croft2, Christina K. Yung1, Nelson Ndegwa2,3, Peter D’Eustachio4, Henning Hermjakob2 and Lincoln D. Stein1,5,6,* Downloaded from 1Ontario Institute for Cancer Research, Toronto, ON, M5G0A3, Canada, 2European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, 3Faculty of Life Science, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK, 4NYU School of Medicine, Department of Biochemistry, New York, NY 10016, 5Cold Spring Harbor Laboratory, Bioinformatics and Genomics, Cold Spring Harbor, NY 11724, USA and 6Department of Molecular Genetics, University of Toronto, Toronto, ON, M5S 1A1, Canada *Corresponding author: Tel: +1 416 673 8514; Fax: +1 416 977 1118; Email: [email protected] http://database.oxfordjournals.org/ Submitted 24 March 2011; Revised 20 June 2011; Accepted 21 June 2011 ............................................................................................................................................................................................................................................................................................
    [Show full text]
  • The Biomart User's Guide
    The biomaRt user's guide Steffen Durinck,∗ Wolfgang Hubery October 18, 2010 Contents 1 Introduction 2 2 Selecting a BioMart database and dataset 3 3 How to build a biomaRt query 5 4 Examples of biomaRt queries 7 4.1 Task 1: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes . 7 4.2 Task 2: Annotate a set of EntrezGene identifiers with GO annotation . 8 4.3 Task 3: Retrieve all HUGO gene symbols of genes that are located on chromosomes 1,2 or Y , and are associated with one the following GO terms: "GO:0051330","GO:0000080","GO:0000114","GO:0000082" (here we'll use more than one filter) . 9 4.4 Task 4: Annotate set of idenfiers with INTERPRO protein domain identifiers . 9 4.5 Task 5: Select all Affymetrix identifiers on the hgu133plus2 chip and Ensembl gene identifiers for genes located on chro- mosome 16 between basepair 1100000 and 1250000. 10 4.6 Task 6: Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a "MAP kinase activity" GO term associated with it. 10 ∗steff[email protected] [email protected] 1 4.7 Task 7: Given a set of EntrezGene identifiers, retrieve 100bp upstream promoter sequences . 11 4.8 Task 8: Retrieve all 5' UTR sequences of all genes that are located on chromosome 3 between the positions 185514033 and 185535839 . 12 4.9 Task 9: Retrieve protein sequences for a given list of Entrez- Gene identifiers . 12 4.10 Task 10: Retrieve known SNPs located on the human chro- mosome 8 between positions 148350 and 148612 .
    [Show full text]
  • The Biomart User's Guide
    The biomaRt user's guide Steffen Durinck,∗ Wolfgang Hubery May 4, 2016 Contents 1 Introduction 2 2 Selecting a BioMart database and dataset 3 3 How to build a biomaRt query 5 4 Examples of biomaRt queries 7 4.1 Task 1: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes . 7 4.2 Task 2: Annotate a set of EntrezGene identifiers with GO annotation . 8 4.3 Task 3: Retrieve all HUGO gene symbols of genes that are located on chromosomes 17,20 or Y , and are associated with one the following GO terms: "GO:0051330","GO:0000080","GO:0000114","GO:0000082" (here we'll use more than one filter) . 8 4.4 Task 4: Annotate set of idenfiers with INTERPRO protein domain identifiers . 8 4.5 Task 5: Select all Affymetrix identifiers on the hgu133plus2 chip and Ensembl gene identifiers for genes located on chro- mosome 16 between basepair 1100000 and 1250000. 9 4.6 Task 6: Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a "MAP kinase activity" GO term associated with it. 10 ∗[email protected] [email protected] 1 4.7 Task 7: Given a set of EntrezGene identifiers, retrieve 100bp upstream promoter sequences . 10 4.8 Task 8: Retrieve all 5' UTR sequences of all genes that are located on chromosome 3 between the positions 185514033 and 185535839 . 11 4.9 Task 9: Retrieve protein sequences for a given list of Entrez- Gene identifiers . 11 4.10 Task 10: Retrieve known SNPs located on the human chro- mosome 8 between positions 148350 and 148612 .
    [Show full text]
  • The Biomart Community Portal
    The BioMart community portal: an innovative alternative to large, centralized data repositories Damian Smedley, Syed Haider, Steffen Durinck, Luca Pandini, Paolo Provero, James Allen, Olivier Arnaiz, Mohammad Hamza Awedh, Richard Baldock, Giulia Barbiera, et al. To cite this version: Damian Smedley, Syed Haider, Steffen Durinck, Luca Pandini, Paolo Provero, et al.. The BioMart community portal: an innovative alternative to large, centralized data repositories. Nucleic Acids Research, Oxford University Press, 2015, 43 (W1), pp.W589-W598. 10.1093/nar/gkv350. hal- 01146849 HAL Id: hal-01146849 https://hal-univ-rennes1.archives-ouvertes.fr/hal-01146849 Submitted on 28 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Nucleic Acids Research Advance Access published April 20, 2015 Nucleic Acids Research, 2015 1 doi: 10.1093/nar/gkv350 The BioMart community portal: an innovative alternative to large, centralized data repositories Damian Smedley1, Syed Haider2, Steffen Durinck3, Luca Pandini4, Paolo Provero4,5, James Allen6, Olivier Arnaiz7, Mohammad Hamza Awedh8, Richard Baldock9, Giulia Barbiera4, Philippe Bardou10,TimBeck11, Andrew Blake12, Merideth Bonierbale13, Anthony J. Brookes11, Gabriele Bucci4, Iwan Buetti4, Sarah Burge6,Cedric´ Cabau10, Joseph W. Carlson14, Claude Chelala15, Charalambos Chrysostomou11, Davide Cittaro4, Olivier Collin16, Raul Cordova13,RosalindJ.Cutts15, Erik Dassi17, Alex Di Genova18, Anis Djari19, Anthony Esposito20, Heather Estrella20, Eduardo Eyras21,22, Julio Fernandez-Banet20, Simon Forbes1, Robert C.
    [Show full text]
  • The Biomart User's Guide
    The biomaRt user's guide Steffen Durinck,∗ Wolfgang Hubery October 13, 2014 Contents 1 Introduction 2 2 Selecting a BioMart database and dataset 3 3 How to build a biomaRt query 7 4 Examples of biomaRt queries 9 4.1 Task 1: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes . 9 4.2 Task 2: Annotate a set of EntrezGene identifiers with GO annotation . 10 4.3 Task 3: Retrieve all HUGO gene symbols of genes that are located on chromosomes 17,20 or Y , and are associated with one the following GO terms: "GO:0051330","GO:0000080","GO:0000114","GO:0000082" (here we'll use more than one filter) . 10 4.4 Task 4: Annotate set of idenfiers with INTERPRO protein domain identifiers . 10 4.5 Task 5: Select all Affymetrix identifiers on the hgu133plus2 chip and Ensembl gene identifiers for genes located on chro- mosome 16 between basepair 1100000 and 1250000. 11 4.6 Task 6: Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a "MAP kinase activity" GO term associated with it. 11 ∗[email protected] [email protected] 1 4.7 Task 7: Given a set of EntrezGene identifiers, retrieve 100bp upstream promoter sequences . 12 4.8 Task 8: Retrieve all 5' UTR sequences of all genes that are located on chromosome 3 between the positions 185514033 and 185535839 . 13 4.9 Task 9: Retrieve protein sequences for a given list of Entrez- Gene identifiers . 13 4.10 Task 10: Retrieve known SNPs located on the human chro- mosome 8 between positions 148350 and 148612 .
    [Show full text]
  • The Biomart User's Guide
    The biomaRt user's guide Steffen Durinck,∗ Wolfgang Hubery April 3, 2013 Contents 1 Introduction 2 2 Selecting a BioMart database and dataset 3 3 How to build a biomaRt query 7 4 Examples of biomaRt queries 9 4.1 Task 1: Annotate a set of Affymetrix identifiers with HUGO symbol and chromosomal locations of corresponding genes . 9 4.2 Task 2: Annotate a set of EntrezGene identifiers with GO annotation . 10 4.3 Task 3: Retrieve all HUGO gene symbols of genes that are located on chromosomes 1,2 or Y , and are associated with one the following GO terms: "GO:0051330","GO:0000080","GO:0000114","GO:0000082" (here we'll use more than one filter) . 10 4.4 Task 4: Annotate set of idenfiers with INTERPRO protein domain identifiers . 11 4.5 Task 5: Select all Affymetrix identifiers on the hgu133plus2 chip and Ensembl gene identifiers for genes located on chro- mosome 16 between basepair 1100000 and 1250000. 11 4.6 Task 6: Retrieve all entrezgene identifiers and HUGO gene symbols of genes which have a "MAP kinase activity" GO term associated with it. 12 ∗steff[email protected] [email protected] 1 4.7 Task 7: Given a set of EntrezGene identifiers, retrieve 100bp upstream promoter sequences . 12 4.8 Task 8: Retrieve all 5' UTR sequences of all genes that are located on chromosome 3 between the positions 185514033 and 185535839 . 13 4.9 Task 9: Retrieve protein sequences for a given list of Entrez- Gene identifiers . 13 4.10 Task 10: Retrieve known SNPs located on the human chro- mosome 8 between positions 148350 and 148612 .
    [Show full text]
  • The Biomart Community Portal
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Digital Repository @ Iowa State University Animal Science Publications Animal Science 7-2015 The ioMB art Community Portal: An Innovative Alternative to Large, Centralized Data Repositories Damian Smedley Wellcome Trust Sanger Institute Syed Haider University of Oxford Steffen Durinck Genentech, Inc. Luca Pandini Center for Translational Genomics and Bioinformatics San Raffaele Scientific nI stitut Paolo Provero FCenoltloerw fo rthi Trsa nasndlatio anddal Genitionomalic sw aorknd Bsio aitnf:ohrmttap://ltics Saibn. drRaff.iasaeleta Sciente.edu/atific nI nsts_pubsitut See nePxat pratge of for the addiAtiongraicl aulturuthors e Commons, Animal Sciences Commons, Genetics and Genomics Commons, Health Information Technology Commons, Laboratory and Basic Science Research Commons, and the Medical Sciences Commons The ompc lete bibliographic information for this item can be found at http://lib.dr.iastate.edu/ ans_pubs/169. For information on how to cite this item, please visit http://lib.dr.iastate.edu/ howtocite.html. This Article is brought to you for free and open access by the Animal Science at Digital Repository @ Iowa State University. It has been accepted for inclusion in Animal Science Publications by an authorized administrator of Digital Repository @ Iowa State University. For more information, please contact [email protected]. Authors Damian Smedley, Syed Haider, Steffen Durinck, Luca Pandini, Paolo Provero, James Allen, Olivier Arnaiz, Mohammad Hamza Awedh, Richard Baldock, Giulia Barbiera, Philippe Bardou, Tim Beck, Andrew Blake, Meredith Bonierbale, Anthony J. Brookes, Gabrielle Bucci, Iwan Bueti, Sarah Burge, Cédric Cabua, Joseph W. Carlson, Claude Chelala, Charalambos Chrysostomou, Davide Citaro, Olivier Collin, Raul Cordova, Rosalind J.
    [Show full text]
  • Advanced Genomic Data Mining
    Education Advanced Genomic Data Mining Xose´ M. Ferna´ndez-Sua´rez*, Ewan Birney European Bioinformatics Institute, Ensembl Group, Wellcome Trust Genome Campus, Hinxton, Cambridge, United Kingdom Introduction trieve information through different plat- pathways in the Reactome database). forms such as Galaxy [5] and the biomaRt Click on the ‘count’ button at the top to As data banks increase their size, one of package of BioConductor [11,12]. Many obtain this number. the current challenges in bioinformatics is of these tools also interact with the UCSC Next, we can enrich our search for to be able to query them in a sensible way. Table Browser and have similar approach- enzymes in the UniProt database. This Information is contained in different es using the UCSC system. We also will require the ‘linked’ or secondary databases, with various data representa- address programmatic access using Bio- dataset. Follow this description, or view tions or formats, making it very difficult to Mart’s own implementation of Web ser- the tutorials for use of the linked database use a single query tool to search more than vices (MartService). For local deployment at http://www.ensembl.org/common/ a single data source. of BioMart, see Table 1. Workshops_Online?id = 117. Data mining is vital to bioinformatics as Click on the second ‘Dataset’ option at it allows users to go beyond simple BioMart Web Interface the left of the page. Select ‘UniProt browsing of genome browsers, such as proteomes’ as the database. In this Ensembl [1,2] or the UCSC Genome Browser
    [Show full text]
  • The Biomart Community Portal: an Innovative Alternative to Large
    Nucleic Acids Research Advance Access published April 20, 2015 Nucleic Acids Research, 2015 1 doi: 10.1093/nar/gkv350 The BioMart community portal: an innovative alternative to large, centralized data repositories Damian Smedley1, Syed Haider2, Steffen Durinck3, Luca Pandini4, Paolo Provero4,5, James Allen6, Olivier Arnaiz7, Mohammad Hamza Awedh8, Richard Baldock9, Giulia Barbiera4, Philippe Bardou10,TimBeck11, Andrew Blake12, Merideth Bonierbale13, Anthony J. Brookes11, Gabriele Bucci4, Iwan Buetti4, Sarah Burge6,Cedric´ Cabau10, Joseph W. Carlson14, Claude Chelala15, Charalambos Chrysostomou11, Davide Cittaro4, Olivier Collin16, Raul Cordova13,RosalindJ.Cutts15, Erik Dassi17, Alex Di Genova18, Anis Djari19, Anthony Esposito20, Heather Estrella20, Eduardo Eyras21,22, Julio Fernandez-Banet20, Simon Forbes1, Robert C. Free11, Takatomo Fujisawa23, Emanuela Gadaleta15, Jose M. Garcia-Manteiga4, David Goodstein14, Kristian Gray24, Jose´ Afonso Guerra-Assunc¸ao˜ 15, Bernard Haggarty9, Dong-Jin Han25,26, Byung Woo Han27,28, Todd Harris29, Jayson Harshbarger30, Robert K. Hastings11, Richard D. Hayes14, Claire Hoede19, Shen Hu31, Zhi-Liang Hu32, Lucie Hutchins33, Zhengyan Kan20, Hideya Kawaji30,34, Aminah Keliet35, Arnaud Kerhornou6, Sunghoon Kim25,26, Rhoda Kinsella6, Christophe Klopp19, Lei Kong36, Daniel Lawson37, Dejan Lazarevic4, Ji-Hyun Lee25,27,28, Thomas Letellier35, Chuan-Yun Li38, Pietro Lio39, Chu-Jun Liu38, Jie Luo6, Alejandro Maass18,40, Jerome Mariette19, Thomas Maurel6, Stefania Merella4, Azza Mostafa Mohamed41, Francois Moreews10, Ibounyamine Nabihoudine19, Nelson Ndegwa42, Celine´ Noirot19, Cristian Perez-Llamas22, Michael Primig43, Alessandro Quattrone17, Hadi Quesneville35, Davide Rambaldi4, James Reecy32, Michela Riba4, Steven Rosanoff6, Amna Ali Saddiq44, Elisa Salas13, Olivier Sallou16, Rebecca Shepherd1, Reinhard Simon13, Linda Sperling7, William Spooner45,46, Daniel M. Staines6, Delphine Steinbach35, Kevin Stone33, Elia Stupka4, Jon W. Teague1, Abu Z.
    [Show full text]