The EMBL-EBI Search and Sequence Analysis Tools Apis in 2019
Total Page:16
File Type:pdf, Size:1020Kb
W636–W641 Nucleic Acids Research, 2019, Vol. 47, Web Server issue Published online 12 April 2019 doi: 10.1093/nar/gkz268 The EMBL-EBI search and sequence analysis tools APIs in 2019 Fabio´ Madeira†, Young mi Park†, Joon Lee, Nicola Buso, Tamer Gur, Nandana Madhusoodanan, Prasad Basutkar, Adrian R.N. Tivey, Simon C. Potter , Robert D. Finn and Rodrigo Lopez* European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Received January 31, 2019; Revised March 22, 2019; Editorial Decision April 01, 2019; Accepted April 03, 2019 ABSTRACT lic data resources focus on a specific data type, but often value arises from the interconnectivity between multiple re- The EMBL-EBI provides free access to popular sources and datasets (1). Cross-referencing between data re- bioinformatics sequence analysis applications as sources combined with expert curation and knowledge has well as to a full-featured text search engine with the potential to enhance the value of the existing data. An- powerful cross-referencing and data retrieval capa- other way of improving the value of these data is to develop bilities. Access to these services is provided via novel ways to use bioinformatics applications that take ad- user-friendly web interfaces and via established vantage of these cross-references and enrich tool output and RESTful and SOAP Web Services APIs (https: analytical results. Good examples of such applications are //www.ebi.ac.uk/seqdb/confluence/display/JDSAT/ sequence similarity search suites such as NCBI BLAST+ EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). (2), FASTA (3), HMMER3 (4), as well as functional clas- Both systems have been developed with the same sification and annotation programs such as InterProScan5 (5), which allow large collections of sequence data to be core principles that allow them to integrate an ever- searched. The EMBL-EBI has devoted a lot of effort to de- increasing volume of biological data, making them velop two Web Service API-centred frameworks, Job Dis- an integral part of many popular data resources patcher (6) and EBI Search (7), for providing access to (i) provided at the EMBL-EBI. Here, we describe the sequence analysis tools and to (ii) a free text search and latest improvements made to the frameworks which powerful cross-referencing engine, respectively. Here, we de- enhance the interconnectivity between public EMBL- scribe the various enhancements made recently to these ser- EBI resources and ultimately enhance biological vices. Additionally, examples of integration between the re- data discoverability, accessibility, interoperability sources are provided, which aim to demonstrate the ‘Soft- and reusability. ware as a Service’ (SaaS) capabilities of the APIs. INTRODUCTION The frameworks With the advent of many technological developments, the The Job Dispatcher framework (6) provides reliable Bioin- volume and types of data generated in the life sciences have formatics Sequence Analysis Web Services. Tools run- greatly expanded over the last years. High-throughput data ning under it comprise core bioinformatics analytical tools is increasingly not only generated in fields ranging from and nucleotide and protein sequence databases hosted at proteomics, metabolomics and metagenomics, but also in EMBL-EBI via three fundamental interfaces: web browser, those such as structural biology, traditionally viewed as RESTful and SOAP Web Services. The framework has three low-throughput. This puts pressure on the data custodians main components: (i) a tools configuration module; (ii) a that are not only expected to provide continuous access to cluster scheduling interface that communicates with a high- thedata,butalsotodosoinalternativeformatsandover performance computer (HPC) queuing system and (iii) a new access methods. Application Programming Interfaces rendering module for displaying enriched results. Extensive (APIs) are increasingly used to improve the way in which bi- validation routines are built into the web service to ensure ological data are consumed and integrated into third-party input parameter values and input data types sent to the systems. Despite these efforts, the reality is that most pub- tools are valid and appropriate. Similarly, tool outputs are *To whom correspondence should be addressed. Tel: +44 1223 494 423; Fax: +44 1223 494 468; Email: [email protected] †The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors. C The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. Nucleic Acids Research, 2019, Vol. 47, Web Server issue W637 examined to confirm tool execution, detect errors and pro- terface itself is a client of its own API and provides a good duce human-readable reports. In many cases, visual rep- example of its capabilities. resentations of tool results are provided to help the user The EBI Search engine indexes cross-references between understand the job output both interactively using a web entries in different domains. This is now more visible to browser or programmatically. users by displaying the number of, and links to the cross- EBI Search (7) is available from https://www.ebi.ac.uk/ references available. The implementation of the new visual- ebisearch and is a fast and scalable search engine imple- ization (see Figure 1) is the result of a collaboration with the mented with the Apache Lucene framework (https://lucene. user experience (UX) teams at the EMBL-EBI. apache.org) that provides easy and uniform access to pub- During 2017, EBI Search was involved in the Thor lic biological data resources hosted at the EMBL-EBI. The project (https://project-thor.eu/), with the aim of establish- search engine indexes freely available data resources in var- ing associations between scientists and the datasets they ious formats such as XML, JSON and plain text from have worked on. EBI Search is now indexing and showing EMBL-EBI and provides text searching as well as other these via a User Interface (UI) that scientists can use to useful functions for query management, result filtering and claim datasets into their ORCID (https://orcid.org/). Fur- cross-reference exploration through its RESTful API. This ther information on this service is available from: https: powerful API allows users to create a complex view of //www.ebi.ac.uk/ebisearch/orcidclaimdocumentation.ebi. data resources, helping them find answers to their biolog- ical questions. EBI Search is also accessible via a web in- Updates on the frameworks and APIs terface developed with the Angular JavaScript framework (https://angular.io/). Job Dispatcher was improved with the addition of Swag- ger OpenAPI UI , available at https://www.ebi.ac.uk/Tools/ common/tools/help. This UI covers all of the bioinformat- Bioinformatics tools as a service ics tools provided by Job Dispatcher and allows users to Job Dispatcher Web Services have been integrated into mul- explore the API endpoints, as well as to explore valid pa- tiple EMBL-EBI resources. Examples of tools integration rameters and options that each bioinformatics application are the NCBI BLAST+ services integrated into the web- expects. Collaboration with UX experts has led to im- sites of UniProtKB (8), ENA (9) and Ensembl Genomes proved user experience when navigating through the Job (10). Similar integration is done with SSEARCH (3)aspart Dispatcher web pages and results. Work was carried out to of services offered by the PDBe (11). One of the biggest improve the way tool results are displayed and how they users of the framework is InterPro (12) whose InterProScan can be integrated into workflows. Notably, the framework 5(5) protein sequence scanning against InterPro databases now provides a convenient set of example inputs, as well is powered by Job Dispatcher. A new integration for this up- as outputs, for all bioinformatics tools provided. A new set date is batch HMMER3 (4) phmmer, hmmscan and hmm- of auto-generated REST clients in Python, Perl and Java search analysis jobs, which are similarly powered by the have been introduced, which are actively maintained (https: Job Dispatcher Web Services. Dbfetch (http://www.ebi.ac. //github.com/ebi-wp/webservice-clients). These clients are uk/Tools/dbfetch) is a service that allows a variety of data, available for local installation, but a Docker (https://www. including sequence data, to be retrieved in a variety of for- docker.com/) container is also provided so that the clients mats using a unique endpoint and a consistent API. It pro- can be reproducibly run in a variety of operating sys- vides a web interface and programmatic interface, acting tems and environments. The Perl and Python clients in- as a central support service for other EMBL-EBI services. clude a feature that allow sequence search analysis (e.g. For example, Job Dispatcher and SIFTS sequence-structure NCBI BLAST+, FASTA and HMMER3) to be submit- mapping provided by the PDBe (13), which also highlights ted in batches, from an input that is composed by mul- the interconnectivity between data resources and the appli- tiple sequences in fasta format (using the optional flag ‘– cations. multifasta’). Common Workflow Language (CWL)16 ( )de- scriptions, as well as sample CWL workflows, have also been made available (https://github.com/ebi-wp/webservice-cwl) EBI Search as a service and can be used to ease the integration of the Web Services The capability of ‘EBI Search as a Service’ implemented in into workflows. Dbfetch now allows Cross-Origin Resource the EBI Search API has been used by more than 15 projects Sharing (CORS) from all origins, which enables usage of the in EMBL-EBI (https://www.ebi.ac.uk/ebisearch/overview.