W636–W641 Nucleic Acids Research, 2019, Vol. 47, Web Server issue Published online 12 April 2019 doi: 10.1093/nar/gkz268 The EMBL-EBI search and sequence analysis tools APIs in 2019 Fabio´ Madeira†, Young mi Park†, Joon Lee, Nicola Buso, Tamer Gur, Nandana Madhusoodanan, Prasad Basutkar, Adrian R.N. Tivey, Simon C. Potter , Robert D. Finn and Rodrigo Lopez*
European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Received January 31, 2019; Revised March 22, 2019; Editorial Decision April 01, 2019; Accepted April 03, 2019
ABSTRACT lic data resources focus on a specific data type, but often value arises from the interconnectivity between multiple re- The EMBL-EBI provides free access to popular sources and datasets (1). Cross-referencing between data re- bioinformatics sequence analysis applications as sources combined with expert curation and knowledge has well as to a full-featured text search engine with the potential to enhance the value of the existing data. An- powerful cross-referencing and data retrieval capa- other way of improving the value of these data is to develop bilities. Access to these services is provided via novel ways to use bioinformatics applications that take ad- user-friendly web interfaces and via established vantage of these cross-references and enrich tool output and RESTful and SOAP Web Services APIs (https: analytical results. Good examples of such applications are //www.ebi.ac.uk/seqdb/confluence/display/JDSAT/ sequence similarity search suites such as NCBI BLAST+ EMBL-EBI+Web+Services+APIs+-+Data+Retrieval). (2), FASTA (3), HMMER3 (4), as well as functional clas- Both systems have been developed with the same sification and annotation programs such as InterProScan5 (5), which allow large collections of sequence data to be core principles that allow them to integrate an ever- searched. The EMBL-EBI has devoted a lot of effort to de- increasing volume of biological data, making them velop two Web Service API-centred frameworks, Job Dis- an integral part of many popular data resources patcher (6) and EBI Search (7), for providing access to (i) provided at the EMBL-EBI. Here, we describe the sequence analysis tools and to (ii) a free text search and latest improvements made to the frameworks which powerful cross-referencing engine, respectively. Here, we de- enhance the interconnectivity between public EMBL- scribe the various enhancements made recently to these ser- EBI resources and ultimately enhance biological vices. Additionally, examples of integration between the re- data discoverability, accessibility, interoperability sources are provided, which aim to demonstrate the ‘Soft- and reusability. ware as a Service’ (SaaS) capabilities of the APIs.
INTRODUCTION The frameworks With the advent of many technological developments, the The Job Dispatcher framework (6) provides reliable Bioin- volume and types of data generated in the life sciences have formatics Sequence Analysis Web Services. Tools run- greatly expanded over the last years. High-throughput data ning under it comprise core bioinformatics analytical tools is increasingly not only generated in fields ranging from and nucleotide and protein sequence databases hosted at proteomics, metabolomics and metagenomics, but also in EMBL-EBI via three fundamental interfaces: web browser, those such as structural biology, traditionally viewed as RESTful and SOAP Web Services. The framework has three low-throughput. This puts pressure on the data custodians main components: (i) a tools configuration module; (ii) a that are not only expected to provide continuous access to cluster scheduling interface that communicates with a high- thedata,butalsotodosoinalternativeformatsandover performance computer (HPC) queuing system and (iii) a new access methods. Application Programming Interfaces rendering module for displaying enriched results. Extensive (APIs) are increasingly used to improve the way in which bi- validation routines are built into the web service to ensure ological data are consumed and integrated into third-party input parameter values and input data types sent to the systems. Despite these efforts, the reality is that most pub- tools are valid and appropriate. Similarly, tool outputs are
*To whom correspondence should be addressed. Tel: +44 1223 494 423; Fax: +44 1223 494 468; Email: [email protected] †The authors wish it to be known that, in their opinion, the first two authors should be regarded as Joint First Authors.