Web Service Management System for Bioinformatics Research: a Case Study. Service Oriented Computing and Applications, 5 (1)

Middlesex University Research Repository An open access repository of Middlesex University research http://eprints.mdx.ac.uk Xu, Kai ORCID: https://orcid.org/0000-0003-2242-5440, Yu, Qi, Liu, Qing, Zhang, Ji and Bouguettaya, Athman (2011) Web service management system for bioinformatics research: a case study. Service Oriented Computing and Applications, 5 (1) . pp. 1-15. ISSN 1863-2386 [Article] (doi:10.1007/s11761-011-0076-9) This version is available at: https://eprints.mdx.ac.uk/7037/ Copyright: Middlesex University Research Repository makes the University’s research available electronically. Copyright and moral rights to this work are retained by the author and/or other copyright owners unless otherwise stated. The work is supplied on the understanding that any use for commercial gain is strictly forbidden. A copy may be downloaded for personal, non-commercial, research or study without prior permission and without charge. Works, including theses and research projects, may not be reproduced in any format or medium, or extensive quotations taken from them, or their content changed in any way, without first obtaining permission in writing from the copyright holder(s). They may not be sold or exploited commercially in any format or medium without the prior written permission of the copyright holder(s). Full bibliographic details must be given when referring to, or quoting from full items including the author’s name, the title of the work, publication details where relevant (place, publisher, date), pag- ination, and for theses or dissertations the awarding institution, the degree type awarded, and the date of the award. If you believe that any material held in the repository infringes copyright law, please contact the Repository Team at Middlesex University via the following email address: [email protected] The item will be removed from the repository while any claim is being investigated. See also repository copyright: re-use policy: http://eprints.mdx.ac.uk/policies.html#copy Service Oriented Computing and Applications manuscript No. (will be inserted by the editor) Web Service Management System for Bioinformatics Research: A Case Study Kai Xu · Qi Yu · Qing Liu · Ji Zhang · Athman Bouguettaya Received: date / Accepted: date Abstract In this paper, we present a case study of 1 Introduction the design and development of a Web Service management system for bioinformatics research. The described The recent advancements of molecular biology experi- system is a prototype that provides a complete solu- mental instruments, such as microarray technology [32], tion to manage the entire life cycle of Web services in have led to rapid increase in the size and variety of avail- bioinformatics domain, which include semantic service able genomic data. Analysis of these data using compu- description, service discovery, service selection, service tational methods|the so called in silico experiments| composition, service execution, and service result pre- is becoming an integral part of modern biological stud- sentation. A challenging issue we encountered is to pro- ies. According to a recent survey, there are 1078 biolog- vide the system capability to assist users to select the ical databases [17] and over 1200 bioinformatics tools \right" service based on not only functionality but also [12] publicly available online. properties such as reliability, performance, and analy- Many of these online resources provide Web Ser- sis quality. As a solution, we used both bioinformatics vice interface, which allows easy access and integra- and service ontology to provide these two types of ser- tion of a number of services. Significant progress has vice descriptions. A service selection algorithm based been made towards building integration platforms that on skyline query algorithm is proposed to provide users utilize these resources to support bioinformatics anal- with a short list of candidates of the \best" service. yses. A common feature of these platforms is provid- The evaluation results demonstrate the efficiency and ing searching facilities to help users identify required scalability of the service selection algorithm. Finally, data and analysis services, and then compose them into the important lessons we learned are summarized and workflows to perform complex analysis. Taverna [26] remaining challenging issues are discussed as possible is such a platfrom widely used in the bioinformatics future research directions. community, whereas Kepler [2] and Triana [24] are two popular platforms in the wider scientific community. Keywords Web Service · Bioinformatics · Ontology · Recently there is a trend to extend such platforms Workflow to support semantic Web Services, as more semanti- cally linked data repositories become available (with the Linked Data [8] being a prominent example). Semantic Kai Xu was at CSIRO, Australia and is currently at the Middle- annotation provides richer information than Web Ser- sex University, UK. vice description alone and can be used for automatic E-mail: [email protected] Qi Yu is at the Rochester Institute of Technology, USA. reasoning when they conform to certain ontology. The E-mail: [email protected] semantic support can be added through plug-in (such as Qing Liu and Athman Bouguettaya are at CSIRO, Australia. the Feta plug-in [23] for Taverna) or expanding the ex- E-mail: fQ.Liu, [email protected] isting system (such as the semantic \Tagging" function Ji Zhang was at CSIRO, Australia and is currently at the Uni- versity of Southern Queensland, Australia. in Kepler [1]). Considerable work has also been done E-mail: [email protected] to develop bioinformatics-related ontology, which provides a language of describing bioinformatics services. 2 Examples include the ontology from myGrid [43] and { BSMS has an all-service architecture, i.e., all system BioMoby [42]. Finally, there are Web Service registries components are services. This allows easy change for listing available bioinformatics services. Such reg- of individual components and addition of new ones istries usually provide semantic description and thus to adapt to other disease studies if required. Al- more powerful searching functionality. Registries like though the current system only has the components Moby Central [42] and the more recent BioCatalogue required for colorectal cancer research, it can be eas- [7] also monitor service availability and automatically ily extended. remove unresponsive ones. { The whole system is based on a Semantic Web Ser- A biological research group within our organization vice framework (WSMO [30]). As a result, all system is working on the genetic causes of colorectal cancer, components provide native semantic support. Both and they conduct complex analysis procedures as de- bioinformatics and service ontology are included to scribed before on a daily basis. Currently, such analysis provide functionality based service discovery. is done manually, which is time consuming and error { Considering the potentially large number of services, prone. The existing systems for bioinformatics research there may be functionality overlaps between dif- are mainly designed for computer literate users. Our ferent service providers. We believe that the non- collaborators find it overwhelming given the complex- functional properties of services, especially those re- ity of the user interface and large number of resources lated to Quality of Service (QoS), such as reliability, listed. We propose to adopt a different development performance, and analysis quality should be consid- strategy as the existing systems. Instead of building ered when several services providing similar func- a large and complex system that can accommodate a tion or information. A service selection algorithm wide range of bioinformatics research requirements, our based on skyline query is proposed to help users system is designed specifically for colorectal cancer re- identify the \best" service. Evaluation confirms the search at this stage. The flexibility of our system archi- efficiency and scalability of our algorithm. tecture allows it to easily adapt to other disease studies { An interactive workflow environment is incorporated later on if required. While there are some efforts in the to provide better usability for biologists. Only es- existing systems to help users to find the \right" ser- sential services are visible and automatically service vices (such as adding semantic annotation), they are filtering and selection are done wherever possible still in very early stage and can not meet the require- to avoid overwhelming users with large number of ments of our users. Also, we believe it is essential to choices. support semantic Web Service at a system level to fully What reported here is our attempt to address these realize its potential. In other words, all the core system challenging issues (especially the third one), and by no components should provide native support for semantic means we have solved them. It is our hope that by shar- Web Service, which is not how the existing systems are ing the experience others can learn from our lessons and designed. become aware of the research problems that require fur- This paper focuses on the design and development ther attention. of a service management system prototype using bioinformatics as a target application. In this respect, we used experiments used in gene identification of colorec- 2 Related Work tal cancer as a specific application. The goal of this work is not to provide ready-to-use tool but rather 2.1 Bioinformatics Web Services a proof-of-concept for the approach we are proposing. The proposed Bioinformatics Service Management Sys- There are a large number of Web Service-based work- tem (BSMS) provides a complete system solution to flow environments have been built to support scien- manage the entire life cycle of Web services in bioinfor- tific research [31,40]. Among them, BioMoby and my- matics domain, which include semantic service descrip- Grid are two of the most widely used bioinformatics tion, service discovery, service selection, service com- platforms. BioMoby provides a registry of bioinformat- position, service execution, and service result presen- ics services and searching functionality.

Web Service Management System for Bioinformatics Research: a Case Study. Service Oriented Computing and Applications, 5 (1)

Genome Informatics 4–8 September 2002, Wellcome Trust Genome Campus, Hinxton, Cambridge, UK

Towards a Knowledge Graph for Science

Biocuration 2016 - Posters

Gearing up to Handle the Mosaic Nature of Life in the Quest for Orthologs. Kristoffer Forslund

ISCB's Initial Reaction to the New England Journal of Medicine

The Evaluation of Ontologies: Editorial Review Vs

Challenges for Ontology Repositories and Applications to Biomedicine & Agronomy

Downloaded from [16, 18]

ACM-BCB 2016 the 7Th ACM Conference on Bioinformatics

Gaëlle GARET Classification Et Caractérisation De Familles Enzy

I S C B N E W S L E T T

Interactive Knowledge Capture in the New Millennium: How the Semantic Web Changed Everything