Cloud Computing with e-Science Applications

Information Technology

The amount of data in everyday life has been exploding. This data increase has been especially significant in scientific fields, where substantial amounts of data must be captured, communicated, aggregated, stored, and analyzed. Cloud Computing with e-Science Applications explains how cloud computing can improve data management in data-heavy fields such as bioinformatics, earth science, and computer science.

The book begins with an overview of cloud models supplied by the National Institute of Standards and Technology (NIST), and then:
• Discusses the challenges imposed by big data on scientific data infrastructures, including security and trust issues
• Covers vulnerabilities such as data theft or loss, privacy concerns, infected applications, threats in virtualization, and cross-virtual machine attack
• Describes the implementation of workflows in clouds, proposing an architecture composed of two layers (platform and application)
• Details infrastructure-as-a-service (IaaS), platform-as-a-service (PaaS), and software-as-a-service (SaaS) solutions based on public, private, and hybrid cloud computing models
• Demonstrates how cloud computing aids in resource control, vertical and horizontal scalability, interoperability, and adaptive scheduling

Featuring significant contributions from research centers, universities, and industries worldwide, Cloud Computing with e-Science Applications presents innovative cloud migration methodologies applicable to a variety of fields where large data sets are produced. The book provides the scientific community with an essential reference for moving applications to the cloud.
Cloud Computing with e-Science Applications

Edited by
Olivier Terzo, ISMB, Turin, Italy
Lorenzo Mossucca, ISMB, Turin, Italy

CRC Press
Boca Raton London New York

CRC Press is an imprint of the Taylor & Francis Group, an informa business

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20141212

International Standard Book Number-13: 978-1-4665-9116-5 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400.
CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at http://www.taylorandfrancis.com and the CRC Press Web site at http://www.crcpress.com

Contents

Preface
Acknowledgments
About the Editors
List of Contributors
1 Evaluation Criteria to Run Scientific Applications in the Cloud
   Eduardo Roloff, Alexandre da Silva Carissimi, and Philippe Olivier Alexandre Navaux
2 Cloud-Based Infrastructure for Data-Intensive e-Science Applications: Requirements and Architecture
   Yuri Demchenko, Canh Ngo, Paola Grosso, Cees de Laat, and Peter Membrey
3 Securing Cloud Data
   Sushmita Ruj and Rajat Saxena
4 Adaptive Execution of Scientific Workflow Applications on Clouds
   Rodrigo N. Calheiros, Henry Kasim, Terence Hung, Xiaorong Li, Sifei Lu, Long Wang, Henry Palit, Gary Lee, Tuan Ngo, and Rajkumar Buyya
5 Migrating e-Science Applications to the Cloud: Methodology and Evaluation
   Steve Strauch, Vasilios Andrikopoulos, Dimka Karastoyanova, and Karolina Vukojevic-Haupt
6 Closing the Gap between Cloud Providers and Scientific Users
   David Susa, Harold Castro, and Mario Villamizar
7 Assembling Cloud-Based Geographic Information Systems: A Pragmatic Approach Using Off-the-Shelf Components
   Muhammad Akmal, Ian Allison, and Horacio González-Vélez
8 HCloud, a Healthcare-Oriented Cloud System with Improved Efficiency in Biomedical Data Processing
   Ye Li, Chenguang He, Xiaomao Fan, Xucan Huang, and Yunpeng Cai
9 RPig: Concise Programming Framework by Integrating R with Pig for Big Data Analytics
   MingXue Wang and Sidath B. Handurukande
10 AutoDock Gateway for Molecular Docking Simulations in Cloud Systems
   Zoltán Farkas, Péter Kacsuk, Tamás Kiss, Péter Borsody, Ákos Hajnal, Ákos Balaskó, and Krisztián Karóczkai
11 SaaS Clouds Supporting Biology and Medicine
   Philip Church, Andrzej Goscinski, Adam Wong, and Zahir Tari
12 Energy-Aware Policies in Ubiquitous Computing Facilities
   Marina Zapater, Patricia Arroba, José Luis Ayala Rodrigo, Katzalin Olcoz Herrero, and José Manuel Moya Fernandez

Preface

Interest in cloud computing is growing steadily in both industry and research, driven by new challenges in data management, computational requirements, and flexibility, and by the needs of scientific communities for custom software environments and architectures.
Cloud platforms let users interact with applications remotely over the Internet, which brings several advantages for sharing data among both applications and end users. Cloud computing can supply computing power, computing infrastructure, applications, business processes, storage, and interfaces, and can deliver these services wherever and whenever they are needed.

Cloud computing provides four essential characteristics: elasticity; scalability; dynamic provisioning of applications, storage, and resources; and billing and metering of service usage in a pay-as-you-go model. This flexible management and resource optimization is also what draws major scientific communities to migrate their applications to the cloud.

Scientific applications are often based on access to large legacy data sets and application software libraries. Usually, these applications run in dedicated high performance computing (HPC) centers with a low-latency interconnection. The main cloud features, such as customized environments, flexibility, and elasticity, could provide significant benefits to these applications.

Because the amount of data is growing explosively every day, this book describes how cloud computing technology can help scientific communities such as bioinformatics, earth science, and many others, especially in domains where large data sets are produced. In a growing number of scenarios, data must be captured, communicated, aggregated, stored, and analyzed, which opens new challenges for developing data and resource management tools, such as federating cloud infrastructures and discovering services automatically.

Cloud computing has become a platform for scalable services and delivery in the field of services computing.
Our intention is to place the emphasis on scientific applications that use solutions based on the public, private, and hybrid cloud computing models, with innovative methods for data capture, storage, sharing, analysis, and visualization for the scientific algorithms needed in a variety of fields. The intended audience includes those who work in industry, as well as students, professors, and researchers in information technology, computer science, computer engineering, bioinformatics, science, and
