D4.1 Provider Agnostic Interface Definition & Mapping Cycle

Ref. Ares(2018)3067143 - 11/06/2018

Multi-cloud Execution-ware for Large-scale Optimised Data-Intensive Computing
H2020-ICT-2016-2017
Leadership in Enabling and Industrial Technologies; Information and Communication Technologies

Grant Agreement No.: 731664
Duration: 1 December 2016 - 30 November 2019
www.melodic.cloud
Deliverable reference: D4.1
Date: 09 April 2019
Responsible partner: UULM
Editor(s): Daniel Baur
Author(s): Daniel Baur, Daniel Seybold
Approved by: Ernst Gunnar Gran
ISBN number: N/A
Document URL: http://www.melodic.cloud/deliverables/D4.1 Provider agnostic interface definition & mapping cycle

Title: Provider agnostic interface definition & mapping cycle

Executive summary:
This deliverable presents the Executionware component of the Melodic project. The tasks of the Executionware are: (a) the allocation of resources from a heterogeneous multi-cloud environment, (b) the usage of those resources to deploy and run (data processing) tasks and (c) monitoring the runtime context of the running tasks.

This document focuses on the provider agnostic interface used to abstract syntactic and semantic differences in the cloud providers’ APIs and the required mapping to translate the agnostic interface to concrete implementations on the providers’ side. In addition, it presents a first draft of the resource management layer, focusing on resource advertisement to Melodic’s Upperware. Finally, the deliverable gives an outlook for a refined resource management layer and the data processing layer that will span on top of it.

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731664.

Document
Period Covered: M1-16
Deliverable No.: D4.1
Deliverable Title: Provider agnostic interface definition & mapping cycle
Editor(s): Daniel Baur
Author(s): Daniel Baur, Daniel Seybold
Reviewer(s): Gregoris Mentzas, Marcin Prusiński
Work Package No.: 4
Work Package Title: Executionware
Lead Beneficiary: Ulm University
Distribution: PU
Version: 1.0
Draft/Final: Final
Total No. of Pages: 36

Table of Contents
1 Introduction
  1.1 Scope of the document
  1.2 Structure of the document
2 Related Work
  2.1 IaaS Mapping
  2.2 PaaS Mapping
  2.3 Cross-Level Mapping
  2.4 Resource Management
3 Features
  3.1 Provider agnostic interface & mapping
    IaaS
    PaaS
  3.2 Job Description
  3.3 Resource Management
    Resource Advertisement
    Matchmaking / Scheduling
    Resource Allocation
  3.4 Deployment
  3.5 Monitoring
  3.6 Adaptation
4 Architecture
5 Implementation
6 Integration and Documentation
  6.1 Integration
  6.2 Documentation
7 Future Work
  7.1 Resource Management
  7.2 Deployment
  7.3 Adaptation
  7.4 Data Processing Layer
8 Conclusion
Bibliography

List of Figures
Figure 1: Melodic Architecture [1]
Figure 2: ComputeService and DiscoveryService Interface
Figure 3: Discovery Class Model
Figure 4: PlatformService Interface and Platform Entities
Figure 5: Job Description Framework
Figure 6: Requirement
Figure 7: Monitoring Framework
Figure 8: Monitoring Class Diagram
Figure 9: Cloudiator Architecture
Figure 10: Cloudiator Integration Tools & Workflow

List of Tables
Table 1: IaaS Compute Entities
Table 2: Supported Cloud Providers
Table 3: PaaS Entities