Implementation and Evaluation of a Data Pipeline for Industrial IoT Using Apache NiFi

The Faculty of Health, Science and Technology
Computer Science

Pontus Sjöberg, Lina Vilhelmsson

Implementation and Evaluation of a Data Pipeline for Industrial IoT Using Apache NiFi

Bachelor's Project 2020:06

© 2020 The author(s) and Karlstad University

This report is submitted in partial fulfillment of the requirements for the Bachelor's degree in Computer Science. All material in this report which is not my own work has been identified, and no material is included for which a degree has previously been conferred.

Pontus Sjöberg
Lina Vilhelmsson

Approved, June 01, 2020

Advisor: Prof. Andreas Kassler
Examiner: Per Hurtig

Abstract

In the last few years, the popularity of Industrial IoT has grown considerably, and it is expected to have an impact of over 14 trillion USD on the global economy by 2030. One application of Industrial IoT is using data pipelining tools to move raw data from industrial machines to data storage, where the data can be processed by analytical instruments to help optimize industrial operations. This thesis analyzes and evaluates a data pipeline setup for Industrial IoT built with the tool Apache NiFi. A data flow setup was designed in NiFi that connected an SQL database, a file system, and a Kafka topic to a distributed file system. To evaluate the NiFi data pipeline setup, tests were conducted to see how the system performed under different workloads. The first test determined which size FlowFiles should be merged into to achieve the lowest latency; the second test examined whether data from the different data sources should be kept separate or merged together; the third test compared the NiFi setup with an alternative setup, which had a Kafka topic as an intermediary between NiFi and the endpoint. The first test showed that the lowest latency was achieved when merging FlowFiles into 10 kB files. In the second test, merging FlowFiles from all three sources gave a lower latency than keeping them separate for larger merging sizes. Finally, it was shown that there was no significant difference between the two test setups.

Acknowledgements

We want to thank our mentor at Karlstad University, Andreas Kassler, for helping us write our report and guiding us through the project. We also want to thank Erik Hallin, our mentor at Uddeholm AB, for helping and guiding us with all the different tools and software we used throughout the project, and for giving us insight into how our implementation might be used in an industrial context. Lastly, we want to thank Uddeholm AB for letting us do this project for them.

Contents

1 Introduction
2 Background
  2.1 Introduction
  2.2 Concepts
    2.2.1 Industrial Internet of Things
    2.2.2 Data Pipelining
    2.2.3 Data Streaming
  2.3 Apache Kafka
    2.3.1 Topics
    2.3.2 Cluster
    2.3.3 Producers
    2.3.4 Consumers
  2.4 Apache NiFi
    2.4.1 Primary Components
    2.4.2 Extensions
    2.4.3 Security
    2.4.4 Cluster
    2.4.5 Compatibility
  2.5 NiFi as a Producer and Consumer for Kafka
    2.5.1 MiNiFi
    2.5.2 NiFi as a Producer
    2.5.3 NiFi as a Consumer
  2.6 Related Tools
    2.6.1 Apache Airflow
    2.6.2 Apache Spark
    2.6.3 Apache Storm
    2.6.4 Azure Data Factory
    2.6.5 Logstash
3 Data Pipelining Architecture and Prototype
  3.1 Introduction
  3.2 Current Setup
  3.3 Why Bring in NiFi?
  3.4 New Pipelining Setups
    3.4.1 New Setup
    3.4.2 Alternative New Setup
  3.5 NiFi Processors Used
    3.5.1 Consuming from Kafka Topic
    3.5.2 Getting Files from File System
    3.5.3 Getting Data from MariaDB
    3.5.4 Other Processors
4 Experimental Setup
  4.1 Introduction
  4.2 Additional Software Used
    4.2.1 Apache Hadoop and HDFS
    4.2.2 MariaDB
  4.3 Compute Nodes
    4.3.1 Node 1
    4.3.2 Node 2
    4.3.3 Node 3
  4.4 Experiment Description
    4.4.1 Performance Metrics
    4.4.2 Test Descriptions
5 Results & Evaluation
  5.1 Introduction
  5.2 Results of Test 1
  5.3 Results of Test 2
  5.4 Results of Test 3
  5.5 Conclusion of Results
6 Conclusions
  6.1 Project Summary and Evaluation
  6.2 Future Work
References
A Appendix
  A.1 Python Script for Processing Kafka Messages
  A.2 SQL Script for Loading Rows into MariaDB
  A.3 Software Download Links
  A.4 Raw Data
  A.5 Pictures

List of Figures

2.1 A simplified view of the Kafka architecture
2.2 NiFi's GUI
2.3 A simplified view of the NiFi architecture
2.4 A simplified view of a NiFi cluster
3.1 The current setup at Uddeholm AB
3.2 New setup with NiFi sending data directly to HDFS
3.3 Alternative new setup with NiFi sending data to HDFS through Kafka
3.4 The processors used in NiFi for the new setup
4.1 The three compute nodes used for the experiment
4.2 NiFi data flow for the second test
4.3 NiFi data flow for the alternative new setup used for the third test
5.1 Average FlowFile latency for different merging sizes
5.2 Percentage of the total average FlowFile latency made up by the time between MariaDB and NiFi, before being sent to HDFS
5.3 Average FlowFile latency for different merging sizes
5.4 Latency distribution for different merging sizes
5.5 Average FlowFile latency for different merging sizes, comparing separate and combined merging
5.6 Latency distribution for combined and separate merging
5.7 Average throughput for the two different sources with different amounts of 1 kB sources
5.8 Average FlowFile latency for the two different setups
5.9 Latency distribution for the two different setups
A.1 Full-size version of Figure 3.4
A.2 Full-size version of Figure 4.2
A.3 Full-size version of Figure 4.3

List of Tables

4.1 Intervals for achieving different merging sizes

1 Introduction

The Industrial Internet of Things, or Industrial IoT, is a subset of the Internet of Things (IoT) specific to industrial use, covering the machine-to-machine and industrial communication parts of IoT. [1] Industrial IoT has grown considerably in the past few years, and it is expected by some to have an impact on the global economy of over 14 trillion USD by the year 2030. [2] Industrial IoT focuses on integrating and interconnecting already existing devices, whereas "consumer" IoT (e.g. smart devices) focuses more on creating new devices. An example of an Industrial IoT application is collecting large amounts of data from industrial machines and sending this data to various analytical tools, which can then optimize the industrial operations based on how the machines are currently performing.
[1, 3] One way this can be done is through data pipelining tools, which move data from one place to another. [4] In this project, we will evaluate data pipelining and data flows in the context of Industrial IoT by creating data pipelining setups in the tool Apache NiFi (or simply NiFi), and we will try to find the best way to include NiFi in an architecture where data needs to be moved from several starting points into a cloud-based file system. To evaluate this setup, we will also test the performance of the NiFi data flow under different configurations and workloads. Currently, there are very few scientific papers available that test the performance of a NiFi data flow. [5, 6] The result of this project is therefore interesting to the task provider Uddeholm AB, as they are looking into using NiFi as part of their data streaming architecture. The specific tests performed in this project are designed with Uddeholm AB in mind, to answer their questions about the performance of a NiFi data flow setup.

The disposition of the report is as follows: In Chapter 2, background on the technologies, concepts, and tools used is given. The technologies and concepts described in this chapter are Industrial IoT, data pipelining, and data streaming. The tools Apache Kafka and Apache NiFi are also described in detail in this chapter, along with shorter descriptions of some alternative data pipelining tools.
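To illustrate the data pipelining concept in code, the following is a minimal Python sketch of the merge-then-store pattern that the tests in this report evaluate: messages are consumed from a Kafka topic and buffered until roughly 10 kB of data has accumulated, after which the batch is written out as a single file. This sketch is only illustrative and is not the script from Appendix A.1; the topic name, broker address, output directory, and the kafka-python client are assumptions, and in the setups evaluated here the equivalent work is performed by NiFi processors.

import time
from pathlib import Path

from kafka import KafkaConsumer  # assumed client: pip install kafka-python

MERGE_SIZE_BYTES = 10 * 1024  # 10 kB, the best-performing merging size in Test 1
OUT_DIR = Path("/tmp/pipeline-out")  # illustrative stand-in for the HDFS endpoint
OUT_DIR.mkdir(parents=True, exist_ok=True)

# Hypothetical topic and broker; the real setup consumes from an existing topic.
consumer = KafkaConsumer(
    "sensor-data",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
)

buffer = bytearray()
for message in consumer:
    buffer.extend(message.value)  # message.value is the raw bytes payload
    if len(buffer) >= MERGE_SIZE_BYTES:
        # Flush one merged batch to the sink: the analogue of a merged FlowFile.
        (OUT_DIR / f"batch-{time.time_ns()}.bin").write_bytes(bytes(buffer))
        buffer.clear()

In NiFi, the buffering loop corresponds to a merging processor that bundles many small FlowFiles into one larger FlowFile before it is delivered to HDFS, which is the merging-size behavior examined in the first test.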