Improving Innovation and Entrepreneurship Competences of Iranian Higher Education Graduates through Data Analytics

Module 5: Distributed Systems

Dr. Atakan Aral, Vienna University of Technology
February 8, 2018

Evolution of Distributed Systems

1960s: Mainframe systems
1970s: Client / Server
1980s: Clusters
1990s: Grids
2000s: Clouds
2010s: Microservices, Edge / Fog

Why Distributed Systems?

 Performance
 Redundancy / Fault Tolerance / Availability
 Scalability / Flexibility
 Resource sharing / Utilization
 Economics
 Accessibility
 Mobility

Issues with Distributed Systems

 Security
 Overloading / Load balancing
 Synchronization
 Management
 More components that can fail (e.g. the network)

Module Outline

 Cloud Computing (Yesterday 09:00-12:15)
 Use cases
 Example problems and solutions
 Distributed Systems (Today 09:00-17:00)
 Technologies from industry
 Hands-on session

Module Outline

 Server virtualization, including hypervisors
 Network virtualization
 Cloud OSs
 SLAs and markets, managing market liquidity
 SLA management and negotiation
 Practical session: Distributed system modeling with runway

High Performance Computing

Example applications: aerodynamic simulation, stock exchange simulation, discovery of new galaxies, modeling and simulation of meso-scale weather phenomena, preoperative surgery planning.

Parallel Processing – does it really work?

1 worker = 1000 days
2 workers = 500 days
...
1000 workers = 1 day?! (see the Amdahl's law sketch below)
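The "1000 workers = 1 day" intuition only holds for perfectly parallel work. A minimal sketch of Amdahl's law (the 5% serial fraction is an assumed illustrative value, not taken from the slides) shows how even a small sequential part caps the speedup:

```python
def amdahl_speedup(workers: int, serial_fraction: float) -> float:
    """Amdahl's law: speedup with `workers` processors when a fixed
    fraction of the job cannot be parallelized."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / workers)

total_days = 1000   # the 1000-day job from the slide
serial = 0.05       # assumed: 5% of the work is inherently sequential

for n in (1, 2, 1000):
    days = total_days / amdahl_speedup(n, serial)
    print(f"{n:5d} workers -> ~{days:6.1f} days")
# 1000 workers finish in roughly 51 days, not 1 day, because of the serial 5%.
```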

HPC - Infrastructures

Supercomputer
• Custom processor
• Tightly coupled

Cluster
• COTS components
• Loosely coupled
• Beowulf clusters

Grids
• Interconnection of computational resources across different administrative domains
• Virtual organizations (VO)

How sustainable is LSDC (large-scale distributed computing)?

Cluster and HPC Computing
• Tightly coupled
• Homogeneous
• Single System Image
• Single administration

Distributed Computing
• Loosely coupled
• Heterogeneous

Grid Computing
• Large scale
• Cross-organizational
• Geographical distribution
• Distributed management

Big Data
• NoSQL DBs
• Real-time processing
• Distributed queries

After 2013: Cloud Computing
• Provisioned on demand
• Service guarantee
• VMs and Web 2.0-based

Sky Computing, Galaxy Computing, ultra-scale data?? – well, we are ready!

Sequential Processing

Memory  Store program and data in memory

fetch execute  CPU get instructions and data from memory CPU  Decode instruction

 Execute it sequentially Supercomputer - Architectures
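A toy interpreter (a sketch with an invented three-instruction set, not tied to any real CPU) makes the sequential fetch-decode-execute cycle concrete:

```python
# Toy von Neumann machine: program and data share one memory.
memory = {
    0: ("LOAD", 100),   # acc <- mem[100]
    1: ("ADD", 101),    # acc <- acc + mem[101]
    2: ("STORE", 102),  # mem[102] <- acc
    3: ("HALT", None),
    100: 40, 101: 2, 102: 0,
}

pc, acc = 0, 0
while True:
    op, arg = memory[pc]        # fetch (and decode) one instruction
    pc += 1
    if op == "LOAD":            # execute, strictly one instruction at a time
        acc = memory[arg]
    elif op == "ADD":
        acc += memory[arg]
    elif op == "STORE":
        memory[arg] = acc
    elif op == "HALT":
        break

print(memory[102])  # 42
```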

Supercomputer - Architectures

 Vector Processor: single operation on multiple data
 Scalar Processor: single operation on single data

Flynn's taxonomy (1966):
 SISD – Single Instruction, Single Data (e.g. a sequential computer)
 SIMD – Single Instruction, Multiple Data (e.g. vector computing)
 MISD – Multiple Instruction, Single Data (doesn't really exist)
 MIMD – Multiple Instruction, Multiple Data (e.g. clusters, nowadays multicores)

Memory architecture (I)

Shared memory:
 Multiple CPUs can operate independently
 Changes in memory are visible to all other CPUs
 Uniform memory access: Symmetric Multiprocessor (SMP)
 Non-uniform memory access (NUMA)

Memory Architecture (II)

Distributed memory:
 Communication network required
 Processors have their own memory
 No global address space

Memory Architecture (III)

 Hybrid distributed-shared memory
 Used in most parallel computers today
 Cache-coherent SMP nodes
 Distributed memory across multiple SMP nodes

Parallel Programming

 Shared Memory
 e.g. OpenMP
 Threads
 Message Passing ("distributed computing")
 e.g. Message Passing Interface (MPI): send(data), receive(data) (see the sketch after this list)
 Hybrid approaches
 e.g. OpenMP and MPI
 MPI and POSIX Threads
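A minimal message-passing sketch using mpi4py (assuming the mpi4py package and an MPI runtime are installed; run with e.g. `mpirun -np 2 python example.py`), mirroring the send(data)/receive(data) pattern on the slide:

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = {"payload": list(range(10))}
    comm.send(data, dest=1, tag=0)       # Task 0 sends its data
    print("rank 0 sent", data)
elif rank == 1:
    data = comm.recv(source=0, tag=0)    # Task 1 receives it over the network
    print("rank 1 received", data)
```

The shared-memory alternative (e.g. OpenMP threads) would instead let both tasks read and write the same data directly, with no explicit messages.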

TOP500 Ranking

10^18 floating point operations per second (1 exaFLOPS)

HPC - Who needs it?

Multi-core architectures
 Scientists?
 Or everyone?

Grid Computing

 hardware and software infrastructure that clusters and integrates high-end computers, networks, databases and scientific instruments

 virtual supercomputer

 virtual organizations

WWGrid

Source: "CSA"

Grid Computing Vision

[Figure: Grid middleware connects mobile access, supercomputers and PC clusters, workstations, data storage, sensors and experiments, visualisation, and the Internet/networks. Source: "EGEE"]

The Grid Reality – that didn't work well ...

[Figure: Grid job submission workflow: the user submits a JDL job description and input "sandbox" through the UI (with authorization & authentication) to a Resource Broker, which queries the Information Service and Replica Catalogue for dataset info, expands the JDL, and submits the job via the Job Submission Service (Globus RSL) to a Compute Element; data is staged on a Storage Element, job status flows through Logging & Book-keeping, and the output "sandbox" is returned to the user.]

Grid Computing

 Internet → sharing, distribution, and pervasive access to information

 Grid Computing → sharing, distribution, and pervasive access to computing power

 …“computational Grid is hardware and software infrastructure that provides dependable, consistent, and pervasive access to high-end computational capabilities”… Foster, Kesselman (1998)

 …“grid computing is concerned with coordinated resource sharing and problem solving in dynamic, multi-institutional virtual organizations”… Foster, Kesselman (2000)

 …"Grid
 uses standard, open, general-purpose protocols and interfaces
 coordinates resources that are NOT subject to centralized control
 delivers non-trivial qualities of service"… Foster, Kesselman (2002)

What is the Grid?

 Grid Computing
 Origin in academia
 Moving to industry, commerce, …
 Highly popular

 Compute Grids, Data Grids, Science Grids, Access Grids, Knowledge Grids, Bio Grids, Sensor Grids, Cluster Grids, Campus Grids, Tera Grids, Commodity Grids,…

 Grid Checklist:
 … coordinates resources that are not subject to centralized control …

 … using standard, open, general-purpose protocols and interfaces…

 … to deliver non-trivial qualities of service …

Success Stories

EGEE Infrastructure

Scale:
> 170 sites in 39 countries
> 17 000 CPUs
> 5 PB storage
> 10 000 concurrent jobs per day
> 60 Virtual Organizations
Source: Erwin Laure, CERN

Virtualization

Virtualization Middleware

 Virtual Machines / OS
 VMware (vSphere)
 Xen
 Middleware Management
 OpenNebula
 Eucalyptus
 Aneka Clouds
 FoSII
 Monitoring, Knowledge Management, SLA management, energy efficiency
 Programming Models
 MapReduce
 Access Management
 VieSLAF, Compliance Management, Security Issues

VMs


 VMM decouples the software from the hardware by forming a level of indirection between the software running in the virtual machine (layer above the VMM) and the hardware.

 VMM

 total mediation of all interactions between the virtual machine and underlying hardware

 allowing strong isolation between virtual machines and supporting the multiplexing of many virtual machines on a single hardware platform.

 The central design goals for VMMs:

 Compatibility

 Performance

 Simplicity

Why virtualization?

 Consolidate workloads to reduce hardware, power, and space requirements.

 Run multiple operating systems simultaneously — as an enterprise upgrade path, or to leverage the advantages of specific operating systems

 Run legacy software on newer, more reliable, and more power-efficient hardware.

 Dynamically migrate workloads to provide fault tolerance.

 Provide redundancy to support disaster recovery.

 Elasticity by means of vertical and horizontal scaling

Types of virtualization

 Software (full) virtualization
 The hypervisor "traps" machine operations that read or modify the system's status or perform input/output (I/O) operations
 Emulation of operations
 Status code consistent with the OS

 Partial virtualization, or para-virtualization
 Eliminates trapping and emulating
 Guest OS knows about the hypervisor

 Hardware-assisted virtualization
 Hardware extensions to the x86 system architecture eliminate much of the hypervisor overhead associated with trapping / emulating I/O operations
 Rapid Virtualization Indexing
 Hardware-assisted memory management

Hosted vs. Hypervisor

 A hosted architecture
 Installs and runs the virtualization layer as an application on top of an operating system and supports the broadest range of hardware configurations
 e.g. VMware Player, ACE, Workstation and Server

 A hypervisor (bare-metal) architecture
 Installs the virtualization layer directly on a clean x86-based system
 Has direct access to the hardware
 A hypervisor is more efficient than a hosted architecture and delivers greater scalability, robustness and performance
 e.g. Xen

Basic VM Techniques

 CPU Virtualization
 The basic VMM technique is direct execution: executing the virtual machine on the real machine, while letting the VMM retain ultimate control of the CPU (x86 architecture)
 Memory Virtualization
 The VMM maintains a shadow of the virtual machine's memory-management data structures
 I/O Virtualization
 Using a channel processor, the VMM safely exports I/O device access directly to the virtual machine
 e.g. vNICs

CPU Virtualization

 Basic direct execution
 Running the virtual machine's privileged (operating-system kernel) and unprivileged code in the CPU's unprivileged mode, while the VMM runs in privileged mode

 When the virtual machine attempts to perform a privileged operation, the CPU traps into the VMM, which emulates the privileged operation on the virtual machine state that the VMM manages

 Example: interrupts
 Letting a guest operating system disable interrupts would not be safe, since the VMM could not regain control of the CPU.
 Instead, the VMM traps the operation to disable interrupts and then records that interrupts were disabled for that virtual machine (a minimal sketch follows).
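A highly simplified trap-and-emulate sketch (illustrative only; the class names, operation strings, and VM state are invented for the example and are not a real VMM interface): privileged operations from the guest trap into the VMM, which updates per-VM virtual state instead of touching the real hardware:

```python
class VirtualMachineState:
    def __init__(self):
        self.interrupts_enabled = True   # virtual, per-VM interrupt flag

class VMM:
    def __init__(self):
        self.vms = {}

    def trap(self, vm_id: str, op: str) -> None:
        """Invoked when a guest executes a privileged operation."""
        state = self.vms.setdefault(vm_id, VirtualMachineState())
        if op == "disable_interrupts":
            # Do NOT disable real interrupts (the VMM must keep the CPU);
            # just record it so virtual interrupts are withheld from this VM.
            state.interrupts_enabled = False
        elif op == "enable_interrupts":
            state.interrupts_enabled = True
        else:
            raise NotImplementedError(f"unhandled privileged op: {op}")

vmm = VMM()
vmm.trap("vm1", "disable_interrupts")
print(vmm.vms["vm1"].interrupts_enabled)  # False: only vm1's virtual flag changed
```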

Challenges in Virtualizing CPUs

 Most modern CPU architectures were not designed to be virtualizable, including the popular x86 architecture.
 x86 operating systems use the POPF instruction (pop CPU flags from stack) to set and clear the interrupt-disable flag.
 When it runs in unprivileged mode, POPF does not trap; it simply ignores the changes to the interrupt flag, so direct execution techniques will not work for privileged-mode code that uses this instruction.

 x86 architecture: unprivileged instructions let the CPU access privileged state.
 Software running in the virtual machine can read the code segment register to determine the processor's current privilege level.
 A virtualizable processor would trap this instruction, and the VMM could then patch what the software running in the virtual machine sees, to reflect the virtual machine's privilege level.
 The x86, however, doesn't trap the instruction, so with direct execution, the software would see the wrong privilege level in the code segment register.

Techniques for Virtualizing CPUs

 Paravirtualization
 The VMM builder defines the virtual machine interface by replacing nonvirtualizable portions of the original instruction set with easily virtualized and more efficient equivalents.
 Although operating systems must be ported to run in a virtual machine, most normal applications run unmodified.
 Drawback: incompatibility. Any operating system run in a paravirtualized VMM must be ported to that architecture. Operating system vendors must cooperate, legacy operating systems cannot run, and existing machines cannot easily migrate into virtual machines.

 Direct execution combined with fast binary translation
 In most modern operating systems, the processor modes that run normal application programs are virtualizable and hence can run using direct execution.
 A binary translator can run privileged modes that are nonvirtualizable, patching the nonvirtualizable x86 instructions.
 The result: a high-performance virtual machine that matches the hardware and thus maintains total software compatibility.

Techniques for Virtualizing CPUs

 Hardware-assisted trapping
 Sensitive instructions are automatically trapped by the hardware
 No need for binary translation or OS modification
 A new privilege level: the hypervisor can now run at "Ring -1"

Memory Virtualization

 The shadow page table
 Lets the VMM precisely control which pages of the machine's memory are available to a virtual machine.
 When the operating system running in a virtual machine establishes a mapping in its page table, the VMM detects the change and establishes a mapping in the corresponding shadow page table entry that points to the actual page location in the hardware memory.

 When the virtual machine is executing, the hardware uses the shadow page table for memory translation, so the VMM can always control what memory each virtual machine is using (a minimal sketch follows).
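A minimal sketch of the shadow-page-table idea (plain dictionaries standing in for hardware page tables; all names and page numbers are illustrative): the guest maps virtual pages to what it believes are physical pages, and the VMM shadows each mapping with the real machine page:

```python
# guest-physical page -> real machine page, decided by the VMM
guest_to_machine = {0: 7, 1: 3, 2: 9}

guest_page_table = {}    # guest-virtual -> guest-physical (maintained by the guest OS)
shadow_page_table = {}   # guest-virtual -> machine page (maintained by VMM, used by HW)

def guest_maps(virt: int, guest_phys: int) -> None:
    """Guest OS establishes a mapping; the VMM detects it and shadows it."""
    guest_page_table[virt] = guest_phys
    shadow_page_table[virt] = guest_to_machine[guest_phys]

guest_maps(virt=0x10, guest_phys=1)
# The hardware translates through the shadow table, so the VMM stays in control:
print(shadow_page_table[0x10])  # 3, the actual machine page, not guest-physical 1
```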

Challenges in Memory Virtualization

 The VMM's virtual memory subsystem constantly controls how much memory goes to each virtual machine
 It periodically reclaims some of that memory by paging a portion of the virtual machine out to disk.

 The operating system running in the virtual machine (the guest OS), however, is likely to have much better information than the VMM's virtual memory system about which pages are good candidates for paging out.
 E.g., the guest OS might note that the process that created a page has exited, which means nothing will access the page again. The VMM, operating at the hardware level, does not see this and might wastefully page out that page.
 Solution: a balloon process that reclaims memory by "inflating" inside the guest (see the Xen part)

 The size of modern operating systems and applications (many VMs contain largely identical pages)
 Solution: content-based page sharing, as in VMware's server products

I/O Virtualization

 Rather than communicating with the device using traps into the VMM, the software in the virtual machine can directly read and write the device.

Network Virtualization

 External: Combine and divide LANs into virtual networks

 Internal: Emulate a physical network with software in a virtualized server
 Separates logical network behaviour from the underlying physical network resources

 Virtualized Network Interface Card (vNIC)
 Has its own MAC address
 Creates virtual networks between virtual machines without the traffic consuming bandwidth on the physical network
 NIC teaming allows multiple physical NICs to appear as one and to fail over transparently for virtual machines
 Virtual machines can be seamlessly relocated to different systems while keeping their existing MAC addresses
 Similar to memory virtualization, total virtual bandwidth can exceed what is physically available

Network Virtualization

 A Distributed (or Elastic) Virtual Switch functions as a single virtual switch across all associated hosts
 Allows virtual machines to maintain a consistent network configuration as they migrate across multiple hosts
 It can route traffic internally between virtual machines or link to an external network by connecting to physical Ethernet adapters

Main benefits of network virtualization:
. Better utilization of the available resources by consolidating various applications on fewer servers
. Cost-effective, because hardware NICs can be virtualized
. Reduces the time to provision networks and to deploy virtual machines and applications

Xen

Xen and the Art of Virtualization

 Drawbacks of full virtualization
 Certain supervisor instructions must be handled by the VMM for correct virtualization, but executing these with insufficient privilege fails silently rather than causing a convenient trap
 Efficiently virtualizing the x86 MMU is also difficult

→ Paravirtualization

Source: Ian Pratt, Keir Fraser, Steven Hand, Christian Limpach, and Andrew Warfield. Xen 3.0 and the art of virtualization. In Proceedings of the Linux Symposium, volume 2, Ottawa, Ontario, Canada, July 2005.

Paravirtualization

 Support for unmodified application binaries is essential, or users will not transition to Xen.

 Supporting full multi-application operating systems is important, as this allows complex server configurations to be virtualized within a single guest OS instance.

 Paravirtualization is necessary to obtain high performance and strong resource isolation on uncooperative machine architectures such as x86.
 Even on cooperative machine architectures, completely hiding the effects of resource virtualization from guest OSes risks both correctness and performance.

Source: Ian Pratt, Keir Fraser, Steven Hand, Christian Limpach, and Andrew Warfield. Xen 3.0 and the art of virtualization. In Proceedings of the Linux Symposium, volume 2, Ottawa, Ontario, Canada, July 2005.

Domains in Xen

 Dom0
 Privileged access to hardware
 Major goal: device multiplexing

 DomUs
 Guest OSes
 No direct access to hardware; there are special drivers
 But there are exceptions

 Hardware Virtual Machine (HVM)
 Since VM extensions are built into IA-32/AMD64, it is possible for Xen to run unmodified OSes

Xen – CPU Virtualization

 IA-32 has 4 rings

 AMD64 has only 2 rings – the guest OS and the applications share a security zone
 → Separation of the kernels from the applications using the hypervisor

Xen - Hypercalls

Xen – Memory Management

Shadow page tables and balloon driver

Xen – I/O

VMware

Before and After Virtualization

Before Virtualization:
• Single OS image per machine
• Software and hardware tightly coupled
• Running multiple applications on the same machine often creates conflicts
• Underutilized resources
• Inflexible and costly infrastructure

After Virtualization:
• Hardware-independence of operating system and applications
• Virtual machines can be provisioned to any system
• Can manage OS and application as a single unit by encapsulating them into virtual machines

Source: VMware

Hosted vs. Hypervisor Architecture

VMware Family

Cloud Markets

Introduction

 An important characteristic of Cloud markets is the liquidity of the traded good
 For the market to function efficiently, a sufficient number of market participants is needed
 Creating such a market with a large number of providers and consumers is far from trivial
 Resource consumers will only join if they are able to find what they need quickly
 Resource providers will only join if they can be fairly certain that their resources will be sold
 Not meeting either of these conditions will deter providers and consumers from using the market

Cloud Characteristics

 Deployment Types
 Private cloud: enterprise owned or leased, e.g. in the case of data centers, HPC centers, …
 Community cloud: shared infrastructure for a specific community
 Public cloud: sold to the public, mega-scale infrastructure, e.g. EC2, S3, …
 Hybrid cloud: composition of two or more clouds

 Delivery Models
 Cloud Software as a Service (SaaS): use the provider's applications over a network, e.g. Salesforce.com, …
 Cloud Platform as a Service (PaaS): deploy customer-created applications to a cloud, e.g. Google App Engine, Microsoft Azure, …
 Cloud Infrastructure as a Service (IaaS): rent processing, storage, network capacity, and other fundamental computing resources, e.g. EC2 – Elastic Compute Cloud, S3 – Simple Storage Service, SimpleDB, …

Source: "Effectively and Securely Using the Cloud Computing Paradigm", Peter Mell, Tim Grance, NIST, Information Technology Laboratory

Cloud-enabling Technologies

Primary Technologies
 Virtualization
 Grid technology
 SOA
 Distributed Computing
 Broadband Networks
 Browser as a platform
 Free and Open Source Software

Other Technologies
 Autonomic Systems
 Web 2.0
 Web application frameworks
 SLAs

Source: "Effectively and Securely Using the Cloud Computing Paradigm", Peter Mell, Tim Grance, NIST, Information Technology Laboratory

Problems when providing virtual goods

 The use of virtualization enables providers to create a wide range of resource types and allows consumers to specify their needs precisely
 If the resource variability on both sides is large, consumers and providers will not meet, since their offers may differ slightly

 We demonstrate the problems caused by a large number of resource definitions through simulation
 We then introduce an approach, based on SLA mappings, which ensures sufficient liquidity in the market

State-of-the-Art: Resource Markets in Research

 The research into resource markets can be divided into two groups, based on how they attempt to describe the tradable good

 The first group consists largely of Grid market designs that did not define goods clearly:
 E.g. GRACE developed a market architecture for Grid markets and outlined a market mechanism
 E.g. the SORMA project focused more on fairness and efficient resource allocation; it also identified several requirements for open Grid markets (allocative efficiency, computational tractability, individual rationality, etc.)

State-of-the-Art: Resource Markets in Research

 The second group has simplified the computing resource good by focusing on only one aspect of it
 In MACE, the importance of developing a definition for the tradable good was recognized and an abstraction was developed. The liquidity of goods, and the likelihood that consumers and providers with common offers can meet, was not addressed
 The Popcorn market only traded Java operations, which simplified the matching between consumers and providers
 The Spawn market was envisioned to work with CPU time slices, which makes the matching of demand and supply trivial but forces consumers to determine the number of required CPU cycles
 Neither group discusses the liquidity of goods in Cloud computing markets

State-of-the-Art: Commercial Resource Providers

 In recent years, a large number of commercial Cloud providers have entered the utility computing market, offering a number of different types of services
 Resource providers who only provide computing resources (e.g. Amazon, Tsunamic Technologies)
 SaaS providers who sell their own resources together with their own software services (e.g. Google Apps, Salesforce.com)
 Companies that attempt a mixed approach, i.e. they allow users to create their own services but, at the same time, offer their own services (Sun N1 Grid, Microsoft Azure)

 In the current market, providers only sell a single type of resource (with the exception of Amazon)

 This limited number of different resource types enables market creation, since all demand is channeled towards very few resource types

Liquidity Problems in Markets

 If an open Cloud market is created, in which resource specifications are left to the trader, would such a market be able to match providers and consumers?

 We have simulated this in a double auction market environment with varying numbers of resource types and traders

 The matching probability was used as a measure to determine how attractive a market would be to providers and consumers

Liquidity Problems in Markets

 With 5000 resource types, about 40,000 traders would be needed to achieve a matching probability of 75%

 With 10,000 resource types, about 46,000 traders would be needed to achieve a matching probability of 75%

 With 276,480 resource types, about 33 million traders would be needed to achieve a matching probability of 75% (a rough simulation sketch of the intuition follows)
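The numbers above come from the authors' own simulation; the following is only a rough Monte Carlo sketch of the underlying intuition (uniform random resource types, a match when a buyer's requested type is offered by some seller; all parameters are illustrative and this is not the study's actual double-auction model):

```python
import random

def match_probability(n_types: int, n_traders: int, runs: int = 500) -> float:
    """Chance that a buyer asking for one specific resource type finds at
    least one seller offering exactly that type (types drawn uniformly)."""
    sellers = n_traders // 2
    hits = 0
    for _ in range(runs):
        offered = {random.randrange(n_types) for _ in range(sellers)}
        if random.randrange(n_types) in offered:
            hits += 1
    return hits / runs

for n_types in (50, 5000, 10000):
    p = match_probability(n_types, n_traders=10_000)
    print(f"{n_types:6d} resource types, 10,000 traders -> match probability ~{p:.2f}")
# The more freely traders can define resource types, the thinner supply and
# demand are spread, and the more traders are needed before offers meet.
```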

The Challenge

 Based on the analysis of the liquidity problems in markets, we are faced with an interesting research challenge:
 On the one hand, to fully exploit the potential of open markets, a large number of providers and consumers is necessary
 On the other hand, the large number of potential traders might inflate the variety of resources, which leads to the problem that supply and demand are spread across a wide range of resources

 To give traders few restrictions, an approach is needed which allows traders to define their resources (or requirements) freely while facilitating SLA matching

Importance of SLAs in markets

 Current adaptive SLA matching mechanisms are based on OWL, DAML-S, and other semantic technologies
 However, none of these approaches address the issues of the open market
 In most existing approaches, provider and consumer either have to agree on specific ontologies or have to belong to a specific portal
 None of the approaches deal with the semi-automatic definition of SLA mappings enabling negotiations between inconsistent SLA templates

Heterogeneity of Clouds

[Figure: A consumer (client middleware with SLA template A and negotiation strategy Y) and a Cloud provider (services, databases, applications, and web services with SLA template B and negotiation strategy X). How to map between different SLA templates? Solution: Service Mediation. How to map between different negotiation strategies? Solution: Negotiation Bootstrapping.]

Managing SLAs

 The figure depicts the registry responsible for the management of SLA mappings

 The registry comprises different SLA templates, each of which represents a specific application domain

 The registry works as follows (a code sketch follows the list):
 Providers assign a particular public SLA template to their service
 Next, they may assign SLA mappings
 Consumers search for services using meta-data and search terms
 After finding appropriate services, each consumer may define mappings
 Public SLA templates should be updated frequently to reflect the actual SLAs used
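A toy sketch of the SLA-mapping idea (the dictionaries, attribute names, and matching rule are invented for illustration; the real registry, e.g. in VieSLAF, is far richer): consumer and provider keep their local templates and register mappings onto a shared public template so the two sides can be compared:

```python
# Public SLA template the registry publishes for a domain (illustrative attributes).
PUBLIC_TEMPLATE = {"availability": "percent", "storage": "GB", "price": "EUR/hour"}

# Each party maps its local attribute names onto the public template.
provider_mapping = {"uptime": "availability", "disk_space": "storage", "cost": "price"}
consumer_mapping = {"availability": "availability", "capacity": "storage", "budget": "price"}

def to_public(local_sla: dict, mapping: dict) -> dict:
    """Translate a local SLA into the public template's vocabulary."""
    return {mapping[k]: v for k, v in local_sla.items() if k in mapping}

provider_offer = to_public({"uptime": 99.9, "disk_space": 500, "cost": 0.10}, provider_mapping)
consumer_need = to_public({"availability": 99.5, "capacity": 400, "budget": 0.12}, consumer_mapping)

# A naive match: the offer meets or beats every requested value.
match = (provider_offer["availability"] >= consumer_need["availability"]
         and provider_offer["storage"] >= consumer_need["storage"]
         and provider_offer["price"] <= consumer_need["price"])
print(match)  # True: the mapped templates can now be negotiated against each other
```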

The SLA Template Lifecycle

1. We assume that for specific domains, specific SLA templates are generated
2. These generated SLA templates are then published in the public registry, while learning functions for the adaptation of these public SLA templates are defined
3. SLA mappings are defined manually by users
4. Automatic mapping of SLAs
5. Based on the learning function and the submitted SLA mappings, a new version of the SLA template can be defined and published in the registry

Autonomic Process

[Figure: QoS example. An Autonomic Manager runs a loop of monitoring, analysis, planning, and execution over a shared Knowledge base, connected to the managed Service through a sensor and an actuator. QoS metrics, protocols, metric compositions, and mapping strategies feed analysis and planning; evaluation and negotiation are carried out using the VieSLAF framework.]

Autonomic Process for MN and SM

[Figure: The same loop applied to meta-negotiation (MN, negotiation bootstrapping) and service mediation (SM, SLA mappings). Prerequisite: definition and publication of a meta-negotiation document. Monitoring: execution of meta-negotiation / detection of SLA inconsistencies. Analysis: evaluation of existing bootstrapping strategies / evaluation of existing SLA mappings. Planning: application of existing and definition of new bootstrapping strategies / SLA mappings. Execution: execution of bootstrapping / application of SLA mappings to fulfill successful SLA contracting.]
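A compact sketch of the MAPE-K loop that both figures describe (the phase names follow the slides; the concrete checks, attribute names, and placeholder mapping strategy are invented for the example):

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    sla_mappings: dict = field(default_factory=dict)
    inconsistencies: list = field(default_factory=list)

def monitor(knowledge: Knowledge, observed_sla: dict, public_template: dict) -> None:
    # Detection of SLA inconsistencies between the observed SLA and the template.
    knowledge.inconsistencies = [k for k in observed_sla if k not in public_template]

def analyse(knowledge: Knowledge) -> list:
    # Evaluate which inconsistencies are not yet covered by existing SLA mappings.
    return [k for k in knowledge.inconsistencies if k not in knowledge.sla_mappings]

def plan(unmapped: list) -> dict:
    # Define new SLA mappings for unmapped attributes (placeholder strategy).
    return {k: f"map_{k}_to_public_term" for k in unmapped}

def execute(knowledge: Knowledge, new_mappings: dict) -> None:
    # Apply the SLA mappings to fulfill successful SLA contracting.
    knowledge.sla_mappings.update(new_mappings)

knowledge = Knowledge()
monitor(knowledge, observed_sla={"uptime": 99.9}, public_template={"availability": "percent"})
execute(knowledge, plan(analyse(knowledge)))
print(knowledge.sla_mappings)  # {'uptime': 'map_uptime_to_public_term'}
```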

Lifecycle of a self-manageable Cloud Service

[Figure: Meta-negotiation → negotiation → execution → post-processing, with self-management spanning all phases.]

Architecture for Self-manageable Cloud Services

[Figure: An Autonomic Manager (monitoring, analysis, planning, and execution over a Knowledge base) is connected through sensors and actuators to services 1..n, each exposing a self-management interface, a negotiation interface, and a job management interface.]

Auction Overview

 In an auction, a seller offers an item for sale, but does not establish a price

 Stakeholders:
 Bidders (i.e., potential buyers)
 Sellers
 Intermediaries

 Shill bidders place bids on behalf of the seller to artificially inflate the price of an item

Auction Models

Public  Auction Types Template X  English Auction (1) define mapping (1) define mapping  Dutch Auction (2) send bids  Reverse English Auction Local Local  Alternate Offers Template Template A (n) match B Auction

 Automated Definition of SLA mappings Application of SLA mappings auctions based on agent models Dutch Auctions

82

 Dutch auctions (i.e., descending-price auctions)

 Form of open auction in which bidding starts at a high price and drops until a bidder accepts the price

 Dutch auctions often are better for the seller

 An effective means of moving large numbers of commodity items quickly (a minimal descending-price sketch follows)
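A minimal descending-price (Dutch) auction sketch (the starting price, decrement, and bidders' limits are invented example values):

```python
def dutch_auction(start_price: float, decrement: float, bidder_limits: dict) -> tuple:
    """Drop the price until some bidder's limit is reached; first acceptance wins."""
    price = start_price
    while price > 0:
        for bidder, limit in bidder_limits.items():
            if price <= limit:      # first bidder willing to pay the current price
                return bidder, price
        price -= decrement          # nobody accepted, lower the price
    return None, 0.0

winner, price = dutch_auction(
    start_price=100.0,
    decrement=5.0,
    bidder_limits={"alice": 72.0, "bob": 80.0, "carol": 65.0},
)
print(winner, price)  # bob wins at 80.0: the bidder with the highest limit accepts first
```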

Posted Price Mechanism

[Figure: Both traders (1) define mappings from their local templates A and B to the public template X, the seller (2) posts a price, and (3) offers are matched (definition and application of SLA mappings).]

Negotiated Price

[Figure: Both traders (1) define mappings from their local templates A and B to the public template X (definition and application of SLA mappings); they then (2) send metadata, (3) make an offer, and (4) acknowledge it.]

Practical Session (runway)

Contact: Atakan Aral, PhD
Institute of Information Systems Engineering
Vienna University of Technology
[email protected]
http://rucon.ec.tuwien.ac.at

These slides are mostly adapted from course materials by Univ. Prof. Dr. Ivona Brandic for the 184.271 Large-scale Distributed Computing course at TU Wien.