TCSS 562: Software Engineering for Cloud Computing [Fall 2019] School of Engineering and Technology, UW-Tacoma

TCSS 562: SOFTWARE ENGINEERING FOR CLOUD COMPUTING

Wes J. Lloyd
School of Engineering and Technology
University of Washington - Tacoma

OBJECTIVES

 Syllabus and course introduction
 Cloud Computing – How did we get here?
. Introduction to parallel and distributed systems (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)


DEMOGRAPHICS SURVEY

 Please complete the ONLINE demographics survey:
. https://forms.gle/dE4Q7Lt13rAXtahJ9
. Linked from course webpage in Canvas: http://faculty.washington.edu/wlloyd/courses/tcss562/announcements.html

TCSS562 – SOFTWARE ENGINEERING FOR CLOUD COMPUTING

 Syllabus online at: http://faculty.washington.edu/wlloyd/courses/tcss562/
 Grading
 Schedule
 Assignments


REFERENCES

 [1] Cloud Computing: Concepts, Technology and Architecture*, Thomas Erl, Prentice Hall, 2013
 [2] Cloud Computing - Theory and Practice, Dan Marinescu, First Edition 2013*, Second Edition 2018
 [3] Cloud Computing: A Hands-On Approach, Arshdeep Bahga, 2013

* - available online via UW library

REFERENCES - 2

 [4] Systems Performance: Enterprise and the Cloud*, Brendan Gregg, First Edition, 2013
 [5] AWS Administration – The Definitive Guide*, Yohan Wadia, First Edition, 2016
 Research papers

* - available online via UW library


TCSS 562 – Fall 2019

 Mondays/Wednesdays
. Lecture, midterm, quiz, activities
. No class: Monday Nov 11
. Class online: Wednesday Nov 27
 Key Topics:
. IaaS, Virtualization, Serverless computing, FaaS, Containerization
 Lab Day: Tutorials, group project work
 Midterm Wednesday October 30th (tentative)
 No Final exam
 Term Project: Build and evaluate alternate implementations of a native cloud serverless application; or group-proposed cloud research project

TCSS562 COURSE WORK

 Project Proposal
 Project Status Reports / Activities / Quiz
. ~ 2-4 total items (??)
. Variety of formats: in class, online, reading, activity
 Tutorials
 Midterm
. Open book, note, etc.
 Class Presentation
 Term Project / Paper / Presentation


CLASS PRESENTATION

 Each student will make one presentation in a team of ~3
 Technology sharing presentation
. PPT slides, demonstration
. Provide a technology overview of one cloud service offering
. Present overview of features, performance, etc.
 Cloud research paper presentation
. PPT slides; identify research contributions, strengths and weaknesses of the paper, possible areas for future work

TCSS562 TERM PROJECT

 Project description to be posted
 Teams of ~3, self-formed, one project leader
 Proposal due: Friday October 11, 11:59pm (tentative)
 Focus:
. Build a native cloud serverless application
. Compose multiple FaaS functions (services)
. Compare alternate implementations of:
. Service compositions
. Application flow control - AWS Step Functions, laptop client, etc.
. External cloud components (e.g. database, key-value store)
. How does application design impact cost and performance?


TCSS562 TERM PROJECT - 2

 Deliverables
. Demo in class at end of quarter (TBD)
. Project report paper (4-6 pgs IEEE format, template provided)
. GitHub (project source)
. How-To document (via GitHub markdown)

ALTERNATE TERM PROJECT IDEAS

 GOAL: propose a cloud development project that serves as a vehicle to compare and contrast the use of alternative cloud services
 Examples:
 Object/blob storage services
. Amazon S3, Google blobstore, Azure blobstore, vs. self-hosted
 Cloud relational database services
. Amazon Relational Database Service (RDS), Aurora, self-hosted DB
 Platform-as-a-Service (PaaS) hosting alternatives
. Amazon Elastic Beanstalk, Heroku, others
 Function-as-a-Service platforms
. Google Cloud Functions, Azure Functions, IBM Cloud Functions


TERM PROJECT IDEAS - 2

 File-based storage systems
. Amazon EBS, Amazon EFS, others
 Container orchestration services
. Amazon ECS, Azure Kubernetes Service (AKS)
 Queueing services comparison
. Amazon SQS, Amazon MQ, Apache Kafka, RabbitMQ, ZeroMQ, others
 NoSQL database services comparison
. DynamoDB, Google BigTable, MongoDB, Cassandra

TERM PROJECT: RESEARCH

 Alternative: conduct a cloud-related research project on any topic to answer a set of research questions
. Can be used to help spur MS Capstone/Thesis work
 If you're interested in this option, please talk with the instructor
 First step is to identify 1 – 2 research questions
 Instructor will help guide projects throughout the quarter
 Approval based on team preparedness to execute the project


PROJECT SUPPORT

 Project cloud infrastructure support:
 Sign up for the GitHub Student Developer Pack:
. https://education.github.com/pack
. Includes up to $150 in Amazon cloud credits
. Includes up to $100 in Microsoft Azure credits
. AWS credit extensions provided as needed
 Microsoft Azure for Students
. $100 free credit per account, valid for 1 year
. https://azure.microsoft.com/en-us/free/students/
. Also: $200 free credit option for 1 month
 Google Cloud
. $300 free credit for 1 year
. https://cloud.google.com/free/
 Chameleon / CloudLab
. Bare metal NSF cloud - free

TCSS562 TERM PROJECT OPPORTUNITIES

 Projects can lead to papers or posters presented at ACM/IEEE/USENIX conferences, workshops
. Networking and travel opportunity
. Conference participation (posters, papers) helps differentiate your resume from others
 Project can support preliminary work for UWT MS capstone/thesis project proposals
 Research projects provide valuable practicum experience with cloud systems analysis, prototyping
 Publications are key for building your resume/CV, also key if applying to PhD programs


TCSS562 TERM PROJECT - 3

 Project status report / term project check-ins
. Written status report
. 2-3 times in quarter
. Part of the "Project Status Reports / Activities / Quizzes" category
. 10% of grade
 Project meetings with instructor
. After class, end of class, office hours

OBJECTIVES

 Cloud Computing: How did we get here?
. Parallel and distributed systems (Marinescu Ch. 2 - 1st edition, Ch. 4 - 2nd edition)
. Data-level, thread-level, task-level parallelism
. Parallel architectures
. SIMD architectures, vector processing, multimedia extensions
. Graphics processing units
. Speed-up, Amdahl's Law, scaled speed-up
. Properties of distributed systems
. Modularity


CLOUD COMPUTING: HOW DID WE GET HERE?

 Moore's Law: # of transistors doubles every 18 months
 Post 2004: heat dissipation challenges - can no longer easily increase clock speed
. Overclocking to 7GHz takes more than just liquid nitrogen: https://tinyurl.com/y93s2yz2
 Solutions:
. Vary CPU clock speed
. Add CPU cores
. Multi-core technology


AMD'S 64-CORE 7NM CPUS

 Epyc Rome CPUs
 Announced August 2019
 EPYC 7H12 requires liquid cooling

HYPER-THREADING

 Modern CPUs provide multiple instruction pipelines, supporting multiple execution threads (usually 2) to feed instructions to a single CPU core…
 Two hyper-threads are not equivalent to (2) CPU cores
 Example: i7-4770 and i5-4760 - same CPU, with and without HTT
. Hyper-threads add +32.9%


CLOUD COMPUTING: HOW DID WE GET HERE? - 2

 To make computing faster, we must go "parallel"
 Difficult to expose parallelism in scientific applications
 Not every problem has a parallel solution
. Chicken and egg problem…
 Many commercial efforts promoting pure parallel programming have failed
 Enterprise computing world has been skeptical of and less involved in parallel programming

CLOUD COMPUTING: HOW DID WE GET HERE? - 3

 Cloud computing provides access to "infinite" scalable compute infrastructure on demand
 Infrastructure availability is key to exploiting parallelism
 Cloud applications
. Based on the client-server paradigm
. Thin clients leverage compute hosted on the cloud
. Applications run many web service instances
. Employ load balancing


CLOUD COMPUTING: HOW DID WE GET HERE? - 4

 Big Data requires massive amounts of compute resources
. Embarrassingly parallel: each task can run in isolation
 MAP – REDUCE
. Single instruction, multiple data (SIMD)
. Exploit data-level parallelism
 Bioinformatics example

SMITH WATERMAN USE CASE

 Applies dynamic programming to find the best local alignment of two protein sequences
. Use case for GPU acceleration
 AWS Lambda serverless computing use case:
. Goal: pair-wise comparison of all unique human protein sequences (20,336)
. Python client as scheduler
. C Striped Smith-Waterman (SSW) execution engine

From: Zhao M, Lee WP, Garrison EP, Marth GT: SSW library: an SIMD Smith-Waterman C/C++ library for use in genomic applications. PLoS One 2013, 8:e82138


SMITH WATERMAN RUNTIME

 Laptop server and client (2-core, 4-HT): 8.7 hours
 AWS Lambda FaaS, laptop as client: 2.2 minutes
. Partitions 20,336 sequences into 41 sets
. Execution cost: ~ 82¢ (~237x speed-up)
 AWS Lambda server, EC2 instance as client: 1.28 minutes
. Execution cost: ~ 87¢ (~408x speed-up; see the sketch below)
 Hardware:
. Laptop client: Intel i5-7200U 2.5 GHz: 4 HT, 2 CPU
. Cloud client: EC2 virtual machine - m5.24xlarge: 96 vCPUs
. Cloud server: Lambda ~1000 Intel E5-2666v3 2.9GHz CPUs

CLOUD COMPUTING: HOW DID WE GET HERE? - 5

 Compute clouds are large-scale distributed systems
. Heterogeneous systems
. Homogeneous systems
. Autonomous
. Self-organizing
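As a quick check, the Smith-Waterman speed-ups reported above follow directly from the runtimes; a minimal Python sketch, using only the figures from the runtime slide:

    # Derive the reported Smith-Waterman speed-ups from the runtimes above.
    laptop_min = 8.7 * 60      # laptop server and client: 8.7 hours, in minutes
    lambda_laptop_min = 2.2    # AWS Lambda FaaS, laptop as client
    lambda_ec2_min = 1.28      # AWS Lambda server, EC2 instance as client

    print(round(laptop_min / lambda_laptop_min))   # ~237x speed-up
    print(round(laptop_min / lambda_ec2_min))      # ~408x speed-up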


PARALLELISM

 Discovering parallelism and developing parallel algorithms requires considerable effort
 Example: numerical analysis problems, such as solving large systems of linear equations or systems of partial differential equations (PDEs), require algorithms based on domain decomposition methods
 How can problems be split into independent chunks?
 Fine-grained parallelism
. Only small bits of code can run in parallel without coordination
. Communication is required to synchronize state across nodes
 Coarse-grained parallelism
. Large blocks of code can run without coordination
 Avoiding coordination achieves the best speed-up

PARALLELISM - 2

 Coordination of nodes requires message passing or shared memory
 Debugging parallel message passing code is easier than parallel shared memory code
 Message passing: all of the interactions are clear (see the sketch below)
. Coordination via a specific programming API (e.g. MPI)
 Shared memory: interactions can be implicit – must read the code!!
 Processing speed is orders of magnitude faster than communication speed (CPU > memory speed)
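To make the message-passing style concrete, here is a minimal sketch using Python's multiprocessing module in place of MPI (the squared-sum task and queue layout are illustrative assumptions, not from the slides). Every interaction between the two processes is an explicit, visible message:

    from multiprocessing import Process, Queue

    def worker(in_q, out_q):
        # No shared state: the worker only sees what is sent to it
        chunk = in_q.get()                      # receive work (explicit message)
        out_q.put(sum(x * x for x in chunk))    # send the result back (explicit message)

    if __name__ == "__main__":
        in_q, out_q = Queue(), Queue()
        p = Process(target=worker, args=(in_q, out_q))
        p.start()
        in_q.put(list(range(1000)))             # all coordination is visible in the code
        print(out_q.get())
        p.join()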


TYPES OF PARALLELISM

 Parallelism:
. Goal: perform multiple operations at the same time to achieve a speed-up
 Thread-level parallelism (TLP)
. Control flow architecture
 Data-level parallelism
. Data flow architecture
 Bit-level parallelism
 Instruction-level parallelism (ILP)

THREAD LEVEL PARALLELISM (TLP)

 Number of threads an application runs at any one time
 Varies throughout program execution
 As a metric:
. Minimum: 1 thread
. Can measure average, maximum (peak)
 QUESTION: What are the consequences of average TLP for scheduling an application to run on a computer with a fixed number of CPU cores and hyper-threads?
. Let's say there are 4 cores, or 8 hyper-threads…
 Key to avoiding waste of computing resources is knowing your application's TLP…
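As a rough illustration only (a sketch; serious TLP measurement would use OS or profiler tooling), a Python program can sample its own live thread count over time, and the average and peak of those samples approximate average and peak TLP:

    import threading
    import time

    def sample_tlp(samples=20, interval=0.1):
        # Periodically sample this process's live thread count
        counts = []
        for _ in range(samples):
            counts.append(threading.active_count())
            time.sleep(interval)
        return sum(counts) / len(counts), max(counts)

    avg_tlp, peak_tlp = sample_tlp()
    # e.g., an average TLP of 2 on an 8-hyper-thread machine leaves
    # ~6 hardware thread contexts idle - wasted resources if reserved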


CONTROL-FLOW ARCHITECTURE

 By John von Neumann (1945)
 Also called the von Neumann architecture
 Dominant computer system architecture
 Program counter (PC) determines the next instruction to load
 Program execution is sequential

DATA-LEVEL PARALLELISM

 Partition data into big chunks; run separate copies of the program on them with little or no communication
 Problems are considered to be embarrassingly parallel
. Also perfectly parallel or pleasingly parallel…
 Little or no effort needed to separate the problem into a number of parallel tasks
 MapReduce programming model is an example
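A minimal sketch of data-level parallelism in the MapReduce style, using Python's multiprocessing (the word-count task and 4-way partitioning are illustrative assumptions): the data is split into big chunks and each chunk is processed by a separate copy of the same function, with no communication between copies.

    from multiprocessing import Pool

    def count_words(chunk):
        # "Map": the same program logic, applied to its own chunk of the data
        return sum(len(line.split()) for line in chunk)

    if __name__ == "__main__":
        lines = ["the quick brown fox jumps"] * 1000
        chunks = [lines[i::4] for i in range(4)]      # partition data into big chunks
        with Pool(processes=4) as pool:
            partials = pool.map(count_words, chunks)  # copies run with no communication
        print(sum(partials))                          # "Reduce": combine partial results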


DATA FLOW ARCHITECTURE

 Alternate architecture used by network routers, digital signal processors, special purpose systems
 Operations performed when input (data) becomes available
 Envisioned to provide much higher parallelism
 Multiple problems have prevented wide-scale adoption:
. Efficiently broadcasting data tokens in a massively parallel system
. Efficiently dispatching instruction tokens in a massively parallel system
. Building content addressable memory large enough to hold all of the dependencies of a real program

DATA FLOW ARCHITECTURE - 2

 Architecture not as popular as control-flow
 Modern CPUs have emulated data flow architecture for dynamic instruction scheduling since the 1990s
. Out-of-order execution – reduces CPU idle time by not blocking on instructions waiting for data
. Execution windows: identify instructions that can be run based on data dependency
. Instructions are completed in data dependency order within the execution window
. Execution window size: typically 32 to 200 instructions
 Utility of data flow architectures has been much less than envisioned


BIT-LEVEL PARALLELISM

 Computations on large words (e.g. 64-bit integers) are performed as a single instruction
 Fewer instructions are required on 64-bit CPUs to operate on larger operands (A+B), providing dramatic performance improvements
 Processors have evolved: 4-bit, 8-bit, 16-bit, 32-bit, 64-bit
 QUESTION: How many instructions are required to add two 64-bit numbers on a 16-bit CPU? (Intel 8088) (see the sketch below)
. 64-bit MAX int = 9,223,372,036,854,775,807 (signed)
. 16-bit MAX int = 32,767 (signed)
. Intel 8088 – limited to 16-bit registers

INSTRUCTION-LEVEL PARALLELISM (ILP)

 CPU pipelining architectures enable ILP
 CPUs have multi-stage processing pipelines
 Pipelining: split instructions into a sequence of steps that can execute concurrently on different CPU circuitry
 Basic RISC CPU - each instruction has 5 stages:
. IF – instruction fetch
. ID – instruction decode
. EX – instruction execution
. MEM – memory access
. WB – write back
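Returning to the bit-level parallelism quiz above: a 16-bit CPU must split each 64-bit operand into four 16-bit words and add them lowest word first, propagating the carry, so it takes at least four add(-with-carry) instructions versus one on a 64-bit CPU. A Python sketch of the idea (16-bit register behavior is simulated with masks):

    def add64_on_16bit(a, b):
        # Add two 64-bit values using only 16-bit word arithmetic with carry
        result, carry = 0, 0
        for i in range(4):                      # four 16-bit words per 64-bit operand
            wa = (a >> (16 * i)) & 0xFFFF
            wb = (b >> (16 * i)) & 0xFFFF
            s = wa + wb + carry
            carry = s >> 16                     # carry propagates to the next word
            result |= (s & 0xFFFF) << (16 * i)
        return result

    assert add64_on_16bit(2**63 - 1, 1) == 2**63   # carries ripple across all four words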


CPU PIPELINING

 RISC CPU: 5 stages per instruction (IF, ID, EX, MEM, WB)

INSTRUCTION LEVEL PARALLELISM - 2

 After 5 clock cycles, all 5 stages of an instruction are loaded
 Starting with the 6th clock cycle, one full instruction completes each cycle
 The CPU performs 5 tasks per clock cycle! Fetch, decode, execute, memory read, memory write back
 Pentium 4 (CISC CPU) – processing pipeline w/ 35 stages!
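A toy simulation of the 5-stage pipeline (a sketch): it prints which stage each instruction occupies at each clock cycle, showing the pipeline filling for 5 cycles and then retiring one instruction per cycle.

    STAGES = ["IF", "ID", "EX", "MEM", "WB"]

    def pipeline_timeline(n_instructions):
        # Instruction i enters IF at cycle i; each stage takes one cycle
        for cycle in range(n_instructions + len(STAGES) - 1):
            active = []
            for instr in range(n_instructions):
                stage = cycle - instr
                if 0 <= stage < len(STAGES):
                    active.append("i%d:%s" % (instr, STAGES[stage]))
            print("cycle %2d: %s" % (cycle + 1, "  ".join(active)))

    pipeline_timeline(7)   # from cycle 5 on, one instruction reaches WB every cycle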


MICHAEL FLYNN'S COMPUTER ARCHITECTURE TAXONOMY

 Michael Flynn's proposed taxonomy of computer architectures based on concurrent instructions and number of data streams (1966)
 SISD (Single Instruction, Single Data)
 SIMD (Single Instruction, Multiple Data)
 MIMD (Multiple Instructions, Multiple Data)
 LESS COMMON: MISD (Multiple Instructions, Single Data)
. Pipeline architectures: functional units perform different operations on the same data
. For fault tolerance, may want to execute the same instructions redundantly to detect and mask errors – for task replication

FLYNN'S TAXONOMY

 SISD (Single Instruction, Single Data)
. Scalar architecture with one processor/core
. Individual cores of modern multicore processors are "SISD"
 SIMD (Single Instruction, Multiple Data)
. Supports vector processing
. When SIMD instructions are issued, operations on individual vector components are carried out concurrently
. Two 64-element vectors can be added in parallel
. Vector processing instructions added to modern CPUs
. Example: Intel MMX (multimedia) instructions


FLYNN'S TAXONOMY - 2

 MIMD (Multiple Instructions, Multiple Data) - system with several processors and/or cores that function asynchronously and independently
 At any time, different processors/cores may execute different instructions on different data
 Multi-core CPUs are MIMD
 Processors share memory via interconnection networks
. Hypercube, 2D torus, 3D torus, omega network, other topologies
 MIMD systems have different methods of sharing memory:
. Uniform Memory Access (UMA)
. Cache Only Memory Access (COMA)
. Non-Uniform Memory Access (NUMA)

SIMD: VECTOR PROCESSING ADVANTAGES

 Exploit data-parallelism: vector operations enable speedups
 Vector architectures provide vector registers that can store entire matrices in a CPU register
 SIMD CPU extensions (e.g. MMX) add support for vector operations on traditional CPUs
 Vector operations reduce the total number of instructions for large vector operations
 Provides higher potential speedup vs. MIMD architecture
 Developers can think sequentially; not worry about parallelism
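A sketch of the vector-processing advantage using NumPy, whose array operations are typically backed by SIMD instructions on modern CPUs: one vector expression replaces the explicit element-by-element loop, and the developer still thinks sequentially.

    import numpy as np

    a = np.arange(64, dtype=np.float64)
    b = np.arange(64, dtype=np.float64)

    # Scalar (SISD) view: one add per element pair
    c_loop = np.empty_like(a)
    for i in range(len(a)):
        c_loop[i] = a[i] + b[i]

    # Vector (SIMD) view: the two 64-element vectors are added in one expression
    c_vec = a + b

    assert np.array_equal(c_loop, c_vec)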


ARITHMETIC INTENSITY

 Arithmetic intensity: ratio of work (W) to memory traffic r/w (Q)
. Example: # of floating point ops per byte of data read
 Characterizes applications with SIMD support
. SIMD can perform many fast matrix operations in parallel
 High arithmetic intensity: programs with dense matrix operations scale up nicely (many calculations vs. memory RW; supports lots of parallelism)
 Low arithmetic intensity: programs with sparse matrix operations do not scale well with problem size (memory RW becomes the bottleneck; not enough ops!)

ROOFLINE MODEL

 When a program reaches a given arithmetic intensity, performance of code running on the CPU hits a "roof"
 CPU performance bottleneck changes from memory bandwidth (left) to floating point performance (right)
 Key take-aways:
. When a program has low arithmetic intensity, memory bandwidth limits performance…
. With high arithmetic intensity, the system achieves peak parallel performance… performance is limited by??
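The roofline bound can be written as attainable FLOP/s = min(peak FLOP/s, memory bandwidth × arithmetic intensity); a minimal sketch with hypothetical machine numbers (500 GFLOP/s peak and 50 GB/s bandwidth are illustrative assumptions):

    def roofline(peak_gflops, bandwidth_gbs, intensity):
        # Attainable GFLOP/s for a kernel with the given arithmetic
        # intensity (FLOPs per byte of memory traffic)
        return min(peak_gflops, bandwidth_gbs * intensity)

    for ai in [0.5, 2, 10, 100]:
        print(ai, roofline(500, 50, ai))
    # Low intensity (< 10 FLOPs/byte here): memory bandwidth limits performance
    # High intensity: performance plateaus at the 500 GFLOP/s compute "roof",
    # i.e., limited by peak floating point performance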


GRAPHICAL PROCESSING UNITS (GPUS)

 GPU provides multiple SIMD processors
. Typically 7 to 15 SIMD processors
. Each with 32,768 total registers, divided into 16 lanes (2048 registers each)
 GPU programming model: single instruction, multiple thread (SIMT)
 Programmed using CUDA - C-like programming language by NVIDIA for GPUs
 CUDA threads – a single thread is associated with each data element (e.g. vector or matrix)
 Thousands of threads run concurrently

PARALLEL COMPUTING

 Parallel hardware and software systems allow:
. Solving problems demanding resources not available on a single system
. Reducing the time required to obtain a solution
 The speed-up (S) measures the effectiveness of parallelization:
. S(N) = T(1) / T(N)
. T(1) = execution time of the total sequential computation
. T(N) = execution time for performing N parallel computations in parallel


SPEED-UP EXAMPLE

 Consider embarrassingly parallel image processing
. Eight images (multiple data)
. Apply image transformation (greyscale) in parallel
 8-core CPU, 16 hyperthreads
 Sequential processing: perform transformations one at a time using a single program thread
. 8 images, 3 seconds each: T(1) = 24 seconds
 Parallel processing
. 8 images, 3 seconds each: T(N) = 3 seconds
 Speedup: S(N) = 24 / 3 = 8x speedup
. Called "perfect scaling"
 Must consider data transfer and computation setup time

AMDAHL'S LAW

 Portion of computation which cannot be parallelized determines the overall speedup
 For an embarrassingly parallel job of fixed size
. Assuming no overhead for distributing the work, and a perfectly even work distribution
 α: fraction of program run time which can't be parallelized (e.g. must run sequentially)
 Maximum speedup is: S = 1 / α
 Example: Consider a program where 25% cannot be parallelized
. Q: What is the maximum possible speedup of the program?
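Amdahl's law in code (a sketch answering the quiz above): with α = 0.25, the maximum possible speedup is 1 / 0.25 = 4x, no matter how many cores are added.

    def amdahl_speedup(alpha, n):
        # Speedup on n processors when fraction alpha must run sequentially
        return 1.0 / (alpha + (1.0 - alpha) / n)

    print(1 / 0.25)    # 4.0: the asymptotic maximum speedup
    for n in [2, 4, 16, 1024]:
        print(n, round(amdahl_speedup(0.25, n), 2))   # approaches, never exceeds, 4x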


GUSTAFSON'S LAW

 Calculates the scaled speed-up using "N" processors
. S(N) = N + (1 - N) α
. N: number of processors
. α: fraction of program run time which can't be parallelized (e.g. must run sequentially)
 Example: Consider a program that is embarrassingly parallel, but 25% cannot be parallelized. α = .25
 QUESTION: If deploying the job on a 2-core CPU, what scaled speedup is possible assuming the use of two processes that run in parallel?

GUSTAFSON'S EXAMPLE

 QUESTION: What is the maximum theoretical speed-up on a 2-core CPU?
. S(N) = N + (1 - N) α
. N = 2, α = .25
. S(N) = 2 + (1 - 2) × .25
. S(N) = ?
 What is the maximum theoretical speed-up on a 4-core CPU?
. S(N) = N + (1 - N) α
. N = 4, α = .25
. S(N) = 4 + (1 - 4) × .25
. S(N) = ?
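Gustafson's law in code (a sketch completing the two worked examples above):

    def gustafson_scaled_speedup(n, alpha):
        # Scaled speedup: S(N) = N + (1 - N) * alpha
        return n + (1 - n) * alpha

    print(gustafson_scaled_speedup(2, 0.25))   # 2 + (1 - 2) * 0.25 = 1.75
    print(gustafson_scaled_speedup(4, 0.25))   # 4 + (1 - 4) * 0.25 = 3.25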


MOORE'S LAW

 Transistors on a chip double approximately every 1.5 years
. CPUs now have billions of transistors
 Power dissipation issues at faster clock rates lead to heat removal challenges
. Transition from increasing clock rates to adding CPU cores
 Symmetric core processor – multi-core CPU where all cores have the same computational resources and speed
 Asymmetric core processor – on a multi-core CPU, some cores have more resources and speed
 Dynamic core processor – processing resources and speed can be dynamically configured among cores
 Observation: asymmetric processors offer a higher speedup

DISTRIBUTED SYSTEMS

 Collection of autonomous computers, connected through a network, with distribution software called "middleware" that enables coordination of activities and sharing of resources
 Key characteristics:
. Users perceive the system as a single, integrated computing facility
. Compute nodes are autonomous
. Scheduling, resource management, and security implemented by every node
. Multiple points of control and failure
. Nodes may not be accessible at all times
. System can be scaled by adding additional nodes
. Availability at low levels of HW/software/network reliability


DISTRIBUTED SYSTEMS - 2

 Key non-functional attributes
. Known as "ilities" in software engineering
 Availability – 24/7 access?
 Reliability – fault tolerance
 Accessibility – reachable?
 Usability – user friendly
 Understandability – easy to understand
 Scalability – responds to variable demand
 Extensibility – can be easily modified, extended
 Maintainability – can be easily fixed
 Consistency – data is replicated correctly in a timely manner

TRANSPARENCY PROPERTIES OF DISTRIBUTED SYSTEMS

 Access transparency: local and remote objects are accessed using identical operations
 Location transparency: objects are accessed without knowledge of their location
 Concurrency transparency: several processes run concurrently using shared objects without interference among them
 Replication transparency: multiple instances of objects are used to increase reliability
. Users are unaware if and how the system is replicated
 Failure transparency: concealment of faults
 Migration transparency: objects are moved without affecting operations performed on them
 Performance transparency: system can be reconfigured based on load and quality of service requirements
 Scaling transparency: system and applications can scale without change in system structure and without affecting applications


TYPES OF MODULARITY

 Soft modularity: TRADITIONAL
. Divide a program into modules (classes) that call each other and communicate with shared memory
. A procedure calling convention is used (or method invocation)
 Enforced modularity: CLOUD COMPUTING
. Program is divided into modules that communicate only through message passing
. The ubiquitous client-server paradigm
. Clients and servers are independent, decoupled modules
. System is more robust if servers are stateless
. Modules may be scaled and deployed separately
. May also FAIL separately!

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS

 Multi-core CPU technology and hyper-threading
 What is a:
. Heterogeneous system?
. Homogeneous system?
. Autonomous or self-organizing system?
 Fine-grained vs. coarse-grained parallelism
 Parallel message passing code is easier to debug than shared memory (e.g. pthreads)
 Know your application's max/avg thread level parallelism (TLP)
 Data-level parallelism: Map-Reduce, (SIMD) Single Instruction Multiple Data, vector processing & GPUs


CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 2

 Bit-level parallelism
 Instruction-level parallelism (CPU pipelining)
 Flynn's taxonomy: computer system architecture classification
. SISD – Single Instruction, Single Data (modern core of a CPU)
. SIMD – Single Instruction, Multiple Data (vector processing)
. MIMD – Multiple Instruction, Multiple Data
. MISD is RARE; application for fault tolerance…
 Arithmetic intensity: ratio of calculations vs. memory RW
 Roofline model: memory bottleneck with low arithmetic intensity
 GPUs: ideal for programs with high arithmetic intensity
. SIMD and vector processing supported by many large registers

CLOUD COMPUTING – HOW DID WE GET HERE? SUMMARY OF KEY POINTS - 3

 Speed-up (S): S(N) = T(1) / T(N)
 Amdahl's law: S = 1 / α, where α = percent of program that must be sequential
 Scaled speedup with N processes: S(N) = N – α(N - 1)
 Moore's Law
 Symmetric core, asymmetric core, dynamic core CPUs
 Distributed systems: non-functional quality attributes
 Distributed systems: types of transparency
 Types of modularity: soft, enforced


QUESTIONS
