Cloud Computing

CSF643 – Cloud Computing
Cloud Computing – A System View
吳俊興, Department of Computer Science and Information Engineering, National University of Kaohsiung, 2017

Outline
• Dissecting Cloud Computing
  – Development of distributed systems
  – Inside a cloud
• Server/Datacenter View
  – Technologies
  – Case studies
• Client/Terminal View
  – IoT
  – Hardware
  – Software
• Summary

What is Cloud Computing?
A new computing paradigm?

Inside Clouds
• Computing Devices
  – Servers
  – Desktop and laptop PCs
  – Handheld devices
  – Smart phones
• Communication Links
  – Wired / wireless
  – Narrowband / broadband
  – LAN / WAN
Yet Another Kind of Distributed Systems?

Distributed Systems
• Motivation: Networks of computers are everywhere!
  – Mobile phone networks
  – Corporate networks
  – Factory networks
  – Campus networks
  – Home networks
  – In-car networks
  – Planetary networks
• Why networked? Desire to share resources
• Influence: Networked computers impact system designers and implementers

Defining Distributed Systems
• “A system in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing.” [Coulouris]
  – Networked computers could be far apart or in the same room
    • relying on computer networking
    • i.e. cluster and grid
• “A distributed system is a collection of independent computers that appear to the users of the system as a single computer.” [Tanenbaum]

Architecture Models of Distributed Systems
• 1980~: Terminal-Mainframe (Super-computing); RS-232; VT100/DOS
• 1990~: Client-Server (Micro-computing / Personal Computer); Dialup/10M Ethernet; Windows 3.1/95
• 2000~: Peer-to-Peer (Macro-computing); ADSL/100M+ Ethernet; Linux/Windows XP
• 2010~ (emerging model): Cloud Computing; 10GbE/4G/WiGig; Android/iOS/WP

Client-server Model
Clients and servers each with distinct roles
[Figure: clients U11, U12, U13, U21, U22, U31, U32 send requests to server S, which provides the service]
The server and the network become the bottlenecks and points of failure
• DDoS
• Flash Crowd

Peer-to-peer Model
“Peer-to-Peer (P2P) is a way of structuring distributed applications such that the individual nodes have symmetric roles. Rather than being divided into clients and servers each with quite distinct roles, in P2P applications a node may act as both a client and a server.”
Excerpt from the Charter of Peer-to-Peer Research Group, IETF/IRTF, June 24, 2003
http://www.irtf.org/charters/p2prg.html
• Peers play similar roles
• No distinction of responsibilities

Google Search Trends
Cluster computing, Grid computing, Cloud computing, Big data
Web Services -> service/utility computing -> cloud computing

Cloud Computing Model
“Cloud computing is Web-based processing, whereby shared resources, software, and information are provided to computers and other devices (such as smartphones) on demand over the Internet…”
Excerpt from Wikipedia
A hybrid of terminal-mainframe, client-server, and peer-to-peer, involving over-the-Internet provision of dynamically scalable and often virtualized resources
[Figure: Cloud Computing Model]

Five Layers of Cloud Stack
• Client (smart devices)
  – Browse the Web
• Application/Software (service): SaaS
  – Deliver software as a service over the Internet
  – Users needn’t install & run applications on their own computers
  – Example: Microsoft Office/Live Mesh
• Platform: PaaS
  – Deliver a computing platform and/or solution stack as a service
  – Example: Google’s AppEngine
• Infrastructure (virtualization): IaaS
  – Deliver computing equipment (i.e. virtual machines, storage, networks) as a service over the Internet
  – Example: Amazon Web Services (EC2, S3)
• Server (hardware and software; datacenters)
  – Donate/lend your hardware resources
Warning: Layers not clearly defined yet!

What(’s new) in Today’s Clouds?
Three major features:
1. On-demand Access: Pay-as-you-go, no upfront commitment
  – Anyone can access it (e.g., Washington Post – Hillary Clinton example)
2. Data-intensive Nature: What was MBs has now become TBs
  – Daily logs, forensics, Web data, etc.
  – Do you know the size of Wikipedia dump?
3. New Cloud Programming Paradigms: MapReduce/Hadoop, Pig Latin, DryadLINQ, Swift, and many others
  – High in accessibility and ease of programmability
A combination of one or more of these gives rise to novel and unsolved distributed computing problems in cloud computing

Technologies of Cloud Computing
• Layers: Client; Application/Software (SaaS); Platform (PaaS); Infrastructure (IaaS); Server
• Technologies: intelligent interface; smart devices; ubiquitous connection; on-demand access; data-intensive nature; new cloud programming paradigms; virtual machines; high-throughput communication; datacenters of containers; servers of commodity PCs; 10Gb+ networking

A Sample Cloud Topology
[Figure: core switch, top-of-the-rack switches, racks, servers]
So then, what is a cluster?

Container as Building Block
• Data Center Module
  – Contains network gear, compute, storage, & cooling
  – Just plug in power, network, & chilled water
• Increased cooling efficiency
  – Variable water & air flow
  – Better air flow management (higher delta-T)
  – 80% air handling power reductions (Rackable Systems)
• Bring your own data center shell
  – Just central networking, power, cooling, & admin center
  – Grow beyond existing facilities
  – Can be stacked 3 to 5 high
  – Fewer regulatory issues (e.g. no building permit)
  – Avoids (for now) building floor space taxes
• Meet seasonal load requirements
• Single customs clearance on import
• Single FCC compliance certification

Larger Unit of Data Center Growth
• One at a time:
  – 1 system
  – Racking & networking: 14 hrs ($1,330)
• Rack at a time:
  – ~40 systems
  – Install & networking: 0.75 hrs ($60)
• Container at a time:
  – ~1,000 systems
  – No packaging to remove
  – No floor space required
  – Power, network, & cooling only
• Weatherproof & easy to transport
• Data center construction takes 24+ months
  – Both new build & DC expansion require regulatory approval

Scale of Industry Datacenters
• Microsoft [NYTimes, 2008]
  – 150,000 machines
  – Growth rate of 10,000 per month
  – Largest datacenter: 48,000 machines
  – 80,000 total running Bing
• Yahoo! [Hadoop Summit, 2009]
  – 25,000 machines
  – Split into clusters of 4,000
• AWS EC2 (Oct 2009)
  – 40,000 machines
  – 8 cores/machine
• Google
  – (Rumored) several hundred thousand machines

Virtualization
• A virtual machine is a software implementation of a machine (computer) that executes instructions like a physical machine
  – It provides an interface identical to the underlying bare hardware
  – Para-virtualization: presents a software interface to the VM that is similar but not identical to that of the underlying hardware
  – Emulator: provides an emulation of the functions of one system using a different system
• Two major categories
  – System virtual machine: provides a complete system platform which supports the execution of a complete operating system (OS)
    • VMware, Virtual PC, VirtualBox, Xen
  – Process virtual machine: designed to run a single program, which means that it supports a single process
    • Java VM, Microsoft’s .NET Common Language Infrastructure VM
[Figure: VMware Architecture]
[Figure: The Java Virtual Machine]

Google’s Key Patent on Cloud Computing
US2008/0262828 “Encoding and Adaptive Scalable Accessing of Distributed Models”
“Systems, methods, and apparatus for accessing distributed models in automated machine processing, including using large language models in machine translation, speech recognition and other applications.”
• Filed in February 2006
• 91 claims

System Example
Machine processing: using machines such as computers to perform processing tasks such as machine translation
• FIG. 12: An example of a distributed system that can be configured to provide a language processing function based on a large language model
• FIG. 13: An example computer processing system in a communication network that provides distributed processing

Google Data Centers
• Americas: Berkeley County, South Carolina; Council Bluffs, Iowa; Douglas County, Georgia; Quilicura, Chile; Jackson County, Alabama; Mayes County, Oklahoma; Lenoir, North Carolina; The Dalles, Oregon
• Asia: Changhua County, Taiwan; Singapore
• Europe: Hamina, Finland; St Ghislain, Belgium; Dublin, Ireland; Eemshaven, Netherlands
• 36 data centers / 500 IPs? http://www.google.com/about/datacenters/
• Continuous evolution: 7 significant revisions in last 10 years
• An ordinary search query involves 700 to 1,000 servers

Inside a Google Data Center
• A small data center consists of a minimum of 2,400 servers
  – Racks of 80 servers tied together with 10Gb Ethernet or other high-speed network fabrics
  – 30 or more of these racks are deployed into a single cluster
• Each of these servers has 16 GB of RAM with fast 2 TB (terabyte) hard drives
  – A patent on a power supply that integrates a battery, allowing it to function as an uninterruptible power supply (UPS)
  – Google-optimized Ubuntu Linux
YouTube - Google container data center tour (2009.4.7)

Google’s In-House Software
• Google File System: A scalable distributed file system for large distributed data-intensive applications
  – It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients
• BigTable: A distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers
  – Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance
• Google Web Server (GWS) and Google Front
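
The sizing figures quoted above for a small Google data center (racks of 80 servers, 30 or more racks per cluster, 16 GB of RAM and 2 TB of disk per server) can be checked with a short calculation. The following is a minimal sketch, not anything taken from the slides: the class name `ClusterSpec` and its field names are hypothetical, and it assumes exactly one 2 TB drive per server, which the slide does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterSpec:
    """Per-cluster sizing figures from the slide; names are illustrative."""
    racks: int = 30              # "30 or more of these racks" (lower bound)
    servers_per_rack: int = 80   # "racks of 80 servers"
    ram_gb_per_server: int = 16  # "16GBs of RAM"
    disk_tb_per_server: int = 2  # assumption: one 2 TB drive per server

    @property
    def servers(self) -> int:
        # Total servers in the cluster.
        return self.racks * self.servers_per_rack

    @property
    def total_ram_gb(self) -> int:
        # Aggregate RAM across all servers, in GB.
        return self.servers * self.ram_gb_per_server

    @property
    def total_disk_tb(self) -> int:
        # Aggregate raw disk across all servers, in TB.
        return self.servers * self.disk_tb_per_server


if __name__ == "__main__":
    spec = ClusterSpec()
    print(f"servers: {spec.servers}")
    print(f"aggregate RAM: {spec.total_ram_gb} GB")
    print(f"aggregate disk: {spec.total_disk_tb} TB")
```

With the default values, 30 racks of 80 servers give the 2,400-server minimum stated on the slide; the aggregate RAM and disk totals follow only under the one-drive-per-server assumption.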
