CIT 668: System Architecture

4/27/2011
CIT 668: System Architecture
Review

Topics
1. What is Architecture?
2. What is Cloud Computing?
3. Computer Architecture and Parallelism
4. Data Centers
5. High Availability and Load Balancing
6. Distributed Databases and NoSQL
7. Security and Privacy

What is Architecture?
architecture (n): the complex or carefully designed structure of something.
Specifically in computing: the conceptual structure and logical organization of a computer or computer-based system: a client/server architecture
- http://oxforddictionaries.com/

Cloud Computing

What is Cloud Computing?
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.”
NIST definition of Cloud Computing

Cloud Service Models

Abstraction Layers

Cloud Deployment Architectures

Cloud Computing Advantages
• Flexibility
• Scalability
• Cost
• Maintenance
• Utilization
• Power

Cloud is enabled by Virtualization
Virtual Machines: Linux, BSD, W2k8
Physical Machine

Computer Architecture

A Single CPU Computer
Components

The 5 Von Neumann Components
Input/Output
Control Unit and ALU (together, the CPU)
Memory

Processor-Memory Bottleneck

Solution: Caches

Principle of Locality
Programs tend to reuse data and instructions near those they have used recently.
Temporal locality: Recently referenced items are likely to be referenced in the near future.
Spatial locality: Items with nearby addresses tend to be referenced close together in time.

Caches
Cache: A smaller, faster storage device that transparently stores a subset of the data in a larger, slower device so that future requests for that data can be served more quickly.
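The effect of temporal locality on a cache can be illustrated with a small simulation. This is a minimal sketch, not a model of any real processor: the `LRUCache` class, its least-recently-used eviction policy, the capacity of 4 entries, and both access traces are assumptions chosen only for illustration. It tracks individual addresses rather than multi-word cache lines, so it shows temporal locality only, not spatial locality.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical fully associative cache with least-recently-used eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> True, ordered oldest to newest
        self.hits = 0
        self.misses = 0

    def access(self, address):
        if address in self.entries:
            # Hit: mark the address as most recently used.
            self.hits += 1
            self.entries.move_to_end(address)
        else:
            # Miss: load the address, evicting the oldest entry if the cache is full.
            self.misses += 1
            self.entries[address] = True
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def run_trace(trace, capacity=4):
    """Replay a list of addresses through a fresh cache and return its hit rate."""
    cache = LRUCache(capacity)
    for address in trace:
        cache.access(address)
    return cache.hit_rate()


# A loop that re-reads the same four addresses (strong temporal locality).
local_trace = [0, 1, 2, 3] * 5
# Twenty distinct addresses, each touched once (no reuse at all).
scattered_trace = list(range(20))

print(run_trace(local_trace))      # only the first pass over the loop misses
print(run_trace(scattered_trace))  # every access misses
```

With the same cache size, the trace that reuses recent addresses is served mostly from the cache, while the trace with no reuse gains nothing from it.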
Average Memory Access Time = Time for a Cache Hit + Miss Rate × Miss Penalty

Memory Hierarchy

The Processor
• The Brain: a functional unit that interprets and carries out instructions (mathematical operations)
• Also called a CPU (actually includes control unit + ALU)
• Consists of hundreds of millions of transistors.

Moore’s Law
– Number of transistors doubles every 18 months
More transistors =
  Cheaper CPUs
  Higher speeds
  More features
  More cache

Improvements in CPU Clock Speed

Serial and Parallel Computation
Serial
Parallel

Flynn’s Taxonomy
                 Single Instruction           Multiple Instruction
Single Data      SISD: Pentium III            MISD: None today
Multiple Data    SIMD: SSE instruction set    MIMD: Xeon e5345 (Clovertown)

Instruction Level Parallelism (ILP)
Running independent instructions on separate execution units simultaneously.
  x = a + b
  y = c + d
  z = x + y
Serial Execution: If each instruction takes one cycle, it takes 3 clock cycles to run the program.
Parallel Execution:
– The first two instructions are independent, so they can be executed simultaneously.
– The third instruction depends on the first two, so it must be executed afterwards.
– The program takes two clock cycles to run.
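The cycle counts in the ILP example can be reproduced with a small dependency-aware scheduler. This is a sketch under simplifying assumptions that are not from the slides: every instruction takes exactly one cycle, there are unlimited execution units, and only read-after-write dependencies are tracked. The `schedule` function and the tuple encoding of instructions are hypothetical.

```python
def schedule(instructions):
    """Assign each instruction the earliest cycle in which it can run.

    instructions: list of (destination, sources) tuples in program order.
    Assumes one cycle per instruction and unlimited execution units.
    Returns (cycle for each instruction, total cycles for the program).
    """
    produced_in = {}  # variable name -> cycle in which its value is computed
    cycles = []
    for dest, sources in instructions:
        # An instruction runs one cycle after its latest-produced input;
        # inputs never written by the program (a, b, c, d) are ready at cycle 0.
        start = 1 + max((produced_in.get(src, 0) for src in sources), default=0)
        produced_in[dest] = start
        cycles.append(start)
    return cycles, max(cycles, default=0)


program = [
    ("x", ("a", "b")),  # x = a + b
    ("y", ("c", "d")),  # y = c + d
    ("z", ("x", "y")),  # z = x + y
]

cycles, parallel_total = schedule(program)
serial_total = len(program)  # one instruction per cycle, executed in order

print(cycles)          # cycle assigned to each instruction
print(serial_total)    # cycles needed for serial execution
print(parallel_total)  # cycles needed with ILP
```

The first two instructions land in the same cycle because neither reads a value the other writes, while the third must wait for both.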
Superscalar Architecture
Instead of one ALU, use multiple execution units
– Some execution units are identical to others
– Others are different: Integer, FPU, multi-media

Multicore
• Multicore CPU chips contain multiple complete processors
• Individual L1 and shared L2 caches
• OS and applications see each core as an independent processor
• Each core can run a separate task
• A single application must be divided into multiple tasks to improve performance

Amdahl’s Law
Speedup due to enhancement E is
$\text{Speedup} = \frac{\text{Execution time without } E}{\text{Execution time with } E}$
Suppose E accelerates a piece P (P < 1) of the task by a factor S (S > 1) and the remainder is unaffected:
$\text{Exec time with } E = \text{Exec time without } E \times [\,1 - P + P/S\,]$
$\text{Speedup} = \frac{1}{1 - P + P/S}$

Amdahl’s Law: Example
Consider an application whose work is divided into the following four components:
Work Load    Memory Access    Computation    Disk Access    Network Access
Time         10%              70%            10%            10%
What is the expected percent improvement if:
Memory access speed is doubled? 5% (execution time falls to 1 - 0.10 + 0.10/2 = 0.95 of the original)
Computation speed is doubled? 35% (execution time falls to 1 - 0.70 + 0.70/2 = 0.65 of the original)

Amdahl’s Law for Parallelization
$\text{Speedup} = \frac{1}{1 - P + P/S}$
• Let P be the parallelizable portion of code
• As the number of processors increases, the time to do the parallel portion of the program, P/S, tends towards zero, reducing the equation to:
$\text{Speedup} = \frac{1}{1 - P}$
• If P=0, then speedup=1 (no improvement)
• If P=1, then speedup grows without limit.
• If P=0.5, then maximum speedup is 2.

Amdahl’s Law: Parallel Example
Consider an application whose work is divided into the following four functions:
Work Load    f1    f2     f3     f4
Time         4%    10%    80%    6%
Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially.
Parallelizing which part would best improve performance? f3
What is the best performance speedup that could be reached by parallelizing all three parallelizable functions? 10X
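The Amdahl's Law answers above can be checked numerically. The sketch below applies the formulas as stated; the function names are hypothetical, and "percent improvement" is taken to mean the percentage reduction in execution time, which is the reading that matches the 5% and 35% answers.

```python
def amdahl_speedup(p, s):
    """Speedup when a fraction p of the work is accelerated by a factor s."""
    return 1.0 / ((1.0 - p) + p / s)


def time_reduction_percent(p, s):
    """Percentage by which total execution time shrinks (assumed meaning of
    'percent improvement' in the example)."""
    new_time = (1.0 - p) + p / s  # fraction of the original execution time
    return 100.0 * (1.0 - new_time)


def max_speedup(p):
    """Limit of the speedup as s grows without bound: 1 / (1 - p)."""
    if p >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - p)


# Example: memory access is 10% of the work, computation is 70%.
print(time_reduction_percent(0.10, 2))  # memory access speed doubled
print(time_reduction_percent(0.70, 2))  # computation speed doubled

# Parallel example: f1 (4%), f3 (80%), and f4 (6%) can be parallelized.
parallel_fraction = 0.04 + 0.80 + 0.06
print(max_speedup(parallel_fraction))   # upper bound on speedup

# Parallelizing only one function at a time shows why f3 matters most.
for name, fraction in [("f1", 0.04), ("f3", 0.80), ("f4", 0.06)]:
    print(name, max_speedup(fraction))
```

Even with unlimited processors, the serial 10% taken by f2 caps the overall speedup.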
Amdahl’s Law: Time Example
Consider an application whose work is divided into the following four functions:
Work Load    f1     f2     f3      f4
Time         2ms    5ms    40ms    3ms
Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially. Assume that running the whole program takes 50ms.
What is the best running time that can be achieved by parallelizing f1, f3, and f4? 5ms
Why can’t parallelizing the program decrease the total running time below that time? 5ms is the time required for the serial part. Even if the parallel part takes 0ms, f2 still takes 5ms to run.

Amdahl’s Law

Scalability

Vertical Scaling

Plenty of Fish
• 1.2 billion page views per month, 500,000 average unique logins per day
• 30+ million hits per day, 500-600 per second
• 45 million visitors per month
• top 30 site in the US, top 10 in Canada, top 30 in the UK
• 2 load balanced Windows Server 2003 x64 web servers with 2 Quad Core 2.66 GHz CPUs, 8 GB RAM, 2 hard drives
• 3 database servers. No data on their configuration
• Approaching 64,000 simultaneous connections and 2 million page views per hour
• Internet connection is a 1 Gbps line, 200 Mbps is used
• 1 TB per day serving 171 million images through Akamai
• 6 TB storage array to handle millions of full sized images uploaded every month to the site
http://highscalability.com/plentyoffish-architecture

Plenty of Fish Scaling
“We upgraded from a machine with 64 GB of ram and 8 CPU’s to a HP ProLiant DL785 with 512 GB of ram and 32 CPU’s and moved from SQLserver 2005 to 2008 and windows 2008.”
– Markus, https://plentyoffish.wordpress.com/2009/06/14/upgrades-themes-date-night/
Estimated cost ~ $100,000

Horizontal Scaling
Googol = $10^{100}$
Figure labels: First; Rack; Container; Large Data Centers; Sun Ultra 2; 2 200 MHz processors; 8 CPUs; 200 GB; > $10^{6}$ servers

Horizontal vs. Vertical Scaling Example
• Total budget is $100,000
• Vertical: PoF HP ProLiant DL785, 32 CPUs, 512 GB
• Horizontal: 83 1U servers for $1150 each
  Lenovo ThinkServer RS110 barebones         $600
  8 GB RAM                                   $100
  2 x eBay drive brackets                    $50
  2 x 500 GB SATA hard drives, mirrored      $100
  Intel Xeon X3360 2.83 GHz quad-core CPU    $300
• Comparison:
          Scaling Up    Scaling Out
  CPUs    32            332
  RAM     512 GB        664 GB
  Disk    4 TB          40.5 TB
http://www.codinghorror.com/blog/2009/06/scaling-up-vs-scaling-out-hidden-costs.html

Distributed System Types
Shared Memory
• All CPUs share memory/disk
• Scalability limited by memory contention (vertical scaling only)
Shared Disk
• CPUs share storage, not RAM
• Scalability limited by disk contention (vertical scaling only)
Shared Nothing
• Each CPU has its own RAM and disks
• Very high (horizontal) scalability since no contention for shared resources

Data Centers

Data Center Components

Measuring Power Efficiency
PUE is ratio of total building power to IT power; efficiency of datacenter building infrastructure
SPUE is ratio of total server input to its useful power, where useful power is power consumed by CPU, DRAM, disk, motherboard, etc. Excludes losses due to power supplies, fans, etc.
Computation efficiency depends on software and workload and measures useful work done per watt

Improving Power Efficiency

Performance Impact of App Living Across
Data Center
Racks
Servers
Processors
Cores

Total Cost of Ownership (TCO)
TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)
Depreciation is the process of allocating cost of assets across period during which assets are used.
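The TCO formula and the definition of depreciation can be sketched in a few lines. The sketch assumes straight-line depreciation, where the cost is spread evenly over the asset's service life. The slides do not name a method, so that choice is an assumption. The function names and all dollar amounts in the usage lines are hypothetical.

```python
def straight_line_depreciation(cost, residual_value, years):
    """Annual depreciation when cost is spread evenly over the service life.

    Assumes the straight-line method; other depreciation schedules exist.
    """
    if years <= 0:
        raise ValueError("years must be positive")
    return (cost - residual_value) / years


def annual_tco(dc_depreciation, dc_opex, server_depreciation, server_opex):
    """Sum of the four TCO terms, all expressed per year."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex


# Hypothetical figures, chosen only to exercise the functions.
server_dep = straight_line_depreciation(cost=12_000, residual_value=2_000, years=5)
total = annual_tco(
    dc_depreciation=1_000,  # hypothetical share of building depreciation
    dc_opex=500,            # hypothetical power, cooling, and staff share
    server_depreciation=server_dep,
    server_opex=700,        # hypothetical maintenance and power
)

print(server_dep)
print(total)
```

Expressing every term per year keeps the four components comparable, so a cheaper server with higher operating costs can be weighed against a more expensive, more efficient one.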
Example: server cost = $10,000, $0 residual value
annual depreciation over 4 years = $2500

High Availability and Load Balancing

Load Balancing and High Availability
Techniques: round-robin DNS, reverse proxy, heartbeat, data partitioning, wackamole, hot spare

Availability
$A = \frac{MTBF}{MTBF + MTTR}$
MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
Example: MTBF = 1000 hours, MTTR = 1 hour
$A = \frac{1000}{1000 + 1} \approx 0.999001 \approx 99.9\%$

Single Point of Failure
• A single point of failure (SPOF) is a component which will cause the entire system to fail if it fails.
• A fault tolerant system cannot have a single point of failure, and so has redundant components.

Failover Requirements
Transparency: Failover should not be noticeable by clients.
Speed: Failover should happen quickly, so that there is only a short downtime while it occurs.
Automatic: Failover should not require a sysadmin to intervene.
Consistent: Clients should see the same data on the failover server as they saw on the original server prior to failover.

Failover Components
Move from the failed server to the failover server:
– Network identity: DNS name, IP address, or MAC address must be transferred, depending on which layer of the protocol stack the service functions on.
– Data: Usually must be accomplished by a shared storage system: NAS or SAN.
– Processes: Server processes associated with the data must be restarted once data and network identity are transferred.
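Returning to the availability formula from earlier in this section, the calculation can be sketched as below. The helper names are hypothetical, and converting availability into expected downtime using 8,760 hours per non-leap year is an added assumption for illustration.

```python
HOURS_PER_YEAR = 24 * 365  # assumes a non-leap year


def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR), with both times in the same unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_hours_per_year(a):
    """Expected hours of downtime per year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


a = availability(mtbf_hours=1000, mttr_hours=1)
print(round(a, 6))                            # fraction of time the service is up
print(round(downtime_hours_per_year(a), 2))   # expected downtime per year in hours
```

Because MTTR appears only in the denominator, faster failover (a smaller MTTR) raises availability even when failures are no less frequent.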
