4/27/2011
CIT 668: System Architecture
Review
Topics
1. What is Architecture?
2. What is Cloud Computing?
3. Computer Architecture and Parallelism
4. Data Centers
5. High Availability and Load Balancing
6. Distributed Databases and NoSQL
7. Security and Privacy
What is Architecture?
architecture (n.): the complex or carefully designed structure of something.
Specifically in computing: the conceptual structure and logical organization of a computer or computer-based system, e.g. a client/server architecture.
- http://oxforddictionaries.com/
Cloud Computing
What is Cloud Computing?
“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST definition of Cloud Computing
Cloud Service Models Abstraction Layers
Cloud Deployment Architectures
Cloud Computing Advantages
• Flexibility • Scalability • Cost • Maintenance • Utilization • Power
Cloud is enabled by Virtualization
Virtual Machines: Linux, BSD, W2k8
Physical Machine
Computer Architecture
A Single CPU Computer Components
The 5 Von Neumann Components
Input/Output
Control Unit and ALU
Memory
Processor-Memory Bottleneck
Solution: Caches
Principle of Locality
Programs tend to reuse data and instructions near those they have used recently.
Temporal locality: Recently referenced items are likely to be referenced in the near future.
Spatial locality: Items with nearby addresses tend to be referenced close together in time.
Caches
Cache: A smaller, faster storage device that transparently stores a subset of the data in a larger, slower device so that future requests for that data can be served more quickly.
Average Memory Access Time = Time for a Cache Hit + Miss Rate × Miss Penalty
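The formula above can be checked with a few lines of Python. This is a minimal sketch; the function name `amat` and the sample numbers (1 ns hit time, 5% miss rate, 100 ns miss penalty) are illustrative assumptions, not figures from the slides.

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    if not 0.0 <= miss_rate <= 1.0:
        raise ValueError("miss_rate must be between 0 and 1")
    return hit_time_ns + miss_rate * miss_penalty_ns

# Hypothetical cache: 1 ns hit, 5% of accesses miss, each miss costs 100 ns.
print(amat(1.0, 0.05, 100.0))   # about 6 ns on average

# Halving the miss rate helps far more than shaving the hit time slightly.
print(amat(1.0, 0.025, 100.0))  # about 3.5 ns
print(amat(0.9, 0.05, 100.0))   # about 5.9 ns
```

With a large miss penalty, the miss rate dominates the average, which is why locality matters so much.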
Memory Hierarchy
The Processor
• The Brain: a functional unit that interprets and carries out instructions (mathematical operations) • Also called a CPU (includes the control unit and the ALU) • Consists of hundreds of millions of transistors.
Moore’s Law – Number of transistors on a chip doubles roughly every 18 to 24 months
More transistors =
Cheaper CPUs
Higher speeds
More features
More cache
Improvements in CPU Clock Speed
Serial and Parallel Computation
Serial
Parallel
Flynn’s Taxonomy
                Single Instruction            Multiple Instruction
Single Data     SISD: Pentium III             MISD: None today
Multiple Data   SIMD: SSE instruction set     MIMD: Xeon e5345 (Clovertown)
Instruction Level Parallelism (ILP)
Running independent instructions on separate execution units simultaneously.
Serial Execution: If each instruction takes one cycle, it takes 3 clock cycles to run the program:
x = a + b
y = c + d
z = x + y
Parallel Execution:
– First two instructions are independent, so they can be executed simultaneously.
– Third instruction depends on the first two, so it must be executed afterwards.
– Two clock cycles to run the program.
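The cycle counts for the three-instruction program can be reproduced with a small dependency tracker. This is an idealized sketch: it assumes one cycle per instruction, unlimited execution units, and only true (read-after-write) dependencies. The function name `issue_cycles` is made up for illustration.

```python
def issue_cycles(instructions):
    """Return the cycle in which each instruction can issue.

    instructions: list of (destination, [source operands]).
    Assumes one cycle per instruction and unlimited execution units.
    """
    ready = {}   # variable name -> cycle in which its value is produced
    cycles = []
    for dest, sources in instructions:
        # Issue one cycle after the latest-produced input; inputs that no
        # instruction produces (a, b, c, d) are available from the start.
        cycle = 1 + max((ready.get(s, 0) for s in sources), default=0)
        ready[dest] = cycle
        cycles.append(cycle)
    return cycles

program = [
    ("x", ["a", "b"]),   # x = a + b
    ("y", ["c", "d"]),   # y = c + d
    ("z", ["x", "y"]),   # z = x + y
]
cycles = issue_cycles(program)
print(cycles)        # [1, 1, 2]
print(len(program))  # 3 cycles when executed serially
print(max(cycles))   # 2 cycles with instruction level parallelism
```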
Superscalar Architecture
Instead of one ALU, use multiple execution units – Some execution units are identical to others – Others are different: Integer, FPU, multi-media
Multicore
• Multicore CPU chips contain multiple complete processors • Individual L1 and shared L2 caches • OS and applications see each core as an independent processor • Each core can run a separate task • A single application must be divided into multiple tasks to improve performance
Amdahl’s Law
Speedup due to enhancement E is

\[ \text{Speedup} = \frac{\text{Execution time without } E}{\text{Execution time with } E} \]

Suppose E accelerates a piece P (P<1) of the task by a factor S (S>1) and the remainder is unaffected.

Exec time with E = Exec time w/o E × [ 1 - P + P/S ]

\[ \text{Speedup} = \frac{1}{1 - P + P/S} \]
Amdahl’s Law: Example
Consider an application whose work is divided into the following four components:
Work Load   Memory Access   Computation   Disk Access   Network Access
Time        10%             70%           10%           10%

What is the expected percent improvement (reduction in running time) if:
Memory access speed is doubled? 5%
Computation speed is doubled? 35%
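The two answers can be reproduced directly from Amdahl's Law. The sketch below takes computation to be the 70% component and memory access a 10% component, which is what the 35% and 5% answers imply. The function names are illustrative.

```python
def time_with_enhancement(p, s):
    """Fraction of the original running time that remains after a portion p
    of the work is accelerated by a factor s (the bracket in Amdahl's Law)."""
    return (1.0 - p) + p / s

def percent_time_saved(p, s):
    return 100.0 * (1.0 - time_with_enhancement(p, s))

def speedup(p, s):
    return 1.0 / time_with_enhancement(p, s)

print(round(percent_time_saved(0.10, 2), 1))  # memory access doubled: 5.0
print(round(percent_time_saved(0.70, 2), 1))  # computation doubled: 35.0
print(round(speedup(0.70, 2), 2))             # same case as a speedup: 1.54
```

Note that a 35% reduction in running time corresponds to a speedup of about 1.54, not 1.35; the slide's percentages measure time saved.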
Amdahl’s Law for Parallelization

\[ \text{Speedup} = \frac{1}{1 - P + P/S} \]

• Let P be the parallelizable portion of code
• As the number of processors increases, the time to do the parallel portion of the program, P/S, tends towards zero, reducing the equation to:

\[ \text{Speedup} = \frac{1}{1 - P} \]

• If P=0, then speedup=1 (no improvement)
• If P=1, then speedup grows without limit.
• If P=0.5, then maximum speedup is 2.
Amdahl’s Law: Parallel Example
Consider an application whose work is divided into the following four functions:
Work Load   f1    f2    f3    f4
Time        4%    10%   80%   6%

Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially.
Parallelizing which part would best improve performance? f3
What is the best performance speedup that could be reached by parallelizing all three parallelizable functions? 10X
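To see how the speedup approaches the 10X limit, the sketch below evaluates Amdahl's Law for a growing number of processors. It assumes the parallel work divides perfectly across processors, which real programs rarely achieve. The function name `parallel_speedup` is illustrative.

```python
def parallel_speedup(parallel_fraction, processors):
    """Amdahl's Law with S equal to the number of processors."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

P = 0.04 + 0.80 + 0.06   # f1 + f3 + f4 are parallelizable; f2 (10%) is serial

for n in (1, 2, 8, 64, 1024):
    print(n, round(parallel_speedup(P, n), 2))

# Upper bound as the processor count grows without limit: 1 / (1 - P)
print("limit:", round(1.0 / (1.0 - P), 2))
```

Even with 1024 processors the speedup stays below 10, because the serial 10% is untouched.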
Amdahl’s Law: Time Example
Consider an application whose work is divided into the following four functions:
Work Load   f1     f2     f3      f4
Time        2ms    5ms    40ms    3ms

Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially. Assume that running the whole program takes 50ms.
What is the best running time that can be achieved by parallelizing f1, f3, and f4? 5ms
Why can’t parallelizing the program decrease the total running time below that time? 5ms is the time required for the serial part. Even if the parallel part takes 0ms, f2 still takes 5ms to run.
Amdahl’s Law
Scalability
Vertical Scaling
Plenty of Fish
• 1.2 billion page views per month, 500,000 average unique logins per day
• 30+ million hits per day, 500-600 per second
• 45 million visitors per month
• Top 30 site in the US, top 10 in Canada, top 30 in the UK
• 2 load balanced Windows Server 2003 x64 web servers with 2 Quad Core 2.66 GHz CPUs, 8 GB RAM, 2 hard drives
• 3 database servers. No data on their configuration
• Approaching 64,000 simultaneous connections and 2 million page views per hour
• Internet connection is a 1 Gbps line, 200 Mbps is used
• 1 TB per day serving 171 million images through Akamai
• 6 TB storage array to handle millions of full sized images uploaded every month to the site
http://highscalability.com/plentyoffish-architecture
Plenty of Fish Scaling
“We upgraded from a machine with 64 GB of ram and 8 CPU’s to a HP ProLiant DL785 with 512 GB of ram and 32 CPU’s and moved from SQLserver 2005 to 2008 and windows 2008.”
– Markus, https://plentyoffish.wordpress.com/2009/06/14/upgrades-themes-date-night/
Estimated cost ~ $100,000
Horizontal Scaling
Googol = 10^100
[Figure panels: First; Rack; Large Container Data Centers. Captions: Sun Ultra 2; 2 × 200 MHz processors; 8 CPUs; 200 GB; > 10^6 servers]
Horizontal vs. Vertical Scaling Example
• Total budget is $100,000
• Vertical: PoF HP ProLiant DL785, 32 CPUs, 512 GB
• Horizontal: 83 1U servers for $1150 each
    Lenovo ThinkServer RS110 barebones          $600
    8 GB RAM                                    $100
    2 x eBay drive brackets                     $50
    2 x 500 GB SATA hard drives, mirrored       $100
    Intel Xeon X3360 2.83 GHz quad-core CPU     $300
• Comparison:
            Scaling Up   Scaling Out
    CPUs    32           332
    RAM     512 GB       664 GB
    Disk    4 TB         40.5 TB
http://www.codinghorror.com/blog/2009/06/scaling-up-vs-scaling-out-hidden-costs.html
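The scale-out column can be rederived from the parts list. The sketch below uses only the prices and counts given on the slide. The dictionary keys and variable names are made up for illustration, and the "CPUs" row is read as a count of cores (4 per quad-core server).

```python
# Per-server parts and prices as listed on the slide (US dollars).
parts = {
    "barebones_1u": 600,
    "ram_8gb": 100,
    "drive_brackets": 50,
    "disks_2x500gb_mirrored": 100,
    "quad_core_cpu": 300,
}

servers = 83               # server count taken from the slide
cores_per_server = 4       # one quad-core CPU per server
ram_gb_per_server = 8

server_cost = sum(parts.values())
print(server_cost)                   # 1150 per server
print(servers * server_cost)         # 95450, within the $100,000 budget
print(servers * cores_per_server)    # 332 cores
print(servers * ram_gb_per_server)   # 664 GB RAM
```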
Distributed System Types
Shared Memory
• All CPUs share memory/disk
• Scalability limited by memory contention (vertical scaling only)
Shared Disk
• CPUs share storage, not RAM
• Scalability limited by disk contention (vertical scaling only)
Shared Nothing
• Each CPU has its own RAM and disks
• Very high (horizontal) scalability since no contention for shared resources
Data Centers
Data Center Components
Measuring Power Efficiency
PUE is the ratio of total building power to IT power; it measures the efficiency of the datacenter building infrastructure.
SPUE is the ratio of total server input power to its useful power, where useful power is power consumed by CPU, DRAM, disk, motherboard, etc. Useful power excludes losses due to power supplies, fans, etc.
Computation efficiency depends on software and workload and measures useful work done per watt.
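A short calculation shows how the two ratios combine. All numbers below are hypothetical (a 1500 kW building feeding 1000 kW of IT load, and a server drawing 300 W of which 250 W reaches its components); the function names are illustrative.

```python
def pue(total_building_kw, it_kw):
    """Power Usage Effectiveness: total building power / IT power."""
    return total_building_kw / it_kw

def spue(server_input_w, useful_w):
    """Server PUE: total server input power / power reaching components."""
    return server_input_w / useful_w

facility_pue = pue(1500.0, 1000.0)   # hypothetical facility
server_spue = spue(300.0, 250.0)     # hypothetical server

# Fraction of building power that reaches CPU, DRAM, disk, and so on.
useful_fraction = 1.0 / (facility_pue * server_spue)
print(facility_pue, server_spue, round(useful_fraction, 3))
```

A value of 1.0 for either ratio would mean no overhead; anything above 1.0 is power spent on cooling, distribution, power supplies, or fans.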
Improving Power Efficiency
Performance Impact of App Living Across
Data Center
Racks
Servers
Processors
Cores
Total Cost of Ownership (TCO)
TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)
Depreciation is the process of allocating the cost of assets across the period during which the assets are used.
Example: server cost = $10,000, $0 residual value; annual depreciation over 4 years = $2500
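The depreciation example uses straight-line depreciation, which can be sketched as follows. The server figures come from the slide; the data center and opex figures in the TCO call are invented placeholders, included only to show how the four terms add up.

```python
def annual_depreciation(cost, residual_value, years):
    """Straight-line depreciation: spread (cost - residual) evenly over years."""
    return (cost - residual_value) / years

def annual_tco(dc_depreciation, dc_opex, server_depreciation, server_opex):
    """Sum of the four TCO terms, all expressed per year."""
    return dc_depreciation + dc_opex + server_depreciation + server_opex

server_dep = annual_depreciation(10_000, 0, 4)
print(server_dep)   # 2500.0 per year, matching the slide

# Hypothetical per-server share of the other three terms.
print(annual_tco(dc_depreciation=1_000, dc_opex=400,
                 server_depreciation=server_dep, server_opex=600))
```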
High Availability and Load Balancing
Load Balancing and High Availability
Load Balancing High Availability
round-robin DNS reverse proxy heartbeat data partitioning wackamole hot spare
Availability
\[ A = \frac{MTBF}{MTBF + MTTR} \]

MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair
Example: MTBF=1000 hours, MTTR=1 hour

\[ A = \frac{1000}{1000 + 1} \approx 0.999 = 99.9\% \]
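The availability figure is easier to interpret as downtime per year. This is a minimal sketch that assumes a 365-day year; the function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365

def availability(mtbf_hours, mttr_hours):
    """A = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def downtime_hours_per_year(a):
    """Expected hours of downtime in a year at availability a."""
    return (1.0 - a) * HOURS_PER_YEAR

a = availability(1000, 1)
print(f"{a:.6f}")                             # 0.999001
print(round(downtime_hours_per_year(a), 2))   # 8.75 hours per year
```

So "three nines" still allows most of a working day of downtime each year.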
Single Point of Failure
• A single point of failure (SPOF) is a component which will cause the entire system to fail if it fails. • A fault tolerant system cannot have a single point of failure, and so has redundant components.
Failover Requirements
Transparency: Failover should not be noticeable by clients.
Speed: Failover should happen quickly, so that there is only a short downtime while it occurs.
Automatic: Failover should not require a sysadmin to intervene.
Consistent: Clients should see the same data on the failover server as they saw on the original server prior to failover.
Failover Components
Move from failed server to the failover server
– Network identity: DNS name, IP address, or MAC address must be transferred, depending on what layer of the protocol stack the service functions on.
– Data: Usually must be accomplished by a shared storage system: NAS or SAN.
– Processes: Server processes associated with data must be restarted once data and network identity are transferred.
Heartbeat
• Heartbeat is the signal that failover pairs use to communicate their status with each other. • Heartbeat signals are typically sent over a dedicated network between the pair, so that other network issues won’t interfere. • When heartbeat signals are not received by the secondary host for a specified time, it will start the failover process.
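The timeout rule in the last bullet can be sketched as a small monitor that runs on the secondary host. This illustrates the idea only and is not the interface of any particular heartbeat package. The class name, the 3-second timeout, and the injected clock are all assumptions made so that the example runs without real networking.

```python
import time

class HeartbeatMonitor:
    """Secondary-side monitor: starts failover when no heartbeat has been
    received from the primary for more than `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock            # injectable so it can be simulated
        self.last_beat = clock()
        self.failed_over = False

    def beat(self):
        """Record that a heartbeat signal arrived from the primary."""
        self.last_beat = self.clock()

    def check(self):
        """Return True once the secondary should take over."""
        if not self.failed_over and self.clock() - self.last_beat > self.timeout:
            self.failed_over = True   # real code would start failover here
        return self.failed_over

# Simulated clock instead of real time.
now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: now[0])

now[0] = 2.0
monitor.beat()              # heartbeat received at t=2
now[0] = 4.0
print(monitor.check())      # False: only 2 s since the last heartbeat
now[0] = 6.0
print(monitor.check())      # True: 4 s of silence exceeds the timeout
```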
Load Balanced Servers
Internet • Round robin DNS • Layer 4 load balancer • Web switch • Reverse proxy
Server 1 Server 2 … Server N
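Round-robin selection across Server 1 through Server N, combined with skipping servers known to be down, can be sketched as below. This is an in-memory illustration only. The class and method names are hypothetical, and a real balancer would learn server health from health checks or heartbeats.

```python
from itertools import cycle

class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._ring = cycle(self.servers)

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def next_server(self):
        # Try each server at most once per call.
        for _ in range(len(self.servers)):
            server = next(self._ring)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")

lb = RoundRobinBalancer(["server1", "server2", "server3"])
print([lb.next_server() for _ in range(4)])
# ['server1', 'server2', 'server3', 'server1']

lb.mark_down("server2")
print([lb.next_server() for _ in range(3)])
# ['server3', 'server1', 'server3']
```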
Load Balancing + Failover
Databases
Database Scaling Techniques
[Figure: four database configurations]
Base case: a 1 TPS system (one 1 TPS server, 100 users)
Scale up: to a 2 TPS centralized system (one 2 TPS server, 200 users)
Partitioning: two 1 TPS systems (two 1 TPS servers, 100 users each, 0 tps exchanged between them)
Replication: two 2 TPS systems (two 2 TPS servers, 100 users each, 1 tps sent in each direction between them)
Master/Slave Replication
• Slave DBs only accept read operations from the application.
  – Flickr.com DBs logged 13 SELECTs for each write.
• Master DB
  – Accepts write operations.
  – Copies write operations to slave DBs.
  – Single point of failure!
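The read/write split can be sketched with in-memory dictionaries standing in for the databases. This is a deliberate simplification: the class name is made up, and replication here is synchronous and always succeeds, whereas real replication is usually asynchronous and can lag.

```python
import itertools

class ReplicatedStore:
    """Writes go to the master and are copied to every slave;
    reads are spread across the slaves in rotation."""

    def __init__(self, slave_count):
        self.master = {}
        self.slaves = [{} for _ in range(slave_count)]
        self._next_slave = itertools.cycle(range(slave_count))

    def write(self, key, value):
        self.master[key] = value       # the master accepts the write
        for slave in self.slaves:      # and copies it to each slave
            slave[key] = value

    def read(self, key):
        slave = self.slaves[next(self._next_slave)]
        return slave.get(key)

store = ReplicatedStore(slave_count=3)
store.write("photo:1", "sunset.jpg")
print([store.read("photo:1") for _ in range(3)])   # served by 3 different slaves
```

Read capacity grows with each added slave, but every write still passes through the single master.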
Two-phase commit
1. Request phase: transaction coordinator asks nodes to try to commit. Nodes execute transaction up to point where they will be asked to commit. Each node returns commit/abort.
2. Commit phase: if any node returns abort, coordinator tells all nodes to rollback; if all nodes return commit, then coordinator tells all to commit and nodes send acknowledgement to coordinator when done.
http://xml.sys-con.com/node/43755
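The two phases can be sketched as a coordinator function driving a list of participant objects. This is a toy model under strong assumptions: everything runs in one process, there are no timeouts or crashes, and the class `Node` and its `can_commit` flag are invented for illustration.

```python
class Node:
    """A participant that votes in the request phase and then obeys
    the coordinator's decision in the commit phase."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit   # hypothetical switch to force an abort
        self.state = "idle"

    def prepare(self):
        # Execute the transaction up to the commit point, then vote.
        self.state = "prepared" if self.can_commit else "abort-voted"
        return self.can_commit

    def commit(self):
        self.state = "committed"
        return True                    # acknowledgement to the coordinator

    def rollback(self):
        self.state = "rolled back"
        return True

def two_phase_commit(nodes):
    # Phase 1 (request): collect a commit/abort vote from every node.
    votes = [node.prepare() for node in nodes]
    # Phase 2 (commit): commit only if every vote was commit.
    if all(votes):
        acks = [node.commit() for node in nodes]
        return "committed" if all(acks) else "in doubt"
    for node in nodes:
        node.rollback()
    return "aborted"

print(two_phase_commit([Node("a"), Node("b"), Node("c")]))            # committed
print(two_phase_commit([Node("a"), Node("b", can_commit=False)]))     # aborted
```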
What is NoSQL?
“NoSQL is the term used to designate database management systems that differ from classic relational database management systems in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally.” -- Wikipedia
Non-relational may be more accurate than NoSQL, as some NoSQL DBs support a subset of SQL.
Why not stick to relational DBs?
Limited scalability
– Master/slave clusters are limited by write bandwidth and have a SPOF.
– Partitioning requires that you rewrite your application to find your data, and does not scale joins.
Availability is more important than consistency
– An RDBMS will make data unavailable until it is consistent on each node.
– In large clusters or if a node is down, the period of unavailability can be too long.
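"Rewriting your application to find your data" usually means adding a lookup step like the one sketched below, which maps a key to one of several database servers. The shard names are hypothetical, and hashing the key with SHA-256 is just one simple placement scheme chosen for this example.

```python
import hashlib

SHARDS = ["db0", "db1", "db2", "db3"]   # hypothetical database servers

def shard_for(key, shards=SHARDS):
    """Map a key to a shard by hashing it, so the same key always
    lands on the same server."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]

for user in ("user:1", "user:2", "user:3"):
    print(user, "->", shard_for(user))

# The same key is always routed to the same shard.
print(shard_for("user:1") == shard_for("user:1"))   # True
```

A join between rows on different shards cannot be done by either database alone, and changing the number of shards moves most keys with this simple modulo scheme.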
NoSQL and RDBMS
Brewer’s CAP Theorem
Web services can ensure at most 2 of the following 3 properties:
– Strong consistency. All clients perceive that a set of operations has occurred completely or not at all.
– Availability. All clients can read or write to some replica of the data, even if some nodes fail.
– Partition tolerance. Operations will complete, even if individual components are unavailable.
Distributed systems must be partition tolerant, so we have to choose between Consistency and Availability.
Brewer’s CAP Theorem
Eventual Consistency
Eventual consistency is a specific form of weak consistency, which guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.
Example: DNS
– Owner makes DNS changes on local server
– Servers update name after old name TTL expires
– Clients see updated name after their caches expire
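The DNS behaviour can be imitated with a cache that keeps answers until a TTL expires. This is a toy model rather than a DNS implementation: the class name is invented, time is passed in explicitly, and the addresses come from the documentation range 192.0.2.0/24.

```python
class CachingResolver:
    """Caches answers from an authoritative table until their TTL expires."""

    def __init__(self, authoritative, ttl):
        self.authoritative = authoritative   # dict: name -> address
        self.ttl = ttl
        self.cache = {}                      # name -> (address, expiry time)

    def lookup(self, name, now):
        entry = self.cache.get(name)
        if entry is None or now >= entry[1]:
            address = self.authoritative[name]          # refresh from source
            self.cache[name] = (address, now + self.ttl)
            return address
        return entry[0]                                 # possibly stale answer

records = {"www.example.com": "192.0.2.1"}
resolver = CachingResolver(records, ttl=300)

print(resolver.lookup("www.example.com", now=0))     # 192.0.2.1
records["www.example.com"] = "192.0.2.2"             # owner changes the record
print(resolver.lookup("www.example.com", now=100))   # still 192.0.2.1 (cached)
print(resolver.lookup("www.example.com", now=300))   # 192.0.2.2 after TTL expiry
```

Between the update and the cache expiry, clients read an old value; once no further updates occur, every lookup converges on the new one.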
Reverse Proxy Caching
Database Caching
Security and Privacy
No Security Perimeter • Little control over physical or network location of cloud instance VMs • Network access must be controlled on a host by host basis.
Use iptables and EC2 Security Groups
Larger Attack Surface
Cloud Provider
Your Network
Virtualization Security
VMs provide isolation and new attack vectors
– Attacks from other VMs on same host
– Attacks on hypervisor
– Side channels allow other VMs to obtain data
VM snapshots can remove malware or add it
– Transient VMs may not be updated
– Snapshots can revert to vulnerable or infected states
Data Security
                  Storage                Processing               Transmission
Confidentiality   Symmetric Encryption   Homomorphic Encryption   SSL
Integrity         MAC                    Homomorphic Encryption   SSL
Availability      Redundancy             Redundancy               Redundancy
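As an illustration of the storage integrity entry, the sketch below computes a MAC over a piece of data before it is stored and verifies it after it is read back. It uses HMAC-SHA256 from the Python standard library. The key and data values are placeholders, and key management is out of scope here.

```python
import hashlib
import hmac

def make_tag(key: bytes, data: bytes) -> bytes:
    """Compute an HMAC-SHA256 tag to store alongside the data."""
    return hmac.new(key, data, hashlib.sha256).digest()

def verify(key: bytes, data: bytes, tag: bytes) -> bool:
    """Check the tag in constant time; False means the data was altered."""
    return hmac.compare_digest(make_tag(key, data), tag)

key = b"placeholder-secret-key"        # assumption: kept outside the cloud store
data = b"quarterly-report contents"
tag = make_tag(key, data)

print(verify(key, data, tag))                 # True: data is unchanged
print(verify(key, data + b" tampered", tag))  # False: modification detected
```

A MAC detects modification but does not hide the data; confidentiality needs encryption as well.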
Case Study: Wikipedia
Wikipedia Architecture 2004
Wikipedia Architecture 2005
Wikipedia Architecture 2006
Wikipedia Architecture 2008
Wikipedia Architecture 2010
Wikipedia Shard Architecture
Case Study: Quora
• You can ask questions
• You can answer questions
• You can comment on answered questions
• You can vote-up or vote-down answers to questions
• Questions can be assigned to topics
• You can write a post (an informative statement, rather like an orphaned answer or blog post)
• You can follow questions, topics or other users
• Auto-complete search-box at the top, which doubles as the method for entering new questions
Quora Technologies
• AMI is Ubuntu Linux
• Static content uses Cloudfront CDN
• Load balancing
  – HAProxy for front-end
  – nginx as reverse-proxy to web servers
• Web server
  – Pylons, a lightweight Python web framework
• MySQL used as NoSQL data store
  – Sharded onto multiple servers
  – No joins or normalized schemas
  – memcached used as caching layer
http://www.quora.com/What-languages-and-frameworks-were-used-to-code-Quora
Schema-less data in MySQL
CREATE TABLE entities (
    added_id INT PRIMARY KEY,
    id BINARY(16) NOT NULL,
    updated TIMESTAMP NOT NULL,
    body MEDIUMBLOB,
    UNIQUE KEY (id),
    KEY (updated)
) ENGINE=InnoDB;
http://bret.appspot.com/entry/how-friendfeed-uses-mysql
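The idea behind the `entities` table is that all application fields live inside the opaque `body` column, so adding a field needs no schema change. The sketch below imitates that pattern with several stand-ins that are assumptions of this example: SQLite from the Python standard library replaces MySQL, the body is JSON compressed with zlib, and the helper names `save_entity` and `load_entity` are made up.

```python
import json
import sqlite3
import uuid
import zlib

conn = sqlite3.connect(":memory:")
# SQLite approximation of the entities table (no ENGINE clause, BLOB types).
conn.execute("""
    CREATE TABLE entities (
        added_id INTEGER PRIMARY KEY,
        id BLOB NOT NULL UNIQUE,
        updated TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        body BLOB
    )
""")

def save_entity(conn, fields):
    """Serialize an arbitrary dict into the body column; return its 16-byte id."""
    entity_id = uuid.uuid4().bytes
    body = zlib.compress(json.dumps(fields).encode("utf-8"))
    conn.execute("INSERT INTO entities (id, body) VALUES (?, ?)",
                 (entity_id, body))
    return entity_id

def load_entity(conn, entity_id):
    """Fetch and deserialize an entity, or return None if it does not exist."""
    row = conn.execute("SELECT body FROM entities WHERE id = ?",
                       (entity_id,)).fetchone()
    if row is None:
        return None
    return json.loads(zlib.decompress(row[0]).decode("utf-8"))

# Two entities with different fields share one table and one schema.
a = save_entity(conn, {"title": "First post", "tags": ["intro"]})
b = save_entity(conn, {"title": "Second post", "link": "http://example.com/"})
print(load_entity(conn, a))
print(load_entity(conn, b))
```

The trade-off is that the database cannot index or query fields inside `body`; lookups by anything other than `id` need separately maintained index tables.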