
CIT 668: System Architecture



1. What is Architecture?
2. What is Cloud Computing?
3. Computer Architecture and Parallelism
4. Data Centers
5. High Availability and Load Balancing
6. Distributed and NoSQL Databases
7. Security and Privacy

What is Architecture?

architecture (n): the complex or carefully designed structure of something

Specifically in computing: the conceptual structure and logical organization of a computer or computer-based system: a client/server architecture

- http://oxforddictionaries.com/

1 4/27/2011

Cloud Computing

What is Cloud Computing?

“Cloud computing is a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” NIST definition of Cloud Computing

Cloud Service Models Abstraction Layers


Cloud Deployment Architectures

Cloud Computing Advantages

• Flexibility • Scalability • Cost • Maintenance • Utilization • Power

Cloud is enabled by Virtualization

[Figure: virtual machines (Linux, BSD, W2k8) running on a single physical machine]


Computer Architecture

A Single CPU Computer Components

The 5 Von Neumann Components





Processor-Memory Bottleneck

Solution: Caches

Principle of Locality

Programs tend to reuse data and instructions near those they have used recently.

Temporal locality: Recently referenced items are likely to be referenced in the near future.

Spatial locality: Items with nearby addresses tend to be referenced close together in time.

Caches

Cache: A smaller, faster storage device that transparently stores a subset of the data in a larger, slower device so that future requests for that data can be served more quickly.

Average Memory Access Time = Time for a Cache Hit + Miss Rate × Miss Penalty
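The formula above can be tried out with a few lines of Python. This is a minimal sketch; the function name and the sample numbers (1 ns hit time, 5% miss rate, 100 ns miss penalty) are illustrative assumptions, not measurements from the slides.

```python
def average_memory_access_time(hit_time_ns, miss_rate, miss_penalty_ns):
    """AMAT = time for a cache hit + miss rate * miss penalty (times in ns)."""
    return hit_time_ns + miss_rate * miss_penalty_ns


# Hypothetical numbers: 1 ns cache hit, 5% miss rate, 100 ns miss penalty.
amat_ns = average_memory_access_time(1.0, 0.05, 100.0)
print(f"{amat_ns:.1f} ns")  # 6.0 ns
```

Even a small miss rate dominates the average when the miss penalty is two orders of magnitude larger than the hit time.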


Memory Hierarchy

The Processor

• The Brain: a functional unit that interprets and carries out instructions (e.g., mathematical operations)
• Also called a CPU (actually includes a control unit (CU) + ALU)
• Consists of hundreds of millions of transistors.

Moore’s Law – Number of transistors on a chip doubles roughly every 18 to 24 months

More transistors =

Cheaper CPUs

Higher speeds

More features

More cache



Improvements in CPU Clock Speed

Serial and Parallel Computation



Flynn’s Taxonomy

                 Single Instruction          Multiple Instruction
Single Data      SISD: Pentium III           MISD: None today
Multiple Data    SIMD: SSE instruction set   MIMD: Xeon e5345 (Clovertown)


Instruction Level Parallelism (ILP)

Running independent instructions on separate execution units simultaneously.

Serial Execution: If each instruction takes one cycle, it takes 3 clock cycles to run the program.

    x = a + b
    y = c + d
    z = x + y

Parallel Execution:
– First two instructions are independent, so they can be executed simultaneously.
– Third instruction depends on the first two, so it must be executed afterwards.
– Two clock cycles to run the program.
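The cycle counts for the three-instruction program can be checked with a small dependency scheduler. This is a sketch under simplifying assumptions that are not in the slides: every instruction takes one cycle and there are unlimited execution units. The function name `cycles_needed` is hypothetical.

```python
def cycles_needed(program):
    """Return the clock cycles needed when independent instructions run together.

    `program` maps each result name to the operand names it reads, in program
    order. Assumes one cycle per instruction and unlimited execution units.
    """
    finish_cycle = {}
    for result, operands in program.items():
        # An instruction starts once every operand computed by an earlier
        # instruction is available; plain inputs (a, b, ...) are ready at cycle 0.
        start_after = max((finish_cycle.get(op, 0) for op in operands), default=0)
        finish_cycle[result] = start_after + 1
    return max(finish_cycle.values())


program = {"x": ("a", "b"), "y": ("c", "d"), "z": ("x", "y")}
serial_cycles = len(program)              # one instruction per cycle
parallel_cycles = cycles_needed(program)  # x and y share a cycle, z follows
print(serial_cycles, parallel_cycles)     # 3 2
```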

Superscalar Architecture

Instead of one ALU, use multiple execution units
– Some execution units are identical to others
– Others are different: Integer, FPU, multi-media


• Multicore CPU chips contain multiple complete processors
• Individual L1 and shared L2 caches
• OS and applications see each core as an independent processor
• Each core can run a separate task
• A single application must be divided into multiple tasks to improve performance


Amdahl’s Law

Speedup due to enhancement E is

\[ \text{Speedup} = \frac{\text{Execution time without } E}{\text{Execution time with } E} \]

Suppose E accelerates a fraction P (P < 1) of the task by a factor S (S > 1) and the remainder is unaffected.

\[ \text{Exec time with } E = \text{Exec time w/o } E \times \left[ (1 - P) + \frac{P}{S} \right] \]

\[ \text{Speedup} = \frac{1}{(1 - P) + P/S} \]
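As a quick check of the formula, here is a minimal Python sketch. The function name `amdahl_speedup` and the sample values are illustrative assumptions.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the work is accelerated by factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if s <= 0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


# Accelerating half of the work by 2x gives well under 2x overall.
print(round(amdahl_speedup(0.5, 2.0), 3))  # 1.333
# Even a huge acceleration of that half cannot beat 1 / (1 - 0.5) = 2.
print(round(amdahl_speedup(0.5, 1e9), 3))  # 2.0
```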

Amdahl’s Law: Example

Consider an application whose work is divided into the following four components:

Work Load   Memory Access   Computation   Disk Access   Network Access
Time        10%             70%           10%           10%

What is the expected percent improvement (reduction in running time) if:

Memory access speed is doubled? 5%

Computation speed is doubled? 35%
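The two answers follow from the Amdahl term P - P/S, the share of total running time removed when a fraction P runs S times faster. The sketch below assumes memory access is 10% and computation is 70% of the running time, which is the assignment the stated answers imply; the function name is hypothetical.

```python
def time_saved(fraction, factor):
    """Share of total running time removed when `fraction` of the work
    runs `factor` times faster (the P - P/S term in Amdahl's Law)."""
    return fraction - fraction / factor


# Assumed workload shares: memory access 10%, computation 70%.
for name, fraction in {"memory access": 0.10, "computation": 0.70}.items():
    print(f"{name}: {time_saved(fraction, 2):.0%} less running time")
# memory access: 5% less running time
# computation: 35% less running time
```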

Amdahl’s Law for Parallelization

\[ \text{Speedup} = \frac{1}{(1 - P) + P/S} \]

• Let P be the parallelizable portion of code
• As the number of processors increases, the time to do the parallel portion of the program, P/S, tends towards zero, reducing the equation to:

\[ \text{Speedup} = \frac{1}{1 - P} \]

• If P=0, then speedup=1 (no improvement)
• If P=1, then speedup grows without limit.
• If P=0.5, then maximum speedup is 2.


Amdahl’s Law: Parallel Example

Consider an application whose work is divided into the following four functions:

Work Load   f1   f2    f3    f4
Time        4%   10%   80%   6%

Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially.

Parallelizing which part would best improve performance? f3

What is the best performance speedup that could be reached by parallelizing all three parallelizable functions? 10X
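Both answers can be reproduced from the limit Speedup = 1 / (1 - P). The sketch below works in whole percentages to avoid floating-point noise and assumes the parallelized parts shrink to zero time; the names `load_percent` and `max_speedup` are hypothetical.

```python
load_percent = {"f1": 4, "f2": 10, "f3": 80, "f4": 6}
parallelizable = {"f1", "f3", "f4"}


def max_speedup(parallel_parts):
    """Upper bound on speedup if the given functions took no time at all."""
    remaining = sum(t for f, t in load_percent.items() if f not in parallel_parts)
    return 100 / remaining


# Which single function is the best candidate for parallelization?
best_single = max(parallelizable, key=lambda f: max_speedup({f}))
print(best_single, max_speedup({best_single}))  # f3 5.0

# Parallelizing all three leaves only f2 (10% of the time).
print(max_speedup(parallelizable))  # 10.0
```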

Amdahl’s Law: Time Example

Consider an application whose work is divided into the following four functions:

Work Load   f1    f2    f3     f4
Time        2ms   5ms   40ms   3ms

Assume f1, f3, and f4 can be parallelized, but f2 must be computed serially. Assume that running the whole program takes 50ms.

What is the best running time that can be achieved by parallelizing f1, f3, and f4? 5ms

Why can’t parallelizing the program decrease the total running time below that time? 5ms is the time required for the serial part. Even if the parallel part takes 0ms, f2 still takes 5ms to run.
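To see the 5ms floor emerge, the running time can be written as a function of the processor count. This sketch assumes the parallel work divides perfectly across processors with no coordination overhead, which real programs rarely achieve.

```python
SERIAL_MS = 5             # f2 must run serially
PARALLEL_MS = 2 + 40 + 3  # f1 + f3 + f4 can be spread across processors


def running_time_ms(processors):
    """Total running time assuming perfect division of the parallel work."""
    return SERIAL_MS + PARALLEL_MS / processors


for n in (1, 9, 45, 1_000_000):
    print(n, running_time_ms(n))
# 1 processor gives 50.0 ms, 9 give 10.0 ms, 45 give 6.0 ms;
# a million processors still need slightly more than 5 ms.
```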

Amdahl’s Law



Vertical Scaling

Plenty of Fish

• 1.2 billion page views per month, 500,000 average unique logins per day
• 30+ million hits per day, 500-600 per second
• 45 million visitors per month
• top 30 site in the US, top 10 in Canada, top 30 in the UK
• 2 load balanced Windows Server 2003 x64 web servers with 2 Quad Core 2.66 GHz CPUs, 8 GB RAM, 2 hard drives
• 3 DB servers. No data on their configuration
• Approaching 64,000 simultaneous connections and 2 million page views per hour
• Internet connection is a 1 Gbps line, of which 200 Mbps is used
• 1 TB per day serving 171 million images through Akamai
• 6 TB storage array to handle millions of full sized images uploaded every month to the site
http://highscalability.com/plentyoffish-architecture


Plenty of Fish Scaling

“We upgraded from a machine with 64 GB of ram and 8 CPU’s to a HP ProLiant DL785 with 512 GB of ram and 32 CPU’s and moved from SQLserver 2005 to 2008 and windows 2008.”
– Markus, https://plentyoffish.wordpress.com/2009/06/14/upgrades-themes-date-night/

Estimated cost ~ $100,000

Horizontal Scaling

Googol = 10^100

Large Container Data Centers
• > 10^6 servers

First Rack
• 8 CPUs
• 200 GB

Sun Ultra 2
• 2 200 MHz processors


Horizontal vs. Vertical Scaling Example

• Total budget is $100,000
• Vertical: PoF HP ProLiant DL785, 32 CPUs, 512 GB
• Horizontal: 83 1U servers for $1150 each

  Lenovo ThinkServer RS110 barebones          $600
  8 GB RAM                                    $100
  2 x eBay drive brackets                     $50
  2 x 500 GB SATA hard drives, mirrored       $100
  Intel Xeon X3360 2.83 GHz quad-core CPU     $300

• Comparison:

          Scaling Up   Scaling Out
  CPUs    32           332
  RAM     512 GB       664 GB
  Disk    4 TB         40.5 TB

http://www.codinghorror.com/blog/2009/06/scaling-up-vs-scaling-out-hidden-costs.html
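The scaling-out column can be rebuilt from the parts list. The sketch below takes the slide's figure of 83 servers as given and assumes each server contributes 4 CPU cores, 8 GB of RAM, and 500 GB of usable disk (one mirrored pair); the variable names are illustrative.

```python
parts_usd = {
    "Lenovo ThinkServer RS110 barebones": 600,
    "8 GB RAM": 100,
    "2 x eBay drive brackets": 50,
    "2 x 500 GB SATA hard drives, mirrored": 100,
    "Intel Xeon X3360 2.83 GHz quad-core CPU": 300,
}

server_cost_usd = sum(parts_usd.values())
servers = 83  # count taken from the slide

total_cost_usd = servers * server_cost_usd
cpu_cores = servers * 4        # one quad-core CPU per server
ram_gb = servers * 8
usable_disk_gb = servers * 500  # mirroring halves the raw 2 x 500 GB
usable_disk_tb = usable_disk_gb / 1024

print(server_cost_usd, total_cost_usd)  # 1150 95450
print(cpu_cores, ram_gb)                # 332 664
print(round(usable_disk_tb, 1))         # 40.5
```

The totals cover hardware only; the linked article's point is that power, rack space, and software licensing change the comparison.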

Distributed System Types

Shared Memory
• All CPUs share memory/disk
• Scalability limited by memory contention (vertical scaling only)

Shared Disk
• CPUs share storage, not RAM
• Scalability limited by disk contention (vertical scaling only)

Shared Nothing
• Each CPU has its own RAM and disks
• Very high (horizontal) scalability since no contention for shared resources

Data Centers


Data Center Components

Measuring Power Efficiency

PUE is the ratio of total building power to IT power; it measures the efficiency of the datacenter building infrastructure.

SPUE is the ratio of total server input power to its useful power, where useful power is the power consumed by CPU, DRAM, disk, motherboard, etc. Useful power excludes losses due to power supplies, fans, etc.

Computation efficiency depends on software and workload and measures useful work done per watt.
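The two ratios are simple to compute once the power readings are known. The sketch below uses made-up wattages purely for illustration; multiplying the ratios gives the watts the building draws for each watt that reaches the electronic components.

```python
def pue(total_building_watts, it_watts):
    """Power Usage Effectiveness: building power per watt of IT power."""
    return total_building_watts / it_watts


def spue(server_input_watts, useful_watts):
    """Server PUE: server input power per watt reaching CPU, DRAM, disk, etc."""
    return server_input_watts / useful_watts


# Hypothetical readings, not taken from any real facility.
facility_pue = pue(total_building_watts=1_500_000, it_watts=1_000_000)
server_spue = spue(server_input_watts=500, useful_watts=400)

print(facility_pue)                # 1.5
print(server_spue)                 # 1.25
print(facility_pue * server_spue)  # 1.875 building watts per useful watt
```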

Improving Power Efficiency


Performance Impact of App Living Across Data Center





Total Cost of Ownership (TCO)

TCO = Data Center Depreciation + Data Center Operating Expenses (Opex) + Server Depreciation + Server Operating Expenses (Opex)

Depreciation is the process of allocating the cost of assets across the period during which the assets are used.

Example: server cost = $10,000, $0 residual value; annual depreciation over 4 years = $2500
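The server example uses straight-line depreciation, which spreads the cost evenly over the asset's life. A minimal sketch, with a hypothetical function name:

```python
def annual_depreciation(cost, residual_value, years):
    """Straight-line depreciation: equal write-off in each year of use."""
    return (cost - residual_value) / years


print(annual_depreciation(cost=10_000, residual_value=0, years=4))  # 2500.0
```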

High Availability and Load Balancing


Load Balancing and High Availability

Load Balancing High Availability

round-robin DNS reverse proxy heartbeat data partitioning wackamole hot spare


Availability

\[ A = \frac{MTBF}{MTBF + MTTR} \]

MTBF = Mean Time Between Failures
MTTR = Mean Time To Repair

Example: MTBF = 1000 hours, MTTR = 1 hour

\[ A = \frac{1000}{1000 + 1} \approx 0.999001 \approx 99.9\% \]
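The availability figure translates directly into expected downtime. The sketch below recomputes the example and converts it to hours per year, assuming a 365-day year; the function name is hypothetical.

```python
def availability(mtbf_hours, mttr_hours):
    """Fraction of time the system is up: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


a = availability(mtbf_hours=1000, mttr_hours=1)
downtime_hours_per_year = (1 - a) * 365 * 24

print(f"{a:.6f}")                        # 0.999001
print(f"{downtime_hours_per_year:.2f}")  # 8.75
```

"Three nines" therefore still allows close to nine hours of downtime per year.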

Single Point of Failure

• A single point of failure (SPOF) is a component which will cause the entire system to fail if it fails. • A fault tolerant system cannot have a single point of failure, and so has redundant components.


Failover Requirements

Transparency: Failover should not be noticeable by clients.
Speed: Failover should happen quickly, so that there is only a short downtime while it occurs.
Automatic: Failover should not require a sysadmin to intervene.
Consistent: Clients should see the same data on the failover server as they saw on the original server prior to failover.

Failover Components

Move from failed server to the failover server
– Network identity: DNS name, IP address, or MAC address must be transferred, depending on what layer of the protocol stack the service functions on.
– Data: Usually must be accomplished by a shared storage system: NAS or SAN.
– Processes: Server processes associated with the data must be restarted once data and network identity are transferred.


• Heartbeat is the signal that failover pairs use to communicate their status with each other.
• Heartbeat signals are typically sent over a dedicated network between the pair, so that other network issues won’t interfere.
• When heartbeat signals are not received by the secondary host for a specified time, it will start the failover process.
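The timeout rule in the last bullet can be sketched as a small class run on the secondary host. This is an illustration only, not the behavior of any particular heartbeat package: the class name, the 3-second timeout, and the use of explicit timestamps (instead of a real clock and network socket) are all assumptions.

```python
class HeartbeatMonitor:
    """Tracks the primary's heartbeats as seen by the secondary host."""

    def __init__(self, timeout_seconds, now=0.0):
        self.timeout_seconds = timeout_seconds
        self.last_heartbeat = now

    def record_heartbeat(self, now):
        """Call whenever a heartbeat arrives from the primary."""
        self.last_heartbeat = now

    def should_fail_over(self, now):
        """True once the primary has been silent longer than the timeout."""
        return now - self.last_heartbeat > self.timeout_seconds


monitor = HeartbeatMonitor(timeout_seconds=3.0)
monitor.record_heartbeat(now=1.0)
print(monitor.should_fail_over(now=2.0))  # False: heartbeat seen 1 s ago
print(monitor.should_fail_over(now=5.0))  # True: 4 s of silence
```

Choosing the timeout is a trade-off: too short and a brief network hiccup triggers an unnecessary failover, too long and clients see extended downtime.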


Load Balanced Servers

Internet • Round robin DNS • Layer 4 load balancer • Web switch • Reverse proxy

Server 1 Server 2 … Server N

Load Balancing + Failover
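One way to picture load balancing combined with failover is a round-robin picker that skips servers marked as down. The sketch below is a toy model, not how any specific load balancer or web switch is implemented; the class and server names are hypothetical, and health checking is reduced to a manual `mark_down` call.

```python
class RoundRobinBalancer:
    """Hands out servers in rotation, skipping any that are marked down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0

    def mark_down(self, server):
        self.healthy.discard(server)

    def mark_up(self, server):
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting from the rotation position.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


balancer = RoundRobinBalancer(["server1", "server2", "server3"])
print([balancer.pick() for _ in range(4)])
# ['server1', 'server2', 'server3', 'server1']
balancer.mark_down("server2")
print([balancer.pick() for _ in range(3)])
# ['server3', 'server1', 'server3']
```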



Database Scaling Techniques

[Figure: database scaling techniques
 Base case: a 1 TPS system (1 TPS server, 100 users)
 Scale up: to a 2 TPS centralized system (2 TPS server, 200 users)
 Partitioning: two 1 TPS systems (two 1 TPS servers, 100 users each)
 Two 2 TPS systems (two 2 TPS servers, 100 users each)
 Link labels between paired systems: 0 tps, 1 tps]

Master/Slave Replication
• Slave DBs only accept read operations from the application.
  – Flickr.com DBs logged 13 SELECTs for each write.
• Master DB
  – Accepts write operations.
  – Copies write operations to slave DBs.
  – Single point of failure!
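On the application side, this split usually shows up as a routing decision per statement. The sketch below is a simplified illustration: the class name and host names are hypothetical, and treating every statement that starts with SELECT as a read is an assumption that ignores transactions and replication lag.

```python
import itertools


class ReplicatedDatabaseRouter:
    """Sends reads to slaves in rotation and everything else to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves)

    def route(self, sql):
        is_read = sql.lstrip().upper().startswith("SELECT")
        return next(self._slaves) if is_read else self.master


router = ReplicatedDatabaseRouter("master-db", ["slave-db-1", "slave-db-2"])
print(router.route("SELECT * FROM photos WHERE id = 7"))   # slave-db-1
print(router.route("UPDATE photos SET views = views + 1")) # master-db
print(router.route("select title from photos"))            # slave-db-2
```

With a read-heavy mix such as the 13:1 ratio quoted above, adding slaves spreads most of the load, but every write still funnels through the single master.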

Two-phase commit

1. Request phase: transaction coordinator asks nodes to try to commit. Nodes execute the transaction up to the point where they will be asked to commit. Each node returns commit/abort.
2. Commit phase: if any node returns abort, coordinator tells all nodes to rollback; if all nodes return commit, then coordinator tells all to commit and nodes send acknowledgement to coordinator when done.

http://xml.sys-con.com/node/43755
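The two phases can be sketched in a few lines of Python. This is a toy, in-process model under stated assumptions: the `Participant` class and its methods are hypothetical, and acknowledgements, timeouts, logging, and coordinator failure are all left out.

```python
class Participant:
    """A node that votes in the request phase and then obeys the coordinator."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self):
        # Request phase: do the work up to the commit point and vote.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def rollback(self):
        self.state = "rolled back"


def two_phase_commit(participants):
    """Return True if the transaction committed on every node."""
    votes = [node.prepare() for node in participants]  # phase 1: collect votes
    if all(votes):                                     # phase 2: decide
        for node in participants:
            node.commit()
        return True
    for node in participants:
        node.rollback()
    return False


nodes = [Participant("db1"), Participant("db2", can_commit=False)]
print(two_phase_commit(nodes))         # False: db2 voted abort
print([node.state for node in nodes])  # ['rolled back', 'rolled back']
```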


What is NoSQL?

“NoSQL is the term used to designate database management systems that differ from classic relational database management systems in some way. These data stores may not require fixed table schemas, and usually avoid join operations and typically scale horizontally.” -- Wikipedia

Non-relational may be more accurate than NoSQL, as some NoSQL DBs support a subset of SQL.

Why not stick to relational DBs?

Limited scalability
– Master/slave clusters are limited by write bandwidth and have a SPOF.
– Partitioning requires that you rewrite your application to find your data, and does not scale joins.

Availability is more important than consistency
– An RDBMS will make data unavailable until it is consistent on each node.
– In large clusters or if a node is down, the period of unavailability can be too long.



Brewer’s CAP Theorem

Web services can ensure at most 2 of the following 3 properties:
– Strong consistency. All clients perceive that a set of operations has occurred completely or not at all.
– Availability. All clients can read or write to some replica of the data, even if some nodes fail.
– Partition tolerance. Operations will complete, even if individual components are unavailable.

Distributed systems must be partition tolerant, so we have to choose between Consistency and Availability.

Brewer’s CAP Theorem

Eventual Consistency

Eventual consistency is a specific form of weak consistency, which guarantees that if no new updates are made to the object, eventually all accesses will return the last updated value.

Example: DNS
– Owner makes DNS changes on local server
– Servers update name after old name TTL expires
– Clients see updated name after their caches expire
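The DNS behavior can be imitated with a cache that honors a time-to-live. The sketch below is a toy model, not a DNS implementation: the class name, the 300-second TTL, and the example name and addresses (taken from documentation-reserved ranges) are assumptions, and time is passed in explicitly to keep the example deterministic.

```python
class CachingResolver:
    """Answers from a local cache until the entry's TTL expires."""

    def __init__(self, authoritative, ttl_seconds):
        self.authoritative = authoritative  # dict acting as the owner's server
        self.ttl_seconds = ttl_seconds
        self.cache = {}                     # name -> (address, expiry time)

    def lookup(self, name, now):
        entry = self.cache.get(name)
        if entry is not None and now < entry[1]:
            return entry[0]                 # possibly stale, but still cached
        address = self.authoritative[name]
        self.cache[name] = (address, now + self.ttl_seconds)
        return address


records = {"www.example.com": "192.0.2.1"}
resolver = CachingResolver(records, ttl_seconds=300)

print(resolver.lookup("www.example.com", now=0))    # 192.0.2.1
records["www.example.com"] = "192.0.2.2"            # owner changes the record
print(resolver.lookup("www.example.com", now=10))   # 192.0.2.1 (stale)
print(resolver.lookup("www.example.com", now=300))  # 192.0.2.2 (converged)
```

Between the update and the TTL expiry, different clients can see different answers; once no further updates arrive, every cache converges on the last value.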


Reverse Proxy Caching

Database Caching

Security and Privacy


No Security Perimeter
• Little control over physical or network location of cloud instance VMs
• Network access must be controlled on a host by host basis.

Use iptables and EC2 Security Groups

Larger Attack Surface

Cloud Provider

Your Network


Virtualization Security

VMs provide isolation and new attack vectors
– Attacks from other VMs on same host
– Attacks on hypervisor
– Side channels allow other VMs to obtain data

VM snapshots can remove malware or add it
– Transient VMs may not be updated
– Snapshots can revert to vulnerable or infected states

Data Security

                  Storage                Processing               Transmission
Confidentiality   Symmetric Encryption   Homomorphic Encryption   SSL
Integrity         MAC                    Homomorphic Encryption   SSL
Availability      Redundancy             Redundancy               Redundancy

Case Study: Wikipedia


Wikipedia Architecture 2004

Wikipedia Architecture 2005

Wikipedia Architecture 2006


Wikipedia Architecture 2008

Wikipedia Architecture 2010

Wikipedia Shard Architecture


Case Study: Quora

• You can ask questions
• You can answer questions
• You can comment on answered questions
• You can vote-up or vote-down answers to questions
• Questions can be assigned to topics
• You can write a post (an informative statement, rather like an orphaned answer or blog post)
• You can follow questions, topics or other users
• Auto-complete search-box at the top, which doubles as the method for entering new questions

Quora Technologies
• AMI is Ubuntu Linux
• Static content uses Cloudfront CDN
• Load balancing
  – HAProxy for front-end
  – nginx as reverse-proxy to web servers
• Pylons, a lightweight Python web framework
• MySQL used as NoSQL data store
  – Sharded onto multiple servers
  – No joins or normalized schemas
  – memcached used as caching layer
http://www.quora.com/What-languages-and-frameworks-were-used-to-code-Quora
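Sharding means the application, not the database, decides which server holds a given row. The sketch below shows one common approach, hashing the key and taking the remainder; it is not a description of Quora's actual scheme, and the shard host names and key format are hypothetical.

```python
import hashlib

# Hypothetical shard hosts; a real deployment would load these from config.
SHARDS = ["mysql-shard-0", "mysql-shard-1", "mysql-shard-2", "mysql-shard-3"]


def shard_for(key, shards=SHARDS):
    """Map a key to a shard using a stable hash, so every web server agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# The same key always lands on the same shard.
print(shard_for("question:12345") == shard_for("question:12345"))  # True
```

Because related rows may live on different servers, joins across shards are not available, which fits the "no joins or normalized schemas" point above. Simple modulo hashing also remaps most keys when the number of shards changes.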


Schema-less data in MySQL

