Distributed Systems CIS 505: Software Systems

Insup Lee
Department of Computer and Information Science
University of Pennsylvania

CIS 505, Spring 2007

Introduction to Distributed Systems

• Why distributed systems?
  o availability of powerful yet cheap microprocessors (PCs, workstations, PDAs, embedded systems, etc.)
  o continuing advances in communication technology
• What is a distributed system?
  o A distributed system is a collection of independent computers that appear to the users of the system as a single coherent system.


Examples

• The world wide web – information, resource sharing
• Clusters, Network of workstations
• Distributed manufacturing system (e.g., automated assembly line)
• Network of branch office computers - Information system to handle automatic processing of orders
• Network of embedded systems
• New Cell processor (PlayStation 3)

Advantages and disadvantages

• Advantages
  o Economics
  o Speed
  o Inherent distribution
  o Reliability
  o Incremental growth
• Disadvantages
  o Software
  o Network
  o More components to fail
  o Security


Organization of a Distributed System

Figure 1.1. A distributed system organized as middleware. Note that the middleware layer extends over multiple machines.

Goals of Distributed Systems

• Transparency
• Openness
• Reliability
• Performance
• Scalability


Transparency

• How to achieve the single-system image, i.e., how to make a collection of computers appear as a single computer.
• Hiding all the distribution from the users as well as the application programs can be achieved at two levels:
  1) hide the distribution from users
  2) at a lower level, make the system look transparent to programs.
  1) and 2) require uniform interfaces such as access to files, communication.

Transparency in a Distributed System

Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is located
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk

Different forms of transparency in a distributed system.


Openness

• Make it easier to build and change
• Monolithic Kernel: system calls are trapped and executed by the kernel. All system calls are served by the kernel, e.g., UNIX.
• Microkernel: provides minimal services.
  • IPC
  • some memory management
  • some low-level process management and scheduling
  • low-level i/o
  (E.g., Mach can support multiple file systems, multiple system interfaces.)
• Standard interface, separation of policy from mechanism

Reliability

• Distributed system should be more reliable than single system.
  – Availability: fraction of time the system is usable. Redundancy improves it.
  – Need to maintain consistency
  – Need to be secure
  – Fault tolerance: need to mask failures, recover from errors.
• Example: 3 machines, each with .95 probability of being up
  • (1-.05)**3 (all three must be up) vs 1-.05**3 (any one suffices) probability of being up
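The two expressions in the reliability example answer different questions, and a short calculation makes the gap concrete. This is a minimal sketch that assumes the three machines fail independently (the slide does not state this); the function names are hypothetical.

```python
def all_up(p_up: float, n: int) -> float:
    """Probability that all n machines are up (every machine is required)."""
    return p_up ** n


def at_least_one_up(p_up: float, n: int) -> float:
    """Probability that at least one of n redundant machines is up."""
    return 1.0 - (1.0 - p_up) ** n


if __name__ == "__main__":
    p, n = 0.95, 3
    print(f"all {n} machines up:  {all_up(p, n):.6f}")
    print(f"at least one up:    {at_least_one_up(p, n):.6f}")
```

Under the independence assumption, a system that needs every machine is less available than a single machine, while a system that can run on any one replica is far more available.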


Performance

• Without gain on this, why bother with distributed systems.
• Performance loss due to communication delays:
  – fine-grain parallelism: high degree of interaction
  – coarse-grain parallelism
• Performance loss due to making the system fault tolerant.

Scalability

• Systems grow with time or become obsolete.
• Techniques that require resources linearly in terms of the size of the system are not scalable. (e.g., broadcast based query won't work for large distributed systems.)
• Examples of bottlenecks (i.e., scalability limitations)
  o Centralized components: a single mail server
  o Centralized tables/data: a single URL address book
  o Centralized algorithms: routing based on complete information


Scalability Problems

Characteristics of decentralized algorithms:
• No machine has complete information about the system state.
• Machines make decisions based only on local information.
• Failure of one machine does not ruin the algorithm.
• There is no implicit assumption that a global clock exists.

Scaling Techniques (1)

Figure 1.4. The difference between letting: a) a server or b) a client check forms as they are being filled
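The trade-off between server-side and client-side form checking can be sketched by counting round trips. This is an illustrative model only: the form fields, the validation rules, and the function names are hypothetical, and it assumes that server-side checking costs one request-response exchange per field.

```python
from typing import Callable, Dict

Validator = Callable[[str], bool]

# Hypothetical per-field checks; a real form would define its own rules.
CHECKS: Dict[str, Validator] = {
    "first_name": lambda v: v.isalpha(),
    "last_name": lambda v: v.isalpha(),
    "email": lambda v: "@" in v,
}


def round_trips_server_side(form: Dict[str, str]) -> int:
    """The server checks each field as it is filled in: one round trip per field."""
    return len(form)


def round_trips_client_side(form: Dict[str, str]) -> int:
    """The client checks all fields locally and ships a valid form once."""
    if all(CHECKS[name](value) for name, value in form.items()):
        return 1  # a single submission of the completed form
    return 0      # invalid input never leaves the client


if __name__ == "__main__":
    form = {"first_name": "Maarten", "last_name": "Steen", "email": "a@b.c"}
    print("server-side checks:", round_trips_server_side(form), "round trips")
    print("client-side checks:", round_trips_client_side(form), "round trip")
```

In this model the server's load grows with the number of fields and users, whereas moving the checks to the client keeps it at one message per completed form.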


Scaling Techniques (2)

Figure 1.5. An example of dividing the DNS name space into zones.

Pitfalls when Developing Distributed Systems

False assumptions made by first-time developers:
• The network is reliable.
• The network is secure.
• The network is homogeneous.
• The topology does not change.
• Latency is zero.
• Bandwidth is infinite.
• Transport cost is zero.
• There is one administrator.


Hardware Concepts

Figure 1.6. Different basic organizations and memories in distributed computer systems: multiprocessors vs. multicomputers

Multiprocessors (1)

Figure 1.7. A bus-based multiprocessor
  o Cache memory, hit rate, coherence, write-through cache, snoopy cache

Multiprocessors (2)

Figure 1.8.
a) A crossbar switch
b) An omega switching network

Homogeneous Multicomputer Systems

Figure 1-9.
a) Grid
b) Hypercube

Tightly coupled vs. loosely coupled


How slow is the network?

• “ping www.cis.upenn.edu”
• Round-trip times
  o Upenn 0.5 ms
  o Princeton 5 ms
  o Rice 43 ms
  o Stanford 80 ms
  o Tsinghua, Beijing 280 ms

Communication Latency

• Latency – “wire delay”
  o Time to send and receive one byte of data
  o Depends on “distance”
• Bandwidth
  o Bytes/second
  o Depends on size of vehicle
• Latency is the bottleneck
  o It improves slower than bandwidth
    • Speed of light
    • Routers in the middle (traffic stops)
  o Request-response cycles dominate applications
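A common first-order model makes the "latency is the bottleneck" point concrete: the time for one request is roughly one round trip plus the payload size divided by the bandwidth. The sketch below reuses the round-trip times from the ping slide; the 100 Mbit/s bandwidth and the 1 KB payload are assumed values chosen for illustration, not measurements.

```python
# Round-trip times from the ping measurements, in seconds.
RTT_S = {
    "Upenn": 0.0005,
    "Princeton": 0.005,
    "Rice": 0.043,
    "Stanford": 0.080,
    "Tsinghua, Beijing": 0.280,
}


def request_time(rtt_s: float, payload_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """First-order model: one round trip plus the time to push the payload."""
    return rtt_s + payload_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    bandwidth = 100e6 / 8  # assumed 100 Mbit/s link, expressed in bytes/second
    payload = 1024         # assumed 1 KB response
    for site, rtt in RTT_S.items():
        total = request_time(rtt, payload, bandwidth)
        print(f"{site:20s} total={total * 1000:8.3f} ms  latency share={rtt / total:.1%}")
```

With these assumed numbers the transfer itself takes well under a tenth of a millisecond, so for the remote sites nearly all of the request time is round-trip latency, and adding bandwidth would change little.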


The speed pyramid

register   1
L2         10
Memory     200
LAN        100,000
Disk       2,000,000
WAN        20,000,000

• Will the ratios change?

Continuum of Distributed Systems [J. Chase]

Issues: naming and sharing; performance and scale; resource management

Parallel Architectures (small, fast)  ? ?  Networks (big, slow)

• Multiprocessors: low latency; high bandwidth; secure, reliable interconnect; no independent failures; coordinated resources
• Clusters (LAN): fast network; trusting hosts; coordinated
• Global Internet: slow network; untrusting hosts; autonomy; high latency; low bandwidth; autonomous nodes; unreliable network; fear and distrust; independent failures; decentralized administration

Types of Distributed Systems

• Distributed Computing Systems
• Distributed information systems
• Distributed Pervasive/Embedded Systems

Cluster Systems

Figure 1-6. An example of a cluster computing system.


Grid Computing Systems

Figure 1-7. A layered architecture for grid computing systems.

Transaction Processing Systems (1)

Figure 1-8. Example primitives for transactions.


Transaction Processing Systems (2)

Characteristic properties of transactions:
• Atomic: To the outside world, the transaction happens indivisibly.
• Consistent: The transaction does not violate system invariants.
• Isolated: Concurrent transactions do not interfere with each other.
• Durable: Once a transaction commits, the changes are permanent.

Transaction Processing Systems (3)

Figure 1-9. A nested transaction.
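Atomicity, the first of the transaction properties, can be demonstrated with Python's standard `sqlite3` module. This is a sketch only: the `accounts` table, the account names, and the `transfer` and `balances` helpers are hypothetical, and an in-memory database is used, so durability is not exercised.

```python
import sqlite3


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> None:
    """Move `amount` from src to dst: both updates commit or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # The invariant "no negative balance" is violated: abort everything.
            raise ValueError("insufficient funds")


def balances(conn: sqlite3.Connection) -> dict:
    """Return the current balance of every account."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

transfer(conn, "alice", "bob", 30)       # succeeds and commits
try:
    transfer(conn, "alice", "bob", 500)  # violates the invariant, rolled back as a unit
except ValueError:
    pass
print(balances(conn))
```

The failed transfer leaves no trace: neither the debit nor the credit survives, which is what "happens indivisibly" means for an outside observer.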


Enterprise Application Integration

Figure 1-11. Middleware as a communication facilitator in enterprise application integration.

Distributed Pervasive Systems

Requirements for pervasive systems
• Embrace contextual changes.
• Encourage ad hoc composition.
• Recognize sharing as the default.


Embedded Home Environment

Example: Home and Personal Appliances [Liu]

[Chart: Volume / Diversity by year (2005, ~2015, ~2025)]
• Home care facilities
• Intelligent devices, tools, appliances and software for assisted living
• Smart homes, home theaters, games, smart cars, etc.

Justifications

• Number of users: 10 – 1000 million
• Types of sensors and actuators: 100’s
• Number of suppliers: 10 – 100’s
• Required reliability: <10,000 recalls/year
• User tolerance to glitches: minimum
• Product life cycles: 3 – 20 yrs
• Tolerable upgrade effort: minimum

The environment must be open and evolvable, & capable of self diagnosis, healing, maintenance

Observations

• Rapid advances in component technologies, e.g.,
  o Smart gadgets, wearable sensors and actuators, robotic helpers, mobile devices
  o Wireless, wideband interconnects
• Increasing critical needs due to
  o Aging baby-boom generation
  o Long life expectancy
  o New safety, security, and privacy concerns


Desired Trends

[Chart by year (2005, ~2015, ~2025); curves: Quality & Usability, Volume & Diversity, Unit cost, Maintenance cost]

Electronic Health Care Systems (1)

Questions to be addressed for health care systems:
• Where and how should monitored data be stored?
• How can we prevent loss of crucial data?
• What infrastructure is needed to generate and propagate alerts?
• How can physicians provide online feedback?
• How can extreme robustness of the monitoring system be realized?
• What are the security issues and how can the proper policies be enforced?

Electronic Health Care Systems (2)

Figure 1-12. Monitoring a person in a pervasive electronic health care system, using (a) a local hub or (b) a continuous wireless connection.

Background: Sensor Networks

• Types of sensors
  o Seismic, low sampling rate magnetic, thermal, visual, infrared, acoustic and radar
• Conditions to monitor
  o Temperature, humidity, (vehicular) movement, lighting condition, pressure, soil makeup, noise levels
  o Presence or absence of certain kinds of objects
  o Mechanical stress levels on attached objects
  o Current characteristics such as speed, direction, and size of an object


Technology Trends: Mote Evolution

SN Characteristics

• Environment
  o connect to physical environment (large numbers, dense, real-time)
  o Sensor nodes are prone to failures, non-deterministic
  o wireless communication
  o massively parallel interfaces (to users and applications)
  o Limited resources: battery, bandwidth, memory, CPU (power management critical)
• Network
  o Topology changes dynamically
  o sporadic connectivity
  o new resources entering/leaving
  o large amounts of redundancy
  o self-configure/re-configure


SN applications

• Infrastructure security, military applications
• Environmental and Habitat monitoring
• Health applications
• Smart space/home applications
• Other commercial applications
  o Industrial Sensing
  o Traffic Control, vehicle tracking and detection
  o Interactive museums

Smart Spaces

Smart School, Smart Factory, Smart City

Other Applications
• Battlefields/Surveillance
• Earthquake areas
• Environmental Monitoring
• Airport security
• Emergency Response
• Location Services

Sensor Networks (1)

Questions concerning sensor networks:
• How do we (dynamically) set up an efficient tree in a sensor network?
• How does aggregation of results take place? Can it be controlled?
• What happens when network links fail?

Sensor Networks (2)

Figure 1-13. Organizing a sensor network database, while storing and processing data (a) only at the operator’s site or …
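One way to picture the aggregation question is a routing tree in which each node combines its children's partial results with its own reading and forwards a single value to its parent. The sketch below is an illustration under stated assumptions: the tree shape, the readings, and the use of (sum, count) pairs to compute an average are hypothetical and not taken from the slides.

```python
from typing import Dict, List, Tuple

# Hypothetical routing tree: node -> children. Node 0 is the operator's site (root).
TREE: Dict[int, List[int]] = {0: [1, 2], 1: [3, 4], 2: [5], 3: [], 4: [], 5: []}
# Hypothetical temperature readings at the sensor nodes (the root has no sensor).
READINGS: Dict[int, float] = {1: 20.0, 2: 22.0, 3: 19.0, 4: 21.0, 5: 23.0}


def aggregate(node: int) -> Tuple[float, int]:
    """Return the partial aggregate (sum, count) for the subtree rooted at `node`."""
    total, count = (READINGS[node], 1) if node in READINGS else (0.0, 0)
    for child in TREE[node]:
        child_total, child_count = aggregate(child)
        total += child_total
        count += child_count
    return total, count


def average_at_root() -> float:
    """Average of all readings, computed from the root's partial aggregate."""
    total, count = aggregate(0)
    return total / count


def messages_with_aggregation() -> int:
    """Each non-root node sends exactly one partial result to its parent."""
    return len(TREE) - 1


def messages_without_aggregation(node: int = 0, depth: int = 0) -> int:
    """Each raw reading is relayed hop by hop to the root: one message per hop."""
    own = depth if node in READINGS else 0
    return own + sum(messages_without_aggregation(c, depth + 1) for c in TREE[node])


if __name__ == "__main__":
    print("average:", average_at_root())
    print("messages with aggregation:", messages_with_aggregation())
    print("messages without aggregation:", messages_without_aggregation())
```

Shipping (sum, count) pairs rather than local averages keeps the result exact when subtrees have different sizes; the message counts hint at why processing inside the network can save scarce bandwidth and battery.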


Sensor Networks (3)

Figure 1-13. Organizing a sensor network database, while storing and processing data … or (b) only at the sensors.

The Challenges of Distributed Systems

• Secure communication over public networks
  o ACI (authentication, confidentiality, integrity): who sent it, did anyone see it, did anyone change it
• Fault-tolerance
  o Building reliable systems from unreliable components
  o nodes fail independently; a distributed system can “partly fail”
  o [Lamport]: “A distributed system is one in which the failure of a machine I’ve never heard of can prevent me from doing my work.”
• Replication, caching, naming
  o Placing data and computation for effective resource sharing, hiding latency, and finding it again once you put it somewhere.
• Coordination and shared state
  o What should the system components do and when should they do it?
  o Once they’ve all done it, can they all agree on what they did and when?
