Lesson 02: Distributed System Patterns
Phillip J. Windley, Ph.D.
CS462 – Large-Scale Distributed Systems

Contents
00 Coupling
01 Distributed System Design Axes
02 Rethinking Distributed
03 Distributed Architectures
04 Conclusion

Coupling

One of the most important properties of a distributed system is how tightly or loosely coupled the processing is.

Coupling refers to the degree to which two or more processes are interdependent. We say two things are tightly coupled when they are interdependent and loosely coupled when they are independent.

Tight and loose coupling are not binary positions but relative terms: any given architectural choice might make two processes more or less coupled.

Distributed Systems and Loose Coupling

If two processes are completely independent, each can operate successfully without any coordination with the other. Whenever we introduce dependencies, the processes must coordinate their activities, which requires some kind of communication between them. That communication can result in wasted computation, delays due to latency, and computational errors.

Of course, we can’t accomplish most interesting computations without some coupling. Good distributed architectures accomplish their tasks with a minimum of coupling, and good distributed system architects choose architectures that keep coupling to what is necessary to get the job done.

Generally, when designing distributed systems, anything that makes the system more loosely coupled is desirable. But no useful software system can be built where all the components are completely decoupled.

System Level Coupling

Coupling can occur on multiple levels within a system.
The following table gives some of the choices, at different levels of a system, that increase or decrease the level of coupling.

Level                       More Tightly Coupled     More Loosely Coupled
Physical connection         Direct                   Intermediary
Communication style         Synchronous              Asynchronous
Type system                 Strong type system       Weak type system
Interaction pattern         Remote procedure call    Messages
Process logic control       Central design           Independent teams
Data schema                 Normalized               Denormalized
Service discovery/binding   Static                   Dynamic
Platform dependencies       Dependent/specified      Independent/guided

Adapted from Enterprise SOA: Service-Oriented Architecture Best Practices

Types of Coupling - Logical

In Event-Based Programming, Ted Faison identifies three flavors of coupling: logical, type, and signature.

Logical coupling occurs when two processes share information or make assumptions about each other. When this happens, one process can affect the other even when they share no data.

For example, suppose both processes share an algorithm for calculating sales tax. A change to the algorithm in one process must be mirrored in the other, or the two will silently disagree, even though no computational artifact (code, API, etc.) links them. The same problem arises if one process merely makes assumptions about how the other computes sales tax.

Logical coupling can and should be avoided because it adds nothing to the computation and is a potential source of logical errors and system failure.

Types of Coupling - Type

Type coupling occurs when one process references the external interface or, worse, the internal model of another.

Type coupling is one of the most common forms of coupling in distributed systems. External interfaces abound, and distributed systems use those interfaces to make requests of and give commands to other processes. External interfaces, called APIs, provide a contract that defines the interprocess communication. Coupling occurs because the calling process has to know the syntax and semantics of the API and is thus dependent on it.

Another, more insidious, form of type coupling occurs when processes share a data model. For example, two processes may have direct access to a database and use it as shared memory, linking them. Shared memory requires careful and complex coordination and is often a source of logical coupling through shared semantics.

Types of Coupling - Platform

Platform coupling is a special case of type coupling. Platforms include not only popular language platforms like the JVM and the .NET CLR, but also frameworks for other languages. Platforms create coupling through common components and assumed or defined interface patterns.

This kind of coupling can provide advantages in the form of reduced programming effort and abstractions over common interaction patterns. But relying on platforms to solve interaction patterns also limits the kinds of processes that can participate. For example, if you build a distributed system on the JVM and use RMI for interprocess communication, processes implemented in non-JVM languages can’t easily interact with the system as first-class citizens.

Types of Coupling - Signature

Signature coupling occurs when one process indirectly and dynamically causes another process to take action.*

For distributed systems, the primary difference between type and signature coupling is that the latter refers to run-time considerations. For example, a service you depend on may be down or non-performant. Messaging interfaces are also a form of signature coupling since they are more dynamic than request-response or RPC-style interfaces.
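To make the difference between type and signature coupling concrete, here is a minimal Python sketch. `InventoryService`, `MessageBus`, and the `reserve_requested` message format are hypothetical names invented for illustration, not part of any real library, and an in-memory bus stands in for real messaging infrastructure. The direct caller of `reserve()` is type coupled: it must know the class, the method name, and the parameter list. The publisher on the bus knows only a message type and a payload, and which code handles the message is decided at run time.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Message = Dict[str, Any]
Handler = Callable[[Message], Any]


class InventoryService:
    """Hypothetical service. Callers of reserve() are type coupled to it:
    renaming the method or changing its parameters breaks every caller."""

    def __init__(self) -> None:
        self.stock: Dict[str, int] = {"widget": 5}

    def reserve(self, sku: str, qty: int) -> bool:
        if self.stock.get(sku, 0) >= qty:
            self.stock[sku] -= qty
            return True
        return False


class MessageBus:
    """Hypothetical in-memory bus. Publishers know only a message type and a
    payload; which handlers run is decided at run time by subscriptions."""

    def __init__(self) -> None:
        self.handlers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self.undelivered: List[Message] = []

    def subscribe(self, msg_type: str, handler: Handler) -> None:
        self.handlers[msg_type].append(handler)
        # Deliver messages of this type that arrived while no one was listening.
        pending = [m for m in self.undelivered if m["type"] == msg_type]
        self.undelivered = [m for m in self.undelivered if m["type"] != msg_type]
        for msg in pending:
            handler(msg)

    def publish(self, msg: Message) -> None:
        handlers = self.handlers.get(msg["type"])
        if not handlers:
            # No receiver is available: park the message instead of failing.
            self.undelivered.append(msg)
            return
        for handler in handlers:
            handler(msg)


inventory = InventoryService()

# Type coupling: a direct call that depends on the exact API.
inventory.reserve("widget", 2)

# Signature coupling: the publisher never references InventoryService.
bus = MessageBus()
bus.publish({"type": "reserve_requested", "sku": "widget", "qty": 1})  # parked
# The receiving side adapts messages to its own API when it comes online.
bus.subscribe("reserve_requested", lambda m: inventory.reserve(m["sku"], m["qty"]))

print(inventory.stock["widget"])  # 2
```

Parking undelivered messages instead of raising an error is one simple way a message-based design can tolerate a receiver that is temporarily down. A production system would also need persistence, ordering, and retry policies that this sketch omits.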
Messages don’t require the same level of interface dependency as an API. In general, signature coupling is preferable to type coupling since two processes coupled by messaging know much less about the internal semantics and capabilities of each other. Dynamic interaction also allows for programmatic self-healing of faults, a desirable property of distributed systems.

* Faison defines signature coupling relative to object-oriented programming. I’ve recast it in light of more general-purpose distributed system concepts.

Design Axes

The term distributed system can refer to systems with vastly different design choices.

Distributed Systems Use Many Processes

Distributed systems are made from two or more processes that are interconnected to achieve a specific purpose.

We saw in Why Distributed Systems? [01] that a distributed system is one made up of a number of processes that coordinate their efforts to achieve a specific goal or offer a specific service. The key principle is that even though multiple processes are being used, it appears to the end user of the computation as if a single computer were being used.

Achieving this requires careful design, and a distributed system architect has myriad design choices available in achieving precise system goals. Design involves choosing the process architecture, the interconnections between processes, and how data is controlled.

Where is the processing done?

What do we mean by process? We’ve been circumspect to this point, merely referring to processes. A process can be many things. It might be an entire CPU in an embedded system with a simple, single-threaded OS, or it might be an OS process or thread. And of course, processes can be further virtualized by things like hypervisors or the Java Virtual Machine (JVM).
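The idea that a "process" is a role rather than one particular OS construct can be sketched in a few lines of Python. In this minimal sketch, the hypothetical `tax_process` is an independent unit of computation that communicates only through its inbox queue. Here it happens to be hosted by a thread, but the same message-passing design could be hosted by an OS process, a VM, or another machine with a different transport. The caller of the hypothetical `total_with_tax` sees a single function call, echoing the single-computer illusion described above. The flat 7% rate is an assumed value for illustration.

```python
import queue
import threading
from typing import Optional, Tuple

# A request is (amount, reply_queue); None tells the process to shut down.
Request = Optional[Tuple[float, "queue.Queue[float]"]]

TAX_RATE = 0.07  # assumed flat rate, for illustration only


def tax_process(inbox: "queue.Queue[Request]") -> None:
    """Hypothetical 'process': owns the tax logic and talks only via messages."""
    while True:
        msg = inbox.get()
        if msg is None:  # shutdown signal
            return
        amount, reply = msg
        reply.put(round(amount * TAX_RATE, 2))


def total_with_tax(amount: float, tax_inbox: "queue.Queue[Request]") -> float:
    """Client-facing call; the caller can't tell another process did part of the work."""
    reply: "queue.Queue[float]" = queue.Queue()
    tax_inbox.put((amount, reply))
    return round(amount + reply.get(timeout=5), 2)


# Host the process in a thread. A multiprocessing.Process or a remote
# service would change the hosting and transport, not the design.
inbox: "queue.Queue[Request]" = queue.Queue()
worker = threading.Thread(target=tax_process, args=(inbox,), daemon=True)
worker.start()

print(total_with_tax(100.0, inbox))  # 107.0

inbox.put(None)  # ask the process to stop
worker.join()
```

Because the tax logic lives in exactly one process and other processes only send it messages, this arrangement also avoids the duplicated sales-tax algorithm used earlier as an example of logical coupling.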
Processes can run next to each other on the same CPU, on different cores of the same chip, or spread out across machines in the same data center or around the world.

How are processes interconnected?

Put another way, what is the topology of the distributed system? We can arrange the processing so that it flows along a pipeline of processes. We can have a central controller, yielding a star topology. We can arrange processing in a stack where communication flows up and down. We might create a hierarchy or tree. Or we can fully connect the processes so that every process can talk directly to every other one.

None of these is inherently better than the others; the best choice depends on the problem you’re solving.

What is the communication style?

There are numerous ways that processes can exchange information. Each has implications for the scalability, reliability, and maintainability of the system.

RPC, or remote procedure call, is the most straightforward, but it creates tight coupling that can affect system performance and code understandability.

Request-response is the primary communication style of the Web and its underlying HTTP protocol. Request-response is typically synchronous. Request-response and RESTful interfaces are often treated as synonymous, but we’ll see they’re different.

Messaging is the most general style and can be synchronous or asynchronous.

Events are special types of messages that notify other processes of a state change.

Where is information stored?

Distributed systems are significantly simpler when they don’t have to worry about state