Constructing Reliable Distributed Communication Systems with CORBA

Silvano Maffeis
Olsen and Associates, Zurich, Switzerland

Douglas C. Schmidt
Department of Computer Science, Washington University, St. Louis

This paper will appear in the feature topic issue on Distributed Object Computing in the IEEE Communications Magazine, Vol. 14, No. 2, February 1997.

Abstract

Communication software and distributed services for next-generation applications must be reliable, efficient, flexible, and reusable. These requirements motivate the use of the Common Object Request Broker Architecture (CORBA). However, building highly available applications with CORBA is hard. Neither the CORBA standard nor conventional implementations of CORBA directly address complex problems related to distributed computing, such as real-time or high-speed quality of service, partial failures, group communication, and causal ordering of events. This paper describes a CORBA-based framework that uses the Virtual Synchrony model to support reliable data- and process-oriented distributed systems that communicate through synchronous methods and asynchronous messaging.

1 Introduction

Communication software and distributed services for next-generation applications must be reliable, efficient, flexible, and extensible. For instance, applications like personal communication systems (PCS), real-time market data feeds, and flight reservation systems must be highly available and scalable to meet their stringent reliability and performance demands. In addition, these applications must be flexible and extensible to cope with their inherent complexity and to respond rapidly to changing application requirements that span a wide range of media types and access patterns.

These requirements motivate the use of the Object Management Group's Common Object Request Broker Architecture (OMG CORBA) [1, 2]. CORBA is an open standard for distributed object computing. It defines a set of components that allow client applications to invoke operations on remote object implementations. CORBA enhances application flexibility and portability by automating many common development tasks such as object registration, location, and activation; demultiplexing; framing and error-handling; parameter marshalling and demarshalling; and operation dispatching.

Experience over the past several years [3] illustrates that CORBA is well-suited for best-effort, client-server applications running over conventional local area networks (such as Ethernet and Token Ring). However, building highly available applications with CORBA is much harder. Neither the CORBA standard nor conventional implementations of CORBA directly address complex problems related to distributed computing, such as real-time quality of service [4] or high-speed performance [5], group communication [6], partial failures [7], and causal ordering of events [8].

This paper makes three contributions to the study of reliable distributed object computing systems with CORBA. First, we examine the question of whether reliable applications can (or should) be implemented with CORBA today. Next, we present an extension to the Object Management Architecture that improves support for reliability and fault-tolerance. Finally, we propose a CORBA-based framework based on the Virtual Synchrony model [7] that supports reliable data- and process-oriented distributed systems. In addition, our proposed framework supports applications requiring loosely coupled processes that communicate through asynchronous messaging.

2 Reliability Matters

Distributed systems are intended to form the backbone of emerging next-generation communication systems, including electronic commerce, PCS, satellite surveillance systems, distributed medical imaging, real-time data feeds, and flight reservation systems. An obvious benefit of distributed systems is that they reflect the global business and social environments in which we live and work. Another benefit is that they can improve the quality of service (in terms of reliability, availability, and performance) for complex systems.

Reliability is an important quality in mission-critical distributed applications. In many distributed environments, even small amounts of downtime can annoy customers, hurt sales, or endanger human lives. We define a distributed system as reliable if its behavior is predictable despite partial failures, asynchrony, and run-time reconfiguration of the system. In addition, we require reliable applications to be highly available, i.e., the application can provide its essential services despite the failure of computing nodes, software objects, and communication links.

Certain aspects of distributed systems make reliability more difficult to achieve. For instance, partial failures are an inherent problem in distributed systems. The "mean time to failure" (MTTF) of components in a distributed system rapidly decreases as the number of computing nodes and communication links that constitute the system increases. Another inherent problem is that developers must address complex execution states of concurrent programs. Distributed systems consist of processes that run in parallel on heterogeneous platforms and are therefore prone to race conditions, communication errors, node failures, and deadlocks. Thus, distributed systems are often more difficult to develop, administer, and maintain than their centralized counterparts.

Conversely, other aspects of distributed systems can help make applications more robust. For instance, distributed systems can be made more reliable than centralized systems by providing important services redundantly on multiple nodes. To enable redundancy, active or passive replication should be supported by the communication system used to run distributed applications.

Hence, we face a peculiar situation: although programming distributed applications is a daunting and error-prone task, a high degree of failure tolerance and reliability can be achieved if the underlying communication system software supports replication. The conclusion we draw is that non-robust communication software will lead to fragile distributed applications that will be frequently unavailable and will require constant supervision. In contrast, sophisticated communication software (such as the Virtual Synchrony approach presented in Section 4.3) can lead to distributed applications that are inherently more reliable, more modular, and more scalable than centralized ones.

3 Software Quality Matters

Most reported success with object technology has involved centralized applications running on stand-alone computers. In particular, user interfaces have widely adopted object-oriented design and programming techniques [9]. In contrast, relatively few examples [10, 3] of object-oriented distributed systems are currently deployed in production commercial systems.

There are a number of interesting projects that are currently employing object-oriented distributed technology and which are planned to become operational shortly. For example, the Iridium system, which is being designed [...] using CORBA to implement portions of the Iridium system control software.

Building distributed systems is complex and expensive. Distribution presents developers with a number of inherent problems such as latency, concurrency control, heterogeneity, and partial failures. Further complicating matters are accidental problems such as the lack of widely reused higher-level application frameworks, primitive debugging tools, and non-scalable, unreliable software infrastructures.

Distributed object models like CORBA were devised to address several of these problems. In CORBA, objects are specified in a strongly typed interface definition language (IDL). Thus, CORBA objects can be used to hide heterogeneity and the details of the underlying system software, communication protocols, and hardware. For instance, two CORBA objects running on the Object Request Brokers (ORBs) of different manufacturers can interoperate with each other even when implemented in different programming languages, operating systems, or hardware platforms.

CORBA's synchronous method invocation model can help programmers avoid concurrency-related problems. Moreover, CORBA object services (such as the Event, Concurrency, and Transaction Service) help to orchestrate the activities of distributed network objects that execute in parallel across local area and wide area networks.

The CORBA model by itself does not provide solutions to the problem of detecting and reacting to partial failures and to network partitioning. However, a CORBA object can encapsulate internal state and make it accessible through an IDL interface. Due to this encapsulation, fault-tolerance techniques like replication and state-checkpointing become easier to implement because the internal state of an object is isolated.

There is a need for distributed debugging tools and run-time validation tools in the CORBA model. Distributed debugging has not been addressed by the OMG yet. However, viewing a complex system in the form of distributed state machines that interact by invoking operations on each other can help in tracing distributed activities and in isolating and correcting problems.

Reliable distributed computing requires the presence of an execution model that allows programmers to predict the behavior of a distributed application despite asynchrony, concurrency, and partial failures. There are three interesting