
Programming Languages for Distributed Computing Systems

HENRI E. BAL, Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands

JENNIFER G. STEINER, Centrum voor Wiskunde en Informatica, Amsterdam, The Netherlands

ANDREW S. TANENBAUM, Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam, The Netherlands

When distributed systems first appeared, they were programmed in traditional sequential languages, usually with the addition of a few procedures for sending and receiving messages. As distributed applications became more commonplace and more sophisticated, this ad hoc approach became less satisfactory. Researchers all over the world began designing new programming languages specifically for implementing distributed applications. These languages and their history, their underlying principles, their design, and their use are the subject of this paper. We begin by giving our view of what a distributed system is, illustrating with examples to avoid confusion on this important and controversial point. We then describe the three main characteristics that distinguish distributed programming languages from traditional sequential languages, namely, how they deal with parallelism, communication, and partial failures. Finally, we discuss 15 representative distributed languages to give the flavor of each. These examples include languages based on message passing, rendezvous, remote procedure call, objects, and atomic transactions, as well as functional languages, logic languages, and distributed data structure languages. The paper concludes with a comprehensive bibliography listing over 200 papers on nearly 100 distributed programming languages.

Categories and Subject Descriptors: C.2.4 [Computer-Communication Networks]: Distributed Systems - distributed applications; D.1.3 [Programming Techniques]: Concurrent Programming; D.3.3 [Programming Languages]: Language Constructs - concurrent programming structures; D.4.7 [Operating Systems]: Organization and Design - distributed systems

General Terms: Languages, Design

Additional Key Words and Phrases: Distributed data structures, distributed languages, distributed programming, functional programming, languages for distributed programming, languages for parallel programming, logic programming, object-oriented programming, parallel programming

The research of H. E. Bal was supported in part by the Netherlands Organization for Scientific Research under Grant 125-30-10.

J. G. Steiner's current address: Open Software Foundation, Cambridge, MA.

Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Association for Computing Machinery. To copy otherwise, or to republish, requires a fee and/or specific permission.

© 1989 ACM 0360-0300/89/0900-0261 $00.75


CONTENTS

INTRODUCTION
1. DISTRIBUTED COMPUTING SYSTEMS
   1.1 Classes of Distributed Applications
   1.2 Requirements for Distributed Programming Support
   1.3 Languages for Distributed Programming
2. LANGUAGE SUPPORT FOR PROGRAMMING DISTRIBUTED SYSTEMS
   2.1 Parallelism
   2.2 Interprocess Communication and Synchronization
   2.3 Partial Failure
3. LANGUAGES FOR PROGRAMMING DISTRIBUTED SYSTEMS
   3.1 Languages with Logically Distributed Address Spaces
   3.2 Languages with Logically Shared Address Spaces
4. CONCLUSIONS
APPENDIX
ACKNOWLEDGMENTS
REFERENCES

INTRODUCTION

During the past decade, many kinds of distributed computing systems have been proposed and built. These systems cover a wide spectrum in terms of design goals, size, performance, and applications. They also differ considerably in how they are programmed. Some are programmed in conventional languages, possibly supplemented with a few new library routines. Others are programmed in completely new languages, specially designed for distributed applications. It is the intention of this paper to describe and compare the methods and languages that can be used for programming distributed systems, and to present in some detail several languages representative of the research to date in this area.

There is no consensus in the literature as to the definition of a distributed computing system, so we begin Section 1 by defining our use of this term. We then discuss the different kinds of distributed computing systems that have been built, the types of applications for which they are intended, and the programming support required for implementing these applications. This programming support may be provided either by the operating system or by a programming language. We briefly examine the first option and note several disadvantages.

In Section 2 we discuss the second option: special language support for programming distributed computing systems. We identify three issues that must be addressed by a language intended to support distributed applications programming: the ability to execute different pieces of a program on different processors, the ability for these pieces to cooperate with one another, and the ability to cope with (or take advantage of) partial failure of the distributed system. We show how different languages have addressed these issues in very different ways, and how the appropriateness of one language over another depends primarily on the type of application to be written and, to a lesser extent, on the kind of distributed system on which the application is to be implemented.

In Section 3 we look at some representative programming languages designed for distributed computing systems. We divide the languages into simple categories, describing one or two examples from each category in some detail. We hope in this way to give a flavor of current research in this area. The languages we describe are CSP, Occam, NIL, Ada[1], Concurrent C, Distributed Processes, SR, Emerald, Argus, Aeolus, ParAlfl, Concurrent PROLOG, PARLOG, Linda, and Orca. Finally, we present our conclusions and give an extensive bibliography.

[1] Ada is a registered trademark of the U.S. Department of Defense.

1. DISTRIBUTED COMPUTING SYSTEMS

We begin this section with our definition of a distributed computing system, noting that there is a spectrum of such systems, characterized by their interconnecting networks. We then discuss the kinds of applications for which these systems are used. Next, we list the requirements for supporting the programming of these applications.

Finally, we discuss how programming languages can fulfill these requirements.

There is considerable disagreement in the literature as to what constitutes a distributed system. Among the many definitions that we have seen, there is only one point of agreement: They all require the presence of multiple processors. The confusion may therefore be due to the large number of different architectural models one finds in multiple-processor systems. Vector computers, for example, use many processors that simultaneously apply the same arithmetic operations to different data [Russell 1978]. They are best suited for computation-intensive numerical applications. Dataflow and reduction machines apply different operations to different data [Treleaven et al. 1982]. Multiprocessors consist of several autonomous processors sharing a common primary memory [Jones and Schwarz 1980]. These are well suited for running different subtasks of the same program simultaneously. Multicomputers are similar to multiprocessors, except that the processors do not share memory, but rather communicate by sending messages over a communications network [Athas and Seitz 1988]. As a final example, there are systems comprised of workstations or minicomputers connected by a local- or wide-area network. This type of system is frequently the target for distributed operating systems [Tanenbaum and van Renesse 1985]. (We will refer to these latter two systems as workstation-LANs and workstation-WANs.)

Experts strongly disagree as to which of these multiple-processor architectures are to be considered distributed systems. Some people claim that all of the configurations mentioned above fall under this category. Others include only geographically distributed computers connected by a wide-area network. Each combination in between these two extremes probably also has its defenders. The meaning we intend to convey by our use of the term distributed computing system is the following:

Definition. A distributed computing system consists of multiple autonomous processors that do not share primary memory, but cooperate by sending messages over a communications network.

Each processor in such a system executes its own instruction stream(s) and uses its own local data, both stored in its local memory. Occasionally, processors may need to exchange data; they do so by sending messages to each other over a network. Many different types of networks exist (e.g., hypercube, local-area network, wide-area network), as will be discussed below. Although these networks have very different physical properties, they all fit into the same model: each is a medium for transferring messages among processors (see Figure 1a). Distributed systems can be contrasted with multiprocessors, in which processors communicate through a shared memory (see Figure 1b).

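To make this model concrete, the sketch below shows two autonomous "processors" that share no variables and cooperate only by exchanging messages. It is written in Go purely for illustration (Go is not one of the surveyed languages); the in-memory connection stands in for the interconnection network of Figure 1a, and the message contents are invented.

    package main

    import (
        "bufio"
        "fmt"
        "net"
    )

    // processorA has only local data and can share it only by sending messages.
    func processorA(conn net.Conn) {
        defer conn.Close()
        localData := []string{"temperature=21", "pressure=1013"}
        for _, m := range localData {
            fmt.Fprintln(conn, m) // send one message per line over the "network"
        }
    }

    // processorB receives the messages; it has no access to A's memory.
    func processorB(conn net.Conn, nMessages int, done chan<- struct{}) {
        defer conn.Close()
        r := bufio.NewReader(conn)
        for i := 0; i < nMessages; i++ {
            line, err := r.ReadString('\n')
            if err != nil {
                break
            }
            fmt.Print("B received: ", line)
        }
        close(done)
    }

    func main() {
        a, b := net.Pipe() // in-memory stand-in for the communications network
        done := make(chan struct{})
        go processorA(a)
        go processorB(b, 2, done)
        <-done
    }
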

Of the architectures mentioned in the list of examples above, multicomputers, workstation-LANs, and workstation-WANs qualify as distributed computing systems by our definition.

Distributed systems can be further characterized by their communications networks. The network determines the speed and reliability of interprocessor communication, and the spatial distribution of the processors. Traditionally, a distributed architecture in which communication is fast and reliable and where processors are physically close to one another is said to be closely coupled; systems with slow and unreliable communication between processors that are physically dispersed are termed loosely coupled.[2]

Closely coupled distributed systems use a communications network consisting of fast, reliable point-to-point links, which connect each processor to some subset of the other processors. Examples of such systems are the Cosmic Cube [Seitz 1985], hypercubes [Ranka et al. 1988], and transputer networks [May and Shepherd 1984]. Communication costs for this type of system used to be on the order of a millisecond, but are expected to drop to less than a microsecond in the near future [Athas and Seitz 1988].

[2] It must be noted that it is not entirely correct to associate these two attributes with architectures, as, for example, communication speed also depends very much on the current hardware technology.


Figure 1. Communication in distributed systems (a) versus shared-memory multiprocessors (b): (a) physical communication by message passing; (b) physical communication through shared memory.

A more loosely coupled type of distributed system is a workstation-LAN. The local-area network (LAN) allows direct communication between any two processors. Communication cost is typically on the order of milliseconds. In many LANs, communication is not totally reliable. Occasionally, a message may be damaged, arrive out of order, or not arrive at its destination at all. Software protocols must be used to implement reliable communication.

A LAN limits the physical distance between processors to on the order of a few kilometers. To interconnect processors that are farther apart, a wide-area network (WAN) can be used. A workstation-WAN can be seen as a very loosely coupled distributed system. Communication in a WAN is slower and less reliable than in a LAN; communication cost may be on the order of seconds. On the other hand, the increased availability of wide-area lines at speeds above 1 Mbit/s (e.g., T1 lines in the United States) will blur the distinction between LANs and WANs in the future.

In summary, there is a spectrum of distributed computing systems, ranging from closely coupled to very loosely coupled systems. Although communication speed and reliability decrease from one end of the spectrum to the other, all systems fit into the same model: autonomous processors connected by some kind of network that communicate by sending messages.

The main purpose of this paper is to study languages for programming the systems that fit into this spectrum. As all systems discussed here are conceptually similar, their programming languages need not be fundamentally different. At least in principle, any one of these languages may be used for programming a variety of distributed architectures. The choice of a suitable language depends very much on the kind of application to be implemented. Below, we first look at several classes of applications that have been written for distributed systems. Then we consider what kind of support is required for these applications and how this support can be provided by a programming language.

1.1 Classes of Distributed Applications

Distributed computing systems are used for many different types of applications. We first look at the reasons why a distributed system might be favored over other architectures, such as uniprocessors or shared-memory multiprocessors, and then classify the distributed applications accordingly. The reasons for programming applications on distributed systems fall into four general categories: decreasing turnaround time for a single computation, increasing reliability and availability, the use of parts of the system to provide special functionality, and the inherent distribution of the application.

1.1.1 Parallel, High-Performance Applications

Achieving speedup through parallelism is a common reason for running an application on a distributed computing system. By executing different parts of a program on different processors at the same time, some programs will finish faster. In principle, these parallel applications can be run just as well on shared-memory multiprocessors. Shared-memory systems, however, do not scale to large numbers (thousands) of processors, which explains the high interest in implementing parallel programs on distributed systems.

Parallel applications can be further classified by the grain of parallelism they use. The grain is the amount of computation time between communications. Large-grain parallel programs spend most of their time doing computations and communicate infrequently; fine-grain parallel programs communicate more frequently.

Fine-grain and medium-grain parallelism are best applied to closely coupled distributed systems; on loosely coupled systems, the communication overhead becomes prohibitively expensive. The literature contains numerous papers discussing applications that can benefit from this kind of parallelism. Recent introductory papers on this subject are Athas and Seitz [1988] and Ranka et al. [1988].

Large-grain parallelism, on the other hand, is suitable for both closely and loosely coupled distributed systems.[3] Most of the research in this area has focused on implementing large-grain parallel applications on top of existing distributed operating systems [Tanenbaum and van Renesse 1985]. Example applications are compilation of modules of a given program in parallel on different machines [Baalbergen 1988] and implementation of heuristic search algorithms [Bal et al. 1987; Finkel and Manber 1987]. Also, some of the world's best chess programs run on loosely coupled distributed systems. ParaPhoenix, for example, runs on a collection of SUNs connected by an Ethernet [Marsland et al. 1986].

[3] If the grain of parallelism is large enough, even very loosely coupled distributed systems might be considered for running parallel applications. Recently, an international project was undertaken to find the prime factors of a 100-digit number. The problem was solved in parallel using 400 computers located at research institutes on three different continents (New York Times, Oct. 12, 1988).

1.1.2 Fault-Tolerant Applications

For critical applications such as controlling an aircraft or an automated factory, a uniprocessor may not be reliable enough. Distributed computing systems are potentially more reliable, because they have the so-called partial failure property: since the processors are autonomous, a failure in one processor does not affect the correct functioning of the other processors. Reliability can therefore be increased by replicating the functions or data of the application on several processors. If some of the processors crash, the others can continue the job.

Some fault-tolerant applications may also be run on other multiple-processor architectures that can survive partial failures (e.g., shared-memory multiprocessors). However, if the system must survive natural disasters like fires, earthquakes, and typhoons, one might want the processors to be geographically distributed. To implement a highly reliable banking system, for example, loosely coupled or very loosely coupled distributed systems might be the obvious choice.

Research in this area has focused mainly on software techniques for realizing the potential increase in reliability and availability. Example projects are Circus [Cooper 1985], Clouds [LeBlanc and Wilkes 1985], Argus [Liskov 1988], and Camelot [Spector et al. 1986].

1.1.3 Applications Using Functional Specialization

Some applications are best structured as a collection of specialized services. A distributed operating system like Amoeba, for example, may provide a file service, a print service, a process service, a terminal service, a time service, a boot service, and a gateway service [Tanenbaum and van Renesse 1985]. It is most natural to implement such an application on distributed hardware. Each service can use one or more dedicated processors, as this will give good performance and high reliability. The services can send requests to each other across the network. If new functions are to be added or if existing functions need extra compute power, it is easy to add new processors. As all processors can communicate through the network, it is easy to share special resources like printers and tape drives.

1.1.4 Inherently Distributed Applications

Finally, there are applications that are inherently distributed. One example is sending electronic mail between people's workstations. The collection of workstations can be regarded as a distributed computing system, so the application (at least partly) has to run on distributed hardware. Similarly, a company with multiple offices and factories may need to set up a distributed system so that people and machines at different sites can communicate.

1.2 Requirements for Distributed Programming Support

We have described a spectrum of distributed architectures and several kinds of applications that may be run on such hardware. We now address the issue of how these applications are to be implemented on these architectures. We refer to this activity as distributed programming.

Distributed programming requires support in several areas. We first give the requirements for distributed programming support and then discuss the disadvantages of having these provided by the operating system. Later on, we will see how programming languages designed for distributed systems can fulfill these requirements.

There are basically three issues that distinguish distributed programming from sequential programming:

(1) The use of multiple processors.
(2) The cooperation among the processors.
(3) The potential for partial failure.

Each is discussed below.

Distributed programs execute pieces of their code in parallel on different processors. High-performance applications use this parallelism for achieving speedups. Here, the goal is to make optimal use of the available processors; decisions regarding which computations are to run in parallel are of great importance. In fault-tolerant applications, decisions to perform functions on different processors are based on increasing reliability or availability. For special-function and inherently distributed applications, functions may be performed on a given processor because it has certain capabilities or contains needed data. The first requirement for distributed programming support is therefore the ability to assign different parts of a program to be run on different processors.

The processes of a distributed system need to cooperate while executing a distributed application. With parallel applications, processes sometimes have to exchange intermediate results and synchronize their actions. In a system that controls an automated factory, for example, processors have to keep an eye on each other to detect failing processors. The services of a distributed operating system will need each other's assistance: a process service, for example, may need the help of a file service to obtain the binary image file of a process. With distributed electronic mail, messages have to be forwarded between processors. In all these examples, processes must be able to communicate and synchronize with each other, a second requirement for distributed programming support.

In a uniprocessor system, if the CPU fails, all work ceases instantly. But in a distributed system some CPUs may fail while others continue. This property can be used to write programs that can tolerate hardware failures. This is particularly important for fault-tolerant applications, but it is desirable for other applications as well.

For a distributed computer chess program that participates in a tournament, for example, the ability to survive processor failures is highly useful. The third and final requirement for distributed programming support, therefore, is the ability to detect and recover from partial failure of the system.

Ideally, programming support for implementing distributed applications must fulfill all three of these requirements. The support may be provided either by the (distributed) operating system or by a language especially designed for distributed programming. In the first case, applications are programmed in a sequential language extended with library routines that invoke operating-system primitives. A disadvantage of this approach is that the control structures and data types of the sequential language are usually inadequate for distributed programming. Below, we consider two examples of friction between sequential and distributed programming.

Simple actions, like forking off a subprocess or receiving a message from a specific sender, can be expressed relatively easily through library calls. But problems arise, for example, if a process wants to selectively receive a message from one of a number of other processes, where the selection criteria depend on, say, the state of the receiver and the contents of the message. Although concise programming notations exist for such cases (e.g., the select statement discussed in Section 2.2.3), it would probably take a number of complicated library calls to convey such a request to the operating system.

Problems with data types arise if one tries to pass a complex data structure as part of a message to a remote process. As the operating system does not know how data structures are represented, it is unable to pack the data structure into a network packet (i.e., a sequence of bytes). Instead, the programmer has to write explicit code that flattens the data structure into a sequence of bytes on the sending end and that reconstructs the original data structure on the receiving end. A language designed for distributed programming, on the other hand, could do the conversion automatically.

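The following sketch illustrates the flattening problem just described. It uses Go's encoding/gob package, purely as one way to hand-code the conversion; the Update type and its fields are hypothetical. A language designed for distributed programming could generate equivalent conversion code automatically from the type declaration.

    package main

    import (
        "bytes"
        "encoding/gob"
        "fmt"
        "log"
    )

    // Update is a hypothetical structured message; its fields are invented
    // for illustration only.
    type Update struct {
        Account string
        Amounts []int
        Urgent  bool
    }

    func main() {
        // Sending side: flatten the data structure into a sequence of bytes,
        // as would be required before handing it to an operating-system
        // message-passing primitive.
        var packet bytes.Buffer
        out := Update{Account: "K-42", Amounts: []int{100, -30}, Urgent: true}
        if err := gob.NewEncoder(&packet).Encode(out); err != nil {
            log.Fatal(err)
        }

        // Receiving side: reconstruct the original structure from the bytes.
        var in Update
        if err := gob.NewDecoder(&packet).Decode(&in); err != nil {
            log.Fatal(err)
        }
        fmt.Printf("reconstructed: %+v\n", in)
    }
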
Using a special language for distributed programming also gives other advantages, such as improved readability, portability, and static type checking. Finally and most importantly, a language may present a programming model that is higher level and more abstract than the message-passing model supported by most operating systems. Several such models are discussed in this paper.

1.3 Languages for Distributed Programming

A central question encountered by developers of distributed software is, "Given a certain application that has to be implemented on a certain distributed computing system, which programming language should be used?" A language can be considered as a candidate if

(1) the language is suitable for the application, and
(2) the language can be implemented with reasonable efficiency on the given hardware.

A maze of languages for distributed programming has evolved during the past decade, making the choice of the most suitable language a difficult one. Most importantly, the underlying models of the languages differ widely. Below, we look at several such models. We begin by describing the basic model, which is characterized by the use of processes, message passing, and explicit failure detection. Next, we look at alternative ways for dealing with parallelism, communication, and processor failures.

The most basic model is that of a group of sequential processes running in parallel and communicating through message passing. This model directly reflects the distributed architecture, consisting of processors connected through a communications network. Languages based on this model include CSP and Occam.[4]

[4] All languages mentioned in this section are described in Section 3.

The language may ease the programming task in many ways, for example, by supporting different kinds of message passing (as discussed in Section 2), by masking communication errors, and by type checking the contents of messages. Such languages usually provide a simple mechanism for detecting failures in processors (e.g., an exception is generated or an error is returned on an attempt to communicate with a faulty processor). An example of a language supporting such features is SR.

For many applications, this basic model of processes and message passing may be just what is needed. The model can be mapped efficiently onto the distributed architecture, and it gives the programmer full control over the hardware resources (processors and network). For other applications, however, the basic model may be too low level. Therefore, several alternative models have been designed for parallelism, communication, and partial failures, which provide higher level abstractions. Below, we give some examples of other models.

Several researchers have come to believe that imperative (algorithmic) languages are not the best ones for dealing with parallelism. Because of the "one-word-at-a-time" von Neumann bottleneck [Backus 1978], imperative languages are claimed to be inherently sequential. This has led to research on parallelism in languages with inherent parallelism, like functional, logic, and object-oriented languages. The lack of side effects in functional languages (like ParAlfl) allows expressions to be evaluated in any order, including in parallel. In logic languages, different parts of a proof procedure can be worked on in parallel, as exemplified by Concurrent PROLOG and PARLOG. Parallelism can also be introduced into object-oriented (or object-based) languages by making objects active; this approach is taken in Emerald. As a result, models for expressing parallelism that are quite different from the basic model have been developed. The parallelism in these models is usually much more fine grain than in the basic model, however. These languages can be made suitable for large-grain distributed architectures by supplementing them with mapping notations, as discussed in Section 2.1.2.

Likewise, some people are dissatisfied with message passing as the basic communication primitive and have developed communication models that do not directly reflect the hardware communication model. One step in this direction is to have processors communicate through a (generalized form of) procedure call [Birrell and Nelson 1984]; this approach is used in Distributed Processes. A more fundamental break with message passing is achieved through communication models based on shared data. Although implemented on a physically distributed system, such shared data systems are logically nondistributed.

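As an illustration of the remote-procedure-call idea mentioned above, the sketch below uses Go's net/rpc package. This is only one modern realization of the concept, not the mechanism of Distributed Processes, and the Calculator service is invented for the example.

    package main

    import (
        "fmt"
        "log"
        "net"
        "net/rpc"
    )

    // Calculator is a hypothetical service; Add can be invoked remotely as if
    // it were an ordinary procedure call.
    type Calculator struct{}

    type AddArgs struct{ A, B int }

    func (c *Calculator) Add(args AddArgs, sum *int) error {
        *sum = args.A + args.B
        return nil
    }

    func main() {
        // Server side: register the service and accept connections.
        srv := rpc.NewServer()
        if err := srv.Register(new(Calculator)); err != nil {
            log.Fatal(err)
        }
        ln, err := net.Listen("tcp", "127.0.0.1:0")
        if err != nil {
            log.Fatal(err)
        }
        go srv.Accept(ln)

        // Client side: the remote operation looks like a procedure call; the
        // flattening of arguments and the message exchange are hidden.
        client, err := rpc.Dial("tcp", ln.Addr().String())
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()
        var sum int
        if err := client.Call("Calculator.Add", AddArgs{A: 2, B: 3}, &sum); err != nil {
            log.Fatal(err)
        }
        fmt.Println("2 + 3 =", sum)
    }
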
Let us make the following distinction between logical and physical distribution. As discussed above, distributed computing systems do not have shared memory; the hardware of such systems is physically distributed. Distributed systems can be contrasted with multiprocessors or uniprocessors, which have a single systemwide primary memory; these systems are physically nondistributed.

A similar distinction can be used for classifying software systems, only here the distinction concerns the logical distribution of the data used by the software, rather than the physical distribution of the memories; for software systems the distinction is logical rather than physical. We define a logically distributed system as follows:

Definition. A logically distributed software system consists of multiple software processes that communicate by explicit message passing.

This is in contrast with a logically nondistributed software system, in which software processes communicate through shared data.

There are four different combinations of logical and physical distribution, each of which is viable:

(1) logically distributed software running on physically distributed hardware,
(2) logically distributed software running on physically nondistributed hardware,
(3) logically nondistributed software running on physically distributed hardware, and
(4) logically nondistributed software running on physically nondistributed hardware.

Let us briefly examine each of these. The first class is simple. A typical example is a collection of processes, each running on a separate processor and communicating using SEND and RECEIVE primitives that send messages over a network (e.g., a hypercube network, LAN, or WAN). The second class has the same logical multiple-process structure, only now the physical message passing is simulated by implementing message passing using shared memory. The third class tries to hide the physical distribution from the programmer by making the system look as if it has shared memory. Finally, the fourth class also uses communication through shared data, only here the existence of physical shared memory makes the implementation much easier.

In this paper we discuss languages for physically distributed systems. Most of these languages are based on logical distribution. Several others, however, are logically nondistributed and allow processes to communicate through some form of shared data [Bal and Tanenbaum 1988]. In such languages, the implementation rather than the programmer deals with the physical distribution of data over several processors. One example in this class is Linda, which supports an abstract global memory called the Tuple Space. Another example is Orca, which allows processes to share variables of abstract data types (objects). Other members of this class are parallel logic languages (e.g., Concurrent PROLOG and PARLOG) and parallel functional languages (e.g., ParAlfl).

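To convey the flavor of such shared-data models, the following sketch approximates Linda's out and in operations on a Tuple Space. It is a deliberately simplified, single-machine approximation written in Go for illustration: tuples are reduced to (name, integer) pairs, and a real implementation would distribute the tuples over the processors and support richer matching.

    package main

    import (
        "fmt"
        "sync"
    )

    // TupleSpace is a drastically simplified stand-in for Linda's Tuple Space.
    type TupleSpace struct {
        mu     sync.Mutex
        cond   *sync.Cond
        tuples map[string][]int
    }

    func NewTupleSpace() *TupleSpace {
        ts := &TupleSpace{tuples: make(map[string][]int)}
        ts.cond = sync.NewCond(&ts.mu)
        return ts
    }

    // Out deposits a tuple into the space (Linda's "out" operation).
    func (ts *TupleSpace) Out(name string, value int) {
        ts.mu.Lock()
        ts.tuples[name] = append(ts.tuples[name], value)
        ts.mu.Unlock()
        ts.cond.Broadcast()
    }

    // In withdraws a matching tuple, blocking until one is available
    // (Linda's "in" operation).
    func (ts *TupleSpace) In(name string) int {
        ts.mu.Lock()
        defer ts.mu.Unlock()
        for len(ts.tuples[name]) == 0 {
            ts.cond.Wait()
        }
        vals := ts.tuples[name]
        v := vals[0]
        ts.tuples[name] = vals[1:]
        return v
    }

    func main() {
        ts := NewTupleSpace()
        var wg sync.WaitGroup
        for w := 0; w < 2; w++ {
            wg.Add(1)
            go func() { // worker processes communicate only through the space
                defer wg.Done()
                job := ts.In("job")
                ts.Out("result", job*job)
            }()
        }
        ts.Out("job", 3)
        ts.Out("job", 4)
        fmt.Println(ts.In("result"), ts.In("result"))
        wg.Wait()
    }
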
The third important issue in the design of a model for distributed programming, besides parallelism and communication, is the handling of processor failures. The basic method for dealing with such failures is to provide a mechanism for failure detection. With this approach, the programmer is responsible for cleaning up the mess that results after a processor crash. The major problem is to bring the system back into a consistent state. This usually can only be done if processor crashes are anticipated and precautions are taken during normal computation (e.g., each process may have to dump its internal state on secondary storage at regular intervals). To release the programmer from all these details, models have been suggested to make recovery from failures easier. Ideally, the system should hide all processor failures from the programmer. Such models have in fact been implemented [Borg et al. 1983]. Alternatively, the programmer can be given high-level mechanisms for expressing which processes and data are important and how they should be recovered after a crash. Languages that use this approach are Argus and Aeolus.

Which model of parallelism, interprocess cooperation, and fault tolerance is most appropriate for a certain application depends very much on the application itself. A distributed system that controls an aircraft can probably do very well without fancy constructs for parallelism. In a distributed banking system, the programmer may want to "see" the distribution of the hardware, so a language that hides this distribution would be most inappropriate. Finally, it makes no sense to apply expensive techniques for fault tolerance to a parallel matrix multiplication batch program that takes only a few seconds to execute. On the other hand, there also are numerous cases where these models are useful.

In the next section, we survey current research in language models and notations for distributed programming. Although we discuss a variety of language primitives, one should keep in mind that all primitives are different solutions to the same three problems: dealing with parallelism, communication, and partial failures in distributed computing systems.

2. LANGUAGE SUPPORT FOR PROGRAMMING DISTRIBUTED SYSTEMS

In the previous section, we discussed our definition of the term distributed computing system and described the kinds of applications that might profitably be run on such systems. We outlined the support required for programming such applications, and what kinds of languages might be expected to provide it. Before describing several of these languages in detail in Section 3, we take this section to discuss in a general way the methods that can be used by programming languages to fulfill the requirements set out in the preceding section.


Table 1. Overview of Language Primitives Discussed in Section 2

Primitive                        Example languages

PARALLELISM
  Expressing parallelism
    Processes                    Ada, Concurrent C, Linda, NIL
    Objects                      Emerald, ConcurrentSmalltalk
    Statements                   Occam
    Expressions                  ParAlfl, FX-87
    Clauses                      Concurrent PROLOG, PARLOG
  Mapping
    Static                       Occam, StarMod
    Dynamic                      Concurrent PROLOG, ParAlfl
    Migration                    Emerald

COMMUNICATION
  Message passing
    Point-to-point messages      CSP, Occam, NIL
    Rendezvous                   Ada, Concurrent C
    Remote procedure call        DP, Concurrent CLU, LYNX
    One-to-many messages         BSP, StarMod
  Data sharing
    Distributed data structures  Linda, Orca
    Shared logical variables     Concurrent PROLOG, PARLOG
  Nondeterminism
    Select statement             CSP, Occam, Ada, Concurrent C, SR
    Guarded Horn clauses         Concurrent PROLOG, PARLOG

PARTIAL FAILURES
    Failure detection            Ada, SR
    Atomic transactions          Argus, Aeolus, Avalon
    Transparent fault tolerance  NIL

As mentioned above, there are three issues that must be addressed in designing a language for distributed programming, above and beyond other programming language issues. These are parallel execution, communication and synchronization between parallel parts of the program, and exceptional conditions brought about by partial failure of the system. As we shall see, each of these issues may be addressed to a greater or lesser degree in a given language, and may be resolved in quite different ways, often depending on the class of distributed application for which the language is intended. Table 1 gives an overview of the primitives described in this section, together with some examples of languages that use the primitives.

2.1 Parallelism

The first issue that must be dealt with in a language for distributed programming is parallel execution. Since a distributed system has by definition more than one processor, it is possible to have more than one part of a program running at the same time. This is what we mean by parallelism.

We begin by drawing a distinction between true parallelism and what we call pseudoparallelism. It is sometimes useful to express a program as a collection of processes running in parallel, whether or not these processes actually run at the same time on different processors. For example, a given problem might lend itself well to being expressed as several largely independent processes, running logically in parallel, even though the program may in fact be run on a uniprocessor with only one piece of it running at a given moment in time. The MINIX operating system, for example, was built using this approach [Tanenbaum 1987]. We call this pseudoparallelism.[5] This technique has been employed in programming languages, especially those intended for writing uniprocessor operating systems, for quite some time.

[5] Some authors (e.g., Scott [1985]) use the term concurrency for denoting pseudoparallel execution. Other authors use the term as a synonym for (real) parallelism, however, so we will not use this term.

Pseudoparallelism is just as useful in distributed programming as it is in uniprocessor programming. But the difference between true parallelism and pseudoparallelism must be kept in mind, despite the fact that in some languages the distinction is hidden from the programmer. For example, if a program consists of four processes and is running on a distributed system of four or more available processors, the four processes may run in truly parallel fashion, one on each processor. On the other hand, the same program may be running on a system with only two processors, in which case two processes may be assigned to run on each of the two processors. In this case, there are two processes running in pseudoparallel on each processor. At a given point in time, at most two of the program's four processes are running truly in parallel.

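The distinction can be made concrete with a small sketch, here assuming Go's runtime purely for illustration: four parallel activities are created, but the runtime is told to use at most two processors, so at most two of them ever run truly in parallel and the rest are interleaved in pseudoparallel.

    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    func main() {
        // Allow at most two of the four "processes" below to run truly in
        // parallel, mirroring the four-processes-on-two-processors example.
        runtime.GOMAXPROCS(2)

        var wg sync.WaitGroup
        for p := 1; p <= 4; p++ {
            wg.Add(1)
            go func(id int) {
                defer wg.Done()
                sum := 0
                for i := 0; i < 1000000; i++ { // some purely local computation
                    sum += i
                }
                fmt.Printf("process %d finished (sum=%d)\n", id, sum)
            }(p)
        }
        wg.Wait()
    }
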
In some languages, the distinction between parallelism and pseudoparallelism is not hidden from the programmer. It may be possible for the programmer to explicitly assign (or map) pieces of programs to processing units. This delivers more complexity into the hands of the programmer, but also provides more flexibility. For example, given a language in which the programmer controls the mapping of processes onto processors, it is possible to support shared variables among processes known to be running on the same processor, and to disallow the sharing of variables between processes assigned to different processors. This is the case with several languages discussed in Section 3 (e.g., SR, Argus).

The granularity of parallelism varies from language to language, as mentioned above. The unit of parallelism in a language ranges from the process (e.g., in Concurrent C) to the expression (in ParAlfl and others). In general, the higher the cost of communication in a distributed system, the larger the appropriate granularity of parallelism. For example, it may be possible to efficiently support fine-grained parallelism in a distributed system with low communication costs, such as a hypercube, whereas in a system with high communication costs, such as a WAN, the communication cost of fine-grained parallelism may outweigh the gain in parallel computation.

Note that the fact of parallelism is distinct from parallelism as an objective. That is, in some applications, a high degree of parallelism is a goal, as it results in shortened computing time for an application. However, not all distributed applications have high parallelism as their main objective. Yet, even in these latter applications, the ability to express parallelism may be important, since this reflects what is actually occurring in the distributed system. Finally, not all languages support explicit control of parallelism. In some languages, the dividing up of code into parallel segments is done by the compiler rather than the programmer. Moreover, in some languages the sending of a message on behalf of one process results in the implicit generation of another, parallel process on the remote host to handle the request.

Below we describe several ways in which parallelism can be expressed in programming languages for distributed systems. We then discuss the mapping of parallel computations onto physical processors. For a discussion of the expression of pseudoparallelism, we refer the reader to Andrews and Schneider [1983].

2.1.1 Expressing Parallelism

Parallelism can be expressed in a variety of ways. An important factor is the language's unit of parallelism. In a sequential language, the unit of parallelism is the whole program. In a language for distributed programming, however, the unit of parallelism can be a process, an object, a statement, an expression, or a clause (in logic languages). We discuss each of these in turn, beginning with the process, as it is intuitively the most obvious.

Processes. In most procedural languages for distributed programming, parallelism is based on the notion of a process. Different languages have different definitions of this notion, but in general a process is a logical processor that executes code sequentially and has its own state and data. Processes (or process types) are declared, just like procedures (and procedure types).

Processes are created either implicitly by their declaration or explicitly by some create construct. With implicit creation, one usually first declares a process type and then creates processes by declaring variables of that type. Often, arrays of processes may be declared. In some languages based on implicit process creation, the total number of processes is fixed at compile time. This makes the efficient mapping of processes onto physical processors easier, but it imposes a restriction on the kinds of applications that can be implemented in the language, since it requires that the number of processes be known in advance.

Having an explicit construct for creating processes allows more flexibility than implicit process creation. For example, the creation construct may allow parameters to be passed to the newly created process. These are typically used for setting up communication channels between processes. If processes do not take parameters (as in Ada [U.S. Department of Defense 1983]), the parameters have to be passed to the newly created process using explicit communication. A mechanism is needed to set up the communication channel over which the parameters are sent.

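A sketch of explicit creation with parameters, using Go only for illustration: each worker is created by an explicit construct (the go statement) and receives its identity and its communication channels as parameters, so no separate mechanism is needed to connect the processes afterwards. The worker code itself is invented.

    package main

    import "fmt"

    // worker is created explicitly; its parameters include the channels it
    // will use for communication with its creator.
    func worker(id int, in <-chan int, out chan<- string) {
        for job := range in {
            out <- fmt.Sprintf("worker %d handled job %d", id, job)
        }
    }

    func main() {
        in := make(chan int)
        out := make(chan string, 5) // buffered so replies never block in this small example

        // Explicit creation of three worker processes, each parameterized
        // with its identity and its channel ends.
        for id := 0; id < 3; id++ {
            go worker(id, in, out)
        }

        for job := 0; job < 5; job++ {
            in <- job
        }
        close(in) // the workers terminate themselves when the channel is closed

        for i := 0; i < 5; i++ {
            fmt.Println(<-out)
        }
    }
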
Another important issue is termination of processes. Processes usually terminate themselves, but some primitive may be provided to abort other processes too. Some precautions may be needed to prevent processes from trying to communicate with a terminated process. In Section 2.2.3 we discuss mechanisms for cooperative termination of multiple processes.

Objects. The notion of object-oriented programming causes as much confusion as the term distributed system. In general, an object is a self-contained unit that encapsulates both data and behavior, and that interacts with the outside world (i.e., other objects) exclusively through some form of message passing. The data contained in the object are visible only within the object itself. The behavior of an object is defined by its class, which comprises a list of operations that can be invoked by sending a message to the object. Inheritance allows a class to be defined as an extension of another (previously defined) class. Languages that support objects but lack inheritance are usually said to be object based.

Objects are primarily intended for structuring programs in a clean and understandable way, reflecting the structure of the problem to be solved as much as possible. At least two different opinions exist on what should be treated as an object. The Smalltalk-80[6] view is simply to consider everything an object, even integers and Booleans [Goldberg and Robson 1983]. The second view (e.g., taken in Aeolus [Wilkes and LeBlanc 1986]) is less pure and lets the programmer decide what objects are.

Parallelism in object-oriented languages can be obtained in one of two ways. Smalltalk-80 includes the traditional notion of a process and lets the programmer deal with two kinds of modules: objects and processes. A more orthogonal approach is to use the object itself as the unit of parallelism.

Sequential object-oriented languages are based on a model of passive objects. An object is activated when it receives a message from another object. While the receiver of the message is active, the sender is waiting for the result, so the sender is passive. After returning the result, the receiver becomes passive again and the sender continues. At any point in time, only one object in the system is active. Parallelism can be obtained by extending the sequential object model in any of the following ways:

(1) Allow an object to be active without having received a message,
(2) allow the receiving object to continue execution after it returns its result,
(3) send messages to several objects at once, or
(4) allow the sender of a message to proceed in parallel with the receiver.

[6] Smalltalk-80 is a trademark of ParcPlace Systems.

Methods (1) and (2) effectively assign a parallel process to each object, resulting in a model based on active objects. Method (4) can be implemented using asynchronous message passing (instead of synchronous message passing) or by letting a single object consist of multiple threads of control.

Parallel Statements. Another way of expressing parallelism is by grouping together statements that are to be executed in parallel. Occam [Inmos Ltd. 1984] allows consecutive statements to be executed either sequentially, as in

    SEQ
      S1
      S2

or in parallel, as in the following:

    PAR
      S1
      S2

This method is easy to use and understand. Initiation and termination of parallel computations are well defined. However, this method gives little support for the structuring of large parallel programs.

The parallel statement described above creates only a fixed number of parallel units. Another method is to use a parallel loop statement. Occam contains a parallel loop statement, similar to a traditional for statement, except that all iterations of the loop are executed in parallel, as in the following:

    PAR i = 0 FOR n
      A[i] := A[i] + 1

Although this construct is easy to use, it is not as general as other mechanisms.

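For comparison, here is a rough Go analogue of the parallel loop above (an illustrative sketch only, not Occam): every iteration runs as a separate parallel unit, and the explicit join provides the well-defined termination point that the PAR construct gives implicitly.

    package main

    import (
        "fmt"
        "sync"
    )

    func main() {
        a := []int{10, 20, 30, 40}

        // Analogue of Occam's "PAR i = 0 FOR n": every iteration is its own
        // parallel unit; the WaitGroup marks the end of the parallel statement.
        var wg sync.WaitGroup
        for i := range a {
            wg.Add(1)
            go func(i int) {
                defer wg.Done()
                a[i] = a[i] + 1 // each iteration touches a distinct element
            }(i)
        }
        wg.Wait()

        fmt.Println(a) // [11 21 31 41]
    }
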

Functional Parallelism. In a pure functional (applicative) language, functions behave as mathematical functions: They compute a result that depends only on the values of their input data. Such functions do not have any side effects. In contrast, procedural (imperative) languages allow functions to affect each other in various ways, for example, through global variables or pointer variables. Procedural languages are claimed to be more flexible, whereas functional languages have a sounder mathematical basis. We will not enter into the holy war between these two schools of thought, but we will concentrate on the way functional languages can be used for programming distributed systems.

If functions do not have any side effects, it makes no difference (except perhaps for termination) in which order they are executed. For example, in the expression

    h(f(3, 4), g(8))

it is irrelevant whether f or g is evaluated first. Consequently, it is possible to evaluate f and g in parallel. In principle, all function calls can be executed in parallel, the only restriction being that a function using the result of another function must wait for that result to become available (e.g., h waits for f and g). This parallelism is fine grained and is well suited for architectures supporting such parallelism, such as data-flow computers. Several data-flow languages are based on this principle, for example, Id and VAL [Ackerman 1982].

For distributed systems (and to some extent also for other architectures), the functional approach has some problems that need to be resolved. First of all, blindly evaluating all functions in parallel is not a very good idea. If a function does relatively little work (such as adding two integers), the overhead of doing it in parallel and communicating the result back to the caller will far outweigh the savings in elapsed computation time. If a certain function call is selected for remote execution, there still remains the choice between evaluating its arguments either locally (and then sending them to the remote processor) or remotely (by dispatching the unevaluated expressions).

Ideally, the compiler should analyze the program and decide on which processor to perform each function call. Since current compilers are not yet capable of taking maximum advantage of parallelism in this way, mechanisms have been proposed to put control in the hands of the programmer [Burton 1984; Hudak 1986].

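The sketch below shows one way the expression h(f(3, 4), g(8)) discussed above could be evaluated with f and g in parallel. It uses Go channels as simple futures, purely for illustration; the functions f, g, and h are invented stand-ins for side-effect-free computations.

    package main

    import "fmt"

    // f and g are pure functions: their results depend only on their
    // arguments, so the two calls below may be evaluated in either order,
    // or in parallel.
    func f(a, b int) int { return a * b }
    func g(x int) int    { return x + 1 }
    func h(p, q int) int { return p - q }

    // future starts the computation in parallel and returns a channel that
    // will eventually carry the result.
    func future(compute func() int) <-chan int {
        c := make(chan int, 1)
        go func() { c <- compute() }()
        return c
    }

    func main() {
        fr := future(func() int { return f(3, 4) })
        gr := future(func() int { return g(8) })

        // h needs both results, so it waits for the two futures, just as the
        // text describes: h waits for f and g.
        fmt.Println(h(<-fr, <-gr)) // equivalent to h(f(3, 4), g(8))
    }
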

AND/OR Parallelism. Logic programming offers many opportunities for parallelism [Takeuchi and Furukawa 1986]. We describe AND/OR parallelism, as this mechanism is suitable for distributed programming and has been incorporated into many parallel logic programming languages.

Logic programs can be read declaratively as well as procedurally. In the code below, two clauses for the predicate A are given:

    (1) A :- B, C, D.
    (2) A :- E, F.

The declarative reading of the clauses is "if B, C, and D are true, then A is true" (Clause (1)) and "if E and F are true, then A is true" (Clause (2)). Procedurally, the clauses can be interpreted as "to prove theorem A, you either have to prove subtheorems (or goals) B, C, and D, or you have to prove subtheorems E and F." From the procedural reading, it becomes clear that there are two opportunities for parallelism:

(1) The two clauses for A can be worked on in parallel, until one of them succeeds, or both fail.
(2) For each of the two clauses, the subtheorems can be worked on in parallel, until they all succeed, or any one of them fails.

The former kind of parallelism is called OR-parallelism; the latter is called AND-parallelism.

The parallel execution of a logic program can also be described in terms of processes, resulting in a third interpretation, the process reading, of logic programs. If we associate a separate process with every subtheorem to be proved, then Clause (1) simply states that a process trying to prove A can be replaced by three parallel processes that try to prove B, C, and D. In general, a clause like

    P0 :- P1, ..., PN

causes a single process to be replaced by N other processes. If N = 0, the original process terminates. For N = 1, the process effectively changes its state, going to work on a different goal. If N > 1, then (N - 1) new processes are created. Such processes are very lightweight and similar in granularity to a procedure call in a procedural language.

If the goals of a clause share some variables, they cannot be evaluated independently, because conflicts may arise when several goals try to generate a value for a shared variable. For example, in the clause

    A :- B(X), C(X)

the variable X creates a dependency between the goals B and C. Several approaches have been suggested to deal with this problem. One method is to have the programmer restrict the rights of goals to instantiate (or bind) shared variables. In Concurrent PROLOG [Shapiro 1986], the notation

    A :- B(X), C(X?)

indicates that B is allowed to generate a binding for X, but C is only allowed to read X. This mechanism can be used for interprocess communication and synchronization, as discussed in Section 2.2.2. Another method for dealing with conflicts is to solve dependent goals sequentially. In general, both compile-time analysis and run-time checks are used to determine if two clauses are independent. Both solutions, restricted instantiation and sequential solution of dependent goals, necessarily restrict parallelism.

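A sketch of OR-parallelism, written in Go for illustration: the two alternative clauses for A are represented by two invented predicate functions that are attempted in parallel, and the proof of A succeeds as soon as either alternative succeeds. A real OR-parallel system would also cancel the losing branch rather than let it run to completion.

    package main

    import (
        "fmt"
        "sync"
    )

    // proveClause1 and proveClause2 stand for the two alternative clauses
    // for A; they are invented stand-ins that report whether their subgoals
    // succeed for a given query.
    func proveClause1(n int) bool { return n%2 == 0 } // "A :- B, C, D."
    func proveClause2(n int) bool { return n > 100 }  // "A :- E, F."

    // proveA tries both clauses in OR-parallel fashion: it succeeds as soon
    // as either alternative succeeds, and fails only when both have failed.
    func proveA(n int) bool {
        results := make(chan bool, 2)
        var wg sync.WaitGroup
        for _, clause := range []func(int) bool{proveClause1, proveClause2} {
            wg.Add(1)
            go func(try func(int) bool) {
                defer wg.Done()
                results <- try(n)
            }(clause)
        }
        go func() { wg.Wait(); close(results) }()

        for ok := range results {
            if ok {
                return true // one alternative succeeded; A is proved
            }
        }
        return false // both alternatives failed
    }

    func main() {
        fmt.Println(proveA(4))   // true: clause (1) succeeds
        fmt.Println(proveA(101)) // true: clause (2) succeeds
        fmt.Println(proveA(7))   // false: both clauses fail
    }
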
2.1.2 Mapping Parallel Computations onto Physical Processors

In the previous section, we described several ways in which languages for distributed programming can provide support for the expression of parallelism. A related issue is how these parallel computations are distributed over the available physical processors, in other words, which parallel unit is executed on which processor at a given point in time. We refer to the assignment of computations to processors as mapping. Some languages give the programmer control over mapping, and in this section we describe some ways in which this can be expressed.

Mapping strategies vary depending on the application to be implemented. The assignment of processes to processors will be quite different in an application whose objective is to obtain maximum speedup through parallelism, and in an application whose objective is to obtain high availability through replication, for example.

When the goal of a distributed program is to speed up computation time through parallelism, the mapping of processes to processors is similar to load balancing in distributed operating systems: Both attempt to maximize parallelism through efficient use of available computing power. But there are important differences. An operating system tries to distribute the available processing power fairly among competing processes from different programs and different users. It may try to reduce communication costs by having processes that communicate frequently run in pseudoparallel on the same processor. The goal of mapping, however, is to minimize the execution time of a single distributed program. As all parallel units are part of the same program, they are cooperating rather than competing, so fairness need not be an issue. In addition, the reduction of communication overhead achieved through mapping processes to the same processor must be weighed against the resulting loss of parallelism [Kruatrachue and Lewis 1988].

If the application's goal is to increase fault tolerance, an entirely different mapping strategy may be taken. Processes may be replicated to increase availability. The mapping strategy should at least assign the replicas of the same logical process to different physical processors.

An important choice in the design of a parallel language is whether mapping will be under user control. If not, mapping is done transparently by the compiler and language run-time system, possibly assisted by the operating system. At first sight, this may ease the programmer's task, but the system generally does not have any knowledge about the problem being implemented, so problem-specific mapping strategies would be ruled out. This is a severe restriction for many applications.

ing of a process are those of its parent An important choice in the design of a (creator), but they can be altered using a parallel language is whether mapping will sequence of Turtle commands. For exam- be under user control. If not, mapping is ple, if a process located on Processor P and done transparently by the compiler and heading northward uses the rule language run-time system, possibly assisted by the operating system. At first sight, this A :- B, C @ (left, forward), may ease the programmer’s task, but the D @ (right, forward). system generally does not have any knowl- edge about the problem being implemented, to solve A, then Process B is created on so problem-specific mapping strategies Processor P, Process C is created on the would be ruled out. This is a severe restric- processor to the west of P, and Process D tion for many applications. is created on the processor to the east of P.
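As a concrete illustration of the two-step, user-controlled mapping just described, the following sketch (in Go, which is not one of the surveyed languages) assigns parallel units to processors at run time according to a programmer-supplied placement function and then uses per-unit priorities for the local scheduling step. The Unit type, the placeOn function, and the use of one goroutine per "processor" are our own illustrative assumptions, not constructs of any language discussed here.

    package main

    import (
        "fmt"
        "sort"
        "sync"
    )

    // Unit is one parallel unit of a program, with a priority used
    // for the local (per-processor) scheduling step.
    type Unit struct {
        Name     string
        Priority int
        Work     func()
    }

    // placeOn is the programmer-controlled global mapping step: it
    // decides, at run time, which processor each unit runs on.
    func placeOn(units []Unit, processors int) map[int][]Unit {
        mapping := make(map[int][]Unit)
        for i, u := range units {
            p := i % processors // a problem-specific strategy would go here
            mapping[p] = append(mapping[p], u)
        }
        return mapping
    }

    func main() {
        units := []Unit{
            {"worker-1", 1, func() { fmt.Println("worker-1 running") }},
            {"worker-2", 3, func() { fmt.Println("worker-2 running") }},
            {"replica-A", 2, func() { fmt.Println("replica-A running") }},
        }
        var wg sync.WaitGroup
        for proc, local := range placeOn(units, 2) {
            // Local mapping step: schedule co-located units by priority.
            sort.Slice(local, func(i, j int) bool { return local[i].Priority > local[j].Priority })
            wg.Add(1)
            go func(proc int, local []Unit) { // one goroutine stands in for one processor
                defer wg.Done()
                for _, u := range local {
                    fmt.Printf("processor %d: ", proc)
                    u.Work()
                }
            }(proc, local)
        }
        wg.Wait()
    }

A problem-specific placement (for example, keeping replicas of the same logical process on different processors) would replace the simple round-robin rule inside placeOn.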

Only a few languages support the third approach to processor allocation, allowing a process to execute on different processors during its lifetime. Emerald, for example, is an object-based language that allows objects to migrate from one processor to another [Jul et al. 1988]. The language has primitives to determine the current location of an object, to fix or unfix an object on a specific processor, and to move an object to a different processor.

2.2 Interprocess Communication and Synchronization

The second issue that must be addressed in the design of a language for distributed programming is how the pieces of a program that are running in parallel on different processors are going to cooperate. This cooperation involves two types of interaction: communication and synchronization. For example, Process A may require some data X that is the result of some computation performed by Process B. There must be some way of getting X from B to A. In addition, if Process A comes to the point in its execution that requires the information X from Process B, but Process B has not yet communicated the information to A for whatever reason, A must be able to wait for it. Synchronization and communication mechanisms are closely related, and we treat them together.

An issue related to synchronization is nondeterminism. A process may want to wait for information from any of a group of other processes, rather than from one specific process. As it is not known in advance which member (or members) of the group will have its information available first, such behavior is nondeterministic. In some cases it is useful to dynamically control the group of processes from which to take input. For example, a buffer process may accept a request from a producer process to store an item in the buffer whenever the buffer is not full; it may accept a request from a consumer process to remove an item whenever the buffer is not empty. To program such behavior, a notation is needed to express and control nondeterminism. We look at such notations in Section 2.2.3.

Expression of interprocess communication (IPC) falls into two general categories (shared data and message passing), although this categorization is not always clear-cut. (We adopt the well-known term interprocess communication although it is somewhat misleading, since the unit of parallelism is not always the process, as has been discussed above; in the rest of this section, we use the term process as a shorthand for unit of parallelism.) Parallel logic languages that provide shared logical variables, for example, are frequently used for programming in a message-passing style (see Section 2.2.2). Note that the model provided by the language for expressing IPC and the implementation of that model may be two entirely different things; in particular, since we restrict our discussion to languages for systems without shared memory, any shared data model must be simulated in the language implementation.

2.2.1 Message Passing

We first discuss communication through message passing. Many factors come into play in the sending of a message: who sends it, what is sent, to whom is it sent, is it guaranteed to have arrived at the remote host, is it guaranteed to have been accepted by the remote process, is there a reply (or several replies), and what happens if something goes wrong. There are also many considerations involved in the receipt of a message: for which process or processes on the host, if any, is the message intended; is a process to be created to handle this message; if the message is intended for an existing process, what happens if the process is busy (is the message queued or discarded); and if a receiving process has more than one outstanding message waiting to be serviced, can it choose the order in which it services messages, be it FIFO, by sender, by some message type or identifier, by the contents of the message, or according to the receiving process's internal state.
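Anticipating the notations of Section 2.2.3, the bounded-buffer behavior described above (accept a producer's request only while there is room, a consumer's request only while an item is available) can be sketched in Go; this is our own illustration, not a construct from any surveyed language. Setting a channel to nil disables the corresponding alternative, playing the role of a guard.

    package main

    import "fmt"

    // buffer accepts deposit requests only while it has room and
    // fetch requests only while it holds at least one item.
    func buffer(capacity int, deposit <-chan int, fetch chan<- int) {
        var items []int
        for {
            // Enable or disable each alternative according to the buffer state;
            // a nil channel makes the corresponding case block forever.
            dep, fet := deposit, fetch
            if len(items) == capacity {
                dep = nil // "not full" guard fails
            }
            var head int
            if len(items) == 0 {
                fet = nil // "not empty" guard fails
            } else {
                head = items[0]
            }
            select {
            case x := <-dep:
                items = append(items, x)
            case fet <- head:
                items = items[1:]
            }
        }
    }

    func main() {
        deposit, fetch := make(chan int), make(chan int)
        go buffer(2, deposit, fetch)
        go func() { // producer
            for i := 1; i <= 5; i++ {
                deposit <- i
            }
        }()
        for i := 0; i < 5; i++ { // consumer
            fmt.Println("got", <-fetch)
        }
    }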

We begin with a general discussion of issues common to all message-passing mechanisms. We then outline four specific message-passing models: point-to-point messages, rendezvous, remote procedure call, and one-to-many messages.

General Issues. The most elementary primitive for message-based interaction is the point-to-point message from one process (the sender) to another process (the receiver). Languages usually provide only reliable message passing. The language run-time system (or the underlying operating system) automatically generates acknowledgment messages, transparent at the language level.

Most (but not all) message-based interactions involve two parties, one sender and one receiver. The sender initiates the interaction explicitly, for example, by sending a message or invoking a remote procedure. The receipt of a message, on the other hand, may be either explicit or implicit. With explicit receipt, the receiver executes some sort of accept statement specifying which messages to accept and what actions to undertake when a message arrives. With implicit receipt, code is automatically invoked within the receiver; it usually creates a new thread of control within the receiving process. Whether the message is received implicitly or explicitly is transparent to the sender.

Explicit message receipt gives the receiver more control over the acceptance of messages. The receiver can be in many different states and accept different types of messages in each state. More accurate control is possible if the accept statement allows messages to be accepted conditionally, depending on the arguments of the message (as in SR [Andrews 1981] and Concurrent C [Gehani and Roome 1989]). A file server, for example, may want to accept a request to open a file only if the file is not locked. In Concurrent C this can be coded as follows:

    accept open(f) suchthat not_locked(f) {
        ... process open request ...
    }

Some languages give the programmer control over the order of message acceptance. Usually, messages are accepted in FIFO order, but occasionally it is useful to change this order according to the type, sender, or contents of a message. For example, the file server may want to handle read requests for small amounts of data first:

    accept read(f, offset, nr_bytes) by nr_bytes {
        ... process read request ...
    }

The value given in the by expression determines the order of acceptance. If conditional or ordered acceptance is not supported by the language, an application needing these features will have to keep track of requests that have been accepted but not handled yet.
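The application-level bookkeeping just mentioned, for languages without ordered acceptance, can look like the following Go sketch (our own illustration; the ReadReq type and its fields are invented). Every incoming read request is accepted immediately but queued, and the pending requests are always serviced smallest-first, mimicking the "by nr_bytes" ordering of the Concurrent C example.

    package main

    import (
        "fmt"
        "sort"
    )

    // ReadReq is a hypothetical file-server read request.
    type ReadReq struct {
        File    string
        Offset  int
        NrBytes int
        Reply   chan []byte
    }

    // fileServer accepts requests as they arrive but keeps them in a
    // pending list and always services the smallest pending request first.
    func fileServer(requests <-chan ReadReq) {
        var pending []ReadReq
        for {
            if len(pending) == 0 {
                pending = append(pending, <-requests) // nothing pending: wait for a request
                continue
            }
            select {
            case r := <-requests: // accept (but do not yet handle) another request
                pending = append(pending, r)
            default: // no new request waiting: serve the smallest pending one
                sort.Slice(pending, func(i, j int) bool {
                    return pending[i].NrBytes < pending[j].NrBytes
                })
                r := pending[0]
                pending = pending[1:]
                r.Reply <- make([]byte, r.NrBytes) // stand-in for the actual file read
            }
        }
    }

    func main() {
        reqs := make(chan ReadReq)
        go fileServer(reqs)
        reply := make(chan []byte)
        reqs <- ReadReq{File: "f", Offset: 0, NrBytes: 4096, Reply: reply}
        fmt.Println("got", len(<-reply), "bytes")
    }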

Another major issue in message passing is naming (or addressing) of the parties involved in an interaction: to whom does the sender wish to send its message and, conversely, from whom does the receiver wish to accept a message? These parties can be named directly or indirectly. Direct naming is used to denote one specific process. The name can be the static name of the process or an expression evaluated at run time. A communication scheme based on direct naming is symmetric if both the sender and the receiver name each other. In an asymmetric scheme, only the sender names the receiver; in this case, the receiver is willing to interact with any sender. Note that interactions using implicit receipt of messages are always asymmetric with respect to naming. Direct naming schemes, especially the symmetric ones, leave little room for expressing nondeterministic behavior. Languages using these schemes therefore have a separate mechanism for dealing with nondeterminism (see Section 2.2.3).

Indirect naming involves an intermediate object, usually called a mailbox, to which the sender directs its message and to which the receiver listens. In its simplest form, a mailbox is just a global name. More advanced schemes treat mailboxes as values that can be passed around, for example, as part of a message. This option allows highly flexible communication patterns to be expressed. Mailing a letter to a post office box rather than a street address illustrates the difference between indirect and direct naming. A letter sent to a post office box can be collected by anyone who has a key to the box. People can be given access to the box by duplicating keys or by transferring existing keys (possibly through another P.O. box). A street address, on the other hand, does not have this flexibility.

Synchronous and Asynchronous Point-to-Point Messages. The major design issue for a point-to-point message-passing system is the choice between synchronous and asynchronous message passing. With synchronous message passing, the sender is blocked until the receiver has accepted the message (explicitly or implicitly). Thus, the sender and receiver not only exchange data, but they also synchronize. With asynchronous message passing, the sender does not wait for the receiver to be ready to accept its message. Conceptually, the sender continues immediately after sending the message. The implementation of the language may suspend the sender until the message has at least been copied for transmission, but this delay is not reflected in the semantics.

In the asynchronous model, there are some semantic difficulties to be dealt with. As the sender S does not wait for the receiver R to be ready, there may be several pending messages sent by S but not yet accepted by R. If the message-passing primitive is order preserving, R will receive the messages in the order they were sent by S. The pending messages are buffered by the language run-time system or the operating system. The problem of a possible buffer overflow can be dealt with in one of two ways. Message transfers can simply fail whenever there is no more buffer space; unfortunately, this makes message passing less reliable. The second option is to use flow control, which means the sender is blocked until the receiver accepts some messages. This introduces a synchronization between the sender and receiver and may result in unexpected deadlock.

In the synchronous model, there can be only one pending message from any process S to a process R. Usually, no ordering relation is assumed between messages sent by different processes. Buffering problems are less severe in the synchronous model, as a receiver need buffer at most one message from each sender, and additional flow control will not change the semantics of the primitive. On the other hand, the synchronous model also has its disadvantages. Most notably, synchronous message passing is less flexible than asynchronous message passing, because the sender always has to wait for the receiver to accept the message, even if the receiver does not have to return an answer [Gehani 1987].
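The synchronous/asynchronous distinction maps directly onto unbuffered versus buffered channels in Go; the sketch below is our illustration, not a construct from any of the surveyed languages. An unbuffered channel blocks the sender until the receiver accepts the message, while a buffered channel lets the sender run ahead until the buffer fills, at which point flow control blocks it.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        // Synchronous point-to-point message: the send blocks until the
        // receiver is ready to accept it (sender and receiver synchronize).
        sync := make(chan string)
        go func() {
            time.Sleep(100 * time.Millisecond) // receiver not ready yet
            fmt.Println("received:", <-sync)
        }()
        sync <- "hello" // does not return until the receiver takes the message
        fmt.Println("synchronous send completed")

        // Asynchronous point-to-point message: sends are buffered by the
        // run-time system; the sender blocks only when the buffer is full
        // (flow control), which is one way to deal with buffer overflow.
        async := make(chan int, 2)
        async <- 1 // returns immediately
        async <- 2 // returns immediately
        fmt.Println("two asynchronous sends completed without a receiver")
        fmt.Println("drained:", <-async, <-async)
    }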

Rendezvous. A point-to-point message establishes one-way communication between two processes. Many interactions between processes, however, are essentially two-way in nature. For example, in the client/server model the client requests a service from a server and then waits for the result returned by the server. This behavior can be simulated using two point-to-point messages, but a single higher level construct is easier to use and more efficient to implement. We will describe two such constructs, rendezvous and remote procedure call.

The rendezvous model is based on three concepts: the entry declaration, the entry call, and the accept statement (here we use the terminology introduced by Ada). The entry declaration and accept statement are part of the server code, while the entry call is on the client side. An entry declaration syntactically looks like a procedure declaration. An entry has a name and zero or more formal parameters. An entry call is similar to a procedure call statement. It names the entry and the process containing the entry, and it supplies actual parameters. An accept statement for the entry may contain a list of statements, to be executed when the entry is called, as in the following accept statement for the entry incr:

    accept incr(X: integer; Y: out integer) do
        Y := X + 1;
    end;

An interaction (called a rendezvous) between two processes S and R takes place when S calls an entry of R, and R executes an accept statement for that entry. The interaction is fully synchronous, so the first process that is ready to interact waits for the other. When the two processes are synchronized, R executes the do part of the accept statement. While executing these statements, R has access to the input parameters of the entry, supplied by S. R can assign values to the output parameters, which are passed back to S. After R has executed the do statements, S and R continue their execution in parallel. R may still continue working on the request of S, although S is no longer blocked.

Remote Procedure Call. Remote procedure call (RPC) is another primitive for two-way communication. It resembles a normal procedure call, except that the caller and receiver are different processes. When a process S calls a remote procedure P of a process R, the input parameters of P, supplied by S, are sent to R. When R receives the invocation request, it executes the code of P and then passes any output parameters back to S. During the execution of P, S is blocked; S is reactivated by the arrival of the output parameters. This is in contrast with the rendezvous mechanism, where the caller is unblocked as soon as the accept statement has been executed. Like rendezvous, RPC is a fully synchronous interaction. Acceptance of a remote call is usually (but not always) implicit and creates a new thread of control within the receiver.

A major design choice is between a transparent and a nontransparent RPC mechanism. Transparent RPC offers semantics close to a normal procedure call. This model, advocated by Nelson and Birrell, has significant advantages [Nelson 1981; Birrell and Nelson 1984]. Foremost, it gives the programmer a simple, familiar primitive for interprocess communication and synchronization. It also is a sound basis for porting existing sequential software to distributed systems.

Unfortunately, achieving exactly the same semantics for RPC as for normal procedures is close to impossible [Tanenbaum and van Renesse 1988]. One source of problems is that, in the absence of shared memory, pointers (address values) are meaningless on a remote processor. This makes pointer-valued parameters and call-by-reference parameters highly unattractive. De-referencing a pointer passed by the caller has to be done at the caller's side, which implies extra communication. An alternative implementation is to send a copy of the value pointed at to the receiver, but this has subtly different semantics and may be difficult to implement if the pointer points into the middle of a complex data structure, such as a directed graph. In languages lacking strong type checking, it may not even be clear what type of object the pointer points to. Similarly, call-by-reference can be replaced by copy-in/copy-out, but also at the cost of slightly different semantics. The issue of passing arguments to a remote procedure is discussed further by Herlihy and Liskov [1982].

The possibility of processor crashes makes it even more difficult to obtain the same semantics for RPC as for normal procedures. If S calls a remote procedure P of a process R and the processor of R crashes before S gets the results back, then S clearly is in trouble. First, the results S is waiting for will never arrive. Second, it is not known whether R died before receiving the call, during the execution of P, or after executing P (but before returning the results). The first problem can be solved using time-outs. The second problem is more serious. If P has no side effects, the call can be repeated, perhaps on a different processor or after a certain period of time. If P does have side effects (e.g., incrementing a bank account in a database), executing (part of) P twice may be undesirable.
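To make the blocking structure of a remote call concrete, here is a small Go sketch (our own illustration, not any particular RPC system) in which the caller sends a request, blocks until the reply arrives or a time-out fires, and naively retries on time-out. This is exactly the situation in which a procedure with side effects may end up being executed twice.

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    type request struct {
        arg   int
        reply chan int
    }

    // server plays the role of the remote process: it accepts a request,
    // runs the procedure body, and sends back the output parameter.
    func server(calls <-chan request) {
        balance := 0
        for c := range calls {
            balance += c.arg // a side effect: retrying the call repeats it
            c.reply <- balance
        }
    }

    // call is the client stub: it blocks until the reply arrives, and on
    // time-out it simply retries, which gives at-least-once behavior.
    func call(calls chan<- request, arg int) (int, error) {
        for attempt := 0; attempt < 3; attempt++ {
            reply := make(chan int, 1)
            calls <- request{arg: arg, reply: reply}
            select {
            case r := <-reply:
                return r, nil
            case <-time.After(time.Second):
                // The server may have crashed before, during, or after the
                // call; we cannot tell, so we retry and risk a duplicate.
            }
        }
        return 0, errors.New("remote call failed")
    }

    func main() {
        calls := make(chan request)
        go server(calls)
        r, err := call(calls, 10)
        fmt.Println(r, err)
    }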


Because of these difficulties in achieving normal call semantics for remote calls, Hamilton argues that remote procedures should be treated differently from the start, resulting in a nontransparent RPC mechanism [Hamilton 1984]. Almes describes an RPC implementation in the context of an existing language (Modula-2) and distributed operating system (the system) [Almes 1986]. Although the goal of the implementation was to make remote calls as similar to normal calls as possible, special features for remote calls had to be added to obtain an efficient implementation. Almes's RPC system therefore is also nontransparent.

One-to-Many Message Passing. Many networks used for distributed computing systems support a fast broadcast or multicast facility. A broadcast message is sent to all processors connected to the network. A multicast message is sent to a specific subset of these processors. It takes about the same time to broadcast or multicast a message as to send it to one specific processor. Unfortunately, it is not guaranteed that messages are actually delivered at all destinations. The hardware attempts to send the messages to all processors involved, but messages may get lost due to communication errors or because some receiving processors were not ready to accept a message.

Despite being unreliable, broadcast and multicast are useful for operating system kernels and language run-time systems. For example, to locate a processor providing a specific service, an enquiry message may be broadcast. In this case, it is not necessary to receive an answer from every host: Just finding one instance of the service is sufficient. Broadcast and multicast are also useful for implementing distributed algorithms, so some languages provide a one-to-many message-passing primitive.

One-to-many communication has several advantages over point-to-point message passing. If a process needs to send data to many other processes, a single multicast will be faster than many point-to-point messages. More importantly, a broadcast primitive may guarantee a certain ordering of messages that cannot be obtained easily with point-to-point messages [Birman and Joseph 1987]. A broadcast primitive that delivers messages at all destinations in the same order, for example, is highly useful for consistent updating of replicated data [Joseph and Birman 1986; Bal and Tanenbaum 1988]. Finally, broadcasting may lead to new programming styles.

Gehani describes a system of Broadcasting Sequential Processes (BSP) based on CSP and the concept of broadcast programming [Gehani 1984b]. In CSP a message is sent to one specific process. In BSP a message can also be sent to all processes or to a list of processes. Both primitives are reliable (i.e., messages are delivered at all destinations). If the underlying hardware is not reliable, extra software protocols have to be added by the operating system or language run-time system. Broadcast in BSP is asynchronous, because the sender normally does not want to wait until all other processes are ready to receive a message. Two forms of broadcast are defined. An unbuffered broadcast message is only received by those processes ready to accept one. Buffered broadcast messages are buffered by the receiving processes, so each process will eventually receive the message. A receiver may accept messages from any process, or it may screen out messages based on their contents or on the identity of the sender (passed as part of the message).
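A one-to-many send can be sketched in Go as a fan-out over per-receiver channels; this is only our illustration of the buffered-broadcast idea (every receiver eventually gets the message because it is queued per receiver), not BSP syntax, and a real broadcast network would deliver the message in a single operation, which a fan-out loop does not capture.

    package main

    import (
        "fmt"
        "sync"
    )

    // broadcast delivers msg to every receiver; per-receiver buffering plays
    // the role of BSP's buffered broadcast, so no receiver misses the message.
    func broadcast(receivers []chan string, msg string) {
        for _, r := range receivers {
            r <- msg
        }
    }

    func main() {
        const n = 3
        receivers := make([]chan string, n)
        var wg sync.WaitGroup
        for i := 0; i < n; i++ {
            receivers[i] = make(chan string, 8) // buffered: the sender does not wait
            wg.Add(1)
            go func(id int, in <-chan string) {
                defer wg.Done()
                fmt.Printf("process %d received %q\n", id, <-in)
            }(i, receivers[i])
        }
        broadcast(receivers, "where is the file service?")
        wg.Wait()
    }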

2.2.2 Data Sharing

In the previous section, we discussed models of interprocess communication based on message passing. In this section, we describe how parts of a distributed program can communicate and synchronize through the use of shared data. If two processes have access to the same variable, communication can take place by one process setting the variable and the other process reading it. This is true whether the processes are running on the host where the variable is stored and can manipulate it directly, or the processes are on different hosts and access the variable by sending a message to the host on which it resides.

The use of shared variables for the communication and synchronization of processes running in pseudoparallel on a uniprocessor has been studied extensively. We assume a familiarity with this material; the uninitiated reader is referred to Andrews and Schneider [1983] for an excellent overview.

As mentioned above, many distributed languages support processes running in pseudoparallel on the same processor, and these often use traditional methods of communication and synchronization through shared variables. See, for example, the description of mutex in Argus and semaphores in SR in Section 3. What we are interested in here, however, is the use of shared data for communication and synchronization of processes running on different processors.

At first sight it may seem to be unnatural to use shared data for communication in a distributed system, as such systems do not have physically shared memory. However, the shared data paradigm has several advantages (as well as disadvantages) over message passing [Bal and Tanenbaum 1988]. Whereas a message generally transfers information between two specific processes, shared data are accessible by any process. Assignment to shared data conceptually has immediate effect; in contrast, there is a measurable delay between sending a message and its being received. On the other hand, shared data require precautions to prevent multiple processes from simultaneously changing the same data. As neither of the paradigms is universally better than the other one, both paradigms are worth investigating.

Simple shared variables, as used, for example, in Algol 68 [van Wijngaarden et al. 1975], are not well suited for distributed systems. In principle, they can be implemented by simulating shared physical memory, using, for example, a method such as Li's shared virtual memory [Li and Hudak 1986]. None of the languages we know of does this, however, probably due to performance considerations. Several other communication models based on shared data exist, however, that are better suited for distributed systems. These models place certain restrictions on the shared data, making a distributed implementation feasible. Below we describe two methods for providing shared data to distributed processes: distributed data structures and shared logical variables. Both models are used in several languages for distributed programming (see Section 3) that have been implemented on different kinds of distributed architectures. These languages are mainly useful for applications where the programmer need not be aware of the physical distribution of main memory, as discussed in Section 1.

Note that objects, whose role in expressing parallelism was discussed in Section 2.1.1, may also be thought of as implementing shared data in a distributed program. Just as with the shared data models that are discussed in this section, two processes may communicate indirectly with one another by invoking operations on a given object. Objects, since they control access to the data they manage, can also implement synchronization of access to those data by other processes, analogously to the synchronization of pseudoparallel processes accessing data controlled by a monitor.

A different approach to the synchronization of distributed access to shared data is taken by languages that implement atomic transactions. Since this approach also involves dealing with partial failures of the distributed system, we treat it in the section on atomic transactions.

Distributed Data Structures. Distributed data structures are data structures that can be manipulated simultaneously by several processes [Carriero et al. 1986]. This paradigm was first introduced in the language Linda, which uses the concept of a Tuple Space for implementing distributed data structures [Ahuja et al. 1986]. We will use the Tuple Space model for discussing the distributed data structures paradigm.

The Tuple Space (TS) is conceptually a shared memory, although its implementation does not require physical shared memory. The TS is one global memory shared by all processes of a program [Gelernter 1985]. The elements of TS, called tuples, are ordered sequences of values, similar to records in Pascal [Wirth 1971]. For example,

    ["jones", 31, true]

is a tuple with three fields: a string, an integer, and a Boolean.


Three atomic operations are defined on TS: out adds a tuple to TS, read reads a tuple contained in TS, and in reads a tuple and also deletes it from TS. Unlike normal shared variables, tuples do not have addresses. Rather, tuples are addressed by their contents. A tuple is denoted by specifying either the value or the type of each field. This is expressed by supplying an actual parameter (a value) or a formal parameter (a variable) to an operation. For example, if age is a variable of type integer and married is a variable of type Boolean, then the tuple shown above can be read in the operation

    read("jones", var age, var married)

or read and removed in the operation

    in("jones", var age, var married)

In both operations, the variable age is assigned the value of the second field (31) and the variable married gets the value of the last field (true). Both the in and the read operations try to find a matching tuple in TS. A tuple matches if each field has the value or type passed as parameter to the operation. If several matching tuples exist, one is chosen arbitrarily. If there are no matching tuples, the operation (and the invoking process) blocks until another process adds a tuple that does match (using out).

There is no operation that modifies a tuple in place. To change a tuple, it must first be removed from TS, then modified, and then put back into TS. Each read, in, or out operation is atomic: The effect of several simultaneous operations on the same tuple is the same as that of executing them in some (undefined) sequential order. In particular, if two processes want to remove the same tuple, only one of them will succeed, and the unlucky one will block.

These two properties make it possible to build distributed data structures in TS. For example, a distributed array can be built out of tuples of the form [name, index, value]. The value of element i of array A can be read into a local integer variable X with a simple read operation:

    read("A", i, var X)

To assign a new value Y to element i, the current tuple representing A[i] is removed first; then a tuple with the new value is generated:

    in("A", i, var void)
    out("A", i, Y)

To increment element i, the current tuple is removed from TS, its value is stored in a temporary variable, and the new value is computed and stored in a new tuple:

    in("A", i, var tmp)
    out("A", i, tmp + 1)

If two processes simultaneously want to increment the same array element, the element will indeed be incremented twice. Only one process will succeed in doing the in, and the other process will be blocked until the first one has put the new value of A[i] back into TS.

In a distributed implementation of TS, the run-time system takes care of the distribution of tuples among the processors. Several strategies are possible, such as replicating the entire TS on all processors, hashing tuples onto specific processors, or storing a tuple on the processor that did the out operation [Gelernter 1985].

In contrast with interprocess communication accomplished through message passing, communication through distributed data structures is anonymous. A process reading a tuple from TS does not know or care which other process inserted the tuple. Neither did the process executing an out on a tuple specify which process the tuple was intended to be read by. This information could in principle be included in a distributed data structure, for example, by having sender and receiver fields as part of the structure, but it is not an inherent part of the model.
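As an illustration of these three operations, the following Go sketch implements a tiny, single-machine Tuple Space with Out, Rd (read), and In; blocking on a missing match uses a condition variable. It is our own toy model of the semantics described above (matching by value, arbitrary choice among matches, blocking when nothing matches), not Linda's actual implementation, and it ignores distribution entirely.

    package main

    import (
        "fmt"
        "sync"
    )

    // Tuple fields are compared by value; a nil field in a template acts as
    // a formal parameter (wildcard) that matches any value.
    type Tuple []interface{}

    type TupleSpace struct {
        mu     sync.Mutex
        cond   *sync.Cond
        tuples []Tuple
    }

    func NewTupleSpace() *TupleSpace {
        ts := &TupleSpace{}
        ts.cond = sync.NewCond(&ts.mu)
        return ts
    }

    func matches(template, t Tuple) bool {
        if len(template) != len(t) {
            return false
        }
        for i, f := range template {
            if f != nil && f != t[i] { // nil = formal parameter, else actual value
                return false
            }
        }
        return true
    }

    // Out adds a tuple and wakes up any blocked readers.
    func (ts *TupleSpace) Out(t Tuple) {
        ts.mu.Lock()
        ts.tuples = append(ts.tuples, t)
        ts.mu.Unlock()
        ts.cond.Broadcast()
    }

    // take finds a matching tuple, optionally removing it; it blocks
    // until a match exists, mirroring the semantics of rd and in.
    func (ts *TupleSpace) take(template Tuple, remove bool) Tuple {
        ts.mu.Lock()
        defer ts.mu.Unlock()
        for {
            for i, t := range ts.tuples {
                if matches(template, t) {
                    if remove {
                        ts.tuples = append(ts.tuples[:i], ts.tuples[i+1:]...)
                    }
                    return t
                }
            }
            ts.cond.Wait() // no matching tuple yet: block the invoking process
        }
    }

    func (ts *TupleSpace) Rd(template Tuple) Tuple { return ts.take(template, false) }
    func (ts *TupleSpace) In(template Tuple) Tuple { return ts.take(template, true) }

    func main() {
        ts := NewTupleSpace()
        ts.Out(Tuple{"jones", 31, true})
        t := ts.Rd(Tuple{"jones", nil, nil}) // formal parameters for the last two fields
        fmt.Println("age:", t[1], "married:", t[2])
        ts.In(Tuple{"jones", nil, nil}) // read and remove
    }

The increment idiom from the text (In the current tuple, then Out a new one) serializes concurrent updates in this model because only one process's In can remove the tuple.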

Shared Logical Variables. Another shared data model is the shared logical variable. Logical variables have the "single-assignment" property. Initially, they are unbound, but once they receive a value (by unification), they cannot be changed. In Section 2.1.1 we noted that this property can cause conflicts between parallel processes sharing logical variables. Below, we show how such variables can be used as communication channels between processes.

As an example, assume the three goals of the conjunction

    goal_1(X, Y), goal_2(X, Y), goal_3(X)

are solved in parallel by Processes P1, P2, and P3. The variable X (initially unbound) is a communication channel between the three processes. If any of them binds X to a value, the other processes can use this value. Likewise, Y is a channel between P1 and P2.

Processes synchronize by suspending on unbound variables. If Y is to be used to send a message from P1 to P2, then P2 can suspend until Y is bound by P1. There are several ways to realize suspension on shared variables, but the general idea is to restrict the rights of specific processes to generate bindings for variables (i.e., to unify them with anything but an unbound variable). If a process wants to unify two terms, the unification may need to generate a binding for some variables. If the process does not have the right to bind one of these variables, the process suspends until some other process that does have this right generates a binding for the variable. The first process can then continue its unification of the two terms. Examples of mechanisms to restrict the rights for binding variables are the read-only variables of Concurrent PROLOG [Shapiro 1986] and the mode declarations of PARLOG [Clark and Gregory 1986].

At first sight, shared logical variables seem to be capable of transferring only a single message, as bindings cannot be undone. But, in fact, the logical variable allows many communication patterns to be expressed. The key idea is to bind a logical variable to a term containing other (unbound) variables, which can be used as channels for further communication. A logical variable is like a Genie, from which you can ask one wish. What would you ask such a Genie? To have two more wishes! Then use one of them, and iterate. (This analogy was contributed by Ehud Shapiro.)

This idea has been used to develop several programming techniques. For example, a stream of messages between a producer and a consumer is created by having the producer bind a shared variable to a list with two fields, head and tail. The head is bound to the message, and the tail is the new stream variable, used for subsequent communications (wishes). This is illustrated in Figure 2, where the first call, producer(1, S), will cause S to be bound to [1 | S1], where S1 is an unbound variable. The next (recursive) call, producer(2, S1), binds S1 to [4 | S2], where S2 is unbound. The call consumer(S) will cause the consumer process to be blocked until S is bound by the producer. When S is bound to [1 | S1], the consumer wakes up, calls use(1), followed by the recursive call consumer(S1). The latter call blocks until S1 is bound to [4 | S2], and so on.

Other techniques implementable with shared logical variables are bounded-buffer streams [Takeuchi and Furukawa 1985], one-to-many streams, and incomplete messages. An incomplete message contains variables that will be bound by the receiver, thus returning reply values. The sender can wait for replies by suspending on such a variable. Incomplete messages can be used to implement many different message protocols (e.g., remote procedures and rendezvous, discussed above) and to dynamically set up communication channels between processes.

The shared logical variable model also has some disadvantages, as discussed by Gelernter [1984]. Only a single process can append to a stream implemented through logical variables (e.g., in Figure 2 only the producer can append to S). Applications based on the client/server model, however, require multiple clients to send messages to a single server (many-to-one communication). To implement this in a parallel logic language, each client must have its own output stream. There are two alternatives for structuring the server. First, the server may use a separate input stream for each client and accept messages sent through each of these streams. This requires the server to know the identities of all clients and thus imposes a limit on the number of clients. The second alternative is to merge the output streams of all clients and present it as a single input stream to the server.


/* the consumer is not allowed to bind S */
mode producer(N?, S^), consumer(S?).

producer(N, [X|Xs]) :-        /* produce stream of squares */
    X is N*N, N2 is N+1, producer(N2, Xs).

consumer([X|Xs]) :- use(X), consumer(Xs).

/* start consumer and producer in parallel */
main :- producer(1, S), consumer(S).

Figure 2. Implementation of streams with shared logical variables.
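A rough Go analogue of Figure 2, offered purely as an illustration (Go has no logical variables): a single-assignment cell is modeled by a channel that is written once and then closed, and a stream cell carries the message together with the next, still-unbound cell. The consumer suspends on an unbound cell exactly as described above. The IVar and Cell names are our own.

    package main

    import "fmt"

    // IVar is a single-assignment variable: unbound until Bind is called,
    // after which every Read sees the same value. Rebinding is not allowed.
    type IVar struct {
        bound chan struct{}
        value Cell
    }

    // Cell is one element of a stream: a message plus the next stream variable.
    type Cell struct {
        head int
        tail *IVar
    }

    func NewIVar() *IVar { return &IVar{bound: make(chan struct{})} }

    func (v *IVar) Bind(c Cell) {
        v.value = c
        close(v.bound) // wake up every process suspended on this variable
    }

    func (v *IVar) Read() Cell {
        <-v.bound // suspend until some other process binds the variable
        return v.value
    }

    // producer binds S to [n*n | S1] and recurses on the fresh tail S1.
    func producer(n, max int, s *IVar) {
        if n > max {
            return
        }
        tail := NewIVar()
        s.Bind(Cell{head: n * n, tail: tail})
        producer(n+1, max, tail)
    }

    // consumer suspends on the stream variable, uses the head, and recurses.
    func consumer(s *IVar, done chan<- struct{}, count int) {
        if count == 0 {
            close(done)
            return
        }
        c := s.Read()
        fmt.Println("use", c.head)
        consumer(c.tail, done, count-1)
    }

    func main() {
        s := NewIVar()
        done := make(chan struct{})
        go consumer(s, done, 3)
        go producer(1, 3, s)
        <-done
    }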

Such merge operations can be expressed in parallel logic languages [Shapiro and Safra 1986], but Gelernter argues that the resulting programs are less clear and concise than similar programs in languages supporting streams with multiple readers and writers.

2.2.3 Expressing and Controlling Nondeterminism

As discussed in Section 2.2, the interaction patterns between processes are not always deterministic, but sometimes depend on run-time conditions. For this reason, models for expressing and controlling nondeterminism have been introduced. Some communication primitives that we have already seen are nondeterministic. A message received indirectly through a port, for example, may have been sent by any process. Such primitives provide a way to express nondeterminism, but not to control it. Most programming languages use a separate construct for controlling nondeterminism. We will look at two such constructs: the select statement, used by many algorithmic languages, and the guarded Horn clause, used by most parallel logic programming languages. Both are based on the guarded command statement, introduced by Dijkstra as a sequential control structure [Dijkstra 1975].

The Select Statement. A select statement consists of a list of guarded commands of the following form:

    guard → statements

The guard consists of a Boolean expression and some sort of "communication request." The Boolean expression must be free of side effects, as it may be evaluated more than once during the course of the select statement's execution. In CSP [Hoare 1978], for example, a guard may contain an explicit receipt of a message from a specific process P. Such a request may either succeed (if P has sent such a message), fail (if P has already terminated), or suspend (if P is still alive but has not sent the message yet). The guard itself can either succeed, fail, or suspend: The guard succeeds if the expression is "true" and the request succeeds; the guard fails if the Boolean expression evaluates to "false" or if the communication request fails; the guard suspends if the expression is "true" and the request suspends. The select statement as a whole blocks until either all of its guards fail or some guards succeed. In the first case, the entire select statement fails and has no effect. In the latter case, one succeeding guard is chosen nondeterministically, and the corresponding statement part is executed.

In CSP, the select statement can be used to wait nondeterministically for specific messages from specific processes. The select statement contains a list of input requests and allows individual requests to be enabled or disabled dynamically. For example, the buffer process described above can interact with a consumer and a producer as shown in Figure 3.


[ not full(buffer); producer?DepositItem(x) →
      add x to end of buffer;

[] not empty(buffer); consumer?AskForItem() →
      consumer!SendItem(first item of buffer);
      remove first item from buffer;
]

Figure 3. A select statement in CSP used by a buffer process. The statement consists of two guarded commands, separated by "[]". The "?" is the input (receive) operator. The "!" is the output (send) operator.

Communication takes place as soon as either (1) the buffer is not full and the producer sends a message, DepositItem, or (2) the buffer is not empty and the consumer sends a message, AskForItem. In the latter case, the buffer process responds by sending the item to the consumer.

CSP's select statement is asymmetric in that a guard can only contain an input operator, not an output operator. Thus, a process P can only wait to receive messages nondeterministically; it cannot wait nondeterministically until some other process is ready to accept a message from P [Bernstein 1980]. Output guards are excluded from most languages because they usually complicate the implementation. Languages that do allow output guards include Joyce [Brinch Hansen 1987] and Pascal-m [Abramsky and Bornat 1983].

Select statements can also be used for controlling nondeterminism other than communication. Some languages allow a guard to contain a time-out instead of a communication request. A guard containing a time-out of T seconds succeeds if no other guard succeeds within T seconds. This mechanism sets a limit on the time a process will wait for a message. Another use of select statements is to control termination of processes. In Concurrent C, a guard may consist of the keyword terminate. A process that executes a select statement containing a terminate guard is willing to terminate if all other guards fail or suspend. If all processes are willing to terminate, the entire Concurrent C program terminates. Ada uses a similar mechanism to terminate parts of a program. Roughly, if all processes created by the same process are willing to terminate and the process that created them has finished the execution of its statements, all these processes are terminated. This mechanism presumes hierarchical processes.

A final note: Select statements in most languages are unfair. In the CSP model, for example, if several guards are successful, one of them is selected nondeterministically. No assumptions can be made about which guard is selected. Repeated execution of the select statement may select the same guard over and over again, even if there are other successful guards. An implementation may introduce a degree of fairness, by assuring that a successful guard will be selected within a finite number of iterations, or by giving guards equal chances. On the other hand, an implementation may evaluate the guards sequentially and always choose the first one yielding "true." The semantics of select statements do not guarantee any degree of fairness, so programmers cannot rely on it.
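The time-out guard described above has a direct counterpart in Go's select statement; the sketch below is our illustration (the channel name is invented), not Concurrent C or Ada syntax. The time.After case succeeds only if no message arrives within T, bounding how long the process waits.

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        requests := make(chan string)

        go func() {
            time.Sleep(1500 * time.Millisecond) // the only client is slow
            requests <- "late request"
        }()

        for {
            select {
            case r := <-requests: // communication guard: a message is available
                fmt.Println("serving", r)
                return
            case <-time.After(1 * time.Second): // time-out guard: nothing arrived within T
                fmt.Println("no request within 1s; doing background work instead")
            }
        }
    }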

Proposals have been made for giving programmers explicit control over the selection of succeeding guards. Silberschatz suggests a partial ordering of the guards [Silberschatz 1984]. Elrad and Maymir-Ducharme propose prefixing every guarded command with a compile-time constant called the preference control value [Elrad and Maymir-Ducharme 1986]. If several guards succeed, the one with the highest preference control value (i.e., priority) is chosen. If there are several guards with this value, one of them is chosen nondeterministically. This feature is useful if some requests are more urgent than others. For example, the buffer process may wish to give consumers a higher priority than producers.

Guarded Horn Clauses. Logic programs are inherently nondeterministic. In reducing a goal of a logic program, there are often several clauses to choose from (see Section 2.1.1). Intuitively, the semantics of logic programming prescribe that the underlying execution machinery must simply choose the "right" clause, the one leading to a proof. This behavior is called don't know nondeterminism. In sequential logic languages (e.g., PROLOG), these semantics are implemented using backtracking. At each choice point, an arbitrary clause is chosen, and if it later turns out to be the wrong one, the system resets itself to the state before the choice point and then tries another clause.

In a parallel execution model, several goals may be tried simultaneously. In this model, backtracking is very complicated to implement. If a binding for a variable has to be undone, all processes that have used this binding must backtrack too. Most parallel logic programming languages therefore avoid backtracking altogether. Rather than trying the clauses for a given predicate one by one and backtracking on failure, parallel logic languages (1) search all these clauses in parallel and (2) do not allow any bindings made during these parallel executions to be visible to the outside until one of the parallel executions is committed to. This is called OR-parallelism (see Section 2.1.1). Unfortunately, this cannot go on indefinitely, because the number of search paths worked on in parallel will grow exponentially with the length of the proof.

A popular technique to control OR-parallelism is committed-choice nondeterminism (or don't care nondeterminism), which nondeterministically selects one alternative clause and discards the others. It is based on guarded Horn clauses of the following form:

    A :- G1, ..., Gn | B1, ..., Bm,    n ≥ 0, m ≥ 0.

The conjunction of the goals Gi is called the guard; the conjunction of the goals Bi is the body. Declaratively, the commit operator "|" is also a conjunction operator.

Just like the guards of a select statement, the guard of a guarded Horn clause can either succeed, fail, or suspend. A guard suspends if it tries to bind a variable that it is not allowed to bind, as explained in Section 2.2.2. If a goal with a predicate A is to be reduced, the guards of all clauses for A are tried in parallel, until some guards succeed. The reduction process then chooses one of these guards nondeterministically and commits to its clause. It aborts execution of the other guards and executes the body of the selected clause.

So far, this all looks much like the select statement, but there are some subtle differences. A guard should not be allowed to affect its environment until it is selected. Guards that are aborted should have no side effects at all. Precautions must be taken against guards that try to bind variables in their environment. For example, consider the following piece of code:

    A(X) :- G(X) | B(X).
    A(X) :- H(X) | C(X).
    G(1) :- P(1).
    H(2) :- Q(2).

The guard G of the first clause binds X to 1 and then calls P. Guard H of the second clause binds X to 2 and calls Q. These bindings should not be made visible to the caller of A until one of the guards G or H is committed to. PARLOG ensures this by using mode declarations to distinguish between input and output variables of a clause. The compiler checks that guards (or any other goals in the body) do not bind input variables. If a guard binds an output variable, this binding is initially made to a temporary variable. When a clause is committed to, the bindings made by its guard are made permanent, and the bindings generated by the other guards (to temporaries) are thrown away. If a guard is ultimately not selected, it has no effect at all.

Concurrent PROLOG, on the other hand, allows variables in the environment to be changed before commitment. But the effects only become visible outside the clause if the clause is committed to.
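Committed-choice selection (try the guards of all candidate clauses in parallel, commit to one that succeeds, and discard the rest without visible effects) can be mimicked operationally, as in the Go sketch below. This is only an analogy, since Go has no unification or logic variables: each "guard" computes a tentative binding privately, and only the clause committed to publishes it through its body. The clause type and the guard/body functions are invented for illustration.

    package main

    import (
        "context"
        "fmt"
    )

    // clause pairs a guard with a body; the guard reports success or failure
    // and any tentative binding it made (kept private until commitment).
    type clause struct {
        name  string
        guard func(ctx context.Context) (binding int, ok bool)
        body  func(binding int)
    }

    // commit runs all guards in parallel, commits to the first that succeeds,
    // and cancels the others, whose tentative bindings are simply discarded.
    func commit(clauses []clause) bool {
        ctx, cancel := context.WithCancel(context.Background())
        defer cancel()
        type result struct {
            c       clause
            binding int
            ok      bool
        }
        results := make(chan result, len(clauses))
        for _, c := range clauses {
            go func(c clause) {
                b, ok := c.guard(ctx)
                results <- result{c, b, ok}
            }(c)
        }
        for range clauses {
            r := <-results
            if r.ok {
                cancel() // abort the remaining guards
                fmt.Println("committing to", r.c.name)
                r.c.body(r.binding) // only now does the binding become visible
                return true
            }
        }
        return false // all guards failed: the reduction fails
    }

    func main() {
        ok := commit([]clause{
            {
                name:  "clause-1",
                guard: func(ctx context.Context) (int, bool) { return 1, false }, // this guard fails
                body:  func(x int) { fmt.Println("body of clause-1 with X =", x) },
            },
            {
                name:  "clause-2",
                guard: func(ctx context.Context) (int, bool) { return 2, true }, // this guard succeeds
                body:  func(x int) { fmt.Println("body of clause-2 with X =", x) },
            },
        })
        fmt.Println("committed:", ok)
    }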

The semantics and distributed implementation of commitment in Concurrent PROLOG are similar to those of atomic transactions [Taylor et al. 1987b].

Committed-choice nondeterminism has one severe drawback: It gives up completeness of search. In the code above, if P(1), C(2), and Q(2) are true (i.e., can be proven), but B(1) is false, then the goal A(X) logically should succeed. The conjunction "H(X) and C(X)" is true for X = 2 (recall that "|" is logically equivalent to "and"). As both guards G(X) and H(X) succeed, however, the system may select G(X) and abandon the second clause. As B(1) turns out to be false, the first clause will fail, and no solution will be found. The programmer must ensure that, at the time of commitment, either the right clause is selected or no clause resulting in a proof exists. This can be achieved by extending the guards to include B(X) and C(X):

    A(X) :- G(X), B(X) |.
    A(X) :- H(X), C(X) |.

This technique should not be used indiscriminately, because it restricts the effective parallelism. The binding to variable X is not made known to the caller of A(X) until commitment, so in the new scheme the caller will have to wait longer for this value to be available. This implies that, in general, guards should be kept as small as possible.

For reasons of simplicity and ease of implementation, most of the recent efforts in parallel logic programming languages center on their so-called "flat" subsets. In a flat guarded Horn clause, guards are restricted to simple predefined test predicates.

2.3 Partial Failure

The final issue that must be addressed by languages for programming distributed systems is the potential for partial failure of the system. Distributed computing systems have the potential advantages over centralized systems of higher reliability and availability. If some of the processors involved in a distributed computation crash, then, in principle, the computation can still continue on the remaining processors, provided that all vital information contained by the failing processors is also stored on some healthy ones. Thus, the system as a whole becomes more reliable. This principle of replication of information can be used to increase the availability of the system. A system is said to be fault tolerant if it still continues functioning properly in the face of processor crashes, allowing distributed programs to continue their execution and allowing users to keep on using the system.

In general, it is not an easy task to write programs that can survive processor crashes and continue as if nothing had happened. The responsibility for achieving reliability can be split up among the operating system, the language run-time system, and the programmer. Numerous research papers have been published about how operating systems can support fault tolerance [LeBlanc and Wilkes 1985; Powell and Presotto 1983]. In the following sections, we discuss how programming languages can contribute their part.

2.3.1 Programming Fault Tolerance

The simplest approach to handling processor failures is to ignore them altogether. This means that a single crash will cause the entire program to fail. Typically, processes trying to interact with a sick processor will either be blocked forever or discover an unexpected communication failure and terminate. A program running in parallel on several processors has a higher chance of failing than its single-processor counterpart (although the shorter execution time of the parallel version may compensate a bit). Still, as processor crashes are rare, for many applications this is not a problem.

The next simplest approach to implementing fault tolerance is to let the programmer do it. The operating system or language run-time system can detect processor failures and return an error status to every process that wants to communicate with a crashed processor. The programmer can write code to deal with this contingency. For some programs, this approach is quite adequate. As an example, consider a distributed chess program in which each processor repeatedly chooses one possible move in the current board position, evaluates the new position, and returns some score. If a processor crashes before returning the result, all that need be done is to have another processor analyze the position. This simple scheme only works because the processors have no side effects except for returning the score. No harm is done if a position is examined twice.
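The chess-style recovery just described (resubmit a side-effect-free piece of work to another processor when the one evaluating it fails) can be sketched as follows in Go. The error-reporting convention and the evaluate function are our own stand-ins for whatever failure notification the operating system or run-time system actually provides.

    package main

    import (
        "errors"
        "fmt"
        "math/rand"
    )

    // evaluate stands in for shipping a board position to one processor;
    // it either returns a score or reports that the processor crashed.
    func evaluate(processor int, position string) (int, error) {
        if rand.Intn(2) == 0 {
            return 0, errors.New("processor crashed") // simulated partial failure
        }
        return len(position), nil // a dummy score
    }

    // scorePosition retries the same position on other processors until one
    // answers. This is safe only because evaluate has no side effects, so
    // analyzing a position twice does no harm.
    func scorePosition(position string, processors int) (int, error) {
        for p := 0; p < processors; p++ {
            if score, err := evaluate(p, position); err == nil {
                return score, nil
            }
            fmt.Printf("processor %d failed; resubmitting position elsewhere\n", p)
        }
        return 0, errors.New("all processors failed")
    }

    func main() {
        score, err := scorePosition("e2e4 e7e5", 4)
        fmt.Println(score, err)
    }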

A possible improvement to this scheme is to let the language run-time system take care of repeating requests to do some work. Nelson has studied this approach in the context of the Remote Procedure Call model [Nelson 1981]. If the run-time system detects that Processor P has crashed, P's processes are restarted, either on P or on another processor. Furthermore, all outstanding RPCs to P are repeated.

As procedures can have side effects, it is important to specify accurately the semantics of a call that may have been executed (entirely or partially) more than once. Nelson gives a classification of these call semantics. The simplest case is a local procedure call (the caller and callee are on the same processor). If the processor does not crash, the call is executed exactly once (exactly-once semantics). If the processor does crash, the run-time system restarts all processes of the crashed processor, including the caller and the callee of the procedure. The call will eventually be repeated, until it succeeds without crashing. Clearly, the results of the last executed call are used by the caller, although earlier (abandoned) calls may have had side effects that survived the crash (e.g., changing a file on the processor's local disk). These semantics are called last-one semantics.

For RPCs, where the caller and callee are on different processors, the best that can be hoped for is to have the same semantics as for local calls, which are exactly-once semantics without crashes and last-one semantics with crashes. The former is not very hard to obtain, but achieving last-one semantics in the presence of crashes turns out to be tricky, especially if more than two processors are involved. Suppose Processor P1 calls Procedure f on Processor P2, which in turn calls Procedure g on Processor P3. While P3 is working on g, P2 crashes. P2's processes will be restarted, and P1's call to f will be repeated. The second invocation will again call Procedure g on P3. Unfortunately, P3 does not know that P2 has crashed. P3 executes g twice and may return the results in any order, possibly violating last-one semantics. The problem is that, in a distributed environment, a crashed processor may still have outstanding calls to other processors. Such calls are appropriately called orphans, because their parents (callers) have died. To achieve last-one semantics, these orphans must be terminated before restarting the crashed processes. This can be implemented either by waiting for them to finish or by tracking them down and killing them ("orphan extermination"). As this is not an easy job, other (weaker) semantics have been proposed for RPC. Last-of-many semantics is obtained by neglecting orphans; it suffers from the problem described above. An even weaker form is at-least-once semantics, which just guarantees that the call is executed one or more times, but does not specify which results are returned to the caller.

One key idea is still missing from our discussion. Procedure calls (local as well as remote) can have side effects. If a call is executed many times (because of processor crashes), its side effects also are executed many times. For side effects like incrementing a bank account stored in a database, this may be highly undesirable (or highly desirable, depending on one's point of view). A mechanism is needed to specify that a call either runs to completion or has no effects at all. This is where atomic transactions come in.

2.3.2 Atomic Transactions

A distributed program can be regarded as a set of parallel processes performing operations on data objects. Usually, a data object is managed by a single process, but other processes can operate on the object indirectly (e.g., by issuing an RPC requesting the managing process to do the operation). In general, the effects of an operation become visible immediately. Moreover, operations affecting objects on secondary storage become permanent once the operation has been performed. Sometimes this behavior is undesirable.

Consider a program that transfers a sum of money from one bank account (stored on disk) to another, by decreasing the first one and increasing the second one. This simple approach has two dangers. First, if another parallel process adds up all accounts in the database while the first process is in the middle of its transaction, it may observe the new value of the first account and the old value of the second, so that it uses inconsistent values. Second, if the process doing the transfer crashes immediately after decreasing the first account, it leaves the database in an inconsistent state. If it is restarted later, it may try to decrease the first account once more.

A solution to these problems is to group operations together in atomic transactions (also called atomic actions or simply transactions). A group of operations (called a transaction) is atomic if it has both the property of indivisibility and the property of recoverability. A transaction is indivisible if, viewed from the outside, it has no intermediate states. For the outside world (i.e., all other transactions), it looks as if either all or none of the operations have been executed. A transaction is recoverable if all objects involved can be restored to their initial state if the transaction fails (e.g., due to a processor crash), so that the transaction has no effect at all.

Recoverability can be achieved as follows: If a transaction contains an operation that tries to change an object, the changes are not applied to the original object, but to a new copy of the object, called a version. If the entire transaction fails (aborts), the new versions are simply discarded. If the transaction succeeds, it commits to these new versions. All objects changed by the transaction retain the value of their new version. Furthermore, the latest value of each object is also placed on stable storage [Lampson 1981], which has a very high chance of surviving processor crashes and is accessible by all processors.

Indivisibility can be trivially assured by executing all atomic transactions sequentially. In our bank account example, we could deny other processes access to the database while the first process is doing the transfer. Unfortunately, this severely limits parallelism and hence degrades performance. A more efficient approach is to synchronize processes by using finer-grained locks. The process doing the transfer first locks the two accounts. Other processes trying to access these two accounts are automatically suspended when they attempt to lock them.

Atomic transactions originated in the database world, but they are also used by some programming languages, such as Argus [Liskov 1988], Aeolus [Wilkes and LeBlanc 1986], and Avalon [Detlefs et al. 1988]. A programming language can provide convenient abstractions for data objects and invocations of atomic transactions. The language run-time system can take care of many details, like locking and version management. These issues are discussed in Section 3.1.7.

2.3.3 Transparent Fault Tolerance

The mechanisms discussed above provide linguistic support for dealing with partial failures. Some of the problems are solved by the operating system or the language run-time system, but programmers still have to do part of the work, and this work has to be done for every new application. Other systems relieve programmers from all worries, by supporting fault tolerance in a fully transparent way.

Borg et al. describe a fault-tolerant message-passing system [Borg et al. 1983]. For each process, an inactive backup process is created on another processor. All messages sent to the primary process are also sent to its backup. The backup also counts the messages sent by the primary process. If the primary processor crashes, the backup process becomes active and starts repeating the primary process's computations. Whenever it wants to receive a message, the backup process reads the next message saved while the primary process was still alive. If the backup process needs to send a message, it first checks to see if the primary process had already sent it, to avoid sending messages twice.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 290 l H. E. Bal, J. G. Steiner, and A. S. Tanenbaum checkpoint) to the backup. The backup pro- on logical distribution, parallel computa- cess then can forget all messages previous tions (e.g., processes) communicate by to the checkpoint. sending messages to each other. The ad- This approach requires extra processors dress spaces of different computations do and will sometimes delay computation not overlap, so the address space of the while a checkpoint is being made. Strom whole program is distributed. In a logically and Yemini propose a different technique, nondistributed language, the parallel units optimistic recovery, to be used in systems have a logically shared address space and consisting of processes that interact only communicate through data stored in the by message passing [Strom and Yemini shared address space. Note that this dis- 1985b]. (This model is used in their lan- tinction is based on the logical model of the guage NIL.) Their technique involves language; the presence of logically shared periodic checkpointing and logging of mes- data does not imply that physical shared sages on stable storage, rather than to a memory is needed to implement the lan- backup process. As a fundamental depar- guage. All languages described below that ture from Borg’s approach, these activities are based on logically shared data have proceed asynchronously with the normal been implemented on distributed comput- computations. This has the advantage that, ing systems, that is, on computers without if I/O bandwidth to stable storage is high shared primary memory. enough, the normal computation will not The languages in the two categories are slow down. However, the technique re- further partitioned into a number of quires some bookkeeping overhead to allow classes, based on their communication a consistent system state to be restored mechanisms. In the first category, we in- after a crash. clude synchronous message passing, asyn- chronous message passing, rendezvous, RPC!, multiple communication primitives, 3. LANGUAGES FOR PROGRAMMING objects, and atomic transactions. In the DISTRIBUTED SYSTEMS second category, we distinguish between In this section we take a closer look at implicit communication through function- several languages that were designed for results (used in parallel functional lan- programming distributed systems. It is dif- guages), shared logical variables (parallel ficult to determine exactly how many such logic languages), and distributed data struc- languages exist; we know of nearly 100 rel- tures. The classification is illustrated in evant languages, but there are probably Figure 4. many more. We have selected a subset for In each of the following subsections, we closer study. These languages together are discuss one class of languages. Each sub- representative of research in this area. We section starts with a table containing sev- have chosen these languages to cover a eral languages of that class together with broad spectrum of ideas. Although we have references to papers on these languages. attempted to focus on languages that have Each table corresponds with one specific been well documented and cited in the lit- leaf in the tree of Figure 4. We have selected erature, we fully admit that any selection at least one language from each table for of this kind contains a certain amount of closer study. We describe the most distinc- subjective choice. 
We include references to tive features of the example language(s) languages not discussed in detail here. An and discuss how it differs from other mem- overview of the languages for distributed bers of its class. We emphasize the seman- programming cited in this paper is given in tics, rather than the syntax. Our intention the Appendix. is to expose the new key ideas in the We have organized the languages in a language, not to provide a full language simple classification scheme. First, we dis- description. tinguish between logically distributed and For each language, we first provide back- logically nondistributed languages, as dis- ground information on its design. Next, we cussed in Section 1.3. In languages based describe how parallelism is expressed in the

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems l 291 Synchronous messagepassing Asynchronous messagepassing

Rendezvous distributed Remote Rocedure Call address space Multiple primitives

Objects distributed Atomic transactions .languages

Functional languages shared Logic languages address space Distributed data structures

Figure 4. Classification of languagesfor distributed programming. language and how parallel units are mapped even multiway) communication primitives. onto processors (if the language addresses Object-based languages also support one or this issue). Subsequently, the communica- more of the above primitives. Unlike other tion and synchronization primitives are languages, communication is between ob- discussed. If relevant, we also discuss how jects rather than processes. As objects en- the language deals with fault tolerance. capsulate both data and behavior, these Finally, we give information on implemen- languages may also be thought of as provid- tations and user experiences with the lan- ing some form of data sharing. Finally, we guage. Issues like support for distributed discuss languages based on atomic trans- debugging and commercial availability of actions; these languages are mainly in- language implementations are outside the tended for implementing fault-tolerant scope of this paper and are not discussed. applications.

3.1 Languages with Logically Distributed 3.1.1 Synchronous Message Passing Address Spaces In 1978 Hoare wrote what was later to We discuss seven classes of languages become a very influential paper, although with logically distributed address spaces: it described only a fragment of a language languages supporting synchronous mes- [Hoare 19781. The language, called Com- sage passing, asynchronous message pass- municating Sequential Processes (CSP), ing, rendezvous, RPC, multiple communi- generated some criticism [Kieburtz and cation primitives, operation invocations Silberschatz 1979; Bernstein 19801, but on objects, and atomic transactions. also stimulated the design of many other Languages in the first two classes pro- languages and systems (see Table 2). The vide point-to-point messages. Rendezvous- CSP model consists of a fixed number of based languages support two-way commu- sequential processes that communicate nication between senders and receivers. An only through synchronous message passing. RPC is also a two-way interaction, but Joyce differs from the other languages of its semantics are closer to a normal proce- Table 2 by supporting recursiue processes. dure call. Languages in the fifth class use Below, we describe CSP in some detail and a variety of one-way and two-way (or discuss one of its descendants, Occam.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 292 ’ H. E. Bal, J. G. Steiner, and A. S. Tanenbaum

Table 2. Languages Based on Synchronous receiving process specifies the name of the Message Passing sending process and provides a variable to Synchronous message passing which the received value is assigned. A Language References process executing either a send or a re- ceive is blocked until its partner has CCSP [Hull and Donnan 19861 CSM [Zhongxiu and Xining 19871 executed the complementary statement. CSP [Hoare 19781 Consider the following example: CSP-s [Patniak and Badrinath 19841 CSPS [Roman et al. 19871 [X :: Y! 3 ]] Y :: n: integer; X ? n] CSP/80 [Jazayeri et al. 19801 ECSP [Baiardi et al. 19841 In Process X’s statement, the value 3 is GDPL [Ng and Li 19841 sent to Y. In Process Y’s statement, input Joyce [Brinch Hansen 19871 is read from Process X and stored in the LIMP [Hunt 19791 Occam [Inmos Ltd. 19841 local variable n. When both X and Y have Pascal-m [Abramsky and Bornat 19831 executed their statements, the one-way Pascal + CSP [Adamo 19821 communication occurs. The net result is Planet [Crookes and-Elder 19841 assigning 3 to n. RBCSP [Roper and Barter 19811 Both simple and structured data may be communicated (and assigned), as long as the value sent is of the same type as the CSP. CSP was designed by Hoare as a variable receiving it. The structured data simple language that allows an efficient can be given a name (a constructor), such implementation on a variety of architec- as pair in the following example: tures [Hoare 1978, 19851.” [X :: Y ! pair(35, 60) ]I Y :: n, m: integer; (1) Parallelism. CSP provides a simple X ? pair(n, m)] parallel command to create a fixed number of parallel processes. A process consists of An empty constructor may be used to syn- a name, local variables, and a sequence of chronize two processes without transfer- statements (body). CSP processes take no ring any real data. parameters and cannot be mapped onto The alternative construction provides for specific processors. An array of similar nondeterminism in CSP. It consists of sets processes can be created, but their number of guards followed by actions to be per- must be a compile-time constant. As a formed. The guards may contain Boolean simple example of a parallel statement, expressions and an input statement, as explained in Section 2.2.3. CSP allows [writer :: 1~:real; . . . ]] reader(i: 1 . . 2) :: . . .] a process to receive selectively, based creates three processes, called “writer,” on the availability of input and the “reader(l),” and “reader(2).” The writer name field (constructor) of the incoming has a local variable x. The subscript vari- communication. able i can be used within the body of the (3) Implementation and experience. reader processes. CSP is essentially a paper design, but it has influenced the design of several langu- (2) Communication and synchronization. ages (see Table 2) that have been imple- CSP processes may not communicate by mented and used, most notably the Occam using global variables. All interprocess language. communication is done using synchronous receive and send. The sending process Occam. Occam is modeled on Hoare’s specifies the name of the destination pro- CSP and was designed for programming cess and provides a value to be sent. The Inmos’s transputer [Inmos Ltd. 1984; May 19831. Occam is essentially the assembly I1 We describe the original language, outlined by Hoare language of the transputer. The language [1978]; the 1985 version has a clearer syntax and uses lacks features that have become standard named channels. 
in most modern programming languages,

ACM Computing Surveys, Vol. 21, NO. 3, September 1989 Programming Languages for Distributed Computing Systems 9 293 such as data typing, recursive proce- with the “WAIT AFTER t” construct. This dures, and modules. can be used as a constituent of an ALT construct, for example, to prevent a pro- (1) Parallelism. There are three basic cess from hanging forever if no input is actions in Occam: assignment, input, and forthcoming. output. Each action is considered to be a little process. Processes can be grouped to- (3) Implementation and experience. gether in several ways to form more com- Occam was intended for use with multiple plex processes. Any process can be named interconnected transputers, where a chan- by prefixing its definition with the keyword nel would be implemented as a link between PROC, followed by its name and a list of two transputers [May and Shepherd 19841. formal parameters. When subsequently The transputer implementation is quite ef- referenced, a new instance of the named ficient (e.g., a context switch takes a few process is created, with the parameters microseconds). This efficiency has been specified in the reference. Both parallel and achieved by using a simple communication sequential execution of a group of processes model (CSP) and by requiring the number must be explicitly stated, by heading the of processes and their storage allocation to group with a PAR or SEQ, respectively. be determined at compile time. Occam has Arrays of similar processes can be ex- also been implemented on nontransputer pressed in Occam. In the construct systems [Fisher 19861. Occam is used extensively for applica- PARi=OFORn tions like signal processing, image pro- process . . . cessing, process control, simulation, and . A major criticism of the n parallel processes are created, each with first version of Occam is the inability to a different value for i. pass complex objects (e.g., arrays) as part Occam provides a facility for assigning of a single message. Occam-2 has addressed processes to processors. Parallel processes this problem through the introduction of may be prioritized by prefixing the group channel protocols, which describe the types with PRI PAR. The first process in the of objects that may be transferred across a group is given highest priority; the second, channel [Burns 19881. The compiler (some- second highest priority; and so on. times with the help of the run-time system) checks that the input and output operations (2) Communication and synchronization. on a channel are compatible with the chan- Unlike CSP, parallel processes communi- nel protocol. cate indirectly through channels. A channel is a one-way link between two processes. 3.1.2 Asynchronous Message Passing Channel communication is fully synchron- ous. Only one process may be inputting The synchronous message-passing model from, and one outputting to, a channel at a proposed by Hoare and adapted by Occam given time. Channels are typed, and their prevents the sending process from contin- names can be passed as parameters to uing immediately after sending the mes- PROC calls. sage. The sender must wait until the Occam provides an ALT construct, sim- receiving process is willing to accept the ilar to CSP’s alternative statement, to ex- message. This design decision has a major press nondeterminism. The constituents of impact on both the programming style and this construct can be prioritized. If input the implementation of a language. 
Several is available on more than one channel, language designers have chosen to remove the one with the highest priority will be this restriction and support asynchronous accepted. message passing, sometimes in addition to The current time can be read from an synchronous message passing. Languages input-only channel declared as a TIMER. in this class are shown in Table 3. We A delay until a certain time can be made discuss NIL in more detail.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 294 l H. E. Bal, J. G. Steiner, and A. S. Tanenbaum

Table 3. Languages Based on Asynchronous Message Passing Asynchronous message passing Language References AMPL [Dannenberg 1981; Milewski 19841 CMAY [Bagrodia and Chandy 19851 Concurrent C [Tsujino et al. 19841 CONIC [Kramer and Magee 1985; Sloman and Kramer 19871 DPL-82 [Ericson 19821 FRANK [Graham 19851 GYPSY [Ambler et al. 19771 LADY [Nehmer et al. 19871 MENYMA/S [Koch and Maibaum 19821 NIL [Strom and Yemini 19831 ParMod [Eichbolz 19871 PCL [Lesser et al. 19791 Platon [Staunstrup 19821 PLITS [Feldman 19791 Port Language [Kerridge and Simpson 19861 Pronet [LeBlanc and Maccabe 19821 ZEN0 [Ball et al. 19791

NIL. NIL (Network Implementation that would make compile-time checking of Language) is a high-level language for the typestates impossible. It does not provide construction of large, reliable, distributed explicit pointer manipulation (it does pro- software systems [Strom and Yemini 1983, vide a higher level construct for building 1984, 1985a, 19861. NIL was designed by general data structures), and it has an IPC Robert Strom and Shaula Yemini at the model that disallows sharing of variables. IBM T. J. Research Center. (1) Parallelism. Parallelism in NIL is NIL is a secure language, which means based on the so-calledprocess model [Strom that one program module cannot affect the et al. 1985; Strom 19861. A NIL system correctness of other modules (e.g., by a consists of a network of dynamically cre- “wild store” through a bad pointer). The ated processes that communicate only by importance of security is pointed out by message passing over communication chan- Hoare [1981]. Security in NIL is based on nels. In NIL, a process is not only the unit an invention called the typestate [Strom of parallelism, but also the unit of modular- and Yemini 19861. A typestate is a compile- ity. The division of a NIL program into pro- time property that captures both the type cesses should be based on software engi- of a variable and its state of initialization. neering principles rather than on perfor- In the program fragment mance considerations. The mapping of 1. X, Y: INTEGER, processes onto processors is considered to 2. if condition then X := 4; end if be an implementation issue, to be dealt 3. Y := x + 3; with by the compiler and run-time system. This process model makes NIL concep- statement 3 is marked as illegal by the tually simpler than languages that have compiler, because variable X might still be separate mechanisms for parallelism and uninitialized at this point. X has the right modularity (e.g., tasks and packages in type (integer), but the wrong state. The Ada). typestate mechanism imposes some con- straints on the structure of the programs (2) Communication and synchronization. (especially on the control flow), but the Configuration of the communication paths designers claim that these constraints are between processes is done dynamically. A not overly restrictive and usually lead to port in NIL is a queued communication better structured code. NIL avoids features channel. At a given time, a port has one

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems l 295 specific owner. Ownership of a port can be Table 4. Languages Based on Rendezvous transferred to another process, by passing Rendezvous the port as part of a message or by passing the port as an initialization parameter to a Language References newly created process. A process can con- Ada [U.S. Department of Defense 19831 nect input ports and output ports owned BNR Pascal [Gammage et al. 19871 by it. Concurrent C [Gehani and Roome 19891 Both synchronous communication and MC [Rizk and Halsall 19871 asynchronous communication are sup- ported. A single input port may be con- nected to several output ports, so there can 1984a; Mundie and Fisher 1986; Burns et be multiple pending messages on an input al. 19871) and to the implementation of port; these messages therefore have to be Ada’s multitasking. van Katwijk reviews queued. A guarded-command style state- more than 30 papers of the latter category ment is provided for waiting for messages [van Katwijk 19871. on any of a set of input ports. (1) Parallelism. Parallelism is based on (3) Fault tolerance. Recovery from pro- sequential processes, called tasks in Ada. cessor failures is intended to be handled Each task has a certain type, called its task transparently by the NIL run-time system, type. A task consists of a specification part, using the optimistic recovery technique which describes how other tasks can com- discussed in Section 2.3.3. municate with it, and a body, which con- (4) Implementation and experience. A tains its executable statements. Tasks can NIL compiler generating code for a unipro- be created explicitly or can be declared, but cessor (IBM 370) has been implemented. in neither case is it possible to pass any Research on distributed implementations parameters to the new task. Limited con- has focused on transformation strategies, trol over the local of tasks is which optimize NIL programs for specific given, by allowing a static priority to be target configurations [Strom and Yemini assigned to task types. There is no notation 1985a]. NIL has been used to implement a for mapping tasks onto processors. prototype communication system, consist- (2) Communication and synchronization. ing of several hundred modules [Strom and Tasks usually communicate through the Yemini 19861. The implementors found rendezvous mechanism. Tasks can also the typestate mechanism highly useful in communicate through shared variables, but integrating this relatively large number of updates of a shared variable by one task modules. are not guaranteed to be immediately visi- ble to other tasks. An implementation that 3.1.3 Rendezvous does not have physically shared memory The rendezvous mechanism was first used may keep local copies of shared variables in Ada and later employed in some other and defer updates until tasks explicitly syn- languages, as shown in Table 4. We discuss chronize through a rendezvous. Ada and Concurrent C below. The rendezvous mechanism is based on entry declarations, entry calls, and accept Ada. The language Ada was designed on statements, as discussed in Section 2.2.1. behalf of the Department of Defense by a Entry declarations are only allowed in the team of people led by Jean Ichbiah [U.S. specification part of a task. Accept state- Department of ,Defense 19831. Since its ments for the entries appear in the body of first (preliminary) definition appeared in the task. 
They contain a formal parameter 1979, Ada has been the subject of an ava- part similar to that of a procedure. It is not lanche of publications. A substantial part possible to accept an entry conditionally of the discussion in these publications re- depending on the values of the actual pa- lates to parallel and distributed program- rameters, or to control the order in which ming in Ada (e.g., [Yemini 1982; Gehani outstanding requests are accepted. Gehani

ACM Computing Surveys, Vol. 21, No. 3, September 1989 296 l H. E. Bal, J. G. Steiner, and A. S. Tanenbaum and Cargill show that an array of entries extensively in the future. Many implemen- with the same formal part (a so-called fam- tations of Ada are now available, and sev- ily) can sometimes be used instead of con- eral million lines of Ada code have already ditional acceptance, although in general been written for uniprocessor applications this leads to polling [Gehani and Cargill [Myers 19871. 19841. Burns et al. cite 18 papers addressing the A task can call an entry of another task issue of how to use Ada in a distributed by using an entry call statement, similar to environment [Burns et al. 19871. They also a procedure call statement. An entry call review many problems with parallel and specifies the name of the task containing distributed programming in Ada. The the entry, as well as the entry name itself. synchronization mechanism receives a Entry names cannot be used in expressions substantial part of the criticism: It is asym- (e.g., ‘they cannot be passed around as metric (input guards but not output guards parameters). A program can use a pointer in select statements), entry calls are al- to an explicitly created task as a name for ways serviced in FIFO order and cannot be that task. Pointers are more flexible than accepted conditionally, and it is not possi- static identifiers, but they cannot point to ble to assign priorities to alternatives of a declared tasks or to entries. select statement. Distribution of programs Ada uses a select statement similar to among multiple processors is not addressed CSP’s alternative command for expressing by the definition of Ada, but is left to nondeterminism. Ada’s select statement is configuration tools. actually used for three different purposes: Concurrent C. Concurrent C extends the to select an entry call nondeterministically C language [Kernighan and Ritchie 19781 from a set of outstanding requests, to call by adding support for distributed program- an entry conditionally (i.e., only if the ming. The language is being developed at called task is ready to accept it immedi- AT&T Bell Laboratories, by Narain Ge- ately), and to set a time-out on an entry hani and others [ Gehani and Roome 1986a, call. So Ada essentially supports input 19891. Concurrent C is based on Ada’s ren- guards and conditional and timed entry dezvousmodel,butitsdesignerstriedtoavoid calls, but not output guards. the problems they observed in this model (3) Fault tolerance. Ada has an excep- [Gehani and Roome 19881. tion-handling mechanism for dealing with (1) Parallelism. A process in Concur- software failures, but the language defini- rent C has a specification part and a body, tion does not address the issue of hardware just like tasks in Ada. The specification failures [Burns et al. 19871. If the processor part consists of the process’s name, a list on which a task T executes crashes, an of formal parameters, and a list of trans- implementation may (but need not) treat actions. (A transaction is Concurrent C’s T like an aborted task (i.e., a task that equivalent to an Ada entry.) Processes are failed because of software errors). If so, created explicitly, using the create primi- other tasks that try to communicate with tive, which can pass parameters to the cre- T will receive a tasking-error exception ated process. 
The new process can be given and conclude that T is no longer alive; a priority (which can later be changed by however, they do not know the reason itself or by other processes) and can be (hardware or software) why T died, so this assigned to a specific processor. The create support for dealing with processor failures primitive returns an identifier for the new is very rudimentary. process instantiation. This value can be assigned to a variable of the same pro- (4) Implementation and experience. cess type and can be passed around as a Given the fact that the Department of De- parameter. For example, fense intends to have Ada replace 300 or so other languages currently in use and that process buffer pid; industry has also shown some interest in pid = create buffer(lOO) priority(l) Ada, the language probably will be used processor (3);

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems l 297 starts a process of type buffer on processor not know the type or name of the called 3, giving it priority 1 and passing the num- process. ber 100 as a parameter to it. A reference to The caller can specify the amount of time the process is returned in pid, which might it is willing to wait for its request to be be passed to another process to use for carried out, using the following construct: subsequent communication with the buffer within N ? pid.tname(params) : expr process. If process pid does not accept the tname (2) Communication and synchroniza- transaction call within N seconds, the call tion. Processes communicate through the is canceled, and the expression expr is eval- rendezvous mechanism. (Communication uated instead. This construct is equivalent through shared variables is not forbidden, to Ada’s timed entry call, although with an but no special language support is provided entirely different syntax. for it, and it will only work correctly on shared-memory machines.) A transac- Nondeterminism is expressed through a tion in Concurrent C differs from an Ada select statement, similar to the one used entry in that a transaction may return a by Ada. Concurrent C’s select statement is somewhat cleaner, because it is used only value. In addition, Concurrent C supports for dealing with nondeterminism, not for asynchronous transactions (equivalent to asynchronous message passing); such timed or conditional transaction calls. transactions may not return a value (3) Fault tolerance. A fault-tolerant [Gehani 19871. version of Concurrent C (called FT Con- Concurrent C supports a more powerful current C) based on replication of processes accept statement than Ada. Transactions has been designed [Cmelik et al. 19871. can be accepted conditionally, based on the values of their parameters, as in the (4) Implementation and experience. following example: Concurrent C has been implemented on a uniprocessor, a group of executable-code- accept tname(a, b, c) suchthat (a < b) compatible machines connected by an Eth- ernet, and a multiprocessor providing Only outstanding transaction calls for shared global memory [Cmelik et al. 19861. Concurrent C has been used in several which the expression after the suchthat evaluates to “true” will be accepted. The nontrivial applications, such as a distrib- order of acceptance can be controlled using uted version of “make” [Cmelik 19861, a a by clause: robot system [Cox and Gehani 19861, dis- crete event simulation [Roome 19863, and accept tname(a, b, c) by(c) ( . . . ) a window manager [Smith-Thomas 19861. The language is being merged with C++ Of all outstanding calls to transaction [Stroustrup 19861 to create a programming tname, the one with the lowest third language supporting both distributed pro- parameter will be accepted. gramming and classes [Gehani and Roome A transaction call is similar to a function 1986b]. call and can be used as part of an expression (since transaction calls may return values). 3.1.4 Remote Procedure Call The transaction call specifies the name of the called process along with the transac- RPC was first introduced by Brinch Han- tion name and supplies actual parameters. sen for his language Distributed Processes The process name can be any expression (see below) and has been studied in more that yields a process identifier. The trans- detail by Nelson and Birrell [Nelson 1981; action name is a static identifier. 
A specific Birrell and Nelson 19841. Remote pro- transaction of a specific process can be cedure calls are also used in several other assigned to a transaction pointer variable, languages, as shown in Table 5. (Most which can subsequently be used instead of languages based on atomic transactions these two names in an indirect transaction also use RPCs; these are discussed in call. With this mechanism, the caller need Section 3.1.7.)

ACM Computing Surveys, Vol. 21, No. 3, September 1989 298 . H. E. Bal, J. G. Steiner, and A. S. Tanenbaum

Table 5. Languages Based on Remote statement and the threads created to Procedure Call handle remote calls execute as pseudo- Remote procedure call parallel processes, scheduled nonpre- Lanauaee References emptively. They communicate through P’s global variables and synchronize through Cedar [Swinehart et al. 19851 guarded regions. Concurrent CLU [Hamilton 1984, Cooper and Hamilton 19881 Like the select statement of CSP and Distributed Processes [Brinch Hansen 19781 Ada, a guarded region in DP is based on LYNX [Scott 1985,1986,1987, Scott Dijkstra’s guarded command. A guarded re- and Cox 19871 gion allows a thread to wait until one of a P’ [Carpenter and Cailliau 19841 number of guards (conditional expressions) is true. When a thread is blocked in a guarded region, other threads in its process Distributed Processes. Brinch Hansen’s can continue their execution. The guards Distributed Processes (DP) [Brinch Han- have access to the input parameters of the sen 19783 is the successor to Concurrent remote call and to the process’s global vari- Pascal [Brinch Hansen 19753. Like Con- ables. Since other threads can change the current Pascal, DP is oriented toward real- global variables, the guards are repeatedly time systems programming. Instead of evaluated, until one or more of them is true. Concurrent Pascal’s monitor-based com- This is a major difference with the select munication scheme, DP processes commu- statement and makes the guarded region nicate using RPC. somewhat more powerful. Two forms of guarded regions are sup- (1) Parallelism. In DP the number of ported by DP. The when statement non- processes is fixed at compile time. The deterministically selects one true guard and intention is that there be one processor executes the corresponding statement. The dedicated to executing each process. Each cycle statement is an endless repetition of process, however, can contain several the when statement. threads of control running in pseudoparal- lel. A process definition contains an initial (3) Implementation and experience. statement, which may be empty; this is the DP is a paper design and has not been first thread. It may continue forever, or it implemented. An outline of a possible im- may finish executing at some point, but in plementation is given by Brinch Hansen either case the process itself continues to [1978]. exist; DP processes never terminate. Addi- tional threads are initiated by calls from 3.1.5 Multiple Communication Primitives other processes. Arrays of processes may be declared. A process can determine its array As can be seen from the previous sections, index using the built-in function this. many different communication and syn- chronization mechanisms exist, each with (2) Communication and synchroniza- its own advantages and disadvantages. As tion. DP processes communicate by call- there is no general agreement on which ing one another’s common procedures. Such primitive is best, some language designers a call has the form have taken the approach of providing a range of primitives, from which the pro- call P.f (exprs, vars) grammer can choose the one most suited to where P is the name of the called process the application. In addition, programmers and f is the name of a procedure declared can experiment with different primitives by P. The expressions are input param- while still using the same language. An eters; the return values of the call are important issue in the design of such a assigned to the (output) variables. 
language is how to integrate all these prim- The calling process (and all its threads) itives in a clean and consistent way. Ex- is blocked during the call. A new thread of amples of languages in this class are shown control is created within P. P’s initial in Table 6. We discuss SR below.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems l 299

Table 6. Languages Based on Multiple parent to the invoker of the operation. On Communication Primitives the invoker’s side, an operation may be Multiple communication primitives called asynchronously using a send or syn- Language References chronously using a call. A send blocks until the message has been delivered to Dislang [Li and Liu 19811 the remote machine; a call blocks until the Pascal-FC [Burns and Davies 19881 StarMod [Cook 1980, LeBlanc and Cook operation has been completed and any 19831 return values have been received. Several SR [Andrews 19811 calls can be grouped in a parallel call state- ment, which terminates when all calls have been completed. The operation and its re- Synchronizing Resources. Synchronizing source instance must be named explicitly Resources (SR) was developed by Gregory in the invocation. This is done using the Andrews et al. at the University of Arizona identifier for the resource returned by the [Andrews 1981, 1982; Andrews and Olsson create command. 1986; Andrews et al. 19881. SR is a language By combining the two modes of servicing for programming distributed operating sys- operations and the two modes of invoking tems and applications. It is based on Mod- them, four types of interprocess commu- ula, Pascal, and DP, and provides several nication can be expressed as shown in models of interprocess communication. Table 7. SR uses a construct similar to the select (1) Parallelism. An SR program con- statement (see Section 2.2.3) to deal with sists of one or more resources. A resource nondeterminism. The SR guarded com- is a module run on one physical (either mand, or alternative, has the following a single processor or a shared-memory mul- form: tiprocessor). Resources are dynamically entry-point(params) and bool-expr created (parameters may be passed), and by expr + statements optionally assigned to run on a specific machine. An identifier for the resource in- A guard may contain an entry point for an stance is returned by the create command. operation, a Boolean expression, and a A resource can contain several processes, priority expression. The two expressions and these may share data. Synchronization can refer to the actual parameters of the among these processes is supported by the operation. An alternative is enabled if there use of semaphores. Communication with is a pending invocation of the operation processes in other resources is restricted to and the Boolean expression evaluates to operations, discussed below. A resource may true. The expression in the by part is used contain an initialization and a termination for priorization when there are several process. These are created and run implic- pending invocations of the same operation. itly. A resource terminates when it is killed If all Boolean expressions are false, the by the destroy command. A program ter- process suspends. minates when all its processes terminate or block. (3) Fault tolerance. SR supports two rudimentary mechanisms for handling fail- (2) Communication and synchroniza- ures [Andrews et al. 19881. Exception han- tion. An SR operation definition looks dlers can be used to handle failures detected like a procedure definition. Its implemen- by the run-time system. For example, a tation can look like either a procedure or handler attached to an operation invoca- an entry point. When implemented as a tion is called if the invocation fails. A procedure, the operation is serviced by an when-statement can be used to ask the implicitly created process. 
When imple- run-time system to monitor a certain mented as an entry point, it is serviced by source (e.g., a process or processor) and to an already running process in a rendezvous. invoke a user-supplied operation if the The two types of implementation are trans- source fails.

ACM Computing Surveys, Vol. 21, No. 3, September 1989 300 ’ H. E. Bal, J. G. Steiner, and A. S. Tanenbaum

Table 7. Four Types of Interprocess Communication in SR Call (synchronous) Send (asynchronous) Entry (synchronous) Rendezvous Message passing Process (asynchronous) RPC Fork

(4) Implementation and experience. An Seitz discuss the usage of the actor lan- implementation of SR on top of UNIXl’ is guage Cantor for programming fine-grain described by Andrews et al. [ 19881. It runs multicomputers [Athas and Seitz 19881. on collections of SUNS or VAXes and on ABCL/l uses asynchronous message the Encore Multimax. SR has been used to passing and an explicit construct for send- implement a parallel PROLOG ing several messages simultaneously to and the file system of the Saguaro distrib- different objects. Orient84/K is a multipar- uted operating system [Andrews et al. adigm language for programming knowl- 19881. edge-based systems, integrating object- oriented, logic, and parallel programming. 3.1.6 Object-Based Languages OIL is the intermediate language for the FAIM-1 symbolic multiprocessor system. The object-based approach to programming OIL integrates parallel, object-oriented, is becoming increasingly popular, not only logic, and procedural programming. Raddle in the world of sequential programs, but is a language for the design of large distrib- also for building distributed applications. uted systems. EPL is an object-based lan- The need for distributed objects arises guage based on Concurrent Euclid [Holt when, for example, operating systems and 19821. EPL is used with the Eden distrib- distributed problem solvers are modeled. uted operating system [Almes et al. 19851. Exploiting parallelism to speed up pro- It influenced the design of Emerald (dis- grams is usually considered to be a second- cussed below) and Distributed Smalltalk, ary issue, to be dealt with by the language which is based on Smalltalk-80. implementation. In most parallel object-based or object- Emerald. Emerald is an objeci-based oriented13 languages (see Table 8), parallel- programming language for the implemen- ism is based on assigning a parallel process tation of distributed applications. It was to each object, so objects become active developed at the University of Washington components. This method is used, for ex- by Andrew Black and others [Black et al. ample, in the languages Concurrent Small- 1986, 1987; Hutchinson 1987; Jul et al. talk, CLIX, Emerald, Hybrid, Ondine, 1988; Jul19881. POOL, Sloop, and in the actor languages Like Smalltalk-80, Emerald considers all Act 1, Cantor, and CSSA. Concurrent entities to be objects. For the programmer, Smalltalk (based on Smalltalk- [Gold- both a file accessible by many processes berg and Robson 19831) also supports and a Boolean variable local to a single asynchronous message passing to increase process are objects. Objects can either be parallelism even further. Actor languages passive (data) or active. Unlike Smalltalk- [Hewitt 1977; Agha 19861 are related to 80, Emerald is a strongly typed language object-oriented languages, but arrange ob- and has no classes or inheritance. jects in dynamically changing hierarchies, Abstract types are used to define the in- rather than in static classes. Athas and terface to an object. An abstract type can be implemented by any object supporting at least the operations specified in the ” is a registered trademark of AT&T Bell Lab- interface. The was designed to oratories. I3 As discussed in Section 2.1.1, we define a language allow multiple implementations of the same to be object oriented if it has inheritance, and object type to coexist; new implementations can based if it supports objects but lacks inheritance. be added to a running system. The pro-

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems 301

Table 8. Object-Oriented, Object-Based, and Actor Languages Objects Language References ABCL/l [Yonezawa et al. 19861 Act 1 [Lieberman 19871 ALPS [Vishnubhotia 19881 Cantor [Athas and Seitz 19881 CLIX [Hur and Chon 19871 Cluster 86 [Lujun and Zhongxiu 19871 ConcurrentSmalltalk [Yokote and Tokoro 1986,1987a, 1987b] CSSA [Nehmer et al. 19871 Distributed Smalltalk [Bennett 19871 Emerald [Black et al. 19861 EPL [Black et al. 19841 Hybrid [Nierstrasz 19871 Mentat [Grimshaw and Liu 19871 OIL [Davis and Robison 19851 Ondine [Ogihara et al. 19861 Orient84/K [Ishikawa and Tokoro 19871 POOL [America 19871 Raddle [Forman 19861 SINA [Aksit and Tripathi 19881 Sloop [Lucco 19871 grammer can supply different implemen- itor construct. The internal process can tations, each tailored to a specific use. enter the monitor by calling an operation Alternatively, the compiler can automati- of its own object. Within a distributed cally generate different implementations system, many objects can run in parallel. from the same , tailored for local Emerald provides the same semantics for objects or distributed objects. local and remote invocations. At any given time, an object is on a single (1) Parallelism. Parallelism is based on processor, called its location. In general, the simultaneous execution of active ob- programmers do not have to worry about jects. Since objects in Emerald can be locations, because the semantics of opera- moved from one processor to another (as tion invocations are location independent. discussed below), the language essentially Some applications, however, are better off supports . This is the when they can control the locations of most flexible mapping strategy discussed in objects. For example, two objects that com- Section 2.1.2. municate frequently can be put on the same processor. Conversely, objects that are rep- (2) Communication and synchroniza- licas of the same data should be located on tion. An object consists of four parts: a different processors, to reduce the chance name, a representation, a set of operations, of losing the data after a processor crash. and an optional process. The name uniquely In Emerald, global objects can be moved identifies the object within the distributed from one processor to another. Such a move system. The representation contains the may be initiated either by the compiler data of the object. Objects communicate by (using compile-time analysis) or by the pro- invoking each other’s operations. There grammer, using a few simple language can be multiple active invocations within primitives. One important case is where an one object. The optional process runs in object is passed as a parameter in a remote parallel with all these invocations. The operation. Every access to the parameter invocations and the data shared by these object will result in an extra remote invo- invocations can be encapsulated in a mon- cation. The obvious solution is to pass a

ACM Computing Surveys, Vol. 21, No. 3, September 1939 302 l H. E. Bal, J. G. Steiner, and A. S. Tanenbaum

copy of the object as a parameter, but this Table 9. Languages Based on Atomic Transactions changes the parameter mechanism into Atomic transactions call-by-value. For object-based languages, Language References call-by-reference is more natural. The so- lution in Emerald is to optimize such calls Aeolus [Wilkes and LeBlanc 19861 by first moving the parameter object to the Argus [Liskov and Scheifler destination processor, then doing the call, 19831 and optionally moving the object back. As Avalon [Detlefs et al. 19881 this case occurs frequently, Emerald intro- Camelot Library [Bloch 19881 duces a new parameter passing mode, called call-by-move, to accomplish this efficiently (i.e., with low message-passing overhead). a collection of macros and em- bedded in the C language. The Camelot (3) Implementation and experience. An Library takes care of many low-level details important goal of a good implementation of transaction and object management, of Emerald is to recognize simple opera- thus facilitating the implementation of tions on small objects and to treat them Camelot servers and applications. This ap- efficiently. For example, an addition of two proach avoids designing a totally new lan- local integer variables is compiled to inline guage, while providing higher level code. Local calls to objects that cannot primitives than traditional system calls. move essentially take a local procedure call. Global objects (which are allowed to move) Argus. Argus [Liskov 1982; Liskov and are referenced indirectly through an object Scheifler 1983; Liskov 1984, 1988; Weihl descriptor, which either contains the ad- and Liskov 19851, being developed at MIT dress of the object if it is local, or tells by and colleagues, is based where to find the object in the distributed on CLU [Liskov et al. 19771 and Extended system. When an object moves to another CLU [Liskov 19791. It provides support for processor, the descriptors on its old and fault-tolerant distributed programming, in new locations are updated. particular for applications requiring a high A prototype distributed implementation degree of reliability and availability, such of Emerald has been built, running on DEC as banking, airline reservation, and mail MicroVax II and SUN workstations con- systems. Its main features are guardians nected by Ethernet. (modules that can survive crashes) and ac- Emerald has been used to implement a tions (groups of atomic executions). mail system, a replicated name server, a (1) Parallelism. An Argus module, shared appointment calendar system, and called a guardian, contains data objects and several other applications [Jul et al. 19881. procedures for manipulating those objects. A guardian may contain background and 3.1.7 Atomic Transactions recover sections and may have several Several languages that were specifically creator and handler procedures. A crea- designed for building fault-tolerant appli- tor procedure is run when an instance of cations support atomic transactions in the guardian is being made. The handler combination with RPC (see Table 9). procedures are run on behalf of processes Aeolus and Avalon are built on top of ex- outside of the guardian. The recover section isting systems that already support atomic is executed when the guardian is started transactions. Aeolus provides language up again after a crash. The background support for the Clouds operating system section is intended for doing periodic tasks [LeBlanc and Wilkes 19851. 
Avalon is being and is run continually during the life of the implemented on top of the Camelot distrib- guardian. uted transaction management system A guardian instance is created dynam- [Spector et al. 19861. Camelot applications ically by a call to a creator procedure. can also use the Camelot Library, which is A creator may take parameters, and the

ACM Computing Surveys, Vol. 21, No. 3, September 1989 Programming Languages for Distributed Computing Systems 303 guardian may be explicitly placed on a of atomic data types. Argus has some built- node: in atomic types, and users can define their own. The types of atomic objects deter- guardianType$creator(params) mine the amount of parallelism of actions Q, machinex permitted. More than one process may be running (3) Fault tolerance. Some of the guard- in a guardian instance at a given time. If ian’s objects may be declared as stable; the guardian contains a background sec- they are kept on stable storage. If a node tion, a process is created to run it. In ad- crashes, the guardian can be brought up dition, each time a call is made to one of again by retrieving its stable objects from the guardian’s handlers, a process is created store and executing its recover section. to run the appropriate handler proce- Argus supports two types of atomic exe- dure. A guardian can terminate itself by cutions: topactions and nested subactions executing the terminate statement. [Moss 19811. Changes only become per- Parallelism results from simultaneous manent (and stable objects written back to execution of guardians. Pseudoparallelism stable storage) when a topaction commits. results from the implicit creation of a new A subaction is indivisible, but its effects are process for each handler call within a not made permanent until its top-level ac- guardian. Pseudoparallel execution can tion commits. If a top-level action aborts, also be expressed using a coenter state- its subactions have no effect at all. On the ment. A coenter terminates when all its other hand, if a subaction aborts, its parent components have finished, and one com- action is not forced to abort. Nested sub- ponent may terminate the rest prematurely actions can be used for dealing with com- by transferring control out of the coenter munication failures and for increasing par- statement. allelism. An action can also start up a new topaction. (2) Communication and synchroniza- tion. Processes running in the same (4) Implementation and experience. A guardian instance can communicate using UNIX-based implementation of Argus on shared variables. Processes belonging to a collection of MicroVax II workstations is different guardians, however, can only com- described by Liskov et al. [ 19871. municate using handler calls. A handler call One Argus application reported in the is a form of RPC, with arguments passed literature is a distributed collaborative ed- by value. Guardian and handler names may iting system (CES), which allows a group be passed as parameters. Argus provides of coauthors to edit a shared document synchronization mechanisms at two levels: [Greif et al. 19861. A number of problems one for pseudoparallel processes; the other with the language design were identified for parallel actions. during this experiment. When an action The mutex type provides mutually ex- aborts, for example, no user code is acti- clusive access to objects shared by processes vated, the run-time system does all the within a guardian. The seize construct de- processing automatically. In some cases, lays a process until it can gain possession however, the application also needs to do of the given mutex object; the process gives some processing after an abort (e.g., in CES up possession again when it finishes exe- an abort sometimes implies updating a cuting the seize body. The pause call can screen). 
The implementors of CES reported be made when a process encounters a delay that their task would have been signifi- (such as an unavailable resource) and cantly simplified had Argus provided more wants to give up the mutex object while it explicit control over action aborts and suspends for a while. commits. In order to allow parallelism of actions, Another application implemented in while retaining their atomic semantics, Argus, a distributed mail repository, is atomic objects are used, which are instances described by Day [ 19871.

Aeolus. Aeolus is a systems programming language for implementing fault-tolerant servers for the Clouds distributed operating system. Aeolus provides abstractions for the Clouds notions of objects, actions, and processes, and provides access to the recoverability and synchronization features supported by Clouds. Both Clouds and Aeolus are being developed at Georgia Institute of Technology by Richard LeBlanc and colleagues [LeBlanc and Wilkes 1985; Wilkes and LeBlanc 1986, 1988].

(1) Parallelism. Aeolus is object based in the sense that it supports data abstractions. Unlike in Emerald, however, objects in Aeolus are passive (see Section 2.1.1). Aeolus therefore supports a process concept for providing parallel activity.

(2) Communication and synchronization. Communication and synchronization in Aeolus are expressed through operation invocations on objects, as discussed below.

(3) Fault tolerance. We first give a brief description of salient features of Clouds related to fault tolerance and then discuss how Aeolus supports these features. The Clouds distributed operating system supports atomic transactions on objects. As in Argus, actions can be nested. A Clouds object is a passive entity that encapsulates data. The data of an object can only be manipulated by invoking operations (remote procedures) defined for the object. Objects are created dynamically. Each instance of an object type has its own state, consisting of the global variables used by the implementation of the operations. Objects may be replicated in order to increase availability [Wilkes and LeBlanc 1988].

Clouds supports so-called recoverability and synchronization of objects. Recoverability allows objects to survive processor crashes. Synchronization ensures that parallel operation invocations are ordered such that they do not interfere with each other. Both features can be handled automatically by the Clouds kernel or can be custom programmed for higher efficiency, using semantic knowledge about the problem being implemented. Automatic recovery is based on checkpointing the entire state of the object, whereas custom recovery need only checkpoint those parts of the object state that have been indicated by the programmer. Automatic synchronization allows multiple read-operations to execute simultaneously, but serializes all operations that modify any part of the object state.

Aeolus gives programming language support for Clouds objects and actions. An object type in Aeolus consists of a definition part and an implementation part. The former contains the name of the object type and the operations allowed on objects of that type. It also specifies whether recovery should be done by the system, by the programmer, or not at all, and it may specify that synchronization is to be done automatically. Programmed synchronization is based on critical sections (for mutual exclusion) and on explicit locking using various lock types. The declaration of a lock type specifies a number of modes, a compatibility relation between the modes, and (optionally) a domain of values to be locked. For example, a read-write lock type over file names of 14 characters can be declared as follows:

    type rw-lock is
      lock (read: [read], write: [ ])
      domain is string(14)

This declaration introduces two modes, "read" and "write," and specifies that several "read" locks on file names may be obtained, but that "write" locks are exclusive, as they are compatible with no other mode. The usage of a domain allows a lock to be separated from the data being locked.

The support provided by Aeolus for programming with actions is rather low level. The language provides direct access to the Clouds primitives. Programmers may write their own action event handlers, procedures that are called when an action event (such as commit or abort) happens.

Aeolus gives the programmer more flexibility than Argus for optimizing recovery and synchronization. On the negative side, the many features thus introduced make Aeolus a fairly complex language.

(4) Implementation and experience. A compiler and run-time system for Aeolus have been implemented. Aeolus has not yet been used for any major distributed applications.

3.2 Languages with Logically Shared Address Spaces

We now turn our attention from languages with logically distributed address spaces to languages providing logically shared address spaces. In particular, we look at three subclasses: parallel functional languages, parallel logic languages, and languages based on distributed data structures (see Figure 4). Languages based on shared variables (e.g., Algol68 [van Wijngaarden et al. 1975], Concurrent Pascal [Brinch Hansen 1975], and MESA [Geschke et al. 1977]) are not discussed here. They can (at least in principle) be implemented on a distributed system, using techniques like Shared Virtual Memory [Li and Hudak 1986], but they were designed for shared-memory multiprocessors. (For a detailed discussion of shared-variable languages, we refer the reader to Andrews and Schneider [1983].)

3.2.1 Parallel Functional Languages

Pure functional languages are being studied in several parallel programming projects (see Table 10). The implicit parallelism in functional languages is especially suited for closely coupled architectures like dataflow machines, whereas distributed computing systems are in general more coarse grained. Nevertheless, functional languages can be used for programming distributed systems by providing a mapping notation that efficiently distributes computations among processors. This approach is taken by the language ParAlfl (discussed below).

Table 10. Parallel Functional Languages

Language          References
Blaze             [Mehrotra and van Rosendale 1987]
Concurrent LISP   [Sugimoto et al. 1983]
FX-87             [Jouvelot and Gifford 1988]
Lisptalk          [Li 1988]
Multilisp         [Halstead 1985]
ParAlfl           [Hudak 1986]
PML               [Reps 1988]
QLISP             [Gabriel and McCarthy 1984, 1988]
Symmetric LISP    [Gelernter et al. 1987a, 1987b]

Nonpure functional languages can also be based on functional parallelism, but they require a mechanism for determining which expressions can be evaluated in parallel. The language FX-87 uses an effect system for this purpose. An effect is a static description of the side effects of an expression. The effect of a function can be specified by the programmer and checked by the compiler. The compiler uses the effect information to do certain optimizations and to determine which expressions to evaluate in parallel.

Multilisp, QLISP, and Concurrent LISP are intended primarily for shared-memory machines and would be less efficient on distributed systems. Blaze is a Pascal-based language for parallel scientific programming that supports the functional programming model. It uses functional parallelism as well as a parallel loop-construct.

ParAlfl. ParAlfl [Hudak and Smith 1986; Hudak 1986, 1988] is a parallel functional language developed by Paul Hudak at Yale University.

(1) Parallelism. ParAlfl employs implicit, functional parallelism. Functional parallelism is usually fine grained, resulting in many small tasks that can be done in parallel. As there may be far more parallel tasks than physical processors, ParAlfl uses a mapping notation to specify which expressions are to be evaluated on which processors. An expression followed by the annotation $on proc will be evaluated on the processor determined by the expression proc. This proc expression can be relative to the currently executing processor. For example, if the expression

    (f(x) $on ($self - 1)) + (g(y) $on ($self + 1))

is executed by Processor P, then Processor P - 1 executes f(x), Processor P + 1 executes g(y) (in parallel), and Processor P itself performs the addition.

(2) Communication and synchronization. Communication and synchronization between parallel computations are implicit, so there is no need for explicit language constructs. A computation automatically blocks when it needs the result of another computation that is not yet available.

The semantics of ParAlfl are based on lazy evaluation, which means that an expression is not evaluated until its result is needed. In general, the programmer need not be concerned with the order in which computations are done. For efficiency reasons, he or she may want to control the evaluation order, however. For this purpose, ParAlfl supports eager expressions, which are evaluated before their results are needed, and synchronizing expressions, which constrain the evaluation order.

ParAlfl programs are fully deterministic, provided that a few simple restrictions on proc expressions are satisfied. This means that the results of such programs do not depend on how the computations are distributed among the processors. In particular, the results of a program will be the same whether executed on a uniprocessor or on a parallel system.

(3) Implementation and experience. ParAlfl has been implemented on the Encore Multimax multiprocessor and on two distributed architectures (hypercubes): the Intel iPSC and the NCube [Goldberg and Hudak 1986]. The language has been used for implementing several parallel algorithms (e.g., divide-and-conquer, linear equations, partial differential equations) [Hudak 1986].

3.2.2 Parallel Logic Languages

Many of the underlying ideas of parallel logic programming languages (see Table 11) were introduced by Clark and Gregory for their Relational Language. (The logic programming community has adopted the term concurrent logic language rather than parallel logic language; for consistency with the rest of the paper, we use the latter term, however.) Most parallel logic languages are based on AND/OR parallelism, shared logical variables, and committed choice nondeterminism (see Sections 2.1.1, 2.2.2, and 2.2.3). Examples are Concurrent PROLOG and Flat Concurrent PROLOG (discussed below), PARLOG (also discussed below), guarded Horn clauses (GHC), and Oc. P-PROLOG is also based on shared logical variables, but uses a mechanism called exclusive guarded Horn clauses for controlling OR-parallelism. For a normal guarded Horn clause, if several clauses for a given goal have a guard that evaluates to "true," then one of them is chosen nondeterministically. For an exclusive guarded Horn clause, however, the execution of the goal suspends until exactly one guard evaluates to "true." (Note that a guard that initially succeeds can later fail, if one of the variables used by the guard gets bound.)

Table 11. Parallel Logic Languages

Language              References
BRAVE                 [Reynolds et al. 1988]
Concurrent PROLOG     [Shapiro 1987]
Delta PROLOG          [Pereira et al. 1986]
Guarded Horn clauses  [Ueda 1985]
Mandala               [Ohki et al. 1987]
Oc                    [Takeuchi and Furukawa 1986]
PARLOG                [Clark and Gregory 1986]
P-PROLOG              [Yang and Aiso 1986; Yang 1988]
Quty                  [Sato 1987]
Relational Language   [Clark and Gregory 1981]
Vulcan                [Kahn et al. 1986]

BRAVE is a parallel logic language that does not use committed choice nondeterminism, but supports true OR-parallelism. Mandala combines object-oriented and logic programming. Quty combines functional and logic programming.

Delta PROLOG is significantly different from the languages mentioned above. It is based on message passing rather than on shared logical variables, it uses only AND-parallelism, and it supports PROLOG's completeness of search by using distributed backtracking.
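Committed choice can be pictured, loosely, with a construct from ordinary concurrent programming. In the Go sketch below, a select statement plays the role of guard evaluation: each case acts as a guard, one ready case is chosen nondeterministically and the choice is committed, and the statement suspends if no case is ready. This is only an analogy offered for intuition; it captures neither suspension on unbound logical variables nor the exclusive variant used by P-PROLOG, and the channel names are invented for the example.

    // A loose analogy (Go, not a logic language): committed-choice
    // nondeterminism resembles a select statement. If several "guards"
    // (cases) are ready, one is chosen nondeterministically and the
    // choice is committed; if none is ready, the goal suspends.
    package main

    import "fmt"

    func main() {
        a := make(chan string, 1)
        b := make(chan string, 1)
        a <- "clause 1"
        b <- "clause 2"

        // Both guards are satisfied, so one branch is picked
        // nondeterministically and the other alternative is discarded.
        select {
        case msg := <-a:
            fmt.Println("committed to", msg)
        case msg := <-b:
            fmt.Println("committed to", msg)
        }
    }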


Concurrent PROLOG. Concurrent PROLOG was designed by Ehud Shapiro of the Weizmann Institute of Science in Rehovot, Israel [Shapiro 1983, 1986, 1987]. Concurrent PROLOG uses many of the ideas proposed by Clark and Gregory for their Relational Language. Shapiro and his group, however, have developed several new programming techniques for languages like Concurrent PROLOG.

(1) Parallelism. Parallelism in Concurrent PROLOG comes from the AND-parallel evaluation of the goals of a conjunction and from the OR-parallel evaluation of the guards of a guarded Horn clause, as discussed in Sections 2.1.1 and 2.2.3. There is no sequential AND-operator, so every goal of a conjunction creates a new parallel process. The textual ordering of the goals has no semantic significance. A mapping notation has been designed for assigning processes to processors, as discussed in Section 2.1.2.

(2) Communication and synchronization. Parallel processes communicate through shared logical variables. Synchronization is based on suspension on read-only variables. A variable is marked as read-only by suffixing it with a "?". Unification of two terms suspends if an attempt is made to instantiate a read-only variable. Thus, Concurrent PROLOG extends the unification of PROLOG [Robinson 1965] with a test for read-only variables.

Concurrent PROLOG uses guarded Horn clauses to deal with nondeterminism. There is no restriction on the kinds of goals that may appear in a guard, so a guard may create other AND-parallel processes. As these processes may invoke new guards, this may lead to a system of arbitrarily nested guards. This creates a problem, as only the guard of the clause that is committed to may have side effects (see Section 2.2.3). Therefore, a new environment is created for every guard of a guarded Horn clause, containing the bindings made by that guard. On commitment, the environment of the chosen guard is unified with the goal being solved. The environments of all other guards are discarded. Maintenance of these separate environments is difficult to implement, even on a single-processor machine. The need for environments has been eliminated in a subsequent language, called Flat Concurrent PROLOG (FCP) [Mierowsky et al. 1985]. In FCP, guards may only contain a predefined set of predicates, rather than user-defined predicates, so nesting of guards is ruled out. This also virtually eliminates OR-parallelism, but a method has been designed to compile OR-parallel programs into AND-parallel programs [Codish and Shapiro 1986].

(3) Implementation and experience. A uniprocessor implementation of Flat Concurrent PROLOG exists for several types of UNIX machines [Houri and Shapiro 1986]. The implementation supports the Logix programming environment and operating system [Silverman et al. 1986]. The novelty in the implementation is its efficient support for the creation, suspension, activation, and termination of lightweight processes. The performance is comparable to conventional uniprocessor PROLOG implementations.

A distributed implementation of FCP was developed for the Intel iPSC hypercube [Taylor et al. 1987b]. The key concepts in the implementation are data distribution by demand-driven structure copying and the use of a specialized two-phase locking protocol to implement FCP's atomic unification.

Several applications have been written in Concurrent PROLOG. Shapiro [1987] contains separate papers on Concurrent PROLOG implementations of systolic algorithms, the Maxflow problem (determining the maximum flow through a network), region finding in a self-intersecting polygon, image processing, the Logix system, a distributed window system, a public-key system, an equation solver, a compiler for FCP, and a hardware simulator. Most experiences reported are quite positive; actual performance measurements are absent in nearly all papers.

Several programming techniques have been developed that can be used for systems and application programming in Concurrent PROLOG. Streams, bounded buffers, and incomplete messages were mentioned in Section 2.2.2. Streams and merging of streams can be expressed in Concurrent PROLOG [Shapiro and Mierowsky 1984; Shapiro and Safra 1986]. The "short-circuit" technique implements distributed termination detection. "Metaprogramming" and "metainterpreters" are studied by Safra and Shapiro [1986]. Systolic programming is a well-known technique for executing numerical algorithms in parallel on special-purpose hardware [Kung 1982]. Shapiro [1984] shows that systolic algorithms can also be expressed in Concurrent PROLOG, so they can be run on general-purpose hardware. Concurrent PROLOG can also be used for object-oriented programming [Shapiro and Takeuchi 1983]. Kahn et al. have designed a preprocessor language for Concurrent PROLOG (called Vulcan), which allows object-oriented programs to be written with less verbosity [Kahn et al. 1986].

PARLOG. PARLOG is a parallel logic programming language being developed at Imperial College, London, by Keith Clark and Steve Gregory [Clark and Gregory 1985, 1986; Foster et al. 1986; Gregory 1987; Ringwood 1988; Clark 1988]. It is a descendant of IC-PROLOG [Clark et al. 1982] and the Relational Language [Clark and Gregory 1981]. Like Concurrent PROLOG, PARLOG is based on AND/OR parallelism and committed choice nondeterminism. The main innovation introduced by the language is the use of mode declarations to control synchronization.

(1) Parallelism. PARLOG uses AND/OR parallelism that can be controlled by the programmer. There are two different conjunction operators: "," evaluates both conjuncts in parallel, and "&" evaluates them sequentially (left to right). The clauses for a relation can be separated either by a "." or by a ";" operator. In finding a matching clause for a goal, all clauses separated by a "." are tried in parallel (OR-parallelism). Clauses after a ";" are only tried if all clauses before the separator do not match. In the example below,

    1. A <- (B & C), (D & E);
    2. A <- F, G.
    3. A <- H & J.

Clause 1 is tried first, by doing "(B & C)" and "(D & E)" in parallel. If Clause 1 fails, Clauses 2 and 3 are tried in parallel. F and G are evaluated in parallel, but H and J are done sequentially.

The presence of sequential AND/OR operators requires the implementation to determine when a group of parallel processes has terminated, which is not a trivial task in a distributed environment. For example, in "B & C," all the processes created by B must have terminated before C is started. This additional complexity is the main reason why Concurrent PROLOG supports only the parallel operators.

(2) Communication and synchronization. Processes communicate through shared logical variables and synchronize by suspending on unbound shared variables. PARLOG has a mechanism for specifying which processes may generate a binding for a variable. For every relation, a mode declaration must be given that specifies which arguments are input and which are output. For example, the declaration

    mode append(list1?, list2?, appended-list^).

defines the first two arguments to the append relation to be input and the third one to be output. An actual argument appearing in an input position will only be used for input matching. If unification of the argument with the corresponding term in the head can only succeed by binding a variable appearing in the input argument, then the unification will suspend. The unification will be resumed when some other process generates a binding for the variable. After commitment, any actual argument appearing in an output position is unified with the output argument in the head of the clause.

PARLOG uses three specialized unification primitives for input matching, equality testing, and output unification. In contrast, Concurrent PROLOG has a general unification algorithm, which also has to take care of read-only variables.

Like Concurrent PROLOG, PARLOG uses guarded Horn clauses for nondeterminism. A guard in PARLOG may test any input variables and bind local variables of the clause, but it may not bind variables passed in an input argument. This is checked at compile time, using mode declarations. If a guard tries to bind an output variable, the actual binding is established only after commitment. Unlike Concurrent PROLOG, no environments need be maintained.

(3) Implementation and experience. The compiler can use the information in a mode declaration to increase efficiency. A sequential implementation of PARLOG is described by Foster et al. [1986]. The compiler first compiles PARLOG programs into a subset called Kernel PARLOG, in which all unifications are performed by explicit unification operators. Kernel PARLOG programs are subsequently compiled to code for an abstract machine, called the Sequential PARLOG Machine (SPM), which is emulated on a real machine.

A distributed implementation of PARLOG on a network of SUNs is described by Foster [1988]. The implementation uses some of the ideas of the distributed FCP implementation (see above), but differs in supporting distributed termination and deadlock detection. Also, as PARLOG does not have atomic unification, distributed unification is implemented without using a two-phase locking protocol.

PARLOG has been used for discrete event simulation, the specification and verification of communication protocols, a medical diagnosis expert system, and natural-language parsing [Gregory 1987; Clark 1988].

3.2.3 Distributed Data Structures

Distributed data structures are used in Linda, Orca (both discussed below), SDL, and Tuple Space Smalltalk (see Table 12).

Table 12. Languages Based on Distributed Data Structures

Language               References
Linda                  [Ahuja et al. 1986]
Orca                   [Bal and Tanenbaum 1988]
SDL                    [Roman et al. 1988]
Tuple Space Smalltalk  [Matsuoka and Kawai 1988]

Linda. Linda is being developed by David Gelernter and colleagues at Yale University [Gelernter 1985; Ahuja et al. 1986; Carriero et al. 1986]. Linda is not based on shared variables or message passing, but uses a novel communication mechanism: the Tuple Space. It supports (but does not enforce) a programming methodology based on distributed data structures and replicated workers. The goal of this methodology is to release the programmer from thinking in terms of parallel computations and simultaneous events, hence making parallel programming conceptually similar to sequential programming.

(1) Parallelism. Linda provides a simple primitive (called eval) to create a sequential process. (An earlier version of Linda provided constructs for parallel execution of a group of statements [Gelernter 1985]; we describe the current version here, which is based on C.) Linda does not have a notation for mapping processes to processors. With the replicated worker model (discussed below), there is no need for such a notation, as each processor executes a single process.

(2) Communication and synchronization. Linda's underlying communication model, the Tuple Space (TS), was discussed in Section 2.2.2. Processes communicate by inserting new tuples into TS and by reading or removing existing tuples. Processes synchronize by waiting for tuples to be available, using blocking read and in operations.

Traditional communication primitives (e.g., message passing and remote procedures) can be simulated using operations on TS [Gelernter 1985], so algorithms that split up the work among several communicating processes can be expressed in Linda.
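To make the out, rd, and in operations concrete, the sketch below implements a drastically simplified tuple space in Go. It is not Linda's actual C-embedded interface: tuples are reduced to a string key with one integer field and matching is exact, but out is nonblocking while rd and in block until a matching tuple is present, as described above. The main function also previews the replicated-worker style discussed next, with workers removing task tuples and depositing result tuples; all names are invented for this illustration.

    // A minimal sketch (not Linda's C-based interface) of a tuple space
    // supporting out (insert), rd (blocking read), and in (blocking removal).
    package main

    import (
        "fmt"
        "sync"
    )

    type TupleSpace struct {
        mu     sync.Mutex
        cond   *sync.Cond
        tuples map[string][]int // key -> values of matching tuples
    }

    func NewTupleSpace() *TupleSpace {
        ts := &TupleSpace{tuples: make(map[string][]int)}
        ts.cond = sync.NewCond(&ts.mu)
        return ts
    }

    // Out inserts the tuple (key, value) into the space and never blocks.
    func (ts *TupleSpace) Out(key string, value int) {
        ts.mu.Lock()
        ts.tuples[key] = append(ts.tuples[key], value)
        ts.mu.Unlock()
        ts.cond.Broadcast()
    }

    // In blocks until a tuple matching key exists, then removes and returns it.
    func (ts *TupleSpace) In(key string) int {
        ts.mu.Lock()
        defer ts.mu.Unlock()
        for len(ts.tuples[key]) == 0 {
            ts.cond.Wait()
        }
        v := ts.tuples[key][0]
        ts.tuples[key] = ts.tuples[key][1:]
        return v
    }

    // Rd is like In but leaves the matching tuple in the space.
    func (ts *TupleSpace) Rd(key string) int {
        ts.mu.Lock()
        defer ts.mu.Unlock()
        for len(ts.tuples[key]) == 0 {
            ts.cond.Wait()
        }
        return ts.tuples[key][0]
    }

    func main() {
        ts := NewTupleSpace()
        var wg sync.WaitGroup
        // Replicated workers take "task" tuples and put back "result" tuples.
        for w := 0; w < 3; w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                task := ts.In("task")
                ts.Out("result", task*task)
            }()
        }
        for i := 1; i <= 3; i++ {
            ts.Out("task", i)
        }
        wg.Wait()
        for i := 0; i < 3; i++ {
            fmt.Println(ts.In("result"))
        }
    }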


Alternatively, Linda programs can use the so-called replicated workers style [Ahuja et al. 1986]. Such a program consists of P identical (replicated) worker processes, one for each processor. The work to do is stored in a distributed data structure, which is implemented in TS and is accessible by all worker processes. Each process repeatedly takes some work from the data structure, performs it, puts back the results into the data structure, and possibly generates some more work. All workers essentially perform the same kind of task, until all work is done. The workers are loosely coupled; they only interact indirectly through the data structure. This model is claimed to have several advantages [Carriero et al. 1986]. In principle, any number of processors can be used (including just one). Also, the work is automatically and fairly distributed among the workers. Finally, process management is easy, as there usually is only one process per processor.

(3) Fault tolerance. A fault-tolerant network kernel for Linda, based on replication of the TS, has been designed by Xu [Xu 1988].

(4) Implementation and experience. Implementations exist for running Linda programs on Bell Labs' S/Net [Carriero and Gelernter 1986], an Ethernet-based MicroVax network, the iPSC hypercube [Gelernter and Carriero 1986], the Encore Multimax, the Sequent Balance, and other configurations. Different implementation strategies are discussed by Carriero [1987]. A hardware coprocessor has been designed by Venkatesh Krishnaswamy that supports Linda communication patterns and tuple matching [Ahuja et al. 1988]. Several (on the order of thousands) nodes, consisting of some CPU and the Linda coprocessor, can be arranged into a grid using this new hardware to form a highly parallel Linda Machine.

One of Linda's main goals is to achieve high speedups for real-life problems. Applications for which Linda programs have been written include DNA-sequence comparison, database search, VLSI simulation, heuristic monitoring, the Traveling Salesman problem, parameter sensitivity analysis, ray tracing, numerical problems [Gelernter and Carriero 1988], and a distributed backtracking package [Kaashoek and Bal 1988].

Orca. Orca is being developed by Henri Bal and Andrew Tanenbaum at the Vrije Universiteit in Amsterdam [Bal and Tanenbaum 1988; Bal 1989]. The language is primarily intended for the implementation of parallel algorithms on distributed systems. Orca allows processes on different processors to share data structures that are encapsulated in passive objects, which are instances of abstract data types.

(1) Parallelism. Parallelism in Orca is based on sequential processes. Orca provides an explicit fork primitive for spawning a new child process and passing parameters to it. Parameters may be value or shared, as specified in the declaration of the child process.

With value parameters, a copy of the actual parameter is created and passed to the child process. This mode is allowed for any type of parameter, including data structures like sets and graphs. Unlike most procedural languages, Orca does not provide pointers for building graphs; instead, graphs are built-in data types, just like arrays and sets. This eliminates the problem described in Section 2.2.1 of passing complex data structures around in a distributed system.

The second parameter mode, shared, is only allowed for parameters of (user-defined) abstract data types. The actual parameter must be a variable of the same abstract type. The variable, called a data object, is shared by the parent and child process. A child process may pass the object as shared parameter to its children, and so on, so, in general, there will be a hierarchy of processes sharing objects.

(2) Communication and synchronization. Processes communicate indirectly through shared data objects. As described above, such objects are instances of abstract data types. An abstract data type definition consists of a specification part and an implementation part. The specification part lists the operations that can be applied to objects of the given type. Each operation is applied to a single object; other objects (or regular data) can be passed as value parameters. All operations to the same object are executed indivisibly. For example, if X is a shared object of type IntegerObject, and the specification part of IntegerObject contains the operation

    operation increment(by: integer);

then the invocation

    X$increment(12);

applies the operation increment to object X, using the constant 12 as value parameter. If multiple processes simultaneously try to increment object X, then all these invocations will (at least conceptually) be serialized.

The implementation part of an abstract data type specifies the data contained in each variable (object) of the abstract type and contains code implementing the operations. An operation implementation may consist of one or more guarded statements. If so, an invocation of the operation blocks until at least one of the guards succeeds; next, one true guard is chosen nondeterministically, and its statements are executed without blocking again. An operation cannot block halfway during its execution.

(3) Implementation and experience. The shared data-object model can be implemented efficiently on a distributed system by replicating objects. If a shared object is changed infrequently, communication overhead is decreased by maintaining copies of the object on those processors that read it frequently and by applying read-only operations to the local copy. There are several different ways of deciding where to store copies of an object and how to update all these copies in a consistent way [Bal and Tanenbaum 1988]. One prototype implementation of Orca exists that replicates all objects on all processors and updates the copies using a reliable, ordered broadcast primitive (see Section 2.2.1). Orca has also been implemented on top of the Amoeba distributed operating system [Tanenbaum and van Renesse 1985]. This implementation uses selective replication of objects, based on statistics collected during execution by the run-time system.

A compiler front-end for Orca has been built using the Amsterdam Compiler Kit [Tanenbaum et al. 1983]. The compiler translates Orca programs into a machine-independent intermediate code, which is subsequently compiled into object code and linked with a machine-specific run-time system. The compiler also generates descriptive information to be used by the run-time system, such as which operations modify their objects and which are read-only.

Orca has been used for the implementation of several applications, including parallel branch-and-bound, computer chess, and graph algorithms.

4. CONCLUSIONS

The main reasons for running an application on a distributed computing system are high speed through parallelism, high reliability through replication of processes and data, and functional specialization. In addition, there are applications that are inherently distributed, such as sending electronic mail between geographically separated computers. The main issues distinguishing distributed programming from sequential programming are parallelism, communication and synchronization, and partial failures.

Parallelism was already employed in languages designed for implementing uniprocessor operating systems, which are easier to understand as collections of processes than as monolithic programs. The earliest languages were based on pseudoparallelism (no two processes are ever executed simultaneously), but with the advent of multiple-processor systems, interest shifted toward employing real parallelism to speed up programs. Parallelism was also welcomed by the designers of programming languages based on paradigms like logic programming, functional programming, and object-oriented programming. They realized that parallelism might be the solution to the problem of obtaining an efficient implementation of their languages. This has led to languages in which the details of managing parallelism are handled more by the run-time system and less by the programmer, yielding a higher level of abstraction.

The early pseudoparallel languages and operating systems used shared variables for interprocess communication. Mechanisms like semaphores and monitors were invented for synchronizing access to shared data in a clean way. A different approach was taken in the RC4000 operating system [Brinch Hansen 1973], which used message passing for interprocess communication. Later, message passing was also introduced as a programming language primitive [Hoare 1978]. This resulted in many Pascal-like languages based on some form of message passing. These languages were used for programming distributed systems as well as shared-memory multiprocessors. The next step was the development of higher level paradigms for interprocess communication, such as rendezvous, remote procedure calls (RPC), and distributed data structures.

A similar development took place in the area of techniques for fault-tolerant distributed applications. Early languages for distributed programming left the programmer to deal with fault tolerance. Later on, languages based on atomic transactions were introduced. The atomic transaction is a powerful mechanism for managing parallel access to data. Languages based on this abstraction take care of a lot of low-level details, such as locking and version management.

As a result of all these developments, we see several languages that differ significantly from their Pascal-based predecessors and that provide high-level abstractions for distributed programming. Unlike many of their predecessors, however, most novel languages have yet to establish themselves as practical tools for the development of distributed software. As the advances in hardware technology do not show any signs of slowing down, we expect distributed architectures to continue to become more widely available, more generally used, and to be programmed in increasingly advanced languages. A summary of the languages mentioned in this paper is given in Table 13.


APPENDIX

Table 13. Overview of Languages for Distributed Systems

Language  Section  Description
ABCL/1  3.1.6  Object-oriented language for modeling distributed systems
Act 1  3.1.6  Language based on actors
Ada  3.1.3  Language based on rendezvous
Aeolus  3.1.7  Language based on atomic transactions, used for Clouds distributed operating system
ALPS  3.1.6  Object-oriented language for parallel and distributed systems
AMPL  3.1.2  Language based on asynchronous message passing
Argus  3.1.7  Language based on atomic transactions
Avalon  3.1.7  Language based on atomic transactions, used for Camelot distributed transaction system
Blaze  3.2.1  Language for scientific programming, based on parallel loops and functional (implicit) parallelism
BNR Pascal  3.1.3  Language based on rendezvous
BRAVE  3.2.2  Logic language for artificial-intelligence applications
Camelot Library  3.1.7  Language based on atomic transactions, used for Camelot distributed transaction system
Cantor  3.1.6  Language based on actor model
CCSP  3.1.1  CSP-based language for operating-system design
Cedar  3.1.4  Language based on remote procedure calls
CLIX  3.1.6  Object-oriented language
Cluster 86  3.1.6  Object-oriented language
CMAY  3.1.2  C-based language with asynchronous message passing
Concurrent C  3.1.2  Language based on asynchronous message passing
Concurrent C  3.1.3  Extension of C with processes and rendezvous
Concurrent CLU  3.1.4  Language based on remote procedure calls
Concurrent LISP  3.2.1  Functional language with processes and shared variables
Concurrent PROLOG  3.2.2  Logic language based on read-only variables
Concurrent Smalltalk  3.1.6  Object-oriented language
CONIC  3.1.2  Configuration language with asynchronous message passing
CSM  3.1.1  Language based on synchronous message passing
CSP-s  3.1.1  Language based on synchronous message passing
CSP/80  3.1.1  Language based on synchronous message passing
CSP  3.1.1  Language based on synchronous message passing
CSPS  3.1.1  CSP-based language for modeling distributed systems
CSSA  3.1.6  Pascal-based language using actor model, used for INCAS project
Delta PROLOG  3.2.2  Logic language based on message passing and distributed backtracking
Dislang  3.1.5  Language with multiple communication primitives
Distributed Smalltalk  3.1.6  Object-oriented language
DP  3.1.4  Language based on remote procedure calls
DPL-82  3.1.2  Language based on asynchronous message passing
ECSP  3.1.1  Language based on synchronous message passing
Emerald  3.1.6  Object-based language, employs object mobility
EPL  3.1.6  Object-based language, used for Eden distributed operating system
FRANK  3.1.2  Language based on asynchronous message passing
FX-87  3.2.1  Functional language based on effect system
GDPL  3.1.1  Language based on synchronous message passing
GHC  3.2.2  Logic language based on guarded Horn clauses
GYPSY  3.1.2  Language for implementing verifiable programs, based on asynchronous message passing
Hybrid  3.1.6  Object-oriented language
Joyce  3.1.1  Secure language based on CSP and Pascal
LADY  3.1.2  Language based on asynchronous message passing, used for INCAS project
LIMP  3.1.1  Language based on synchronous message passing
Linda  3.2.3  Language based on Tuple Space model
Lisptalk  3.2.1  Functional language based on CSP and LISP
LYNX  3.1.4  Language based on remote procedure calls
MC  3.1.3  Language based on rendezvous
Mandala  3.2.2  Logic/object-oriented language for knowledge programming
Mentat  3.1.6  Object-oriented language based on macro data flow
MENYMA/S  3.1.2  Language based on asynchronous message passing
Multilisp  3.2.1  Functional language
NIL  3.1.2  Secure language based on process model, typestates, and asynchronous message passing
Oc  3.2.2  Logic language
Occam  3.1.1  Language based on synchronous message passing, used for Inmos transputers
OIL  3.1.6  Object-oriented/logic/procedural language, used for FAIM-1 multiprocessor
Ondine  3.1.6  Object-oriented language
Orca  3.2.3  Language based on shared data objects
Orient84/K  3.1.6  Object-oriented language for modeling knowledge systems
(name illegible)  3.1.4  Language based on remote procedure calls
P-PROLOG  3.2.2  Logic language based on exclusive guarded Horn clauses
ParAlfl  3.2.1  Functional language with mapping notation
PARLOG  3.2.2  Logic language based on mode declarations
ParMod  3.1.2  Pascal-based language with modules and asynchronous message passing
Pascal+CSP  3.1.1  Language integrating Pascal and CSP
Pascal-FC  3.1.5  Language with multiple communication primitives
Pascal-m  3.1.1  Language with synchronous message passing through mailboxes
PCL  3.1.2  Language based on asynchronous message passing
Planet  3.1.1  Pascal-based language with synchronous message passing through links
Platon  3.1.2  Pascal-based language with asynchronous message passing
PLITS  3.1.2  Language based on asynchronous message passing
PML  3.2.1  Functional language
POOL  3.1.6  Object-oriented language, used for Philips DOOM machine
Port Language  3.1.2  Language with asynchronous message passing through ports
Pronet  3.1.2  Language based on asynchronous message passing
Quty  3.2.2  Language combining logic and functional programming
QLISP  3.2.1  Functional language
Raddle  3.1.6  Object-based language for designing large distributed systems
RBCSP  3.1.1  Language based on synchronous message passing
Relational Language  3.2.2  Logic language, predecessor of PARLOG and Concurrent PROLOG
SDL  3.2.3  Language based on shared dataspace, extends tuple space with atomic transactions
SINA  3.1.6  Object-oriented language
Sloop  3.1.6  Object-oriented language based on virtual object space
SR  3.1.5  Language with multiple communication primitives
StarMod  3.1.5  Language with multiple communication primitives
Symmetric LISP  3.2.1  Parallel functional language
Vulcan  3.2.2  Logic object-oriented language
ZENO  3.1.2  Language based on asynchronous message passing

ACKNOWLEDGMENTS

We would like to thank Ehud Shapiro for his contribution to our discussion of parallel logic programming languages. Peter Wegner suggested many improvements to the paper, including the distinction between logical and physical distribution. We are grateful to Nick Carriero, David Gelernter, and the anonymous referees, who gave us useful comments on various parts of the paper. Greg Andrews, Andrew Black, Narain Gehani, Steve Gregory, Tony Hoare, Paul Hudak, Norman Hutchinson, Gary Leavens, Barbara Liskov, and Tom Wilkes helped us in getting the description of their languages accurate and up-to-date. Erik Baalbergen, Susan Flynn Hummel, Dick Grune, Frans Kaashoek, Robert Halstead, Clifford Neuman, Robbert van Renesse, Guido van Rossum, Irvin Shizgal, Chris Steketee, Kuo-Chung Tai, and Hans Tebra were very kind to read and comment on earlier versions of the paper. Finally, we would like to thank the mod.os readership for contributions to the "what is a distributed system" discussion.

REFERENCES

ABRAMSKY, S., AND BORNAT, R. 1983. Pascal-m: A language for loosely coupled distributed systems. In Distributed Computing Systems, Y. Paker and J.-P. Verjus, Eds. Academic Press, London, pp. 163-189.


ACKERMAN,W. B. 1982. Data flow languages. Com- BAALBERGEN,E. H. 1988. Design and implementa- puter 15, 2 (Feb.), 15-25. tion of parallel make. Comput. Syst. 1, 2 (Spring), ADAMO, J.-M. 1982. Pascal+CSP, merging Pascal 135-158. and CSP in a parallel processing oriented lan- BACKUS, J. 1978. Can programming be liberated guage. In Proceedings of the 3rd Znternational from the von Neumann style? A functional style Conference on Distributed Computing Systems and its algebra of programs, Commun. ACM 21, (Miami/Ft. Lauderdale, Fla., Oct. 18-22). IEEE, 8 (Aug.), 613-641. New York, pp. 542-547. BAGRODIA,R., AND CHANDY, K. M. 1985. A micro- AGHA, G. 1986. An overview of actor languages. SZG- kernel for distributed applications. In Proceed- PLAN Not. (ACM) 22, 10 (Oct.), 58-67. ings of the 5th International Conference on Dis- AHUJA, S., CARRIERO,N., AND GELERNTER,D. 1986. tributed Computing Systems (Denver, Colo., May Linda and friends. Computer Z9,8 (Aug.), 26-34. 13-17). IEEE, New York, pp. 140-149. AHUJA, S., CARRIERO, N., GELERNTER, D;, AND BAIARDI, F., RICCI, L., AND VANNESCHI, M. 1984. KRISHNASWAMY,V. 1988. Matching language Static type checking of interprocess communica- and hardware for parallel computation in the tion in ECSP. In Proceedings of the SIGPLAN Linda machine. IEEE Trans. Comput. C-37, 8 84 Symposium on Compiler Construction. SZG- (Aug.), 921-929. PLAN Not. (ACM) 29,6 (June), 290-299. AKSIT, M., AND TRIPATHI, A. 1988. Data abstraction BAL, H. E. 1989. The shared data-object model as a mechanisms in SINA/st. In Proceedings of Ob- paradigm for programming distributed systems. ject-oriented Programming Systems, Languages Ph.D. dissertation, Dept. of Mathematics and and Applications 1988. SZGPLAN Not. (ACM) Computer Science, Vriie Universiteit. Amster- 23, 11 (Nov.), 267-275. dam,.the Netherlands. - ALMES, G. T. 1986. The impact of language and BAL, H. E., AND TANENBAUM, A. S. 1988. system on remote procedure call design. In Pro- Distributed programming with shared data. In ceedings of the 6th International Conference on Proceedings of the IEEE CS 1988 International Distributed Computing Systems (Cambridge, Conference on Computer Languages (Miami, Fla., Mass., May 19-23). IEEE, New York, pp. Oct. 9-13). IEEE, New York, pp. 82-91. 414-421. BAL, H. E., VAN RENESSE,R., AND TANENBAUM, A. ALMES, G. T., BLACK, A. P., LAZOWSKA,E. D., AND S. 1987. Implementing distributed algorithms NOE, J. D. 1985. The Eden system: A technical using remote procedure calls. In Proceedings of review. IEEE Trans. Softw. Eng. SE-II, 1 (Jan.), the AFZPS National Computer Conference (Chi- 43-59. cago, Ill., June 15-18). AFIPS Press, Reston, Va., AMBLER, A. L., GOOD, D. I., BROWNE,J. C., BURGER, pp. 4999506. W. F., COHEN, R. M., HOCH, C. G., AND WELLS, BALL, J. E., WILLIAMS, G. J., AND Low, J. R. R. E. 1977. GYPSY: A language for specifica- 1979. Preliminary ZEN0 language description. tion and implementation of verifiable programs. SZGPLAN Not. (ACM) 14, 9 (Sept.), 17-34. SZGPLAN Not. (ACM) 12,3 (Mar.), l-10. BENNETT, J. K. 1987. The design and implementa- AMERICA, P. 1987. POOL-T: A parallel object- tion of distributed Smalltalk. In Proceedings of oriented language. In Object-Oriented Concurrent Object-Oriented Programming Systems, Lan- Programming, A. Yonezawa and M. Tokoro, Eds. guages and Applications 1987. SZGPLAN Not. MIT Press, Cambridge, Mass., pp. 199-220. (ACM) 22,12 (Dec.), 318-330. ANDREWS, G. R. 1981. Synchronizing resources. BERNSTEIN, A. J. 1980. Output guards and non- ACM Trans. Program. Lang. Syst. 
3, 4 (Oct.), determinism in “Communicating Sequential Pro- 405-430. cesses”. ACM Trans. Program. Lang. Syst. 2, 2 (Apr.), 234-238. ANDREWS, G. R., 1982. The distributed program- ming language SR-Mechanisms, design and im- BIRMAN, K. P., AND JOSEPH, T. A. 1987. Reliable communication in the presence of failures. ACM plementation. Softw. Prac. Exper. 12, 8 (Aug.), 719-753. Trans. Comput. Syst. 5, 1 (Feb.), 47-76. BIRRELL, A. D., AND NELSON, B. J. 1984. ANDREWS, G. R., AND OLSSON, R. A. 1986. The Implementing remote procedure calls. ACM evolution of the SR programming language. Dis- Trans. Comput. Syst. 2, 1 (Feb.), 39-59. trib. Comput. 1, 3 (July), 133-149. BLACK, A., HUTCHINSON, N., JUL, E., AND LEVY, H. ANDREWS, G. R., AND SCHNEIDER, F. B. 1983. 1986. Object structure in the Emerald system. Concepts and notations for concurrent program- In Proceedings of Object-Oriented Programming ming. ACM Comput. Suru. 15, 1 (Mar.), 3-43. Systems, Languages and Applications 1986. SZG- ANDREWS, G. R., OLSSON, R. A., COFFIN, M., EL- PLAN Not. (ACM) 21, 11 (Nov.), 78-86. SHOFF,I., NILSEN, K., PURDIN, T., AND TOWN- BLACK, A. P., HUTCHINSON, N. C., MCCORD, B. C., SEND,G. 1988. An overview of the SR language AND RAJ, R. K. 1984. EPL Programmer’s Guide. and implementation. ACM Trans. Program. Univ. of Washington, Seattle, June. Lang. Syst. 20, 1 (Jan.), 51-86. BLACK, A., HUTCHINSON, N., JUL, E., LEVY, H., AND ATHAS, W. C., AND SEITZ, C. L. 1988. CARTER, L. 1987. Distribution and abstract Multicomputers: Message-passing concurrent types in Emerald. IEEE Trans. Softw. Eng. SE- computers. Computer 21, 8 (Aug.), 9-24. 13, 1 (Jan.), 65-76.


BLOCH, J. J. 1988. The Camelot library. In Guide to CLARK, K. L., AND GREGORY, S. 1986. PARLOG: the Camelot Distributed Transaction Facility: Re- Parallel programming in logic. ACM Trans. Pro- lease 1, A. Z. Spector and K. R. Swedlow, Eds. gram. Lang. Syst. 8, 1 (Jan.), 1-49. Carnegie-Mellon University, Pittsburgh, Pa., CLARK, K. L., MCCABE, F. G., AND GREGORY, S. pp. 29-62. 1982. IC-PROLOG language features. In Logic BORG, A., BAUMBACK, J., AND GLAZER, S. 1983. A Programming, S.-A. Tarnlund, Ed. Academic message system supporting fault tolerance. In Press, London, pp. 253-266. Proceedings of the 9th Symposium on Operating CMELIK, R. F., 1986. Concurrent Make: The Design Systems Principles (Bretton Woods, N.H., Oct. and Implementation of a Distributed Program-in 10-13). ACM-SIGOPS, New York, pp. 90-99. Concurrent C. AT&T Bell Laboratories. Murrav BRUNCH HANSEN, P. 1973. Operating System Prin- Hill, N.J. ciples. Prentice-Hall, Englewood Cliffs, N.J. CMELIK, R. F., GEHANI, N. H., AND ROOME, W. D. BRINCH HANSEN, P. 1975. The programming lan- 1986. Experience with Distributed Versions of guage concurrent Pascal. IEEE Trans. Softw. Concurrent C. AT&T Bell Laboratories, Murray Eng. SE-I, 2 (June), 199-207. Hill, N.J. (Also to appear in IEEE Trans. Softw. BRINCH, HANSEN, P. 1978. Distributed processes: Ew.) A concurrent programming concept. Commun. CMELIK, R. F., GEHANI, N. H., AND ROOME, W. D. ACM 22,ll (Nov.), 934-941. 1987. Fault Tolerant Concurrent C: A Tool for BRINCH HANSEN, P. 1987. Joyce-A programming Writing Fault Tolerant Distributed Prozrams. language for distributed systems. Softw. Pratt. AT&T-Bell Laboratories, Murray Hill, N.i. Exper. 17, 1 (Jan.), 29-50. CODISH, M., AND SHAPIRO, E. 1986. Compiling OR- BURNS, A. 1988. Programming in Occam 2. Addison- parallelism into AND-parallelism. In Proceedings Wesley, Wokingham, England. of the 3rd International Conference on Logic Pro- BURNS, A., AND DAVIES, G. 1988. Pascal-FC: A lan- gramming (London, July 14-18), Springer- guage for teaching concurrent programming. SIG- Verlag, Berlin, pp. 283-297. PLAN Not. (ACM) 23, 1 (Jan.), 58-66. COOK, R. P. 1980. *MOD-A language for distrib- BURNS, A., LISTER, A. M., AND WELLING, A. J. uted programming. IEEE Trans. Softw. Eng. SE- 1987. A review of Ada tasking. Lecture Notes in 6, 6 (Nov.), 563-571. Computer Science, Vol. 262. Springer-Verlag, COOPER, E. C. 1985. Replicated distributed pro- Berlin. grams. In Proceedings of the 10th Symposium on BURTON, F. W. 1984. Annotations to control par- Operating Systems Principles (Rosario Resort allelism and reduction order in the distributed Orcas Island, Wash., Dec.). New York, ACM- evaluation of functional programs. ACM Trans. SIGOPS, pp. 63-78. Program. Lang. Syst. 6, 2 (Apr.), 159-174. COOPER, R. C. B.. AND HAMILTON, K. G. 1988. CARPENTER, B. E., AND CAILLIAU, R. 1984. Preserving abstraction in concurrent program- Experience with remote procedure calls in a real- ming. IEEE Trans. Softw. Enc. SE-14. ,.2 (Feb.). time control system. Softw. Pratt. Exper. 14, 9 258-1263. (Sept.), 901-907. Cox, I. J., AND GEHANI, N. H. 1986. Concurrent CARRIERO, N. 1987. The implementation of tuple programming and robotics. AT&T Bell Labora- space machines. Res. Rep. 567, Ph.D. disserta- tories, Murray Hill, N.J. tion, Dept. of Computer Science, Yale Univ., New CROOKES, D., AND ELDER, J. W. G. 1984. An exper- Haven, Conn., Dec. iment in language design for distributed systems. CARRIERO, N., AND GELERNTER, D. 1986. The Softw. Pratt. Exper. 14, 10 (Oct.), 957-971. 
S/Net’s Linda kernel. ACM Trans. Comput. Syst. DANNENBERG, R. B. 1981. AMPL: Design, imple- 4,2 (May), 110-129. mentation and evaluation of a lan- CARRIERO, N., GELERNTER, D., AND LEICHTER, J. guage. Carnegie-Mellon University. Pittsburgh, 1986. Distributed data structures in Linda. In Pa. Proceedings of the 13thACM Symposium on Prin- DAVIS, A. L., AND ROBISON, S. V. 1985. The archi- ciples of Programming Languages (St. Petersburg, tecture of the FAIM-1 symbolic multiprocessing Fla., Jan. 13-15). ACM, New York, pp. 236-242. system. In 9th International Joint Conference on CLARK, K. L. 1988. PARLOG and its applications. (Los Angeles, Calif., Aug. IEEE Trans. Softw. Eng. SE-14,12, (Dec.), 1792- 13-18). pp. 32-38. 1804. DAY, M. S. 1987. Replication and reconfiguration in CLARK, K. L., AND GREGORY, S. 1981. A rela- a distributed mail repository. TR-376, MIT Lab- tional language for parallel programming. In oratory for Computer Science, Cambridge, Mass., Proceedings of the 1981 ACM Conference on Func- Apr. tional Programming Languages and Computer Ar- DEPARTMENT OF DEFENSE, U.S. 1983. Reference chitecture (Portsmouth, N.H., Oct.). ACM, New manual for the Ada programming language. York, pp. 171-178. ANSI/MIL-STD-1815A, DoD,Washington, D.C., CLARK, K. L., AND GREGORY, S. 1985. Notes on the Jan. implementation of PARLOG. J. Logic Program. DETLEFS, D. L., HERLIHY, M. P., AND WING, J. M. 2, 1 (Apr.), 17-42. 1988. Inheritance of synchronization and recov-


ery properties in Avalon/C++. Computer 21, 12 GEHANI, N. H. 1984b. Broadcasting sequential pro- (Dec.), 57-69. cesses (BSP). IEEE Trans. Softw. Eng. SE-IO, 4 DIJKSTRA, E. W. 1975. Guarded commands, nonde- (July), 343-351. terminacy, and formal derivation of programs. GEHANI, N. H. 1987. Message Passing: Synchronous Commun. ACM 18,s (Aug.), 453-457. uersus Asynchronous. AT&T Bell Laboratories, Murray Hill, N.J. EICHHOLZ, S. 1987. Parallel programming with ParMod. In Proceedings of the 1987 International GEHANI, N. H., AND CARGILL, T. A. 1984. Conference on Parallel Processing (St. Charles, Concurrent programming in the Ada language: Ill., Aug. 17-21). Penn State University, pp. The polling bias. Softw. Pratt. Exper. 24,5 (May), 377-380. 413-427. GEHANI, N. H., AND ROOME, W. D. 1986a. ELRAD, T., AND MAYMIR-DUCHARME, F. 1986. Concurrent C. Softw. Pratt. Exper. 26, 9 (Sept.), Distributed languages design: Constructs for con- 821-844. trolling preferences. In Proceedings of the 1986 International Conference on Parallel Processing GEHANI, N. H., AND ROOME, W. D. 198613. (St. Charles, Ill., Aug. 19-22). Penn State Uni- Concurrent C++: Concurrent Programming with versity, pp. 176-183. Class(es). AT&T Bell Laboratories, Murray Hill, N.J. ERICSON, L. W. 1982. DPL-82: A language for dis- GEHANI, N. H., AND ROOME, W. D. 1988. tributed processing. In Proceedings of the 3rd Rendezvous facilities: Concurrent C and the Ada International Conference on Distributed Comput- ing Systems (Miami/Ft. Lauderdale, Fla., Oct.). language. IEEE Trans. Softw. Eng. SE-14, 11 (Nov.), 1546-1553. pp. 526-531. GEHANI, N. H., AND ROOME, W. D. 1989. The Con- FELDMAN, J. A. 1979. High level programming for current C Programming Language. Silicon Press, distributed computing. Commun. ACM 22, 6 Summit, N.J. (June), 353-368. GELERNTER, D. 1984. A note on systems program- FINKEL, R., AND MANBER, U. 1987. DIB-A distrib- ming in Concurrent Prolog. In Proceedings of the uted implementation of backtracking. ACM Znte&ational Symposium on Logic Programming Trans. Program. Lang. Syst. 9, 2 (Apr.), 235-256. (Atlantic City, N.J., Feb. 6-S). IEEE, New York, FISHER, A. J. 1986. A multi-processor implementa- pp. 76-82. tion of Occam. Softw. Pratt. Exper. 16, 10 (Oct.), GELERNTER, D. 1985. Generative communication in 875-892. Linda. ACM Trans. Program. Lang. Syst. 7, 1 FORMAN, I. R. 1986. On the design of large distrib- (Jan.), 80-112. uted systems. In Proceedings of the IEEE CS 1986 GELERNTER, D., AND CARRIERO, N. 1986. Linda on International Conference on Computer Languages hypercube multicomputers. In Proceedings of the (Miami, Fla., Oct. 27-30). IEEE, New York, pp. 1985 SIAM Conference (Knoxville, Tenn.). Soci- 84-95. ety for Industrial and Applied Mathematics, Phil- FOSTER, I. 1988. Parallel implementation of PAR- adelphia, Pa., pp. 45-55. LOG. In Proceedings of the International Confer- GELERNTER, D., AND CARRIERO, N. 1988. ence on Parallel Processing (Vol. ZZ) (St. Charles, Applications experience with Linda. In Proceed- Ill., Aug. 15-19). Penn State University, pp. ings of PPEALS 1988. SZGPLAN Not. (ACM) 23, 9-16. 9 (Sept.), 173-187. FOSTER, I., GREGORY, S., RINGWOOD, G., AND SATOH, GELERNTER, D., JAGANNATHAN, S., AND LONDON, T. K. 1986. A sequential implementation of PAR- 1987a. Environments as first class objects. In LOG. In Proceedings of the 3rd International Con- Proceedings of the 14thACM Symposium on Prin- ference on Logic Programming (London, July ciples of Programming Languages (Munich, West 14-18). 
Springer-Verlag, Berlin, pp. 149-156. Germany, Jan. 21-23). ACM, New York. GABRIEL, R. P., AND MCCARTHY, J. 1984. Queue- GELERNTER, D., JAGANNATHAN, S., AND LONDON, T. based multi-processing Lisp. In Proceedings of 1987b. Parallelism, persistence and meta-clean- the 1984 ACM Sympostum on Lisp and FuncsonaZ liness in the symmetric Lisp interpreter. In Pro- Programming (Austin. Tex.. Aua. 6-8). ACM, ceedings of the Symposium on Interpreters and New York, pp. 25-43. - Interpretive Techniques. SZGPLAN Not. (ACM) GABRIEL, R. P., AND MCCARTHY, J. 1988. QLISP. 22, 7 (July), 274-282. In Parallel Computation and Computers for Arti- GESCHKE, C. M., JR., MORRIS, J. H., AND SATTER- ficial Intelligence, J. Kowalik, Ed. Kluwer Aca- THWAITE, E. H. 1977. Early experience with demic Publishers, Deventer, The Netherlands, Mesa. Commun. ACM 20,s (Aug.), 540-553. pp. 63-89. GOLDBERG, A., AND ROBSON, D. 1983. Smalltalk-80: GAMMAGE, N. D., KAMEL, R. F., AND CASEY, L. M. The Language and Its Implementation. Addison- 1987. Remote rendezvous. Softw. Pratt. Exper. Wesley, Reading, Mass. 17, 10 (Oct.), 741-755. GOLDBERG, B., AND HUDAK, P. 1986. Alfalfa: Dis- GEHANI, N. H. 1984a. Ada: Concurrent Program- tributed graph reduction on a hypercube multi- ming. Prentice-Hall, Englewood Cliffs, N.J. processor. Lecture Notes in Computer Science,


Received June 1988; final revision accepted April 1989.
