ANSA: An Engineer's Introduction to the Architecture

Release TR.03.02

November 1989

This document provides an introduction to ANSA. It is specifically oriented towards those with a software and systems background. It describes what is available and how it is used; it does not describe how the architecture is applied to specific application domains.

Architecture Projects Management Limited and their sponsors take no responsibility for the consequences of errors or omissions in this manual, nor for any damages resulting from the application of the ideas expressed herein.

Architecture Projects Management Limited

Poseidon House, Castle Park, CAMBRIDGE CB3 0RD, United Kingdom

TELEPHONE: UK Cambridge (0223) 323010  INTERNATIONAL +44 223 323010  FAX +44 223 359779
UUCP: ...ukc!acorn!ansa!apm
ARPA Internet: [email protected]

© 1989 Architecture Projects Management Limited

Permission to copy without fee all or part of this material is granted provided that notice is given that copying is by permission of Architecture Projects Management Limited. To copy otherwise or to republish requires specific permission.

CONTENTS

1 Background
  1.1 Objectives
  1.2 Activities
  1.3 Standardization

2 Executive summary
  2.1 The problem space
  2.2 Overview

3 Architectural philosophy
  3.1 Viewpoints on distributed processing
  3.2 The computation model
  3.3 Distribution transparency
  3.4 The Object-Oriented Approach
  3.5 Type checking
  3.6 Configuration
  3.7 Interfaces
  3.8 Operations
  3.9 Invocations
  3.10 Terminations
  3.11 Atomicity
  3.12 Objects
  3.13 Security
  3.14 Trading
  3.15 Object groups

4 Testbench software
  4.1 Overall structure
  4.2 Implementation of engineering concepts
  4.3 Implementation options
  4.4 Process mapping
  4.5 Using the Testbench

5 Evaluation
  5.1 Performance
  5.2 Transparency
  5.3 Diagnostics, error detection and recovery
  5.4 Heterogeneity
  5.5 Relationship to OSI
  5.6 Software engineering

6 International standards

References

Appendices

1 Background

The Advanced Networked Systems Architecture (ANSA) originated in a project undertaken by BT, DEC, GEC Marconi, GPT, HP, ICL, ITL, Olivetti, Plessey, Racal and STC within the UK Alvey Information Technology Programme. As the results of the project became better known, it became apparent that a more formal structure was needed to manage the development and exploitation of the architecture. To this end Architecture Projects Management Ltd (APM) was set up as a company in 1989. APM undertakes work on ANSA on behalf of the sponsors at a central laboratory in Cambridge, England. Much of the work is currently funded via the Commission of the European Communities (CEC) ESPRIT II Programme within a project called ISA (Integrated Systems Architecture), in which many of the sponsors of APM are joined by AEG, CTI-Patras, Ericsson Telecom, Televerket, Philips, France Telecom (SEPT) and Siemens. The architecture continues to be known as ANSA, and APM also trades under the name ANSA.

1.1 Objectives

The scope of ANSA is to provide an architecture for distributed systems that satisfies the following objectives:

- generic to many fields of application (including office, factory, telecommunications and general data processing)

- state of the art in technical content

- portable across a wide range of operating systems and programming languages

- operable in heterogeneous, multi-vendor environments

- modular structure with maximum opportunity for re-use of existing functionality

- support for a range of distribution, naming, concurrency and fault handling policies

- applicable to a wide range of computer and network topologies with no constraints on size

- oriented towards the requirements of application programmers

- provision for interworking between autonomously managed networks.

The design principles that have been followed to achieve these objectives are described in Section 3.

1.2 Activities

The activities associated with the name “ANSA” are as follows:

- Architecture - development of an architecture for building distributed systems, in the form of an integrated set of structures, functions, design recipes and implementation guidelines

- Software - development of software to demonstrate and validate the architecture (known as the ANSA Testbench)

- Standards - contribution of ANSA results to international standards

- Technology Transfer - transfer of the architecture and Testbench as a technology to both the sponsors of ANSA and to the community at large.

The specification of ANSA is contained in the ANSA Reference Manual [ARM 89], a copy of which may be obtained from APM Ltd.

The Testbench software consists of a suite of C programs and instructions for installing them on HP-UX, SunOS-4, Ultrix, MSDOS and VMS. (Unsupported ports have also been made to various other systems, but these are not yet included on the standard release tape.)

1.3 Standardization

ANSA has been founded upon the concept that the architecture should adopt, and not conflict with, current open standards wherever possible. Aspects of the architecture which fall outside the scope of current open standards are taken into the open standards process. To this end, the ANSA team participates actively in ISO/IEC JTC1 SC21 WG7 on the standardization of a Reference Model for Open Distributed Processing and with ECMA TC32-TG2 on a Support Environment for Open Distributed Processing (SE-ODP). The team participates to a lesser extent in a number of other standards groups.

2 Executive summary

2.1 The problem space

ANSA is an architecture for building distributed systems that can operate as a unified whole such that the fact of distribution is transparent to application programmers and users. This is a fundamentally different approach from networking single systems together. It allows full advantage to be taken of the inherent concurrency and separation of distributed systems in order to increase performance, decentralization and reliability, while better masking their disadvantages such as communication errors and partial failures. It produces a system that can be managed as co-ordinated sub-systems appropriate to the enterprise they serve rather than as a random collection of boxes.

When building a distributed system, a number of assumptions which are commonly made when engineering systems for single hosts not only become invalid, but have to be reversed. The most important of these are:

- local -> remote: more failure modes are possible for remote interactions than for local ones

- direct -> indirect binding: configuration becomes a dynamic process, requiring support for linkage at execution time

- sequential -> concurrent execution: true concurrency requires mechanisms to provide sequentiality

- synchronous -> asynchronous interaction: communication delays require support for asynchronous interactions and pipelining

- homogeneous -> heterogeneous environment: requires common data representation for interactions between remote systems

- single instance -> replicated group: replication can provide availability and/or dependability

- fixed location -> migration: locations of remote interfaces may not be permanent

- unified name space -> federated name spaces: need for naming constructs which mirror administrative boundaries across different remote systems

- shared memory -> disjoint memory: shared memory mechanisms cannot operate successfully on a large scale and where remote operations are involved.


Given sufficient knowledge, local optimisations can be engineered back into a system where appropriate, but in order to retain flexibility they should not be present in the program source code. For example, all procedure calls can be treated as remote by the programmer, and the compiler or linker can replace them by local ones if it subsequently discovers that the calling and called procedures are co-located.
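
As a sketch of what this means in practice, the C fragment below (with invented names; it is not Testbench code) shows a call site that reads the same whether or not the target is co-located. The decision to short-circuit to a local call is taken inside the stub, not in the application source.

    #include <stdio.h>
    #include <string.h>

    /* Hypothetical interface reference; not an ANSA Testbench type. */
    typedef struct { int co_located; /* plus addressing information */ } InterfaceRef;

    /* Local implementation, available only when client and server share a host. */
    static int local_lookup(const char *name, char *addr, size_t len)
    {
        (void)name;
        strncpy(addr, "local-answer", len);
        return 0;
    }

    /* Stub seen by the application: the call site is identical in both cases. */
    static int lookup(InterfaceRef *ref, const char *name, char *addr, size_t len)
    {
        if (ref->co_located)
            return local_lookup(name, addr, len);   /* local optimization put back in */
        /* ... otherwise marshal the arguments and perform a remote invocation ... */
        return -1;
    }

    int main(void)
    {
        InterfaceRef ref = { 1 };
        char addr[32];
        if (lookup(&ref, "fileserver", addr, sizeof addr) == 0)
            printf("%s\n", addr);
        return 0;
    }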

2.2 Overview

2.2.1 Architecture

ANSA is defined with reference to five related models of distributed systems called enterprise, information, computation, engineering and technology. While all are relevant to the design of distributed systems, the computation and engineering models are the ones that bear most directly on the use and construction of distributed systems. A discussion of these models is given in section 3, and a complete description is given in the ANSA Reference Manual.

ANSA has been under development since 1986. In that time the importance of distinguishing between a distributed computing environment as seen by an application programmer and as seen by a system programmer has become very clear. Many environments have an 'application programmer's interface' as the dividing line between the two views. In practice this interface often turns out to be clumsy and to reveal too much of the system detail. Moreover, checks must be made at run time to ensure that application programmers are using the interface correctly. ANSA follows a programming language view: that is to say, distributed computing concepts should be represented by extra syntactic constructs added to existing programming languages. These can then be compiled directly into calls at the systems level. The main advantages of this approach are:

- a simple programming model for application programmers
- checking at compile time rather than run time
- independence of the application programmer view from the system view.

The first two increase the confidence that application programmers have in their programs. The third provides for separate evolution of the two views of the environment, making applications and systems compatible with future modifications.

Architecture: Computation model

The computation model is a framework of programming structures and program development tools that should be available to distributed application programmers, whatever application programming language they choose to use. This model addresses the topics of:

- modularity of distributed applications
- access transparent invocation of operations in interfaces
- parameter passing scheme
- configuration and location transparency of interfaces
- concurrency and synchronization constraints on interfaces
- replication constraints on interfaces
- extending existing languages to support distributed computing.

Maximum engineering flexibility is obtained if all computation requirements of an application are expressed declaratively. This permits tools to be applied to the specifications to generate code satisfying the declared requirements. It facilitates a clean separation between application programmers (stating the requirements) and system programmers (providing tools which use the requirements to generate template code satisfying those requirements in the environment in which the applications are to operate).

Architecture: Engineering model

The engineering model is a framework of compilers and components for realizing the computation model in heterogeneous environments, namely:

- thread and task management - a thread is a function to be carried out, and is allocated to a task which will execute it

- address space management

- inter-address space communication

- distributed application protocols

- network protocols

- interface locator - identifies the location of a specific interface

- interface traders - provide directory facilities for identification of interfaces, both imported and exported

- configuration managers

- atomic operation manager

- replicated interface manager

The engineering model provides the system designer with a view of engineering trade-offs available when providing a mechanism to support a particular function defined in the computation model. By making different trade-offs the implementor may vary the quality attributes of a system in terms of its dependability (reliability, availability, security, safety) and performance without disturbing its function. This is an important feature of ANSA since it decouples application design from technology to a significant degree. By conforming to the computation model, a programmer is given a guarantee that his program will be able to operate in a variety of different quality environments without modification of the source. The engineering model gives the system implementor a toolbox for building an environment of the appropriate quality to the task in hand. In other words, by making this separation it is possible to identify what forms of transparency are required
by a distributed application and to be able to choose the most appropriate technique for providing the required transparency for each application.

Architecture: Overall structure

The way in which the components of ANSA fit together is shown in Figures 1, 2 and 3, in increasing level of detail.

Figure 1 shows two ANSA systems. Each ANSA system is running several applications, which are linked together with a trader and configuration manager. The trader provides a directory structure that can be searched by path name, by property values, or by some combination of the two. A server can export an interface reference to the trader to make it accessible to other applications. An import operation is provided to clients so that they can retrieve interfaces from the trader. The configuration manager provides the means to start new application components executing in an ANSA system. To provide federation between the systems, the two traders are also linked together. This enables an application to export an object which the other imports, such that to the user the distributed system appears to be running on a single host.

Figure 1: A Federation of ANSA Systems

Figure 2 introduces the nucleus components. These take the basic resources of the local infrastructure and build on them to provide a basic distributed computing environment common to each host. These nucleus components are then able to work together, along with the trader and configuration manager (which may themselves be distributed), to provide a basic support platform for distributed computing.

Figure 3 expands the nucleus components in both directions. The transparency components provide additional functions which enable the various aspects of distribution to be made transparent to applications. For example, they may hide the location of an application from any other component, or hide the fact that an application may be either local or remote.

Figure 2: An ANSA system

Figure 3: An ANSA capsule

Below the nucleus there are components to provide execution protocols and message passing protocols. If interworking between heterogeneous systems is not required, either or both of these can be replaced by local equivalents. Below these will be the local CPU management, communications, memory management and other local functions.

The engineering model specifies the mechanisms needed to provide the various kinds of transparency (see section 3.3) and the protocols for interaction between nucleus components on different hosts. Application components are structured according to the computational model and the distributed computing aspects of the application are compiled into calls on the interfaces to the appropriate transparency and platform components.

The engineering model can also be taken as a template for the implementation of the nucleus, platform and transparency components, although this is not mandatory either for application portability across implementations or for interworking between them. The conformance criteria for portability are the interfaces to the transparency and platform components.

Once conformance to the computation model has been established, it is possible to conceive of multiple implementations of the architecture which make different engineering trade-offs. To provide interworking between systems that have made different implementation choices it will be necessary to provide gateway functions, but this will be confined to simple interface adaptors that match the different engineering trade-offs rather than changes to the applications themselves.

Many hosts will provide a range of functions and resources beyond those needed by the platform and may wish to contribute them to the distributed computing environment as potential application components. This can be achieved by extending the nucleus with additional distributed computing environment interfaces that map onto the locally available functions. Thus the nucleus acts as an architectural switch, transparently linking application components to both local and remote resources in a uniform way.

2.2.2 Software

The Testbench software is a suite of ANSI C programs that conform to the architecture described in 2.2.1. These programs represent an instantiation of parts of the architecture intended for porting across the current generation of operating systems and network protocols. In particular, the standard distribution includes porting instructions for SunOS, HP-UX, Ultrix, VMS and MSDOS.


The software includes the following modules:

- A threads management package, whose function is to provide for concurrency within an address space, if it is not provided by the host. Concurrency is needed (a) so that servers can respond to multiple clients in parallel and (b) so that clients can distribute computation in time (i.e. perform parallel tasks) as well as space (i.e. perform remote tasks), or both (run remote tasks in parallel).

- An address space management package, whose function is to complement the threads package with facilities for managing multiple stacks, communications buffers and a shared heap within a single address space. Multiple stacks are needed to be able to support true concurrency.

- An inter-address space communications package (the 'interpreter') to provide an implementation-independent standard interface for interactions between threads in separate address spaces.

- A remote execution protocol (REX), which provides for the transport of messages to implement the communications requirements of the inter-address space communications package. It provides functions of transport, error recovery, fragmentation of large messages and control of optional end-to-end connections.

- An interface description language (IDL), used to describe interfaces between application components. It is derived from the Courier language developed by Xerox [XEROX 81].

- An IDL processor, which reads interface descriptions and generates libraries of stub procedures in C. These procedures handle the marshalling (packing of arguments/results into buffers for transmission), unmarshalling (unpacking arguments/results from transmission buffers), and communications (exchanging buffers between the distributed portions of the application).

- An application description language (DPL).

- A DPL preprocessor for C, which extracts statements that augment an ANSI standard C program to connect to interfaces and invoke remote operations, and translates these statements into calls to the appropriate stub procedures and inter-address space communication package calls.

- A trader. This is a distributed application component which acts as a directory and management facility for distributed application components.

- A configuration manager. This is a distributed application component which provides a means to instantiate application components above the platform.
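
To give a flavour of how these modules fit together, the following C fragment is a hand-written stand-in for the kind of client-side stub the IDL processor generates. The buffer layout, the ipc_call entry point and all other names are invented for this sketch and do not correspond to the actual Testbench interfaces; the "server" is simulated locally so that the example is self-contained.

    #include <stdio.h>
    #include <stdint.h>
    #include <string.h>

    typedef struct { unsigned char data[64]; size_t used; } Buffer;
    typedef struct { int session; } InterfaceRef;

    static void marshal_i32(Buffer *b, int32_t v)          /* pack an argument */
    {
        memcpy(b->data + b->used, &v, sizeof v);
        b->used += sizeof v;
    }

    static int32_t unmarshal_i32(Buffer *b, size_t *off)   /* unpack a result */
    {
        int32_t v;
        memcpy(&v, b->data + *off, sizeof v);
        *off += sizeof v;
        return v;
    }

    /* Stand-in for the inter-address space communications package: the      */
    /* "server" side is simulated in the same process for this sketch.       */
    static int ipc_call(InterfaceRef *ref, int op, Buffer *req, Buffer *rep)
    {
        size_t off = 0;
        int32_t a = unmarshal_i32(req, &off);
        int32_t b = unmarshal_i32(req, &off);
        (void)ref; (void)op;
        marshal_i32(rep, a + b);
        return 0;
    }

    /* Client-side stub for an operation "add(a, b) -> sum" described in IDL. */
    static int add_stub(InterfaceRef *ref, int32_t a, int32_t b, int32_t *sum)
    {
        Buffer req = { {0}, 0 }, rep = { {0}, 0 };
        size_t off = 0;
        marshal_i32(&req, a);                  /* marshalling                 */
        marshal_i32(&req, b);
        if (ipc_call(ref, 1, &req, &rep) != 0)
            return -1;                         /* communication failure       */
        *sum = unmarshal_i32(&rep, &off);      /* unmarshalling               */
        return 0;
    }

    int main(void)
    {
        InterfaceRef ref = { 0 };
        int32_t sum;
        if (add_stub(&ref, 2, 3, &sum) == 0)
            printf("2 + 3 = %d\n", (int)sum);
        return 0;
    }

The application programmer never writes such code by hand; the point of the IDL processor and DPL preprocessor is that the marshalling and communication calls are generated from the interface description.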

The architecture is not restricted to any particular programming language, operating system, network or hardware platform.


3 Architectural philosophy

The goal of the ANSA project has been to develop an architecture which provides the simplest set of concepts necessary to build distributed systems. This philosophy has had a profound effect on the design of the architecture and the Testbench. To understand ANSA it is important to have an appreciation of the design philosophy and total scope of the architecture.

3.1 Viewpoints on distributed processing

To derive the Advanced Networked Systems Architecture, the ANSA team studied current practice and research in distributed computing and system design techniques.

The study revealed that distributed processing experts have different viewpoints about what are the crucial concerns that make up 'distributed processing'. Further examination revealed that five viewpoints were dominant and that each viewpoint in some way or other acknowledged the concerns addressed in other viewpoints but with a lesser priority. As a consequence, the description of ANSA is structured as a set of projections of the architecture onto models representing these five viewpoints. The models that make up ANSA are termed the enterprise, information, computation, engineering and technology projections. A distributed system can be described using any one of these models of ANSA, and the resulting descriptions reveal different facets of the system. Each description in ANSA is self-contained and complete. The difference between descriptions is not how much of the system they describe, but rather what aspects of the system they emphasize.

Designers used to working with one model often have difficulty assessing the relevance of the concepts used in any of the others. It is important to realize, however, that a system has ultimately to be described in each model; all are equally valid, and it is a mistake to argue which is the more fundamental. Designs and designers not aware of the other projections frequently overlook important areas in the design of a system. A successful design is one which satisfies requirements identified for each model.

3.1.1 Enterprise model

The purpose of the enterprise model of ANSA is to provide a framework for explaining and justifying the role of an information processing system within an organization. An enterprise description is one that describes the overall objectives of a system in terms of roles (for people), actions, goals and policies. It specifies the activities that take place within the organization using the system, the roles that people play in the organization, and the interactions
between the organization, the system and the environment in which system and organization are placed.

3.1.2 Information model

The information model of ANSA provides a framework to describe the information requirements of a system. An information description of a system is made up of structures of information elements, rules stating the relationships of the information elements in a system, and constraints on the information elements and the rules. In a distributed system, information models must also show both how information is partitioned across logical boundaries and the required quality attributes of information. The model does not have to differentiate between parts that are to be automated and parts that are to be performed manually.

3.1.3 Computation model

The computation model provides a framework for modelling the operations of information transfer, retrieval, transformation and management necessary to automate information processing. The mechanisms required to support the computation model thus defined are specified in the engineering projection of the system.

A computation description of a system partitions the required transformations among processing objects as necessary to achieve the complete set of transformations. The partitioning thus defined is logical and not location-dependent.

A computation description concerns the structuring of applications independently of the computer systems and networks on which they run.

For distributed information processing, the ANSA computation model necessarily enables modelling of application components which can execute in parallel, can fail independently, and in which bindings can be established dynamically.

3.1.4 Engineering model

The engineering model provides a framework for describing how to mechanize an application definition identified using the computation model. This support will include definition of physical distribution (as required) to realize the partitioning defined in the computation projection.

3.1.5 Technology model

The technology model provides a framework for describing the technical artifacts (realized components) from which the distributed system is built. This description can include OSI and proprietary standards as needed. It shows how the hardware and software that comprise the local operating systems, the input/output devices, storage and points of access to communications are mapped onto the mechanisms identified in the engineering model.

3.2 The computation model

In the design of an ANSA conformant system, much of the detail flows from design decisions visible in the computation model. The design philosophy for the computation model has been to find the smallest number of concepts needed to describe distributed computations and to propose a declarative rather than an imperative formulation of each concept. By taking a declarative approach the path is opened to greater compile-time checking of the safety of a program, the automatic generation of support code and optimization of special cases by compilers and other development tools. These are important attributes of a distributed computing environment, since they help to reduce the complexity that is brought into the application programmer’s world due to distribution.

Priority has to be given to putting as much of the information about distribution as possible into the description of interfaces between application components, since this enables a distributed systems expert to comprehend a complex application as a structure of encapsulated modules behind narrow interfaces, and enables the application expert to focus on the application-specific code, aware of the guarantees and limitations, but not the intricacies, of the distributed computing environment.

The ANSA computation model concentrates on the problems and opportunities presented by the execution of applications on many loosely coupled computer systems. It provides the concepts needed to define distributed application programming languages. In particular it defines the programming language features that are necessary to be able to write application programs for a distributed environment.

Since ANSA is an architecture for open systems it is not viable to impose a single language as the conformance criterion for the computation model. Parts of a large application may be written in different languages for reasons of history or suitability and any scheme for distributed application programming must face this reality.

The approach adopted for the definition of the computation model for ANSA has been to identify the functions that must be available to the programmer and the constraints on program structure necessary to enable distribution, rather than any particular syntax. The outcome of this approach is that all programs, in whatever language, are written with the same abstract (distributed) machine as their target. Porting a program from one system to another is then a matter of changing only the local representation of the abstract machine as it appears in the application programming language, while no changes are required to the application program itself.


3.3 Distribution transparency

The extent to which it is practicable to distribute a computation may depend on many things. Where communication costs are high it may be prudent to minimize the distribution of those parts that are expected to interact heavily. Where parts of a computation are processor intensive, the extra concurrency introduced by distribution may lead to an improvement in performance. Where replication is used to give increased reliability and availability it is important that the software replicas are located on distinct hardware replicas. Most trade-offs of this kind do not really belong to the computation model; they belong to the engineering model.

To lighten the load on the programmer, various distribution transparencies are made available in ANSA. Transparencies determine the extent to which programmers need to be concerned with, and have control over, the integration of disparately located pieces of application program. In a fully transparent program the programmer has delegated all responsibility for distribution to the support environment provided in the engineering model.

In a non transparent program the programmer has taken full responsibility for all aspects of distribution. This is achieved by direct manipulation of the support environment. However, components of an application program are usually directly concerned with only a few aspects of distribution so it is convenient to be able to include transparencies individually. This is known as selective transparency.

In the ANSA computation model, a programmer can select the kinds of transparency required when declaring an interface between two application components. The particular transparencies supported are:

- Access transparency provides identical invocation semantics for both local and remote components. The overriding criterion is to remove the concept of co-located (local) components from the computation projection. (Local optimizations can be put back in by the engineering model where appropriate.) All invocations can thus be considered as remote.

- Location transparency hides the exact location of a program component from any other component that interacts with it, enabling components to be located anywhere in the computing system.

- Concurrency transparency hides the existence of concurrent users of a service. If a use of a service has concurrency transparency, then the user of the service is unable to observe any effects due to other concurrent users of the service.

- Failure transparency hides the effects of partially completed interactions that fail for whatever reason. It requires mechanisms for making interactions atomic, such that they either entirely succeed or, if they fail, fail completely.


- Replication transparency hides the effects of having multiple copies of program components (to provide for an increase in dependability or availability). It requires mechanisms for interacting with and managing component groups (see section 3.15).

- Migration transparency is a dynamic form of location transparency; it hides the effect of a program component being moved from one location to another while it is being used by another component.

The Testbench software currently supports access, location and concurrency transparencies. Partial support for replication transparency will be available in late 1989.

3.4 The Object-Oriented Approach

In distributed systems the physical separation of program components is unavoidable, introducing the possibility both of failures occurring during communication and of partial failure of the program. In order to have equivalent failure semantics to a non-distributed application, the invocation mechanism must allow such failures to be reported and processed. The decomposition of a program for distribution is limited due to an increase in concurrency and the requirement for shared state and mechanisms for sequencing and scheduling. By using the object-oriented approach, the ANSA computation model is able to take the scoping and encapsulation mechanism down to the level of simple data structures and data types, if required.

A program may go through various stages before it is fully evaluated. Associated with these stages are several tasks. A compiler or pre-processor may evaluate all constant expressions and do some dereferencing early on. A linker may check that all cross references between the parts that are to be linked are satisfied. These tasks, and in particular their ordering, are relevant to the computation model. The ordering constrains the latest point at which a given task may be performed. In general, if a task is performed at an earlier stage than is strictly necessary the engineering model has greater scope for optimization, at the expense of flexibility. This concept of evaluation stages in a distributed system has an important influence on the computation model and may be thought of as distribution in time. The ANSA computation model includes syntactic structures and a flexible type system which permits a wide range of checks to be made statically at compile time, without compromising the ability of the programmer to defer some decisions to run time by explicit control.

Programs are not static; their requirements, environment and supporting technology change over time. They also require tuning to increase performance, reliability or other qualities and porting to new environments. There is great benefit to be gained from the ability to re-use program components in new programs, either by making new connections or by
copying their implementations. This leads to the requirement that programs should be highly modular and should restrict knowledge of the components of a program on a need-to-know basis. Backward compatibility with non-distributed program components requires mechanisms for embedding or encapsulating programming components by wrapping them up in the necessary distribution transparencies. In the ANSA computation model, this is accomplished by the use of the object-oriented philosophy and by the separation of interface specification from object definition.

3.5 Type checking

Because the component parts of distributed applications programs are separated in both space and time, extra care needs to be taken when composing them into a whole for final evaluation. The computation model concentrates on those checks designed to ensure that the assumptions made by programmers in different places and at different times are still valid. These checks are generally known as type checks and will check such things as the use of operators, types of data items, and matching of operations used and provided.

In addition to type checks, the computation model is also concerned with checking that components of an application that interact with each other have compatible transparencies. Other checks such as access controls and consistency constraints are passed through the computation model from information models for the system in question.

Checks can be performed at various points but steps must be taken to ensure that early checks are still valid during the final evaluation stage.

In the ANSA computation model, interfaces are typed. An interface type describes both the operations in the interface and the properties of those operations in terms of transparency attributes.

In the ANSA Testbench some type checking is performed by the IDL and DPL processors; some type checking is deferred to the application programming language.

3.6 Configuration

There are two styles of distributed application programming. In the first the application is treated as a single large program. The program may be divided up into separately compiled components for reasons of efficiency and modularity. The program, once compiled, is then loaded into an appropriate configuration of computers and allowed to execute. In the second style, an application is treated as a number of separate programs which are independently compiled and loaded into individual computers. Programmers who work in this style often refer to ‘server’ programs and ‘client’ programs to indicate whether a program expects to be invoked by others, or whether it is
responsible for invoking others. This second style is dependent upon some form of system directory (i.e. a trader) that enables servers to register their presence in the network and clients to locate servers. In some systems the directory is itself a server; in others it is decentralized and broadcast algorithms are used to locate servers.

The distinction between these two styles is one of early versus late binding.

The first program style is potentially more convenient than the second when a single application is to be distributed. However, this style does not permit an application to be developed in which some components are developed independently of the others. This requirement is inevitable in an open systems context, because it is unlikely that two organizations will be willing to lock together their programming environments in order to interwork over a network! The second style has in the past led to a rather rigid assignment of components to computers, since externally visible names have to be invented and this can be inconvenient if the system supporting the application is restructured. In the computation model this problem is avoided by the introduction of the trader. This acts as a directory facility in which exported functions are entered, and through which imported functions may be located.

It can be concluded that both styles are appropriate in different circumstances, and the ANSA computation model provides for both.

The ANSA Testbench predominantly supports the client-server style of configuration, although an application programmer does have access to the facilities required to provide the distributed program style for himself.

3.7 Interfaces

As discussed above, the components of a distributed program may be written in different languages by different programmers in different places at different times. In order for a component to be constructed independently of another component with which it is to interact, a precise specification of the interactions between them is necessary.

This interface specification can be used to generate the interaction code as well as to independently check that one component is correctly interacting with another. Later on, when the program is finally assembled a check can be made that each pair of interacting components is using the same interface specification.

An interface specification requires:

- An action specification to define the actions that one program component may request another to perform.

- A data specification to define the types of data that may be passed with each action request and reply.

- A property specification to define the transparencies and constraints to be associated with each action, or with the whole interface.

In general, an interface specification may be bi-directional and specify the actions each of a pair of program components could request the other to perform. For simplicity, the ANSA computation model only contains uni-directional interface specifications which directly support the client/server style of interaction. A bi-directional interaction can easily be specified as a composition of two uni-directional interface specifications in opposite directions.

A program component acting as a client may request a number of other components to perform actions and thus needs a different interface with each of these. Equally, a program component acting as a server may perform actions requested by a number of client components. There is no reason to restrict a server to provide interfaces with identical specifications to each of its clients. Allowing a server to provide multiple interfaces with distinct specifications enables a computation description to directly reflect the different roles identified in the enterprise description, especially with regard to access control. Multiple interfaces also enable knowledge of other components to be more tightly scoped. This conforms to the “need to know” principle required for program evolution and component re-use and is an important feature of the ANSA computation model.

The ANSA Testbench includes an interface definition language (IDL) for action and data specification. Property specification will be added in the course of the current workplan.

3.8 Operations

Actions defined as procedures with multiple arguments and results provide the protocol part of interface specifications and are known as operations. The data specification requirement is provided by the definition of the argument and result types.

In most programming languages procedures can have multiple arguments. In only a few programming languages can they return multiple results. Single results are asymmetrical and restrictive, especially in a distributed system where computation level interactions must be turned into message passing at some lower level with performance more dependent on message latency than message size. Consequently the ANSA computation model assumes symmetry.

In the ANSA computation model, properties, such as transparency or synchronization constraints, are specified declaratively for each operation or the whole interface specification and automatically inserted by the invocation mechanism.


The Testbench IDL provides for multiple arguments and results of various canonical concrete data types and composite types; a more general interface reference type can be used to convey abstract data types.
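
For illustration, the fragment below shows how an operation with two results might surface to a C programmer. The divide operation and its calling convention are invented for this example and are not part of the Testbench IDL; C returns a single value, so generated stubs typically deliver the extra results through output parameters or a result structure.

    #include <stdio.h>

    /* In interface terms: divide(num, den) -> (quotient, remainder). */
    int divide(int num, int den, int *quotient, int *remainder)
    {
        if (den == 0)
            return -1;                 /* error termination (see section 3.10) */
        *quotient  = num / den;
        *remainder = num % den;
        return 0;                      /* normal termination                   */
    }

    int main(void)
    {
        int q, r;
        if (divide(17, 5, &q, &r) == 0)
            printf("17 = 5*%d + %d\n", q, r);
        return 0;
    }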

3.9 Invocations

Operations can only be invoked via their enclosing interface. Because the program component providing the interface may be remote an operation invocation must be via an interface reference in order to preserve access transparency.

The results of an operation are normally required before the client can proceed. In a distributed and therefore concurrent system this is achieved by blocking the client until the server has performed the operation and delivered the results. Thus the client and server are synchronized by the invocation. The local optimization of a synchronous operation is the procedure call.

Where the client does not require an operation to deliver any results, the synchronous invocation suffers from latency and a reduction of concurrency in distributed systems. Asynchronous operations remove the latency and preserve the concurrency when immediate results are not required. The engineering level can make further optimizations by concatenating messages. Some systems describe such invocations as being ‘streamed’.

There is no confirmation that asynchronous operations have terminated or even started, but if they are serialized with synchronous operations in the same interface then the result of a synchronous operation can indicate which of the preceding asynchronous operations failed. Thus a synchronous operation can be used to re-synchronize a client and server after a stream of asynchronous operations has transferred data at full speed (i.e. a pipeline with no latency and a concurrent client and server). This kind of synchronization must be explicit in the specification of the interface and therefore a conformance requirement for an implementation.
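
The following C sketch illustrates the pattern. The queue and the operation names are invented for the example, and a real pipeline would of course transmit the queued requests to the server rather than merely count them; the point is only that the client issues many asynchronous operations without waiting and then pays for a single round trip to re-synchronize.

    #include <stdio.h>

    #define MAX_PENDING 16

    static int pending[MAX_PENDING];    /* record ids queued but unconfirmed   */
    static int npending = 0;

    /* Asynchronous operation: returns immediately, no result expected.        */
    static void put_record_async(int record_id)
    {
        pending[npending++] = record_id;     /* transmission happens later      */
    }

    /* Synchronous operation: blocks until the server has processed everything */
    /* sent so far and reports how many of the preceding operations failed.    */
    static int commit_sync(void)
    {
        int failures = 0;                    /* assume the server accepted all  */
        npending = 0;
        return failures;
    }

    int main(void)
    {
        int i;
        for (i = 0; i < 10; i++)
            put_record_async(i);             /* stream data without waiting     */
        if (commit_sync() == 0)              /* one round trip to resynchronize */
            printf("all 10 records accepted\n");
        return 0;
    }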

Distributed computing systems have unpredictable delays and partial failures, which may be silent. A client requires some way of indicating the urgency with which it requires a server to perform an operation and whether or not it is to keep trying forever or give up at some point so that corrective action can be taken.

Time is the only universal means of measuring urgency. Therefore a deadline may be required by each operation. Soft deadlines only affect the scheduling of an operation. Hard deadlines also prematurely terminate an operation when the deadline is reached.

The ANSA computation model provides for synchronous and asynchronous operations, and work on streaming and deadlines will follow.


In the ANSA Testbench, synchronous and asynchronous operations are currently supported; support for deadlines and streaming is scheduled for 1990.

3.10 Terminations

There is not necessarily a ‘right answer’ for every operation. An often used example is the popping of an empty stack, say of integers. A pop operation on a stack of integers normally returns an integer but if the stack is empty there is no integer that can be returned. The pop operation needs to return some other response that is distinguishable from the responses that indicate integers so that different actions can be taken.

Operations therefore require multiple responses (each of which may consist of multiple results). In the ANSA computation model, these responses are distinguished by name and known as terminations. Mechanisms are required for raising these terminations from within an operation and for changing the sequence of actions taken after an invocation of an operation depending on the termination it returns.

In any operation invocation one termination will cause no changes to the sequence of following actions. This termination is distinguished by not having a name and may be thought of as the ‘normal’ response of the operation.
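
Expressed in plain C, the stack example might look as follows. The enumeration and calling convention are purely illustrative and are not the Testbench termination mechanism; they simply show an operation returning one of two named responses, with the caller selecting its subsequent actions accordingly.

    #include <stdio.h>

    typedef enum { POP_OK, POP_EMPTY } PopTermination;

    static int stack[100];
    static int depth = 0;

    static PopTermination pop(int *value)
    {
        if (depth == 0)
            return POP_EMPTY;        /* named termination: no integer available */
        *value = stack[--depth];
        return POP_OK;               /* the unnamed, 'normal' termination        */
    }

    int main(void)
    {
        int v;
        switch (pop(&v)) {
        case POP_OK:    printf("popped %d\n", v); break;
        case POP_EMPTY: printf("stack was empty\n"); break;  /* alternative path */
        }
        return 0;
    }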

This termination mechanism can also be used by the engineering support environment for reporting engineering or transparency failures to the invoker of an operation.

The ANSA Testbench includes a limited form of the termination features defined in the computation model.

3.11 Atomicity

When an operation prematurely terminates it is not always possible to tell whether this was because of a partial failure. Partial failure may result in inconsistent state and orphans (operations which continue to execute after their invocation has been terminated).

If the client has sufficient knowledge and is still in communication with the server it may be able to organize a clean up, but in general this cannot be assumed. The only general mechanism which can cope with cleaning up the unwanted side effects of partial failure is the use of atomic operations, which either entirely succeed or, if they fail, fail completely. It is not possible to build atomic operations out of non-atomic operations in a distributed computing system because the atomicity mechanisms require intimate knowledge and control over the engineering support environment. Atomic operations will therefore become part of the ANSA computation model.


3.12 Objects

It is very hard in a networked system to achieve a workable, let alone efficient, implementation of global distributed storage. It is therefore necessary to look for a programming model which partitions and encapsulates state in order to describe the components of a distributed program. Such a model is common to object-oriented programming languages such as Emerald [BLACK 87] and Argus [LISKOV 83]. In these languages each object provides a set of operations by which it can be manipulated. Externally these operations are known by their names. The binding of operation names to computations that perform operations is an internal property of each object. Thus it is possible for different objects to respond to the same operation names, but to have different implementations of those operations.

This indirection from operation name to implementation has useful properties for distributed computations. Firstly it allows for heterogeneity: two interacting objects need not share the same infrastructure; they merely require communication between their infrastructures. Secondly the indirection provides a point to transparently insert the mechanisms that provide for communications. Thirdly the indirection makes it possible to substitute replacement objects without requiring the users of that object's operations to take any additional action. This has important benefits for software maintainability and evolution.

The ANSA (object-oriented) computation model is specialized for distribution by packaging sets of operations into interfaces to restrict the scope of operation names as tightly as possible and by always accessing interfaces indirectly so as to preserve location transparency.

All data is stored in objects and accessed indirectly via interfaces. Thus the ANSA computation model only deals with interface references. It makes no statements about how values are represented. The obvious optimizations can be made when invoking references to the interfaces of local (co-located) objects, especially trivial ones such as integers and booleans, but such optimizations are definitely not part of the computation model and are an issue for the mechanization of the model which is considered in the engineering model.
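
The fragment below sketches this indirection in C: an interface is represented as a table of operation pointers bound to a particular object's encapsulated state, so different objects can implement the same operation names differently and a replacement object can be substituted without changing its clients. The representation is illustrative only; the Testbench generates and manages such bindings itself.

    #include <stdio.h>

    typedef struct {
        int  (*read)(void *state);          /* operations known by name...      */
        void (*write)(void *state, int v);  /* ...bound to code per object      */
        void *state;                        /* encapsulated, partitioned state  */
    } StoreInterface;

    /* One implementation of the interface: an in-memory cell. */
    static int  cell_read(void *s)          { return *(int *)s; }
    static void cell_write(void *s, int v)  { *(int *)s = v; }

    int main(void)
    {
        int cell = 0;
        StoreInterface store = { cell_read, cell_write, &cell };

        /* Clients invoke operations via the interface reference only. */
        store.write(store.state, 7);
        printf("read -> %d\n", store.read(store.state));
        return 0;
    }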

3.13 Security

The computation model fully supports the addition of security issues. The formal addition of these, and the statement of security constraints (e.g. authentication, confidentiality) are topics for future, scheduled, work.


3.14 Trading

In a distributed environment it is necessary to provide a means by which the separate parts of a distributed application can rendezvous. This is called trading in ANSA and is an interface provided to every object. Trading gives access to a directory structure that can be searched by path name, or by property values, or by some combination of both. A server can export an interface reference to the trading service to make it accessible to other programs. An import operation is provided to clients so that they can retrieve interfaces from the trader.

The trader performs type matching of imports and exports. This is done by maintaining a typename space in the trading system. The name space is an acyclic graph showing the sub-typing relationship between types. The trader provides operations for programs to add and retrieve types by name from the type graph. Imports and exports are typed: trading operations are parameterized by types and the trading service will only search through exports of the required type and its sub-types when trying to match on path and properties.

The import operation returns an interface reference to the client. These references are unambiguous in a trading domain (a trading system can be structured as a federation of autonomous trading domains, managed by separate trading authorities, and a domain can be partitioned into a hierarchy of sub-domains). The importer can retain the reference for as long as required. If the distributed computing environment has location transparency enabled, the system will be searched to find the object. If location transparency is not enabled, an address indication in the interface reference is assumed to be absolute.
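
The rendezvous can be pictured with the following C sketch, in which an in-memory table stands in for the trader. The export and import calls, the matching rule and all names are invented for the example; the real trading operations also perform sub-type matching, path-name search and property matching, which are omitted here.

    #include <stdio.h>
    #include <string.h>

    typedef struct { char host[32]; int port; } InterfaceRef;
    typedef struct { char type[32]; char path[64]; InterfaceRef ref; } Offer;

    static Offer offers[32];
    static int noffers = 0;

    /* Server side: make an interface reference available to other programs. */
    static void trader_export(const char *type, const char *path, InterfaceRef ref)
    {
        Offer *o = &offers[noffers++];
        strcpy(o->type, type);
        strcpy(o->path, path);
        o->ref = ref;
    }

    /* Client side: retrieve an interface reference of the required type. */
    static int trader_import(const char *type, InterfaceRef *out)
    {
        int i;
        for (i = 0; i < noffers; i++)
            if (strcmp(offers[i].type, type) == 0) { *out = offers[i].ref; return 0; }
        return -1;      /* no matching export */
    }

    int main(void)
    {
        InterfaceRef printer = { "host-a", 6001 }, found;
        trader_export("PrintService", "/dept/lab/printer1", printer);
        if (trader_import("PrintService", &found) == 0)
            printf("imported %s:%d\n", found.host, found.port);
        return 0;
    }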

3.15 Object groups

An object group is a collection of objects whose members portray an abstraction of a single object. The object group model defines the mechanisms of an infrastructure that supports the existence and interworking of object groups.

The mechanisms of the infrastructure encompass controls for object group instantiation and initialization, group interactions, group fault detection and recovery, and group trading.

In systems comprised of object groups, invocations of an operation on a group are automatically propagated to all members of the group in accordance with ordering and reliability guarantees specified in the interface description of the group.

The automatic propagation of invocation requests to all members of a group ensures distributed knowledge. Each member acquires knowledge of its group’s activities, whether it is the sole executor of a request, the executor of
a shared request, or the observer of another's request. Accordingly, the operational members of a group possess the information needed to make collective decisions on what to do whenever they detect the failure of another member to deliver its intended service. A number of failure detection and recovery methods are accommodated in the model.

The provision of object group management facilities in distributed application environments is not new [BIRMAN 85, CHERITON 85, COOPER 85]. The contribution of the proposed model lies in its unification of these ideas and the clear separation between group management and group communications. The object group model distinguishes three basic types of group. The classification is shown in Figure 4, in which the directed arrows denote composition.

Figure 4: Object group classifications

The membership of each distinguished group type is formed from a collection of single objects or from combinations of other groups.

Functionally distributed group

A functionally distributed group is comprised of non-replicated members sharing the provision of the service defined by the group's interface. This sharing might reflect partitioning of data, partitioning of functions or load-sharing. A single object is regarded in ANSA as an important special case of a functionally distributed group. An example is a conferencing service that may be provided by propagating each message to the participating conference members, which need not be identical in operation. Another example is a highly parallel array processing service in which each member receives and operates on different elements of the same array to compute part of a collective result.


Without member replication, a functionally distributed group is susceptible to partial or total service failures in the event of a processing site crash or network partition. At best, the group infrastructure can detect and signal such failures to the remaining operational members. They may elect to continue with partial service or be forced to withdraw the service entirely.

The key to failure-resilience is to replicate each member of a group to such a degree that the probability of all members failing becomes acceptably small.

The specific strategies for replication are outlined below. These strategies centre on the principle that any group of n = k + 1 replicas can maintain operational service provided that no more than k replicas fail. These replicas execute on different processing sites with independent failure modes.

Coordinated replica group

The coordinated replica group is based on principles and techniques originated by the ISIS project at Cornell [BIRMAN 87]. The strategy incorporates a backup mechanism which permits a failed computation to be restarted from a prior state saved in a checkpoint.

In each interaction with a coordinated group, one replica plays the role of a coordinator, and it alone performs the operation. Other replicas of the group are passive, monitoring cohorts.

The coordinator propagates checkpoints to its cohorts to keep them informed of progress. If the coordinator fails (e.g. due to a site crash), its cohorts will respond to this incident and one of them will take over the role of the coordinator to recover and resume the execution of the disrupted operation from the most recently received checkpoint. The remaining cohorts will then resume monitoring (but of the new coordinator). Each coordinated replica group can remain operational as long as each failure can be detected and a substitute coordinator can be assigned.
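
The following single-process C sketch caricatures this coordinator/cohort behaviour: the coordinator checkpoints its progress to the cohorts, and on a simulated failure one cohort resumes from the most recently received checkpoint. It is an illustration of the idea only, not of the ISIS or Testbench protocols, and all names are invented.

    #include <stdio.h>

    #define REPLICAS 3
    #define STEPS    6

    static int checkpoint[REPLICAS];        /* last step each replica knows about */

    static void propagate_checkpoint(int step)
    {
        int r;
        for (r = 0; r < REPLICAS; r++)
            checkpoint[r] = step;           /* cohorts record the coordinator's progress */
    }

    int main(void)
    {
        int coordinator = 0, step;

        for (step = 1; step <= STEPS; step++) {
            if (step == 4 && coordinator == 0) {
                /* Coordinator "crashes": a cohort takes over and resumes from */
                /* the most recently received checkpoint.                      */
                coordinator = 1;
                step = checkpoint[coordinator] + 1;
                printf("replica 1 takes over at step %d\n", step);
            }
            printf("replica %d performs step %d\n", coordinator, step);
            propagate_checkpoint(step);
        }
        return 0;
    }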

Since all members of a group are identical, each replica may independently become a coordinator for any individual interaction, provided that it does not engage in a computation that causes state inconsistencies with other computations in the group. Moreover, each replica may play the role of several coordinators by engaging in several non-conflicting interactions concurrently.

Consequently, group replicas can participate in sharing the processing load of the group, with a concomitant increase in concurrency.

The correct behaviour of a coordinated replica group depends on three assumptions [SCHLICHTING 83]:

- all replica specifications are equivalent and correct

- the physical processors, memories and communications supporting group replicas are fail-stop (i.e. they fail by halting and do not produce undetected spurious or malicious messages for consumption by other sites)

- all operational sites are informed of all related fail-stop incidents.

The coordinated replica approach is applicable to systems in which high levels of failure-resilient computation are required and the overheads of state checkpointing and recovery are acceptable.

Parallel replica group

The parallel replica group was originated by the Circus project at Berkeley [COOPER 85].

Unlike a coordinated replica group, a parallel replica group eliminates the need for checkpointing by ensuring that all interactions with the group are executed in parallel by all replicas. This strategy provides continuous progress of a computation, since non-failed replicas continue to execute the same operation(s) despite the failure of others.

The replicas of a group must exhibit deterministic behaviour; that is, they must each possess the same (abstract) state, receive the same requests, produce the same results and enter identical new states. Thus a client sees identical results from a server group, and a server group sees identical requests from a client.

If, however, group members are deterministic only up to equivalence, the results of a replicated request will be equivalent according to some application-dependent relationship. In such cases, the results produced by replicas may be logically equivalent, but not necessarily numerically and/or textually identical. A voter or collator mechanism must therefore be employed to reduce the set of equivalent results to a single result. A collator of this type would typically be used in applications which involve N-version programmed groups.
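In the simplest case, such a collator might select any result supported by a majority of the replicas, using an application-supplied equivalence test. The sketch below is illustrative only and assumes nothing about how replica results are gathered.

    #include <stddef.h>

    /* Return the index of a result supported by a strict majority of
     * the n replica results, or -1 if no such result exists.  'equiv'
     * is the application-dependent equivalence relationship: results
     * need only be logically, not bitwise, identical. */
    int collate_majority(void *results[], size_t n,
                         int (*equiv)(const void *, const void *))
    {
        size_t i, j, votes;

        for (i = 0; i < n; i++) {
            votes = 0;
            for (j = 0; j < n; j++)
                if (equiv(results[i], results[j]))
                    votes++;
            if (2 * votes > n)
                return (int)i;
        }
        return -1;
    }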

The correct behaviour of a parallel replica group may be further secured by incorporating voting techniques based on Byzantine agreement protocols [DOLEV 87]. Such majority voting protocols are designed to overcome the problem of uncontrolled, non-fail-stop processing sites and/or the effects of malicious group members.

The parallel replica approach is applicable to systems which require extreme failure-resilience either with or without fail-stop assumptions.

Computation model implications

The foregoing outlined three distinguished group types, each exhibiting a greater or lesser degree of resilience against failures. Each individual type may be used in isolation to construct distributed system services, or may be combined with other types in different configurations to achieve services offering increased distribution and/or failure-resilient operation.


The ANSA computation model provides a framework via interface properties to represent interfaces structured according to any of the group types. The ANSA Testbench will provide a Group Execution Protocol in support of object groups within the next year, and extensions to DPL that enable the collating and collection functions for the two forms of replica group to be generated automatically.


4 Testbench software

This chapter describes the structure of the ANSA Testbench. In particular, it describes and discusses the way that this implementation is mapped onto industry-standard operating system platforms, the structure of the interpreter, and the way that common interfaces are generated and pre-processors are invoked.

The Testbench is an implementation, based upon the ANSA architecture, of an infrastructure for distributed systems. It is an example implementation and does not represent a conformance test against which other implementations of the architecture have to be measured. The Testbench constructs provide the programmer with a number of facilities that enable him to build distributed processing systems.

The ANSA Testbench is the minimum set of functions and extensions necessary to convert an existing system like UNIX, VMS, or MS-DOS into an ANSA distributed computing environment such that all further components of an ANSA application can be implemented within the architecture. The Testbench provides the functionality equivalent to the nucleus and a vertical slice of the platform in Figure 3.

In order to maximize the portability of the Testbench to other environments, the X/OPEN portability guidelines have been followed. C is the standard source language for all software and, where possible, some anticipated features of the ANSI standard for the C language have been used.

The versions of UNIX used (HP-UX, SunOS, Ultrix) provide the socket abstraction for interaction with the TCP and UDP transport protocols. Since the implementations of REX, GEX, etc. implicitly depend upon the socket interface to the kernel, this will affect the portability of the Testbench to versions of UNIX which use other mechanisms for interprocess communication (e.g. 'streams'). In particular, the current ports to VMS and MS-DOS rely upon the availability of TCP/IP packages providing the socket abstraction to permit interworking with UNIX versions.

1 C’NIXis a trademark of AT&T Bell Laboratories. 2 VMS and Ultrix are trademarks of Digital Equipment Corporation. 3 MS-DOS is a trademark of _Microsoft Corporation.

4 HP-ux is a trademark of Hewlett-Packard Company. 3 SunOS is a trademark of Sun Microsystems Incorporated


4.1 Overall structure

The Testbench is implemented according to the engineering model for ANSA. In the engineering model (depicted in Figure 5) the components of the support environment are shown as distinct objects.

Figure 5: The engineering model


A key engineering concept is the capsule, which is a collection of computation (i.e. programmer defined) objects whose mutual encapsulation and external bindings are enforced at run time. A capsule is constructed from a number of application objects and transparency stubs plus an interpreter object. They are connected together by an inter-capsule communications object.

In Figure 5, objects and their interconnections are shown. Two capsules are shown in their decomposed form.

The capsule object consists of one or more (virtual) processors executing the instructions of each object and a special processor (with operations like call, cast, fork) for synchronizing the other processors and interpreting operations on nucleus resources. (In an object-oriented implementation, the interpreter would not be visible, since it would become part of the nucleus resources. In the Testbench the data structures for nucleus resources and the operations on these structures are separate.)

The nucleus is a special type of object in that it presents a capsule interface to other capsules, but also has direct access to the tables controlling the interpreters and thus the underlying hardware.

The message passing services encapsulate the network linking the two hosts.

REX is a protocol that extends inter-capsule communication between hosts using an appropriate message passing service (MPS). The message passing service encapsulates the detail of the particular protocol used to transport messages between capsules. The interface between REX and MPS is designed to accommodate connection-oriented and connectionless protocols. In the Testbench this is demonstrated by the existence of MPS objects for both TCP and UDP.

The Group Execution Protocol (GEX) is inserted in place of REX for interactions with object groups, and the interpreter mechanisms for setting up bindings between capsules select an appropriate execution protocol and MPS using information supplied by the trader.

4.2 Implementation of engineering concepts

This section presents implementation considerations for some of the fundamental ANSA engineering model concepts, to give an indication of the functions included in the Testbench.

Objects An object is a unit in the structure of a system. This definition allows the term object to be used at many different levels of abstraction. When the discussion is focused on a particular viewpoint, more precise terms are used: for example, in the context of an object-oriented programming language, we are concerned with computation objects.

Capsules A capsule is:

• a homogeneous addressing domain
• a container for computation objects
• the unit of replication for object groups

A capsule represents a virtual address space, and will usually be mapped onto the corresponding abstraction in the local operating system - e.g., a process under UNIX.


Interpreter The capsule interpreter object consists of one or more (virtual) processors executing the instructions of each object and a special processor (supporting call, cast, fork, etc.) for synchronizing the other processors and interpreting inter-object interactions.

Nucleus The nucleus is the local low-level resource manager for the support environment. In practice, this is implemented as a layer above the local operating system to provide a portable interface to the infrastructure services.

Conceptually, the nucleus is a separate capsule invoked via calls. However, since the call mechanism is access and location transparent, implementations may choose to provide nucleus services by a combination of in-capsule library calls and remote operations.

Threads and tasks A thread is an independent execution path through a sequence of operations while a task is a virtual processor which provides a thread with the resources it requires. In general, a thread may represent any unit of potentially concurrent activity. However, to make progress, a thread must be bound to a task.

While a thread is being executed, it requires additional resources to store its intermediate state. These resources are provided by a task.

All capsules are multi-threaded and may optionally be multi-tasking. Threads only identify potential concurrency, while tasks provide the resources for real concurrency. When the number of threads exceeds the number of tasks then some threads must be serialized. The scheduler is conservative in the allocation of tasks to threads to minimise wastage of memory space.

Where the local operating system only supports one virtual processor per capsule, multi-tasking capsules can be implemented via a coroutine package.
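A coroutine package of this kind can be built on standard context-switching primitives. The fragment below is a minimal sketch using the POSIX ucontext routines, which are not necessarily those used by the Testbench; it shows one task yielding to, and being resumed by, a trivial scheduler.

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, task_ctx;
    static char task_stack[16384];

    static void task_body(void)
    {
        printf("task: step 1\n");
        swapcontext(&task_ctx, &main_ctx);   /* yield to the scheduler */
        printf("task: step 2\n");            /* resumed later          */
    }

    int main(void)
    {
        getcontext(&task_ctx);
        task_ctx.uc_stack.ss_sp = task_stack;
        task_ctx.uc_stack.ss_size = sizeof task_stack;
        task_ctx.uc_link = &main_ctx;        /* where to go when the task ends */
        makecontext(&task_ctx, task_body, 0);

        swapcontext(&main_ctx, &task_ctx);   /* run the task until it yields  */
        printf("scheduler: task yielded\n");
        swapcontext(&main_ctx, &task_ctx);   /* resume the task to completion */
        return 0;
    }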

Sockets The unit of addressing for inter-capsule invocations (and not to be confused with Berkeley Unix sockets). Sockets are created via nucleus services. The export operation allows a socket name to be published in an external trading scope, and thus be made accessible to clients outside the capsule.

Sockets are associated with defined interfaces, and thus present a typed view of the exporting capsule. All calls and casts are targeted at sockets.


Plugs and channels A plug is the access point for an imported interface. Inter-capsule operations are invoked upon plugs. Each plug is bound to a corresponding socket. The path from plug to socket is known as a channel.
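The relationships between plugs, sockets and channels might be pictured with data structures along the following lines; the field names are hypothetical and do not reflect the actual Testbench tables.

    /* A channel is the path from a local plug (an imported interface)
     * to a remote socket (an exported interface). */
    typedef struct {
        int capsule_id;        /* capsule that exported the interface */
        int socket_id;         /* unit of addressing for invocations  */
        int interface_type;    /* type of the interface offered       */
    } Socket;

    typedef struct {
        int plug_id;           /* access point for the imported interface */
        int channel_id;        /* identifies the path to the socket       */
    } Plug;

    typedef struct {
        Plug   local;          /* operations are invoked upon the plug */
        Socket remote;         /* ... and are targeted at the socket   */
    } Channel;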

4.3 Implementation options

In mapping ANSA computation level objects onto UNIX/VMS processes there are three basic choices:

1. all application objects in a single process
2. one process for each task of an application object
3. one process per application object

Putting all application objects into a single process would leave the implementation without any memory protection between application objects, and C is not robust enough for this to be tenable. Having a process per task with application object state in shared data segments is more robust but very inefficient. The option of mapping a single application object into a UNIX/VMS process provides a reasonable compromise between robustness and efficiency.

4.4 Process mapping

Mapping an application object onto a UNIX/VMS process (Figure 6) requires that at least the stubs and the interpreter be in-process. The nucleus and binder can also be in-process because the required functions can be provided either by in-process library modules or are already present in the UNIX/VMS kernel.

If REX were in a separate process, the interpreter would require an inter-process communication mechanism of nearly the same complexity and performance to communicate with it. REX is placed in process with one or more message passing services, e.g. UDP and TCP, which are provided by the kernel. The usage of REX is sufficiently asynchronous to enable a reasonable simulation of a multi-threaded capsule to be provided by the interpreter.

An application object interacts with application objects in other capsules via stub procedures which provide marshalling, unmarshalling, buffer management and thread management. The stubs interact with the interpreter via the call, cast and dispatch operations. Because the nucleus and binder are local objects, these operations can be accessed via a direct procedural interface.

The interpreter is controlled by channel and thread tables to which the nucleus has direct access. It also has a session by which it exchanges data with the communications objects via a parameterless procedure interface.


Figure 6: UNIX/VMS process mapping

The trader is implemented out of process as an ANSA application program.

4.5 Using the Testbench

This section describes the use of the tools provided with the Testbench to aid distributed application construction. Two principal tools are provided to facilitate the construction of distributed applications:

• a stub compiler - which converts an interface specification for a service into a set of stub routines for marshalling/unmarshalling arguments and results and for the communication of arguments and results between the logically separate parts of a distributed application. As the Testbench develops, the stubs will be enhanced to provide for the transparencies.

• a pre-processor - which processes source files containing embedded DPL statements into compilable source files with appropriate references to the stubs generated by the stub compiler. In the Testbench, the pre-processor generates C.


4.5.1 Interface specifications and stub compilation

The first consideration when designing a distributed application is the specification of the interfaces between the distributed portions of the application. With no loss of generality, the following discussion concentrates upon a single interface.

The interface specification consists of a text file which defines constructed data types for the specification and the signature for each operation defined in the interface. Each operation signature is defined in terms of a set of arguments and a set of results; either set may be empty and both arguments and results are passed by value.

A capsule which is willing to perform the operations specified in the interface specification must export the interface. A capsule wishing to use an instance of the service must import the interface. After a successful import, the operations can be invoked as many times as desired; each invocation is performed by the exporter whose offer satisfied the import request.

The distributed nature of the application is hidden from the users of a service through the use of stub routines generated from the interface specification. The stub routines (generated by the stub compiler) handle the marshalling (packing arguments/results into buffers for transmission), unmarshalling (unpacking arguments/results from transmission buffers), and communications (exchanging buffers between the distributed portions of the application).

To perform the above tasks, the stub compiler makes choices in three specific areas:

1. the language binding between the interface specification language datatypes and the types available in the target programming language (data type conversion between different languages)
2. the language binding between the interface specification language operation signatures and the functional/procedural signatures of the target programming language (procedure call specification conversion)
3. the encoding of the basic datatypes as they are exchanged in transmission buffers
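As an illustration of the third area, a basic integer datatype might be encoded into the transmission buffer in a fixed byte order. The helpers below are hypothetical and are not the code emitted by the stub compiler; the actual intermediate representation is defined by the architecture.

    #include <stddef.h>

    /* Append a 32-bit integer to a transmission buffer in a fixed
     * (big-endian) encoding; returns the new buffer position. */
    static size_t marshal_int32(unsigned char *buf, size_t pos, long value)
    {
        buf[pos]     = (unsigned char)((value >> 24) & 0xff);
        buf[pos + 1] = (unsigned char)((value >> 16) & 0xff);
        buf[pos + 2] = (unsigned char)((value >> 8) & 0xff);
        buf[pos + 3] = (unsigned char)(value & 0xff);
        return pos + 4;
    }

    /* Corresponding unmarshalling performed by the receiving stub. */
    static size_t unmarshal_int32(const unsigned char *buf, size_t pos,
                                  long *value)
    {
        *value = ((long)buf[pos] << 24) | ((long)buf[pos + 1] << 16)
               | ((long)buf[pos + 2] << 8) | (long)buf[pos + 3];
        return pos + 4;
    }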

Previous stub systems have made many different choices in these areas. The most common choices have been:

1. a direct mapping from interface specification language datatypes to target programming language datatypes
2. the provision of "complete transparency" to distribution, leading to a direct mapping between interface specification language operation signatures and target programming language functional/procedural signatures
3. the assumption of a high degree of uniformity of system types, leading to the use of "receiver makes it right" data encoding between portions of the application

The ANSA stub system has made different choices in all three of these areas. The different choices made (and the reasons for the choices) are:

1. A basic set of datatypes and constructors is defined for the interface specification language, with no guarantee of a simple mapping of all types to any particular target language. This choice is essential to support multi-language environments. Experience will indicate which basic types and constructors lead to difficulties in multi-language interworking.
2. The potential for interaction failure is made explicit (which would not normally occur in the non-distributed case). This is a clear departure from the assumption in other systems and reflects the ANSA philosophy that transparency should be selective.
3. An intermediate representation is used between application portions to enable interworking in a multi-language, multi-vendor environment. In a remote invocation scheme, this does not require the full generality of ASN.1 encoding since the types are known by both parties through the interface specification.
4. Operations are described in terms of a set of arguments and a set of results. Many languages have restrictions on the number of results which may be returned by a function and/or the mutability of arguments. The ANSA system, by defining the binding for each language, removes any restrictions inherent in any particular programming language.

By using the pre-processor, the user need never be concerned with the mapping to the client stub routines; it is included here only for completeness. It is essential, on the other hand, that a programmer writing the routines to provide a particular service be aware of the assumptions made by the server stub routines.

4.5.2 Preprocessing the distributed processing language

The actual code to use or provide an ANSA service is written using DPL statements embedded in C source code. This discussion gives a brief overview of how pre-processing is used to construct a capsule.

The pre-processor defines the mapping between the DPL statements and the interface to the binder and the stubs generated by the stub compiler. As such, it insulates the programmer from the low-level details of these two parts of the ANSA run-time system.


The programmer writes the client and server portions of the service in terms of embedded DPL statements in C source code; by convention, the suffix of such files is '.dpl'. The user invokes the pre-processor on these files to yield source files with the suffix '.c'. These source files can then be compiled and linked together to yield a capsule, as with any C source file.

An example application is constructed in Appendix A.


5 Evaluation

5.1 Performance

5.1.1 Communications system structure

As described in Section 4 and shown in Figure 7, an ANSA capsule is mapped onto a single UNIX/VMS process; the interpreter, protocol objects (e.g. REX), and message passing services (e.g. MUDP) are included in the process as library modules.

Figure 7: A UNIX/VMS capsule (interpreter, generic protocol interface, protocol engines, generic MPS interface, MPS modules)

Two levels of multiplexing of packets occur in the capsule:

• the interpreter decides which protocol object to use to transport a particular request
• the protocol object determines which message passing service to employ for a particular request

This two-level modular structure is vital to the Testbench design, since it permits other protocol objects and other MPS modules to be easily added to the system. As a result, performance numbers need to be quoted for the full cross product of protocol object and MPS modules.

Broadcast protocols are not used in the system, thereby avoiding another possible source of scaling problems. Multi-cast protocols are used by GEX, but this is not a problem, since multi-cast protocols cause a bounded number of hosts to be involved in a particular interaction.


5.1.2 REX

The REX protocol object provides a fully acknowledged synchronous RPC service (call, callack, reply, replyack message types) together with asynchronous casts for applications which do not require an explicit acknowledgement for every message. Large buffers are sent as a stream of fragments using a rate-based flow control scheme which attempts to maximize throughput whilst minimising unnecessary network load.

Unacknowledged calls and replies are assumed to have been lost in transit and the message is periodically retransmitted until a response is received; if an installation-defined deadline is exceeded, the transmission is abandoned. Sequence numbers are used to order messages; out of sequence, or otherwise erroneous, messages are ignored. Duplicate messages, i.e. ones which have just been (or are being) processed, are assumed to have been sent to solicit an acknowledgement and are acknowledged.

Buffers which are too large to be carried on the underlying message passing service are broken up and sent as a stream of fragments. Each fragment of a large buffer carries the same message sequence number, the total buffer size, and the position (as a byte offset) of this fragment within the buffer. To avoid losing fragments at the receiver by overloading it with incoming packets, the transmission is rate controlled to maintain a comfortable average packet arrival rate. When the whole buffer has been sent, REX enters a probing phase in which it resends the first fragment in order to solicit a response (usually a fragnack). Arrival of a fragnack at any time causes the map of unsent fragments to be redefined and the buffer transmission cycle to begin afresh.

Reception of a fragmented buffer begins with the first fragment which arrives (not necessarily the first sent) and causes a receive buffer to begin to be constructed and a timer to be armed. Whenever a fragment arrives which has not been previously received, the timer is rearmed. Whenever the timer fires, a fragnack specifying all unreceived fragments (or as many as will fit into a single buffer) is constructed and sent; the timer is also rearmed.
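The mechanism can be pictured with a fragment header and a simple routine for listing missing fragments; the field names and sizes below are hypothetical and are not the REX wire format.

    #include <stddef.h>

    /* Every fragment of a large buffer carries the same message
     * sequence number, the total buffer size and its own byte offset. */
    typedef struct {
        unsigned long seq;      /* message sequence number           */
        unsigned long total;    /* total size of the original buffer */
        unsigned long offset;   /* position of this fragment         */
        unsigned long length;   /* number of data bytes it carries   */
    } FragmentHeader;

    /* Build the content of a fragnack: the indices of fragments not
     * yet received, up to the number that fit in a single buffer.
     * 'got' marks which of 'nfrag' fragments have arrived. */
    size_t build_fragnack(const int *got, size_t nfrag,
                          unsigned long *missing, size_t max_missing)
    {
        size_t i, n = 0;

        for (i = 0; i < nfrag && n < max_missing; i++)
            if (!got[i])
                missing[n++] = (unsigned long)i;
        return n;    /* number of fragment indices reported */
    }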

Although REX does not require any specific performance criteria, it is designed to exploit fast and reliable networks; in order to optimize 'bursty' interactions, REX allows (and takes advantage of) implicit acknowledgements. Thus, the arrival of a reply indicates receipt of the preceding call, and vice versa. Because a cast may be sent whenever a new call or replyack could be, it also acknowledges a preceding reply.

The use of rate-based flow control is a primary mechanism for preventing overload at congested servers. It forces the responsibility for initiating further transfers upon the server. REX also uses an exponential backoff algorithm for the timers which drive probing.


5.1.3 Measured performance

The performance numbers supplied in Tables 1 and 2 are for HP-9000/350 processors with 16 MB of memory, running HP-UX 6.2, and interconnected via Ethernet. The IPC MPS is implemented using FIFOs, with a fragment size of 2048 bytes; the UDP MPS has a fragment size of 4096 bytes. Due to the large granularity of the HP-UX clock, 3 fragments are sent at each clock tick when sending fragmented buffers.

The test programs themselves are constructed using the stub compiler and preprocessor. The interface specification for the Test service is as follows:

Test : INTERFACE =
BEGIN
    Sink : OPERATION [ VC : STRING ] RETURNS [];
END.

The server routine simply returns when invoked by the server stub. The pseudo-code of the client program is:

    bind to service instance
    gettimeofday
    while (n-- > 0)
        Sink(fixed size character string)
    gettimeofday
    report statistics
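Expressed in plain C rather than pseudo-code, the timing loop corresponds to something like the following hypothetical harness (Sink here stands for the stub-generated client procedure):

    #include <stdio.h>
    #include <sys/time.h>

    /* Invoke the already-bound Sink operation n times and report the
     * mean elapsed time per call in milliseconds. */
    static void time_calls(void (*sink)(const char *), const char *arg, int n)
    {
        struct timeval start, end;
        double elapsed_ms;
        int i;

        gettimeofday(&start, NULL);
        for (i = 0; i < n; i++)
            sink(arg);
        gettimeofday(&end, NULL);

        elapsed_ms = (end.tv_sec - start.tv_sec) * 1000.0
                   + (end.tv_usec - start.tv_usec) / 1000.0;
        printf("%f ms/call\n", elapsed_ms / n);
    }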

The command line interface to the client program is:

tsink -b -n HostName

tsink was repeated many times on essentially quiescent systems, with the best number selected from each repetition of tests.

Table 1: Elapsed time/call in milliseconds

                       buffer size (bytes)
    MPS        1        10       100      1000     10000
    IPC        8.146    8.188    8.419    10.785   82.636
    UDP       12.196   12.197   12.817    18.227   80.365

5.1.4 Commentary

Firstly, it is important to emphasize that these figures represent exactly the performance which can be achieved by a user program. The test programs go through all of the marshalling/unmarshalling overheads, and no attempts have been made to exploit any streamlined paths through the system.

The figures for the elapsed time per RPC are consistent with most other extra-kernel implementations. Of course, better times will be observed on systems for which the basic scheduling time is faster than for HP-UX on the HP-9000 systems. Nonetheless, the performance improvement will be of the order of a few percent.


Table 2: Throughput in KBytes/sec

                       buffer size (bytes)
    MPS      1      1000    2000    3000    4000    5000    10000   15000
    IPC      .121   93.00   83.70   75.0    100.0   120.1   120.6   120.2
    UDP      .084   54.76   87.88   100.4   120.1   95.1    124.9   125.2

                       buffer size (bytes x 1000)
    MPS      20      25      50      75      100     150     200     250
    IPC      139.0   136.5   133.9   135.6   136.1   127.6   128.1   128.3
    UDP      140.2   137.7   137.2   138.5   135.0   128.3   128.9   129.1

The throughput figures for small buffer sizes are also comparable with those of other systems. The dramatic increase in throughput for very large buffers, and its essentially flat behaviour as a function of buffer size, are very much better than in most systems. Note that this has been achieved without any modifications to the programmer's computation model. The initial instability seen for "small" buffers is due to a combination of the fragment size and the way that the marshalling code allocates buffers. The flat performance with large buffer sizes shows that these oscillations damp out when the buffer size is large with respect to the fragment size. In fact, the throughput remains constant for IPC (~129,000 bytes/second) and UDP (~130,000 bytes/second), as determined from tests using buffer sizes up to 500,000 bytes.

5.2 Transparency

The ANSA philosophy is to regard all operations as being potentially remote and to generate optimized code for objects that end up in the same address space, to avoid paying unnecessary overheads. This provides distribution transparency without compromising the ability of the user to control which forms of transparency are applied. For an object to invoke operations on another, it must possess an interface reference. This may be obtained from other objects, or be built into the object. Invocation of the operations provided by the interface does not depend on co-location or otherwise of the serving and invoking objects.

In the Testbench, two mechanisms for location transparency are provided:

• if instructed that the service code will be in the same address space as the client, the preprocessor converts operation invocations on the interface reference to in-address-space function calls, bypassing the stubs and RPC mechanisms
• if the remote interface is in a capsule on the same host as the calling capsule, then REX will select an MPS based on inter-process communication to optimize the interaction

5.3 Diagnostics, error detection and recovery

Extensive tracing facilities are included in the Testbench.

Most programming languages assume that function/subroutine calls will always successfully return; to be sure, an error may be signalled in one of the result parameters if the caller has violated the calling protocol, but one does not expect errors from the run-time support system. The possibility of a particular invocation involving remote access leads to situations where such expectations are no longer valid. As a result, DPL provides an explicit exception syntax which can be specified on each DPL invocation statement. An invocation with the exception syntax follows:

    { results } <- ref$op( arguments )
        Continue statuslist
        Abort statuslist
        Signal statuslist

where statuslist is a comma-separated list of the status names shown in Table 3 below. The action taken upon completion of an invocation depends upon which of these lists contains the operation status:

• Continue - the program continues on to the next statement
• Abort - the program aborts with an appropriate error message
• Signal - the program invokes Signal_Type_Op with the status followed by the addresses of the arguments followed by the addresses of the results; the programmer must provide this routine, which can manipulate any of the arguments or results; the return value of the function determines the final action taken by the program: -1 indicates that the program should abort, 1 indicates that the program should continue, and 0 indicates that the operation should be attempted again, presumably with arguments which have been modified by the signal routine

Specification of an asterisk (*) in a statuslist indicates that all status names not explicitly listed are to be included in that category. Use of the exception syntax is optional; the default behaviour is equivalent to

    Continue ok
    Abort *
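For example, a client invocation might continue after a successful call, hand transient communication failures to a programmer-supplied signal routine, and abort on anything else. The operation name and status choices below are illustrative only.

    { result } <- ref$GetValue( key )
        Continue ok
        Signal transmitTimeout, responseTimeout
        Abort *

With this specification, a transmitTimeout or responseTimeout status invokes the signal routine, which may modify the arguments and request that the invocation be attempted again.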

5.4 Heterogeneity

The Testbench is operational on systems using the following hardware: Motorola 680x0, Intel 80x86, VAX, ICL 2900, Acorn RISC, HP Precision Architecture and DEC Firefly (a research multiprocessor).


Table 3: Invocation status values

abnormalReturn          heapAllocationFailure    accessViolation
unAvailableProtocol     bindFailure              invalidChannelId
allocatedChannel        insufficientChannels     insufficientSessions
invalidSessionId        invalidSessionState      invalidSessionEntry
insufficientThreads     invalidThreadId          notParentThread
noMultiTasking          bufferTooLong            transmitFailure
transmitTimeout         probeTimeout             responseTimeout
protocolError           duplicatePacket          resourceInUse

The Testbench has been ported to the following operating systems: HP-UX, SunOS, Ultrix, MS-DOS, VMS, VME (a mainframe OS for the ICL 2900), and a new multi-processor OS.

REX packets have been transported using MPSs based upon the following networking protocols: TCP/IP, UDP/IP, OSI Transport, DECNET, UNIX address family sockets, FIFOs, and VMS mailboxes.

5.5 Relationship to OSI

The Testbench relates to the OSI reference model as follows:

• the stub compiler and preprocessor provide application and presentation layer functions
• the protocol objects (e.g. REX) implement the session protocols needed by the capsules
• the MPS modules (e.g. MUDP) encapsulate physical through transport protocols.

The Testbench provides for the use of OSI protocols as follows:

• the construction of MPS modules interfacing to OSI transport (both connection-oriented and connectionless)
• the possibility to insert a protocol object based upon the Remote Operations Protocol (ROP) and its supporting layers
• ASN.1 encoding of the application Protocol Data Units (PDUs) is easily accommodated through revision of the #include file which defines the marshalling/unmarshalling performed by the stub
• provision of ANSA interfaces to OSI service objects via the nucleus.


5.6 Software engineering

The ANSA architecture follows the principles of object-oriented design; this is carried through to an object-oriented implementation for the Testbench, within the limitations of C.

The architecture emphasizes the concept of generic functions and narrow interfaces as the key to minimising concepts and enabling maximum reuse of components.

The Testbench is structured into separate code modules with documented and maintained interfaces between them, following the engineering model of the architecture. All host, operating system and network specific functions required by the Testbench will be held in a single, operating-system specific module in releases subsequent to Release 2.5 of April 1989.


6 International standards

The Project has a strong commitment to seeing that its work is placed in the public domain by actively participating in relevant standardization activities. The main activity has been participation in national and international meetings of project 1.21.43 of ISO/IEC JTC1, the 'Basic Reference Model of Open Distributed Processing' (ODP). There has also been support for work in ECMA, which is defining a support environment for ODP, and for a new question in the next CCITT study period which builds on the experience of messaging and directories (X.400, X.500).

The ISO ODP scope, as currently defined, states:

“This standard will be concerned with, and limited to, the general aspects and common features of distributed systems. It will provide: a. common definitions of concepts and terms for distributed processing b a generalized model of distributed processing using these concepts and terms

C. a general framework for identifying and relating together open distributed processing standards The standard will not provide a generic model for all information processing and will not describe standards for specific fields of application.”

The schedule for the standard calls for a draft proposal in mid-1990, followed by the DIS and IS in 1991 and 1992 respectively. Some relevant documents are:

ISO/IEC JTC1/SC21 N3288       Working Document on Topic 2.2 - Properties and Design Freedoms
ISO/IEC JTC1/SC21 N3194       Report on Viewpoints under Task 1 of Topic 2.3 - Framework of Abstractions
ISO/IEC JTC1/SC21/WG7 N108    Working document on Topic 4.1 - Functions and Interfaces
ISO/IEC JTC1/SC21/WG7 N109    Working document on Topic 6.1 - Modelling techniques and their use in ODP


References

[ARM 89] The ANSA Reference Manual, Architecture Projects Management Limited, Poseidon House, Castle Park, CAMBRIDGE, CB3 0RD, United Kingdom, 1989.
[BIRMAN 85] Birman, K., "Replication and Fault Tolerance in the ISIS System", ACM Operating Systems Review, 19 (5), 79-86 (December 1985).
[BIRMAN 87] Birman, K., "Exploiting Virtual Synchrony in Distributed Systems", ACM Operating Systems Review, 21 (5), 123-128 (November 1987).
[BLACK 87] Black, A., Hutchinson, N., Jul, E., Levy, H. & Carter, J., "Distribution and Abstract Types in Emerald", IEEE Transactions on Software Engineering, SE-13 (1), 65-76 (January 1987).
[CHERITON 85] Cheriton, D. & Zwaenepoel, W., "Distributed Process Groups in the V Kernel", ACM Transactions on Computer Systems, 3 (2), 78-107 (May 1985).
[COOPER 85] Cooper, E., "Replicated Distributed Programs", ACM Operating Systems Review, 19 (5), 63-78 (December 1985).
[DOLEV 87] Dolev, D., Lamport, L., Pease, M. & Shostak, R., "The Byzantine Generals", in Concurrency Control and Reliability in Distributed Systems, 348-369, Van Nostrand Reinhold.
[LISKOV 83] Liskov, B. & Scheifler, R., "Guardians and Actions: Linguistic Support for Robust Distributed Programs", ACM Transactions on Programming Languages and Systems, 5 (3), 381-404 (July 1983).
[LISKOV 86] Liskov, B. & Guttag, J., Abstraction and Specification in Program Development, McGraw-Hill Book Company, New York (1986).
[MYERS 80] Myers, G. J., Advances in Computer Architecture, Second Edition, John Wiley & Sons, New York (1980).
[SCHLICHTING 83] Schlichting, R. D. & Schneider, F. B., "Fail-stop Processors: An Approach to Designing Fault-tolerant Computing Systems", ACM Transactions on Computer Systems, 1 (3), 222-238 (1983).
[WEGNER 68] Wegner, P., Programming Languages, Information Structures and Machine Organization, McGraw-Hill Book Company, New York (1968).
[XEROX 81] Document XSIS 038112, Xerox Corporation, Stamford, CT 06904, December 1981.


Appendices

A  A complete example

This section provides all the information necessary to build the client and server portions of a trivial example service.

A.1 The interface specification

-- test1.idl - specification for the test1 service
--
-- Copyright (c) 1988, ANSA Project
--
-- @(#)test1.idl 2.1

test1 : INTERFACE =
BEGIN
    DataSet : TYPE = SEQUENCE OF REAL;

    FSqrt : OPERATION [ number : REAL ] RETURNS [ REAL ];
    DSqrt : OPERATION [ number : LONG REAL ] RETURNS [ LONG REAL ];
    Stats : OPERATION [ V : DataSet ] RETURNS [ REAL, REAL ];
END.

This interface describes three operations: FSqrt, which returns the square root of a Real number, DSqrt, which returns the double precision square root of a double precision number, and Stats, which given a sequence of real numbers returns the mean and standard deviation of the sequence. This specification is stored in a file named test1.idl.

A.2 The server program

/*
 * server.dpl - server for the test1 service
 *
 * Copyright (c) 1988, ANSA Project
 *
 * @(#)server.dpl 2.1
 */

#include

void body()
{
    InterfaceRef eh;

    { eh } <- traderRef$Export("test1", "/", "Name test1", 16)
}

/*
 * routine to perform FSqrt operation
 */
int test1_FSqrt(number, result)
Real number;
Real *result;
{
    if (number < 0.0)
        return 0;
    *result = (Real) sqrt((double) number);
    return 1;
}

/*
 * routine to perform DSqrt operation
 */
int test1_DSqrt(number, result)
LongReal number;
LongReal *result;
{
    if (number < 0.0)
        return 0;
    *result = (LongReal) sqrt(number);
    return 1;
}

/*
 * routine to perform Stats operation
 */
int test1_Stats(V, mean, stddev)
DataSet V;
Real *mean, *stddev;
{
    Real *rp, first, second;
    Real N;
    int i;

    first = 0.0;
    second = 0.0;
    rp = V.data;
    N = (Real) V.length;
    for (i = 0; i < V.length; i++) {
        first += rp[i] / N;
        second += rp[i] * rp[i] / N;
    }

    *mean = (Real) first;
    *stddev = (Real) sqrt(second - first * first);
    return 1;
}

This source is stored in the file server.dpl. Note that the routines test1_FSqrt and test1_DSqrt return the value 0 if the input argument was less than 0 (only real square roots are defined by this service, not complex ones). The Export reference in the body() procedure requests that a service conforming to the type 'test1' be offered in the name tree at "/" with the value of the "Name" property being the string "test1". The fourth parameter to Export indicates that up to 16 requests may be queued for this server while it is processing the current request.

A.3 A possible client program

/*
 * client.dpl - simple client of the test1 service
 *
 * Copyright (c) 1988, ANSA Project
 *
 * @(#)client.dpl 2.1
 */

#include

void body(argc, argv)
int argc;
char *argv[];
{
    InterfaceRef ih;
    Real number, fsqrt, mean, stddev;
    LongReal dsqrt;
    DataSet V;
    Real *rp, Data[25];
    int i;

    { ih } <- traderRef$Import("test1", "/", "")

    V.length = 0;
    rp = Data;
    V.data = rp;
    for (i = 1; i < argc; i++) {
        sscanf(argv[i], "%f", &number);
        V.length++;
        *rp++ = number;
        { fsqrt } <- ih$FSqrt(number)
        { dsqrt } <- ih$DSqrt((LongReal)number)
        printf("fsqrt(%f) = %f, ", number, fsqrt);
        printf("dsqrt(%lf) = %lf\n", number, dsqrt);
    }

    { mean, stddev } <- ih$Stats(V)
    printf("mean = %f, stdev = %f\n", mean, stddev);
    ih$Discard
}

This source should be stored in a file named client.dpl. For each command argument, this client invokes FSqrt and DSqrt on the number. It also stores the number away into a DataSet variable for eventual invocation of the Stats operation. Note that a successful import does not guarantee any future success in operation invocation.

A.4 Creating and running the programs

Depending upon the system upon which you are building the programs, a sequence of commands similar to those below is necessary to construct the programs:

    stubc test1.idl
    prepc client.dpl
    prepc server.dpl
    compile client.c server.c ctest1.c stest1.c
    link client, ctest1, testbench/library
    link server, stest1, testbench/library

If the interface type name (test1 in this case) is not known to the trader, then the type name must be added to the trader's database with a command similar to:

    addtype test1 ansa

A command should be issued to cause the server to run in the background.

Finally, the service can be tested by invoking the client, as in:

    client 1.0 2.0 169.0 12.3 100.0

yielding:

    fsqrt(1.000000) = 1.000000, dsqrt(1.000000) = 1.000000
    fsqrt(2.000000) = 1.414214, dsqrt(2.000000) = 1.414214
    fsqrt(169.000000) = 13.000000, dsqrt(169.000000) = 13.000000
    fsqrt(12.300000) = 3.507136, dsqrt(12.300000) = 3.507136
    fsqrt(100.000000) = 10.000000, dsqrt(100.000000) = 10.000000
    mean = 56.860001, stdev = 67.159500

INDEX

access transparency ...... 14
activities ...... 2
ANSA ...... 3
APM ...... 1
architecture ...... 4
ARM ...... 2
asynchronous operations ...... 19
atomic operations ...... 20
atomicity ...... 20
capsule ...... 29
computation model ...... 12-13
concurrency transparency ...... 14
configuration ...... 16
configuration manager ...... 6
coordinated replica group ...... 24
dependability ...... 5
engineering model ...... 12
enterprise model ...... 11
failure transparency ...... 14
functionally distributed group ...... 23
group ...... 22
IDL ...... 9
information model ...... 12
interfaces ...... 17
interpreter ...... 9
location transparency ...... 14
migration transparency ...... 15
nucleus ...... 30
object ...... 29
object group ...... 22
objectives ...... 1
operations ...... 18
parallel replica group ...... 25
performance ...... 37
platform ...... 6
plug ...... 31
projections ...... 11
replication transparency ...... 15
REX ...... 38
security ...... 21
socket ...... 30
standards ...... 2
stub procedures ...... 31
synchronous operations ...... 19
technology model ...... 12
terminations ...... 20
testbench ...... 27
thread ...... 30
trader ...... 22
trading ...... 22
transparency ...... 14
type checking ...... 16