Using Compile-Time Reflection for Object Checkpointing 1

Marc-Olivier Killijian, Jean-Charles Fabre, Juan-Carlos Ruiz-Garcia

LAAS-CNRS, 7 Avenue du Colonel Roche 31077 Toulouse cedex, France {killijian, fabre, ruiz}@laas.fr

Abstract. This paper tackles the problem of checkpointing object-oriented programs using a reflective approach. The objective of the technique and of the corresponding tool is to provide checkpointing methods to application classes. In conventional object-oriented fault-tolerant systems, the implementation of the methods that save and restore the state of objects is often delegated to the application programmer; the dependability of the system thus relies on the programmer's ability to implement these core functions. Our approach enables the automatic provision of these methods for classes that obey some programming restrictions. In a second step, we present an optimization of this technique using runtime reflection: the data checkpointed correspond to the set of attributes that have been modified since the last checkpoint. Finally, some preliminary results of the evaluation of these techniques and an overview of the CORBA system framework in which they are used are described.

1 Introduction

The definition of application checkpoints is a major issue in the design and the implementation of dependable systems, especially for building fault tolerance strategies. Checkpointing the state of individual active objects is needed for many replication strategies and is mandatory for cloning a new replica during the reconfiguration of the system after a failure. The problem of checkpointing a distributed application involves complex algorithms to ensure the consistency of the distributed recovery state of the application. All these algorithms make the assumption that individual checkpoints of the application objects can be obtained easily. However, this is a strong assumption and, in practice, it is not so easy when complex active objects are considered. The available solutions either rely on very hardware- or system-dependent mechanisms or delegate this task to the application programmer. These solutions have many drawbacks, and recent approaches investigate the use of compiler-assisted checkpointing techniques [9, 15]. The proposed solution can be compared to the latter and shows that compile-time reflection is a promising approach to tackle the problem of obtaining the state of individual objects. This state information is here obtained at a high level of abstraction and handled through the combined use of both compile-time and runtime reflection. The technique proposed here is part of the definition of a metaobject protocol (MOP) for implementing fault tolerance in CORBA applications [8], although its use is not restricted to this topic. Obtaining the state of an object is part of this MOP but is also useful for other aims, e.g. object migration in mobile systems or simply load balancing.

In this paper, we investigate this problem for object-oriented programs; the real question is: what is the state of an individual object in an object-oriented program? This state is required to resume execution on a remote system. After a brief analysis of current approaches and an introduction to our solution (Section 2), we describe a new approach to object state capture using compile-time reflection. Indeed, using class definitions and implementations at compile-time, we are able to generate automatically methods, such as SaveState and RestoreState, responsible for the capture and restoration of the state of objects of this class (Section 3). An optimization of this approach using runtime reflection is then proposed: thanks to runtime information, we are able to checkpoint only the attributes that have been modified since the last checkpoint (Section 4). A comparison of our approach with a conventional non-reflective compile-time solution illustrates both the performance efficiency and the coverage of the state capture (Section 5). Section 6 gives an overview of the system in which these mechanisms are presently used.

1 This work has been partially supported by the European Esprit Project n° 20072, DEVA, by a contract with FRANCE TELECOM (ref. ST.CNET/DTL/ASR/97049/DT) and by a grant from CNRS (National Center for Scientific Research in France) in the framework of international agreements between CNRS and JSPS (Japan Society for the Promotion of Science).

2 Problem Statement and Related Work

The available techniques to obtain the state information of an object vary widely with the abstraction level at which they apply and, of course, depend on the information that is actually needed. The state information of an active object encompasses two kinds of information: the internal state of the object (namely its state variables) and the information in transit (invocation messages). We concentrate here on the internal state; the information in transit is dealt with by end-to-end communication protocols that are not described in this paper, see [5]. The internal state of an active object is mapped at a very low level to memory objects (segments, regions, etc.) which are handled either by the operating system or, better, by a middleware (the runtime layer for the application objects). Understanding the mapping implies diving into the operating system or the middleware to identify which memory objects hold the internal state of a given application object. This approach often leads to a customization of the software runtime layer in order to obtain such detailed information. This is a first drawback of this solution, since off-the-shelf runtime layers cannot be used in this case. A second drawback is that it provides only raw information concerning the state of an object. Such information is not appropriate to install and initialize a newly created object on a different site of the system, for the following reasons:

1. Some data items are not significant on a different site since their value is very site dependent (e.g. pointers to objects, file descriptors, semaphore descriptors, etc.).
2. The semantics of state information variables must thus be interpreted in order to perform an appropriate initialization of such variables on a remote site. Indeed, some data items are related to internal objects within the middleware of the remote site; a raw copy of the memory objects is thus not consistent on the target site and some additional actions must be performed.

These actions must create and initialize the corresponding internal objects within the runtime layer of the remote site. As soon as these actions complete and the new instance is updated with the current state information, the new object copy holds a consistent state from which the computation can continue. This important observation indicates that semantic information regarding such data items is necessary to obtain a consistent checkpoint for the new object copy.

Another approach consists in providing the user with libraries of functions [14] or classes [10, 11, 13] to deal with fault tolerance protocols and state information. In object-oriented terms, the application classes must inherit from some base class in which the two methods SaveState and RestoreState are defined as virtual methods. This second approach is clearly not transparent for the user and relies on the user's skills to use the library functions or to implement the virtual methods correctly. This means that the implementation of both methods must be error-free. For instance, a wrong implementation of the SaveState function will lead the new copy to hold an inconsistent state from which the execution cannot be restarted. This is a major drawback of such solutions. Moreover, since the implementation can be very difficult for very complex objects, the probability of introducing software faults (bugs) in so doing is certainly very high.

Another side effect of such solutions is that any modification (evolution) of the application object implementation must be consistently reflected in the SaveState and RestoreState implementation. Because different application programmers can perform the long-term evolution of such application objects, a consistent implementation of these methods is not guaranteed during the lifetime of the application. Clearly, in the short or in the long term the whole distributed application will fail. The definition and the implementation of such core functions must therefore be tool assisted. Following this idea, some works have investigated the use of customized compilers to generate these functions automatically, e.g. [15]. More recently, the use of compile-time reflection was introduced to tackle this problem [7, 8]. In this type of solution, the identification of the internal state of application objects is very language dependent. This is however the only way to obtain detailed information about the internal state of application objects on off-the-shelf runtime systems. Any other solution would provide a coarse view of this information and make the interpretation of site-dependent objects very uncertain. A language-independent solution would require the runtime system (the middleware, e.g. the ORB) to reify such detailed information, i.e. making the runtime system reflective. Both solutions rely on a reflective approach, the latter involving in any case a customization of the runtime system or the use of an appropriate reflective runtime system (only the Java runtime system provides type information, using the java.lang.reflect package).

It is worth mentioning that pure object-oriented languages are easier to handle with this approach than hybrid languages such as C++. However, the experiments reported in this paper have been performed for C++ objects and the tool is based on Open C++ [1]. This is why some programming conventions and restrictions have been considered in order to enforce a strong encapsulation principle and avoid programming statements that may lead to uncontrolled side effects on the object state. The necessary restrictions would have been very different using Java. We will comment on this point later, in section 4.3, but programming restrictions are not a real problem from our viewpoint. The most important issue is that the state obtained must be complete and consistent with the current state of the computation, enabling the computation to continue with a new object copy on a remote site. This is mandatory for a fault-tolerant system whose first objective is to tolerate faults, e.g. object crashes. The first role of the tool is then to enforce these programming conventions; any class not obeying the programming restrictions we have identified will be rejected. It is up to the application designer/programmer either to implement the SaveState and RestoreState methods for this class (not recommended) or to iterate on the design/implementation of the class until it obeys the programming restrictions. A second important design issue for the tool is the coverage of the object state, i.e. making sure that all the information necessary for the new copy to resume execution is obtained. This property must be ensured when getting the whole state of the object (cf. Section 3) but also when obtaining any partial state information (cf. Section 4). Any superset of such necessary information is acceptable, although the objective is to minimize the redundancy for performance reasons.
This work is part of the definition and the implementation of a metaobject protocol for fault-tolerant CORBA applications [8]. This MOP has two roles: (i) the interception of object creation, deletion and method invocation and (ii) the capture/restoration of the object state. With this MOP, metaobjects can implement non-functional mechanisms such as active or passive replication for fault tolerance and authentication for communication security, as previously illustrated in the Friends system [6]. This MOP can also be used for other aims, as previously mentioned.

3 Object State using Compile-Time Reflection

3.1. Context and Motivations

We describe here how compile-time reflection can be used to define and generate the implementation of both methods SaveState and RestoreState. These two methods are first generated to obtain the whole state of an application object, namely the full set of its private attributes. Public attributes are forbidden in a first step to ensure a strong encapsulation principle (a programming restriction that can be bypassed, see section 4.4). We also assume in the remainder of this paper that base-level methods are executed sequentially within application objects (no internal concurrency; this problem has not been tackled yet). The state is thus obtained at runtime on the source object and forwarded within checkpoint messages through metaobjects to the target object (see Fig. 1). We also assume that a checkpoint can be taken after one or several method executions; this strategy is left open to the metaobjects.

[Diagram: Source Object -> SaveState -> Meta-Object -> checkpoints -> Meta-Object' -> RestoreState -> Target Object]

Fig. 1. Metaobjects checkpointing objects

The use of compile-time reflection enables the needed static information to be obtained and the adequate source code modifications to be made to get dynamic information when necessary. In practice, metaclasses are used to analyze and translate classes and to generate new methods at compile-time. These features enable the automatic generation of both the state information data structures and the SaveState and RestoreState methods for each class. The reflective compiler OpenC++ v2 [1] was used in our experiments as a powerful macro processing system [2]. The generic approach proposed in this paper needs to be applied to a convenient object model. CORBA [12] provides such a convenient object model. However, depending on the programming language used to implement CORBA objects, some programming restrictions must be obeyed. Finally, the IDL compiler is used in combination with the metacompiler to manage the state information.

3.2 Approach Overview

The starting point is that compile-time reflection enables the definitions of application classes to be reified during compilation. This includes attribute names and types, parent classes, object references (composition), etc. This information is handled by metaclasses at compile-time. Given this information, (i) a data structure (called StateContainer) is defined to hold the object state and (ii) new class methods to save and restore the object state are created. In brief, the role of these methods is to write the attributes' values into the StateContainer or to copy the StateContainer fields back to the attributes, respectively. The StateContainer structure is defined by a metaclass that creates a field for each attribute of the class. Every field in this structure holds an IDL type. The translation of the attribute types to these fields is performed according to the C++-to-IDL mapping defined by OMG [12]. A simple example of such a StateContainer data structure is given in Fig. 2.

    class Example {
        int a, b;
        float c, d;
        char e;
        void set(int i, int j, float k, float l, char m);
        void calculus(int step);
        .....
    };

    struct StateContainer {
        long a, b;
        double c, d;
        char e;
    };

Fig. 2. Example of a StateContainer Structure

The generated SaveState and RestoreState methods respectively fill and read the corresponding StateContainer data structure with the state information of a concrete object at runtime. For each class, the interface of these methods can thus be defined as in Fig. 3.

    StateContainer SaveState();
    void RestoreState(StateContainer);

Fig. 3. Interface for SaveState and RestoreState

The body of these methods is also generated by metaclasses. Both aspects are presented in the following sections.
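To make the interface of Fig. 3 concrete, the following is a minimal hand-written sketch of what the generated code could look like for the Example class of Fig. 2. It is an illustration under stated assumptions and not the output of the actual tool: the method bodies, the `sum` accessor and the `cloneByCheckpoint` helper are hypothetical, and plain C++ types stand in for the IDL types.

```cpp
#include <cassert>

// Hypothetical generated structure: one field per attribute of Example,
// with the type correspondence shown in Fig. 2.
struct StateContainer {
    long a, b;    // from int a, b
    double c, d;  // from float c, d
    char e;       // from char e
};

class Example {
public:
    void set(int i, int j, float k, float l, char m) {
        a = i; b = j; c = k; d = l; e = m;
    }
    int sum() const { return a + b; }  // demo accessor, not part of Fig. 2

    // Copies every attribute into a fresh StateContainer.
    StateContainer SaveState() const {
        StateContainer s;
        s.a = a; s.b = b; s.c = c; s.d = d; s.e = e;
        return s;
    }
    // Copies every field of the StateContainer back into the attributes.
    void RestoreState(const StateContainer& s) {
        a = static_cast<int>(s.a);
        b = static_cast<int>(s.b);
        c = static_cast<float>(s.c);
        d = static_cast<float>(s.d);
        e = s.e;
    }

private:
    int a = 0, b = 0;
    float c = 0, d = 0;
    char e = 0;
};

// Hypothetical helper playing the role of the metaobjects: the state of the
// source object is captured and installed into a newly created copy.
Example cloneByCheckpoint(const Example& source) {
    Example copy;
    copy.RestoreState(source.SaveState());
    assert(copy.SaveState().a == source.SaveState().a);  // round-trip sanity check
    return copy;
}
```

In this sketch the application code never touches the StateContainer directly; only the checkpointing layer does, which is the transparency property sought by the approach.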

3.3 Object Attributes Handling and Methods Generation

Object Model Assumptions. To ensure the consistency of the checkpoints obtained using this approach, we need a real object model. We consider a pure object world: objects have only private attributes and are single-threaded (no internal concurrency). The attribute types considered are simple types, classes, pointers to objects, arrays of the three preceding types (simple, class, and pointer to object) and object references by means of CORBA_Reference. The work presented in this paper does not solve all the problems identified in the previous section, in particular regarding the use of internal variables within the middleware. This is why a pure object model was considered; even if local pointers cannot be raw-copied, this approach makes it possible to create remote copies of local objects by re-creating them. If the pure object assumption is not met, i.e. if the system/middleware is hybrid, one possible solution is to wrap the system calls using object servers. For instance, a file object server can encapsulate file accesses and log local actions; this would make it possible to copy remote file references. The same approach can be applied to other site-dependent variables.

Simple Types. The handling of simple C++ types is the basic case: as explained previously, each member of simple type in the class is mapped to an element of the StateContainer data structure following the IDL to C++ mapping. An example is given in Fig. 4. The generation of the StateContainer data structure for simple types is straightforward: the metaclass parses the class definition for attribute declarations. For each attribute, the metaclass retrieves its type, gets the corresponding IDL type from a dictionary and generates an entry in the StateContainer for this attribute (see Fig. 5). The generation of the SaveState and RestoreState methods follows the same scheme. The SaveState method creates a StateContainer data structure (see Fig. 6) and writes each attribute into the corresponding field of the structure. Similarly, the RestoreState method interprets the StateContainer input information and writes each field into the corresponding attribute (see Fig. 7).
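The dictionary-driven generation step described above can be sketched as follows. This is only an illustration, not the OpenC++ metaclass itself: the function name `generateStateContainer`, the `Attribute` pair type and the dictionary contents are assumptions, the latter limited to the type correspondences visible in Fig. 2 and Fig. 5, and an exception stands in for the rejection of a class by the tool.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// An attribute declaration as (C++ type name, attribute name); hypothetical
// stand-in for the information a metaclass obtains from the class definition.
using Attribute = std::pair<std::string, std::string>;

// Emits the text of a StateContainer declaration with one field per attribute.
std::string generateStateContainer(const std::vector<Attribute>& attributes) {
    // Assumed dictionary, restricted to the correspondences shown in the figures.
    static const std::map<std::string, std::string> dictionary = {
        {"int", "long"}, {"float", "double"}, {"char", "char"}};

    std::string out = "struct StateContainer {\n";
    for (const Attribute& attr : attributes) {
        assert(!attr.second.empty() && "an attribute needs a name");
        auto it = dictionary.find(attr.first);
        if (it == dictionary.end()) {
            // Models the rejection of a class that uses an unsupported type.
            throw std::invalid_argument("unsupported attribute type: " + attr.first);
        }
        out += "    " + it->second + " " + attr.second + ";\n";
    }
    out += "};\n";
    return out;
}
```

Applied to the attributes `int N` and `char e`, the sketch emits a structure with the fields `long N` and `char e`; any type missing from the dictionary raises an error instead of silently producing an incomplete checkpoint structure.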

Arrays. In the object model considered, arrays can be of different types: arrays of simple data types, arrays of objects, arrays of strings and arrays of pointers to objects. However, the handling of arrays can be expressed in a generic manner: each element of the array is written into the corresponding array element of the StateContainer data structure. When array elements are of simple type, individual elements are stored in the same way as simple type attributes. This also applies to arrays of strings. In all other cases, a recursive technique is used to handle each element, i.e. each object in the array. The example given in Figs. 4-7 illustrates the technique for both simple data types and arrays.

    class NQueen {
    public:
        NQueen(int num);
        bool compute(int l);
    private:
        char ChessBoard[MAX][MAX];
        inline bool check(int R, int C);
        int N;
        int placed;
        int nbsteps;
    };

Fig. 4. Original class definition

    struct StateContainer {
        char ChessBoard[100][100];
        long N;
        long placed;
        long nbsteps;
    };

Fig. 5. StateContainer Data Structure

    StateContainer NQueen::GetState()
    {
        StateContainer State;
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                State.ChessBoard[i][j] = ChessBoard[i][j];
            }
        }
        State.N = N;
        State.placed = placed;
        State.nbsteps = nbsteps;
        return State;
    }

Fig. 6. The SaveState Method

    void NQueen::SetState(StateContainer State)
    {
        for (int i = 0; i < 100; i++) {
            for (int j = 0; j < 100; j++) {
                ChessBoard[i][j] = State.ChessBoard[i][j];
            }
        }
        N = State.N;
        placed = State.placed;
        nbsteps = State.nbsteps;
    }

Fig. 7. The RestoreState Method

Object Composition and Delegation. In the object model described above we make a clear distinction between internal objects (composition) and external objects (delegation), see Fig. 8. The former correspond to objects addressed either directly (an object is an attribute of another object) or by reference (using pointers) within a CORBA object. The latter correspond to the delegation relationship to a different CORBA object using explicit references, i.e. CORBA_Reference. This has a strong impact on the checkpointing technique since external objects and internal objects are handled in a different way.

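[Diagram: SaveState() calls propagated to internal objects (composition); copy_ref() for the reference to an external object, which has its own SaveState() (delegation)]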

Fig. 8. Composition vs. Delegation

External objects can be shared by several objects and are checkpointed independently from the objects holding a reference to them. They are thus checkpointed by their own metaobject. This means that any CORBA object is checkpointed independently. However, the reference to an external object has to be duplicated and stored into the StateContainer. Internal objects are members of one instance of a class, so they are really part of the state of objects of this class. These objects are checkpointed recursively; for instance in Fig. 9, the SaveState method of class B calls the SaveState method of class A to get the state of both Object_1 and Object_2. The corresponding states are stored into the StateContainer data structure of class B. This corresponds to a deep copy of the objects, while the delegation relationship implies only shallow copies (duplication of the reference).

Examples of composition:

    class A;
    class B {
        A Object_1;
        A* Object_2;
    };

Examples of delegation:

    class C {
        CORBA_Reference delegate;
    };

Fig. 9. Composition versus Delegation

Similarly, the RestoreState method performs the restoration of the state recursively. It is worth noting that, during this operation, objects created since the last checkpoint must at least be re-created on the target. In practice, all objects addressed by pointers are created and then updated with the last corresponding state available in the checkpoint.
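The difference between deep and shallow copies can be sketched as follows. This is a simplified illustration under several assumptions: for brevity a single class B holds both the composed objects of class B and the delegate of class C from Fig. 9, a std::string stands in for CORBA_Reference, std::unique_ptr replaces the raw pointer, and the AState and BState structures, the `has_object_2` field and the accessors are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>

struct AState { long value; };

class A {
public:
    explicit A(int v = 0) : value(v) {}
    int get() const { return value; }
    AState SaveState() const { AState s; s.value = value; return s; }
    void RestoreState(const AState& s) { value = static_cast<int>(s.value); }
private:
    int value;
};

struct BState {
    AState object_1;       // deep copy of the embedded object
    bool has_object_2;     // false when the pointer was null at checkpoint time
    AState object_2;       // deep copy of the pointed-to object
    std::string delegate;  // shallow copy: only the reference is duplicated
};

class B {
public:
    void init(int v1, int v2, const std::string& ref) {
        object_1 = A(v1);
        object_2.reset(new A(v2));
        delegate = ref;
    }
    // Internal objects are saved recursively; the external one is not.
    BState SaveState() const {
        BState s;
        s.object_1 = object_1.SaveState();
        s.has_object_2 = (object_2 != nullptr);
        s.object_2 = object_2 ? object_2->SaveState() : AState();
        s.delegate = delegate;
        return s;
    }
    void RestoreState(const BState& s) {
        object_1.RestoreState(s.object_1);
        if (s.has_object_2) {
            if (!object_2) object_2.reset(new A());  // re-create a missing object
            object_2->RestoreState(s.object_2);
        } else {
            object_2.reset();
        }
        delegate = s.delegate;
    }
    int first() const { return object_1.get(); }
    int second() const {
        assert(object_2 && "object_2 has not been created");
        return object_2->get();
    }
    const std::string& reference() const { return delegate; }
private:
    A object_1;                   // composition by value
    std::unique_ptr<A> object_2;  // composition through a pointer
    std::string delegate;         // stand-in for CORBA_Reference
};
```

Restoring into a fresh instance of B re-creates the object behind the pointer before updating it, whereas the delegate is restored as a plain copy of the reference; the external object itself is left to its own metaobject.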

Class Inheritance. Inheritance (presently, only single inheritance is considered) was identified in our previous experiments [6] as problematic when using basic runtime MOPs such as those provided by Open C++ v1 [3]. Thanks to compile-time reflection, inheritance can be handled in a recursive way, as for object composition: each class is responsible for checkpointing its own set of attributes, and derived classes automatically call the Save/RestoreState methods of their base classes in order to complete the checkpoint. Polymorphism can also be used with these techniques since base objects are responsible for obtaining their own state. Consider a class hierarchy composed of classes A, B and C, all inheriting from a base class M; another class D owns an attribute P whose type is pointer to class M. When an instance of D saves its state, it calls P->SaveState(); the object pointed to by P can save its own state whether it is of class A, B or C.
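The recursive handling of inheritance and the polymorphic call through P can be sketched as follows. The sketch assumes virtual Save/RestoreState methods and uses a std::vector<long> as a simplified, hypothetical stand-in for the state container; only classes M, A and D of the example are shown, and their attributes are invented for the illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using State = std::vector<long>;  // simplified stand-in for a state container

class M {
public:
    explicit M(int id) : id_(id) {}
    virtual ~M() = default;
    // Each class appends only its own attributes.
    virtual void SaveState(State& out) const { out.push_back(id_); }
    // Reads the attributes of this class starting at pos and returns the
    // position just after them.
    virtual std::size_t RestoreState(const State& in, std::size_t pos) {
        assert(pos < in.size());
        id_ = static_cast<int>(in[pos]);
        return pos + 1;
    }
    int id() const { return id_; }
private:
    int id_;
};

class A : public M {
public:
    A(int id, int extra) : M(id), extra_(extra) {}
    void SaveState(State& out) const override {
        M::SaveState(out);      // the base class completes the checkpoint first
        out.push_back(extra_);
    }
    std::size_t RestoreState(const State& in, std::size_t pos) override {
        pos = M::RestoreState(in, pos);
        assert(pos < in.size());
        extra_ = static_cast<int>(in[pos]);
        return pos + 1;
    }
    int extra() const { return extra_; }
private:
    int extra_;
};

class D {
public:
    explicit D(M* p) : P(p) {}
    // The dynamic type of *P decides which SaveState is executed.
    State SaveState() const { State s; P->SaveState(s); return s; }
    void RestoreState(const State& s) { P->RestoreState(s, 0); }
private:
    M* P;  // not owned in this sketch
};
```

Because D only calls P->SaveState(), adding a new class derived from M requires no change to D in this sketch.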

Packing and Portability Issues. The representation of the StateContainer is an important issue. This representation should be generic for two purposes: 1. the state of any object must be handled within a StateContainer independently of its class; 2. using a generic format for the StateContainer also enables the propagation of this state information to different environments. For this purpose, the StateContainer data structure is defined in IDL and mapped to the Any IDL type. The IDL compiler automatically generates conversion functions from and to the type Any for each StateContainer data structure (see Fig. 10). The Any type can hold any data structure.

    Any <<= (StateContainer)
    Any >>= (StateContainer)

Fig. 10. Mapping of StateContainer/Any and Any/StateContainer

As a consequence, we can define in a generic manner the SaveState and RestoreState interfaces previously defined in Fig 3. Their argument and return value are now of Any type. Thanks to the use of CORBA, the checkpoints obtained are machine/system/ORB independent. This feature can help to implement dependability mechanisms on different platforms. Another interesting aspect of this approach is that the structure of the state information is now visible at runtime. Because the StateContainer data structure is stored in the Interface Repository of the ORB, such information can be obtained and manipulated for other aims.

3.4 Summary of the Programming Restrictions

In the previous sections, we have considered that objects have only private attributes. Several other restrictions also apply:

• No friend class or function. The definition of public or protected attributes, or of a friend class or function, may violate the encapsulation principle. Public or protected attributes can be accessed outside of the scope that metaclasses can presently control. A friend class (or function) can access the private attributes of the object and so can modify its state in an uncontrolled way.
• No global variable. Our technique cannot deal with global variables, since they could be used, for instance, to store temporary data, information that is part of the internal state of an object, data shared between several objects, etc. Presently, we cannot control their use with Open C++. Moreover, they do not comply with a strict object model.
• One level of indirection for pointers. Enabling several levels of indirection would require a very complex runtime analysis and would lead to very complex management software. This restriction does not limit the programming features available to application programmers very much; C++ pointers can then be seen as Java object references.
• Single-threaded objects. The checkpointing of multi-threaded objects involves having control over context switches, which is not presently the case with the tools we have used. Moreover, such information is often runtime layer dependent. Since we focus attention on off-the-shelf ORBs, this information is not available. There is currently much research on this topic, e.g. [7].

4 DeltaState Checkpointing

We have shown in section 3 that compile-time reflection enables transparent checkpointing methods to be provided for object-oriented applications. We think that this technique can be optimized under certain conditions using runtime information, in particular, using runtime reflection; this is the topic of this new section.

4.1 Basic Idea

As stated before, an object state is represented by the values of the object's attributes. During the lifetime of the object, this state evolves according to the application control flow. The technique presented in section 3 provides checkpoints that carry the whole object state, i.e. the whole set of attributes of the object. However, between two checkpoints, only a few of the object's attributes are likely to be modified. We therefore propose to use runtime information to build delta-states that carry only those attributes that have been modified since the last checkpoint. This subset of attributes is called the delta-state. In brief, to identify at runtime the attributes that have been modified, the source code of a class is modified at compile-time: the metaclass parses the class code and inserts some additional code to flag the attributes when a modification is performed. The metaclass also generates two methods to get and set delta-states: SaveDeltaState and RestoreDeltaState.

4.2 Technical Overview

In the FullState technique, metaclasses generate the StateContainer data structure according to the class attributes. The approach for the DeltaState technique is slightly different. The DeltaStateContainer data structure is defined generically, i.e. the same DeltaStateContainer structure definition is used for any class. Whereas the FullState technique simply defines the Save/RestoreState methods, the DeltaState technique implies more complex modifications of the application source code. A flag is associated with each attribute of the class. Each method of the class is parsed and modified. These modifications occur for each possible write access to an attribute: when such a situation is identified, code is inserted to set the flag associated with this attribute. The SaveDeltaState method only saves the attributes that have been flagged and then clears the flags. The RestoreDeltaState method restores only the attributes that are stored in the DeltaStateContainer.

DeltaStateContainer Data Structure. The data stored in delta-states vary at runtime according to the object activation. We thus need an adaptable DeltaStateContainer to store the delta-states. For a given object instance, the delta-state is a sequence of members of this object, i.e. an arbitrary subset of the object's attributes. For each attribute stored in a DeltaStateContainer, we need a structure that holds both an identifier of the attribute and its corresponding value. We thus define an element of the DeltaStateContainer (called AttributeInfo) as a tuple <attribute_id, attribute_value> (see Fig. 11). The final DeltaStateContainer is a generic sequence of AttributeInfo, i.e. a list of AttributeInfo. This definition is generic, i.e. it does not depend on the class considered.

    struct AttributeInfo {
        long attribute_id;
        Any attribute_value;
    };
    typedef sequence<AttributeInfo> DeltaStateContainer;

Fig. 11. DeltaStateContainer Definition

The attribute_value field of an AttributeInfo stores the attribute value. Since the attribute types considered are simple types, arrays and objects (composition), this attribute_value can be either a simple type attribute, an array element, or the DeltaStateContainer of an internal object.

Simple Types. As mentioned above, when a metaclass encounters a write access to an attribute, some code is inserted to set the associated flag. The SaveDeltaState method just has to insert an AttributeInfo element into the DeltaStateContainer for each attribute whose flag is set. Once the whole DeltaStateContainer is built, all the flags can be cleared to reflect that a checkpoint has been taken. The RestoreDeltaState method interprets a DeltaStateContainer: each attribute identified in the DeltaStateContainer is assigned its new value.
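A minimal sketch of this flagging scheme for simple types is given below. It is hand-written for illustration and does not reproduce the code inserted by the metaclass: the Counter class, its attribute identifiers and the use of std::bitset for the flags are assumptions, and a long value stands in for the Any field of AttributeInfo.

```cpp
#include <bitset>
#include <cassert>
#include <vector>

// Simplified AttributeInfo: a long stands in for the Any value of Fig. 11.
struct AttributeInfo { long attribute_id; long attribute_value; };
using DeltaStateContainer = std::vector<AttributeInfo>;

class Counter {
private:
    enum { STEPS = 0, PLACED = 1, ATTRIBUTE_COUNT = 2 };  // hypothetical ids
public:
    // Each write access is followed by the (here hand-inserted) flag update.
    void step()       { steps = steps + 1; modified.set(STEPS); }
    void place(int n) { placed = n;        modified.set(PLACED); }

    // Saves only the flagged attributes, then clears all flags.
    DeltaStateContainer SaveDeltaState() {
        DeltaStateContainer delta;
        if (modified.test(STEPS))  delta.push_back({STEPS, steps});
        if (modified.test(PLACED)) delta.push_back({PLACED, placed});
        modified.reset();
        return delta;
    }
    // Assigns each attribute present in the container; others are untouched.
    void RestoreDeltaState(const DeltaStateContainer& delta) {
        for (const AttributeInfo& info : delta) {
            switch (info.attribute_id) {
                case STEPS:  steps  = static_cast<int>(info.attribute_value); break;
                case PLACED: placed = static_cast<int>(info.attribute_value); break;
                default: assert(false && "unknown attribute id");
            }
        }
    }
    int getSteps() const { return steps; }
    int getPlaced() const { return placed; }
private:
    int steps = 0;
    int placed = 0;
    std::bitset<ATTRIBUTE_COUNT> modified;  // one flag per attribute
};
```

If only step() is called between two checkpoints, the second delta-state of this sketch carries a single AttributeInfo, and a checkpoint taken with no intervening write is empty.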

Arrays. The delta-state of an array is the sequence of elements of this array that have been modified since the last checkpoint. We thus need a way to identify individual array elements. An array element is identified by a set of indexes:

    typedef sequence<long> Index;

The value stored into the AttributeInfo element of the DeltaStateContainer will thus be a tuple <index, value>, see Fig. 12.

    struct ArrayElementData {
        Index index;
        any value;
    };

Fig. 12. Array Element Holding Structure

To handle the delta-state of arrays, i.e. a set of array's elements, we define runtime metaobjects called MetaArrays. They are responsible (i) for flagging the elements of the arrays that have been modified and (ii) for the retrieval of the sequence of elements modified since the last checkpoint.

The definition of these MetaArrays is illustrated in Fig. 13. When an element of an array is written, the Append method of the corresponding MetaArray is called to flag this element. The other methods are used by the SaveDeltaState method to retrieve the list of indexes that have been flagged in the array.

    class MetaArray {
    public:
        void Append(Index index);   // Record the index as modified
        int Number();               // # of modified elements
        void Clear();               // Checkpoint has been taken
        Index operator[](int n);    // Returns the nth index
        Index_List All();           // Returns the list of indexes
    };

Fig. 13. Array Runtime Metaobjects

Thanks to this notion of MetaArrays, the SaveDeltaState method can easily retrieve the delta-state of an array, element by element, storing them into the object's DeltaStateContainer. The RestoreDeltaState method interprets the ArrayElementData contained into the DeltaStateContainer to restore the state of an array.
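One possible implementation of the MetaArray interface of Fig. 13, together with its use by an array-holding class, is sketched below. The internal representation (a vector of indexes with duplicate suppression), the Board class and the use of a long instead of the Any value in ArrayElementData are all assumptions made for the illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

using Index = std::vector<long>;        // one entry per array dimension
using Index_List = std::vector<Index>;

class MetaArray {
public:
    // Records the index as modified; an index already recorded is kept once.
    void Append(const Index& index) {
        if (std::find(modified.begin(), modified.end(), index) == modified.end())
            modified.push_back(index);
    }
    int Number() const { return static_cast<int>(modified.size()); }
    void Clear() { modified.clear(); }  // called once a checkpoint is taken
    Index operator[](int n) const {
        assert(n >= 0 && n < Number());
        return modified[static_cast<std::size_t>(n)];
    }
    Index_List All() const { return modified; }
private:
    Index_List modified;
};

// Simplified ArrayElementData: a long stands in for the Any value of Fig. 12.
struct ArrayElementData { Index index; long value; };

class Board {
private:
    enum { SIZE = 4 };
public:
    // Instrumented write: the element is written, then flagged.
    void put(long r, long c, char v) {
        assert(r >= 0 && r < SIZE && c >= 0 && c < SIZE);
        cells[r][c] = v;
        meta.Append(Index{r, c});
    }
    // Retrieves the modified elements one by one, then clears the MetaArray.
    std::vector<ArrayElementData> SaveDeltaState() {
        std::vector<ArrayElementData> delta;
        for (int n = 0; n < meta.Number(); ++n) {
            Index idx = meta[n];
            delta.push_back({idx, cells[idx[0]][idx[1]]});
        }
        meta.Clear();
        return delta;
    }
    void RestoreDeltaState(const std::vector<ArrayElementData>& delta) {
        for (const ArrayElementData& e : delta)
            cells[e.index[0]][e.index[1]] = static_cast<char>(e.value);
    }
    char at(long r, long c) const { return cells[r][c]; }
private:
    char cells[SIZE][SIZE] = {};
    MetaArray meta;
};
```

Writing the same element twice between two checkpoints yields a single entry in the delta-state of this sketch, carrying the latest value of the element.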

Object Composition. As for the FullState checkpointing technique, the DeltaState technique handles object composition recursively. For instance, considering class B (see Fig. 9), the SaveDeltaState method of B calls the SaveDeltaState method of class A on objects Object_1 and Object_2 and gets two DeltaStateContainers in return; if these DeltaStateContainers are not empty, they are then stored into the DeltaStateContainer of the class B instance. Similarly, the RestoreDeltaState method performs the restoration of the state recursively. Objects accessed using pointers are re-created when there is a DeltaStateContainer for them. A similar approach also applies to inheritance hierarchies.

4.3 DeltaState Restrictions

Some additional restrictions are necessary to apply the DeltaState technique to C++ objects. However, when the application program does not meet these restrictions, the FullState checkpointing technique can be applied. Given the following restrictions, the choice between the FullState and the DeltaState checkpointing techniques can only be made at runtime. This is why the DeltaState technique is always used in combination with the FullState technique.

Pointers in method arguments. To prevent the violation of encapsulation by sending an attribute pointer to a method, the general use of pointers in method arguments is forbidden. This is a strong restriction because many programmers use such a feature. We discuss in section 4.4 a way to overcome this restriction, as for all other encapsulation-based restrictions. It is worth noting that this restriction would not apply to a pure object oriented language such as Java simply because pointers do not exist in Java.

Pointer arithmetic and local pointers. Pointer arithmetic may have side effects on the attributes. Using this C++ feature, subtle modifications can be performed on any attribute of an object; for instance, a method of an object can modify the state of another object. Since DeltaState checkpointing requires us to be aware of any possible attribute modification, we cannot allow such a feature. When a metaclass identifies pointer arithmetic in a method body, the use of DeltaState checkpointing is disabled for the next checkpoint, i.e. the next checkpoint is a FullState one. The use of local pointers within a method leads to the same complexity as pointer arithmetic. Indeed, it is very difficult to know whether the pointer is used to access a "legal" item, or whether it is used (or misused) to access some object attribute.
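The runtime choice between the two techniques can be pictured with the small sketch below. It is not the generated code: the class CheckpointPolicy, its method names, and the rule that the very first checkpoint is a FullState one are assumptions made for the illustration. The idea is only that a method flagged at compile time as using pointer arithmetic or local pointers invalidates the modification flags, so the next checkpoint falls back to FullState.

```cpp
#include <cassert>

enum class CheckpointKind { Full, Delta };

// Hypothetical per-object bookkeeping (an assumption of this sketch).
class CheckpointPolicy {
public:
    // Called at the start of any method whose body was found, at compile
    // time, to use pointer arithmetic or local pointers.
    void NoteUntrackedWrites() { delta_safe_ = false; }

    // Decide which technique the next checkpoint uses, then re-arm
    // DeltaState: a FullState checkpoint captures every attribute.
    CheckpointKind NextCheckpoint() {
        CheckpointKind kind =
            delta_safe_ ? CheckpointKind::Delta : CheckpointKind::Full;
        delta_safe_ = true;
        return kind;
    }

private:
    bool delta_safe_ = false;  // assumption: the first checkpoint is a full one
};

// Usage sketch: three consecutive checkpoints of one object.
inline void ExampleSequence() {
    CheckpointPolicy policy;
    CheckpointKind first = policy.NextCheckpoint();
    assert(first == CheckpointKind::Full);   // nothing to be relative to yet
    CheckpointKind second = policy.NextCheckpoint();
    assert(second == CheckpointKind::Delta); // flags are reliable
    policy.NoteUntrackedWrites();            // an unsafe method has run
    CheckpointKind third = policy.NextCheckpoint();
    assert(third == CheckpointKind::Full);   // fall back once
    (void)first; (void)second; (void)third;
}
```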

4.4 Overcoming Restrictions

The aim of some of the above restrictions is to enforce the encapsulation principle: an object can only have private attributes, no friend class or function can be defined and methods cannot have pointers as arguments. To overcome these restrictions, a solution consists in forcing every attribute of an object to be private and only accessed through controlled accessors: two methods, one for getting the attribute value and another for setting it. The original scope (public, protected or private) of the attribute defines the scope of the corresponding get and set methods. Each attribute access in the application source code is then replaced by a call to the appropriate accessor. The accessors are the only way to access any attribute, and the handling of the flags is located in them. This results in fewer restrictions for the end-user (see Fig. 14).
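A minimal sketch of what such accessors might look like after transformation is given below. The class Counter, the accessor naming scheme (get_ and set_ prefixes) and the use of a std::bitset for the modification flags are assumptions of this example, not the actual output of the metaclass. The original class is imagined to have had two public integer attributes, value and step, and a method whose body was `value += step;`.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

class Counter {
public:
    // One flag per attribute (hypothetical numbering).
    enum Attribute { kValue = 0, kStep = 1, kAttributeCount = 2 };

    // Accessors are public because the original attributes were public.
    int get_value() const { return value_; }
    void set_value(int v) { value_ = v; modified_.set(kValue); }
    int get_step() const { return step_; }
    void set_step(int s) { step_ = s; modified_.set(kStep); }

    // Original body `value += step;` rewritten to go through the accessors.
    void Advance() { set_value(get_value() + get_step()); }

    // Used by the checkpointing code to find what changed.
    bool IsModified(Attribute a) const {
        assert(a != kAttributeCount);
        return modified_.test(static_cast<std::size_t>(a));
    }
    // Called once a checkpoint has been taken.
    void ClearFlags() { modified_.reset(); }

private:
    int value_ = 0;
    int step_ = 1;
    std::bitset<kAttributeCount> modified_;
};
```

Because every write goes through a set method, the flag handling no longer depends on who performs the write, which is why the encapsulation-based restrictions can be lifted.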

Restriction                          FullState   DeltaState
No global variables and functions        •           •
Single level pointers only               •           •
No pointer arithmetic                                •
No local pointers                                    •
Mono-threaded objects only               •           •

Fig. 14. Restrictions with attribute accessors

The above restrictions apply to CORBA objects written in C++. Since pointers and global variables or functions do not exist in Java, only the last restriction, concerning multithreaded objects, still remains with this language.

5 Performance Analysis

The performance analysis of the technique proposed in this paper needs more benchmarking activity than has been performed so far. Various typical programs must be used to evaluate the coverage of the technique and to check that the programming restrictions are strictly controlled. Clearly, the technique must guarantee that the state obtained enables the application to resume execution on a spare copy. This is work in progress and we report in this section only preliminary results obtained with one of the first experiments performed up to now.

The performance results given here have been obtained with one of these examples: the NQueen problem. The problem is to place N queens on an NxN chessboard in such a way that no queen can capture another queen; this means that no two queens may be placed on the same row, column, or diagonal. The NQueen problem resolution has been implemented in a recursive way in C++ and used with several values of N (from 4 to 19). The measurements have been made for 100 repetitions of the computation, for the original program without checkpointing, with FullState checkpointing and with DeltaState checkpointing. A checkpoint is taken at each step of the recursion.

The next figures show the overhead per checkpoint (Fig. 15) and the amount of data checkpointed (Fig. 16) for both FullState and DeltaState checkpointing techniques. It appears that the overhead for FullState checkpointing is constant; this is expected since the amount of data to be checkpointed does not depend on the complexity of the problem. This overhead is close to 2 milliseconds per checkpoint. In this Nqueen implementation, the full state of the object contains a 100x100 char array and a few integers or booleans; the size of a FullState checkpoint is thus about 10 Kbytes. We can also notice that the overhead of DeltaState checkpointing varies considerably depending on the complexity of the solution. Nevertheless, an important point is that this overhead is nearly always lower than with FullState checkpointing.
Actually, from N=14 onwards it is half the FullState overhead. The average size of DeltaState checkpoints is only around 15 bytes in this example (see Fig. 16). Indeed, in many applications, few members are modified between two consecutive checkpoints/method calls.

In order to evaluate our approach, we have compared our results with those obtained with a similar technique for providing transparent checkpointing. The Porch [15] compiler was used for this purpose. The Nqueen application has been translated into C code and compiled using Porch. The results show that Porch is very slow on this example. Overall, the overhead introduced by Porch (see Fig. 17) for one checkpoint is around ten times the overhead introduced by our technique. It is worth noting that since Porch handles C code, it checkpoints both global and local variables and the stack; since Nqueen is recursive, the size of the stack is significant. Porch thus saves more data than needed, which explains the above results.

In order to validate the SaveState and RestoreState methods generated for the Nqueen class, we have implemented a distributed version of this application: two objects are created on different sites and exchange their state regularly. The protocol used is the following:

1. the first replica computes one step and sends its state to the second replica,
2. the second replica restores its state from the one received,
3. the second replica behaves in the same way (computes one step and sends its state to its spare copy).

The main conclusion we can draw from this experiment is that, at least for this application, the states obtained are consistent. Indeed, the final result of the computation obtained on each copy was the expected one, i.e. the final result obtained with the single application.
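The validation protocol can be mimicked in a single process, as in the sketch below. Everything in it is a stand-in chosen for the illustration: the toy Replica class replaces the Nqueen object, its state is reduced to two integers, the computation step is arbitrary, and sending the state over the network is replaced by a direct call. The point is only to show the alternation of roles and the final comparison with a single-object run.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Toy stand-in for the Nqueen object: its whole state is two integers.
struct State {
    std::uint64_t acc;
    int step;
};

class Replica {
public:
    bool Done(int total_steps) const { return state_.step >= total_steps; }
    // One step of some deterministic computation (arbitrary in this sketch).
    void ComputeStep() {
        state_.acc = state_.acc * 31 + static_cast<std::uint64_t>(state_.step);
        ++state_.step;
    }
    State SaveState() const { return state_; }
    void RestoreState(const State& s) { state_ = s; }
    std::uint64_t Result() const { return state_.acc; }

private:
    State state_{1, 0};
};

// Reference run: a single object computes every step.
std::uint64_t RunSingle(int total_steps) {
    Replica r;
    while (!r.Done(total_steps)) r.ComputeStep();
    return r.Result();
}

// Protocol from the text: the active replica computes one step and sends
// its state; the other one restores it and becomes the active replica.
std::uint64_t RunAlternating(int total_steps) {
    assert(total_steps >= 0);
    Replica first, second;
    Replica* active = &first;
    Replica* spare = &second;
    while (!active->Done(total_steps)) {
        active->ComputeStep();
        spare->RestoreState(active->SaveState());  // "send", then restore
        std::swap(active, spare);
    }
    return active->Result();
}
```

If SaveState or RestoreState lost part of the state, the alternating run would diverge from the single run, which is the property the experiment checks.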

[Figure: plot titled "Overhead by Checkpoint". Vertical axis from 0 to 2.5; horizontal axis "Complexity" (N from 4 to 19); one curve for Full and one for Delta.]

Fig. 15. Overhead by Checkpoint in milliseconds for FullState and DeltaState checkpointing

[Figure: plot titled "Checkpoint Size". Vertical axis in bytes on a logarithmic scale (1 to 1E+09); horizontal axis N from 4 to 19; one curve for Full and one for Delta.]

Fig. 16. Amount of Data Checkpointed in bytes for FullState and DeltaState checkpointing

[Figure: plot titled "Overhead by Checkpoint". Vertical axis "Time (milliseconds)" with ticks at 0.1, 1, 10 and 100; horizontal axis "Complexity" (N from 4 to 19); curves for Full, Delta and Porch.]

Fig. 17. Overhead by checkpoint of the Nqueen benchmark

6 System framework

As already mentioned, this work is part of the development of a general framework for implementing fault tolerance in CORBA applications. The cornerstone of this architecture, briefly described in this section, is a runtime metaobject protocol. Compile-time reflection is used (see Fig. 18) to realize this runtime Meta-Object Protocol (FT-MOP), which is able to both observe and control the behavior and the state of objects (see Fig. 19).

[Figure: compile-time view. Labels: Class C, Metaclass, IDL Compiler, IDL, Class C', Struct C', Direct Mapping, Pack, Pack/unpack Any.]

Fig. 18. Compile-Time view

CORBA objects interact at runtime through metaobjects thanks to this MOP. The runtime metaobjects implement various fault-tolerance strategies.

[Figure: runtime view. Labels: type info., repl. protocol, IR, MO, MO', FT-MOP, creation, req., Factory, O, O', ORB, Node 1, Node 2.]

Fig. 19. Runtime view

Thanks to the use of CORBA, the metainformation about object state is available at runtime from the ORB's Interface Repository (IR). Several other useful services for object replication are also provided on top of the ORB. Such services are, for instance, an object factory to create remote objects, a group service to provide group communication protocols to metaobjects, a configuration service to dynamically attach metaobjects to CORBA objects, etc.

7 Conclusion

This paper shows that the use of compile-time reflection is a good approach to tackle the difficult problem of object checkpointing. Solving this problem is mandatory when considering fault-tolerant systems, and other solutions are not satisfactory. Our solution is based on the use of compile-time metaclasses for automatically generating Save/Restore state methods. From a dependability viewpoint, this approach looks better than user-defined approaches, provided that the filtering of restrictions and the code generation are correctly performed. Clearly, this approach is language dependent but this seems mandatory as far as off-the-shelf ORBs are considered. The approach would have been different when considering reflective ORBs; this will be investigated in the near future.

Currently, the validation of this approach using various benchmarking applications is our main concern. Several technical aspects have to be validated or investigated: multiple inheritance, polymorphism, huge composition hierarchies, etc. As indicated in this paper, implementing this approach in Java will be easier and clearer. This will be done using OpenJava [4]. In parallel, we are currently developing the overall architecture for fault tolerance, including the implementation of the basic services and of the various fault-tolerance strategies in metaobjects.

8 Acknowledgements

The work described in this paper has been developed at LAAS in close collaboration with the University of Tsukuba in Japan. We would like to thank very much Shigeru Chiba and Michiaki Tatsubori for their help during our stay in Tsukuba. Their clever assistance when solving Open C++ problems was so helpful, we will never thank them enough.

References

[1] S. Chiba, “A Metaobject Protocol for C++,” presented at OOPSLA, Austin, Texas, USA, pp. 285-299, 1995.
[2] S. Chiba, “Macro Processing in Object-Oriented Languages,” presented at TOOLS Pacific, Australia, 1998.
[3] S. Chiba and T. Masuda, “Designing an Extensible Distributed Language with a Meta-Level Architecture,” presented at European Conference on Object Oriented Programming (ECOOP), pp. 482-501, 1993.
[4] S. Chiba and M. Tatsubori, “Yet Another java.lang.Class,” presented at ECOOP'98 Workshop on Reflective Object-Oriented Programming and Systems, Brussels, Belgium, 1998.
[5] E. N. Elnozahy, D. B. Johnson, and Y. M. Wang, “A survey of rollback-recovery protocols in message-passing systems,” Dept. of , Carnegie Mellon University, CMU-CS-96-181, 1996.
[6] J.-C. Fabre and T. Perennou, “A Metaobject Architecture for Fault-Tolerant Distributed Systems: the FRIENDS Approach,” in IEEE Transactions on Computers, Special issue on Dependability of Computing Systems: IEEE, pp. 78-95, 1998.
[7] M. Kasbekar, C. Narayanan, and C. R. Dar, “Using Reflection for Checkpointing Concurrent Object Oriented Programs,” Center for Computational Physics, University of Tsukuba, UTCCP 98-4, ISSN 1344-3135, October 1998.
[8] M.-O. Killijian, J.-C. Fabre, J.-C. Ruiz-Garcia, and S. Chiba, “A Metaobject Protocol for Fault-Tolerant CORBA Applications,” presented at IEEE Symposium on Reliable Distributed Systems, West Lafayette, Indiana, USA, pp. 127-134, 1998.
[9] C.-C. J. Li, E. M. Stewart, and W. K. Fuchs, “Compiler assisted full-checkpointing,” Software - Practice and Experience, vol. 24, pp. 871-886, 1994.
[10] S. Maffeis and D. C. Schmidt, “Constructing Reliable Distributed Communication Systems with Corba,” IEEE Communications Magazine, vol. 14, pp. 6, 1997.
[11] L. E. Moser and P. M. Melliar-Smith, “The Interception Approach to Reliable Distributed CORBA Objects,” presented at 3rd USENIX Conference on Object-Oriented Technologies and Systems, Portland, Or., USA, pp. 245-248, 1997.
[12] OMG, “CORBA/IIOP 2.2 Specification,” 98-07-01, 1998.
[13] G. D. Parrington, S. K. Shrivastava, W. S. M., and M. C. Little, “The Design and Implementation of Arjuna,” Computing Systems, vol. 8, pp. 255-308, 1995.
[14] J. S. Plank, M. Beck, G. Kingsley, and K. Li, “Libckpt: Transparent checkpointing under ,” presented at Usenix Winter 1995 Technical Conference, New Orleans, LA, USA, pp. 213-223, 1995.
[15] V. Strumpen and B. Ramkumar, “Portable Checkpointing for Heterogeneous Architectures,” in Fault-Tolerant Parallel and Distributed Systems, D. Avresky, R. and D. Keli, R., Eds.: Kluwer Academic Press, pp. 73-92, 1998.
