Tuple Spaces Implementations and Their Efficiency
To cite this version: Vitaly Buravlev, Rocco De Nicola, Claudio Mezzina. Tuple Spaces Implementations and Their Efficiency. 18th International Conference on Coordination Languages and Models (COORDINATION), Jun 2016, Heraklion, Greece. pp.51-66, 10.1007/978-3-319-39519-7_4. hal-01631715

HAL Id: hal-01631715, https://hal.inria.fr/hal-01631715, submitted on 9 Nov 2017. Distributed under a Creative Commons Attribution 4.0 International License.

Tuple spaces implementations and their efficiency

Vitaly Buravlev, Rocco De Nicola, and Claudio Antares Mezzina

IMT School for Advanced Studies Lucca, Piazza S. Francesco, 19, 55100 Lucca, Italy
{vitaly.buravlev, rocco.denicola, [email protected]

Abstract. Among the paradigms for parallel and distributed computing, the one popularized with Linda and based on tuple spaces is the least used one, despite being intuitive and easy to understand and use. A tuple space is a repository of tuples, where processes can add, withdraw or read tuples by means of atomic operations. Tuples may contain different values, and processes can inspect the content of a tuple via pattern matching. The lack of a reference implementation for this paradigm has prevented its widespread adoption.
In this paper, we first carry out an extensive analysis of the state of the art implementations and summarise their characteristics. Then we select three implementations of the tuple space paradigm and compare their performances on three different case studies that aim at stressing different aspects of computing such as communication, data manipulation, and CPU usage. After reasoning on the strengths and weaknesses of the three implementations, we conclude with some recommendations for future work towards building an effective implementation of the tuple space paradigm.

1 Introduction

Distributed computing is getting increasingly pervasive, with demands from various application domains and highly diverse underlying architectures, from the multitude of tiny things to the very large cloud-based systems. Several paradigms for programming parallel and distributed computing have been proposed so far. Among them we can list: distributed shared memory, message passing, actors, distributed objects and tuple spaces. Nowadays, the most used paradigm seems to be message passing, with MPI [2] being its latest incarnation, while the least popular one seems to be the one based on tuple spaces, which was proposed by David Gelernter for the Linda coordination model [8].

As the name suggests, message passing provides coordination abstractions based on the exchange of messages between distributed processes, where message delivery is often mediated via brokers and messages consist of a header and a body. In its simplest incarnation, message passing provides a rather low-level programming abstraction for building distributed systems. Linda, instead, provides a higher level of abstraction by defining operations for synchronization and exchange of values between different programs that can share information by accessing common repositories named tuple spaces.

[Figure: a tuple space containing the tuples ⟨"goofy", 4, 10.4⟩, ⟨3.4, ⟨..⟩⟩, ⟨"donald", 6, 5.0⟩ and ⟨10, ⟨…⟩⟩, accessed by processes via out(⟨"goofy", 4, 10.4⟩), rd(⟨"goofy", _, _⟩), in(⟨10, x⟩) and eval(…)]

The key ingredients of Linda are a few basic operations which can be embedded into different programming languages. These are atomic operations used for writing (out), withdrawing (in), and reading (rd) tuples into/from a tuple space. The operations for reading and withdrawing select tuples via pattern matching. Another operation, eval, is used to spawn new processes. The figure above illustrates an example of a tuple space with different, structured, values. For example, tuple ⟨"goofy", 4, 10.4⟩ is produced by a process via the out(⟨"goofy", 4, 10.4⟩) operation, and it is read by the operation rd(⟨"goofy", _, _⟩) after pattern matching: that is, the process reads any tuple of three elements whose first one is exactly the string "goofy". Moreover, tuple ⟨10, ⟨…⟩⟩ is consumed (atomically retracted) by operation in(⟨10, x⟩), which consumes a tuple whose first element is 10 and binds its second element (whatever it is) to the variable x. Patterns are sometimes referred to as templates.

The simplicity of this coordination model makes it very intuitive and easy to use. Some synchronization primitives, e.g. semaphores or barrier synchronization, can be implemented easily in Linda (cf. [6], Chapter 3). Unfortunately, Linda's implementations of tuple spaces have turned out to be quite inefficient, and this has led researchers to opt for different approaches such as OpenMP or MPI, which are nowadays offered, as libraries, for many programming languages. When considering distributed applications, the limited use of the Linda coordination model is also due to the need of keeping tuple spaces consistent. In fact, in this case, control mechanisms that can affect scalability are needed [7].
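The behaviour of out, rd, and in can be illustrated with a minimal centralized tuple space. The sketch below is ours, not taken from any of the systems discussed in this paper: it uses Python tuples, treats None as the wildcard playing the role of _ and of unbound variables in templates, and omits eval.

```python
import threading

class TupleSpace:
    """A minimal, centralized Linda-style tuple space (illustrative sketch)."""

    def __init__(self):
        self._tuples = []
        self._cond = threading.Condition()

    def _matches(self, template, tup):
        # A template matches a tuple of the same arity whose fields are
        # equal, with None acting as a wildcard.
        return len(template) == len(tup) and all(
            t is None or t == v for t, v in zip(template, tup)
        )

    def out(self, tup):
        # Atomically add a tuple and wake up any blocked rd/in operations.
        with self._cond:
            self._tuples.append(tup)
            self._cond.notify_all()

    def rd(self, template):
        # Block until a matching tuple exists; return it without removing it.
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._matches(template, tup):
                        return tup
                self._cond.wait()

    def in_(self, template):
        # Block until a matching tuple exists; atomically remove and return it.
        with self._cond:
            while True:
                for tup in self._tuples:
                    if self._matches(template, tup):
                        self._tuples.remove(tup)
                        return tup
                self._cond.wait()


ts = TupleSpace()
ts.out(("goofy", 4, 10.4))
ts.out((10, ("donald", 6)))
print(ts.rd(("goofy", None, None)))  # reads without removing
print(ts.in_((10, None)))            # withdraws the tuple
```

With this sketch the example above plays out directly: rd(("goofy", None, None)) returns the goofy tuple and leaves it in the space, while in_((10, None)) retracts the tuple whose first element is 10, binding its second element to the caller.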
In our view, tuple spaces can be effectively exploited as a basis for a broad range of distributed applications in different domains (from lightweight applications to large cloud-based systems). However, in order to be effective, we need to take into account that the performance of a tuple space system may vary depending on the system architecture and the type of interaction between its components. The aim of this paper is to examine the state of the art implementations of tuple spaces, and to find out their strengths and weaknesses.

We start by cataloguing the existing implementations according to their features, and then we focus on the most recent Linda-based systems that are still maintained, paying specific attention to those featuring decentralized tuple spaces. For the selected systems, we compare their performances on three different case studies that aim at stressing different aspects of computing such as communication, data manipulation, and CPU usage. After reasoning on the strengths and weaknesses of the three implementations, we conclude with some recommendations for future work towards building an effective implementation of the tuple space paradigm.

2 Tuple space systems

In this section, we first review several existing tuple space systems by briefly describing each of them and singling out the main features of their implementations; then we summarise these features in Table 1. Later, we focus on the implementations that enjoy the characteristics we consider important for a tuple space implementation: code mobility, distribution of tuples and flexible tuple manipulation.

JavaSpaces. JavaSpaces [13] is one of the first implementations of the tuple space, developed by Sun Microsystems. It is based on a number of Java technologies (Jini, RMI). As a commercial system, JavaSpaces supports transactions and a mechanism of tuple leases.
A tuple, called entry in JavaSpaces, is an instance of a Java class, and its fields are the public fields of the class. This means that tuples are restricted to contain only objects and not primitive values. The tuple space is implemented by using a simple Java collection. Pattern matching is performed at the byte level, and the byte-level comparison of data supports object-oriented polymorphism.

TSpaces. TSpaces [12] is an implementation of the Linda model developed at the IBM Almaden Research Center. It combines asynchronous messaging with database features. Like JavaSpaces, TSpaces provides transactional support and a mechanism of tuple leases. Moreover, the embedded mechanism for access control to tuple spaces is based on access permissions: it checks whether a client is allowed to perform specific operations in a specific tuple space. Pattern matching is performed using either the standard equals method or the compareTo method. Pattern matching uses SQL-like queries, allowing one to match tuples regardless of their structure (e.g. the order in which fields are stored).

GigaSpaces. GigaSpaces [9] is a contemporary commercial implementation of the tuple space. Nowadays, the core of that system is GigaSpaces XAP, a scale-out application server; any user application should interact with it for creating and manipulating its own tuple space. The main areas where GigaSpaces can be applied are concerned with big data analytics. GigaSpaces' main features are: linear scalability, optimization of RAM usage, synchronization with databases and several database-like operations such as complex queries, transactions and replication.

Tupleware. Tupleware [1] is specially designed for array-based applications in which an array is decomposed into several parts, each of which can be processed in parallel.
It aims at developing a scalable distributed tuple space with good performance on a computing cluster, and provides clear and simple programming facilities for dealing with a distributed tuple space as well as with a centralized one. The tuple space is implemented as a hashtable containing pairs consisting of a key and a vector of tuples. Due to the nature of the Java hashtable, it is possible to access several elements of the hashtable concurrently, since synchronisation is at the level of a hashtable element. To speed up the search in the distributed tuple space, an algorithm based on the history of communication is used. Its main aim is to minimize the number of communications between nodes for tuple retrieval. The algorithm uses a success factor, a real number between 0 and 1, expressing the likelihood that a node can find a tuple in the tuple space of other nodes. Each instance of Tupleware calculates the success factor on the basis of past attempts to get information from other nodes, and tuples are first searched in nodes with a greater success factor.
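Tupleware's history-based retrieval heuristic can be sketched as follows. The ordering of remote lookups by success factor follows the description above; the concrete update rule (an exponential moving average with weight alpha), the initial factor of 0.5, and the RemoteNode interface are our assumptions for illustration, since the description only states that the factor is computed from past attempts.

```python
class RemoteNode:
    """Stand-in for a remote Tupleware instance (hypothetical interface)."""

    def __init__(self, name, tuples):
        self.name = name
        self._tuples = list(tuples)

    def request(self, template):
        # Return a matching tuple, or None if this node has no match
        # (None in a template acts as a wildcard).
        for tup in self._tuples:
            if len(tup) == len(template) and all(
                t is None or t == v for t, v in zip(template, tup)
            ):
                return tup
        return None


class SuccessFactorSearch:
    """Order remote lookups by each node's success factor in [0, 1]."""

    def __init__(self, nodes, alpha=0.5):
        self.nodes = nodes
        self.alpha = alpha                           # smoothing weight (assumed)
        self.factor = {n.name: 0.5 for n in nodes}   # neutral prior (assumed)

    def search(self, template):
        # Query nodes from most to least promising, updating each node's
        # factor with the outcome (1.0 on a hit, 0.0 on a miss).
        ordered = sorted(self.nodes, key=lambda n: self.factor[n.name],
                         reverse=True)
        for node in ordered:
            hit = node.request(template)
            outcome = 1.0 if hit is not None else 0.0
            f = self.factor[node.name]
            self.factor[node.name] = (1 - self.alpha) * f + self.alpha * outcome
            if hit is not None:
                return hit
        return None


nodes = [RemoteNode("A", []), RemoteNode("B", [("goofy", 4, 10.4)])]
search = SuccessFactorSearch(nodes)
search.search(("goofy", None, None))            # miss on A, hit on B
print(search.factor["A"] < search.factor["B"])  # → True: B is tried first next time
```

After the first lookup, node B's factor rises while A's falls, so subsequent searches contact B first, which is the effect the Tupleware heuristic aims for: fewer inter-node messages per retrieval.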