Resource Localization Using Peer-To-Peer Technology for Network Enabled Servers*
Total Page:16
File Type:pdf, Size:1020Kb
Resource Localization Using Peer-To-Peer Technology for Network Enabled Servers? Eddy Caron1, Fr´ed´eric Desprez1, Franck Petit2, and C´edric Tedeschi1 1 [Eddy.Caron,Frederic.Desprez,Cedric.Tedeschi]@ens-lyon.fr LIP Laboratory / GRAAL Project UMR CNRS, ENS Lyon, INRIA, Univ. Claude Bernard Lyon 1 46 All´ee d’Italie, F-69364 Lyon Cedex 07 2 [email protected] LaRIA Laboratory University of Picardie - Jules Vernes 5 rue du Moulin Neuf, F-80000 Amiens Abstract. DIET (Distributed Interactive Engineering Toolbox) is a set of hierarchical components to design Network Enabled Server systems. These systems are built upon servers managed through distributed scheduling agents for a better scalability. Clients ask to these scheduling components to find servers available using some performance metrics and information about the location of data already on the network. Our target architecture is the grid which is highly heterogeneous and dynamic. Clients, servers, and schedulers are thus better connected in a dynamic (or peer-to-peer) fashion. One critical issue to be solved is the localization of resources on the grid. In this paper, we present the use of an asynchronous version of the Propagate Information with Feedback algorithm to discover computation resources in Network Enabled Servers with distributed schedulers and arbitrary networks. Resource discovery uses peer-to-peer connections between components. Our implementation is based on JXTA from Sun Microsystems and DIET developed in the GRAAL team from INRIA. The algorithm and its implementation are discussed and performance results are given from experiments over the VTHD network which connects several supercomputers in different research institutes through a high-speed network. 1 Introduction The use of distributed resources available through high-speed networks has recently gained a wide interest. So called grids [2, 9] are now widely available for many applications around the world. The number of resources made available grows every day and the scalability of middleware platforms becomes an important issue. Many research projects have produced middlewares to cope with heterogeneity and dynamicity of the target platforms like Globus, Condor, or XtremWeb while trying to hide the complexity of the platform as much as possible to the user. Among them, one simple, yet performant, approach consists in using servers available in different ad- ministrative domains through the classical client-server or RPC1 paradigm. Network Enabled Servers [7] implement this model also called GridRPC [13] (like DIET, NetSolve, or Ninf). Clients submit computation requests to a scheduler which goal is to find a server available on the grid. Scheduling has to be applied to balance the work among the servers and a list of available servers is sent back to the client which is in turn able to send the data and the request to solve a given problem. Due to the growth of the network bandwidth and the reduction of the latency, small computation requests can now be sent to servers available on the grid. One issue can now be the scalability of the middleware itself. The scheduling can be made scalable by distributing the scheduler itself. ? This work was supported in part by the ACI GRID (ASP), RNTL (GASP), and the RNRT (VTHD++) of the french department of research. 1 Remote Procedure Call We thus designed DIET [4], a set of hierarchical elements to build applications using the GridRPC paradigm. This middleware is able to find an appropriate server according to the information given in the client’s request (problem to be solved, size of the data involved), the performance of the target platform (servers load, memory available, communication performance), and the availability of data stored during previous computations. The scheduler is distributed using several hierarchies connected either statically or dynamically (in a peer-to-peer fashion). One important issue to be solved is the discovery of resources available at a given time for a request. In this paper, we use the Propagate Information with Feedback (so called PIF) algorithm to gather information about resource available for a given request. Following our results for static hierarchical networks presented in [6], we extend our platform and our resource discovery scheme for arbitrary (and dynamic) networks. We have designed an asynchronous PIF and implemented it in a dynamic version of our software using peer-to-peer connection between hierarchies. A longer version of this paper is given in a research report [5]. In a first section, we introduce a version of DIET allowing dynamic connections between the hierarchies. Then, in Section 3, we describe the algorithms used to find resources in DIET. These algorithms are based on the Propagate Information with Feedback Algorithm (PIF). Finally, before a conclusion and some hints for our future work, we give the validation of our implementations on a grid based on clusters connected through a Wide Area Network. 2 DIETJ : A Peer-to-Peer Extension of DIET The DIET architecture has been first designed following a hierarchical approach. Thus it provides a good scalability and can take into account the physical network constraints. In this section, we are describing the DIET static hierarchical architecture. 2.1 Basic hierarchical Architecture of DIET DIET is based on severals elements. First a Client is an application that uses DIET to solve problems in a RPC mode. Different kinds of clients should be able to connect to DIET from a web page, a PSE such as Matlab or Scilab, or from a program written in C or Fortran. The scheduler is scattered across a hierarchy of Agents. This hierarchy is made of one Master Agent (MA), several Agents (A), and Local Agents (LA). A Server Daemons (SeD) encapsulates a computational server. A SeD declares the problems it can solve to its parent LA. Figure 1 shows a hierarchy built upon several DIET elements. ¡ ¢ £ ¤ ¥ ¡ ¢ £ ¤ ¥ ¡ ¢ £ ¤ ¥ ¡ ¢ £ ¤ ¥ ¡ ¢ £ ¤ ¥ ¦ § ¨ © § ¨ © § § ¨ © ¨ © . / 0 ) ! , - ¨ © ¨ © ¨ © %& ' ( ) * ) ( + ! " # $ $ ¨ © ) * ) * - # ¨ © ¨ © § ¨ © ¨ © ¨ © Fig. 1. DIET hierarchical organization. Fig. 2. DIETJ architecture. In this version of DIET connections between elements are static. They are configured at the initialization of the platform or during the registration of new elements. Deployment algorithms [3] are used to start 2 elements (and map them on the target architecture) depending of the capacity of the target platform and the history of previous requests. Such a hierarchy has several limits. The first limit concerns the static configuration. Due to the static connections between agents, deploying the hierarchy (using configuration files or command-line) at a large scale becomes quickly fastidious. Moreover, such static configurations do not cope with the dynamicity of nodes of the grid. As a consequence, most of those hierarchies are rarely deployed among more than one administrative domain (One MA and its hierarchy per institute or city). Moreover, the clients are given an entry point statically. One need for the client is to dynamically choose the better MA considering proximity and latency, thus coping with the dynamicity of the grid. The second limit concerns the Master Agent bottleneck. One have a unique entry point (Master Agent) for clients that involves a probability of bottleneck (or even failure) on the MA growing with the number of clients. The third limit concerns the service availability. Real life use cases show that services are quite often deployed among only one hierarchy for many reasons (data locality, security). One key purpose of computa- tional grids is to make services available for clients anywhere in the wide area. So there is a need for making services available for a client, that does not know the entry point of the hierarchy providing the requested service, and thus for gathering services of many hierarchies over the wide area. 2.2 DIETJ : Design and Implementation The aim of DIETJ is to dynamically connect together distributed DIET hierarchies in the wide-area to gather services on demand to improve a large scalability. This new architecture addresses the limits described before and have the following properties: Connecting hierarchies dynamically for scalability To increase the scalability of DIET over the grid, we now dynamically build a multi-hierarchy by connecting the entry points of the hierarchies (Master Agents), and thus research institutes, cities together. Note that the multi-hierarchy is build on-demand by a Master Agent only if it fails to retrieve a service requested by a client, inside its own hierarchy. Moreover, a client can now dynamically discover one or several Master Agents when looking for a service, and thus connect the server with the best latency and locality. Sharing the load on the Master Agents So, the entry point for each client is now dynamically chosen, thus better sharing the load through Master Agents. Master Agents are connected in an unstructured peer-to-peer fashion (without any mechanism of maintenance, routing, or group membership). Gathering services at large scale Services are now gathered on-demand thus providing to clients an entry point to resources of hierarchies put in common in a transparent way. The DIETJ architecture, shown in Figure 2, is divided into two parts. The JXTA part including the MAJ , the SeDJ and the ClientJ . All these elements are peers on the JXTA virtual network and communicate together through it. The interface part: Java (JXTA native language) and C++ (DIET native language) must cooperate. The technology used is JNI, that allows a Java program to call functions written in C++. A quick description of main JXTA features is available in our research report [5]. Now, we introduce the different elements built on top of JXTA and their behavior. Based on results presented in [10], we believe JXTA pipes offer the right trade-off between transparency and performance for our architecture. Our implementation is based on JXTA 2.3 which minimizes the latency of the JXTA pipes, according to [10].