Clustering with openMosix
M. Michels & W. Borremans
February 2005
Master program System and Network Engineering
University of Amsterdam

Abstract

openMosix is a Linux kernel extension for single-system image clustering which turns a network of ordinary computers into a supercomputer. This project focuses on the performance, reliability and characteristics of openMosix. We also asked ourselves where it could be implemented. An openMosix cluster has some limitations, and we have reservations about the reliability of the cluster software. According to our findings, threaded applications can be used, but the threads will not be distributed over the cluster, so there is no increase in performance. Because a large number of programs use this kind of programming model, few applications can benefit. The performance of the cluster drops dramatically when more than 12 nodes are added. The stability of the cluster depends on all nodes: when one node fails, the complete cluster could crash.

Contents

1 Introduction
  1.1 Related work
  1.2 What is a cluster?
  1.3 What is openMosix?
    1.3.1 openMosix vs Beowulf
  1.4 Why openMosix?
2 Methods
  2.1 Hardware being used
  2.2 Applications being used
    2.2.1 Povray
    2.2.2 Encryption
    2.2.3 Compiling
    2.2.4 Synthetic benchmark
  2.3 Collecting the test data
  2.4 Reliability testing
3 Results
  3.1 Experiments
    3.1.1 Povray
    3.1.2 Encryption
    3.1.3 Compiling
    3.1.4 Synthetic benchmark
    3.1.5 Other benchmarks
  3.2 Reliability
4 Discussion
  4.1 Threads and Processes
  4.2 Latency
  4.3 Summary
  4.4 Future work
A Source shell script Encryption benchmark
B Source shell script starting RSA benchmark

1 Introduction

This document describes the findings of the openMosix[1] project, carried out for the RP1 subject of the 'System and Network Engineering[23]' course at the University of Amsterdam[24]. The project is carried out by Maarten Michels[25] and Wouter Borremans[26] and is supervised by Harris Sunyoto[27].

This project focuses on the performance, reliability and characteristics of openMosix. We will try to answer the following questions:

• Performance
  - Is the performance linear according to the number of nodes?
  - What kind of applications will benefit from openMosix?
• Reliability
  - What happens if a specific part of a node fails?
• Network load
  - How is the network load related to the number of nodes?

Depending on the answers to these questions, we will conclude whether openMosix is suitable in our field of study. In this report we first explain what exactly a cluster is, and then focus on openMosix.

1.1 Related work

As the popularity of openMosix increases, more and more people are starting to experiment with it. We found several sites with reports of such experiments.

• Creating a low latency high performance gameservers[19] - Democritos[20] - Italian researchers tried to build a gameserver cluster using openMosix.
• openMosix Cluster - University of Kiev[21] - Created for solving scientific and applied problems that require large computing power and operate with large volumes of information.

Because we were unable to find reports with performance numbers and reliability information on openMosix, we started some testing of our own.

1.2 What is a cluster?
Definition according to an English dictionary:

cluster; bunch, clump, cluster, clustering: a grouping of a number of similar things ("a bunch of trees"; "a cluster of admirers")

A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone computers working together cooperatively as a single integrated computing resource.

The jobs carried out by clusters vary from complex mathematical problems to generating high resolution images. The search for extraterrestrial intelligence[2] is a well-known project in which multiple computers connected over the internet work together on just one job.

When designing a cluster in general, there are a few goals to achieve:

• Complete transparency
  - Single entry point; think of FTP, NFS, etc.
• Scalable performance
  - Easy growth: the ability to expand your cluster with new machines continuously.
• Enhanced availability
  - Automatic recovery from failures: when a node fails, the cluster automatically reruns the failed job.

The most used technique to build a cluster is SSI (Single System Image). SSI is the illusion, created by software and hardware, that presents a collection of computing resources as a single, unified resource. SSI makes the cluster appear like a single machine to the user, to applications and to the network.

Depending on the job you have to carry out, different kinds of clusters are applicable:

HTC High Throughput Cluster - Primarily used for serial applications (e.g. openMosix).
HPC High Performance Cluster - Mostly used in computational science; think of clusters that carry out jobs for Shell or for the Mathematics department of the University of Amsterdam.

SSI makes the use of system resources transparent and will offer improved system response time and performance.
It simplifies management because the system administrator does not have to know the underlying system architecture to use the machines effectively.

Computers in a cluster are interconnected by a network. Besides the nodes, the network is the most important part of a cluster. The network is the limiting factor of the cluster: the higher the bandwidth, the higher the performance the cluster can reach. The most commonly used networks today, 100 Mbit/s and 1 Gbit/s, are in most cases no longer sufficient. Other types of networks are available to meet the need for higher bandwidth: SCI (Dolphin), QsNet, Myrinet and InfiniBand. These networks offer extremely high bandwidth with extremely low latency (latency is discussed in section 4.2).

In general, the computation-to-communication ratio in a parallel program remains fairly constant. If the computational power increases, the network speed must be increased as well.

1.3 What is openMosix?

openMosix is a Linux kernel extension for single-system image clustering. This kernel extension turns a network of ordinary computers into a supercomputer for Linux applications.

Once you have installed openMosix, the nodes in the cluster start talking to each other by exchanging messages. The cluster adapts itself to the workload. openMosix adds cluster functionality to any Linux flavor. It uses adaptive load balancing techniques: processes running on one node can transparently migrate to another. Due to the complete transparency of openMosix, a process does not have to know where it is running. The process 'thinks' that it is running locally. This transparency means that no additional programming is needed to take advantage of the openMosix load-balancing technology. This is a great and powerful feature of openMosix; it really lives up to its name.

openMosix turns multiple Linux hosts into one large virtual SMP (Symmetric Multiprocessor).
Real SMP systems with two or more physical processors can exchange large amounts of data quickly; in practice this means that real SMP systems are much faster. With openMosix, the speed at which the nodes can exchange data is limited to the speed of the LAN connection. Using a high bandwidth connection will increase the effectiveness of your openMosix cluster.

Another great advantage of openMosix is the ability to build a cluster out of inexpensive hardware, giving you the equivalent of a traditional supercomputer.

openMosix can also be used with performance enhancing techniques like Hyper-Threading[4], available on Intel Pentium 4 and Xeon processors. This technique enhances the performance of a single node, which can then handle multiple cooperating threads that cannot be separated and distributed among openMosix nodes. Using these technologies together may result in a significant performance increase.

Besides running on Hyper-Threading systems, openMosix has efficient parallelization algorithms of its own.

1.3.1 openMosix vs Beowulf

Together with openMosix, Beowulf[5] is one of the best known cluster applications. The Beowulf project was started by Donald Becker in early 1994 at the NASA Goddard Space Flight Center. NASA was looking for a way to build a cluster out of relatively inexpensive hardware. They ended up using Beowulf to beat Grendel[6]. The project gained great popularity among other NASA research groups. The number of Beowulf clusters has grown relatively fast over the past few years.

There are some fundamental differences between the structures of the two programs. A great disadvantage of Beowulf clusters compared to openMosix is that only programs written using the PVM (Parallel Virtual Machine) and MPI (Message Passing Interface) libraries are supported. PVM and MPI are the libraries most used by programmers who write software for clusters.
The PVM and MPI libraries allow computing problems to be solved at a rate that scales almost linearly with the number of nodes in the cluster. Since PVM and MPI programming is quite complex, these kinds of techniques will only be used in small dedicated communities. We can conclude that for the 'regular' cluster user Beowulf may be a little too complicated due to its special needs, whereas with openMosix the user does not need to worry about the program structure.
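
To make the phrase "scales almost linearly" concrete, the following Python sketch computes speedup (single-node run time divided by cluster run time) and parallel efficiency (speedup divided by the number of nodes). Perfectly linear scaling corresponds to an efficiency of 1.0. The function names and the timings are illustrative assumptions only; they are not measurements from this project.

```python
def speedup(t_single: float, t_cluster: float) -> float:
    """Return how many times faster the cluster run is than the single-node run."""
    if t_single <= 0 or t_cluster <= 0:
        raise ValueError("run times must be positive")
    return t_single / t_cluster


def efficiency(t_single: float, t_cluster: float, nodes: int) -> float:
    """Return speedup per node; 1.0 means perfectly linear scaling."""
    if nodes < 1:
        raise ValueError("a cluster needs at least one node")
    return speedup(t_single, t_cluster) / nodes


if __name__ == "__main__":
    # Hypothetical wall-clock times in seconds, keyed by node count.
    # These are placeholder values, not results from the experiments.
    timings = {1: 120.0, 2: 60.0, 4: 40.0}
    baseline = timings[1]
    for nodes, seconds in timings.items():
        s = speedup(baseline, seconds)
        e = efficiency(baseline, seconds, nodes)
        print(f"{nodes} node(s): speedup {s:.2f}, efficiency {e:.2f}")
```

An efficiency that falls as nodes are added indicates that communication or migration overhead is growing faster than the useful work per node.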