EVALUATION AND TUNING OF GIGABIT ETHERNET PERFORMANCE ON CLUSTERS

A thesis submitted to Kent State University in partial fulfillment of the requirements for the Degree of Master of Science

by

Harit Desai

August, 2007

Thesis Written By

Harit Desai

B.E., Nagpur University, India, 2000

M.S., Kent State University, OH, 2007

Approved by

Dr. Paul A. Farrell, Advisor

Dr. Robert A. Walker, Chair, Dept. of Computer Science

Dr. Jerry Feezel, Dean, College of Arts and Sciences


TABLE OF CONTENTS

ACKNOWLEDGEMENTS …..………………………………………………………….vi

CHAPTER 1 INTRODUCTION ....…………………………….…………………….. 1

1.1 Clusters for Scientific Computing ……………………………………….…….... 2

1.2 Thesis Organization .………………………………………………………...... 8

CHAPTER 2 OVERVIEW OF GIGABIT ETHERNET TECHNOLOGY ...... 9

2.1 Operating Modes ………………………………………………………………... 9

2.2 Enhanced CSMA/CD…………………………………………………………… 12

2.3 Issues affecting Gigabit Ethernet performance…………………………………. 15

CHAPTER 3 VI ARCHITECTURE OVERVIEW ………………………………… 19

3.1 VI Architecture…………………………………………………………………..20

3.1.1. Virtual Interfaces……………………………………………………………….. 21

3.1.2. VI Provider …..…………………………………………………………...……. 23

3.1.3 VI Consumer……………………………………………………………………. 23

3.1.4. Completion Queues………………………………………………..……………. 24

3.2. Data Transfer Models………………………………………………..………….. 25

3.2.1 Send/Receive……………………………………………………………..………26

3.3. Managing VI Components……………………………………………….………27


3.3.1 Accessing a VI NIC……………………………………………………………...27

3.3.2 Registering and De-registering Memory …..………………...…………………28

3.3.3 Creating and Destroying VIs …………………………………………………. 28

3.3.4 Creating and Destroying Completion Queues …...………………………….…. 29

3.4. VI Connection and Disconnection………………....…………………………..31

3.4.1. VI Connection…………………………………………………………………31

3.4.2. VI Disconnection……………………………………………………………...34

3.4.3. VI Address Format…………………………………………………………… 35

3.5. VI States…………………………………...…………………………………. 36

CHAPTER 4 NETPIPE……………………………………………………………. 37

4.1. Introduction……………………………………………………………………37

4.2. NetPIPE Design……………………………………………………………….38

4.3. NetPIPE Results……………………………………………………………….40

4.4. VIA driver for NetPIPE……………………………………………………….42

CHAPTER 5 PERFORMANCE COMPARISON……………………………………….45

5.1. Testing Environment and Network Parameters……………………………….45

5.2 TCP Comparisons………………...... 46

5.2.1 Varying MTU size ……………………………………………………………48


5.2.2. Varying Socket buffer size ……………………………………………………50

5.2.3. Varying TX queue length ……………………….…..………………………...51

5.2.4. Varying processor speed …………………………………………………...…54

5.2.5. Different gigabit network interfaces ……………………….………………....57

5.2.6. Performance of Xeon processors ……………………….…………………….59

5.2.7. Performance of Opteron processors …………………….……………………..60

5.3 VIA Comparisons ………………………………………………………..…...63

5.4 TCP and VIA comparison…………………………………………..……...…67

5.5 MVIA Latency comparisons ………………………………………………….72

CHAPTER 6 MPI COMPARISONS …………………………………….………74

6.1 Introduction…………………………………………………………………...74

6.2 Testing environment ………………………………………………………….75

6.3 LAM and MPICH performance comparisons………………………………...76

CHAPTER 7 CONCLUSION ……………………………...……………………..81

References ……………………………………………………….………………...….84


ACKNOWLEDGEMENT

This thesis would not have been possible without the help and encouragement of many people. First and foremost I wish to thank my advisor, Dr. Paul Farrell. Without his encouragement, patience and constant guidance, I could not have completed my thesis.

Besides my advisor, I would also like to thank Roy Heath for his help and support in providing me a platform to do various tests. I also want to thank Dr. Ruttan and Dr. Nesterenko for serving on my thesis committee.

Last but not least, I thank my family: my Mom and my wife, for their unconditional support and encouragement to pursue my interests. My friend Darshan has always advised me when I needed him and has always given me inspiration. I also want to thank my friends Parag, Deepak, Jalpesh, Kiran, Mahesh and Siddharath for their support.


Chapter 1

Introduction

Abstract

Cluster computing imposes heavy demands on the communication network. Gigabit Ethernet technology can provide the required bandwidth to meet these demands. However, it has also shifted the communication bottleneck from network media to protocol processing. In this thesis, we present an overview of Gigabit Ethernet technology and study the end-to-end Gigabit Ethernet communication bandwidth and latency. Performance graphs, collected using NetPIPE, clearly show the performance characteristics of TCP/IP and VIA over Gigabit Ethernet.

Here we discuss the communication performance attainable with a PC cluster connected by a Gigabit Ethernet network. Gigabit Ethernet is the third generation of Ethernet technology and offers raw bandwidth of 1 Gbps. The focus of this work is to discuss the Gigabit Ethernet technology, to evaluate and analyze the end-to-end communication latency and achievable bandwidth, and to monitor the effects of software and hardware components on the overall network performance.


1.1. Clusters for Scientific Computing

Cluster computing offers great potential for increasing the amount of computing power and communication resources available to large scale applications. The combined computational power of a cluster of powerful PCs connected to a high speed network may exceed that achievable by the previous generation of stand-alone high performance supercomputers.

Running large scale parallel applications on a cluster imposes heavy demands on the communication network. Therefore, in early distributed computing, one of the design goals was to limit the amount of communication between hosts. However, due to the features of some applications, a certain degree of communication between hosts may be required. As a result, the performance bottleneck of the network severely limited the potential of cluster computing. Recent high speed networks such as Asynchronous Transfer Mode (ATM), Fibre Channel (FC), Gigabit Ethernet and 10 Gigabit Ethernet [8] change the situation. These high speed networks offer raw bandwidths ranging from 100 megabits per second (Mbps) to 10 gigabits per second (Gbps), satisfying the communication needs of many parallel applications.

Due to the increase in network hardware speed and the availability of low cost high performance workstations, cluster computing has become increasingly popular. Many research institutes, universities, and industrial sites around the world have started to purchase or build low cost clusters, such as Beowulf-class clusters, for their parallel processing needs at a fraction of the price of mainframes or supercomputers.

Beowulf (PC) clusters represent a cost-effective platform for many large scale scientific computations. They are scalable performance clusters based on commodity hardware, such as PCs and general purpose or third-party network equipment, on a private system area network. By general purpose network equipment, we mean network interface cards (NICs) and switches which have been developed for use in general local area networks (LANs), as opposed to those which are designed by third party vendors specifically for use in clusters or parallel machines, such as Myrinet [22], Giganet, or Quadrics.

Trends among the most powerful computational machines in the world are tracked on the TOP500 site (www.top500.org) [28], which was started in 1993 to provide a reliable basis for tracking and detecting trends in high-performance computing. The site also includes summary information on the architectures, operating systems and interconnects of these computers. Some of the summary figures are included below.

These clearly illustrate the transition from the early days when proprietary custom built supercomputers were dominant to the current situation where clusters are predominant.


Figure 1.1: Processor Family Evolution over time

Figure 1.1 clearly indicates that Intel (EM64T, IA-64, i686, IA-32) and AMD processors now predominate in the TOP 500. The only other processors still appearing are the Power PC, Cray and Hewlett-Packard PA-RISC, and of these only the Power PC is a significant percentage of the whole.


Figure 1.2: Architecture Evolution over time

Figure 1.2 illustrates a similar consolidation in architecture, with clusters now representing approximately two-thirds of the machines in the TOP500. Figure 1.3 also illustrates a similar trend in interconnects, with Gigabit Ethernet, Myrinet and Infiniband [16] being the dominant interconnects in recent years. Of all interconnects, Gigabit Ethernet is used by approximately 50% of these high performance computational machines. Figure 1.4 shows an even clearer dominance for Linux, as it is used in over two thirds of the clusters in the TOP500. Berkeley Systems Distribution (BSD) and other Unix variants comprise most of the remainder.


Figure 1.3: Interconnect Evolution over time


Figure 1.4: Operating System Evolution over time

However, in many cases the maximum achievable bandwidth at the application level is still far from the theoretical peak bandwidth of the interconnection networks. This major roadblock to achieving high speed cluster communication is caused by the overhead resulting from the time required for the interaction between software and hardware components. To provide a faster path between applications and the network, most researchers have advocated removing the operating system kernel and its centralized networking stack from the critical path and creating a user-level network interface. With these interfaces, designers can tailor the communication layers each process uses to the demands of that process. Consequently, applications can send and receive network packets without operating system intervention, which greatly decreases communication latency and increases network throughput.

Intel, Microsoft, and Compaq introduced the Virtual Interface Architecture (VIA) as a standard for cluster or system-area networks. VIA defines mechanisms that bypass layers of protocol stacks and avoid intermediate copies of data during sending and receiving messages. Elimination of this overhead is intended not only to enable significant communication performance increases but also to result in a significant decrease in processor utilization by the communication subsystem.

1.2. Thesis Organization

Chapter 2 provides details about Gigabit Ethernet and issues affecting performance on gigabit Ethernet clusters. Virtual Interface Architecture (VIA) is explained in detail in Chapter 3. Chapter 4 explains about NetPIPE, the software used to run network tests and collect data. In Chapter 5, we show the performance comparison between TCP and VIA with varying MTU, TX queue length and Socket buffer size.

Tests also show the performance comparisons of TCP and VIA on different processor speeds. End-to-end communication latency and throughput of LAM and MPICH is presented in Chapter 6. Finally, we present conclusions and a summary of similar work performed elsewhere in Chapter 7.

Chapter 2

Overview of Gigabit Ethernet Technology

Gigabit Ethernet [2] is the third generation of Ethernet technology, also known as IEEE Standard 802.3z. Like Ethernet, Gigabit Ethernet is a media access control (MAC) and physical layer (PHY) technology. It offers a raw bandwidth of 1 gigabit per second (1 Gbps). In order to achieve 1 Gbps, the original Gigabit Ethernet over fiber uses a modified version of the ANSI X3T11 Fibre Channel standard physical layer (FC-0). To remain backward compatible with existing Ethernet technologies, Gigabit Ethernet uses the same IEEE 802.3 Ethernet frame format, and a compatible full or half duplex carrier sense multiple access/collision detection (CSMA/CD) scheme scaled to gigabit speeds.

2.1 Operating Modes

The Gigabit Ethernet standard provides for either half-duplex or full-duplex mode. In full-duplex mode, frames travel in both directions simultaneously over two channels on the same connection for an aggregate bandwidth of twice that of half-duplex mode. Full duplex networks are very efficient since data can be sent and received simultaneously.

However, full-duplex transmission can be used for point-to-point connections only. Since full-duplex connections cannot be shared, collisions are eliminated. This setup eliminates most of the need for the CSMA/CD access control mechanism because there is no need to determine whether the connection is already being used.

When Gigabit Ethernet operates in full duplex mode, it uses buffers to store incoming and outgoing data frames until the MAC layer has time to pass them higher up the legacy protocol stacks. During heavy traffic transmissions, the buffers may fill up with data faster than the MAC can process them. When this occurs, the MAC layer prevents the upper layers from sending until the buffer has room to store more frames; otherwise, frames would be lost due to insufficient buffer space.

In the event that the receive buffers approach their maximum capacity, a high water mark interrupts the MAC control of the receiving node and sends a signal to the sending node instructing it to halt packet transmission for a specified period of time until the buffer can catch up. The sending node stops packet transmission until the time interval is past or until it receives a new packet from the receiving node with a time interval of zero. It then resumes packet transmission. The high water mark ensures that enough buffer capacity remains to give the MAC time to inform the other devices to shut down the flow of data before the buffer capacity overflows. Similarly, there is a low water mark to notify the MAC control when there is enough open capacity in the buffer to restart the flow of incoming data.

Full-duplex transmission can be deployed between ports on two switches, a workstation and a switch port, or between two workstations. Full-duplex connections cannot be used for shared-port connections, such as a repeater or hub port that connects multiple workstations. Gigabit Ethernet is most effective when running in the full-duplex, point-to-point mode where full bandwidth is dedicated between the two end-nodes. This is the normal mode used in switch based clusters, where each node is connected to a separate port on a Gigabit Ethernet switch. Full-duplex operation is also ideal for backbones and high-speed server or router links.

For half-duplex operation, Gigabit Ethernet will use the enhanced CSMA/CD access method. With CSMA/CD, the same channel can only transmit or receive at one time. A collision results when a frame sent from one end of the network collides with another frame. Timing becomes critical if and when a collision occurs. If a collision occurs during the transmission of a frame, the MAC will stop transmitting and retransmit the frame when the transmission medium is clear. If the collision occurs after a packet has been sent, then the packet is lost since the MAC has already discarded the frame and started to prepare the next frame for transmission. In all cases, the rest of the network must wait for the collision to dissipate before any other devices can transmit.

In half-duplex mode, Gigabit Ethernet's performance is degraded. This is because Gigabit Ethernet uses the CSMA/CD protocol, which is sensitive to frame length. The standard slot time for Ethernet frames is not long enough to traverse a 200-meter cable when passing 64-byte frames at gigabit speed. In order to accommodate the timing problems experienced with CSMA/CD when scaling half-duplex Ethernet to gigabit speed, the slot time has been extended to guarantee at least a 512-byte slot time using a technique called carrier extension. The frame size is not changed; only the interframe timing is extended.


Half-duplex operation is intended for shared multistation LANs, where two or more end nodes share a single port. Most switches enable users to select half-duplex or full-duplex operation on a port-by-port basis, allowing users to migrate from shared links to point-to-point, full duplex links when they are ready. Half-duplex operation is not recommended for cluster installations, since the predominant programming model for these is Single Program Multiple Data (SPMD), where all nodes proceed in loose lockstep. This means that most nodes transmit at approximately the same time, leading to a high probability of collisions.

Gigabit Ethernet now operates over a variety of cabling types. Initially, the Gigabit Ethernet specification supported multi-mode and single-mode optical fiber, and short haul copper cabling. Fiber is ideal for connectivity between switches and between a switch and a high speed server because it can be extended to greater lengths than copper before signal attenuation becomes unacceptable, and it is more reliable than copper. In June 1999, the Gigabit Ethernet standard was extended to incorporate category 5 unshielded twisted-pair (UTP) copper media [4]. The fianna cluster in Computer Science at Kent State was one of the earliest to be implemented using UTP network cards and switches.

2.2. Enhanced CSMA/CD

The MAC layer of Gigabit Ethernet uses the same CSMA/CD protocol as defined in IEEE 802.3. As a result, the maximum network diameter used to connect nodes is limited by the CSMA/CD protocol.


IEEE 802.3 (10BaseT) defined the original CSMA/CD mechanism. This scheme ensures that all nodes are granted access to the physical medium on a first-come, first-served basis. The maximum network diameter in 10BaseT is limited to 2000 m. This distance limitation is due to the relationship between the time (also known as the slot time) required to transmit a minimum frame of 64 bytes and the ability to detect a collision (a limit known as the propagation delay). When a collision occurs, the MAC layer detects it and sends a halt signal to cause the transmitting nodes to stop transmitting and enter a backoff phase prior to retrying transmission.

When the IEEE defined 802.3u (100BaseT) in 1994, it maintained the Ethernet framing format and raised the speed limit to 100Mbit/s. As the bit rate increases, the time needed to transmit a frame is reduced by a factor of 10. This implies that the network diameter is decreased from 2000 m for 10BaseT to 200 m for 100BaseT.

Since IEEE 802.3z represents another tenfold increase in bit rate as compared to 100BaseT, the network diameter is further reduced by another factor of 10. But the resulting network diameter of 20 m is clearly too short for most general purpose network configurations and is thus impractical. In addition, this distance is even less if delays in active components such as repeaters are considered. Moreover, with the existing silicon technology at the time, it was not feasible for vendors of repeater chips operating with a 25MHz clock to scale up to operate with a 250 MHz clock. As a consequence, the IEEE 802.3z working committee redefined the MAC layer for Gigabit Ethernet and introduced a mechanism that preserved the 200 m collision domain of 100BaseT. This is necessary because two nodes, which are 200 m apart, will not be able to detect a collision when both simultaneously transmit a 64-byte frame at gigabit speed. This inability to detect collisions will eventually lead to network instability.

The mechanism to preserve the 200 m network diameter is known as carrier extension. Carrier extension, developed by Sun Microsystems Inc. (Mountain View, California), is a way of maintaining the IEEE 802.3 standard minimum and maximum frame size while enabling a meaningful network diameter. The resultant mechanism leaves the CSMA/CD algorithm unchanged. Carrier extension increases the slot time to 512 bytes rather than the 64 bytes defined in IEEE 802.3. If the frame is shorter than 512 bytes, then it is transmitted in a 512 byte window and the transmitted frame is padded with carrier extension symbols.

Upon receipt of a frame carrying carrier extension symbols, the entire extended frame is considered for collision and dropped if necessary. However, the Frame Check Sequence (FCS) is calculated only on the original frame (without the extension symbols). The extension symbols are removed before the FCS is checked by the receiver. So the logical link control layer is not even aware of the carrier extension.

Carrier extension wastes bandwidth. For example, a small packet of 64 bytes will have 448 padding bytes of carrier extension symbols added. This results in an additional overhead of 700%, or a degradation in throughput to approximately 12.5% of the theoretical maximum. In addition, carrier extension increases the collision rate, which may increase the number of lost frames. In fact, for a large number of small packets, the Gigabit Ethernet throughput is only marginally better than 100BaseT.
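
The 700% and 12.5% figures follow directly from the 512-byte slot. The short, purely illustrative program below reproduces them for the minimum 64-byte frame.

/*
 * Worked example of the carrier extension cost: a frame shorter than the
 * 512-byte slot is padded up to 512 bytes with extension symbols.
 */
#include <stdio.h>

int main(void)
{
    int slot  = 512;            /* extended slot time, in bytes           */
    int frame = 64;             /* minimum Ethernet frame                 */
    int pad   = slot - frame;   /* 448 bytes of carrier extension symbols */

    printf("padding overhead: %d%%\n", 100 * pad / frame);       /* 700%  */
    printf("slot utilisation: %.1f%%\n", 100.0 * frame / slot);  /* 12.5% */
    return 0;
}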


To gain back some of the performance lost due to carrier extension, NBase Communication proposed a solution known as packet bursting. It is essentially a modification to the carrier extension procedure. The idea is to transmit a burst of frames every time the first frame has successfully passed the collision window of 512 bytes. Carrier extension is only applied to the first frame in a burst. This essentially averages the time wasted in carrier extension symbols over the few frames that are transmitted. Packet bursting substantially increases the throughput and does not change the dynamics of the CSMA/CD algorithm. It only slightly modifies the existing MAC definition.

2.3. Issues affecting Gigabit Ethernet performance

Communication performance is affected by a number of factors, including CPU speed, I/O speed, bus architecture, network adaptors, device drivers, and protocol stack processing. While most of these factors do not contribute significantly to the performance of slower networks, they begin to become significant factors in high speed networks.

Gigabit Ethernet provides the bandwidth required to meet the demands of current and future applications. However, it has also shifted the communication bottleneck from network media to hardware and software components. It was thus critical to improve or tune these components in order to achieve high speed transmission. Since the introduction of Gigabit Ethernet, vendors have made significant improvements to Gigabit Ethernet network interface cards to reduce latency and improve performance, including such features as adding TCP checksum calculation.


Since TCP was originally engineered to provide a robust general transport protocol, it is not by default optimized for streams of data coming in and out of the system at high transmission rates (such as 1Gbps).

Some of the major issues which affect Gigabit Ethernet performance on clusters include:

• Different versions of the Linux kernel

• Maximum Transmission Unit (MTU)

• Transmit queue length

• Processor speed

• Different device drivers and NICs

• Socket buffer size

MTU, short for Maximum Transmission Unit, is the largest physical packet size that can be transmitted across a network. Any messages larger than the MTU are divided into smaller packets before being transmitted in Ethernet frames. The MTU determines the size of packets being transmitted, and it is a well established fact that the MTU can be a limiting factor in determining throughput. To preserve compatibility with 10 Mbps and 100 Mbps Ethernet, the Gigabit Ethernet standard still limits the MTU to 1500 bytes. Standards bodies are reluctant to change this since, among other issues, they wish to avoid the complications in specifying how larger frames transitioning from networks with MTU greater than 1500 to ones with MTU of 1500 should be handled. This would be a fairly widespread transition if Gigabit Ethernet supported MTUs greater than 1500, since the slower Ethernet standards do not. One of the common uses for Gigabit Ethernet was expected to be in aggregating switches which take multiple 100 Mbps input streams from workstations and output to other Gigabit Ethernet switches on a gigabit link. An efficient implementation of Gigabit Ethernet with MTU greater than 1500 bytes would probably require switches to resegment Ethernet frames greater than 1500 bytes and recompute the checksums. This would add to the cost of switches. It is to be expected that high speed networks such as Gigabit Ethernet would benefit from an MTU larger than 1500. In addition to improving the throughput, one would expect that a larger MTU would also reduce the load on the CPU by reducing the number of frames which would need to be processed for large message sizes. As a result of these factors, some companies, notably Alteon, have enhanced the Gigabit Ethernet functionality by adding a facility to support MTUs, and hence frame sizes, greater than 1500 bytes. Alteon coined the name Jumbo Frames for this functionality, and their network interface cards (NICs) and switches support Jumbo Frames of up to 9000 bytes.
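
On the Linux systems used in this work the interface MTU is normally raised with ifconfig, but the same change can be made from a program through the standard SIOCSIFMTU ioctl, as in the minimal sketch below. The interface name eth1 and the 9000-byte Jumbo Frame size are assumptions for illustration; the NIC driver and any switch in the path must both support the larger frame size.

/*
 * Minimal sketch: set the MTU of a Gigabit Ethernet interface from C,
 * equivalent to "ifconfig eth1 mtu 9000". Requires root privileges.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>

int set_mtu(const char *ifname, int mtu)
{
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);  /* any socket will do for the ioctl */
    if (fd < 0) { perror("socket"); return -1; }

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    ifr.ifr_mtu = mtu;

    if (ioctl(fd, SIOCSIFMTU, &ifr) < 0) {
        perror("SIOCSIFMTU");
        close(fd);
        return -1;
    }
    close(fd);
    return 0;
}

int main(void)
{
    return set_mtu("eth1", 9000) == 0 ? 0 : 1;   /* assumed interface name */
}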

Processor speed also becomes an important factor in achieving higher throughput and lower latency on Gigabit Ethernet. Faster processors can attain higher throughput for large transfer block sizes. This is largely due to the fact that faster processors can process the protocol stacks and calculate TCP checksums faster than the slower processors.

An increase in the transmit queue length (txqueuelen) parameter also improves performance, especially for high-speed connections that perform large, homogeneous data transfers. Increasing the transmit queue length, however, consumes memory, which is then not available for user programs. Different brands of network interface cards (NICs) and different versions of device drivers also affect throughput.


The socket buffer size determines the size of the TCP sliding window and thus the number of packets which can be sent without an acknowledgment (ACK) being received from the receiver. The increase in socket buffer size means that additional memory is used for buffering in the socket software implementation of TCP.
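
For reference, the sketch below shows how an application requests larger socket buffers with setsockopt(). The 256KB value mirrors the larger buffer size used in the experiments in Chapter 5; the kernel may clamp the request to its configured maximum, so the value actually granted is read back with getsockopt().

/*
 * Minimal sketch: request larger TCP send and receive buffers.
 */
#include <stdio.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(void)
{
    int sock = socket(AF_INET, SOCK_STREAM, 0);
    int size = 256 * 1024;                 /* requested buffer size in bytes */
    socklen_t len = sizeof(size);

    setsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, sizeof(size));
    setsockopt(sock, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));

    /* Read back what the kernel actually granted. */
    getsockopt(sock, SOL_SOCKET, SO_SNDBUF, &size, &len);
    printf("effective send buffer: %d bytes\n", size);
    return 0;
}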

Chapter 3

VI Architecture Overview

The VI Architecture [7] is a user-level memory-mapped communication architecture that is designed to achieve low latency and high bandwidth across a cluster of computers. It attempts to reduce the amount of software overhead imposed by traditional communication models, by avoiding the kernel involvement in each communication operation.

In traditional models, the operating system (OS) virtualizes the network hardware into a set of logical communication endpoints available to network consumers. The operating system multiplexes access to the hardware between communication endpoints and therefore all communication operations require a call or trap into the operating system kernel, which can be quite expensive.

The VI Architecture eliminates the system-processing overhead of the traditional model by providing each consumer process with a directly accessible interface to the network hardware - a Virtual Interface (VI). Each VI represents a communication endpoint, and a pair of VIs can be connected to form communication channels for bidirectional point-to-point data transfer. A process may own multiple VIs exported by one or more network adapters. A network adapter performs the endpoint virtualization directly and handles the tasks of multiplexing, de-multiplexing, and data transfer scheduling, normally performed by an OS kernel and device driver. An adapter may completely ensure the reliability of communication between connected VIs.

Each VI has a pair of work queues: one for send and one for receive. VI Consumers send and receive messages by posting requests, in the form of descriptors, to these queues. These requests are processed asynchronously, directly by the network interface controller (the VI Provider), and marked with a status value when completed. VI Consumers can then remove these descriptors from the queue and reuse them if necessary. Completion queues allow the VI Consumer to combine the descriptor completion events of multiple VIs into a single queue.

3.1 VI Architecture

The VI Architecture comprises four basic components:

• Virtual Interfaces

• VI Providers

• VI Consumers

• Completion queues

The VI Provider is composed of a physical network adapter and a Kernel Agent. The VI Consumer is generally composed of an application program and an operating system communication facility. The organization of these components is illustrated in Figure 3.1.


Figure 3.1 VI Architecture

3.1.1. Virtual Interfaces

A Virtual Interface is the mechanism that allows a VI Consumer to directly access a VI Provider to perform data transfer operations. A VI consists of a pair of Work Queues: a send queue and a receive queue. VI Consumers post requests, in the form of Descriptors, on the Work Queues to send or receive data. A Descriptor is a memory structure that contains all of the information that the VI Provider needs to process the request, such as pointers to data buffers. A VI Provider processes the posted Descriptors and marks them with a status value when completed. A VI Consumer removes completed Descriptors from the Work Queues and uses them for subsequent requests. Each Work Queue has an associated Doorbell that is used to notify the VI network adapter that a new Descriptor has been posted to a Work Queue. The Doorbell is directly implemented by the adapter and requires no OS intervention to operate.

A Completion Queue allows a VI Consumer to coalesce notification of Descriptor completions from the Work Queues of multiple VIs in a single location.

Figure 3.2 Virtual Interface


3.1.2. VI Provider

The VI Provider is the set of hardware and software components responsible for instantiating a Virtual Interface. The VI Provider consists of a network interface controller (NIC) and a Kernel Agent. The VI NIC implements the Virtual Interfaces and Completion Queues and directly performs data transfer functions. The Kernel Agent is a privileged part of the operating system, usually a driver supplied by the VI NIC vendor, which performs the setup and resource management functions needed to maintain a Virtual Interface between VI Consumers and VI NICs. These functions include the creation and destruction of VIs, VI connection setup/teardown, interrupt management and processing, management of system memory used by the VI NIC, and error handling. VI Consumers access the Kernel Agent using standard operating system mechanisms such as system calls. Kernel Agents interact with VI NICs through standard operating system device management mechanisms.

3.1.3. VI Consumer

The VI Consumer represents the user of a Virtual Interface. While an application program is the ultimate consumer of communication services, applications access these services through standard operating system programming interfaces such as Sockets or MPI.

The OS communication facility is generally implemented as a library that is loaded into the application process. The OS facility makes system calls to the Kernel Agent to create a VI on the local system and connect it to a VI on a remote system. Once a connection is established, the OS facility posts the application's send and receive requests directly to the local VI.

The OS communication facility often loads a library that abstracts the details of the underlying communication provider, in this case the VI and Kernel Agent. This component is shown as the VI User Agent in Figure 3.1. It is supplied by the VI hardware vendor, and conforms to an interface defined by the OS communication facility.

3.1.4. Completion Queues

Notification of completed requests can be directed to a Completion Queue on a per-VI Work Queue basis. This association is established when a VI is created. Once a VI Work Queue is associated with a Completion Queue, all completion synchronization must take place on that Completion Queue.

As with VI Work Queues, notification status can be placed into the Completion Queue by the VI NIC without an interrupt, and a VI Consumer can synchronize on a completion without a kernel transition. Thus the usual overhead of a trap to the kernel is avoided.


Figure 3.3 VI Architecture Completion Queue Model
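
A minimal sketch of this usage with the VIPL calls implemented by M-VIA is shown below. The prototypes and constant names are approximate; the authoritative definitions are in vipl.h.

/*
 * Sketch: create a Completion Queue and wait on it. A single CQ collects
 * completions from the Work Queues of several VIs (each VI is associated
 * with the CQ when it is created with VipCreateVi). Prototypes may differ
 * slightly between VIPL implementations.
 */
#include <vipl.h>

void drain_completions(VIP_NIC_HANDLE nic)
{
    VIP_CQ_HANDLE cq;
    VIP_VI_HANDLE vi;
    VIP_BOOLEAN   on_recv_queue;

    VipCreateCQ(nic, 1024, &cq);   /* the NIC must support at least 1024 entries */

    for (;;) {
        /* Reports which VI, and which of its Work Queues, completed a
         * Descriptor; no kernel trap is needed on this path. */
        if (VipCQWait(cq, VIP_INFINITE, &vi, &on_recv_queue) != VIP_SUCCESS)
            break;

        /* The completed Descriptor is then dequeued from that Work Queue,
         * e.g. with VipSendDone() or VipRecvDone() on "vi". */
    }
}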

3.2. Data Transfer Models

There are two types of data transfer facilities provided by the Virtual Interface Architecture: Send/Receive and Remote Direct Memory Access (RDMA). Since Send/Receive is used in the VIA implementation used here, we shall omit a description of the RDMA method.


3.2.1 Send/Receive

The Send/Receive model of the VI Architecture follows a well known and well understood model of transferring data between two endpoints. On the sending side, the sending process specifies the memory regions that contain the data to be sent. On the receiving side, the receiving process specifies the memory regions where the data will be placed. Given a single connection, there is a one-to-one correspondence between send Descriptors on the transmitting side and receive Descriptors on the receiving side.

The VI Consumer at the receiving end pre-posts a Descriptor to the receive queue of a VI. The VI Consumer at the sending end can then post the message to the corresponding VI’s send queue. The Send/Receive model of data transfer requires that the VI Consumers be notified of Descriptor completion at both ends of the transfer, for synchronization purposes.

VI Consumers are responsible for managing flow control on a connection. The VI Consumer on the receiving side must post a Receive Descriptor of sufficient size before the sender's data arrives. If the Receive Descriptor at the head of the queue is not large enough to handle the incoming message, or the Receive Queue is empty, an error will occur.

The VI Architecture differs from some existing models in that all Send/Receive operations complete asynchronously.
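
The exchange described above maps onto a short sequence of VIPL calls. The sketch below assumes an already connected VI, a buffer inside a registered Memory Region, and a prepared Descriptor; the descriptor layout and exact prototypes come from vipl.h and may differ slightly between VIPL implementations.

/*
 * Sketch of a Send/Receive exchange with VIPL. "vi" is a connected VI,
 * "desc" is a VIP_DESCRIPTOR whose data segment already points at a
 * registered buffer, and "mh" is that buffer's Memory Handle.
 */
#include <vipl.h>

void receive_one(VIP_VI_HANDLE vi, VIP_DESCRIPTOR *desc, VIP_MEM_HANDLE mh)
{
    VIP_DESCRIPTOR *done;

    /* A Receive Descriptor must be posted before the matching send arrives. */
    VipPostRecv(vi, desc, mh);

    /* Block until the NIC marks the Descriptor complete. */
    VipRecvWait(vi, VIP_INFINITE, &done);
}

void send_one(VIP_VI_HANDLE vi, VIP_DESCRIPTOR *desc, VIP_MEM_HANDLE mh)
{
    VIP_DESCRIPTOR *done;

    /* Post the send Descriptor; completion is reported asynchronously. */
    VipPostSend(vi, desc, mh);
    VipSendWait(vi, VIP_INFINITE, &done);
}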


3.3. Managing VI Components

This section discusses how the components of a Virtual Interface are created, destroyed, and managed.

3.3.1. Accessing a VI NIC

A VI Consumer gains access to the Kernel Agent of a VI Provider using standard operating system mechanisms. Normally, this involves opening a handle to the Kernel Agent that represents the target VI NIC. The VI Consumer uses this handle to perform general management operations such as registering Memory Regions, creating Completion Queues, and creating VIs. This mechanism would also be used to retrieve information about the VI NIC, such as the reliability levels it supports and its maximum transfer size limits.

VI hardware resources cannot be shared across multiple VI NICs, even if they are managed by the same Kernel Agent. Hardware resources may include Completion Queues, mapped memory and other resources that are associated with an instance of the hardware.

A Kernel Agent must use standard operating system mechanisms to detect when a VI Consumer process exits so that it can clean up any resources used by the process. The Kernel Agent must keep track of all resources associated with a VI Consumer's use of a VI NIC.


3.3.2 Registering and De-registering Memory

The VI Architecture requires that memory used for data transfers, both buffers and Descriptors, be registered with the VI Provider. The memory registration process defines one or more virtually contiguous physical pages as a Memory Region. A VI Consumer registers a Memory Region with the Kernel Agent, which returns a Memory Handle that, along with its virtual address, uniquely identifies the registered region. The VI Consumer must qualify any virtual address used in an operation on a VI with the corresponding Memory Handle. A VI Consumer must de-register a Memory Region when the region is no longer in use.

When a Memory Region is registered, every page within the region is locked down in physical memory. This guarantees to the VI NIC that the memory region is physically resident (not paged out) and that the virtual-to-physical memory translation remains fixed while the NIC is processing requests. The VI Kernel Agent manages the VI NIC's Page Table. The Page Table contains the mapping and protection information for registered Memory Regions.
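
In VIPL terms, registration is a single call against the NIC handle. The helper below is a sketch that leaves the attribute structure at defaults; the exact prototype and attribute fields are defined in vipl.h and may differ between implementations.

/*
 * Sketch: register a buffer so the NIC may use it for data transfers.
 * Pins the pages and enters them into the NIC's Page Table.
 */
#include <string.h>
#include <vipl.h>

VIP_RETURN register_buffer(VIP_NIC_HANDLE nic, void *buf, VIP_ULONG len,
                           VIP_MEM_HANDLE *mh)
{
    VIP_MEM_ATTRIBUTES attrs;
    memset(&attrs, 0, sizeof(attrs));   /* default protection attributes */

    return VipRegisterMem(nic, buf, len, &attrs, mh);
}

/* The region must be de-registered when it is no longer needed:
 *     VipDeregisterMem(nic, buf, handle);
 */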

3.3.3 Creating and Destroying VIs

A VI is created by a VI Provider at the request of a VI Consumer. A VI consists of a pair of Work Queues and a pair of Doorbells, one for each Work Queue. Work Queues are structures that are allocated from a VI Consumer process' virtual memory. The VI Provider maps and locks this memory and informs the VI NIC of its location. A Doorbell is a hardware resource located on the VI NIC and mapped by the Kernel Agent into the virtual address space of a VI Consumer process using standard operating system facilities. The VI Provider supplies the VI Consumer with the information needed to directly access these structures when a VI is created. If these resources cannot be allocated and mapped, an error will result and the VI will not be created.

There is no connection established upon creation of a VI. No data will flow until the VI is connected to another VI. See Section 3.4 for more information on connecting VIs.
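
A sketch of NIC access and VI creation with VIPL follows. The device name and the attribute values are assumptions for illustration; the real device name comes from the M-VIA installation, and the attribute field and constant names are defined in vipl.h.

/*
 * Sketch: open a VI NIC and create an (as yet unconnected) VI.
 * Prototypes and attribute fields are approximate.
 */
#include <string.h>
#include <vipl.h>

VIP_RETURN create_endpoint(VIP_NIC_HANDLE *nic, VIP_VI_HANDLE *vi)
{
    VIP_VI_ATTRIBUTES attrs;
    VIP_RETURN rc;

    rc = VipOpenNic("/dev/via_eth0", nic);   /* assumed M-VIA device name */
    if (rc != VIP_SUCCESS)
        return rc;

    memset(&attrs, 0, sizeof(attrs));
    /* Both ends of a connection must agree on the reliability level. */
    attrs.ReliabilityLevel = VIP_SERVICE_RELIABLE_DELIVERY;
    attrs.MaxTransferSize  = 32 * 1024;      /* must not exceed the NIC limit */

    /* No Completion Queues are attached here (NULL send/recv CQ handles). */
    return VipCreateVi(*nic, &attrs, NULL, NULL, vi);
}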

A VI Consumer should instruct a VI Provider to destroy a VI that is no longer in use. A VI cannot be destroyed if any packets remain on its Work Queues. It may only be destroyed if it is in the Idle state. See Section 3.5 for a discussion of VI states. The Work Queue pair and Doorbell are de-allocated when the associated VI is destroyed. In order to avoid consuming large parts of a VI Consumer's virtual address space, it is recommended that the VI Provider map multiple Doorbells into a single page if a VI Consumer opens multiple VIs. Doorbells that belong to different processes must be mapped in different pages.

3.3.4 Creating and Destroying Completion Queues

A Completion Queue can be used to direct notification of Descriptor completions from multiple Work Queues to a single location. The Work Queues associated with a Completion Queue may span multiple VIs on the same VI NIC. A Completion Queue is created by a VI Provider at the request of a VI Consumer and must be created before any of its associated VI Work Queues are created. Each VI Work Queue is optionally associated with a Completion Queue when the VI is created. Work Queues on the same VI may be associated with different Completion Queues, if desired.

The maximum number of Descriptors that can be outstanding at any given time in a Completion Queue is defined by the VI Consumer when the Completion Queue is created. The VI Consumer is responsible for ensuring that this number is large enough to prevent overflow of the queue. The VI NIC must be able to support Completion Queues with at least 1024 entries.

In order to create a Completion Queue, the VI Provider allocates memory for the queue in the VI Consumer’s virtual address space. It then maps and locks this memory and informs the VI NIC of its location. If enough memory cannot be allocated, or it cannot be mapped and locked, an error will result and the Completion Queue will not be created.

Completion Queues may be resized dynamically through the VI Provider. It is important to understand that while this operation is taking place, all I/O to the Completion Queue may cease, depending on the VI Provider's implementation of this function. Incoming requests should still be satisfied, and no incoming data should be rejected unless there is an insufficient number of Descriptors.

A VI Consumer should instruct a VI Provider to destroy a Completion Queue that is no longer in use. A Completion Queue cannot be destroyed until all VIs associated with it have been destroyed. VI Providers are responsible for destroying any Completion Queues still associated with a process when the process is destroyed by the operating system.


A Disconnect request will transition the VI into the Idle state. Descriptors pending or posted to either the Receive Queue or the Send Queue when the VI is in this state will result in the Descriptor being completed in error.

Inbound traffic sent to this VI is refused. There is no outbound traffic, since requests posted to the Send Queue are completed in error. Any outbound traffic left on a queue when the VI transitions into this state is aborted, and the corresponding Descriptors are completed in error.

3.4. VI Connection and Disconnection

The VI Architecture provides connection-oriented data transfer service. When a VI is initially created, it is not associated with any other VI. A VI must be connected with another VI through a deliberate process in order to transfer data. When data transfer is completed, the associated VIs must be disconnected.

3.4.1. VI Connection

A VI Consumer issues a request to its VI Provider in order to connect its VI to a remote VI. VI Providers must implement robust and reliable connection protocols. In particular, VI Providers must prevent interference with current connections and the creation of stale or duplicate connections by delayed or duplicate packets from extinct connections.

The endpoint association model is a client-server model. The server side waits for incoming connection requests and then either accepts them or rejects them based on the attributes associated with the remote VI. A state diagram depicting this process is shown in Figure 3.4.

Figure 3.4 VI Connection Process

The server's VI Consumer issues a ConnectWait request to its VI Provider. This request contains the discrimination values that are acceptable to the VI Consumer. A VI Consumer should be able to accept a connection from any remote endpoint or a specific remote endpoint, based on the discriminator supplied. The request also contains a data structure used to receive information about the remote VI that is requesting a connection, and may indicate a timeout value.

Sometime after the server VI Consumer begins waiting for a connection, the client VI Consumer issues a ConnectRequest request to its VI Provider. This request specifies the local VI that is to be connected, an address structure that indicates the remote VI to which to connect, and a timeout value. It also specifies a data structure used to receive information about the corresponding server VI, if the connect operation completes successfully.

The client’s ConnectRequest request results in one of two actions. If the specified remote VI does not exist, is not reachable, is in the wrong state, or its discriminator doesn’t match, then the VI Provider will return an error to the VI Consumer’s request. If the specified remote VI is available then the server VI Consumer’s ConnectWait request completes, and information about the client VI is returned to the server VI Consumer. A unique identifier for the incoming connection request is also returned.

The server VI Consumer then decides whether to accept this incoming request or to reject it. If the server intends to accept the connection, it must prepare a VI for the connection. The server VI Consumer may either choose a VI from a pool that it has previously created or it may create a new VI with attributes it considers appropriate for this connection request. The reliability level of the new VI must match that of the remote VI. The VI Providers must also agree on the MTU to be used on the connection.

If the server VI Consumer intends to accept the connection, it issues a ConnectAccept request to the VI Provider, specifying the incoming connection ID as well as the local VI to be used. If the local VI's reliability attributes match those required by the remote VI, the connection is established and the VI state transitions according to the state diagram in Figure 3.5. If the local VI's attributes do not meet the requirements, then the ConnectAccept will complete in error; however, the incoming connection request remains valid. The server VI Consumer must either issue a valid ConnectAccept request, or reject the connection.

If the server intends to reject the connection, it issues a ConnectReject request to the VI Provider, specifying the incoming connection ID. If the connection request was rejected, the client VI Consumer's ConnectRequest request returns with a status indicating that fact. If the connection request was accepted, the client VI Consumer's ConnectRequest request returns successfully.

The VI connection model does not attempt to apply either authorization or authentication to a VI connection. It is recommended that connected VI Consumers perform an authentication process, especially before providing RDMA access to registered memory.

The VI connection model must also allow a connection to be established between two endpoints on the same node.
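
The client-server exchange described above corresponds roughly to the VIPL sequence sketched below. Building the VIP_NET_ADDRESS structures (host address plus discriminator) is NIC-specific and omitted here, error handling is elided, and the prototypes are approximate.

/*
 * Sketch of the VI connection handshake with VIPL.
 * "local" and "remote" are VIP_NET_ADDRESS structures assumed to be
 * already filled in with host addresses and discriminators.
 */
#include <vipl.h>

/* Server side: wait for a matching request, then accept it on a prepared VI. */
void server_accept(VIP_NIC_HANDLE nic, VIP_VI_HANDLE vi, VIP_NET_ADDRESS *local)
{
    VIP_NET_ADDRESS   remote;
    VIP_VI_ATTRIBUTES remote_attrs;
    VIP_CONN_HANDLE   conn;

    /* Blocks until a client whose discriminator matches arrives. */
    VipConnectWait(nic, local, VIP_INFINITE, &remote, &remote_attrs, &conn);

    /* The reliability level of "vi" must match remote_attrs; otherwise the
     * accept completes in error and the request remains pending. */
    VipConnectAccept(conn, vi);
}

/* Client side: actively request a connection to the server's VI. */
void client_connect(VIP_VI_HANDLE vi, VIP_NET_ADDRESS *local, VIP_NET_ADDRESS *remote)
{
    VIP_VI_ATTRIBUTES remote_attrs;

    VipConnectRequest(vi, local, remote, VIP_INFINITE, &remote_attrs);
}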

3.4.2. VI Disconnection

A VI Consumer issues a Disconnect request to a VI Provider in order to disconnect a connected VI. This unilaterally aborts the connection and will result in the completion of all outstanding Descriptors on that VI endpoint. The Descriptors are completed with the appropriate error bit set. Implementers must ensure that stale connections cannot be reused.

A VI Provider may issue an asynchronous notification to the VI Consumer of a VI that has been disconnected by the remote end, but this feature is not a requirement. A VI Provider is required to detect that a VI is no longer connected and notify the VI Consumer. Minimally, the consumer must be notified upon the first data transfer operation that follows the disconnect. When a VI Consumer issues a Disconnect request for a VI, the VI will transition to a new state as illustrated in Figure 3.5.

3.4.3. VI Address Format

Each VI Provider must define an address format that uniquely identifies all possible VIs on a SAN. A VI Consumer must be aware of the address format used by a VI Provider.

The address format must allow VI discrimination across systems as well as on the same node and must also permit distinguishing between connection requests from only a specific VI or from any VI. The VI Address Format does not require support for multicast or broadcast capability.


Figure 3.5 VI State Diagram

3.5. VI States

A VI may be in one of four states throughout its life. The four states are Idle, Pending Connect, Connected, and Error. Transitions between states are driven by requests issued by the VI Consumer and by network events. Requests that are not valid while a VI is in a given state, such as submitting a connect request while in the Pending Connect state, must be returned with an error by the VI Provider. The states and transitions are illustrated in Figure 3.5.

Chapter 4

NetPIPE

4.1. Introduction

In recent years, much research has been directed towards evaluating the performance of high speed networks. The design of NetPIPE has been motivated by the need to assess the performance of communication bound applications. NetPIPE helps to answer questions that surround network communications inherent to these applications. The two most popular tools, ttcp [30] and netperf [29], are based on the TCP/IP communications protocol. While netperf has the ability to map network performance, comparing network protocols with these tools is difficult if not impossible. Finding the effective maximum bandwidth using ttcp is an exercise in delving into protocol internals. Knowledge of the appropriate buffer size, alignment address, and protocol settings is required to achieve data transfer at the effective maximum bandwidth.

Network Protocol Independent Performance Evaluator (NetPIPE) was developed by Snell, Mikler, and Gustafson [1] at Ames Laboratory. It encapsulates features of ttcp and netperf and provides a visual representation of network performance under a variety of conditions. By taking the end-to-end application view of a network, NetPIPE attempts to show the overhead associated with different protocol layers.


4.2. NetPIPE Design

The design of NetPIPE consists of two parts.

i) A protocol independent driver

ii) Protocol specific communication APIs.

Figure 4.1 Network Protocol Independent Performance Evaluator

The communication APIs contain the necessary functions to establish a connection, send and receive data, and close a connection. This part is different for each protocol. However, the interface between the driver and protocol module remains the same. Therefore, the driver does not have to be altered in order to change communication protocols. Currently, NetPIPE supports TCP, PVM, and MPI communication protocols.

The device independent driver implements a ping-pong like program which keeps increasing the transfer block size from a single byte to large blocks until transmission time exceeds 1 second. This means that NetPIPE includes a variable time benchmark and will scale to all network speeds. Unlike fixed size benchmark tests, NetPIPE will not become outdated and inaccurate with the increasing speeds of upcoming technology advances.

Let the block size be c. For each block size c, three measurements are taken: c − p bytes, c bytes, and c + p bytes, where p is a perturbation factor with a default value of 3. This allows examination of block sizes that are possibly slightly smaller or larger than an internal network buffer. For each measurement, NetPIPE uses the following algorithm:

/* First set T to a very large time. */
T = MAXTIME
For i = 1 to NTRIALS
    t0 = Time()
    For j = 1 to nrepeat
        if I am transmitter
            Send data block of size c
            Recv data block of size c
        else
            Recv data block of size c
            Send data block of size c
        endif
    endFor
    t1 = Time()
    /* Insure we keep the shortest trial time. */
    T = MIN(T, t1 - t0)
endFor
T = T / (2 * nrepeat)


The variable nrepeat is calculated based on the time of the last data transfer. The intent is to repeat the experiment enough times such that the total time for the experiment is far greater than the timer resolution. The default target time is 0.5 seconds. For most modern computers, this provides a sufficiently precise data transfer time. Given that the last transfer time was tlast seconds for a block size bsz1, the value of nrepeat for block size bsz2 is approximated as:

    nrepeat = TARGET / ((bsz2 / bsz1) * tlast)
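
A direct transcription of this estimate is shown below; the identifiers are illustrative and are not taken from the NetPIPE source.

/*
 * Sketch of the nrepeat estimate above.
 */
#define TARGET_TIME 0.5     /* desired duration of one trial, in seconds */

static int estimate_nrepeat(double t_last, int bsz_prev, int bsz_next)
{
    /* Scale the last measured transfer time to the new block size and
     * choose enough repetitions to fill roughly TARGET_TIME seconds. */
    double predicted = ((double)bsz_next / bsz_prev) * t_last;
    int nrepeat = (int)(TARGET_TIME / predicted);

    return nrepeat > 1 ? nrepeat : 1;   /* always run at least once */
}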

4.3. NetPIPE Results

NetPIPE produces a file that contains the transfer time, throughput, block size, and transfer time variance for each data point and is easily plotted by any graphing package.

For instance, Figure 4.2 presents the throughput versus the transfer block size for a typical Ethernet link.


Figure 4.2: Ethernet Throughput

This graph is referred to as the throughput graph . From this graph, it is easy to see the maximum throughput for any network. However, it is difficult to analyze the latency, an equally important statistic. A graph that is easier to read and analyze is the network signature graph . One such graph is shown in Figure 4.3. It depicts the transfer speed versus the elapsed time; hence it represents a network “acceleration” graph. It is very similar to the way computer performance is presented by the HINT performance metric.


Figure 4.3: Network Signature Graph

Although unconventional, this graph represents perhaps a better approach to visualizing network performance. All the necessary data are clearly visible and easy to extrapolate. The network latency coincides with the time of the first data point on the graph. The maximum attainable throughput is clearly shown as the maximum point on the graph.

4.4. VIA driver for NetPIPE

For our performance evaluation, we modified a research implementation of a VIA communication protocol module for NetPIPE originally written by Dr. Hong Ong [6]. The set of NetPIPE communication APIs needed for the protocol specific module includes those for establishing a connection, closing a connection, sending and receiving data, and performing synchronization. The implementation is based on the VIPL library. To keep the implementation simple, NetPIPE-VIA creates a pair of VI endpoints per connection.

A fixed number of send and receive packet descriptors are pre-allocated and each descriptor has a fixed size of registered (pinned) memory which is equal to the maximum data buffer size supported by the VI Provider. The descriptors are chained together to form a ring. To send a message, NetPIPE-VIA gets a descriptor from the send ring and posts the descriptor to the send queue. After the completion of a send operation, the descriptor is inserted back into the ring. VIA requires packet descriptors to be posted on the receive queue before any message arrives, otherwise the message will be lost.

Therefore, NetPIPE-VIA pre-posts all the receive descriptors before the reception of messages occurs. Whenever a packet arrives, it gets a descriptor out of the receive queue, the packet is processed and the descriptor is posted back to the receive queue again. For each measurement, the protocol independent driver determines the size of the data block either linearly or exponentially depending on a user specified command line option.

Hence, the memory buffer for a data block of size c is dynamically allocated at run time. In order to achieve zero-copying and avoid the extra overhead of pinning and unpinning the memory buffer for each data block, NetPIPE-VIA pre-allocates and pre-registers a pool of memory buffers. All memory requirements of the protocol independent driver are satisfied from this memory pool. This also keeps the memory management in NetPIPE-VIA relatively simple.


When transmitting a large data block, the message will be fragmented in order to fit into a descriptor's data segment. This implies that multiple descriptors are needed to either transmit or receive large messages. Consequently, flow control is required to prevent the sender from overflowing the receiver's pre-posted receive descriptors.

NetPIPE-VIA implements a simple flow control scheme.

On the sender side, it continues to transmit until either the entire data block c is sent or the number of sends reaches the maximum number of pre-posted descriptors of the receiver. For the latter case, the sender waits for a "continue" message from the receiver before sending more packets. On the receiver side, it continues to receive packets until either the entire data block c is received or it reaches the maximum number of pre-posted descriptors. If the receiver runs out of pre-posted descriptors, it stops receiving and waits for all receive requests to complete. Then, it sends a "continue" message to inform the sender to continue to send more packets.
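
The sender side of this scheme can be sketched as follows. The helper functions and the window constant are hypothetical stand-ins for the actual VIPL descriptor handling in NetPIPE-VIA; they are declared here only so the sketch is self-contained.

/*
 * Sketch of the sender-side flow control described above.
 * post_send_fragment() and wait_for_continue() are hypothetical helpers;
 * MAX_PREPOSTED mirrors the number of receive descriptors pre-posted by
 * the peer, and FRAG_SIZE the maximum data segment of one descriptor.
 */
#define MAX_PREPOSTED 32
#define FRAG_SIZE     (32 * 1024)

void post_send_fragment(const char *data, long len);   /* VipPostSend + VipSendWait */
void wait_for_continue(void);                           /* blocks on a control message */

void send_block(const char *buf, long total)
{
    long sent = 0;
    int  in_flight = 0;

    while (sent < total) {
        long chunk = (total - sent < FRAG_SIZE) ? (total - sent) : FRAG_SIZE;

        post_send_fragment(buf + sent, chunk);
        sent += chunk;
        in_flight++;

        /* The peer's pre-posted descriptors may now be exhausted: wait for
         * its "continue" message before sending more fragments. */
        if (in_flight == MAX_PREPOSTED && sent < total) {
            wait_for_continue();
            in_flight = 0;
        }
    }
}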

Chapter 5

Performance Comparison

This chapter describes the testing environment for the communication benchmarking that we conducted. First, we will present the hardware and software environment. Next, we present how some TCP and Ethernet parameters are tuned to accommodate gigabit speeds, and discuss some performance comparisons related to TCP and VIA.

5.1. Testing Environment and Network Parameters

The initial experiments conducted to evaluate the communication latency and throughput were performed on three different sets of machines.

• f31 and f32, nodes of the fianna Kent State Computer Science Beowulf cluster, consisting of dual processor Pentium III PCs running at 450MHz with a 100 MHz bus, and 256MB of SD-RAM. The cluster operating system was the Red Hat 8.0 Linux distribution with kernel version 2.4.18.

• Frodo and Legolas, consisting of single Intel Pentium IV processors running at 1500MHz in a Supermicro P4STA Pentium 4 motherboard with the Intel 850 controller chip. The memory consists of dual 256MB 800MHz RDRAM modules for 512MB of memory. The Intel 850 supports a 400MHz system data bus and five 33MHz/32bit PCI slots.


• C1 and C2, Dell Optiplex GX260 mini-towers with single Intel Pentium IV processors running at 2.4GHz. The motherboard has four 33MHz/32bit PCI slots and 512MB of DDR SD-RAM.

The experiments conducted to evaluate the communication latency and throughput were performed on these three different sets of machines with the same software. Each machine had the Linux 2.4.18 operating system kernel and M-VIA version 1.2 [13] installed on it. Each node had one 100 Mbps Ethernet card and one SysKonnect SK-NET Gigabit Ethernet adapter (SK-9843 SX). Each 100 Mbps Ethernet card was used for external communication, which helps to isolate the test environment from other network traffic. This is important for the accuracy of the tests. The Gigabit Ethernet cards for each set of machines were connected back to back. All the tests were done using the same version of NetPIPE.

5.2 TCP Comparisons

Here we compare the throughput results of Gigabit Ethernet using different MTU sizes, different socket buffer sizes, and different transmit queue lengths (txqueuelen).

As described in Chapter 2, MTU stands for Maximum Transmission Unit, the largest physical frame size that can be transmitted across a network. The Ethernet standard IEEE 802.3 sets the maximum MTU to 1500 bytes. However, some vendors have permitted MTUs larger than 1500 bytes; these are non-standard and are sometimes called Jumbo frames. The first vendor to provide these as a non-standard feature was Alteon, whose ACEnic Gigabit NICs and switches permitted setting the MTU to 9000 bytes. Note that to use an MTU greater than 1500 bytes, both the NIC and the switch must handle the larger MTU. The SysKonnect cards used in these tests also support Jumbo frames of up to 9000 bytes.

The socket buffer size determines the size of the TCP sliding window and thus the number of packets which can be sent without an acknowledgment (ACK) being received from the receiver. Increasing the socket buffer size means that additional memory is used for buffering in the socket software implementation of TCP. A sketch of how these parameters can be set is shown below.
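The MTU and transmit queue length are interface attributes, while the socket buffer sizes are per-connection settings. The following hedged C sketch shows one programmatic way these parameters could be set on Linux; the interface name "eth1" and all sizes are assumptions for the example, and the actual tests would normally adjust the interface settings with standard system tools and the socket buffers through NetPIPE's options.

    /* Illustrative sketch: set the interface MTU and transmit queue length
     * via ioctl, and the per-socket buffer sizes via setsockopt.  Changing
     * interface settings requires root privileges; "eth1" and all values
     * are assumptions, chosen to match the parameters varied in this chapter. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <sys/socket.h>
    #include <net/if.h>

    int main(void)
    {
        int fd = socket(AF_INET, SOCK_STREAM, 0);
        struct ifreq ifr;

        /* Jumbo-frame MTU of 9000 bytes on the gigabit interface. */
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth1", IFNAMSIZ - 1);
        ifr.ifr_mtu = 9000;
        if (ioctl(fd, SIOCSIFMTU, &ifr) < 0)
            perror("SIOCSIFMTU");

        /* Transmit queue length of 1000 packets.
         * (SIOCSIFTXQLEN may require <linux/sockios.h> on some systems.) */
        memset(&ifr, 0, sizeof(ifr));
        strncpy(ifr.ifr_name, "eth1", IFNAMSIZ - 1);
        ifr.ifr_qlen = 1000;
        if (ioctl(fd, SIOCSIFTXQLEN, &ifr) < 0)
            perror("SIOCSIFTXQLEN");

        /* 256KB socket buffers for the benchmark connection. */
        int bufsize = 256 * 1024;
        if (setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bufsize, sizeof(bufsize)) < 0)
            perror("SO_SNDBUF");
        if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bufsize, sizeof(bufsize)) < 0)
            perror("SO_RCVBUF");

        close(fd);
        return 0;
    }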

Figure 5.1: Socket size = 64KB


Except where indicated otherwise, all the tests were performed using f31 and f32, the 450MHz Pentium IIIs.

5.2.1. Varying MTU size

Figures 5.1 and 5.2 show the effects of changing the MTU size, with the default transmit queue length of 100 packets and socket buffer sizes of 64KB and 256KB respectively. There are a number of interesting observations from these figures. For a socket buffer size of 64KB, increasing the MTU from 1500 to 3000 bytes results in a significant increase in the maximum attainable throughput. Increasing the MTU size from 3000 to 4500 or 6000 bytes increases the throughput, but not as significantly. However, continuing to increase the MTU size beyond 6000 bytes results in a decrease in throughput. Thus, the maximum achievable throughput at a socket buffer size of 64KB occurs at MTU sizes of 3000 to 6000 bytes, and the corresponding optimum throughput is approximately 467 Mbps.


Figure 5.2: Socket size = 256KB

By contrast, for a 256KB socket buffer size, the maximum achievable throughput increases monotonically to 677Mbps at an MTU of 9000 bytes. For any fixed MTU, using a socket buffer size of 256KB increases the maximum achievable throughput.


5.2.2. Varying Socket Buffer size

Figure 5.3: MTU=1500

Figures 5.3 and 5.4 show the throughput achievable for varying socket buffer sizes at MTU sizes of 1500 bytes and 9000 bytes respectively. We observe that peak TCP performance is better with socket buffer sizes of 128KB and 256KB than with the default 64KB. Also note that the effect of increasing the socket buffer size is more noticeable for the larger MTU size. One point to note is that, when the socket buffer size is 128KB or 256KB, the throughput for larger messages drops off from the peak.


Figure 5.4: MTU=9000

5.2.3 Varying TX Queue Length

Figures 5.5 and 5.6 show the throughput achievable for varying TX queue lengths, with MTU sizes of 1500 and 9000 bytes respectively. We see that the TX queue length does not have a significant effect on throughput.


Figure 5.5: Change in TX queue length with MTU = 1500 bytes


Figure 5.6: Change in TX queue length with MTU = 9000 bytes


5.2.4. Varying Processor Speed

Figure 5.7: Processor speed = 450MHz

In Figures 5.7, 5.8 and 5.9, we see that faster processors can attain higher throughput for large transfer block sizes (greater than 1MB). The maximum throughput achievable for f31 and f32, the Pentium III 450MHz machines, is 564Mbps, whereas the maximum attainable throughput is approximately 645Mbps for Frodo and Legolas, the Pentium IV 1500MHz processors, and 787Mbps for C1 and C2, the Pentium IV 2.4GHz processors. This is largely due to the fact that faster processors can process the protocol stack and calculate TCP checksums faster than slower processors. However, note that an increase in processor speed of 53% resulted in only a 22% increase in throughput. This suggests that a substantial part of the bottleneck lies elsewhere.

Figure 5.8: Processor speed = 1500MHz


Figure 5.9: Processor speed = 2400MHz


5.2.5 Different Gigabit Network Interfaces

Figure 5.10: Throughput comparison on Dell Optiplex GX260 with built-in NIC card

We tested the throughput on a pair of Dell Optiplex GX260 machines connected back-to-back. These have an Intel Pentium IV processor running at 2.53GHz, with 256MB of DDR SDRAM and a 32-bit, 33MHz PCI card slot. Figures 5.10 and 5.11 show the effect of different network interfaces on the performance of the Gigabit Ethernet network. Figure 5.10 shows throughput for MTU sizes of 1500, 3000 and 9000 bytes with the built-in network interface, and Figure 5.11 shows the throughput for MTU sizes of 1500, 4500 and 9000 bytes with the SysKonnect NIC card. The maximum throughput achievable with the built-in NIC card is 494 Mbps at an MTU size of 3000 bytes, whereas it is 795 Mbps with the SysKonnect NIC at an MTU size of 9000 bytes.

Figure 5.11: Throughput comparison with SysKonnect NIC card

Similar results were seen by Hong and Farrell [17], [25] for the Packet Engines GNIC-II (hamachi v0.07), Alteon ACEnic (acenic v0.45), and SysKonnect SK-NET (sk98lin v3.01). A comprehensive evaluation of the varying performance of different network interface cards was undertaken by Anthony Betz at the University of Northern Iowa [24]. The performance varied dramatically between the NICs they tested, similar to the differences observed in our experiments.


5.2.6. Performance of Xeon Processors

Figure 5.12: Throughput for newer Xeons

Figure 5.12 shows the throughput between Xeons in a RocketCalc Titan cluster. The Xeons were 2.4GHz processors with 2GB of memory, running the Linux 2.4.21 kernel. The figure shows the performance for an MTU of 1500 bytes with different buffer sizes. For the Xeons, the throughput increases considerably with increasing buffer size, from 417 Mbps with the default buffer size to approximately 894 Mbps with a 1MB buffer.


5.2.7. Performance of Opteron Processors

Figure 5.13: Throughput for newer Opterons

We evaluated the performance between the nodes of two different Opteron clusters.

Figure 5.13 shows the throughput between two Opteron 244 processors running at 1.8 GHz. Each machine is a dual processor Tyan Thunder K8W with an onboard Broadcom NetXtreme BCM5703 Gigabit Ethernet LAN. They ran SuSE Linux Professional 9.0 with Linux kernel 2.6.2, and the MTU was 1500 bytes. The Broadcom BCM5703 is a fully integrated 64-bit 10/100/1000BASE-T Gigabit Ethernet Media Access Control and Physical Layer transceiver with CPU task offloads. Figure 5.13 also shows the effects of different buffer sizes. The maximum throughput is around 897 Mbps at an MTU of 1500 bytes, with the buffer size having no noticeable effect.

We also tested the performance of a dual processor AMD Opteron 250 system with a processor speed of 2.4 GHz and 2GB of RAM per processor. The operating system is Fedora Core 5 Linux with kernel version 2.6.18. The motherboard is a Supermicro H8DCi EATX board with a built-in Gigabit Ethernet network interface based on the nVidia nForce Pro 2200 (CKIO4) and 2050 (IO4) dual single-port Gigabit Ethernet controllers. The layout of the nVidia nForce Pro 2200/2050 chipset is shown in Figure 5.14 below. As can be seen from the chipset diagram, these controllers are connected directly to the HyperTransport bus and also have TCP/IP offload capability built into the Gigabit Ethernet controller.

Figure 5.14: Block Diagram of nVidia 2050/2200 chipset


Figure 5.15: TCP throughput with newer Opterons

From Figure 5.15, we see that the maximum throughput achieved is approximately 822 Mbps with an MTU size of 1500 bytes. The throughput did not vary significantly with socket buffer size. One point of note here is that the throughput for the earlier 1.8GHz Opteron was 897Mbps, whereas for the newer 2.4GHz Opteron it was only 822 Mbps. This indicates that the motherboard and Gigabit interface were more significant factors in the performance.

The throughput attained by the Opterons with the default configuration is much higher than the maximum throughput achievable on the earlier processors tested, even with tuned configurations of TCP and VIA. However, we have to note that a number of factors have changed, as outlined below:


• The Gigabit Ethernet interface used on the Opterons is built-in, instead of the SysKonnect SK9821 card in a 32-bit/33MHz PCI slot.

• The processors are AMD Opterons, which are 64-bit processors, with processor speeds of 1.8 and 2.4 GHz.

• The Linux kernel versions are 2.6.2 and 2.6.18 for the Opteron tests, whereas all our earlier tests were done using Linux kernel version 2.4.18. It is possible that the device drivers are more efficient in the later kernels.

Note that the newer processors and newer network interface cards (NICs) provide very high throughput without any configuration changes. We believe that the network configuration parameters which were dominant in increasing the performance of Gigabit Ethernet networks may, to a certain extent, also be helpful in configuring future networks, such as 10 Gbps networks.

5.3 VIA Comparisons

Here we compare the point-to-point latency and throughput performance of the software implementation of VIA (M-VIA). We also compare TCP performance with M-VIA. Latency is measured by taking half the average round-trip time for a 1-byte transfer. The throughput rate is calculated from half the round-trip time for a data block of size c bytes, as sketched below.
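The following is a minimal sketch of how the reported numbers follow from the half round-trip times just defined; rtt_seconds() is a hypothetical timing helper standing in for NetPIPE's timed ping-pong loop.

    /* Latency is half the average RTT of a 1-byte exchange; throughput for a
     * block of c bytes is the block size divided by half its RTT.
     * rtt_seconds() is a stand-in for NetPIPE's measurement loop. */
    #include <stddef.h>

    double rtt_seconds(size_t block_size);   /* hypothetical timing helper */

    double latency_usecs(void)
    {
        return (rtt_seconds(1) / 2.0) * 1e6;          /* one-way time in microseconds */
    }

    double throughput_mbps(size_t c)
    {
        double one_way = rtt_seconds(c) / 2.0;        /* seconds for c bytes one way  */
        return (8.0 * (double)c) / (one_way * 1e6);   /* megabits per second          */
    }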


Figure 5.16: Processor speed = 450 MHz

In M-VIA, the SysKonnect NIC's hardware MTU defaults to 1500 bytes, because the Gigabit Ethernet standard still limits the MTU to 1500 bytes. We observed above that a larger MTU improves TCP throughput. To confirm that a larger hardware MTU will also improve VIA performance, we tested the SK-9821 NIC using an MTU greater than 1500 bytes. Since we did not have a Gigabit Ethernet switch which supported Jumbo frames, we connected the two PCs back to back using the SK-9821 NICs.

In Figure 5.16 above, we can observe the M-VIA performance for MTU sizes of 1500 and 3000 bytes with the default socket buffer size and txqueuelen values of 100 and 1000. As in the TCP case, VIA communication performance also improves with an increase in MTU size. We can also observe that increasing the txqueuelen has no effect on performance.

Figure 5.17: Processor speed = 1500 MHz


Figure 5.18: Processor speed = 2400 MHz

Figures 5.17 and 5.18 above show the M-VIA performance of the faster processors, with processor speeds of 1500MHz and 2400MHz, for varying MTU sizes. For the Pentium IV 1500MHz, M-VIA attains a maximum throughput of roughly 407 Mbps at an MTU of 1500 bytes and 597 Mbps at an MTU of 9000 bytes. For a processor speed of 2400MHz, the M-VIA throughput is 497 Mbps for 1500 bytes and 739 Mbps for 9000 bytes. The surprising result here is that M-VIA throughput is 544Mbps for the 450MHz processor, but drops to 407Mbps for the 1500MHz processor, and rises again only to 497Mbps for the 2400MHz processor.


Figure 5.19: M-VIA throughput at different processor speeds

5.4 TCP and VIA comparison

As we have seen above, increasing the hardware MTU improves both TCP and M-VIA performance. Here we compare the throughput and latency of TCP and M-VIA at different processor speeds. Note that all tests were done with the Linux 2.4.18 kernel as the operating system, using SysKonnect NIC cards with the sk98lin driver, the default socket buffer size, and txqueuelen = 100.


Figure 5.20: TCP and VIA at Processor speed = 450MHz

For the Pentium III 450MHz processor, the maximum throughput achievable is 291 Mbps for TCP and 544Mbps for M-VIA; that is, the M-VIA throughput is 86% higher than the corresponding TCP throughput. For faster processors such as the Pentium IV 1500MHz and 2400MHz, we can see that TCP performance is equal to or better than VIA performance.


Figure 5.21: TCP and VIA at Processor speed = 1500MHz


Figure 5.22: TCP and VIA at Processor speed = 1500MHz

We have also noticed that at slower processor speeds the difference between M-VIA and TCP throughput is larger, that is, M-VIA performance is much better. As the processor speed increases, the TCP performance improves, and at processor speeds around 2.4 GHz the TCP throughput is equal to or greater than the M-VIA throughput. This effect is partly due to the fact that faster processors can process the TCP/IP stack and calculate TCP checksums faster, so this overhead is not as significant as on slower processors. No cause has been identified for the decrease in M-VIA performance.


Figure 5.23: TCP and VIA at Processor speed = 2400MHz


Figure 5.24: TCP and VIA at Processor speed = 2400MHz

5.5 MVIA latency comparisons

The table below summarizes the latency for TCP and M-VIA on systems with different processor speeds. For the Pentium III 450MHz, the latency is 61 usecs for TCP and 26 usecs for M-VIA. Notice that the M-VIA latency is at least 50% less than the TCP latency. This highlights the fact that VIA can deliver the low latency needed for communication intensive applications even on slow processors.


Latency for a 1-byte transfer (usecs):

Processor speed    TCP    VIA
450MHz              61     26
1500MHz             26     24
2400MHz             21     21

However, for faster processors with processor speeds of 1500MHz and 2400MHz, we can observe that the TCP latency is only slightly greater than or equal to the M-VIA latency. Again, this is mainly because the overhead of network stack processing and checksum computation is less significant than on slower processors.

Chapter 6

MPI Comparisons

Here we evaluate and compare the performance of two implementations of the MPI standard, LAM and MPICH, on a Linux cluster connected by a Gigabit Ethernet network.

Performance statistics are collected using the NetPIPE MPI module.

6.1 Introduction

On cluster systems, parallel processing is usually accomplished through parallel programming libraries such as MPI, PVM and BSP. These environments provide well-defined, portable mechanisms with which concurrent applications can be developed easily. In particular, MPI has been widely accepted for computational science applications, and its use has broadened over time. Two of the most extensively used MPI implementations are MPICH [9], [10], [27], from Mississippi State University and Argonne National Laboratory, and LAM [26], originally from the Ohio Supercomputing Center and now maintained by Indiana University. The modular design adopted by MPICH, by Gropp and Lusk [10], and by LAM has allowed research organizations and commercial vendors to port the software to a great variety of multiprocessor and multicomputer platforms and distributed environments.


Naturally, there has been great interest in the performance of LAM and MPICH for enabling high performance computing on clusters. Large scale distributed applications using MPI as the communication transport on a cluster of computers impose heavy demands on communication networks. Gigabit Ethernet technology, among other high-speed networks, can in principle provide the required bandwidth to meet these demands. Moreover, as Gigabit-over-copper devices became more available and more widely used, their price decreased to commodity level. However, this has also shifted the communication bottleneck from the network media to protocol processing. Since LAM and MPICH use TCP/UDP socket interfaces to communicate messages between nodes, there have been great efforts to reduce the overhead incurred in processing the TCP/IP stack. Many systems, such as U-Net and Active Messages, have been proposed to provide low latency and high bandwidth message passing between clusters of workstations and I/O devices connected by a network. The Virtual Interface Architecture (VIA) was developed to standardize these ideas. Since the introduction of VIA, there have been several software and hardware implementations; Berkeley VIA, Giganet VIA, M-VIA, and FirmVIA are among them. This has also led to the recent development of VIA-based MPI communication libraries such as MVICH.

6.2 Testing Environment

The initial testing environment for collecting the performance results consists of two Intel Pentium IV PCs running at 1500MHz with 1GB of RAM. The PCs are connected back to back via a Gigabit Ethernet network card, the SysKonnect SK-NET Gigabit Ethernet adapter (SK-9843 SX), installed in a 32-bit/33MHz PCI slot. Each node also has one 100Mbps Ethernet card used for external communication, to ensure that the cluster is isolated from the rest of the traffic for the accuracy of the tests. The cluster ran the RedHat 7.1 Linux distribution with kernel version 2.4.18. In addition, LAM v6.5.7 and MPICH v1.2.4 were installed. All the tests were done using NetPIPE; a minimal example of the kind of MPI ping-pong measurement that the NetPIPE MPI module performs is sketched below.
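The sketch below is a self-contained MPI ping-pong in the spirit of the NetPIPE MPI module: rank 0 sends a block to rank 1 and waits for it to come back, then reports the one-way time and throughput. The block size and repetition count are arbitrary choices for the example; this is not the actual NetPIPE driver code.

    /* Minimal MPI ping-pong in the spirit of the NetPIPE MPI module. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int nbytes = 64 * 1024, reps = 100;   /* arbitrary example values */
        char *buf = malloc(nbytes);
        MPI_Status st;
        int rank;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, nbytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, &st);
            } else if (rank == 1) {
                MPI_Recv(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, &st);
                MPI_Send(buf, nbytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double one_way = (MPI_Wtime() - t0) / (2.0 * reps);

        if (rank == 0)
            printf("%d bytes: %.1f usec one-way, %.1f Mbps\n",
                   nbytes, one_way * 1e6, 8.0 * nbytes / (one_way * 1e6));

        MPI_Finalize();
        free(buf);
        return 0;
    }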

6.3 LAM and MPICH performance comparisons

In this section, we present and compare the performance of LAM and MPICH on a Gigabit Ethernet network. Before moving on to the performance results, it is useful to first briefly describe the data exchange protocol used in these two MPI implementations. The choices made in implementing the protocol can influence the performance, as we will see later in the performance graphs.

Generally, LAM and MPICH use a short/long message protocol for communication; however, their implementations are quite different. In LAM, a short message consisting of a header and the message data is sent to the destination node in one message. A long message is segmented into packets, with the first packet consisting of a header and possibly some message data sent to the destination node. The sending node then waits for an acknowledgment from the receiving node before sending the rest of the data. The receiving node sends the acknowledgment when a matching receive is posted.

MPICH implements three protocols for data exchange. For short messages, it uses the Eager protocol to send the message data to the destination node immediately, with the possibility of buffering the data at the receiving node when the receiving node is not expecting it. For long messages, two protocols are implemented: the Rendezvous protocol and the Get protocol. In the Rendezvous protocol, data is sent to the destination only when the receiving node requests it. In the Get protocol, data is read directly by the receiver; this choice requires a method to transfer data directly from one process's memory to another, such as exists on parallel machines. A simplified sketch of the short/long switchover is shown below.
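The following hedged sketch illustrates the short/long (eager versus rendezvous-style) switchover logic just described. The 64KB threshold and the helper names are placeholders chosen for illustration; they are not the actual LAM or MPICH internals.

    /* Illustrative short/long message switchover, in the spirit of the
     * protocols described above; the threshold and helpers are placeholders. */
    #include <stddef.h>

    #define SHORT_MSG_LIMIT (64 * 1024)   /* switchover point, configurable at compile time */

    /* Hypothetical transport helpers (declared only, for the sketch). */
    void send_packet(int dest, const void *hdr, size_t hdrlen,
                     const void *data, size_t datalen);
    void wait_for_ack(int dest);

    void mpi_style_send(int dest, const void *hdr, size_t hdrlen,
                        const char *data, size_t len)
    {
        if (len <= SHORT_MSG_LIMIT) {
            /* Short/eager: header and data go out immediately; the receiver
             * may have to buffer the data if no matching receive is posted. */
            send_packet(dest, hdr, hdrlen, data, len);
            return;
        }

        /* Long: send the header (and possibly a first fragment), then wait
         * until the receiver has posted a matching receive and acknowledged,
         * before streaming the remaining data. */
        send_packet(dest, hdr, hdrlen, data, SHORT_MSG_LIMIT);
        wait_for_ack(dest);
        send_packet(dest, NULL, 0, data + SHORT_MSG_LIMIT, len - SHORT_MSG_LIMIT);
    }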

Figure 6.1: LAM, compiled at 64KB switchover, socket buffer = 64KB, all MTU sizes

All the LAM tests are conducted using the LAM client-to-client (C2C) protocol, which bypasses the LAM daemon to reduce latency. In LAM and MPICH, the maximum length of a short message can be configured at compile time by setting an appropriate constant. The LAM short/long message switchover point occurs at 64KB by default. We also tried a configuration with a LAM short/long message switchover of 128KB. For MPICH, we used the default settings.

Figure 6.2: MPICH, socket buffer = 64KB, all MTU sizes

For LAM compiled with the 64KB long/short message switchover and an MTU size of 1500 bytes, the maximum attainable throughput is about 426 Mbps with a latency of 32 usecs. For LAM compiled with the 128KB switchover and an MTU of 1500 bytes, the maximum throughput is about 372 Mbps with a latency of 34 usecs. For MPICH using an MTU size of 1500 bytes, the maximum attainable throughput is about 391 Mbps with a latency of 38 usecs. The table below shows the throughput in Mbps at different MTUs for LAM compiled at 64KB and for MPICH.

Throughput in Mbps:

MTU (bytes)    LAM (compiled at 64KB)    MPICH
1500                 426.58               391.69
3000                 551.90               492.48
4500                 595.89               500.28
6000                 628.91               453.88
9000                 640.07               350.25

From the table above, we can see that changing the MTU to a larger size improves LAM performance considerably. For LAM, the maximum attainable throughput increases by approximately 50% with an MTU of 9000 bytes as compared to 1500 bytes. This is expected, as TCP/IP performs better on a Gigabit Ethernet network with a larger MTU and a larger socket buffer size. However, increasingly larger MTU sizes initially increase but eventually decrease MPICH performance; for MPICH, the maximum attainable throughput drops by approximately 10% with an MTU of 9000 bytes. This is because, during initialization, MPICH sets the SOCK_SNDBUF and SOCK_RCVBUF sizes equal to 4096 bytes, thus limiting the effect of increasing the MTU. As we might expect, the throughput peaks when the MTU permits the frame to include the full 4096 bytes of the TCP payload; in our tests this occurs when the MTU is 4500 bytes. Hence, a larger MTU does not help to improve MPICH performance. On the other hand, LAM sets the send and receive socket buffers, SOCK_SNDBUF and SOCK_RCVBUF, to a size equal to the switchover point plus the size of the C2C envelope data structure. This explains why LAM performance improved when we made the MTU greater than 1500 bytes.

                           Throughput (Mbps)    Latency (usecs)
LAM (compiled at 64KB)           426                  32
LAM (compiled at 128KB)          372                  34
MPICH                            391                  38
TCP                              428                  26

One surprising aspect is that despite the extra overhead of MPI protocol processing and headers, LAM attains performance of 426Mbps close to that of TCP at 428Mbps.

Chapter 7

Conclusion

Communication performance is affected by a number of factors, including CPU speed, I/O speed, bus architecture, network adaptors, device drivers, and protocol stack processing. While most of these factors do not contribute significantly to performance in the case of slower networks, they begin to become significant factors in high speed networks.

Gigabit Ethernet provides the bandwidth required to meet the demands of current and future applications. However, it has also shifted the communication bottleneck from the network media to the hardware and software components, and it is critical to improve these components in order to achieve high speed transmission. A detailed study of the communication latency and throughput of Gigabit Ethernet was presented, with emphasis on TCP/IP protocol stack processing. In order to tune the TCP/IP protocol for high speed networks, we analyzed the effects of a number of TCP parameters and indicated the optimal values for our test environment. The effect of processor speed on Gigabit Ethernet throughput was also analyzed, and it was shown that, although improved performance was obtained with faster processors, the improvement was not proportional to the increase in processor speed. This leads to the conclusion that processor speed is not the major bottleneck in obtaining improved throughput. We also showed the effects of MTU size on Gigabit Ethernet performance. The ability to increase the MTU size beyond 1500 bytes, that is, to transmit Jumbo Frames, can significantly enhance the attainable throughput. Moreover, we showed that the maximum attainable throughput was not always at the largest MTU size of 9000 bytes. In order to make the largest selectable MTU the optimal choice, one also has to ensure that the TCP socket buffer size is large enough.

We also compared the throughput and latency of TCP and M-VIA at different processor speeds. We have seen that an increase in hardware MTU improves TCP as well as M-VIA performance. We have also noticed that at slower processor speeds, M-VIA performance is much better than TCP performance; as the processor speed increases, TCP performance improves, and at higher processor speeds the TCP throughput is equal to or greater than the M-VIA throughput. This effect may be due to the fact that faster processors can process the TCP/IP protocol stack and TCP checksums faster than slower processors. We have also seen that on slower processors the latency is much better for M-VIA, whereas for faster processors the latency is more or less equal for TCP and M-VIA. Based on our investigation, M-VIA was a very useful alternative on less powerful processors, but on newer and more powerful processors, such as Opterons and Xeons, TCP performs much better.

We also compared the performance of LAM and MPICH at different MTU and socket buffer sizes. LAM throughput increases with increasing MTU size, while MPICH performance decreases when the MTU size is increased beyond 4096 bytes. This is because, during initialization, MPICH initializes the send and receive socket buffers to 4096 bytes. On the other hand, LAM sets the send and receive socket buffers to a size equal to the switchover point plus the size of the C2C envelope data structure. We have also seen that the communication latency of LAM is better than that of MPICH.

For older processors many factors impacted the attainable performance, and significant testing and tuning were required to optimize it. In contrast, for newer processors such as the Opteron, acceptable performance is attained with the default configurations.

References

[1] Q.O. Snell, A.R. Mikler, and J.L. Gustafson, “NetPIPE: Network Protocol Independent Performance Evaluator”. http://www.scl.ameslab.gov/netpipe/paper/full.html

[2] Stephen Saunders, “Data Communication Gigabit Ethernet Handbook”, McGraw Hill, ISBN 0-07-057971-7, 1998.

[3] S. Elbert, C. Csanady, et al., “Gigabit Ethernet and Low-Cost Supercomputing”. http://www.scl.ameslab.gov/Publications/Gigabit/tr5126.html

[4] Gigabit Ethernet Alliance, “Gigabit Ethernet Over Copper”. http://www.gigabit-ethernet.org/

[5] Joe Skorupa and George Prodan, “Battle of the Backbones: ATM vs. Gigabit Ethernet”, Data Communications, April 1997, http://www.data.com/tutorials/backbones.html

[6] Mark Baker, Paul A. Farrell, Hong Ong, Stephen L. Scott, VIA Communication Performance on a Gigabit Ethernet Cluster, Proceedings of EuroPar2001.

[7] Compaq Computer Corp. , Intel Corporation, Microsoft Corporation, Virtual Interface Architecture Specification version 1.0. http://www.viarch.org

[8] 10 Gigabit Ethernet Alliance, 10 Gigabit Ethernet Technology Overview White Paper. (2001) http://www.10gea.org/

[9] W. Gropp and E. Lusk and N. Doss and A. Skjellum, A high-performance, portable implementation of the MPI message passing interface standard, Parall. Comp. 22 (1996).

[10] William D. Gropp and Ewing Lusk, User’s Guide for mpich, a Portable Implementation of MPI, Argonne National Laboratory (1996), ANL-96/6.

[11] HINT - Hierarchical INTegration benchmark web site, http://www.scl.ameslab.gov/Projects/HINT/


[12] Intel Corporation, Virtual Interface (VI) Architecture: Defining the Path to Low Cost High Performance Scalable Clusters. (1997)

[13] M-VIA: A High Performance Modular VIA for Linux. http://www.nersc.gov/research/FTG/via/

[14] MPI for Virtual Interface Architecture. http://www.nersc.gov/research/FTG/mvich/

[15] Hong Ong, Paul A. Farrell, Performance Comparison of LAM/MPI, MPICH, and MVICH on a Linux Cluster connected by a Gigabit Ethernet Network, Proceedings of Linux 2000, 4th Annual Linux Showcase and Conference, Extreme Linux Track, Atlanta, 2000.

[16] InfiniBand Trade Association, http://www.infinibandta.org

[17] Paul Farrell and Hong Ong, Communication Performance over a Gigabit Ethernet Network, IEEE Proceedings of 19th IPCCC. 2000

[18] M. Banikazemi, V. Moorthy, L. Hereger, D. K. Panda, and B. Abali. Efficient Virtual Interface Architecture Support for IBM SP Switch-Connected NT Clusters. International Parallel and Distributed Processing Symposium. (2000)

[19] Richard P. Martin, Amin M. Vahdat, David E. Culler, Thomas E. Anderson: Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture. ISCA 24. (1997)

[20] P.A. Farrell, H. Ong, A. Ruttan, “Modeling liquid crystal structures using MPI on a workstation cluster”, Proceedings of MWPP99.

[21] E. Speight, H. Abdel-Shafi , J. K. Bennett: Realizing the Performance Potential of the Virtual Interface Architecture. Proc. of Supercomputing'99. (1999)

[22] N. J. Boden, D. Cohen, R. E. Felderman, A. E. Kulawik, C. L. Seitz, J. N. Seizovic, W. Su: Myrinet - A Gigabit per second Local Area Network. IEEE Micro. (1995)

[23] M. Mathis, J. Mahdavi, S. Floyd, A. Romanow, “TCP Selective Acknowledgment Options”, RFC 2018, October 1996.

[24] Paul Gray, Anthony Betz, Performance Evaluation of Copper-based Gigabit Ethernet Interfaces, 679-690, Electronic Edition, IEEE Computer Society DL.


[25] Paul A. Farrell and Hong Ong, Factors involved in the Performance of Computations on Beowulf clusters, Electronic Transactions on Numerical Analysis.Volume 15, pp. 211- 224, 2003.

[26] LAM, http://www.lam-mpi.org/

[27] MPICH, http://www.mcs.anl.gov/mpi/mpich/

[28] TOP 500, http://www.top500.org

[29] Netperf, http://www.freebsd.org/projects/netperf/index.html

[30] ttcp, http://www.cisco.com/warp/public/471/ttcp.html