Evaluation of the PlayStation 2 as a Cluster Computing Node
Rochester Institute of Technology
RIT Scholar Works
Theses, 2004

Recommended Citation
Nigro, Christopher R., "Evaluation of the PlayStation 2 as a cluster computing node" (2004). Thesis. Rochester Institute of Technology. Accessed from

This Thesis is brought to you for free and open access by RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works.

Evaluation of the PlayStation 2 as a Cluster Computing Node
Christopher R. Nigro

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Computer Engineering

Department of Computer Engineering
Kate Gleason College of Engineering
Rochester Institute of Technology
May 2, 2004

Primary Advisor: Dr. Muhammad Shaaban
Advisor: Dr. Roy Czernikowski
Advisor: Dr. Greg Semeraro

Release Permission Form
Rochester Institute of Technology
Evaluation of the PlayStation 2 as a Cluster Computing Node

I, Christopher R. Nigro, hereby grant permission to the Wallace Library of the Rochester Institute of Technology to reproduce this thesis, in whole or in part, for non-commercial and non-profit purposes only.

Christopher R. Nigro
Date

Abstract

Cluster computing is currently a popular, cost-effective solution to the increasing computational demands of many applications in scientific computing and image processing. A cluster computer is composed of several networked computers known as nodes. Since the goal of cluster computing is to provide a cost-effective means of processing computationally demanding applications, nodes that can be obtained at a low price with minimal performance tradeoff are always attractive.
Presently, the most common cluster computers are composed of networks of workstations constructed from commodity components. Recently, computers developed and deployed for purposes other than traditional personal computing or workstation use have emerged as new candidates for cluster computing nodes. These candidates may provide a competitive and even less expensive alternative to the cluster computing nodes being used today. Video game consoles, whose prices are kept extremely low by intense marketplace competition, are a prime example. The Sony PlayStation 2, in particular, provides the user with low-level hardware devices that are often found in more expensive machines. This work presents an evaluation of the PlayStation 2 video game console as a cluster computing node for scientific and image processing applications. From this evaluation, a determination is made as to whether the PlayStation 2 is a viable alternative to the cluster computing nodes being used today.
TABLE OF CONTENTS

LIST OF FIGURES
LIST OF TABLES
GLOSSARY
CHAPTER 1 INTRODUCTION
CHAPTER 2 CLUSTER COMPUTING
  2.1 Background Information
  2.2 Cluster Computer Components
    2.2.1 Node Interconnects
    2.2.2 System Software
    2.2.3 Computing Nodes
  2.3 Cluster Computing Performance and Scaling
    2.3.1 Parallel Performance Scaling
    2.3.2 Problem-Constrained Scaling
    2.3.3 Time-Constrained Scaling
    2.3.4 Memory-Constrained Scaling
  2.4 Cluster Computing with the PlayStation 2
CHAPTER 3 THE EMOTION ENGINE
  3.1 Design Background
  3.2 Emotion Engine Hardware Overview
    3.2.1 Emotion Engine Core CPU
    3.2.2 Vector Processing Units
    3.2.3 DMA Controller
    3.2.4 Device Collaboration
  3.3 Programming the Emotion Engine
    3.3.1 The PlayStation 2 Linux Operating System
    3.3.2 Programming the EE Core
    3.3.3 Programming the Vector Processing Units
CHAPTER 4 MATRIX MULTIPLICATION: INITIAL EXPERIMENT
  4.1 Problem Overview
  4.2 Dot Product Matrix Multiplier
    4.2.1 Algorithm Overview
    4.2.2 Implementation Methodology
    4.2.3 Results and Analysis
  4.3 Block Matrix Multiplier
    4.3.1 Data Transfer Benchmarks
    4.3.2 Algorithm Revisions
    4.3.3 Implementation Methodology
    4.3.4 Results and Analysis
  4.4 Conclusions
CHAPTER 5 EMOTION ENGINE PERFORMANCE ANALYSIS
  5.1 Benchmark Application Selection
  5.2 Benchmark Comparison Machines
  5.3 Matrix Multiplication
    5.3.1 Implementation Details
    5.3.2 Test Results
  5.4 BLAS Level 1 Subroutines
    5.4.1 Algorithm Descriptions
    5.4.2 Implementation Details
    5.4.3 Results
  5.5 Image Processing Applications
    5.5.1 Algorithm Overview
    5.5.2 Implementation Details
    5.5.3 Results
  5.6 Benchmark Analysis
CHAPTER 6 MULTISPECTRAL IMAGING
  6.1 Background Information
  6.2 The Landsat Program
  6.3 Pixel Classification
    6.3.1 Manual Classification
    6.3.2 Supervised Classification
    6.3.3 Unsupervised Classification
  6.4 K-Means Clustering
    6.4.1 Algorithm Overview
    6.4.2 K-Means Clustering Example
CHAPTER 7 K-MEANS CLUSTERING ON THE EMOTION ENGINE
  7.1 K-Means on a Cluster Computer
    7.1.1 Overview of Parallel K-Means Algorithm
    7.1.2 K-Means on a PlayStation 2 Cluster
  7.2 K-Means on the Emotion Engine
    7.2.1 Sequential K-Means Profiling
    7.2.2 Optimized K-Means Implementation Details
  7.3 Performance Analysis
    7.3.1 Performance Metrics
    7.3.2 Single Node Performance
    7.3.3 Parallel Algorithm Performance
  7.4 Conclusions
CHAPTER 8 CONCLUSION
  8.1 Accomplishments
  8.2 Limitations
  8.3 Future Work
REFERENCES

LIST OF FIGURES

Figure 2.1 Sample Parallel Algorithm Computation Time versus Sequential Algorithm
Figure 2.2 Sample Scaling Analysis for Parallel Performance
Figure 3.1 Block Diagram of the Emotion Engine [28]
Figure 3.2 SIMD Processing of Data versus Standard SISD Processing
Figure 3.3 Block Diagram of the Emotion Engine Core CPU [29]
Figure 3.4 General Block Diagram of a Vector Processing Unit [32]
Figure 3.5 Block Diagram of VU0 [30]
Figure 3.6 Block Diagram of VU1 [30]
Figure 3.7 Emotion Engine Processing Element Collaboration Block Diagrams of (a) parallel display list generation and (b) serial display list generation
Figure 4.1 Performance Results of Emotion Engine Implementations of Dot Product Matrix Multiplier
Figure 4.2 Performance Results of Optimized Block-Oriented Matrix Multiplier Implementations
Figure 4.3 Performance Analysis of SPS2 VU1 Block Matrix Multiplier
Figure 5.1 Matrix Multiplication Performance and Benchmarking Results
Figure 5.2 Benchmark Performance Summary for Single-Precision AXPY and Dot Product Functions
Figure 5.3 Sample 3x3 Image Processing Mask or Filter
Figure 5.4 Filter Coefficients for (a) Laplacian Image Enhancement, (b) Horizontal Sobel Operators, and (c) Vertical Sobel Operators
Figure 5.5 Benchmark Performance Summary for Halftone Dithering, Laplacian Filtering, and Sobel Filtering Algorithms
Figure 6.1 K-means clustering of a small segment of a Landsat 7 image. A true color composite of the image is shown in (a), while (b) shows the image separated into four clusters. Images (c), (d), (e), and (f) show each of the four clusters of pixels respectively
Figure 7.1 Sample K-Means Clustering Synchronization Procedure
Figure 7.2 Analysis of K-Means Clustering Algorithm Components Based on Percentage of Overall Processing Time
Figure 7.3 Device Collaboration for K-Means Clustering on the Emotion Engine
Figure 7.4 Iterations Per Second Performance of K-Means Clustering on Various Computing Platforms
Figure 7.5 Distance Calculations Per Second Performance of K-Means Clustering on Various Computing Platforms
Figure 7.6 Two Node PlayStation 2 Cluster Performance of K-Means Clustering versus 2.66 GHz Intel Xeon
Figure 7.7 Scaling Estimation for K-Means Clustering on PlayStation 2 Cluster Computer

LIST OF TABLES

Table 3.1 VU Feature Overview and Comparison
Table 4.1 Data Memory Map for Dot Product Calculation on VU1
Table 4.2 Data Memory Map for Dual VU Dot Product Calculations
Table 4.3 Summary of Emotion Engine Dot Product Matrix Multiplier Performance Results
Table 4.4 Summary of Data Transfer Methods Efficiency Test
Table 4.5 VU0 Data Memory Map for Single VU Micro Mode Implementation of Block Matrix Multiplier
Table 4.6 VU0 Data Memory Map for Single VU Micro Mode Implementation of Block Matrix Multiplier
Table 4.7 Summary of Emotion Engine Block-Oriented Matrix Multiplier Performance Results
Table 6.1 Summary of Landsat 7 Bands and General Applications [47]
Table 6.2 Summary of K-means Clustering Example Results
Table 7.1 Memory Map for VU1 Micro Mode Implementation of K-Means Clustering
Table 7.2 Performance Comparison of Sequential and Two-Node Parallel K-Means Clustering on the PlayStation 2

GLOSSARY

Basic Linear Algebra Subprograms (BLAS): A collection of building block functions for vector and matrix-based mathematical operations.

Cluster Computer: A group of complete computers with dedicated interconnects that may be used to share in the processing of tasks or workloads.

Commodity-Off-The-Shelf (COTS): Components or items, such as cluster computing nodes, that may be easily obtained, generally at a minimum cost.

Digital Signal Processor (DSP): A microprocessor designed to repeatedly process streams of digital data.