Benchmarking the Amazon Elastic Compute Cloud (EC2)
A Major Qualifying Project Report submitted to the Faculty of
WORCESTER POLYTECHNIC INSTITUTE
in partial fulfillment of the requirements for the Degree of Bachelor of Science

by
Patrick Coffey
Niyanta Mogre
Jason Beliveau
Andrew Harner

Approved by: Professor Donald R. Brown, WPI Project Advisor

March 9, 2011

Abstract

This project sought to determine whether Amazon EC2 is an economically viable environment for scientific research-oriented activities within a university setting. The methodology involved benchmarking the performance of the cloud servers against the best systems available at the WPI ECE department. Results indicate that the newest ECE server outperformed the best EC2 instance by approximately 25% in most cases. A comprehensive cost analysis suggests that EC2 instances can achieve up to 60% better cost-to-performance ratios in the short term when compared against the ECE servers. However, a long-term projected cost analysis determined that the overall cost of owning a large set of reserved instances comes out to almost 60% more than the cost of comparable in-house servers.

Acknowledgments

We would like to thank our advisor, Professor Donald R. Brown, for his guidance throughout the course of this project. We would also like to thank Professor Sergey Makarov for providing us access to his research for our case study, as well as Professor Berk Sunar for advising us on the aspects of computer architecture needed to analyze our results. We would also like to thank Professor Hugh Lauer for giving us the opportunity to run the SPEC benchmarks and for his advice on benchmarking concepts. Finally, we would like to thank Robert Brown for allowing us the use of the ECE servers and for generously providing sufficient resources to enable our benchmarking.

Table of Contents

1 Introduction
2 Background
  2.1 Cloud Computing
  2.2 The Amazon Elastic Compute Cloud
  2.3 In-House Servers
  2.4 Prior Work
    2.4.1 Cluster vs. Single-Node Computing
    2.4.2 Physical vs. Virtual Resources
    2.4.3 Cost Evaluation
  2.5 The Science of Benchmarking
  2.6 Benchmarks and Case Study
    2.6.1 Pystone
    2.6.2 MATLAB Benchmark
    2.6.3 RAMSpeed (RAMSMP)
    2.6.4 SPEC2006 Benchmarks
    2.6.5 MATLAB Case Study
3 Methodology
  3.1 EC2 Configuration
  3.2 Determination of the Performance Comparison Ratio
    3.2.1 Purpose
    3.2.2 Methods
    3.2.3 Results
      3.2.3.1 Pystone Results
      3.2.3.2 MATLAB Benchmark Results
      3.2.3.3 RAMSMP FLOATmem Results
      3.2.3.4 RAMSMP Float Reading Results
      3.2.3.5 RAMSMP Float Writing Results
      3.2.3.6 RAMSMP INTmem Results
      3.2.3.7 RAMSMP Int Reading Results
      3.2.3.8 RAMSMP Int Writing Results
      3.2.3.9 SPEC Results
      3.2.3.10 MATLAB Case Study Results
  3.3 Procedure
    3.3.1 Benchmarking
      3.3.1.1 Pystone and MATLAB Benchmarks
      3.3.1.2 RAMSMP
      3.3.1.3 SPEC2006 Benchmark Suite
      3.3.1.4 MATLAB Case Study
    3.3.2 Analysis
      3.3.2.1 Benchmarks
      3.3.2.2 Costs
4 Results
  4.1 Pystone Benchmark
  4.2 MATLAB Benchmark
  4.3 RAMSMP
    4.3.1 FLOATmark and INTmark
    4.3.2 FLOATmem and INTmem
  4.4 SPEC2006
  4.5 MATLAB Case Study
  4.6 Comprehensive Analysis
5 Cost vs. Time to Solution
  5.1 Single-Threaded Applications
  5.2 Multi-Threaded Applications
  5.3 Cost vs. Time to Solution Discussion
6 Cost Analysis
  6.1 Cost Comparison: EC2 vs. Tesla
    6.1.1 Amazon EC2
  6.2 Cost Comparison: EC2 vs. Physical Servers
    6.2.1 The ECE Department Servers
    6.2.2 EC2 Reserved Instances as Servers
  6.3 Discussion
    6.3.1 Tesla vs. EC2
    6.3.2 Server Lab vs. EC2 Instances
7 Overall Discussion
  7.1 Issues
  7.2 Cost Comparison
8 Conclusion
References
Appendices
A Pystone Code
B MATLAB Benchmark Code
C Setting up EC2 Instances
  C.1 Creating Instances
  C.2 Connecting to Instances
    C.2.1 The Linux/UNIX/Mac OS X Method
    C.2.2 The Windows Method
D Performance Comparison Ratio Results
  D.1 Pystone
  D.2 MATLAB Benchmark
  D.3 RAMSMP
    D.3.1 Float Memory
    D.3.2 Float Reading
    D.3.3 Float Writing
    D.3.4 Int Memory
    D.3.5 Int Reading
    D.3.6 Int Writing
  D.4 SPEC2006
  D.5 MATLAB Case Study
E Benchmark Results
  E.1 Pystone
  E.2 MATLAB
  E.3 RAMSMP
    E.3.1 Float Memory
    E.3.2 Float Reading
    E.3.3 Float Writing
    E.3.4 Int Memory
    E.3.5 Int Reading
    E.3.6 Int Writing
  E.4 SPEC Benchmarks

1 Introduction

Most private institutions run lengthy simulations and calculations in the course of conducting scientific research. To meet their computing needs, they typically invest in state-of-the-art supercomputers and servers designated for general use by faculty and students. Owning a server carries a cumbersome amount of overhead, including the initial cost of purchasing the machine, the procurement of physical space, annual power and cooling costs, hard disk and power redundancy, and various other maintenance costs. Beyond these monetary costs, servers also carry usage and time constraints. During peak usage, shared servers are exposed to high-traffic situations that degrade performance for every user. Task prioritization can help mitigate this effect, but it is often either used incorrectly or is not an option.
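As a rough illustration of the trade-off examined in Chapters 5 and 6, the Python sketch below compares the amortized annual cost of an owned server against the pay-per-hour cost of a rented cloud instance. The helper functions and every dollar figure, lifetime, and usage estimate are hypothetical placeholders for exposition, not values measured in this report.

    # A minimal sketch of an ownership-vs-rental cost comparison.
    # Every number below is a hypothetical placeholder, not a measured value.

    def server_annual_cost(purchase_price, lifetime_years,
                           power_cooling_per_year, maintenance_per_year):
        """Amortized yearly cost of owning an in-house server."""
        return (purchase_price / lifetime_years
                + power_cooling_per_year
                + maintenance_per_year)

    def ec2_annual_cost(hourly_rate, hours_used_per_year):
        """Yearly cost of renting an on-demand instance for the hours used."""
        return hourly_rate * hours_used_per_year

    # Hypothetical in-house server: an $8,000 machine amortized over four
    # years, plus $1,200/year for power and cooling and $500/year upkeep.
    in_house = server_annual_cost(8000, 4, 1200, 500)

    # Hypothetical on-demand instance at $0.68/hour, used 2,000 hours/year.
    cloud = ec2_annual_cost(0.68, 2000)

    print("In-house server: $%.2f/year" % in_house)  # 3700.00
    print("EC2 on-demand:   $%.2f/year" % cloud)     # 1360.00

Under light utilization the rented instance is cheaper, but as the assumed hours of use grow, the fixed in-house cost is amortized over more work; this crossover is the behavior explored in the long-term cost analysis of Chapter 6.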