Performance Evaluation of Gluster and Compuverde Storage Systems Comparative Analysis

Thesis no: MSEE-2016-30 Performance Evaluation of Gluster and Compuverde Storage Systems Comparative analysis Poojitha Rajana Faculty of Computing Blekinge Institute of Technology SE-371 79 Karlskrona Sweden This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfilment of the requirements for the degree of Master of Science in Electrical Engineering with Emphasis in Telecommunication Systems. The thesis is equivalent to 20 weeks of full time studies. Contact Information: Author(s): Poojitha Rajana E-mail: [email protected], [email protected] External advisor: Stefan Bernbo CEO, Compuverde AB E-mail: [email protected] University advisor: Dr. Dragos Ilie Assistant Professor of Telecommunication Systems Department of Communication Systems E-mail: [email protected] Faculty of Computing Internet : www.bth.se Blekinge Institute of Technology Phone : +46 455 38 50 00 SE-371 79 Karlskrona, Sweden Fax : +46 455 38 50 57 ii ABSTRACT Context. Big Data and Cloud Computing nowadays require large amounts of storage that are accessible by many servers. To overcome the performance bottlenecks and single point of failure distributed storage systems came into force. So, our main aim in this thesis is evaluating the performance of these storage systems. A file coding technique is used that is the erasure coding which will help in data protection for the storage systems. Objectives. In this study we investigate the performance evaluation of distributed storage system and understand the effect on performance for various patters of I/O operations that is the read and write and also different measurement approaches for storage performance. Methods. The method is to use synthetic workload generator by streaming and transcoding video data as well as benchmark tool which generate the workload like SPECsfs2014 is used to evaluate the performance of distributed storage systems of GlusterFS and Compuverde which are file based storage. Results. In terms of throughput results, Gluster and Compuverde perform similar for both NFS and SMB server. The average latency results for both NFS and SMB shares indicate that Compuverde has lower latency. When comparing results of both Compuverde and Gluster, Compuverde delivers 100% IOPS with NFS server and Gluster delivers relatively near to the requested OP rate and with SMB server Gluster delivers 100% IOPS and Compuverde delivers more than the requested OP rate. Conclusions. We concluded that Compuverde performs better when compared to gluster even though a low performance hardware is used. We recommend to use cache drive in the Gluster storage nodes for the better performance to avoid bottlenecks. Keywords: Distributed storage, Erasure coding, Performance, Streaming ACKNOWLEDGEMENTS I would like to express my deep gratitude to my supervisor Dr. Dragos Ilie for his valuable support and guidance throughout the completion of my thesis. His guidance, patience, comments and encouragement help me a lot to explore more in the topic. I thank him for his valuable time given for my thesis in his busy schedule. It was great to have such an advisor for my thesis. I would like to thank Dr. Kurt Tutschku examiner for his valuable comments. His guidance in ATS course has helped me a lot in my master thesis. I would like to thank my external supervisor Stefan Bernbo, CEO, Compuverde AB for his guidance, suggestion in my thesis and an opportunity to do as a Research Intern. I would like thank Carl-Henric Möllerström, Vice-President, Compuverde for his support and help during my thesis. I thank Eddie Rapp, system Administrator, Shishir Saha, Software developer and finally Nikoleta Dodova, Tester for helping me in providing the information about the issues that I have faced while performing the thesis and made my project successful at the end. I am thankful to my thesis partner Samuel Sushanth Kolli for his coordination and help for the entire thesis. I thank all my friends who helped me directly or indirectly though out my master studies. I am so thankful to my God for providing me such a wonderful career throughout my life with more strength and confidence in achieving my goals. Last but not least I am very happy to have such a great parent’s who gave me love, support, guidance in my entire life and stood by my side during hard times and encouraged till the end. LIST OF ABBREVIATIONS CIFS Common Internet File System DSS Distributed Storage System DHT Distributed Hash Table GFS Google File System HDD Hard Disk Drive IT Information Technology IOPS Input/output Operations Per Second IaaS Infrastructure as a Service MFS Moose File System NIST National Institute of Standards and Technology NAS Network Attached Storage NFS Network File System OSS Object Storage servers OST Object Storage Target PaaS Platform as a Service SaaS Software as a Service SN Storage Node SSD Solid State Disk SMB Server Message Block SPEC Standard Performance Evaluation Corporation SUT Service Under Test SWBUILD Software Build VDA Video Data Acquisition VDI Virtual Desktop Interface Contents ABSTRACT ...........................................................................................................................................I ACKNOWLEDGEMENTS ................................................................................................................ II LIST OF ABBREVIATIONS ........................................................................................................... III LIST OF TABLES ................................................................................................................................ 6 LIST OF FIGURES .............................................................................................................................. 7 1 INTRODUCTION ....................................................................................................................... 8 1.1 BACKGROUND........................................................................................................................ 8 1.1.1 Cloud computing ............................................................................................................... 8 1.1.2 Big Data ............................................................................................................................ 9 1.1.3 Why Cloud Computing for Big Data? ............................................................................... 9 1.1.4 Distributed Storage Systems ............................................................................................. 9 1.1.5 CONVINcE ..................................................................................................................... 10 1.2 AIM OF THE THESIS .............................................................................................................. 12 1.3 OBJECTIVES ......................................................................................................................... 12 1.4 RESEARCH QUESTIONS ........................................................................................................ 12 1.5 SPLIT OF WORK .................................................................................................................... 12 2 RELATED WORK .................................................................................................................... 14 3 METHODOLOGY .................................................................................................................... 16 3.1 METHOD USED..................................................................................................................... 16 3.1.1 Why SPECsfs2014?......................................................................................................... 16 3.1.2 Description of Four workloads ....................................................................................... 17 3.2 EXPERIMENTAL SETUP ......................................................................................................... 18 3.2.1 GlusterFS and Compuverde setup .................................................................................. 18 3.3 TESTBED WITH SPECSFS LOAD GENERATOR FOR PERFORMANCE EVALUATION .................. 24 3.3.1 Internal structure of SPECsfs 2014: ............................................................................... 25 3.3.2 Gluster and Compuverde test configuration ................................................................... 26 3.3.3 Evaluating Procedure ..................................................................................................... 26 3.3.4 Mounting the File systems .............................................................................................. 28 3.3.5 SPECsfs2014 Workload Generation ............................................................................... 28 3.3.6 Test Configuration of Gluster and Compuverde using SPECsfs2014 ............................. 28 3.4 PERFORMANCE METRICS (RQ2) .......................................................................................... 29 4 RESULTS AND ANALYSIS (RQ1) ......................................................................................... 30 4.1 ANALYSIS FROM RESULTS .................................................................................................... 30 4.1.1 Test results for total throughput using NFS 4.0 and SMB 1.0 ........................................ 30 4.1.2 Test results for Average latency using NFS 4.0 and SMB 1.0 .......................................

Performance Evaluation of Gluster and Compuverde Storage Systems Comparative Analysis

Gluster Roadmap: Recent Improvements and Upcoming Features

Storage Virtualization for KVM – Putting the Pieces Together

Glusterfs Documentation Release 3.8.0

Shared File Systems: Determining the Best Choice for Your Distributed SAS® Foundation Applications Margaret Crevar, SAS Institute Inc., Cary, NC

Texttext Introduction Into Glusterfs Martin Alfke <Martin.Alfke@Buero20

Red Hat Virtualization 4.3 Planning and Prerequisites Guide

Elo Provides On-Demand Infrastructure with Red Hat

An Introduction to Clustering

Software Defined Storage Overview

Introduction to Gluster

Ansible-Powered Red Hat Storage One

Analytical Study on Performance, Challenges and Future Considerations of Google File System