Dictionary-Based Compression Algorithms in Mobile Packet Core
Master of Science in Computer Science
October 2019

Dictionary-based Compression Algorithms in Mobile Packet Core

Lakshmi Venkata Sai Sri Tikkireddy

Faculty of Computing, Blekinge Institute of Technology, 371 79 Karlskrona, Sweden

This thesis is submitted to the Faculty of Computing at Blekinge Institute of Technology in partial fulfillment of the requirements for the degree of Master of Science in Computer Science. The thesis is equivalent to 20 weeks of full-time studies.

Contact Information:
Author(s): Lakshmi Venkata Sai Sri Tikkireddy
E-mail: [email protected]
External advisors:
1. Erik Vargas ([email protected])
2. Nils Ljungberg ([email protected])
University advisor: Siamak Khatibi, Department of Telecommunications

Faculty of Telecommunications, Blekinge Institute of Technology, SE-371 79 Karlskrona, Sweden
Internet: www.bth.se
Phone: +46 455 38 50 00
Fax: +46 455 38 50 57

ABSTRACT

Context: With the rapid growth of technology, the amount of data to be transmitted and stored keeps increasing. Efficient information retrieval and storage has become a major challenge, which is where data compression comes into the picture. Data compression is a technique that reduces the size of data in order to save storage and to speed up its transmission from one place to another. Compression techniques fall into two main categories, lossy and lossless, and lossless compression is the one commonly used when the data must be recovered exactly. At Ericsson, the SGSN-MME uses one such technique, Deflate, to compress each user's data independently. Because of its trade-off between compression ratio and compression and decompression speed, Deflate is not optimal for the SGSN-MME's use case. To mitigate this problem, Deflate has to be replaced with a better-suited compression algorithm.

Objectives: This research examines several dictionary-based lossless data compression algorithms to find one suitable for the SGSN-MME's use case. To achieve this goal, we first determine the type of data required to create a dataset for the compression algorithms. Once the dataset is created, various dictionary-based algorithms are examined to find the one best suited to the use case, and the performance of that algorithm is compared with and without a pre-defined dictionary.

Methods: An experiment is performed to evaluate the performance of different dictionary-based algorithms in terms of compression ratio and compression time. The data for this experiment is provided by Ericsson AB, Gothenburg, and the dataset consists of user data from the SGSN-MME. The selected dictionary-based algorithms, namely LZ4, Brotli and Zstandard, are evaluated against Deflate on compression ratio and compression speed.

Results: Observation and analysis of the experiment show that Zstandard with a dictionary performed better than the other algorithms in terms of compression ratio and compression speed.

Conclusions: This research concludes by identifying a suitable dictionary-based algorithm. The conclusion is supported by showing that the identified algorithm performs better than the other selected algorithms, LZ4, Brotli and Deflate.

Keywords: Data Compression, Lossless compression, LZ4, Brotli, Zstandard, Deflate, Dictionary-based compression.

ACKNOWLEDGMENT

Firstly, I would like to thank my supervisor, Prof. Siamak Khatibi, Department of Telecommunications, and Prof. Emilia Mendes, Head of the Department of Computer Science and Engineering, for their unmatched guidance and support, without which I could not have completed this study successfully.
I sincerely thank them for believing in me and encouraging me all through this study. I would also like to thank my external supervisors, Erik Vargas and Nils Ljungberg at Ericsson AB, Gothenburg, for their never-ending support and incredible motivation for this thesis. I am grateful to my parents for their unconditional love and support. Lastly, I would like to thank my near and dear friends for traveling alongside me in all my endeavors.

LIST OF FIGURES

FIGURE 1: THE PROCESS OF DATA COMPRESSION
FIGURE 2: VARIOUS COMPRESSION TECHNIQUES
FIGURE 3: TYPES OF LOSSLESS COMPRESSION ALGORITHMS
FIGURE 4: FLOW CHART OF THE LZ4 ALGORITHM
FIGURE 5: ZSTD VS ZLIB (DEFLATE) [40]
FIGURE 6: CITRIX OVERVIEW
FIGURE 7: COMPRESSION OF ZSTANDARD (NORMAL)
FIGURE 8: COMPRESSION OF ZSTANDARD USING THE DICTIONARY
FIGURE 9: COMPARISON OF THE COMPRESSED SIZES
FIGURE 10: ZSTD VS ZSTD_DICT IN TERMS OF COMPRESSION RATIO (AVERAGE)
FIGURE 11: ZSTD VS ZSTD_DICT IN TERMS OF COMPRESSION SPEED (AVERAGE)
FIGURE 12: COMPRESSION SPEED VS RATIO
FIGURE 13: ZSTD VS ZSTD_DICT IN TERMS OF SPACE SAVINGS (AVERAGE)

LIST OF TABLES

TABLE 1: RATIO VS SPEED COMPARISON [42]
TABLE 2: COMPARING THE RATIOS OF THE ALGORITHM RESULTS
TABLE 3: RANKS OF THE TEST CASE RATIOS FOR THE ALGORITHMS
TABLE 4: AVERAGE RANKS OF THE ALGORITHMS

ABBREVIATIONS

1) SGSN: Serving GPRS Support Node
2) GPRS: General Packet Radio Service
3) MME: Mobility Management Entity
4) UE: User Equipment
5) GSNWS: GSN Work Space
6) ETS: Erlang Term Storage
7) OTP: Erlang Open Telecom Platform
8) Zstd: Zstandard
9) Zstd_dict: Zstandard using the developed dictionary
10) FSE: Finite State Entropy
11) ANS: Asymmetric Numeral Systems
12) tANS: Tabled Variant of ANS

TABLE OF CONTENTS

Abstract
Acknowledgment
List of figures
List of tables
Abbreviations
Table of Contents
1 Introduction
  1.1 Problem statement
2 Related Work
  2.1 Entropy-Based Encoding
    2.1.1 Huffman Coding
    2.1.2 Arithmetic Coding
  2.2 Dictionary Based Encoding
    2.2.1 LZ4
    2.2.2 Brotli
    2.2.3 Deflate
    2.2.4 Zstandard
3 Method
  3.1 Experiment Workspace
    3.1.1 Why