Parallel Lossless Compression Using GPUs

Eva Sitaridi* (Columbia University), Rene Mueller (IBM Almaden), Tim Kaldewey (IBM Almaden)
[email protected], [email protected], [email protected]
*Work done while interning at IBM Almaden, partially funded by NSF Grant IIS-1218222

Agenda
• Introduction
• Overview of compression algorithms
• GPU implementation
  – LZSS compression
  – Huffman coding
• Experimental results
• Conclusions

Why compression?
• Data volume doubles every 2 years*
  – Data retained for longer periods
  – Data retained for business analytics
• Make better use of available storage resources
  – Increase storage capacity
  – Improve backup performance
  – Reduce bandwidth utilization
• Compression should be seamless
• Decompression is important for Big Data workloads
*Sybase Adaptive Server Enterprise Data Compression, business white paper, 2012

Compression trade-offs
• Compression ratio (compressed vs. initial input file size), compression speed, decompression speed
• Resources: memory bandwidth, memory space, CPU utilization (more important in some cases!)
• Trade-offs: compression speed vs. compression efficiency, decompression speed vs. compression efficiency, compression speed vs. decompression speed

Compression is resource intensive
• Dataset: English Wikipedia pages, 1 GB XML text dump
• [Figure: compression speed in GB/s (log scale, 0.001 to 1) vs. compression efficiency (0 to 0.4) for pigz, gzip, bzip2, lzma, and xz; a compression efficiency of 0.5 means the compressed file is half the original size]
• Default compression level used; performance measured on an Intel i7-3930K (6 cores, 3.2 GHz)

Compression libraries
• gzip: Deflate format (LZ77 compression + Huffman coding), single threaded
• pigz: parallel gzip
• snappy, xz
• All use LZ variants

LZSS compression
• Encodes input characters as output tokens
  – Literals: unmatched characters
  – Backreferences: (position, length) pairs, subject to a minimum match length
• Example: the input ATTACTAGAATGTTACTA… encodes to ATTACTAGAATGT(2,5)…, since the 5 characters TACTA recur starting at position 2
• The compressor finds the longest match of the unencoded lookahead characters in a sliding window buffer of already coded input

LZSS decompression
• Tokens are expanded against the window buffer
• Example: with window buffer contents WIKIPEDIA.CO, the tokens (0,4)M(5,4)COMM… expand to the output WIKIMEDIACOMM

Huffman algorithm
• Huffman tree: leaves are the encoded symbols, so each character gets a unique prefix
  [Figure: example Huffman tree with 0/1-labeled branches over the leaves 'a', 'f', 's', 'e', 'h', 'r']
• Huffman coding: short codes for frequent characters
• Huffman decoding: (a) traverse the tree to decode, or (b) use look-up tables for faster decoding

What to accelerate?
• Profile of gzip on an Intel i7-3930K; input: compressible database column
• [Figure: LZSS longest match takes 87.2% of the time; the remainder is split among Huffman compress block, LZSS other, Huffman send bits, update CRC, and Huffman count tally (4.9%, 1.9%, 1.9%, 1.8%, 1.4%)]
• More than 85% of the time is spent on string matching, so accelerate LZSS first

Why GPUs?
• LZSS string matching is memory-bandwidth intensive; leverage GPU bandwidth
                               Intel i7-3930K   Tesla K20x
Memory bandwidth (spec)        51.2 GB/s        250 GB/s
Memory bandwidth (measured)    40.4 GB/s        197 GB/s
#Cores                         6                2688

How to parallelize compression/decompression?
• More than 1000 cores are available!
• Naïve approach: split the input file into independent data blocks and let each thread process its own block (thread 1 on data block 1, thread 2 on data block 2, …); a minimal sketch follows
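The naïve scheme can be pictured as a CUDA kernel in which one thread runs sequential LZSS over one data block. This is a hedged sketch, not the authors' implementation: the token format (0x00 + literal, or 0x01 + 16-bit position + 8-bit length) and the buffer sizing are illustrative assumptions.

```cuda
// Minimal sketch of the naive approach: one thread compresses one
// independent data block with sequential LZSS (sliding-window longest match).
#define MIN_MATCH 3
#define WINDOW    4096

__global__ void lzssNaiveKernel(const unsigned char *in, unsigned char *out,
                                int *outBytes, int blockSize, int numBlocks)
{
    int b = blockIdx.x * blockDim.x + threadIdx.x;   // one data block per thread
    if (b >= numBlocks) return;

    const unsigned char *src = in + (size_t)b * blockSize;
    unsigned char *dst = out + (size_t)b * blockSize * 2;   // worst-case space
    int cur = 0, o = 0;

    while (cur < blockSize) {
        int bestLen = 0, bestPos = 0;
        int start = (cur > WINDOW) ? cur - WINDOW : 0;
        for (int p = start; p < cur; ++p) {          // scan the sliding window
            int len = 0;
            while (cur + len < blockSize && len < 255 &&
                   src[p + len] == src[cur + len])
                ++len;                               // extend the match
            if (len > bestLen) { bestLen = len; bestPos = p; }
        }
        if (bestLen >= MIN_MATCH) {                  // backreference token
            dst[o++] = 0x01;
            dst[o++] = (unsigned char)(bestPos >> 8);
            dst[o++] = (unsigned char)bestPos;
            dst[o++] = (unsigned char)bestLen;
            cur += bestLen;
        } else {                                     // literal token
            dst[o++] = 0x00;
            dst[o++] = src[cur++];
        }
    }
    outBytes[b] = o;                                 // compressed size of block b
}
```

The next two slides explain why this one-thread-per-block layout performs poorly on a GPU.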
Memory access pattern
• Actual pattern with one block per thread: for data block sizes above 32 KB, threads T1, T2, T3 touch addresses far apart, so many cache lines are loaded per access and effective memory bandwidth is low
• Optimal GPU pattern: the memory accesses of the threads fall in the same cache line

Thread utilization
• SIMT architecture: threads execute in groups
• Match loop executed by each thread (i = thread id):
  j = 0
  …
  while (window[i] == lookahead[j]) { j++; … }
• Each thread needs a different number of iterations, so the group runs as long as its slowest thread
  – Iteration 1: 6 of 6 threads active
  – Iteration 2: 4 of 6 threads active
  – Iteration 3: 1 of 6 threads active
  – Thread utilization: (6+4+1)/(3*6) = 11/18 = 61%

GPU LZSS compression
• Better approach: each data block is processed by a thread group
• Thread groups write to an intermediate output, which is then compacted into the output file
• Store the list of compressed data-block offsets to enable parallel decompression

Compression efficiency vs. compression performance
• GPU LZSS* with a 66-character lookahead and 64K-character data blocks
• [Figure: growing the window size makes performance drop faster, with no gain in compression efficiency]
*Related papers: A. Ozsoy and M. Swany, "CULZSS: LZSS Lossless Data Compression on CUDA"; A. Balevic, "Parallel Variable-Length Encoding on GPGPUs"

GPU LZSS decompression
• Example: the tokens CCGA(0,2)CGG(4,3)AGTT expand to the uncompressed output CCGACCCGGCCCAGTT
1) Compute the total size of the tokens (serialized)
2) Read the tokens (parallel)
3.1) Compute the uncompressed output positions
3.2) Write the uncompressed output
• Problem: backreferences processed in parallel might be dependent; here (4,3) reads bytes that (0,2) produces
• Use the voting function __ballot to detect conflicts; a hedged sketch of the idea follows
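One way the conflict check could look in a warp, sketched under assumptions: the per-token output offsets (`dst`, `roundStart`) were computed in step 1, literals are written in an earlier pass, and `__ballot` is the pre-CUDA 9 intrinsic named in the deck (today `__ballot_sync`). The conflict test and serial fallback are illustrative, not the authors' code.

```cuda
// Hedged sketch: a backreference conflicts if its source range reaches into
// the region being produced by this warp-parallel round, i.e. it would read
// bytes another lane is still writing. __ballot gathers one vote per lane
// into a 32-bit mask. Warp-synchronous execution is assumed (pre-Volta).
__device__ void resolveWarpTokens(bool isBackref, int src, int len, int dst,
                                  int roundStart,   // first dst offset of this round
                                  unsigned char *out)
{
    bool conflict = isBackref && (src + len > roundStart);
    unsigned mask = __ballot(conflict);        // one conflict bit per lane

    if (isBackref && !conflict) {
        for (int k = 0; k < len; ++k)          // safe: source already materialized
            out[dst + k] = out[src + k];
    }
    if (mask != 0 && (threadIdx.x % 32) == 0) {
        // ... lane 0 (or a later pass) expands the flagged backreferences
        //     serially, in token order, as in Case C below ...
    }
}
```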
Writing LZSS tokens to output
• Case A: all literals, e.g. CCGAGATTGAGTT
  1) Write literals (parallel)
• Case B: literals and non-conflicting backreferences, e.g. CCGA(0,2)CGG(0,3)AGTT
  1) Write literals (parallel)
  2) Write backreferences (parallel)
• Case C: literals and conflicting backreferences, e.g. CCGA(0,2)CGG(4,3)AGTT
  1) Write literals (parallel)
  2) Write non-conflicting backreferences (parallel)
  3) Write remaining backreferences (serial)

Huffman entropy coding
• Inherently sequential
• Coding challenge: compute the destination of the encoded data
• Decoding challenge: determine the codeword boundaries
• Focus on decoding for end-to-end decompression

Parallel Huffman decoding
• During coding
  – Split data blocks into sub-blocks and store the sub-block offsets
  – The stored offsets enable parallel sub-block decoding
• During decoding
  – Use look-up tables for decoding rather than Huffman trees
  – Fit the look-up table in shared memory
  – Reduce the number of codes for length and distance
• Trade compression efficiency for decompression speed; a sketch of table-based sub-block decoding follows
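Table-based sub-block decoding might be sketched as follows. The assumptions are mine: a single table indexed by the next MAX_BITS bits of the stream that yields (symbol, codeword length), per-sub-block bit and output offsets recorded at coding time, and bounds/padding checks elided. The table layout is illustrative, not the authors' format.

```cuda
#define MAX_BITS 12                          // longest codeword the table covers

struct LutEntry { unsigned short symbol; unsigned char bits; };

// Hedged sketch: each thread decodes one sub-block, starting at the bit
// offset recorded during coding. The shared-memory table maps the next
// MAX_BITS bits of the stream directly to a symbol and its codeword length.
__global__ void huffmanDecodeSubblocks(const unsigned char *in,
                                       const long *bitOffsets,    // per sub-block
                                       const LutEntry *lutGlobal,
                                       unsigned char *out,
                                       const long *outOffsets,    // per sub-block
                                       int symbolsPerSubblock)
{
    __shared__ LutEntry lut[1 << MAX_BITS];              // ~16 KB: fits in shared
    for (int i = threadIdx.x; i < (1 << MAX_BITS); i += blockDim.x)
        lut[i] = lutGlobal[i];
    __syncthreads();

    int sb = blockIdx.x * blockDim.x + threadIdx.x;      // one sub-block per thread
    long bitPos = bitOffsets[sb];
    long outPos = outOffsets[sb];
    for (int n = 0; n < symbolsPerSubblock; ++n) {
        unsigned window = 0;                 // peek the next MAX_BITS bits
        for (int b = 0; b < MAX_BITS; ++b) {
            long p = bitPos + b;
            window = (window << 1) | ((in[p >> 3] >> (7 - (p & 7))) & 1u);
        }
        LutEntry e = lut[window];
        out[outPos++] = (unsigned char)e.symbol;
        bitPos += e.bits;                    // consume only the matched codeword
    }
}
```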
Experimental system
• Linux, kernel 3.0.74
                               Intel i7-3930K   Tesla K20x
Memory bandwidth (spec)        51.2 GB/s        250 GB/s
Memory bandwidth (measured)    40.4 GB/s        197 GB/s
Memory capacity                64 GB            6 GB
#Cores                         6 (12 threads)   2688
Clock frequency                3.2 GHz          0.732 GHz

Datasets
Dataset             Size     Comp. efficiency*
English Wikipedia   1 GB     0.35
Database column     245 MB   0.98
• Datasets are already loaded in memory; no disk I/O
*For the default parameters of gzip

Decompression performance
• [Figure: GPU decompression throughput with and without data transfers; the GPU-to-CPU transfers slow down end-to-end performance]

Hide GPU-to-CPU transfer I/O using CUDA streams
• Each data block passes through a 4-stage pipeline: Read (copy to GPU), Decode (Huffman), Decompress (LZSS), Write (copy back)
• Batch processing: a single stream runs Read B1, Decode B1, Decompress B1, Write B1, then Read B2, …
• Pipelining the PCIe transfers lets the copies of one block overlap the kernels of another
• Pipelining PCIe transfers plus concurrent kernel execution overlaps all four stages across streams (see the sketch after the conclusions)
• With the full pipeline, the data transfer latency is hidden

Decompression time breakdown
• [Figure: share of decompression time spent in Huffman decoding vs. LZSS for the English Wikipedia and the database column]
• LZSS is faster for incompressible datasets

Decompression performance vs. compression efficiency
• [Figure: English Wikipedia; decompression bandwidth (GB/s, log scale) vs. compression efficiency. GPU Deflate reaches about 10 GB/s including the PCIe transfer, pigz about 1 GB/s, gzip and lzma about 0.1 GB/s, and xz and bzip2 about 0.01 GB/s]

Conclusions
• Decompression
  – Hide the GPU-CPU latency using 4-stage pipelining
  – LZSS is faster for incompressible files
• Compression
  – Reduce the search time (using hash tables?)
Questions?
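As a closing illustration, the 4-stage stream pipeline described above could be driven like this. The kernel names are hypothetical stand-ins for the deck's Decode and Decompress stages, the launch configuration is simplified, and error checking is omitted; host buffers must be pinned (cudaHostAlloc) for the async copies to actually overlap.

```cuda
#include <cuda_runtime.h>

__global__ void huffmanDecodeKernel(const char *in, char *tokens);    // hypothetical
__global__ void lzssDecompressKernel(const char *tokens, char *out);  // hypothetical

// Hedged sketch of the pipeline: Read (H2D copy), Decode (Huffman),
// Decompress (LZSS), Write (D2H copy). Blocks round-robin over the streams,
// so the copies and kernels of different blocks overlap.
void decompressPipelined(const char *hostIn, char *hostOut,           // pinned
                         char *devIn, char *devTok, char *devOut,
                         size_t inBlockBytes, size_t outBlockBytes,
                         int numBlocks, int numStreams)                // <= 16
{
    cudaStream_t streams[16];
    for (int s = 0; s < numStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int b = 0; b < numBlocks; ++b) {
        cudaStream_t s = streams[b % numStreams];
        // Stage 1: Read, copy the compressed block to the GPU
        cudaMemcpyAsync(devIn + b * inBlockBytes, hostIn + b * inBlockBytes,
                        inBlockBytes, cudaMemcpyHostToDevice, s);
        // Stage 2: Decode (Huffman)
        huffmanDecodeKernel<<<1, 256, 0, s>>>(devIn + b * inBlockBytes,
                                              devTok + b * outBlockBytes);
        // Stage 3: Decompress (LZSS)
        lzssDecompressKernel<<<1, 256, 0, s>>>(devTok + b * outBlockBytes,
                                               devOut + b * outBlockBytes);
        // Stage 4: Write, copy the uncompressed block back
        cudaMemcpyAsync(hostOut + b * outBlockBytes, devOut + b * outBlockBytes,
                        outBlockBytes, cudaMemcpyDeviceToHost, s);
    }
    for (int s = 0; s < numStreams; ++s) cudaStreamSynchronize(streams[s]);
    for (int s = 0; s < numStreams; ++s) cudaStreamDestroy(streams[s]);
}
```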