Snappy Compression Algorithm

Decompressing Snappy Compressed Files at the Speed of OpenCAPI Speaker: Jian Fang TU Delft 1 Current Project … Sort SHADE DB DNA Join Spark Scalable CPU Seq Skyline Heterogeneous POWER9 … Accelerated HBM DatabasE HBM ARROW OpenCAPI - OpenCAPI OpenCAPI NVMe Memory SNAP SNAP Decom Actions pression Fletcher Fletcher NVLink FPGA Tables FPGA GPU 2 Big Data Processing Database Analytics Data Movement 3 • Moving Data from/to??? Storage • Slow, Bottleneck • 1st step in data processing and database analytics RLE LZ4 LZW Rawstorage GZIPData CPU Netezza • Decompress-filter Solutions Compressed storage DecompressedFilteredFPGA CPU Data Data Data Decompress-filter 4 Snappy compression Algorithm • LZ77-based, byte-level, lossless Goal • In Hadoop ecosystem 500MB/s OpenCAPI Per Core Speed • Support Parquet, ORC, etc. Core • Low compression ratio i7 • Fast compression and decompression 5 Mechanism of Snappy compression • Two kinds of token: Literal token and copy token • Find match from previous data • Example: ABCDEFGHABCD offset 8, length4 Use (8,4) to replace “ABCD” 7 Snappy Compression No Match! Input … ABCDEFGHABCD Generate a literal token Literal Match! Find next match … ABCDEFGHABCD copy : offset 8 Generate a copy token length 4 Output … (Literal, “ABCDEFGH”) (Copy, 8, 4) 8 Snappy Decompression … (Literal, “ABCDEFGH”) … ABCDEFGH (Copy, 8, 4) Copy … ABCDEFGHABCD 9 Snappy compression Algorithm • An independent 64KB history for every 64KB data • Token: copy or literal, copy lenth<=64B • Token size: 2 Byte minimum, average ≈< 4 Byte(assumption for design) • Highly data dependent Benchmark Compression Ratio Average Token Size DNA1 3.31 5.79 DNA2 2.36 7.38 DNA3 2.57 7.03 TPC-H 2.53 2.72 … … … 10 DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1 Design in FPGA • An independent 64KB history for every 64KB data • Token: copy or literal, copy lenth<=64B • Token size: 2 Byte minimum, average ≈< 4 Byte(assumption for design) • Highly data dependent 16X 64KB … history 4KB BRAM 11 DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 13 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 14 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 15 Token Structure Literal Token • LZ77-based, byte-level • In Hadoop ecosystem • Support Parquet, ORC, etc. • Two types of tokens: literal tokens and copy tokens Literal Token: (literal, length, data) Copy Token : (copy, offset, length) 16 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Token Structure Copy • LZ77-based, byte-level Token • In Hadoop ecosystem • Support Parquet, ORC, etc. • Two types of tokens: literal tokens and copy tokens Tokens are in various length 17 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Prior Solutions: Decode all possiblities 18 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 19 Token Dependency • Read – Read • No dependency • Write – Write Token1 Token 2 Token3 Token 4 • Never write same bytes Read Read • No dependency Write Write Read Read • Write – Read Write Write • Forwarding 20 Token Dependency – Address Conflict • Write – Write • Same block, same line 16X … BRAMs 21 Token Dependency – Address Conflict • Write – Write • Same block, different lines 16X … BRAMs 22 Token Dependency – Address Conflict • Read – Read 16X … BRAMs 23 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation Token size: 4B 2 tokens/cycle • Keep up with the interface bandwidth 250MHz --> 13 instances token size * token/cycle * frequency * instances = OpenCAPI BW 24 IDEA – from token-level to BRAM-level 16X … 16X … 25 Architecture Overview 26 Architecture Overview • Two-level Parser 27 Architecture Overview • Two-level Parser 28 Architecture Overview • Two-level Parser • Parallel BRAM Operations 29 Architecture Overview • Two-level Parser • Parallel BRAM Operations • Relax Execution Model 30 Parallel BRAM Operations & Relax Execution Model 31 Results • Implementation Result (on XCKU15P) Flip-Flop LUTs BRAMs Frequency Power 32K(3.32%) 46K(8.95%) 29(2.95%) 250MHz 2.9W • Benchmark • Alice’s Adventures in Wonderland : 85KB (149KB) for compressed (uncompressed) file. • Throughput FPGA(single CPU(Core i7 with decompressor) one thread) Input 3GB/s 300MB/s Output 5GB/s 500MB/s 32 Publish Paper: J. Fang, J. Chen, P. Hofstee, Z. Al-Ars, J. Hidders, A High-Bandwidth Snappy Decompressor Open in Reconfigurable Logic (CODES+ISSS 2018) Source https://github.com/ChenJianyunp/ FPGA-Snappy-Decompressor.git 33 Contact Me: [email protected] (Jian Fang) .

Snappy Compression Algorithm

Contrasting the Performance of Compression Algorithms on Genomic Data

Schematic Entry

Pack, Encrypt, Authenticate Document Revision: 2021 05 02

Improved Neural Network Based General-Purpose Lossless Compression Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, Idoia Ochoa

Lossless Compression of Internal Files in Parallel Reservoir Simulation

User Commands GZIP ( 1 ) Gzip, Gunzip, Gzcat – Compress Or Expand Files Gzip [ –Acdfhllnnrtvv19 ] [–S Suffix] [ Name ... ]

The Ark Handbook

Arrow: Integration to 'Apache' 'Arrow'

Summer 2010 PPAXAXCENTURIONCENTURION Boston Police Patrolmen’S Association, Inc

Gzip, Bzip2 and Tar EXPERT PACKING

Parquet Data Format Performance

The Deep Learning Solutions on Lossless Compression Methods for Alleviating Data Load on Iot Nodes in Smart Cities