Snappy Compression Algorithm

Snappy Compression Algorithm

Decompressing Snappy Compressed Files at the Speed of OpenCAPI Speaker: Jian Fang TU Delft 1 Current Project … Sort SHADE DB DNA Join Spark Scalable CPU Seq Skyline Heterogeneous POWER9 … Accelerated HBM DatabasE HBM ARROW OpenCAPI - OpenCAPI OpenCAPI NVMe Memory SNAP SNAP Decom Actions pression Fletcher Fletcher NVLink FPGA Tables FPGA GPU 2 Big Data Processing Database Analytics Data Movement 3 • Moving Data from/to??? Storage • Slow, Bottleneck • 1st step in data processing and database analytics RLE LZ4 LZW Rawstorage GZIPData CPU Netezza • Decompress-filter Solutions Compressed storage DecompressedFilteredFPGA CPU Data Data Data Decompress-filter 4 Snappy compression Algorithm • LZ77-based, byte-level, lossless Goal • In Hadoop ecosystem 500MB/s OpenCAPI Per Core Speed • Support Parquet, ORC, etc. Core • Low compression ratio i7 • Fast compression and decompression 5 Mechanism of Snappy compression • Two kinds of token: Literal token and copy token • Find match from previous data • Example: ABCDEFGHABCD offset 8, length4 Use (8,4) to replace “ABCD” 7 Snappy Compression No Match! Input … ABCDEFGHABCD Generate a literal token Literal Match! Find next match … ABCDEFGHABCD copy : offset 8 Generate a copy token length 4 Output … (Literal, “ABCDEFGH”) (Copy, 8, 4) 8 Snappy Decompression … (Literal, “ABCDEFGH”) … ABCDEFGH (Copy, 8, 4) Copy … ABCDEFGHABCD 9 Snappy compression Algorithm • An independent 64KB history for every 64KB data • Token: copy or literal, copy lenth<=64B • Token size: 2 Byte minimum, average ≈< 4 Byte(assumption for design) • Highly data dependent Benchmark Compression Ratio Average Token Size DNA1 3.31 5.79 DNA2 2.36 7.38 DNA3 2.57 7.03 TPC-H 2.53 2.72 … … … 10 DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1 Design in FPGA • An independent 64KB history for every 64KB data • Token: copy or literal, copy lenth<=64B • Token size: 2 Byte minimum, average ≈< 4 Byte(assumption for design) • Highly data dependent 16X 64KB … history 4KB BRAM 11 DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 13 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 14 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 15 Token Structure Literal Token • LZ77-based, byte-level • In Hadoop ecosystem • Support Parquet, ORC, etc. • Two types of tokens: literal tokens and copy tokens Literal Token: (literal, length, data) Copy Token : (copy, offset, length) 16 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Token Structure Copy • LZ77-based, byte-level Token • In Hadoop ecosystem • Support Parquet, ORC, etc. • Two types of tokens: literal tokens and copy tokens Tokens are in various length 17 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Prior Solutions: Decode all possiblities 18 Source: Y. Qiao. An fpga-based snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018. Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation • Keep up with the interface bandwidth 19 Token Dependency • Read – Read • No dependency • Write – Write Token1 Token 2 Token3 Token 4 • Never write same bytes Read Read • No dependency Write Write Read Read • Write – Read Write Write • Forwarding 20 Token Dependency – Address Conflict • Write – Write • Same block, same line 16X … BRAMs 21 Token Dependency – Address Conflict • Write – Write • Same block, different lines 16X … BRAMs 22 Token Dependency – Address Conflict • Read – Read 16X … BRAMs 23 Design Challenge • Deal with different types of tokens • Handle various size of tokens • Parallelize token translation Token size: 4B 2 tokens/cycle • Keep up with the interface bandwidth 250MHz --> 13 instances token size * token/cycle * frequency * instances = OpenCAPI BW 24 IDEA – from token-level to BRAM-level 16X … 16X … 25 Architecture Overview 26 Architecture Overview • Two-level Parser 27 Architecture Overview • Two-level Parser 28 Architecture Overview • Two-level Parser • Parallel BRAM Operations 29 Architecture Overview • Two-level Parser • Parallel BRAM Operations • Relax Execution Model 30 Parallel BRAM Operations & Relax Execution Model 31 Results • Implementation Result (on XCKU15P) Flip-Flop LUTs BRAMs Frequency Power 32K(3.32%) 46K(8.95%) 29(2.95%) 250MHz 2.9W • Benchmark • Alice’s Adventures in Wonderland : 85KB (149KB) for compressed (uncompressed) file. • Throughput FPGA(single CPU(Core i7 with decompressor) one thread) Input 3GB/s 300MB/s Output 5GB/s 500MB/s 32 Publish Paper: J. Fang, J. Chen, P. Hofstee, Z. Al-Ars, J. Hidders, A High-Bandwidth Snappy Decompressor Open in Reconfigurable Logic (CODES+ISSS 2018) Source https://github.com/ChenJianyunp/ FPGA-Snappy-Decompressor.git 33 Contact Me: [email protected] (Jian Fang) .

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    31 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us