Decompressing Snappy Compressed Files at the Speed of OpenCAPI
Speaker: Jian Fang, TU Delft

Current Project
[Figure: SHADE (Scalable Heterogeneous Accelerated DatabasE) project diagram. Labels: Spark, ARROW, Fletcher, SNAP actions, OpenCAPI, NVLink, POWER9 CPU, FPGA, GPU, HBM, NVMe, memory, tables, DB; operations: sort, join, skyline, decompression, DNA Seq.]

Big Data Processing, Database Analytics, Data Movement

• Moving data from/to storage
  • Slow, a bottleneck
  • First step in data processing and database analytics
Solutions
• Netezza
• Decompress-filter
[Figure: two data paths. Labels: storage, raw data, CPU, with compression schemes RLE, LZ4, LZW, GZIP; storage, compressed data, FPGA (decompress-filter), decompressed filtered data, CPU.]

Snappy Compression Algorithm
• LZ77-based, byte-level, lossless
• In Hadoop ecosystem
• Supports Parquet, ORC, etc.
• Low compression ratio
• Fast compression and decompression
[Figure: 500MB/s per core (Core i7) against the goal, OpenCAPI speed.]

Mechanism of Snappy Compression
• Two kinds of tokens: literal tokens and copy tokens
• Find matches in previous data
• Example: in ABCDEFGHABCD, the second “ABCD” matches at offset 8, length 4; use (8, 4) to replace it

Snappy Compression
Input: … ABCDEFGHABCD
• No match: generate a literal token (Literal, “ABCDEFGH”)
• Find next match. Match: generate a copy token (copy: offset 8, length 4)
Output: … (Literal, “ABCDEFGH”) (Copy, 8, 4)

Snappy Decompression
• (Literal, “ABCDEFGH”) --> … ABCDEFGH
• (Copy, 8, 4) --> … ABCDEFGHABCD

Snappy Compression Algorithm
• An independent 64KB history for every 64KB of data
• Token: copy or literal, copy length <= 64B
• Token size: 2 bytes minimum, average roughly 4 bytes or less (assumption for design)
• Highly data dependent

Benchmark   Compression Ratio   Average Token Size (bytes)
DNA1        3.31                5.79
DNA2        2.36                7.38
DNA3        2.57                7.03
TPC-H       2.53                2.72
…           …                   …

DNA data source: ftp://ftp.ncbi.nlm.nih.gov/ncbi-asn1

Design in FPGA
[Figure: the 64KB history, built from 16X 4KB BRAMs.]

Design Challenge
• Deal with different types of tokens
• Handle tokens of various sizes
• Parallelize token translation
• Keep up with the interface bandwidth

Token Structure
• Two types of tokens: literal tokens and copy tokens
• Literal token: (literal, length, data)
• Copy token: (copy, offset, length)
• Tokens vary in length
[Figures: literal token and copy token formats.]
Source: Y. Qiao. An FPGA-based Snappy decompressor-filter. Master’s thesis, Delft University of Technology, 2018.

Prior Solutions: Decode all possibilities

Token Dependency
• Read – Read
  • No dependency
• Write – Write
  • Never write same bytes
  • No dependency
• Write – Read
  • Forwarding
[Figure: Token 1 to Token 4, each with a read and a write.]

Token Dependency – Address Conflict
• Write – Write: same block, same line
• Write – Write: same block, different lines
• Read – Read
[Figure: 16X BRAMs.]

Design Challenge: Keep up with the interface bandwidth
• Token size: 4B
• 2 tokens/cycle
• 250MHz --> 13 instances
• token size * tokens/cycle * frequency * instances = OpenCAPI BW

IDEA – from token-level to BRAM-level
[Figure: 16X BRAMs, token-level versus BRAM-level.]

Architecture Overview
• Two-level Parser
• Parallel BRAM Operations
• Relax Execution Model

Parallel BRAM Operations & Relax Execution Model
[Figure.]

Results
• Implementation result (on XCKU15P)

Flip-Flops    LUTs          BRAMs        Frequency   Power
32K (3.32%)   46K (8.95%)   29 (2.95%)   250MHz      2.9W

• Benchmark
  • Alice’s Adventures in Wonderland: 85KB compressed, 149KB uncompressed
• Throughput

          FPGA (single decompressor)   CPU (Core i7, one thread)
Input     3GB/s                        300MB/s
Output    5GB/s                        500MB/s

Published Paper: J. Fang, J. Chen, P. Hofstee, Z. Al-Ars, J. Hidders, A High-Bandwidth Snappy Decompressor in Reconfigurable Logic (CODES+ISSS 2018)
Open Source: https://github.com/ChenJianyunp/FPGA-Snappy-Decompressor.git

Contact Me: [email protected] (Jian Fang)
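
The literal/copy mechanism from the Snappy Decompression slide, and the bandwidth relation from the Design Challenge slide, can be sketched in a few lines of Python. This is a minimal software sketch, not the FPGA design: the tuple-based token representation and the names `decompress` and `aggregate_bandwidth` are assumptions made for illustration, and the real Snappy byte-level token encoding is not modeled.

```python
from typing import Iterable, Tuple, Union

# Hypothetical token shapes, chosen to mirror the slides:
#   ("literal", data)          -> emit the bytes in data
#   ("copy", offset, length)   -> re-emit length bytes starting offset bytes back
Literal = Tuple[str, bytes]
Copy = Tuple[str, int, int]
Token = Union[Literal, Copy]


def decompress(tokens: Iterable[Token]) -> bytes:
    """Expand a stream of literal/copy tokens into the original bytes."""
    out = bytearray()
    for token in tokens:
        if token[0] == "literal":
            out += token[1]
        elif token[0] == "copy":
            _, offset, length = token
            if not 0 < offset <= len(out):
                raise ValueError("copy offset points outside the history")
            start = len(out) - offset
            # Copy byte by byte so overlapping copies (offset < length) work.
            for i in range(length):
                out.append(out[start + i])
        else:
            raise ValueError(f"unknown token kind: {token[0]!r}")
    return bytes(out)


def aggregate_bandwidth(token_bytes: int, tokens_per_cycle: int,
                        frequency_hz: int, instances: int) -> int:
    """token size * tokens/cycle * frequency * instances, in bytes per second."""
    return token_bytes * tokens_per_cycle * frequency_hz * instances


# The example from the slides: (Literal, "ABCDEFGH") (Copy, 8, 4)
print(decompress([("literal", b"ABCDEFGH"), ("copy", 8, 4)]))  # b'ABCDEFGHABCD'

# 4B tokens, 2 tokens/cycle, 250MHz, 13 instances
print(aggregate_bandwidth(4, 2, 250_000_000, 13) / 1e9)  # 26.0 (GB/s)
```

The byte-by-byte copy loop is the software analogue of the Write – Read dependency on the Token Dependency slide: a copy may need bytes that were produced only a moment earlier, which is why the hardware needs forwarding.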