Nakamichi 'Dragoneye' Highlights
Nakamichi 'Dragoneye' highlights:
- The latest Zennish LZSS Microdeduplicator, 100% FREE;
- File-to-File [de]compressor;
- Superfast decompression rates, superslow compression rates;
- On big (500++MB) textual data, second only to Hamid's LzTurbo 29, ratiowise, resourcewise and speedwise - TRIPLE TRUMP :P;
- Single-threaded Non-SIMD console tool written in plain C, compilable under Windows and Linux;
- An LZSS (Lempel–Ziv–Storer–Szymanski) implementation with Greedy Parsing and 1TB Sliding Window;
- Ability to deduplicate chunks as short as 64 bytes, up to 1TB backwards;
- Targets huge textual datasets (mainly English), weak-'n'-slow on binary data;
- One goal is to speed up traversing (full-text parsing) of the whole ~64GB XML dump of Wikipedia via TRANSPARENT decompression;
- The first matchfinder using both the fastest memmem(), Railgun ‘Trolldom’, and B-trees;
- The first parser able to use either Internal or External RAM, selected by a single command line option - 'i' or 'e';
- Hashpot/hashpool (residing in Physical RAM) can be tuned via a command line parameter, thus lowering the heights of the B-trees;
- The B-trees form the second layer, the first being a HASH table handled by FNV1A-Jesteress;
- The Leprechaunesque (Internal/External) B-trees of order 3 (2 keys MAX) are highly optimized;
- DEPRECATED (too slow): To keep the LEAF’s footprint small, keys 36/64 bytes long are hashed by SHA3-224, otherwise left intact;
- The building of B-trees is done in 128 PASSES, so LOCALITY leads to cache-friendliness: instead of confusing/blinding the SSD controller by building 2^27 = 128M B-trees at a time, the 'PASSES' revision lowers the "noise/mayhem" 128 times by processing 1M B-trees at a time;
- SCALABLE! Gets faster when more Physical and/or External RAM is available; on servers with 1TB RAM (or desktops with 64GB and 1TB Optane SSD) it will dance...
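The two-layer lookup and the 128-pass build described in the list above can be pictured with a small sketch. It is only an illustration under loud assumptions: it uses the standard 32-bit FNV-1a hash, not the author's FNV1A-Jesteress variant; it reduces the hash to a slot by plain masking; and it assigns contiguous slot ranges to passes. The function names (fnv1a_32, slot_for_key, pass_for_slot) are hypothetical and do not come from the Nakamichi source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard 32-bit FNV-1a. ASSUMPTION: a stand-in for the author's
   FNV1A-Jesteress, which is a different (faster) variant. */
uint32_t fnv1a_32(const unsigned char *key, size_t len)
{
    uint32_t h = 2166136261u;            /* FNV offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= key[i];
        h *= 16777619u;                  /* FNV prime */
    }
    return h;
}

/* First layer: pick one of 2^hash_bits slots (one B-tree root per slot).
   hash_bits follows the documented 0..32 range; 0 means a single B-tree.
   ASSUMPTION: the hash is reduced by simple masking. */
uint32_t slot_for_key(const unsigned char *key, size_t len, unsigned hash_bits)
{
    assert(hash_bits <= 32);
    uint32_t h = fnv1a_32(key, len);
    if (hash_bits == 32)
        return h;
    return h & ((UINT32_C(1) << hash_bits) - 1u);
}

/* Which build pass a slot would belong to when 2^hash_bits B-trees are
   built in `passes` passes. ASSUMPTION: each pass owns a contiguous
   range of slots, e.g. 2^27 slots / 128 passes = 2^20 (1M) per pass. */
uint32_t pass_for_slot(uint32_t slot, unsigned hash_bits, uint32_t passes)
{
    assert(hash_bits <= 32 && passes > 0);
    uint64_t slots = (uint64_t)1 << hash_bits;
    assert((uint64_t)slot < slots);
    uint64_t per_pass = (slots + passes - 1) / passes;   /* ceiling division */
    return (uint32_t)(slot / per_pass);
}
```

With hash_bits = 27 and passes = 128, every pass covers 1,048,576 slots, which matches the "1M B-trees at a time" figure; a key whose slot is 1048576 would be handled in pass 1, and the last slot in pass 127.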
HOMEPAGE: http://www.sanmayce.com/Nakamichi/index.html#DOWNLOAD

Downloadable at:
https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/520602#comment-1943095
https://gist.githubusercontent.com/Sanmayce/33e5047d45cdcb8e7711cd7d3ed52c7f/raw/d72e7126c8fbfde07c0d727dcb353b0267b8196c/Nakamichi_Ryuugan-ditto-1TB.c
https://community.centminmod.com/threads/a-lzss-microdeduplicator-tagetting-huge-texts-with-c-source.16427/#post-75533

How to compile?:

_MakeELF_Nakamichi_GCC.sh:
gcc -O3 -static -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_btree.elf -D_N_XMM -D_N_prefetch_4096 -D_N_alone -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_POSIX_ENVIRONMENT_ -DLongestLineInclusive=64

_MakeEXE_Nakamichi_GCC.bat:
gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_(5GB)_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_WIN32_ENVIRONMENT_ -DLongestLineInclusive=64

Corpus ‘XML’:

E:\Nakamichi_2019-Aug-06>Nakamichi_Ryuugan-ditto-1TB_btree.exe
[ASCII-art logo printed by the program omitted]
Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, inspired by Haruhiko Okumura sharing, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship, his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded, all credits along with many thanks go to him.
Note3: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18, except 36 and 64) or Leprechaun's B-tree order 3.
Note4: Instead of '_mm_loadu_si128' '_mm_lddqu_si128' is used.
Note5: Maximum compression ratio is 44:1, for 704 bytes long matches within 1TB Sliding Window.
Note6: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
Note7: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
Note8: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
Note9: In this revision, B-tree heuristics are in use, allowing skipping many unnecessary memmem() invocations.
NoteA: The file being compressed should be 64 bytes or longer due to Building-Blocks being in range 4..18, 36, 64.
NoteB: In this compile, the keysizes in the LEAF are not HEXed i.e. not doubled.
NoteC: In this latest (2019-Aug-06) compile, keysizes 36/64 are no longer hashed with SHA3-224, it is slow for this case.
Syntax: Nakamichi infile [outfile hashsize treesize treetype]
        hashsize - hash pool in bits, 0..32, 0 meaning 2^0=1 B-tree per keysize
        treesize - B-trees pool in MB
        treetype - i|e or I|E, meaning (Internal|External) or (Internal|External but building B-trees in 128 passes)
Example1: Nakamichi OSHO.TXT
Example2: Nakamichi OSHO.TXT.Nakamichi
Example3: Nakamichi OSHO.TXT OSHO.TXT.Nakamichi 24 49000 i
Note1: Example above uses (8x2^24)x10 bytes for hash and ~48GB for B-trees of physical RAM.
Note2: The bigger the hash pool, the lesser B-tree tiering, i.e. significantly faster the compression is.
Note3: The 'outfile' name is a dummy, it always is 'infile'+'.Nakamichi', not a bug, just enforcing avoidance of filename mayhem.

E:\Nakamichi_2019-Aug-06>

Nakamichi_2019-Aug-06.zip (112,899 bytes): https://drive.google.com/file/d/1wQyl7MhUXDtr-ZBxwwN6n1KRbo5axa6B/view?usp=sharing
enwik8.Nakamichi (32,917,888 bytes): https://drive.google.com/file/d/1IqeHzpzoHZGvMkUbGRxnuiqCAHZ-eO3L/view?usp=sharing
enwik9.Nakamichi (277,293,058 bytes): https://drive.google.com/file/d/1f1NJjwPXCO8FvnQ-7nzY_I4FEoKdq0kW/view?usp=sharing

Large Text Compression Benchmark
Matt Mahoney, Last update: July 25, 2019
http://mattmahoney.net/dc/text.html

Compression                       Compressed size          Decompresser  Total size   Time (ns/byte)
Program                Options        enwik8       enwik9    size (zip)  enwik9+prog    Comp  Decomp    Mem  Alg   Note
---------------------  ---------  ----------  -----------  ------------  -----------  ------  ------  -----  ----  ----
phda9 1.8                         15,010,414  116,544,849     42,944 xd  116,587,793   86182   86305   6319  CM    83
cmix v17                          14,877,373  116,394,271     208,263 s  116,602,534  641189  645651  25258  CM    83
...
cabarc 1.00.0601       -m lzx:21  28,465,607  250,756,595     51,917 xd  250,808,853    1619      15     20  LZ77
sr3                               28,926,691  253,031,980       9,399 s  253,054,625     148     160     68  SR    26
bzip2 1.0.2            -9         29,008,736  253,977,839      30,036 x  254,007,875     379     129      8  BWT
...
libzling 20160107      e4         29,721,114  259,475,639      35,582 s  259,511,221      83      27     28  ROLZ  48
...
lzc v0.08              10         30,611,315  266,565,255      11,364 x  266,576,619     302      63    550  LZ77
Nakamichi 'Dragoneye'             32,917,888  277,293,058       112,899  277,405,957             1.3         LZSS  85
crush 1.00             cx         31,731,711  279,491,430       2,489 s  279,493,919     948     2.9    148  LZ77  60
xeloz 0.3.5.3          c889       32,441,272  283,621,211      18,771 s  283,639,982    1079       8    230  LZ77  48
bzp 0.2                           31,563,865  283,908,295      36,808 x  283,945,103     110     120      3  LZP
ha 0.98                a2         31,250,524  285,739,328      28,404 x  285,767,732    2010    1800    0.8  PPM
ulz 0.06               c9         32,945,292  291,028,084      49,450 x  291,077,534     325     1.1    490  LZ77  82

60. Tested by Ilia Muravyov on an Intel Core i7-3770K, 4.8 GHz, 16 GB Corsair Vengeance LP 1800 MHz CL9, Corsair Force GS 240 GB SSD, Windows 7 SP1.
82. Tested by Ilia Muraviev on an Intel Core i7-4790K @ 4.6GHz, 32GB @ 1866MHz DDR3 RAM, RAMDisk.
85. Tested by Georgi Marinov on i5-7200U @ 3.1GHz, 8GB @ 2133MHz DDR4 RAM, Windows 10.

Nakamichi ‘The-Eye-of-the-Dragon’ sets a Pareto efficiency (OPEN-SOURCE); only Oodle ‘Mermaid’ and LzTurbo 29 outperform Nakamichi, they set the REAL PARETO FRONTIER!

Decompression rate in nanoseconds per byte, 1.3ns/B:
enwik9.Nakamichi: 725MB/s is 725x1024x1024B per 1,000,000,000ns
enwik9: 1,000,000,000B take (1,000,000,000B/(725x1024x1024B))x1,000,000,000ns = 1,315,412,850ns
Or, Nakamichi decompresses enwik9 in 1.3s on a laptop.
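The conversion from a decompression rate in MB/s to the benchmark's ns/byte column can be reproduced with a few lines of C. This is a minimal sketch, assuming (as the arithmetic above does) that 1MB means 1024x1024 bytes; the function names ns_per_byte and total_ns are hypothetical and are not part of Nakamichi.

```c
#include <assert.h>

/* Nanoseconds spent per byte at a given rate.
   ASSUMPTION: 1 MB = 1024 * 1024 bytes, as in the text's arithmetic. */
double ns_per_byte(double mb_per_s)
{
    assert(mb_per_s > 0.0);
    return 1e9 / (mb_per_s * 1024.0 * 1024.0);
}

/* Total time in nanoseconds to process `bytes` bytes at that rate. */
double total_ns(double bytes, double mb_per_s)
{
    return bytes * ns_per_byte(mb_per_s);
}
```

Calling ns_per_byte(725.0) gives roughly 1.315, which rounds to the 1.3 ns/byte entered in the table, and total_ns(1e9, 725.0) gives roughly 1.315e9 ns, i.e. about 1.3 seconds for the 1,000,000,000 bytes of enwik9.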