Nakamichi 'Dragoneye' Highlights

Nakamichi 'Dragoneye' highlights:

- The latest Zennish LZSS Microdeduplicator, 100% FREE;
- File-to-File [de]compressor;
- Superfast decompression rates, superslow compression rates;
- On big (500++MB) textual data, second only to Hamid's LzTurbo 29 - ratiowise, resourcewise and speedwise, a TRIPLE TRUMP :P;
- Single-threaded non-SIMD console tool written in plain C, compilable under Windows and Linux;
- An LZSS (Lempel-Ziv-Storer-Szymanski) implementation with Greedy Parsing and a 1TB Sliding Window;
- Ability to deduplicate chunks as short as 64 bytes, looking up to 1TB backwards;
- Targets huge textual datasets (mainly English); weak-'n'-slow on binary data;
- One goal is to boost traversing (full-text parsing) of the whole XML dump of Wikipedia, ~64GB strong, via TRANSPARENT decompression;
- The first matchfinder using both the fastest memmem() Railgun 'Trolldom' and B-trees;
- The first parser using either Internal or External RAM, decided by a single command line option - 'i' or 'e';
- The hashpot/hashpool (residing in Physical RAM) can be tuned via a command line parameter, thus lessening the B-tree heights;
- The B-trees form the second layer, the first being a HASH table handled by FNV1A-Jesteress;
- The Leprechaunesque (Internal/External) B-trees of order 3 (2 keys MAX) are highly optimized;
- DEPRECATED (too slow): to keep the LEAF's footprint small, keys 36/64 bytes long were hashed by SHA3-224, otherwise left intact;
- The building of B-trees is done in 128 PASSES, so LOCALITY/LOCALIZATION leads to cache-friendliness: for example, instead of confusing/blinding the SSD controller by building 2^27 ~= 128M B-trees at a time, the 'PASSES' revision lowers the "noise/mayhem" 128 times by processing 1M B-trees at a time (see the C sketch below);
- SCALABLE! Gets faster when more Physical and/or External RAM is available; on servers with 1TB RAM (or desktops with 64GB and a 1TB Optane SSD) it will dance...

HOMEPAGE: http://www.sanmayce.com/Nakamichi/index.html#DOWNLOAD

Downloadable at:
https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/520602#comment-1943095
https://gist.githubusercontent.com/Sanmayce/33e5047d45cdcb8e7711cd7d3ed52c7f/raw/d72e7126c8fbfde07c0d727dcb353b0267b8196c/Nakamichi_Ryuugan-ditto-1TB.c
https://community.centminmod.com/threads/a-lzss-microdeduplicator-tagetting-huge-texts-with-c-source.16427/#post-75533

How to compile?

_MakeELF_Nakamichi_GCC.sh:
gcc -O3 -static -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_btree.elf -D_N_XMM -D_N_prefetch_4096 -D_N_alone -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_POSIX_ENVIRONMENT_ -DLongestLineInclusive=64

_MakeEXE_Nakamichi_GCC.bat:
gcc -O3 -msse4.1 -fomit-frame-pointer Nakamichi_Ryuugan-ditto-1TB_btree.c -o Nakamichi_Ryuugan-ditto-1TB_RAM_(5GB)_GCC730.exe -D_N_XMM -D_N_prefetch_4096 -D_N_alone -D_N_HIGH_PRIORITY -DHashInBITS=24 -DHashChunkSizeInBITS=24 -DRAMpoolInKB=5120 -DBtreeHEURISTIC -D_WIN32_ENVIRONMENT_ -DLongestLineInclusive=64
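To make the 128-PASSES locality idea above concrete, here is a minimal C sketch - an illustration only, not Nakamichi's actual code. The real tool hashes chunks with FNV1A-Jesteress; below a plain FNV-1a stands in, and hash_chunk(), insert_into_btree() and build_in_passes() are hypothetical names (the insert is a do-nothing stub):

#include <stdint.h>
#include <stddef.h>

#define HASH_BITS 27   /* 2^27 ~= 128M B-tree roots in total      */
#define PASSES    128  /* 2^27/128 = 1M roots touched per pass    */

/* Plain FNV-1a, standing in for FNV1A-Jesteress; top HASH_BITS bits kept. */
static uint32_t hash_chunk(const uint8_t *p, size_t n) {
    uint32_t h = 2166136261u;                 /* FNV offset basis */
    while (n--) h = (h ^ *p++) * 16777619u;   /* FNV prime        */
    return h >> (32 - HASH_BITS);
}

/* Hypothetical stub standing in for the real order-3 B-tree insert. */
static void insert_into_btree(uint32_t root, const uint8_t *key, size_t pos) {
    (void)root; (void)key; (void)pos;
}

/* One scan of the input per pass; each pass inserts only into the 1/128
   slice of roots it owns, so reads/writes stay localized (cache- and
   SSD-friendly). Non-overlapping chunks are used here for brevity. */
void build_in_passes(const uint8_t *data, size_t size, size_t chunk) {
    for (uint32_t pass = 0; pass < PASSES; pass++)
        for (size_t i = 0; i + chunk <= size; i += chunk) {
            uint32_t root = hash_chunk(data + i, chunk);
            if (root % PASSES == pass)
                insert_into_btree(root, data + i, i);
        }
}

In this sketch the price is 128 sequential scans of the input; the win is that the scattered B-tree writes stay confined to a small slice of roots at a time.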
Corpus 'XML':

E:\Nakamichi_2019-Aug-06>Nakamichi_Ryuugan-ditto-1TB_btree.exe

[ASCII-art 'Nakamichi' banner]

Nakamichi 'Ryuugan-ditto-1TB', written by Kaze, inspired by Haruhiko Okumura's sharing, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.

Note0: Nakamichi 'Dragoneye' is 100% FREE, licenseless that is.
Note1: Hamid Buzidi's LzTurbo ([a] FASTEST [Textual] Decompressor, Levels 19/29/39) retains kingship; his TurboBench (2017-Apr-07) proves the supremacy of LzTurbo, Turbo-Amazing!
Note2: Conor Stokes' LZSSE2 ([a] FASTEST Textual Decompressor, Level 17) is embedded; all credits along with many thanks go to him.
Note3: The matchfinder is either 'Railgun_Trolldom' (matches longer than 18, except 36 and 64) or Leprechaun's B-trees of order 3.
Note4: Instead of '_mm_loadu_si128', '_mm_lddqu_si128' is used.
Note5: Maximum compression ratio is 44:1, for 704-byte-long matches within the 1TB Sliding Window (a 704-byte match encoded as a 16-byte token: 704/16 = 44).
Note6: Please send me (at [email protected]) decompression results obtained on machines with fast CPU-RAM subsystems.
Note7: In this compile, clock() was replaced with time() - to counter bigtime stats misreporting.
Note8: Multi-way hashing allows each KeySize to occupy its own HASH pool, thus less RAM is in use - the LEAF is smaller.
Note9: In this revision, B-tree heuristics are in use, allowing many unnecessary memmem() invocations to be skipped.
NoteA: The file being compressed should be 64 bytes or longer, due to the Building-Blocks being in range 4..18, 36, 64.
NoteB: In this compile, the keysizes in the LEAF are not HEXed, i.e. not doubled.
NoteC: In this latest (2019-Aug-06) compile, keysizes 36/64 are no longer hashed with SHA3-224 - it is too slow for this case.

Syntax: Nakamichi infile [outfile hashsize treesize treetype]
  hashsize - hash pool in bits, 0..32, 0 meaning 2^0=1 B-tree per keysize
  treesize - B-trees pool in MB
  treetype - i|e or I|E, meaning (Internal|External) or (Internal|External but building B-trees in 128 passes)

Example1: Nakamichi OSHO.TXT
Example2: Nakamichi OSHO.TXT.Nakamichi
Example3: Nakamichi OSHO.TXT OSHO.TXT.Nakamichi 24 49000 i
Note1: Example3 uses (8x2^24)x10 bytes of physical RAM for the hash and ~48GB for the B-trees.
Note2: The bigger the hash pool, the lesser the B-tree tiering, i.e. the significantly faster the compression is.
Note3: The 'outfile' name is a dummy; it always is 'infile'+'.Nakamichi' - not a bug, just enforcing avoidance of filename mayhem.

E:\Nakamichi_2019-Aug-06>
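To picture the two-layer lookup from Note3/Note8/Note9 - HASH table first, tiny B-tree second - here is a minimal order-3 node (2 keys MAX, as in the highlights) and its search routine in plain C. BNode, btree_find and the field layout are assumptions for illustration, not the tool's actual LEAF format:

#include <stdint.h>
#include <string.h>

#define KEYLEN 64  /* one of the Building-Block sizes: 4..18, 36, 64 */

/* Order-3 B-tree node: at most 2 keys, at most 3 children. */
typedef struct BNode {
    int            nkeys;     /* 1 or 2                           */
    const uint8_t *key[2];    /* pointers into the input buffer   */
    uint64_t       pos[2];    /* where each chunk occurred        */
    struct BNode  *child[3];  /* NULL at leaves                   */
} BNode;

/* Returns the earlier position of an identical KEYLEN-byte chunk,
   or (uint64_t)-1 if the tree holds none. Classic B-tree descent;
   the root would come from the hash layer, e.g.
   btree_find(hash_table[hash_chunk(chunk, KEYLEN)], chunk). */
static uint64_t btree_find(const BNode *n, const uint8_t *chunk) {
    while (n) {
        int i = 0;
        while (i < n->nkeys) {
            int c = memcmp(chunk, n->key[i], KEYLEN);
            if (c == 0) return n->pos[i];  /* duplicate found       */
            if (c < 0) break;              /* descend left of key i */
            i++;
        }
        n = n->child[i];
    }
    return (uint64_t)-1;
}

With only 2 keys per node, a wide-enough hash pool keeps each tree just a few levels deep - which is why the Syntax above lets hashsize go all the way up to 32 bits.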
Large Text Compression Benchmark
Matt Mahoney, Last update: July 25, 2019
http://mattmahoney.net/dc/text.html

Program Options               enwik8      enwik9       Compressor   Total size     Comp    Decomp   Mem   Alg   Note
                                                       size (zip)   (enwik9+prog)  (ns/B)  (ns/B)   (MB)
---------------------------   ----------  -----------  ----------   -------------  ------  ------  -----  ----  ----
phda9 1.8                     15,010,414  116,544,849  42,944 xd    116,587,793     86182   86305   6319  CM    83
cmix v17                      14,877,373  116,394,271  208,263 s    116,602,534    641189  645651  25258  CM    83
...
cabarc 1.00.0601 -m lzx:21    28,465,607  250,756,595  51,917 xd    250,808,853      1619      15     20  LZ77
sr3                           28,926,691  253,031,980  9,399 s      253,054,625       148     160     68  SR    26
bzip2 1.0.2 -9                29,008,736  253,977,839  30,036 x     254,007,875       379     129      8  BWT
...
libzling 20160107 e4          29,721,114  259,475,639  35,582 s     259,511,221        83      27     28  ROLZ  48
...
lzc v0.08 10                  30,611,315  266,565,255  11,364 x     266,576,619       302      63    550  LZ77
Nakamichi 'Dragoneye'         32,917,888  277,293,058  112,899      277,405,957               1.3         LZSS  85
crush 1.00 cx                 31,731,711  279,491,430  2,489 s      279,493,919       948     2.9    148  LZ77  60
xeloz 0.3.5.3 c889            32,441,272  283,621,211  18,771 s     283,639,982      1079       8    230  LZ77  48
bzp 0.2                       31,563,865  283,908,295  36,808 x     283,945,103       110     120      3  LZP
ha 0.98 a2                    31,250,524  285,739,328  28,404 x     285,767,732      2010    1800    0.8  PPM
ulz 0.06 c9                   32,945,292  291,028,084  49,450 x     291,077,534       325     1.1    490  LZ77  82

60. Tested by Ilia Muravyov on an Intel Core i7-3770K, 4.8 GHz, 16 GB Corsair Vengeance LP 1800 MHz CL9, Corsair Force GS 240 GB SSD, Windows 7 SP1.
82. Tested by Ilia Muraviev on an Intel Core i7-4790K @ 4.6GHz, 32GB @ 1866MHz DDR3 RAM, RAMDisk.
85. Tested by Georgi Marinov on an i5-7200U @ 3.1GHz, 8GB @ 2133MHz DDR4 RAM, Windows 10.

Nakamichi_2019-Aug-06.zip (112,899 bytes):
https://drive.google.com/file/d/1wQyl7MhUXDtr-ZBxwwN6n1KRbo5axa6B/view?usp=sharing
enwik8.Nakamichi (32,917,888 bytes):
https://drive.google.com/file/d/1IqeHzpzoHZGvMkUbGRxnuiqCAHZ-eO3L/view?usp=sharing
enwik9.Nakamichi (277,293,058 bytes):
https://drive.google.com/file/d/1f1NJjwPXCO8FvnQ-7nzY_I4FEoKdq0kW/view?usp=sharing

Nakamichi 'The-Eye-of-the-Dragon' sets a Pareto-efficient point (OPEN-SOURCE); only Oodle 'Mermaid' and LzTurbo 29 outperform Nakamichi - they set the REAL PARETO FRONTIER!

Decompression rate in nanoseconds per byte, 1.3ns/B:
enwik9.Nakamichi: 725MB/s is 725x1024x1024 bytes per 1,000,000,000ns;
enwik9: 1,000,000,000 bytes take (1,000,000,000/(725x1024x1024))x1,000,000,000ns = 1,315,412,850ns.
Or, Nakamichi decompresses enwik9 in 1.3s on a laptop.
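The rate arithmetic above can be double-checked with a few lines of C - nothing Nakamichi-specific, it just reproduces the 1.3ns/B figure from the reported 725MB/s:

#include <stdio.h>

int main(void) {
    const double rate_Bps = 725.0 * 1024 * 1024;  /* 725MB/s, as reported  */
    const double enwik9_B = 1000000000.0;         /* enwik9 is 10^9 bytes  */
    const double ns_per_B = 1e9 / rate_Bps;       /* ~1.3154 ns per byte   */
    const double total_ns = enwik9_B * ns_per_B;  /* ~1,315,412,850 ns     */
    printf("%.4f ns/B, %.0f ns total (~%.1f s)\n", ns_per_B, total_ns, total_ns / 1e9);
    return 0;
}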
