Bicriteria data compression∗ Andrea Farruggia, Paolo Ferragina, Antonio Frangioni, and Rossano Venturini Dipartimento di Informatica, Universit`adi Pisa, Italy ffarruggi, ferragina, frangio,
[email protected] Abstract lem, named \compress once, decompress many times", In this paper we address the problem of trading that can be cast into two main families: the com- optimally, and in a principled way, the compressed pressors based on the Burrows-Wheeler Transform [6], size/decompression time of LZ77 parsings by introduc- and the ones based on the Lempel-Ziv parsing scheme ing what we call the Bicriteria LZ77-Parsing problem. [35, 36]. Compressors are known in both families that The goal is to determine an LZ77 parsing which require time linear in the input size, both for compress- minimizes the space occupancy in bits of the compressed ing and decompressing the data, and take compressed- file, provided that the decompression time is bounded space which can be bound in terms of the k-th order by T . Symmetrically, we can exchange the role of empirical entropy of the input [25, 35]. the two resources and thus ask for minimizing the But the compressors running behind those large- decompression time provided that the compressed space scale storage systems are not derived from those scien- is bounded by a fixed amount given in advance. tific results. The reason relies in the fact that theo- We address this goal in three stages: (i) we intro- retically efficient compressors are optimal in the RAM duce the novel Bicriteria LZ77-Parsing problem which model, but they elicit many cache/IO misses during formalizes in a principled way what data-compressors the decompression step.