Incompressibility and Lossless Data Compression: An Approach by Pattern Discovery Incompresibilidad y compresión de datos sin pérdidas: Un acercamiento con descubrimiento de patrones Oscar Herrera Alcántara and Francisco Javier Zaragoza Martínez Universidad Autónoma Metropolitana Unidad Azcapotzalco Departamento de Sistemas Av. San Pablo No. 180, Col. Reynosa Tamaulipas Del. Azcapotzalco, 02200, Mexico City, Mexico Tel. 53 18 95 32, Fax 53 94 45 34
[email protected],
[email protected] Article received on July 14, 2008; accepted on April 03, 2009 Abstract We present a novel method for lossless data compression that aims to get a different performance to those proposed in the last decades to tackle the underlying volume of data of the Information and Multimedia Ages. These latter methods are called entropic or classic because they are based on the Classic Information Theory of Claude E. Shannon and include Huffman [8], Arithmetic [14], Lempel-Ziv [15], Burrows Wheeler (BWT) [4], Move To Front (MTF) [3] and Prediction by Partial Matching (PPM) [5] techniques. We review the Incompressibility Theorem and its relation with classic methods and our method based on discovering symbol patterns called metasymbols. Experimental results allow us to propose metasymbolic compression as a tool for multimedia compression, sequence analysis and unsupervised clustering. Keywords: Incompressibility, Data Compression, Information Theory, Pattern Discovery, Clustering. Resumen Presentamos un método novedoso para compresión de datos sin pérdidas que tiene por objetivo principal lograr un desempeño distinto a los propuestos en las últimas décadas para tratar con los volúmenes de datos propios de la Era de la Información y la Era Multimedia.