PhysComp96 Entropy and Compressibility of Symbol Sequences Full paper Draft, February 23, 1997 Werner Ebeling
[email protected] Thorsten PÄoschel
[email protected] Alexander Neiman
[email protected] Institute of Physics, Humboldt University Invalidenstr. 110, D-10115 Berlin, Germany The purpose of this paper is to investigate long-range correlations in symbol sequences using meth- ods of statistical physics and nonlinear dynamics. Beside the principal interest in the analysis of correlations and fluctuations comprising many letters, our main aim is related here to the problem of sequence compression. In spite of the great progress in this ¯eld achieved in the work of Shannon, Fano, Hu®man, Lempel, Ziv and others [1] many questions still remain open. In particular one must note that since the basic work by Lempel and Ziv the improvement of the standard compression algorithms is rather slow not exceeding a few percent per decade. One the other hand several experts expressed the idee that the long range correlations, which clearly exist in texts, computer programs etc. are not su±ciently taken into account by the standard algorithms [1]. Thus, our interest in compressibility is twofold: (i) We would like to explore how far compressibility is able to measure correlations. In particular we apply the standard algorithms to model sequences with known correlations such as, for instance, repeat sequences and symbolic sequences generated by maps. (ii) We aim to detect correlations which are not yet exploited in the standard compression algorithms and belong therefore to potential reservoirs for compression algorithms.