
ACKNOWLEDGMENT

The author thanks R. Riedi, L. Barreira, and L. Olsen for helpful comments on the multifractal formalism. He is also thankful to the anonymous reviewers and G. Dorffner for helpful comments on the manuscript.

A Kurtosis-Based Dynamic Approach to Gaussian Mixture Modeling

Nikos Vlassis and Aristidis Likas

Abstract—We address the problem of density function estimation using a Gaussian mixture model updated with the expectation-maximization (EM) algorithm. To deal with the case of an unknown number of mixing kernels, we define a new measure for Gaussian mixtures, called total kurtosis, which is based on the weighted sample kurtoses of the kernels. This measure provides an indication of how well the Gaussian mixture fits the data. Then we propose a new dynamic algorithm for Gaussian mixture density estimation which monitors the total kurtosis at each step of the EM algorithm in order to decide dynamically on the correct number of kernels and possibly escape from local maxima. We show the potential of our technique in approximating unknown densities through a series of examples with several density estimation problems.

Index Terms—Expectation-maximization (EM) algorithm, Gaussian mixture modeling, number of mixing kernels, probability density function estimation, total kurtosis, weighted kurtosis.

Manuscript received September 17, 1998.
N. Vlassis is with the RWCP Autonomous Learning Functions SNN, Department of Computer Science, University of Amsterdam, 1098 SJ Amsterdam, The Netherlands (e-mail: [email protected]).
A. Likas is with the Department of Computer Science, University of Ioannina, 45110 Ioannina, Greece (e-mail: [email protected]).
Publisher Item Identifier S 1083-4427(99)05344-8.

I. INTRODUCTION

The Gaussian mixture model [1] has been proposed as a general model for estimating an unknown probability density function, or simply density. The virtues of the model lie mainly in its good approximation properties and the variety of estimation algorithms that exist in the literature [1], [2]. The model assumes that the unknown density can be written as a weighted finite sum of Gaussian kernels, with different mixing weights and different parameters, namely, means and covariance matrices. Then, depending on the estimation algorithm, an optimum vector of these parameters is sought that optimizes some criterion. Most often, the estimation of the parameters of the mixture is carried out by the maximum likelihood method, aiming at maximizing the likelihood of a set of samples drawn independently from the unknown density.

One of the algorithms often used for Gaussian mixture modeling is the expectation-maximization (EM) algorithm, a well-known statistical tool for maximum likelihood problems [3]. The algorithm provides iterative formulae for the estimation of the unknown parameters of the mixture, and can be proven to monotonically increase the likelihood function in each step. However, the main drawbacks of EM are that it requires an initialization of the parameter vector near the solution, and also that it assumes that the total number of mixing kernels is known in advance.

To overcome the above limitations, we propose in this paper a novel dynamic algorithm for Gaussian mixture modeling that starts with a small number of kernels (usually K = 1) and performs EM steps in order to maximize the likelihood of the data, while at the same time it monitors the value of a new measure of the mixture, called total kurtosis, that indicates how well the Gaussian mixture fits the input data. This new measure is computed from the individual weighted sample kurtoses of the mixing kernels, defined by analogy to the weighted means and variances of the kernels and first introduced in [4] for on-line density estimation. Based on the progressive change of the total kurtosis, our algorithm performs kernel splitting and increases the number of kernels of the mixture. This splitting aims at making the absolute value of the total kurtosis as small as possible.

By performing dynamic kernel allocation, the proposed algorithm is capable of finding a good estimate of the number of kernels of the mixture, while it does not require any prior initialization near the solution. As experiments indicate, our approach seems to be superior to the original EM algorithm in approximating an unknown density. This fact renders it a good alternative for Gaussian mixture modeling, especially in cases where little information about the density to be approximated is available beforehand.

In the neural networks literature, a feed-forward network that implements a Gaussian mixture is the probabilistic neural network [5]. The network uses one Gaussian kernel for each input sample, while the variance of each kernel is constant and known and the mixing weights are equal to the reciprocal of the total number of inputs. The network can be regarded as a distributed implementation of the Parzen windows method [6]. Some of the limitations of the original network model were relaxed in subsequent works [7]–[10], leading to network models that implement some variants of the EM algorithm. However, in most approaches the number of kernels of the mixture is considered known in advance, and it turns out that the automatic estimation of K is a difficult problem [1], [11].

Statistical methods or neural network models for estimating the number of kernels of a Gaussian mixture have been proposed in the literature [1], [9], [12], [13]. However, most of them usually cannot satisfy the necessary regularity conditions for estimating the asymptotic distributions of the underlying tests, and thus have to resort to costly heuristic techniques, e.g., Monte Carlo bootstrapping, for obtaining a solution.

In Section II we review the use of Gaussian mixtures as models for probability density estimation, and show how the EM algorithm can be used for obtaining maximum likelihood solutions. In Section III we describe our algorithm. We first define the new measure of total kurtosis that is needed by the algorithm and then show how the algorithm can make use of this quantity to produce better solutions. We consider the univariate case only. Work is in progress to extend the definition and use of total kurtosis to higher dimensions. Section IV gives experimental results from the application of the algorithm to density estimation problems, while Section V summarizes and gives hints for future research.
II. GAUSSIAN MIXTURES AND THE EM ALGORITHM

A. Gaussian Mixtures

We say that a random variable x has a finite mixture distribution when its probability density function p(x) can be written as a finite weighted sum of known densities, or simply kernels. In cases where each kernel is the Gaussian density, we say that x follows a Gaussian mixture. For the univariate case and for a number K of Gaussian kernels, the unknown mixture can be written

p(x) = \sum_{j=1}^{K} \pi_j \, p(x \mid j)    (1)

where p(x|j) stands for the univariate Gaussian N(μ_j, σ_j)

p(x \mid j) = \frac{1}{\sigma_j \sqrt{2\pi}} \exp\left[ -\frac{(x - \mu_j)^2}{2\sigma_j^2} \right]    (2)

parametrized on the mean μ_j and the variance σ_j². In order for p(x) to be a probability density function with integral 1 over the input space, the additional constraints on the weights π_j of the mixture must hold

\sum_{j=1}^{K} \pi_j = 1, \qquad \pi_j \ge 0.    (3)

The Gaussian mixture model is general and under regular conditions it may approximate any continuous function having a finite number of discontinuities [11].

For the estimation problem we assume a training set X = (x_1, ..., x_n) of n independent and identically distributed samples of the random variable x, taking values from an input space, e.g., in the univariate case the real line R. Training aims at finding the number of kernels K and the optimum vector θ of the 3K parameters of the mixture

\theta = (\pi_1, \mu_1, \sigma_1, \ldots, \pi_K, \mu_K, \sigma_K)    (4)

that maximizes the likelihood function

\theta^* = \arg\max_{\theta} L(\theta), \qquad L(\theta) = \prod_{i=1}^{n} p(x_i).    (5)

Although efficient methods exist for the estimation of the 3K parameters of the mixture from a set of samples of x, the automatic estimation of K remains a difficult problem [11].
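As a concrete illustration of the notation in (1), (2), and (5), the short Python/NumPy sketch below evaluates a univariate Gaussian mixture and the logarithm of its likelihood on a data set; the function and variable names are ours and the parameter values are arbitrary, not taken from the paper.

    import numpy as np

    def gaussian(x, mu, sigma):
        # univariate normal density N(mu, sigma), cf. (2)
        return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

    def mixture_pdf(x, pi, mu, sigma):
        # p(x) = sum_j pi_j p(x|j), cf. (1); x may be a scalar or a 1-D array
        x = np.atleast_1d(np.asarray(x, dtype=float))
        return np.sum(pi * gaussian(x[:, None], mu, sigma), axis=1)

    def log_likelihood(data, pi, mu, sigma):
        # log L(theta) = sum_i log p(x_i), the logarithm of (5)
        return float(np.sum(np.log(mixture_pdf(data, pi, mu, sigma))))

    # Illustrative two-kernel mixture evaluated on a few points.
    pi = np.array([0.3, 0.7]); mu = np.array([-2.0, 1.0]); sigma = np.array([0.5, 1.5])
    print(mixture_pdf([-2.0, 0.0, 1.0], pi, mu, sigma))
    print(log_likelihood(np.array([-2.1, 0.3, 1.2]), pi, mu, sigma))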
B. The EM Algorithm

The EM algorithm [2], [3], [14] is a powerful statistical tool for finding maximum likelihood solutions to problems involving observed and hidden variables. The algorithm applies in cases where we ask for maximum likelihood estimates for some observed variables X but we do not know the exact form of their probability density function. Instead, we can compute the joint density of these variables and some hidden variables Y.

At each EM step the algorithm computes the quantity

Q(\theta \mid \theta^{(t)}) = E_Y\left[ \log p(X, Y \mid \theta) \mid X, \theta^{(t)} \right]    (6)

which is a function of the parameter vector θ and which is obtained by averaging the logarithm of the joint density of X and Y, conditioned on θ, over the hidden variables Y, given the observations X and the current estimate of the parameter vector θ^(t) (E step). Then, the next estimate θ^(t+1) of the parameter vector is computed as the value that maximizes the quantity Q (M step). Alternating these two steps, the EM algorithm can be shown [2] to monotonically increase the likelihood of the observations X, thus yielding an optimum θ in the maximum likelihood sense.

The problem of estimating an unknown Gaussian mixture by maximizing the likelihood of the parameter vector θ can be regarded as a problem with hidden variables and thus solved by using the EM algorithm. In this case, the hidden variables are the kernels the input samples statistically belong to, while each EM step provides an improved estimate of the parameters π_j, μ_j, and σ_j of each kernel j, j = 1, ..., K. These iterative formulae can be shown [2] to be

\pi_j^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} P(j \mid x_i)    (7)

\mu_j^{(t+1)} = \frac{\sum_{i=1}^{n} P(j \mid x_i)\, x_i}{\sum_{i=1}^{n} P(j \mid x_i)}    (8)

\sigma_j^{2\,(t+1)} = \frac{\sum_{i=1}^{n} P(j \mid x_i)\, \left( x_i - \mu_j^{(t+1)} \right)^2}{\sum_{i=1}^{n} P(j \mid x_i)}    (9)

where

P(j \mid x_i) = \frac{\pi_j^{(t)}\, p\left( x_i \mid j; \mu_j^{(t)}, \sigma_j^{(t)} \right)}{\sum_{k=1}^{K} \pi_k^{(t)}\, p\left( x_i \mid k; \mu_k^{(t)}, \sigma_k^{(t)} \right)}.    (10)

It is not difficult to see that the weights π_j satisfy the conditions (3) after applying the above formulae for all kernels.

It is useful here to make a qualitative analysis of the above formulae. In each EM step we use the posterior probability P(j|x_i) that a sample x_i belongs statistically to kernel j when the prior probability of the latter is π_j, and which is computed in (10) by applying the continuous version of Bayes' theorem. This quantity P(j|x_i) is computed for every kernel j from the previous estimates of the parameters π_j, μ_j, and σ_j for this kernel, and for input x_i. Then it is summed over all input samples x_i for the estimation of the new priors π_j (7), it is summed weighted by the inputs x_i for the estimation of the new means (8), and it is summed weighted by the square distances of the input samples x_i to the new means for the estimation of the new variances σ_j² (9). By analogy to the formulae of the sample moments in statistics, the estimated parameters in each EM step of the algorithm can be regarded as weighted sample moments of the random variable x, with the weights being the posteriors P(j|x_i).
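The update equations (7)–(10) translate directly into code. The following minimal Python/NumPy sketch performs one EM step for a univariate Gaussian mixture stored as three arrays; the names are ours, and numerical safeguards (e.g., variance floors, log-domain responsibilities) are omitted for brevity.

    import numpy as np

    def em_step(data, pi, mu, sigma):
        # One EM iteration following (7)-(10); data is 1-D, pi/mu/sigma have length K.
        x = data[:, None]                                      # shape (n, 1)
        # E step: posteriors P(j|x_i), cf. (10)
        dens = np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
        resp = pi * dens                                       # pi_j p(x_i|j), shape (n, K)
        resp /= resp.sum(axis=1, keepdims=True)
        # M step: weighted sample moments, cf. (7)-(9)
        nk = resp.sum(axis=0)                                  # sum_i P(j|x_i)
        new_pi = nk / len(data)                                # (7)
        new_mu = (resp * x).sum(axis=0) / nk                   # (8)
        new_sigma = np.sqrt((resp * (x - new_mu) ** 2).sum(axis=0) / nk)  # square root of (9)
        return new_pi, new_mu, new_sigma

    # Illustrative run on synthetic two-component data: the estimates roughly
    # recover the generating parameters after a few dozen iterations.
    rng = np.random.default_rng(0)
    data = np.concatenate([rng.normal(-3, 1, 500), rng.normal(3, 1, 500)])
    pi, mu, sigma = np.array([0.5, 0.5]), np.array([-1.0, 1.0]), np.array([1.0, 1.0])
    for _ in range(50):
        pi, mu, sigma = em_step(data, pi, mu, sigma)
    print(pi, mu, sigma)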

C. Limitations of EM

In problems of Gaussian mixtures, due to the nonlinearities of the underlying densities with respect to their parameters μ and σ, the likelihood function almost surely exhibits local maxima or saddle points. It is desirable, therefore, that a maximum likelihood estimation method should be able to escape from such local maxima.

Although the EM algorithm monotonically increases in each step the likelihood of the observations, it cannot ensure convergence of the parameter vector θ to a global maximum (satisfying appropriate nonsingularity conditions) of the parameter space [2]. The algorithm may easily get stuck in a local maximum or saddle point. Moreover, so far we have assumed a known number of mixing kernels and thus have obtained the iterative solutions (7)–(9). In practice this is hardly true: K is usually unknown and has also to be estimated from the input samples. These two constraints severely hamper the efficiency of the EM algorithm.

On the other hand, for the estimation of the number of mixing kernels we cannot use the maximum likelihood method. Maximizing the likelihood with respect to the number of kernels leads to using one kernel for each input sample of X, with the kernel mean equal to the sample. Apparently, such a solution would lead to a large number of kernels, equal to the cardinality of X, giving rise to overfitting.

In general, it appears that the EM algorithm for Gaussian mixtures suffers from the problems of local maxima and an unknown number of kernels. In order to solve these two problems, we need a measure of the quality of the approximation at any instant as an indicator of how well the model fits the data. A bad fit would necessitate a change of the parameter vector, or even an increase of the dimensionality of the parameter space. In the following we touch upon these issues.

III. THE KURTOSIS-BASED ALGORITHM FOR GAUSSIAN MIXTURE MODELING

A. Total Kurtosis Measure

Assuming that the random variable x follows a Gaussian mixture with a predefined number of kernels K, the EM algorithm yields the iterative update equations (7)–(9) for the estimation of the parameters of the mixture. In these equations, the posterior probability (10) specifies the probability that a sample x_i statistically belongs to a kernel j, thus each input sample can be regarded as originating from one of the kernels j with probability P(j|x_i).

Based on this quantity, the fitting procedure tries to optimally distribute the K given kernels over the input space, in such a way that most of the details of the unknown density are correctly captured. However, in cases where the number of kernels is not adequate for accurately approximating the input density in some parts, the fitting algorithm, using a limited number of kernels, will unavoidably underestimate the density in those parts and give poor results. It turns out that although the parameters of the kernels, i.e., means and variances, are correctly estimated from the weighted sample moments shown in (8) and (9), it is probable that the fit is not adequate. In Fig. 1 we show such a case: the input samples (vertical bars) follow a bimodal distribution while EM tries to fit them to a single Gaussian kernel. In this case, there is no way to use the first two moments, i.e., mean and variance, to reveal this hidden multimodality.

Fig. 1. Wrong fitting: the EM algorithm tries to fit the input samples (vertical bars) to a Gaussian kernel, while the samples actually follow a bimodal distribution.
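The situation of Fig. 1 is easy to reproduce numerically. The small Python/NumPy experiment below (our own illustration, not taken from the paper) fits a single Gaussian to bimodal data via its sample mean and standard deviation: the first two moments look perfectly reasonable, while the standardized fourth moment, which equals 3 for Gaussian data, immediately flags the misfit; this is the idea developed in the remainder of this subsection.

    import numpy as np

    rng = np.random.default_rng(1)
    bimodal = np.concatenate([rng.normal(-3, 0.5, 2000), rng.normal(3, 0.5, 2000)])
    gauss = rng.normal(0.0, 3.0, 4000)          # Gaussian data with a comparable spread

    for name, x in [("bimodal", bimodal), ("gaussian", gauss)]:
        mu, sigma = x.mean(), x.std()           # the first two sample moments
        fourth = np.mean(((x - mu) / sigma) ** 4)
        # 'fourth' is close to 3 for the Gaussian data and far below 3 for the bimodal data
        print(name, round(mu, 2), round(sigma, 2), round(fourth, 2))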

A solution to the problem is to resort to higher moments in order to decide whether a kernel j fits adequately the samples lying in its vicinity. Under the assumption that p(x|j) in (1) is the Gaussian density (2), it is not difficult to verify that the following equation holds irrespective of the values of μ_j and σ_j

\int_{-\infty}^{\infty} \left( \frac{x - \mu_j}{\sigma_j} \right)^4 p(x \mid j)\, dx = 3.    (11)

Using Bayes' rule we can express p(x|j) in the above formula in terms of the mixture p(x)

p(x \mid j) = \frac{P(j \mid x)}{\pi_j}\, p(x)    (12)

and (11) becomes

\int_{-\infty}^{\infty} \left( \frac{x - \mu_j}{\sigma_j} \right)^4 \frac{P(j \mid x)}{\pi_j}\, p(x)\, dx = 3.    (13)

The left part of this equation can be approximated by Monte Carlo integration [11], [15] from the training data x_i, i = 1, ..., n, which are independently sampled from p(x), to give

\frac{1}{n \pi_j} \sum_{i=1}^{n} \left( \frac{x_i - \mu_j}{\sigma_j} \right)^4 P(j \mid x_i) = 3.    (14)

To make this result more intuitive, we can substitute in the above formula the mixing weights π_j computed in each step of the EM algorithm from (7) to arrive at the result

\frac{\sum_{i=1}^{n} P(j \mid x_i) \left( \frac{x_i - \mu_j}{\sigma_j} \right)^4}{\sum_{i=1}^{n} P(j \mid x_i)} = 3.    (15)

Based on that, we define the weighted kurtosis k_j of a kernel j of the mixture as

k_j = \frac{\sum_{i=1}^{n} P(j \mid x_i) \left( \frac{x_i - \mu_j}{\sigma_j} \right)^4}{\sum_{i=1}^{n} P(j \mid x_i)} - 3    (16)

which, for a true Gaussian mixture, should be zero for all components, and which can be regarded as the "weighted" equivalent of the original definition of the kurtosis of a distribution [15]

\mathrm{Kurt} = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{x_i - \mu}{\sigma} \right)^4 - 3    (17)

where μ and σ are the sample mean and variance, respectively, of the random variable x.

For a mixture of well-separated kernels, where the posteriors P(j|x_i) for a kernel j are almost one for samples belonging to the correct kernel and almost zero for distant samples, the weighted kurtosis k_j approximates the original kurtosis measure (17). On the other hand, if for some kernel j the distribution of the samples in its vicinity is non-Gaussian, the associated weighted kurtosis k_j deviates from zero to a positive or negative number.

To test how large this deviation is for the whole mixture, a new measure is needed that weighs the deviation k_j of each kernel according to its importance π_j for the whole mixture. In this sense, we define a new quantity called total kurtosis as

K_T = \sum_{j=1}^{K} \pi_j \, |k_j|    (18)

which is a weighted average of the individual weighted kurtoses of the kernels of the mixture. The absolute values are needed to compensate for the individual kurtoses taking positive or negative values.
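A minimal Python/NumPy sketch of the weighted kurtoses (16) and the total kurtosis (18) is given below; it reuses the posteriors (10) and, as in the paper, assumes a univariate mixture. The function name and the toy data are ours. Fitting a single Gaussian kernel to uniform data, for instance, yields a clearly nonzero K_T even though the mean and variance are matched.

    import numpy as np

    def weighted_total_kurtosis(data, pi, mu, sigma):
        # Weighted kurtoses k_j of (16) and total kurtosis K_T of (18).
        x = data[:, None]
        dens = np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))
        resp = pi * dens
        resp /= resp.sum(axis=1, keepdims=True)               # posteriors P(j|x_i), cf. (10)
        z4 = ((x - mu) / sigma) ** 4                           # standardized fourth powers
        k = (resp * z4).sum(axis=0) / resp.sum(axis=0) - 3.0   # (16)
        K_T = float(np.sum(pi * np.abs(k)))                    # (18)
        return k, K_T

    # Illustration: one Gaussian kernel moment-matched to uniform (non-Gaussian) data.
    rng = np.random.default_rng(0)
    data = rng.uniform(-2.0, 2.0, 3000)
    pi = np.array([1.0]); mu = np.array([data.mean()]); sigma = np.array([data.std()])
    print(weighted_total_kurtosis(data, pi, mu, sigma))   # k_1 near -1.2, so K_T is far from zero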

The total kurtosis K_T can be regarded as a measure of how well a Gaussian mixture fits the data, since a low value (near zero) indicates that each individual kernel fits naturally the samples in its vicinity, therefore the mixture constitutes a good approximation to the unknown density that generated the samples. On the other hand, a large value of the total kurtosis means that there are kernels that do not fit adequately their corresponding samples, or their parameters (mean and variance) are not properly adjusted.

The measure of kurtosis is important since the value of the likelihood alone does not provide much information regarding the effectiveness of the fit. For example, by using the EM algorithm we arrive at a solution of maximum likelihood for a specific number of kernels, but we cannot be sure whether the solution constitutes an acceptable approximation to the unknown density; the two densities may differ significantly based on other distance measures [16]. On the other hand, we know that a lower bound for the total kurtosis is the zero value. Therefore, we can expect that the lower the total kurtosis value of the obtained solution is, the better is the approximation of the unknown density.

As an example, consider the approximation of two unknown densities using a Gaussian mixture with K = 10 kernels. The first density was a Gaussian mixture with four components, while the second was a uniform density. The maximization of the likelihood provided solutions with log-likelihood values −12 202 and −11 732, respectively. These values contain no information on how well the obtained solutions approximate the corresponding densities. As expected, the first solution accurately approximated the known Gaussian mixture, while the second solution was only a coarse approximation to the uniform density. The total kurtosis of the solutions was 0.03 and 0.31, respectively, for the two cases. This means that the total kurtosis value revealed that the first approximation was accurate, while the second approximation was coarse. In general, solutions of lower total kurtosis provide a better fit to the samples compared to solutions with higher total kurtosis. In the following section we propose an algorithm that is based on EM and which automatically increases the number of kernels based on the value of the total kurtosis of the mixture.

B. The Proposed Algorithm

Based on the definition of the total kurtosis (18), we have developed a new algorithm for Gaussian mixture density estimation that uses the EM algorithm for parameter estimation and automatically adjusts the number of kernels using criteria based on the total kurtosis of the mixture. The proposed algorithm is based on the idea that we should try to maximize the likelihood by performing EM steps that in general lead to a decrease of the total kurtosis value.

More specifically, we start with a small number of kernels (usually K is selected from one to three) and perform EM steps using the K kernels. These EM steps adjust the parameters of the kernels so that the likelihood is increasing and the total kurtosis is decreasing. This procedure is continued until either a local maximum of the likelihood is encountered or the total kurtosis reaches a minimum value and starts increasing. We distinguish between these two cases. In the first case a local maximum of the likelihood is also a local minimum of the total kurtosis, since no further update of the parameters is possible. If this happens, we check the value of the total kurtosis and, if it is sufficiently low, we accept the solution, otherwise we consider that the solution is inadequate. Then, using the current local maximum parameters of the kernels, we create a new initial point for the EM algorithm by splitting one of the kernels in two, as will be described later in this section.

In the second case, where the total kurtosis starts increasing without the likelihood having yet reached a local maximum, we consider that the EM algorithm has done its best in trying to fit each Gaussian to the given samples. Consequently, more kernels are needed to approximate the samples better. Therefore, if an EM step leads to an increase in the value of the total kurtosis, this is considered as the event that triggers the split of a kernel in two kernels in order to provide the capability for better approximation of the unknown density by the Gaussian mixture. After splitting, the K + 1 kernels continue to be updated at each step using the EM algorithm. In general, the splitting of the kernels leads to a decrease in the value of the total kurtosis. Nevertheless, in some cases it is possible that, due to improper initialization of the two new kernels, the value of the kurtosis temporarily increases for some steps, until the kernels move to the right positions, and the kurtosis starts decreasing again. For this reason, for a number of EM steps after a split, no further splits are permitted, since we allow the new kernels to move to their appropriate positions even if this temporarily leads to an increase of the total kurtosis.

Once we have decided when to perform kernel splitting by monitoring the value of the total kurtosis after each EM step, it remains to specify which kernel will be selected for splitting.

A deviation of the total kurtosis K_T from zero implies that the weighted kurtosis of one or more kernels also deviates from the zero value. Therefore, a reasonable selection criterion is to split the kernel j that contributes most significantly to the high value of the total kurtosis, i.e., we select the kernel with the highest value of π_j |k_j|. The two new kernels that are created have means equal to μ_j + σ_j and μ_j − σ_j, respectively, and variances both equal to σ_j². Their priors are set to π_j / 2 so that (3) still holds. (A code sketch of this splitting rule is given after the algorithm listing below.)

Finally, we must also specify termination criteria for the proposed method. Since the EM algorithm converges for a fixed number of kernels, we must specify criteria for disabling kernel splitting in future steps. Consequently, the EM algorithm will converge to a maximum value of the likelihood. A criterion of this kind is a measure of the effectiveness of the split: we store the value of the total kurtosis at the time of a split and the corresponding value at the time of the next split. If the difference is very small, we consider that splitting is no longer effective and, from that time on, we keep the number of kernels fixed and perform EM steps until reaching a local maximum of the likelihood.

The algorithm just described, which dynamically adds kernels based on the value of the total kurtosis, has the attractive feature that it requires no initial knowledge about the number K of the kernels of the mixture. It starts using a small number of kernels and adds kernels to the mixture dynamically, while the algorithm evolves. Moreover, it requires no initialization at a point which is already near the optimum solution, while, by dynamically increasing the number of kernels, it is also capable of potentially escaping from local maxima of the likelihood function, thus yielding a better approximation to the unknown density.

The complete algorithm is summarized below. In the following description ε_1 and ε_2 are user-defined variables, limit denotes the number of steps after the last splitting during which a new splitting is not allowed, nosteps denotes the number of steps after the last splitting, and K_T^split denotes the total kurtosis value at the time of the last split. Moreover, if the variable EnableSplitting is set equal to 0, then no further kernel splitting is allowed.

1) Initialization: Set the initial number K of kernels and initialize the parameters of the kernels (means and variances).
2) Compute the initial value of the total kurtosis K_T^old and the initial value of the likelihood L^old.
3) Set nosteps := 0, K_T^split := K_T^old, EnableSplitting := 1.
   a) Perform an EM step. Set nosteps := nosteps + 1.
   b) Compute the new value of the total kurtosis K_T^new and the new value of the likelihood L^new.
   c) Check for convergence: if |L^new − L^old| < ε_1, go to step f).
   d) If (K_T^new > K_T^old) and (nosteps > limit) and (EnableSplitting = 1), then
      • Perform kernel splitting. K := K + 1, nosteps := 0.
      • K_T^old := K_T^new, L^old := L^new.
      • Compute K_T^new, L^new.
      • If |K_T^split − K_T^new| < ε_2, then EnableSplitting := 0.
      • Set K_T^split := K_T^new.
   e) Go to step a).
   f) If (K_T is not small enough) and (EnableSplitting = 1), then
      • Perform kernel splitting.
      • K := K + 1, nosteps := 0.
      • K_T^old := K_T^new, L^old := L^new.
      • Compute K_T^new, L^new.
      • If |K_T^split − K_T^new| < ε_2, then EnableSplitting := 0.
      • Set K_T^split := K_T^new.
      • Go to step a).
4) End.
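The splitting step of the listing can be sketched compactly. The Python/NumPy function below (names and array layout are ours, not from the paper) selects the kernel with the largest π_j |k_j| and replaces it by two kernels at μ_j ± σ_j with half the prior each, so that (3) still holds; the weighted kurtoses k_j are assumed to have been computed beforehand, e.g., as in the sketch of Section III-A.

    import numpy as np

    def split_worst_kernel(pi, mu, sigma, k):
        # Split the kernel j maximizing pi_j * |k_j| into two kernels at
        # mu_j - sigma_j and mu_j + sigma_j, each keeping sigma_j and
        # receiving half of the parent's prior (the children are appended
        # at the end of the arrays, which does not affect the mixture).
        j = int(np.argmax(pi * np.abs(k)))
        new_pi = np.concatenate([np.delete(pi, j), [pi[j] / 2, pi[j] / 2]])
        new_mu = np.concatenate([np.delete(mu, j), [mu[j] - sigma[j], mu[j] + sigma[j]]])
        new_sigma = np.concatenate([np.delete(sigma, j), [sigma[j], sigma[j]]])
        return new_pi, new_mu, new_sigma

    # Illustration: the second kernel has the larger pi_j * |k_j| and gets split.
    pi = np.array([0.5, 0.5]); mu = np.array([-1.0, 1.0]); sigma = np.array([1.0, 1.0])
    k = np.array([0.1, 1.2])                      # hypothetical weighted kurtoses
    print(split_worst_kernel(pi, mu, sigma, k))

In the monitoring loop of steps 3a)–3f), such a function would be called whenever the conditions of steps d) or f) hold, with EM steps and recomputation of K_T and L in between.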
In all experiments the EM algorithm started with of the likelihood. the means of the u kernels being uniformly distributed within the The algorithm just described that dynamically adds kernels based range of the data, while the deviance ' of each kernel was set equal on the value of the total kurtosis, has the attractive feature that it to 0.5. On the other hand, our algorithm always started with u aI requires no initial knowledge about the number u of the kernels of kernel with mean in the center of the data range and ' also equal the mixture. It starts using a small number of kernels and adds kernels to 0.5. in the mixture dynamically, while the algorithm evolves. Moreover, We have considered three one-dimensional problems:  it requires no initialization at a point which is already near the 1) a Gaussian mixture density with four kernels; optimum solution, while, by dynamically increasing the number of 2) a Gaussian mixture density with five kernels; kernels, it is also capable of potentially escaping from local maxima 3) a density with two Gaussian and two uniform kernels. of the likelihood function, thus yielding a better approximation to the In all experiments, since the original density g@xA is known, we unknown density. could compute the theoretically optimal log-likelihood v~ for the given The complete algorithm is summarized below. In the following set of samples xi, i aIY Yn, drawn from the respective density description IYP are user defined variables, limit denotes the n number of steps after the last splitting during which a new splitting v~ a log g@x AX is not allowed, nosteps denotes the number of steps after the last i (19) iaI splitting and usplit denotes the total kurtosis value at the time of the last split. Moreover if the variable en—˜leƒplitting is set equal to 1 Example 1: In this experiment we have generated samples using then no further kernel splitting is allowed. the following Gaussian mixture density 1) Initialization: Set the initial number u of kernels and initialize g@xAaHXPSx@UY HXSA C HXPSx@QY IA the parameters of the kernels (means and variances). old CHXPSx@QY IA C HXPSx@UY HXSA (20) 2) Compute the initial value of the total kurtosis u„ and the vold initial value of the likelihood . where x@"Y 'A is the with mean " and standard nosteps Xa H u Xa uold en—˜leƒplitting Xa I 3) Set , split „ , . deviation '. The value of the theoretical log-likelihood was v~ a a) Perform an EM step. Set nosteps Xa nosteps CI. IP PHI. Fig. 2 displays the original density g@xA as well as the obtained new b) Compute the new value of the total kurtosis u„ and solutions using our approach and the EM algorithm. The EM algo- the new value of the likelihood vnew. rithm was applied on u aRkernels and provided accurate solution 398 IEEE TRANSACTIONS ON SYSTEMS, MAN, AND CYBERNETICS—PART A: SYSTEMS AND HUMANS, VOL. 29, NO. 4, JULY 1999

Fig. 2 displays the original density g(x) as well as the obtained solutions using our approach and the EM algorithm. The EM algorithm was applied on K = 4 kernels and provided an accurate solution with L = −12 201.4 and total kurtosis K_T = 0.03. Our algorithm was able to identify the correct number of kernels and provided an accurate solution with K = 4 kernels having L = −12 201.9 and K_T = 0.033.

Fig. 2. Approximation of a Gaussian mixture density with four kernels.

Example 2: In this experiment we have generated samples using the following density:

g(x) = 0.2\, N(-7, 1) + 0.2\, N(-3, 0.5) + 0.2\, N(0, 3) + 0.2\, N(3, 0.5) + 0.2\, N(7, 1).    (21)

The value of the theoretical log-likelihood was L̃ = −13 382.6. Fig. 3 displays the original density g(x) as well as the obtained solutions using our approach and the EM algorithm. The EM algorithm was applied on K = 5 kernels and was stuck in a local maximum with L = −13 718 and total kurtosis K_T = 0.35. On the contrary, our algorithm provided a much better solution with K = 6 kernels having L = −13 385 and K_T = 0.062.

Fig. 3. Approximation of a Gaussian mixture density with five kernels.

Example 3: Finally, we have conducted experiments with the following density consisting of two Gaussian and two uniform kernels

g(x) = 0.25\, N(-7, 0.5) + 0.25\, U(-3, -1) + 0.25\, U(1, 3) + 0.25\, N(7, 0.5)    (22)

where U(a, b) denotes the uniform density in [a, b]. The value of the theoretical log-likelihood was L̃ = −10 492. The obtained solutions are shown in Fig. 4. Our algorithm provided a solution with K = 12 kernels having L = −10 577 and total kurtosis K_T = 0.15. The EM algorithm was also tested with K = 12 kernels, but again the obtained solution was worse compared to ours: the log-likelihood value was L = −10 730 and the value of the total kurtosis was K_T = 0.31.

Fig. 4. Approximation of a mixture with two Gaussian and two uniform kernels.

As a conclusion, we can state that the dynamic allocation of new kernels, which is guided by monitoring the value of the total kurtosis, makes the proposed algorithm an efficient method for Gaussian mixture density estimation that yields considerable improvement over the classical EM algorithm. Moreover, as shown in Examples 1 and 2, our algorithm has the ability to approximately identify the number of kernels of an unknown Gaussian mixture density, which is of major importance in many applications.

V. CONCLUSION

We proposed a new method for Gaussian mixture modeling which is based on the EM algorithm and which dynamically adjusts the number of the kernels of the mixture. We defined a new quantity, called total kurtosis, to be used in the algorithm as an indicator of how well a Gaussian mixture fits the data. The algorithm performs EM steps and updates the parameters of the mixture, while at the same time it monitors the value of the total kurtosis and increases the number of kernels in the case where this quantity starts increasing. The increase of the number of kernels is performed through splitting of the kernel that contributes most significantly to the value of the total kurtosis. In this sense the proposed algorithm proceeds by performing both likelihood maximization and kurtosis minimization. The increase in the number of kernels stops when no further progress in the minimization of the kurtosis seems possible. Experimental results on several test problems indicate that our approach constitutes a promising alternative to the EM algorithm.

In this work we have examined the univariate case. Current work focuses on a multidimensional definition of the weighted kurtosis (and the total kurtosis) and the application of the proposed algorithm to density estimation problems of higher dimensionality. Moreover, we aim at testing the effectiveness of the algorithm on several applications where an EM approach has already been employed, e.g., classification, time series, etc.

REFERENCES

[1] D. M. Titterington, A. F. Smith, and U. E. Makov, Statistical Analysis of Finite Mixture Distributions. New York: Wiley, 1985.
[2] R. A. Redner and H. F. Walker, "Mixture densities, maximum likelihood and the EM algorithm," SIAM Rev., vol. 26, pp. 195–239, Apr. 1984.
[3] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc. B, vol. 39, pp. 1–38, 1977.
[4] N. Vlassis, G. Papakonstantinou, and P. Tsanakas, "Mixture density estimation based on maximum likelihood and test statistics," Neural Process. Lett., vol. 9, pp. 63–76, Feb. 1999.
[5] D. F. Specht, "Probabilistic neural networks," Neural Networks, vol. 3, pp. 109–118, 1990.
[6] E. Parzen, "On the estimation of a probability density function and mode," Ann. Math. Statist., vol. 33, pp. 1065–1076, 1962.
[7] H. G. C. Travén, "A neural network approach to statistical pattern classification by 'semiparametric' estimation of probability density functions," IEEE Trans. Neural Networks, vol. 2, pp. 366–377, May 1991.
[8] R. L. Streit and T. E. Luginbuhl, "Maximum likelihood training of probabilistic neural networks," IEEE Trans. Neural Networks, vol. 5, no. 5, pp. 764–783, 1994.
[9] S. Shimoji, "Self-organizing neural networks based on Gaussian mixture model for PDF estimation and pattern classification," Ph.D. dissertation, Univ. Southern California, Los Angeles, 1994.
[10] N. Vlassis, A. Dimopoulos, and G. Papakonstantinou, "The probabilistic growing cell structures algorithm," in Proc. ICANN'97, Int. Conf. Artificial Neural Networks, Lausanne, Switzerland, Oct. 1997, pp. 649–654.
[11] B. D. Ripley, Pattern Recognition and Neural Networks. Cambridge, U.K.: Cambridge Univ. Press, 1996.
[12] G. J. McLachlan, "On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture," Appl. Statist., vol. 36, pp. 318–324, 1987.
[13] W. D. Furman and B. G. Lindsay, "Testing for the number of components in a mixture of normal distributions using moment estimators," Comput. Statist. Data Anal., vol. 17, pp. 473–492, 1994.
[14] G. J. McLachlan and T. Krishnan, The EM Algorithm and Extensions. New York: Wiley, 1997.
[15] W. H. Press, S. A. Teukolsky, B. P. Flannery, and W. T. Vetterling, Numerical Recipes in C, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 1992.
[16] S. Ingrassia, "A comparison between the simulated annealing and the EM algorithms in normal mixture decompositions," Stat. Comput., vol. 2, pp. 203–211, 1992.

Analysis of Modularly Composed Nets by Siphons

MuDer Jeng and Xiaolan Xie

Abstract—This paper uses siphons to analyze the class of Petri nets constructed by a modular approach in [5] for modeling manufacturing systems with shared resources. A resource point of view is taken. First the behavior of each resource is modeled using resource control nets, strongly connected state machines with one place being marked initially. Interactions among the resources are modeled through merging of common transition subnets. This paper provides conditions, expressed in terms of siphons, under which reversibility and liveness of the integrated model are obtained. Relations between siphons and circular wait are formally established. Superiority of the siphon-based analysis over a previous analysis using circular wait is shown.

Index Terms—Analysis, Petri nets, synthesis.

I. INTRODUCTION

The modular approach is an efficient way to cope with the complexity in modeling a large-scale system. It consists in decomposing the system into simple subsystems called modules, modeling each module, and integrating the module models together to obtain the model of the whole system.

A major concern, when modeling a real-life system, is to check whether the Petri net model has desired qualitative properties such as liveness, boundedness, and reversibility. As far as manufacturing systems are concerned, liveness ensures that blocking will never occur, boundedness guarantees that the number of in-process parts is bounded, and reversibility enables the system to come back to its initial state from whatever state it reaches.

Due to the complexity of real-life systems, classical property checking methods such as the coverability tree, invariant analysis, and algebraic analysis (see [10]) hardly apply. There are two classes of methods for analyzing a large Petri net model. The first one is the reduction of Petri nets while preserving properties. Reduction rules have been proposed [2], [9]. The main disadvantage of this approach lies in the difficulty of finding reducible subnets. The second class of methods includes synthesis methods, which build the models systematically and progressively such that the desired properties are preserved all along the design process. Two synthesis approaches, the top-down approach and the bottom-up approach, have been proposed.

The top-down approach begins with an aggregate model of the system which is refined progressively to introduce more and more details. The basic refinement is the substitution of a place or a transition by a so-called well-formed block [12], [13], [15]. Conditions under which the desired properties are preserved are given. This approach is well suited to model systems composed of almost independent sub-systems. However, this approach loses its efficiency in case of strongly coupled sub-systems since it is impossible to find small aggregate models.

The bottom-up approach [1], [4], [5], [7], [8], [11], [14] starts from sub-system models and integrates them by merging some places and/or transitions. The disadvantage of the general bottom-up approach lies

Manuscript received June 26, 1998; revised April 1, 1999. M. Jeng is with the Department of Electrical Engineering, National Taiwan Ocean University, Keelung, Taiwan, R.O.C. (e-mail: [email protected]). X. Xie is with INRIA/MACSI Team, ENIM-Ile du Saulcy, 57045 Metz, France (e-mail: [email protected]). Publisher Item Identifier S 1083-4427(99)05343-6.
