Parameter Specification for Fuzzy Clustering by Q-Learning
Chi-Hyon Oh, Eriko Ikeda, Katsuhiro Honda and Hidetomo Ichihashi
Department of Industrial Engineering, College of Engineering, Osaka Prefecture University, 1-1, Gakuencho, Sakai, Osaka, 599-8531, JAPAN
E-mail: [email protected]

Abstract
In this paper, we propose a new method to specify the sequence of parameter values for a fuzzy clustering algorithm by using Q-learning. In the clustering algorithm, we employ similarities between two data points and distances from data to cluster centers as the fuzzy clustering criteria. The fuzzy clustering is achieved by optimizing an objective function, which is solved by the Picard iteration. The fuzzy clustering algorithm is useful, but its result depends on the parameter specification. To overcome this dependency on the parameter values, we use Q-learning to learn the sequential update of the parameters during the iterative optimization procedure of the fuzzy clustering. In a numerical example, we show how the clustering validity improves with the obtained parameter update sequences.

Keywords: Parameter Specification, Fuzzy Clustering, Reinforcement Learning.

I. Introduction

Many fuzzy clustering algorithms have been proposed since Ruspini developed the first one [1]. Fuzzy ISODATA [2] and its extension, Fuzzy c-means [3], are popular fuzzy clustering algorithms based on distance-based objective functions. Other approaches are driven by various other fuzzy clustering criteria. First, we propose a new fuzzy clustering algorithm which adopts the similarities between two data points and the distances from data to cluster centers as the fuzzy clustering criteria to be optimized. We expect that this algorithm can partition a data set into suitable clusters. However, as with other algorithms, the difficulty lies in specifying the parameter values.

Reinforcement learning is a class of learning algorithms for control and navigation problems where a reward is given as a result of a series of actions [4]. Q-learning [5] is a widely used reinforcement learning algorithm in which an exact model of the target problem is not necessarily known. Regarding the desirability of the fuzzy clustering result as a reward, we apply Q-learning to the parameter specification problem for the fuzzy clustering. In our method, the transition of the parameter specifications that yields preferable clustering results is learned by Q-learning.

II. Fuzzy Clustering Algorithm

In our fuzzy clustering algorithm, we employ two fuzzy clustering criteria. One is the similarities between two data points and the other is the distances from data to cluster centers. We optimize the following objective function L; the optimization procedure is itself a clustering algorithm.

\max L = \sum_{c=1}^{C} \Bigl\{ (1-\alpha) \sum_{i=1}^{N} \sum_{j=1, j \neq i}^{N} u_{ci} w_{cj} g(d_{ij}) - \alpha \sum_{j=1}^{N} u_{cj} (x_j - v_c)^T (x_j - v_c) \Bigr\}
  - \beta \sum_{c=1}^{C} \sum_{j=1}^{N} \bigl( C u_{cj} \log u_{cj} + N w_{cj} \log w_{cj} \bigr)
  + \sum_{j=1}^{N} \lambda_j \Bigl( \sum_{c=1}^{C} u_{cj} - 1 \Bigr) + \sum_{c=1}^{C} \gamma_c \Bigl( \sum_{j=1}^{N} w_{cj} - 1 \Bigr),   (1)

where

g(d_{ij}) = \exp\Bigl( -\frac{d_{ij}^2}{\sigma} \Bigr).   (2)

C is the number of clusters and N is the number of data. x_j represents a data point and v_c a cluster center. u_cj and w_cj denote the membership values of the j-th data point in the c-th cluster. Similarities are represented by g(·), where d_ij is the distance between the i-th and j-th data points and σ is a control parameter. The first term is maximized by assigning large memberships to a cluster when a pair of data points has a large similarity. Simultaneously, the first term also minimizes the distances between data and cluster centers.
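As a concrete illustration, the following minimal sketch (not the authors' code) computes the similarity of Eq. (2) and evaluates the bracketed similarity/distance trade-off of Eq. (1). The array names X (data), U and W (memberships) and V (cluster centers) are illustrative assumptions.

```python
import numpy as np

def similarity_matrix(X, sigma):
    """Eq. (2): g(d_ij) = exp(-d_ij^2 / sigma) for all pairs of data points."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # squared distances d_ij^2
    return np.exp(-d2 / sigma)

def clustering_criteria(X, U, W, V, sigma, alpha):
    """The bracketed part of Eq. (1): similarity reward minus distance penalty."""
    G = similarity_matrix(X, sigma)
    np.fill_diagonal(G, 0.0)                                  # the inner sum excludes j == i
    similarity = sum(U[c] @ G @ W[c] for c in range(len(V)))  # sum_i sum_{j!=i} u_ci w_cj g(d_ij)
    distance = sum(U[c] @ ((X - V[c]) ** 2).sum(axis=1) for c in range(len(V)))
    return (1 - alpha) * similarity - alpha * distance
```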
The second term represents the entropy regularization used to obtain fuzzy clusters, proposed by Miyamoto et al. [6]. β is the weighting parameter which specifies the degree of fuzziness of the clusters. α is a constant which defines the tradeoff between the two fuzzy criteria. λ_j and γ_c are Lagrange multipliers. We optimize the objective function L with respect to u_cj, w_cj and v_c. From the necessary condition ∂L/∂v_ck = 0 for the optimality of the objective function L, we have

v_{ck} = \sum_{j=1}^{N} u_{cj} x_{jk} \Big/ \sum_{j=1}^{N} u_{cj}, \quad c = 1, 2, \cdots, C, \quad k = 1, 2, \cdots, I,   (3)

where I is the number of items (i.e. the dimensionality of the data). From ∂L/∂u_ci = 0, ∂L/∂w_cj = 0, ∂L/∂λ_j = 0 and ∂L/∂γ_c = 0, we have

u_{ci} = \exp(A_{ci}) \Big/ \sum_{a=1}^{C} \exp(A_{ai}), \quad c = 1, 2, \cdots, C, \quad i = 1, 2, \cdots, N,   (4)

where

A_{ai} = \Bigl\{ (1-\alpha) \sum_{j=1, j \neq i}^{N} w_{aj}\, g(d_{ij}) - \alpha\, (x_i - v_a)^T (x_i - v_a) \Bigr\} \Big/ \beta C,   (5)

and

w_{cj} = \exp(B_{cj}) \Big/ \sum_{a=1}^{C} \exp(B_{aj}), \quad c = 1, 2, \cdots, C, \quad j = 1, 2, \cdots, N,   (6)

where

B_{aj} = (1-\alpha) \sum_{i=1, i \neq j}^{N} u_{ai}\, g(d_{ij}) \Big/ \beta N.   (7)

The optimization algorithm is based on the Picard iteration through the necessary conditions for local optima of the objective function L. Our fuzzy clustering algorithm is as follows:

Algorithm
Step 1: Set the values of the parameters α, β, σ and the convergence threshold ε. Initialize the memberships u_ci randomly.
Step 2: Calculate the cluster centers v_c using Eq. (3).
Step 3: Update the memberships w_cj using Eqs. (6) and (7).
Step 4: Update the memberships u_ci using Eqs. (4) and (5).
Step 5: If max |u_ci^NEW − u_ci^OLD| < ε, then stop. Otherwise, return to Step 2.

III. Parameter Specification by Q-Learning

Generally, it is difficult to specify the parameters for fuzzy clustering algorithms. To address this problem, we propose a new method to specify the parameters by using Q-learning. Q-learning is a popular reinforcement learning method in which learning proceeds from a reward given as the result of sequential actions. Regarding the desirability of the clustering as a reward in Q-learning, we embed the parameter modification procedure into the fuzzy clustering algorithm.

In our method, α and β in Eq. (1) are specified. We assume that the range of α is [0.1, 1.0] and that of β is [0.005, 0.05]. First, we divide the parameter specification space into 10 discrete states as in Fig. 1(a).

Fig. 1. States and actions: (a) the parameter specification space, (b) the four possible actions.

The position is denoted by a two-dimensional vector y = (y_α, y_β). We assume that we can choose one of four possible actions (a_1, a_2, a_3, a_4), moving to one of the four adjacent states as in Fig. 1(b). Each state-action pair is assigned an evaluation value called the Q-value, Q(y, a_l). During learning, the Q-value is updated according to the evaluation of the next state caused by the action and to the reward which is given as a result of the sequential action procedure. More precisely, when an action a_t is taken and a state transition from y_t to y_{t+1} happens, the Q-value is updated in the following way:

Q(y_t, a_t) = (1 - \rho) Q(y_t, a_t) + \rho \bigl( r_t + \eta V(y_{t+1}) \bigr),   (8)

where ρ is the learning rate, η is the discount rate, r_t is the reward obtained by the move from y_t to y_{t+1}, and V(y_{t+1}) is

V(y_{t+1}) = \max \{ Q(y_{t+1}, a_l) \mid l = 1, 2, 3, 4 \}.   (9)

In Q-learning, each action a_l is chosen with a probability defined by its Q-value. We use the following selection probability in our method:

P(a_l) = \{ Q(y_t, a_l) - Q_{\min} \} \Big/ \sum_{k=1}^{4} \{ Q(y_t, a_k) - Q_{\min} \},   (10)

Q_{\min} = \min \{ Q(y_t, a_l) \mid l = 1, 2, 3, 4 \}.   (11)

In our method, we update α and β in Eq. (1) during the iterative optimization algorithm in Section 2.
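Because each Q-learning action is interleaved with a single iteration of the optimization algorithm of Section 2 (Step 3 of the algorithm below), a compact sketch of one such iteration (Steps 2-4, Eqs. (3)-(7)) may help. This is only a sketch under assumed conventions, not the authors' implementation: X is the (N, I) data array, U and W are (C, N) membership arrays, and G is the precomputed similarity matrix of Eq. (2) with its diagonal set to zero.

```python
import numpy as np

def picard_step(X, U, G, alpha, beta):
    """One Picard iteration of the Section 2 algorithm (Steps 2-4)."""
    C, N = U.shape
    V = (U @ X) / U.sum(axis=1, keepdims=True)                   # Step 2, Eq. (3)
    B = (1 - alpha) * (U @ G) / (beta * N)                       # Eq. (7)
    W = np.exp(B - B.max(axis=0))
    W /= W.sum(axis=0)                                           # Step 3, Eq. (6)
    dist2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (x_i - v_c)^T (x_i - v_c)
    A = ((1 - alpha) * (W @ G) - alpha * dist2) / (beta * C)     # Eq. (5)
    U_new = np.exp(A - A.max(axis=0))
    U_new /= U_new.sum(axis=0)                                   # Step 4, Eq. (4)
    return U_new, W, V
```

Repeating picard_step until max |u_ci^NEW − u_ci^OLD| < ε reproduces Steps 2-5; the max-subtraction inside the exponentials is only for numerical stability and does not change Eqs. (4) and (6).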
We can expect that our method will find transitions of the parameter specifications which well adjust the tradeoff between the two fuzzy clustering criteria and the fuzziness. The algorithm is written as follows:

Algorithm
Step 1: Initialize the Q-values Q(y, a_l) to one.
Step 2: Set the initial position y_0 at (1.0, 0.05).
Step 3: Conduct one iteration of the optimization algorithm in Section 2 using the current α and β.
Step 4: Choose an action according to the Q-values and move to the next state.
Step 5: Update the Q-value using Eqs. (8) and (9).
Step 6: If the prespecified condition for the termination of sequential actions is satisfied, then go to Step 7. Otherwise, return to Step 3.
Step 7: If the number of runs of sequential actions reaches the prespecified number, then stop. Otherwise, return to Step 2.

Fig. 2. The numerical example.

IV. Numerical Example

In this section, we apply our parameter specification algorithm to a two-dimensional clustering problem for which a preferable clustering result is given in advance. The clustering problem is shown in Fig. 2. From Fig. 2, we can easily see the desirable clustering result. However, because of the tailed characteristics of the data set, it is difficult for the fuzzy clustering algorithm to partition the data set in Fig. 2 into two clusters without well-specified parameters. We show the percentage of desirability obtained by our specification method and by the fuzzy clustering without parameter specification below. The learning rate ρ and the discount rate η in Eq. (8) are both set to 0.9. We compare our specification method with the fuzzy clustering of Section 2 whose α and β are set to 0.5 and 0.005 respectively.

The parameter specification method: 87.5%
The fuzzy clustering algorithm without parameter specification: 84.0%

From this result, we see that our parameter specification method shows better performance than the fuzzy clustering without parameter specification.
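For readers who want to experiment, the following sketch puts the pieces of Section III together (Steps 1-7 with Eqs. (8)-(11)). Several details are assumptions rather than the authors' choices: the 10-level discretization of each parameter, the fixed episode length used for Step 6, and the callable clustering_step(alpha, beta), which is assumed to run one iteration of the Section 2 algorithm with the given parameters and return a desirability score used as the reward r_t.

```python
import numpy as np

ALPHAS = np.linspace(0.1, 1.0, 10)       # assumed discretization of alpha in [0.1, 1.0]
BETAS = np.linspace(0.005, 0.05, 10)     # assumed discretization of beta in [0.005, 0.05]
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # a1..a4: moves to the four adjacent states

def select_action(q, rng):
    """Eqs. (10)-(11): pick a_l with probability proportional to Q(y, a_l) - Q_min."""
    p = q - q.min()
    p = np.full(4, 0.25) if p.sum() == 0 else p / p.sum()
    return rng.choice(4, p=p)

def learn_parameter_sequence(clustering_step, episodes=50, steps=20, rho=0.9, eta=0.9, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.ones((len(ALPHAS), len(BETAS), 4))      # Step 1: initialize all Q-values to one
    for _ in range(episodes):                      # Step 7: repeat the sequential-action runs
        y = (len(ALPHAS) - 1, len(BETAS) - 1)      # Step 2: start at (alpha, beta) = (1.0, 0.05)
        for _ in range(steps):                     # Step 6: fixed-length episode (assumed)
            # Step 3: one clustering iteration at the current state; using its score as
            # the reward r_t is an assumption, since the paper does not pin this down.
            r = clustering_step(ALPHAS[y[0]], BETAS[y[1]])
            a = select_action(Q[y], rng)           # Step 4: choose an action and move
            y_next = (int(np.clip(y[0] + MOVES[a][0], 0, len(ALPHAS) - 1)),
                      int(np.clip(y[1] + MOVES[a][1], 0, len(BETAS) - 1)))
            # Step 5, Eqs. (8)-(9): blend the old Q-value with reward plus discounted V(y_{t+1}).
            Q[y][a] = (1 - rho) * Q[y][a] + rho * (r + eta * Q[y_next].max())
            y = y_next
    return Q
```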