Analog Value Associative Memory Using Restricted Boltzmann Machine Yuichiro Tsutsui and Masafumi Hagiwara†

https://doi.org/10.20965/jaciii.2019.p0060

Journal of Advanced Computational Intelligence and Intelligent Informatics, Vol.23 No.1, 2019

© Fuji Technology Press Ltd. Creative Commons CC BY-ND: This is an Open Access article distributed under the terms of the Creative Commons Attribution-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nd/4.0/).

Department of Information and Computer Science, Keio University
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
E-mail: {tsutsui,hagiwara}@soft.ics.keio.ac.jp
†Corresponding author
[Received March 9, 2018; accepted October 22, 2018]

In this paper, we propose an analog value associative memory using Restricted Boltzmann Machine (AVAM). Research on treating knowledge is becoming more and more important in fields such as natural language processing and computer vision. Associative memory plays an important role in storing knowledge. First, we obtain distributed representations of words with analog values using word2vec. Then the obtained distributed representations are learned by the proposed AVAM. In the evaluation experiments, we found a simple but very important phenomenon in the word2vec method: almost all of the values in the generated vectors are small. By applying a traditional normalization method to each word vector, the performance of the proposed AVAM is largely improved. Detailed experimental evaluations are carried out to show the superior performance of the proposed AVAM.

Keywords: semantic network, Restricted Boltzmann Machine, word2vec, associative memory

1. Introduction

Demands for knowledge processing have been increasing [1–3]. Several methods have been proposed to store knowledge in computers, such as semantic network [4, 5], frame model [6, 7], ontology [8], and associative memories.

In particular, associative memories have been attracting much attention because of the following three abilities: 1) the ability to correct faults if false information is input; 2) the ability to complete information if some parts are missing; 3) the ability to interpolate information, meaning that if a pattern is not stored, the most similar stored pattern can be retrieved. There are several original associative memory models, such as the Willshaw model [9], Associatron [10], the Kohonen model [11], Hopfield Associative Memory (HAM) [12, 13], Bidirectional Associative Memory (BAM) [14, 15], and Multidirectional Associative Memory (MAM) [16].

Kojima et al. [17] applied Boltzmann machine learning [18] to an associative memory model and evaluated the memory capacity by numerical experiments for a small network with 30 neurons. Boltzmann machine is a type of stochastic recurrent neural network and can be regarded as the stochastic, generative counterpart of Hopfield network. In [19], denoting the number of neurons as N, the capacity of the associative memory using Boltzmann machine is around 0.60N, compared with around 0.14N for Hopfield network. The capacity becomes large; however, learning is very slow in a Boltzmann machine with many hidden layers, because large networks take a long time to approach their equilibrium distribution.

We have proposed a novel associative memory based on RBM (Restricted Boltzmann Machine) [20, 21]. However, since treating analog values is difficult, they are converted to discrete values, which deteriorates the memory capacity. Usage of analog values is extremely effective in many fields, especially in natural language processing. In this field, vector expression of words by the word2vec method [22] was proposed, and it has contributed to rapid advancement in areas such as machine translation [23], recommendation systems [24, 25], and dialog systems [26]. This is because similarity between words can be easily calculated owing to the vector form. In addition, distributed representation has an inherent ability to cope with unlearned words by utilizing vector similarity.

In this paper, we propose an analog value associative memory using Restricted Boltzmann Machine (AVAM). In Section 2, we briefly explain Restricted Boltzmann Machine and the word2vec method [22]. In Section 3, the proposed analog value associative memory using Restricted Boltzmann Machine (AVAM) is explained. In Section 4, evaluation experiments are explained. Here, we found a very simple but very important phenomenon in the word2vec method: almost all of the values in the generated vectors are small. By applying a traditional normalization method to each word vector, the performance of the proposed AVAM is largely improved. Detailed experimental evaluations are shown. Section 5 concludes this paper.

2. Restricted Boltzmann Machine and word2vec

2.1. Restricted Boltzmann Machine

Restricted Boltzmann Machine (RBM) is a kind of neural network, and its structure is shown in Fig. 1.

Fig. 1. Structure of Restricted Boltzmann Machine.

Generally it is used as a generative model [27]. In recent years, usage of RBM as an associative memory has also been proposed [21, 28]. In this paper, we use RBM as an associative memory. RBM has two layers, a visible layer and a hidden layer, and the nodes in the visible layer and the nodes in the hidden layer are bidirectionally fully connected. However, there is no connection between nodes within the visible layer, nor between nodes within the hidden layer. Owing to this restriction, the amount of calculation can be largely reduced. Regular RBM takes discrete values for both the visible layer and the hidden layer. Such an RBM is called Bernoulli RBM [27]. In order to treat analog values, Gaussian-Bernoulli RBM [27] is employed in the proposed AVAM.

2.2. Learning of Gaussian-Bernoulli RBM

Let $v_i$ be the value of the $i$-th node in the visible layer and $h_j$ the value of the $j$-th node in the hidden layer. The energy $E$ of the network is expressed as [27],

\[ E(v,h,\theta) = \sum_{i=1}^{n} \frac{(v_i - a_i)^2}{2\sigma_i^2} - \sum_{j=1}^{m} b_j h_j - \sum_{i=1}^{n} \sum_{j=1}^{m} w_{ij} \frac{v_i}{\sigma_i} h_j . \tag{1} \]

Here, $a_i$ is the bias in the visible layer, $b_j$ is the bias in the hidden layer, and $w_{ij}$ is the weight between the $i$-th node and the $j$-th node. $\theta$ denotes the set of all parameters. $\sigma_i$ is the standard deviation in the training data.

The state in the hidden layer is calculated as follows.

\[ p(h_j = 1 \mid v, \theta) = \varsigma_1\left( b_j + \sum_{i=1}^{n} w_{ij} v_i \right) . \tag{2} \]

\[ \varsigma_1(x) = \frac{1}{1 + e^{-x}} . \tag{3} \]

Here, $\varsigma_1(x)$ is the sigmoid function. The state of the visible layer is calculated as,

\[ p(v_i \mid h, \theta) \propto \exp\left\{ - \frac{\left( v_i - a_i - \sum_{j=1}^{m} w_{ij} h_j \right)^2}{2\sigma_i^2} \right\} . \tag{4} \]

From the above equations, the visible layer follows the Gaussian distribution with the mean $a_i + \sum_{j=1}^{m} w_{ij} h_j$ and the variance $\sigma_i^2$. The update rule for RBM is summarized as,

\[ \Delta w_{ij} = \varepsilon \left( \langle v_i h_j \rangle_{\mathrm{data}} - \langle v_i h_j \rangle_{\mathrm{model}} \right) . \tag{5} \]

\[ \Delta a_i = \varepsilon \left( \langle v_i \rangle_{\mathrm{data}} - \langle v_i \rangle_{\mathrm{model}} \right) . \tag{6} \]

\[ \Delta b_j = \varepsilon \left( \langle h_j \rangle_{\mathrm{data}} - \langle h_j \rangle_{\mathrm{model}} \right) . \tag{7} \]

Here, $\varepsilon$ is the learning rate. Since the second term $\langle \cdot \rangle_{\mathrm{model}}$ is difficult to calculate, the contrastive divergence (CD) learning method has been proposed [29]. The algorithm of contrastive divergence is summarized as,

1. Enter the training data $v_i^{(0)}$ in the visible layer.
2. Calculate the probability $p_j^{(0)}$ that $h_j$ becomes 1 based on the value of the visible layer.
3. Determine the hidden layer value $h_j^{(0)}$ according to the binomial distribution.
4. Calculate $p_i^{(1)}$, the expected value of $v_i$, based on the value of the hidden layer.
5. Determine the visible layer value $v_i^{(1)}$ according to the Gaussian distribution.
6. Calculate the probability $p_j^{(1)}$ that $h_j$ becomes 1 based on the value of the visible layer.
7. Repeat steps 2 to 6 $T$ times. In general, the number of iterations $T$ should be small, such as 1 [27].

The parameter updates then become,

\[ \Delta w_{ij} = \varepsilon \left( v_i^{(0)} p_j^{(0)} - p_i^{(1)} p_j^{(1)} \right) . \tag{8} \]

\[ \Delta a_i = \varepsilon \left( v_i^{(0)} - p_i^{(1)} \right) . \tag{9} \]

\[ \Delta b_j = \varepsilon \left( h_j^{(0)} - p_j^{(1)} \right) . \tag{10} \]

2.3. word2vec

word2vec [22] is a neural network that transforms words into a distributed representation. By using the distributed representation of words, we can calculate the similarity between words. The similarity between word vectors $p$ and $q$ is given by the following equation.

\[ \cos(p, q) = \frac{p \cdot q}{|p| \cdot |q|} . \tag{11} \]

By learning from a large number of sentences, similar words come to have similar vectors [30]. word2vec is used in various research areas such as machine translation [23], recommendation systems [24, 25], and dialog systems [26], as mentioned before.

As the word2vec model in this paper, we used the C-BOW (continuous bag-of-words) model [31]. This is the standard model to learn distributed representation.

Table 1. Learning parameters for word2vec.

Parameter | Value
Dimension of vector | 200
Window size | 8
Negative sampling | 25
Down sample rate for high frequency words | $1.0 \times 10^{-4}$
Hierarchical soft-max | none
Number of repeated training | 15

3. Proposed Analog Value Associative Memory Using Restricted Boltzmann Machine (AVAM)

Figure 2 shows the structure of the proposed analog value associative memory using RBM (AVAM).

Fig. 2. Structure of the proposed analog value associative memory using RBM (AVAM).

3. Both the word1 layer and the word2 layer are considered as one visible layer.
4. The activation of the hidden layer is calculated from that of the visible layer.
5. The activation of the visible layer is calculated from that of the hidden layer.
6. The ‘apple’ vector is input to the word1 layer again.
7. Repeat steps 4 to 6.
8. The vector in the word2 layer is regarded as the output.
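
The recall steps 3 to 8 of Section 3 can be sketched in NumPy as follows. This is only an illustrative sketch, not the authors' implementation. Steps 1 and 2 are not available in this text, so the initial state of the word2 layer (all zeros here) is an assumption. The sketch also assumes deterministic activations (the sigmoid of Eq. (2) and the Gaussian mean of Eq. (4)) instead of sampling. The function names `recall` and `cosine` and the argument `n_iter` are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid function of Eq. (3)."""
    return 1.0 / (1.0 + np.exp(-x))

def cosine(p, q):
    """Cosine similarity between two word vectors, Eq. (11)."""
    return float(p @ q / (np.linalg.norm(p) * np.linalg.norm(q)))

def recall(W, a, b, word1_vec, n_iter=10):
    """Recall the word2 part of the visible layer from a word1 vector.

    W: weights, shape (n_visible, n_hidden)
    a: visible biases, shape (n_visible,)
    b: hidden biases, shape (n_hidden,)
    The visible layer is the concatenation [word1 | word2] (step 3).
    """
    d1 = word1_vec.shape[0]
    # Assumption: the word2 part starts at zero (steps 1-2 are not in the text).
    v = np.zeros_like(a, dtype=float)
    v[:d1] = word1_vec
    for _ in range(n_iter):        # step 7: repeat steps 4 to 6
        h = sigmoid(b + v @ W)     # step 4: hidden activation, Eq. (2)
        v = a + W @ h              # step 5: visible activation, mean of Eq. (4)
        v[:d1] = word1_vec         # step 6: input the word1 vector again
    return v[d1:]                  # step 8: the word2 part is the output
```

With trained parameters, the returned vector could be compared against each stored word vector with `cosine`, and the most similar word taken as the recalled one. Re-inputting the word1 vector on every iteration keeps the cue fixed while only the word2 part is allowed to settle.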

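
The contrastive divergence procedure of Section 2.2 with T = 1 (steps 1 to 6, followed by the updates of Eqs. (8) to (10)) can be sketched as below. This is a minimal sketch under stated assumptions, not the authors' code. It assumes unit standard deviation by default (`sigma=1.0`), which matches the form of Eqs. (2) and (4), and it processes one training vector at a time. The function name `cd1_step` and its arguments are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Sigmoid function of Eq. (3)."""
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, a, b, eps=0.01, sigma=1.0, rng=None):
    """One CD-1 update of a Gaussian-Bernoulli RBM for one training vector.

    v0: training vector, shape (n_visible,)              (step 1)
    W:  weights, shape (n_visible, n_hidden)
    a:  visible biases, b: hidden biases
    Returns the updated (W, a, b); the inputs are not modified.
    """
    rng = np.random.default_rng() if rng is None else rng
    p_h0 = sigmoid(b + v0 @ W)                            # step 2, Eq. (2)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)    # step 3: sample hidden
    p_v1 = a + W @ h0                                     # step 4: mean of Eq. (4)
    v1 = p_v1 + sigma * rng.standard_normal(p_v1.shape)   # step 5: Gaussian sample
    p_h1 = sigmoid(b + v1 @ W)                            # step 6, Eq. (2)
    W_new = W + eps * (np.outer(v0, p_h0) - np.outer(p_v1, p_h1))  # Eq. (8)
    a_new = a + eps * (v0 - p_v1)                         # Eq. (9)
    b_new = b + eps * (h0 - p_h1)                         # Eq. (10)
    return W_new, a_new, b_new
```

In use, `cd1_step` would be called repeatedly over the training vectors for several epochs, starting from small random weights and zero biases. Passing a seeded `np.random.default_rng` as `rng` makes a run reproducible.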