https://doi.org/10.20965/jaciii.2019.p0060

Paper:

Analog Value Associative Memory Using Restricted Boltzmann Machine

Yuichiro Tsutsui and Masafumi Hagiwara†

Department of Information and Computer Science, Keio University
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan
E-mail: {tsutsui,hagiwara}@soft.ics.keio.ac.jp
†Corresponding author
[Received March 9, 2018; accepted October 22, 2018]

In this paper, we propose an analog value associative memory using Restricted Boltzmann Machine (AVAM). Research on treating knowledge is becoming more and more important in fields such as natural language processing and computer vision. Associative memory plays an important role in storing knowledge. First, we obtain distributed representations of words with analog values using word2vec. Then the obtained distributed representations are learned in the proposed AVAM. In the evaluation experiments, we found a simple but very important phenomenon in the word2vec method: almost all of the values in the generated vectors are small. By applying a traditional normalization method to each word vector, the performance of the proposed AVAM is largely improved. Detailed experimental evaluations are carried out to show the superior performance of the proposed AVAM.

Keywords: semantic network, Restricted Boltzmann Machine, word2vec, associative memory

1. Introduction

Demands for knowledge processing have been increasing nowadays [1-3]. Several methods have been proposed to store knowledge in computers, such as semantic networks [4, 5], frame models [6, 7], ontologies [8], and associative memories.

Especially, associative memories have been attracting much attention because of the following three kinds of abilities: 1) the ability to correct faults if false information is input; 2) the ability to complete information if some parts are missing; 3) the ability to interpolate information, which means that if a pattern is not stored, the most similar stored pattern can be retrieved.

There are some original associative memory models such as the Willshaw model [9], Associatron [10], the Kohonen model [11], Hopfield Associative Memory (HAM) [12, 13], Bidirectional Associative Memory (BAM) [14, 15], and Multidirectional Associative Memory (MAM) [16].

Kojima et al. [17] applied Boltzmann machine learning [18] to an associative memory model and evaluated the memory capacity by numerical experiments in the case where the size of the network is small, with 30 neurons. The Boltzmann machine is a type of stochastic recurrent neural network and can be regarded as the stochastic, generative counterpart of the Hopfield network. In [19], denoting the number of neurons as N, the capacity of the associative memory using the Boltzmann machine is around 0.60N, compared with around 0.14N for the Hopfield network. The capacity becomes larger; however, learning is very slow in Boltzmann machines with many hidden layers because large networks take a long time to approach their equilibrium distribution.

We have proposed a novel associative memory based on the RBM (Restricted Boltzmann Machine) [20, 21]. However, since treating analog values is difficult, they are converted to discrete values, which deteriorates the memory capacity. Usage of analog values is extremely effective in many fields, especially in natural language processing. In this field, the vector expression of words by the word2vec method [22] was proposed, and it has contributed to rapid advancement in this field, such as machine translation [23], recommendation systems [24, 25], dialog systems [26], and so on. This is because the similarity between words can be easily calculated owing to the vector form. In addition, distributed representation has an inherent ability to cope with unlearned words by utilizing vector similarity.

In this paper, we propose an analog value associative memory using Restricted Boltzmann Machine (AVAM). In Section 2, we briefly explain the Restricted Boltzmann Machine and the word2vec method [22]. In Section 3, the proposed analog value associative memory using Restricted Boltzmann Machine (AVAM) is explained. In Section 4, evaluation experiments are explained. Here, we found a very simple but very important phenomenon in the word2vec method: almost all of the values in the generated vectors are small. By applying a traditional normalization method to each word vector, the performance of the proposed AVAM is largely improved. Detailed experimental evaluations are shown. Section 5 concludes this paper.


2. Restricted Boltzmann Machine and word2vec

2.1. Restricted Boltzmann Machine

Restricted Boltzmann Machine (RBM) is a kind of neural network whose structure is shown in Fig. 1. Generally it is used as a generative model [27]. In recent years, the usage of RBM as an associative memory has also been proposed [21, 28]. In this paper, we use RBM as an associative memory. RBM has two layers, a visible layer and a hidden layer, and the nodes in the visible layer and the nodes in the hidden layer are bidirectionally fully connected. However, there is no connection among the nodes within the visible layer or among the nodes within the hidden layer. Owing to this limitation, the amount of calculation can be largely reduced. A regular RBM takes discrete values for both the visible layer and the hidden layer; such an RBM is called a Bernoulli RBM [27]. In order to treat analog values, a Gaussian-Bernoulli RBM [27] is employed in the proposed AVAM.

Fig. 1. Structure of Restricted Boltzmann Machine.

2.2. Learning of Gaussian-Bernoulli RBM

Let v_i be the value of the i-th node in the visible layer and h_j the value of the j-th node in the hidden layer. The energy of the network is expressed as [27]

E(v, h, θ) = Σ_{i=1}^{n} (v_i − a_i)² / (2σ_i²) − Σ_{j=1}^{m} b_j h_j − Σ_{i=1}^{n} Σ_{j=1}^{m} w_{ij} (v_i / σ_i) h_j.   (1)

Here, a_i is the bias in the visible layer, b_j is the bias in the hidden layer, and w_{ij} is the weight between the i-th node and the j-th node. θ means all of the parameter set, and σ_i is the standard deviation in the training data.

The state of the hidden layer is calculated as follows:

p(h_j = 1 | v, θ) = ς_1(b_j + Σ_{i=1}^{n} w_{ij} v_i),   (2)

ς_1(x) = 1 / (1 + e^{−x}).   (3)

Here, ς_1(x) is the sigmoid function. The state of the visible layer is calculated as

p(v_i | h, θ) ∝ exp{ −(v_i − a_i − Σ_{j=1}^{m} w_{ij} h_j)² / (2σ_i²) }.   (4)

From the above equations, the visible layer follows the Gaussian distribution with the mean a_i + Σ_{j=1}^{m} w_{ij} h_j and the variance σ_i². The update rule for the RBM is summarized as

Δw_{ij} = ε(⟨v_i h_j⟩_data − ⟨v_i h_j⟩_model),   (5)

Δa_i = ε(⟨v_i⟩_data − ⟨v_i⟩_model),   (6)

Δb_j = ε(⟨h_j⟩_data − ⟨h_j⟩_model).   (7)

Here, ε is the learning rate. Since the second term ⟨·⟩_model is difficult to calculate, the contrastive divergence (CD) learning method has been proposed [29]. The algorithm of contrastive divergence is summarized as follows.

1. Enter the training data v_i^(0) in the visible layer.
2. Calculate the probability p_j^(0) that h_j becomes 1 based on the value of the visible layer.
3. Determine the hidden layer value h_j^(0) according to the binomial distribution.
4. Calculate p_i^(1), the mean for v_i, based on the value of the hidden layer.
5. Determine the visible layer value v_i^(1) according to the Gaussian distribution.
6. Calculate the probability p_j^(1) that h_j becomes 1 based on the value of the visible layer.
7. Repeat steps 2-6 T times. In general, the number of iterations T should be small, such as 1 [27].

Δw_{ij} = ε(v_i^(0) p_j^(0) − p_i^(1) p_j^(1)),   (8)

Δa_i = ε(v_i^(0) − p_i^(1)),   (9)

Δb_j = ε(h_j^(0) − p_j^(1)).   (10)

2.3. word2vec

word2vec [22] is a neural network that transforms words into a distributed representation. By using the distributed representation of words, we can calculate the similarity between words. The similarity between word vectors p and q is given by the following equation:

cos(p, q) = (p · q) / (|p| · |q|).   (11)

By learning from a large amount of sentences, similar words acquire similar vectors [30]. word2vec is used in various research such as machine translation [23], recommendation systems [24, 25], and dialog systems [26], as mentioned before.

As the word2vec model in this paper, we used the CBOW (continuous bag-of-words) model [31]. This is the standard model for learning distributed representations.
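As a concrete illustration of Section 2.2, the following Python sketch implements a Gaussian-Bernoulli RBM trained with CD-1 under two simplifying assumptions: σ_i is fixed to 1 (the word vectors are normalized in Section 4.3) and no noise is added when the visible layer is reconstructed (the setting adopted in Section 4.5). The class and variable names are ours, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    # Eq. (3): logistic function for the Bernoulli hidden units.
    return 1.0 / (1.0 + np.exp(-x))

class GaussianBernoulliRBM:
    """Gaussian (visible) - Bernoulli (hidden) RBM trained with CD-1.

    W is the n_visible x n_hidden weight matrix, a the visible biases,
    b the hidden biases (notation of Section 2.2); sigma_i is fixed to 1.
    """

    def __init__(self, n_visible, n_hidden, lr=1e-3):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.a = np.zeros(n_visible)
        self.b = np.zeros(n_hidden)
        self.lr = lr

    def hidden_probs(self, v):
        # Eq. (2): p(h_j = 1 | v) = sigmoid(b_j + sum_i w_ij v_i)
        return sigmoid(self.b + v @ self.W)

    def visible_mean(self, h):
        # Eq. (4): v_i | h follows N(a_i + sum_j w_ij h_j, 1)
        return self.a + h @ self.W.T

    def cd1_update(self, v0):
        """One CD step (T = 1) on a mini-batch v0 of shape (n_samples, n_visible)."""
        p_h0 = self.hidden_probs(v0)                        # step 2
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)  # step 3
        v1 = self.visible_mean(h0)                          # steps 4-5 (no noise added)
        p_h1 = self.hidden_probs(v1)                        # step 6
        n = v0.shape[0]
        # Eqs. (8)-(10), averaged over the mini-batch.
        self.W += self.lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
        self.a += self.lr * (v0 - v1).mean(axis=0)
        self.b += self.lr * (p_h0 - p_h1).mean(axis=0)
```

The same class is reused in the sketches accompanying Sections 3 and 4.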


Table 1. Learning parameters for word2vec.

Dimension of vector: 200
Window size: 8
Negative sampling: 25
Down-sample rate for high-frequency words: 1.0 · 10^-4
Hierarchical soft-max: none
Number of repeated training: 15
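For reference, the parameters in Table 1 roughly correspond to the gensim-based training call below. This is an assumption made for illustration: the paper does not state which word2vec implementation was used, and `corpus.txt` is a hypothetical whitespace-tokenized dump of the Wikipedia corpus described in Section 4.1.

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Train CBOW vectors with the settings of Table 1 (gensim 4.x argument names).
model = Word2Vec(
    LineSentence("corpus.txt"),  # hypothetical tokenized corpus file
    vector_size=200,             # Dimension of vector
    window=8,                    # Window size
    negative=25,                 # Negative sampling
    sample=1.0e-4,               # Down-sample rate for high-frequency words
    hs=0,                        # Hierarchical soft-max: none
    epochs=15,                   # Number of repeated training
    sg=0,                        # CBOW model (Section 2.3)
)

# Eq. (11): cosine similarity between two word vectors.
print(model.wv.similarity("apple", "fruit"))
```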

Fig. 2. Structure of the proposed analog value associative memory using RBM (AVAM).

3. Proposed Analog Value Associative Memory Using Restricted Boltzmann Machine (AVAM)

Figure 2 shows the structure of the proposed analog value associative memory using RBM (AVAM). The network consists of a visible layer and a hidden layer. The neurons in the visible layer take analog values and those in the hidden layer take discrete values; a Gaussian (analog)-Bernoulli (discrete) RBM is employed. There are two layers in the visible layer to receive two words. Namely, the proposed network works as a hetero-associative memory. When one layer in the visible layer is removed, the network works as an auto-associative memory. We explain the learning and recall of the proposed AVAM.

3.1. Learning of AVAM

Here, we explain the learning of AVAM using an example in which the knowledge 'apple is-a fruit' is learned.

1. The 'apple' vector is input to the word1 layer in the visible layer.
2. Similarly, the 'fruit' vector is input to the word2 layer.
3. The word1 layer and word2 layer are regarded as one visible layer.
4. Learning is carried out using the contrastive divergence method [29]. A mini-batch learning method is also employed, in which one update is carried out after averaging the update values over n_b (= mini-batch size) learning samples.

This procedure is applied to all of the learning data.

3.2. Recall of AVAM

We explain the procedure to recall 'fruit' from 'apple' after the knowledge 'apple is-a fruit' has been learned.

1. The 'apple' vector is input to the word1 layer.
2. The 0 vector is input to the word2 layer.
3. Both the word1 layer and the word2 layer are considered as one visible layer.
4. The activation of the hidden layer is calculated from that of the visible layer.
5. The activation of the visible layer is calculated from that of the hidden layer.
6. The 'apple' vector is input to the word1 layer again.
7. Repeat steps 4-6.
8. The vector in the word2 layer is regarded as the output.
9. The word having the highest cosine similarity is regarded as the recalled word.

The number of repetitions of steps 4-6 was 100 in the experiment. This kind of recall is carried out for each input word. This recall corresponds to a hetero association. Since only a half input is given to the visible layer (word1 layer and word2 layer), the recall is more difficult than an auto association, in which all of the neurons in the visible layer receive input.
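A minimal sketch of the learning and recall procedures above, assuming the GaussianBernoulliRBM class sketched in Section 2.2 and a dictionary `wv` mapping words to their (normalized) 200-dimensional word2vec vectors. All names here are illustrative, and the hidden layer is propagated with its probabilities rather than sampled, which is one possible reading of "activation" in steps 4 and 5.

```python
import numpy as np

DIM = 200                                  # word vector dimension (Table 1)
rbm = GaussianBernoulliRBM(2 * DIM, 1000)  # visible layer = word1 layer + word2 layer;
                                           # 1,000 hidden units (best setting in Section 4.5)

def cos_sim(p, q):
    # Eq. (11)
    return p @ q / (np.linalg.norm(p) * np.linalg.norm(q))

def learn_pairs(pairs, wv, epochs=1000, batch_size=100):
    """Learning (Section 3.1): concatenate the two word vectors into one
    visible vector and train the RBM with CD-1 in mini-batches."""
    rng = np.random.default_rng(0)
    data = np.stack([np.concatenate([wv[a], wv[b]]) for a, b in pairs])
    for _ in range(epochs):
        idx = rng.permutation(len(data))
        for start in range(0, len(data), batch_size):
            rbm.cd1_update(data[idx[start:start + batch_size]])

def recall(word1, wv, n_iter=100):
    """Recall (Section 3.2): clamp word1, start word2 at the zero vector,
    alternate hidden/visible updates, and return the vocabulary word whose
    vector is closest (cosine) to the word2 part of the visible layer."""
    v = np.concatenate([wv[word1], np.zeros(DIM)])[None, :]    # steps 1-3
    for _ in range(n_iter):
        h = rbm.hidden_probs(v)        # step 4
        v = rbm.visible_mean(h)        # step 5
        v[:, :DIM] = wv[word1]         # step 6: re-input the stimulus word
    word2_vec = v[0, DIM:]             # step 8
    return max(wv, key=lambda w: cos_sim(wv[w], word2_vec))    # step 9
```

In the experiments of Section 4, learn_pairs would receive the 'A is-a B' pairs extracted from the associative concept dictionary, and recall would be called once per stimulus word.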


Table 2. Examples of learning data from the associative concept dictionary.

'Japanese clothes is-a clothes'
'wife is-a person'
'reply is-a action'
'rabbit is-a pet'
'chair is-a furniture'

Fig. 3. Distribution of vector element values generated by word2vec.

Fig. 4. Distribution of vector element values after normalization.

4. Evaluation Experiments

4.1. Learning of word2vec

Since a large-sized corpus is required to generate word vectors using word2vec [22], we employed a Wikipedia corpus of 6.15 GB (https://dumps.wikimedia.org/jawiki/20150901/). As a result, distributed representations of about 1,500,000 words were obtained. Table 1 summarizes the parameters.

4.2. Learning of Relationship (Without Normalization)

The word vectors obtained by word2vec are memorized in the proposed AVAM. The relations of words were extracted from the associative concept dictionary [32]. Since the dictionary was constructed manually, its reliability can be considered relatively high. The dictionary outputs some related words for an input word. Each input-output word pair belongs to one of the following 7 relations: upper concept, lower concept, a part of concept, attribute, environment, action concept, and synonym. In this experiment, word pairs of the upper concept relation are used. In this dictionary, 9,053 upper concept relations are defined. The upper concept is defined as 'A is-a B'; let 'A' be the stimulus word and 'B' the recall word. There are 71 words appearing more than 20 times as the recalled upper concept words. One relation was taken for each such high-frequency word, and 71 relations were taken as the learning data. Table 2 summarizes examples of the learning data.

This learning data was converted to vector form by word2vec and learned. We used four learning rates: 1.0 · 10^-1, 1.0 · 10^-2, 1.0 · 10^-3, and 1.0 · 10^-4. During learning, a recall experiment was conducted every 100 epochs. We input a stimulus word to the word1 part in Fig. 2, and a recall experiment was carried out based on the recall process explained in Section 3.2.

As a result of the learning, the highest correct recall rate was 7.04%. This result indicates that there is something wrong in the learning process. We tried to find the cause. We examined the values in the vectors generated by word2vec and found that the values of the vector elements were very small, as shown in Fig. 3. The variance was 0.0046 and the average was 0.0017 in this case. In [27], data with an average of 0 and a variance of 1.0 are suggested for the use of a Gaussian RBM. Therefore, we decided to normalize the vectors generated by word2vec.

4.3. Normalization of Vectors

The number of word vectors generated by word2vec is about 1,500,000; however, many noisy words such as blanks, non-Japanese words, symbols, etc. are contained. In order to eliminate the noise effect, we calculated the average and variance using the words contained in the associative concept dictionary [32]. The number of words was 18,740. The normalization is carried out as follows:

x_new = (x_original − x̄) / √(σ²).   (12)

Here, x̄ means the average and σ² means the variance. Fig. 4 shows the distribution of vector element values after normalization.

4.4. Learning of Relationship (After Normalization)

After vector normalization, we carried out a similar experiment to that explained in Section 4.2. 71 word relations were learned in the proposed AVAM. For the parameters, those showing the highest correct recall rate in the experiment in Section 4.2 were used.

Fig. 5 shows the results. As shown in this figure, the correct recall rate was largely improved by the normalization.

Fig. 5. Learning result of normalized vector.

4.5. Experiment of Larger Sized Relations

From Section 4.4, we found that the proposed AVAM can work as an associative memory with the normalization. In this section, we conduct a detailed evaluation of the proposed AVAM using a large amount of data. As in the previous experiment, the learning data was created from the associative concept dictionary [32]. For each of the upper words (hypernyms) used in Section 4.2, 15 relationships were extracted. Out of the 1,065 relationships acquired, we used 992 relations in which the lower words (hyponyms) do not overlap.


Fig. 6. Correct recall rate of the proposed AVAM with variance = 1 and no addition of noise.

Fig. 7. Correct recall rate of the proposed AVAM for various mini-batch sizes.

We evaluated the performance of the proposed AVAM when the following parameters were changed:

• variance of the visible layer (var) (Eq. (4) in Section 2.2)
• mini-batch size
• dimension size of the hidden layer

First, we examined the influence of the variance defined by Eq. (4) in Section 2.2 through experiments. For the Gaussian-Bernoulli (analog-discrete) RBM, it is recommended that noise drawn from the normal distribution with mean 0 and variance 1 be added in the sampling of the visible layer at the time of learning [27]. In this experiment, learning was performed in two cases: the case where noise of variance 1 is added and the case where no noise is added. Fig. 6 shows the results. Here, ε means the learning rate. From this figure, it can be observed that the correct recall rate is improved when no noise is added, although the addition of noise of variance 1 is recommended [27]. It can be considered that the addition of randomness does not work well when the RBM is used as an associative memory. According to this result, we henceforth do not add noise at the time of learning in the experiments.

Next, we examined the effect of the mini-batch size. Fig. 7 shows the results. From this figure, it can be observed that the convergence of learning becomes slower as the mini-batch size increases. However, in exchange for the slower convergence, a high correct recall rate is kept over a wide range. It is observed that the maximum value of the correct recall rate does not change largely with the mini-batch size. Considering the balance between convergence and the period over which a high correct recall rate is kept, we used a mini-batch size of 100 in the subsequent experiments.

Next, we carried out experiments to examine the effect of the dimension of the hidden layer. The result is shown in Fig. 8. From this figure, it can be seen that the correct recall rate decreases when the dimension of the hidden layer is too small or too large. According to the experiment, a dimension of 1,000 shows the best result.

Fig. 8. Correct recall rate of the proposed AVAM when the number of dimensions of the hidden layer is changed.

Learning was carried out using the parameters suggested by the above experiments. The result is shown in Fig. 9. As shown in this figure, the highest correct recall rate is 90.32%, at 5,600 epochs. The reason why the correct recall rate does not reach 100% is considered to be that the task itself is very difficult and can be regarded as an auto association with 50% missing input, as explained in Section 3.2. Although the correct recall rate is lower than in the case with small data, it is confirmed that the proposed AVAM shows high performance even when large data is used.

4.6. Recall Experiment for Unlearned Words

Here, we examined the recall performance for unlearned words. One of the largest features of the usage of word vectors is that the similarity between words can be estimated by calculating the distance between vectors, such as the Euclidean distance. Since 18,740 word vectors were generated, 18,740 words can be treated in this experiment.

In the following experiments, the lower words (hyponyms) satisfying the following conditions are input to the proposed AVAM as unknown words; the recalls are scored as in the sketch after this list.

• It is not included in the learning data used in Section 4.5.
• Its upper word (hypernym) is included in the learning data in Section 4.5.
• It appears in the associative concept dictionary more than 5 times.
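Scoring these recalls against the dictionary amounts to the small check sketched below; `recall` and `wv` are the hypothetical helpers from the Section 3 sketch, and `hypernyms` is an assumed mapping from each stimulus word to the set of upper concepts defined for it in the associative concept dictionary.

```python
def correct_recall_rate(stimulus_words, hypernyms, wv):
    """Fraction of stimulus words whose recalled word is registered as an
    upper concept in the associative concept dictionary."""
    correct = 0
    for word in stimulus_words:
        recalled = recall(word, wv)                  # hetero-associative recall (Section 3.2)
        if recalled in hypernyms.get(word, set()):   # is 'word is-a recalled' defined?
            correct += 1
    return correct / len(stimulus_words)
```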


Table 3. Some examples of stimulus words and recalled words.

Stimulus word   Recalled word   Dictionary   Manual exp.
wolf            mammalian       T            T
musics          art             T            T
city            place           F            T
imitation       behavior        F            T
space           the earth       F            F
jewelry         flower          F            F
harbor          building        F            F
seaweed         animal          F            F

By defining unknown words in this way, we obtained 183 unknown words. These words were input as stimulus words and recall experiments were carried out. When a word defined in the associative concept dictionary was output, it was regarded as a correct recall.

Fig. 9. Correct recall rate of the proposed AVAM for large amount of data.

Fig. 10. Examples of true and false recalls for unlearned words.

Figure 10 shows examples of true and false recalls for unlearned words. When 'food' is recalled for the input 'clam', the recall is regarded as true, because the 'clam is-a food' relation exists in the associative concept dictionary. When 'animal' is recalled for the input 'apple', the recall is regarded as false, because the 'apple is-a animal' relation does not exist in the associative concept dictionary. The averaged correct recall rate was 26.78%. Although this figure seems relatively low, the correct recall rate for unlearned words would be 0% if the vectors generated by word2vec were not appropriately used.

Although the associative concept dictionary [32] is a human-made dictionary, its coverage of words is not enough. In order to examine the low correct recall rate, we carried out a second experiment in which each recalled word was checked by a majority vote of five subjects. The five subjects, in their twenties, decided whether a recalled word was appropriate as the upper concept or not. The resulting correct recall rate was 47.54%.

Table 3 shows some examples of stimulus words and recalled words. For example, 'place' can be considered as an upper concept of 'city'; however, 'place' is not defined as the upper concept of 'city' in the associative concept dictionary. As shown in this table, there are similar cases in the associative concept dictionary.

From these experiments, we can consider that the proposed AVAM can treat unknown words, because the similarity between words can be estimated by the distributed representation of words, and that it has high performance as an associative memory.

5. Conclusion

An analog value associative memory using Restricted Boltzmann Machine (AVAM) has been proposed in this paper. The construction is very simple, like the original Restricted Boltzmann Machine. Since the proposed AVAM can treat analog value vectors, it can treat word vectors generated by word2vec. A word vector is generated using a huge amount of sentences in which the word appears: the word vector itself contains rich and valuable information. Therefore, the proposed AVAM can treat unlearned words.

Various kinds of evaluation experiments were carried out to confirm the superior performance of the proposed AVAM.

We should note here again that the finding of the tendency toward small values in the word vectors generated by word2vec also plays an important role in the high performance of AVAM. In addition, the representation in the hidden layer is a further representation obtained by the proposed AVAM. It might be used for representation learning. Furthermore, since AVAM is not designed especially for word2vec, it can be applied to other forms of vectors.

Associative memories can be applied to higher level tasks such as classification [33] and transfer learning [34]. This is because these kinds of tasks require not only a huge amount of data but also domain knowledge. We believe that AVAM has the possibility to be a fundamental memory in advanced knowledge processing systems.


References:
[1] H. Alani et al., "Automatic ontology based knowledge extraction from web documents," Intelligent Systems, IEEE, Vol.18, No.1, pp. 14-21, 2003.
[2] H. C. Liu et al., "Dynamic adaptive fuzzy petri nets for knowledge representation and reasoning," Systems, Man, and Cybernetics: Systems, IEEE Trans., Vol.43, No.6, pp. 1399-1410, 2013.
[3] M. Beetz et al., "KnowRob 2.0 – A 2nd generation knowledge processing framework for cognition-enabled robotic agents," IEEE Int. Conf. on Robotics and Automation (ICRA), pp. 512-519, 2018.
[4] A. M. Collins and M. R. Quillian, "Retrieval time from semantic memory," J. of Verbal Learning and Verbal Behavior, Vol.8, No.2, pp. 240-247, 1969.
[5] M. R. Quillian, "The teachable language comprehender: A simulation program and theory of language," Commun. ACM, Vol.12, No.8, pp. 459-476, 1969.
[6] M. L. Minsky, "A framework for representing knowledge," The Psychology of Computer Vision, pp. 211-277, 1975.
[7] M. L. Minsky and S. A. Papert, "Perceptrons," Personal Media, 1993 (in Japanese).
[8] R. Mizoguchi, "Ontology Engineering," Ohmsha, 2005 (in Japanese).
[9] D. J. Willshaw, O. P. Buneman, and H. C. Longuet-Higgins, "Non-holographic associative memory," Nature, Vol.222, No.5197, pp. 960-962, 1969.
[10] K. Nakano, "Associatron – A model of associative memory," IEEE Trans. Syst. Man Cybern., Vol.SMC-2, No.3, pp. 380-388, 1972.
[11] T. Kohonen, "Self-Organization and Associative Memory," Springer Series in Information Sciences, Vol.8, Springer-Verlag, Berlin, Heidelberg, New York, Tokyo, 1984.
[12] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. National Academy of Sciences, Vol.79, pp. 2554-2558, 1982.
[13] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. National Academy of Sciences, Vol.81, pp. 3088-3092, 1984.
[14] B. Kosko, "Bidirectional Associative Memories," IEEE Trans. Syst. Man Cybern., Vol.18, No.1, pp. 49-60, 1988.
[15] B. Kosko, "Adaptive bidirectional associative memories," Applied Optics, Vol.26, No.23, pp. 4947-4960, 1987.
[16] M. Hagiwara, "Multidirectional associative memory," Proc. IEEE and INNS Int. Joint Conf. on Neural Networks, Vol.1, pp. 3-6, 1990.
[17] T. Kojima, H. Nagaoka, and T. Da-Te, "Some properties of an associative memory model using the Boltzmann machine learning," Proc. IEEE Int. Joint Conf. on Neural Networks, Vol.3, pp. 2662-2665, 1993.
[18] D. Ackley, G. E. Hinton, and T. Sejnowski, "A Learning Algorithm for Boltzmann Machines," Cognitive Science, Vol.9, No.1, pp. 147-169, 1985.
[19] T. Kojima, H. Nonaka, and T. Da-Te, "Capacity of the associative memory using the Boltzmann machine learning," Proc. of the IEEE Int. Conf. on Neural Networks, Vol.5, pp. 2572-2577, 1995.
[20] P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol.1, pp. 194-281, MIT Press, 1986.
[21] K. Nagatani and M. Hagiwara, "Restricted Boltzmann Machine associative memory," IEEE Int. Joint Conf. on Neural Networks, pp. 3745-3750, 2014.
[22] T. Mikolov et al., "Efficient estimation of word representations in vector space," Int. Conf. on Learning Representations, arXiv:1301.3781, pp. 1-12, 2013.
[23] T. Mikolov, Q. V. Le, and I. Sutskever, "Exploiting similarities among languages for machine translation," CoRR, Vol.abs/1309.4168, 2013.
[24] K. J. Oh et al., "Travel intention-based attraction network for recommending travel destinations," 2016 Int. Conf. on Big Data and Smart Computing (BigComp), pp. 277-280, 2016.
[25] Y. Zhao, J. Wang, and F. Wang, "Word embedding based retrieval model for similar cases recommendation," 2015 Chinese Automation Congress (CAC), pp. 2268-2272, 2015.
[26] I. V. Serban et al., "Hierarchical neural network generative models for movie dialogues," CoRR, Vol.abs/1507.04808, 2015.
[27] H. Okaya, "Deep Learning," Kodansha, 2015 (in Japanese).
[28] Y. Tsutsui and M. Hagiwara, "Construction of semantic network using restricted Boltzmann machine," IEICE NC2015-71, Vol.115, No.514, pp. 13-18, 2016 (in Japanese).
[29] G. E. Hinton, "Training products of experts by minimizing contrastive divergence," Neural Computation, Vol.14, No.8, pp. 1771-1800, 2002.
[30] T. Mikolov, W.-T. Yih, and G. Zweig, "Linguistic regularities in continuous space word representations," Proc. of the 2013 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT-2013), pp. 746-751, 2013.
[31] H. Nishio, "Natural Language Processing by word2vec," O'Reilly Japan, 2014 (in Japanese).
[32] J. Okamoto and S. Ishizaki, "Construction of associative concept dictionary with distance information, and comparison with electronic concept dictionary," Natural Language Processing, Vol.8, No.4, pp. 37-54, 2001.
[33] L. Zhang and D. Zhang, "Visual understanding via multi-feature shared learning with global consistency," IEEE Trans. on Multimedia, Vol.18, No.2, pp. 247-259, 2016.
[34] L. Zhang, W. Zuo, and D. Zhang, "LSDT: Latent sparse domain transfer learning for visual adaptation," IEEE Trans. on Image Processing, Vol.25, No.3, pp. 1177-1191, 2016.

Name:
Yuichiro Tsutsui

Affiliation:
Department of Information and Computer Science, Keio University

Address:
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan

Brief Biographical History:
2016 B.E. degree in Science and Technology from Keio University
2018 M.E. degree in Science and Technology from Keio University

Name:
Masafumi Hagiwara

Affiliation:
Professor, Department of Information and Computer Science, Keio University

Address:
3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522, Japan

Brief Biographical History:
1982 B.E. degree in Electrical Engineering from Keio University
1984 M.E. degree in Electrical Engineering from Keio University
1987 Ph.D. degree in Electrical Engineering from Keio University
1987- Keio University
1990 IEEE Consumer Electronics Society Chester Sall Award
1991-1993 Visiting Scholar at Stanford University
1996 Author Award from the Japan Society of Fuzzy Theory and Systems
2003, 2004, and 2014 Technical Award and Paper Awards from Japan Society of Kansei Engineering
2013 Best Research Award from Japanese Neural Network Society

Main Works:
• Neural networks, fuzzy systems, and affective engineering

Membership in Academic Societies:
• The Institute of Electronics, Information and Communication Engineers (IEICE)
• Information Processing Society of Japan (IPSJ)
• The Japanese Society for Artificial Intelligence (JSAI)
• The Japan Society for Fuzzy Theory and Intelligent Informatics (SOFT)
• The Institute of Electrical Engineers of Japan (IEEJ)
• Japan Society of Kansei Engineering (JSKE)
• Japanese Neural Network Society (JNNS)
• The Institute of Electrical and Electronics Engineers (IEEE), Senior Member

