
Retrieval Term Prediction Using Deep Belief Networks

Qing Ma†  Ibuki Tanigawa†  Masaki Murata‡
†Department of Applied Mathematics and Informatics, Ryukoku University
‡Department of Information and Electronics, Tottori University
[email protected]

Copyright 2014 by Qing Ma, Ibuki Tanigawa, and Masaki Murata.
28th Pacific Asia Conference on Language, Information and Computation, pages 338–347.

Abstract

This paper presents a method to predict retrieval terms from relevant/surrounding words or descriptive texts in Japanese by using deep belief networks (DBN), one of two typical types of deep learning. To determine the effectiveness of using DBN for this task, we tested it along with baseline methods using example-based approaches and conventional machine learning methods, i.e., multi-layer perceptron (MLP) and support vector machines (SVM), for comparison. The data for training and testing were obtained from the Web in manual and automatic manners. Automatically created pseudo data was also used. A grid search was adopted for obtaining the optimal hyperparameters of these machine learning methods by performing cross-validation on training data. Experimental results showed that (1) using DBN has far higher prediction precisions than using baseline methods and higher prediction precisions than using either MLP or SVM; (2) adding automatically gathered data and pseudo data to the manually gathered data as training data is an effective measure for further improving the prediction precisions; and (3) DBN is able to deal with noisier training data than MLP, i.e., the prediction precision of DBN can be improved by adding noisy training data, but that of MLP cannot be.

1 Introduction

The current Web search engines have a very high retrieval performance as long as the proper retrieval terms are given. However, many people, particularly children, seniors, and foreigners, have difficulty deciding on the proper retrieval terms for representing the retrieval objects,¹ especially with searches related to technical fields. Support systems are therefore needed that show search engine users suitable retrieval term candidates when some clues, such as descriptive texts or relevant/surrounding words, are given by the users. For example, when the relevant/surrounding words "computer", "previous state", and "return" are given by users, "system restore" is predicted by the systems as a retrieval term candidate.

¹For example, according to a questionnaire administered by Microsoft in 2010, about 60% of users had difficulty deciding on the proper retrieval terms. (http://www.garbagenews.net/archives/1466626.html) (http://news.mynavi.jp/news/2010/07/05/028/)

Our objective is to develop various domain-specific information retrieval support systems that can predict suitable retrieval terms from relevant/surrounding words or descriptive texts in Japanese. To our knowledge, no such studies have been done so far in Japanese. As the first step, here, we confined the retrieval terms to the computer-related field and proposed a method to predict them using machine learning methods with deep belief networks (DBN), one of two typical types of deep learning.

In recent years, deep learning/neural network techniques have attracted a great deal of attention in various fields and have been successfully applied not only in speech recognition (Li et al., 2013) and image recognition (Krizhevsky et al., 2012) tasks but also in NLP tasks including morphology & syntax (Billingsley and Curran, 2012; Hermann and Blunsom, 2013; Luong et al., 2013; Socher et al., 2013a), semantics (Hashimoto et al., 2013; Srivastava et al., 2013; Tsubaki et al., 2013), machine translation (Auli et al., 2013; Liu et al., 2013; Kalchbrenner and Blunsom, 2013; Zou et al., 2013), text classification (Glorot et al., 2011), information retrieval (Huang et al., 2013; Salakhutdinov and Hinton, 2009), and others (Seide et al., 2011; Socher et al., 2011; Socher et al., 2013b). Moreover, a unified neural network architecture and learning algorithm has also been proposed that can be applied to various NLP tasks including part-of-speech tagging, chunking, named entity recognition, and semantic role labeling (Collobert et al., 2011).

To our knowledge, however, there have been no studies on applying deep learning to information retrieval support tasks. We therefore have two main objectives in our current study. One is to develop an effective method for predicting suitable retrieval terms, and the other is to determine whether deep learning is more effective than other conventional machine learning methods, i.e., multi-layer perceptron (MLP) and support vector machines (SVM), in such NLP tasks.

The data used for experiments were obtained from the Web in both manual and automatic manners. Automatically created pseudo data was also used. A grid search was used to obtain the optimal hyperparameters of these machine learning methods by performing cross-validation on training data. Experimental results showed that (1) using DBN has a far higher prediction precision than using baseline methods and a higher prediction precision than using either MLP or SVM; (2) adding automatically gathered data and pseudo data to the manually gathered data as training data is an effective measure for further improving the prediction precision; and (3) the DBN can deal with noisier training data than the MLP, i.e., the prediction precision of DBN can be improved by adding noisy training data, but that of MLP cannot be.

2 The Corpus

For training, a corpus consisting of pairs of inputs and their responses (or correct answers) — in our case, pairs of the relevant/surrounding words or descriptive texts and retrieval terms — is needed. The responses are typically called labels in supervised learning, and so here we call the retrieval terms labels. Table 1 shows examples of these pairs, where the "Relevant/surrounding words" are those extracted from descriptive texts in accordance with the steps described in Subsection 2.4. In this section, we describe how the corpus is obtained and how the feature vectors of the inputs are constructed from the corpus for machine learning.

2.1 Manual and Automatic Gathering of Data

Considering that the descriptive texts of labels necessarily include their relevant/surrounding words, we gather Web pages containing these texts in both manual and automatic manners. In the manual manner, we manually select the Web pages that describe the labels. In contrast, in the automatic manner, we respectively combine five words or parts of phrases, とは (toha, "what is"), は (ha, "is"), というものは (toiumonoha, "something like"), については (nitsuiteha, "about"), and の意味は (noimiha, "the meaning of"), with the labels to form retrieval terms (e.g., if a label is グラフィックボード (gurafikku boudo, "graphic board"), then the retrieval terms are グラフィックボードとは (gurafikku boudo toha, "what is graphic board"), グラフィックボードは (gurafikku boudo ha, "graphic board is"), etc.) and then use these terms to obtain the relevant Web pages by a Google search.
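The query construction above amounts to simple string concatenation. The following is a minimal Python sketch of it; the suffix list is taken from the text, while the function name and the idea of returning plain query strings (to be issued to a search engine afterwards) are illustrative assumptions.

```python
# The five words/parts of phrases from Subsection 2.1 that are
# appended to a label to form retrieval-term queries.
QUERY_SUFFIXES = [
    "とは",         # toha, "what is"
    "は",           # ha, "is"
    "というものは",   # toiumonoha, "something like"
    "については",     # nitsuiteha, "about"
    "の意味は",      # noimiha, "the meaning of"
]

def make_retrieval_terms(label: str) -> list[str]:
    """Combine a label with each suffix to form search queries."""
    return [label + suffix for suffix in QUERY_SUFFIXES]

# e.g., make_retrieval_terms("グラフィックボード") yields
# ["グラフィックボードとは", "グラフィックボードは", ...], each of
# which is then used to collect relevant Web pages via a search.
```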

Labels (Retrieval terms) / Inputs (Descriptive texts or relevant/surrounding words; translated from Japanese)

Graphic board
    Descriptive text: Also known as: graphic card, graphic accelerator, GB, VGA. While the board outputs the picture actually seen by the eye, the screen only displays as commanded and does not output anything if ···
    Relevant/surrounding words: screen, picture, eye, displays, as commanded, ···
    Descriptive text: A device that provides independent functions for outputting or inputting video as signals on a PC or various other types of computer ···
    Relevant/surrounding words: independent, functions, outputting, inputting, video, signals, PC, ···
Main memory
    ···

Table 1: Examples of input-label pairs in the corpus.

2.2 Pseudo Data

To acquire as high a generalization capability as possible, for training we use not only the small-scale manually gathered data, which has high precision, but also the large-scale automatically gathered data, which includes a certain amount of noise. In contrast to manually gathered data, automatically gathered data might have incorrect labels, i.e., labels that do not match the descriptive texts. We therefore also use pseudo data, which can be regarded as data that includes some noise and/or deficiencies added to the original data (i.e., to the descriptive texts of the manually gathered data) but with less noise than the automatically gathered data and with all the labels correct. The procedure for creating pseudo data from the manually gathered data involves (1) extracting all the different words from the manually gathered data and (2) for each label, randomly adding words that were extracted in step (1) but not included in the descriptive texts and/or deleting words that originally existed in the descriptive texts, so that the newly generated data (i.e., the newly generated descriptive texts) have 10% noise and/or deficiencies added to the original data.
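This procedure can be sketched as follows. This is a minimal illustration assuming the texts are already tokenized into word lists; the exact mix of additions and deletions is not specified in the text, so the 50/50 split below is an assumption.

```python
import random

def make_pseudo_text(words: list[str], vocabulary: set[str],
                     noise_rate: float = 0.10, rng=random) -> list[str]:
    """Create one pseudo descriptive text (as a word list) by randomly
    adding vocabulary words absent from the original text and deleting
    words present in it, so that roughly `noise_rate` of the words are
    changed, as in Subsection 2.2."""
    n_changes = max(1, int(len(words) * noise_rate))
    result = list(words)
    # words extracted from the whole corpus but not in this text
    candidates = list(vocabulary - set(words))
    for _ in range(n_changes):
        if rng.random() < 0.5 and candidates:   # add an unrelated word
            result.insert(rng.randrange(len(result) + 1),
                          candidates.pop(rng.randrange(len(candidates))))
        elif len(result) > 1:                   # delete an existing word
            result.pop(rng.randrange(len(result)))
    return result
```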

2.3 Testing Data

The data described in Subsections 2.1 and 2.2 are for training. The data used for testing are different from the training data and are also obtained from automatically gathered data. Since automatically gathered data may include many incorrect labels that cannot be used as objective assessment data, we manually select correct ones from the automatically gathered data.

2.4 Word Extraction and Feature Vector Construction

Relevant/surrounding words are extracted from descriptive texts by steps (1)–(4) below, and the inputs used in machine learning are represented by feature vectors constructed by steps (1)–(6): (1) perform morphological analysis on the manually gathered data and extract all nouns, including proper nouns, verbal nouns (nouns forming verbs by adding the word する (suru, "do")), and general nouns; (2) connect the nouns successively appearing as single words; (3) extract the words whose appearance frequency in each label is ranked in the top 50; (4) exclude the words appearing in the descriptive texts of more than two labels; (5) use the words obtained by the above steps as the vector elements with binary values, taking value 1 if a word appears and 0 if not; and (6) perform morphological analysis on all data described in Subsections 2.1, 2.2, and 2.3 and construct the feature vectors in accordance with step (5).
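Steps (3)–(5) can be sketched in Python as follows, assuming morphological analysis (e.g., with a tokenizer such as MeCab) has already produced word lists; the function names and data structures are illustrative only.

```python
from collections import Counter

def select_vocabulary(texts_by_label: dict) -> list[str]:
    """Steps (3)-(4): for each label, keep the 50 most frequent words
    across its descriptive texts, then exclude words appearing in the
    texts of more than two labels. `texts_by_label` maps a label to a
    list of word lists (already morphologically analyzed)."""
    top = {}
    for label, texts in texts_by_label.items():
        counts = Counter(w for text in texts for w in text)
        top[label] = {w for w, _ in counts.most_common(50)}
    candidates = set().union(*top.values())
    # number of labels whose texts contain each candidate word
    n_labels = {w: sum(any(w in text for text in texts)
                       for texts in texts_by_label.values())
                for w in candidates}
    return sorted(w for w in candidates if n_labels[w] <= 2)

def to_feature_vector(text_words: list[str],
                      vocabulary: list[str]) -> list[int]:
    """Step (5): binary vector, value 1 if a word appears and 0 if not.
    In the paper this yields 182-dimensional vectors."""
    present = set(text_words)
    return [1 if w in present else 0 for w in vocabulary]
```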

3 Deep Learning

Two typical approaches have been proposed for implementing deep learning: using deep belief networks (DBN) (Hinton et al., 2006; Lee et al., 2009; Bengio et al., 2007; Bengio, 2009; Bengio et al., 2013) and using stacked denoising autoencoders (SdA) (Bengio et al., 2007; Bengio, 2009; Bengio et al., 2013; Vincent et al., 2008; Vincent et al., 2010). In this work we use DBN, which has an elegant architecture and a performance greater than or equal to that of SdA in many tasks.

A DBN is a multiple-layer neural network equipped with an unsupervised learning algorithm based on restricted Boltzmann machines (RBM) for pre-training to extract features and a supervised learning algorithm for fine-tuning to output labels. The supervised learning can be implemented with a single-layer or multi-layer perceptron or other methods (logistic regression, etc.).

3.1 Restricted Boltzmann Machine

An RBM is a probabilistic model representing the probability distribution of training data with a fast unsupervised learning algorithm. It consists of two layers, one visible and one hidden, that respectively have visible units (v_1, v_2, ···, v_m) and hidden units (h_1, h_2, ···, h_n) connected to each other between the two layers (Figure 1).

[Figure 1: Restricted Boltzmann machine.]

Given training data, the weights of the connections between units are modified by learning so that the behavior of the RBM stochastically fits the training data as well as possible. The learning algorithm is briefly described below.

First, sampling is performed on the basis of conditional probabilities when a piece of training data is given to the visible layer, using Eqs. (1), (2), and then (1) again:

P(h_i^{(k)} = 1 \mid v^{(k)}) = \mathrm{sigmoid}\Big( \sum_{j=1}^{m} w_{ij} v_j^{(k)} + c_i \Big)    (1)

and

P(v_j^{(k+1)} = 1 \mid h^{(k)}) = \mathrm{sigmoid}\Big( \sum_{i=1}^{n} w_{ij} h_i^{(k)} + b_j \Big),    (2)

where k (≥ 1) is a repeat count of sampling, v^{(1)} = v is a piece of training data, w_{ij} is the weight of the connection between units v_j and h_i, and b_j and c_i are offsets for the units v_j and h_i of the visible and hidden layers. After k repetitions of sampling, the weights and offsets are updated by

W \leftarrow W + \epsilon \big( h^{(1)} v^{T} - P(h^{(k+1)} = 1 \mid v^{(k+1)}) \, v^{(k+1)T} \big),    (3)

b \leftarrow b + \epsilon \big( v - v^{(k+1)} \big),    (4)

c \leftarrow c + \epsilon \big( h^{(1)} - P(h^{(k+1)} = 1 \mid v^{(k+1)}) \big),    (5)

where ϵ is a learning rate and the initial values of W, b, and c are 0. Sampling with a large enough repeat count is called Gibbs sampling, which is computationally expensive. A method called k-step Contrastive Divergence (CD-k), which stops sampling after k repetitions, is therefore usually adopted. It is empirically known that even k = 1 (CD-1) often gives good results, and so we set k = 1 in this work. If we assume a total of e epochs are performed for learning the n training data using CD-k, the procedure for learning an RBM can be given as in Figure 2. As the learning progresses, the samples² of the visible layer v^{(k+1)} approach the training data v.

For each of all epochs e do
    For each of all data n do
        For each repetition of CD k do
            Sample according to Eqs. (1), (2), (1)
        End for
        Update using Eqs. (3), (4), (5)
    End for
End for

Figure 2: Procedure for learning an RBM.

²By "samples" here we mean the data generated on the basis of the conditional probabilities of Eqs. (1) and (2).
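For concreteness, the following is a minimal NumPy sketch of CD-k training following Eqs. (1)–(5) and Figure 2. It assumes binary input row vectors; the hyperparameter defaults are placeholders, and the per-sample looping is kept deliberately naive rather than vectorized.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epsilon=0.01, epochs=1000, k=1, seed=0):
    """CD-k training of an RBM on binary rows of `data`
    (shape: n_samples x m). W, b, c start at 0 as in the text."""
    rng = np.random.default_rng(seed)
    m = data.shape[1]
    W = np.zeros((n_hidden, m))   # weights w_ij
    b = np.zeros(m)               # visible offsets b_j
    c = np.zeros(n_hidden)        # hidden offsets c_i
    for _ in range(epochs):
        for v in data:
            ph = sigmoid(W @ v + c)                     # Eq. (1)
            h = (rng.random(n_hidden) < ph).astype(float)
            h1 = h.copy()                               # h^(1) for the updates
            for _ in range(k):                          # CD-k sampling chain
                pv = sigmoid(W.T @ h + b)               # Eq. (2)
                vk = (rng.random(m) < pv).astype(float) # v^(k+1)
                ph_k = sigmoid(W @ vk + c)              # Eq. (1) again
                h = (rng.random(n_hidden) < ph_k).astype(float)
            # parameter updates, Eqs. (3)-(5)
            W += epsilon * (np.outer(h1, v) - np.outer(ph_k, vk))
            b += epsilon * (v - vk)
            c += epsilon * (h1 - ph_k)
    return W, b, c
```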

3.2 Deep Belief Network

[Figure 3: Example of a deep belief network.]

Figure 3 shows a DBN composed of three RBMs for pre-training and a supervised learning device for fine-tuning. Naturally, the number of RBMs is changeable as needed. As shown in the figure, the hidden layers of the earlier RBMs become the visible layers of the new RBMs. Below, for simplicity, we consider the layers of the RBMs (excluding the input layer) as hidden layers of the DBN. The DBN in the figure therefore has three hidden layers, and this number is equal to the number of RBMs. Although the supervised learning can be implemented by any method, in this work we use a multi-layer perceptron. The procedure for learning the DBN with three RBMs is shown in Figure 4.

1. Train RBM 1 with the training data as inputs by the procedure for learning an RBM (Figure 2) and fix its weights and offsets.
2. Train RBM 2 with the samples of the hidden layer of RBM 1 as inputs by the procedure for learning an RBM (Figure 2) and fix its weights and offsets.
3. Train RBM 3 with the samples of the hidden layer of RBM 2 as inputs by the procedure for learning an RBM (Figure 2) and fix its weights and offsets.
4. Perform supervised learning with the samples of the hidden layer of RBM 3 as inputs and the labels as the desired outputs.

Figure 4: Procedure for learning a DBN with three RBMs.
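The greedy layer-wise procedure of Figure 4 can be sketched as follows, reusing train_rbm() and sigmoid() from the RBM sketch above. Propagating activation probabilities instead of binary samples between layers is a simplifying assumption, and the supervised fine-tuning step is omitted.

```python
def pretrain_dbn(data, hidden_sizes, epsilon=0.01, epochs=1000):
    """Greedy layer-wise pre-training: train RBM 1 on the data, fix it,
    use its hidden activations as input to RBM 2, and so on."""
    rbms, x = [], data
    for n_hidden in hidden_sizes:        # e.g., (152, 121, 91)
        W, b, c = train_rbm(x, n_hidden, epsilon, epochs)
        rbms.append((W, b, c))
        # the hidden layer of this RBM becomes the next visible layer;
        # activation probabilities are propagated here for simplicity
        x = sigmoid(x @ W.T + c)
    return rbms
```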
4 Experiments

4.1 Experimental Setup

4.1.1 Data

We formed 13 training data sets by adding different amounts of automatically gathered data and/or pseudo data to a base data set, as shown in Table 2. In the table, m300 is the base data set including 300 pieces of manually gathered data; a2400, for example, is a data set including 2,400 pieces of automatically gathered data plus m300; p2400 is a data set including 2,400 pieces of pseudo data plus m300; and a2400p2400 is a data set including 2,400 pieces of automatically gathered data, 2,400 pieces of pseudo data, and m300. Altogether there were 100 pieces of testing data. The number of labels was 10; i.e., the training data listed in Table 2 and the testing data have 10 labels. The dimension of the feature vectors constructed in accordance with the steps in Subsection 2.4 was 182.

m300
a300  a600  a1200  a2400
p300  p600  p1200  p2400
a300p300  a600p600  a1200p1200  a2400p2400

Table 2: Training data sets.

4.1.2 Hyperparameter Search

The optimal hyperparameters of the various machine learning methods used were determined by a grid search using 5-fold cross-validation on training data. The hyperparameters for the grid search are shown in Table 3. To avoid unfair bias toward the DBN during cross-validation due to the DBN having more hyperparameters than the other methods, we divided the MLP and SVM hyperparameter grids more finely than that of the DBN so that they had the same number of hyperparameter combinations as the DBN or more. For MLP, we also considered another case in which we used network structures, learning rates, and learning epochs completely the same as those of the DBN. In this case, the number of MLP hyperparameter combinations was quite small compared to that of the DBN. We refer to this MLP as MLP 1 and to the former MLP as MLP 2. Ultimately, the DBN and MLP 2 both had 864 hyperparameter combinations, the SVM (Linear) and SVM (RBF) had 900, and MLP 1 had 72.
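As an illustration of this search protocol, the SVM (RBF) grid of Table 3 below could be explored with scikit-learn as follows. This is a sketch only: scikit-learn is not mentioned in the paper, and the random X and y merely stand in for the 182-dimensional binary feature matrix and the 10-class labels built in Section 2.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-ins for the real corpus (replace with the actual features/labels).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 182))
y = rng.integers(0, 10, size=300)

# SVM (RBF) grid of Table 3: 30 logarithmic divisions of gamma and of C
# between 10^-4 and 10^4 (30 x 30 = 900 combinations).
param_grid = {
    "gamma": np.logspace(-4, 4, 30),
    "C": np.logspace(-4, 4, 30),
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)  # 5-fold CV
search.fit(X, y)
print(search.best_params_, search.best_score_)
```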


DBN
    structure (hidden layers)³: 91; 137-91; 152-121-91; 273; 273-273; 273-273-273
    ϵ of pre-training: 0.001, 0.01, 0.1
    ϵ of fine-tuning: 0.001, 0.01, 0.1
    epoch of pre-training: 500, 1000, 2000, 3000
    epoch of fine-tuning: 500, 1000, 2000, 3000
MLP 1
    structure (hidden layers): 91; 137-91; 152-121-91; 273; 273-273; 273-273-273
    ϵ: 0.001, 0.01, 0.1
    epoch: 500, 1000, 2000, 3000
MLP 2
    structure (hidden layers): 91; 137-91; 152-121-91; 273; 273-273; 273-273-273
    ϵ: 0.001, 0.0025, 0.005, 0.0075, 0.01, 0.025, 0.05, 0.075, 0.1
    epoch: 6 divisions between 500 and 1000, and 10 divisions between 1200 and 3000, on a linear scale
SVM (Linear)
    C: 900 divisions between 10⁻⁴ and 10⁴ on a logarithmic scale
SVM (RBF)
    γ: 30 divisions between 10⁻⁴ and 10⁴ on a logarithmic scale
    C: 30 divisions between 10⁻⁴ and 10⁴ on a logarithmic scale

Table 3: Hyperparameters for grid search.

³As an example, the structure (hidden layers) 152-121-91 shown in the table refers to a DBN with a 182-152-121-91-10 structure, where 182 and 10 refer to the dimensions of the input and output layers, respectively. These figures were set not in an arbitrary manner but using regular intervals in a linear form, i.e., 152 ≈ 182 × 5/6, 121 ≈ 182 × 4/6, and 91 = 182 × 3/6.

4.1.3 Baselines

For comparison, in addition to MLP and SVM, we run tests on baseline methods using example-based approaches, comparing each piece of testing data with all the training data to determine which training example has the largest number of words in common with it. The algorithm is shown in Figure 5, where the words used for counting are those extracted from the descriptive texts in accordance with steps (1)–(4) in Subsection 2.4.

For each input i of testing data do
    For each input j of training data do
        1. Count the same words between i and j
        2. Find the j with the largest count and set m = j
    End for
    1. Let the label of m of the training data (r) be the prediction result for the input i
    2. Compare r with the label of i of the testing data and determine the correctness
End for
1. Count the correct prediction results and compute the correct rate (precision)

Figure 5: Baseline algorithm.
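The baseline of Figure 5 reduces to nearest-neighbor matching by word overlap. A compact sketch follows; the (word_set, label) pair representation is an assumed data structure, not taken from the paper.

```python
def baseline_precision(train, test):
    """Figure 5 as code. `train` and `test` are lists of (words, label)
    pairs, where `words` is the set of words extracted from a descriptive
    text by steps (1)-(4) of Subsection 2.4."""
    correct = 0
    for words_i, label_i in test:
        # find the training example m sharing the most words with input i
        words_m, label_m = max(train, key=lambda ex: len(words_i & ex[0]))
        # its label r = label_m is the prediction; check correctness
        if label_m == label_i:
            correct += 1
    return correct / len(test)  # the correct rate (precision)
```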

4.2 Results

Figure 6 compares the testing data precisions when using the different training data sets with the individual machine learning methods. The precisions are averages over the top N sets of hyperparameters in ascending order of the cross-validation errors, with N varying from 5 to 30.

As shown in the figure, both the DBN and the MLPs had the highest precisions overall, and the SVMs had approximately the highest precision, when using data set a2400p2400, i.e., in the case of adding the largest number of automatically gathered data and pseudo data to the manually gathered data as training data. Moreover, the DBN, MLPs, and SVM (RBF) all had higher precisions when adding the appropriate amount of automatically gathered data and pseudo data compared to the case of using only manually gathered data, but the SVM (Linear) did not have this tendency.⁴ Further, the DBN and SVM (RBF) had higher precisions when adding the appropriate amount of automatically gathered data only, whereas the MLPs had higher precisions when adding the appropriate amount of pseudo data only, compared to the case of using only manually gathered data. From these results, we can infer that (1) all the machine learning methods (excluding SVM (Linear)) can improve their precisions by adding automatically gathered and pseudo data as training data and that (2) the DBN and SVM (RBF) can deal with noisier data than the MLPs, as the automatically gathered data are noisier than the pseudo data.

Figure 7 compares the testing data precisions of the DBN and the MLPs and of the DBN and the SVMs when using the different training data sets (i.e., the data sets of Table 2) without distinguishing them from each other. As in Figure 6, the precisions are averages over the top N sets of hyperparameters in ascending order of the cross-validation errors, with N varying from 5 to 30. We can see at a glance that the performance of the DBN was generally superior to that of all the other machine learning methods. We should point out that the ranges of the vertical axes of all the graphs are set to be the same, and so four lines of the SVM (RBF) are not indicated in the DBN vs. SVM (RBF) graph because their precisions were lower than 0.9. Full results, however, are shown in Figure 6.

⁴This is because the SVM (Linear) can only deal with data capable of linear separation.


[Figure 6: Average precisions of DBN, MLP, and SVM for top N varying from 5 to 30. Panels: DBN; MLP 1; MLP 2; SVM (Linear); SVM (RBF).]


[Figure 7: Comparison of average precisions for top N varying from 5 to 30. Panels: DBN vs. MLP 1; DBN vs. MLP 2; DBN vs. SVM (Linear); DBN vs. SVM (RBF).]

Tables 4, 5, and 6 show the precisions of the baseline method and the average precisions of the machine learning methods for the top 5 and top 10 sets of hyperparameters in ascending order of the cross-validation errors, respectively, when using the different data sets for training. First, in contrast to the machine learning methods, we see that adding noisy training data (i.e., adding only the automatically gathered data or adding both the automatically gathered and the pseudo data) was not useful for the baseline method to improve the prediction precisions; on the contrary, the noisy data significantly reduced the prediction precisions. Second, in almost all cases, the precisions of the baseline method were far lower than those of all the machine learning methods. Finally, we see that in almost all cases, the DBN had the highest precision (the bold figures in the tables) of all the machine learning methods. In addition, even when only using the base data set (i.e., the manually gathered data (m300)) for training, we can conclude from Figure 6 and Tables 5 and 6 that, in all cases, the precision of the DBN was the highest.


           m300   a300   a600   a1200  a2400  p300   p600
Baseline   0.850  0.500  0.320  0.390  0.370  0.850  0.840

           p1200  p2400  a300p300  a600p600  a1200p1200  a2400p2400
Baseline   0.840  0.840  0.510     0.320     0.390       0.370

Table 4: Precisions of the baseline.

              m300   a300   a600   a1200  a2400  p300   p600
MLP 1         0.944  0.940  0.942  0.928  0.922  0.938  0.946
MLP 2         0.954  0.948  0.946  0.934  0.924  0.958  0.948
SVM (Linear)  0.950  0.930  0.942  0.928  0.920  0.930  0.930
SVM (RBF)     0.902  0.946  0.922  0.932  0.924  0.854  0.888
DBN           0.958  0.962  0.964  0.966  0.946  0.956  0.974

              p1200  p2400  a300p300  a600p600  a1200p1200  a2400p2400
MLP 1         0.944  0.942  0.950     0.952     0.958       0.956
MLP 2         0.954  0.948  0.932     0.960     0.958       0.960
SVM (Linear)  0.930  0.930  0.920     0.940     0.940       0.950
SVM (RBF)     0.834  0.686  0.944     0.920     0.964       0.956
DBN           0.944  0.950  0.958     0.970     0.966       0.968

Table 5: Average precisions of DBN, MLP, and SVM for top 5.

              m300   a300   a600   a1200  a2400  p300   p600
MLP 1         0.945  0.932  0.939  0.931  0.914  0.942  0.951
MLP 2         0.951  0.944  0.943  0.933  0.924  0.954  0.953
SVM (Linear)  0.950  0.930  0.942  0.927  0.921  0.930  0.930
SVM (RBF)     0.960  0.941  0.914  0.936  0.924  0.842  0.872
DBN           0.961  0.962  0.965  0.968  0.948  0.948  0.964

              p1200  p2400  a300p300  a600p600  a1200p1200  a2400p2400
MLP 1         0.944  0.942  0.945     0.952     0.957       0.956
MLP 2         0.952  0.949  0.941     0.955     0.958       0.961
SVM (Linear)  0.930  0.930  0.926     0.938     0.940       0.950
SVM (RBF)     0.822  0.757  0.936     0.926     0.952       0.951
DBN           0.954  0.950  0.953     0.961     0.963       0.968

Table 6: Average precisions of DBN, MLP, and SVM for top 10.

5 Conclusion

We proposed methods to predict retrieval terms from the relevant/surrounding words or the descriptive texts in Japanese by using deep belief networks (DBN), one of the two typical types of deep learning. To determine the effectiveness of using DBN for this task, we tested it along with baseline methods using example-based approaches and conventional machine learning methods such as MLP and SVM in comparative experiments. The data for training and testing were obtained from the Web in both manual and automatic manners. We also used automatically created pseudo data. We adopted a grid search to obtain the optimal hyperparameters of these methods by performing cross-validation on the training data. Experimental results showed that (1) using DBN has far higher prediction precisions than using the baseline methods and higher prediction precisions than using either MLP or SVM; (2) adding automatically gathered data and pseudo data to the manually gathered data as training data further improves the prediction precisions; and (3) DBN and SVM (RBF) are able to deal with noisier training data than MLP, i.e., the prediction precision of DBN can be improved by adding noisy training data, but that of MLP cannot be.

In our future work, we plan to re-confirm the effectiveness of the proposed methods by scaling up the experimental data and then start developing various practical domain-specific systems that can predict suitable retrieval terms from the relevant/surrounding words or descriptive texts.

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number 25330368.

References

M. Auli, M. Galley, C. Quirk, and G. Zweig. 2013. Joint Language and Translation Modeling with Recurrent Neural Networks. EMNLP 2013, 1044–1054.

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. 2007. Greedy Layer-wise Training of Deep Networks. NIPS 2006, 153–160.

Y. Bengio. 2009. Learning Deep Architectures for AI. Foundations and Trends in Machine Learning, 2(1):1–127.

Y. Bengio, A. Courville, and P. Vincent. 2013. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(8):1798–1828.

R. Billingsley and J. Curran. 2012. Improvements to Training an RNN Parser. COLING 2012, 279–294.

R. Collobert, J. Weston, L. Bottou, M. Karlen, K. Kavukcuoglu, and P. Kuksa. 2011. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493–2537.

X. Glorot, A. Bordes, and Y. Bengio. 2011. Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach. ICML 2011, 513–520.

K. Hashimoto, M. Miwa, Y. Tsuruoka, and T. Chikayama. 2013. Simple Customization of Recursive Neural Networks for Semantic Relation Classification. EMNLP 2013, 1372–1376.

K. M. Hermann and P. Blunsom. 2013. The Role of Syntax in Vector Space Models of Compositional Semantics. ACL 2013, 894–904.

G. E. Hinton, S. Osindero, and Y. Teh. 2006. A Fast Learning Algorithm for Deep Belief Nets. Neural Computation, 18:1527–1554.

P. S. Huang, X. He, J. Gao, L. Deng, A. Acero, and L. Heck. 2013. Learning Deep Structured Semantic Models for Web Search Using Clickthrough Data. CIKM 2013, 2333–2338.

N. Kalchbrenner and P. Blunsom. 2013. Recurrent Continuous Translation Models. EMNLP 2013, 1700–1709.

A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet Classification with Deep Convolutional Neural Networks. NIPS 2012, 1097–1105.

H. Lee, R. Grosse, R. Ranganath, and A. Y. Ng. 2009. Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations. ICML 2009, 609–616.

L. Li, Y. Zhao, et al. 2013. Hybrid Deep Neural Network - Hidden Markov Model (DNN-HMM) Based Speech Emotion Recognition. ACII 2013.

L. Liu, T. Watanabe, E. Sumita, and T. Zhao. 2013. Additive Neural Networks for Statistical Machine Translation. ACL 2013, 791–801.

T. Luong, R. Socher, and C. Manning. 2013. Better Word Representations with Recursive Neural Networks for Morphology. ACL 2013, 104–113.

R. Salakhutdinov and G. E. Hinton. 2009. Semantic Hashing. International Journal of Approximate Reasoning, 50(7):969–978.

F. Seide, G. Li, and D. Yu. 2011. Conversational Speech Transcription Using Context-Dependent Deep Neural Networks. INTERSPEECH 2011, 437–440.

R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. 2011. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. NIPS 2011, 801–809.

R. Socher, J. Bauer, C. D. Manning, and A. Y. Ng. 2013. Parsing with Compositional Vector Grammars. ACL 2013, 455–465.

R. Socher, A. Perelygin, J. Y. Wu, and J. Chuang. 2013. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. EMNLP 2013, 1631–1642.

S. Srivastava, D. Hovy, and E. H. Hovy. 2013. A Walk-Based Semantically Enriched Tree Kernel Over Distributed Word Representations. EMNLP 2013, 1411–1416.

M. Tsubaki, K. Duh, M. Shimbo, and Y. Matsumoto. 2013. Modeling and Learning Semantic Co-Compositionality through Prototype Projections and Neural Networks. EMNLP 2013, 130–140.

P. Vincent, H. Larochelle, Y. Bengio, and P. A. Manzagol. 2008. Extracting and Composing Robust Features with Denoising Autoencoders. ICML 2008, 1096–1103.

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P. A. Manzagol. 2010. Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion. Journal of Machine Learning Research, 11:3371–3408.

W. Y. Zou, R. Socher, D. M. Cer, and C. D. Manning. 2013. Bilingual Word Embeddings for Phrase-Based Machine Translation. EMNLP 2013, 1393–1398.
