Journal of Machine Learning Research 13 (2012) 643-669 Submitted 6/11; Revised 11/11; Published 3/12
Learning Algorithms for the Classification Restricted Boltzmann Machine
Hugo Larochelle [email protected]
Université de Sherbrooke
2500, boul. de l'Université
Sherbrooke, Québec, Canada, J1K 2R1
Michael Mandel [email protected]
Razvan Pascanu [email protected]
Yoshua Bengio [email protected]
Département d'informatique et de recherche opérationnelle
Université de Montréal
2920, chemin de la Tour
Montréal, Québec, Canada, H3T 1J8
Editor: Daniel Lee
Abstract

Recent developments have demonstrated the capacity of restricted Boltzmann machines (RBM) to be powerful generative models, able to extract useful features from input data or to construct deep artificial neural networks. In such settings, the RBM only yields a preprocessing or an initialization for some other model, instead of acting as a complete supervised model in its own right. In this paper, we argue that RBMs can provide a self-contained framework for developing competitive classifiers. We study the Classification RBM (ClassRBM), a variant of the RBM adapted to the classification setting. We study different strategies for training the ClassRBM and show that competitive classification performance can be reached when appropriately combining discriminative and generative training objectives. Since training according to the generative objective requires the computation of a generally intractable gradient, we also compare different approaches to estimating this gradient and address the issue of obtaining such a gradient for problems with very high dimensional inputs. Finally, we describe how to adapt the ClassRBM to two special cases of classification problems, namely semi-supervised and multitask learning.

Keywords: restricted Boltzmann machine, classification, discriminative learning, generative learning
1. Introduction

The restricted Boltzmann machine (RBM) is a probabilistic model that uses a layer of hidden binary variables or units to model the distribution of a visible layer of variables. It has been successfully applied to problems involving high dimensional data such as images (Hinton et al., 2006; Larochelle et al., 2007) and text (Welling et al., 2005; Salakhutdinov and Hinton, 2007; Mnih and Hinton, 2007). In this context, two approaches are usually followed. First, an RBM is trained in an unsupervised manner to model the distribution of the inputs (possibly more than one RBM could be trained, stacking them on top of each other; Hinton et al., 2006). Then, the RBM is used in one of two ways: either its hidden layer is used to preprocess the input data by replacing it with the represen-