Fundamentals of Learning Algorithms in Boltzmann Machines
by Mihaela G. Erbiceanu
M.Eng., "Gheorghe Asachi" Technical University, 1991

Project Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Computing Science in the School of Computing Science, Faculty of Applied Sciences

© Mihaela G. Erbiceanu 2016
SIMON FRASER UNIVERSITY
Fall 2016

Approval

Name: Mihaela Erbiceanu
Degree: Master of Science (Computing Science)
Title: Fundamentals of Learning Algorithms in Boltzmann Machines
Examining Committee:
Chair: Binay Bhattacharya, Professor
Petra Berenbrink, Senior Supervisor, Professor
Andrei Bulatov, Supervisor, Professor
Leonid Chindelevitch, External Examiner, Assistant Professor
Date Defended/Approved: September 7, 2016

Abstract

Boltzmann learning underlies an artificial neural network model known as the Boltzmann machine, which extends and improves upon the Hopfield network model. The Boltzmann machine model uses stochastic binary units and allows for hidden units that represent latent variables. By reducing noise via simulated annealing while still allowing uphill steps via the Metropolis algorithm, the training algorithm increases the chances that, at thermal equilibrium, the network settles on the best distribution of parameters. The existence of an equilibrium distribution for an asynchronous Boltzmann machine is analyzed with respect to temperature. Two families of learning algorithms, corresponding to two different approaches to computing the statistics required for learning, are presented. Learning algorithms based only on stochastic approximation are traditionally slow. When variational approximations of the free energy are used, such as the mean field approximation or the Bethe approximation, the performance of learning improves considerably.
The principal contribution of the present study is to provide, from a rigorous mathematical perspective, a unified framework for these two families of learning algorithms in asynchronous Boltzmann machines.

Keywords: Boltzmann–Gibbs distribution, Gibbs free energy, asynchronous Boltzmann machine, thermal equilibrium, data–dependent statistics, data–independent statistics, stochastic approximation, variational method, mean field approximation, Bethe approximation.

Dedication

This thesis is dedicated to my mother for her support, sacrifice, and constant love.

Acknowledgements

First and foremost, I would like to thank my supervisor Petra Berenbrink, not only for giving me the opportunity to work on this thesis under her supervision, but also for her valuable feedback. I would also like to thank Andrei Bulatov, my second supervisor, and my committee members, Leonid Chindelevitch and Binay Bhattacharya, for their support, encouragement, and patience.

Table of Contents

Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms

Chapter 1. Introduction
1.1 Motivation
1.2 Overview and roadmap
1.3 Related work
1.4 Connection to other disciplines

Chapter 2. Foundations
2.1 Boltzmann–Gibbs distribution
2.2 Markov random fields and Gibbs measures
2.3 Gibbs free energy
2.4 Connectionist networks
2.5 Hopfield networks
2.5.1 Hopfield network models
2.5.2 Convergence of the Hopfield network

Chapter 3. Variational methods for Markov networks
3.1 Pairwise Markov networks as exponential families
3.1.1 Basics of exponential families
3.1.2 Canonical representation of pairwise Markov networks
3.1.3 Mean parameterization of pairwise Markov networks
3.1.4 The role of transformations between parameterizations
3.2 The energy functional
3.3 Gibbs free energy revisited
3.3.1 Hamiltonian and Plefka expansion
3.3.2 The Gibbs free energy as a variational energy
3.4 Mean field approximation
3.4.1 The mean field energy functional
3.4.2 Maximizing the energy functional: fixed–point characterization
3.4.3 Maximizing the energy functional: the naïve mean field algorithm
3.5 Bethe approximation
3.5.1 The Bethe free energy
3.5.2 The Bethe–Gibbs free energy
3.5.3 The relationship between belief propagation fixed–points and Bethe free energy
3.5.4 Belief optimization

Chapter 4. Introduction to Boltzmann Machines
4.1 Definitions
4.2 Modelling the underlying structure of an environment
4.3 Representation of a Boltzmann Machine as an energy–based model
4.4 How a Boltzmann Machine models data
4.5 General dynamics of Boltzmann Machines
4.6 The biological interpretation of the model

Chapter 5. The Mathematical Theory of Learning Algorithms for Boltzmann Machines
5.1 Problem description
5.2 Phases of a learning algorithm in a Boltzmann Machine
5.3 Learning algorithms based on approximate maximum likelihood
5.3.1 Learning by minimizing the KL–divergence of Gibbs measures
5.3.2 Collecting the statistics required for learning
5.4 The equilibrium distribution of a Boltzmann machine
5.5 Learning algorithms based on variational approaches
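The annealed stochastic dynamics summarized in the abstract — asynchronous updates of stochastic binary units while simulated annealing lowers the temperature toward thermal equilibrium — can be sketched as follows. This is a minimal illustrative sketch under assumed parameters, not the algorithm analyzed in the thesis: it updates one unit at a time with the standard logistic (heat-bath) acceptance probability, which at high temperature accepts uphill moves in the spirit of the Metropolis algorithm; the weights `W`, biases `b`, and the geometric cooling schedule are arbitrary choices for illustration.

```python
# Sketch: asynchronous stochastic updates of binary units in a small
# Boltzmann machine, with simulated annealing reducing the temperature T.
import math
import random

random.seed(0)

n = 4                                    # number of binary units
W = [[0.0] * n for _ in range(n)]        # symmetric weights, zero diagonal
for i in range(n):
    for j in range(i + 1, n):
        W[i][j] = W[j][i] = random.uniform(-1.0, 1.0)
b = [random.uniform(-0.5, 0.5) for _ in range(n)]
s = [random.choice([0, 1]) for _ in range(n)]    # random initial state

def energy(state):
    """E(s) = -sum_{i<j} w_ij s_i s_j - sum_i b_i s_i."""
    e = -sum(b[i] * state[i] for i in range(n))
    for i in range(n):
        for j in range(i + 1, n):
            e -= W[i][j] * state[i] * state[j]
    return e

T = 10.0                                 # initial (high) temperature
while T > 0.05:
    for _ in range(20):                  # asynchronous updates at this T
        i = random.randrange(n)          # pick one unit at random
        # Energy gap for turning unit i on, given the other units:
        gap = b[i] + sum(W[i][j] * s[j] for j in range(n) if j != i)
        # Logistic rule P(s_i = 1) = 1 / (1 + exp(-gap / T)); at high T
        # this accepts energy-increasing (uphill) moves with
        # non-negligible probability, so the chain can escape local minima.
        p_on = 1.0 / (1.0 + math.exp(-gap / T))
        s[i] = 1 if random.random() < p_on else 0
    T *= 0.9                             # geometric cooling schedule

print(s, round(energy(s), 3))
```

As T approaches zero the logistic rule becomes deterministic and the dynamics reduce to the downhill updates of a Hopfield network; the slow cooling is what gives the chain a chance to equilibrate near low-energy states.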