A new approach to Decimation in High Order Boltzmann Machines
PhD dissertation
Student: E. Farguell ([email protected])
Advisor: F. Mazzanti ([email protected])
Acknowledgments
There are many people who should be addressed at this moment, and who have helped me in so many ways that I do not have enough words to thank them for their efforts. However, I would especially like to thank my parents and my fiancée for their support throughout those years when it was most needed. It has been 5 years now, Susanna -not actually counting from 1987, when we first met-, but it is just like the first day.
This note would not be complete if I did not thank my friends, the guys from the old Electronics Department and the great people who move around Enginyeria i Arquitectura La Salle and whom I meet so often around there. Finally, I would like to thank Enginyeria i Arquitectura La Salle for their support over these years.
Preface
Outline
The Boltzmann Machine (BM) is a stochastic neural network able to both learn and extrapolate probability distributions. However, it has never been as widely used as other neural networks such as the perceptron, due to the complexity of both the learning and recalling algorithms, and to the high computational cost of the learning process: the quantities needed at the learning stage are usually estimated by Monte Carlo (MC) through the Simulated Annealing (SA) algorithm. This has led to a situation where the BM is regarded rather as an evolution of the Hopfield Neural Network or as a parallel implementation of the Simulated Annealing algorithm.
Despite this relative lack of success, the neural network community has continued to progress in the analysis of the dynamics of the model. One remarkable extension is the High Order Boltzmann Machine (HOBM), where weights can connect more than two neurons at a time. Although the learning capabilities of this model have already been discussed by other authors [Kosmatopoulos and Christodoulou, 1994, Albizuri et al., 1995], a formal equivalence between the weights in a standard BM and the high order weights in a HOBM has not yet been established.
We analyze this latter equivalence between a second order BM and a HOBM by proposing an extension of the method known as decimation [Itzykson and Drouffe, 1991, Saul and Jordan, 1994]. Decimation is a common tool in statistical physics that, when applied to certain kinds of Boltzmann Machines, can be used to obtain analytical expressions for the n-unit correlation elements required in the learning process. In this way, decimation avoids using the time consuming Simulated Annealing algorithm. However, as first conceived, it could only deal with sparsely connected neural networks. The extension that we define in this thesis allows computing the same quantities irrespective of the topology of the network. This method is based on adding enough high order weights to a standard BM to guarantee that the system can be solved.
Next, we establish a direct equivalence between the weights of a HOBM model, the probability distribution to be learnt and Hadamard matrices. The properties of these matrices can be used to easily calculate the value of the weights of the system.
Finally, we define a standard BM with a very specific topology that helps us better understand the exact equivalence between hidden units in a BM and high order weights in a HOBM.
Contents
This dissertation is organized as follows: chapter 1 reviews the historical facts that led to the development of the original neural network theory. In this chapter, the behavior of two of the best known neural network models used over the years (the multilayer perceptron and the Hopfield neural network) is also briefly revisited. The dynamics of the BM, its extension to the HOBM model and the common learning techniques used on Boltzmann Machines are described in chapter 2.
Standard decimation is analyzed in chapter 3. The discussion includes a full explanation of how this method works, as well as its limitations. The way all these problems are overcome is described in the last sections of this chapter. Chapter 4 discusses how a Hadamard matrix can be used to relate the weights of a HOBM with the probability distribution that it represents. In chapter 5 we present a specific BM model where high order weights find a direct equivalence in terms of second order connections and hidden units. Finally, chapter 6 points out the conclusions that are extracted from this thesis. In the appendix we describe Hadamard matrices and the Walsh-Hadamard transform.

Contents
1 The Neural Network
1.1 Introduction
1.2 The biological Neural Network
1.2.1 Structure
1.2.2 The transmission of electrical impulses
1.3 The Artificial Neural Network
1.3.1 Dynamics and Topology
1.3.2 Learning in ANNs
1.3.3 High order ANN models

2 The Boltzmann Machine
2.1 Introduction
2.2 Simulated Annealing
2.2.1 The Metropolis algorithm
2.2.2 The Simulated Annealing algorithm
2.3 The Boltzmann Machine as a Neural Network
2.3.1 Topology of the BM
2.3.2 Dynamics and algorithm for a BM
2.3.3 The mean field equations
2.3.4 The high order Boltzmann Machine
2.4 Learning on Boltzmann Machines
2.4.1 Learning expression for a standard BM
2.4.2 Learning algorithm for a BM
2.4.3 Learning on a HOBM
2.4.4 The Mean Field learning solution

3 The process of Decimation
3.1 Introduction
3.2 Decimation applied to the BM
3.2.1 Main concepts from decimation
3.2.2 Parallel association
3.2.3 Serial association
3.2.4 Star-triangle decimation
3.3 Correlations and expectation values
3.3.1 Expectation value for a single unit
3.3.2 Correlation of two free units
3.3.3 Correlation of a free and a clamped connected unit
3.4 High order Decimation
3.4.1 Biased star-triangle decimation
3.4.2 The HOBM applied to decimation
3.4.3 HOD numerical example
3.5 Multiple unit decimation process
3.5.1 Iterative HOD and the Multiple Decimation equivalence
3.5.2 Two units decimation
3.5.3 Multiple unit decimation for a 10 units BM
3.6 Simulations and results applying HOD
3.6.1 The letter recognition problem: a toy problem
3.6.2 Problems from a benchmarking repository
3.6.3 The Monk Problem

4 BM learning through Hadamard matrices
4.1 Introduction
4.2 Reduction of connections between input units on a HOBM
4.3 The forward problem
4.4 The backwards problem
4.4.1 The backwards problem for a known p.d.f.
4.4.2 Backwards problem solution for a three units BM
4.4.3 The backwards problem for a conditional p.d.f.
4.4.4 General solution for the backwards conditional problem
4.4.5 The backwards incomplete problem
4.5 Backwards incomplete problem LU solution
4.5.1 Kullback-Leibler distance optimization and the LU solution
4.5.2 The priority encoder problem

5 Analytical learning process for a BM
5.1 Introduction
5.2 Boole arithmetic representation on a BM
5.2.1 Basic logic operations
5.2.2 Extensions of the basic logic operations
5.2.3 Two stage logic operations
5.2.4 System with two output units and several inputs
5.2.5 General case for the output joint probability distribution
5.2.6 Error term due to the hyperbolic cosine approximation
5.3 Practical implementation of a BM
5.3.1 Description of the implementation
5.3.2 Two inputs, two outputs BM
5.3.3 Three inputs, three outputs BM

6 Summary and conclusions

A Properties of Hadamard matrices
A.1 General properties of Hadamard matrices
A.2 Use of Hadamard matrices in HOBMs
A.2.1 The Walsh-Hadamard transform

List of Figures
1.1 Standard neuron structure.
1.2 Activation process. Vrest is set at −60 mV; notice the time and voltage scale represented in the upper part of the image.
1.3 Perceptron structure.
1.4 Perceptron with one hidden layer.
1.5 Piece-wise linear (a) and pure linear (b) functions.
1.6 Third order weight linking units Si, Sj and Sk.
2.1 Notation for input (a), hidden (b) and output (c) units.
2.2 Termination BM (a) and Input-Output BM (b).
2.3 Two hidden units linked by weight wij, and biases hi, hj.
2.4 Two input units neural network with one hidden unit and an output unit.
2.5 A third order weight connecting output units Si, Sj and Sk.
3.1 Applied example of decimation.
3.2 Decimatable structure and decimated model.
3.3 Parallel association.
3.4 Parallel bias simplification.
3.5 Serial association.
3.6 Typical structure where serial decimation is applied to find the correlations of the units.
3.7 Serial association between a bias term and a weight.
3.8 Star-triangle conversion.
3.9 Decimatable structure using a number of connections that the star-triangle procedure can handle. Notice the bias terms and the weights linking the output units.
3.10 Applied example of decimation.
3.11 Decimation of a pair of units to a single one.
3.12 Single unit connected to bias term J_i^(1).
3.13 Two units structure connected by weight J_ij^(2) and bias terms J_i^(1), J_j^(1).
3.14 Correlation between a free and a clamped unit.
3.15 Non decimatable, biased star-triangle structure with typical notation (a) and our notation (b).
3.16 Third order weight linking three output units.
3.17 Third order smallest possible neural network.
3.18 Third order star-triangle conversion.
3.19 Original (a) and decimated (b) structures.
3.20 Decimation process to compute correlation
3.21 Two hidden and two output neurons BM.
3.22 High order Decimation process.
4.1 Scheme of a simple Boltzmann Machine, with the different terms contributing to E_{α,β,γ}(So, Sh, Si) (a) and Ẽ_{α,β,γ}(So, Sh, Si) (b). In (a) all terms are shown, with arrows pointing to those that contribute to E_γ(Si). In (b), only the terms contributing to Ẽ_{α,β,γ}(So, Sh, Si) are depicted. Notice that the dashed arrow in (a) indicates that only the bias terms connecting input units belong to E_γ(Si), and this is why a remaining bias term appears in the output unit in (b).
4.2 Three units neural network, with two inputs and an output unit.
5.1 Two input units neural network with an output unit.
5.2 BM used to represent the n-input AND problem.
5.3 NOT gate structure.
5.4 NOT gate applied to a BM with two input units.
5.5 BM used to build the 3-input example.
5.6 Three input, second order BM with inputs S_i1, S_i2 and S_i3 and output So.
5.7 Standard digital implementation. Notice how the hidden units imitate the behavior of the intermediate AND gates.
5.8 Parallel association with the input units and the bias terms.
5.9 Serial association of the bias terms from the hidden units H_j^(1) and the weights J_hj,o^(2) connecting them with the output unit, resulting in the new h_j^(1) connections.
5.10 BM topology for the non-exhaustive probability distribution.
5.11 Sparsely connected BM with three input units and two outputs.
5.12 Parallel association of the weights connecting the input units and the bias terms from the hidden units.
5.13 Parallel and serial association with the input units, the bias terms and the hidden units from units S_h1 to S_h16.
5.14 Equivalent decimated neural network.
5.15 Backwards problem structure solved for two output units.
5.16 Structure with no second order weight connecting the output units; this connection is replaced by unit S_h25.
5.17 Structure with three output units and ni input neurons. Notice that, in order to create a simpler figure, the input units connecting the hidden neurons are already associated with the bias terms.
5.18 Structure with three output units with a decimated structure; only those connections that will generate third order weights are left.
5.19 Structure that is added for a four output units and ni input neurons.
5.20 Dressed weights structure to build a BM with 4 output units.
5.21 Dressed weights equivalence up to a third order dressed connection.
5.22 Structure used to build a 5 outputs BM.
5.23 Structure used to build an no outputs BM.
5.24 Structure with two output units and two input neurons.
5.25 Decimated structure with the hidden units.
5.26 Structure with the active set of hidden units.
5.27 Structure with three output units and three input neurons for a non-exhaustive probability distribution with its required dressed weights.
5.28 Dressed weights equivalence for the 3 inputs, 3 outputs example.
6.1 The BM, its probability distribution and the HOBM.
6.2 New connections between the BM and HOBM, due to the HOD equivalence; and the p.d.f. and the BM/HOBM models when using HOD to carry out a learning process.
6.3 Schematic representation of the equivalence between hidden units in a BM and high order weights (represented as a solid pattern) on a HOBM.
6.4 New link established between the analysis of the p.d.f. and the HOBM model.
6.5 New link established between the analysis of the p.d.f. and the HOBM model.
A.1 Serial association to obtain a bias term.
A.2 High order star-triangle association.
A.3 Walsh functions.

List of Tables
3.1 Serial association equations.
3.2 Star-triangle transformation equations.
3.3 Third order equations for the star-triangle conversion.
3.4 Multiple vs. standard decimation trial example.
3.5 Decimation method against Monte Carlo implementation.
3.6 Decimation method against perceptron.
3.7 Decimation method versus perceptron.
3.8 BM topology and learning parameters.
3.9 Efficiency on solving the Monk's problem.
3.10 High order Decimation algorithm convergence times, in seconds. Te and e stand for time per epoch and mean number of epochs, respectively.
4.1 Binary counting used to order the probability distribution to learn.
4.2 Probability distribution for the three units example.
4.3 Standard and conditional probability distributions.
4.4 Priority encoder truth table.
4.5 Noisy priority encoder truth table.
4.6 Noisy priority encoder truth table.
5.1 Example of three different probability distributions that one can use to represent an AND operation with a BM.
5.2 Basic boolean operations represented with a BM.
5.3 AND gate expected output value.
5.4 Basic boolean operations represented with the mean value and the probability p(So = 1) of the output unit on a BM.
5.5 Weights needed to build the basic boolean operations.
5.6 BM representation of the n-input AND operation.
5.7 Weights J_ij,o^(2) for the input unit j in the n-input BM implementing OR, NAND and NOR operations.
5.8 S_i1 · S_i2 operation with inputs S_i1, S_i2 and output So.
5.9 AND gate with S_o3 taking any possible value within (−1, +1).
5.10 Variation to the expected values of the basic boolean operations.
5.11 Weights for the non-deterministic OR, NAND and NOR operations.
5.12 Modification to the AND gate with S_o1 taking any possible value within (−1, +1).
5.13 Weights for the non-deterministic, n-inputs OR, NAND and NOR operations.
5.14 Table for the 3 input example.
5.15 Results for the three input OR problem.
5.16 Complete probability distribution for a three inputs BM, represented by using the expected values of the output neuron.
5.17 Expected values for the hidden units of the system; when the unit is active it depends on the value that we want at the output neuron.
5.18 Weights connecting input and hidden units.
5.19 Non-exhaustive probability distribution for a three inputs BM.
5.20 Real behavior of the smaller BM built with the Boolean equivalence. Even though it is not possible for the neural network to reach −1 values, the real result would be closer.
5.21 Weights connecting input and hidden units for the non-exhaustive model.
5.22 Probability distribution for a three input units BM with two output neurons.
5.23 Activation pattern for the hidden units.
5.24 Weights connecting the hidden units with the input ones.
5.25 Output probability distribution for a two input two output BM.
5.26 Backwards problem solution for each input vector combination.
5.27 Non-exhaustive output probability distribution for a 3 input 3 output BM.
5.28 Solution to the backwards problems for the three inputs, three outputs non-exhaustive system.

Chapter 1
The Neural Network
1.1 Introduction
The human nervous system is often described as a powerful parallel processor, able to carry out sets of complex calculations in relatively short periods of time. Since the beginning of the 19th century and throughout the 20th century, scientists have explored both the learning capacity and the behavior of the human brain. The original concept behind the first artificial neural network (ANN) models was to build a model of a highly complex nervous system, motivated by the subjective evaluation of human intelligence. This would be based on the learning and behavior of the human brain, which is often referred to as the biological neural network.
However, the enthusiasm of the first researchers who began to work in this field soon suffered a sharp decline, as they concluded that it was very difficult to create an intelligent being and that the definition of intelligence was much broader than first expected: the extent to which we regard something as behaving in an intelligent manner is determined as much by our own state of mind and training as by the properties of the object under consideration -Alan Turing (1949) in [Evans and Robertson, 1968]. Further research over the following years established the basis of the ANN paradigm.
In this chapter we introduce the concepts of both biological and artificial neural networks, and discuss how the first paradigms slowly evolved into the current mathematical models. The structure of this chapter is as follows: the biological neural network is described in section 1.2, briefly reviewing how neurons work and connect to other cells; section 1.3 is devoted to describing the first models that were defined in an attempt to imitate the behavior of the biological network. These first models are then compared to current neural network models such as the multilayer perceptron and the Hopfield memory.
1.2 The biological Neural Network
The expression biological neural network is extensively used to describe the standard central nervous system of any animal that has a structure such as a brain. In the first part of this section the standard structure of the biological neural network is described; we then proceed with a brief explanation about the mechanism used to transmit information over such systems.
1.2.1 Structure
The biological neural network of an animal allows it to process external information, take decisions and, in essence, coordinate the behavior of the body. Though nowadays the brain is accepted as the core processor of the organism, it has not always been so: ancient Greek philosophers considered it a refrigerator [Bear et al., 2006] for the emotions of the person; Roman physicians deduced that the brain controlled both thoughts and muscles, but believed that transmission was due to hydraulic movement -instead of electrical impulses- resulting from an effective combination of four different liquids. It was not until the Renaissance period, about 1500 years later, that those concepts reached a dead end. The technological and philosophical achievements of this new era brought another point of view from which to perform research: once Descartes had related the concept of soul to the brain, closer and more careful exploration through the subsequent years brought the modern ideas of neuroscience and psychobiology, which are the areas that study the human neural structure and its relation to human behavior.
Figure 1.1: Standard neuron structure.
The modern definition of the biological neural network is due to S. Ramón y Cajal [Ramón y Cajal, 2008], who first described the basis of the neural tissue and the most important pair of cells that compose it, the neurons and the glia cells. Earlier works did not consider separate cells or any smaller neural structure, describing the brain as a single block instead.
A neuron is a highly specialized cell which, in essence, does one task: to transmit electrical pulses [Bear et al., 2006]. This singular ability prevents the cell from performing any other task: nutrition, protection and disposal of waste materials are jobs externally provided by the glia cells. Furthermore, neurons cannot reproduce themselves -thus meaning that when a neuron dies, it is not replaced. The structure of a standard neuron is depicted in Fig. 1.1. Typically, a neuron is composed of a cell body, an axon and the dendrites. The axon is used to send electrical impulses to the neighboring cells, and the dendrites are their receiving terminals. The point where dendrites and axons connect is known as the synapse, and the terminal sections of the axon and the dendrites are referred to as the presynaptic axon terminal and the postsynaptic dendrites, respectively. Neurotransmitters and ion channels are the physical agents used for electrical impulse transmission. Though the walls of this cell are made of conductive materials, the main body shares many common features with other kinds of cells: it includes the nucleus and the internal organs that are used to process the external proteins that feed the neuron.
1.2.2 The transmission of electrical impulses
In order to describe the electrical transmission of impulses between neurons, we must first recall how standard cells work: they are always in contact with an ionic solution, and their semi-permeable walls allow free passage to the proteins which feed the cell. This permeability is based on a pressure equilibrium on both sides of the wall; an excess of pressure on one side can make the wall break, or poison the cell with more proteins than it is able to process -a situation that is toxic to the cell. Neurons are no exception to this architecture, and even though feeding is provided by the external glia cells, they are still surrounded by an ionic solution.
When a standard neuron is not active, its wall's potential is fixed at Vrest = −60 mV with respect to the outside [Breedlove et al., 2007]. This potential is due to its own internal ion dissolution, which also keeps a pressure such that equilibrium with the fluids that surround the cell is guaranteed. The walls of a quiet neuron are transparent to potassium K+ ions, hence they are free to move across the cell; the holes that ions use to come in and out of the neuron are known as voltage active gates, and they are specific proteins with the ability to change their physical configuration -thus opening and closing- when set to a given potential: this property can be activated by changing the potential at the wall of the unit. There are two different kinds of gates: potassium K+ and sodium Na+, the latter being a double door gate. They both become active -this is, they change their physical configuration- when the potential at the neuron's wall reaches −40 mV.
Figure 1.2: Activation process. Vrest is set at −60 mV; notice the time and voltage scale represented in the upper part of the image.
When the neuron is inactive, K+ gates remain open and ions move freely, while Na+ gates remain closed. Activation of a neuron begins when its dendrites read a total voltage above the −40 mV threshold from their neighboring cells. The gates then become active: K+ gates only allow ions to go outside the cell, while Na+ gates allow ions to get inside it. These penetrate in order to equilibrate the pressure at the wall, while K+ ions leave because they are repelled by the Na+ ions. This process continues until the wall reaches the activation potential, which is usually set at a positive value of +40 mV -thus meaning that there has been a ΔV = 100 mV voltage difference. At this point, the gates recover their original structure, and K+ ions enter again in order to normalize the pressure between the wall and the ion dissolution. However, Na+ ions have not yet been expelled and the cell will have to process them; this takes some milliseconds, which is why the neuron needs some time to recover from each activation. This process is summarized in Fig. 1.2, a picture made with the NeuralSim application from Ref. [Kandel et al., 1995]: Fig. 1.2a shows how the potential at the wall of the cell changes during the process, while Fig. 1.2b shows how the two doors of the Na+ gate -which are known as the inactivation and activation gates- work at different rates.
In essence, the voltage the cell reads is a combination of the potentials of the other neurons; activation happens on a non-linear basis, and it actually depends on the connection strengths. Even though these parameters are yet to be rigorously defined, this is the main idea behind the analytical modeling that goes from biological to artificial networks.
1.3 The Artificial Neural Network
In this section, the artificial neural network (ANN) is presented. According to its original definition [Culloch and Pitts, 1943], an artificial neural network is a formalism designed to emulate the behavior of a biological neural network. Since the description of the model comprises both an explanation of its dynamical rule and a topology, the first part of this section is devoted to introducing the basic concepts defining them. We also describe some of the traditionally most popular models, the multilayer perceptron and the Hopfield neural network. Next, the learning processes that have been defined for those models are discussed. We conclude the section by introducing an enhanced neural network model that no longer emulates a biological structure, but rather appears as an evolution of the mathematical formulation of the previous models with an improved learning capacity.
1.3.1 Dynamics and Topology
The original ANN of 1943
The human brain is often seen as a multi-core processor, considering the neurons as a combined set of units that are able to process information in a parallel, asynchronous mode: a neuron is excited depending on the state of its neighbors and the way they are connected. The scientists who began their research in this field were interested in defining a mathematical model that would imitate this behavior; they would use units to represent the neurons and weights to represent their connections. A symbolic formalism was then developed by W. S. McCulloch and W. Pitts [Culloch and Pitts, 1943], defining a relation function between the current state of a neuron S and the cells connected to it as an unknown combination of AND and OR logical operations.
Figure 1.3: Perceptron structure.
The first quantitative representation of these logical relations was given the name of perceptron [Rosenblatt, 1961], and its first implementation was referred to as Mark I; this topology is depicted in Fig. 1.3. This structure considers an output unit $S_o$ as the neuron that provides some response to the stimulation of two input neurons $S_{i_1}$ and $S_{i_2}$. The output unit is connected to these by real-valued numbers known as weights, denoted $w_{i_1 o}$ and $w_{i_2 o}$, respectively. Since biological neurons only allow an excited or an inhibited state, units were originally conceived as binary, this is, they could only be 0 or 1. The state of the output unit $S_o$ depends on the values of the input units according to
$$
S_o = \begin{cases} 1 & \text{if } w_{i_1 o} S_{i_1} + w_{i_2 o} S_{i_2} \geq \theta \\ 0 & \text{if } w_{i_1 o} S_{i_1} + w_{i_2 o} S_{i_2} < \theta \end{cases}, \qquad (1.1)
$$
where $\theta$ is the threshold value that the electrical impulse has to reach in order to activate $S_o$. This expression can also be written in the form
$$
S_o = U\left( w_{i_1 o} S_{i_1} + w_{i_2 o} S_{i_2} - \theta \right), \qquad (1.2)
$$
$U(x)$ being the step function, defined as
$$
U(x) = \begin{cases} 1 & \text{for } x > 0 \\ 0 & \text{for } x < 0 \end{cases}. \qquad (1.3)
$$
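As an illustration, the decision rule of Eqs. (1.1)-(1.3) can be sketched in a few lines of Python; the weight and threshold values chosen here are illustrative assumptions, not taken from the text.

```python
def step(x):
    """Step function U(x) of Eq. (1.3)."""
    return 1 if x > 0 else 0

def perceptron_output(s_i1, s_i2, w_i1o, w_i2o, theta):
    """State of the output unit S_o, following Eq. (1.2)."""
    return step(w_i1o * s_i1 + w_i2o * s_i2 - theta)

# With weights (1, 1) and threshold 1.5, the unit realizes a logical AND:
for s1 in (0, 1):
    for s2 in (0, 1):
        print(s1, s2, "->", perceptron_output(s1, s2, 1.0, 1.0, 1.5))
```

Note that Eq. (1.1) fires at equality ($\geq \theta$) while $U(x)$ in Eq. (1.3) is left undefined at $x = 0$; the sketch follows the strict inequality.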
Following the biological model, input units were defined as the external sensorial neurons that produce the input signals to be processed. However, and though this paradigm provided interesting results, it had some important drawbacks that prevented further use of the model and stopped the ANN research topic for years: this simple structure was unable to learn even some of the simplest binary relations, such as the XOR operation [Minsky and Papert, 1969]. An additional layer of neurons was later introduced to overcome this limitation. This layer processes the input signals and forwards them to the output units. Since this new layer could not be externally addressed, it received the name of hidden layer. Input units would still be considered as an external excitation, hidden units were the core processor of the system, and output units would show the result of the process -both input and output units are commonly referred to as visible units.
The multilayer perceptron
The multilayer perceptron has a structure that is defined by its ordering in layers, with an input layer, a set of hidden layers and an output layer, all of them possibly having different numbers of units. Though it is possible to define an arbitrary number of hidden layers, a perceptron with a single hidden layer is already able to approximate any continuous function [Cybenko, 1989, Hornik et al., 1989]. This structure is depicted in Fig. 1.4, where input units are represented as $S_i$, hidden units as $S_h$ and output ones as $S_o$. Weights connecting input and hidden units are denoted as $w_{hi}$ and those connecting the hidden layer to the output one are referred to as $w_{oh}$. It is possible to show that a perceptron with two hidden layers is able to approximate any function [Cybenko, 1988] with the required precision, hence it is of little use to define perceptrons with more than two hidden layers.
Figure 1.4: Perceptron with one hidden layer.
This model is inspired in the original ANN model, as the signal propagation starts from the input units and flows through the neural network. The hidden layer processes the information from the input by setting the value of its units according to
$$
S_{h_k} = U\left( \sum_j w_{h_k i_j} S_{i_j} - \theta_{h_k} \right), \qquad (1.4)
$$
where $S_{h_k}$ stands for the $k$-th unit of the hidden layer, $\theta_{h_k}$ being its bias term (which plays the role of the threshold $\theta$ in Eq. 1.1), $S_{i_j}$ the $j$-th unit from the input layer, and $w_{h_k i_j}$ the weight connecting these units. Once this equation has been processed for every unit which belongs to the first hidden layer, this algorithm is repeated for the output units
$$
S_{o_k} = U\left( \sum_j w_{o_k h_j} S_{h_j} - \theta_{o_k} \right), \qquad (1.5)
$$
where $S_{h_j}$ is the $j$-th hidden unit, $S_{o_k}$ is the $k$-th output unit, $\theta_{o_k}$ is its bias term and, finally, $w_{o_k h_j}$ is the weight that connects these two units. Notice that, since neurons are only connected to the next layer, units from the same layer do not communicate with each other, and thus the signal always propagates forward. When the output units end their processing, there is no possibility for the neural network to send the information back.
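The layer-by-layer forward pass of Eqs. (1.4) and (1.5) can be sketched as below. The weights and thresholds are illustrative assumptions chosen so that the network realizes the XOR relation that the single-layer perceptron cannot learn; they are one of many valid solutions.

```python
def step(x):
    """Step function U(x)."""
    return 1 if x > 0 else 0

def layer_forward(states, weights, thresholds):
    """One layer of Eqs. (1.4)/(1.5): S_k = U(sum_j w_kj * S_j - theta_k)."""
    return [step(sum(w * s for w, s in zip(row, states)) - theta)
            for row, theta in zip(weights, thresholds)]

w_hi = [[1.0, 1.0], [1.0, 1.0]]  # weights w_hk,ij to the hidden layer
th_h = [0.5, 1.5]                # hidden thresholds: an OR and an AND detector
w_oh = [[1.0, -1.0]]             # weights w_ok,hj to the output layer
th_o = [0.5]                     # output threshold

for s in ([0, 0], [0, 1], [1, 0], [1, 1]):
    out = layer_forward(layer_forward(s, w_hi, th_h), w_oh, th_o)
    print(s, "->", out)
```

The hidden layer detects OR and AND of the inputs; the output unit fires when OR holds but AND does not, which is exactly XOR.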
Figure 1.5: Piece-wise linear (a) and pure linear (b) functions.
In standard implementations, the step function U(x) is commonly substituted by a monotonous function f(x) which is known as the activation function. In its simplest version it can be linear [Hertz et al., 1991], either piece-wise or pure, as depicted in Fig. 1.5; but other implementations [The Mathworks™, 2008b] also use the sigmoids 1/(1 + e^{-x}) or tanh(x), and exponential-like e^{-x^2} expressions, among others.
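As an illustration, the activation functions mentioned above can be written down directly. This is a minimal sketch; the function names are our own, not from this work:

```python
import numpy as np

def step(x):
    """Step function U(x): the original threshold activation."""
    return np.where(x >= 0.0, 1.0, -1.0)

def piecewise_linear(x):
    """Piece-wise linear activation: linear in [-1, 1], saturated outside."""
    return np.clip(x, -1.0, 1.0)

def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^{-x})."""
    return 1.0 / (1.0 + np.exp(-x))

def gaussian(x):
    """Exponential-like activation e^{-x^2}."""
    return np.exp(-x ** 2)
```

The hyperbolic tangent is available directly as np.tanh; all of these are monotonous except the exponential-like one, which is bell shaped.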
The Hopfield network
The Hopfield neural network [Hopfield, 1982] was originally conceived as a mathematical model for an auto-associative memory. This device acts as a content addressable register, in the sense that it is able to recover previously learned information when provided with an input signal that is similar to one of its stored patterns. This ability is especially interesting in image reconstruction and identification problems. The Hopfield model is characterized by the absence of hidden units and by the fact that it is recurrent: all its neurons are used both as input and output units.
The Hopfield network was taken as a simplified version of the biological neural network.
The main difference between this model and the perceptron is that the signal does not move forward in a single direction: the Hopfield neural network is in this sense fully recurrent. A set of differential equations is used to analyze its global behavior as a dynamical system [Hopfield, 1984]. Let S_i be a given unit from a Hopfield model with a total amount of N neurons. The dynamics is such that S_i(t) is a function of all units and is represented by

\frac{dS_i(t)}{dt} = -S_i(t) + sgn\left( \sum_{j=1}^{N} w_{ij} S_j(t) + h_i \right) . (1.6)
However, this expression is often solved under a discrete time reference

S_i(t) = sgn\left( \sum_{j=1}^{N} w_{ij} S_j(t-1) + h_i \right) , (1.7)

where S_j(t-1) is the state of the other units at the previous instant and w_{ij} are the weights linking units S_i and S_j. On the other hand, h_i stands for the bias term of unit S_i (thus being the equivalent of θ_i from the perceptron), and sgn(x) is the sign function
sgn(x) = \frac{|x|}{x} , (1.8)

hence the units of the neural network are binary, taking values S_i ∈ {-1, +1}.
It can be shown [Kosko, 1992] that this dynamic rule is governed by a cost function that is referred to as the energy functional
E = - \frac{1}{2} \sum_{i,j}^{N} w_{ij} S_i S_j - \sum_{i}^{N} h_i S_i , (1.9)
that has a set of global minima at some patterns v^p = {S_1^p, S_2^p, ..., S_N^p}, as seen in Refs. [Baldi, 1988, Abe, 1989]. The evolution in time of the network is such that, upon starting on an arbitrary vector v = {S_1, S_2, ..., S_N} and arriving at a time where S_i(t) = S_i(t-1) ∀i, the system has moved to the closest global minimum [Hertz et al., 1991] of Eq. 1.9. At this point, it gives as output the stored pattern v^p corresponding to that minimum.
Since the energy functional in Eq. 1.9 is a Lyapunov function [Boyd et al., 1994], it can also be shown that the system will reach convergence by running either in synchronous or asynchronous mode: synchronous dynamics imply that all units are updated at the same time, thus performing a parallel evaluation of Eq. 1.7. On the other hand, asynchronous transitions are performed by randomly selecting a unit Si and updating its value according to the same equation until all the units remain stable.
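The asynchronous dynamics just described can be sketched as follows. This is an illustrative implementation of Eq. 1.7, with names and structure of our own:

```python
import numpy as np

def energy(S, w, h):
    """Energy functional of Eq. 1.9: E = -1/2 sum_ij w_ij S_i S_j - sum_i h_i S_i."""
    return -0.5 * S @ w @ S - h @ S

def recall(S, w, h, max_sweeps=100):
    """Asynchronous Hopfield dynamics (Eq. 1.7): update randomly chosen
    units until the whole network remains stable."""
    S = S.copy()
    N = len(S)
    rng = np.random.default_rng(0)
    for _ in range(max_sweeps):
        changed = False
        for i in rng.permutation(N):
            new = 1.0 if w[i] @ S + h[i] >= 0.0 else -1.0
            if new != S[i]:
                S[i] = new
                changed = True
        if not changed:  # S_i(t) = S_i(t-1) for all i: a minimum was reached
            break
    return S
```

Since Eq. 1.9 is a Lyapunov function, the energy can only decrease or stay constant along this trajectory, so the loop always terminates at a (possibly local) minimum.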
1.3.2 Learning in ANNs
First paradigms
A biological neural network is able to learn a given pattern by discerning the most relevant information from a training set of examples. The learning process increases the strength of the connections that are most used, while the less used ones are weakened. The whole process is known as the Hebb rule, in honor of Donald O. Hebb [Hebb, 1949], and the first approaches to algorithmically simulate a learning process were inspired by this procedure. When applied to an ANN, a learning process refers to the system acquiring some desired behaviour, which is often to reproduce a given function that is sampled through a set of vectors.
The Hebb rule was first applied as a learning rule to a perceptron model with only an input and an output layer. We will refer again to input and output units as S_i and S_o, respectively; these are connected through weights w_{io}, while the bias term for the output units is referred to as θ_o. The learning patterns are denoted {ξ_i^p} for any input unit and {ξ_o^p} for the output units, referred to the p-th vector of a learning set of P vectors. The learning algorithm for this system begins by randomly initializing the values of the weights, which are then iteratively updated according to [Hertz et al., 1991]
w_{io}^{new} = w_{io}^{old} + Δw_{io} , (1.10)

where

Δw_{io} = 2 η ξ_i^p ξ_o^p if S_o^p ≠ ξ_o^p , and Δw_{io} = 0 otherwise ; (1.11)

S_o^p reads as the value of a given output unit when the input units are fixed at the p-th vector, and η is the learning rate, which is tuned to carry out the learning process.
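A minimal sketch of this update rule (Eqs. 1.10 and 1.11) for ±1-valued units might read as follows; the names and the stopping criterion are our own:

```python
import numpy as np

def train_simple_perceptron(patterns_in, patterns_out, eta=0.1, max_epochs=100):
    """Single-layer perceptron trained with the error-correcting Hebb rule:
    weights change by 2*eta*xi_i*xi_o only when the output unit is wrong."""
    P, n_in = patterns_in.shape
    n_out = patterns_out.shape[1]
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=(n_in, n_out))
    for _ in range(max_epochs):
        errors = 0
        for p in range(P):
            out = np.where(patterns_in[p] @ w >= 0.0, 1.0, -1.0)  # step output
            wrong = out != patterns_out[p]
            # Eq. 1.11: update only the weights feeding a wrong output unit
            w += 2.0 * eta * np.outer(patterns_in[p], patterns_out[p] * wrong)
            errors += wrong.sum()
        if errors == 0:
            break
    return w
```

For a linearly separable pattern set this iteration stops once every output unit reproduces its target.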
However, this learning rule cannot be used on a model with a hidden layer, because the hidden units have no known target value before the learning process begins. This issue forced the formal definition of the multilayer perceptron as a feed-forward neural network together with its learning method, which is today known as back-propagation [Bryson, Jr. and Ho, 1969].
Learning on the Multilayer perceptron
The standard learning rule that is used nowadays on a perceptron is called back-propagation and was originally presented in Ref. [Bryson, Jr. and Ho, 1969]. In this algorithm a quadratic error function ε involving the learning patterns ξ_o^p and the current output state S_o^p of the network is defined

ε = \frac{1}{2} \sum_{p,o} (ξ_o^p - S_o^p)^2 . (1.12)

Gradient descent is used to obtain an iterative learning procedure where weights and bias terms are randomly initialized and updated at each step of the algorithm. Weights are modified in the opposite direction of the gradient of the error

Δw_{io} ∝ - \frac{∂ε}{∂w_{io}} , (1.13)

Δθ_o ∝ - \frac{∂ε}{∂θ_o} . (1.14)
We first show how this rule is applied on a standard perceptron where output units S_o are related to the input layer units S_i by

S_o = f\left( \sum_i w_{io} S_i - θ_o \right) , (1.15)

where f is any monotonous function such as the piece-wise linear, pure linear, hyperbolic tangent or sigmoid functions, as defined in Refs. [Hertz et al., 1991, The Mathworks™, 2008b].
For the w_{io} weights one has

\frac{∂ε}{∂w_{io}} = - \sum_{p,o} (ξ_o^p - S_o^p) \frac{∂S_o^p}{∂w_{io}} = - \sum_{p,o} (ξ_o^p - S_o^p) \frac{∂f}{∂w_{io}} , (1.16)

where the derivative of f changes according to the selected function. Weights are algorithmically modified according to

w_{io}^{new} = w_{io}^{old} + η \sum_{p,o} (ξ_o^p - S_o^p) \frac{∂f}{∂w_{io}} , (1.17)

η being a convergence parameter to be tuned; the algorithm ends when the change of every weight falls below an arbitrarily small value ζ = w_{io}^{new} - w_{io}^{old}. When gradient descent is applied to the bias terms, one arrives at

θ_o^{new} = θ_o^{old} + η \sum_{p,o} (ξ_o^p - S_o^p) \frac{∂f}{∂θ_o} , (1.18)

with η being the same convergence constant as above.
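As an illustration of Eqs. 1.15-1.18 with f = tanh, a single gradient descent step can be sketched as follows; this is a sketch under our own naming, not code from this work:

```python
import numpy as np

def forward(S_i, w, theta):
    """Eq. 1.15 with f = tanh: S_o = f(sum_i w_io S_i - theta_o)."""
    return np.tanh(S_i @ w - theta)

def gradient_step(S_i, xi_o, w, theta, eta=0.1):
    """One gradient descent update (Eqs. 1.17 and 1.18).
    For f = tanh, df/dw_io = (1 - S_o^2) S_i and df/dtheta_o = -(1 - S_o^2)."""
    S_o = forward(S_i, w, theta)
    delta = (xi_o - S_o) * (1.0 - S_o ** 2)   # output error times f'
    w_new = w + eta * np.outer(S_i, delta)
    theta_new = theta - eta * delta
    return w_new, theta_new
```

Each step moves the weights along the negative gradient of ε, so for a small enough η the quadratic error of Eq. 1.12 decreases.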
This same concept is applied to find the weights connecting the different layers of a multilayer perceptron [Rumelhart et al., 1986]. However, the error function must now be differentiated with respect to the weights connecting the units from the separate layers
Δw_{ih} ∝ - \frac{∂ε}{∂w_{ih}} , (1.19)

Δw_{ho} ∝ - \frac{∂ε}{∂w_{ho}} , (1.20)

w_{ih} being the weight that links hidden and input units and w_{ho} the weights connecting hidden and output layers. This rule does also apply to bias terms
Δθ_h ∝ - \frac{∂ε}{∂θ_h} , (1.21)

Δθ_o ∝ - \frac{∂ε}{∂θ_o} , (1.22)

where θ_h and θ_o are the biases for hidden and output units, respectively.
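For a single hidden layer with f = tanh, the chain rule behind Eqs. 1.19-1.22 can be sketched as follows; again an illustrative sketch with names of our own:

```python
import numpy as np

def backprop_step(S_i, xi_o, w_ih, th_h, w_ho, th_o, eta=0.1):
    """One back-propagation update for a one-hidden-layer perceptron.
    The output error is propagated back through w_ho to obtain the
    gradients with respect to w_ih and theta_h."""
    # Forward pass (Eqs. 1.4 and 1.5 with f = tanh)
    S_h = np.tanh(S_i @ w_ih - th_h)
    S_o = np.tanh(S_h @ w_ho - th_o)
    # Output layer deltas (as in Eqs. 1.17 and 1.18)
    d_o = (xi_o - S_o) * (1.0 - S_o ** 2)
    # Hidden layer deltas: chain rule through w_ho (Eqs. 1.19 and 1.21)
    d_h = (d_o @ w_ho.T) * (1.0 - S_h ** 2)
    w_ho += eta * np.outer(S_h, d_o)
    th_o -= eta * d_o
    w_ih += eta * np.outer(S_i, d_h)
    th_h -= eta * d_h
    return w_ih, th_h, w_ho, th_o
```

The hidden deltas d_h are the formal answer to the missing target values of the hidden units: the output error is simply redistributed backwards through the weights.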
Learning on the Hopfield neural network
The Hopfield model is a recurrent network where all units can be connected to each other, that has no hidden units and whose visible neurons act both as input and output cells. We now define a network of N units and a set of P binary patterns v^p = {ξ_1^p, ξ_2^p, ..., ξ_N^p} that the system has to learn. For this structure the Hebb learning rule [Hebb, 1949] solves the problem entirely at once, and can be directly implemented as follows

w_{ij} = \frac{1}{N} \sum_p ξ_i^p ξ_j^p , (1.23)

h_i = \frac{1}{N} \sum_p ξ_i^p . (1.24)

This expression grants stability for any learned pattern; the dynamics is expected to drive the neural network to a global state of equilibrium at t → ∞ in Eq. 1.7

S_i(t) = sgn\left( \sum_{j=1}^{N} w_{ij} S_j(t-1) + h_i \right) .
Equations 1.23 and 1.24 are compatible with the energy functional of the Hopfield model from Eq. 1.9, which is a Lyapunov function [Kosko, 1992]

E = - \frac{1}{2} \sum_{i,j}^{N} w_{ij} S_i S_j - \sum_{i}^{N} h_i S_i ;

this energy functional reduces its value as the neural network evolves through time, arriving at one of the global minima [Hertz et al., 1991]. Notice now that any previously learned vector v^p = {ξ_1^p, ξ_2^p, ..., ξ_N^p} makes the energy become minimal once weights are replaced by their learning rule from Eqs. 1.23 and 1.24

E = - \frac{1}{2} \sum_{i,j}^{N} \frac{1}{N} \sum_p ξ_i^p ξ_j^p S_i S_j - \sum_{i}^{N} \frac{1}{N} \sum_p ξ_i^p S_i
  = - \frac{1}{2N} \sum_p \left( \sum_{i}^{N} ξ_i^p S_i \right) \left( \sum_{j}^{N} ξ_j^p S_j \right) - \frac{1}{N} \sum_p \sum_{i}^{N} ξ_i^p S_i , (1.25)

since it is a quadratic function that is built by using the learned set of vectors. When there are too many vectors for the system to learn, it will generate spurious minima; those points are false solutions of the problem that differ from the real patterns. From Refs. [Baldi, 1988, Hertz et al., 1991] we know that the maximum number of vectors V_max that the Hopfield model can store is

V_max = \frac{N}{2 \ln N} , (1.26)

which is also referred to as the learning capacity.
This expression is particularly interesting as it shows that the neural network has a relatively low capacity compared with the number of units it may have, at least if the Hebb rule is used when learning is carried out. A formal derivation of the learning capacity is found in [McEliece et al., 1987], with a full description of the capacity of the neural network for different kinds of data distributions across the learning patterns. However, the final conclusions stick to Eq. 1.26 for unknown datasets. Though larger learning sets will cause the appearance of local minima, there are some solutions that may be proposed in order to improve the learning capacity of the Hopfield model. The first one is a variation of its own learning algorithm, as proposed in [Storkey, 1997]; this maximizes the distance between the minima of the system. However, no matter which learning method is used, if the Hopfield model is forced to learn a set with more vectors than the network has units (thus, P > N in the previous examples), the system will be unable to achieve a stationary stable state [Abu-Mostafa and Jacques, 1985]. Another proposal is a complex definition of the states, using phasors to define the different possible states for the units [Jankowski et al., 1996, Muezzinoglu et al., 2003]. However, these kinds of variations are far from the scope of this work, as we are interested in the following one: weights can be modified to interconnect more than two units, even the whole network [Peretto and Niez, 1986], to create a high order Hopfield model with increased learning capabilities.
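The Hebb prescription of Eqs. 1.23 and 1.24 together with the capacity bound of Eq. 1.26 can be sketched numerically as follows (an illustration with names of our own, not code from this work):

```python
import numpy as np

def hebb_learn(patterns):
    """Hebb rule (Eqs. 1.23 and 1.24): w_ij = (1/N) sum_p xi_i^p xi_j^p,
    h_i = (1/N) sum_p xi_i^p, with no self-connections."""
    P, N = patterns.shape
    w = (patterns.T @ patterns) / N
    np.fill_diagonal(w, 0.0)
    h = patterns.sum(axis=0) / N
    return w, h

def capacity(N):
    """Eq. 1.26: maximum number of storable patterns, V_max = N / (2 ln N)."""
    return N / (2.0 * np.log(N))
```

For instance, for N = 100 units Eq. 1.26 gives V_max ≈ 10.9 patterns, far below the number of units, which illustrates the relatively low capacity discussed above.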
1.3.3 High order ANN models
A high order neural network is conceived as an evolution of a standard neural network where weights may connect more than two units at a time, even up to the total number of units in the network. This kind of model has the advantage of using fewer hidden units to process the information, at the cost of a higher connectivity [Giles and Maxwell, 1987]. One of the first high order models to be defined was the high order Hopfield memory [Peretto and Niez, 1986], though this topology has also been studied for perceptrons [Xiang et al., 1994] and other models of neural networks [Kosmatopoulos and Christodoulou, 1995, Kosmatopoulos et al., 1995, Pazienza et al., 2007].
From now on, we will represent these connections as a line joining the different units of the neural network. For the sake of simplicity, we shall adopt the notation from [Burshtein, 1998] and represent high order connections with the symbol w_σ^{(n)}, where (n) stands for the number of units the weight is connecting, and σ stands for the set of indexes denoting the units that are connected by this weight. As an example, we show the high order Hopfield model, since its capacity has been deeply studied. The standard notation for one and two-body weights is
w_{ij}^{(2)} = w_{ij} , (1.27)

w_i^{(1)} = h_i , (1.28)
while a third order weight connecting units S_i, S_j and S_k would be referred to as w_{ijk}^{(3)}, and depicted as in Fig. 1.6.
Figure 1.6: Third order weight w_{ijk}^{(3)} linking units S_i, S_j and S_k.
In a Hopfield neural network, weights are invariant under permutation of their indexes: second order weights satisfy w_{ij}^{(2)} = w_{ji}^{(2)}, third order connections satisfy w_{ijk}^{(3)} = w_{ikj}^{(3)} = w_{jki}^{(3)} = ..., and so on. The inclusion of these high order terms forces a variation of the energy functional [Peretto and Niez, 1986]

E = - \sum_i w_i^{(1)} S_i - \sum_{i,j} w_{ij}^{(2)} S_i S_j - \sum_{i,j,k} w_{ijk}^{(3)} S_i S_j S_k - ... - w_{ijk...}^{(N)} \prod_ρ S_ρ , (1.29)

where the term w_{ijk...}^{(N)} links all the N units of the neural network. This expression is compacted [Burshtein, 1998] in the form

E = - \sum_{σ,n} w_σ^{(n)} \prod_{ρ∈σ} S_ρ . (1.30)

The dynamic evolution of the units in the high order Hopfield model is defined as

S_i(t+1) = sgn\left( w_i^{(1)} + \sum_j w_{ij}^{(2)} S_j(t) + \sum_{j,k} w_{ijk}^{(3)} S_j(t) S_k(t) + ... + w_{ijk...}^{(N)} \prod_ρ S_ρ(t) \right) . (1.31)

The energy functional from Eq. 1.29 is still a Lyapunov function [Dembo et al., 1991] and the system will remain stable through time [Burshtein, 1998] when the following differential equation is used to simulate its behavior

\frac{dS_i(t)}{dt} = -S_i(t) + sgn\left( w_i^{(1)} + \sum_j w_{ij}^{(2)} S_j(t) + ... + w_{ijk...}^{(N)} \prod_ρ S_ρ(t) \right) . (1.32)

We now prove this system to be stable; according to Lyapunov control theory [Slotine and Weiping, 1991], stability happens when the following two conditions are satisfied:

1. The energy functional E(S) from Eq. 1.30, expressed as a function of the units, is bounded, and thus E(S) ∈ [E_inf, E_sup].

2. The derivative of the energy functional dE(S)/dt is negative or zero, hence dE(S)/dt ≤ 0.

It can be readily seen that the energy functional is bounded: regardless of the values of the units, the lowest value E_inf that Eq. 1.30 can reach is E_inf = - \sum_{σ,n} |w_σ^{(n)}|, while its maximum value is E_sup = \sum_{σ,n} |w_σ^{(n)}|. We now differentiate the energy functional

\frac{dE(S)}{dt} = ∇E(S) · \frac{dS}{dt} = \sum_{i=1}^{N} \frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} , (1.33)

where

\frac{∂E(S)}{∂S_i} = -w_i^{(1)} - \sum_j w_{ij}^{(2)} S_j - \sum_{j,k} w_{ijk}^{(3)} S_j S_k - ... ; (1.34)

notice now that we can write Eq. 1.32 as

\frac{dS_i(t)}{dt} = -S_i + sgn\left( w_i^{(1)} + \sum_j w_{ij}^{(2)} S_j(t) + ... + w_{ijk...}^{(N)} \prod_ρ S_ρ(t) \right)
                  = -S_i - sgn\left( \frac{∂E(S)}{∂S_i} \right) , (1.35)

so

\frac{dE(S)}{dt} = - \sum_{i=1}^{N} \frac{∂E(S)}{∂S_i} \left( S_i + sgn\left( \frac{∂E(S)}{∂S_i} \right) \right) . (1.36)

We now analyze the value of \frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} depending on the sign of \frac{∂E(S)}{∂S_i}. If \frac{∂E(S)}{∂S_i} = -k < 0 for k ∈ ℝ^+, then

\frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} = k ( S_i + sgn(-k) ) = -k (1 - S_i) . (1.37)

If S_i = 1, then 1 - S_i = 0; on the other hand, if S_i = -1 then 1 - S_i = 2 and, regardless of the value of S_i, \frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} ≤ 0. We conclude this proof by discussing the case where \frac{∂E(S)}{∂S_i} = k > 0 for k ∈ ℝ^+. Now

\frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} = -k ( S_i + sgn(k) ) = -k (S_i + 1) , (1.38)

where S_i + 1 = 0 for S_i = -1 and S_i + 1 = 2 for S_i = 1, hence \frac{∂E(S)}{∂S_i} \frac{dS_i}{dt} ≤ 0 always and \frac{dE(S)}{dt} ≤ 0 irrespective of the value of S_i. Then, this system is stable.

This high order model increases the capacity of the Hopfield neural network [Burshtein, 1998], which for some patterns can reach

V_max = \frac{N^n}{2 (n+1) λ_n \ln N} , (1.39)

where N stands for the number of units of the neural network, n is the maximum number of units that a weight is connecting, and

λ_n = \frac{(2n)!}{n! \, 2^n} . (1.40)

However, both the representation of the neural network and its learning process become more difficult as n increases, due to the large quantity of weights that the neural network has. For an n-th order Hopfield network with N units, the total number of weights N_w becomes

N_w = \binom{N}{1} + \binom{N}{2} + \binom{N}{3} + ... + \binom{N}{n} , (1.41)

and for n = N, N_w becomes N_w = 2^N - 1. Hence, if we are interested in using a high order Hopfield network with a large number of units, it becomes a non-practical, though powerful, model.

Chapter 2

The Boltzmann Machine

2.1 Introduction

The Boltzmann Machine (BM) [Aarts and Korst, 1989] is a recurrent, stochastic neural network with the ability of learning and extrapolating probability distributions.
Though it can be seen as an enhanced, probabilistic model that has evolved from the Hopfield [Hopfield, 1982] neural network, it was originally conceived as a parallel implementation of the Simulated Annealing (SA) [Kirkpatrick et al., 1983] optimization algorithm. In order to provide the system with an analytical learning expression, D. H. Ackley, T. J. Sejnowski and G. E. Hinton [Ackley et al., 1985] proposed a measure of the error between a probability distribution to learn and the BM's own distribution; this value is known as the Kullback-Leibler [Kullback, 1959] distance. The Simulated Annealing algorithm, whose dynamics describe the behavior of the Boltzmann Machine, is briefly explained in section 2.2. Section 2.3 is devoted to the analysis of the standard BM algorithm and its extension to the high order Boltzmann Machine (HOBM) [Sejnowski, 1987] model. Finally, the Boltzmann Machine learning equations, the main drawbacks of the BM and its standard learning solutions (as used nowadays) are discussed in section 2.4.

2.2 Simulated Annealing

The Simulated Annealing [Kirkpatrick et al., 1983] algorithm is a powerful, global stochastic optimization algorithm that numerically emulates the behavior of a given material under the process known as annealing. This process consists of heating the material until its liquid state is reached; this condition leads to a random walk across all its feasible energetic states, where it is equally possible to find its atoms in any spin direction. The material is then slowly cooled down towards absolute zero, so that a perfect crystal structure is obtained; at this point it would settle on a global energetic minimum. However, since it is not possible to reach such a temperature value, the quantity that is minimized instead is the Helmholtz free energy F

F = E - TS , (2.1)

where T stands for the real temperature, E is the internal energy of the system and S its entropy.
In this section, the combinatorial optimization algorithm known as Simulated Annealing is briefly described. The first part of the section presents the Metropolis algorithm as the original concept that led to the design of this method. It then proceeds by discussing the standard implementation of the SA algorithm.

2.2.1 The Metropolis algorithm

The Simulated Annealing was inspired by the Metropolis algorithm [Metropolis et al., 1953], a numerical method originally proposed as a way to simulate the behavior of a solid under a heat bath via Monte Carlo (MC) [Rubinstein, 1981] techniques: the Metropolis algorithm allows the simulation of a thermal equilibrium situation for any ergodic physical system. It works by first proposing an initial random energetic state α with an associated energy value E_α. A transition to a new random state β is then generated; this new state will have an energy value E_β. If the quantity ΔE = E_β - E_α is negative, state β is accepted as the new departing state. Otherwise, β is accepted with a certain probability p(α → β) such as

p(α → β) = e^{(E_α - E_β)/(k_B T)} , (2.2)

where k_B is a physical constant known as the Boltzmann constant and T is the temperature of the heat bath. This algorithm can also be carried out if Eq. 2.2 is exchanged by

p(α → β) = \frac{1}{1 + e^{(E_β - E_α)/(k_B T)}} , (2.3)

which is widely used in the BM literature [Freeman and Skapura, 1993]. However, it can be shown that this expression causes the algorithm to reach convergence more slowly [Metropolis et al., 1953]. Thermal equilibrium is reached once the average number of transitions from any given state α to any other β becomes the same [Itzykson and Drouffe, 1991], hence

p(α → β) p_α = p(β → α) p_β . (2.4)

Upon reaching thermal equilibrium, the probability of being in a given state α is given by the Boltzmann-Gibbs probability distribution

p(α) = \frac{e^{-E_α/(k_B T)}}{Z} , (2.5)

where Z is known as the partition function and stands for

Z = \sum_γ e^{-E_γ/(k_B T)} . (2.6)

2.2.2 The Simulated Annealing algorithm

The annealing process is a physical procedure which consists of heating a given material until it reaches a liquid state, to then slowly cool it until it becomes a solid structure. If the cooling process is slow enough, this solid state will have a crystal-like structure, thus reaching a state where the energy is minimum. The Simulated Annealing was born as a combinatorial optimization technique [Kirkpatrick et al., 1983] that emulated this process by multiple repetition of the Metropolis algorithm: temperature is slowly decreased and thermal equilibrium reached at each Metropolis algorithm run.

The temperature value that is used at each Metropolis algorithm run is defined as a succession of K monotonically decreasing values. This succession of temperatures simulates the temperature variations through the annealing process and is known as the cooling schedule [Aarts and Korst, 1989]. For each temperature T_k from the cooling schedule, the algorithm should iterate until reaching thermal equilibrium. However, this is often not feasible in practical terms, and therefore a number of iterations m_k is associated to each temperature of the cooling schedule.

Let a cost function f be a combinatorial function which depends on a given P variables vector

x_i = ( x_1^{(i)}, x_2^{(i)}, ..., x_p^{(i)}, ..., x_P^{(i)} ) , (2.7)

where

x_i ≠ x_j , ∀ i ≠ j , (2.8)

and

f_i = f(x_i) = f( x_1^{(i)}, x_2^{(i)}, ..., x_p^{(i)}, ..., x_P^{(i)} ) ; (2.9)

we will assume however that there are N → ∞ possible instances of x_i, and therefore it is not feasible to optimize f by exhaustive search. The Simulated Annealing algorithm will compute the Metropolis algorithm with probability

p = e^{(f_i - f_j)/T_k} , (2.10)

where j labels the state proposed from i

f_j = f(x_j) = f( x_1^{(j)}, x_2^{(j)}, ..., x_p^{(j)}, ..., x_P^{(j)} ) . (2.11)

Notice that the product k_B T is replaced by the k-th temperature T_k from the cooling schedule, which acts as a stochastic control parameter.
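The acceptance rule of Eqs. 2.2 and 2.10 can be sketched as follows; an illustrative sketch whose names are our own:

```python
import math
import random

def metropolis_accept(f_old, f_new, T, rng=random):
    """Metropolis criterion: always accept a downhill move; accept an
    uphill move with probability e^{(f_old - f_new)/T} (Eq. 2.10)."""
    if f_new <= f_old:
        return True
    return rng.random() < math.exp((f_old - f_new) / T)
```

At high T almost every move is accepted (the random walk across all states); as T decreases, uphill moves become exponentially rare.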
This process is carried out m_k times for each of the temperatures of the cooling schedule. It can be shown that for

T_{k+1} = T_k - ΔT , ΔT → 0 , (2.12)

and having achieved real thermal equilibrium, thus m_k → ∞ at each T_k, the algorithm always finishes on a global minimum (this is, the smallest value that the function takes at a point within its entire domain) [Aarts and Korst, 1989]. Since it is computationally exhausting to fulfill such conditions, the only statement that one can make with certainty is that a good cooling schedule will lead to a minimum that is close to the global one.

2.3 The Boltzmann Machine as a Neural Network

The Simulated Annealing algorithm can also be defined as a parallel algorithm [Aarts and Korst, 1989, Younes, 1994], though in this case it is often known as the Boltzmann Machine [Ackley et al., 1985]. This is actually a stochastic neural network model whose dynamics are SA based. We will not detail how the BM is used as a parallel optimization tool -modeling a given task to be suitable for parallel BM processing is usually hard [Aarts and Korst, 1987, Koenig et al., 1992, Oyama, 1993]- but will rather center on its behavior as a neural network. The first part of this section describes the topology of the BM and the notation that has been used in this work. The dynamics are detailed next according to the two main algorithms that are used to simulate the behavior of the model: the Simulated Annealing and Mean Field (MF) algorithms. The first one is used when one is interested in reproducing a given probability distribution with the neural network, because it can be used to attain a statistically exact estimation of its behavior. The Mean Field algorithm, on the other hand, is an inexact though much faster approach.
This section concludes with the introduction of the BM model known as the high order Boltzmann Machine, whose weights may connect more than two units, up to the whole network.

2.3.1 Topology of the BM

Standard units from a BM are commonly referred to as S_i; it is commonly accepted that such units may take either two-valued {-1, +1} [Hertz et al., 1991] or standard binary {0, 1} [Freeman and Skapura, 1993] values. Though there are researchers who have developed four state complex units [Rager, 1992], quasi continuous [Lin and Lee, 1995] or even continuous valued neurons [Beiu et al., 1992, Parra and Deco, 1993], we will stick to the standard S_i ∈ {-1, +1} definition. These units are distributed on a three layer recurrent topology, with input, hidden or output neurons that may be connected with no restrictions; the notation shown in Fig. 2.1 will be used throughout this work to represent them.

Figure 2.1: Notation for input (a), hidden (b) and output (c) units.

There are two possible structures for any BM: the Termination BM or the Input-Output BM, which are depicted in Fig. 2.2. The Termination BM is a model whose input units are also used as outputs, with the same topology as the Hopfield model with added hidden units. The Input-Output BM is a three layer structure with separate input, hidden and output layers. In this latter case, input units are assigned a value which cannot change until the final output of the neural network is calculated; when a unit in a BM is not allowed to change its state it is commonly referred to as being a clamped unit.

Figure 2.2: Termination BM (a) and Input-Output BM (b).

Each pair of units S_i, S_j is connected by a symmetric weight w_{ij} = w_{ji}; this is commonly depicted as a line linking both neurons. A full set of weights from a BM is numerically represented by a real-valued symmetric matrix with zero diagonal -units cannot receive feedback from themselves. It is possible for the neural network to have all its units connected to bias terms. These are written down as h_i, and they are represented as a short line starting from their respective units S_i. However, since we will be using densely connected BM models throughout this work, we will not use this notation. Weights linking units will be depicted as square lines that are separated from the units. In this sense, bias terms will also be considered as weights and will thereafter be represented as short lines that do not begin at the unit. This notation is depicted in Fig. 2.3, where two biased, connected hidden units S_i, S_j are shown. Notice that the label for each connection is written down at its side.

Figure 2.3: Two hidden units linked by weight w_{ij}, and biases h_i, h_j.

We finally show a two input neural network with a hidden and an output unit in Fig. 2.4. The input units have been labeled as S_{i_1} and S_{i_2}, while the hidden unit is referred to as S_h and the output neuron as S_o. The weights linking the units are depicted in the same figure; notice that there are no connections among the input units: since they are not allowed to change their state, these values are not needed.

Figure 2.4: Two input units neural network with one hidden unit and an output unit.

2.3.2 Dynamics and algorithm for a BM

The dynamics of the Boltzmann Machine can be described in terms of an ergodic physical system. In this sense, the SA algorithm is used to simulate its behavior, considering that the units of the neural network provide the orientation of the electrons of the system.
The energy functional of the system is that of a real physical one

E = - \frac{1}{2} \sum_{i,j} w_{ij} S_i S_j - \sum_i h_i S_i ; (2.13)

this model is usually referred to as the Ising model [Hazewinkel and Vinogradov, 1995] from statistical physics [Itzykson and Drouffe, 1991]. This expression depends on the units, weights and biases of the neural network; it is the same functional as that of the Hopfield model. When the BM has to compute any input vector γ, the input units are clamped to its values; they are unable to change their state until the simulation ends. The energy functional is then evaluated by using the SA algorithm, though it is not intended to achieve a global optimization solution: the cooling schedule is rather designed to achieve thermal equilibrium at each temperature. As a result, the states of the units change according to a stochastic dynamics until reaching a final state α for the output units, a state β for the hidden layer and the already known clamped state γ for the input neurons, thus reaching a state {α, β | Inputs = γ} that results in an energy value E_{α,β|γ}. The probability of finding the neural network in such a state is given by the Boltzmann probability distribution

p(α, β | Inputs = γ) = \frac{e^{-E_{α,β|γ}/T_K}}{Z_γ} , (2.14)

where T_K is the final temperature of the cooling schedule and Z_γ is the partition function with the input units clamped at a vector γ; this is actually a normalization constant which sums over all feasible energetic states

Z_γ = \sum_{μ,ν} e^{-E_{μ,ν|γ}/T_K} . (2.15)

In the following, and for the sake of a clearer notation, we will define

p(α | γ) = p(α | Inputs = γ) , (2.16)

p(α, β | γ) = p(α, β | Inputs = γ) , (2.17)

as conditioned probability distributions. Notice that the marginal probability distribution definition is used to relate p(α) with p(α, β)

p(α) = \sum_β p(α, β) , (2.18)

where the sum runs over all feasible combinations of states that the hidden units may take.
We now describe the algorithm for a BM with n_i input, n_h hidden and n_o output units. The process that is used to simulate the behavior of the BM can be described through the following steps [Freeman and Skapura, 1993]:

1. Set a given state γ to the input units.

2. Set the first temperature T_1 from the cooling schedule.

3. Select a random hidden or output unit S_i. This unit will change its state to -S_i according to the SA probability

p(S_i → -S_i) = e^{(E_{α,β|γ}(S_i) - E_{α,β|γ}(-S_i))/T_1} , (2.19)

where E_{α,β|γ}(S_i) and E_{α,β|γ}(-S_i) correspond to E_{α,β|γ} when evaluated at S_i and -S_i, respectively. This step is carried out n_h + n_o times.

4. Carry out the previous step m_1 times. Notice that an iteration of the cooling schedule is accounted for when all the units of the neural network that are not fixed at a certain value have had the opportunity of being selected.

5. Repeat this process for each temperature T_k from the cooling schedule, until reaching the final temperature T_K.

Since the neural network is expected to achieve thermal equilibrium, it is able to reproduce a probability distribution. The corresponding probability distribution function (p.d.f.) has to be estimated by carrying out enough iterations of the previous algorithm, because a single run is only a reaction to a given input instance. Since the system is in a thermal equilibrium situation, one can estimate the p.d.f. by following these steps:

6. Repeat step 3 at temperature T_K as many times as desired and store how many times a given state has been selected.

7. Estimate the value of the probability distribution function according to the previous results.

This probability distribution reproducing ability can be used as a probability estimation tool [Kappen, 1993, Thathachar and Arvind, 1999], in the same sense that non-stochastic neural networks can extrapolate functions.
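The stochastic update of step 3 can be sketched as follows; a minimal illustration for a {-1, +1} valued network with zero-diagonal weights, whose names (and the use of the local energy difference) are our own:

```python
import math
import random
import numpy as np

def bm_sweep(S, w, h, T, clamped, rng=random):
    """One sweep of step 3: propose flipping each non-clamped unit S_i -> -S_i
    and accept with probability e^{(E(S_i) - E(-S_i))/T} (Eq. 2.19)."""
    for i in range(len(S)):
        if i in clamped:
            continue
        # Energy change of the flip under Eq. 2.13 (w has zero diagonal):
        # E(-S_i) - E(S_i) = 2 S_i (sum_j w_ij S_j + h_i)
        dE = 2.0 * S[i] * (w[i] @ S + h[i])
        if dE <= 0.0 or rng.random() < math.exp(-dE / T):
            S[i] = -S[i]
    return S
```

Repeating such sweeps at the final temperature T_K and histogramming the visited states gives the p.d.f. estimation of steps 6 and 7.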
Notice though that there are two probability values that the neural network is unable to reproduce, namely 0 and 1, as those would require E_{α,β|γ} → ±∞. Since it is not possible to reach such values for the energy functional, a Boltzmann Machine cannot be asked to learn exact 0 nor 1 probability values, though there are techniques to approximate such patterns while learning is carried out [Hertz et al., 1991].

2.3.3 The mean field equations

The analysis of the BM dynamics can however become complex when dealing with a high number of neurons. The behavior of the multiple units of the neural network can be approximated by using the mean field equations [Amit, 1989], where the BM is considered a physical system whose units are real electrons. This model then uses an interaction term w_{ij} between each pair of units; their orientation is written down as S_i ∈ {-1, +1}, and an external influence is set up as h_i; these are the same terms as the biases of the neural network. The mean field equations provide the mean value of the neurons

⟨S_i⟩ = tanh\left( \sum_{j=1}^{N} \frac{w_{ij}}{T} ⟨S_j⟩ + \frac{h_i}{T} \right) ,

at a certain T value, which is the last temperature of the cooling schedule. This equation is used to generate a coupled system of equations that is solved by the fixed point iteration algorithm [Press et al., 1993]. Notice that the result is always the same [Itzykson and Drouffe, 1991] for a given fixed set of weights and biases {w_{ij}}, {h_i}. Due to this property, this model is often referred to as the deterministic Boltzmann Machine [Kappen, 1995]. The complete probability distribution is then computed by approximating the correlations of the system