
The Helmholtz machine revisited

Danilo Jimenez Rezende, BMI-EPFL

November 8, 2012


1 Introduction

2 Variational Approximation

3 Relevant special cases

4 Learning with non-factorized qs

5 Extending the model over time

6 Final remarks

Helmholtz machines

Helmholtz machines [Dayan et al., 1995, Dayan and Hinton, 1996, Dayan, 2000] are directed graphical models with a layered structure X_L → ... → X_1 → X_0. Complete data likelihood:

    p(X \mid \theta^g) = \prod_{l=0}^{L-1} p(X_l \mid X_{l+1}, \theta^g)\, p(X_L)

Only X_0 is observed.

Types of Helmholtz machines

Binary units:

    p(X_l \mid X_{l+1}) = \mathrm{Bern} \circ \mathrm{sigmoid}(W_l^g X_{l+1} + B_l^g)

Parameters θ^g = {W^g, B^g}.

Smooth units:

    p(X_l \mid X_{l+1}) = \mathcal{N}\big(X_l;\ \tanh(W_l^g X_{l+1} + B_l^g),\ \Sigma_l\big)

Parameters θ^g = {W^g, B^g, Σ^g}.
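To make the generative pass concrete, here is a minimal numpy sketch of ancestral sampling for the binary-unit case; the layer sizes, random parameter values, and uniform top-layer prior are made-up illustrations, not the talk's settings.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def sample_generative(W, B, p_top, rng):
        """Ancestral sample X_L, ..., X_0 from a binary Helmholtz machine.

        W[l], B[l] parametrize p(X_l | X_{l+1}) = Bern(sigmoid(W[l] X_{l+1} + B[l]));
        p_top holds the Bernoulli means of the top layer X_L.
        """
        x = (rng.random(p_top.shape) < p_top).astype(float)   # X_L ~ p(X_L)
        layers = [x]
        for Wl, Bl in zip(reversed(W), reversed(B)):          # l = L-1, ..., 0
            mean = sigmoid(Wl @ x + Bl)
            x = (rng.random(mean.shape) < mean).astype(float)
            layers.append(x)
        return layers[::-1]                                   # [X_0, ..., X_L]

    rng = np.random.default_rng(0)
    sizes = [784, 200, 50]                                    # X_0, X_1, X_2 (hypothetical)
    W = [rng.normal(0, 0.1, (sizes[l], sizes[l + 1])) for l in range(2)]
    B = [np.zeros(sizes[l]) for l in range(2)]
    x0, x1, x2 = sample_generative(W, B, p_top=np.full(sizes[-1], 0.5), rng=rng)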

Goal

For a data set of iid samples y ∈ Data, maximize the data log-likelihood with respect to θ^g:

    \ln p(\mathrm{Data} \mid \theta^g) = \sum_{y \in \mathrm{Data}} \ln p(y \mid \theta^g),

where

    p(y \mid \theta^g) = \int \prod_{l>0} dX_l\ p(y \mid X_1) \prod_{l=1}^{L-1} p(X_l \mid X_{l+1}, \theta^g)\, p(X_L)


The variational trick

Introduce a parametric family of distributions q(X_{l>0} | X_0, θ^r). Then

    \ln p(X_0 \mid \theta^g) = -\overbrace{\big\langle \ln q(X_{l>0} \mid X_0, \theta^r) - \ln p(X) \big\rangle_{q(X_{l>0} \mid X_0, \theta^r)}}^{F(X_0,\, \theta^g,\, \theta^r)} + \mathrm{KL}(q; p),

where

    \mathrm{KL}(q; p) = \Big\langle \ln \frac{q(X_{l>0} \mid X_0, \theta^r)}{p(X_{l>0} \mid X_0, \theta^g)} \Big\rangle_{q(X_{l>0} \mid X_0, \theta^r)}

    \mathrm{KL} \geq 0 \ \Rightarrow\ \ln p(X_0 \mid \theta^g) \geq -F(X_0, \theta^g, \theta^r) \quad \forall\, \theta^r
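To see why the decomposition holds, substitute \ln p(X) = \ln p(X_{l>0} \mid X_0, \theta^g) + \ln p(X_0 \mid \theta^g) and note that the \ln q terms cancel:

    -F(X_0, \theta^g, \theta^r) + \mathrm{KL}(q; p)
      = \big\langle \ln p(X) - \ln q(X_{l>0} \mid X_0, \theta^r) \big\rangle_q
        + \big\langle \ln q(X_{l>0} \mid X_0, \theta^r) - \ln p(X_{l>0} \mid X_0, \theta^g) \big\rangle_q
      = \big\langle \ln p(X) - \ln p(X_{l>0} \mid X_0, \theta^g) \big\rangle_q
      = \ln p(X_0 \mid \theta^g)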


The variational trick

Redefine the learning problem as

    \{\hat\theta^g, \hat\theta^r\} = \arg\min_{\theta^g,\, \theta^r} \sum_{x \in \mathrm{Data}} F(x, \theta^g, \theta^r)

Why?
q can be any distribution with the same support as p.
Choose q so that we can calculate the required expectations in reasonable time.
Allows inference to be solved using standard optimization techniques.


Fully factorized q

    q(X_{l>0} \mid X_0, \theta^r) = \prod_{l>0,\, i} p(X_{l,i} \mid \theta^r_{l,i})

Expectations are analytically tractable for sigmoid/tanh nonlinearities [Frey, 1996, Jordan, 1999].
Yields local message-passing types of algorithms.
Fast convergence.
Bad approximation to multimodal posteriors.


Helmholtz machine's choice of q

Bottom-up graph [Dayan et al., 1995, Dayan and Hinton, 1996, Dayan, 2000]:

    q(X_{l>0} \mid X_0, \theta^r) = \prod_{l=1}^{L} p(X_l \mid X_{l-1}, \theta^r)

Conditioned on X_0.
Cannot solve any expectation analytically.
Resort to Monte Carlo approximations.

Point-estimate of the free energy

    F(X_0, \theta^g, \theta^r) = \big\langle \ln q(X_{l>0} \mid X_0, \theta^r) - \ln p(X) \big\rangle_{q(X_{l>0} \mid X_0, \theta^r)} = \langle \hat F \rangle_{q(X_{l>0} \mid X_0, \theta^r)},

where

    \hat F = \ln q(X_{l>0} \mid X_0, \theta^r) - \ln p(X)
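A minimal numpy sketch of this point estimate for binary units, sampling X_{l>0} from the bottom-up q of the previous slide and scoring both densities; the layer sizes, random parameters, uniform top-layer prior, and helper names (f_hat, log_bern) are hypothetical.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def log_bern(x, p):
        # summed log Bernoulli mass of binary vector x under means p
        return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

    def f_hat(x0, Wr, Br, Wg, Bg, p_top, rng):
        """One-sample estimate F_hat = ln q(X_{l>0}|X_0) - ln p(X)."""
        xs, log_q = [x0], 0.0
        for Wl, Bl in zip(Wr, Br):            # bottom-up sample from q
            p = sigmoid(Wl @ xs[-1] + Bl)
            x = (rng.random(p.shape) < p).astype(float)
            log_q += log_bern(x, p)
            xs.append(x)
        log_p = log_bern(xs[-1], p_top)       # top-down evaluate ln p(X)
        for l in range(len(Wg)):
            log_p += log_bern(xs[l], sigmoid(Wg[l] @ xs[l + 1] + Bg[l]))
        return log_q - log_p

    rng = np.random.default_rng(0)
    sizes = [784, 200, 50]
    Wg = [rng.normal(0, 0.1, (sizes[l], sizes[l + 1])) for l in range(2)]
    Bg = [np.zeros(sizes[l]) for l in range(2)]
    Wr = [rng.normal(0, 0.1, (sizes[l + 1], sizes[l])) for l in range(2)]
    Br = [np.zeros(sizes[l + 1]) for l in range(2)]
    x0 = (rng.random(784) < 0.5).astype(float)
    F = np.mean([f_hat(x0, Wr, Br, Wg, Bg, np.full(50, 0.5), rng) for _ in range(10)])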


REINFORCE

Simple, unbiased, stochastic gradient estimator.

Update for θ^g:

    \delta\theta^g \propto -\nabla_{\theta^g} F(X_0, \theta^g, \theta^r) = \langle \nabla_{\theta^g} \ln p \rangle_q

Update for θ^r: δθ^r ∝ −∇_{θ^r} F(X_0, θ^g, θ^r), with

    \nabla_{\theta^r} F(X_0, \theta^g, \theta^r) = \big\langle (\hat F - b)\, \nabla_{\theta^r} \ln q \big\rangle_q, \quad \text{where } b \approx \langle \hat F \rangle_q

Scales badly:

    \mathrm{Var}\big[(\hat F - b)\, \nabla_{\theta^r} \ln q\big] \approx O(\text{number of hidden nodes})
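A toy numpy sketch of the likelihood-ratio update for a factorized Bernoulli q with logits θ; the stand-in cost f (playing the role of F̂), the sizes, and the learning rate are made up. Increasing H inflates the variance of the estimator, which is the scaling problem above.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    rng = np.random.default_rng(0)
    H = 50                                   # number of hidden units
    theta = np.zeros(H)                      # q(x) = prod_i Bern(x_i; sigmoid(theta_i))
    target = (np.arange(H) % 2).astype(float)
    f = lambda x: np.sum((x - target) ** 2)  # stand-in for F_hat(x)

    K = 100
    for step in range(500):
        p = sigmoid(theta)
        x = (rng.random((K, H)) < p).astype(float)      # samples from q
        fx = np.array([f(xk) for xk in x])
        b = fx.mean()                                    # baseline b ≈ <F_hat>_q
        grad_log_q = x - p                               # ∇_theta ln q(x) for Bernoulli
        grad_F = np.mean((fx - b)[:, None] * grad_log_q, axis=0)
        theta -= 0.1 * grad_F                            # δθ^r ∝ −∇_{θ^r} F

    print(np.round(sigmoid(theta))[:10], target[:10])    # q should concentrate near target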


Wake-Sleep: The wrong gradient that works

Update for θ^g (the "wake" phase): δθ^g ∝ ⟨∇_{θ^g} ln p⟩_q
Update for θ^r (the "sleep" phase): δθ^r ∝ ⟨∇_{θ^r} ln q⟩_p

Why is it wrong? It minimizes KL(p; q) instead of KL(q; p).

Why and when does it work? If for any θ^g there exists a θ^r such that q(X_{l>0} | X_0, θ^r) = p(X_{l>0} | X_0, θ^g), then [Ikeda et al., 1999]

    \arg\min_q \mathrm{KL}(p; q) = \arg\min_q \mathrm{KL}(q; p)
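A minimal numpy sketch of wake-sleep for a single layer of binary units; the stand-in data distribution, sizes, and learning rate are placeholders. Note the sleep phase fits q to samples dreamed from p, which is exactly the KL(p; q) direction criticized above.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def bern(p, rng):
        return (rng.random(p.shape) < p).astype(float)

    rng = np.random.default_rng(0)
    D, H, lr = 20, 10, 0.01
    Wg, bg, c = rng.normal(0, 0.1, (D, H)), np.zeros(D), np.zeros(H)  # generative
    Wr, br = rng.normal(0, 0.1, (H, D)), np.zeros(H)                  # recognition

    for _ in range(1000):
        v = bern(np.full(D, 0.3), rng)            # stand-in data sample

        # Wake phase: h ~ q(h|v), ascend <∇_{θ^g} ln p>_q
        h = bern(sigmoid(Wr @ v + br), rng)
        dv = v - sigmoid(Wg @ h + bg)             # gradient of ln p(v|h) w.r.t. its logits
        Wg += lr * np.outer(dv, h); bg += lr * dv
        c += lr * (h - sigmoid(c))                # gradient of ln p(h) w.r.t. prior logits

        # Sleep phase: (h, v) ~ p, ascend <∇_{θ^r} ln q>_p
        h = bern(sigmoid(c), rng)
        v_dream = bern(sigmoid(Wg @ h + bg), rng)
        dh = h - sigmoid(Wr @ v_dream + br)       # gradient of ln q(h|v) w.r.t. its logits
        Wr += lr * np.outer(dh, v_dream); br += lr * dh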

Wake-Sleep: The wrong gradient that works

[Figure: the generative (top-down) and recognition (bottom-up) networks side by side, each spanning layers X_0 through X_L.]


Summary

REINFORCE is unbiased but scales badly.
Wake-Sleep scales nicely, but it is wrong.
Neither exploits specific properties of exponential-family pdfs.


Why is REINFORCE so bad?

REINFORCE is based on the likelihood-ratio identity:

    \nabla_\theta \langle f(x) \rangle_{p(x)} = \langle f(x)\, \nabla_\theta \ln p(x) \rangle_{p(x)} = \langle (f(x) - b)\, \nabla_\theta \ln p(x) \rangle_{p(x)}

If p(x) = N(x; μ, Σ) (so θ = {μ, Σ}), we can do much better.

Bonnet's theorem:

    \nabla_{\mu_i} \langle f(x) \rangle_{p(x)} = \Big\langle \frac{\partial}{\partial x_i} f(x) \Big\rangle_{p(x)}

Price's theorem:

    \nabla_{\Sigma_{i,j}} \langle f(x) \rangle_{p(x)} = \Big(\frac{1}{2}\Big)^{\delta_{i,j}} \Big\langle \frac{\partial^2}{\partial x_i\, \partial x_j} f(x) \Big\rangle_{p(x)}
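A quick numerical illustration of Bonnet's theorem in one dimension: both estimators below are unbiased for ∇_μ⟨f(x)⟩, but the likelihood-ratio one has far larger variance. The test function and parameter values are arbitrary.

    import numpy as np

    rng = np.random.default_rng(0)
    mu, var = 0.5, 2.0
    f = lambda x: np.sin(x) + x ** 2       # arbitrary smooth test function
    df = lambda x: np.cos(x) + 2 * x

    n = 100000
    x = mu + np.sqrt(var) * rng.standard_normal(n)

    # Likelihood-ratio (REINFORCE) estimate of d<f>/dmu:
    # ∇_mu ln N(x; mu, var) = (x - mu) / var
    lr = f(x) * (x - mu) / var
    # Bonnet estimate: <f'(x)>
    bonnet = df(x)

    print(lr.mean(), bonnet.mean())   # both ≈ cos(mu) * exp(-var / 2) + 2 * mu
    print(lr.var(), bonnet.var())     # the LR estimator's variance is much larger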


Rules for stochastic backpropagation (SBP)

Let V(y) be some cost function on y, and let \varepsilon_y = \partial V(y)/\partial y and \lambda_y = \partial^2 V(y)/\partial y^2 be its first and second derivatives.

If y = μ(x) (deterministic case), then

    \varepsilon_x = \frac{\partial \mu(x)}{\partial x}\, \varepsilon_y \Big|_{y = \mu(x)}

If y ∼ N(y | μ(x), Σ(x)) (stochastic case), then

    \varepsilon_x = \Big\langle \underbrace{\varepsilon_y\, \frac{\partial \mu(x)}{\partial x}}_{\text{drift}} + \underbrace{\frac{1}{2}\, \lambda_y\, \frac{\partial \Sigma(x)}{\partial x}}_{\text{diffusion}} \Big\rangle_{N(y \mid \mu(x),\, \Sigma(x))}
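A one-dimensional numerical check of the stochastic rule, assuming illustrative choices of μ(x), Σ(x) and V(y); the SBP estimate (drift + diffusion) is compared against a finite difference of the Monte Carlo objective computed with common random numbers.

    import numpy as np

    rng = np.random.default_rng(0)

    mu = lambda x: np.tanh(x)            # mean path (illustrative)
    dmu = lambda x: 1.0 - np.tanh(x) ** 2
    Sig = lambda x: np.exp(x)            # variance path Σ(x) (illustrative)
    dSig = lambda x: np.exp(x)
    V = lambda y: y ** 4 + y             # cost with known derivatives
    eps = lambda y: 4 * y ** 3 + 1       # ε_y = ∂V/∂y
    lam = lambda y: 12 * y ** 2          # λ_y = ∂²V/∂y²

    x, n = 0.3, 200000
    y = mu(x) + np.sqrt(Sig(x)) * rng.standard_normal(n)

    # SBP estimate of d<V(y)>/dx: drift + diffusion
    sbp = np.mean(eps(y) * dmu(x) + 0.5 * lam(y) * dSig(x))

    # Finite-difference check with common random numbers
    z = rng.standard_normal(n)
    g = lambda x: np.mean(V(mu(x) + np.sqrt(Sig(x)) * z))
    fd = (g(x + 1e-4) - g(x - 1e-4)) / 2e-4

    print(sbp, fd)   # the two estimates should agree to a few decimals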

Proposed smooth recognition model

[Figures: the generative and recognition networks over layers X_0 ... X_L; the recognition model outputs a mean μ_l and covariance Σ_l for each layer, with transformations T_1(h, z) attached to the layers. Gradients pass through a deterministic map y = f(x) as (∂f/∂x)^T ∂V/∂y for first derivatives and (∂f/∂x)^T (∂²V/∂y²) (∂f/∂x) + (∂V/∂y)(∂²f/∂x²) for second derivatives.]

Danilo Jimenez Rezende BMI-EPFL The Helmholtz machine revisited backpropagateThroughGenerativeModel() µ εl ← ∇hl F(v, h, z) δ εΣ ← 1  i,j ∇2 F(v, h, z) l 2 hl,i ,hl,j backpropagateThroughRecognitionModel() applyGradients()

Introduction Variational Approximation Relevant special cases Learning with non-factorized qs Extending the model over time Final remarks Learning procedure: algorithm 1

while isLearning() do v ← getNewDataSample() bottomUp()

Danilo Jimenez Rezende BMI-EPFL The Helmholtz machine revisited backpropagateThroughRecognitionModel() applyGradients()

Introduction Variational Approximation Relevant special cases Learning with non-factorized qs Extending the model over time Final remarks Learning procedure: algorithm 1

while isLearning() do v ← getNewDataSample() bottomUp() backpropagateThroughGenerativeModel() µ εl ← ∇hl F(v, h, z) δ εΣ ← 1  i,j ∇2 F(v, h, z) l 2 hl,i ,hl,j

Danilo Jimenez Rezende BMI-EPFL The Helmholtz machine revisited applyGradients()

Introduction Variational Approximation Relevant special cases Learning with non-factorized qs Extending the model over time Final remarks Learning procedure: algorithm 1

while isLearning() do v ← getNewDataSample() bottomUp() backpropagateThroughGenerativeModel() µ εl ← ∇hl F(v, h, z) δ εΣ ← 1  i,j ∇2 F(v, h, z) l 2 hl,i ,hl,j backpropagateThroughRecognitionModel()

Learning procedure: algorithm 1

while isLearning() do
    v ← getNewDataSample()
    bottomUp()
    backpropagateThroughGenerativeModel()
    ε_l^μ ← ∇_{h_l} F(v, h, z)
    ε_l^Σ ← (1/2)^{δ_{i,j}} ∇²_{h_{l,i}, h_{l,j}} F(v, h, z)
    backpropagateThroughRecognitionModel()
    applyGradients()
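A runnable sketch of one pass of this loop for the special case of a single Gaussian latent layer with unit observation noise, where the drift term gives ε^μ and the diffusion term gives ε^Σ in closed form; the model sizes, the exp-parameterized diagonal variances, and the learning rate are illustrative choices, not the talk's multi-layer implementation.

    import numpy as np

    rng = np.random.default_rng(0)
    D, H, lr = 20, 5, 0.01
    Wg, bg = rng.normal(0, 0.1, (D, H)), np.zeros(D)   # p(v|h) = N(Wg h + bg, I), p(h) = N(0, I)
    Wr, br = rng.normal(0, 0.1, (H, D)), np.zeros(H)   # mu(v) = Wr v + br
    s = np.zeros(H)                                    # ln of q's diagonal variances

    for _ in range(2000):
        v = rng.normal(0.0, 1.0, D)                    # stand-in data sample

        # bottomUp(): sample h ~ q(h|v) = N(mu, diag(exp(s)))
        mu = Wr @ v + br
        h = mu + np.exp(0.5 * s) * rng.standard_normal(H)

        # backpropagateThroughGenerativeModel(), with V(h) = -ln p(v, h):
        r = v - Wg @ h - bg                            # residual of p(v|h)
        eps_mu = h - Wg.T @ r                          # ∇_h V (one-sample drift term)
        eps_var = 0.5 * (1.0 + np.einsum('ij,ij->j', Wg, Wg))  # (1/2) ∇²_{h_i,h_i} V

        # generative update: δθ^g ∝ <∇_{θ^g} ln p>_q
        Wg += lr * np.outer(r, h); bg += lr * r

        # backpropagateThroughRecognitionModel() + applyGradients():
        Wr -= lr * np.outer(eps_mu, v); br -= lr * eps_mu
        s -= lr * (eps_var * np.exp(s) - 0.5)          # chain rule through var = exp(s),
                                                       # minus the entropy gradient of q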

Performance comparison on MNIST

Training set: 60000 28x28 grey images; test set: 10000 images.

Model           Parameters                   Log-likelihood
HM              Nh = 200 + 50, Wake-Sleep    -157
HM              Nh = 200 + 50, REINFORCE     -145
Mix. Bernoulli  N = 500                      -137.64
RBM             Nh = 500, CD1                -125.53
HM              Nh = 200 + 50, BP-SBP        -108.35
RBM             Nh = 500, CD3                -105.53
NADE            N = 500                      -88.86
RBM             Nh = 500, CD25               -86.86

RBM, Mix. Bernoulli and NADE log-likelihoods from [Larochelle and Murray, 2011].

Making the model conditional

[Figure: the generative and recognition networks as before, with the layer parameters (W_l, D_l and the recognition weights W_l^r) driven by a conditioning input z.]

Conditioning on a RNN of LSTMs (differentiable memory cells)

[Figure: a differentiable, multiplicative memory cell. The internal state s is maintained by a recurrent connection of fixed weight 1.0, modulated by the forget gate φ; the input gate ι and output gate ω scale the cell input a_c and cell output b_c via multiplicative units.]

Idea suggested (but not implemented) by [Sutskever and Hinton, 2008].
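For reference, a minimal numpy sketch of one step of such a memory cell; the sigmoid gates and tanh cell nonlinearities are a standard choice, and all shapes and initializations here are assumptions for illustration.

    import numpy as np

    def sigmoid(a):
        return 1.0 / (1.0 + np.exp(-a))

    def lstm_cell_step(x, b_prev, s_prev, W, U, bias):
        """One LSTM step. Keys: 'i' (input gate ι), 'f' (forget gate φ),
        'o' (output gate ω), 'c' (cell input a_c)."""
        i = sigmoid(W['i'] @ x + U['i'] @ b_prev + bias['i'])    # input gate ι
        f = sigmoid(W['f'] @ x + U['f'] @ b_prev + bias['f'])    # forget gate φ
        o = sigmoid(W['o'] @ x + U['o'] @ b_prev + bias['o'])    # output gate ω
        a_c = np.tanh(W['c'] @ x + U['c'] @ b_prev + bias['c'])  # cell input
        s = f * s_prev + i * a_c     # state kept by the fixed-weight recurrence
        b = o * np.tanh(s)           # gated cell output b_c
        return b, s

    rng = np.random.default_rng(0)
    nx, nh = 10, 20
    W = {k: rng.normal(0, 0.1, (nh, nx)) for k in 'ifoc'}
    U = {k: rng.normal(0, 0.1, (nh, nh)) for k in 'ifoc'}
    bias = {k: np.zeros(nh) for k in 'ifoc'}
    b, s = np.zeros(nh), np.zeros(nh)
    for x in rng.normal(size=(5, nx)):    # roll the cell over a short sequence
        b, s = lstm_cell_step(x, b, s, W, U, bias)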

Full model

[Figure: at each time step t, an LSTM layer (LSTM_1 ... LSTM_N) carries a state z_t that conditions both the generative and recognition networks (layers h_1 ... h_L with means μ_l and covariances Σ_l) over the observation v_t, with step t−1 feeding step t.]

Learning procedure: algorithm 2

while isLearning() do
    resetGradients()
    for t = 1 → T do
        forwardLSTM(t)
        v_t ← getDataFrame(t)
        bottomUp()
        backpropagateThroughGenerativeModel()
        backpropagateThroughRecognitionModel()
        ε_t^LSTM ← ∇_{z_t} F(v_t, h_t, z_t)
        incrementGradientsOfGenerativeAndRecognitionModels()
    for t = T → 1 do
        LSTM.BPTT(t)
    applyGradients()

Benchmarks: temporal sequences


Final remarks

Performance is similar to RBM + CD3.
Not yet competitive with RBM + CD25.
Exploiting specific properties of the noise and the graph is substantially better than Wake-Sleep and REINFORCE.
How much would we benefit from less constrained covariance matrices?


THANK YOU!

Bibliography

Dayan, P. (2000). Helmholtz machines and wake-sleep learning. In Handbook of Brain Theory and Neural Networks. MIT Press, Cambridge, MA.

Dayan, P. and Hinton, G. E. (1996). Varieties of Helmholtz Machine. Neural Networks, 9(8):1385–1403.

Dayan, P., Hinton, G. E., Neal, R. M., and Zemel, R. S. (1995). The Helmholtz machine. Neural computation, 7(5):889–904.

Frey, B. J. (1996). Variational inference for continuous sigmoidal Bayesian networks. In Sixth International Workshop on Artificial Intelligence and Statistics.

Ikeda, S., Nakahara, H., and Amari, S.-i. (1999). Convergence of the wake-sleep algorithm. Advances in Neural Information Processing Systems 11, pages 239–245.

Jordan, M. I. (1999). An introduction to variational methods for graphical models. Machine Learning, 37:183–233.

Larochelle, H. and Murray, I. (2011). The Neural Autoregressive Distribution Estimator. Journal of Machine Learning Research W&CP, 15:29–37.

Sutskever, I. and Hinton, G. (2008). The recurrent temporal restricted Boltzmann machine. Advances in Neural Information Processing Systems, 21.


Estimating the data log-likelihood

Naive:

    \ln p(X_0) \approx \ln \frac{1}{K} \sum_k p(X_0 \mid X_1^k),

where X_1^k, k = 1 ... K, are samples from p.

Importance sampling:

    \ln p(X_0) \approx \ln \frac{1}{K} \sum_k \exp(-\hat F_k),

where \hat F_k is a point-estimate of F using a sample from q.
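A numerically stable sketch of the importance-sampling estimate: since the \hat F_k can be large, \exp(-\hat F_k) is summed in log space; the toy \hat F_k values are placeholders for the output of a point-estimate routine like the f_hat sketch earlier.

    import numpy as np

    def log_lik_is(f_hats):
        """ln p(X_0) ≈ ln (1/K) Σ_k exp(−F̂_k), computed via log-sum-exp."""
        a = -np.asarray(f_hats)
        m = a.max()
        return m + np.log(np.exp(a - m).sum()) - np.log(a.size)

    print(log_lik_is([110.0, 108.5, 112.3]))   # toy F̂_k values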
