The Helmholtz Machine Revisited, EPFL2012

Danilo Jimenez Rezende, BMI-EPFL, November 8, 2012

Outline

1. Introduction
2. Variational Approximation
3. Relevant special cases
4. Learning with non-factorized q's
5. Extending the model over time
6. Final remarks

Introduction

Helmholtz machines

Helmholtz machines [Dayan et al., 1995, Dayan and Hinton, 1996, Dayan, 2000] are directed graphical models with a layered structure $X_L, \dots, X_1, X_0$. The complete data likelihood is

    $p(X \mid \theta^g) = \prod_{l=0}^{L-1} p(X_l \mid X_{l+1}; \theta^g) \, p(X_L)$.

Only $X_0$ is observed.

Types of Helmholtz machines

Binary units:

    $p(X_l \mid X_{l+1}) = \mathrm{Bern} \circ \mathrm{sigmoid}(W_l^g X_{l+1} + B_l^g)$,

with parameters $\theta^g = \{W^g, B^g\}$.

Smooth units:

    $p(X_l \mid X_{l+1}) = \mathcal{N}\!\left(X_l;\, \tanh(W_l^g X_{l+1} + B_l^g),\, \Sigma_l\right)$,

with parameters $\theta^g = \{W^g, B^g, \Sigma^g\}$.
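The layered generative model can be sampled top-down by ancestral sampling: draw the top layer, then sample each layer conditioned on the one above. A minimal sketch for the binary-unit case, with hypothetical layer sizes and random weights (all names here are illustrative, not from the talk):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def sample_generative(weights, biases):
    """Top-down ancestral sampling from a binary Helmholtz machine.

    weights[l] and biases[l] parameterize p(X_l | X_{l+1}); the top
    layer X_L is drawn from independent Bernoulli(0.5) units.
    """
    x = (rng.random(weights[-1].shape[1]) < 0.5).astype(float)  # X_L
    layers = [x]
    for W, b in zip(reversed(weights), reversed(biases)):  # l = L-1, ..., 0
        p = sigmoid(W @ x + b)                 # p(X_l = 1 | X_{l+1})
        x = (rng.random(p.shape[0]) < p).astype(float)
        layers.append(x)
    return layers[::-1]                        # [X_0, X_1, ..., X_L]

# Hypothetical layer sizes: X_0 has 4 units, X_1 has 3, X_2 (top) has 2.
sizes = [4, 3, 2]
weights = [rng.normal(size=(sizes[l], sizes[l + 1])) for l in range(len(sizes) - 1)]
biases = [np.zeros(sizes[l]) for l in range(len(sizes) - 1)]
xs = sample_generative(weights, biases)
```

The smooth-unit variant would differ only in the per-layer sampling step (a Gaussian with a tanh mean instead of a Bernoulli with a sigmoid mean).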
Goal

For a data set of i.i.d. samples $y \in \mathrm{Data}$, maximize the data log-likelihood with respect to $\theta^g$:

    $\ln p(\mathrm{Data} \mid \theta^g) = \sum_{y \in \mathrm{Data}} \ln p(y \mid \theta^g)$,

where

    $p(y \mid \theta^g) = \int \prod_{l>0} dX_l \; p(y \mid X_1) \prod_{l=1}^{L-1} p(X_l \mid X_{l+1}; \theta^g) \, p(X_L)$.

Variational Approximation

The variational trick

Introduce a parametric family of distributions $q(X_{l>0} \mid X_0; \theta^r)$. Then

    $\ln p(X_0 \mid \theta^g) = -F(X_0, \theta^g, \theta^r) + \mathrm{KL}(q \,\|\, p)$,

where the free energy is

    $F(X_0, \theta^g, \theta^r) = \left\langle \ln q(X_{l>0} \mid X_0; \theta^r) - \ln p(X) \right\rangle_{q(X_{l>0} \mid X_0, \theta^r)}$

and

    $\mathrm{KL}(q \,\|\, p) = \left\langle \ln \frac{q(X_{l>0} \mid X_0; \theta^r)}{p(X_{l>0} \mid X_0; \theta^g)} \right\rangle_{q(X_{l>0} \mid X_0, \theta^r)}$.

Since $\mathrm{KL} \geq 0$,

    $\ln p(X_0; \theta^g) \geq -F(X_0; \theta^g, \theta^r) \quad \forall \theta^r$.
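The decomposition $\ln p(X_0) = -F + \mathrm{KL}$ can be checked numerically on the smallest possible model, where every expectation is a two-term sum. The numbers below are hypothetical, chosen only so the identity is easy to enumerate:

```python
import numpy as np

# Smallest possible case: one binary hidden unit X1, one observed binary
# unit X0, so every expectation can be enumerated exactly.
p_x1 = np.array([0.6, 0.4])            # prior p(X1)
p_x0_given_x1 = np.array([[0.9, 0.2],  # rows: X0, columns: X1
                          [0.1, 0.8]])
q = np.array([0.7, 0.3])               # some recognition distribution q(X1 | X0)

x0 = 1
joint = p_x0_given_x1[x0, :] * p_x1    # p(X0 = x0, X1)
log_px0 = np.log(joint.sum())          # exact evidence ln p(X0 = x0)

# Free energy F = <ln q(X1) - ln p(X0 = x0, X1)>_q
F = np.sum(q * (np.log(q) - np.log(joint)))

posterior = joint / joint.sum()        # p(X1 | X0 = x0)
KL = np.sum(q * (np.log(q) - np.log(posterior)))
# ln p(X0) = -F + KL holds exactly, and -F lower-bounds ln p(X0).
```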
Redefine the learning problem as

    $\{\hat{\theta}^g, \hat{\theta}^r\} = \arg\min_{\theta^g, \theta^r} \sum_{x \in \mathrm{Data}} F(x; \theta^g, \theta^r)$.

Why?
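Minimizing $F$ over the recognition parameters alone already does something useful: it drives $q$ toward the true posterior and tightens the bound. A sketch on the same one-hidden-unit toy model (hypothetical numbers; finite-difference gradients stand in for whatever optimizer one would actually use):

```python
import numpy as np

p_x1 = np.array([0.6, 0.4])            # prior p(X1)
lik = np.array([0.1, 0.8])             # p(X0 = 1 | X1)
joint = lik * p_x1                     # p(X0 = 1, X1)
posterior = joint / joint.sum()

def free_energy(logit):
    """F as a function of q's single parameter, q(X1 = 1) = sigmoid(logit)."""
    q1 = 1.0 / (1.0 + np.exp(-logit))
    q = np.array([1.0 - q1, q1])
    return np.sum(q * (np.log(q) - np.log(joint)))

# Plain gradient descent with finite-difference gradients.
logit, eps, lr = 0.0, 1e-5, 1.0
for _ in range(500):
    grad = (free_energy(logit + eps) - free_energy(logit - eps)) / (2 * eps)
    logit -= lr * grad

q1 = 1.0 / (1.0 + np.exp(-logit))
# At the optimum, q1 matches the posterior and -F equals ln p(X0 = 1).
```

This is the sense in which inference becomes "standard optimization": the E-step of EM is replaced by descent on $F$ in $\theta^r$.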
- $q$ can be any distribution with the same support as $p$.
- Choose $q$ so that the expectations can be computed in a reasonable time.
- Inference can then be solved with standard optimization techniques.

Relevant special cases

Fully factorized q

    $q(X_{l>0} \mid X_0; \theta^r) = \prod_{l>0,\, i} q(X_{l,i} \mid \theta^r_{l,i})$

- Expectations are analytically tractable for sigmoid/tanh nonlinearities [Frey, 1996, Jordan, 1999].
- Yields local, message-passing-type algorithms.
- Fast convergence.
- Bad approximation to multimodal posteriors.
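The last point can be made concrete with a brute-force search: for a bimodal target over two binary units, the best fully factorized $q$ collapses onto one mode. The target distribution below is hypothetical, picked only to exhibit the failure:

```python
import numpy as np
from itertools import product

# A bimodal posterior over two binary units: almost all mass
# sits on the states (0, 0) and (1, 1).
eps = 1e-3
p = np.array([[0.5 - eps, eps],
              [eps, 0.5 - eps]])       # p[x1, x2]

def kl_to_p(a, b):
    """KL(q || p) for the factorized q(x1, x2) = Bern(a)(x1) * Bern(b)(x2)."""
    q = np.outer([1.0 - a, a], [1.0 - b, b])
    return np.sum(q * (np.log(q) - np.log(p)))

# Brute-force search over factorized q's.
grid = np.linspace(0.01, 0.99, 99)
kl_best, a_best, b_best = min((kl_to_p(a, b), a, b)
                              for a, b in product(grid, grid))
# The best factorized q locks onto a single mode (a and b both near 0,
# or both near 1) and still pays a sizable KL cost: a product of
# marginals cannot represent the correlation between the two modes.
```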
The Helmholtz machine's choice of q

Bottom-up recognition graph [Dayan et al., 1995, Dayan and Hinton, 1996, Dayan, 2000]:

    $q(X_{l>0} \mid X_0; \theta^r) = \prod_{l=1}^{L} p(X_l \mid X_{l-1}; \theta^r)$

- Conditioned on $X_0$.
- No expectation can be solved analytically.
- Resort to Monte Carlo approximations.
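With a bottom-up recognition network, $F$ has no closed form, but it is easy to estimate by Monte Carlo: sample hidden states from $q$ and average $\ln q - \ln p$. A sketch for a hypothetical one-hidden-layer binary machine with random weights (layer sizes, names, and the observed vector are all illustrative):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

def bern_logp(x, p):
    """Log Bernoulli mass of binary vector x with success probabilities p."""
    return float(np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)))

# Hypothetical machine: X0 has 4 units, X1 has 3.
Wg = rng.normal(size=(4, 3)); bg = np.zeros(4)   # generative p(X0 | X1)
p_top = np.full(3, 0.5)                          # top-layer prior p(X1)
Wr = rng.normal(size=(3, 4)); br = np.zeros(3)   # recognition q(X1 | X0)

def mc_free_energy(x0, n_samples=2000):
    """Monte Carlo estimate of F(X0) = <ln q(X1 | X0) - ln p(X0, X1)>_q."""
    q = sigmoid(Wr @ x0 + br)
    total = 0.0
    for _ in range(n_samples):
        x1 = (rng.random(3) < q).astype(float)   # one bottom-up sample
        ln_q = bern_logp(x1, q)
        ln_p = bern_logp(x1, p_top) + bern_logp(x0, sigmoid(Wg @ x1 + bg))
        total += ln_q - ln_p
    return total / n_samples

x0 = np.array([1.0, 0.0, 1.0, 0.0])
F_hat = mc_free_energy(x0)

# This toy model is small enough to check against the exact evidence,
# obtained by enumerating the 8 hidden states.
logs = [bern_logp(np.array(s, float), p_top)
        + bern_logp(x0, sigmoid(Wg @ np.array(s, float) + bg))
        for s in product([0, 1], repeat=3)]
log_px0 = float(np.log(np.sum(np.exp(logs))))
```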
