Applications of Information Geometry
Applications covered:
• Clustering based on divergence: cluster centers
• Stochastic reasoning: belief propagation in graphical models
• Support vector machines
• Bayesian framework and restricted Boltzmann machines
• Natural gradient in multilayer perceptron learning
• Independent component analysis
• Sparse signal analysis and Minkovskian gradient
• Convex optimization

Clustering of Points
Given points $D = \{\theta_1, \theta_2, \theta_3, \ldots, \theta_N\}$, clustering assigns each point to a cluster and represents each cluster by its center.
Center of a cluster $C = \{\theta_1, \ldots, \theta_m\}$:
$\theta^* = \arg\min_{\theta} \sum_{\theta_i \in C} D[\theta_i : \theta]$
$\eta^*$ is simply the arithmetic average in the dual coordinate system:
$D[\theta_i : \theta] = \varphi(\eta_i) + \psi(\theta) - \theta \cdot \eta_i$
$\frac{\partial}{\partial\theta} \sum_i D[\theta_i : \theta] = m\,\eta(\theta) - \sum_i \eta_i = 0 \;\Rightarrow\; \eta^* = \frac{1}{m} \sum_i \eta_i$

k-means: clustering algorithm
1. Choose k cluster centers.
2. Classify the points of D into clusters, assigning each point to its nearest center in divergence.
3. Calculate new cluster centers and repeat from step 2 (a sketch follows below).
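A minimal sketch of the loop above, assuming the squared Euclidean distance as the divergence (for which the dual-coordinate average is just the arithmetic mean); function and variable names are illustrative:

import numpy as np

def bregman_kmeans(points, k, n_iter=100, seed=0):
    # k-means with divergence D[p : q] = |p - q|^2 (squared Euclidean),
    # for which the dual-coordinate average is the arithmetic mean.
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to the nearest center in divergence.
        d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d.argmin(axis=1)
        # Step 3: recompute each center as the average of its cluster.
        new = np.array([points[labels == j].mean(axis=0)
                        if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels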
Voronoi Diagram: Boundaries of Clusters
The boundary between two clusters is the set of points $\theta$ with $D[\theta : \theta_1^*] = D[\theta : \theta_2^*]$.
Total Bregman Divergence (Vemuri)
$\mathrm{tBD}(p : q) = \dfrac{\varphi(p) - \varphi(q) - \nabla\varphi(q) \cdot (p - q)}{\sqrt{1 + |\nabla\varphi(q)|^2}}$
t-center of C
$\eta^* = \dfrac{\sum_i w_i \eta_i}{\sum_i w_i}, \qquad w_i = \dfrac{1}{\sqrt{1 + |\nabla\varphi(\theta_i)|^2}}$
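A minimal sketch of the t-center in the Euclidean case $\varphi(x) = \frac{1}{2}|x|^2$, where $\nabla\varphi(x) = x$; names and values are illustrative:

import numpy as np

def t_center(points):
    # t-center for phi(x) = |x|^2 / 2, where grad phi(x) = x.
    # Outliers with large |x| get small weights, hence robustness.
    w = 1.0 / np.sqrt(1.0 + (points ** 2).sum(axis=1))  # w_i = 1/sqrt(1+|grad phi|^2)
    return (w[:, None] * points).sum(axis=0) / w.sum()

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [50.0, 50.0]])  # last is an outlier
print(t_center(pts), pts.mean(axis=0))  # the t-center stays near the inliers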
Conformal Change of Divergence
$\tilde{D}[p : q] = \sigma(q)\, D[p : q]$
$\tilde{g}_{ij}(p) = \sigma(p)\, g_{ij}(p)$
$\tilde{T}_{ijk} = \sigma\,( T_{ijk} + \partial_k s\, g_{ij} + \partial_j s\, g_{ik} + \partial_i s\, g_{jk} ), \qquad s = \log\sigma$

t-center is robust
Add an outlier $y$ to the data $\{x_1, \ldots, x_n\}$; the estimator shifts by the influence of $y$:
$\eta^*(x_1, \ldots, x_n; y) = \eta^* + \frac{1}{n} z(y)$, where $z(y)$ is the influence function.
The estimator is robust when $|z(y)| \le c$ stays bounded as $|y| \to \infty$.
The t-center minimizes the weighted sum $\frac{1}{N} \left( \sum_i w_i D[x_i : c] + w_y D[y : c] \right)$, giving
$z(y) = G^{-1} w(y)\,(y - \eta^*)$.
Robust: $z$ is bounded, since the weight $w(y)$ shrinks as $|y|$ grows.
Euclidean case $f(x) = \frac{1}{2}|x|^2$: $w(y) = \frac{1}{\sqrt{1 + |y|^2}}$, so $|w(y)\,y|$ remains bounded.
MPEG7 Database
• Great intraclass variability and small interclass dissimilarity.
• Shape representation: first clustering, then retrieval.

Other TBD Applications
Diffusion tensor imaging (DTI) analysis [Vemuri]
• Interpolation
• Segmentation
Baba C. Vemuri, Meizhu Liu, Shun-ichi Amari and Frank Nielsen, "Total Bregman Divergence and its Applications to DTI Analysis", IEEE TMI, to appear.

Information Geometry of Stochastic Reasoning
Belief Propagation in Graphical Model
• Shun-ichi Amari (RIKEN BSI)
• Shiro Ikeda (Inst. Statist. Math.)
• Toshiyuki Tanaka (Kyoto U.)

Stochastic Reasoning: Graphical Model
$p(x, y, z, r, s) \rightarrow p(x, y, z \mid r, s), \qquad x, y, z, \ldots \in \{-1, 1\}$
Stochastic reasoning: compute $q(x_1, x_2, x_3, \ldots \mid \text{observation})$.
$x = (x_1, x_2, x_3, \ldots), \qquad x_i \in \{-1, 1\}$
$\hat{x} = \arg\max_x q(x_1, x_2, x_3, \ldots)$ : maximum likelihood estimator
$\hat{x}_i = \mathrm{sgn}\, E[x_i]$ : least bit-error-rate estimator

Mean Value and Marginalization: Projection to Independent Distributions
$q_0(x) = q_1(x_1)\, q_2(x_2) \cdots q_n(x_n)$
$q_i(x_i) = \sum_{x_j,\, j \neq i} q(x_1, \ldots, x_n)$ (marginal over all other variables), $\quad i = 1, \ldots, n$
$E_q[x] = E_{q_0}[x]$

Cliques:
$q(x) = \exp\left\{ \sum_{r=1}^{L} c_r(x) + \sum_i h_i x_i - \psi \right\}$, where each $c_r(x)$ depends on a clique of variables, $x_i \in \{-1, 1\}$.
Pairwise case: $q(x) = \exp\left\{ \sum_{i<j} w_{ij} x_i x_j + \sum_i h_i x_i - \psi \right\}$
Boltzmann machines, spin glasses, neural networks, turbo codes, LDPC codes.

Computationally Difficult
Computing $E_q[x]$ exactly requires a sum over all $2^n$ states.
Approximations: mean-field approximation, belief propagation, tree propagation, CCCP (convex-concave procedure).

Information Geometry of Mean Field Approximation
$D[q : p] = \sum_x q(x) \log \frac{q(x)}{p(x)}$
• m-projection
• e-projection
$M_0 = \left\{ p(x) = \prod_i p_i(x_i) \right\}$ : independent distributions
m-projection: $q_0 = \arg\min_{p \in M_0} D[q : p]$
e-projection: $q_0 = \arg\min_{p \in M_0} D[p : q]$

The m-projection keeps the expectation of $x$:
the m-projection $p(x, \theta^*) \in M_0$ of $q(x)$ satisfies, for $t(x) = x$,
$\sum_x t(x) \{ q(x) - p^*(x) \} = 0, \qquad E_q[x] = E_{p^*}[x]$
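A minimal brute-force check of this property on a small Boltzmann machine: the m-projection onto $M_0$ is the product of marginals, and it preserves each $E_q[x_i]$. Sizes and values are illustrative:

import itertools
import numpy as np

n = 3
rng = np.random.default_rng(0)
W = np.triu(rng.normal(size=(n, n)), 1)         # pairwise weights w_ij, i < j
h = rng.normal(size=n)

states = np.array(list(itertools.product([-1, 1], repeat=n)))
energy = np.einsum('si,ij,sj->s', states, W, states) + states @ h
q = np.exp(energy); q /= q.sum()                # q(x) ~ exp{sum w_ij x_i x_j + sum h_i x_i}

m = q @ states                                  # E_q[x_i] for each unit
# m-projection onto M0: the independent distribution with the same means.
p0 = np.prod((1 + states * m) / 2, axis=1)      # prod_i p_i(x_i), p_i(+-1) = (1 +- m_i)/2
print(np.allclose(p0 @ states, m))              # True: expectations preserved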
Information Geometry
[Figure: the clique manifolds $M_r$, $M_{r'}$ and the independent submanifold $M_0$]
$q(x) = \exp\left\{ \sum_r c_r(x) + h \cdot x - \psi \right\}$
$M_0 = \left\{ p_0(x, \theta) = \exp\{ \theta \cdot x - \psi_0 \} \right\}$
$M_r = \left\{ p_r(x, \zeta_r) = \exp\{ c_r(x) + \zeta_r \cdot x - \psi_r \} \right\}, \qquad r = 1, \ldots, L$

Belief Propagation Algorithm
$p_r(x, \zeta_r) = \exp\{ c_r(x) + \zeta_r \cdot x - \psi_r \}$ : belief for clique $c_r(x)$
1. m-project $p_r(x, \zeta_r^t)$ to $M_0$, obtaining $p_0(x, \theta_r^t)$; the message of clique $r$ is $\xi_r^{t+1} = \theta_r^t - \zeta_r^t$.
2. Update $\zeta_r^{t+1} = \sum_{r' \neq r} \xi_{r'}^{t+1}$ and $\theta^{t+1} = \sum_r \xi_r^{t+1}$.
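As a concrete instance of these updates, here is a minimal sum-product sketch on a small binary chain (a tree, where BP is exact), checked against brute-force marginals; all names and values are illustrative:

import itertools
import numpy as np

# Sum-product BP on a binary chain x0 - x1 - x2,
# q(x) ~ exp{sum w_ij x_i x_j + sum h_i x_i}. On a tree the beliefs
# must match the brute-force marginals exactly.
vals = np.array([-1, 1])
h = np.array([0.3, -0.2, 0.5])
w = {(0, 1): 0.7, (1, 2): -0.4}                 # chain edges
edges = list(w) + [(j, i) for i, j in w]        # directed messages
nbrs = {0: [1], 1: [0, 2], 2: [1]}

msg = {e: np.ones(2) for e in edges}            # m_{i->j}(x_j), init uniform
for _ in range(10):                             # a few sweeps suffice on a tree
    for (i, j) in edges:
        wij = w.get((i, j), w.get((j, i)))
        msgs = [msg[(k, i)] for k in nbrs[i] if k != j]
        incoming = np.prod(msgs, axis=0) if msgs else np.ones(2)
        # m_{i->j}(x_j) = sum_{x_i} exp(h_i x_i + w_ij x_i x_j) prod_k m_{k->i}(x_i)
        new = np.array([(np.exp(h[i] * vals + wij * vals * xj) * incoming).sum()
                        for xj in vals])
        msg[(i, j)] = new / new.sum()           # normalize for stability

beliefs = [np.exp(h[i] * vals) * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
           for i in range(3)]
beliefs = [b / b.sum() for b in beliefs]

states = np.array(list(itertools.product(vals, repeat=3)))
logq = states @ h + sum(w[e] * states[:, e[0]] * states[:, e[1]] for e in w)
q = np.exp(logq); q /= q.sum()
for i in range(3):
    exact = np.array([q[states[:, i] == v].sum() for v in vals])
    print(np.allclose(beliefs[i], exact))       # True on a tree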
Equilibrium of BP $(\theta^*, \xi_r^*)$
1) m-condition: $p_0(x, \theta^*)$ is the m-projection of every belief $p_r(x, \zeta_r^*)$ to $M_0$; they lie in a common m-flat submanifold (all expectations of $x$ agree).
2) e-condition: $\theta^* = \sum_r \xi_r^*$ with $\zeta_r^* = \theta^* - \xi_r^*$; $q$, the beliefs $p_r$ and $p_0$ lie in a common e-flat submanifold.

Belief propagation: the e-condition holds at every step of the iteration
$(\theta; \xi_1, \xi_2, \ldots, \xi_L) \rightarrow (\theta'; \xi_1', \xi_2', \ldots, \xi_L')$,
and the m-condition is satisfied at equilibrium.
CCCP: the m-condition holds at every step, $\theta^t : E_{p_0(x, \theta)}[x] = E_{p_r(x, \zeta_r)}[x]$, and the e-condition is satisfied at equilibrium.

Convex-Concave Computational Procedure (CCCP): A. Yuille
$F(\theta) = F_1(\theta) - F_2(\theta)$, both $F_1$ and $F_2$ convex
Update: $\nabla F_1(\theta^{t+1}) = \nabla F_2(\theta^t)$
Elimination of double loops (a sketch follows below).
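A minimal sketch with an illustrative decomposition $F(\theta) = \theta^4/4 - \theta^2/2$ (both parts convex), where each CCCP step $\nabla F_1(\theta^{t+1}) = \nabla F_2(\theta^t)$ has the closed form $\theta^{t+1} = (\theta^t)^{1/3}$:

import numpy as np

# CCCP on F(t) = F1(t) - F2(t) with F1(t) = t**4 / 4 and F2(t) = t**2 / 2.
# The update solves grad F1(t_next) = grad F2(t_curr):
#   t_next**3 = t_curr  =>  t_next = cbrt(t_curr).
theta = 0.2
for step in range(30):
    theta = np.cbrt(theta)          # each step monotonically decreases F
print(theta)                        # converges to 1.0, where grad F = 0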
Geometry of Support Vector Machines: Modification of Kernel

Simple perceptron: decision surface $f(x) = w \cdot x + b = 0$
Minimize $\frac{1}{2}|w|^2$ subject to $y_i (w \cdot x_i + b) \geq 1$ (the margin is $d = \frac{|w \cdot x + b|}{|w|}$, with $|w| = 1$ normalization).
$L(w, b, \alpha) = \frac{1}{2}|w|^2 - \sum_i \alpha_i \{ y_i (w \cdot x_i + b) - 1 \}$
$w = \sum_i \alpha_i y_i x_i$; the $x_i$ with $\alpha_i > 0$ are the support vectors.
$f(x, w) = \sum_i \alpha_i y_i\, x_i \cdot x + b$ (a numeric check follows below)
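A minimal numeric check of the support-vector expansion using scikit-learn's SVC (a tooling assumption, not part of the slides); `dual_coef_` holds $\alpha_i y_i$ for the support vectors:

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(20, 2)),
               rng.normal(2.0, 1.0, size=(20, 2))])
y = np.array([-1] * 20 + [1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)   # large C approximates a hard margin
sv = X[clf.support_]                          # support vectors: alpha_i > 0
# f(x) = sum_i alpha_i y_i (x_i . x) + b; dual_coef_ stores alpha_i y_i.
f = clf.dual_coef_ @ (sv @ X.T) + clf.intercept_
print(np.allclose(f.ravel(), clf.decision_function(X)))  # True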
Embedding in a Higher-Dimensional Space
$x \rightarrow z = \varphi(x), \qquad f(x) = w \cdot \varphi(x) + b$
$z$ may be infinite-dimensional; $K(x, x') = \varphi(x) \cdot \varphi(x') = \sum_i \lambda_i \varphi_i(x)\, \varphi_i(x')$ (Mercer expansion).

Kernel trick: only inner products are needed,
$f(x, w) = \sum_i \alpha_i y_i K(x_i, x) + b$

Induced Riemannian metric:
$ds^2 = |z(x + dx) - z(x)|^2 = g_{ij}(x)\, dx^i dx^j$
$g_{ij}(x) = \partial_i \varphi(x) \cdot \partial_j \varphi(x) = \left. \frac{\partial^2}{\partial x^i\, \partial x'^j} K(x, x') \right|_{x' = x}$

Conformal Transformation of Kernel
$\tilde{K}(x, x') = \sigma(x)\, \sigma(x')\, K(x, x')$, with e.g. $\sigma(x) = \exp\{-\kappa f(x)^2\}$ or $\sigma(x) = \sum_i \exp\{-\kappa_i |x - x_i^*|^2\}$ over support vectors $x_i^*$
$\tilde{g}_{ij}(x) = \{\sigma(x)\}^2 g_{ij}(x) + \partial_i \sigma(x)\, \partial_j \sigma(x)$
This magnifies the metric near the decision boundary (see the sketch below).
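A minimal sketch of the conformal kernel modification, assuming an RBF base kernel and $\sigma(x) = \sum_i \exp\{-\kappa |x - x_i^*|^2\}$ centered at the support vectors; all names and parameters are illustrative:

import numpy as np

def rbf(X, Y, gamma=1.0):
    # Base kernel K(x, x') = exp(-gamma |x - x'|^2).
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def conformal_kernel(X, Y, sv, kappa=1.0, gamma=1.0):
    # K~(x, x') = sigma(x) sigma(x') K(x, x'), with
    # sigma(x) = sum_i exp(-kappa |x - sv_i|^2): magnified near the support vectors.
    def sigma(Z):
        d2 = ((Z[:, None, :] - sv[None, :, :]) ** 2).sum(-1)
        return np.exp(-kappa * d2).sum(axis=1)
    return sigma(X)[:, None] * sigma(Y)[None, :] * rbf(X, Y, gamma)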
Basic Principles of Unsupervised and Supervised Learning: Toward Deep Learning
Shun-ichi Amari (RIKEN Brain Science Institute)
Collaborators: R. Karakida, M. Okada (U. Tokyo)

Self-Organization + Supervised Learning → Deep Learning
• RBM: restricted Boltzmann machine
• Auto-encoder, recurrent net
• Dropout
• Contrastive divergence
• Boltzmann machine / RBM (restricted Boltzmann machine)

RBM: Simple Hebbian Self-Organization
• Self-organization of hidden neurons
• Interaction of hidden neurons

Bayesian Duality in Exponential Family
Data $x$ ↔ parameter $\theta$ (higher-order concepts)
Curved exponential family: RBM
Hidden input $u = Wv$; visible input $\tilde{u} = hW$

Gaussian Boltzmann Machine: Equilibrium Solution
General Solution
• $W$ is diagonalized by singular value decomposition.
• One can choose $m$ ($\leq k$) eigenvalues to form a solution.

Stable Solution: the case $m = k$

Contrastive Divergence
• RBM: visible layer $v$, hidden layer $h$
• 2-layered probabilistic neural network
• No connections within layers
• Connection weight matrix $W$
• How to train an RBM: sampling
• Maximum likelihood (ML) learning is hard.
[Figure: Gibbs sampling chain from the input distribution to the equilibrium distribution]
Many iterations of Gibbs sampling demand too much computational time.

Contrastive Divergence Solution: Two Manifolds
Geometry of CD_n (contrastive divergence)
Bernoulli-Gaussian RBM
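A minimal sketch of CD-1 for a Bernoulli RBM (0/1 units for simplicity): one Gibbs sweep replaces the equilibrium statistics that exact ML learning would need. Sizes, data, and learning rate are illustrative:

import numpy as np

rng = np.random.default_rng(0)
n_v, n_h, lr = 6, 4, 0.1
W = 0.01 * rng.normal(size=(n_v, n_h))          # weights (biases omitted for brevity)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0):
    # One CD-1 step: a single Gibbs sweep v0 -> h0 -> v1 -> h1
    # replaces the intractable equilibrium statistics of ML learning.
    ph0 = sigmoid(v0 @ W)                        # p(h = 1 | v0)
    h0 = (rng.random(n_h) < ph0).astype(float)   # sample hidden units
    pv1 = sigmoid(h0 @ W.T)                      # reconstruct visible units
    v1 = (rng.random(n_v) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W)
    # Positive phase (data) minus negative phase (one-step reconstruction).
    return lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

data = (rng.random((100, n_v)) < 0.5).astype(float)  # toy binary data
for epoch in range(5):
    for v in data:
        W += cd1_update(v)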
ICA (R. Karakida): Simulation
Number of neurons: N = M = 2, σ = 1/2.
[Figure: source distributions p(s), mixing, input distribution, CD solution vs. ICA solution, uniform output distribution]