<<

Information Geometry and Its Applications Shun‐ichi Amari RIKEN Brain Science Institute

1. Function and Dually Flat Riemannian Structure 2.Invariant Geometry on Manifold of Probability Distributions 3.Geometry and Statistical Inference semi‐parametrics 4. Applications to Machine Learning and Signal Processing Geometry

-- Manifolds of Probability Distributions Mp {()}x Information Geometry

Systems Theory

Statistics Neural Networks

Combinatorics Information Sciences Physics Math. AI Vision

Riemannian Manifold Optimization Dual Affine Connections

Manifold of Probability Distributions   Information Geometry ?

 2 1  x    Spx;, px ;, exp 2  2  2   Gaussian distributions px  θ  (,)  Spx  ;θ

 Manifold of Probability Distributions

xSpx 1,2,3 n ={ ( )} p ppp123, , p 1 p 2 p 3 1

p 3 Mpx   ; 

p p1 2 Manifold and Coordinate System

coordinate transformation Examples of Coordinate systems

Euclidean space  2

 2   2 x

      

 1 2  distributions

  ;, ;, exp    Gaussian Spx px Discrete Distributions

Positive measures Divergence: Dzy:  M

D zy:0   Y Z Dzy: 0, iff z y

Not necessarily symmetric  D[z : y] = D[y : z] D zz: d z  gijdz i dz j positive‐definite Taylor expansion Various Divergences

Euclidean

f‐divergence

KL‐divergence

(α‐β)‐divergence Kullback‐Leibler Divergence quasi‐

px() Dpx[():()] qx  px ()log x qx()

Dpx[ ( ) : qx ( )] 0 =0 iff px ( ) qx ( ) Dp[:] q Dq [:] p (,  ) divergence 

    Dpq[:]{} p  q pq ,  iiii

 :  divergence   1: -divergence Manifold with Convex Function S : coordinates    12,,, n 

   : convex function

1 2     i 2  negative entropy  ppxpxdx   log   energy

mathematical programming, control systems physics, engineering, vision, economics Riemannian metric and flatness (affine structure) {,S  (), } Bregman divergence D  ,grad       

1 Dd,    gdd ij 2  ij  g   ,   ij i j i  i

Flatness (affine) : geodesic (not Levi-Civita) Legendre Transformation   ,   ii i i

  one-to-one

i    i 0 i ii i () max{ ()}  ,    i i

D  ,    Two affine coordinate systems  , 

 : geodesic (e-geodesic)

 : dual geodesic (m-geodesic) “dually orthogonal”

jj ii, 

 i  i i ,   i

* X  YZ,,, XX YZ    Y  Z  Bi‐orthogonality Dually flat manifold

 -coordinates   -coordinates potential functions  ,     22  ij ggij    ij  ij     ii 0  

: px, exp ii x  : cumulant generating function  : negative entropy 

canonical di vergence D(P: P')=   '  ii ' Exponential Family

px( , ) exp{ x ( )}  ( ) : convex function, free-energy

Gaussian:

Negative entropy natural parameter expectation parameter x : discrete X = {0, 1, …, n}

SpxxXn { ( ) | }: exponential family nn i px() pii () x exp[ x i  ()]exp x   ii01  i log(ppiii /00 ); x ( x ); ( ) log p

 iiiEx[ ] p (η )= p ii log p Two geodesics

Tangent directions Function space of probability distributions: topology {p(x)} Exponential Family Pythagorean Theorem (dually flat manifold)

DPQ::: DQR  DPR 

Euclidean space: self-dual   

1 2    2  i Projection Theorem

S qDps arg minsM [ : ] p m-geodesic s q M qDsp arg minsM [ : ] e-geodesic Projection Theorem

minDPQ :  QM Q = m-geodesic projection of P to M

minDQP : QM Q’ = e-geodesic projection of P to M Convex function – Bregman divergence – Dually flat Riemannian divergence

Dually flat R‐manifold –convex function –canonical divergence KL‐divergence Exponential family – Bregman divergence Banerjee et al Invariance Spx   , 

Invariant under different representation

2 yyx ,, py   p xpxdx,,  12 |(,)p ypydy (,)|2  12 Invariant divergence (manifold of probability Chentsov Amari ‐Nagaoka distributions; Spx { (,) } )

ykx   : sufficient

D pxqxXX::   Dpyqy YY    Invariance ‐‐‐ characterization of f‐divergence

Csiszar 1 n pi :

p :   1 2 m

A pp p  ()p   i iA  A A DDp ::qpq   A A DDp ::qpq  

 pcqiAii ; 

p :

q :

Invariance ⇒ f‐divergence Csiszar f‐divergence Ali‐Silvey Morimoto

qi Dpffipq:,   pi

fu : convex, f 1 0,

DcDcfpq:: f  pq fu()

fu   fu  cu 1 u ff110 '''  ; f11 1 Theorem

An invariant separable divergence belongs to the class of f‐divergence.

Separable divergence: Dkpq[:]pq  (ii , )

qi kpq (ii , ) pf i ( ) pi divergence

S  {p} : space of probability distributions invariance dually flat space invariant divergence Flat divergence convex functions F‐divergence Bregman Fisher inf metric KL‐divergence Alpha connection p(x) D[p : q] = p(x)log { }dx  q(x)  ‐Divergence: why? flat & invariant in   Sn1 421 fu( ) {1 u2 } (1 u ),  1 11 2  KL-divergence fu() u log u ( u 1) p  i   D[:]pq { piiilog} p q qi Space of positive measures : vectors, matrices, arrays    Sppnnp , ii 0 : ( 1 holds)

f‐divergence Bregman divergence

α‐divergence f divergence of S

q   i Dpffipq:0   pi

Df pq:0  p  q 

not invariant under fu  fu  cu 1   divergence 1111  Dpq[:] { p  q  p 22 q  }  22iiii

KL‐divergence p  i   Dp[:] q  { piii log p q } qi

 S : dually flat S : not dually flat (except    1)

 pi 1 2 1  ri 1 Metric and Connections Induced by Divergence Riemannian metric (Eguchi) 1 gDDgzzyzyz : : : =  (z - y )(z - y ) ij i jyz 2 ij i i j j

affine connections {,* }  zzy' D : ijk i j k yz  '  ii,  zyii  zzy'' D : ijk i j k yz Invariant geometrical structure Spx   ,  alpha‐geometry (derived from invariant divergence)

  gEllij  i j      TElllijk i j k  lpxlog , ; i  i

α‐connection

 Levi‐civita: ijki,;j kT  ijk

  : dually coupled

 XYZ,,,XX YZ  Y  Z * Duality: X  YZ,,, XX YZ    Y  Z 

  kijg  kij  kji

 ijk ijkT ijk

MgT,, Riemannian Structure ds2  g() dij d ij  dGT ( ) d 

Gg() (ij ) Euclidean GE

Fisher information Affine Connection covariant derivative

XcYXY,  geodesic X X 0, X=X(t)

sgdd () ij   ij minimal distance non-metric straight line Duality

 ij XY, X ,  Y  XY ,  gXYij

X  YZ,,,  YZ    Y * Z  * XX Y X Y X

Riemannian geometry:   Dual Affine Connections  ,    * e‐geodesic (,  ) logrxt , t log px   1 t lo g qx  ct 

m‐geodesic rxt ,1 tpx   tqx  

qx

px Mathematical structure of Spx   , 

gEll   ij i j  {M,g,T}    TElllijk  i j  k 

 lpxlog , ; i  i  -connection

 ijkijk,;  T ijk

  : dually coupled

 XYZ,,,XX YZ  Y  Z α‐geometry

Dual Foliations k‐cut

Two neurons: {p00,,,ppp 01 10 11} x 1 0011000101101 x2 0100100110100

x3 0101101001010

firing rates: rrr1212,; correlation—covariance? Correlations of Neural Firing

x  x1 2 pxx 12,  2

pppp00,,, 10 01 11 1 rp1 1 p 10 p 11 firing rates correlations rp2 1 p 01 p 11

pp   log 11 00 {(rr12 , ), } pp 10 01 orthogonal coordinates Independent Distributions

xx12,0,1

Spxx {(12 , )}

M  {(qx12 )( qx )} two neuron case rrr,, ;  , , 1 2 12 1 2 12 pp rrrr1 log00 11 log 12 12 1 2 12  p01prrrr 10 1 12 2 12 rfrr ,, 12 1 2     rt12 frtrt 1,, 2  Decomposition of KL-divergence

correlations D[p:r] = D[p:q]+D[q:r] p

q p,q: same marginals  , r 12 independent r,q: same correlations 

p()x Dp[:] r  px ()log x qx() pairwise correlations

covariance: crrrij ij i j not orthogonal

independent distributions

rrrij i j, r ijk rrr i j k ,

How to generate correlated spikes? (Niebur, Neural Computation [2007])

higher-order correlations Orthogonal higher‐order correlations

   ii, j ; , 1n   r  rrij, i ; ,r1n Population and Synfire

x1 x2 xn Neurons

xuii 1 

uEuuiij Gaussian [ ]   Synfiring

ppxx(x ) (1 ,...,n ) 1 rx qr n  i qr()

r Input‐output Analysis

Gross product consumption Relations among industires (K. Tsuda and R. Morioka) Mathematical Problems

M submanifold of S ? Hong van Le

{M, g} {M, g, T} dually flat J. Armstrong

Affine Hessian manifold Almost complex structure