Information Geometry and Its Applications Shun‐ichi Amari RIKEN Brain Science Institute
1.Divergence Function and Dually Flat Riemannian Structure 2.Invariant Geometry on Manifold of Probability Distributions 3.Geometry and Statistical Inference semi‐parametrics 4. Applications to Machine Learning and Signal Processing Information Geometry
-- Manifolds of Probability Distributions Mp {()}x Information Geometry
Systems Theory Information Theory
Statistics Neural Networks
Combinatorics Information Sciences Physics Math. AI Vision
Riemannian Manifold Optimization Dual Affine Connections
Manifold of Probability Distributions Information Geometry ?
2 1 x Spx;, px ;, exp 2 2 2 Gaussian distributions px θ (,) Spx ;θ
Manifold of Probability Distributions
xSpx 1,2,3 n ={ ( )} p ppp123, , p 1 p 2 p 3 1
p 3 Mpx ;
p p1 2 Manifold and Coordinate System
coordinate transformation Examples of Coordinate systems
Euclidean space 2
2 2 x
1 2 distributions
;, ;, exp Gaussian Spx px Discrete Distributions
Positive measures Divergence: Dzy: M
D zy:0 Y Z Dzy: 0, iff z y
Not necessarily symmetric D[z : y] = D[y : z] D zz: d z gijdz i dz j positive‐definite Taylor expansion Various Divergences
Euclidean
f‐divergence
KL‐divergence
(α‐β)‐divergence Kullback‐Leibler Divergence quasi‐distance
px() Dpx[():()] qx px ()log x qx()
Dpx[ ( ) : qx ( )] 0 =0 iff px ( ) qx ( ) Dp[:] q Dq [:] p (, ) divergence
Dpq[:]{} p q pq , iiii
: divergence 1: -divergence Manifold with Convex Function S : coordinates 12,,, n
: convex function
1 2 i 2 negative entropy ppxpxdx log energy
mathematical programming, control systems physics, engineering, vision, economics Riemannian metric and flatness (affine structure) {,S (), } Bregman divergence D ,grad
1 Dd, gdd ij 2 ij g , ij i j i i
Flatness (affine) : geodesic (not Levi-Civita) Legendre Transformation , ii i i
one-to-one
i i 0 i ii i () max{ ()} , i i
D , Two affine coordinate systems ,
: geodesic (e-geodesic)
: dual geodesic (m-geodesic) “dually orthogonal”
jj ii,
i i i , i
* X YZ,,, XX YZ Y Z Bi‐orthogonality Dually flat manifold
-coordinates -coordinates potential functions , 22 ij ggij ij ij ii 0
exponential family: px, exp ii x : cumulant generating function : negative entropy
canonical di vergence D(P: P')= ' ii ' Exponential Family
px( , ) exp{ x ( )} ( ) : convex function, free-energy
Gaussian:
Negative entropy natural parameter expectation parameter x : discrete X = {0, 1, …, n}
SpxxXn { ( ) | }: exponential family nn i px() pii () x exp[ x i ()]exp x ii01 i log(ppiii /00 ); x ( x ); ( ) log p
iiiEx[ ] p (η )= p ii log p Two geodesics
Tangent directions Function space of probability distributions: topology {p(x)} Exponential Family Pythagorean Theorem (dually flat manifold)
DPQ::: DQR DPR
Euclidean space: self-dual
1 2 2 i Projection Theorem
S qDps arg minsM [ : ] p m-geodesic s q M qDsp arg minsM [ : ] e-geodesic Projection Theorem
minDPQ : QM Q = m-geodesic projection of P to M
minDQP : QM Q’ = e-geodesic projection of P to M Convex function – Bregman divergence – Dually flat Riemannian divergence
Dually flat R‐manifold –convex function –canonical divergence KL‐divergence Exponential family – Bregman divergence Banerjee et al Invariance Spx ,
Invariant under different representation
2 yyx ,, py p xpxdx,, 12 |(,)p ypydy (,)|2 12 Invariant divergence (manifold of probability Chentsov Amari ‐Nagaoka distributions; Spx { (,) } )
ykx : sufficient statistics
D pxqxXX:: Dpyqy YY Invariance ‐‐‐ characterization of f‐divergence
Csiszar 1 n pi :
p : 1 2 m
A pp p ()p i iA A A DDp ::qpq A A DDp ::qpq
pcqiAii ;
p :
q :
Invariance ⇒ f‐divergence Csiszar f‐divergence Ali‐Silvey Morimoto
qi Dpffipq:, pi
fu : convex, f 1 0,
DcDcfpq:: f pq fu()
fu fu cu 1 u ff110 ''' ; f11 1 Theorem
An invariant separable divergence belongs to the class of f‐divergence.
Separable divergence: Dkpq[:]pq (ii , )
qi kpq (ii , ) pf i ( ) pi divergence
S {p} : space of probability distributions invariance dually flat space invariant divergence Flat divergence convex functions F‐divergence Bregman Fisher inf metric KL‐divergence Alpha connection p(x) D[p : q] = p(x)log { }dx q(x) ‐Divergence: why? flat & invariant in Sn1 421 fu( ) {1 u2 } (1 u ), 1 11 2 KL-divergence fu() u log u ( u 1) p i D[:]pq { piiilog} p q qi Space of positive measures : vectors, matrices, arrays Sppnnp , ii 0 : ( 1 holds)
f‐divergence Bregman divergence
α‐divergence f divergence of S
q i Dpffipq:0 pi
Df pq:0 p q
not invariant under fu fu cu 1 divergence 1111 Dpq[:] { p q p 22 q } 22iiii
KL‐divergence p i Dp[:] q { piii log p q } qi
S : dually flat S : not dually flat (except 1)
pi 1 2 1 ri 1 Metric and Connections Induced by Divergence Riemannian metric (Eguchi) 1 gDDgzzyzyz : : : = (z - y )(z - y ) ij i jyz 2 ij i i j j
affine connections {,* } zzy' D : ijk i j k yz ' ii, zyii zzy'' D : ijk i j k yz Invariant geometrical structure Spx , alpha‐geometry (derived from invariant divergence)
Fisher information gEllij i j TElllijk i j k lpxlog , ; i i
α‐connection
Levi‐civita: ijki,;j kT ijk
: dually coupled
XYZ,,,XX YZ Y Z * Duality: X YZ,,, XX YZ Y Z
kijg kij kji
ijk ijkT ijk
MgT,, Riemannian Structure ds2 g() dij d ij dGT ( ) d
Gg() (ij ) Euclidean GE
Fisher information Affine Connection covariant derivative
XcYXY, geodesic X X 0, X=X(t)
sgdd () ij ij minimal distance non-metric straight line Duality
ij XY, X , Y XY , gXYij
X YZ,,, YZ Y * Z * XX Y X Y X
Riemannian geometry: Dual Affine Connections , * e‐geodesic (, ) logrxt , t log px 1 t lo g qx ct
m‐geodesic rxt ,1 tpx tqx
qx
px Mathematical structure of Spx ,
gEll ij i j {M,g,T} TElllijk i j k
lpxlog , ; i i -connection
ijkijk,; T ijk
: dually coupled
XYZ,,,XX YZ Y Z α‐geometry
Dual Foliations k‐cut
Two neurons: {p00,,,ppp 01 10 11} x 1 0011000101101 x2 0100100110100
x3 0101101001010
firing rates: rrr1212,; correlation—covariance? Correlations of Neural Firing
x x1 2 pxx 12, 2
pppp00,,, 10 01 11 1 rp1 1 p 10 p 11 firing rates correlations rp2 1 p 01 p 11
pp log 11 00 {(rr12 , ), } pp 10 01 orthogonal coordinates Independent Distributions
xx12,0,1
Spxx {(12 , )}
M {(qx12 )( qx )} two neuron case rrr,, ; , , 1 2 12 1 2 12 pp rrrr1 log00 11 log 12 12 1 2 12 p01prrrr 10 1 12 2 12 rfrr ,, 12 1 2 rt12 frtrt 1,, 2 Decomposition of KL-divergence
correlations D[p:r] = D[p:q]+D[q:r] p
q p,q: same marginals , r 12 independent r,q: same correlations
p()x Dp[:] r px ()log x qx() pairwise correlations
covariance: crrrij ij i j not orthogonal
independent distributions
rrrij i j, r ijk rrr i j k ,
How to generate correlated spikes? (Niebur, Neural Computation [2007])
higher-order correlations Orthogonal higher‐order correlations
ii, j ; , 1n r rrij, i ; ,r1n Population and Synfire
x1 x2 xn Neurons
xuii 1
uEuuiij Gaussian [ ] Synfiring
ppxx(x ) (1 ,...,n ) 1 rx qr n i qr()
r Input‐output Analysis
Gross product consumption Relations among industires (K. Tsuda and R. Morioka) Mathematical Problems
M submanifold of S ? Hong van Le
{M, g} {M, g, T} dually flat J. Armstrong
Affine differential geometry Hessian manifold Almost complex structure