Probabilistic Graphical Models and Their Inference: with Applications on Functional Network Estimation
Wei Liu, University of Utah

1 Graphical Model Applications

[Figure: probability theory + graph theory → probabilistic graphical models]

Applications:

• Image understanding
• Web search
• Natural language processing
• Chemical reactions

2 Unified Models

[Figure: a state-space model — latent chain z_1, z_2, …, z_{n-1}, z_n, z_{n+1} with observations x_1, x_2, …, x_n, parameters π and μ, and a plate over N]

One underlying model unifies:

• Hidden Markov models
• Kalman filters
• Mixtures of Gaussians
• Neural networks
• Boltzmann machines
• Probabilistic principal component analysis

[Figure: a two-layer (factorial) model — latent chains z^{(1)}_{n-1}, z^{(1)}_n, z^{(1)}_{n+1} and z^{(2)}_{n-1}, z^{(2)}_n, z^{(2)}_{n+1} over observations x_{n-1}, x_n, x_{n+1}]

3 Outline

• Overview of probabilistic graphical models
• Undirected graphs: Markov random fields
• Graphical model inference on single-subject fMRI
• Inference on a group of subjects
• Multi-task learning on autism patients
• Lesion detection

4 Graphs

[Figure: example graph structures — chain, regular lattice, directed graph, tree, general graph, chain graph]

5 Probability ↔ Directed Graph

Fully connected:

P(A, B, C) = P(A) · P(B, C | A) = P(A) · P(B | A) · P(C | B, A)

Head-to-tail (A → B → C):

P(A, B, C) = P(A) · P(B | A) · P(C | B)   ⇒   A ⊥ C | B

Tail-to-tail (B ← A → C):

P(A, B, C) = P(A) · P(B | A) · P(C | A)   ⇒   B ⊥ C | A

Head-to-head (A → C ← B):

P(A, B, C) = P(A) · P(B) · P(C | A, B)   ⇒   A ⊥ B, but not A ⊥ B | C

6 Markov
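These independence claims can be checked numerically. A minimal sketch, assuming made-up probability tables (none of these numbers come from the talk), verifies the head-to-tail case A ⊥ C | B by brute-force enumeration:

```python
# Numerically verify the head-to-tail independence A ⊥ C | B for
# P(A,B,C) = P(A) P(B|A) P(C|B). All tables are illustrative numbers.
import itertools

pA = {0: 0.6, 1: 0.4}
pB_A = {(0, 0): 0.7, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.8}  # pB_A[(b, a)]
pC_B = {(0, 0): 0.9, (1, 0): 0.1, (0, 1): 0.5, (1, 1): 0.5}  # pC_B[(c, b)]

def joint(a, b, c):
    return pA[a] * pB_A[(b, a)] * pC_B[(c, b)]

# Condition on B = b and check that P(A, C | b) = P(A | b) P(C | b).
for b in (0, 1):
    pb = sum(joint(a, b, c) for a, c in itertools.product((0, 1), repeat=2))
    for a, c in itertools.product((0, 1), repeat=2):
        p_ac = joint(a, b, c) / pb
        p_a = sum(joint(a, b, c2) for c2 in (0, 1)) / pb
        p_c = sum(joint(a2, b, c) for a2 in (0, 1)) / pb
        assert abs(p_ac - p_a * p_c) < 1e-12
print("A and C are conditionally independent given B")
```

The same enumeration with the head-to-head factorization would show the opposite: conditioning on C couples A and B.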

An undirected graph G = (V, E) has vertices V = {v_1, …} and a neighborhood system N. A random field X on G satisfies the Markov property if

P(x_v | x_{V \ {v}}) = P(x_v | x_{N(v)}),

i.e. x_v depends on all other sites only through its neighbors N(v).

7 Gibbs Distribution

A random field X on G has a Gibbs distribution if

P(X) = (1/Z) exp(−U(X)/T),   U(X) = Σ_{c ∈ C} V_c(x_c),

with partition function Z, temperature T, and clique potentials V_c over the cliques C.
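These definitions can be exercised on a toy model. The sketch below (my own illustration; N, T, and the coupling β are made-up) builds a Gibbs distribution on a 4-node chain with pairwise potentials, computes Z by brute-force enumeration, and checks the Markov property numerically:

```python
# Gibbs distribution on a 4-node chain with pairwise edge potentials
# V(x_r, x_s) = -beta if x_r == x_s else +beta; Z by brute force.
import itertools
import math

N, T, beta = 4, 1.0, 0.8

def U(x):
    # U(X) = sum over edge cliques (r, s) of V(x_r, x_s)
    return sum(-beta if x[i] == x[i + 1] else beta for i in range(N - 1))

states = list(itertools.product((0, 1), repeat=N))
Z = sum(math.exp(-U(s) / T) for s in states)
P = {s: math.exp(-U(s) / T) / Z for s in states}

# Markov property at node v = 1: conditioning on everything else equals
# conditioning on the neighbors x_0 and x_2 only.
x = (0, 1, 1, 0)
cond_full = P[x] / (P[(0, 0, 1, 0)] + P[(0, 1, 1, 0)])
num_n = sum(P[(0, 1, 1, s3)] for s3 in (0, 1))
den_n = sum(P[(0, s1, 1, s3)] for s1 in (0, 1) for s3 in (0, 1))
assert abs(cond_full - num_n / den_n) < 1e-12
print(f"Z = {Z:.3f}, P(x_1=1 | neighbors) = {cond_full:.3f}")
```

Brute-force enumeration of Z is only feasible on toy graphs; on real lattices it is exactly what makes inference hard.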

[Figure: examples of cliques]

Example with pairwise potentials over neighboring pairs (r, s):

U(X) = Σ_{(r,s)} V(x_r, x_s)

8

• X is defined on an MRF.
• Y is assumed to be generated from X.
• Inverse problem: given Y, estimate X.
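As a concrete illustration of this inverse problem (my own sketch, not the method of these slides): iterated conditional modes (ICM) with an Ising prior and a Gaussian likelihood, recovering a made-up binary image X from its noisy observation Y:

```python
# ICM for a hidden MRF: Ising prior on X (piecewise constant), Gaussian
# likelihood Y = X + noise. All values (beta, sigma, image) are made up.
import random

random.seed(0)
H, W, beta, sigma = 8, 8, 1.0, 0.6
truth = [[1 if j >= W // 2 else -1 for j in range(W)] for i in range(H)]
y = [[truth[i][j] + random.gauss(0, sigma) for j in range(W)] for i in range(H)]

def energy_at(x, i, j, s):
    # negative log-posterior contribution of setting x[i][j] = s
    like = (y[i][j] - s) ** 2 / (2 * sigma ** 2)
    nbrs = [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    prior = sum(-beta * s * x[a][b] for a, b in nbrs if 0 <= a < H and 0 <= b < W)
    return like + prior

x = [[1 if y[i][j] > 0 else -1 for j in range(W)] for i in range(H)]  # init from data
for sweep in range(5):  # ICM: greedily pick the lower-energy label per site
    for i in range(H):
        for j in range(W):
            x[i][j] = min((+1, -1), key=lambda s: energy_at(x, i, j, s))

errors = sum(x[i][j] != truth[i][j] for i in range(H) for j in range(W))
print(f"{errors} mislabeled pixels out of {H * W}")
```

ICM finds only a local mode of P(X | Y); the sampling-based inference discussed later explores the posterior more fully.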

Other forms exist, e.g. the conditional random field, which has no Bayesian (generative) interpretation.

9

What statistical questions can we ask?

Inference methods:

10 fMRI

J. C. Snow, Nature 2011

11 fMRI

• Blood oxygen level dependent (BOLD) signal indirectly measures neuronal activity.
• 3D volumes sampled at regular time intervals.
• Fast scans, but noisy images.
• Spatio-temporal dependency.

12 Task vs. Resting

Task:
• Paradigm signal.
• Subjects in active cognitive activity.
• Correlation of stimulus and BOLD response.

Resting:
• No paradigm signal.
• Subject in scanner, eyes closed or open to a fixed cross.
• Correlation between BOLD signals.

[Figure: paradigm and BOLD time courses under task (left) vs. resting-state (right)]

13 Spatially Coherent Connectivity

[Figure: voxel correlations y_{ij} live in the 2d-dimensional correlation space and are generated, via the likelihood p(y_ij | x_ij), from a connectivity map x defined on a 2d-dimensional MRF over pairs {i, j}, {i, k}; the original image lives in d-dimensional space]
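The observations y here are voxel correlations. A synthetic sketch (random data, not real fMRI) of computing one seed-based correlation map with numpy:

```python
# Seed-based correlations: the quantity y_ij that the likelihood
# p(y_ij | x_ij) is built on. All time courses are synthetic.
import numpy as np

rng = np.random.default_rng(0)
T, n_vox = 100, 6
seed = rng.standard_normal(T)
# half the voxels share signal with the seed, half are pure noise
voxels = np.vstack([0.8 * seed + 0.6 * rng.standard_normal(T) for _ in range(3)]
                   + [rng.standard_normal(T) for _ in range(3)])
y = np.array([np.corrcoef(seed, v)[0, 1] for v in voxels])
print(np.round(y, 2))  # the first three correlations should be clearly larger
```

Thresholding y voxel-by-voxel ignores spatial coherence; the MRF on x is what couples neighboring decisions.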

14 Hierarchical Model

Existing methods:
• Bottom-up or top-down.
• Each subject is estimated independently.
• Estimation is one-way.

Proposed:
• A hierarchical graph including group and subjects.
• Joint estimation of both levels.
• Bayesian, data-driven, with parameter estimation.

[Figure: existing pipelines — group and per-subject functional network maps estimated from fMRI time courses in separate stages ([Bellec, 2010], [Calhoun, 2001b], [Varoquaux, 2010, 2011], [Van Den Heuvel, 2008], [Beckmann, 2009], [Esposito, 2005], [Filippini, 2009]) — versus our method: HMRF]

15 A Single Graph

[Figure: one graph — the group network map linked to each subject's map (sub1, sub2, sub3) by between-level links, with within-subject links inside each map and links down to the data]

• Within-subject piecewise-constant constraint.
• Between-subject (between-level) dependency.

16 An Abstract Graphical Representation

[Figure: abstract graphical model — a group layer, subject layers, and BOLD observations, with a plate J over subjects]

17 Emission Distribution

• von Mises-Fisher (vMF) distribution: the analogue of a multivariate Gaussian on the unit sphere.
• Gaussian mixture → vMF mixture.
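A minimal sketch of the vMF log-density f(x; μ, κ) = C_p(κ) exp(κ μᵀx), with C_p(κ) = κ^{p/2−1} / ((2π)^{p/2} I_{p/2−1}(κ)), using SciPy's modified Bessel function; the values of μ and κ below are illustrative:

```python
# von Mises-Fisher log-density on the unit sphere in R^p.
import numpy as np
from scipy.special import iv  # modified Bessel function of the first kind

def vmf_logpdf(x, mu, kappa):
    p = len(mu)
    log_c = ((p / 2 - 1) * np.log(kappa)
             - (p / 2) * np.log(2 * np.pi)
             - np.log(iv(p / 2 - 1, kappa)))
    return log_c + kappa * mu @ x

mu = np.array([0.0, 0.0, 1.0])       # mean direction on the sphere
x_near = np.array([0.0, 0.0, 1.0])   # aligned with mu
x_far = np.array([0.0, 0.0, -1.0])   # antipodal to mu
print(vmf_logpdf(x_near, mu, 10.0) > vmf_logpdf(x_far, mu, 10.0))  # True
```

The concentration κ plays the role of an inverse variance: large κ piles density around μ, which is why a vMF mixture can stand in for a Gaussian mixture on normalized time courses.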

18 Inference

Monte Carlo Expectation Maximization:

Q(θ) = E_{X|Y} [log P(X, Y; θ)]

The expectation is approximated by sampling, alternating on a Gibbs schedule between sampling voxels in the group map and sampling voxels in the subjects.
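The Monte Carlo E-step can be sketched with single-site Gibbs sampling. The stand-in model below (an Ising chain with Gaussian observations; all parameters made up) is far simpler than the group/subject hierarchy, but the sampling schedule is the same idea:

```python
# Single-site Gibbs sampling on a binary chain MRF with Gaussian
# observations; posterior expectations are approximated by sample means.
import math
import random

random.seed(0)
N, beta, sigma = 10, 0.5, 1.0
y = [1.0] * 5 + [-1.0] * 5            # made-up observations on a chain
x = [random.choice((-1, 1)) for _ in range(N)]

def gibbs_sweep(x):
    for i in range(N):
        # log-odds of x_i = +1 vs -1 given its neighbors and y_i
        nb = sum(x[j] for j in (i - 1, i + 1) if 0 <= j < N)
        logit = 2 * beta * nb + 2 * y[i] / sigma ** 2
        x[i] = 1 if random.random() < 1 / (1 + math.exp(-logit)) else -1

samples = []
for it in range(200):
    gibbs_sweep(x)
    if it >= 50:                       # discard burn-in sweeps
        samples.append(list(x))

post_mean = [sum(s[i] for s in samples) / len(samples) for i in range(N)]
print([round(m, 2) for m in post_mean])
```

In the full model one such sweep runs over the group map and each subject map in turn; the sample averages then feed the M-step update of θ.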

19 Consistency Test by

20 Autism Classification for Multiple Sites

[Figure: data matrix of samples × features, with samples partitioned into groups grp1 and grp2]

21 Learning Multiple Tasks

Each task's weights decompose into a shared part plus a task-specific offset:

w_t = w̄ + v_t,   w_t ∼ N(w̄, σ²)
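Under a shared-plus-offset decomposition w_t = w̄ + v_t, a simple alternating scheme fits per-task ridge solutions shrunk toward a learned shared mean. Everything below (data, λ, dimensions) is synthetic and illustrative, not the estimator from the talk:

```python
# Multi-task regression with w_t = w_bar + v_t: alternate between
# per-task ridge solves pulled toward w_bar and re-estimating w_bar.
import numpy as np

rng = np.random.default_rng(0)
d, n, T_tasks, lam = 5, 40, 4, 5.0
w_bar_true = rng.standard_normal(d)
tasks = []
for _ in range(T_tasks):
    w_t = w_bar_true + 0.2 * rng.standard_normal(d)   # small task offsets
    X = rng.standard_normal((n, d))
    y = X @ w_t + 0.1 * rng.standard_normal(n)
    tasks.append((X, y))

w = [np.zeros(d) for _ in range(T_tasks)]
w_bar = np.zeros(d)
for _ in range(10):                                    # alternating updates
    for t, (X, y) in enumerate(tasks):
        # ridge solution shrunk toward the current shared mean w_bar
        w[t] = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y + lam * w_bar)
    w_bar = np.mean(w, axis=0)

print(np.round(w_bar - w_bar_true, 2))                 # should be near zero
```

The shrinkage weight λ trades off pooling across tasks against fitting each task alone, which is exactly what sharing across sites or subjects buys.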

22 Lesion Detection by Active Learning

• Multi-modality, longitudinal, complex patterns.
• Existing methods: high false-positive/negative rates, 2D, single object.
• Slight user involvement significantly improves results.
• Computer active, user passive (less burden on the user).

[Figure: unlabeled data, semi-supervised learning, and active learning]

23 Multi-task Learning

24 An MRF Prior

Prior: MRF over labels, K = 2 (FG), K = 3 (BG).

Likelihood:

25 Active Learning Demo

26 Learning Deep Structures

27 Conclusion

• Multivariate distribution → graph.
• Unified model.
• Inference is difficult, but approximation is possible.
• Application to hierarchical data.
• Applications in other domains: chemical reactions.
• Boltzmann machines.
