Pattern Recognition (for Neuroimaging) Fundamentals

OHBM Educational Course Vancouver, June 25, 2017

C. Phillips, GIGA – Research, ULiège, Belgium
[email protected]
http://www.giga.ulg.ac.be

Today's Menu

Overview

• Introduction
  – Uni- vs. multi-variate
  – Pattern recognition framework
• Pattern Recognition
  – Data representation
  – Linear machine & kernel
  – SVM principles
  – Validation & inference
• Conclusion

Introduction

Series of images = 4D image = 3D array of feature series = series of 3D images

[Figure: an fMRI dataset seen as N scans, i.e. many variable values per scan and a series of measurements per voxel.]

Univariate vs. multivariate

Standard univariate approach, aka. Statistical Parametric Mapping

[Figure: standard statistical analysis (encoding). Input: BOLD signal time course at each voxel. Voxel-wise GLM model estimation, then an independent statistical test at each voxel, then correction for multiple comparisons. Output: univariate statistical parametric map.]

Find the mapping from explanatory variables (the design matrix) to observed data (one voxel's values across images).

Univariate vs. multivariate

Multivariate approach, aka. “pattern recognition”

[Figure: training phase – samples from Cond 1 and samples from Cond 2 (input) are used to build a "trained machine", i.e. a link from image to Cond {1,2} (output). Test phase – a new sample is fed to the trained machine, which returns a prediction: Cond 1 or Cond 2.]

Find the mapping f from observed data X (one whole image) to explanatory variable y (label/score): f : X → y

Pattern recognition concept

[Figure: training data X and labels y define the mapping f : X → y; for a new sample x*, the learned mapping predicts f : x* → y*.]

Pattern recognition concepts

• Classification vs. regression problem
  – Classification → output = one discrete label, e.g. condition A/B, healthy/diseased, etc. → y ∈ {-1, +1}
  – Regression → output = one continuous value, e.g. age, score, level, etc. → y ∈ ℝ

• Supervised vs. unsupervised learning
  At training, you know
  – both input & output → supervised
  – only the input → unsupervised

Pattern recognition framework

[Figure: inputs X1, X2, X3 (brain scans) are mapped by machine learning to outputs y1, y2, y3 (label or score); no mathematical model of the mapping is available.]

Methodology

Computer-based procedures that learn a function from a series of examples.

Learning/training phase
  Training examples (X_1, y_1), ..., (X_s, y_s): optimize the parameters of a function f such that f(X_i) ≈ y_i

Testing phase
  Test example X*: f(X*) = y* is the prediction
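As a concrete illustration of this train-then-predict workflow, here is a minimal sketch (not from the slides), assuming scikit-learn and simulated data; all names and sizes are made up:

```python
# Minimal sketch of the learn-then-predict workflow (simulated data, scikit-learn).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.standard_normal((20, 1000))   # 20 training scans, 1000 voxels each
y_train = np.repeat([-1, 1], 10)            # labels for condition A (-1) and B (+1)

machine = SVC(kernel='linear')              # the "function f" to be optimised
machine.fit(X_train, y_train)               # learning/training phase

X_star = rng.standard_normal((1, 1000))     # a new, unseen scan
y_star = machine.predict(X_star)            # testing phase: f(X*) = y*
print(y_star)
```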

Data representation

Image = 3D matrix of voxels

[Figure: a whole brain volume is unwrapped voxel by voxel into a "feature vector", i.e. one "data point".]

Data dimensions
• dimensionality of a sample/"data point" = #voxels considered
• number of samples/"data points" = #scans/images considered
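A minimal sketch of this unwrapping step, assuming a simulated 4-D array (x × y × z × scans) with made-up sizes; in practice the array would come from a neuroimaging I/O library:

```python
# Flatten a 4-D image series into a (samples x voxels) data matrix (simulated array).
import numpy as np

series = np.random.rand(40, 48, 40, 30)   # x, y, z, 30 scans (made-up sizes)
n_scans = series.shape[-1]

X = series.reshape(-1, n_scans).T         # one row per scan, one column per voxel
print(X.shape)                            # (30, 76800): 30 samples, 76800 features
```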

Linear classification example

[Figure: using only 2 voxels, samples of class A and class B are plotted against voxel 1 and voxel 2; a sample with unknown label (*) falls on one side of the decision boundary.]

• Hyper-plane = decision boundary
• Training = fixing the hyper-plane parameters

• When #features (= #voxels) » #samples (= #scans) → "ill-posed problem"

Data representation

Solutions to the dimensionality problem:
• Region of interest (ROI) — e.g. restrict features to a binary mask, as sketched below
• Searchlight = scan all locally defined ROIs
• Feature selection strategies
• Kernel methods + regularization
  – computational shortcut
  – efficient solution of ill-conditioned problems
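A minimal sketch of ROI-based feature reduction, assuming a simulated data matrix and a made-up boolean mask (in practice the mask would come from an atlas or a localiser):

```python
# Reduce dimensionality by keeping only voxels inside a region of interest (simulated).
import numpy as np

X = np.random.rand(30, 76800)            # 30 scans x 76800 voxels (made-up sizes)
roi_mask = np.zeros(76800, dtype=bool)   # hypothetical ROI: voxels 1000-1499
roi_mask[1000:1500] = True

X_roi = X[:, roi_mask]                   # keep only the ROI voxels as features
print(X_roi.shape)                       # (30, 500): far fewer features per sample
```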

Kernel matrix

Kernel matrix = "similarity measure"

[Figure: linear kernel → dot product. Example with 2 voxels: image 3 = (4, 1), image 7 = (-2, 3), so K(7,3) = 4×(-2) + 1×3 = -5.]

• 2 patterns x_i and x_j → a real number characterizing their similarity
• simple similarity measure = a dot product → linear kernel
• kernel matrix size: #samples × #samples
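A minimal sketch of building a linear kernel matrix, reproducing the K(7,3) = -5 example above with the two-voxel values from the figure; everything else is simulated:

```python
# Linear kernel matrix = all pairwise dot products between samples.
import numpy as np

X = np.random.rand(8, 2)   # 8 images x 2 voxels (simulated)
X[2] = [4, 1]              # "image 3" from the slide (0-based row 2)
X[6] = [-2, 3]             # "image 7" from the slide (0-based row 6)

K = X @ X.T                # kernel matrix, size #samples x #samples
print(K[6, 2])             # 4*(-2) + 1*3 = -5.0
```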

Support Vector Machine

SVM
• relies on the kernel representation
• "maximum margin", ρ, classifier

Data: {(x_i, y_i)}, i = 1, ..., N
Observations: x_i ∈ R^d
Labels: y_i ∈ {-1, +1}

Decision: w⊤x_i + b > 0 for one class, w⊤x_i + b < 0 for the other, with hyper-plane parameters (w, b)

Weight vector: w = Σ_{i=1}^{N} α_i x_i, where the "support vectors" are the x_i with α_i ≠ 0
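A minimal sketch (not from the slides) of training a linear SVM and reading out the weight vector w, i.e. the "discrimination map", assuming scikit-learn and simulated data:

```python
# Train a linear SVM and inspect the weight vector / support vectors (simulated data).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((20, 500))   # 20 scans x 500 voxels
y = np.repeat([-1, 1], 10)
X[y == 1] += 0.5                     # shift class +1 so the classes are separable

clf = SVC(kernel='linear').fit(X, y)

w = clf.coef_.ravel()                # weight vector = "discrimination map"
b = clf.intercept_[0]
print(w.shape, b)                    # one weight per voxel, plus the offset
print(clf.support_)                  # indices of the support vectors (alpha_i != 0)
```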

Class prediction

[Figure: samples of class 1 and class 2 (rows of voxel values) are used in training to obtain the "weight vector" or "discrimination map", here w1 = +5, w2 = -3.]

Testing on a new example with v1 = 0.5, v2 = 0.8:
f(x) = (w1·v1 + w2·v2) + b = (+5 × 0.5 - 3 × 0.8) + 0 = 0.1
Positive value → class 1
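The same worked example as a few lines of code, using the slide's numbers (w1 = +5, w2 = -3, b = 0, new example v = (0.5, 0.8)):

```python
# Class prediction from a learned weight vector (numbers taken from the slide's example).
import numpy as np

w = np.array([5.0, -3.0])   # discrimination map: w1 = +5, w2 = -3
b = 0.0
v = np.array([0.5, 0.8])    # new example: v1 = 0.5, v2 = 0.8

f = w @ v + b               # 5*0.5 - 3*0.8 + 0 = 0.1
print(f, 'class 1' if f > 0 else 'class 2')
```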

Kernel methods

SVM → "hard" classification
– simple & efficient, quick calculation, but
– no grading in the output: {-1, +1}

Gaussian processes → probabilistic model
– more complicated, slower calculation, but
– returns a probability in [0, 1]
– can be multiclass

Other approaches: deep learning, tree-based methods, etc.
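A minimal sketch contrasting the two kinds of output, assuming scikit-learn's SVC and GaussianProcessClassifier on simulated data (not from the slides):

```python
# Hard SVM labels vs. probabilistic Gaussian-process outputs (simulated data).
import numpy as np
from sklearn.svm import SVC
from sklearn.gaussian_process import GaussianProcessClassifier

rng = np.random.default_rng(2)
X = rng.standard_normal((20, 50))
y = np.repeat([-1, 1], 10)
X[y == 1] += 0.7
X_new = rng.standard_normal((1, 50)) + 0.7

print(SVC(kernel='linear').fit(X, y).predict(X_new))               # hard label: -1 or +1
print(GaussianProcessClassifier().fit(X, y).predict_proba(X_new))  # class probabilities in [0, 1]
```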

Validation principle

Data samples = {labels, features}

[Table: each sample i has a label (±1) and features feat 1, ..., feat m. Samples 1, ..., i form the training set, used to train the classifier; samples i+1, ..., n form the test set, for which the predicted labels are compared with the true labels.]

→ Prediction accuracy evaluation
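A minimal sketch of this single train/test split, assuming scikit-learn and simulated data (names and sizes are made up):

```python
# Single train/test split: train on one part of the data, evaluate accuracy on the rest.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.standard_normal((40, 200))   # 40 samples x 200 features
y = np.repeat([-1, 1], 20)
X[y == 1] += 0.5

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)
clf = SVC(kernel='linear').fit(X_tr, y_tr)   # training set -> trained classifier
print(clf.score(X_te, y_te))                 # test set -> prediction accuracy
```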

M-fold cross-validation

• Split data into 2 sets, "train" & "test" → evaluation on 1 "fold"
• Rotate the partition and repeat → evaluations on M "folds"
• Applies to scans/events/blocks/subjects/… → leave-"X"-out approach
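A minimal sketch of M-fold cross-validation with scikit-learn on simulated data (the number of folds and the data sizes are arbitrary):

```python
# M-fold cross-validation: rotate the train/test partition and average the fold accuracies.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
X = rng.standard_normal((40, 200))
y = np.repeat([-1, 1], 20)
X[y == 1] += 0.5

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)   # M = 5 folds
scores = cross_val_score(SVC(kernel='linear'), X, y, cv=cv)      # one accuracy per fold
print(scores, scores.mean())
```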

Classification validation

Confusion matrix

                 Predicted class
                    1      0
Actual class  1     A      B
              0     C      D

Accuracy estimation
• Class 1 accuracy: p1 = A/(A+B)
• Class 0 accuracy: p0 = D/(C+D)
• Total accuracy: p = (A+D)/(A+B+C+D)

Other criteria
• Sensitivity/specificity
• Positive/negative predictive value (PPV/NPV)
• Balanced accuracy = (p1 + p0)/2
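A minimal sketch computing these quantities from true and predicted labels, assuming scikit-learn and made-up label vectors:

```python
# Accuracy measures from a 2x2 confusion matrix (made-up label vectors).
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 1, 0, 0, 0, 0, 1, 1])

# Order the rows/columns as class 1 then class 0, matching the slide's A, B, C, D layout.
(A, B), (C, D) = confusion_matrix(y_true, y_pred, labels=[1, 0])

p1 = A / (A + B)                  # class 1 accuracy
p0 = D / (C + D)                  # class 0 accuracy
p = (A + D) / (A + B + C + D)     # total accuracy
print(p1, p0, p, (p1 + p0) / 2)   # last value = balanced accuracy
```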

Regression validation

Consider N-fold CV:
• prediction error in one fold
• across all folds → out-of-sample "mean squared error" (MSE)

[Figure: scatter plot of predicted values f(x_n) against targets y_n.]

Other measure: correlation between predictions and targets
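A minimal sketch of the out-of-sample MSE and the prediction–target correlation, assuming simulated data and cross-validated predictions from scikit-learn (not the slides' own pipeline):

```python
# Out-of-sample MSE and correlation between cross-validated predictions and targets (simulated).
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.svm import SVR

rng = np.random.default_rng(5)
X = rng.standard_normal((40, 200))
y = X[:, 0] * 2.0 + rng.standard_normal(40) * 0.5   # continuous target, e.g. age or score

y_pred = cross_val_predict(SVR(kernel='linear'), X, y, cv=5)   # out-of-sample predictions

mse = np.mean((y - y_pred) ** 2)   # mean squared error across all folds
r = np.corrcoef(y, y_pred)[0, 1]   # correlation between predictions and targets
print(mse, r)
```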

Inference by permutation testing

• H0: "no link between features and target"
• Test statistic, e.g. CV accuracy
• Estimate the distribution of the test statistic under H0:
  → randomly permute the labels
  → estimate the CV accuracy
  → repeat M times
• Calculate the p-value as the proportion of permutations whose test statistic is at least as large as the observed one
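A minimal sketch of a label-permutation test on CV accuracy, with simulated data and an arbitrary number of permutations (scikit-learn's permutation_test_score offers the same idea ready-made):

```python
# Permutation test on cross-validated accuracy (simulated data; M permutations under H0).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(6)
X = rng.standard_normal((40, 200))
y = np.repeat([-1, 1], 20)
X[y == 1] += 0.5

clf = SVC(kernel='linear')
observed = cross_val_score(clf, X, y, cv=5).mean()          # observed CV accuracy

M = 200
null = np.empty(M)
for m in range(M):                                          # H0: labels carry no information
    y_perm = rng.permutation(y)                             # random permutation of labels
    null[m] = cross_val_score(clf, X, y_perm, cv=5).mean()  # CV accuracy under H0

p_value = (np.sum(null >= observed) + 1) / (M + 1)          # proportion of permutations >= observed
print(observed, p_value)
```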

Conclusions

Univariate
• 1 voxel
• Target → Data
• Look for difference or correlation
• General
• GLM inversion → parameter & error terms
• Calculate contrast of interest
• Voxel/cluster activation inference → localisation

Multivariate
• 1 volume
• Data → Target
• Look for similarity or score
• Specific machine
• Machine training → machine parameters
• Estimate prediction accuracy with CV
• Sample label prediction inference → no localisation

…much more to come

• Pradeep → cross-validation, how & why
• Carsten → permutation & …
• Jessica → weight maps & their interpretation
• Jo → fMRI, BOLD signal & HRF
• Bertrand → fMRI, stability of your results
• Georg → multi-subject data
• Olivier → multi-modal data & disease prediction
• Janaina → psychiatric applications
• Vince → deep learning approaches in neuroimaging
• Moritz → interpretation of MVPA models

Thank you for your attention!

Any questions?

Thanks to the PRoNTo Team for the borrowed slides.

References

Reviews:
• Haynes & Rees (2006) Decoding mental states from brain activity in humans. Nat. Rev. Neurosci., 7, 523-534.
• Pereira, Mitchell & Botvinick (2009) Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45, S199-S209.

Books:
• Hastie, Tibshirani & Friedman (2003) The Elements of Statistical Learning. Springer.
• Shawe-Taylor & Cristianini (2004) Kernel Methods for Pattern Analysis. Cambridge University Press.
• Bishop (2006) Pattern Recognition and Machine Learning. Springer.

Machines:
• Burges (1998) A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167.
• Rasmussen & Williams (2006) Gaussian Processes for Machine Learning. The MIT Press.
• Tipping (2001) Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research, 1, 211-244.
• Breiman (1996) Bagging predictors. Machine Learning, 24, 123-140.
• Rakotomamonjy, Bach, Canu & Grandvalet (2008) SimpleMKL. Journal of Machine Learning Research, 9, 2491-2521.