Pattern Recognition (for Neuroimaging Data) Fundamentals
OHBM Educational Course, Vancouver, June 25, 2017
C. Phillips, GIGA – Research, ULiège, Belgium
[email protected] – http://www.giga.ulg.ac.be

Today's Menu
• Introduction
  – Uni- vs. multi-variate
  – Pattern recognition framework
• Pattern Recognition
  – Data representation
  – Linear machine & kernel
  – SVM principles
  – Validation & inference
• Conclusion

Introduction
A series of images = a 4D image = a 3D array of feature series = a series of 3D images: many variables (voxels), each measured repeatedly across the series of scans.

Univariate vs. multivariate
Standard univariate approach, a.k.a. Statistical Parametric Mapping (encoding):
input images → voxel-wise GLM model estimation → independent statistical test at each voxel → correction for multiple comparisons → output statistical parametric map.
Find the mapping from the explanatory variables (the design matrix) to the observed data (the values of one voxel across images).

Multivariate approach, a.k.a. "pattern recognition" (decoding):
• Training phase: samples from condition 1 and condition 2 → "trained machine" = link from image to condition {1, 2}
• Test phase: new sample → prediction: condition 1 or condition 2
Find the mapping f from the observed data X (one whole image) to the explanatory variable y (label/score): f : X → y.

Pattern recognition concept
Learn f : X → y from the data X and labels y, then apply it to a new sample: f : x* → y*.

Pattern recognition concepts
• Classification vs. regression problem
  – Classification → output = one discrete label, e.g. condition A/B, healthy/diseased, etc. → y ∈ {−1, +1}
  – Regression → output = one continuous value, e.g. age, score, level, etc. → y ∈ ℝ
• Supervised vs. unsupervised learning: at training time, you know
  – both input & output → supervised
  – only the input → unsupervised

Pattern recognition framework
Input (brain scans X1, X2, X3, …) → machine learning → output (labels or scores y1, y2, y3, …), with no mathematical model of the mapping available.
• Methodology: computer-based procedures that learn a function from a series of examples
• Learning/training phase: given training examples (X1, y1), …, (Xs, ys), optimise the parameters of a function f such that f(Xi) ≈ yi
• Testing phase: for a test example X*, the prediction is f(X*) = y*

Data representation
• An image is a 3D matrix of voxels; the whole brain volume is unfolded into a "feature vector" or "data point"
• Dimensionality of a sample/"data point" = #voxels considered
• Number of samples/"data points" = #scans/images considered

Linear classification example (only 2 voxels)
Samples of class A and class B are plotted in the (voxel 1, voxel 2) plane, together with one sample of unknown label (a worked sketch follows below).
• Hyper-plane = decision boundary
• Training = fixing the hyper-plane parameters
• When #features (= #voxels) ≫ #samples (= #scans) → "ill-posed problem"
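To make the toy example concrete, here is a minimal sketch of fitting a linear classifier to two-voxel samples and predicting the unlabelled one. It assumes NumPy and scikit-learn; the sample values, labels and the choice of a linear SVM are illustrative assumptions, not values taken from the slides.

```python
# Minimal sketch: 2-voxel linear classification (assumes numpy + scikit-learn).
# The numbers below are made up for illustration.
import numpy as np
from sklearn.svm import SVC

# Each row = one scan described by 2 voxel values; labels = class A (+1) / class B (-1)
X_train = np.array([[1.0, 3.5],   # sample 1, class A
                    [1.5, 3.0],   # sample 2, class A
                    [3.0, 1.0],   # sample 3, class B
                    [3.5, 1.5]])  # sample 4, class B
y_train = np.array([+1, +1, -1, -1])

# Training a linear SVM = fixing the hyper-plane parameters (w, b)
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
w, b = clf.coef_[0], clf.intercept_[0]
print("hyper-plane parameters:", w, b)

# New sample with unknown label: which side of the hyper-plane does it fall on?
x_star = np.array([[1.2, 2.8]])
print("decision value:", clf.decision_function(x_star))  # sign gives the class
print("predicted class:", clf.predict(x_star))
```

With only 4 samples and 2 features the problem is well posed; with whole-brain feature vectors (#voxels ≫ #scans) the same fit becomes ill-posed, which motivates the solutions listed next.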
Data representation: solutions to the dimensionality problem
• Region of interest (ROI)
• Searchlight = scan all locally defined ROIs
• Feature selection strategies
• Kernel methods + regularisation
  – computational shortcut
  – efficient solution of ill-conditioned problems

Kernel matrix
Kernel matrix = "similarity measure". Two-voxel example: image 3 has values (4, 1) and image 7 has values (−2, 3); the linear kernel is their dot product, K(7, 3) = (4 × −2) + (1 × 3) = −5.
• Two patterns xi and xj → a real number characterising their similarity
• Simplest similarity measure = a dot product → linear kernel
• Kernel matrix size: #samples × #samples

Support Vector Machine (SVM)
• Relies on the kernel representation
• "Maximum margin" (ρ) classifier: the hyper-plane (w, b) separates the classes, with w⊤xi + b > 0 on one side and w⊤xi + b < 0 on the other
• Data: ⟨xi, yi⟩, i = 1, …, N, with observations xi ∈ ℝ^d and labels yi ∈ {−1, +1}
• Weight vector w = Σi αi xi (sum over the N training samples); the "support vectors" are the samples with αi ≠ 0

Class prediction
• Training: from samples of class 1 and class 2 (voxel 1 and voxel 2 values), learn the "weight vector" or "discrimination map", e.g. w1 = +5, w2 = −3 (and b = 0)
• Testing: for a new example with v1 = 0.5 and v2 = 0.8, f(x) = (w1·v1 + w2·v2) + b = (+5 × 0.5 − 3 × 0.8) + 0 = 0.1; a positive value → class 1

Kernel methods
• SVM: hard binary classification
  – simple & efficient, quick calculation, but
  – no "grading" in the output {−1, 1}
• Gaussian Processes: probabilistic model
  – more complicated, slower calculation, but
  – returns a probability in [0, 1]
  – can be multiclass
• Other approaches: deep learning, tree-based methods, etc.

Validation principle
Data samples = {labels, features}: each sample i has a label (±1) and a feature vector (feat 1, feat 2, …, feat m). The samples are split into a training set, used to train the classifier, and a test set, on which the true labels are compared with the predicted labels.

Prediction accuracy evaluation: M-fold cross-validation
• Split the data into two sets, "train" & "test" → evaluation on 1 "fold"
• Rotate the partition and repeat → evaluations on M "folds"
• Applies to scans/events/blocks/subjects/… → leave-"X"-out approach

Classification validation
Confusion matrix (actual class in rows, predicted class in columns):
              predicted 1   predicted 0
  actual 1         A             B
  actual 0         C             D
Accuracy estimation
• Class 1 accuracy: p1 = A/(A + B)
• Class 0 accuracy: p0 = D/(C + D)
• Total accuracy: p = (A + D)/(A + B + C + D)
Other criteria
• Sensitivity/specificity
• Positive/Negative Predictive Value (PPV/NPV)
• Balanced accuracy = (p1 + p0)/2

Regression validation
Consider N-fold CV:
• prediction error in one fold
• mean across all folds → out-of-sample "mean squared error", MSE = (1/n) Σi (yi − f(xi))², where yi is the target and f(xi) the prediction for held-out sample i
• other measure: correlation between predictions and targets

Inference by permutation testing
• H0: "no link between features and target"
• Test statistic, e.g. CV accuracy
• Estimate the distribution of the test statistic under H0: randomly permute the labels, estimate the CV accuracy, and repeat M times
• Calculate the p-value as the proportion of permuted statistics at least as large as the observed one (see the sketch below)
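The following is a minimal sketch of this validation pipeline: M-fold cross-validation of a linear SVM followed by a permutation test on the CV accuracy. It assumes NumPy and scikit-learn; the simulated data, the 5-fold split and the 1000 permutations are illustrative choices, not values prescribed by the course.

```python
# Minimal sketch: M-fold CV accuracy + permutation-based inference
# (assumes numpy + scikit-learn; data are simulated for illustration).
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_samples, n_voxels = 40, 500                  # few samples, many features
X = rng.standard_normal((n_samples, n_voxels))
y = np.repeat([1, -1], n_samples // 2)         # two conditions
X[y == 1, :10] += 0.8                          # weak signal in 10 "voxels"

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
clf = SVC(kernel="linear")

# Observed test statistic: mean accuracy across the M folds
acc_obs = cross_val_score(clf, X, y, cv=cv).mean()

# H0: no link between features and labels -> permute the labels,
# re-estimate the CV accuracy, and repeat to build the null distribution
n_perm = 1000
acc_null = np.array([cross_val_score(clf, X, rng.permutation(y), cv=cv).mean()
                     for _ in range(n_perm)])

# p-value = proportion of permuted accuracies at least as large as the observed one
p_value = (np.sum(acc_null >= acc_obs) + 1) / (n_perm + 1)
print(f"CV accuracy = {acc_obs:.2f}, p = {p_value:.3f}")
```

The same scheme applies to regression by swapping the accuracy for the out-of-sample MSE (with "at least as large" replaced by "at least as small").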
Conclusions
Univariate:
• 1 voxel
• Target → Data
• Look for a difference or a correlation
• General Linear Model
• GLM inversion → parameter & error terms
• Calculate the contrast of interest
• Voxel/cluster activation inference → localisation
Multivariate:
• 1 volume
• Data → Target
• Look for a similarity or a score
• Specific machine
• Machine training → machine parameters
• Estimate prediction accuracy with CV
• Sample label prediction inference → no localisation

…much more to come
• Pradeep → cross-validation, how & why
• Carsten → permutation & statistical inference
• Jessica → weight maps & their interpretation
• Jo → fMRI, BOLD signal & HRF
• Bertrand → fMRI, stability of your results
• Georg → multi-subject data
• Olivier → multi-modal data & disease prediction
• Janaina → psychiatric applications
• Vince → deep learning approaches in neuro-imaging
• Moritz → MVPA models interpretation

Thank you for your attention! Any questions?
Thanks to the PRoNTo Team for the borrowed slides.

References
Reviews:
• Haynes and Rees (2006). Decoding mental states from brain activity in humans. Nat. Rev. Neurosci., 7, 523-534.
• Pereira, Mitchell, Botvinick (2009). Machine learning classifiers and fMRI: a tutorial overview. NeuroImage, 45, S199-S209.
Books:
• Hastie, Tibshirani, Friedman (2003). The Elements of Statistical Learning. Springer.
• Shawe-Taylor and Cristianini (2004). Kernel Methods for Pattern Analysis. Cambridge University Press.
• Bishop (2006). Pattern Recognition and Machine Learning. Springer.
Machines:
• Burges (1998). A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2, 121-167.
• Rasmussen, Williams (2006). Gaussian Processes for Machine Learning. The MIT Press.
• Tipping (2001). Sparse Bayesian learning and the Relevance Vector Machine. Journal of Machine Learning Research, 1, 211-244.
• Breiman (1996). Bagging predictors. Machine Learning, 24, 123-140.
• Rakotomamonjy, Bach, Canu, Grandvalet (2008). SimpleMKL. Journal of Machine Learning Research, 9, 2491-2521.