MATHEMATICAL ENGINEERING TECHNICAL REPORTS Spectrally
Total Page:16
File Type:pdf, Size:1020Kb
MATHEMATICAL ENGINEERING TECHNICAL REPORTS Spectrally weighted Common Spatial Pattern algorithm for single trial EEG classification Ryota TOMIOKA, Guido DORNHEGE, Guido NOLTE, Benjamin BLANKERTZ, Kazuyuki AIHARA, and Klaus-Robert MULLER¨ METR 2006–40 July 2006 DEPARTMENT OF MATHEMATICAL INFORMATICS GRADUATE SCHOOL OF INFORMATION SCIENCE AND TECHNOLOGY THE UNIVERSITY OF TOKYO BUNKYO-KU, TOKYO 113-8656, JAPAN WWW page: http://www.i.u-tokyo.ac.jp/mi/mi-e.htm The METR technical reports are published as a means to ensure timely dissemination of scholarly and technical work on a non-commercial basis. Copyright and all rights therein are maintained by the authors or by other copyright holders, notwithstanding that they have offered their works here electronically. It is understood that all persons copying this information will adhere to the terms and constraints invoked by each author’s copyright. These works may not be reposted without the explicit permission of the copyright holder. Spectrally weighted Common Spatial Pattern algorithm for single trial EEG classification Ryota TOMIOKA1;2, Guido DORNHEGE2, Guido NOLTE2, Benjamin BLANKERTZ2, Kazuyuki AIHARA1, and Klaus-Robert MULLER¨ 2;3 1 Dept. Mathematical Informatics, IST, The University of Tokyo, Japan 2 Fraunhofer FIRST.IDA, Berlin, Germany 3 Technical University Berlin, Germany [email protected] July 5, 2006 Abstract We propose a simultaneous spatio-temporal filter optimization algo- rithm for the single trial ElectroEncephaloGraphy (EEG) classification problem. The algorithm is a generalization of the Common Spatial Pat- tern (CSP) algorithm, which incorporates non-homogeneous weighting of the cross-spectrum matrices. We alternately update the spectral weighting coefficients and the spatial projection. The cross validation results of our SPECtrally-weighted CSP (SPEC-CSP) algorithm on 162 EEG datasets show significant improvements when compared to wide-band filtered CSP and similar accuracy as Common Sparse Spec- tral Spatial Pattern (CSSSP), however, with far less computational cost. The proposed method is at the same time highly interpretable and modular because the temporal filter is parameterized in the fre- quency domain. The possibility of incorporating any prior filter opens up the applicability of the method far beyond brain signals. 1 Introduction The goal of the Brain-Computer Interface (BCI) research is to provide a direct control pathway from human intentions reflected in their brain signals to computers. Recently, a considerable amount of effort has been done in the development of BCI systems [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. We will be focusing on non-invasive, electroencephalogram (EEG) based BCIs. Such a system not only provides disabled people more direct and natural 1 control over a neuro-prosthesis or over a computer application (e.g. [3, 4]) but also opens up an opportunity for healthy people to communicate solely by their intentions. The study of BCIs consists of two parts; the first part is the techniques that development of such a system requires and the second part is the rela- tionship between the system and the user. The design of an experimental paradigm or a study on subject-to-subject or session-to-session variability belongs to the second category, whereas a feature extraction technique (e.g., this study) or a classification algorithm belongs to the first category. Al- though novel techniques are often more general in the sense that they could be applied to other types of problems, yet the importance of interpretability and transparency of the method as the basis of second type of BCI studies is often neglected. Recently machine learning approaches to BCI have proven to be effective by making the subject training required in a classical subject “conditioning” framework unnecessary and allow to compensate for the high inter-subject variability. The task is to extract subject-specific discriminative patterns from high- dimensional spatio-temporal EEG signals. We study a BCI based on the motor imagination paradigm. Motor imagination can be captured through spatially localized band-power modulation in ¹- (7-15Hz) and ¯- (15-30Hz) rhythms; underlying neuro-physiology is well known as Event Related Desyn- chronization (ERD) [14]. With respect to the topographic patterns of brain rhythm modulations, the Common Spatial Pattern (CSP) (see [15, 16, 17]) algorithm, which was first introduced as a decomposition method that finds projections common to two states of brain activity (e.g., abnormal or nor- mal) and afterwards successfully applied to the classification problem of the two states, has also proven to be very useful for motor-imagery BCI. On the other hand, the frequency band on which the CSP algorithm operates, has been either selected manually or unspecifically set to a broad band fil- ter [17, 5]. Recently, Lemm et al. [18] proposed a method called Common Spatio Spectral Pattern (CSSP) which applies the CSP algorithm to a time- delay embedded signal; they doubled the number of electrodes with the addition of ¿ delayed channels. The challenge in their approach was that the selection of the time-lag parameter ¿, which embodies the whole prob- lem of choosing characteristic temporal structure, was only possible through cross validation on the training set. Dornhege et al. [19] proposed a method called Common Sparse Spectral Spatial Pattern (CSSSP) which solves the CSP problem not only for the spatial projection but also for the Finite Im- pulse Response (FIR) temporal filter coefficients. The difficulty in this work was the computational inefficiency of the optimization procedure, in which solving a generalized eigenvalue problem was required for the evaluation of the objective function at every point. In this paper, we present a method for simultaneous spatio-temporal 2 filter optimization, which is an iterative procedure of solving a spectrally weighted CSP problem and updating the spectral weighting coefficients. The proposed method is highly interpretable and modular at the same time because the temporal filter is parameterized in the frequency domain. More- over it is capable of handling arbitrary prior filters based on neurophysiolog- ical insights. The proposed method outperforms wide-band filtered CSP in most datasets. Moreover, a detailed validation shows how much of the gain is obtained by the theoretically obtained filter and how much is obtained by imposing a suitable prior filter. This paper is organized as follows: in Sec. 2 the method is proposed; in Sec. 3 our novel SPEC-CSP is compared against all three recent filter methods, namely, CSP, CSSP, and CSSSP; in Sec. 4 the optimal combination of a spectral filter obtained from the statistical criterion and prior filters is investigated in detail; finally in Sec. 5 concluding remarks are given. 2 Method Let us denote by X 2 Rd£T the EEG signal of a single trial of an imaginary motor movement1, where d is the number of electrodes and T is the number of sampled time-points in a trial. We consider a binary classification problem where each class, e.g. right or left hand imaginary movement, is called the positive (+) or negative (¡) class. The task is to predict the class label for a single trial X. In this study, we use a feature vector, namely log-power features defined as follows: y y y Áj(X; wj;Bj) = log wjXBjBj X wj (j = 1;:::;J); (1) where the upper-script y denotes a conjugate transpose or a transpose for a d real matrix, wj 2 R is a spatial projection that projects the signal into a T £T single dimension, and Bj 2 R denotes the linear time-invariant temporal filter. A log-power feature Áj captures a brain-rhythm modulation, which is not only spatially localized (captured by wj) but also localized in the frequency domain (captured by Bj). The training of a classifier is composed of two steps. In the first step, the coefficients wj and Bj are optimized. In the second step, the Linear Discriminant Analysis (LDA) classifier is trained on the feature vector. Note that the whole procedure is equivalent to the conventional CSP based classifiers (see [17, 5, 20]) except that in this study the temporal filter Bj is also optimized, which was manually chosen and fixed in the previous work. 1For simplicity, we assume that the trial mean is already subtracted and the signal is scaled by the inverse square root of the number of time-points. This can be achieved by 1 ` 1 y´ a linear transformation X = p X IT ¡ 11 . T original T 3 We adopt the Common Spatial Pattern (CSP) [15, 16, 17] algorithm for the optimization of the spatial projection w. The idea is to simul- taneously diagonalize the sensor covariance matrices corresponding to two motor-imagery classes. Here, a sensor covariance refers to the covariance between channels averaged over time as well as trials2: D Ec Σ(c)(B) := XBByXy ; where angled brackets h¢ic denote expectation within a class c 2 f+; ¡g. Using the fact that a linear time-invariant temporal filter B is diagonal in y y 3 the frequency domain U BB U = diag(®1; : : : ; ®T ) , we can rewrite the sensor covariance matrix as a weighted sum of cross-spectrum matrices as follows: T 0 T X X D Ec D Ec (c) c y y y y y (c) Σ (®) := ®k hVki := ®k xˆkxˆk = XUU BB UU X = Σ (B); k=1 k=1 where U := f p1 e¡2¼ikl=T g 2 CT £T is the Fourier transformation (thus T kl h i y d y U U = IT ), xˆk 2 C is the k-th frequency component, Vk := 2 ¢ Re xˆkxˆk c d£d and hVki 2 R is the real-part of the k-th frequency component in the cross spectrum. Here, without loss of generality we only take the real-part of the cross spectrum because (a) the imaginary part cancels out since the T spectrum of the filter f®kgk=1 is symmetric around the Nyquist frequency and (b) the phase of the signal is irrelevant to the log-power feature (Eq.