Spatial Filtering for Brain Computer Interfaces
Spatial Filtering for Brain Computer Interfaces: A Comparison between the Common Spatial Pattern and Its Variant

He He and Dongrui Wu, Senior Member, IEEE
Key Laboratory of the Ministry of Education for Image Processing and Intelligent Control, School of Automation, Huazhong University of Science and Technology, Wuhan, China
arXiv:1808.06533v1 [eess.SP] 8 Aug 2018

Abstract: The electroencephalogram (EEG) is the most popular form of input for brain computer interfaces (BCIs). However, it can be easily contaminated by various artifacts and noise, e.g., eye blinks, muscle activities, powerline noise, etc. Therefore, the EEG signals are often filtered both spatially and temporally to increase the signal-to-noise ratio before they are fed into a machine learning algorithm for recognition. This paper considers spatial filtering, particularly, the common spatial pattern (CSP) filters for EEG classification. In binary classification, CSP seeks a set of filters to maximize the variance for one class while minimizing it for the other. We first introduce the traditional solution, and then a new solution based on a slightly different objective function. We performed comprehensive experiments on motor imagery to compare the two approaches, and found that generally the traditional CSP solution still gives better results. We also showed that adding regularization to the covariance matrices can improve the final classification performance, no matter which objective function is used.

Index Terms: Brain computer interface, common spatial pattern, motor imagery, Stiefel manifold, regularization

I. INTRODUCTION

Brain computer interface (BCI) [16] provides a direct communication pathway between a user and an external device, such that the device can recognize the user's brain status and respond accordingly. BCIs have found successful applications in robotics, text input, games, healthcare, etc. [13], [15].

The electroencephalogram (EEG) is the most popular form of BCI input as it is easy to acquire (no surgery) and offers high temporal resolution. However, there are still many challenges for wide-spread real-world applications of EEG-based BCIs [9], [11]. An important one is related to the EEG signal quality, as EEG signals can be easily contaminated by various artifacts and noise such as muscle movements, eye blinks, heartbeats, environmental electromagnetic fields, etc. Therefore, it is important to filter the EEG signals both spatially and temporally to increase the signal-to-noise ratio.

This paper focuses on the common spatial pattern (CSP) filters [2], [7], [8], [12], [14], [17], [18] for spatial filtering. CSP was first proposed by Koles et al. [8] to extract discriminative EEG features from two human populations (normal people and patients). Mueller-Gerking et al. [12] extended it to single-trial classification of motor imagery EEG, which remains the most popular and important application field of CSP filters.

CSP finds a set of spatial filters that can achieve good discrimination among different classes. At the same time, it can reduce the dimensionality of the EEG signals. Different approaches have been proposed in the literature to compute them [2], [8], [12], [17]. In this paper we review some typical approaches, and propose a new approach by solving a slightly different objective function in the optimization. We then use two motor imagery datasets in BCI Competition IV to compare these approaches.

The rest of this paper is organized as follows: Section II introduces existing approaches for computing the CSP filters. Section III introduces the proposed Stiefel manifold based approach. Section IV compares the performance of these approaches in motor imagery applications. Finally, Section V draws conclusions.

II. COMMON SPATIAL PATTERN (CSP)

Without loss of generality, this paper considers binary classification only. However, binary classification can be extended to multi-class classification by the one-versus-one approach or the one-versus-the-rest approach [17].

Let $X \in \mathbb{R}^{C \times T}$ be an EEG epoch, where $C$ is the number of channels and $T$ the number of time samples. Assume Class $c$ has $N_c$ epochs, and $X_{c,i}$ is the $i$th EEG epoch in Class $c$. Then, the class mean covariance matrices are:

$$\Sigma_0 = \frac{1}{N_0} \sum_{i=1}^{N_0} X_{0,i} X_{0,i}^T \tag{1}$$

$$\Sigma_1 = \frac{1}{N_1} \sum_{i=1}^{N_1} X_{1,i} X_{1,i}^T \tag{2}$$

CSP identifies a spatial filtering matrix $W \in \mathbb{R}^{C \times C'}$ ($C' < C$) from $\Sigma_0$ and $\Sigma_1$, which projects the EEG signals from the original $C$-dimensional space to a lower $C'$-dimensional space:

$$X^* = W^T X \in \mathbb{R}^{C' \times T} \tag{3}$$

Each column of $W$ is a spatial filter, designed such that the variance of the projected signal is maximized for one class and minimized for the other.

Different motivations and approaches for computing $W$ have been proposed in the literature. Some typical ones are introduced next.

A. Approach 1

The earliest CSP approach for EEG processing was proposed by Koles et al. [8]. It first forms a composite covariance matrix:

$$\Sigma = \Sigma_0 + \Sigma_1 \tag{4}$$

which has the following singular value decomposition (SVD):

$$\Sigma = U \Lambda U^T \tag{5}$$

where $U$ is an orthonormal matrix whose columns are normalized eigenvectors of $\Sigma$, and $\Lambda$ is a diagonal matrix whose diagonal terms are the corresponding eigenvalues.

It then constructs a whitening matrix

$$P = \Lambda^{-\frac{1}{2}} U^T \tag{6}$$

to equalize the variances in the space spanned by the eigenvectors. Applying the whitening transformation to both $\Sigma_0$ and $\Sigma_1$, we have

$$S_0 = P \Sigma_0 P^T \tag{7}$$

$$S_1 = P \Sigma_1 P^T \tag{8}$$

$$S_0 + S_1 = I \tag{9}$$

where $I$ is the identity matrix.

It can be shown that $S_0$ and $S_1$ share the same eigenvectors, and the sum of the corresponding eigenvalues always equals 1 [6], i.e., if we perform eigen decomposition on $S_0$ by

$$S_0 = U \Lambda_0 U^T \tag{10}$$

then we should also have

$$S_1 = U \Lambda_1 U^T \tag{11}$$

$$\Lambda_0 + \Lambda_1 = I \tag{12}$$

where $U$ is an orthogonal matrix whose columns are the normalized eigenvectors, i.e., $U U^T = I$.

Assume the diagonal terms of $\Lambda_0$ have been sorted in descending order. Then, the diagonal terms of $\Lambda_1$ must be in ascending order. The first column of $U$ accounts for the maximum variance in $S_0$ and the least variance in $S_1$, so it is very useful in discriminating between $S_0$ and $S_1$. Similarly, the last column of $U$ accounts for the least variance in $S_0$ and the maximum variance in $S_1$, so it is also very useful in discriminating between $S_0$ and $S_1$. In practice we usually select a few eigenvectors corresponding to the maximum eigenvalues, and also a few eigenvectors corresponding to the minimum eigenvalues, to form $W$.

In summary, Koles et al. [8] used the following procedure to compute the CSP matrix $W \in \mathbb{R}^{C \times C'}$:
1) Compute $\Sigma$ in (4) and its SVD in (5).
2) Compute the whitening matrix $P$ in (6).
3) Compute $S_0$ in (7) and its SVD in (10).
4) Sort the diagonal terms of $\Lambda_0$ in descending order, and adjust the columns of $U$ accordingly. Assemble $V$ as the first and last $C'/2$ columns of $U$.
5) Compute $W = P^T V$.

B. Approach 2

In Approach 1, we need to first calculate the whitening matrix $P$, then the orthogonal transformation matrix $U$, assemble $V$ from $U$, and finally obtain $W = P^T V$. Blankertz et al. [2] proposed a simpler solution to compute $W$ directly from $\Sigma_0$ and $\Sigma_1$.

According to [6], we can simultaneously diagonalize the two mean covariance matrices:

$$V^T \Sigma_0 V = \Lambda_0 \tag{13}$$

$$V^T \Sigma_1 V = \Lambda_1 \tag{14}$$

where $\Lambda_0 + \Lambda_1 = I$.

Assume the diagonal terms of $\Lambda_0$ have been sorted in descending order. Then, the diagonal terms of $\Lambda_1$ must be in ascending order. The first few columns of $V$ account for the maximum variance in $\Sigma_0$ and the least variance in $\Sigma_1$, so they are very useful in discriminating between $\Sigma_0$ and $\Sigma_1$. Similarly, the last few columns of $V$ account for the least variance in $\Sigma_0$ and the maximum variance in $\Sigma_1$, so they are also very useful in discriminating between $\Sigma_0$ and $\Sigma_1$.

In summary, Blankertz et al. [2] used the following procedure to compute the CSP matrix $W \in \mathbb{R}^{C \times C'}$:
1) Solve the following generalized eigenvalue problem

$$\Sigma_0 \mathbf{w} = \lambda \Sigma_1 \mathbf{w} \tag{15}$$

to obtain $\lambda_i$ and the corresponding $\mathbf{w}_i$, $i = 1, ..., C$.
2) Sort $\lambda_i$ and the corresponding $\mathbf{w}_i$ in descending order.
3) Construct $W = [\mathbf{w}_1, ..., \mathbf{w}_{C'/2}, \mathbf{w}_{C - C'/2 + 1}, ..., \mathbf{w}_C]$, i.e., $W$ uses the first and last $C'/2$ $\mathbf{w}_i$ as its columns.

C. Discussions

Although Approaches 1 and 2 use different procedures to compute $W$, they actually give the same results.

For Approach 1, from (7) and (10) we have:

$$S_0 = P \Sigma_0 P^T = U \Lambda_0 U^T \tag{16}$$

which can be rewritten as:

$$\Sigma_0 P^T U = P^{-1} U \Lambda_0 \tag{17}$$

Since $P$ is the whitening matrix for the composite covariance matrix $\Sigma$, i.e.,

$$P \Sigma P^T = I \tag{18}$$

we have

$$P^{-1} = \Sigma P^T \tag{19}$$

Substituting (19) into (17), it follows that:

$$\Sigma_0 P^T U = \Sigma P^T U \Lambda_0 \tag{20}$$

i.e.,

$$\Sigma^{-1} \Sigma_0 (P^T U) = (P^T U) \Lambda_0 \tag{21}$$

So the columns of the filtering matrix $P^T U$ are the eigenvectors of $\Sigma^{-1} \Sigma_0$, which are identical to the eigenvectors of $\Sigma_1^{-1} \Sigma_0$. $W = P^T V$ consists of a subset of the columns of $P^T U$.

For Approach 2, (15) can be rewritten as:

$$\Sigma_1^{-1} \Sigma_0 \mathbf{w} = \lambda \mathbf{w} \tag{22}$$

IV. EXPERIMENTS

In this section we compare the Stiefel manifold (SM) approach with the traditional CSP approach on two different motor imagery datasets, using two classifiers.

A.
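Returning to Section II, the following is a minimal NumPy sketch of Approaches 1 and 2 and of the equivalence claimed in the Discussions. It is not the authors' implementation. All function names (`class_mean_cov`, `select_columns`, `csp_approach1`, `csp_approach2`, `column_alignment`) and the synthetic data are assumptions made for illustration. Two substitutions are also assumed: `numpy.linalg.eigh` replaces the SVD in (5) and (10), since the matrices are symmetric, and the generalized eigenvalue problem (15) is solved through a Cholesky factorization of $\Sigma_1$, which is one possible numerical route and not something the text prescribes.

```python
import numpy as np

def class_mean_cov(epochs):
    """Mean of X_i X_i^T over epochs of shape (N, C, T), as in (1)-(2)."""
    return np.mean([X @ X.T for X in epochs], axis=0)

def select_columns(M, n_filters):
    """Keep the first and last n_filters/2 columns (n_filters even, >= 2)."""
    half = n_filters // 2
    return np.hstack([M[:, :half], M[:, M.shape[1] - half:]])

def csp_approach1(sigma0, sigma1, n_filters):
    """Whitening-based procedure (steps 1-5 of Approach 1)."""
    lam, U = np.linalg.eigh(sigma0 + sigma1)     # (4)-(5)
    P = np.diag(lam ** -0.5) @ U.T               # whitening matrix, (6)
    lam0, U0 = np.linalg.eigh(P @ sigma0 @ P.T)  # (7) and (10)
    order = np.argsort(lam0)[::-1]               # descending eigenvalues
    V = select_columns(U0[:, order], n_filters)
    return P.T @ V                               # W = P^T V

def csp_approach2(sigma0, sigma1, n_filters):
    """Generalized eigenvalue problem (15), solved via Cholesky (assumption)."""
    L = np.linalg.cholesky(sigma1)               # sigma1 = L L^T
    L_inv = np.linalg.inv(L)
    lam, Y = np.linalg.eigh(L_inv @ sigma0 @ L_inv.T)
    order = np.argsort(lam)[::-1]                # descending eigenvalues
    W_full = L_inv.T @ Y[:, order]               # w = L^{-T} y solves (15)
    return select_columns(W_full, n_filters)

def column_alignment(Wa, Wb):
    """Absolute cosine between matching columns (1.0 means same direction)."""
    Wa = Wa / np.linalg.norm(Wa, axis=0)
    Wb = Wb / np.linalg.norm(Wb, axis=0)
    return np.abs(np.sum(Wa * Wb, axis=0))

# Synthetic two-class data: C = 6 channels, T = 200 samples, 30 epochs each.
rng = np.random.default_rng(0)
C, T, N = 6, 200, 30
mix0 = rng.standard_normal((C, C))               # class-specific mixing
mix1 = rng.standard_normal((C, C))
epochs0 = mix0 @ rng.standard_normal((N, C, T))
epochs1 = mix1 @ rng.standard_normal((N, C, T))

sigma0 = class_mean_cov(epochs0)
sigma1 = class_mean_cov(epochs1)

W1 = csp_approach1(sigma0, sigma1, n_filters=4)
W2 = csp_approach2(sigma0, sigma1, n_filters=4)
alignment = column_alignment(W1, W2)
print(alignment)
```

In this sketch the two filter matrices agree only up to the scale and sign of each column: Approach 1 normalizes each filter so that $\mathbf{w}^T \Sigma \mathbf{w} = 1$, whereas the Cholesky route gives $\mathbf{w}^T \Sigma_1 \mathbf{w} = 1$. The comparison therefore uses normalized columns, which is sufficient when features are later computed from relative variances.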