arXiv:1808.06533v1 [eess.SP] 8 Aug 2018

Spatial Filtering for Brain Computer Interfaces: A Comparison between the Common Spatial Pattern and Its Variant

He He and Dongrui Wu, Senior Member, IEEE
Key Laboratory of the Ministry of Education for Image Processing and Intelligent Control,
School of Automation, Huazhong University of Science and Technology, Wuhan, China
E-mail: [email protected], [email protected]

Abstract: The electroencephalogram (EEG) is the most popular form of input for brain computer interfaces (BCIs). However, it can be easily contaminated by various artifacts and noise, e.g., eye blink, muscle activities, powerline noise, etc. Therefore, the EEG signals are often filtered both spatially and temporally to increase the signal-to-noise ratio before they are fed into a machine learning algorithm for recognition. This paper considers spatial filtering, particularly, the common spatial pattern (CSP) filters for EEG classification. In binary classification, CSP seeks a set of filters to maximize the variance for one class while minimizing it for the other. We first introduce the traditional solution, and then a new solution based on a slightly different objective function. We performed comprehensive experiments on motor imagery to compare the two approaches, and found that generally the traditional CSP solution still gives better results. We also showed that adding regularization to the covariance matrices can improve the final classification performance, no matter which objective function is used.

Index Terms: Brain computer interface, common spatial pattern, motor imagery, Stiefel manifold, regularization

I. INTRODUCTION

Brain computer interface (BCI) [16] provides a direct communication pathway between a user and an external device, such that the device can recognize the user's brain status and respond accordingly. BCIs have found successful applications in robotics, text input, games, healthcare, etc. [13], [15].

The electroencephalogram (EEG) is the most popular form of input of BCI, as it is easy to acquire (no surgery) and offers high temporal resolution. However, there are still many challenges for wide-spread real-world applications of EEG based BCIs [9], [11]. An important one is related to the EEG signal quality, as EEG signals can be easily contaminated by various artifacts and noise, such as muscle movements, eye blinks, heartbeats, environmental electromagnetic fields, etc. Therefore, it is important to filter the EEG signals both spatially and temporally to increase the signal-to-noise ratio.

This paper focuses on the common spatial pattern (CSP) filters [2], [7], [8], [12], [14], [17], [18] for spatial filtering. CSP was first proposed by Koles et al. [8] to extract discriminative features from EEG of two human populations (normal people and patients). Mueller-Gerking et al. [12] extended it to single-trial classification of motor imagery EEG, which remains the most popular and important application field of CSP filters.

CSP finds a set of spatial filters that can achieve good discrimination among different classes. At the same time, it can reduce the dimensionality of the EEG signals. Different approaches have been proposed in the literature to compute them [2], [8], [12], [17]. In this paper we review some typical approaches, and propose a new approach by solving a slightly different objective function in the optimization. We then use two motor imagery datasets in BCI Competition IV to compare these approaches.

The rest of this paper is organized as follows: Section II introduces existing approaches for computing the CSP filters. Section III introduces the proposed Stiefel manifold based approach. Section IV compares the performance of these approaches in motor imagery applications. Finally, Section V draws conclusions.

II. COMMON SPATIAL PATTERN (CSP)

Without loss of generality, this paper considers binary classification only. However, multi-class classification can be extended from binary classification by the one-versus-one approach or the one-versus-the-rest approach [17].

Let $X \in \mathbb{R}^{C \times T}$ be an EEG epoch, where $C$ is the number of channels and $T$ the number of time samples. Assume Class $c$ ($c = 0, 1$) has $N_c$ epochs, and $X_{c,i}$ is the $i$th EEG epoch in Class $c$. Then, the class mean covariance matrices are:

$$\Sigma_0 = \frac{1}{N_0} \sum_{i=1}^{N_0} X_{0,i} X_{0,i}^T \tag{1}$$

$$\Sigma_1 = \frac{1}{N_1} \sum_{i=1}^{N_1} X_{1,i} X_{1,i}^T \tag{2}$$

CSP identifies a spatial filtering matrix $W \in \mathbb{R}^{C \times C'}$ ($C' < C$), which projects the EEG signals from the original $C$-dimensional space to a lower $C'$-dimensional space:

$$X^* = W^T X \in \mathbb{R}^{C' \times T} \tag{3}$$

Each column of $W$ is a spatial filter, and it is designed such that for the projected signal, the variance for one class is maximized, meanwhile the variance for the other class is minimized.

Different motivations and approaches for computing $W$ have been proposed in the literature. Some typical ones are introduced next.

A. Approach 1

The earliest CSP approach for EEG processing was proposed by Koles et al. [8]. It first forms a composite covariance matrix:

$$\Sigma = \Sigma_0 + \Sigma_1 \tag{4}$$

which has the following singular value decomposition (SVD):

$$\Sigma = U \Lambda U^T \tag{5}$$

where $U$ is an orthonormal matrix whose columns are normalized eigenvectors of $\Sigma$, and $\Lambda$ is a diagonal matrix, whose diagonal terms are the corresponding eigenvalues.

It then constructs a whitening matrix

$$P = \Lambda^{-\frac{1}{2}} U^T \tag{6}$$

to equalize the variances in the space spanned by the eigenvectors. Applying the whitening transformation to both $\Sigma_0$ and $\Sigma_1$, we have

$$S_0 = P \Sigma_0 P^T \tag{7}$$

$$S_1 = P \Sigma_1 P^T \tag{8}$$

$$S_0 + S_1 = I \tag{9}$$

where $I$ is the identity matrix.

It can be shown that $S_0$ and $S_1$ share the same eigenvectors, and the sum of the corresponding eigenvalues always equals 1 [6], i.e., if we perform eigen decomposition on $S_0$ by

$$S_0 = U \Lambda_0 U^T \tag{10}$$

Then we should also have

$$S_1 = U \Lambda_1 U^T \tag{11}$$

$$\Lambda_0 + \Lambda_1 = I \tag{12}$$

where $U$ is an orthogonal matrix whose columns are the normalized eigenvectors, i.e., $U U^T = I$. (From (10) onward, $U$ denotes the eigenvectors of $S_0$, not those of $\Sigma$ in (5).)

Assume the diagonal terms of $\Lambda_0$ have been sorted in descending order. Then, the diagonal terms of $\Lambda_1$ must be in ascending order. The first column of $U$ accounts for the maximum variance in $S_0$ and the least variance in $S_1$, so it is very useful in discriminating between $S_0$ and $S_1$. Similarly, the last column of $U$ accounts for the least variance in $S_0$ and the maximum variance in $S_1$, so it is also very useful in discriminating between $S_0$ and $S_1$. In practice we usually select a few eigenvectors corresponding to the maximum eigenvalues, and also a few eigenvectors corresponding to the minimum eigenvalues, to form $W$.

In summary, Koles et al. [8] used the following procedure to compute the CSP matrix $W \in \mathbb{R}^{C \times C'}$:
1) Compute $\Sigma$ in (4) and its SVD in (5).
2) Compute the whitening matrix $P$ in (6).
3) Compute $S_0$ in (7) and its SVD in (10).
4) Sort the diagonal terms of $\Lambda_0$ in descending order, and adjust the columns of $U$ accordingly. Assemble $V$ as the first and last $C'/2$ columns of $U$.
5) Compute $W = P^T V$.

B. Approach 2

In Approach 1, we need to first calculate the whitening matrix $P$, then the orthogonal transformation $U$, assemble $V$ from $U$, and finally obtain $W = P^T V$. Blankertz et al. [2] proposed a simpler solution to compute $W$ directly from $\Sigma_0$ and $\Sigma_1$.

According to [6], we can simultaneously diagonalize the two mean covariance matrices:

$$V^T \Sigma_0 V = \Lambda_0 \tag{13}$$

$$V^T \Sigma_1 V = \Lambda_1 \tag{14}$$

where $\Lambda_0 + \Lambda_1 = I$.

Assume the diagonal terms of $\Lambda_0$ have been sorted in descending order. Then, the diagonal terms of $\Lambda_1$ must be in ascending order. The first few columns of $V$ account for the maximum variance in $\Sigma_0$ and the least variance in $\Sigma_1$, so they are very useful in discriminating between $\Sigma_0$ and $\Sigma_1$. Similarly, the last few columns of $V$ account for the least variance in $\Sigma_0$ and the maximum variance in $\Sigma_1$, so they are also very useful in discriminating between $\Sigma_0$ and $\Sigma_1$.

In summary, Blankertz et al. [2] used the following procedure to compute the CSP matrix $W \in \mathbb{R}^{C \times C'}$:
1) Solve the following generalized eigenvalue problem

$$\Sigma_0 \mathbf{w} = \lambda \Sigma_1 \mathbf{w} \tag{15}$$

to obtain $\lambda_i$ and the corresponding $\mathbf{w}_i$, $i = 1, ..., C$.
2) Sort $\lambda_i$ and the corresponding $\mathbf{w}_i$ in descending order.
3) Construct $W = [\mathbf{w}_1, ..., \mathbf{w}_{C'/2}, \mathbf{w}_{C - C'/2 + 1}, ..., \mathbf{w}_C]$, i.e., $W$ uses the first and last $C'/2$ $\mathbf{w}_i$ as its columns.

C. Discussions

Although Approaches 1 and 2 use different procedures to compute $W$, they actually give the same results.

For Approach 1, from (7) and (10) we have:

$$S_0 = P \Sigma_0 P^T = U \Lambda_0 U^T \tag{16}$$

which can be rewritten as:

$$\Sigma_0 P^T U = P^{-1} U \Lambda_0 \tag{17}$$

Since $P$ is the whitening matrix for the composite covariance matrix $\Sigma$, i.e.,

$$P \Sigma P^T = I \tag{18}$$

we have

$$P^{-1} = \Sigma P^T \tag{19}$$

Substituting (19) into (17), it follows that:

$$\Sigma_0 P^T U = \Sigma P^T U \Lambda_0 \tag{20}$$

i.e.,

$$\Sigma^{-1} \Sigma_0 (P^T U) = (P^T U) \Lambda_0 \tag{21}$$

So the filtering matrix $P^T U$ consists of the eigenvectors of $\Sigma^{-1} \Sigma_0$, which are identical to the eigenvectors of $\Sigma_1^{-1} \Sigma_0$: if $\Sigma^{-1} \Sigma_0 \mathbf{v} = \mu \mathbf{v}$, then $\Sigma_0 \mathbf{v} = \mu (\Sigma_0 + \Sigma_1) \mathbf{v}$, and hence $\Sigma_1^{-1} \Sigma_0 \mathbf{v} = \frac{\mu}{1 - \mu} \mathbf{v}$. $W = P^T V$ consists of a subset of the columns of $P^T U$.

For Approach 2, (15) can be rewritten as:

$$\Sigma_1^{-1} \Sigma_0 \mathbf{w} = \lambda \mathbf{w} \tag{22}$$

So each $\mathbf{w}$ is also an eigenvector of $\Sigma_1^{-1} \Sigma_0$.

In summary, the above derivations show that $W$ can be constructed from the eigenvectors of $\Sigma_1^{-1} \Sigma_0$, and this is actually the most frequently used approach in computing the CSP in practice.

Finally, if we take a closer look at the rationale for CSP, we can conclude that the CSP filters actually optimize the following objective function:

$$\mathrm{Ratio1} = \arg\max_{W = [\mathbf{w}_1, ..., \mathbf{w}_{C'}]} \sum_{i=1}^{C'/2} \frac{\mathbf{w}_i^T \Sigma_0 \mathbf{w}_i}{\mathbf{w}_i^T \Sigma_1 \mathbf{w}_i} + \sum_{i=C'/2+1}^{C'} \frac{\mathbf{w}_i^T \Sigma_1 \mathbf{w}_i}{\mathbf{w}_i^T \Sigma_0 \mathbf{w}_i} \tag{23}$$

III. A NEW APPROACH FOR COMPUTING THE SPATIAL FILTERS IN THE STIEFEL MANIFOLD

Ratio1 in (23) optimizes the sum of ratios. A closely related objective function is to optimize the ratio of sums:

$$\mathrm{Ratio2} = \arg\max_{W = [\mathbf{w}_1, ..., \mathbf{w}_{C'}]} \frac{\sum_{i=1}^{C'/2} \mathbf{w}_i^T \Sigma_0 \mathbf{w}_i}{\sum_{i=1}^{C'/2} \mathbf{w}_i^T \Sigma_1 \mathbf{w}_i} + \frac{\sum_{i=C'/2+1}^{C'} \mathbf{w}_i^T \Sigma_1 \mathbf{w}_i}{\sum_{i=C'/2+1}^{C'} \mathbf{w}_i^T \Sigma_0 \mathbf{w}_i} \tag{24}$$

We are interested in (24) because:
1) (24) is very similar to (23). In fact, they are identical when $C' = 2$. So, we would like to investigate whether this new objective function could result in better classification performance.
2) Although both $\Sigma_0$ and $\Sigma_1$ are symmetric, generally $\Sigma_1^{-1} \Sigma_0$ is not symmetric, so its eigenvectors are not mutually orthogonal. In other words, the columns of $W$ obtained from optimizing Ratio1 are correlated, which may encode redundant information. On the other hand, when solving Ratio2, it is possible to make the first $C'/2$ columns of $W$ orthogonal, and also the last $C'/2$ columns orthogonal (although the first and last $C'/2$ columns are generally not mutually orthogonal). It is interesting to investigate whether this orthogonality can improve the classification performance.

The two terms in (24) are independent, so we can compute them separately. Unfortunately, they do not have a closed-form solution. Cunningham and Ghahramani [4] proposed an approach for solving this problem in the Stiefel manifold (SM), which is the set of ordered tuples of orthonormal vectors. Their algorithm is complex and iterative, and we refer the readers to [4] for it. The authors have also provided their Matlab code¹, which was used in our experiment.

IV. EXPERIMENTS

In this section we compare the SM approach with the traditional CSP approach on two different motor imagery datasets, using two classifiers.

A. Datasets

Both datasets were from BCI Competition IV².

The first is Dataset 1 [3], which was recorded from seven healthy subjects. For each subject two classes of motor imagery were selected from the three classes: left hand, right hand, and foot. Continuous EEG signals were acquired from 59 channels and were divided into three parts: calibration data, evaluation data, and special feature. We only used calibration data in this paper, and each subject had 100 trials in each class.

The second is Dataset 2a, which consists of EEG data from nine subjects. Every subject was instructed to perform four different motor imagery tasks, namely the imagination of movement of the left hand, right hand, both feet, and tongue. The signals were recorded using 22 EEG channels and 3 EOG channels. We only used two classes (left hand and right hand), and each class has 72 trials.

B. Preprocessing and Classifiers

The EEG signals were preprocessed using the Matlab EEGLAB toolbox [5], following the guideline in [2]. First, a band-pass filter (7-30 Hz) was applied to remove muscle artifacts, line-noise contamination and DC drift. Then, we extracted EEG signals between [1, 3.5] seconds after the cue appearance as our trials.

For each subject, we randomly selected 50% of the trials for training, and the remaining 50% for testing, and repeated this process 30 times to get statistically meaningful results. For a given partition, we computed the spatial filters by the traditional CSP approach, and also the proposed SM approach. We then tested two different classifiers:
1) Linear discriminant analysis (LDA), as in [2]. The features for the $i$th trial were:

$$f_i^j = \log(\mathbf{w}_j^T X_i X_i^T \mathbf{w}_j), \quad j = 1, ..., C' \tag{25}$$

2) Minimum distance to Riemannian mean (MDRM), as in [1]. The features were the covariance matrices of the trials.

C. Experimental Results

The classification accuracies for different subjects, averaged across 30 runs, are shown as the first four bars in Figs. 1 and 2, for different numbers of spatial filters. The horizontal axis shows the indices of the subjects, and also the average across the subjects. Observe that for both LDA and MDRM, generally the performances of SM were slightly worse than CSP. Also, generally the performance of MDRM was slightly worse than LDA.

Ratio1 and Ratio2 from the two objective functions are shown in Figs. 3 and 4. Observe that CSP always had higher Ratio1 than SM, and SM always had higher Ratio2 than CSP, which are as expected. However, it seems that Ratio1
is a better objective function, since a higher Ratio1 usually results in a better classification accuracy.

¹ http://github.com/cunni/ldr
² http://www.bbci.de/competition/iv/

[Figure: bar charts of classification accuracy (vertical axis) versus subject index and the average across subjects (horizontal axis). Bars: CSP+LDA, SM+LDA, CSP+MDRM, SM+MDRM, RCSP+LDA, RSM+LDA, RCSP+MDRM, RSM+MDRM.]

Fig. 1. Classification accuracies on Dataset 1. (a) C′ = 4; (b) C′ = 6; (c) C′ = 8.

Fig. 2. Classification accuracies on Dataset 2. (a) C′ = 4; (b) C′ = 6; (c) C′ = 8.

[Figure: bar charts of Ratio1 and Ratio2 (vertical axes) versus subject index and the average across subjects (horizontal axis). Bars: CSP, SM, RCSP, RSM.]

Fig. 4. Ratio1 and Ratio2 on Dataset 2. (a) C′ = 4; (b) C′ = 6; (c) C′ = 8.
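The claim in Section II that Approaches 1 and 2 give the same filters can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses synthetic Gaussian epochs rather than EEG, the function names (`mean_covariance`, `select_extremes`, `csp_whitening`, `csp_generalized`) are hypothetical, and it relies on NumPy and SciPy rather than the Matlab code used in the paper. The two approaches normalize their filters differently, so the columns are compared up to sign and scale.

```python
import numpy as np
from scipy.linalg import eigh


def mean_covariance(epochs):
    """Class mean of X X^T over epochs shaped (n_epochs, C, T), as in (1)-(2)."""
    return np.mean([x @ x.T for x in epochs], axis=0)


def select_extremes(vectors, n_filters):
    """Keep the first and last n_filters/2 columns (columns sorted by descending eigenvalue)."""
    half = n_filters // 2
    return np.hstack([vectors[:, :half], vectors[:, -half:]])


def csp_whitening(sigma0, sigma1, n_filters):
    """Approach 1: whiten Sigma0 + Sigma1, eigendecompose S0, return W = P^T V."""
    evals, u = np.linalg.eigh(sigma0 + sigma1)
    p = np.diag(evals ** -0.5) @ u.T              # whitening matrix P in (6)
    lam0, u0 = np.linalg.eigh(p @ sigma0 @ p.T)   # eigenvectors of S0 in (10)
    u0 = u0[:, np.argsort(lam0)[::-1]]            # sort by descending eigenvalue
    return p.T @ select_extremes(u0, n_filters)


def csp_generalized(sigma0, sigma1, n_filters):
    """Approach 2: solve Sigma0 w = lambda Sigma1 w in (15) directly."""
    lam, w = eigh(sigma0, sigma1)                 # generalized symmetric eigenproblem
    w = w[:, np.argsort(lam)[::-1]]
    return select_extremes(w, n_filters)


# Synthetic two-class data (an assumption for illustration, not EEG).
rng = np.random.default_rng(0)
C, T, N = 6, 200, 20
epochs0 = np.einsum("ij,njt->nit", rng.standard_normal((C, C)), rng.standard_normal((N, C, T)))
epochs1 = np.einsum("ij,njt->nit", rng.standard_normal((C, C)), rng.standard_normal((N, C, T)))
sigma0, sigma1 = mean_covariance(epochs0), mean_covariance(epochs1)

W1 = csp_whitening(sigma0, sigma1, n_filters=4)
W2 = csp_generalized(sigma0, sigma1, n_filters=4)


def unit_columns(W):
    """Rescale every column to unit Euclidean norm."""
    return W / np.linalg.norm(W, axis=0)


# Absolute cosine between matching columns; values near 1 mean the filters agree.
cosines = np.abs(np.sum(unit_columns(W1) * unit_columns(W2), axis=0))
print(cosines)
```

The comparison uses cosines because a CSP filter is only defined up to a nonzero scale factor: rescaling a column of W changes neither ratio term in (23).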

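To make the difference between the two objectives concrete, the following sketch evaluates the quantities inside the arg max of (23) and (24) for a given filter matrix. It is a minimal illustration, not the authors' code: the helper names (`quad_forms`, `ratio1`, `ratio2`, `random_spd`) are hypothetical, and random symmetric positive definite matrices stand in for the class covariance matrices.

```python
import numpy as np


def quad_forms(W, sigma):
    """Return w_i^T sigma w_i for every column w_i of W."""
    return np.einsum("ci,cd,di->i", W, sigma, W)


def ratio1(W, sigma0, sigma1):
    """Sum of ratios, the objective value in (23)."""
    half = W.shape[1] // 2
    q0, q1 = quad_forms(W, sigma0), quad_forms(W, sigma1)
    return np.sum(q0[:half] / q1[:half]) + np.sum(q1[half:] / q0[half:])


def ratio2(W, sigma0, sigma1):
    """Ratio of sums, the objective value in (24)."""
    half = W.shape[1] // 2
    q0, q1 = quad_forms(W, sigma0), quad_forms(W, sigma1)
    return q0[:half].sum() / q1[:half].sum() + q1[half:].sum() / q0[half:].sum()


def random_spd(rng, n):
    """Random symmetric positive definite matrix (stand-in for a covariance matrix)."""
    a = rng.standard_normal((n, n))
    return a @ a.T + n * np.eye(n)


rng = np.random.default_rng(0)
sigma0, sigma1 = random_spd(rng, 5), random_spd(rng, 5)
W = rng.standard_normal((5, 4))        # C = 5 channels, C' = 4 filters

print(ratio1(W, sigma0, sigma1), ratio2(W, sigma0, sigma1))

# With C' = 2 each sum has a single term, so the two objectives coincide.
W_pair = W[:, [0, 3]]

# Rescaling individual filters leaves every ratio term in (23) unchanged.
scales = np.array([1.0, 2.0, 3.0, 4.0])
```

Ratio1 is unaffected by the scale of each individual filter, whereas Ratio2 in general is only unaffected by a common rescaling within each half of W. This is one reason a norm constraint, such as the orthonormality imposed in the Stiefel manifold, is natural when optimizing (24).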
Fig. 3. Ratio1 and Ratio2 on Dataset 1. (a) C′ = 4; (b) C′ = 6; (c) C′ = 8.

The average correlation coefficients between the columns of $W$ for different subjects are shown in Tables I and II, for Datasets 1 and 2, respectively. Observe that on average the columns of $W$ computed from the SM approach were less correlated; however, this did not necessarily result in better classification performance.

TABLE I
AVERAGE CORRELATION COEFFICIENTS BETWEEN THE COLUMNS OF W FOR DATASET 1.

Subject   1      2      3      4      5      6      7      Avg
CSP       .1393  .0760  .1064  .0797  .0839  .1122  .1148  .1018
SM        .0907  .0477  .0399  .1373  .0738  .0943  .0886  .0817

TABLE II
AVERAGE CORRELATION COEFFICIENTS BETWEEN THE COLUMNS OF W FOR DATASET 2.

Subject   1      2      3      4      5      6      7      8      9      Avg
CSP       .1807  .3224  .2571  .1776  .1416  .2116  .2419  .1904  .1665  .2100
SM        .1744  .1759  .1522  .1488  .1280  .1481  .1354  .1288  .1521  .1493

D. Discussions

The above results showed that generally a larger Ratio1 results in better classification performance. We would like to study if this is always true. Lotte and Guan [10] showed that regularization on the traditional CSPs can improve the classification performance. In this subsection we study if regularization can also improve Ratio1, i.e., whether the improved performance is due to the increased Ratio1.

Several different regularized CSP (RCSP) approaches have been proposed in [10]. In this paper we compute the first $C'/2$ columns of $W$ in the RCSP from the eigenvectors of $(\Sigma_1 + \lambda I)^{-1} \Sigma_0$, and the last $C'/2$ columns from the eigenvectors of $(\Sigma_0 + \lambda I)^{-1} \Sigma_1$, where $\lambda$ is an adjustable parameter identified by cross-validation on the labeled data [10]. This approach has shown good performance in [10].

Similarly, we also develop a regularized SM (RSM) approach, where the first $C'/2$ columns of $W$ in the RSM are computed from maximizing

$$\frac{\sum_{i=1}^{C'/2} \mathbf{w}_i^T \Sigma_0 \mathbf{w}_i}{\sum_{i=1}^{C'/2} \mathbf{w}_i^T (\Sigma_1 + \lambda I) \mathbf{w}_i},$$

and the last $C'/2$ columns are computed from maximizing

$$\frac{\sum_{i=C'/2+1}^{C'} \mathbf{w}_i^T \Sigma_1 \mathbf{w}_i}{\sum_{i=C'/2+1}^{C'} \mathbf{w}_i^T (\Sigma_0 + \lambda I) \mathbf{w}_i},$$

in which $\lambda$ is again an adjustable parameter identified by cross-validation on the labeled data.

The classification accuracies for different subjects, averaged across 30 runs, are shown as the last four bars in Figs. 1 and 2, for different numbers of spatial filters. Observe that:
1) Generally the performance of RCSP was better than CSP, and the performance of RSM was better than SM, for both LDA and MDRM. This confirmed the observations in [10].
2) For both LDA and MDRM, generally the performances of RSM were slightly worse than RCSP. This pattern was also observed between SM and CSP.
3) For both RCSP and RSM, generally the performance of MDRM was slightly worse than LDA. Again, this pattern was observed before for CSP and SM.

Next we check if the improved performance of RCSP and RSM over CSP and SM was indeed due to increased Ratio1. For this purpose, we plot Ratio1 and Ratio2 from RCSP and RSM as the last two bars of each plot in Figs. 3 and 4. Observe that:
1) Although RCSP had higher classification accuracy than CSP, its Ratio1 and Ratio2 were slightly worse than those from the unregularized CSP. A similar pattern can also be observed from RSM and SM. This is reasonable, as CSP directly optimizes Ratio1, whereas the objective function for RCSP is slightly different.
2) RCSP always had higher Ratio1 than RSM, and RSM always had higher Ratio2 than RCSP. This pattern was similar to what we have observed on CSP and SM.

From all these observations, we can reach the following two conclusions:
1) Although the SM approach optimizes an objective function very similar to the objective function of the traditional CSP, it generally results in slightly worse classification performance. Meanwhile, the SM approach does not have a closed-form solution, and hence requires much higher computational cost. For these two reasons, the new objective function and optimization method are not recommended in designing spatial filters.
2) Ratio1 in general is a reliable performance measure for the spatial filters. However, because $\Sigma_0$ and $\Sigma_1$ may be noisy, adding regularization to the covariance matrices can improve the final classification performance, although it may reduce Ratio1 very slightly. This suggests that it may be possible to define an improved objective function for the spatial filters, which is better correlated with the final classification performance. This is one of our future research directions.

V. CONCLUSIONS

CSP is a popular spatial filtering approach for EEG-based BCIs, especially for motor imagery applications. It is used to increase the signal-to-noise ratio of EEG signals before they are fed into a classifier. Its main idea is to project the EEG signals from the original sensor space into a low-dimensional space that maximizes the variance for one class while minimizing it for the other. Different motivations and approaches have been proposed to compute the CSP filters. In this paper, we gave an overview of some typical approaches, and showed that they lead to the same closed-form solution. We also proposed a new objective function, which closely resembles the objective function of the traditional CSP, for developing spatial filters. Experimental results on two motor imagery datasets showed that the new objective function results in slightly worse spatial filters than the CSP filters, so the traditional CSP approach is still preferred in practice. Moreover, we also confirmed that adding regularization on the covariance matrices can improve the classification performance, no matter which objective function is used.

REFERENCES

[1] A. Barachant, S. Bonnet, M. Congedo, and C. Jutten, “Multiclass brain-computer interface classification by Riemannian geometry,” IEEE Trans. on Biomedical Engineering, vol. 59, no. 4, pp. 920–928, 2012.
[2] B. Blankertz, R. Tomioka, S. Lemm, M. Kawanabe, and K. R. Muller, “Optimizing spatial filters for robust EEG single-trial analysis,” IEEE Signal Processing Magazine, vol. 25, no. 1, pp. 41–56, 2008.
[3] B. Blankertz, G. Dornhege, M. Krauledat, K. R. Muller, and G. Curio, “The non-invasive Berlin brain-computer interface: Fast acquisition of effective performance in untrained subjects,” NeuroImage, vol. 37, no. 2, pp. 539–550, 2007.
[4] J. P. Cunningham and Z. Ghahramani, “Linear dimensionality reduction: survey, insights, and generalizations,” Journal of Machine Learning Research, vol. 16, no. 1, pp. 2859–2900, 2015.
[5] A. Delorme and S. Makeig, “EEGLAB: an open source toolbox for analysis of single-trial EEG dynamics including independent component analysis,” Journal of Neuroscience Methods, vol. 134, pp. 9–21, 2004.
[6] K. Fukunaga, Introduction to statistical pattern recognition. Academic Press, 1972.
[7] H. He and D. Wu, “Transfer learning enhanced common spatial pattern filtering for brain computer interfaces (BCIs): Overview and a new approach,” in Proc. 24th Int’l. Conf. on Neural Information Processing, Guangzhou, China, November 2017.
[8] Z. J. Koles, M. S. Lazar, and S. Z. Zhou, “Spatial patterns underlying population differences in the background EEG,” Brain Topography, vol. 2, no. 4, pp. 275–284, 1990.
[9] B. J. Lance, S. E. Kerick, A. J. Ries, K. S. Oie, and K. McDowell, “Brain-computer interface technologies in the coming decades,” Proc. of the IEEE, vol. 100, no. 3, pp. 1585–1599, 2012.
[10] F. Lotte and C. Guan, “Regularizing common spatial patterns to improve BCI designs: Unified theory and new algorithms,” IEEE Trans. on Biomedical Engineering, vol. 58, no. 2, pp. 355–362, 2011.
[11] S. Makeig, C. Kothe, T. Mullen, N. Bigdely-Shamlo, Z. Zhang, and K. Kreutz-Delgado, “Evolving signal processing for brain-computer interfaces,” Proc. of the IEEE, vol. 100, no. Special Centennial Issue, pp. 1567–1584, 2012.
[12] J. Müller-Gerking, G. Pfurtscheller, and H. Flyvbjerg, “Designing optimal spatial filters for single-trial EEG classification in a movement task,” Clinical Neurophysiology, vol. 110, no. 5, pp. 787–798, 1999.
[13] L. F. Nicolas-Alonso and J. Gomez-Gil, “Brain computer interfaces, a review,” Sensors, vol. 12, no. 2, pp. 1211–1279, 2012.
[14] H. Ramoser, J. Muller-Gerking, and G. Pfurtscheller, “Optimal spatial filtering of single trial EEG during imagined hand movement,” IEEE Trans. on Rehabilitation Engineering, vol. 8, no. 4, pp. 441–446, 2000.
[15] J. van Erp, F. Lotte, and M. Tangermann, “Brain-computer interfaces: Beyond medical applications,” Computer, vol. 45, no. 4, pp. 26–34, 2012.
[16] J. R. Wolpaw, N. Birbaumer, D. J. McFarland, G. Pfurtscheller, and T. M. Vaughan, “Brain-computer interfaces for communication and control,” Clinical Neurophysiology, vol. 113, no. 6, pp. 767–791, 2002.
[17] D. Wu, J.-T. King, C.-H. Chuang, C.-T. Lin, and T.-P. Jung, “Spatial filtering for EEG-based regression problems in brain-computer interface (BCI),” IEEE Trans. on Fuzzy Systems, vol. 26, no. 2, pp. 771–781, 2018.
[18] D. Wu, V. J. Lawhern, B. J. Lance, S. Gordon, T.-P. Jung, and C.-T. Lin, “EEG-based user reaction time estimation using Riemannian geometry features,” IEEE Trans. on Neural Systems and Rehabilitation Engineering, vol. 25, no. 11, pp. 2157–2168, 2017.