IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 8, AUGUST 2009 2095 Extraction of Bistable-Percept-Related Features From Local Field Potential by Integration of Local Regression and Common Spatial Patterns Zhisong Wang, Member, IEEE, Alexander Maier, Nikos K. Logothetis, and Hualou Liang∗, Senior Member, IEEE

Abstract—Bistable perception arises when an ambiguous stim- using bistable stimuli. Bistable perception refers to the phe- ulus under continuous view is perceived as an alternation of two nomenon of spontaneously alternating percepts while viewing mutually exclusive states. Such a stimulus provides a unique op- the same stimulus continuously. The study of bistable perception portunity for understanding the neural basis of because it dissociates the perception from the visual input. In this holds great promise for understanding the neural correlates of vi- paper, we focus on extracting the percept-related features from the sual perception [1]. The local field potential (LFP) is thought to local field potential (LFP) in monkey visual cortex for decoding its predominantly reflect the synaptic activities of local populations bistable structure-from-motion (SFM) perception. Our proposed of neurons [2] and has recently received increasing attention in feature extraction approach consists of two stages. First, we es- the analysis of the neuronal population activity [3], [4]. The in- timate and remove from each LFP trial the nonpercept-related stimulus-evoked activity via a local regression method called the vestigation of correlations between perceptual reports and LFP locally weighted scatterplot smoothing because of the dissocia- oscillations during physically identical but perceptually differ- tion between the perception and the stimulus in our experimental ent conditions may provide new insights on the mechanism of paradigm. Second, we use the common spatial patterns approach neural information processing and the neural basis of visual to design spatial filters based on the residue signals of multiple perception and perceptual decision-making. channels to extract the percept-related features. We exploit a sup- port vector machine (SVM) classifier on the extracted features to The LFP signal recorded during each trial is composed of decode the reported perception on a single-trial basis. We apply the percept-related and nonpercept-related neuronal activity, among proposed approach to the multichannel intracortical LFP data col- which the stimulus-evoked activity is often the strongest, over- lected from the middle temporal (MT) visual cortex in a macaque shadowing the other activity such as the induced activity and monkey performing an SFM task. We demonstrate that our ap- noise. However, in our data, the stimulus-evoked activity is not proach is effective in extracting the discriminative features of the percept-related activity from LFP and achieves excellent decoding related to perception due to the dissociation between the per- performance. We also find that the enhanced gamma band syn- ception and the stimulus in our experimental paradigm. There- chronization and reduced alpha and beta band desynchronization fore, we first need to estimate and remove the stimulus-evoked may be the underpinnings of the percept-related activity. activity from each LFP trial to facilitate the extraction of the Index Terms—Common spatial patterns (CSPs), event-related percept-related features from the remaining signal for single- synchronization and desynchronization, feature extraction, local trial decoding of bistable perception. field potential (LFP), local regression, locally weighted scatter- Ensemble averaging is a widely used approach for estimating plot smoothing (LOWESS), nonstationary time series, single trial, stimulus-evoked activity. It is performed by averaging across tri- stimulus-evoked activity, structure-from-motion (SFM), support vector machine (SVM). als the potentials measured over repeated presentations of a stim- ulus. However, ensemble averaging generally requires a large number of trials to achieve satisfactory results. This requires a I. INTRODUCTION long time for data acquisition, which may also cause nuisance HE QUESTION of how perception arises from neural ac- for the subjects. In addition, averaging assumes that there are no T tivity in the visual cortex is of central importance to cogni- variations in amplitude, latency, or waveform of the stimulus- tive neuroscience. To answer this question, one important exper- evoked potential across subsequent trials. This assumption may imental paradigm is to dissociate percepts from the visual input not be valid in practice [5]. Whenever it is violated, important information regarding the dynamics of the cognitive process un- Manuscript received August 27, 2008; revised December 22, 2008 and der study is lost. Time-varying adaptive filters [6] such as the February 3, 2009. First published April 7, 2009; current version published least mean square (LMS) and recursive least squares (RLS), and July 15, 2009. This work was supported by the National Institutes of Health Wiener filter can also be used to estimate the stimulus-evoked under Grant NIH R01 MH072034 and by the . Asterisk indicates corresponding author. activity. However, all of these filters require a reference sig- Z. Wang is with the Microsoft Corporation, Bellevue, WA 98007 USA. nal, which is typically calculated as the ensemble average of all A. Maier is with the National Institutes of Health, Bethesda, MD 20892 USA. trials except the trial being considered. Therefore, they suffer N. K. Logothetis is with the Max Planck Institute for Biological Cybernetics, Tubingen,¨ Germany (e-mail: [email protected]). from the trial-by-trial variability of the stimulus-evoked activity ∗H. Liang is with the School of Biomedical Engineering, Drexel University, since they rely on averaging multiple trials to obtain a good Philadelphia, PA 19104 USA (e-mail: [email protected]). reference signal for each single trial. Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. In this paper, we consider a local regression method called Digital Object Identifier 10.1109/TBME.2009.2018630 locally weighted scatterplot smoothing (LOWESS) [7]–[9] to

0018-9294/$25.00 © 2009 IEEE

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. 2096 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 8, AUGUST 2009 estimate the stimulus-evoked activity from each single LFP trial. the application of the feature extraction approach for decoding LOWESS has gained widespread acceptance in statistics as an the bistable SFM perception. Finally, Section 4 contains the appealing solution for fitting smooth curves to noisy data. The conclusions. advantage of LOWESS lies in the fact that it does not require the specification of a global function to fit all the data samples, II. MATERIALS AND METHODS but only demands the setting of several parameters to fit the data locally. Consequently, it greatly simplifies the estimation of the A. Subjects and Neurophysiological Recordings complex processes such as the stimulus-evoked activity. In addi- Electrophysiological recordings were performed in a healthy tion, the LOWESS-based approach does not require a reference adult male rhesus monkey. After behavioral training was com- signal and is robust against trial-by-trial variability. After the plete, an occipital recording chamber was implanted and a cran- removal of the stimulus-evoked activity based on LOWESS, we iotomy was made. Intracortical recordings were conducted with propose to use the common spatial patterns (CSP) method to a multielectrode array while the monkey was viewing SFM stim- extract the percept-related features from the remaining signals uli, which consisted of an orthographic projection of a transpar- of multiple channels. CSP finds spatial filters that maximize the ent sphere that was covered with randomly distributed dots on its variance for one class and simultaneously minimize the vari- entire surface. SFM is the perception of 3-D shape from motion ance for the other class. It has become a popular feature extrac- cues. When in motion, this stimulus gives the striking percept tion approach in EEG-based brain–computer interface (BCI) of a 3-D sphere spinning in one of the two possible directions: applications [10]–[12]. There are many feature extraction ap- clockwise or counterclockwise. proaches including principal component analysis (PCA), inde- Trials were initiated with a short period of fixation (300– pendent component analysis (ICA), empirical mode decompo- 500 ms), followed by the presentation of the SFM stimulus sition (EMD), and the time-embedding techniques. PCA and for 2000–3000 ms with an average interstimulus-interval of ICA generate multiple components for each trial from multiple 2000 ms. The stimulus rotated for the entire period of presenta- channels. EMD generates multiple components for each trial tion, giving the appearance of a 3-D structure. The monkey was from each channel. The time-embedding techniques transform well-trained and required to indicate the choice of rotation di- 1-D time series of the sensor signals into a high-dimensional rection (clockwise or counterclockwise) by pushing one of two space of time-delayed time series to make subspace projection levers. The stimuli in the task were randomly intermixed with possible, but by doing so, they introduce nonlinearity into the both the disparity-biased (catch) trials with an externally defined data analysis. The main reason we choose CSP over the other ap- direction of rotation and ambiguous (bistable) trials. Correct re- proaches is that CSP is a supervised approach and makes use of sponses for disparity-defined stimuli were acknowledged with the label information of the training data for feature extraction, application of a fluid reward. In the case of fully ambiguous while the other feature extraction approaches are unsupervised (bistable) stimuli, where the stimuli can be perceived in one and do not exploit the label information of the training data. of two possible ways, the monkey was rewarded by chance. The focus of this paper is the feature extraction of the percept- Only trials corresponding to bistable stimuli are analyzed in the related activity from the LFP, simultaneously collected from paper. multiple channels in the middle temporal (MT) visual area The recording site was the middle temporal (MT) area of the of a macaque monkey, for decoding its perception of bistable monkey’s visual cortex, which is commonly associated with structure-from-motion (SFM) on a single-trial basis. We first visual motion processing. Multielectrode multiunit and LFP employ LOWESS to estimate and remove the stimulus-evoked recordings (nine electrodes simultaneously lowered into the activity from the single trials of LFP. We then apply CSP to brain) were collected in MT. Only LFP was used in this study to design spatial filters for the remaining signals from multiple demonstrate the proposed feature extraction method. LFP was channels and extract percept-related features. Based on the high-pass filtered at 1 Hz and sampled at 200 Hz. extracted features, we decode the monkey’s perception on a single-trial basis using the support vector machine (SVM) clas- B. Feature Extraction sifier. In doing so, we have demonstrated that our proposed ap- proach is effective in extracting the discriminative features of the Our goal is to extract the percept-related features from each percept-related activity from LFP, which leads to excellent de- LFP trial. Our approach consists of two stages. First, we employ coding performance. We also discover that the enhanced gamma a local regression method to estimate the nonpercept-related band synchronization and reduced alpha and beta band desyn- stimulus-evoked activity and remove it from the nonstationary chronization may be the underpinnings of the percept-related single-trial time series for each channel. Second, we use the activity. common spatial patterns approach to design spatial filters to The rest of the paper is organized as follows. In Section 2, extract the percept-related features from the remaining signals we first present the materials and feature extraction approach of multiple channels. Using the extracted features of LFP re- consisting of local-regression-based stimulus-evoked activity sponses, the perceptual reports can be determined with high estimation and CSP-based spatial filtering. We then introduce accuracy on a single-trial basis. In what follows, we provide the SVM classifier. In Section 3, we first show the excellent more details for each stage of feature extraction. performance of LOWESS for the estimation of the stimulus- 1) Local Regression-Based Stimulus-Evoked Activity Esti- evoked activity based on the simulation results and then explore mation: We consider a local regression method called LOWESS

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. WANG et al.: EXTRACTION OF BISTABLE-PERCEPT-RELATED FEATURES FROM LOCAL FIELD POTENTIAL 2097

[7]–[9] for the estimation of the stimulus-evoked activity. In addition, the degree of the local polynomial affects the bias– LOWESS is desirable and attractive in this application since variance tradeoff. A higher degree polynomial may result in a it does not require specifying a global function to fit a model less biased, but more variable fit, i.e., estimates closely follow- to all the data, but only demands setting several parameters to ing the observations. We choose the bandwidth parameter based fit the data locally. Hence, it greatly simplifies the estimation of on Mallows’ CP statistics [7], [9], [13] that estimate the average the complex processes of the stimulus-evoked activity. squared estimation error. Besides Mallows’ CP statistics, there Let x and Y denote the predictor variable and response vari- are other approaches for model selection including cross val- able, respectively. Given a set of observation pairs (xi ,Yi ),i= idation, the Akaike information criterion (AIC), the Bayesian 1,...,n, their relationship can be modeled as Yi = µ(xi)+i , information criterion (BIC), the minimum description length where µ(x) is an unknown function to be estimated by fitting a (MDL), etc. Cross validation estimates the average squared pre- polynomial model (most commonly linear or quadratic) within diction error. A drawback of the cross-validation method is its a moving window, and i is an error term. inherent variability. Mallows CP statistics is similar to the AIC, In LOWESS, we consider the locally weighted least-squares BIC, etc., in that it tends to be less dependent on the number of estimate of a fitting point x predictors in the model. The motif for choosing the CP criterion

   2 in the paper is that it provides an unbiased estimate of the sum n P x − x of the squared error over the time samples [9]. W i Y − β (x − x)p (1) α i p i 2) Common Spatial Patterns: The removal of the i=1 p=0 nonpercept-related stimulus-evoked activity from each LFP trial where W (x) is a weight function, α, the bandwidth, determines may facilitate the extraction of the percept-related features from the size of the neighborhood of x, and βp is the coefficient the remaining signal. After that, we propose to use the CSP with p denoting the polynomial order. Specifically, at each point method to extract the percept-related features from the remain- of interest, a polynomial is fit to the observation pairs of the ing signals of multiple channels. The CSP technique has become predictor variable and response variable, giving more weight to a popular feature extraction approach in EEG-based BCI appli- points close to the point of interest and less weight to points cations [10]–[12] and essentially finds spatial filters that max- farther away from the point of interest. imize the variance for one class and simultaneously minimize LOWESS is flexible in the selection of the parameters such the variance for the other class. as the bandwidth, the degree of the polynomial model, and the Assume that we have M channels of signals and each trial has weight function. The bandwidth plays an important role in the T time samples. Let Xj (i) denote the M × T data matrix for the local regression fit by controlling the smoothness of the fit. If ith trial from class j, where j ∈{−1, +1} denotes a perceptual the bandwidth is too small, the fit becomes too noisy and the report for the rotation direction with j = −1 for the clockwise variance increases. On the other hand, if it is too large, the fit direction and j =+1for the counterclockwise direction. Let Nj becomes oversmoothed and important features may be distorted denote the number of single trials for class j. Then, the mean or lost, and consequently, the fit will have a large bias. The matrix for each class is bandwidth needs to be chosen to compromise the bias–variance N j tradeoff. The degree of the local polynomial also affects the 1 X¯ = X (i). (2) bias–variance tradeoff. A higher degree polynomial may result j N − 1 j j i=1 in a less biased, but more variable fit, i.e., estimates closely following the observations. Since both polynomial degree and Let bandwidth influence bias as well as variance, it often suffices to − ¯ choose a low-degree polynomial (linear or quadratic) and con- Vj = Xj (i) Xj (3) centrate on choosing the bandwidth to obtain a satisfactory fit. and The weight function has much less effect on the bias–variance tradeoff, but it influences the visual quality of the fitted regres- N j  1 Vj V sion curves. Generally, it is chosen to be continuous, symmetric, R = j (4) j N − 1 tr(V V ) peaked at zero, and supported on [−1, 1]. j i=1 j j In this paper, we choose the degree of the local polyno- where Rj is the average normalized correlation matrix for class mial as two, and the commonly used tricube weight function  j, tr(·) represents the sum of the diagonal elements, and (·) W (x)=(1−|x|3 )3 [8]. The motivation for the kernel function means transpose. used by LOWESS in the paper is based on the fact that any The spatial filters of CSP can be determined by solving a function can be approximated by a low-order polynomial in a generalized eigenvalue problem small neighborhood and that simple models can be used to fit data well. To avoid overfitting, the local polynomials chosen are R−1 v = λR+1v. (5) almost always of first or second degree. Using a zero-degree polynomial may not approximate the underlying function well It requires that the correlation matrix R−1 be positive definite. enough since it turns LOWESS into a weighted moving average. In general, the eigenvectors corresponding to the largest and High-degree polynomials would tend to overfit the data and are smallest few eigenvectors are used as spatial filters for CSP numerically unstable, making accurate computations difficult. since they are most suitable for the discrimination purpose.

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. 2098 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 8, AUGUST 2009

C. SVM Classifier the internal MATLAB function randn and is realized differently in different trials. The ratio of the power of the stimulus-evoked SVM is a promising classifier that minimizes the empirical classification error, and at the same time, maximizes the margin activity to that of the Gabor atom and to that of the white Gaus- sian noise is 5 and 1, respectively. We generate 50 trials of by determining a separating hyperplane to distinguish different simulated data. Each trial contains 1500 data samples. Fig. 1(b) classes of data [14], [15]. SVM has been successfully used in many applications due to its robustness to outliers and good shows a single trial containing the stimulus-evoked activity, Ga- bor atom, and white Gaussian noise. A reference signal needs generalization capability. to be chosen for the Wiener filter, LMS, and RLS and is calcu- Assume that xk ,k =1,...,Kare the K training feature vec- tors for decoding and the class labels are y ∈{−1, +1}.SVM lated as the ensemble average of all trials except the trial being k considered. The filter order for the Wiener filter, LMS, and RLS solves the following optimization problem: is chosen to be 100. For LOWESS, we choose the bandwidth as K 0.15 based on the CP statistics. 2 minw + C ξk subject to To assess the performances of different approaches, we em- k=1 ploy the rms error (RMSE) between the true stimulus-evoked y (wx + b) ≥ 1 − ξ activity s(t) and the estimated evoked potential sˆ(t) as the merit k k k measure. The RMSE is computed as ξk ≥ 0 (6)   1 1 T 2 where w is the weight vector, C>0 is the penalty parameter of RMSE = (s(t) − sˆ(t))2 (8) T the error term chosen by cross validation, ξk is the slack variable, t=1 and b is the bias term. It turns out that the margin of the two where T is the number of data samples in each trial. classes is inversely proportional to w2 . Therefore, the first In the first example, we have a single channel with 50 trials, term in the objective function of SVM is used to maximize the all of which contain the same stimulus-evoked activity. Table I margin. The second term in the objective function is the regu- compares the LMS, RLS, Wiener filter, and LOWESS-based larization term that allows for training errors for the inseparable approaches in terms of the mean and standard deviation (SD) case. of the RMSEs of the stimulus-evoked activity estimates of all The optimal solution of w and b to the above optimization trials. It is obvious that the LOWESS-based method obtains problem can be found by using the Lagrange multiplier method. the most accurate estimates among all approaches. Fig. 1(c) Assume that t is the testing feature vector. Then, testing is and (d) shows the single-trial LOWESS-based estimates of the done simply by determining on which side of the separating  stimulus-evoked activity and the average across 50 trials of hyperplane t lies, i.e., if w t + b ≥ 0, the label of t is classified the LOWESS-based estimates of the stimulus-evoked activity, as +1, otherwise the label is classified as −1. respectively. Note that LOWESS has excellent performance in estimating the stimulus-evoked activity. III. RESULTS Next, we consider a more realistic situation where there is In this section, we first provide simulation results to demon- a random latency jitter in the stimulus-evoked activity across strate the excellent performance of LOWESS for the estimation the 50 trials. The jitter has a Gaussian distribution with zero of the stimulus-evoked activity. We then explore the application mean and standard deviation equal to the duration of 30 sam- of the proposed feature extraction approach for decoding the ples. The noise and Gabor atom for each trial is the same as bistable SFM perception. in the corresponding trial of the first example. The latency jit- ter may occur when the monkey judges the onset times of the same external stimulus differently at different trials. Table II A. Simulation Results compares the LMS, RLS, Wiener filter, and LOWESS-based In the simulation, we use a specific waveform for the stimulus- approaches in terms of the mean and SD of the RMSEs of the evoked activity, as shown in Fig. 1(a). It was realized as the stimulus-evoked activity estimates of all trials. The LOWESS- addition of two Gaussian window functions using the internal based method maintains similar RMSE to that in the first ex- MATLAB function gausswin. The remaining activity is simu- ample. On the other hand, significant performance degradations lated as a mixture of a Gabor atom and white Gaussian noise. arise in the Wiener filter, LMS, and RLS due to the poor refer- The Gabor atom is used to simulate the induced percept-related ence signal. activity. It is a cosine-modulated Gaussian window function and has this form B. Experimental Results   1 t − u Here, we provide experimental examples to demonstrate g (t)=√ g cos(2πft + φ) (7) γ s s the performance of the proposed feature extraction approach for predicting perceptual decisions from the neuronal data. −t2 where g(t)=e . The Gabor atom gγ (t) is centered at u and its Simultaneously collected nine-channel LFP data were used energy is mostly concentrated in a neighborhood of u with a size for demonstration. We focus on the LOWESS-based approach proportional to s. The white Gaussian noise was generated using for the stimulus-evoked activity estimation. For LOWESS, we

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. WANG et al.: EXTRACTION OF BISTABLE-PERCEPT-RELATED FEATURES FROM LOCAL FIELD POTENTIAL 2099

Fig. 1. Simulated results for a single trial: (a) stimulus-evoked activity, (b) raw signal containing the stimulus-evoked activity, Gabor atom, and white Gaussian noise, (c) single-trial LOWESS-based estimates of the stimulus-evoked activity, and (d) average across 50 trials of the LOWESS-based estimates of the stimulus- evoked activity. Note that LOWESS has excellent performance in estimating the stimulus-evoked activity.

TABLE I choose N − 1 trials for feature extraction and training and use COMPARISON OF THE DIFFERENT APPROACHES FOR 50 TRIALS FROM A SINGLE the remaining one trial for testing. This is repeated for N times CHANNEL IN THE ABSENCE OF STIMULUS-EVOKED ACTIVITY VARIATION with each trial used for testing once. The decoding accuracy is obtained as the ratio of the number of correctly decoded trials over N. We first compare the decoding accuracy based on different TABLE II features obtained from the nine channels. Fig. 2(a)–(d) shows the COMPARISON OF THE DIFFERENT APPROACHES FOR 50 TRIALS FROM A SINGLE decoding accuracy based on the CSP features obtained from the CHANNEL IN THE PRESENCE OF STIMULUS-EVOKED ACTIVITY VARIATION raw signal, ensemble average removed signal, LOWESS esti- mate of the stimulus-evoked activity, and the LOWESS residue, respectively, when the CSP component number is equal to 2, 4, 6, and 8, within nine temporal windows, all starting from 200 ms after stimulus onset but with different ending times. The choose the bandwidth as 0.015 based on the CP statistics. LOWESS residue is obtained by subtracting the LOWESS esti- In the paper, we employ the linear SVM classifier from the mate of the stimulus-evoked activity from the raw signal of the LIBSVM package [16]. We use decoding accuracy as a perfor- single trial. We ignore the first 200 ms after stimulus onset to mance measure and calculate it via leave-one-out cross valida- reduce the possible side effects not related to percepts. For CSP, tion (LOOCV). In particular, for a data set with N trials, we we use the pairs of eigenvectors corresponding to the largest

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. 2100 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 8, AUGUST 2009

Fig. 2. Comparison of the decoding accuracy based on the CSP features obtained from the (a) raw signal, (b) ensemble average removed signal, (c) LOWESS estimate of stimulus-evoked activity, and (d) LOWESS residue, respectively, when the CSP component number is equal to 2, 4, 6, and 8, within nine temporal windows all starting from 200 ms after stimulus onset but with different ending times. We ignore the first 200 ms after stimulus onset to reduce the possible side effects not related to percepts. For CSP, we use the pairs of eigenvectors corresponding to the largest and smallest few eigenvectors as spatial filters. The range of the decoding accuracy of each figure can be seen from the colorbar on the right. Note the general trend that the decoding accuracy increases as the temporal window becomes longer. This indicates that the temporal integration of the neuronal population activity could enhance the perceptual discriminability. Note that the LOWESS residue outperforms all the other signals in terms of decoding accuracy. and smallest few eigenvectors as spatial filters. The range of the age removed signal, LOWESS estimate of the stimulus-evoked decoding accuracy of each figure can be seen from the colorbar activity, and the LOWESS residue after removing the estimate on the right. Note the general trend that the decoding accuracy of the stimulus-evoked activity, respectively. Table IV is the increases as the temporal window becomes longer. This indi- same as Table III except that the mean of the decoding accuracy cates that the temporal integration of the neuronal population across different CSP components is presented. Note that the CSP activity could enhance the perceptual discriminability. Note that features obtained from the LOWESS estimate of the stimulus- the LOWESS residue outperforms all the other signals in terms evoked activity have similar performance to the CSP features of decoding accuracy. obtained from the raw signal, both of which are much inferior To have a closer look at the decoding accuracy values in to the CSP features of the ensemble average removed signal and Fig. 2, we present some quantitative results in Tables III and the LOWESS residue. As can be seen, the LOWESS residue IV. In Table III, we compare the maximum of the decoding quite consistently yields the highest decoding accuracy among accuracy across different CSP components with the component all signals. In particular, its CSP feature with eight components number equal to 2, 4, 6, and 8, for nine temporal windows with in the temporal window between 200 and 1455 ms stands out the window ending time changing from 555 to 1995 ms. All the with the best decoding performance of 0.83. temporal windows start from 200 ms after stimulus onset. The Fig. 3 shows the p-value of the t-test in comparing the de- CSP features are obtained from the raw signal, ensemble aver- coding accuracy results obtained based on the CSP features

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. WANG et al.: EXTRACTION OF BISTABLE-PERCEPT-RELATED FEATURES FROM LOCAL FIELD POTENTIAL 2101

TABLE III COMPARISON OF THE MAXIMUM OF THE DECODING ACCURACY ACROSS DIFFERENT CSP COMPONENTS WITH THE COMPONENT NUMBER EQUAL TO 2, 4, 6 AND 8, FOR NINE TEMPORAL WINDOWS WITH THE WINDOW ENDING TIME CHANGING FROM 555 TO 1995 MS

TABLE IV COMPARISON OF THE MEAN OF THE DECODING ACCURACY ACROSS DIFFERENT CSP COMPONENTS WITH THE COMPONENT NUMBER EQUAL TO 2, 4, 6, AND 8, FOR NINE TEMPORAL WINDOWS WITH THE WINDOW ENDING TIME CHANGING FROM 555 TO 1995 MS

cantly higher than the decoding accuracy obtained based on the other signals (p<0.05, t-test), when the window ending time is sufficiently large (larger than 900 ms for the raw signal and LOWESS estimate of stimulus-evoked activity, and larger than 1400 ms for the ensemble average removed signal). This confirms the effectiveness of our proposed feature extraction approach and implies that the removal of the nonpercept-related stimulus-evoked activity from each LFP trial can improve the extraction of the percept-related features. In the review process, one reviewer raised one interesting hypothesis for ambiguous stimuli that the monkey actually identifies a stimulus as simply being ambiguous and pushes the lever by chance independently of the perception of the direction of rotation. We feel that the hypothesis is unlikely based on the following reasons. First, we have some control in the experiments. The stimuli in the task were randomly intermixed with both the disparity-biased (catch) trials with an externally defined direction of rotation and ambiguous (bistable) trials. We find that the monkey has excel- Fig. 3. Comparison of the p-value of the t-test in the decoding accuracy lent behavioral performance in the disparity-biased trials. This results obtained based on the CSP features from the LOWESS residue versus those from the raw signal, ensemble average removed signal, and LOWESS shows that the monkey is well-trained and can reliably indicate estimate of stimulus-evoked activity, as the ending time of the temporal window the direction of rotation. Second, we have achieved high decod- increases from 555to 1995 ms. All temporal windows start from 200 ms after ing accuracy after feature extraction from trials corresponding stimulus onset. Note that the decoding accuracy obtained based on the LOWESS residue is significantly higher than the decoding accuracy obtained based on the to bistable stimuli. If the monkey pushed the lever by chance other signals (p<0.05, t-test), when the window ending time is sufficiently independently of the perception of the direction of rotation and large (larger than 900 ms for the raw signal and LOWESS estimate of stimulus- the neuronal activity, the decoding accuracy would not be high evoked activity, and larger than 1400 ms for the ensemble average removed signal). even after feature extraction. In order to explain why the decoding accuracy obtained based from the LOWESS residue versus those from the raw signal, on the LOWESS residue is significantly higher than the decod- ensemble average removed signal, and LOWESS estimate of ing accuracy obtained based on the other three signals, we next stimulus-evoked activity, as the ending time of the temporal compare the time–frequency representation of the different sig- window increases from 555 to 1995 ms. All temporal windows nals. Fig. 4(a)–(d) shows the spectrogram for the raw signal, start from 200 ms after stimulus onset. Note that the decoding ensemble average removed signal, LOWESS estimate of the accuracy obtained based on the LOWESS residue is signifi- stimulus-evoked activity, and LOWESS residue, respectively,

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. 2102 IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, VOL. 56, NO. 8, AUGUST 2009

Fig. 4. Comparison of the spectrogram for the (a) raw signal, (b) ensemble average removed signal, (c) LOWESS estimate of stimulus-evoked activity, and (d) LOWESS residue, respectively, from a single channel. Time 0 indicates the stimulus onset. Red and blue in the figure represent strong and weak activity, respectively. Note that the time-frequency representation of both the raw signal and LOWESS estimate of stimulus-evoked activity is dominated by strong low-frequency activity below 10 Hz within 500 ms after stimulus onset, while the spectrogram of the ensemble average removed signal is dominated by strong low-frequency activity in a larger band (below 20 Hz) within a much longer window after stimulus onset. The spectrogram of the LOWESS residue shows clear event-related synchronization and desynchronization, i.e., enhanced activity in the high gamma band (50−60 Hz) and reduced activity in the alpha and beta bands (10−30 Hz) after stimulus onset. from a single channel. Red and blue in the figure represent Taken together, our results demonstrate that the proposed ap- strong and weak activity, respectively. Note from Fig. 4(a) and proach is effective in extracting the discriminative features of the (c) that the time–frequency representation of both the raw signal percept-related activity from LFP, which leads to excellent de- and LOWESS estimate of the stimulus-evoked activity is dom- coding performance. We also discover that the enhanced gamma inated by strong low-frequency activity below 10 Hz within band synchronization and reduced alpha and beta band desyn- 500 ms after stimulus onset. The spectrogram of the ensemble chronization may be the underpinnings of the percept-related average removed signal, as shown in Fig. 4(b), on the other activity. hand, is dominated by strong low-frequency activity in a larger IV. CONCLUSION band (below 20 Hz) within a much longer window after stimulus onset. Interestingly, the spectrogram of the LOWESS residue, In this paper, we have shown how to extract the features of the as shown in Fig. 4(d) reveals clearly the event-related synchro- percept-related activity from the LFP in monkey visual cortex nization and desynchronization, i.e., enhanced activity in the for decoding its bistable SFM perception. We have employed a high gamma band (50−60 Hz) and reduced activity in the alpha local regression approach called the locally weighted scatterplot and beta bands (10−30 Hz) after stimulus onset. Such event- smoothing (LOWESS) to estimate and remove the stimulus- related synchronization and desynchronization may underlie the evoked activity for each channel because the stimulus-evoked percept-related activity. activity in our data is not related to perception. We have used the

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply. WANG et al.: EXTRACTION OF BISTABLE-PERCEPT-RELATED FEATURES FROM LOCAL FIELD POTENTIAL 2103

common spatial patterns (CSP) approach to design spatial filters Zhisong Wang (S’02–M’05) received the B.E. de- to extract the percept-related features from the remaining signals gree from Tongji University, , China, in 1998, and the M.Sc. and Ph.D. degrees from the Uni- of multiple channels. We have exploited the SVM classifier on versity of Florida, Gainsville, in 2003 and 2005, re- the extracted features of LFP to decode the reported perception spectively, all in electrical engineering. From 2001 to on a single-trial basis. We have applied the feature extraction 2005, he was a Research Assistant in the Department of Electrical and Computer Engineering, University approach to the multichannel intracortical LFP data collected of Florida. From 2005 to 2008, he was a Postdoc- from the middle temporal (MT) visual area in a macaque mon- toral Fellow in the School of Health Information Sci- key performing an SFM task. Using these techniques, we have ences, University of Texas Health Science Center at Houston. He is currently with the Microsoft Corpo- demonstrated that our proposed approach is effective in extract- ration, Bellevue, WA. His current research interests include machine learning, ing the discriminative features of the percept-related activity data mining, signal processing, time-series analysis, biomedical engineering, from LFP and achieves excellent decoding performance. We and computational neuroscience. He is a member of the Sigma Xi and the Phi Kappa Phi. have also found that the enhanced gamma band synchroniza- tion and reduced alpha and beta band desynchronization may be the underpinnings of the percept-related activity. This study is based on the data from one monkey. We plan to work on Alexander Maier received the Ph.D. degree from population studies from multiple sessions of multiple monkeys the University of Tubingen,¨ Tubingen,¨ Germany, for to further validate the feature extraction approach and neuro- research at the Max Planck Institute of Biological scientific finding. Another research direction is to examine how Cybernetics, Tubingen.¨ He studied neurobiology at the University of the feature extraction approach will contribute to decoding the Munich, Munich, Germany. He is currently with the bistable percepts of SFM in a response time (RT) perceptual National Institute of Mental Health, Bethesda, MD, discrimination task, to understand how the perceptual discrim- as a Research Fellow. inative information of neuronal population activity evolves and accumulates over time to mediate behaviors.

REFERENCES

[1] R. Blake and N. K. Logothetis, “Visual competition,” Nat. Rev. Neurosci., Nikos K. Logothetis received the B.S. degree in vol. 3, pp. 13–21, 2002. mathematics from the University of Athens, Athens, [2] U. Mitzdorf, “Current source-density method and application in cat cere- , the B.S. degree in biology from the Univer- bral cortex: Investigation of evoked potentials and EEG phenomena,” sity of Thessaloniki, Thessaloniki, Greece, and the Physiol. Rev., vol. 65, pp. 37–99, 1985. Ph.D. degree in human neurobiology from Ludwig [3] B. Pesaran, J. S. Pezaris, M. Sahani, P.P.Mitra, and R. A. Andersen, “Tem- Maximillians University, Munich, Germany. poral structure in neuronal activity during working memory in macaque In 1985, he was with the Brain and Cognitive Sci- parietal cortex,” Nat. Neurosci., vol. 5, pp. 802–811, 2002. ences Department, Massachusetts Institute of Tech- [4] G. Kreiman, C. Hung, R. Quiroga, A. Kraskov, T. Poggio, and J. DiCarlo, nology. In 1990, he joined the faculty of the Division “Selectivity of local field potentials in inferior temporal cortex,” Neuron, of Neuroscience, Baylor College of Medicine. Seven vol. 49, pp. 433–445, 2006. years later, he joined the Max Planck Institute to con- [5] W. Truccolo, K. H. Knuth, A. S. Shah, S. L. Bressler, C. E. Schroeder, and tinue research on the physiological mechanisms that underly visual perception M. Ding, “Estimation of single-trial multi-component ERPs: Differentially and object recognition. Since 1992, he has also been Adjunct Professor of neu- variable component analysis (dVCA),” Biol. Cybern., vol. 89, pp. 426– robiology at the Salk Institute, San Diego, and, since 1995, Adjunct Professor 438, 2003. of ophthalmology at the Baylor College of Medicine, Houston. Currently, he [6] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, is the Director of the Neurophysiology Division of the Max Planck Institute 2001. for Biological Cybernetics, Tubingen,¨ Germany. His current research interests [7] W. S. Cleveland and S. J. Devlin, “Robust locally weighted regression and include the application of functional imaging techniques to monkeys, and mea- smoothing scatterplots,” J. Amer. Stat. Assoc., vol. 74, pp. 829–836, 1979. surement of how the functional magnetic resonance imaging signal relates to [8] W. S. Cleveland and S. J. Devlin, “Locally weighted regression: An ap- neural activity. proach to regression analysis by local fitting,” J. Amer. Stat. Assoc., vol. 83, pp. 596–610, 1988. [9] C. Loader, Local Regression and Likelihood. New York: Springer- Verlag, 1999. [10] Z. J. Koles, “The quantitative extraction and topographic mapping of the abnormal components in the clinical EEG,” Electroencephalogr. Clin. Neurophys., vol. 79, pp. 440–447, 1991. Hualou Liang (M’00–SM’01) received the Ph.D. de- [11] J. Muller-Gerking,¨ G. Pfurtscheller, and H. Flyvbjerg, “Designing optimal gree in physics from the Chinese Academy of Sci- spatial filters for single-trial EEG classification in a movement task,” Clin. ences, Beijing, China. Neurophysiol., vol. 110, no. 5, pp. 787–798, 1991. He studied signal processing at Dalian Univer- [12] H. Ramoser, J. Muller-Gerking,¨ and G. Pfurtscheller, “Optimal spatial sity of Technology, Dalian, China. He was a Post- filtering of single trial EEG during imagined hand movement,” IEEE doctoral Researcher at Tel-Aviv University, Tel-Aviv, Trans. Rehabil. Eng., vol. 8, no. 4, pp. 441–446, Dec. 2000. Israel, Max Planck Institute for Biological Cybernet- [13] C. Mallows, “Some comments on cp,” Technometrics, vol. 15, pp. 661– ics, Tubingen,¨ Germany, and the Center for Complex 675, 1973. Systems and Brain Sciences, Florida Atlantic Univer- [14] V. N. Vapnik, The of Statisical Learning Theory.NewYork: sity, Boca Raton. He is currently an Associate Profes- Springer-Verlag, 1995. sor at the School of Biomedical Engineering, Drexel [15] C. Cortes and V. N. Vapnik, “Support-vector networks,” Mach. Learning, University, Philadelphia, PA. He teaches graduate interdisciplinary courses in- vol. 20, pp. 273–297, 1995. cluding biomedical signal processing and computational biomedicine. His re- [16] C.-C. Chang and C.-J. Lin. (2001). LIBSVM: A library for support vector search experience throughout the years ranges from areas in biomedical signal machines, [Online]. Available: http://www.csie.ntu.edu.tw/∼cjlin/libsvm processing to cognitive and computational neuroscience.

Authorized licensed use limited to: Drexel University. Downloaded on July 15, 2009 at 08:16 from IEEE Xplore. Restrictions apply.