Convolutional Neural Networks for Decoding of Covert Attention Focus and Saliency Maps for EEG Feature Visualization
bioRxiv preprint doi: https://doi.org/10.1101/614784; this version posted April 22, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder. All rights reserved. No reuse allowed without permission.

Amr Farahat1, Christoph Reichert2,3, Catherine M. Sweeney-Reed1, and Hermann Hinrichs1,2,3,4

1 Neurocybernetics and Rehabilitation Research Group, Department of Neurology, Otto-von-Guericke University Hospital, Leipziger Str. 44, 39120 Magdeburg, Germany
2 Department of Behavioral Neurology, Leibniz Institute for Neurobiology, Brenneckestraße 6, 39118 Magdeburg, Germany
3 Forschungscampus STIMULATE, Sandtorstraße 23, 39106 Magdeburg, Germany
4 Centre for Behavioral Brain Sciences (CBBS), Universitätsplatz 2, 39106 Magdeburg, Germany

ABSTRACT

Objective. Convolutional neural networks (CNNs) have proven successful as function approximators and have therefore been used for classification problems, including electroencephalography (EEG) signal decoding for brain-computer interfaces (BCI). Artificial neural networks, however, are considered black boxes, because they usually have thousands of parameters, making interpretation of their internal processes challenging. Here we systematically evaluate the use of CNNs for EEG signal decoding and investigate a method for visualizing the CNN model decision process. Approach. We developed a CNN model to decode the covert focus of attention from EEG event-related potentials during object selection. We compared the performance of the CNN and the commonly used linear discriminant analysis (LDA) classifier, applied to datasets with different dimensionality, and analyzed transfer learning capacity. Moreover, we validated the impact of single model components by systematically altering the model. Furthermore, we investigated the use of saliency maps as a tool for visualizing the spatial and temporal features driving the model output. Main results. The CNN model and the LDA classifier achieved comparable accuracy on the lower-dimensional dataset, but the CNN exceeded LDA performance significantly on the higher-dimensional dataset (without hypothesis-driven preprocessing), achieving an average decoding accuracy of 90.7% (chance level = 8.3%). Parallel convolutions, tanh or ELU activation functions, and dropout regularization proved valuable for model performance, whereas sequential convolutions, the ReLU activation function, and batch normalization reduced accuracy or yielded no significant difference. Saliency maps revealed meaningful features, displaying the typical spatial distribution and latency of the P300 component expected during this task. Significance. Following systematic evaluation, we provide recommendations for when and how to use CNN models in EEG decoding. Moreover, we propose a new approach for investigating the neural correlates of a cognitive task by training CNN models on raw high-dimensional EEG data and utilizing saliency maps for relevant feature extraction.

Deep Learning | CNN | EEG | BCI | P300
Correspondence: [email protected]

INTRODUCTION

Brain-computer interfaces (BCI) represent a bridge that allows direct communication between the brain and the environment without the need for muscular activity. They could be especially beneficial for patients who have lost muscular control and are ‘locked-in’, such as due to stroke or amyotrophic lateral sclerosis, or who have sustained spinal cord injuries. To design such an interface, a voluntarily evoked neural signal must be recorded from the brain, and a pattern recognition system is required that can detect the signal.

Stimulus-evoked event-related potentials (ERP), such as the P300, are frequently elicited to provide the neural signal (40). The P300 is an attention-dependent ERP component showing a positive deflection in the ERP waveform, peaking 300 milliseconds after stimulus onset regardless of the stimulus modality, e.g., visual, auditory, or somatosensory (38). It can be reliably evoked using the oddball paradigm, in which infrequent target “relevant” stimuli are presented among frequent standard “irrelevant” stimuli, and it is mainly distributed over the midline scalp electrodes (Fz, Cz, Pz), extending to parietal electrodes. This phenomenon was utilized to design the first P300 BCI, which allowed participants to type letters by decoding their EEG signals and was called “the P300 speller” (14).

Other ERPs, such as visually evoked potentials (VEPs), can also be modulated by attention (13, 33). Both the P1 and the N1 components show larger amplitudes for attended stimuli than for non-attended stimuli. The N1 component is followed by the P2 and N2 components, which are elicited by rare, relevant stimuli (32). These modulations of VEPs depend on whether the participants attend to the target stimulus overtly or covertly, i.e., whether or not they foveate the target stimulus. The classification performance and ERP analysis of P300 spellers under overt and covert attentional conditions were investigated by Treder and Blankertz (49), who concluded that the P1, N1, P2, N2, and P3 components were enhanced in the case of overt attention, whereas in the case of covert attention only the N2 and P3 were enhanced. As a result, the classifiers depend mainly on early VEPs in the case of overt attention and on late endogenous components in the case of covert attention. Moreover, it has been shown that using the posterior occipital electrodes Oz, PO7, and PO8 in addition to the standard frontal, central, and parietal electrodes Fz, Cz, and Pz leads to significantly better classification performance of P300 spellers (27). Comparing a full set of EEG electrodes to these customized electrode sets showed that this significant increase in performance occurs only in the case of overt attention and not in the case of covert attention (6).

Machine learning techniques are effective in learning discriminative signal patterns from brain data. Most conventional machine learning techniques, however, require an initial step before classification, namely feature extraction. Feature extraction includes preprocessing such as noise removal, dimensionality reduction and transformations such as principal component analysis (PCA) or frequency decomposition, and feature selection such as channel and time interval selection (2, 50, 51). Feature extraction is hypothesis-driven and requires domain expertise; using a machine learning algorithm that autonomously determines relevant features could therefore save time and effort and potentially make use of features that would otherwise have been overlooked or removed.
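To make this conventional approach concrete, the following is a minimal sketch of such a hypothesis-driven pipeline: band-pass filtering for noise removal, channel and time-window selection, PCA for dimensionality reduction, and an LDA classifier. All parameter values, channel indices, and data shapes below are illustrative assumptions rather than the settings used in this study.

    # Illustrative hypothesis-driven feature-extraction pipeline (not the pipeline
    # used in this study): filtering, channel/time-window selection, PCA, and LDA.
    import numpy as np
    from scipy.signal import butter, filtfilt
    from sklearn.decomposition import PCA
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.pipeline import make_pipeline

    fs = 250                                    # sampling rate in Hz (assumed)
    epochs = np.random.randn(200, 29, 250)      # trials x channels x samples (dummy data)
    labels = np.random.randint(0, 2, 200)       # target vs. non-target (dummy labels)

    # 1) Noise removal: band-pass filter each epoch (cut-offs chosen arbitrarily)
    b, a = butter(4, [0.5 / (fs / 2), 10.0 / (fs / 2)], btype="band")
    epochs = filtfilt(b, a, epochs, axis=-1)

    # 2) Feature selection: keep a channel subset and a post-stimulus time window
    channel_idx = [0, 5, 10]                          # e.g. Fz, Cz, Pz (hypothetical indices)
    time_idx = slice(int(0.2 * fs), int(0.6 * fs))    # 200-600 ms after stimulus onset
    features = epochs[:, channel_idx, time_idx].reshape(len(epochs), -1)

    # 3) Dimensionality reduction and classification
    clf = make_pipeline(PCA(n_components=20), LinearDiscriminantAnalysis())
    clf.fit(features, labels)
    print("training accuracy:", clf.score(features, labels))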
Deep learning, a subarea of machine learning, has emerged in recent years as a technique that can be applied to extract features automatically from the data that maximize between-class discriminability (30). Deep learning has outperformed other machine learning techniques, especially on image and audio data, in a variety of tasks by large margins (25, 42). Most of these successes are attributed to a specific type of deep learning model called the convolutional neural network (CNN) (29). A CNN is a special neural network architecture inspired by the hierarchical structure of the ventral stream of the visual system (20). Stacking neurons with local receptive fields on top of each other leads to the development of simple and complex cells that extract features at different levels of abstraction (29).
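As a purely illustrative example of this idea, and not the architecture evaluated in this paper, such a network for EEG epochs (electrodes x time samples) can be built from a spatial convolution across electrodes followed by a temporal convolution with a local receptive field. All layer sizes and input dimensions below are arbitrary assumptions.

    # Toy CNN for EEG epochs, illustrating stacked convolutions with local receptive
    # fields; layer sizes are arbitrary and this is NOT the model proposed in the paper.
    import torch
    import torch.nn as nn

    class ToyEEGNet(nn.Module):
        def __init__(self, n_channels=29, n_samples=250, n_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(1, 8, kernel_size=(n_channels, 1)),           # spatial filter over electrodes
                nn.Tanh(),
                nn.Conv2d(8, 16, kernel_size=(1, 11), padding=(0, 5)),  # local temporal filter
                nn.Tanh(),
                nn.AvgPool2d(kernel_size=(1, 4)),                       # downsample in time
                nn.Dropout(0.5),
            )
            self.classifier = nn.Linear(16 * (n_samples // 4), n_classes)

        def forward(self, x):                 # x: (batch, 1, n_channels, n_samples)
            h = self.features(x)
            return self.classifier(h.flatten(start_dim=1))

    model = ToyEEGNet()
    logits = model(torch.randn(4, 1, 29, 250))   # four dummy EEG epochs
    print(logits.shape)                          # torch.Size([4, 2])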
Batch normalization is an algorithm that was developed to help facilitate faster training of neural networks with convergence on lower minima (22). A CNN was developed to examine the impact of including batch normalization on P300 detection accuracy (31), and it was shown that removing more than one batch normalization layer markedly decreased the classification accuracy. CNNs trained on EEG data have also shown promising potential in decoding motor imagery tasks relative to the commonly used filter bank common spatial patterns (FBCSP) algorithm (28, 43).

In order to make CNNs interpretable, several feature visualization methods have been developed to explain the performance of CNN models and the relevant features that drive their outputs (3, 35, 39). Some of these techniques have been adopted, and other novel techniques developed, for CNNs applied to EEG data (28, 43, 47). Here we adopt the gradient-based visualization technique commonly used in computer vision research called saliency maps (44). A saliency map is generated as the gradient of the CNN output with respect to its input example. It is computed using backpropagation, after fixing the weights of the trained network, by propagating the gradient with respect to the layers’ inputs back to the first layer that receives the input data. We can interpret the saliency map as representing the relevant features that drive the network’s decision (a minimal sketch of this computation is given at the end of this section).

Here we investigated the potential of a CNN model for decoding the focus of covert attention from EEG data, systematically tested the impact of different components of the model on its performance, and examined the model’s capacity for transfer learning. We explored the use of saliency maps to visualize the spatial and temporal features driving the model output.
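Below is a minimal sketch of that gradient computation, assuming a trained PyTorch model such as the illustrative ToyEEGNet defined earlier; the class index, data shape, and use of the absolute gradient are assumptions made for the sake of the example.

    # Gradient-based saliency map: fix the trained weights and backpropagate the
    # class score to the input epoch (sketch only; model and shapes are illustrative).
    import torch

    model = ToyEEGNet()              # in practice, a trained model would be loaded here
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)      # fix the weights; only the input receives a gradient

    epoch = torch.randn(1, 1, 29, 250, requires_grad=True)   # one EEG epoch (dummy data)
    score = model(epoch)[0, 1]       # output score for the class of interest (e.g. "target")
    score.backward()                 # propagate the gradient back to the input layer

    saliency = epoch.grad.abs().squeeze()    # (n_channels, n_samples) relevance map
    print(saliency.shape)                    # torch.Size([29, 250])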