Convolutional Neural Networks for EEG Signal Classification in Asynchronous Brain-Computer Interfaces
DISSERTATION

CONVOLUTIONAL NEURAL NETWORKS FOR EEG SIGNAL CLASSIFICATION IN ASYNCHRONOUS BRAIN-COMPUTER INTERFACES

Submitted by
Elliott M. Forney
Department of Computer Science

In partial fulfillment of the requirements
For the Degree of Doctor of Philosophy

Colorado State University
Fort Collins, Colorado
Fall 2019

Doctoral Committee:

Advisor: Charles Anderson

Asa Ben-Hur
Michael Kirby
Donald Rojas

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/ or send a letter to: Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.

ABSTRACT

Brain-Computer Interfaces (BCIs) are emerging technologies that enable users to interact with computerized devices using only voluntary changes in their mental state. BCIs have a number of important applications, especially in the development of assistive technologies for people with motor impairments. Asynchronous BCIs are systems that aim to establish smooth, continuous control of devices like mouse cursors, electric wheelchairs and robotic prostheses without requiring the user to interact with time-locked external stimuli. Scalp-recorded Electroencephalography (EEG) is a noninvasive approach for measuring brain activity that shows considerable potential for use in BCIs. Inferring a user's intent from spontaneously produced EEG signals remains a challenging problem, however, and generally requires specialized machine learning and signal processing methods. Current approaches typically involve guided preprocessing and feature generation procedures used in combination with carefully regularized, often linear, classification algorithms.
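The feature-engineering pipeline described above can be illustrated with a minimal sketch. This is a hypothetical example, not the dissertation's implementation: it assumes periodogram-based log-PSD features and a shrinkage-regularized two-class LDA, one common instance of the "carefully regularized, often linear" approach.

```python
import numpy as np

def psd_features(window, fs=256.0):
    """Log power-spectral-density features for one EEG window.
    window: (n_channels, n_samples). Returns a flat feature vector.
    (Periodogram PSD is an assumption for illustration.)"""
    spec = np.abs(np.fft.rfft(window, axis=1)) ** 2 / window.shape[1]
    return np.log(spec + 1e-12).ravel()

class ShrinkageLDA:
    """Two-class LDA with the pooled covariance shrunk toward a scaled
    identity; the shrinkage weight here is an arbitrary illustration."""
    def __init__(self, shrinkage=0.1):
        self.shrinkage = shrinkage

    def fit(self, X, y):
        X0, X1 = X[y == 0], X[y == 1]
        self.m0, self.m1 = X0.mean(axis=0), X1.mean(axis=0)
        # pooled within-class covariance
        cov = np.cov(np.vstack([X0 - self.m0, X1 - self.m1]).T)
        d = cov.shape[0]
        cov = (1 - self.shrinkage) * cov \
            + self.shrinkage * (np.trace(cov) / d) * np.eye(d)
        # linear discriminant direction and bias
        self.w = np.linalg.solve(cov, self.m1 - self.m0)
        self.b = -0.5 * self.w @ (self.m0 + self.m1)
        return self

    def predict(self, X):
        return (X @ self.w + self.b > 0).astype(int)
```

In a BCI setting, `psd_features` would be applied to each windowed segment of the recording and the resulting vectors fed to the classifier; shrinkage keeps the covariance estimate invertible when features outnumber training windows.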
The current trend in machine learning, however, is to move away from approaches that rely on feature engineering in favor of multilayer (deep) artificial neural networks that rely on few prior assumptions and are capable of automatically learning hierarchical, multiscale representations. Along these lines, we propose several variants of the Convolutional Neural Network (CNN) architecture that are specifically designed for classifying EEG signals in asynchronous BCIs. These networks perform convolutions across time with dense connectivity across channels, which allows them to capture spatiotemporal patterns while achieving time invariance. Class labels are assigned using linear readout layers with label aggregation in order to reduce susceptibility to overfitting and to allow for continuous control. We also utilize transfer learning in order to reduce overfitting and leverage patterns that are common across individuals. We show that these networks are multilayer generalizations of Time-Delay Neural Networks (TDNNs) and that the convolutional units in these networks can be interpreted as learned, multivariate, nonlinear, finite impulse-response filters.

We perform a series of offline experiments using EEG data recorded during four imagined mental tasks: silently count backward from 100 by 3's, imagine making a left-handed fist, visualize a rotating cube and silently sing a favorite song. Data were collected using a portable, eight-channel EEG system from 10 participants with no impairments in a laboratory setting and four participants with motor impairments in their home environments. Experimental results demonstrate that our proposed CNNs consistently outperform baseline classifiers that utilize power-spectral densities. Transfer learning yields an additional performance improvement, but only when used in combination with multilayer networks.
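The interpretation of a convolutional unit as a multivariate, nonlinear FIR filter can be made concrete with a short sketch. This is a hypothetical numpy illustration, not the networks used in this work: the tanh nonlinearity and mean-based label aggregation are assumptions for the example.

```python
import numpy as np

def conv_unit(eeg, weights, bias=0.0):
    """One convolutional unit: convolution across time with dense
    connectivity across channels, i.e. a multivariate FIR filter
    followed by a nonlinearity (tanh assumed here).
    eeg:     (n_channels, n_samples)
    weights: (n_channels, filter_len), one set of FIR taps per channel."""
    n_ch, flen = weights.shape
    n_out = eeg.shape[1] - flen + 1
    out = np.empty(n_out)
    for t in range(n_out):
        # dense across channels, sliding across time
        out[t] = np.sum(weights * eeg[:, t:t + flen]) + bias
    return np.tanh(out)

def aggregate_readout(features, W, b):
    """Linear readout applied at every time step, with per-step class
    scores averaged (label aggregation) to yield one label per window.
    features: (n_units, n_steps); W: (n_classes, n_units)."""
    scores = W @ features + b[:, None]
    return scores.mean(axis=1).argmax()
```

Because the same taps slide over every time step, the unit's response is time-invariant, and aggregating the per-step readout scores lets a single label be emitted for a window of arbitrary length.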
Our final test results achieve a mean classification accuracy of 57.86%, which is 8.57 percentage points higher than the 49.29% achieved by our baseline classifiers. In terms of information transfer rates, our proposed methods achieve a mean of 15.82 bits-per-minute while our baseline methods achieve 9.35 bits-per-minute. For two individuals, our CNNs achieve a classification accuracy of 90.00%, which is 10–20 percentage points higher than our baseline methods. A comparison with external studies suggests that these results are on par with the state-of-the-art, despite our relatively rigorous experimental design.

We also perform a number of experiments that analyze the types of patterns our classifiers learn to utilize. This includes a detailed analysis of aggregate power-spectral densities, examining the layer-wise activations produced by our CNNs, extracting the frequency responses of convolutional layers using Fourier analysis and finding optimized input sequences for trained networks. These analyses highlight several ways that the patterns our methods learn to utilize are related to known patterns that occur in EEG signals while also raising new questions about some types of patterns, including high-frequency information. Examining the behavior of our CNNs also provides insights into the inner workings of these networks and demonstrates that they are, in fact, learning to form hierarchical, multiscale representations of EEG signals.

ACKNOWLEDGEMENTS

I would like to thank Charles Anderson for the teaching, mentorship and guidance that he has provided me over the years and for his feedback, comments and proofreading of this document. Much of the code that I have used in my implementations also originated during his courses in machine learning and from his contributions to the CEBL3 project. I have been hugely impacted by all of the wonderful and inspirational teachers at CSU and I would like to thank you all.
I would also like to thank my dissertation committee and all of the members of the BCI laboratory. Bill Gavin, Patti Davies and Marla Roll, in particular, have taught me much of what I know about EEG and how to work with participants in a kind and professional way. Brittany Taylor, Jewel Crasta, Stephanie Scott, Katie Bruegger, Kim Teh and Stephanie Teh have all been especially instrumental in the success of my research and the BCI project and have all been great sources of support and friendship. Tomojit Ghosh and Fereydoon Vafaei have been great friends and colleagues and were extremely helpful throughout the process of formulating my ideas and designing my experiments. I would also like to extend a special thank you to everyone who participated in our studies and, especially, to those who have graciously allowed us to enter their homes. I truly hope that this research helps to develop next-generation assistive technologies for everyone who can benefit from them.

This work was supported in part by the National Science Foundation through grant number 1065513 and through generous support from the Department of Computer Science, Department of Occupational Therapy and Department of Human Development and Family Studies at Colorado State University. I have also received generous financial support from the Forney Family Scholarship and through the Computer Science Department's Artificial Intelligence and Evolutionary Computation Fellowship.

I would like to thank my father, Glen Forney, for igniting my passion for science and my mother, Nancy Forney, for passing on her passion for reading, writing and critical thinking. Most of all, I would like to thank my wife, Maggie, and my two children, Parker and Alexa, for their phenomenal and enduring support of my academic endeavors.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENTS
LIST OF TABLES
LIST OF FIGURES
Chapter 1  Introduction
    1.1  Challenges
    1.2  Problem Statement
    1.3  Proposed Solution
    1.4  Outline of Remaining Chapters
Chapter 2  Background
    2.1  Current Approaches
        2.1.1  Power Spectral Densities
        2.1.2  Common Spatial Patterns
        2.1.3  Time-Delay Embedding
    2.2  Deep Learning
        2.2.1  Convolutional Neural Networks
        2.2.2  CNNs in Computer Vision
        2.2.3  CNNs in Brain-Computer Interfaces
        2.2.4  Transfer Learning
Chapter 3  Methods
    3.1  Baseline Classifiers
        3.1.1  Preprocessing and Feature Generation
        3.1.2  Discriminant Analysis
        3.1.3  Artificial Neural Networks
    3.2  Time-Delay Neural Networks
        3.2.1  Label Aggregation
        3.2.2  Time-Delay Embedding as a Discrete Convolution
        3.2.3  Finite Impulse-Response Filters
    3.3  Deep Convolutional Networks
        3.3.1  Fully Connected Readout Layers
        3.3.2  Label Aggregation Readout Layers
        3.3.3  Transfer Learning
        3.3.4  Optimizing Input Sequences
    3.4  Neural Network Details
        3.4.1  Initialization and Transfer Function
        3.4.2  Pooling
        3.4.3  Scaled Conjugate Gradients
        3.4.4  Regularization
    3.5  Data and Implementation
        3.5.1  The Colorado EEG and BCI Laboratory Software
        3.5.2  EEG Acquisition System
        3.5.3  Participants and Mental Tasks
        3.5.4  Validation and Performance Evaluation
Chapter 4  Results
    4.1  Model Selection
        4.1.1  Power Spectral Densities
        4.1.2  Time-Delay Embedding Networks
        4.1.3  Convolutional Networks
        4.1.4  Label Aggregation
        4.1.5  Transfer Learning
    4.2  Test Performance
        4.2.1  Power Spectral Densities
        4.2.2  Convolutional Networks
        4.2.3  Comparison with External Studies
        4.2.4  Varying the Decision Rate
        4.2.5  Mental Tasks
    4.3  Analysis
        4.3.1  Aggregate PSDs
        4.3.2  LDA Weight Analysis
        4.3.3  Learned Filters
        4.3.4  Layerwise Outputs
        4.3.5  Optimized Input Sequences
Chapter 5  Conclusions
    5.1  Discussion and Summary
    5.2  Looking Forward
Bibliography
Appendix A  Comparisons of Mean Test Classification Accuracies
Appendix B  Layerwise Outputs for a Full-Sized CNN-TR

LIST OF TABLES

3.1  Summary of PSD baseline classifiers.
3.2  Summary of CNN architectures.
3.3  Specifications of our EEG acquisition system.
3.4  Mental tasks used and cues shown to the participants.
4.1  Network configurations for transfer learning experiments.