Toward an Imagined Speech-Based Brain Computer Interface Using EEG Signals
Mashael M. Alsaleh
Department of Computer Science
The University of Sheffield

This thesis is submitted for the degree of Doctor of Philosophy
Speech and Hearing Research Group
July 2019

I would like to dedicate this thesis to my loving parents, thank you for everything, and to the ever-loving memory of my sister Norah, until we meet again.

Declaration

I hereby declare that I am the sole author of this thesis. The contents of this thesis are my original work and have not been submitted for any other degree or to any other university. Some parts of the work presented in Chapters 3, 4, 5, and 6 have been published in conference proceedings and a journal, as given below:

• AlSaleh, M. M., Arvaneh, M., Christensen, H., and Moore, R. K. (2016, December). Brain-computer interface technology for speech recognition: a review. In 2016 Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) (pp. 1-5). IEEE.

• AlSaleh, M., Moore, R., Christensen, H., and Arvaneh, M. (2018, July). Discriminating Between Imagined Speech and Non-Speech Tasks Using EEG. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) (pp. 1952-1955). IEEE.

• Alsaleh, M., Moore, R., Christensen, H., and Arvaneh, M. (2018, October). Examining Temporal Variations in Recognizing Unspoken Words Using EEG Signals. In 2018 IEEE International Conference on Systems, Man, and Cybernetics (SMC) (pp. 976-981). IEEE.

• Alsaleh, M., Moore, R., Christensen, H., and Arvaneh, M. (2019). EEG-based Recognition of Unspoken Speech using Dynamic Time Warping. Planned journal publication.

Mashael M. Alsaleh
July 2019

Acknowledgements

In the name of ALLAH, the Most Gracious and the Most Merciful. Thanks to ALLAH, who is the source of all the knowledge in this world, for the strength and guidance to complete this thesis. I thank those who rendered their direct or indirect support during the period of my PhD work.

First of all, I would like to express my utmost gratitude to my supervisors, Professor Roger K. Moore and Dr. Mahnaz Arvaneh, for their mentorship, guidance and support throughout my time studying in Sheffield. Their constructive comments and suggestions throughout the experimental and thesis work have contributed to the success of this research. It has been an honour and a privilege to work with them. Special thanks to my panel members, Dr. Heidi Christensen and Dr. Anthony Simons, for their useful suggestions and for being so kind to me. I would also like to thank my examiners, Prof. Tomas Ward and Dr. Yoshi Gotoh, for their valuable comments and an intellectually stimulating and enjoyable viva.

I am hugely grateful to my parents for their support, prayers, love and care throughout my life; they have played a vital role in helping me to reach this milestone. I owe special thanks to my sister Nouf and my brothers Abdullah and Saad for their continuous love and prayers. My beautiful niece Norah, you came into our family bringing much joy and love. Thanks also to my friends, who never let me feel that I am away from my homeland: Rabab, Sadeen, Lubna, Hanan, Maryam, and Najwa. The members of the Speech and Hearing lab and the Physiological Signals and Systems lab at the University of Sheffield have been, and continue to be, a source of knowledge, friendship and humour.
Last but not least, I thankfully acknowledge the financial funding for this work from King Saud University in Saudi Arabia, which awarded me a scholarship to pursue my PhD studies.

Abstract

Individuals with physical disabilities face difficulties in communication. A number of neuromuscular impairments could prevent people from using available communication aids, because such aids require some degree of muscle movement. This makes brain-computer interfaces (BCIs) a potentially promising alternative communication technology for these people. Electroencephalographic (EEG) signals are commonly used in BCI systems to capture, non-invasively, the neural representations of intended, internal and imagined activities that are not physically or verbally evident. Examples include motor and speech imagery activities. Since 2006, researchers have become increasingly interested in classifying different types of imagined speech from EEG signals. However, the field still has a limited understanding of several issues, including experiment design, stimulus type, training, calibration and the examined features.

The main aim of the research in this thesis is to advance the automatic recognition of imagined speech using EEG signals by addressing a variety of issues that have not been solved in previous studies. These include (1) improving the discrimination between imagined speech and non-speech tasks, (2) examining temporal parameters to optimise the recognition of imagined words, and (3) providing a new feature extraction framework that improves EEG-based imagined speech recognition by considering temporal information after reducing within-session temporal non-stationarities.

For the discrimination of speech versus non-speech, EEG data were collected during the imagination of randomly presented and semantically varying words. The non-speech tasks involved attention to visual stimuli and resting. Time-domain and spatio-spectral features were examined in different time intervals. Above-chance-level classification accuracies were achieved for each word, and for groups of words, compared to the non-speech tasks.

To classify imagined words, EEG data related to the imagination of five words were collected. In addition to word classification, the impact of experimental parameters on classification accuracy was examined. The optimisation of these parameters is important for improving the rate and speed of recognising unspoken speech in online applications. These parameters included the training size, the classification algorithm, the time interval used for feature extraction, and the use of imagination time length as a classification feature. Our extensive results showed that a Random Forest classifier, with features extracted using the Discrete Wavelet Transform from a fixed 4-second EEG time frame, yielded the highest average classification accuracy of 87.93% in the classification of the five imagined words.

To minimise within-class temporal variations, a novel feature extraction framework based on dynamic time warping (DTW) was developed. Using linear discriminant analysis as the classifier, the proposed framework yielded an average accuracy of 72.02% in the classification of imagined speech versus silence and 52.5% in the classification of five words. These results significantly outperformed a baseline configuration of state-of-the-art time-domain features.
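To make the speech versus non-speech pipeline concrete, the sketch below illustrates one common spatio-spectral feature: Common Spatial Patterns (CSP) computed on band-pass-filtered EEG. CSP itself, the 8-30 Hz band, the 128 Hz sampling rate and the 14-channel montage are illustrative assumptions on my part; Chapter 4 specifies the features actually examined.

```python
# Hedged sketch of spatio-spectral features via CSP on band-pass-filtered
# EEG, for imagined speech vs rest. Band, rate and montage are assumed.
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, filtfilt

def class_covariance(trials):
    """Average trace-normalised spatial covariance over (trials, ch, samples)."""
    return np.mean([x @ x.T / np.trace(x @ x.T) for x in trials], axis=0)

def csp_filters(class_a, class_b, n_pairs=2):
    """Spatial filters maximising the variance ratio between two classes."""
    cov_a, cov_b = class_covariance(class_a), class_covariance(class_b)
    w, V = eigh(cov_a, cov_a + cov_b)       # generalised eigenproblem
    order = np.argsort(w)                   # extreme eigenvalues discriminate most
    return V[:, np.r_[order[:n_pairs], order[-n_pairs:]]].T

rng = np.random.default_rng(0)
b, a = butter(4, [8, 30], btype="band", fs=128)             # assumed band/rate
speech = filtfilt(b, a, rng.standard_normal((30, 14, 512)), axis=-1)
rest = filtfilt(b, a, rng.standard_normal((30, 14, 512)), axis=-1)

W = csp_filters(speech, rest)
# Log-variance of the spatially filtered trials is the usual classifier input.
features = np.log(np.var(np.einsum("fc,tcs->tfs", W, speech), axis=-1))
print(features.shape)                                       # (30, 4)
```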
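As a minimal sketch of the reported word-classification pipeline (Discrete Wavelet Transform features from a fixed 4-second window, classified with a Random Forest), the snippet below computes simple per-channel sub-band statistics on synthetic data. The 'db4' wavelet, decomposition level, 128 Hz rate and 14 channels are assumptions made for illustration; Chapter 5 gives the exact settings.

```python
# DWT features + Random Forest, on synthetic stand-in data. Wavelet family,
# level, sampling rate and channel count are illustrative assumptions.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def dwt_features(trial, wavelet="db4", level=4):
    """trial: (channels, samples). Simple statistics per wavelet sub-band."""
    feats = []
    for channel in trial:
        for band in pywt.wavedec(channel, wavelet, level=level):
            feats += [band.mean(), band.std(), np.sum(band ** 2)]  # sub-band energy
    return np.asarray(feats)

# Synthetic stand-in: 20 trials per word, 14 channels, 4 s at 128 Hz.
rng = np.random.default_rng(1)
X = np.stack([dwt_features(rng.standard_normal((14, 512))) for _ in range(100)])
y = np.repeat(np.arange(5), 20)                 # five imagined words

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # ~0.2 (chance) on pure noise
```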
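The DTW framework itself is developed in Chapter 6; as a hedged illustration of the general idea only, the sketch below describes each trial by its DTW distances to one reference sequence per class and classifies those distances with linear discriminant analysis. The single-channel trials and class-mean templates are simplifying assumptions, not the thesis's exact procedure.

```python
# Illustration of DTW-distance features + LDA. Not the exact Chapter 6
# framework: single-channel trials and mean templates are assumed.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) DTW with squared-difference local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

rng = np.random.default_rng(2)
trials = rng.standard_normal((50, 128))         # 10 single-channel trials per word
labels = np.repeat(np.arange(5), 10)
templates = [trials[labels == c].mean(axis=0) for c in range(5)]

# Feature vector = DTW distance from the trial to each class template.
X = np.array([[dtw_distance(t, tpl) for tpl in templates] for t in trials])
print(LinearDiscriminantAnalysis().fit(X, labels).score(X, labels))
```

Because DTW aligns sequences before comparing them, distances of this kind are less sensitive to trial-to-trial timing jitter than sample-by-sample features, which is the motivation for using them against within-session temporal non-stationarity.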
Table of contents

List of figures
List of tables

1 Introduction
  1.1 Research challenges in speech recognition using EEG signals
    1.1.1 The nature of EEG signals
    1.1.2 Speech imagery research
  1.2 Research aim and objectives
  1.3 Thesis structure

2 Brain Computer Interface for Communication
  2.1 Brain and language
    2.1.1 Process of speech production in the human brain
    2.1.2 The similarity between voiced and imagined speech production
  2.2 How BCI works
  2.3 BCI applications
  2.4 BCI technologies for signal acquisition
    2.4.1 Invasive BCI
    2.4.2 Non-invasive
  2.5 Electroencephalography (EEG)
    2.5.1 EEG acquisition device
    2.5.2 Neurophysiological signals in EEG used for BCI-based communication
  2.6 Summary

3 Brain Computer Interface for Unspoken Speech Recognition
  3.1 Invasive Electrocorticography (ECoG)
  3.2 Functional Magnetic Resonance Imaging (fMRI)
  3.3 Functional Near-Infrared Spectroscopy (fNIRS)
  3.4 Magnetoencephalography (MEG)
  3.5 Electroencephalogram (EEG)
    3.5.1 Word imagination
    3.5.2 Syllable imagination
    3.5.3 Vowel imagination
  3.6 Discussion on current EEG-based studies
    3.6.1 Stimuli and tasks design
    3.6.2 Feature extraction
    3.6.3 Classification accuracies
  3.7 Summary

4 Discriminating between Imagined Speech and Non-speech
  4.1 Effect of word meaning on brain activity
  4.2 Experiment
    4.2.1 Participants
    4.2.2 Device
    4.2.3 Stimuli
    4.2.4 Task
    4.2.5 Data pre-processing
    4.2.6 Feature extraction
    4.2.7 Classification
  4.3 Results and discussion
    4.3.1 Classifying groups of words vs non-speech
    4.3.2 Spatio-spectral features vs time-domain features to classify imagined words vs relaxation
    4.3.3 Combining spatio-spectral features and time-domain features to classify imagined words vs relaxation
    4.3.4 Classification of individual words versus relaxation
  4.4 Conclusions
  4.5 Summary

5 Examining Temporal Issues Related to Imagined Speech Recognition
  5.1 Experiment design
    5.1.1 Participants
    5.1.2 EEG device
    5.1.3 Stimuli and task
    5.1.4 Procedure
  5.2 Data analysis
    5.2.1 Pre-processing
    5.2.2 Feature extraction
    5.2.3 Classification
  5.3 Results and discussion
    5.3.1 Classifying between the five imagined words
    5.3.2 Effect of training size
    5.3.3 The relation between repetition order and classification accuracy
    5.3.4 The effect of imagination time
  5.4 Conclusions
  5.5 Summary

6 Dynamic Time Warping in the Recognition of Imagined Speech
  6.1 Dynamic time warping
  6.2 DTW using EEG and ECoG
  6.3 Proposed DTW-based framework for feature extraction
    6.3.1 Computing training features using the proposed DTW framework
    6.3.2 Classifying a new EEG trial using the proposed framework
  6.4 Methodology