
Brain-Computer Interfaces

ISSN: 2326-263X (Print) 2326-2621 (Online) Journal homepage: http://www.tandfonline.com/loi/tbci20

Neurolinguistic and machine-learning perspectives on direct speech BCIs for restoration of naturalistic communication

Olga Iljina, Johanna Derix, Robin Tibor Schirrmeister, Andreas Schulze- Bonhage, Peter Auer, Ad Aertsen & Tonio Ball

To cite this article: Olga Iljina, Johanna Derix, Robin Tibor Schirrmeister, Andreas Schulze- Bonhage, Peter Auer, Ad Aertsen & Tonio Ball (2017): Neurolinguistic and machine-learning perspectives on direct speech BCIs for restoration of naturalistic communication, Brain-Computer Interfaces, DOI: 10.1080/2326263X.2017.1330611

To link to this article: http://dx.doi.org/10.1080/2326263X.2017.1330611

© 2017 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group

Published online: 21 Jun 2017.




Olga Iljina (a,b,c,e,g,h), Johanna Derix (e,h), Robin Tibor Schirrmeister (e,h), Andreas Schulze-Bonhage (d,e), Peter Auer (a,b,c,f), Ad Aertsen (g,i) and Tonio Ball (e,h)

(a) GRK 1624 'Frequency Effects in Language', University of Freiburg, Freiburg, Germany; (b) Department of German Linguistics, University of Freiburg, Freiburg, Germany; (c) Hermann Paul School of Linguistics, University of Freiburg, Freiburg, Germany; (d) Epilepsy Center, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; (e) BrainLinks-BrainTools, University of Freiburg, Freiburg, Germany; (f) Freiburg Institute for Advanced Studies (FRIAS), University of Freiburg, Freiburg, Germany; (g) Neurobiology and Biophysics, Faculty of Biology, University of Freiburg, Freiburg, Germany; (h) Translational Neurotechnology Lab, Department of Neurosurgery, University Medical Center Freiburg, Faculty of Medicine, University of Freiburg, Freiburg, Germany; (i) Bernstein Center Freiburg, University of Freiburg, Freiburg, Germany

ABSTRACT
The ultimate goal of brain-computer-interface (BCI) research on speech restoration is to develop devices which will be able to reconstruct spontaneous, naturally spoken language from the underlying neuronal signals. From this it follows that thorough understanding of brain activity and its functional dynamics during real-world speech will be required. Here, we review current developments in intracranial neurolinguistic and BCI research on speech production under increasingly naturalistic conditions. With an example of neurolinguistic data from our ongoing research, we illustrate the plausibility of neurolinguistic investigations in non-experimental, out-of-the-lab conditions of speech production. We argue that interdisciplinary endeavors at the interface of neuroscience and linguistics can provide valuable insight into the functional significance of speech-related neuronal data. Finally, we anticipate that work with neurolinguistic corpora composed of real-world language samples and simultaneous neuronal recordings, together with machine-learning methodology accounting for the specifics of the neurolinguistic material, will improve the functionality of speech BCIs.

ARTICLE HISTORY
Received 2 August 2016; Accepted 11 May 2017

KEYWORDS
BCI; ECoG; speech production; neurolinguistics; real-world communication

1. Introduction

Establishing approaches to enable communication in severely paralyzed patients is a major aim of brain-computer-interface (BCI) research [1]. One increasingly pursued method is to infer the message the patient desires to convey from the underlying neuronal activity in speech-related brain areas and to externalize the message using an assistive technical device. A speech BCI based on this method can be referred to as 'direct', as it relies on a one-to-one correspondence between the neuronal activity and the behavioral output [2]. For instance, if a paralyzed patient wants to say, 'I'd like to have a cup of strong black coffee, please', a direct speech BCI, also referred to as a 'brain-to-text' system [3,4], will try to reconstruct this utterance as precisely as possible from the underlying neuronal signal and convert it into, for example, a text message or into voiced synthetic speech via a text-to-speech device.

Speech reconstruction from brain activity is not only an exciting but also a challenging endeavor, and different recording methods, both hemodynamic and electrophysiological, have been used to attempt it. Each of them has its own drawbacks and advantages. Popular hemodynamic methods are functional magnetic resonance imaging (fMRI) and functional near-infrared spectroscopy (fNIRS). They detect brain activity by non-invasive measurements of changes in the level of cerebral oxygenation. These are correlated with changes in neuronal activation [5], although this correlation can be loose [6] and dependent on the investigated anatomical area [7]. fMRI has the advantage of whole-head coverage, and it allows studying the entire network involved in task-related processing. In comparison with fMRI, fNIRS can only measure hemodynamic responses from the outer cortical layers, but it is associated with the advantages of being silent, portable, and cheaper to acquire [8,9].

CONTACT Olga Iljina [email protected]
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives License (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited, and is not altered, transformed, or built upon in any way.

Recent BCI studies have shown that fNIRS can be used to decode intended simple communication via 'yes' or 'no' responses in paralyzed subjects [10,11], suggesting that fNIRS can aid restoration of communicative abilities in complete paralysis [12]. Hemodynamic methods of measurement are non-invasive, and they possess good spatial resolution. Nevertheless, their main demerit for speech-related research is that the temporal resolution of the recorded signals only allows detecting changes on the temporal scale of seconds, corresponding to the slow speed of metabolic processes inherent to these methods. To be able to capture dynamic speech-related changes in the time range of milliseconds, electrophysiological recordings [3] or combinations of electrophysiological and hemodynamic methods [10] may be helpful. Electroencephalography (EEG) and magnetoencephalography (MEG) are popular non-invasive methods to study speech-related neuronal processing. These methods can record electrophysiological signals from the cortical surface, and they possess very high temporal resolution and spatial resolution comparable with fNIRS and fMRI, respectively. One demerit of these methods, however, is that they can only record bone- and scalp-filtered signals and therefore offer inferior signal-to-noise ratios compared with intracranial recordings [13]. An important methodological drawback of both hemodynamic and non-invasive electrophysiological methods for studies on expressive language is that they are prone to distortion of the brain signal by myographic activity. It accompanies expressive speech, during covert production as well [14], and may thus obscure biologically relevant neuronal responses. Due to this drawback, perception-based paradigms are often preferred when studying linguistic phenomena with the help of non-invasive methods [15], and research on linguistic functions in conditions of speech production is underrepresented in comparison [16].

Electrocorticography (ECoG) obtained in neurological patients is an attractive alternative to non-invasive data, especially for studies on expressive language. Such recordings have a high signal-to-noise ratio, they possess very high temporal and high spatial resolution under the area covered by electrodes [13,17], and their invasive nature keeps the impact of myographic activity on the recorded signal moderate (a comparison of simultaneously recorded ECoG and EEG over the same cortical region is shown in Fig. 1 of Derix et al. [18]; also see Fiederer et al. [19] for a characterization of myographic impacts on ECoG signals). Due to its invasiveness, this method of measurement can only be implemented for neurolinguistic research in consented neurological patients in whom ECoG electrodes are implanted for diagnostic procedures. Accordingly, ECoG is available for research only at large neurosurgical centers, and the placement and the number of electrodes can differ between subjects due to their individual clinical needs. Also, a researcher may often have to wait for a suitable patient, in whom the main areas of interest are sufficiently covered by electrodes and whose clinical picture allows for studying the areas of interest (e.g. one may want to make sure the seizure onset zone in patients with epilepsy lies outside the respective areas) [18,20,21]. Important advantages of using this method for speech-related research are that neuronal activity can be recorded directly from the cortical surface, and that recordings, which are usually obtained over an extended period of one to three weeks of pre-neurosurgical diagnostics, can be used to study naturalistic, self-initiated human behavior [21,22] and real-world communication [18,20,23].

Another invasive method used in previous direct speech BCI research is intracortical microelectrodes, which can record extracellular potentials from small multi-unit populations. Like ECoG, this recording method possesses high signal quality and robustness against ocular and other movement-related artifacts, and it has been used in long-term clinical trials due to the comparatively small area of implantation and hence a reduced risk of infection or mechanical damage to the neuronal tissue [24,25]. A major limitation of this approach, however, is that it allows recording from small cortical regions, which may be a risk factor for long-term stability of the recorded neuronal signal.

In the following, we describe the latest developments in the field of ECoG research on speech production using increasingly naturalistic approaches (section 2), illustrate the plausibility of studying speech production-related neuronal activity in conditions of real-world communication using a visualized presentation of such activity from our ongoing research (section 3), and address the challenge of understanding the functional significance of brain activity in its temporal dynamics during continuous, non-experimental communication (section 4). Then, we review the latest developments in the area of direct speech BCI research with ECoG from linguistic (section 5) and machine-learning (section 7) perspectives. We argue in these sections that direct speech BCIs can profit from linguistically grounded approaches embracing multiple levels of linguistic abstraction and from the usage of decoding methodologies accounting for the hierarchical and probabilistic nature of the linguistic material. In section 6, we propose that methodology from corpus-based linguistic research may be useful to refine principles of natural speech decoding. Finally, we provide an outlook for the field of invasive direct speech BCIs in section 8.

Figure 1. Spatially and temporally specific ECoG responses can be observed at different temporal stages of non-experimental, real-life production of simple clauses. (A) Individual electrode positions of one subject are visualized on the standard brain from SPM5 based on their MNI coordinates. Anatomical assignment (procedure described in detail in [21]) and the results of electrocortical stimulation mapping (ESM) are color-coded (see legend). Electrode labels (e.g. C3) are included for ease of reference. PMC, premotor cortex; PFC, prefrontal cortex; BR, Broca's area; SPC/IPC, superior/inferior parietal cortex; PoP, parietal operculum; A1, primary auditory cortex; S1, primary somatosensory cortex; STG, superior temporal gyrus without assignment to a particular functional area. (B) A schematic illustrating the linguistic composition of the simple clauses. Dotted lines with markers above them ('ss', speech start; 'cls', clause start; 'cle', clause end; 'se', speech end) indicate the positions of the corresponding linguistic events (also referred to as 'conditions') in the simultaneously recorded raw ECoG signal. A schematic of the raw potential at one electrode is visualized as a black solid line. Vertical dashed lines: onset of the respective condition. (C) Examples of relative spectral magnitude changes (RSMC) in speech-relevant anatomical areas of the respective subject during non-experimental, real-world production of 232 simple clauses. '0' (white color) indicates no change in spectral magnitude relative to the baseline period; yellow and red colors depict increases, and blue colors decreases, relative to the baseline. The time information above the bullets shows trial-averaged (median) temporal differences between conditions in the course of the speech production epoch. Please note that the short temporal differences between 'ss' and 'cls', 'cle' and 'se', 'cls' and 'mp1', and 'mp2' and 'cle' may be due to the fact that these trial categories could, in some instances, coincide in time due to their linguistic definition. Abbreviations: frq., frequency; log., natural logarithm. Asterisks before the dotted line, significant effects in the early time window; asterisks after the dotted line, significant effects in the later time window; other conventions as in (B).
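Relative spectral magnitude changes of the kind shown in panel (C) can be obtained with a sliding-window multitaper spectrum followed by division by a pre-onset baseline. The following Python sketch illustrates the principle under stated assumptions (a 200-ms window, 20-ms step, five Slepian tapers, 1024-Hz sampling, and a baseline at −2 to −1.5 s relative to condition onset); the function names and the single-condition simplification are ours for illustration, not the authors' code, which additionally pools baselines across trials of six conditions.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_magnitude(x, fs, win_s=0.2, step_s=0.02, n_tapers=5):
    """Sliding-window multitaper spectral magnitude of a 1-D signal.

    Returns (times, freqs, mag); `mag` has shape (n_freqs, n_windows) and
    `times` holds window centers in seconds from the start of `x`.
    """
    nwin = int(win_s * fs)                       # ~200-ms window -> ~5-Hz resolution at 1024 Hz
    step = int(step_s * fs)                      # 20-ms step
    tapers = dpss(nwin, NW=(n_tapers + 1) / 2, Kmax=n_tapers)  # Slepian tapers
    starts = np.arange(0, len(x) - nwin + 1, step)
    freqs = np.fft.rfftfreq(nwin, d=1.0 / fs)
    mag = np.empty((len(freqs), len(starts)))
    for i, s in enumerate(starts):
        seg = x[s:s + nwin] * tapers             # (n_tapers, nwin), one row per taper
        mag[:, i] = np.abs(np.fft.rfft(seg, axis=1)).mean(axis=0)  # average over tapers
    return (starts + nwin / 2) / fs, freqs, mag


def rsmc(trials, fs, t0=-2.0, base=(-2.0, -1.5)):
    """Log relative spectral magnitude change for one condition.

    `trials` is (n_trials, n_samples), epoched from `t0` seconds relative to
    the condition onset. Each time-frequency bin is divided by a per-frequency
    baseline (median over baseline time bins, then median over trials), and
    the log-ratio is median-averaged over trials.
    """
    mags = []
    for tr in trials:
        times, freqs, m = multitaper_magnitude(tr, fs)
        mags.append(m)
    mags = np.stack(mags)                        # (n_trials, n_freqs, n_times)
    times = times + t0                           # window centers relative to onset
    in_base = (times >= base[0]) & (times <= base[1])
    baseline = np.median(np.median(mags[:, :, in_base], axis=2), axis=0)
    return times, freqs, np.median(np.log(mags / baseline[None, :, None]), axis=0)
```

With white-noise input the resulting log-ratios scatter around zero; during speech epochs, broadband high-gamma increases would appear as positive values in the 70–150 Hz rows of the returned array.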

2. Developments in ECoG research on speech spontaneous, naturally spoken language from the under- production using increasingly naturalistic lying neuronal signal. From this it follows that a thor- approaches ough understanding of brain activity during connected, real-world speech will be required. Current knowledge on ECoG is more and more frequently being used in neu- the neuronal processes involved in expressive language, rolinguistic experiments [26,27], and it has proven well however, has largely been gathered using decontextualized suited to the study of the neuronal signatures of expressive and disconnected linguistic output. It is thus conceivable language. Previous ECoG research has shown that spatially that natural speech production may difer from speech and temporally specifc neuronal efects can be obtained elicited in simplifed experiments, e.g. with regard to a in the fronto-temporo-parietal cortex during overt speech diferent amount of , behavioral restrict- production [28]. Tey align well with speech-relevant edness or associated attentional resources [18,39]. Non- areas identifed by means of electrocortical stimulation experimental research based on speech produced during mapping [29] and occur in a spatially [23,30] as well as continuous, real-world communication can enable com- temporally [31,32] specifc manner. Until recently, ECoG parisons with previous experimental observations. investigations on speech production have most frequently Brain activity during spontaneous, non-experimental addressed the general properties of the neuronal activity communication, however, is a largely unexplored regardless of linguistic parameters of the spoken material, phenomenon, which has been referred to as ‘the dark and they have classically been conducted to evaluate the matter’ of [40]. 
To bridge this usefulness of such data for localizing eloquent language gap, several labs worldwide have started investigating the cortex in pre-neurosurgical diagnostics [21,29,33,34]. neuronal activity underlying language in extraoperatively In recent years, ECoG has increasingly been used to recorded ECoG in increasingly naturalistic experiments study speech-related neuronal processing from a linguistic [3,4,41–44] as well as in conditions of non-experimental, perspective. Phonological properties of the spoken lan- real-world communication [18,20,21,23,45–49]. Teir guage material, such as the place of articulation reported efects align with previous experimental [4,35,36], the manner of articulation, the voicing, and the fndings (cf. Crone et al. [28]) and refect the somatotopic distinction between vowels and consonants [4], have been arrangement of the human sensorimotor cortex [21]. studied. Using a syntactically informed approach to seg- Tis research provides proof of principle for the non- mentation of continuous speech, Derix et al. [20] investi- experimental approach and allows elucidation of the gated the neuronal diferences between ECoG recordings largely unexplored neuronal signatures of authentic, obtained during the production of simple sentences with uninstructed communication. compared to without memory content. In a sociolinguis- tic perspective on speech-ECoG data, Derix et al. [18] 3. The plausibility of the non-experimental studied dialog partner-specifc diferences in the neuronal approach to study real-world speech production activity during spontaneous communication. Chen et al. [37] investigated diferences between semantic categories To further extend this non-experimental line of research, of during overt naming, and Fedorenko et al. [38] we are currently constructing a multimodal neurolinguis- investigated the neural correlates of how the meaning of tic corpus. It contains retrospectively documented unin- a sentence is constructed. 
structed, real-world, spontaneous speech production with Linguistically grounded ECoG studies, however, are concurrent ECoG recordings. Te speech data have been still a handful, and one main challenge in current research acquired based on simultaneous audio and video materi- is ‘to break down language function into computational als. Tese multimodal recordings have been obtained at primitives suitable for biology’ (D. Poeppel in [26]). Tis the University Medical Center Freiburg for the purpose of is, in our opinion, a particularly urgent call for studies on pre-neurosurgical diagnostics of epilepsy, and they were speech production. Compared to the extensively studied donated for research by consented neurological patients. side of speech perception, little evidence is available on All speakers included in the corpus (data sets from a the spatiotemporal representation of speech production total amount of eight subjects at diferent stages of anno- and its subfunctions in the [16]. Such evi- tation) gave written informed consent that these multi- dence is essential to make linguistically informed choices modal recordings would be made available for scientifc of, for example, optimal (and possibly spatially dispersed) investigation, and the Ethics Committee of the University implantation sites and spatial scales for direct speech BCI Medical Center Freiburg approved the recruitment - applications. cedure [20]. Te linguistic data in our corpus consist of Te ultimate goal of direct speech BCI research is continuous transcriptions of the subjects’ speech produc- to develop devices which will be able to reconstruct tion based on the audiovisual signals. Transcriptions were BRAIN-COMPUTER INTERFACES 5 generated by trained linguists according to GAT-2 conven- stay of the patient’s conversation partners (private visi- tions for the ‘basic’ transcript [50] established in applied tors or medical personnel) and on the wakefulness of the linguistics. 
A crucial advantage of using this method is patient. Te patient was fully conscious and alert in the that it provides detailed information about the linguis- analyzed epochs of conversation, and all of these conver- tic material, including accents, pitch contours, intensity, sation periods lay at least 30 minutes away from epileptic pause duration, and several paralinguistic features. Simple seizures to minimize the risk of contamination of physi- clauses constitute basic units of speech production in our ologically relevant responses by inter-ictal activity. corpus. We identify them in continuous transcriptions Te analyzed clauses consisted of the main verb (i.e. of the subjects’ speech according to structural linguistic full verb, copula verb, or a transitive modal verb in the criteria [51], tag the outer borders of the clauses as well absence of a full verb [53]) plus all its syntactically gov- as several other phenomena relevant to clause description erned lexical elements. Time periods between speech start (Figure 1(C)) in ECoG data using the Coherence EEG/ and speech end (vertical dashed lines with markers ‘ss’ PSG System sofware by Deltamed (Paris, France), analyze and ‘se’ above them in Figure 1, respectively) represent the clauses according to linguistic parameters (parts of epochs of continuous speech production with no pauses speech, syntactic constituents, and dependency relations, exceeding 200 ms, in which clauses corresponding to time etc.), and study the associated neuronal efects. periods from clause start (i.e. the start of the frst in Te main advantage of such an integrative approach a clause, marker ‘cls’ in Figure 1) to clause end (i.e. the end combining expertise in linguistics and invasive electro- of the last word in a clause, marker ‘cle’ in Figure 1) were physiology is that it ofers a unique opportunity to elu- embedded. 
Te 200-ms threshold for defnition of speech cidate the neuronal correlates of linguistic processing production epochs was chosen based on evidence from in conditions of non-experimental, real-world human conversation linguistics indicating that pauses shorter that communication. Te aforementioned non-experimen- 200 ms are not perceived by interlocutors as ‘pauses’ in a tal ECoG studies have demonstrated with the example conversation [54]. Within each clause, two additional tem- of speech onset-related data that spatially and tempo- poral stages of speech production were identifed accord- rally meaningful neuronal efects can be observed dur- ing to linguistic criteria: ‘mp1’ corresponds to the start of ing real-world speech production. Here, we would like the word in the lef sentence bracket: a fnite verb in main to provide an additional illustration of the plausibility of clauses and a subordinate conjunction in conjunction- the non-experimental approach by visualizing neuronal introduced subordinate clauses [51]; ‘mp2’ is the start of efects at distinctive temporal stages of real-world speech the word in the right sentence bracket (usually a non-fnite production, including the starts and ends of speech and verb in a main clause and a non-fnite verb in a subordi- clause production epochs (Figure 1). nate clause) or the right-most word in the middle feld of Figure 1 shows typical neuronal efects in the non- a main clause (ofen an adverb or a noun), depending on experimentally-obtained data from our multimodal whether or not the right sentence bracket was present. corpus (previously unpublished material). Tis fgure Te linguistic positions ‘mp1’ and ‘mp2’ within the clause contains examples of most prominent ECoG responses were defned based on the topological sentence model in which were observed in one native speaker of German German [51,53]. 
Te resulting temporal precedence of ‘ss’ (male, 41 years old at the moment of implantation, lef- ‘cls’ ‘mp1’ ‘mp2’ ‘cle’ ‘se’ in the trial-averaged lateralized language areas) at distinctive temporal stages of data in Figure 1(C) therefore shows the progression of the real-world production of simple clauses. Tese efects took neuronal response over the course of non-experimental, place in the premotor cortex, on the central sulcus and on real-world speech production. the superior temporal gyrus (classic areas implicated in As in our previous studies [18,20,21], spectral mag- expressive language [16,52]). Te three electrodes visu- nitude changes were calculated based on common aver- alized in Figure 1(C) showed maximum relative spectral age reference (CAR)-fltered data using a fast Fourier magnitude changes in their respective anatomical region transformation. Grid electrodes lying within the seizure (Figure 1(A)), and they were signifcant at at least one of onset area, identifed by epileptologists in ongoing ECoG the six investigated time points (further also referred to recordings (C2, D6, E3 in Figure 1(A)), were excluded as ‘conditions’; Figure 1(B)) in the course of speech pro- from all analyses to reduce the impact of strong epileptic duction. Below follows a description of the procedures spiking activity at these electrodes on our observations. we undertook for data including the statistics. Additional details on data pre-processing are available Te simple clauses were extracted from 11 hours of in [20]. Here, we used a sliding window of 200 ms with continuously transcribed expressive language material a time step of 20 ms and fve Slepian tapers. Since the from the subject. Diferent amounts of speech production sampling frequency of recordings was 1024 Hz, this anal- were present in these hours, depending on the length of ysis resulted in a 5-Hz frequency resolution. Te absolute 6 O. ILJINA ET AL. 
spectral magnitudes for each trial and electrode in the real-world speech production (Figure 1). For instance, entire visualized time period of −1 o 1.5 s relative to the one may assume that activity in the temporal cortex ena- onset of the respective condition (‘ss’, ‘cls’, ‘mp1’, ‘mp2’, bles mapping of the acoustic properties of speech onto ‘cle’, ‘se’) and in each time-frequency bin were divided by conceptual and semantic representations, such as for the the same baseline activity. Tis baseline was calculated purpose of self-monitoring via state feedback control [56]. by median-averaging the absolute spectral magnitude in Activation in the motor cortex, in contrast, may refect each frequency bin over the time bins corresponding to forward predictions of the intended articulatory output the time period −2 to −1.5 s relative to the onset of each in the dorsal language stream, responsible for sound-to- trial of the respective condition, median-averaging it over articulation mapping. all trials from this condition, and by mean-averaging the Many open questions, however, remain as to how these obtained value over the six conditions described above. dynamics are informative about the temporal precedence Te baseline-corrected spectral magnitudes in individ- of linguistic functions deployed in the stream of speech. ual trials were then averaged over trials in the respective For instance, when do semantic, phonological, or syntactic condition. processes occur? Do they take place at separate or (partly) Similarly to [20], we evaluated the statistical signif- overlapping temporal stages, e.g. in the production of cance of neuronal responses in the high gamma (70– words at the positions ‘mp1’ and ‘mp2’ in Figure 1(C)? 150 Hz) frequency band in two time windows (−1 to 0 and Diferent ways of interpreting such dynamics are conceiv- 0 to 1 s relative to the onset of the respective condition) able. For instance, the model by Levelt et al. 
[57] suggests using a sign-test (FDR-corrected over the entire number that word production evolves in a sequence of temporally of CAR-rereferenced grid electrodes (61/64) and the two distinctive stages of linguistic processing. First, the con- analyzed time windows at a signifcance threshold of q = ceptual and semantic information is extracted. Afer this, .001). Te examples of most prominent increases in high the lemma, i.e. the basic word form, is retrieved from the gamma activity shown in Figure 1(C) were signifcant in mental . Next, it is integrated into the morphosyn- both time windows for the conditions speech start (‘ss’) tactic and phonological context, and then the resulting and clause start (‘cls’) in the speech motor region (elec- word is articulated. Some electrophysiological studies trodes C3, D3), and the increase in high gamma activity agree with this notion of sequential processing [58,59]. observed in the superior temporal cortex (electrode H8) Other linguistic models, however, suggest that a parallel was signifcant in the late time window for the conditions architecture of processing phonological, conceptual, and ‘sentence end’ (‘cle’) and ‘speech production end’ (‘se’). syntactic information is conceivable [60]. More research Tese efects occurred in speech-relevant areas identifed may be needed to evaluate the physiological plausibility by clinicians in the course of pre-neurosurgical diagnos- of these diferent possibilities. tics (Figure 1(A)). To understand the dynamics of neuronal responses With only about a dozen recent exceptions underlying continuous speech, taking neurolinguistic [18,20,21,23,45–49], neurolinguistic and direct speech studies beyond the level of single words and towards larger BCI studies to date are based on experiments. Investigation speech units of real-life-like complexity will be essential. 
of non-experimental, real-word language is a novel, largely Also on the level of multi-word sequences, more research unexplored approach. For this reason, we considered that is needed to understand the exact functional signifcance its plausibility merits a detailed illustration (Figure 1(C)). of brain activity at diferent time points of speech pro- Te efects visualized in Figure 1 align in their location duction. For instance, whenever we produce an utterance, with results of previous experimental [23,28] and non- how much of it do we pre-determine prior to articula- experimental [20,21] fndings, and they enable illustra- tion: do we produce a syntactic plan of the entire sentence tion of temporal progression of neuronal activity at several before we articulate it [61] or do we plan the structure temporal stages of speech production. Tese temporally of the sentence during its production in an incremental meaningful and spatially reproducible neuronal responses manner online [62], such as by putting together salient lend support to our notion that spontaneous, non-exper- combinations of words (‘chunks’) stored in the mental lex- imental speech is worth investigating. icon as single entities as a result of the speaker’s experience with language [63]? A related and hitherto unexplored 4. The temporal dynamics of neuronal activity question is, what are the exact basic ‘building blocks’ of as a challenge for research on connected speech spontaneous, natural speech and how can their borders be detected in neuronal activity (e.g. Chafe [64], Frazier Te classic dual-stream model by Hickok and Poeppel and Fodor [65])? Te validity of these diferent linguis- [55] can be helpful in interpreting the temporal dynam- tic accounts of how we produce continuous language ics of neuronal responses underlying non-experimental, sequences remains to be evaluated neurolinguistically. 
To this end, implementation of approaches similar to those used in experimental ECoG research on the spatiotemporal dynamics of cortical activity during speech production [66] and neurocomputational simulations of the speaking brain [67] can be helpful.

We believe that neurolinguistic research in conditions of spontaneous, non-experimental communication can shed light on such questions, and that an interdisciplinary approach bringing together neuroscientists and linguists can play an important role in understanding the exact functional dynamics of brain activity to support natural language. For instance, usage-based linguistics, which offers a broad fund of theoretical and empirical knowledge on spontaneous speech and provides detailed descriptions of its structural, temporal, and distributional properties [62,63,68], may be helpful. Linguistically informed accounts of how information is encoded in the dynamics of brain activity can not only contribute to a better understanding of the linguistic functioning of the human brain. They can also aid derivation of adequate decoding models for speech reconstruction with direct BCIs.

5. Developments in the area of direct speech BCIs from a linguistic perspective

Beyond their utility in clinically oriented and basic research, invasive recordings of cortical activity have proven valuable in direct BCI approaches to speech reconstruction. Here, we overview the advances of research on speech production with the help of direct ECoG-based BCIs from a linguistic perspective (Table 1). Please note that the present paper focuses on speech production. Studies conducted with perception-based paradigms are beyond the scope of the present paper, and information about the progress in this related field of research is available elsewhere [39,69,70].

Decoding studies have shown that speech production can reliably be distinguished from non-speech behavior based on ECoG data from the fronto-temporo-parietal region [4,22,71]. Differences between phonetic features of speech can also be decoded. Robust categorization of phonemes, for instance, has been achieved based on neuronal spiking activity obtained with neurotrophic electrodes in the speech motor cortex of locked-in individuals [24,25,72], as well as using spectral magnitude or power changes in the fronto-temporo-parietal cortex of epilepsy patients [3,4,73–78]. Recent ECoG studies report successful classification of phoneme categories and of articulatory features of speech on a subphonemic level. Mugler et al. [77] compared the classification performance for articulatory gestures (i.e. movements of the different articulators) vs. phonemes. These authors could achieve higher decoding accuracies when decoding articulatory gestures from postcentral areas, whereas frontal contacts proved more informative when decoding phonemes. Lotte et al. [4] decoded classes of phonemes based on the manner of articulation, the voicing, and the assignment to the category of vowels vs. consonants. They observed that the ECoG sites which were most informative of these differences lay in spatially segregated areas of the fronto-temporo-parietal cortex. Several direct BCI studies on speech production have decoded auditory features, such as vowel formants [74] and spectrograms of the acoustic signal [79]. These studies speak for the suitability of sensorimotor [74] as well as fronto-opercular and temporal regions [79] to reconstruct overt expressive speech from the auditory signal.

Other studies with similar implantations have performed classification of individual overtly [80] and also covertly produced words [81], overtly produced sentences [82], of the semantic category the spoken word [83] or sentence [20] belongs to, and the identity of the conversation partner [18]. These studies could achieve successful classification in most of these linguistic scenarios. Martin et al. [84] also asked whether a decoding model trained on ECoG recorded during overt speech production would elicit above-chance classification when applied to data obtained during a covert speech production condition. This was indeed the case, with the best decoding performance yielded in the superior temporal and in the pericentral cortex. All in all, this research speaks for the plausibility of deciphering expressive language from underlying neuronal activity and supports the translatability of findings from overt to silent speech production.

Direct speech BCI is a young branch of research (as far as we are aware, the first study was conducted using ECoG by Blakely et al. in 2008 [73]; also see Chaudhary et al. [12] for an overview of the history of BCI technologies). In spite of recent achievements, more work is needed to take such speech-restoration devices into the daily life of paralyzed patients. One challenge in direct speech BCI research is that decoding approaches operate on a limited, pre-selected set of linguistic categories, whereas human language is complex and rich in combinatorics. A currently popular way of dealing with linguistic variability is to reduce the number of decoding categories to a basic set of features, such as by using the phonemic inventory or the entire set of articulatory gestures speech is composed of [3,4,77]. An advantage of these approaches is that they can be applied to different items of speech regardless of their length and combinatorial properties. The latter is especially the case with articulatory gesture-based approaches, as they allow one to account for co-articulation effects (i.e. differences in articulation between instances of the same phoneme depending on the immediate phonological context) [85]. Decoding approaches using the phonetic level of analysis in these recent studies proved to elicit relatively high accuracies of decoding.
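To make this classification setting concrete, the sketch below trains linear discriminant analysis (LDA), one of the standard classifiers used in several of the studies cited above, on randomly generated feature vectors standing in for single-trial high-gamma band power on 20 electrodes. The trial counts, class patterns, and noise levels are all assumptions made for the purpose of the example, not values from any of the cited studies:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for single-trial ECoG features: 60 trials per class,
# 20 channels of high-gamma band power, two phoneme classes whose
# (hypothetical) spatial activation patterns overlap but differ.
n_trials, n_channels = 60, 20
pattern_a = rng.normal(0.0, 1.0, n_channels)
pattern_b = pattern_a + rng.normal(0.0, 0.8, n_channels)

X = np.vstack([pattern_a + rng.normal(0, 1.0, (n_trials, n_channels)),
               pattern_b + rng.normal(0, 1.0, (n_trials, n_channels))])
y = np.array([0] * n_trials + [1] * n_trials)  # phoneme-class labels

# High-bias linear classifier, as used in several of the cited studies.
clf = LinearDiscriminantAnalysis()
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")  # well above the 0.5 chance level
```

With real ECoG recordings the features would of course be extracted from the measured signal, and accuracies would have to be evaluated against the empirical chance level; the sketch only shows the mechanics of the approach.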

Table 1. An overview of direct ECoG-BCI studies on speech production from linguistic and machine-learning perspectives. For each study, the table lists the authors, year, total number of analyzed subjects (n.s.), implantation site (impl.), ECoG type (conventional or micro-ECoG), study paradigm, classification/regression targets, investigated neuronal frequency range(s), and the decoding approach/classifier. The 18 studies (2008–2016) include work by Blakely, Kellis, Leuthardt, Pei, Wang, Zhang, Derix, Kanas, Mugler, Ikeda, Bouchard and Chang, Lotte, Herff, and Martin and colleagues; subject counts range from 1 to 8, and implantations are mostly fronto-temporo-parietal. Paradigms range from visually or aurally cued overt and covert production of phonemes, syllables, words, and sentences to non-experimental, real-world conversation. Targets include phonemes and articulatory gestures, vowels and consonants, words, sentences, semantic categories, the identity of the conversation partner, speech vs. non-speech activity, and reconstruction of auditory signals. The investigated frequency ranges are most often centered on the high-gamma band (typically 70–150 Hz). Decoders include (regularized) LDA, linear and radial-basis-function SVMs, naive Bayes classifiers, (principal-component) linear regression, k-means- and clustering-based classifiers, and custom decoders, in one case with an in-built bigram-based probabilistic language model.
f, frontal; t, temporal; p, parietal; conv., conventional; frq., frequency; inv., investigated; C, consonant; V, vowel; i.a., inter alia; LDA, linear discriminant analysis; SVM, support vector machine; PCA, principal-component analysis; n.s., total number of analyzed subjects; impl., implantation site. Frequency information is spectral magnitude- or power-based.
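As the frequency column of Table 1 shows, most of these studies decode from spectral magnitude or power in fixed frequency bands, most often the high-gamma range (roughly 70–150 Hz). A minimal sketch of such band-power feature extraction, using a synthetic signal and an assumed sampling rate of 1000 Hz in place of real ECoG recordings:

```python
import numpy as np

fs = 1000  # assumed sampling rate (Hz)
t = np.arange(0, 2.0, 1 / fs)

rng = np.random.default_rng(1)
# Synthetic "ECoG" trace: background noise plus a burst of 100-Hz
# (high-gamma) activity in the second half of the recording.
signal = rng.normal(0, 1.0, t.size)
signal[t >= 1.0] += 2.0 * np.sin(2 * np.pi * 100 * t[t >= 1.0])

def band_power(x, fs, lo, hi):
    """Mean spectral power of x inside the [lo, hi] Hz band (via FFT)."""
    freqs = np.fft.rfftfreq(x.size, 1 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size
    mask = (freqs >= lo) & (freqs <= hi)
    return psd[mask].mean()

# Compare high-gamma (70-150 Hz) power before vs. during the burst.
baseline = band_power(signal[t < 1.0], fs, 70, 150)
active = band_power(signal[t >= 1.0], fs, 70, 150)
print(f"high-gamma power ratio: {active / baseline:.1f}")
```

In practice such features are computed per electrode and in sliding windows (or with dedicated spectral estimators) before being passed to a classifier; the sketch only illustrates the principle.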
Nevertheless, exploration of other levels of linguistic abstraction may help to further enhance the performance of direct speech BCIs.

Recent work shows that neighborhood probability estimates can be useful to improve speech reconstruction by evaluating the decoded speech unit with regard to its statistical probability in the linguistic context (e.g. a personal pronoun ('he') in English is more likely to precede a finite verb ('reads') than vice versa). It has been shown that predictive methods from linguistics, which rely on probabilities of co-occurrence of language units, can enhance the speed of spelling in P300 interfaces in healthy [86] and in paralyzed [87] subjects. A recent development in ECoG-BCI studies is that probabilistic n-gram-based models have been applied in production [3] and perception [88] studies to constrain the number of meaningful choices for speech decoding. Together with a recent review on the integration of language models into BCI classifiers [89], these publications support the notion that the use of predictive models of language, which can be derived from linguistic research, is a promising strategy for future direct speech BCI studies.

Language is a highly complex system, which possesses multiple levels of abstraction ranging from articulatory primitives to complex and abstract syntactic and semantic structures. Multi-level approaches with in-built models of language can be useful to improve the accuracy of speech reconstruction with direct speech BCIs. Information on several levels of linguistic abstraction may be helpful, e.g. when decoding results on these different levels are mutually incompatible. For instance, if the first estimate of the phoneme-to-phoneme approach is \ˈsō\ ('so'), the second is \ˈsȯ\ ('saw'), and the best estimate for the part-of-speech category decoding is 'noun' followed by the category 'verb', the mutually compatible solution ('saw', 'noun') would be selected. Such linguistically multi-level approaches can be expected to harness more information which can be used for decoding, and to restrict the number of meaningful choices given the linguistic context [89].

6. Relevance of corpus-based linguistic methodology for direct speech BCI research

As proposed above, the performance of direct BCIs may benefit from accounting for multiple levels of linguistic abstraction. Nevertheless, a challenge when implementing such an integrative approach is that neurolinguistic evidence of neuronal distinctions between linguistic entities on many of these levels still needs to be generated. Consider, for instance, the linguistic category 'word'. In a BCI study on speech decoding by Kellis et al. [80], classification of 10 individual words from the neuronal signal was evaluated. Compared with this number, the International Corpus of English (ICE) [90] contains one million words per variety of the English language. Wouldn't it be interesting to extend the approach by Kellis et al. [80] to any word in the whole register of ICE, to ensure better coverage of natural speech by BCI applications? This is, however, hardly feasible (or at least methodologically difficult without breaking down the words into a limited number of primitives) for the following reason. For a decoding algorithm to recognize differences in brain activity, training data with multiple examples of neuronal recordings underlying each word would be required. Thus, one would have to ask the subject to repeat the entire ICE corpus multiple times to obtain such data, which would be extremely time-consuming, burdensome to the subject, and most likely impossible in ECoG studies, which rely on data recorded over relatively short time periods of about one to three weeks prior to surgical resection.

We anticipate that such ample data can be generated by adopting methodology from linguistics, similar to the approach we have illustrated in Figure 1. A central assumption in corpus-based linguistics is that the statistical properties of a language generalize across users of the respective language, and that these statistics are similarly reflected in the (para-)linguistic behavior of different speakers [63]. For this reason, for example, word frequency extracted from a retrospective corpus can be associated with comparable response latencies between speakers of the same language [91]. Neurolinguistic evidence on the functional organization of language areas is also based on the principle of generalizability: in spite of inter-individual variability of language-relevant anatomical [92] and functional areas [52], reproducible spatial [16] and temporal [57] patterns of functional organization can be observed. Following the same principle, one could expect that the neuronal signal components which are most informative of linguistic distinctions will, to some extent, generalize between speakers. It is an interesting open question for speech BCI research whether, and under what conditions, a decoder trained on data from other subject(s) can yield successful decoding from neuronal recordings of a different individual [93].

An important advantage of generating and studying neurolinguistic corpora may be that they would allow for classification of linguistic phenomena which have a large number of realizations in natural language, as is the case with the above-mentioned linguistic category 'word'. Such neurolinguistic evidence can be generated similarly to the way linguistic corpora are generated, that is, by recording continuous speech of multiple speakers over extensive time periods, segmenting it into linguistic units (e.g. words) and annotating those units with regard to various linguistic categories (similarly to what we have done in the example in Figure 1).
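Corpus statistics of the kind discussed above, such as word frequencies and probabilities of co-occurrence of language units, can be estimated directly from segmented utterances. The following deliberately tiny, hypothetical example corpus is only meant to show the mechanics; a real neurolinguistic corpus would pair each token with neuronal recordings and further annotations:

```python
from collections import Counter

# Toy stand-in for a corpus of segmented utterances.
utterances = [
    "he reads the book".split(),
    "she reads the letter".split(),
    "he writes the letter".split(),
]

unigrams = Counter(tok for utt in utterances for tok in utt)
bigrams = Counter(pair for utt in utterances for pair in zip(utt, utt[1:]))

def next_word_prob(w1, w2):
    """Maximum-likelihood estimate of P(w2 | w1) from the corpus."""
    return bigrams[(w1, w2)] / unigrams[w1] if unigrams[w1] else 0.0

# As in the text: a pronoun is more likely to precede a finite verb
# than the other way around.
print(next_word_prob("he", "reads"))   # -> 0.5
print(next_word_prob("reads", "he"))   # -> 0.0
```

Estimates like these underlie the n-gram language models mentioned in section 5; practical models additionally smooth the counts so that unseen word pairs do not receive zero probability.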
Many linguistic corpora are currently available online, which greatly promotes quantitative linguistic research. To our knowledge, however, no published neurolinguistic corpus bringing together spontaneously spoken speech and concurrent neuronal recordings is currently available, and the joint effort of linguists and neuroscientists is needed to create such data and make them available to interested researchers.

7. Developments in the area of direct speech BCIs from a machine-learning perspective

Direct BCI studies on speech production have taken different machine-learning approaches, which are summarized in Table 1. These can be classified by their usage of high-bias vs. high-variance classifiers, and by their usage of standard machine-learning algorithms vs. non-standard, customized algorithms tailored to the specific decoding problem. As in the case of recording techniques, each of these approaches has its own advantages and disadvantages [94].

Most studies summarized in Table 1 have used high-bias classifiers, i.e. classifiers that make strong assumptions about the mathematical relationships between input features and decoding targets. For instance, linear decoders assume a linear relationship. Examples of high-bias classifiers include linear support vector machines (SVM [73,75,82]), linear discriminant analysis (LDA [4,18,20,77]), linear regression-based [79,84] or custom linear decoders [76], principal-component linear regression [74], and a naive Bayes classifier [78]. Such classifiers have the advantage of being fairly robust against overfitting. Their strong assumptions can prevent them from learning 'false' relationships, which only exist due to noise in the particular data they are trained on. Also, they are usually fast to train and fast to apply compared with more complex high-variance classifiers. The demerit of this robustness is inflexibility: if the imposed assumptions are far from the true mathematical relationships between input features and decoding targets, the classifier will yield suboptimal decoding performance. High-variance classifiers impose less strong assumptions on the mathematical relationships in the data, and thus they can learn a larger range of relationships, while being more vulnerable to overfitting. High-variance approaches used in previous studies (Table 1) include radial-basis-function SVM [71], dynamic time-warping-based multiple-kernel SVM [81], hierarchical k-means clustering [22], and custom non-linear clustering-based classifiers [3,80]. Some authors have also evaluated multiple classifiers, thereby showing which classifiers are better suited for their decoding problem (e.g. [71]). All high-bias and high-variance classifiers summarized in this paragraph are associated with the demerit of manual feature extraction based on a priori assumptions as to what signal features are most informative for classification.

The decoders used in previous direct speech BCI research can be divided into standard algorithms using standard classifiers (linear regression [79,84], linear [73,75,82] and radial-basis-function [71] SVMs, LDA [4,18,20,77], a naive Bayes classifier [78], hierarchical k-means clustering [22]) and customized, non-standard methods [3,74,76,80]. As can be seen from Table 1, about two-thirds of previous direct speech-ECoG-BCI studies have opted for standard methods. These methods have the advantage that the decoding algorithms are well known and thus more transparent for the reader to understand, and that differences in decoding performance elicited with the same standard classifier can be more easily attributed to the other steps of the decoding pipeline, such as data pre-processing and feature extraction. Custom methods make such comparisons difficult. However, they can achieve a substantially better performance than standard decoders by virtue of being adapted to the requirements of the specific decoding problem. An example is the usage of an in-built probabilistic language model to alleviate the multi-class decoding problem in the study by Herff et al. [3].

Another fundamental difference is between supervised and unsupervised decoding approaches. Supervised decoding approaches require known decoding targets, such as the words produced by the subject during a recording. Unsupervised approaches can be used to discover meaningful structure in the data even if one does not have decoding targets. One example is the study by Wang et al. [22], in which the authors recorded the subjects' brain signals during spontaneous activities. Through unsupervised clustering methods, they could detect clusters of similar brain activity for different behaviors including movement, speaking, and resting, even though they did not use a priori labels indicating which behaviors took place. Unsupervised approaches therefore open the door for discovering meaningful insights from ample unlabeled recordings of brain signals.
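In the spirit of such unsupervised approaches (this is a toy sketch, not a reimplementation of the method of Wang et al. [22]), the example below clusters unlabeled synthetic feature vectors drawn from three hypothetical behavioral states and shows that the discovered clusters line up with the underlying states. The data dimensions and separability are assumptions chosen to make the illustration clear:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)

# Unlabeled synthetic feature vectors (e.g. band power on 10 channels)
# drawn from three hypothetical behavioral states: rest, movement, speech.
centers = rng.normal(0, 5.0, (3, 10))
X = np.vstack([c + rng.normal(0, 1.0, (100, 10)) for c in centers])

# Cluster without using any behavioral labels.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Each discovered cluster should be dominated by one underlying state.
true_state = np.repeat([0, 1, 2], 100)
for k in range(3):
    members = true_state[km.labels_ == k]
    print(k, np.bincount(members, minlength=3))
```

On real long-term recordings the states are of course unknown, which is precisely the point: cluster structure can then be inspected post hoc, for example with the automated video and audio annotations used by Wang et al. [22].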
Another class of decoding algorithm relevant to direct speech BCI research is deep learning, which is currently gaining traction in BCI research [95–97]. Deep learning has most prominently been used for recognition of graphical patterns in computer vision, and it has rapidly penetrated multiple other areas due to its general applicability. Schmidhuber [98] provides a detailed historical review of the evolution of deep learning; also see LeCun et al. [99] for an introductory review. Deep learning approaches, such as convolutional neural networks or recurrent neural networks, are of interest to the BCI community. They can handle complex, hierarchical non-linear relationships in the data and use the entire signal without a priori limitations on the amount of potentially informative features. Disadvantages of deep learning are that models take longer to train and that they are harder to interpret compared with those yielded with other methods. A crucial advantage, however, is that deep learning approaches can handle large numbers of classes [100,101]. This makes them particularly attractive to direct speech BCI research, which has to face the complexity and variability of natural human language. To our knowledge, no published direct speech BCI study so far has used these methods, and they bear hitherto unexplored opportunities for research to come.

8. Future prospects of invasive BCI for speech restoration

BCI technology is now developing towards autonomous wireless implants, and, although several technical challenges need to be surmounted [39], long-term recordings with such implants are conceivable in the near future. A companion paper in this special issue [102] provides a detailed discussion of the current trends and future perspectives of implant technology for human application. We believe that ECoG-based invasive BCI is a promising strategy to restore communication in paralyzed patients. It is nevertheless difficult to estimate when this will be possible in clinical practice. This will depend on the newest and conceivable future advances in implant, decoding, and neurolinguistic methodologies. Speech-related neuronal signals obtained continuously over months or perhaps even years during daily-life experiences of implanted neurological patients will most likely be available soon. Such 'big data' can provide valuable content for neurolinguistic corpora. Decoding approaches suitable for long-term recordings of brain activity, such as that implemented in [22], and also deep learning [95–97], together with automated or semi-automated approaches to account for linguistic information [3,88], may advance current understanding of the dynamic processes underlying speech production and open up new possibilities for research on direct speech BCIs.

Acknowledgements

We thank GRK 1624 'Frequency effects in language' (University of Freiburg) and BrainLinks-BrainTools (DFG grant EXC1086) for financial support.

Disclosure statement

No potential conflict of interest was reported by the authors.

References

[1] Birbaumer N, Gallegos-Ayala G, Wildgruber M, et al. Direct brain control and communication in paralysis. Brain Topogr. 2014;27(1):4–11.
[2] Thinnes-Elker F, Iljina O, Apostolides JK, et al. Intention concepts and brain-machine interfacing. Front Psychol. 2012;3:1–10. Article id 455.
[3] Herff C, Heger D, de Pesters A, et al. Brain-to-text: decoding spoken phrases from phone representations in the brain. Front Neurosci. 2015;9:1–11. Article id 217.
[4] Lotte F, Brumberg JS, Brunner P, et al. Electrocorticographic representations of segmental features in continuous speech. Front Hum Neurosci. 2015;9:1–13. Article id 97.
[5] Logothetis NK. What we can do and what we cannot do with fMRI. Nature. 2008;453(7197):869–878.
[6] Ojemann GA, Ojemann J, Ramsey NF. Relation between functional magnetic resonance imaging (fMRI) and single neuron, local field potential (LFP) and electrocorticography (ECoG) activity in human cortex. Front Hum Neurosci. 2013;7:1–9. Article id 34.
[7] Conner CR, Ellmore TM, Pieters TA, et al. Variability of the relationship between electrophysiology and BOLD-fMRI across cortical regions in humans. J Neurosci. 2011;31(36):12855–12865.
[8] Hong K-S, Santosa H. Decoding four different sound-categories in the auditory cortex using functional near-infrared spectroscopy. Hear Res. 2016;333:157–166.
[9] Santosa H, Hong MJ, Hong K-S. Lateralization of music processing with noises in the auditory cortex: an fNIRS study. Front Behav Neurosci. 2014;8:1–9. Article id 418.
[10] Chaudhary U, Xia B, Silvoni S, et al. Brain-computer interface-based communication in the completely locked-in state. PLoS Biol. 2017;15(1):e1002593.
[11] Gallegos-Ayala G, Furdea A, Takano K, et al. Brain communication in a completely locked-in patient using bedside near-infrared spectroscopy. Neurology. 2014;82(21):1930–1932.
[12] Chaudhary U, Birbaumer N, Ramos-Murguialday A. Brain-computer interfaces for communication and rehabilitation. Nat Rev Neurol. 2016;12(9):513–525.
[13] Ball T, Kern M, Mutschler I, et al. Signal quality of simultaneously recorded invasive and non-invasive EEG. NeuroImage. 2009;46(3):708–716.
[14] McGuigan FJ. Covert oral behavior during the silent performance of language tasks. Psychol Bull. 1970;74(5):309.
[15] Ganushchak LY, Christoffels IK, Schiller NO. The use of electroencephalography in language production research: a review. Front Psychol. 2011;2:1–6. Article id 208.
[16] Price CJ. A review and synthesis of the first 20 years of PET and fMRI studies of heard speech, spoken language and reading. NeuroImage. 2012;62(2):816–847.
[17] Millán JdR, Carmena JM. Invasive or noninvasive: understanding brain-machine interface technology. IEEE Eng Med Biol Mag. 2010;29(1):16–22.
[18] Derix J, Iljina O, Schulze-Bonhage A, et al. "Doctor" or "darling"? Decoding the communication partner from ECoG of the anterior temporal lobe during non-experimental, real-life social interaction. Front Hum Neurosci. 2012;6:1–14. Article id 251.

[19] Fiederer LDJ, Lahr J, Vorwerk J, et al. Electrical stimulation of the human cerebral cortex by extracranial muscle activity: effect quantification with intracranial EEG and FEM simulations. IEEE Trans Biomed Eng. 2016;63(12):2552–2563.
[20] Derix J, Iljina O, Weiske J, et al. From speech to thought: the neuronal basis of cognitive units in non-experimental, real-life communication investigated using ECoG. Front Hum Neurosci. 2014;8:383.
[21] Ruescher J, Iljina O, Altenmüller D-M, et al. Somatotopic mapping of natural upper- and lower-extremity movements and speech production with high gamma electrocorticography. NeuroImage. 2013;81:164–177.
[22] Wang NXR, Olson JD, Ojemann JG, et al. Unsupervised decoding of long-term, naturalistic human neural recordings with automated video and audio annotations. Front Hum Neurosci. 2016;10:165.
[23] Towle VL, Yoon H-A, Castelle M, et al. ECoG gamma activity during a language task: differentiating expressive and receptive speech areas. Brain. 2008;131(8):2013–2027.
[24] Brumberg JS, Wright EJ, Andreasen DS, et al. Classification of intended phoneme production from chronic intracortical microelectrode recordings in speech-motor cortex. Front Neurosci. 2011;5:1–12. Article id 65.
[25] Guenther FH, Brumberg JS, Wright EJ, et al. A wireless brain-machine interface for real-time speech synthesis. PLoS ONE. 2009;4(12):e8218.
[26] Ritaccio A, Matsumoto R, Morrell M, et al. Proceedings of the seventh international workshop on advances in electrocorticography. Epilepsy Behav. 2015;51:312–320.
[27] Towle VL, Dai Z, Zheng W, et al. Mapping cortical function with event-related electrocorticography. In: RW Byrne, editor. Functional mapping of the cerebral cortex. Springer International Publishing; 2016. p. 91–104. doi: 10.1007/978-3-319-23383-3_6
[28] Crone NE, Hao L, Hart J Jr, et al. Electrocorticographic gamma activity during word production in spoken and sign language. Neurology. 2001;57(11):2045–2053.
[29] Sinai A, Bowers CW, Crainiceanu CM, et al. Electrocorticographic high gamma activity versus electrical cortical stimulation mapping of naming. Brain. 2005;128(7):1556–1570.
[30] Fukuda M, Rothermel R, Juhász C, et al. Cortical gamma-oscillations modulated by listening and overt repetition of phonemes. NeuroImage. 2010;49(3):2735–2745.
[31] Edwards E, Nagarajan SS, Dalal SS, et al. Spatiotemporal imaging of cortical activation during verb generation and picture naming. NeuroImage. 2010 Mar;50(1):291–301.
[32] Kunii N, Kamada K, Ota T, et al. The dynamics of language-related high-gamma activity assessed on a spatially-normalized brain. Clin Neurophysiol. 2013;124(1):91–100.
[33] Leuthardt EC, Miller K, Anderson NR, et al. Electrocorticographic frequency alteration mapping: a clinical technique for mapping the motor cortex. Neurosurgery. 2007;60(4 Suppl 2):260–271.
[34] Taplin AM, de Pesters A, Brunner P, et al. Intraoperative mapping of expressive language cortex using passive real-time electrocorticography. Epilepsy Behav Case Rep. 2016;5:46–51.
[35] Bouchard KE, Mesgarani N, Johnson K, et al. Functional organization of human sensorimotor cortex for speech articulation. Nature. 2013 Mar 21;495(7441):327–332.
[36] Toyoda G, Brown EC, Matsuzaki N, et al. Electrocorticographic correlates of overt articulation of 44 English phonemes: intracranial recording in children with focal epilepsy. Clin Neurophysiol. 2013;125(6):1129–1137.
[37] Chen Y, Shimotake A, Matsumoto R, et al. The "when" and "where" of semantic coding in the anterior temporal lobe: temporal representational similarity analysis of electrocorticogram data. Cortex. 2016;79:1–13.
[38] Fedorenko E, Scott TL, Brunner P, et al. Neural correlate of the construction of sentence meaning. Proc Natl Acad Sci. 2016;113(41):E6256–E6262.
[39] Martin S, Millán JdR, Knight RT, et al. The use of intracranial recordings to decode human language: challenges and opportunities. Brain Lang. 2016. [Epub ahead of print]. doi: 10.1016/j.bandl.2016.06.003
[40] Przyrembel M, Smallwood J, Pauen M, et al. Illuminating the dark matter of social neuroscience: considering the problem of social interaction from philosophical, psychological, and neuroscientific perspectives. Front Hum Neurosci. 2012;6:1–15. Article id 190.
[41] Brown EC, Rothermel R, Nishida M, et al. In vivo animation of auditory-language-induced gamma-oscillations in children with intractable focal epilepsy. NeuroImage. 2008;41(3):1120–1131.
[42] Kojima K, Brown EC, Rothermel R, et al. Multimodality language mapping in patients with left-hemispheric language dominance on Wada test. Clin Neurophysiol. 2012;123(10):1917–1917.
[43] Genetti M, Tyrand R, Grouiller F, et al. Comparison of high gamma electrocorticography and fMRI with electrocortical stimulation for localization of somatosensory and language cortex. Clin Neurophysiol. 2015;126(1):121–130.
[44] Nourski KV, Steinschneider M, Rhone AE. Electrocorticographic activation within human auditory cortex during dialog-based language and cognitive testing. Front Hum Neurosci. 2016;10:1–15. Article id 202.
[45] Wray CD, Blakely TM, Poliachik SL, et al. Multimodality localization of the sensorimotor cortex in pediatric patients undergoing epilepsy surgery. J Neurosurg Pediatr. 2012;10(1):1–6.
[46] Cho-Hisamoto Y, Kojima K, Brown EC, et al. Cooing- and babbling-related gamma-oscillations during infancy: intracranial recording. Epilepsy Behav. 2012;23(4):494–496.
[47] Vansteensel MJ, Bleichner MG, Dintzner LT, et al. Task-free electrocorticography frequency mapping of the motor cortex. Clin Neurophysiol. 2013;124:1169–1174.
[48] Arya R, Wilson JA, Vannest J, et al. Electrocorticographic language mapping in children by high-gamma synchronization during spontaneous conversation: comparison with conventional electrical cortical stimulation. Epilepsy Res. 2014;110:78–87.

[49] VanGilder JL. Electrocorticographic analysis of spontaneous conversation to localize receptive and expressive language areas [master's thesis]. Tempe (AZ): Arizona State University; 2013.
[50] Selting M, Auer P, Barth-Weingarten D, et al. Gesprächsanalytisches Transkriptionssystem 2 (GAT 2) [Conversational transcription system 2 (CTS 2)]. Gesprächsforschung: Online-Zeitschrift zur verbalen Interaktion; 2009.
[51] Musan R. Satzgliedanalyse [Analysis of sentence constituents]. Heidelberg: Winter; 2009.
[52] Penfield W, Boldrey E. Somatic motor and sensory representation in the cerebral cortex of man as studied by electrical stimulation. Brain. 1937;60(4):389–443.
[53] Eisenberg P. Grundriss der deutschen Grammatik: Band 2: Der Satz (überarb. und aktualisierte Aufl.) [Foundations of German grammar: volume 2: the sentence (modified and revised edition)]. Stuttgart: Metzler; 2004.
[54] Walker MB, Trimboli C. Smooth transitions in conversational interactions. J Psychol. 1982;117(2):305–306.
[55] Hickok G, Poeppel D. The cortical organization of speech processing. Nat Rev Neurosci. 2007;8(5):393–402.
[56] Hickok G. Computational neuroanatomy of speech production. Nat Rev Neurosci. 2012;13(2):135–145.
[57] Levelt WJ, Roelofs A, Meyer AS. A theory of lexical access in speech production. Behav Brain Sci. 1999;22(1):1–38.
[58] Levelt WJ, Schriefers H, Vorberg D, et al. The time course of lexical access in speech production: a study of picture naming. Psychol Rev. 1991;98(1):122–142.
[59] Sahin NT, Pinker S, Cash SS, et al. Sequential processing of lexical, grammatical, and phonological information within Broca's area. Science. 2009;326(5951):445–449.
[60] Jackendoff R. A parallel architecture perspective on language processing. Brain Res. 2007;1146:2–22.
[61] Garrett MF. Production of speech: observations from normal and pathological language use. In: AW Ellis, editor. Normality and pathology in cognitive functions. London: Academic; 1982. p. 19–76.
[62] Auer P. On-line syntax: thoughts on the temporality of spoken language. Lang Sci. 2009;31:1–13.
[63] Bybee J. Language, usage and cognition. Cambridge: Cambridge University Press; 2010.
[64] Chafe W. Discourse, consciousness, and time: the flow and displacement of conscious experience in speaking and writing. Chicago, IL: University of Chicago Press; 1994.
[65] Frazier L, Fodor JD. The sausage machine: a new two-stage parsing model. Cognition. 1978;6(4):291–325.
[66] Brumberg JS, Krusienski DJ, Chakrabarti S, et al. Spatio-temporal progression of cortical activity related to continuous overt and covert speech production in a reading task. PLoS ONE. 2016;11(11):e0166872.
[67] Schomers MR, Garagnani M, Pulvermüller F. Neurocomputational consequences of evolutionary connectivity changes in perisylvian language cortex. J Neurosci. 2017;37(11):3045–3055.
[68] Pfänder S, Behrens H, Auer P. Erfahrung zählt. Frequenzeffekte in der Sprache: ein Werkstattbericht [Experience matters. Frequency effects in language: a lab report]. Zeitschrift für Literaturwissenschaft und Linguistik. 2013;43(1):7–32.
[69] Chakrabarti S, Sandberg HM, Brumberg JS, et al. Progress in speech decoding from the electrocorticogram. Biomed Eng Lett. 2015;5(1):10–21.
[70] Herff C, Schultz T. Automatic speech recognition from neural signals: a focused review. Front Neurosci. 2016;10:1–7. Article id 429.
[71] Kanas VG, Mporas I, Benz HL, et al. Joint spatial-spectral feature space clustering for speech activity detection from ECoG signals. IEEE Trans Biomed Eng. 2014;61(4):1241–1250.
[72] Kennedy PR, Bakay RA. Restoration of neural output from a paralyzed patient by a direct brain connection. NeuroReport. 1998;9(8):1707–1711.
[73] Blakely T, Miller KJ, Rao RPN, et al. Localization and classification of phonemes using high spatial resolution electrocorticography (ECoG) grids. Conf Proc IEEE Eng Med Biol Soc. 2008;2008:4964–4967.
[74] Bouchard KE, Chang EF. Neural decoding of spoken vowels from human sensory-motor cortex with high-density electrocorticography. Conf Proc IEEE Eng Med Biol Soc. 2014;2014:6782–6785.
[75] Ikeda S, Shibata T, Nakano N, et al. Neural decoding of single vowels during covert articulation using electrocorticography. Front Hum Neurosci. 2014;8:1–8.
[76] Leuthardt EC, Gaona C, Sharma M, et al. Using the electrocorticographic speech network to control a brain-computer interface in humans. J Neural Eng. 2011;8(3):036004.
[77] Mugler EM, Patton JL, Flint RD, et al. Direct classification of all American English phonemes using signals from functional speech motor cortex. J Neural Eng. 2014;11(3):035015.
[78] Pei X, Barbour DL, Leuthardt EC, et al. Decoding vowels and consonants in spoken and imagined words using electrocorticographic signals in humans. J Neural Eng. 2011;8(4):046028.
[79] Herff C, Johnson G, Diener L, et al. Towards direct speech synthesis from ECoG: a pilot study. Conf Proc IEEE Eng Med Biol Soc. 2016;2016:1540–1543.
[80] Kellis S, Miller K, Thomson K, et al. Classification of spoken words using surface local field potentials. Conf Proc IEEE Eng Med Biol Soc. 2010;2010:3827–3830.
[81] Martin S, Brunner P, Iturrate I, et al. Word pair classification during imagined speech using direct brain recordings. Sci Rep. 2016;6:25803.
[82] Zhang D, Gong E, Wu W, et al. Spoken sentences decoding based on intracranial high gamma response using dynamic time warping. Conf Proc IEEE Eng Med Biol Soc. 2012;2012:3292–3295.
[83] Wang W, Degenhart AD, Sudre GP, et al. Decoding semantic information from human electrocorticographic (ECoG) signals. Conf Proc IEEE Eng Med Biol Soc. 2011;2011:6294–6298.
[84] Martin S, Brunner P, Holdgraf C, et al. Decoding spectrotemporal features of overt and covert speech from the human cortex. Front Neuroeng. 2014;7:1–15. Article id 14.
[85] Liberman AM, Delattre PC, Cooper FS, et al. The role of consonant-vowel transitions in the perception of the stop and nasal consonants. Psychol Monogr. 1954;68:1–13.

[86] Ryan DB, Frye GE, Townsend G, et al. Predictive spelling with a P300-based brain-computer interface: increasing the rate of communication. Int J Hum Comput Interact. 2011;27(1):69–84.
[87] Speier W, Chandravadia N, Roberts D, et al. Online BCI typing using language model classifiers by ALS patients in their homes. Brain-Computer Interfaces. 2016;4:1–8.
[88] Moses DA, Mesgarani N, Leonard MK, et al. Neural speech recognition: continuous phoneme decoding using spatiotemporal representations of human cortical activity. J Neural Eng. 2016;13(5):056004.
[89] Speier W, Arnold C, Pouratian N. Integrating language models into classifiers for BCI communication: a review. J Neural Eng. 2016;13(3):031002.
[90] Greenbaum S, Nelson G. The International Corpus of English (ICE) project. World Englishes. 1996;15(1):3–15.
[91] Oldfield RC, Wingfield A. Response latencies in naming objects. Q J Exp Psychol. 1965;17(4):273–281.
[92] Amunts K, Schleicher A, Bürgel U, et al. Broca's region revisited: cytoarchitecture and intersubject variability. J Comp Neurol. 1999;412(2):319–341.
[93] Jayaram V, Alamgir M, Altun Y, et al. Transfer learning in brain-computer interfaces. IEEE Comput Intell Mag. 2016;11(1):20–31.
[94] Lotte F, Congedo M, Lécuyer A, et al. A review of classification algorithms for EEG-based brain-computer interfaces. J Neural Eng. 2007;4(2):R1–R13.
[95] Hajinoroozi M, Mao Z, Jung TP, et al. EEG-based prediction of driver's cognitive performance by deep convolutional neural network. Signal Process Image. 2016;47:549–555.
[96] Manor R, Mishali L, Geva AB. Multimodal neural network for rapid serial visual presentation brain computer interface. Front Comput Neurosci. 2016;10:1–13. Article id 130.
[97] Tabar YR, Halici U. A novel deep learning approach for classification of EEG motor imagery signals. J Neural Eng. 2017;14(1):016003.
[98] Schmidhuber J. Deep learning in neural networks: an overview. Neural Networks. 2015;61:85–117.
[99] LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–444.
[100] Esteva A, Kuprel B, Novoa RA, et al. Dermatologist-level classification of skin cancer with deep neural networks. Nature. 2017;542(7639):115–118.
[101] Morin F, Bengio Y. Hierarchical probabilistic neural network language model. AISTATS. 2005;5:246–252.
[102] Kohler F, Gkogkidis CA, Bentler C, et al. Closed-loop interaction with the cerebral cortex: present and future of wireless implant technology. Brain Comput Interfaces. Forthcoming.