Curr Neurobiol 2020; 11(2): 27-36 ISSN 0975-9042

Sensorimotor Rhythm is Associated with Reinforcement Learning and Cognitive Impulsivity: A Neurofeedback Study

Eddy J. Davelaar¹ and Jakub Jilek²

¹Birkbeck, University of London, United Kingdom
²University College London, United Kingdom

Abstract

Electroencephalography (EEG) Neurofeedback (NFB) is a method which enables participants to achieve voluntary control over their brain oscillations. Here, we trained participants to enhance their sensorimotor rhythm (SMR; 12-15 Hz), while simultaneously suppressing theta (4-7 Hz) and beta2 (25-29 Hz), and assessed the effects on attention, working memory, reinforcement learning and rational thinking using a battery of tests. Half of the 30 participants completed 15 NFB sessions each, while the other half waited for the same amount of time. Although we did not find any within- or between-groups effects on attention and working memory, we did find significant improvements on measures of reinforcement learning and rational thinking for the training group compared to the control group. The improvement was accompanied by a consistent increase of SMR on multiple measures of neural learning. The increase in resting-state SMR correlated with the increase in reinforcement learning and cognitive impulsivity. These findings support a central role in striatal processes underlying the homeostatic set point of SMR and in cognitive inhibition.

Keywords: sensorimotor rhythm, electroencephalography, neurofeedback, cognitive impulsivity, reinforcement learning

Introduction

Cognitive training is defined as an intervention which aims to improve the mental capacities of individuals beyond what is required to preserve good health or treat a medical condition [1]. Whereas one approach to cognitive enhancement is to focus on cognition directly, through [2], another approach is to target the underlying neural processes with methods such as brain stimulation [3] and assess the cognitive outcome. Neurofeedback (NFB) is unique in that it combines the two approaches. During NFB training, the trainee observes a correlate of a relevant neurobiological signal (e.g. the amplitude of a brain oscillation) and has to learn how to get it under volitional control [4,5]. The learning process follows operant contingencies, where the relevant behaviour (neural activity) is either reinforced or suppressed with positive and negative feedback, respectively [6]. In contrast with pharmaceuticals and brain stimulation, which alter neural activity exogenously, the neurobiological changes elicited by neurofeedback are the result of endogenous self-regulatory mechanisms. Neurofeedback thus engages a cognitive component (e.g., effort and motivation), while directly targeting neural activity.

Electroencephalography (EEG) neurofeedback has predominantly been used as a therapeutic tool to alleviate symptoms associated with such conditions as ADHD [7] and epilepsy [8]. However, researchers also explore its potential to enhance cognitive functions, such as attention and memory, in healthy individuals [9-11]. Despite an exponentially increasing interest in NFB [5], the evidence for its effectiveness and specificity remains sparse. This is in part due to methodological challenges (for discussion, see [12-14]) and the dearth of studies demonstrating that the change in the neural parameters correlates with the change in cognitive and behavioural outcome (for discussion, see [15,16]). In addition, there is an ongoing question about the cognitive specificity of training protocols, i.e., demonstrating that a training intervention has an effect on one, but not another cognitive domain. This requires the use of tests that cover a range of cognitive domains.

In this study, we investigated the effect of upregulating sensorimotor rhythm (SMR) on four separate cognitive functions in healthy adults. SMR is an oscillation of electrical activity that is localised in the somatosensory cortex and has the frequency of 12-15 Hz [17]. Neurophysiological studies in cats revealed that SMR is generated by neurons in the ventrobasal thalamus (VBT) during inhibition of motor output [18]. Since the VBT serves as a relay through which signals are sent from sensory pathways to the cortex, input from proprioceptive receptors is an important modulator of SMR [19]. Whereas proprioceptive discharge caused by body movements causes a suppression of SMR, motor inhibition leads to its increase [20]. Since motor activity is known to disengage visual processing areas [21], it has been suggested that it may interfere with perceptual processing and information integration [22]. Given that during the production of SMR the conduction of somatosensory input to the cortex must be attenuated, learning to voluntarily enhance SMR likely necessitates the regulation or suppression of this input. SMR enhancement is therefore hypothesised to improve a variety of cognitive functions by reducing sensorimotor interference.

Apart from positive clinical outcomes [23], an increase in SMR power has been associated with improvements in attention and memory in healthy subjects. Regarding attention, most studies have found reductions in omission errors (i.e. failure to respond to a target) and commission errors (i.e. failure to inhibit response to a distractor). For instance, Egner and Gruzelier [24] compared the differential outcomes of SMR and beta1 (15-18 Hz) up-regulation on performance on the Test of Variables of Attention (ToVA). The participants who received SMR training exhibited a reduction in commission errors, which directly correlated with the change in SMR amplitude. Vernon et al. [25] compared the effects of theta and SMR upregulation on performance in two versions of a continuous performance task. Whereas in the two-sequence version, participants had to respond whenever they saw one of two three-digit sequences of numbers (e.g. 4-2-7 or 5-6-2), in the three-sequence version, they had to respond to three such sequences. SMR enhancement led to significant reductions in both omission and commission errors in the two-sequence version of a continuous performance task, but not in the three-sequence version. The researchers hypothesised that SMR upregulation may operate by enhancing the effectiveness of alerting and orienting attention networks invoked in the two-sequence task but not on the executive attention network invoked in the three-sequence task with greater memory load.

In the same study, the authors also found improvements in working memory. In particular, compared to the theta-upregulation group, SMR trainees exhibited a 10% improvement in on a conceptual span task ([26]; memorisation of sequences of words belonging to different categories). Others [27] also reported an improvement in working memory, as measured using backwards digit span (memorising sequences of digits and reporting them in reversed order of presentation). However, [28] observed improvements on tests of long-term memory, but not on backwards digit span.

Our first goal in the present study was to test the hypothesis posed by [25] through assessing the effect of SMR upregulation on the three functionally distinct, albeit mutually interacting, attention networks: alerting, orienting and executive network [29]. For this purpose, we administered the Attention Network Test (ANT; [30-32]) that has traditionally been used to test the efficiency of each network.

Our second goal was to address the discrepancy in the results on working memory [25,27,28]. Working memory has storage and processing capacity, which has traditionally been assessed with span tasks. Whereas simple span tasks (e.g. forward digit span) engage only the storage component, complex span tasks are believed to invoke both components by interspersing sequences of stimuli with unrelated secondary processing tasks such as solving mathematical problems [33]. In this study, we assessed the effects of SMR upregulation on working memory using the complex operation span task (OSPAN; [34]).

We further extended our investigation into two other domains, not yet explored in the NFB literature: reinforcement learning and rational thinking. Reinforcement learning (RL) is a cognitive process by means of which organisms predict and acquire future rewards [35]. The probabilistic selection task (PST) measures the ability to learn from positive and negative feedback, which are dependent on dopaminergic innervation of the striatum [36] and Parkinson [37]. Since levels have been associated with alpha (8-13 Hz) and beta (13-30 Hz) oscillations, the striatal-thalamo-cortical circuitry is believed to be involved in the generation of these oscillations [38]. We hypothesised that if there is indeed a connection between reinforcement learning, striatum and cortical EEG, the training of SMR (which combines a higher part of alpha and a lower part of beta) would improve reinforcement learning in participants.

We assessed rational thinking by using the cognitive reflection test (CRT, [39,40]), which consists of mathematical problems, each offering an intuitive, but incorrect, answer and a correct answer which requires more deliberation. In order to provide the correct answer, participants must consciously inhibit the intuitive answer. Given that the upregulation of SMR has been associated with improved internal inhibition, we expected that it should improve the performance on the CRT. As a contrast, logical reasoning problems such as syllogisms, which are solved in an incremental manner, were hypothesised not to show any benefit of SMR training. Such a finding would contribute to assessing cognitive specificity.

Apart from the impact on cognitive performance, referred to as cognitive training effect, we were also interested in the impact on resting-state EEG, referred to as neural training effects. We would expect that successful SMR training would correlate with changes in SMR at the training electrode location. However, given that brain wave oscillations are generated by overlapping neural networks, we could observe training effects on frequency bands and electrode locations other than the trained frequency-electrode combination.

Methods

Participants

Thirty-five candidates were recruited through adverts and were interviewed in order to determine their motivation level, physical and mental state. We excluded candidates with mental and physical disorders, a tendency towards anxiety, high levels of stress, pharmacological treatment and frequent headaches. Out of the initial 35 candidates, five were excluded based on these criteria. The final 30 participants (10 females; mean age = 30.6; SD = 9.4) were randomly allocated to an active SMR training group or a wait-list control group. There were no age differences between the two genders (males: mean = 30.9; SD = 2.2; females: mean = 30.1; SD = 2.8), or between the two groups (SMR: mean = 29.3; SD = 2.6; control: mean = 31.9; SD = 2.3). There were 6 females in the SMR group and 4 females in the control group.


Design

This study conforms to a 2 (pre- vs post-training) by 2 (SMR training vs wait-list control) mixed design addressing the impact of SMR upregulation on cognitive performance (cognitive transfer) and resting-state EEG (neural transfer). The study was approved by the Local Ethics Committee and was conducted in accordance with the Helsinki Declaration for ethical human research.

Procedure

After signing a consent form, each participant was assigned two unique letters which were used as an anonymous identifying code. They then completed a single session with four cognitive tests and a quantitative EEG (QEEG) recording, which was repeated seven weeks later. Between these two test sessions, participants in the SMR training group completed 15 NFB training sessions, approximately 2-4 times per week over the space of 7 weeks, whereas the control group did not receive any training over the same time period. Participants were financially incentivised to perform at the peak of their ability in both cognitive test sessions. Based on their performance, they received between £20 and £80 for one session of cognitive tests. Figure 1 presents the design and procedure of the study.

Figure 1. Experimental design of the study, spanning 7 weeks. On the first and last day of the experiment, participants completed a battery of four cognitive tests in a random order. NFB denotes EEG Neurofeedback training. Each NFB training session lasted approximately 50 minutes and was split into one baseline block of one minute and 8 training blocks, each lasting 3 minutes.

Cognitive assessments

The Attention Network Task (ANT) [30-32] consisted of 6 blocks presented in random order. Simple blocks (48 trials) focused on one attentional network and interaction blocks (96 trials) on one interaction between two networks. The test was preceded by a practice session of 20 trials. In each trial, participants had to indicate whether the middle arrow pointed to the left or right by pressing the “A” or “L” key, respectively. The middle arrow was always surrounded by 2 flanking arrows on either side. The targets differed by their direction, spatial location and congruence (for an example with within-trial timing, see Figure 2). The proportional increases in response times to targets in different blocks were used as proxies for the network efficiencies.

Figure 2. The time course of an ANT trial. After a fixation cross lasting between 400 and 1600ms, either a cue appears (in cueing blocks) or the fixation cross continues for a further 100ms (in non-cueing blocks). This is followed by another fixation cross lasting 300ms. The target is presented until a button is pressed or 1700ms have passed. Another fixation cross is displayed to ensure that the length of every trial equals 3000ms.

In the Operation Span Task (OSPAN; [34]) participants had to memorise letters while solving mathematical problems. Letters were displayed one at a time for 800 ms and each letter was preceded by a math problem, presented until solved. The letter-math pairs were presented in sequences of three, four, five, six or seven pairs, with each length repeated three times (for a total of 75 letter-math pairs). After each sequence, participants were asked to recall the letters. Letters were selected randomly from a fixed set (F, H, J, K, L, N, P, Q, R, S, T, Y). The math problems consisted of a multiplication or division operation followed by addition or subtraction (e.g. 6/2 - 3 = ?). Performance on the OSPAN was the sum of letters remembered correctly.

The Probabilistic Selection Task (PST; [41]) is a reinforcement learning task comprising a learning phase and a test phase. In the learning phase, participants were presented with six randomly selected symbols from Japanese Hiragana. The selected symbols were randomly grouped into three fixed pairs (AB, CD, EF). The learning phase consisted of 60 trials (20 per pair). In each trial, one of these pairs was presented on the screen, with the left-right location of each symbol randomised across trials. The task was to choose either the left or the right symbol with a response button, after which positive or negative feedback was given. Participants were instructed to select the particular symbol in each pair that was more frequently associated with positive feedback. Each symbol had associated probabilities of providing positive and negative feedback (see Figure 3). Participants proceeded to the test phase after they had reached a pre-defined success criterion (see [42]) for each pair (65% of A choices in AB trials, 60% of C choices in CD trials and 50% of E choices in EF trials). If they had not satisfied all three criteria, they had to repeat the learning phase until all the criteria were satisfied (maximum number of repetitions was six and none of the participants reached this maximum). In the test phase, participants were presented with eight novel pairs of symbols: Go-pairs (AC, AD, AE, AF) and No-go pairs (BC, BD, BE, BF). Whereas in the Go-pairs (“go for A”), the correct symbol to choose was A for all combinations of symbols, in the No-go pairs (“don’t go for B”), the correct symbol was always the symbol other than B. Since learning to select A over B could be achieved by either associating A with positive feedback or associating B with negative feedback (or both), the success rates for Go-pairs and No-go pairs provided a metric for how well participants learned from positive and negative feedback, respectively.

Figure 3. Six examples from the PST and their associated probabilities of positive/negative feedback. The symbols are grouped into 3 fixed pairs (AB, CD, EF). The probabilities were the same for all participants, only the symbols differed.

The Cognitive Reflection Task (CRT) was paper-based and consisted of eight numerical questions and five logical questions [39,40,43,44]. An example question was “A bat and a ball cost $1.10 in total. The bat costs a dollar more than the ball. How much does the ball cost?”. The intuitive, but incorrect answer is $0.10, whereas the correct answer is $0.05. The logical questions were syllogisms. For example, in the question: “All flowers have petals. Roses have petals. If these two statements are true, can we conclude from them that roses are flowers?”, the intuitive, but incorrect answer is “yes”, whereas the correct answer is “no”. Participants had a fixed time limit of 8 minutes to answer the numerical questions and 5 minutes to answer the logical questions. The number of correct answers was the dependent measure.

Quantitative EEG recordings

The pre- and post-training QEEG data were recorded using a 19-channel Deymed Truscan EEG system (sampling rate: 256 Hz) connected to its designated computer (Windows 7). The activity from 19 electrodes (Fp1, Fp2, F7, F3, Fz, F4, F8, T3, C3, Cz, C4, T4, T5, P3, Pz, P4, T6, O1, O2) was referenced to the auricles (A1, A2). The ground electrode was attached to the scalp (between Fz and the middle of Fp1/Fp2). The measurement consisted of five minutes of eyes closed (EC) and five minutes of eyes open (EO). Impedance was kept below 10kΩ for all electrodes. Participants sat on a reclining armchair with leg support in a tilted position (45° angle) and were instructed to relax and to minimise blinking. Participants were instructed to avoid hard work one day prior to the session and to reduce their alcohol and caffeine consumption to a minimum.

Neurofeedback training procedure

The NFB protocol was run using a 2-channel Deymed BrainFeedbackPro machine from a Windows 7 Boot Camp partition on an Apple MacBook Pro with OS X Mavericks (late 2013; 2.4 GHz Intel Core i5). This computer was also used for administration of cognitive tests. The NFB protocol involved measuring electrical activity from electrode Cz (sampling rate: 128 Hz). There was one reference electrode clipped to one ear and one ground electrode clipped to the contralateral ear. Participants sat on a reclining armchair with leg support in a tilted position (45° angle). They received visual feedback on an external 23-inch screen and auditory feedback through a pair of Logitech speakers. For the training, we used the native Deymed NFB application. The application collected the EEG data and computed the frequency spectrum using the Gabor transform (sliding window 1s).

In all training sessions, participants learned to increase the amplitude of SMR (12-15Hz) over the Cz electrode. In addition to up-regulating SMR, subjects were not allowed to raise the amplitude of theta (4-7 Hz) and beta2 (25-29 Hz) bands (see also [27,28]). Each session started with a one-minute baseline recording during which participants looked at a white wall.
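As a rough illustration of the spectral step in the training procedure (a Gabor transform over a 1 s sliding window at 128 Hz), the following Python sketch estimates band amplitudes for a single window by applying a Gaussian taper before a Fourier transform. It is not the Deymed implementation, which the text does not describe: the function name `band_amplitudes`, the Gaussian width `sigma_s` and the averaging over 1 Hz bins are assumptions made purely for illustration.

```python
import numpy as np

FS = 128  # sampling rate of the NFB system in Hz (from the text)
# Band limits in Hz for the three trained bands (from the text).
BANDS = {"theta": (4, 7), "smr": (12, 15), "beta2": (25, 29)}


def band_amplitudes(window, fs=FS, sigma_s=0.15):
    """Mean spectral amplitude per band for one analysis window.

    A Gaussian taper followed by an FFT is the core of a Gabor transform.
    sigma_s (taper width in seconds) is an assumed value, not taken from
    the source.
    """
    window = np.asarray(window, dtype=float)
    n = len(window)
    # Time axis centred on the middle of the window, in seconds.
    t = (np.arange(n) - (n - 1) / 2) / fs
    taper = np.exp(-0.5 * (t / sigma_s) ** 2)
    # Remove the mean, apply the taper, and take the magnitude spectrum.
    spectrum = np.abs(np.fft.rfft((window - window.mean()) * taper))
    freqs = np.fft.rfftfreq(n, d=1 / fs)
    return {
        name: spectrum[(freqs >= lo) & (freqs <= hi)].mean()
        for name, (lo, hi) in BANDS.items()
    }


# Synthetic 1 s test signal: a 13.5 Hz sinusoid, i.e. inside the SMR band.
t = np.arange(FS) / FS
example = band_amplitudes(np.sin(2 * np.pi * 13.5 * t))
```

Sliding such an estimate over the recording in small steps would give a running amplitude trace per band, which is the kind of signal a feedback threshold can be compared against.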


After the baseline, each session comprised eight blocks of three minutes each, interspersed with breaks of about one minute.

In the first training session, participants tried eight computer games and chose the game that they liked best. This would equalise the level of engagement across participants. The participants were also provided with graphs, each showing the immediate amplitude and the amplitude history for the last 20s of one of the three frequency bands. Vibro-tactile feedback was provided through a pair of speakers, which participants held in their hands. If the SMR amplitude reached the required threshold and at the same time the amplitudes of theta and beta2 did not exceed their respective thresholds, a discrete sound of a gong was played for 1s. If the threshold requirements were continuously satisfied, the sound was repeated every 1s.

Unlike earlier work where we omitted instructions [45], we provided the following instructions to the participants:

“The learning happens partly on the sub-conscious level. You may not be able to get the game directly under your conscious control so don’t worry if you feel like you don’t know what to do or you’re not in control. Try to relax and synchronise yourself with the video-game and sound.”

“Try to relax and concentrate at the same time. It may seem incompatible at first but throughout the sessions you will slowly get better at it.”

“There is no concrete strategy of what you must do to make the car go forward. Try to experiment with yourself.”

We applied an adaptive threshold-setting procedure based on perceived cognitive load. In the first training block of each session, the thresholds of SMR, theta and beta2 were set at 50%, 250% and 225%, respectively, of that session’s mean baseline recording. During the breaks, participants indicated the difficulty level (very easy, easy, OK, difficult, very difficult). The experimenter subsequently adjusted the thresholds manually in steps of 3% of the baseline level based on the subjectively perceived cognitive load. In this manner, the level of subjective cognitive load could be equated across participants, which has been shown to influence learning [46,47].

Each training session lasted approximately 50 minutes, including electrode placements, baseline recordings, breaks and training. Participants received a candy bar of their choice at the end of each session regardless of their performance.

Data pre-processing and analysis

The QEEG data were re-referenced offline to the average of the auricles [(A1 + A2) / 2]. A Finite Impulse Response (FIR) band-pass filter was applied with a high-pass cut-off set at 1Hz and a low-pass cut-off at 40Hz (automatically selected filter order). Artefacts were removed automatically with the EEGLAB [48] continuous artefact removal procedure (moving window size: 1000ms, step: 250ms, threshold: 100µV) and manually with the Independent Component Analysis (ICA; runica algorithm). ICA components were localised using the DipFit plugin and the procedure for their manual removal was adapted from the online ICLabel tutorial¹. The mean amplitude was removed from each channel. After the pre-processing, powers were computed at 1Hz resolution. Relative power of each frequency band was calculated as the ratio of their absolute power and the absolute total power (1Hz to 40Hz).

¹ http://reaching.ucsd.edu:8000/tutorial/overview

To assess neural learning, we used the data from the NFB system. We computed the relative amplitude of each frequency band as the ratio of absolute amplitude and the sum of amplitudes for all frequencies between 4Hz and 29Hz (the range used during NFB training). These values were stored online by the software. The delta frequency band was omitted in the analyses as it could not be measured during NFB training by the software.

Results

Neural learning

We examined the change in baseline absolute amplitudes of each frequency band. For each participant, the absolute amplitude in each of the 15 baselines was transformed into a proportional change with respect to their session 1 baseline (see [49]). Repeated-measures ANOVA was subsequently run on each frequency band. Concerning the target frequencies of the NFB protocol, (uncorrected) within-subject contrasts revealed a significant positive linear trend for SMR, F(1,14) = 33.70, p < .001, ηp² = .707, and theta, F(1,14) = 6.72, p = .021, ηp² = .32. There was no significant trend for beta2 (p = .539). Between-session baseline learning curves are displayed in Figure 4. The linear trends for Alpha and Beta1 were not significant (both ps > .10). The statistical results are reported in Table 1.

To assess within-session learning, we divided the absolute amplitude of each block by the relevant session baseline. We subsequently collapsed these ratios across sessions (individually for each participant). Every participant therefore provided 8 data points to the analysis, each representing the mean proportional change in relative amplitude with respect to relevant session baseline. Within-session learning curves are displayed in Figure 4. Within-subject contrasts revealed a significant linear trend (uncorrected) for SMR, F(1,14) = 14.61, p = .002, ηp² = .51 and Theta, F(1,14) = 8.43, p = .012, ηp² = .38. There was no significant trend for Beta2 (p = .95). In addition, there was a significant positive linear trend for Alpha, F(1,14) = 7.55, p = .016, ηp² = .35, but no significant linear trend for Beta 1 (p = .134). The statistical results are reported in Table 1. When applying correction for multiple comparison (requiring significance level of p < .01), only the change in SMR remained significant.

Figure 4. Between-session and within-session learning curves of the amplitudes relative to the first session’s baseline recording.

Table 1. Summary of the linear trend analyses (F-values) for the two neural learning measures.

                                      Linear contrast
Measure          Frequency band       F(1,14)   p          ηp²
Between-session  Theta (4-7Hz)        6.72      .021†      .32
                 Alpha (8-12Hz)       3.10      .100       .18
                 SMR (12-15Hz)        33.70     <.001***   .71
                 Beta1 (16-24Hz)      1.32      .271       .09
                 Beta2 (25-29Hz)      .40       .539       .03
Within-session   Theta (4-7Hz)        8.43      .012†      .37
                 Alpha (8-12Hz)       7.55      .016†      .35
                 SMR (12-15Hz)        14.61     .002**     .51
                 Beta1 (16-24Hz)      2.54      .134       .15
                 Beta2 (25-29Hz)      .005      .947       .00
† p < .05; ** p < .01; *** p < .001

Training effects

In order to ascertain which frequency bands were modulated by NFB training, we ran a series of 2 x 2 ANOVAs on the absolute powers of SMR (12-15Hz), delta (1-3Hz), theta (4-7Hz), alpha (8-11Hz), and beta2 (25-29Hz) in the EO and EC conditions, measured at Cz. Given the exploratory nature of this analysis, the significance level was lowered to .01. Although there were no group differences, the power in higher frequency bands increased over time. In the eyes open condition there was an increase in power for SMR, F(1,28) = 7.90, p < .01, and beta2, F(1,28) = 11.07, p < .01, and in the eyes closed condition, only for beta2, F(1,28) = 12.93, p < .01. The interaction between group and time was only significant for SMR in the eyes open condition, F(1,28) = 8.80, p < .01. There were no significant interactions in the eyes closed condition.

To visualise this pattern topographically, Figure 5 presents maps of SMR power measured during the pre- and post-training QEEG sessions for the control and SMR training groups together with the statistical parametric map (t-values) of the within-subject comparison (corrected for multiple comparison across electrodes). The maps show that the SMR NFB training significantly enhanced the SMR power over the frontal electrodes.

For the Attention Network Task, the network efficiencies were significantly different from zero (all ps < .05). However, a mixed ANOVA (time x group) did not reveal any main effects or interactions on the overall RTs (ps > .13). We also carried out separate mixed ANOVAs (time x group) on the 9 attention network indices. There were no main effects or interaction effects for any of the comparisons (all ps > .05). Various transformations of RTs produced the same results. The NFB protocol therefore had no effect on the efficiency of the attention networks or on their mutual interactions.

A mixed ANOVA (time x group) on the OSPAN score did not reveal a significant difference between the SMR and the control group, p > .451. Tests of within-subject effects revealed a significant main effect of session, F(1,28) = 4.56, p = .042, ηp² = .14, but no significant interaction, p = .340.

For the Probabilistic Selection Task, the analyses focused on the differential effect of session on the number of correct answers to Go pairs and NoGo pairs. Mixed ANOVAs (time x group) did not reveal a significant main effect of time, p = .857, nor a significant interaction, p = .425, for the Go pairs. For the NoGo pairs, the main effect of time was not significant, p = .393, but the interaction was significant, F(1,26) = 6.80, p = .015, ηp² = .21, due to a significant increase in the number of correct answers to NoGo pairs in the SMR group, F(1,13) = 6.67, p = .023, ηp² = .34, but not the control group, p = .261 (Bonferroni-adjusted p = .025).

For the Cognitive Reflection Task, mixed ANOVAs revealed a significant main effect of time, F(1,28) = 16.47, p < .001, ηp² = .37, and a significant interaction, F(1,28) = 5.08, p = .032, ηp² = .15, for the numerical questions. This was due to a significant increase in the number of correct answers in the SMR group, F(1,14) = 19.39, p < .001, ηp² = .58, but not in the control group (p = .217; Bonferroni-corrected p = .025). For the logical questions, neither the main effect of time nor the interaction was significant (ps > .168).

In the final set of analyses, we correlated the change in resting-state SMR from the QEEG recordings with the performance in the PST and the numerical version of the CRT. These were the measures that showed significant time x group interactions. Correlating these with each other is an example of using the neurofeedback methodology to enhance the variability in the neural measure by which brain-behaviour relations can be addressed in cognitive neuroscience (see [15] for discussion). Figure 6 shows the scatter plots for the change in SMR predicting the change in PST-Go, PST-NoGo, and numerical CRT, with correlation coefficients of r = .60 (p < .001), r = .50 (p < .01), and r = .47 (p < .01), respectively. The correlation between SMR change and logical CRT was not significant (r = .28, p = .13).

Discussion

We examined the impact of training to enhance the sensorimotor rhythm (SMR) on cognitive measures of attention, working memory, reinforcement learning and rational thinking.


Figure 5. Topographic maps of SMR power (in µV²) during the pre- (left column) and post-training (middle column) resting QEEG session for the control (top row) and SMR training (bottom row) groups. The right column shows maps of t-values testing for the within-subject effect of session (corrected for multiple comparisons). A value of t = +/-3 is significant at p < .05 (two-tailed permutation test).
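The caption refers to a two-tailed permutation test corrected across electrodes without giving the exact scheme. One common way to obtain such a correction is a sign-flip permutation test on the paired post-minus-pre differences with a max-statistic threshold, and the Python sketch below assumes that approach. The function names, the number of permutations and the array layout (participants by electrodes) are hypothetical choices for illustration and are not taken from the study; the data used in the example are synthetic.

```python
import numpy as np


def paired_t(diff):
    """One-sample t-value per electrode for paired differences.

    diff has shape (participants, electrodes).
    """
    n = diff.shape[0]
    return diff.mean(axis=0) / (diff.std(axis=0, ddof=1) / np.sqrt(n))


def max_t_permutation(pre, post, n_perm=5000, seed=0):
    """Sign-flip permutation test with max-|t| correction (assumed scheme).

    Returns the observed t-values, corrected two-tailed p-values, and the
    |t| value that corresponds to a corrected p of .05.
    """
    rng = np.random.default_rng(seed)
    diff = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    t_obs = paired_t(diff)
    max_null = np.empty(n_perm)
    for i in range(n_perm):
        # Flip the sign of each participant's whole difference map, which
        # preserves the spatial correlation between electrodes.
        signs = rng.choice([-1.0, 1.0], size=(diff.shape[0], 1))
        max_null[i] = np.abs(paired_t(diff * signs)).max()
    exceed = (max_null[None, :] >= np.abs(t_obs)[:, None]).sum(axis=1)
    p_corr = (1 + exceed) / (n_perm + 1)
    t_crit = np.quantile(max_null, 0.95)
    return t_obs, p_corr, t_crit


# Synthetic example: 15 participants, 19 electrodes, effect on electrode 0.
rng = np.random.default_rng(1)
pre = rng.normal(size=(15, 19))
post = pre + rng.normal(size=(15, 19))
post[:, 0] += 3.0
t_obs, p_corr, t_crit = max_t_permutation(pre, post, n_perm=2000)
```

Comparing every electrode against the distribution of the largest |t| across the map is what controls the family-wise error rate, at the cost of some sensitivity at electrodes with small effects.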

Figure 6. Scatterplots showing the correlation between the difference in relative SMR power in the eyes open condition and A. the difference in performance on PST-Go pairs, B. the difference in performance on PST-NoGo pairs, and C. the number of correct answers to numerical CRT.

Our major finding is that the protocol led to improvements on measures of reinforcement learning and the numerical version of the cognitive reflection task. However, it failed to produce any cognitive enhancement in attention and working memory. In addition, we observed a range of effects on the EEG profile both over training sessions and in the QEEG recordings.

Neural learning

We found that the SMR protocol was effective at increasing SMR power within sessions and that across sessions the power increased further. This demonstrates both fast learning within each session based on feedback and the emergence of consolidation from one session to the next. The protocol was thus successful at achieving its primary goal of enhancing SMR. Our findings are consistent with previous literature that demonstrated neural learning of SMR within and between sessions [25,27,28,50-52].

Although not significant after correcting for multiple comparisons, theta showed an increase within and between sessions, while alpha also increased within sessions. This indicates a lack of band specificity. Dynamic within-session changes in frequencies other than the trained frequency band are not unusual in the literature. For instance, SMR training has been associated with increases in delta [27], alpha [27], Beta [28] and even the entire EEG spectrum [53]. Our study corroborates these findings by showing that SMR upregulation is not specific to SMR, but affects other frequency bands as well. It is not clear why this ‘leakage’ should occur, although it may be due to the fact that NFB training engages many cognitive processes (e.g. attention, motivation and effort), which likely invokes a number of different frequency bands [14]. A possible methodological solution to prevent leakage could be to train relative instead of absolute SMR power.

The increase in theta both within and between sessions is surprising given that it was suppressed by the NFB protocol. However, these findings are also consistent with previous literature. In a similar protocol to ours, Ros et al. [54] found that theta increased between sessions for participants who exhibited slow learning despite the fact that it was suppressed. Such rebound effects have also been documented for alpha


suppression [55]. Ros et al. [56] suggested that this paradoxical rebound could reflect the dynamics of a homeostatic system, whereby a suppression of a frequency band leads to synaptic changes, which in turn lead to increases in the frequency band power. Although homeostatic plasticity could explain why a rebound may occur, it falls short of elucidating why a particular frequency band should rebound and how the frequency bands are biophysically related to synaptic strengths. These are questions that should be addressed by future studies.

Training effects

Despite the lack of band specificity during training, compared to the control group, resting-state SMR was significantly enhanced after the 7-week intervention compared to the pre-training QEEG measurement. In addition, the topography was mainly frontal, which may indicate that training SMR at the Cz location influences the cortical-thalamic networks for which the largest electrical signal can be measured over the frontal electrodes. Future studies could expand the number of electrode positions to allow source localisation.

We assessed the effect of SMR enhancement on the performance in four cognitive tests: the Attention network task (ANT), the Operation span task (OSPAN), the Probabilistic selection task (PST) and the Cognitive reflection test (CRT). Based on previous results [25], we hypothesised that SMR upregulation would improve the efficiency of the executive network. However, none of the attention network indices changed due to SMR upregulation. The previously found improvements in attention associated with SMR upregulation [24,57] may therefore not be due to an effect of SMR on the efficiency of attentional networks. Rather, it is possible that the observed reductions in commission errors associated with SMR enhancement were the result of improved motor inhibition, which would be in accord with Sterman’s original findings [19].

We found no impact of the SMR protocol on OSPAN performance. These results are in line with [28], who also found no improvement in working memory after SMR upregulation, but contradict those of [25], who found a positive effect. A possible ad-hoc explanation is that the present study (and [28]) used a storage-plus-processing task to measure working memory, whereas [25] used a storage-only task. However, [27] observed improvements on the backward span task following SMR upregulation. In two of the three studies [27,28], SMR upregulation did improve the non-verbal backwards Corsi Block test, which may indicate the requirement of non-phonological information. Hence, it is possible that the positive effect of SMR in [25] may relate to semantic short-term maintenance, which underlies performance in the conceptual span task (see [26]), but not in the digit span or operation span tasks. Future work could include a battery of span tasks to assess the locus of influence.

For the PST, there was a significant improvement in negative reinforcement learning between the pre- and post-training session (21.7% improvement) for the SMR group, but no significant improvement in positive reinforcement learning. However, the change in SMR predicted the improvement in both negative and positive reinforcement learning. This paradoxical finding is due to the different statistical questions that are being addressed with the mixed ANOVA and the correlation. The former tests whether the change in task performance differs between groups, whereas the latter assesses the brain-behaviour relation, irrespective of training. Nevertheless, the positive association could either mean that the increased striatal sensitivity to feedback increases the SMR rhythm, or that an increase in resting-state SMR rhythm heightens the striatal sensitivity to feedback. To speculate, given that an increase in SMR precedes a motor action, it makes sense to have the system prepared to make positive or negative evaluations of that action. Future research could look into the relation between SMR and evaluative processing.

In the CRT, we did not observe any improvement on the logical questions. We found an effect of the NFB protocol on the CRT scores for the numerical version. We interpret this finding as evidence that participants who learned to increase SMR are more able to suppress the initial intuitive answer and engage in more deliberative processing. This suppression might not be an active process, but merely a consequence of a general increased level of cortical damping, which is indexed by the increased SMR power. Future studies could investigate whether decreasing SMR could reverse this pattern and enhance intuitive processing.

Overall, we initially hypothesised that SMR upregulation would impact on all four cognitive areas. Instead, the results revealed that SMR training only improved learning from negative feedback and suppressing intuitive errors. The change in SMR was correlated with the change in performance on the PST and numerical CRT. Although this points to cognitive specificity, the lack of consistency in tasks used across studies from different labs makes it difficult to provide a definite confirmation of cognitive specificity. Future research could optimise the experimental designs to directly address whether SMR training influences specific cognitive domains or specific tests within the same cognitive domain.

In this study, we adopted an adaptive threshold procedure based on the subjective assessment of cognitive load. In recent years, researchers have begun to investigate which threshold-setting procedures are most conducive to neural learning. Procedures based on the subjective assessment of cognitive load [46] and objectively estimated learning incentives [46,58] are very promising as they provide NFB researchers with a superior criterion based on which to adjust thresholds.

An important limitation of our study is the use of a waiting-list control group, which was due to limited resources. Although the passive control group allowed us to control for practice effects, we could not discount all non-specific influences such as prolonged sitting and concentrating, which may have had an impact on the EEG spectrum in the active group. However,


there is no perfect control condition, as even the sham-control condition has been shown to be inappropriate [59]. Future multi-centre large-scale studies should employ multiple control groups to control for a range of critiques against any single control condition.

Conclusion

This study has shown that SMR upregulation results in a consistent increase in SMR across all measures of neural learning. The protocol led to improvements in measures of reinforcement learning and rational thinking, but not in measures of attention and working memory. The discrepancies with previous literature indicate that replications are necessary in order to establish firm connections between NFB protocols and cognitive enhancement. Nevertheless, we managed to demonstrate the feasibility of the SMR protocol. The positive effects of SMR enhancement on measures of reinforcement learning and rational thinking are worthy of further investigation.

Acknowledgement

We acknowledge funding from the Wellcome Trust (ref 204770/z/16/z) under the ISSF Mid-Career Award to ED.

References

1. Dresler M, Sandberg A, Ohla K, Bublitz C, Trenado C, et al. Non-pharmacological cognitive enhancement. Neuropharmacol. 2013; 64: 529-543.
2. Hogrefe A, Studer-Luethi B, Kodzhabashev S, Perrig WJ. Mechanisms underlying n-back training: Response consistency during training influences training outcome. J Cogn Enhanc. 2017; 1: 406-408.
3. McKinley RA, Bridges N, Walters CM, Nelson J. Modulating the brain at work using noninvasive transcranial stimulation. NeuroImage. 2012; 59: 129-137.
4. Gaume A, Vialatte A, Mora-Sanchez A, Ramdani C, Vialatte FB. A psychoengineering paradigm for the neurocognitive mechanisms of biofeedback and neurofeedback. Neurosci Biobehav Rev. 2016; 68: 891-910.
5. Van Boxtel GJM, Gruzelier JH. Neurofeedback: Introduction to the special issue. Biol Psychol. 2014; 95: 1-3.
6. Sherlin LH, Kerson C. Neurofeedback and Basic Learning Theory: Implications for research and practice. J Neurother. 2011; 15: 292-304.
7. Arns M, Heinrich H, Strehl U. Evaluation of neurofeedback in ADHD: The long and winding road. Biol Psychol. 2014; 95: 108-115.
8. Tan G, Thornby J, Hammond DC, Strehl U, Canady B, Arnemann K, Kaiser DA. Meta-Analysis of EEG biofeedback in treating epilepsy. Clin EEG Neurosci. 2009; 40: 173-179.
9. Gruzelier JH. Differential effects on mood of 12–15 (SMR) and 15–18 (beta1) Hz neurofeedback. Int J Psychophysiol. 2014; 93: 112-115.
10. Escolano C, Navarro-Gil M, Garcia-Campayo J, Minguez J. The effects of a single session of upper alpha neurofeedback for cognitive enhancement: a sham-controlled study. Appl Psychophysiol Biofeedback. 2014; 39: 227-236.
11. Reiner M, Rozengurt R, Barnea A. Better than sleep: Theta neurofeedback training accelerates memory consolidation. Biol Psychol. 2014; 95: 45-53.
12. Alkoby O, Abu-Rmileh A, Shriki O, Todder D. Can we predict who will respond to neurofeedback? A review of the inefficacy problem and existing predictors for successful EEG neurofeedback learning. Neurosci. 2017; 378: 155-164.
13. Micoulaud-Franchi JA, McGonigal A, Lopez R, Daudet C, Kotwas I, Bartolomei F. Electroencephalographic neurofeedback: Level of evidence in mental and brain disorders and suggestions for good clinical practice. Clin Neurophysiol. 2015; 45: 423-433.
14. Gruzelier JH. EEG-neurofeedback for optimising performance. III: A review of methodological and theoretical considerations. Neurosci Biobehav Rev. 2014; 44: 159-182.
15. Berger AM, Davelaar EJ. Frontal alpha oscillations and attentional control: a virtual reality neurofeedback study. Neurosci. 2018; 378: 189-197.
16. Gruzelier JH. EEG-neurofeedback for optimising performance. I: A review of cognitive and affective outcome in healthy participants. Neurosci Biobehav Rev. 2014; 44: 124-141.
17. Sterman MB, Friar L. Suppression of seizures in an epileptic following sensorimotor EEG feedback training. Electroencephalogr Clin Neurophysiol. 1972; 33: 89-95.
18. Howe RC, Sterman MB. Cortical-subcortical EEG correlates of suppressed motor behavior during sleep and waking in the cat. Electroencephalogr Clin Neurophysiol. 1972; 32: 681-695.
19. Sterman MB. Physiological origins and functional correlates of EEG rhythmic activities: implications for self-regulation. Biofeedback Self Regul. 1996; 21: 3-33.
20. Mann CA, Sterman MB, Kaiser DA. Suppression of EEG rhythmic frequencies during somato-motor and visuo-motor behavior. Int J Psychophysiol. 1996; 23: 1-7.
21. Pfurtscheller G. Event-related synchronization (ERS): an electrophysiological correlate of cortical areas at rest. Electroencephalogr Clin Neurophysiol. 1992; 83: 62-69.
22. Kober SE, Schweiger D, Witte M, Reichert JL, Grieshofer P, Neuper C, Wood G. Specific effects of EEG based neurofeedback training on memory functions in post-stroke victims. J Neuroeng Rehabil. 2015; 12: 107.
23. Enriquez-Geppert S, Huster RJ, Herrmann CS. EEG-Neurofeedback as a tool to modulate cognition and behavior: a review tutorial. Front Hum Neurosci. 2017; 11: 51.
24. Egner T, Gruzelier JH. Learned self-regulation of EEG frequency components affects attention and event-related brain potentials in humans. Neuroreport. 2001; 12: 4155-4159.
25. Vernon D, Egner T, Cooper N, Compton T, Neilands C, Sheri A, Gruzelier JH. The effect of training distinct neurofeedback protocols on aspects of cognitive performance. Int J Psychophysiol. 2003; 47: 75-85.
26. Haarmann HJ, Davelaar EJ, Usher M. Individual differences in semantic short-term memory capacity and reading comprehension. J Mem Lang. 2003; 48: 320-345.
27. Kober SE, Witte M, Neuper C, Wood G. Specific or nonspecific? Evaluation of band, baseline, and cognitive specificity of sensorimotor rhythm- and gamma-based neurofeedback. Int J Psychophysiol. 2017; 120: 1-13.
28. Kober SE, Witte M, Stangl M, Väljamäe A, Neuper C, Wood G. Shutting down sensorimotor interference unblocks the networks for stimulus processing: An SMR neurofeedback training study. Clin Neurophysiol. 2015; 126: 82-95.
29. Posner MI, Petersen SE. The attention system of the human brain. Annu Rev Neurosci. 1990; 13: 25-42.
30. Fan J, McCandliss BD, Sommer T, Raz A, Posner MI. Testing the efficiency and independence of attentional networks. J Cogn Neurosci. 2002; 14: 340-347.
31. Wang Y-F, Cui Q, Liu F, Huo YJ, Lu FM, Chen H, Chen HF. A new method for computing attention network scores and relationships between attention networks. PLoS ONE. 2014; 9: e89733.


32. Wang Y-F, Jing X-J, Liu F, Li M-L, Long Z-L, Yan JH, Chen H-F. Reliable attention network scores and mutually inhibited inter-network relationships revealed by mixed design and non-orthogonal method. Sci Rep. 2015; 5: 10251.
33. Conway ARA, Kane MJ, Bunting MF, Hambrick DZ, Wilhelm O, Engle RW. Working memory span tasks: A methodological review and user’s guide. Psychon Bull Rev. 2005; 12: 769-786.
34. Unsworth N, Heitz RP, Schrock JC, Engle RW. An automated version of the operation span task. Behav Res Methods. 2005; 37: 498-505.
35. Gershman SJ, Daw ND. Reinforcement learning and episodic memory in humans and animals: an integrative framework. Annu Rev Psychol. 2017; 68: 1-528.
36. Frank MJ, Santamaria A, O’Reilly RC, Willcutt E. Testing computational models of dopamine and noradrenaline dysfunction in attention deficit/hyperactivity disorder. Neuropsychopharmacol. 2006; 32: 1583-1599.
37. Smittenaar P, Chase HW, Aarts E, Nusselein B, Bloem BR, Cools R. Decomposing effects of dopaminergic medication in Parkinson’s disease on probabilistic action selection - learning or performance? Eur J Neurosci. 2012; 35: 1144-1151.
38. Bosboom JLW, Stoffers D, Stam CJ, van Dijk BW, Verbunt J, Berendse HW, Wolters EC. Resting state oscillatory brain dynamics in Parkinson’s disease: an MEG study. Clin Neurophysiol. 2006; 117: 2521-2531.
39. Frederick S. Cognitive reflection and decision making. J Econ Perspect. 2005; 19: 25-42.
40. Toplak ME, West RF, Stanovich KE. Assessing miserly information processing: An expansion of the Cognitive Reflection Test. Think Reason. 2014; 20: 147-168.
41. Frank MJ, Seeberger LC, O’Reilly RC. By carrot or by stick: cognitive reinforcement learning in parkinsonism. Science. 2004; 306: 1940-1943.
42. Solomon M, Smith AC, Frank MJ, Ly S, Carter CS. Probabilistic reinforcement learning in adults with autism spectrum disorders. Autism Res. 2011; 4: 109-120.
43. Primi C, Morsanyi K, Chiesi F, Donati MA, Hamilton J. The development and testing of a new version of the cognitive reflection test applying item response theory (IRT). J Behav Decis Mak. 2016; 29: 453-469.
44. Thomson KS, Oppenheimer DM. Investigating an alternate form of the cognitive reflection test. Judgm Decis Mak. 2016; 11: 99-113.
45. Davelaar EJ, Barnby JM, Almasi S, Eatough V. Differential subjective experiences in learners and non-learners in frontal alpha neurofeedback: piloting a mixed-method approach. Front Hum Neurosci. 2018; 12: 402.
46. Bauer R, Vukelić M, Gharabaghi A. What is the optimal task difficulty for reinforcement learning of brain self-regulation? Clin Neurophysiol. 2016; 127: 3033-3041.
47. Bauer R, Fels M, Royter V, Raco V, Gharabaghi A. Closed-loop adaptation of neurofeedback based on mental effort facilitates reinforcement learning of brain self-regulation. Clin Neurophysiol. 2016; 127: 3156-3164.
48. Delorme A, Makeig S. EEGLAB: An open source toolbox for analysis of single-trial EEG dynamics including independent component analysis. J Neurosci Methods. 2004; 134: 9-21.
49. Enriquez-Geppert S, Huster RJ, Scharfenort R, Mokom ZN, Zimmermann J, Herrmann CS. Modulation of frontal-midline theta by neurofeedback. Biol Psychol. 2014; 95: 59-69.
50. Gruzelier JH, Hirst L, Holmes P, Leach J. Immediate effects of alpha/theta and sensory-motor rhythm feedback on music performance. Int J Psychophysiol. 2014; 93: 96-104.
51. Doppelmayr M, Weber E. Effects of SMR and Theta/Beta neurofeedback on reaction times, spatial abilities, and creativity. J Neurother. 2011; 15: 115-129.
52. Hoedlmoser K, Pecerstorfer T, Gruber G, Anderer P, Doppelmayr M, Klimesch W, Schabus M. Instrumental conditioning of human sensorimotor rhythm (12-15 Hz) and its impact on sleep as well as declarative learning. Sleep. 2008; 31: 1401-1408.
53. Ros T, Munneke MAM, Ruge D, Gruzelier JH, Rothwell JC. Endogenous control of waking brain rhythms induces neuroplasticity in humans. Eur J Neurosci. 2010; 31: 770-778.
54. Ros T, Moseley MJ, Bloom PA, Benjamin L, Parkinson LA, Gruzelier JH. Optimizing microsurgical skills with EEG neurofeedback. BMC Neurosci. 2009; 10: 87-97.
55. Escolano C, Navarro-Gil M, Garcia-Campayo J, Congedo M, Minguez J. The effects of individual upper alpha neurofeedback in ADHD: an open-label pilot study. Appl Psychophysiol Biofeedback. 2014; 39: 193-202.
56. Ros T, Baars B, Lanius RA, Vuilleumier P. Tuning pathological brain oscillations with neurofeedback: a systems neuroscience framework. Front Hum Neurosci. 2014; 8: 1-22.
57. Gruzelier JH, Foks M, Steffert T, Chen MJL, Ros T. Beneficial outcome from EEG-neurofeedback on creative music performance, attention and well-being in school children. Biol Psychol. 2014; 95: 86-95.
58. Bauer R, Gharabaghi A. Reinforcement learning for adaptive threshold control of restorative brain-computer interfaces: A Bayesian simulation. Front Neurosci. 2015; 9: 1-10.
59. Davelaar EJ, Eatough V, Etienne M, Ozolins C. Mid-frontal theta oscillations discriminate between sham-control and neurofeedback training manipulations: a signal-detection analysis. Curr Neurobiol. 2018; 9: 95-100.

Correspondence to: Eddy J. Davelaar Department of Psychological Sciences Birkbeck, University of London Malet Street WC1E 7HX London, UK E-mail: [email protected]
