UC Riverside Electronic Theses and Dissertations

Title Towards Improving Cybersecurity and Augmenting Human Training Performance Using Brain Imaging Techniques

Permalink https://escholarship.org/uc/item/1kg856mj

Author Rahman, Md Lutfor

Publication Date 2020

Peer reviewed|Thesis/dissertation

UNIVERSITY OF CALIFORNIA RIVERSIDE

Towards Improving Cybersecurity and Augmenting Human Training Performance Using Brain Imaging Techniques

A Dissertation submitted in partial satisfaction of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

by

Md Lutfor Rahman

December 2020

Dissertation Committee:

Dr. Chengyu Song, Chairperson
Dr. Megan Peters
Dr. Vagelis Papalexakis
Dr. Srikanth Krishnamurthy
Dr. Zhiyun Qian

Copyright by Md Lutfor Rahman 2020

The Dissertation of Md Lutfor Rahman is approved:

Committee Chairperson

University of California, Riverside

Acknowledgments

Alhamdulillah for everything. It would be impossible for a child of parents without a formal education to attain the highest degree (Ph.D.) without the mercy of the Almighty.

I am grateful to my advisor, Dr. Chengyu Song, a brilliant researcher and humble person who allowed me to grow as an independent researcher but was always by my side whenever I needed help. My Ph.D. journey was smooth and stress-free from beginning to end. I am also grateful to Dr. Megan Peters, Dr. Vagelis Papalexakis, Dr. Srikanth Krishnamurthy, and Dr. Zhiyun Qian for their valuable suggestions regarding my research and career; these will surely help me become a better scientist. I would like to thank my internship mentors at the Army Research Lab, Dr. Antony Passaro and Dr. Benjamin T. Files, and other collaborators, Dr. Peter Khooshabehadeh, Dr. Kimberly Pollard, and Ashley Oiknine, who taught me to carefully address precise details in designing neuroscience experiments.

I want to express my gratitude to all of my formal and informal teachers throughout my entire educational journey. I am grateful to all our family friends in the Riverside community, with whom we spent excellent family time. I would like to thank my co-authors, lab-mates, friends, and the staff of the computer science department. I am grateful to Sahadat uncle for supporting me in different ways. I would like to thank all the volunteers of the Education Foundation (efcharity.org), who are doing many wonderful things for underprivileged students with me.

I would like to thank my previous supervisor, Dr. Nitesh Saxena, who helped me fall in love with this challenging domain. I have to give special thanks to Dr. Ajaya Neupane for brainstorming new ideas with me and helping me consistently throughout my Ph.D. journey. I am incredibly grateful to Dr. Mahmud Hossain for providing assistance and support in many ways. I appreciate Dr. Md. Mostofa Akbar, Dr. Mohammad Mahfuzul Islam, Prof. Md. Abdus Sattar, Dr. Ragib Hasan, and Dr. Robert M. Hyatt for giving me recommendations during my MS and Ph.D. admissions.

Throughout my educational pursuit, my family has supported me in countless ways. I want to thank my two elder brothers, Md Samsul Islam and Md Rafiqul Islam; if they had not pursued an education, I might not have had the chance to study. I am grateful to my whole family for their love and prayers. I would also like to thank my relatives and well-wishers.

My children, Nabiha and Rifqat, made tremendous sacrifices during my journey. Many nights I could not go to sleep with them due to work. I am so fortunate to have married my friend, Umme Hani Mst. Zoaria, a superwoman whose unconditional love and tremendous support often gave me the strength to keep going one more step.

Dedication

To my father, Md Zasim Uddin, and my mother, Mst. Mazeda Begum.

You never had a chance for formal education, but your hardship, love, and prayers have paved the way for me to attain the highest degree.

ABSTRACT OF THE DISSERTATION

Towards Improving Cybersecurity and Augmenting Human Training Performance Using Brain Imaging Techniques

by

Md Lutfor Rahman

Doctor of Philosophy, Graduate Program in Computer Science
University of California, Riverside, December 2020
Dr. Chengyu Song, Chairperson

Human behaviors can weaken the security of cyber-physical systems. However, conventional security research focuses more on hardware and software security than on analyzing and improving human behaviors to provide better protection for digital systems. In this regard, we study the neural signals of computer-system users to identify cyber-attacks, such as phishing, and improve cybersecurity. First, we analyze neural activities to detect phishing attacks. We demonstrate that the variations in neural activity levels can be utilized to identify phishing websites with improved data preprocessing and feature extraction methods. Second, we study users' neural activities to learn their high-level intents when they use applications. The inferred intents are then used to ensure the security and privacy of sensitive resources, such as cameras and multimedia files. Finally, we design an adaptive training model that enables users to differentiate between benign and malicious scenarios.

We consider both behavioral and neural metrics to develop the adaptive logic. Our experimental results show that participants trained with our approach outperform those trained with non-adaptive and behaviorally adaptive designs on the transfer task.

Contents

List of Figures xi

List of Tables xiii

1 Introduction 1
  1.1 Understanding Phishing Attacks through the Lens of BCI . . . 2
  1.2 Neural Signals in the Loop . . . 3
    1.2.1 Neural Feedback for Access Control . . . 4
    1.2.2 Neural Feedback for Adaptive Training . . . 5
  1.3 Thesis Contribution . . . 6
  1.4 Organization . . . 7

2 Background 8
  2.1 Introduction to Brain-Computer Interface (BCI) . . . 8
    2.1.1 Brain Imaging Techniques . . . 9
    2.1.2 Data Preprocessing and Feature Extraction . . . 10
    2.1.3 Brain Areas . . . 11
  2.2 Ethical and Safety Considerations . . . 12

3 Learning Tensor-based Representations from Brain-Computer Interface Data for Cybersecurity 13
  3.1 Introduction . . . 14
  3.2 Prior Work . . . 19
    3.2.1 Related Works . . . 19
  3.3 Study Design & Data Collection . . . 22
    3.3.1 Phishing Detection Task . . . 22
    3.3.2 Study Protocol . . . 24
  3.4 Behavioral Data Analysis . . . 25
  3.5 Neural Data Analysis . . . 26
    3.5.1 Data Preprocessing . . . 26
    3.5.2 Independent Component Analysis . . . 27
    3.5.3 Feature Extraction Using Auto Regression . . . 28
    3.5.4 Feature Extraction Using Tensor Decomposition . . . 29
  3.6 Results and Analysis . . . 35
    3.6.1 Classifiers and Evaluation Metrics . . . 35
    3.6.2 Global Model . . . 37
    3.6.3 Human vs Machine . . . 37
    3.6.4 Comparison of Auto-regression and Tensor Decomposition Result . . . 39
  3.7 Discussion . . . 41
    3.7.1 Phishing Detection Mechanism . . . 41
    3.7.2 Feasibility of the Defense Mechanism . . . 42
    3.7.3 Phishing Detection vs Brain Areas . . . 42
    3.7.4 Statistical Analysis: Real vs Fake Events . . . 44
    3.7.5 Feature Space Reduction . . . 45
    3.7.6 Study Strengths and Limitations . . . 45
  3.8 Chapter Conclusion . . . 46

4 IAC: On the Feasibility of Utilizing Neural Signals for Access Control 48
  4.1 Introduction . . . 49
  4.2 Background . . . 53
  4.3 Intent-driven Access Control . . . 55
    4.3.1 Threat Model and Assumptions . . . 56
    4.3.2 IAC via BCI . . . 57
  4.4 Experiment Design . . . 59
    4.4.1 Single App Experiment . . . 60
    4.4.2 Multiple Apps Experiment . . . 62
    4.4.3 Experimental Procedures . . . 63
  4.5 Data Process and Analysis . . . 65
  4.6 Feasibility Test . . . 68
    4.6.1 Single App Analysis . . . 69
    4.6.2 Cross-app Portability Analysis . . . 70
    4.6.3 Results Analysis . . . 72
    4.6.4 Authorization Accuracy . . . 73
  4.7 Discussion . . . 73
  4.8 Related Work . . . 74
  4.9 Chapter Conclusion . . . 77

5 Augmenting Training Performance by Adding Neural Signals into the Adaptive Feedback Loop 79
  5.1 Introduction . . . 80
  5.2 Background . . . 83
    5.2.1 Training . . . 83
    5.2.2 Closed-loop BCI . . . 85
    5.2.3 Theta/Alpha Ratio (TAR) . . . 85
  5.3 Design of the Experiments . . . 86
    5.3.1 Go/No-Go Training Task . . . 86
    5.3.2 Target Identification Transfer Task . . . 90
  5.4 Methods . . . 91
    5.4.1 Participants . . . 91
    5.4.2 Apparatus . . . 91
    5.4.3 Procedures . . . 92
    5.4.4 EEG Data Processing . . . 95
  5.5 Results . . . 96
  5.6 Discussion . . . 98
  5.7 Implications and Applications of Our Work . . . 102
  5.8 Study Limitations and Future Work . . . 104
  5.9 Chapter Conclusion . . . 105

6 Conclusions 106
  6.1 Thesis Summary . . . 106
  6.2 Future Directions . . . 107
  6.3 Acknowledgement . . . 108

Bibliography 109

List of Figures

2.1 Schematic diagram of a generic setup of a Brain-Computer Interface. The EEG device records brain signals from the scalp, and meaningful features are extracted after preprocessing. A classifier then predicts a command that is subsequently used by the external Automated Execution Device . . . 9
2.2 Brain areas and functions [13, 155] . . . 12
3.1 a) (left) Emotiv Epoc+ electrode placement based on the international standard; b) (middle) Experimental Set Up: a participant performing the phishing detection task; c) (right) Experiment Flow: the websites were randomly presented . . . 23
3.2 PARAFAC decomposition with 3 factor matrices (Time, Channel, and Event). The Event matrix (blue colored) is used as features . . . 30
3.3 PARAFAC2 decomposition of a mode-3 tensor . . . 32
3.4 PARAFAC2 model representing the brain EEG data across different events . . . 33
3.5 Human vs Machine mistakes . . . 38
3.6 AUC curve for All channels vs Top 6 channels using the RandomForest algorithm. Here, we observed that the TPR is 79.04% for all channels and 94.91% for the top 6 channels when FPR is < 1% . . . 40
3.7 a) shows the channel activity after the application of SPARTan decomposition on the tensor. The channel data for the first component is plotted in this figure to determine which channels have high activity. b) shows the corresponding brain region activation . . . 44
4.1 Overview of IAC's Architecture. IAC will (1) continuously monitor the brain signals using the EEG sensor and user interaction with the system. Upon an input event, IAC will create an ERP, (2) preprocess the raw EEG data to get purer signals, (3) extract a feature vector from the purified signals, and (4) feed the extracted features to a ML model to infer the user's intent. In step (5), if the ML gives enough decision confidence, IAC will directly (7) authorize access to protected resources. Otherwise, it will (6) prompt users to authorize the access and improve the ML model with the feedback loop . . . 56
4.2 Example permission request. Compared to existing permission requests, the biggest difference is that IAC also asks for the intended task (e.g., taking a photo) . . . 57
4.3 Android app for data collection . . . 59
4.4 Experiment setup: a user is playing Android apps while wearing the Emotiv Epoc+ BCI headset. The sensors of the headset captured neural signals, converted them to digital form, and transmitted encrypted data to the neural data collection computer via a USB dongle receiver . . . 64
4.5 Boxplot of Precision, Recall, and F-measure of the individual model. The red line indicates the median value and the + symbol indicates the outliers . . . 70
4.6 How classification metrics vary with the number of seen intents. The first bar represents the Precision, Recall, and F-measure without adding any new intents from the multiple real-world apps experiment to the global model from the single app experiment. The second bar represents results after adding one new intent to the global model, the third bar represents the results after adding two intents to the global model, and so on. We observed upward trends of Precision, Recall, and F-measure with the addition of more new intents to the global model . . . 72
5.1 Schematic diagram of an adaptive training system. A task is presented to a trainee, and the trainee's score on that task then informs the adaptive logic, which subsequently modifies the task. This is BAT. With the inclusion of neural signals (dotted line paths), the trainee's EEG measurements are also fed into the adaptive logic, which then modifies the task. This is CAT. Removing the adaptive logic would yield a fixed training regimen . . . 84
5.2 A flow chart of the training experiment . . . 87
5.3 A demo participant wearing the Biosemi electrode cap. The stimulus was presented on the monitor. The trainee pressed the RT box key to respond to the stimulus. The EEG signals were collected by electrodes and converted to digital format by the AD-box. The digitized signal was then received in the ActiveTwo software interface through a USB2 cable. We updated the difficulty of the task based on the neural and behavioral responses after each block . . . 93
5.4 CAT technical block diagram . . . 95

List of Tables

3.1 Accuracy vs Response Time . . . 25
3.2 True positive rate and false positive rate for the global model . . . 37
3.3 Comparing Human vs. Machine performance accuracy . . . 38
3.4 In this table, we present the classification results with feature extraction by autoregression and tensor decomposition. For tensors, we have classification results for two scenarios: features of tensor components considering all channels and the top 6 channels based on their activation. All* = All Channels, Top 6* = Top 6 Channels . . . 39
4.1 The list of apps used in the testing phase: we test the performance of the model built on the neural data collected from the in-house Android app in correctly identifying the intention of the users when they interact with these real apps . . . 62
4.2 Classification result of the global model . . . 69
5.1 Top: Behavior-based adaptive logic table for the BAT system. Bottom: Behavioral- and neural-based adaptive logic table for the CAT system. The difficulty level increased (+) by one level, decreased (-) by one level, or remained the same (=) . . . 89
5.2 Top: Overall percentage of scores in the three categories defined by the adaptive rule in the behavior adaptive condition. Bottom: Overall percentage of scores in the nine categories defined by the adaptive rule in the combined adaptive condition. Cells in dark gray show cases in which the behavior rule would keep difficulty the same, but the combined rule changed difficulty. Those in light blue show cases in which the behavior rule would change difficulty, but the combined rule kept difficulty the same . . . 101

Chapter 1

Introduction

Traditional security research focuses on securing the hardware and software stack, with less attention to how humans weaken the cyber ecosystem. However, all kinds of preventive mechanisms from traditional research might be in vain if a user falls for a phishing attack. Attackers usually target the weakest link of the security chain, and humans are typically considered one of the weakest links. As a result, we have observed a surge of phishing and social engineering attacks in the past few years, with many large corporations penetrated through targeted/spear-phishing attacks. We take an unconventional approach to exploring cybersecurity by studying the human brain and unfolding some of its mysteries. This thesis aims to understand and incorporate human factors, particularly neural insights, for security-relevant tasks and for improving human training. The human brain is extremely mysterious, with more than 100 billion neurons and trillions of synapses. We investigate the treacherous and mysterious world of cybersecurity by dissecting its neural underpinnings. We design and develop more reliable access control mechanisms by keeping neural signals within the loop [142], develop new analysis techniques for neural data [140] for the phishing detection task, and develop applications [172] for improving human training performance.

1.1 Understanding Phishing Attacks through the Lens of BCI

Cybersecurity has continued to evolve with technological advancements throughout the history of the internet. The more we integrate technology into our society, the higher the chance that we will become victims of cyberattacks. According to the White House, cyberattacks cost the U.S. economy between $57 billion and $109 billion in 2016 [147]. Research and market data indicate the global cybersecurity market is expected to reach $267.73 billion by 2024 [157]. While tremendous efforts have been made to secure the hardware and software stack of cyber systems, all these preventive technologies might be in vain if a user falls for a phishing attack. As attackers usually target the weakest link of the security chain, we have observed a surge of phishing and social engineering attacks in the past few years. Many large corporations were penetrated through targeted/spear-phishing attacks.

Phishing is the act of impersonating a trusted third party to steal users' sensitive and private information. The stolen information can cause direct financial loss or be used to penetrate large corporations, making phishing one of the most severe threats to cybersecurity. To understand why users are deceived by phishing attacks, we conducted a study [118] that measured and characterized users' neural processes in detecting phishing websites. We used neural signals and an eye tracker to measure users' engagement and workload during the phishing detection task in a near-realistic environment. This study showed that brain areas related to critical decision making and visual search exhibit differences in activation when participants view real versus phishing websites. We also found that users may not heed the website's key areas and may exhibit some differences while processing real and fake websites. We observed that users do not spend enough time analyzing key phishing indicators and often fail to detect these attacks, although they may be mentally engaged in the task and subconsciously processing real sites differently from fake sites.

Inspired by our observations from this study, we used the differences in activation levels as features to predict whether the participants were viewing a real website or a phishing website [140, 171]. This study analyzed the underlying hidden patterns in neural data using tensor-based representations of electroencephalography (EEG) data related to phishing detection tasks. Traditional feature extraction techniques, such as power spectral density, autoregressive models, and the Fast Fourier Transform, can represent data in either the spatial or the temporal dimension, but not both; our tensor modeling, in contrast, used both spatial and temporal traits of the input data. Based on the latent factors extracted using tensor decomposition, we identified brain areas related to the users' decision-making process regarding the real and the fake websites. The machine learning classifiers showed that tensor-based neural features were consistently above 97% accurate across all classifiers, exceeding the human decision-making accuracy of 84.82%. This pattern of results indicates that neural signatures encoded in the brain are important to correctly classifying a website as real versus phishing. A more surprising result was that the pattern did not always align with the participants' final decisions, and that ML-based classifiers can be more accurate than the participants' decisions. This phenomenon suggests that phishing websites trigger "internal" warnings in the user's brain that are then ignored or overridden by other signals.
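To make the shape of this analysis concrete, the sketch below stacks preprocessed EEG epochs into a 3-mode tensor (time × channel × event), extracts the event-mode factor matrix with a PARAFAC decomposition, and feeds it to a classifier with cross-validation. The tensor shape, rank, classifier, and variable names are illustrative placeholders, not the exact configuration used in Chapter 3.

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the preprocessed EEG described in Chapter 3:
# a (time, channel, event) tensor and one real/phishing label per event.
rng = np.random.default_rng(0)
eeg_tensor = rng.standard_normal((128, 14, 300))   # hypothetical epoch length, channels, events
labels = rng.integers(0, 2, size=300)              # 1 = real website, 0 = phishing website

# PARAFAC factorization; the event-mode factor matrix is used as per-trial features.
cp = parafac(tl.tensor(eeg_tensor), rank=10, n_iter_max=200, random_state=0)
time_factors, channel_factors, event_factors = cp.factors

# Classify real vs. phishing from the latent event factors with cross-validation.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, tl.to_numpy(event_factors), labels, cv=10)
print(f"mean cross-validated accuracy: {scores.mean():.3f}")
```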

1.2 Neural Signals in the Loop

"Neural signals in the loop" refers to a system that requires feedback from neural signals. Here, neural signals are involved in a feedback loop to train, tune, and test the system, making it more accurate and confident. We used neural signals in the loop both in the hardware [142] space, for improving the security of devices, and in the software [172] space, for improving human training performance.

1.2.1 Neural Feedback for Access Control

Access control is the core security mechanism of an operating system (OS). Ideally, the access control system should enforce contextual integrity, i.e., an application can only access security- and privacy-sensitive resources in ways expected by users. Unfortunately, existing access control systems, including the permission systems in modern OSes such as iOS and Android, fail to enforce contextual integrity; thus, these systems allow apps to abuse their permissions. In our Intent-driven Access Control (IAC) study [142], we explored the feasibility of a novel approach to enforcing contextual integrity: inferring what task users wanted to do under the given context from the event-related potentials (ERPs) of their neural signals. ERPs are small but measurable (with an EEG sensor) voltage changes generated by the brain in response to a stimulus event. In our experiments, the stimulus event is performing a given task with mobile apps. During normal operation, the OS continuously monitors neural signals through the BCI device and the user's interaction with the system to create and cache the most recent ERPs. ERPs are bound to the app to which the input event is delivered (e.g., the foreground app at that moment) and expire after a context switch. This prevents one app from "stealing" another app's ERP. Upon an application's request to access a protected resource (e.g., the camera), the access control system retrieves the most recent ERP. The ERP is then fed into the trained classifier to infer whether the user intended to perform a task that requires access to that resource. If so, permission is automatically granted to that request. If the intended task does not require the permission, or the confidence of the classification result is not high enough, IAC falls back to prompting the user to make the decision. The ground truth collected from the prompt window is then used to update the machine learning (ML) model. As demonstrated in previous work [180] and in our experiment, this feedback is important for fine-tuning the ML model to improve the precision of the prediction. The idea of using brain electrical signals to help OSes make dynamic access control decisions is interesting and especially relevant if BCIs become popular in the future, as the idea could be smoothly integrated into BCI-supported systems.
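The following sketch illustrates that decision loop: classify the most recent ERP features, grant silently only when the inferred intent implies the requested resource with high confidence, and otherwise prompt the user and keep the answer as ground truth for retraining. The class name, intent labels, resource mapping, confidence threshold, and the logistic-regression stand-in classifier are our assumptions for illustration, not the exact components of the IAC prototype.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical mapping from inferred intents to the resources those intents legitimately need.
INTENT_TO_RESOURCES = {"take_photo": {"camera"}, "record_voice": {"microphone"}, "browse": set()}
CONFIDENCE_THRESHOLD = 0.8  # illustrative cutoff, not the threshold used in the IAC study

class IntentAccessControl:
    """Minimal sketch of the intent-driven authorization loop (names are ours, not the paper's API)."""

    def __init__(self):
        self.model = LogisticRegression(max_iter=1000)
        self.fitted = False
        self.features, self.intents = [], []   # feedback buffer of (ERP features, confirmed intent)

    def authorize(self, erp_features, requested_resource, prompt_user):
        if self.fitted:
            probs = self.model.predict_proba([erp_features])[0]
            intent = self.model.classes_[int(np.argmax(probs))]
            if probs.max() >= CONFIDENCE_THRESHOLD and \
                    requested_resource in INTENT_TO_RESOURCES.get(intent, set()):
                return True                    # confident the intended task needs this resource: grant
        # Low confidence (or no model yet): fall back to an explicit prompt and
        # keep the user's answer as ground truth to refine the classifier.
        granted, confirmed_intent = prompt_user(requested_resource)
        self.features.append(list(erp_features))
        self.intents.append(confirmed_intent)
        if len(set(self.intents)) > 1:         # retrain once at least two intents are represented
            self.model.fit(self.features, self.intents)
            self.fitted = True
        return granted
```

In a deployment, `erp_features` would come from the preprocessing and feature-extraction steps described above, and `prompt_user` would stand in for the OS permission dialog.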

1.2.2 Neural Feedback for Adaptive Training

Training is a systematic approach to acquiring skills that improve performance in a task of interest. In adaptive training, the task difficulty is varied to keep it within the optimal range for the trainee. Adaptive training based on behavioral metrics has outperformed non-adaptive training in several cognitive tasks [75]. In our study [172], we added neurophysiological traits into a behavior-based adaptive training system, as a close correlation has been found between performance on cognitive tasks and neural signals. In this method, a task is presented to a trainee, and the system measures the trainee's theta-alpha ratio (TAR) from the neural signals using the EEG sensor. The task difficulty changes based on the TAR and the performance score. Past research reveals that as alpha (8-12 Hz) power increases, focus and attention decrease; the opposite relationship is observed for theta (3-7 Hz) power. We combined these two neural features as a ratio since they demonstrate negative and positive relationships, respectively, with the behavior of interest (focus). The neural-loop-based adaptive training system improved transfer-task performance by 6% over the non-adaptive training system and by 9% over the behaviorally adaptive training system. The benefits of our novel approach are broad within the context of interactive training (e.g., game-based cognitive training) and could allow for improved transfer performance, reduced training times, and reduced training costs.
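As a rough illustration of this feedback loop, the sketch below estimates theta and alpha band power with Welch's method, forms the TAR, and applies a toy difficulty-update rule; the thresholds and the combination rule are illustrative stand-ins rather than the adaptive logic table used in Chapter 5.

```python
import numpy as np
from scipy.signal import welch

FS = 512  # sampling rate in Hz (the Biosemi system used in Chapter 5 samples at 512 Hz)

def band_power(signal, fs, low, high):
    """Average power of one EEG channel within [low, high] Hz from a Welch PSD estimate."""
    nperseg = min(len(signal), 2 * fs)
    freqs, psd = welch(signal, fs=fs, nperseg=nperseg)
    mask = (freqs >= low) & (freqs <= high)
    return psd[mask].mean()

def theta_alpha_ratio(epoch, fs=FS):
    """epoch: (channels, samples). TAR = theta (3-7 Hz) power / alpha (8-12 Hz) power, channel-averaged."""
    theta = np.mean([band_power(ch, fs, 3, 7) for ch in epoch])
    alpha = np.mean([band_power(ch, fs, 8, 12) for ch in epoch])
    return theta / alpha

def next_difficulty(level, score, tar, score_cut=0.8, tar_cut=1.0):
    """Toy update rule (not the adaptive logic table of Chapter 5): raise difficulty only when
    both the behavioral score and the TAR-based engagement proxy suggest the trainee can handle it."""
    if score >= score_cut and tar >= tar_cut:
        return level + 1
    if score < score_cut and tar < tar_cut:
        return max(1, level - 1)
    return level
```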

1.3 Thesis Contribution

This dissertation investigates cybersecurity problems (e.g., phishing detection and access control) and methods for augmenting human training performance using Brain-Computer Interfaces (BCI). We have made the following contributions as part of this work.

1. We introduced an automated phishing detection task in the first study by applying tensor decomposition methodology to BCI data collected by a consumer-grade EEG device. Its sub-contributions are below:

(a) Automated Detection Techniques using Brain Signals: Based on the measured neural cues, we explored the design of an automated phishing detection approach. After preprocessing the neural signals, we extracted features using autoregressive coefficients and tensor decomposition. We then evaluated the machine learning models based on these features with different classifiers using 15-fold cross-validation and achieved an average true positive rate of 97% with a false positive rate of 2%.

(b) Comprehensive Analysis of Human Accuracy vs. Machine Accuracy: We compared the improvement in performance using the phishing detection model for the worst-performing participants (those with overall accuracy below the mean accuracy) and demonstrated that this approach provides a defensive mechanism to detect phishing attacks programmatically even when the users themselves would have failed to detect them.

2. In the second study, we discussed an approach to automate the access control process in the OS permission model, such as the Android app permission model, by monitoring neural signals. We experimentally validated the feasibility of constructing such a system with a consumer-grade EEG headset via a user study of 41 participants. Our experimental results show the feasibility of intent-driven access control. To the best of our knowledge, this is the first study using brain signals to protect users' privacy.

3. In the third study, we demonstrated that combining neural measures with behavioral measures in an adaptive training scenario yields performance improvement in the transfer task. We also demonstrated the effectiveness of the newly designed system by collecting data from 44 participants in three different training systems. We found a statistically significant performance improvement on a transfer task in the CAT system relative to the other two counterparts.

1.4 Organization

The remainder of the thesis is organized as follows:

We discuss the relevant background for this thesis in Chapter 2. In Chapter 3, we study phishing detection through a neuroimaging technique (EEG). We apply tensor decomposition and autoregressive coefficients for feature engineering on brain data collected from EEG devices. We find that some regions of the brain are more activated during phishing detection. More surprisingly, we discover that the decoding accuracy from brain signals is higher than the accuracy of the conscious human decision itself. In Chapter 4, we present our intention-driven access control system based on neural signals. In Chapter 5, we design and develop a neuroadaptive training system. In Chapter 6, we discuss some future studies in my domain of research and conclude this dissertation.

Chapter 2

Background

2.1 Introduction to Brain-Computer Interface (BCI)

Since the invention of the computer, humans have used their hands to interact with computers or computer-enabled machines. Over the decades, several other modalities (e.g., speech, gesture) were developed to make our communication with computers more intuitive. BCI is a new type of user interface that interprets our neural signals into machine-understandable commands (see Figure 2.1 for a schematic diagram of a BCI setup). It converts brainwaves into digital commands that instruct the machine to conduct various tasks. For example, researchers have shown that BCI may enable patients who suffer from neurological diseases, such as locked-in syndrome, to spell words and move computer cursors [17, 167], or allow patients to move a prosthesis [164]. BCI enables a human to interact through the mind instead of through physical interactions (e.g., clicks, which can be abused via clickjacking [76]).

Figure 2.1: Schematic diagram of a generic setup of Brain-Computer Interface. The EEG devices record brain signals from the scalp and then extract the meaningful features after preprocessing. A classifier then predicts a command that is subsequently used by the external Automated Execution Device.

2.1.1 Brain Imaging Techniques

Several noninvasive neuroimaging technologies (with sensors placed on the scalp, e.g., electroencephalography (EEG) and functional magnetic resonance imaging (fMRI)) and invasive technologies (with sensors implanted directly in the brain, e.g., electrocorticography (ECoG)) are used to record brain activity. EEG measures the brain's electrical activity from the surface of the scalp, whereas fMRI measures brain activity by detecting related changes in blood flow. ECoG, on the other hand, uses sensors placed on the brain's surface to measure its electrical activity. Each technique has its advantages and disadvantages. Due to the EEG system's higher temporal resolution, we used this methodology for our studies. In future studies, we plan to use fMRI because of its higher spatial resolution. We discuss EEG and fMRI in the following paragraphs.

EEG. Electroencephalography is a monitoring technique that records the brain's electrical activity. The recorded EEG data are time-series data: voltage fluctuations generated by neurons inside the brain are captured by electrodes and amplified. Wearable EEG devices track and record brain-wave patterns (voltage over time) through electrodes, capturing the summation of many action potentials sent by neurons in the brain. In our studies, we use both research-grade (e.g., wired Biosemi) and consumer-grade (e.g., wireless Emotiv EPOC+ [1]) devices to record brain activity.

The Biosemi ActiveTwo system [16] (BioSemi, Amsterdam, The Netherlands), used with the ActiView acquisition software, is a research-grade, multichannel, high-resolution biopotential measurement device. It has an electrode cap with 64 pre-amplified, active surface electrodes. The Emotiv EPOC+ wireless headset is a consumer-grade EEG device consisting of 14 sensors. The sensor placement for both devices is based on the standardized international 10-20 electrode placement system [85]. The Common Mode Sense (CMS) electrode and the Driven Right Leg (DRL) electrode serve as the ground. The sampling rate of the Biosemi system is 512 Hz, and that of the Emotiv EPOC+ is 128 Hz.

fMRI. fMRI measures brain activity by detecting related changes in blood flow; it is assumed that such blood-flow changes are associated with brain activity [111]. Any activity causes more demand for oxygen, which leads to an increase in blood flow: when an area of the brain is in use, blood flow to that region also increases [183]. The MRI machine has a cylindrical tube with a powerful electromagnet and detects which areas of the brain are active at a specific moment. In contrast to EEG, fMRI has higher spatial resolution.

2.1.2 Data Preprocessing and Feature Extraction

Data preprocessing and feature extraction play a vital role in research related to brain imaging techniques. The raw brain data are often noisy because of artifacts introduced during data collection by muscle and eye movements. Thus, the first step is to preprocess the brain data to increase their signal-to-noise ratio using different filters. It is common to use high-pass filtering at 0.1 Hz and low-pass filtering at 120 Hz to remove background noise and artifacts. EEG signals have five major frequency bands: delta (< 3 Hz), theta (3-7 Hz), alpha (8-12 Hz), beta (12-30 Hz), and gamma (> 30 Hz). Different frequency bands are used depending on the task.

There are many different techniques to extract features from the EEG signals after preprocessing, such as power spectrum analysis [39] and time-frequency analysis [71]. We use 6th-order autoregressive (AR) coefficients [181] for our feature extraction due to their common use as feature vectors in classification algorithms for EEG data (e.g., [10, 117, 128]). In all those methods, we use the time and space modes of the EEG data. Several other modes (e.g., trial, condition, subject) can be present in EEG data. We introduce tensor decomposition by adding the phishing detection task as one of the modes of the EEG data; the 3-mode tensor is then formed as Time × Channel × Event, where the event mode covers both the real and the phishing websites. Tensor decomposition has become attractive in signal processing research [36], and it has become practical for capturing [92] the underlying structure of brain data. In our experiment, the participants are given the flexibility to take the required amount of time to decide whether or not a website is real. We use a modified version of the SPARTan algorithm [134] to compute the tensor decomposition. A sketch of the AR feature extraction step is shown below.
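As a rough illustration of that step, the snippet below estimates 6th-order AR coefficients per channel with a simple Yule-Walker solve and concatenates them into one feature vector per event; the estimator and the (channels × samples) layout are our assumptions, not the exact toolchain used later in Chapter 3.

```python
import numpy as np

def ar_coefficients(signal, order=6):
    """Estimate AR coefficients of one EEG channel by solving the Yule-Walker equations."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0..order]
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    toeplitz = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(toeplitz, r[1:order + 1])

def event_features(epoch, order=6):
    """epoch: (channels, samples) array for one event -> concatenated AR(6) coefficients per channel."""
    return np.concatenate([ar_coefficients(channel, order) for channel in epoch])

# Example with a hypothetical 14-channel, 1-second epoch sampled at 128 Hz.
epoch = np.random.randn(14, 128)
features = event_features(epoch)        # length 14 * 6 = 84 feature vector
```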

2.1.3 Brain Areas

This section provides a brief overview of the brain's areas; it will help us understand the relationship between brain areas and the phishing detection task. The brain is one of the fundamental components of the human body, consisting of about 100 billion neurons. The brain is broadly divided into four regions, called lobes: the frontal, temporal, parietal, and occipital lobes. Each lobe has a different function (see Figure 2.2). According to prior studies, while viewing something untrustworthy (e.g., fake paintings [77] or fake websites [120]), brain activity increases in the frontal and parietal areas. These areas are implicated in perception, reasoning, and planning.

Figure 2.2: Brain areas and functions [13, 155]

2.2 Ethical and Safety Considerations

We obtained approval from the relevant Institutional Review Board (IRB) to conduct all of our studies. We then invited volunteers to participate in our studies using social media posts, flyers, and word of mouth as advertisement channels. Informed consent, using the IRB-approved form, was obtained from the participants before they engaged in the task. The collected neural and behavioral data were stored safely following standard procedures.

Chapter 3

Learning Tensor-based Representations from Brain-Computer Interface Data for Cybersecurity

Detecting phishing attacks is a classic security problem that has drawn much attention from both academia and industry. Nevertheless, the existing mechanisms have either low accuracy or low efficiency in detecting phishing websites, allowing these attacks to continue to cost industries billions of dollars every year. In this light, there is a need for new solutions to identify phishing attacks. In this chapter, we introduce a novel phishing detection mechanism based on users' brain signals measured using a commercial-grade Brain-Computer Interface (BCI) device.

Compared to prior studies, we make new contributions in several aspects. First, we conduct a focused study of phishing detection in ecologically valid conditions using a commercially available, low-cost Electroencephalography (EEG) based BCI device, while previous studies used clinical-grade neuroimaging devices (fMRI and fNIRS). Second, we use a new feature-vector extraction method based on Independent Component Analysis, autoregression, and tensor decomposition, which yields significantly better detection results. Third, we compare the performance of our neural-cue-based machine learning models to the behavioral performance of our participants and demonstrate that our models can achieve statistically significantly better accuracy than the users with below-average accuracy. Fourth, we perform a comprehensive analysis of the neural data using tensor decomposition, leveraging both spatial and temporal traits, and show the practicality of multi-way neural data analysis. We demonstrate that, using tensor-based representations, we can classify real and phishing websites with accuracy as high as 97%, which outperforms state-of-the-art approaches on the same task by 21%. Furthermore, the extracted latent factors are interpretable and provide insights with respect to the brain's response to real and phishing websites. The results of our studies exhibit a promising direction for introducing neural-cue-based automated mechanisms to detect phishing websites.

3.1 Introduction

Phishing is the act of impersonating a trusted third party to steal users' sensitive and private information, such as passwords and credit card details. Phishers (attackers) usually exploit users' tendency to trust a website based on its appearance by designing fake websites with the look and feel of the real websites and luring users to these websites through emails and instant messaging. Security indicators, warnings, and training kits have been deployed to help users identify phishing websites. Despite these efforts, phishers still manage to consistently succeed in deceiving users and stealing billions of dollars from companies [80]. So, there is a great demand for better detection mechanisms to protect users from phishing attacks.

Several studies have evaluated the effectiveness of different security indicators [48, 153, 72] and security warnings [165, 184] in preventing these phishing attacks. The results were not very encouraging: users generally do not perform well at these tasks. For this reason, researchers started focusing on developing automated phishing detection systems, for example, by utilizing image processing [187] or URL processing [33] techniques. However, these approaches do not always succeed in detecting the attacks, or they take a long time to identify phishing sites, which could undermine their security as well as their usability.

Recently, Neupane et al. introduced a neuroscience-based methodology to analyze the neural mechanics of users and provide unique insights into users' performance that are not possible to capture through traditional performance studies [120, 118, 119]. They measured the underlying neural activations when users were presented with phishing attacks while monitored with clinical-grade, state-of-the-art neuroimaging devices, namely functional magnetic resonance imaging (fMRI) [120], Electroencephalography (EEG) [118], and functional Near Infrared Spectroscopy (fNIRS) [119]. These studies showed that there are observable differences in neural activities when users were viewing real and fake websites. Based on the results, they also explored a fake-website detection system based on fNIRS-measured neural cues, where they were able to obtain a best area under the curve of 76% [119].

Inspired by these recent advances, in this study we also aim to explore leveraging the underlying neural activities for automated phishing detection. Our study differs from previous ones in several aspects. First, previous studies [120, 118, 119] utilized clinical-grade neuroimaging devices. However, due to the form factor of these devices, data collection was conducted in a single session, which may have resulted in participant fatigue and fewer trials per task. Moreover, the involvement of clinical-grade devices also limits the practicality of the detection mechanism. Our study is based on a low-cost, commercially available, consumer-grade EEG-based BCI device, the Emotiv [1]. EEG-based BCI devices measure electrical activity in the brain, referred to as Event-Related Potentials (ERPs), which provides better temporal resolution (on the order of milliseconds) than fMRI and fNIRS devices. Due to its light weight, this device also allows measuring neural activities in ecologically valid conditions, unlike fMRI, where users perform the task inside a scanner in a supine posture with constrained input and output interfaces. As a result, we were able to collect data in four different sessions with a break of 5 minutes between each session. Second, compared to previous studies [120], our experimental set-up is much more amenable to real-world settings. We provide enough time for participants to decide on the legitimacy of a website in order to prevent the mistakes they may make when rushed. Also, the headset we used is very light-weight and less intimidating compared to the fNIRS device. Third, we utilized a better feature extraction process than the one used in prior work [119]. More specifically, prior automated approaches use simple features like the mean, median, slope, and standard deviation to adjudicate whether a website is real or fake. Our approach utilizes independent component analysis, autoregression, and tensor decomposition. Fourth, we test our models under a global model setting, where training and test samples are collected from different users. Therefore, it is evident that (1) our machine learning models give significantly better results compared to the fNIRS-based models [119], and (2) our model can even outperform human detection accuracy.

We would also like to argue that while this Brain-Computer Interface (BCI) based approach may sound futuristic and impractical, it actually is not. The Emotiv BCI device is already popular in the gaming, meditation, and entertainment industries. More importantly, BCI devices are advancing rapidly. Industry giants like Facebook and Neuralink, and ambitious startups like Kernel, are actively working [23] on commercial BCI applications that turn thoughts into action and vice versa. In this light, BCI devices could soon be included in daily spheres of life, and people may be wearing these headsets or tiny brain modems much of the time.

Our Contributions and Novelty Claims: In summary, this work studies phishing attack detection based on the neurological phenomena captured by a consumer-grade BCI device. Its contributions are as follows:

1. EEG Study of Phishing Detection: We design and conduct a focused phishing detection study with a commercial-grade EEG-based BCI device to pursue a thorough investigation of users' interpretation of real vs. fake websites. Unlike previous fMRI [120] and fNIRS [119] studies, in this study we measure neural activations from brain signals when users are viewing real and fake websites, using a low-cost, light-weight, 14-channel EEG headset in ecologically valid environmental conditions (see §3.3 for more details).

2. Automated Detection Techniques using Brain Signals: Based on the measured neural cues, we explore the design of an automated phishing detection approach. We filtered the high-frequency noise, including head and muscle movement, by removing the Electromyogram (EMG) signal, and eye movement by removing the Electrooculography (EOG) signal, from the neural data using a bandpass filter; segregated linearly independent sources using independent component analysis; and extracted sixth-order autoregressive coefficients as features to represent real and fake websites. We then evaluated the machine learning models based on these features with different classifiers using 15-fold cross-validation and achieved an average true positive rate of 97% with a false positive rate of 2% (see §3.6 for more details).

3. Comprehensive Analysis of Human Accuracy vs. Machine Accuracy: We compare the improvement in performance using the phishing detection model for the worst-performing participants (those with overall accuracy below the mean accuracy) and demonstrate that this approach provides a defensive mechanism to detect phishing attacks programmatically even when the users themselves would have failed to detect them (see §3.6.3 for more details).

4. Tensor-based Representations from Brain-Computer Interface Data for Cybersecurity: We perform a comprehensive tensor analysis of the neural data and identify the level of activation in the channels or brain areas related to the users' decision-making process with respect to the real and the fake websites, based on the extracted latent factors. We also reduce the dimension of the feature vector, keeping the features related to the highly activated channels, and show that we can achieve better accuracy (97%) with the dimension-reduced feature vector than with the whole feature space. To the best of our knowledge, this is the first study that employs tensor representations to understand human performance in security tasks.

With the emergence of the Brain-Computer Interface (BCI), electroencephalography (EEG) devices have become commercially available and have been popularly used in the gaming, meditation, and entertainment sectors. Thus, in this study, we used an EEG-based BCI device to collect the neural activity of users subjected to a phishing detection task. In this chapter, we show that the tensor representation of the EEG data enables a better understanding of the brain areas activated during the phishing detection task. The tensor representations of the data collected in our study provided several interesting insights and results. We observed that the users have higher component values for the channels located in the right frontal and parietal areas, which indicates that these areas were highly activated during the phishing detection task. These areas have been found to be involved in decision-making, working memory, and memory recall. Higher activation in these areas shows that the users were trying hard to infer the legitimacy of the websites and may have been recalling the properties of the websites from memory. The results of our study are consistent with the findings of the previous phishing detection studies [120, 119]. Unlike these studies, our study demonstrates a tool to obtain the active brain areas or channels involved in the phishing detection task without performing multiple statistical comparisons. On top of that, our methodology effectively derives more predictive features from these channels to build a highly accurate machine-learning-based automated phishing detection mechanism.

3.2 Prior Work

3.2.1 Related Works

Researchers have proposed several measures to identify phishing websites. Some approaches are based on features extracted from the URL [95, 107], web hosting [103, 104], webpage content [185, 187], and visual similarity [29] of the websites. Other studies have proposed combining features from multiple sources, such as the URL, web content, and web hosting, for detecting phishing attacks [178, 173, 33]. Such heuristics-based approaches are usually embedded in anti-phishing toolbars or are used to generate blacklists of URLs.

Apart from the automated phishing detection mechanisms, researchers have conducted studies to understand user performance on phishing detection tasks (e.g., [48, 153, 165, 184, 72]). These studies also analyze how effective anti-phishing toolbars, security indicators, and phishing warnings are in assisting users in identifying fake websites. These studies have generally revealed that users do not pay attention to browser-based phishing cues and often make incorrect choices.

Among the proposed methods, the blacklist-based approaches have the highest accuracy in detecting phishing websites. Blacklists are, however, created only once a URL linking to a phishing website has been identified and hence are unable to protect users from phishing attacks in real time. Reports from a previous study suggest that 47%-83% of phishing websites appear on blacklists about twelve hours after the initial test [160]. But phishing websites will have done enough harm before then, as they have a median lifetime of a few hours [4]. Also, humans are involved in verifying, updating, and maintaining some blacklists, for example PhishTank [136], which may take even longer.

In view of these circumstances, there is a need for a new mechanism to identify and prevent phishing attacks. The phishing detection mechanism that we propose in this chapter is an attempt in this direction. Our approach does not necessarily compete with other approaches, but can rather be used in conjunction with them.

A recent line of studies has employed state-of-the-art neuroimaging methodologies to understand users' performance in security tasks. Neupane et al. used fMRI [120], EEG [118], and fNIRS [119] to analyze neural activities when users were viewing real and fake websites. They also used fNIRS-measured neural activities to build an automated phishing detection mechanism, and achieved a best AUC (area under the curve) of only 76%. In our study, unlike them, we use a light-weight, commercially available, consumer-grade EEG-based BCI device to measure event-related potentials (ERPs), build a machine-learning-based phishing detection mechanism, and achieve an average true positive rate of 97%. EEG has better temporal resolution than fNIRS and fMRI, and hence offers better reaction time and better feasibility for building defensive mechanisms. Researchers have shown vulnerabilities of commercial BCI devices by launching side-channel attacks that infer users' private credentials [117, 109, 60]. This is the first study of its kind to reveal the defensive characteristics of these BCI devices.

Tensor Decomposition and Phishing Detection. Tensors are useful for EEG brain data representation and visualization as well, providing a compact representation of brain network data. Moreover, tensor decomposition methods are useful for capturing the underlying structure of brain data. Cichocki et al. [37] applied tensor decomposition to EEG signals in the context of brain-computer interface systems. Tensor decomposition has already been applied for feature extraction in different problems involving EEG data. In P300-based BCIs, tensor decomposition is used to extract hidden features because of its multilinear structure [125]. Unlike general event-related potential (ERP) based BCI approaches, tensors can consider both temporal and spatial structure for feature extraction instead of only temporal structure, which improves accuracy [40, 35]. Tensor decomposition methods have also been used for a variety of problems, such as classification of mild and severe Alzheimer's disease using brain EEG data [94].

Tensor decomposition has been used for brain data analysis as well. GEBM is an algorithm that models brain activity effectively [131]. SEMIBAT is a semi-supervised brain network analysis approach based on constrained tensor factorization [26]; its optimization objective is solved using the Alternating Direction Method of Multipliers (ADMM) framework. SEMIBAT showed a 31.60% improvement over plain vanilla tensor factorization for the graph classification problem on EEG brain networks.

3.3 Study Design & Data Collection

In this section, we describe our phishing detection experiments, the experimental setup involving the Emotiv device to collect EEG data, and the protocol we followed for data collection with human participants.

3.3.1 Phishing Detection Task

Phishing is the act of masquerading a fake website as a real website with the intent to lure victims and steal their sensitive personal information, such as PINs, passwords, credit card details, and social security numbers. Phishers design fake websites with the look and feel of the real ones and use techniques such as email spoofing or DNS spoofing to direct users to such websites. In this experiment, we study the neural activation when users are viewing such real and fake websites and eventually build a defensive measure against phishing attacks.

For this experiment, we selected twenty different websites belonging to popular categories, including social networking (e.g., Facebook, LinkedIn, Twitter), web email (e.g., Gmail, Yahoo), banking (Bank of America, Discover, Capital One), e-commerce (Amazon, eBay, Booking.com), online storage (e.g., Dropbox), and others (e.g., Netflix, our university portal), from the list of the top 100 popular websites ranked by Alexa [6]. We used the login pages of these websites for our study. Some of these websites were repeated multiple times in their fake version during the experiment.

Experiment Design and Implementation. The design of our experiment is in line with prior studies [48, 120, 118, 119], and we assume that the users are explicitly asked to distinguish fake websites from real websites. We designed our fake webpages based on the layout and URLs of the phishing websites available at PhishTank [136] and OpenPhish [126]. To create a fake webpage, we obfuscated the URL by altering some letters, changed the layout of the page, removed the SSL certificate, changed the color of the address bar, or applied combinations thereof.

Figure 3.1: a) (left) Emotiv Epoc+ electrode placement based on the international standard; b) (middle) Experimental Set Up: a participant performing the phishing detection task; c) (right) Experiment Flow: the websites were randomly presented.

The flow of the experiment is presented in Figure 3.1 c). First, we displayed a page with instructions that the participants were supposed to follow in our task. The participants were instructed to identify the real and fake websites among the websites presented to them. We displayed 34 randomized fake/real websites to the participants. Among the 34 websites, there were 17 webpages of each category (real and fake). The response dialog box was presented at the top of the window with the question 'Please answer (Yes/No). Is this webpage real?' and the 'Yes' and 'No' buttons.

Unlike prior studies [120], we provided enough time to the participants to prevent the mistakes that may occur because of hastened decision making; in this sense, our experimental setup is more amenable to real-world settings. When the participant responded to the question, the experiment automatically moved to the next webpage. The experiment ended after 34 trials with a goodbye note and an 'end' button to close the session. We conducted four sessions with a 5-minute break between consecutive sessions. For setting the number of trials in EEG experiments, we conformed to the conventions of a prior EEG study [102].

3.3.2 Study Protocol

Recruitment and Preparation Phase. After obtaining the approval of the IRB, fifteen healthy computer science students were recruited for our study. We recruited most of the participants via social media (e.g., Facebook) and the rest via flyers around the campus area. Informed consent and some non-personally-identifiable data (gender, age, and major) were obtained from all participants. Of the 15 participants, ten (66.66%) were male and five (33.33%) were female. Participants' ages ranged from 20 to 32. We considered this age group because they are reported to use the internet the most [158] and are more susceptible to phishing than other age groups [159]. Future studies with more diverse backgrounds and age groups might be needed to validate our results.

We did not disclose the objective of our study to the participants before the experiments. We requested every participant to remain relaxed for the entire duration of the experiments, and we kept our conversations as short and concise as possible before the experiment.

Scanning Phase. The study followed a within-subjects design, whereby all participants were presented with the same set of (randomized) trials corresponding to the task designed to measure website-legitimacy judgments. In this study, raw EEG data were collected using the EmotivPro software package [51]. The fourteen channels of the Emotiv headset were wetted with a gel, for a better connection between the scalp and the electrodes, before the headset was placed on the participant's head. Figure 3.1 b) depicts the experimental setup for a participant.

The participants were instructed to respond by pressing the 'Yes' or 'No' button based on their observation of the website on the screen. We displayed the fake/real webpages on the same machine where the EmotivPro software was running. Our study design is more realistic than prior studies [120, 119]: in those studies, a user had to make a decision within a strict timeframe (5 s), which may lead the user to make wrong decisions, whereas in our experiment a user gets unlimited time to make the decision. We recorded the user responses and their neural signals for our analysis. All the participants performed the same tasks in four different sessions, with a break of approximately 5 minutes between two consecutive sessions. We collected all session data on the same day and in the same room. After a participant finished all the sessions, we compensated them with a $10 Amazon gift card.

3.4 Behavioral Data Analysis

In this section, we summarize the performance of the participants in identifying the website correctly. We also measure the amount of time they spent on each website before providing their answer referred to as response time. A user gave 34 responses per session and a total of 136 responses for all four sessions. We have total 2040 responses for all 15 users. Of the 2040 responses, 1853 were correct and 187 responses were wrong. Table 3.1 shows the summary of the results. From Table 3.1,

Trials     Accuracy Mean (%)   Accuracy Stddev   Response Time Mean (ms)   Response Time Stddev
Real       92.74               9.74              5549.26                   1081.43
Fake       88.92               9.90              5015.13                   1174.09
Overall    90.83               7.10              5282.20                   1068.92

Table 3.1: Accuracy vs. Response Time.

we observed that the accuracy of correctly identifying fake websites was lower than the accuracy of identifying real websites. We also found that the users spent less time on fake websites compared to the real websites. On correlating the accuracy and the response times of users using Pearson's correlation, we observed that the users with low response times were more prone to phishing attacks

(p < 0.05, r = 0.35). This is an interesting correlation, as it indicates that users who make quick decisions may be more prone to phishing attacks.
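A minimal sketch of this correlation analysis, assuming per-user accuracy and mean response time are available as arrays; the values below are randomly generated placeholders, not our data:

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)
response_time = rng.normal(5300, 1000, size=15)                # mean response time per user (ms)
accuracy = 70 + 0.004 * response_time + rng.normal(0, 4, 15)   # accuracy per user (%)

r, p = pearsonr(response_time, accuracy)
print(f"Pearson r = {r:.2f}, p = {p:.4f}")
```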

3.5 Neural Data Analysis

In this section, we present a comprehensive analysis of the neural data acquired during our study. We describe the neural data preprocessing step, followed by Independent Component Analysis

(ICA) and our feature extraction method using AutoRegressive (AR) coefficients.

3.5.1 Data Preprocessing

In this study, high-level data preprocessing is applied to the data before classification.

Traditional wearable electroencephalography devices capture noise along with the original EEG signal, mainly because of the complexity of capturing signals from the human brain. The raw EEG data collected by the Emotiv device therefore contain noisy brain signals, and the captured signal is also of low resolution. It is therefore necessary to preprocess the data to remove noise.

A wearable EEG device such as the Emotiv headset is placed on the scalp and captures the brain signal non-invasively. The amplitude of the EEG signal is around 10-100 µV. In general, the brain's EEG signal changes continuously over time, and this behavior also affects the EEG signal acquisition of the Emotiv headset. The nerves, tissues, and fluids of the human brain cause this non-linearity in the brain signal. Moreover, eye movements, eye blinks, cardiac signals, muscle noise, etc., also introduce noise into the EEG signal. Electromyographic (EMG) and electrooculographic (EOG) artifacts are two major sources of such noise. The EMG signal is generated by muscle movements; its amplitude is around 20-200 µV and its frequency range is 10-2000 Hz. The EOG signal is generated by eye movements; its amplitude rises up to 1.5 mV and its frequency range is 0-20 Hz. These signals can seriously corrupt the original EEG signal, leading to incorrect analysis. Therefore, these extraneous signals should be removed from the raw EEG data before classification.

The brain EEG data are first partitioned using a MATLAB script and then fed into the neural data preprocessing system. In the data preprocessing step, we removed both EMG and EOG [84] artifacts from the EEG data using the AAR (Automatic Artifact Removal) toolbox [65]. To remove other extraneous signals, we applied a Butterworth band-pass filter of order 8 with a cutoff frequency of 3-60 Hz. The preprocessing steps return an EEG signal with a good SNR

(Signal to Noise Ratio).
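A minimal sketch of the band-pass step described above, assuming the EOG/EMG-corrected EEG is stored channel-by-sample in `eeg`; the 8th-order 3-60 Hz Butterworth filter matches the parameters in the text, and the data here are random placeholders:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 128.0  # Emotiv EPOC+ sampling rate (Hz)

def bandpass(eeg, low=3.0, high=60.0, order=8, fs=FS):
    """Zero-phase Butterworth band-pass applied along the sample axis."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=-1)

eeg = np.random.randn(14, 10 * int(FS))   # placeholder: 10 s of 14-channel EEG
clean = bandpass(eeg)
```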

3.5.2 Independent Component Analysis

The Emotiv device contains 14 electrodes that capture brain signals from 14 different regions of the brain. These sensors act as receivers that capture signals from a noisy mixture of sources. In order to separate independent sources that are linearly mixed in several sensors,

Independent Component Analysis (ICA) is used. ICA computes a linear mapping matrix so that the unknown source signals can be separated from the mixture of signals.

Let n linear mixtures x_1(t), x_2(t), ..., x_n(t) of n independent sources (electrodes) be observed. This can be represented by the following equation,

x_j(t) = a_{j1} s_1 + a_{j2} s_2 + ... + a_{jn} s_n,  ∀ j    (3.1)

where X is the set of observed random variables whose elements are the mixtures (x_1(t), x_2(t), ..., x_n(t)), S is the random vector with components (s_1(t), s_2(t), ..., s_n(t)), and n is the number of independent sources.

Equation 3.1 is presented as a mathematical model to formalize this problem. Assume that the observed data x are generated from n independent sources s:

x = As (3.2)

where A is an unknown square matrix called the mixing matrix. A dataset x_i, i = 1, ..., m can be obtained by repeating the observation process. The objective is to find the sources s_i that generated the data (x_i = A s_i). Let W = A^{-1} be the un-mixing matrix. Therefore, it can be written as,

s_i = W x_i = A^{-1} x_i    (3.3)
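A minimal sketch of this step using scikit-learn's FastICA as a stand-in ICA implementation; the input below is random placeholder data shaped samples × channels:

```python
import numpy as np
from sklearn.decomposition import FastICA

X = np.random.randn(1280, 14)             # placeholder: 10 s of 14-channel EEG at 128 Hz

ica = FastICA(n_components=14, max_iter=1000, random_state=0)
S = ica.fit_transform(X)                  # estimated sources, s_i = W x_i (Equation 3.3)
W = ica.components_                       # estimated un-mixing matrix
A = ica.mixing_                           # estimated mixing matrix (Equation 3.2)
```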

3.5.3 Feature Extraction Using Auto Regression

To extract features from the EEG data, we used the autoregressive (AR) method. The autoregressive method is used in statistical modeling where the current value depends on previous values: it is a stochastic process that computes new values from a weighted sum of previous values. Let x(t) be the current term of the series, modeled as a linear weighted sum of previous terms. For example, a first-order process AR(1) indicates that the current value is computed from the immediately preceding value. Therefore, for time-series data (EEG brain data) a generic formula can be written as,

x(t) = Σ_{i=1}^{p} α_i x(t − i) + ε(t)    (3.4)

Here α_i are the weights or autoregressive coefficients, x(t) is the EEG signal, p is the order of the model, and ε(t) is noise.

The AR model is used to extract meaningful features from the clean data: the best values of the autoregressive coefficients α_i are computed from a given series x(t). The selection of the AR order is important since the quality of the application depends on it. Several studies have shown that an order-six AR model works better than others [117, 128]. We used the Yule-Walker method [52] to compute the AR coefficients. Since there are 14 channels in the Emotiv device and six AR coefficients are computed for each channel, there are 84 (14 × 6) features for each real/fake webpage viewing trial.
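A minimal sketch of this feature extraction, with the Yule-Walker equations solved directly from the autocorrelation sequence; the trial data below are placeholders:

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker_ar(x, order=6):
    """Order-p AR coefficients of a 1-D signal from the Yule-Walker equations."""
    x = x - x.mean()
    n = len(x)
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    return solve_toeplitz(r[:-1], r[1:])   # solves the Toeplitz system R a = r

trial = np.random.randn(14, 640)                                  # one trial: 14 channels x samples
features = np.concatenate([yule_walker_ar(ch) for ch in trial])   # 14 x 6 = 84 features
```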

3.5.4 Feature Extraction Using Tensor Decomposition

Tensor decomposition methods are useful for capturing the underlying structure of data. In this experiment, tensor decomposition is applied to the EEG brain data measured during the phishing detection task.

One of the most popular tensor decompositions is the so-called PARAFAC decomposition [70]. In PARAFAC, following an Alternating Least Squares (ALS) method, we decompose the tensor into 3 factor matrices, expressing the tensor as a sum of component rank-one tensors. Therefore, for a 3-mode tensor X ∈ R^{I×J×K}, the decomposition will be,

X = Σ_{r=1}^{R} a_r ∘ b_r ∘ c_r    (3.5)

Figure 3.2: PARAFAC decomposition with 3 factor matrices (Time, Channel and Event). Event matrix (blue colored) is used as features.

Here, R is a positive integer and a_r ∈ R^I, b_r ∈ R^J, and c_r ∈ R^K are the factor vectors, which we collect over all the modes to obtain the factor matrices. Figure 3.2 shows a graphical representation of the PARAFAC decomposition. However, the PARAFAC model assumes that, for a set of variables, the observations are naturally aligned. Since this is not guaranteed in our phishing experiments, we switched to the PARAFAC2 model, which is a variation of the PARAFAC model.

The feature matrix has dimension 68 × N, where 68 is the number of events and N is the number of components or features. We selected different numbers of features for our experiment to test which number of features trains a better model.

PARAFAC2 Decomposition. In real-life applications, a common problem is that the dataset is not completely aligned in all modes. This situation occurs in different problems, for example, clinical records for different patients, where patients had different health problems and, depending on that, the duration of treatments varied over time [134]. Another example is participants' response records for phishing detection, where each participant took a variable amount of time to decide whether the presented website is a real one or a phishing one. In these examples, the number of samples per participant does not align naturally. The traditional models (e.g., PARAFAC and Tucker) assume that the data are completely aligned, and if further preprocessing is applied to make the data completely aligned, it might fail to represent the actual structure of the data [177, 73]. Therefore, in order to model unaligned data, the traditional tensor models need changes. The PARAFAC2 model is designed to handle such data.

The PARAFAC2 model is a flexible version of the PARAFAC model. It also follows the uniqueness property of PARAFAC. The only difference is the way it computes the factor matrices: it allows one factor matrix to vary across slices while applying the same factor in one mode.

Suppose the dataset contains data for K subjects. For each subject (1, 2, ..., K) there are J variables across which I_k observations are recorded. The I_k observations are not necessarily of equal length. The PARAFAC2 decomposition can be expressed as,

X_k ≈ U_k S_k V^T    (3.6)

This is an equivalent relation to Equation 3.5; it represents only the frontal slices X_k of the input tensor X. Here, for subject k and rank R, U_k is the factor matrix of the first mode with dimension I_k × R, S_k is a diagonal matrix with dimension R × R, and V is the factor matrix with dimension J × R. S_k is the k-th frontal slice of S, where S has dimension R × R × K and S_k = diag(W(k,:)).

Figure 3.3 shows the PARAFAC2 decomposition.

Figure 3.3: PARAFAC2 decomposition of a mode-3 tensor.

PARAFAC2 can naturally handle sparse or dense data [89]. However, this held true only for a small number of subjects [30]. The SPARTan algorithm is used for PARAFAC2 decomposition when the dataset is large and sparse [134].

Formulating Our Problem Using PARAFAC2. In order to apply a tensor decomposition method, we first need to form the tensor. We form the initial tensor from all participants' phishing detection brain data. The tensor for this experiment has three dimensions: time × channel × events.

In this experiment, the participants were given the option to take as much time as necessary to decide whether the current website is phishing or not. Since the participants were not restricted to deciding within a particular time frame, different participants took a variable amount of time for each event. Therefore, it is not possible to form a regular tensor and apply a general tensor decomposition algorithm.

In order to solve the above problem, the PARAFAC2 model is used in this experiment.

The SPARTan [134] algorithm is used to compute the PARAFAC2 decomposition. This algorithm uses the Matricized-Tensor-Times-Khatri-Rao-Product (MTTKRP) kernel. The major benefit of SPARTan is that it can handle large and sparse datasets properly. Moreover, it is more scalable and faster than existing PARAFAC2 decomposition algorithms.

Phishing Detection & Tensor. In this project, each participant was shown real and phishing websites while the brain EEG signal was captured. The participants were given the flexibility to take the required amount of time to decide whether the website is real or not. Therefore, the observations for a set of variables do not align properly, and the PARAFAC2 model is used to meaningfully align the data.

In order to create the PARAFAC2 model, the EEG brain data of all users for both real and phishing websites were merged. The 3-mode tensor was then formed as time × channel × events, where both the real and the phishing websites are considered as events. Therefore, the tensor formed from this dataset consists of 1853 events, 14 channels (variables), and a maximum of 3753 observations in the time mode. Figure 3.4 shows the PARAFAC2 model of the phishing experiment.

Figure 3.4: PARAFAC2 model representing the brain EEG data across different events.

The 3 factor matrices obtained from the decomposition are U, V, and W, representing the time, channel, and event modes, respectively. In this experiment, we analyzed the V and W factor matrices to see which channels capture high activity in the corresponding brain regions and to distinguish between real and phishing events, respectively.
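A minimal sketch of this decomposition using TensorLy's PARAFAC2 routine in place of SPARTan (which targets large, sparse data); each slice below is a random placeholder of shape time_k × 14, with time_k varying across events:

```python
import numpy as np
import tensorly as tl
from tensorly.decomposition import parafac2

rng = np.random.default_rng(0)
slices = [tl.tensor(rng.standard_normal((int(n), 14)))
          for n in rng.integers(200, 600, size=25)]      # ragged events: (time_k x channels)

# rank 3, as selected with AutoTen in the text
decomposition = parafac2(slices, rank=3, n_iter_max=200, random_state=0)
# The fitted model packages the event-mode weights (W in the text), the channel-mode
# factor (V), and the per-event projections U_k, which are then inspected as features.
```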

In the SPARTan algorithm [134], a modified version of the Matricized-Tensor-Times-Khatri-Rao-Product (MTTKRP) kernel is used. It computes an intermediate quantity required by the PARAFAC2 decomposition algorithm. For a PARAFAC2 model with factor matrices H, V, and W of dimensions R × R, J × R, and K × R respectively, the mode-1 MTTKRP with respect to K is computed as,

M^(1) = Y_(1) (W ⊙ V)    (3.7)

The computation is then parallelized by expressing the matrix multiplication as a sum of outer products over blocks of (W ⊙ V). The efficient way to compute this specialized MTTKRP is to first compute Y_k V for each row of the intermediate result and then compute the Hadamard product with W(k,:). Since Y_k is column-sparse, this avoids redundant operations. For this project, we computed the factor matrices in the channel mode and the events mode using the above method.
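A small numpy sketch of the mode-1 MTTKRP of Equation 3.7 on a dense toy tensor, with the Khatri-Rao product written out explicitly; all shapes are illustrative:

```python
import numpy as np

def khatri_rao(W, V):
    """Column-wise Kronecker product: (K x R) and (J x R) -> (K*J x R)."""
    K, R = W.shape
    J, _ = V.shape
    return (W[:, None, :] * V[None, :, :]).reshape(K * J, R)

I, J, K, R = 6, 14, 10, 3
Y = np.random.randn(I, J, K)
V = np.random.randn(J, R)                     # channel-mode factor
W = np.random.randn(K, R)                     # event-mode factor

Y1 = Y.transpose(0, 2, 1).reshape(I, K * J)   # mode-1 unfolding matching the (W ⊙ V) row order
M1 = Y1 @ khatri_rao(W, V)                    # MTTKRP result, shape (I, R)
assert np.allclose(M1, np.einsum('ijk,jr,kr->ir', Y, V, W))
```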

Brain Data vs. Tensor Rank. In exploratory data mining problems, it is important to assess the quality of the results. In order to ensure a good-quality decomposition, it is important to select the right number of components as the rank of the decomposition. In this experiment, we used the AutoTen [130] algorithm to assess the performance of the decomposition with different ranks.

The application of the AutoTen algorithm is not straightforward for the phishing experiment, since the observations for a set of variables do not align properly. Therefore, a number of additional operations are performed to bring the whole dataset into a naturally aligned tensor form. From Equation 3.6, if we decompose U_k as Q_k H, then we can rewrite Equation 3.6 as,

X_k ≈ Q_k H S_k V^T    (3.8)

where Q_k has dimension I_k × R, H has dimension R × R, and Q_k has orthonormal columns. Now, if both sides of the above equation are multiplied by Q_k^T, we get,

Q_k^T X_k ≈ Q_k^T Q_k H S_k V^T ≈ H S_k V^T    (3.9)

Therefore, we can write,

Y_k ≈ H S_k V^T    (3.10)

where Y_k = Q_k^T X_k. The above equation now has the same form as a PARAFAC decomposition with consistency in all modes. The stacked Y_k form a tensor that is used as input to the AutoTen algorithm. AutoTen was run with a maximum rank of 20, and rank 3 was found to give the best-performing model. Therefore, rank 3 is used for the PARAFAC2 decomposition with SPARTan.
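A short numpy sketch of the alignment in Equations 3.8-3.10: each ragged slice X_k is projected onto an orthonormal basis Q_k (taken here from a QR decomposition of a placeholder U_k estimate; in the actual pipeline U_k comes from the PARAFAC2 fit), giving equal-sized slices Y_k that stack into an ordinary tensor for AutoTen:

```python
import numpy as np

rng = np.random.default_rng(0)
R, J = 3, 14
X = [rng.standard_normal((int(n), J)) for n in rng.integers(200, 600, size=25)]
U = [rng.standard_normal((x.shape[0], R)) for x in X]   # placeholder U_k estimates

Y = []
for X_k, U_k in zip(X, U):
    Q_k, _ = np.linalg.qr(U_k)       # Q_k has orthonormal columns, shape (I_k, R)
    Y.append(Q_k.T @ X_k)            # Y_k = Q_k^T X_k, shape (R, J)

Y = np.stack(Y)                      # aligned tensor, shape (K, R, J)
```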

3.6 Results and Analysis

In this section, we present the results and findings for our phishing detection experiments described in §3.3.

3.6.1 Classifiers and Evaluation Metrics

In this section, we describe the classification methods used in this study. We used the autoregressive coefficients as feature vectors for our machine learning models. We validated our models using two types of algorithms that are common and give good accuracy for EEG-based classification [100]: the tree-based Random Forest (RF) [21] and the function-based Logistic Regression (LR) [96].

We chose the True Positive Rate (TPR) and False Positive Rate (FPR) as our evaluation metrics. The TPR is the fraction of positive instances in the data set that are correctly classified as positive; an ideal classification model has a TPR of 100%. The FPR is the fraction of negative instances that are incorrectly classified as positive; an ideal classification model has an FPR of 0%.

Finally, we used k-fold cross-validation to validate our results, which is a widely used technique for estimating test accuracy in classification problems. The basic idea of cross-validation is to divide the data set into k equal-sized parts and repeat the procedure k times. In each iteration, k-1 subsets are combined into the training set and the remaining subset is used for testing. The process is repeated for each part k = 1, 2, ..., K, and the average accuracy across all k trials is reported. In this way, every data point is in a test set exactly once and in a training set k-1 times.
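A minimal sketch of this evaluation with the two classifiers named above, accumulating TPR and FPR over stratified folds; the feature matrix and labels are random placeholders shaped like our data:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(0)
X = rng.standard_normal((1853, 84))        # placeholder: trials x 84 AR features
y = rng.integers(0, 2, size=1853)          # placeholder labels: 1 = fake, 0 = real

for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("Random Forest", RandomForestClassifier(n_estimators=200, random_state=0))]:
    tn = fp = fn = tp = 0
    for train, test in StratifiedKFold(n_splits=15, shuffle=True, random_state=0).split(X, y):
        clf.fit(X[train], y[train])
        c = confusion_matrix(y[test], clf.predict(X[test]))
        tn, fp, fn, tp = tn + c[0, 0], fp + c[0, 1], fn + c[1, 0], tp + c[1, 1]
    print(f"{name}: TPR = {tp / (tp + fn):.2%}, FPR = {fp / (fp + tn):.2%}")
```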

We present our evaluation in two parts.

• Global Model. In this model, we collected training and test samples from different users. We show the classification performance after merging the data across users and across sessions. The results obtained from the global model are given in §3.6.2.

• Human vs. Machine. In this setting, we compared the accuracy of humans and of our machine learning model based on the number of mistakes, and show that the machine learning model can help users who are vulnerable to such attacks. The results of the human vs. machine comparison are presented in §3.6.3.

Algorithm               TPR     FPR
Logistic Regression     93.00   6.10
Random Forest           85.00   14.00

Table 3.2: True positive rate and false positive rate (%) for the global model.

3.6.2 Global Model

To recall, in the global model we merged all 15 users' data from all sessions and applied 15-fold cross-validation. We chose 15 folds because we have 15 users in our dataset. The dataset is divided into 15 subsets, where 14 subsets form the training set and the remaining subset forms the test set.

15-fold cross-validation. In this section, we present the results of our global model under 15-fold cross-validation. The classification performance is given in Table 3.2. We obtained a 93% TPR with a 6% FPR for the Logistic Regression algorithm. We observe that the overall true positive rate of detecting fake and real websites is lower than that of an individual (per-user) model. The reason is that we are using brain signals from different individuals, and each individual has unique signatures in their brain signals.

3.6.3 Human vs Machine

In this section, we discuss how our model may help users vulnerable to phishing attacks.

We compare the performance of the worst performing participants, i.e., the participants with accuracy below average (see Table 3.3), with the performance of the phishing detection model built using the best performing classifier. We used only the neural data associated with users' correct responses from three sessions for training our model. We then computed the accuracy of this model in correctly identifying the real and fake websites based on the EEG data measured in the fourth session. We repeated this analysis four times, alternately making each session the testing set, and report the average result in Table 3.3.

Table 3.3 presents the average accuracy of users vs. our model in correctly identifying phishing websites. We notice that our machine learning model based on human brain signals outperforms the users (84.8%) with an accuracy of 97.7%. We also performed a paired-sample t-test to determine the significance of the differences between users and our model in correctly identifying fake websites, and observed that our machine learning model has statistically significantly higher accuracy than humans (p = .00002). We present the comparison of the number of mistakes made by human and machine in Figure 3.5, and demonstrate how our model may help vulnerable users by automatically detecting phishing webpages using their brainwaves. We observe that humans made nearly two-fold more mistakes than the machine learning model.
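A minimal sketch of this significance test, using the per-user accuracies reported in Table 3.3 below:

```python
from scipy.stats import ttest_rel

human   = [80.14, 86.02, 86.76, 88.23, 86.02, 86.76, 82.35, 82.35]
machine = [98.08, 96.58, 100.00, 97.50, 94.87, 96.61, 100.00, 98.21]

t, p = ttest_rel(machine, human)   # paired-sample t-test
print(f"t = {t:.2f}, p = {p:.6f}")
```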

          Human    Machine
User1     80.14    98.08
User2     86.02    96.58
User3     86.76    100.00
User4     88.23    97.50
User5     86.02    94.87
User6     86.76    96.61
User7     82.35    100.00
User8     82.35    98.21
Mean      84.82    97.73
STD       2.83     1.75
p-value   0.000028

Table 3.3: Comparing human vs. machine performance accuracy (%).

Figure 3.5: Human vs. machine mistakes.

These results suggest that our unconscious decisions might sometimes be better than our conscious decisions.

The users might have unconsciously noticed the differences in the URL or the layout of the websites. Previous neuroscience studies [14, 46] have also reported similar findings.

                        AR (All*)        Tensor (All*)    Tensor (Top 6*)
Algorithm               TPR     FPR      TPR     FPR      TPR     FPR
Logistic Regression     93.00   6.10     94.00   6.00     95.08   4.83
Random Forest           85.00   14.00    89.30   10.70    97.62   2.39

Table 3.4: Classification results (%) with features extracted by autoregression and by tensor decomposition. For the tensor features, results are reported for two scenarios: using all channels and using the top 6 channels based on their activation. All* = All Channels, Top 6* = Top 6 Channels.

Beck et al. [14] reported that humans may make an optimal decision when their unconscious brain works with the conscious brain. For example, we may not consciously decide to stop our vehicle at a red light or steer around an obstacle in the road. Dehaene [46] reported that the unconscious decision-making process may not reach the threshold required to make a clear decision, and hence may not be reflected in conscious thoughts. Similarly, in our study, the doubts or suspicions raised by the users' unconscious brain about the legitimacy of a fake website might not have reached the threshold required to translate into their final decision.

3.6.4 Comparison of Auto-regression and Tensor Decomposition Result

In this section, we compare the classification performance for detecting real and phishing pages using the autoregression and the tensor decomposition features. We used the global model data, extracted features using the tensor decomposition discussed in §3.5.4, applied different types of machine learning algorithms to distinguish real and fake websites based on the brain data, and checked the classifiers' performance. We used 15-fold cross-validation as in §3.6.2. For feature extraction, we used rank 3, as computed by our adaptation of AutoTen described in §3.5.4.

We compared the classification performance of the AR and the tensor decomposition features. A

summary of the classification performance can be found in Table 3.4. Using all channels, the Logistic Regression algorithm gives 94% accuracy, and we get 97% accuracy using the top 6 highly activated channels with the Random Forest algorithm. The tensor results are better than the autoregressive coefficients. We also achieved better performance than prior studies [119] on the same task, which reported 76% accuracy using a single user's fNIRS data.

Figure 3.6: ROC curves for all channels vs. the top 6 channels using the Random Forest algorithm. The TPR is 79.04% for all channels and 94.91% for the top 6 channels when the FPR is < 1%.

We also validated our comparison by plotting the ROC curve in Figure 3.6 using Random

Forest algorithm, which gives the best accuracy among all the algorithms. In an ideal scenario, the AUC would be 100%; the baseline AUC is 50%, which can be achieved through purely random guessing. Our model achieved 97.32% AUC when considering all channels' data and 99.22% when considering only the top 6 highly activated channels' data. Our true positive rate is 79.04% for all channels and 94.91% for the top 6 channels while keeping the false positive rate below 1%. Reducing the number of channels thus gives us better phishing detection accuracy.
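A minimal sketch of this ROC analysis: cross-validated class probabilities from the Random Forest are used to compute the curve, the AUC, and the TPR at the FPR < 1% operating point; the features and labels below are random placeholders:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
X = rng.standard_normal((1853, 6))   # placeholder: tensor features, top 6 channels
y = rng.integers(0, 2, size=1853)

scores = cross_val_predict(RandomForestClassifier(n_estimators=200, random_state=0),
                           X, y, cv=15, method="predict_proba")[:, 1]
fpr, tpr, _ = roc_curve(y, scores)
print("AUC =", auc(fpr, tpr))
print("TPR at FPR < 1% =", tpr[fpr < 0.01].max())
```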

3.7 Discussion

In this section, we discuss why we obtain good accuracy in classifying real and fake websites using brain data, highlighting several key factors. First, we show that certain brain areas are highly activated during the phishing detection task. Second, we show that there is a statistically significant difference between the real and fake components. We also outline the strengths and limitations of our study.

3.7.1 Phishing Detection Mechanism

The ultimate goal of this study was to develop an automated system for phishing webpage detection based on users' actions and decisions. Previous studies by Neupane et al. [120, 118, 119] reported differences in neural activations when users were viewing real and fake websites. In our study, we also observed these differences when visualizing the brain components related to real and fake websites. Based on these differences, we built machine learning models on neural cues and evaluated them (§3.6) to understand the feasibility of such commercial-BCI-based phishing detection mechanisms. We demonstrate that the true positive rate of our neural-cue-based phishing detection models is 97% with a false positive rate of 2%. The results achieved in our study are better than those observed by Neupane et al. in their study of such a phishing detection mechanism using a clinical-grade fNIRS device.

We also statistically compared the results obtained from the phishing detection engine with the users' task performance in Table 3.3. We report statistically significantly better accuracy in detecting phishing websites with the neural-cue-based model compared to the users with below-average accuracy. This shows that our detection engine is capable of helping users make better decisions. We also show in Figure 3.5 that our machine learning models made fewer mistakes than humans in identifying the fake websites, demonstrating that neural-cue-based machine learning models could correct the mistakes made by users. Our results are also better than most existing phishing detection systems [119] and are comparable with the best ones. Moreover, our phishing detection models have better reaction times than existing phishing detection systems and hence may help prevent zero-day attacks.

The performance of our system may be limited by the state of mind of the user. Under conditions like stress and exhaustion, the system may not be able to accurately infer the neural signatures of phishing detection. However, such scenarios can be automatically monitored by these EEG-based systems, and we can switch to heuristic-based or blacklist-based phishing detection mechanisms under these circumstances. Future studies may be conducted to test such scenarios.

3.7.2 Feasibility of the Defense Mechanism

The development of Brain-Computer Interfaces is on the rise [137]. Companies like Facebook and Neuralink [23] are laying out projects to convert users' thoughts into actions, and researchers have shown that earpiece EEG-based authentication [43] systems can be efficient and secure. In this context, we can assume that these devices will become more robust, common, and embedded in daily spheres of life in the near future, and that our defensive system can be easily integrated into such systems.

3.7.3 Phishing Detection vs Brain Areas

In this section, we provide concise neuroscientific insights into the brain data measured for phishing detection and discuss the relationship between brain activity and the phishing detection task. In our experiments, we collected brain data from the human scalp using a commercially available non-invasive brain-computer interface device. The data collected with the Emotiv EPOC+ device come from fourteen sensors (AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4), as shown in Figure 3.7. These sensors are placed on different regions according to the International 10-20 system. Two sensors positioned above the participant's ears (CMS/DRL) are used as references.

The sensor locations and the functionality of each region are given below:

• Frontal Lobe, located at the front of the brain and associated with reasoning, attention, short-term memory, planning, and expressive language. The sensors placed in this area are AF3, F7, F3, FC5, FC6, F4, F8, and AF4.

• Parietal Lobe, located in the middle section of the brain and associated with perception, making sense of the world, and arithmetic. The sensors P7 and P8 belong to this area.

• Occipital Lobe, located at the back portion of the brain and associated with vision. The sensors

from this location are O1 and O2.

• Temporal Lobe, located in the bottom section of the brain and associated with sensory input processing, language comprehension, and visual memory retention. The sensors at this location are T7 and T8.

Based on the factor analysis in the channel dimension, we observed that AF3, F3, FC5, F7, P7, and P8 are highly activated during phishing detection. For phishing detection tasks, mostly the frontal lobe and parietal lobe sensors are highly activated. In Figure 3.7 a), we present the channel activity based on the channel factor data. Here, we consider all phishing detection events and obtain the factor matrix in the channel dimension using rank 3; the first component's data are used to draw this graph. We found the same subset of channels when considering the second and third components' data. In Figure 3.7 b), we show the corresponding brain mapping for the phishing detection task; the redder a region, the higher its activity. Our findings are aligned with the prior fMRI [120] and fNIRS [119] studies.

Figure 3.7: a) shows the channel activity after the application of SPARTan decomposition on the tensor. The channel data for the first component is plotted in this figure to determine which channels have high activity. b) shows the corresponding brain region activation.

3.7.4 Statistical Analysis: Real vs Fake Events

In this subsection, we present a statistical analysis of the components obtained from the tensor analysis of the brain data related to real and fake events. First, we performed the Kolmogorov-Smirnov (KS) test to determine the statistical distribution of the first-component values of the real and fake factor matrices and observed that the distributions of the real and fake samples were non-normal (p < .0005). We then applied the Wilcoxon signed-rank test, a non-parametric test comparing two sets of scores that come from the same participants, to measure the difference between the real and fake components. We observed a statistically significant difference between the real and fake components (Z = 6.8, p < .0005).
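A minimal sketch of these two tests with SciPy, on placeholder component scores (the actual inputs are the first-component values from the event-mode factor matrix):

```python
import numpy as np
from scipy.stats import kstest, wilcoxon, zscore

rng = np.random.default_rng(0)
real_scores = rng.gamma(2.0, 1.0, size=900)                  # placeholder first-component values
fake_scores = real_scores + rng.gamma(1.0, 0.5, size=900)    # paired placeholder values

print(kstest(zscore(real_scores), "norm"))    # normality check (Kolmogorov-Smirnov)
print(wilcoxon(real_scores, fake_scores))     # non-parametric paired comparison
```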

3.7.5 Feature Space Reduction

One of the primary benefits of our approach is the reduction of the dimension of the feature vector by keeping only the features related to the highly activated frontal and parietal channels. We observed that the prediction accuracy of the machine learning model trained on the features belonging to the top 6 highly activated channels was better than that of the model trained on features from all channels. Our model achieved 97% accuracy with the reduced feature vector. From the ROC curve in Figure 3.6, we can see that our true positive rate increases from 79% to 94% when we use the reduced feature vector in classification while keeping the false positive rate below 1%.

3.7.6 Study Strengths and Limitations

Our study has several strengths. First, we designed our experiment so that it resembles the real-world browsing experience: we did not impose any time limit for displaying the webpages and deciding between real and fake, unlike prior studies [120, 118, 119] where the user could see the webpage for a fixed five seconds. Second, we used an Emotiv EPOC+ device, which is commercially available and widely used in entertainment, gaming, and research; in contrast, prior studies [120, 118, 119] used a medical-grade fMRI machine, which makes those approaches impractical. Third, we leveraged independent component analysis and autoregression for feature extraction rather than simple features like mean, median, and standard deviation used in the prior work [119].

However, our study also has several limitations, similar to other prior studies involving human subjects [109, 118]. First, the sample size is relatively small in terms of participants and the number of trials. The participants produced a total of 2040 events, of which we considered the 1853 correct responses for our machine learning model. This is smaller than the data sets used in typical machine learning applications; we, however, used cross-validation to validate our models and prevent overfitting. Second, our participants were computer science students who were familiar with phishing attacks. People without knowledge of phishing websites might not show similar activations and may be more vulnerable to these attacks. Future studies with a broader set of participants may be needed to generalize our results. Third, we conducted our study in a lab environment, which is a common limitation of neuroscience-based studies [109, 117]. This might have impacted the performance of the users in the study. Fourth, we explicitly asked the participants to identify real and fake websites; in the real world, users have to make this decision implicitly and may not perform as well as in our study. Fifth, our classifier might be vulnerable to adversarial machine learning attacks. In such scenarios, heuristic-based phishing detection approaches can be used to validate the results.

3.8 Chapter Conclusion

In this chapter, we studied the problem of detecting phishing webpages using brain signals. We designed an experiment in which users were shown two different versions of a webpage (real vs. fake) while EEG data from their brain were collected. The collected data then went through rigorous preprocessing to remove noise. After this, we applied autoregression and tensor decomposition methods to extract features for machine learning classifiers. We show that the tensor representation of brain data supports a better understanding of the brain activations during the phishing detection task. We observed that the right frontal and parietal areas are highly active during the phishing website detection task; these areas are involved in decision making, reasoning, and attention. We reduced the dimension of the feature vectors and achieved a maximum of 97% classification accuracy while considering only the sensors over the highly activated brain areas. Our results show that the proposed methodology can be used in the cybersecurity domain for detecting phishing attacks using human brain data.

Chapter 4

IAC: On the Feasibility of Utilizing

Neural Signals for Access Control

Access control is the core security mechanism of an operating system (OS). Ideally, the access control system should enforce context integrity, i.e., an application can only access security- and privacy-sensitive resources in ways expected by users. Unfortunately, existing access control systems, including the permission systems in modern OSes like iOS and Android, all fail to enforce context integrity and thus allow apps to abuse their permissions. A naive approach to enforce context integrity is to prompt users every time a sensitive resource is accessed, but this quickly leads to habituation.

The state-of-the-art solutions include (1) user-driven access control, which binds a predefined context to protected GUI elements, and (2) predicting users' authorization decisions based on their previous behaviors and privacy preferences. However, previous studies have shown that the first approach is vulnerable to attacks (e.g., clickjacking) and the second approach is challenging to implement because it is difficult to infer the context. In this work, we explore the feasibility of a novel approach to enforce context integrity: inferring from their neural signals what task users want to perform under the given context, and then automatically authorizing access to a predefined set of sensitive resources that are necessary for that task. We conducted a comprehensive user study with 41 participants, in which we collected their neural signals while they were performing tasks that required access to sensitive resources. After preprocessing and feature extraction, we trained a machine learning classifier to infer what kind of task a user wants to perform. The experimental results show that the classifier was able to infer high-level intents, such as taking a photo, with a weighted average precision of 88%.

4.1 Introduction

Access control is the core security mechanism of an operating system (OS). It decides what resources a subject can access and in what way the access can be performed (e.g., read, write, execute). Classic access control models include Discretionary Access Control (DAC), Mandatory

Access Control (MAC), Role-based Access Control, Attribute-based Access Control, etc. An important property of all these models is that a subject is not the human user, but a process/thread that operates on behalf of the human user (i.e., a proxy). Therefore, the effectiveness of these models heavily relies on the assumption that the software truly operates as the user intended. This assumption generally held in the early era of computing history when software was either written by users themselves or by a trusted authority (e.g., an administrator). However, with the boom of the software industry, this assumption no longer holds—as users, we usually do not fully understand what a piece of software truly does. Consequently, numerous security and privacy issues arise. For example, ransomware can abuse our credentials to encrypt our files and spyware can easily steal our private information.

Modern operating systems like iOS and Android use sandboxing and permission systems to mitigate this threat. In these systems, apps are no longer trusted: by default, they can only access their own files and limited system resources. Accesses to user-owned data and privacy-sensitive sensors are mediated by the permission system, through which the user can decide either to allow or deny the access. While this is a step forward, the problem with these systems (iOS and Android M+) is that they only ask users to authorize the first access to the protected resources, a.k.a. ask-on-first-use (AOFU). Any subsequent access to the same resource will be automatically allowed unless the user manually revokes the permission. However, since an app can have different functionalities and the resources may be used under quite different contexts, recent research results have shown that AOFU fails to protect users' privacy over half of the time [179].

A straightforward idea to solve this problem is to prompt the user every time a protected resource is about to be accessed. However, as the number of access requests can be huge (e.g., Wijesekera et al. found that a single app can make tens of hundreds of requests per day [179]), this approach can easily cause habituation and lose its effectiveness. So, the real challenge is how to reduce the number of prompts without sacrificing users' privacy.

A general idea to solve this challenge is to infer what decision a user is likely to make, thus avoiding redundant prompts [123, 180, 114, 150, 148, 82, 101]. Existing solutions fall into two directions. Solutions in the first direction associate GUI gadgets with a predefined context, then extract the user's authorization from their interactions with the gadget, a.k.a. user-driven access control [150, 149, 148, 82, 186, 163, 101, 114, 135]. For example, a downloaded file is allowed to be executed only if the user has clicked the "Save" button to save it [101]; an email is allowed to be sent only if the user has clicked the "Send" button and its content matches what is displayed on screen [82]; and only when the user clicks the "Camera" button can an app access the camera device [150, 148, 114]. However, associating the user's authorization with GUI gadgets has two major drawbacks. First, there are many GUI attacks that can mislead the user, such as clickjacking attacks [76]. For this reason, existing user-driven access control models have to employ additional steps to prevent such attacks, e.g., by isolating the gadgets from the application and letting the OS render them [150]. Second, and more importantly, not all legitimate resource accesses are initiated by user interaction [55].

The second direction is to predict users' authorization decisions based on their privacy preferences [99], privacy profiles [98], or previous authorization decisions and other behaviors [123, 180]. Because the decisions are usually context-sensitive, the biggest challenge in this direction is how to infer the context. Olejnik et al. used 32 raw features to define a unique context but admitted that they are not exhaustive [123]. Wijesekera et al. believed that the problem of inferring the exact context in an automated way is likely to be intractable and thus focused on inferring when the context has changed [180].

In this study, we explore the feasibility of a new way to infer users' authorization decisions: by directly inferring their intent through a brain-computer interface. Our observation is that the notion of contextual integrity [122] suggests that each unique context sets up a corresponding set of social norms on how users expect their private information to be used. Whenever the information is used in ways that defy the users' expectations, a privacy violation occurs. Applying this notion to access control systems (permission models) implies that we can automate the authorization process by (1) associating each context of an app with a functionality it appears to perform; (2) associating each functionality with a set of expected sensitive resources that are necessary (i.e., norms); and (3) limiting the requested resources to the expected set. However, as discussed earlier, the first step, inferring functionality from a context, is very difficult. The key idea behind our approach is that we can avoid solving this challenging problem by utilizing our brain as a "magic" inference engine to directly output the result: the intended functionality the user wants to perform under the given context. Once we can infer intents from the user's brain signals, we can easily follow steps (2) and (3) to make authorization decisions.

As a first step in this direction, this work studies the feasibility of constructing such a decision-making system based on a non-invasive electroencephalography (EEG) headset. Recent advances in EEG sensor technology have enabled us to use consumer-grade headsets to capture brain signals that used to be available only in clinical settings with invasive probes. Utilizing these EEG sensors, researchers have shown it is possible to recognize simple mental tasks as well as to play games. In this study, we aim to explore the feasibility of utilizing these sensors to infer a user's intent by answering the following research questions:

• Q1: Is it possible to extract high-level intents (e.g., taking a photo) from the neural signals with

a machine learning classifier?

• Q2: Is the accuracy of the classifier high enough to support automated authorization?

To answer these questions, we designed and conducted a user study with 41 participants.

Experiments over the collected data indicate that the answers to the above research questions are mostly positive. Specifically, our classifier is able to distinguish four different high-level intents (taking a photo, taking a video, choosing a photo from the gallery, and cancel) with a weighted average precision of 88.34%, a weighted average recall of 86.52%, and a weighted average F-measure of 86.92%.

In brief, our contributions in this study are:

1. We designed a new intent-driven access control model that relies on inferring the user's high-level intents through a brain-computer interface (BCI).

2. We experimentally validated the feasibility of constructing such a system with a consumer-grade EEG headset via a user study with 41 participants. Our experimental results show the feasibility of intent-driven access control. To the best of our knowledge, this is the first study to utilize brain signals to protect users' privacy.

The rest of the chapter is organized as follows: §4.2 provides background on electroencephalography (EEG), event-related potentials (ERP), the Emotiv EPOC+ headset, and Brain-Computer Interfaces (BCI), which is required to understand our study; §4.3 introduces the threat model of our new access control design and how it works; §4.4 presents the experiment design and experimental procedures; §4.5 provides the details of how raw EEG data are processed before being fed into a machine learning algorithm; §4.6 empirically answers the two research questions; §4.7 discusses the limitations of our existing design and possible future work; §4.8 compares our work with related research; and §4.9 concludes the chapter.

4.2 Background

In this section, we give the background of Electroencephalography (EEG), event-related potential (ERP), Emotiv Epoc + headset and Brain Computer Interface (BCI).

EEG. Electroencephalography (EEG) is a monitoring technique that records the brain's electrical activity. The recorded EEG data is time-series data. Voltage fluctuations generated by neurons inside the brain are captured by electrodes and amplified. The electrodes are usually placed in a non-invasive way (i.e., attached to the skin of the scalp), but they can also be used invasively.

For this study, we used non-invasive EEG sensors.

Event-Related Potentials. Event-related potentials (ERPs) are small but measurable (with an EEG sensor) voltage changes generated by the brain in response to a stimulus event. The stimulus events include a wide range of cognitive, sensory, or motor activities, such as showing different letters to the participants or, in our experiments, performing a given task with mobile apps. ERPs are time-locked to the stimulus, i.e., given a stimulus, an EEG voltage change is always expected in a known time frame. Because the voltage changes are small, ERPs are calculated by averaging multiple trials of the time-locked EEG samples. This procedure filters out the background EEG noise and extracts the ERPs. The resulting ERP waveforms consist of a sequence of positive and negative voltage deflections, which are called components. So far, researchers have discovered more than a dozen

ERP components [166]. Among them, the most well-studied ERP component is P300 or P3 wave.
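A minimal sketch of ERP computation, averaging stimulus-locked epochs extracted from continuous EEG; all names and sizes below are illustrative:

```python
import numpy as np

FS = 128                                       # Emotiv EPOC+ sampling rate (Hz)
rng = np.random.default_rng(0)
eeg = rng.standard_normal((14, 60 * FS))       # placeholder: 60 s of 14-channel EEG
onsets = rng.integers(0, 59 * FS, size=40)     # placeholder stimulus sample indices

win = FS                                       # 1 s post-stimulus window
epochs = np.stack([eeg[:, t:t + win] for t in onsets])   # (trials, channels, samples)
erp = epochs.mean(axis=0)                                # averaging attenuates non-locked noise
```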

Emotiv EPOC+ Headset. The Emotiv EPOC+ wireless headset [1] is a consumer-grade EEG device that is widely used in the gaming and entertainment industry. It allows gamers to control computer games based on their thoughts or facial expressions [143]. We used this device in our study because it is significantly less expensive than clinical-grade EEG devices and is more portable. For this reason, it is also widely used in research projects [109, 117]. The headset consists of 14 data-collecting electrodes (AF3, AF4, F3, F4, F7, F8, FC5, FC6, O1, O2, P7, P8, T7, and T8) and two reference electrodes (CMS/DRL). The electrodes are placed according to the International 10-20 system.

Getting a good quality signal from the Emotiv headset requires pressing the two reference electrodes for 5 s or more before data collection. The Emotiv headset collects EEG data at 128 samples per second.¹ The captured EEG signals are converted to digital form; the digital data are then processed and transmitted as encrypted data to the stimulus computer via a USB dongle receiver. This proprietary USB dongle communicates with the Emotiv headset on the 2.4 GHz band. Emotiv also provides companion software for its device. EmoEngine is a software component for post-processing data; it exposes data to BCI applications via the Emotiv Application Programming Interface (Emotiv API). Pure.EEG is a software component for data collection, which is used in this study. Pure.EEG collects data from the Emotiv device independently via the USB dongle and can upload recorded sessions to the cloud and download them from the cloud.

BCI. A Brain-Computer Interface (BCI) is a type of user interface in which neural signals are interpreted into machine-understandable commands: it converts brainwaves into digital commands that instruct a machine to conduct various tasks. For example, researchers have shown it is possible to use BCIs to allow patients who suffer from neurological diseases like locked-in syndrome to spell words and move computer cursors [17, 167], or to allow patients to move a prosthesis [164].

With BCI, humans can interact through the mind instead of through physical actions. In our study, we chose this interface because it can directly reveal the user's intent and is thus resistant to some perception manipulation attacks (e.g., clickjacking [76]).

4.3 Intent-driven Access Control

In this section, we introduce how our new access control model works. We start with the threat model and assumptions, and then show how to realize the model with a BCI.

¹The device internally collects data at a frequency of 2048 Hz, then down-samples to 128 Hz before sending it to the computer.

Figure 4.1: Overview of IAC's architecture. IAC will (1) continuously monitor the brain signals using the EEG sensor and the user's interaction with the system. Upon an input event, IAC will create an ERP, (2) preprocess the raw EEG data to get purer signals, (3) extract a feature vector from the purified signals, and (4) feed the extracted features to a ML model to infer the user's intent. In step (5), if the ML model gives enough decision confidence, IAC will directly (7) authorize access to protected resources. Otherwise, it will (6) prompt users to authorize the access and improve the ML model with the feedback loop.

4.3.1 Threat Model and Assumptions

We make the following assumptions for constructing a BCI-based intent inference engine and using it to authorize access to user-owned sensitive resources and sensors. We assume the OS is trusted. Attacks that exploit OS vulnerabilities to gain illegal access to the protected resources and sensors are out of scope. We also assume the OS already employs a permission model that considers context integrity (e.g., an ask-on-every-use model). Our goal is not to replace the existing access control system, but to make it more user-friendly.

We assume our adversary is a skilled application developer aiming to gain access to the user-owned resources/sensors without the user's consent and to abuse such access. Attackers are allowed to launch UI attacks (e.g., clickjacking) to mislead users, with one exception: to correctly identify which app the user is interacting with, the OS does not allow transparent overlays [61]. We consider phishing-style attacks (e.g., explicitly instructing users to perform sensitive operations) and side-channel attacks (that leak protected information) out of scope.

Regarding access to the raw EEG data, we envision a restricted programming model.

Specifically, existing platforms like Emotiv expose raw EEG data to any application built against their APIs. This programming model has been proven to be vulnerable to side-channel attacks that can infer a user's sensitive and private information [19, 109, 59, 117]. To prevent such attacks, we assume a programming model similar to that of voice assistants [8, 115]. That is, the raw EEG data is exclusively accessed by a trusted module, which interprets the data and translates it into app-understandable events. We assume our inference engine is part of this module and is implemented correctly. We also do not consider physical attacks against the EEG sensors.

4.3.2 IAC via BCI

Our BCI-based intent-driven access control system works similarly to the systems proposed in [180, 123]. In particular, the baseline access control system prompts the user to authorize every access to protected resources. The goal of IAC is to minimize the number of prompts by checking whether the access is intended by the user. Specifically, a legitimate access to a protected resource should be (1) initiated by the user's intent to perform a certain task under the presented app context² and (2) within the expected set of necessary resources for that task. Therefore, we can create an intent-based access control mechanism based on ERPs used as inputs to a machine learning classifier.

The data flow diagram for IAC is given in Figure 4.1.

Figure 4.2: Example permission request. Compared to existing permission requests, the biggest difference is that IAC also asks for the intended task (e.g., taking a photo).

To train the classifier, we use the user's explicit answers to the ask-on-every-use prompt as the ground truth. However, instead of just asking the user to authorize the access, IAC also lists

²Note that, unlike access control gadgets, we do not require the intent to be expressed through certain interactions with the app's GUI.

a set of tasks that rely on the requested resource for the user to choose from (e.g., Figure 4.2). If the access is authorized, we label the ERP with the task the user has chosen; otherwise the event is discarded.

During normal operation, the OS continuously monitors neural signals through the BCI device as well as the user's interaction with the system to create and cache the most recent ERPs.

ERPs are bound to the app to which the input event is delivered (e.g., the most foreground app at that moment) and will expire after a context switch. This prevents one app from “stealing” another app’s

ERP. Upon an application’s request to access a protected resource (e.g., camera), the access control system will retrieve the most recent ERP. The ERP will then be fed into the trained classifier to infer whether the user intended to perform a task that requires access to that resource. If so, permission is automatically granted to that request; if the intended task does not require the permission or the confidence of the classification result is not high enough, IAC will fall back to prompt the user to make the decision. The ground truth collected from the prompt window is then to update the ML model. As demonstrated in previous works [123, 180] and our own experiment, this feedback is important for fine tuning the ML model to improve the precision of the prediction.

Applicable Scenarios. Admittedly, using BCI-based access control with existing systems like desktops and mobile devices is impractical; users would need to wear the device all the time. However, this field is advancing fast, and companies like Facebook and Neuralink are laying out projects to decode users' intents into machine-readable commands to scroll menus, select items, launch applications, and manipulate objects [23]. BCI has also been used in manufacturing to control machines [3, 152] and to monitor workers' mental status in order to avoid over-stressing [28]. With the rapid progress in neural imaging and signal processing, in the not so distant future, BCI-based applications could go far beyond gaming and entertainment. Hence, we believe BCI could become a ubiquitous and practical

Figure 4.3: Android app for data collection: (a) main activity, (b) task options 1, (c) task options 2, (d) task activity.

way to interact with digital systems, and our IAC could be easily integrated into such systems to protect users' privacy.

4.4 Experiment Design

The goal of our experiment is to study the feasibility of inferring a user's high-level intents through a brain-computer interface (BCI) and using those intents to authorize access to protected resources. More specifically, we want to assess whether the event-related potentials (ERPs) recorded using a consumer-grade EEG headset can be used to infer three types of common high-level tasks: (1) taking a photo, (2) taking a video, and (3) picking a photo from the library. The hypothesis to be tested is:

Hypothesis. Visual and mental processing of each unique intention has distinguishable patterns in event-related potentials that can be extracted with a supervised machine learning algorithm.

4.4.1 Single App Experiment

We designed a special Android app (Figure 4.3) to test our hypothesis. This app consists of three steps. The main activity (Figure 4.3a) contains 10 TASK buttons to start 10 sets of tasks.

The tasks are randomized in each set. Before starting each session, participants click the START button to begin logging all the click events into a text file. In each session, participants are asked to go through all 10 sets of tasks. Clicking on each TASK button leads to the task option screen (Figure 4.3b and Figure 4.3c). Here participants are asked to perform 4 actions. When an action is finished, participants return to the same task option screen and continue to the next task until all 4 actions are done. Then they move on to the next task set. When all 10 sets of tasks are completed, participants click the STOP button to stop the session and take a break before starting another session. Among these 4 tasks, three involve accessing user-owned privacy-sensitive sensors (camera and microphone) and files (photo gallery). The order of these four tasks differs between task sets. Details of the 4 tasks are listed below.

• Take Photo (Photo). Clicking this button sends a MediaStore.ACTION_IMAGE_CAPTURE intent to start the camera app. As the name suggests, participants are then asked to take a photo of a target object (e.g., a pen, Figure 4.3d) with the camera app. This task requires access to the camera device.

• Take Video (Video). Similar to taking a photo, clicking this button sends a MediaStore.ACTION_VIDEO_CAPTURE intent and invokes the camera app. Participants are then asked to take a short video of the target object. The differences from taking a photo are that (1) taking a video accesses both the camera and the microphone device and (2) access to both devices is continuous.

• Choose from Gallery (Gallery). Clicking this button sends an Intent.ACTION_GET_CONTENT intent with the image/* type. Participants are then asked to pick the photo of the target object (e.g., a pen) from the photo gallery of the Android device. To make sure the photo is always available, we do not use this task as the first option of the first task set. This task requires access to privacy-sensitive files.

• Cancel. Cancel is a unique task; it does not perform any particularly interesting operations or access any privacy-sensitive resources. Its sole purpose is to ask the participants to click a button on the touchscreen of the phone.

Alternative Explanations. An important part of this experiment design is to rule out a few alternative explanations (AE). Specifically, as our experiment involves asking participants to perform a task using the smartphone, we want to rule out the possibility that what we captured from the neural signals is not the user's intent to perform the given task but rather:

• AE1: The intent to interact with the phone (e.g., click a button).

• AE2: The intent to click a specific position of the touch screen (e.g., a button at a fixed position).

• AE3: The reaction of seeing similar pictures.

We added the Cancel task so that if AE1 is true, we will not be able to distinguish the Cancel task from the rest of the tasks. We randomize the order of the tasks on the options activity so that if AE2 is true, we will not be able to distinguish between the randomized tasks. We deliberately chose three visually similar tasks so that if AE3 is true, we will not be able to distinguish between these tasks that involve the same photo.

4.4.2 Multiple Apps Experiment

To test the "portability" of the learned model (i.e., whether the model can identify the same intent across different apps and contexts), we designed a second experiment with eight popular real-world apps (Table 4.1). All of them have more than 500k downloads in the Play Store.

We created testing accounts for WhatsApp, Hangouts, Messenger, Snapchat, and Instagram. The other three apps (Camera, QuickVideo, and VideoCamDirect) did not need any account to take photos or videos.

We instructed participants to browse these apps as they would in their real life (e.g., they might take a photo or write texts). However, in this study, we focus only on the participants' interaction events related to the following three tasks: (1) taking a photo, (2) taking a video, and (3) selecting and uploading a photo from the gallery. This experiment has a more realistic and ecologically valid setting, as the participants were browsing these popular apps and performing the common tasks (e.g., take photo, take video, and upload photo) of their own choice.

Table 4.1: The list of apps used in the testing phase: we test the performance of the model built on the neural data collected from the in-house Android app in correctly identifying the intention of the users when they interact with these real apps.

App Name            Actions
Facebook Messenger  Photo, Gallery
Google Hangouts     Photo, Gallery
WhatsApp            Photo, Gallery
Instagram           Photo, Gallery
Camera              Photo, Video
VideoCamDirect      Video
QuickVideo          Video
SnapChat            Gallery

4.4.3 Experimental Procedures

Participant Recruitment. After obtaining IRB approval, we recruited a total of 41 healthy participants for our experiments. Among the 41 participants, 33 took part in the single app experiment and 8 in the multiple apps experiment. Participants were recruited by word of mouth, flyers, and social media (Facebook) advertising. Informed consent and some non-personally-identifiable data (gender, age, and major) were obtained from all participants. Twenty-seven (65.85%) of the participants were male, and fourteen (34.15%) were female.

Experiment Setup. The experiment consists of a consumer-grade EEG headset (Emotiv EPOC+), an Android phone (Google Nexus 5X), an experiment app (§4.4.1), a laptop, and the Emotiv software package [2]. Participants are asked to use the app on the Android phone while wearing the lightweight EEG headset. The EEG headset connects to the laptop and sends EEG data via a Bluetooth dongle. The Android phone connects to the laptop via USB. To construct the ERPs, the Android app records the timestamp of each task. The clocks of the phone and the laptop are synchronized with network time to precisely align the event timestamps and the EEG data. EEG data is recorded using the Emotiv Pure.EEG software.

Figure 4.4: Experiment setup. A user plays Android apps while wearing the Emotiv EPOC+ BCI headset. The headset sensors capture neural signals, convert them to digital form, and transmit encrypted data to the neural data collection computer via a USB dongle receiver.

Testbed. Our testbed is based on Android. To ease the creation of ERPs, in the experiments, we use touch events as the anchor to distinguish different ERPs. In particular, we developed a standalone monitoring app which uses the accessibility service in Android to capture all the touch events (using the flagRetrieveInteractiveWindows flag) [61] and log the timestamps of the events and the target GUI element. The logged timestamps are then used to synchronize with the neural signals captured by the BCI device and generate ERPs corresponding to the touch events. To label ERPs, we manually label GUI controls with corresponding intents (similar to access control gadgets). If a monitored touch event triggers a labeled GUI control, we tag the ERP with the corresponding intent.
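As a concrete illustration of this labeling step, the sketch below shows how logged touch events could be mapped to intent labels. The control identifiers and the label map are hypothetical examples, not the monitoring app's actual code.

```python
# Manually assigned labels for GUI controls, analogous to access control gadgets.
GUI_INTENT_LABELS = {
    "btn_take_photo": "take_photo",
    "btn_take_video": "take_video",
    "btn_choose_gallery": "choose_gallery",
    "btn_cancel": "cancel",
}

def tag_touch_events(touch_log):
    """touch_log: list of (timestamp_ms, gui_element_id) tuples from the monitoring app.
    Returns (timestamp_ms, intent_label) pairs for touches on labeled controls only."""
    tagged = []
    for timestamp_ms, gui_id in touch_log:
        intent = GUI_INTENT_LABELS.get(gui_id)
        if intent is not None:
            tagged.append((timestamp_ms, intent))
    return tagged
```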

Preparation Phase. The first step of the preparation is to inform participants that their brain signals will be collected while they use our app on our test Android device and will be used to improve the access control model. Next, we sanitize the electrodes of the EEG headset and apply gel on them to improve their connectivity with the skin. Then we set up the EEG headset by putting it on the head of the participant. Because the signal-to-noise ratio of raw EEG data is low, additional preparation steps are followed to ensure the quality of the data. First, all experiments were conducted in a quiet meeting room reserved for one participant only (Figure 4.4).

Second, a preprocessing step is carried out on the raw EEG data to increase its signal-to-noise ratio. During preprocessing, noise reduction is applied to each of the raw EEG channels. To ensure all the signals from the electrodes were properly channeled, we checked the Pure.EEG control panel [2]. With the help of this tool, we can validate the signal strength of each channel (electrode). A green indicator against a channel in the control panel means good signal strength, while black means no signal.

Task Execution Phase. Before starting the data collection, the operator verbally instructed the participants on the procedure of the experiments. For the single app experiment, all participants performed the same set of tasks for 5 sessions, where each session includes performing all 10 sets of tasks (Figure 4.3a); thus each participant performed a total of 200 actions (trials), assuming no mistakes. All sessions were performed on the same day and in the same room. A break of 2-4 minutes was given to each participant between sessions. Participants were instructed to stay calm and relaxed throughout the experiment. In real life, participants may not face close to 40 actions within such a short time (∼5 min); however, multiple trials are a fundamental requirement of most ERP-related studies [152, 109]. We conducted this single app experiment to establish the ground truth for IAC. For the multiple apps experiment, participants interacted with 8 popular apps for the entire duration of the experiment. They were instructed to browse those apps for approximately 25 minutes, and the operator notified the participants to stop after 25 minutes. However, the participants were allowed to stop the session if they were feeling uneasy or bored. On average, the session duration for this experiment was 21 minutes. After the experiment finished, we explained the details of our study to those participants who were curious about it.

4.5 Data Processing and Analysis

Figure 4.1 depicts the workflow of our system. First, we acquire the neural data using the EEG device. Then the raw EEG data is preprocessed to make it usable for the classifiers. Next, we apply Independent Component Analysis (ICA) to recover the original signals from unknown mixtures of sources and extract features using autoregressive coefficients. Finally, we utilize machine learning (ML) techniques to infer the intent.

Raw Data Acquisition. We collected raw EEG data using the Emotiv Pure.EEG software [2]. We synchronize the EEG data with actions (i.e., click events received by the app) using the calibrated clocks on the phone and the laptop. Based on the studies of Martinovic et al. [109] and Neupane et al. [117], we epoch the signals with a 938 ms window that starts 469 ms before a touch event and ends 469 ms after the event. We chose this window size as it provided the best results during our analyses. Similar to the previous works [117, 109], we also consider the window before the touch event because participants know beforehand which action they will perform, so the stimulus period actually starts before the event is recorded.
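The epoching step can be sketched as follows, assuming the continuous EEG and the touch-event timestamps have already been aligned to a common clock; the function names and array layout are illustrative assumptions.

```python
import numpy as np

FS = 128                     # Emotiv EPOC+ sampling rate in Hz
PRE_MS, POST_MS = 469, 469   # window boundaries around each touch event (938 ms total)

def epoch_eeg(eeg, eeg_start_ms, event_times_ms, fs=FS):
    """Cut one epoch (469 ms before to 469 ms after) around each event.

    eeg: array of shape (n_channels, n_samples) of continuous EEG
    eeg_start_ms: wall-clock time of the first EEG sample (synchronized clocks)
    event_times_ms: touch-event timestamps on the same clock
    """
    pre = int(round(PRE_MS * fs / 1000.0))
    post = int(round(POST_MS * fs / 1000.0))
    epochs = []
    for t in event_times_ms:
        center = int(round((t - eeg_start_ms) * fs / 1000.0))
        if center - pre >= 0 and center + post <= eeg.shape[1]:
            epochs.append(eeg[:, center - pre:center + post])
    return np.stack(epochs) if epochs else np.empty((0, eeg.shape[0], pre + post))
```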

Data Preprocessing. Human neural activity involves a huge number of neuronal membrane potentials. EEG records the voltage changes of cerebral tissue and reflects the state of brain function. However, these signals are weak, non-stationary, and nonlinear in nature [11]. For this reason, EEG signals can easily be contaminated by external noise such as the frequency of the power supply, and by noise generated by the human body, such as eye movements, eye blinks, cardiac signals, muscle noise, etc. The most significant and common artifact, produced by eye movements and blinks, is known as the electrooculogram (EOG). Electromyography (EMG) is another type of contaminating artifact, which is a measurement of the electrical activity in muscles as a byproduct of contraction. EMG artifacts are much more complex than EOG artifacts due to the movement of muscles, particularly those of the neck, face, and scalp. Both EMG and EOG seriously degrade the extraction of the EEG signals and lead to incorrect analyses; hence they must be removed from the raw data. Similar to previous work [7, 141], we used the AAR (Automatic Artifact Removal) toolbox [65], which utilizes the Blind Source Separation (BSS) algorithm to remove both EOG and EMG [84]. After removing the EOG and EMG artifacts, we applied an 8th-order Butterworth band-pass filter with cutoff frequencies of 3-60 Hz to remove all other unwanted signals. The band-pass filter keeps signals within the specified frequency range and rejects the rest. The selected frequency range covers the major frequency bands in the EEG signal, namely delta (0.1 to 4 Hz), theta (4.5 to 8 Hz), alpha (8.5 to 12 Hz), beta (12.5 to 36 Hz), and gamma (36.5 Hz and higher) [45]. This preprocessing step extracts quality signals with a good SNR (signal-to-noise ratio).
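For illustration, the band-pass step could look like the following SciPy-based sketch. The dissertation used the MATLAB-based AAR toolbox for artifact removal; only the 3-60 Hz Butterworth filtering is shown here, and the function name is an assumption.

```python
from scipy.signal import butter, sosfiltfilt

FS = 128  # Emotiv EPOC+ sampling rate in Hz

def bandpass_3_60(eeg, fs=FS, low=3.0, high=60.0):
    """Apply a Butterworth band-pass filter (3-60 Hz) channel by channel.

    eeg: array of shape (n_channels, n_samples). N=4 is used because SciPy
    doubles the order for band-pass designs, giving an 8th-order filter overall.
    """
    sos = butter(4, [low, high], btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, eeg, axis=1)  # zero-phase filtering along time
```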

ICA. Independent Component Analysis (ICA) is a standard method to recover original signals from known observations, where each observation is an unknown mixture of the original signals. The EEG device has 14 electrodes for receiving brain signals from different regions of the brain. Typically, each sensor receives signals from a mixture of regions. ICA can be applied to separate independent sources from a set of simultaneously received signals from different regions of the human brain [78, 79, 175]. In this study, we used ICA to separate the multi-channel EEG data into independent sources.
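A minimal sketch of this unmixing step, using scikit-learn's FastICA as a stand-in (the text does not specify which ICA implementation was used), is shown below:

```python
from sklearn.decomposition import FastICA

def separate_sources(eeg, n_components=14, random_state=0):
    """Unmix 14-channel EEG into independent sources.

    eeg: array of shape (n_channels, n_samples). Returns sources of shape
    (n_components, n_samples). FastICA is an illustrative choice only.
    """
    ica = FastICA(n_components=n_components, random_state=random_state, max_iter=1000)
    sources = ica.fit_transform(eeg.T)  # FastICA expects samples in rows
    return sources.T
```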

Feature Extraction. The features from the neural signals are extracted using an autoregressive (AR) model. This model is a popular feature extraction method for biological signals, especially for time series data [27]. It estimates the current value x(t) of a time series as a linear weighted sum of the previous values of the same series. A generic formula for representing time series data (e.g., EEG) is

x(t) = \sum_{i=1}^{n} \alpha_i x(t - i) + e(t)    (4.1)

where x(t) is the EEG signal measured at time t; the weights \alpha_i are known as the autoregressive coefficients; n is the order of the model, indicating the number of previous data points used for estimation; and e(t) is the noise or residual term, which is assumed to be Gaussian white noise.

The selection of the model order is a crucial step in applying AR successfully. We chose an AR order of six, as in previous studies [10, 117, 128]; all of these studies used the 128 Hz Emotiv EPOC device. We calculated the AR coefficients using the Yule-Walker method [52]. We consider the data of all 14 channels in our analysis; therefore, six AR coefficients were obtained for each electrode channel, resulting in 84 (14 x 6) features for each action. This feature extraction process was applied to all actions in both experiments.
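The feature extraction can be sketched as follows; this is an illustrative Yule-Walker implementation in Python (not the original analysis code), producing the 84-dimensional (14 x 6) feature vector per action described above.

```python
import numpy as np

AR_ORDER = 6  # model order used in prior 128 Hz Emotiv EPOC studies

def yule_walker_ar(x, order=AR_ORDER):
    """Estimate AR coefficients of a 1-D signal by solving the Yule-Walker equations."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[order]
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1: order + 1])  # alpha_1, ..., alpha_order

def extract_features(epoch):
    """epoch: array of shape (14, n_samples); returns an 84-dimensional feature
    vector (6 AR coefficients per channel)."""
    return np.concatenate([yule_walker_ar(channel) for channel in epoch])
```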

Classification Models and Evaluation Metrics. In this study, we used random forest (RF) [22] because our extracted features (autoregressive coefficients) are well suited to RF algorithms [58, 15]. For implementation, we used the Weka classification software package [69]. We evaluate IAC using the weighted averages of Precision, Recall, and F-Measure. A higher weighted average Precision indicates fewer false positives (i.e., incorrectly authorizing access to sensitive data and sensors). A higher weighted average Recall indicates fewer false negatives (i.e., unnecessarily prompting users for authorization). The F-Measure is the harmonic mean of Precision and Recall; it takes both false positives and false negatives into account and reflects the balance of our machine learning model. Finally, we used k-fold cross-validation to validate our results, where k = 10. This is a widely used technique for estimating test accuracy in classification problems with small samples, and it helps prevent overfitting. The goal of our study is to train a classifier that can predict a user's intent based on the features extracted in the earlier step.
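Since the actual classification was done in Weka, the following scikit-learn sketch is only an equivalent illustration of a random forest evaluated with 10-fold cross-validation and weighted-average metrics:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import precision_recall_fscore_support

def evaluate_intent_classifier(X, y, n_folds=10, random_state=0):
    """X: (n_actions, 84) AR-coefficient features; y: intent labels
    (Camera, Video, Gallery, Cancel). Returns weighted-average metrics."""
    clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
    y_pred = cross_val_predict(clf, X, y, cv=n_folds)  # 10-fold cross-validation
    precision, recall, f_measure, _ = precision_recall_fscore_support(
        y, y_pred, average="weighted")
    return precision, recall, f_measure
```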

4.6 Feasibility Test

In this section, we aim to answer the research questions by analyzing the data we collected from the two different experiments described in §4.4. We start with Q1: is it possible to distinguish the three high-level intents based on neural signals using a machine learning algorithm?

Table 4.2: Classification results of the global model.

Metric   Precision   Recall   F-Measure
Value    70.70%      70.70%   70.70%

4.6.1 Single App Analysis

Recall that our single app experiment includes 5 sessions for each participant, where each session includes 10 sets of tasks and each task set includes 4 actions. Therefore, each participant has 50 instances per action (5 sessions x 10 task sets). In total, we have 1650 instances (50 instances x 33 users) per action from all 33 participants in the single app experiment. We then extracted features from these instances using the methodology discussed in §4.5 and labeled the feature vectors with the following four actions as classes:

• Camera for the take-photo action,

• Video for the take-video action,

• Gallery for the choose-from-gallery action, and

• Cancel for canceling the pop-up.

Global Model. In this model, we consider the dataset of all users across all sessions. We have a total of 6600 (1650 instances x 4 actions) ERP events for this model. The experiment results of this model are shown in Table 4.2. As shown in the table, the weighted average Precision is 70.70%. This implies that our IAC can correctly detect human intention 70.70% of the time, which is not very good for automated authorization. The reason behind this relatively low accuracy is that even for the same task, different people are likely to have different ERP patterns, a property that has actually been used to build authentication systems [10, 174]. For this reason, we would like to know how the classifier performs when considering only actions belonging to the same participant.

Figure 4.5: Boxplot of Precision, Recall, and F-measure of the individual model. The red line indicates the median value and the + symbol indicates outliers.

Individual Model. In the individual model, we train and test the model with data from a single user across all sessions of the single app experiment. The results for the individual model are reported in Figure 4.5. Overall, the results were much better than when considering all segments across all participants (i.e., the global model). From the boxplot, we observed that the medians of the weighted average Precision and Recall are both 99.50%, and the median weighted average F-measure is also 99.50%. These results imply that IAC correctly detects human intent 99.50% of the time. The results also indicate that IAC works well when the ML model is trained and tested with a single user and a single app.

4.6.2 Cross-app Portability Analysis

Through the single app experiment, we partially verified that it is possible to infer users' high-level intents based on their brain signals. In terms of app context, this implies that our classifier can distinguish different app contexts. However, since it only involves one app, the remaining question is: can the learned model work across different apps? That is, in terms of app context, we want to know whether our classifier can identify similar contexts from different apps (i.e., cross-app portability).

We answer this question using the multiple real-world apps experiment, where 8 participants interacted with 8 real-world apps for a duration of 21 minutes on average. However, we had to discard the data of 3 participants due to data loss caused by a device error, so we only consider the 5 participants whose data were sufficient. On average, the 5 participants performed 22 actions for video, 47 actions for camera, and 27 actions for gallery. In total, we have 484 ERPs from 5 users.

Because these 5 participants had not participated in the single app experiment, this experiment resembles a more practical scenario. With this setup, we have two options to bootstrap the individual model: (1) we can start with an empty model and completely rely on the feedback loop (in Figure 4.1) to collect training data; or (2) we can start with a half-baked model and use the feedback loop to improve it. In this experiment, we chose the second option, as it requires less training and the global model we tested in §4.6.1 still showed reasonable accuracy.

With Initial Model. We used the global model learned from all participants in the single app experiment as the initial model (i.e., we trained the model with all data in the single app experiment) and tested it with all data collected from the multiple apps experiment. The classification results of Precision, Recall, and F-measure of the initial model are presented in the first bar of Figure 4.6. From this figure, we can observe that we can only correctly infer the user intention with a precision of 43.16%.

Figure 4.6: How the classification metrics vary with the number of seen intents. The first bar represents the Precision, Recall, and F-measure without adding any new intents from the multiple real-world apps experiment to the global model from the single app experiment; the second bar represents the results after adding one new intent to the global model; the third bar represents the results after adding two intents; and so on. We observed upward trends in Precision, Recall, and F-measure with the addition of more new intents to the global model.

Adding Feedback Loop. When we gradually add new training intents collected from the user while he/she is using real-world apps, the improvements in Precision, Recall, and F-measure are as shown in Figure 4.6. All newly added intents were from the multiple apps experiment, and we had to stop at 5 so that we would have enough data for the testing phase. As we can see, after adding 5 intents from real-world apps, the weighted average Precision improved from 43.16% to 88.34%, the weighted average Recall improved from 39.82% to 86.52%, and the weighted average F-measure improved from 38.94% to 86.92%. The results imply that in a real-world context, IAC can correctly infer the user intention 86.92% of the time after adding only 5 intents to re-train the ML model. Again, the precision is expected to continue improving, and the only reason we stop at 5 is the lack of data.

4.6.3 Results Analysis

Based on the classification results from the above experiments, we decided to accept our hypothesis. That is, it is possible to identify high-level intents based on neural signals using a machine learning algorithm. In terms of app context, our classifier can both distinguish different contexts from the same app and identify similar contexts from different apps. Hence, the answer to Q1 is positive.

4.6.4 Authorization Accuracy

In the above analysis, we have shown that it is possible to identify a user's high-level intent through the brain-computer interface. However, whether the classification result can be used for automated access authorization for user-owned sensitive sensors and resources still depends on the question: is it accurate enough (Q2)? In this subsection, we analyze the classification results to answer this question. From the analysis of the multiple apps experiment data, we observed that our classifier can achieve a weighted average Precision of 88.34% with a weighted average F-measure of 86.92% for completely unknown scenarios. Based on this, we think the answer to Q2 is positive.

4.7 Discussion

IAC and Contextual Integrity. An access control system is a mechanism to protect users' privacy. Modern OSes, including Android (M+), iOS, and Windows (8+), use an ask-on-first-use permission system to guard access to sensitive data and sensors. This approach provides some context cues, but only the first time the permission is requested. Researchers have argued that permission should be requested under a context that matches the user's expectations, i.e., contextual integrity [122]. IAC enforces contextual integrity in the sense that a user would only have an intent in mind when the context is relevant to that intent. In other words, if an app violates contextual integrity, the user will not express the intent and IAC will block the access.

Learning Strategy. As demonstrated in §4.6, the classification accuracy can vary based on the learning strategy. Overall, since different people may exhibit different brain signals even when thinking about the same thing (a property that has been used for neural-signal-based authentication), it is preferable to use individual models. However, bootstrapping such a model requires users to go through a calibration phase. An alternative approach, as used in [180] and our own experiment, is to start with a half-baked model (e.g., the generalized model learned from all participants in the single app experiment) and then personalize it by adding feedback from explicit prompts, especially for newly installed apps. Once the model has seen enough feedback, we can start using it to make real authorization decisions. Our multiple apps experiment has partially validated the effectiveness of this strategy.

Limitations. Similar to other previous studies on BCI [109, 118], our study also has several limitations. First, the study was conducted in a controlled environment, so whether unwanted artifacts like EOG and EMG can be reliably removed in an uncontrolled environment is still unclear. However, since this is a common problem for BCI, we believe future techniques will be able to address it. Second, although our sample (41 participants) is larger than those of previous studies (e.g., 5 participants [10, 127], 9 participants [106], 16 participants [12]) and has a diverse demographic background, it is still much smaller than the datasets used in other machine learning applications, such as computer vision, voice recognition, and natural language processing. Third, we used only 8 popular apps to test feasibility; this could introduce bias, as participants are more familiar with popular apps. Finally, our classifier is likely to be vulnerable to phishing-style attacks. That is, just as participants followed our instructions to perform actions that would allow an app to access protected resources, a phishing-style attack might also be able to trick users into willingly performing operations that would compromise the security and privacy of their data.

4.8 Related Work

In this section, we briefly discuss related work on neural signals and permission models.

BCI-based security studies. Neural signals have been used for user authentication [34, 83, 174, 116] and identification [138, 188]. Ashby et al. [10] proposed an EEG-based authentication system using a consumer-grade 14-sensor Emotiv EPOC headset. Abdullah et al. [5] discussed the possibility of an EEG-based biometric system using 4 or fewer electrodes. Chuang et al. [34] developed a user authentication model using a single-sensor Neurosky headset. Campbell et al. [25] developed NeuroPhone, which is based on the ERPs of brain signals; they implemented a brain-controlled address-book dialing app, which shows users a sequence of photos of contacts from the address book. Thorpe et al. [174] suggested pass-thoughts to authenticate users. In their study, they used EEG signals to replace password typing. An EEG-based authentication system overcomes the weaknesses of current authentication protocols, which suffer from several types of attacks, including dictionary attacks and password guessing. However, there are some drawbacks to this approach, such as the non-pervasiveness of EEG equipment and the lack of feedback to users during the authentication process.

Exposing users' neural signals to third-party apps via brain-computer interfaces introduces new security and privacy issues [20, 109, 59, 117]. Martinovic et al. [109] introduced a side-channel attack, which they referred to as "brain spyware", using the commercially available Emotiv EPOC headset. The authors extracted private information like familiar banks, ATMs, PIN digits, and month of birth using only brain signals. Their work is similar to the Guilty Knowledge Test (GKT) [139], where familiar items evoke a different response than unfamiliar items. In their experiment, users were shown images of banks, digits, and known people. The users' ERP responses differ for banks they know well, as that information is already stored in their memory. However, their attack is intrusive and can be easily detected, as the users may notice the abnormality in the application when it displays some of their familiar information sequentially. Frank et al. [59] proposed a subliminal attack in which an attacker can learn relevant private information from the victim at levels below his cognitive perception. Bonaci et al. [20] showed how non-invasive BCI platforms used in games or web navigation can be misused to extract a user's private information. Neupane et al. [117] showed the feasibility of stealing users' PINs from their brain signals.

Runtime Permission Models. Requesting access to sensitive resources at runtime, i.e., at the moment they will be used, provides more context information and thus can help users better understand the nature of these requests and make better decisions [55]. The challenge is how to avoid the habituation caused by a high frequency of resource access [179].

User-driven access control. The first approach to reduce the number of prompts is to automatically authorize requests based on the user's intent. Existing user-driven access control systems [150, 148, 82, 101, 114, 124, 135] all infer the intent in the same way: by capturing authentic user interaction with trusted GUI gadgets (i.e., access control gadgets), e.g., the "camera" button. Our approach also tries to infer the intent of a user. However, as we directly infer the intent from the neural signals, our system is not vulnerable to GUI attacks [76, 135] and thus does not require additional protection for GUI gadgets. Please note that although we only used user-initiated actions in our experiment, unlike existing user-driven access control systems, our approach is not limited to user-initiated events, because any external stimulus, including viewing an app's foreground GUI context, can be used to create event-related potentials (ERPs) and drive our system.

Decision prediction. The second approach is to use machine learning (ML) to predict users' privacy decisions [180, 123, 179, 98]. Liu et al. [98] proposed using a user's answers to a few privacy-related questions to build a personalized privacy profile. They then created a Privacy Assistant that offers recommendations for future permission settings based on the profile, app category, requested permission, and purposes associated with the permission. While they found that 78.8% of the recommendations were adopted by users, the biggest limitation is that they used the ask-on-install model, so the recommendations were made without considering context. Recognizing the importance of contextual integrity, Wijesekera et al. [179] pioneered the work on predicting users' privacy decisions based on the context. In their first attempt, they used a one-size-fits-all logistic regression model, which provided 40%-60% better accuracy than random guessing. In [180], they further extended this idea by building an SVM-based classifier based on whether the context has changed and the user's past decisions and behavior. This new approach improved the accuracy to 96.8% across all users. However, the accuracy drops to 80% among users who truly make different decisions based on context. Around the same time, Olejnik et al. [123] also proposed using context information and ML techniques to predict users' privacy decisions. In this work, they used 32 raw contextual features (e.g., app name, foreground app, method, time, semantic location) to train a linear regression model based on users' previous decisions under different contexts. The mean correct classification rate of their model is 80%. Our approach also relies on ML techniques, and our learning strategy is very close to [180]. However, instead of trying to encode context as a set of features for the ML techniques, we rely on users to interpret the context and aim to infer what they want to do under the given context.

4.9 Chapter Conclusion

In this work, we proposed a new direction for protecting user-owned, security- and privacy-sensitive sensors and resources: inferring users' intents and using them to automate authorization decisions. As a first step, we studied the feasibility of leveraging the brain-computer interface to infer the intents. Our experiment with 41 participants showed that neural signals can be utilized to train a machine learning classifier to recognize high-level intents like taking a photo. The accuracy of the classifier was also good enough for this security- and privacy-sensitive task.

Chapter 5

Augmenting Training Performance by Adding Neural Signals into the Adaptive Feedback Loop

Adaptive training is a kind of training regimen that adjusts the task difficulty based on the user's past behavior. Adaptive training has been shown to improve human performance in attention, working memory capacity, and motor control tasks. Additionally, a correlation has been identified between spectral features (4-13 Hz) and performance on cognitive tasks. Here, we anticipated that adding a neural measure into a behaviorally adaptive training system would improve human performance on a subsequent transfer task. To test this, we designed, developed, and conducted a study of 44 participants comparing three training regimens: Single Item Fixed Difficulty (SIFD), Behaviorally Adaptive Training (BAT), and Combined Adaptive Training (CAT). Results showed a statistically significant transfer task performance advantage of the CAT-based system relative to the SIFD and BAT systems of 6% and 9%, respectively. Our research shows a promising pathway for designing closed-loop BCI systems based on both users' behavioral performance and neural signals for augmenting human performance.

5.1 Introduction

Training is a systematic approach to acquiring skills that improve performance in a task of interest. There are two types of training regimens: fixed training and adaptive training. A key assumption of training is that for any given skill level, there exists a difficulty of training that will provide the largest skill gains [105]. In a fixed training regimen, the training task difficulty is fixed, so that training will be optimal only for individuals within a narrow range of skill. In adaptive training, the difficulty of the training is varied in an attempt to keep it within the optimal range for the trainee.

A common approach is to adapt difficulty based on past behavioral performance. Behaviorally based adaptive systems have outperformed non-adaptive training in several cognitive tasks, particularly those targeting attention [42, 146], working memory capacity [75, 81, 86, 90], and motor control [32]. However, behavioral data might be an incomplete indicator of optimal difficulty. Neurophysiological measures can provide additional information to adaptive systems, and electroencephalography (EEG) is one non-invasive way of measuring the electrical potentials of the brain.

There are several reasons for using EEG measurements instead of other modalities (e.g., GSR, eye movements, HR, and facial temperature distribution). First, EEG hardware is comparably cheap, has high temporal resolution, and can detect brain responses within milliseconds of the stimulus presentation [154]. Second, though it is difficult to identify the best physiological indicators of workload, many studies have shown that EEG gives more promising workload measurements than other indicators [168, 74]. Third, other modalities certainly have their costs and benefits; some users might prefer not to have a camera pointed at their face (eye movements, thermal imaging), and peripheral recordings (GSR, HR) might not be specific enough to cognitive workload. Regardless, there is no reason these could not be combined in some future application. Additionally, EEG is being incorporated into new V/AR headsets, and big companies like Facebook are working to integrate BCI-based technology. Our technology can be smoothly integrated with these kinds of headsets in the near future.

Measures derived from EEG have been shown to correlate with task performance [44, 161, 110]. More specifically, the theta/alpha ratio (TAR) is one neural measure that has shown promising results in closed-loop feedback systems for learning and several other cognitive tasks [49, 145, 38, 50, 91]. Because of the encouraging results of TAR in several cognitive tasks, here we incorporated TAR with behavioral task performance to build a novel adaptive system. Behavioral features give insight into the relationship between cognitive workload and performance from one angle, while neural measures may provide insight from another angle. Combining both features may yield a more accurate predictor of cognitive workload and its effects on performance. In this study, we explore the feasibility of adding neural features into behaviorally adaptive training to boost transfer task performance. Specifically, we aim to answer the following question:

Research Question. Will adding neural measures into a behaviorally adaptive training system improve human performance?

To answer the above question, we designed, developed, and conducted a user study that compared three training regimens: Single Item Fixed Difficulty (SIFD), Behaviorally Adaptive Training (BAT), and Combined Adaptive Training (CAT). The training methodologies differ only in how and whether the training difficulty changes. For the SIFD condition, task difficulty was fixed at the easiest level regardless of performance. For the BAT condition, task difficulty was varied based on the user's behavioral performance. For the CAT condition, task difficulty was varied based on both behavioral performance and neural measures.

Our results suggest that adding a neural measure together with behavioral performance criteria leads to better learning. Results showed a statistically significant transfer task performance advantage of the CAT-based system relative to the SIFD and BAT systems of 6% and 9%, respectively. These findings illustrate the promise of combining neural and behavioral features in practical applications of adaptive training.

In summary, this study makes the following contributions:

• We designed and developed a novel methodology that combines behavioral and neural features to change the difficulty in adaptive training.

• We demonstrated the effectiveness of the newly designed system by collecting data from 44 participants in three different training systems. We found a statistically significant performance improvement on a transfer task in the CAT system relative to the other two counterparts.

• The findings of this study suggest a promising pathway for designing practical interactive systems based on users' behavioral performance and neural signals.

5.2 Background

Here we review some past work on fixed versus adaptive training and discuss the implications of these approaches in various learning contexts such as classrooms and traditional work environments. We also discuss related work that has attempted to incorporate Brain-Computer Interfaces (BCI) into adaptive training. Then, we discuss the potential effectiveness of an adaptive training technology based on the theta/alpha ratio.

5.2.1 Training

Training is broadly categorized into two regimens. One regimen is called fixed training, in which the training difficulty level is fixed at some pre-specified level. The other, called adaptive training (AT), is a methodology in which the difficulty of the task is varied as a function of the user's performance. A schematic diagram of an adaptive training system is shown in Figure 5.1. According to Kelly et al. [87], there are three core elements of an adaptive system that must all be carefully chosen to achieve an effective system: the performance metric, the adaptive variable, and the adaptive logic. The adaptive variable is the property of the system that can be adjusted; typically it is chosen for its relationship to the difficulty of the task or training. The adaptive logic is a decision algorithm that recommends changes to the adaptive variable based on the performance metric. Frequently, adaptive training includes a closed-loop feedback system in which task difficulty is automatically adjusted based on the trainee's performance. The main goal of the adaptive system is to set the task difficulty to the level that maintains optimal learning for the trainee. The impact of AT over fixed training difficulty has been studied in the past, particularly in attention [42, 146], working memory capacity [75, 81, 86, 90], and motor control [32]. Previous studies [67, 68, 151] showed that AT is more effective than its non-adaptive counterparts.

Figure 5.1: Schematic diagram of an adaptive training system. A task is presented to a trainee, and the trainee's score on that task then informs the adaptive logic, which subsequently modifies the task. This is BAT. With the inclusion of neural signals (dotted line paths), the trainee's EEG measurements are also fed into the adaptive logic, which then modifies the task. This is CAT. Removing the adaptive logic would yield a fixed training regimen.

In previous works, adaptive instructional systems [132, 133, 169] performed better than non-adaptive or control conditions. Romero et al. [151] reported that students who utilized adaptive training for learning the theory and content of cardiac life support performed better in computerized tests than students with non-adaptive content learning. Tennyson et al. [170] found that students performed well when the number of concepts presented to them depended on their performance rather than being fixed at all times. Earlier research [9, 18] showed that adaptive feedback in human tutoring systems generally promoted learning; here, the adaptive tutoring system changed the difficulty based on the student's test scores, error rates, and success rates [31, 176]. Holmes et al. [75] developed an adaptive system for children to train a variety of working memory tasks in a computerized gaming environment. Raybourn and colleagues [144] proposed an adaptive training system based on a simulation experience in which participants received real-time in-game performance feedback. Flegal et al. [57] conducted a behaviorally adaptive training study and found that adaptive task difficulty improved training. There are numerous examples of AT systems that outperform non-adaptive systems. However, behavioral performance is not the only performance metric on which to base adaptive systems. Next, we review work that incorporates neural measures into closed-loop systems.

5.2.2 Closed-loop BCI

Incorporating neural signals into adaptive training systems may allow us to improve adaptive training beyond what can be achieved by incorporating behavioral measures (e.g., scores) alone. In CAT, we can modify the training environment or task in response to the trainee's neural signals, which may be indicators of relevant internal states that are not otherwise rapidly measurable. Electroencephalography (EEG) is a monitoring technique that measures the brain's electrical activity. For example, research has found a strong correlation between workload estimated from brain signals and task performance [44, 161, 110].

In the current study, we used a closed-loop BCI for training. BCIs are technologies that mediate interaction between the brain and machines: the brain signals collected by sensors, invasively or non-invasively, are decoded into digital commands for the machine. Over the years, we have seen diverse BCI-based applications, from controlling a prosthetic arm [24] and playing video games [88] to controlling a cursor [53] and spelling words [129]. Other BCI-based applications include drone control [93]; security applications have also used BCI, including authentication [112], phishing detection [140, 118], and access control [142].

5.2.3 Theta/Alpha Ratio (TAR)

Previous studies [64, 63, 162, 97] found that alpha power is inversely related to active engagement: as alpha power increases, focus and attention decrease. Theta power has been observed to fluctuate in the opposite direction in some cases, so that a positive relationship between theta amplitude and focus exists. These two spectral bands fluctuate to varying degrees based on the specific situation. An obvious approach is to combine these two neural features as a ratio, since they demonstrate negative (denominator) and positive (numerator) relationships with the behavior of interest (focus). Based on these findings, we chose to use the ratio of theta over alpha as our neural measure. The theta/alpha ratio has been used in several studies [38, 145, 49, 50, 91]. TAR varies substantially from person to person based on the conductivity of their scalp, their anatomy, and their neural patterns, so here we look at how TAR changes over time rather than at its absolute value.

5.3 Design of the Experiments

The goal of this study was to measure the effectiveness of adding neural measures within a closed-loop BCI into adaptive training. To achieve this goal, we designed three variants of a common training task, along with a shared transfer task.

5.3.1 Go/No-Go Training Task

The training task involved presentation of images that were designated threats (character holding a gun) vs. non-threats (character without a gun). The participant’s task was to respond to threats by pressing a button and to respond to non-threats by withholding a button press (go/no-go paradigm). This training was meant to improve inhibitory control in speeded perceptual classification of threats and non-threats. We designed our training task experiment based on a previous study [56].

The number of characters shown on the screen during each trial depended on the training condition. For SIFD, there was only one character in each trial. For BAT and CAT, there were up to 5 characters, depending on the adaptive logic for each trial. For any difficulty level of a go trial, all characters were threats. For a no-go trial, only one character was a threat regardless of difficulty level. The characters were presented in random non-overlapping locations on the screen. The characters were computer-rendered images isolated from any background. The threat character was a male character holding a rifle with his face partially covered. The non-threat character was in similar attire but without a rifle and with his face uncovered.

Figure 5.2: A flow chart of the training experiment.

There were 20 blocks in the training task, and each block consisted of 30 trials; thus, there was a total of 600 trials per participant in the training task. The ratio of go to no-go trials was 4:1: in each block, there were 6 no-go trials and 24 go trials. We pseudo-randomized the order of go/no-go trials with the constraint that there were no more than 7 go trials between two no-go trials. Each stimulus was presented for 0.4 s on the screen. The participant had to respond within 1 s after stimulus onset to register a response within the system. Trial feedback was given for 0.5 s after the response deadline. The duration of each trial was 1.5 s, with a variable (uniformly distributed) inter-trial interval of 1.0 - 2.0 s. The process flow diagram of the experiment is shown in Figure 5.2.

In our experiment, a participant had to identify threats as quickly as possible by pressing a button with their dominant hand for threats (go) and by withholding a button press (i.e., doing nothing) for a non-threat (no-go). To measure participant performance, each trial was scored. There were up to 60 points assigned for go trials and 180 points assigned for no-go trials, with the participant earning more points for a faster response in go trials. They got the full 60 points if they responded within 170 ms, after which the points gradually decreased based on a piecewise-linear function of response time; a minimum of 30 points was assigned for responses between 0.5 s and 1.0 s. The participant got 180 points for correctly withholding a response on a non-threat trial. No points were awarded if they responded after the deadline or responded incorrectly. We provided feedback indicating whether or not the participant was correct at the end of each trial by displaying a green check mark for a correct response and a red X for an incorrect response. The number of points received was not shown to the participant. The feedback marks for non-threat trials were four times larger than those for threat trials.
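The scoring rule can be summarized in code as follows; the exact shape of the piecewise-linear decay between 170 ms and 0.5 s is not fully specified in the text, so a linear drop from 60 to 30 points over that interval is assumed.

```python
def score_trial(is_go_trial, correct, rt_seconds=None):
    """Score one go/no-go trial following the point scheme described above.

    `correct` means a button press on a go trial or a withheld press on a no-go
    trial; `rt_seconds` is the response time for go trials. The breakpoints of
    the decay between 170 ms and 0.5 s are assumed, not stated in the text.
    """
    if not correct:
        return 0                          # wrong response
    if not is_go_trial:
        return 180                        # correctly withheld on a no-go trial
    if rt_seconds > 1.0:
        return 0                          # past the 1 s response deadline
    if rt_seconds <= 0.170:
        return 60                         # fastest responses earn the full 60 points
    if rt_seconds <= 0.5:
        # assumed linear decay from 60 points at 170 ms to 30 points at 500 ms
        return round(60 - 30 * (rt_seconds - 0.170) / (0.5 - 0.170))
    return 30                             # 0.5 s to 1.0 s earns the minimum 30 points
```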

From this common base, we built three versions of the training that differed in whether and when the difficulty of training changed. One used a single fixed difficulty, one adapted based only on behavior, and the third adapted based on a combination of behavioral and neural measures. The details of these versions follow.

Single-Item Fixed Difficulty (SIFD). In the SIFD condition, task difficulty was fixed at the easiest level (no more than one character on the screen at any given time), regardless of the user's performance. SIFD is a non-adaptive training methodology. The SIFD condition described here was included alongside the two primary conditions of interest (CAT and BAT), as the data from that condition had been previously acquired from another experiment focusing on feedback framing. This condition represents a conventional approach to oddball training, and it serves as a reasonable alternative to a no-contact control. Other controls could be used in future work; however, we believe that the SIFD condition presents an adequate baseline against which we can compare performance.

Table 5.1: Top: behavioral adaptive logic table for the BAT system. Bottom: behavioral and neural adaptive logic table for the CAT system. The difficulty level increased (+) by one level, decreased (-) by one level, or remained the same (=).

BAT
Behavioral→    S<1500    1500<=S<=1700    S>1700
Change         -         =                +

CAT
Neural↓ \ Behavioral→    S<1500    1500<=S<=1700    S>1700
TARP<-5                  -         -                =
-5<=TARP<=5              =         =                =
TARP>5                   =         +                +

Behaviorally Adaptive Training (BAT). In the BAT condition, we changed the number of characters shown on the screen during a trial based on the user's score in the preceding block, following the adaptive logic shown in Table 5.1 (top). We acknowledge that the point thresholds used in BAT are somewhat ad hoc. The lower threshold (1500) was set just above the score a participant would get from simply pushing the button as fast as possible while ignoring the actual stimuli (1460). The upper threshold was just below the average performance on the SIFD (1760). We reasoned that adding stimuli would make the task more difficult, so performing at or above the average on the easiest condition would warrant increasing difficulty. The difficulty level was increased by one if the user scored above 1700, decreased by one if the user scored below 1500, and stayed the same if the user scored between 1500 and 1700. The participants were not shown their scores, but they were instructed to respond as quickly and accurately as they could.

Combined Adaptive Training (CAT). In the CAT condition, task difficulty was varied based on both the behavioral scores and the EEG values. The neural thresholds of -5% and +5% used in CAT are also somewhat ad hoc and are based on pilot data from two users. The difficulty level was adjusted according to the adaptive logic presented in Table 5.1 (bottom). This scheme incorporates both the neural and the behavioral measure, but it gives more weight to the neural measure. The adaptive logic was applied starting from the second block, when the first percentage change in the TAR value became available.
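The adaptive logic of Table 5.1 can be expressed compactly as follows (difficulty changes are returned as -1, 0, or +1); the function names are illustrative only.

```python
def bat_adjust(score):
    """Behavioral adaptive logic (Table 5.1, top): -1, 0, or +1 difficulty change."""
    if score < 1500:
        return -1
    if score > 1700:
        return +1
    return 0

def cat_adjust(score, tarp):
    """Combined adaptive logic (Table 5.1, bottom). `tarp` is the percentage change
    in the theta/alpha ratio from the previous block; the neural measure dominates."""
    if tarp < -5:
        return 0 if score > 1700 else -1   # decrease unless the score is high
    if tarp > 5:
        return 0 if score < 1500 else +1   # increase unless the score is low
    return 0                               # |TARP| <= 5: keep the current difficulty
```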

5.3.2 Target Identification Transfer Task

After training, participants engaged in a transfer task that has been used previously [113, 56]. This transfer task involves the same perceptual judgment of threat and non-threat as in training, but it takes place in a naturalistic context and with both trained and untrained stimuli. The context was a simulated 10-minute vehicle ride through a semi-realistic virtual desert village environment.

The untrained stimuli were tables with or without a cloth obscuring the space beneath the table. Unobscured tables were non-threats, and obscured tables were threats, because they might conceal a threat.

The transfer task stimuli were static 3D models added to the environment. There were 200 stimuli belonging to the four stimulus categories. The stimuli appeared in randomized order, and the stimulus distance from the center was randomized. The vehicle speed was fixed. The participants saw each stimulus over a range of sizes and angles. We added an additional level of difficulty by displaying intermittent fog (30 s to 2 min) five times throughout the transfer task. We included the table stimuli in the transfer task in the hope that we could infer whether training increases accuracy in a stimulus-specific or non-stimulus-specific manner. During the task, one of the four stimuli appeared on the screen randomly and stayed for 1 s. The inter-stimulus interval was uniformly distributed within 1 s - 3 s. The participants had to press one of two buttons within a 1 s response window. They responded to threat stimuli (Human, Table) with their dominant hand (based on the Edinburgh Handedness Inventory test) and to non-threat stimuli (Human, Table) with their other hand. We provided feedback with a green letter Y for a correct response, a red letter N for an incorrect response, and white letters OO for a non-response. We evaluated users' performance by measuring the accuracy of detecting the objects (threat/non-threat) within the transfer task.

5.4 Methods

In this section, we describe the participant recruitment, apparatus, design, and procedure for conducting the experiment.

5.4.1 Participants

We recruited 68 healthy adult participants via Craigslist. Subjects were screened for normal or corrected-to-normal binocular vision (minimum of 20/40 acuity) using a standard Snellen chart and for color vision using a standard Ishihara 14-plate color test. Individuals were excluded if they reported a tendency for motion sickness or any brain-related diseases. We had to discard data from 12 participants (SIFD = 2, BAT = 6, CAT = 4) due to technical problems with the equipment, 1 for failing the vision tests, and 11 additional participants (SIFD = 2, BAT = 6, CAT = 3) who failed to respond to transfer task stimuli more than 40% of the time. This left 44 total participants: 22 (M 13, F 9, mean age 29.40, SD 11.6) in the SIFD condition, 10 (M 4, F 6, mean age 33.4, SD 8.7) in the BAT condition, and 12 (M 5, F 7, mean age 29.83, SD 5.5) in the CAT condition. All participants gave voluntary, fully informed, written consent to participate in our study.

5.4.2 Apparatus

Dell UltraSharp 24" Desktop Monitor. We used a 24" standard desktop monitor, manufactured by Dell, along with a keyboard and mouse, to execute the tasks for our study.

ActiveTwo Biosemi System. We recorded EEG using a BioSemi ActiveTwo system (BioSemi, Amsterdam, The Netherlands), which is a research-grade, multichannel, high-resolution biopotential measurement device. It has an electrode cap with 64 pre-amplified, active surface electrodes. The sensors were placed into the cap holes in accordance with the standardized international 10-20 electrode placement system (Jurcak, Tsuzuki, & Dan, 2007). A water-soluble saline gel was inserted into the cap holes before placing the electrodes to ensure better connectivity between the scalp surface and the electrodes. Four additional sensors were placed onto the skin for monitoring EOG-related eye movements. The Common Mode Sense (CMS) electrode and the Driven Right Leg (DRL) electrode served as the ground. The sampling rate for our experiment was 512 Hz. The EEG signals were captured through the active electrodes and then converted to digital format using an AD-box (analog-to-digital box). The digitized data were then sent to a PC/laptop. Biosemi provides a proprietary software interface named ActiView for data processing.

Instrument Control Toolbox. We used the Instrument Control Toolbox for TCP/IP communication between the Matlab instance on the stimulus PC and ActiView on the Biosemi data recording PC. Here, the ActiView software was configured as the TCP/IP server, and one of the Matlab instances on the stimulus PC was configured as the TCP/IP client. The Matlab program parsed the incoming data packets and stored the data from all channels.

5.4.3 Procedures

Preparation Session. In this step, the devices were prepared for starting the training and testing session. Participants were fitted with a 64-channel cap using a modified 10-10 electrode placement, with external electrodes placed at the external canthi of the eyes, above and below the left eye, and on the mastoids. Once the cap was on the participant's head, the participant was asked to fill out the questionnaires on a laptop. The operator gelled each electrode while the participant was completing the questionnaire. The participant was then taken into a soundproof room (Figure 5.3) after the completion of the sensor preparation and questionnaire. The operator then calibrated the EEG laptop in the ActiView software interface. Electrode offsets were maintained at or below 30 uV. The participant was told to read all the on-screen instructions carefully throughout the experiment, to maintain a consistent eye distance to the monitor, and to keep their legs uncrossed and both feet on the floor. The participant had the chance to ask any questions for clarification. The operator then closed the soundproof room door and turned on a video camera for observation and a microphone to speak with the participant if needed.

Figure 5.3: A demo participant wearing the Biosemi electrode cap. The stimulus was presented on the monitor, and the trainee pressed the RT-box key to respond to the stimulus. The EEG signals were collected by the electrodes and converted to digital format by the AD-box. The digitized signal was then received in the ActiveTwo software interface through a USB2 cable. We updated the task difficulty based on the neural and behavioral responses after each block.

System Setup. We followed the technical block diagram (Figure 5.4) for the experiment setup. We used two computers: the stimulus computer and the Biosemi data recording computer.

Two Matlab instances ran on the stimulus computer: one displayed the stimuli, and the other communicated with the Biosemi ActiView software via a TCP/IP connection.

We used PsychToolBox (PTB) as our stimulus presentation software. During the training task, PTB sent stimulus onset trigger signals to the Biosemi equipment through the Biosemi USB Trigger Interface Cable, whose delay is less than 200 microseconds. A photodiode was installed in the bottom left corner of the stimulus monitor for accurate timing of stimulus onset. The participant responded to the go task using an RT-box connected to the stimulus computer.
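To make the timing chain concrete, the sketch below shows how a PTB trial might draw a stimulus together with a photodiode patch and record its onset time. The window setup and stimulus are illustrative, and sendBiosemiTrigger and trialCode are hypothetical placeholders standing in for the call that writes a code through the USB Trigger Interface; they are not PsychToolBox functions.

% Hedged sketch of one PTB trial; sendBiosemiTrigger and trialCode are
% hypothetical placeholders, not part of PsychToolBox itself.
PsychDefaultSetup(2);
screenId = max(Screen('Screens'));
[win, rect] = Screen('OpenWindow', screenId, 0);           % black full-screen window

photodiodePatch = [0, rect(4) - 60, 60, rect(4)];          % bottom-left square for the photodiode

Screen('FillOval', win, [255 255 255], CenterRect([0 0 200 200], rect));  % example stimulus
Screen('FillRect', win, [255 255 255], photodiodePatch);   % white patch marks stimulus onset
[vbl, stimOnsetTime] = Screen('Flip', win);                % flip returns the onset timestamp

sendBiosemiTrigger(trialCode);   % hypothetical: write a trigger code to the Biosemi
                                 % USB Trigger Interface immediately after the flip
Screen('CloseAll');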

Task Execution Session. After the preparation session, the training task started; it lasted an average of 30 minutes and was followed by the participant filling out a questionnaire set. The transfer task started upon completion of that questionnaire set and always lasted a fixed 10 minutes. After the transfer task, the participant completed an exit questionnaire.

Figure 5.4: CAT technical block diagram.

5.4.4 EEG Data Processing

We recorded EEG data using the 64-channel, high-end Biosemi data acquisition system at a sampling rate of 512 Hz and used MATLAB (version 9.1.0.441655, R2016b) to analyze the data. We received the continuous EEG data from the Biosemi ActiView software over a TCP/IP channel and synchronized each stimulus with its EEG recording using the photodiode signal fed into the USB2 receiver. For each trial, we performed a column-wise Fast Fourier Transform across all 64 channels on the 1 s of EEG following stimulus onset and computed theta (3-7 Hz) and alpha (8-12 Hz) power for each electrode. Whole-head theta and alpha power were computed as the average over all electrodes, and the ratio of whole-head theta to whole-head alpha (TAR) was computed for each trial. These single-trial TAR values were averaged over the 30 trials in a block, and we then calculated the percentage change in TAR from the previous block. We refer to this theta/alpha ratio percentage change as TARP.
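The per-trial computation can be sketched in Matlab as follows, assuming epoch holds one second of post-stimulus EEG as a samples-by-64 matrix and that blockTAR and prevBlockTARMean hold the current block's single-trial TARs and the previous block's mean TAR. The variable names are illustrative rather than taken from our analysis scripts.

% Minimal sketch of the TAR/TARP computation for one trial and one block.
fs   = 512;                                   % sampling rate (Hz)
nfft = size(epoch, 1);                        % number of samples in the 1 s epoch
X    = fft(epoch);                            % column-wise FFT over all 64 channels
P    = abs(X(1:floor(nfft/2) + 1, :)).^2;     % one-sided power spectrum per channel
f    = (0:floor(nfft/2))' * fs / nfft;        % frequency axis in Hz

thetaPow = mean(P(f >= 3 & f <= 7,  :), 1);   % theta (3-7 Hz) power per electrode
alphaPow = mean(P(f >= 8 & f <= 12, :), 1);   % alpha (8-12 Hz) power per electrode

tar = mean(thetaPow) / mean(alphaPow);        % whole-head theta/alpha ratio for this trial

% After a 30-trial block, TARP is the percentage change of the block-mean TAR
% relative to the previous block's mean TAR.
tarp = 100 * (mean(blockTAR) - prevBlockTARMean) / prevBlockTARMean;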

5.5 Results

The main goal of this study was to measure the effectiveness of adding a neural feature to adaptive training. For evaluating the effectiveness of any training regimen, transfer task performance is one of the key criteria. Hence, we evaluated the effectiveness of our training regimens by measuring the accuracy (percentage of correct responses) on the post-training transfer task. In this study, we had three separate training tasks (one for each condition) along with a shared transfer task. Average transfer task accuracy for each condition is shown in Figure 5.5. The average transfer task accuracy scores in the SIFD, BAT, and CAT conditions were 55.93%, 52.56%, and 61.57%, respectively. The CAT system improved transfer task performance by 10% relative to SIFD (a 6 percentage point improvement) and by 17% relative to BAT (a 9 percentage point improvement). A two-sample t-test with unequal variance adjustment indicated a statistically significant difference in transfer task performance between the SIFD and CAT conditions, T(31.85) = -2.25, M = -0.68, 95% CI [-1.30, -0.06], p = .031. There was also a statistically significant difference between the BAT and CAT conditions, T(17.98) = -3.63, M = -1.09, 95% CI [-1.72, -0.46], p = .002. Accuracy under BAT was lower than under SIFD, but not significantly so, T(25.48) = -1.21, M = -0.41, 95% CI [-1.10, 0.28], p = .236. These results show that participants in the CAT training condition performed better on the transfer task than those in the BAT and SIFD conditions.

Figure 5.5: Average transfer task performance for the three training conditions. The combined adaptive training (CAT) resulted in better transfer task performance compared to the behavior adaptive training (BAT) and fixed difficulty (SIFD) conditions. The large dots show the sample average, and the small dots show individual data points. Error bars show 95% confidence intervals from a bias-corrected, accelerated percentile bootstrap.
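For reference, the Welch-type comparison and the bootstrap confidence intervals used for the error bars in Figure 5.5 can be computed with standard Statistics Toolbox calls along these lines, where accSIFD and accCAT are assumed vectors of per-participant transfer accuracies (illustrative names):

% Hedged sketch of the two-sample comparison with unequal variances (Welch).
[~, p, ~, stats] = ttest2(accSIFD, accCAT, 'Vartype', 'unequal');
fprintf('t(%.2f) = %.2f, p = %.3f\n', stats.df, stats.tstat, p);

% 95% CI of a condition mean via bias-corrected, accelerated percentile bootstrap,
% as used for the error bars in Figure 5.5.
ciCAT = bootci(10000, {@mean, accCAT}, 'type', 'bca');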

Relationship of training difficulty level and transfer task performance. The CAT regimen led to a greater average difficulty level during training than the BAT regimen. To examine whether this difference in average difficulty could account for the better transfer task performance with CAT training, we fit a linear model (Figure 5.6) with transfer task accuracy as the outcome variable and average training difficulty level and training condition as predictors. Condition was dummy-coded with BAT as the reference level, and continuous variables were mean-centered prior to model fitting. At the mean difficulty level, the CAT condition was associated with a transfer task accuracy advantage of 6.64 percentage points, 95% CI [0.55, 12.72], T(18) = 2.29, p = .034. The effect of difficulty level in the BAT condition was 2.53 [-0.64, 5.70] percentage points per level, T(18) = 1.67, p = .11. The interaction of condition with difficulty level, representing the difference in the effect of difficulty level in CAT relative to BAT, was 0.07 [-7.96, 8.09] percentage points per level, T(18) = 0.02, p = .99. There was a marginal association between training difficulty and transfer task accuracy, but the transfer task performance advantage associated with CAT remained when controlling for this marginal effect.
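A minimal Matlab sketch of this kind of model, assuming a table T with per-participant columns transferAcc (percent correct), difficulty (average training difficulty), and condition (a categorical variable with levels BAT and CAT), could look as follows; the column names are illustrative:

% Hedged sketch of the accuracy ~ condition * difficulty model with BAT as reference.
T.condition  = reordercats(categorical(T.condition), {'BAT', 'CAT'});  % BAT = reference level
T.difficulty = T.difficulty - mean(T.difficulty);                      % mean-center the predictor

mdl = fitlm(T, 'transferAcc ~ condition * difficulty');   % main effects plus interaction
disp(mdl.Coefficients)                                     % point estimates and t statistics
disp(coefCI(mdl))                                          % 95% confidence intervals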

Relationship of training score and transfer task performance. To examine whether the difference in average training task score could account for the transfer task accuracy advantage associated with CAT, we fit a linear model (Figure 5.7) with transfer task accuracy as the outcome variable and average score and condition as predictors. Condition was dummy-coded with SIFD as the reference category. At an average score, BAT increased accuracy by 0.82, 95% CI [-6.55, 8.20] percentage points, T(38) = 0.23, p = .82, and CAT increased accuracy by 7.6 [2.24, 13.09] percentage points, T(38) = 2.81, p = .008. The effect of a one-point increase in average score on accuracy was 0.03 [0.005, 0.048] percentage points, T(38) = 2.48, p = .018. The interaction of BAT with score was -0.008 [-0.05, 0.03], T(38) = -0.40, p = .69, and of CAT with score was -0.006 [-0.05, 0.04], T(38) = -0.28, p = .78. Although a higher training task score was associated with better transfer task performance, the effect of CAT remained when controlling for training task score.

Answer to the research question. Based on the transfer task accuracy in the three conditions, we found that adding the neural feature to behaviorally adaptive training boosts human performance on the transfer task. Hence, the answer to our research question is positive.

5.6 Discussion

We developed a multi-modal adaptive training system by combining both behavioral and neural features and compared it with a non-adaptive and a behaviorally adaptive training system. We showed that the combined (CAT) system outperformed the others in terms of accuracy on a subsequent post-training transfer task. These advantages remained even after controlling for differences in difficulty levels and behavioral scores during training. Here we discuss insights regarding our results.

Figure 5.6: Differences in training difficulty do not account for the transfer performance advantage of CAT. The SIFD condition is excluded here because its difficulty was fixed at the easiest level. Dashed lines show 95% confidence intervals of the fit.

CAT was more stable than BAT. In the CAT system, participants stayed in stable blocks more often than in the BAT system. The light blue cells in Table 5.2 show instances where the CAT logic kept difficulty constant during training but the BAT logic would have changed it; the gray cells show instances where the CAT logic changed difficulty but the BAT logic would have kept it the same. The difference (36.6%) is the net stabilizing effect of the CAT logic. To check whether this stabilizing effect might account for the difference in transfer task accuracy, we ran a simple linear model with the number of novel blocks as the only predictor. The effect of the number of novel blocks on accuracy was marginal, B = -0.78, 95% CI [-1.60, 0.03], T(20) = -2.0, p = .059. A linear model with condition, number of novel blocks, and their interaction found a significant effect of the CAT condition, B = 9.23, 95% CI [2.01, 16.46], T(18) = 2.69, p = .015, but the effect of the number of novel blocks and the interaction were not statistically significantly different from zero (both p > .2). However, the strong correlation between the number of novel blocks and condition (r(20) = .69) makes these coefficients difficult to interpret. This suggests that the stabilizing effect was not the best explanation for the better results in the CAT condition in this study.

Figure 5.7: Differences in training task scores do not account for the transfer performance advantage of CAT. Curved lines show 95% confidence intervals of the fit.
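The collinearity concern raised in the preceding paragraph can be checked directly. A small sketch, assuming novelBlocks is the per-participant count of novel blocks, isCAT a 0/1 condition indicator, and transferAcc the transfer accuracies (all illustrative names), is:

% Hedged sketch: point-biserial correlation between condition and novel-block count,
% followed by the accuracy model with both predictors and their interaction.
r = corr(novelBlocks, isCAT);              % a strong r makes the coefficients hard to separate

tbl = table(transferAcc, novelBlocks, categorical(isCAT, [0 1], {'BAT', 'CAT'}), ...
            'VariableNames', {'acc', 'novel', 'condition'});
mdl = fitlm(tbl, 'acc ~ condition * novel');
disp(mdl.Coefficients)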

Future work could examine larger sample sizes or adaptive training schemes with different levels of stability to determine whether such effects emerge. A participant's engagement in the learning task is necessary for staying at the optimal learning rate. Frequent changes of task difficulty might induce disengagement from the learning task in the BAT system, and this might hurt training and transfer task performance.

BAT
Behavioral →            S<1500    1500<=S<=1700    S>1700
Total % of blocks        37.00         28.00        35.00

CAT
Neural ↓ \ Behavioral →  S<1500    1500<=S<=1700    S>1700
TARP < -5                 6.25          6.25        12.95
-5 <= TARP <= 5           6.70         15.18        25.89
TARP > 5                  4.90          7.59        14.29
Total % of blocks        17.85         29.02        53.13

Table 5.2: Top: Overall percentage of blocks in the three categories defined by the adaptive rule in the behavior adaptive condition. Bottom: Overall percentage of blocks in the nine categories defined by the adaptive rule in the combined adaptive condition. Cells in dark gray show cases in which the behavior rule would keep difficulty the same, but the combined rule changed difficulty. Cells in light blue show cases in which the behavior rule would change difficulty, but the combined rule kept difficulty the same.

Were CAT participants better learners? Assignment to condition was not based on any assessed skill level or performance, and the sample sizes in the two adaptive training conditions were not particularly large. This leaves open the possibility that the sample of participants assigned to CAT was simply better at the task than those assigned to BAT. A within-subjects design or a matched pre-test could eliminate that concern. However, we checked the common first block of training (Figure 5.8), which was identical across the two adaptive conditions, and found similar performance in both groups.

BAT vs SIFD. The comparison of transfer task results across the three training methods showed that the BAT condition did not yield a significant improvement over the SIFD condition. Building an effective adaptive training system requires choosing the right adaptive variable and adaptive logic, and any suboptimal choice of performance metrics, adaptive variable, or adaptive logic might hurt its effectiveness. Perhaps our BAT condition did not yield an improvement over SIFD because of a suboptimal choice of one or more of these variables. There might be alternative scoring functions that would improve the behaviorally adaptive training results. If we could design a better behaviorally adaptive system and apply our neural parameter on top of it, we might obtain even more effective adaptive training. In our study, we found that adding the neural feature to an adaptive system boosted transfer task performance.

Figure 5.8: Performance on the first block of training was similar across the CAT and BAT training conditions. Large dots show means with 95% confidence intervals.

5.7 Implications and Applications of Our Work

Implications. We believe our study advances the field in a few ways. Our method is especially applicable to learning studies because it is based on estimating changes in the theta/alpha ratio (TAR) rather than estimating absolute objective or subjective workload levels. In addition, our EEG measure requires no task-specific or user-specific tuning; the absence of such tuning gives us hope that it may generalize more readily than a highly fit model of workload. Finally, rather than merely documenting a relationship between TAR and difficulty, we close the loop and focus on the effect of our neuro-adaptive training approach on training outcomes. The technology we developed has potentially wide relevance, as it can be applied to virtually any computer-based human learning experience. Although behavioral adaptive training has had notable success, including neural signals for adaptation provides additional insight. The benefits of our novel approach are broad within the context of interactive training (e.g., game-based cognitive training) and could allow for improved transfer performance, reduced training times, and reduced training costs.

Applications. Neuroadaptive technology can be used in many areas [54], such as decision support, human-robot teams, learning, and memory. We observed performance improvement in a stimulus recognition transfer task following CAT training in a go/no-go inhibitory control task. Future work should examine how robust this finding is to other training and transfer tasks, and how robust it is when compared to other, perhaps more sophisticated, forms of BAT. The general approach of incorporating TARP into adaptive training systems could be applied to existing adaptive training of perceptual or cognitive tasks in which blocks of practice are subject to an adaptive difficulty modification (e.g., tutoring, exercise, simulated training). Virtual reality (VR) based adaptive training could be a particularly fruitful area to explore, because it needs to satisfy similar constraints. Our system uses laboratory-grade EEG, but the same principle could be applied using consumer-grade EEG. The incorporation of EEG into VR headsets opens the door for combined adaptive training to be used in VR, which substantially expands the range of applications for our work. Whether adaptive training takes place with conventional or VR hardware, the addition of neural measures to such systems might result in better learning.

5.8 Study Limitations and Future Work

Study Limitations. Although we found a statistically significant improvement in transfer task accuracy following CAT training compared to BAT and SIFD, we acknowledge some limitations that could be addressed in future research. First, the fact that our BAT condition did not yield improved performance compared to the SIFD condition suggests that our BAT condition was not optimally designed; it may be that CAT is not superior to a better-designed BAT system, which remains to be tested. Similarly, the frequency of difficulty changes observed in the BAT paradigm may have led to decreased engagement on the part of the participants, and a BAT system designed for more stability might lead to better outcomes. We used a simple mapping function between performance and score for our BAT. Second, we used a research-grade, high-end headset rather than a consumer-grade, low-resolution EEG headset. Since we used whole-head TAR, the lower coverage and spatial resolution of consumer-grade EEG are unlikely to be a problem. Signal-to-noise ratio could be an issue, but recent work [108] showed that a consumer-grade dry EEG headset can produce results similar to a research-grade headset. In addition, our lab study was conducted under highly controlled conditions, including a sound-attenuated and light-attenuated room to block out distractions; performance might degrade in noisy, open environments. This could limit potential applications, but much independent learning takes place in classrooms and other relatively controlled environments.

Our sample size is similar to that of other recent BCI studies, which used eleven participants [62] and twelve participants [156], and we used a diverse participant pool recruited from outside a university setting. Our sample sizes are imbalanced across the training conditions: the primary imbalance arises from the larger number of SIFD participants (22) compared with the other two conditions (10 and 12). The data for the SIFD condition were collected as part of another, larger experiment and served as a convenient comparison for the two adaptive conditions. None of our statistical tests are biased by sample size, so this imbalance should not have any systematic effect on the observed outcomes.

5.9 Chapter Conclusion

We have explored a novel approach to designing adaptive training by incorporating both behavioral performance and TARP as the performance metrics feeding the adaptive logic. We evaluated this newly designed adaptive training using an abstract threat/non-threat training task and a contextualized transfer task. We found that participants trained with the CAT system had higher accuracy on the transfer task than those trained with the other two systems: CAT improved transfer accuracy by 6 percentage points relative to SIFD and by 9 percentage points relative to the BAT system. Future work will need to determine whether these results can be replicated with a revised behaviorally adaptive system, or whether these advantages are uniquely attributable to the inclusion of the neural measure.

Chapter 6

Conclusions

6.1 Thesis Summary

In this dissertation, we presented projects that utilize Brain-Computer Interfaces to enhance the security of web browsing and the privacy of personal devices, and to augment human performance. We developed a phishing detection system, an access control system, and a neuroadaptive training system using neural signals. In chapter 2, we explored the feasibility of using neural activity for automated phishing detection. In chapter 3, we explored the feasibility of utilizing the neural activity associated with granting permissions for access control. In chapter 5, we explored a novel approach to designing adaptive training by incorporating both behavioral performance and the percentage change in the theta/alpha ratio (TARP) as the performance metrics feeding the adaptive logic. We found that participants trained with the combined adaptive system were more accurate on the transfer task than those trained with the other two methods (non-adaptive and behaviorally adaptive designs).

6.2 Future Directions

Decoding Skepticism in Phishing Detection. In this future study, we intend to dissect the role of skepticism in phishing detection in a more systematic, theory-driven manner. Specifically, we want to test the hypothesis that visual differences between real and phishing websites result in neural activity differences that can be captured through electroencephalography (EEG). Second, we plan to delineate the signatures of neural activity associated with different visual components (e.g., security indicator, address bar, page content). We also intend to learn how the neural activity evolves as users process these visual components, which will enable us to understand how differences in each visual component affect the level of the skepticism signal. Finally, we plan to study the effectiveness of feedback in enhancing the skepticism signal and the final decision-making process. Overall, our goal is to strengthen current knowledge of phishing detection from a neuroscience standpoint.

The primary questions driving our research include: 1) Does the skepticism signal correlate with the task? 2) Does the skepticism signal correlate with visual attention? 3) Which visual components trigger skepticism?

Fusing fMRI with EEG. EEG has high temporal resolution, while fMRI has high spatial resolution. We plan to combine EEG and fMRI in one experiment to gain comprehensive insight into phishing detection by combining both temporal and spatial features.

Generalized Methodology. Our BCI-based methodology might apply more generally to other user-centered security tasks (e.g., malware warnings) or non-security tasks (e.g., distinguishing real from fake news, misinformation, disinformation, and augmenting human performance) in the human-computer interaction domain. Our extended framework might help create a new subfield of science, neurocybersecurity, in which neuroscientific principles, tools, methods, and theory are applied to cybersecurity problems.

6.3 Acknowledgement

This research was supported, in part, by the US Army Research Laboratory's Human Research and Engineering Directorate, NSF award CNS-1718997, and ONR under grant N00014-17-1-2893. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of ARL, NSF, or ONR.

Bibliography

[1] Emotiv EEG Headset. http://emotiv.com/.

[2] Emotiv pureeeg raw eeg software. https://www.emotiv.com/product/emotiv-pure-eeg/, 2017. Accessed: 5-17-2017.

[3] Mind-controlled robots: the factories of the future? https://www.youtube.com/watch?v=wXYvuhH_4Uw, 2018. Accessed: 02-10-2018.

[4] Greg Aarin and Rod Rasmussen. Global phishing survey 1h2014: Trends and domain name use. Technical Report 1H2014, APWG, 2014.

[5] Muhammad Kamil Abdullah, Khazaimatol S Subari, Justin Leo Cheang Loong, and Nurul Na- dia Ahmad. Analysis of effective channel placement for an eeg-based biometric system. In IEEE EMBS Conference, Biomedical Engineering and Sciences (IECBES), pages 303–306. IEEE, 2010.

[6] Alexa. Alexa. https://www.alexa.com/topsites/countries/US, 2017. Accessed: 10-17-2017.

[7] Mohammad H Alomari, Aya Samaha, and Khaled AlKamha. Automated classification of l/r hand movement eeg signals using advanced feature extraction and machine learning. arXiv preprint arXiv:1312.2877, 2013.

[8] Amazon.com, Inc. Alexa skill kit. https://developer.amazon.com/alexa-skills-kit, 2027.

[9] Joanne Anania. The effects of quality of instruction on the cognitive and affective learning of students. 1982.

[10] Corey Ashby, Amit Bhatia, Francesco Tenore, and Jacob Vogelstein. Low-cost electroencephalogram (eeg) based authentication. In 2011 5th International IEEE/EMBS Conference on Neural Engineering, pages 442–445. IEEE, 2011.

[11] H Aurlien, IO Gjerde, JH Aarseth, G Eldøen, B Karlsen, H Skeidsvoll, and NE Gilhus. Eeg background activity described by a large computerized database. Clinical Neurophysiology, 115(3):665–673, 2004.

109 [12] Louise Barkhuus and Anind K Dey. Location-based services for mobile telephony: a study of users’ privacy concerns. In International Conference on Human-Computer Interaction.

[13] bartleby. Brain lobes. https://www.bartleby.com/107/illus728.html, 2020. Accessed: 02-15-2020.

[14] Jeffrey M Beck, Wei Ji Ma, Roozbeh Kiani, Tim Hanks, Anne K Churchland, Jamie Roitman, Michael N Shadlen, Peter E Latham, and Alexandre Pouget. Probabilistic population codes for bayesian decision making. Neuron, 60(6):1142–1152, 2008.

[15] Maouia Bentlemsan, ET-Tahir Zemouri, Djamel Bouchaffra, Bahia Yahya-Zoubir, and Karim Ferroudji. Random forest and filter bank common spatial patterns for eeg-based motor imagery classification. In International Conference on Intelligent Systems, Modelling and Simulation (ISMS), 2014.

[16] Biosemi. Biosemi. https://www.biosemi.com/products.htm, 2020. Accessed: 02-15-2020.

[17] Niels Birbaumer, Nimr Ghanayim, Thilo Hinterberger, Iver Iversen, Boris Kotchoubey, Andrea Kübler, Juri Perelmouter, Edward Taub, and Herta Flor. A spelling device for the paralysed. Nature, 398(6725):297–298, 1999.

[18] Benjamin S Bloom. The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational researcher, 13(6):4–16, 1984.

[19] Tamara Bonaci, Ryan Calo, and Howard Jay Chizeck. App stores for the brain: Privacy & security in brain-computer interfaces. In IEEE International Symposium on Ethics in Science, Technology and Engineering, 2014.

[20] TLBMT Bonaci, J Herron, and HJ Chizeck. How susceptible is the brain to the side-channel private information extraction. American Journal of Bioethics, Neuroscience, 6(4), 2015.

[21] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.

[22] Leo Breiman. Random forests. Machine learning, 45(1):5–32, 2001.

[23] Ahier Brian. Neuralink, facebook, and kernel compete on direct brain-computer inter- face. https://www.linkedin.com/pulse/direct-brain-interface-brian-ahier, 2017.

[24] Dany Bright, Amrita Nair, Devashish Salvekar, and Swati Bhisikar. Eeg-based brain controlled prosthetic arm. In 2016 Conference on Advances in Signal Processing (CASP), pages 479–483. IEEE, 2016.

[25] Andrew Campbell, Tanzeem Choudhury, Shaohan Hu, Hong Lu, Matthew K Mukerjee, Mashfiqui Rabbi, and Rajeev DS Raizada. Neurophone: brain-mobile phone interface using a wireless eeg headset. In Proceedings of the second ACM SIGCOMM workshop on Networking, systems, and applications on mobile handhelds, pages 3–8. ACM, 2010.

110 [26] Bokai Cao, Chun-Ta Lu, Xiaokai Wei, S Yu Philip, and Alex D Leow. Semi-supervised tensor factorization for brain network analysis. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 17–32. Springer, 2016.

[27] Chris Chatfield. The analysis of time series: an introduction. CRC press, 2016.

[28] Stephen Chen. China is mining data directly from workers' brains on an industrial scale. http://www.scmp.com/news/china/society/article/2143899/forget-facebook-leak-china-mining-data-directly-workers-brains, 2018. Accessed: 04-30-2018.

[29] Teh-Chung Chen, Scott Dick, and James Miller. Detecting visually similar web pages: Application to phishing detection. ACM Transactions on Internet Technology (TOIT), 10(2):5, 2010.

[30] Peter A. Chew, Brett W. Bader, Tamara G. Kolda, and Ahmed Abdelali. Cross-language information retrieval using parafac2. In Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’07, pages 143–152. ACM, 2007.

[31] Michelene TH Chi. Two approaches to the study of experts’ characteristics. The Cambridge handbook of expertise and expert performance, pages 21–30, 2006.

[32] Younggeun Choi, Feng Qi, James Gordon, and Nicolas Schweighofer. Performance-based adaptive schedules enhance motor learning. Journal of motor behavior, 40(4):273–280, 2008.

[33] Weibo Chu, Bin B Zhu, Feng Xue, Xiaohong Guan, and Zhongmin Cai. Protect sensitive sites from phishing attacks using features extractable from inaccessible phishing urls. In Communications (ICC), 2013 IEEE International Conference on, pages 1990–1994. IEEE, 2013.

[34] John Chuang, Hamilton Nguyen, Charles Wang, and Benjamin Johnson. I think, therefore i am: Usability and security of authentication using brainwaves. In Financial Cryptography and Data Security, pages 1–16. Springer, 2013.

[35] A. Cichocki, Y. Washizawa, T. Rutkowski, H. Bakardjian, A. H. Phan, S. Choi, H. Lee, Q. Zhao, L. Zhang, and Y. Li. Noninvasive bcis: Multiway signal-processing array decompositions. Computer, 41(10):34–42, Oct 2008.

[36] Andrzej Cichocki, Danilo Mandic, Lieven De Lathauwer, Guoxu Zhou, Qibin Zhao, Cesar Caiafa, and Huy Anh Phan. Tensor decompositions for signal processing applications: From two-way to multiway component analysis. IEEE signal processing magazine, 32(2):145–163, 2015.

[37] Andrzej Cichocki, Yoshikazu Washizawa, Tomasz Rutkowski, Hovagim Bakardjian, Anh- Huy Phan, Seungjin Choi, Hyekyoung Lee, Qibin Zhao, Liqing Zhang, and Yuanqing Li. Noninvasive bcis: Multiway signal-processing array decompositions. Computer, 41(10), 2008.

111 [38] Adam R Clarke, Robert J Barry, Rory McCarthy, and Mark Selikowitz. Age and sex effects in the eeg: differences in two subtypes of attention-deficit/hyperactivity disorder. Clinical Neurophysiology, 112(5):815–826, 2001.

[39] Mike X Cohen. Analyzing neural time series data: theory and practice. MIT press, 2014.

[40] Fengyu Cong, Qiu-Hua Lin, Li-Dan Kuang, Xiao-Feng Gong, Piia Astikainen, and Tapani Ristaniemi. Tensor decomposition of eeg signals: a brief review. Journal of neuroscience methods, 248:59–69, 2015.

[41] Fengyu Cong, Qiu-Hua Lin, Li-Dan Kuang, Xiao-Feng Gong, Piia Astikainen, and Tapani Ristaniemi. Tensor decomposition of eeg signals: a brief review. Journal of neuroscience methods, 248:59–69, 2015.

[42] Ariane Cuenen, Ellen MM Jongen, Tom Brijs, Kris Brijs, Katrijn Houben, and Geert Wets. Effect of a working memory training on aspects of cognitive ability and driving ability of older drivers: Merits of an adaptive training over a non-adaptive training. Transportation research part F: traffic psychology and behaviour, 42:15–27, 2016.

[43] Max T Curran, Nick Merrill, John Chuang, and Swapan Gandhi. One-step, three-factor authentication in a single earpiece. In Proceedings of the 2017 ACM International Joint Conference on UBICOMP and ISWC, pages 21–24. ACM, 2017.

[44] Deepika Dasari, Guofa Shou, and Lei Ding. Ica-derived eeg correlates to mental fatigue, effort, and workload in a realistically simulated air traffic control task. Frontiers in neuroscience, 11:297, 2017.

[45] Jan C de Munck, Sonia I Gonçalves, R Mammoliti, Rob M Heethaar, and FH Lopes Da Silva. Interactions between different eeg frequency bands and their effect on alpha–fmri correlations. Neuroimage, 47(1):69–76, 2009.

[46] Stanislas Dehaene. Consciousness and the brain: Deciphering how the brain codes our thoughts. Penguin, 2014.

[47] Arnaud Delorme and Scott Makeig. Eeglab: an open source toolbox for analysis of single-trial eeg dynamics including independent component analysis. Journal of neuroscience methods, 134(1):9–21, 2004.

[48] Rachna Dhamija, J Doug Tygar, and Marti Hearst. Why phishing works. In Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 581–590. ACM, 2006.

[49] Tobias Egner and John H Gruzelier. The temporal dynamics of electroencephalographic responses to alpha/theta neurofeedback training in healthy subjects. Journal of Neurotherapy, 8(1):43–57, 2004.

[50] Tobias Egner, Emilie Strawson, and John H Gruzelier. Eeg signature and phenomenology of alpha/theta neurofeedback training versus mock feedback. Applied psychophysiology and biofeedback, 27(4):261, 2002.

[51] EmotivPro. Emotivpro. https://www.emotiv.com/product/emotivpro/, 2017. Accessed: 08-10-2017.

[52] Gidon Eshel. The yule walker equations for the ar coefficients. Internet resource, 2:68–73, 2003.

[53] Georg E Fabiani, Dennis J McFarland, Jonathan R Wolpaw, and Gert Pfurtscheller. Conversion of eeg activity into cursor movement by a brain-computer interface (bci). IEEE transactions on neural systems and rehabilitation engineering, 12(3):331–338, 2004.

[54] Magdalena Fafrowicz, Tadeusz Marek, Waldemar Karwowski, and Dylan Schmorrow. Neuroadaptive systems: Theory and applications. CRC Press, 2012.

[55] Adrienne Porter Felt, Serge Egelman, Matthew Finifter, Devdatta Akhawe, David Wagner, et al. How to ask for permission. 2012.

[56] Benjamin T Files, Kimberly A Pollard, Ashley H Oiknine, Antony D Passaro, and Peter Khooshabeh. Prevention focus relates to performance on a loss-framed inhibitory control task. Frontiers in psychology, 10:726, 2019.

[57] Kristin E Flegal, J Daniel Ragland, and Charan Ranganath. Adaptive task difficulty influences neural plasticity and transfer of training. NeuroImage, 188:111–121, 2019.

[58] Luay Fraiwan, Khaldon Lweesy, Natheer Khasawneh, Heinrich Wenz, and Hartmut Dickhaus. Automated sleep stage identification system based on time–frequency analysis of a single eeg channel and random forest classifier. Computer methods and programs in biomedicine, 108(1):10–19, 2012.

[59] Mario Frank, Tiffany Hwu, Sakshi Jain, Robert Knight, Ivan Martinovic, Prateek Mittal, Daniele Perito, and Dawn Song. Subliminal probing for private information via eeg-based bci devices. arXiv preprint arXiv:1312.6052, 2013.

[60] Mario Frank, Tiffany Hwu, Sakshi Jain, Robert T Knight, Ivan Martinovic, Prateek Mittal, Daniele Perito, Ivo Sluganovic, and Dawn Song. Using eeg-based bci devices to subliminally probe for private information. In Proceedings of the 2017 on Workshop on Privacy in the Electronic Society, pages 133–136. ACM, 2017.

[61] Yanick Fratantonio, Chenxiong Qian, Simon P Chung, and Wenke Lee. Cloak and dagger: From two permissions to complete control of the ui feedback loop. 2017.

[62] Lukas Gehrke, Sezen Akman, Pedro Lopes, Albert Chen, Avinash Kumar Singh, Hsiang-Ting Chen, Chin-Teng Lin, and Klaus Gramann. Detecting visuo-haptic mismatches in virtual reality using the prediction error negativity of event-related brain potentials. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, page 427. ACM, 2019.

[63] Alan Gevins, Michael E Smith, Harrison Leong, Linda McEvoy, Susan Whitfield, Robert Du, and Georgia Rush. Monitoring working memory load during computer-based tasks with eeg pattern recognition methods. Human factors, 40(1):79–91, 1998.

113 [64] Alan Gevins, Michael E Smith, Linda McEvoy, and Daphne Yu. High-resolution eeg mapping of cortical activation related to working memory: effects of task difficulty, type of processing, and practice. Cerebral cortex (New York, NY: 1991), 7(4):374–385, 1997.

[65] Germán Gómez-Herrero, Wim De Clercq, Haroon Anwar, Olga Kara, Karen Egiazarian, Sabine Van Huffel, and Wim Van Paesschen. Automatic removal of ocular artifacts in the eeg without an eog reference channel. In NORSIG, Signal Processing Symposium, pages 130–133. IEEE, 2006.

[66] Ian J. Goodfellow, Jonathon Shlens, and Christian Szegedy. Explaining and harnessing adversarial examples. In International Conference on Learning Representations (ICLR), 2015.

[67] Daniel Gopher, Beverly H Williges, Robert C Williges, and Diane L Damos. Varying the type and number of adaptive variables in continuous tracking. Journal of motor behavior, 7(3):159–170, 1975.

[68] Arthur C Graesser, Mark W Conley, and Andrew Olney. Intelligent tutoring systems. 2012.

[69] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H Witten. The weka data mining software: an update. ACM SIGKDD explorations newsletter, 11(1):10–18, 2009.

[70] Richard A Harshman. Foundations of the parafac procedure: Models and conditions for an" explanatory" multimodal factor analysis. 1970.

[71] Christoph S Herrmann, Stefan Rach, Johannes Vosskuhl, and Daniel Strüber. Time–frequency analysis of event-related potentials: a brief tutorial. Brain topography, 27(4):438–450, 2014.

[72] Amir Herzberg and Ahmad Jbara. Security and identification indicators for browsers against spoofing and phishing attacks. ACM Transactions on Internet Technology (TOIT), 8(4):16, 2008.

[73] Joyce C. Ho, Joydeep Ghosh, and Jimeng Sun. Marble: High-throughput phenotyping from electronic health records via sparse nonnegative tensor factorization. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 115–124. ACM, 2014.

[74] Maarten A Hogervorst, Anne-Marie Brouwer, and Jan BF Van Erp. Combining and comparing eeg, peripheral physiology and eye-related measures for the assessment of mental workload. Frontiers in neuroscience, 8:322, 2014.

[75] Joni Holmes, Susan E Gathercole, and Darren L Dunning. Adaptive training leads to sustained enhancement of poor working memory in children. Developmental science, 12(4):F9–F15, 2009.

[76] Lin-Shung Huang, Alexander Moshchuk, Helen J Wang, Stuart Schecter, and Collin Jackson. Clickjacking: Attacks and defenses. 2012.

114 [77] Mengfei Huang, Holly Bridge, Martin J Kemp, and Andrew J Parker. Human cortical activity evoked by the assignment of authenticity when viewing works of art. Frontiers in human neuroscience, 5:134, 2011.

[78] Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent component analysis, volume 46. John Wiley & Sons, 2004.

[79] Aapo Hyvärinen and Erkki Oja. Independent component analysis: algorithms and applications. Neural networks, 13(4):411–430, 2000.

[80] Internet security threat report. https://www.symantec.com/content/dam/symantec/docs/reports/gistr22-government-report.pdf, 2017. Accessed: 12-12-2017.

[81] Susanne M Jaeggi, Martin Buschkuehl, John Jonides, and Walter J Perrig. Improving fluid intelligence with training on working memory. Proceedings of the National Academy of Sciences, 105(19):6829–6833, 2008.

[82] Yeongjin Jang, Simon P Chung, Bryan D Payne, and Wenke Lee. Gyrus: A framework for user-intent monitoring of text-based networked applications. ISOC, 2014.

[83] Benjamin Johnson, Thomas Maillart, and John Chuang. My thoughts are not your thoughts. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing: Adjunct Publication, pages 1329–1338. ACM, 2014.

[84] Carrie A Joyce, Irina F Gorodnitsky, and Marta Kutas. Automatic removal of eye movement and blink artifacts from eeg data using blind component separation. Psychophysiology, 41(2):313–325, 2004.

[85] Valer Jurcak, Daisuke Tsuzuki, and Ippeita Dan. 10/20, 10/10, and 10/5 systems revisited: their validity as relative head-surface-based positioning systems. Neuroimage, 34(4):1600–1611, 2007.

[86] Julia Karbach, Tilo Strobach, and Torsten Schubert. Adaptive working-memory training benefits reading, but not mathematics in middle childhood. Child Neuropsychology, 21(3):285– 301, 2015.

[87] Charles R Kelley. What is adaptive training? Human Factors, 11(6):547–556, 1969.

[88] Bojan Kerous, Filip Skola, and Fotis Liarokapis. Eeg-based bci and video games: a progress report. Virtual Reality, 22(2):119–135, 2018.

[89] Henk A. L. Kiers, Jos M. F. ten Berge, and Rasmus Bro. Parafac2 - part i. a direct fitting algorithm for the parafac2 model. Journal of Chemometrics, 13:275–294.

[90] Torkel Klingberg, Elisabeth Fernell, Pernille J Olesen, Mats Johnson, Per Gustafsson, Kerstin Dahlström, Christopher G Gillberg, Hans Forssberg, and Helena Westerberg. Computerized training of working memory in children with adhd-a randomized, controlled trial. Journal of the American Academy of Child & Adolescent Psychiatry, 44(2):177–186, 2005.

115 [91] S Koehler, P Lauer, T Schreppel, C Jacob, M Heine, A Boreatti-Hümmer, AJ Fallgatter, and MJ Herrmann. Increased eeg power density in alpha and theta bands in adult adhd patients. Journal of neural transmission, 116(1):97–104, 2009.

[92] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM review, 51(3):455–500, 2009.

[93] Karl LaFleur, Kaitlin Cassady, Alexander Doud, Kaleb Shades, Eitan Rogin, and Bin He. Quadcopter control in three-dimensional space using a noninvasive motor imagery-based brain–computer interface. Journal of neural engineering, 10(4):046003, 2013.

[94] Charles-Francois Vincent Latchoumane, Francois-Benois Vialatte, Jaeseung Jeong, and An- drzej Cichocki. Eeg classification of mild and severe alzheimer’s disease using parallel factor analysis method. Advances in Electrical Engineering and Computational Science, pages 705–715, 2009.

[95] Anh Le, Athina Markopoulou, and Michalis Faloutsos. Phishdef: Url names say it all. In INFOCOM, 2011 Proceedings IEEE, pages 191–195. IEEE, 2011.

[96] S. le Cessie and J.C. van Houwelingen. Ridge estimators in logistic regression. Applied Statistics, 41(1):191–201, 1992.

[97] Ying Lean and Fu Shan. Brief review on physiological and biochemical evaluations of human mental workload. Human Factors and Ergonomics in Manufacturing & Service Industries, 22(3):177–187, 2012.

[98] Bin Liu, Mads Schaarup Andersen, Florian Schaub, Hazim Almuhimedi, SA Zhang, Nor- man Sadeh, Alessandro Acquisti, and Yuvraj Agarwal. Follow my recommendations: A personalized privacy assistant for mobile app permissions. 2016.

[99] Bin Liu, Jialiu Lin, and Norman Sadeh. Reconciling mobile app privacy and usability on smartphones: Could user privacy profiles help? 2014.

[100] Fabien Lotte, Marco Congedo, Anatole Lécuyer, Fabrice Lamarche, and Bruno Arnaldi. A review of classification algorithms for eeg-based brain–computer interfaces. Journal of neural engineering, 4(2):R1, 2007.

[101] Long Lu, Vinod Yegneswaran, Phillip Porras, and Wenke Lee. Blade: an attack-agnostic approach for preventing drive-by malware infections. pages 440–450. ACM, 2010.

[102] Steven J Luck. Ten simple rules for designing erp experiments. Event-related potentials: A methods handbook, 262083337, 2005.

[103] Justin Ma, Lawrence K Saul, Stefan Savage, and Geoffrey M Voelker. Beyond blacklists: learning to detect malicious web sites from suspicious urls. In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1245–1254. ACM, 2009.

116 [104] Justin Ma, Lawrence K Saul, Stefan Savage, and Geoffrey M Voelker. Identifying suspi- cious urls: an application of large-scale online learning. In Proceedings of the 26th annual international conference on machine learning, pages 681–688. ACM, 2009.

[105] Amir M Mané, JA Adams, and Emanuel Donchin. Adaptive and part-whole training in the acquisition of a complex perceptual-motor skill. Acta Psychologica, 71(1-3):179–196, 1989.

[106] Sebastien Marcel and José del R Millán. Person authentication using brainwaves (eeg) and maximum a posteriori model adaptation. IEEE transactions on pattern analysis and machine intelligence, 29(4), 2007.

[107] Samuel Marchal, Jérôme François, Radu State, and Thomas Engel. Phishstorm: Detecting phishing with streaming analytics. Network and Service Management, IEEE Transactions on, 11(4):458–471, 2014.

[108] Francesco Marini, Clement Lee, Johanna Wagner, Scott Makeig, and Mateusz Gola. A comparative evaluation of signal quality between a research-grade and a wireless dry-electrode mobile eeg system. Journal of neural engineering, 16(5):054001, 2019.

[109] Ivan Martinovic, Doug Davies, Mario Frank, Daniele Perito, Tomas Ros, and Dawn Song. On the feasibility of side-channel attacks with brain-computer interfaces. In Presented as part of the 21st USENIX Security Symposium (USENIX Security 12), pages 143–158, 2012.

[110] Urs Maurer, Silvia Brem, Martina Liechti, Stefano Maurizio, Lars Michels, and Daniel Brandeis. Frontal midline theta reflects individual task performance in a working memory task. Brain topography, 28(1):127–134, 2015.

[111] Ravi S Menon, Joseph S Gati, Bradley G Goodyear, David C Luknowsky, and Christopher G Thomas. Spatial and temporal resolution of functional magnetic resonance imaging. Biochem- istry and cell biology, 76(2-3):560–571, 1998.

[112] Nick Merrill and John Chuang. From scanning brains to reading minds: Talking to engineers about brain-computer interface. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, page 323. ACM, 2018.

[113] Jason S Metcalfe, Stephen M Gordon, Antony D Passaro, Bret Kellihan, and Kelvin S Oie. Towards a translational method for studying the influence of motivational and affective variables on performance during human-computer interactions. In International Conference on Augmented Cognition, pages 63–72. Springer, 2015.

[114] Kristopher Micinski, Daniel Votipka, Rock Stevens, Nikolaos Kofinas, Michelle L Mazurek, and Jeffrey S Foster. User interactions and permission use on android. 2017.

[115] Microsoft. Cortana skill kit. https://developer.microsoft.com/en-us/windows/projects/campaigns/cortana-skills-kit, 2017.

[116] Fabian Monrose and Aviel Rubin. Authentication via keystroke dynamics. In Proceedings of the 4th ACM conference on Computer and communications security, pages 48–56. ACM, 1997.

117 [117] Ajaya Neupane, Md Lutfor Rahman, and Nitesh Saxena. Peep: Passively eavesdropping private input via brainwave signals. In International Conference on Financial Cryptography and Data Security, 2017. [118] Ajaya Neupane, Md Lutfor Rahman, Nitesh Saxena, and Leanne Hirshfield. A multi-modal neuro-physiological study of phishing detection and malware warnings. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 479–491. ACM, 2015. [119] Ajaya Neupane, Nitesh Saxena, and Leanne Hirshfield. Neural underpinnings of website legitimacy and familiarity detection: An fnirs study. In Proceedings of the 26th International Conference on World Wide Web, pages 1571–1580. International World Wide Web Conferences Steering Committee, 2017. [120] Ajaya Neupane, Nitesh Saxena, Keya Kuruvilla, Michael Georgescu, and Rajesh Kana. Neural signatures of user-centered security: An fMRI study of phishing, and malware warnings. In Proceedings of the Network and Distributed System Security Symposium (NDSS), pages 1–16, 2014. [121] Ajaya Neupane, M.L. Rahman, and Nitesh Saxena. Peep: Passively eavesdropping private input via brainwave signals. In International Conference on Financial Cryptography and Data Security, pages 227–246. Springer, 2017. [122] Helen Nissenbaum. Privacy as contextual integrity. Wash. L. Rev., 79:119, 2004. [123] Katarzyna Olejnik, Italo Ivan Dacosta Petrocelli, Joana Catarina Soares Machado, Kévin Huguenin, Mohammad Emtiyaz Khan, and Jean-Pierre Hubaux. Smarper: Context-aware and automatic runtime-permissions for mobile devices. 2017. [124] Kaan Onarlioglu, William Robertson, and Engin Kirda. Overhaul: Input-driven access control for better privacy on traditional operating systems. 2016. [125] Akinari Onishi, Anh Huy Phan, Kiyotoshi Matsuoka, and Andrzej Cichocki. Tensor classifica- tion for p300-based brain computer interface. In Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on, pages 581–584. IEEE, 2012. [126] OpenPhish. Phishing url. https://openphish.com/feed.txt, 2017. Accessed: 05-10- 2017. [127] Ramaswamy Palaniappan. Electroencephalogram signals from imagined activities: A novel biometric identifier for a small population. In International Conference on Intelligent Data Engineering and Automated Learning. Springer, 2006. [128] Ramaswamy Palaniappan. Two-stage biometric authentication method using thought activity brain waves. International Journal of Neural Systems, 18(01):59–66, 2008. [129] Chethan Pandarinath, Paul Nuyujukian, Christine H Blabe, Brittany L Sorice, Jad Saab, Francis R Willett, Leigh R Hochberg, Krishna V Shenoy, and Jaimie M Henderson. High performance communication by people with paralysis using an intracortical brain-computer interface. Elife, 6:e18554, 2017.

118 [130] E. E. Papalexakis. Automatic Unsupervised Tensor Mining with Quality Assessment. ArXiv e-prints, March 2015. [131] Evangelos E. Papalexakis, Alona Fyshe, Nicholas D. Sidiropoulos, Partha Pratim Talukdar, Tom M. Mitchell, and Christos Faloutsos. Good-enough brain model: Challenges, algorithms and discoveries in multi-subject experiments. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, pages 95–104, New York, NY, USA, 2014. ACM. [132] Ok-choon Park and Jung Lee. Adaptive instructional systems. Educational Technology Research and Development, 25:651–684, 2003. [133] Ok-Choon Park and Robert D Tennyson. Adaptive design strategies for selecting number and presentation order of examples in coordinate concept acquisition. Journal of Educational Psychology, 72(3):362, 1980. [134] Ioakeim Perros, Evangelos E. Papalexakis, Fei Wang, Richard Vuduc, Elizabeth Searles, Michael Thompson, and Jimeng Sun. Spartan: Scalable parafac2 for large & sparse data. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’17, pages 375–384. ACM, 2017. [135] Giuseppe Petracca, Ahmad-Atamli Reineh, Yuqiong Sun, Jens Grossklags, and Trent Jaeger. Aware: Preventing abuse of privacy-sensitive sensors via operation bindings. 2017. [136] Phishtank. OpenDNS. Phishtank. http://www.phishtank.com/, 2017. [Online; accessed 19-May-2017]. [137] Adam Piore. U.s. to fund advanced brain-computer interfaces. https://www. technologyreview.com/s/608219, 2017. Accessed: 07-15-2017. [138] M Poulos, M Rangoussi, V Chrissikopoulos, and A Evangelou. Person identification based on parametric processing of the eeg. In IEEE International Conference, Electronics, Circuits and Systems, volume 1, pages 283–286. IEEE, 1999. [139] NA Press. The polygraph and lie detection. committee to review the scientific evidence of the polygraph.: Division of behavioral and social sciences and education, 2003. [140] Md Lutfor Rahman, Sharmistha Bardhan, Ajaya Neupane, Evangelos Papalexakis, and Chengyu Song. Learning tensor-based representations from brain-computer interface data for cybersecurity. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 389–404. Springer, 2018. [141] Md Lutfor Rahman, Sharmistha Bardhan, Ajaya Neupane, Evangelos Papalexakis, and Chengyu Song. Learning tensor-based representations from brain-computer interface data for cybersecurity. In Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (PKDD), 2018. [142] Md Lutfor Rahman, Ajaya Neupane, and Chengyu Song. Iac: On the feasibility of utilizing neural signals for access control. In Proceedings of the 34th Annual Computer Security Applications Conference, pages 641–652. ACM, 2018.

119 [143] Rijin Raju, Chenguang Yang, Chunxu Li, and Angelo Cangelosi. A video game design based on emotiv neuroheadset. In Advanced Robotics and Mechatronics (ICARM), pages 14–19. IEEE, 2016.

[144] Elaine M Raybourn. Applying simulation experience design methods to creating serious game-based adaptive training systems. Interacting with computers, 19(2):206–214, 2007.

[145] Joshua Raymond, Carolyn Varney, Lesley A Parkinson, and John H Gruzelier. The effects of alpha/theta neurofeedback on personality and mood. Cognitive brain research, 23(2-3):287– 292, 2005.

[146] Thomas S Redick, Zach Shipstead, Tyler L Harrison, Kenny L Hicks, David E Fried, David Z Hambrick, Michael J Kane, and Randall W Engle. No evidence of intelligence improve- ment after working memory training: a randomized, placebo-controlled study. Journal of Experimental Psychology: General, 142(2):359, 2013.

[147] CEA Report. https://www.whitehouse.gov/articles/cea-report-cost-malicious-cyber-activity-u-s-economy.

[148] Talia Ringer, Dan Grossman, and Franziska Roesner. Audacious: User-driven access control with unmodified operating systems. pages 204–216. ACM, 2016.

[149] Franziska Roesner and Tadayoshi Kohno. Securing embedded user interfaces: Android and beyond. pages 97–112. USENIX, 2013.

[150] Franziska Roesner, Tadayoshi Kohno, Alexander Moshchuk, Bryan Parno, Helen J Wang, and Crispin Cowan. User-driven access control: Rethinking permission granting in modern operating systems. pages 224–238. IEEE, 2012.

[151] Cristóbal Romero, Sebastián Ventura, Eva L Gibaja, Cesar Hervás, and Francisco Romero. Web-based adaptive training simulator system for cardiac life support. Artificial Intelligence in Medicine, 38(1):67–78, 2006.

[152] Andres F Salazar-Gomez, Joseph DelPreto, Stephanie Gil, Frank H Guenther, and Daniela Rus. Correcting robot mistakes in real time using eeg signals. In IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2017.

[153] Stuart E Schechter, Rachna Dhamija, Andy Ozment, and Ian Fischer. The emperor’s new security indicators. In IEEE Symposium on Security and Privacy, pages 51–65. IEEE, 2007.

[154] Christina Schneegass, Thomas Kosch, Albrecht Schmidt, and Heinrich Hussmann. Investigat- ing the potential of eeg for implicit detection of unknown words for foreign language learning. In IFIP Conference on Human-Computer Interaction, pages 293–313. Springer, 2019.

[155] ECMS School. Brain lobes functions. https://sites.google.com/a/sudbury.k12.ma.us/ecms-school-counseling/the-middle-school-student/Yourchildsbrain/brain-parts-and-functions, 2020. Accessed: 02-15-2020.

120 [156] Nathan Arthur Semertzidis, Betty Sargeant, Justin Dwyer, Florian Floyd Mueller, and Fabio Zambetta. Towards understanding the design of positive pre-sleep through a neurofeedback artistic experience. In Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems, page 574. ACM, 2019. [157] Liu Shanhong. Size of the cybersecurity market worldwide, from 2017 to 2023. https://www.statista.com/statistics/595182/ worldwide-security-as-a-service-market-size, October 2019. [158] Internet Fact Sheet. internet-broadband. http://www.pewinternet.org/fact-sheet/ internet-broadband/, 2017. Accessed: 05-10-2017. [159] Steve Sheng, Mandy Holbrook, Ponnurangam Kumaraguru, Lorrie Faith Cranor, and Julie Downs. Who falls for phish?: a demographic analysis of phishing susceptibility and effect- iveness of interventions. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 373–382. ACM, 2010. [160] Steve Sheng, Brad Wardman, Gary Warner, Lorrie Faith Cranor, Jason Hong, and Chengshan Zhang. An empirical analysis of phishing blacklists. 2009. [161] Y Shoji, Chanakya Reddy Patti, and Dean Cvetkovic. Electroencephalographic neurofeedback to up-regulate frontal theta rhythms: Preliminary results. In 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 1425– 1428. IEEE, 2017. [162] Michael E Smith and Alan Gevins. Neurophysiologic monitoring of mental workload and fatigue during operation of a flight simulator. In Biomonitoring for Physiological and Cognitive Performance during Military Operations, volume 5797, pages 116–126. International Society for Optics and Photonics, 2005. [163] Marc Stiegler, Alan H Karp, Ka-Ping Yee, Tyler Close, and Mark S Miller. Polaris: virus-safe computing for windows xp. Communications of the ACM, 49(9):83–88, 2006. [164] Md Sohel Parvez Sumon. First man with two mind-controlled prosthetic limbs. Bangladesh Medical Journal, 44(1):59–60, 2016. [165] Joshua Sunshine, Serge Egelman, Hazim Almuhimedi, Neha Atri, and Lorrie Faith Cranor. Crying wolf: An empirical study of ssl warning effectiveness. In USENIX Security Symposium, pages 399–416, 2009. [166] Shravani Sur and VK Sinha. Event-related potential: An overview. Industrial psychiatry journal, 18(1):70, 2009. [167] Desney Tan and Anton Nijholt. Brain-computer interfaces and human-computer interaction. In Brain-Computer Interfaces, pages 3–19. Springer, 2010. [168] Grant Taylor, Lauren Reinerman-Jones, Keryl Cosenzo, and Denise Nicholson. Comparison of multiple physiological sensors to classify operator state in adaptive automation systems. In Proceedings of the human factors and ergonomics society annual meeting, volume 54, pages 195–199. Sage Publications Sage CA: , CA, 2010.

121 [169] Robert D Tennyson. Use of adaptive information for advisement in learning concepts and rules using computer-assisted instruction. American Educational Research Journal, 18(4):425–438, 1981.

[170] Robert D Tennyson and Wolfgang Rothen. Pretask and on-task adaptive design strategies for selecting number of instances in concept acquisition. Journal of Educational Psychology, 69(5):586, 1977.

[171] M.L. Rahman, Ajaya Neupane, Sharmistha Bardhan, Evangelos Papalexakis, and Chengyu Song. Phishing detection based on neural signals: An eeg study. Under Submission.

[172] M.L. Rahman, Benjamin T Files, Antony D. Passaro, Peter Khooshabeh, Ashley H. Oiknine, Kimberly Pollard, and Chengyu Song. Cat beats bat: Combining behavioral performance and neural signals to improve adaptive training. Under Review.

[173] Kurt Thomas, Chris Grier, Justin Ma, Vern Paxson, and Dawn Song. Design and evaluation of a real-time url spam filtering service. In Security and Privacy (SP), 2011 IEEE Symposium on, pages 447–462. IEEE, 2011.

[174] Julie Thorpe, Paul C van Oorschot, and Anil Somayaji. Pass-thoughts: authenticating with our minds. In Proceedings of the 2005 workshop on New security paradigms, pages 45–56. ACM, 2005.

[175] M Ungureanu, C Bigan, R Strungaru, and V Lazarescu. Independent component analysis applied in biomedical signal processing. Measurement Science Review, 4(2):18, 2004.

[176] Jeroen JG Van Merrienboer and John Sweller. Cognitive load theory and complex learning: Recent developments and future directions. Educational psychology review, 17(2):147–177, 2005.

[177] Yichen Wang, Robert Chen, Joydeep Ghosh, Joshua C. Denny, Abel Kho, You Chen, Brad- ley A. Malin, and Jimeng Sun. Rubik: Knowledge guided tensor factorization and completion for health data analytics. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’15, pages 1265–1274. ACM, 2015.

[178] Colin Whittaker, Brian Ryner, and Marria Nazif. Large-scale automatic classification of phishing pages. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2015.

[179] Primal Wijesekera, Arjun Baokar, Ashkan Hosseini, Serge Egelman, David Wagner, and Konstantin Beznosov. Android permissions remystified: A field study on contextual integrity. volume 15. USENIX, 2015.

[180] Primal Wijesekera, Arjun Baokar, Lynn Tsai, Joel Reardon, Serge Egelman, David Wagner, and Konstantin Beznosov. The feasibility of dynamically granted permissions: Aligning mobile privacy with user preferences. In 2017 IEEE Symposium on Security and Privacy (SP), pages 1077–1093. IEEE, 2017.

[181] wikipedia. Autoregressive model. https://en.wikipedia.org/wiki/Autoregressive_model, 2020. Accessed: 02-15-2020.

[182] wikipedia. Eeg. https://en.wikipedia.org/wiki/Electroencephalography, 2020. Accessed: 02-15-2020.

[183] wikipedia. fmri. https://en.wikipedia.org/wiki/Functional_magnetic_resonance_imaging, 2020. Accessed: 02-15-2020.

[184] Min Wu, Robert C Miller, and Simson L Garfinkel. Do security toolbars actually prevent phishing attacks? In Proceedings of the SIGCHI conference on Human Factors in computing systems, pages 601–610. ACM, 2006.

[185] Guang Xiang and Jason I Hong. A hybrid phish detection approach by identity discovery and keywords retrieval. In Proceedings of the 18th international conference on World wide web, pages 571–580. ACM, 2009.

[186] Zhemin Yang, Min Yang, Yuan Zhang, Guofei Gu, Peng Ning, and X Sean Wang. Appintent: Analyzing sensitive data transmission in android for privacy leakage detection. pages 1043– 1054. ACM, 2013.

[187] Yue Zhang, Jason I Hong, and Lorrie F Cranor. Cantina: a content-based approach to detecting phishing web sites. In Proceedings of the 16th international conference on World Wide Web, pages 639–648. ACM, 2007.

[188] Qinglin Zhao, Hong Peng, Bin Hu, Quanying Liu, Li Liu, YanBing Qi, and Lanlan Li. Improving individual identification in security check with an eeg based biometric solution. In International Conference on Brain Informatics, pages 145–155. Springer, 2010.
