A Wearable Virtual Reality Stress Laboratory Based on the Electrocardiogram


According to the World Health Organization, stress-related problems and diseases will be one of the major healthcare problems in this decade. Prolonged exposure to stress has been found to reduce immune system functioning and promote secondary diseases like hypertension, arrhythmias, and stroke. In order to combat this development, the relatively young field of stress research needs tools to assess stress levels and reactivity parameters in large populations to advance research, especially with respect to work-related stress management. Existing stress research methods like the Trier Social Stress Test (TSST) are unsuitable for such endeavors, as they cannot easily be applied at scale. This work introduces a wearable stress laboratory approach that is usable in the wild. Its prototypical implementation, from the wearable-based biosignal recording device up to the virtual reality-enhanced mobile stress recognition toolbox, can be used to assess and classify stress and the stress response of users within minutes at scale, without requiring complicated and expensive laboratory evaluations.

Stefan Gradl


FAU University Press, 2020. ISBN 978-3-96147-384-7

Stefan Gradl

The Stroop Room

A Wearable Virtual Reality Stress Laboratory Based on the Electrocardiogram

FAU Studien aus der Informatik

Volume 14

Series editors: Björn Eskofier, Richard Lenz, Andreas Maier, Michael Philippsen, Lutz Schröder, Wolfgang Schröder-Preikschat, Marc Stamminger, Rolf Wanka


Erlangen FAU University Press 2020

Bibliographic information of the Deutsche Nationalbibliothek: The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at http://dnb.d-nb.de.

This work, including its parts, is protected by copyright. The rights to all content lie with the respective authors. The content may be used under the Creative Commons license BY.

The complete content of the book is available as a PDF via the OPUS server of Friedrich-Alexander-Universität Erlangen-Nürnberg: https://opus4.kobv.de/opus4-fau/home

Please cite as: Gradl, Stefan. 2020. The Stroop Room. A Wearable Virtual Reality Stress Laboratory Based on the Electrocardiogram. FAU Studien aus der Informatik Band 14. Erlangen: FAU University Press. DOI: 10.25593/978-3-96147-385-4

Publisher and distribution: FAU University Press, Universitätsstraße 4, 91054 Erlangen

Printing: docupoint GmbH

ISBN: 978-3-96147-384-7 (print edition)
eISBN: 978-3-96147-385-4 (online edition)
ISSN: 2509-9981
DOI: 10.25593/978-3-96147-385-4

The Stroop Room
A Wearable Virtual Reality Stress Laboratory Based on the Electrocardiogram

Der Stroop Room Ein tragbares Stresslabor in Virtueller Realität auf Basis des Elektrokardiogramms

Submitted to the Faculty of Engineering (Technische Fakultät) of Friedrich-Alexander-Universität Erlangen-Nürnberg

for the degree of Dr.-Ing.

by

Stefan Gradl, M.Sc., from Kleinsendelbach

Approved as a dissertation by the Faculty of Engineering of Friedrich-Alexander-Universität Erlangen-Nürnberg

Date of the oral examination: 30 September 2020

Chair of the doctoral committee: Prof. Dr.-Ing. habil. Andreas Paul Fröba

Reviewers: Prof. Dr. Björn Eskofier, Prof. Dr. Elisabeth André

Abstract

According to the World Health Organization, stress-related problems and diseases will be one of the major healthcare problems in this decade. That applies particularly to work-related stress, which became ubiquitous even before the COVID-19 crisis and now, with the additional unprecedented disruptions to work life everywhere, will lead to even more demand for research. Prolonged exposure to stress has been found to reduce immune system functioning and promote secondary diseases like hypertension, arrhythmias, and stroke, all of which are still among the leading causes of death in the world.

In order to combat this development, the relatively young field of stress research needs tools to assess stress levels and reactivity parameters in large populations to gain useful insights, especially with respect to work-related stress. Research experiments in this domain need to be done efficiently with many subjects inside the stressful environment (in situ/in the wild). Yet existing gold standard stress methods like the Trier Social Stress Test (TSST) or the Maastricht Acute Stress Test (MAST) are unsuitable for such endeavors, as they cannot easily be applied in the wild. They require expert supervision and acting elements inside a controlled laboratory setup to provoke robust social stress with associated cortisol responses. These responses can then only be identified using blood or saliva sampling and an elaborate offline follow-up evaluation process that usually requires several days to deliver results.

This thesis presents a possible solution strategy by introducing a wearable stress laboratory approach that is usable in the wild. It provides a prototypical pipeline, from a wearable-based biosignal recording device up to a virtual reality-enhanced mobile stress recognition toolbox, to assess and classify stress and the stress response of users within minutes and without requiring saliva or blood sampling and their associated complicated and expensive laboratory evaluations. The individual components of this work are: (1) an everywhere-usable wearable electrocardiography (ECG) device with Bluetooth transmission capabilities; (2) a mobile device application to record and analyze ECG signal data in quasi real-time on smartphones or smartwatches, in particular for the purpose of automated heart rate variability (HRV) and arrhythmia determination; (3) an algorithm to refine R-peak detections for maximally accurate HRV values; (4) a cognitive stressor application for mobile virtual reality headsets, based on the Stroop effect, to induce stress using a new Stroop response paradigm; and (5) a machine learning approach to automatically classify stress reactivity parameters.

It is prototypically demonstrated that all individual parts of the system can be connected seamlessly and provide an end-to-end pipeline for non-expert users in the future. Using this thesis as a blueprint, it is feasible at this point to put all these components together using off-the-shelf devices available to consumers. All algorithms developed in this work are open-sourced on GitHub and available for inspection and reuse. Extensive evaluation of the components has been conducted and shows that heart rate detection accuracy exceeds 99 % on mobile devices, peak refinement improves classical R-peak detectors by up to 50 %, and the stress reactivity classifier recognizes the correct habituation or non-habituation in a test population with a precision and recall of 96 % (F1 score of 0.956).

Furthermore, three distinct datasets were collected and are presented in detail in this work. They show how all important HRV parameters behave in different stress conditions. These include subjects in a completely relaxed condition (N=12), i.e. floating in warm water, subjects under mild peripheral cognitive demand and in the condition of virtual reality immersion (N=14), and a third dataset with participants being subjected to severe cognitive stress in a very demanding Stroop Test related virtual reality scenario, the Stroop Room (N=71). Results for these datasets show that stress reactions differ between men and women and that the heart rate and the HRV PNN50 parameter seem to be best suited for a separation of short-term cognitive stress. Four ultra-short-term HRV features are presented that, in addition to others, allow a classification of stress habituation based solely on HRV and Stroop Room reaction parameters.

The whole work demonstrates that stress magnitude recognition on mobile wearable and smart devices is feasible using HRV parameters derived from an ECG and that in particular the Stroop Room can be used to recognize stress habituation parameters. Overall, this provides a wearable stress laboratory usable with minimal supervisory effort. As a first step, it can be employed immediately for participant screening in scientific gold standard experiments to increase efficiency and optimize information gain by allowing a pre-estimation of the expected responder type distribution in the study population. In the long run, it could enable the setup of an in-the-wild stress-disease recognition system with automated recommendation abilities to inform possible end users about the necessity to seek medical aid or guidance and help reduce the prevalence of long-term stress maladies like burnout or depression.

Kurzdarstellung (summary)

According to the World Health Organization, stress-related problems and diseases will be among the greatest healthcare problems of this decade. This applies in particular to work-related stress, which had already become ubiquitous before the COVID-19 crisis and which now, with the additional, unprecedented disruptions to working life everywhere, will lead to an even greater need for research. Prolonged exposure to stress has been shown to impair immune system function and to promote secondary diseases such as hypertension, cardiac arrhythmias, and stroke. All of these remain the most frequent causes of death in the world.

To counter this development, the relatively young field of stress research needs instruments to assess stress levels and reactivity parameters in large population groups in order to gain useful insights, particularly with regard to work-related stress. To this end, it must become possible to conduct research experiments efficiently with many participants in the actual stress environment (in situ).

However, existing laboratory stressors such as the Trier Social Stress Test (TSST) or the Maastricht Acute Stress Test (MAST) are unsuitable for such efforts because they cannot easily be carried out outside the laboratory. They require expert supervision and acting elements in a controlled laboratory environment to provoke robust social stress with the associated cortisol responses. These responses can then only be identified from blood or saliva samples, which must be evaluated in an elaborate offline process that delivers results only after days.

This thesis presents a possible solution strategy by introducing a wearable stress laboratory approach that can be used in the wild. It provides a prototypical pipeline from a wearable biosignal recording device to a virtual reality-enhanced mobile stress recognition toolbox for assessing and classifying stress and the stress response of users within minutes, without requiring saliva or blood samples and the associated extensive laboratory evaluations.

The individual components of this work are: (1) a universally usable, wearable electrocardiography (ECG) device with Bluetooth transmission capabilities; (2) a mobile app to record and analyze ECG signal data in quasi real-time on smartphones or smartwatches, in particular for the automated determination of heart rate variability (HRV) and arrhythmias; (3) an algorithm to refine R-peak detection for maximally accurate HRV values; (4) a cognitive stressor application for mobile virtual reality headsets, based on the Stroop effect, that induces stress using a new Stroop response paradigm; and finally (5) a machine learning approach to classify stress reactivity parameters.

It is prototypically demonstrated that all individual parts of the system can be connected seamlessly and form an end-to-end pipeline that could in the future be used even by laypersons. With this thesis as a template, it is feasible at this point to assemble all these components from standard devices available to consumers. All algorithms developed in this work are available on GitHub as open source implementations and can be inspected and reused.

A comprehensive evaluation of all individual components was carried out. It shows that the accuracy of heart rate detection on mobile devices is above 99 %, that peak refinement improves classical R-peak detectors by up to 50 %, and that the stress reactivity classifier recognizes the correct habituation or non-habituation in a test population with a precision and recall of 96 % (F1 score of 0.956).

In addition, three different datasets were collected. They are presented in detail and show how all important HRV parameters behave under different stress conditions. These include subjects in a completely relaxed state (N=12), i.e. floating in warm water, subjects under mild peripheral cognitive demand and in the state of virtual reality immersion (N=14), and a third dataset with subjects exposed to severe cognitive stress in a very demanding Stroop Test related virtual reality scenario, the Stroop Room (N=71).

The results for these datasets show that stress reactions differ between men and women and that the heart rate and the HRV PNN50 parameter appear to be best suited for differentiating short-term cognitive stress. Four new HRV parameters for ultra-short time windows are presented that, together with further parameters, allow a classification of stress habituation based on HRV and Stroop Room reaction parameters.

The work as a whole shows that recognizing the stress state on mobile wearable and smart devices is possible using HRV parameters derived from an ECG and that in particular the Stroop Room can be used to recognize stress habituation parameters.

Overall, this provides a wearable stress laboratory that can be used with minimal supervisory effort. As a first step, it can be employed for participant screening in scientific gold standard experiments to optimize efficiency and information gain by enabling a pre-estimation of the distribution of expected responder types in the study population. In the long run, it could enable the setup of a recognition system for stress-related diseases in everyday life with automated recommendation abilities, informing possible end users about the necessity to seek medical aid or guidance and helping to reduce the prevalence of long-term stress maladies like burnout or depression.

Acknowledgements

This thesis is the finalization and achievement of a personal goal that, alongside the demons of my pathological condition, is a feat I did not believe I could accomplish. Without the immense altruistic support of some people, I might not have reached it.

Thank you Carina, Hanna, Jörg, Kai, Kalle, Manu, Markus, Paul, and Yun for your unfaltering friendships, open ear to all my problems, saving my life, more than half a life of companionship, in the real world and beyond, understanding, sympathy, or also just your love.

I also want to thank my family, especially my parents for believing in and supporting me. Bina for providing nutrition on the last meters. My dad, for always being there for me.

Thank you Björn and Kurtl, for giving me the chance to achieve this. Without both of your foundation stones ten and eight years ago, this would not have happened. I would be somewhere/-one completely different by now.

Furthermore, I want to thank the following people who helped me along the way with either small or large things, or just with their friendship, and are thus a part of this achievement: Andrea, Ben, Christine, Christian, Cristian, Daniel, Dominik, Eva, Heike, Julia, Malte, Michael, Nora, Patrick, Robert, Tomek, Tobias, Ulf, Veronika, Vivien, and Wolle.

Thanks to Elisabeth, Nic, and Anne for instantly agreeing to review this thesis and/or being part of my defense committee. And once more, big thanks to Ben, Björn, Daniel, Eva, and Markus for providing so many valuable suggestions for my thesis.

Last, and most importantly, I want to thank my two wonderful kids. Nora and Jan, I will eternally love you both, you are the light of my life, thank you simply for being.

Contents

List of Symbols and Abbreviations
List of Figures
List of Tables

I Introduction
1 Motivation
  1.1 Stress in our Modern Society
  1.2 Contributions
  1.3 Overview of this Thesis

II Background & State-of-the-Art
2 Fundamentals
  2.1 Human Stress
    2.1.1 Psychological Stress and Its Manifestation in Biosignals
    2.1.2 Stress Manifestation in the Electrodermal Activity
    2.1.3 Stress Manifestations in Other Biosignals
  2.2 The Electrocardiogram
    2.2.1 The Electric Heart Cycle
    2.2.2 The History of the Electrocardiogram
    2.2.3 Electrocardiography Measures
    2.2.4 Stress Manifestation in the Electrocardiogram
  2.3 Virtual Reality
    2.3.1 The History of Extended Realities
    2.3.2 Today’s Virtual Reality, Presence & Immersion
    2.3.3 Hardware Systems for Immersive Virtual Environments
    2.3.4 Software to Create Immersive Virtual Environments
  2.4 Synopsis of the Fundamentals
3 Related Work
  3.1 Mobile/Wearable Electrocardiography and Arrhythmia Classification
    3.1.1 The Apple Watch as an Example
    3.1.2 Market Overview
    3.1.3 Algorithms
  3.2 Wearable-Based Stress Detection
  3.3 In Vivo and Virtual Reality Stressors
    3.3.1 Trier Social Stress Test
    3.3.2 Cold Pressor Test
    3.3.3 Maastricht Acute Stress Test
    3.3.4 Montreal Imaging Stress Task
    3.3.5 Stroop Test
    3.3.6 Synopsis of the Related Work

III Components for Electrocardiogram-Based Wearable Stress Classification
4 An Everywhere-Usable Electrocardiography System
  4.1 Platform and Current-Based Circuit Design
    4.1.1 Transimpedance Amplifier
    4.1.2 Signal Processing
    4.1.3 Power Supply and Housing
  4.2 Circuit Validation Study
  4.3 Heart Rate Variability Characteristics in Relaxed Environments
    4.3.1 Subjects
    4.3.2 Method
    4.3.3 Results
  4.4 Discussion
5 Real-Time QRS Detection and Classification on Mobile Devices
  5.1 Energy Efficient Detection and Classification of QRS Complexes
    5.1.1 QRS detection
    5.1.2 Template Formation and Adaptation
    5.1.3 Feature Extraction
    5.1.4 Beat Classification
  5.2 Implementation
  5.3 Mobile-Phone Based Evaluation
  5.4 Results
  5.5 Discussion
6 Refinement of R-Peak Detections
  6.1 Beat Slackness
  6.2 Beat Slackness in Existing QRS Detectors
  6.3 R-peak Refinement Algorithm
  6.4 Implementation and Results
    6.4.1 Thresholds
    6.4.2 Slackness Results
  6.5 Discussion
7 Biosignal Awareness in Virtual Environments
  7.1 Methods
    7.1.1 Visualizations
    7.1.2 Experimental Design
    7.1.3 Participants
    7.1.4 Hardware and Software
  7.2 Evaluation and Results
    7.2.1 Biofeedback Tendencies
    7.2.2 Estimation Task
    7.2.3 HRV Analysis
    7.2.4 User Experience
  7.3 Discussion

IV The Stroop Room – A Novel Tool for Stress Research
8 The Stroop Room
  8.1 Topology and Interaction Concept
  8.2 Trials
  8.3 Pressure Conditions
  8.4 Parameters
  8.5 Implementation
9 Validation of the Stroop Room
  9.1 Participants
  9.2 Questionnaires
  9.3 Measures of Physiology
  9.4 Procedure
  9.5 Signal Processing
  9.6 Results
  9.7 Discussion
10 The Stroop Room Classifier
  10.1 Stress Habituation
  10.2 Data Preprocessing
  10.3 Calculation and Selection of Features
  10.4 Training and Evaluation of the Classifier
  10.5 Results
  10.6 Discussion

V Perspective
11 Summary
  11.1 Discussion of Contributions
  11.2 From Prototype to Continuously Integrated System
12 Conclusions & Outlook
Bibliography

List of Symbols and Abbreviations

Abbreviation   Description

ADC            analog-to-digital converter
ANS            autonomic nervous system
AR             augmented reality
bpm            beats per minute
CPT            Cold Pressor Test
CSV            comma-separated values
ECG            electrocardiogram
EDA            electrodermal activity
FAU            Friedrich-Alexander-Universität Erlangen-Nürnberg
HMD            head-mounted display
HPA axis       hypothalamic-pituitary-adrenocortical axis
HR             heart rate
HRV            heart rate variability
ICD            International Classification of Diseases
IMU            inertial measurement unit
JELY           Java ECG Library
NS-SCR         non-specific skin conductance response
PNS            parasympathetic nervous system
PPG            photoplethysmography/-gram
RSA            respiratory sinus arrhythmia
SCL            skin conductance level
SCR            skin conductance response
SCWT           Stroop Color Word Test
SNR            signal-to-noise ratio
SNS            sympathetic nervous system
TSST           Trier Social Stress Test
VE             virtual environment
VR             virtual reality
XR             extended reality

Symbol          Unit             Description

$d_{iB}$        fraction/ratio   change of a metric to its baseline measurement for subject $i$
$\bar{d}_B$     fraction/ratio   mean of the individual changes of a metric to their baseline measurements
$p_{N50}$       %                percentage of normal heartbeats with high variability (PNN50)
$t_{RSD}$       ms               root mean squared successive differences of $RR$-intervals (RMSSD)
$T_{RR}$        ms or s          time between successive heartbeats ($RR$-interval)
$\bar{T}_{RR}$  ms or s          average of $T_{RR}$ over a certain number of heartbeats (AVNN)
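The three HRV symbols listed above can be illustrated with a minimal Python sketch. It assumes the conventional textbook definitions: AVNN as the mean RR-interval, RMSSD as the root mean square of successive RR differences, and PNN50 as the percentage of successive differences larger than 50 ms. The function names and the sample RR series are hypothetical and are not taken from the thesis or its published code.

```python
import math


def avnn(rr_ms):
    """Mean RR-interval (AVNN) in milliseconds."""
    return sum(rr_ms) / len(rr_ms)


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (RMSSD) in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(rr_ms, threshold_ms=50.0):
    """Percentage of successive RR differences exceeding threshold_ms (PNN50)."""
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    return 100.0 * sum(d > threshold_ms for d in diffs) / len(diffs)


# Hypothetical RR-intervals in milliseconds, for illustration only.
rr = [800.0, 810.0, 870.0, 860.0, 790.0]
print(avnn(rr), rmssd(rr), pnn50(rr))
```

All three functions expect at least two RR-intervals; a real pipeline would additionally remove ectopic beats and artifacts before computing these values.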

List of Figures

1.1 Increase in the percentage of stress publications in Scopus
1.2 The general adaptation syndrome
1.3 Overview of the wearable stress laboratory

2.1 Electrical anatomy of a single heartbeat
2.2 A modern virtual reality headset

3.1 ECG recording on an Apple Watch
3.2 Apple Watch ECG signal characteristics
3.3 The TSST in vivo and in virtual reality
3.4 The Stroop stimuli
3.5 Ridley Stroop’s Stroop effect graph

4.1 Underwater ECG sensor overview
4.2 Block diagram of current-based ECG circuit
4.3 Transimpedance amplifier
4.4 Dry and submerged ECG measurements
4.5 Measured ECG signals and their PSD
4.6 Study procedure for each subject
4.7 Positions of the subjects during the study
4.8 Distribution plots of relative changes in HRV

5.1 Algorithm overview and detailed decision tree for heartbeat detection and classification
5.2 Process of automated QRS complex template selection
5.3 Features used for beat classification
5.4 Hearty framework and outsourced components
5.5 Screenshot showing Hearty’s main interface

6.1 Differences between detected and annotated R-peaks
6.2 Location definitions for the Peak Refinement Algorithm
6.3 Peak refinement with the slackness reduction algorithm

7.1 Reference and Radial Visualizations
7.2 ScreenPulse Visualization
7.3 CubeGrid Visualization
7.4 Results of heart rate estimations
7.5 Distribution plots of relative changes in HRV
7.6 Results of the AttrakDiff questionnaire

8.1 The wall layout of the Stroop Room
8.2 The HUD of the Stroop Room
8.3 View of the user during a trial in the Stroop Room
8.4 A participant inside the Stroop Room
8.5 Shrinking target areas in the Stroop Room
8.6 The different conditions in the Stroop Room

9.1 Devices used to record the biosignals in the Stroop Room
9.2 The procedure used for the evaluation study
9.3 The overall relative changes in the HRV metrics
9.4 Distributions of relative average changes in HR
9.5 Distribution plots of relative changes in HRV
9.6 Distribution plots of performance changes
9.7 Comparison of the Stroop interference effect
9.8 Comparison of Stroop Room results with other work
9.9 EDA changes in the Stroop Room
9.10 α-amylase and cortisol changes in the Stroop Room
9.11 Results of the Flow State Scale

10.1 Feature space overview
10.2 Feature distributions

11.1 Comparison of HRV changes across all datasets

List of Tables

2.1 Electrodermal measures, definitions, and typical values
2.2 Overview of important ECG derived parameters
2.3 Overview of scientifically usable 3D engines for VR

3.1 Overview of wearable ECG devices
3.2 Overview of most recent biosignal-tracking wearables

4.1 HRV parameters as a mean over all subjects

5.1 Evaluation results of the algorithm for all processed beats

6.1 Slackness results

7.1 HRV parameters as a mean over all subjects

9.1 Result parameters as a mean over all subjects
9.2 The results of the Flow State Scale (FSS)

10.1 Overview of the features in the Stroop Room classifier
10.2 Confusion matrix for the SGD classifier
10.3 Confusion matrix for the Random Forest classifier
10.4 Confusion matrix for the Naive Bayes classifier

11.1 Overview of HRV measures in all datasets

Part I

Introduction

1 Motivation

“Stress [is] the struggle to adapt to life” [1, p. 29]. Richard Lazarus, an iconic figure in the stress research community of the last half century, used this simple explanation to describe what stress means for living organisms, and in particular for humans. This struggle can at times be particularly hard, as in crises like the current COVID-19 pandemic in 2020 [2, 3], which requires massively disruptive adaptations at all levels of society. Maximizing survival from virus infections and slowing down the spread is the most important goal for society right now, yet it should not be forgotten that all other disease mechanisms, for example those connected to the effects of stress on the human body, continue to be as relevant as before the appearance of this particular coronavirus. In fact, the sudden move of large parts of the workforce to home-office environments within days probably turned some work-related stress research goals upside down overnight. It provides an example of, and a good motivation for, why the content of this work might be even more relevant today.

This thesis proposes a method to assess stress levels and recognize inadequate stress adaptation in a mobile stress laboratory that was originally designed to be usable in the wild/in situ in office environments, but can also be used in the home-quarantined environment. This sets the approach apart from existing work and adds a beneficial new method to the toolbox of stress-related research in a more physically distanced COVID-19 society.

1.1 Stress in our Modern Society

Richard Lazarus pointed out that stress is a relatively young term, which “came into vogue” [1, p. 30] only about half a century ago. Fueled at first by military interests in the aftermath of the two world wars, research on this topic became increasingly relevant again when the speed of technological development in our post-industrialized society started to outpace our evolutionary adaptation capabilities [4].
Our technological achievements of the last 20 years and the ongoing shifts in society have led to radical changes in the way we interact with each other or go about our work [5–7]. Computers, smartphones, tablets, voice assistants, brain-computer interfaces, social network applications, satellite-internet-based real-time video conferencing from anywhere on earth: the list could be continued arbitrarily. Those remarkable advances changed and continue to change our society at unprecedented speeds [8]. This is demonstrated even more drastically right now, with society being required, and actually able, to switch to home-office environments from one day to the next without too much friction. Soon, no one will have even a technical 'excuse' for not being always online and always reachable. The mobile and wearable devices that enable such behavior are everywhere. They are built to increase our productivity, availability, and connectivity in our already very technologically advanced world [9].

Smartwatches, for example, are built to deliver push notifications to consumers as soon as new messages arrive on the smartphone. They vibrate and blink, they subtly request attention. They improve our task efficiency, but also increase our exposure to stress. On the other hand, they are equipped with a continually increasing number of tracking capabilities [10, 11]. The marketing approach of manufacturers is to give consumers knowledge about every step they take, every heartbeat they have, every movement they make, and how they sleep. This development has been coined “quantified self”, “life log”, or “sousveillance”. Some users do it for fun, some out of interest; companies want their users to do it to enable targeted services and ads; and science can use it to possibly observe stress in everyday life [4, 12–16]. Besides the positive things that come from life-logging and targeted service exposure, it also prevents us from reaching prolonged homeostasis, i.e. stress-absent relaxation. Even at subtle levels, continuous, uninterrupted stress impacts our health significantly [17–24].

The problem of stress in our society has not gone unnoticed. The health system has already clearly recognized diseases directly related to stress. The International Classification of Diseases (ICD)-10 has several codings for it [25]: Z56.3 classifies a “Stressful work schedule”, Z73.3 classifies arbitrary stress, and the entire coding tree of F43 is designated as “Reaction to severe stress, and adjustment disorders”.
This sub-tree also contains F43.12 which codes “Chronic Posttraumatic Stress Disorder” [25, 26]. This is diagnosed with high prevalence in war veterans, which was among the main reasons the military started research about stress during/after World War I and II in the first place, when the effect was still misleadingly termed and thought to be the “shell shock” [1, 27f.]. Stress-related diseases have become highly prevalent in the last two decades [20, 27–29]. Workplace stress was even coined by the World Health Organization (WHO) to be the “health epidemic of the 21st century” [27, 30, 31]1. One of the most important stressors in our lives is our

1 This thesis was written mostly in 2019, with the COVID-19 pandemic not existing yet. However, as mentioned above in the motivation, even with this being a major health crisis, all other healthcare concerns are still existent and as relevant as before.

4 1.1 Stress in our Modern Society

work and how we go about it. Long work hours and permanent availability for work are the main reasons for work-related stress and, according to Jeffrey Pfeffer, have reached a “critical mass” [7]. Some surveys report that more than 50 % of workers check their work e-mail after 11 p.m. and on vacation [7]. Reportedly, 80 % to 90 % of all industrial accidents can be related to stress [20]. There is also a correlation with the light pollution of our society: we are no longer restricted to working during daylight, and thus our sleep habits have changed dramatically, creating a lack of sleep that fosters associated diseases and promotes stress even more [32]. The direct and indirect costs of stress-related diseases were estimated at $300 billion for the United States in 2013 and were projected to go into the trillions of euros in Europe by 2020 [28, 29, 33]. The second number was somewhat indirectly confirmed in 2019 by the WHO, which stated this number as the money lost due to common mental disorders [34]. In the United States, it has been estimated that stress accounts for 75 % of all doctor’s visits [20].

Besides these estimated figures, it is hard, if not impossible, to find continuous numbers showing the development of stress or stress-related diseases in our society over the last century. However, the prevalence of stress-related diseases or disorders might be approximated indirectly by looking at the research that was done about it. If it is assumed that research interest adapts to changes and demands of society, which seems like a relatively valid assumption, the relative increase in publications about stress and related diseases or experiments could be used to extrapolate its prevalence. To facilitate this thought experiment, the relative, growth-corrected percentage of newly indexed stress-related publications per year in the Scopus database [35] was extracted and evaluated in all medicine/healthcare subareas.
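The growth correction described here amounts to dividing the number of stress-related hits by the number of hits of the corresponding base query for the same year. The following is only a minimal sketch of that arithmetic, not the evaluation code used for this thesis. The function names and the yearly hit counts are hypothetical placeholders and not Scopus results; only the prevalence values 0.8 % and 1.7 % are taken from this section.

```python
def stress_share_percent(stress_hits, base_hits):
    """Percentage of base-query publications that also contain the term 'stress'."""
    if base_hits <= 0:
        raise ValueError("base query returned no publications")
    return 100.0 * stress_hits / base_hits


def relative_increase_percent(old_value, new_value):
    """Relative change from old_value to new_value in percent."""
    return 100.0 * (new_value - old_value) / old_value


# HYPOTHETICAL yearly hit counts (not real Scopus data):
# year -> (hits for base query AND 'stress', hits for base query alone)
hypothetical_counts = {1990: (1000, 50000), 2002: (2400, 80000)}

shares = {
    year: stress_share_percent(stress_hits, base_hits)
    for year, (stress_hits, base_hits) in hypothetical_counts.items()
}
publication_trend = relative_increase_percent(shares[1990], shares[2002])

# Prevalence values quoted in this section for the United Kingdom sample [20]
prevalence_trend = relative_increase_percent(0.8, 1.7)
```

Dividing by the base query removes the general growth of the database, so that only the share of stress-related work is compared between years. Applied to the quoted prevalence values, the same helper gives roughly 112.5 %, which corresponds to the rounded 113 % stated in the text.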
The result is plotted over the last 100 years in Figure 1.1, with a large non-linear increase visible in the percentage of stress-related research work in the decade before and after the year 2000. With literature providing isolated figures, the correlation of this curve with some actual case numbers can be checked: in a study from 2008, the measured prevalence of stress and stress-related diseases in a sample population in the United Kingdom went up from 0.8 % to 1.7 % between 1990 and 2002 [20]. This is an increase of 113 %, while the percentage of stress-related publications during that period increased by 80 %. Therefore, this approach may provide a trend approximation. Some elements of the curve also match Richard Lazarus’s descriptions of the history of stress research and the interest in it in our society [1]. After an interest spike related to World War II, for reasons

5 1 Motivation

[Figure 1.1 plot: x-axis “Year” (1940 to 2020); y-axis “New Publications About Stress” in % (0 to 6); trend lines “Stress AND (Study OR Experiment)” and “Stress AND (Disease OR Sickness)”.]

Figure 1.1: Increase in the percentage of stress publications in Scopus. The relative (growth-corrected) increase in the number of newly indexed publications about stress per year in the Scopus database [35]. Two queries were performed for each combination of reference words to obtain each trend line. For example, for the first query (green), the base number was obtained by querying for the words ’study’ or ’experiment’ in the title, abstract, or keywords of publications. An additional search queried for ’study’ or ’experiment’ with the required term ’stress’. The percentage is the ratio of the two result counts. Queries were restricted to the healthcare-related subareas MEDI, PHAR, NEUR, PSYC, SOCI, NURS, HEAL [35].

mentioned above, not much was published until the mid 60s. After stagnating again during the 70s, interest rose sharply and non-linearly, with almost five times more publications in this area today than during the 80s.

The time frame of 100 years for this curve was not selected arbitrarily. As already described, the development of stress research started in the 20s and 30s of the last century. Hans Selye published his landmark review in 1946 and extensively described stress in terms of a “general adaptation syndrome” and long-term stress effects as the “disease of adaptation” [17, 20]. He separated stress reactivity into three phases: the alarm reaction (shock and counter-shock), the resistance phase, and finally exhaustion. He also coined terms like “adaptation energy”, which is needed to maintain resistance against a stressor. However, no organism can maintain resistance indefinitely, and it ultimately breaks down (or “burns out” in modern language) [17]. He also defined the distinction between the terms “stress” (the result) and “stressor” (the reasons) [20, 36]. In Figure 1.2, the characteristic curve of the general adaptation syndrome is shown. The stress reactivity during the alarm reaction in particular is influenced by a person’s current stress state.

[Figure 1.2 plot: specific resistance (+/-) over time across the phases Alarm Reaction (Shock, Counter Shock), Resistance, and Exhaustion.]

Figure 1.2: The general adaptation syndrome. Slightly improved recreation of Hans Selye’s general adaptation syndrome plot [17, p. 123], showing the specific resistance function over time of an organism going through the three stages proposed in the model.

Even though this very accurately models the physiological resistance/reaction an organism shows against a noxious/stressful environment/situation, it does not model “the way psychological stressors work” [1, p. 48]. Therefore, modern research developed more sophisticated stress models, like the allostatic load model, which considers e.g. the history, predisposition, and physiology of individuals within a complex interactive environment [37], and created standardized stressor methods in order to characterize reactions inside the alarm phase in greater detail [38]. Several of these are discussed in the Fundamentals and Related Work; the most important ones are the Trier Social Stress Test (TSST), the Cold Pressor Test (CPT), and the Stroop Test as one of the best-known and best-researched stressors available today [39–43].

With a lot of stress coming from increasingly stressful office/work lives [6], it will be important for future assessments of stress in large populations to require minimal effort from the individual. The Stroop Test in particular tests not only stress reactivity, but also the ability for voluntary control of exogenous and endogenous attention [44], a skill that is becoming increasingly important in modern office life and whose absence is a large source of stress [45–47]. Many people know that they are stressed in their jobs [7].
However, they can’t say how much exactly, they can’t


intuitively say why and they don’t know how to deal with it, i.e. remove the stressor or reduce the stress effect. Incentives like Green Buildings (biophilic design), Work-Life balance strategies, mind–body interventions (meditation, mindfulness-based stress reduction, yoga, progressive muscle relaxation, biofeedback), and cognitive–behavioral stress management (CBSM) are some approaches whose importance and effectiveness science has understood well for many years now, but which haven’t fully arrived in the mind- and toolset of society and in particular the healthcare system yet [7, 28, 48–53]. There is an urgent need to drive the development of stress research even further, allow faster assessments and experiment cycles, include more participants in more diverse study populations than just the usual student population in common university-lab based studies. This work is intended to provide one of the pieces needed to bridge that gap or at least explore a direction. The goal is to build a prototypical stress analysis laboratory platform that allows insights into stress developments, habituation processes, attentional control assessments and treatment efficiencies in the wild. The presented approach relies on wearable devices, the recording of precise cardiovascular activity and the Stroop Test to modify stress states of humans and observe reactivity results in potentially very large and diverse populations. The major problems of almost all existing reference stressors are their very elaborate construction with the requirement of subjects coming into laboratory facilities with a lot of spare-time and their reliance on substantially effective stress mechanisms which are quite uncomfortable for subjects to experience. New methods are needed to approach this in a more participant-centered way, in a way that can be easily and quickly performed among a large number of subjects and with a stressor that can be accepted easily by the users. 
It is this kind of stress-laboratory that has been developed in this work.

1.2 Contributions

In this work, a prototypical but complete pipeline is presented that allows an in-the-wild, wearable, easy-to-use stress reactivity and response evaluation of a person. It consists of five distinct modular systems that can be continuously integrated to form a complete pipeline for such evaluations, as depicted in Figure 1.3, or used on their own for their isolated purpose:

Everywhere-ECG A high-precision, lightweight, current-based Electrocardiogram (ECG) recording and digitization circuit built into an existing and scientifically validated wearable platform that allows


[Figure 1.3 diagram labels: Human; Everywhere-ECG (Chapter 4); SensorLib and Hearty on the smartphone (Chapter 5); JELY with QRS Detection, R-Peak Refinement, and HRV Determination (Chapter 6+7); ECG and HRV; Stroop Room in VR (Chapter 8+9); Stroop Room Classifier for Stress Reactivity & Habituation Type (Chapter 10).]

Figure 1.3: Overview of the wearable stress laboratory.

real-time signal transmission to a portable device (smartphone or tablet) has been developed. It extends the state-of-the-art by providing an integrated circuit that measures current rather than voltage in order to determine the electrical field projected by the activity of the human heart onto the skin surface. The major advantage of this technique is its robustness to environmental changes at the electrode sites. This work was published in 2017 in a special issue of Applied Sciences [54].

Hearty A smartphone application was created that allows algorithmic real-time on-device QRS complex detection and evaluation, as well as classification of abnormal beats with high precision and low algorithmic complexity, using input data from any Bluetooth-enabled ECG interface. It extended the state-of-the-art in 2012, when on-device computing for biomedical signals on smartphones was still in its infancy, or did not exist at all. The approach implemented a QRS detector, arrhythmia classifier, and evaluation pipeline directly on Android-based smartphones [55]. The algorithms, which were open-sourced upon publication, are still highly relevant for the next generation of wearables.

JELY A Java-based ECG library was designed, containing many algorithms related to ECG signal processing, which can be used on Android mobile devices in the wild as well as on regular computers for evaluations. In particular, it contains an R-peak correction algorithm, which was published in 2015 to fill the gap of an evaluation step that was missing in almost every QRS complex detection algorithm paper screened during development [56]. Furthermore, the library contains a heart rate variability (HRV) calculation platform and connects to a transmission system to get R-peak information,


RR-intervals and HRV metrics into real-time rendering engine systems, which was used in all follow-up projects. The library is to be submitted to the Journal of Open Source Software.

Stroop Room A virtual reality (VR) immersive stressor application is the core of this thesis. It was created to allow characteristic performance data related to the Stroop interference effect to be recorded even in home settings. The software is tailored to provoke multiple interference effect errors that allow the error responsiveness of a person to be determined, which will be shown to contain important information about stress parameters of the user. The Stroop Room was published in 2019 at the ACM Symposium on Virtual Reality Software and Technology (VRST) [43]. It extended the state-of-the-art by a completely novel modality for performing the Stroop Test that exploits the presence and three-dimensionality of a virtual reality. This was preceded by precursor work in which my colleagues and I tested and published the possibility of implementing an improved Stroop Test in VR [57].

Stroop Room Classifier Using the Stroop Room, a machine learning approach was devised that allows the stress habituation responder type of the user to be assessed, based solely on data collected in a single Stroop Room session. This is so far unpublished research which, with its still preliminary results, could prove to be a significant extension to the state-of-the-art in psychological research. Right now, there is no published method to determine the stress responder type of a person without drawing blood or gathering saliva samples from the subject and analyzing them in a laboratory. The Stroop Room Classifier would enable researchers to do this in 15 min at anyone’s home using just a smartphone and a VR headset.

At the end of this work, it is demonstrated how each of these modular components can either be used individually in its own right, or run interconnected as a real-time biofeedback, stress monitoring, and evaluation platform that can be used for bulk laboratory-like stress parameter assessment in large populations. To allow easy reuse of this work as a blueprint, most of the algorithms presented here are available as open-source implementations on GitHub. This essentially extends/exposes the contribution of this work to “the largest open source community in the world” [58].


In order for all technical and scientific details to be more approachable to the reader, the main Fundamentals and Related Work are explained in detail in chapter 2 and chapter 3. The contribution of these chapters is not only a thorough overview of the literature and governing principles of electrocardiophysiology, stress, and virtual reality, but also an overview of state-of-the-art ECG wearables and open-source Stroop Test implementations for use in stress research. This has partly been published at the conference on Pervasive Computing Technologies for Healthcare in 2019 [4].

Additionally, this thesis contributes the data and evaluation results of three independent HRV analysis studies to assess how changes in this parameter can be used for stress assessment in VR. All three were performed using the Everywhere-ECG. In chapter 4, the first study is presented, which provides HRV data of highly relaxed subjects in a stress-free environment. Next, in chapter 7, subjects were immersed in a non-stressful virtual environment (VE) to record HRV characteristics in a not completely stress-free environment and while confronted with a non-trivial task. This was partly published in 2018 [59]. Finally, in the validation study of the Stroop Room, the HRV characteristics were evaluated with subjects put into a stressful VE with and without inducing significant cognitive load. Looking at all these data side by side allows a holistic conclusion at the end of this work about the changes in the HRV that can be expected in such environments and how they validate the stressor laboratory approach presented here. Even though the Related Work will show that similar ideas exist for some isolated components, a complete connectable framework for mobile, in-the-wild stress analysis with thorough HRV evaluation using the same method in different environments has been missing so far.
The work done in this thesis and the development of its methods were further influenced by ideas that have been published previously, each of which also extended the state-of-the-art in its respective field. These are listed in the following for completeness²:

• “Somnography using unobtrusive motion sensors and Android-based mobile phones”, in 2013, provided a mechanism to record hypnograms in the home environment by just using a smartphone on the mattress while sleeping [32]. My contribution was to develop the algorithm, conduct the study, evaluate the results, and write the publication.

2 Publications with me as the first or senior author are emphasized in italics.


• For the “Comparison of real-time classification systems for arrhythmia detection on Android-based mobile devices” in 2014 [60], I contributed by helping with the data evaluation and providing feedback, input, and review for the paper.

• In 2015, I contributed to the paper “Filter and processing method to improve R-peak detection for ECG data with motion artefacts from wearable systems” [61] through suggestions to the method, as well as review and feedback for the publication.

• In “Arrhythmia classification using RR intervals: Improvement with sinusoidal regression feature” from 2015 [62] I gave feedback and suggestions during development of the algorithm, and helped with the evaluation and review of the paper.

• “An Overview and Acceptance Study” about “Virtual and Augmented Reality in Sports” was conducted in 2016 [63]. The contribution here was to develop an online questionnaire, recruit participants, evaluate the results, and write most of the publication.

• For the paper about “Instantaneous P- and T-wave detection: Assessment of three ECG fiducial points detection algorithms” in 2016 [64], I supported algorithm development and writing of the paper.

• “Detection of fetal kicks using body-worn accelerometers during pregnancy: Trade-offs between sensors number and positioning” was published in 2016 [65]. I contributed here by supporting the entire algorithm development and evaluation and reviewing the paper.

• “Textile Integrated Wearable Technologies for Sports and Medical Applications” was published in 2017 about the FitnessSHIRT platform, which is the basis for chapter 4 [66]. The contribution here was data evaluation, as well as feedback, writing support, and review of the publication.

• The “MigraineMonitor – Towards a System for the Prediction of Migraine Attacks using Electrostimulation” in 2018 [67] was supported during development and evaluation, as well as through writing support, feedback, and review for the paper.


• In 2018, an algorithm about “Movement Speed Estimation Based on Foot Acceleration Patterns” was published [11], with my contributions being the development and evaluation of the algorithm and the writing of the publication.

• In the 2018 publication about “Comparison of Different Algorithms for Calculating Velocity and Stride Length in Running Using Inertial Measurement Units” [68], I contributed the development, evaluation, and description in the paper of one of the four algorithms, as well as review and feedback for the publication.

• “Assessment of Perceptual-Cognitive Abilities among Athletes in Virtual Environments: Exploring Interaction Concepts for Soccer Players” in 2018 explored the feasibility of VR and different interaction concepts in sports [69]. The contributions here were the support of the study design, writing input to, and review of the publication.

• “Exploring the Feasibility of EMG Based Interaction for Assessing Cognitive Capacity in Virtual Reality” from 2018 was similarly supported by writing input to and review of the publication [70].

• The same contributions were made for “Evaluation of Interaction Techniques for a Virtual Reality Reading Room in Diagnostic Radiology” from 2018 [71].

• In “Same Same but Different: Exploring the Effects of the Stroop Color Word Test in Virtual Reality”, my colleagues and I published in 2019 the results from our original study from 2016 about the Stroop Test in VR, which sparked the idea of the Stroop Room. My contributions here were the conception of the original idea, the development of all versions of the Stroop Test in VR, support with the study design, help with the data collection, and writing input for the paper, as well as review and feedback for it.

• In 2019, my colleagues and I published a machine learning method for “Classification of Acute Stress-Induced Response Patterns” [72], which indirectly provided the precursor idea to the Stroop Room Classifier in chapter 10.

• In 2019, “Sick Moves! Motion Parameters as Indicators of Simulator Sickness” was published in the IEEE Transactions on Visualization and Computer Graphics [73]. The contributions here were


significant help during the experiment design, support during the data collection, data evaluation, and input to and review of the publication.

1.3 Overview of this Thesis

The next part of the thesis contains an extensive description of the research background and related work. First, in chapter 2 (Fundamentals), relevant information from the three main research fields this work draws from is summarized (Human Stress, The Electrocardiogram, Virtual Reality). This constitutes an excursion into the history of the field, today’s general state-of-the-art knowledge about the fundamentals of the area directly required to understand this work, and, where applicable, information extending beyond this work to allow the reader to more easily see the larger picture. At the end of the chapter, a Synopsis of the Fundamentals is given, specifically tailored to summarize the most important facts related to this work.

Then, chapter 3 introduces the reader to related work in the fields of Mobile/Wearable Electrocardiography and Arrhythmia Classification, Wearable-Based Stress Detection, and In Vivo and Virtual Reality Stressors. The last one is the most extensive contribution, giving a broad overview of all important stressor methods, and in particular of the Stroop Test in all its variations (subsection 3.3.5). Again, a Synopsis of the Related Work is provided at the end of the chapter.

The next seven chapters then highlight the individual contributions of this work. They are ordered according to the processing pipeline for the wearable stress lab as shown in Figure 1.3: chapter 4 describes the wearable ECG device needed to record data from subjects; then, chapter 5, chapter 6, and partly (next to the biofeedback study) chapter 7 describe the mobile application and Java framework as well as the signal processing algorithms needed to extract HRV information from the wearable sensor and provide it to the VR system to connect it with the Stroop Room.
The latter is the stressor and core part of the wearable lab that is introduced in its general concepts in chapter 8, in its implementation and validation study in chapter 9, and finally with its classification method in chapter 10. At the end, a summary is given in chapter 11, along with a discussion of this work’s contribution and an example of a continuously integrated framework implementation. The last chapter (12) concludes this work and provides an outlook into future directions.

Part II

Background & State-of-the-Art

2 Fundamentals

The contributions of this work originate mainly from three general research fields: cardiology, psychophysiology, and virtual reality. Therefore, the fundamentals of these will be covered in detail in this chapter¹. It should allow the reader to become familiar with the important theoretical concepts behind the remainder of this work, and it also provides some further information and history that are not mandatory for understanding it. First, detailed information is given about the mechanisms of stress in humans, including an overview of how human biosignals change under its influence. The introduction already gave a short overview of the history and the prevalence of stress and related diseases, so this is omitted here. Such a history is, however, given in the second part of this chapter about the ECG, after introducing its electrophysiological origins in the human heart. This is followed by an overview of the most important ECG/HRV signal parameters and how they change under stress. Afterwards, virtual reality is discussed. The chapter concludes with a Synopsis summarizing the most important takeaways for the rest of the thesis.

2.1 Human Stress

After a general introduction, stress manifestation in different biosignals is discussed, first in the electrodermal activity (EDA) and then in other biosignals. Stress manifestation in the ECG is discussed in the next section (2.2), which is dedicated wholly to the ECG.

2.1.1 Psychological Stress and Its Manifestation in Biosignals

This section is based in large parts on knowledge described by the two pioneers in modern human stress research, Hans Selye and Richard Lazarus [1, 24, 77–80], as well as on the chapters of the Handbook of Psychophysiology [81], which provide an up-to-date and extensive description of all aspects of psychophysiology. Furthermore, models developed by Hans Selye were refined by Sally Dickerson [19, 82], whose work also influenced this section. Unless otherwise noted, it can be assumed that the information behind my descriptions is based on knowledge found in those works.

1 Unless otherwise noted or directly cited, information in this chapter has been extracted from [4, 74–76]


Stress is a reaction of the human body to changes that threaten the body’s homeostasis. It can be differentiated into Eustress, positively associated stress, and Distress, negatively associated stress. The three basic types of stressors are cognitive, emotional, and physical/physiological stress. Cognitive stress occurs while performing tasks that require a large amount of cognitive capacity, e.g. complicated mental math calculations. Emotional stress is any other form of non-physical and non-cognitive stress, e.g. during phases of intense fear or anxiety. Physical/physiological stress occurs when large numbers of receptors for pain, heat, etc. are activated in the human body.

There are two main communication systems in the human body that promote and regulate stress reactions: the autonomic nervous system (ANS) and the hypothalamic-pituitary-adrenocortical axis (HPA axis). Increased HPA axis activation (through increased release of corticotropin-releasing hormone (CRH) by the hypothalamus in response to stress) leads to an increase in the release of the hormone cortisol [83]. Its presence in the bloodstream has primary effects on the body’s metabolism, e.g. elevation of blood glucose levels, which provides the body with more energy, and indirect effects on the ANS. Although short-term increases in cortisol levels increase immune system activation, long-term exposure inhibits it [83]. This happens in accordance with the stress reactivity curve model of the general adaptation syndrome by Hans Selye [17], which was already described earlier.

The ANS comprises the sympathetic nervous system (SNS) and the parasympathetic nervous system (PNS). These systems directly affect the beat-to-beat changes of the heart, which are characterized by the HRV. The low-frequency (LF) part of the HRV contains SNS and PNS influences and is modulated by baroreflex activity.
The high-frequency (HF) component contains PNS influences and is also connected to the respiratory sinus arrhythmia (RSA), which describes the modulation of the breathing pattern into the HRV (see below). An estimate of the sympathetic modulation can be made using the LF/HF power ratio, which describes the sympatho/vagal balance [84]. Next to the various HRV measures, the enzyme α-amylase (sAA), mostly found in human saliva, has been proposed as a biomarker reflecting SNS activity [85–87]. Other possibly measurable stress-related changes in the human body are found in the levels of noradrenaline, serotonin, dopamine, neuropeptide Y (NPY), and the brain-derived neurotrophic factor (BDNF) [83]. All those systems react differently in their activation/deactivation based on the type of stressor and also based on genetic factors and exposure training [19, 82, 83].

It has been concluded that, besides direct physical stressors (injury, pain), socially evaluative threat causes a significant reaction in the HPA axis, even more so when coupled with a feeling of uncontrollability (impossibility to succeed despite best efforts and the public exposure of that situation, called “exposed failure” [19, p. 1198]). Any other kind of stress (emotions not linked to shame) does not seem to activate the HPA axis [19]. Previous work has shown that the largest overall stress reaction is provoked by activating both systems. This can be achieved by a combination of socially evaluative threat (e.g. public speaking), cognitive load, and physical stress (e.g. pain) [82, 88], as will be described in the next chapter. The concentration of cortisol in the blood or saliva is used in many works as the gold standard for stress determination, mostly because socially evaluative threat is, for various reasons, considered the most negative distress (see Related Work). Cognitive stress only has an effect on the ANS, with direct effects observed in several parameters of the HRV and α-amylase [85–87, 89]. However, cognitive stress can be induced repeatedly more reliably, while socially evaluative threat is usually subject to habituation.

Depending on cortisol reactions in response to a repeated standardized stressor (e.g. the TSST), six different stress responder types can be identified: the habituation, non-habituation, sensitization, non-responder, anticipatory, and secondary anticipatory responders [72, 90]. Habituation is considered the healthy, or normal, reaction, which is observed in about 70 % of the population, while all other responses seem to indicate an abnormality in the behavior of the stress response axes. So far, no easy way has been found to continuously (in quasi real-time) measure concentrations of cortisol and α-amylase, or in general the activation of both main stress axes.
If a real-time assessment is required, the EDA and the ECG are usually used to provide a reliable approximation for at least the ANS activation. This is one of the reasons why this work focuses on the ECG, which can provide stress information from the HRV parameters in an unobtrusive and continuous way. Since the EDA is also controlled through the ANS branch and will be relevant in the Stroop Room evaluation, it will be discussed in more detail as well in the following section.
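To make the LF/HF power ratio mentioned in this section more concrete, the following is a minimal sketch of how it could be estimated from a series of RR intervals. It is not the implementation used in JELY or elsewhere in this work. The band limits (0.04 Hz to 0.15 Hz for LF, 0.15 Hz to 0.4 Hz for HF) are commonly used conventions and are an assumption here, as are the 4 Hz resampling rate, the plain discrete Fourier transform, and all function names.

```python
import math

# Band limits are assumptions (commonly used conventions), not values from this text.
LF_BAND_HZ = (0.04, 0.15)
HF_BAND_HZ = (0.15, 0.40)


def resample_rr(rr_s, fs_hz=4.0):
    """Linearly interpolate RR intervals (seconds) onto an evenly spaced time grid."""
    beat_times, t = [], 0.0
    for rr in rr_s:
        t += rr
        beat_times.append(t)
    n_samples = int((beat_times[-1] - beat_times[0]) * fs_hz)
    resampled, j = [], 0
    for i in range(n_samples):
        ti = beat_times[0] + i / fs_hz
        while beat_times[j + 1] < ti:  # advance to the beat pair enclosing ti
            j += 1
        w = (ti - beat_times[j]) / (beat_times[j + 1] - beat_times[j])
        resampled.append(rr_s[j] + w * (rr_s[j + 1] - rr_s[j]))
    return resampled


def band_power(x, fs_hz, band_hz):
    """Sum of squared DFT magnitudes of the mean-free signal inside a frequency band."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = 0.0
    for k in range(1, n // 2):
        f = k * fs_hz / n
        if band_hz[0] <= f < band_hz[1]:
            re = sum(v * math.cos(2 * math.pi * k * i / n) for i, v in enumerate(centered))
            im = sum(v * math.sin(2 * math.pi * k * i / n) for i, v in enumerate(centered))
            power += re * re + im * im
    return power


def lf_hf_ratio(rr_s, fs_hz=4.0):
    """LF/HF power ratio of an RR-interval series given in seconds."""
    x = resample_rr(rr_s, fs_hz)
    return band_power(x, fs_hz, LF_BAND_HZ) / band_power(x, fs_hz, HF_BAND_HZ)


def synthetic_rr(mod_freq_hz, duration_s=120.0):
    """Artificial RR series: 0.8 s mean interval, modulated at a single frequency."""
    rr, t = [], 0.0
    while t < duration_s:
        interval = 0.8 + 0.05 * math.sin(2 * math.pi * mod_freq_hz * t)
        rr.append(interval)
        t += interval
    return rr
```

With the artificial series from `synthetic_rr`, a modulation at 0.1 Hz falls into the assumed LF band and yields a ratio above one, while a modulation at 0.25 Hz (a typical breathing frequency) falls into the HF band and yields a ratio below one. Real recordings would additionally require artifact correction and a more robust spectral estimator.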


2.1.2 Stress Manifestation in the Electrodermal Activity

Measurement and evaluation of changes in the electrodermal activity (EDA) are among the most frequently used approaches in studies and experiments in the fields of psychology, psychiatry, and psychophysiology [91]². There are two different ways of measuring this biosignal, which was formerly also referred to as Galvanic Skin Response (GSR). The exosomatic method uses small external currents applied across the skin, whereas the endosomatic method passively measures internally generated skin potentials. Essentially, EDA describes the amount of sweat produced in sweat glands indirectly through measurement of the conductivity of the upper skin layers, where the sweat glands are located. They are present all over the skin of the body, but their density is highest on the palmar and plantar surfaces. Therefore, EDA is best measured at the fingers (volar surfaces of medial or distal phalanges), the palms of the hands (thenar and hypothenar eminences), the forehead, and the insteps of the feet [92]. The sweat glands are innervated by the sympathetic nervous system via the sympathetic anterolateral pathway through the premotor and frontal cortex, the hypothalamus and limbic system (with influences from amygdala and hippocampus), and the reticular formation.

There are several important measures with respect to stress. Two different signals are distinguished in the EDA: the tonic and the phasic component. The tonic (low-frequency) component provides a measure of the skin conductance level (SCL), which describes the general arousal level. It can be thought of as the baseline level of the EDA signal curve, usually calculated as the mean over a time window of several seconds. Colloquially, this is a measure of how “sweaty” a person currently is in general. The phasic (high-frequency) component allows determination of specific/stimulus-related skin conductance responses (SCRs) and non-specific skin conductance responses (NS-SCRs).
These comprise reactions of the SNS to specific stimuli in time (bursts of sympathetic nerve activity), leading to a hyper-activation of sweat glands as a result. These events produce distinct characteristic spikes in the EDA, which usually are delayed by 1 s–3 s after the stimuli and measured in number and amplitude. It has also been observed that physical stress/activation increases the overall number of NS-SCRs, which has to be considered as noise when recording cognitive stress during physical activity. An overview of relevant EDA measures is shown in Table 2.1.
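The tonic/phasic split described above can be sketched in a few lines. This is only a minimal illustration, assuming a uniformly sampled EDA signal in µS; the 4 s moving-mean window, the 0.2 µS peak threshold, and the function names `split_eda` and `count_scr_peaks` are assumptions chosen for this example and are not taken from this work. Dedicated EDA toolboxes use more elaborate decomposition methods.

```python
import numpy as np


def split_eda(eda_us, fs_hz, window_s=4.0):
    """Split an EDA signal (in microsiemens) into tonic and phasic parts.

    The tonic part is approximated by a moving mean over `window_s` seconds
    (an assumed window length); the phasic part is the remainder.
    """
    eda = np.asarray(eda_us, dtype=float)
    n = max(1, int(round(window_s * fs_hz)))
    kernel = np.ones(n) / n
    # Repeat the edge samples so the moving mean has the same length as the input.
    padded = np.pad(eda, (n // 2, n - 1 - n // 2), mode="edge")
    tonic = np.convolve(padded, kernel, mode="valid")
    phasic = eda - tonic
    return tonic, phasic


def count_scr_peaks(phasic_us, min_amplitude_us=0.2):
    """Count local maxima of the phasic component above an assumed threshold."""
    x = np.asarray(phasic_us, dtype=float)
    is_peak = (
        (x[1:-1] > x[:-2])
        & (x[1:-1] >= x[2:])
        & (x[1:-1] >= min_amplitude_us)
    )
    return int(np.count_nonzero(is_peak))


if __name__ == "__main__":
    # Synthetic example: 5 µS baseline with two response-like bumps.
    t = np.arange(0, 30, 0.1)
    eda = 5.0 + np.exp(-0.5 * ((t - 10) / 0.5) ** 2) + np.exp(-0.5 * ((t - 20) / 0.5) ** 2)
    tonic, phasic = split_eda(eda, fs_hz=10.0)
    print("SCL estimate (µS):", tonic.mean())
    print("number of detected peaks:", count_scr_peaks(phasic))
```

The mean of the tonic component serves as an SCL estimate, while the peak count approximates the number of SCRs in the recording; both depend strongly on the assumed window and threshold.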

2 Information in this section was compiled largely from [91].


Table 2.1: Overview of relevant EDA measures.

| Parameter | Normal range | Definition |
|---|---|---|
| Skin conductance level (SCL) | 2 µS–20 µS | Tonic level of the electrical conductivity of the skin |
| Skin conductance response (SCR) | | Phasic response to a specific stimulus |
| SCR amplitude | 0.2 µS–1.0 µS | Phasic increase in conductance shortly following stimulus onset |
| SCR latency | 1–3 s | Temporal interval between stimulus onset and SCR initiation |
| SCR rise time | 1–3 s | Temporal interval between SCR initiation and SCR peak |
| SCR habituation | 2–8 stimuli | Number of stimuli before 2–3 trials with no response |
| Non-specific skin conductance response (NS-SCR) | | Phasic response not relatable to a specific stimulus |
| Frequency of NS-SCRs | 1–3 per min | Number of SCRs in absence of identifiable eliciting stimulus |
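The distinction between stimulus-related SCRs and NS-SCRs in Table 2.1 can be illustrated with a small sketch. It assumes that SCR onset times and stimulus onset times are already known (in seconds) and uses the SCR latency range of 1–3 s from the table as the decision window. The function names `classify_scrs` and `ns_scr_rate_per_min` are hypothetical and only serve this example.

```python
def classify_scrs(scr_onsets_s, stimulus_onsets_s, latency_window_s=(1.0, 3.0)):
    """Label each SCR onset as stimulus-specific or non-specific.

    An SCR counts as specific if it starts within the latency window
    (default 1-3 s, the SCR latency range of Table 2.1) after any stimulus.
    """
    lo, hi = latency_window_s
    specific, non_specific = [], []
    for onset in scr_onsets_s:
        if any(lo <= onset - stim <= hi for stim in stimulus_onsets_s):
            specific.append(onset)
        else:
            non_specific.append(onset)
    return specific, non_specific


def ns_scr_rate_per_min(non_specific_onsets_s, duration_s):
    """Frequency of NS-SCRs in responses per minute."""
    return 60.0 * len(non_specific_onsets_s) / duration_s


if __name__ == "__main__":
    scr_onsets = [5.0, 12.0, 31.5, 50.0]   # detected SCR onsets in seconds
    stimuli = [10.0, 30.0]                 # stimulus onsets in seconds
    specific, non_specific = classify_scrs(scr_onsets, stimuli)
    print("specific SCRs:", specific)
    print("NS-SCR rate per minute:", ns_scr_rate_per_min(non_specific, duration_s=60.0))
```

A fixed latency window is a simplification; when stimuli follow each other closely, a response may fall into the window of more than one stimulus and cannot be attributed unambiguously.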

2.1.3 Stress Manifestations in Other Biosignals

Smets et al. as well as Giannakakis et al. have compiled a thorough overview of the biosignal variables that are influenced by stress [36, 93]. They also referenced classifiers and their accuracy depending on the biosignals used. They concluded that almost 100 % accuracy in the classification of no stress, mild stress, and heavy stress can be reached by using the ECG, the EDA, and the respiration rate together [93]. However, there are other biosignals that can also be used to assess the stress state or even quantify it. The remainder of this section was taken directly, with only slight changes, from [4], as I wrote these parts in the corresponding publication. It was enhanced by information found in [94].

The respiration or breathing rate (Resp) can be measured using chest straps, depth imaging [95], thoracic electrical bioimpedance [96], or the RSA (see above). Several breathing characteristics, such as respiration variability, respiratory rate, tidal volume, and in-/expiratory duration, change with the experience of stress, as does the gas composition of the exhaled breath [16, 97]. Depending on all these breathing characteristics, the peripheral blood oxygen saturation (SpO2) can also provide insights into the stress state [98].


The skin/surface temperature (ST) in various areas of the body and the core body temperature (BT) have been shown to change with stress [99, 100]. They are usually measured using thermometers or thermal imaging [8].

Besides ECG and EDA, stress-related studies also often include data from the measurement of the electromyogram (EMG). It has been shown that the upper trapezius and masseter muscles in particular show specific, unconsciously performed activation patterns during stress episodes [8, 101–103].

The electroencephalogram (EEG) measures the electrical brain activity through surface electrodes and is a standard, non-invasive method for monitoring and analyzing the state of the brain [104]. Through changes in certain frequency bands of the EEG signal, in particular in the Alpha and Beta bands, the experienced mental stress can be assessed [8]. Additionally, near-infrared spectroscopy (NIRS) is also capable of measuring human brain activity using optical determination of the (de)oxyhemoglobin concentration in the brain [105].

EEG recordings can also be used to derive the electrooculogram (EOG), which allows the determination of parameters such as blink rate and eye movement. Other parameters, such as pupil diameter (PD) or gaze characteristics, can be measured using eye-tracker systems and show changes during the experience of acute stress [8, 106]. There are directly quantifiable gaze parameters, e.g. duration, speed, and occurrence of saccades, but also indirect changes in characteristics of gaze and object focus, e.g. attentional selectivity [106]. Stress-related information is not only provided by changes in subconscious behavioral aspects of the eye, but also by the facial expression itself and by head movement patterns, either for specific tasks or in general observation [8, 94, 107].

Beyond that, the usage of everyday items, e.g. keyboard or mouse dynamics, mobile phone usage, calendar events (e.g. meeting behaviors or avoidance), and location change patterns, differs under stress [8]. Additionally, other movement patterns change as well, such as body posture [8], recorded for example by inertial measurement units (IMUs) that measure acceleration and angular velocity, and potentially gait characteristics. Speech characteristics have also been shown to change in stressful situations [8, 96].

Although not directly linked to stress, other signals can be helpful to record in order to improve stress detection in the measurable signals. Since the emergence of autonomic response classification, it has been acknowledged that stress quantification yields the most accurate results if it is determined from the profiling of biosignals across multiple response domains and across time [74]. For example, lactate or ammonium concentrations in sweat might allow phases of highly intense physical activity, which cannot be distinguished from pure mental stress in other biosignals, to be discarded [108], while HRV, EDA, respiration, and head movement patterns are analyzed at the same time. Similarly, the analysis of sleep patterns can be used to determine phases of bad sleep, which result in a worse stress reaction or a limited ability of the body to cope with it [109]. Blood pressure changes, which are often thought to be related to stress, do not seem to correlate with mental stress directly [89, 110].

2.2 The Electrocardiogram

This section is dedicated to the ECG. In the first subsection, the origin and propagation of the heart’s electrical activation is explained. This is followed by an overview of the history of this biosignal, and afterwards by a description of all electrocardiographic measures that are relevant in this work. In the last subsection, the relevance of ECG parameters to stress detection is outlined, as was done for other biosignals in the previous section.

2.2.1 The Electric Heart Cycle

One of the many important purposes of the cardiovascular system is to provide all body tissue with oxygenated blood. To this end, the system comprises two circulation systems: a pulmonary (lungs, oxygenation of the blood) and a systemic (body, transport of oxygenated blood) circulation. The heart as the main operator of this system (minor operators are e.g. the lungs or the vasoconstrictive system) needs to generate a continuous blood flow/pressure using muscular pumping. For this purpose, the heart is made up of specialized muscle cells that contract in a very distinct and predefined manner across the heart.
This pattern allows highly effective pumping of blood through the pulmonary circulation (i.e. from the right ventricle to the left atrium) and into the systemic circulation. The rhythmic contraction of individual muscle cells is initiated by a defined electrical activation through the main pacemaker systems of the heart: the sinoatrial (SA) and atrioventricular (AV) nodes. Both are able to work on their own, but during normal states the AV node is controlled by the SA node, and the SA node is controlled by the hormonal system as part of the autonomic nervous system (ANS). This close coupling to the ANS in healthy humans allows a good characterization of its current state by precise tracking of the electrical heart activation. The ECG describes the flow of electrical current across the heart. Its characteristics are quite limited in their time-spatial and amplitude variation for healthy subjects. Therefore, the ECG can be used to identify and characterize heart diseases (e.g. resulting from dead muscle tissue) as well as to infer the activity of the ANS.

Figure 2.1: Electrical anatomy of a single heartbeat. A single heart cycle in a typical ECG signal from a Lead II derivation (A) with definitions of the most important characteristic peaks and waves as well as interval durations and their correspondence in an anatomical view of the human heart (B). The picture also shows Einthoven’s triangle with its main bipolar limb leads (I-III) and unipolar augmented leads (aVR, aVL, aVF), the main wave propagation directions for P, Q, and R waves as well as the location of the sinoatrial (SA) and atrioventricular (AV) node. Reproduced from [74] © 2016 with permission of the Cambridge University Press through PLSclear.

The ECG is a recording of repeated heart cycles. Each cycle has different activation phases, which are represented in the ECG as the P, Q, R, S, and T waves or peaks. In a normally working heart, the most dominant of these is the 푅-peak. The 푅-peak mostly represents (the electrical activation of ventricular muscle cells leading to) the contraction of the ventricles, which make up the majority of the heart muscle mass. The 푃-wave represents atrial activation and the 푇-wave the repolarization of the ventricles. The typical cyclic wave sequence is depicted in Figure 2.1, along with an anatomical explanation of the electric wave propagation and its measurement using Einthoven’s triangle. This triangle specifies how different ECG leads or “views” are taken from the heart by electrical measurements through electrodes attached to the skin surface. Einthoven³ specified the three main bipolar limb leads: Lead I, Lead II, and Lead III (Figure 2.1). Each of these is frequently used by wearable ECG devices, and together they can be seen as a minimal set of derivations that gives a rough but complete picture of the heart’s individual electrical characteristics. However, several more leads have been established in medical diagnostics. Modern routine clinical assessments use up to 12 different ECG leads. These comprise additional precordial leads derived on the chest. Due to the effort of using additional electrode measurement sites on the body, many wearables only obtain a single lead (usually Lead I or Lead II) by recording the potential difference between the two respective electrode sites. Since full clinical diagnosis of the ECG using all 12 leads is still almost⁴ entirely in the domain of hospital settings, many consumer ECG devices mainly aim at acquiring accurate 푅-peak measurements. This is already feasible using single Lead I or Lead II derivations, both of which usually have a distinctly visible and large 푅-peak.

2.2.2 The History of the Electrocardiogram

The history⁵ of the ECG is a long one compared to many other biosignals. It began almost 150 years ago, when Augustus Desiré Waller, professor of physiology in London, published the first ECG, measured with a capillary electrometer, in 1887. Widespread use was later enabled by technological advancements made by Willem Einthoven in the early 20th century. He witnessed the original experiments by Waller [114] and ultimately made some decisive steps towards modern electrocardiography. The naming of the waves as PQRST dates back to an illustration he made during early developments of his improved Lippmann capillary electrometer, influenced by Descartes’ notation for curves and with the intention of being able to extend the naming in both directions in case new important points were discovered [115].
For his work, which also included the standardization of the ECG leads known today as the Einthoven leads, he received the Nobel Prize in Medicine in 1924 and is therefore often referred to as the ‘Father of Electrocardiography’.

3 See next section.
4 The Apple Watch performs a test for atrial fibrillation, which is normally only diagnosed in 12-lead ECGs. Apple’s method has been challenged by experts [111] and is also the subject of patent infringement charges [112, 113].
5 Unless noted otherwise, the majority of historic details are derived from [114].


The first mobile ECG was recorded in 1947 by Norman Jefferis Holter with a ‘wearable’ device, a backpack weighing almost 40 kg, which he published in 1949 [116, 117]. Although he did not receive as much fame as Einthoven, he pioneered fundamental concepts in today’s ECG recording and analysis techniques. This included methods to store ECGs in a ‘memory-efficient’ way by extracting the QRS complexes and plotting them over each other, so that abnormal ones can be separated and recognized quickly. He also used a method to represent 푅푅-intervals with different sounds, with a pitch proportional to the interval length. This allowed ECG records to be evaluated at 60 to 80 times the recording speed by just listening to a fast replay of the sounds [114, 117, 118]. This is why a modern long-term ECG recording system is often referred to as a ‘Holter monitor’ and observing one’s ECG over a longer period of time as ‘Holter monitoring’ [119, 120].

Today’s long-term monitoring solutions no longer resemble that bulky apparatus from 1947. The latest developments allow the heart rate to be monitored without contact to the body, be it with propagated body-vibration tracking (seismo-/gyrocardiography), camera-based systems, modern radar technology, or sound [84, 120–126]. However, these methods often aim to just assess cardiac cycle parameters, and only for gyrocardiography has it been shown that accurate ECG-related parameters can be derived [122]. Generally, these methods do not provide an electrical derivation of the heart activity, which offers considerably more information and higher signal quality. Furthermore, most of these techniques require an external device or very specific and restrictive positioning and are thus more suited for stationary monitoring, e.g. in a clinical environment.
Therefore, such devices are of no concern to this thesis, except for vibration-based systems that could be used in extended reality (XR) headsets, since XR headsets are among the building blocks of the method proposed in this work [123, 127].

2.2.3 Electrocardiography Measures

Given the long history of the ECG, its importance in clinical practice, and the prevalence of heart diseases in post-industrialized countries, a lot of research has been conducted on this biosignal. Today, all fiducial points in the ECG that can be discerned using modern analog-to-digital converter (ADC) hardware and sample timings are characterized and well understood. The challenge for future work lies in finding different ways to record this signal and to use its information in unprecedented ways. As this work is not concerned with clinical arrhythmia classification on 12-lead ECGs, the parameter descriptions focus on those that can be extracted using the limb leads of the Einthoven triangle (i.e. Lead I, Lead II, and Lead III). Clinical ECGs are usually recorded at a sampling rate of 1000 Hz or higher to allow fine-grained changes in interval timings to be discerned. However, many of the feature points in this signal can already be determined quite accurately at sampling rates above 128 Hz.

The most basic parameters of the ECG are the amplitudes and timings of the main fiducial points (P, Q, R, S, T) relative to each other, to the surrounding heart cycles, and to the baseline. Table 2.2 shows an overview of these parameters, including the symbols used for them throughout this work, the physical unit, a description, and the norm values. The table also introduces a color scheme that will be used throughout the work for results related to the respective parameter.

There are several further parameters that are used to characterize specific waveform abnormalities in the ECG waves. Generalizing somewhat, any skewed or elevated waveform has a clinically relevant meaning [129, pp. 13–14, 39–55]. Due to the many variations and conditional norm criteria, these are not listed in detail in this work. Trappe and Schuster provided a wealth of examples for all abnormal variations, including a multitude of pictures [129].

2.2.4 Stress Manifestation in the Electrocardiogram

Stress also influences several parameters of the ECG, most dominantly the time and frequency information in the 푅푅-interval. Kim et al. conducted a meta-analysis of this influence on HRV parameters [131]. They screened 235 studies and included 37 in their review. They found that the most frequently reported parameter correlating with stress is low parasympathetic activity, with a decrease in 푃퐻 and an increase in 푃퐿 [131]. This would mean that the 푃퐿/푃퐻 parameter should show a sharp increase during stress. This is backed by [74, 84, 89].
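The HRV parameters referred to here and listed in Table 2.2 can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing pipeline of this work: the input is a list of normal 푅푅-intervals in ms, the frequency-domain part resamples the interval series at an assumed 4 Hz using linear interpolation and uses a plain periodogram, and the function names `time_domain_hrv` and `lf_hf_ratio` are hypothetical. The band limits (0.04 Hz–0.15 Hz for 푃퐿, 0.15 Hz–0.4 Hz for 푃퐻) follow Table 2.2.

```python
import numpy as np


def time_domain_hrv(rr_ms):
    """Time-domain HRV parameters from normal RR-intervals given in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    diffs = np.diff(rr)  # successive differences T_RR(n) - T_RR(n-1)
    return {
        "HR": 60000.0 / rr.mean(),                        # mean heart rate in bpm
        "AVNN": rr.mean(),                                # mean RR-interval in ms
        "SDNN": rr.std(ddof=1),                           # standard deviation in ms
        "RMSSD": float(np.sqrt(np.mean(diffs ** 2))),     # t_RSD in ms
        "pNN50": 100.0 * float(np.mean(np.abs(diffs) > 50.0)),  # p_N50 in %
    }


def lf_hf_ratio(rr_ms, fs_hz=4.0):
    """Return (LF power, HF power, LF/HF) in ms^2 from RR-intervals in ms."""
    rr = np.asarray(rr_ms, dtype=float)
    beat_times_s = np.cumsum(rr) / 1000.0
    # Resample the unevenly spaced interval series onto an even time grid.
    grid_s = np.arange(beat_times_s[0], beat_times_s[-1], 1.0 / fs_hz)
    rr_even = np.interp(grid_s, beat_times_s, rr)
    rr_even = rr_even - rr_even.mean()
    # One-sided periodogram in ms^2/Hz (no windowing or averaging).
    spectrum = 2.0 * np.abs(np.fft.rfft(rr_even)) ** 2 / (fs_hz * rr_even.size)
    freqs = np.fft.rfftfreq(rr_even.size, d=1.0 / fs_hz)
    df = freqs[1] - freqs[0]
    lf = spectrum[(freqs >= 0.04) & (freqs < 0.15)].sum() * df
    hf = spectrum[(freqs >= 0.15) & (freqs < 0.40)].sum() * df
    return lf, hf, lf / hf


if __name__ == "__main__":
    print(time_domain_hrv([800, 810, 790, 870, 800]))
```

The time-domain values need only a handful of intervals, whereas the band powers are only meaningful for recordings of several minutes, since the lowest frequency of interest (0.04 Hz) corresponds to a period of 25 s.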
However, several of the other ECG parameters also showed significant changes in many of the studies. These are the heart rate (HR), 푝푁50, and 푡푅푆퐷. As these are time-domain parameters, they are much easier and more reliable to compute than the 푃퐿/푃퐻 parameter, which depends on long measurement times. Usually hours, but at least several minutes, of ECG recording are needed to provide reasonable results for this parameter, whereas the other three can theoretically already be calculated from just three consecutive heartbeats. The average heart rate will increase during stress and decrease during relaxation. The 푝푁50, which describes the percentage of highly variable inter-beat intervals (>50 ms), increases during relaxation and decreases during stress. The 푡푅푆퐷 quantifies this variability for all intervals, not just those differing by more than 50 ms. Therefore, the 푡푅푆퐷 will also increase during relaxation and decrease during stress.

Table 2.2: Overview of important ECG derived parameters. The most commonly used (derived) parameters in clinical settings for arrhythmia detection or for stress classification. Collected from [74, 75, 128–131]. HRV parameters are usually extracted from a 24 hour or at least 5 minute recording. Colors in this table represent the palette scheme used throughout the entire work, e.g. to color result plots representing this (or a closely related) parameter.

| Parameter | Symbol | Unit | Normal range | Description |
|---|---|---|---|---|
| 푃-wave amplitude | 퐴푃 | mV | < 0.25 | P wave amplitude from signal baseline |
| 푃-wave width | 푇푃 | ms | 50–100 | P wave time period |
| 푄-wave amplitude | 퐴푄 | mV | < 1/4 퐴푅 | Q wave amplitude from signal baseline |
| 푄-wave width | 푇푄 | ms | 40–60 | Q wave time period |
| 푅-wave amplitude | 퐴푅 | mV | 0.8–2 | R wave amplitude from 푆-peak, depends on lead/muscle mass |
| 푇-wave amplitude | 퐴푇 | mV | 1/6 퐴푅 < 퐴푇 < 2/3 퐴푅 | T wave amplitude from signal baseline |
| Intervals: PQ time | 푇푃푄 | ms | 120–200 | Time period from beginning of P- to beginning of 푄-wave |
| Intervals: QRS width | 푇푄푆 | ms | 60–100 | Time period from beginning of Q- to end of 푆-wave |
| Intervals: QT time | 푇푄푇 | ms | < 550 | Time period from beginning of Q- to end of 푇-wave |
| Intervals: RR-interval | 푇푅푅 | ms | 600–1200 | Time period from the current 푅-peak to the previous one |
| Heart rate | HR | bpm (min⁻¹) | 50–100/20–250 | Heartbeats per minute (normal range: sinus rhythm/physiological) |
| HRV time-domain: AVNN | mean 푇푅푅 | ms | 787.7 ± 79.2 | Arithmetic mean of all normal 푅푅-intervals |
| HRV time-domain: pNN50 | 푝푁50 | % | 7.5 ± 7.6 | Percentage of adjacent normal 푅푅-intervals differing by more than 50 ms |
| HRV time-domain: RMSSD | 푡푅푆퐷 | ms | 27.9 ± 12.3 | $t_{RSD} = \sqrt{\frac{1}{N-1}\sum_{n=2}^{N}\left(T_{RR}(n) - T_{RR}(n-1)\right)^2}$, $n = 2, 3, \ldots, N$, with the normal beats 푇푅푅 |
| HRV time-domain: SDNN | 휎푅푅 | ms | 141 ± 39 | Standard deviation of normal 푅푅-intervals |
| HRV frequency-domain: VLF | 푃푣퐿 | ms² | 1900 ± 1056 | Signal power ≤ 0.04 Hz |
| HRV frequency-domain: LF | 푃퐿 | ms² | 1170 ± 416 | Signal power in 0.04 Hz–0.15 Hz |
| HRV frequency-domain: HF | 푃퐻 | ms² | 975 ± 203 | Signal power in 0.15 Hz–0.4 Hz |
| HRV frequency-domain: LF/HF | 푃퐿/푃퐻 | power ratio | 1.5–2.0 | Ratio of 푃퐿 to 푃퐻 |

Since this work describes two different study types, one that maximizes relaxation and one that maximizes stress, the following correlations should be kept in mind for all of these HRV parameters when interpreting result plots and numbers. When comparing two different conditions A and B, and calculating the relative change 훿 in one of the HRV parameters from A→B, B will be more relaxing if

$$\delta_{\mathrm{HR}} < 1.0;\quad \delta_{P_L/P_H} < 1.0;\quad \delta_{t_{RSD}} > 1.0;\quad \delta_{p_{N50}} > 1.0;\quad \delta_{T_{RR}} > 1.0. \tag{1}$$

B will be more stressful if

$$\delta_{\mathrm{HR}} > 1.0;\quad \delta_{P_L/P_H} > 1.0;\quad \delta_{t_{RSD}} < 1.0;\quad \delta_{p_{N50}} < 1.0;\quad \delta_{T_{RR}} < 1.0. \tag{2}$$

In addition to these relatively simple HRV parameters, other measures that also have a relation to stress can be derived from the ECG. Using more complex extraction techniques, it is possible to determine the breathing rate from the ECG/HRV signals. The respiratory sinus arrhythmia (RSA) is a physiological feature found in all vertebrates [132]. It describes an increase in heart rate during inhalation and a decrease during exhalation. This pattern in the heart rate variability can be extracted and allows an assessment of stress, with the breathing rate being decelerated during stressful phases, the respiratory variability decreased, and the frequency of sighing increased [16]. Other observations, however, indicate that the breathing rate increases during stressful tasks [84]. This might depend on the nature of the stressful task.

Photoplethysmography/-gram (PPG) has been suggested to also provide accurate HR and even HRV values [133, 134]. This would allow it to characterize stress in a similar way as is possible with the ECG. However, such measurements are highly susceptible to movement artifacts, which can often lead to a total loss of the signal. This method determines the blood perfusion in the arteries through reflectance or transmission photometry. It measures the changes in the amount of light reflected from or transmitted through the surface skin in the red and green spectrum. The pulse wave detected in the capillaries with this method has its own characteristics. Even though a close relation to the heart systole, and thus the QRS complex, exists, it is not an electrical derivation of the heart activity, and consequently it contains different information than an ECG. Although most wearables still rely on this technique, a reliable stress determination seems infeasible with a PPG. Therefore, this work focused on providing a real ECG as input to the wearable stress laboratory (chapter 4).

2.3 Virtual Reality

The largest part of the user interface in the wearable stress laboratory is presented in virtual reality. Therefore, this section introduces the related terms, technical backgrounds, and history.

2.3.1 The History of Extended Realities

Extended reality (XR) is an umbrella term for all forms of virtual and augmented environments. It is an alternative term for the reality-virtuality continuum described by Milgram and Kishino to clearly include the completely real and virtual parts of the spectrum as well as the concepts of virtual reality (VR) and augmented reality (AR) [135, 136]. In the last two years, this term has started to take precedence over the term mixed reality (MR), e.g. the World Wide Web Consortium (W3C)⁶ has named its combined AR and VR device API recommendation WebXR (still a draft at the time of writing this thesis) [137].

XR environments are usually computer-generated scenes in a virtual three-dimensional (3D) space which are perceived by the human senses as real, in particular by the visual apparatus using stereoscopic vision-based headsets [76]. These are also called head-mounted displays (HMDs), as stereopsis is achieved by two independent displays, one for each eye, showing the scene from a slightly different angle/viewport and generating the illusion of visual depth. They have become technically more advanced in recent years, with the breakthrough development for the consumer market started by a crowdfunding campaign for the Oculus Rift (Facebook Technologies, LLC, California, USA) in 2012.
With its success, the modern idea of XR is somewhat tied to stereoscopic headset systems and their input. However, the idea of feeling present (presence/telepresence) in a virtual world is by definition not tied to this kind of hardware [138]. The concepts and definitions of presence and immersion will be discussed in the next section.

Palmer Luckey, one of the original inventors and co-founder of Oculus Inc. (acquired by Facebook Inc. in 2014 [139]), was later featured on the cover of TIME magazine in 2015, with the slogan that his invention will “change the world” [140, 141]. His HMD, the Oculus Rift, was among the first to use cheap, bright, high-resolution LCD flat panel screens in a stereoscopic setup to create a cheaply manufacturable VR headset [142]. This development was essentially possible only because of the fast-growing smartphone market, for which those displays were developed and cheaply produced. The problems those displays solved, e.g. that of too little pixel density, were the major limiting factors for the breakthrough of VR in the decades before the Oculus Rift. Still, a large number of research questions regarding VR had already been asked and answered in the 1990s, when wearable HMDs had just become feasible and were very expensive, specialized devices.

6 https://www.w3.org/Consortium/ (visited on 2019-10-10).

VR research dates back much further, with Ivan Sutherland in 1968 being the first scientist credited with the development of a computer-connected VR headset [143]. There were even earlier devices that, one could argue, generated some kind of virtual reality, however not with a computer, for example the stereoscope in 1832, invented even before photography [76]. Jason Jerald even argues that the cave drawings of early human civilizations could be considered analog VR [76]. Either way, the VR system by Sutherland was the first one that allowed immersion into a 3D virtual computer-generated scene. Similar predecessors of modern HMDs were the infamous Sensorama device by Morton Heilig in the 1950s or the Stereoscopic Television Apparatus in 1960.

With the rise of the personal computer and console gaming industry, VR devices were hyped for consumers again in the 1990s. However, the development of vision hardware could not live up to the promises, so the interest subsided and no significant hardware progress was made in the first decade of the 21st century [76]. Only following the success of the Oculus Rift did many manufacturers start to invest heavily in this market again and create their own XR platforms. This includes companies like Google, Microsoft, Samsung, Sony, and many more.
Google developed the Google Cardboard, a low-end housing device for an Android smartphone, which allowed masses of people to experience VR. In the years leading up to that, Thad Starner developed the Google Glass AR system [144].

Today, XR is widely adopted in construction planning, engineering, design visualizations, flight simulation, medical diagnosis [71], theme-park entertainment [76], and even in sports training [63, 69, 70, 145, 146]. In stress research it has already been used extensively as well, not primarily for stress generation, but for the treatment of psychological disorders and also for relaxation and other biofeedback-related applications [147]. With their high-quality immersion capabilities and the resulting strong feeling of presence, modern HMD hardware platforms are particularly interesting for cognitive research and therapeutics [148]. Subjects may even forget that they are part of an experiment or therapy and behave much more naturally than in real laboratory settings. This is one of the reasons why the stress research platform in this thesis is built around this medium.

2.3.2 Today’s Virtual Reality, Presence & Immersion

The advantages of using virtual reality for scientific research are numerous and have been debated extensively. Highly consistent repeatability, precise control of the environment and experimental variables, and unrestricted application scenarios (e.g. not bound by the laws of physics in the real world or restricted to real objects) are just some of the repeatedly mentioned and obvious advantages [149, 150]. The strong feeling of presence that can be induced with modern hardware has led to some striking findings in therapeutic research, with some even arguing that the effects might be comparable or even superior to in vivo therapy [148].

The term “immersion” refers to the physical capability of the hardware to generate an immersive virtual environment. For example, an HMD with a larger pixel density per eye, or a wider field of view (FOV), is more immersive than one with a lower density/FOV⁷. The term “presence” refers to how much the immersed user actually feels like being in a real environment, or “being there” [76, 138, 151, 152]. To some extent, presence is dependent on (a function of) immersion and the user [76]. However, there is no easily discernible relationship, as presence is a feeling of a human being. It can already be quite high with rudimentary immersion capabilities, as has been shown by several computer games in the past, when computer hardware was still very limited and programmers and game designers used many clever tricks to create an illusion of reality.
Examples for such games are Doom, Quake (both id Software, Richardson, US), and Half-Life (Valve, Bellevue, US), which were praised and hugely successful due to their realism compared to the state-of-the-art at their time of release [153, 154]. People were used to much worse graphics, sound, simulated, and reactive content so that the little immersion potential of those games provided a relatively high presence-feeling/engagement. For all of those games, John Carmack was directly or indirectly responsible, which made him a key-figure in the development of immersive consumer

7 This comparison is of course only valid if the parameter is not already above the limit of physiological separability in humans, i.e. if pixel density is already so high that the human eye can not discern them, there will be no increase in immersion if the density is increased.

32 2.3 Virtual Reality

Figure 2.2: A modern virtual reality headset. The person is wearing the Vive Pro virtual reality headset [156, 157] with the additional Wireless Adapter module (top of the head). This module allows the VR to be experienced without having a cable connection to the rendering computer. The headset seems heavy in the front, however, due to its strapping and counterweight at the backside it feels quite light to carry. computer simulations. He works for Oculus since 2013, originally as CTO and has been awarded with a lifetime achievement VR Award [154, 155]. Concluding, one could say that in the virtual reality, the feeling of presence is generated by essentially tricking the human senses. However, at a very high degree of presence, for many psycho-cognitive processes, it does not make a difference anymore whether the source of the presence- feeling is the real world or a virtual one [148]. These ideas and findings around immersion and presence and their effect with respect to cognition and stress engagement are relevant for the design decisions made for the Stroop Room in this work, e.g. the way how errors are communicated and how the overall environment is designed in a simplified way with complicated animations not being relevant to the stress elicitation of the Stroop paradigm and visual orientation (see chapter 8). 2.3.3 Hardware Systems for Immersive Virtual Environments Modern virtual reality hardware benefits from the fast development growth in the smart device market. Miniaturization of microprocessors, increase in memory density and development of miniaturized sensor components like IMUs are key developments enabling modern HMDs. These developments were accelerated by the pervasive spread of smartphones in our society. Besides a multitude of different HMDs on the market [63], there are two key hardware platforms that dominate it. The Oculus Rift system (patented by Palmer Luckey et al. in 2013 [158]), now owned by Facebook


Inc., and the Vive system (patented by Zellweger et al. in 2015 [156]), produced and marketed in collaboration between HTC Corporation and Valve Corporation/Software. Their major distinguishing factor was originally that the Oculus was more of a portable device without room-scale tracking capabilities, while the Vive was a stationary system with room-scale tracking enabled through two tracking base stations based on laser triangulation; Figure 2.2 shows the headset with its optical sensors. Both companies have expanded their hardware portfolio to also include stand-alone devices that do not require an external computer for graphics processing. The Oculus headset is also available in a mobile version that requires a smartphone to be inserted into the housing to enable VR experiences rendered through a split-screen view on the smartphone. This is a highly versatile and mobile variation and one of the reasons why this device in particular was targeted by the wearable stress platform.

In the AR ecosystem, the major platforms are Microsoft with its HoloLens system [159], which was recently released in version 2, and Magic Leap, which recently launched its first hardware system [160]. Google also developed an AR device, the Google Glass; however, as of today it has not made it to the final consumer market [63]. Apple has also recently shown interest in the augmented reality spectrum, but has not announced any hardware yet [161]. The company is seemingly not interested in the VR space. Speculation exists that an augmented reality iPhone device will be released some time in the future. These AR systems are listed here for completeness, as they can theoretically also provide enclosed VR environments. However, the current hardware capabilities of AR devices are not comparable to high-end VR solutions, as they cater for a different need. 
Both device platforms still have to use cutting-edge technology to provide reliable solutions for their respective fields, so at this moment AR devices are not yet relevant for a wearable stress platform. There is a multitude of other hardware development companies that provide either adapted or their own HMD creations; however, most of the existing and published development pipelines mainly target the Oculus and Vive platforms. This is the reason why these two systems were used in this work.

2.3.4 Software to Create Immersive Virtual Environments

In order to use VR hardware for research, a software pipeline has to be used to create, program, and run virtual environments. There are


many different software tools or entire toolkits to do so, and they can be distinguished broadly into three categories:

1. base 3D engines, which provide an entire framework to build virtual worlds from the ground up with unlimited possibilities for modeling and programming,
2. derived specialized software kits that are either custom-built or based on one of the frameworks in (1) to ease the creation of content for specific target areas, but are still programmable by code to a certain degree,
3. derived specialized software kits with pre-made environments that are no longer programmable by code and only allow tweaking of simulation parameters/independent variables, intended for direct use by an end user, e.g. environments for the treatment of psychological disorders, where the target users are a psychologist and his/her client.

An overview of such engines and tools that can be used to create virtual environments for psychological purposes is given in Table 2.3, along with their associated category. The table furthermore lists the platform, the programmability of the engine, and references. Historically, the development of those engines started with the game engine for Wolfenstein 3D (id Software, Richardson, USA) and Doom, mostly programmed by a single person (John Carmack, considered one of the fathers of modern 3D graphics and engines, in particular of the first-person shooter genre) [153]. With the advancement of 3D graphics and its dedicated hardware, called graphics processing units (GPUs), the rendering pipelines (including the programming of sophisticated shader code that runs directly on GPUs) and the effort for modeling high-resolution 3D polygon objects with detailed textures became highly complex. Today, obtaining photo-realistic graphics with an in-house engine has become completely unfeasible, even for a medium-sized company. 
Therefore, three main 3D "engines" evolved and captured the majority of the market with licensing models for their software: CryEngine (Crytek GmbH, Frankfurt, Germany), Unity 3D (Unity Technologies, San Francisco, USA), and the Unreal Engine (Epic Games, Inc., Cary, USA). Most often, researchers and companies now use either these engines directly or derivatives of them to create their environments.


Table 2.3: Overview of scientifically usable 3D engines for VR. It includes today’s most important 3D engines or scientific tools that can be used to create immersive virtual environments, including in particular those usable for creating psychophysiological tests or for therapeutic usage. The first column gives the category. The fourth column gives the programming languages that can be used to program or customize the engine/VE. If it is not customizable by code, it is marked with no. Unavailable information is referenced with n/a.

Cat.  Name                Platform             Programmable  References
1     CryEngine           Windows              C++           [162]
3     CyberSession        Windows              n/a           [163]
3     Limbix              Android              no            [164]
3     Psious              Android              no            [165]
1     Source 1/2 Engine   Windows              C++           [166, 167]
1     Unity               Windows, Mac, Unix   C#, JS        [168]
1     Unreal Engine       Windows, Mac, Unix   C++           [169]
3     Virtually Better    n/a                  no            [170]
2     Vizard WorldViz     Windows              Python        [171]

Many of the modern engines are so versatile that they allow direct inclusion of wearable hardware or biophysiological monitoring systems via standard operating system interfaces or manufacturer-provided software development kits (SDKs). One example is the Unity SDK for the BioPac system [172]. However, such integrations are far from being a common standard yet; getting systems to interoperate usually still involves a lot of development or monetary effort. In this work, the Nexus Kit and its associated software BioTrace+ were used for reference measurements. Neither provides an existing solution to interoperate with VR environments. Therefore, for the live components, a platform was developed as part of this work and integrated into the two libraries of the wearable stress laboratory.

2.4 Synopsis of the Fundamentals

Human stress is mainly promoted by two stress axes: the ANS and the HPA axis. The first one can be analyzed very well using the ECG and α-amylase, the second one using cortisol measurements in saliva or blood. Stressors can be physiological, cognitive, or emotional. The most stressful type is the socially evaluative emotional threat; however, it is subject to habituation, while cognitive stressors are not similarly


affected. Habituation of a person can be measured using repeated stressor exposure and classified into one of six classes. The ECG provides fundamental high-quality information about the ANS. Since there is no easy way to measure α-amylase and cortisol in real time, the HRV will be used in this work to differentiate stress states. HRV parameters can only be derived reliably if accurate RR-interval timings are recorded and extracted from the ECG. To do this, the R-peaks need to be precisely isolated and characterized. Besides the R-peak, there are the P, Q, S, and T waves, which are important in the ECG (Figure 2.1). However, these are not relevant for HRV parameter determination. Of the HRV parameters, the following (Table 2.2) are the most relevant to stress: changes in HR, in t_RSD (variability factor for all normal heartbeats in a sequence), in p_N50 (percentage of highly variable heartbeats), and in the P_L/P_H ratio, where an increase mostly characterizes a reduction in PNS activity.

VR has evolved significantly in the last ten years and is now at a point where even psychological therapy can be conducted successfully using this medium. One key factor for this is the high degree of presence that modern immersive high-end hardware can evoke. Immersion is a measure of how well a system can technically present a virtual environment, while presence describes the feeling of the user actually being in that world. The state-of-the-art modality to provide VR is the head-mounted display, with the Oculus and Vive devices being the most dominant on the market. They are programmed using 3D engines (Table 2.3) or specialized tools with pre-made environments.
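The time-domain HRV parameters named in this synopsis can be illustrated with a minimal sketch. It assumes that t_RSD corresponds to the widely used root mean square of successive RR differences (RMSSD) and p_N50 to the percentage of successive differences above 50 ms (pNN50); this mapping is an assumption made here for illustration, and Table 2.2 remains authoritative. The function names are hypothetical and not part of the libraries developed in this work. The P_L/P_H ratio is omitted because it additionally requires a spectral estimate.

```python
from math import sqrt
from typing import Dict, List


def rr_intervals_ms(r_peaks: List[int], fs: float) -> List[float]:
    """Convert R-peak sample indices into RR intervals in milliseconds."""
    return [(b - a) * 1000.0 / fs for a, b in zip(r_peaks, r_peaks[1:])]


def time_domain_hrv(rr_ms: List[float]) -> Dict[str, float]:
    """Mean HR, RMSSD and pNN50 from a sequence of normal RR intervals."""
    if len(rr_ms) < 2:
        raise ValueError("at least two RR intervals are required")
    # Successive differences between neighbouring RR intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = sum(rr_ms) / len(rr_ms)
    return {
        # Mean heart rate derived from the mean RR interval (assumption:
        # other definitions average the instantaneous heart rate instead).
        "hr_bpm": 60000.0 / mean_rr,
        "rmssd_ms": sqrt(sum(d * d for d in diffs) / len(diffs)),
        "pnn50_pct": 100.0 * sum(abs(d) > 50.0 for d in diffs) / len(diffs),
    }
```

Both functions depend only on the RR-interval timings, which is why the accuracy of the R-peak positions directly limits the accuracy of every derived parameter.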

3 Related Work

The related work is discussed in three different sections. First, similar work in the field of mobile/wearable electrocardiography and arrhythmia classification is described. Then, stress detection methods based on such wearable implementations are collected. At the end, a short overview of the most important laboratory stressor methods is given, followed by an extensive overview of VR-based stressor methods.

3.1 Mobile/Wearable Electrocardiography and Arrhythmia Classification

Wearable monitoring is a term dating back to the earliest developments of body-observing or body-augmenting machines in the last century. As laid out in the last chapter, Holter's ECG monitoring device was one of the earliest wearables [116, 119]. The goal has not changed since. However, with the advent of microprocessors and the advancement in miniaturizing electronic circuits in general, such devices became much more lightweight and unobtrusive [119, 173]. Wearable devices reached widespread acceptance with the large-scale use of smartphones among consumers. Energy harvesting became an important research sub-field, as batteries are nowadays the most limiting factor for wearables and pervasive computing in general [174, 175]. Early wearable devices largely focused on activity tracking, later on activity classification [175], and also on specific activity parameter determination and cross-field recognition systems, for example gait analysis and associated disease characterization [176]. Mainly due to low-cost photoplethysmographic sensors becoming available in mass production, sports watches started tracking not only activity through IMU-based systems, but also heart rate using the (blood) pulse wave [177]. The most recent advancements in technology have led to current consumer devices also measuring the actual electrical activity of the heart, i.e. ECGs.

3.1.1 The Apple Watch as an Example

One example of those devices is the Apple Watch (Series 4 and Series 5, Apple Inc., Cupertino, USA). 
It allows the user to record his/her ECG by placing a finger of the non-watch hand onto the Crown button/dial (usually the index finger) as shown in Figure 3.1. This closes an electrical circuit with both arms serving as limb leads. The resulting signal, shown in Figure 3.2a, is therefore similar to a Lead I derivation and is printed


Figure 3.1: ECG recording on an Apple Watch. The ECG recording function on an Apple Watch 4 and 5. The user has to keep a finger of the non-watch hand on the watch's Crown to close the electrical circuit and allow an Einthoven Lead I-like derivation of the ECG.

out inside a PDF document that is sent to the user's iPhone, where he/she can print it and/or send it to their doctor. A classification engine inside the watch targets the recognition of atrial fibrillation. However, it is very conservative, restricted, and limited in its sensitivity [111, 178, 179], e.g. it will not classify signal segments where the heart rate is above 120 beats per minute (bpm), or segments with more than about 15 % noise. The whole system is easily susceptible to noise, as can be seen in Figure 3.2b. Even slight movements of the arm or finger incur heavy noise artifacts, up to a complete loss of the signal. Furthermore, the signal data is not directly available in the standard interface. As a consumer, one is limited to the print-out in PDF documents. There is no access to the raw sampled signal. Even though the Apple Watch clearly shows that consumer technology is at a stage where true ECGs can be recorded, the usability restrictions required to make it work are still too large for feasible unobtrusive monitoring. There are more devices of this kind, but they are all in the same ballpark regarding ECG recording and analysis characteristics and have the same limitations.

3.1.2 Market Overview

An overview of current wearable devices that allow an ECG to be recorded is given in Table 3.1. This represents the majority of systems available at the time of writing, including consumer devices like ECG-enabled


Table 3.1: Overview of wearable ECG devices. Wearable devices that allow an ECG to be measured. Devices intended for research use or still in development and not directly for consumers are marked with x in the column R (research). Devices intended for clinical use (e.g. via prescription or with evaluation only by a medical doctor) are marked with x in the column C (clinical use). References include a website, where available, as well as research or evaluation studies focusing on ECG parameters. The T (type) of the wearable can be a smartwatch or general wrist-based device (wrist), a strap or patch-like device attached to the chest (chest), or a textile device (textile).

Wearable Name                     T        R  C  References
Apple Watch 4/5                   wrist    -  -  [180, 181]
Biocalculus                       chest    -  x  [182]
BioPac Bionomadix ECG             -        x  -  [183]
Cardea SOLO                       chest    -  x  [184]
FitnessSHIRT                      textile  x  -  [66, 185]
Ha et al., EMAC prototype         chest    x  -  [186]
Hexoskin (Smart)                  textile  -  -  [187–190]
Holter monitor                    -        x  -  [116]
KardiaBand (KardiaMobile)         wrist    -  -  [191–194]
Lief                              chest    -  -  [195]
NeXus-10 (and 4)                  -        x  -  [196]
OMsignal                          textile  x  -  [197]
QardioCore                        textile  -  -  [198]
Shimmer ECG                       -        x  -  [199, 200]
TAGecg                            chest    -  x  [201]
VivaLNK Vital Scout/ECG monitor   chest    -  -  [202]
Withings Move ECG                 wrist    -  -  [203]
Zephyr Bioharness (3)             textile  -  -  [204]
Zio Patch                         chest    -  x  [205–207]


(a) Normal ECG signal (b) Noisy ECG signal

Figure 3.2: Apple Watch ECG signal characteristics. The resulting signal of an ECG recording with the Apple Watch. The output is a PDF containing a signal printout, of which 6 s are shown in (a) and 4 s in (b). There is a calibration wave printed into the grid in (a). The QRS complex and the T-wave are clearly visible in the signal in (a). Signal (b) shows the severe noise artifacts that are incurred by just slightly moving the arm or finger.

smartwatches, devices primarily intended for research or still under research/in development, and devices only used as part of a healthcare provider-initiated patient-blind analysis [208]. The list does not contain devices that are not attachable to the body (i.e. not wearable), e.g. devices like imPulse, MyDiagnostick, Psychlab, and Zenicor-ECG [209–211]. The Apple Watch was used as an initial example of what current watches or wristbands are capable of. Not quite as unobtrusive from a wearable point of view, but widely available, are chest strap devices like the Zephyr Bioharness system [204]. They usually comprise two electrodes in the chest strap, which are used to record a traditional ECG. However, this signal is generally heavily filtered for motion artifacts, and R-peak detection is applied immediately. The output is a series of RR-intervals or, in some cases, just an HR value averaged over several seconds. Some of those devices allow these values to be exported for research purposes, but not the original ECG signal. A step forward from a wearability perspective are textile-based ECG systems. Usually, they consist of a functional textile shirt with specialized textile electrodes sewn into the cloth. One important example of such a device is the Fraunhofer FitnessSHIRT [66]. It is important for this work because the circuit platform it is built on served as the direct precursor to the current-based system presented in this thesis (chapter 4). 
Further examples of a shirt variant are the Zephyr, Hexoskin, and OMsignal [187, 197, 204].


3.1.3 Algorithms

In addition to consumer-grade wearable devices, where the implementation of signal processing algorithms is usually not publicly available, there are several additional devices or algorithms that play an important role either in ECG processing in general or in the recording of the ECG. One of these that needs to be mentioned is the Mortara algorithm [212]. It is named after the inventor of a patent from 1986, which describes an early digitizing clinical ECG monitor and a corresponding algorithm to detect R-peaks [212]. Today, many clinical monitors are based on this algorithm or system. It comprises a bandpass-filtering technique for the raw A/D-sampled signal and a QRS complex detector to identify the R-peaks.

Detection of R-peaks is the most crucial operation for any algorithm working on the ECG signal. As the R-peak is the most prominent of all fiducial points, except for some specific arrhythmia conditions, a lot of work went into the development of an optimal method to extract it. Among the first digitally implemented and arguably most influential detectors in this area is the one by Pan and Tompkins [213], created in 1985. It was refined one year later by Hamilton and Tompkins [214]. The detector uses four basic steps: a digital bandpass filter, a squaring function, a moving-window integration, and finally a sequentially adaptive threshold-based detection scheme. The improved algorithm has a sensitivity of 99.69 % and a positive predictivity of 99.77 %, evaluated on the MIT-BIH Arrhythmia Database, and is thus even today considered a reference detector [214]. The MIT-BIH Arrhythmia Database is considered a gold standard test database for comparing algorithm performance related to ECG analysis [215]. It contains 48 half-hour two-channel ECG signals from 47 subjects and includes samples for most major arrhythmia conditions [215]. It is publicly available on PhysioNet for use by anyone for comparison tests [216]. 
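The four processing steps and the two reported performance metrics can be illustrated with a minimal sketch. It is not the published Pan and Tompkins or Hamilton and Tompkins implementation: the band-pass stage is replaced by a difference of two moving averages, the learning phase and the threshold constants are simplified assumptions, and all function names are hypothetical. Sensitivity is computed as TP / (TP + FN) and positive predictivity as TP / (TP + FP), with detections matched to reference annotations within a tolerance.

```python
from typing import List, Tuple


def moving_average(x: List[float], width: int) -> List[float]:
    """Causal moving average over the last `width` samples (fewer at the start)."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def detect_r_peaks(ecg: List[float], fs: int) -> List[int]:
    """Return sample indices of detected R-peaks (simplified sketch)."""
    if len(ecg) < 3:
        return []
    # Step 1: crude band-pass (assumption): short smoothing minus slow baseline.
    lp_width = max(1, fs // 50)
    smooth = moving_average(ecg, lp_width)
    baseline = moving_average(ecg, max(1, fs // 2))
    band = [s - b for s, b in zip(smooth, baseline)]
    # Step 2: squaring emphasises large deflections and makes values positive.
    squared = [v * v for v in band]
    # Step 3: moving-window integration over roughly 150 ms.
    win = max(1, int(0.15 * fs))
    mwi = moving_average(squared, win)
    # Step 4: adaptive threshold from running signal and noise peak estimates,
    # initialised on the first two seconds (simplified learning phase).
    learn = mwi[: 2 * fs]
    spk, npk = max(learn), sum(learn) / len(learn)
    refractory = int(0.2 * fs)
    peaks: List[int] = []
    last = -refractory
    for i in range(1, len(mwi) - 1):
        if not (mwi[i] > mwi[i - 1] and mwi[i] >= mwi[i + 1]):
            continue  # not a local maximum of the integrated signal
        threshold = npk + 0.25 * (spk - npk)
        if mwi[i] > threshold and i - last >= refractory:
            spk = 0.125 * mwi[i] + 0.875 * spk
            last = i
            # The integrator lags the QRS complex, so search backwards in the raw ECG.
            start = max(0, i - win - lp_width)
            peaks.append(max(range(start, i + 1), key=lambda k: ecg[k]))
        else:
            npk = 0.125 * mwi[i] + 0.875 * npk
    return peaks


def sensitivity_and_ppv(detected: List[int], reference: List[int],
                        tol: int) -> Tuple[float, float]:
    """Se = TP / (TP + FN) and +P = TP / (TP + FP), matching within `tol` samples."""
    unmatched = sorted(detected)
    tp = 0
    for r in sorted(reference):
        hit = next((d for d in unmatched if abs(d - r) <= tol), None)
        if hit is not None:
            unmatched.remove(hit)
            tp += 1
    fn, fp = len(reference) - tp, len(unmatched)
    se = tp / (tp + fn) if reference else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return se, ppv
```

The moving averages are recomputed per sample instead of using a running sum. This is slower, but it avoids accumulating floating-point drift in long recordings and keeps the sketch easy to verify.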
The digital signal processing steps from Pan and Tompkins have since been used in many QRS detectors in some form. Apart from these approaches based on signal derivatives and digital filters [217], there are two other regularly implemented methods: wavelet analysis and neural network-based detectors [217, 218]. Further prominent examples of QRS detectors are the zero crossing count algorithm by Köhler et al. with a sensitivity of 99.70 % and positive predictivity of 99.57 % [219], the template-based approach by Krasteva and Jekova with a sensitivity of 98.40 % and specificity of 98.86 % [220], and


the knowledge-based algorithm by Mohamed Elgendi with a sensitivity of 99.78 % and positive predictivity of 99.87 % [221]. There is a multitude of other algorithms; a good overview and categorization is provided in [217]. The work of Mohamed Elgendi also gives a good overview of the performance of the most prominent algorithms on 11 different standard test databases [221], as well as of the different techniques usable for wearable devices [218].

With these numbers, it becomes clear that improvements after the Hamilton/Tompkins algorithm were only made in the sub-percent range and that R-peak detectors are well established for use in any setting. The errors they make in the database records always occur in the same regions, where severe noise artifacts make it almost impossible even for experienced human raters to identify the QRS complex. The current challenge is to find optimal embedded detector algorithms, to detect all features of the ECG with an accuracy comparable to expert raters, and to identify arrhythmia conditions.

Regarding arrhythmia classification, there are recent approaches which show that machine learning-based algorithms involving neural networks can outperform the average human cardiologist. This was demonstrated by Hannun et al. in 2019, who developed a deep neural network able to classify ten different arrhythmias, achieving an average F1 score (the harmonic mean of positive predictive value and sensitivity) of 0.837, which is above that of the average cardiologist (0.780) [222]. Unfortunately, the authors did not provide performance metrics on algorithmic execution time, complexity, and memory demand for their network; therefore, it is possible that this approach cannot be used in an embedded context yet. These results were recently improved even further in one specific area. Porumb et al. were able to build a neural network that shows 100 % accuracy in detecting congestive heart failure [223]. 
This marks a milestone in machine learning-based arrhythmia detection, where for the first time an algorithm outperforms any compared human rating, and underlines the importance of scientific research in this field.

Several software packages for different platforms or programming languages exist which perform clinical-grade calculation of ECG parameters, given either one or multiple raw ECG signal streams or an array of RR-intervals [130, 224–226]. However, none of these is designed to be used in embedded and non-embedded settings on mobile devices and regular computers alike. Some are not open source, some are simply outdated in their use of development environments, and some require proprietary and costly environments like MATLAB. The Java ECG Library


(JELY) was built in this work to overcome those constraints and to provide wearable-optimized ECG processing algorithms that are repeatable independently of the platform (section 5.2).

3.2 Wearable-Based Stress Detection

In order to identify wearable devices that allow the detection of stress, not only in a qualitative manner but also quantitatively, a cross-sectional analysis is needed of which wearable can provide which biosignals and how strongly those signals are correlated with the actual stress level. How biosignals relate to stress has already been discussed in the fundamentals chapter. Therefore, this section summarizes wearables that allow a minimal set of signal features to be recorded for stress analysis. Essentially, all the wearables listed in Table 3.1 can be seen as capable recording tools for stress quantification, as they precisely measure the ECG. Additionally, there are other wearables which do not record the electrical heart activation, but similar signals, for example the PPG [120]. From this signal, it seems to be possible to estimate e.g. the HRV to an extent that agrees with ECG-based recordings. However, measuring the blood pulse wave is not the same modality as measuring the electrical activity across the heart; therefore, similar results are difficult to obtain already from a methodological point of view. Recent literature estimated the error, which depends on the wearable used and its proprietary algorithms [227]. In that paper, the authors selected the best-performing wearables able to report HRV based on PPG and evaluated their error compared to a clinical gold standard Holter monitor. They evaluated the Biovotion Everion and the Empatica E4 [227]; the tested parameters were the heart rate and several HRV parameters during increasingly demanding running or biking activities. 
For heart rate estimations, the Everion had an average correlation coefficient of 0.99 with the Holter device across all activities. The Empatica's HR readings correlated with the Holter device at only 0.41 on average. Due to this low correlation, the authors did not correlate the HRV parameters of the Empatica against the Holter and only reported their findings for the Everion device. Here, for one of the most relevant parameters for stress estimation, the P_L/P_H ratio (chapter 2), the readings correlated at 0.67 during rest and 0.23 during the highest biking activity. This result shows that even though HR can be recorded using PPG signals with very high agreement to classical ECG recordings, this is not true for HRV parameters, which provide considerably more information


relevant for stress estimation than the HR alone. Corresponding claims of device manufacturers definitely need thorough scientific verification [181, 228]. Ideally, a standard device verification authority is needed that provides an independent device testing and verification process. Something like this has been proposed and attempted before, without success so far [228]. Even though Hernando et al. could show that several HRV parameters derived from the Apple Watch largely agree with values derived from a chest strap device, they did not compare it directly to a clinical gold standard Holter device and only tested subjects in resting conditions [181]. Yet they could show that stress estimation is feasible with a wrist-worn PPG-based wearable device in general. Given these results, it becomes obvious that with current technological development, an ECG signal is still necessary to allow reliable stress estimations from wearables in non-resting conditions.

Table 3.2 provides an overview of currently available consumer wearable devices with an empirical assessment of how likely it is to obtain reliable stress information about the wearer using the respective device. This assessment is based on the relevance of the biosignals the devices are able to record. Names of promising devices are marked in green. The table demonstrates that there are only few devices that are even remotely capable of monitoring stress accurately. However, these devices are usually either closed-source commercial devices or do not provide an actual ECG that can be processed sample by sample. None of them provides an open-source protocol to use their raw signals. Therefore, this work will introduce its own open-source ECG recording device, which additionally uses a novel recording principle to be usable in any environment (chapter 4). 
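The agreement values quoted in this section (e.g. 0.99 versus 0.41 for heart rate) are correlation coefficients between paired readings of a wearable and the Holter reference. As a minimal sketch, and assuming a Pearson correlation coefficient (the cited study may use a different agreement measure), such a value can be computed from two equally long series as follows. The function name is hypothetical.

```python
from math import sqrt
from typing import List


def pearson_r(device: List[float], reference: List[float]) -> float:
    """Pearson correlation between paired device and reference readings."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two series of equal length with at least two values")
    mean_d = sum(device) / len(device)
    mean_r = sum(reference) / len(reference)
    # Covariance and variances (unnormalised, the factors cancel in the ratio).
    cov = sum((d - mean_d) * (r - mean_r) for d, r in zip(device, reference))
    var_d = sum((d - mean_d) ** 2 for d in device)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    if var_d == 0.0 or var_r == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_d * var_r)
```

A high correlation only shows that two series move together. A constant offset between device and reference does not lower it, so correlation alone does not establish that the absolute values agree.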
Table 3.2: Overview of most recent biosignal-tracking wearables. Wearables from various manufacturers have the initial potential to track stress continuously. For each wearable, the available hardware sensors and/or the possibility to measure a specific biosignal are given, together with references to the manufacturer's product page and to scientific papers that evaluated some aspects of their accuracy. This table was adapted from [4]. Wearable names emphasized in green are likely well suited to perform stress analysis based on a combination of relevant biosignals. Previously mentioned directly relevant biosignals are also marked in green.

Available sensors (columns): HR, ECG, BP, Resp, ST, SpO2, EDA, EMG, EEG, GPS, Mic, IMU.

Devices (rows): Apple Watch Series 4&5, Aura Band, Biostrap/Wavelet Wristband, Empatica EmbracePlus, Everion, Fitbit Charge 3/Versa/Ionic, Garmin Vivosmart 4/Fenix 5 Plus/Instinct, Hexoskin, Lief, Misfit Vapor 2, Muse 2 Headband, Omron HeartGuide, Oura Smart Ring, Polar Vantage V, Samsung Galaxy Watch/Active (2), Sentio Feel, Skagen Falster 2, Spire Health Tag/Stone, TicWatch (Pro/S2/...), VivaLNK Vital Scout, WellBe Bracelet, Withings Move ECG/ScanWatch.

References: [104, 244, 245], [180, 181], [187–190], [195], [202], [203, 257], [227, 237], [229], [230–233], [234–236], [238, 239], [239], [240, 241], [242, 243], [246], [247, 248], [249, 250], [251], [252], [253, 254], [255], [256].

3.3 In Vivo and Virtual Reality Stressors

This section gives an overview of the most important/recognized non-VR-based (in vivo) stressors and, where they exist, their adaptations in VR. It is a straightforward and common approach to take an existing and validated in vivo stressor and create a VR adaptation of it that employs the same mechanism to induce stress. The advantages of VR implementations are their easy and consistent repeatability throughout an experiment and their lack of direct physical impact on the subject to be stressed [149, 258].

A good example are phobia-based stressors, which are usually used with the goal of treatment; here, however, the stressfulness is actually a side-effect of this process. Rothbaum et al. showed for the first time in 1995 that the treatment of a phobia purely in a VE is possible [259]. Similar research has backed this observation since then [148]. An example would be the treatment of arachnophobia, where people are confronted with virtual spiders instead of real ones [260], or of acrophobia (fear of heights), where subjects are immersed in a VR simulation that brings them close to an elevated ridge [259]. Even though falling in VR causes severe cybersickness and possibly even actual falling at the moment of the virtual impact, it is obviously a safer approach than having to experience a real falling (danger) scenario [261]. These approaches are only effectively usable as a stressor for people who have a corresponding phobia, or who are at least susceptible to the associated fear, and are thus not suitable as a general-purpose stressor.

In VR, it is also possible to design entirely new methods for stress induction that are not (easily) possible in vivo. This can be based either on the modeling of impossible scenarios, e.g. being threatened by a dinosaur, or on the exploitation of negative side-effects in the way VR systems generate immersion. For example, this could be a method that causes physiological stress by purposefully inducing cyber/simulation sickness [73, 76, 262] to disorient subjects, make them feel sick, and cause preventive or reactive stress effects in the body. However, this is not the focus of current research due to reasonable ethical considerations as well as harmful outcomes for the subjects.

Another class of stressors is exposure to noise or emotion-inducing stimuli (e.g. emotional movie scenes or video footage of a gross dental treatment). However, these in general do not have significant effects on the stress level [82], and related literature is not considered further here. 
3.3.1 Trier Social Stress Test

One of the most prominent generalized stressors in literature is the Trier Social Stress Test (TSST) [39, 88, 263, 264]. It was developed by Clemens Kirschbaum and colleagues in 1993 [39]. The test comprises a preparation phase, where subjects are told to prepare for a job-interview-like public speaking task for ten minutes, and the judgement phase, where the prepared talk has to be given in front of an unsmiling three-person audience, as shown in the bottom picture of Figure 3.3. To increase the socially evaluative threat even further, the judges give the subject a mental arithmetic task at the end that gets increasingly difficult, so that subjects are likely to fail eventually. This combination of social and cognitive stressors


Figure 3.3: The TSST in vivo and in virtual reality. The public speaking and mental arithmetic task setting for the TSST in reality at the bottom and the VR adaptation at the top, as it was realized in the work by Zimmer and colleagues [263]. Reprinted from [263] © 2019, with permission from Elsevier.

is the reason for the high effectiveness of the TSST [39, 82, 263]. Since its introduction, it has been cited, replicated, and used many times [82]. There are several VR adaptations of it, e.g. [263, 265–268]. Recently, a meta-analysis of the cortisol reactivity of most VR implementations has been published [269]. It calculated a mean effect size of 0.65 for the cortisol increase in the test condition across 13 studies implementing a VR version of the TSST. The most recent implementation and publication by Zimmer and colleagues comprises two TSST versions: one in reality and one in VR, where the whole setting and even the looks of the judges were recreated to match reality as closely as possible, as shown in Figure 3.3. They enrolled and tested 93 subjects in their study, distributed into two VR (stress and control) and two non-VR conditions for the TSST. They tracked and evaluated HR, SCL, salivary cortisol, and α-amylase. The results show that the VR version induces significant increases in all salivary metrics and thus causes activation of the HPA axis. However, these increases are all significantly lower than in the in vivo TSST. The heart rate did not increase at all compared to the control in the VR condition, while in vivo it increased highly significantly. The SCL did not show any relative change between conditions; it increased significantly for all conditions over time. Especially considering the largely different behavior of the HR, this shows that even though a similar effect is possible in a VR adaptation of the TSST, it does not reach quite the same level of stress, or perhaps it is a different kind of stress. The authors assume the VR version to be comparable to the in vivo method and do not explain in detail why the HR behaves differently and why the SCL does not differ between conditions.


Comparable findings are described by Shiban and colleagues in 2016 [268]. They also performed the TSST in vivo and in VR. For the latter, they introduced an additional condition (“VR+”), in which they enhanced the normal TSST protocol to include a (virtual) competitor. In general, this had an increased effect on all measured stress parameters. However, the change in salivary cortisol was also much less pronounced in either VR condition than in vivo. In a 2019 meta-analysis, Wechsler et al. analyzed the treatment effectiveness for different phobias in VR compared to in vivo [148]. They concluded that non-social phobias can be treated as effectively in VR as in vivo. However, even though treatment of social phobia in VR shows the same trend, its effect compared to in vivo treatment is much less pronounced than for the other phobias. This could be interpreted as further evidence that the social threat in the TSST-VR versions is just not as effective as in vivo [263, 268]. With the TSST being the most profound social stressor, it could further be interpreted that VR social threat stressors in general are simply not as effective as in vivo ones, as long as the subjects know that they are not experiencing reality or interacting with real people.

In general, a deficiency of the TSST is that the socially evaluative stress cannot easily be separated from the cognitive stress in any modality [264]. Furthermore, for some studies it is a problem that repeated exposure leads to habituation, which in particular has been shown for the TSST [38]. However, for this work this is also an important feature, as it allows the habituation parameter of a subject to be evaluated.

3.3.2 Cold Pressor Test

Next to the TSST, another often referenced stressor in the literature is the cold pressor test (CPT), originally developed in 1932 by Hines and Brown [270]. It involves the immersion of one hand, up to the wrist, in ice-cold water (4 °C) for one minute.
Its effect on the human body, most notably the rise in blood pressure, is very consistent and repeatable, even within the same subject, and occurs in 99 % of the population in a similar way [271]. The characteristic reading of blood pressure after 30 s and 60 s allows the diagnosis of hypertension [271]. Due to its shocking effect on the human body and, for some people, the large amount of pain involved, it is a very stressful procedure and has thus been used extensively for that purpose [272, 273]. When a socially evaluative component is included (SECPT), this test also shows significant HPA axis activation [40]. This stressor is not transferable to VR, as it is based mainly on the physical effects of vasoconstriction and nociceptor stress, i.e. pain [271, 273].


3.3.3 Maastricht Acute Stress Test

A combined variation of the TSST and the CPT is the Maastricht Acute Stress Test (MAST) [88]. It uses an enhancement of the socially evaluative component and an alternating sequence of mental arithmetic and ice water hand immersion tasks, while the subject is given negative feedback. Generally speaking, this test maximizes the cognitive load on a subject while inducing physical pain, blood pressure increase, and shameful feelings, causing a high subjective, autonomic, and HPA axis stress response [88]. The authors showed in their work that this causes significantly higher cortisol responses than the CPT or SECPT alone, and also higher post-responses than the TSST. However, during the task, the cortisol response is higher in the original TSST.

3.3.4 Montreal Imaging Stress Task

The Montreal imaging stress task (MIST) is a combination of a cognitive stressor with social evaluative threat [274]. It was originally developed as a stressor for use in functional magnetic resonance imaging (fMRI) or positron emission tomography (PET) devices. At its core, it is a mental arithmetic task, constructed in such a way that answers can always be given with the numbers 0-9. This was done to allow keyboard responses inside an MRI device. The core stressor is that the performance of the user in the task is compared against others and the difficulty is increased if the subject manages to surpass the average performance. It is therefore impossible for the subject to beat the system, yet the experimenter tells the subject that performing around the average is essential for the outcome of the experiment. The subject is also told that the performance is closely monitored, etc. This represents a classical implementation of the exposed-failure stressor paradigm (see Fundamentals) and thus causes a high state of anxiety and stress for the participant. This is reflected in a significant increase in cortisol levels for participants of the MIST’s test condition.
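The adaptive principle of the MIST described above can be illustrated with a minimal sketch. It is not the original MIST implementation [274]; the class name `AdaptiveArithmeticTask`, the difficulty levels, the assumed average success rate of 0.5, and the window size are hypothetical choices made only for illustration. The sketch generates arithmetic tasks whose result is always a single digit (0-9) and raises the difficulty whenever the recent success rate of the subject exceeds the assumed average.

```python
import random
from collections import deque


class AdaptiveArithmeticTask:
    """Illustrative MIST-like task generator (not the original implementation)."""

    def __init__(self, assumed_average=0.5, window=5, max_level=5, seed=None):
        self.assumed_average = assumed_average  # hypothetical "average" success rate
        self.recent = deque(maxlen=window)      # sliding window of recent outcomes
        self.level = 1                          # level = number of operations per task
        self.max_level = max_level
        self.rng = random.Random(seed)

    def next_task(self):
        """Return (expression, answer) with the answer always in 0..9."""
        while True:
            operands = [self.rng.randint(1, 9) for _ in range(self.level + 1)]
            operators = [self.rng.choice("+-") for _ in range(self.level)]
            expression = str(operands[0])
            result = operands[0]
            for op, value in zip(operators, operands[1:]):
                expression += f" {op} {value}"
                result = result + value if op == "+" else result - value
            if 0 <= result <= 9:  # keep only tasks answerable with a single key
                return expression, result

    def report(self, correct):
        """Record an outcome and raise the level if the subject beats the average."""
        self.recent.append(bool(correct))
        if len(self.recent) == self.recent.maxlen:
            rate = sum(self.recent) / len(self.recent)
            if rate > self.assumed_average and self.level < self.max_level:
                self.level += 1      # harder tasks from now on
                self.recent.clear()  # start a fresh observation window
```

In this sketch the difficulty only ever increases, which mirrors the description that the subject cannot beat the system; a real protocol would additionally need response time limits and the feedback display, which are omitted here.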
It is obvious that the effectiveness of all these tests relies to a large extent on socially negative feedback or the fear of it [19]. Due to our evolution in social structures and their importance for survival, shame is a very strong emotion in humans. An individual outside of a group was an easy target for predators, so the shameful feeling increased the chances of getting back into the group. Socially evaluative threat targets exactly this evolutionarily developed shame/stress mechanism [19, 275]. Therefore, it is not surprising that these tests constitute the gold standard in laboratory stress induction. A large downside to this is that


being subjected to such tests is not a pleasant experience for participants. This is what purely cognitive stressors try to avoid. They satisfy the requirement for a stressor in settings where it is undesired or not feasible to induce socially shameful feelings in subjects.

Figure 3.4: Examples for Stroop stimuli (words shown from left to right: Red, Blue, Green, Green, Red, Blue). The left three words are congruent stimuli/Stroop words; the right three words are incongruent stimuli/Stroop words. The Stroop interference can be prominently observed when sequentially naming the color of several incongruently colored words in quick succession.

3.3.5 Stroop Test

The Stroop Test is one of the most important cognitive stressors and one of the main pillars of this work; therefore, it and its related work are discussed in great detail. First, a history of the Stroop Test is given. Then 2D tests are discussed, followed by VR implementations of the test, and finally an overview of open-source projects implementing the Stroop Test.

The History and Overview of the Stroop Test

In 1935, John Ridley Stroop published his influential work about what today is known as the Stroop Color Word Test (SCWT) or just Stroop Test [276]. It is of special interest for this work, as the stress laboratory developed here is centered around this test. Even though the Stroop Test was not originally designed as such, it is now used as a lab stressor in many different ways. It provokes Stroop interference in a cognitive task very reliably, and someone doing this test needs to concentrate very well to perform it properly without making mistakes [44]. Originally, Ridley Stroop was not concerned with creating a stress test. As mentioned earlier, the specific concept of stress in humans was not widely established in the 1930s. He was mostly interested in studying interference effects in human cognition with respect to word reading and color naming [41].
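The difference between congruent and incongruent stimuli can be made concrete with a small sketch. The color set, the function names `make_stimulus` and `is_correct`, and the dictionary layout are assumptions made for illustration and do not describe the implementation used in this work.

```python
import random

COLORS = ["red", "blue", "green"]  # assumed color set, matching the words in Figure 3.4


def make_stimulus(congruent, rng=random):
    """Return a Stroop stimulus as a dict with the word and its ink color."""
    word = rng.choice(COLORS)
    if congruent:
        ink = word  # word meaning and ink color match
    else:
        ink = rng.choice([c for c in COLORS if c != word])  # forced mismatch
    return {"word": word, "ink": ink, "congruent": congruent}


def is_correct(stimulus, response):
    """In the color-naming task, the correct response is the ink color."""
    return response == stimulus["ink"]
```

For an incongruent stimulus, answering with the written word instead of the ink color is counted as an error, which is exactly the automated response that has to be inhibited.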
He came up with the idea of creating a compound stimulus where the ink color of a word, which itself describes a different color, has to be named, or vice versa. This is called the incongruent condition of the Stroop Test [41]. An example of both Stroop conditions/stimuli is shown in Figure 3.4. The test usually comprises two main conditions: congruently and incongruently colored words. In the incongruent condition, the task usually is to name the color of the ink and inhibit


the more automated (i.e. highly practiced) cognitive process of reading the word [44]. Overcoming this interference of stimuli and inhibiting the automated task is associated with a delay in the response while the subject has to concentrate on selecting/naming the right color. Figure 3.5 shows Ridley Stroop’s original drawing demonstrating this effect in his experiment population.

Figure 3.5: Ridley Stroop’s Stroop effect graph. Original plot depicting the Stroop effect in Ridley Stroop’s paper from 1935 [276]. It shows the large difference in reaction time induced by the Stroop interference effect between congruent stimuli (1) and incongruent stimuli (2).

A multitude of work exists around many variations of the Stroop Test exploiting this cognitive interference effect. Examples are a varying stimulus onset (word-preexposure intervals), where the word is shown but the color appears only after a certain delay, or sequence effects like negative priming, which involves two consecutive stimuli where the second one requires ignoring the color that had to be named on the first one, or vice versa.

In 1991, Colin MacLeod published an extensive review of research on the Stroop effect, tasks, and tests, and created a taxonomy for them, which will also be used in this work [41]. He categorized the main types of Stroop Test variations as the original list test, the individual stimulus version, the (card-)sorting and matching version, the picture-word task, and the auditory version. He also specified two possible response modalities: the oral and the manual response. With the oral modality, a higher interference effect is observed; however, the manual modality in general allows for more precise measurements. MacLeod mentioned that already in 1991, around 700 Stroop-related articles existed in the literature [41]. The majority of these were published between 1973 and 1991. A verification through Google Scholar¹

1 https://scholar.google.com (visited on 2019-11-02).


reveals that, indeed, a search for literature containing the word sequence “Stroop effect” within the date range of 1973 to 1991 returns around 870 articles [277]. The same search within the date range of 1992 to 2020 results in almost 17,000 references [278]. These comprise research from all scientific domains, from findings about the metabolic origins of the interference in the human brain to the classification of schizophrenia using the Stroop effect, or its changes with aging [279–282]. The possibilities for research using this test are vast; the principle can be applied pervasively and has led to a better understanding of several cognitive processes [44]. This work can only scratch the surface of the general research and therefore focuses on literature providing detailed psychophysiological measurements and analyses of stress-related effects, as well as on virtual reality and open-source implementations.

2D Stroop Tests

In 1988, Tulen et al. measured biochemical and physiological responses to the Stroop Test [283]. They recorded plasma and urinary (nor)adrenaline, cortisol, HR, and EDA. Nine young male subjects in their early 20s underwent a control and a stress session. The latter comprised two 20 min Stroop Tests with a 20 min resting period before and after each test. During the control session, they were only required to sit with their eyes open for five 20 min periods. All measures except cortisol showed significant increases during the stress session compared to the control session.

Delaney and Brodie measured HR and calculated HRV during an advanced Stroop Test in 2000 [284]. The test was not the regular Stroop Test: it was set up as a competition among 30 subjects with the additional promise of financial reimbursement. Furthermore, the subjects also had to complete a mental arithmetic challenge. The measured parameters were HR, $\sigma_{RR}$, $t_{RSD}$, $p_{N50}$, $P_{vL}$, $P_L$, and $P_H$.
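To make the time-domain parameters named above more tangible, the following sketch computes them from a list of RR intervals. It assumes that $\sigma_{RR}$ denotes the standard deviation of the RR intervals, $t_{RSD}$ the root mean square of successive differences, and $p_{N50}$ the percentage of successive differences larger than 50 ms, which are the commonly used definitions of such parameters. The function name and the dictionary keys are hypothetical, and the frequency-domain parameters are not covered.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Compute basic time-domain HRV parameters from RR intervals in ms."""
    if len(rr_ms) < 3:
        raise ValueError("need at least three RR intervals")
    # Differences between consecutive RR intervals
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    mean_rr = statistics.fmean(rr_ms)
    return {
        "hr_bpm": 60000.0 / mean_rr,             # mean heart rate
        "sigma_rr_ms": statistics.stdev(rr_ms),  # sample standard deviation of RR
        "t_rsd_ms": math.sqrt(statistics.fmean(d * d for d in diffs)),
        "p_n50_percent": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

A reduction of the last two values under stress, as reported in the studies discussed here, corresponds to smaller beat-to-beat differences in the RR series.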
Results showed a significant reduction in $t_{RSD}$ and $p_{N50}$ for the test group, as well as a significant decrease of $P_H$ and a significant increase of $P_L$, and consequently a significant increase in $P_L/P_H$.

Renaud and Blondin provided another evaluation of the physiological changes during the Stroop Test [285]. They tested 51 young male adults in a congruent control and an incongruent test condition. Both lasted for 8 min, with a 10 min baseline phase before, a 5 min resting phase in between, and a 5 min relaxation phase afterwards. They recorded the ECG and EDA for all participants and calculated HR and SCL changes between all phases. No significant differences were observed between any conditions in any of the measures. However, they reported that


if the incongruent condition was performed first, followed by the control condition, mean HR changed from 82.1 bpm to 76.4 bpm, respectively. If performed in the other order, first the congruent and then the incongruent condition, mean HR changed from 73.3 bpm to 72.9 bpm, respectively. This is still a very interesting finding, as it seems to suggest that the stress induced by the Stroop Test is largest if subjects have not been confronted with the task for a prolonged period of time. In other words, they have to be thrown right into the difficult incongruent test condition to show a large stress response.

Usui and Nishida performed a thorough evaluation of changes in HRV metrics during the Stroop Test [286]. They measured the $RR$ intervals during a 20 min intervention as well as a 10 min baseline before and a 120 min relaxing period afterwards. They calculated $P_{vL}$, $P_H$, and $P_L$. Results showed that $P_H$ and $P_{vL}$ significantly decreased and $P_L/P_H$ significantly increased during the Stroop Test intervention. During the relaxing period, $P_{vL}$ was significantly decreased compared with all other periods. They argue that $P_{vL}$ reflects the slow recovery from a stressor and $P_L/P_H$ the quick recovery.

In 2011, Eilola and Havelka performed an emotional and taboo Stroop experiment [287]. They recorded the SCL and Stroop reaction parameters while testing native and non-native English speakers with emotional and taboo words. Results showed that the SCL increased much more for the native speakers and only slightly for non-native speakers. This means that this kind of test needs to be performed in the native language, and that an SCL rise should be observable for a stressor.

EDA in particular was measured by Mestanik et al. during a Stroop Test experiment that also involved mental arithmetic and a negative emotional stimulus (in the form of a dental caries treatment video shown to participants) [288].
They recruited 20 young adult students for their study and used a protocol with all three stressors as consecutive tasks, with recovery phases of 2–4 min in between and baseline measurements at the beginning. For the Stroop phase, they observed a highly significant increase in SCL of 4.5 µS, from an average of 6.5 µS during rest to 11 µS during Stroop stress. Although their standard error was comparatively low, they did not follow the recommendations for EDA measurement [289]: instead of calculating the increase per person and averaging across subjects, they summed up the raw SCL values. Still, their results show that a significant increase in SCL should be observable during the Stroop Test.

Another EDA-only Stroop study was conducted by Svetlak et al. in 2010 [290]. They measured the EDA of 106 healthy university students during rest and during the Stroop Test. However, they did not provide raw EDA


values, but instead performed an intrinsic complexity analysis, which did not provide any new or different results compared to those described previously.

In 2016, Poguntke, Wirth, and I also conducted a Stroop Test experiment, which led to my research for this thesis [57]. In this study, three different conditions were used to determine the physiological effect of performing the Stroop Test in a virtual environment and what implications a changing VR environment might have on those signals. Here, a classical Stroop Test with individual stimulus presentation and oral response modality was performed with 15 young adult subjects. They were randomly and equally distributed among the three conditions: Desktop, a 2D version of the test on a computer screen; VR, a 2D version of the test on a virtual layer in an empty VE; and VR-hm, which extended the VR condition by shifting the presented Stroop item within the user’s field of view in VR, so that a head movement was necessary to properly see the word. Each subject went through a 5 min resting baseline phase, then either an incongruent stress phase or a congruent control phase, with a 5 min resting phase in between. The protocol was very similar to that previously described for the work of Renaud and Blondin [285]. No significant changes could be observed in the measured HR and HRV parameters between the control phase and the stress phase in any condition. However, the parameters showed a trend towards the VR-hm condition being slightly more stressful than both other conditions. Based on additional oral feedback from participants, the conclusion was that a visual search task in a VE could serve as an additional stressful element to complement the Stroop paradigm. This additional task was implemented and optimized for the Stroop Room.

VR Stroop Tests

There are also several other virtual reality implementations of the Stroop Test.
Some of these were already mentioned in [43], which is loosely recapitulated in the next three paragraphs; afterwards, more recent work is described.

Rizzo et al. developed the Virtual Classroom, an application that takes advantage of a controlled, standardized environment in VR for simulations to collect and compare performance data for assessment, treatment, and rehabilitation [291]. A Stroop Test was designed for this virtual classroom with the purpose of evaluating complex attention performance. They showed that a virtual 2D Stroop Test produces the same Stroop interference effect as in the real world.


Parsons et al. and Wu et al. implemented the test in VR in a military-style simulator [292–295]. Even though this is one of the best evaluated Stroop Tests in VR, which also offers an extensive evaluation of associated changes in biophysiological signals, they only implemented a two-dimensional variation on a screen/window in VR; they did not exploit the Stroop mechanism itself in the virtual environment. They do have a very well-working stressful environment, yet it is essentially a military simulation, which may not be suited for a general audience.

To create a measure for inhibition and impulsivity assessments, Henry et al. developed the bimodal VR-Stroop, ClinicaVR: Apartment Stroop [296]. Results of the preliminary study with 71 participants confirmed the elicitation of the Stroop effect with a VR task based on bimodal stimuli, as well as the ability to measure motor inhibition and internal interference control, and scores related to important measures of impulsivity.

This implementation was further assessed in 2018 by Parsons and Barnett [297]. They compared it against traditional Stroop implementations using only performance assessments. No biosignals were recorded. They concluded that similar responses are found in the VR implementation of the Apartment Stroop and observed in a study with 91 healthy undergraduates that Stroop performance worsened when distractors were present in the VE.

Recently, in 2019, Kerous et al. implemented a Stroop Test in a VR setting [264]. They tested a group of 42 healthy subjects in a virtual setting with socially stressful elements, represented by agents at different proximities to the participant and with different acting elements between the agents, while a 2D Stroop task had to be performed on a monitor inside the VE. They measured EDA and ECG and calculated HR and $t_{RSD}$.
They concluded that EDA is particularly useful for the detection of even mildly stressful situations and that the HRV parameters are not easily influenced by a socially stressful environment alone. HR changed from a median of 69 bpm in the relaxing condition to 84 bpm in the Stroop condition.

Open Source Stroop Tests

The Stroop Test not only exists in the literature; there are also several open-source and/or ready-to-use versions available. Open-source implementations of the Stroop Test were made, e.g., for Matlab [298], PHP [299], Swift (iOS game app) [300], ReactJS [301], or Android Java [302]².

2 I did not test these implementations for functionality. Authors are listed as best as they could be identified. If no publication is associated with an implementation, they are


There are also several psychological experimental test platforms available that have a Stroop Test integrated, e.g. [303, 304] in C#, [305] in JavaScript for the Experiment Factory [306], [307] written in Python for the PsychoPy3 platform [308], the PEBL platform [309], within the PsyToolkit [310–312], or a pure online version [313]. However, these are all implementations of the classical Stroop Test paradigm with only slight modifications and differences, e.g. with respect to response modality. There is no implementation available that transfers the Stroop paradigm into a 3D virtual reality.

3.3.6 Synopsis of the Related Work

This chapter looked at the state of the art in industry and science regarding wearable ECG recording and classification, stress detection, and prior work on experimental stressor methods outside and inside VR.

Wearable ECG devices have been developed to a state where unobtrusive monitoring is possible and real-time connectivity and data streaming to other devices is feasible, but not yet the standard. There are only a few devices that allow license-free and easy raw signal data extraction within an open framework, which is desired by scientific projects. Consumer devices like the Apple Watch and the Withings ScanWatch provide methods for easy ECG recordings in the home environment, but these devices are still restricted in their use; specifically, they require active and motionless focus of the user.

With respect to wearable stress classification, several promising methods have been developed in the literature, but many of these algorithms focus on quantifying stress into often rather rough categories. More sophisticated algorithms are not available in consumer devices yet, and those that claim to be able to assess stress usually do not share their methods or provide independent evaluation results.

Experimental stressors exist in a large variety.
The most successful ones, which are considered the gold standard with respect to their activation of the HPA axis, are the TSST [39] and the MAST [88]. Their disadvantages are that they require a lot of effort and time from experimenters and subjects, and that they usually cause, out of necessity, considerable discomfort among subjects.

This work tries to tackle the major issues in these three areas and provide an improvement. The issue with restricted wearable ECG devices is approached with the Everywhere-ECG, described in the next chapter,

listed e.g. for GitHub according to the amount of code lines contributed to the respective repository.


which uses a different recording principle to allow a more pervasive use of such devices, while following an established wearable platform standard. For stress classification, this work introduces an extensive evaluation of different HRV parameters that can be used to allow a more fine-grained algorithmic quantification of stress, while also providing reference datasets. Furthermore, an algorithm for stress responder type classification is proposed, which takes a novel approach, compared to established methods. Finally, the Stroop Room is proposed as an implementation of the Stroop Test paradigm in a novel way by transferring it to an immersive 3D VR environment and creating a stressor method that does not suffer from the most crucial drawbacks of classical laboratory methods.

Part III

Components for Electrocardiogram-Based Wearable Stress Classification

4 An Everywhere-Usable Electrocardiography System

As was laid out in the previous part, stress classification is reliably possible using an ECG signal. In contrast to the EDA or breathing signal, it is subject to few external influences and has a well-understood and robust recording mechanism. It is the basis for HRV parameter determination, which depends on a precise localization of the $R$-peaks. Such a precise determination is not feasible with averaged heart rate tracking solutions that are usually employed for PPG-based signals in consumer devices. Furthermore, the ECG signal has an established foundation for the assessment of arrhythmic influences and even the detection of heart diseases. It is for all of these reasons that a wearable ECG device forms the basis for the stress laboratory presented in this thesis.

A wide variety of wearable ECG devices exist. However, only a few allow cheap (without license costs) and easy access to the raw signals in an open signal framework for research. Furthermore, even though many of the patch-like devices in Table 3.1 are classified as waterproof, this only concerns the insulation of the general circuit boards, not the technique with which the electric potential on the skin is measured. It is still standard practice to build ECG systems as measurement devices of electric voltage. The reason for this is historic and concerns the development of mass-produced electric circuit components and the associated lack of availability of GΩ and TΩ resistors for transimpedance amplifiers [120, 212].

In this chapter, an approach is presented that is based not on a direct measurement of voltage at the skin interface, but on a measurement of electric current. Although a voltage is also derived indirectly, this happens at a later stage in the circuit, while the first stage measures a flow of current.
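The principle of deriving a voltage from a measured current can be sketched with the relation of an ideal inverting transimpedance stage, in which the output voltage equals the negative product of input current and feedback resistance. The function name and the numeric values are hypothetical and only illustrate the order of magnitude; they are not the component values of the circuit presented in this chapter.

```python
def transimpedance_output(current_a, feedback_ohm):
    """Ideal inverting transimpedance stage: V_out = -I_in * R_f."""
    return -current_a * feedback_ohm


# Hypothetical example: 1 nA through a 1 GΩ feedback resistor
# yields an output of about -1 V.
v_out = transimpedance_output(1e-9, 1e9)
```

The example also indicates why very large feedback resistors matter for this recording principle: the smaller the current to be measured, the larger the resistance needed to obtain a usable output voltage.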
Such transimpedance amplifier circuits are not new in the area of biosignal recording [314]; however, no previous work has designed and evaluated such circuitry for use in wearable textile devices and specifically for application in soaking wet environments or even underwater. Both of these scenarios are of increased importance when targeting the home environment or even unknown environments, which is what the in-the-wild stress laboratory in this work does. The current-based approach is a convenient option to reliably provide the ability for long-term ECG monitoring in any environmental condition without impacting usability. The stress laboratory should be as


future-proof as possible and therefore requires a recording modality that does not, by design, make certain recording scenarios unfeasible. Most modern wearable and smart devices have made the transition to waterproof concepts. However, voltage-based ECG devices are designed for use in controlled environments like the hospital or laboratory setting. This recording technique could not yet make the same transition. Taking the Apple Watch as an example, its ECG system does not work underwater by design, as it records the voltage. Also, PPG-based HR recording is susceptible to noise errors when used in submerged conditions.

Classical voltage-based measurements eventually deteriorate when the skin-electrode interface and the skin area connecting the two electrode sites are submerged in water or placed in a very humid environment, e.g. in a completely sweat-soaked garment. Usually, a short circuit through the water/liquid layer and an immediate dispersion of all electric charges at the skin surface will prevent reliable measurements of skin potentials. However, it is not impossible to measure a voltage potential in watery environments when employing highly adhesive foam-gel electrodes where the gel has a higher conductivity than water and only dissolves slowly in such a medium. This approach is not particularly deterministic, though, and therefore also not very useful for reliable signal measurements. Other approaches that use traditional potential-based circuits in watery environments require significant effort through insulation techniques ensuring that the electrodes stay mostly dry during recording. These techniques always require the use of adhesive gel-foam Ag/AgCl electrodes. If pure metal plate/contact electrodes are used, as is the case for most wearables, it is impossible to record an ECG signal in water environments. The current-based recording approach presented in this chapter might be a solution for this problem.
Not only is the ECG system usable in any environment and integrated in a well-researched wearable textile platform, it also allows real-time streaming of raw ECG signal data to a connected Bluetooth device in an open framework and thus forms the data basis for the wearable stress classification platform. Streaming is even possible in submerged conditions; however, wireless transmission is severely impacted, up to a complete signal loss, due to the non-propagation of radio waves in the Bluetooth frequency range through larger water volumes.

This chapter describes the circuit and its evaluation study. It was tested and evaluated in repeated experiments over a half-year period in the swimming pool of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). Furthermore, an HRV relaxation study was performed with it, which is presented in the second part of this chapter. The purpose of


this study was twofold. First, it should provide evidence of the accurate measuring capabilities of the current-ECG in submerged conditions. Second, its data forms part of the knowledge foundation as ground truth for the determination of stress-based changes in the HRV parameters. The study was conducted in an environment that was specifically designed to be as relaxing as possible and without any stressful influences on the participants. Previous research has shown that such an externally relaxing situation can be achieved reliably through flotation in water [315–318], which does not rely too much on the subject’s ability for self-relaxation. The final goal of recording this dataset is to be able to determine the exact changes in the HRV parameters compared to the more stressful situations in later studies in this work.

Most of the information presented here has already been published in 2017 [54]; however, the result and discussion sections were significantly extended for this thesis. In the original work, my student Jasmine Lauber helped with the design and implementation of the circuit and the HRV study as part of her Master’s thesis under my supervision [319]. Tobias Cibis provided support with feasibility testing of the circuit and wrote some passages of the original paper from 2017. Vinzenz von Tscharner had the original idea for building this circuit and helped plan its design and the associated experiments based on his prior work [320–322]. Ruslan Rybalko supported the embedding of the circuit into the FitnessSHIRT platform.

4.1 Platform and Current-Based Circuit Design

Current practice to measure cardiac activity is the determination of potentials on the surface of the human skin. The ECG is therefore registered as a potential difference ($\Phi_1 - \Phi_2$) using two active measurement electrodes attached to the body.
In contrast, the current-based circuit is based on the assumption that current reflects the cardiac activity equally to the surface potential and therefore intends to amplify current rather than voltage [321]. The theoretical advantage of current amplification is that the electric potentials on the skin surface are not simply measured, but flow as a current into the electrode interface and are thus not susceptible to shifting noise or dispersion. The current-based sensor system described here, including its housing, can be seen in Figure 4.1; the general/abstract block diagram of the circuit is depicted in Figure 4.2. All parts not related to the current amplification circuit are based on the original FitnessSHIRT platform [66, 323]. In the current-based circuit, an ECG signal is measured by a monopolar

65 4 An Everywhere-Usable Electrocardiography System

Figure 4.1: Underwater ECG sensor overview. (a) Prototype housing of the sensor. (b) Board of the sensor with the battery attached and the filter switch visible at the upper edge. Reprinted from [54] with permission.

electrode placed somewhere on the body where electrical currents are induced by the changes in the heart’s electrical field. The obtained signal is processed through a transimpedance amplifier and is subsequently filtered using a highpass–lowpass configuration. Two filter configurations are implemented and can be selected via a switch: a wide-bandwidth filter with a pass-band of 0.5 Hz–500 Hz, and a narrow-bandwidth filter with a pass-band of 7 Hz–24 Hz. These filter paths allow the output signal to be used either for clinically relevant morphological analyses or as a very noise-robust input for precise R-peak detection, respectively [75, 324]. After the filter stage, a differential amplifier and an analog-to-digital (A/D) converter digitize the signal.

4.1.1 Transimpedance Amplifier

The transimpedance amplifier represents a low-ohmic input stage and acts as the central element of the current-based ECG circuit. It translates incoming current into a proportional output voltage. In the presented circuit, the transimpedance amplifier is designed using a conventional voltage-controlled amplifier whose non-inverting input is connected to a reference electrode and is thereby actively grounded, while the inverting input is connected to the measuring electrode (Figure 4.3). The low-ohmic input is realized using a virtual short circuit at the input stage of the amplifier. Furthermore, the virtual short circuit causes both inputs to be at an equal potential, Φ+ = Φ−. Due to the low-ohmic input of the measuring electrode, the current signal flows into the amplifier and is converted into voltage. A resistance is integrated in the feedback loop,


Figure 4.2: Block diagram of current-based ECG circuit. Simplified block diagram of the developed circuit for current-based measurements of the cardiac activity. The circuit was implemented for two bandwidths. The upper path illustrates the wide bandwidth (0.5 Hz–500 Hz), the lower path the narrow bandwidth (7 Hz–24 Hz). Reprinted from [54] with permission.

in order to amplify the signal. According to Ohm’s and Kirchhoff’s laws, the output is characterized as:

$U_{\mathrm{out}} = -I \cdot R$   (3)
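As a small worked example of Equation 3, the following sketch evaluates the output voltage of an ideal transimpedance stage. The input current and the feedback resistance are hypothetical values chosen only for illustration; the actual component values of the circuit are not stated here.

```python
def transimpedance_output(current_a: float, feedback_ohm: float) -> float:
    """Output voltage of an ideal transimpedance stage: U_out = -I * R."""
    return -current_a * feedback_ohm


# Hypothetical values for illustration only (not taken from the circuit).
i_ecg = 2e-9   # assumed input current of 2 nA
r_f = 1e6      # assumed feedback resistance of 1 MOhm

u_out = transimpedance_output(i_ecg, r_f)
print(f"U_out = {u_out * 1e3:.1f} mV")
```

The negative sign reflects that the measuring electrode is connected to the inverting input, so a current flowing into the amplifier produces a negative output swing relative to the reference.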

Besides the signal of interest $I_{\mathrm{ECG}}$, an interfering DC or slowly drifting signal $I_{\mathrm{noise}}$ can occur and flow through the transimpedance amplifier, causing the amplifier and the following filter stages to overload. Therefore, a feedback loop was implemented to counter DC and low-frequency drifts: the interfering signal is extracted, inverted, processed through a lowpass filter, and fed back (Figure 4.3b). This feedback loop is only integrated in the wide-frequency-range circuit, as low-frequency drift or interfering DC can only occur there.

4.1.2 Signal Processing

The voltage output signal from the transimpedance amplifier is further processed by a bandpass filter consisting of a cascaded highpass and lowpass. In the wide bandwidth range, the signal is filtered at the cutoff frequencies $f_c$ = 0.5 Hz for the highpass and $f_c$ = 500 Hz for the lowpass. In the narrow bandwidth range, the signal is highpass- and lowpass-filtered using the cutoffs $f_c$ = 7 Hz and $f_c$ = 24 Hz, respectively.
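To make the four cutoff frequencies more tangible, the following sketch computes resistor values for them. It assumes, purely for illustration, that each highpass and lowpass is a first-order RC stage with $f_c = 1/(2\pi RC)$ and a hypothetical 1 µF capacitor; the filter order and topology of the actual circuit are not specified in this section.

```python
import math


def rc_resistor_for_cutoff(f_c_hz: float, c_farad: float) -> float:
    """Resistance giving cutoff f_c in a first-order RC stage: R = 1 / (2*pi*f_c*C)."""
    return 1.0 / (2.0 * math.pi * f_c_hz * c_farad)


def rc_cutoff(r_ohm: float, c_farad: float) -> float:
    """Cutoff frequency of a first-order RC stage: f_c = 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


C = 1e-6  # hypothetical capacitor value (1 uF), an assumption for this sketch

# Cutoff frequencies as stated in the text.
cutoffs_hz = {
    "wide highpass": 0.5,
    "wide lowpass": 500.0,
    "narrow highpass": 7.0,
    "narrow lowpass": 24.0,
}

resistors = {name: rc_resistor_for_cutoff(f, C) for name, f in cutoffs_hz.items()}
for name, r in resistors.items():
    print(f"{name}: f_c = {cutoffs_hz[name]} Hz -> R = {r:.0f} Ohm")
```

A lower cutoff requires a larger RC product, which is why the 0.5 Hz highpass needs by far the largest resistance for a fixed capacitor.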


Figure 4.3: Transimpedance amplifier. (a) Transimpedance amplifier converting the current input signal into a voltage output signal. (b) Structural model of the feedback loop to reduce interfering DC and low-frequency noise. Reprinted from [54] with permission.

While the wide bandwidth allows a full representation of the morphology of the cardiac activity to be measured, the narrow bandwidth can be used to extract the QRS complex and to remove noise for further signal processing. After the bandpass, the filtered signal is amplified using a differential amplifier and then digitized by the microcontroller, which samples the signal at 1000 Hz and stores it in an internal flash memory. The signal can also be transmitted wirelessly via Bluetooth to an external device in real time during recording. For transmission, however, it is additionally resampled to 256 Hz to reduce energy requirements. For all evaluations described later, the signal was stored in the internal flash memory, transferred to a PC after the measurement, and converted to a comma-separated values (CSV) file for further handling.

4.1.3 Power Supply and Housing

A rechargeable Li-ion battery with an output voltage of 3.7 V, a capacity of 300 mAh, and a stored energy of 1.11 Wh serves as the power supply. The supply voltage is regulated to 3.3 V, as required by some components. A virtual ground point was set at 1.65 V; hence, the signal swings around this potential. The developed circuit was integrated into a plastic housing for protection. The housing was adapted to the ports and switches of the circuit to enable access (Figure 4.1). Furthermore, the housing contributes to shielding against interference. A micro-USB connector allows the sensor to be charged via USB power, and data from the flash memory can be transmitted through this interface. The ECG electrodes are wired to the circuit board using an analog cable connector

68 4.2 Circuit Validation Study

and shielded (coaxial) cables that were manually soldered to common button connectors for foam electrodes.

4.2 Circuit Validation Study

In addition to a routine electrical evaluation of the circuit, physiological measurements in dry and immersed conditions were performed and compared against those made simultaneously with two commercial potential-based ECG sensors in order to demonstrate the superiority of a current-based approach. The commercial sensors were the Noraxon Desktop DTS (Noraxon U.S.A., Inc., Scottsdale, AZ, USA) as a gold-standard reference system for dry measurements and the Shimmer2 ECG sensor (Shimmer Research Ltd., Dublin, Ireland) for underwater measurements. The Shimmer device allowed the attachment of long electrode cables (up to 1.5 m), enabling comfortable water access and submersion for subjects while keeping the circuit housing outside the water. The measurements were performed either with metal-plate electrodes or with Ag/AgCl adhesive foam electrodes. Data was recorded from five different adult subjects (two male, three female). All recorded data showed the same characteristics as described in the following. Therefore, one of those datasets, from a healthy male participant, is used to demonstrate the signals and findings. Several different electrode positions were tested for the measurement electrode (e.g., on the wrist or at all typical 12-lead ECG electrode positions). However, most showed similar signal characteristics (Figure 4.5). In the dry environment, measurements using the wide-range configuration showed accurate morphology of the ECG. The signal was, however, strongly superimposed with 50 Hz power-line interference (Figure 4.4a bottom and Figure 4.5), which can easily be removed using a notch filter in a post-processing stage.
This was not done for these figures or directly in the analog circuit (as is usually done in commercial ECG devices) in order to investigate the difference between various electrode positions (Figure 4.5) and the overall impact of noise on the circuit. In contrast, the narrow-range configuration showed very little noise, but due to the small bandwidth several morphological ECG events were filtered out as well. Current-based measurements in immersed conditions corresponded to those obtained in dry conditions. The wide-bandwidth configuration still showed power-line artifacts; however, the signal-to-noise ratio was about ten times higher than outside the water (see Figure 4.4b bottom, where the signal amplitude is in V rather than mV compared to outside the water). One particularly influential noise characteristic in underwater


measurements, which is also visible in Figure 4.4, is heavy baseline wander, which is very rapid and unpredictable and is probably related to the flow of water around the electrode sites. In severe cases, this caused the potential-based ECG’s ADC to go into saturation. While baseline wander is still visible in the current-based ECG, that system is not as susceptible to those problems, since the potential difference caused by the water short circuit has very little influence on it. The narrow-bandwidth configuration shows almost no additive noise, and the QRS complex was correctly amplified. In order to verify the accurate working of the current-based ECG, the sensor was compared to two commercial potential-based ECGs, with measurements performed simultaneously using both the current-based and the commercial ECG sensors. Figure 4.4 shows such a simultaneous measurement with both sensors. Since the exact circuit behavior differs for each device, and the commercial devices certainly contain analog filter stages for power-line noise rejection, their SNR seems to be better. The measurement showed an accurate ECG signal using the current-based sensor, and no crosstalk between the electrodes of the commercial ECG and the current-based one seemed to have occurred. QRS complexes seen in the potential-based ECG signal correspond to those identifiable in the signal of the newly developed current-based sensor. However, the commercial sensor showed an increase in interference in the immersed condition. The reference potential-based sensor was not able to record an accurate, morphologically interpretable ECG. QRS complexes are reduced to hardly discernible peaks that provide neither a good enough SNR for reliable R-peak detection nor the assessment of fiducial points. The results show that the newly developed sensor improves the signal quality compared to a commercial reference ECG sensor in immersed conditions.
With the filter design using a bandwidth of 7 Hz–24 Hz, only the QRS complex is extracted. Due to that narrow passband, most of the other parts of the ECG signal besides the QRS complex are lost. The newly developed sensor was able to record an ECG signal in immersed conditions without loss of signal quality. Compared to the potential-based sensors, the current-based system showed advantages regarding resilience to low-frequency drift. Furthermore, the signal of the Shimmer sensor was attenuated, likely due to the conductivity of water as already mentioned by Kwatra et al. [325]. Overall, the current-based system is more reliable during recordings under immersed conditions.
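The 50 Hz power-line interference seen in the wide-bandwidth recordings can, as noted in this section, be removed with a notch filter in post-processing. The following is a minimal sketch of such a filter, not the processing used in this work: it assumes a second-order IIR notch in the common biquad form, a hypothetical quality factor of 30, the 1000 Hz sampling rate of the device, and synthetic sine waves in place of real ECG data. All function names are illustrative.

```python
import math


def notch_coefficients(f0_hz, fs_hz, q=30.0):
    """Second-order IIR notch (biquad) coefficients, normalized so that a[0] == 1."""
    w0 = 2.0 * math.pi * f0_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = [1.0 / a0, -2.0 * math.cos(w0) / a0, 1.0 / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def apply_biquad(x, b, a):
    """Filter the sequence x with a direct-form I biquad."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


def rms(values):
    """Root mean square of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


fs = 1000.0                                   # sampling rate in Hz
t = [n / fs for n in range(4000)]             # 4 s of synthetic data
hum = [math.sin(2 * math.pi * 50 * ti) for ti in t]    # stand-in for power-line noise
slow = [math.sin(2 * math.pi * 10 * ti) for ti in t]   # stand-in for in-band ECG content

b, a = notch_coefficients(50.0, fs)
# Compare amplitudes after the filter has settled (skip the first 2 s).
hum_ratio = rms(apply_biquad(hum, b, a)[2000:]) / rms(hum[2000:])
slow_ratio = rms(apply_biquad(slow, b, a)[2000:]) / rms(slow[2000:])
print(f"50 Hz remaining: {hum_ratio:.4f}, 10 Hz remaining: {slow_ratio:.4f}")
```

A higher quality factor narrows the notch and preserves more of the neighboring ECG spectrum, at the cost of a longer settling time at the start of the recording.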

70 4.3 Heart Rate Variability Characteristics in Relaxed Environments

Figure 4.4: Dry and submerged ECG measurements. Left side (a): Comparison between a commercial surface potential-based ECG (top) and the current-based ECG in the wide-bandwidth configuration (bottom), both showing the same six heartbeats of one healthy person not immersed in water. The potential-based ECG was filtered in its device, while the current-based ECG was not post-filtered. Measurement electrodes were placed in a Lead II configuration. Right side (b): Comparison between a commercial surface potential-based ECG (top) and the current-based ECG in the wide-bandwidth configuration (bottom), both showing a 4 s window of the measured ECG with the subject’s torso and the attached electrodes fully submerged in water. Both measurements were made with metal-plate electrodes attached to a healthy person. The QRS complexes of the potential-based ECG are hardly discernible anymore and are severely distorted due to the loss in signal strength. In contrast, while the current-based ECG also shows some baseline wander, it is overall much more stable, including in the amplitudes of the QRS complexes, which even show an increased signal-to-noise ratio compared to dry conditions. Reprinted from [54] with permission.

4.3 Heart Rate Variability Characteristics in Relaxed Environments

In order to examine the functionality of the developed sensor circuit and its embedding in the FitnessSHIRT platform, and to provide further comparable baseline data for the stressor platform and the Stroop Room classifier (chapter 10), a study was conducted (1) to record HRV values under very relaxed conditions and (2) to examine the differences in the HRV of humans whose body is almost fully immersed in water (head-out immersion [326]) compared to a non-immersed condition. Floating in warm water is known to be a very relaxing situation [315, 317]. This study can therefore help to identify the magnitude of the differences between relaxed and normal conditions, in contrast to stressed conditions, and


Figure 4.5: Measured ECG signals and their PSD. Comparison of the measured signal (a) and the power spectral density (PSD) (b) of that signal from the current-based device using different electrode placements. At the top (I), the measurement electrode was placed on the proximal backside of the right wrist. In the middle (II), it was placed on the chest skin at the third intercostal space between the sternum and the midclavicular line. At the bottom (III), the measurement electrode was placed on the chest skin at the sixth intercostal space at the midclavicular line. All three configurations show a typical PSD with the 50 Hz power-line interference. The heart rate was about 100 beats per minute. The top signal (from the wrist) naturally shows a lower signal amplitude (and a much smaller signal-to-noise ratio) than the signals measured at the chest locations. Reprinted from [54] with permission.

more accurately evaluate the different HRV parameters with respect to stress-related assessments later in this work, or in future work. It can therefore provide a ground-truth dataset for how such signals behave in participants under no external stress influence. The characteristic changes of the HRV and their magnitudes are compared to data from Schipke and Pelzer [326] to verify that this system is as well suited for such measurements as the one they used, an FDA-cleared, potential-based Holter monitor system with insulated electrodes. The HRV changes when submerged in water can serve as an indicator for cardiovascular health and should show similar characteristics across different experiments.

4.3.1 Subjects

Twelve students were recruited to take part in this study, five female and seven male, aged (27.4 ± 2.8) years. Their mean body height was (1.77 ± 0.07) m and their mean weight (72 ± 11) kg. All participants provided written


consent to participate. The study was approved by the local ethics committee of the university. The average sport workload of the subjects was 3.1 h per week. All of them were non-smokers, had no cardiovascular problems and no acute diseases, and were asked not to consume alcohol or caffeine for 12 h before the measurements in order to reduce external substance-based influences on the HRV [327].

4.3.2 Method

The study was performed in a swimming pool with stairs and a water depth of <1 m. The average water temperature was 30.2 °C with a chlorine content of 0.05 mg/L. The measurements outside the swimming pool were performed in the same room with an average air temperature of 26.3 °C. All measurements were performed between 1 pm and 4 pm to reduce influences of the circadian rhythm and provide standardized conditions across all subjects [327]. The ECG of all subjects was recorded using the current-based ECG system described in this chapter with a sampling rate of 1000 Hz and with the same procedure for each subject, which is shown in the flowchart in Figure 4.6 and the pictures in Figure 4.7 and is described in the following. After preparation of the skin for electrode application, the subjects sat in a relaxing position for 5 min outside the water to allow their body to rest and slow down the heart to a normal resting heart rate. Then, there was a 10 min measurement period in a sitting position outside the water, followed by a 10 min measurement in supine position outside the water. Afterwards, the subjects were immersed in water. The ECG was measured for 10 min in a sitting position inside the water, then for 10 min in a supine position. For reliable HRV measurements, a 5 min evaluation period is required [75]. With a 10 min recording period in this study, a centered window of 5 min within each recording was used for each evaluation, as has been done in previous work [326].
This also reduces the influence of cold feelings near the end of the measurements, which several subjects reported after staying fully submerged in water for 20 min. Without much movement, even warm water above 30 °C will cool down the body and might cause changes in the HRV characteristics. In order to allow a fair comparison to methods using potential-based measurements, e.g. [326], the electrodes were placed in an Einthoven Lead II configuration, although, based on the current-circuit’s characteristics, it would be more favorable for the current ECG to have the measurement electrode placed directly above the heart on the left side of the lower sternum.
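The selection of the centered 5 min evaluation window from each 10 min recording can be sketched as follows. This is an illustrative helper, not the code used in the study: the function name is hypothetical, R-peak times are assumed to be given in seconds from the start of the recording, and the example uses synthetic beats at exactly one per second.

```python
def centered_window(r_peak_times_s, recording_s=600.0, window_s=300.0):
    """Keep only the R-peaks that fall into the centered evaluation window."""
    start = (recording_s - window_s) / 2.0   # 150 s for a 10 min recording
    end = start + window_s                   # 450 s
    return [t for t in r_peak_times_s if start <= t < end]


# Synthetic example: one beat per second over a 10 min recording.
peaks = [float(i) for i in range(600)]
selected = centered_window(peaks)
print(selected[0], selected[-1], len(selected))
```

Discarding the first and last 2.5 min of each phase in this way also removes the transition periods directly after a change of position.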


Flowchart: Skin preparation and electrode attachment → 5 min resting: sitting position outside water → 10 min recording: sitting position outside water → 10 min recording: supine position outside water → 10 min recording: sitting position inside water → 10 min recording: supine position inside water → only 5 subjects: 2 min backstroke → end of measurement.

Figure 4.6: Study procedure for each subject. The backstroke phase was only recorded for five subjects to visually check the signal characteristics during motion; it did not produce more noise in the signal, but was not evaluated further. Reprinted from [54] with permission.

Figure 4.7: Positions of the subjects during the study. (a) Sitting position outside water. (b) Supine position outside water. (c) Sitting position inside water. (d) Supine position inside water. Adapted from [54] with permission.


Table 4.1: HRV parameters as a mean over all subjects. * $p$ < 0.05; ** $p$ < 0.01; *** $p$ < 0.001 compared to the outside-water condition, according to Welch’s t-test or, if not normally distributed, the Wilcoxon signed-rank test.

Parameter        Sitting, outside water   Sitting, inside water   Supine, outside water   Supine, inside water
HR (bpm)         74.1 ± 5.6               62.3 ± 5.7 ***          61.2 ± 6.0              59.2 ± 6.6
$P_L/P_H$        4.26 ± 1.05              1.39 ± 0.34 ***         1.67 ± 0.51             1.32 ± 0.26
$t_{RSD}$ (ms)   35.5 ± 5.5               60.8 ± 8.1 ***          60.0 ± 11.8             61.1 ± 8.1
$p_{N50}$ (%)    13.1 ± 4.3               36.0 ± 7.0 **           32.3 ± 9.1              36.6 ± 7.6

After the measurement, which was stored on the internal SD-card of the device, the signals were extracted, and all R-peaks were determined by an automated algorithm [56, 221] (see next chapters) and manually checked for outliers. The resulting list of RR intervals was fed into the PhysioNet HRV Toolkit to calculate time-domain and frequency-domain HRV features [130]. The NN/RR measure (percentage of RR intervals classified as normal), supplied by the toolkit, was used to determine the reliability of the automated assessment. All recordings had NN/RR values higher than 90 %, so all data could be used. This procedure was used throughout the work for all HRV analyses.
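For readers unfamiliar with the time-domain features, the following sketch computes them from a list of RR intervals using their common textbook definitions (RMSSD as the root mean square of successive differences, pNN50 as the percentage of successive differences larger than 50 ms). It is not the PhysioNet HRV Toolkit used in this work; the function name and the example intervals are hypothetical, and the correspondence of RMSSD and pNN50 to the symbols $t_{RSD}$ and $p_{N50}$ is assumed from the axis labels of Figure 4.8.

```python
import math


def hrv_time_domain(rr_ms):
    """Basic time-domain HRV features from a list of RR intervals in milliseconds."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    mean_rr = sum(rr_ms) / len(rr_ms)
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    pnn50 = 100.0 * sum(1 for d in diffs if abs(d) > 50.0) / len(diffs)
    return {
        "mean_rr_ms": mean_rr,
        "hr_bpm": 60000.0 / mean_rr,   # heart rate is inversely related to the mean RR interval
        "rmssd_ms": rmssd,
        "pnn50_percent": pnn50,
    }


# Hypothetical RR intervals for illustration only.
rr = [800.0, 860.0, 820.0, 900.0, 840.0]
features = hrv_time_domain(rr)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

In practice, the input would be the manually checked RR intervals of the centered 5 min window rather than a handful of values.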

The HRV measures $T_{RR}$, heart rate (HR; calculated from $T_{RR}$ and inversely related to it), $P_L/P_H$, $t_{RSD}$, and $p_{N50}$ were used for evaluation according to [75].

4.3.3 Results

If not stated otherwise, the typical HRV measures are used as proposed in previous work and described in chapter 2 [326–328]. Variance is always given as the 95 % confidence interval, and the corresponding values are the mean over all subjects. The Shapiro–Wilk test was used to test for normal distribution of the data; depending on the outcome, either the Wilcoxon signed-rank test or Welch’s t-test was then used to obtain the two-sided p-value, which is given for each statistically significant result in the next two sections. This evaluation scheme was also used for all evaluations of significance described in the remainder of this work.
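As an illustration of one branch of this evaluation scheme, the following sketch computes the Welch t statistic and the Welch–Satterthwaite degrees of freedom for two samples. It covers only this part: the Shapiro–Wilk and Wilcoxon tests as well as the conversion of the statistic into a two-sided p-value (which requires the t distribution from a statistics library) are left out. The heart rate values are hypothetical and not data from the study.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na   # squared standard error of mean A
    vb = variance(sample_b) / nb   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical heart rates (bpm) for illustration only.
outside = [72.0, 75.0, 78.0, 70.0, 76.0]
inside = [61.0, 64.0, 60.0, 66.0, 62.0]

t_stat, dof = welch_t(outside, inside)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Unlike Student's t-test, this variant does not assume equal variances in both conditions, which is why the degrees of freedom are generally not an integer.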


[Figure 4.8: eight violin plots in four rows (HR, LF/HF, RMSSD, and PNN50 change from poolside to submerged). Vertical axes: average relative change. Left column grouped by location (sitting, supine), right column grouped by gender (female, male).]

Figure 4.8: Distribution plots of relative changes in HRV. The average relative change from the measurement outside the pool to the measurement submerged in water for the HRV parameters HR (first row), $P_L/P_H$ (second row), $t_{RSD}$ (third row), and $p_{N50}$ (last row). On the left side, the sitting and supine positions are differentiated; on the right side, female and male participants are differentiated in the sitting condition only (the female and male plots on the right side in combination form the left-most distribution). The dashed line shows the median of all subjects in the sitting position as an additional reference. The violin is the kernel density estimate, always plotted at least up to the data point extrema.


Time Domain Parameters

Figure 4.8 shows the results of the evaluated HRV parameters in the time and frequency domain as a mean change over all subjects between outside the water and the immersed condition. They are further listed in Table 4.1. $\sigma_{RR}$, $t_{RSD}$, $p_{N50}$, and $T_{RR}$ showed an increase in the HRV during immersion compared to the positions at the poolside. The $t_{RSD}$ increased highly significantly ($p$ < 0.001) in the sitting position, from (35.5 ± 5.5) ms outside the water to (60.8 ± 8.1) ms immersed in water. In the supine position, this measure increased only slightly during immersion, with no significant change. Similarly, the $p_{N50}$ increased significantly ($p$ = 0.0022) from (13.1 ± 4.3) % sitting at the poolside to (36.0 ± 7.0) % sitting inside the water. In the supine position, the measure showed no significant difference.

The $T_{RR}$ is described as HR for easier interpretation and comparability to results later in this work. For the sitting position, the HR decreased highly significantly ($p$ < 0.001) from (74.1 ± 5.6) bpm outside to (62.3 ± 5.7) bpm inside the water. Again, the decrease for the supine position was not significant. Overall, all time-domain parameters showed a significant change for the sitting position during water immersion compared to outside the water. In contrast, the values did not differ significantly in the supine position. While immersed, the measures showed only a minor change. Gender differences are plotted in Figure 4.8 on the right side for each parameter; however, these were not tested for significance. Yet, particularly the $p_{N50}$ showed a much larger average relative increase for men in the submerged condition than what was observable for women.

Frequency Domain Parameters

For the HRV features in the frequency domain, only the $P_L/P_H$ parameter is visualized with its changes in Figure 4.8 and listed in Table 4.1, as it represents the most descriptive parameter in this domain, indirectly encodes the changes in $P_L$ and $P_H$ in a subject-neutral way, and is the only frequency-domain parameter used in subsequent chapters. The power spectral density in the $P_{vL}$ range showed an increase in the sitting position during water immersion to (2295 ± 879) ms² compared to (1631 ± 371) ms² in this position outside the water. In contrast, in the supine position, this part of the power spectral density was lower during the immersed condition than at the poolside: the $P_{vL}$ was 1520 ms² in the water, in contrast to 2286 ms² outside the water.


In the low-frequency range, the $P_L$ was (2190 ± 426) ms² in the sitting position outside and (1972 ± 658) ms² inside the water. For the supine position, it was (1976 ± 474) ms² at the poolside and (1873 ± 530) ms² submerged in the water. For the power spectral density in the high-frequency range, the $P_H$ outside the water was (629 ± 274) ms² and (1485 ± 515) ms², while it was (1501 ± 392) ms² and (1498 ± 390) ms² inside the water, in sitting and supine positions, respectively. The reverse characteristic between sitting and supine positions was also observed for the total power. In contrast, the ratio between the low-frequency and the high-frequency range showed the same trend for both conditions, but with significantly different magnitudes. In the sitting position, the mean $P_L/P_H$ ratio outside was 4.26 ± 1.05, with a median of 4.08, and in the submerged condition it was 1.39 ± 0.34, with a median of 1.22. The change is highly significant with $p$ < 0.001. For the supine position, the mean ratio outside was 1.67 ± 0.51, with a median of 1.43, and inside the water it was 1.32 ± 0.26, with a median of 1.28. This change is not significant with $p$ = 0.136.

4.4 Discussion

In this chapter, a newly designed sensor for everywhere ECG measurements is presented. A previous current-based concept [320, 321] was implemented and extended. Originally intended for the measurement of muscle activity in electromyograms, the concept, as the results show, works equally well for the measurement of the electrocardiogram in underwater or dry conditions. In particular, this enables the sensor to be used in any environment, especially in non-clinical ’free-living’ home environments. In order to evaluate the new sensor in real applications, it was tested in different conditions: outside and inside the water, as well as during rest and movement.
The results of the narrow-bandwidth configuration showed an almost noiseless signal, even under immersed conditions in a public pool with several larger noise sources in the vicinity. Looking at the direct signal output of the ADC, a much higher signal amplitude is noticeable for the R-peak in the QRS complexes acquired by the current-based circuit when the electrodes and the subject were fully head-out immersed in water. This might be due to the high conductivity of the water surrounding the metal-plate electrodes on the subject’s body, which increased the current flow into the electrode interface substantially.


Looking at different placements of the measurement electrode of this monopolar recording system, several configurations were found to work very well, particularly with the measurement electrode placed directly above the heart on the chest skin at the third intercostal space between the sternum and the midclavicular line (second row in Figure 4.5). With several placement configurations, the electrical heart activity could not be recorded at a noticeable magnitude, for example, if the measurement and the ground electrode are placed on the left and right lateral sides of the abdomen, respectively. It is assumed that this is related to disturbances in the electrical field of the heart activity inside the body. Even though the QRS complexes can be clearly identified in the current-based ECG signal, it is not entirely clear at this point whether evaluation methods defined for standard potential-based ECGs, e.g. for abnormality detection, can be applied unchanged to current-based signals [329]. Because the monopolar system measures the entire summed-up electrical field of the heart (i.e., the changes in it), and not a particular directional derivation of it, it is not trivial to translate specific definitions of pathological electrical activity, seen in certain derivations of a 12-lead clinical potential ECG, to the current-based system. Yet, R-peaks can be detected with equal precision compared to the potential-based systems. Therefore, a current-based monopolar system has the advantage of being usable not only in immersed environments, but also with fewer electrodes and thus less effort. This can be particularly interesting for wearable devices like smartwatches to overcome the requirement of attention-based ECG analysis, remove the need for PPG sensors, and at the same time increase the reliability and accuracy of HRV determinations.
The proposed current measurement technology was implemented into the FitnessSHIRT platform (electrical back-end) as introduced by the Fraunhofer IIS (Erlangen, Germany) [66, 323]. However, the integration was not tested to its full potential. The FitnessSHIRT platform’s core element, the shirt itself with integrated tissue-based electrodes, could not be used in the study due to restrictions in the manufacturing process of the initial prototype circuit of the system. However, the form factor can be reduced to fit onto the contact elements of the shirt. With insulation of the circuit housing, the system would then be entirely waterproof and be able to collect real-time ECG data very unobtrusively on this wearable platform. For prototype usage and evaluation in this work, external electrode connectors were of superior importance in order to allow for very similarly designed experiments between this circuit and


gold-standard reference sensors, which are also cable-based. In this way, the influence of the cabling in both systems could be neglected, as the cables had the same length and quality for all experiments. Also, the sensors (reference and current-based) could be kept in the same location (at the poolside), where the same external noise sources are present. One interesting finding in this regard is that the largest difference between the potential- and current-based systems can be seen when using metal-plate electrodes. Only in this case is the full distortion effect of the highly conductive water surrounding the electrodes noticeable. When using highly adhesive, foam-based Ag/AgCl electrodes, there is no large difference in signal quality between the two systems at the beginning of signal acquisition. This is likely due to the high shielding factor of the foam material and the much higher conductivity of the electrode gel between the skin and the electrode surface, as the electrodes are covered by the gel on one side and by the connector and foam on the other side. The signal degrades severely over time when the water washes away the electrode gel. The results of the heart rate variability study support the physiological response to immersion, showing an initial reflex bradycardia resulting from a restriction of blood flow to the peripheries and venoconstriction due to hydrostatic forces, typically known as the mammalian diving reflex [330]. In general, immersion results in an increase of venous return, increased stroke volume, and bradycardia, as well as a modest increase in cardiac output [331]. The primary goal of this study, in the context of the developed sensor, was to demonstrate its applicability for such recording endeavors in any environment and to assess HRV parameters in a relaxed floating condition.
It was possible to successfully reproduce the characteristic changes in the HRV in dry and submerged conditions as reported by previous studies with insulated potential-based ECG sensors [326]. This allows the conclusion that this sensor can be used to perform accurate and repeatable measurements of the human ECG in dry and underwater conditions for physiological experiments or monitoring, in particular for the important evaluation of the HRV, with its high relevance to the human stress state. For the HRV parameters, the closeness of the median value to the mean value indicates that there are no large outliers in the data for the PL/PH parameter. This could allow the careful conclusion that these values might also be representative for a larger population. Another important observation here is the difference between male and female participants, clearly depicted in the right column of plots in Figure 4.8. Even though

80 4.4 Discussion

the significance was not evaluated due to the small number of participants (only five female subjects), the values could indicate that there is a difference between genders. Looking at the PL/PH parameter, almost all values of female participants are above the median of all subjects (the dashed line) while almost all values of male participants are below the median. However, the sample size is just too small to draw any gender-specific conclusions here. Yet, in this small population, the same characteristic is clearly visible for the pN50 parameter. Even though the kernel density estimate seems to suggest a less clear distinction, it should be noted that this is only a very generalized estimation that provides an additional visual guide for interpreting the data. The actual data reference should be the box-and-whisker plots in the center of the plots, which are almost non-overlapping for the male/female differentiation. This observation contradicts previous work [332], where it was suggested that gender has no influence on the pN50. The results here would suggest, according to Equation 1, that male participants were able to relax much more deeply in this setting than female participants. It could also mean, however, that the characteristics of deep relaxation are different for men and women. Again, the small and background-biased study population could also be the reason for any difference observed. This does not become clear from these results and will require further investigation in the future. Still, the data provides a solid basis of (gender-neutral) observations on how the four main HRV parameters, which are also used in the next chapters, behave in relaxed conditions, with no stressful or physical task for participants to perform. Floating in warm water is for most humans a very relaxing environmental state with a large potential for deep relaxation [315–318].
This is demonstrated by the large decrease in the PL/PH and HR parameters and the increase in tRSD and pN50, accompanied by large increases in variability. The evaluation pipeline for all these parameters is the same as in later chapters, and results can therefore be compared.

5 Real-Time QRS Detection and Classification on Mobile Devices

Even though mobile devices have exponentially increased in computing power, they are still embedded systems. They run their own operating systems and are still limited in many ways, e.g. regarding power consumption and input/output connectivity. Therefore, machine learning-based systems have to be adapted and optimized for those platforms. For example, two of the most widely adopted implementation tools for machine learning systems are based on the Python programming language (TensorFlow and PyTorch). However, the interpreter system of Python has a considerable dependency overhead, which, at the time of writing, is still too large to be reasonably deployed on mobile operating systems. In this chapter, a complete processing and classification system for ECG signals is presented that can be run on many mobile devices and was thoroughly evaluated on those. Instead of evaluating the algorithm in a laboratory environment on non-mobile machines, the algorithms were here computed with all evaluation data on mobile phones, and the results represent precisely how the system will work in the real setting. Given the input from the Everywhere-ECG presented in the previous chapter, the phone-embedded classification engine developed here allows quasi real-time processing of the raw ECG signal values transmitted via Bluetooth from this device and their evaluation for HRV information or arrhythmias. Most of the sections here have been published in [55] in 2012, when devices like the Apple Watch were not available yet. Even though the acquisition of an ECG signal on smart devices is easier today, it is far from being routinely implemented. The Apple Watch only allows short-term focused acquisition of the signal and only provides rudimentary signal analysis. The applicability of the method and research in [55] is thus still very relevant.
The presented algorithm could be implemented on smart watches or even future smart garments and other micro-sized wearable technology. The restrictions that applied to smartphones during development of the algorithm apply today to those devices [11]. With data transmission over wireless networks still being expensive, it will likely always be more efficient to evaluate and classify raw data on the recording/wearable device and only transmit aggregated or classified results. Energy harvesting methods are still in their infancy and will not


be able to sufficiently power those devices to change these fundamental approach characteristics in the near future. In this chapter, the work is presented that led to an accurate and energy-efficient ECG processing and classification method for mobile and wearable devices [55]. The original implementation was created as an Android application called Hearty. The sources of the entire project have been made open source and are publicly available on GitHub [333]. The originally published work is extended here with the three independent projects it has spawned: an open interface API for wearable sensors on Android, an ECG processing library, and a GUI framework for real-time signal plotting on Android, each also on GitHub [334–336].

5.1 Energy Efficient Detection and Classification of QRS Complexes

The method for energy efficient detection and classification comprises four main algorithm blocks:

1. QRS complex detection, which is based in parts on the work of Pan & Tompkins and their renowned and highly efficient QRS detector [213],
2. the formation of a set of healthy template waveforms for the QRS complex and their continuous adaptation to the underlying signal,
3. the extraction of features related to the heartbeat, based in parts on the work of Krasteva & Jekova [220],
4. classification of individual QRS complexes using a hyperparameter-optimized classification tree, adapted partly from [220] as well and retrained for this pipeline.

Figure 5.1a depicts an overview of the algorithm; each of the steps is explained in detail in the following sections.

5.1.1 QRS detection

As a first processing step, the raw ECG Lead I, Lead II or Lead III signal is processed with digital filters for noise rejection and a QRS complex detection scheme roughly adapted from Pan & Tompkins [213]. The processing steps for the detection pipeline are, in order of application:

1. a bandpass filter (output denoted Band) composed of cascaded low-pass and high-pass filters,
2. a five-point differentiation,
3. a squaring operation, point-by-point, and
4. a moving window integration (output denoted Int).

Figure 5.1: Algorithm overview and detailed decision tree for heartbeat detection and classification: (a) overview of heartbeat detection and classification, (b) decision tree for classification. Main components are a bandpass filter, window integration, peak detection, waveform extraction, template formation and adaptation, feature extraction and threshold-based tree-classification. Reprinted with permission from [55] © 2012 IEEE.

Single QRS complexes are isolated using a threshold T computed from Int by applying a moving average filter with a window size of 150 ms. If either Int or Band reaches the threshold, a search for the R-deflection is initiated using a 3-point peak detector on Band. For each peak candidate, Int is again compared to T to ensure that the detected peak is a valid R-deflection. The reason for this is that if Int does not immediately follow an increase in Band, it is likely that the first threshold crossing is caused


by noise or an ECG feature other than the R-peak. The next chapter proposes a general solution to this kind of problem.

5.1.2 Template Formation and Adaptation

To facilitate subsequent feature computation and heart beat classification, two QRS complex templates are required for this approach. As a fully autonomous approach is desired, these templates are not selected by a supervising cardiologist as in previous work [220], but derived automatically from the ECG signal and adapted over time. For the template generation, a selection process is conducted at the beginning of every ECG acquisition to determine the first templates. The goal of this process is to maximize the chance of selecting two healthy/normal QRS complexes. For this, computations are performed on 400 ms windows centered on the first six validly detected R-peaks. The QRS complexes in these six windows are used as candidates for the templates. In order to assign the most likely normal candidates to the two template slots, two criteria are used, which look for maximal wave shape and size correlation:

1. smallest difference of the individual waveform area to the average waveform area of the six candidates, based on the ArDiff (see below),
2. Pearson correlation between the candidates of more than 0.95, in order to increase the likelihood of having two regular beats in the template slots.

The candidates are first split into two groups according to their individual waveform area, where beats with a waveform area lower than the average are considered first. This is due to the fact that abnormal beats show a substantially different waveform area and this parameter is well suited for a naive pre-selection [129]. Within each group, the beats are ranked according to (1). Using this combined ranking, the first two consecutive candidates that satisfy (2) are chosen as templates. If such a pair cannot be identified, the first two ranked candidates are selected as templates. The process is depicted in Figure 5.2. This approach does not rely on expert-supervised selection of normal beats; therefore, no fixed set of templates is used for ECG analysis. Particularly in the domain of real-time long-term unobtrusive ECG monitoring, it is a more feasible approach to continually update beat templates, since characteristics of the recording devices and the entire recording environment undergo fluctuations over time, e.g. environmental changes in temperature or medium (water immersion).

Figure 5.2: Process of automated QRS complex template selection. Panel titles: (a) Exemplary incoming waveform, (b) Detection and selection of first six sequence beats. (a) represents an incoming signal with six heartbeats: three normal beats, one small atrial fibrillation-like beat and two premature ventricular contractions. After detection of all beats in (b), they are sorted according to waveform area (c) and the two beats with the smallest negative distance to the average waveform area are selected as beat templates (d).

The beat template update procedure works as follows. Every time a heart beat is classified as normal (see below), it replaces the template that had the higher correlation with the classified beat. This progressively rotates new normal beats through the template slots and provides the same behavior described in the QRS template matching procedures in [220]. It is expected that two diverging normal beat templates are thus stored in these two template slots and provide normality matching in a large number of correct cases. However, this template selection and adaptation process can fail. In the case of more than three aberrant QRS complexes in the first six detected beats, no healthy template can be selected. This is a restriction of this process, which assumes that the typical subject classified with this method is quite healthy. This is feasible as the goal of this method is an energy efficient detection of precise R-peaks for HRV analysis with the


added ability to identify abnormal beats. This can be helpful to notify a user of the system of heart problems or to reject beats for HRV-based stress classification to prevent false positives.

Figure 5.3: Features used for beat classification. Each of the features has a visual description of its computation method.

5.1.3 Feature Extraction

For subsequent beat classification, four heart beat features as defined by Krasteva & Jekova are used [220]. These are:

1. difference in absolute normalized waveform area ArDiff

2. maximum cross-correlation coefficient MaxCorr, computed from a 400 ms window centered on the detected R-peak

3. the width of a detected QRS complex QRSwidth, computed using the Pan-Tompkins integrator output [213]

4. the RR-interval time TRR of the current beat R-R and of the previous beat R-Rprev.

The features are shown with a visual description of their computation method in Figure 5.3, and as rounded shapes in Figure 5.1b.

5.1.4 Beat Classification

Beats are classified using two different characteristics. The first is the waveform characteristic, which distinguishes normal and abnormal beats. The


corresponding abnormal beat class comprises the subclasses {premature ventricular contraction (PVC), PVC/aberrant, bundle branch block, escape beat (generic), atrial premature contraction (APC), aberrant}. The other characteristic is the pace/rhythm characteristic, which distinguishes normal and abnormal pace. The abnormal pace class comprises the subclasses {fusion of two beats, AV-block, tachycardia, bradycardia}. Discrimination of the different subclasses is performed by a physiologically knowledge-based decision tree (Figure 5.1b) similar to the one proposed in [220]. The different beat classes are shown in rectangular boxes in the figure. See [129] for a detailed description of the particular arrhythmias associated with each of these pathological beat types.

5.2 Implementation

The software framework was implemented in Java using the Android SDK 2.3.3 (Google Inc.), which corresponds to the API Level 9. Android was selected as the mobile phone platform because of its open nature, widespread use and the portability of the code. Additionally, it allowed easy integration of external ECG sensors via Bluetooth and is expected to be long-lived and also available for many future wearable devices. Google released the Android Wear OS platform¹, which is based on the Android operating system. The software framework consists of three components:

• a Data Delivery Service that provides data streaming from a Bluetooth connection or pre-recorded database files,
• a Signal Processing Service that implements the above mentioned algorithms, and
• a Graphical User Interface (GUI) that displays the results.
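As an illustration of the kind of per-sample work the Signal Processing Service has to perform, the differentiation, squaring, and moving window integration steps of Section 5.1.1 can be sketched in Java as follows. This is a minimal sketch and not the Hearty or JELY implementation: the class name QrsFrontEndSketch, the derivative coefficients (taken from the classical Pan & Tompkins formulation [213]), and the integration window length passed to the constructor are assumptions made for illustration. Fixed-size arrays are reused as circular buffers, so that no memory is allocated while samples are streamed.

```java
/** Hypothetical sketch: five-point derivative, squaring, moving window integration. */
final class QrsFrontEndSketch {
    // Circular buffer holding the last five band-pass filtered samples.
    private final double[] history = new double[5];
    private int historyPos = 0;

    // Circular buffer holding the squared derivative values inside the window.
    private final double[] window;
    private int windowPos = 0;
    private double windowSum = 0.0;

    /** windowSeconds is an assumed parameter; the source does not state it. */
    QrsFrontEndSketch(double samplingRateHz, double windowSeconds) {
        int length = Math.max(1, (int) Math.round(samplingRateHz * windowSeconds));
        window = new double[length];
    }

    /** Consumes one band-pass filtered sample and returns the integrator output. */
    double process(double band) {
        // Store the newest sample, then look up x[n], x[n-1], x[n-3], x[n-4].
        history[historyPos] = band;
        double x0 = history[historyPos];
        double x1 = history[(historyPos + 4) % 5];
        double x3 = history[(historyPos + 2) % 5];
        double x4 = history[(historyPos + 1) % 5];
        historyPos = (historyPos + 1) % 5;

        // Five-point derivative (classical coefficients, assumed here).
        double derivative = (2.0 * x0 + x1 - x3 - 2.0 * x4) / 8.0;

        // Point-by-point squaring.
        double squared = derivative * derivative;

        // Moving window integration as a running sum over the circular buffer.
        windowSum += squared - window[windowPos];
        window[windowPos] = squared;
        windowPos = (windowPos + 1) % window.length;
        return windowSum / window.length;
    }
}
```

Feeding each band-pass filtered sample into process() yields one integrator value per sample, which a detector could then compare against its threshold. For very long recordings, a production version would likely recompute the running sum periodically to limit floating-point drift.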

Each of these components was originally developed as part of the Hearty app and then sparked the development of an independent, easily reusable component: the Data Delivery Service led to the development of the SensorLib; the Signal Processing Service led to the Java ECG Library (JELY); and the GUI development necessitated the creation of an Android UI high performance plotting widget, the PlotView. All these are visualized in Figure 5.4 and described in the following, after the corresponding component is presented. All of them are now fully independent and available as open sourced Java or Android components

1 https://developer.android.com/wear/ (visited on 2020-02-13).


with their own GitHub repositories and are even today actively used and further developed.

Figure 5.4: Hearty framework and outsourced components. The core elements of the Hearty framework are shown in black, and in green color their respective stand-alone out- and open-sourced GitHub projects [334–336].

The Data Delivery Service

The Data Delivery Service allows provisioning of ECG data to the actual processing algorithms in real-time. Implementation was realized as an Android service running in the background. Using a service made it possible to manage the Bluetooth connection to the sensor node independently of any foreground process, and thus allowed the user to switch between applications without losing the connection. Furthermore, the service provided a consistent interface to the processing application for live and pre-recorded data. Data were provided either in live mode or in database mode. In live mode, the service allows acquisition of Lead I, Lead II, or Lead III data via a Bluetooth connection to a Shimmer ECG sensor node, or the FitnessSHIRT platform (including the Everywhere-ECG). In database mode, the service supported reading of pre-recorded ECG data from database files. In this case, real-time sampling was simulated by the service. If available in the file, the service additionally provided the beat-annotation characters for in-app evaluation of the algorithm.

The SensorLib – Plug-and-Play Sensor Implementations on Android Mobile Devices

Originating from the developments of the original Hearty application’s Data Delivery Service, the SensorLib was created. It is a Java-based Android library that can be included into any Android app development project and provides a standardized interface to interact with biophysiological measurement sensors attachable to Android devices, e.g. via


Bluetooth. It is available free and open source under the GNU GPL v3 license on GitHub [334]. Its original idea was to provide an easy-to-use and common software interface independent of the sensor used. It provides a Sensor base class from which individual manufacturer-related class implementations are derived. The following example shows how a sensor is instantiated and its data streamed. This listing assumes it is called from within a method of an Android Activity context class.

Listing 5.1: Instantiation and starting of a sensor stream using the SensorLib.

```java
// Create the sensor; `this` is the enclosing Android Activity (the context).
InternalSensor sensor = new InternalSensor(this, new SensorDataProcessor() {
    @Override
    public void onNewData(SensorDataFrame sensorDataFrame) {
        // Log every data frame delivered by the sensor.
        Log.d("Example", "new data: " + sensorDataFrame);
    }
});
sensor.connect();
sensor.startStreaming();
```

Instead of the InternalSensor, this could be any sensor like the ShimmerSensor², or the FitnessShirt³. All sensors are instantiated and used in exactly the same way. The SensorLib deals internally with all usage of individual driver components and libraries and their custom calls. Everything is exposed to the user/developer within a unified API, which significantly eases implementation effort when working with different sensors in smartphone or wearable device applications. This library can complement existing systems like the SSI/SSJ frameworks [337, 338] or the Open Data Kit Sensors [339].

Signal Processing Service

The signal data provided by the Data Delivery Service/SensorLib were passed to the Signal Processing Service, which applied all described algorithmic processing steps and invoked the detection and classification algorithms. To enable processing in quasi real-time, the implementation was optimized with respect to computational requirements and memory footprint. Circular buffers were used for all buffering operations during

2 See ShimmerSensor.java in [334]
3 FitnessShirt.java in [334]


signal processing to avoid overhead, and function calls were kept to a minimum. All digital filters were automatically adapted to the sampling interval of the incoming data.

The JELY – Java ECG Analysis Library

All algorithms that were developed for real-time ECG analysis within this work are collected inside the JELY [335] (“Java Ecg LibrarY”). This is a comprehensive Java library made publicly available on GitHub [335]. Java was selected as programming language, since almost every computing device can either directly run Java libraries, or an implementation exists that allows it to run them. Having the algorithms in native Java code within that library allows them to be run directly from within common programming environments like Matlab or Python. This enables all implementations to be developed and tested on a development platform (like a PC) and then used without any re- or cross-compilation on the target platform, e.g. the Android operating system on mobile devices. This is of particular importance for research on such mobile platforms, since debugging and scientific evaluation of the algorithms is naturally difficult here due to the inherent disconnected/wireless nature of those devices. There are approaches on how to do this, for example using the adb⁴ (“Android Debug Bridge”) for the Android operating system, yet this still imposes many restrictions on the developer. Even though debugging can be performed remotely on the mobile device, evaluating Machine Learning classifiers on large databases is at best inconvenient and in the worst case even infeasible. The JELY comprises many algorithms, originally all of those used in this work, and in the meantime it has been extended substantially. The following works either made use of the library, or contributed to it: [43, 54–56, 59, 60, 64, 66]. The library provides classes and methods for ECG signal streaming, storing/loading, preprocessing (e.g. filtering, windowing), QRS complex detection, R-peak detection, P-wave and T-wave analysis, RR-interval determination, HRV calculation, beat classification, and annotation handling, storing, and streaming. Furthermore, it provides utility functions at the intersections of all these methods. For example, the following methods are implemented in the library: Elgendi’s Fast QRS Detector [221], the decision tree classifier described in

4 https://developer.android.com/studio/command-line/adb (visited on 2019-06-08).


this chapter⁵ [55], the classifiers used by Leutheuser et al. [60], and loading routines for PhysioNet databases [216], for the binary file format of the CustoMed ECG devices, and for CSV files. The JELY also provides its own binary data file format, optimized for the storage of ECG signals and annotations and their use on mobile and embedded systems. Furthermore, the library’s repository also contains a Java-based GUI application for the inspection and annotation of any supported ECG files⁶.

Graphical User Interface

A Graphical User Interface (GUI) was implemented, which allowed starting and stopping the processing and visualization of the results within the Hearty app. It is shown in Figure 5.5. The user is able to select an ECG sensor in Bluetooth range and connect to it, or select an ECG file on the local Android device’s storage for simulation through the app. After successfully connecting to the sensor, the app immediately starts streaming the raw ECG signal and processes it according to the previously described method. To allow efficient plotting of signal data with high sampling rates (up to 2000 Hz) on Android devices, a compact and lightweight plotting component was created, the PlotView. Line plots representing the raw ECG signal, extracted QRS complexes and variation of heart rate were displayed in the bottom area of the screen. Every time a QRS complex was detected and classified, it was marked either in green (normal beat) or red (abnormal beat). Additionally, the current values of different features like heart rate, RR-interval in ms and number of recognized QRS complexes were displayed in the upper area of the screen. Upon closing a signal recording session, an overview of the classification results was displayed.

The PlotView

There are no GUI widgets or components within the existing default Android framework provided by Google to developers that allow high performance plotting of sampled signals [340]. It is too specialized a need to justify a default component. There are several custom solutions available for plotting widgets in Android; however, they are in most cases only intended to be used for static graphs. In this work, a plotting tool for Android was needed that is able to plot multiple time-series signals in real-time on the screen for user

5 See GradlDecisionTreeClassifier.java in [335]
6 The EcgEditor class in [335].


Figure 5.5: Screenshot showing Hearty’s main interface. Main interface with the PlotView used in four instances, showing, from top to bottom, (I) the raw ECG input signal, (II) the filtered signal (blue) with output of the moving window integrator (green) and the main threshold function (red), (III) a continuous collection of extracted 400 ms heartbeat windowed signal shapes with detected Q, R, and S points marked with green dots, and (IV) the tachogram (heart rate function), respectively. Reprinted with permission from [55] © 2012 IEEE.

feedback, signal inspection and algorithm debugging. With the requirement for ECG signals to be sampled at a minimum frequency of 200 Hz and ideally at 1000 Hz, the use of existing solutions was not feasible and thus the PlotView was created to cater for this need. As many researchers likely have a similar need, the project was open-sourced on GitHub [336]. The library consists of three main components: (1) a performant circular buffer implementation for boolean, integer, and floating-point values and arbitrary Java objects, (2) an Android custom view implementation (the actual PlotView) to project color pixels on the screen based on an underlying value buffer object, and (3) four different plot components for 1-D, 2-D, 3-D⁷, and specialized sampled data plotting (SamplingPlot). The circular buffer realizes a well-known computer science concept for optimized buffer implementations that is regularly used in data streaming [341]. The goal is to completely avoid reallocation of memory (which slows down execution) and to reuse a one-time allocated memory space. With cheap random access memory available in large quantities, this also allows large amounts of data to be stored, accessed, or streamed in an efficient manner. The PlotView implements such buffers in multiple instances for different primitive data types in the context of the Java programming language, optimized for the Android

7 Not yet fully implemented at the time of writing.


operating system where reallocations are prevented by one-time allocation at the beginning and the use of class-public attributes instead of method-local variables. The public exposure of all members breaks with the strict concept of object-oriented encapsulation in favor of maximum speed, since the Android compiler originally used for the algorithms did not perform all state-of-the-art desktop compiler optimizations. The PlotView makes use of these circular buffers to allow performant random access to variables stored in the buffer. These are the x-, y-, z-coordinate information (or time information in the case of the SamplingPlot) and the respective values at these locations as well as possible markers for each value. The PlotView allows attachment of an arbitrary number of Plot-derived classes Plot1D, Plot2D, Plot3D and SamplingPlot. These are searched at runtime and, depending on view settings, projected in a windowed and scaled manner within the boundaries of the widget size specifications. High performance in plotting is achieved independent of windowed scale by using pixel projection instead of array-iterated line drawing. Instead of going through each point, testing its screen visibility, and then drawing it, the method iterates through each pixel on screen and tests whether the pixel should be colored. This is standard procedure for all modern graphics devices, but graph or plot widgets usually do not make use of it due to the high implementation effort and the lack of need for static graphs.

5.3 Mobile-Phone Based Evaluation

The MIT-BIH Arrhythmia Database and the MIT-BIH Supraventricular Arrhythmia Database [342] were used to evaluate the algorithm on three different Android-based phones, the Samsung GT-I9000, Samsung GT-N7000, and HTC Wildfire S A510e. Using a PC, the records of both databases were downloaded and converted to compressed CSV files using the PhysioNet tools [216] and manual processing. The data were then copied to the mobile phone storage, loaded at runtime into app memory, and evaluated using the algorithm on the phone. In order to determine algorithm performance, the annotation character supplied with the MIT-BIH records (if available) was compared with the result of the decision tree classification of every detected beat using a window of interest. The results were stored by the application to provide statistics about matches and mismatches. After processing of all records, the results were written into a different CSV file and later evaluated on a PC for scientific reporting.
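The comparison of detected beats with the database annotations can be sketched as follows. This is a hedged illustration only: the class BeatMatcherSketch, the greedy first-hit matching strategy, and the half-width of the window of interest (given in samples) are assumptions, as the concrete window size is not stated here. Both position arrays are assumed to be sorted in ascending order.

```java
/** Hypothetical sketch: assign detected beats to annotations within a window of interest. */
final class BeatMatcherSketch {
    /**
     * Returns, for every annotation, the index of the matching detection,
     * or -1 if no detection lies within +/- halfWindow samples.
     * Each detection is used for at most one annotation.
     */
    static int[] match(long[] annotations, long[] detections, long halfWindow) {
        int[] matched = new int[annotations.length];
        int d = 0; // index of the next unused detection
        for (int a = 0; a < annotations.length; a++) {
            long lower = annotations[a] - halfWindow;
            long upper = annotations[a] + halfWindow;
            // Skip detections before the window; they belong to no annotation.
            while (d < detections.length && detections[d] < lower) {
                d++;
            }
            if (d < detections.length && detections[d] <= upper) {
                matched[a] = d; // first detection inside the window
                d++;
            } else {
                matched[a] = -1; // annotated beat was not detected
            }
        }
        return matched;
    }
}
```

Annotations mapped to -1 correspond to beats that were not detected, while detections that are never referenced in the result correspond to beats detected where none was annotated. For each matched pair, the annotation character can then be compared with the class assigned by the decision tree.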


For presentation of the results, all normal/abnormal beat classifications were evaluated. The measures to describe the performance of the classification are:

True Negative: a beat was detected and correctly classified as normal
True Positive: a beat was detected and correctly classified as abnormal
False Negative: a beat was either not detected or detected and incorrectly classified as normal
False Positive: a beat was either detected when none was annotated or detected and incorrectly classified as abnormal
Median FN: median of false negatives over all records

5.4 Results

Table 5.1 presents the results for real-time classification of the MIT-BIH Arrhythmia and the MIT-BIH Supraventricular databases on the smartphone. 256 014 unique beat annotations (MIT-BIH Arrhythmia: 90 116 in 39 records, MIT-BIH Supraventricular: 165 898 in 72 records) were processed, and 0.42 % were not recognized. Overall sensitivity for abnormal beat detection was 89.5 % with a specificity of 80.6 %. The results of the application were identical on all employed mobile phones. Several records (numbers 104, 109, 111, 118, 124, 203, 214, 231, 232 in the MIT-BIH Arrhythmia Database and numbers 845, 848, 850, 855, 888, 890 in the MIT-BIH Supraventricular Arrhythmia database) had to be omitted, as they contained exclusively paced, left/right bundle branch block, or other abnormal beats, or too much noise within the first ten beats, so no healthy templates could be found and the algorithm did not operate normally.

Live operation of the application was tested with four healthy individuals using the Everywhere-ECG. The application worked as expected, providing a continuously monitored ECG signal for recording times of 1 min to 30 min with all QRS complexes correctly detected, according to visual inspection, and no abnormal beats found.
5.5 Discussion

The results demonstrate that the proposed implementation works well on the MIT-BIH databases, which constitute the most widely used gold standard in this field. Especially the high sensitivity and the low number of false negatives show that the approach is applicable, while the specificity could still be improved and hence the number of false positives reduced.


Table 5.1: Evaluation results of the algorithm for all processed beats. Adapted from [55] with permission.

                  MIT-BIH Arrhythmia   MIT-BIH Supraventricular
Detected Beats    99.59 %              99.58 %
True Positive     11 224               16 474
True Negative     65 855               114 606
False Positive    10 987               32 567
False Negative    1680                 1556
Median FN         3                    3
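As a worked check, the overall sensitivity and specificity reported in the results can be recomputed from the counts in Table 5.1. The following is a minimal sketch that assumes the overall figures are obtained by pooling the counts of both databases, with "abnormal" as the positive class; the function and variable names are illustrative and not part of the original implementation.

```python
# Confusion counts copied from Table 5.1 (positive class = abnormal beat).
COUNTS = {
    "MIT-BIH Arrhythmia": {"tp": 11224, "tn": 65855, "fp": 10987, "fn": 1680},
    "MIT-BIH Supraventricular": {"tp": 16474, "tn": 114606, "fp": 32567, "fn": 1556},
}


def pooled(counts):
    """Sum the confusion counts over all databases (assumed pooling scheme)."""
    total = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for per_database in counts.values():
        for key in total:
            total[key] += per_database[key]
    return total


def sensitivity(tp, fn):
    """Fraction of abnormal beats that were classified as abnormal."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of normal beats that were classified as normal."""
    return tn / (tn + fp)


total = pooled(COUNTS)
print(f"sensitivity: {100 * sensitivity(total['tp'], total['fn']):.1f} %")
print(f"specificity: {100 * specificity(total['tn'], total['fp']):.1f} %")
```

Under this pooling assumption, the computation reproduces the 89.5 % sensitivity and 80.6 % specificity stated in the text.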

For those database records where the first beats, and hence the template, were well conditioned, detailed analysis showed a high classification accuracy. This also showed that most of the false negative or false positive decisions were made on isolated records. In most cases this was the result of poorly conditioned templates due to either noisy data or many ectopic beats at the beginning of a record. Automated selection of the first templates relied on a simple algorithm, which requires that the first ten beats are of high quality. This needs to be improved in further work, especially as several records in the databases could not be used because of this limitation. To provide more consistent results, a further step in the selection algorithm could be implemented that evaluates the signal quality or noise level at the beginning of the recording and delays the first template selection accordingly. It might also be a viable option to extract accurate templates over the course of an entire ECG recording, store them associated with the current mobile phone user, and reuse them in subsequent ECG evaluations of the same user. This would allow the template selection to remain entirely unsupervised while opening up the opportunity for more accurate templates. Still, the main application is intended for healthy populations to identify isolated arrhythmic beats or the start of a sudden arrhythmic period, so the template selection process can be assumed to succeed in any case. The main purpose of the algorithm is to provide a computationally very efficient and accurate R-peak detector for mobile and wearable devices. The results show that this is easily achieved with an accuracy


well above 99 %, which is comparable to existing algorithms for non-wearable scenarios. The database signals partially suffered from heavy noise, which did not affect the algorithm's capability to detect the peaks. Since the Everywhere-ECG in the prototype stress platform does not provide accurate morphological information in the restricted filter-pass mode, R-peak detection and subsequent HRV extraction is the most feasible goal here. In 2012, the algorithm was able to run in real time on all tested phones, but created a high computational load. In particular, calculation of the maximal correlation coefficient led to full CPU utilization at the given sampling rate. Today's microprocessors are more advanced, even the ones in wearable devices, which reduces the severity of this issue. Still, there is further potential for optimization, as any reduction in CPU load will allow a more efficient and prolonged use of battery power. Computing the correlation coefficient only around the R-peak will certainly boost algorithmic efficiency, while providing similar results if combined with R-peak refinement as described in the next chapter. Using the freed resources, further analysis of features in the frequency domain of the HRV or of another ECG lead could be performed, which could strengthen classification performance. However, in this work, the anomaly detection additionally implemented in this application is mainly required for the rejection of abnormal beats and their potentially skewing effects on HRV statistics. For effective beat classification and the least influence of abnormal beats, precise R-peak detection is crucial. Therefore, the next chapter presents an algorithm that shows how this part can be, and was, enhanced even further.
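The optimization suggested above, computing the correlation coefficient only around the R-peak, can be illustrated with a short sketch. It is not the implementation used in this work: the Pearson correlation, the centering of the template on the peak, and the parameter `max_shift` are assumptions made for illustration only.

```python
from math import sqrt


def correlation_coefficient(segment, template):
    """Pearson correlation of two equally long sequences (0.0 if one is flat)."""
    n = len(template)
    mean_s = sum(segment) / n
    mean_t = sum(template) / n
    num = sum((s - mean_s) * (t - mean_t) for s, t in zip(segment, template))
    den = sqrt(sum((s - mean_s) ** 2 for s in segment)
               * sum((t - mean_t) ** 2 for t in template))
    return num / den if den > 0 else 0.0


def max_correlation_near_peak(signal, template, peak_idx, max_shift=3):
    """Maximal correlation over alignments within +/- max_shift samples of the
    detected R-peak, instead of over every sample of the signal.

    The template is assumed to be centered on its own R-peak. Returns -1.0 if
    no alignment fits inside the signal.
    """
    half = len(template) // 2
    best = -1.0
    for shift in range(-max_shift, max_shift + 1):
        start = peak_idx + shift - half
        end = start + len(template)
        if start < 0 or end > len(signal):
            continue  # alignment would run past the signal boundaries
        best = max(best, correlation_coefficient(signal[start:end], template))
    return best


# Usage: a synthetic beat whose peak is detected one sample too late.
template = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0]
signal = [0.0] * 20 + template + [0.0] * 20
print(max_correlation_near_peak(signal, template, peak_idx=24, max_shift=2))
```

With this restriction, the cost per beat is proportional to the template length times the number of tested shifts, rather than to the full length of the signal between two beats.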

6 Refinement of R-Peak Detections

As shown in the previous chapter, most of the widely used and successful R-peak detectors known in the literature are based on a similar sequential analysis procedure [213, 221]. After the 'raw' signal coming from the skin-interfacing electrical circuit is sampled and digitized by the ADC, a filter is used to remove typical ECG noise and interference. Then, several algorithmic computation steps increase the signal-to-noise ratio (SNR) of the QRS complexes. Since the filters and algorithms are mostly time-domain implementations, they incur a processing-related time shift in the signal characteristics. The group delay of a filter implementation is often determined as a constant during development of the processing pipeline. Even though it is known that this delay changes across the frequency spectrum, the effort of determining a frequency-dependent group-delay function and applying it at runtime is rarely taken. This can have a variety of reasons. Generally, when looking at the typical application of R-peak detectors, which is to produce an HR estimate, the effort/cost of implementing such a function is disproportionate to its effect on the precision of the estimate. Usually, it is likely not even noticeable by, or relevant to, the end user who just reads off the current HR value. Thus, many implementations do not even (need to) consider this step. They can simply search for R-peaks in the filtered signal and do not need to care about its time shift with respect to the original signal, as they assume this shift to be nearly constant for the same frequency. However, this is not true for aberrant beats or phases of fast-changing HRV. Let us assume the R-peak detector is implemented as part of a wearable sports tracking device. In this case, the display or storage of the device likely does not even allow a high enough precision for the difference to show up for the end user.
Especially in high-noise environments like trail running with a wearable heart rate tracker, the noise pollution of the signal due to movement artifacts is a much bigger problem than having a high-precision R-peak determination. For a good user experience it is more important to have a reliable heart rate estimation displayed and recorded. This is usually achieved through heavy averaging of values. Yet, averaging of RR-intervals strongly affects HRV metrics, which need to be calculated on the actual, exact RR-intervals with ms precision. Many authors who proposed a QRS detector algorithm in the past did not share the details on how they verified the R-peak detections [214, 221,


[Figure 6.1 plot. Legend: ECG signal, Elgendi R-peaks, Reference R-peaks. x-axis: Samples (f = 360 Hz, ×10^4); y-axis: Voltage [mV].]

Figure 6.1: Differences between detected and annotated R-peaks. R-peaks determined from the ECG signal in record 207 of the MIT-BIH Arrhythmia Database by the algorithm in [221] (red circles), and actual reference annotated R-peak locations (blue cross). Reprinted with permission from [56] © 2015 IEEE.

343]. Usually, the peaks are simply detected by a maximum search in the filtered signal and then compared to raw signal annotations using a slackness window. For example, Köhler et al. used a window of 75 ms [219], which is quite large, considering for example the pN50 value, which distinguishes RR-interval variability above and below 50 ms. Furthermore, values for the $t_{RSD}$ and $\sigma_{RR}$ parameters are in the range of 50 ms to 170 ms (see Table 2.2). Therefore, an accuracy evaluation with 75 ms slackness says little about how accurately the HRV is determined. However, even this slackness parameter is rarely mentioned in scientific papers introducing R-peak detectors; the paper by Köhler et al. is one of the few publications found that state this value. This means that, for almost all detectors in the literature, no information is available on how accurate the temporal location actually is within that slackness window. The problem is demonstrated in Figure 6.1, where four heartbeats are shown. The second beat was still considered to be correctly detected, even though its temporal location deviates considerably from that of the actual reference beat annotation. A minor deviation can also be observed for all the other shown beats except for the last beat, which matches the reference annotation perfectly (which, in this example, is itself questionably annotated, with a difference of one or two samples to the actual signal minimum).
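To illustrate why a 75 ms matching window is too coarse for HRV purposes, the following sketch matches detections to annotations within a tolerance and then compares a 50 ms threshold statistic on both beat sequences. The greedy matching scheme, the function names, and the example timestamps are assumptions for illustration; the statistic is computed as the fraction of successive RR-interval differences above 50 ms, which is the common definition and is assumed here to correspond to the pN50 value mentioned above.

```python
def match_detections(reference_ms, detected_ms, tolerance_ms=75.0):
    """Greedily pair sorted detections with sorted annotations if they lie
    within tolerance_ms of each other. Returns a list of (reference, detection)."""
    pairs, i, j = [], 0, 0
    while i < len(reference_ms) and j < len(detected_ms):
        diff = detected_ms[j] - reference_ms[i]
        if abs(diff) <= tolerance_ms:
            pairs.append((reference_ms[i], detected_ms[j]))
            i += 1
            j += 1
        elif diff < 0:
            j += 1  # detection without a matching annotation
        else:
            i += 1  # annotation without a matching detection
    return pairs


def fraction_above_50ms(peaks_ms):
    """Fraction of successive RR-interval differences larger than 50 ms."""
    rr = [b - a for a, b in zip(peaks_ms, peaks_ms[1:])]
    diffs = [abs(b - a) for a, b in zip(rr, rr[1:])]
    return sum(d > 50 for d in diffs) / len(diffs) if diffs else 0.0


# Hypothetical example: a perfectly regular rhythm, one detection 60 ms late.
reference = [0, 800, 1600, 2400]
detected = [2, 860, 1601, 2400]

print(len(match_detections(reference, detected)))  # all four beats count as correct
print(fraction_above_50ms(reference), fraction_above_50ms(detected))
```

In this constructed example every detection is accepted as correct under the 75 ms window, yet the threshold statistic changes from 0 to 1, because a single 60 ms offset alters two neighboring RR-intervals.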


However, in the future it will be important to precisely determine and evaluate R-peak detection precision in order to provide more continuous and sophisticated RR-interval measurements, and thus HRV estimations that are sensitive to even very subtle changes. This is quite relevant for the determination of stress and also for more detailed performance evaluations in human-performance-centered wearables¹. At the end of this work it will be clear from all the results that the difference in the $t_{RSD}$ parameter between absolute relaxation and maximum stress is only about 25 ms (see chapter 11). This is why a novel R-peak refinement algorithm was developed as part of the wearable stress-determination ecosystem. The algorithm is modular: it can work with signals coming from any QRS detection algorithm and can be used as a simple post-processing step. In the following, the work presented in [56] is closely recapitulated and important connections to the stress research platform are described. No previous related or similar work could be found, which suggests that such determinations have not been made before. Together with the refinement algorithm, a measure for the R-peak detection error, dubbed beat slackness, is proposed and determined for three state-of-the-art QRS detectors [219, 221, 343]. This measure can be used to evaluate such detectors with respect to their suitability for HRV analysis, and thus stress analysis, and to estimate the uncertainty of their detections. It can thus provide an additional feature for performance comparisons. Furthermore, modifications to the QRS detection algorithm by Mohamed Elgendi [221] are proposed that allow it to work in real time on embedded hardware. Last but not least, the algorithm is also available in the JELY as an open-source implementation on GitHub².

6.1 Beat Slackness

Most heartbeat detectors apply a bandpass filter during signal preprocessing [221].
This is done to extract the frequency components associated with the characteristic deflections of the QRS complex. Often, a cascaded highpass/lowpass filter combination or direct bandpass filtering is used with cutoff frequencies at around 8 Hz and 30 Hz [213, 217,

¹ For example, the Apple Watch, as one of the most accurate PPG-based HR tracking wearables [228], only allows HRV measurements to be exported while using the Breathe app [344], which requires the user to remain very calm with little body motion [181].
² RPeakSlacknessReduction.java in [335]


[Figure 6.2 plots. Top panel: "Original ECG signal" (legend: ECG signal, Elgendi R-peaks, Reference R-peaks), with $t_R$ and $t_D$ marked on the time axis. Bottom panel: "Original ECG signal filtered with bandpass" (legend: Bandpassed ECG signal, Elgendi R-peaks, Reference R-peaks), with $t_F$ marked on the time axis. x-axis: Samples (f = 360 Hz, ×10^4); y-axis: Voltage [mV].]

Figure 6.2: Location definitions for the Peak Refinement Algorithm. Two QRS complexes found in record 207 of the MIT-BIH Arrhythmia Database. The top plot shows the original signal, the bottom plot shows the bandpass-filtered signal. Red circles show the peak detected by the improved implementation of the knowledge-based QRS detector [221] described later in this chapter, blue crosses show the actual reference annotated R-peak locations. $t_R$ marks the time in the original signal that corresponds to the correct reference R-peak. $t_F$ marks the time in the band-filtered signal that corresponds to the detected absolute maximum in the area associated with the QRS complex. $t_D$ marks the computed location of the assumed R-peak in the original signal, obtained by subtracting the group delay of the bandpass filter. This demonstrates the problem of beat slackness in QRS detectors. Reprinted with permission from [56] © 2015 IEEE.

221]. This removes signal distortions like baseline wander or power-line noise. The filtered signal is then usually used directly, or in the form of a window-integrated auxiliary signal, to search for the location of the largest amplitude in an area associated with the QRS complex, in order to identify the R-peak location. The filtering introduces a time lag in the resulting signal equal to the group delay $d$ of the employed filters. It is not guaranteed that the largest signal amplitude in the filtered/processed signal at time $t_F$ always corresponds precisely to the actual R-deflection time $t_R$ in the original signal (Figure 6.2). Many algorithms do not provide a compensation. Thus, they implicitly introduce a per-beat


slackness error $s_n$ on the temporal location $t_D$ of the $n$th detected R-peak in the original signal. This per-beat slackness can be defined as

$$ s_n = |t_R - t_D|, \quad \text{with} \quad t_D = t_F - d. \tag{4} $$

The slackness can be accumulated for an entire continuously recorded and processed ECG signal consisting of $N$ detected heartbeats to form the total per-record/per-database beat slackness as:

$$ S = \sum_{i=1}^{N} s_i. \tag{5} $$

A record- or database-independent value can then be calculated as the average per-beat slackness

$$ \mu = \frac{S}{N}. \tag{6} $$

This value can be used as a performance indicator for the temporal precision of the algorithm used to detect the beats.

6.2 Beat Slackness in Existing QRS Detectors

In many cases, authors of QRS detectors do not share open-source implementations of their methods. Therefore, for an evaluation of the beat slackness in existing detectors, three common and mobile-friendly QRS detectors were re-implemented based on their associated publications, and this parameter was subsequently assessed. In this work, a common and mobile-friendly QRS detector is defined as an algorithm that fulfills the following criteria:

1. it was evaluated on the entire MIT-BIH Arrhythmia Database, with no records left out;

2. it has a reported detection accuracy of more than 99 % on that database;

3. it uses only time-domain features for QRS detection, which are preferred over frequency-domain features on mobile platforms due to their lower computational demand, which impacts battery runtime in long-term signal analysis.
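The slackness measures of Equations (4) to (6), which are assessed for detectors fulfilling these criteria, can be sketched in a few lines. The sketch assumes that detections and reference annotations have already been paired beat by beat and are given in milliseconds; the function names and example values are illustrative and do not stem from the JELY implementation.

```python
def compensated_locations(filtered_peak_times_ms, group_delay_ms):
    """t_D = t_F - d: shift the peak times found in the filtered signal back
    by the (constant) group delay of the filter, see Equation (4)."""
    return [t_f - group_delay_ms for t_f in filtered_peak_times_ms]


def beat_slackness(reference_ms, detected_ms):
    """Return (per-beat slackness s_n, total slackness S, average slackness mu).

    reference_ms[n] (t_R) and detected_ms[n] (t_D) must belong to the same beat.
    """
    per_beat = [abs(t_r - t_d) for t_r, t_d in zip(reference_ms, detected_ms)]
    total = sum(per_beat)                                # Equation (5)
    mean = total / len(per_beat) if per_beat else 0.0    # Equation (6)
    return per_beat, total, mean


# Usage with hypothetical values: three beats, filter group delay of 100 ms.
t_f = [110.0, 915.0, 1730.0]   # maxima located in the filtered signal
t_r = [10.0, 800.0, 1625.0]    # reference annotations in the original signal
t_d = compensated_locations(t_f, group_delay_ms=100.0)
s, S, mu = beat_slackness(t_r, t_d)
print(s, S, mu)
```

Because the measure only needs paired time stamps, it can be applied to the output of any detector without access to its internals.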


Three popular and methodologically diverse QRS detector algorithms that fall into this category were implemented and evaluated with respect to their beat slackness: the Fire Pulse Train Automaton detector [343], the Zero Crossing Counts method [219], and the fast Knowledge-Based QRS Detector [221]. The last one is comparable in its methodology to the one used in the previous chapter. A refined version with integrated beat refinement was developed and evaluated here and now serves in the JELY as the default detector, also for the Hearty implementation.

Fire Pulse Train Automaton

The Fire Pulse Train Automaton detector represents an approach for hardware-based, highly efficient R-peak detection [343]. It uses a non-Nyquist analog-to-digital sampling method that converts analog signals into pulses instead of a signal sampled at fixed intervals. A time-based integrate-and-fire (IF) sampler is used to extract descriptors in the pulse domain. QRS detection is performed by an automaton-based decision logic. For the most part, the same processing pipeline described in [343] was used for this re-implementation. As this usually has to be implemented as an ADC-like component within the ECG circuitry, some slight modifications were made so that it runs with regular sampling mechanisms, can be simulated and evaluated on digital databases, and performs better in this kind of application:

• Physiological blanking cannot be interrupted by strong stimuli for a certain time $t_B$.

• Time parameters $t_{p1}$, $t_{p2}$, and $t_{p3}$ (see [343]) are automatically adapted depending on the sampling frequency $F_s$ used for the simulation as:

$$ t_{p1} = \frac{1}{F_s}, \quad t_{p2} = 1.5 \cdot t_{p1}, \quad \text{and} \quad t_{p3} = 1.75 \cdot t_{p2}. \tag{7} $$
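As a small worked example of Equation (7), the adapted time parameters can be computed for the 360 Hz sampling frequency of the MIT-BIH Arrhythmia Database. The function name is illustrative; only the three relations themselves are taken from the text.

```python
def pulse_time_parameters(fs_hz):
    """Time parameters of the simulated pulse train detector, Equation (7).

    Returns (t_p1, t_p2, t_p3) in seconds for the sampling frequency fs_hz.
    """
    t_p1 = 1.0 / fs_hz
    t_p2 = 1.5 * t_p1
    t_p3 = 1.75 * t_p2
    return t_p1, t_p2, t_p3


t_p1, t_p2, t_p3 = pulse_time_parameters(360.0)
print(f"t_p1 = {1000 * t_p1:.2f} ms, t_p2 = {1000 * t_p2:.2f} ms, "
      f"t_p3 = {1000 * t_p3:.2f} ms")
```

At 360 Hz this yields approximately 2.78 ms, 4.17 ms, and 7.29 ms, so all three parameters scale with the sampling interval.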

Zero Crossing Counts Method

The Zero Crossing Counts method is based on applying a high-frequency saw-tooth sequence to the band-filtered and squared original ECG signal [219]. Then, the number of times the resulting signal changes its sign in a given time window is counted. For bandpass filtering, a combined FIR lowpass-highpass filter was employed using a Blackman-Harris window and cutoff frequencies $f_l$ and $f_h$, respectively.
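The counting principle can be sketched as follows. This is a strongly simplified illustration, not the method of [219]: the high-frequency sequence is modelled as an alternating sequence of constant amplitude, whereas the original method adapts its amplitude and uses forgetting factors, and the input is assumed to be already band-filtered and squared. All names are illustrative.

```python
def zero_crossing_counts(feature_signal, amplitude, window):
    """Count sign changes of feature_signal[n] + (-1)^n * amplitude within a
    trailing window of `window` samples.

    Outside QRS complexes the added sequence dominates and the sum changes its
    sign at almost every sample. Inside a QRS complex the feature signal
    dominates and the count drops.
    """
    y = [x + (amplitude if n % 2 == 0 else -amplitude)
         for n, x in enumerate(feature_signal)]
    # changes[n] is 1 if the sign differs between sample n - 1 and sample n.
    changes = [0] + [1 if (a < 0) != (b < 0) else 0 for a, b in zip(y, y[1:])]
    counts, running = [], 0
    for n, change in enumerate(changes):
        running += change
        if n >= window:
            running -= changes[n - window]  # drop the sample leaving the window
        counts.append(running)
    return counts


# Usage: ten baseline samples followed by ten samples of high signal energy.
feature = [0.0] * 10 + [5.0] * 10
counts = zero_crossing_counts(feature, amplitude=1.0, window=4)
print(counts[9], counts[19])
```

In this toy signal the count is at its maximum of 4 during the baseline and falls to 0 inside the high-energy segment, so a low count indicates a QRS candidate.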


Köhler et al. describe their temporal localization as a maximum/minimum search where the magnitude of the much larger extremum is taken [219]. However, no further details are provided. For the re-implementation, the largest absolute extremum in the band-filtered signal is used.

Fast Knowledge-Based QRS Detector

The Fast Knowledge-Based QRS Detector algorithm proposed by Mohamed Elgendi in 2013 is a Pan-Tompkins-like detector based on two independent moving average thresholds [221]. After filtering using an 8th-order Butterworth bandpass with cutoff frequencies $f_{c1}$ and $f_{c2}$, the resulting signal is squared and the two moving averages are calculated as described in [221]. Since the algorithm is intended to work in real time, on the fly, the statistical mean of the entire squared ECG signal could not be used. Instead, another adaptive threshold is employed to represent $z$ [221], calculated as the moving average of the filtered and squared signal over a time window of width $T_w$. To detect the QRS complexes, the rejection parameters were modified. Blocks of interest are ignored in this implementation if they appear within 100 ms of the end of the last detected QRS block of interest.

6.3 R-peak Refinement Algorithm

Each of the previously described algorithms can be improved further by applying an additional processing step that is described in this section. The Peak Refinement Algorithm searches for the highest absolute differential maximum in the vicinity of the assumed peak position $t_D$ in the original, unfiltered ECG signal. This method assumes that after calculating $t_D$, in most cases, the actual R-peak position can be found by this additional refinement search, which works as follows. Three amplitudes $a_1$, $a_2$, $a_3$ of the original signal are extracted around $t_D$ at the corresponding times $t_1$, $t_2$, $t_3$ using a time window of length $T_1$ and

$$ t_1 = t_D - \tfrac{1}{2} T_1, \quad t_2 = t_D, \quad \text{and} \quad t_3 = t_D + \tfrac{1}{2} T_1. \tag{8} $$

From the three values $a_1$, $a_2$, $a_3$ the median value is selected and taken as the amplitude reference $a_r$. Next, all the amplitudes $a[t]$ in another time window of length $T_2$, with $T_2 < T_1$, around $t_D$ are tested against


[Figure 6.3 plot. Legend: ECG signal, Elgendi R-peaks, Reference R-peaks, Selected amplitude values. The selected amplitudes $a_1$ to $a_6$ are marked in the signal and $t_U$ is marked on the time axis. x-axis: Samples (f = 360 Hz, ×10^4); y-axis: Voltage [mV].]

Figure 6.3: Peak refinement with the slackness reduction algorithm. Two QRS complexes found in record 207 of the MIT-BIH Arrhythmia Database [215]. Blue circles show the peak detected by this implementation of the fast Knowledge-Based QRS Detector [221], blue crosses show the actual annotated reference R-peak locations. The amplitude values selected by the Peak Refinement Algorithm for both beats are marked with red squares. $a_1$, $a_2$, $a_3$ correspond to the first beat, $a_4$, $a_5$, $a_6$ correspond to the second, abnormal beat. For the first beat $a_r = a_1$ is selected as the amplitude reference and for the second beat $a_r = a_5$ is selected. $t_U$ is marked for the first beat. The value $a[t_U]$ and the first reference R-peak mark (first blue cross) coincide in this case. No additional symbol was introduced in the plot to avoid confusion. Reprinted with permission from [56] © 2015 IEEE.

$a_r$. The time at which the largest absolute difference to the reference amplitude $a_r$ is found,

$$ t_U = \arg\max_{t} \, |a[t] - a_r|, \tag{9} $$

is taken as the refined/updated temporal location $t_U$ of the detected R-peak. This procedure makes the following assumptions:

1. the original location $t_D$ is already very close to the actual R-peak location $t_R$;

2. the width of QRS complexes does not exceed $0.9 \times T_2$.

Then, two of the three selected amplitude values, and thus the reference amplitude $a_r$, will lie on, or near, the baseline of the current QRS complex. It follows that the largest absolute difference to it will be the actual R-peak


in most cases. In Figure 6.3, two beats and their selected amplitude values are shown to clarify the procedure.

6.4 Implementation and Results

6.4.1 Thresholds

The authors of all the tested algorithms made it clear how important the choice of threshold values is to the performance of each detector [219, 221, 343]. For each threshold that is not discussed in this section, the default value suggested in the respective work was used. The blanking time was set to $t_B$ = 80 ms in order to avoid additional false positive detections that resulted from simulated pulse generation at low sampling frequencies. For the zero crossing method, the forgetting factors $\lambda_K = \lambda_D = 0.995$ and $\lambda_\Theta = 0.985$ were used [219], which are empirically optimized values adapted from [323]. Köhler et al. did not share the actual values used for their forgetting factors. They also ignored the first five minutes of all records, which contain about 10 000 beats over the entire MIT-BIH Arrhythmia Database. These beats were included in this work for the slackness evaluation of this algorithm, since the zero crossing count was initialized here with a non-zero value that significantly reduced the training phase. For the knowledge-based QRS detector, $F_1$ = 8 Hz, $F_2$ = 21 Hz, $T_w$ = 4 s, and a threshold offset of $\beta$ = 7 % were set. The time windows for the Peak Refinement Algorithm were chosen as $T_1$ = 200 ms, since no QRS complex, not even an abnormal one, is expected to last longer than 180 ms, and $T_2$ = 140 ms, as suggested by an evaluation of different values for $T_1$ and $T_2$ using the Creighton University Ventricular Tachyarrhythmia Database [345].

6.4.2 Slackness Results

The results for the slackness evaluation of the three implemented detectors on the MIT-BIH Arrhythmia Database are shown in Table 6.1. No records were left out. The last second of each record was ignored for implementation convenience in the algorithms that required windowing methods.
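With the window lengths chosen in Section 6.4.1, the Peak Refinement Algorithm of Section 6.3 (Equations 8 and 9) can be sketched as follows. This is a minimal illustration and not the RPeakSlacknessReduction.java implementation: working on sample indices, rounding the window lengths to whole samples, and clamping the windows at the signal boundaries are assumptions made here.

```python
def refine_peak(signal, t_d, fs_hz, t1_ms=200.0, t2_ms=140.0):
    """Return the refined R-peak sample index t_U for a detection at index t_d.

    signal: original, unfiltered ECG samples
    t_d:    detected peak index after group-delay compensation
    fs_hz:  sampling frequency in Hz
    """
    half_t1 = int(round(0.5 * t1_ms * fs_hz / 1000.0))
    half_t2 = int(round(0.5 * t2_ms * fs_hz / 1000.0))
    last = len(signal) - 1

    def clamp(index):
        # Assumption: windows are truncated at the signal boundaries.
        return max(0, min(last, index))

    # Equation (8): three amplitudes at t_D - T_1/2, t_D, and t_D + T_1/2.
    a1 = signal[clamp(t_d - half_t1)]
    a2 = signal[clamp(t_d)]
    a3 = signal[clamp(t_d + half_t1)]
    a_r = sorted((a1, a2, a3))[1]  # median serves as amplitude reference

    # Equation (9): largest absolute difference to a_r within the T_2 window.
    lo, hi = clamp(t_d - half_t2), clamp(t_d + half_t2)
    return max(range(lo, hi + 1), key=lambda t: abs(signal[t] - a_r))


# Usage: synthetic beat with its peak at sample 200, detected 5 samples late.
ecg = [0.0] * 400
ecg[199], ecg[200], ecg[201] = 0.5, 1.0, 0.5
print(refine_peak(ecg, t_d=205, fs_hz=360.0))
```

Taking the median of the three amplitudes means that the reference stays on the baseline even when $t_D$ itself already lies on the peak, and using the absolute difference lets the same search handle upward and downward deflections.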
The Fire Pulse Train Automaton detector had the smallest average slackness without using the Peak Refinement Algorithm, evaluated for all beats ($\mu$) and only considering abnormal beats ($\mu_a$). Using the Peak Refinement Algorithm, the average slackness dropped by slightly more than 50 % for the fast Knowledge-Based QRS Detector, which was the best result. Furthermore, the overall beat detection error was also


Table 6.1: Slackness results for the three tested detectors without and with the Peak Refinement Algorithm (PRA) applied. Adapted from [56] with permission. $S_a$/$\mu_a$: total slackness/average slackness for abnormal beats only, always given in ms.

Method                        S         S_a       μ      μ_a
Pulse Train detector [343]
  without PRA                 317 774   126 290   8.1    10.2
  with PRA                    166 158    93 371   4.2     7.5
Zero Crossing Counts [219]
  without PRA                 363 787   188 200   9.2    15.2
  with PRA                    165 659    88 403   4.2     7.1
Knowledge-Based [221]
  without PRA                 341 242   166 316   8.7    13.4
  with PRA                    157 093    86 927   4.0     6.7

reduced from 0.41 % to 0.34 % for this implementation of [221], which was significantly lower than for the other two algorithms. Applying the Peak Refinement Algorithm increased the average per-beat processing time by 3.2 % for the Fast Knowledge-Based QRS Detector, by 0.7 % for the Zero Crossing Counts method, and by 0.2 % for the simulated Fire Pulse Train Automaton Detector. The per-beat processing time was constant.

6.5 Discussion

The results demonstrate two important aspects: (1) common QRS detectors have an inherent beat slackness that is not insignificant, and (2) the slackness can be considerably reduced using the proposed Peak Refinement Algorithm. More precisely, an average slackness of 4.0 ms is close to the sampling interval, which was 2.78 ms for all the signals from the MIT-BIH Arrhythmia Database. This means that an even further reduction of the slackness is hardly feasible. The slackness is still not zero, but it is in the range of an error of one to two samples on average. For most beats, there is no error anymore; for phases of high noise or totally aberrant beats, however, a larger error is still present. This also raises the question, however, of how accurately in time the beat annotations were


originally made. As can be seen in Figure 6.1, even the references are not always annotated exactly at the extremum for aberrant beats. A large majority of all QRS detection methods use a very similar preprocessing pipeline, which consists of a bandpass filter ranging from 8 Hz to 35 Hz [217, 221]. Moreover, for temporal localization of the R-wave, most algorithms require an extremum search on the band-filtered signal, from which the group delay of the band filter is subtracted. This leads to very similar systematic errors in most methods. The largest per-beat slackness occurs for abnormal beats that have a very different shape compared to normal QRS complexes. One example can be found in Figure 6.2. The downward slope of the abnormal beat is in a different frequency range than the normal QRS complex and produces a much lower response in the band-filtered signal. It is also too wide to be fully contained in the secondary search window ($T_2$) of the Peak Refinement Algorithm. Therefore, the algorithm will not correctly localize that R-peak, but it will still improve the detected temporal location. All of the algorithms had to be modified slightly for their re-implementation. These modifications likely did not alter the slackness characteristics of the originally intended methods, but this cannot be guaranteed. This is the downside of authors not providing reference open-source implementations, and there is no easy way around it except requesting the original source code, which in this case did not work due to a lack of responses from the authors. In this chapter, an estimator for the temporal error in detected R-peaks is suggested. An evaluation of this beat slackness is provided for three QRS detection methods, and an algorithm is presented that significantly reduces that error. This algorithm requires very little additional processing time and can even be implemented in hardware to complement existing systems with negligible effort.
In the context of the stress laboratory platform of this thesis, the presented Peak Refinement Algorithm allows for a more accurate HRV tracking and complements the mobile detection and classification system presented in the previous chapter.


7 Biosignal Awareness in Virtual Environments

In the previous three chapters, methods were presented to (1) record an ECG in any environment using a wearable platform, (2) use the raw ECG signal in a mobile and wearable operating system and platform to detect all stress-related ECG features, and (3) perform R-peak refinement in such a way that maximally accurate RR-interval information is captured. In the next chapter, the Stroop Room stress research platform will be presented, which can employ all of these outcomes as input. In this chapter, an experiment is described that was intended as preparatory work for the Stroop Room implementation. Its main goal was to determine whether using a VE in a study has an impact on a person's HRV parameters, and thus potentially biases the results of a VR-based stressor, and if so, to what extent. Furthermore, the study explored whether user interface choices in the Stroop Room would impact the stress level or biosignals of the user, and to what extent users are aware of their own heart rate while in a VR that knowingly assesses this parameter. The outcome is expected to allow a better understanding and interpretation of the results described in the next part of this thesis. This chapter is based on one of my previous publications [59]. The experiments and implementations were supported by my bachelor student Tobias Zillig, who worked on this topic under my supervision [346]. For this chapter, I entirely recomputed and reevaluated all data, with a discussion and conclusion that are independent of the published version and focus much more strongly on the relevance for the wearable stressor platform in this thesis. Previous work has shown that virtual realities can be used to affect a user's biosignals, therefore having a biofeedback effect. Amores et al. used an unobtrusively recorded live EEG to alter the virtual reality users found themselves in, reflecting their state of relaxation [147].
Richer et al. used the Google Glass to display an augmented reality heart rate feedback with no biofeedback manipulations [347]. Gorini et al. used HR variations recorded live during immersion of patients in a virtual reality to control various aspects of the environment [348]. For example, the flow speed of a waterfall decreased when the patients relaxed and their HR went down correspondingly. Luca Chittaro came up with three different methods of inducing anxiety in users of a virtual environment [349]. He showed that a method using biofeedback mechanisms has the largest


impact. Jeroen Gielen created a biofeedback VR application based on the electromyogram of the biceps and triceps to modify and steer motion in the virtual scenario [350]. Most of these works used similar toolkits or scenarios, e.g. the campfire and nature-scenarios developed in [348] were used in multiple works. Except for [349], there has not been a thorough evaluation of different visualization concepts of the users’ biosignals and the general impact on users of taking part in a VR biofeedback study. Still it is clear from these works that there seems to be a certain correlation between what people experience in a virtual environment and their biosignals. Therefore, this chapter presents four different visualization concepts of how the cardiac activity of users could be reflected in virtual environments, and evaluates their usability, attractiveness, and whether they have a biofeedback-related impact (tendency) on heart rate signals in virtual reality. This experiment is then used to determine the general level of impact a VE and the study participation have on ECG-derived biosignals. For example, whether participants in a study in VR are already so stressed out by their participation and the awareness of being recorded (Hawthorne effect), that an actual stressor-mechanic is not even relevant, or whether they are completely unaffected by the immersion itself. This constitutes an important step in preparatory work made for the Stroop Room (next part), which tries to actively induce stress through presence-effects in the virtual reality. With the knowledge of how biosignal awareness in VR will affect the heart rhythm patterns, a more conclusive light can be shed on the results in the next part. The effects on the HRV in this chapter form the middle ground, an intermediary state between being fully relaxed (first chapter) and being fully stressed (next chapter). 
7.1 Methods

The enveloping experiment that was used to assess stress states under mild cognitive load consisted of the subjects rating four different visualizations of their current heart rate in VR and using them for self-assessment of their own HR. The subjects were therefore expected to experience a mildly socially stressful situation, in which the experimenter asks them about their self-assessment and gives feedback on these estimations, making the subjects aware of the correctness of their answers. At the same time, the interfaces could be evaluated in terms of user experience and performance in this study. An additional benefit was


that those design choices and their effects on users could inform user interface decisions in the Stroop Room. In summary, this study had three goals. First, it should identify the most likable visualization method for a user’s heart rate. Second, it should identify the most functional and effective visualization in terms of self-assessment. Third, it should evaluate the stress levels subjects experience during such a mildly cognitively and socially stressful experiment, in which they are aware of being monitored in terms of performance and biosignal reactivity.

7.1.1 Visualizations

Four different visualizations were designed with the goal of exploring a four-dimensional design space. This space was set to explore novelty (RadialVis), immersiveness/use of three-dimensionality (CubeGrid), pragmatism (ReferenceVis), and feasibility for peripheral visibility even in augmented reality scenarios (ScreenPulse). Two of the four visualizations (ReferenceVis, ScreenPulse) were derived from existing concepts found in medical devices and computer games [349]. The two others (RadialVis, CubeGrid) were picked from complementary areas, in at least two dimensions, of that design space.

ReferenceVis

The first visualization allows a direct assessment of the heart rate and was intended as a reference visualization. It consists of a heart symbol pulsating in the rhythm of the user’s heart rate and a number above it, showing the rate as text (Figure 7.1a). Such visualizations are typically seen on heart rate/blood pressure monitors found in clinical or ambulatory settings, which are designed to be well readable and easily understood. As it displays the heart rate as a number, the user is able to simply read off this information (as reference), in contrast to the other designs, where the information is encoded as a parameter of the visualization animation. This visualization is expected to impose no, or only a light, cognitive load on the viewer.
However, it included the requirement for subjects to read a task-related text in a VE, which is also a central part of the stressful task in the Stroop Room.

RadialVis

This visualization consists of a circle that moves time-coherently around 360° in 20 s before starting anew. From this circle, for each detected heartbeat (R-peak), a line extends from the center of the circle. The length of the line and its color change in direct correlation to the current RR-interval,


(a) Reference Visualization (b) Radial Visualization

Figure 7.1: Reference and Radial Visualizations. (a) The “ReferenceVis” visualization, which consists of a heart symbol, pulsating in the rhythm of the user’s heartbeat, and the actual current beats per minute value written above it as a number. It is used as a reference visualization inspired by clinical electrocardiogram monitoring devices and presented in the central field of view of the subject. (b) The “RadialVis” visualization, which consists of a circle filling time-coherently in radial sections over 20 s. For each detected heartbeat (R-peak) a line extends from the center of the circle. The length of the line and its color change in direct correlation to the current RR-interval. Reprinted with permission from [59] © 2018 IEEE.

equal to the momentary beats per minute (bpm) value (Figure 7.1b). At 60 bpm or less, the line is green and short; at 200 bpm or higher, the line is red and twice as long as at 60 bpm. Length and color values in between are linearly interpolated between those extrema.

ScreenPulse

The third visualization was taken from a concept seen in many first-person computer games [349]. A circular, red, transparent texture overlay is faded in and out over the entire view of the user in the rhythm of her/his heart rate. This is used in computer games when the player character is low on health to provide a peripheral indication of this condition (Figure 7.2).

CubeGrid

The name of this visualization comes from the little cubes that are arranged in a two-dimensional grid at the feet of the user, just as if they were lying on the floor. They bounce upward for about 30 cm and fall back down (Figure 7.3). The upward movement is dictated by the rhythm of the heartbeat of the viewer. This was the most basic approach to directly affect the surrounding environment using biofeedback, as it was


(a) Normal screen (b) Pulse overlay at maximum intensity

Figure 7.2: ScreenPulse Visualization. The “ScreenPulse” visualization, which consists of a red-colored transparent texture, overlaid over the entire view of the user and fading in and out based on the user’s heart rhythm. This concept is adapted from the peripheral danger indicator visualization seen in many first-person view computer games. Reprinted with permission from [59] © 2018 IEEE.

expected the users would have the impression of seeing the floor moving in their peripheral vision.

7.1.2 Experimental Design

The experiment was conducted for each participant in two parts. The first part was used to assess the influence HR visualizations have on the HRV parameters, i.e. their biofeedback tendencies. The second part was used to evaluate the visualization concepts with respect to their attractiveness and usability/functionality as a heart rate display. During the entire experiment, the subjects were seated on a chair and equipped with an HMD. An additional overall goal of the experiment, which was not communicated to the subjects directly, was to assess their overall stress state within this experiment and inside a neutral or experimental virtual environment.

First Part - Biofeedback Tendencies

In the first three minutes of this part, the subjects did not see a visualization and just sat in an empty, neutral, hangar-like, wide-spread room in the virtual reality, as can be seen in Figure 7.2a. During that time, their baseline heart rate was recorded (baseline recording phase). In the next step, and in order to create the necessary connection, or feeling of presence, they were shown the visualization of their own actual real-time heart rate for two minutes (physiologically accurate visualization control phase).


(a) Animation loop at 30 % (b) Animation loop at 60 % (c) Animation loop at 90 %

Figure 7.3: CubeGrid Visualization. The “CubeGrid” visualization, which consists of a floor of cubes that bounce slightly in the rhythm of the user’s heartbeat. This visualization can be considered a predecessor of direct environmental modification biofeedback, where the final implementation would transform the virtual environment, e.g. ground, floor, or walls in a similar way. Reprinted with permission from [59] © 2018 IEEE.
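
As an illustration of the RadialVis mapping described in Section 7.1.1, the following Python sketch derives the momentary heart rate from an RR-interval and computes line length, line color, and sweep angle from it. It is not taken from the study software. The function names, the unit base length, and the blending of green and red in RGB space are assumptions made for this example; only the extrema (60 bpm and 200 bpm), the doubling of the line length, and the 20 s revolution are taken from the text.

```python
from typing import Tuple

BPM_LOW = 60.0        # at or below: green line with base length (from the text)
BPM_HIGH = 200.0      # at or above: red line with twice the base length (from the text)
SWEEP_SECONDS = 20.0  # duration of one 360 degree revolution (from the text)


def bpm_from_rr(rr_seconds: float) -> float:
    """Momentary heart rate in bpm for one RR-interval given in seconds."""
    return 60.0 / rr_seconds


def radial_line(bpm: float, base_length: float = 1.0) -> Tuple[float, Tuple[float, float, float]]:
    """Return (length, (r, g, b)) of the line drawn for one heartbeat."""
    # Clamp to the two extrema, then interpolate linearly in between.
    t = (min(max(bpm, BPM_LOW), BPM_HIGH) - BPM_LOW) / (BPM_HIGH - BPM_LOW)
    length = base_length * (1.0 + t)
    color = (t, 1.0 - t, 0.0)  # assumption: linear blend from green to red in RGB
    return length, color


def sweep_angle(elapsed_seconds: float) -> float:
    """Angle of the sweep in degrees; it starts anew every 20 s."""
    return (elapsed_seconds % SWEEP_SECONDS) / SWEEP_SECONDS * 360.0
```

With these assumptions, a heartbeat at 130 bpm lies halfway between the extrema and would be drawn at 1.5 times the base length in an equal mix of red and green.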

Afterwards, a (simulated) continuously rising heart rate was visualized over three minutes (rising visualization control phase) in order to assess whether the displayed HR has any influence/biofeedback tendency on the user’s body. The simulated heart rate was raised over the first 30 s by 10 bpm, starting from their current real heart rate, then did not change for 30 s. This was repeated once. In the third minute, the simulated heart rate was raised again by 20 bpm over the first 30 s and kept at this value for another 30 s. This was repeated for all four visualizations in a randomized Latin square order.

Second Part - Concept Evaluation

This part was conducted with two objectives. The first one was to understand how well the different visualization concepts are suited for heart rate estimation. For this, each visualization was shown to the subjects and they were asked to estimate the shown HR by naming a bpm value and, in a further step, to state whether the heart rate signal they saw was coming from their actual current cardiac activity or was a randomly generated offset signal of their heart rate. Ten signals were shown to them for each visualization. For each of those signals, it was randomly decided by the program whether their actual current real-time heart rate was shown to them, or a randomly generated signal with a close offset to their current HR. Furthermore, after each visualization, they had to fill out the 28-item version of the AttrakDiff questionnaire [351, 352], in order to assess the hedonic and pragmatic quality as well as the attractiveness of each concept, which was the second objective of this part. In contrast to other


measures or questionnaires, the advantage of the AttrakDiff is that it provides reliable scores even if users did not experience a product for a very long time or with extensive usage protocols [351, 352].

7.1.3 Participants

In the study, 14 volunteers participated (eight male, six female), with a mean age of (29 ± 12) years. They were mostly recruited from the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU). All participants gave written informed consent to take part in the experiment, which was approved by the local ethics committee. Five had prior experience with VR HMD systems. Except for three participants, all had some previous experience with ECG recordings. Twelve of the 14 participants reported regular physical exercise. Exclusion criteria were pregnancy, acute sickness on the day of the study, consumption of alcohol or high amounts of caffeine on the day of or the day before the study, and susceptibility to any type of motion or simulator sickness.

7.1.4 Hardware and Software

The virtual environment was created using Unity and presented to the participants of the study on a Samsung GearVR headset with an Android-based Samsung Galaxy S6 smartphone (both from Samsung Electronics, Seoul, South Korea). The heart rate was monitored using the Everywhere-ECG (chapter 4). With this, a Lead I derivation of the ECG was recorded and streamed in real-time with a sampling rate of 256 Hz to the smartphone and stored there in a JELY binary file for later offline HRV analysis. The ECG stream was also directly analyzed on the smartphone by the latest revision of the Hearty software and the JELY (chapter 5), which detected the R-peaks, calculated the current heart rate and raised a system event for each newly detected heartbeat, which was captured by the VE application. See section 5.2 for implementation details. HRV offline analysis was performed as described in chapter 4.
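
The simulated heart rate of the rising visualization control phase (Section 7.1.2) can be sketched as a piecewise function of time. The following Python code is an illustration only and not the code used in the study. The names are hypothetical, and the linear rise within each 30 s ramp is an assumption, since the text only states the total increase per ramp.

```python
from typing import List, Tuple

# (rise in bpm, ramp duration in s, hold duration in s) for each of the
# three minutes of the phase, as described in the text.
SEGMENTS: List[Tuple[float, float, float]] = [
    (10.0, 30.0, 30.0),
    (10.0, 30.0, 30.0),
    (20.0, 30.0, 30.0),
]


def simulated_hr(start_bpm: float, t: float) -> float:
    """Simulated heart rate in bpm, t seconds after the phase started."""
    hr = start_bpm
    for rise, ramp_s, hold_s in SEGMENTS:
        if t <= 0:
            break
        if t < ramp_s:
            # Assumption: the heart rate rises linearly within a ramp.
            return hr + rise * t / ramp_s
        hr += rise              # ramp completed, value is held afterwards
        t -= ramp_s + hold_s    # move on to the next minute
    return hr
```

Starting from a real heart rate of 70 bpm, this sketch would display 80 bpm during the first hold, 90 bpm during the second, and 110 bpm at the end of the third minute.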
7.2 Evaluation and Results

The evaluation was separated into the different study sections, which are the test for biofeedback tendencies, the estimation task, the HRV analysis, and the user experience assessment.

7.2.1 Biofeedback Tendencies

During the biofeedback test, the fluctuations of the heart rate in response to the presented visualizations were evaluated using median values calculated from sequential 20 s epochs with no overlap [123]. The median was used to compensate for several large outliers caused by short-term heart rate variabilities. For each experimental condition’s time phase, the mean of all contained epochs was calculated and used for a repeated measures ANOVA, comparing baseline recording values, physiologically accurate visualization control, and simulated, rising visualization control. For time phases of different length, segments of one minute were cut from the datasets to allow equal comparisons. The magnitude of heart rate changes between the baseline and the highest effect during the simulated tachycardic phase varied between 1.4 bpm and 2.5 bpm.

7.2.2 Estimation Task

For evaluating the accuracy of estimated heart rate values, ReferenceVis was excluded, since participants were shown the number of the currently visualized heart rate, making an estimation irrelevant. The process of estimating was, however, continued during this visualization, in order not to create any differences in behavior for participants. Estimation data from two participants were excluded from further analysis due to technical problems. Results in this section are therefore reported from the data of 12 participants only. Estimations given by participants were compared to the true values and counted as correct if they differed by at most 5 bpm. The rate of correct estimations was 31.7 % for RadialVis, 41.7 % for ScreenPulse and 45.8 % for CubeGrid. These results are visualized in Figure 7.4a. Looking at the differences between estimated HR and measured HR, the same trend is observable, with RadialVis showing the largest overall error at a mean difference of (78 ± 74) bpm. Estimations using ScreenPulse differed on average by (29 ± 27) bpm, and for CubeGrid by (37 ± 27) bpm. These results are visualized in Figure 7.4b.
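
The two computations described above, the epoch-based heart rate per phase and the rate of correct estimations, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions and not the analysis code of the study: heartbeats are assumed to be available as (time, heart rate) pairs, all names are hypothetical, and skipping epochs without detected beats is a choice made for this example.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple

EPOCH_SECONDS = 20.0  # sequential epochs with no overlap (from the text)
TOLERANCE_BPM = 5.0   # an estimate is correct if it differs by at most 5 bpm


def epoch_medians(beats: Sequence[Tuple[float, float]],
                  start: float, end: float) -> List[float]:
    """Median heart rate of every complete 20 s epoch in [start, end).

    beats holds (time in s, momentary heart rate in bpm) pairs.
    """
    medians = []
    t0 = start
    while t0 + EPOCH_SECONDS <= end:
        in_epoch = [hr for t, hr in beats if t0 <= t < t0 + EPOCH_SECONDS]
        if in_epoch:  # assumption: epochs without beats are skipped
            medians.append(median(in_epoch))
        t0 += EPOCH_SECONDS
    return medians


def phase_value(beats: Sequence[Tuple[float, float]],
                start: float, end: float) -> float:
    """Mean of the epoch medians of one time phase."""
    return mean(epoch_medians(beats, start, end))


def estimation_accuracy(estimates: Sequence[float],
                        truths: Sequence[float]) -> float:
    """Fraction of estimates within the tolerance of the true heart rate."""
    hits = sum(abs(e - t) <= TOLERANCE_BPM for e, t in zip(estimates, truths))
    return hits / len(estimates)
```

Using the median within each epoch keeps a single outlier beat from shifting the epoch value, which is the motivation given in the text.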
The signal source estimation task included the reference visualization, because, although participants were provided with a number display showing the current heart rate, they had no information on the signal source, just like with the other visualizations. Results show that signal source estimations using ReferenceVis were correct in (72 ± 25) % of the cases, for CubeGrid in (72 ± 16) %, for RadialVis in (68 ± 17) %, and for ScreenPulse in (67 ± 17) % of cases.

7.2.3 HRV Analysis

The results of the HRV parameter analysis are shown in Figure 7.5 and in Table 7.1. The changes between baseline and biofeedback trial


[Figure 7.4 plots: bar charts with standard deviations of (a) the rate of correct estimations in % and (b) the difference between estimated and true heart rate in bpm, each for RadialVis, ScreenPulse, and CubeGrid.]

(a) Correct estimation (b) Deviation

Figure 7.4: Results of heart rate estimations. Mean success rates and error distances (including standard deviations) of all participants for the heart rate estimation task. Subfigure (a) shows the percentage of how often a subject correctly estimated her/his heart rate (within an interval of ±5 bpm). Subfigure (b) shows the distance in bpm between the subject’s actual current heart rate and her/his estimation based on the respective visualization. Reprinted with permission from [59] © 2018 IEEE.

or estimation trial are highly significant for the heart rate only, with p < 0.001 and p = 0.001, respectively. All other parameter changes are not significant. In addition to the trial changes, the plots in Figure 7.5 also show the difference between male and female subjects for each parameter.

7.2.4 User Experience

User experience was evaluated using the AttrakDiff questionnaire [351, 352]. It provides scores for the metrics pragmatic quality (PQ), hedonic quality from an identity (HQ-I) and a stimulation (HQ-S) point of view, as well as overall attractiveness (ATT). Scores can range from -3 to +3, representing the worst and best performance, respectively. The mean scores of all subjects with confidence intervals are shown in Figure 7.6.

7.3 Discussion

The most important result of this study, in light of the stress platform, is a confirmation that (1) an immersion in a virtual reality does not increase stress but even decreases it, and (2) that even an immersion in VR combined with a mild cognitive task within an experimental environment does not increase stress, and instead even relaxes people when making mistakes.


[Figure 7.5 plots: violin plots of the average relative change from baseline in four rows (HR, LF/HF, RMSSD, pNN50). Left column: biofeedback versus estimation trial. Right column: female versus male participants.]

Figure 7.5: Distribution plots of relative changes in HRV. Distribution plots of the average relative change from baseline to both trials for the HRV parameters HR (first row), P_L/P_H (second row), t_RSD (third row), and p_N50 (last row). On the left side the biofeedback and estimation trials are differentiated, on the right side female and male participants only during the estimation trial. The dashed line shows the median of all subjects in the biofeedback trial. The violin is the kernel density estimate, always plotted at least until the data point extrema.


Table 7.1: HRV parameters as a mean over all subjects. Overview of all important HRV measures (HR, RMSSD, pNN50, LF/HF) [75] for the baseline and the two trial phases in the study. The unit is given in the first column in parentheses and the uncertainty is given as a 95 %-confidence interval. The columns d_B give the relative change compared to the baseline measurement, calculated as the average of the subject-specific change percentages (d_iB). Values > 1 signify an increase, values < 1 signify a decrease of that statistic.

Parameter     Baseline       Biofeedback Trial   d_B            Estimation Trial   d_B
HR (bpm)      74.4 ± 5.8     70.6 ± 1.6          0.95 ± 0.02    70.4 ± 2.0         0.95 ± 0.03
RMSSD (ms)    38.8 ± 9.6     37.4 ± 10.3         0.98 ± 0.11    45.3 ± 10.8        1.20 ± 0.21
pNN50 (%)     18.5 ± 7.4     16.4 ± 9.3          0.93 ± 0.33    23.3 ± 9.3         1.42 ± 0.55
LF/HF         3.08 ± 1.36    2.42 ± 0.83         0.85 ± 0.12    2.41 ± 0.86        0.99 ± 0.24
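
The d_B columns of Table 7.1 are described as the average of subject-specific changes relative to baseline. A minimal Python sketch of such a computation is given below. It is not the analysis code of this work: the function name is hypothetical, and the confidence interval uses a normal approximation with a critical value of 1.96 as an assumption, because the text does not state how the 95 %-confidence intervals were derived.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence, Tuple


def relative_change(baseline: Sequence[float], trial: Sequence[float],
                    crit: float = 1.96) -> Tuple[float, float]:
    """Return (d_B, half-width of the confidence interval).

    baseline and trial hold one value per subject in the same order.
    d_B is the mean of the per-subject ratios trial / baseline.
    """
    ratios = [t / b for b, t in zip(baseline, trial)]  # d_iB per subject
    d_b = mean(ratios)
    # Assumption: normal approximation; crit = 1.96 for a 95 % interval.
    half_width = crit * stdev(ratios) / sqrt(len(ratios))
    return d_b, half_width
```

Because d_B averages the per-subject ratios, it does not have to match the ratio of the group means listed in the same table row.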

The estimation trial was set up in a way that participants got feedback about the correctness of their answers immediately. When looking at the t_RSD and p_N50 values in Figure 7.5, they are on average above the 1.0-line and thus indicate a more relaxed state of the participants. This is supported by a reduction of heart rate by more than 5 % on average. Only the P_L/P_H shows a slight increase, which could indicate mild stress, but this is only due to its large variation. The median itself, again, is not changing. This result is quite important and should be kept in mind for the next chapters about the stressor implementation in VR, which follows two paradigms that were shown here not to negatively affect the stress states of the users by themselves. These are the immersion into VR inside an experiment where users are aware of the fact that they are tracked (Hawthorne effect) and their biosignals recorded, and the requirement to perform a task inside the virtual environment which is evaluated and reacts to errors/mistakes made by the user. Looking closer at the P_L/P_H, and also, less pronounced, at the other parameters, there is a clear difference between the biofeedback and the estimation task. Furthermore, there seems to be a clear difference between male and female participants with respect to the t_RSD and p_N50, in particular. It is important to note that the large variation in those two metrics for women (left violin plots in the right column of Figure 7.5) is caused by isolated heavy outliers, with the median settled around 1.1. This slightly contradicts the generalized observations made in the first data recording (chapter 4) about differences between men


[Figure 7.6 plots: AttrakDiff subscores in four panels (PQ, HQ-I, HQ-S, ATT), each showing ReferenceVis, RadialVis, ScreenPulse, and CubeGrid.]

Figure 7.6: Results of the AttrakDiff questionnaire. Results for each of the visualizations as mean and 95 %-confidence interval of all participants. It shows the pragmatic quality (PQ), hedonic quality from an identity (HQ-I) and a stimulation (HQ-S) point of view, as well as overall attractiveness (ATT). The score range is from -3 (worst) to +3 (best). The CubeGrid visualization had the overall best scores, except for the PQ value, where the ReferenceVis had the highest score.

and women for the p_N50 parameter in any stress setting. The observation here instead further supports the literature, which assumes no gender difference in the p_N50 [332].

Regarding the actual modification of the HR through biofeedback tendencies in the visualizations, it can be assumed that there was no effect using this study setup. Even though some significant deviations were found in sub-phases of the biofeedback trial, those are too marginal to assume a relationship to the visualizations. Either subjects were too indifferent to the visualizations to have effective biofeedback tendencies, or the setup was just too naive to provoke any change. This observation allows the conclusion that biofeedback visualization does not have a naive effect on users. A possible explanation for the changes and fluctuations in the HRV parameters for the biofeedback trial could be that the participants had to talk during the study. They were instructed to read aloud the instructions given in the VR regarding the progression. It is known that talking modulates the HRV, so this could have had an influence.

Results of the heart rate estimation show that the ScreenPulse seems to perform best overall regarding the subjects’ ability to assess their own heart rate. Even though the CubeGrid visualization was only a basic approach of simulating an adaptive environment, in this case the


ground plane/geometry, it yielded the most correct HR estimations of all three non-reference visualizations, however with a much larger variation than the ScreenPulse. The CubeGrid certainly was the most liked one in general when looking at the AttrakDiff scores. Whether this was due to its high effectiveness or due to the highest animation fidelity remains inconclusive from these results. It likely is a combination of both. Certainly it is the most attractive one, with the ScreenPulse being the least attractive (Figure 7.6). Since it allowed the heart rate to be assessed most effectively as well, it can be hypothesized that a direct modification of the actual virtual environment does not break the presence in a virtual reality and thus is also the most immersive method of visualizing biosignals. Virtual environments are considered an evolution of the narrative medium, therefore modulating biosignals into the environment is a diegetic way of information propagation to the user [353]. The RadialVis is a completely non-diegetic and the ScreenPulse visualization only a semi-diegetic way, which could also explain their inferiority with respect to presence, attractiveness and effect. Yet it also needs to be stressed that both of these did not make use of the advantages of the three-dimensional virtual reality, which leaves the question whether this already influenced the outcome of the experiment. However, the goal of the study was to satisfy extrema in the design space around each of the four pillars novelty, immersiveness, pragmatism, and peripheral visibility. The ScreenPulse was overall the most effective/pragmatic (best estimation results) while not being particularly liked by participants. This led to its inclusion into the Stroop Room design as a distressing error indicator. In the light of the entire thesis concept about a stressor laboratory in VR, the study delivered promising results.
It provides additional evidence that an immersion into VR alone does not significantly influence HR parameters. Even if the users are made aware of their biosignals and are aware that they are recorded for a study, the influences are marginal. This is an important outcome for the next chapters, where a stressor mechanic will be presented that actually impacts the measured HRV parameters significantly, so that it can be validly assumed that the VE and the awareness of biosignal recording as well as study participation are negligible confounders. No conclusion can be drawn, though, for the stressor impact of the visualizations themselves. As an outcome of this study, it was tested whether a biosignal visualization in the Stroop Room (next chapter) would be possible by transforming the walls or the ceiling using a wobbling-effect,


as an evolved version of the CubeGrid, in the rhythm of a user’s HR. However, it was decided to limit the number of possible confounders for the first Stroop Room experiments. Since this is a visually satisfying/pleasing animation, as indicated by the AttrakDiff evaluation (Figure 7.6) in this study, it is unknown whether it might even induce increased flow effects and thus rather reduce stress. Even though an HR evaluation of the data from this study did not reveal differences between the visualizations, this might also be related to their disconnected/stand-alone nature. The search for an answer to this implicit question could be a goal for future studies/evaluations.

Part IV

The Stroop Room – A Novel Tool for Stress Research

8 The Stroop Room

The Stroop Room essentially is a Virtual Reality-enhanced Stroop Test [43]. It extends the original cognitive demand of the Stroop Test, which is to inhibit an executive function (not reading the word but instead naming the color) [41, 276, 354]. The Stroop Room additionally requires a visual orientation, search, and selection task (OSS) and contains an optional additional pressure condition, also exploiting the presence of VR, to further try to increase the stress of participants. The optional pressure conditions are explained further below.

The goal of the Stroop Room is two-fold. First, it is intended as an easy-to-use, quick-to-setup and highly repeatable stressor for laboratory-like stress research. This is achieved by its use of a VR environment [149], implemented for modern off-the-shelf consumer VR headsets. Second, it is intended to be used to characterize a user’s stress responses, for example the stressor habituation type [90] or the recovery reaction after making mistakes, without requiring extensive laboratory effort, as for example with the TSST. The related work section demonstrated that there currently appears to be no way to provoke a similar HPA axis activation using VR systems alone. Even if the same protocols are used, the presence in the VE is still not high enough, in particular regarding the perception of the user to interact with real humans, while they are, in fact, virtual. Therefore, the decision for the Stroop Room was to build this stressor system using a cognitive stress approach and not a socially-evaluative one. This is expected to have the additional benefit that the stressor is much more likable by participants than e.g. the TSST or CPT. Results between different subject cohorts are furthermore expected to be more consistent and repeatable in the Stroop Room.

This chapter explains the basic concepts of the Stroop Room, its pressure conditions and its implementation.
The following chapter (9) then describes the validation experiment that was used to show that the Stroop Room is able to induce cognitive stress. Then, chapter 10 shows how a machine learning approach was used to develop and evaluate potential classifiers for their ability to determine stress reactivity parameters based on user movement, task performance, and error recovery in the Stroop Room. Parts of this and the next chapter have been published previously in [43] and are copied here without substantial change as long as I originally conceived the text passages. The entire code of the Stroop Room is also published open-source on GitHub [355].


Figure 8.1: The wall layout of the Stroop Room. From straight above looking down into the center of the room. Floor and ceiling are removed. In the center, a symbolic representation of the HTC Vive – a virtual reality headset – is depicted and its field of view with respect to the walls with lines extending from the center. Reprinted from [43] with permission.

The design of the Stroop Room is the result of continuous development from previous research I contributed to in a cooperative project about ways to transfer the Stroop Test into VR, which started in 2016 [57]. Important support for this project was also provided by Andrea Wonner, who wrote her Bachelor’s thesis about this topic under my supervision [356]. She helped with the initial development of the Unity project and with the recording of data in the first recording session.

8.1 Topology and Interaction Concept

The Stroop Room consists of a virtual room with six walls, a floor and a ceiling. The walls are set up in a hexagonal shape and are “painted” in a changeable color, with the user standing in the center of the room by default. The colors used for the wall target areas are red, green, blue, purple, yellow and orange, based on the colors that were considered for the original Stroop Test: red, green, blue, purple, yellow, and brown [276]. Orange was used instead of brown as it was empirically found to be slightly more distinctive in overlay text representations in VR. Using six colors makes the room symmetrical with an even number of walls. This was expected to have a positive influence on orientation for the user in the virtual world. It also allows an unambiguous definition of opposing walls, which is necessary for one of the difficulty conditions. A schematic top-down view is depicted in Figure 8.1, without floor and ceiling. Inside the room, the user has to perform a repeating VR-Stroop task, given as an instruction on a 2-D head-up display (HUD)-like overlay (Figure 8.2)


Figure 8.2: The HUD of the Stroop Room. It displays the Stroop instructions to the user. The two possible task badges are shown left and right. The first line gives the task, the second line causes the Stroop interference. Reprinted from [43] with permission.

and requiring an interaction with the room, which is to search and select the wall with the color asked for by the instruction. This orientation, search, and selection task replaces the requirement of the original Stroop Test to read the word out loud or say the color name, which is not required in the Stroop Room. A verbal interaction, with today’s VR hardware systems, would not be a diegetic interaction in the 3-D VE, which is the goal of the Stroop Room. It would rather disrupt the immersion. The Stroop Room therefore implements the card-sort paradigm and manual response modality according to MacLeod [41]. Even though the manual modality is known to cause a slightly less pronounced interference effect, it also eases the determination of response and reaction times, of respiratory patterns via the ECG, as well as the implementation, the first two of which were considered to be more important for the following development of a Stroop Room stress classifier. Since the Stroop Room can be used with a Bluetooth-enabled ECG recording device, in this prototypical implementation demonstrated with the Everywhere-ECG (see chapter 4), an HRV-based evaluation of the breathing pattern through the respiratory sinus arrhythmia (RSA, see 2.2.4) is possible. The RSA is severely distorted during speaking and therefore a manual response modality is more favorable for the Stroop Room.

The Stroop Stimulus

The basic goal of every Stroop Room task is to select the wall painted in the correct color C according to an instruction ι and under the influence of a simple or complex pressure condition P. A Stroop Room stimulus is the appearance of a new task instruction ι.
The task thus refers to the act of recognizing the stimulus, reading it, deciding what color to select, searching for the correct color C in the Stroop Room and selecting it using an interaction method.


Figure 8.3: View of the user during a trial in the Stroop Room. A color-task (T_c) with an incongruent stimulus is displayed. A dark floor and ceiling texture improves visual perception of the colored walls.

In contrast to the original Stroop stimulus, the Stroop Room stimulus (instruction) contains two lines. The first line determines the task T and comprises either the word WORD, for the word-task T_w, or the word COLOR, for the color-task T_c (Figure 8.2). Depending on T, the user has to either read the Stroop stimulus in the second line (T_w) or focus on its font color (T_c) to select the correct wall. As an example, if the Stroop stimulus in the second line is GREEN colored red, the word-task T_w would tell the user to select the green wall, whereas the color-task T_c would tell him/her to select the red wall. Without this change, and additional cognitive requirement, the classical Stroop task would not have been feasible in this setup, as users could easily focus with their peripheral vision only on the color of the pixels in the second line and match the color to one of the walls, without even having to try to inhibit the task of reading the word.

The user only has a limited amount of time t_S to perform the task. In the HUD area below the task badge, an inverse progress bar and a text field display the time left to perform the current task in a resolution of 0.01 s. Figure 8.3 shows what the user sees through the VR headset while performing a trial in the Stroop Room. The awareness of the time limit and of the additional visual search task, at the time the user receives the instruction, is expected to increase stress. As soon as the user selects a color on a wall, or the time t_S runs out, the next Stroop stimulus/instruction is displayed immediately.
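
How the two-line instruction determines the correct wall can be illustrated with a small Python sketch. The published Stroop Room is a Unity project; the code below is not taken from it, and all type and attribute names are hypothetical. Only the six wall colors and the meaning of the WORD and COLOR tasks come from the text.

```python
from dataclasses import dataclass
from enum import Enum

# Wall colors of the Stroop Room (from the text).
COLORS = ("red", "green", "blue", "purple", "yellow", "orange")


class Task(Enum):
    WORD = "WORD"    # word-task T_w: the meaning of the word names the target
    COLOR = "COLOR"  # color-task T_c: the font color of the word names the target


@dataclass(frozen=True)
class Stimulus:
    task: Task       # first HUD line
    word: str        # text of the second HUD line
    font_color: str  # color in which the second line is rendered

    def __post_init__(self) -> None:
        if self.word not in COLORS or self.font_color not in COLORS:
            raise ValueError("word and font_color must be wall colors")

    @property
    def congruent(self) -> bool:
        """True if word and font color agree, so no interference occurs."""
        return self.word == self.font_color

    @property
    def target_wall(self) -> str:
        """Color of the wall the user has to select."""
        return self.word if self.task is Task.WORD else self.font_color
```

For the example from the text, `Stimulus(Task.WORD, "green", "red").target_wall` yields the green wall, while the same second line under `Task.COLOR` yields the red wall.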


Mistakes A mistake is defined as either the selection of the wrong colored wall, the ceiling, the floor, or no selection within time 푡푆. In one sub-condition (Room C, see next section), selecting the wall instead of the colored target area is also considered a mistake. Inspired by video games, every time the user makes a mistake, his/her viewport in the VR is flashed with a red color and a short, alarm-bell-like sound is played. This feedback method is adapted from the “hurt effect” of first-person video games and has been evaluated and used in comparable implementations in previous research [59, 285]. The effect startles users and brings mistakes clearly to their conscious attention.

8.2 Trials

The Stroop Room has different trials and pressure conditions. These allow an assessment of various “performance” parameters, in addition to the ones already defined by the classical Stroop Test. All of these elements are explained in this section. There are two possible trials, a congruent one and an incongruent one with respect to the word/color instruction displayed. Each consists of 푁푃 stimuli for practicing, followed by 푁푇 Stroop stimuli/tasks that form the actual trial. Practicing allowed participants to adapt to the task and to understand how the instructions work. This design was adopted from [57, 276] to be able to compare experimental results directly. The colors and tasks are semi-randomized for each trial. At the beginning of the experiment, 푁푃 and 푁푇 stimuli are generated for both trials by permuting all possible color and task combinations in such a way that an equal number of each combination is present; their order of appearance is then randomized.

8.3 Pressure Conditions

Four different “difficulty” or pressure conditions are available in the Stroop Room. This allows an evaluation of whether a combination of the Stroop Room stimulus with another psychological pressure factor has a multiplicative effect.

Normal condition (0, 푅0) This condition displays the task and gives the user a constant amount of 푡푆 = 5 s time for completing it, which was empirically found to be a well-doable timeframe and leaves room for increasing the time difficulty. The walls of the room are by default 10 m in width and 3.5 m in height and have a distance of


Figure 8.4: A participant inside the Stroop Room. She experiences difficulty condition 푅퐶, where selectable color patches on the walls shrink in size. This is a photomontage to provide a semi-realistic representation of how a subject is immersed in the VE of the Stroop Room.

푟푤 = 8.5 m to the center of the room. The wall height was defined in such a way that the ceiling height in the Stroop Room feels similar to the height of a normal industrial room. The distance to the walls and their width were selected so that a user inside the Stroop Room, wearing the headset originally employed for development, always had at least two walls in the horizontal field of view to ease orientation. This whole condition constitutes the base challenge of the Stroop Room.

Time-pressure condition (A, 푅퐴) The time-pressure condition reduces the amount of time the user has to complete the task from 푡푆 = 5 s to 푡푆 = 3 s in steps of 0.5 s linearly over the course of each trial.

Environmental pressure condition (B, 푅퐵) The environmental pressure condition reduces the distance of the walls to the center of the room in five steps over the course of each trial from 8.5 m (factor 1.0) to 1.7 m (factor 0.2).

Selection difficulty condition (C, 푅퐶) In this condition, the size of the selectable and visible color texture on the walls of the Stroop Room is linearly reduced. Over the course of each trial, the selectable color patch shrinks in size (width × height) to (7.9 m × 2.8 m); (5.8 m × 2.1 m); (3.6 m × 1.4 m); (1.5 m × 0.8 m). The increased difficulty (shrinking patch-size) can be seen in Figure 8.5. The smallest color patch will also be shifted around on the wall. For each new stimulus, all the patches always appear on the opposite wall.
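The staged difficulty of the four rooms described above can be summarized in a short Python sketch. It is only an illustration under stated assumptions, not the Unity implementation: the text gives the end points and step sizes, but not how stages map to stimuli, so the sketch assumes that a trial is split into five equally long blocks. The evenly spaced intermediate wall factors and the full wall (10 m × 3.5 m) as the first patch size in Room C are likewise assumptions, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Values quoted in the condition descriptions.
BASE_TIME_LIMIT_S = 5.0
BASE_WALL_DISTANCE_M = 8.5
FULL_WALL_SIZE_M = (10.0, 3.5)  # wall width x height

# Room A: 5 s down to 3 s in steps of 0.5 s.
TIME_LIMITS_S = [5.0, 4.5, 4.0, 3.5, 3.0]
# Room B: factor 1.0 down to 0.2. ASSUMPTION: evenly spaced
# intermediate factors; only the end points are stated.
WALL_FACTORS = [1.0, 0.8, 0.6, 0.4, 0.2]
# Room C: ASSUMPTION: the first stage uses the full wall; the four
# smaller sizes are the ones listed in the text.
PATCH_SIZES_M = [FULL_WALL_SIZE_M, (7.9, 2.8), (5.8, 2.1), (3.6, 1.4), (1.5, 0.8)]


@dataclass(frozen=True)
class StageParameters:
    time_limit_s: float
    wall_distance_m: float
    patch_size_m: Tuple[float, float]


def stage_index(stimulus_index: int, n_stimuli: int, n_stages: int = 5) -> int:
    """Map a 0-based stimulus index to a stage.

    ASSUMPTION: the trial is split into equally sized consecutive blocks.
    """
    if not 0 <= stimulus_index < n_stimuli:
        raise ValueError("stimulus_index out of range")
    return stimulus_index * n_stages // n_stimuli


def stage_parameters(room: str, stimulus_index: int, n_stimuli: int) -> StageParameters:
    """Return the parameters for one stimulus in room '0', 'A', 'B', or 'C'."""
    if room not in ("0", "A", "B", "C"):
        raise ValueError(f"unknown room: {room!r}")
    stage = stage_index(stimulus_index, n_stimuli)
    factor = WALL_FACTORS[stage] if room == "B" else 1.0
    return StageParameters(
        time_limit_s=TIME_LIMITS_S[stage] if room == "A" else BASE_TIME_LIMIT_S,
        wall_distance_m=round(BASE_WALL_DISTANCE_M * factor, 2),
        patch_size_m=PATCH_SIZES_M[stage] if room == "C" else FULL_WALL_SIZE_M,
    )
```

Each room varies only its own parameter, so Room 0 returns the base values for every stimulus, which mirrors its role as the base challenge.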


Figure 8.5: Shrinking target areas in the Stroop Room. In the selection difficulty condition (푅퐶), the selectable color patch on the walls decreases in size over the course of the trial. In the most difficult stage at the end (bottom right image), the patch also shifts around randomly on the wall.

All possible conditions/rooms are shown in Figure 8.6. The additional stressor effects are: time pressure (푅퐴) up to a point where it becomes almost unfeasible to perform the task; claustrophobic distress in 푅퐵; and, in 푅퐶, an additional attention task that increases the difficulty of the search and selection interaction. The assumption is that the additional stressors will still allow the baseline stress of the Stroop task and visual search in the virtual space to be assessed by averaging physiological arousal across all pressure conditions.

8.4 Parameters

All parameters that can be assessed as descriptors of human performance in the current Stroop Room implementation are explained in the following. This subsection can be used to understand and interpret the results. This set of parameters can be determined in any Stroop Room implementation, independent of whether validating biosignal sensors are attached to the participant, as was the case for the validation study (see next chapter). Several of these parameters have also been used in previous Stroop Test research [42].


Figure 8.6: The different conditions in the Stroop Room. Room 0 (푅0) is the default room; the other rooms have an additional difficulty or stressor: Room A (푅퐴) has time pressure; Room B (푅퐵) has claustrophobic distress; Room C (푅퐶) has an attention task. Reprinted from [43] with permission.

Task completion time (푡푇) defined as the time between receiving the instruction and selecting a wall. If no selection is made for any reason until the time 푡푆 runs out, the variable 푡푇 is not defined.

Task completion time change (푑푡푇) defined as the relative change (increase) in 푡푇 from congruent to incongruent phase.

Seconds per hundred tasks ($t_T^{100}$) defines a task-normalized variation of the task completion time with $t_T^{100} = \sum_{n=1}^{100} t_T(n)$, where $t_T(n)$ is the task completion time of the $n$-th successively performed Stroop Room task by the same subject. This parameter is adopted from the original Stroop Test publication to allow standardized direct comparisons [276].

Reaction time (푡푅) defined as the time between receiving the instruction and the start of a threshold-defined significant body movement (of either the hand or the head), indicating the completion of cognitive processing and the desire to select the color. This differs from [43], where 푡푇 was defined as reaction time, which is not entirely correct. Using 푡푇 or 푡푅 does not make a difference when evaluating the delta/relative change to the baseline measurement; however, to avoid confusion in future work, a clear distinction is defined here. In the classical Stroop Test, the reaction time is the same as the task completion time, but the orientation, search, and selection task in the Stroop Room contributes a much larger share of the task completion time than in classical tests, where the user just names the correct color or selects one of four buttons in the direct field of view.

Orientation, search, and selection time (푡푂푆푆) defined as the time between having decided on an answer and finishing the selection of a wall. Thus, task completion time can be defined as 푡푇 = 푡푅 + 푡푂푆푆.

Mistakes (푛퐸) the number of Stroop task mistakes/errors, e.g. selecting the wrong color, the floor or ceiling, the wall base, or not selecting anything at all (as defined in 8.1).

8.5 Implementation

The Stroop Room was implemented in Unity 3D employing the SteamVR library¹. As VR hardware, the original HTC Vive² and later also the HTC Vive Pro Eye³ with the Wireless Adapter (Figure 2.2) were used, each with room-scale tracking. For interaction, the user points a virtual laser pointer from one of the HTC Vive controllers onto one of the walls and selects it by pressing the trigger button⁴; see also Figure 8.4 for an impression of how the subjects were immersed. Randomization of stimuli was performed with Unity’s built-in random number generator, seeded with the subject ID to achieve repeatability of the experiment. Dark colors and textures were used in the room model (ceiling, floor, wall bases) to provide a high color contrast to the selectable color patches on the wall [357]. Parameters for each shown Stroop stimulus were logged with UTC timestamps at millisecond precision into CSV log files for later data evaluation and synchronization.
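The performance parameters of Section 8.4 follow directly from per-task timestamps such as those written to the log files. The Python sketch below illustrates the definitions only. The record structure and field names are hypothetical and do not reflect the actual CSV layout. Two further assumptions are made where the text leaves details open: tasks without a defined 푡푇 are skipped when summing over one hundred tasks, and 푑푡푇 is computed as the ratio of the mean 푡푇 in the incongruent phase to that in the congruent phase.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass(frozen=True)
class TaskRecord:
    """One logged task; field names are hypothetical, times are in seconds."""
    t_instruction: float          # instruction displayed
    t_movement: Optional[float]   # onset of threshold-defined body movement
    t_selection: Optional[float]  # wall selected; None if the time ran out
    mistake: bool


def completion_time(r: TaskRecord) -> Optional[float]:
    """t_T: instruction to selection; undefined (None) without a selection."""
    return None if r.t_selection is None else r.t_selection - r.t_instruction


def reaction_time(r: TaskRecord) -> Optional[float]:
    """t_R: instruction to onset of significant body movement."""
    return None if r.t_movement is None else r.t_movement - r.t_instruction


def oss_time(r: TaskRecord) -> Optional[float]:
    """t_OSS: movement onset to finished selection, so t_T = t_R + t_OSS."""
    if r.t_movement is None or r.t_selection is None:
        return None
    return r.t_selection - r.t_movement


def seconds_per_hundred(records: List[TaskRecord]) -> Optional[float]:
    """t_T^100: sum of t_T over 100 successive tasks.

    ASSUMPTION: tasks without a defined t_T are skipped; the result is
    None if fewer than 100 defined values are available.
    """
    times = [t for t in map(completion_time, records) if t is not None]
    return sum(times[:100]) if len(times) >= 100 else None


def completion_time_change(congruent: List[TaskRecord],
                           incongruent: List[TaskRecord]) -> float:
    """dt_T for one subject (ASSUMPTION: ratio of the per-phase mean t_T)."""
    def mean_t(records: List[TaskRecord]) -> float:
        return mean(t for t in map(completion_time, records) if t is not None)
    return mean_t(incongruent) / mean_t(congruent)


def mistake_count(records: List[TaskRecord]) -> int:
    """n_E: number of tasks flagged as mistakes."""
    return sum(r.mistake for r in records)
```

Returning `None` for undefined values keeps timed-out tasks visible as mistakes without letting them distort the timing statistics.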

¹ https://github.com/ValveSoftware/steamvr_unity_plugin (visited on 2019-07-11).
² https://www.vive.com/eu/product/#vive-spec (visited on 2019-07-11).
³ https://www.vive.com/eu/product/vive-pro-eye/ (visited on 2020-01-25).
⁴ https://www.vive.com/eu/support/vive/category_howto/about-the-controllers.html (visited on 2019-07-11).


9 Validation of the Stroop Room

In order to assess the amount of cognitive stress induced by the Stroop Room, and to demonstrate the feasibility of its concept, a validation study was conducted. Reference stress-relevant gold standard parameters from biosignals were determined while participants experienced the Stroop Room. The validation experiment was conducted in two separate recording sessions, each over a period of about two to three months. The first one (subjects 1-32) was done in December 2018 and January 2019. The second one was done from October 2019 through December 2019. The time in between was used for preliminary data evaluation and iteration on the Stroop Room concept and assessment strategy. Since many students were used as subjects, this also allowed a comparable environmental setting for subjects in both data recording sessions, e.g. absence of stress from exam preparations. This chapter describes the study setup and its evaluation. Some content has been published previously in [43]. The results and discussion sections contain significantly extended content. In the published paper, only the data of the 32 subjects from the first session were evaluated. For this thesis, these data were extended to include another 39 subjects from the second session. Thus, any results shown here were either recomputed with a maximum of 71 participants or constitute the outcome of new and significantly more detailed evaluations.

9.1 Participants

In the validation study, 71 participants took part (31 male and 40 female). Their mean age was (25.1 ± 6.1) years and they had a mean body mass index (BMI) of (22.3 ± 2.6) kg m−2. Most participants were native German speakers. One was a native English speaker and one a native Dutch speaker. Following [41], Stroop stimuli were always presented in the subject’s native language to increase the robustness of the interference effect. The questionnaires were either in German or in English, depending on the language skills of the participant.
Participants gave their informed consent to take part in the study and have their biosignals and saliva samples recorded. All elements of the study were approved by the local ethics committee of the university. Participants were recruited among the students and workforce of the Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU) and, for development of a stress reactivity classifier, participants who took part in


at least one TSST conducted by the Chair of Health Psychology of the FAU were explicitly contacted. These included participants from, e.g., [358]. For these, stress reactivity curves based on their cortisol reaction were available for one or two days, providing a stress habituation label for the latter cases. Members of this subgroup (having at least one TSST profile) are identified in this work as being part of the subset 퐺Tst ⊂ 퐺, with 퐺 being the set of all participants. Subjects with a two-day habituation profile are part of the subset 퐺Hab; this group contained a total of 23 participants, eleven female and twelve male. Subjects in the first data recording session are part of 퐺1 and those in the second one are part of 퐺2. A unique subject ID was assigned to all participants in 퐺, and identifying records were deleted afterwards. Exclusion criteria were color-blindness, known simulator sickness or balance problems, and any known acute illness at the time of participation. All participants were instructed neither to drink alcohol or caffeinated beverages nor to smoke for 12 hours before taking part in the study. 45 % of the subjects wore vision aids (four wore contact lenses, 28 wore glasses). 36 of the participants (51 %) had previous experience with VR. Twelve participants (17 %) had previous experience with the classical Stroop Test (SCWT). The majority of subjects were physically active; 48 (68 %) said they do sports more than twice a week. Of the female participants, 20 (50 %) used contraceptives; of these, 15 (38 % of all female participants) employed hormone-based methods, which might have influenced cortisol measurements. Five participants were regular smokers, six casual smokers.

9.2 Questionnaires

For comparison with other Stroop Test-related studies and previous work, and to provide a qualitative assessment of stress and engagement parameters as well as possible confounders, an array of standard questionnaires was employed.
Participants were asked general questions about their age, weight, medications, smoking habits, sports, and previous experiences with VR, the Stroop Test, and playing computer games in a demographic questionnaire (D) at the beginning of the test. Female participants were also asked about their menstrual cycle and possible use of hormone-based contraceptives in order to correlate this information with outliers in the cortisol and amylase evaluations. The Short Stress State Questionnaire [359] was filled out twice: before the Stroop Room experience at the beginning of the test (SSSQ0) and after taking part in the experience (SSSQ1). It


was used to assess the dimensions worry, distress, and engagement on a 5-point Likert scale. Furthermore, the Flow State Scale (FSS) was used to evaluate the general experience and flow feelings associated with the Stroop Room participation [360]. On the one hand, this allows an estimation of whether the stress in the Stroop Room might be attributed to positive feelings, and thus represents eustress; on the other hand, it provides an additional assessment tool for the likability, and thus acceptance, of such an action-based experimental tool. The FSS contains 36 items, which represent nine dimensions of flow-related experiential states. Each is rated on a 5-point Likert scale. The Perceived Stress Scale (PSS) was used to control for confounding by recently experienced stressors [361]. In 10 items on a 5-point Likert scale, the participants indicated the extent to which life was perceived as unpredictable, uncontrollable, and overburdened in the last month. Possible depressive symptoms were examined using the RRS short form on a 4-point Likert scale with the dimensions self-reflection and brooding [362, 363]. Furthermore, the (translated) 20-item long version of the German Depression Scale (ADS-L), rated on a 4-point Likert scale, was used [364]. In order to have yet another rating of the subjects’ stress level, a visual analogue scale (VAS) was used at the very end, which required subjects to rate the stress of the experience on a continuous scale.

9.3 Measures of Physiology

In order to assess the amount of induced stress and to be able to compare it with other Stroop Test variations found in the literature, the electrocardiogram (ECG), the electrodermal activity (EDA), and saliva samples for the HPA axis stress response were recorded for each subject.
In addition to the Everywhere-ECG, the Nexus Kit 10 MKI (Mind Media, Herten, Netherlands) and its BioTrace+ 2018 software were used to continuously record in real time a Lead II ECG, using three foam-gel Ag/AgCl electrodes attached to the chest skin surface, and EDA from the index and middle finger mid-phalanges of the non-dominant hand during the entire experiment (Figure 9.1). Markers were manually set inside the BioTrace+ software by the experimenter to designate the beginning and end of each of the individual phases. The raw signal data and all associated markers were extracted into CSV files and used for later processing. The ECG was recorded with a sampling rate of 1024 Hz, the EDA with 256 Hz. Using the ECG, the 푅-peak positions were continuously determined and the HR calculated. This was done with the Everywhere-ECG for feasibility testing with two subjects only. Since recording of the EDA was required, which could only be done with the Nexus Kit, it made no sense to use two different ECG recording devices in parallel for the bulk data acquisition of the participants.

Figure 9.1: Devices used to record the biosignals in the Stroop Room. The electrocardiogram is recorded in Einthoven’s Lead II configuration with three electrodes and electrodermal activity from the index and middle finger mid-phalanges of the non-dominant hand. Reprinted from [43] with permission.

Saliva samples were taken at five points during the experiment (see section 9.4) using Salivette collection devices (Sarstedt, Nümbrecht, Germany). After collection of all samples, they were stored at −18 °C for later analyses. In a laboratory, samples were centrifuged at 2000 g and 20 °C for five minutes. Salivary cortisol and alpha-amylase concentrations were determined in duplicate using a chemiluminescence immunoassay (CLIA, IBL, Hamburg, Germany) and an enzyme kinetic method, respectively, as described previously [358, 365]. If the coefficient of variation between duplicate measures was above 10 %, that sample was not used for result evaluation, as it had a non-negligible chance of being a measurement error. This affected 20 % of the cortisol and 18 % of the 훼-amylase samples.

9.4 Procedure

The evaluation process for each participant was separated into several steps/phases as depicted in Figure 9.2. For congruent and incongruent trials, 푁푃 was set to 15 and 푁푇 was set to 120. These numbers are similar to previous Stroop Test experiments [57, 285].
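With 푁푃 = 15 and 푁푇 = 120, the semi-randomized stimulus generation described in Section 8.2 can be sketched in Python as follows. This is a minimal illustration, not the thesis implementation: Python's `random` module stands in for Unity's random number generator, the four color names are assumed for illustration, and the handling of counts that are not a multiple of the number of combinations (a random subset fills the remainder) is an assumption. All function and type names are hypothetical.

```python
import random
from itertools import product
from typing import List, NamedTuple, Tuple

COLORS = ["red", "green", "blue", "yellow"]  # ASSUMPTION: illustrative wall colors
TASKS = ["WORD", "COLOR"]                    # first line of the instruction


class Stimulus(NamedTuple):
    task: str        # "WORD" or "COLOR"
    word: str        # color word shown in the second line
    font_color: str  # font color of that word

    @property
    def target(self) -> str:
        """Wall color the user has to select."""
        return self.word if self.task == "WORD" else self.font_color


def all_combinations(congruent: bool) -> List[Stimulus]:
    """All task/word/font-color combinations for one trial type."""
    return [
        Stimulus(task, word, color)
        for task, word, color in product(TASKS, COLORS, COLORS)
        if (word == color) == congruent
    ]


def generate_stimuli(n: int, congruent: bool, rng: random.Random) -> List[Stimulus]:
    """Equal number of each combination, then a randomized order.

    ASSUMPTION: if n is not a multiple of the number of combinations,
    the remainder is filled with a random subset without repetition.
    """
    combos = all_combinations(congruent)
    full_sets, remainder = divmod(n, len(combos))
    stimuli = combos * full_sets + rng.sample(combos, remainder)
    rng.shuffle(stimuli)
    return stimuli


def generate_trial(subject_id: int, congruent: bool, n_practice: int = 15,
                   n_trial: int = 120) -> Tuple[List[Stimulus], List[Stimulus]]:
    """Practice and trial stimuli, reproducible per subject ID and trial type."""
    rng = random.Random(2 * subject_id + int(congruent))
    return (generate_stimuli(n_practice, congruent, rng),
            generate_stimuli(n_trial, congruent, rng))
```

Under these assumptions, 120 incongruent stimuli contain each of the 24 combinations exactly five times, and 120 congruent stimuli contain each of the 8 combinations exactly 15 times.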


[Figure 9.2 (flowchart): Informed Consent; Questionnaires D + SSSQ0; Biomonitor & VR Headset Setup & Calibration; Resting Baseline (ECG + EDA, 10 min); Congruent Trial, 120 (+15) stimuli; Resting Phase (5 min); Incongruent Trial, 120 (+15) stimuli; Relax Phase & Questionnaires. Saliva sampling points S0 to S4.]

Figure 9.2: The procedure used for the evaluation study. S0-S4 indicate the points in time where the five saliva samples were collected. D stands for the demographic questionnaire. SSSQ0 is the pre-test short stress state questionnaire. The sequence of congruent and incongruent trials was randomized for the second recording session. During the Relax Phase, subjects filled out the SSSQ1, the FSS, and the VAS. Reprinted from [43] with permission.

Each subject was assigned one of the four difficulty conditions (room 0, A, B, or C) at the beginning of the test. These assignments were permuted at the beginning of the study to have an equal distribution across participants and time (18 per room).

In phase 1 (introduction), the participants were introduced to the study and its goals, and the procedure and measurements were explained to them; they gave their informed consent and were asked to fill out the demographic questionnaire (D), the PSS, RRS, ADS-L, and the SSSQ0 while the first saliva sample (S0) was taken; afterwards, the biomonitoring equipment was attached and calibrated, and they were equipped with the VR headset and the controller.

Phase 2 (resting baseline) lasted for 10 minutes, during which the participants sat quietly on a chair and were immersed in the default SteamVR mountain-top virtual environment to keep them as calm as possible. One minute into the resting baseline phase, another saliva sample was taken (S1).

Phase 3 comprised the congruent trial or the incongruent trial of the Stroop Room; first, subjects were presented with the 푁푃 congruent/incongruent practice stimuli and afterwards the 푁푇 actual trial stimuli.

During phase 4 (resting phase), subjects had to sit down again on a chair, still immersed in the virtual environment (Stroop Room), where they watched a timer counting down five minutes. They had this time to relax from the first trial. One minute into the resting phase, another saliva sample was taken (S2).

Phase 5 comprised the incongruent trial or the congruent trial of the Stroop Room; again, they were first presented with the 푁푃 incongruent/congruent practice stimuli and afterwards the 푁푇 trial stimuli. Directly after finishing the last trial stimulus, a saliva sample was taken (S3).

In phase 6 (relaxation phase), the subjects could take off the VR headset, sat down at a desk, and could relax while filling out the remaining questionnaires (SSSQ1, FSS, VAS); this took around 10 min; at the end of the phase, the last saliva sample was taken (S4).

During the first recording session, the incongruent trial was always done after the congruent trial, as it was expected that higher stress levels would be observed during the incongruent one. Since peak cortisol levels for stress exposure are reached around 15 min after the stressor [263, 358, 365], it was decided not to randomize the sequence of trials in the first session, but to keep it the same in favor of the cortisol evaluation. In this way, the S4 sample could always be collected at +15 min to +20 min after starting the incongruent trial. For the second recording session, this sequence was semi-randomized while trying to collect somewhat more incongruent-first conditions, in order to enable more balanced data comparisons. In total, 17 of the subjects experienced the incongruent trial first. This was considered a minimal amount to allow balanced evaluations, i.e. comparing the 17 incongruent-first subjects with 17 randomly selected congruent-first subjects.

During phases 3 and 5, the Stroop Room software implementation recorded extensive information with detailed associated UTC timestamps about the performance and actions of the subject within the VE. This included head movement, gaze direction, hand (controller) movement, button presses, selected objects in the VE, sequence of Stroop items, and timings between actions.
All this information was stored in two separate additional subject-associated CSV-based log files on the data recording computer (Performance Log).

9.5 Signal Processing

The 푅-peaks detected in real time by the BioTrace+ software were used to determine inter-beat intervals for all records, which were then used with the PhysioNet HRV Toolkit to calculate time-domain and frequency-domain HRV features [130]. The NN/RR measure (percentage of RR-intervals classified as normal), supplied by the toolkit, was used to determine


reliability of the automated assessment. Evaluated sections with less than 65 % NN/RR were discarded. The HRV measures 푇푅푅, heart rate (HR; calculated from 푇푅푅 and inversely related to it), 푃퐿, 푃퐻, 푡푅푆퐷, and 푝푁50 were used for evaluation according to [75]. From the EDA, the skin conductance level (SCL) and the frequency of non-specific skin conductance responses (NS-SCRs) were extracted for each phase of the experiment according to [91, 289, 366]. For those measures, the ratio of change between the baseline phase (2) and each of the congruent (3), incongruent (5), and relax (6) phases was calculated. Then, the median, mean, confidence interval, and significance of the difference were determined for all subjects and for subject groups in the same difficulty condition. This procedure was repeated for the EDA analysis, except that, to calculate the mean SCL for trials across all subjects, values were log-normalized, but not the individual (subject 푖) change percentages 푑iB, according to [289]. Significance levels were determined by Welch’s t-test after testing for normal distribution with the Shapiro-Wilk test. For non-normal distributions, significance was determined by Wilcoxon’s signed-rank test. According to previous literature, the most reliable determinants for the stress level are the HR, 푃퐿/푃퐻, 푡푅푆퐷, and 푝푁50 [131], SCL, and NS-SCRs. Therefore, these parameters were used to compare this stressor to other work where possible. In order to make the results more comparable within this study design, parameters for the baseline and relax phases were calculated from a centered section set to the same length as the shorter one of the two trial phases (usually the congruent one). For various analyses, up to four subjects had to be partially excluded from ECG and EDA analysis due to errors in the recorded signals. One subject was excluded from analysis because of exceptional outliers in the HRV metrics (differing by a factor of 10 to 20 from the mean value of all other subjects).
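To make the time-domain measures and the relative change concrete, the following Python sketch computes the mean HR, RMSSD, and pNN50 from a list of inter-beat intervals, and the mean of the subject-specific change ratios. It uses the common textbook definitions and is not the PhysioNet HRV Toolkit that was used for the actual evaluation. The function names are hypothetical, and the mapping of RMSSD and pNN50 to the symbols 푡푅푆퐷 and 푝푁50 is assumed from Table 9.1.

```python
import math
from statistics import mean
from typing import List, Sequence


def successive_differences(rr_ms: Sequence[float]) -> List[float]:
    """Differences between consecutive inter-beat intervals (ms)."""
    return [b - a for a, b in zip(rr_ms, rr_ms[1:])]


def mean_hr_bpm(rr_ms: Sequence[float]) -> float:
    """Mean heart rate derived from the mean inter-beat interval."""
    return 60000.0 / mean(rr_ms)


def rmssd_ms(rr_ms: Sequence[float]) -> float:
    """Root mean square of successive differences."""
    return math.sqrt(mean(d * d for d in successive_differences(rr_ms)))


def pnn50_percent(rr_ms: Sequence[float]) -> float:
    """Percentage of successive differences larger than 50 ms."""
    diffs = successive_differences(rr_ms)
    return 100.0 * sum(abs(d) > 50.0 for d in diffs) / len(diffs)


def relative_change(baseline: Sequence[float], phase: Sequence[float]) -> float:
    """d_B: mean of the subject-specific ratios d_iB = phase_i / baseline_i.

    Both sequences hold one value per subject, in the same subject order.
    """
    if len(baseline) != len(phase):
        raise ValueError("one baseline and one phase value per subject required")
    return mean(p / b for b, p in zip(baseline, phase))
```

Averaging individual ratios, rather than dividing the group means, gives every subject the same weight regardless of his/her absolute baseline level.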
Cortisol and amylase values were log-normalized to achieve near-normal distributions. Values of S0 were discarded as they were only recorded for subject exclusion and baseline comparisons. Possible influences of age, BMI, or depressive symptoms on cortisol and amylase values were checked using repeated-measures ANOVAs. In order to calculate the Stroop Room performance parameters (section 8.4) and further signals (e.g. HR response to errors) relevant for building the classifier (see next chapter), biosignals and Performance Log data from the Stroop Room implementation were synchronized in time using associated UTC-converted timestamps with an estimated error of ±1 s. Subjects with more than 60 mistakes (outside two standard deviations) were discarded as outliers for performance evaluations.

Table 9.1: Result parameters as a mean over all subjects. Overview of all important HRV measures (HR, RMSSD, pNN50, LF/HF) [75], the EDA changes (SCL, NS-SCRs) [91, 289], number of Stroop task errors (푛퐸), and task completion time (푡푇), between the four main phases in the evaluation study, for all difficulty conditions combined. The unit is given in the first column in parentheses, and the uncertainty is given as the 95 % confidence interval. The column 푑B gives the ratio of change compared to the baseline measurement (calculated as the average of subject-specific change percentages 푑iB); values > 1 signify an increase, values < 1 signify a decrease of that statistic. Adapted from [43] with permission.

Parameter             Baseline       Congruent      푑B     Incongruent     푑B     Relax
HR (bpm)              72.9 ± 2.5     85.9 ± 1.9     1.18   86.9 ± 2.1      1.19   71.2 ± 1.0
RMSSD (ms)            49.2 ± 4.8     36.8 ± 3.7     0.75   36.6 ± 3.5      0.73   52.1 ± 5.6
pNN50 (%)             30 ± 4         19 ± 5         0.54   17 ± 4          0.52   32 ± 5
LF/HF                 1.48 ± 0.22    1.81 ± 0.24    1.46   1.80 ± 0.17     1.47   1.02 ± 0.45
SCL (log µS)          0.80 ± 0.17    1.30 ± 0.15    1.74   1.34 ± 0.15     1.81   1.20 ± 0.15
NS-SCRs (1/min)       4.6 ± 0.9      9.7 ± 0.9      2.11   9.3 ± 0.9       2.02   4.3 ± 0.7
훼-amylase (U/ml)      96 ± 32        90 ± 30        1.29   103 ± 36        1.71   72 ± 28
cortisol (nmol l−1)   5.43 ± 1.04    5.48 ± 1.10    0.97   5.34 ± 1.13     0.94   4.79 ± 0.95
푛퐸                    -              2.25 ± 0.91    -      10.38 ± 3.34    4.61   -
푡푇 (s)                -              1.70 ± 0.11    -      2.61 ± 0.14     1.54   -

9.6 Results

The main findings in the evaluated data are compiled in Table 9.1. It shows the changes between the four main phases of the study. The measured or calculated values are given with their 95 % confidence interval. For the congruent and incongruent trials, the relative change to the baseline phase is given in a separate column (푑B). This relative change was always calculated based on the average of individual relative changes 푑iB. All changes from baseline to those two phases are highly significant (푝 ≪ 0.001) based on t-test/Wilcoxon evaluations. These are the combined statistics for all four difficulty conditions and all subjects. For changes in individual difficulty conditions, all of them from baseline to congruent and incongruent for 푅0 are highly significant


[Figure 9.3: four violin-plot panels (HR, LF/HF, RMSSD, pNN50); y-axis: average relative change; x-axis: Cong., Inco., Relax.]

Figure 9.3: The overall relative changes in the HRV metrics. Violin plots of the relative changes (푑B) in the four HRV metrics HR, 푃퐿/푃퐻, 푡푅푆퐷, and 푝푁50 [75]. The 1.0-line is emphasized to show the difference to the baseline measurement. These violin plots and all others in this work represent a default boxplot in their center and a kernel density estimation of the underlying data distribution.

(푝 < 0.005); for 푅퐴, all are also highly significant, except for 푃퐿/푃퐻 changes, which are baseline to incongruent significant (푝 = 0.0071) and baseline to incongruent not significant (푝 = 0.0538). The mean heart rate was compared across phases and subjects. The result is shown in Figure 9.4. The incongruent trial has a significantly higher heart rate than the baseline phase (푝 ≪ 0.001), as does the congruent trial (푝 ≪ 0.001). The 푝-values relative to the relax phase were not determined. The difference between congruent and incongruent trials is not significant in any measure. Between difficulty conditions (rooms 0, A, B, C), there is not much difference when comparing relative changes in mean heart rate, which are plotted in Figure 9.5. Notably, the environmental pressure condition (walls closing in) of 푅퐵 separates itself from the other three conditions with higher relative changes for congruent and incongruent trials. These differences to all other rooms are significant (e.g. 푝 = 0.0225 vs. 푅퐴). For the HRV, the low-frequency power spectrum was analyzed and compared across phases and subjects. Previous work has shown that this has the highest correlation to stress of all HRV features and increases with stress level [8, 91, 131]. To normalize power spectra, and since 푃퐻 also decreases with increased stress while 푃퐿 increases, the 푃퐿/푃퐻 value was used. The result is shown in Figure 9.3 for all conditions together and


[Figure 9.4: violin plots of the average absolute HR (bpm) per phase (Base, Cong., Inco., Relax); panels: HR Change, HR Change (Congruent First), HR Change (Incongruent First).]

Figure 9.4: Distributions of relative average changes in HR. Violin distribution plots of the absolute mean HR of subjects during the baseline phase (Base), the congruent phase (Cong.), the incongruent phase (Inco.), and the relaxation phase (Relax). Plotted for all data (top), only subjects who experienced the congruent phase first (bottom left), and those who experienced the incongruent phase first (bottom right).

in Figure 9.5 for each additional difficulty condition. The largest differences in the 푃퐿/푃퐻 parameter were observed in 푅0 for both trials. As mentioned in the Fundamentals, for better orientation, all figures that depict HRV parameters follow the same color coding scheme: 푇푅푅 or the directly related mean HR parameter use violet colors, the power spectrum parameters like 푃퐿/푃퐻 have red colors, the 푡푅푆퐷 is colored in green tones, and the 푝푁50 in orange tones. Plots are additionally separated by gender, as was done in the original Stroop paper [276]; subsequent research has also extensively shown a difference in stress depending on sex, especially in VR [269]. A separation is also made between subjects who experienced the congruent or the incongruent trial phase as their first condition. It is expected


that subjects who had to experience the more difficult condition first had no chance to learn and adapt their strategy during the easy congruent condition and might have experienced more stress. The difficulty conditions that caused by far the largest increase in 푃퐿/푃퐻, or decrease in 푝푁50, from baseline were 푅0 and 푅퐵, respectively, see Figure 9.5. The mean changes in HR could be compared to Tulen et al. [283], Renaud and Blondin [285], Delaney and Brodie [284], Parsons and Courtney [295], and Poguntke et al. [57]. The results are described as the relative change of the largest difference (either baseline to congruent or baseline to incongruent) and are shown in Figure 9.8. The results from my previous work are averaged between the two main VR conditions, ‘VR’ and ‘VR-hm’, as described in [57]. The Stroop Room induces the largest relative change in HR compared to baseline (in the incongruent trial). The relative change in the 푃퐿/푃퐻 ratio was similarly compared and plotted in Figure 9.8. However, this statistic is only available from Delaney and Brodie [284], Usui and Nishida [286], and Poguntke et al. [57]. The Stroop Room causes the highest relative increase in the 푃퐿/푃퐻 HRV statistic compared to all works except the previous work [57]. However, it should be noted that in the previous work, only 15 subjects were recorded and the mean value had a much larger standard deviation. Figure 9.8 can in general serve as a good overview of the comparisons and also shows which statistic was measured previously. The reaction time (푑푡푇 for the Stroop Room) is also plotted as a mean change to baseline in this figure.

The mean of all relative EDA changes (푑B) in SCL compared to baseline increased by 74 % for the congruent and 81 % for the incongruent trial. These changes are plotted with different evaluative views in Figure 9.9. The number of NS-SCR increased by 111 % and 102 %, respectively. All changes are significant with 푝 = 0.022, 푝 = 0.012, 푝 < 0.001, and 푝 < 0.001, respectively. Both changes are also described in Table 9.1. The distributions of average relative changes in salivary biomarkers are shown in Figure 9.10 and also in Table 9.1. The 훼-amylase (sAA) showed a large change in the mean from baseline to incongruent trial by a factor of 1.7, while the median was unchanged (1.0). For the congruent condition, the mean change factor was 1.29 while the median was 0.75. Neither result is statistically significant. The data were not normally distributed according to the Shapiro-Wilk test, and the variance in general is very large. Cortisol showed a general decrease in the average relative change factor. No interaction could be found between the measure and the trial


[Plot panels: HR, LF/HF, RMSSD, and pNN50 change from baseline to the congruent trial (left) and to the incongruent trial (right). x-axis: Room (0, A, B, C); y-axis: average relative increase in the respective parameter; legend: Median (all), Male, Female.]

Figure 9.5: Distribution plots of relative changes in HRV. Distribution plots of the average relative change from the baseline measurement to the congruent trial (left side) and the incongruent trial (right side) for the HRV parameters HR (first row), 푃퐿/푃퐻 (second row), 푡푅푆퐷 (third row), and 푝푁50 (last row). The violin plots are two-sided: the left side shows male participants only and the right side female participants. They are plotted for each room (difficulty). The dashed line shows the median of all data.


[Plot panels: Mistakes, tT, and dtT by phase (Cong., Inco.); Mistakes and dtT depending on room, on first phase (Congruent First, Incongruent First), and on gender (Male, Female). x-axes: Phase or Room (0, A, B, C); y-axes: Number of Mistakes, Time (s), Relative Increase from Congruent tT.]

Figure 9.6: Distribution plots of performance changes. Distribution plots for the number of mistakes 푛퐸, the task completion time 푡푇, and the change in task completion time 푑푡푇 between congruent and incongruent trial phases. For the bottom three mistake plots, only the incongruent phase is shown.


[Plot panels: Original Stroop Interference vs. Stroop Room. Two histograms with the conditions ’1’ and ’2’; x-axis: Seconds per Hundred Reactions; y-axis: Frequency.]

Figure 9.7: Comparison of the Stroop interference effect. Compared as a histogram with Gaussian estimated kernel density plot of 푡푇 per hundred reactions, between the original Stroop paper [276] (top) and the Stroop Room (bottom). The top plot is a faithful recreation of Ridley Stroop’s original drawing made in his paper (see Figure 3.5). ’1’ is the congruent condition/trial phase with no interference, ’2’ is the incongruent condition/trial phase with the Stroop interference clearly visible. To compare both effects, the Gaussian estimated distribution of ’1’ was matched between both plots by optimizing for the maximum overlap on the x-axis. ’Frequency’ as the y-axis description is equivalent to ’number of observations’; the description was kept as in the original paper.

or condition. The decrease in the change factor was significant for the change from baseline to the incongruent trial with 푝 = 0.019 and highly significant for the change to the relaxation phase with 푝 = 0.007.

For the evaluation of the SSSQ, the scale Engagement showed a significant time effect (푝 = 0.001). Furthermore, women reported significantly higher item values than men at the second time of measurement (post-experience) only, 푡(30) = −2.25 (푝 = 0.016). The evaluation of the Flow State Scale (FSS) revealed an overall positive experience with the simulation, with all subscale score means higher than average (scores are in the Likert range 1-5), as depicted in Figure 9.11 and Table 9.2. Most are clearly above the average except for the subscales Loss of Self-Consciousness and Autotelic Experience, which are close to the average score. The overall Flow Score (mean of all subscores) is 3.3 ± 0.22.


[Plot: horizontal bars per study (Poguntke 2019, Parsons 13+18, Usui 2017, Eilola 2011, Delaney 2000, Renaud 1997, Tulen 1989, Stroop Room) for Heart Rate, LF/HF, and Reaction Time; entries are marked n/a where a metric is unavailable. x-axis: Relative increase in Heart Rate, LF/HF, and Reaction Time (1 to 1.7).]

Figure 9.8: Comparison of Stroop Room results with other work. Relative mean increase of HR, the 푃퐿/푃퐻 power ratio of the HRV, and reaction time between baseline and the most stressful condition, compared to previous work that used Stroop stressors and for which results for at least one of these metrics are available (’n/a’ means not available) [57, 283–287, 294, 295]. The metrics for [57] were recalculated using the described evaluation pipeline. Figure adapted from [43] with new values.

Table 9.2: The results of the Flow State Scale (FSS). The sub-scores and their standard deviations (SD) are shown.

Subscale                              Score   SD
1. Challenge-Skill Balance            3.64    0.61
2. Merging of Action and Awareness    3.68    0.63
3. Clear Goals                        3.36    0.62
4. Unambiguous Feedback               3.34    0.66
5. Total Concentration                3.54    0.60
6. Sense of Control                   3.43    0.66
7. Loss of Self-Consciousness         3.13    0.71
8. Time Transformation                3.45    0.57
9. Autotelic Experience               3.11    0.70


[Plot panels: SCL Change from Baseline by phase (Cong., Inco., Relax); SCL Change (Congruent to Incongruent) by room; SCL Change (Baseline to Congruent) and SCL Change (Baseline to Incongruent) by room. x-axes: Phase or Room (0, A, B, C); y-axes: average relative SCL change or increase in SCL; legend: Median (all), Male, Female.]

Figure 9.9: EDA changes in the Stroop Room. Violin distribution plots of the absolute mean EDA SCL change of subjects from the baseline to the congruent phase (Cong.), the incongruent phase (Inco.), and the relaxation phase (Relax). Plotted for all data (top left), the change from congruent to incongruent phase for each room (top right), and separated by gender (bottom) between baseline to congruent (bottom left) and to incongruent (bottom right).

Regarding Stroop Room performance parameters, changes in 푡푇 and 푛퐸 are highly significant (푝 ≪ 0.001) in all measures. According to the Shapiro-Wilk test, the increase in the number of errors from the congruent to the incongruent phase is likely not normally distributed, while the increase in task completion time likely is. Distributions are shown in Figure 9.6 and Figure 9.7. Instead of the increase in 푡푅, most of the results in this section are given or compared as the increase in 푡푇. This is because 푡푅 could only be determined using an automated algorithm, while 푡푇 was measured on fixed interaction mechanisms in the implementation that provide highly accurate time values (the error 푡푒 in s of the time measurements is in the worst case equal to two times the inverse frame-rate


[Plot panels: Amylase Change from Baseline and Cortisol Change from Baseline. x-axis: Phase (Cong., Inco., Relax); y-axes: average relative change in amylase and in cortisol.]

Figure 9.10: 훼-amylase and cortisol changes in the Stroop Room. Distribution plots of the average relative change in 훼-amylase (left) and cortisol (right) of subjects from the baseline to the congruent phase (Cong.), the incongruent phase (Inco.), and the relaxation phase (Relax).

[Plot panels: one panel per FSS subscale (Challenge-Skill Balance, Merging of Action and Awareness, Clear Goals, Unambiguous Feedback, Total Concentration, Sense of Control, Loss of Self-Consciousness, Time Transformation, Autotelic Experience). y-axis: FSS Subscore (1 to 5); legend: Stroop Room, VR Task, 2D Screen Task, Sports.]

Figure 9.11: Results of the Flow State Scale. The nine subscales of the Flow State Scale (FSS) [360] are each represented by one box. Four different datasets are plotted for each of the subscales. With decreasing relevance to this work: the Stroop Room dataset itself (purple), an experience dataset of users in an engaging virtual environment (“VR Task”) [367], a dataset of users performing an engaging task on a 2D screen (“2D Screen Task”) [367], and lastly a dataset of a quasi-reference population with over 650 subjects who were asked about flow experiences in their main sport (“Sports”) [368].


of the engine, which is fixed at 90 Hz when used with the Vive headset: $t_e = \frac{2}{90} = 0.022\,\text{s}$).

9.7 Discussion

Subjects with previous experience of the Stroop Test were not excluded from the results, as a practice effect that reduces Stroop interference is highly dependent on the exact task design [41]. Since the Stroop Room is a new task design, the only relevant practice criterion would be to have experienced the Stroop Room before, which, of course, was not the case for any participant. The interaction test revealed no significant differences between participants with previous Stroop experience and those without.

Compared to the first evaluation of the Stroop Room in [43], this evaluation with its much higher number of subjects showed a very similar result regarding the HRV parameters, except for the change in the 푃퐿/푃퐻 parameter, whose mean is much lower. The heart rate differences between trial phases stayed almost the same; however, the variability was reduced by around 1/3 for all phases, as is expected with the much higher number of samples. This is a good sign that the mean values might show a representative result. Looking specifically at the 푃퐿/푃퐻 metric, it seems that a large number of subject samples is required to reduce the influence of variability in the data. When comparing the results of the two Stroop Room evaluations with the previous work by my colleagues and myself in [57], as shown in Figure 9.8, it seems plausible that the larger increase in 푃퐿/푃퐻 seen in that work is not representative and is due to large outliers. In the study from 2016, we collected data from 15 subjects and calculated the 푃퐿/푃퐻 value as a mean from only eleven (due to data problems). The same was done in 2019 for the Stroop Room data with 32 subjects, and it yielded a much higher mean result (2.07) in the change of that parameter than the data evaluation of 71 subjects (1.47).
In particular, the very low increases in HR together with the proportionally large ones in 푃퐿/푃퐻 [57] seem to support this point. When looking at the Stroop Room evaluations, the increases in HR are very robust (1.19 with 32 subjects and 1.19 with 71 subjects). The ratio of the 푃퐿/푃퐻 change to the HR change is $\frac{1.47}{1.19} = 1.24$ for the Stroop Room, $\frac{1.29}{1.11} = 1.16$ for Delaney and Brodie [284], and $\frac{1.55}{1.07} = 1.45$ for [57].

Also in contrast to the first evaluation, the extended results show that almost no difference between the congruent and the incongruent trials shows up in the mean across all subjects for most of the HRV parameters. However, looking at the gender- and room-specific results in Figure 9.5, it becomes clear that there are large differences between male and female subjects depending on the room/difficulty they are in. This clearly indicates that stress effects differ for such a cognitive stressor, depending on many influential factors. The gender differences can be attributed solely to the actual task design and not to the Stroop effect, as this interference does not depend on whether the subject is male or female [41, Result 15].

Figure 9.4 shows a much narrower variation band for the incongruent-first subjects (plot in the lower right of the figure). This is very promising, as the congruent condition was considered more of a control condition. However, since many subjects experienced it first, the stress of the following incongruent trial may have been reduced due to a certain training effect [41, 369]. Given the much higher variation in the HR change for the congruent condition, this condition alone cannot be used as a robust stressor. Variability, especially concerning outliers, is largely reduced for the incongruent-first trial sequence, with almost no outliers or variability outside the interquartile range.

In general, the heart rate showed the most robust behavior between subjects in 퐺1 and those in 퐺2 or 퐺. The 푃퐿/푃퐻 seemed the most unstable, with high variances and many outliers. This might be explained by the delicate nature of short-term frequency analysis. As mentioned earlier, the HRV parameters are usually calculated on long-term recordings in the domain of hours. The 푡푅푆퐷 also showed robust behavior between 퐺1 and 퐺2, with reductions in variability and otherwise similar results. For the 푝푁50, the changes are more pronounced in 퐺2 than in the subjects only from 퐺1.
This can be seen in the mean and 95 % confidence interval of 푝푁50 decreasing from 0.578 ± 0.123 to 0.543 ± 0.076 for the congruent trials and from 0.527 ± 0.085 to 0.517 ± 0.055 for the incongruent trials.

For the cortisol measurements, the results make it seem obvious that the Stroop Room in its current form has no effect. With no socially evaluative threat in this implementation, an effect was also not expected. It rather confirms that the Stroop Room can mostly avoid an activation of the HPA axis. Therefore, the stressor effect likely stems only from the cognitive difficulty of the extended Stroop interference.

The 훼-amylase shows somewhat inconclusive results. There is a very large inter-subject variance, with some of the subjects reacting strongly with an increase in 훼-amylase to the incongruent trial. For about one fourth of the participants, an increase by a factor of two to seven from baseline to incongruent trial could be observed. For more than half of the participants, the sAA measure doubled from congruent to incongruent trial. Even though there is a large variance in the data, there is a clear tendency that the larger cognitive demand in the incongruent trial, compared to the congruent one, is reflected in the data samples, e.g. with a median congruent change factor of 0.75 and a median incongruent change factor of 1.01. This means that subjects were first cognitively relaxed by entering the Stroop Room in the congruent condition and then stressed again during the incongruent one.

Looking at 푡푇, the difference between congruent and incongruent tasks becomes apparent, and the overlay figure with the original Stroop test evaluation for 푡푇 per hundred reactions shows that the Stroop Room provokes a more significant Stroop effect delay. Subjects in the incongruent-first condition seem to have a longer 푡푇 in all rooms, with the least pronounced effect in 푅퐶. This could be explained by the factor of training, which influences 푡푅 and 푡푇 for the Stroop test [369]. Subjects experiencing the congruent condition first were able to train the response modality intensively in the easy congruent condition, and therefore probably require less cognitive capacity for this sub-task during the incongruent one, resulting in a much lower 푡푇 increase. For the incongruent-first group, the 푡푂푆푆 was on average almost two times that of the congruent-first group, with all their values being above the median with the exception of subjects in 푅퐶, where the OSS task apparently was so much more difficult that the training effect had little influence on the 푑푡푇.
This is not true for any training regarding the Stroop effect, which was equal for both groups; however, congruent-first subjects might also have gained more proficiency in reading words in VR during the easier task. Figure 9.8 shows a much higher 푡푇 in the incongruent condition for the Stroop Room compared to all other work. I decided to use the increase in 푡푇 for the comparison with increases in reaction time from other work, as those mostly do not even differentiate between 푡푇 and 푡푅. With the OSS task being exactly the same for congruent and incongruent trials in this work, this seems to be a valid approach. The 푡푅 in this work is a different concept and not directly comparable with the usual 푡푅 in Stroop Tests. For example, for speech-based traditional Stroop Tests, 푡푅 would be measured from the display of the stimulus until the beginning of the word pronunciation by the subject. In the case of the Stroop Room, the subject is assumed to have decided on the answer when starting his/her movement, but the time required for the additional orientation and search task can be quite high and might in part correlate with a completely different cognitive function in the brain. Even though for 푅0, 푅퐴, and 푅퐵 the memory recall for selecting the correct wall probably plays a significant role in the orientation and search task, this still depends on whether or not subjects make use of this information stored in their memory (about which color is adjacent to which and thus where the answer-colored wall is located in 3-D space). This is a very interesting question and provides ample opportunities for future research, in particular whether some subjects were able to memorize colored-wall locations better than others and thus have a much shorter search and selection time 푡푂푆푆. This parameter might thus correlate with memorization and memory recall skills under stress/pressure. It is not extensively evaluated in this work, as such an evaluation would go beyond the scope of this work in length and would require additional insight into memorization and recall processes in the brain to be introduced in the Fundamentals chapter.

The Shapiro-Wilk test indicated likely non-normality for the increase in 푛퐸 and likely normality for the increase in 푡푇. However, looking at the top-rightmost plot of 푑푡푇 in Figure 9.6, it intuitively does not seem to be normally distributed. This can further be observed in Figure 9.7, where the result for 푡푇 is compared to the 푡푅 in the original Stroop experiment [276]. While 푡푇 during the congruent phase seems normally distributed, this does not seem to be the case for the incongruent condition. Rather, it appears as if there are two merged normal distributions. This could be an effect of merging the results, especially between 푅퐶 and all others.
Indeed, when looking in detail at subjects with a 푡푇 per hundred reactions of less than 250 s, none of them was in 푅퐶. Also, women seem to have a faster 푡푇 in general, but specifically in 푅0 and 푅퐵. This could mean that men are more susceptible to claustrophobic distress in their performance. However, correlating this with the HR and 푝푁50 values (Figure 9.5), which indicate that women are more stressed in those two conditions, leads to the conclusion that women deliver a better performance in the Stroop Room under time pressure and claustrophobic stress. Furthermore, a difference in the 푑푡푇 in 푅퐶 is visible between male and female subjects. Women seem to be more affected by a more complicated orientation, search, and selection task, requiring more time to find the correct wall color patch and select it.


Yet, regarding the Stroop interference, Figure 9.7 clearly demonstrates that the Stroop Room induces a Stroop interference of at least the same magnitude as seen in the classical task. This not only validates the underlying assumption of the Stroop Room, namely that Stroop interference is invoked; the Stroop Room also seems to do so in a more significant manner than the classical test. The addition of the orientation, search, and selection task might provoke a worsening of the interference. However, it has previously been reported that Stroop experiments show a reduced interference effect between congruent and incongruent conditions if a manual response modality is used, e.g. a 푑푡푇 of 1.235 for the manual modality and 1.279 for the verbal variant [41, 369]. This seems not to be the case for the Stroop Room. Even though a direct comparison is not possible, as no verbal modality condition was used in this evaluation, with a mean 푑푡푇 of 1.54 the Stroop Room effect seems comparable with the original Stroop effect, as visible in Figure 9.7. That means either the Stroop Room’s Stroop effect is at least as effective as the original one, even with manual response, or it is less effective but the cognitive requirement of the additional OSS task is also affected by the interference effect.

A test for effects of ’negative priming’ [44, box 3] did not yield an observable effect, which was expected from previous findings [41, Result 8]. However, it was only tested using the 푡푇.

Concluding the discussion, a recap of the observations seems to support the finding that the incongruent trial is most stressful in the Stroop Room within the difficulty conditions 푅퐴 and 푅퐵 (time pressure and claustrophobic distress). Increasing the difficulty of the OSS task seems to be less effective and mainly influences the task and reaction time. The 푃퐿/푃퐻 parameter is slightly in disagreement with the other evaluated HRV parameters, the EDA, and the 훼-amylase. This finding suggests that for such a short-term evaluation, it is not able to provide a solid foundation for stress determination in the Stroop Room. A cortisol reactivity cannot be provoked with the Stroop Room in its current form. The lack of a socially evaluative stressor, which was a design decision from the beginning, seems to prevent an effect on the cortisol levels. However, the 훼-amylase did show interesting conclusive effects. No effect was found in 푅0 and 푅퐶. In 푅퐴 and 푅퐵, however, an expected effect was found that supports the assumption that the largest stress is induced in the incongruent conditions of 푅퐴 and 푅퐵. This finding is well supported by all the other signals.


Large differences in reactivity to the Stroop Room have been found between male and female subjects. Since this cannot depend on the Stroop effect, it is probably related to the recognition and search task. Women seem to profit more from training (i.e. when first experiencing the easy condition to train and then performing better in the more difficult one), while men show less stress and perform better in comparison when experiencing the more difficult trial without training.


10 The Stroop Room Classifier

In the previous chapter it was shown that it is feasible to provoke an increased amount of cognitive stress in the Stroop Room using a reinforced Stroop interference in a 3D space, compared to classical 2D versions. Based on the findings gathered during the validation of the Stroop Room, this chapter describes the development of a two-way classification procedure able to classify stress responder types based on ECG signals and/or on the performance during the Stroop Room task.

The classification of stress habituation is a challenging task, simply because repeated stress tests must be available for the same participants. Given the large effort involved in conducting the TSST, which is the most effective and widely used stress test that allows habituation to be assessed, data is scarce in general. Motivating volunteers to participate in just two repeated TSST iterations is difficult and, compared with the available body of data for single TSST studies, data instances for such experiments are almost non-existent [38]. It has been shown that habituation classification is robustly possible with repeated measurements of cortisol and 훼-amylase, or similar metabolic biomarkers [38]. However, these repeated laboratory stress-exposure tests are hard to come by precisely because these metabolic biomarkers are required. Additionally, robust cortisol responses can only be achieved if a lot of effort is put into these tests, including a solid acting performance of at least two people to stage social pressure. Still, such longitudinal response observations through multiple tests are desired to advance research in the area of long-term stress reactivity [38].

To overcome this lack, the Stroop Room was designed and validated. In this chapter, it is assessed whether this method, which does not rely on the observation of metabolic biomarkers, can provide insight and a potentially feasible solution to that problem. Therefore, a classification scheme was developed using two approaches with different existing classifiers. It targets the recognition of habituation or non-habituation among subjects using only data that is collected in a non-validating Stroop Room run (ECG, movement, and Stroop performance data). The content of this chapter has not been published yet, mainly due to the difficult demands on reference data collection, which make publication a long-term effort.


10.1 Stress Habituation

The goal of the classifier training and evaluation is to separate stress habituation characteristics among participants of the Stroop Room, trained on ground-truth information collected through the TSST. Depending on various factors, each person reacts slightly differently to stress [90]. Usually, a healthy reaction observed for the repeated exposure to the same stressor is a habituation of the response. This means that the body is less and less affected, or thrown out of homeostasis, the more often the situation is encountered. This can be tested very well by subjecting people to a two-day repeated TSST. Looking at the cortisol reaction on both days, the cortisol signal curves over time can be characterized as either habituating, non-habituating, sensitizing, non-responding, anticipating on day one, or anticipating on day two [37, 72, 358]. Only the habituating response is considered to be healthy. In non-habituation, for example, the same amount of stress is experienced again and again, which might lead to exhaustion according to Hans Selye’s GAS model [17].

There have been machine learning based approaches to classify general emotions [370, 371] as well as habituation effects [72]. However, these were either not concerned with cortisol or focused on classification of the cortisol profile itself. For this chapter, the goal is to identify and use characteristic patterns in the stress response of the ANS, available in the ECG or performance data of the Stroop Room, in a machine learning based classification method to match and ultimately predict habituation classes among participants in the Stroop Room validation experiment.

10.2 Data Preprocessing

As an input to the classifiers, only participants of 퐺Hab were used (11f, 12m). For each of those, an expert in health psychology evaluated the cortisol responses of both TSST sessions they underwent and assigned a habituation label to each subject. This is the target class label.
However, since observations are too limited to solve a six-class classification problem, the habituation labels were collapsed into just two classes to form a binary-class problem. All observations with a normal habituation response were assigned to the normal class (habituation/habit). Any other observed classes were projected onto the abnormal class (non-habituation/non-habit). These comprise the observations for non-habituation itself, sensitization, non-responder, anticipatory, and anticipatory-II. After this preprocessing step, the dataset contained 14 observations with the habit label and nine observations with the non-habit label. ECG data from all subjects were processed according to the descriptions in the previous chapter and provided to the feature calculation as a time/stimulus-synchronized collection of 푅푅-interval information. Additionally, all the other data from the validation of the Stroop Room was available, such as mistakes and 푡푇.

10.3 Calculation and Selection of Features

Several features were extracted from the raw or the pre-processed data described in the previous chapter (HRV and Stroop Room performance). Since the goal of the classifier is to assess the stress reactivity, any feature that might be related to such a metric was used. In the following, all of these are listed and described in detail. There are three types of features, depending on the data they were derived from. First, there are demographic features related to the demographics of the individual subject. Second, there are mistake-related features, which are calculated based on, and referenced to, the first 푁 mistakes that a user made in the Stroop Room (mistakes were usually only made in the incongruent condition). The third feature category contains phase-related features, which were calculated based on an entire trial phase or a specific sub-window of a phase in the Stroop Room. An overview of the features is given in Table 10.1; the following two sections explain their calculation in more detail.

Features Related to Mistakes

For the first 푁 mistakes that a user of the Stroop Room made, with 푁 = 1, 2, 3, ..., 10, the following features, with 푥 ∈ 푁, were calculated. The median for the number of mistakes across all participants was ten, so this number was selected under the assumption that for 푥 > 10 the subset of usable observations decreases considerably. The mistake-related features were calculated based on a biosignal window with a width of 6 s before mistake 푥 (pre) and another one after mistake 푥 (post).
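The pre/post windowing around each mistake can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the RR data is synthetic, and the mean heart rate per window is used as a stand-in quantity rather than one of the actual features listed in Table 10.1. Only the 6 s window width and the limit of ten mistakes are taken from the text.

```python
from bisect import bisect_left
from itertools import accumulate

WINDOW_S = 6.0  # window width in seconds, as stated in the text


def mean_hr(beat_times, rr_intervals, start, end):
    """Mean heart rate (bpm) over beats with start <= time < end, or None if empty."""
    lo = bisect_left(beat_times, start)
    hi = bisect_left(beat_times, end)
    window = rr_intervals[lo:hi]
    if not window:
        return None
    return 60.0 / (sum(window) / len(window))


def mistake_windows(beat_times, rr_intervals, mistake_times, n_max=10):
    """Return (mistake time, pre-window HR, post-window HR) for the first n_max mistakes."""
    rows = []
    for t in sorted(mistake_times)[:n_max]:
        pre = mean_hr(beat_times, rr_intervals, t - WINDOW_S, t)
        post = mean_hr(beat_times, rr_intervals, t, t + WINDOW_S)
        rows.append((t, pre, post))
    return rows


# Synthetic data: RR intervals of 1.0 s that shorten to 0.75 s after 20 s.
rr_intervals = [1.0] * 20 + [0.75] * 16
beat_times = list(accumulate(rr_intervals))  # time of the beat closing each interval

rows = mistake_windows(beat_times, rr_intervals, mistake_times=[20.5])
print(rows)
```

With only a handful of beats in a 6 s window, any statistic derived from it is based on very few RR intervals, which is why such windows can only carry very short-term information.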
These features are expected to reflect very short-term HRV information about the reaction of the ANS to mistakes under cognitive load.

Features Related to Phases

Each of the 푑B HRV parameters (see chapter 9) was used as a feature for all phases p. In addition, for the incongruent phase, the mean relative change of all parameters was also calculated for four equal-sized subphases. Therefore, for the phases p with

$$p \in \{\text{congruent}, \text{incongruent}, \text{inco0}, \text{inco1}, \text{inco2}, \text{inco3}, \text{relax}\}, \tag{10}$$

incongruent describes the entire incongruent trial, inco0 is the first quarter of the phase, inco1 the second, inco2 the third, and inco3 the fourth quarter. These features are expected to provide background information to the classifier about the general stress reactivity of the ANS for each subject.

Feature Selection

With all described possible feature function permutations, a total of 88 features were used. All features were normalized to zero mean and a standard deviation of 1 before being passed to the feature selection and classification. For feature selection, several different standard methods were used: information gain (reduction of entropy) and information gain ratio [372], Gini and 휒² statistics [373], ReliefF [374], and the Fast Correlation Based Filter (FCBF) [375]. The values of the five best-ranked features for each method were empirically tested for interactions with the incongruent-first group and the room difficulty condition. These two conditions were about equally distributed among observations in feature space.

10.4 Training and Evaluation of the Classifier

The following classifiers were trained and evaluated: Naive Bayes, k-Nearest Neighbor (kNN), Stochastic Gradient Descent (SGD), Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF), and AdaBoost with a classification tree (CT) as base estimator [376]. These constitute a sampling of regularly encountered classification schemes in stress research [8]. In particular, it was desired to have a broad range of different classifier approaches according to the taxonomy of Christopher Bishop [376]: kNN as a probabilistic nonparametric method, SGD as an on-line linear regression method using sequential learning, LR as a classical linear probabilistic discriminator, SVM as a sparse non-linear kernel estimator, and CT/RF as an easily boostable combining model with computationally inexpensive implementations.
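The normalization and the information-gain ranking described in the feature selection step can be sketched in plain Python. This is not the toolkit implementation used for the thesis: the median split used to discretize a continuous feature, the use of the population standard deviation, and the feature names and values are all assumptions made for illustration.

```python
import math
from collections import Counter


def zscore(column):
    """Scale values to zero mean and unit (population) standard deviation."""
    n = len(column)
    mu = sum(column) / n
    sd = math.sqrt(sum((v - mu) ** 2 for v in column) / n)
    return [(v - mu) / sd for v in column] if sd > 0 else [0.0] * n


def entropy(labels):
    """Shannon entropy of a label list in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """Entropy reduction when the feature is split at its median (assumed binning)."""
    ordered = sorted(column)
    mid = len(ordered) // 2
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    low = [y for v, y in zip(column, labels) if v <= median]
    high = [y for v, y in zip(column, labels) if v > median]
    remainder = sum(len(p) / len(labels) * entropy(p) for p in (low, high) if p)
    return entropy(labels) - remainder


def rank_features(features, labels, top=5):
    """Names of the `top` features with the highest information gain."""
    scores = {name: information_gain(zscore(col), labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical toy data: one feature separates the classes, the other does not.
labels = ["habit"] * 4 + ["non-habit"] * 4
features = {
    "noise": [1, 10, 2, 11, 3, 12, 4, 13],
    "informative": [1, 2, 3, 4, 10, 11, 12, 13],
}
ranking = rank_features(features, labels, top=2)
print(ranking)
```

Because z-scoring is a monotone transformation, it does not change a median-split information gain; it matters for the distance- and gradient-based classifiers that receive the selected features afterwards.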


All classifiers were evaluated using stratified 5- and 10-fold cross-validation (CV) and also a stratified 10-fold (repeated) random sampling with a 66/33 training/test split. To mitigate a possible bias effect of the imbalanced class distribution, evaluations were done either without resampling (default) or with different resampling strategies (stratified fixed-size or bootstrap), in order to either change the bias (bootstrap) or provide equal class occurrences (fixed-size). Hyperparameters of each classifier were empirically varied and evaluated. For example, for the kNN classifier, different k ∈ {1, 2, ..., 6}, distance metrics (Euclidean, Manhattan, Mahalanobis), and weightings (uniform, distance-based) were tested, and the SVM was evaluated with different kernels. Training and evaluation were done using the Python toolkit Orange¹.

10.5 Results

With resampling (bootstrap), the overall best performance was achieved using the SGD classifier employing the top five features according to the information gain. These features were (from best to worst): PNN(inco3), HrInc(inco0), PNN(incongruent), HrChange(10), HrAmp(3). The evaluation showed that the classifier had a precision of 0.960 and a recall of 0.957 with an F1 score of 0.956. The confusion matrix is given in Table 10.2. A nearly identical result was also achieved using the SVM with a linear kernel; however, it had a reduced AUC. The confusion matrix shown in Table 10.3 is for the evaluation of the classifiers with a stratified fixed-size (16) resampling. Here, the Random Forest classifier performed best with an F1 score of 0.875.

Without resampling, the overall best performance was achieved using the Naive Bayes classifier employing the top five features according to the information gain. These features were (from best to worst): HrAmp(3), HrMed(3), PNN(inco3), HrMed(6), HrChange(10). The evaluation showed that the classifier had a precision of 0.929 and a recall of 0.913 with an F1 score of 0.914.
The confusion matrix is given in Table 10.4. These features for both classifiers were among the top 5-10 features for the majority of the different feature selection methods. Other repeatedly encountered features in the top 10 ranking across the selectors were: HrAmp(10), TaskTime(congruent), TCT, HrChange(7), HrMed(7), and HrAmp(7).
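The stratified cross-validation used for these evaluations can be sketched as follows. This is an illustration only, not the Orange routine; the function names are hypothetical, and the round-robin assignment is one simple way of keeping class proportions similar across folds. The class counts in the usage note (14 and 9) are the row sums of Table 10.4.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Return k lists of sample indices with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(k)]
    position = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal the samples of each class round-robin over the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


def train_test_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) pairs, one per held-out fold."""
    folds = stratified_folds(labels, k, seed)
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test
```

For example, `list(train_test_splits(["Habit"] * 14 + ["Non-Habit"] * 9, 5))` yields five splits in which every test fold contains observations of both classes.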

1 https://orange.biolab.si/ (visited on 2020-02-15).


Table 10.1: Overview of the features in the Stroop Room classifier. The features are separated by their category and the basis that is used for their calculation.

Feature          Description

Demographic features
Gender           The gender of the subject (1 for female, 0 for male).
Age              The age of the subject.

Related to mistake x ∈ N
HrAmp(x)         Relative change in the pre-to-post absolute heart rate amplitude.
HrMed(x)         Relative change in the pre-to-post median heart rate.
HrMean(x)        Relative change in the pre-to-post mean heart rate.
HrChange(x)      Relative change in the heart rate calculated from start/first value of pre to end/last value of post.

Related to phase p ∈ {congruent, incongruent, inco0, inco1, inco2, inco3, relax}
HrInc(p)         Relative change/increase of the heart rate as p/baseline.
LF/HF(p)         Relative change of the P_L/P_H as p/baseline.
RMSSD(p)         Relative change of the t_RSD as p/baseline.
PNN(p)           Relative change of the pN50 as p/baseline.

Related to phase p ∈ {congruent, incongruent}
Errors(p)        Number of mistakes n_E in the trial phase p.
TaskTime(p)      The average t_T in the trial phase p.
TaskTimeStd(p)   The average t_T standard deviation in the trial phase p.
TCT              Relative change/increase in the mean t_T of the congruent to the incongruent trial.

Features only from the incongruent phase
TTnp             The mean t_T of all stimuli affected by negative priming.
TTinco           The mean t_T of all stimuli in the incongruent trial.
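The phase-related HRV features in Table 10.1 are ratios of a parameter in phase p to its baseline value. A minimal sketch of such a ratio is given below, assuming the standard textbook definitions of RMSSD and pNN50 on RR intervals in milliseconds; the function names are hypothetical, and the windowing and artifact handling used in this work are not reproduced.

```python
import math


def rmssd(rr_ms):
    """Root mean square of successive RR-interval differences (ms)."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


def pnn50(rr_ms):
    """Percentage of successive RR-interval differences larger than 50 ms."""
    diffs = [abs(b - a) for a, b in zip(rr_ms, rr_ms[1:])]
    return 100.0 * sum(d > 50 for d in diffs) / len(diffs)


def mean_hr(rr_ms):
    """Mean heart rate in beats per minute from RR intervals in ms."""
    return 60000.0 / (sum(rr_ms) / len(rr_ms))


def relative_change(phase_value, baseline_value):
    """Feature value as phase/baseline; 1.0 means no change.

    Note: a baseline of 0 (possible for pNN50) needs separate handling.
    """
    return phase_value / baseline_value
```

With a baseline segment and a phase segment of RR intervals, `relative_change(rmssd(phase), rmssd(baseline))` corresponds to the form of the RMSSD(p) feature.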

Table 10.2: Confusion matrix for the SGD classifier. Evaluated using a stratified, bootstrap resampled 10-fold cross-validation employing the five best ranked features according to information gain.

                       Predicted Class
                       Habit   Non-Habit    ∑
Actual   Habit           9         1       10
Class    Non-Habit       0        13       13
         ∑               9        14       23

Table 10.3: Confusion matrix for the Random Forest classifier. Evaluated using a 10-fold cross-validated evaluation (with fixed sample size stratified resampling) employing the six best ranked features according to information gain.

                       Predicted Class
                       Habit   Non-Habit    ∑
Actual   Habit           7         1        8
Class    Non-Habit       1         7        8
         ∑               8         8       16

Table 10.4: Confusion matrix for the Naive Bayes classifier. Evaluated using a five-fold cross-validated evaluation (without resampling) employing the five best ranked features according to information gain.

                       Predicted Class
                       Habit   Non-Habit    ∑
Actual   Habit          12         2       14
Class    Non-Habit       0         9        9
         ∑              12        11       23
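The precision, recall, and F1 values reported in the results can be recomputed from the three confusion matrices. The sketch below assumes support-weighted averaging over both classes; the text does not state the averaging scheme, but the reported values are consistent with it. The function name is hypothetical.

```python
def weighted_scores(matrix):
    """Support-weighted precision, recall and F1.

    matrix[i][j] is the number of observations of actual class i
    that were predicted as class j.
    """
    n_classes = len(matrix)
    total = sum(sum(row) for row in matrix)
    precision = recall = f1 = 0.0
    for c in range(n_classes):
        tp = matrix[c][c]
        support = sum(matrix[c])                               # actual class c
        predicted = sum(matrix[r][c] for r in range(n_classes))  # predicted as c
        p = tp / predicted if predicted else 0.0
        r = tp / support if support else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        weight = support / total
        precision += weight * p
        recall += weight * r
        f1 += weight * f
    return precision, recall, f1


# Rows: actual Habit, actual Non-Habit; columns: predicted Habit, Non-Habit.
TABLE_10_2 = [[9, 1], [0, 13]]   # SGD, bootstrap resampling
TABLE_10_3 = [[7, 1], [1, 7]]    # Random Forest, fixed-size resampling
TABLE_10_4 = [[12, 2], [0, 9]]   # Naive Bayes, no resampling

for name, table in (("10.2", TABLE_10_2), ("10.3", TABLE_10_3), ("10.4", TABLE_10_4)):
    p, r, f = weighted_scores(table)
    print(f"Table {name}: precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

Under this assumption, Table 10.2 gives 0.960/0.957/0.956, Table 10.3 gives an F1 score of 0.875, and Table 10.4 gives 0.929/0.913/0.914, matching the values stated in the text.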

The SGD and Naive Bayes classifiers are considered the best ones of this evaluation, since they did not show any false negative errors (non-habituation classified as habituation). The reason is that, in this context, such type II errors are worse than type I errors (false positives), as they would leave actually abnormal cases undetected.

10.6 Discussion

With the large and biased number of female subjects having a habituation label, several observations are of interest, as the suspicion could arise that the features in general simply have a high discriminating power for gender, but not for the habituation class. Looking at the different feature selection rankings, it becomes clear that gender as a feature itself only plays a role when ranking the features according to the information gain ratio or ReliefF. Otherwise, it does not even appear in the top ten features. However, its importance cannot be neglected. As became obvious from the statistical analysis of the Stroop Room results in the previous chapter and the evaluation of the more relaxed datasets, gender does have an influence on the biosignal reactivity and performance. It might have been that most women showed a characteristic change in HRV parameters in their reaction to the first errors. So the classifier might just have identified women in the error reaction, and not the stress

[Figure 10.1, two scatter plots. Left: PNN(inco3) plotted against HrMed(3). Right: HrChange(10) plotted against PNN(inco3).]

Figure 10.1: Feature space overview. Overview of a projection of the feature space. Blue dots are class labels for Habituation, red dots for Non-Habituation. Both plots show different combinations of some of the top five features plotted against each other. They illustrate the good separability of the classes. The right plot includes HrChange(10), a feature only available for subjects who made at least ten mistakes. This was only true for a subset of the observations; therefore, a reduced number of feature points is visible.

responder type. It is known that certain characteristics of the ECG are different for women [377, 378]. These differences also include HRV measures [379]. Curiously, this work mentions the significant differences only in the frequency-domain features and the t_RSD, mainly caused by different breathing patterns between men and women, which are highly correlated to the P_H [379]. The most relevant feature for the Stroop Room classifier is the pN50, which has been reported to be completely unaffected by sex differences [332]. However, the HR seems to be significantly affected by gender differences during the TSST stressor [380], and thus potentially during any social-threat stressor. Such observations have not been made for the Stroop Room, which is in its current form mainly a cognitive stressor. When looking at the misclassifications, there is an equal number of male and female samples misclassified, which can be seen as a further indicator that the classifiers are not gender biased. Furthermore, the prior for habituation and non-habituation is gender indifferent, with 70 % and 30 %, respectively.
When building a classifier using all 71 observations from G with the gender as the target class, thus checking whether the features above are able to separate male and female participants (which would indicate a gender bias), the best result in a stratified 10-fold cross-validated evaluation was an F1 score of 0.69 for the Naive Bayes classifier, with all other classifiers having an F1 score between 0.47 and 0.55. This result supports the standpoint that the features are gender indifferent. This in particular makes the results seem plausible, yet there is just not enough data to properly validate the scores and completely rule out gender bias in the HRV features. A leave-one-subject-out cross-validation delivered similar results, and the large information content in some of the non-mistake-related features also seems to support the outcome. More representative results could be obtained if there were more diverse input observations and a completely independent test dataset from a different, repeated stress test and a different rater for the habituation class labels.

There is a chance that changes in the HRV for mistakes number 3, 6, and 10 were randomly distributed in the feature space in such a favorable way that they allowed an almost perfect feature separation and an accordingly high classification accuracy. The changes for other mistake reactions seem to provide no information content at all. This could, however, also be related to the fact that not all subjects made more than nine mistakes and were consequently excluded during classifier training or evaluation. However, the striking separability and the sequence of multiples of three for the mistakes make it seem that the proposed information content is indeed available.

When using the 10-fold repeated stratified random sampling to equalize class occurrences, the F1 scores reached 1.0 for Logistic Regression and SGD. However, the resampling method did not consider gender balancing, so this inequality is the biggest weak spot of this classification method and its evaluation, and the reason it is not used as the main result. However, it is a problem of the underlying data in general, not of the methodology. Many more observations of men with a normal habituating response need to be collected to improve validation for this point.
This topic can be elaborated further when not normalizing the feature space and looking at the scatterplot of the HrAmp(3) and HrMed(3) features in Figure 10.2, which are among the top five selected features in most rankings. The figure shows that for most habituators, except one outlier, the heart rate amplitude stayed on the same level or even decreased, seemingly significantly, whereas the change in median heart rate from pre- to post-mistake slightly increased or stayed on the same level. For non-habituators, the heart rate amplitude increased by around a factor of two, i.e. a fast and sudden change in heart rate, while the median dropped slightly. These characteristics are in contrast to each other and seem to suggest a valid class separability.


[Figure 10.2, scatter plot: HrMed(3) plotted against HrAmp(3).]

Figure 10.2: Feature distributions. Projections in non-normalized feature space of HrAmp(3) and HrMed(3), the two most distinguishing features for normalized, non-resampled classification. Blue dots are class labels for Habituation, red dots for Non-Habituation.

Some of the values for preprocessing, feature selection, and classifier settings were empirically pre-selected, e.g. the 6 s window for short-term HRV extraction. This leaves room for further research with hyperparameter optimization approaches for this method. The fact that the approach worked on this dataset does not guarantee that the selections that were made are the best ones and did not lead the classifier training into some random local optimum. There are many approaches to tackle this, like exhaustive grid search or random search, which are frequently used, or more complex techniques like Bayesian, gradient-based, or evolutionary optimization [381, 382]. Due to the already very good results and the more profound methodological limitation of the small dataset, they were not used in this work. However, this certainly is a next step as soon as a larger body of observation data becomes available.

Looking deeper into the features ranked by the selectors, an interesting finding can be made in the ReliefF ranking, which pulled up several features characterizing the heart rate recovery during the relaxation phase after the stressor. Namely, ReliefF ranked the features RMSSD(relax), HrInc(relax), LF/HF(relax), and PNN(relax) among the top 6, 10, 15, and 21 best features for characterization of responder classes, with a resulting F1 score of 0.784 when only using those four features for classification with the AdaBoost classifier. The recovery process after having experienced one of the established stressors, like the TSST, is very important, as it reveals the exact magnitude of the cortisol response. Similarly, the heart rate recovery after high-intensity sports plays a considerable role in the evaluation of the training level and for the characterization of general health [383]. These important correlations could encourage further research around recovery features for the Stroop Room classifier as well.

The final thought for this discussion is the question of whether a three-class separation would have made sense as well. Since the non-habituation class subsumed all other classes, it might well be that it missed a significant difference in response between, e.g., non-responder and other non-habituation classes. However, this again comes down to a problem of data availability. The number of instances available for these other classes (only 2 instances for non-responder) did not justify the effort to even try.

Putting all the open questions aside, with the sparse dataset available from G_Hab, the results of the developed classifiers are very good. In particular, the lack of false negatives is a delightful observation. The ultimate purpose of this last part in the pipeline of a wearable stress laboratory is to provide a decision about whether or not something seems unusual with the stress reactivity of a subject. Automated healthcare-related recommender systems usually consider false negatives a more severe error than false positives. The former would fail to recommend that someone seek medical attention when it is required, whereas the latter would in some cases recommend it even for healthy people, which usually is not much of a problem.

Concluding, it can be said that the goal of classifying habituation and non-habituation using a VR-implemented cognitive stressor seems feasible with a machine learning approach like the one presented here. Further research needs to be conducted to safely rule out any gender-specific bias of the classifier. This, however, can only be achieved with higher diversity in the data and thus in general a higher data quantity, which will likely require several more years to be collected.


Part V

Perspective

11 Summary

This thesis presents a fully connectable modular prototype system that can be used to assess and classify stress reactions in-the-wild. The system relies on three key hardware devices: a Bluetooth-enabled, Android-based smartphone, which can be bought off-the-shelf; a corresponding VR device like the Gear VR from Samsung, also available off-the-shelf; and an ECG recording device with Bluetooth real-time data streaming, such as the one developed within the FitnessSHIRT platform.

Characteristics of the HRV are provided for three distinct datasets of different stress-level scenarios: one for people in total relaxation, floating in warm water (N=12), another with slight cognitive demand inside a virtual reality (N=14), and a third of a cognitively demanding stressful task inside a VR (N=71). An overview of the difference in the HRV throughout these three datasets is provided in Table 11.1 as a summary of all the results presented in the individual chapters. Furthermore, the average relative changes in all the HRV parameters are plotted in an overview plot in Figure 11.1 showing all three datasets. The table and the plot show how these parameters change from no stress to significant stress. With the associated confidence interval, it becomes clear that the t_RSD and pN50 can provide the best insights into classification of severe stress. The variability of the P_L/P_H parameter during stress is large; therefore, this parameter cannot be considered as an isolated feature for comparable stress determination.

This by itself constitutes an interesting dataset and result. In this work, it was furthermore used to develop a stress responder classification pipeline that can be used pervasively for large-scale screening or as an independent recommender system. Its primary constituent is the Stroop Room, a virtual reality-based stressor application that was derived from the Stroop Test.
It enhances the Stroop interference by additional requirements for the subject to increase the cognitive load and thus stress. The reaction of the subjects to errors and the overall development of the ECG-based HRV parameters seem to allow an accurate assessment (>90 % accuracy of the associated classifiers) of whether or not a person experiencing the stressor will show a habituation characteristic in their cortisol concentration during repeated exposure to a standardized stressor, like the TSST.

11.1 Discussion of Contributions

With this work, several contributions are made to extend the state-of-the-art.


Table 11.1: Overview of HRV measures in all datasets. Important HRV measures (HR, RMSSD, pNN50, LF/HF) [75], between all three-phase trials in this work. The unit is given in the first column in parentheses, the uncertainty is given as 95 %-confidence interval.

HRV Measure   Dataset            Baseline       Max Relaxation   Max Stress

HR (bpm)      Water relaxation   74.1 ± 5.6     59.2 ± 6.6
              Biofeedback VR     74.4 ± 5.8     70.4 ± 2.0
              Stroop Room        72.9 ± 2.5     71.2 ± 1.0       86.9 ± 2.1

RMSSD (ms)    Water relaxation   35.5 ± 5.5     61.1 ± 8.1
              Biofeedback VR     38.8 ± 9.6     45.3 ± 10.8
              Stroop Room        49.2 ± 4.8     52.1 ± 5.6       36.6 ± 3.5

pNN50 (%)     Water relaxation   13.1 ± 4.3     36.6 ± 7.6
              Biofeedback VR     18.5 ± 7.4     23.3 ± 9.3
              Stroop Room        30 ± 4         32 ± 5           17 ± 4

LF/HF         Water relaxation   4.26 ± 1.05    1.32 ± 0.26
              Biofeedback VR     3.08 ± 1.36    2.41 ± 0.86
              Stroop Room        1.48 ± 0.22    1.02 ± 0.45      1.80 ± 0.17

[Figure 11.1, four panels: HR, RMSSD, PNN50, LF/HF. Vertical axis: Average Relative Change. Series: Water Relaxation, Biofeedback VR, Stroop Room.]

Figure 11.1: Comparison of HRV changes across all datasets. An overview of the average relative changes of the four main HRV parameters between the three datasets in this work. Each plot represents one of the parameters and inside each plot from left to right the water relaxation (chapter 4), biofeedback VR (chapter 7), and Stroop Room (chapter 8) are depicted as the mean and associated 95 %-confidence interval. The line-of-no-change (1.0) is emphasized to improve orientation in each plot.
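As a rough cross-check of the direction and size of the changes under stress, the Stroop Room rows of Table 11.1 can be turned into ratios of the group means. This is only an approximation: Figure 11.1 shows the average of per-subject relative changes, which in general differs from the ratio of group means computed in this sketch. The variable and function names are hypothetical.

```python
# Stroop Room group means from Table 11.1: (baseline, max stress).
STROOP_ROOM_MEANS = {
    "HR": (72.9, 86.9),
    "RMSSD": (49.2, 36.6),
    "pNN50": (30.0, 17.0),
    "LF/HF": (1.48, 1.80),
}


def ratio_of_means(baseline, condition):
    """Relative change of a group mean; 1.0 is the line of no change."""
    return condition / baseline


stress_ratios = {
    measure: ratio_of_means(baseline, stress)
    for measure, (baseline, stress) in STROOP_ROOM_MEANS.items()
}

for measure, ratio in stress_ratios.items():
    print(f"{measure}: {ratio:.2f}")
```

The ratios are about 1.19 for HR, 0.74 for RMSSD, 0.57 for pNN50, and 1.22 for LF/HF, i.e. heart rate and LF/HF rise under stress while the two time-domain variability measures drop.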


In the Fundamentals chapter and the Related Work, a comprehensive overview of state-of-the-art devices and methods for wearable and portable ECG-related stress detection is compiled, as well as the state-of-the-art in background knowledge on stress detection from various biosignals. It is a condensed and up-to-date overview of the current status of research in this field.

A novel contribution is the everywhere-usable wearable ECG device, which is based on the amplification of current rather than voltage in the input stage. This is a new approach to how the ECG is measured when considering the submersion of the electrodes in water or water-soaked environments. Before this, the state-of-the-art was to use heavily insulated electrodes together with a regular potential-based ECG device. With the increased availability of TΩ-resistors, transimpedance amplifiers are now also feasible for mass production in regular ECG devices, opening up the possibility for environment-independent measurement systems.

This novel recording technique was used for the acquisition of unfiltered ECG signals of subjects in a highly relaxing environment. The thesis provides evaluation data on how the most important HRV parameters change or behave for subjects who enter a state of considerable external relaxation. The data provides extended insight into how lying down inside and outside of the water affects the heart rhythm in comparison to a sitting position. It also forms the basis for all further HRV-based data evaluations in this work, since everything is recorded with the same modality and analyzed with the same toolset.

Next, an app (Hearty) was presented that was developed in the early days of smartphones to allow a then unprecedented analysis of the heart rhythm and arrhythmia conditions to be performed directly on the mobile device.
It comprises an energy-efficient R-peak detection method and an arrhythmia classifier providing feedback in real time with an open interface on the Android mobile platform. For this purpose, two important cornerstones in Android/Java processing software were developed. The SensorLib is a unified application programming interface to access all kinds of sensors connectable to smartphones. JELY also provides a unified and streamlined way of working with ECG data on mobile devices. Furthermore, an algorithm was developed that can improve the general accuracy of R-peak detection in any QRS detector. The entire concept of a beat error, here called beat slackness, has not been introduced methodologically before. Usually, algorithms use a very straightforward way to tackle this problem, which is why these methods are often not even mentioned in the publications. This is a problem for which this thesis provides a possible solution path.

These libraries and the algorithms in the Hearty application can continue to be very useful when thinking about the further development of the smart device sector. The restrictions that used to apply to smartphones, e.g. careful use of battery power and restricted processing capabilities, now apply to smart watches, so the algorithms in this work continue to be useful, now for these kinds of platforms. With the majority of my work of the last seven years on these topics being open-sourced and freely available to other researchers or app developers on GitHub, this provides a big contribution to the community.

Going further, the Stroop Room is a completely unprecedented way of using the immersive capabilities of modern VR hardware to bring one of the most effective and widely used cognitive stressors into a 3D environment. The interaction of the user with the test was adapted accordingly and resulted in a novel branch of the card-sorting paradigm as an additional method to perform a Stroop Test and invoke the Stroop interference effect. The wealth of data available from 71 participants who were tested in the Stroop Room allows extensive new data evaluations on a new experimental dataset.

All the prerequisites are provided to build a machine learning system that is able to classify stress reactivity in a self-contained virtual environment-based platform. This was done as an example in the last chapter of the methods. The classifiers presented there allow a binary classification of a subject's stress response in the Stroop Room into healthy and unhealthy responses. This is the first time that the HRV in a variant of the Stroop Test was used in a machine learning approach to classify the cortisol-related habituation to a stressor.
If this system can be fully validated with a large body of data, it would constitute an impactful change in stress research and enable completely new approaches to large-scale screening and stress-related recommender systems. However, as it is very difficult to obtain stress responder type labels, the approach described in chapter 10 should still be considered to be in early development and cannot be seen as a validated solution yet. The results need to be confirmed on a less gender-biased dataset.

Should further evaluation provide proof of the validity of the Stroop Room classifier, then it could be used as a convenient screening test for subject inclusion into scientific studies regarding stress. Usually, habituation is considered the healthy stress response and non-habituation the unhealthy/unusual one. As the classifier is able to separate those two classes very accurately, a lot of effort could be saved for psychologists if they were able to pre-select participants with an unhealthy stress reaction. In general, this makes the Stroop Room classifier an independent screening tool for abnormal stress reaction. It means it could indeed be used for mass screening in the general public to serve as a pre-warning for people exposed to too much stress, or even as an admission-recommender tool for subjects to see a health practitioner. The gold standard so far is the recommendation based on validated questionnaires. Therefore, this work could considerably improve the state-of-the-art.

11.2 From Prototype to Continuously Integrated System

The system that was presented in this thesis is a prototype. Not every individual component was continuously integrated with all others to provide a complete solution. However, state-of-the-art methods can be used to fully implement the whole prototypical pipeline and to generate a closed system implementation.

First of all, all individual modules are available open-sourced on GitHub (see respective chapters). Next, the following systems are already able to interoperate seamlessly. The Everywhere-ECG streams Bluetooth-based real-time sampled signals of the ECG to the Hearty application. Within the Hearty app, it is possible to connect to the device that was integrated into the SensorLib, which is basically the back-end of Hearty's hardware connectivity driver. The SensorLib thus provides the raw signal samples to the application, which uses the integrated JELY to process the raw signal and extract QRS complexes and R-peaks. The found R-peaks are then propagated as system messages within the Android system. Any app on the device can subscribe to those messages and receive real-time RR-interval information, optionally with an arrhythmia label for each heartbeat. This beat information can be e.g.
picked up by the Unity implementation of the virtual environments presented in this thesis. In the VE of chapter 7 it was fully implemented and the environment reacts to the changes in the heart-beat as well as displaying it. Similarly this could be easily implemented into the VE of the Stroop Room and then be used with the classifier from the last chapter, which is the only part of this work that is not yet fully implemented in an open source project, to reliably classify stress reactivity for any subject completely seamlessly and quite unobtrusively – compared to a TSST.
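The message flow described above, from detected R-peaks to subscribers that receive RR intervals, can be sketched as a small publish/subscribe model. The class and method names below are hypothetical and do not correspond to the SensorLib, JELY, or Android APIs; the sketch only illustrates the data flow.

```python
from typing import Callable, List, Optional

# A subscriber receives the RR interval in ms and an optional arrhythmia label.
BeatCallback = Callable[[float, Optional[str]], None]


class BeatBroadcaster:
    """Turns a stream of R-peak timestamps into RR-interval messages."""

    def __init__(self) -> None:
        self._subscribers: List[BeatCallback] = []
        self._last_peak_s: Optional[float] = None

    def subscribe(self, callback: BeatCallback) -> None:
        """Register a consumer, e.g. a virtual environment or a classifier."""
        self._subscribers.append(callback)

    def on_r_peak(self, timestamp_s: float, label: Optional[str] = None) -> None:
        """Called for every detected R-peak; publishes the RR interval."""
        if self._last_peak_s is not None:
            rr_ms = (timestamp_s - self._last_peak_s) * 1000.0
            for callback in self._subscribers:
                callback(rr_ms, label)
        # The first peak only initializes the reference timestamp.
        self._last_peak_s = timestamp_s


if __name__ == "__main__":
    broadcaster = BeatBroadcaster()
    broadcaster.subscribe(lambda rr, label: print(f"RR = {rr:.0f} ms, label = {label}"))
    for peak_time in (0.0, 0.8, 1.65):
        broadcaster.on_r_peak(peak_time)
```

In the actual system, the role of the broadcaster is played by Android system messages, and a subscriber would be, for instance, the Unity-based virtual environment.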


From a prototypical point of view, an integrated system can be built and deployed with off-the-shelf components: a wearable ECG-enabled device (section 3.1 and section 3.2 list all currently available ones) and a current smartphone with the Hearty app installed, placed inside a smartphone-enabled VR headset running the Stroop Room.

12 Conclusions & Outlook

The main conclusions of this work are the following. An ECG can be recorded everywhere, even underwater, with no insulation of electrodes necessary, using the proposed current-based circuit system. In order to use the ECG information to classify stress, accurate R-peak refinement is desired to allow dependable HRV determination. Such HRV parameter derivation on mobile or wearable devices can be done accurately and with small energy consumption using the algorithms proposed in this work. The Stroop Room is a novel approach to implement a 3D-VR enhanced and demanding variant of the Stroop Test. HRV parameters, mainly of the incongruent trial of the Stroop Room and especially the reaction to mistakes, can be used to accurately classify the stress habituation type of a user as habituating or non-habituating. This finding needs to be backed by further evaluations in the future, as data is too limited to draw final conclusions on this topic. More data is needed to fully explore the possibility of stress responder classification. However, it looks promising that in the future, this method could be used as a screening tool and maybe at some point even as a recommender system in the healthcare landscape around stress.

Last, but not least, the thesis and its evaluations of relaxation and stress in different environments spawned a three-part extensive dataset of ECG recordings in those conditions. From this set, the conclusive recommendation can be drawn to use mainly pN50 and HR features in the HRV domain for robust stress classification, due to their small variability at peak stress levels (Figure 11.1), reliable extraction even in moving conditions, and, in particular for the pN50, its similarity of characteristics between men and women.

In the future, apps and wearables for polling the stress states will allow assessment of millions of people, which will provide new insights into stress patterns [16].
This work was designed as a step in this direction. Not only is the stressor itself more approachable and open source, its playful interaction with the user shows that a stress test can actually be designed to be likable by users, so that the motivation to perform it repeatedly is much higher than for classical stress tests in the laboratory. This can tackle a gap in current psychophysiological research: the small amount of data available for stress reactions on repeated stressor exposure.

Current research shows that even though the ECG system in this prototype is still mandatory, this requirement could be removed in the future, either by classification purely based on user feedback and movement patterns, or by deriving solid HRV signals from headset sensors directly. There are two approaches which can already do something similar. Hernandez et al. already showed in 2015 that HR can be derived from smartphone motion sensors alone [123]. These kinds of IMU sensors are also present in typical consumer VR headsets and controllers. Therefore, the same algorithm can be applied to these signals. With further improvements to such algorithms, it might be feasible to calculate some of the relevant HRV signals reliably enough to input them into a similar machine learning system. The other approach is the use of electrodes or PPG sensors integrated into the headset. The feasibility of this was already demonstrated in 2017 by a startup called Neurable [384] and in [385]. Since the ECG is faintly visible in the EEG (signal amplitude at head electrodes is about 10x lower than with chest electrodes [385]) and can easily be extracted as a separate signal, an HRV determination would be possible. Regarding PPG sensors, even though the correlation to ECG-derived HR signals is dependent on the sensor location, signal quality, and absence of movement artifacts, further developments in this area could provide a more reliable signal for HRV data.

It is likely that the predictions of the World Health Organization are true, and stress will become one of the biggest concerns to the healthcare system in this still young decade. The fact that the COVID-19 crisis is currently overshadowing all other healthcare problems does not mean that they are gone or less important. Maybe, with home-office use now becoming an integral part of work society, this in-the-wild and in-the-home usable stressor hits a necessary sweet spot at the right time.
It is hoped that this work can at least provide scientists with some new insights into the direction of in-the-wild laboratories, in particular regarding stress reactivity, but also into how virtual reality can be used in innovative ways to support screening and assessment methodologies that are still firmly in the hands of stationary university and hospital laboratories. The “citizen scientist” will play an increasingly important role in the progress of future science [386]. As more people become interested in and knowledgeable about their own biosignals and quantified-self data, their willingness to support research with these data will likely increase as well. Finally, something good might even come out of enterprise companies recklessly collecting user data and users consequently becoming less and less careful about sharing their own data. Just a few days before this thesis was submitted, the German Robert Koch Institute, the central information authority for the COVID-19 situation in Germany, released an app that users can download to submit all their HR and movement data from smart wearables in order to track the spread of the coronavirus [387]. This is an unprecedented move by a German government-related authority and clearly shows how fast things can change today. The German government has also recently launched a dedicated campaign to promote citizen science, with its own internet platform¹ and a multitude of funding opportunities [388]. Still, a lot more research and thought will be required, for example on the ethical implications of applying platforms like the one presented in this work to a very large number of people in the general public. Such large-scale use, even though very much needed, cannot and should not be pursued as narrow-mindedly and recklessly as it is by large companies or research institutes with commercial interests. I will not pursue this discussion further in this work, however, and leave it to my fellow scientists in their fields of expertise.

¹ https://www.buergerschaffenwissen.de (visited on 2020-02-15).


Bibliography

[1] R. S. Lazarus. Stress and Emotion: A New Synthesis. New York, NY: Springer Publishing Company, 2006. 342 pp.
[2] P. Zhou et al. “A Pneumonia Outbreak Associated with a New Coronavirus of Probable Bat Origin.” Nature 579.7798 (Mar. 2020), pp. 270–273.
[3] S. Jiang, Z. Shi, Y. Shu, J. Song, G. F. Gao, W. Tan, and D. Guo. “A Distinct Name Is Needed for the New Coronavirus.” The Lancet 395.10228 (Mar. 2020), p. 949.
[4] S. Gradl, M. Wirth, R. Richer, N. Rohleder, and B. M. Eskofier. “An Overview of the Feasibility of Permanent, Real-Time, Unobtrusive Stress Measurement with Current Wearables.” Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare - PervasiveHealth’19. The 13th EAI International Conference. Trento, Italy: ACM Press, 2019, pp. 360–365.
[5] B. Beach. “Ageing Populations and Changing Worlds of Work.” Maturitas 78.4 (Aug. 2014), pp. 241–242.
[6] J. Goh, J. Pfeffer, and S. A. Zenios. “The Relationship Between Workplace Stressors and Mortality and Health Costs in the United States.” Management Science 62.2 (Feb. 2016), pp. 608–628.
[7] J. Pfeffer. Dying for a Paycheck: How Modern Management Harms Employee Health and Company Performance and What We Can Do about It. HarperCollins Publishers, 2018. 258 pp.
[8] A. Alberdi, A. Aztiria, and A. Basarab. “Towards an Automatic Early Stress Recognition System for Office Environments Based on Multimodal Measurements: A Review.” Journal of Biomedical Informatics 59 (Feb. 2016), pp. 49–75.
[9] M. Weiser. “The Computer for the 21st Century.” Scientific American (1991), p. 13.
[10] J. M. Peake, G. Kerr, and J. P. Sullivan. “A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations.” Frontiers in Physiology 9 (June 2018).


[11] S. Gradl, M. Zrenner, D. Schuldhaus, M. Wirth, T. Cegielny, C. Zwick, and B. M. Eskofier. “Movement Speed Estimation Based on Foot Acceleration Patterns.” Conf Proc IEEE Eng Med Biol Soc. Honolulu, HI, USA, 2018, p. 4.
[12] S. Mann, J. Nolan, and B. Wellman. “Sousveillance: Inventing and Using Wearable Computing Devices for Data Collection in Surveillance Environments.” Surveillance & Society 1.3 (Sept. 2002), pp. 331–355.
[13] K. Aizawa, D. Tancharoen, S. Kawasaki, and T. Yamasaki. “Efficient Retrieval of Life Log Based on Context and Content.” Proceedings of the 1st ACM Workshop on Continuous Archival and Retrieval of Personal Experiences - CARPE’04. The 1st ACM Workshop. New York, New York, USA: ACM Press, 2004, p. 22.
[14] M. Swan. “Sensor Mania! The Internet of Things, Wearable Computing, Objective Metrics, and the Quantified Self 2.0.” Journal of Sensor and Actuator Networks 1.3 (Nov. 2012), pp. 217–253.
[15] P. B. Shull, W. Jirattigalachote, M. A. Hunt, M. R. Cutkosky, and S. L. Delp. “Quantified Self and Human Movement: A Review on the Clinical Impact of Wearable Sensing and Feedback for Gait Analysis and Intervention.” Gait & Posture 40.1 (May 2014), pp. 11–19.
[16] T. S. Lorig. “The Respiratory System.” Handbook of Psychophysiology. Ed. by J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson. 4th ed. Cambridge University Press, Dec. 2016, pp. 244–257.
[17] H. Selye. “The General Adaptation Syndrome and the Disease of Adaptation.” The Journal of Clinical Endocrinology 6.2 (1946), pp. 117–230.
[18] H. Selye. “Confusion and Controversy in the Stress Field.” Journal of Human Stress 1.2 (June 1975), pp. 37–44.
[19] S. S. Dickerson, T. L. Gruenewald, and M. E. Kemeny. “When the Social Self Is Threatened: Shame, Physiology, and Health.” Journal of Personality 72.6 (Dec. 2004), pp. 1191–1216.
[20] M. R. Salleh. “Life Event, Stress and Illness.” The Malaysian journal of medical sciences: MJMS 15.4 (Oct. 2008), pp. 9–18.


[21] N. Rohleder. “Stimulation of Systemic Low-Grade Inflammation by Psychosocial Stress.” Psychosomatic Medicine 76.3 (Apr. 2014), pp. 181–189.
[22] J. Siegrist and J. Li. “Work Stress and Altered Biomarkers: A Synthesis of Findings Based on the Effort–Reward Imbalance Model.” International Journal of Environmental Research and Public Health 14.11 (Nov. 2017), p. 1373.
[23] W. B. Cannon. “The William Henry Welch Lectures. I. Some New Aspects of Homeostasis.” Journal of Mount Sinai Hospital 5 (1939), pp. 587–597.
[24] H. Selye. The Stress of Life. McGraw-Hill Book Co., 1956.
[25] World Health Organization. ICD-10: International Statistical Classification of Diseases and Related Health Problems. Geneva: World Health Organization, 2011.
[26] World Health Organization, ed. The ICD-10 Classification of Mental and Behavioural Disorders: Clinical Descriptions and Diagnostic Guidelines. Geneva: World Health Organization, 1992. 362 pp.
[27] G. Fink. Stress: The Health Epidemic of the 21st Century | SciTech Connect. Apr. 2016. URL: http://scitechconnect.elsevier.com/stress-health-epidemic-21st-century/ (visited on 12/17/2019).
[28] E. Brondolo, K. Byer, P. J. Gianaros, C. Liu, A. A. Prather, K. Thomas, C. L. Woods-Giscombé, L. A. Beatty, P. DiSandro, and G. P. Keita. Stress and Health Disparities: Contexts, Mechanisms, and Interventions Among Racial/Ethnic Minority and Low Socioeconomic Status Populations: (500202018-001). American Psychological Association, 2017.
[29] J. Hassard, K. R. H. Teoh, G. Visockaite, P. Dewe, and T. Cox. “The Cost of Work-Related Stress to Society: A Systematic Review.” Journal of Occupational Health Psychology 23.1 (Jan. 2018), pp. 1–17.
[30] G. Soleil. Workplace Stress: The Health Epidemic of the 21st Century. Dec. 2017. URL: https://www.huffpost.com/entry/workplace-stress-the-heal_b_8923678 (visited on 12/17/2019).


[31] Stress, the “Health Epidemic of the 21st Century”. Apr. 2019. URL: https://hcatodayblog.com/2019/04/30/stress-the-health-epidemic-of-the-21st-century/ (visited on 12/17/2019).
[32] S. Gradl, H. Leutheuser, P. Kugler, T. Biermann, S. Kreil, J. Kornhuber, M. Bergner, and B. Eskofier. “Somnography Using Unobtrusive Motion Sensors and Android-Based Mobile Phones.” 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Osaka: IEEE, July 2013, pp. 1182–1185.
[33] J. Hassard, K. Teoh, T. Cox, P. Dewe, M. Cosmar, K. Van den Broek, R. Gründler, D. Flemming, European Agency for Safety and Health at Work, TC-OSH, Birkbeck College University of London (BBK), D. Robert Gründler and Danny Flemming, and Prevent. Calculating the Costs of Work-Related Stress and Psychosocial Risks: Literature Review. Luxembourg: Publications Office, 2014. URL: http://dx.publications.europa.eu/10.2802/20493 (visited on 06/24/2019).
[34] World Health Organization. Mental Health: Fact Sheet. 2019. URL: http://www.euro.who.int/__data/assets/pdf_file/0004/404851/MNH_FactSheet_ENG.pdf (visited on 02/29/2020).
[35] Scopus Webpage. 2019. URL: https://www.scopus.com/ (visited on 09/11/2019).
[36] G. Giannakakis, D. Grigoriadis, K. Giannakaki, O. Simantiraki, A. Roniotis, and M. Tsiknakis. “Review on Psychological Stress Detection Using Biosignals.” IEEE Transactions on Affective Computing (2019), pp. 1–1.
[37] B. S. McEwen. “Stress and the Individual: Mechanisms Leading to Disease.” Archives of Internal Medicine 153.18 (Sept. 1993), p. 2093.
[38] N. Rohleder. “Stress and Inflammation – The Need to Address the Gap in the Transition between Acute and Chronic Stress Effects.” Psychoneuroendocrinology 105 (July 2019), pp. 164–171.
[39] C. Kirschbaum, K.-M. Pirke, and D. H. Hellhammer. “The ‘Trier Social Stress Test’ – A Tool for Investigating Psychobiological Stress Responses in a Laboratory Setting.” Neuropsychobiology 28.1-2 (1993), pp. 76–81.


[40] L. Schwabe, L. Haddad, and H. Schachinger. “HPA Axis Activation by a Socially Evaluated Cold-Pressor Test.” Psychoneuroendocrinology 33.6 (July 2008), pp. 890–895.
[41] C. M. MacLeod. “Half a Century of Research on the Stroop Effect: An Integrative Review.” Psychological Bulletin 109.2 (1991), pp. 163–203.
[42] F. Scarpina and S. Tagini. “The Stroop Color and Word Test.” Frontiers in Psychology 8.557 (Apr. 2017).
[43] S. Gradl, M. Wirth, N. Machtlinger, R. Poguntke, A. Wonner, N. Rohleder, and B. M. Eskofier. “The Stroop Room: A Virtual Reality-Enhanced Stroop Test.” 25th ACM Symposium on Virtual Reality Software and Technology - VRST ’19. 25th ACM Symposium on Virtual Reality Software and Technology. Parramatta, NSW, Australia: ACM Press, 2019, pp. 1–12.
[44] C. M. MacLeod and P. A. MacDonald. “Interdimensional Interference in the Stroop Effect: Uncovering the Cognitive and Neural Anatomy of Attention.” Trends in Cognitive Sciences 4.10 (Oct. 2000), pp. 383–391.
[45] E. M. Altmann, J. G. Trafton, and D. Z. Hambrick. “Momentary Interruptions Can Derail the Train of Thought.” Journal of Experimental Psychology: General 143.1 (2014), pp. 215–226.
[46] G. Mark, S. T. Iqbal, M. Czerwinski, P. Johns, A. Sano, and Y. Lutchyn. “Email Duration, Batching and Self-Interruption: Patterns of Email Use on Productivity and Stress.” Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. CHI’16: CHI Conference on Human Factors in Computing Systems. San Jose California USA: ACM, May 2016, pp. 1717–1728.
[47] Udemy. 2018 Workplace Distraction Report. 2018. URL: https://research.udemy.com/research_report/udemy-depth-2018-workplace-distraction-report/ (visited on 04/01/2020).
[48] R. S. Ulrich, R. F. Simons, B. D. Losito, E. Fiorito, M. A. Miles, and M. Zelson. “Stress Recovery during Exposure to Natural and Urban Environments.” Journal of Environmental Psychology 11.3 (Sept. 1991), pp. 201–230.
[49] J. Heerwagen. “Green Buildings, Organizational Success and Occupant Productivity.” Building Research & Information 28.5-6 (Sept. 2000), pp. 353–367.


[50] J. Kabat-Zinn. Full Catastrophe Living: Using the Wisdom of Your Body and Mind to Face Stress, Pain, and Illness. In collab. with U. of Massachusetts Medical Center/Worcester. Delta trade pbk. reissue. New York, N.Y: Delta Trade Paperbacks, 2005. 471 pp.
[51] S. R. Kellert, J. Heerwagen, and M. Mador, eds. Biophilic Design: The Theory, Science, and Practice of Bringing Buildings to Life. Hoboken, N.J: Wiley, 2008. 385 pp.
[52] B. L. Fredrickson. “Cultivating Positive Emotions to Optimize Health and Well-Being.” Prevention & Treatment 3.1 (2000).
[53] K. Gillis and B. Gatersleben. “A Review of Psychological Literature on the Health and Wellbeing Benefits of Biophilic Design.” Buildings 5.3 (Aug. 2015), pp. 948–963.
[54] S. Gradl, T. Cibis, J. Lauber, R. Richer, R. Rybalko, N. Pfeiffer, H. Leutheuser, M. Wirth, V. von Tscharner, and B. M. Eskofier. “Wearable Current-Based ECG Monitoring System with Non-Insulated Electrodes for Underwater Application.” Applied Sciences 7.12 (Dec. 2017), p. 1277.
[55] S. Gradl, P. Kugler, C. Lohmüller, and B. M. Eskofier. “Real-Time ECG Monitoring and Arrhythmia Detection Using Android-Based Mobile Devices.” 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2012 34th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). San Diego, CA, USA: IEEE, Aug. 2012, pp. 2452–2455.
[56] S. Gradl, H. Leutheuser, M. Elgendi, N. Lang, and B. M. Eskofier. “Temporal Correction of Detected R-Peaks in ECG Signals: A Crucial Step to Improve QRS Detection Algorithms.” 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Milan: IEEE, Aug. 2015, pp. 522–525.
[57] R. Poguntke, M. Wirth, and S. Gradl. “Same Same but Different: Exploring the Effects of the Stroop Color Word Test in Virtual Reality.” Human-Computer Interaction – INTERACT 2019. Ed. by D. Lamas, F. Loizides, L. Nacke, H. Petrie, M. Winckler, and P. Zaphiris. Vol. 11747. Cham: Springer International Publishing, 2019, pp. 699–708.


[58] GitHub, Inc. Github. Feb. 2020. URL: https://github.com (visited on 02/26/2020).
[59] S. Gradl, M. Wirth, T. Zillig, and B. M. Eskofier. “Visualization of Heart Activity in Virtual Reality: A Biofeedback Application Using Wearable Sensors.” 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). Las Vegas, NV: IEEE, Mar. 2018, pp. 152–155.
[60] H. Leutheuser, S. Gradl, P. Kugler, L. Anneken, M. Arnold, S. Achenbach, and B. M. Eskofier. “Comparison of Real-Time Classification Systems for Arrhythmia Detection on Android-Based Mobile Devices.” 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Chicago, IL: IEEE, Aug. 2014, pp. 2690–2693.
[61] N. R. Lang, M. Brischwein, E. Hasslmeyer, D. Tantinger, S. Feilner, A. Heinrich, H. Leutheuser, S. Gradl, C. Weigand, B. Eskofier, and M. Struck. “Filter and Processing Method to Improve R-Peak Detection for ECG Data with Motion Artefacts from Wearable Systems.” 2015 Computing in Cardiology Conference (CinC). 2015 Computing in Cardiology Conference (CinC). Nice, France: IEEE, Sept. 2015, pp. 917–920.
[62] H. Leutheuser, S. Gradl, B. M. Eskofier, A. Tobola, N. Lang, L. Anneken, M. Arnold, and S. Achenbach. “Arrhythmia Classification Using RR Intervals: Improvement with Sinusoidal Regression Feature.” 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). Cambridge, MA, USA: IEEE, June 2015, pp. 1–5.
[63] S. Gradl, B. M. Eskofier, D. Eskofier, C. Mutschler, and S. Otto. “Virtual and Augmented Reality in Sports - An Overview and Acceptance Study.” Proceedings of the 2016 ACM International Joint Conference on Pervasive and Ubiquitous Computing Adjunct - UbiComp ’16. The 2016 ACM International Joint Conference. Heidelberg, Germany: ACM Press, 2016, pp. 885–888.


[64] H. Leutheuser, S. Gradl, L. Anneken, M. Arnold, N. Lang, S. Achenbach, and B. M. Eskofier. “Instantaneous P- and T-Wave Detection: Assessment of Three ECG Fiducial Points Detection Algorithms.” 2016 IEEE 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN). 2016 IEEE 13th International Conference on Wearable and Implantable Body Sensor Networks (BSN). San Francisco, CA, USA: IEEE, June 2016, pp. 329–334.
[65] M. Altini, P. Mullan, M. Rooijakkers, S. Gradl, J. Penders, N. Geusens, L. Grieten, and B. Eskofier. “Detection of Fetal Kicks Using Body-Worn Accelerometers during Pregnancy: Trade-Offs between Sensors Number and Positioning.” 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2016 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Orlando, FL, USA: IEEE, Aug. 2016, pp. 5319–5322.
[66] H. Leutheuser, N. R. Lang, S. Gradl, M. Struck, A. Tobola, C. Hofmann, L. Anneken, and B. M. Eskofier. “Textile Integrated Wearable Technologies for Sports and Medical Applications.” Smart Textiles. Ed. by S. Schneegass and O. Amft. Cham: Springer International Publishing, 2017, pp. 359–382.
[67] A. Stefke, F. Wilm, R. Richer, S. Gradl, B. M. Eskofier, C. Forster, and B. Namer. ““MigraineMonitor” – Towards a System for the Prediction of Migraine Attacks Using Electrostimulation.” Current Directions in Biomedical Engineering 4.1 (2018), pp. 629–632.
[68] M. Zrenner, S. Gradl, U. Jensen, M. Ullrich, and B. Eskofier. “Comparison of Different Algorithms for Calculating Velocity and Stride Length in Running Using Inertial Measurement Units.” Sensors 18.12 (Nov. 2018), p. 4194.
[69] M. Wirth, S. Gradl, D. Poimann, H. Schaefke, J. Matlok, H. Koerger, and B. M. Eskofier. “Assessment of Perceptual-Cognitive Abilities among Athletes in Virtual Environments: Exploring Interaction Concepts for Soccer Players.” Proceedings of the 2018 on Designing Interactive Systems Conference 2018 - DIS ’18. The 2018. Hong Kong, China: ACM Press, 2018, pp. 1013–1023.


[70] M. Wirth, S. Gradl, D. Poimann, R. Richer, J. Ottmann, and B. M. Eskofier. “Exploring the Feasibility of EMG Based Interaction for Assessing Cognitive Capacity in Virtual Reality.” 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Honolulu, HI: IEEE, July 2018, pp. 4953–4956.
[71] M. Wirth, S. Gradl, J. Sembdner, S. Kuhrt, and B. M. Eskofier. “Evaluation of Interaction Techniques for a Virtual Reality Reading Room in Diagnostic Radiology.” The 31st Annual ACM Symposium on User Interface Software and Technology - UIST ’18. The 31st Annual ACM Symposium. Berlin, Germany: ACM Press, 2018, pp. 867–876.
[72] L. Abel, R. Richer, A. Küderle, S. Gradl, B. M. Eskofier, and N. Rohleder. “Classification of Acute Stress-Induced Response Patterns.” Proceedings of the 13th EAI International Conference on Pervasive Computing Technologies for Healthcare - PervasiveHealth’19. The 13th EAI International Conference. Trento, Italy: ACM Press, 2019, pp. 366–370.
[73] T. Feigl, D. Roth, S. Gradl, M. Wirth, M. E. Latoschik, B. M. Eskofier, M. Philippsen, and C. Mutschler. “Sick Moves! Motion Parameters as Indicators of Simulator Sickness.” IEEE Transactions on Visualization and Computer Graphics 25.11 (Nov. 2019), pp. 3146–3157.
[74] G. G. Berntson, K. S. Quigley, G. J. Norman, and D. L. Lozano. “Cardiovascular Psychophysiology.” Handbook of Psychophysiology. Ed. by J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson. 4th ed. Cambridge University Press, Dec. 2016, pp. 183–216.
[75] M. Malik, J. Thomas Bigger, A. John Camm, Robert E. Kleiger, Alberto Malliani, Arthur J. Moss, and Peter J. Schwartz. “Heart Rate Variability: Standards of Measurement, Physiological Interpretation and Clinical Use. (Task Force of the European Society of Cardiology and the North American Society of Pacing and Electrophysiology).” Circulation 93.5 (1996), pp. 1043–1065.
[76] J. Jerald. The VR Book: Human-Centered Design for Virtual Reality. First edition. ACM Books 8. New York: acm, Association for Computing Machinery, 2016. 599 pp.


[77] H. Selye. “Stress without Distress.” Psychopathology of Human Adaptation. May 1976, pp. 137–146.
[78] G. P. Chrousos. “Stressors, Stress, and Neuroendocrine Integration of the Adaptive Response: The 1997 Hans Selye Memorial Lecture.” Annals of the New York Academy of Sciences 851.1 (June 1998), pp. 311–335.
[79] S. Folkman, R. S. Lazarus, C. Dunkel-Schetter, A. DeLongis, and R. J. Gruen. “Dynamics of a Stressful Encounter: Cognitive Appraisal, Coping, and Encounter Outcomes” (1986), p. 12.
[80] R. S. Lazarus. “From Psychological Stress to the Emotions: A History of Changing Outlooks” (1993), p. 22.
[81] J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson, eds. Handbook of Psychophysiology. 4th ed. Cambridge University Press, Dec. 2016.
[82] S. S. Dickerson and M. E. Kemeny. “Acute Stressors and Cortisol Responses: A Theoretical Integration and Synthesis of Laboratory Research.” Psychological Bulletin 130.3 (2004), pp. 355–391.
[83] A. Feder, E. J. Nestler, and D. S. Charney. “Psychobiology and Molecular Genetics of Resilience.” Nature Reviews Neuroscience 10.6 (June 2009), pp. 446–457.
[84] D. McDuff, S. Gontarek, and R. Picard. “Remote Measurement of Cognitive Stress via Heart Rate Variability.” 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society. 2014 36th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Chicago, IL: IEEE, Aug. 2014, pp. 2957–2960.
[85] Y. Noto, T. Sato, M. Kudo, K. Kurata, and K. Hirota. “The Relationship Between Salivary Biomarkers and State-Trait Anxiety Inventory Score Under Mental Arithmetic Stress: A Pilot Study.” Anesthesia & Analgesia (Dec. 2005), pp. 1873–1876.
[86] D. A. Granger, K. T. Kivlighan, M. El-Sheikh, E. B. Gordis, and L. R. Stroud. “Salivary α-Amylase in Biobehavioral Research: Recent Developments and Applications.” Annals of the New York Academy of Sciences 1098.1 (Mar. 2007), pp. 122–144.


[87] U. Nater and N. Rohleder. “Salivary Alpha-Amylase as a Non-Invasive Biomarker for the Sympathetic Nervous System: Current State of Research.” Psychoneuroendocrinology 34.4 (May 2009), pp. 486–496.
[88] T. Smeets, S. Cornelisse, C. W. Quaedflieg, T. Meyer, M. Jelicic, and H. Merckelbach. “Introducing the Maastricht Acute Stress Test (MAST): A Quick and Non-Invasive Approach to Elicit Robust Autonomic and Glucocorticoid Stress Responses.” Psychoneuroendocrinology 37.12 (Dec. 2012), pp. 1998–2008.
[89] N. Hjortskov, D. Rissén, A. K. Blangsted, N. Fallentin, U. Lundberg, and K. Søgaard. “The Effect of Mental Stress on Heart Rate Variability and Blood Pressure during Computer Work.” European Journal of Applied Physiology 92.1-2 (June 2004), pp. 84–89.
[90] B. S. McEwen. “Protective and Damaging Effects of Stress Mediators.” The New England Journal of Medicine 338 (1998), pp. 171–179.
[91] M. E. Dawson, A. M. Schell, and D. L. Filion. “The Electrodermal System.” Handbook of Psychophysiology. Ed. by J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson. 4th ed. Cambridge University Press, Dec. 2016, pp. 217–243.
[92] M. van Dooren, J. J. G. de Vries, and J. H. Janssen. “Emotional Sweating across the Body: Comparing 16 Different Skin Conductance Measurement Locations.” Physiology & Behavior 106.2 (May 2012), pp. 298–304.
[93] E. Smets, W. De Raedt, and C. Van Hoof. “Into the Wild: The Challenges of Physiological Stress Detection in Laboratory and Ambulatory Settings.” IEEE Journal of Biomedical and Health Informatics 23.2 (2018), pp. 463–473.
[94] N. Sharma and T. Gedeon. “Objective Measures, Sensors and Computational Techniques for Stress Recognition and Classification: A Survey.” Computer Methods and Programs in Biomedicine 108.3 (Dec. 2012), pp. 1287–1301.
[95] J. Kempfle and K. V. Laerhoven. “Respiration Rate Estimation with Depth Cameras: An Evaluation of Parameters” (2018), p. 10.


[96] F. Seoane, I. Mohino-Herranz, J. Ferreira, L. Alvarez, R. Buendia, D. Ayllón, C. Llerena, and R. Gil-Pita. “Wearable Biomedical Measurement Systems for Assessment of Mental Stress of Combatants in Real Time.” Sensors 14.4 (Apr. 2014), pp. 7120–7141.
[97] A. Tonacci, F. Sansone, A. P. Pala, and R. Conte. “Exhaled Breath Analysis in Evaluation of Psychological Stress: A Short Literature Review.” International Journal of Psychology (May 2018).
[98] S. D. Aeschliman, M. S. Blue, K. B. Williams, C. M. Cobb, and S. R. MacNeill. “A Preliminary Study on Oxygen Saturation Levels of Patients During Periodontal Surgery with and without Oral Conscious Sedation Using Diazepam.” Journal of Periodontology 74.7 (July 2003), pp. 1056–1059.
[99] M. Taj-Eldin, C. Ryan, B. O’Flynn, and P. Galvin. “A Review of Wearable Solutions for Physiological and Emotional Monitoring for Use by People with Autism Spectrum Disorder and Their Caregivers.” Sensors 18.12 (Dec. 2018), p. 4271.
[100] C. H. Vinkers, R. Penning, J. Hellhammer, J. C. Verster, J. H. G. M. Klaessens, B. Olivier, and C. J. Kalkman. “The Effect of Stress on Core and Peripheral Body Temperature in Humans.” Stress 16.5 (Sept. 2013), pp. 520–530.
[101] R. Luijcks, H. J. Hermens, L. Bodar, C. J. Vossen, J. van Os, and R. Lousberg. “Experimentally Induced Stress Validated by EMG Activity.” PLoS ONE 9.4 (Apr. 2014). Ed. by A. Macaluso, e95215.
[102] O. Hidaka, M. Yanagi, and K. Takada. “Mental Stress-Induced Physiological Changes in the Human Masseter Muscle.” Journal of Dental Research 83.3 (Mar. 2004), pp. 227–231.
[103] J. Wijsman, B. Grundlehner, J. Penders, and H. Hermens. “Trapezius Muscle EMG as Predictor of Mental Stress.” ACM Transactions on Computational Logic (2010), pp. 155–163.
[104] R. Richer, N. Zhao, J. Amores, B. M. Eskofier, and J. A. Paradiso. “Real-Time Mental State Recognition Using a Wearable EEG.” 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Honolulu, HI: IEEE, July 2018, pp. 5495–5498.


[105] M. Tanida, M. Katsuyama, and K. Sakatani. “Relation between Mental Stress-Induced Prefrontal Cortex Activity and Skin Conditions: A near-Infrared Spectroscopy Study.” Brain Research 1184 (Dec. 2007), pp. 210–216.
[106] H. van Steenbergen, G. P. H. Band, and B. Hommel. “Threat But Not Arousal Narrows Attention: Evidence from Pupil Dilation and Saccade Control.” Frontiers in Psychology 2 (2011).
[107] G. Giannakakis, M. Pediaditis, D. Manousos, E. Kazantzaki, F. Chiarugi, P. Simos, K. Marias, and M. Tsiknakis. “Stress and Anxiety Detection Using Facial Cues from Videos.” Biomedical Signal Processing and Control 31 (Jan. 2017), pp. 89–101.
[108] A. Zoerner, S. Oertel, M. P. M. Jank, L. Frey, B. Langenstein, and T. Bertsch. “Human Sweat Analysis Using a Portable Device Based on a Screen-Printed Electrolyte Sensor.” Electroanalysis 30.4 (Apr. 2018), pp. 665–671.
[109] P. Meerlo, A. Sgoifo, and D. Suchecki. “Restricted and Disrupted Sleep: Effects on Autonomic Function, Neuroendocrine Stress Systems and Stress Responsivity.” Sleep Medicine Reviews 12.3 (June 2008), pp. 197–210.
[110] M. A. Martens, A. Antley, D. Freeman, M. Slater, P. J. Harrison, and E. M. Tunbridge. “It Feels Real: Physiological Responses to a Stressful Virtual Reality Environment and Its Impact on Working Memory.” Journal of Psychopharmacology 33.10 (Oct. 2019), pp. 1264–1273.
[111] A. C. Pearson. Apple Watch Fails To Notify Patient Of 3 Hour Episode Of Rapid Atrial Fibrillation. Dec. 2018. URL: https://theskepticalcardiologist.com/2018/12/13/apple-watch-fails-to-notify-patient-of-3-hour-episode-of-rapid-atrial-fibrillation/ (visited on 10/12/2019).
[112] J. Wiesel. “Method and Apparatus for Detecting Atrial Fibrillation.” U.S. pat. 9681819 B2. June 2017.
[113] J. Torres. Apple Watch Atrial Fibrillation Feature at the Heart of a Patent Lawsuit. Dec. 2019. URL: https://www.slashgear.com/apple-watch-atrial-fibrillation-feature-at-the-heart-of-a-patent-lawsuit-29604629/ (visited on 01/10/2020).
[114] K.-P. Bethge and B.-D. Gonska. Langzeit-Elektrokardiographie. Berlin, Heidelberg: Springer Berlin Heidelberg, 1992.


[115] J. W. Hurst. “Naming of the Waves in the ECG, With a Brief Account of Their Genesis.” Circulation 98 (1998), pp. 1937–1942.
[116] N. J. Holter and J. A. Gengerelli. “Remote Recording of Physiological Data by Radio.” 46 (1949), pp. 747–751.
[117] S. S. Barold. “Norman J. “Jeff” Holter–“Father” of Ambulatory ECG Monitoring.” Journal of Interventional Cardiac Electrophysiology 14.2 (Nov. 2005), pp. 117–118.
[118] N. J. Holter. “New Method for Heart Studies: Continuous Electrocardiography of Active Subjects over Long Periods Is Now Practical.” Science 134.3486 (Oct. 1961), pp. 1214–1220.
[119] J. S. Steinberg et al. “2017 ISHNE-HRS Expert Consensus Statement on Ambulatory ECG and External Cardiac Monitoring/Telemetry.” Annals of Noninvasive Electrocardiology 22.3 (May 2017), e12447.
[120] J. W. Rohrbaugh. “Ambulatory and Non-Contact Recording Methods.” Handbook of Psychophysiology. Ed. by J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson. 4th ed. Cambridge University Press, Dec. 2016, pp. 300–338.
[121] B. S. Bozhenko. “[Seismocardiography–a new method in the study of functional conditions of the heart].” Terapevticheskii Arkhiv 33 (Sept. 1961), pp. 55–64.
[122] M. Jafari Tadi, E. Lehtonen, A. Saraste, J. Tuominen, J. Koskinen, M. Teräs, J. Airaksinen, M. Pänkäälä, and T. Koivisto. “Gyrocardiography: A New Non-Invasive Monitoring Method for the Assessment of Cardiac Mechanics and the Estimation of Hemodynamic Variables.” Scientific Reports 7.1 (Dec. 2017), p. 6823.
[123] J. Hernandez, D. J. McDuff, and R. W. Picard. “Biophone: Physiology Monitoring from Peripheral Smartphone Motions.” 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2015 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). Milan: IEEE, Aug. 2015, pp. 7180–7183.
[124] M. Zhao, F. Adib, and D. Katabi. “Emotion Recognition Using Wireless Signals.” Proceedings of the 22nd Annual International Conference on Mobile Computing and Networking - MobiCom ’16. The 22nd Annual International Conference. New York City, New York: ACM Press, 2016, pp. 95–108.


[125] Wei Hu, Zhangyan Zhao, Yunfeng Wang, Haiying Zhang, and Fujiang Lin. “Noncontact Accurate Measurement of Cardiopulmonary Activity Using a Compact Quadrature Doppler Radar Sensor.” IEEE Transactions on Biomedical Engineering 61.3 (Mar. 2014), pp. 725–735.
[126] J. Kranjec, S. Beguš, G. Geršak, and J. Drnovšek. “Non-Contact Heart Rate and Heart Rate Variability Measurements: A Review.” Biomedical Signal Processing and Control 13 (Sept. 2014), pp. 102–112.
[127] J. Hernandez, Y. Li, J. M. Rehg, and R. W. Picard. “BioGlass: Physiological Parameter Estimation Using a Head-Mounted Wearable Device.” EAI 4th International Conference on Wireless Mobile Communication and Healthcare (Mobihealth). Athens, Greece, 2014, p. 5.
[128] J. T. Bigger, J. L. Fleiss, R. C. Steinman, L. M. Rolnitzky, W. J. Schneider, and P. K. Stein. “RR Variability in Healthy, Middle-Aged Persons Compared With Patients With Chronic Coronary Heart Disease or Recent Acute Myocardial Infarction.” Circulation 91.7 (Apr. 1995), pp. 1936–1943.
[129] H.-J. Trappe and H.-P. Schuster. EKG-Kurs für Isabel. 6., überarbeitete und erweiterte Auflage. Stuttgart: Georg Thieme Verlag, 2013. 325 pp.
[130] J. E. Mietus and A. L. Goldberger. Heart Rate Variability Analysis with the HRV Toolkit. 2018. URL: https://archive.physionet.org/tutorials/hrv-toolkit/ (visited on 12/15/2019).
[131] H.-G. Kim, E.-J. Cheon, D.-S. Bai, Y. H. Lee, and B.-H. Koo. “Stress and Heart Rate Variability: A Meta-Analysis and Review of the Literature.” Psychiatry Investigation 15.3 (Mar. 2018), pp. 235–245.
[132] F. Yasuma and J.-i. Hayano. “Respiratory Sinus Arrhythmia: Why Does the Heartbeat Synchronize with Respiratory Rhythm?” Chest 125.2 (Feb. 2004), pp. 683–690.
[133] J. Allen. “Photoplethysmography and Its Application in Clinical Physiological Measurement.” Physiological Measurement 28.3 (Mar. 2007), R1–R39.


[134] E. Gil, M. Orini, R. Bailón, J. M. Vergara, L. Mainardi, and P. Laguna. “Photoplethysmography Pulse Rate Variability as a Surrogate Measurement of Heart Rate Variability during Non-Stationary Conditions.” Physiological Measurement 31.9 (Sept. 2010), pp. 1271–1290.
[135] P. Milgram and F. Kishino. “A Taxonomy of Mixed Reality Visual Displays.” IEICE Transactions on Information and Systems E77-D.12 (Dec. 1994), pp. 1321–1329.
[136] J. Jantz, A. Molnar, and R. Alcaide. “A Brain-Computer Interface for Extended Reality Interfaces.” ACM SIGGRAPH 2017 VR Village on - SIGGRAPH ’17. ACM SIGGRAPH 2017 VR Village. Los Angeles, California: ACM Press, 2017, pp. 1–2.
[137] B. Jones and N. Waliczek. WebXR Device API. Oct. 2019. URL: https://www.w3.org/TR/webxr/ (visited on 12/26/2019).
[138] J. Steuer. “Defining Virtual Reality: Dimensions Determining Telepresence.” Journal of Communication 42.4 (Dec. 1992), pp. 73–93.
[139] J. Conditt. Facebook Buys Oculus VR. Mar. 2014. URL: https://www.engadget.com/2014/03/25/facebook-buys-oculus-vr/ (visited on 12/19/2019).
[140] Why Virtual Reality Is About to Change the World. Aug. 2015. URL: https://time.com/3987022/why-virtual-reality-is-about-to-change-the-world/ (visited on 12/19/2019).
[141] J. Conditt. The Surprising Joy of Time’s Virtual Reality Cover Starring Palmer Luckey. Aug. 2015. URL: https://www.engadget.com/2015/08/06/time-vr-palmer-luckey-cover-photoshop/ (visited on 12/19/2019).
[142] M. Bolas, P. Hoberman, Thai Phan, P. Luckey, J. Iliff, N. Burba, I. McDowall, and D. M. Krum. “Open Virtual Reality.” 2013 IEEE Virtual Reality (VR). 2013 IEEE Virtual Reality (VR). Lake Buena Vista, FL: IEEE, Mar. 2013, pp. 183–184.
[143] I. E. Sutherland. “A Head-Mounted Three Dimensional Display.” Proceedings of the December 9-11, 1968, Fall Joint Computer Conference, Part I on - AFIPS ’68 (Fall, Part I). The December 9-11, 1968, Fall Joint Computer Conference, Part I. San Francisco, California: ACM Press, 1968, p. 757.


[144] T. Starner. “Project Glass: An Extension of the Self.” IEEE Pervasive Computing 12.2 (Apr. 2013), pp. 14–16.
[145] J. Bailenson, K. Patel, A. Nielsen, R. Bajscy, S.-H. Jung, and G. Kurillo. “The Effect of Interactivity on Learning Physical Actions in Virtual Reality.” Media Psychology 11.3 (Sept. 2008), pp. 354–376.
[146] M. Casale. “STRIVR Training Demonstrates Faster and More Accurate Learning Compared to Traditional Study Methods.” 2017.
[147] J. Amores, X. Benavides, and P. Maes. “PsychicVR: Increasing Mindfulness by Using Virtual Reality and Brain Computer Interfaces.” Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems - CHI EA ’16. The 2016 CHI Conference Extended Abstracts. Santa Clara, California, USA: ACM Press, 2016, pp. 2–2.
[148] T. F. Wechsler, F. Kümpers, and A. Mühlberger. “Inferiority or Even Superiority of Virtual Reality Exposure Therapy in Phobias?—A Systematic Review and Quantitative Meta-Analysis on Randomized Controlled Trials Specifically Comparing the Efficacy of Virtual Reality Exposure to Gold Standard in Vivo Exposure in Agoraphobia, Specific Phobia, and Social Phobia.” Frontiers in Psychology 10 (Sept. 2019), p. 1758.
[149] J. Blascovich, J. Loomis, A. C. Beall, K. R. Swinth, C. L. Hoyt, and J. N. Bailenson. “Immersive Virtual Environment Technology as a Methodological Tool for Social Psychology.” Psychological Inquiry 13.2 (Apr. 2002), pp. 103–124.
[150] J. Brookes, M. Warburton, M. Alghadier, M. Mon-Williams, and F. Mushtaq. “Studying Human Behavior with Virtual Reality: The Unity Experiment Framework.” Behavior Research Methods (Apr. 2019).
[151] M. Slater and S. Wilbur. “A Framework for Immersive Virtual Environments Speculations on the Role of Presence in Virtual Environments.” Presence 6.6 (1997), pp. 603–616.
[152] J. J. Cummings and J. N. Bailenson. “How Immersive Is Enough? A Meta-Analysis of the Effect of Immersive Technology on User Presence.” Media Psychology 19.2 (Apr. 2016), pp. 272–309.


[153] B. King and J. Borland. Dungeons & Dreamers: A Story of How Computer Games Created a Global Community. Second edition. Pittsburgh, PA: ETC Press, 2014. 277 pp.
[154] S. Miklaucic. John Carmack - Biography & Facts. 2019. URL: https://www.britannica.com/biography/John-Carmack (visited on 12/20/2019).
[155] A. Chalk. John Carmack Says He’s Not ‘satisfied with the Pace of Progress’ in VR Development. Nov. 2019. URL: https://www.pcgamer.com/john-carmack-says-hes-not-satisfied-with-the-pace-of-progress-in-vr-development/ (visited on 01/10/2020).
[156] C. Zellweger, B. E. Barberis, C. S. Kim, C. S. I. Conlee, and B. K. N. Robertson. “Head Mounted Display.” U.S. pat. D761258 S. HTC Corporation. July 2016.
[157] VIVE Pro Starter Kit. 2020. URL: https://www.vive.com/us/product/vive-pro-starter-kit/ (visited on 04/09/2020).
[158] P. Luckey, B. I. Trexler, G. England, and J. McCauley. “Virtual Reality Headset.” U.S. pat. D701206S1. Oculus VR, Inc. Mar. 2014.
[159] Y. C. Poon, R. James, and J. Wu. “Hololens Light Engine with Linear Array Imagers and Mems.” Pat. WO 2019/067042 A1. Microsoft Technology Licensing, LLC. Apr. 2019.
[160] G. R. Bradski, S. A. Miller, and R. Abovitz. “Methods and Systems for Creating Virtual and Augmented Reality.” U.S. pat. 2016/0026253 A1. Magic Leap, Inc. Jan. 2016.
[161] M. Fitzsimmons. Apple’s Tim Cook: ‘AR Has the Ability to Amplify Human Performance’. Feb. 2018. URL: https://www.techradar.com/news/apples-tim-cook-ar-has-the-ability-to-amplify-human-performance (visited on 12/20/2019).
[162] CRYENGINE - The Complete Solution for next Generation Game Development by Crytek. 2019. URL: https://www.cryengine.com/ (visited on 12/12/2019).
[163] C. Scheller. CyberSession - Virtual Reality for Research and Exposure Exercises. 2019. URL: https://www.cybersession.info/en/ (visited on 12/12/2019).
[164] Limbix - Digital Therapeutics for Mental Health. 2019. URL: https://www.limbix.com/ (visited on 12/12/2019).


[165] Psious - Virtual Reality Platform for Psychology and Mental Health. 2019. URL: https://psious.com/ (visited on 12/12/2019).
[166] Source - Valve Developer Community. Feb. 2020. URL: https://developer.valvesoftware.com/wiki/Source (visited on 03/13/2020).
[167] Source 2 - Valve Developer Community. Mar. 2020. URL: https://developer.valvesoftware.com/wiki/Source_2 (visited on 03/13/2020).
[168] Unity 3D Engine Platform. 2019. URL: https://unity.com/frontpage (visited on 12/12/2019).
[169] Unreal Engine Platform. 2019. URL: https://www.unrealengine.com/en-US/what-is-unreal-engine-4 (visited on 12/12/2019).
[170] Virtually Better. 2019. URL: http://www.virtuallybetter.com/ (visited on 12/12/2019).
[171] WorldViz - Virtual Reality Creation and Collaboration. 2019. URL: https://www.worldviz.com/ (visited on 12/12/2019).
[172] BIOPAC Systems, Inc. Unity VR Interface for AcqKnowledge. 2019. URL: https://www.biopac.com/product/vr-unity-interface/ (visited on 12/26/2019).
[173] F. A. Saunders, W. A. Hill, and B. Franklin. “A Wearable Tactile Sensory Aid for Profoundly Deaf Children.” Journal of Medical Systems 5.4 (Dec. 1981), pp. 265–270.
[174] T. Starner. “Human-Powered Wearable Computing.” IBM Systems Journal 35.3.4 (1996), pp. 618–629.
[175] O. D. Lara and M. A. Labrador. “A Survey on Human Activity Recognition Using Wearable Sensors.” IEEE Communications Surveys & Tutorials 15.3 (2013), pp. 1192–1209.
[176] J. Klucken, J. Barth, P. Kugler, J. Schlachetzki, T. Henze, F. Marxreiter, Z. Kohl, R. Steidl, J. Hornegger, B. Eskofier, and J. Winkler. “Unbiased and Mobile Gait Analysis Detects Motor Impairment in Parkinson’s Disease.” PLoS ONE 8.2 (Feb. 2013). Ed. by M. Toft, e56956.
[177] T. Vandenberk, J. Stans, C. Mortelmans, R. Van Haelst, G. Van Schelvergem, C. Pelckmans, C. J. Smeets, D. Lanssens, H. De Cannière, V. Storms, I. M. Thijs, B. Vaes, and P. M. Vandervoort. “Clinical Validation of Heart Rate Apps: Mixed-Methods Evaluation Study.” JMIR mHealth and uHealth 5.8 (Aug. 2017), e129.


[178] A. C. Pearson. Putting The Apple Watch 4 ECG To The Test In Atrial Fibrillation: An Informal Comparison To Kardia. Dec. 2018. URL: https://theskepticalcardiologist.com/2018/12/10/putting-the-apple-watch-4-ecg-to-the-test-in-atrial-fibrillation-an-informal-comparison-to-kardia/ (visited on 10/12/2019).
[179] A. C. Pearson. The New Apple Watch 4: Cardiac Accuracy Unknown, “Game-Changing” Benefits Overblown. Sept. 2018. URL: https://theskepticalcardiologist.com/2018/09/13/the-new-apple-watch-4-cardiac-accuracy-unknown-game-changing-benefits-overblown/ (visited on 10/12/2019).
[180] Apple Watch Series 4. Feb. 2019. URL: https://www.apple.com/lae/apple-watch-series-4/ (visited on 02/24/2019).
[181] D. Hernando, S. Roca, J. Sancho, Á. Alesanco, and R. Bailón. “Validation of the Apple Watch for Heart Rate Variability Measurements during Relax and Mental Stress in Healthy Subjects.” Sensors 18.8 (Aug. 2018), p. 2619.
[182] Biocalculus. 2019. URL: https://www.mybiocalculus.com/technology (visited on 10/08/2019).
[183] BioNomadix 2Ch Wireless ECG Transmitter+Receiver | BN-ECG2 | BIOPAC. Oct. 2019. URL: https://www.biopac.com/product/bionomadix-2ch-ecg-amplifier/ (visited on 10/05/2019).
[184] Cardea SOLO Wireless ECG Wearable Monitor. 2019. URL: https://www.cardiacinsightinc.com/cardea-solo/ (visited on 10/09/2019).
[185] FitnessSHIRT. Oct. 2019. URL: https://www.iis.fraunhofer.de/en/ff/sse/mks/prod/fitnessshirt.html (visited on 10/05/2019).
[186] T. Ha, J. Tran, S. Liu, H. Jang, H. Jeong, R. Mitbander, H. Huh, Y. Qiu, J. Duong, R. L. Wang, P. Wang, A. Tandon, J. Sirohi, and N. Lu. “A Chest-Laminated Ultrathin and Stretchable E-Tattoo for the Measurement of Electrocardiogram, Seismocardiogram, and Cardiac Time Intervals.” Adv. Sci. (2019), p. 13.
[187] Hexoskin Smart Shirts. 2019. URL: https://www.hexoskin.com/ (visited on 03/07/2019).


[188] S. Abdallah, C. Wilkinson-Maitland, M. Waskiw-Ford, I. Abdallah, A. Lui, B. Smith, J. Bourbeau, and D. Jensen. “Late Breaking Abstract - Validation of Hexoskin Biometric Technology to Monitor Ventilatory Responses at Rest and during Exercise in COPD.” Clinical Problems. ERS International Congress 2017 Abstracts. European Respiratory Society, Sept. 2017, PA1359.
[189] C. Al Sayed, L. Vinches, and S. Hallé. “Validation of a Wearable Biometric System’s Ability to Monitor Heart Rate in Two Different Climate Conditions under Variable Physical Activities.” E-Health Telecommunication Systems and Networks 06.02 (2017), pp. 19–30.
[190] C. A. Elliot, M. J. Hamlin, and C. A. Lizamore. “Validity and Reliability of the Hexoskin Wearable Biometric Vest During Maximal Aerobic Power Testing in Elite Cyclists:” Journal of Strength and Conditioning Research 33.5 (May 2019), pp. 1437–1444.
[191] AliveCor KardiaMobile. 2019. URL: https://www.alivecor.com/kardiamobile/ (visited on 10/05/2019).
[192] J. C. Himmelreich, E. P. Karregat, W. A. Lucassen, H. C. van Weert, J. R. de Groot, M. L. Handoko, R. Nijveldt, and R. E. Harskamp. “Diagnostic Accuracy of a Smartphone-Operated, Single-Lead Electrocardiography Device for Detection of Rhythm and Conduction Abnormalities in Primary Care.” The Annals of Family Medicine 17.5 (Sept. 2019), pp. 403–411.
[193] M. Karacan, N. Celik, E. E. Gul, C. Akdeniz, and V. Tuzcu. “Validation of a Smartphone-Based Electrocardiography in the Screening of QT Intervals in Children.” 6.1 (2019), pp. 48–52.
[194] E. Lai, K. Boyd, D. Albert, M. Ciocca, and E. H. Chung. “Heart Rate Variability in Concussed Athletes: A Case Report Using the Smartphone Electrocardiogram.” HeartRhythm Case Reports 3.11 (Nov. 2017), pp. 523–526.
[195] Lief Therapeutics. 2019. URL: https://getlief.com/ (visited on 10/08/2019).
[196] NeXus-10 MKII Biofeedback and Neurofeedback Device - Mind Media. Oct. 2019. URL: https://www.mindmedia.com/en/products/nexus-10-mkii/ (visited on 10/05/2019).


[197] C. Steinberg, F. Philippon, M. Sanchez, P. Fortier-Poisson, G. O’Hara, F. Molin, J.-F. Sarrazin, I. Nault, L. Blier, K. Roy, B. Plourde, and J. Champagne. “A Novel Wearable Device for Continuous Ambulatory ECG Recording: Proof of Concept and Assessment of Signal Quality.” Biosensors 9.1 (Jan. 2019), p. 17.
[198] QardioCore. 2019. URL: https://www.getqardio.com/en/qardiocore-wearable-ecg-ekg-monitor-iphone/ (visited on 09/09/2019).
[199] Shimmer ECG Sensor. Oct. 2019. URL: http://www.shimmersensing.com//products/ecg-development-kit (visited on 10/05/2019).
[200] A. Burns, B. R. Greene, M. J. McGrath, T. J. O’Shea, B. Kuris, S. M. Ayer, F. Stroiescu, and V. Cionca. “SHIMMER™ – A Wireless Sensor Platform for Noninvasive Biomedical Research.” IEEE Sensors Journal 10.9 (Sept. 2010), pp. 1527–1534.
[201] TAGecg Wearable ECG Sensor. 2019. URL: https://www.welchallyn.com/en/products/categories/cardiopulmonary/wearable-ecg-monitors/tagecg-wearable-ecg-sensor.html (visited on 10/08/2019).
[202] VivaLNK Inc. VivaLNK Vital Scout. 2019. URL: http://www.vivalnk.com/ecg-monitor (visited on 10/07/2019).
[203] Withings Move ECG. 2019. URL: https://www.withings.com/us/en/move-ecg (visited on 03/07/2019).
[204] Zephyr™ Performance Systems. 2019. URL: https://www.zephyranywhere.com/system (visited on 10/07/2019).
[205] iRhythm. Oct. 2019. URL: https://www.irhythmtech.com (visited on 10/05/2019).
[206] M. P. Turakhia, D. D. Hoang, P. Zimetbaum, J. D. Miller, V. F. Froelicher, U. N. Kumar, X. Xu, F. Yang, and P. A. Heidenreich. “Diagnostic Utility of a Novel Leadless Arrhythmia Monitoring Device.” The American Journal of Cardiology 112.4 (Aug. 2013), pp. 520–524.
[207] P. M. Barrett, R. Komatireddy, S. Haaser, S. Topol, J. Sheard, J. Encinas, A. J. Fought, and E. J. Topol. “Comparison of 24-Hour Holter Monitoring with 14-Day Novel Adhesive Patch Electrocardiographic Monitoring.” The American Journal of Medicine 127.1 (Jan. 2014), 95.e11–95.e17.


[208] J. A. Walsh, E. J. Topol, and S. R. Steinhubl. “Novel Wireless Devices for Cardiac Monitoring.” Circulation 130.7 (Aug. 2014), pp. 573–581.
[209] Psychlab. 2020. URL: http://www.psychlab.com/index.html (visited on 02/22/2020).
[210] R. Bennett and A. French. “Rise of the Smart Device ECG and What It Means for the General Cardiologist.” Heart (Aug. 2019), heartjnl-2019–315357.
[211] NICE. Lead-I ECG Devices for Detecting Symptomatic Atrial Fibrillation Using Single Time Point Testing in Primary Care. May 2019. URL: https://www.nice.org.uk/guidance/dg35/resources/leadi-ecg-devices-for-detecting-symptomaticatrial-fibrillation-using-single-time-point-testing-inprimary-%20care-%20pdf-1053752401861.
[212] D. W. Mortara. “High Resolution ECG Waveform Processor.” U.S. pat. 4630204. Mortara Instrument Inc. 1986.
[213] J. Pan and W. J. Tompkins. “A Real-Time QRS Detection Algorithm.” IEEE Transactions on Biomedical Engineering BME-32.3 (Mar. 1985), pp. 230–236.
[214] P. S. Hamilton and W. J. Tompkins. “Quantitative Investigation of QRS Detection Rules Using the MIT/BIH Arrhythmia Database.” IEEE Transactions on Biomedical Engineering BME-33.12 (Dec. 1986), pp. 1157–1165.
[215] G. Moody and R. Mark. “The Impact of the MIT-BIH Arrhythmia Database.” IEEE Engineering in Medicine and Biology Magazine 20.3 (May–June 2001), pp. 45–50.
[216] A. L. Goldberger, L. A. N. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley. “PhysioBank, PhysioToolkit, and PhysioNet: Components of a New Research Resource for Complex Physiologic Signals.” Circulation 101.23 (June 2000).
[217] B.-U. Köhler, C. Hennig, and R. Orglmeister. “The Principles of Software QRS Detection.” IEEE Engineering in Medicine and Biology Magazine 21.1 (2002), pp. 42–57.


[218] M. Elgendi, B. Eskofier, S. Dokos, and D. Abbott. “Revisiting QRS Detection Methodologies for Portable, Wearable, Battery-Operated, and Wireless ECG Systems.” PLoS ONE 9.1 (Jan. 2014). Ed. by L. A. N. Amaral, e84018.
[219] B.-U. Köhler, C. Hennig, and R. Orglmeister. “QRS Detection Using Zero Crossing Counts.” Progress in Biomedical Research 8.3 (2003), pp. 138–145.
[220] V. Krasteva and I. Jekova. “QRS Template Matching for Recognition of Ventricular Ectopic Beats.” Annals of Biomedical Engineering 35.12 (Nov. 2007), pp. 2065–2076.
[221] M. Elgendi. “Fast QRS Detection with an Optimized Knowledge-Based Method: Evaluation on 11 Standard ECG Databases.” PLoS ONE 8.9 (Sept. 2013). Ed. by A. Talkachova, e73557.
[222] A. Y. Hannun, P. Rajpurkar, M. Haghpanahi, G. H. Tison, C. Bourn, M. P. Turakhia, and A. Y. Ng. “Cardiologist-Level Arrhythmia Detection and Classification in Ambulatory Electrocardiograms Using a Deep Neural Network.” Nature Medicine 25.1 (Jan. 2019), pp. 65–69.
[223] M. Porumb, E. Iadanza, S. Massaro, and L. Pecchia. “A Convolutional Neural Network Approach to Detect Congestive Heart Failure.” Biomedical Signal Processing and Control 55 (Jan. 2020), p. 101597.
[224] J.-P. Niskanen, M. P. Tarvainen, P. O. Ranta-aho, and P. A. Karjalainen. “Software for Advanced HRV Analysis.” Computer Methods and Programs in Biomedicine 76.1 (Oct. 2004), pp. 73–81.
[225] L. Rodríguez-Liñares, A. Méndez, M. Lado, D. Olivieri, X. Vila, and I. Gómez-Conde. “An Open Source Tool for Heart Rate Variability Spectral Analysis.” Computer Methods and Programs in Biomedicine 103.1 (July 2011), pp. 39–50.
[226] D. Kaplan and P. Staffin. Heart Rate Variability Software. 2019. URL: https://www.macalester.edu/~kaplan/hrv/doc/ (visited on 12/15/2019).
[227] L. Barrios, P. Oldrati, S. Santini, and A. Lutterotti. “Evaluating the Accuracy of Heart Rate Sensors Based on Photoplethysmography for In-the-Wild Analysis” (2019), p. 11.


[228] A. Shcherbina, C. Mattsson, D. Waggott, H. Salisbury, J. Christle, T. Hastie, M. Wheeler, and E. Ashley. “Accuracy in Wrist-Worn, Sensor-Based Measurements of Heart Rate and Energy Expenditure in a Diverse Cohort.” Journal of Personalized Medicine 7.2 (May 2017), p. 3.
[229] AURA Devices. AURA Band. 2020. URL: https://auraband.io/auraband.html (visited on 03/12/2020).
[230] Biostrap. 2019. URL: https://biostrap.com/ (visited on 03/07/2019).
[231] O. Dur, C. Rhoades, M. S. Ng, R. Elsayed, R. van Mourik, and M. D. Majmudar. “Design Rationale and Performance Evaluation of the Wavelet Health Wristband: Benchtop Validation of a Wrist-Worn Physiological Signal Recorder.” JMIR mHealth and uHealth 6.10 (Oct. 2018), e11040.
[232] D. Jarchi, D. Salvi, C. Velardo, A. Mahdi, L. Tarassenko, and D. A. Clifton. “Estimation of HRV and SpO2 from Wrist-Worn Commercial Sensors for Clinical Settings.” 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). Las Vegas, NV, USA: IEEE, Mar. 2018, pp. 144–147.
[233] G. Sneddon, R. van Mourik, P. Law, O. Dur, D. Lowe, and C. Carlin. “Cardiorespiratory Physiology Remotely Monitored via Wearable Wristband Photoplethysmography: Feasibility and Initial Benchmarking.” Thorax 73 (Suppl 4 Dec. 2018), A197.
[234] Empatica E4. 2019. URL: https://www.empatica.com/research/e4/ (visited on 03/07/2019).
[235] G. Regalia, F. Onorati, M. Lai, C. Caborni, and R. W. Picard. “Multimodal Wrist-Worn Devices for Seizure Detection and Advancing Research: Focus on the Empatica Wristbands.” Epilepsy Research (Feb. 2019).
[236] O. E. Krigolson, C. C. Williams, A. Norton, C. D. Hassall, and F. L. Colino. “Choosing MUSE: Validation of a Low-Cost, Portable EEG System for ERP Research.” Frontiers in Neuroscience 11 (Mar. 2017).
[237] Biovotion Everion. URL: https://www.biovotion.com/everion/ (visited on 04/07/2020).


[238] Fitbit Charge 3. 2019. URL: https://www.fitbit.com/eu/charge3 (visited on 03/07/2019).
[239] J. Hwang, J. Kim, K.-J. Choi, M. S. Cho, G.-B. Nam, and Y.-H. Kim. “Assessing Accuracy of Wrist-Worn Wearable Devices in Measurement of Paroxysmal Supraventricular Tachycardia Heart Rate.” Korean Circulation Journal 49 (2019).
[240] Vivosmart 4. 2019. URL: https://buy.garmin.com/en-US/US/p/605739 (visited on 03/07/2019).
[241] M. Smith and M. Powers. “Does the Garmin Vivosmart HR Accurately Measure Steps and Energy Expenditure?” International Journal of Exercise Science: Conference Proceedings 11.4 (2016).
[242] Misfit Vapor 2. 2019. URL: https://misfit.com/products/misfit-vapor-2 (visited on 03/07/2019).
[243] A. Henriksen, M. Haugen Mikalsen, A. Z. Woldaregay, M. Muzny, G. Hartvigsen, L. A. Hopstock, and S. Grimsgaard. “Using Fitness Trackers and Smartwatches to Measure Physical Activity in Research: Analysis of Consumer Wrist-Worn Wearables.” Journal of Medical Internet Research 20.3 (Mar. 2018), e110.
[244] Muse™. 2019. URL: https://choosemuse.com/muse-2/ (visited on 03/07/2019).
[245] J. Amores, R. Richer, N. Zhao, P. Maes, and B. M. Eskofier. “Promoting Relaxation Using Virtual Reality, Olfactory Interfaces and Wearable EEG.” 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). 2018 IEEE 15th International Conference on Wearable and Implantable Body Sensor Networks (BSN). Las Vegas, NV, USA: IEEE, Mar. 2018, pp. 98–101.
[246] Omron Healthcare, Inc. Omron HeartGuide Wearable Blood Pressure Monitor. 2020. URL: https://omronhealthcare.com/products/heartguide-wearable-blood-pressure-monitor-bp8000m/ (visited on 02/17/2020).
[247] Oura Ring. 2019. URL: https://ouraring.com/ (visited on 03/07/2019).


[248] B.-F. Gheran, J. Vanderdonckt, and R.-D. Vatavu. “Gestures for Smart Rings: Empirical Results, Insights, and Design Implications.” Proceedings of the 2018 on Designing Interactive Systems Conference 2018 - DIS ’18. The 2018. Hong Kong, China: ACM Press, 2018, pp. 623–635.
[249] Polar Vantage Series. 2019. URL: https://www.polar.com/en/vantage (visited on 03/07/2019).
[250] A. Erdogan, C. Cetin, H. Karatosun, and M. L. Baydar. “Accuracy of the Polar S810i(TM) Heart Rate Monitor and the Sensewear Pro Armband(TM) to Estimate Energy Expenditure of Indoor Rowing Exercise in Overweight and Obese Individuals.” Journal of Sports Science & Medicine 9.3 (2010), pp. 508–516.
[251] SentioFeel. 2019. URL: https://www.myfeel.co/ (visited on 03/07/2019).
[252] Skagen Falster 2. 2019. URL: https://www.skagen.com/en-us/falster2-learn-more# (visited on 03/07/2019).
[253] Spire Health. 2019. URL: https://spirehealth.com/ (visited on 03/07/2019).
[254] M. Holt, B. Yule, D. Jackson, M. Zhu, and N. Moraveji. “Ambulatory Monitoring of Respiratory Effort Using a Clothing-Adhered Biosensor.” 2018 IEEE International Symposium on Medical Measurements and Applications (MeMeA). 2018 IEEE International Symposium on Medical Measurements and Applications (MeMeA). Rome, Italy: IEEE, June 2018, pp. 1–6.
[255] TicWatch S2. 2019. URL: https://www.mobvoi.com/uk/pages/ticwatchs2 (visited on 03/07/2019).
[256] The WellBe Bracelet and App. WellBe - Leave Your Stress Behind. 2020. URL: https://thewellbe.com/ (visited on 02/17/2020).
[257] Withings ScanWatch. 2020. URL: https://www.withings.com/uk/en/scanwatch (visited on 04/07/2020).
[258] C. J. Wilson and A. Soranzo. “The Use of Virtual Reality in Psychology: A Case Study in Visual Perception.” Computational and Mathematical Methods in Medicine 2015 (2015), pp. 1–7.


[259] B. O. Rothbaum, L. F. Hodges, R. Kooper, D. Opdyke, J. S. Williford, and M. North. “Effectiveness of Computer-Generated (Virtual Reality) Graded Exposure in the Treatment of Acrophobia.” American Journal of Psychiatry 152.4 (Apr. 1995), pp. 626–628.
[260] T. D. Parsons and A. A. Rizzo. “Affective Outcomes of Virtual Reality Exposure Therapy for Anxiety and Specific Phobias: A Meta-Analysis.” Journal of Behavior Therapy and Experimental Psychiatry 39.3 (Sept. 2008), pp. 250–261.
[261] The Climb: Official Site - Blog. Feb. 2016. URL: https://www.theclimbgame.com/blog?p=4 (visited on 12/11/2019).
[262] S. Rangelova and E. Andre. “A Survey on Simulation Sickness in Driving Applications with Virtual Reality Head-Mounted Displays.” PRESENCE: Virtual and Augmented Reality 27.1 (Mar. 2019), pp. 15–31.
[263] P. Zimmer, B. Buttlar, G. Halbeisen, E. Walther, and G. Domes. “Virtually Stressed? A Refined Virtual Reality Adaptation of the Trier Social Stress Test (TSST) Induces Robust Endocrine Responses.” Psychoneuroendocrinology 101 (Mar. 2019), pp. 186–192.
[264] B. Kerous, R. Bartacek, R. Roman, P. Sojka, O. Bejcev, and F. Liarokapis. “Social Environment Simulation in VR Elicits a Distinct Reaction in Subjects with Different Levels of Anxiety and Somatoform Dissociation.” International Journal of Human–Computer Interaction (Sept. 2019), pp. 1–11.
[265] O. Kelly, K. Matheson, A. Martinez, Z. Merali, and H. Anisman. “Psychosocial Stress Evoked by a Virtual Audience: Relation to Neuroendocrine Activity.” CyberPsychology & Behavior 10.5 (Oct. 2007), pp. 655–662.
[266] P. Jönsson, M. Wallergård, K. Österberg, Å. M. Hansen, G. Johansson, and B. Karlson. “Cardiovascular and Cortisol Reactivity and Habituation to a Virtual Reality Version of the Trier Social Stress Test: A Pilot Study.” Psychoneuroendocrinology 35.9 (Oct. 2010), pp. 1397–1403.
[267] M. Wallergård, P. Jönsson, G. Johansson, and B. Karlson. “A Virtual Reality Version of the Trier Social Stress Test: A Pilot Study.” Presence: Teleoperators and Virtual Environments 20.4 (Aug. 2011), pp. 325–336.


[268] Y. Shiban, J. Diemer, S. Brandl, R. Zack, A. Mühlberger, and S. Wüst. “Trier Social Stress Test in Vivo and in Virtual Reality: Dissociation of Response Domains.” International Journal of Psychophysiology 110 (Dec. 2016), pp. 47–55.
[269] E. C. Helminen, M. L. Morton, Q. Wang, and J. C. Felver. “A Meta-Analysis of Cortisol Reactivity to the Trier Social Stress Test in Virtual Environments.” Psychoneuroendocrinology 110 (Dec. 2019), p. 104437.
[270] E. A. Hines and G. E. Brown. “A Standard Stimulus for Measuring Vasomotor Reactions: Its Application in the Study of Hypertension.” Proc. Staff Meet. Mayo Clin. 7.332 (1932).
[271] E. A. Hines and G. E. Brown. “The Cold Pressor Test for Measuring the Reactibility of the Blood Pressure: Data Concerning 571 Normal and Hypertensive Subjects.” American Heart Journal 11.1 (Jan. 1936), pp. 1–9.
[272] W. Lovallo. “The Cold Pressor Test and Autonomic Function: A Review and Integration.” Psychophysiology 12.3 (May 1975), pp. 268–282.
[273] J. Fagius, S. Karhuvaara, and G. Sundlof. “The Cold Pressor Test: Effects on Sympathetic Nerve Activity in Human Muscle and Skin Nerve Fascicles.” Acta Physiologica Scandinavica 137.3 (Nov. 1989), pp. 325–334.
[274] K. Dedovic, R. Renwick, N. K. Mahani, V. Engert, S. J. Lupien, and J. C. Pruessner. “The Montreal Imaging Stress Task: Using Functional Imaging to Investigate the Effects of Perceiving and Processing Psychosocial Stress in the Human Brain.” J Psychiatry Neurosci 30.5 (2005), pp. 319–25.
[275] T. L. Gruenewald, M. E. Kemeny, N. Aziz, and J. L. Fahey. “Acute Threat to the Social Self: Shame, Social Self-Esteem, and Cortisol Activity:” Psychosomatic Medicine 66.6 (Nov. 2004), pp. 915–924.
[276] J. R. Stroop. “Studies of Interference in Serial Verbal Reactions.” Journal of Experimental Psychology 18.6 (1935), pp. 643–662.
[277] Google Scholar Search for “Stroop Effect” between 1973 and 1991. Dec. 2019. URL: https://scholar.google.com/scholar?q=%22Stroop+effect%22&hl=en&as_sdt=0%2C5&as_ylo=1973&as_yhi=1991.


[278] Google Scholar Search for “Stroop Effect” between 1992 and 2020. Dec. 2019. URL: https://scholar.google.com/scholar?q=%22Stroop+effect%22&hl=en&as_sdt=0%2C5&as_ylo=1992&as_yhi=2020.
[279] V. Larrue, P. Celsis, A. Bès, and J. P. Marc-Vergnes. “The Functional Anatomy of Attention in Humans: Cerebral Blood Flow Changes Induced by Reading, Naming, and the Stroop Effect.” Journal of Cerebral Blood Flow & Metabolism 14.6 (Nov. 1994), pp. 958–962.
[280] J. M. G. Williams and A. Mathews. “The Emotional Stroop Task and Psychopathology.” Psychological Bulletin 120.1 (1996), pp. 3–24.
[281] P. Verhaeghen and L. De Meersman. “Aging and the Stroop Effect: A Meta-Analysis.” Psychology and Aging 13.1 (1998), pp. 120–126.
[282] A. Henik and R. Salo. “Schizophrenia and the Stroop Effect.” Behavioral and Cognitive Neuroscience Reviews 3.1 (Mar. 2004), pp. 42–59.
[283] J. Tulen, P. Moleman, H. van Steenis, and F. Boomsma. “Characterization of Stress Reactions to the Stroop Color Word Test.” Pharmacology Biochemistry and Behavior 32.1 (Jan. 1989), pp. 9–15.
[284] J. P. A. Delaney and D. A. Brodie. “Effects of Short-Term Psychological Stress on the Time and Frequency Domains of Heart-Rate Variability.” Perceptual and motor skills 91.2 (2000), pp. 515–524.
[285] P. Renaud and J.-P. Blondin. “The Stress of Stroop Performance: Physiological and Emotional Responses to Color–Word Interference, Task Pacing, and Pacing Speed.” International Journal of Psychophysiology 27.2 (Sept. 1997), pp. 87–97.
[286] H. Usui and Y. Nishida. “The Very Low-Frequency Band of Heart Rate Variability Represents the Slow Recovery Component after a Mental Stress Task.” PLOS ONE 12.8 (Aug. 2017). Ed. by A. Romigi, e0182611.
[287] T. M. Eilola and J. Havelka. “Behavioural and Physiological Responses to the Emotional and Taboo Stroop Tasks in Native and Non-Native Speakers of English.” International Journal of Bilingualism 15.3 (Sept. 2011), pp. 353–369.


[288] M. Mestanik, Z. Visnovcova, and I. Tonhajzerova. “The Assessment of the Autonomic Response to Acute Stress Using Electrodermal Activity.” Acta Medica Martiniana 14.2 (Sept. 2014), pp. 5–9.
[289] W. Boucsein, D. C. Fowles, S. Grimnes, G. Ben-Shakhar, W. T. Roth, M. E. Dawson, and D. L. Filion. “Publication Recommendations for Electrodermal Measurements: Publication Standards for EDA.” Psychophysiology 49.8 (Aug. 2012), pp. 1017–1034.
[290] M. Svetlak, P. Bob, M. Cernik, and M. Kukleta. “Electrodermal Complexity during the Stroop Colour Word Test.” Autonomic Neuroscience 152.1-2 (Jan. 2010), pp. 101–107.
[291] A. A. Rizzo, T. Bowerly, J. G. Buckwalter, D. Klimchuk, R. Mitura, and T. D. Parsons. “A Virtual Reality Scenario for All Seasons: The Virtual Classroom.” CNS Spectrums 11.1 (Oct. 2009), pp. 35–44.
[292] D. Wu, C. G. Courtney, B. J. Lance, S. S. Narayanan, M. E. Dawson, K. S. Oie, and T. D. Parsons. “Optimal Arousal Identification and Classification for Affective Computing Using Physiological Signals: Virtual Reality Stroop Task.” IEEE Transactions on Affective Computing 1.2 (July 2010), pp. 109–118.
[293] T. D. Parsons, C. C. G, A. Brian, and D. Michael. “Virtual Reality Stroop Task for Neurocognitive Assessment.” Studies in Health Technology and Informatics 163 (2011), pp. 433–439.
[294] T. D. Parsons, C. G. Courtney, and M. E. Dawson. “Virtual Reality Stroop Task for Assessment of Supervisory Attentional Processing.” Journal of Clinical and Experimental Neuropsychology 35.8 (Oct. 2013), pp. 812–826.
[295] T. D. Parsons and C. G. Courtney. “Interactions Between Threat and Executive Control in a Virtual Reality Stroop Task.” IEEE Transactions on Affective Computing 9.1 (Jan. 2018), pp. 66–75.
[296] M. Henry, C. C. Joyal, and P. Nolin. “Development and Initial Assessment of a New Paradigm for Assessing Cognitive and Motor Inhibition: The Bimodal Virtual-Reality Stroop.” Journal of Neuroscience Methods 210.2 (Sept. 2012), pp. 125–131.
[297] T. D. Parsons and M. D. Barnett. “Virtual Apartment Stroop Task: Comparison with Computerized and Traditional Stroop Tasks.” Journal of Neuroscience Methods 309 (Nov. 2018), pp. 35–40.

[298] L. Wagner. Liborw/Stroop. Aug. 2018. URL: https://github.com/liborw/stroop (visited on 12/24/2019).

[299] H. Bartsch and A. DeCastro. ABCD-STUDY/Stroop-Task. ABCD Data Analytics and Informatics Resource Center, Mar. 2016. URL: https://github.com/ABCD-STUDY/stroop-task (visited on 12/24/2019).

[300] J. Trinh. Ellojess/Brain_game. Dec. 2019. URL: https://github.com/ellojess/brain_game (visited on 12/24/2019).

[301] J. Lee and T. Riley. Orcatechteam/React-Neuropsych-Stroop. ORCATECH, Oct. 2019. URL: https://github.com/orcatechteam/react-neuropsych-stroop (visited on 12/24/2019).

[302] Siddhantxshirguppe/Colour_quest. July 2017. URL: https://github.com/siddhantxshirguppe/Colour_quest (visited on 12/24/2019).

[303] F. Fleury, H. Honda, and G. Albino. Lab-Neuro-Comp/Test-Platform. Laboratório de Neurociência e Comportamento, Dec. 2019. URL: https://github.com/lab-neuro-comp/Test-Platform (visited on 12/25/2019).

[304] H. Honda and F. Fleury. Lab-Neuro-Comp/StroopTest. Laboratório de Neurociência e Comportamento, Nov. 2017. URL: https://github.com/lab-neuro-comp/StroopTest (visited on 12/25/2019).

[305] V. Saurus and T. Burleigh. Expfactory-Experiments/Stroop-5min. The Experiment Factory, May 2019. URL: https://github.com/expfactory-experiments/stroop-5min (visited on 12/25/2019).

[306] V. V. Sochat, I. W. Eisenberg, A. Z. Enkavi, J. Li, P. G. Bissett, and R. A. Poldrack. “The Experiment Factory: Standardizing Behavioral Experiments.” Frontiers in Psychology 7 (Apr. 2016).

[307] M. Taavoni. mojiTMJ/Stroop-Effect. July 2019. URL: https://github.com/mojiTMJ/Stroop-effect (visited on 12/25/2019).

[308] PsychoPy v3.0. 2019. URL: https://psychopy.org/ (visited on 12/25/2019).

[309] S. T. Mueller. PEBL: The Psychology Experiment Building Language. Mar. 2019. URL: http://pebl.sourceforge.net/ (visited on 12/25/2019).

[310] G. Stoet. “PsyToolkit: A Software Package for Programming Psychological Experiments Using Linux.” Behavior Research Methods 42.4 (Nov. 2010), pp. 1096–1104.

[311] G. Stoet. “PsyToolkit: A Novel Web-Based Method for Running Online Questionnaires and Reaction-Time Experiments.” Teaching of Psychology 44.1 (Jan. 2017), pp. 24–31.

[312] G. Stoet. PsyToolkit Stroop Task. 2019. URL: https://www.psytoolkit.org/experiment-library/stroop.html (visited on 12/25/2019).

[313] Online Stroop Test - The Stroop Effect. 2019. URL: http://www.onlinestrooptest.com/ (visited on 12/13/2019).

[314] O. T. Inan and G. T. A. Kovacs. “An 11 µW, Two-Electrode Transimpedance Biosignal Amplifier with Active Current Feedback Stabilization.” IEEE Transactions on Biomedical Circuits and Systems 4.2 (Apr. 2010), pp. 93–100.

[315] S. Å. Bood, U. Sundequist, A. Kjellgren, T. Norlander, L. Nordström, K. Nordenström, and G. Nordström. “Eliciting the Relaxation Response with the Help of Flotation-Rest (Restricted Environmental Stimulation Technique) in Patients with Stress-Related Ailments.” International Journal of Stress Management 13.2 (2006), pp. 154–175.

[316] J. Turner, W. Gerard, J. Hyland, P. Nieland, and T. Fine. “Effects of Wet and Dry Flotation REST on Blood Pressure and Plasma Cortisol.” Clinical and Experimental Restricted Environmental Stimulation. Ed. by A. F. Barabasz and M. Barabasz. New York, NY: Springer New York, 1993, pp. 239–247.

[317] P. Suedfeld, E. J. Ballard, and M. Murphy. “Water Immersion and Flotation: From Stress Experiment to Stress Treatment.” Journal of Environmental Psychology 3.2 (June 1983), pp. 147–155.

[318] D. van Dierendonck and J. Te Nijenhuis. “Flotation Restricted Environmental Stimulation Therapy (REST) as a Stress-Management Tool: A Meta-Analysis.” Psychology & Health 20.3 (June 2005), pp. 405–412.

[319] J. Lauber. “Underwater Electrocardiography.” Master’s Thesis. Erlangen: FAU Erlangen-Nürnberg, 2016.

[320] V. von Tscharner, C. Maurer, F. Ruf, and B. M. Nigg. “Comparison of Electromyographic Signals from Monopolar Current and Potential Amplifiers Derived from a Penniform Muscle, the Gastrocnemius Medialis.” J Electromyogr Kines 23.5 (2013), pp. 1044–1051.

[321] J. W. Whitting and V. von Tscharner. “Monopolar Electromyographic Signals Recorded by a Current Amplifier in Air and under Water without Insulation.” Journal of Electromyography and Kinesiology 24.6 (Dec. 2014), pp. 848–854.

[322] V. von Tscharner, C. Maurer, and B. M. Nigg. “Correlations and Coherence of Monopolar EMG-Currents of the Medial Gastrocnemius Muscle in Proximal and Distal Compartments.” Frontiers in Physiology 5 (June 2014).

[323] D. Tantinger, S. Feilner, D. Schmitz, C. Weigand, C. Hofmann, and M. Struck. “Evaluation of QRS Detection Algorithm Implemented for Mobile Applications Based on ECG Data Acquired from Sensorized Garments.” Biomedical Engineering / Biomedizinische Technik 57 (SI-1 Track-F Jan. 2012).

[324] N. V. Thakor, J. G. Webster, and W. J. Tompkins. “Estimation of QRS Complex Power Spectra for Design of a QRS Filter.” IEEE Transactions on Biomedical Engineering BME-31.11 (Nov. 1984), pp. 702–706.

[325] S. C. Kwatra and V. K. Jain. “A New Technique for Monitoring Heart Signals-Part I: Instrumentation Design.” IEEE Trans Biomed Eng BME-33.1 (1986), pp. 35–41.

[326] J. D. Schipke and M. Pelzer. “Effect of Immersion, Submersion, and Scuba Diving on Heart Rate Variability.” Brit J Sport Med 35.3 (2001), pp. 174–180.

[327] W. Wittling and R. A. Wittling. Herzschlagvariabilität: Frühwarnsystem, Stress-Und Fitnessindikator: Grundlagen - Messmethoden - Anwendungen. Eichsfeld-Verlag, 2012.

[328] F. Shaffer and J. P. Ginsberg. “An Overview of Heart Rate Variability Metrics and Norms.” Frontiers in Public Health 5 (Sept. 2017).

[329] R. J. Prineas, R. S. Crow, and Z.-M. Zhang. The Minnesota Code Manual of Electrocardiographic Findings. Springer Science & Business Media, 2009.

[330] A. Brubakk and T. S. Neuman. Bennett and Elliotts’ Physiology and Medicine of Diving. 5th ed. W.B. Saunders, 2006.

[331] A. Boussuges, F. Blanc, and D. Carturan. “Hemodynamic Changes Induced by Recreational Scuba Diving.” CHEST Journal 129.5 (2006), pp. 1337–1343.

[332] J. Zhang. “Effect of Age and Sex on Heart Rate Variability in Healthy Subjects.” Journal of Manipulative and Physiological Therapeutics 30.5 (June 2007), pp. 374–379.

[333] S. Gradl. Hearty. Mar. 2019. URL: https://github.com/gradlman/hearty (visited on 02/24/2020).

[334] S. Gradl and R. Richer. SensorLib. Machine Learning and Data Analytics Lab FAU, Nov. 2019. URL: https://github.com/mad-lab-fau/SensorLib (visited on 02/24/2020).

[335] S. Gradl, A. Heinrich, and H. Leutheuser. JELY. Machine Learning and Data Analytics Lab FAU, Feb. 2020. URL: https://github.com/mad-lab-fau/JELY (visited on 02/24/2020).

[336] S. Gradl. PlotView. Machine Learning and Data Analytics Lab FAU, Feb. 2020. URL: https://github.com/mad-lab-fau/PlotView (visited on 02/24/2020).

[337] J. Wagner, F. Lingenfelser, T. Baur, I. Damian, F. Kistler, and E. André. “The Social Signal Interpretation (SSI) Framework: Multimodal Signal Processing and Recognition in Real-Time.” Proceedings of the 21st ACM International Conference on Multimedia - MM ’13. The 21st ACM International Conference. Barcelona, Spain: ACM Press, 2013, pp. 831–834.

[338] I. Damian, M. Dietz, and E. André. “The SSJ Framework: Augmenting Social Interactions Using Mobile Signal Processing and Live Feedback.” Frontiers in ICT 5 (June 2018), p. 13.

[339] W. Brunette, R. Sodt, R. Chaudhri, M. Goel, M. Falcone, J. Van Orden, and G. Borriello. “Open Data Kit Sensors: A Sensor Integration Framework for Android at the Application-Level.” Proceedings of the 10th International Conference on Mobile Systems, Applications, and Services - MobiSys ’12. The 10th International Conference. Low Wood Bay, Lake District, UK: ACM Press, 2012, p. 351.

[340] Google Inc. Android Developers. 2020. URL: https://developer.android.com/ (visited on 02/12/2020).

[341] J. C. Gunther. “Algorithm 938: Compressing Circular Buffers.” ACM Transactions on Mathematical Software 40.2 (Mar. 2014), pp. 1–12.

[342] S. Greenwald. “Improved Detection and Classification of Arrhythmias in Noise-Corrupted Electrocardiograms Using Contextual Information.” Ph.D. thesis. Harvard-MIT Division of Health Sciences and Technology, 1990.

[343] G. Nallathambi and J. C. Principe. “Integrate and Fire Pulse Train Automaton for QRS Detection.” IEEE Transactions on Biomedical Engineering 61.2 (Feb. 2014), pp. 317–326.

[344] Apple Inc. Apple Watch Breathe App. 2020. URL: https://support.apple.com/en-gb/guide/watch/apd371dfe3d7/watchos (visited on 01/15/2020).

[345] F. Nolle, F. Badura, J. Catlett, R. Bowser, and M. Sketch. “CREI-GARD, a New Concept in Computerized Arrhythmia Monitoring Systems.” Computers in Cardiology 13 (1986), pp. 515–518.

[346] T. Zillig. “Biosignal Visualization in Virtual and Augmented Reality.” Erlangen, Germany: FAU Erlangen-Nürnberg, 2016.

[347] R. Richer, T. Maiwald, C. Pasluosta, B. Hensel, and B. M. Eskofier. “Novel Human Computer Interaction Principles for Cardiac Feedback Using Google Glass and Android Wear.” 2015 IEEE 12th International Conference on Wearable and Implantable Body Sensor Networks (BSN). Cambridge, MA, USA: IEEE, June 2015, pp. 1–6.

[348] A. Gorini, F. Pallavicini, D. Algeri, C. Repetto, A. Gaggioli, and G. Riva. “Virtual Reality in the Treatment of Generalized Anxiety Disorders.” Studies in Health Technology and Informatics (2010), pp. 39–43.

[349] L. Chittaro. “Anxiety Induction in Virtual Environments: An Experimental Comparison of Three General Techniques.” Interacting with Computers 26.6 (Nov. 2014), pp. 528–539.

[350] J. Gielen. “EMG Biofeedback for Virtual Reality Therapy.” 2011.

[351] M. Hassenzahl, M. Burmester, and F. Koller. “AttrakDiff: Ein Fragebogen zur Messung wahrgenommener hedonischer und pragmatischer Qualität.” Mensch & Computer 2003. Ed. by G. Szwillus and J. Ziegler. Vol. 57. Wiesbaden: Vieweg+Teubner Verlag, 2003, pp. 187–196.

[352] M. Schrepp, T. Held, and B. Laugwitz. “The Influence of Hedonic Quality on the Attractiveness of User Interfaces of Business Management Software.” Interacting with Computers 18.5 (Sept. 2006), pp. 1055–1069.

[353] B. Winters. “The Non-Diegetic Fallacy: Film, Music, and Narrative Space.” Music and Letters 91.2 (2010), pp. 224–244.

[354] J. V. Pardo, P. J. Pardo, K. W. Janer, and M. E. Raichle. “The Anterior Cingulate Cortex Mediates Processing Selection in the Stroop Attentional Conflict Paradigm.” Proceedings of the National Academy of Sciences 87.1 (Jan. 1990), pp. 256–259.

[355] S. Gradl and A. Wonner. StroopRoom. Machine Learning and Data Analytics Lab FAU, Dec. 2019. URL: https://github.com/mad-lab-fau/StroopRoom (visited on 02/24/2020).

[356] A. Wonner. “The Stroop Room - A Virtual Reality Enhanced Stroop Test.” Bachelor’s Thesis. Erlangen: Friedrich-Alexander-Universität Erlangen-Nürnberg (FAU), 2019.

[357] R. Yao, T. Heath, A. Davies, T. Forsyth, N. Mitchell, and P. Hoberman. Oculus VR Best Practices Guide. 2014.

[358] J. Janson and N. Rohleder. “Distraction Coping Predicts Better Cortisol Recovery after Acute Psychosocial Stress.” Biological Psychology 128 (Sept. 2017), pp. 117–124.

[359] W. S. Helton. “Validation of a Short Stress State Questionnaire.” Proc Hum Factors Ergonomics Society Annual Meeting 48.11 (2004), pp. 1238–1242.

[360] S. A. Jackson and H. W. Marsh. “Development and Validation of a Scale to Measure Optimal Experience: The Flow State Scale.” Journal of Sport and Exercise Psychology 18.1 (Mar. 1996), pp. 17–35.

[361] E. M. Klein, E. Brähler, M. Dreier, L. Reinecke, K. W. Müller, G. Schmutzer, K. Wölfling, and M. E. Beutel. “The German Version of the Perceived Stress Scale – Psychometric Characteristics in a Representative German Community Sample.” BMC Psychiatry 16.1 (Dec. 2016), p. 159.

[362] R. N. Davis and S. Nolen-Hoeksema. “Cognitive Inflexibility Among Ruminators and Nonruminators.” Cognitive Therapy and Research 24.6 (2000), pp. 699–711.

[363] S. Nolen-Hoeksema and J. Morrow. “A Prospective Study of Depression and Posttraumatic Stress Symptoms After a Natural Disaster: The 1989 Loma Prieta Earthquake.” Journal of Personality and Social Psychology 61.1 (1991), pp. 115–121.

[364] M. Hautzinger and M. Bailer. Allgemeine Depressions Skala. Beltz, 1993.

[365] Y. I. Kuras, C. M. McInnis, M. V. Thoma, X. Chen, L. Hanlin, D. Gianferante, and N. Rohleder. “Increased Alpha-Amylase Response to an Acute Psychosocial Stress Challenge in Healthy Adults with Childhood Adversity.” Developmental Psychobiology 59.1 (Jan. 2017), pp. 91–98.

[366] W. Boucsein. Electrodermal Activity. 2nd Edition. New York: Springer, 2012. 618 pp.

[367] Y. Xu, H. Park, and Y. Baek. “A New Approach Toward Digital Storytelling: An Activity Focused on Writing Self-Efficacy in a Virtual Learning Environment.” Journal of Educational Technology & Society 14.4 (2011), pp. 181–191.

[368] S. A. Jackson, A. J. Martin, and R. C. Eklund. “Long and Short Measures of Flow: Examining Construct Validity of the FSS-2, DFS-2, and New Brief Counterparts.” Journal of Sport and Exercise Psychology 30 (2008), pp. 561–587.

[369] G. M. Redding and D. A. Gerjets. “Stroop Effect: Interference and Facilitation with Verbal and Manual Responses.” Perceptual and Motor Skills 45.1 (Aug. 1977), pp. 11–17.

[370] J. Wagner, J. Kim, and E. André. “From Physiological Signals to Emotions: Implementing and Comparing Selected Methods for Feature Extraction and Classification.” 2005 IEEE International Conference on Multimedia and Expo. Amsterdam, The Netherlands: IEEE, 2005, pp. 940–943.

[371] J. Kim and E. André. “Emotion Recognition Based on Physiological Changes in Music Listening.” IEEE Transactions on Pattern Analysis and Machine Intelligence 30.12 (Dec. 2008), pp. 2067–2083.

[372] J. R. Quinlan. “Induction of Decision Trees.” Machine Learning 1.1 (Mar. 1986), pp. 81–106.

[373] S. R. Singh, H. A. Murthy, and T. A. Gonsalves. “Feature Selection for Text Classification Based on Gini Coefficient of Inequality.” JMLR. 2010, pp. 76–85.

[374] I. Kononenko, E. Simec, and M. Robnik-Sikonja. “Overcoming the Myopia of Inductive Learning Algorithms with RELIEFF.” Applied Intelligence 7.1 (1997), pp. 39–55.

[375] L. Yu and H. Liu. “Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution.” Proceedings of the 20th International Conference on Machine Learning (ICML-03). 2003, p. 8.

[376] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer, 2006. 738 pp.

[377] H. Bidoggia, J. P. Maciel, N. Capalozza, S. Mosca, E. J. Blaksley, E. Valverde, G. Bertran, P. Arini, M. O. Biagetti, and R. A. Quinteiro. “Sex-Dependent Electrocardiographic Pattern of Cardiac Repolarization.” American Heart Journal 140.3 (Sept. 2000), pp. 430–436.

[378] B. Surawicz and S. R. Parikh. “Prevalence of Male and Female Patterns of Early Ventricular Repolarization in the Normal ECG of Males and Females from Childhood to Old Age.” Journal of the American College of Cardiology 40.10 (Nov. 2002), pp. 1870–1876.

[379] R. Sinnreich, J. D. Kark, Y. Friedlander, D. Sapoznikov, and M. H. Luria. “Five Minute Recordings of Heart Rate Variability for Population Studies: Repeatability and Age-Sex Characteristics.” Heart 80.2 (Aug. 1998), pp. 156–162.

[380] B. M. Kudielka, A. Buske-Kirschbaum, D. H. Hellhammer, and C. Kirschbaum. “Differential Heart Rate Reactivity and Recovery after Psychosocial Stress (TSST) in Healthy Children, Younger Adults, and Elderly Adults: The Impact of Age and Gender.” International Journal of Behavioral Medicine 11.2 (June 2004), pp. 116–121.

[381] J. S. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. “Algorithms for Hyper-Parameter Optimization.” Advances in Neural Information Processing Systems 24 (NIPS 2011) (2011), pp. 2546–2554.

[382] J. Bergstra and Y. Bengio. “Random Search for Hyper-Parameter Optimization.” Journal of Machine Learning Research 13 (Feb 2012), pp. 281–305.

[383] J. Borresen and M. I. Lambert. “Autonomic Control of Heart Rate during and after Exercise: Measurements and Implications for Monitoring Training Status.” Sports Medicine 38.8 (2008), pp. 633–646.

[384] R. Metz. Get Ready to Throw Toys with Your Mind (in VR, at Least). Aug. 2017. URL: https://www.technologyreview.com/s/608574/mind-controlled-vr-game-really-works/ (visited on 02/10/2020).

[385] J. W. Ahn, Y. Ku, and H. C. Kim. “A Novel Wearable EEG and ECG Recording System for Stress Assessment.” Sensors 19.9 (Apr. 2019), p. 1991.

[386] H. Riesch and C. Potter. “Citizen Science as Seen by Scientists: Methodological, Epistemological and Ethical Dimensions.” Public Understanding of Science 23.1 (Jan. 2014), pp. 107–120.

[387] Robert Koch-Institut. Corona-Datenspende. 2020. URL: https://corona-datenspende.de/ (visited on 04/14/2020).

[388] BMBF-Internetredaktion. Citizen Science - BMBF. Oct. 2019. URL: https://www.bmbf.de/de/citizen-science-wissenschaft-erreicht-die-mitte-der-gesellschaft-225.html (visited on 02/15/2020).
