
Wireless Sensing for Medical Applications

Abdelwahed Khamis

Dissertation submitted in fulfilment of the requirements for the degree of

Doctor of Philosophy in Computer Science and Engineering

School of Computer Science and Engineering

Supervisor: Wen Hu

Co-supervisors: Chun Tung Chou and Brano Kusy

February 2020


ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ……………………………………………......     


Abstract

A transmitted wireless signal travelling at the speed of light in indoor spaces goes on an intriguing journey in which it reflects off ambient objects and gets modulated by human motion before reaching the receiver. Leveraging this fundamental principle, this thesis exploits radio signals from commercial wireless devices to enable health sensing applications. The practical outcomes of this work range from wireless-based physiological vital sign monitoring to the first system for automatic tracking of the Hand Hygiene practices of healthcare workers.

To deliver this, we introduce techniques and algorithms to analyse human motion from reflected radio signals while addressing the practical challenges associated with the presence of noise in Radio Frequency (RF) signals and the requirements of the sensing applications themselves. By relying purely on RF signals for sensing, these systems operate in a contact-less manner, are agnostic to lighting conditions and can fit in residential and clinical environments without invading the privacy of the inhabitants. In effect, we show how the capabilities of commercially available RF devices can be harnessed for health and well-being sensing while addressing the downsides of alternative modalities (e.g., wearables and camera-based systems).

Acknowledgements

This work is partially funded by a CISCO Research Center University Grant.

In the name of Almighty Allah, the most compassionate, the most merciful.

All praise to Him, who bestowed upon me the strength to accomplish this dissertation.

I cannot hope to thank adequately those who helped me in the preparation of this dissertation. I am especially indebted to my advisor Wen Hu. During the past four years, I have thoroughly enjoyed our collaboration, which involved intense intellectual discussions, getting me involved in interesting research projects and personal support whenever I needed it. Without his support, this dissertation would not have been possible. My sincere appreciation goes to my co-supervisors Chun Tung Chou and Brano Kusy. I can’t thank them enough for their feedback, encouragement and involvement in every milestone of this dissertation.

My thanks go to Marylouise McLaws (School of Medicine, UNSW, Sydney) for introducing me to the Hand Hygiene tracking problem described in Chapter 3, and providing domain expert input. My conversations with Hong Jia (CSE, UNSW, Sydney) have always been enlightening, and his suggestions improved the work presented in Chapter 3. I would like to express my gratitude to Sara Khalifa (CSIRO, Brisbane) for her comments and the engaging discussions during the thesis writing stage.

I am blessed being in the company of Mahmoud Gamal, Mohammed Jaddoa, Firas Al-Doughman and Ahmed Saadeh. They have been there for me over the past few years, cheering me up on my darkest days. Very special words of gratitude go to my friends Khalifa Eissa and Mahmoud Saied. Despite the thousands of kilometers between us, their good memories always brought me happiness and helped me to survive hard times.

I am deeply grateful to my parents whose prayers, commitment to education and unconditional love led me to where I am today.

Above all, my profoundest thanks to my dear wife, Eman who went through the whole journey with me. This is as much her accomplishment as mine.

The last word goes to Moez, my beloved son, who has always been the source of much joy and happiness for me. This dissertation is dedicated to him.

Abdelwahed Khamis

February 2020 Brisbane, Australia

July 20, 2020

Publications

Journal Publications

1. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu, “WiRelax: Towards Real-time Respiratory Biofeedback During Meditation Using WiFi.” Elsevier Ad Hoc Networks Journal, in press, accepted in May 2020.

Publication revised in Chapter 4

Conference Publications

2. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu, “CardioFi: Enabling Heart Rate Monitoring on Unmodified COTS WiFi Devices.” In Proceedings of the International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. MobiQuitous ’18.

Publication revised in Chapter 5

3. Qi Lin, Weitao Xu, Jun Liu, Abdelwahed Khamis, Wen Hu, Mahbub Hassan, and Aruna Seneviratne, “H2B: Heartbeat-based Secret Key Generation using Piezo Vibration Sensors.” In Proceedings of the International Conference on Information Processing in Sensor Networks. IPSN ’19.

4. Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise McLaws and Wen Hu, “Poster: A Weakly Supervised Tracking of Hand Hygiene Technique.” In Proceedings of the International Conference on Information Processing in Sensor Networks. IPSN ’20.

Publication revised in Chapter 3

5. Isura Nirmal, Abdelwahed Khamis, Mahbub Hassan, Wen Hu and Xiaoqing Zhu, “Poster: Combating Transceiver Layout Variation in Device-Free WiFi Sensing using Convolutional Autoencoder.” In Proceedings of the International Conference on Information Processing in Sensor Networks. IPSN ’20.

Under Review

6. Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise McLaws and Wen Hu, “RFWash: A Weakly Supervised Tracking of Hand Hygiene Technique.” (Under review in a conference)

Manuscript revised in Chapter 3

7. Isura Nirmal, Abdelwahed Khamis, Mahbub Hassan and Wen Hu, “Deep Learning for Radio-based Device-free Human Sensing: Recent Advances and Future Directions.” (Under review in a journal)

Statement of contributions of student author

Publication 1: Designing and conducting the experiment and data analysis and writing the manuscript.
Publication 2: Conducting video attack experiments and writing the relevant section in the manuscript.
Publication 3: Designing and conducting the experiments and data analysis and writing the manuscript.
Publication 4: Designing and conducting the experiments and data analysis and writing the manuscript.
Publication 5: Writing the manuscript.
Publication 6: Designing and conducting the experiments and data analysis and writing the manuscript.
Publication 7: Writing the manuscript.

Awards

1. Best Poster Award IPSN 2020

• Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise McLaws and Wen Hu, “Poster: A Weakly Supervised Tracking of Hand Hygiene Technique.” In Proceedings of the International Conference on Information Processing in Sensor Networks. IPSN ’20.

Contents

Abstract
Acknowledgements
Publications
Awards
Acronyms

1 Introduction
1.1 Wireless Signals for Medical Sensing
1.2 Systems Developed
1.2.1 Hand Hygiene Monitoring with Radio Frequency Signal
1.2.2 Human Respiratory Biofeedback with WiFi
1.2.3 Heartbeat Monitoring with WiFi
1.3 Beyond the Applications Considered

2 Background and Related Work
2.1 Sensing Using Radio Frequency Signals
2.2 Foundations
2.2.1 RF-based Gesture Recognition
2.2.2 Sequence Labelling
2.2.3 RF-based Vital Sign Monitoring
2.3 Related Works
2.3.1 Hand Hygiene Monitoring
2.3.2 Breathing Biofeedback
2.3.3 Heartbeat Estimation

3 Hand Hygiene Monitoring
3.1 Motivation
3.1.1 Hand Hygiene Monitoring in the Real World
3.1.2 RF Sensing for HH Monitoring
3.2 Background and Technical Motivation
3.2.1 Back-to-back Gesture Tracking
3.3 RFWash
3.3.1 Why Deep Sequence Labelling?
3.3.2 RF Measurements
3.3.3 Deep Learning Model
3.4 Evaluation
3.4.1 Goals, Methodology and Metrics
3.4.2 Weakly Supervised Gesture Tracking
3.4.3 Unseen Domains
3.4.4 Comparison with Fully Supervised Gesture Recognition Deep Learning Models
3.4.5 LSTM vs BiLSTM
3.5 Related Work
3.6 Discussion and Future Work
3.7 Conclusions

4 Breathing Biofeedback
4.1 Overview
4.1.1 Experimental Observation
4.1.2 WiRelax Overview
4.2 WiRelax System
4.2.1 The Impact of Chest Displacement on Sub-carriers’ Phase Difference
4.2.2 Modeling the Impact of Displacement on Phase Difference
4.2.3 Relative Displacement Estimation
4.3 Evaluation
4.3.1 Goals, Metrics and Methodology
4.3.2 Capturing Breathing Cycles
4.3.3 Capturing Complete Breathing Pattern
4.4 Related Work
4.5 Limitations and Future Work
4.6 Conclusions

5 Heartbeat Estimation
5.1 Motivation
5.2 CardioFi
5.2.1 Background
5.2.2 Preprocessing
5.2.3 Sub-carrier Selection
5.2.4 Heart Rate Estimation
5.3 Evaluation
5.3.1 Overall Performance
5.3.2 Impact of Parameters
5.4 Related Works
5.5 Limitations and Future Work
5.6 Conclusion

6 Conclusions
6.1 Summary of Contributions
6.2 Future Work
6.2.1 Limitations
6.2.2 Future Applications

References

List of Figures

1.1 Hand Hygiene Tracking using Radio Frequency signal

1.2 WiRelax System. WiRelax leverages WiFi communication to provide each subject with instantaneous respiratory feedback during the breathing exercise session. The video demonstration is available at [1].

2.1 Conceptual illustration of the medical sensing applications

2.2 RF is a genuinely anonymous signal: RF ranging data has much lower resolution than frames from a co-located Kinect depth camera and represents an alternative signal for genuine anonymity.

3.1 The alcohol-based handrub procedure recommended by the WHO [2]. The 9 steps are marked by the labels G1, G2, etc.

3.2 Gesture sequence recognition. Conceptual illustration of the proposed gesture tracking model (bottom row). Unlike conventional gesture recognition (top row), the proposed model is trained on unsegmented hand hygiene gestures and predicts labels for whole sequences of gestures at run-time.

3.3 Timeline of HH technique of a practicing healthcare worker

3.4 Back-to-back versus manually segmented gestures

3.5 Classification is highly dependent on segmentation quality in RF gesture recognition systems [4]

3.6 RFWash is trained on continuous RF samples (A) of HH gestures and corresponding sequence labels. The model automatically learns which frames correspond to individual gestures (e.g., G1 vs G2) via ‘alignment learning’. At run-time, per-frame gesture predictions (B) are produced and used to estimate the most likely gesture sequence.

3.7 Range Doppler frame measurements

3.8 Illustration of Range Doppler frame post-processing for gesture G2. The figure shows the original RD frame, the RD frame with background removed, and a smoothed version.

3.9 RFWash network architecture. The RFWash network has five convolutions followed by a max pooling layer (2x2) and a fully connected layer, followed by two bi-directional LSTM layers and finally a softmax layer. All convolutions are 3 × 3 (the number of filters is denoted in each box).

3.10 Concatenation intuition. We significantly grow the number of samples containing a specific sequence ([G7]) by concatenating it with other sequences in the training set ([G7,Gx] or [Gx,G7] in the middle column). Consequently, G7 is seen in many contexts by the model, which enables better learning of the radar frames that correspond to it within a sequence ([G6,G7], right column).

3.11 RFWash evaluation for different gesture sequence lengths

3.12 Alignment performance w.r.t. training sequence length

3.13 The impact of data augmentation

3.14 Impact of unseen sequence length on the performance of RFWash. Vertical shaded areas in the figures highlight the sequence length used in training.

3.15 Temporal HH gesture alignment. GT: ground truth; aug: with data augmentation; w/o: without data augmentation.

3.16 The benefit of data augmentation. (a) G3 retraction and protraction result in angular blobs in RD frames (red border in (c)). (b) and (c) show G3 RD frames predicted by the model with and without augmentation. Image transparency is inversely proportional to the posterior value for the frame.

3.17 The impact of unseen gesture sequences

3.18 Healthcare worker identification performance

4.1 WiRelax concept. WiRelax leverages WiFi communication to provide users with instantaneous respiratory feedback during breathing exercise sessions. Video demonstration here: [1].

4.2 Cycle counting versus instantaneous breath tracking: most CSI streams agree on breath cycle counts (three cycles in the orange segment); however, there is a lack of consensus about instantaneous breath (i.e., whether the subject is inhaling or exhaling and at what depth [see the red segment]). WiRelax addresses instantaneous breath tracking.

4.3 A sample breathing session

4.4 Recorded amplitude and phase difference for the breathing signal of Figure 4.3

4.5 Illustration of the WiRelax architecture. WiRelax is meant to be a framework that supports conscious breathing applications by providing a real-time detailed breathing waveform.

4.6 The effect of object displacement on phase and phase difference (PD)

4.7 Illustrative example annotated with key symbols used in the model derivation

4.8 Preprocessing, depicted for a 1-minute segment of a breathing session (rate 40 bpm). Time is on the x-axis. The lower row shows the pre-processing sequence applied to a single sub-carrier (#9), while the upper row shows the pre-processing effect on all sub-carriers (sub-carrier numbers on the y-axis). The values in the lower sub-figures were scaled to the range 0–1 for each sub-carrier for visualisation purposes.

4.9 Sub-carrier correlation: the correlation between all sub-carriers and the ground truth (GT) for a subject breathing normally. Matrix rows are ordered by their variance, with the highest value at the top. Sub-carriers with higher variance generally show a better correlation with GT.

4.10 Breathing waveform estimation

4.11 WiRelax breath cycle tracking. (a) Estimated relative displacement waveform compared to the reference displacement waveform. Peak and valley positions are visualised on top and bottom, respectively; slanting lines signify the deviation direction [as shown in the closeup (b)]. (c) Breath tracking accuracy for various metrics. The red lines denote the medians: 0.21 seconds, 12.2% and 14.4% for cycle timing error, relative timing error and IER error, respectively. The correlation and root mean squared error between the estimated waveform and the reference signal are also reported.

4.12 Estimated waveforms for various breathing patterns. (a), (c) and (e) show the estimated waveforms for deep breathing, deep & normal breathing and quick breathing sessions, respectively. (b), (d) and (f) show scatter plots of the estimated relative displacement in relation to the true displacement.

4.13 WiRelax biofeedback prototype (demo: [1])

4.14 WiRelax evaluation. Evaluation of different breathing accuracy metrics with respect to the distance between the user and the WiRelax system.

4.15 Breathing cycle estimations from three algorithms: Liu et al. [5], PhaseBeat and WiRelax. The gray line shows the ground truth for chest displacement. The solid gray (red) circles show the true (estimated) peak. The ticks show the time difference between the estimated and true peak (i.e., the difference in timings of the red and gray circles). (a) Liu et al. [5] estimates the cycle period using all peaks (small red circles) of selected sub-carriers. The weighted average of all sub-carriers’ cycle times determines the final breathing cycle boundaries (marked by large red circles). (b) PhaseBeat [6] employs the inter-peak duration for a chosen single sub-carrier to estimate the breathing rate. (c) WiRelax employs a cohort of selected sub-carriers to estimate the final breathing waveform.

4.16 TensorBeat [7] processing for the same data as Fig. 4.15

5.1 Power Spectral Density (PSD) curves for CSI data collected using omni-directional antennas. The low SNR makes heart rate estimation a challenge.

5.2 Heart rate estimated from sub-carriers. Actual heart rate compared to the estimation produced from individual sub-carriers. At each point in time, different sub-carriers produce estimations that vary widely (illustrated by the red shaded areas, whose boundaries represent the maximum and minimum estimation at that point). Even after discarding extreme estimates (darker area), the range remains large.

5.3 The architecture of CardioFi

5.4 CardioFi preprocessing

5.5 Correlation between estimation error and the calculated estimation variance ps

5.6 Heart rate estimation from the highest scoring sub-carrier selected using spectral-based (a) and variance-based (b) selection methods

5.7 The performance of CardioFi versus that of Liu et al. [8]

5.8 Experimental setup scenarios

5.9 The performance of CardioFi

5.10 The impact of CardioFi parameters

List of Tables

3.1 Overview of automated HH monitoring systems
3.2 Time cost to perform manual labelling and sequence labelling
3.3 Gesture Error Rate (GER) examples
3.4 The gesture recognition accuracy of RFWash
3.5 HH gesture recognition accuracy with different deep learning models
3.6 LSTM vs BiLSTM

4.1 Symbols used in the mathematical derivation

Acronyms

RF Radio Frequency

HH Hand Hygiene

HCW Healthcare Worker

WHO World Health Organization

FMCW Frequency Modulated Continuous Wave

HAI Hospital Acquired Infections

COTS Commercial Off-The-Shelf

RD Range Doppler

CTC Connectionist Temporal Classification

GER Gesture Error Rate

CSI Channel State Information

PD Phase Difference

IER Inhalation-to-Exhalation Ratio

Chapter 1

Introduction

1.1 Wireless Signals for Medical Sensing

Imagine a future in which wireless devices serve as ubiquitous sensors. Now imagine a patient with a chronic disease who must attend periodic follow-up visits with his physician every few months. During these short sessions, a physician must rely on limited observations based on tests and the patient’s responses to assessment questions to make a critical care decision [9, 10]. Fortunately, the physician has access to physiological data from the patient’s in-home wireless sensors that reveals the patient’s daily health state since his last visit. The sensors work from afar without body contact and are able to monitor fine-grained vital signs and acquire information suitable for medical assessment. Based on the data, the physician determines that the patient is experiencing exacerbation and needs to be hospitalised. This decision prevents the patient’s condition from deteriorating any further. However, in the hospital there is a chance of picking up a new infection when the patient’s nurse fails to follow proper hand hygiene practice. Luckily, the in-hospital wireless sensors mounted to soap dispensers passively identify the nurse’s improper HH practice and notify the healthcare workers (HCWs) of this occurrence.

This research seeks to directly address some of the problems that may arise in this and similar contexts. At the core of this research are systems that perform ubiquitous medical sensing in a contact-free manner and are able to operate efficiently in home and hospital settings. Specifically, we propose instrumenting indoor environments with wireless radio frequency (RF) sensors that can continually monitor residents’ health states from afar without user guidance and, more importantly, without invading their privacy.

Today, many research and commercial solutions exist for measuring human health. Wearables have been extensively used in many medical sensing applications. However, wearables may cause inconvenience to the subjects being monitored. In addition to being uncomfortable to wear while sleeping, important sections of the population, such as the elderly, typically abstain from wearing monitoring devices [11, 12]. Additionally, all-time monitoring with wearables cannot be enforced, as valuable information is lost if they are removed; thus, they are ill-suited for continuous long-term monitoring applications. Cameras represent a potential option for contact-free monitoring. However, cameras also capture auxiliary information that may invade users’ privacy by collecting data about a user’s identity or personally-embarrassing activities. Given that the entire medical field is concerned with patient and data privacy [13], visual surveillance solutions for medical sensing may struggle to gain large-scale adoption.

To address these limitations, we have developed sensing systems that can be deployed on top of commercially available wireless devices to translate the incoming wireless measurements to health metrics. To meet our objectives, we exploited wireless signals to develop three medical sensing applications. Specifically, this research shows how RF signals can be used to: 1) accurately monitor the HH practices of HCWs in healthcare facilities; 2) extract detailed breathing metrics suitable for biofeedback applications; and 3) track heart rate using unmodified ubiquitous wireless devices. The sensing approach adopted in these applications has three key advantages. First, it relies purely on wireless signals for sensing and does not disclose any visual appearance data (unlike the imaging data collected by cameras). Thus, our approach represents an alternative that seeks to protect privacy. As stated above, this is of critical importance in the medical domain, as it allows for in-ward sensing without compromising the privacy of patients. Second, it is contactless, as it does not require physical contact with the human body. Thus, it represents a suitable platform for long-term health monitoring and could potentially be used in longitudinal studies. Third, it relies on RF measurements from commercially available devices rather than bespoke RF equipment. Thus, the devices can be used directly after a simple software/firmware upgrade to the hosting RF hardware.

1.2 Systems Developed

In the course of this thesis, we progressively move from perceptible to imperceptible motion sensing. We begin by presenting a system for tracking HH motions to establish whether a HCW correctly followed the standard nine-step protocol of the World Health Organisation (WHO). Next, we measure fine-grained physiological vital sign parameters (i.e., breathing and heartbeat) from imperceptible chest displacement motions. Common across all the applications presented in the thesis is our exploitation of the effect of human motion on RF signals to perform monitoring. Consequently, we propose several techniques for extracting motions of interest and discarding irrelevant motions and noise depending on the target application. Finally, we highlight the technical contributions of the systems developed.

1.2.1 Hand Hygiene Monitoring with Radio Frequency Signal

We present RFWash, the first system to track the HH technique (i.e., the standard nine-step hand-rubbing protocol recommended by the WHO) of HCWs. Poor compliance with HH in healthcare facilities is associated with hospital-acquired infections (HAI) that can lead to mortalities. Our work represents the first attempt to employ RF sensing to address this issue.

Figure 1.1 illustrates the RFWash hand hygiene tracking system. The system processes input RF measurements to identify the hand-rub steps performed by HCWs. The live camera feed is shown for reference and is not used by the system. The notable advantages of employing RF sensing include that: 1) device-free sensing removes the risk of transferring pathogens (a potential risk when using wearables); and 2) privacy is protected, as the measurements are genuinely anonymous and details of users’ visual appearance are not disclosed.

The primary challenge in designing RFWash relates to the fact that HCWs perform the steps naturally in a back-to-back manner, such that all the steps are performed contiguously without any pauses. In the absence of pauses, RF data becomes unsegmentable. Thus, it is challenging to identify when a user starts or ends each of the nine steps using non-visual RF data. The assumption of ‘pauses’ between gestures is ubiquitous among RF gesture sensing systems [14], in which it is critical to segment the input before the subsequent classification. To address this issue, we propose an alternative solution that frames the problem in terms of sequence recognition, whereby the complete gesture sequence is predicted and the segmentation is addressed implicitly.

Fig. 1.1: Hand Hygiene Tracking using Radio Frequency signal (RF sensor: TI’s mmWave)

In Chapter 3, we present the proposed deep learning model and show that the suggested approach enables the accurate tracking of HH gestures. The proposed technique is enhanced by the use of a data augmentation method that significantly reduces the labelling effort required to train the model. We further complement the capabilities of RFWash by demonstrating how we can use the same RF measurements to identify which subject is performing the gesture. This effectively enables HH compliance to be tracked at the HCW level.
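To give a feel for how sequence recognition sidesteps explicit segmentation, the toy sketch below (not the RFWash implementation; the blank index and gesture-to-class mapping are our own assumptions) shows the standard CTC-style greedy decoding step that turns per-frame gesture posteriors into a gesture sequence:

```python
import numpy as np

# Hypothetical class layout: index 0 is the CTC 'blank';
# indices 1-9 stand for the WHO hand-rub steps G1-G9.
BLANK = 0

def ctc_greedy_decode(frame_posteriors):
    """Collapse per-frame argmax predictions into a gesture sequence:
    merge consecutive repeats, then drop blank frames."""
    best = np.argmax(frame_posteriors, axis=1)  # one label per frame
    seq, prev = [], None
    for label in best:
        # a new (non-blank) label marks the start of a new gesture
        if label != prev and label != BLANK:
            seq.append(int(label))
        prev = label
    return seq

# Simulated per-frame one-hot posteriors for the frame labels
# [G1, G1, G1, blank, blank, G2, G2, G2, blank, G3, G3]
labels = [1, 1, 1, 0, 0, 2, 2, 2, 0, 3, 3]
probs = np.eye(10)[labels]
print(ctc_greedy_decode(probs))  # -> [1, 2, 3], i.e. G1, G2, G3
```

Because decoding merges repeated per-frame labels, the model never has to be told where one gesture ends and the next begins; the frame-to-gesture alignment is learned implicitly during training.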

Fig. 1.2: WiRelax System. WiRelax leverages WiFi communication to provide each subject with instantaneous respiratory feedback during the breathing exercise session. The video demonstration is available at [1].

1.2.2 Human Respiratory Biofeedback with WiFi

Next, we demonstrate how contactless tracking based on RF data can be applied to the sub-centimetre-scale motion of the human body. We focus on breathing tracking and introduce WiRelax, a breathing biofeedback solution based on WiFi signals. Biofeedback is critical in domains in which breathing exercises are used in the clinical treatment of a breathing-related disorder, such as Attention Deficit Hyperactivity Disorder, Chronic Obstructive Pulmonary Disease and Asthma [15, 16, 17], and in well-being applications.

Figure 1.2 demonstrates our system, WiRelax. The WiFi receiver is placed in the vicinity of a user and translates the WiFi channel measurements to instantaneous breathing metrics (pattern and duration) in real time. Users are informed about their instantaneous breathing performance within the breathing session, which enables them to engage in fine-grained breath control during the session.

Extracting instantaneous breathing metrics suitable for biofeedback applications from WiFi channel measurements represented the key challenge in creating this system. Unlike the breathing rate, which can be reported based on completed breathing cycles, biofeedback metrics need to be reported instantly as the user is inhaling or exhaling. Thus, our system was designed to report on instantaneous breathing depth and timing while the breathing cycle is ongoing. We propose a mathematical model (see Chapter 4) that can regress directly on incoming measurements without having to wait for cycle completion, which in turn enables WiFi-based interactive breathing applications.
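As a rough, self-contained illustration of reporting breathing depth sample-by-sample rather than per completed cycle, the following sketch (our own simplification operating offline on a simulated signal, not the WiRelax model; the function name and parameters are hypothetical) turns a phase-difference stream into a relative displacement waveform:

```python
import numpy as np

def relative_breath_depth(pd_stream, fs=20, smooth_s=0.5):
    """Sketch: map a CSI phase-difference stream (one sub-carrier)
    to a relative breathing-depth waveform in [0, 1], one value per
    sample, without segmenting the stream into breathing cycles."""
    # remove the static (DC) component contributed by the environment
    x = pd_stream - np.mean(pd_stream)
    # moving-average smoothing to suppress high-frequency channel noise
    w = max(1, int(fs * smooth_s))
    x = np.convolve(x, np.ones(w) / w, mode="same")
    # normalise to [0, 1] as a relative depth (0 = exhaled, 1 = inhaled)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

# Simulated 30 s session at 20 Hz: 0.25 Hz breathing plus channel noise
rng = np.random.default_rng(0)
t = np.arange(0, 30, 1 / 20)
pd = np.sin(2 * np.pi * 0.25 * t) + 0.05 * rng.standard_normal(t.size)
depth = relative_breath_depth(pd)  # one depth value per incoming sample
```

A real-time system would of course use causal filtering over a sliding window rather than whole-stream statistics; the point here is only that a depth value exists for every sample, mid-cycle, which is what biofeedback requires.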

1.2.3 Heartbeat Monitoring with WiFi

Finally, this research demonstrates that RF signals can be used to track a motion that is imperceptible to the human eye: a heartbeat. In line with our previous methods, we used unmodified WiFi devices to enable in-home well-being monitoring. Heartbeats contribute to a surface chest motion that can affect the wireless channel; however, the motion is relatively small as compared to that of respiratory motion [18] (see above). This millimetre-level motion corresponds to small RF signal variations that are overpowered by the collocated respiratory motion and even the noise present in the WiFi channel.

Consequently, it is very challenging to make reliable heart rate estimations from RF signals. To improve the RF signal quality and increase the accuracy of estimations, previous works [5, 6] have enhanced the hardware with directional antennas. Conversely, we adopt a purely algorithmic approach that makes it possible to deploy the system on ‘unmodified’ WiFi devices. As discussed in

Chapter 5, our system, CardioFi, can identify stable channel data streams

(sub-carriers). These data streams can be further processed to yield reliable

heart rate estimations with a median error of 1.1 bpm, outperforming state-of-the-art approaches by a clear margin.
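The sub-carrier selection idea can be sketched as follows. This is an illustrative simplification, not the actual CardioFi pipeline: it ranks sub-carriers by how much of their spectral energy falls in a plausible heart-rate band and reads the rate off the averaged spectrum of the best ones. The band limits, `top_k` value and synthetic data are assumptions made for the sketch.

```python
import numpy as np

def heart_rate_bpm(csi_amp, fs, band=(1.0, 2.0), top_k=5):
    """Illustrative sketch (not the CardioFi algorithm): rank CSI
    sub-carriers by the fraction of spectral energy in the heart-rate
    band, then estimate the rate from the most 'stable' ones."""
    n, n_sub = csi_amp.shape
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spec = np.abs(np.fft.rfft(csi_amp - csi_amp.mean(axis=0), axis=0))
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    # Score each sub-carrier: energy concentrated in the band vs total.
    score = spec[in_band].sum(axis=0) / (spec[1:].sum(axis=0) + 1e-12)
    best = np.argsort(score)[-top_k:]
    # Average the selected sub-carriers' spectra; the dominant in-band
    # frequency is taken as the heart rate.
    avg = spec[:, best].mean(axis=1)
    avg[~in_band] = 0.0
    return 60.0 * freqs[np.argmax(avg)]

# Synthetic check: a 1.3 Hz "heartbeat" (78 bpm) present in 10 of 30
# sub-carriers, buried in stronger noise.
fs = 50.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(0)
csi = rng.normal(0, 1.0, (t.size, 30))
csi[:, :10] += 0.5 * np.sin(2 * np.pi * 1.3 * t)[:, None]
print(heart_rate_bpm(csi, fs))  # ≈ 78 bpm
```

Averaging only the well-scoring sub-carriers is what lets the weak periodic component survive the noise; using all sub-carriers indiscriminately would dilute the peak.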

1.3 Beyond the Applications Considered

In this thesis, we focused on leveraging wireless signals from commercial devices to enable novel medical sensing applications. Notably, the techniques developed in the thesis can be generalised to other scenarios. For example, the deep sequence prediction model developed for tracking back-to-back HH gestures (see Chapter 3) addresses the difficulty of input segmentation that is prevalent in other application domains, such as unsegmentable recognition (see Section 6.2). Additionally, the privacy-preserving advantages of our approach make it suitable for sensitive environments (e.g., hospital wards) and applicable to other applications that use the same setup. In Chapter 6, we explore potential medical sensing applications that could use our frameworks and consider the new research questions raised by these applications.

The ongoing exploration of the sensing capabilities of RF devices will continue over the coming years. We believe that these devices will serve as a favoured and reliable modality in real-world medical sensing in the near future.

Chapter 2

Background and Related Work

In this chapter, we discuss the background of the work presented in this thesis.

We consider and re-examine the principles underlying RF signals that make them suitable for sensing applications. Fundamentally, this shows how RF signals can be used in sensing and exposes their capabilities and limitations. Next, we briefly review the technical foundations of the methods we developed for each sensing application. Finally, we review the previous research that informed the sensing applications developed in this dissertation.

2.1 Sensing Using Radio Frequency Signals

In our endeavour to build RF medical sensing systems based on interpreting human motion impact in the RF signal, we examined fundamental questions, the answers of which had a direct effect on the design and applicability of the systems. Specifically, we considered the following questions.

1. Body-Signal Interaction

• How does an RF signal interact with a human body and how can we

use this to sense motions of various scales from the millimetre to

sub-metre levels?

2. Environment-Signal Interaction:

• How does a user’s ambience affect the RF signal and, consequently, how

can systems be tailored to home and hospital settings?

3. Privacy Preservation:

• In addition to the main sensing task associated with human motion,

how much visual appearance information (i.e., information about

faces, gender and nudity level) is disclosed by RF sensing?

First, we examined the interplay between the human body and the signal to identify the kinds of motions that can be captured and in what context. This provided the foundation to mathematically model the effect of human motion on a signal (see Chapter 4). Second, as contact-free sensing is performed, the

RF signal also reflects off ambient objects; thus, we also examined the interaction between the signal and a user’s environment. Dealing with interference from ambient objects in wireless environments is one of the key challenges facing any RF-based sensing system. Typically, the interference removal procedure depends on a multitude of factors, including the specific RF hardware, expected usage scenario and dynamics of the operating environment. Clearly, designing a ubiquitous procedure for interference removal that can work in any environment is beyond the scope of this work.¹ Here, we rather focus on the interference tolerance capabilities of various COTS RF platforms with respect to the applications considered. Finally, the privacy preservation of RF sensing was assessed. This question is rarely asked in the sensing literature,

¹ Limited by the available resources, all the experiments in Chapters 3, 4 and 5 were conducted in a lab environment.

Fig. 2.1: Conceptual illustration of the medical sensing applications (panel labels: in-hospital hand hygiene, in-home breathing and heartbeat monitoring; static reflections, dynamic reflections and reflections relevant to the sensing task; location A)

and for most systems, it is viewed as less important than sensing quality. However, in medical environments, privacy concerns represent a great barrier to

the adoption of sensing modalities that could potentially expose confidential

user information. We emphasise that we don’t explicitly consider the design

of privacy-preserving sensing systems, as this is beyond the scope of the work

presented here. We rather explore factors that motivate the use of RF signals

for sensing in medical environments from a privacy point of view.

By answering these questions, design guidelines were also developed for

each particular system. In the next section, we review the applications we

considered. Figure 2.1 provides a conceptual illustration of the applications.

The first application (see Chapter 3) is an automatic HH gesture tracking application. Currently, the tracking procedure adopted by healthcare facilities

uses manual human observation. The goal is to have automated systems that

watch HCWs’ actions and report on their compliance with the standard hand-rubbing procedure. Potential automatic monitoring systems need to track

natural human actions and perform sensing in dynamic environments (e.g., hospital wards) in which interfering users may be present. These requirements and operating conditions represent common problems in the sensing area.²

Thus, HH tracking reflects other sensing problems and can be used to examine the suitability of RF sensing in hospital settings. Heartbeat and human breathing monitoring (see Chapters 4 and 5) are the subjects of the other two applications. These key vital signs are the focus of many in-home monitoring systems, as they are excellent indicators of well-being and predictors of a range of health issues. Further, accurately estimating vital signs serves many higher-level applications, such as apnoea detection, emotion recognition, sleep quality monitoring, stress detection and conversation detection. Together, these are typical examples of in-home medical monitoring that may serve a multitude of medical sensing purposes.

In general, the problems can be considered in terms of 1) the motion scale, which ranges from the sub-metre level in HH tracking to the millimetre level in heart rate tracking; 2) the operating environment, which ranges from a dynamic clinical environment to a home environment; and 3) feedback, which needs to be instantaneous in the case of breathing biofeedback but can be relaxed in the other applications.

In this section, we address the questions above in light of the applications’ characteristics and requirements. In all the applications, we target commercial RF devices as the hosting platforms of our algorithms; thus, bespoke RF equipment will not be considered. In earlier research that used a single hardware platform, it was possible to analytically quantify this interaction by modelling the effect of human motion on RF signal amplitude and

phase given the signal and motion characteristics. However, it is challenging to extend this approach to the heterogeneous RF platforms in the market due to the vast diversity of platform parameters (i.e., bandwidths, frequencies, signal types, etc.). Consequently, we decided to focus on established physical properties of the RF signal that sufficiently demonstrate the potential of RF in motion sensing, convey its limitations and yield guidelines for designing the three medical sensing systems of interest.

² One example is activity logging in Intensive Care Units (ICUs); for further discussion on this issue, see Section 6.2 (Future Work).

1) Body-Signal Interaction: RF signals interact with materials (including human body surfaces) in a variety of ways; for example, via reflection, absorption and refraction (where the signal propagates within the material). A combination of these typically occurs in a complex way; however, human motion sensing primarily relies on reflection, as a large portion of the RF signal is reflected off the human body [19]. The reflections provide the main sensing tool; they are captured and interpreted in light of the particular motion being monitored.

The nature of this reflection is dictated by the ratio between the wavelength and the size of the reflective surface. The general rule is that RF waves bypass surfaces smaller than the wavelength, while larger surfaces reflect them [20]. Focusing on the human body as the monitored object, the precise RF reflection behaviour differentiates between two prominent wavelength categories that exist in consumer RF products today: 1) the centimetre-scale wavelength, to which the ubiquitous WiFi and other technologies belong; and

2) the millimetre-scale wavelength, which underlies 802.11ad WiFi and a range of embedded radar products that are gradually becoming mainstream in indoor sensing and automotive applications.
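The two wavelength regimes can be made concrete with λ = c/f; the carrier frequencies below are common examples, not an exhaustive list:

```python
# Wavelength lambda = c / f separates the two regimes discussed above.
C = 299_792_458  # speed of light, m/s

def wavelength_cm(freq_hz):
    return 100.0 * C / freq_hz

for name, f in [("WiFi 2.4 GHz", 2.4e9), ("WiFi 5 GHz", 5e9),
                ("802.11ad 60 GHz", 60e9), ("automotive radar 77 GHz", 77e9)]:
    print(f"{name}: {wavelength_cm(f):.2f} cm")
```

WiFi wavelengths come out at roughly 6 to 12 cm, while 60 GHz and 77 GHz devices operate at well under 1 cm, which is why the two categories interact so differently with body surfaces.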

The Centimetre-scale Wavelength: In common RF technologies (such as

WiFi), whose wavelengths are on the order of centimetres, the human body reflects the signal in a specular manner (i.e., a mirror-like reflection) [21, 22] rather than diffusing and scattering it in all directions. Under specularity, the nature of the reflections varies across different body parts. For moving limbs, reflections may be missed when the signal is deflected away from the receiver. For example, part of the reflections from a user’s moving hand can be deflected away (see location A in Figure 2.1, where only the coloured body parts are observable by the receiver), or reflections from the whole limb can be missed by the receiver, depending on the surface orientation. Thus, it is challenging to obtain continuous measurements from moving limbs. Conversely, the torso, as a large reflector with relatively controlled movement, can be monitored to capture continuous reflections. Even if a user is mobile (moving towards the receiver), continuous measurements can be captured from the torso

[22]. Further, the spatial extent of the torso (the largest reflector in the human body) permits reliably capturing minute motion reflections.

To summarise, the torso (as a large spatial reflector) can be monitored for minute motions. We relied on this fact to capture motions associated with vital signs. To do this, we adopted the convention used previously by [23]

[22] to monitor subjects while their bodies face the wireless devices. Additionally, torso reflections are continually available, unlike limbs, which provide only occasional reflections. We used this to continually capture measurements from the torso with minimal missed reflections. This can be leveraged (when the body is quasi-stationary [23]) for interactive sensing applications, such as breathing biofeedback, that require processing and reporting on contiguous measurements.

Millimetre-scale Wavelength: The aforementioned specularity property complicates tracking moving limbs using RF [22][24]. Additionally, centimetre-scale wavelengths lack the sensitivity to track small surfaces of the body, such as micro-hand gestures and finger configurations [25][24].

However, millimetre-wave alternatives have proven more capable in this regard [26], as they are easily reflected by small objects, including fingertips. Consequently, applications involving intricate hand gestures with complex finger configurations and mirrored gestures (e.g., HH gestures) can benefit from millimetre-wave sensitivity.

Occlusion Impact: A millimetre wave provides better sensing resolution; however, its signal attenuation can limit the sensing range compared to technologies with longer wavelengths. Generally, the longer the wavelength of an electromagnetic wave, the lower its attenuation [27]. These facts have been exploited to the advantage of applications in the RF sensing literature. Through-wall sensing systems employ long wavelengths that bypass obstructions, such as walls. Systems for tracking small objects (e.g., a pen

[20]) that are transparent at centimetre-scale wavelengths employ millimetre-wave signals that reflect more strongly off these objects. In light of this, centimetre-scale wavelength signals can cover multiple rooms [28] if necessary and operate through potential in-home occlusions, such as thick clothing. In vital sign monitoring applications, this enables effective tracking even if a user is sleeping under a quilt (a scenario that reportedly degrades the accuracy of other contact-free modalities, such as ultrasound) [29].

By analysing the motion impact in the reflected RF signal, we can gain insight into the actions that are occurring. Regardless of the mechanism used to analyse the signal, RF sensing indoors is typically challenged by environment-related factors (called multi-path) that raise a second question in relation to the effect of the interaction between the signal and the environment.

2) Environment-Signal Interaction: As the signal interacts with the materials in the environment, the operating environment needs to be factored into the RF sensing system design. This may include reflections from walls, furniture and other objects (see the grey paths in Figure 2.1) and reflections from other users engaging in independent actions that cause dynamic interference

(see the green paths in Figure 2.1). The latter are dynamic multi-paths, whose effects are more difficult to remove than those of the first type. Thus, the measurements of different reflectors need to be separated. One common approach to mitigating dynamic multi-paths is to employ ranging information from Time-of-Flight (ToF) measurements. Intuitively, reflections from nearby reflectors will reach a receiver before those from farther away. However, signals travel at the speed of light, and all reflectors are close to each other in indoor environments. Thus, reflections that are a few feet apart are still difficult to separate and typically require a larger bandwidth. Previous research has shown that it is possible to widen the original WiFi bandwidth through channel stitching [30]. However, this technique is subject to strict timing constraints and the final ranging resolution is limited. As an alternative approach, we used commercial embedded millimetre-wave radars that can export precise ranging measurements.
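The bandwidth argument can be quantified with the standard range-resolution formula ΔR = c/2B: two reflectors closer together than ΔR produce overlapping echoes that ToF alone cannot separate. The bandwidth values below are illustrative assumptions:

```python
C = 299_792_458  # speed of light, m/s

def range_resolution_m(bandwidth_hz):
    """Minimum reflector separation resolvable by ToF: c / (2B)."""
    return C / (2.0 * bandwidth_hz)

# A 40 MHz WiFi channel cannot separate reflectors within roughly 3.7 m,
# whereas a multi-GHz radar sweep resolves a few centimetres.
print(f"{range_resolution_m(40e6):.2f} m")        # WiFi-scale bandwidth
print(f"{range_resolution_m(4e9) * 100:.1f} cm")  # radar-scale bandwidth
```

This hundredfold gap is why the embedded radars, rather than stitched WiFi channels, were the practical route to per-reflector separation.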

3) Privacy Preservation: One example of a successful sensing technology that raises privacy concerns in sensitive environments is computer vision. Cameras can perform accurate sensing and tracking; however, due to their resolution, they can easily capture privacy-sensitive information, including information about an individual’s face, gender and skin colour [31].

Recent research efforts have addressed this issue by using various techniques to artificially downgrade the imaging quality and resolution [32, 33]. We show that the intrinsic properties of the RF signal yield low-resolution imaging from the native measurements without the need for any transformation. Many factors contribute to this; however, we discuss two key factors that are best understood when comparing RF systems to cameras. The analogy is valid given that RF and visible light share many of the properties of electromagnetic waves [34].

The first factor is ‘undetectable surfaces’, which relates to our earlier discussion on the relationship between wavelength and sensing resolution. As everyday objects are not much larger than the illumination wavelengths used in common RF devices³, many surfaces become undetectable and are missed by RF imaging [35]. This also applies to millimetre waves, whose wavelengths are still thousands of times longer than the nanometre-scale waves of visible light.

The second factor is the practical challenge of constructing the ‘dense RF sensor arrays’ required for imaging at high spatial resolution. Visible cameras capture images at high spatial resolution through millions of tiny sensors

(pixels). Today, these are compact enough to fit in our smartphones. In RF systems, these sensors (RF pixels) are antennas with associated complex circuitry [34]. As the size of a sensor has to be comparable to an RF wavelength [34], it is very challenging to build a dense array of RF pixels, even when using a shorter wavelength. To better illustrate this gap, we compare the resolution of modern millimetre-wave RF technology to that of depth cameras (see Chapter 3 for further details). Figure 2.2 shows a qualitative comparison of the ranging measurements of a millimetre-wave radar system and a depth camera that was employed by [32] as a privacy-preserving alternative to RGB cameras for HH monitoring. The depth camera reveals visual appearance details, such as a user’s body shape and the geometric layout of the user’s surroundings. Conversely, reflections from RF signals are sufficient for sensing purposes, such as

identifying a user’s location, without disclosing sensitive information.

³ RF technologies in the microwave frequency range, such as WiFi and Radio Frequency Identification (RFID).

(a) RF ranging heatmap. (b) Depth camera range information.

Fig. 2.2: RF as a genuinely anonymous signal: RF ranging data has much lower resolution than frames from a co-located Kinect depth camera, making it an alternative signal for genuine anonymity.

Thus, in terms of privacy friendliness, RF sensing is preferable to state-of-the-art contact-free modalities (e.g., vision). Based on the various

RF signal properties we explored above, the following guidelines can be used when considering a hosting RF platform:

• In-home Monitoring: WiFi technologies can be used for vital sign

in-home monitoring, as they are highly ubiquitous and already available

in residential environments. The technology is sufficient to accurately

sense vital sign micro-motions, as the torso is much larger than a signal’s wavelength. Further, the signal reflections can be captured through

potential occlusions; thus, it is possible to employ vital sign monitoring

during sleep and other related applications.

• In-hospital Monitoring: Conversely, commercial off-the-shelf (COTS)

radars with shorter wavelengths (e.g., millimetre-wave radars) are more

practical for sensing the intricate motions of the hand gestures of HCWs.

For example, modern radars provide accurate ToF measurements that

can be instrumental in separating dynamic reflections from interfering

users in a highly dynamic environment. Further, recent millimetre-wave

radars have a small form factor and thus can be used for this kind of monitoring without deployment concerns.

While it is possible to use millimetre-wave radars for in-home vital sign monitoring, employing the WiFi platforms already present in today’s home environments is preferable, as it enables large-scale deployment at much lower cost [5]. Conversely, WiFi measurements will be less reliable in hospital settings due to interference from secondary users. Such interference can be mitigated by employing precise ranging information, as explained earlier.

Moving on from the guidelines above, we build on foundations from the literature to address typical RF sensing challenges in the three medical sensing problems. To turn these abstract ideas into actual sensing systems, the following new technical challenges had to be addressed in this thesis:

Sensing Natural Human Behaviour: Assumptions about human behaviour that do not hold in realistic environments have prevented many research systems from being taken out of the laboratory. In the RF sensing literature, the practicality of a gesture recognition system is limited by simplifying assumptions about human behaviour when performing gestures. For example, one gesture recognition system [4] assumes access to perfectly segmented data, which is not attainable in sign language gestures (unless a user pauses after every gesture). In [36], a similar assumption was made that required the user to pause between subsequent gestures to allow accurate segmentation. When operating in conditions in which the simplifying assumptions do

not hold, accuracy degrades significantly (see Chapter 3), leading to sensing deficiencies. In RFWash (see Chapter 3), we first observed the behaviour of a professional HCW and then constructed a system to identify naturally performed back-to-back gestures, ensuring that users do not have to adopt artificial behaviours, such as pausing after every gesture. Our approach departs significantly from the two-stage models (segmentation followed by classification) common in the literature, as it frames the problem as sequence learning.

Relying on Consumer Devices: As mentioned above, we focused on accessible technology that can be deployed with minimal effort rather than developing custom hardware. In addition to being ubiquitous and cost-efficient, recent progress in large-scale manufacturing has made the form factor of off-the-shelf hardware much smaller. For example, the TI radar we used for HH tracking (see Chapter 3) is significantly smaller than the metre-scale non-commercial platforms [37, 38] that employ the same underlying technology. Had we used bulky designs to boost sensing performance instead, their form factor would have created a practical deployment challenge when fitting them onto soap dispensers.

Thus, there is an added advantage of employing commercial RF.

Despite these advantages, the employment of consumer devices rather than specifically designed hardware introduces various practical challenges depending on the actual hardware used. Sometimes a specific valuable RF measurement type is invalidated. For example, on custom-made WiFi platforms [39], reliable phase measurements, which have been shown to be instrumental in achieving precise motion tracking [20], can be acquired. However, as we will see later

(see Chapters 4 and 5), these measurements are very noisy and of little use on off-the-shelf WiFi hardware. Consequently, we turned to different measurements of phase difference (PD) with different characteristics and used these to model motion impact. Another challenge that arose in using commercial

RF is that the measurements might be available albeit at much lower sampling rates. We encountered this in the HH tracking system (see Chapter 3), as embedded radars export measurements at very low frame rates due to resource constraints. Given this limitation, a time window with a few samples

(e.g., 10) may still contain data from multiple gestures rather than one gesture.

Consequently, our architecture was developed to address this scenario.

We believe in the critical value of developing sensing systems that use the native capabilities of already popular and accessible RF hardware platforms, and of designing sensing workflows that adapt to natural human behaviour.

2.2 Foundations

In this section, we explore the technical foundations on which the methods in the results chapters (Chapters 3, 4 and 5) build. First, we discuss the RF-based gesture recognition and sequence labelling literature relevant to the RFWash system in Chapter 3. Then, we review the methods for vital sign monitoring using RF that were leveraged in Chapters 4 and 5. Here, we only briefly explore these technical foundations; a thorough treatment of the background material is presented in the individual chapters.

2.2.1 RF-based Gesture Recognition

In a typical RF-based gesture recognition system [4, 40, 41], the measurement stream is processed using a staged sequential approach in which the output of each stage is used as input to the next. Commonly, three stages are employed: preprocessing, segmentation and classification. In the

first stage, the input measurements are preprocessed to alleviate the impact of noise and interference caused by reflections from static and dynamic objects in the environment. Next, the segmentation stage utilises “silence periods” in the measurement stream to extract segments that contain only the gesture data. The intuition is that in periods where the user is stationary (i.e., silence periods), variations in the RF measurements will be minimal, which can be detected based on an empirical threshold [40]. By extracting the measurements between two consecutive silence periods, segmented gesture samples can be acquired.

Finally, the gesture samples are forwarded to a classifier that predicts the gesture label.
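The silence-period idea can be illustrated with a toy sketch (not any specific published system): per-window variance of a 1-D measurement stream is thresholded, and contiguous high-variance runs become gesture segments. The window size and threshold are assumed values that would be tuned empirically:

```python
import numpy as np

def segment_gestures(sig, win=25, thresh=0.01):
    """Toy silence-period segmentation: compute the variance of
    non-overlapping windows, mark windows above an empirical threshold
    as motion, and return (start, end) sample indices of motion runs."""
    var = np.array([sig[i:i + win].var() for i in range(0, len(sig) - win, win)])
    active = var > thresh
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i * win           # motion run begins
        elif not a and start is not None:
            segments.append((start, i * win))  # silence resumes
            start = None
    if start is not None:
        segments.append((start, len(active) * win))
    return segments

# Synthetic stream: silence, gesture, silence, gesture.
rng = np.random.default_rng(1)
quiet = rng.normal(0, 0.01, 200)
gesture = np.sin(np.linspace(0, 30, 300)) + rng.normal(0, 0.01, 300)
stream = np.concatenate([quiet, gesture, quiet, gesture])
print(segment_gestures(stream))  # two runs, roughly samples 200-500 and 700-975
```

The fragility discussed below follows directly from this scheme: a threshold that is slightly off merges or splits segments, and a classifier trained on clean segments inherits those boundary errors.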

Extensive prior work on RF-based gesture recognition has clearly demonstrated that the RF signal can capture complex hand dynamics. Even when the number of gestures is very high, classification can be done efficiently on segmented gesture data, as demonstrated by SignFi [4]. Guided by this fact, we revisit

RF-based gesture classification when segmentation is not possible, as is the case with hand hygiene gestures. A key outcome of our study is that classifiers pre-trained on segmented samples cannot tolerate even marginal segmentation errors at runtime. Consequently, we explored various alternative approaches for resolving the problem, the most successful of which is inspired by advances in “sequence labelling” (discussed next), which addresses segmentation and classification simultaneously during the learning stage.


2.2.2 Sequence Labelling

In the machine learning literature, sequence labelling refers to tasks in which sequences of data are transcribed with sequences of discrete labels [42].

Speech and handwriting recognition, protein secondary structure prediction and part-of-speech tagging are a few example domains in which sequence labelling is used extensively. The noticeable similarity between speech/handwriting recognition and the requirements of the hand hygiene tracking problem motivates employing sequence labelling techniques. More specifically, we have a continuous, unsegmentable input RF stream and we need to map it to the corresponding discrete gesture labels. Thus, the problem can be framed as sequence labelling rather than the traditional segmentation followed by classification. Formulating the problem this way, however, means that information about the start and end times of each individual gesture within the sequence is not accessible to the machine learning model. In other words, the problem is further complicated by the fact that the alignment between inputs and gesture labels is unknown.

We leverage weakly supervised sequence labelling [43], in which the learning algorithm learns to determine the locations as well as the identities of the output labels within a continuous input sequence. During runtime, the algorithm assigns a likelihood to possible alignments, which is then used to infer the most likely gesture sequence. This aligns very well with the problem requirements, as it allows us to train the system on unsegmented RF sequences; thus, the difficulty of segmentation is bypassed. We support this learning process with a novel data augmentation scheme to tackle the problem of limited training data.


2.2.3 RF-based Vital Sign Monitoring

The success of RF-based vital sign monitoring was popularised by radar-based systems such as Vital-Radio [23], which works by transmitting a low-power wireless signal and measuring the time it takes to travel to the human body and reflect back to the receiving antennas. The FMCW radar used in Vital-Radio enables such accurate distance tracking that even the slight, periodic chest displacements caused by the user’s breathing and heartbeat can be monitored accurately. In this way, the system can be fitted into a home environment, turning it into a smart environment that monitors the residents’ vital signs.
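The FMCW principle can be sketched numerically; the sweep parameters below are assumed for illustration and are not Vital-Radio’s actual configuration:

```python
# FMCW ranging sketch: a chirp sweeping bandwidth B over T seconds is
# mixed with its own reflection; the beat frequency f_b encodes distance:
#   R = c * f_b / (2 * slope),  where slope = B / T.
C = 299_792_458  # speed of light, m/s

def fmcw_distance_m(f_beat_hz, bandwidth_hz, sweep_s):
    slope = bandwidth_hz / sweep_s
    return C * f_beat_hz / (2.0 * slope)

# Assumed example: a 4 GHz sweep in 40 us. A reflector 2 m away
# produces a beat frequency f_b = 2 * R * slope / c (about 1.33 MHz),
# which the formula maps back to the distance.
slope = 4e9 / 40e-6
f_beat = 2.0 * 2.0 * slope / C
print(fmcw_distance_m(f_beat, 4e9, 40e-6))  # ≈ 2.0 m
```

Chest displacement then appears as a millimetre-scale oscillation of this recovered distance over successive chirps.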

Despite the proven success of FMCW-based vital sign monitoring systems like Vital-Radio, another line of research has focused on utilising the ubiquitous wireless devices that already exist in home environments [5, 8]. In particular, WiFi devices are used as vital sign monitors by extracting the breathing and heartbeat information from the channel measurements (CSI). Although distance measurements cannot be directly estimated from WiFi CSI, the theory of operation is very similar to that of Vital-Radio. As the signal is reflected by the human body before reaching the receiving antenna, variations in the receiver’s CSI sub-carriers will capture vital sign-related periodic displacements. Methods have been developed to de-noise the measurements and extract the vital sign information using frequency analysis [5, 8]. WiRelax

(Chapter 4) and CardioFi (Chapter 5) build on these methods and upgrade their capabilities to accommodate new application scenarios, such as heartbeat monitoring without specialised antennas and ongoing breathing-cycle monitoring for biofeedback.
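A simplified version of the de-noise-then-analyse recipe is sketched below. It is illustrative only: the smoothing window, sampling rate and breathing-rate band are assumed parameter choices, and the actual WiRelax/CardioFi pipelines are more elaborate.

```python
import numpy as np

def breathing_rate_bpm(subcarrier_amp, fs):
    """Sketch of the de-noise + periodicity-analysis recipe: smooth one
    CSI sub-carrier's amplitude with a moving average, then find the
    breathing period via autocorrelation."""
    x = subcarrier_amp - np.mean(subcarrier_amp)
    kernel = np.ones(int(fs)) / int(fs)          # 1 s moving average
    x = np.convolve(x, kernel, mode="same")
    ac = np.correlate(x, x, mode="full")[x.size - 1:]  # lags 0..N-1
    # Search only lags covering plausible rates of 6-30 breaths/min.
    lo, hi = int(fs * 60 / 30), int(fs * 60 / 6)
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * fs / lag

# Synthetic sub-carrier: 15 breaths/min (0.25 Hz) plus noise.
fs = 20.0
t = np.arange(0, 60, 1 / fs)
sig = np.sin(2 * np.pi * (15 / 60) * t) \
      + 0.2 * np.random.default_rng(2).normal(size=t.size)
print(breathing_rate_bpm(sig, fs))  # ≈ 15
```

Restricting the lag search to a physiological band is the simple trick that keeps noise peaks outside the plausible breathing range from being picked up.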

2.3 Related Works

In this section, we summarise related works to define the scope and context of the developed medical sensing applications. We then revisit and complement this review by elaborating on the technical differences between our work and related works in the corresponding chapters.

2.3.1 Hand Hygiene Monitoring

The first application we consider is tracking the hand hygiene technique, in which healthcare workers perform nine standard gestures. The current gold standard for monitoring is human auditing. However, in addition to costing the community millions of dollars annually, it captures only a small fraction of hygiene opportunities.

To address this issue, previous efforts to automate HH monitoring have focused on:

• Dispenser Usage Monitoring: This technique simply tries to determine how often HCWs use soap dispensers compared to expectations.

In hospitals, indirect estimation methods, such as detergent consumption and electronic counters [44], have been employed. Additionally,

direct monitoring using wearable Radio Frequency Identification (RFID)

[45] or Bluetooth tags have also been tested. When such technologies are

employed, HCWs are recorded as having performed a HH action when

their tags are detected by a tag reader-enabled dispenser. However, given

that the use of a dispenser does not necessarily mean that the correct

technique was followed, these approaches do not provide feedback about

the quality of the actual HH procedure.


• Hand Hygiene Technique Monitoring. Under this group, a system tracks an individual's actual hand-rubbing steps (the nine gestures specified by the WHO). This fine-grained tracking differs substantially from simply tracking dispenser usage. Wearables [46], depth [3] and RGB cameras [47] are able to conduct the tracking. However, due to the privacy concerns associated with camera imaging and the possibility of transferring pathogens via wearables, these systems do not have the potential to work in hospital wards.

Our system falls into the second category. In our RF setup, we track the hygiene steps by capturing reflections of subjects' hands in a contactless way.

The fact that the gestures are performed with interlocked hands, involve subtle motions, and are in many cases mirrored introduces challenges for contact-free systems. Previous contact-free efforts relied primarily on detailed visual appearance features of hands from RGB [47] or depth [48] cameras.

However, our research appears to be the first to employ RF for HH tracking.

Unlike video-based techniques, capturing detailed visual appearance, such as hand shape or orientation, is not attainable from RF measurements. Our algorithms rely on range and velocity information extracted from subjects' hand reflections. The algorithms also address the realistic requirements associated with operating inside clinical environments (e.g., HCWs performing the steps in a back-to-back manner, and staff and other subjects being actively present in the vicinity of the main subject).

2.3.2 Breathing Biofeedback

In our second application, we used WiFi to track human breathing for breath control applications. Under this system, a user can track her breathing instantly to be able to perform a breathing action (inhaling/exhaling/retention) that aligns with a specific breathing exercise. The benefits of breath control range from the development of physical and mental well-being [49, 50] to the treatment of illnesses such as hypertension and arrhythmias [51]. The efficacy of breath control training has led to it being used in non-drug medical treatment solutions. For example, Resperate [52] (a medical device recommended by the American Heart Association [53] for lowering blood pressure) is based entirely on episodes of slow breathing exercises.

To allow users to achieve breath control, methods were developed to inform users about their breathing activities. Conventionally, devices such as respiratory inductance plethysmography (RIP) are used to accurately track respiration activity in clinical settings. As the device requires two sensory bands around a patient's torso to record the breathing motion and a specialist to operate it, it is ill-suited for in-home use. Consequently, researchers have investigated various methods to achieve ubiquitous respiration tracking. The large body of work concerned with the actual tracking of breathing activity can be divided into two groups:

• Breathing Rate Monitoring. In this group, the periodicity of breathing-induced motion is analysed to infer the breathing rate. This topic has been the subject of extensive research and has been examined using a wide range of technologies. The common "body surface sensors" approach emulates the RIP principle by attaching sensors to the chest/abdomen to record breathing rates. For this purpose, devices such as chest bands [54], smartphone inertial sensors [55] and smartwatches [56] have been used. Today, some of these methods are already present in consumer products. For example, SpireHealth [57] and VitaliWear [58] use accelerometers attached to underwear to track breathing rates. Conversely, contactless approaches have made considerable progress despite having slightly lower accuracy than body surface sensors. Respiration monitoring has been conducted using cameras and video amplification techniques [59] and by determining the impact of chest motion on ultrasonic signals [60] and wireless signals [23]. Despite the success of the previously discussed technologies, the breathing rate is a summary statistic that represents the number of breaths per minute and does not capture the breathing pattern details that are needed in many applications, including biofeedback.

• Breathing Biofeedback. In this group, the goal is to provide users with instant fine-grained breathing metrics in breathing training sessions. Technologies similar to those reviewed in the first category have been used to obtain biofeedback. The Prana wearable [61] is used for diaphragmatic (i.e., abdominal) breathing training. MindfulWatch [62] uses a smartwatch to track respiration from a user's wrist and reports breathing cycle timing for meditation sessions. Subsequent to our work, Breeze [63] showed that smartphones' microphones can be used to detect breathing phases in gamified biofeedback-guided breathing training.

Our work (Chapter 4) belongs to the second category but also builds on the WiFi sensing literature [5, 6, 64, 65] from the first category. The major target is to provide a contact-free, inexpensive breathing biofeedback solution that can operate on top of commercial WiFi devices without requiring specialised hardware. The algorithms employed by past WiFi sensing work [5, 6, 64, 65], however, cannot be extended to biofeedback, as they use various techniques to analyse completed breathing cycles. Conversely, our system tracks a user's ongoing breathing cycle and reports the instant progress on a 'breath-by-breath' basis so that the user can exert a breath control action. For example, the system might guide a user to stop exhaling after a fraction of a second (e.g., 0.25 seconds) and start inhaling afterwards. In addition to timing, we need to inform users of their breathing depth. For this purpose, we developed a novel model that recognises instantaneous breathing pattern parameters (including depth, timing and inhalation-to-exhalation ratio) by regressing directly on incoming wireless measurements, making it suitable for guiding users in breathing exercise sessions.
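As a toy illustration of 'breath-by-breath' feedback (a simplified rule-based sketch, not the regression model developed in Chapter 4), the ongoing breathing phase can be read off the sign of the chest-displacement derivative:

```python
def breathing_phase(displacement, eps=0.01):
    """Label each incoming chest-displacement sample with the ongoing
    breathing phase: 'inhale' while displacement rises, 'exhale' while
    it falls, and 'hold' (retention) while it is nearly flat.
    `eps` is an assumed noise threshold on the per-sample change."""
    phases = []
    for prev, cur in zip(displacement, displacement[1:]):
        delta = cur - prev
        if delta > eps:
            phases.append("inhale")
        elif delta < -eps:
            phases.append("exhale")
        else:
            phases.append("hold")
    return phases

# One synthetic breath: rise, brief retention, fall.
print(breathing_phase([0.0, 0.2, 0.4, 0.4, 0.2, 0.0]))
# ['inhale', 'inhale', 'hold', 'exhale', 'exhale']
```

A real biofeedback system would additionally report depth and timing, which this sketch omits.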

2.3.3 Heartbeat Estimation

From serving as an indicator of a range of important health issues [66, 67] to optimising exercise to reach fat-burning zones [68, 69], heart rate has always been used as a main vital sign. Outside hospitals, the estimation of heart rate has been the focus of many ubiquitous healthcare sensing systems, which have used smartphone cameras and inertial sensors [70]. At the same time, various research groups have used RF platforms to investigate contactless monitoring.

Fundamentally, contactless heart rate monitoring is achieved by analysing the signal reflections of ballistocardiography (BCG). BCG refers to movements of the body synchronous with the heartbeat due to ventricular pump activity [71]. Previous research has shown that the periodic vibrations caused by chest BCG reflect the heart rate [72]. This fact has been exploited by various RF systems, including FMCW radars [23], WiFi [5] and RFID. Similarly, our system analyses signal reflections from minute chest displacements to estimate heart rate. We were inspired by sensing systems that capture heart rate from WiFi channel measurements [65, 6]. A main limitation of these systems is that they use a directional antenna to amplify the feeble motion impact on the noisy wireless measurements. Such antennas are far less common in residential WiFi devices than omnidirectional ones. Unlike previous research, CardioFi (see Chapter 5) does not employ special antennas to boost the signal-to-noise ratio. In this scenario, as we show below, the data streams (sub-carriers) become very noisy, and fusing them directly, as per previous techniques [65, 6], yields inaccurate estimations. We make the experimental observation that a few of the data streams still reflect the actual heart rate, and we propose a novel sub-carrier selection scheme to identify and discard the other noisy channels early in the processing. Ultimately, CardioFi brings heart rate estimation capabilities to unmodified WiFi devices.
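The sub-carrier selection idea can be sketched as follows (an illustrative simplification; CardioFi's actual scheme is described in Chapter 5): keep only the sub-carriers whose dominant frequency already falls inside a plausible heart-rate band, then fuse the survivors. Function names, the band limits and the synthetic signals are assumptions.

```python
import cmath
import math

def dominant_freq(x, fs):
    """Frequency (Hz) of the strongest DFT bin, ignoring the DC term."""
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]
    best_k, best_p = 1, -1.0
    for k in range(1, n // 2):
        coef = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        p = abs(coef) ** 2
        if p > best_p:
            best_k, best_p = k, p
    return best_k * fs / n

def select_and_fuse(subcarriers, fs, band=(1.0, 2.0)):
    """Discard sub-carriers dominated by out-of-band energy, then average
    the remaining per-sub-carrier heart-rate estimates (in bpm)."""
    estimates = [dominant_freq(s, fs) for s in subcarriers]
    in_band = [60.0 * f for f in estimates if band[0] <= f <= band[1]]
    return sum(in_band) / len(in_band) if in_band else None

# Two sub-carriers reflect a 1.2 Hz heartbeat; one is swamped by slow motion.
fs = 20.0
heart = [math.sin(2 * math.pi * 1.2 * t / fs) for t in range(200)]
noisy = [math.sin(2 * math.pi * 0.3 * t / fs) for t in range(200)]
print(round(select_and_fuse([heart, heart, noisy], fs)))  # 72
```

Averaging the noisy sub-carrier in would have pulled the estimate off; discarding it first is the point of the selection step.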

Chapter 3

Hand Hygiene Monitoring

Healthcare Associated Infections (HAIs) find their way to one in twenty-five patients admitted to hospitals [73] and lead to increased patient mortality and healthcare costs [73]. A proper hand hygiene protocol (i.e., frequent and thorough hand cleaning) is an effective way to combat HAIs [74]. This leads to the question of how one can monitor hand hygiene (HH) adherence in a hospital environment.

The conventional approach to HH adherence monitoring is to employ a team of observers (e.g., overt, trained nurse auditors) to record Hand Hygiene Opportunities (HHOs) and the number of times health care workers (HCWs) comply with the protocol. Today, this is considered by the World Health Organization (WHO) to be the gold standard for measuring compliance.

To date, attempts to implement automated alternatives for monitoring HH have had limited success. For example, electronic counters [44] and RFID [45] simply count hand washing activities. These tools provide a very limited picture of HH adherence: they cannot reveal whether the hand hygiene technique (such as the nine-step procedure for applying alcohol-based handrub recommended by the WHO [2]; see Figure 3.1) has been thoroughly adhered to.



Fig. 3.1: The alcohol-based handrub procedure recommended by the WHO [2]. The nine steps are marked by the labels G1, G2, etc.

Although there are commercial camera systems for training HCWs to learn the correct HH technique, to the best of our knowledge, there exists no solution for automated monitoring of the HH technique in healthcare facilities.

In this chapter, we use commercial off-the-shelf mmWave RF sensors to monitor the HH technique shown in Figure 3.1. Our vision is to embed these sensors in the alcohol-based handrub dispensers that are distributed throughout hospitals, to monitor whether HCWs who perform hand rubbing have adhered to the HH technique. Our vision thus enables much more fine-grained monitoring of HH adherence.

The HH technique in Figure 3.1 comprises nine different hand movement patterns. Recently, major progress has been made in gesture recognition using radio frequency (RF) signals [4]; however, HH gesture monitoring presents unique challenges. First, six of the nine steps of the hand-rub technique are very similar, as they comprise motions in which the left and right hands are mirrored.

Second, some gestures are performed with two hands interlocked. Finally, the entire procedure is performed without a pause between consecutive gestures.

Contiguous sequences of gestures have not been previously investigated in the RF sensing literature. In fact, previous RF-based sensing approaches [4] rely on pauses between gestures that are employed as physical markers to identify the start and end of each motion segment. This approach trivially achieves accurate segmentation, and the problem reduces to gesture classification. In the absence of enforced pauses, joint segmentation and classification becomes a challenging task.

Back-to-back gestures with no pauses defy traditional segmentation techniques. Due to the significant interdependence between segmentation and subsequent recognition, poor segmentation (see Section 3.2.1) deteriorates classification performance. Thus, the traditional approach cannot be adapted to HH tracking. The challenge of RF-based contiguous gesture recognition has been recognized in prior research [4, 75]; however, to the best of our knowledge, no attempts have been made to address it. For example, the WiFi-based sign language recognition system SignFi [4] sidesteps the segmentation issue by assuming that "manually segmented" single-gesture samples can be acquired. Clearly, this assumption is unrealistic, and the problem was posed this way because considering contiguous gesture recognition "introduces many challenges" [4].

In this work, we address the problem by introducing RFWash, a segmentation-free approach for recognising back-to-back HH gesture sequences. We drew inspiration from modern end-to-end speech recognition systems, which face a similar problem in that it is difficult to label continuous speech data. Of particular relevance to our problem are weakly supervised methods that can learn directly from data without requiring explicit data segmentation and full annotation. To this end, we developed a model that can be trained on back-to-back gesture sequences (see Figure 3.2) without requiring gesture segmentation, which also reduces the labelling overhead substantially.


Fig. 3.2: Gesture sequence recognition. Conceptual illustration of the proposed gestures tracking model (bottom row). Unlike conventional gesture recognition (top row), the proposed model is trained on unsegmented hand hygiene gestures and predicts labels for whole sequences of gestures in run-time.

A straightforward adaptation of sequence learning, however, does not work for long HH gesture sequences. Long training sequences pose two major challenges that RFWash needs to overcome. First, working with longer sequences leads to fewer training data points, as a fixed-size training set gets split into fewer sub-sequences in proportion to the sub-sequence length. Second, the number of possibilities for aligning a minimal gesture label sequence within an RF HH data sequence grows exponentially with the sequence length [76].
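The combinatorial growth of alignments can be made concrete under a simplified model that ignores CTC blank tokens: if each gesture in an ordered label sequence must occupy at least one contiguous run of frames, an alignment is a composition of T frames into L positive parts, giving C(T-1, L-1) possibilities. The helper name is illustrative.

```python
from math import comb

def n_alignments(n_frames, n_gestures):
    """Number of monotonic alignments of an ordered gesture sequence onto
    a frame sequence, where each gesture occupies at least one contiguous
    run of frames (CTC blanks ignored for simplicity):
    C(n_frames - 1, n_gestures - 1)."""
    return comb(n_frames - 1, n_gestures - 1)

# Doubling the sequence length (frames AND gestures) blows up the count:
for frames, gestures in [(20, 5), (40, 10), (80, 20)]:
    print(frames, gestures, n_alignments(frames, gestures))
```

Even for the nine HH gestures in a modest 16-frame window there are already 6,435 candidate alignments, which is why unconstrained alignment learning on long sequences becomes ill-posed.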

Ultimately, the problem becomes ill-posed and results in poor alignment (see Section 3.4.3). To address this issue, we used data augmentation to significantly increase the number of training samples without modifying the sequence content. Consequently, a significant improvement in sequence learning was achieved.

We make the following contributions in this chapter:


1. We propose and implement RFWash 1, which is the first RF-based system for device-free monitoring of the nine-step HH technique.

2. We characterize the challenges of recognizing back-to-back HH gestures using an RF-based gesture recognition processing pipeline. In particular, the lack of pauses between gestures makes segmentation difficult, which, in turn, affects the performance of the subsequent classification component.

3. We propose a new sequence learning approach that performs segmentation and recognition simultaneously. Consequently, the model can be trained using continuous streams of minimally labelled RF data corresponding to naturally performed hand-rub gestures. We further extend the approach using a novel data augmentation technique to enable training on longer segments that are less labour intensive to label.

4. We extensively evaluate the performance of RFWash using a dataset of 1,800 gesture samples collected from ten subjects over 3 months, and show that RFWash achieves a low Gesture Error Rate (GER) of 7.41% and a low gesture timing error of 1.8 seconds, using weakly-labelled sequences of 10 seconds in length.
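Assuming GER is computed analogously to the word error rate in speech recognition (edit distance normalised by reference length; the thesis's exact normalisation may differ), it can be sketched as:

```python
def gesture_error_rate(ref, hyp):
    """Gesture Error Rate: Levenshtein distance (substitutions +
    insertions + deletions) between the reference and predicted gesture
    sequences, normalised by the reference length. `ref` must be
    non-empty."""
    R, H = len(ref), len(hyp)
    d = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        d[i][0] = i
    for j in range(H + 1):
        d[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution / match
    return d[R][H] / R

# One substitution (G7 -> G6) and one deletion (G9 missing) in 9 gestures.
ref = ["G1", "G2", "G3", "G4", "G5", "G6", "G7", "G8", "G9"]
hyp = ["G1", "G2", "G3", "G4", "G5", "G6", "G6", "G8"]
print(round(gesture_error_rate(ref, hyp), 3))  # 0.222
```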


Table 3.1: Overview of automated HH monitoring systems.

Work | Contact-Free | Hygiene Tech. | Inside Wards
Electronic Counters [44] | ✓ | ✗ | ✓
RFID [45] | ✗ | ✗ | ✓
Wearable [46] | ✗ | ✓ | ✗ (pathogens)
RGB Camera [47] | ✓ | ✓ | ✗ (privacy)
Depth Camera [3] | ✓ | ✗ | ✗ (privacy)
Depth Camera [48] | ✓ | ✓ | ✗ (privacy)
Proposed (RF sensor) | ✓ | ✓ | ✓

3.1 Motivation

3.1.1 Hand Hygiene Monitoring in the Real World

An ideal automated system for monitoring HH compliance should be able to detect attempts by HCWs to perform hand-rub procedures, to track HH opportunities and to establish compliance rate baselines. Additionally, such a system should monitor fine-grained parameters of the HH technique (Figure 3.1) itself. Such information can provide useful insights and help establish compliance rates for healthcare facilities. The system must be capable of running unattended in real-world healthcare facilities. A previous study of 789 clinicians in a 380-bed tertiary hospital [77] showed that automated HH training systems have a limited effect on HH compliance as they do not operate inside wards. The same study showed that direct human observation improved compliance, which is explained by the Hawthorne effect (i.e., the fact that humans change their behaviour when they believe they are being watched).

1 Despite the name, RFWash tracks the nine-step Alcohol-Based Hand Rub (ABHR) [2] rather than hand washing. Hand washing techniques that use soap and water are performed in a 12-step procedure that has additional steps for rinsing and drying hands.

The benefits of an automated monitoring system that evaluates HH in-situ are therefore twofold: it will lead to improved compliance by reducing the Hawthorne effect, and it will provide quantitative data about hygiene quality within the healthcare facility.

Despite the advent of machine learning algorithms for vision-based systems, the gold standard for assessing HH in clinical facilities is direct human observation; however, such observation can only monitor a small fraction of HHOs [45]. A complete and automated HH monitoring system has yet to be realised. Table 3.1 surveys the key characteristics of current research-based and commercial automated solutions. Existing solutions only perform well on one or two aspects. More importantly, no solutions exist for monitoring the hand-washing/rubbing process (i.e., the nine-step HH technique recommended by the WHO) in hospital wards.

3.1.2 RF Sensing for HH monitoring

As mentioned earlier, RFID and electronic counters are limited in the quality of information they can capture about hand hygiene. Practical evaluations have revealed that RFID can miss more than 80% of hygiene events. To increase localisation accuracy, a network of cameras that tracks staff inside hospitals has been proposed [3]. However, neither approach is able to track the actual hand-rub technique.

No commercial solutions currently exist to track hand-rub techniques in hospital wards; however, research solutions based on camera technology have been proposed [47, 48]. As privacy regulations, such as the Health Insurance Portability and Accountability Act and the General Data Protection Regulation, limit the use of cameras in healthcare settings [3], camera-based systems employ image anonymisation techniques. One such example uses a depth camera that conceals colour information, as each pixel value in a depth image represents the distance between the pixel and the camera instead of the colour. However, the use of a depth camera alone does not provide sufficient privacy guarantees. Despite careful control of the field of view of the cameras and reduced image resolution [32], the images may still capture detailed views of the visual appearance of individuals that may be used to track them and invade their privacy.

Figure 2.2 compares the RF signal data of the TI mmWave radar used in this work to the depth data of a co-located Kinect depth camera. The camera was mounted to prevent the subject's face from being captured. Both devices provide ranging information (i.e., how far objects are from the sensing device); however, the RF heatmap captures significantly less personal information, which substantially reduces the risk of privacy intrusion. We believe that RF sensing can contribute to many other privacy-sensitive healthcare applications, such as Intensive Care Unit (ICU) activity logging [32, 33].

Further, the mmWave radar has advantages over other RF sensing technologies, such as WiFi and RFID: 1) the mmWave radar is self-contained, as it does not require two communicating parties, tags or antennas; 2) it can be ubiquitous, as its form factor and low cost lower the barrier to technology uptake; and 3) it has better spatial resolution, which allows the filtering out of irrelevant motions that are often present in the real world due to other people or equipment. Together with its privacy-protection property (see the discussion above), these advantages make the mmWave radar an ideal candidate for large-scale adoption in real-life healthcare facilities.

3.2 Background and Technical Motivation

Fig. 3.3: Timeline of HH technique of a practicing healthcare worker.

HCWs are expected to execute the HH protocol on appropriate occasions at work (e.g., before and after touching a patient). Hospitals facilitate this protocol by placing soap or alcohol-based hand-rub dispensers at many easily accessible places in and outside the hospital wards. HCWs are expected to follow a standard hand cleaning procedure to ensure their hands are thoroughly cleaned. For example, the WHO-recommended HH technique (see Figure 3.1) should take a HCW between 20–30 seconds to complete.

To better understand the current state of HH practices in healthcare environments, we conducted face-to-face interviews with active HCWs at the Prince of Wales Hospital in Sydney. During the interviews, we asked the HCWs to show us how they would typically execute the hand-rub procedure. We used a camera to record the process 2 and analysed the recordings to obtain the hand-rub gesture sequence and timing information. As Figure 3.3 shows, the real-life gesture sequence diverges from the ideal expected sequence shown in Figure 3.1, in which the gestures G1, G2, ..., G9 are executed consecutively. Conversely, Figure 3.3 shows that the gestures are repeated and do not occur in the expected order. For example, one HCW continued to rub her hands until all the alcohol had dried off her hands. Further, variation in the timing of each gesture was also observed. This simple example illustrates the intricacies involved in hand rubbing. We expect significant deviation between real-world hand rubbing and the ideal protocol.

2 Ethical approval was granted by the University of New South Wales (Approval Number HC180818).

Based on the above observations, the first goal of RFWash is to accurately track the sequence of gesture poses performed by HCWs. The recorded sequence can then be compared to the expected set of gestures {G1, G2, ..., G9}, for example, to detect missing gestures. Additionally, RFWash tracks timing information to help assess compliance against the 20–30-second duration guideline. A more complex compliance analysis based on the gesture sequence and timing information could also be undertaken, but this is beyond the scope of this work.

3.2.1 Back-to-back Gesture Tracking

In this section, we explore the limitations of existing RF gesture processing algorithms in the HH scenario.

Popular RF gesture recognition approaches follow a two-stage architecture, in which a detection/segmentation step is followed by a recognition step [4, 36, 40, 41]. A critical assumption is that the RF time-series can be divided into segments, each of which contains only one gesture. Thus, a classifier can be trained and tested on these well-separated segments.

Typically, segmentation is conducted in one of two ways:

• Gestures are naturally segmentable. Users introduce a brief pause before and after performing each gesture [40] to make the detection of the start and end of individual gestures simpler (see Figure 4 in [40]). Training samples either contain only relevant gesture data [40, 75] or the gesture data with additional samples that represent "no gesture" [78]. At run-time, a segmentation module automatically segments gestures using no-motion or "silent" periods.

(a) Back-to-Back Gestures. Top: Doppler measurements of contiguously performed hand-rub gestures. Bottom: The differentiated principal component of the measurements. The vertical lines mark the start and end time of each gesture.

(b) Manually Segmented Gestures: Examples of manually segmented sign language gestures (samples taken from the dataset in [4] [a laboratory environment] and processed using PCA).

Fig. 3.4: Back-to-back versus manually segmented gestures.

• Users annotate continuous gestures manually. Applications such as sign language recognition do not have segmentable gestures, so the automated segmentation step from the previous approach fails. This limitation can be overcome by manual segmentation [4] (i.e., the manual extraction of segments, each of which contains a gesture). The key drawback of such an approach is the labour-intensive nature of the manual segmentation effort. The labelling of RF signals is not intuitive, which can introduce more errors than more natural modalities, such as video or audio. Figure 3.4b provides an example of sign gestures in this category.

(a) Segmentation error. (b) Impact of seg. error on accuracy. (c) Confusion matrix, manual seg. (d) Confusion matrix, seg. error: 0.5 → 1 s.

Fig. 3.5: Classification is highly dependent on segmentation quality in RF gesture recognition systems [4].

Why do we propose a segmentation-free approach? Figure 3.4a shows the Doppler measurements (top graph) and their differentiated principal component (bottom graph) of a real-life execution of the HH technique. The vertical lines in the bottom graph show the correct gesture boundaries. Gesture boundaries are sharp, with a minimal period of "no gesture" samples in between. Therefore, threshold-based segmentation [75] fails to recognise gesture boundaries. Consequently, most segmented sequences contain RF signatures from multiple gestures. A classifier trained on such data will perform poorly.

The impact of segmentation errors. To quantify the errors due to inaccurate segmentation, we applied the SignFi algorithm [4] to RF traces of HH gestures.

The algorithm uses a deep CNN architecture originally designed to classify 276 sign language gestures, which we adapted to better suit our application scenario3. We evaluated SignFi on our dataset of the naturally performed HH technique from ten subjects, using manually segmented samples that contain exactly one gesture in each segment4. Using two-second Doppler Range measurements and session-based cross-validation (see Section 3.3.2 for details), we obtained an accuracy of 83.3%. The confusion matrix (see Figure 3.5c) shows that the accuracy is more than 79% for most gestures, except for some of the mirrored gestures, i.e., G6/G7 and G9. Anecdotally, the RF signatures of (G6, G7) and (G8, G9) are similar to each other and are more likely to result in incorrect classification.

To investigate the effect of segmentation error, we deliberately allowed segments to contain a few samples from neighbouring gestures, while ensuring that the majority of the samples in a segment corresponded to the target gesture (see Figure 3.5a for an illustration). In particular, we allowed for overlaps of 1–25% and 25–50%, corresponding to 0.2–0.5 second and 0.5–1 second overlaps, respectively. This allowed us to study the impact of different levels of segmentation error on the classification accuracy. As Figure 3.5b shows, the accuracy decreases as the segmentation error increases. This shows that SignFi does not handle segmentation errors well.

3 The convolutional layer in [4] has three 3x3 kernels. This produced poor results on our Doppler Range measurements. Consequently, we increased the number of kernels from 3 to 512, which improved its performance significantly.
4 Sample-level labelling was conducted using a synchronised camera.

Table 3.2: Time cost to perform manual labelling and sequence labelling.

Method | RF data | Labelling & Segment. | Saving
Manual Segmentation | 4 mins @ 8 Hz | 18 mins | -
10s Sequence labelling | 4 mins @ 8 Hz | 6 mins | 66.6%

The cost of manual segmentation and labelling. As RF samples are difficult to label and segment directly, we used a synchronised video camera in our experiment. The gestures were identified in the video, and the labels were propagated to the corresponding RF signatures. Despite using the camera feed as a visual aid, we found the process to be very time-consuming, so we investigated an alternative method for annotating RF segments.

Sequence labelling. We introduce a new approach, which we call sequence labelling, to reduce the complexity of manual labelling. Two key ideas underpin sequence labelling: 1) we ask users to annotate relatively long continuous sequences of data; and 2) we request users to annotate gesture sequences without capturing the exact timing information of individual gesture boundaries.

Let us consider an example. Assume that we have a collection of 20 data frames {f0, ..., f19} that contains the gestures G1, G2 and G3 in that order. Manual segmentation requires us to identify gesture boundaries or map each frame to a gesture, e.g., the annotated sequence is G1 ∈ [f0, f5], G2 ∈ [f6, f13], G3 ∈ [f14, f19]. In contrast, sequence labelling annotates this collection of frames simply as G1 → G2 → G3, which gives the order of the gestures in the frames without specifying the transition times. Thus, less work is required to conduct sequence labelling.
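The relationship between the two annotation styles can be sketched as follows: a weak sequence label is what remains after collapsing adjacent repeated per-frame labels. The helper below is illustrative and not part of RFWash.

```python
from itertools import groupby

def frames_to_sequence_label(frame_labels):
    """Collapse a fully segmented per-frame annotation into the weak
    sequence label used by sequence labelling: keep only the order of
    gestures, dropping all boundary/timing information."""
    return [gesture for gesture, _ in groupby(frame_labels)]

# 20 frames annotated per-frame (manual segmentation)...
frames = ["G1"] * 6 + ["G2"] * 8 + ["G3"] * 6
# ...reduce to the sequence label G1 -> G2 -> G3.
print(frames_to_sequence_label(frames))  # ['G1', 'G2', 'G3']
```

Note that the mapping is many-to-one: the timing information cannot be recovered from the sequence label, which is exactly what alignment learning must infer during training.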

We quantified the time required to perform manual segmentation and sequence labelling experimentally by asking three annotators to label four minutes of RF data sampled at 8 Hz using these methods. The average times taken are shown in Table 3.2. On average, manual segmentation took 18 minutes, while sequence labelling took only 6 minutes, resulting in a saving of ≈ 66.6%.

Notably, manual labelling and segmentation costs can be significantly higher for higher RF sampling rates, such as the 200 Hz in [4] or 1 kHz in [36, 75].

In the next section, we show that it is possible to achieve highly accurate gesture segmentation and classification based on sequence labels. Unlike classical supervised learning, which requires fully annotated data, our weakly supervised method only requires minimally labelled data.

Summary: The assumption of easily segmentable input that is commonly used by existing RF-based gesture recognition approaches does not apply to the HH gesture recognition scenario. We showed that HH gesture classification accuracy depends heavily on segmentation quality. High-quality classifiers can be developed using manually segmented data; however, the associated labelling costs are substantial. Inspired by sequence labelling methods, which have been used extensively in the speech and handwriting recognition literature, RFWash departs from existing RF sensing segmentation approaches and proposes new methods to learn from weakly labelled, unsegmented data.

3.3 RFWash

Figure 3.6 shows the architecture of the proposed RFWash framework. RFWash is trained on sequences of HH gestures in the RF space and their corresponding sequence labels. As discussed in the previous section, a sequence label only contains the order of the gestures in the segment. The training process, therefore, needs to determine the most likely mapping of gesture labels to each RF frame. This is done via a process called alignment learning. At runtime, the RFWash model internally assigns a likelihood to each (input RF frame, gesture) pair, which is then used to infer the most likely gesture sequence. Before delving into the details of the model itself, we explain why we chose this specific model and then describe the input RF measurements.


Fig. 3.6: RFWash is trained on continuous RF samples (A) of HH gestures and corresponding sequence labels. The model automatically learns which frames correspond to individual gestures (e.g., G1 vs G2) via 'alignment learning'. At runtime, per-frame gesture predictions (B) are produced and used to estimate the most likely gesture sequence.

3.3.1 Why deep sequence labelling?

As explained earlier, hand hygiene tracking can be approached as a sequence labelling problem. In this regard, the Hidden Markov Model (HMM) [79] can be considered a possible alternative to a deep sequence labelling model. However, many studies have shown that deep sequence labelling that relies on Recurrent Neural Networks (RNN) coupled with CTC outperforms HMM and its variants [80]. Moreover, HMM requires manual effort and domain knowledge before it can be used (i.e. specifying the model states and state transitions). On the other hand, deep sequence labelling can be trained directly from input-output pairs in an end-to-end manner.

As we will see in Section 3.3.3, CTC is an alignment-free algorithm. The key benefits of the algorithm are that it requires neither pre-segmented training data nor external post-processing to extract the label sequence. On the other hand, CTC does not permit encoding explicit knowledge of the context between classes and their temporal progression. This can be a limitation in other domains such as video learning [81]. For example, when mapping video frames to action sequences, encoding grammar rules that make the action "pouring milk" more likely if the previous action was "reaching milk" can enhance the sequence prediction accuracy [82]. However, this is not the case for hand hygiene gestures, as users can perform the sequence in any order (see Section 3.2). In fact, it is even desirable to have predictions that do not make hard assumptions about the expected gesture sequence.

3.3.2 RF Measurements

RFWash uses a mmWave radar mounted on a soap dispenser to collect RF signatures of subjects performing hand cleaning. Figure 3.7a shows the system setup. Many subjects may be present in a hospital environment (in-hospital setup, see Section 2.1); however, a subject performing hand cleaning must stand close to the radar (e.g., within 1 metre). The subject faces the radar with her hands at approximately the same height as the radar. Consequently, our goal is to measure the velocity of her hand motions and filter out any other irrelevant signals.

(a) Main subject facing the radar and performing handrub while an interfering subject (masked in green) passes behind the main subject.

(b) Consecutive frames showing that the main subject's ($S_M$) RD measurements can be separated from a passing interfering subject ($S_I$) by a range cut-off.

Fig. 3.7: Range-Doppler frame measurements.

A mmWave radar transmits a sinusoidal wave $T(t)$, called a "chirp", of linearly changing frequency, and a time-delayed version of the transmitted signal is received for every reflector in the environment, including the hands of the subject performing washing. Formally, the frequency of a chirp at time $t$ can be expressed as:

$$f_t = f_0 + \frac{B}{T}t, \qquad (3.1)$$

where $f_0$ is the starting frequency of the chirp, $B$ is the bandwidth and $T$ is the chirp duration. Let $A(t)$ be the amplitude of $T(t)$ at time $t$. The transmitted signal $T(t)$ can be expressed as:

$$T(t) = A(t)\sin\!\left(2\pi\left(f_0 t + \frac{B}{2T}t^2\right)\right). \qquad (3.2)$$

When the transmitted signal is reflected by a stationary object at distance $D_0$ from the radar, the reflected signal $R(t)$ is:

$$R(t) = E(t)\sin\!\left(2\pi\left(f_0(t - t_d) + \frac{B}{2T}(t - t_d)^2\right)\right), \qquad (3.3)$$

where $E(t)$ is the amplitude modulated by the object and $t_d = 2D_0/c$ is the round-trip time delay, with $c$ the speed of light. The signals $T(t)$ and $R(t)$ are mixed on the radar to produce the received signal $S(t)$. It can be shown that $S(t)$ has two frequency components: 1) the difference of the frequencies of $T(t)$ and $R(t)$; and 2) the sum of their frequencies. A low-pass filter can be applied to remove the second component:

$$S(t) \approx C(t)\cos\!\left(2\pi\left(\frac{2BD_0}{cT}t + \frac{2f_0 D_0}{c}\right)\right). \qquad (3.4)$$

where $C(t)$ is the amplitude. The frequency of $S(t)$, which is given by $\frac{2BD_0}{cT}$, is called the beat frequency and can be used to estimate the object's distance $D_0$. In general, there may be multiple objects in the vicinity of the radar and the mixed received signal will contain multiple beat frequencies. We can resolve these with the Fast Fourier Transform (FFT) and consequently compute the distance between each object and the radar.
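The range pipeline above can be sketched numerically. The short simulation below uses hypothetical chirp parameters (B, T and fs are illustrative values, not the thesis sensor's settings), synthesises the beat tones of Equation (3.4) for two reflectors, and recovers their distances with an FFT:

```python
import numpy as np

# Hypothetical FMCW parameters (illustrative, not the thesis radar's settings)
B = 4e9        # chirp bandwidth (Hz)
T = 50e-6      # chirp duration (s)
c = 3e8        # speed of light (m/s)
fs = 10e6      # ADC sampling rate of the mixed signal S(t)

def beat_signal(distances, t):
    """Mixed (dechirped) signal: one beat tone at 2*B*D0/(c*T) per reflector."""
    s = np.zeros_like(t)
    for d in distances:
        fb = 2 * B * d / (c * T)        # beat frequency from Eq. (3.4)
        s += np.cos(2 * np.pi * fb * t)
    return s

n = 500                                  # samples per chirp (T * fs)
t = np.arange(n) / fs
s = beat_signal([0.75, 2.25], t)         # two reflectors at 0.75 m and 2.25 m

# Resolve the beat frequencies with an FFT and map each peak back to a range
spec = np.abs(np.fft.rfft(s))
freqs = np.fft.rfftfreq(n, 1 / fs)
peak_freqs = freqs[np.argsort(spec)[-2:]]        # two strongest bins
ranges = sorted(peak_freqs * c * T / (2 * B))    # ~[0.75, 2.25] m
```

The range resolution of this toy setup is c/(2B) ≈ 3.75 cm, close in spirit to the ≈ 3 cm resolution reported for the prototype in Section 3.4.1.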

However, range alone does not provide sufficient information to solve our problem. A subject's hands are very close to each other during the entire handrub procedure, so more information is needed to differentiate between the gestures. Fortunately, the mmWave radar allows us to measure the Doppler frequency shift in the $S(t)$ signal caused by the objects moving in the scene.

We use the mmWave signal $S(t)$ to derive an intensity map of the scene, shown in Fig. 3.7b. The intensity map $I(t, r, v)$ has the following interpretation: the intensity $I(t, r, v)$ is higher if there is a higher chance at time $t$ of finding an object located at distance $r$ from the radar and moving at speed $v$. Fig. 3.7b shows the intensity map at three different time instants, with $r$ plotted from 0 to 3 metres and $v$ from -2 to 2 m/s. Large intensity is shown in red. We will refer to the intensity map $I(t, r, v)$ at a point in time as a Range-Doppler (RD) frame.
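In standard FMCW processing, an RD frame is obtained with two FFTs: one along each chirp for range and a second across chirps for Doppler. The sketch below illustrates this generic pipeline with hypothetical array dimensions; the thesis does not specify the sensor's exact FFT configuration.

```python
import numpy as np

def range_doppler_frame(chirps):
    """Compute one Range-Doppler (RD) frame from a block of dechirped chirps.

    `chirps` is a (num_chirps, samples_per_chirp) array of beat signals.
    A range FFT along each chirp resolves distance; a second FFT across
    chirps resolves the Doppler (velocity) of each range bin.
    """
    range_fft = np.fft.fft(chirps, axis=1)   # beat frequency -> range bins
    rd = np.fft.fft(range_fft, axis=0)       # phase change across chirps -> Doppler
    rd = np.fft.fftshift(rd, axes=0)         # centre zero velocity
    return np.abs(rd)                        # intensity I(r, v) at this instant

frame = range_doppler_frame(np.random.randn(64, 256))
```

A stationary reflector produces energy only in the zero-Doppler row, while a moving hand shifts energy to positive or negative velocity rows, which is what makes the handrub motions visible in Fig. 3.7b.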

RFWash needs to be robust to interference from nearby moving objects and people. Figure 3.7a shows a subject performing handrub in front of the radar. The person masked in green is within the range of the radar and acts as an interferer. Figure 3.7b shows RD frames at three time instants, with the locations of the subject's hands and the interferer marked by $S_M$ and $S_I$, respectively. We note that the intensity of all RD frames stays approximately unaffected in the $S_M$ region by the interferer's movement (see the dotted ellipses in Figure 3.7b). From now on, we limit the range $r$ to less than 1 metre so as to focus on the main subject only.

For illustration, we post-process the RD frames to amplify the HH gestures by performing background subtraction and Gaussian smoothing. For each RD frame, we use the frames in the previous 1 second to estimate the background. We then perform background subtraction and smooth the result with a Gaussian filter (see Figure 3.8). These steps remove the static reflection of the subject's torso and amplify the hand motions related to the gesture performed in the RD frame. The input to the RFWash deep model is a stack of normalised RD frames after applying a cut-off at 1 metre to each frame and resizing them to 50 × 50 pixels.

Fig. 3.8: Illustration of Range-Doppler frame post-processing for gesture G2. The figure shows the original RD frame, the RD frame with background removed, and a smoothed version.
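These preprocessing steps can be sketched as follows. This is a minimal illustration that assumes a running-mean background over the previous second of frames and an arbitrary Gaussian sigma; the thesis does not state these exact parameters.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, zoom

def preprocess(rd_frames, fps=8, out_size=50, sigma=1.0):
    """Background-subtract, smooth, normalise and resize a stream of RD frames.

    `rd_frames` has shape (num_frames, H, W). The background is assumed to be
    the mean of the previous `fps` frames (one second at 8 Hz); `sigma` is an
    assumed smoothing strength.
    """
    out = []
    for i, frame in enumerate(rd_frames):
        # Estimate the static background from the previous 1 s of frames
        past = rd_frames[max(0, i - fps):i] if i > 0 else rd_frames[:1]
        background = np.mean(past, axis=0)
        fg = np.clip(frame - background, 0, None)   # remove static reflections
        fg = gaussian_filter(fg, sigma=sigma)       # smooth motion blobs
        scale = (out_size / fg.shape[0], out_size / fg.shape[1])
        fg = zoom(fg, scale)                        # resize to 50x50
        peak = fg.max()
        out.append(fg / peak if peak > 0 else fg)   # per-frame normalisation
    return np.stack(out)

frames = preprocess(np.random.rand(16, 64, 64))
```

The 1-metre range cut-off mentioned above would simply crop the range axis of each frame before this pipeline runs.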

3.3.3 Deep Learning Model

Fig. 3.9: RFWash network architecture. The RFWash network has five convolutions followed by a max pooling layer (2×2) and a fully connected layer, followed by two bidirectional LSTM layers and finally a softmax layer. All convolutions are 3 × 3 (the number of filters is denoted in each box).

Figure 3.9 shows the layer structure of our deep learning model. Convolution layers followed by bidirectional LSTM layers are used to extract spatiotemporal gesture features from the input RD frames; a softmax layer and finally a Connectionist Temporal Classification (CTC) layer are employed to predict the gesture sequence.
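A sketch of this architecture in Keras (the framework the thesis reports using, see Section 3.4.1) might look as follows. The filter counts, dense size and LSTM unit counts are our assumptions; the text specifies only five 3×3 convolutions, a 2×2 max pooling, a fully connected layer, two BiLSTM layers and a softmax over the 9 gestures, the "no gesture" class and the CTC blank (11 outputs).

```python
from tensorflow.keras import layers, models

def build_rfwash_model(num_frames=None, num_classes=11):
    """Sketch of the RFWash network; unit sizes are assumptions."""
    inp = layers.Input(shape=(num_frames, 50, 50, 1))    # stack of RD frames
    x = inp
    for filters in (16, 32, 32, 64, 64):                 # five 3x3 convolutions
        x = layers.TimeDistributed(layers.Conv2D(filters, 3, activation="relu"))(x)
    x = layers.TimeDistributed(layers.MaxPooling2D(2))(x)
    x = layers.TimeDistributed(layers.Flatten())(x)
    x = layers.TimeDistributed(layers.Dense(128, activation="relu"))(x)
    for _ in range(2):                                   # two BiLSTM layers
        x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(x)
    out = layers.Dense(num_classes, activation="softmax")(x)  # per-frame posterior
    return models.Model(inp, out)

model = build_rfwash_model(num_frames=16)   # output shape: (batch, 16, 11)
```

The per-frame softmax output is the BiLSTM posterior Y described below; training would attach a CTC loss on top of it.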

As illustrated in Figure 3.6, RFWash takes as input a segment consisting of a stack of $T$ consecutive RD frames $X = [x_1, \ldots, x_T] \in \mathbb{R}^{50 \times 50 \times T}$ from the continuous stream. The goal is to infer the gesture sequence $\ell$ performed by a HCW, where $\ell = [\ell_1, \ell_2, \ldots, \ell_K] \in \mathcal{A}^{1 \times K}$, $\mathcal{A}$ is the set of possible gestures and $K \le T$. Since the continuous segment can contain irrelevant motions (i.e., a stationary user or users walking away from the device), we define an additional "no gesture" class $G_{No}$ in addition to the nine HH gesture classes. Ultimately, the set of possible gestures is $\mathcal{A} = \{G_{No}\} \cup \{G_1, \ldots, G_9\}$.

As stated above, a sequence label $\ell$ is used rather than a frame-by-frame label $\pi = [\pi_1, \cdots, \pi_T] \in \mathcal{A}^{1 \times T}$ to reduce the labelling cost (see Section 3.2.1); $\pi$ is also called a gesture path. A challenge associated with using $\ell$ is the lack of temporal alignment, as it can be compatible with many plausible gesture paths. For example, if the sequence label is $G_1 \to G_2$ for an input of four frames, then this label is compatible with the gesture paths $[G_1,G_2,G_2,G_2]$, $[G_1,G_1,G_2,G_2]$ and $[G_1,G_1,G_1,G_2]$. Intuitively, the model resolves this challenge by considering the probability of all plausible gesture paths for a particular sequence label.
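The compatibility between a sequence label and its gesture paths can be made concrete with a few lines of code. `collapse` below plays the role of the B operator restricted to repetition removal (blanks are introduced with CTC in Section 3.3.3); the function names are ours, for illustration only.

```python
from itertools import product

def collapse(path):
    """Merge consecutive repetitions: [G1, G1, G2] -> [G1, G2]."""
    out = []
    for g in path:
        if not out or out[-1] != g:
            out.append(g)
    return out

def compatible_paths(label, num_frames, alphabet):
    """All frame-level gesture paths that collapse to `label`.

    Brute-force enumeration, feasible only for tiny toy inputs; it exists
    purely to illustrate the label/path relationship described above.
    """
    return [list(p) for p in product(alphabet, repeat=num_frames)
            if collapse(list(p)) == label]

# The four-frame example from the text: exactly three compatible paths
paths = compatible_paths(["G1", "G2"], 4, ["G1", "G2"])
```

Running this reproduces the three paths listed above, one for each possible transition frame.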

Spatiotemporal Feature Extraction

Motions captured by the mmWave radar in a single RD frame have an identifiable spatial pattern in the range and velocity dimensions. Additionally, the temporal dynamics of each gesture will be present in consecutive RD frames. We use spatiotemporal feature extraction layers composed of five convolutional layers followed by a fully connected layer and two RNN (Recurrent Neural Network) layers. RNNs perform well in sequential data modelling and are a good choice for capturing the temporal dynamics of the gestures. However, in the context of HH technique, the mirrored gestures discussed in Section 3.2.1 present unique challenges because of their similarity in the RF domain. Therefore, we employ bidirectional recurrent layers with LSTM cells (BiLSTM [83]) to enable the network to use all available input information in the past and the future of a specific RD frame. In this configuration, two separate recurrent layers running in the forward direction (future) and the backward direction (past) are utilised to learn the complex temporal dynamics.

The spatiotemporal feature extraction layers and softmax activation process the input RD frames $X$ to produce frame-wise probabilities of the different gestures $Y$, which we call the BiLSTM posterior. $Y$ can be interpreted as the probability of observing a sequence of gestures across $T$ frames. This is further processed by the temporal alignment to estimate the most likely gesture sequence.

Temporal Alignment Learning

RFWash implements alignment learning to infer the hand rub gesture sequence by mapping the output of the BiLSTM components (i.e., the BiLSTM posterior) to the corresponding gesture path. We rely on the CTC algorithm [43], which is explained next in detail.

Let $Y = [y_1, \cdots, y_T] \in \mathbb{R}^{A \times T}$ be the softmax-normalised BiLSTM output for a stack of $T$ RD frames, where $A = |\mathcal{A} \cup \{\phi\}|$ and $\phi$ denotes a blank. A blank is used by CTC to account for the probability of observing 'no label' and to model the transitions between gestures within a sequence. Thus, $A = 11$ for RFWash. The vector $y_t$, $t \in \{1,\ldots,T\}$, can be interpreted as follows: $y_{t,k}$ denotes the probability that the gesture at time $t$ is $k$, where $k = 1,\ldots,A$.

Given the observations $X$, the posterior probability of any gesture path $\pi = [\pi_1, \cdots, \pi_T]$ can be calculated as:

$$P(\pi|X) = \prod_{t=1}^{T} y_{t,\pi_t}, \quad \forall \pi_t \in \mathcal{A}. \qquad (3.5)$$

Notably, the posterior probabilities obtained in Equation (3.5) are conditionally independent for different gesture paths. This is desirable in the problem context, as we do not want the gesture classifier to depend on the order of gestures in the training data.

In the CTC framework, the probability of the sequence label $\ell$ is the sum of the probabilities of all its compatible gesture paths:

$$P(\ell|X) = \sum_{\{\pi \,|\, B(\pi) = \ell\}} P(\pi|X), \qquad (3.6)$$

where $B$ is an operator that removes consecutive label repetitions and blanks in $\pi$. Intuitively, Equation (3.6) considers all possible alignments of $\ell$.⁵ The most probable sequence label $\ell^*$ can be predicted as:

$$\ell^* = \arg\max_{\ell} P(\ell|X), \qquad (3.7)$$

and the network can be trained using the standard back-propagation method, minimising the following loss:

$$L(\ell, X) = -\log P(\ell|X). \qquad (3.8)$$
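Equations (3.5) to (3.8) can be illustrated by brute force on a toy example: enumerate every gesture path, keep those that the B operator collapses to the label, and sum their path probabilities. The toy posteriors and helper names below are invented for illustration; real CTC uses the forward-backward algorithm rather than enumeration [43].

```python
import math
from itertools import product

def collapse(path, blank="-"):
    """The B operator: merge consecutive repetitions, then drop blanks."""
    out = []
    for g in path:
        if not out or out[-1] != g:
            out.append(g)
    return [g for g in out if g != blank]

def ctc_loss(label, posteriors, alphabet):
    """Brute-force CTC loss -log P(label|X), mirroring Eqs. (3.5)-(3.8).

    `posteriors` is a per-frame list of {symbol: probability} dicts
    (the BiLSTM posterior Y). Exponential in the number of frames, so
    only usable on toy inputs.
    """
    total = 0.0
    for path in product(alphabet, repeat=len(posteriors)):
        if collapse(list(path)) == label:
            p = 1.0
            for t, sym in enumerate(path):
                p *= posteriors[t][sym]      # Eq. (3.5): product over frames
            total += p                       # Eq. (3.6): sum over compatible paths
    return -math.log(total)                  # Eq. (3.8): negative log-likelihood

# Toy example: 3 frames, alphabet {G1, G2, blank}
y = [{"G1": 0.8, "G2": 0.1, "-": 0.1},
     {"G1": 0.8, "G2": 0.1, "-": 0.1},
     {"G1": 0.1, "G2": 0.8, "-": 0.1}]
loss = ctc_loss(["G1", "G2"], y, ["G1", "G2", "-"])   # P(label|X) = 0.712
```

Here five paths are compatible with [G1, G2] (including blank-containing ones such as [G1, -, G2]), and their probabilities sum to 0.712.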

For gesture timing estimation, we first process the BiLSTM output to estimate the top gesture path $\hat{\pi} = [\hat{\pi}_1, \ldots, \hat{\pi}_T]$ by selecting the gesture with the top probability at each frame. Next, we set the starting time of a gesture to the frame with the highest probability for that particular gesture. Finally, the end time is set to the frame before the starting point of the next gesture in the sequence, or to the end of the segment.

⁵We note that the CTC forward-backward algorithm is more efficient than considering all possibilities [43].
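The timing heuristic can be sketched as follows; this is a minimal illustration with invented posteriors, and the function and variable names are ours rather than the thesis's.

```python
def gesture_timings(posteriors):
    """Estimate (gesture, start_frame, end_frame) triples from per-frame
    posteriors: greedy per-frame argmax gives the top path; a gesture
    starts at its highest-probability frame and ends just before the next
    gesture's start (or at the end of the segment)."""
    # Top gesture path: per-frame argmax
    path = [max(p, key=p.get) for p in posteriors]
    # Group the path into runs of consecutive identical gestures
    runs = []
    for t, g in enumerate(path):
        if not runs or runs[-1][0] != g:
            runs.append((g, []))
        runs[-1][1].append(t)
    # Start = highest-probability frame of each gesture; end = next start - 1
    starts = [max(ts, key=lambda t: posteriors[t][g]) for g, ts in runs]
    timings = []
    for i, (g, _) in enumerate(runs):
        end = starts[i + 1] - 1 if i + 1 < len(runs) else len(path) - 1
        timings.append((g, starts[i], end))
    return timings

y = [{"G1": 0.9, "G2": 0.1}, {"G1": 0.6, "G2": 0.4},
     {"G1": 0.2, "G2": 0.8}, {"G1": 0.1, "G2": 0.9}]
timings = gesture_timings(y)   # [('G1', 0, 2), ('G2', 3, 3)]
```

Note how G1's end frame (2) is derived from G2's start frame (3), exactly as described above.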

Data augmentation

Up to this point, the model training can proceed using unsegmented input $X$ of arbitrary length $T$ and the corresponding sequence labels $\ell$ (see Equation (3.8)). A larger $T$ will reduce the annotation effort, as fewer sequences need to be annotated for a given training set. However, very long segments would result in only a few training samples. Additionally, according to our experimental observations (see Section 3.4.3), training on long sequences results in poor temporal alignment. Since we cannot tamper with sub-sequences, RFWash employs "order preserving" concatenation of existing samples to augment the training data, which can increase the number of samples quadratically. For example, let $X_a$ and $X_b$ be two stacks of RD frames with corresponding sequence labels $\ell_a$ and $\ell_b$. A new stack is obtained by concatenating $X_a$ and $X_b$ to form $[X_a, X_b]$, and its corresponding sequence label is $B([\ell_a, \ell_b])$.

Concatenation is applied across whole training sequences of different users. As a result, a concatenated sequence can be composed of gestures performed in order by more than one user. This simulates the natural situation in which multiple users perform the HH gestures sequentially, such that each user starts after the preceding user finishes the sequence and walks away from the radar. Prior to applying "order preserving" augmentation at the sequence level, we apply random jittering, as done by others [84], on the individual RD frames within each sequence. Jittering simulates the random noise that can be present in wireless environments and ensures that no two sequences (after applying "order preserving" augmentation) share the exact same copy of an RF frame, as this can lead to over-fitting. Figure 3.10 illustrates the concatenation intuition in our context. RFWash uses concatenation to grow the training dataset by ten times.

Fig. 3.10: Concatenation intuition. We significantly grow the number of samples containing a specific sequence ([G7]) by concatenating it with other sequences in the training set ([G7,Gx] or [Gx,G7] in the middle column). Consequently, G7 is seen in many contexts by the model, which enables better learning of the radar frames that correspond to it within a sequence ([G6,G7], right column).
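The augmentation can be sketched as follows; the helper names are ours and the jitter magnitude is an assumed value, not the thesis's exact setting.

```python
import numpy as np

def merge_labels(label):
    """B operator on sequence labels: merge consecutive repetitions."""
    out = []
    for g in label:
        if not out or out[-1] != g:
            out.append(g)
    return out

def augment(samples, jitter_std=0.01, seed=0):
    """Order-preserving concatenation augmentation (a sketch).

    `samples` is a list of (frames, label) pairs. Each sample is first
    jittered with Gaussian noise, then every ordered pair is concatenated,
    growing the training set quadratically while preserving gesture order.
    """
    rng = np.random.default_rng(seed)
    jittered = [(x + rng.normal(0, jitter_std, x.shape), l) for x, l in samples]
    out = []
    for xa, la in jittered:
        for xb, lb in jittered:
            out.append((np.concatenate([xa, xb]), merge_labels(la + lb)))
    return out

a = (np.zeros((4, 50, 50)), ["G1", "G2"])
b = (np.zeros((3, 50, 50)), ["G2", "G3"])
aug = augment([a, b])
# 2 samples -> 4 concatenated samples; [G1,G2] + [G2,G3] collapses to [G1,G2,G3]
```

Collapsing the joined labels matters when the last gesture of one sequence equals the first gesture of the next, as in the a + b pair above.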

3.4 Evaluation

3.4.1 Goals, Methodology and Metrics

We collected natural handrub data where subjects perform gestures without pausing. Our RFWash prototype uses a TI mmWave IWR1443 sensor to collect RF measurements at 8 Hz. The sensor operates in the 60 GHz frequency band. RD measurements in our setup are exported with a ranging resolution of ≈ 3 cm and a velocity resolution of 0.2 m/s. Ten subjects were recruited for data collection (8 males and 2 females).⁶ The subjects had no previous experience with the handrub procedure. They were shown a video of hand rubbing produced by the WHO, and they were asked to repeat the whole process two times before data collection began. For each session, a subject performed the handrub gestures from G1 → G9, then stayed stationary or walked away from the device and returned. Note that subjects could miss one or more gestures. In addition to following the instructions, the subjects were also asked to perform handrub gestures in a random order to allow us to evaluate the impact of unseen gesture sequences. We collected a total of 1800 gesture samples (10 subjects × 4 sessions × 5 repetitions × 9 gestures). Thus our dataset contains the same number of gesture samples for each of the 9 unique gestures. Following [84], we keep the number of background samples (i.e. GNo) below 60% of the whole data to maintain the balance of the dataset. Figure 3.11a shows the time taken by subjects for each gesture (the average gesture time is 3.2 s). We recorded all sessions with a camera and used the Network Time Protocol (NTP) to time-synchronise the video and the RF frames from the radar. Later, a human auditor inspected the recorded video footage and labelled it frame by frame. We employed four-fold cross-validation for evaluation. By default, we adopt session-based cross-validation, where one hand rubbing session is held out for testing and the rest are used for training. Order preserving augmentation (see Section 3.3.3) is applied only to the training data after data splitting, to ensure no overlap between the training and test sets. RFWash was implemented using the Keras deep learning framework.

⁶Ethical approval has been granted by the University of New South Wales (Approval Number HC180818).

Metrics

To evaluate the performance of RFWash, the following metrics were used:

• Gesture Error Rate (GER): This is defined as the minimum number of gesture insertions, substitutions, and deletions needed to transform the predicted gesture sequence into the ground truth (GT) gesture sequence, divided by the number of gestures in the ground truth. Table 3.3 shows a few examples of GER. This metric mimics the Word Error Rate (WER), a standard metric in sequence recognition problems.

• Exact Match Rate (EMR): When the predicted sequence matches the ground truth exactly, GER equals zero. We report the percentage of such sequences as the EMR.

• Timing Error: This is the absolute difference between the estimated gesture timing and the ground truth timing. The timing error is calculated for gesture sequences with an EMR of 100%.

Table 3.3: Gesture Error Rate (GER) examples

Ground truth | Prediction | GER  | Edits required
[G2]         | [G2]       | 0%   | -
[G2]         | [G1, G2]   | 100% | deletion ÷ length of ground truth
[G2]         | [G1, G3]   | 200% | (deletion + substitution) ÷ length of ground truth
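GER, like WER, is an edit-distance metric, and a minimal sketch (with our own function name) reproduces the Table 3.3 examples:

```python
def gesture_error_rate(truth, prediction):
    """GER: minimum insertions, substitutions and deletions needed to turn
    the predicted gesture sequence into the ground truth, divided by the
    ground-truth length (the WER-style metric described above)."""
    n, m = len(truth), len(prediction)
    # Standard Levenshtein dynamic-programming table
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if truth[i - 1] == prediction[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[n][m] / n

ger = gesture_error_rate(["G2"], ["G1", "G3"])   # 2.0, i.e. 200%
```

Because the denominator is the ground-truth length, GER can exceed 100% when the prediction contains more errors than the ground truth has gestures, as the last Table 3.3 row shows.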

3.4.2 Weakly Supervised Gesture Tracking

Table 3.4: The gesture recognition accuracy of RFWash

Segment | gestures/segment | mean GER | median GER
1 s     | μ:1, max:2       | 16.01%   | 0%
5 s     | μ:2, max:4       | 11.01%   | 0%
10 s    | μ:4, max:6       | 7.41%    | 0%

We evaluate the gesture sequence recognition of RFWash using the three metrics discussed earlier. Figure 3.11b shows the mean GER when RFWash is trained/tested on segments with lengths of one, five and ten seconds, respectively. The results show that RFWash trained on 10-second segments achieves a GER of 7.41%, which translates to 0.45 substitutions, deletions or insertions needed to make the sequence match the ground truth, out of a maximum of 6 observed gestures in the segment.

Fig. 3.11: RFWash evaluation for different gesture sequence lengths. (a) Gesture durations. (b) Gesture Error Rate. (c) Exact Match Rate stats. (d) Procedure Timing Error. (e) Per-step Timing Error.

Table 3.4 shows the average (μ) and maximum (max) number of gestures in each segment length, along with the mean and median GER. The median is zero, as more than 75% of the sequences are correct with a GER of zero. The table also shows that the GER decreases for larger segments. This is because the number of errors (i.e., the numerator in the GER equation) is relatively constant even as the sequence length increases. We observe that edits are usually required at the end of the segment. We hypothesise that a gesture at the end of the segment contains relatively fewer RD frames than other gestures due to the overlap with the next segment. On the other hand, EMR is fairly constant across the different segment lengths, as shown in Figure 3.11c.

Fig. 3.12: Alignment performance w.r.t. training sequence length. (a) Per-procedure alignment; thick blocks denote periods of performing the sequence [G1, G2, ···, G9]. (b) Per-step alignment.

Recall that HCWs are required to perform the hand rub procedure (i.e. the nine steps) for at least 20 s (see Section 3.1). We evaluate RFWash in capturing the timing error of the whole handrub procedure. As the user approaches the radar, performs the procedure, then walks away or stays stationary, we report the procedure time as the time between two consecutive GNo periods⁷. The results in Figure 3.11d show that the procedure time can be estimated with very high accuracy, with a median error of 0 s, regardless of the sequence length used for training.

To gain further insight into the timing alignment performance, we calculated the per-step (i.e. per-gesture) timing error in Figure 3.11e. In general, it shows that the per-step timing error is larger than the procedure timing error. The median absolute errors in the per-step case are 0.49 s, 1.17 s and 1.88 s for sequence lengths of 2 s, 5 s and 10 s, respectively.

The reason behind the better procedure alignment compared to per-step alignment is that the GNo pattern is highly distinguishable from the rest of the gestures. This makes accurate identification of the whole procedure's boundaries possible even when using a large sequence length (10 s). On the other hand, back-to-back steps within the procedure may show similar patterns, especially for mirrored gestures. Figure 3.12 qualitatively compares the alignment performance at the procedure and step levels. Figure 3.12b illustrates that the per-step alignment accuracy degrades as we increase the training sequence length. It should be noted that, as the gestures follow each other back-to-back, a timing error in one gesture will contribute equally to the neighbouring gestures. For example, the G5 alignment error in the bottom row of Figure 3.12b results in an equal alignment error in the subsequent G6.

The Impact of Data Augmentation

We investigate the impact of the data augmentation introduced in Section 3.3.3 on the performance of RFWash. For the benefit of space, we show the results for sequences of 5 s, as other sequence lengths show similar patterns. Figure 3.13 shows that data augmentation improves the performance of RFWash significantly. Specifically, it increases the EMR from 69.2% to 75% and reduces the mean GER from 13.6% to 11.01%. More importantly, the box in Figure 3.13b shows that the majority of gesture sequences match the ground truth, i.e., GER = 0. In general, the results show the effectiveness of the proposed data augmentation. The precautions taken to guard against copying the data "as is" between training sequences (see the data augmentation details in Section 3.3.3) were instrumental in avoiding over-fitting.

⁷Recall that GNo is the "no gesture" period.

Fig. 3.13: The impact of data augmentation. (a) Exact match rates. (b) Gesture error rates.

3.4.3 Unseen Domains

To evaluate the generalisation capability of RFWash, we test it on unseen domains, which include unseen sequence lengths and unseen gesture sequences that were not included in the training data.

Unseen Sequence Length

RFWash accepts radar signals of variable lengths as valid inputs (see Section 3.3.3). We evaluate the performance of RFWash with input sequence lengths that are different from those in the training data. The ability to classify variable-length sequences is an advantage. For example, short segments have smaller latency and are preferable in scenarios where quick user feedback is needed. Longer sequences have better recognition performance and may be preferable for HH compliance audits, which can be performed offline.

Figures 3.14a, 3.14b and 3.14c show that an unseen sequence length has a negative impact on the performance of RFWash. However, the negative impact can be reduced by data augmentation, which achieves significant improvements across all metrics. For example, for a 15 s sequence length, data augmentation improves RFWash performance by more than 2.8x, 4x and 12x for timing estimation error, GER and EMR, respectively.

We investigate this in more detail for one specific sequence, shown in Figure 3.15. BiLSTM posteriors for three different sequence lengths of 3.1 s, 6.25 s and 12.5 s are shown. Note that only the 6.25 s sequence length was included in the training data. The predicted gesture sequence (G2, G3, G4, G5) for the previously seen 6.25 s sequence is correct, both with and without data augmentation (see Figure 3.15b). However, data augmentation produces significantly better temporal alignment. For unseen sequence lengths (Figures 3.15a and 3.15c), data augmentation produces significantly better GER and temporal alignment. For example, G2 is a false positive in Figure 3.15a, while G2, G3 and G7 are false negatives in Figure 3.15c for RFWash without data augmentation. RFWash with data augmentation did not produce any incorrect predictions in this instance. Similar behaviour was observed for many other segments of signals.

To understand why this occurs and to gain insight into the model's behaviour, we leverage the posterior and compare the input frames with the highest posteriors for the two versions (augmented versus non-augmented). Based on a visual inspection, it was generally observed that for each gesture some of the RD frames can better distinguish it from the rest (distinctive frames). Conversely, other frames can be similar to other gestures (common frames) and can be a source of confusion. The augmented version tends to consistently assign the highest posterior value to "distinctive" input frames.

Fig. 3.14: Impact of unseen sequence length on the performance of RFWash. (a) Gesture error rates. (b) Exact match rate. (c) Timing. Vertical shaded areas highlight the sequence length used in training.

Fig. 3.15: Temporal HH gesture alignment. (a) Alignment for unseen sequence length of 3.1 s. (b) Alignment for the reference (seen) sequence length of 6.2 s; we use this segment length for training. (c) Alignment for unseen sequence length of 12.45 s. GT: ground truth. aug: with data augmentation. w/o: without data augmentation.

To explain our observation, consider gesture G3, where the subject slides the left hand above the right hand to clean the back of the right hand and the areas between its fingers. G3 consists of two motions: protraction and retraction (see Figure 3.16a). In our experimental setup, protraction is defined as the left hand moving towards the radar, and retraction as the left hand moving away from the radar. Figure 3.16a plots the Velocity Point Cloud (VPC) of the mmWave radar output for the gesture. The colours of the points show the velocity of the tracked objects (i.e., the left hand of the subject): warm colours such as red show positive velocities (moving towards the radar) and cold colours such as blue show negative velocities (moving away from the radar). As can be seen, the higher the absolute velocity of the object, the darker (i.e., warmer or colder) the colour, which means that the absolute velocity of the object is higher when it is further away from the radar. Consequently, these two motions result in patterns that appear as "angular blobs" in the RD frames.

Fig. 3.16: The benefit of data augmentation. (a) G3 protraction (left) and retraction (right); these motions result in angular blobs in the RD frames (red border in (c)). (b) Without data augmentation: poor temporal alignment that does not capture G3 protraction and retraction. (c) With data augmentation: the model assigns high probability to G3 protraction and retraction (red border). Image transparency is inversely proportional to the posterior value of the frame.

Fig. 3.17: The impact of unseen gesture sequences. (a) Gesture error rates. (b) Exact match rates.

What happens is that the augmented version mostly assigns high posterior to these distinctive key frames containing angular blobs (Figure 3.16c). The non-augmented version misses these key frames and assigns high probability to common frames that can be seen in many other gestures, hence the confusion. In fact, the frames in Figure 3.16c of the non-augmented version actually belong to G4 and not G3.

Unseen Gesture Sequences

We evaluate the ability of RFWash to recognise unseen gesture sequences. It is important that RFWash performs well on unseen sequences, as a HCW may follow any order of gestures (see Figure 3.3 for an example), and it is difficult to collect data for all possible gesture sequences.


To test the performance of RFWash on unseen gesture sequences, we collected additional data with hand washing gestures performed in random orders (for example, G9 → G8 → ··· → G1). Recall that RFWash was trained on sequences G1 → G9 (see Section 3.4.1). Figure 3.17 shows that RFWash with data augmentation performs well on unseen gesture sequences, with a negligible reduction in performance compared to testing on previously seen sequences. As in the previous section, the impact of data augmentation on RFWash performance is significant.

3.4.4 Comparison with Fully Supervised Gesture Recognition Deep Learning Models

In this section, we compare the performance of RFWash model to state-of- the-art supervised deep learning based approaches, which include C3D [85] 8 and DeepSoli [26]. C3D has a 3D CNN that outperforms 2D ConvNets in spa- tiotemporal feature learning in large scale vision-based gesture recognition [86].

DeepSoli employs deep Convolutional Recurrent model (i.e., CNN + LSTM) to recognize finger gestures used for HCI, and it operates on mmWave Range

Doppler measurements of Google’s Soli sensor [26], which is available in re- cently launched Pixel 4 smartphones. Compared to the TI mmWave radar in

RFWash prototype, Soli radar has significantly higher sampling rates, but a limited sensing range of 30 cm, which make it unsuitable for HH tracking. C3D and DeepSoli are trained on manually segmented TI mmWave RD frames and tested on continuous HH gesture stream using auto segmentation of a sliding window length of 8 RD frames and an overlap of 7 samples. DeepSoli

8we consider an adapted version of the original model with the layers: [Conv1a, pool1, Conv2a, Conv3a, Conv3b, pool3, pool2, Conv4a, Conv4b, pool5, FC(512), FC(512)][85]

and C3D official public implementations are available only for the deprecated frameworks Torch and Caffe 1.0, respectively. To avoid technical issues associated with outdated packages, we replicated the architectures on top of Keras (the framework we used for RFWash), guided by the details in the papers and the official public code. As a sanity check, the replicated implementations were evaluated against the corresponding public dataset of each system. The classification accuracy was found to align with the results reported in the papers, with a negligible difference of ≤ 0.4%. Training and test data splitting was performed as described in Section 3.4.1. Each test fold contains ≈ 12k RD samples. To improve the accuracy, prediction pooling (p) is applied to C3D and DeepSoli by summing up the softmax activations and using the average activation for gesture prediction [26]. RFWash is trained on continuous gesture stream data as usual, with the sequence length set to 2 s.
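The prediction pooling step can be sketched as follows: per-frame softmax activations within a window are averaged, and the class with the highest mean activation is predicted. This is a minimal illustration; the function name and the toy data below are our own, not from the RFWash code.

```python
import numpy as np

def pooled_prediction(softmax_frames):
    """Average per-frame softmax activations over a window and predict
    the class with the highest mean activation (prediction pooling)."""
    mean_act = np.mean(softmax_frames, axis=0)
    return int(np.argmax(mean_act))

# Three noisy frame-level predictions over 4 classes; the last frame
# alone would vote for class 0, but pooling recovers class 1.
frames = np.array([[0.1, 0.6, 0.2, 0.1],
                   [0.2, 0.5, 0.2, 0.1],
                   [0.4, 0.3, 0.2, 0.1]])
print(pooled_prediction(frames))  # → 1
```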

Table 3.5 shows the HH gesture recognition accuracy of the different deep learning models (GNo in the table denotes the "no gesture" class). It shows that RFWash outperforms the alternative approaches significantly, with an overall accuracy of 85%, which is 7% and 20% higher than C3D(p) and DeepSoli(p), respectively. We note that the poor performance of DeepSoli is probably due to the low sampling rates of TI mmWave radar sensors, which are significantly lower than those of Soli radar sensors. Specifically, RFWash achieves the highest accuracy for most gestures except G2, G3 and G8. Furthermore, we note that RFWash has the additional advantage of "weak" supervision, i.e., no requirement for intensive manual per-RD-frame labelling and segmentation, and significantly fewer model parameters (see the last row of Table 3.5) and lower computational resource consumption.


Table 3.5: HH gesture recognition accuracy with different deep learning models

Gesture    C3D     C3D(p)  DSoli   DSoli(p)  RFWash
GNo        92.5%   95.1%   92.5%   94.6%     97.8%
G1         69.3%   72.4%   41.6%   40.9%     92.1%
G2         68.3%   76.7%   88%     89.8%     85.4%
G3         84.2%   92.9%   76.1%   77.8%     87.2%
G4         82.2%   83.9%   77.5%   80.8%     89.7%
G5         84.4%   86.4%   33.1%   34.9%     87.7%
G6         48.2%   43.7%   20.3%   18.8%     71.4%
G7         70.4%   69.3%   74.1%   77.2%     79.6%
G8         56.7%   66.3%   76.6%   80.6%     77.7%
G9         66.1%   75.9%   42%     42.1%     84.2%
Accuracy   73.33%  77%     63.15%  64.79%    85%
# params   32.6M   —       166M    —         7M

3.4.5 LSTM vs BiLSTM

We investigate the impact of different RFWash deep learning model parameters. Specifically, we compare the performance of CNN + LSTM and CNN + BiLSTM in Table 3.6, which shows that the proposed deep learning model with BiLSTM consistently performs better than that with LSTM, and that the performance gap increases with the segment length.

Table 3.6: LSTM vs BiLSTM

              2s               5s                10s
Model     GER     EMR      GER    EMR       GER     EMR
LSTM      20.7%   69.3%    18%    54%       17.7%   44.36%
BiLSTM    16%     76%      11%    75.28%    7.41%   76.69%
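For concreteness, the two sequence-level metrics can be computed as follows, assuming GER is an edit-distance-based gesture error rate and EMR is the fraction of sequences predicted exactly (this reading of the metrics, and the helper names, are our own):

```python
def edit_distance(a, b):
    """Classic Levenshtein distance between two gesture sequences."""
    m, n = len(a), len(b)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return d[m][n]

def ger(pred_seqs, true_seqs):
    """Gesture Error Rate: total edit distance / total reference length."""
    errs = sum(edit_distance(p, t) for p, t in zip(pred_seqs, true_seqs))
    return errs / sum(len(t) for t in true_seqs)

def emr(pred_seqs, true_seqs):
    """Exact Match Ratio: fraction of perfectly predicted sequences."""
    return sum(p == t for p, t in zip(pred_seqs, true_seqs)) / len(true_seqs)

true = [["G1", "G2", "G3"], ["G4", "G5"]]
pred = [["G1", "G2", "G3"], ["G4", "G6"]]
print(ger(pred, true), emr(pred, true))  # → 0.2 0.5
```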

3.5 Related Work

In this section, we briefly revisit the earlier discussion of related HH monitoring research (see Section 2.3.1) and then elaborate on related RF gesture recognition systems and on gesture recognition systems that employ a sequence labelling architecture similar to ours.

RFWash was developed with the main goal of tracking the HH technique of HCWs. A common limitation of HH systems trialled in healthcare facilities (such as RFID [45]) is that the actual HH technique is not monitored. Previous research seeking to monitor HH technique either employed RGB [47] or depth [48] cameras (the use of which raises unacceptable privacy concerns [3] in healthcare environments) or wearable sensors such as smart wristbands [46] that may lead to the transmission of healthcare-associated pathogens. RFWash, to the best of our knowledge, is the first contact-free HH technique monitoring system with the potential to work inside healthcare facilities without compromising privacy.

RF Gesture Recognition Gesture recognition is a very active research field; many studies have sought to apply RF sensing to gesture recognition applications, including WiFi-based [36, 75], RFID [87] and mmWave radar [26] systems. Despite extensive research on this topic, little research has been conducted on naturally performed gestures that do not include pauses. This limitation continues to have a negative effect on existing RF sensing applications such as sign language recognition [4]. For example, while WiMu [36] successfully manages to recognise simultaneous multi-user gestures, it also requires users to take brief pauses before and after the gestures because the segmentation module "cannot segment [a user's] gestures" [36] when they are performed contiguously. To this end, we presented our attempt to address this problem by taking input without segmentation and introducing a deep learning model that can learn from contiguously performed gestures. We hope the results of this work spark the interest of the research community and that, in the near future, we will see follow-up research that uses unsegmented RF data streams for more sensing applications.

Connectionist Temporal Classification-based Gesture Recognition

A number of research works [78, 88] have employed CTC-based architectures for gesture recognition applications. The most notable of these is Nvidia's vision-based system [78], which fuses depth, RGB and IR camera measurements to recognise a driver's hand gestures. In these approaches, gestures are segmentable (i.e., pauses exist between gestures) and training samples are pre-segmented to contain data for one gesture only plus frames for "no gesture" [78]. The role of CTC is to fine-tune the predictions by locating the gesture nucleus. In the present study, due to the difficulty of pre-segmenting back-to-back gestures, especially from RF measurements, we proposed training on unsegmented gesture sequences (i.e., a number of gestures within a training segment). Such a critical difference raises a challenge when training on long segments; however, we addressed this by employing a novel order-preserving augmentation technique that regularises the training process and thus enables learning with "weak" (i.e., less) supervision.
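For intuition on how a CTC-based model scores an unsegmented label sequence, the following is a minimal sketch of the textbook CTC forward (alpha) recursion over a per-frame probability matrix. It illustrates the general technique only, not the RFWash implementation:

```python
def ctc_forward(probs, labels, blank=0):
    """probs: T x V per-frame class probabilities; labels: the target
    gesture sequence without blanks. Returns the total probability of
    all frame-level paths that collapse to `labels` (CTC forward pass)."""
    ext = [blank]
    for c in labels:            # interleave blanks: l' = b, l1, b, l2, b, ...
        ext += [c, blank]
    S, T = len(ext), len(probs)
    alpha = [[0.0] * S for _ in range(T)]
    alpha[0][0] = probs[0][blank]
    alpha[0][1] = probs[0][ext[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]
            if s > 0:
                a += alpha[t - 1][s - 1]
            # skip transition allowed unless blank or repeated label
            if s > 1 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * probs[t][ext[s]]
    return alpha[T - 1][S - 1] + alpha[T - 1][S - 2]

# Two frames, vocabulary {blank, gesture}: the paths (g,g), (b,g), (g,b)
# all collapse to [gesture]; their probabilities sum to 0.82.
y = [[0.6, 0.4], [0.3, 0.7]]
print(ctc_forward(y, [1]))  # ≈ 0.82
```

Training minimises the negative log of this probability, which is what lets the model learn from unsegmented sequences.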

3.6 Discussion and Future Work

Significant improvements could be made to the current implementation of RFWash. In our design, we focused mainly on the key technical challenge of recognising back-to-back gestures. To further improve the current design, we consider the key areas discussed below.

Hand Hygiene Tracking in Wards: Limited by the ethical clearance of this research, we were only able to collect RF data from the general public in a conventional university laboratory. Clinicians can perform hand-rub techniques at a faster pace. We considered collecting data in a dynamic office environment in which interfering subjects would pass close by the main user.

Interference in real healthcare facilities could be more challenging; however, the spatial resolution of the mmWave radar could help to reduce this effect. Additionally, the accuracy of mirrored gestures could be further enhanced by fusing additional information, such as point cloud data (see Figure 3.16a). In general, our results suggest that RFWash's adoption in real healthcare facilities should be investigated.

Healthcare Worker Identification: Gesture motion patterns have been shown to be a unique identifier of a user [75]. This is an interesting direction that complements RFWash, as it associates HH compliance with individual subjects.

We are currently extending RFWash in this direction to add device-free HCW identification by adding a subsequent model trained to infer the subject from the predicted gesture. Building on the findings of [75], we investigated complementing the current architecture with an HCW identification capability. In particular, we trained the C3D model [85] to predict which user was performing a gesture given the RD frames as input. Following the guidelines of [75], the user identification model augments the current gesture tracking model. Thus, the frames identified by the gesture tracking model as belonging to a particular gesture (say G2) are further processed by the user identification model to predict the user's identity. Our preliminary results on 9 subjects ({S1, S2, ···, S9}) using only gesture G2 are very encouraging. They show that we can identify the subject who performed G2 with an average accuracy of 94.12% (Figure 3.18a). To reduce the labelling effort of the user identification model while improving performance, we investigated regularising the training using a Virtual Adversarial Training (VAT) [89] loss. Like many other regularisation techniques, the essence of VAT is to make the model less sensitive to small perturbations in the inputs. In other words, the model predictions should not change significantly after applying a small perturbation to the input. A key difference in the VAT case, however, is that the perturbations are not random; instead, they are calculated ("adversarial perturbations").

(a) User identification accuracy (b) Impact of VAT regularisation

Fig. 3.18: Healthcare Worker Identification Performance

Formally, for a model θ and input sample x, the model output distribution (i.e., prediction) is p(y|x, θ) and the adversarial perturbation r_adv is given by:

r_adv = arg max_{r: ||r|| ≤ ε} Δ_KL(r, x, θ)    (3.9)

Δ_KL(r, x, θ) = KL[p(y|x, θ), p(y|x + r, θ)]    (3.10)

where KL is the Kullback-Leibler divergence, which measures the divergence between two distributions, ε is a norm constraint that ensures a small value for the perturbation, and r is a variable representing the amount of perturbation applied to x. According to Equation (3.9), the target perturbation is the one that maximises the KL divergence between the two output distributions. The goal is to make the model robust against such perturbations by minimising the following loss via gradient descent:

L_VAT = Δ_KL(r_adv, x, θ)    (3.11)

The fact that VAT does not require ground truth labels allows it to be used in semi-supervised settings. We tested the performance of our model in semi-supervised settings, where we train using only a few labelled samples and treat the rest of the data as unlabelled. Ultimately, as can be seen in Figure 3.18b, VAT regularisation not only improves the general accuracy of the model but also allows it to maintain high accuracy (> 91%) even when a very small number of labelled samples is used (e.g., four labelled samples per user).
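A minimal sketch of the VAT idea in Equations (3.9)-(3.11), with a toy linear-softmax model standing in for the identification network. The original method computes r_adv with gradients/power iteration; this illustration simply searches random candidate directions of norm ε, so it is an approximation for intuition only:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def kl(p, q):
    """Kullback-Leibler divergence KL[p || q] for discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def model(x, W):
    # toy linear-softmax "model" standing in for the real network
    return softmax(W @ x)

def vat_perturbation(x, W, eps=0.1, n_candidates=64):
    """Approximate r_adv (Eq. 3.9/3.10) by sampling random directions of
    norm eps and keeping the one that maximises the KL divergence."""
    p = model(x, W)
    best_r, best_kl = None, -1.0
    for _ in range(n_candidates):
        r = rng.normal(size=x.shape)
        r *= eps / np.linalg.norm(r)       # enforce ||r|| = eps
        d = kl(p, model(x + r, W))
        if d > best_kl:
            best_kl, best_r = d, r
    return best_r, best_kl

W = rng.normal(size=(3, 4))
x = rng.normal(size=4)
r_adv, delta = vat_perturbation(x, W)
# the per-sample VAT loss is the KL at r_adv, which training minimises
```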

In Section 2.1, we discussed the privacy preservation capability of RF signals that makes them favourable in the medical sensing context. Being able to identify users, however, does not contradict the privacy-preserving assumption, as the system can only extract human-identifying features (i.e., unique hand motion dynamics) without leaking any visual appearance details (e.g., skin colour, gender, emotion). Subsequent to the work presented here, this idea of privacy-preserving RF-based user identification was embraced by others [90, 91]. For example, RF-ReID [90] followed the same line and formalised the idea as "privacy-conscious" person identification. The authors developed an RF-based system for re-identifying walking users from their RF reflection patterns without capturing personal information. RFWash, on the other hand, relies on unique hand motion dynamics for subject identification, which aligns naturally with the hand hygiene application.

3.7 Conclusions

In this Chapter, we introduced RFWash, the first RF-based system for contact-free monitoring of healthcare workers performing Hand Hygiene techniques. The novelty of the work is two-fold. First, we fill the gap in HH technique monitoring research by introducing the first device-free system that is privacy preserving. Second, we introduce a deep model capable of recognising back-to-back HH gestures that are not trivially separable. Gesture recognition is traditionally approached by two-stage systems comprising segmentation followed by classification. This study showed that the traditional approach does not extend to naturally performed back-to-back gestures: the gestures are unsegmentable, and small segmentation errors significantly decrease the accuracy of the subsequent classification. Consequently, we made a significant departure from traditional approaches by framing the problem as one of sequence labelling. Our method overcomes the need for segmentation, accurately predicts complete gesture sequences and substantially reduces the human labelling effort. RFWash was implemented using an embedded mmWave radar sensor and evaluated in a real-world environment. The promising results of RFWash in tracking Hand Hygiene in small-scale trials encourage us to further expand this work by collecting data at a larger scale in clinical facilities and hospital wards.

This work has resulted in the following publication:

1. Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise McLaws and Wen Hu, "Poster: A Weakly Supervised Tracking of Hand Hygiene Technique", International Conference on Information Processing in Sensor Networks (IPSN '20).

Chapter 4

Breathing Biofeedback

Breath control exercises have proven vital to many applications, ranging from the clinical treatment of hypertension and breathing disorders to the management of everyday stress. To assist users in learning about their breathing behaviours during these exercises, biofeedback solutions have been adopted.

These work by measuring changes in breathing parameters in an instantaneous and continuous manner, then communicating these changes to users via audio or visual feedback. In this Chapter, we explore the potential of sensing instantaneous breathing in a contactless manner from RF measurements.

We examine and address the challenges and introduce WiRelax, a WiFi-based breathing biofeedback system.

Tracking breathing is commonly performed using dedicated wearables, such as textile sensors, wearable belts or chest bands [62, 92, 93]. Additionally, several commercial products use wearable technologies to assess respiratory patterns in stress management exercises. For example, the Spire [57] and the Prana [61] capture fine-grained breathing features, such as rate, depth and inhalation-to-exhalation ratio (IER), and encourage users to engage in calmer breathing to alleviate stress. In addition to dedicated wearables, smartwatch manufacturers have already begun to integrate breathing exercises into their products. For example, Apple's Breathe application [94] guides users during breathing exercises by requiring them to breathe in/out following a circle animation. One key limitation of wearables and wired sensors in biofeedback applications is that they restrict a user's posture [95]. They also tend to degrade a user's experience and can even introduce new forms of stress [95].

As device-free systems represent more comfortable alternatives, researchers have developed RF breathing tracking solutions using FMCW radars [23], USRP [96], Doppler radars [97] and WiFi [5, 55, 98]. The majority of these systems are designed to monitor breathing only and do not support biofeedback [95]. Additionally, WiFi-based systems primarily focus on breathing rate.

Consequently, information about ongoing respiratory patterns, such as the current cycle (inhalation/exhalation/retention), timing (e.g., inhaling for 0.75 seconds) and breathing depth (e.g., shallow/deep), cannot be directly acquired via existing approaches. Such information is equally important to biofeedback and wellbeing monitoring applications; however, to date, a ubiquitous contact-free breathing biofeedback solution has yet to be realised.

To address this issue, we introduce WiRelax, a device-free system that monitors the detailed breathing patterns of a user in real time, which is a key enabler of breathing exercise and biofeedback systems. The system is based on analysing the channel state information (CSI) of WiFi packets transmitted between two commodity WiFi devices, such as a tablet and a smartphone

(see Figure 4.1). WiRelax informs users about their instantaneous breathing performance (duration and depth) within a breathing session. Instantaneous reporting provides timely feedback and enables subjects to apply breath control actions. Consider the scenario in Figure 4.1, in which a subject is practising 6 seconds of paced breathing (3 seconds of inhalation and 3 seconds of exhalation).

Fig. 4.1: WiRelax Concept. WiRelax leverages WiFi communication to provide users with instantaneous respiratory feedback during breathing exercise sessions. Video demonstration: [1]

The visual feedback consists of a colour-coded circle that progresses in the direction of the dashed arrow as the system senses the user inhaling (green) and exhaling (red). In State 2, for example, the user observes that he has completed 50% of the exhalation cycle and has 1.5 seconds remaining (see the grey segment). Ultimately, instantaneous sensing and feedback allow the user to synchronise his breathing with the desired exercise settings.

Several challenges arise in our use case scenario. First, we require the breathing progress to be reported continuously during the breathing cycle.

Fig. 4.2: Cycle counting versus instantaneous breath tracking: Most CSI streams agree on breath cycle counts (three cycles in the orange segment); however, there is a lack of consensus about the instantaneous breath (i.e., whether the subject is inhaling or exhaling, and at what depth; see the red segment). WiRelax addresses instantaneous breath tracking.

Conventional peak-to-peak distance [5] and frequency analysis [6, 98] approaches that are used to estimate breathing rate are unsuitable for this purpose, as their estimates are computed on completed cycle segments (see Figure 4.2). We addressed this issue by introducing a model that correlates the instantaneous breathing-induced chest displacement of a user with the change in CSI properties of the WiFi signal sub-carriers. Under this model, the chest displacement causes a linear shift in the phase difference (PD) between the receiver antennas. Additionally, as the shift is proportional to the chest displacement magnitude, breathing depth (shallow versus deep breathing) can be identified. Driven by the model, a novel signal processing pipeline was developed to address the practical considerations of

filtering out noisy sub-carriers and fusing streams from many sub-carriers into a single breathing waveform.

Our second challenge relates to the need to distinguish between inhalation and exhalation cycles, as it has been shown that the IER modulates heart rate

variability [99]. Reporting the IER and other similar metrics is contingent on the ability to make such a distinction. We solved this problem through a calibration procedure in which we ask the user to perform a pre-specified breathing sequence at the beginning of the session. The calibration takes a few seconds and makes the system agnostic to changes in the environment and in the identity of subjects.

Ultimately, the estimated waveform accurately matched the timing of the inhalation/exhalation cycles and the amplitude of the chest displacement. Our results (see Section 4.3) show that the timing errors are less than 0.5 seconds

83% of the time and the correlation between the estimated breathing waveform and the ground truth is 77%.

Our contributions are summarized as follows:

1. We propose the first WiFi-based contact-less real-time monitoring system for ongoing breathing cycles.

2. We model the relationship between breathing-related chest displacement and the change in phase difference (PD) of commodity WiFi packets. Unlike earlier models [100, 98] that focus on breathing frequency, the proposed model is designed to infer instantaneous breathing dynamics (timing and depth), making it suitable for biofeedback applications.

3. We demonstrate the effectiveness of the system by capturing detailed breathing pattern metrics in real-world trials. Specifically, we capture inhalation time, exhalation time, relative amplitude, and inhalation-to-exhalation ratio (IER).

The remainder of this Chapter is organised as follows. Section 4.1 provides an overview of the proposed WiRelax system, which can accurately capture the chest displacement profile of subjects. Section 4.2 presents an analytical model that captures how chest displacement affects the antenna PD and also describes the algorithm WiRelax uses to process the CSI data. Section 4.3 evaluates the performance of WiRelax. Section 4.4 discusses related research. Section 4.5 discusses the limitations of this research and future avenues for research, and finally, Section 4.6 concludes the Chapter.

4.1 Overview

In this section, we motivate our work using an illustrative example and provide a brief overview of WiRelax. A detailed description of the system is presented in the next section.

4.1.1 Experimental Observation

Fig. 4.3: A sample breathing session (chest displacement in mm over time).

Our objective is to develop a system for monitoring the detailed breathing dynamics of a user in real time. First, we must make an experimental observation about the capability of PD measurements to capture breathing depth, as it is this capability upon which we base our model. We conducted an experiment with a single subject and two contact-less systems in a closed room.


In the experiment, we used a UWB radar¹ to capture the ground truth data for the chest movement (see Figure 4.3). We also deployed two laptops, one transmitter and one receiver, to measure CSI data using an Intel 5300 WiFi card. We processed the CSI data using the procedure in [5], whereby a Hampel filter was applied to remove outliers and a moving average filter was then used to remove the high-frequency noise irrelevant to breathing.
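The preprocessing from [5] can be sketched as below. The window sizes, threshold and synthetic signal are illustrative assumptions, not the exact parameters used in the experiment:

```python
import numpy as np

def hampel(x, window=5, n_sigmas=3.0):
    """Hampel filter: replace a sample with the local median when it
    deviates by more than n_sigmas scaled median absolute deviations."""
    y = x.copy()
    k = 1.4826  # Gaussian consistency constant for the MAD
    for i in range(len(x)):
        lo, hi = max(0, i - window), min(len(x), i + window + 1)
        med = np.median(x[lo:hi])
        mad = k * np.median(np.abs(x[lo:hi] - med))
        if mad > 0 and abs(x[i] - med) > n_sigmas * mad:
            y[i] = med
    return y

def moving_average(x, width=10):
    """Smooth out high-frequency noise irrelevant to breathing."""
    return np.convolve(x, np.ones(width) / width, mode="same")

# Synthetic 0.25 Hz "breathing" CSI stream with noise and one outlier
rng = np.random.default_rng(1)
t = np.arange(0, 20, 0.05)
csi = np.sin(2 * np.pi * 0.25 * t) + 0.05 * rng.standard_normal(t.size)
csi[100] = 8.0                      # injected outlier spike
clean = moving_average(hampel(csi), width=10)
```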

(a) Amplitude measurements

(b) Phase difference measurements

Fig. 4.4: Recorded amplitude and phase difference for the breathing signal of Figure 4.3.

We plot the amplitude of the processed CSI data across sub-carriers and the phase difference of the CSI data between the two receiver antennas in Figure 4.4. The top sub-figures show the data for all sub-carriers, while the bottom sub-figures show the sub-carrier with the highest variance². The ground-truth breathing waveform is shown in Figure 4.3.

¹Details about the setup and ground truth are given in Section 4.3.

We observe that while the frequency of the oscillation of the breathing waveform is preserved in both the amplitude and phase of the CSI data, the amplitude of the breathing waveform (the chest displacement in Figure 4.3) does not correlate well with the CSI amplitude. The phase difference, on the other hand, shows a high correlation with the breathing waveform and is more suitable for monitoring breathing patterns in fine-grained detail. Based on this observation, our modelling is built on phase difference measurements.

Next, we give an overview of how the system works. Then, we introduce the model and the system implementation.

4.1.2 WiRelax overview

WiRelax works by tracking a quasi-stationary user who is seated in a room equipped with two commodity WiFi devices (see Section 2.1 for the in-home monitoring setup). At least one of the devices has multiple antennas, which is a common hardware feature used to improve the spatial diversity of WiFi communications.

Our system works in two steps. First, the user is asked to inhale and exhale normally for 10 seconds to calibrate the system. The calibration step generates a user model that links the RF signal to the user's breathing. Next, the system provides the user with detailed information about his breathing patterns in real time, through the conscious breathing graphical user interface (see the prototype in Figure 4.13).

Figure 4.5 shows the key algorithmic steps. The system first captures CSI

²A sub-carrier's variance is an indicator of its sensitivity [5].


Fig. 4.5: Illustration of the WiRelax architecture. WiRelax is meant to be a framework that supports conscious breathing applications by providing a real-time, detailed breathing waveform.

data for both receiver antennas and calculates the phase difference (PD) between the two CSI streams, referred to as the raw PD. The raw data is then preprocessed to remove noise present in all sub-carriers. Next, the system rejects outlier sub-carriers and selects one of the remaining sub-carriers as a "reference" in the

Selection and Alignment step. In this step, sub-carriers that have the opposite phase to the reference are inverted so as to be aligned. In the Waveform Estimation step, the data across all of the sub-carriers is fused into a single CSI waveform using linear regression. Finally, the user model from the calibration step is used to transform the CSI waveform into the breathing waveform.
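The Selection and Alignment step can be sketched as below, assuming the reference is the highest-variance sub-carrier, alignment is decided by the sign of the correlation with that reference, and fusion is a simple average (the actual pipeline fuses via the linear regression of Section 4.2; the function name and data are illustrative):

```python
import numpy as np

def align_and_fuse(pd_streams):
    """pd_streams: (n_subcarriers, n_samples) phase-difference streams.
    Pick the highest-variance sub-carrier as the reference, flip the sign
    of sub-carriers anti-correlated with it, then fuse by averaging."""
    ref = pd_streams[np.argmax(pd_streams.var(axis=1))]
    fused = np.zeros(pd_streams.shape[1])
    for s in pd_streams:
        corr = np.corrcoef(ref, s)[0, 1]
        fused += s if corr >= 0 else -s   # invert opposite-phase streams
    return fused / len(pd_streams)

# Three synthetic sub-carrier streams tracking a 0.3 Hz breath; one is
# phase-inverted, as often happens across sub-carriers.
t = np.linspace(0, 10, 200)
breath = np.sin(2 * np.pi * 0.3 * t)
streams = np.vstack([1.0 * breath, 0.8 * breath, -0.9 * breath])
fused = align_and_fuse(streams)
```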

4.2 WiRelax System

In this section, we discuss the implementation of WiRelax. We start by modelling the relationship between chest displacement and the phase difference of the radio signals between the receiver antennas. We then present our pre-processing steps to de-noise the signals and a data fusion algorithm for combining data from multiple sub-carriers into a single breathing waveform.


4.2.1 The Impact of Chest Displacement on Sub-carriers' Phase Difference

(a) Single antenna case.

(b) Multiple antennas with phase difference.

Fig. 4.6: The effect of object displacement on phase and phase difference (PD)

This section gives the intuition for how displacement affects the phase difference of the CSI. A more detailed mathematical analysis is provided in Section 4.2.2.

WiRelax works with two receiver antennas, and we will use the term phase difference (PD) to refer to the phase difference measured between the two

antennas, typically calculated using the received CSI data. The term should not be confused with the path phase difference commonly found in the literature, which refers to the phase difference between two signals (typically the direct path and a reflection) as recorded by a single receiver antenna [100, 101]. The latter cannot be measured with COTS WiFi, but its impact on the received CSI amplitude has been modelled by earlier efforts and employed in various applications

[98, 101].

We consider the situation in Figure 4.6a, where a transmitter (TX), a receiver (RX) and a slowly moving reflector (depicted as a thick horizontal line) lie on a plane. We assume that the reflector moves in the direction perpendicular to the line connecting the transmitter and receiver. We consider ray tracing, and in particular the ray from the transmitter to the receiver via the reflector. As Figure 4.6a shows, the movement of the reflector extends the length of this ray. In particular, the thick green line shows the extra path traversed by the signal compared with an earlier time instance. This extra path length Δ causes an extra phase shift Δφ at the receiver [23]:

Δφ = 2πΔ/λ    (4.1)

where λ denotes the wavelength. Since the extra path length Δ is related to the displacement of the reflector, the extra phase shift Δφ contains information about the unknown displacement. This method of estimating displacement from phase shift has been employed in [23, 102]. However, it requires that the transmitter and the receiver synchronise their radio carriers, which is not possible with commodity WiFi. In this Chapter, we overcome the lack of carrier synchronisation in WiFi by using the phase difference between two antennas of the same receiver.
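As a quick numerical check of Equation (4.1), with illustrative values for the 5 GHz band:

```python
import math

wavelength = 0.057   # ~5 GHz WiFi signal wavelength, in metres
delta = 0.008        # 8 mm of extra path length from chest motion
dphi = 2 * math.pi * delta / wavelength
print(math.degrees(dphi))  # ≈ 50.5 degrees of extra phase shift
```

An 8 mm path change thus produces an easily measurable phase shift, while staying well within one wavelength.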


The situation in Figure 4.6b is similar to that of Figure 4.6a, but the receiver has two antennas. We again consider ray tracing. There is a reflected ray from the transmitter to each of the two antennas, shown as thin blue and orange lines.

The orange ray travels over a longer path, and the path length difference between the two rays is shown as a thick red line (lower part of the figure) and a thick green line (upper part of the figure). The extra path causes a phase difference between the signals received at the two receiver antennas, and we use it to estimate the displacement of the reflector.

It should be noted that the PD is computed with respect to a common transmitter antenna, which acts as a reference. Therefore, the lack of synchronisation between the transmitter and receiver is not a concern. Additionally, we assume that the unknown displacement we want to measure is sufficiently small that the extra path length is less than one wavelength. This makes our system immune to the phase wrap-around problem, where many different extra path lengths correspond to the same phase difference (e.g., a phase difference of π/2 can be due to extra path lengths of λ/4, λ + λ/4, 2λ + λ/4, etc.). In our scenario, the expected movement of the chest is in the range of 4-12 mm and the wavelength of the 5 GHz signals is 5.7 cm [98].

4.2.2 Modelling the Impact of Displacement on Phase Difference

The aim of this section is to derive a mathematical expression for the PD between the two receiver antennas as a function of the displacement of the chest acting as a reflector. We consider the situation depicted in Figure 4.7, which has the same setup as that in Figure 4.6b but with the relevant distances labelled to facilitate the mathematical analysis. Table 4.1 summarises the symbols used in the derivation.


Table 4.1: Symbols used in the mathematical derivation.

Symbol  Description
d       Distance between TX and RX in the x-direction
k       Distance between the two receiving antennas
h_t     Nominal distance between the transmitter and the reflector in the y-direction
h_r     Distance between receiving antenna 1 and the reflector in the y-direction
ℓ       Displacement of the reflector in the y-direction from the nominal distance

Fig. 4.7: Illustrative example annotated with key symbols used in the model derivation.

Our model is based on ray tracing. We consider two rays for each antenna:

(1) the LoS ray from the transmitting antenna; and (2) the ray reflected by the chest. The difference in path length Δp_LOS for the two LoS rays reaching the two antennas is:

Δp_LOS = √((h_t − h_r)² + d²) − √((h_t − (h_r + k))² + d²)    (4.2)

This path length difference is independent of the chest displacement. Without loss of generality, we assume that h_t = h_r = h. As the inter-antenna distance k is small compared to d, we assume that Δp_LOS is negligible.

When the chest is at a distance ℓ from its nominal position, the difference in path lengths Δp(ℓ) for the two reflected rays reaching the two antennas is a function of ℓ, as follows:

Δp(ℓ) = √((h_t + h_r + 2ℓ)² + d²) − √((h_t + (h_r + k) + 2ℓ)² + d²)    (4.3)

This results in a PD Δφ(ℓ) between the two receiving antennas:

Δφ(ℓ) = (2π/λ) Δp(ℓ)    (4.4)

Again, we assume that h_t = h_r = h. Since the chest displacement ℓ is small compared with h, we approximate the right-hand side of Equation (4.3) using a Taylor series expansion in ℓ that retains only terms up to the linear one. With this approximation, we have:

Δφ(ℓ) − Δφ(0) = (2π/λ) S ℓ    (4.5)

where

S = ∂Δp(ℓ)/∂ℓ |_{ℓ=0} = 4h/√((2h)² + d²) − (4h + 2k)/√((2h + k)² + d²)    (4.6)
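The linear approximation in Equations (4.5)-(4.6) can be verified numerically for an illustrative geometry (the distances below are chosen arbitrarily, not taken from the experiments):

```python
import math

def delta_p(l, h=0.5, d=2.0, k=0.03):
    """Exact reflected path-length difference, Eq. (4.3) with h_t=h_r=h."""
    a = math.sqrt((2 * h + 2 * l) ** 2 + d ** 2)
    b = math.sqrt((2 * h + k + 2 * l) ** 2 + d ** 2)
    return a - b

def S(h=0.5, d=2.0, k=0.03):
    """Closed-form slope from Eq. (4.6)."""
    return (4 * h / math.sqrt((2 * h) ** 2 + d ** 2)
            - (4 * h + 2 * k) / math.sqrt((2 * h + k) ** 2 + d ** 2))

l = 0.006                              # 6 mm chest displacement
exact = delta_p(l) - delta_p(0)        # exact change in Δp
approx = S() * l                       # linearised change, Eq. (4.5)
# for millimetre-scale l, the two agree to within a fraction of a percent
```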

We assume that WiFi has C sub-carriers with wavelengths λ_i, where i = 1, ..., C, and that for the i-th sub-carrier, the measured PD between the receiver antennas is Δφ_i(ℓ). By applying Equation (4.5) to all C sub-carriers, we have:

(Δφ_i(ℓ) − Δφ_i(0)) / 2π = S ℓ / λ_i   for i = 1, ..., C.    (4.7)

If we perform a linear regression with (Δφ_i(ℓ) − Δφ_i(0))/2π as the dependent variable and the 1/λ_i's as the regressors, then the estimated slope is Sℓ. As per Equation (4.6), the constant S depends on the distances between the transmitter, the receiver and the user. Although it may be possible to obtain the value of S through some calibration process, this process can be cumbersome. In this Chapter, we use the estimated slope to determine the chest displacement up to a proportionality constant, and we refer to this as the relative displacement.
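The regression of Equation (4.7) can be sketched with synthetic data; the wavelengths, the value of S and the displacement ℓ below are illustrative assumptions:

```python
import numpy as np

# Sub-carrier wavelengths around 5.7 cm (5 GHz band), synthetic example
wavelengths = np.linspace(0.0566, 0.0574, 30)
S, ell = -0.02, 0.006          # geometry constant and 6 mm displacement

# (Δφ_i(ℓ) − Δφ_i(0)) / 2π per Eq. (4.7), plus small measurement noise
y = S * ell / wavelengths
y = y + np.random.default_rng(2).normal(0.0, 1e-6, y.size)

# regress against 1/λ_i: the fitted slope recovers S·ℓ,
# i.e., the chest displacement up to the unknown constant S
slope, intercept = np.polyfit(1.0 / wavelengths, y, 1)
```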

4.2.3 Relative Displacement Estimation

Our proposed system uses the phase difference between a pair of receiver antennas as its input. At the start of a session, the system acquires and processes the phase difference signal for a calibration period of 10 seconds, during which the user is asked to breathe normally. A model of the amplitude and phase of the user's breathing is kept as a result of this step. Next, the system follows a series of processing steps to remove noise in the CSI samples, reject outlier sub-carrier data, and fuse data from multiple sub-carriers. Figure 4.5 presents an overview of the steps. We next describe the individual processing steps in more detail.

Calibration

In the calibration step, the system instructs the user to breathe normally for 10 seconds and then gives an audio cue to indicate the start of the calibration step. During calibration, the subject is expected to hold the breath for 1 second, then repeatedly breathe in and out in a relaxed manner until the end of calibration, signalled with another audio cue. After that, the subject will follow the breathing exercises. The system records CSI data during calibration. This data is subsequently used to resolve the ambiguity in the sign of the ±180° phase shift, which in turn causes ambiguity in distinguishing between inhalation and exhalation. By asking the subject to inhale first before exhaling in the calibration process, we can capture the sign corresponding to inhalation and exhalation and ensure they can be correctly identified later. In summary, the calibration data is used to model:

• Amplitude of the Normal Breathing: The median of the estimated breathing amplitude during calibration is kept as the reference amplitude. This information is used for tracking breathing depth.

• Direction of the Phase Difference in Inhalation and Exhalation Cycles: Identifying inhalation vs. exhalation is achieved by relating the change in the produced waveform to the actual expected breathing pattern during calibration. This information is vital for capturing IER, among other metrics.

It should be noted that the calibration stage (10 seconds) is performed only once at the beginning of each breathing session (which lasts 10-15 minutes).

Denoising and Outlier Removal

The preprocessing step de-noises the signal before handing it over to the filtering module. The preprocessing is done for each individual sub-carrier independently. First, we use the Hampel filter to remove outlier samples that appear as abrupt changes. In particular, we discard any point falling outside the range [μ − τσ, μ + τσ], where μ, σ and τ are the median, the mean absolute deviation and the threshold, respectively. The window size is set to 1 second. Next, to filter out irrelevant high-frequency noise, the sub-carrier streams are subjected to a moving average filter. We use a moving average filter with a larger window size (set to a default value of 3.5 seconds) to extract the sub-carrier's dynamic trend. We then subtract the obtained trend from the original data stream to get the de-trended data (Figure 4.8e).
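The Hampel filtering and moving-average de-trending described above can be sketched as follows. This is a simplified stand-in for the actual pipeline: the window sizes and the roughly 20 Hz sampling rate are illustrative assumptions.

```python
import numpy as np

def hampel(x, win=21, tau=3.0):
    """Replace samples outside median +/- tau*MAD of a sliding window
    (win ~ 1 s of samples at the assumed 20 Hz rate)."""
    x = np.array(x, dtype=float)
    half = win // 2
    for i in range(len(x)):
        seg = x[max(0, i - half): i + half + 1]
        med = np.median(seg)
        mad = np.median(np.abs(seg - med))
        if np.abs(x[i] - med) > tau * mad:
            x[i] = med
    return x

def detrend(x, win=70):
    """Subtract a moving-average trend (win ~ 3.5 s at 20 Hz, assumed)."""
    trend = np.convolve(x, np.ones(win) / win, mode="same")
    return x - trend

t = np.linspace(0, 10, 200)                    # 10 s at ~20 Hz
x = np.sin(2 * np.pi * 0.25 * t) + 0.02 * t    # breathing-like wave plus drift
x[50] += 10.0                                  # inject one abrupt outlier
clean = detrend(hampel(x))
```

The injected spike is replaced by the local window median, and the slow drift is removed by subtracting the moving-average trend, leaving the oscillatory breathing component.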

Fig. 4.8: Preprocessing. Preprocessing depicted for a 1-minute segment of a breathing session (rate 40 bpm). Time is on the x-axis. The upper row shows the preprocessing effect on all sub-carriers (sub-carrier numbers on the y-axis): (a) raw input, (b) Hampel filter, (c) SG filter. The lower row shows the preprocessing sequence applied to a single sub-carrier (#9): (d) raw input, (e) Hampel filter, (f) SG filter. The values in the lower sub-figures were scaled to the range 0–1 for each sub-carrier for visualisation purposes.

During our experimentation, we occasionally observed a few (typically fewer than three) noisy sub-carriers in the data. These noisy sub-carriers vary randomly throughout the whole breathing session and do not reflect the actual breathing. Excluding these sub-carriers as early as possible ensures the reliability of the subsequent operations. To identify them, the following heuristic is used:

\text{sub-carrier } i \text{ is } \begin{cases} \text{an outlier}, & \text{if } |\mathrm{Var}_i(\mathrm{PD})| > \tau \\ \text{retained}, & \text{otherwise} \end{cases} \qquad (4.8)

where Var_i(PD) denotes the variance of the PD for sub-carrier i. The default value for τ is 0.8π.
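A minimal sketch of the variance heuristic of Equation (4.8); the "good" and "noisy" sub-carrier streams below are synthetic illustrations.

```python
import numpy as np

def reject_noisy_subcarriers(pd_matrix, tau=0.8 * np.pi):
    """Return indices of sub-carriers whose PD variance is within the
    threshold of Equation (4.8); the rest are treated as outliers."""
    variances = np.var(pd_matrix, axis=1)
    return np.where(np.abs(variances) <= tau)[0]

rng = np.random.default_rng(0)
t = np.linspace(0, 60, 1200)
good = 0.3 * np.sin(2 * np.pi * 0.25 * t)          # breathing-modulated PD
noisy = rng.uniform(-np.pi, np.pi, size=t.shape)   # random phase jumps
pd_matrix = np.vstack([good, noisy, 0.8 * good])
kept = reject_noisy_subcarriers(pd_matrix)
```

The uniformly random stream has variance π²/3 ≈ 3.29, above the 0.8π ≈ 2.51 threshold, so only the breathing-modulated streams survive.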

In the final step of the preprocessing, the Savitzky-Golay polynomial least-squares filter (SG filter) is employed. It smooths the signal while preserving steep changes [103], which is useful for preserving the positions of peaks and valleys. Figure 4.8f shows an example of a pre-processed single sub-carrier.
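The SG filter idea, a least-squares polynomial fit over a sliding window evaluated at the window centre, can be sketched with NumPy alone. The window length and polynomial order below are illustrative choices, not the tuned parameters of the system.

```python
import numpy as np

def savitzky_golay(x, win=11, order=3):
    """Minimal Savitzky-Golay smoother: fit a polynomial over each sliding
    window and keep its value at the window centre (edges left untouched)."""
    half = win // 2
    x = np.asarray(x, dtype=float)
    out = x.copy()
    idx = np.arange(-half, half + 1)
    for i in range(half, len(x) - half):
        coeffs = np.polyfit(idx, x[i - half:i + half + 1], order)
        out[i] = np.polyval(coeffs, 0)   # value of the fit at the centre
    return out

rng = np.random.default_rng(1)
t = np.linspace(0, 10, 200)
wave = np.sin(2 * np.pi * 0.25 * t)                 # idealised breathing wave
noisy = wave + 0.05 * rng.standard_normal(t.shape)
smoothed = savitzky_golay(noisy)

# Compare RMSE against the clean wave on the interior samples
inner = slice(5, -5)
err_noisy = np.sqrt(np.mean((noisy[inner] - wave[inner]) ** 2))
err_smooth = np.sqrt(np.mean((smoothed[inner] - wave[inner]) ** 2))
```

Because the polynomial fit follows local curvature, the smoothed signal suppresses the noise while the peak and valley positions of the slow breathing wave stay close to their true locations.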

Selecting sub-carriers

Fig. 4.9: Sub-carrier Correlation: The correlation between all sub-carriers and the ground truth (GT) for a subject breathing normally. Matrix rows are ordered by their variance, with the highest value at the top. Sub-carriers with higher variance show a better correlation with GT in general.

While previous work exists on selecting the informative sub-carriers [5, 98], we base our sub-carrier selection algorithm on an observation that sub-carriers with high variance are more representative of the actual breathing pattern.

Figure 4.9 shows the correlation matrix of all 30 sub-carriers (each scaled to [0-1]) sorted by their variance. We include the reference signal ("GT") in the calculation. We observe that: 1) sub-carrier signals with high variance correlate strongly with GT; 2) the top sub-carriers correlate well with each other. Like previous vital sign monitoring works [5], our dataset was collected mostly in a lab environment where the interference from secondary users is transient. Thus, in situations where interference is more pronounced, advanced selection techniques could be considered. Based on these observations, we selected the sub-carrier that had the highest variance and met certain criteria, and then selected all the other sub-carriers with which it correlated well. These selected sub-carriers were later fused by the waveform estimation module. The selection procedure:

• We sort the sub-carriers in descending order based on their variance.

• For each of the top 20 sub-carriers, we calculate a correlation score as the average correlation between it and every other sub-carrier in this list.

• We pick the sub-carrier with the highest correlation score as the reference if its number of peaks is close to the median number of peaks of all top sub-carriers (within one standard deviation). Otherwise, it is discarded and the next candidate is considered. The rationale is that a sub-carrier's high variance does not necessarily reflect its sensitivity level: a noisy sub-carrier with high variance will have a much higher number of peaks than the majority of other sub-carriers due to random fluctuations.

• The process continues until a suitable reference sub-carrier is found.

• From all sub-carriers, we add the sub-carriers that are strongly correlated with the reference one (absolute correlation ≥ 0.65) to the list.
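The steps above can be sketched as follows. `count_peaks` is a deliberately crude stand-in for a proper peak detector, and the synthetic streams (scaled copies of one breathing wave plus a pure-noise stream) are illustrative.

```python
import numpy as np

def count_peaks(x):
    """Crude peak counter: falling sign changes of the first difference."""
    d = np.diff(x)
    return int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))

def select_subcarriers(streams, top=20, corr_thresh=0.65):
    """Pick a reference sub-carrier by correlation score, sanity-checked by
    peak count, then gather all sub-carriers correlated with it."""
    top = min(top, len(streams))
    cand = np.argsort(np.var(streams, axis=1))[::-1][:top]   # by variance, desc
    corr = np.corrcoef(streams[cand])
    scores = (np.sum(np.abs(corr), axis=0) - 1) / (len(cand) - 1)
    peaks = np.array([count_peaks(streams[i]) for i in cand])
    med, sd = np.median(peaks), np.std(peaks)
    for j in np.argsort(scores)[::-1]:                       # best score first
        if abs(peaks[j] - med) < max(sd, 1.0):               # reject peaky noise
            ref = cand[j]
            c = np.array([np.corrcoef(streams[ref], s)[0, 1] for s in streams])
            return ref, np.where(np.abs(c) >= corr_thresh)[0]
    return None, np.array([], dtype=int)

rng = np.random.default_rng(2)
t = np.linspace(0, 60, 1200)
breath = np.sin(2 * np.pi * 0.25 * t)
streams = np.vstack([2.0 * breath,                       # strong, clean copy
                     0.8 * breath,                       # weaker copy
                     -1.1 * breath,                      # out-of-phase copy
                     0.5 * rng.standard_normal(t.shape)])  # pure noise
ref, selected = select_subcarriers(streams, top=4)
```

The noise stream is excluded both by its low correlation score and by its implausibly high peak count, while all three breathing-modulated streams (including the inverted one, via the absolute correlation) are retained.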

Alignment

Some sub-carriers in the selected subset will have an opposite phase to the reference sub-carrier. We align all sub-carriers with the reference by inverting the sub-carriers that have a negative correlation with the reference signal.
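A minimal sketch of this alignment step:

```python
import numpy as np

def align(streams, ref):
    """Invert every stream that is negatively correlated with the reference."""
    out = np.array(streams, dtype=float)
    for i in range(len(out)):
        if np.corrcoef(out[i], ref)[0, 1] < 0:
            out[i] = -out[i]
    return out

t = np.linspace(0, 10, 200)
ref = np.sin(2 * np.pi * 0.25 * t)
streams = np.vstack([0.7 * ref, -1.2 * ref])   # second stream is out of phase
aligned = align(streams, ref)
```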

Estimating Breathing Waveform

The input to this step is a number of PD streams (time series), each corresponding to a selected, de-noised and aligned sub-carrier. The black curves in Figure 4.10 show an example. In this step, we 'average' these data streams to compute the relative displacement via linear estimation.

Fig. 4.10: Breathing Waveform Estimation. (a) Selected sub-carriers; (b) estimated waveform compared to the reference; (c)-(g) PD of the selected sub-carriers at 0.25 s, 0.75 s, 1.25 s, 1.75 s and 2.25 s, with the fitted regression line.

Linear Estimation. The rationale behind the linear estimation is our earlier observation that the PD and the chest displacement are linearly related. The idea is that, at each time instance, we perform a linear regression with the PD of the sub-carriers as the dependent variable and the inverse wavelengths of the sub-carriers, 1/λ_i, as the regressors. To illustrate this idea, we selected five time instances, indicated by the red dotted lines in Figure 4.10a. At each time instance, we plot the PD of the selected sub-carriers against the sub-carrier index in sub-figures (c)-(g) of Figure 4.10. These sub-figures also show the fitted line in red. We can see from these figures that the trend is almost linear but the noise level is fairly high. By repeating this process for all time instances, we arrive at the estimated relative displacement. Figure 4.10b compares the estimated relative displacement (with calibration information utilised) to the ground truth. To counter any error that might result from fusing noisy sub-carriers, the regression residual error is continually monitored for every selected sub-carrier, along with the overall median. When the residual error of a specific sub-carrier exceeds the median by more than 1.5 standard deviations, it is excluded and the regression-based estimation is repeated to produce a refined estimate.
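The per-time-instance regression with residual-based refinement can be sketched as follows; the sub-carrier frequencies, the value of S and the simulated displacement are illustrative assumptions.

```python
import numpy as np

def estimate_waveform(pd, wavelengths):
    """Per-time-instant regression of PD/(2*pi) on 1/lambda_i, with one
    refinement pass dropping sub-carriers whose residuals exceed the
    median by more than 1.5 standard deviations."""
    x = 1.0 / np.asarray(wavelengths)
    slopes = np.empty(pd.shape[1])
    for ti in range(pd.shape[1]):
        y = pd[:, ti] / (2 * np.pi)
        slope, intercept = np.polyfit(x, y, 1)
        resid = np.abs(y - (slope * x + intercept))
        keep = resid <= np.median(resid) + 1.5 * np.std(resid)
        if 2 <= keep.sum() < len(x):
            slope, _ = np.polyfit(x[keep], y[keep], 1)
        slopes[ti] = slope
    return slopes

# Synthetic check: hypothetical sub-carriers, assumed S, simulated displacement
c = 3e8
lam = c / (5.32e9 + 312.5e3 * np.arange(30))
S = 0.05
t = np.linspace(0, 6, 120)
ell = 1e-3 * np.sin(2 * np.pi * 0.25 * t)          # chest displacement (m)
pd = 2 * np.pi * S * np.outer(1.0 / lam, ell)      # ideal PD streams
pd[7] += 0.5                                       # one persistently biased sub-carrier
wave = estimate_waveform(pd, lam)
```

The biased sub-carrier produces a residual far above the median at every time instant, so the refinement pass drops it and the recovered slope sequence matches Sℓ(t).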

Since the λ_i in Equation (4.5) are (approximately) linearly spaced across sub-carriers, one can expect the measured PD_i values for the different sub-carriers to fit a line. For a specific chest displacement, the PD_i values increase or decrease linearly across sequential sub-carriers, which manifests as a change in the slope of that line. Figures 4.10c–4.10g show how the PD changes over five consecutive points in time within the breathing cycle. It can be seen that the displacement inflicts a proportional PD change throughout the whole set of sub-carriers, observable as a continuous slope change. It is worth mentioning that, in reality, a perfect linear fit is unlikely. One way to address this issue is to use the regression error to filter out noisy sub-carriers whose regression error exceeds a threshold, thereby refining the linear fit. Finally, the data from calibration is used to transform the amplitude and phase of the output into the final estimated waveform (see Figure 4.10b).

Figure 4.11a shows an estimated waveform during a normal breathing session and the corresponding reference waveform. In this example, WiRelax achieves a timing error of 0.21 seconds (12.2% relative timing error). The IER is another metric commonly used in paced breathing exercises, and WiRelax estimates it with an error of 14.4%. Visually, we can observe the similarity between the estimated and the reference waveforms across the five-minute segment

Fig. 4.11: WiRelax breath cycle tracking. (a) Estimated relative displacement waveform compared to the reference displacement waveform; peak and valley positions are visualised at the top and bottom, respectively, and slanting lines signify the deviation direction [as shown in the closeup (b)]. (c) Breath tracking accuracy for various metrics; the red lines denote the medians: 0.21 seconds, 12.2% and 14.4% for cycle timing error, relative timing error and IER error, respectively. The correlation and root mean squared error between the estimated waveform and the reference signal are also reported.

Fig. 4.12: Estimated waveforms for various breathing patterns. (a), (c) and (e) show the estimated waveforms for deep breathing (rate 8 bpm, chest displacement variance 6.2 mm), deep & normal breathing (rate 21 bpm, variance 4.4 mm) and quick breathing (rate 45 bpm, variance 1.07 mm) sessions, respectively. (b), (d) and (f) show scatter plots of the estimated relative displacement in relation to the true displacement.

and this is confirmed by the high correlation (0.88) and the low RMSE (0.12) between the reconstructed waveform and the reference signal. Figure 4.12 shows the correlation between the estimated and the reference waveforms for various respiratory pattern examples.

The estimation algorithm works for various breathing patterns. Figure 4.12 depicts the stability of the estimated breathing depth. In particular, sub-figures (b), (d) and (f) of Figure 4.12 show scatter plots of the estimated relative displacement against the true displacement. It can readily be seen that they are highly correlated, showing that WiRelax is able to compute the relative displacement accurately.

4.3 Evaluation

4.3.1 Goals, Metrics and Methodology

We show that WiRelax provides real-time full-cycle respiration feedback during meditation practice. For this purpose, we consider a meditation space, also called a "Quiet Room". Similar rooms are available to employees in work environments [104]. The subject would place her portable WiFi-enabled devices in front of her, pick a specific exercise (or manually set the inhalation/exhalation times), and start practising. Feedback is provided during the session. Currently, commodity smartphones' wireless cards do not export CSI data. Instead, we employ a pair of laptops as the communicating devices. Typically, a subject would need the devices close enough to be able to observe the real-time feedback. Nevertheless, in our experiments, we vary the distance in the range of 1-3 metres to evaluate the performance of WiRelax for different room sizes.

Feedback: Although we leave the feature-rich interactive user interface design as future work, we designed an initial prototype for providing in-session feedback (Figure 4.13). After experimenting with a few designs, we found that the circular segmented (radial donut) user interface component is generally preferred by the subjects (see the top left in Figure 4.13). We process incoming packets every 0.1 seconds to simulate real-time input. WiRelax processing was implemented in Python 3 on a MacBook Pro running macOS Sierra v10.12.6 with 8 GB RAM and a 2.7 GHz Intel Core i5 processor. The NumPy and scikit-learn libraries were used for processing data and rendering the output. With this configuration, it takes WiRelax 180 milliseconds on average to process one second of input data and produce the estimated waveform.

Data Collection: We collected data from several "quiet rooms" to evaluate WiRelax. The WiFi transmitter and receiver are HP EliteBook 6930p laptops equipped with Intel 5300 WiFi cards. These laptops are placed on a desk and collect the WiFi CSI data using the Linux 802.11n CSI Tool [105]. The ground truth for chest displacement was collected by an X2-M200 UWB-IR sensor [106]. This sensor has been employed in a variety of vital sign monitoring applications [106] and reportedly has a maximum deviation of 5% compared to PSG reference airflow and thorax/abdomen displacement measurements [107]. The sensor reports chest displacement in millimetres at a 20 Hz sampling rate. Ten volunteers (8 males and 2 females) participated in the data collection process over a total period of 4 months.3 For each user, we ran several experiments to evaluate the impact of the distance between the user and the sensor on the performance of the WiRelax algorithm. We select the key signal processing parameters using leave-one-out cross-validation (LOOCV), in which data from a single user is used for testing the parameters that were determined using data from all other users.

Metrics: We use the same quantitative metrics as previous work, including the real-time cycle timing error and relative timing error (also called the progress time error) metrics used by [62], and the normalised amplitude Root Mean Squared Error (RMSE) and the correlation between normalised waveforms used by [108].
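The amplitude metrics can be sketched as follows on min-max normalised waveforms; the normalisation choice here is an assumption for illustration, and the cited works define the exact procedure.

```python
import numpy as np

def normalise(x):
    """Min-max normalisation to [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

def waveform_metrics(estimated, reference):
    """Correlation and RMSE between normalised waveforms."""
    e, r = normalise(estimated), normalise(reference)
    corr = np.corrcoef(e, r)[0, 1]
    rmse = np.sqrt(np.mean((e - r) ** 2))
    return corr, rmse

t = np.linspace(0, 30, 600)
est = 0.4 * np.sin(2 * np.pi * 0.2 * t) + 0.1    # arbitrary scale and offset
ref = 3.0 * np.sin(2 * np.pi * 0.2 * t)          # reference displacement (mm)
corr, rmse = waveform_metrics(est, ref)
```

Normalising both waveforms makes the comparison insensitive to the unknown proportionality constant between the relative displacement and the true displacement.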

4.3.2 Capturing Breathing Cycles

In this section we evaluate the performance of WiRelax in terms of the accuracy of breathing cycle time and amplitude estimation. We also study the impact

3Ethical approval was granted by the University of New South Wales (Approval Number HC17823).


Fig. 4.13: WiRelax biofeedback prototype (demo: [1])

of the distance parameter h defined in Section 4.2.2. Figure 4.14 shows the accuracy of the proposed system for various timing- and amplitude-related metrics across the three distances considered.

Figure 4.14c plots the breathing rate error. The system estimates the rate accurately, with a median error of 0.15 bpm at a distance of 1 metre, which increases slightly to 0.2 bpm at larger distances. Inhalation and exhalation cycles are also captured accurately (see Figure 4.14a), with median errors of 0.24 s, 0.23 s and 0.32 s for 1, 2 and 3 metres, respectively. We note that an error in the cycle time estimation has a direct impact on the next cycle: for example, a small positive error of +0.1 second in the inhalation cycle will introduce an error of -0.1 second in the following exhalation cycle.

The relative timing error is a more descriptive measure of the timing accuracy because the cycle length is taken into account. WiRelax achieves a median relative timing error of 12.9% at a 1 metre distance. Intuitively, the timing errors observed in WiRelax are related to the number and the quality of

Fig. 4.14: WiRelax Evaluation. Evaluation of different breathing accuracy metrics with respect to the distance between the user and the WiRelax system: (a) timing error, (b) relative timing error, (c) breathing rate error, (d) amplitude correlation, (e) amplitude RMSE.

representative sub-carrier signals measured by the WiFi hardware. The signal-to-noise ratio of these signals decreases with increasing distance between the test subject and the measurement apparatus, which in turn lowers the accuracy.

Amplitude-related metrics are shown in Figures 4.14d and 4.14e. As with the timing metrics, the amplitude estimation performance degrades with increasing distance, albeit at a higher rate. There are two reasons for this. First, the timing-based metrics are only impacted by the cycle start/end positions, while the amplitude is impacted by the whole PD time series that dictates the waveform shape. Second, misalignment caused by the timing errors further amplifies the amplitude errors.

4.3.3 Capturing Complete Breathing Pattern

While WiRelax largely aims to provide accurate in-session feedback to the user, it is informative to put WiRelax in context with state-of-the-art WiFi-based breathing monitoring systems. As explained by [109], radar-based respiratory monitoring systems [23] are not directly comparable to WiFi-based systems such as WiRelax, due to differences in measurement type and quality. We focus on an amplitude-based system [5] and two phase difference-based systems, [6] and [7]. To make an objective comparison, we consider only the breathing rate for [7] and the complete cycle length metric for [5], while the complete waveform (timing and waveform amplitude) is compared to PhaseBeat [6]. A brief description of these systems, along with replication considerations, is presented below:

• Liu et al. [5]: Amplitude measurements from 30 sub-carriers are used. Data calibration is applied to mitigate the noise in the raw data. Then, the most sensitive sub-carriers (blue curve area in Figure 4.15a) are selected based on their variance. Next, peak locations are calculated, with fake-peak removal considered in the process. Finally, a weighted average of all peak-to-peak intervals across the selected sub-carriers is employed to get the breathing rate.

• PhaseBeat [6]: PD data from 30 sub-carriers at a 400 Hz sampling rate are used as input. The DC component and high-frequency noise are removed through calibration. Next, the top 3 sub-carriers with the maximum mean absolute deviations are picked; of these, the median one is the final selection.

• TensorBeat [7]: PD data from 60 sub-carriers (two antenna pairs) at a 20 Hz sampling rate are used as input. After calibration, a two-dimensional Hankel matrix is constructed from 600 consecutive packets of each sub-carrier. The 60 Hankel matrices are stacked into a 3-dimensional tensor. Since we have a single user, the tensor's rank was fixed to 2. Next, tensor decomposition is applied using CP decomposition [110] and autocorrelation is calculated on the fusion of the decomposed signal pairs. Finally, the rate is reported based on the average inter-peak duration.
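The Hankel-matrix construction at the heart of the tensorization step can be sketched as follows (the 30-row choice is illustrative; TensorBeat's exact Hankel dimensions may differ).

```python
import numpy as np

def hankelize(stream, rows):
    """Build a 2-D Hankel matrix from a 1-D stream: entry (i, j) holds
    stream[i + j], so every anti-diagonal is constant."""
    cols = len(stream) - rows + 1
    return np.array([stream[i:i + cols] for i in range(rows)])

stream = np.arange(600)            # 600 consecutive packets, as in TensorBeat
H = hankelize(stream, rows=30)     # one Hankel matrix per sub-carrier
# Stacking 60 such matrices yields the 3-D tensor fed to CP decomposition.
```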

We show the performance of the different algorithms based on an experiment with one subject. Over a period of 30 seconds, the subject was asked to perform deep breathing for 10 seconds, followed by normal breathing for 10 seconds and finally deep breathing for 10 seconds. The purpose is to create a breathing pattern whose amplitude varies over time. The true breathing pattern captured by the UWB sensor is plotted as a gray line in Figures 4.15a–4.15c.

The WiFi CSI data was processed by the algorithm of Liu et al., as well as by PhaseBeat, WiRelax and TensorBeat. The results from the first three algorithms

Fig. 4.15: Breathing cycle estimations from three algorithms: (a) Liu et al. [5], (b) PhaseBeat [6] and (c) WiRelax. The gray line shows the ground truth for chest displacement. The solid gray (red) circles show the true (estimated) peaks. The ticks show the time difference between the estimated and true peaks (i.e. the difference in the timings of the red and gray circles). (a) Liu et al. [5] estimates the cycle period using all peaks (small red circles) of the selected sub-carriers; the weighted average of all sub-carriers' cycle times determines the final breathing cycle boundaries (marked by large red circles). (b) PhaseBeat [6] employs the inter-peak duration of a chosen single sub-carrier to estimate the breathing rate. (c) WiRelax employs a cohort of selected sub-carriers to estimate the final breathing waveform.

Fig. 4.16: TensorBeat [7] processing for the same data as Fig. 4.15: (a) tensorization, (b) decomposition, (c) autocorrelation.

are shown in Figures 4.15a–4.15c, while those for TensorBeat are in Figure 4.16.

We first consider Figures 4.15a–4.15c. In these figures, the solid gray and red circles show the timing of the true and estimated peaks, respectively. The gray ticks near the top of each subplot show the deviation between the timing of the true and estimated peaks. Therefore, a wider tick indicates a larger timing error and vice versa.

For waveform estimation, it can readily be seen that WiRelax has the smallest timing error: 0.21 s, compared to 0.41 s for Liu et al. [5] and 0.266 s for PhaseBeat. In addition, WiRelax captures the relative chest displacement amplitude much more accurately than PhaseBeat (WiRelax's normalised RMSE is 0.1081 compared to 0.2716 for PhaseBeat). TensorBeat estimates the breathing rate from the median peak-to-peak distance (see Figure 4.16c). The fused breathing signal (Figure 4.16b) does not match the actual breathing pattern: while a noticeable decrease in breathing depth can be seen in Figure 4.16b, the state transition (deep to shallow to deep again) is missing.

4.4 Related Work

In this section we extend our earlier treatment of related works (see Section 2.3.2) by elaborating on the related WiFi breathing monitoring research.

The vast majority of earlier works focus on breathing rate. UbiBreathe [55] is one of the earliest efforts that employed WiFi radio signal strength (RSS) to monitor breathing rates. mmVital uses the RSS of a 60 GHz millimetre-wave signal for breathing and heart rate monitoring. Liu et al. [5] and PhaseBeat [6] leveraged WiFi channel state information (CSI) to monitor breathing rate and, with directional antennas, heart rate as well. A number of amplitude-based respiration sensing systems [98, 100] based on the Fresnel


Zone Model were developed. The model relates one's chest movement to the amplitude measurements of the received WiFi signals. As the chest displacement crosses Fresnel zones, the received signal shows a continuous sinusoidal-like wave, with peaks and valleys generated by crossing the zone boundaries. This allows for breathing pattern extraction. Our model, on the other hand, is based on PD measurements and goes beyond pattern capturing to explain how breathing depth impacts the captured PD measurements. Recently, a few works have demonstrated the possibility of improving the detection range of WiFi breathing monitoring. FullBreath [109] uses phase and amplitude measurements simultaneously and employs conjugate multiplication of two antenna measurements to improve breathing detectability. FarSense [111] proposed the "CSI ratio" to push the respiration sensing range to house level (up to 8-9 metres). The evaluation of these two systems is limited to respiration rate tracking accuracy.

The main distinction between our work and related device-free WiFi breathing monitoring systems is the focus on extracting breathing biofeedback information. For this, our system is designed to address the requirement of reporting instantaneous breathing depth and timing while the breathing cycle is still ongoing. The ability of the proposed model to regress directly on incoming PD measurements enables such tracking without having to wait for cycle completion. This, in turn, enables WiFi-based interactive breathing applications.

4.5 Limitations and Future Work

Advances in device-free sensing using COTS WiFi have recently sparked renewed interest in fine-grained sensing applications, including vital sign monitoring. In this study, we focused on reliable and detailed monitoring of the breathing cycle and its characteristics. Our findings suggest that COTS WiFi can be effectively employed for conscious breathing monitoring and real-time feedback.

However, WiRelax has its limitations. First, the CSI data is sensitive to multipath interference specific to the environmental setup. We acknowledge that WiRelax performance might be impacted by the movement of nearby people. This environmental limitation is fundamental to WiFi sensing systems and presents an important challenge for future research [112]. Mitigating the impact of irrelevant motions can be addressed by employing a "Breath Model" [62] for the users. Breathing patterns do not change dramatically between consecutive cycles, which allows us to suppress noise in the breathing cycle estimation by fusing information from a previous (noise-free) reference cycle.

Second, we adopted a preliminary visual feedback scheme in the WiRelax prototype to demonstrate the real-time breath-by-breath monitoring capability. The subject can observe his/her instantaneous behaviour and adapt accordingly. However, a further user-centric study might be needed to improve its responsiveness and make it more intuitive. For example, studies show that auditory feedback is superior to visual feedback in creating self-reported calm [113]. Another important form of feedback that our system can provide is a summarised per-session breathing performance report, which is vital for long-term tracking of breathing habits. Our future work will investigate visual, auditory, and summary features to further improve WiRelax's capabilities.

4.6 Conclusions

To enable interactive breath control applications, this chapter proposed contactless sensing of instantaneous breathing dynamics using WiFi channel state information (CSI). To date, no previous research has sought to track ongoing breath cycles using WiFi. In contrast to complete cycle tracking (breathing rate), ongoing cycle progress reporting is complicated by the need to map divergent measurements from noisy sub-carriers into a single instantaneous breathing state (time and depth) while maintaining the sub-second responsiveness (i.e. an instant sensing and feedback loop) necessary for biofeedback. Our approach to the problem is guided by observations about the stability of the PD. Our study of the impact of micro-motions on the PD measurements culminated in a model showing the linear relation between the two quantities. Interestingly, this enables mapping the instantaneously measured PD to the chest displacement (up to a relative constant). Consequently, inferring breathing progress without waiting for cycle completion is theoretically feasible. To obtain a reliable estimation in practice, a novel sub-carrier filtering and selection scheme was adopted, and robustness was further enforced by leveraging a simple per-session calibration procedure. WiRelax is able to report sub-second breathing progress in real time with a median timing error of 0.25 seconds. The system requires neither training nor information about subject identity and is ready to deploy on off-the-shelf WiFi devices. This opens the door to employing WiFi sensing in biofeedback applications at large scale.

This work has resulted in the following publication:

1. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu, "WiRelax: Towards Real-time Respiratory Biofeedback During Meditation Using WiFi", Elsevier Ad Hoc Networks Journal, in press, accepted in May 2020.

Chapter 5

Heartbeat Estimation

Ubiquitous health monitoring has witnessed a surge of interest in the past few years. Current heart rate monitoring solutions mostly employ wearable devices attached to the user's body. The alternative, device-free heart rate monitoring, offers improvements in comfort and ease of use, and does not require close cooperation from the subject; these are important aspects, especially in the health care context. In this chapter we explore the potential of employing in-home WiFi devices for heart rate monitoring.

Commodity WiFi devices have recently been used for contact-free monitoring of vital signs, such as heartbeat and respiration [6, 8]. While demonstrating promising performance for respiration monitoring, commodity WiFi technology requires directional antennas to achieve accurate heart rate monitoring. The key observation is that directional antennas help to substantially reduce multipath effects in complex real-world environments that render signals obtained with omnidirectional antennas difficult to analyse. In addition to respiration and heartbeat monitoring, directional WiFi antennas have enabled new sensing modalities, such as lip reading [114], WiFi imaging [115], and gesture recognition [116].


The key question we ask in this chapter is: is it possible to accurately monitor the heartbeat of a person using ubiquitous consumer RF devices that are not attached to the person? We answer this in the affirmative by designing, implementing, and validating a system comprised of Commercial Off-The-Shelf (COTS) WiFi devices that tracks the instantaneous heartbeat of a person with a median accuracy of 1.1 beats per minute.

Unlike previous research, we consider a specific scenario with two unmodified WiFi devices communicating near a user. The two devices can be, for example, a laptop and a portable device, both using built-in omni-directional antennas, and one of the devices collects channel state information (CSI) data associated with WiFi transmissions. Our key observation is that despite the substantial noise present in the CSI signals, the impact of the noise on individual sub-carriers is frequency dependent. Specifically, several sub-carriers typically contain strong heartbeat-related frequency components and are sufficient for making an accurate heartbeat estimation.

Based on our observation, we implement filtering and sub-carrier selection algorithms that rank the sub-carriers according to their heartbeat information content. Related work has primarily adopted the variance-based selection method that discards sub-carriers with low variance [6, 8, 98] or ranked sub-carriers according to their periodicity [117]. While these approaches tend to work well when the motion level is sufficiently strong to modulate the amplitude of the received signal (such as breathing or walking), they fail when the motion is weak and easily dominated by other environmental RF noise.

Through experiments, we show that existing methods yield noisy frequency curves and significant energy peaks at frequencies distributed across the whole heartbeat frequency spectrum. Our sub-carrier selection method takes advantage of the spectral history and chooses those sub-carriers with a consistent dominant frequency component over a specified time window.

We make the following contributions in this chapter:

• We demonstrate the feasibility of contact-less extraction of the heart rate from Channel State Information (CSI) of COTS WiFi devices without extra hardware such as directional antennas. This is, to the best of our knowledge, the first system that does not rely on bulky directional antennas. The proposed approach estimates the heart rates with a median error comparable to directional antennas for user-to-apparatus distances of up to 2 meters and shows an improvement of 40% compared to previous work [8].

• We propose a novel sub-carrier ranking scheme based on Spectral Stability, which is capable of selecting informative sub-carriers in situations where the majority of the sub-carriers are noisy. The scheme leverages the known frequency range of the heart beating and is computationally efficient, which makes it suitable for instantaneous heart rate estimation. The proposed scheme reduces the median error by 19% compared to previous sub-carrier selection approaches based on variance.

The rest of this chapter is organized as follows. We start by motivating the problem of WiFi-based contact-free heart rate estimation in the absence of bulky directional antennas in Section 5.1, where we also highlight the challenges and our observations for making the heart rate estimation possible. Next, in Section 5.2, we present the proposed CardioFi system architecture. Then, in Section 5.3, we evaluate the system's performance and study the impact of different parameters, which is followed by an overview of related works in Section 5.4 and a discussion of limitations in Section 5.5. Finally, we conclude in Section 5.6.

5.1 Motivation

The state of a wireless channel is affected by the movement of people and objects in the transmission medium. For WiFi devices, these movements induce changes in the CSI of different sub-carriers. The key idea behind WiFi sensing is to use the CSI to infer the movements that have caused the changes in CSI.

By using this inference, researchers have been able to successfully perform location tracking [118], gesture recognition [119], breathing rate estimation [7, 120], gait recognition [121] and many other applications.

Our aim is to use COTS WiFi devices to estimate heart rates. Although the estimation of the breathing rate from CSI data has already been demonstrated [7, 120], it is challenging to estimate the heart rate from CSI data because the heart movement is significantly smaller in magnitude than that of the lungs. The chest movement due to breathing is approximately 4-12 mm [122] and causes a periodic variation in the CSI time series which is easily discernible by the naked eye. However, the chest movement due to the heart is an order of magnitude smaller, at approximately 0.2-0.5 mm [123], and can only induce a small change in CSI. The mixing of the movements due to breathing and the heart can cause another problem: since breathing movement is lower in frequency than that of the heart, the higher harmonics due to breathing can interfere with the signal due to the heart [6, 124].

In order to overcome these challenges, Liu et al. employed frequency analysis to isolate the heart rate frequency band from that of breathing, and used a directional antenna to boost the signal's quality [8]. Figure 8(b) in [8] shows the results. The thin magenta lines in that figure show the Power Spectral Density (PSD) of the amplitude of the CSI time series from different sub-carriers after filtering out the frequency range in which the heart rates are



unlikely to be found. The thick blue line shows the mean PSD of all the sub-carriers, which shows a distinct peak at a frequency very close to the actual heart rate indicated by the black dashed lines.

Fig. 5.1: Power Spectral Density (PSD) curves for CSI data collected using omni-directional antennas. The low SNR makes heart rate estimation a challenge.

Figure 5.1 demonstrates the challenges of applying the method in [8] to the CSI values of all 30 sub-carriers obtained from conventional WiFi devices that use on-board omnidirectional antennas. Here, we used a pair of laptops separated by a distance of 1 m. A user was sitting at approximately 0.5 metres from the laptops. We directly applied the same method as in [8] to process the data. The thin magenta lines in Figure 5.1 show the PSD of the sub-carriers and they are significantly noisier than those in Figure 8(b) in [8]. The thick blue line shows the mean PSD. The heart rate is then estimated from the peak PSD as in [8] to be 68 bpm, which is significantly different from the ground truth value of 77 bpm. The large error can be explained by the fact that only 7 sub-carriers out of 30 are within ±2 bpm of the actual heart rate. On the other hand, the proposed CardioFi, which will be introduced and discussed in later sections, produced an estimation of 76 bpm (the red line in the figure) that is very close to the ground truth.

Although the CSI data in Figure 5.1 is noisy, we can see that the signals contain useful information as the PSD of a number of sub-carriers has a peak close to the actual heart rate. The top circular ticks in Figure 5.1 depict the positions of the sub-carriers' PSD peaks. The opacity of a tick shows the number of sub-carriers with the peak at that position. Anecdotally, six sub-carriers out of thirty differ from the actual heart rate by only 0.9 bpm.

We show another example in Figure 5.2, where we track the PSD for each sub-carrier over time. We calculate the peak of the PSD curve for each sub-carrier and use the peaks as heart rate estimates. The lightly shaded area shows the heart rate estimate bounds defined by the minimum and maximum PSD peaks among all sub-carriers. Similarly, the dark shaded area gives a heart rate estimate bounded by the 10th and 90th percentiles of the individual sub-carrier estimates. Although the range is fairly wide, note that the range still includes the actual heart rate shown by the black line, which was measured with an external heart rate monitor. We conclude that despite the significant increase in RF noise due to omnidirectional antennas, CSI data contains useful heart rate information. In the next section, we will present a suite of algorithms that CardioFi uses to filter out the noise and substantially improve the accuracy of heart rate estimation from CSI data collected with omnidirectional antennas.


Fig. 5.2: Heart rate estimated from sub-carriers. Actual heart rate compared to the estimation produced from individual sub-carriers. At each point in time, different sub-carriers produce estimations that vary largely (illustrated by the red shaded areas whose boundaries represent the maximum and minimum estimation at that point). Even after discarding extreme estimates (darker area), the range remains large.

5.2 CardioFi

Figure 5.3 shows the architecture of CardioFi, which consists of two WiFi devices. One device acts as the transmitter and the other as the receiver; the receiver is assumed to have multiple antennas, which is common in WiFi devices nowadays. Figure 5.3 depicts the transmitter and the receiver as an access point and a laptop, respectively. The transmitter sends packets to the receiver at a regular interval. If a subject is in the vicinity of the devices, then this subject's heartbeats, as well as other movements, will modulate the wireless signals arriving at the receiver. Therefore, the CSI contains the information of the heart rate of the subject.

The receiver continuously records the CSI of the received packets in different sub-carriers. The CSI data is used to compute the phase difference (PD) between the antenna pairs. In the first stage, CardioFi preprocesses the PD data using outlier removal and noise filtering algorithms. The second stage constitutes a key technical contribution of this chapter, where we use novel algorithms to score and select the sub-carriers that are the most informative to heart rate. In the last stage, data fusion is used to fuse information from the top scored sub-carriers to produce the final heart rate estimate.

Fig. 5.3: The architecture of CardioFi.

5.2.1 Background

The physical layer of the latest WiFi standard is based on the Orthogonal Frequency Division Multiplexing (OFDM) modulation technique. OFDM uses a number of orthogonal sub-carriers and transmits data independently on each of the sub-carriers. Assuming there are m transmit antennas and n receive antennas, the CSI of all data streams can be expressed as:


\[
\begin{pmatrix}
H_{1,1} & H_{1,2} & \cdots & H_{1,n} \\
H_{2,1} & H_{2,2} & \cdots & H_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
H_{m,1} & H_{m,2} & \cdots & H_{m,n}
\end{pmatrix} \tag{5.1}
\]

where H_{i,j} is the CSI vector between the i-th transmit antenna and the j-th receive antenna. Commodity WiFi cards make the CSI of some of the sub-carriers available, e.g. Intel 5300 WiFi cards can export the CSI of 30 sub-carriers. Let C denote the number of sub-carriers whose CSI is available; then each H_{i,j} is a vector with C elements. We use H to denote a generic H_{i,j} and we write:

\[
H = [h_1, h_2, \cdots, h_C] \tag{5.2}
\]

where h_i denotes the CSI for sub-carrier i in the CSI vector H. The CSI h_i is a complex number which combines the effect of attenuation, reflection and scattering of the radio sub-carriers when they propagate from the transmitter antenna to the receiver antenna. The CSI can be expressed in terms of the multi-path attributes. Assuming that the radio propagation in h_i (whose frequency is f_i) is the combined effect of P multi-paths, where the attenuation and propagation delay on the k-th path (where k = 1, ..., P) are denoted by p_k and τ_k, respectively, then we can write:

\[
h_i = \sum_{k=1}^{P} p_k e^{-j 2\pi f_i \tau_k} \tag{5.3}
\]

In our context, some of the multi-paths may be modulated by the movement of the heart of a subject, see Figure 5.3, and these multi-paths provide infor- mation on the heart rate of the subject.

Instead of using the CSI h_i, one may also use the magnitude and the phase of h_i to infer the information on the heart rate. However, as explained earlier in Chapter 4, the challenge of using the phase of h_i obtained from COTS WiFi devices is that there is an unknown random offset in the phase measurements which varies from packet to packet. Fortunately, this unknown offset is the same for multiple receiving antennas for a given packet [7]. This is because the antennas are on the same Network Interface Card (NIC) and hence they use the same system clock and the same down-converter frequency. Therefore, it is possible to remove this unknown offset by subtracting the phase measurements of two receiving antennas. Hence, CardioFi uses the PD as the raw data for the heart rate estimation.
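The phase-difference extraction described above can be sketched as follows; this is a minimal illustration assuming the CSI of two receive antennas is available as complex NumPy arrays (the function name and array layout are our own):

```python
import numpy as np

def phase_difference(csi_ant1: np.ndarray, csi_ant2: np.ndarray) -> np.ndarray:
    """Per-sub-carrier phase difference between two receive antennas.

    csi_ant1, csi_ant2: complex arrays of shape (packets, C) holding the
    CSI h_i of every sub-carrier for each received packet. Since both
    antennas sit on the same NIC and share one clock, the unknown
    per-packet phase offset cancels in the difference.
    """
    # The angle of the conjugate product equals the wrapped phase difference.
    return np.angle(csi_ant1 * np.conj(csi_ant2))
```

Working on the conjugate product rather than subtracting two `np.angle` calls avoids spurious ±2π jumps at the phase-wrapping boundary.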

5.2.2 Preprocessing

The preprocessing stage consists of three sub-steps: outlier removal, de-trending and de-noising, see Figure 5.3. The input to the preprocessing block is the PD of all sub-carriers and the preprocessing is done for each individual sub-carrier independently. The removal of outliers is performed by a Hampel filter. We begin by computing the median μ and the mean absolute deviation σ over a time window (T1) of the PD data. Using a threshold τ1, we discard the data that lie outside the interval [μ − τ1 × σ, μ + τ1 × σ]. CardioFi chooses T1 to be 0.5 seconds and τ1 to be 0.4.¹ Then, linear interpolation is performed to maintain uniform sampling (due to interference caused by other devices in the same WiFi channel, packets received may not be evenly distributed in time).
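This outlier removal step can be sketched as below; a minimal illustration under the chapter's parameters (T1 = 0.5 s, τ1 = 0.4), where the function name and the non-overlapping-window treatment are our own assumptions:

```python
import numpy as np

def hampel_outlier_removal(x, fs, t1=0.5, tau1=0.4):
    """Discard samples outside [mu - tau1*sigma, mu + tau1*sigma], computed
    per window of T1 seconds (mu = window median, sigma = mean absolute
    deviation), then linearly interpolate to restore uniform sampling."""
    x = np.asarray(x, dtype=float)
    n = max(1, int(t1 * fs))
    keep = np.ones(len(x), dtype=bool)
    for start in range(0, len(x), n):
        seg = x[start:start + n]
        mu = np.median(seg)
        sigma = np.mean(np.abs(seg - mu))
        keep[start:start + n] = np.abs(seg - mu) <= tau1 * sigma
    idx = np.arange(len(x))
    # Linear interpolation over the discarded samples keeps the series uniform.
    return np.interp(idx, idx[keep], x[keep])
```

Note that a window of identical samples has σ = 0, in which case no sample deviates from the median and everything is kept.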

The next step is to remove the trend in the data. This step is important because we observed during our experimentation that sub-carriers could go

¹ Parameters were initially set as in [6]; experimental fine tuning was then performed so that the chosen values accurately remove outliers without over-smoothing the signal.


through unpredictable abrupt changes, see Figure 5.4a. These changes could occur when the user made sudden movements and they appeared in all sub-carriers.

Fig. 5.4: CardioFi preprocessing. (a) raw PD; (b) trend estimation; (c) after de-trending; (d) after de-noising.

Since the abrupt changes in PD can last for a short time or a long period of time, see Figure 5.4a, we propose a de-trending method called Dynamic Window (DW). The basic idea of DW is to divide the PD time series into non-overlapping windows; if de-trending has been performed correctly, we expect the variance in each window to be almost the same. This means that if the trend changes slowly, then we can use a larger window size, and vice versa.


The purpose of the DW algorithm is to determine this window size adaptively.

The DW algorithm begins by computing the average variance over c windows, where the default window size is used. Let v_w denote the variance of the w-th window. The DW algorithm computes

\[
E_v = \frac{1}{c} \sum_{w=1}^{c} v_w \tag{5.4}
\]

The next step is to determine the size of the next window so that the variance of the data in the next window is almost the same as E_v. We do this by increasing the size of the next window until either the variance of the data in the window exceeds αE_v, where α is a parameter, or the window size reaches a preset limit of l. In our experiments, we chose c = 5, l = 3 seconds and α = 1.2.
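The adaptive windowing just described can be sketched as follows; a minimal illustration using the chapter's c, α and l, while `default_sec` (the initial window size, not stated in the text) and the grow-from-small strategy are our own assumptions:

```python
import numpy as np

def dynamic_windows(x, fs, c=5, alpha=1.2, l_sec=3.0, default_sec=0.5):
    """Adaptive windowing for de-trending. First compute the reference
    variance E_v over c windows of a default size (Eq. 5.4), then grow
    each subsequent window until its variance exceeds alpha * E_v or it
    reaches the l-second limit. Returns (start, end) sample index pairs."""
    x = np.asarray(x, dtype=float)
    d = max(2, int(default_sec * fs))
    limit = max(d, int(l_sec * fs))
    # Reference variance E_v = (1/c) * sum of per-window variances.
    ev = np.mean([np.var(x[i * d:(i + 1) * d]) for i in range(c)])
    bounds, start = [], 0
    while start < len(x):
        end = start + 2  # begin with a tiny window and grow it
        while end < len(x) and end - start < limit and np.var(x[start:end]) <= alpha * ev:
            end += 1
        end = min(end, len(x))
        bounds.append((start, end))
        start = end
    return bounds
```

De-trending then subtracts the local mean inside each returned window, e.g. `np.concatenate([x[s:e] - x[s:e].mean() for s, e in dynamic_windows(x, fs)])`.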

The green dashed vertical lines in Figure 5.4b depict the dynamic windows.

The advantage of the DW algorithm is that it can capture abrupt changes in the signal accurately. We compare the accuracy of the DW algorithm against the Moving Average (MA) method with a fixed window size, see Figures 5.4c and 5.4d. DW-based de-trending clearly performs better than the MA method.

After de-trending, another Hampel filter is used to remove the noise. Figure 5.4d shows the de-noised data (the window size T2 is 0.5 seconds and τ2 is set to 0.1).²

5.2.3 Sub-carrier Selection

As discussed in Section 5.1, some sub-carriers are more informative to heart rate estimation. Here we present a method to assess the quality of estimation of each sub-carrier.

² Parameters were initially set as in [6]; experimental fine tuning was then performed.


Fig. 5.5: Correlation between the estimation error and the calculated estimation variance ps. (a) Individual sub-carriers' error rates; (b) estimation variance (1/ps); (c) error vs estimation variance (1/ps).

The sub-carrier selection step consists of two sub-steps (see Figure 5.3).

The first step is to use a bandpass filter to remove the frequency components in which the heart rate is unlikely to be found. Following [124], we retain the frequencies in the interval [max(2 × BR, 0.50), 2] Hz, where BR is the estimated breathing rate, to avoid the frequency harmonics caused by breathing. The breathing rate is estimated as the frequency component with the strongest power in the breathing range. Note that breathing rates generally range from 0.2-0.5 Hz.
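The band selection above can be sketched as a simple frequency mask; note that the 0.50 Hz floor is our reading of a garbled constant in the text and should be treated as an assumption:

```python
import numpy as np

def heart_band_mask(freqs, breathing_rate_hz, floor_hz=0.50, top_hz=2.0):
    """Boolean mask retaining frequency bins inside the heart rate band
    [max(2 * BR, floor_hz), top_hz] Hz, so that harmonics below twice the
    estimated breathing rate are excluded. floor_hz = 0.50 Hz is an
    assumed lower bound."""
    lower = max(2.0 * breathing_rate_hz, floor_hz)
    return (freqs >= lower) & (freqs <= top_hz)
```

The mask is applied to the PSD bins of each sub-carrier before peak picking, so that breathing harmonics below 2 × BR never compete with the heartbeat peak.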

The next step is to determine the quality of the sub-carriers. Earlier methods [8, 98] made use of the fact that different sub-carriers have different wavelengths and this difference can result in different sensitivity levels for the motion of interest. Hence, the mean absolute deviation of the sub-carriers' signals is used as a quality metric. Generally, the larger the deviation, the higher the sensitivity.

Fig. 5.6: Heart rate estimation from the highest scoring sub-carrier selected using Spectral-based (a) and Variance-based (b) selection methods.

In this work, we propose a novel Spectral Stability method to determine the quality of the sub-carriers. Our method is based on the observation that the true heart rate does not change rapidly over a short time duration. Therefore, we calculate the heart rate estimate multiple times over a short duration. If the estimates are consistent, then the signal is likely to be of good quality.

In order to calculate the spectral stability score at time t, we consider a time interval of length T3 and divide the interval [t − T3, t] into N sliding windows.³ Each window is wf seconds long and overlaps the following window by wf − 1 seconds. For sub-carrier i and time window n (for n = 1, ..., N), we determine the peak frequency of the PD time series as r_{i,n}. The value of r_{i,n} can be considered as one of the heart rate estimates.

³ The impact of the values of N and wf will be investigated in Section 5.3.2.

We define the Spectral Stability score ps_i of sub-carrier i as:

\[
ps_i = \frac{1}{\mathrm{variance}(r_{i,1}, \ldots, r_{i,N})} \tag{5.5}
\]
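The score can be sketched as follows; a minimal illustration where the FFT-based peak pick and the handling of a zero-variance (perfectly stable) sub-carrier are our own assumptions:

```python
import numpy as np

def spectral_stability(pd_series, fs, wf=20.0, n_windows=40):
    """Spectral Stability score ps_i = 1 / variance(r_{i,1}, ..., r_{i,N}).

    Windows are wf seconds long and slide by one second, so consecutive
    windows overlap by wf - 1 seconds. Each r_{i,n} is the peak frequency
    of the PD time series within window n."""
    w = int(wf * fs)
    hop = int(fs)  # one-second slide
    freqs = np.fft.rfftfreq(w, d=1.0 / fs)
    peaks = []
    for n in range(n_windows):
        seg = pd_series[n * hop:n * hop + w]
        if len(seg) < w:
            break
        spectrum = np.abs(np.fft.rfft(seg - np.mean(seg)))
        peaks.append(freqs[np.argmax(spectrum)])
    var = np.var(peaks)
    # A perfectly stable sub-carrier has zero variance across windows;
    # report an arbitrarily large score in that case.
    return 1.0 / var if var > 0 else np.inf
```

A sub-carrier whose dominant frequency wanders from window to window gets a large variance and hence a small score, which is exactly the behaviour Figure 5.5 correlates with estimation error.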

We now demonstrate that the spectral stability score is indeed a good indicator of the quality of the heart rate estimation. We conducted an experiment where we used the individual sub-carriers to estimate the heart rates while the true heart rates were also collected. Figure 5.5a shows the error of the estimated heart rates. It can be seen that there is a lot of variation in the estimation error, but some sub-carriers produced low estimation errors. Figure 5.5b shows the variance of the estimated heart rates, i.e. the denominator of Equation (5.5).

Figure 5.5c plots the variance of the estimated heart rates against the error in the estimated heart rates. The figure shows that there is a high correlation between the two quantities, which demonstrates that we can use the spectral stability score to assess the quality of the sub-carriers for the heart rate estimation.

Figure 5.6 further illustrates the effectiveness of the proposed sub-carrier selection scheme. Figure 5.6a shows the heart rates estimated by using the sub-carrier with the highest spectral stability score. It can be seen that the estimated heart rates (blue curve) follow the actual heart rates (black curve) closely, with a median error of 2.4 bpm. On the other hand, if we use the sub-carrier that has the largest variance, then the heart rate estimation (green curve) is poor, as shown in Figure 5.6b, with a median error of 5.2 bpm, which is more than double that in Figure 5.6a.


5.2.4 Heart Rate Estimation

After computing the spectral stability scores for the sub-carriers, the next step is to use the scores to select the good sub-carriers. We first normalise the spectral stability scores and then exclude the sub-carriers whose scores are less than 20% of the score of the best sub-carrier. We fuse the data from the most informative sub-carriers by calculating the mean PSD spectrum across the sub-carriers. The final instantaneous heart rate is then estimated as the frequency component with the largest magnitude in the mean PSD spectrum.
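The fusion step above (normalise scores, keep the top 20% band, average PSDs, pick the peak) can be sketched as follows; the function name and array layout are our own, and finite scores are assumed:

```python
import numpy as np

def fuse_heart_rate(psd_matrix, freqs, scores, keep_frac=0.2):
    """Fuse the most informative sub-carriers into one heart rate estimate:
    normalise the spectral stability scores, drop sub-carriers scoring
    below keep_frac (20%) of the best score, average the remaining PSD
    curves and return the peak frequency converted to beats per minute.

    psd_matrix: (C, F) PSD per sub-carrier; freqs: (F,) bin frequencies
    in Hz; scores: (C,) finite spectral stability scores."""
    scores = np.asarray(scores, dtype=float)
    selected = scores / scores.max() >= keep_frac
    mean_psd = np.asarray(psd_matrix)[selected].mean(axis=0)
    return 60.0 * freqs[np.argmax(mean_psd)]  # Hz -> bpm
```

Averaging the PSDs of only the high-scoring sub-carriers suppresses the spurious peaks that noisy sub-carriers would otherwise contribute to the fused spectrum.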

The output of the whole processing pipeline for a 200 second segment is shown in Figure 5.7a. The figure shows the estimated heart rate from CardioFi and also from [8]. It can be seen that CardioFi tracks the true heart rates very closely. Figure 5.7b shows the box plot of the estimation errors from CardioFi and [8], respectively. The median error for CardioFi is 1.0 bpm, while that of [8] is 1.9 bpm, which is 90% larger.

Figures 5.7c and 5.7d illustrate the performance gap between CardioFi and [8]. They show the estimated spectra of the two methods at time 150 seconds of Figure 5.7a. The behaviour in Figure 5.7c is similar to that in Figure 5.1 introduced earlier, where the estimated spectra are very noisy. However, CardioFi obtained a better spectrum estimation by judiciously choosing the informative sub-carriers using the spectral stability score. Note also that in Figure 5.7d, only 5 sub-carriers were selected.

5.3 Evaluation

We evaluated CardioFi in office room and bedroom environments. Figure 5.8 illustrates the environments we considered for our experiments, the placements of the devices, and the location of a subject.

Fig. 5.7: The performance of CardioFi versus that of Liu et al. [8]. (a) Performance of CardioFi vs Liu et al. [8] for a 200 second segment; (b) error rate; (c) mean frequency spectrum at time 150 seconds of (a) according to [8]; (d) mean frequency spectrum at time 150 seconds of (a) as observed by CardioFi.

Fig. 5.8: Experimental setup scenarios. (a) Office room setup (4 m × 3.70 m); (b) bedroom setup (4 m × 2.90 m).

The first setup (see Figure 5.8a) represents contact-free heart rate monitoring for a quasi-static subject (watching TV, reading, etc.).

The subject’s body does not intersect the line of sight (LoS) between the sender

and the receiver. Typical applications include long-term vital sign monitoring for medical applications and instantaneous heart rate monitoring after exercise, as post-exercise recovery rates were shown to be a strong predictor of mortality

[125]. Another example is a subject observing the slowing down of her heart rate in real time while practising meditation [126]. The second setup (see Figure 5.8b) is a representative environment for vital sign monitoring during sleep.

The WiFi transmitter and receiver are two HP Elitebook 6930p laptops equipped with an Intel 5300 WiFi card, and both devices use internal antennas.

CSI data was collected using Linux 802.11n CSI Tool [105] in the 5 GHz band.

Four volunteers participated in the data collection process over a total period of 2 months.⁴ The ground truth for heart rates was collected at 1 Hz by the Polar H7 sensor [127], which was wrapped around the subject's chest and reported the instantaneous heart rates over Bluetooth. The Network Time Protocol was used to synchronise the CSI and Polar H7 data streams. We varied the distances between the user and the LoS of the devices for different Tx/Rx placement scenarios. Except for the results in Figure 5.9b, the distances were less than 2 metres.

5.3.1 Overall Performance

We begin by evaluating the heart rate estimation of our proposed approach (see Figure 5.9a) and investigate how the performance changes with increasing distance of the user from the WiFi devices.

Compared to the baseline method in [8], CardioFi decreases the median

⁴ Ethical approval was granted by the University of New South Wales (Approval Number HC17823).


error from 1.9 bpm to 1.14 bpm, a 40% reduction. Moreover, 90% of the errors are below 5.1 bpm, which improves on the baseline algorithm by 176%. In general, the proposed approach has a median error comparable to device-free systems implemented with directional antennas [6, 8] and to device-based accelerometer systems [128]. This is achieved without requiring hardware modifications or direct contact with the subject's body.

Fig. 5.9: The performance of CardioFi. (a) CDFs of heart rate estimation error; (b) user-to-device distance impact.

We next test the impact of increasing the distance between the subject and either of the communicating devices and plot the results in Figure 5.9b. As the distance increases, the reflected signal becomes weaker and hence the accuracy degrades gradually until reaching a median error of 1.6 bpm at a distance of 2 metres. We find that the signal becomes very noisy when the distance goes beyond 2 metres, resulting in a substantial increase of the error. Upon closer inspection of the signal, the majority of the sub-carriers fail to produce an accurate estimation of the heart rate, which ultimately leads to large errors in the data fusion stage. We therefore consider only distances up to 2 metres in the experiments for the rest of this chapter.


Fig. 5.10: The impact of CardioFi parameters. (a) Number of selected sub-carriers; (b) Spectral Stability window (N); (c) sliding window length (wf); (d) CSI packet sampling rates.

5.3.2 Impact of Parameters

Impact of Sub-carrier Selection

In this section, we study the effect of the sub-carrier selection step on the estimation results. Figure 5.10a shows the error as we change the number of sub-carriers considered for each criterion. We compare our proposed Spectral Stability score to the variance-based sub-carrier selection schemes, in which the sub-carrier score is calculated from the variance of the signal itself. The spectral stability method outperforms the variance-based method for the top ranked sub-carrier, with 2.8 bpm and 3.14 bpm median error, respectively. The estimation improves as we include more sub-carriers until reaching ten sub-carriers, with median errors of 1.14 bpm and 1.8 bpm for the spectral and variance methods, respectively. On average, the spectral stability method decreases the median error of the variance method by 19%.

A second important parameter used in the sub-carrier selection step is the window length N for assessing the spectral stability. Figure 5.10b illustrates that the median error is insensitive to N ≥ 20 (median error 1.1 bpm). Smaller N values produce a poor spectral stability score (median error 1.4 bpm) as all sub-carriers tend to score close to zero, making it difficult to identify reliable sub-carriers. We set the value of N to be 40 seconds.

Sliding Window Length wf

The window length used for calculating the FFT balances the need to obtain accurate results against the initial delay in the system response. Ideally, a larger window size is preferred to obtain more accurate results. However, this comes at the cost of increased computational processing and delayed reporting due to the initial delay. Figure 5.10c shows that segments as small as 10 seconds are sufficient for obtaining the heart rate with a median error of 1.9 bpm. We set the value of wf to 20 seconds.

Impact of Sampling Rates

Figure 5.10d shows the median error for different sampling rates. Error ranges (5th-90th percentile) are depicted as the vertical lines. While sampling at 20 samples per second should theoretically be sufficient for capturing the heart rate, we find this rate unreliable in practice. The results show that higher sampling rates reduce median errors. However, this effect diminishes above 100 samples per second. Hence, we set the sampling rate to 100 samples per second in our evaluation.

5.4 Related Works

In this section, we augment the earlier treatment of related works (see Section 2.3.3) by surveying closely related RF vital sign monitoring research.

RF radars have been employed in contact-less vital sign monitoring. Adib et al. [23] use special T-shaped antennas and an ultra-wideband FMCW radar to monitor the breathing and heart rate of smart home occupants. EQ-Radio [129] extends [23] to acquiring Heart Rate Variability (HRV) and uses it for emotion recognition. mmVital [64] uses the Received Signal Strength of the highly directional 60 GHz millimeter radio signal with rotating antennas for breathing and heart rate monitoring. Liu et al. [5] leveraged the amplitude information of CSI to monitor breathing and heart rates during sleep. PhaseBeat [6] used the CSI phase difference for the same purpose. Both [5] and [6] employ directional antennas in the heartbeat monitoring scenario to boost the radio signal quality. The main distinction between CardioFi and earlier device-free RF heartbeat monitoring systems is addressing the accurate HR estimation problem on COTS WiFi devices without relying on hardware enhancement. We manage to obtain reliable heart rate estimation by efficient data processing and by fusing input from informative sub-carriers only.

We believe that enabling vital sign monitoring on ubiquitous unmodified WiFi devices is a key enabler of many interesting applications.

5.5 Limitations and Future Work

The current prototype of CardioFi allows extracting the heart rate and can be improved in the following directions:

• Enhancing Resolution: extending the current capabilities of the system beyond the heart rate metric by capturing heart rate variability (HRV). Capturing HRV from the CSI signal is more challenging; however, the metric can be used effectively in a broader range of applications, including emotion and stress monitoring.

• Uninterrupted Monitoring: vital sign tracking in the current implementation is done under the assumption that the user is quasi-static (e.g. watching TV). Signal variations caused by full body motion will overshadow the micro motions caused by the beating heart, making it challenging for the system to capture the heartbeat. Addressing this limitation can enable uninterrupted monitoring.

5.6 Conclusion

In this chapter, we presented a system for heart rate monitoring on top of COTS WiFi devices. We showed the challenges of heartbeat tracking on consumer grade WiFi systems and addressed them with a novel signal processing pipeline, without resorting to noise mitigation hardware (i.e. directional antennas). Our results showed that the proposed CardioFi system outperforms the state of the art by reducing its 50th and 90th percentile errors by 40% and 176%, respectively.

This work has resulted in the following publication:

1. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu, “CardioFi: Enabling Heart Rate Monitoring on Unmodified COTS WiFi Devices”, International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MobiQuitous ’18).

Chapter 6

Conclusions

In conclusion, this dissertation explored radio frequency (RF) as a ubiquitous medical sensing modality. Without any physical body contact, we showed that RF signal reflections can be practically used to track hand hygiene steps, capture detailed human breathing metrics for breath control applications and elicit heart rates. All of this was achieved using algorithms working on native wireless measurements from commercial, unmodified devices.

6.1 Summary of Contributions

The contributions of this research work collectively fall within the realm of enabling ubiquitous medical sensing on top of commercial RF devices. From an algorithmic perspective, this dissertation introduced novel techniques for extracting motions relevant to human health from consumer RF devices. This culminated in the following key practical contributions:

• Introducing algorithms for human motion extraction from RF signals in the presence of unknown motion boundaries (Chapter 3), noisy measurements of instantaneous motions (Chapter 4) and very low signal-to-noise ratios (Chapter 5).

• Framing back-to-back gesture recognition as a sequence labelling problem and introducing a semi-supervised algorithm for tracking naturally performed gestures. This was employed in the first RF system for tracking the hand hygiene gestures of healthcare workers. The system tracks naturally performed gestures with very little training overhead and is able to operate in the presence of interfering users.

• Modelling the impact of micro-motion chest displacements on the WiFi channel and proposing algorithms for instantaneous tracking of chest motion. This allows WiFi devices to extract the detailed breathing metrics essential for breath-control and biofeedback applications.

• Enabling WiFi devices to accurately track heart rate based purely on native (and noisy) measurements, without requiring hardware modifications.

6.2 Future Work

This research focused on the target medical sensing problems; however, it also introduced algorithms that can be extended to other applications, including non-medical ones. The deep sequence learning model (see Chapter 3) substantially reduces the labelling effort required for training in sequence recognition problems and bypasses the need for manual segmentation. It can be reused in problems for which it is easy to obtain the sequence of actions performed but cumbersome and time-consuming to record the start/end times of each action. Examples include assembly task recognition [130], among others. Additionally, the micro-motion modelling (see Chapter 4) developed for tracking chest motion can be extended to other applications such as tremor recording (discussed below). Looking forward, the practical value of

RF medical sensing can be extended by addressing the current limitations and investigating other medical sensing problems.
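As a minimal illustration of why the sequence labelling framing removes the need for manual segmentation (the labels and frame data below are hypothetical, not from the thesis's dataset): a CTC-style best-path decoder collapses frame-level predictions into an ordered action sequence, so training and evaluation need only the order of the actions, never their start/end times.

```python
def ctc_best_path_decode(frame_labels, blank=0):
    """Collapse per-frame label predictions into an action sequence:
    merge consecutive repeats, then drop blanks. Repeats separated by
    a blank remain distinct actions, so no segment boundaries are needed."""
    seq, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            seq.append(lab)
        prev = lab
    return seq

# Hypothetical frame-wise gesture predictions (0 = blank / no gesture)
frames = [0, 0, 1, 1, 1, 0, 2, 2, 0, 1, 1, 0]
print(ctc_best_path_decode(frames))  # [1, 2, 1]
```

Note how gesture 1 appears twice in the output: the intervening blank frames are what separate two occurrences of the same gesture, which is exactly the information a manual segmenter would otherwise have to annotate.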

6.2.1 Limitations

The following is a summary of the limitations explored in the thesis chapters:

• In-ward Validation of Hand Hygiene Tracking: Although the RFWash system was designed to remove the impact of interfering users and focus only on the main subject, the experiments were largely performed in the lab. While the results are encouraging, more work remains to validate the gesture tracking accuracy in healthcare facilities. The next step is installing the system in hospital wards and testing it on handrub data collected from clinicians.

• All-time Monitoring of Vital Signs: In vital sign monitoring systems, tracking is performed while the user is quasi-stationary. Although people are in a sedentary state most of the time in in-home environments, uninterrupted monitoring can provide much-needed data for long-term monitoring. The main challenge here is whole-body motion, which overwhelms the tiny chest displacements. This is a challenge for contact-less sensing approaches in general (including non-RF) and overcoming it is an interesting direction for future research.

• Multi-user Sensing: Extending monitoring to the multi-user case is always desirable in sensing applications. In Chapter 3, we used a simple “cutoff” technique to remove the impact of interfering users and focus on the target. It can be noted, however, that reflections from interfering users capture their movements and actions and can be further analysed to enable multi-user sensing. The situation can be very challenging, though, when the users are very close to each other (centimetres apart). For this, further processing of the measurements using signal and tensor decomposition techniques can be leveraged.

• Heart Beat Sensing Resolution: Capturing fine-grained heart rate metrics can enable high-level applications beyond well-being monitoring. The fine-grained information of the beat-to-beat interval is not captured by heart rate and presents a key metric for applications such as emotion recognition and stress monitoring. As observed, it is challenging to capture the heart rate from noisy wireless measurements (see Chapter 5). However, advanced signal processing techniques that recover signals buried in noise can be experimented with. For example, Adaptive Stochastic Resonance [131], one of the techniques used to amplify weak signals buried in noise, could be beneficial.
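The stochastic resonance effect behind [131] is the counter-intuitive fact that adding noise can make a sub-threshold periodic signal detectable. The toy sketch below is a non-adaptive illustration with made-up signal, noise and threshold parameters (adaptive variants tune the noise level automatically): a weak 1 Hz tone never crosses the detection threshold on its own, yet its frequency becomes recoverable from the threshold-crossing events once noise is added.

```python
import numpy as np

fs, f0, dur = 100.0, 1.0, 60.0            # sample rate, tone freq, duration (s)
t = np.arange(0, dur, 1 / fs)
weak = 0.3 * np.sin(2 * np.pi * f0 * t)   # sub-threshold periodic signal
THRESH = 1.0                              # detector threshold

def event_spectrum_peak(sig):
    """Frequency of the strongest periodicity in threshold-crossing events."""
    events = (sig > THRESH).astype(float)
    if events.sum() == 0:
        return None                       # signal never crosses the threshold
    spec = np.abs(np.fft.rfft(events - events.mean()))
    freqs = np.fft.rfftfreq(sig.size, 1 / fs)
    return freqs[1:][np.argmax(spec[1:])] # skip the DC bin

assert event_spectrum_peak(weak) is None  # alone, the tone is invisible
rng = np.random.default_rng(1)
noisy = weak + 0.6 * rng.standard_normal(t.size)
f_est = event_spectrum_peak(noisy)        # close to 1.0 Hz: noise reveals it
```

The noise pushes the signal over the threshold more often near its peaks than near its troughs, so the crossing events inherit the tone's periodicity; an adaptive scheme would sweep the noise level to maximise this output signal-to-noise ratio.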

6.2.2 Future Applications

Beyond addressing the limitations, RF sensing can be part of a range of other medical sensing problems that are interesting avenues for future research. By exploring the hand hygiene problem in this dissertation (see Chapter 3), we only scratched the surface of in-hospital applications in which RF monitoring can be practically employed. Manual auditing is prevalent in clinical environments, as we will see shortly; thus, we believe that the automation opportunities for RF systems are ample. Below we explore two examples of in-hospital applications and a third, in-home application. All of these can be approached using the current capabilities of commercial RF devices.


• Mobility Logging in Intensive Care Units (ICU) [132]: Post-intensive care syndrome is a collection of health disorders (such as ICU-acquired muscular weakness) that are common among patients who survive intensive care. To mitigate this risk, early and frequent patient mobilisation during the ICU stay is one of the techniques adopted. Monitoring patients' mobility is therefore critical and, unfortunately, the current practice is based on labour-intensive direct human observation. We envision an RF system that automates the mobility monitoring process by tracking patients' activities in the ICU, such as getting out of bed and getting into a chair. Range-Doppler measurements (see Chapter 3) can be used to locate subjects from their torso reflections and to identify the activities through their distinctive velocity patterns. Interesting research questions in this direction would be “How can a system differentiate between a target patient and a staff member such as a nurse?” and “How can the occlusion caused by interfering subjects be addressed?”. In relation to the first question, RF signals could capture biometric data, such as gait [133], or vital biometrics combined with location history. In relation to the second, multiple RF sensors can be leveraged and an experimental investigation into the optimal deployment setup can be performed.
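To make the range-Doppler idea concrete, the sketch below simulates one FMCW frame for a single reflector and recovers its distance and radial velocity with a 2D FFT (fast time → range, slow time → Doppler). All radar parameters here are hypothetical, chosen only for illustration, and the single ideal reflector stands in for a torso reflection.

```python
import numpy as np

c = 3e8                      # speed of light (m/s)
fc, B, Tc = 60e9, 1e9, 1e-4  # carrier, sweep bandwidth, chirp duration (hypothetical)
Ns, Nc = 128, 256            # samples per chirp, chirps per frame
fs = Ns / Tc                 # fast-time sample rate

R_true, v_true = 2.0, 0.5        # reflector range (m) and radial velocity (m/s)
fb = 2 * B * R_true / (c * Tc)   # beat frequency encoding range
fd = 2 * v_true * fc / c         # Doppler frequency encoding velocity

t = np.arange(Ns) / fs
frame = np.array([np.exp(1j * (2 * np.pi * fb * t + 2 * np.pi * fd * k * Tc))
                  for k in range(Nc)])          # Nc x Ns ideal beat signal

# 2D FFT: axis 1 (fast time) -> range bins, axis 0 (slow time) -> Doppler bins
rd_map = np.fft.fftshift(np.fft.fft2(frame), axes=0)
k_dopp, k_range = np.unravel_index(np.argmax(np.abs(rd_map)), rd_map.shape)

R_est = k_range * c / (2 * B)                   # range resolution c/(2B) = 0.15 m
f_dopp = np.fft.fftshift(np.fft.fftfreq(Nc, Tc))[k_dopp]
v_est = f_dopp * c / (2 * fc)                   # Doppler bin -> radial velocity
```

The peak of the resulting map lands near (2 m, 0.5 m/s), quantised by the range and Doppler bin widths; a subject and an interferer standing at different ranges, or moving at different speeds, occupy different cells of this same map, which is what makes the cutoff and activity-signature ideas workable.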

• Behavioural Mapping: Similar to the previous application, behavioural mapping is concerned with the physical activity of hospitalised patients (usually those in rehabilitation centres). In addition to mobility information, the activity location and social context are recorded. Context recording allows answering questions like ‘What percentage of time do patients spend in their rooms?’ and ‘How many patients stay in their beds during lunch time?’. Traditionally, manual monitoring methods (whereby trained observers record these parameters every 10 minutes from 8 am to 5 pm) have been used to address such questions. Efforts to employ accelerometers for monitoring found that 75% of the patients declined to wear the device [134], and camera solutions found that the ethical issues associated with privacy may outweigh the monitoring benefits [135]. In light of this, we believe that a contact-less RF system capable of tracking patients' physical activities and localising users represents a suitable alternative. The research questions overlap with those of the previous application, with the added necessity of covering multiple rooms. This can be done using multiple in-room sensors or by employing an RF system with a longer wavelength that works through walls (as discussed in Chapter 2).

• Tremor Recording: Tremors (i.e., the involuntary oscillatory movement of a body part) can be an indicator of a disease condition. In addition to physiological tremor, which occurs in healthy human beings in stress or fatigue situations, Essential Tremor (ET) and Parkinson's Disease (PD) tremors are common among the elderly. To track and record the progression of these tremors, researchers use accelerometry and electromyography (EMG) wearables in clinical settings [136]. We envision an in-home RF-based monitoring approach that can be more convenient and suitable for providing the long-term longitudinal data sought by researchers in this field [137]. The micro-motion model (see Chapter 4) can be extended and reused to record the tremor and the motion frequency used by specialists in identifying the tremor type. A key research question in this problem concerns removing the impact of large motions that can overshadow the tremor motion. This is related to the previously discussed ‘All-time Monitoring of Vital Signs’; the same techniques could viably be employed to address this issue.
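As a sketch of the envisioned tremor analysis, the snippet below simulates a hypothetical displacement signal (a 1 mm, 5 Hz tremor riding on a much larger slow arm movement) and reads the tremor frequency off a band-limited spectral peak. The band limits and the 7 Hz classification cutoff are only approximate reflections of ranges reported in the clinical literature; fully suppressing large motions in practice would need the more advanced techniques discussed above.

```python
import numpy as np

fs = 100.0                                    # displacement sample rate (Hz)
t = np.arange(0, 20, 1 / fs)
tremor = 0.001 * np.sin(2 * np.pi * 5.0 * t)  # 1 mm tremor at 5 Hz
slow = 0.05 * np.sin(2 * np.pi * 0.3 * t)     # large, slow voluntary motion
displacement = slow + tremor

# Search for the spectral peak only inside the tremor band of interest,
# side-stepping the dominant low-frequency motion energy.
spec = np.abs(np.fft.rfft(displacement * np.hanning(t.size)))
freqs = np.fft.rfftfreq(t.size, 1 / fs)
band = (freqs >= 3.0) & (freqs <= 12.0)
f_tremor = freqs[band][np.argmax(spec[band])]

# Coarse, approximate banding (hypothetical cutoff for illustration only)
label = "parkinsonian-range" if f_tremor < 7.0 else "essential-tremor-range"
```

Even though the slow motion is fifty times larger than the tremor, windowing plus the band-restricted search recovers the 5 Hz component; it is the in-band interference from brisker whole-body motions that remains the open problem.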

Bibliography

[1] WiRelax Demo. https://youtu.be/e er2w39b4I, 2019. [Online; accessed 15-February-2020].

[2] WHO guidelines on hand hygiene in health care. Published by the World Health Organisation. Retrieved from: whqlibdoc.who.int/publications/009.pdf, 2009.

[3] Albert Haque, Michelle Guo, Alexandre Alahi, Serena Yeung, Zelun Luo, Alisha Rege, Jeffrey Jopling, Lance Downing, William Beninati, Amit Singh, et al. Towards vision-based smart hospitals: A system for tracking and monitoring hand hygiene compliance. In Machine Learning for Healthcare Conference, pages 75–87, 2017.

[4] Yongsen Ma, Gang Zhou, Shuangquan Wang, Hongyang Zhao, and Woosub Jung. SignFi: Sign language recognition using WiFi. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(1):23, 2018.

[5] Jian Liu, Yan Wang, Yingying Chen, Jie Yang, Xu Chen, and Jerry Cheng. Tracking vital signs during sleep leveraging off-the-shelf WiFi. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 267–276. ACM, 2015.

[6] Xuyu Wang, Chao Yang, and Shiwen Mao. PhaseBeat: Exploiting CSI phase data for vital sign monitoring with commodity WiFi devices. In IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pages 1230–1239. IEEE, 2017.

[7] Xuyu Wang, Chao Yang, and Shiwen Mao. TensorBeat: Tensor decomposition for monitoring multiperson breathing beats with commodity WiFi. ACM Transactions on Intelligent Systems and Technology (TIST), 9(1):1–27, 2017.

[8] Jian Liu, Yingying Chen, Yan Wang, Xu Chen, Jerry Cheng, and Jie Yang. Monitoring vital signs and postures during sleep using WiFi signals. IEEE Internet of Things Journal, 2018.

[9] Sharon Ann Hunt. ACC/AHA 2005 guideline update for the diagnosis and management of chronic heart failure in the adult: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines (Writing Committee to Update the 2001 Guidelines for the Evaluation and Management of Heart Failure). Journal of the American College of Cardiology, 46(6):e1–e82, 2005.

[10] Klaus F Rabe, Suzanne Hurd, Antonio Anzueto, Peter J Barnes, Sonia A Buist, Peter Calverley, Yoshinosuke Fukuchi, Christine Jenkins, Roberto Rodriguez-Roisin, Chris Van Weel, et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive pulmonary disease: GOLD executive summary. American Journal of Respiratory and Critical Care Medicine, 176(6):532–555, 2007.

[11] Hulya Gokalp and Malcolm Clarke. Monitoring activities of daily living of the elderly and the potential for its use in telecare and telehealth: a review. Telemedicine and e-Health, 19(12):910–923, 2013.

[12] Robert Steele, Amanda Lo, Chris Secombe, and Yuk Kuen Wong. Elderly persons' perception and acceptance of using wireless sensor networks to assist healthcare. International Journal of Medical Informatics, 78(12):788–801, 2009.

[13] Muhammad Rizwan Asghar, TzeHowe Lee, Mirza Mansoor Baig, Ehsan Ullah, Giovanni Russello, and Gillian Dobbie. A review of privacy and consent management in healthcare: A focus on emerging data sources. In IEEE 13th International Conference on e-Science (e-Science), pages 518–522. IEEE, 2017.

[14] Aditya Virmani and Muhammad Shahzad. Position and orientation agnostic gesture recognition using WiFi. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 252–264. ACM, 2017.

[15] Kavita D Chandwani, Bob Thornton, George H Perkins, Banu Arun, NV Raghuram, HR Nagendra, Qi Wei, and Lorenzo Cohen. Yoga improves quality of life and benefit finding in women undergoing radiotherapy for breast cancer. Journal of the Society for Integrative Oncology, 8(2), 2010.

[16] Anne E Holland, CJ Hill, AY Jones, and CF McDonald. Breathing exercises for chronic obstructive pulmonary disease. The Cochrane Database of Systematic Reviews, 10, 2010.

[17] E Holloway and FS Ram. Breathing exercises for asthma. Cochrane Database Syst Rev, 1, 2004.

[18] Ghufran Shafiq and Kalyana C Veluvolu. Surface chest motion decomposition for cardiovascular monitoring. Scientific Reports, 4:5093, 2014.

[19] Deepak Vasisht, Guo Zhang, Omid Abari, Hsiao-Ming Lu, Jacob Flanz, and Dina Katabi. In-body backscatter communication and localization. In Proceedings of the International Conference of the ACM Special Interest Group on Data Communication, pages 132–146. ACM, 2018.

[20] Teng Wei and Xinyu Zhang. mTrack: High-precision passive tracking using millimeter wave radios. In Proceedings of the International Conference on Mobile Computing and Networking, pages 117–129. ACM, 2015.

[21] André Spizzichino. The Scattering of Electromagnetic Waves from Rough Surfaces. Pergamon Press, 1963.

[22] Fadel Adib, Chen-Yu Hsu, Hongzi Mao, Dina Katabi, and Frédo Durand. Capturing the human figure through a wall. ACM Transactions on Graphics (TOG), 34(6):219, 2015.

[23] Fadel Adib, Hongzi Mao, Zachary Kabelac, Dina Katabi, and Robert C Miller. Smart homes that monitor breathing and heart rate. In Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pages 837–846. ACM, 2015.

[24] Tianhong Li, Lijie Fan, Mingmin Zhao, Yingcheng Liu, and Dina Katabi. Making the invisible visible: Action recognition through walls and occlusions. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), pages 872–881, 2019.

[25] Jaime Lien, Nicholas Gillian, M Emre Karagozler, Patrick Amihood, Carsten Schwesig, Erik Olson, Hakim Raja, and Ivan Poupyrev. Soli: Ubiquitous gesture sensing with millimeter wave radar. ACM Transactions on Graphics (TOG), 35(4):142, 2016.

[26] Saiwen Wang, Jie Song, Jaime Lien, Ivan Poupyrev, and Otmar Hilliges. Interacting with Soli: Exploring fine-grained dynamic gesture recognition in the radio-frequency spectrum. In Proceedings of the 29th Annual Symposium on User Interface Software and Technology, pages 851–860. ACM, 2016.

[27] William C Stone. NIST Construction Automation Program Report No. 3: Electromagnetic signal attenuation in construction materials. US Department of Commerce, National Institute of Standards and Technology, 1997.

[28] Chen-Yu Hsu, Rumen Hristov, Guang-He Lee, Mingmin Zhao, and Dina Katabi. Enabling identification and behavioral sensing in homes using radio reflections. In Proceedings of the CHI Conference on Human Factors in Computing Systems, page 548. ACM, 2019.

[29] Tianben Wang, Daqing Zhang, Yuanqing Zheng, Tao Gu, Xingshe Zhou, and Bernadette Dorizzi. C-FMCW based contactless respiration detection using acoustic signal. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(4):170, 2018.

[30] Yaxiong Xie, Zhenjiang Li, and Mo Li. Precise power delay profiling with commodity WiFi. In Proceedings of the International Conference on Mobile Computing and Networking, pages 53–64. ACM, 2015.

[31] Zhenyu Wu, Zhangyang Wang, Zhaowen Wang, and Hailin Jin. Towards privacy-preserving visual recognition via adversarial training: A pilot study. In Proceedings of the European Conference on Computer Vision (ECCV), pages 606–624, 2018.

[32] Edward Chou, Matthew Tan, Cherry Zou, Michelle Guo, Albert Haque, Arnold Milstein, and Li Fei-Fei. Privacy-preserving action recognition for smart hospitals using low-resolution depth images. arXiv preprint arXiv:1811.09950, 2018.

[33] Bingbin Liu, Michelle Guo, Edward Chou, Rishab Mehra, Serena Yeung, N Lance Downing, Francesca Salipur, Jeffrey Jopling, Brandi Campbell, Kayla Deru, et al. 3D point cloud-based visual prediction of ICU mobility care activities. In Machine Learning for Healthcare Conference, pages 17–29, 2018.

[34] Manikanta Kotaru, Guy Satat, Ramesh Raskar, and Sachin Katti. Light-field for RF. arXiv preprint arXiv:1901.03953, 2019.

[35] Gregory Charvat, Andrew Temme, Micha Feigin, and Ramesh Raskar. Time-of-flight microwave camera. Scientific Reports, 5:14709, 2015.

[36] Raghav H Venkatnarayan, Griffin Page, and Muhammad Shahzad. Multi-user gesture recognition using WiFi. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, pages 401–413. ACM, 2018.

[37] Marco Mercuri, Ilde Rosa Lorato, Yao-Hong Liu, Fokko Wieringa, Chris Van Hoof, and Tom Torfs. Vital-sign monitoring and spatial tracking of multiple people using a contactless radar-based sensor. Nature Electronics, page 1, 2019.

[38] Fadel Adib, Zach Kabelac, Dina Katabi, and Robert C Miller. 3D tracking via body radio reflections. In 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 317–329, 2014.

[39] Kiran Joshi, Dinesh Bharadia, Manikanta Kotaru, and Sachin Katti. WiDeo: Fine-grained device-free motion tracing using RF backscatter. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 189–204, 2015.

[40] Aditya Virmani and Muhammad Shahzad. Position and orientation agnostic gesture recognition using WiFi. In Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services, pages 252–264. ACM, 2017.

[41] Heba Abdelnasser, Moustafa Youssef, and Khaled A Harras. WiGest: A ubiquitous WiFi-based gesture recognition system. In 2015 IEEE Conference on Computer Communications (INFOCOM), pages 1472–1480. IEEE, 2015.

[42] A Graves. Supervised Sequence Labelling with Recurrent Neural Networks. PhD dissertation, Technical University of Munich, Germany, 2008.

[43] Alex Graves, Santiago Fernández, Faustino Gomez, and Jürgen Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In Proceedings of the 23rd International Conference on Machine Learning, pages 369–376. ACM, 2006.

[44] AR Marra and MB Edmond. New technologies to monitor healthcare worker hand hygiene. Clinical Microbiology and Infection, 20(1):29–33, 2014.

[45] Lisa L Pineles, Daniel J Morgan, Heather M Limper, Stephen G Weber, Kerri A Thom, Eli N Perencevich, Anthony D Harris, and Emily Landon. Accuracy of a radiofrequency identification (RFID) badge system to monitor hand hygiene behavior during routine clinical activities. American Journal of Infection Control, 42(2):144–147, 2014.

[46] Hong Li, Shishir Chawla, Richard Li, Sumeet Jain, Gregory D Abowd, Thad Starner, Cheng Zhang, and Thomas Plotz. WristWash: towards automatic handwashing assessment using a wrist-worn device. In Proceedings of the 2018 ACM International Symposium on Wearable Computers, pages 132–139. ACM, 2018.

[47] David Fernández Llorca, Ignacio Parra, Miguel Ángel Sotelo, and Gerard Lacey. A vision-based system for automatic hand washing quality assessment. Machine Vision and Applications, 22(2):219–234, 2011.

[48] Henry Zhong, Salil S Kanhere, and Chun Tung Chou. WashInDepth: Lightweight hand wash monitor using depth sensor. In Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services, pages 28–37. ACM, 2016.

[49] Andrea Zaccaro, Andrea Piarulli, Marco Laurino, Erika Garbella, Danilo Menicucci, Bruno Neri, and Angelo Gemignani. How breath-control can change your life: a systematic review on psycho-physiological correlates of slow breathing. Frontiers in Human Neuroscience, 12:353, 2018.

[50] Pyoung Sook Lee. Theoretical bases and technical application of breathing therapy in stress management. Journal of Korean Academy of Nursing, 29(6):1304–1313, 1999.

[51] William J Elliott and Joseph L Izzo Jr. Device-guided breathing to lower blood pressure: case report and clinical overview. Medscape General Medicine, 8(3):23, 2006.

[52] Relu Cernes and Reuven Zimlichman. RESPeRATE: the role of paced breathing in hypertension treatment. Journal of the American Society of Hypertension, 9(1):38–47, 2015.

[53] Robert D Brook, Lawrence J Appel, Melvyn Rubenfire, Gbenga Ogedegbe, John D Bisognano, William J Elliott, Flavio D Fuchs, Joel W Hughes, Daniel T Lackland, Beth A Staffileno, et al. Beyond medications and diet: alternative approaches to lowering blood pressure: a scientific statement from the American Heart Association. Hypertension, 61(6):1360–1383, 2013.

[54] Emre Ertin, Nathan Stohs, Santosh Kumar, Andrew Raij, Mustafa Al'Absi, and Siddharth Shah. AutoSense: unobtrusively wearable sensor suite for inferring the onset, causality, and consequences of stress in the field. In Proceedings of the ACM Conference on Embedded Networked Sensor Systems, pages 274–287. ACM, 2011.

[55] Heba Abdelnasser, Khaled A Harras, and Moustafa Youssef. UbiBreathe: A ubiquitous non-invasive WiFi-based breathing estimator. In Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 277–286. ACM, 2015.

[56] Xiao Sun, Li Qiu, Yibo Wu, Yeming Tang, and Guohong Cao. SleepMonitor: Monitoring respiratory rate and body position during sleep using smartwatch. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):1–22, 2017.

[57] Spire. https://spire.io/, 2020. [Online; accessed 15-February-2020].

[58] VitaliWear. https://vitaliwear.com/, 2020. [Online; accessed 15-February-2020].

[59] Hao-Yu Wu, Michael Rubinstein, Eugene Shih, John Guttag, Frédo Durand, and William Freeman. Eulerian video magnification for revealing subtle changes in the world. ACM Transactions on Graphics (TOG), 31(4):1–8, 2012.

[60] Rajalakshmi Nandakumar, Shyamnath Gollakota, and Nathaniel Watson. Contactless sleep apnea detection on smartphones. In Proceedings of the International Conference on Mobile Systems, Applications, and Services, pages 45–57. ACM, 2015.

[61] Prana. http://www.prana.co, 2020. [Online; accessed 15-February-2020].

[62] Tian Hao, Chongguang Bi, Guoliang Xing, Roxane Chan, and Linlin Tu. MindfulWatch: A smartwatch-based system for real-time respiration monitoring during meditation. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):57, 2017.

[63] Chen-Hsuan Iris Shih, Naofumi Tomita, Yanick X Lukic, Álvaro Hernández Reguera, Elgar Fleisch, and Tobias Kowatsch. Breeze: Smartphone-based acoustic real-time detection of breathing phases for a gamified biofeedback breathing training. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 3(4):152, 2019.

[64] Zhicheng Yang, Parth H Pathak, Yunze Zeng, Xixi Liran, and Prasant Mohapatra. Monitoring vital signs using millimeter wave. In MobiHoc, pages 211–220, 2016.

[65] Xiao Sun, Li Qiu, Yibo Wu, Yeming Tang, and Guohong Cao. SleepMonitor: Monitoring respiratory rate and body position during sleep using smartwatch. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):104, 2017.

[66] Ahmed H Abdelhafiz. Heart failure in older people: causes, diagnosis and treatment. Age and Ageing, 31(1):29–36, 2002.

[67] Albert B Levin. A simple test of cardiac function based upon the heart rate changes induced by the Valsalva maneuver. The American Journal of Cardiology, 18(1):90–99, 1966.

[68] Daniel G Carey. Quantifying differences in the “fat burning” zone and the aerobic zone: implications for training. The Journal of Strength & Conditioning Research, 23(7):2090–2095, 2009.

[69] Alison M McManus, Rich SW Masters, Raija MT Laukkanen, CW Clare, Cindy HP Sit, and Fiona CM Ling. Using heart-rate feedback to increase physical activity in children. Preventive Medicine, 47(4):402–408, 2008.

[70] Reham Mohamed and Moustafa Youssef. HeartSense: Ubiquitous accurate multi-modal fusion-based heart rate estimation using smartphones. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 1(3):97, 2017.

[71] Eduardo Pinheiro, Octavian Postolache, and Pedro Girão. Theory and developments in an unobtrusive cardiovascular system representation: ballistocardiography. The Open Biomedical Engineering Journal, 4:201, 2010.

[72] Øyvind Aardal, Yoann Paichard, Sverre Brovoll, Tor Berger, Tor Sverre Lande, and Svein-Erik Hamran. Physical working principles of medical radar. IEEE Transactions on Biomedical Engineering, 60(4):1142–1149, 2012.

[73] Healthcare-associated infections: data and statistics. Published by the Centers for Disease Control and Prevention, 2016.

[74] I. R. Daniels and B. I. Rees. Handwashing: simple, but effective. Annals of the Royal College of Surgeons of England, 81(2):117–118, Mar 1999. 10364970[pmid].

[75] Muhammad Shahzad and Shaohu Zhang. Augmenting user identification with WiFi based gesture recognition. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):134, 2018.

[76] Hu Liu, Sheng Jin, and Changshui Zhang. Connectionist temporal classification with maximum entropy regularization. In Advances in Neural Information Processing Systems, pages 831–841, 2018.

[77] Yen Lee Angela Kwok, Michelle Callard, and Mary-Louise McLaws. An automated hand hygiene training system improves hand hygiene technique but not compliance. American Journal of Infection Control, 43(8):821–825, 2015.

[78] Pavlo Molchanov, Xiaodong Yang, Shalini Gupta, Kihwan Kim, Stephen Tyree, and Jan Kautz. Online detection and classification of dynamic hand gestures with recurrent 3D convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4207–4215, 2016.

[79] Lawrence R Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[80] Dario Amodei, Sundaram Ananthanarayanan, Rishita Anubhai, Jingliang Bai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Qiang Cheng, Guoliang Chen, et al. Deep Speech 2: End-to-end speech recognition in English and Mandarin. In International Conference on Machine Learning, pages 173–182, 2016.

[81] Alexander Richard, Hilde Kuehne, Ahsan Iqbal, and Juergen Gall. NeuralNetwork-Viterbi: A framework for weakly supervised video learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 7386–7395, 2018.

[82] Siyuan Qi, Siyuan Huang, Ping Wei, and Song-Chun Zhu. Predicting human activities using stochastic grammar. In Proceedings of the IEEE International Conference on Computer Vision, pages 1164–1172, 2017.

[83] Mike Schuster and Kuldip K Paliwal. Bidirectional recurrent neural networks. IEEE Transactions on Signal Processing, 45(11):2673–2681, 1997.

[84] Yonglong Tian, Guang-He Lee, Hao He, Chen-Yu Hsu, and Dina Katabi. RF-based fall monitoring using convolutional neural networks. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):1–24, 2018.

[85] Du Tran, Lubomir Bourdev, Rob Fergus, Lorenzo Torresani, and Manohar Paluri. Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 4489–4497, 2015.

[86] Yunan Li, Qiguang Miao, Kuan Tian, Yingying Fan, Xin Xu, Rui Li, and Jianfeng Song. Large-scale gesture recognition with a fusion of RGB-D data based on the C3D model. In 2016 23rd International Conference on Pattern Recognition (ICPR), pages 25–30. IEEE, 2016.

[87] Y. Zou, J. Xiao, J. Han, K. Wu, Y. Li, and L. M. Ni. GRfid: A device-free RFID-based gesture recognition system. IEEE Transactions on Mobile Computing, 16(2):381–393, Feb 2017.

[88] Zhenyuan Zhang, Zengshan Tian, and Mu Zhou. Latern: Dynamic continuous hand gesture recognition using FMCW radar sensor. IEEE Sensors Journal, 18(8):3278–3289, 2018.

[89] Takeru Miyato, Shin-ichi Maeda, Masanori Koyama, and Shin Ishii. Virtual adversarial training: a regularization method for supervised and semi-supervised learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8):1979–1993, 2018.

[90] Lijie Fan, Tianhong Li, Rongyao Fang, Rumen Hristov, Yuan Yuan, and Dina Katabi. Learning longterm representations for person re-identification using radio signals. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.

[91] Jianwei Liu, Jinsong Han, Lei Yang, Fei Wang, Feng Lin, and Kui Ren. A framework for behavior privacy preserving in radio frequency signal, 2020.

[92] Li Guo, Lena Berglin, YJ Li, H Mattila, A Kalantar Mehrjerdi, and

M Skrifvars. Disappearing sensor:textile based sensor for monitoring

breathing. In International Conference on Control, Automation and Sys-

tems Engineering (CASE), pages 1–4. IEEE, 2011.

[93] Edmond Mitchell, Shirley Coyle, Noel E O’Connor, Dermot Diamond,

and Tomas Ward. Breathing feedback system with wearable textile sen-

sors. In International Conference on Body Sensor Networks (BSN), pages

56–61. IEEE, 2010.

[94] Apple Breathe app. https://support.apple.com/en-ke/HT206999, 2020.

[Online; accessed 15-February-2020].

[95] Bin Yu, Mathias Funk, Jun Hu, Qi Wang, and Loe Feijs. Biofeedback

for everyday stress management: a systematic review. Frontiers in ICT,

5:23, 2018.

[96] Ruth Ravichandran, Elliot Saba, Ke-Yu Chen, Mayank Goel, Sidhant

Gupta, and Shwetak N Patel. Wibreathe: Estimating respiration rate

using wireless signals in natural settings in the home. In Pervasive Com-

puting and Communications (PerCom), 2015 IEEE International Con-

ference on, pages 131–139. IEEE, 2015.

[97] M Nowogrodzki, DD Mawhinney, and HF Milgazo. Non-invasive mi-

crowave instruments for the measurement of respiration and heart rates.

NAECON 1984, pages 958–960, 1984.

[98] Hao Wang, Daqing Zhang, Junyi Ma, Yasha Wang, Yuxiang Wang, Dan

Wu, Tao Gu, and Bing Xie. Human respiration detection with com-

modity wifi devices: do user location and body orientation matter? In

July 20, 2020 BIBLIOGRAPHY 157

Proceedings of the 2016 ACM International Joint Conference on Perva-

sive and Ubiquitous Computing, pages 25–36. ACM, 2016.

[99] Ilse Van Diest, Karen Verstappen, André E Aubert, Devy Widjaja, Debora Vansteenwegen, and Elke Vlemincx. Inhalation/exhalation ratio modulates the effect of slow breathing on heart rate variability and relaxation. Applied Psychophysiology and Biofeedback, 39(3-4):171–180, 2014.

[100] Daqing Zhang, Hao Wang, and Dan Wu. Toward centimeter-scale human activity sensing with wi-fi signals. Computer, 50(1):48–57, 2017.

[101] Wei Wang, Alex X Liu, Muhammad Shahzad, Kang Ling, and Sanglu Lu. Understanding and modeling of wifi signal based human activity recognition. In Proceedings of the 21st Annual International Conference on Mobile Computing and Networking, pages 65–76. ACM, 2015.

[102] Zhicheng Yang, Parth H Pathak, Yunze Zeng, Xixi Liran, and Prasant Mohapatra. Monitoring vital signs using millimeter wave. In MobiHoc, pages 211–220, 2016.

[103] Jianwen Luo, Kui Ying, and Jing Bai. Savitzky–Golay smoothing and differentiation filter for even number data. Signal Processing, 85(7):1429–1434, 2005.

[104] Corporate Meditation: How and Why Big Businesses Are Promoting Meditation. https://tinyurl.com/yc7hbhyq, 2020. [Online; accessed 15-February-2020].

[105] Daniel Halperin, Wenjun Hu, Anmol Sheth, and David Wetherall. Tool release: Gathering 802.11n traces with channel state information. ACM SIGCOMM Computer Communication Review, 41(1):53–53, 2011.

[106] XeThru X2M200. https://www.xethru.com/respiration-sensor-x2m200.html, 2020. [Online; accessed 15-February-2020].

[107] Dag T Wisland, Kristian Granhaug, Jan Roar Pleym, Nikolaj Andersen, Stig Støa, and Håkon A Hjortland. Remote monitoring of vital signs using a cmos uwb radar transceiver. In 14th IEEE International New Circuits and Systems Conference (NEWCAS), pages 1–4. IEEE, 2016.

[108] Yee Siong Lee, Pubudu N Pathirana, Robin J Evans, and Christopher L Steinfort. Noncontact detection and analysis of respiratory function using microwave doppler radar. Journal of Sensors, 2015, 2015.

[109] Youwei Zeng, Dan Wu, Ruiyang Gao, Tao Gu, and Daqing Zhang. Fullbreathe: Full human respiration detection exploiting complementarity of csi phase and amplitude of wifi signals. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2(3):148, 2018.

[110] Tamara G Kolda and Brett W Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, 2009.

[111] Youwei Zeng, Dan Wu, Jie Xiong, Enze Yi, Ruiyang Gao, and Daqing Zhang. Farsense: Pushing the range limit of wifi-based respiration sensing with csi ratio of two antennas. arXiv preprint arXiv:1907.03994, 2019.

[112] Wenjun Jiang, Chenglin Miao, Fenglong Ma, Shuochao Yao, Yaqing Wang, Ye Yuan, Hongfei Xue, Chen Song, Xin Ma, Dimitrios Koutsonikolas, Wenyao Xu, and Lu Su. Towards environment independent device free human activity recognition. In Proceedings of the 24th Annual International Conference on Mobile Computing and Networking. ACM, 2018.

[113] Kanit Wongsuphasawat, Alex Gamburg, and Neema Moraveji. You can't force calm: designing and evaluating respiratory regulating interfaces for calming technology. In Adjunct Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, pages 69–70. ACM, 2012.

[114] Guanhua Wang, Yongpan Zou, Zimu Zhou, Kaishun Wu, and Lionel M Ni. We can hear you with wi-fi! IEEE Transactions on Mobile Computing, 15(11):2907–2920, 2016.

[115] Donny Huang, Rajalakshmi Nandakumar, and Shyamnath Gollakota. Feasibility and limits of wi-fi imaging. In Proceedings of the 12th ACM Conference on Embedded Network Sensor Systems, pages 266–279. ACM, 2014.

[116] Pedro Melgarejo, Xinyu Zhang, Parameswaran Ramanathan, and David Chu. Leveraging directional antenna capabilities for fine-grained gesture recognition. In Proceedings of the 2014 ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 541–551. ACM, 2014.

[117] Xuefeng Liu, Jiannong Cao, Shaojie Tang, and Jiaqi Wen. Wi-sleep: Contactless sleep monitoring via wifi signals. In 2014 IEEE Real-Time Systems Symposium (RTSS), pages 346–355. IEEE, 2014.

[118] Zheng Yang, Zimu Zhou, and Yunhao Liu. From rssi to csi: Indoor localization via channel response. ACM Computing Surveys (CSUR), 46(2):25, 2013.

[119] Sheng Tan and Jie Yang. Wifinger: leveraging commodity wifi for fine-grained finger gesture recognition. In Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing, pages 201–210. ACM, 2016.

[120] Jin Zhang, Weitao Xu, Wen Hu, and Salil Kanhere. Wicare: Towards in-situ breath monitoring. In 14th EAI International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services (MOBIQUITOUS). ACM, April 2017.

[121] J. Zhang, B. Wei, W. Hu, and S. S. Kanhere. Wifi-id: Human identification using wifi signal. In 2016 International Conference on Distributed Computing in Sensor Systems (DCOSS), pages 75–82, May 2016.

[122] Anne De Groote, Muriel Wantier, Guy Chéron, Marc Estenne, and Manuel Paiva. Chest wall motion during tidal breathing. Journal of Applied Physiology, 83(5):1531–1537, 1997.

[123] G Ramachandran and M Singh. Three-dimensional reconstruction of cardiac displacement patterns on the chest wall during the p, qrs and t-segments of the ecg by laser speckle interferometry. Medical and Biological Engineering and Computing, 27(5):525–530, 1989.

[124] Kun Qian, Chenshu Wu, Fu Xiao, Yue Zheng, Yi Zhang, Zheng Yang, and Yunhao Liu. Acousticcardiogram: Monitoring heartbeats using acoustic signals on smart devices. 2018.

[125] Christopher R Cole, Eugene H Blackstone, Fredric J Pashkow, Claire E Snader, and Michael S Lauer. Heart-rate recovery immediately after exercise as a predictor of mortality. New England Journal of Medicine, 341(18):1351–1357, 1999.

[126] Meditation offers significant heart benefits. https://www.health.harvard.edu/heart-health/meditation-offers-significant-heart-benefits, 2018. [Online; accessed 1-July-2018].

[127] Polar H7. https://support.polar.com/au-en/support/H7_heart_rate_sensor, 2018. [Online; accessed 15-December-2019].

[128] Javier Hernandez, Daniel J McDuff, and Rosalind W Picard. Biophone: Physiology monitoring from peripheral smartphone motions. In Proceedings of the International Conference on Engineering in Medicine and Biology Society (EMBC), pages 7180–7183. IEEE, 2015.

[129] Mingmin Zhao, Fadel Adib, and Dina Katabi. Emotion recognition using wireless signals. In Proceedings of the International Conference on Mobile Computing and Networking, pages 95–108. ACM, 2016.

[130] Jamie A Ward, Paul Lukowicz, Gerhard Troster, and Thad E Starner. Activity recognition of assembly tasks using body-worn microphones and accelerometers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(10):1553–1567, 2006.

[131] Patrick Krauss, Claus Metzner, Achim Schilling, Christian Schütz, Konstantin Tziridis, Ben Fabry, and Holger Schulze. Adaptive stochastic resonance for unknown and variable input signals. Scientific Reports, 7(1):1–8, 2017.

[132] Andy J Ma, Nishi Rawat, Austin Reiter, Christine Shrock, Andong Zhan, Alex Stone, Anahita Rabiee, Stephanie Griffin, Dale M Needham, and Suchi Saria. Measuring patient mobility in the icu using a novel noninvasive sensor. Critical Care Medicine, 45(4):630, 2017.

[133] Wei Wang, Alex X Liu, and Muhammad Shahzad. Gait recognition using wifi signals. In Proceedings of the ACM International Joint Conference on Pervasive and Ubiquitous Computing, pages 363–373. ACM, 2016.

[134] Karin Valkenet, Petra Bor, Lotte van Delft, and Cindy Veenhof. Measuring physical activity levels in hospitalized patients: a comparison between behavioural mapping and data from an accelerometer. Clinical Rehabilitation, 33(7):1233–1240, 2019.

[135] Simon Gibson, Simon J McBride, Coen McClelland, and Marcus Watson. A technological evaluation of the microsoft kinect for automated behavioural mapping at bed rest. In HIC, pages 39–45, 2013.

[136] Paulo Henrique G Mansur, Lacordaire Kemel P Cury, Adriano O Andrade, Adriano A Pereira, Guilherme Alessandri A Miotto, Alcimar B Soares, and Eduardo LM Naves. A review on techniques for tremor recording and quantification. Critical Reviews in Biomedical Engineering, 35(5), 2007.

[137] B Hellwig, P Mund, B Schelter, B Guschlbauer, J Timmer, and CH Lücking. A longitudinal study of tremor frequencies in parkinson's disease and essential tremor. Clinical Neurophysiology, 120(2):431–435, 2009.