Wireless Sensing for Medical Applications
Abdelwahed Khamis
Dissertation submitted in fulfilment of the requirements for the degree of
Doctor of Philosophy in Computer Science and Engineering
School of Computer Science and Engineering
Supervisor: Wen Hu
Co-supervisors: Chun Tung Chou and Brano Kusy
February 2020
ORIGINALITY STATEMENT
‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’
Signed ……………………………………………......
Date ……………………………………………......
Abstract
A transmitted wireless signal travelling at the speed of light in indoor spaces goes on an intriguing journey in which it reflects off ambient objects and gets modulated by human motion before reaching the receiver. Leveraging this fundamental principle, this thesis exploits radio signals from commercial wireless devices to enable novel health sensing applications. The practical outcomes of this work range from wireless-based physiological vital sign monitoring to the
first system for automatic tracking of Hand Hygiene practices of healthcare workers.
To deliver this, we introduced techniques and algorithms to analyse human motions from reflected radio signals while addressing the practical challenges associated with the presence of noise in Radio Frequency (RF) signals and the requirements of the sensing applications themselves. By relying purely on RF signals for sensing, these systems operate in a contact-less manner, are agnostic to lighting conditions and can fit in residential and clinical environments without invading the privacy of the inhabitants. In effect, we show how the capabilities of commercially available RF devices can be harnessed for health and well-being sensing while addressing the downsides of alternative modalities (e.g., wearables and camera-based systems).

Acknowledgements
This work is partially funded by a CISCO Research Center University Grant.
In the name of Almighty Allah, the most compassionate, the most merciful.
All praise to Him, who bestowed upon me the strength to accomplish this dissertation.
I cannot hope to thank adequately those who helped me in the preparation of this dissertation. I am especially indebted to my advisor Wen Hu. During the past four years, I have thoroughly enjoyed our collaboration that involved intense intellectual discussions, getting me involved in interesting research projects and personal support whenever I needed it. Without his support, this dissertation would not have been possible. My sincere appreciation goes to my co-supervisors Chun Tung Chou and Brano Kusy. I can’t thank them enough for their feedback, encouragement and involvement in every milestone of this dissertation.
My thanks goes to Marylouise McLaws (School of Medicine, UNSW, Syd- ney) for introducing me to the Hand Hygiene tracking problem described in
Chapter 3, and providing domain expert’s input. My conversations with Hong
Jia (CSE, UNSW, Sydney) have always been enlightening and his suggestions improved the work presented in Chapter 3. I would like to express my gratitude to Sara Khalifa (CSIRO, Brisbane) for her comments and the engaging discussions during the thesis writing stage.
I am blessed being in the company of Mahmoud Gamal, Mohammed Jaddoa, Firas Al-Doughman and Ahmed Saadeh. They have been there for me over the past few years, cheering me up on my darkest days. Very special words of gratitude go to my friends Khalifa Eissa and Mahmoud Saied. Despite the thousands of kilometres between us, their good memories always brought me happiness and helped me to survive hard times.
I am deeply grateful to my parents whose prayers, commitment to education and unconditional love led me to where I am today.
Above all, my profoundest thanks to my dear wife, Eman who went through the whole journey with me. This is as much her accomplishment as mine.
The last word goes for Moez, my beloved son, who has always been the source of much joy and happiness for me. This dissertation is dedicated to him.
Abdelwahed Khamis
Brisbane, Australia
February 2020
July 20, 2020

Publications
Journal Publications
1. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu,
“WiRelax: Towards Real-time Respiratory Biofeedback During
Meditation Using WiFi.” Elsevier Ad Hoc Networks Journal, in press,
accepted in May 2020.
Publication revised in Chapter 4

Conference Publications
2. Abdelwahed Khamis, Chun Tung Chou, Branislav Kusy and Wen Hu,
“CardioFi: Enabling Heart Rate Monitoring on Unmodified
COTS WiFi Devices.” In Proceedings of the International Conference
on Mobile and Ubiquitous Systems: Computing, Networking and
Services. MobiQuitous ’18.
Publication revised in Chapter 5
3. Qi Lin, Weitao Xu, Jun Liu, Abdelwahed Khamis, Wen Hu, Mahbub
Hassan, and Aruna Seneviratne. “H2B: Heartbeat-based Secret
Key Generation using Piezo Vibration Sensors.” In Proceedings of
the International Conference on Information Processing in Sensor Networks,
IPSN ’19.
4. Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise
McLaws and Wen Hu, “Poster: A Weakly Supervised Tracking
of Hand Hygiene Technique.” In Proceedings of the International
Conference on Information Processing in Sensor Networks. IPSN ’20.
Publication revised in Chapter 3
5. Isura Nirmal, Abdelwahed Khamis, Mahbub Hassan, Wen Hu and Xiaoqing
Zhu, “Poster: Combating Transceiver Layout Variation in
Device-Free WiFi Sensing using Convolutional Autoencoder.”
In Proceedings of the International Conference on Information Processing
in Sensor Networks. IPSN ’20.
Under Review
6. Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise
McLaws and Wen Hu, “RFWash: A Weakly Supervised Tracking
of Hand Hygiene Technique.” (Under review in a conference)
Manuscript revised in Chapter 3
7. Isura Nirmal, Abdelwahed Khamis, Mahbub Hassan and Wen Hu, “Deep
Learning for Radio-based Device-free Human Sensing: Recent
Advances and Future Directions” (Under review in a journal)
Statement of contributions of student author
Publication 1: Designing and conducting the experiment and data analysis, and writing the manuscript.
Publication 2: Conducting video attack experiments and writing the relevant section in the manuscript.
Publication 3: Designing and conducting the experiments and data analysis, and writing the manuscript.
Publication 4: Designing and conducting the experiments and data analysis, and writing the manuscript.
Publication 5: Writing the manuscript.
Publication 6: Designing and conducting the experiments and data analysis, and writing the manuscript.
Publication 7: Writing the manuscript.
Awards
1. Best Poster Award IPSN 2020
• Abdelwahed Khamis, Branislav Kusy, Chun Tung Chou, Marylouise
McLaws and Wen Hu, “Poster: A Weakly Supervised Tracking
of Hand Hygiene Technique.” In Proceedings of the International
Conference on Information Processing in Sensor Networks.
IPSN ’20.
Contents
Abstract ii
Acknowledgements iii
Publications v
Awards viii
Acronyms xxi
1 Introduction 1
1.1 Wireless Signals for Medical Sensing ...... 1
1.2 Systems Developed ...... 3
1.2.1 Hand Hygiene Monitoring with Radio Frequency signal . 4
1.2.2 Human Respiratory Biofeedback with WiFi ...... 6
1.2.3 Heartbeat Monitoring with WiFi ...... 7
1.3 Beyond the Applications Considered ...... 8
2 Background and Related Work 9
2.1 Sensing Using Radio Frequency Signals ...... 9
2.2 Foundations ...... 21
2.2.1 RF-based Gesture Recognition ...... 21
2.2.2 Sequence Labelling ...... 23
2.2.3 RF-based Vital Sign Monitoring ...... 24
2.3 Related Works ...... 25
2.3.1 Hand Hygiene Monitoring ...... 25
2.3.2 Breathing Biofeedback ...... 26
2.3.3 Heartbeat Estimation ...... 29
3 Hand Hygiene Monitoring 31
3.1 Motivation ...... 36
3.1.1 Hand Hygiene Monitoring in the Real World ...... 36
3.1.2 RF Sensing for HH monitoring ...... 37
3.2 Background and Technical Motivation ...... 39
3.2.1 Back-to-back Gesture Tracking ...... 40
3.3 RFWash ...... 45
3.3.1 Why deep sequence labelling? ...... 46
3.3.2 RF Measurements ...... 47
3.3.3 Deep Learning Model ...... 51
3.4 Evaluation ...... 56
3.4.1 Goals, Methodology and Metrics ...... 56
3.4.2 Weakly Supervised Gesture Tracking ...... 58
3.4.3 Unseen Domains ...... 62
3.4.4 Comparison with Fully Supervised Gesture Recognition
Deep Learning Models ...... 68
3.4.5 LSTM vs BiLSTM ...... 70
3.5 Related Work ...... 70
3.6 Discussion and Future Work ...... 72
3.7 Conclusions ...... 76
4 Breathing Biofeedback 77
4.1 Overview ...... 82
4.1.1 Experimental Observation ...... 82
4.1.2 WiRelax overview ...... 84
4.2 WiRelax System ...... 85
4.2.1 The impact of Chest Displacement on Sub-carriers’ Phase
Difference...... 86
4.2.2 Modeling the Impact of Displacement on Phase Difference 88
4.2.3 Relative Displacement Estimation ...... 91
4.3 Evaluation ...... 100
4.3.1 Goals, Metrics and Methodology ...... 100
4.3.2 Capturing Breathing Cycles ...... 101
4.3.3 Capturing Complete Breathing Pattern ...... 104
4.4 Related Work ...... 107
4.5 Limitations and Future Work ...... 108
4.6 Conclusions ...... 110
5 Heartbeat Estimation 112
5.1 Motivation ...... 115
5.2 CardioFi ...... 118
5.2.1 Background ...... 119
5.2.2 Preprocessing ...... 121
5.2.3 Sub-carrier Selection ...... 123
5.2.4 Heart Rate Estimation ...... 127
5.3 Evaluation ...... 127
5.3.1 Overall Performance ...... 129
5.3.2 Impact of Parameters ...... 131
5.4 Related Works ...... 133
5.5 Limitations and Future Work ...... 133
5.6 Conclusion ...... 134
6 Conclusions 135
6.1 Summary of Contributions ...... 135
6.2 Future Work ...... 136
6.2.1 Limitations ...... 137
6.2.2 Future Applications ...... 138
References 142
List of Figures
1.1 Hand Hygiene Tracking using Radio Frequency signal ...... 5
1.2 WiRelax System. The WiRelax system leverages WiFi communication
to provide each subject with instantaneous respiratory feedback
during the breathing exercise session. The video demonstration
is available at [1] ...... 6
2.1 Conceptual illustration of the medical sensing applica-
tions ...... 11
2.2 RF is a genuinely anonymous signal: RF ranging data
has much lower resolution than frames from a co-located Kinect
depth camera and represents an alternative signal for genuine
anonymity...... 18
3.1 The alcohol-based handrub procedure recommended by
the WHO [2]. The 9 steps are marked by the labels G1, G2 etc. 32
3.2 Gesture sequence recognition. Conceptual illustration
of the proposed gestures tracking model (bottom row). Unlike
conventional gesture recognition (top row), the proposed model
is trained on unsegmented hand hygiene gestures and predicts
labels for whole sequences of gestures in run-time...... 34
3.3 Timeline of HH technique of a practicing healthcare
worker...... 39
3.4 Back-to-Back versus manually segmented gestures .. 41
3.5 Classification is highly dependent on segmentation quality in
RFgesturerecognitionsystems[4]...... 42
3.6 RFWash is trained on continuous RF samples (A) of HH
gestures and corresponding sequence labels. The model auto-
matically learns which frames correspond to individual gestures
(e.g., G1 vs G2) via ’alignment learning’. In run-time, per-frame gesture predictions (B) are produced and used to estimate the
most likely gesture sequence ...... 46
3.7 Range Doppler frames measurements ...... 48
3.8 Illustration of Range Doppler frame post-processing for
gesture G2. The figure shows the original RD frame, RD frame withbackgroundremoved,andasmoothedversion...... 51
3.9 RFWash Network Architecture. RFWash network has five
convolutions followed by a max pooling layer (2x2) and fully
connected layer, followed by two Bi-directional LSTM layers
and finally a softmax layer. All convolutions are 3 × 3 (the
number of filters are denoted in each box)...... 51
3.10 Concatenation Intuition. We significantly grow samples con-
taining a specific sequence ( [G7] ) by concatenating it with
other sequences in the training set ( [G7,Gx]or[Gx,G7] in the
middle column ). Consequently, G7 is seen in many contexts by the model and enables better learning of radar frames that
correspond to it within the sequence ( [G6,G7] right column) . . 56
3.11 RFWash evaluation for different gesture sequence lengths ...... 59
3.12 Alignment performance w.r.t. training sequence length ...... 60
3.13 The impact of data augmentation...... 62
3.14 Impact of unseen sequence length to the performance
of RFWash. Vertical shaded areas in the figures highlights the
sequence length used in the training...... 64
3.15 Temporal HH gesture alignment. GT: Ground truth. aug:
with data augmentation. w/o: without data augmentation . . . 65
3.16 The benefit of data augmentation. (a) G3 retraction and
protraction result in angular blobs in RD frames (red border in
(c)). (b) and (c) show G3 RD frames predicted by the model
w/ augmentation and w/o augmentation. Image transparency
is inversely proportional to the posterior value for the frame . . 66
3.17 The impact of unseen gesture sequences...... 67
3.18 Healthcare Worker Identification Performance ...... 74
4.1 WiRelax Concept. WiRelax leverages WiFi communication
to provide users with instantaneous respiratory feedback during
breathing exercise sessions. Video demonstration here: [1] . . . 79
4.2 Cycle counting versus instantaneous breath tracking:
Most CSI streams agree on breath cycle counts (three cycles
in the orange segment); however, there is a lack of consensus
about instantaneous breath (i.e., whether the subject is inhaling
or exhaling and at what depth [see the red segment]).
WiRelax addresses instantaneous breath tracking ...... 80
4.3 A sample breathing session...... 82
4.4 Recorded amplitude and phase difference for the breath-
ing signal of Figure 4.3...... 83
4.5 Illustration of WiRelax architecture. WiRelax is meant to
be a framework that supports conscious breathing applications
by providing real-time detailed breathing waveform...... 85
4.6 The effect of object displacement on phase and phase
difference (PD) ...... 86
4.7 Illustrative example annotated with key symbols used
in the model derivation...... 89
4.8 Preprocessing. Preprocessing depicted for a 1-minute segment
of a breathing session (rate 40 bpm). Time is on the x-axis. The
lower row shows the pre-processing sequence applied to a single
sub-carrier (#9) while the upper row shows the pre-processing
effect on all sub-carriers (sub-carriers numbers on y-axis). The
values in the lower sub-figures were scaled to range from 0–1 for
each sub-carrier for visualisation purposes...... 93
4.9 Sub-carriers Correlation : The correlation between all sub-
carriers and the ground truth (GT) for a subject breathing normally.
Matrix rows are ordered by their variance, with the highest
value at the top. Sub-carriers with higher variance show a
better correlation with GT in general...... 94
4.10 Breathing Waveform Estimation ...... 96
4.11 WiRelax breath cycles tracking. (a) estimated relative dis-
placement waveform compared to reference displacement wave-
form. Peaks and valleys positions are visualized on top and
bottom, respectively, slanting lines signify the deviation direc-
tion [as shown in the closeup (b)]. (c) breath tracking accuracy
for various metrics. The red lines denote the medians: 0.21
seconds, 12.2% and 14.4% for cycle timing error, relative timing
error and IER error, respectively. The correlation and root
mean squared error between the estimated waveform and reference
signal are also reported ...... 98
4.12 Estimated waveforms for various breathing patterns.
(a), (c) and (e) show the estimated waveforms for deep breath-
ing, deep & normal breathing and quick breathing sessions, re-
spectively. (b), (d) and (f) show scatter plots of estimated rel-
ative displacement in relation to the true displacement...... 99
4.13 WiRelax biofeedback prototype (demo: [1]) ...... 102
4.14 WiRelax Evaluation. Evaluation of different breathing ac-
curacy metrics with respect to the distance between the user and
the WiRelax system ...... 103
4.15 Breathing cycle estimations from three algorithms: Liu
et al. [5], PhaseBeat and WiRelax. The gray line shows the
ground truth for chest displacement. The solid gray (red) circles
show the true (estimated) peak. The ticks show the time differ-
ence between the estimated and true peak (i.e. the difference in
timings of the red and gray circles). (a) Liu et al. [5] estimates
cycle period using all peaks (small red circles) of selected sub-
carriers. The weighted average of all sub-carriers cycle time de-
termines the final breathing cycle boundaries (marked by large
red circles). (b) PhaseBeat [6] employs inter-peak duration
for a chosen single sub-carrier to estimate the breathing rate. (c)
WiRelax employs a cohort of selected sub-carriers to estimate
the final breathing waveform...... 106
4.16 TensorBeat [7] processing for the same data as Fig. 4.15 . . 106
5.1 Power Spectral Density (PSD) curves for CSI data col-
lected using: omni-directional antennas. The low SNR makes
the heart rate estimation a challenge...... 116
5.2 Heart rate estimated from subcarriers. Actual heart rate
compared to the estimation produced from individual sub-carriers.
At each point in time, different sub-carriers produce estimates
that vary widely (illustrated by the red shaded areas whose
boundaries represent the maximum and minimum estimates at that
point). Even after discarding extreme estimates (darker area),
the range continues to be large...... 118
5.3 The architecture of CardioFi...... 119
5.4 CardioFi preprocessing ...... 122
5.5 Correlation between estimation error and the calculated estima-
tion variance ps ...... 124
5.6 Heart rate estimation from highest scoring sub-carrier
selected using Spectral-based (a) and Variance-based
(b) selection methods...... 125
5.7 The performance of CardioFi versus that of Liu et al.
[8] ...... 128
5.8 Experimental setup scenarios ...... 128
5.9 The performance of CardioFi...... 130
5.10 The impact of CardioFi parameters ...... 131
List of Tables
3.1 Overview of automated HH monitoring systems ...... 36
3.2 Time cost to perform manual labelling and sequence labelling . 44
3.3 Gesture Error Rate (GER) examples ...... 58
3.4 The gesture recognition accuracy of RFWash ...... 58
3.5 HH gesture recognition accuracy with different deep learning
models...... 70
3.6 LSTM vs BiLSTM ...... 70
4.1 Symbols used in the mathematical derivation...... 89
Acronyms
RF Radio Frequency
HH Hand Hygiene
HCW Healthcare Worker
WHO World Health Organization
FMCW Frequency Modulated Continuous Wave
HAI Hospital Acquired Infections
COTS Commercial-Off-The-Shelf
RD Range Doppler
CTC Connectionist Temporal Classification
GER Gesture Error Rate
CSI Channel State Information
PD Phase Difference
IER Inhalation-to-Exhalation Ratio
Chapter 1
Introduction
1.1 Wireless Signals for Medical Sensing
Imagine a future in which wireless devices act as ubiquitous sensors. Now imagine a patient with a chronic disease who must attend periodic follow-up visits with his physician every few months. During these short sessions, a physician must rely on limited observations based on tests and the patient’s responses to assessment questions to make a critical care decision [9, 10]. Fortunately, the physician has access to physiological data from the patient’s in-home wireless sensors that reveals the patient’s daily health state since his last visit. The sensors work from afar without body contact and are able to monitor fine-grained vital signs and acquire information suitable for medical assessment. Based on the data, the physician determines that the patient is experiencing exacerbation and needs to be hospitalised. This decision prevents the patient’s condition from deteriorating any further. However, in the hospital there is a chance of picking up a new infection when the patient’s nurse fails to follow proper hand hygiene practice. Luckily, the in-hospital wireless sensors mounted to soap dispensers passively identify the nurse’s improper HH practice and notify the
healthcare workers (HCWs) of this occurrence.
This research seeks to directly address some of the problems that may arise in this and similar contexts. At the core of this research are systems that perform ubiquitous medical sensing in a contact-free manner and are able to operate efficiently in home and hospital settings. Specifically, we propose instrumenting indoor environments with wireless radio frequency (RF) sensors that can continually monitor residents’ health states from afar without user guidance and more importantly, without invading their privacy.
Today, many research and commercial solutions exist for measuring human health. Wearables have been extensively used in many medical sensing applications. However, wearables may cause inconvenience to the subjects being monitored. In addition to being uncomfortable to wear while sleeping, important sections of the population, such as the elderly, typically abstain from wearing monitoring devices [11, 12]. Additionally, all-time monitoring with wearables cannot be enforced, as valuable information is lost if they are removed; thus, they are ill-suited for continuous long-term monitoring applications. Cameras represent a potential option for contact-free monitoring.
However, cameras also capture auxiliary information that may invade users’ privacy by collecting data about a user’s identity or personally embarrassing activities. Given that the entire medical field is concerned with patient and data privacy [13], visual surveillance solutions for medical sensing may struggle to gain large-scale adoption.
To address these limitations, we have developed sensing systems that can be deployed on top of commercially available wireless devices to translate the incoming wireless measurements to health metrics. To meet our objectives, we exploited wireless signals to develop three medical sensing applications.
Specifically, this research shows how RF signals can be used to: 1) accurately monitor the HH practices of HCWs in healthcare facilities; 2) extract detailed breathing metrics suitable for biofeedback applications; and 3) track heart rate using unmodified ubiquitous wireless devices. The sensing approach adopted in these applications has three key advantages. First, it relies purely on wireless signals for sensing and does not disclose any visual appearance data (unlike the imaging data collected by cameras). Thus, our approach represents an alternative that seeks to protect privacy. As stated above, this is of critical importance in the medical domain, as it allows for in-ward sensing without compromising the privacy of patients. Second, it is contactless, as it does not require physical contact with the human body. Thus, it represents a suitable platform for long-term health monitoring and could potentially be used in longitudinal studies. Third, it relies on RF measurements from commercially available devices rather than bespoke RF equipment. Thus, the devices can be used directly after a simple software/firmware upgrade to the hosting RF hardware.
1.2 Systems Developed
In the course of this thesis, we progressively move from perceptible to imperceptible motion sensing. We begin by presenting a system for tracking HH motions to establish whether a HCW correctly followed the standard nine-step protocol of the World Health Organisation (WHO). Next, we measure fine-grained physiological vital sign parameters (i.e., breathing and heartbeat) from imperceptible chest displacement motions. Common across all the applications presented in the thesis is our exploitation of the effect of human motion on RF signals to perform monitoring. Consequently, we propose several techniques for extracting motions of interest and discarding irrelevant motions and noise depending on the target application. Finally, we highlight the technical contributions of the systems developed.
1.2.1 Hand Hygiene Monitoring with Radio Frequency signal
We present RFWash, the first system to track the HH technique (i.e., the standard nine-step hand-rubbing protocol recommended by the WHO) of HCWs.
Poor compliance with HH in healthcare facilities is associated with hospital-acquired infections (HAI) that can lead to mortalities. Our work represents the first attempt to employ RF sensing to address this issue.
Figure 1.1 illustrates the RFWash Hand Hygiene tracking system. The system processes input RF measurements to identify the hand-rub steps performed by HCWs. The live camera feed is shown for reference and is not used by the system. The notable advantages of employing RF sensing include that: 1) device-free sensing removes the risk of transferring pathogens (a potential risk when using wearables); and 2) privacy is protected, as the measurements are genuinely anonymous and details of users’ visual appearance are not disclosed.
The primary challenge in designing RFWash relates to the fact that HCWs perform the steps naturally in a back-to-back manner, such that all the steps are performed contiguously without any pauses. In the absence of pauses, RF data becomes unsegmentable. Thus, it is challenging to identify when a user starts or ends each of the nine steps using non-visual RF data. The assumption of ‘pauses’ between gestures is ubiquitous among RF gesture sensing systems [14]. It is critical to segment the input before the subsequent classification. To address this issue, we proposed an alternative solution that frames the problem in terms of sequence recognition whereby the complete gesture sequence is
predicted and the segmentation is addressed implicitly.

Fig. 1.1: Hand Hygiene Tracking using Radio Frequency signal (RF sensor: TI’s mmWave)
In Chapter 3, we present the proposed deep learning model and show that the suggested approach enables the accurate tracking of HH gestures. The proposed technique is enhanced by the use of a data augmentation method that significantly reduces the labelling effort required to train the model. We further complement the capabilities of RFWash by demonstrating how we can use the same RF measurements to identify which subject is performing the gesture. This effectively enables HH compliance to be tracked at the HCW level.
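The sequence-recognition framing can be sketched concretely. In a CTC-style model (Connectionist Temporal Classification, per the acronym list), the network emits per-frame gesture posteriors, and a greedy decoder collapses them into a gesture sequence by taking the argmax label per frame, merging consecutive repeats and dropping the blank symbol. The snippet below is a minimal illustration with invented toy posteriors, not RFWash’s actual implementation:

```python
import numpy as np

BLANK = 0  # CTC blank symbol; gestures G1..G9 are labelled 1..9

def greedy_ctc_decode(posteriors):
    """Collapse per-frame gesture posteriors (shape T x C) into a gesture
    sequence: argmax each frame, merge consecutive repeats, drop blanks."""
    sequence, prev = [], None
    for lab in posteriors.argmax(axis=1):
        if lab != prev and lab != BLANK:
            sequence.append(int(lab))
        prev = lab
    return sequence

# Toy posteriors for 6 frames over {blank, G1, G2}: the model is
# confident in G1 first, then blank, then G2, then blank again.
p = np.array([[0.1, 0.8, 0.1],
              [0.1, 0.7, 0.2],
              [0.8, 0.1, 0.1],
              [0.1, 0.1, 0.8],
              [0.1, 0.2, 0.7],
              [0.9, 0.05, 0.05]])
print(greedy_ctc_decode(p))  # [1, 2] -> gesture sequence [G1, G2]
```

No explicit segmentation step appears anywhere: the frame-to-gesture alignment emerges from decoding, which is what makes the approach suitable for back-to-back gestures.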
Fig. 1.2: WiRelax System. The WiRelax system leverages WiFi communication to provide each subject with instantaneous respiratory feedback during the breathing exercise session. The video demonstration is available at [1].
1.2.2 Human Respiratory Biofeedback with WiFi
Next, we demonstrate how contactless tracking based on RF data can be applied to the sub-centimetre-scale motion of the human body. We focus on
breathing tracking and introduce WiRelax, a breathing biofeedback solution
based on WiFi signals. Biofeedback is critical in domains in which breathing
exercises are used in the clinical treatment of a breathing-related disorder, such
as Attention Deficit Hyperactivity Disorder, Chronic Obstructive Pulmonary
Disease and Asthma [15, 16, 17], and in well-being applications.
Figure 1.2 demonstrates our system called the WiRelax. The WiFi receiver
is placed in the vicinity of a user and translates the WiFi channel measure-
ments to instantaneous breathing metrics (pattern and duration) in real time.
Users are informed about their instantaneous breathing performance within
the breathing session, which enables them to engage in fine-grained breath
control during the session.
Extracting instantaneous breathing metrics suitable for biofeedback applications from WiFi channel measurements represented the key challenge in creating this system. Unlike breathing rate, which can be reported based on completed breathing cycles, biofeedback metrics need to be reported instantly as the user is inhaling or exhaling. Thus, our system was designed to report on instantaneous breathing depth and timing while the breathing cycle is ongoing.
We propose a mathematical model (see Chapter 4) that can regress directly on incoming measurements without having to wait for cycle completion, which in turn enables WiFi-based interactive breathing applications.
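The distinction between rate estimation and waveform tracking can be made concrete. The sketch below is an illustrative simplification, not the model derived in Chapter 4: the sampling rate, the mean-based detrending and the simulated signal are all assumptions. It shows the essential point that a phase-difference stream can be mapped to a relative displacement waveform directly, rather than reduced to a per-cycle rate:

```python
import numpy as np

FS = 20  # assumed CSI sampling rate (Hz)

def relative_displacement(phase_diff):
    """Map a CSI phase-difference stream to a relative chest-displacement
    waveform. Here we unwrap and subtract the segment mean; the thesis
    replaces this crude detrending with a model-based estimator."""
    pd = np.unwrap(phase_diff)
    wave = pd - pd.mean()
    return wave / (np.abs(wave).max() + 1e-9)  # scale to [-1, 1]

# Simulate a 30 s session: 0.2 Hz (12 breaths/min) chest motion
# modulating the phase difference, plus a static offset and noise.
rng = np.random.default_rng(0)
t = np.arange(0, 30, 1 / FS)
true_disp = np.sin(2 * np.pi * 0.2 * t)
pd = 1.2 + 0.3 * true_disp + 0.05 * rng.standard_normal(len(t))
est = relative_displacement(pd)
corr = np.corrcoef(est, true_disp)[0, 1]
print(f"correlation with true displacement: {corr:.2f}")
```

Unlike a rate estimate, every sample of `est` carries an inhale/exhale direction and a depth, which is the kind of signal a biofeedback display needs.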
1.2.3 Heartbeat Monitoring with WiFi
Finally, this research demonstrates that RF signals can be used to track a motion that is imperceptible to the human eye: a heartbeat. In line with our previous methods, we used unmodified WiFi devices to enable in-home well-being monitoring. Heartbeats contribute to a surface chest motion that can affect the wireless channel; however, the motion is relatively small compared to that of respiratory motion [18] (see above). This millimetre-level motion corresponds to small RF signal variations that are overpowered by the collocated respiratory motion and even the noise present in the WiFi channel.
Consequently, it is very challenging to make reliable heart rate estimations from RF signals. To improve the RF signal quality and increase the accuracy of estimations, previous works [5, 6] have used directional antennas to enhance hardware. Conversely, we adopt a purely algorithmic approach that made it possible to deploy the system on ‘unmodified’ WiFi devices. As discussed in
Chapter 5, our system, CardioFi, can identify stable channel data streams
(sub-carriers). These data streams can be further processed to yield reliable
heart rate estimations with a median error of 1.1 bpm, outperforming the state-of-the-art approaches by a clear margin.
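As a rough illustration of the idea, a sub-carrier stream can be scored by its spectral power inside the plausible heart-rate band and the strongest streams retained. The function names, band limits and scoring rule below are illustrative simplifications; CardioFi's actual selection criterion (Chapter 5) is more elaborate.

```python
import cmath
import math

def dominant_bpm(signal, fs, lo_bpm=60, hi_bpm=120):
    """Return (bpm, power) of the strongest DFT bin inside the heart-rate band.
    Brute-force DFT for clarity; a real system would use an FFT with windowing."""
    n = len(signal)
    mean = sum(signal) / n
    best = (0.0, 0.0)
    for k in range(1, n // 2):
        f = k * fs / n
        if lo_bpm <= f * 60 <= hi_bpm:
            x = sum((s - mean) * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(signal))
            if abs(x) ** 2 > best[1]:
                best = (f * 60, abs(x) ** 2)
    return best

def select_stable_subcarriers(csi_streams, fs, keep=3):
    """Rank sub-carrier streams by in-band spectral power (a rough proxy for
    'stability') and keep the top few for heart rate estimation."""
    ranked = sorted(csi_streams, key=lambda s: dominant_bpm(s, fs)[1], reverse=True)
    return ranked[:keep]
```

For example, a synthetic 1.2 Hz component sampled at 20 Hz is picked out as 72 bpm, while a flat (uninformative) stream is ranked below it.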
1.3 Beyond the Applications Considered
In this thesis, we focused on leveraging wireless signals from commercial devices to enable novel medical sensing applications. Notably, the techniques developed in the thesis can be generalised to other scenarios. For example, the deep sequence prediction model developed for tracking back-to-back HH gestures (see Chapter 3) addresses the difficulty of input segmentation that is prevalent in other application domains, such as unsegmentable action recognition (see Section 6.2). Additionally, the privacy-preserving advantages of our approach make it suitable for sensitive environments (e.g., hospital wards), and it is applicable to other applications that use the same setup. In Chapter 6, we explore potential medical sensing applications that could use our frameworks and consider the new research questions raised by these applications.
The ongoing exploration of the sensing capabilities of RF devices will continue over the coming years. We believe that these devices will serve as a favoured and reliable modality in real-world medical sensing in the near future.
Chapter 2
Background and Related Work
In this chapter, we discuss the background of the work presented in this thesis.
We consider and re-examine the principles underlying RF signals that make them suitable for sensing applications. Fundamentally, this shows how RF signals can be used in sensing and exposes their capabilities and limitations. Next, we briefly review the technical foundations of the methods we developed for each sensing application. Finally, we review the previous research that informed the sensing applications developed in this dissertation.
2.1 Sensing Using Radio Frequency Signals
In our endeavour to build RF medical sensing systems based on interpreting the impact of human motion on the RF signal, we examined fundamental questions whose answers had a direct effect on the design and applicability of the systems. Specifically, we considered the following questions.
1. Body-Signal Interaction
• How does an RF signal interact with a human body and how can we use this to sense motions of various scales, from the millimetre to
sub-metre levels?
2. Environment-Signal Interaction:
• How does a user’s ambience affect the RF signal and, consequently, how can a system be tailored to home and hospital settings?
3. Privacy Preservation:
• In addition to the main sensing task associated with human motion,
how much visual appearance information (i.e., information about
faces, gender and nudity level) is disclosed by RF sensing?
First, we examined the interplay between the human body and the signal to identify the kinds of motions that can be captured and in what contexts. This provided the foundation for mathematically modelling the effect of human motion on a signal (see Chapter 4). Second, as contact-free sensing is performed, the
RF signal also reflects off ambient objects; thus, we also examined the interaction between the signal and a user’s environment. Dealing with interference from ambient objects in a wireless environment is one of the key challenges facing any RF-based sensing system. Typically, the interference removal procedure depends on a multitude of factors, including the specific RF hardware, the expected usage scenario and the dynamics of the operating environment. Clearly, designing a ubiquitous procedure for interference removal that can work in any environment is beyond the scope of this work.1 Here, we rather focus on the interference tolerance capabilities of various COTS RF platforms with respect to the applications considered. Finally, the privacy preservation of RF sensing was assessed. This question is rarely asked in the sensing literature
1 Limited by the available resources, all the experiments in Chapters 3, 4 and 5 were conducted in a lab environment.
[Figure 2.1 labels: in-home breathing and heartbeat monitoring; in-hospital Hand Hygiene; location A; static reflections; dynamic reflections; reflections relevant to the sensing task.]
Fig. 2.1: Conceptual illustration of the medical sensing applications
and for most systems, it is viewed as less important than sensing quality. However, in medical environments, privacy concerns represent a great barrier to the adoption of sensing modalities that could potentially expose confidential user information. We emphasise that we do not explicitly consider the design of privacy-preserving sensing systems, as this is beyond the scope of the work presented here. Rather, we explore factors that motivate the use of RF signals for sensing in medical environments from a privacy point of view.
By answering these questions, design guidelines were also developed for
each particular system. In the next section, we review the applications we
considered. Figure 2.1 provides a conceptual illustration of the applications.
The first application (see Chapter 3) is an automatic HH gesture tracking application. Currently, the tracking procedure adopted by healthcare facilities
uses manual human observation. The goal is to have automated systems that
watch HCWs’ actions and report on their compliance with the standard hand-
rubbing procedure. Potential automatic monitoring systems need to track
natural human actions and perform sensing in dynamic environments (e.g., hospital wards) in which interfering users may be present. These requirements and operating conditions represent common problems in the sensing area.2
Thus, HH tracking reflects other sensing problems and can be used to examine the suitability of RF sensing in hospital settings. Heartbeat and human breathing monitoring (see Chapters 4 and 5) are the subjects of the other two applications. These key vital signs are the subject of many in-home monitoring systems, as they are excellent indicators of well-being and predictors of a range of health issues. Further, accurately estimating vital signs serves many higher-level applications, such as those directed at apnoea detection, emotion recognition, sleep quality monitoring, stress detection and conversation detection. Together, these are typical examples of in-home medical monitoring that may serve a multitude of medical sensing purposes.
In general, the problems can be considered in terms of: 1) the motion scale, which ranges from the sub-metre level in HH tracking to the millimetre level in heart rate tracking; 2) the operating environment, which ranges from a dynamic clinical setting to the home; and 3) feedback, which needs to be instantaneous in the case of breathing biofeedback but can be relaxed in other applications.
In this section, we address the questions (see above) in light of the applications’ characteristics and requirements. In all the applications, we target commercial RF devices as the hosting platforms of our algorithms; thus, bespoke RF equipment will not be considered. In earlier research that used a single hardware platform, it was possible to analytically quantify this interaction by modelling the effect of human motion on RF signal amplitude and
phase given the signal and motion characteristics. However, it is challenging to extend this approach to the heterogeneous RF platforms on the market due to the vast diversity of platform parameters (i.e., bandwidths, frequencies, signal types, etc.). Consequently, we decided to focus on established physical properties of the RF signal that can sufficiently demonstrate the potential of RF in motion sensing, provide an understanding of its limitations and yield guidelines for designing the three medical sensing systems of interest.
2 One example is activity logging in Intensive Care Units (ICUs). For further discussion on this issue, see Section 6.2, Future Work.
1) Body-Signal Interaction: RF signals interact with materials (including human body surfaces) in a variety of ways; for example, via reflection, absorption and refraction (where the signal propagates within the material). A combination of these typically occurs in a complex way; however, human motion sensing is primarily undertaken using reflection, as a large portion of the RF signal is reflected off the human body [19]. The reflections provide the main sensing tool; they are captured and interpreted in light of the particular motion being monitored.
The nature of this reflection is dictated by the ratio between the wavelength and the size of the reflective surface. The general rule is that RF waves bypass surfaces that are smaller than the wavelength and are otherwise reflected [20]. Focusing on the human body as the monitored object, the precise RF reflection behaviour differentiates between two prominent wavelength categories that exist in consumer RF products today: 1) the centimetre-scale wavelength, to which ubiquitous WiFi and other technologies belong; and
2) the millimetre-scale wavelength that underlies 802.11ad WiFi and a range of embedded radar products that are gradually becoming mainstream in indoor sensing and automotive applications.
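The two categories follow directly from the relation λ = c/f. A quick check (the 77 GHz figure is typical of embedded automotive-grade radars and is included only for illustration):

```python
C = 299_792_458  # speed of light, m/s

def wavelength_cm(freq_hz):
    """Free-space wavelength in centimetres for a carrier frequency in Hz."""
    return 100 * C / freq_hz

# 2.4 GHz WiFi            -> ~12.5 cm  (centimetre scale)
# 60  GHz 802.11ad WiFi   -> ~0.5  cm  (millimetre scale)
# 77  GHz embedded radar  -> ~0.4  cm  (millimetre scale)
```

Everyday reflectors (a torso, a hand) sit on opposite sides of these wavelengths, which is what drives the specular versus sensitive behaviour discussed next.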
The Centimetre-scale Wavelength: In common RF technologies (such as
WiFi), whose wavelengths are on the order of centimetres, the human body reflects the signal in a specular manner (i.e., a mirror-like reflection) [21, 22] rather than diffusely scattering it in all directions. Under specularity, the nature of the reflections varies across different body parts. For moving limbs, reflections may be missed when the signal is deflected away from the receiver. For example, part of the reflections from a user’s moving hand can be deflected away (see location A in Figure 2.1, where only the coloured body parts are observable by the receiver), or reflections from the whole limb can be missed by the receiver, depending on the surface orientation. Thus, it is challenging to obtain continuous measurements from moving limbs. Conversely, as a large reflector with relatively controlled movement, the torso can be monitored to capture continuous reflections. Even if a user is in a mobile state (moving towards the receiver), continuous measurements can be captured from the torso
[22]. Further, the spatial extent of the torso (as the largest reflector in the human body) permits reflections from minute motions to be captured reliably.
To summarise, the torso (as a large spatial reflector) can be monitored for minute motions. We relied on this fact to capture motions associated with vital signs. To do this, we adopted the convention used previously by [23]
[22] to monitor subjects while their bodies face the wireless devices. Additionally, torso reflections are continually available, unlike limbs, which provide only occasional reflections. We used this to continually capture measurements from the torso with minimal missed reflections. This can be leveraged (when the body is quasi-stationary [23]) for interactive sensing applications, such as breathing biofeedback, that require processing and reporting on contiguous measurements.
Millimetre-scale Wavelength: The aforementioned specularity property complicates the process of tracking moving limbs using RF [22][24]. Additionally, centimetre-scale wavelengths lack the sensitivity to track small surfaces of the body, such as micro hand gestures and finger configurations [25][24].
However, millimetre-wave alternatives have proven to be more capable in this regard [26], as they are easily reflected by small objects, including fingertips. Consequently, applications involving intricate hand gestures with complex finger configurations and mirrored gestures (e.g., HH gestures) can benefit from the sensitivity of millimetre waves.
Occlusion Impact: A millimetre wave provides better sensing resolution; however, its signal attenuation can limit its sensing range compared to technologies with longer wavelengths. Generally, the longer the wavelength of an electromagnetic wave, the lower its attenuation [27]. These facts have been turned to the advantage of applications in the RF sensing literature. Through-wall sensing systems employ long wavelengths that bypass obstructions, such as walls. Systems for tracking small objects (e.g., a pen
[20]) that are transparent at centimetre-scale wavelengths employ millimetre-wave signals, which reflect more strongly off these objects. In light of this, centimetre-scale wavelength signals can cover multiple rooms [28] if necessary and operate through potential in-home occlusions, such as thick clothing. In vital sign monitoring applications, this enables effective tracking even if a user is sleeping under a quilt (a scenario that reportedly degrades the accuracy of other contact-free modalities, such as ultrasound) [29].
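The attenuation argument can be made concrete with the free-space path-loss relation FSPL = (4πdf/c)², i.e. 20·log10(4πdf/c) in dB. At the same distance, moving from 2.4 GHz to 60 GHz costs about 28 dB before material absorption (which further penalises millimetre waves) is even counted. A minimal sketch:

```python
import math

C = 299_792_458  # speed of light, m/s

def fspl_db(distance_m, freq_hz):
    """Free-space path loss in dB. Ignores material absorption, which in
    practice adds further attenuation at millimetre-wave frequencies."""
    return 20 * math.log10(4 * math.pi * distance_m * freq_hz / C)

# At 5 m: ~54 dB loss at 2.4 GHz versus ~82 dB at 60 GHz.
```

This frequency-dependent penalty is why longer wavelengths win on range and occlusion tolerance, while shorter wavelengths win on resolution.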
By analysing the motion impact in the reflected RF signal, we can gain insight into the actions that are occurring. Regardless of the mechanism used to analyse the signal, RF sensing indoors is typically challenged by environment-related factors (multi-path propagation) that raise a second question in relation to the effect of the interaction between the signal and the environment.
2) Environment-Signal Interaction: As the signal interacts with the materials in the environment, the operating environment needs to be factored into the RF sensing system design. This may include reflections from walls, furniture and other objects (see the grey paths in Figure 2.1) and reflections from other users engaging in independent actions that cause dynamic interference
(see the green paths in Figure 2.1). The latter are dynamic multi-paths; their effects are more difficult to remove than those of the first type. Thus, the measurements of different reflectors need to be separated. One common approach to mitigating dynamic multi-paths is to employ ranging information from Time-of-Flight (ToF) measurements. Intuitively, reflections from nearby reflectors will reach a receiver before those from farther away. However, signals travel at the speed of light and all reflectors are close to each other in indoor environments. Thus, reflections that are a few feet apart are still difficult to separate and typically require a larger bandwidth. Previous research has shown that it is possible to widen the original WiFi bandwidth through channel stitching [30]. However, this technique is subject to strict timing constraints and the final ranging resolution is limited. As an alternative approach, we used commercial embedded millimetre-wave radars that can export precise ranging measurements.
3) Privacy Preservation: One example of a successful sensing technology that raises privacy concerns in sensitive environments is computer vision. Cameras can perform accurate sensing and tracking; however, due to their resolution, they can easily capture privacy-sensitive information, including information about an individual’s face, gender and skin colour [31].
Recent research efforts have addressed this issue by using various techniques to artificially downgrade the imaging quality and resolution [32, 33]. We show that the intrinsic properties of the RF signal yield lower resolution imaging from the native measurements without the need for any transformation. Many factors contribute to this; however, we discuss two key factors that are best understood when comparing RF systems to cameras. The analogy is valid given that RF and visible light share many of the properties of electromagnetic waves [34].
The first factor is ‘undetectable surfaces’ and is relevant to our earlier discussion on the relationship between wavelength and sensing resolution. As everyday objects are not much larger than the illumination wavelength used in common RF devices,3 many surfaces become undetectable and are missed by RF imaging [35]. This also applies to millimetre waves, whose wavelengths are thousands of times longer than the nanometre-scale waves of visible light.
The second factor is the practical challenge of constructing the ‘dense RF sensor arrays’ required for imaging with high spatial resolution. Visible cameras capture images with high spatial resolution through millions of tiny sensors
(pixels). Today, these are compact enough to fit in our smartphones. In RF systems, these sensors (RF pixels) are antennas with associated complex circuitry [34]. As the size of a sensor has to be comparable to an RF wavelength [34], it is very challenging to have a dense array of RF pixels, even when using a shorter wavelength. To better illustrate this gap, we compare the resolution of modern millimetre-wave RF technology to depth cameras (see Chapter 3 for further details). Figure 2.2 shows a qualitative comparison of the ranging measurements of a millimetre-wave radar system and a depth camera that was employed by [32] as a privacy-preserving alternative to RGB cameras for HH monitoring. The depth camera reveals visual appearance details, such as a user’s body shape and the geometric layout of that user’s ambience. Conversely, reflections from RF signals are sufficient for sensing purposes, such as
identifying a user’s location, without disclosing sensitive information.
3 RF technologies in the microwave frequency range, such as WiFi and Radio Frequency Identification (RFID).
(a) RF ranging heatmap (b) Depth camera range information
Fig. 2.2: RF is a genuinely anonymous signal: RF ranging data has much lower resolution than frames from a co-located Kinect depth camera and represents an alternative signal for genuine anonymity.
Thus, in general, in terms of privacy friendliness, RF sensing is preferable to state-of-the-art contact-free modalities (e.g., vision). Based on the various
RF signal properties we explored above, the following guidelines can be used when considering a hosting RF platform:
• In-home Monitoring: WiFi technologies can be used for vital sign
in-home monitoring, as they are highly ubiquitous and already available
in residential environments. The technology is sufficient to accurately
sense vital sign micro-motions, as the torso is much larger than the signal’s wavelength. Further, the signal reflections can be captured through
potential occlusions; thus, it is possible to employ vital sign monitoring
during sleep and other related applications.
• In-hospital Monitoring: Conversely, commercial off-the-shelf (COTS)
radars with shorter wavelengths (e.g., millimetre-wave radars) are more
practical for sensing the intricate motions of the hand gestures of HCWs.
For example, modern radars provide accurate ToF measurements that
can be instrumental in separating dynamic reflections from interfering
users in a highly dynamic environment. Further, recent millimetre-wave
radars have a small form factor and thus can be used for this kind of monitoring without significant deployment concerns.
While it is possible to use millimetre-wave radars for in-home vital sign monitoring, employing the WiFi platforms already present in today’s home environments is favourable, as it enables large-scale deployment at a much lower cost [5]. On the other hand, WiFi measurements will be less reliable in an in-hospital setting due to interference from secondary users. Such interference can be mitigated by employing precise ranging information, as explained earlier.
We move from the guidelines above and build on foundations from the past literature to address typical RF sensing challenges in the three medical sensing problems. To turn these abstract ideas into actual sensing systems, the following new technical challenges had to be addressed in this thesis:
Sensing Natural Human Behaviour: Assumptions about human behaviour that do not hold in realistic environments have prevented many research systems from being taken out of the laboratory. In the RF sensing literature, the practicality of a gesture recognition system is limited by simplifying assumptions about human behaviour when performing gestures. For example, one gesture recognition system [4] assumes access to perfectly segmented data, which is not attainable for sign language gestures (unless a user pauses after every gesture). In [36], a similar assumption was made that required the user to pause between subsequent gestures to allow accurate segmentation. When operating in conditions in which the simplifying assumptions do
not hold, accuracy degrades significantly (see Chapter 3), leading to sensing deficiencies. In RFWash (see Chapter 3), we first observed the behaviour of a professional HCW and then constructed a system to identify the naturally performed back-to-back gestures, ensuring that users do not have to adopt any artificial behaviours, such as pausing after every gesture. Our approach departed significantly from the two-stage models (segmentation followed by classification) common in the literature, as it framed the problem as sequence learning.
Relying on Consumer Devices: As mentioned above, we focused on accessible technology that can be deployed with minimal effort rather than developing custom hardware. In addition to being a ubiquitous and cost-efficient alternative, recent progress in large-scale manufacturing has made off-the-shelf hardware much smaller. For example, the form factor of TI’s radar, which we used for HH tracking (see Chapter 3), is significantly smaller than the metre-scale non-commercial platforms [37, 38] that employ the same underlying technology. Had we instead used bulky designs to boost sensing performance, their form factor would have created a practical deployment challenge when trying to fit them onto soap dispensers.
Thus, there is an added advantage of employing commercial RF.
Despite these advantages, the employment of consumer devices rather than specifically designed hardware introduces various practical challenges, depending on the actual hardware used. Sometimes a specific, valuable RF measurement type is invalidated. For example, on custom-made WiFi platforms [39], reliable phase measurements, which have been shown to be instrumental in achieving precise motion tracking [20], can be acquired. However, as we will see later
(see Chapters 4 and 5), these measurements are very noisy and unusable on off-the-shelf WiFi hardware. Consequently, we turned to different measurements of phase difference (PD) with different characteristics and used these to model motion impact. Another challenge that arose in using commercial
RF is that the measurements might be available albeit at much lower sampling rates. We encountered this in the HH tracking system (see Chapter 3), as embedded radars export measurements at very low frame rates due to resource constraints. Given this limitation, a time window with a few samples
(e.g., 10) may still contain data from multiple gestures rather than one gesture.
Consequently, our architecture was developed to address this scenario.
We believe in the critical value of developing sensing systems that use the native capabilities of the already popular and accessible RF hardware platforms and designing a sensing workflow that adapts to natural human behaviour.
2.2 Foundations
In this section, we explore the technical foundations on which the methods in the results chapters (Chapters 3, 4 and 5) build. First, we discuss the RF-based gesture recognition and sequence labelling literature, which is relevant to the RFWash system in Chapter 3. Then, we review methods for RF-based vital sign monitoring that were leveraged in Chapters 4 and 5. Here, we only briefly explore the technical foundations in the literature; a thorough treatment of the background material is presented in the individual chapters.
2.2.1 RF-based Gesture Recognition
In a typical RF-based gesture recognition system [4, 40, 41], the measurement stream is processed using a staged sequential approach in which the output of each stage is used as input to the next one. Commonly, three stages are employed: preprocessing, segmentation and classification. In the
first stage, the input measurements are preprocessed to alleviate the impact of noise and interference caused by reflections from static and dynamic objects in the environment. Next, the segmentation stage utilises “silence periods” in the measurement stream to extract segments that contain only the gesture data. The intuition is that in periods where the user is stationary (i.e., silence periods), the variations in the RF measurements will be minimal and can be detected based on an empirical threshold [40]. By extracting the measurements between two consecutive silence periods, segmented gesture samples can be acquired.
Finally, the gesture samples are forwarded to a classifier that predicts the gesture label.
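The silence-based segmentation stage can be sketched as follows; the window length and variance threshold are illustrative placeholders for the empirical values used in systems such as [40]:

```python
def segment_by_silence(stream, win=5, thresh=0.05):
    """Classic two-stage front end: mark low-variance windows as 'silence' and
    return the index ranges between consecutive silences as gesture segments."""
    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # A window is 'active' when its variance exceeds the empirical threshold.
    active = [var(stream[i:i + win]) > thresh
              for i in range(0, len(stream) - win + 1)]
    segments, start = [], None
    for i, a in enumerate(active):
        if a and start is None:
            start = i                              # gesture begins
        elif not a and start is not None:
            segments.append((start, i + win - 1))  # gesture ends at next silence
            start = None
    if start is not None:
        segments.append((start, len(stream) - 1))
    return segments
```

Note that the whole scheme hinges on silence actually occurring between gestures, which is exactly the assumption that fails for back-to-back HH gestures.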
Extensive prior work on RF-based gesture recognition has clearly demonstrated that the RF signal can capture complex hand dynamics. Even when the number of gestures is very high, classification can be performed efficiently on segmented gesture data, as demonstrated by SignFi [4]. Guided by this fact, we revisit
RF-based gesture classification when segmentation is not possible, as is the case for Hand Hygiene gestures. A key outcome of our study is that classifiers pre-trained on segmented samples cannot tolerate even marginal segmentation errors at runtime. Consequently, we explored various alternative approaches for resolving the problem, the most successful of which is inspired by advances in “sequence labelling” (discussed next), which addresses segmentation and classification simultaneously during the learning stage.
2.2.2 Sequence Labelling
In the machine learning literature, sequence labelling refers to tasks in which sequences of data are transcribed with sequences of discrete labels [42].
Speech and handwriting recognition, protein secondary structure prediction and part-of-speech tagging are a few example domains in which sequence labelling is used extensively. The noticeable similarity between speech/handwriting recognition and the requirements of the hand hygiene tracking problem motivates employing sequence labelling techniques. More specifically, we have a continuous, unsegmentable input RF stream that we need to map to the corresponding discrete gesture labels. Thus, the problem can be framed as sequence labelling rather than the traditional segmentation followed by classification. Formulating the problem this way, however, means that information about the start and end times of each individual gesture within the sequence is not accessible to the machine learning model. In other words, the problem is further complicated by the fact that the alignment between inputs and gesture labels is unknown.
We leverage weakly supervised sequence labelling [43], in which the learning algorithm learns to determine the locations as well as the identities of the output labels within a continuous input sequence. At runtime, the algorithm assigns likelihoods to possible alignments, which are then used to infer the most likely gesture sequence. This aligns very well with the problem requirements, as it allows us to train the system on unsegmented RF sequences; thus, the difficulty of segmentation is bypassed. We support this learning process with a novel data augmentation scheme to tackle the problem of limited training data.
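For illustration, the simplest decoder used with such weakly supervised models is best-path decoding: take the most likely label per frame, collapse consecutive repeats and drop the “blank” symbol. This toy version (the label names are hypothetical, and real systems typically use beam search over alignment likelihoods rather than a greedy path) conveys how a gesture sequence is recovered without explicit segmentation:

```python
BLANK = "-"  # the 'no gesture' symbol that absorbs transition frames

def greedy_ctc_decode(frame_probs, labels):
    """Best-path decoding: per-frame argmax, collapse repeats, remove blanks.
    frame_probs is a list of per-frame probability vectors over `labels`."""
    path = [labels[max(range(len(p)), key=p.__getitem__)] for p in frame_probs]
    decoded, prev = [], None
    for sym in path:
        if sym != prev and sym != BLANK:
            decoded.append(sym)   # new (non-blank) label starts here
        prev = sym
    return decoded

labels = [BLANK, "G1", "G2"]
probs = [[.1, .8, .1], [.2, .7, .1], [.9, .05, .05],
         [.1, .8, .1], [.1, .2, .7], [.1, .1, .8]]
# path G1 G1 - G1 G2 G2 decodes to the gesture sequence G1, G1, G2
```

The blank symbol is what lets the model emit the same gesture twice in a row while still merging repeated frames of a single gesture.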
2.2.3 RF-based Vital Sign Monitoring
The success of RF-based vital sign monitoring was popularised by radar-based systems like Vital-Radio [23]. It works by transmitting a low-power wireless signal and measuring the time it takes to travel to the human body and reflect back to the receiving antennas. The FMCW radar used in Vital-Radio enables such highly accurate distance tracking that even the slight, periodic chest displacements caused by the user’s breathing and heartbeat can be monitored accurately. In this way, the system can be fitted into a home environment to turn it into a smart environment that monitors the residents’ vital signs.
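The FMCW principle can be stated in one line: the received reflection is mixed with the outgoing chirp, producing a beat tone whose frequency is proportional to the round-trip delay, so the range is R = c·f_b/(2S) for chirp slope S. A sketch with illustrative parameters (the chirp figures below are hypothetical, not Vital-Radio’s):

```python
C = 299_792_458  # speed of light, m/s

def fmcw_range_m(beat_freq_hz, slope_hz_per_s):
    """FMCW ranging: beat frequency f_b maps to range R = c * f_b / (2 * S)."""
    return C * beat_freq_hz / (2 * slope_hz_per_s)

# Hypothetical chirp: 4 GHz swept in 40 microseconds.
slope = 4e9 / 40e-6
# A 2 MHz beat tone then corresponds to a reflector ~3 m away.
```

Because range is linear in beat frequency, the millimetre-scale chest displacement of breathing shows up as a tiny, periodic modulation of the measured range, which is what the radar tracks.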
Despite the proven success of FMCW-based vital sign monitoring systems like Vital-Radio, another line of research has focused on utilising the ubiquitous wireless devices that already exist in home environments [5, 8]. In particular, WiFi devices are used as vital sign monitors by extracting breathing and heartbeat information from the channel measurements (CSI). Although distance measurements cannot be directly estimated from WiFi CSI, the theory of operation is very similar to that of Vital-Radio. As the signal is reflected by the human body before reaching the receiving antenna, variations in the receiver’s CSI sub-carriers will capture vital-sign-related periodic displacements. Methods were developed to de-noise the measurements and extract the vital sign information using frequency analysis [5, 8]. WiRelax
(Chapter 4) and CardioFi (Chapter 5) build on these methods and upgrade their capabilities to accommodate new application scenarios, such as heartbeat monitoring without specialised antennas and ongoing breathing cycle monitoring for biofeedback.
2.3 Related Works
In this section, we summarise related works to define the scope and context of the developed medical sensing applications. We then revisit and complement this review by elaborating on the technical differences between our work and related works in the corresponding chapters.
2.3.1 Hand Hygiene Monitoring
The first application we consider is tracking the hand hygiene technique, in which healthcare workers perform nine standard gestures. The current gold standard for monitoring is human auditing. However, in addition to costing the community millions of dollars annually, it captures only a small fraction of hygiene opportunities.
To address this issue, previous efforts to automate HH monitoring have focused on:
• Dispenser Usage Monitoring: This technique simply tries to determine how often HCWs use soap dispensers compared to expectations. In hospitals, indirect estimation methods, such as detergent consumption and electronic counters [44], have been employed. Additionally,
direct monitoring using wearable Radio Frequency Identification (RFID)
[45] or Bluetooth tags has also been tested. When such technologies are
employed, HCWs are recorded as having performed an HH action when
their tags are detected by a tag reader-enabled dispenser. However, given
that the use of a dispenser does not necessarily mean that the correct
technique was followed, these approaches do not provide feedback about
the quality of the actual HH procedure.
• Hand Hygiene Technique Monitoring: Under this group, a system tracks an individual’s actual hand-rubbing steps (the nine gestures specified by the WHO). This fine-grained tracking differs substantially
from simply tracking dispenser usage. Wearables [46], depth [3] and RGB
cameras [47] are able to conduct the tracking. However, due to privacy
concerns associated with camera imaging and the possibility of transfer-
ring pathogens using wearables, these systems do not have the potential
of working in hospital wards.
Our system falls into the second category. In our RF setup, we track the hygiene steps by capturing reflections from subjects' hands in a contactless way.
The fact that the gestures are performed with interlocked hands, involve subtle motions and are often mirrored introduces challenges to contact-free systems. Previous contact-free efforts relied primarily on detailed visual appearance features of hands from RGB [47] or depth [48] cameras.
However, our research appears to be the first to employ RF for HH tracking.
Unlike video-based techniques, RF measurements cannot capture detailed visual appearance, such as hand shape or orientation. Our algorithms instead rely on range and velocity information extracted from subjects' hand reflections. The algorithms also deal with the realistic requirements of operating inside clinical environments (e.g., HCWs performing the steps back-to-back, and staff and other subjects being actively present in the vicinity of the main subject).
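As background for how range can be extracted from such reflections: in an FMCW radar, a reflection's beat frequency is proportional to the target's range. The sketch below illustrates only this standard textbook relation; the parameter values in the usage note are hypothetical and are not those of any radar used in this thesis.

```python
def fmcw_range(beat_freq_hz, sweep_bandwidth_hz, chirp_duration_s):
    """Standard FMCW ranging relation: a chirp of bandwidth B swept over
    duration T_c has slope S = B / T_c, and a target at range R produces a
    beat frequency f_b = 2 * R * S / c.  Inverting gives
    R = c * f_b * T_c / (2 * B)."""
    c = 3e8  # speed of light in m/s
    return c * beat_freq_hz * chirp_duration_s / (2 * sweep_bandwidth_hz)
```

For instance, with a hypothetical 4 GHz sweep over 50 microseconds, a beat frequency of roughly 267 kHz corresponds to a target about 0.5 m away, i.e., a hand near a dispenser-mounted sensor.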
2.3.2 Breathing Biofeedback
In our second application, we used WiFi to track human breathing for breath control applications. Under this system, a user can track her breathing instantly in order to perform a breathing action (inhaling/exhaling/retention) that aligns with a specific breathing exercise. The benefits of breath control range from the development of physical and mental well-being [49, 50] to the treatment of illnesses such as hypertension and arrhythmias [51]. The efficacy of breath control training has led to its use in non-drug medical treatments. For example, Resperate [52] (a medical device recommended by the American Heart Association [53] for lowering blood pressure) is based entirely on episodes of slow breathing exercises.
To allow users to achieve breath control, methods have been developed to inform them about their breathing activity. Conventionally, devices such as respiratory inductance plethysmography (RIP) are used to accurately track respiration in clinical settings. As the device requires two sensory bands around a patient's torso to record the breathing motion and a specialist to operate it, it is ill-suited for in-home use. Consequently, researchers have investigated various methods to achieve ubiquitous respiration tracking. The large body of work concerned with the actual tracking of breathing activity can be divided into two groups:
• Breathing Rate Monitoring. In this group, the periodicity of breathing-induced motion is analysed to infer the breathing rate. This topic has been the subject of extensive research and has been examined using a wide range of technologies. The common "body surface sensors" approach emulates the RIP principle by attaching sensors to the chest/abdomen to record breathing rates. For this purpose, devices such as chest bands [54], smartphones' inertial sensors [55] and smartwatches [56] have been used. Today, some of these methods are already present in consumer products. For example, SpireHealth [57] and VitaliWear [58] use accelerometers attached to underwear to track breathing rates. Conversely, contactless approaches have made considerable progress despite having slightly lower accuracy than body surface sensors. In this vein, respiration monitoring has been conducted using cameras and video amplification techniques [59] and by determining the impact of chest motion on ultrasonic signals [60] and wireless signals [23]. Despite the success of the previously discussed technologies, breathing rate is a summary statistic that represents the number of breaths per minute and does not reveal the breathing pattern details that are needed in many applications, including biofeedback.
• Breathing Biofeedback. In this group, the goal is to provide users with instant, fine-grained breathing metrics during breathing training sessions. Technologies similar to those reviewed in the first category have been used to obtain biofeedback. The Prana wearable [61] is used for diaphragmatic (i.e., abdominal) breathing training. MindfulWatch [62] uses a smartwatch to track respiration from a user's wrist and reports breathing cycle timing for meditation sessions. Subsequent to our work, Breeze [63] showed that smartphones' microphones can be used to detect breathing phases in gamified, biofeedback-guided breathing training.
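The periodicity analysis underlying the first category can be made concrete with a minimal sketch: locate the dominant spectral peak of the motion signal inside a plausible respiration band. This is a generic illustration under our own assumptions (band limits of 0.1-0.5 Hz and a naive DFT), not the algorithm of any cited system.

```python
import math

def breathing_rate_bpm(samples, fs):
    """Estimate breathing rate by locating the dominant spectral peak inside
    the typical respiration band (0.1-0.5 Hz, i.e. 6-30 breaths per minute).
    Naive DFT for clarity; a real system would use an FFT library."""
    n = len(samples)
    mean = sum(samples) / n
    centred = [s - mean for s in samples]
    best_freq, best_power = 0.0, -1.0
    for k in range(1, n // 2):
        f = k * fs / n
        if f < 0.1:
            continue
        if f > 0.5:
            break
        re = sum(c * math.cos(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        im = sum(c * math.sin(2 * math.pi * k * i / n) for i, c in enumerate(centred))
        power = re * re + im * im
        if power > best_power:
            best_freq, best_power = f, power
    return best_freq * 60.0
```

Note that this recovers only the rate, a per-minute summary statistic; it says nothing about where the user currently is within a breathing cycle, which is exactly the limitation biofeedback systems must overcome.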
Our work (Chapter 4) belongs to the second category but also builds on the WiFi sensing literature [5, 6, 64, 65] from the first category. The major target is to provide a contact-free, inexpensive breathing biofeedback solution that can operate on top of commercial WiFi devices without requiring specialised hardware. The algorithms employed by past WiFi sensing work [5, 6, 64, 65], however, cannot be extended to biofeedback, as they use various techniques to analyse completed breathing cycles. Conversely, our system tracks a user's ongoing breathing cycle and reports the instant progress on a 'breath-by-breath' basis so that the user can exert breath control actions. For example, the system might guide a user to stop exhaling after a fraction of a second (e.g., 0.25 seconds) and start inhaling afterwards. In addition to timing, we need to inform users of their breathing depth. For this purpose, we developed a novel model that recognises instantaneous breathing pattern parameters (including depth, timing and inhalation-to-exhalation ratio) by regressing directly on incoming wireless measurements, making it suitable for guiding users in breathing exercise sessions.
2.3.3 Heartbeat Estimation
From serving as an indicator of a range of important health issues [66, 67] to optimising exercise to reach fat-burning zones [68, 69], heart rate has long been used as a primary vital sign. Outside hospitals, the estimation of heart rate has been the focus of many ubiquitous healthcare sensing systems, which have used smartphone cameras and inertial sensors [70]. At the same time, various research groups have used RF platforms to investigate contactless monitoring.
Fundamentally, contactless heart rate monitoring is achieved by analysing signal reflections produced by ballistocardiographic (BCG) motion. BCG refers to movements of the body synchronous with the heartbeat due to ventricular pump activity [71]. Previous research has shown that the periodic chest vibrations caused by BCG reflect the heart rate [72]. This fact has been exploited by various RF systems, including FMCW radars [23], WiFi [5] and RFID. Similarly, our system analyses signal reflections from minute chest displacements to estimate heart rate. We were inspired by sensing systems that capture heart rate from WiFi channel measurements [65] [6]. A main limitation of these systems is that they use a directional antenna to amplify the feeble impact of the motion on the noisy wireless measurements. Such antennas are far less common in residential WiFi devices than omnidirectional ones. Unlike previous research, CardioFi (see Chapter 5) does not employ special antennas to boost the signal-to-noise ratio. In this scenario, as we show later, the data streams (sub-carriers) become very noisy, and fusing them directly, as per previous techniques [65] [6], yields inaccurate estimations. We make the experimental observation that a few of the data streams still reflect the actual heart rate and propose a novel sub-carrier selection scheme to identify and discard the other, noisy streams early in the processing. Ultimately, CardioFi brings heart rate estimation capabilities to unmodified WiFi devices.
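To make the general idea of sub-carrier selection concrete, the sketch below scores each stream by the fraction of its spectral power that falls inside a plausible heart-rate band and keeps the best few. This is our illustrative reconstruction of the general idea, not CardioFi's actual selection scheme; the band limits, scoring rule and naive DFT are all assumptions.

```python
import math

def band_power(samples, fs, lo, hi):
    """Sum of DFT bin powers of `samples` whose frequency lies in [lo, hi] Hz
    (naive DFT for clarity; a real implementation would use an FFT)."""
    n = len(samples)
    total = 0.0
    for k in range(1, n // 2):
        f = k * fs / n
        if lo <= f <= hi:
            re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
            total += re * re + im * im
    return total

def select_subcarriers(streams, fs, keep=3, band=(1.0, 2.0)):
    """Rank sub-carrier streams by the fraction of their spectral power that
    falls in a plausible heart-rate band (1-2 Hz, i.e. 60-120 bpm) and keep
    the top `keep`; streams dominated by out-of-band noise are discarded."""
    def score(stream):
        full = band_power(stream, fs, 0.0, fs / 2)
        return band_power(stream, fs, band[0], band[1]) / full if full > 0 else 0.0
    ranked = sorted(range(len(streams)), key=lambda i: score(streams[i]), reverse=True)
    return ranked[:keep]
```

In this toy form, a stream carrying a clean 1.2 Hz heartbeat component scores near 1 and is retained, while a stream dominated by slow body motion scores near 0 and is dropped before fusion.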
Chapter 3
Hand Hygiene Monitoring
Healthcare Associated Infections (HAIs) find their way to one in twenty-five patients admitted to hospitals [73] and lead to increased patient mortality and healthcare costs [73]. Proper hand hygiene protocol (i.e., frequent and thorough hand cleaning) is an effective way to combat HAIs [74]. This leads to the question of how one can monitor hand hygiene (HH) adherence in a hospital environment.
The conventional approach to HH adherence monitoring is to employ a team of observers (e.g., overt, trained nurse auditors) to record Hand Hygiene Opportunities (HHOs) and the number of times health care workers (HCWs) comply with the protocol. Today, this is considered by the World Health Organization (WHO) to be the gold standard for measuring compliance.
To date, attempts to implement automated alternatives for monitoring HH have had limited success. For example, electronic counters [44] and RFID [45] simply count hand washing activities. These tools provide a very limited picture of HH adherence. They cannot reveal whether the hand hygiene technique (such as the nine-step procedure for applying alcohol-based handrub recommended by the WHO [2]; see Figure 3.1) has been thoroughly adhered to.

Fig. 3.1: The alcohol-based handrub procedure recommended by the WHO [2]. The nine steps are marked by the labels G1, G2, etc.

Although there are commercial camera systems for training HCWs to learn the correct HH technique, to the best of our knowledge, no solution exists for the automated monitoring of the HH technique in healthcare facilities.
In this chapter, we use commercial off-the-shelf mmWave RF sensors to monitor the HH technique in Figure 3.1. Our vision is to embed these sensors in the alcohol-based handrub dispensers, which are distributed throughout hospitals, to monitor whether HCWs who perform hand rubbing have adhered to the HH technique. This vision enables much more fine-grained monitoring of HH adherence.
The HH technique in Figure 3.1 comprises nine different hand movement patterns. Recently, major progress has been made in gesture recognition using radio frequency (RF) signals [4]; however, HH gesture monitoring presents unique challenges. First, six of the nine steps of the hand-rub technique are very similar, as they comprise motions in which the left and right hands are mirrored. Second, some gestures are performed with the two hands interlocked. Finally, the entire procedure is performed without a pause between consecutive gestures.
Contiguous sequences of gestures have not been previously investigated in the
RF sensing literature. In fact, previous RF-based sensing approaches [4] rely on pauses between gestures that are employed as physical markers to identify the start and end of each motion segment. This approach trivially achieves accurate segmentation and the problem reduces to gesture classification. In the absence of enforced pauses, joint segmentation and classification becomes a challenging task.
Back-to-back gestures with no pauses defy traditional segmentation techniques. Due to the significant interdependence between segmentation and subsequent recognition, poor segmentation (see Section 3.2.1) deteriorates classification performance. Thus, the traditional approach cannot be adapted to HH tracking. The challenge of RF-based contiguous gesture recognition has been recognised in prior research [4, 75]; however, to the best of our knowledge, no attempts have been made to address it. For example, the WiFi-based sign language recognition system SignFi [4] sidesteps the segmentation issue by assuming that "manually segmented" single-gesture samples can be acquired. This assumption is unrealistic, and the problem was posed this way because considering contiguous gesture recognition "introduces many challenges" [4].
In this work, we address the problem by introducing RFWash, a segmentation-free approach for recognising back-to-back HH gesture sequences. We drew inspiration from modern end-to-end speech recognition systems, which face a similar problem because it is difficult to label continuous speech data. Of particular relevance are weakly supervised methods that can learn directly from data without requiring explicit data segmentation and full annotation. To this end, we developed a model that can be trained on back-to-back gesture sequences (see Figure 3.2) without requiring gesture segmentation, which also reduces labelling overhead substantially.
Fig. 3.2: Gesture sequence recognition. Conceptual illustration of the proposed gesture tracking model (bottom row). Unlike conventional gesture recognition (top row), the proposed model is trained on unsegmented hand hygiene gestures and predicts labels for whole sequences of gestures at run-time.
A straightforward adaptation of sequence learning, however, does not work for long HH gesture sequences. Long training sequences pose two major challenges that RFWash needs to overcome. First, working with longer sequences leads to fewer training data points, as a fixed-size training set gets split into fewer sub-sequences in proportion to the sub-sequence length. Second, the number of possible alignments of a minimal gesture label sequence within an RF HH data sequence grows exponentially with the sequence length [76]. Ultimately, the problem becomes ill-posed and results in poor alignment (see Section 3.4.3). To address this issue, we used data augmentation to significantly increase the number of training samples without modifying the sequence content, which significantly improved sequence learning.
We make the following contributions in this chapter:
1. We propose and implement RFWash1, which is the first RF-based system for device-free monitoring of the nine-step HH technique.

2. We characterise the challenges of recognising back-to-back HH gestures using an RF-based gesture recognition processing pipeline. In particular, the lack of pauses between gestures makes segmentation difficult, which, in turn, affects the performance of the subsequent classification component.

3. We propose a new sequence learning approach that performs segmentation and recognition simultaneously. Consequently, the model can be trained using continuous streams of minimally labelled RF data corresponding to naturally performed hand-rub gestures. We further extend the approach with a novel data augmentation technique to enable training on longer segments that are less labour-intensive to label.

4. We extensively evaluate the performance of RFWash using a dataset of 1,800 gesture samples collected from ten subjects over 3 months and show that RFWash achieves a low Gesture Error Rate (GER) of 7.41% and a low gesture timing error of 1.8 seconds, using weakly-labelled sequences of 10 seconds in length.
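By analogy with the word error rate used in speech recognition, a gesture error rate can be computed as the edit distance between the predicted and reference gesture sequences, normalised by the reference length. The sketch below assumes this standard WER-style definition; it is an illustration, not necessarily the exact formulation used in the evaluation.

```python
def gesture_error_rate(ref, hyp):
    """Levenshtein edit distance between the reference and predicted gesture
    sequences, normalised by the reference length (the gesture analogue of
    word error rate)."""
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i  # delete all remaining reference gestures
    for j in range(n + 1):
        d[0][j] = j  # insert all remaining predicted gestures
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[m][n] / m
```

For example, predicting [G1, G3, G4] against the reference [G1, G2, G3, G4] is one deletion out of four gestures, a GER of 0.25.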
Table 3.1: Overview of automated HH monitoring systems.
Work                       Contact-Free   Hygiene Tech.   Inside Wards
Electronic Counters [44]
RFID [45]
Wearable [46]                                              (pathogens)
RGB Camera [47]                                            (privacy)
Depth Camera [3]                                           (privacy)
Depth Camera [48]                                          (privacy)
Proposed (RF sensor)
3.1 Motivation
3.1.1 Hand Hygiene Monitoring in the Real World
An ideal automated system for monitoring HH compliance should be able to detect attempts by HCWs to perform hand-rub procedures, to track HH opportunities and to establish compliance rate baselines. Additionally, such a system should monitor fine-grained parameters of the HH technique (Figure 3.1) itself. Such information can provide useful insights and help establish the compliance rates of healthcare facilities. The system must be capable of running unattended in real-world healthcare facilities. A previous study of 789 clinicians in a 380-bed tertiary hospital [77] showed that automated HH training systems have a limited effect on HH compliance as they do not operate inside wards. The same study showed that direct human observation improved compliance, which is explained by the Hawthorne effect (i.e. the fact that
1Despite the name, RFWash tracks the nine-step Alcohol-Based Hand Rub (ABHR) [2] rather than hand washing. Hand washing with soap and water follows a 12-step procedure that has additional steps for rinsing and drying the hands.
humans change their behaviour when they believe they are being watched).
The benefits of an automated monitoring system that evaluates HH in-situ are therefore twofold: it will lead to improved compliance by reducing the Hawthorne effect, and it will provide quantitative data about hygiene quality within the healthcare facility.
Despite the advent of machine learning algorithms for vision-based systems, the gold standard for assessing HH in clinical facilities remains direct human observation; however, such observation can only monitor a small fraction of HHOs [45]. A complete and automated HH monitoring system has yet to be realised. Table 3.1 surveys the key characteristics of current research-based and commercial automated solutions. Existing solutions only perform well on one or two aspects. More importantly, no solutions exist for monitoring the hand-washing/rubbing process (i.e., the nine-step HH technique recommended by the WHO) in hospital wards.
3.1.2 RF Sensing for HH Monitoring
As mentioned earlier, RFID and electronic counters are limited in the quality of information they can capture about hand hygiene. Practical evaluations have revealed that RFID can miss more than 80% of hygiene events. To increase localisation accuracy, a network of cameras that tracks staff inside hospitals has been proposed [3]. However, neither approach is able to track the actual hand-rub technique.
No commercial solutions currently exist to track hand-rub techniques in hospital wards; however, research solutions based on camera technology have been proposed [47, 48]. As privacy regulations, such as the Health Insurance Portability and Accountability Act and the General Data Protection Regulation, limit the use of cameras in healthcare settings [3], camera-based systems employ image anonymisation techniques. One such example uses a depth camera that conceals colour information, as each pixel value in a depth image represents the distance between the pixel and the camera instead of a colour. However, the use of a depth camera alone does not provide sufficient privacy guarantees. Despite careful control of the cameras' field of view and reduced image resolution [32], the images may still capture detailed views of the visual appearance of individuals that can be used to track them and invade their privacy.
Figure 2.2 compares the RF signal data of the TI mmWave radar used in this work to the depth data of a co-located Kinect depth camera. The camera was mounted so as to prevent the subject's face from being captured. Both devices provide ranging information (i.e., how far objects are from the sensing device); however, the RF heatmap captures significantly less personal information, which greatly reduces the risk of privacy intrusion. We believe that RF sensing can contribute to many other privacy-sensitive healthcare applications, such as Intensive Care Unit (ICU) activity logging [32, 33].
Further, the mmWave radar has advantages over other RF sensing technologies, such as WiFi and RFID, including that: 1) the mmWave radar is self-contained, as it does not require two communicating parties, tags, or antennas; 2) it can be ubiquitous, as its form factor and low cost lower the barrier to technology uptake; and 3) it has better spatial resolution, which allows the filtering out of irrelevant motions that are often present in the real world due to other people or equipment. Together with its privacy-protection property (see the discussion above), these advantages make the mmWave radar an ideal candidate for large-scale adoption in real-life healthcare facilities.
3.2 Background and Technical Motivation
Fig. 3.3: Timeline of HH technique of a practicing healthcare worker.
HCWs are expected to execute the HH protocol on appropriate occasions at work (e.g., before and after touching a patient). Hospitals facilitate this protocol by placing soap or alcohol-based hand-rub dispensers at many easily accessible places inside and outside the hospital wards. HCWs are expected to follow a standard hand cleaning procedure to ensure their hands are thoroughly cleaned. For example, the WHO recommends the HH technique shown in Figure 3.1, which should take an HCW 20–30 seconds to complete.
To better understand the current state of HH practices in healthcare environments, we conducted face-to-face interviews with active HCWs at the Prince of Wales Hospital in Sydney. During the interviews, we asked the HCWs to show us how they would typically execute the hand-rub procedure. We used a camera to record the process2 and analysed the recordings to obtain the hand-rub gesture sequence and timing information. As Figure 3.3 shows, the real-life gesture sequence diverges from the ideal expected sequence shown in Figure 3.1, in which the gestures G1, G2, ..., G9 are executed consecutively. Instead, Figure 3.3 shows that gestures are repeated and do not occur in the expected order. For example, one HCW continued to rub her hands
2Ethical approval was granted by the University of New South Wales (Approval Number HC180818).
until all the alcohol had dried off her hands. Further, variation in the timing of each gesture was also observed. This simple example illustrates the intricacies involved in hand rubbing. We expect significant deviation between real-world hand rubbing and the ideal protocol.
Based on the above observations, the first goal of RFWash is to accurately track the sequence of gestures performed by HCWs. The recorded sequence can then be compared to the expected set of gestures {G1, G2, ..., G9}. Additionally, RFWash tracks timing information to help assess compliance against the 20–30-second duration guideline. A more complex compliance analysis based on the gesture sequence and timing information could also be undertaken, but this is beyond the scope of this work.
3.2.1 Back-to-back Gesture Tracking
In this section, we explore the limitations of existing RF gesture processing algorithms in the HH scenario.
Popular RF gesture recognition approaches follow a two-stage architecture, in which a detection/segmentation step is followed by a recognition step [4, 36, 40, 41]. A critical assumption is that the RF time-series can be divided into segments, each of which contains only one gesture. Thus, a classifier can be trained and tested on these well-separated segments.
Typically, segmentation is conducted in one of two ways:
• Gestures are naturally segmentable. Users introduce a brief pause before and after performing each gesture [40] to make the detection of the start and end of individual gestures simpler (see Figure 4 in [40]). Training samples either contain only relevant gesture data [40, 75] or the gesture data with additional samples that represent "no gesture" [78]. At run-time, a segmentation module automatically segments gestures using the no-motion or "silent" periods.

(a) Back-to-Back Gestures. Top: Doppler measurements of contiguously performed hand-rub gestures. Bottom: the differentiated principal component of the measurements. The vertical lines mark the start and end time of each gesture.

(b) Manually Segmented Gestures: examples of manually segmented sign language gestures (samples taken from the dataset of [4], collected in a laboratory environment, and processed using PCA).

Fig. 3.4: Back-to-back versus manually segmented gestures.
• Users annotate continuous gestures manually. Applications such as sign language recognition do not have naturally segmentable gestures, and the automated segmentation step of the previous approach fails. This limitation can be overcome by manual segmentation [4], i.e., the manual extraction of segments, each of which contains one gesture. The key drawback of such an approach is the high labour intensity of the manual segmentation effort. Moreover, the labelling of RF signals is not intuitive, which can introduce more errors than more natural modalities, such as video or audio. Figure 3.4b provides an example of sign gestures in this category.

(a) Segmentation error. (b) Impact of segmentation error on accuracy. (c) Confusion matrix, manual segmentation. (d) Confusion matrix, segmentation error: 0.5 → 1 s.

Fig. 3.5: Classification is highly dependent on segmentation quality in RF gesture recognition systems [4].
Why do we propose to use a segmentation-free approach? Figure 3.4a shows the Doppler measurements (top graph) and their differentiated principal component (bottom graph) for a real-life execution of the HH technique. The vertical lines in the bottom graph show the correct gesture boundaries. The figure shows that gesture boundaries are sharp, with a minimal period of "no gesture" samples in between. Therefore, threshold-based segmentation [75] fails to recognise gesture boundaries. Consequently, most segmented sequences contain RF signatures from multiple gestures, and a classifier trained on such data will perform poorly.
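The failure mode can be made concrete with a toy sketch of threshold-based segmentation (threshold and energy values are illustrative, not measured): each maximal run of frames whose motion energy exceeds a threshold is taken as one gesture, so without silent gaps all gestures merge.

```python
def threshold_segment(energy, thresh):
    """Classic threshold-based segmentation: each maximal run of frames whose
    motion energy exceeds `thresh` is taken to be one gesture; runs are
    separated by low-energy 'silent' gaps.  With back-to-back gestures there
    are no silent gaps, so all gestures collapse into a single segment."""
    segments, start = [], None
    for i, e in enumerate(energy):
        if e > thresh and start is None:
            start = i                      # a gesture run begins
        elif e <= thresh and start is not None:
            segments.append((start, i - 1))  # the run ends at a silent frame
            start = None
    if start is not None:
        segments.append((start, len(energy) - 1))
    return segments
```

On a paused-gesture energy trace such as [0, 5, 6, 0, 0, 7, 8, 0] this recovers two segments, but on a back-to-back trace such as [0, 5, 6, 7, 8, 5, 6, 0] it returns one merged segment spanning both gestures.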
The impact of segmentation errors. To quantify the errors due to inaccurate segmentation, we applied the SignFi algorithm [4] to RF traces of HH gestures. The algorithm uses a deep CNN architecture originally designed to classify 276 sign language gestures, which we adapted to better suit our application scenario3. We evaluated SignFi on our dataset of the naturally performed HH technique from ten subjects, using manually segmented samples that contain exactly one gesture per segment4. Using two-second Doppler-range measurements and session-based cross-validation (see Section 3.3.2 for details), we obtained an accuracy of 83.3%. The confusion matrix (see Figure 3.5c) shows that the accuracy is more than 79% for most gestures, except for some of the mirrored gestures, i.e., G6/G7 and G9. Anecdotally, the RF signatures of (G6, G7) and of (G8, G9) are similar to each other and are more likely to result in incorrect classification.

To investigate the effect of segmentation error, we deliberately allowed segments to contain a few samples from neighbouring gestures, while ensuring that the majority of the samples in a segment corresponded to the target gesture (see Figure 3.5a for an illustration). In particular, we allowed for overlaps of 1-25% and 25-50%, corresponding to 0.2-0.5 second and 0.5-1 second overlaps, respectively. This allowed us to study the impact of different levels of segmentation error on the classification accuracy. Figure 3.5b shows that the accuracy decreases as the segmentation error increases, demonstrating that SignFi does not handle segmentation errors well.
3The convolutional layer in [4] has three 3x3 kernels. This produced poor results on our Doppler-range measurements. Consequently, we increased the number of kernels from 3 to 512, which improved performance significantly.
4Sample-level labelling was conducted using a synchronised camera.
Table 3.2: Time cost to perform manual labelling and sequence labelling
Method                   RF data         Labelling & Segment.   Saving
Manual Segmentation      4 mins @ 8 Hz   18 mins                -
10 s Sequence labelling  4 mins @ 8 Hz   6 mins                 66.6%
The cost of manual segmentation and labelling. As RF samples are difficult to label and segment directly, we used a synchronised video camera in our experiment. The gestures were identified in the video, and the labels were propagated to the corresponding RF signatures. Despite using the camera feed as a visual aid, we found the process to be very time-consuming, so we investigated an alternative method for annotating RF segments.
Sequence labelling. We introduce a new approach, which we call sequence labelling, to reduce the complexity of manual labelling. It rests on two key ideas: 1) we ask users to annotate relatively long continuous sequences of data; and 2) we ask users to annotate gesture sequences without capturing the exact timing of individual gesture boundaries.
Let us consider an example. Assume that we have a collection of 20 data frames {f0, ..., f19} that contains the gestures G1, G2 and G3 in that order. Manual segmentation requires us to identify gesture boundaries or map each frame to a gesture; e.g., the annotated sequence is G1 ∈ [f0, f5], G2 ∈ [f6, f13], G3 ∈ [f14, f19]. In contrast, sequence labelling annotates this collection of frames simply as G1 → G2 → G3, which states that the order of the gestures in the frames is G1, G2, G3 without specifying the transition times. Thus, much less work is required to conduct sequence labelling.
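The example above can be written out directly; the point is that a full segmentation always carries strictly more information than its sequence label, so the latter is always recoverable from the former but not vice versa (the frame indices come from the example; the helper function is ours).

```python
# Manual segmentation: every gesture comes with its frame boundaries,
# so the annotator must pinpoint where each gesture starts and ends.
manual = [("G1", 0, 5), ("G2", 6, 13), ("G3", 14, 19)]

# Sequence label: only the order of the gestures, no boundaries at all.
sequence_label = ["G1", "G2", "G3"]

def to_sequence_label(segments):
    """A full segmentation induces its (much cheaper) sequence label: order
    the segments by start frame and keep only the gesture names."""
    return [gesture for gesture, start, end in sorted(segments, key=lambda s: s[1])]
```

The reverse direction, recovering boundaries from the sequence label alone, is exactly what the alignment learning described in Section 3.3 must do.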
We quantified the time required to perform manual segmentation and sequence labelling experimentally by asking three annotators to label four minutes of RF data sampled at 8 Hz using both methods. The average times taken are shown in Table 3.2. On average, manual segmentation took 18 minutes while sequence labelling took only 6 minutes, a saving of ≈ 66.6%.
Notably, manual labelling and segmentation costs can be significantly higher at higher RF sampling rates, such as the 200 Hz used in [4] or the 1 kHz used in [36, 75].
In the next section, we show that it is possible to achieve highly accurate gesture segmentation and classification based on sequence labels. Unlike classical supervised learning, which requires fully annotated data, our weakly supervised method only requires minimally labelled data.
Summary: The assumption of easily segmentable input that is commonly made by existing RF-based gesture recognition approaches does not hold in the HH gesture recognition scenario. We showed that HH gesture classification accuracy depends heavily on segmentation quality. High-quality classifiers can be developed using manually segmented data; however, the associated labelling costs are substantial. Inspired by the sequence labelling methods used extensively in the speech and handwriting recognition literature, RFWash departs from existing RF sensing segmentation approaches and proposes new methods to learn from weakly labelled, unsegmented data.
3.3 RFWash
Figure 3.6 shows the architecture of the proposed RFWash framework. RFWash is trained on sequences of HH gestures in the RF space and their corresponding sequence labels. As discussed in the previous section, a sequence label only contains the order of the gestures in a segment. The training process, therefore, needs to determine the most likely mapping of gesture labels to each RF frame. This is done via a process called alignment learning. At runtime, the RFWash model internally assigns a likelihood to each (input RF frame, gesture) pair, which is then used to infer the most likely gesture sequence. Before delving into the details of the model itself, we explain why we chose this specific model and then describe the input RF measurements.
Fig. 3.6: RFWash is trained on continuous RF samples (A) of HH gestures and corresponding sequence labels. The model automatically learns which frames correspond to individual gestures (e.g., G1 vs G2) via 'alignment learning'. At runtime, per-frame gesture predictions (B) are produced and used to estimate the most likely gesture sequence.
3.3.1 Why deep sequence labelling?
As explained earlier, hand hygiene tracking can be approached as a sequence labelling problem. In this regard, the Hidden Markov Model (HMM) [79] can be considered a possible alternative to a deep sequence labelling model. However, many studies have already shown that deep sequence labelling, which relies on Recurrent Neural Networks (RNN) coupled with CTC, outperforms HMMs and other variants [80]. Moreover, an HMM requires manual effort and domain knowledge before it can be used (i.e., specifying the model states and state transitions). In contrast, deep sequence labelling can be trained directly on input-output pairs in an end-to-end manner.
As we will see in Section 3.3.3, CTC is an alignment-free algorithm.
Its key benefits are that it requires neither pre-segmented training data nor external post-processing to extract the label sequence. On the other hand, CTC does not permit encoding explicit knowledge of the context between classes and their temporal progression. This can be a limitation in other domains such as video learning [81]. For example, when mapping video frames to an action sequence, encoding grammar rules that make the action "pouring milk" more likely if the previous action was "reaching milk" can improve sequence prediction accuracy [82]. However, this is not the case for hand hygiene gestures, as users can perform the sequence in any order
(see Section 3.2). In fact, it is even desirable to have predictions that do not make hard assumptions about the expected gesture sequence.
3.3.2 RF Measurements
RFWash uses a mmWave radar mounted on a soap dispenser to collect RF signatures of subjects performing hand cleaning. Figure 3.7a shows the system setup. Many subjects may be present in a hospital environment (in-hospital setup, see Section 2.1); however, a subject performing hand cleaning must stand close to the radar (e.g., within 1 metre). The subject faces the radar with her hands at approximately the same height as the radar.
Consequently, our goal is to measure the velocity of her hand motions and
filter out any other irrelevant signals.
(a) Main subject facing the radar and performing handrub while an interfering subject (masked in green) passes behind the main subject.
(b) Consecutive frames showing that the main subject's (SM) RD measurements can be separated from those of a passing interfering subject (SI) by a range cut-off.
Fig. 3.7: Range-Doppler frame measurements

A mmWave radar transmits a sinusoidal wave T(t), called a "chirp", of linearly changing frequency. A time-delayed version of the transmitted signal is received for every reflector in the environment, including the hands of the subject performing washing. Formally, the frequency of a chirp at time t can be expressed as:
f_t = f_0 + \frac{B}{T} t,    (3.1)

where f_0 is the starting frequency of the chirp, B is the bandwidth and T is the chirp duration. Let A(t) be the amplitude of T(t) at time t. The transmitted signal T(t) can be expressed as:

T(t) = A(t) \sin\!\left( 2\pi \left( f_0 t + \frac{B}{2T} t^2 \right) \right).    (3.2)
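As a concrete, scaled-down illustration of Equations (3.1) and (3.2), the sketch below generates a chirp numerically. The frequency, bandwidth, duration, and sampling rate are stand-in values chosen for illustration (a real mmWave radar sweeps from roughly 77 GHz over a few GHz of bandwidth in tens of microseconds):

```python
import numpy as np

# Illustrative (scaled-down) chirp parameters, not the actual radar settings
f0 = 1e3    # starting frequency (Hz)
B = 2e3     # bandwidth (Hz)
T = 1.0     # chirp duration (s)
fs = 100e3  # sampling rate (Hz)

t = np.arange(0, T, 1 / fs)

# Eq. (3.1): instantaneous frequency f_t = f0 + (B/T) t
f_inst = f0 + (B / T) * t

# Eq. (3.2): transmitted chirp with unit amplitude A(t) = 1
tx = np.sin(2 * np.pi * (f0 * t + (B / (2 * T)) * t**2))

# The frequency sweeps linearly from f0 to (almost) f0 + B over one chirp
print(f_inst[0], f_inst[-1])
```

The quadratic phase term in the sine is what makes the instantaneous frequency (the derivative of the phase divided by 2π) linear in t.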
When the transmitted signal is reflected by a stationary object at distance D_0 from the radar, the reflected signal R(t) is:

R(t) = E(t) \sin\!\left( 2\pi \left( f_0 (t - t_d) + \frac{B}{2T} (t - t_d)^2 \right) \right),    (3.3)

where E(t) is the amplitude modulated by the object and t_d = 2D_0/c is the round-trip time delay, with c the speed of light. The signals T(t) and R(t) are mixed on the radar to produce the received signal S(t). It can be shown that S(t) has two frequency components: 1) the difference of the frequencies of T(t) and R(t); and 2) the sum of their frequencies. A low-pass filter can be applied to remove the second component:

S(t) \approx C(t) \cos\!\left( 2\pi \left( \frac{2BD_0}{cT} t + \frac{2 f_0 D_0}{c} \right) \right).    (3.4)
where C(t) is the amplitude. The frequency of S(t), which is given by 2BD_0/(cT), is called the beat frequency and can be used to estimate the object's distance D_0. In general, there may be multiple objects in the vicinity of the radar, and the mixed received signal will contain multiple beat frequencies. We can resolve these with a Fast Fourier Transform (FFT) and consequently compute the distance between each object and the radar.
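The range estimation step can be illustrated with a small numerical sketch: we synthesise the mixed signal S(t) of Equation (3.4) for two hypothetical reflectors, take an FFT, and map the resulting beat frequencies back to distances via d = f_beat * cT / (2B). All parameters are illustrative stand-ins, not the actual device settings:

```python
import numpy as np

# Illustrative radar parameters (stand-ins, not the actual device settings)
c = 3e8     # speed of light (m/s)
B = 4e9     # bandwidth (Hz)
T = 50e-6   # chirp duration (s)
fs = 2e6    # ADC sampling rate (Hz)
N = 100     # samples per chirp (= T * fs)

t = np.arange(N) / fs
distances = [0.375, 1.5]  # two hypothetical reflectors (m)

# Mixed signal S(t): one beat tone per reflector, following Eq. (3.4)
s = sum(np.cos(2 * np.pi * (2 * B * d / (c * T)) * t) for d in distances)

# Range FFT: each beat frequency f maps back to a distance d = f * c * T / (2B)
spectrum = np.abs(np.fft.rfft(s))
freqs = np.fft.rfftfreq(N, d=1 / fs)
est_range = freqs * c * T / (2 * B)

# The two strongest spectral peaks sit at the reflector distances
peaks = est_range[np.argsort(spectrum)[-2:]]
print(sorted(peaks))
```

The distances were chosen so that each beat tone falls exactly on an FFT bin; off-bin reflectors would spread their energy over neighbouring bins (spectral leakage), but the peak locations would still indicate their approximate ranges.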
However, range alone does not provide sufficient information to solve our problem. A subject's hands are very close to each other during the entire handrub procedure, so more information is needed to differentiate between the gestures. Fortunately, the mmWave radar allows us to measure the Doppler frequency shift in the S(t) signal induced by objects moving in the scene.
We use the mmWave signal S(t) to derive an intensity map of the scene, shown in Fig. 3.7b. The intensity map I(t, r, v) has the following interpretation: the intensity I(t, r, v) is higher if there is a higher chance at time t of finding an object located at distance r from the radar and moving at speed v. Fig. 3.7b shows the intensity map at three different time instants, with r plotted from 0 to 3 metres and v from -2 to 2 m/s. Large intensity is shown in red. We will refer to the intensity map I(t, r, v) at a point in time as a Range-Doppler
(RD) frame.
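A minimal sketch of how an RD frame can be formed from a block of chirps: a first FFT along the samples of each chirp resolves range, and a second FFT across chirps resolves the Doppler (velocity) dimension from the per-chirp phase progression. The synthetic target and bin values below are illustrative assumptions, not measured data:

```python
import numpy as np

# Synthetic beat-signal matrix: rows = chirps, cols = ADC samples per chirp.
# A target at a fixed range moving at constant velocity adds a linear phase
# progression across chirps; the second (Doppler) FFT resolves that phase
# rate into a velocity bin. All values here are illustrative.
n_chirps, n_samples = 64, 64
r0, doppler_bin = 12, 5  # target's range bin and Doppler bin (assumed)

n = np.arange(n_samples)
chirps = np.arange(n_chirps)[:, None]

# Complex beat tone at range bin r0 with per-chirp phase step
# 2*pi*doppler_bin/n_chirps
signal = np.exp(2j * np.pi * (r0 * n / n_samples + doppler_bin * chirps / n_chirps))

# Range FFT along samples, then Doppler FFT across chirps -> RD frame.
# fftshift centres zero velocity in the middle of the Doppler axis.
rd = np.fft.fftshift(np.fft.fft2(signal), axes=0)
intensity = np.abs(rd)

# The RD-frame peak sits at (Doppler bin, range bin) of the target;
# after the shift, Doppler bin 5 appears at index 32 + 5 = 37
peak = np.unravel_index(np.argmax(intensity), intensity.shape)
print(peak)
```

A stack of such intensity maps over time corresponds to the I(t, r, v) frames described above, with positive and negative velocities on either side of the centre row.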
RFWash needs to be robust to interference from nearby moving objects and people. Figure 3.7a shows a subject performing handrub in front of the radar. The person masked in green is within the range of the radar and acts as an interferer. Figure 3.7b shows RD frames at three time instants, with the locations of the subject's hands and the interferer marked by SM and SI, respectively. We note that the intensity of all RD frames remains approximately unaffected in the SM region by the interferer's movement (see the dotted ellipses in Figure 3.7b). From now on, we limit the range r to less than 1 metre so as to focus on the main subject only.
For illustration, we post-process the RD frames to amplify the HH gestures by performing background subtraction and Gaussian smoothing.

Fig. 3.8: Illustration of Range-Doppler frame post-processing for gesture G2. The figure shows the original RD frame, the RD frame with background removed, and a smoothed version.

For each RD frame, we use the frames in the previous 1 second to estimate the background.
We then perform background subtraction and smooth the result with a Gaussian filter (see Figure 3.8). These steps remove the static reflection of the subject's torso and amplify the hand motions related to the gesture performed in the RD frame. The input to the RFWash deep model is a stack of normalised RD frames after applying a cut-off at 1 metre to each frame and resizing them to 50 × 50 pixels.
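The post-processing steps above can be sketched as follows. This is a minimal numpy illustration; the frame rate, the range-bin count corresponding to the 1-metre cut-off, the Gaussian kernel width, and the resize method are all assumed values, not the exact parameters used by RFWash:

```python
import numpy as np

def preprocess(frames, frame_idx, fps=20, max_range_bin=32):
    """Post-process one RD frame (a sketch of the steps described above).

    frames: array (n_frames, n_doppler, n_range) of RD intensities.
    fps: assumed frame rate; sets the 1-second background window.
    max_range_bin: range bins kept by the ~1-metre cut-off (the exact
        count depends on the radar's range resolution; assumed here).
    """
    frame = frames[frame_idx]

    # Background = mean of the frames in the previous 1 second
    start = max(0, frame_idx - fps)
    background = frames[start:frame_idx].mean(axis=0)

    # Remove static reflections (e.g., the torso); keep motion energy
    foreground = np.clip(frame - background, 0.0, None)

    # Separable Gaussian smoothing with a small 1-D kernel (sigma = 1 bin)
    kernel = np.exp(-0.5 * (np.arange(-2, 3) / 1.0) ** 2)
    kernel /= kernel.sum()
    smooth1 = np.apply_along_axis(np.convolve, 0, foreground, kernel, mode="same")
    smoothed = np.apply_along_axis(np.convolve, 1, smooth1, kernel, mode="same")

    # Range cut-off at ~1 metre, then nearest-neighbour resize to 50x50
    cropped = smoothed[:, :max_range_bin]
    rows = np.linspace(0, cropped.shape[0] - 1, 50).round().astype(int)
    cols = np.linspace(0, cropped.shape[1] - 1, 50).round().astype(int)
    resized = cropped[np.ix_(rows, cols)]

    # Normalise to [0, 1]
    rng = resized.max() - resized.min()
    return (resized - resized.min()) / rng if rng > 0 else resized

frames = np.random.rand(40, 64, 64)
out = preprocess(frames, frame_idx=30)
print(out.shape)  # (50, 50)
```

In practice a library resize (e.g., bilinear interpolation) would likely be used instead of the nearest-neighbour indexing shown here; the sketch only fixes the order of operations.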
3.3.3 Deep Learning Model
Fig. 3.9: RFWash network architecture. The RFWash network has five convolutions followed by a max pooling layer (2×2) and a fully connected layer, followed by two bidirectional LSTM layers and finally a softmax layer. All convolutions are 3 × 3 (the number of filters is denoted in each box).
Figure 3.9 shows the layered structure of our deep learning model.
Convolutional layers followed by bidirectional LSTM layers are used to extract spatiotemporal gesture features from the input RD frames; a softmax layer and
finally a Connectionist Temporal Classification (CTC) layer are employed to predict the gesture sequence.
As illustrated in Figure 3.6, RFWash takes as input a segment consisting of a stack of T consecutive RD frames X = [x_1, ..., x_T] ∈ R^{50×50×T} from the continuous stream. The goal is to infer the gesture sequence ℓ = [ℓ_1, ℓ_2, ..., ℓ_K] ∈ A^{1×K} performed by a HCW, where A is the set of possible gestures and K ≤ T. Since the continuous segment can contain irrelevant motions (i.e., a stationary user or users walking away from the device), we define an additional
“no gesture” class G_No in addition to the nine HH gesture classes. Ultimately, the set of possible gestures is A = {G_No} ∪ {G_1, ..., G_9}. As stated above, a sequence label ℓ is used rather than a frame-by-frame label π = [π_1, ..., π_T] ∈ A^{1×T} to reduce the labelling cost (see Section 3.2.1). π is also called the gesture path. An associated challenge with using ℓ is the lack of temporal alignment, as a sequence label can be compatible with many plausible gesture paths.
For example, if the sequence label is G1 → G2 for an input of four frames, then this label is compatible with the gesture paths [G1, G2, G2, G2], [G1, G1, G2, G2] and [G1, G1, G1, G2]. Intuitively, the model resolves this challenge by considering the probability of all plausible gesture paths for a particular sequence label.
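This path-counting intuition can be made concrete with a small brute-force sketch that enumerates every frame-level path compatible with a sequence label. For simplicity, the sketch collapses paths by merging consecutive repeats only; it ignores CTC's blank symbol, which is introduced later:

```python
from itertools import product

def compatible_paths(label, T):
    """All frame-level paths of length T that collapse (by merging
    consecutive repeats) to the given sequence label. Blanks are
    ignored here for simplicity; full CTC also inserts blanks."""
    def collapse(path):
        out = [path[0]]
        for g in path[1:]:
            if g != out[-1]:
                out.append(g)
        return tuple(out)

    alphabet = set(label)
    return [p for p in product(alphabet, repeat=T)
            if collapse(p) == tuple(label)]

# The four-frame example from the text: label G1 -> G2 has 3 compatible paths
paths = compatible_paths(("G1", "G2"), T=4)
print(paths)
```

For the label G1 → G2 over four frames, this enumerates exactly the three paths listed above; CTC's dynamic programming sums over such sets efficiently rather than enumerating them.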
Spatiotemporal Feature Extraction
Motions captured by the mmWave radar in a single RD frame have an identifiable spatial pattern in the range and velocity dimensions. Additionally, the temporal dynamics of each gesture will be present in consecutive RD frames. We use spatiotemporal feature extraction layers composed of five convolutional layers followed by a fully connected layer and two RNN (Recurrent Neural Network) layers. RNNs perform well in sequential data modelling and are a good choice for capturing the temporal dynamics of the gestures. However, in the context of the HH technique, the mirrored gestures discussed in Section 3.2.1 present unique challenges because of their similarity in the RF domain. Therefore, we employ bidirectional recurrent layers with LSTM cells (BiLSTM [83]) to enable the network to use all available input information before and after a specific RD frame. In this configuration, two separate recurrent layers running in the forward direction (future) and the backward direction
(past) are utilized to learn the complex temporal dynamics.
The spatiotemporal feature extraction layers and softmax activation process the input RD frames X to produce frame-wise probabilities of the different gestures Y, which we call the BiLSTM posterior. Y can be interpreted as the probability of observing a sequence of gestures across T frames. This is further processed by the temporal alignment component to estimate the most likely gesture sequence.
Temporal Alignment Learning
RFWash implements alignment learning to infer the handrub gesture sequence by mapping the output of the BiLSTM components (i.e., the BiLSTM posterior) to the corresponding gesture path. We rely on the CTC algorithm [43], which is explained next in detail.
Let Y = [y_1, ..., y_T] ∈ R^{A×T} be the softmax-normalized BiLSTM output for a stack of T RD frames, where A = |A ∪ {φ}| and φ denotes a blank. The blank is used by CTC to account for the probability of observing 'no label' and to model the transition between gestures within a sequence. Thus, A = 11 for RFWash. The vector y_t, t ∈ {1, ..., T}, can be interpreted as follows: y_{t,k} denotes the probability that the gesture at time t is k, where k = 1, ..., A.
Given the observations X, the posterior probability of any gesture path π = [π_1, ..., π_T] can be calculated as:

P(\pi \mid X) = \prod_{t=1}^{T} y_{t, \pi_t}, \quad \forall \pi_t \in A.    (3.5)

Notably, the posterior probabilities obtained in Equation (3.5) are conditionally independent for different gesture paths. This is desirable in the problem context, as we do not want the gesture classifier to depend on the order of gestures in the training data.
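Equation (3.5) amounts to a product of per-frame posteriors along the path, which can be illustrated with a toy example (the posterior matrix and the chosen path below are made-up values for illustration, not model output):

```python
import numpy as np

# Toy BiLSTM posterior Y: rows = time steps t, cols = gesture classes k.
# Each row sums to 1 (softmax output). Values are illustrative.
Y = np.array([
    [0.7, 0.2, 0.1],  # t = 1
    [0.6, 0.3, 0.1],  # t = 2
    [0.2, 0.7, 0.1],  # t = 3
    [0.1, 0.8, 0.1],  # t = 4
])

def path_posterior(Y, path):
    """Eq. (3.5): P(pi | X) = prod_t y_{t, pi_t}, assuming the per-frame
    posteriors are conditionally independent given X."""
    return np.prod(Y[np.arange(len(path)), path])

# Gesture path [G1, G1, G2, G2] encoded as class indices [0, 0, 1, 1]
p = path_posterior(Y, [0, 0, 1, 1])
print(p)  # 0.7 * 0.6 * 0.7 * 0.8
```

Each path probability is cheap to evaluate; what CTC adds on top of Equation (3.5) is an efficient way to sum these probabilities over all paths compatible with a sequence label.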
In the CTC framework, the probability of the sequence label ℓ is the sum of the probabilities of all its compatible gesture paths: