sensors

Article
Validation of Zulu Watch against Polysomnography and Actigraphy for On-Wrist Sleep-Wake Determination and Sleep-Depth Estimation

Jaime K. Devine 1,* , Evan D. Chinoy 2,3 , Rachel R. Markwald 2 , Lindsay P. Schwartz 1 and Steven R. Hursh 1

1 Institutes for Behavior Resources, Inc., Baltimore, MD 21218, USA; [email protected] (L.P.S.); [email protected] (S.R.H.)
2 Sleep, Tactical Efficiency, and Endurance Laboratory, Warfighter Performance Department, Naval Health Research Center, San Diego, CA 92106, USA; [email protected] (E.D.C.); [email protected] (R.R.M.)
3 Leidos, Inc., San Diego, CA 92106, USA
* Correspondence: [email protected]; Tel.: +1-410-742-6080 (ext. 132)

Abstract: Traditional measures of sleep or commercial wearables may not be ideal for use in operational environments. The Zulu watch is a commercial sleep-tracking device designed to collect longitudinal sleep data in real-world environments. Laboratory testing is the initial step towards validating a device for real-world sleep evaluation; therefore, the Zulu watch was tested against the gold-standard polysomnography (PSG) and actigraphy. Eight healthy, young adult participants wore a Zulu watch and Actiwatch simultaneously over a 3-day laboratory PSG sleep study. The accuracy, sensitivity, and specificity of epoch-by-epoch data were tested against PSG and actigraphy. Sleep summary statistics were compared using paired samples t-tests, intraclass correlation coefficients, and Bland–Altman plots. Compared with either PSG or actigraphy, both the accuracy and sensitivity for Zulu watch sleep-wake determination were >90%, while the specificity was low (~26% vs. PSG, ~33% vs. actigraphy). The accuracy for sleep scoring vs. PSG was ~87% for interrupted sleep, ~52% for light sleep, and ~49% for deep sleep. The Zulu watch showed mixed results but performed well in determining total sleep time, sleep efficiency, sleep onset, and final awakening in healthy adults compared with PSG or actigraphy. The next step will be to test the Zulu watch’s ability to evaluate sleep in industrial operations.

Keywords: actigraphy; sleep tracking; validation; wearable device; sleep scoring; longitudinal data collection; real-world sleep tracking

Citation: Devine, J.K.; Chinoy, E.D.; Markwald, R.R.; Schwartz, L.P.; Hursh, S.R. Validation of Zulu Watch against Polysomnography and Actigraphy for On-Wrist Sleep-Wake Determination and Sleep-Depth Estimation. Sensors 2021, 21, 76. https://dx.doi.org/10.3390/s21010076

Received: 17 November 2020
Accepted: 22 December 2020
Published: 25 December 2020

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

Sleep impacts almost every domain of human function: from mood to job performance to physical health and mortality [1–3]. Whether the goal of a monitoring system in the 4.0 Era is to improve personal well-being or to overhaul industry infrastructure, controlling for sleep as an intervening variable can only improve the model. Sleep researchers have been tracking sleep using wrist-worn actigraphy since the late 1970s [4]; recently, commercial wrist-worn wearable devices have begun to feature sleep-tracking technology as well. The Sleep Research Society recently published a white paper outlining the barriers and opportunities for using wearables in sleep and circadian science [5]. This paper identified poor validation of wearable technology as the primary barrier inhibiting the use of wearables in sleep and circadian science. Device validation is also important for industrial or clinical research applications.

The first step towards validating a wearable device is laboratory testing against the gold-standard measurements of sleep, i.e., polysomnography (PSG) and research-grade actigraphy. Measuring sleep with PSG provides high-quality data, but it has many requirements, such as an overnight visit to a sleep laboratory, the application of multiple physiological recordings (on the scalp, face, and body), and scoring of recordings by an expert trained in the American Academy of Sleep Medicine (AASM) guidelines [6]. The output from the PSG is then condensed into summary measures like time in bed (TIB), total sleep time (TST), sleep efficiency (SE; the ratio of TST over TIB), and sleep architecture. The sleep architecture data are used to further categorize the night into three non-rapid eye movement (NREM) and one rapid eye movement (REM) sleep stages, each with distinct patterns of physiology and behavior that indicate “sleep depth”. Starting with NREM, the sleep stages cycle throughout the night, each playing a critical role in the maintenance of the brain and body to promote optimal health and functioning [7]. In contrast to PSG, actigraphy can measure TIB, TST, SE, and sleep behaviors like napping, but it only categorizes sleep and wake as binary outcomes and therefore does not provide an estimate of sleep depth. Wrist-worn actigraphy devices can automatically determine sleep from activity levels using validated algorithms, but data must first be downloaded from the device to specialized software. Importantly, traditional actigraphy caters to healthy adult populations with regular sleep patterns. Its application to operational environments requires significant additional data processing. While the practice parameters for the clinical or research use of actigraphy in healthy individuals, sleep-disordered populations, or military operational contexts have been published [8–10], these guidelines still encourage hand scoring of data to achieve the highest levels of accuracy, and there is no resolute consensus on the scoring methodology [11,12].

Many commercial wearables have been tested against PSG or actigraphy for sleep-wake determination with mixed, but promising, results [13–18].
The measurement of wrist activity by accelerometry alone cannot estimate sleep stages or NREM–REM sleep stage cycles, but it can identify bouts of immobility which are known to correspond to periods of restful sleep that include NREM and REM [19,20]. Many wearables also incorporate other sensors, such as heart rate, oxygen saturation, or peripheral arterial tone, to supplement wrist activity data [20,21]. These devices can be used to track shorter sleep episodes like napping and can provide an estimate of sleep depth similar to sleep stages. However, while the devices collect accelerometry and other biometric data, sleep scoring is done using algorithms on a companion mobile app or software platform [14,22]. This necessity restricts the ability of devices to log sleep episodes in the absence of wireless connectivity to download data, and it can generate concerns about privacy among researchers and consumers alike [23,24]. Furthermore, commercial sleep trackers are designed to provide direct feedback to the consumer about their sleep, a feature that may not always be desirable for research study purposes. Lastly, the battery life of most research-grade actigraphy devices is only a few weeks and for most commercial devices only a few days, meaning that changes to sleep behavior over the long term cannot be continually observed. This constitutes a limitation to any analyses that strive to investigate changes in sleep over time or the cumulative effects of sleep behavior patterns. Frequently changing the battery or recharging the device may be a deterrent to wearing the device or may not be feasible in all populations and settings. The number of active sensors or features like automatic syncing can affect the longevity of a charge, creating a trade-off between what data can be collected and the time-frame of the collection period. 
Battery life, therefore, is an important limiting factor when choosing a device for research, clinical, or personal measurement of sleep [12,25].

Sleep is a complex psychosocial behavior in addition to being a neurobiological phenomenon. Developing a wearable that can account for variability in sleep behaviors across individuals and over time as well as providing an estimate of sleep depth would bridge the gap between laboratory and epidemiological concepts of sleep. Bridging this gap does not necessarily require significant innovation to hardware or algorithms, but does require addressing the specific needs of real-world applications. The Zulu watch (Institutes for Behavior Resources, Baltimore, MD, USA) is a novel commercial sleep-tracking device designed specifically for operational environments. The Zulu watch measures sleep duration
and estimates sleep depth based on wrist movement using a tri-axial accelerometer and on-wrist detection using a galvanic sensor. It scores sleep on-wrist rather than through a software or cloud application. It features the option to either provide sleep feedback directly to the user or to export sleep data for research purposes. The watch features a light sensor, on-demand heart rate, and off-wrist detection, and can store data on up to 80 sleep intervals and 7 days’ worth of 2-min epoch-by-epoch (EBE) data at a time, which can be extracted to .csv files for analysis. The battery of the Zulu watch can last up to one year with all these features active. Older sleep interval and EBE data are automatically overwritten by the newly collected data to provide for continuous longitudinal measurement of sleep. The Zulu watch uses sleep history data to compute a real-time estimate of sleep debt when in feedback mode. Researchers may extract this data intermittently per their study design and return the watch to the wearer immediately. The long battery life and ability of the Zulu watch to store up to 80 sleep intervals on-wrist means that researchers can collect data over a longer period of time than the 14- to 60-day range typical of research actigraphy devices. The Zulu watch can also detect multiple sleep periods per day, identify naps as short as 20 min, and can be worn as a normal wristwatch to tell time. The Zulu watch was designed as a sleep-tracking device that could serve either as a commercial wearable or as a research-grade actigraph for use in operational environments where sleep opportunities may be limited and the quality of the sleep environment may be compromised. While the Zulu watch is intended for longitudinal measurement of sleep in real-world environments, laboratory testing against the gold standards under controlled conditions is an important initial step towards establishing the performance of a sleep-tracking device [5]. 
Therefore, the aims of this validation study were to (1) test the agreement between the sleep-tracking algorithm in the Zulu watch and both gold-standard laboratory PSG and research-grade actigraphy for sleep summary outcome measures and EBE sleep-wake determination; and (2) test the agreement between the Zulu watch’s sleep-depth scoring and PSG sleep staging.

2. Materials and Methods

2.1. Participants

Eight (8) participants wore both a Zulu watch and Actiwatch 2 (Philips Respironics, Murrysville, PA, USA) continuously over the course of a consecutive 3-day PSG sleep study. Participants were all healthy young adults, consisting of four men and four women who met the following inclusion criteria based on a self-report medical history questionnaire: age, 18–35 years (30.4 ± 3.2 years; mean ± SD); average nightly sleep duration, 6–9 h; body mass index, 18.5–29.9 kg/m²; no diagnosed sleep, mental health, or medical disorders; no use of any sleep medications or nicotine in the previous month; no use of illegal drugs in the previous 6 months; women who are not pregnant; and no travel across more than one time zone or any shift work in the previous month. These criteria are standard for inclusion in laboratory PSG sleep research studies in healthy adults. See Chinoy et al. for additional study participant screening details [18].

The study, which took place at the Naval Health Research Center (NHRC), was approved by the NHRC Institutional Review Board, and was conducted in accordance with the Declaration of Helsinki. Participants signed informed consent prior to the study and were compensated for their participation.

2.2. Procedure

Prior to study initiation, participants kept a consistent self-selected habitual 8-h TIB sleep schedule for 4 nights. Deviations (earlier or later) in bed and wake times of up to 30 min from the sleep schedule were allowed if the 8-h TIB requirement was upheld each night. Adherence to the pre-study sleep schedule was verified by sleep diary and actigraphy data. Participants arrived at the laboratory each evening and were provided with an 8-h TIB sleep opportunity at the same time each night, based on their habitual bed and wake times as calculated from the average of their 4-night self-selected pre-study sleep schedule. Individual sound-attenuated and lighting-controlled research bedrooms were used for sleep testing. Participants left the laboratory during the day but were instructed to wear both devices continuously at home, except during showers or rigorous activities in which the watches could have been damaged. Beginning with the pre-study period and throughout the study, participants were instructed not to nap or consume any caffeine or alcohol. At the start of each evening, participants signed an attestation form verifying that they had adhered to the study protocol and passed an alcohol breathalyzer test to verify sobriety. Room lighting was set to ~150 lux, and participants were observed continuously while in the laboratory and engaged in sedentary activities before bedtime. Participants arrived at the laboratory 2.5 h prior to bedtime on study nights 1 and 3, and 4.5 h prior to bedtime on study night 2. Application of the PSG electrodes began approximately 2 h before the participants’ bedtimes. Participants left the laboratory approximately 1 h after wake time on study night 1, and 3 h after wake time on study nights 2 and 3. See Chinoy et al. for additional study testing protocol details [18].

2.3. PSG Recording

PSG included electroencephalographic (EEG: F3-M2, F4-M1, C3-M2, C4-M1, O1-M2, and O2-M1), electromyographic (EMG: mentalis chin muscle), left and right electrocardiographic, and electrooculographic (EOG: E1-M2 and E2-M1) recordings performed according to AASM guidelines [6]. PSG data were sampled at 256 Hz; EEG and EOG were filtered at 0.3–35 Hz; and EMG was filtered at 10–100 Hz. Impedances were verified to be ≤10 kΩ at the start of the PSG recordings. The following standard sleep parameters were calculated for the purposes of this study: TIB, in minutes, as time from lights out to lights on; TST in minutes; and SE as a percentage (TST/TIB × 100). Sleep stages (wake, N1, N2, slow-wave sleep (SWS), and REM) were scored by registered polysomnographic technologists (RPSGTs) in 30-s epochs, according to standard criteria [6].

2.4. Actigraphy

An Actiwatch 2 device was worn on the participant’s non-dominant wrist (same wrist as the Zulu watch) throughout the study period. Activity data were collected in 30-s epochs. Sleep interval start and end times were manually set by a researcher using Actiware software (version 6.0.9) in accordance with the PSG recording period. Sleep-wake determination and the TIB, TST, and SE for each sleep interval using EBE data were computed by the Actiware 6.0.9 scoring algorithm using the medium sensitivity threshold.

2.5. Automatic Sleep Determination and Sleep Scoring by the Zulu Watch

The Zulu watch hardware device collects activity data in 2-min epochs and automatically scores TIB and SE on-wrist, based on a proprietary algorithm for sleep-wake determination. Devices were programmed to detect multiple sleep episodes per day; the minimum sleep detection was set at 20 min. Data were then exported as the scored sleep interval information and as 2-min EBE data for the duration of the study period. TST was calculated as TIB × SE to determine the number of minutes during the sleep interval in which the Zulu watch determined that sleep was occurring. Epoch data are scored as on-wrist “On” or off-wrist “Off”. Epochs are scored in a separate data column as 0 for periods of wake, 1 for restless or interrupted sleep, 2 for light sleep, and 3 for deep sleep. The Zulu watch uses a proprietary algorithm to estimate sleep depth, using only motion and on-wrist detection, and cannot differentiate between sleep stages. Zulu watch sleep-depth scoring should be considered an estimation of locomotor inactivity rather than an estimate of neurophysiological sleep architecture.
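The TST calculation and the epoch coding described in this section can be illustrated with a minimal sketch. It assumes that SE is exported as a percentage (so TIB × SE is divided by 100) and that any sleep-depth code above 0 counts as sleep when the scores are collapsed to binary; the function names are hypothetical and are not part of the Zulu watch export.

```python
# Sleep-depth codes as described in the text.
SLEEP_DEPTH_LABELS = {
    0: "wake",
    1: "interrupted sleep",
    2: "light sleep",
    3: "deep sleep",
}


def total_sleep_time(tib_min, se_percent):
    """Return TST in minutes from TIB (minutes) and SE (assumed to be a percentage)."""
    return tib_min * se_percent / 100.0


def to_binary(depth_codes):
    """Collapse sleep-depth codes to sleep (1) / wake (0).

    Assumption: every non-zero depth code is treated as sleep.
    """
    return [1 if code > 0 else 0 for code in depth_codes]


if __name__ == "__main__":
    print(total_sleep_time(480, 90))   # 8-h TIB at 90% SE
    print(to_binary([0, 1, 2, 3, 0]))
```

For an 8-h (480-min) sleep opportunity scored at 90% SE, this gives a TST of 432 min.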

2.6. Analyses

All data were analyzed using Excel 2013 and Stata/MP 15 software. Statistical significance was set at p < 0.05. The statistical plan for the validation of the Zulu watch against PSG and actigraphy was guided by the recommendations set forth by the Sleep Research Society for the validation of commercial sleep wearables against the gold-standard PSG and research-grade actigraphy [5]. One participant’s third-night data were excluded because the Zulu watch indicated that it was off-wrist during the study night. Another participant’s second-night data were excluded because of technical issues collecting the PSG data.

For sleep-wake determination, all sleep epochs were coded as 1 and all wake epochs were coded as 0 for PSG, Actiwatch 2, and Zulu watch data. PSG and Actiwatch 30-s epochs were recalculated into 2-min epochs by averaging the binary sleep scores into 2-min bins and rounding up to the nearest integer. Using binary sleep-wake scores, Zulu watch EBE data were compared against the standard measures (PSG or Actiwatch 2) by subtracting the Zulu watch sleep scores from the corresponding standard measure to determine the agreement for individual epochs. Sensitivity, specificity, and accuracy were then computed for the total recording period. True positives were epochs in which the Zulu watch score indicated sleep and agreed with the standard measure (PSG or Actiwatch 2) indicating sleep, while true negatives were epochs in which the Zulu watch score indicated wake and agreed with the standard measure indicating wake. Accuracy ((true positives + true negatives)/all epochs) was calculated as the proportion of all recorded epochs in which the Zulu watch scoring agreed with the standard measures. Sensitivity (true positives/(true positives + false negatives)) was calculated as the proportion of epochs identified as sleep by the standard measures that the Zulu watch also identified as sleep, and specificity (true negatives/(true negatives + false positives)) was calculated as the proportion of epochs identified as wake by the standard measures that the Zulu watch also identified as wake. Accuracy, sensitivity, and specificity were also calculated for Actiwatch 2 compared against PSG. Differences in accuracy, sensitivity, and specificity between the Actiwatch 2 and Zulu watches compared against PSG were evaluated using paired samples t-tests, and differences across all three sets of comparisons were evaluated using one-way analysis of variance.

The Zulu watch sleep-depth scores were not considered equivalent to the PSG sleep stages. The Zulu sleep scores indicate depth-of-sleep via wrist inactivity rather than neurophysiological brain activity, and the data are collected in 2-min epochs instead of 30-s epochs. However, the ability of the Zulu watch to detect periods of inactivity that overlap with periods of restorative sleep was explored by converting the PSG sleep stages into a numerical code and comparing them against the Zulu watch sleep scores. The PSG scores were recalculated into 2-min bins by averaging the sleep-staging scores from 30-s epochs into 2-min bins and recoding these bins to approximate the Zulu watch scores. Figure 1 shows the logic for recoding the PSG stages and the Zulu watch sleep scores developed for this study. It should be noted that the numerical scores for the PSG 30-s epochs range between 0 and 3 but are not considered directly equivalent to the Zulu watch sleep scores (0–3). Bins with an average of 0.0 indicated wakefulness for the entire period and were considered equivalent to a Zulu watch score of 0 (wake). Bins with averages between 0.25 and 0.75 (indicating a mixture of epochs scored as sleep and wake within a 2-min period) were considered equivalent to interrupted sleep, or a Zulu score of 1. Light sleep (Zulu watch score of 2) was defined to be any 2-min bins with an average between 1 and 2.25. Deep sleep (Zulu watch score of 3) was defined as any 2-min bins with an average ≥2.5. Zulu watch data were then compared bin by bin with PSG by subtracting the Zulu watch sleep scores from the corresponding PSG score to determine the agreement for individual epochs. The accuracy, sensitivity, and specificity were calculated as previously described for the proportion of all bins wherein the Zulu watch scoring was in agreement with the PSG bins for interrupted sleep, light sleep, or deep sleep.
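The bin-recoding thresholds described above can be sketched in a few lines of Python. The numeric code assigned to each 30-s PSG stage (`STAGE_CODE`) is an assumption made for illustration only; the text states that the codes range from 0 to 3 but does not list them, and the mapping used in the study is the one defined in Figure 1. The function names are likewise hypothetical.

```python
from statistics import mean

# ASSUMPTION: illustrative numeric codes for 30-s PSG stages (range 0-3).
# The study's actual mapping is given in Figure 1, not reproduced here.
STAGE_CODE = {"W": 0, "N1": 1, "N2": 2, "REM": 2, "SWS": 3}


def recode_bin(stages):
    """Map four consecutive 30-s PSG stages to a Zulu-equivalent score (0-3)."""
    if len(stages) != 4:
        raise ValueError("a 2-min bin needs exactly four 30-s epochs")
    avg = mean([STAGE_CODE[s] for s in stages])
    if avg == 0:
        return 0          # wake for the entire bin
    if avg <= 0.75:
        return 1          # interrupted sleep (average 0.25-0.75)
    if avg <= 2.25:
        return 2          # light sleep (average 1-2.25)
    return 3              # deep sleep (average >= 2.5)


def recode_night(stages):
    """Recode a night of 30-s stages into 2-min bins, dropping any incomplete final bin."""
    usable = len(stages) - len(stages) % 4
    return [recode_bin(stages[i:i + 4]) for i in range(0, usable, 4)]


if __name__ == "__main__":
    night = ["W", "W", "W", "W", "W", "N1", "W", "W", "N2", "N2", "N2", "SWS"]
    print(recode_night(night))
```

Because each bin averages four integer codes, every possible average is a multiple of 0.25, so the four ranges in the text cover all cases without gaps.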

Figure 1. Recoding the polysomnography (PSG) sleep stages from 30-s epochs into 2-min Zulu watch sleep score equivalents. REM, rapid eye movement; SWS, slow-wave sleep.

Sleep summary statistics (TIB, TST, SE, time of sleep onset, and time of final awakening) were calculated over the PSG recording period and were extracted from Actiware or Zulu watch sleep interval export data for the Actiwatch 2 and Zulu watches, respectively. Zulu watches detect periods of wakefulness during sleep episodes but do not provide an on-wrist report of sleep-onset latency (SOL), wake after sleep onset, or number of awakenings, so these measures could not be compared against PSG or actigraphy. Differences between PSG and the Zulu watch or Actiwatch 2 sleep summary statistics were examined using paired samples t-tests. Agreement between the summary statistics was evaluated using single rater, two-way random effects intraclass correlation coefficients (ICCs) with absolute agreement. The ICC values were classified as poor (<0.50), moderate (0.50–0.75), good (0.75–0.90), or excellent (>0.90) based on established guidelines [26]. Differences in sleep summary statistics from the Zulu watch and Actiwatch 2 compared with the gold-standard PSG were additionally visualized using Bland–Altman plots. Limits of agreement were computed (mean difference ± 1.96 SD) to indicate the range in which the differences between the two measures would occur with 95% probability [27].

Because Zulu watches can detect multiple sleep episodes as short as 20 min per day, there were instances when a participant’s Zulu watch logged sleep intervals occurring outside of the PSG recording period. The timing and duration of extra sleep intervals were compared with the watch’s off-wrist detection record and the timing of study procedures to determine whether the logged interval most likely represented a time when the watch was off-wrist, a time when the study procedures required the participant to restrict movement (such as when placing electrodes on the scalp for PSG or completing a cognitive test), or a possible nap.
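The limits of agreement and the ICC classification described in Section 2.6 can be sketched as follows. This is a minimal illustration, not the Stata procedure used in the study. It assumes differences are taken as device minus reference, uses the sample standard deviation of the differences, and assigns an ICC of exactly 0.75 to "good" (the text lists 0.75 in both the moderate and good ranges). The function names are hypothetical.

```python
from statistics import mean, stdev


def bland_altman_limits(device, reference):
    """Return (mean difference, lower limit, upper limit) for paired measurements.

    Assumption: differences are device minus reference; limits are
    mean difference +/- 1.96 * sample SD of the differences.
    """
    if len(device) != len(reference):
        raise ValueError("device and reference must be paired")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    half_width = 1.96 * stdev(diffs)
    return bias, bias - half_width, bias + half_width


def classify_icc(icc):
    """Classify an ICC value using the cut-offs given in the text."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:   # assumption: exactly 0.75 is treated as "good"
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Illustrative TST values in minutes (not study data).
    watch_tst = [432.0, 450.0, 415.0, 440.0]
    psg_tst = [425.0, 441.0, 420.0, 430.0]
    print(bland_altman_limits(watch_tst, psg_tst))
    print(classify_icc(0.82))
```

The example values are invented for illustration and are not taken from the study.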

3. Results

3.1. Zulu Watch Sleep-Wake Determination Compared with PSG and Actigraphy

The accuracy, sensitivity, and specificity between the sleep measurement methods are summarized in Table 1. EBE data (shown as hypnograms in Figure 2) were compared in 2-min bins across all measures. Compared with PSG or actigraphy, the accuracy for the determination of sleep by the Zulu watches was high, as was the sensitivity to detect sleep epochs. However, Zulu watches showed low specificity for identifying epochs of wake during a sleep interval. For comparison, the accuracy, sensitivity, and specificity were also computed for the Actiwatch 2 compared with PSG from this dataset (see Table 1), which also had a high accuracy and sensitivity but low specificity. Paired samples t-tests indicated a trend for Zulu watches to have higher percentages for specificity versus PSG than the Actiwatch 2 (t = 1.92, p = 0.06). The accuracy and sensitivity versus PSG were comparable between the Zulu watch and Actiwatch 2 (all p > 0.42).

Table 1. Accuracy, sensitivity, and specificity for sleep-wake determination.

Comparison                          Accuracy   Sensitivity   Specificity
Zulu compared with PSG              90.28%     97.78%        25.68%
Actiwatch 2 compared with PSG       90.78%     98.86%        21.12%
Zulu compared with Actiwatch 2      94.24%     96.28%        32.94%
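The accuracy, sensitivity, and specificity values in Table 1 follow the definitions given in Section 2.6. The sketch below shows one way to rebin 30-s binary scores into 2-min bins (average, then round up) and to compute the three metrics from paired sleep (1) / wake (0) scores. It is an illustrative reconstruction under those stated definitions, not the authors' Excel or Stata workflow, and the function names are hypothetical.

```python
import math


def rebin_binary(scores_30s, per_bin=4):
    """Average 30-s sleep(1)/wake(0) scores within each 2-min bin and round up.

    Any incomplete final bin is dropped (an assumption; the text does not say).
    """
    usable = len(scores_30s) - len(scores_30s) % per_bin
    return [
        math.ceil(sum(scores_30s[i:i + per_bin]) / per_bin)
        for i in range(0, usable, per_bin)
    ]


def agreement(device, reference):
    """Return (accuracy, sensitivity, specificity) with sleep as the positive class."""
    if len(device) != len(reference) or not device:
        raise ValueError("device and reference must be non-empty and the same length")
    pairs = list(zip(device, reference))
    tp = sum(1 for d, r in pairs if d == 1 and r == 1)
    tn = sum(1 for d, r in pairs if d == 0 and r == 0)
    fp = sum(1 for d, r in pairs if d == 1 and r == 0)
    fn = sum(1 for d, r in pairs if d == 0 and r == 1)
    accuracy = (tp + tn) / len(pairs)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return accuracy, sensitivity, specificity


if __name__ == "__main__":
    psg_30s = [0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
    psg_2min = rebin_binary(psg_30s)   # one score per 2-min bin
    zulu_2min = [1, 1, 1]              # illustrative device scores
    print(psg_2min, agreement(zulu_2min, psg_2min))
```

Rounding up means that a 2-min bin counts as sleep if any of its four 30-s epochs was scored as sleep.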

Figure 2. Comparison of the participants’ epoch-by-epoch sleep–wake hypnograms by polysomnography (PSG), Zulu watch, and actigraphy. A three-panel comparison of the PSG, Zulu watch, and Actiwatch 2 sleep–wake determination across all study nights by participant. Epochs are shown in study days and graphed along the x-axis. Sleep versus wake determination is plotted in orange for PSG, blue for the corresponding Zulu watch data, and green for the corresponding Actiwatch 2 data. Missing nights of data are indicated by gray blocks.

3.2. Zulu Watch Sleep-Depth Scoring Compared With PSG

The accuracy, sensitivity, and specificity for the determination of interrupted, light, and deep sleep are summarized in Table 2. EBE data were compared by 2-min bins (example sleep-depth hypnograms are shown in Figure 3). The Zulu watches showed a high accuracy for the determination of interrupted sleep and moderate accuracy for the determination of light sleep and deep sleep. The Zulu watch was particularly sensitive for detecting deep sleep compared with interrupted sleep, light sleep, or wake.
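The 2-min binning mentioned above can be sketched as follows. PSG is scored in 30-s epochs, so four consecutive epochs fall into each 2-min bin, and the Discussion states that the PSG epochs were averaged into such bins. The numeric stage codes below are hypothetical, the handling of a trailing partial bin is an assumption, and the mapping from bin averages to Zulu watch categories (given in Figure 1) is deliberately not reproduced here.

```python
def bin_means(epoch_scores, epochs_per_bin=4):
    """Average consecutive 30-s epoch scores into 2-min bins.

    A trailing partial bin is dropped (an assumption of this sketch).
    """
    n_full = len(epoch_scores) // epochs_per_bin * epochs_per_bin
    return [
        sum(epoch_scores[i:i + epochs_per_bin]) / epochs_per_bin
        for i in range(0, n_full, epochs_per_bin)
    ]


# Hypothetical numeric PSG codes for ten 30-s epochs (not the study coding).
psg_codes = [0, 0, 1, 1, 2, 2, 2, 2, 3, 3]
binned = bin_means(psg_codes)  # two full bins; the last two epochs are dropped
```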


Table 2. Accuracy, sensitivity, and specificity of the Zulu sleep-depth scoring versus polysomnography.

| Sleep score | Accuracy | Sensitivity | Specificity |
|---|---|---|---|
| Interrupted sleep | 87.17% | 28.81% | 92.12% |
| Light sleep | 51.74% | 10.73% | 88.17% |
| Deep sleep | 48.77% | 84.16% | 29.90% |


Figure 3. Comparison of the participants’ epoch-by-epoch sleep–wake hypnograms by polysomnography (PSG) and Zulu watch. A two-panel comparison of the paired PSG and Zulu watch sleep-scoring data from the first study night for each participant. Only the first night is depicted in order to improve readability. Epochs are graphed along the x-axis and shown in hours of sleep over the study night. Sleep stages are plotted in orange for PSG and blue for the corresponding Zulu watch data.


3.3. Zulu Watch Determination of Sleep Summary Statistics Compared with PSG and Actigraphy

Sleep summary statistics as determined by PSG, Actiwatch 2, and Zulu watch are summarized in Table 3. The TIB was set by the researchers at a constant of 480 min for all participants across all nights in the PSG and Actiwatch 2 datasets but was automatically determined by the Zulu watch. The lack of variability in TIB between the PSG and Actiwatch 2 measurements prohibited statistical testing, as we have noted. Paired samples t-test analyses indicated agreement between the PSG and Zulu watch measures of TST, time of sleep onset, and time of final awakening, but a lack of agreement between the PSG and Zulu watch measures of SE. Actiwatch 2 and Zulu watch measures of SE, time of sleep onset, and time of final awakening were in agreement, but paired samples t-tests indicated a lack of agreement between Zulu watch and Actiwatch 2 TST. Paired samples t-tests were also computed between the PSG and Actiwatch 2 measures of sleep; there was a lack of agreement between the TST and SE measures. The PSG and Actiwatch 2 measures of time of sleep onset and time of final awakening were in agreement.
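For readers less familiar with the paired samples t-test used here, the following minimal sketch computes the mean difference, the t statistic, and the degrees of freedom for nightly values measured by two methods on the same nights. The four nightly TST values are hypothetical and the function name is illustrative; the p-value lookup is omitted because it requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics


def paired_t(a, b):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD of the paired differences
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# Hypothetical nightly TST in minutes for the same four nights.
zulu_tst = [410, 420, 400, 430]
psg_tst = [400, 425, 390, 420]
mean_diff, t_stat, df = paired_t(zulu_tst, psg_tst)
```

A small t statistic relative to its degrees of freedom corresponds to a non-significant difference, which the text above reads as agreement between the two methods.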

Table 3. Comparison of Zulu watch sleep summary statistics against the PSG and actigraphy standards.

| Measure | Zulu mean | Standard | Standard mean | Mean difference | Paired samples t-test |
|---|---|---|---|---|---|
| TIB (min) | 456 ± 34 | PSG | 480 ± 0 | −24 | t = 3.43, p = 0.001 * |
| | | Actigraphy | 480 ± 0 | −24 | t = 3.43, p = 0.001 * |
| TST (min) | 414 ± 34 | PSG | 408 ± 57 | +6 | t = 0.45, p = 0.65 |
| | | Actigraphy | 436 ± 21 | −22 | t = 2.61, p = 0.01 * |
| SE | 91% ± 3% | PSG | 85% ± 12% | +6% | t = 2.29, p = 0.03 * |
| | | Actigraphy | 91% ± 4% | −0.1% | t = 0.06, p = 0.95 |
| Time of sleep onset | 22:05 ± 0:44 | PSG | 22:02 ± 0:31 | +3 min | t = 0.22, p = 0.83 |
| | | Actigraphy | 21:53 ± 0:31 | +12 min | t = 1.03, p = 0.31 |
| Time of final awakening | 05:47 ± 0:28 | PSG | 05:47 ± 0:33 | 0 min | t = 0.06, p = 0.95 |
| | | Actigraphy | 05:49 ± 0:29 | −2 min | t = 0.20, p = 0.84 |

Comparisons between PSG and Actigraphy

| Measure | PSG mean | Actigraphy mean | Mean difference | Paired samples t-test |
|---|---|---|---|---|
| TST (min) | 408 ± 57 | 436 ± 21 | −28 | t = 2.23, p = 0.03 * |
| SE | 85% ± 12% | 91% ± 4% | +6% | t = 2.22, p = 0.03 * |
| Time of sleep onset | 22:02 ± 0:31 | 21:53 ± 0:31 | −9 min | t = 1.00, p = 0.32 |
| Time of final awakening | 05:47 ± 0:33 | 05:49 ± 0:29 | +2 min | t = 0.24, p = 0.81 |

PSG, polysomnography; SE, sleep efficiency; TIB, time in bed; TST, total sleep time; * p < 0.05.
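The relationship between the TIB, TST, and SE rows can be checked with a short calculation. The sketch assumes the conventional definition of sleep efficiency, SE = TST / TIB × 100, and plugs in the mean TST and TIB values reported in Table 3 (PSG 408 and 480 min, Actigraphy 436 and 480 min, Zulu watch 414 and 456 min). The study averages nightly SE values, so a ratio of means is only an approximation, although here it reproduces the reported means after rounding.

```python
def sleep_efficiency(tst_min, tib_min):
    """Sleep efficiency in percent, assuming SE = TST / TIB * 100."""
    if tib_min <= 0:
        raise ValueError("time in bed must be positive")
    return 100.0 * tst_min / tib_min


# Mean (TST, TIB) in minutes as reported in Table 3.
table3_means = {
    "PSG": (408, 480),
    "Actigraphy": (436, 480),
    "Zulu watch": (414, 456),
}
se_percent = {
    name: round(sleep_efficiency(tst, tib))
    for name, (tst, tib) in table3_means.items()
}
```

The Zulu watch reaches roughly the same SE as actigraphy despite a lower TST because its automatically determined TIB (456 min) is shorter than the fixed 480 min.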

The ICCs of the summary statistics across all three systems of measurement (PSG, Actiwatch 2, and Zulu watch) are summarized in Table 4. Based on the guidelines set forth by Koo and Li [26], ICCs for time of sleep onset and time of final awakening were good (0.75–0.90), indicating agreement between all three systems of measurement (PSG, Actiwatch 2, and Zulu watch). The ICCs were moderate for TST and SE (0.50–0.75) and poor for TIB (<0.50). Figure 4 summarizes the Bland–Altman plots of the mean difference between TST, SE, time of sleep onset, and time of final awakening, as measured by the Zulu watch or Actiwatch 2 compared against PSG.
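A Bland–Altman analysis of the kind summarized in Figure 4 can be sketched as below. The sketch assumes the conventional definitions: bias is the mean of the paired differences, and the limits of agreement are bias ± 1.96 standard deviations of the differences. The multiplier 1.96 and the device-minus-PSG sign convention are assumptions here, since neither is stated in this section, and the four nightly TST values are hypothetical.

```python
import statistics


def bland_altman(device, reference):
    """Return (pair means, differences, bias, (lower LOA, upper LOA))."""
    if len(device) != len(reference) or len(device) < 2:
        raise ValueError("need two equally long samples with at least 2 pairs")
    diffs = [d - r for d, r in zip(device, reference)]       # y-axis values
    means = [(d + r) / 2 for d, r in zip(device, reference)]  # x-axis values
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical nightly TST in minutes (device versus PSG).
means, diffs, bias, loa = bland_altman([410, 420, 400, 430], [400, 425, 390, 420])
```

Plotting `diffs` against `means`, with horizontal lines at `bias` and at the two limits, gives one panel of the figure for one device.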

Table 4. Intraclass correlation coefficients between Zulu watch, PSG, and actigraphy.

| Summary Statistic | ICC | 95% CI | Probability that ICC = 0 ¹ | Inter-Rater Reliability |
|---|---|---|---|---|
| TIB | 0.46 | 0.09–0.83 | p = 0.007 * | Poor |
| TST | 0.54 | 0.26–0.76 | p ≤ 0.001 ** | Moderate |
| SE | 0.60 | 0.33–0.83 | p ≤ 0.001 ** | Moderate |
| Time of sleep onset | 0.87 | 0.72–0.96 | p ≤ 0.001 ** | Good |
| Time of final awakening | 0.89 | 0.75–0.97 | p ≤ 0.001 ** | Good |

CI, confidence interval; ICC, intraclass correlation coefficient; SE, sleep efficiency; TIB, time in bed; TST, total sleep time. ¹ Significance for the probability that the coefficient is zero, implying no agreement; * p < 0.05; ** p ≤ 0.001.
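The reliability labels in Table 4 follow directly from the Koo and Li cut-offs quoted in the text (poor below 0.50, moderate 0.50–0.75, good 0.75–0.90). The small sketch below applies those cut-offs to the reported ICCs. The "Excellent" label for values above 0.90 comes from the same guideline but is not used in this study, and the treatment of values that fall exactly on a boundary is an assumption of this sketch.

```python
def icc_reliability(icc):
    """Map an ICC point estimate to a Koo and Li reliability label."""
    if icc < 0.50:
        return "Poor"
    if icc < 0.75:
        return "Moderate"
    if icc <= 0.90:
        return "Good"
    return "Excellent"


# ICC point estimates as reported in Table 4.
table4_icc = {
    "TIB": 0.46,
    "TST": 0.54,
    "SE": 0.60,
    "Time of sleep onset": 0.87,
    "Time of final awakening": 0.89,
}
labels = {name: icc_reliability(value) for name, value in table4_icc.items()}
```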

3.4. Zulu Watch Extra Sleep Intervals

Across all participants and study days, Zulu watches logged 31 sleep intervals that occurred outside the PSG recording period. The interval median start times and duration for all extra intervals are summarized in Table 5. Intervals tended to be logged either in the early morning or in the late afternoon rather than midday. Median, rather than average, start times are reported to account for this bimodal distribution. Sixty-one percent (n = 19) of the extra intervals could be accounted for by instances when off-wrist detection indicated that the watch had been removed. Thirty-five percent (n = 11) occurred during in-laboratory study procedures. Seven of these in-laboratory extra intervals occurred when participants would have been stationary during morning cognitive performance tests at a computer, three instances occurred when participants would have been stationary during evening cognitive performance tests at a computer, and one instance occurred when the participant was stationary during the evening PSG electrode application. One logged interval could not be accounted for either by off-wrist detection or by laboratory procedures. The interval occurred in the morning (11:48) after study night 1 and lasted 30 min.

[Figure 4, panels (A)–(D): Bland–Altman plots. (A) Total Sleep Time (in minutes); (B) Sleep Efficiency (%); (C) Sleep Onset (HH:MM); (D) Final Awakening (HH:MM). In each panel, the y-axis shows the difference between device and PSG and the x-axis shows the mean of device and PSG.]

Figure 4. Bland–Altman plots of the differences (y-axis) between polysomnography (PSG) and the Zulu watch or PSG and the Actiwatch 2 measures of sleep versus the mean of the two measurements (x-axis). Bias is represented by the solid blue line for the Zulu watch and the solid green line for the Actiwatch 2. Upper and lower limits of agreement (LOAs) are represented by the blue dashed lines for the Zulu watch and green dashed lines for the Actiwatch 2. Narrower LOAs indicate relatively less variability versus PSG than wider LOAs. Agreement is shown between measures for (A) total sleep time, (B) sleep efficiency, (C) time of sleep onset, and (D) time of final awakening. Time in bed (TIB) could not be visualized by the Bland–Altman plots because TIB was invariably set at 480 min for all participants across all nights for PSG and the Actiwatch 2.

Table 5. Zulu watch extra sleep intervals by time, duration, and reason for occurrence.

| | Overall | Off-Wrist | Morning Laboratory Procedures | Evening Laboratory Procedures | Other |
|---|---|---|---|---|---|
| Total number of intervals | 31 | 19 | 7 | 4 | 1 |
| Number per participant, mean (range) | 4 (2–8) | 3 (0–5) | 1 (0–2) | 1 (0–1) | 0 (0–1) |
| Interval start time, median (range) | 09:56 (05:32–20:06) | 10:10 (06:44–20:06) | 07:44 (05:32–8:08) | 17:52 (17:37–19:46) | 11:48 (NA) |
| Average duration, mean (range) | 52 min (20–166) | 58 min (20–166) | 51 min (26–122) | 32 min (24–50) | 30 min (NA) |

4. Discussion

While the true test of merit for the Zulu watch will be its performance in the field, laboratory validation is a crucial first step. Without validation testing, a sleep tracker lacks credibility in scientific or industrial research domains. Moreover, some common features of commercial sleep trackers, such as cloud-based sleep scoring or direct feedback to the user, may not be ideal for research. Sleep-tracking wearables serve an important role in the future of sleep research and sleep medicine [28,29] and have the capacity to surpass gold-standard actigraphy in terms of functionality and accuracy of sleep measurement in the real world. The Zulu watch is designed to fill a niche, namely sleep in the operational environment, and offers on-wrist sleep–wake determination, sleep-depth scoring, long battery life, and optional feedback functionality. The purpose of the current analyses was to evaluate the ability of the Zulu watch to measure sleep variables of interest against the gold standards, PSG and research-grade actigraphy. The Zulu watch showed mixed results on EBE sleep–wake and sleep-depth staging classifications, but it performed well compared with PSG and research-grade actigraphy on several key sleep summary metrics in a sample of healthy young adults.

The Zulu watch showed high accuracy for sleep–wake determination (>90%) compared with PSG and actigraphy. Sensitivity was also greater than 90% versus PSG and actigraphy. While specificity of the Zulu watch was low compared with the other measures, the specificity of the research-grade actigraph (Actiwatch 2) compared with PSG (21%) was comparable to the Zulu watch specificity (26%). Low specificity is an issue across research actigraphy devices and scoring algorithms [30,31], with longer epoch lengths relating to lower specificity [9]. It is possible that the Actiwatch 2 specificity was impacted by re-binning data from 30-s into 2-min epochs for comparison against Zulu watch data. Epochs longer than 60 s are not recommended when using Actiwatch devices and likely affect the specificity of the device [12,31,32]. Converting the epoch length to 2 min could have resulted in a lower specificity than found in other studies using 30- or 60-s actigraphy epoch lengths.
Further, the high number of epochs scored as interrupted sleep by the Zulu watch may also contribute to its low specificity for wake versus PSG, because many Zulu watch epochs scored as interrupted sleep might be scored as wake under standard actigraphy algorithms. To better account for the duration and timing of wake or possible wake with the Zulu watch, the wake and interrupted sleep output metrics, taken together, may better reflect sleep continuity through the night than wake alone.

Sleep-depth estimation by the Zulu watch could only be compared against PSG because sleep scoring is not a feature in actigraphy. The Zulu watch showed moderate accuracy for the estimation of sleep depth compared with PSG. Interestingly, the Zulu watch showed high sensitivity but low specificity for detecting deep sleep, and the inverse (high specificity and low sensitivity) for the detection of light or interrupted sleep compared with PSG. For deep sleep, the Zulu watch may be prone to Type I errors (i.e., falsely estimating an epoch to be deep sleep), whereas for light or interrupted sleep it may be prone to Type II errors (i.e., falsely indicating that an epoch is not light or interrupted sleep). It is possible that the trade-off between sensitivity and specificity across sleep scores is due to the discrepancy between the Zulu watch’s sleep-scoring terminology and that of PSG. The Zulu watch’s sleep scoring divides sleep periods into interrupted sleep, light sleep, and deep sleep, which do not map directly onto the PSG scoring system. Previous studies have compared commercial wearable or mobile app scoring of light and deep sleep against PSG under the assumption that sleep stages N1 and N2 are comparable to light sleep, SWS is comparable to deep sleep, and REM is its own category [33–36]. However, the Zulu watch does not provide a separate category score for REM sleep; it provides the category of interrupted sleep, which has not been compared against PSG in previous studies.
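The sensitivity and specificity pattern described above can be made concrete with a small sketch. The Python below is illustrative only and is not the analysis code used in this study: the function name `one_vs_rest_metrics`, the label strings, and the toy epoch sequences are all assumptions introduced here. It treats one sleep score at a time as the positive class and counts epoch-by-epoch agreement against a reference.

```python
def one_vs_rest_metrics(reference, device, positive):
    """Return (sensitivity, specificity, accuracy) for one epoch label.

    `reference` and `device` are equal-length sequences of epoch labels
    (hypothetical strings such as "wake", "light", "deep"). Epochs labelled
    `positive` form the positive class; every other label is negative.
    """
    if len(reference) != len(device):
        raise ValueError("epoch sequences must have the same length")
    tp = fp = tn = fn = 0
    for ref, dev in zip(reference, device):
        if ref == positive and dev == positive:
            tp += 1
        elif ref != positive and dev == positive:
            fp += 1  # Type I error: device reports the class, reference does not
        elif ref == positive and dev != positive:
            fn += 1  # Type II error: device misses an epoch of the class
        else:
            tn += 1
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    accuracy = (tp + tn) / len(reference) if len(reference) else float("nan")
    return sensitivity, specificity, accuracy


# Toy data, invented for illustration only (not study data).
reference = ["light", "light", "deep", "deep", "light", "wake", "deep", "light"]
device = ["deep", "light", "deep", "deep", "deep", "light", "deep", "deep"]

deep = one_vs_rest_metrics(reference, device, "deep")
light = one_vs_rest_metrics(reference, device, "light")
```

In this toy example the device over-calls deep sleep, so sensitivity for deep sleep is high while its specificity is low, and the reverse holds for light sleep. The same function could be applied with sleep as the positive class for Sleep-Wake comparisons.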
Without further guidance from the literature, we have attempted to achieve equivalence by recoding the PSG sleep stages into Zulu watch sleep scores by averaging the PSG 30-s epochs into 2-min bins. Interrupted sleep was scored as bins that contained a balanced mixture of sleep and wake epochs, as previously described. However, it should be noted that what constitutes interrupted sleep is open to interpretation and that the distinction between wake, interrupted sleep, and light sleep could differ on an EBE basis. Moreover, deep sleep could represent any stage of sleep or a combination of NREM and REM epochs, as indicated in Figure 1 by the arrows explaining the conversion between the PSG numerical scores and range of averages for the PSG scores. In a real-world environment, the Zulu watch estimation of sleep depth can help provide context on the quality of sleep opportunities that researchers may not otherwise be able to measure. However, Zulu watch sleep scores cannot and should not be considered equivalent to PSG sleep staging. Supplementary data sources, such as sleep diaries or daily schedules, as well as hand scoring of EBE data, are recommended for any research or clinical use of actigraphy [9]. Supplementary measures may not always be practical to collect in operational research settings like the military, aviation, or shift work because they may increase the burden of data collection in a safety-sensitive environment [10]. This constitutes a limitation in the field of ecological sleep research technology in general. Researchers must use their best judgment when designing an operational research study and when interpreting Zulu watch sleep scores in the context of their study.

Paired samples t-tests and ICCs showed good agreement between the Zulu watch, PSG, and actigraphy for time of sleep onset and time of final awakening. However, for analysis of both PSG and actigraphy, the scoring of these data was informed by the scheduled bed and wake times each night. This information directs the RPSGTs and the actigraphy algorithm where to begin looking for sleep onset and final awakening. On most nights, the Zulu watch determination of time of initial sleep onset and time of final awakening was comparable to both PSG and actigraphy without any input regarding bed or wake times. There was moderate agreement between the Zulu watch, actigraphy, and PSG for TST and SE. Paired samples t-tests suggested that the limited agreement for TST and SE may in part be due to differences between actigraphy and PSG as well as differences related to the Zulu watch measures. Removing actigraphy or PSG measures did not influence the ICC results (see Supplementary Tables S1 and S2), which supports the finding that the Zulu watch has moderate agreement with either sleep measurement standard for TST and SE.
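Agreement on a summary metric such as TST can also be summarized with the Bland–Altman bias and limits of agreement [27]. The sketch below is a minimal illustration under stated assumptions, not the analysis code used in this study: the function name `bland_altman` and the nightly values are hypothetical, and the limits use the conventional bias ± 1.96 SD of the paired differences.

```python
from statistics import mean, stdev


def bland_altman(device, reference):
    """Return (bias, lower_loa, upper_loa) for paired measurements.

    bias is the mean of (device - reference); the 95% limits of agreement
    are bias +/- 1.96 * SD of the differences (sample SD, n - 1 denominator).
    """
    if len(device) != len(reference):
        raise ValueError("paired inputs must have the same length")
    if len(device) < 2:
        raise ValueError("need at least two pairs to estimate the SD")
    diffs = [d - r for d, r in zip(device, reference)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical nightly TST values in minutes, invented for illustration only.
tst_device = [430, 445, 410, 460, 438]
tst_psg = [420, 440, 415, 450, 430]

bias, lower, upper = bland_altman(tst_device, tst_psg)
```

A positive bias here would mean the device reports more sleep than the reference on average. This simple form treats each pair as independent; repeated nights from the same participant would call for a repeated-measures variant.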
Agreement between the Zulu watch and PSG or actigraphy for TIB was poor. One limitation to the analysis is that TIB was a constant measure (480 min) set by study staff across all nights of PSG and actigraphy data collection. Future evaluation of the Zulu watch would need greater variability in TIB to conduct more definitive comparisons. In the present study, the underestimation of TIB by the Zulu watch and the device’s poor specificity for wake detection most likely account for the device’s overestimation of SE. Another factor that may contribute to the Zulu watch’s underestimation of TIB is that the watch does not estimate SOL or measure time spent in bed after final awakening (snooze time). To explore this possibility, SOL and snooze time were estimated post hoc by subtracting the Zulu watch-determined time of sleep onset from the bed and wake times. Results were not statistically different between the Zulu watch and PSG (all p > 0.48). This finding indicates that the shorter TIB measured by the Zulu watch arises because the watch does not measure SOL or snooze time.

The Zulu watch has positive off-wrist detection so that periods when the watch is not being worn are not mistakenly scored as sleep, and it can measure sleep episodes as short as 20 min occurring at any time of the day. However, there were instances in the current study when the Zulu watch detected sleep intervals other than the study sleep period, even though participants were prohibited from napping and were closely monitored during the parts of the study spent in the laboratory. Checking the sleep interval database against EBE data indicated that the majority (61%) of these recorded intervals registered as off-wrist periods. Further analyses are needed to determine how off-wrist periods could be misclassified as naps, and we recommend that researchers using the Zulu watch verify naps against EBE data or a supplementary sleep diary.
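One way such a check of recorded intervals against EBE data might be automated is sketched below. This is a hypothetical illustration, not the Zulu watch software or its export format: the `Epoch` record, the function names, and the 0.5 review threshold are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Epoch:
    start: datetime   # start time of a 2-min epoch (hypothetical record layout)
    off_wrist: bool   # True if the epoch is flagged as off-wrist


def off_wrist_fraction(epochs, interval_start, interval_end):
    """Fraction of epochs starting in [interval_start, interval_end) flagged off-wrist.

    Returns None when no EBE data fall inside the interval, for example when
    the epoch log has already been overwritten.
    """
    inside = [e for e in epochs if interval_start <= e.start < interval_end]
    if not inside:
        return None
    return sum(e.off_wrist for e in inside) / len(inside)


def needs_review(epochs, interval_start, interval_end, threshold=0.5):
    """Flag an interval for manual review (threshold is an assumed cut-off)."""
    frac = off_wrist_fraction(epochs, interval_start, interval_end)
    return frac is None or frac >= threshold


# Invented example: a 20-min recorded interval whose first 8 of 10 epochs are off-wrist.
t0 = datetime(2021, 1, 1, 14, 0)
epochs = [Epoch(t0 + timedelta(minutes=2 * i), off_wrist=(i < 8)) for i in range(10)]
frac = off_wrist_fraction(epochs, t0, t0 + timedelta(minutes=20))
flag = needs_review(epochs, t0, t0 + timedelta(minutes=20))
```

Intervals with no overlapping EBE data are flagged as well, since they cannot be verified from epoch logs and would need a sleep diary or other supplementary source.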
A design feature of the Zulu watch is collection and autonomous scoring of longitudinal sleep data in an operational environment. The Zulu watch can store scored sleep interval information for up to 80 sleep events but can only store 2-min EBE data for up to 7 consecutive days prior to extraction. A related limitation of the watch is that some off-wrist periods may be coded as naps. This discrepancy can be resolved by checking epoch logs, but only for data collected within the past week. We recommend that data from longitudinal collection with the Zulu watch be extracted on a weekly basis to retain complete EBE records. Moreover, the utility of the Zulu watch in this study was tested in a controlled laboratory environment in a sample of healthy sleepers. It remains to be seen if the watch functions similarly in operational environments, or for individuals with sleep disorders who may have more wake time and variability in their sleep patterns. As such, it is still recommended to ask participants to complete a sleep log to confirm sleep episodes or to employ other methodology to confirm whether a sleep interval reflects true sleep versus a period of sedentary behavior or watch removal.

Our findings indicate that a device with single-sensor input and an epoch length longer than 60 s (i.e., the Zulu watch) can detect sleep onset and final awakening with >80% agreement, and can differentiate sleep versus wake with >90% accuracy compared to PSG or research-grade actigraphy. These design choices may not seem preferable to a multi-sensor input and smartwatch-capable design, but they allow the Zulu watch to measure sleep without sacrificing battery life or requiring continuous synchronization with a cloud-based application. Extended battery life and on-wrist scoring are two aspects of device design that should be considered if the wearable is intended for operational or epidemiological data collection where compliance, continuous data collection, and data security are concerns. The next step after laboratory validation testing of the Zulu watch will be to test its utility in real-world operations, not only to measure sleep compared to field actigraphy or self-report, but also to see how the Zulu watch measurements of sleep or sleep depth can be used to inform assumptions about sleep behavior in military, aviation, or healthcare workers, in order to predict fatigue risk and on-the-job performance [37,38].

5. Conclusions

The Zulu watch, a commercial device for longitudinal sleep tracking, was evaluated against PSG and actigraphy as a sleep measurement device in a sample of healthy young adults over the course of a 3-day laboratory sleep study, with mixed results. The Zulu watch is not without its limitations, but it showed high EBE accuracy and sensitivity for sleep and poor specificity for EBE detection of wake. The Zulu watch also showed high sensitivity for estimation of deep sleep (NREM + REM) compared with PSG, as well as moderate agreement for TST and SE. The Zulu watch was particularly good at determining the time of sleep onset and the time of final awakening. It is noteworthy that the Zulu watch can score sleep automatically on-wrist without requiring any intervention from the wearer or additional processing by a researcher or technologist. Moreover, the Zulu watch achieves this accuracy despite factors that could contribute to inaccuracy, such as long epoch lengths (2 min), on-wrist automatic scoring of sleep, and a lack of supplementary biometric sensor inputs like heart rate. These findings indicate that this monitoring system can reliably measure sleep using a minimalistic design. The Zulu watch represents an important step toward reliable, low-burden measurement of sleep in operational environments.

Supplementary Materials: The following are available online at https://www.mdpi.com/1424-8220/21/1/76/s1, Table S1: Intraclass Correlation Coefficients between Zulu Watch and PSG, Table S2: Intraclass Correlation Coefficients between Zulu Watch and Actigraphy.

Author Contributions: Conceptualization, J.K.D., E.D.C., R.R.M., L.P.S. and S.R.H.; methodology, J.K.D. and E.D.C.; validation, E.D.C., R.R.M. and S.R.H.; formal analysis, J.K.D. and E.D.C.; investigation, E.D.C. and R.R.M.; resources, E.D.C., R.R.M., L.P.S. and S.R.H.; data curation, J.K.D., E.D.C. and L.P.S.; writing—original draft preparation, J.K.D.; writing—review and editing, J.K.D., E.D.C., R.R.M., L.P.S. and S.R.H.; visualization, J.K.D.; supervision, R.R.M. and S.R.H.; project administration, E.D.C. and L.P.S.; funding acquisition, R.R.M. and S.R.H. All authors have read and agreed to the published version of the manuscript.

Funding: The NHRC research was funded by the Office of Naval Research, Code 34. Validation of the Zulu watch was funded by Medical Technology Enterprise Consortium research award MTEC-17-08-Multi-Topic-0104.

Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki, took place at the Naval Health Research Center (NHRC), and was approved by the NHRC Institutional Review Board (NHRC.2017.0008).

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: Restrictions apply to the availability of these data. Research data were obtained under a Cooperative Research and Development Agreement (CRADA) between the Institutes for Behavior Resources, Inc., and Naval Health Research Center.

Acknowledgments: PSG scoring services were performed by RPSGT staff from the Sleep and Behavioral Neuroscience Center at the University of Pittsburgh Medical Center. The authors wish to thank the study participants, as well as research staff members from NHRC who contributed to data collection.

Conflicts of Interest: The authors declare no direct conflicts of interest. The Institutes for Behavior Resources is a non-profit that sells the Zulu watch as a research tool. Authors J.K.D., L.P.S., and S.R.H. are affiliated with the Institutes for Behavior Resources but do not benefit financially or non-financially from sale of the Zulu watch. The Zulu watch is no longer being manufactured or marketed for sale by the manufacturer.

Disclaimer: This work was conducted (in part) under a Cooperative Research and Development Agreement (CRADA) between the Institutes for Behavior Resources, Inc., and Naval Health Research Center. Nothing in this document should be construed as an endorsement by the U.S. Navy. I am a military service member or employee of the U.S. Government. This work was prepared as part of my official duties. Title 17, U.S.C. §105 provides that copyright protection under this title is not available for any work of the U.S. Government. Title 17, U.S.C. §101 defines a U.S. Government work as work prepared by a military service member or employee of the U.S. Government as part of that person’s official duties. Report No. 20-97 was supported by the Office of Naval Research, Code 34, under work unit no. N1701. The views expressed in this article reflect the results of research conducted by the authors and do not necessarily reflect the official policy or position of the Department of the Navy, Department of Defense, nor the U.S. Government. The study protocol was approved by the Naval Health Research Center Institutional Review Board in compliance with all applicable federal regulations governing the protection of human subjects. Research data were derived from an approved Naval Health Research Center Institutional Review Board protocol, number NHRC.2017.0008.

References

1. Cappuccio, F.P.; D’Elia, L.; Strazzullo, P.; Miller, M.A. Sleep duration and all-cause mortality: A systematic review and meta-analysis of prospective studies. Sleep 2010, 33, 585–592.
2. Konjarski, M.; Murray, G.; Lee, V.V.; Jackson, M.L. Reciprocal relationships between daily sleep and mood: A systematic review of naturalistic prospective studies. Sleep Med. Rev. 2018, 42, 47–58.
3. Lerman, S.E.; Eskin, E.; Flower, D.J.; George, E.C.; Gerson, B.; Hartenbaum, N.; Hursh, S.R.; Moore-Ede, M. Fatigue Risk Management in the Workplace. J. Occup. Environ. Med. 2012, 54, 231–258.
4. Kripke, D.F.; Mullaney, D.J.; Messin, S.; Wyborney, V.G. Wrist actigraphic measures of sleep and rhythms. Electroencephalogr. Clin. Neurophysiol. 1978, 44, 674–676.
5. Depner, C.M.; Cheng, P.C.; Devine, J.K.; Khosla, S.; de Zambotti, M.; Robillard, R.; Vakulin, A.; Drummond, S.P.A. Wearable technologies for developing sleep and circadian biomarkers: A summary of workshop discussions. Sleep 2020, 43.
6. Berry, R.B.; Brooks, R.; Gamaldo, C.E.; Harding, S.M.; Marcus, C.; Vaughn, B.V. The AASM manual for the scoring of sleep and associated events. Rules Terminol. Techn. Specif. Darien Ill. Am. Acad. Sleep Med. 2012, 176, 2012.
7. Carskadon, M.A.; Dement, W.C. Normal human sleep: An overview. Princ. Pract. Sleep Med. 2005, 4, 13–23.
8. Smith, M.T.; McCrae, C.S.; Cheung, J.; Martin, J.L.; Harrod, C.G.; Heald, J.L.; Carden, K.A. Use of Actigraphy for the Evaluation of Sleep Disorders and Circadian Rhythm Sleep-Wake Disorders: An American Academy of Sleep Medicine Clinical Practice Guideline. J. Clin. Sleep Med. 2018, 14, 1231–1237.
9. Ancoli-Israel, S.; Martin, J.L.; Blackwell, T.; Buenaver, L.; Liu, L.; Meltzer, L.J.; Sadeh, A.; Spira, A.P.; Taylor, D.J. The SBSM Guide to Actigraphy Monitoring: Clinical and Research Applications. Behav. Sleep Med. 2015, 13, S4–S38.
10. Devine, J.K.; Choynowski, J.; Burke, T.; Carlsson, K.; Capaldi, V.F.; McKeon, A.B.; Sowden, W.J. Practice parameters for the use of actigraphy in the military operational context: The Walter Reed Army Institute of Research Operational Research Kit-Actigraphy (WORK-A). Mil. Med. Res. 2020, 7, 31.
11. Chow, C.M.; Wong, S.N.; Shin, M.; Maddox, R.G.; Feilds, K.L.; Paxton, K.; Hawke, C.; Hazell, P.; Steinbeck, K. Defining the rest interval associated with the main sleep period in actigraph scoring. Nat. Sci. Sleep 2016, 8, 321–328.
12. Berger, A.M.; Wielgus, K.K.; Young-McCaughan, S.; Fischer, P.; Farr, L.; Lee, K.A. Methodological challenges when using actigraphy in research. J. Pain Symptom. Manag. 2008, 36, 191–199.
13. Baron, K.G.; Duffecy, J.; Berendsen, M.A.; Cheung Mason, I.; Lattie, E.G.; Manalo, N.C. Feeling validated yet? A scoping review of the use of consumer-targeted wearable and mobile technology to measure and improve sleep. Sleep Med. Rev. 2018, 40, 151–159.
14. Peake, J.M.; Kerr, G.; Sullivan, J.P. A Critical Review of Consumer Wearables, Mobile Applications, and Equipment for Providing Biofeedback, Monitoring Stress, and Sleep in Physically Active Populations. Front. Physiol. 2018, 9, 743.
15. Kubala, A.G.; Barone Gibbs, B.; Buysse, D.J.; Patel, S.R.; Hall, M.H.; Kline, C.E. Field-based Measurement of Sleep: Agreement between Six Commercial Activity Monitors and a Validated Accelerometer. Behav. Sleep Med. 2019, 18, 637–652.
16. Mantua, J.; Gravel, N.; Spencer, R.M. Reliability of Sleep Measures from Four Personal Health Monitoring Devices Compared to Research-Based Actigraphy and Polysomnography. Sensors 2016, 16, 646.
17. Kahawage, P.; Jumabhoy, R.; Hamill, K.; de Zambotti, M.; Drummond, S.P.A. Validity, potential clinical utility, and comparison of consumer and research-grade activity trackers in Disorder I: In-lab validation against polysomnography. J. Sleep Res. 2020, 29, e12931.
18. Chinoy, E.D.; Cuellar, J.A.; Huwa, K.E.; Jameson, J.T.; Watson, C.H.; Bessman, S.C.; Hirsch, D.A.; Cooper, A.D.; Drummond, S.P.A.; Markwald, R.R. Performance of Seven Consumer Sleep-Tracking Devices Compared with Polysomnography. Sleep 2021, in press.
19. Winnebeck, E.C.; Fischer, D.; Leise, T.; Roenneberg, T. Dynamics and Ultradian Structure of Human Sleep in Real Life. Curr. Biol. 2018, 28, 49–59.
20. Muzet, A.; Werner, S.; Fuchs, G.; Roth, T.; Saoud, J.B.; Viola, A.U.; Schaffhauser, J.Y.; Luthringer, R. Assessing sleep architecture and continuity measures through the analysis of heart rate and wrist movement recordings in healthy subjects: Comparison with results based on polysomnography. Sleep Med. 2016, 21, 47–56.
21. Hedner, J.; White, D.P.; Malhotra, A.; Herscovici, S.; Pittman, S.D.; Zou, D.; Grote, L.; Pillar, G. Sleep staging based on autonomic signals: A multi-center validation study. J. Clin. Sleep Med. 2011, 7, 301–306.
22. Kolla, B.P.; Mansukhani, S.; Mansukhani, M.P. Consumer sleep tracking devices: A review of mechanisms, validity and utility. Exp. Rev. Med. Devices 2016, 13, 497–506.
23. Piwek, L.; Ellis, D.A.; Andrews, S.; Joinson, A. The Rise of Consumer Health Wearables: Promises and Barriers. PLoS Med. 2016, 13, e1001953.
24. Troiano, A. Wearables and Personal Health Data: Putting a Premium on Your Privacy. Brooklyn Law Rev. 2017, 82, 6.
25. Hermsen, S.; Moons, J.; Kerkhof, P.; Wiekens, C.; De Groot, M. Determinants for Sustained Use of an : Observational Study. JMIR Mhealth Uhealth 2017, 5, e164.
26. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. 2016, 15, 155–163.
27. Bland, J.M.; Altman, D. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986, 327, 307–310.
28. Ko, P.R.; Kientz, J.A.; Choe, E.K.; Kay, M.; Landis, C.A.; Watson, N.F. Consumer Sleep Technologies: A Review of the Landscape. J. Clin. Sleep Med. 2015, 11, 1455–1461.
29. Khosla, S.; Deak, M.C.; Gault, D.; Goldstein, C.A.; Hwang, D.; Kwon, Y.; O’Hearn, D.; Schutte-Rodin, S.; Yurcheshen, M.; Rosen, I.M.; et al. American Academy of Sleep Medicine Board of, D. Consumer Sleep Technology: An American Academy of Sleep Medicine Position Statement. J. Clin. Sleep Med. 2018, 14, 877–880.
30. Marino, M.; Li, Y.; Rueschman, M.N.; Winkelman, J.W.; Ellenbogen, J.M.; Solet, J.M.; Dulin, H.; Berkman, L.F.; Buxton, O.M. Measuring sleep: Accuracy, sensitivity, and specificity of wrist actigraphy compared to polysomnography. Sleep 2013, 36, 1747–1755.
31. Paquet, J.; Kawinska, A.; Carrier, J. Wake detection capacity of actigraphy during sleep. Sleep 2007, 30, 1362–1369.
32. Littner, M.; Kushida, C.A.; Anderson, W.M.; Bailey, D.; Berry, R.B.; Davila, D.G.; Hirshkowitz, M.; Kapen, S.; Kramer, M.; Loube, D.; et al. Standards of Practice Committee of the American Academy of Sleep, M. Practice parameters for the role of actigraphy in the study of sleep and circadian rhythms: An update for 2002. Sleep 2003, 26, 337–341.
33. Bhat, S.; Ferraris, A.; Gupta, D.; Mozafarian, M.; DeBari, V.A.; Gushway-Henry, N.; Gowda, S.P.; Polos, P.G.; Rubinstein, M.; Seidu, H.; et al. Is There a Clinical Role For Sleep Apps? Comparison of Sleep Cycle Detection by a Smartphone Application to Polysomnography. J. Clin. Sleep Med. 2015, 11, 709–715.
34. de Zambotti, M.; Cellini, N.; Goldstone, A.; Colrain, I.M.; Baker, F.C. Wearable Sleep Technology in Clinical and Research Settings. Med. Sci. Sports Exerc. 2019, 51, 1538–1557.
35. Liang, Z.; Chapa-Martell, M.A. Accuracy of Wristbands in Measuring Sleep Stage Transitions and the Effect of User-Specific Factors. JMIR Mhealth Uhealth 2019, 7, e13384.
36. Liang, Z.; Martell, M.A.C. Validity of consumer activity wristbands and wearable EEG for measuring overall sleep parameters and sleep structure in free-living conditions. J. Healthcare Inform. Res. 2018, 2, 152–178.
37. Paul, M.A.; Hursh, S.R.; Love, R.J. The Importance of Validating Sleep Behavior Models for Fatigue Management Software in Military Aviation. Mil. Med. 2020.
38. Devine, J.K.; Schwartz, L.P.; Hursh, S.R.; Mosher, E.; Schumacher, S.; Boyle, L.; Davis, J.E.; Smith, M.; Fitzgibbons, S. Trends in Strategic Napping in Surgical Residents by Gender, Postgraduate Year, Work Schedule, and Clinical Rotation. J. Surg. Educ. 2020.