
Unintrusive Eating Recognition using Google Glass

Shah Atiqur Rahman, Christopher Merck, Yuxiao Huang, Samantha Kleinberg
Stevens Institute of Technology, Hoboken, NJ
{srahman1,cmerck,yhuang23,samantha.kleinberg}@stevens.edu

Abstract—Activity recognition has many health applications, from helping individuals track meals and exercise to providing treatment reminders to people with chronic illness and improving closed-loop control of diabetes. While eating is one of the most fundamental health-related activities, it has proven difficult to recognize accurately and unobtrusively. Body-worn and environmental sensors lack the needed specificity, while acoustic sensors and sensors worn around the neck may be intrusive and uncomfortable. We propose a new approach to identifying eating based on head movement data from Google Glass. We develop the Glass Eating and Motion (GLEAM) dataset using sensor data collected from 38 participants conducting a series of activities including eating. We demonstrate that head movement data are sufficient to allow recognition of eating with high precision and minimal impact on privacy and comfort.

Fig. 1: Device and sample application: (a) Google Glass, (b) action recognition and (c) reminder in response to action.

I. INTRODUCTION

Chronic diseases such as obesity and diabetes affect an increasing portion of the population and require new patient-centered management strategies, as patients and their caregivers have the primary day-to-day management responsibility. Key problems include providing patient-centered decision support (e.g. adjusting insulin doses for people with diabetes), improving medication adherence, and creating logs to be reviewed with clinicians. Many of these needs center on eating: nutrition is critical to managing blood glucose, many medications are taken with food or on an empty stomach, and nutrition logs for obesity and other diseases have low adherence and accuracy.

Activity recognition has previously been used for applications such as predicting falls in the elderly using mobile phone sensors, identifying running to track exercise, and understanding movement patterns through a home. While eating has been less studied, it is critical for three areas of health. First, eating detection will enable personal informatics to automate meal logging and give feedback to users. Second, it can improve the interaction between individuals and their environment [4]. For example, an application can detect that a user is eating and silence her phone to avoid interruption. Finally, there is a strong connection between eating and health. People with chronic disease, a rapidly growing portion of the population, may particularly benefit due to the importance of self-management in the treatment of these diseases. Eating detection can support healthy individuals and those with chronic and acute disease by logging and rewarding activity (through gamification mechanisms), providing contextual medication reminders (as many medications must be taken with food or on an empty stomach), or prompting a person with diabetes to measure their blood glucose. Systems that provide frequent feedback can improve treatment adherence, but this requires accurate systems that do not intrude on daily life.

We propose that continuously collected data from unintrusive head-mounted sensors can be used to recognize eating and other activities. Our pilot Glass Eating and Motion (GLEAM) dataset shows that the sensors in Google Glass (e.g. accelerometer, gyroscope) are sensitive enough for this purpose. With a combination of machine learning methods, we achieve high accuracy for eating detection. While we focus on the use of Glass because of its flexibility for prototyping and the possibility of sending feedback to users as shown in figure 1, in the future other head-mounted sensors (e.g. earbuds) may be developed when visual feedback is not needed.

II. RELATED WORK

Much prior work has been done on activity recognition, in particular on recognizing locomotion using data from accelerometers in cellphones or body-worn sensors [11]. Less work has been done on detecting eating, with the main approaches using audio or image data, environmental sensors such as RFID, or sensors placed around the throat. The use of acoustic sensors (to detect chewing sounds) ([9], [2]) or cameras [12] raises major privacy concerns and may face challenges due to lighting or background noise, while environmental sensors [3] have limited generalizability as they depend on a controlled and tagged environment. Finally, the use of sensors to detect swallowing [1] requires these devices to be placed around the throat, which can be uncomfortable.

Ishimaru et al. have investigated head movement in conjunction with eye blink detection using Google Glass [6], but did not apply the technique to eating detection. They also tested electrooculography for discriminating several activities including eating [7], but this study was limited by a small sample size (2 participants).

III. DATA COLLECTION

To test our hypothesis that head movement can be used to recognize eating, we developed the GLEAM dataset, using data collected from 38 participants wearing Glass while conducting a series of activities in a controlled environment. Activity times were annotated by the researchers during data collection. Participants provided informed consent and the protocol was approved by the Stevens IRB. Data are available at: http://www.skleinberg.org/data.html

Fig. 2: Demographic information.

Fig. 3: Overview of activity recognition (sensor data from the Glass application is passed through feature extraction and classification to produce a recognized activity; annotations are collected with an Android application).
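To make the annotation step in the overview of figure 3 concrete, the sketch below shows one way annotated start and end times could be mapped to a label for any time point. This is an illustrative Python sketch, not the authors' code: the interval format, the example values, and the default "other" label are assumptions.

# Illustrative sketch (not the authors' code): mapping annotated activity
# intervals to the label covering a given time, defaulting to "other".
# The (start_s, end_s, activity) tuples are made-up example values.

ANNOTATIONS = [
    (0.0, 300.0, "talk"),      # seconds from the start of the session
    (300.0, 1500.0, "eat"),
    (1500.0, 1800.0, "walk"),
]

def label_at(t, annotations=ANNOTATIONS):
    """Return the annotated activity covering time t, or "other"."""
    for start, end, activity in annotations:
        if start <= t < end:
            return activity
    return "other"

if __name__ == "__main__":
    print([label_at(t) for t in (10.0, 900.0, 1600.0, 5000.0)])
    # -> ['talk', 'eat', 'walk', 'other']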

A. Procedure

Our main goal is detecting eating, but this is a small portion of a person's daily activities. Thus, our protocol involved 2 hours of data collection spanning eating, brief walks and other activities. To ensure that the eating data were representative of usual behavior, participants brought their own meals (usually lunch) to our lab space. Meals included pasta, bagels, pizza, sandwiches, sushi, yogurt, salad, and nuts.

Participants talked with the researcher, ate a meal in two parts (to yield two onset times) with a 5 minute break in-between, walked down and up stairs and across a hallway, drank a beverage, and performed other activities of their choice (e.g. reading, working on a computer) until 2 hours had elapsed. The primary activity category was "other." Not all participants performed the activities in the same order, and they were permitted to perform multiple activities simultaneously.

Participants wore Glass for the duration of data collection but did not interact with the device, to avoid biasing the data toward gestures like swipes and taps. Instead, Glass's sensors recorded movement and a researcher annotated activity start and end times on a separate device.

B. Sensors and Hardware

Data was collected from participants wearing Google Glass, which has a similar form factor to glasses but with a display and no lenses. Along one arm (on the right side of the head), as shown in figure 1a, Glass contains several sensors and computing hardware similar to that of a cellphone. The sensors include: accelerometer, gyroscope, magnetometer, and light sensor. The Glass API allows reading these sensors directly and provides processed values for gravity, linear acceleration, and a rotation vector (a quaternion representing the device's orientation). We developed Glassware that collects data from all sensors and processed sensor values except light, with a median sampling period of 395 ms. The sampling rate was chosen as a trade-off between time-granularity and battery life.
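As a rough check on the sampling behavior described above, the following sketch computes the median sampling period from a sequence of sensor timestamps. It is a minimal example using synthetic timestamps spaced at the reported ~395 ms; the released GLEAM files may store time in a different unit or layout.

# Sketch: median gap between consecutive sensor timestamps (milliseconds).
import statistics

def median_sampling_period_ms(timestamps_ms):
    """Median difference between consecutive timestamps, in milliseconds."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return statistics.median(gaps)

if __name__ == "__main__":
    # Synthetic timestamps spaced ~395 ms apart, mimicking the reported rate.
    ts = [i * 395.0 for i in range(20)]
    print(median_sampling_period_ms(ts))  # 395.0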

C. Participants

Data were collected from 21 male and 17 female participants, aged 18-21 (median 20), recruited from the University and surrounding area. We excluded individuals with prior Lasik surgery (based on Google's Glass guidance) and those with difficulty chewing or swallowing. All participants completed the full 2 hours of data collection.

Implements used for eating, drinking, and sitting are shown in figure 2. Additionally, 3 participants wore Glass atop their own prescription glasses. Pictures of food were taken prior to eating, after half the meal was completed, and at the end, to enable analysis of accuracy by food type and meal size.

D. Annotation

Researchers observing the participants annotated the start and end times of all activities using an Android app we developed for the Galaxy cellphone, as shown in figure 3. The clocks of Glass and the cellphone were synchronized. Activity onsets were defined as follows: eating begins when food is first placed in the mouth, talking begins when a word is spoken, drinking begins when a beverage or straw is placed in the mouth, and walking or navigating stairs begins with the first footfall. If a participant performed several activities at once, only the dominant activity was annotated. Thus the end of one activity is the beginning of another.

To ensure the quality of annotation, we conducted preliminary tests of inter-rater reliability. Two research group members were video recorded as they talked, ate, drank, and walked, for a total of 150 activities (75 onsets and 75 offsets). The videos were then independently annotated by two researchers. The mean absolute error in seconds between each annotator and the video (treated as the ground truth) is shown in table I for each of the four main activities and for the combined onsets and offsets. Across all the activities, the mean error is 0.77 and 1.10 seconds for annotators 1 and 2 respectively. Onset errors are slightly lower than offset errors, but using a paired t-test no differences between the annotators are statistically significant (p > 0.2147 in all cases).

         A1     A2     #    p-value
Onset    0.75   1.07   75   0.6944
Offset   0.79   1.12   75   0.2955
Eat      0.56   0.76   62   0.2612
Drink    1.14   1.32   22   0.7473
Walk     0.50   1.50   22   0.3577
Talk     0.91   1.27   44   0.2147

TABLE I: Mean annotation errors (in seconds) for two annotators (A1, A2). P-values are based on a paired t-test.

IV. EATING RECOGNITION

The key steps of the activity recognition process are extracting features from the sensor data and training the classifier.

A. Feature Extraction

The raw sensor data was divided into minute-long non-overlapping windows, with the window length chosen to be long enough to capture periodic head movements but short enough for granular detection of activity onsets. To explore the effect of window boundaries on classification performance, we generated features based on 6 different window offsets (0 through 50 seconds in 10 second steps).

From each window we generate a 180-element feature vector consisting of features proven specifically useful for activity recognition ([2], [13]): statistical features (mean, intensity, variance, skewness, kurtosis, crest-factor), spectral features (spectral flux, centroid, and roll-off coefficient [8]), and temporal features (delta and delta-delta), for each of 6 sensors with 3 axes. Each window is labeled with the most frequent activity class occurring within it. An example of the raw data and the predicted class for each window, along with the ground-truth labels, is shown in figure 4.

Fig. 4: Activity annotations, accelerometer x-axis data, and eating detected by the LOWO RF classifier for one participant (x-axis: time in minutes).

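The sketch below computes a subset of the per-axis window features described above (statistical moments, crest factor, and two spectral descriptors). It is an illustration only: the exact definitions behind the 180-element GLEAM vectors (e.g. intensity, spectral flux, the roll-off energy threshold, and the delta features) are not given here, so the specific choices below are assumptions.

# Sketch: a few statistical and spectral descriptors for one sensor axis of
# one ~60 s window (not the authors' exact feature set).
import numpy as np
from scipy.stats import skew, kurtosis

def axis_features(x, fs_hz=1.0 / 0.395):
    """Descriptors for one axis of one window sampled at roughly fs_hz."""
    x = np.asarray(x, dtype=float)
    rms = np.sqrt(np.mean(x ** 2))
    spectrum = np.abs(np.fft.rfft(x - x.mean()))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs_hz)
    power = spectrum ** 2
    centroid = float(np.sum(freqs * power) / np.sum(power)) if power.sum() else 0.0
    # Spectral roll-off: frequency below which 85% of the spectral energy lies
    # (the 85% threshold is an assumption).
    cumulative = np.cumsum(power)
    rolloff = float(freqs[np.searchsorted(cumulative, 0.85 * cumulative[-1])])
    return {
        "mean": float(x.mean()),
        "variance": float(x.var()),
        "skewness": float(skew(x)),
        "kurtosis": float(kurtosis(x)),
        "crest_factor": float(np.max(np.abs(x)) / rms) if rms else 0.0,
        "spectral_centroid": centroid,
        "spectral_rolloff": rolloff,
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    window = rng.normal(size=152)  # about 60 s of samples at a ~395 ms period
    print(axis_features(window))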
B. Cross-Validation

Due to heterogeneity between participants, we expect classification performance will be improved by training on some individual data. To evaluate the relationship between the amount of training data for a given participant and classifier performance, we use three cross-validation approaches which test progressively more individual training data: leave one participant out (LOPO), leave half participant out (LHPO), and leave one window out (LOWO).

LOPO has 38 cross-validation folds, each created by training a classifier on data from 37 participants and testing on the held out one. In LHPO, each participant's data is divided into two parts by splitting each labeled activity in half, giving 76 folds; a classifier is trained on the data in one half and evaluated on the other. Finally, in LOWO, data from the other 37 participants is used for training, along with all of the remaining participant's data except for a single 1-minute window, which is then used for evaluating the classifier.

C. Classification

We evaluated Gaussian Naive Bayes (NB) and the C4.5 decision tree due to their prevalence in related work. We also evaluated Random Forest (RF) with 10 trees, due to its robustness to non-informative features, and k-nearest neighbor (kNN) with k = 1. As the performance of kNN and NB are negatively impacted by the presence of non-informative features, we trained these classifiers on only the 10 most informative features, selected using an Information Gain criterion. All classification and feature selection used implementations in WEKA [5] with default parameters.

D. Performance Metric

The appropriate choice of performance metric is application dependent. Our data has a strong class imbalance (only 11.57% of samples are eating, while 72.43% are free time spent primarily on reading and homework), so accuracy alone is not an appropriate metric. We therefore report the Fβ score, where β controls the relative importance of precision and recall. Our envisioned applications (e.g. automated meal logging and medication reminders) are most sensitive to false positives, which would result in spuriously logged events and annoying alerts, and we therefore particularly value precision. Furthermore, because our annotations indicate the start and end times of entire meals or courses, false negatives may actually be due to pauses between bites. Therefore we report the F0.5 score, which weights precision more strongly than recall, as well as the Area Under the Curve (AUC) for comparison with related work.

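For readers reproducing this protocol outside WEKA, the sketch below wires together a leave-one-participant-out split, a 10-tree random forest, and the F0.5 score using scikit-learn equivalents. The data are random placeholders with roughly the reported class balance; this is an evaluation skeleton under those assumptions, not the authors' pipeline.

# Sketch: LOPO evaluation with an RF and the F0.5 metric (scikit-learn).
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 0.5 favors precision.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import fbeta_score
from sklearn.model_selection import LeaveOneGroupOut

rng = np.random.default_rng(0)
X = rng.normal(size=(38 * 20, 180))        # one 180-element vector per window
y = rng.random(38 * 20) < 0.1157           # ~11.57% "eat" windows (synthetic)
groups = np.repeat(np.arange(38), 20)      # participant id per window (LOPO groups)

predictions = np.zeros_like(y)
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf = RandomForestClassifier(n_estimators=10, random_state=0)
    clf.fit(X[train_idx], y[train_idx])
    predictions[test_idx] = clf.predict(X[test_idx])

print("LOPO F0.5:", fbeta_score(y, predictions, beta=0.5, zero_division=0))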
V. EXPERIMENTAL RESULTS

Focusing on the classification between eating and other data, the F0.5 score for each classifier and cross-validation approach is given in table II. Corresponding confusion matrices are shown in table III. Tree-based classification methods performed well due to implicit feature selection, and more training data consistently improved performance, with the LOWO RF classifier performing best. Window offset was not found to significantly impact RF performance (p = 0.92 using one-way ANOVA).

Classifier  Metric  LOPO    LHPO    LOWO
k-NN        F0.5    31.27%  33.79%  37.06%
            AUC     0.638   0.644   0.652
NB          F0.5    19.71%  20.16%  19.19%
            AUC     0.786   0.804   0.779
C4.5        F0.5    41.95%  50.0%   53.53%
            AUC     0.639   0.611   0.677
RF          F0.5    49.73%  58.77%  67.55%
            AUC     0.858   0.884   0.922

TABLE II: F0.5 score and AUC for different classifiers and different cross-validation approaches.

     k-NN         NB           C4.5         RF
     E     O      E     O      E     O      E     O
E    213   325    495   43     273   265    244   294
O    371   3708   2595  1484   230   3849   73    4006

TABLE III: Confusion matrices for eat (E) and other (O) for LOWO. Rows are actual and columns classified label.

Results varied between participants, as shown in figure 5. No eating was detected in 9 participants, including two who frequently adjusted their prescription glasses, one who moved abruptly in a rolling chair, and one who ate only a small meal of pudding and cookies. Shorter meals (less than 15 minutes) were less likely to be recognized. However, for 11 participants 100% precision was achieved. The results are robust with respect to the objects used for eating, with no significant difference in LOWO RF F0.5 score (p = 0.80 using one-way ANOVA). Similarly, sitting on a rolling versus fixed chair had no significant effect (p = 0.52 using an unpaired t-test).

Fig. 5: Performance of the LOWO RF classifier for the eating class across 38 participants (axes: precision and recall, 0-100%). The marker areas are proportional to the number of participants with identical performance.

The performance achieved here is better than that reported by other approaches to eating detection, showing the value of using head movement for this purpose. For example, Logan et al. [10] used more than 900 environmental sensors tagged with objects and achieved an AUC of 0.587 for eating recognition, whereas our AUC is 0.92 for RF. Finally, the more intrusive audio-visual data of Liu et al. [9] suffered from a high false positive rate: 13.07% of the "other" class was detected as eating, in contrast to only 1.79% in our study.

The performance of our LOWO RF classifier relies on user-specific training data. Although in our study participants did not interact with Glass, future studies may evaluate an active learning approach where the Glass app prompts users for feedback to gather the required training data so that eating detection may be iteratively improved.

VI. CONCLUSION

While eating recognition has lagged behind that of other activities, newly developed head-mounted sensors have made this feasible with high accuracy in an unobtrusive and privacy-preserving manner. We propose a new approach to recognizing eating from head movement that can facilitate automated nutrition logs, smart medication reminders, and individualized chronic disease management. The data collected are a realistic sample of eating and drinking as they contain significant natural variation, such as participants doing multiple activities at once and eating their own foods with their chosen utensils. Even without developing algorithms specifically for this purpose, our study achieved a higher F0.5 (67.55%) for eating recognition than the best reported methods, and we demonstrated robustness to the utensils used. Future work may focus on active learning to improve performance over time, determining activity onset time and duration, and using the meal photographs to estimate meal size and type.

ACKNOWLEDGMENTS

We thank Jason Gardella and Mark Mirtchouk for their assistance. Data collection was supported in part by the Center for Healthcare Innovation at Stevens. SK was supported in part by NSF Award #1347119.

REFERENCES

[1] O. Amft and G. Tröster, "On-Body Sensing Solutions for Automatic Dietary Monitoring," IEEE Pervasive Computing, vol. 8, no. 2, 2009.
[2] Y. Bi, W. Xu, N. Guan, Y. Wei, and W. Yi, "Pervasive eating habits monitoring and recognition through a wearable acoustic sensor," in Pervasive Health, 2014.
[3] M. Buettner, R. Prasad, M. Philipose, and D. Wetherall, "Recognizing Daily Activities with RFID-based Sensors," in UbiComp, 2009.
[4] J. García-Rodríguez and J. M. García-Chamizo, "Surveillance and Human-computer Interaction Applications of Self-growing Models," Appl. Soft Comput., vol. 11, no. 7, 2011.
[5] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explor. Newsl., vol. 11, no. 1, 2009.
[6] S. Ishimaru, K. Kunze, K. Kise et al., "In the blink of an eye: Combining head motion and eye blink frequency for activity recognition with Google Glass," in Augmented Human, 2014, pp. 15:1–15:4.
[7] S. Ishimaru, Y. Uema, K. Kunze, K. Kise, K. Tanaka, and M. Inami, "Smarter eyewear: using commercial EOG glasses for activity recognition," in UbiComp Adjunct, 2014, pp. 239–242.
[8] D.-S. Kim, S.-Y. Lee, and R. Kil, "Auditory processing of speech signals for robust speech recognition in real-world noisy environments," IEEE Transactions on Speech and Audio Processing, vol. 7, no. 1, 1999.
[9] J. Liu, E. Johns, L. Atallah, C. Pettitt, B. Lo, G. Frost, and G.-Z. Yang, "An intelligent food-intake monitoring system using wearable sensors," in Proc. BSN, 2012.
[10] B. Logan, J. Healey, M. Philipose, E. M. Tapia, and S. Intille, "A long-term evaluation of sensing modalities for activity recognition," in UbiComp, 2007.
[11] T. Plötz, N. Y. Hammerla, and P. Olivier, "Feature Learning for Activity Recognition in Ubiquitous Computing," in IJCAI, 2011.
[12] E. Thomaz, A. Parnami, J. Bidwell, I. Essa, and G. D. Abowd, "Technological Approaches for Addressing Privacy Concerns when Recognizing Eating Behaviors with Wearable Cameras," in UbiComp, 2013.
[13] K. Yatani and K. N. Truong, "BodyScope: a wearable acoustic sensor for activity recognition," in UbiComp, 2012, pp. 341–350.