Individual-level Anxiety Detection and Prediction from Longitudinal YouTube and Google Search Engagement Logs

Anis Zaman (University of Rochester), Boyu Zhang (University of Rochester), Henry Kautz (University of Rochester), Vincent Silenzio (Rutgers University), Ehsan Hoque (University of Rochester)

arXiv:2007.00613v2 [cs.HC] 30 Nov 2020

ABSTRACT
Anxiety disorder is one of the world’s most prevalent mental health conditions, arising from complex interactions of biological and environmental factors and severely interfering with one’s ability to lead normal life activities. Current methods for detecting anxiety rely heavily on in-person interviews, which can be expensive, time-consuming, and blocked by social stigmas. In this work, we propose an alternative method to identify individuals with anxiety and further estimate their levels of anxiety using personal online activity histories from YouTube and the Google Search engine, platforms that are used by millions of people daily. We ran a longitudinal study and collected multiple rounds of anonymized YouTube and Google Search logs from volunteering participants, along with their clinically validated ground-truth anxiety assessment scores. We then developed explainable features that capture both the temporal and contextual aspects of online behaviors. Using those, we were able to train models that (i) identify individuals having anxiety disorder with an average F1 score of 0.83 ± 0.09 and (ii) assess the level of anxiety by predicting the gold standard Generalized Anxiety Disorder 7-item scores (ranging from 0 to 21) with a mean square error of 1.87 ± 0.15 based on the ubiquitous individual-level online engagement data. Our proposed anxiety assessment framework is cost-effective, time-saving, and scalable, and it opens the door to deployment in real-world clinical settings, empowering care providers and therapists to learn about anxiety disorders of patients non-invasively at any moment in time.

CCS CONCEPTS
• Information systems → Web search engines; • Applied computing → Health informatics; Psychology; • Human-centered computing → Empirical studies in HCI.

KEYWORDS
anxiety, mental health, prediction, Google Search history, YouTube history

ACM Reference Format:
Anis Zaman, Boyu Zhang, Henry Kautz, Vincent Silenzio, and Ehsan Hoque. 2018. Individual-level Anxiety Detection and Prediction from Longitudinal YouTube and Google Search Engagement Logs. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
According to the World Health Organization (WHO), 1 in 13 people suffers from anxiety globally, making it one of the most prevalent mental health concerns. In the United States, it is the second leading cause of disability among all psychiatric disorders [67]. Nearly 40 million people (age 18 and older) experience an anxiety disorder in any given year, yet only 35.9% of those suffering receive treatment.¹ A study in 2017 reported that the level of anxiety among young adolescents has been gradually increasing in recent years [7].

The population most vulnerable to anxiety disorder is students in high school and early college years. A report by the American College Health Association in 2018 stated that 63% of college students in the U.S. felt overwhelming anxiety during the last 12 months, and only 23% of these students were either diagnosed or treated for an anxiety disorder by a professional mental healthcare provider [3]. During the early days of college, students are separated from their traditional support system and find themselves in challenging social and academic settings such as living with roommates, developing independent identities, making new friends, managing heavy workloads, etc. All these experiences induce spikes in anxiety from time to time [48], and this psychological distress increases during the first few semesters of college [10]. Furthermore, it has been reported that anxiety disorders are significantly associated with other medical and psychiatric comorbidities [14]. Despite such a high prevalence of anxiety among young adolescents, current

methods for detecting anxiety disorders consist of self-assessment surveys and in-person interviews, which can be time-consuming, expensive, and imprecise, and which are hampered by factors such as fear, concealed information, and the social stigma related to mental health issues.

Engagements in online platforms are major components in the lives of young adults [29]. On average, an internet user spent the equivalent of more than 100 days online during the last 12 months [63]. It has been reported that 81% of U.S. internet users aged 15 to 25 use YouTube regularly.² Besides, an average internet user uses Google Search at least once a day, and many search dozens of times a day.³ Extensive studies have tried to correlate mental health issues with popular public social media data such as Facebook [6, 40, 42] and Twitter [9, 11, 13, 23], yet they may fail to cover people who interact infrequently with social media or who post falsely positive impressions publicly [21]. In contrast, individual-level search and YouTube logs are ubiquitous and private for each user and are less likely to be subject to self-censorship. A group of researchers has shown that search logs can be used as a proxy for detecting mental health issues [1, 28, 71]. We draw inspiration from these prior works and hypothesize that private Google Search engine logs and YouTube histories can leave a detailed digital trace of the mental health states of users and be used as a proxy to assess the level of anxiety for individuals.

In this work, we propose a framework that leverages individual-level online activity logs, in particular Google Search and YouTube activity histories, to identify individuals with anxiety disorder and further predict their level of anxiety. We ran a longitudinal study to gather two rounds of data, 5 months apart, from a college population. During each round, participants shared their anonymized online activity histories along with their answers to a clinically validated questionnaire for measuring Generalized Anxiety Disorder (GAD-7) [59]. We then developed an explainable low-dimensional vector representation that captures different aspects of one’s online behaviors, including temporal activity patterns, time and semantic diversities, and periods of inactivity. Using these feature representations, we trained models that can accurately detect and predict one’s level of anxiety from online activities. Unlike [71], which focused only on detecting mental health issues such as self-esteem from Google Search histories, our data incorporates both Google Search and YouTube activity histories, and our two rounds of data facilitate both the detection and prediction tasks. Furthermore, we conduct our experiment with a framework that fits possible real-world applications. We envision our work as an important step towards helping caregivers better understand and engage with their patients without additional burden through passive data and ubiquitous computing.

In summary, this work is unique in that (i) we are the first to run a longitudinal study where individual-level Google Search and YouTube histories along with gold-standard clinically validated anxiety assessments are gathered; (ii) we define explainable features that capture both the semantic and temporal aspects of online activities, including a novel representation of periods of activity and inactivity based on temporal point processes; (iii) using these features, we managed to both detect and predict the anxiety disorder of an individual with high performance, showing that ubiquitous private online logs contain strong signals that can potentially be a proxy to assess mental health issues; and (iv) our pioneering two rounds of data and lightweight experiment setup have strong societal implications and can empower care providers to estimate the anxiety levels of patients remotely in a non-invasive manner.

¹ https://adaa.org/understanding-anxiety
² https://www.statista.com/statistics/296227/us-youtube-reach-age-gender/
³ https://bit.ly/382vgWD

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. Woodstock ’18, June 03–05, 2018, Woodstock, NY. © 2018 Association for Computing Machinery. ACM ISBN 978-1-4503-XXXX-X/18/06...$15.00 https://doi.org/10.1145/1122445.1122456

2 RELATED WORK
Public social media, blogs, and forums have become popular data sources for researchers to study the prevalence of mental health conditions. [57] showed that the usage of social media sites correlates with user depression and anxiety. Twitter, one of the most explored social media platforms, has been used to detect insomnia [27], suicidal ideations [18], depressed individuals [17], the extent of depression [64], and language related to depression and PTSD [12, 46, 49, 50]. Besides, [16] have shown that Facebook status can be used to predict postpartum depression and monitor depression [56]. Other researchers have leveraged data from Reddit to study mental distress among adolescents [5]. [18] identified that shifts in language may indicate future suicidal ideations. De Choudhury et al. provide a comprehensive overview of the role of social media in mental health research [15] and evaluation methodologies [9]. Social media users constitute only a fraction of the general population, and only a small number of them, with particular personalities or demographics, typically act out on social media in ways that may reveal signs of mental health struggles. Hence, findings based on social media platforms may not generalize to the majority of the population.

A large number of researchers have leveraged sensors, such as smartphones and mobile apps, that are embedded in our daily life experiences to capture various aspects of mental health [38, 65]. For example, sensor data has been used for studies of anxiety [19, 47, 51], stress [39, 55], moods [33, 34], and depression [53, 54, 66]. Several research groups have developed applications to help users manage stress and anxiety [35, 45] and evoke positive emotions [2]. However, smartphone applications for tackling mental health issues have several limitations: (i) not every mental health patient has access to a smartphone; (ii) interventions delivered via apps are less likely to be as effective as face-to-face sessions with therapists [72]; (iii) the app may fail and require developers to constantly keep it updated, which is costly and not sustainable.

One data source that can capture the in-the-moment thoughts and feelings of a broad range of people is search engine logs, which may fill the gap for continuous monitoring applications [38]. Researchers have used population-level search engine logs from Google Trends to monitor depression and suicide-related behaviors [24, 36, 60, 70], identify seasonality in seeking mental health information [4], and demonstrate their use in screening for diseases [43] such as pancreatic cancer [44]. A comprehensive review of the usage of Google Trends in the healthcare domain has been provided by [41]. A crucial difference between these previous works and ours is that we aim to accurately predict the mental health of particular individuals, not general populations. Unlike population-level online engagement logs in Google Trends, our individual-level activity logs are more likely to fit the fabric of one’s daily life experience.

⁴ http://takeout.google.com/

3 DATA
The longitudinal data collected for this work consisted of individual-level Google Search logs, YouTube history, and clinical survey responses that are very personal and sensitive in nature. Similar to [71], we leveraged a cloud-based data collection process using Google Takeout,⁴ a web interface that enables Google product

users to export their Google Search and YouTube activity histories. Our cloud-based data collection pipeline (see Figure 1) has been thoroughly vetted by the Institutional Review Board (IRB) of our institution in order to ensure the privacy and safety of subjects.

Figure 1: Obtaining data from an individual.

3.1 Study Recruitment Procedure
The study ran for 5 months starting in August 2019. Participation was voluntary, and one needed to be at least 18 years old and have a Google account to qualify for the study. The recruitment procedure was designed as a one-on-one interview. During the recruitment, participants answered the 7-item Generalized Anxiety Disorder questionnaire, a clinically validated tool for assessing anxiety disorder, and reported their GPA, gender, and demographics. Following that, participants signed in with their Google accounts and initiated the Google Search and YouTube activity history data download process. Before the data was shared with the research team, all sensitive information such as name, email, phone number, social security number, and financial information (banking and credit card) was redacted and anonymized using Google’s Data Loss Prevention (DLP) API [30, 31].

In total, we collected two rounds of data. The recruitment procedure above was performed during each round. In August 2019, 104 qualified college students participated in the first round. For the rest of the paper, we will refer to this round of data as the first-round data. Five months later, we invited all 104 participants from the first-round for a follow-up and were able to follow up with 72 individuals. We collected their Google and YouTube activity histories again, along with the survey responses for the second time. For the rest of the paper, we will refer to data collected in the second round as the follow-up data. Therefore, in total 72 people participated in both rounds and 104 − 72 = 32 people participated only in the first-round. The overall recruitment timeline and participant statistics are shown in Figure 2. All participants were compensated with $10 Amazon gift cards at the beginning of each round of participation. About 34% of our participants are male and 65% female. Figure 3(a) presents a comprehensive breakdown of the demographics of the study population.

Figure 2: Timeline and participants for two rounds of data collection. In total, 104 unique individuals participated in the study, and 72 of them participated in both the first-round and the follow-up.

Figure 3: Study population breakdown: (a) Demographics of the participants. (b) Distributions of subjects with/without anxiety conditions during the first and the follow-up rounds, computed based on the survey response via the GAD-7 questionnaire.

3.2 Ground Truth via Survey
The ground truth about one’s anxiety disorder was measured using the Generalized Anxiety Disorder (GAD-7) [59], a clinically validated questionnaire (7 questions⁵) which has been reported to be quite accurate in assessing the severity of anxiety [61]. The questions in GAD-7 were prefixed with a text for the temporal context, for example, “Over the last six months, how often have you been bothered by the following problems?” The responses were compiled to compute an anxiety score. The 21-point GAD-7 scale is commonly used in clinical diagnosis, where scores of 5, 10, and 15 are treated as cutoffs for mild, moderate, and severe anxiety levels, respectively. Further follow-up and evaluation are recommended for someone with an anxiety score greater than 9 [69], and we used the recommended score of 9 as a cutoff to label individuals with anxiety disorder. In this work, any individual with GAD-7 score > 9 is labelled as Anxious, and someone with score ≤ 9 is labelled as Not-anxious. Figure 3(b) shows the breakdown after applying the anxiety cutoff. Figure 4 shows the distribution and changes of anxiety scores for all the participants who participated in both the first-round and the follow-up. We observed that the anxiety score increased for 22 individuals, decreased for 32 people, and remained unchanged for 18 participants. It is worth noting that 9 participants had a clinically significant change in GAD-7 score (absolute value of the change ≥ 5) during the 5 months of study.

⁵ https://www.mdcalc.com/gad-7-general-anxiety-disorder-7

3.3 YouTube & Google Search History
For this study, we collected individual-level online engagement logs from YouTube and the Google Search engine using the Google Takeout interface. Google ties all online activities to the Google account associated with the user. The Takeout platform aggregates user engagement logs from all the different sources and makes them easily accessible. This means that as long as someone is logged into his/her/their Google account, all engagements are recorded and unified under the single Google account regardless of which device

was used. For every person, the online activity history spanned (on average) over 5.7 years. In total, 1,966,400 Google searches and 1,055,847 YouTube interactions were made by all the participants.

Figure 4: GAD-7 scores during the first-round and follow-up. Red lines represent an increase, and green lines represent an unchanged or decreased anxiety score. Multiple lines originating from one score mean that more than one person had that anxiety score.

Every engagement on YouTube and the Google Search engine is timestamped, along with the information whether it is the result of watching or searching. For YouTube activity logs, we use the YouTube API to extract metadata about the videos that have been watched, which includes the title, category, video length, rating, number of likes, number of dislikes, etc. Any video living in the YouTube ecosystem has an associated category label, and this enables us to get more context about the video. For Google Search activities, we label every search query text using the content classification feature of the Google Cloud NLP API.⁶ Given a query, the API returns one or more possible category labels for the text along with a confidence score. When applicable, we select the category label with the highest confidence. The API returns a hierarchical label for every query, and we consider the root level in the hierarchy as the category label for the query. For instance, for a query $q$, if the label from the API is “/News/Sports”, we consider “News” as the category for $q$. The comprehensive lists of all the categories for both search queries and YouTube videos are listed in [22] and [62].

⁶ https://cloud.google.com/natural-language/docs/classifying-text

4 FEATURE EXTRACTION FROM ONLINE DATA
In this section, we explain how we extracted explainable features from online history logs for each participant. Individual-level online engagement logs from YouTube and the Google Search engine provide a unique opportunity to capture what may be going through one’s mind at any given time. Since online activities are timestamped, one can investigate the weekday/weekend activity frequency and variance, calculate the contextual and temporal variability of these activities, estimate daily sleeping/resting duration, etc. For example, Figure 5 demonstrates the distribution of activities on YouTube and the Google Search engine over a week for a specific individual in our dataset. We observe the bursty nature of the incidence of these activities, which we will leverage to construct features later in the section. On aggregating daily activities, we found that there is a higher number of interactions on these two platforms at the beginning of the week, and, as the week progresses, the number decreases. One possible explanation for drops in activities during weekends is that people are probably spending less time interacting on the internet and more time relaxing, socializing, and connecting with people around them. Notice that each of the following features is a scalar and is calculated for each individual participant. In total, we explored five types of features, each with a number of variants, as described below.

Figure 5: Example online activities distribution from a participant over a week, including both Google Search and YouTube activities. Each row is a day, and each ‘|’ bar represents a single online activity. The histogram on the right side shows the total online activities for each day. Notice the burstiness of daily online activities.

4.1 Activity Mean and Variance
We define the activity mean and variance to measure the overall distribution of an individual’s online interactions on YouTube and the Google Search engine. We calculate the daily and weekly mean and variance of the number of activities on YouTube and Google Search for each participant separately, and take the normalized log of the mean and variance for numerical stability.

4.2 Category ($C_H$) & Time ($T_H$) Entropy
Every online activity has two components associated with it, namely its category and the timestamp of its occurrence. Drawing inspiration from information theory [58], we define category entropy, $C_H$, as a measure of how diverse an individual’s online activities are in terms of the semantic context. For an individual $p$, based on his/her/their online data, we compute the category entropy in the following way:

$$H_p(\mathit{Category}) = -\sum_{i=1}^{m} P_i \times \log(P_i) \tag{1}$$

where $m$ is the number of distinct categories in the online activities of $p$, and $P_i$ is the percentage of activities that belong to category $i$. A high entropy indicates that $p$ interacts more uniformly across different categories online, whereas lower entropy indicates larger inequality in the number of online activities across the categories. Considering that individuals may have different habits during weekdays and weekends, we also calculated the category entropy for weekdays and weekends separately. We include the total, weekday, and weekend category entropy as features for each individual. We denote them as $C_H^{weekday}$, $C_H^{weekend}$, and $C_H^{total}$.

Similarly, we define time entropy, $T_H$, as a measure of how diverse an individual’s online activities are in terms of when they happen. We define the discrete bins for time entropy as the 24 hours of a day. For a person $p$, time entropy is computed as below:

$$H_p(\mathit{Time}) = -\sum_{i=1}^{24} P_i \times \log(P_i) \tag{2}$$

where the summation is taken over the 24 hour marks, and $P_i$ is the percentage of activities that happen during hour $i$. A high entropy indicates that $p$ interacts with YouTube and the Google Search engine more uniformly across different times of a day, whereas lower entropy indicates larger inequalities in the number of online activities between different hours in a day. Similar to category entropy, we obtain the time entropy for weekdays and weekends separately. We denote them as $T_H^{weekday}$, $T_H^{weekend}$, and $T_H^{total}$.

4.3 Online Activities Temporality $\{\gamma, \alpha, \beta\}$
We observed that online activities are bursty when plotted on the time axis (see Figure 5), which results in clusters of online activities regardless of whether they are Google searches or YouTube views. In other words, we can view the incidences of online activities as a Temporal Point Process and investigate individual-level online behaviors from a temporal point of view, such as the Inter-event Times (IETs). We enrich our temporal feature by assuming dependencies between past activities and the next activity. The intuition is that every occurrence of an online activity increases the probability of future online activities, and the probability of the next activity decays with time. Such a process, called a self-exciting point process, can be modeled by the Hawkes Process [26], which has been widely used for modeling online data and social media activities at a population level [52]. Specifically, we define a univariate Hawkes Process with an exponential decay kernel as

$$\lambda(t) = \gamma + \alpha\beta \sum_{t_i < t} \exp\left(-\beta (t - t_i)\right) \tag{3}$$

where $\lambda(t)$ represents the probability (intensity) of an activity occurring at time $t$, $\gamma$ is the background intensity of an activity happening exogenously, $\alpha$ represents the infectivity factor which controls the average number of new activities triggered by any past activity, and $\beta$ is the decay rate, where $\frac{1}{\beta}$ represents how much time has passed, on average, between the previous event and the next event. By fitting the above Hawkes Process to each individual online history log, we obtain a unique set of $\{\gamma, \alpha, \beta\}$ for each participant as features. We keep the notation $\{\gamma, \alpha, \beta\}$ for this set of features.

4.4 Inactivity Period $I$
It has been reported that YouTube is becoming the modern-day classroom for students [20] and provides new ways to consume content for virtually every age group [8]. However, spending too much time on any platform can lead to internet addiction [25], in particular YouTube addiction [37] and the compulsive usage of YouTube [32], which are quite prevalent among the college population. These previous findings have inspired us to consider a feature that can be treated as a proxy for each participant’s time away from the internet, and we call it the inactivity period $I$.

We focus on periods of time when neither a Google Search nor a YouTube activity was performed by each individual. Given the online activity log of a participant and a duration threshold of $k$ hours, we pick out all the inactive periods longer than $k$ hours and investigate when they happened most frequently. Specifically, for all inactivity periods longer than $k$ hours, we first calculate the midpoint timestamp for each of them. For example, for an 8-hour inactivity period starting at 11 P.M. and ending at 7 A.M., the midpoint is 3 A.M. We found that, for all our participants and $k \in \{8, 9, 10\}$, all the midpoint modes fall between 5 and 8 A.M., which are most likely to be the middle of sleeping periods. Notice that, for the inactivity defined here, we are focusing on when it occurs most frequently for each individual. Hence, it is not suitable to take the mean and variance of inactivity midpoints. We included the modes of midpoints for thresholds $k \in \{8, 9, 10\}$ for each individual as features. We denote, for threshold $k \in \{8, 9, 10\}$, the inactivity mode features as $I_8$, $I_9$, and $I_{10}$.

Overall, we developed 16 features (including variants) from the online activities (YouTube and Google Search engine) of each individual: 4 from Activity Mean & Variance; 3 from each of the Category Entropy $C_H$, Time Entropy $T_H$, Online Activities Temporality $\{\gamma, \alpha, \beta\}$, and Inactivity Periods $I$.

5 MODELING ANXIETY
Following the clinical anxiety score cutoff threshold [59], participants with GAD-7 score > 9 were labelled as anxious subjects, and those with score ≤ 9 were labelled as non-anxious subjects. Overall, there were 60 out of 104 subjects with anxiety conditions in the first-round and 40 out of 72 participants with anxiety conditions during the follow-up. Given one’s YouTube and Google Search activity history, we explore: (i) Can we identify individuals with an anxiety condition through his/her/their online data? (ii) Can we predict the anxiety score based on online activities and past anxiety levels?

5.1 Notations and Definitions
The feature vectors for the first-round are extracted using the most recent 12 months of data (the grey box in Figure 2) before the completion of the first-round survey. We denote this by $\boldsymbol{x}_1 \in \mathbb{R}^{16}$. Unless mentioned specifically, $\boldsymbol{x}_1$ is the concatenation of all 16 scalar features in Section 4 in the same order for each individual. The corresponding GAD-7 scores, gathered via the survey (the green box in Figure 2) during the first-round, are denoted as $y_1$. Similarly, for the follow-up round, the feature vectors are extracted solely from the 5 months of online history data (the blue box in Figure 2) in between the first-round and the follow-up, and we denote them as $\boldsymbol{x}_2 \in \mathbb{R}^{16}$. The corresponding GAD-7 scores, provided in the follow-up survey (the magenta box in Figure 2), are denoted as $y_2$. Therefore, there are in total 104 $(\boldsymbol{x}_1, y_1)$ pairs from the first-round and 72 $(\boldsymbol{x}_2, y_2)$ pairs from the follow-up (see Figure 2 & Section 3.1).

5.2 Classifying Individuals with Anxiety
Here, we treat the problem as a binary classification task: given the online activity history, we aim to identify if the subject has an anxiety condition. Assuming online activity histories are independent for every person, there are 104 + 72 = 176 segments ($\boldsymbol{x}_1$ and $\boldsymbol{x}_2$) of online history in total, regardless of which round or participant they were collected from, as observation data with the respective anxiety scores as labels. Formally, we are interested in $P(y \mid \boldsymbol{x})$, where $y$ is the binary anxiety label from the GAD-7 score cutoff of 9.
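The labelling and pooling step described above can be illustrated with a minimal sketch. It assumes each round is available as a mapping from a participant identifier to a (feature vector, GAD-7 score) pair; the names `label_anxious`, `pool_rounds`, and the dictionary layout are hypothetical and not taken from the study's code. Only the cutoff of 9 and the 0 to 21 score range come from the text.

```python
ANXIETY_CUTOFF = 9            # GAD-7 score > 9 is labelled anxious (from the text)
GAD7_MIN, GAD7_MAX = 0, 21    # valid GAD-7 score range (from the text)


def label_anxious(gad7_score):
    """Return 1 (anxious) if the GAD-7 score exceeds the cutoff, else 0."""
    if not GAD7_MIN <= gad7_score <= GAD7_MAX:
        raise ValueError("GAD-7 scores must lie between 0 and 21")
    return 1 if gad7_score > ANXIETY_CUTOFF else 0


def pool_rounds(first_round, follow_up):
    """Pool both rounds into one list of labelled segments.

    Each argument is a hypothetical dict: participant id -> (features, score).
    The participant id is kept with every segment so that segments from the
    same person can later be held together in one cross-validation fold.
    """
    segments = []
    for round_name, data in (("first-round", first_round), ("follow-up", follow_up)):
        for participant, (features, score) in data.items():
            segments.append({
                "participant": participant,
                "round": round_name,
                "x": list(features),
                "y": label_anxious(score),
            })
    return segments
```

Keeping the participant identifier next to each segment is a design choice of this sketch: it lets a later evaluation treat the two segments of a returning participant as one group rather than as independent samples.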

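The feature vectors used in these tasks are built from the scalar features of Section 4. As one illustration, the category and time entropy of Equations (1) and (2) can be sketched as below. This is a sketch under stated assumptions, not the authors' implementation: the natural logarithm is assumed because the text does not specify a base, and the function names and input formats (a list of category labels, a list of `datetime` objects) are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(items):
    """Entropy -sum(P_i * log(P_i)) over the distinct values in `items`.

    Natural log is an assumption; bins with no activity contribute nothing,
    following the usual 0 * log(0) = 0 convention.
    """
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def category_entropy(categories):
    """Equation (1): diversity of activities across category labels."""
    return shannon_entropy(categories)


def time_entropy(timestamps):
    """Equation (2): diversity of activities across the 24 hours of a day."""
    return shannon_entropy(ts.hour for ts in timestamps)
```

The weekday and weekend variants would follow by filtering the activities on the day of the week (for example with `datetime.weekday()`) before calling the same functions.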
We trained logistic regression (LR), linear support vector machine (SVM), and random forest (RF) classifiers on this task and evaluated each with stratified 5-fold cross-validation. Since the performances of LR and linear SVM were comparable, we report the performance of LR. However, RF significantly outperformed the other two with an average F1 score of 0.83 ± 0.09 and ROC AUC of 0.91 ± 0.06. The detailed precision, recall, and F1 scores for each class/average are reported in Figure 7. In Figure 6, we present the average ROC curve with standard deviations of the RF.

Figure 6: ROC curves for Random Forests to classify individuals with anxiety. We carried out a stratified 5-fold cross-validation. The grey area represents ±1 standard deviation.

Figure 7: The performance of RF and LR on the anxiety classification task. We carried out a stratified 5-fold cross-validation. The values after the ± sign represent 1 standard deviation. Numbers inside ( ) and < > are for LR and RF, respectively.

5.2.1 Possible Dependency between Two Rounds. There are 72 out of 104 individuals who participated in both rounds of the study. Each of them has two pairs of data, $(\mathbf{x}_1, y_1)$ from the first round and $(\mathbf{x}_2, y_2)$ from the follow-up. During the cross-validation for the classifiers, we manually ensured that any two pairs of data from the same participant either both belong to the training set or both belong to the testing set. We employed this precaution to limit any personal traits or online habits, on the Google Search and YouTube platforms, from leaking into our supervised classifiers, i.e., $\mathbf{x}_1 \approx \mathbf{x}_2$ and $y_1 \approx y_2$ for the same subject. We also experimented with ordinary cross-validation without such precaution, and we observed performance increases of 1 to 2% over the metrics, indicating a small amount of potential data dependency.

5.3 Predicting Anxiety for Individuals

In this section, we consider the anxiety score prediction task: given the online data and the past anxiety level of an individual, we aim to estimate the future GAD-7 score for that individual. Concretely, given the two rounds of data, we aim to predict the GAD-7 score in the follow-up round given the online history data and the GAD-7 score from the first round of an individual. Formally, this task is regarded as a regression problem, and we are interested in $P(y_2 \mid \mathbf{x}_1, \mathbf{x}_2, y_1)$.

Features for the regression task: for predicting anxiety scores, $y_2$, in the above setup, we consider the weekday/weekend Time & Category entropy $\{C_H^{weekday}, C_H^{weekend}, T_H^{weekday}, T_H^{weekend}\}$, the Temporality parameters $\{\gamma, \alpha, \beta\}$, and the Inactivity Periods with thresholds of 9 and 10 hours $\{I_9, I_{10}\}$ as input features. Thus, for the rest of the section, $\mathbf{x}_1, \mathbf{x}_2 \in \mathbb{R}^{9}$ for all individuals.

We hypothesize that the change in online behaviors may preserve information about the change in anxiety level. To leverage this in the prediction task, we define the following feature vectors for the regression models:

$$\Delta\mathbf{x} = \mathbf{x}_1 - \mathbf{x}_2 \in \mathbb{R}^{9} \tag{4}$$

$$\mathbf{x}_{gp} = [\eta \odot \mathbf{x}_2,\ (1 - \eta) \odot \Delta\mathbf{x}] \in \mathbb{R}^{2 \times 9} \tag{5}$$

$$\mathbf{x}_{reg} = [\underbrace{\eta \odot \mathbf{x}_2,\ (1 - \eta) \odot \Delta\mathbf{x}}_{\mathbf{x}_{gp}},\ y_1] \in \mathbb{R}^{2 \times 9 + 1} \tag{6}$$

where the square bracket indicates concatenation, $\eta \in [0, 1]$ is a hyperparameter that controls the weight on $\mathbf{x}_2$ and $\Delta\mathbf{x}$, and $\odot$ denotes an element-wise multiplication. $\mathbf{x}_{gp}$ is a trivial modification of $\mathbf{x}_{reg}$ obtained by slicing out the last entry $y_1$ and keeping only the online data features. The intuition is that $\Delta\mathbf{x}$ captures the shift in online behaviors between two rounds; $\mathbf{x}_2$ is the most recent online observation in predicting $y_2$; $y_1$ acts as a base point of $y_2$; $\eta$ weights the importance between $\Delta\mathbf{x}$ and $\mathbf{x}_2$.

We chose $\eta = 0.9$ and fed $\mathbf{x}_{reg}$ as inputs. For this anxiety prediction task, we first trained two models: Ordinary Least Squares regression (OLS) and Gradient Boosting regression (GB). GB outperformed OLS significantly and achieved an average mean square error (MSE) of 2.29 ± 0.25 in predicting future GAD-7 scores $y_2$ (see Figure 8).

Instead of merely looking for the best prediction given by maximum likelihood estimation, it is crucial to assess the uncertainty over the model and take a Bayesian perspective, especially given that we are working with healthcare applications with limited sample size. Moreover, it would grant much flexibility if the regression is not limited to a parametric linear form but lives in a functional space with non-linearity, investigating the distribution of functions. Therefore, we performed the regression task with a non-parametric Bayesian method, the Gaussian Process (GP) [68]. We define our regression function as $f(\mathbf{x}_{reg})$, and it follows the GP below:

$$f(\mathbf{x}_{reg}) \sim \mathcal{GP}\left(m(\mathbf{x}_{reg}),\ k(\mathbf{x}_{reg}, \mathbf{x}'_{reg})\right) \tag{7}$$

$$m(\mathbf{x}_{reg}) = y_1 \tag{8}$$

$$k(\mathbf{x}_{reg}, \mathbf{x}'_{reg}) = \exp\left(-\frac{\lVert \mathbf{x}_{gp} - \mathbf{x}'_{gp} \rVert^{2}}{2\ell}\right) \tag{9}$$

$$y_2 = f(\mathbf{x}_{reg}) + \epsilon \quad \text{where } \epsilon \sim \mathcal{N}(0, \sigma) \tag{10}$$

where $m(\mathbf{x}_{reg})$, the mean of the GP, is a deterministic function that returns the corresponding previous anxiety score $y_1$ for each subject. The covariance matrix is obtained by an exponential quadratic kernel $k$ over all pairs of individual online data, $(\mathbf{x}_{gp}, \mathbf{x}'_{gp})$. It entails that, given any pair of individuals, the closer the distance between their online activity features in the vector space, the greater the correlation between their anxiety scores $y_2$ (close to 1), and vice versa (close to 0). $\ell$ is a hyperparameter that controls the length scale between data points: the greater the $\ell$, the smoother the function. We further assume that the true $y_2$ equals the function prediction plus an independent unknown Gaussian noise $\epsilon$, and $\sigma$ is the hyperparameter for the noise distribution.
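The feature construction in Equations 4 to 6 and the GP of Equations 7 to 10 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the study's implementation: the function names (`build_features`, `exp_quadratic_kernel`, `gp_posterior`), the parameter `noise_var`, and the synthetic data are all assumptions introduced here. In particular, reading $\sigma$ in Equation 10 as a noise variance is an assumption, and the posterior is computed with the standard GP conditioning formulas applied to the residual $y_2 - y_1$, since the prior mean is $y_1$.

```python
import numpy as np


def build_features(x1, x2, y1, eta=0.9):
    """Build x_gp (Eq. 5) and x_reg (Eq. 6) for n individuals.

    x1, x2: arrays of shape (n, 9); y1: array of shape (n,).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    y1 = np.asarray(y1, dtype=float).reshape(-1, 1)
    delta = x1 - x2                                           # Eq. 4
    x_gp = np.concatenate([eta * x2, (1.0 - eta) * delta], axis=1)
    x_reg = np.concatenate([x_gp, y1], axis=1)
    return x_gp, x_reg


def exp_quadratic_kernel(a, b, length_scale=1.0):
    """Eq. 9 on rows of x_gp: exp(-||a - b||^2 / (2 * l)), as written in the text."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * length_scale))


def gp_posterior(xgp_train, y1_train, y2_train, xgp_test, y1_test,
                 length_scale=1.0, noise_var=0.1):
    """Posterior mean and covariance of f at the test inputs.

    The prior mean is y1 (Eq. 8); noise_var is an assumed noise variance.
    """
    n = xgp_train.shape[0]
    K = exp_quadratic_kernel(xgp_train, xgp_train, length_scale)
    K = K + noise_var * np.eye(n)
    K_s = exp_quadratic_kernel(xgp_test, xgp_train, length_scale)
    K_ss = exp_quadratic_kernel(xgp_test, xgp_test, length_scale)
    resid = np.asarray(y2_train, dtype=float) - np.asarray(y1_train, dtype=float)
    mean = np.asarray(y1_test, dtype=float) + K_s @ np.linalg.solve(K, resid)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, cov


# Synthetic demonstration only: random features and scores, not study data.
rng = np.random.default_rng(0)
n_train, n_test = 20, 5
x1_all = rng.normal(size=(n_train + n_test, 9))
x2_all = x1_all + 0.1 * rng.normal(size=(n_train + n_test, 9))
y1_all = rng.integers(0, 22, size=n_train + n_test).astype(float)
y2_all = np.clip(y1_all + rng.normal(size=n_train + n_test), 0, 21)

x_gp_all, x_reg_all = build_features(x1_all, x2_all, y1_all, eta=0.9)
post_mean, post_cov = gp_posterior(
    x_gp_all[:n_train], y1_all[:n_train], y2_all[:n_train],
    x_gp_all[n_train:], y1_all[n_train:],
)
demo_mse = float(np.mean((post_mean - y2_all[n_train:]) ** 2))
print(f"synthetic-data MSE: {demo_mse:.3f}")
```

Because the kernel depends only on $\mathbf{x}_{gp}$, a test individual whose online features are far from every training individual receives a prediction that falls back to that individual's own $y_1$, which matches the role of $y_1$ as a base point for $y_2$.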
The above GP gave us a prior belief over the possible regression functions. The intuition is that, in the output space of our function $f(\mathbf{x}_{reg})$, the future GAD-7 anxiety scores, $y_2$, are normally distributed with a mean of the previous anxiety scores, $y_1$. The correlations between different $y_2$ values are determined by the similarities between online activities $\mathbf{x}_{gp}$ from the input space.

In order to assess the performance of our GP over the test set, we first obtained the predictive posterior:

$$P\left(f\left(\mathbf{x}_{reg}^{test}\right) \mid f\left(\mathbf{x}_{reg}^{train}\right), \mathbf{x}_{reg}^{train}, \mathbf{x}_{reg}^{test}\right) \tag{11}$$

over all the regression functions conditioned on (after observing) the training set. This conditioning operation means that, after generating functions from the GP prior, we filter out those that violate the training examples. After that, we sampled 100 functions (traces) from the posterior in Equation 11 and used them to make predictions on the test set. We report the average MSE of the 100 functions, and this process is repeated for each fold of the cross-validation. We finally report the average performance over the 5 folds in Figure 8. Our GP achieved an average MSE of 1.87 ± 0.14 in predicting future anxiety scores $y_2$.

Analysis: to summarize, we evaluated the performance of our OLS, GB, and GP models for the anxiety prediction task with 5-fold stratified cross-validation. We selected 9 out of the 16 features that we extracted from online activities for this task.

Our baseline model is the OLS regression, and it assumes the anxiety score $y_2$ (dependent variable) is a linear function of the features (the independent variables). However, in reality, the relationship between anxiety and online behaviors is not necessarily linear. In order to grant more flexibility, we trained a GB regressor, which consists of an ensemble of weak learners and is built in a sequential manner. The intuition is that the next learner learns from the mistakes made by the previous one, and each subsequent learner minimizes the residual prediction loss by gradient descent. The GB regressor, with an average MSE of 2.29 ± 0.25, significantly outperforms the OLS, whose average MSE is 9.13 ± 3.65, in the anxiety prediction task (see column (a) of Figure 8).

Finally, with an average MSE of 1.87 ± 0.14, GP outperforms both the OLS and GB (column (a) of Figure 8). Unlike OLS and GB, GP does not have any parametric form but lives in a functional space with non-linearity. We assume a normal distribution for the future anxiety score $y_2$ with the first-round anxiety score as the mean, and correlations between different $y_2$ value predictions depend on the similarities between online activities $\mathbf{x}_{gp}$.

There were 9 individuals whose ground truth GAD-7 anxiety scores changed by more than 5 between the first round and the follow-up. A change of 5 in GAD-7 scores (ranging from 0 to 21) represents a change in anxiety level by around 23%, which can be clinically alarming. Thus, we conducted another 5-fold cross-validation but kept all these 9 subjects in the test set of each fold. We observed a good flexibility of $\mathbf{x}_{reg}$ in capturing such significant changes in GAD-7, since the performances are comparable to the average scores for all models (see Figure 8, column (b)).

Figure 8: (a) The performances of OLS, GB, and GP on the anxiety prediction task. We carried out a standard stratified 5-fold cross-validation first. (b) We then conducted another 5-fold cross-validation but kept all the 9 subjects with significant changes in GAD-7 scores in the test set. The values after the ± sign represent the standard deviations.

6 DISCUSSION

It has been reported that, every year, approximately 60% of all people with mental health conditions receive no treatment [38]. The inability to identify patients in need of care and deliver treatments on time is a major failure point in the current healthcare system. This is mainly because our current healthcare system entirely depends on people to self-report and actively present themselves to clinics for treatments. Relying entirely on patients for detecting and delivering care in a timely manner is quite challenging because patients may be experiencing a lack of motivation, feeling helpless and hopeless, fearing social stigmatization due to their condition, and concealing information, all of which may impair their judgement to seek help.

In this paper, we ran a novel longitudinal study that collected ubiquitous online activity logs along with gold-standard clinically validated anxiety scores. Individual-level online activity history has been gathered from YouTube and the Google Search engine via the Google Takeout platform. We have developed explainable features that capture various semantic and temporal facets of online engagement logs, such as activity and inactivity patterns, and content and time diversities. We have shown that these features are strong signals for not only detecting individuals with anxiety disorders but also estimating the severity of anxiety given any segment of online activity history. Given one's online activities, our best performing Random Forest classifier can identify an individual with an anxiety condition with an average F1 score of 0.83, average precision and recall of 0.84, and average AUC of 0.90. Furthermore, we have demonstrated that anxiety scores can be predicted with high accuracy, with an average MSE of 1.87, using a Gaussian Process regressor. To the best of our knowledge, we are the first to study and demonstrate that it is feasible to identify whether one is experiencing anxiety and estimate his/her/their exact anxiety score using individual-level YouTube and Google Search engine history logs. Our findings suggest the viability of constructing remote mental health surveillance frameworks based on passively sensed online data, which may be cheap and efficient, and may bypass the patient reluctance and information concealing dilemmas of traditional systems.

Integration into Existing Healthcare Systems: the anxiety assessment framework presented in this paper can be initially set up in medical endpoints such as behavioral clinics. Therapists involved with patients suffering from various mental health conditions can use the output of the model as additional information about their patients. The predicted anxiety assessments can be leveraged to connect patients with the right counselor/expert. For example, someone may come to the clinic for a drug addiction problem and, following his/her/their informed consent, the counselor runs our model, which outputs that the patient may have been experiencing severe anxiety during the last 4 weeks. In this case, our anxiety classification setup from Section 5.2 may be applicable. The patient may be flagged for review by designated members of the caregiver team who are specifically trained to handle patients with anxiety as well as addiction problems.

Furthermore, our anxiety prediction setup from Section 5.3 can be used as a guideline to initiate specific treatment steps. Most importantly, counselors can use the model on a weekly basis to monitor anxiety levels of their patients remotely (based on their online engagements) in between sessions/follow-up visits. This enables caregivers to note abnormal spikes in the estimated level of anxiety compared with the last visit. Healthcare providers can then either schedule an immediate follow-up or use this information when engaging with the patient to uncover stressors and other issues that may otherwise go unmentioned during the next appointment. For example, a therapist could bring up the online behaviors of the patient during the past weekend, which were associated with high stress and anxiety symptoms, and ask if the patient agrees with the assessment and, if so, what was happening in his/her/their life at that time. Besides, such an anxiety estimation setup is not one-shot fixed: it can and should be compared with professional clinical measures, as more patients come in, to help improve the future performance.

Privacy & Ethical Considerations: Building an anxiety monitoring system using individual-level YouTube and Google Search engine activity logs presents a series of concerns around privacy and data safety. Due to the sensitive nature of the data collected in this study, it is important that appropriate human subject protection protocols are in place. Hence our study protocol has been rigorously reviewed and approved by the Institutional Review Board of our institution to address these concerns. Despite these measures, we acknowledge that ethical challenges may still arise if applications based on our methods are deployed in the real world.

When someone uses platforms such as YouTube and the Google Search engine, he/she/they never intend the personal data to be used by mental health assessment systems. Hence, some individuals may choose not to share their sensitive data and refuse to participate. It is important to ensure that participants, at all times, have the choice and control over their data and can choose to exclude themselves from such studies at will. Participants need to be explicitly informed about how their online engagement logs will be de-identified and analyzed, what type of information it may reveal about the user, and the accrued benefits to the patients and the therapists/care providers from mental health clinics. To address these concerns, we employed an opt-in model for volunteering study participation. In addition, we conducted one-on-one interviews for each participant during the recruitment procedure so that the research team can (a) take the time to clearly explain the purpose and the outcome of the study and (b) explicitly inform the participants about the existence of such sensitive data and how they reserve full control over the information shared, such as limiting data access or deleting data. Yet, one big limitation of employing an opt-in model is that it may significantly limit the number of volunteering participants for the study. Besides, the opt-in procedure may introduce participation bias in terms of study recruitment and the awareness of subjects. To limit recruitment bias, we have adopted generic wording, such as “help us learn about mental health using online data”, in our study advertisements without specifically mentioning anxiety.

Another remaining issue is whether and when it is ethical to intervene in the life of an individual on the basis of online data signals associated with anxiety, which may not always be accurate. We believe that the ultimate decision regarding intervention must be made by therapists, care providers, and experts who understand both anxiety and the power and limitations of an automated anxiety assessment system.

REFERENCES
[1] Natalia Adler, Ciro Cattuto, Kyriaki Kalimeri, Daniela Paolotti, Michele Tizzoni, Stefaan Verhulst, Elad Yom-Tov, and Andrew Young. 2019. How search engine data enhance the understanding of determinants of suicide in India and inform prevention: observational study. Journal of medical Internet research 21, 1 (2019), e10179.
[2] Judith Amores, Xavier Benavides, and Pattie Maes. 2016. Psychicvr: Increasing mindfulness by using virtual reality and brain computer interfaces. In Proceedings of the 2016 CHI Conference Extended Abstracts on Human Factors in Computing Systems. 2–2.
[3] American College Health Association. 2018. American College Health Association-National College Health Assessment II: Undergraduate Student Reference Group Data Report Fall 2018. https://www.acha.org/documents/ncha/NCHA-II_Fall_2018_Undergraduate_Reference_Group_Data_Report.pdf. Silver Spring, MD: American College Health Association (sep 2018).
[4] John W Ayers, Benjamin M Althouse, Jon-Patrick Allem, J Niels Rosenquist, and Daniel E Ford. 2013. Seasonality in seeking mental health information on Google. American journal of preventive medicine 44, 5 (2013), 520–525.
[5] Shrey Bagroy, Ponnurangam Kumaraguru, and Munmun De Choudhury. 2017. A social media based index of mental well-being in college campuses. In Proceedings of the 2017 CHI Conference on Human factors in Computing Systems. 1634–1646.
[6] Julia Brailovskaia, Elke Rohmann, Hans-Werner Bierhoff, Jürgen Margraf, and Volker Köllner. 2019. Relationships between addictive Facebook use, depressiveness, insomnia, and positive mental health in an inpatient sample: A German longitudinal study. Journal of Behavioral Addictions 8, 4 (2019), 703–713.
[7] Susanna Calling, Patrik Midlöv, Sven-Erik Johansson, Kristina Sundquist, and Jan Sundquist. 2017. Longitudinal trends in self-reported anxiety. Effects of age and birth cohort during 25 years. BMC psychiatry 17, 1 (2017), 119.
[8] Christopher Cayari. 2011. The YouTube Effect: How YouTube Has Provided New Ways to Consume, Create, and Share Music. International Journal of Education & the Arts 12, 6 (2011), n6.
[9] Stevie Chancellor and Munmun De Choudhury. 2020. Methods in predictive techniques for mental health status on social media: a critical review. NPJ digital medicine 3, 1 (2020), 1–11.
[10] Colleen S Conley, Jenna B Shapiro, Brynn M Huguenel, and Alexandra C Kirsch. 2020. Navigating the college years: Developmental trajectories and gender differences in psychological functioning, cognitive-affective strategies, and social well-being. Emerging Adulthood 8, 2 (2020), 103–117.
[11] Glen Coppersmith, Mark Dredze, and Craig Harman. 2014. Quantifying mental health signals in twitter. In Proceedings of the Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 51–60.
[12] Glen Coppersmith, Mark Dredze, Craig Harman, Kristy Hollingshead, and Margaret Mitchell. 2015. CLPsych 2015 shared task: Depression and PTSD on Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 31–39.
[13] Glen Coppersmith, Craig Harman, and Mark Dredze. 2014. Measuring post traumatic stress disorder in Twitter. In Eighth international AAAI conference on

weblogs and social media.
[14] E Jane Costello, Helen L Egger, and Adrian Angold. 2005. The developmental epidemiology of anxiety disorders: phenomenology, prevalence, and comorbidity. Child and Adolescent Psychiatric Clinics 14, 4 (2005), 631–648.
[15] Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013. Social media as a measurement tool of depression in populations. In Proceedings of the 5th Annual ACM Web Science Conference. ACM, 47–56.
[16] Munmun De Choudhury, Scott Counts, Eric J Horvitz, and Aaron Hoff. 2014. Characterizing and predicting postpartum depression from shared facebook data. In Proceedings of the 17th ACM conference on Computer supported cooperative work & social computing. 626–638.
[17] Munmun De Choudhury, Michael Gamon, Scott Counts, and Eric Horvitz. 2013. Predicting Depression via Social Media. In ICWSM. 2.
[18] Munmun De Choudhury, Emre Kiciman, Mark Dredze, Glen Coppersmith, and Mrinal Kumar. 2016. Discovering shifts to suicidal ideation from mental health content in social media. In Proceedings of the 2016 CHI conference on human factors in computing systems. ACM, 2098–2110.
[19] Jon D Elhai, Jason C Levine, and Brian J Hall. 2019. The relationship between anxiety symptom severity and problematic smartphone use: A review of the literature and conceptual frameworks. Journal of Anxiety Disorders 62 (2019), 45–52.
[20] Bethany KB Fleck, Lisa M Beckman, Jillian L Sterns, and Heather D Hussey. 2014. YouTube in the classroom: Helpful tips and student perceptions. Journal of Effective Teaching 14, 3 (2014), 21–37.
[21] Oren Gil-Or, Yossi Levi-Belz, and Ofir Turel. 2015. The “Facebook-self”: characteristics and psychological predictors of false self-presentation on Facebook. Frontiers in Psychology 6 (2015), 99.
[22] Google. 2020. Content Categories. https://cloud.google.com/natural-language/docs/categories. [Online; accessed 12-May-2020].
[23] Reshmi Gopalakrishna Pillai, Mike Thelwall, and Constantin Orasan. 2018. Detection of stress and relaxation magnitudes for tweets. In Companion Proceedings of The Web Conference 2018. 1677–1684.
[24] John F Gunn III and David Lester. 2013. Using google searches on the internet to monitor suicidal behavior. Journal of affective disorders 148, 2 (2013), 411–412.
[25] Alex S Hall and Jeffrey Parsons. 2001. Internet addiction: College student case study using best practices in cognitive behavior therapy. Journal of mental health counseling 23, 4 (2001), 312.
[26] Alan G Hawkes. 1971. Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, 1 (1971), 83–90.
[27] Sue Jamison-Powell, Conor Linehan, Laura Daley, Andrew Garbett, and Shaun Lawson. 2012. I can’t get no sleep: discussing #insomnia on twitter. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems. ACM, 1501–1510.
[28] Alberto Jimenez, Miguel-Angel Santed-Germán, and Victoria Ramos. 2020. Google searches and suicide rates in Spain, 2004-2013: correlation study. JMIR public health and surveillance 6, 2 (2020), e10919.
[29] Andreas M Kaplan and Michael Haenlein. 2010. Users of the world, unite! The challenges and opportunities of Social Media. Business horizons 53, 1 (2010), 59–68.
[30] Andy Kiang and Joel Bailon. 2016. Data loss prevention (DLP) methods and architectures by a cloud service. US Patent 9,237,170.
[31] Tae Wan Kim and Seung Tae Paek. 2016. Cloud data discovery method and system for private information protection and data loss prevention in enterprise cloud service environment. US Patent App. 14/728,503.
[32] Jane E Klobas, Tanya J McGill, Sedigheh Moghavvemi, and Tanousha Paramanathan. 2018. Compulsive YouTube usage: A comparison of use motivation and personality effects. Computers in Human Behavior 87 (2018), 129–139.
[33] Robert LiKamWa, Yunxin Liu, Nicholas D Lane, and Lin Zhong. 2013. Moodscope: Building a mood sensor from smartphone usage patterns. In Proceeding of the 11th annual international conference on Mobile systems, applications, and services. 389–402.
[34] Yuanchao Ma, Bin Xu, Yin Bai, Guodong Sun, and Run Zhu. 2012. Daily mood assessment based on mobile phone sensing. In 2012 ninth international conference on wearable and implantable body sensor networks. IEEE, 142–147.
[35] Diana MacLean, Asta Roseway, and Mary Czerwinski. 2013. MoodWings: a wearable biofeedback device for real-time stress intervention. In Proceedings of the 6th international conference on PErvasive Technologies Related to Assistive Environments. 1–8.
[36] Michael J McCarthy. 2010. Internet monitoring of suicide risk in the population. Journal of affective disorders 122, 3 (2010), 277–279.
[37] Sedigheh Moghavvemi, Ainin Binti Sulaiman, Noor Ismawati Binti Jaafar, and Nafisa Kasem. 2017. Facebook and YouTube addiction: the usage pattern of Malaysian students. In 2017 international conference on research and innovation in information systems (ICRIIS). IEEE, 1–6.
[38] David C Mohr, Mi Zhang, and Stephen M Schueller. 2017. Personal sensing: understanding mental health using ubiquitous sensors and machine learning. Annual review of clinical psychology 13 (2017), 23–47.
[39] Amir Muaremi, Bert Arnrich, and Gerhard Tröster. 2013. Towards measuring stress with smartphones and wearable devices during workday and sleep. BioNanoScience 3, 2 (2013), 172–183.
[40] Tahir M Nisar, Guru Prabhakar, P Vigneswara Ilavarasan, and Abdullah M Baabdullah. 2019. Facebook usage and mental health: An empirical study of role of non-directional social comparisons in the UK. International Journal of Information Management 48 (2019), 53–62.
[41] Sudhakar V Nuti, Brian Wayda, Isuru Ranasinghe, Sisi Wang, Rachel P Dreyer, Serene I Chen, and Karthik Murugiah. 2014. The use of google trends in health care research: a systematic review. PloS one 9, 10 (2014), e109583.
[42] Y Ophir, CSC Asterhan, and BB Schwarz. 2020. If these Facebook walls could talk: Detecting and treating teenage psycho-social stress through social network activity (in Hebrew). Breaking down barriers? Teachers, students and social network sites (2020).
[43] John Paparrizos, Ryen W White, and Eric Horvitz. 2016. Detecting devastating diseases in search logs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 559–568.
[44] John Paparrizos, Ryen W White, and Eric Horvitz. 2016. Screening for pancreatic adenocarcinoma using signals from web search logs: Feasibility study and results. Journal of Oncology Practice 12, 8 (2016), 737–744.
[45] Pablo Paredes and Matthew Chan. 2011. CalmMeNow: exploratory research and design of stress mitigating mobile interventions. In CHI’11 Extended Abstracts on Human Factors in Computing Systems. 1699–1704.
[46] Ted Pedersen. 2015. Screening Twitter users for depression and PTSD with lexical decision lists. In Proceedings of the 2nd workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality. 46–53.
[47] Skyler Place, Danielle Blanch-Hartigan, Channah Rubin, Cristina Gorrostieta, Caroline Mead, John Kane, Brian P Marx, Joshua Feast, Thilo Deckersbach, Andrew Nierenberg, et al. 2017. Behavioral indicators on a mobile sensing platform predict clinically validated psychiatric symptoms of mood and anxiety disorders. Journal of medical Internet research 19, 3 (2017), e75.
[48] Christine Purdon, Martin Antony, Sandra Monteiro, and Richard P Swinson. 2001. Social anxiety in college students. Journal of Anxiety Disorders 15, 3 (2001), 203–215.
[49] Andrew G Reece, Andrew J Reagan, Katharina LM Lix, Peter Sheridan Dodds, Christopher M Danforth, and Ellen J Langer. 2017. Forecasting the onset and course of mental illness with Twitter data. 7, 1 (2017), 1–11.
[50] Philip Resnik, William Armstrong, Leonardo Claudino, Thang Nguyen, Viet-An Nguyen, and Jordan Boyd-Graber. 2015. Beyond LDA: exploring supervised topic modeling for depression-related language in Twitter. In Proceedings of the 2nd Workshop on Computational Linguistics and Clinical Psychology: From Linguistic Signal to Clinical Reality. 99–107.
[51] Miles Richardson, Zaheer Hussain, and Mark D Griffiths. 2018. Problematic smartphone use, nature connectedness, and anxiety. Journal of Behavioral Addictions 7, 1 (2018), 109–116.
[52] Marian-Andrei Rizoiu, Young Lee, Swapnil Mishra, and Lexing Xie. 2017. Hawkes processes for events in social media. In Frontiers of Multimedia Research. 191–218.
[53] Sohrab Saeb, Emily G Lattie, Stephen M Schueller, Konrad P Kording, and David C Mohr. 2016. The relationship between mobile phone location sensor data and depressive symptom severity. PeerJ 4 (2016), e2537.
[54] Sohrab Saeb, Mi Zhang, Christopher J Karr, Stephen M Schueller, Marya E Corden, Konrad P Kording, and David C Mohr. 2015. Mobile phone sensor correlates of depressive symptom severity in daily-life behavior: an exploratory study. Journal of medical Internet research 17, 7 (2015), e175.
[55] Akane Sano and Rosalind W Picard. 2013. Stress recognition using wearable sensors and mobile phones. In 2013 Humaine Association Conference on Affective Computing and Intelligent Interaction. IEEE, 671–676.
[56] H Andrew Schwartz, Johannes Eichstaedt, Margaret Kern, Gregory Park, Maarten Sap, David Stillwell, Michal Kosinski, and Lyle Ungar. 2014. Towards assessing changes in degree of depression through facebook. In Proceedings of the workshop on computational linguistics and clinical psychology: from linguistic signal to clinical reality. 118–125.
[57] Elizabeth M Seabrook, Margaret L Kern, and Nikki S Rickard. 2016. Social networking sites, depression, and anxiety: a systematic review. JMIR mental health 3, 4 (2016), e50.
[58] Peter S Shenkin, Batu Erman, and Lucy D Mastrandrea. 1991. Information-theoretical entropy as a measure of sequence variability. Proteins: Structure, Function, and Bioinformatics 11, 4 (1991), 297–313.
[59] Robert L Spitzer, Kurt Kroenke, Janet BW Williams, and Bernd Löwe. 2006. A brief measure for assessing generalized anxiety disorder: the GAD-7. Archives of internal medicine 166, 10 (2006), 1092–1097.
[60] Hajime Sueki. 2011. Does the volume of Internet searches using suicide-related search terms influence the suicide death rate: Data from 2004 to 2009 in Japan. Psychiatry and clinical neurosciences 65, 4 (2011), 392–394.
[61] RP Swinson. 2006. The GAD-7 scale was accurate for diagnosing generalised anxiety disorder. Evidence-based medicine 11, 6 (2006), 184.
[62] TechPostPlus. 2019. YouTube video Categories list FAQs and solutions. https://techpostplus.com/2019/04/26/youtube-video-categories-list-faqs-and-solutions/. [Online; accessed 26-April-2019].
[63] The Next Web. 2020. Digital trends 2020: Every single stat you need to know about the internet. https://thenextweb.com/podium/2020/01/30/digital-trends-2020-every-single-stat-you-need-to-know-about-the-internet. Accessed: 2020-02-07.
[64] Sho Tsugawa, Yusuke Kikuchi, Fumio Kishino, Kosuke Nakajima, Yuichi Itoh, and Hiroyuki Ohsaki. 2015. Recognizing depression from twitter activity. In Proceedings of the 33rd annual ACM conference on human factors in computing systems. 3187–3196.
[65] Rui Wang, Fanglin Chen, Zhenyu Chen, Tianxing Li, Gabriella Harari, Stefanie Tignor, Xia Zhou, Dror Ben-Zeev, and Andrew T Campbell. 2014. StudentLife: assessing mental health, academic performance and behavioral trends of college students using smartphones. In Proceedings of the 2014 ACM international joint conference on pervasive and ubiquitous computing. 3–14.
[66] Rui Wang, Weichen Wang, Alex DaSilva, Jeremy F Huckins, William M Kelley, Todd F Heatherton, and Andrew T Campbell. 2018. Tracking depression dynamics in college students using mobile phone and wearable sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 2, 1 (2018), 1–26.
[67] Harvey A Whiteford, Louisa Degenhardt, Jürgen Rehm, Amanda J Baxter, Alize J Ferrari, Holly E Erskine, Fiona J Charlson, Rosana E Norman, Abraham D Flaxman, Nicole Johns, et al. 2013. Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010. The lancet 382, 9904 (2013), 1575–1586.
[68] Christopher KI Williams and Carl Edward Rasmussen. 2006. Gaussian processes for machine learning. Vol. 2. MIT press Cambridge, MA.
[69] Nerys Williams. 2014. The GAD-7 questionnaire. Occupational medicine 64, 3 (2014), 224–224.
[70] Albert C Yang, Shi-Jen Tsai, Norden E Huang, and Chung-Kang Peng. 2011. Association of Internet search trends with suicide death in Taipei City, Taiwan, 2004–2009. Journal of affective disorders 132, 1 (2011), 179–184.
[71] Anis Zaman, Rupam Acharyya, Henry Kautz, and Vincent Silenzio. 2019. Detecting Low Self-Esteem in Youths from Web Search Data. In The World Wide Web Conference. 2270–2280.
[72] Melvyn WB Zhang, Cyrus SH Ho, Christopher CS Cheok, and Roger CM Ho. 2015. Smartphone apps in mental healthcare: the state of the art and potential developments. BJPsych advances 21, 5 (2015), 354–358.