Individual-level Anxiety Detection and Prediction from Longitudinal YouTube and Google Search Engagement Logs
Anis Zaman, University of Rochester
Boyu Zhang, University of Rochester
Henry Kautz, University of Rochester
Vincent Silenzio, Rutgers University
Ehsan Hoque, University of Rochester

arXiv:2007.00613v2 [cs.HC] 30 Nov 2020

ABSTRACT
Anxiety disorder is one of the world’s most prevalent mental health conditions, arising from complex interactions of biological and environmental factors and severely interfering with one’s ability to lead normal life activities. Current methods for detecting anxiety rely heavily on in-person interviews, which can be expensive, time-consuming, and blocked by social stigmas. In this work, we propose an alternative method to identify individuals with anxiety and further estimate their levels of anxiety using personal online activity histories from YouTube and the Google Search engine, platforms that are used by millions of people daily. We ran a longitudinal study and collected multiple rounds of anonymized YouTube and Google Search logs from volunteering participants, along with their clinically validated ground-truth anxiety assessment scores. We then developed explainable features that capture both the temporal and contextual aspects of online behaviors. Using those, we were able to train models that (i) identify individuals having anxiety disorder with an average F1 score of 0.83 ± 0.09 and (ii) assess the level of anxiety by predicting the gold standard Generalized Anxiety Disorder 7-item scores (ranging from 0 to 21) with a mean square error of 1.87 ± 0.15 based on the ubiquitous individual-level online engagement data. Our proposed anxiety assessment framework is cost-effective, time-saving, scalable, and opens the door for it to be deployed in real-world clinical settings, empowering care providers and therapists to learn about anxiety disorders of patients non-invasively at any moment in time.

CCS CONCEPTS
• Information systems → Web search engines; • Applied computing → Health informatics; Psychology; • Human-centered computing → Empirical studies in HCI.

KEYWORDS
anxiety, mental health, prediction, Google Search history, YouTube history

ACM Reference Format:
Anis Zaman, Boyu Zhang, Henry Kautz, Vincent Silenzio, and Ehsan Hoque. 2018. Individual-level Anxiety Detection and Prediction from Longitudinal YouTube and Google Search Engagement Logs. In Woodstock ’18: ACM Symposium on Neural Gaze Detection, June 03–05, 2018, Woodstock, NY. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/1122445.1122456

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
Woodstock ’18, June 03–05, 2018, Woodstock, NY
© 2018 Association for Computing Machinery.
ACM ISBN 978-1-4503-XXXX-X/18/06...$15.00
https://doi.org/10.1145/1122445.1122456

1 INTRODUCTION
According to the World Health Organization (WHO), 1 in 13 people suffers from anxiety globally, making it one of the most prevalent mental health concerns. In the United States, it is the second leading cause of disability among all psychiatric disorders [67]. Nearly 40 million people (age 18 and older) experience an anxiety disorder in any given year, yet only 35.9% of those suffering receive treatment¹. A study in 2017 reported that the level of anxiety among young adolescents has been gradually increasing in recent years [7].

¹ https://adaa.org/understanding-anxiety

The population most vulnerable to anxiety disorder is students in high school and the early college years. A report by the American College Health Association in 2018 stated that 63% of college students in the U.S. felt overwhelming anxiety during the last 12 months, and only 23% of these students were either diagnosed or treated for an anxiety disorder by a professional mental healthcare provider [3]. During the early days of college, students are separated from their traditional support system and find themselves in challenging social and academic settings such as living with roommates, developing independent identities, making new friends, managing heavy workloads, etc. All these experiences induce spikes in anxiety from time to time [48], and this psychological distress increases during the first few semesters of college [10]. Furthermore, it has been reported that anxiety disorders are significantly associated with other medical and psychiatric comorbidities [14]. Despite such a high prevalence of anxiety among young adolescents, current methods for detecting anxiety disorders consist of self-assessment surveys and in-person interviews, which can be time-consuming, expensive, imprecise, and hampered by factors such as fear, concealing information, and social stigma related to the mental health issue.

Engagements in online platforms are major components in the lives of young adults [29]. On average, an internet user spent the equivalent of more than 100 days online during the last 12 months [63]. It has been reported that 81% of U.S. internet users aged 15 to 25 use YouTube regularly². Besides, an average internet user uses Google Search at least once a day, and many search dozens of times a day³. Extensive studies have tried to correlate mental health issues with popular public social media data such as Facebook [6, 40, 42] and Twitter [9, 11, 13, 23], yet they may fail to cover people who interact infrequently with social media or post false positive impressions publicly [21]. In contrast, individual-level search and YouTube logs are ubiquitous and private for each user and are less likely to be subject to self-censorship. A group of researchers has shown that search logs can be used as a proxy for detecting mental health issues [1, 28, 71]. We draw inspiration from these prior works and hypothesize that private Google Search engine logs and YouTube histories can leave a detailed digital trace of the mental health states of users and be used as a proxy to assess the level of anxiety for individuals.

In this work, we propose a framework that leverages individual-level online activity logs, in particular, Google Search and YouTube activity histories, to identify individuals with anxiety disorder and further predict their level of anxiety. We ran a longitudinal study to gather two rounds of data, with 5 months in between, from a college population. During each round, participants shared their anonymized online activity histories along with their answers to a clinically validated questionnaire for measuring Generalized Anxiety Disorder (GAD-7) [59]. We then developed an explainable low-dimensional vector representation that captures different aspects of one’s online behaviors, including temporal activity patterns, time and semantic diversities, and periods of inactivity. Using these feature representations, we trained models that can accurately detect and predict one’s level of anxiety from online activities. Unlike [71]

2 RELATED WORK
Public social media, blogs, and forums have become popular data sources for researchers to study the prevalence of mental health conditions. [57] showed that the usage of social media sites correlates with user depression and anxiety. Tweets, from one of the most explored social media platforms, have been used to detect insomnia [27], suicidal ideations [18], depressed individuals [17], the extent of depression [64], and languages related to depression and PTSD [12, 46, 49, 50]. Besides, [16] have shown that Facebook status can be used to predict postpartum depression and monitor depression [56]. Other researchers have leveraged data from Reddit to study mental distress among adolescents [5]. [18] identified that shifts in language may indicate future suicidal ideations. De Choudhury et al. provide a comprehensive overview of the role of social media in mental health research [15] and evaluation methodologies [9]. Social media users constitute only a fraction of the general population, and a small number of them, with particular personalities or demographics, typically act out on social media in ways that may reveal signs of mental health struggles. Hence, findings based on social media platforms may not generalize to the majority of the population.

A large number of researchers have leveraged sensors, such as smartphones and mobile apps, that are embedded in our daily life experiences to capture various aspects of mental health [38, 65]. For example, sensor data has been used for studies of anxiety [19, 47, 51], stress [39, 55], moods [33, 34], and depression [53, 54, 66]. Several research groups have developed applications to help users manage stress and anxiety [35, 45] and evoke positive emotions [2]. However, smartphone applications for tackling mental health issues have several limitations: (i) not every mental health patient has access to smartphones;