Demystifying Topology of Autopilot Thoughts: A Computational Analysis of Linguistic Patterns of Psychological Aspects in Mental Health

Bibekananda Kundu and Sanjay Kumar Choudhury
Language Technology, ICT and Services
Centre for Development of Advanced Computing, Kolkata
E-mail: {bibekananda.kundu,sanjay.choudhury}@cdac.in

S Bandyopadhyay, D S Sharma and Sangal. Proc. of the 14th Intl. Conference on Natural Language Processing, pages 435–446, Kolkata, India. December 2017. ©2016 NLP Association of India (NLPAI)

Abstract

This paper investigates the topology of uncontrolled, dynamic depressive thoughts, popularly known as “autopilot” thoughts in the psychology domain. Persistent homology, a mathematical tool from algebraic topology, has been applied to the Vector Space representation of tweets generated by users with a neurotic personality in order to determine the topological structure of autopilot thoughts. State-of-the-art machine learning techniques leveraging linguistic resources akin to LIWC, WordNet-Affect and SentiWordNet have been applied to identify neurotic personality among Twitter users. An initiative has been taken towards empowering Neuro Linguistic Programming (Bandler and Grinder, 1975; Bandler and Grinder, 1979; Bandler and Andreas, 1985) and other psychotherapy techniques using Natural Language Processing in the domain of Mental Health.

1 Introduction

“Wherever there are sensations, ideas, emotions, there must be words”
- Swami Vivekananda.

We use language for thinking, experiencing, expressing, communicating and problem solving. Language is therefore the symbolic medium through which a person's thought process can be analysed. In psychotherapy, language is considered the primary tool for understanding patients' experiences and for expressing therapeutic interventions (Pennebaker et al., 2003). All psychological interventions rely on the power of language. Psychotherapists rarely intervene directly in their clients' lives; instead, they create changes in the thought process through conversation (Villatte et al., 2015). According to Relational Frame Theory (RFT) (Greenway et al., 2010), people use linguistic frames to understand the world around them and subsequently to solve problems. RFT has been suggested as an approach to understanding natural language systems. The theory lends itself well to assessment with Natural Language Processing (NLP) precisely because it relies on understanding the interaction between sensation, affect, language and behaviour.

When someone uses language, they are labelling their experience. For example, someone might tweet “I need to escape this world before I get crushed.”, indicating a fear-based affective response. We are planning to use NLP to assess this label (Pennebaker et al., 2015). Simply labelling events and their attributes as ‘positive’ or ‘negative’ increases associated memories and emotional salience. This type of relational network can be evoked by any number of internal or external stimuli, triggering the aforementioned internal feedback loop and leading to psychological distress. For example, describing a ‘negative’ event such as a trauma can evoke intense fear and sadness and subsequent sobbing (Miner et al., 2016; Althoff et al., 2016).

A person suffering from distress is actually using a very limited model of the world, and within this model he or she finds no appropriate choice among the available options (Bandler and Grinder, 1975; Bandler and Grinder, 1979; Bandler and Andreas, 1985). The model of the world therefore needs to be expanded, i.e. improved into a better model that offers more options. The therapeutic technique thus consists of transforming the existing model into a better one using a meta-model and transformational grammar. Linguistic theory plays a vital role in understanding the client's model and transforming it using transformational grammar (Bandler and Grinder, 1975). Consequently, one of the key concerns of psychotherapy is to understand the topology of maladaptive autopilot thoughts and to change the topology of the thought process using Mindfulness (Collins et al., 2009), Collaborative Empiricism (Beck and Emery, 1979; Kazantzis et al., 2013) and other talk therapy techniques (Pawelczyk, 2011; Ebert et al., 2015; Mayo-Wilson and Montgomery, 2013; Mohr et al., 2013). In this regard, understanding the topology of uncontrolled, dynamic depressive thought (known as “autopilot thought” in psychology) is important for evaluating the mental health of patients. Having briefly discussed the psychological background and motivation behind the work, we describe in the next section how the topology of thought can be represented.

2 How to Represent Topology of Thought

We consider written text to be a symbolic representation of thoughts. To understand the topology of “autopilot thoughts”, we have collected tweets of users with a neurotic personality from Twitter by applying a hybrid approach that combines Deep Learning based classification, KL-Divergence (Manning and Schütze, 1999) based Timeline Similarity Analysis and a rule-based sentiment analysis technique leveraging WordNet-Affect¹, SentiWordNet² and a psycholinguistic resource akin to LIWC³. The detailed data collection procedure is discussed in sections 3 and 4. We are interested in studying the representation of the words used by neurotic persons in the Vector Space using Word Embedding (Mikolov et al., 2013; Mesnil et al., 2013) and the topology (Sizemore et al., 2016) of these semantically embedded words using persistent homology (Zhu, 2013; Kaczynski et al., 2004). Our intuition is that the timeline of a neurotic person has a different topological structure from the timeline of a user with another personality. Studies report that a neurotic person uses more first person pronouns, fewer social words and more negative emotion words (Pennebaker, 2011). Compared with an extrovert, an introvert sticks to a single topic, talks more about problems, and uses few self-references, many tentative words and many negations (Mairesse and Walker, 2007). Topological data analysis using persistent homology is discussed in section 5. In the next section, we discuss our procedure for collecting data from Twitter.

¹ http://wndomains.fbk.eu/wnaffect.html
² http://sentiwordnet.isti.cnr.it
³ http://liwc.wpengine.com/

3 How to Collect Tweets of Neurotic Persons

The proposed approach utilizes an ensemble of state-of-the-art machine learning techniques based on psycholinguistic features to detect distressed users (those having a neurotic personality) from their social media text. We have used the Twitter API to search Twitter using seed words/phrases like “awful”, “terrible”, “lousy”, “hate”, “lonely”, “hopeless”, “helpless”, “crap”, “sad”, “miserable”, “tired”, “sleep”, “hurt”, “pain”, “kill”, “die”, “dying”, “stressed”, “frustrated”, “irritated”, “depressed” etc. and the names of some antidepression drugs like “Sertraline”, “Citalopram”, “Clonazepam”, “Propanol”, “Prozac”, “Zopiclone”, “Fluoxetine”, “Quetiapine”, “Hydroxyzine” etc. Next, we have filtered out the tweets starting with RT to avoid considering retweets. We have also removed tweets containing a URL. Thereafter, these tweets are sent to the (in-house developed) Psychological Annotation Interface for manual annotation. Figure 1 shows a screenshot of the interface along with some examples of negative tweets. An annotator can label a tweet considering three aspects, viz.:

(a) Personal/Impersonal Emotion Labelling
(b) Polarity Labelling
(c) Psychological Annotation

Figure 1: Psychological Annotation Interface

Individual words in a tweet are annotated according to Psychological Process as discussed in (Pennebaker et al., 2015). Special care has been taken during annotation to find the “Linguistic Marker of Depression” (Bucci and Freedman, 1981; Pyszcynski and Greenberg, 1987) in the tweet. Pronouns tell us where people focus their attention. If someone uses the pronoun “I”, it is a sign of self-focus. Depressed people use the word “I” much more often than emotionally stable people (Pennebaker, 2011; Ramirezesparza et al., 2008; Nguyen et al., 2014). Researchers have found that people who frequently use first-person singular words like “I”, “me” and “myself” are more likely to be depressed and to have more interpersonal problems than people who often say “we” and “us”. Using LIWC2001, Stirman and Pennebaker (2001) found that suicidal poets were more likely to use first person singular pronouns (e.g., “I”, “me”, “mine”) and fewer first person plural pronouns (e.g., “we”, “ours”) throughout their writing careers than non-suicidal poets. These findings supported the social engagement/disengagement model of depression, which states that suicidal individuals have failed to integrate into society in some way and are therefore detached from social life (Durkheim, 1951). Similarly, Rude et al. (2004) found that currently depressed students used more first person singular pronouns, more negative emotion words and slightly fewer positive emotion words in their essays about coming to college, relative to students who had never experienced a depressive episode. These results are in line with the self-awareness theory of Pyszcynski and Greenberg (1987). Therefore, in our Psychological Annotation Interface, we have implemented a feature that highlights, with a red background, tweets containing the first person personal pronoun, i.e. “I”.

A focus on the temporal orientation of people, that is, how often they emphasize the past, present and future, is necessary because it affects their health and happiness (Zimbardo and Boyd, 2008). We are also interested in the proportion of a user's tweets in which the analytic finds evidence of insomnia and sleep disturbance, which is often a symptom of mental health disorders (Weissman et al., 1996; De Choudhury et al., 2013); we have therefore calculated the proportion of tweets that a user posts between midnight and 4 am according to their local time zone. During annotation, special attention should thus be given to the time perspective of the tweets. Moreover, tweets generated around midnight are highlighted in grey in the interface, on the assumption that the user has a sleeping problem or insomnia.

From all labelled tweets, unique user names are automatically extracted for analysis of their timelines. Thereafter, tweets are extracted from each unique user's timeline and annotated using the aforementioned procedure. After annotation of all tweets from a user's timeline, the personality of the user is labelled as one of nine classes, viz. Extravert, Introvert, Emotionally Stable, Neuroticism, Agreeable, Disagreeable, Conscientious, Unconscientious and Open to Experience. These personality classes are basically an extended form of the “Big Five Personality” (John and Srivastava, 1999) classes. We are mainly interested in the users who are labelled as “Neuroticism” after analysing the tweets from their timeline. Following the above mentioned procedure, we have created our training data. Manual analysis of the tweets collected from users' timelines reveals that the information contained in the timelines of two neurotic users is similar. Moreover, people discuss similar problems within their circle of friends; therefore, automatically searching the “friends” and “followers” of a neurotic user increases the chance of obtaining more data of a similar nature. The training data collected in this way has been used to train our ensemble learner for detecting depression from social media text, in order to collect more tweets from neurotic users. In the next section, we discuss the depression detection technique using an ensemble classifier.

4 Depression Detection from Social Media Text

4.1 Why Social Media

Currently, depression is primarily assessed through surveys. The standard approach to diagnosing psychological health disorders is through a series of clinically administered diagnostic interviews and tests (Weathers and Davidson, 2001). However, assessment of patients using these tests is expensive and time-consuming. Furthermore, the stigma associated with mental illnesses motivates inaccurate self-reporting by affected individuals and their family members, thus making the tests unreliable. The evaluation of a patient is typically performed through standardized questionnaires like the Beck Depression Inventory (BDI)⁴, the Big Five Inventory (BFI) (John and Srivastava, 1999) etc. A patient's answers are then compiled and compared with disease classification guidelines, such as the International Classification of Diseases or the Diagnostic and Statistical Manual (DSM), to guide the patient's diagnosis. However, these diagnostic methods are not precise and have high rates of false positives and false negatives. In addition, societal and financial barriers prevent many people from seeking medical attention (Michels et al., 2006). Many societies around the world stigmatize and discriminate against people with mental disorders, contributing to the unwillingness of individuals to acknowledge the problem and seek help (Fabrega, 1991). While psychological treatments for depression can be effective (Cuijpers et al., 2008), they are often plagued by access barriers and high rates of attrition (Mohr et al., 2010). Internet interventions have been touted as an antidote to access barriers, but they appear to produce more modest outcomes (Andersson and Cuijpers, 2009), in part also due to high attrition (Christensen et al., 2009).

⁴ http://www.hr.ucdavis.edu/asap/pdf_files/Beck_Depression_Inventory.pdf

In recent years, there has been a tremendous growth in social interactions on the Internet via social networking sites and online discussion forums. In contrast to clinical tests, the Internet is an ideal, anonymous medium for distressed individuals to relate their experiences, seek knowledge and reach out for help. Social media is an emerging tool that may assist research in this area, as there exists the possibility of passively surveying and then subsequently influencing large groups of people in real time. Ruder et al. (2011) have shown that some Facebook users do, in fact, post suicide notes on their profiles, exposing the potential for suicide related research in social media. The amount of publicly available information spread across the realm of social media is extensive. We prefer Twitter because of its greater public availability of data, its larger user base and its being a platform of personal expression. Users generate over 400 million tweets per day (Bennett, 2012). This large reservoir of information regarding people's daily lives and behaviours, if handled correctly, can be used to study depression and suicide and possibly to intervene. Twitter is also used for keeping in touch with friends and colleagues, sharing interesting information within one's network, seeking help and opinions, and releasing emotional stress (Johnston and Hauman, 2013). Therefore, Twitter can be identified as an important surveillance tool for detecting depression and suicidal patterns.

4.2 Methodology

We have applied an ensemble classifier to classify users as distressed or non-distressed based on their social media text collected from Twitter. The ensemble classifier has been built using a linear combination of a Document Similarity estimator and an Emotional Intensity estimator. The weights in this linear combination are estimated empirically to achieve higher accuracy in this classification task.

We have followed two approaches for detecting depressive tweets, viz.

(a) Document Similarity Measurement: We have used KL-Divergence (Manning and Schütze, 1999) to measure the similarity between the searched user's timeline and the labelled tweets from all users' timelines, and have used the similarity score in the final scoring of the negativity of the user. Applying Latent Dirichlet Allocation (LDA) (Blei et al., 2003), we have estimated the topic distribution of the labelled negative⁵ and positive data extracted from users' timelines. We have used the sklearn.lda.LDA library to estimate the topic distribution. Then we have estimated the topic similarity between the tweets in the timeline of a query user (the user whose personality needs to be estimated) and these labelled negative and positive tweets. Then we have estimated the overall similarity score using the following equation:

$$\mathrm{SimilarityScore} = 0.6\,\beta + 0.4\,\gamma \qquad (1)$$

where β and γ are the similarity scores estimated using LDA and KL-Divergence based topic distribution. Studies show that theme-based retrieval does a better job of finding relevant and effective documents (tweets in a user's timeline in our case) for this application than conventional approaches (Dinakar et al., 2012b; Dinakar et al., 2012a). All the weights used in the above equation are empirically determined.

⁵ In this paper we use the terms “negative tweet” and “distressed tweet” interchangeably to denote a tweet generated by a user having a neurotic personality.

(b) Emotional Intensity Measurement: We have used the following resources for measuring the emotional intensity of an individual tweet:

(i) SentiWordNet
(ii) A manually classified lexicon based on psychological process, akin to LIWC
(iii) WordNet Affect

We have calculated the NegativityScore by combining an LSTM (Hochreiter and Schmidhuber, 1997; Gers et al., 2000; Graves, 2012), a deep learning model for detecting polarity from tweets, with a rule-based approach using the above mentioned resources and psycholinguistic features. The features used in this classification task are mainly of psycholinguistic types; in addition, “Pattern of Life Analytics”⁶ (Greetham et al., 2011; Berkman et al., 2000; De Choudhury et al., 2013), “Capitalized Text”, “SpecialHashTag”, “Probability of Personal Pronoun”, “UserName Containing Special Keywords” etc. have also been used. The counts of some common phrases like “why me”, “I hate myself” etc. have also been considered as important features. Examples of the “SpecialHashTag” feature are #depressionprobs, #thisiswhatdepressionlike, #depression, #suicide etc. We have observed that when a user's userid contains clue substrings such as “depressing”, “depression”, “hell”, “depressed”, “sad”, “cry”, “suicidal”, “anxious”, “anxiety”, “lonely”, “die”, “broken”, “stress”, “worthless”, “lost” etc., the user's timeline tends to contain depressive tweets. The presence of such substrings in the userid has therefore been used as an important feature. Tweets written in upper case are considered important, on the assumption that upper case is used to give more importance to, or to intensify, the emotion involved in the tweet. We have used Theano, a Python based deep learning library, for implementing our LSTM classifier⁷.

⁶ Social engagement has been correlated with positive mental health outcomes. Tweet rate measures how often a Twitter user posts, and proportion of tweets with @mentions measures how often a user posts ‘in conversation’ with other users. Number of @mentions is a measure of how often the user in question engages other users, while Number of self @mentions is a measure of how often the user responds to mentions of themselves.

The final score for selecting neurotic persons has been calculated as follows:

$$\mathrm{FinalScore} = \alpha \cdot \mathrm{SimilarityScore} + (1 - \alpha) \cdot \mathrm{NegativityScore} \qquad (2)$$

The value of α (0 ≤ α ≤ 1) can be set experimentally to achieve the highest accuracy. We have found empirically that the best results are obtained when α is 0.8. It has been observed that when the FinalScore is greater than 0.14, the user can be accepted as a neurotic person. Following the procedure discussed in sections 3 and 4, we have collected 2500 negative tweets from the timelines of 12 Twitter users having a neurotic personality. The same number of positive tweets has been collected from the timelines of users having tweets with hashtags such as #motivationaltweet, #positivethinking, #motivationalquotes etc.

4.3 Vector Space Representation of Distressed Tweets

After manual verification, we have converted the negative and positive tweets into the multi-dimensional Vector Space. Thereafter, using the “t-Distributed Stochastic Neighbor Embedding” (t-SNE) technique (van der Maaten and Hinton, 2008), the higher dimensional data points are projected onto a 2d plane. We have used the gensim⁸ Python library to convert neurotic persons' tweets into the Vector Space. We have considered 2000 dimensions and a context window of ±5 during Word Embedding. Words that appear at least 10 times in the corpus have been selected for vector representation. We have used the t-SNE API available in the sklearn.manifold library⁹ for dimensionality reduction of these higher dimensional points and for visualization in two-dimensional space. Figure 2 shows the representation of the embedded words in 2d space using t-SNE. We can see that semantically closer words form clusters in the Vector Space. “Kill me”, “Suicidal”, “destroy” and “Cutting” appear close to each other, while “rejected”, “unloved” and “worthless” form a separate cluster. Separate clusters represent the different topics of the thoughts in the minds of neurotic persons. Conversely, analysing the tweets of positive minded people, we have seen that words like “adorable”, “comfortable”, “eager”, “hopeful”, “satisfied” etc. are frequently used in their timelines. Persistent homology has been applied to the point clouds of positive and negative tweets separately. In the next section we discuss the topological data analysis of negative and positive tweets based on their vector representation.

5 Topological Data Analysis of Tweets

Persistent homology (Zhu, 2013), a mathematical tool from topological data analysis, has been applied to the collected tweets. It performs multi-scale analysis on a set of points and identifies clusters, holes and voids therein. Persistent homology can identify clusters (0-th order holes), holes (1st order holes, such as the inside of a loop), voids (2nd order holes, such as the inside of a balloon), and so on in a point cloud. It finds “holes” by identifying equivalent cycles. A detailed discussion of persistent homology¹⁰ and Algebraic Topology is out of the scope of this paper. Interested readers can follow the work of (Zhu, 2013; Singh et al., 2008; Giblin, 2010; Freedman and Chen, 2011; Zomorodian, 2001; Carlsson, 2008; Edelsbrunner and Harer, 2010; Hatcher, 2002). After representing the words in Vector Space, we have used these data points

⁷ http://deeplearning.net/tutorial/lstm.html
⁸ https://radimrehurek.com/gensim/
⁹ http://scikit-learn.org/
¹⁰ http://outlace.com/

Figure 2: Representation of Words in Vector Space using t-SNE

Figure 3: Generated Vietoris-Rips Complexes on Negative Point Cloud with Incremental Values of ϵ.

Figure 4: Generated Vietoris-Rips Complexes on Positive Point Cloud with Incremental Values of ϵ.

to build Vietoris-Rips¹¹ complexes of diameter ϵ, which are the simplicial complexes $\mathrm{VR}(\epsilon) = \{\sigma \mid \mathrm{diam}(\sigma) < \epsilon\}$. Here diam(σ) represents the largest distance between two points in σ. The distance measure varies according to context; here we have used the Euclidean distance. Figure 3 and figure 4 show the Vietoris-Rips complexes generated on the negative and positive point clouds respectively. We can see that if ϵ is set too small, the generated complex may consist of just the original point cloud, or of only a few edges between the points. If ϵ is set too big, the point cloud becomes one massive ultradimensional simplex. Our intention is to discover meaningful patterns in a simplicial complex by continuously varying the ϵ parameter (and continually rebuilding the complexes) from 0 to a maximum that results in a single massive simplex. We then generate a diagram that shows which topological features are born and die as ϵ continuously increases. We assume that features that persist for long intervals of ϵ are meaningful, whereas features that are very short-lived are likely to be noise. This procedure is called persistent homology computation, as it finds the homological features of a topological space that persist while we vary ϵ. Persistent homology examines all values of ϵ to see how the system of holes changes (also known as the “Birth and Death process”). An increasing sequence of ϵ produces a filtration. Persistent homology tracks homology classes along the filtration to determine at what value of ϵ a hole appears and how long it persists. We have followed the methodology reported in (Carlsson, 2008) to study the homology of the constructed complexes. The steps involved in this methodology are as follows:

¹¹ http://outlace.com/

• Construct the $\mathbb{R}$-persistence simplicial complex $\{C_\epsilon\}$ using the Vietoris-Rips method.
• Select a partial order preserving map $f : \mathbb{N} \to \mathbb{R}$.
• Construct the associated $\mathbb{N}$-persistence simplicial complex.
• Construct the associated $\mathbb{N}$-persistence chain complex $\{C_*(n)\}_n$ with coefficients in $F$.
• Compute the barcodes associated to the $\mathbb{N}$-persistence $F$-vector spaces $\{H_i(C_*(n), F)\}_n$.

Please refer to (Carlsson, 2008) for a detailed explanation.

The “barcode plot” is a convenient way to visualize persistent homology (Zhu, 2013; Ghrist, 2007). The barcode plots shown in figure 5 and figure 6 are drawn based on an increasing sequence of ϵ and the zeroth Betti number (β0) calculated from positive and distressed tweets respectively.

We have randomly selected 500 data points from the word-to-vector representation of positive and distressed tweets for the filtration process. The word-to-vector representation using Word Embedding ensures that words that share common contexts (semantics) in the negative and positive tweets are located in close proximity to one another in the Vector Space. Using persistent homology, we are trying to examine the topology of these semantically oriented data points (words). The number of connected components is an important topological invariant of a graph. In topological graph theory, it can be interpreted as the zeroth Betti number of the graph. From figure 5, we can see that the positive tweets are less connected (142 disconnected components among the 500 data points), whereas figure 6 shows that the negative tweets are much more connected than the positive tweets (only 7 disconnected components). A larger number of disconnected components in users' timelines represents a wide variation of topics. Conversely, a smaller number of disconnected components indicates that the tweets are much more focused on some specific topics. Manually investigating the contents of the positive and negative tweets, we have seen that users having a neurotic personality talk more about their pain and problems. Hence, the topics discussed in their timelines are more focused on their problem area. On the other hand, positive minded users discuss different topics and also share ideas and thoughts among their friends and followers. Therefore, the tweets generated by them have a wide variation of topics.

Figure 5: Barcode Plot of Positive Tweets at Betti Dimension 0 (β0)

Figure 6: Barcode Plot of Distressed Tweets at Betti Dimension 0 (β0)

The barcode plots shown in figure 7 and figure 8 are drawn based on an increasing sequence of ϵ and the 1st Betti number (β1) calculated from positive and distressed tweets respectively. We can see that figure 7 has very few holes, whereas figure 8 has many more. As the number of disconnected components is much larger in the positive tweets, the chance of one-dimensional holes appearing is smaller. Conversely, the number of one-dimensional holes is much larger in the negative tweets because of the smaller number of disconnected components. We have found that the number of one-dimensional holes is 1 for the positive tweets and 34 for the negative tweets. This corroborates the first observation that people with a negative mindset have a more focused set of thoughts (oriented towards their problem domain) than people with a positive mindset. The higher order homology groups produce Betti numbers of zero, as expected.

Figure 8: Barcode Plot of Distressed Tweets at Betti Dimension 1 (β1)
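The zeroth Betti number analysis of this section can be illustrated with a minimal sketch. This is not the authors' implementation (the paper does not name the persistent homology software it used); it is a standard-library Python 3.8+ example that assumes a small toy point cloud in place of the 500 embedded words. The function names `h0_death_scales` and `betti0` are hypothetical. The sketch uses the fact that, in a Vietoris-Rips filtration with Euclidean distance, the scales at which connected components merge are the edge lengths of a minimum spanning tree, found here with Kruskal's algorithm and a union-find structure.

```python
import math
from itertools import combinations


def h0_death_scales(points):
    """Scales at which two connected components merge in the Rips filtration.

    Every point is born as its own component at scale 0. Processing edges
    by increasing Euclidean length, each edge that joins two different
    components ends ("kills") one 0-dimensional bar at that length.
    """
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j
            deaths.append(dist)
    return deaths  # n - 1 finite bars; one further bar never dies


def betti0(points, eps):
    """Number of connected components of VR(eps) = {sigma | diam(sigma) < eps}."""
    if not points:
        return 0
    # An edge of length d is present only when d < eps, so a bar that
    # dies at d is still alive at every scale eps <= d.
    return 1 + sum(1 for d in h0_death_scales(points) if d >= eps)


# Toy point cloud (an assumption for illustration): two well separated groups.
toy_cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0), (11.0, 10.0)]
for eps in (0.5, 2.0, 20.0):
    print(eps, betti0(toy_cloud, eps))  # prints 5, then 2, then 1 components
```

Reading the output as a barcode, long bars correspond to groups of words that stay separated over a wide range of ϵ, which is the sense in which a larger number of surviving components suggests a wider variation of topics. Counting one-dimensional holes (β1) requires building higher-dimensional simplices and is not covered by this sketch.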

Figure 7: Barcode Plot of Positive Tweets at Betti Dimension 1 (β1)

6 Conclusion

In this paper, we have proposed a novel approach for collecting tweets of neurotic persons. These tweets are then represented in the Vector Space using Word Embedding, and their dimensionality has been reduced using t-SNE. Persistent homology has been applied to analyse the topology of tweets resembling autopilot thoughts. Psychological features in terms of linguistic patterns have been discussed.

As future work, we are planning to explore how natural language generation can be applied for therapeutic text generation following RFT and based on the topology of patients' thoughts. Based on our understanding of the literature, we have hypothesized that tweets having psychological features and linguistic markers of depression are indicators of a neurotic user's timeline. Therefore, as future work we would like to get expert guidance from psychotherapists for a better understanding of the psychological processes involved in Mental Health.

The work discussed in this paper is an initiative towards applying NLP in the domain of Mental Health, which will motivate researchers towards further exploration of the linguistic markers and topology involved in psychology.

References

Tim Althoff, Kevin Clark, and Jure Leskovec. 2016. Natural language processing for mental health: Large scale discourse analysis of counseling conversations. Transactions of the Association for Computational Linguistics.

G Andersson and P Cuijpers. 2009. Internet-based and other computerized psychological treatments for adult depression: a meta-analysis. Cognitive Behavioral Therapy, 38(4):196–205.

R. Bandler and S. Andreas. 1985. Using Your Brain for a Change.

R. Bandler and J. Grinder. 1975. The Structure of Magic I: A Book About Language and Therapy. Science and Behavior Books.

R. Bandler and J. Grinder. 1979. Frogs into Princes: Neuro Linguistic Programming. Real People Press.

A. T. Beck, A. J. Rush, B. F. Shaw, and G. Emery. 1979. Cognitive therapy of depression. Guilford Publications, New York.

Shea Bennett. 2012. Twitter now seeing 400 million tweets per day, increased mobile ad revenue, says CEO. Online.

Lisa F. Berkman, Thomas Glass, Ian Brissette, and Teresa E. Seeman. 2000. From social integration to health: Durkheim in the new millennium? Social Science and Medicine, 51(6):843–857.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent dirichlet allocation. J. Mach. Learn. Res., 3:993–1022.

W. Bucci and N. Freedman. 1981. The language of depression. Bulletin of the Menninger Clinic, 45(4):334–358.

Gunnar Carlsson. 2008. Topology and data. Technical report.

H Christensen, KM Griffiths, and L Farrer. 2009. Review adherence in internet interventions for anxiety and depression. Journal of Medical Internet Research, 11(2).

SE Collins, N Chawla, SH Hsu, J Grow, JM Otto, and GA Marlatt. 2009. Language-based measures of mindfulness: initial validity and clinical utility. Society of Psychologists in Addictive Behaviors, 23(4):743–749.

P Cuijpers, A Van Straten, L Warmerdam, and N Smits. 2008. Characteristics of effective psychological treatments of depression: a meta regression analysis. Psychother Res, 18(2):225–36.

Munmun De Choudhury, Scott Counts, and Eric Horvitz. 2013. Predicting postpartum changes in emotion and behavior via social media. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI ’13, pages 3267–3276, New York, NY, USA. ACM.

Karthik Dinakar, Birago Jones, Catherine Havasi, Henry Lieberman, and Rosalind Picard. 2012a. Common sense reasoning for detection, prevention, and mitigation of cyberbullying. ACM Trans. Interact. Intell. Syst., 2(3):18:1–18:30, September.

Karthik Dinakar, Birago Jones, Henry Lieberman, Rosalind W. Picard, Carolyn Penstein Rosé, Matthew Thoman, and Roi Reichart. 2012b. You too?! mixed-initiative LDA story matching to help teens in distress. In Proceedings of the Sixth International Conference on Weblogs and Social Media, Dublin, Ireland, June 4-7, 2012.

E. Durkheim. 1951. Suicide. Free Press, New York.

DD Ebert, AC Zarski, H Christensen, Y Stikkelbroek, P Cuijpers, M Berking, and H Riper. 2015. Internet and computer-based cognitive behavioral therapy for anxiety and depression in youth: a meta-analysis of randomized controlled outcome trials. PLoS One, 10(3).

Herbert Edelsbrunner and John Harer. 2010. Computational Topology - an Introduction. American Mathematical Society.

Horacio Fabrega. 1991. Psychiatric stigma in non-western societies. Comprehensive Psychiatry, 32:534–551.

Daniel Freedman and Chao Chen. 2011. Algebraic topology for computer vision. Computer Vision, 5:239–268.

Felix A. Gers, Jürgen Schmidhuber, and Fred Cummins. 2000. Learning to forget: Continual prediction with LSTM. Neural Comput., 12(10):2451–2471.

Robert Ghrist. 2007. Barcodes: The persistent topology of data. Technical report.

Peter Giblin. 2010. Graphs, Surfaces and Homology. Cambridge University Press.

Alex Graves. 2012. Supervised Sequence Labelling with Recurrent Neural Networks, volume 385 of Studies in Computational Intelligence. Springer.

David E Greenway, Emily K Sandoz, and David R Perkins. 2010. Potential applications of relational frame theory to natural language systems. In Proceedings of the Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), pages 2955–2958. IEEE.

Danica Vukadinovic Greetham, Robert Hurling, Gabrielle Osborne, and Alex Linley. 2011. Social networks and positive and negative affect. Procedia-Social and Behavioral Sciences, 22:4–13.

Allen Hatcher. 2002. Algebraic topology. Cambridge University Press, Cambridge.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Comput., 9(8):1735–1780, November.

O. P. John and S. Srivastava. 1999. The big-five trait taxonomy: History, measurement, and theoretical perspectives. Handbook of personality: Theory and research, 2:102–138.

K. Johnston, M. M. Chan, and M. Hauman. 2013. Use, perception and attitude of university students towards facebook and twitter. The Electronic Journal Information Systems Evaluation, 16(3):201–211, November.

T. Kaczynski, K. Mischaikow, and M. Mrozek. 2004. Computational Homology. Springer.

Nikolaos Kazantzis, John Tee, Frank Dattilio, and Keith Dobson. 2013. Collaborative empiricism as the central therapeutic relationship element in CBT: an expert panel discussion. International Journal of Cognitive Therapy, 6(4).

François Mairesse and Marilyn Walker. 2007. Personage: Personality generation for dialogue. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pages 496–503.

Christopher D. Manning and Hinrich Schütze. 1999. Foundations of Statistical Natural Language Processing. MIT Press, Cambridge, MA, USA.

Evan Mayo-Wilson and Paul Montgomery. 2013. Media-delivered cognitive behavioural therapy and behavioural therapy (self-help) for anxiety disorders in adults. Cochrane Database Syst

Grégoire Mesnil, Xiaodong He, Li Deng, and Yoshua Bengio. 2013. Investigation of recurrent-neural-network architectures and learning methods for spoken language understanding. In Frédéric Bimbot, Christophe Cerisara, Cécile Fougeron, Guillaume Gravier, Lori Lamel, François Pellegrino, and Pascal Perrier, editors, INTERSPEECH, pages 3771–3775. ISCA.

Kathleen M. Michels, Karen J. Hofman, Gerald T. Keusch, and Sharon H. Hrynhow. 2006. Stigma and global health: Looking forward. Lancet, 367:538–539.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Adam Miner, Amanda Chow, Sarah Adler, Ilia Zaitsev, Paul Tero, Alison Darcy, and Andreas Paepcke. 2016. Conversational agents and mental health: Theory-informed assessment of language and affect. In Proceedings of the Fourth International Conference on Human Agent Interaction, HAI ’16, pages 123–130, New York, NY, USA. ACM.

DC Mohr, J Ho, J Duffecy, KG Baron, KA Lehman, L Jin, and D Reifler. 2010. Perceived barriers to psychological treatments and their relationship to depression. Journal of Clinical Psychology, 66(4):394–409.

David C Mohr, Michelle Nicole Burns, Stephen M Schueller, Gregory Clarke, and Michael Klinkman. 2013. Behavioral intervention technologies: evidence review and recommendations for future research in mental health. General hospital psychiatry, 35(4):332–338.

Thin Nguyen, Dinh Phung, Bo Dao, Svetha Venkatesh, and Michael Berk. 2014. Affective and content analysis of online depression communities. IEEE Transactions on Affective Computing, 5(3).

Joanna Pawelczyk. 2011. Talk as Therapy: Psychotherapy in a Linguistic Perspective. Walter de Gruyter.

James W. Pennebaker, Matthias R. Mehl, and Kate G. Niederhoffer. 2003. Psychological aspects of natural language use: Our words, our selves. Annual review of psychology, 54(1):547–577.

James W Pennebaker, Ryan L Boyd, Kayla Jordan, and Kate Blackburn. 2015. The development and psychometric properties of liwc2015. UT Faculty/Researcher Works.

James W. Pennebaker. 2011. The secret life of pronouns: What our words say about us.
Blooms- Rev, 9. 445 bury Press, New York. T. Pyszcynski and J. Greenberg. 1987. Self- editor, Proceedings of the Twenty-Third Inter- regulatory perseveration and the depressive self- national Joint Conference on Artificial Intelli- focusing style: A self-awareness theory of de- gence, pages 1953–1959. IJCAI/AAAI. pression. Psychological Bulletin, 102:122–138. P.G. Zimbardo and J.N. Boyd. 2008. The Time Nairan Ramirezesparza, Cindy K. Chung, Ewa Paradox: The new psychology of time that will Kacewicz, and James W. Pennebaker. 2008. change your life. Free Press, New York. The psychology of word use in depression forums in english and in spanish: Testing two text ana- Afra Joze Zomorodian. 2001. Computing and com- lytic approaches. In Proceedings of the ICWSM. prehending topology: persistence and hierarchi- cal Morse complexes. Ph.D. thesis, University Stephanie Rude, Eva-Maria Gortner, and James of Illinois at Urbana-Champaign. Pennebaker. 2004. Language use of de- pressed and depression-vulnerable college stu- dents. Cognition and Emotion, 18:1121–1133.

Thomas Ruder, Gary M Hatch, Garyfalia Am- panozi, and Nadja Fischer. 2011. Suicide an- nouncement on facebook. Crisis, 32(5):280–282.

Gurjeet Singh, Facundo Memoli, Tigran Ishkhanov, Guillermo Sapiro, Gunnar Carlsson, and Dario L. Ringach. 2008. Topological analysis of population activity in visual cortex. Journal of Vision, 8(8):1–18.

Ann Sizemore, Chad Giusti, Ari Kahn, Richard F. Betzel, and Danielle S. Bassett. 2016. Cliques and cavities in the human connectome. arXiv:1608.03520.

S. W. Stirman and J. W. Pennebaker. 2001. Word use in the poetry of suicidal and non-suicidal poets. Psychosomatic Medicine, 63:517–522.

Laurens van der Maaten and Geoffrey Hinton. 2008. Visualizing high-dimensional data using t-sne. Journal of Machine Learning Research, 9:2579–2605, November.

M Villatte, JL Villatte, and SC Hayes. 2015. Mas- tering the Clinical Conversation: Language as Intervention. Guilford Publications, Palo Alto, CA, October.

Keane T. M. Weathers, F. W. and J. Davidson. 2001. Clinician-administered ptsd scale: A re- view of the first ten years of research. Depres- sion and Anxiety, 13:132–156.

Myrna M. Weissman, Roger C. Bland, Glorisa J. Can-ino, Carlo Faravelli, Steven Greenwald, HaiGwo Hwu, Peter R. Joyce, Eile G. Karam, Chung-Kyoon Lee, Joseph Lellouch, Jean-Pierre Lepine, Stephen C. Newman, Maritza Rubio- Stipec, J. Elisabeth Wells, Priya J. Wickra- maratne, Hans-Ulrich Wittchen, and Eng-Kung Yeh. 1996. Cross-national epidemiology of ma- jor depression and bi-polar disorder. Journal of the American Medical Association (JAMA), 276(4):293––299.

Xiaojin Zhu. 2013. Persistent homology: An intro- duction and a new text representation for nat- ural language processing. In Francesca Rossi,446