
Automated Identification of Verbally Abusive Behaviors in Online Discussions

Srećko Joksimović, University of South Australia, Australia
Ryan S. Baker, University of Pennsylvania, USA
Jaclyn Ocumpaugh, University of Pennsylvania, USA
Juan Miguel L. Andres, University of Pennsylvania, USA
Ivan Tot, University of Defence in Belgrade, Serbia
Elle Yuan Wang, Arizona State University, USA
Shane Dawson, University of South Australia, Australia

Abstract

Discussion forum participation represents a crucial support for learning and is often the only means of supporting social interaction in online settings. However, learner behavior varies considerably in these forums, encompassing positive behaviors such as sharing new ideas or asking thoughtful questions, but also verbally abusive behaviors, which can have disproportionate detrimental effects. To provide a means of mitigating potential negative effects on course participation and learning, we developed an automated classifier for identifying communication that shows linguistic patterns associated with hostility in online forums. In so doing, we employ several well-established automated text analysis tools and build on common practices for handling highly imbalanced datasets and reducing sensitivity to overfitting. Although still in its infancy, our approach shows promising results (ROC AUC = 0.74) towards establishing a robust detector of abusive behaviors. We also provide an overview of the classification features (linguistic and contextual) most indicative of online aggression.

1 Introduction

Massive Open Online Courses (MOOCs) represent an important part of the educational landscape, offering access to learning at scale for both for-credit and life-long learners (Al-Imarah and Shields, 2019). While there is significant appeal and popularity in MOOC offerings, they bring numerous challenges for designing effective teaching and learning activities at scale (Kovanović et al., 2015). The unprecedented numbers of learners enrolled and the diversity in learners' motivations and goals are but two factors that add a layer of complexity seldom experienced in more traditional modes of education (Alario-Hoyos et al., 2017). A product of the complexity of teaching at scale is the lack of student participation in discussion activity (Wise and Cui, 2018; Rosé and Ferschke, 2016). Despite social interaction between peers being a key factor in student learning (Poquet and Dawson, 2016; Joksimović et al., 2016), MOOC discussions often receive limited participation (Wise and Cui, 2018). Numerous studies have shown that participation in discussions is influenced by factors such as feelings of confusion or isolation, diverse cultural and educational backgrounds, or the inability to navigate when learning in a crowd (Baxter and Haycock, 2014; Poquet et al., 2018). Learners in MOOC settings must rapidly establish and sustain shared communication practices in order to join a new and often short-lived online community (Rosé and Ferschke, 2016).

There is thus far relatively limited research on the pragmatics of academic discussions in MOOCs. In one line of work, surveys investigating why students stop posting in MOOC forums show that many quit because of comments deemed politeness violations (Mak et al., 2010). Many of these postings involve relatively mild examples of abusive behaviors: violations of pragmatic practices around niceness. More extreme violations of politeness conventions in MOOCs have also emerged in the literature, with Comer and her colleagues (Comer et al., 2015) reporting a number of verbally abusive behaviors on the part of students in MOOCs. While such behaviors are relatively infrequent, they can have disproportionate effects on those involved in the course (Mak et al., 2010; Comer et al., 2015).

Designed to support interactions at scale and facilitated as a fully online learning experience, MOOCs pose multiple challenges to successful participation. For example, success in MOOCs depends on learners' motivation, achievement and social emotions, and self-regulated learning skills, among other factors (Mak et al., 2010). Therefore, as Rosé and Ferschke (2016) posit, it is necessary to create "a supportive environment in which these learners can find community, support, dignity, and respect" (ibid., p. 664). In that sense, it seems reasonable to build on approaches to mitigating abusive online behaviors commonly applied in online learning communities, and then in more traditional educational settings.

In this work, we build on prior research on text classification and the analysis of learner-generated discourse to build an automated classifier for detecting verbally abusive behaviors in online discussion forums. In so doing, we employ a wide variety of features that range from simple syntactic properties of text (such as unigrams, bigrams, or part-of-speech tags) to more complex linguistic analysis (e.g., text cohesion), in order to identify potentially relevant contextual features. We enhance these detectors through approaches designed to adjust for imbalance in the data. The findings from this work bring new insights into the linguistic dimensions that could be indicative of online aggression and can help to mitigate the impacts of hostile and abusive behaviors on other learners.

2 Background Work

2.1 Roots of Negativity in MOOCs

Discourse around negativity in general, and in MOOCs in particular, draws on research on negative emotions in learning and on the use of abusive language in online learning communities (Comer et al., 2015). Experiencing anxiety, anger, or frustration caused by learning activities that are negatively valued or perceived as aversive can lead to decreased engagement and motivation, and consequently to failure to achieve specific learning outcomes (Pekrun et al., 2002; Rowe, 2017). On the other hand, with the emergence of social media and its use to support the development of online learning communities, negativity and abusive online behaviors can potentially have much broader consequences (Salminen et al., 2018). Less extreme manifestations of abusive language in online learning communities can lead towards disengagement from the community (Mak et al., 2010). In more severe instances, negativity in online communities can lead to cyberbullying and online aggression in general (Holfeld and Grabe, 2012).

To understand the nature of negativity in MOOCs, we draw on the work of Comer and her colleagues (2015), who discuss three types of negativity in MOOCs: negativity towards i) the course, ii) the instructor, and iii) the course platform. This multifaceted perspective demonstrates that the main sources of negativity are associated with pedagogy or course design decisions and cannot easily be addressed during course facilitation (Comer et al., 2015). Despite the relatively low proportion of abusive behaviors in MOOCs, Comer and colleagues illustrate the negative impacts they have on instructor presence and on broader levels of participation in discussion forums. Detecting when negativity occurs could open the way for more automated or semi-automated approaches to reducing its impact, whether by blocking offensive content or by deploying supportive strategies for the individuals affected (Comer et al., 2015).

In this study we aim to automate the detection of negativity in MOOC forums. An outcome of this work is a process that enables more efficient responses to abusive online behaviors in MOOC discussion forums.
In so doing, we treat negativity as a single construct, rather than differentiating negativity towards the course, platform, or instructor, due to the relative infrequency of negative behaviors. Although we concur that negativity in MOOCs can have multiple facets, our goal in this study is to provide insight into factors that could indicate detrimental and abusive online behaviors in their broadest manifestation; even negativity towards the course or platform can be upsetting to others (Comer et al., 2015).

2.2 Automated Analysis of Abusive Language

Contemporary literature on affect in MOOC discourse relies primarily on content analysis methods (Joksimović et al., 2018b). To date, this has involved exploring affect and emotions to understand factors that predict persistence and success in MOOCs (Joksimović et al., 2018b). Tucker and colleagues (2014), for example, relied on a sentiment lexicon to extract sentiment polarity (i.e., positive, negative, or neutral) and strength (i.e., the magnitude of sentiment) from discussion forum messages; they found a strong negative association between the sentiment expressed in forums and average assignment grade. Adamopoulos (2013) opted for a more fine-grained analysis, exploring learners' sentiment towards the course instructor, assignments, and course material, utilizing AlchemyAPI. Finally, Yang and colleagues (2015) relied on Linguistic Inquiry and Word Count (LIWC) features, and on word categories that depict student affective processes, including positive and negative emotions, to detect confusion within student contributions to the discussion forum.

Although existing MOOC research recognizes the importance of understanding learners' emotions expressed through interactions in online discussion forums, little has been done to detect negativity and abusive online behaviors. Relevant work exists, however, in efforts to understand online learning communities and social media interactions in general. Several approaches have been developed to detect dimensions of verbal aggression and abusive behavior in social media and online social platforms more broadly (Balci and Salah, 2015; Anzovino et al., 2018). For example, Abozinadah and Jones (2017) used Support Vector Machines (SVM) to detect abusive Twitter accounts. In another example, Anzovino and colleagues (2018) utilized a wide set of linguistic and bag-of-words features to explore the accuracy of various classifiers in identifying misogynistic language on Twitter; the best classification accuracy was achieved using an SVM classifier based on unigram and bigram features.

Additionally, a considerable body of research focuses on detecting verbal aggression in online social games, in interactions with virtual partners, or in the comments on popular news media such as CNN.com or Yahoo! News (Balci and Salah, 2015; Nobata et al., 2016). Relying on a wide range of linguistic and contextual features (e.g., user profile information), Balci and Salah (2015) used the Bayes Point Machine classification algorithm to identify online profiles that elicit abusive behaviors in social games. Nobata and colleagues (2016), on the other hand, explored the manifestation of abusive language in comments posted on Yahoo! Finance and News articles, developing a supervised approach utilizing n-grams, linguistic features (e.g., token length, average word length), syntactic features (e.g., the part-of-speech tag of a token's parent), and distributional semantic features.

Our work goes beyond existing approaches to understanding MOOC discourse, aiming to detect abusive behaviors that could have detrimental effects on teaching and learning. In so doing, we rely on features commonly identified as predictive of learners' affective states and emotions in online learning settings, and we utilize algorithms and methods applied in general research on verbal aggression in online communities.

3 Method

3.1 Data

The dataset for this study was obtained from the Big Data in Education MOOC, delivered from October to December 2013 by Columbia University and taught through the Coursera platform. This course iteration had a total of 45,256 learners enrolled during the course; an additional 20,316 joined and accessed the course after its official end date. To successfully complete the course and receive a certificate, learners were required to earn an overall grade average of 70% or above. The overall grade was calculated by averaging the six highest grades out of a total of eight assignments. All assignments were composed of multiple-choice questions and short numerical answers and, as such, were available for automatic grading. Discussion participation was not graded. The majority of students only watched videos and did not participate in the assessment tasks. Some 1,380 students completed at least one assignment, while a total of 638 learners successfully completed the course.

As in the vast majority of MOOC offerings, discussion activity involved a comparatively small number of learners (Poquet and Dawson, 2016). For the MOOC under investigation, 747 unique users (including teaching staff) engaged in the discussion forum. In total, the discussion forum contained 4,039 messages written in English (M=5.41, SD=23.93 messages per user). Two independent coders coded the dataset, labeling each message as "negative" if at least one of the negativity types defined by Comer and colleagues (2015) was found in the message, or as "positive/neutral" otherwise. The process was performed through several phases. The first 100 messages were analyzed together, to train the researchers and develop the coding scheme. After that, each of the coders independently labeled 200, 300, 400, and 500 messages, until a satisfactory percent agreement (%-agree = 96.6) was reached. The percent agreement was calculated at the end of each stage, and all disagreements were discussed and resolved. The remaining messages (from 1,501 to 4,039) were split between the two coders.

Out of these 4,039 messages, 3,917 were positive/neutral and 122 (3.02%) were coded as negative. Of the students who posted to the discussion forum, 82 posted at least one message coded as "negative" (M=1.49, SD=1.09). Nevertheless, only 9 students posted more than two messages coded as negative, showing repeated negativity towards the instructor, course platform, or course content.

3.2 Features

In order to develop a classification system for recognizing negativity in learners' posts in a discussion forum, we utilize several types of features. The extracted features build on those commonly used in existing work on discourse analysis (Kovanović et al., 2014; Joksimović et al., 2014). Specifically, we rely on basic linguistic features (such as n-grams and part-of-speech tags), features extracted using tools for automated text analysis, and contextual features. The final feature set included 688 features.

3.2.1 Basic Linguistic Features

Our set includes some of the bag-of-words features commonly utilized in similar classification problems. Specifically, we extracted n-gram features (i.e., unigrams, bigrams, and trigrams): sequences of tokens that commonly appear together. Additionally, we extracted part-of-speech tags (e.g., noun, verb, adjective) and syntactic dependency (i.e., the relation between tokens) features. Although features like n-grams tend to inflate the feature space, they are often used as a baseline feature set against which other features are compared to evaluate their contribution to classification accuracy. Due to the limited training set size and unbalanced data, concerns about overfitting led us to use only the 100 most common n-grams. All the basic features were extracted in Python using spaCy, an open-source library for natural language processing.
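As an illustration of this step, the sketch below extracts n-gram, part-of-speech, and dependency-label counts with spaCy. It is a minimal reconstruction of the description above, not the authors' released code; the function name and feature-key format are our own, and in the actual study only the 100 most common n-grams were retained.

```python
import spacy
from collections import Counter

nlp = spacy.load("en_core_web_sm")  # small English pipeline

def extract_basic_features(text, max_n=3):
    """Count n-grams (n = 1..max_n), POS tags, and dependency labels in a post."""
    doc = nlp(text)
    tokens = [t.lower_ for t in doc if not t.is_space]
    feats = Counter()
    for n in range(1, max_n + 1):              # unigrams, bigrams, trigrams
        for i in range(len(tokens) - n + 1):
            feats["ngram=" + " ".join(tokens[i:i + n])] += 1
    for t in doc:
        feats["pos=" + t.pos_] += 1            # part-of-speech tag
        feats["dep=" + t.dep_] += 1            # syntactic dependency relation
    return feats

print(extract_basic_features("WASTE OF MY TIME").most_common(5))
```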
3.2.2 Linguistic Facilities

In this study, we utilize three additional tools for advanced text analytics. First, we use Linguistic Inquiry and Word Count (LIWC) to extract counts of word categories indicative of various psychological processes, such as social words, cognitive processes, or affect words (Tausczik and Pennebaker, 2010). Previous research demonstrates the potential of LIWC to capture different aspects of students' cognitive engagement during learning. For example, Kovanović and colleagues (Kovanović et al., 2014), as well as Joksimović and colleagues (Joksimović et al., 2014), showed that certain LIWC categories, such as the number of question marks or the number of first-person singular pronouns, are among the most important predictors of different phases of cognitive presence. Moreover, dimensions captured by LIWC (e.g., certainty, negations, or causal verbs) have been positively associated with (deactivating) negative emotions, such as boredom, anxiety, or frustration (D'Mello and Graesser, 2012).

We also utilize TAACO, a linguistic tool for the automated analysis of text cohesion that provides more than 150 indicators of text coherence, linguistic complexity, text readability, and lexical category use (Crossley et al., 2016). Dowell and colleagues (2015) and Joksimović and colleagues (2018a) established the association between various metrics of text cohesion (e.g., referential or deep cohesion) and multiple social and academic learning outcomes. D'Mello and Graesser (2012), on the other hand, showed the association between cohesion-based metrics and student emotions (e.g., boredom, engagement, confusion, or frustration) expressed during tutoring.

It also seems reasonable to expect that negativity in discussion posts would be reflected in various emotional states. Therefore, we used the IBM Watson Natural Language Understanding API to detect anger, disgust, joy, fear, and sadness conveyed in discussion forum messages. Finally, given that research argues for the importance of sentiment expressed in discussion forums as a predictor of persistence in MOOCs, we extracted sentiment polarity and sentiment subjectivity using TextBlob, a Python library for natural language processing tasks.
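Of the tools above, LIWC and TAACO are standalone facilities and the Watson emotion scores come from a hosted API, but the TextBlob sentiment features can be sketched directly. A minimal example, assuming only that polarity and subjectivity are taken as-is from TextBlob's default analyzer:

```python
from textblob import TextBlob

def sentiment_features(text):
    """Polarity lies in [-1, 1]; subjectivity lies in [0, 1]."""
    sentiment = TextBlob(text).sentiment
    return {"polarity": sentiment.polarity,
            "subjectivity": sentiment.subjectivity}

print(sentiment_features("Confusion is good, just as long as it is addressed"))
```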
3.2.3 Contextual Features

Drawing on previous research by Kovanović and colleagues (Kovanović et al., 2014), we further included contextual features in our feature space. As Comer and colleagues (2015) suggest, some of the learners posting negative messages in discussion forums tend to do so consistently. Therefore, for each post we observed whether the previous post by the same student was also negative. Moreover, it seems reasonable to expect that learners build on the existing discourse, so we also observed whether there were negative messages in the same thread prior to the observed post. Furthermore, we observed whether the posted message was a post or a comment, whether it was the start or the end of the thread, and the number of votes the observed post received. Finally, for each of the posts we recorded whether the message contained positive and negative words, as well as the proportion of words that were positive and the proportion that were negative.

3.3 Model Implementation

We built our classifier using the Python scikit-learn implementation of Support Vector Machines (SVM), one of the most robust classifiers for text analysis (Fernández-Delgado et al., 2014). In order to obtain optimal classification results, we performed hyperparameter optimization within the training set, with parameters C (0.001, 0.01, 0.1, 1, 10) and gamma (0.001, 0.01, 0.1, 1), for each of four kernels ("poly", "rbf", "linear", "sigmoid"). We opted for the linear kernel (C=0.001, gamma=0.001), as these settings yielded the best performance.

There are two challenges associated with the dataset that are inherent to the nature of the problem under study. First, although the expression of negative or deactivating emotions is common within learning (Pekrun et al., 2002), verbally abusive behaviors are less common, although still detrimental (Mak et al., 2010; Comer et al., 2015). In our dataset, the small percentage of messages (3.02%) coded as "negative" resulted in a highly imbalanced dataset, which could have negative effects on classification results. Second, participation in discussion forums, including the use of inappropriate or negative behaviors, varies with factors such as student demographics or motivation (Mak et al., 2010). Thus, the tendency to engage in inappropriate behaviors might (and does) vary from one learner to another; only a small subset of students will express negativity in discussion forums.

To address the first problem, that of highly imbalanced classes, we employed two strategies. First, the SVM classifier was configured to use balanced class weights. This configuration adjusts weights inversely proportionally to class frequencies, defining a higher weight for the "negative" class in our case. Second, we implemented a False Positive Rate test in the classification pipeline. The False Positive Rate test controls the total number of false detections, which are common in imbalanced datasets with a rare category of interest, as in this study.

Cross-validation is typically used to control for overfitting. Desmarais and Baker (2012) highlight the importance of cross-validating at the student level, to estimate goodness of fit for new students rather than for new data from the same students. In our study, we rely on the GroupKFold Python implementation of a K-fold iterator with non-overlapping groups (i.e., ensuring that each learner is represented in only a single fold).
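A sketch of how these pieces fit together in scikit-learn is shown below. The grid values and balanced class weights are taken from the description above; the mapping of the False Positive Rate test onto scikit-learn's SelectFpr step, the fold count, and the placeholder names X, y, and learner_ids are our assumptions, not details confirmed by the paper.

```python
from sklearn.feature_selection import SelectFpr, f_classif
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([
    # False Positive Rate test: keep features whose ANOVA p-value is below 0.05
    ("fpr", SelectFpr(f_classif, alpha=0.05)),
    # class weights inversely proportional to class frequencies
    ("svm", SVC(class_weight="balanced")),
])

param_grid = {
    "svm__C": [0.001, 0.01, 0.1, 1, 10],
    "svm__gamma": [0.001, 0.01, 0.1, 1],
    "svm__kernel": ["poly", "rbf", "linear", "sigmoid"],
}

# Student-level cross-validation: each learner falls into exactly one fold.
cv = GroupKFold(n_splits=5)  # fold count not reported in the paper
search = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=cv)

# X: (n_messages, 688) feature matrix; y: 1 = "negative" message;
# learner_ids: the author of each message.
# search.fit(X, y, groups=learner_ids)
```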
4 Results

4.1 Model Training and Evaluation

Table 1 shows the results of our model selection and evaluation. To find the optimal model, we primarily rely on the Area Under the Receiver Operating Characteristic Curve (ROC AUC) score, as Cohen's kappa statistic does not yield reliable estimates for highly imbalanced datasets such as ours (Jeni et al., 2013). To obtain optimal results, we performed classification using various subsets of the original feature set (Table 1). The highest ROC AUC value with the complete feature set was 0.73 (SD=0.06); the classification accuracy for the same set of parameters was .86 (SD=0.02), and the F1 score was .90 (SD=0.02).

Table 1: Classification results for different SVM configurations, varying the feature set used in predicting abusive language, with the p-value cutoff for the False Positive Rate test set at 0.05.
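The preference for ROC AUC over raw accuracy is easy to motivate on data this imbalanced: with roughly 3% positives, a degenerate classifier that never flags a message already looks strong on accuracy. A toy illustration with synthetic labels (not the study data):

```python
import numpy as np
from sklearn.metrics import accuracy_score, roc_auc_score

rng = np.random.default_rng(0)
y_true = (rng.random(4039) < 0.03).astype(int)   # ~3% "negative", as in the data
never_flag = np.zeros_like(y_true)               # classifier that never flags anything

print(accuracy_score(y_true, never_flag))                # ~0.97 accuracy, yet useless
print(roc_auc_score(y_true, never_flag.astype(float)))   # 0.5: no ranking ability
```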

Table 1 further shows that adding bigram, trigram, and POS features (including tag and syntactic dependency) resulted in lower ROC AUC values, despite a slight increase in classification accuracy. The ROC AUC score for the feature set that included unigram, TAACO, LIWC, sentiment, and contextual features was 0.74 (SD=0.06); the classification accuracy for the same set of parameters was .85 (SD=0.01), and the F1 score was .89 (SD=0.01).

4.2 Feature Importance Analysis

Given the size of the feature space (688 features), in the feature importance analysis we focus on the top 40 features used in the data separation task. That is, we observe the top 20 features most predictive of "negative" language and the top 20 features most predictive of "positive/neutral" language in the dataset. Figure 2 shows that all groups of features (i.e., basic linguistic features, features extracted using automated text analysis tools, and contextual features) are represented within this subset of important features.

Figure 2: Top 40 features differentiating abusive language from overall positive/neutral language in the discussion forum. Values higher than 0 indicate features predictive of abusive language.

It is noteworthy that contextual variables yielded the highest predictive power for negativity (Figure 1). Specifically, Previous negative thread (at least one of the previous messages in the thread was negative) was identified as the most important variable in predicting detrimental behaviors. Moreover, whether a message is a post (i.e., a reply to a thread) or a comment (i.e., a reply to a post), as defined within the Coursera platform, also revealed high predictive power. Finally, the total number of votes and whether a message contained negative words were also found to be indicative of messages characteristic of negative behaviors towards the course content and design, the course platform, or the course instructor.

Figure 1 further shows that part-of-speech tags representing adjectives in the superlative form (e.g., "most", "worst") were among the strongest predictors of negativity in online discussions. Other part-of-speech variables highly associated with negative messages indicate the number of possession modifiers in a post (e.g., "... my experiences of the first hour in this class", "WASTE OF MY TIME"). On the other hand, variables indicative of positive/neutral messages were adjectives, wh-determiners (e.g., "what", "which"), and adverbial clause modifiers (e.g., "Confusion is good, just as long as it is addressed").

A considerable number of LIWC features were identified as highly related to either negative or positive/neutral messages in MOOC discussions (Figure 1). Specifically, words associated with common verbs (e.g., "write", "read", "hope"), perceptual processes (e.g., "watched", "said", "showed"), negations (e.g., "neither", "don't", "couldn't"), and function words representing the third-person singular (e.g., "him", "he's", "he") were associated with messages indicative of abusive behaviors. On the other hand, words indicative of psychological processes representing core drives and needs (i.e., affiliation: "welcome", "shared"), positive emotions (e.g., "helpful", "encourage", "honest"), and analytical thinking, as well as function words (i.e., conjunctions: "how", "then", "when"), were highly associated with positive/neutral behaviors (Figure 1).

Likewise, two variables extracted using the TAACO linguistic facility were ranked among the top 20 features predictive of "negative" messages. Specifically, the counts of causal connectives (e.g., "although", "because") and lexical subordinators (e.g., "unless", "whenever") were ranked as important variables in predicting abusive behavior. On the other hand, considerably more TAACO variables were identified as predictive of "positive/neutral" messages: the total numbers of content word types, positive words, lemma types (including bigram and trigram lemmas), connectives, and pronoun types.

Several n-grams were also identified as important variables differentiating abusive language from "positive/neutral" discourse. In the context of predicting "negative" messages, the (stemmed) n-grams classify data assign, much, make sen, data predict, educ data mine, video, and dr baker emerged as the best predictors of abusive behaviors. N-grams such as hi, thank, or follow, on the other hand, were associated with the "positive/neutral" category of messages.

Observing variable importance with the smaller feature set (excluding part-of-speech tag and dependency variables) yielded rather similar results to the complete feature set (Figure 2). Contextual, LIWC, and n-gram (unigram) features still comprise a considerable part of the variables predictive of abusive behavior. Similarly, a wide variety of TAACO variables was identified as indicative of "positive/neutral" messages.

Figure 3: Features differentiating abusive language from overall positive/neutral language in the discussion forum, for the model excluding bigram, trigram, and POS (including dependency) features. Values higher than 0 indicate features predictive of abusive language.
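Although the paper does not publish its ranking code, with a linear-kernel SVM a natural way to produce such a top-20/top-20 split is to sort the signed model coefficients, since positive weights push the decision towards the class encoded as 1 ("negative" here). A hypothetical helper along those lines, with fitted_linear_svm and feature_names as placeholders:

```python
import numpy as np

def top_features(fitted_linear_svm, feature_names, k=20):
    """Rank features by the signed weights of a fitted linear-kernel SVC."""
    weights = np.asarray(fitted_linear_svm.coef_).ravel()  # one weight per feature
    order = np.argsort(weights)
    return {
        "negative": [(feature_names[i], weights[i]) for i in order[::-1][:k]],
        "positive_neutral": [(feature_names[i], weights[i]) for i in order[:k]],
    }
```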
5 Discussion and Conclusion

Identifying and mitigating abusive behaviors in the context of MOOCs is important for reducing the detrimental effects of negative language on peers and instructors. In this research, we manually coded all discussion forum messages written in English (N=4,039) from one MOOC to build an automated classifier for the identification of potentially harmful discussion messages. Our results show that primarily contextual features, but also complex linguistic features such as those extracted using the LIWC and TAACO linguistic facilities, represent important variables in predicting negativity in MOOCs. As such, our classifier outperforms, by a considerable margin, some of the recent work on identifying hate speech in online communities (Salminen et al., 2018).

Kovanović and colleagues (2014) argue for the importance of understanding the specific context in which messages in discussion forums have been posted. Our analyses of the complete and filtered feature sets (the latter without bigram, trigram, and part-of-speech tag features) further support this finding. Moreover, the most important feature for predicting abusive language in MOOC discussions is a variable that flags whether the thread in which the current message was posted already contains a "negative" message. This finding directly supports the claim made by Mak and colleagues (2010), among others, about the detrimental and likely disproportionate effect abusive language in MOOCs can have on overall participation.

The count of votes, as a contextual variable, also warrants further exploration. Complimenting others or the content of others' messages represents one of the indicators identified within the open communication category of social presence (Garrison and Akyol, 2013). One potential implication for future research is to explore the extent to which learners who express abusive behaviors in online communities tend to support each other; that is, the extent to which acknowledgment and approval of negative behaviors has negative implications for the development of a supportive learning environment and, consequently, for learning success.

Our work also supports previous findings on linguistic variables predictive of various dimensions of affect and emotion. For example, D'Mello and Graesser (2012) showed that a high ratio of causal words was positively associated with higher frustration, whereas negations were positively and significantly associated with boredom. A similar finding was observed in our work, where the total count of all causal words was one of the main predictors of abusive language (Figure 2). Building further on Pekrun's (2002) control-value theory of achievement emotions, it seems that activities learners value negatively and perceive as not being controllable potentially lead towards abusive behaviors in online discussions.

It is also noteworthy that variables identified as important predictors of "positive/neutral" messages have been found to be associated with higher levels of cognitive engagement. For example, Joksimović and colleagues (2014) showed that the number of conjunctions (a LIWC variable) or the types of verbs (here captured using TAACO) were among the variables positively and significantly associated with higher phases of cognitive inquiry, as defined by Garrison and colleagues (Garrison and Akyol, 2013). This further supports the work of Rowe (2017), among others, who showed that surface learners might be more likely to experience negative emotions, suggesting that "surface learners may react negatively to teaching methods which attempt to foster independent learning" (ibid., p. 299). Such a finding could have significant implications for future research and practice in mitigating abusive behaviors.

Although rather simple syntactic properties of text, such as n-gram features, can easily inflate the feature space and result in overfitting, our results show that these variables should not be ignored. In the context of "negative" messages, it is indicative that the unigrams, bigrams, and trigrams that emerged among the most important variables in predicting abusive behaviors are related to specific aspects of the course (Figures 1 and 2). For example, n-grams such as "educ data mine", "video", "data predict", or "dr baker" indicate learners' focus on high-level and general aspects of the course, rather than on particular content-related issues. On the other hand, among the most important variables in predicting positive/neutral messages, unigrams such as "hi" or "thank" emerged. Along with the LIWC variable "affiliation", these represent features indicative of higher levels of social presence (Garrison and Akyol, 2013). Recognized as important aspects of open and cohesive communication, as defined by Garrison and colleagues (Garrison and Akyol, 2013), these variables are important indicators of a tendency to establish a collaborative and engaging community of learners.

5.1 Limitations

Although the dataset is reasonably large for a text classification problem, high data imbalance represents one of the main challenges to this study. Moreover, in this preliminary analysis, we rely on a dataset from a single, technical MOOC (i.e., focused on the topics of big data and statistics). Future work should account for different subject domains and different educational settings (e.g., more formal traditional online courses).

References

Ehab A. Abozinadah and James H. Jones, Jr. 2017. A Statistical Learning Approach to Detect Abusive Twitter Accounts. In Proceedings of the International Conference on Compute and Data Analysis (ICCDA '17), pages 6–13. ACM.

Panagiotis Adamopoulos. 2013. What Makes a Great MOOC? An Interdisciplinary Analysis of Student Retention in Online Courses. In 34th International Conference on Information Systems. Association for Information Systems.

Ahmed A. Al-Imarah and Robin Shields. 2019. MOOCs, Disruptive Innovation and the Future of Higher Education: A Conceptual Analysis. Innovations in Education and Teaching International, 56(3):258–269.

Carlos Alario-Hoyos, Iria Estévez-Ayres, Mar Pérez-Sanagustín, Carlos Delgado Kloos, and Carmen Fernández-Panadero. 2017. Understanding Learners' Motivation and Learning Strategies in MOOCs. The International Review of Research in Open and Distributed Learning, 18(3).

M. Anzovino, E. Fersini, and P. Rosso. 2018. Automatic Identification and Classification of Misogynistic Language on Twitter. Lecture Notes in Computer Science, 10859:57–64.

Koray Balci and Albert Ali Salah. 2015. Automatic Analysis and Identification of Verbal Aggression and Abusive Behaviors for Online Social Games. Computers in Human Behavior, 53:517–526.

Jacqueline Baxter and Jo Haycock. 2014. Roles and Student Identities in Online Large Course Forums: Implications for Practice. The International Review of Research in Open and Distributed Learning, 15(1).

Denise Comer, Ryan Baker, and Yuan Wang. 2015. Negativity in Massive Online Open Courses: Impacts on Learning and Teaching and How Instructional Teams May Be Able to Address It. InSight: A Journal of Scholarly Teaching, 10:92–113.

Scott A. Crossley, Kristopher Kyle, and Danielle S. McNamara. 2016. The Tool for the Automatic Analysis of Text Cohesion (TAACO): Automatic Assessment of Local, Global, and Text Cohesion. Behavior Research Methods, 48(4):1227–1237.

Michel C. Desmarais and Ryan S. J. d. Baker. 2012. A Review of Recent Advances in Learner and Skill Modeling in Intelligent Learning Environments. User Modeling and User-Adapted Interaction, 22(1):9–38.

S. K. D'Mello and A. Graesser. 2012. Language and Discourse Are Powerful Signals of Student Emotions during Tutoring. IEEE Transactions on Learning Technologies, 5(4):304–317.

Nia M. Dowell, Oleksandra Skrypnyk, Srećko Joksimović, Arthur C. Graesser, Shane Dawson, Dragan Gašević, Thieme A. Hennis, Pieter de Vries, and Vitomir Kovanović. 2015. Modeling Learners' Social Centrality and Performance through Language and Discourse. In Proceedings of the 8th International Conference on Educational Data Mining (EDM 2015), Madrid, Spain.

Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, and Dinani Amorim. 2014. Do We Need Hundreds of Classifiers to Solve Real World Classification Problems? The Journal of Machine Learning Research, 15(1):3133–3181.

D. Randy Garrison and Zehra Akyol. 2013. The Community of Inquiry Theoretical Framework. In Handbook of Distance Education, pages 122–138. Routledge.

Brett Holfeld and Mark Grabe. 2012. Middle School Students' Perceptions of and Responses to Cyber Bullying. Journal of Educational Computing Research, 46(4):395–413.

László A. Jeni, Jeffrey F. Cohn, and Fernando De La Torre. 2013. Facing Imbalanced Data: Recommendations for the Use of Performance Metrics. In International Conference on Affective Computing and Intelligent Interaction (ACII 2013), pages 245–251.

Srećko Joksimović, Nia Dowell, Oleksandra Poquet, Vitomir Kovanović, Dragan Gašević, Shane Dawson, and Arthur C. Graesser. 2018a. Exploring Development of Social Capital in a cMOOC through Language and Discourse. The Internet and Higher Education, 36:54–64.

Srećko Joksimović, Dragan Gašević, Vitomir Kovanović, Olusola Adesope, and Marek Hatala. 2014. Psychological Characteristics in Cognitive Presence of Communities of Inquiry: A Linguistic Analysis of Online Discussions. The Internet and Higher Education, 22:1–10.

Srećko Joksimović, Areti Manataki, Dragan Gašević, Shane Dawson, Vitomir Kovanović, and Inés Friss de Kereki. 2016. Translating Network Position into Performance: Importance of Centrality in Different Network Configurations. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (LAK '16), pages 314–323, New York, NY, USA. ACM.

Srećko Joksimović, Oleksandra Poquet, Vitomir Kovanović, Nia Dowell, Caitlin Mills, Dragan Gašević, Shane Dawson, Arthur C. Graesser, and Christopher Brooks. 2018b. How Do We Model Learning at Scale? A Systematic Review of Research on MOOCs. Review of Educational Research, 88(1):43–86.

Vitomir Kovanović, Srećko Joksimović, Dragan Gašević, and Marek Hatala. 2014. Automated Cognitive Presence Detection in Online Discussion Transcripts. In Proceedings of the Workshops at the LAK 2014 Conference, Indianapolis, IN.

Vitomir Kovanović, Srećko Joksimović, Dragan Gašević, George Siemens, and Marek Hatala. 2015. What Public Media Reveals about MOOCs: A Systematic Analysis of News Reports. British Journal of Educational Technology, 46(3):510–527.

Sui Mak, Roy Williams, and Jenny Mackness. 2010. Blogs and Forums as Communication and Learning Tools in a MOOC. In Proceedings of the 7th International Conference on Networked Learning 2010, pages 275–285. University of Lancaster.

C. Nobata, J. Tetreault, A. Thomas, Y. Mehdad, and Y. Chang. 2016. Abusive Language Detection in Online User Content. In Proceedings of the 25th International World Wide Web Conference (WWW 2016), pages 145–153.

Reinhard Pekrun, Thomas Goetz, Wolfram Titz, and Raymond P. Perry. 2002. Academic Emotions in Students' Self-Regulated Learning and Achievement: A Program of Qualitative and Quantitative Research. Educational Psychologist, 37(2):91–105.

Oleksandra Poquet and Shane Dawson. 2016. Untangling MOOC Learner Networks. In Proceedings of the Sixth International Conference on Learning Analytics & Knowledge (LAK '16), pages 208–212. ACM.

Oleksandra Poquet, Nia Dowell, Christopher Brooks, and Shane Dawson. 2018. Are MOOC Forums Changing? In Proceedings of the 8th International Conference on Learning Analytics and Knowledge (LAK '18), pages 340–349, Sydney, New South Wales, Australia. ACM.

Carolyn Penstein Rosé and Oliver Ferschke. 2016. Technology Support for Discussion Based Learning: From Computer Supported Collaborative Learning to the Future of Massive Open Online Courses. International Journal of Artificial Intelligence in Education, 26(2):660–678.

Anna D. Rowe. 2017. Feelings about Feedback: The Role of Emotions in Assessment for Learning. In The Enabling Power of Assessment, pages 159–172. Springer.

Joni Salminen, Hind Almerekhi, Milica Milenković, Soon-gyo Jung, Jisun An, Haewoon Kwak, and Bernard Jansen. 2018. Anatomy of Online Hate: Developing a Taxonomy and Machine Learning Models for Identifying and Classifying Hate in Online News Media. In International AAAI Conference on Web and Social Media, pages 330–339.

Yla R. Tausczik and James W. Pennebaker. 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1):24–54.

Conrad Tucker, Barton K. Pursel, and Anna Divinsky. 2014. Mining Student-Generated Textual Data in MOOCs and Quantifying Their Effects on Student Performance and Learning Outcomes. In 2014 ASEE Annual Conference & Exposition, Indianapolis, Indiana. ASEE Conferences.

Alyssa Friend Wise and Yi Cui. 2018. Unpacking the Relationship between Discussion Forum Participation and Learning in MOOCs: Content Is Key. In Proceedings of the 8th International Conference on Learning Analytics and Knowledge (LAK '18), pages 330–339, Sydney, New South Wales, Australia. ACM.

Diyi Yang, Miaomiao Wen, Iris Howley, Robert Kraut, and Carolyn Rosé. 2015. Exploring the Effect of Confusion in Discussion Forums of Massive Open Online Courses. In Proceedings of the Second (2015) ACM Conference on Learning @ Scale (L@S '15), pages 121–130, New York, NY, USA. ACM.