
Emotion Classification in a Resource Constrained Language Using Transformer-based Approach

Avishek Das, Omar Sharif, Mohammed Moshiul Hoque and Iqbal H. Sarker
Department of Computer Science and Engineering
Chittagong University of Engineering & Technology, Chittagong-4349, Bangladesh
[email protected], {omar.sharif, moshiul_240, iqbal}@cuet.ac.bd

Abstract

Although research on emotion classification has progressed significantly in high-resource languages, it is still in its infancy for resource-constrained languages like Bengali. Moreover, the unavailability of necessary language processing tools and the deficiency of benchmark corpora make the emotion classification task in Bengali more challenging and complicated. This work proposes a transformer-based technique to classify Bengali text into one of six basic emotions: anger, disgust, fear, joy, sadness, and surprise. A Bengali emotion corpus consisting of 6243 texts is developed for the classification task. Experimentation is carried out using various machine learning (LR, RF, MNB, SVM), deep neural network (CNN, BiLSTM, CNN+BiLSTM) and transformer (Bangla-BERT, m-BERT, XLM-R) based approaches. Experimental outcomes indicate that XLM-R outdoes all other techniques by achieving the highest weighted f1-score of 69.73% on the test data. The dataset is publicly available at https://github.com/omar-sharif03/NAACL-SRW-2021.

1 Introduction

Classification of emotion in text signifies the task of automatically attributing an emotion category to a textual document selected from a set of predetermined categories. With the growing number of users on virtual platforms generating online content steadily at a fast pace, interpreting emotion or sentiment in online content is vital for consumers, enterprises, business leaders, and other parties concerned. Ekman (Ekman, 1993) defined six basic emotions: happiness, fear, anger, sadness, surprise, and disgust based on facial features. These primary types of emotions can also be extracted from text expressions (Alswaidan and Menai, 2020).

The availability of vast amounts of online data and the advancement of computational processes have accelerated the development of emotion classification in high-resource languages such as English, Arabic, Chinese, and French (Plaza del Arco et al., 2020). However, there is no notable progress in low-resource languages such as Bengali, Tamil and Turkish. The proliferation of the Internet and digital technology usage produces enormous textual data in the Bengali language. The analysis of these massive amounts of data to extract the underlying emotions is a challenging research issue in the realm of Bengali language processing (BLP). The complexity arises due to various limitations, such as the lack of BLP tools, scarcity of benchmark corpora, complicated language structure, and limited resources. By considering the constraints of emotion classification in the Bengali language, this work aims to contribute the following:

• Develop a Bengali emotion corpus consisting of 6243 text documents with manual annotation to classify each text into one of six emotion classes: anger, disgust, fear, joy, sadness, surprise.

• Investigate the performance of various ML, DNN and transformer-based approaches on the corpus.

• Propose a benchmark system to classify emotion in Bengali text with experimental validation on the corpus.

2 Related Work

Substantial research activities have been carried out on emotion analysis in high-resource languages like English, Arabic, and Chinese (Alswaidan and Menai, 2020). Multi-label, multi-target emotion detection of Arabic tweets was accomplished using decision trees, random forest, and KNN, where random forest provided the highest f1-score of 82.6% (Alzu'bi et al., 2019). Lai et al. (2020) proposed a graph convolution network architecture for emotion classification from Chinese microblogs, and their proposed system achieved an F-measure of 82.32%. Recently, a few works employed transformer-based models (i.e., BERT) to analyse emotion in texts. Huang et al. (2019) and Al-Omari et al. (2020) used a pre-trained BERT for embedding purposes on top of LSTM/BiLSTM to get improved f1-scores of 76.66% and 74.78%, respectively.

Although emotion analysis in limited-resource languages like Bengali is at a preliminary stage, a few studies have already been conducted on emotion analysis using ML and DNN methods. Irtiza Tripto and Eunus Ali (2018) proposed an LSTM-based approach to classify multi-label emotions from Bengali and English sentences. This system considered only YouTube comments and achieved 59.23% accuracy. Another work on emotion classification in Bengali text was carried out by Azmin and Dhar (2019) concerning three emotional labels (i.e., happy, sadness and anger). They used Multinomial Naïve Bayes, which outperformed other algorithms with an accuracy of 78.6%. Pal and Karn (2020) developed a logistic regression-based technique to classify four emotions (joy, anger, sorrow, suspense) in Bengali text and achieved 73% accuracy. Das and Bandyopadhyay (2009) conducted a study to identify emotions in Bengali blog texts. Their scheme attained 56.45% accuracy using the conditional random field. A recent work used SVM to classify six raw emotions on 1200 Bengali texts and obtained 73% accuracy (Ruposh and Hoque, 2019).

3 BEmoC: Bengali Emotion Corpus

Due to the unavailability of a standard corpus, we developed a corpus (hereafter called 'BEmoC') to classify emotion in Bengali text. The development procedure is adopted from the guidelines stated in Dash and Ramamoorthy (2019).

3.1 Data Collection and Preprocessing

Five human crawlers were assigned to accumulate data from various online/offline sources. They manually collected 6700 text documents over three months (September 10, 2020 to December 11, 2020). The crawlers accumulated data selectively, i.e., when a crawler found a text that supports the definition of any of the six emotion classes according to Ekman (1993), the content was collected, otherwise ignored. Raw accumulated data needs the following pre-processing before the annotation:

• Removal of non-Bengali words, punctuation, emoticons and duplicate data.

• Discarding data of fewer than three words to get an unerring emotional context.

After pre-processing, the corpus holds 6523 text documents. The processed texts are eligible for manual annotation. The details of the preprocessing modules can be found in the project repository: https://github.com/omar-sharif03/NAACL-SRW-2021/tree/main/Code%20Snippets
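The cleaning steps above are straightforward to reproduce. The sketch below is a minimal illustration (not the authors' released code) that keeps only Bengali tokens, removes duplicates, and discards texts shorter than three words; the regex-based tokenizer and the `preprocess` helper are our own assumptions.

```python
import re

# Bengali characters fall in the Unicode block U+0980-U+09FF.
BENGALI_TOKEN = re.compile(r"[\u0980-\u09FF]+")

def preprocess(texts):
    """Keep Bengali tokens only, drop duplicates and texts shorter than three words."""
    seen, cleaned = set(), []
    for text in texts:
        tokens = BENGALI_TOKEN.findall(text)      # strips punctuation, emoticons, non-Bengali words
        normalized = " ".join(tokens)
        if len(tokens) < 3 or normalized in seen:  # discard short or duplicate samples
            continue
        seen.add(normalized)
        cleaned.append(normalized)
    return cleaned
```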
3.2 Data Annotation and Quality

Five postgraduate students working on BLP were assigned for the initial annotation. To choose the initial label, a majority voting technique is applied (Magatti et al., 2009). Initial labels were scrutinized by an expert who has several years of research expertise in BLP. The expert corrected the labelling if any initial annotation was done inappropriately. The expert discarded 163 texts with neutral emotion and 117 texts with mixed emotions for the intelligibility of this research. To minimize bias during annotation, the expert finalized the labels through discussions and deliberations with the annotators (Sharif and Hoque, 2021). We evaluated the inter-annotator agreement to ensure the quality of the annotation using the coding reliability (Krippendorff, 2011) and Cohen's kappa (Cohen, 1960) scores. An inter-coder reliability of 93.1% with a Cohen's kappa score of 0.91 reflects the quality of the corpus.

3.3 Data Statistics

BEmoC contains a total of 6243 text documents after the preprocessing and annotation process. The amount of data included in BEmoC varies with the sources. For example, among online sources, Facebook contributed the highest amount (2796 texts) whereas YouTube (610 texts), blogs (483 texts), and news portals (270 texts) contributed smaller amounts. Offline sources contributed a total of 2084 texts, including storybooks (680 texts), novels (668 texts), and conversations (736 texts). Data are partitioned into a train set (4994 texts), a validation set (624 texts) and a test set (625 texts) to evaluate the models. Table 1 represents the amount of data in each class according to the train-validation-test split.

Class      Train   Validation   Test
Anger        621           67     71
Disgust     1233          155    165
Fear         700           89     83
Joy          908          120    114
Sadness      942          129    119
Surprise     590           64     73

Table 1: Number of instances in the train, validation, and test sets

Since the classifier models learn from the training set instances, we further analyzed this set. Table 2 shows several statistics of the training set.

Class      Total words   Unique words   Avg. words per text
Anger            14914           5852                 24.02
Disgust          27192           7212                 22.35
Fear             14766           5072                 21.09
Joy              20885           7346                 23.40
Sadness          22727           7398                 24.13
Surprise         13833           5675                 23.45
Total           114317          38555                     -

Table 2: Statistics of the train set of BEmoC

The sadness class contains the most unique words (7398), whereas the fear class contains the fewest (5072). On average, all the classes have more than 20 words per text document. However, a text document in the sadness class contained the maximum number of words (107), whereas a fear-class document contained the minimum number of words (4). Figure 1 represents the distribution of the number of texts versus text length for each class of the corpus. Investigating this figure reveals that most of the data have a length between 15 and 35 words. Interestingly, most of the texts of the disgust class have a length of less than 30, while the joy and sadness classes seem to have almost similar numbers of texts across all length distributions.

Figure 1: Corpus distribution concerning number of texts vs length of texts

For quantitative analysis, the Jaccard similarity among the classes has been computed. We used the 200 most frequent words from each emotion class, and the similarity values are reported in Table 3. The Anger-Disgust and Joy-Surprise pairs hold the highest similarities of 0.58 and 0.51, respectively. These scores indicate that more than 50% of the frequent words are common between these pairs of classes. On the other hand, the Joy-Fear pair has the lowest similarity index, which clarifies that this pair's frequent words are more distinct than those of other classes. These similarity issues can substantially affect the emotion classification task. Some sample instances of BEmoC are shown in Table 9 (Appendix B).
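The class-overlap analysis is easy to reproduce. The following sketch (our own illustration, assuming the training texts are grouped per label in a `class_texts` dictionary) computes the Jaccard similarity between the 200 most frequent words of each pair of classes, as reported in Table 3.

```python
from collections import Counter
from itertools import combinations

def top_words(texts, k=200):
    """Return the k most frequent words over all texts of one class."""
    counts = Counter(word for text in texts for word in text.split())
    return {word for word, _ in counts.most_common(k)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

def pairwise_jaccard(class_texts, k=200):
    # class_texts: dict mapping each emotion label to its list of training texts
    vocab = {label: top_words(texts, k) for label, texts in class_texts.items()}
    return {(c1, c2): round(jaccard(vocab[c1], vocab[c2]), 2)
            for c1, c2 in combinations(vocab, 2)}
```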
      C1    C2    C3    C4    C5    C6
C1  1.00  0.58  0.40  0.43  0.45  0.47
C2     -  1.00  0.41  0.45  0.47  0.44
C3     -     -  1.00  0.37  0.45  0.46
C4     -     -     -  1.00  0.47  0.51
C5     -     -     -     -  1.00  0.48
C6     -     -     -     -     -  1.00

Table 3: Jaccard similarity between the emotion class pairs: anger (C1), disgust (C2), fear (C3), joy (C4), sadness (C5), surprise (C6).

4 Methodology

Figure 2 shows an abstract view of the strategies used. Various feature extraction techniques such as TF-IDF, Word2Vec, and FastText are used to train the ML and DNN models. Moreover, we also investigate the emotion classification performance of transformer-based models on Bengali text. All the models are trained and tuned on the identical dataset.

Figure 2: Abstract process of emotion classification

4.1 Feature Extraction

ML and DNN algorithms are unable to learn from raw texts. Therefore, feature extraction is required to train the classifier models.

TF-IDF: Term frequency-inverse document frequency (TF-IDF) is a statistical measure that determines the importance of a word to a document in a collection of documents. Uni-gram and bi-gram features are extracted from the 20000 most frequent words of the corpus.

Word2Vec: It utilizes neural networks to find the semantic similarity of the context of the words in a corpus (Mikolov et al., 2013). We trained Word2Vec with Skip-Gram, a window size of 7, a minimum word count of 4, and an embedding dimension of 100.

FastText: This technique uses subword information to find the semantic relationships (Bojanowski et al., 2017). We trained FastText with Skip-Gram, character n-grams of length 5, a window size of 5, and an embedding dimension of 100.

For both Word2Vec and FastText, pre-trained vectors are available for the Bengali language trained on a generalized Bengali wiki dump (Sarker, 2021). We observed that the deep learning models perform better on vectors trained on our developed BEmoC rather than on the pre-trained vectors.

4.2 ML Approaches

We started our investigation of the emotion detection system with ML models. Logistic Regression (LR), Support Vector Machine (SVM), Random Forest (RF) and Multinomial Naive Bayes (MNB) techniques are employed using the TF-IDF text vectorizer. For LR, the 'lbfgs' solver and 'l1' penalty are chosen and the C value is set to 1. The same C value with a 'linear' kernel is used for SVM. Meanwhile, for RF 'n_estimators' is set to 100, and 'alpha=1.0' is chosen for MNB. A summary of the parameters chosen for the ML models is provided in Table 6 (Appendix A).
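To make the ML setup concrete, the sketch below shows one way to wire the TF-IDF features of Section 4.1 to the four classifiers with the hyperparameters reported above and in Table 6 (Appendix A). It is an illustrative reconstruction rather than the released code; names such as `run_ml_baselines` are our own, and the LR solver is switched to 'liblinear' because scikit-learn's 'lbfgs' does not support the l1 penalty.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import f1_score

# Uni-gram and bi-gram TF-IDF features over the 20000 most frequent words
vectorizer = TfidfVectorizer(ngram_range=(1, 2), max_features=20000)

classifiers = {
    # 'liblinear' is used here so the l1 penalty of Table 6 can actually be applied
    "LR": LogisticRegression(solver="liblinear", penalty="l1", C=1, max_iter=400),
    "SVM": SVC(kernel="linear", C=1, gamma="scale", tol=0.001, random_state=0),
    "RF": RandomForestClassifier(criterion="gini", n_estimators=100),
    "MNB": MultinomialNB(alpha=1.0),
}

def run_ml_baselines(train_texts, y_train, test_texts, y_test):
    X_train = vectorizer.fit_transform(train_texts)
    X_test = vectorizer.transform(test_texts)
    for name, clf in classifiers.items():
        clf.fit(X_train, y_train)
        score = f1_score(y_test, clf.predict(X_test), average="weighted")
        print(f"{name}: weighted F1 = {score:.4f}")
```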

4.3 DNN Approaches

Variations of deep neural networks (DNN) such as CNN, BiLSTM and a combination of CNN and BiLSTM (CNN+BiLSTM) are investigated for the emotion classification task in Bengali. To train all the DNN models, the 'adam' optimizer with a learning rate of 0.001 and a batch size of 16 is used for 35 epochs, and 'sparse categorical crossentropy' is selected as the loss function.

CNN: A Convolutional Neural Network (CNN) (LeCun et al., 2015) is tuned over the emotion corpus. The trained weights from the Word2Vec/FastText embeddings are fed to the embedding layer to generate a sequence matrix. The sequence matrix is then passed to the convolution layer having 64 filters of size 7. The convolution layer's output is max-pooled over time and then transferred to a fully connected layer with 64 neurons. 'ReLU' activation is used in the corresponding layers. Finally, an output layer with softmax activation is used to compute the probability distribution of the classes.

BiLSTM: Bidirectional Long Short-Term Memory (BiLSTM) (Hochreiter and Schmidhuber, 1997) is a variation of the recurrent neural network (RNN). The developed BiLSTM network consists of an embedding layer similar to CNN, a BiLSTM layer with 32 hidden units, and a fully connected layer having 16 neurons with 'ReLU' activation. An output layer with 'softmax' activation is used.

CNN+BiLSTM: An embedding layer followed by a 1D convolutional layer with 64 filters of size three and a 1D max-pool layer is employed on top of two BiLSTM layers with 64 and 32 units. Outputs of the BiLSTM layer are fed to an output layer with 'softmax' activation.

Table 7 (Appendix A) illustrates the details of the hyperparameters used in the DNN models.
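As a concrete reading of the CNN and BiLSTM descriptions above, the following Keras sketch builds both architectures with the hyperparameters of Table 7 (embedding dimension 100, 64 filters of size 7, 32 BiLSTM units, etc.). It is an illustrative reconstruction under our own assumptions (e.g., the `build_embedding_matrix` helper that trains Word2Vec with gensim on BEmoC), not the authors' exact code.

```python
import numpy as np
from gensim.models import Word2Vec
from tensorflow import keras
from tensorflow.keras import layers

EMB_DIM, NUM_CLASSES = 100, 6

def build_embedding_matrix(tokenized_texts, word_index):
    """Train Word2Vec on BEmoC (skip-gram, window 7, min count 4, 100 dimensions)."""
    w2v = Word2Vec(tokenized_texts, vector_size=EMB_DIM, sg=1, window=7, min_count=4)
    matrix = np.zeros((len(word_index) + 1, EMB_DIM))
    for word, idx in word_index.items():
        if word in w2v.wv:
            matrix[idx] = w2v.wv[word]
    return matrix

def embedding_layer(matrix):
    # Embedding layer initialized with the Word2Vec (or FastText) vectors
    return layers.Embedding(matrix.shape[0], EMB_DIM,
                            embeddings_initializer=keras.initializers.Constant(matrix))

def build_cnn(matrix):
    return keras.Sequential([
        embedding_layer(matrix),
        layers.Conv1D(64, 7, activation="relu"),   # 64 filters of size 7
        layers.GlobalMaxPooling1D(),               # max-pooling over time
        layers.Dense(64, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

def build_bilstm(matrix):
    return keras.Sequential([
        embedding_layer(matrix),
        layers.Bidirectional(layers.LSTM(32)),     # BiLSTM with 32 hidden units
        layers.Dense(16, activation="relu"),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

# Training setup stated in Section 4.3:
# model.compile(optimizer=keras.optimizers.Adam(1e-3),
#               loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, validation_data=(X_val, y_val), batch_size=16, epochs=35)
```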
4.4 Transformer Models

We used three transformer models on BEmoC: m-BERT, Bangla-BERT, and XLM-R. In recent years, transformers have been used extensively for classification tasks to achieve state-of-the-art results (Chen et al., 2021). The models are culled from the Huggingface transformers library (https://huggingface.co/transformers/) and fine-tuned on the emotion corpus by using the ktrain (Maiya, 2020) package.

m-BERT: m-BERT (Devlin et al., 2019) is a transformer model pre-trained over 104 languages with more than 110M parameters. We employed the 'bert-base-multilingual-cased' model and fine-tuned it on BEmoC with a batch size of 12.

Bangla-BERT: Bangla-BERT (Sarker, 2020) is a pre-trained BERT masked language model trained on a sizeable Bengali corpus. We used the 'sagorsarker/bangla-bert-base' model and fine-tuned it to update the pre-trained model fitted for BEmoC. A batch size of 16 is used to provide better results.

XLM-R: XLM-R (Liu et al., 2019) is a sizeable multilingual language model trained on 100 different languages. We implemented the 'xlm-roberta-base' model on BEmoC with a batch size of 12.

All the transformer models have been trained for 20 epochs with a learning rate of 2e-5. By using checkpoints, the best intermediate model is stored to predict on the test data. Table 8 (Appendix A) shows the list of parameters used for the transformer models.
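The ktrain-based fine-tuning loop is compact; the sketch below shows a plausible reconstruction for XLM-R with the hyperparameters of Table 8 (max sequence length 70, learning rate 2e-5, 20 epochs, 'autofit', batch size 12). Variable names such as `train_texts`/`train_labels` are our own assumptions; swapping `MODEL_NAME` reproduces the m-BERT and Bangla-BERT runs (with a batch size of 16 for the latter).

```python
import ktrain
from ktrain import text

MODEL_NAME = "xlm-roberta-base"   # or "bert-base-multilingual-cased", "sagorsarker/bangla-bert-base"
class_names = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

t = text.Transformer(MODEL_NAME, maxlen=70, class_names=class_names)
trn = t.preprocess_train(train_texts, train_labels)
val = t.preprocess_test(val_texts, val_labels)

model = t.get_classifier()
learner = ktrain.get_learner(model, train_data=trn, val_data=val, batch_size=12)
learner.autofit(2e-5, 20, checkpoint_folder="checkpoints")   # lr 2e-5, up to 20 epochs

predictor = ktrain.get_predictor(learner.model, preproc=t)
predictions = predictor.predict(test_texts)
```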

5 Results and Analysis

This section presents a comprehensive performance analysis of the various ML, DNN, and transformer-based models for classifying emotion in Bengali texts. The superiority of the models is determined based on the weighted f1-score. However, the precision (Pr), recall (Re) and accuracy (Acc) metrics are also considered. Table 4 reports the evaluation results of all models.

Among the ML approaches, LR achieved a higher f1-score (60.75%) than RF (52.78%), MNB (48.67%) and SVM (59.54%). LR also performed better in Pr, Re and Acc than the other ML models. Among the DNN models, BiLSTM with FastText outperformed the other approaches on all evaluation parameters, achieving an f1-score of 56.94%. However, BiLSTM (FastText) achieved an f1-score about 4% lower than the best ML method (i.e., LR).

After employing the transformer-based models, we observed a significant increase in all scores. Among the transformer-based models, Bangla-BERT achieved the lowest f1-score of 61.91%. However, this model still outperformed the best ML and DNN approaches (56.94% for BiLSTM (FastText) and 60.75% for LR). Meanwhile, m-BERT shows an almost 3% higher f1-score (64.39%) than Bangla-BERT (61.91%). The XLM-R model shows an immense improvement of about 6% compared to Bangla-BERT and 5% compared to m-BERT, respectively. It achieved an f1-score of 69.73%, which is the highest among all models.

Method        Classifier                Pr(%)   Re(%)   F1(%)   Acc(%)
ML models     LR                        61.07   60.64   60.75   60.64
              RF                        55.91   54.72   52.78   54.72
              MNB                       60.23   54.08   48.67   54.08
              SVM                       61.12   60.10   59.54   60.00
DNN models    CNN (Word2Vec)            53.20   52.12   51.84   52.12
              CNN (FastText)            54.54   53.45   52.52   53.48
              BiLSTM (Word2Vec)         56.81   55.78   53.45   57.12
              BiLSTM (FastText)         57.30   58.08   56.94   58.08
              CNN+BiLSTM (Word2Vec)     56.48   56.64   56.39   56.64
              CNN+BiLSTM (FastText)     55.74   55.68   55.41   55.68
Transformers  Bangla-BERT               62.08   62.24   61.91   62.24
              m-BERT                    64.62   64.64   64.39   64.63
              XLM-R                     70.11   69.61   69.73   69.61

Table 4: Comparison of various approaches on the test set. Here Acc, Pr, Re, F1 denote accuracy, weighted precision, recall, and f1-score.
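For reference, the weighted scores reported in Table 4 can be computed from the test-set predictions as in the small illustrative snippet below (assuming `y_test` and `y_pred` label arrays).

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_test, y_pred):
    pr, re, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="weighted")
    acc = accuracy_score(y_test, y_pred)
    print(f"Pr={pr:.4f}  Re={re:.4f}  F1={f1:.4f}  Acc={acc:.4f}")
```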

5.1 Error Analysis

It is evident from Table 4 that XLM-R is the best performing model for classifying emotion in Bengali texts. A detailed error analysis is therefore performed using the confusion matrix. Figure 3 illustrates the class-wise proportion of predicted labels.

Figure 3: Confusion matrix of the XLM-R model

It is observed from the matrix that some data are classified wrongly. For example, 8 instances out of 71 of the anger class are predicted as disgust. In the fear class, 6 instances out of 83 are mistakenly classified as sadness. In the sadness class, the misclassification ratio is the highest (15.13%): 18 instances out of 119 are misclassified as disgust. Moreover, among 73 instances of the surprise class, 9 are predicted as sadness. The error analysis reveals that the fear class achieved the highest rate of correct classification (77.15%) while surprise gained the lowest (61.64%).

The possible reason for the incorrect predictions might be the class-imbalanced nature of the corpus. However, the high values of the Jaccard similarity (Table 3) also reveal some interesting points. A few words are used multi-purposely in multiple classes. For instance, hate words can be used to express both anger and disgust. Moreover, emotion classification is highly subjective, depends on the individual's perception, and people may contemplate a sentence in many ways (LeDoux and Hofmann, 2018). Thus, by developing a balanced dataset with more diverse data, incorrect predictions might be reduced to some extent.
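The per-class figures above come straight from a confusion matrix; a minimal sketch (our own illustration, assuming test labels and XLM-R predictions as string arrays) is shown below.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

labels = ["anger", "disgust", "fear", "joy", "sadness", "surprise"]

def per_class_rates(y_test, y_pred):
    cm = confusion_matrix(y_test, y_pred, labels=labels)
    correct = np.diag(cm) / cm.sum(axis=1)     # correct-classification rate per class
    for name, rate in zip(labels, correct):
        print(f"{name}: {rate:.2%}")
    return cm
```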
5.2 Comparison with Recent Works

The analysis of the results revealed that XLM-R is the best model for classifying emotion in Bengali texts. Thus, we compare the performance of XLM-R with the existing techniques to assess its effectiveness. We implemented previous methods (Irtiza Tripto and Eunus Ali, 2018; Azmin and Dhar, 2019; Pal and Karn, 2020; Ruposh and Hoque, 2019) on BEmoC and report the outcomes in terms of f1-score. Table 5 shows a summary of the comparison. The results show that XLM-R outperformed the past techniques by achieving the highest f1-score (69.73%).

Methods                                               F1(%)
Word2Vec + LSTM (Irtiza Tripto and Eunus Ali, 2018)   53.54
TF-IDF + MNB (Azmin and Dhar, 2019)                   48.67
TF-IDF + LR (Pal and Karn, 2020)                      60.75
BOW + SVM (Ruposh and Hoque, 2019)                    59.17
XLM-R (Proposed)                                      69.73

Table 5: Performance comparison. Here F1 denotes weighted f1-score.

6 Conclusion

This paper investigated various ML, DNN and transformer-based techniques to classify emotion in Bengali texts. Due to the scarcity of a benchmark corpus, we developed a corpus (i.e., BEmoC) containing 6243 Bengali texts labelled with six basic emotion classes. A Cohen's kappa score of 0.91 reflects the quality of the corpus. Performance analysis on BEmoC illustrated that XLM-R, a transformer model, provided a superior result among all the methods. Specifically, XLM-R achieved the highest weighted f1-score (69.73%) and accuracy (69.61%), an accuracy improvement of 8.97% over the best ML model (LR) and 11.53% over the best DNN model (BiLSTM with FastText). Although XLM-R exhibited the highest scores, other techniques (such as an ensemble of the transformer models) can also be investigated to enhance performance. Additional emotion categories (such as hate) can also be included for better generalization. Moreover, transformer-based models can be investigated further by extending the corpus with texts containing sarcasm or irony, comparison, and mixed emotions.

Acknowledgements

We sincerely acknowledge the anonymous reviewers and the pre-submission mentor for their insightful suggestions, which helped improve the work. This work was supported by the Directorate of Research & Extension, CUET.

References

H. Al-Omari, M. A. Abdullah, and S. Shaikh. 2020. EmoDet2: Emotion detection in English textual dialogue using BERT and BiLSTM models. In 2020 11th International Conference on Information and Communication Systems (ICICS), pages 226–232.

Nourah Alswaidan and Mohamed El Bachir Menai. 2020. A survey of state-of-the-art approaches for emotion recognition in text. Knowledge and Information Systems, pages 1–51.

S. Alzu'bi, O. Badarneh, B. Hawashin, M. Al-Ayyoub, N. Alhindawi, and Y. Jararweh. 2019. Multi-label emotion classification for Arabic tweets. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS), pages 499–504.

S. Azmin and K. Dhar. 2019. Emotion detection from Bangla text corpus using Naïve Bayes classifier. In 2019 4th International Conference on Electrical Information and Communication Technology (EICT), pages 1–5.

Piotr Bojanowski, Edouard Grave, Armand Joulin, and Tomas Mikolov. 2017. Enriching word vectors with subword information. Transactions of the Association for Computational Linguistics, 5:135–146.

Ben Chen, Bin Chen, Dehong Gao, Qijin Chen, Chengfu Huo, Xiaonan Meng, Weijun Ren, and Yang Zhou. 2021. Transformer-based language model fine-tuning methods for COVID-19 fake news detection. arXiv preprint arXiv:2101.05509.

Jacob Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Dipankar Das and Sivaji Bandyopadhyay. 2009. Word to sentence level emotion tagging for Bengali blogs. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 149–152.

Niladri Sekhar Dash and L. Ramamoorthy. 2019. Utility and Application of Language Corpora. Springer.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding.

Paul Ekman. 1993. Facial expression and emotion. American Psychologist, 48(4):384.

Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation, 9(8):1735–1780.

Chenyang Huang, Amine Trabelsi, and Osmar R. Zaïane. 2019. ANA at SemEval-2019 task 3: Contextual emotion detection in conversations through hierarchical LSTMs and BERT. CoRR, abs/1904.00132.

N. Irtiza Tripto and M. Eunus Ali. 2018. Detecting multilabel sentiment and emotions from Bangla YouTube comments. In 2018 International Conference on Bangla Speech and Language Processing (ICBSLP), pages 1–6.

Klaus Krippendorff. 2011. Agreement and information in the reliability of coding. Communication Methods and Measures, 5(2):93–112.
Yuni Lai, Linfeng Zhang, Donghong Han, Rui Zhou, and Guoren Wang. 2020. Fine-grained emotion classification of Chinese microblogs based on graph convolution networks. World Wide Web, 23(5):2771–2787.

Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. 2015. Deep learning. Nature, 521(7553):436–444.

Joseph E. LeDoux and Stefan G. Hofmann. 2018. The subjective experience of emotion: a fearful view. Current Opinion in Behavioral Sciences, 19:67–72. Emotion-cognition interactions.

Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A robustly optimized BERT pretraining approach. CoRR, abs/1907.11692.

D. Magatti, S. Calegari, D. Ciucci, and F. Stella. 2009. Automatic labeling of topics. In 2009 Ninth International Conference on Intelligent Systems Design and Applications, pages 1227–1232.

Arun S. Maiya. 2020. ktrain: A low-code library for augmented machine learning. arXiv preprint arXiv:2004.10703.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781.

Aditya Pal and Bhaskar Karn. 2020. Anubhuti–an annotated dataset for emotional analysis of Bengali short stories. arXiv preprint arXiv:2010.03065.

Flor Miriam Plaza del Arco, Carlo Strapparava, L. Alfonso Urena Lopez, and Maite Martin. 2020. EmoEvent: A multilingual emotion corpus based on different events. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 1492–1498, Marseille, France. European Language Resources Association.

H. A. Ruposh and M. M. Hoque. 2019. A computational approach of recognizing emotion from Bengali texts. In 2019 5th International Conference on Advances in Electrical Engineering (ICAEE), pages 570–574.

Sagor Sarker. 2020. BanglaBERT: Bengali mask language model for Bengali language understanding.

Sagor Sarker. 2021. BNLP: Natural language processing toolkit for Bengali language. arXiv preprint arXiv:2102.00405.

Omar Sharif and Mohammed Moshiul Hoque. 2021. Identification and classification of textual aggression in social media: Resource creation and evaluation. In Combating Online Hostile Posts in Regional Languages during Emergency Situation, pages 9–20, Cham. Springer International Publishing.

Appendices

A Model Hyperparameters

Classifier   Parameters
LR           solver = 'lbfgs', max_iter = 400, penalty = 'l1', C = 1
SVM          kernel = 'linear', random_state = 0, gamma = 'scale', tol = 0.001
RF           criterion = 'gini', n_estimators = 100
MNB          alpha = 1.0, fit_prior = True, class_prior = None

Table 6: Optimized parameters for ML models

Hyperparameters          Hyperparameter Space                                   CNN      BiLSTM   CNN+BiLSTM
Filter Size              3, 5, 7, 9                                             7        -        3
Pooling Type             'max', 'average'                                       'max'    -        'max'
Embedding Dimension      30, 35, 50, 70, 90, 100, 150, 200, 250, 300            100      100      100
Number of Units          16, 32, 64, 128, 256                                   64       32       64, 64, 32
Neurons in Dense Layer   16, 32, 64, 128, 256                                   64       16       -
Batch Size               16, 32, 64, 128, 256                                   16       16       16
Activation Function      'relu', 'tanh', 'softplus', 'sigmoid'                  'relu'   'relu'   'relu'
Optimizer                'RMSprop', 'Adam', 'SGD', 'Adamax'                     'Adam'   'Adam'   'Adam'
Learning Rate            0.5, 0.1, 0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001     0.001    0.001    0.001

Table 7: Hyperparameters for DNN methods

Hyperparameter        Value
Fit method            'autofit'
Learning rate         2e-5
Epochs                20
Batch size            12, 16
Max sequence length   70

Table 8: Optimized hyperparameters for transformers models