
A Short Survey on Riot Prediction
Koushik Deb, Arpita Konar, Kousheya Das, Mrityunjoy Sen

To cite this version:

Koushik Deb, Arpita Konar, Kousheya Das, Mrityunjoy Sen. A Short Survey on Riot Prediction. 2021. hal-03245310

HAL Id: hal-03245310 https://hal.archives-ouvertes.fr/hal-03245310 Preprint submitted on 1 Jun 2021


Koushik Deb, Arpita Konar, Kousheya Das, Mrityunjoy Sen
Institute of Engineering and Management

ABSTRACT
There is increasing demand for real-world event detection using publicly accessible data from social platforms such as Twitter, YouTube, and Facebook. Applying social media intelligence to detect and identify threats oriented toward civil or secular unrest has become an important field of research in recent years. Several researchers have proposed integrated event detection frameworks, which mainly consist of data collection, pre-processing, and classification, and they report accuracies obtained on their respective training data. The frameworks proposed to address such civil agitation use techniques such as semi-supervised machine learning, neural networks, decision trees, and Naive Bayes.

1. INTRODUCTION
Research over the past 10 years has revealed the increasingly important role of social media data in situations of secular unrest. With the rapid growth of web-enabled communication technology, social media has become widespread; Wikipedia broadly defines it as interaction among people who exchange information and ideas on virtual platforms. One approach to addressing secular turmoil and civil agitation is event detection [1]. Many researchers have based their frameworks on supervised machine learning algorithms and have used NLP feature extraction to detect smaller sub-events [2].

[Figure: Text classification method. Stages: text-based dataset, data pre-processing, text archives labelling, feature selection, classifiers, performance evaluation.]

Data Gathering: This is the initial stage of the project cycle, in which various kinds of documents, such as social media comments and posts, are collected in one place.

Data Pre-processing: This step converts the text documents into a suitable word representation. The documents prepared for the next stage of text classification are described by a large number of features. Common methods are:
Word stemming: stemming reduces the various forms of a word to their root form, for example 'association' to 'associate' and 'processing' to 'process'.
Stopword elimination: stopwords such as "will", "to", and "but" occur frequently but carry little meaning, so these irrelevant words are removed.

Document Labelling: Labelling the documents as "positive" or "negative" is primarily performed manually. This is a tedious job requiring great care and attention. The operation is often supported by code written by the programmer.

Feature Selection: After pre-processing and labelling, the next significant step of text classification is to construct the vector space, which improves the scalability, efficiency, and accuracy of a text classifier. The principle of feature selection (FS) is to choose a subset of features from the original documents. FS is performed by keeping the words with the highest scores according to a predetermined measure of word importance. A notable issue in text classification is the high dimensionality of the feature space. Many feature evaluation metrics are well known, among them information gain (IG), chi-square, and odds ratio. Feature selection based on association word mining has been reported to be more effective than IG. To address the high-dimensionality problem, a newer FS method based on genetic algorithm (GA) optimization has also been introduced.
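The pre-processing and feature-selection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the surveyed authors' implementation: the stopword list, the helper names (`crude_stem`, `preprocess`, `chi_square`, `select_features`), and the toy documents are hypothetical, and `crude_stem` is a naive suffix stripper rather than a real stemming algorithm. Terms are scored with the chi-square statistic of a 2x2 term/class contingency table, and the top k terms are kept.

```python
import re

# Hypothetical, deliberately tiny stopword list; real systems use much larger ones.
STOPWORDS = {"will", "to", "but", "the", "a", "and", "in", "with", "of"}

# Hypothetical suffix list for a naive suffix stripper (NOT a real stemmer).
SUFFIXES = ("ation", "ing", "ed", "s")


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Lowercase, tokenize on letters, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


def chi_square(docs: list[set[str]], labels: list[str], term: str, cls: str) -> float:
    """Chi-square statistic of the 2x2 table (term present/absent x in/out of cls)."""
    a = sum(1 for d, y in zip(docs, labels) if term in d and y == cls)
    b = sum(1 for d, y in zip(docs, labels) if term in d and y != cls)
    c = sum(1 for d, y in zip(docs, labels) if term not in d and y == cls)
    e = sum(1 for d, y in zip(docs, labels) if term not in d and y != cls)
    n = a + b + c + e
    denom = (a + c) * (b + e) * (a + b) * (c + e)
    return n * (a * e - c * b) ** 2 / denom if denom else 0.0


def select_features(texts: list[str], labels: list[str], cls: str, k: int) -> list[str]:
    """Keep the k terms with the highest chi-square score (ties broken alphabetically)."""
    docs = [set(preprocess(t)) for t in texts]
    vocab = set().union(*docs)
    ranked = sorted(vocab, key=lambda t: (-chi_square(docs, labels, t, cls), t))
    return ranked[:k]


# Toy labelled documents (hypothetical).
texts = [
    "Protest will spread to the city",
    "Protest gathering in the square",
    "Great weather and a nice picnic",
    "Cooking dinner with friends",
]
labels = ["positive", "positive", "negative", "negative"]

print(preprocess("Processing the association"))  # ['process', 'associ']
print(select_features(texts, labels, "positive", k=2))  # ['protest', 'city']
```

"protest" ranks first because it occurs in every positive document and in no negative one; the remaining terms each occur once and tie. The stems are truncated roots rather than dictionary words, which is typical of suffix stripping.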
Classification: Documents can be classified automatically in three ways: unsupervised, semi-supervised, and supervised techniques. In recent years, automatic text classification has been studied extensively and has progressed rapidly, including machine learning approaches such as the Bayesian classifier, decision tree, K-nearest neighbour (KNN), support vector machines (SVMs), and neural networks. The various methods are as follows:

1.1. The KNN classifier is a case-based learning algorithm that relies on a distance or similarity function between pairs of observations, such as the Euclidean distance or cosine similarity. It is effective, non-parametric, and easy to implement; however, its classification time is long, and it is hard to find the optimal value of k. In general, larger values of k reduce the effect of noise on classification but make the boundaries between classes less distinct. A good k can be chosen by various heuristic techniques; to overcome this drawback, traditional KNN has also been modified to use different values of k for different classes instead of a single fixed value for all classes. Fang attempted to improve the performance of KNN by using weighted KNN (WKNN).
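The KNN procedure described in 1.1 can be sketched as follows, assuming documents are already tokenized and represented as raw term-count vectors; the function names and the toy training set are hypothetical. Similarity is cosine similarity, and the label is decided by a majority vote among the k most similar training documents. This uses a single fixed k, not the per-class k or WKNN variants mentioned above.

```python
import math
from collections import Counter


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def knn_classify(train: list[tuple[list[str], str]], query: list[str], k: int = 3) -> str:
    """Majority vote among the k training documents most similar to the query."""
    q = Counter(query)
    ranked = sorted(train, key=lambda ex: cosine(Counter(ex[0]), q), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical tokenized training documents with labels.
train = [
    (["riot", "police", "clash"], "unrest"),
    (["protest", "march", "police"], "unrest"),
    (["riot", "fire", "street"], "unrest"),
    (["football", "match", "goal"], "normal"),
    (["music", "concert", "crowd"], "normal"),
]

print(knn_classify(train, ["police", "riot", "crowd"], k=3))  # unrest
print(knn_classify(train, ["football", "goal"], k=1))  # normal
```

Every query is compared against every training document, which illustrates why classification time is the main cost of KNN.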

1.2. The Naïve Bayes method is a probabilistic classifier based on known prior probabilities and class-conditional probabilities. Its basic idea is to calculate the probability that document D belongs to class C. Two event models are available for Naive Bayes: the multivariate Bernoulli model and the multinomial model. Of these, the multinomial model is more suitable when the dataset is large, but two serious problems have been identified with it: first, its parameter estimation is rough, and second, it handles poorly the rare categories that contain only a few training documents. To address these, a Poisson model for NB text classification has been proposed, together with a weight-enhancing method that improves performance on rare classes. A modified NB has also been proposed to improve text classification performance, along with ways to improve naive Bayes classification by searching for dependencies among attributes. Naive Bayes is simple to implement and computationally cheap [4]; it is therefore also used at the pre-processing stage, for example for vectorization. The performance of Naive Bayes is poor when features are highly correlated, and it is highly sensitive to feature selection; therefore, two metrics have been proposed for NB that are applied to multiclass text documents.
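The multinomial event model from 1.2 can be sketched as below. It is a minimal illustration, assuming tokenized documents and add-one (Laplace) smoothing, which is an assumption of this sketch rather than something the survey specifies; the function names and data are hypothetical. It scores each class with log P(c) plus the sum of log P(w | c) over the document's words, and works in log space to avoid underflow.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs: list[list[str]], labels: list[str]):
    """Estimate log priors and Laplace-smoothed log P(word | class)."""
    vocab = {w for d in docs for w in d}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, c in zip(docs, labels):
        word_counts[c].update(doc)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        log_lik[c] = {
            w: math.log((word_counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return log_prior, log_lik, vocab


def predict_nb(model, doc: list[str]) -> str:
    """Pick the class maximising log P(c) + sum of log P(w | c); unseen words are skipped."""
    log_prior, log_lik, vocab = model
    scores = {
        c: log_prior[c] + sum(log_lik[c][w] for w in doc if w in vocab) for c in log_prior
    }
    return max(scores, key=scores.get)


# Hypothetical tokenized training data.
docs = [["riot", "police", "riot"], ["protest", "police"], ["concert", "music"], ["match", "goal", "music"]]
labels = ["unrest", "unrest", "normal", "normal"]
model = train_multinomial_nb(docs, labels)

print(predict_nb(model, ["riot", "protest"]))  # unrest
print(predict_nb(model, ["music", "goal"]))  # normal
```

The smoothing term keeps a word that never occurs in a class from driving that class's probability to zero, a simple form of the rare-category issue mentioned above.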

1.3. A decision tree is used for text classification with its internal nodes labelled by terms, the branches departing from them labelled by tests on the term weight, and its leaf nodes representing the corresponding class labels. The tree classifies a document by running it through the query structure from the root until it reaches a leaf, which represents the classification of the document. When most of the training data does not fit in memory, decision tree construction becomes inefficient because training tuples must be swapped in and out. To deal with this issue, a method that can handle both numeric and categorical data has been presented [7]. A new method, FDT, has been proposed to handle multi-label documents while reducing the cost of induction, and a decision-tree-based symbolic rule induction system for text classification has been introduced that also improves text classification. The decision tree method stands out from other decision support tools owing to advantages such as ease of interpretation and understanding, even for non-expert users. For this reason, it is used in certain applications [8].
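The classification walk described in 1.3 (internal nodes labelled by terms, branches testing the term weight, leaves holding class labels) can be sketched as follows. The tree is a hypothetical hand-built example, and the threshold test on a single weight is an assumption of this sketch; learning the tree from data (for example, by choosing splits with information gain) is not shown.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class label stored at a leaf


@dataclass
class Node:
    term: str         # internal node labelled by a term
    threshold: float  # branch test on that term's weight
    low: "Tree"       # followed when weight <= threshold
    high: "Tree"      # followed when weight > threshold


Tree = Union[Leaf, Node]


def classify(tree: Tree, weights: dict[str, float]) -> str:
    """Walk from the root, testing term weights, until a leaf is reached."""
    node = tree
    while isinstance(node, Node):
        weight = weights.get(node.term, 0.0)
        node = node.high if weight > node.threshold else node.low
    return node.label


# Hypothetical hand-built tree (not learned from data).
tree = Node("riot", 0.0,
            low=Node("protest", 0.5, low=Leaf("normal"), high=Leaf("unrest")),
            high=Leaf("unrest"))

print(classify(tree, {"riot": 0.3}))     # unrest
print(classify(tree, {"protest": 0.2}))  # normal
print(classify(tree, {"protest": 0.9}))  # unrest
```

Because each decision is a readable term test, the path from root to leaf doubles as an explanation of the result, which is the interpretability advantage noted above.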

1.4. The application of the support vector machine (SVM) technique to text classification has been proposed. SVM requires both a positive and a negative training set, which is unusual among classification methods. Both sets are needed for the SVM to find the decision surface, the hyperplane, that best separates the positive from the negative data in n-dimensional space [9]. SVM stands out for its effectiveness: text classification performance has been improved by combining HMM and SVM, where HMMs are used as a feature extractor and the resulting feature vector is normalized as the input to the SVMs, so that the trained SVMs can classify unknown texts successfully. SVM has also been combined with Bayes methods to reduce the number of features, which reduces the number of dimensions [10]. SVM is also considered more capable of solving multi-label classification problems.
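The idea in 1.4 of searching for a hyperplane that separates positive from negative examples can be sketched with a linear SVM trained by stochastic sub-gradient descent on the hinge loss (a Pegasos-style update, without the optional projection step). The solver is an assumption of this sketch, not the method of the surveyed papers; the function names, hyperparameters, and toy data are hypothetical. Labels must be +1 (positive set) or -1 (negative set), and a constant feature is appended so the bias is learned as part of the weight vector.

```python
def train_linear_svm(xs: list[list[float]], ys: list[int], lam: float = 0.01, epochs: int = 200) -> list[float]:
    """Pegasos-style sub-gradient descent on the regularized hinge loss.

    ys must be +1 (positive set) or -1 (negative set). A constant 1.0 feature is
    appended so the bias is learned as the last weight (and is also regularized).
    """
    data = [(x + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        for x, y in data:  # fixed order keeps the sketch deterministic
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            w = [(1 - eta * lam) * wi for wi in w]  # shrink: regularization step
            if margin < 1:  # hinge loss is active: move toward this example
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict_svm(w: list[float], x: list[float]) -> int:
    """Side of the hyperplane w . [x, 1] = 0 that x falls on."""
    score = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if score >= 0 else -1


# Hypothetical 2-D feature vectors (e.g., weights of two terms).
xs = [[2.0, 0.0], [3.0, 1.0], [0.0, 2.0], [1.0, 3.0]]
ys = [1, 1, -1, -1]
w = train_linear_svm(xs, ys)

print([predict_svm(w, x) for x in xs])  # [1, 1, -1, -1]
```

For real text data, a library implementation of a linear SVM would normally be preferred over this hand-rolled loop.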

1.5. A neural network classifier is a network of units in which the input units usually represent terms and the output unit(s) represent the category. To classify a test document, its term weights are assigned to the input units; the activation of these units is propagated forward through the network, and the value that the output unit(s) take as a result determines the classification decision. Some studies use the single-layer perceptron because it is simple to implement. The multi-layer perceptron, which is more sophisticated, is also widely used for classification tasks. Models using a modified back-propagation neural network (MBPNN) and a standard back-propagation neural network (BPNN) have been reported for document classification [11]. An efficient feature selection method is used to reduce dimensionality as well as improve performance. A new neural-network-based document classification method has also been introduced that helps organizations manage patent documents more effectively.
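A single-layer perceptron of the kind described in 1.5 can be sketched as follows: term weights feed the input units, and the sign of the single output unit's activation is the classification decision. This is the classic perceptron learning rule, not the MBPNN or BPNN models cited above; the function names, the feature order (weights of the hypothetical terms "riot", "police", "concert", "goal"), and the data are assumptions of this sketch.

```python
def perceptron_train(xs: list[list[float]], ys: list[int], epochs: int = 20, lr: float = 1.0) -> list[float]:
    """Classic perceptron rule; ys in {+1, -1}; the bias is stored as the last weight."""
    w = [0.0] * (len(xs[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            xb = x + [1.0]  # input units plus a constant bias input
            activation = sum(wi * xi for wi, xi in zip(w, xb))
            pred = 1 if activation > 0 else -1
            if pred != y:  # update weights only on a misclassification
                w = [wi + lr * y * xi for wi, xi in zip(w, xb)]
    return w


def perceptron_predict(w: list[float], x: list[float]) -> int:
    """Forward pass: weighted sum of the inputs, thresholded at zero."""
    activation = sum(wi * xi for wi, xi in zip(w, x + [1.0]))
    return 1 if activation > 0 else -1


# Hypothetical term-weight vectors: [riot, police, concert, goal].
xs = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0]]
ys = [1, 1, -1, -1]  # +1 = unrest, -1 = normal
w = perceptron_train(xs, ys)

print([perceptron_predict(w, x) for x in xs])  # [1, 1, -1, -1]
print(perceptron_predict(w, [1.0, 0.0, 0.0, 1.0]))  # 1
```

A multi-layer perceptron adds hidden units between the input and output layers and is trained by back-propagation instead of this single-unit rule.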

Performance Evaluation: This is the last phase of text classification, in which the evaluation of text classifiers is typically conducted experimentally rather than analytically [12]. Rather than concentrating on issues of efficiency, the experimental evaluation of classifiers usually tries to assess the effectiveness of a classifier, that is, its ability to make the correct classification decisions. A significant issue in text classification is how to measure the performance of classifiers. Many measures have been used, such as precision, recall, error, accuracy, and fallout, which are given below [5].

Precision with respect to class c_i (Pr_i) is defined as the probability that, if a random document d_x is classified under c_i, this decision is correct. Similarly, recall with respect to c_i (Re_i) is defined as the conditional probability that, if a random document d_x ought to be classified under c_i, this decision is taken. For class c_i:
TP_i: the number of documents correctly assigned to the class.
FP_i: the number of documents incorrectly assigned to the class.
FN_i: the number of documents incorrectly rejected from the class.
TN_i: the number of documents correctly rejected from the class.

$$\mathrm{Pr}_i = \frac{TP_i}{TP_i + FP_i}, \qquad \mathrm{Re}_i = \frac{TP_i}{TP_i + FN_i}$$

$$\mathrm{Fallout}_i = \frac{FP_i}{FP_i + TN_i}$$

$$\mathrm{Error}_i = \frac{FN_i + FP_i}{TP_i + FN_i + FP_i + TN_i}, \qquad \mathrm{Accuracy}_i = \frac{TP_i + TN_i}{TP_i + FN_i + FP_i + TN_i}$$

To obtain estimates of precision and recall relative to the whole category set, two different methods may be adopted, namely micro-averaging and macro-averaging. Other measures are also used, such as the break-even point, the F-measure, and interpolation.

2. LITERATURE REVIEW
In this survey, a classification-based study of previous research works is carried out. Researchers have proposed different algorithms to address civil agitation. Nasser Alsaedi et al. proposed using temporal, spatial, and textual features to detect small-scale events at a given place and time, and compared their performance with existing algorithms, reporting better results [3].
While several researchers focus on either large-scale or small-scale events, their approach was to identify both [6]. In particular, the proposal addressed the context of small events. Their approach was based on modifying a term frequency algorithm to include a dynamic temporal aspect [9]. Swati Agarwal et al. observed in their survey that researchers use a variety of information retrieval and machine-learning-based methods and techniques to investigate solutions for civil agitation or secular turmoil [14]. Juhi P. Pathak discussed the role of social media in the North-East ethnic violence [8]. The objective was to critically analyze that role and to suggest ways to stop the incitement of ethnic violence through social platforms [8]. Koushik Deb et al. proposed a framework to handle the growing radicalization issue. Their approach can be used in multiple areas to detect mass sentiment. They proposed a computer vision technique to understand the meaning of images and make predictions based on image data. Furthermore, they proposed using recorded voice files from WhatsApp groups, which could also be converted to text. They also measured the frequency of speech over time and showed that they could relate this frequency to incident timing and derive patterns from those speeches [13].

3. COMPARISON TABLE

| Paper name | Techniques | Merits | Demerits |
|---|---|---|---|
| Can We Predict Riot? Disruptive Event Detection Using Twitter | Computer systems organization → Embedded systems; Redundancy; Robotics; Networks → Network reliability | They presented an integrated framework for detecting real-world events, both large and small, using web-enabled social networking sites such as Twitter. Event detection was performed in several stages. | One of the main directions for improvement was the location detection and disambiguation process for small-scale events. |
| Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats | Intelligence and Security Informatics; Machine Learning; Mining User Generated Content | Clustering, logistic regression, and dynamic query expansion are the commonly used techniques to predict upcoming events related to civil unrest or protest. | Demographic information, the activity feeds of a user profile, and links between two users were the discriminatory features for locating hidden communities of extremist users [14]. |
| A Framework for Predicting and Identifying Radicalization and Civil Unrest Oriented Threats from WhatsApp Group | Pattern mining; Prediction system | A framework was proposed to handle the growing radicalization issue; it could be used in multiple areas to detect mass sentiment. | In the actual field, they could relate frequency with incident timing and produce more effective patterns from their speech [13]. |

4. CONCLUSION
Compared with other social media platforms, Twitter has played a particularly prominent role in facilitating political unrest. In many papers, researchers have applied various machine learning algorithms, and their experiments on large datasets were effective in evaluating various aspects of their proposed frameworks. However, algorithms such as Apriori could be used to obtain better accuracy.

5. REFERENCES

[1] Barefoot, Darren & Szabo, Julie. (2010). Friends with Benefits: A Social Media Marketing Handbook. No Starch Press.
[2] Manoj K. Agarwal, Krithi Ramamritham, and Manish Bhide. 2012. Real-time discovery of dense clusters in highly dynamic graphs: Identifying real-world events in highly dynamic environments. Proc. VLDB Endow. 5, 10 (June 2012), 980–991. DOI: http://dx.doi.org/10.14778/2336664.2336671
[3] Nasser Alsaedi and Pete Burnap. 2015. Arabic event detection in social media. In Proceedings of the 16th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing’15). 384–401.
[4] Hila Becker, Mor Naaman, and Luis Gravano. 2011a. Beyond trending topics: Real-world event identification on Twitter. In Proceedings of the 5th International AAAI Conference on Weblogs and Social Media (ICWSM’11).
[5] M.I. Rashid. Online Radicalization: Bangladesh Perspective. In Report of U.S. Command and General Staff College.
[6] R. Korolov, D. Lu, J. Wang, G. Zhou, C. Bonial, C. Voss, L. Kaplan, W. Wallace, J. , H. , On predicting social unrest using social media. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM).
[7] P. Debnath, S. Haque, S. Bandyopadhyay, S. Roy. Post-disaster situational analysis from WhatsApp group chats of emergency response providers. In Proceedings of the ISCRAM 2016 Conference, Rio de Janeiro, Brazil, May 2016, ed. by Tapia, Antunes, Bañuls, Moore and Porto.
[8] J.P. Pathak. Role of social media in reference to North-East ethnic violence (2012). IOSR J. Humanities Soc. Sci. (IOSR-JHSS) 19(4), Ver. V, pp. 59–66 (2014).
[9] A. Kaur, G. Jagdev. Analyzing the working of FP-growth algorithm for frequent pattern mining. Int. J. Res. Stud. Comput. Sci. Eng. (IJRSCSE) 4(4), 22–30 (2017).
[10] S.-T., Y. Li. Pattern-based web mining using data mining techniques. Int. J. e-Educ., e-Bus., e-Manag. e-Learn. 3(2) (2013).
[11] G. Morrell, S. Scott, D. McNeish, S. Webster. The August riots in England: understanding the involvement of young people. Prepared for the Cabinet Office, October 2011.
[12] Hila Becker, Mor Naaman, and Luis Gravano. 2011b. Selecting quality Twitter content for events. In Proceedings of the 5th International Conference on Weblogs and Social Media.
[13] Koushik Deb. 2020. A Framework for Predicting and Identifying Radicalization and Civil Unrest Oriented Threats from WhatsApp Group.
[14] Swati Agarwal. 2015. Applying Social Media Intelligence for Predicting and Identifying On-line Radicalization and Civil Unrest Oriented Threats.