Predictive Models for Classifying the Outcomes of Violence: Case Study for Thailand’s Deep South*
ISSN 2090-3359 (Print) ISSN 2090-3367 (Online)
ΑΔΣ Advances in Decision Sciences
Volume 23 Issue 3 September 2019
Michael McAleer, Editor-in-Chief, University Chair Professor, Asia University, Taiwan
Published by Asia University, Taiwan
ADS@ASIAUNIVERSITY

Bunjira Makond**
Faculty of Commerce and Management, Prince of Songkla University, Trang, Thailand
and Centre of Excellence in Mathematics, Commission on Higher Education (CHE), Ministry of Education, Bangkok, Thailand

Mayuening Eso
Faculty of Science and Technology, Prince of Songkla University, Pattani, Thailand
and Centre of Excellence in Mathematics, Commission on Higher Education (CHE), Ministry of Education, Bangkok, Thailand

Revised: August 2019

* The authors gratefully appreciate the assistance of Metta Kuning, former Director of DSCC, Prince of Songkla University, Pattani, Thailand, and a reviewer for helpful comments and suggestions. This research received much appreciated financial support from the Centre of Excellence in Mathematics, Commission on Higher Education, Thailand.
** Corresponding author: [email protected]

Abstract

Violence is now widely recognized as a public health problem because of its significant consequences for the health and wellness of people, and it remains a growing problem in many countries including Thailand. Elucidating the factors related to violence can provide information that can help to prevent violence and decrease the number of injuries. This study explored predictive data mining models which have high interpretability and prediction accuracy in classifying the outcomes of violence. After data preprocessing, a set of 21,424 incidents occurring from 2004 to 2016 was obtained from the Deep South Coordination Centre database.
Correlation-based feature subset selection and a decision tree technique with embedded feature selection were used for variable selection, and four data mining techniques were applied to classify violent outcomes as physical injury or no physical injury. The findings revealed that, regardless of the variable selection method, gun was selected as a risk factor for physical injury. Moreover, a decision tree model with three variables (gun, zone, and solid/sharp weapon) outperformed a naive Bayes model in terms of predictive accuracy and interpretability. Decision tree and artificial neural network models showed similar levels of performance in classifying the outcome of violence but, in practical terms, a decision tree model is more interpretable than an artificial neural network model.

Keywords: Decision tree, naive Bayes, artificial neural network, logistic regression, violence in Thailand.

JEL: C53, C55, C88, N35

1. Introduction

Violence is now widely recognized as a public health problem because of its significant effect on the health and wellness of people, and it remains a growing problem in many countries including Thailand. In the deep south of Thailand, comprising Pattani, Yala, and Narathiwat provinces and parts of Songkhla province (Nathawi, Sabayoi, Chana, and Thepa districts), violence has serious and extensive impacts on public health. Statistics show that from January 2004 to March 2013, nearly 13,000 violent events were recorded, resulting in 15,574 casualties (5,614 deaths and 9,960 injuries) (Burke et al., 2013).

Although loss of life is the highest cost of physical injury, non-fatal injuries result in functional losses and limitations, medical expenditures, lost work performance, and disability compensation. Further, after injury, individuals experience psychological distress that occurs simultaneously with injury-related changes in function and in the quality of their lives (Duckworth and Iezzi, 2010).
Moreover, violence not only affects the individuals directly involved, but also affects the healthcare system, the delivery of healthcare, and people living in the affected areas (The World Medical Association, Inc., 2012).

From the perspective of public health, violence is preventable. Understanding the factors related to violence can generate data that allow the likelihood of specific events arising from specific causes and environments to be predicted, and such data can be used to prevent violence and reduce the number of injuries. For several decades, traditional statistical approaches have been applied manually to data relating to violence in order to detect the characteristics of, or risk factors associated with, violence (Höhle et al., 2009). Therefore, it is important that relevant information is available and is provided to decision makers so that they can devise suitable prevention and intervention measures.

Due to advances in computer technology, huge amounts of data relating to violence can now be stored easily and efficiently in databases at a reasonable cost. However, traditional statistical approaches alone are insufficient to discover the knowledge hidden within huge datasets (Karrar et al., 2016). Data mining techniques can, however, be employed to examine the factors affecting the outcome of violence and to provide better information for violence prevention and control in order to reduce the number of injuries. In practice, prediction is a goal of data mining that involves learning a model from independent variables, or attributes, to predict unknown variables. In data mining, classification is the task of discovering a predictive model that assigns a data item to one of several predefined classes (Rokach and Maimon, 2014).
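
To make the definition of classification concrete, the following is a minimal sketch in Python of a one-level decision tree (a decision stump) learned from a handful of invented records. It is not the authors' model or data: the records, the zone values A and B, and the function names fit_stump and predict are assumptions made purely for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy records, invented for illustration only (not DSCC data).
# Each record is a pair: (attribute values, class label).
TRAIN = [
    ({"gun": "yes", "zone": "A"}, "injury"),
    ({"gun": "yes", "zone": "B"}, "injury"),
    ({"gun": "yes", "zone": "A"}, "injury"),
    ({"gun": "no", "zone": "A"}, "no_injury"),
    ({"gun": "no", "zone": "B"}, "no_injury"),
    ({"gun": "no", "zone": "B"}, "injury"),
]


def fit_stump(records):
    """Learn a one-level decision tree: choose the attribute whose
    per-value majority class makes the fewest training errors."""
    best = None
    for attr in records[0][0]:
        # Count class labels separately for each value of this attribute.
        counts = defaultdict(Counter)
        for attrs, label in records:
            counts[attrs[attr]][label] += 1
        # Each attribute value predicts its majority class.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(rule[attrs[attr]] != label for attrs, label in records)
        if best is None or errors < best[2]:
            best = (attr, rule, errors)
    # Fall back to the overall majority class for unseen attribute values.
    default = Counter(label for _, label in records).most_common(1)[0][0]
    return best[0], best[1], default


def predict(model, attrs):
    """Assign a new data item to one of the predefined classes."""
    attr, rule, default = model
    return rule.get(attrs[attr], default)


model = fit_stump(TRAIN)
print("split attribute:", model[0])
print("prediction:", predict(model, {"gun": "no", "zone": "A"}))
```

On these toy records the stump splits on gun, because that attribute misclassifies one training record whereas zone misclassifies two. A full decision tree repeats this kind of split recursively and is usually pruned afterwards.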
Recently, classification has been widely applied in mining data relating to violence in order to identify relationships and to generate data useful for the prevention of violence (Babcock and Cooper, 2018; Kumar et al., 2019; Liu et al., 2011; Özyirmidokuz and Kaya, 2014; Wijenayake et al., 2018).

This study explored various predictive models with high interpretability and performance in classifying the outcomes of violence, based on a comparative study of decision tree, logistic regression, naïve Bayes, and artificial neural network techniques. Feature selection was used to select relevant variables, and pruning was applied to construct simple and accurate models. Analysis of variance (ANOVA) was employed to identify significant differences in the predictions made by these models, and Tukey’s HSD test was used to identify which particular models produced significantly different predictions.

The rest of this paper is organized as follows. Section 2 reviews the literature on the risk factors associated with, and the characteristics of, violence in Thailand’s deep south, on the application of data mining techniques in the domain of violence, and on the factors which influence the occurrence and outcome of violence. Section 3 describes the variables, the data collected, the data preprocessing step, and the research methods. The experimental framework and the results are presented in section 4 and, finally, the results are discussed and conclusions offered in section 5.

2. Literature Review

Several studies have explored the characteristics of violence data in the southernmost provinces of Thailand. Grid maps and statistical models were used to investigate the terrorist event rate by location and time; the findings showed that violence occurred most frequently between 8 and 9 pm and that the most likely days were Wednesdays and Thursdays.
Moreover, the rate of violence increased steadily during 2004 and stabilized in 2005, and the district effects revealed that violence had spread to the neighboring districts in Songkhla (Marohabout et al., 2009). Lim et al. (2009) investigated the living conditions of the families of victims of the unrest in Pattani province, and found that the majority of victims were male, the head of the family, of working age (45.9 ± 12.4 years), married, Muslim, and had children. Further, the majority of victims had only primary school education and were farmers, and most were shot and died. In addition, some of the victims suffered property damage, and around three people per family relied on each victim. Khongmark and Kuning (2013) constructed and compared Poisson and negative binomial generalized linear models with zero-corrected log-transformed linear models for the incidence of adverse events by location, time, and the demographic characteristics of the victims, including their gender and age group. The results showed that the incidence of injuries resulting from terrorism followed different patterns in different districts. Komolmalai et al. (2012) employed negative binomial and log-normal models to analyze the incidence of injuries to civilian victims of violence from terrorism in Pattani, Yala, and Narathiwat provinces and four eastern districts of Songkhla province. Their study concluded that, while specific regions were at higher risk at different times, the pattern of incidence could not be easily predicted and, overall, the risks among different demographic groups remained relatively constant. Chirtkiatsakul et al. (2014) used logistic regression to study the factors associated with casualties due to the unrest between 2004 and 2011 in the three southern border provinces and the surrounding districts of Songkhla. The results showed that gender, age, religion, occupation, type of weapon,