
APPLIED RESEARCH ON PREDICTIVE MACHINE LEARNING MODELS FOR

CAMPUS POLICING

A Project

Presented to the faculty of the Department of Computer Science

California State University, Sacramento

Submitted in partial satisfaction of the requirements for the degree of

MASTER OF SCIENCE

in

Computer Science

by

Ronit Naik

FALL 2020

© 2020

Ronit Naik

ALL RIGHTS RESERVED


APPLIED RESEARCH ON PREDICTIVE MACHINE LEARNING MODELS FOR

CAMPUS CRIME POLICING

A Project

by

Ronit Naik

Approved by:

______, Committee Chair Dr. Jingwei Yang

______, Second Reader Dr. Xuyu Wang

Date


Student: Ronit Naik

I certify that this student has met the requirements for format contained in the University format manual, and this project is suitable for electronic submission to the library and credit is to be awarded for the project.

______, Graduate Coordinator Dr. Jinsong Ouyang Date

Department of Computer Science


Abstract

of

APPLIED RESEARCH ON PREDICTIVE MACHINE LEARNING MODELS FOR

CAMPUS CRIME POLICING

by

Ronit Naik

While we like to think that universities are safe and free of acts of crime, there is a need for the university police department to be present on campus. The number of crime incidents reported at CSUS during the year was 225. Of the 3,990 universities and colleges that reported data on crime and safety, 3,397 reported fewer incidents than CSUS [1]. With this number of incidents being reported on campus, it becomes crucial to ask the university police department a few questions, such as: What policies does the police department put into action for crime incidents on campus? What prediction mechanism do they currently have in place to prevent crime from taking place? How useful would it be for the department to have a crime prediction system that not only shows frequent crime patterns but also forecasts future crime occurrences for specific locations on campus?

Considering the above questions and a few others, through this project we emphasize the importance of on-campus crime prediction for CSU Sacramento.


The primary step, which plays an essential role in completing the project, is the collection of data. The data we collected from the police department came in multiple hard copy binders covering the years 2013-2019, which had to be digitized for us to have historical data to predict on.

The goal of this project is to identify a crime prediction technique to use by conducting extensive research on multiple methodologies and techniques, starting from cleaning and pre-processing the data, to applying multiple data balancing techniques, and finally to building crime prediction models that achieve accuracy close to the industry standard of 75-80%.

______, Committee Chair Dr. Jingwei Yang

Date


ACKNOWLEDGEMENTS

I would like to extend my sincere gratitude to my project advisor Dr. Jingwei Yang for his continued support and encouragement in driving this project from its early phase to its completion. His guidance helped me take the right path in the process of implementation. This project helped me understand that a great deal of research and development goes into any implementation of a technology, and it has taught me to look at any new problem with a research-oriented perspective.

Also, I would like to thank Dr. Xuyu Wang for his continuous guidance and support.

Finally, I would like to thank the entire faculty and staff at the Department of Computer Science for their support throughout my Masters.


TABLE OF CONTENTS

Page

Acknowledgements ...... vii

List of Tables ...... x

List of Figures ...... xi

Chapter 1. INTRODUCTION ...... 1

1.1 Overview ...... 1

1.2 Problem Formulation ...... 2

2. LITERATURE REVIEW ...... 4

2.1 Crime Patterns Using Statistical Crime Data from Department of Education .... 6

2.2 Crime Patterns Using Statistical Crime Data from City of Sacramento ...... 8

3. DESIGN AND ANALYSIS OF DATA ...... 9

3.1 Overview of Dataset ...... 9

3.2 Data Preprocessing ...... 10

3.2.1 Data Extraction and Data Cleaning ...... 11

3.2.2 Data Analysis and Visualization ...... 16

3.2.3 Data Preprocessing...... 26

3.2.4 Web Tool to Automate Data Extraction and Data Preprocessing ...... 32

3.3 Data Balancing ...... 34

4. CRIME PREDICTION USING PREDICTIVE ANALYTICS ...... 40


4.1 Crime Prediction Using Machine Learning Algorithms...... 42

4.2 Crime Prediction Using Deep Learning Algorithm (ANN)...... 46

5. EXPERIMENTAL RESULTS...... 49

5.1 Performance Evaluation Metrics ...... 49

5.1.1 Accuracy ...... 49

5.1.2 Confusion Matrix ...... 50

5.1.3 F1-Score ...... 50

5.1.4 Loss Function for ANN ...... 51

5.1.5 Train and Test Data ...... 51

5.2 Comparison of Approach-Wise Results ...... 52

5.2.1 Approach 1: Machine Learning Approach with All Attributes ...... 52

5.2.2 Approach 2: Machine Learning Approach with Feature Extraction ...... 53

5.2.3 Approach 3: ML Approach with Data Balance and Aggregation...... 54

5.2.4 Approach 4: Deep Learning Approach Using ANN ...... 57

5.2.5 Results on Unseen Real Data ...... 58

5.2.6 Summary of the Results ...... 60

6. CONCLUSION AND FUTURE WORK ...... 64

Appendix. University Names and Email Sent ...... 69

References ...... 70


LIST OF TABLES

Tables Page

1. Combining similar crime categories ...... 15

2. Type of feature selection method based on attribute type ...... 28

3. Chi-square feature selection ...... 28

4. Input features for approach 2 ...... 53

5. Input features for approach 3 ...... 55

6. Final results of crime prediction models ...... 56

7. Classification report for XGB ...... 56

8. Accuracies for machine learning algorithms in 1st approach ...... 61

9. Accuracies for machine learning algorithms in 2nd approach ...... 62

10. Accuracies for machine learning algorithms in 3rd approach ...... 62

11. Accuracies for deep learning algorithm in 4th approach ...... 63

12. Results of all algorithmic models ...... 63


LIST OF FIGURES

Figures Page

1. Count of all crime categories on all campuses ...... 7

2. Image of scanned data record ...... 10

3. Machine learning process ...... 11

4. Count of crime occurrence ...... 17

5. Crime occurrence w.r.t month ...... 18

6. Rate of crime every hour ...... 18

7. Crime occurrence w.r.t day of week ...... 19

8. Crime type ‘theft’ occurrence by each hour ...... 20

9. Crime types by hour at Parking Structure 1 ...... 21

10. Crime types by hour at Parking Structure 2 ...... 22

11. Crime types by hour at Rec and Wellness Center ...... 23

12. Theft occurrence per year at all locations ...... 24

13. Count of crime type in Sacramento ...... 25

14. Filter-based feature selection ...... 28

15. Extra trees classifier feature selection ...... 29

16. Dataset after performing label encoding ...... 31

17. Screenshot of the webpage for uploading file for preprocessing ...... 33

18. Methods for handling data imbalance problem ...... 35

19. Predictive analytics workflow...... 41

20. Performance evaluation metric accuracy ...... 49

21. Formula for recall and precision ...... 50

22. Formula for calculating F1-Score ...... 51

23. Summary of ANN model ...... 57

24. Training loss curve ...... 58

25. Flowchart of generic crime prediction technique ...... 66



CHAPTER 1

INTRODUCTION

1.1 Overview

When and where does crime occur the most? Why are there very specific locations and times of crime occurrence? What can the University Police Department at Sacramento State do to prevent or reduce crime? These are just a few basic questions that are commonly asked, and there have not been enough convincing answers, since the crime rate on campus has been on the rise. The police department at Sacramento State has adopted various methodologies to reduce crime by providing services such as 24-hour patrol and dispatch, emergency preparedness programs, and alarm monitoring and response, to name a few [2]. Given that the idea of implementing predictive analytics for crime prediction has become popular in the past few years, the same has been implemented in this research using various crime prediction and crime forecasting methodologies. Through this research, we provide a solution that can be used as a source of truth for on-campus crime prediction problems.

Multiple cities and their police departments in the US are currently running algorithms that give them an overview of crime forecasts in their locality and the most frequently occurring crimes for a particular time interval. For example, in late 2016, the NYPD (New York Police Department) started using a new predictive policing tool to understand potential crime patterns and crime behavior [3]. Similarly, on the basis of computer modeling, the LAPD's (Los Angeles Police Department) "Real-Time Analysis and Critical Response Division" experimented with a predictive algorithm in collaboration with academic departments at several major universities [4] to anticipate probable crimes and locations and to support real-time monitoring [5].

Apart from this, there have been great initiatives toward making Sacramento a smart city; the ongoing and newly launched projects of the Sacramento government make use of information technology as a tool that can provide services to the citizens, which is what defines a smart city [6]. These days there has been an increasing focus on public safety. The crime-related projects are based on the Real-Time Crime Center [7], which provides real-time information about specific crime incidents to field personnel [8]. There may be an opportunity to partner with the government on crime-related projects if we propose an approach that combines the work done in this research with additional techniques such as capturing real-time images and videos on campus for crime analysis.

1.2 Problem Formulation

Since this project is research-based, we could define our question for finding a solution to Sacramento State on-campus crime. We formulated the research question as finding an applicable and accurate machine learning methodology for on-campus crime data. With this question, we came up with an applied research-based methodology using various machine learning techniques to provide a generic solution that can be applied to any on-campus crime-related research or project at universities across the country.


The machine learning methodologies and techniques have already been defined and are available to use. But the main problem lies here: there is no defined standard or generic solution for applying the available methodologies to on-campus crime data.

The problem specific to the Sacramento State campus is how campus crime can be reduced by applying machine learning techniques to the campus crime data. With this problem statement in mind, we applied crime prediction algorithms to predict crime incidents at a future point in time, performed various experiments on the crime data to find which algorithm best suits the problem at hand, and further modified the approach to provide the top 3 probable crimes that can occur at a specific location at a particular time.


CHAPTER 2

LITERATURE REVIEW

For this research to be effective and to understand the existing crime prediction systems, it was important to provide an overview of the works done by researchers in this field. We look at the research conducted by people in the crime domain and research explicitly done on campus crime in this section and provide some key findings.

One of the first studies, done by Sloan [9] on a survey of 494 campuses in the country, suggests that burglary/theft is the most common campus crime, amounting to 64% of the total, followed by vandalism at 18.8%.

Carr's [10] research white paper on campus violence contained many key findings about crimes on campuses. Among them was the interesting fact that the majority of on-campus crimes, 56%, occur between 6 am and 6 pm, when the campus is not empty.

A survey of 3,472 students conducted by Fisher et al. [11] suggests some key facts: students were more likely to experience theft the more nights they spent on campus, and students who lived on campus had a lower chance of experiencing crimes such as theft than students who stayed off-campus. This could be because the students who lived on campus had safer places to store their personal belongings.


An IEEE paper ‘A novel serial crime prediction model based on Bayesian learning theory’ [12] implemented an algorithm called Bayesian Learning, which primarily focused on studying the past crime's geographic information to predict the next crime location.

A paper published on the analysis and prediction of crime [13] uses various data mining techniques such as entity extraction for identifying objects from crime records and deviation detection for tracing fraud, to name a few. The authors also suggest classification for segregating common crimes and predicting future crime. However, this paper takes a theoretical rather than a practical approach.

Extensive research on crime mapping and predictive policing is discussed in a research paper published in IEEE [48]. Crime mapping is the approach taken to identify a certain geolocation as a crime hotspot, and the paper discusses multiple methods that others have used for it. The authors applied one such method to India, mapping its states into low, medium, and high crime hotspot categories using the K-means clustering algorithm on a dataset from the National Crime Records Bureau (NCRB). The paper also discusses techniques for predictive policing, which is the use of previous crime data to predict future crime; one of the approaches discussed uses four different techniques: clustering, classification, association analysis, and regression. The research proposes a merger of the crime mapping and predictive policing techniques to improve security in smart cities.


This section discussed a few of the research papers, articles, and surveys listed above, but we have not encountered a research paper that performs crime prediction techniques on a specific university campus using that university's own crime data. Hence, this research is unique in that it provides a generalized solution to a research problem that asks how crime can be reduced on campus by applying different crime prediction techniques.

Also, because of the lack of research on university crime data or on any crime data related to Sacramento, we complemented the literature review with multiple experiments and with a survey, sending emails to various universities to verify whether any of them had conducted similar research on their campus. From the survey of ten campuses, conducted by emailing their police departments, the responses indicated that they were not aware of any such research on their campuses (Appendix). We also carried out two experiments to ascertain whether the work done in this project produced the same crime patterns as the data from the authoritative source, the Sacramento State police department. The following sections discuss the crime patterns derived from the crime data obtained from the U.S. Department of Education and the City of Sacramento, respectively.

2.1 Crime Patterns using statistical crime data from U.S. Department of Education

We performed a literature survey to find crime patterns utilizing the crime data from the Office of Postsecondary Education (OPE) of the U.S. Department of Education [14].

The data retrieved from the OPE website contained more than 41k records, covering the criminal offenses that occurred on more than 11k campuses over the years from 2001 to 2018. This literature survey aimed to compare the crime pattern across all the campuses that report crime data and validate the crime pattern findings for the Sacramento State campus.

After retrieving the data from the OPE website, the next step was to explore it to find crime patterns. The dataset contained 41,068 records, as noted above. We then performed the preprocessing steps and went ahead with visualizing the data for better analysis. As seen in figure 1, the plot shows the crime categories and their total occurrences on US university campuses. There is a similar pattern of crimes across all the years. The most common and highest occurring crime type is 'Burglary', followed by 'Motor vehicle theft'. These findings come from the dataset retrieved from the Department of Education website; in the coming sections we compare them with the crime pattern in the Sacramento State data and investigate whether a similar pattern appears there.

Figure 1: Count of all crime categories on all campuses


2.2 Crime Patterns using statistical crime data from City of Sacramento

In the second literature survey, the data was retrieved from the City of Sacramento's publicly available crime data [15]. This survey aimed to extract criminal patterns from in and around the Sacramento State campus and to find the correlation between the crimes that occur on campus and those that occur off campus but in close proximity to it. After retrieving the data from the publicly available datasets on the City of Sacramento website, the first step was to explore the dataset to generate crime patterns. The dataset contained all the crimes that took place in Sacramento; the number of rows was close to 362k, which is a huge dataset covering the entire city. Since the aim was to find patterns around the Sacramento State campus, we filtered the dataset by zip code, considering only records with the zip codes '95819' or '95825'. These two zip codes were considered because '95819' is the zip code where the Sacramento State campus is located, and '95825' is very close in proximity and is primarily populated with students staying off-campus. The next step was to perform the required data preprocessing, and we then moved on to visualizing the dataset and making sense of the patterns. The similarity in the crime patterns is discussed later in section 3.2.2 of the Design and Analysis of Data.


CHAPTER 3

DESIGN AND ANALYSIS OF DATA

3.1 Overview of Dataset

The dataset for this research is the raw data made available by the Sacramento State police department on request for this project. The data was received as hard copy binders of physical data records. We scanned every page of the binders and saved them as PDF documents on the local machine so that data could be extracted from them. We received the crime data for the entire Sacramento State campus from January 2013 to October 2019. Each crime record consisted of fields such as crime type, date and time of the incident, address/location, disposition, and public information.


Figure 2: Image of scanned data record

As seen from figure 2, the scanned version of the physical data record cannot directly be converted to CSV data, as there is no technique readily available to extract the data and convert it to a CSV file. It can also be seen that there are hole punches over the text, so it is evident that the data would have some noise when extracted from these files. Next, we describe how the data was extracted and pre-processed.

3.2 Data Preprocessing

There are several parameters that influence the success of machine learning on any task. Data interpretation and quality are essential aspects for solving any given problem using machine learning techniques [16]. If there is a lot of unnecessary and repetitive information present, or noisy and inaccurate data, then it is more challenging to extract meaningful information during the training process. It is well understood that the data preparation and filtering phases of any machine learning problem require a substantial amount of processing time. There are multiple steps involved in the life cycle of data preprocessing, including data extraction, data cleaning, data normalization, data balancing for classification problems, data transformation, and feature extraction. These are a few steps that every researcher has to go through to bring the best out of a given dataset so that a machine learning model can perform well. In our research, too, there is particular importance and focus on the data preprocessing steps, since they lay the foundation for a good prediction model. The steps taken in this module are listed below.

Figure 3: Machine learning process

3.2.1 Data Extraction and Data Cleaning

3.2.1.1 Data Extraction

Data extraction was one of the first steps taken during this research. As shown in Figure 2, the data was not directly available as a CSV file; instead, every page of the physical binder copy for every year was scanned manually. The manual process was time-consuming and required diligence so that no page was missed. After scanning, there were seven PDF files for the seven years of data (2013-2019). The next step was to extract data from the PDFs and convert it into a comma-separated values (CSV) file, which is easily accessible for preprocessing and for training machine learning models. This was one of the more challenging processes; before writing a Python script for the extraction, we tried various optical character recognition (OCR) software available online, but since the data is not formatted consistently, there was no way to get aligned results in a CSV file.

A python script was written to extract data from these files. The libraries used are pytesseract, tesseract-ocr, and pdf2image. Firstly, every page in the pdf file is converted to an image corresponding to that page using the pdf2image library. Then for every image, the data is extracted using the pytesseract library. This library converts the image to string and combines the textual data using delimiters. Once the data is in the form of readable text, the script parses the data by matching patterns with the column names. For example, to extract crime_type, date, and time from the raw data, we match with patterns like a set of month names and characters like “ : ” “ - ” and “,” since all these attributes are on a single line in the file.
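A minimal sketch of this extraction step, assuming the scanned binder pages are saved as yearly PDF files and that poppler and the Tesseract engine are installed locally; the file name and the regular expression below are illustrative placeholders, not the exact patterns used in the project:

```python
import re
import pandas as pd
from pdf2image import convert_from_path   # needs poppler installed
import pytesseract                        # needs the tesseract-ocr engine installed

def extract_records(pdf_path):
    """OCR every page of a scanned crime-log PDF and parse crude records."""
    rows = []
    for page_image in convert_from_path(pdf_path):       # one PIL image per page
        text = pytesseract.image_to_string(page_image)   # image -> raw text
        for line in text.splitlines():
            # Illustrative pattern: crime type, a "Month DD, YYYY" date, a time, then an address
            match = re.search(
                r"(?P<crime_type>.+?)\s+"
                r"(?P<date>(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\s+\d{1,2},\s*\d{4})"
                r"\s*-\s*(?P<time>\d{1,2}:\d{2})\s+(?P<address>.+)",
                line)
            if match:
                rows.append(match.groupdict())
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = extract_records("crime_log_2013.pdf")   # hypothetical file name
    df.to_csv("crime_log_2013.csv", index=False)
```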

There is no way to manually enumerate a set of all crime types to extract from, as there are a large number of them. At the same time, there are certain limitations when reading the data from the scanned version of the PDF files. One such limitation, and the most important in our case, is reading data as noise. More often than not, the extracted data, when read by the python script, carries noise with it, which has to be removed before populating the dataframe.


Apart from noise, there is also missing data for multiple records. The python script parses most of the data, but because of the conversion of image to text, the data alignment gets distorted, and the pattern matching in the python script fails to detect some attribute values. Manual entry was done for the missing attribute values across multiple rows of crime type, date, address, etc., so as not to end up with a reduced dataset. For finding useful patterns and other important information in the data, new attributes like "week number", "day of week", "month number" and "part of day" were added using the existing date and time attributes. After performing all these operations, the data from the different years was combined into one single dataset for further preprocessing. The total number of rows populated was 2927; after removing the rows with null address values, the total count was 2902.

3.2.1.2 Data Cleaning

Data cleaning is a crucial step in any machine learning prediction model. It can have a significant impact on the performance of a machine learning algorithm [16], and one of the most challenging machine learning problems is removing noisy objects from the data [17]. While extracting the data from the raw scans, most of the noisy data was eliminated by the script and the character recognition libraries. Still, for a lot of records, the noise was in the attribute values. For example, one unique crime type such as "hit and run, property damage only" was populated multiple times as variants like "i ad run, property damage only", "td run, property damage only", "hitand run, property damage only", "hin and run, property damage only", "rend run, property damage only", "|. and run, property damage only", "ei and run, property damage only", etc.


Similarly, for other crime types and for other attributes such as the crime location, there were misspelled words, which are nothing but noise in machine learning terms. Several techniques for eliminating noise from attributes have been published. One such technique, proposed in [18], reports good results with a supervised machine learning-based algorithm that uses a context-free spell-check correction methodology. Since the data in our case is misspelled, it was possible to use this methodology to automate the noise elimination process. Of the three techniques proposed, Word Based Tokenization (WBT), Character Based Tokenization (CBT), and Advance Character Based Tokenization (ACBT), this research implemented ACBT, since its results and accuracy were better than those of the other two methods. We then built a correctly spelled reference dataset of crime-related words and, after separating out the correctly spelled words, used the misspelled words in our data as a test set for the ACBT-based classifier. With this approach, the noise was removed from the dataset.
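The ACBT classifier itself is described in [18]; as a simpler stand-in that illustrates the same idea of mapping noisy OCR strings onto a known vocabulary, the sketch below uses fuzzy string matching from Python's standard difflib module. The vocabulary and cutoff are assumptions, and this is not the technique actually used in the project:

```python
import difflib

# Hypothetical vocabulary of correctly spelled crime types
KNOWN_TYPES = [
    "hit and run, property damage only",
    "report of theft",
    "vandalism report",
    "burglary report",
]

def normalize_crime_type(raw, cutoff=0.6):
    """Map a noisy OCR string to the closest known crime type, or None if nothing is close."""
    matches = difflib.get_close_matches(raw.lower().strip(), KNOWN_TYPES, n=1, cutoff=cutoff)
    return matches[0] if matches else None

# The OCR variant "hin and run, property damage only" maps back to the correct label
print(normalize_crime_type("hin and run, property damage only"))
```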

During analysis, there were also multiple crime types of the same nature but with different naming. Since crime reporting is a manual job done by the Sacramento State police department, there are instances where effectively the same crime type is recorded under several labels. We combined the crimes having the same nature but different wordings, as shown in table 1. The purpose of doing this is to reduce the number of unique crime types and aggregate similar ones into a single category, which is beneficial for prediction.


Table 1: Combining similar crime categories

New generalized category: theft
Original crime categories: Report of theft; Attempted theft-grand; Attempted theft-petty

New generalized category: burglary
Original crime categories: Burglary report; Burglary to auto report; Robbery/5 to 10 ago; Burglary in progress; Robbery report; Robbery in progress; Burglary; Burglary to auto

New generalized category: hit and run
Original crime categories: Hit and run, property damage only; Hit and run w/injury or death

New generalized category: other agency misdemeanor warrant
Original crime categories: Other agency felony warrant; Other agency misdemeanor warrant

New generalized category: motor vehicle theft/report of lost or stolen license plate
Original crime categories: Motor vehicle theft report; Motor vehicle theft in progress; Recovered stolen motor vehicle; Report of lost or stolen license plate

New generalized category: possession of controlled substance/marijuana or public drunkenness
Original crime categories: Public drunkenness; Possession of controlled substance; Possession of a weapon on campus; Poss. of alcoholic beverage by minor; Possession of marijuana (m); Possession of illegal weapon; Possession of drug paraphernalia; Possession of burglary tools; Receiving/possessing stolen property; Possession for sale - marijuana; Possession of counterfeit bills or notes

New generalized category: assault/sexual/battery report
Original crime categories: Assault/battery, 5 to 10 ago; Assault/battery report; Assault/battery in progress; Sexual battery; Battery on officer-not aggravated; Sexual assault

After aggregating similar crime types into single categories, the data had 46 unique crime types. Still, the occurrence counts varied widely, from crime types occurring just once to the highest occurring crime type appearing 1,142 times. Because of these varying occurrences, this research considered only crime types occurring more than 16 times, which reduced the number of unique crime types to 10. Similarly, for the address attribute, a few locations occurred far less often than others, so only rows whose address value occurred more than 15 times were kept. After these operations, the total number of rows was reduced from 2,902 to 2,299.
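A condensed pandas sketch of these two steps, assuming the cleaned data sits in a CSV with crime_type and address columns; the mapping shown is abbreviated from Table 1 and the count thresholds follow the text above:

```python
import pandas as pd

df = pd.read_csv("campus_crime_clean.csv")   # hypothetical cleaned dataset

# Abbreviated Table 1 mapping: original category -> generalized category
category_map = {
    "report of theft": "theft",
    "attempted theft-grand": "theft",
    "burglary report": "burglary",
    "hit and run, property damage only": "hit and run",
}
df["crime_type"] = df["crime_type"].str.lower().replace(category_map)

# Keep crime types occurring more than 16 times and addresses occurring more than 15 times
df = df[df.groupby("crime_type")["crime_type"].transform("count") > 16]
df = df[df.groupby("address")["address"].transform("count") > 15]
print(len(df), "rows remain after filtering")
```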

3.2.2 Data Analysis and Visualization

After translating raw data into useful information, the next important step that can help guide successful decision-making and improved model-building is data analysis and visualization. Data visualization is the interactive presentation of data, linking data accessibility and data interpretation, and organizing and presenting essential findings from the data [19]. It is an iterative method of visualizing the data by asking questions, finding answers to those questions, and then starting again with fresh questions. The art of asking quality questions produces a vast number of questions and helps to interpret the data better. Different types of questions lead to different kinds of graphical plots, which in turn raise another distinct collection of questions and further sets of graphs. Eventually, this helps in understanding the data in detail [19].

In our research, visual analysis of the data is as essential and beneficial for capturing crime patterns as building a prediction model. Python has a vast library ecosystem, which helps us analyze various trends by visualizing the data in graphs. This research has used the matplotlib, seaborn, and plotly libraries to plot different meaningful bar charts and pie charts and to create interactive visualizations to understand the relation of crime type with the other attributes in our data. A few of the interesting insights that helped in building an improved model, and that can also help prevent crime, are as follows:

Figure 4 below shows the crime type occurrence count for every unique crime type.


Figure 4: Count of crime occurrence

The top 3 crimes on campus are theft, hit and run, and vandalism. As seen in figure 4, theft occurs the most; the occurrences of each of these are 977, 553, and 163, respectively. The issue of data imbalance can also be noticed in this graph: theft is the majority class, and the instances of the rest of the classes are far fewer, except for the second, hit and run, which is closer to the count of the first. This is a clear case of data imbalance, which we discuss further in section 3.3 on Data Balancing.

Figure 5 below plots the number of crime occurrences for each month.


Figure 5: Crime occurrence w.r.t month

As seen from the plot, crime occurrence is greater just before the start of the fall semester and before the start of the spring semester compared to the other months during the semester. Crime is lowest during the summer, when the campus population shrinks. This insight suggests that the Sacramento State police department should be extra cautious just before the start of each semester.

The plot below in figure 6 gives a clearer understanding of the crime pattern and the hour of the day.

Figure 6: Rate of crime every hour


We can see a clear pattern between crime occurrence and the time of day; the crime rate is usually lower in the morning period, from 4 AM to 7 AM, and starts to increase gradually as students begin to trickle in, when cases such as hit and run in the parking lots appear. Crime generally peaks during the afternoon hours from 12 PM to 6 PM, which is the usual campus rush hour.

The next plot in figure 7 shows the crime occurrence with respect to days of the week.

Figure 7: Crime occurrence w.r.t day of week

From the above pie chart, it can be seen that the crime count on weekends is lower than the crime count on weekdays.

For further analysis, we plotted different intuitive graphs that give findings for a particular crime type or location.


From the plot below in figure 8, it can be seen that the crime type theft follows the general crime pattern seen in figure 6 above, and that the rate of theft is higher during school hours, when there is a large enough crowd on campus.

Figure 8: Crime type ‘theft’ occurrence by each hour

There are also particular places where theft occurs more often, the Rec and Wellness Center and the Upper East Side Lofts, compared to other locations; this is expected, since both places tend to be crowded.

Since the top 3 locations for crime occurrence were Parking Structure 1, Parking Structure 2, and the Rec and Wellness Center, we plotted the crime occurrences at these 3 locations by the hour.


Figure 9: Crime types by hour at Parking Structure 1

From figure 9 above, it can be seen that nine different types of crime occurred at Parking Structure 1, of which the most prominent, as expected, is hit and run.


Figure 10: Crime types by hour at Parking Structure 2

Similarly, the plot in figure 10 shows the different crime types at Parking Structure 2 by the hour; the most prominent crime type at this location is also hit and run. These insights can help the police department at Sacramento State plan how to reduce the cases occurring at these two parking locations.

From the plot below in figure 11, it is visible that theft is the most prominent and most frequently occurring crime at the Rec and Wellness Center; since it is a very open and less guarded place, it is the most vulnerable to theft.


Figure 11: Crime types by hour at Rec and Wellness Center

In the plot below in figure 12, we examine the highest occurring crime type, theft, over the years at all the locations. It can be seen that the Rec and Wellness Center had the highest number of theft-related incidents until 2014. However, these incidents have been going down from 2015 onward and were significantly reduced in 2019, which is a good indication.


Figure 12: Theft occurrence per year at all locations

Apart from analyzing the on-campus crimes, we tried to find similarities in the patterns of crimes in the neighboring areas of the campus, as discussed above in the literature survey. The data was acquired from the open dataset available on the City of Sacramento website [20]; this data is made publicly available by the Sacramento city government. After processing the data, the next step was to derive crime patterns for the areas near the campus. On filtering the data by the zip codes that surround the campus, the patterns were close to those of the crimes on campus.


Figure 13: Count of crime type in Sacramento

The plot above in figure 13 shows the top 10 crimes occurring around the Sacramento State campus. There is a very similar trend among the top 3-5 crimes, and these crimes closely resemble the crimes occurring on campus. Hence, it is fair to infer from this pattern that there is similar crime activity in and around the campus, and this analysis of the Sacramento data gives us a striking result.

It was essential to do such extensive data analysis and visualization, since the analysis through graphs paved the way toward building a prediction-based model. The graphs provided a clear understanding of the data, and one of the critical issues to handle in preprocessing, discussed later, is the data imbalance in the dataset. It is evident from the plot in figure 4 that the output class is not balanced, as there is a considerable difference between the occurrence count of the highest occurring crime type and the rest.

3.2.3 Data Preprocessing

After removing noise and aggregation, the dataset had a lot of potential for extracting more meaningful information from the existing columns. Data preprocessing steps like feature selection and feature encoding are essential methods to transform data into a form that can be understood by the model for training and achieve the best results.

A few of the benefits of performing this processing on our data are:

1. Reduces overfitting/underfitting: With less redundant data in the training set, the model can predict better, with less opportunity to make predictions based on noise.

2. Improves accuracy: If there is less data that is misleading, it increases model accuracy.

3. Increases adaptability: Machine learning algorithms understand mathematical data, that is, numbers rather than text, so encoding the categorical variables increases the model's adaptability.

3.2.3.1 Feature Extraction

One of the interesting preprocessing steps is extracting new features from the dataset's existing features using domain knowledge. Our dataset has a column called Crime Date, which holds both the occurrence date and the occurrence time. It was an intuitive decision to split this date column into multiple columns that hold more meaning than the raw date. Hence, the extracted informative columns were Week Number, Day of Week, Year, Month, Time in Hour, and Part of Day.
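A small sketch of this split using pandas datetime accessors, assuming a combined crime_date column in a parseable format; the part-of-day bins are illustrative:

```python
import pandas as pd

df = pd.read_csv("campus_crime_clean.csv", parse_dates=["crime_date"])  # hypothetical column name

df["year"] = df["crime_date"].dt.year
df["month"] = df["crime_date"].dt.month
df["week_number"] = df["crime_date"].dt.isocalendar().week
df["day_of_week"] = df["crime_date"].dt.day_name()
df["time_hour"] = df["crime_date"].dt.hour

# Illustrative binning of the hour into a part-of-day category
df["part_of_day"] = pd.cut(df["time_hour"],
                           bins=[-1, 5, 11, 17, 23],
                           labels=["night", "morning", "afternoon", "evening"])
```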


3.2.3.2 Feature Selection

Feature subset selection is identifying and selecting a subset of essential features from the dataset that are most relevant to the target class, in order to predict more accurate results [23]. The most common feature selection methods are filter-based methods, used when working with categorical input data and when the target output feature is also categorical, which means building a classification model.

Feature selection is often compared with, and mistaken for, dimensionality reduction, but the two methods are not the same, even though both reduce the number of variables. Dimensionality reduction performs data reduction by combining multiple attributes, also known as feature transformation. In contrast, feature selection reduces the attributes by excluding them from the dataset based on selection criteria, without modifying the attributes themselves [22]. A few dimensionality reduction methods are Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), etc.; hence, these methods were not used in our research.

Langley's [21] work on feature selection grouped feature selection methods into two groups, filter methods and wrapper methods. The filter approach relies on the general characteristics of the data to be analyzed and chosen, rather than on any data mining technique. The filter method uses evaluation criteria that include distance, information, dependency, and consistency, and it applies a principal ranking strategy, rank-ordering the attributes for selection.


Figure 14: Filter-based feature selection

Before the classification process begins, the ranking method removes irrelevant attributes [22]. Filter methods are usually applied during the preprocessing step. Using this method, features are given a rank-based ordering according to a computed statistical score that determines their correlation with the target attribute.

Table 2: Type of feature selection method based on attribute type [22]

Feature \ Response     Continuous                Categorical
Continuous             Pearson's Correlation     Linear Discriminant Analysis
Categorical            ANOVA                     Chi-Square

As seen from table 2 [22] and the above description, Chi-Square is one example of a filter-based method, and since our features were categorical, this research has used Chi-Square for feature selection.

Table 3: Chi-square feature selection

Variables tested: address, week_number, week_day, time, time_hour, year, month, part_of_day
Chi-square scores obtained (in descending order): 2702.23, 885.46, 681.84, 578.73, 232.8, 176.7, 153.8, 17.49; the address attribute received the highest score.


We calculated the chi-square score for all the input attributes against the target attribute, and the results obtained are in table 3. Unsurprisingly, the column address has the highest score, and with this it is safe to infer that location is one of the attributes on which the target attribute, crime type, depends. But other factors also have to be considered, and we have included the other high-scoring attributes in our prediction model, as discussed later.
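A minimal sketch of computing such chi-square scores with scikit-learn, assuming the categorical inputs have already been label encoded into non-negative integers and that crime_type is the encoded target; the file and column names are placeholders:

```python
import pandas as pd
from sklearn.feature_selection import chi2

df = pd.read_csv("campus_crime_encoded.csv")   # hypothetical label-encoded dataset
features = ["address", "week_number", "week_day", "time", "time_hour", "year", "month", "part_of_day"]
X, y = df[features], df["crime_type"]

scores, p_values = chi2(X, y)                  # chi-square statistic per feature vs. the target
print(pd.Series(scores, index=features).sort_values(ascending=False))
```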

Apart from the chi-squared test, this research has also implemented the ensemble-method-based Extra Trees Classifier model, a ranking-based algorithm. It is one of the most used algorithms for feature selection on categorical variables [23]. It ranks the features based on their importance, which is useful for knowing which features the model considers important and, at the same time, for eliminating the least important features.


Figure 15 above shows the ranks given to all the input features by the Extra Trees Classifier feature selection model. The column "address" was ranked the highest, and the attribute extracted from the date, "part_of_day", was ranked the lowest. Though "time" is ranked among the top attributes, it has been removed from our prediction model, as its chi-squared test score is low; also, this attribute has the maximum number of unique values, which can hamper the model's performance.
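A corresponding sketch of ranking the same encoded features with an Extra Trees model; the hyperparameters are illustrative values rather than anything tuned for the project:

```python
import pandas as pd
from sklearn.ensemble import ExtraTreesClassifier

df = pd.read_csv("campus_crime_encoded.csv")   # hypothetical label-encoded dataset
features = ["address", "week_number", "week_day", "time", "time_hour", "year", "month", "part_of_day"]
X, y = df[features], df["crime_type"]

model = ExtraTreesClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# Higher importance = more useful for separating the crime types
print(pd.Series(model.feature_importances_, index=features).sort_values(ascending=False))
```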

3.2.3.3 Feature Encoding

Many machine learning algorithms cannot take categorical data as input; they require numeric data. One option is to leave the categorical attributes out of the algorithm and use only the numeric values, but doing so may lose a lot of information that the algorithm needs to perform well [24]. Hence, more often than not, the categorical features are encoded into numeric values for the algorithms to work.

Generally, there are two types of encoding performed on categorical features: One-hot Encoding and Label Encoding.

One-hot Encoding is the most commonly used encoding scheme [25]. In one-hot encoding, each category value of a categorical feature is mapped to a new column, which is assigned either 0 or 1 to denote the absence or presence of that category [26]. For higher-dimensional data, this method can produce a lot of columns, which can significantly slow down model learning and performance. This effect is what Bellman termed the "curse of dimensionality" [27].


Label Encoding is another widely used method to convert categorical feature values into numeric features understood by the machine learning model. In label encoding, each categorical value in a feature is mapped to an integer. The label encoder does not add any additional columns to the dataset, but the assigned labels carry no ordinal relationship [28]. If a value repeats, it is assigned the same label as before; the column's first unique value is assigned an integer, for example 1, the next unique value is given the incremented value 2, and so on [29].

Because of this limitation of one-hot encoding, which adds a column for every unique categorical value, it was not used for data encoding, and we went ahead with label encoding instead. For example, the categorical column location had many unique values, and performing one-hot encoding on it resulted in as many new columns, which reduced the model's performance, considering that the data size (number of rows) was very small. This factor drove the decision to perform label encoding on the dataset.
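A short sketch of label encoding the categorical columns with scikit-learn; the column and file names are assumptions carried over from the preprocessing steps above:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("campus_crime_clean.csv")   # hypothetical cleaned dataset

categorical_cols = ["crime_type", "address", "day_of_week", "part_of_day"]
encoders = {}
for col in categorical_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col].astype(str))   # each unique value -> an integer

# encoders["crime_type"].classes_ keeps the mapping so predictions can be decoded back to labels
df.to_csv("campus_crime_encoded.csv", index=False)
```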

Figure 16: Dataset after performing label encoding


As seen in figure 16 above, the dataset after performing label encoding on all the columns was converted to all numeric values. The model can easily understand and perform well on this data.

3.2.4 Web Tool to automate Data Extraction and Data Preprocessing

The amount of effort used for extracting the data from the scanned pdf and preprocessing it by removing the null values, cleaning up the data by eliminating noisy values, and exporting it to much cleaner data in a CSV file was significant. Hence, it would be nice if a tool could do the entire process of extracting the data, preprocessing it, and exporting it to a dataframe in a CSV file. With this intuition, we created an online web tool that takes in a scanned pdf file, does all the extraction and preprocessing, and returns cleaner data in a dataframe in a downloadable CSV file.

Since there was a python script written for the entire preprocessing process starting from extracting the data to converting it to a CSV file, it was feasible to utilize this script and create a web application that can be useful going forward when there is more data in coming years.

The web application is built on Python's Flask framework, which supports both the frontend and the backend. The frontend is built with HTML and CSS, and the backend is built on the python script.

This tool is a complete automation of the data preprocessing process. The web tool has an option to import a file. When a PDF file is imported, the python script is executed in the backend by sending the imported file from the frontend to the backend script. The script receives the imported PDF file, and the data extraction and preprocessing begin. The script runs the same code we had written for preprocessing the PDF files received from the police department at Sacramento State.

After the preprocessing is complete, the user who uploaded the PDF file is prompted with a downloadable CSV file on the web portal. This automation tool will be beneficial going forward as more data arrives from the police department, which can be utilized in training the model further.
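A trimmed sketch of such a Flask endpoint, assuming the extraction and preprocessing logic from the earlier script is importable as a pdf_to_dataframe helper; the module, function, and file names are placeholders rather than the project's actual code:

```python
import io
from flask import Flask, request, send_file

from extraction import pdf_to_dataframe   # hypothetical module wrapping the OCR/parsing script

app = Flask(__name__)

@app.route("/", methods=["GET"])
def index():
    # Minimal upload form; the project used a separate HTML/CSS front end
    return ('<form method="post" action="/upload" enctype="multipart/form-data">'
            '<input type="file" name="file"> <input type="submit" value="Upload"></form>')

@app.route("/upload", methods=["POST"])
def upload():
    uploaded = request.files["file"]
    uploaded.save("uploaded.pdf")                 # persist the scanned PDF locally
    df = pdf_to_dataframe("uploaded.pdf")         # extract + preprocess into a DataFrame
    buffer = io.BytesIO()
    df.to_csv(buffer, index=False)
    buffer.seek(0)
    return send_file(buffer, as_attachment=True,
                     download_name="preprocessed.csv", mimetype="text/csv")

if __name__ == "__main__":
    app.run(debug=True)
```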

Figure 17: Screenshot of the webpage for uploading file for preprocessing

Figure 17 is a screengrab of the webpage created for automatic preprocessing of a file. The user clicks the Choose File button to upload a PDF file in a format similar to that of the PDFs from the Sacramento State police department.

After choosing the file, the user clicks on the Upload button for the processing to begin.


After a few seconds, when the preprocessing is complete, the user is prompted with a download window box containing the downloadable converted and preprocessed CSV file.

3.3 Data Balancing

Handling data imbalance is a serious and challenging task in machine learning. A class is defined as a majority class if the count of its instances is higher than that of any other class in the dataset [30]. In contrast, a class is defined as a minority class if the count of its instances is lower than that of the other classes in the same dataset [30]. This problem is referred to as data imbalance, and such datasets as imbalanced datasets [31].

The seriousness of the data imbalance issue can be explained as follows: consider a dataset having 90%-95% of instances from the majority class and the remaining 5%-10% from the minority class. If the classification algorithm predicts everything as the majority class, it still achieves an accuracy of 90%-95%. In such cases, accuracy is not a good representation of the classification algorithm, because the model is biased towards the majority class and nothing is predicted from the minority class. Moreover, the minority classes may be treated as noise by the classifier, with the possibility of being eliminated by it. The primary problem with an imbalanced dataset is that it can reduce the overall performance and accuracy of the classifier model [32]. Balancing the imbalanced dataset is therefore important for improving the accuracy and for predicting accurate results [30].

As seen in figure 18 below, the available methods to solve the data imbalance problem are categorized into three sections, namely Data Level, Algorithmic Level, and Hybrid. We will not go through each of them, but we have utilized a few of these methods to solve the data imbalance issue in our dataset.

Figure 18: Methods for handling data imbalance problem

As noted during the graph analysis of figure 4, this section discusses the issue of data imbalance in our dataset. From that graph, the difference between the instances of the majority class and the other minority classes can be seen. To solve this issue, so that the model predicts more accurately and without a bias towards the majority class, we have taken four data balancing approaches and chosen the best among them. The approaches taken are as follows:


Data Level Approach – Resampling Techniques

1. Random Over-Sampling

In this approach, the method randomly replicates or adds copies of the minority class instances in order to match the number of majority class instances [33]. This approach is a reasonable choice when the data is not large, as in our case, but it has limitations as well.

When this approach was implemented, the results achieved were very similar to what we got before performing any sampling. This is because resampling techniques generally work best on binary classification problems; here we have a multi-class prediction problem, and hence over-sampling did not increase the accuracy of the model. Moreover, the dataset contains only a couple of thousand rows, and even after over-sampling there was no significant change in the dataset size.

2. Informed Over Sampling – SMOTE

Synthetic Minority Oversampling Technique (SMOTE) creates synthetic samples for the minority classes rather than replicating them [30]. A subset of the data from the minority classes is taken, and new instances of these classes, called synthetic samples, are generated and then added to the original dataset [33]. This technique is one of the most commonly implemented and performs better than the random over-sampling approach. It uses the nearest neighbors algorithm for generating synthetic data, which can then be used in training the model.

We implemented this approach in our research for data balancing. However, this technique has a limitation: when generating the synthetic samples, SMOTE does not take neighboring samples from other classes into account, which can increase class overlap and potentially introduce additional noise [33]. We did not achieve the results expected through this approach, but they were better than those of the random over-sampling method.
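A brief sketch of both resampling approaches with the imbalanced-learn library, on the label-encoded data assumed from the preprocessing chapter:

```python
import pandas as pd
from collections import Counter
from imblearn.over_sampling import RandomOverSampler, SMOTE

df = pd.read_csv("campus_crime_encoded.csv")              # hypothetical encoded dataset
X, y = df.drop(columns=["crime_type"]), df["crime_type"]

ros = RandomOverSampler(random_state=42)
X_ros, y_ros = ros.fit_resample(X, y)                      # duplicate minority-class rows

smote = SMOTE(random_state=42, k_neighbors=5)
X_sm, y_sm = smote.fit_resample(X, y)                      # synthesize new minority-class rows

print(Counter(y))      # class counts before balancing
print(Counter(y_sm))   # class counts after SMOTE
```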

Algorithmic Level Approach – Ensemble Techniques

3. Bagging Based Technique

Ensemble methods generally use multiple algorithms and techniques to obtain better performance than any learning algorithm used alone. Here the primary goal of ensemble techniques is to improve the performance of the single classifiers [34].

The bagging algorithm generates 'n' different bootstrap samples with replacement, trains the algorithm separately on each bootstrapped sample, and finally aggregates the predictions [33]. This means the bagging algorithm builds multiple training samples, each on a different randomly selected subset of the data.

We have implemented this approach using the Balanced Bagging Classifier from the imblearn library instead of the scikit-learn library's Bagging Classifier, as the latter does not balance each subset of data. In contrast, the Balanced Bagging Classifier, which this research has implemented, allows resampling of each subset of data before each sample is used for training. Through this approach, the results achieved were good compared to the other data balancing techniques used so far. It can be considered a workable approach that handles most data balancing issues for any problem. There are a few advantages of using this technique: it improves the accuracy and stability of the machine learning algorithms, it counters overfitting, and it reduces variance.
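A minimal sketch of this classifier from imbalanced-learn on the same encoded data; the number of estimators and the split are illustrative:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from imblearn.ensemble import BalancedBaggingClassifier

df = pd.read_csv("campus_crime_encoded.csv")               # hypothetical encoded dataset
X, y = df.drop(columns=["crime_type"]), df["crime_type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

# Each bagged decision tree is trained on a resampled (balanced) bootstrap subset
bbc = BalancedBaggingClassifier(n_estimators=50, sampling_strategy="auto", random_state=42)
bbc.fit(X_train, y_train)
print("test accuracy:", bbc.score(X_test, y_test))
```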

4. Boosting Based Technique (XG Boost)

XGBoost (Extreme Gradient Boosting) is a very efficient and advanced implementation of the Stochastic Gradient Boosting algorithm. It offers a range of hyperparameters that can be tuned to give fine control over the model training [35]. Though the XGBoost algorithm performs well even on imbalanced classification datasets in general, it provides a way to tune the training algorithm to pay more attention to the minority class misclassification for the imbalanced datasets [35].

The stochastic gradient boosting algorithm, also referred to as tree boosting, is an effective machine learning algorithm that performs well on a wide range of challenging machine learning problems [35]. Tree boosting has been shown to give powerful results on many standard classification benchmarks [36].

Hence, this algorithm has been implemented in our research to predict crime on-campus.

This approach has proved to be the best algorithm among all the other data balancing algorithms used and discussed above.

One of the primary reasons this algorithm works is hyperparameter tuning, which suits imbalanced data well. Since the XGBoost algorithm is an implementation of stochastic gradient boosting, it works well with multi-class classification problems. Another advantage of XGBoost is that it also works well with small amounts of data, which is true in our scenario. As discussed in the published research paper [36], "The most important factor behind the success of XGBoost is its scalability in all scenarios." All these advantages have proven to hold when the dataset is imbalanced, as in our research.
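A compact sketch of a multi-class XGBoost model on the encoded data, including how per-class probabilities can be turned into the top 3 most probable crime types mentioned earlier; the hyperparameter values are illustrative, not the tuned values from the project:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

df = pd.read_csv("campus_crime_encoded.csv")               # hypothetical encoded dataset
X, y = df.drop(columns=["crime_type"]), df["crime_type"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = XGBClassifier(objective="multi:softprob",           # multi-class with class probabilities
                      n_estimators=200, max_depth=6,
                      learning_rate=0.1, subsample=0.8)     # subsample < 1 -> stochastic boosting
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))

# Indices of the 3 most probable crime classes for each test record
probs = model.predict_proba(X_test)
top3 = probs.argsort(axis=1)[:, -3:][:, ::-1]
```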


CHAPTER 4

CRIME PREDICTION USING PREDICTIVE ANALYTICS

Once there was a good understanding of the dataset and of the problem to be solved, the next step was to choose an algorithmic model that could help tackle the problem. The approach that has been implemented uses the predictive analytics modeling technique, which is a way of building a machine learning model using an algorithm capable of making predictions by learning extensively from the properties of the training dataset.

The predictive analytics technique is primarily the use of historical data to predict future outcomes. Generally, the process uses data and performs analysis, statistical methods, machine learning techniques on that data to create a predictive model for future outcomes [30].

Even PredPol [38], an innovative and proven crime prediction technology, uses the predictive analytics methodology in its tool, which has helped police departments reduce crime by 20%.

We have proposed a solution for the crime prediction problem at Sacramento State based on the predictive analytics approach. Figure 19 [37] depicts the predictive analytics workflow. The first step is to gather the data and explore the data collected from the sources; the next step is preprocessing the data. This workflow has been followed so far in this research project, and the work done for data exploration and data preprocessing has been explained. The next step in the workflow is where we discuss, in this section, the predictive models developed for crime on the Sacramento State campus. The scope of this research is limited to the first three steps of the workflow.

Figure 19: Predictive analytics workflow

Predictive analytics is one of the growing predictive techniques, "capturing the support of various organizations, with a global market projected to reach approximately $10.95 billion by 2022, according to a report issued by Zion Market Research [39]". It has multiple benefits; for example, an organization that manufactures equipment can use this technique to minimize operating costs by anticipating equipment failures, such as elevator sensors signaling the need for maintenance even before the elevator goes down. Predictive analytics can also be used in the crime domain to identify and prevent multiple criminal activities and to understand criminal behavior before any real harm is done [39]. Predictive analytics is valuable for this research in the following ways:

1. It helps us in understanding a crime pattern and crime progression over the years.


2. Data visualization helps in mapping outliers and detecting the data imbalance issue in the dataset.

3. It can help observe the relationship between the input attributes and the target variable.

4. Through predictive analytics, we can bring up some past facts that can prove useful to the Sacramento State police department for better planning and patrolling strategies.

4.1 Crime Prediction using Machine Learning Algorithms

Classification is the process of segregating or distributing the input data into one or more classes or target variables. This process aims to segregate the data based on a set of rules that predict the class the data points belong to. Classification can be mainly of two types:

1. Binary Classification

In binary classification, the task is to categorize the dataset observations into two groups or classes by predicting which class each element belongs to, based on a defined classification rule [40].

2. Multiclass Classification

In multiclass classification, the task is to classify the dataset elements into more than two categories. There are three types of multiclass classifiers [41]:

1. Pigeonhole Classifier: A pigeonhole classifier is a multiclass classifier where every item from the dataset is classified into precisely one of the many classes (pigeonholes).

2. Combination Classifier: In a combination classifier, every item from the dataset is classified into two or more output classes. Unlike the pigeonhole classifier, the combination classifier does not assign a single category to each input.

3. Fuzzy Classifier: Fuzzy classifiers assign each input to one or more output categories by degree. The classifier computes a value between 0 and 1 for each category, and the output is a vector of values over the classes.

The primary focus of this research is to use the pigeonhole classification technique to predict crime on the Sacramento State campus.

The different classification methods implemented for predictive crime analytics in our research are discussed below.

1. K-Nearest Neighbor

In K-Nearest Neighbor (KNN), the data is classified based on the majority vote of the neighbors of that particular new data point, which is yet to be classified. It works by finding the distance between a new point and all the other data points and based on the closest ‘k’ neighbors it can predict the label of the new point by assigning the most frequently occurring label [42]. The elbow method can be used to find the optimal ‘k’ value, or it can be a user-defined value selected intuitively.

For this research, the implementation of KNN was done to identify the crime type by looking at the previous crimes and identifying similar crime patterns and predicting the crime type value based on the ‘k’ nearest neighbors matched.
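A minimal sketch of this step is given below (not the exact project code); X_train, X_test, and y_train are placeholders for the encoded feature matrix and the crime-type labels, and the loop is an assumed cross-validated search over 'k' rather than the specific elbow analysis used in the project.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Search a range of 'k' values and keep the one with the best cross-validated accuracy
best_k, best_score = 1, 0.0
for k in range(1, 31):
    score = cross_val_score(KNeighborsClassifier(n_neighbors=k), X_train, y_train, cv=5).mean()
    if score > best_score:
        best_k, best_score = k, score

# Fit the final model with the chosen 'k' and predict the crime type of unseen points
knn = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
predicted_crime_types = knn.predict(X_test)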


2. Random Forest

Random Forest is a tree-based method for regression and classification problems. A Random Forest classifier performs classification by generating decision trees that use different samples of the data in the training process, and it makes predictions by aggregating the outputs of the individual decision trees. In standard trees, each node's split is determined using the best split among all the variables; in Random Forest, the split at each node is done using the best predictor among a subset of randomly chosen predictors at that node [43]. This strategy performs well compared to other classification algorithms such as Support Vector Machines and Neural Networks. An additional advantage is that it is robust to noise and outliers. Also, implementing a Random Forest is simple, and the training can be easily parallelized [44].
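As an illustration only (the parameter values are assumptions, not the tuned project values), such a Random Forest classifier can be set up with scikit-learn as follows:

from sklearn.ensemble import RandomForestClassifier

# 'n_estimators' trees are grown on bootstrap samples of the training data;
# 'max_features' controls the random subset of predictors considered at each split
rf = RandomForestClassifier(n_estimators=200, max_features="sqrt", random_state=42)
rf.fit(X_train, y_train)
rf_predicted_crime_types = rf.predict(X_test)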

3. Gradient Boosting using XGBoost

Gradient Boosting is a classification and regression machine learning technique. It works with a wide variety of loss functions and generates an ensemble of prediction models. The algorithm constructs additive tree models, and selecting the size of the trees to build is an important consideration in this technique. The accuracy of the model and the speed of execution can be further improved by randomizing the procedure during each iteration of building a tree: a subsample of the training data is randomly chosen from the entire training dataset without replacement and is used to fit the base learner and measure the precision for that iteration [45]. This modification, which incorporates randomness into the original boosting algorithm, is referred to as the Stochastic Gradient Boosting algorithm [45].


For this research, the XGBoost (eXtreme Gradient Boosting) algorithm has been implemented; it is an efficient and advanced implementation of the Stochastic Gradient Boosting algorithm designed for speed and model performance. The application of XGB in machine learning has grown in popularity recently [36] because of the advantages it brings, such as its scalability in all scenarios and its ability to solve machine learning problems accurately. The basic idea is to combine multiple trees that individually have low accuracy into a more precise model. In each iteration, a new tree is added to the model; the gradient boosting algorithm uses gradient descent to generate new trees by moving the base function in the direction that minimizes the loss. The XGB algorithm has a list of configurable hyperparameters that are extremely useful when working on a classification problem. Hyperparameters such as "max_depth", "min_child_weight", "scale_pos_weight", "learning_rate", "gamma", and "objective" can be tuned as per the problem definition:

o "max_depth" specifies the depth of the tree; with higher depth the model can learn relations specific to the sample.

o "min_child_weight" defines the minimum weighted sum required over all the attributes in a child node.

o "scale_pos_weight" is generally used when the data is highly imbalanced, as in our research.

o "learning_rate" makes the XGB model more robust by shrinking the weights at every step.

o "gamma" tunes the minimum loss reduction required to make a node split.

o "objective" specifies the loss function to be minimized; this research uses "multi:softprob" as the objective, which is helpful when the classification is multiclass and a probability value is returned for each class, as in our research.

For this research, all the hyper-parameters listed above have been tuned based on the dataset, and combinations of them were evaluated before selecting the final values. A grid search was performed on the training dataset with different combinations of the hyper-parameters to choose the optimum value for each before training the model; a minimal sketch of such a search is shown below.
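The candidate values in this grid are illustrative assumptions, not the exact values used in the project; X_train and y_train are placeholders for the encoded training data.

from sklearn.model_selection import GridSearchCV
from xgboost import XGBClassifier

# Candidate values for a few of the hyper-parameters discussed above (illustrative only)
param_grid = {
    "max_depth": [3, 5, 7],
    "min_child_weight": [1, 3, 5],
    "learning_rate": [0.05, 0.1, 0.2],
    "gamma": [0, 0.1, 0.3],
}

xgb = XGBClassifier(objective="multi:softprob")
search = GridSearchCV(xgb, param_grid, scoring="neg_log_loss", cv=5)
search.fit(X_train, y_train)

best_xgb = search.best_estimator_   # model refit with the best hyper-parameter combination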

4.2 Crime Prediction using Deep Learning Algorithm (ANN)

An Artificial Neural Network (ANN) has been used in our research as an alternative approach to predict crime based on past data. An ANN is a computational method that mimics the human brain in simulating a task by distributed parallel processing across individual units. These units are referred to as nodes or neurons, and they store functional knowledge in connection weights that are adjusted during learning.

Generally, there are several layers in a neural network. The initial layer, which receives signals from the outside world, is known as the input layer; the final layer, which sends signals to the outside world, is known as the output layer; and the rest are known as hidden layers.


Based on the signal flow between the neurons, the neural networks can be divided into two types, namely:

Feed-forward networks: In this type of network, the signals are propagated directly from the input layer to the output layer, and no feedback connections or cycles are present.

Recurrent networks: In this type of network, the signals can be propagated backward, which means there can be feedback connections or cycles present.

For this project, a feed-forward neural network has been implemented, with one input layer containing 256 neurons connected to three hidden layers alternating with dropout layers, and finally to an output layer. The output layer contains as many neurons as there are crime types to be predicted, i.e., the output classes. Each hidden layer has 128 neurons, and the dropout layers use a rate of 0.3.

Selecting an activation function:

In an ANN, activation functions play a vital role in determining whether the feature representing each neuron is activated. The three most commonly used activation functions are TanH, Sigmoid, and ReLU. For classification problems, ReLU has been shown to perform well and give good accuracy compared to classifiers using the other activation functions, based on research done on the MNIST data [46]. Hence, ReLU was selected as the activation function for predicting crime using the ANN.

Selecting a backpropagation optimizer:

While training a neural network, backpropagation using gradient descent is used. Researchers developed adaptive step-size methods to improve the convergence of gradient descent, of which Adaptive Moment Estimation (Adam) is the most common; the Adam optimizer computes individual learning rates for different parameters [47]. In our research, the Adam optimizer was selected.

Based on these decisions and other hyperparameter optimizations, all the training hyperparameters for the classification model were selected, and a neural network was built to predict the crime type from the training data. A minimal sketch of the network is shown below.
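This Keras sketch follows the layer sizes described above; X_train and the one-hot encoded labels y_train_onehot are placeholders, and the epoch count and batch size are illustrative assumptions rather than the project's exact settings.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout

num_classes = y_train_onehot.shape[1]   # one output neuron per crime type

model = Sequential([
    Dense(256, activation="relu", input_shape=(X_train.shape[1],)),   # input layer, 256 neurons
    Dropout(0.3),
    Dense(128, activation="relu"),   # hidden layer 1
    Dropout(0.3),
    Dense(128, activation="relu"),   # hidden layer 2
    Dropout(0.3),
    Dense(128, activation="relu"),   # hidden layer 3
    Dropout(0.3),
    Dense(num_classes, activation="softmax"),   # probability for each crime type
])

model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
history = model.fit(X_train, y_train_onehot, epochs=50, batch_size=32)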


CHAPTER 5

EXPERIMENTAL RESULTS

This section looks at the results achieved after performing the various crime prediction techniques using the multiple approaches implemented, as explained above.

First, the evaluation metrics used for each of the approaches and the error measurement techniques are described. Then, the results from all the algorithmic approaches implemented are compared.

5.1 Performance Evaluation Metrics

5.1.1 Accuracy

We have used Accuracy as one of the most critical performance evaluation metrics for comparison against the algorithms implemented in our research. Accuracy denotes the number of correctly predicted samples among the total number of input samples.

Figure 20: Performance evaluation metric accuracy

Accuracy is often misused as the only or the primary performance metric, because it is useful only when the classes are nearly balanced. For this research, accuracy was used as one of the evaluation metrics. Meaningful distinctions can still be made between the accuracy values of the algorithms used, because the data preprocessing steps and the data balancing technique, which have a positive effect on model performance, were applied.

5.1.2 Confusion Matrix

The confusion matrix is one of the more intuitive performance evaluation tools, as multiple metrics can be read from it. It is generally used for classification problems, whether the output classes number two or more than two.

The confusion matrix itself is not a performance evaluation metric, but various metrics can be derived from it, such as accuracy and F1-score.

5.1.3 F1-Score

F1-Score is another metric, apart from accuracy, that can be used for evaluating the performance of the models. The F1-Score, as shown in figure 22 below, is the harmonic mean of precision and recall. It ranges from 0 to 1, and a higher value indicates better performance of the model. The F1-Score is suitable for most classification problems since it reflects both the classifier's preciseness and its robustness, i.e., how many of the predicted classes are correct and how many of the actual classes are not missed.

Figure 21: Formula for recall and precision


Figure 22: Formula for calculating F1-Score

We have used the F1-Score in our research as one of the performance metrics for evaluating model precision and robustness. A sketch of computing these metrics is shown below.
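For concreteness, the evaluation metrics above can be computed with scikit-learn as in the following sketch; y_test, predictions, and predicted_probabilities are placeholders for the held-out labels, the model's class predictions, and its class probabilities.

from sklearn.metrics import accuracy_score, f1_score, confusion_matrix, log_loss

acc = accuracy_score(y_test, predictions)                    # fraction of samples predicted correctly
f1_per_class = f1_score(y_test, predictions, average=None)   # harmonic mean of precision and recall, per class
cm = confusion_matrix(y_test, predictions)                   # rows: true classes, columns: predicted classes
loss = log_loss(y_test, predicted_probabilities)             # requires class probabilities, e.g. from predict_proba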

5.1.4 Loss Function for ANN

Since an ANN was implemented for classification on the Sacramento State campus crime dataset, it was essential to understand which loss function would have the best impact on the model. Cross-entropy was used as the loss function in our research; it measures the distance between the predicted output and the actual output and indicates how accurate the model is. Generally, for multiclass categorical classification, categorical cross-entropy is used with the SoftMax activation function, which predicts the probability of each individual class.

5.1.5 Train and Test Data

Generally, to evaluate the metrics of any machine learning or deep learning model, the data has to be split so that the accuracy of the algorithm can be determined. Splitting the original data into two partitions based on a ratio is referred to as splitting into training and testing data. The larger portion is the training data, which is fed to the algorithm to learn from. The other, smaller partition is the testing data, which is used after the model is trained to evaluate its performance using the metrics described above. It is essential to separate the train and test data in such a way that no data from the training sample is repeated in the test sample.

We have applied this concept in this project to divide the data into train and test sets using the sklearn Python library. The data was divided in an 80-20 ratio, where 80% is training data and 20% is testing data. After this split, the training data had 1556 observations and the test data had 389 observations. A minimal sketch of the split is shown below.
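In this sketch, X and y are placeholders for the encoded feature matrix and the crime-type labels; the random seed is an illustrative assumption.

from sklearn.model_selection import train_test_split

# 80% of the observations are used for training and the remaining 20% for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)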

5.2 Comparison of Approach-Wise Results

In this section, the results for each of the algorithms used in all the approaches implemented as part of this research are discussed.

5.2.1 Approach 1: Machine Learning Approach with All Attributes

In this approach, the entire dataset, after performing the preprocessing steps, was considered. The base model in this approach has all the input attributes and 10 output categories, which are the crime types to be predicted.

The results using this approach were not as good as expected. The accuracies of the models were low, as stated below:

1. KNN accuracy and log loss – 39% accuracy and 11.71 log loss

2. XGB accuracy and log loss – 58% accuracy and 1.36 log loss

3. Random Forest accuracy and log loss – 58% accuracy and 2.75 log loss

It can clearly be seen from the above results that the initial models perform poorly, since no preprocessing steps such as data balancing and feature extraction were applied to the data.


5.2.2 Approach 2: Machine Learning Approach with Feature Extraction

In the 2nd approach, a machine learning model was built after performing the necessary data preprocessing steps, with an additional step differentiating it from the 1st approach: feature extraction. After completing feature selection on the data, all input features except "time" were considered, since "time" is a highly varying feature with many unique values and could act as noise for the prediction models.

Table 4: Input features for approach 2

Input Features    Type
address           Categorical
week_number       Categorical
week_day          Categorical
month             Categorical
part_of_day       Categorical
time_hour         Categorical
year              Categorical

As seen in table 4, seven input features were used in this approach. The training and testing data were split 80% - 20%; the training dataset contained 1839 rows and the testing data 460 rows.

In this approach, there was an increase in accuracy. The highest accuracy among all the models was 62%, for the XGB model. However, much more improvement was needed to obtain a more accurate model, since accuracy is not the only consideration; precision, recall, and F1-score must also be examined for the overall performance of the model. Apart from the 62% accuracy, the other metrics were comparatively low. The reason is the high imbalance in the dataset: the classifier predicted correctly only those crime types with a higher count, hence the reduced accuracy and other metrics.

5.2.3 Approach 3: Machine Learning Approach with Data Balance and Aggregation

Since there are only 2299 data points in total, and even fewer remain for training the models after splitting into train and test data, the amount of data is comparatively small for highly accurate results. The other problem seen in this dataset is the class imbalance of the output classes, which is highly skewed, as discussed in the data analysis section. For instance, the crime types 'Theft' and 'Hit and Run' combined have 1530 data points, and the rest are distributed among the 8 other crime types. The difference in the number of data points between the 2nd and 3rd most frequent crime types is 390, which is vast. Data imbalance is a pervasive problem and often yields low accuracy for the minority classes.

This research has implemented various data balancing techniques, as discussed in section 3.3, which is entirely dedicated to data balancing. After performing the different data balancing techniques, the best results were achieved using the Boosting-based data balancing technique discussed above. Data aggregation was also implemented, since the gap between the counts of the bottom five crime categories makes low metrics for those classes inevitable. With this approach, the classes having fewer data points were eliminated to make the data more balanced.

In this approach, the crimes having fewer occurrences were eliminated and only the top 5 crime types were considered. The top 5 crimes were kept as is, and the data points with the remaining crime types were removed. Even after removing the bottom 5 crimes, 1945 data points remained, which is comparatively reasonable.

Then, the data was aggregated to get more accurate and precise results. Different locations with similar properties were combined, for example lot 1 into parking structure 1, lot 2 into parking structure 2, and so on; an illustrative sketch of this step is shown below.
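In this sketch, the column names "address" and "crime_type" and the mapping entries are assumptions based on the examples above, not the exact project code; df is a placeholder for the crime records DataFrame.

# Merge locations with similar properties into one category
location_map = {
    "lot 1": "parking structure 1",
    "lot 2": "parking structure 2",
}
df["address"] = df["address"].replace(location_map)

# Keep only the five most frequent crime types
top5 = df["crime_type"].value_counts().nlargest(5).index
df = df[df["crime_type"].isin(top5)]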

Then, a similar approach was taken, as discussed earlier, by label encoding the features and keeping the important features while discarding the unimportant ones. The features considered for this approach are listed in table 5.

Table 5: Input features for approach 3

Input Features    Type
address           Categorical
week_number       Categorical
week_day          Categorical
month             Categorical
part_of_day       Categorical
year              Categorical
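A minimal sketch of the label-encoding step, assuming the crime records are held in a pandas DataFrame df with the columns listed in table 5 (column names and variable names are placeholders):

from sklearn.preprocessing import LabelEncoder

categorical_features = ["address", "week_number", "week_day", "month", "part_of_day", "year"]
encoders = {}
for col in categorical_features:
    encoders[col] = LabelEncoder()                               # one encoder per column so it can be reused later
    df[col] = encoders[col].fit_transform(df[col].astype(str))   # map each category to an integer code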

Before implementing the prediction models on this much cleaner data, and considering the small number of data points remaining after keeping the top 5 crimes, the next step was to perform data balancing using the Boosting technique. The results of the prediction models are as follows:

As seen in table 6, the best performing algorithm among the three implemented is XGB. The Random Forest model also gives good accuracy, but XGB has a slight advantage because, as discussed, XGB is one of the most suitable algorithms for our research, and this result confirms that choice. Also, the log loss for XGB is the lowest at 0.76; log loss is one of the best metrics for evaluating a classification model with class imbalance, and the lower the log loss, the better the model has performed.

Table 6: Final results of crime prediction models

Machine Learning models                          KNN     XGB     Random Forest
Accuracy                                         0.71    0.78    0.75
F1 Score - burglary (class 0)                    0.07    0.10    0.08
F1 Score - hit and run (class 1)                 0.77    0.80    0.77
F1 Score - possession of substance (class 2)     0.00    0.43    0.38
F1 Score - theft (class 3)                       0.81    0.88    0.86
F1 Score - vandalism (class 4)                   0.06    0.09    0.02
Recall                                           0.36    0.43    0.40
Log loss                                         5.37    0.76    1.61

As seen in table 7, the classification report for the XGB model gives a closer look at all the metrics that are essential for a well-performing model.

Table 7: Classification report for XGB

                                        precision   recall   f1-score   support
burglary                                0.20        0.05     0.10       20
hit and run                             0.73        0.82     0.80       104
possession of controlled substance /
  marijuana or public drunkenness       0.54        0.29     0.43       24
theft                                   0.81        0.93     0.88       215
vandalism                               0.17        0.04     0.09       26
Accuracy                                                     0.78       389
Macro avg.                              0.49        0.43     0.43       389
Weighted avg.                           0.69        0.76     0.71       389


5.2.4 Approach 4: Deep Learning Approach Using ANN

In this approach, as discussed in the implementation phase, an Artificial Neural Network was implemented as a feed-forward classification network, which takes a converted vector of input variables and gives the probability of occurrence of each of the output classes.

Figure 23: Summary of ANN model

The Artificial Neural Network was applied to classify the crime type into the most favorable class based on the predicted probabilities; the ANN predicts the most probable class and thereby categorizes the input. The summary of the ANN model that has been implemented is shown in figure 23. An accuracy of 66% was achieved, which is good for an ANN model trained on a highly unbalanced dataset.

Figure 24: Training loss curve

As seen in figure 24, the training loss starts from 1.25 and gradually reduces to below 0.90 as the epochs increase. From this curve we can validate that the training proceeded well, but the loss could be even lower if all the output classes had roughly the same amount of data.
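Such a curve can be reproduced from the Keras history object returned by model.fit, assuming the ANN sketch above; this is an illustrative plotting snippet, not the exact project code.

import matplotlib.pyplot as plt

plt.plot(history.history["loss"], label="training loss")   # one loss value per epoch
plt.xlabel("epoch")
plt.ylabel("categorical cross-entropy loss")
plt.legend()
plt.show()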

Since the ANN model did not perform as well as the best model, XGB, we have concluded that XGB is the most suitable model for crime prediction on the Sacramento State campus.

5.2.5 Results on Unseen Real Data

To test the crime prediction model, the test data was used to find how accurate the model is. The test data is the actual data partitioned from the dataset based on the train and test ratio, and the accuracy was as reported in the sections above. But testing the accuracy on the test data alone would not give any real understanding of the model. The goal was to feed unseen real data to the model as input variables and observe what the model would predict.

Two sets of unseen real data were given to the best performing model, XGB, and the output was observed. The model was extended to return a probability-based result, meaning there is a probability associated with each output class. The five output classes are:

• ‘burglary’

• ‘hit and run’

• ‘possession of controlled substance/marijuana or public drunkenness’

• ‘theft’

• ‘vandalism’

The XGB model was implemented so that it assigns a probability to each of the classes and shows the top 3 probable crime types for the specific inputs. The unseen inputs provided were the location, the part of the day, and the date. Since the dataset only had values up to 2019, a future date in 2020 was given to the model. A minimal sketch of this step is shown below.
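In this sketch, encoded_input is a placeholder for a single label-encoded input row, and best_xgb is the trained XGB classifier from the earlier sketch; the scikit-learn predict_proba and classes_ interfaces are assumed.

import numpy as np

probabilities = best_xgb.predict_proba(encoded_input)[0]   # one probability per crime type
top3 = np.argsort(probabilities)[::-1][:3]                  # indices of the three most probable crime types
for idx in top3:
    print(best_xgb.classes_[idx], round(float(probabilities[idx]), 2))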

The example inputs and outputs are as shown below:

Example Input 1:

• Location = ‘rec and wellness center’

• Part of the day = ‘night’

• Date = ‘07/01/2020’

Top 3 Probable Crimes:

• ‘theft’ [probability = 0.55]

• ‘possession of controlled substance/marijuana or public drunkenness’ [probability = 0.30]

• ‘burglary’ [probability = 0.05]

Example Input 2:

• Location = ‘parking structure 2’

• Part of the day = ‘morning

• Date = ‘07/01/2020’

Top 3 Probable Crimes:

• ‘hit and run’ [probability = 0.49]

• ‘vandalism’ [probability = 0.25]

• ‘theft’ [probability = 0.10]

As seen from the above results, the model predicts quite differently for each of the input sets. There is no guarantee that a crime will occur, but this gives an idea of the probable crimes that could occur in the future based on the model's training. This could be useful for the Sacramento State police department in preventing crime by taking proactive measures.

5.2.6 Summary of the Results

To conclude the results achieved and select one final result among all the approaches taken, this section summarizes the results. This research discusses four approaches implemented using various algorithms. Three of the four approaches use machine learning algorithms with different attributes and preprocessing steps; the fourth uses the deep learning algorithm ANN as a classifier. The ANN was implemented only after performing all the data preprocessing steps and after achieving good results from the machine learning algorithms. The reasoning is that if a machine learning algorithm performs well after the data preprocessing steps, then the data features finalized by the feature selection process will also be beneficial for the ANN. Hence, the first three approaches discuss machine learning algorithms and the last approach deep learning.

In the first approach, this research applied the machine learning algorithms to the training data without performing feature extraction or data balancing. As discussed, the results in this approach were not as accurate or precise as expected. Table 8 below summarizes the accuracy for all the machine learning algorithms in this approach.

Table 8: Accuracies for machine learning algorithms in 1st approach

Machine Learning models    KNN     XGB     Random Forest
Accuracy                   0.39    0.58    0.58

In the second approach, this research continued the implementation of the machine learning algorithms, but a few features were eliminated from the dataset after performing feature extraction; seven features were selected for training the models. The results achieved using this approach were much better than in the first approach, but since the data was highly skewed, the results were biased toward particular classes. Table 9 below shows the accuracy for all the machine learning algorithms implemented using this approach.


Table 9: Accuracies for machine learning algorithms in 2nd approach

Machine Learning models    KNN     XGB     Random Forest
Accuracy                   0.52    0.62    0.60

In the third approach, this research implemented the machine learning algorithms after performing feature extraction and also applying the data balancing algorithm. Various data balancing techniques were implemented, and the best among them, the Boosting-based technique, was selected and used to balance the dataset. After completing the necessary data preprocessing steps the models were applied, and the results achieved with this approach were the best among all the machine learning approaches. Table 10 below shows the accuracies for the machine learning algorithms implemented using this approach. The accuracies increased because of the number of preprocessing steps performed, such as feature extraction, data aggregation, data preprocessing, and data balancing.

Table 10: Accuracies for machine learning algorithms in 3rd approach

Machine Learning models    KNN     XGB     Random Forest
Accuracy                   0.71    0.78    0.75

In the fourth approach, the deep learning algorithm ANN was implemented as a feed-forward classifier. This algorithm performs well on multiclass classification if the dataset is completely balanced. Since the dataset in this research was not completely balanced, the ANN model gave reasonable results but could not outperform the XGB machine learning algorithm, since XGB is particularly suitable for unbalanced data thanks to the multiple hyperparameters that can be adjusted to the problem. Table 11 below shows the accuracy achieved using the ANN approach.

Table 11: Accuracies for deep learning algorithm in 4th approach

Machine Learning models    ANN
Accuracy                   0.66

Table 12 below shows the results for all the algorithmic models. The results in the table are the best results achieved after performing all the necessary preprocessing and hyperparameter tuning.

Table 12: Results of all algorithmic models

Machine Learning models    KNN     XGB     Random Forest    ANN
Accuracy                   0.71    0.78    0.75             0.66

Since XGB performed the best among all the machine learning and deep learning algorithms combined, this research concludes that the XGB algorithm is one of the best algorithms for crime prediction on campus. This algorithm can be applied in building a smart campus application that predicts crime using past data.


CHAPTER 6

CONCLUSION AND FUTURE WORK

In this work, we have performed applied research on a topic that has received little attention and have provided a detailed analysis of how campus crime can be mitigated and reduced. Through this research, the importance of data preprocessing steps on a small dataset used for a multi-class classification problem was clearly seen. This research has tried to close the gap between machine learning technology and its application to a new problem. There have been significant advancements in machine learning technologies, but it is vital to know which technology applies to what type of problem.

This research has tried to address this issue of finding the best machine learning technologies that can be applied to predict crime on a university campus, and we have taken the Sacramento State campus as the subject of this research.

In this research, in-depth data preprocessing was performed, including interactive data visualizations and several data balancing and data cleaning techniques. After studying the visualization graphs and implementing the data preprocessing steps accordingly, this research concludes that the Boosting-based data balancing technique built on XGBoost provided the stability and robustness in the data needed for applying accurate modeling techniques.

This research also concludes that predictive analytics, currently a popular technique for such prediction problems, showed good results when applied and proved to be a fitting technique for the crime prediction problem in our research. This research has implemented multiple crime prediction algorithms, namely KNN, Random Forest, eXtreme Gradient Boosting, and Artificial Neural Network. Every algorithm applied gave reasonable results, but the eXtreme Gradient Boosting algorithm with hyperparameter tuning was the best among them all. Hence, the research concluded that this algorithm can be applied for prediction on the Sacramento State crime dataset.

This research has discussed the various steps undertaken, starting from retrieving the data, preprocessing the data, and performing data balancing techniques, and finally applying crime prediction approaches using predictive analytics techniques. Many prior approaches by researchers and students have been discussed, but most of those works focused on forecasting crime and generating informative crime patterns at a larger scale using combined data from various sources, such as comprehensive crime data for all campuses. There is hardly any research paper or article that discusses crime prediction techniques for a particular campus using the university police department's dataset; this research is thus unique in this way. The focus was on generating informative crime patterns and developing predictive mechanisms that give accurate predictions based on past campus crime data.

While coming up with a solution to this problem of campus crime, there has been a great deal of trial and error, and there are multiple techniques suitable for such issues. In this section, we discuss how this research could be beneficial in a generic environment for any researcher starting to work on a similar prediction problem on any dataset. The process is shown in figure 25 below:

Figure 25: Flowchart of generic crime prediction technique


The first and foremost step is to gather data and check the data source. If the data is raw, in hard copy files or binders, some prerequisite steps must be performed. First, extract the data into a digital format that can be processed by the language in use, Python. Second, perform data cleaning, which refers to removing any noise or outliers from the extracted dataset in Python.

The next step is common to both data sources, i.e., a raw dataset and a complete dataset that is already formatted and cleaned. This is the data preprocessing step, which was discussed at the beginning of the research report. The main focus here should be to visualize the data and generate informative patterns. One of the crucial steps is to perform data balancing if the data is skewed.

The next step, after analyzing the data through visualization, is to apply a predictive analytics model. We primarily suggest applying KNN, Random Forest, XGB, and ANN to achieve good results. Then compare the results and choose the model that best suits your use case.

This approach has been proven to work on a highly skewed dataset, for example the dataset discussed in this research. For any ongoing research that requires a prediction approach for a dataset, this flowchart could be highly beneficial as a starting point for gathering all the necessary information about the prediction problem and for starting work using predictive analytics as the first option.


Future work can add many enhancements on top of this research, as the data quantity is one of the major concerns of this research. Going forward, the accuracy of the model can be improved by feeding in more data across all the years combined.

The dataset contained minimal attributes. The accuracy could have been enhanced with more features, for example the age of the person who committed the crime, the geographic location of the crime, and information about the arrest. All these data could help find patterns related to the crime and help the Sacramento State police department take necessary actions based on the crime patterns.

To take a further step toward mitigating crime, analysis of video footage or images around the crime area could be highly beneficial. A CNN (Convolutional Neural Network) model, which is well suited to training on image-based data, could be used on such data to gain insights into criminal behavior and reduce crime on campus.


Appendix: University Names and Email Sent

University Names:

South Alabama University
University of North Texas
CSU East Bay
UC Davis
University of South Dakota
Montclair State University
Langston University
Cerritos College
University of Illinois
Oakland University
UC Riverside
Tarleton State University

Email to University:

Hello Sir/Madam,

I am Ronit Naik, a master’s student at Sacramento State University. I am writing to know the research conducted at your campus. I am working on a research-based master’s project on “Crime Prediction on Campus” and would want to know, if to your notice, has there been any kind of research or project which focused on reducing crime or predicting crime on campus at your university? If there was any, could it be possible for you to share a copy of the research. I am working on including it in my research project as a survey.

Thank you! Ronit Naik


REFERENCES

1. College Factual, “California State University - Sacramento Crime and Safety in 2019,” [Online]. Available: https://www.collegefactual.com/colleges/california-state-university-sacramento/student-life/crime/ [Accessed: March 2020].

2. California State University, Sacramento, “About the Campus Police, Sacramento State University,” [Online]. Available: https://www.csus.edu/campus-safety/police-department/units-functions/ [Accessed: September 2020].

3. A.Wood and E. S. Levine, “A Recommendation Engine to Aid in Identifying Crime Patterns,” 49 INFORMS J. ON APPLIED ANALYTICS 154, 2019.

4. A. G. Ferguson, “Policing Predictive Policing,” Washington University Law Review, Volume 94, Issue 5, 2017.

5. G. O. Mohler, M. B. Short, P. J. Brantingham, F. P. Schoenberg and G. E. Tita “Self-Exciting Point Process Modeling of Crime,” Journal of the American Statistical Association, 106:493, 100-108, 2011.

6. City of Sacramento, “What is a Smart City,” [Online]. Available: https://www.cityofsacramento.org/Smart-City [Accessed: September 2020].

7. City of Sacramento, “Public Safety,” [Online]. Available: https://www.cityofsacramento.org/Smart-City/Public-Safety [Accessed: September 2020].

8. Media Advisory/News Release, “City of Sacramento Police Department,” [Online]. Available: https://apps.sacpd.org/Releases/liveview.aspx?reference=20161207-159 [Accessed: September 2020].

9. S. John, “The correlates of campus crime: An analysis of reported crimes on college and university campuses,” Journal of Criminal . 22. 51-61, 1994.

10. Carr, J. L, “American College Health Association campus violence white paper,” American College Health Association, Baltimore, MD, 2005, February.

11. B. S. Fisher, J. Sloan, F. T. Cullen and C. Lu. “CRIME IN THE IVORY TOWER: THE LEVEL AND SOURCES OF STUDENT VICTIMIZATION*,” 36: 671-710, 1998.


12. R. Liao, X. Wang, L. Li and Z. Qin, “A novel serial crime prediction model based on Bayesian learning theory,” 2010 International Conference on Machine Learning and Cybernetics, Qingdao, pp. 1757-1762, 2010.

13. Yamuna, S. and N. Bhuvaneswari, “Datamining Techniques to Analyze and Predict Crimes,” The International Journal of Engineering And Science (IJES), 2012.

14. Campus Safety and Security, “U.S. Department of Education,” [Online]. Available: https://ope.ed.gov/campussafety/ [Accessed: July 2020].

15. Crime Data, Sacramento County Open Data, “Sacramento County Sheriff's Department,” [Online]. Available: https://data.saccounty.net/datasets/9a7f2df25a584ff9b55db274704ad7c9_0 [Accessed: July 2020].

16. Kotsiantis, Sotiris and Kanellopoulos, Dimitris and Pintelas, “Data Preprocessing for Supervised Learning.” International Journal of Computer Science. 1. 111-117, 2006.

17. C. M. Teng. “Correcting noisy data,” 16th International Conf. on Machine Learning, pages 239–248. San Francisco, 1999.

18. A. Yunus and M. Masum. “A Context Free Spell Correction Method using Supervised Machine Learning Algorithms,” International Journal of Computer Applications 176(27):36-41, June 2020.

19. J. B. Chandrasekar, S. Murugesh and V. R. Prasadula, “Deriving Big Data insights using Data Visualization Techniques,” 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, pp. 724-731, 2019.

20. Sacramento Open Data, “City of Sacramento,” [Online]. Available: https://data.cityofsacramento.org/ [Accessed: May 2020].

21. Langley P., “Selection of relevant features in machine learning,” Proceedings of the AAAI Fall Symposium on Relevance, 1–5, 1994.

22. S. Paul, “Beginner’s Guide to Feature Selection in Python,” [Online]. Available: https://www.datacamp.com/community/tutorials/feature-selection-python [Accessed: April 2020].

23. P. P. Ippolito, “Feature Selection Techniques”, [Online]. Available: https://towardsdatascience.com/feature-selection-techniques-1bfab5fe0784 [Accessed: April 2020].


24. A. Desarda, “Getting Data ready for modelling: Feature engineering, Feature Selection, Dimension Reduction (Part 1),” [Online]. Available: https://towardsdatascience.com/getting-data-ready-for-modelling-feature-engineering-feature-selection-dimension-reduction-77f2b9fadc0b [Accessed: April 2020].

25. Potdar, Kedar and Pardawala, Taher and Pai, Chinmay, “A Comparative Study of Categorical Variable Encoding Techniques for Neural Network Classifiers.” International Journal of Computer Applications. 175. 7-9. 10.5120/ijca2017915495, 2017.

26. N. Singh, “Overview of Encoding Methodologies,” [Online]. Available: https://www.datacamp.com/community/tutorials/encoding-methodologies [Accessed: April 2020].

27. R. E. Bellman, “Dynamic Programming,” Princeton University Press, Princeton, NJ, USA, 1957.

28. Von Eye, Alexander, and C. C. Clogg, “Categorical variables in developmental research: Methods of analysis.” Elsevier, 1996.

29. L. Arora, “Types of Categorical Data Encoding Schemes.” [Online] Available: https://medium.com/analytics-vidhya/types-of-categorical-data-encoding-schemes-a5bbeb4ba02b [Accessed: May 2020].

30. Vimalraj, Spelmen and Dr.R, Porkodi. “A Review on Handling Imbalanced Data.” Proceeding of 2018 IEEE International Conference on Current Trends toward Converging Technologies, Coimbatore, India, 2018.

31. JoonhoGon, “ RHSBoost: Improving classification performance in imbalance data,” Computational Statistics & Data Analysis. Volume 111, July 2017, Pages 1- 13.

32. Dr.D.Ramyachitra and P.Manikandan, “Imbalanced Dataset Classification and Solutions: A Review,” International Journal of Computing and Business Research (IJCBR). Volume 5 Issue 4 July 2014.

33. Upasana, “Imbalanced Data : How to handle Imbalanced Classification Problems,” [Online] Available: https://www.analyticsvidhya.com/blog/2017/03/imbalanced-data-classification/ [Accessed: May 2020].

34. W. Badr, “Having an Imbalanced Dataset? Here Is How You Can Fix It,” [Online]. Available: https://towardsdatascience.com/having-an-imbalanced-dataset-here-is-how-you-can-solve-it-1640568947eb [Accessed: May 2020].


35. J. Brownlee, “How to Configure XGBoost for Imbalanced Classification,” [Online] Available: https://machinelearningmastery.com/xgboost-for-imbalanced-classification/ [Accessed: May 2020].

36. T. Chen, and C. Guestrin, “XGBoost: A Scalable Tree Boosting System.” XGBoost Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

37. Math Works, “Predictive Analytics,” [Online]. Available: https://www.mathworks.com/discovery/predictive-analytics.html [Accessed: June 2020].

38. Predpol, “Predictive Crime Analytics,” [Online]. Available: https://www.predpol.com/predicting-crime-predictive-analytics/ [Accessed: June 2020].

39. J. Edwards, “What is Predictive Analytics? Transforming data into future insights,” [Online]. Available: https://www.cio.com/article/3273114/what-is-predictive-analytics-transforming-data-into-future-insights.html [Accessed: June 2020].

40. Chaitra P.C and Dr.R. S. Kumar “A Review of Multi-Class Classification Algorithms.” International Journal of Pure and Applied Mathematics. Volume 118 No. 14 2018, 17-26.

41. B. Kolo, “Binary and Multiclass Classification,” books.google.com.

42. A. Kumar, A. Verma, G. Shinde, Y. Sukhdeve and N. Lal, “Crime Prediction Using K-Nearest Neighboring Algorithm,” 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, pp. 1-4.

43. A. Liaw, M. Wiener, “Classification and regression by randomforest,” R news, vol. 2, no. 3, pp. 18–22, 2002.

44. K. Sundharakumar and N. Bhalaji, “A Study on Classification Algorithms for Crime Records,” 873-880. 10.1007/978-981-10-3433-6_104, 2016.

45. Friedman, J.H., “Stochastic gradient boosting.” Comput. Stat. Data Anal. 38(4), 367–378, 2002.

46. F. Ertam and G. Aydin, “Data classification with deep learning using Tensorflow,” 2017 International Conference on Computer Science and Engineering (UBMK), Antalya, pp. 755-758, 2017.


47. Diederik P. Kingma and Jimmy Ba, “Adam: A Method for Stochastic Optimization,” 2017.

48. I. Kawthalkar, S. Jadhav, D. Jain and A. V. Nimkar, "A Survey of Predictive Crime Mapping Techniques for Smart Cities," 2020 National Conference on Emerging Trends on Sustainable Technology and Engineering Applications (NCETSTEA), Durgapur, India, 2020.