Data Mining Techniques Applying on Educational Dataset to Evaluate Learner Performance Using Cluster Analysis

EJERS, European Journal of Engineering Research and Science Vol. 3, No. 11, November 2018 Data Mining Techniques Applying on Educational Dataset to Evaluate Learner Performance Using Cluster Analysis Minimol Anil Job increase student's retention rate, and increase students‟ Abstract—Due to the advancement of technology in this consistency in various academic activities. digital era, academic institutions are bringing out graduates as well as generating enormous amounts of data from their systems. Hidden information and hidden patterns in large II. LITERATURE REVIEW datasets can be efficiently analyzed with data mining techniques. Application of data mining techniques improves Data Mining (DM) is described as a process of discover the performance of many organizational domains and the or extracting interesting knowledge from large amounts of concept can be applied in the education sectors for their data stored in multiple data sources such as file systems, performance evaluation and improvement. Understanding the databases, data warehouses etc. [3],[18]. Defining business value of the collected data it can be used for classifying and predicting the students’ behavior, academic characteristic of data mining is ‘Big data’ [50]. Data mining performance, dropout rates, and monitoring progression and is defined as “the process of discovering “hidden message,” retention. This paper discusses how application of data mining patterns and knowledge within large amounts of data and of can help the higher education institutions by enabling better making predictions for outcomes or behaviour” [30]. understanding of the student data and focuses to consolidate ‘Pattern’ is a single record that consists of input and output. clustering algorithms as applied in the context of educational Data mining is the process of analyzing data from different data mining. perspectives and summarizing it into useful information. Index Terms—Data Mining; Educational Data Mining; Data mining functionalities are classified into two broad Clustering, Cluster Analysis; Performance; Evaluation, categories as descriptive and predictive ones [2] [21]. The Prediction. main functions of data mining are applying various methods and algorithms in order to discover and extract patterns of stored data [48]. The implementation of data mining I. INTRODUCTION methods and tools for analyzing data available at Data mining is the knowledge discovery process, which educational institutions, defined as Educational Data Mining involves discovering hidden patterns, messages and (EDM) [22] is a relatively new stream in the data mining knowledge within large datasets and the process of research. Educational data mining is a research area falls analyzing outcomes or behaviors in various business under data mining. A survey of the application of data domains. Knowledge discovery and data mining can be mining techniques to various educational systems is given in considered as tools for decision-making as well as Romero and Ventura [41]. In an another work of Romero et organizational effectiveness. Data mining is a systematic al. [42],[44] on educational data mining, the application of process of extracting relevant knowledge using various various data mining techniques on data collected from the techniques from large structured and unstructured data. activities of students who use Moodle e-learning course There are a variety of different data mining techniques and management system is discussed. approaches available such as clustering, classification, and Brijesh Kumar Baradwaj and Saurabh Pal [9] describes association rule mining etc. One of the most challenging the main objective of higher education institutions is to tasks of the higher educational institutions is the provision provide quality education to its students. One way to and utilization of up to date information for their achieve highest level of quality in higher education system sustainability, to monitor student performance and to is by discovering knowledge for prediction regarding measure institutional effectiveness. In this paper, the enrolment of students in a particular course, detection of researcher analyses and presents the use of data mining abnormal values in the result sheets of the students, techniques in higher education sectors to evaluate learner prediction about students’ performance and so on. Romero performance. The available data in higher education and Ventura, in 2010 [43] published a paper in IEEE, which institutions can be evaluated using data mining techniques listed most common tasks in the educational environment and it will lead into discovering hidden patterns, messages resolved through data mining and some of the most and knowledge. Based on the results from the applied promising future lines. Educational Data Mining community techniques the institution management can take measures to remained focused in North America, Western Europe, and allocate relevant resources more effectively, make effective Australia/New Zealand. They mentioned that there is a decisions on educational academic activities to improve considerable scope for an increase in educational data students' performance, increase students' learning behavior, mining’s scientific Influence. They also suggested developing more unified and collaborative studies [43]. Published on November 21, 2018. There are increasing research interests in using data mining M. A. Job is Assistant Professor, Arab Open University, Kingdom of in education. This new emerging field, called Educational Bahrain. (e-mail: [email protected]) DOI: http://dx.doi.org/10.24018/ejers.2018.3.11.966 25 EJERS, European Journal of Engineering Research and Science Vol. 3, No. 11, November 2018 Data Mining, concerns with developing methods that discover knowledge from data originating from educational environments [4],[24]. Educational Data Mining uses many techniques such as Decision Trees, Neural Networks, Naïve Bayes, K- Nearest neighbor, and many others. Han and Kamber [21] describes data mining software that allow the users to analyze data from different dimensions, categorize it and summarize the relationships which are identified during the mining process [34]. Data mining tools predict future trends and behaviors, Fig. 2. Concept of Educational Data Mining allowing institution to make proactive, knowledge-driven decisions [11]. The automated, prospective analyses offered The major components of a data mining system are data by data mining move beyond the analyses of past events source, data warehouse, data mining engine, knowledge provided by retrospective tools typical of decision support base and pattern evaluation module [30]. It shows that our systems. Data mining tools can answer institution questions system is using the historical data from the data warehouse that traditionally were too time consuming to resolve server and then training the data after applying various pre- [10],[40]. processing techniques [23]. Data mining tasks can be done in two different ways; predictive or descriptive. Data processing undergoes two types of functions namely III. APPLYING DATA MINING IN EDUCATION clustering and classification. We can apply clustering in A. Knowledge Discovery Process predictive model or classification in descriptive model to the dataset. Following are the steps in Data mining process [25],[26]: Understand application domain C. Classification Create target dataset Classifying data into a fixed number of groups and using Data cleaning and transformation it for categorical variables is known as classification [33]. Apply data mining algorithm Classification can be classified into two types: Supervised Interpret, evaluate and visualize patterns and Unsupervised. When the objects or cases are known in Manage discovered knowledge advance is called supervised classification whereas Fig. 1 shows data mining as a step in an iterative unsupervised classification means the objects or cases are knowledge discovery process. not known in advance. The following algorithm can be used for classification model [19],[1]. · Decision/Classification tree · K-nearest neighbour classifier · Rule-based methods, · Statistical analysis, genetic algorithms · Bayesian classification · Neural Networks · Memory-based reasoning · Support vector machines D. Clustering Clustering is grouping similar objects. Clustering is defined as a process of grouping a set of physical or abstract Fig. 1. Step in an iterative knowledge discovery process object into a class of similar objects [38]. According to Larose [29] cluster does not classify, estimate or predict the B. Educational Data Mining value of target variables but segment the entire data into Educational Data Mining is an emerging discipline, homogeneous subgroups. Heterogeneous population is concerned with developing methods for exploring the classified into number of homogenous subgroups or clusters unique types of data that come from educational settings, are referred as clustering [6]. Furthermore, clustering task is and using those methods to better understand students, and an unsupervised classification. Clustering is a process where the settings which they learn in [28],[35]. The emergent the data divides into groups called as clusters such that field of EDM examines the unique ways of applying data objects in one cluster are very much similar to each other mining techniques education institutions academic and objects in different clusters are very much dissimilar to problems. Fig. 1 shows the concept of EDM. each

Data Mining Techniques Applying on Educational Dataset to Evaluate Learner Performance Using Cluster Analysis

Applying Educational Data Mining to Explore Students' Learning

New Potentials for Data-Driven Intelligent Tutoring System Development and Optimization

Automl Feature Engineering for Student Modeling Yields High Accuracy, but Limited Interpretability

Educational Data Mining Using Improved Apriori Algorithm

A Survey of Educational Data-Mining Research

Predicting and Understanding Success in an Innovation-Based Learning Course

Mining Educational Data to Analyze Students' Performance

Predicting Student Success Using Data Generated in Traditional Educational Environments

Receiver Operating Characteristic (ROC) Area Under the Curve (AUC): a Diagnostic Measure for Evaluating the Accuracy of Predictors of Education Outcomes1,2 Alex J

Implementing Automl in Educational Data Mining for Prediction Tasks

Embedding Machine Learning Algorithm Models in Decision Support System in Predicting Student Academic Performance Using Enrollment and Admission Data

Educational Data Mining: an Advance for Intelligent Systems in Education