Mining Educational Data to Analyze Students' Performance

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011 Mining Educational Data to Analyze Students‟ Performance Brijesh Kumar Baradwaj Saurabh Pal Research Scholor, Sr. Lecturer, Dept. of MCA, Singhaniya University, VBS Purvanchal University, Rajasthan, India Jaunpur-222001, India Abstract— The main objective of higher education institutions is There are increasing research interests in using data mining to provide quality education to its students. One way to achieve in education. This new emerging field, called Educational Data highest level of quality in higher education system is by Mining, concerns with developing methods that discover discovering knowledge for prediction regarding enrolment of knowledge from data originating from educational students in a particular course, alienation of traditional environments [3]. Educational Data Mining uses many classroom teaching model, detection of unfair means used in techniques such as Decision Trees, Neural Networks, Naïve online examination, detection of abnormal values in the result Bayes, K- Nearest neighbor, and many others. sheets of the students, prediction about students’ performance and so on. The knowledge is hidden among the educational data Using these techniques many kinds of knowledge can be set and it is extractable through data mining techniques. Present discovered such as association rules, classifications and paper is designed to justify the capabilities of data mining clustering. The discovered knowledge can be used for techniques in context of higher education by offering a data prediction regarding enrolment of students in a particular mining model for higher education system in the university. In course, alienation of traditional classroom teaching model, this research, the classification task is used to evaluate student’s performance and as there are many approaches that are used for detection of unfair means used in online examination, data classification, the decision tree method is used here. detection of abnormal values in the result sheets of the By this task we extract knowledge that describes students’ students, prediction about students‟ performance and so on. performance in end semester examination. It helps earlier in The main objective of this paper is to use data mining identifying the dropouts and students who need special attention methodologies to study students‟ performance in the courses. and allow the teacher to provide appropriate advising/counseling. Data mining provides many tasks that could be used to study Keywords-Educational Data Mining (EDM); Classification; the student performance. In this research, the classification task Knowledge Discovery in Database (KDD); ID3 Algorithm. is used to evaluate student‟s performance and as there are many approaches that are used for data classification, the decision I. INTRODUCTION tree method is used here. Information‟s like Attendance, Class The advent of information technology in various fields has test, Seminar and Assignment marks were collected from the lead the large volumes of data storage in various formats like student‟s management system, to predict the performance at the records, files, documents, images, sound, videos, scientific data end of the semester. This paper investigates the accuracy of and many new data formats. The data collected from different Decision tree techniques for predicting student performance. applications require proper method of extracting knowledge II. DATA MINING DEFINITION AND TECHNIQUES from large repositories for better decision making. Knowledge discovery in databases (KDD), often called data mining, aims Data mining, also popularly known as Knowledge at the discovery of useful information from large collections of Discovery in Database, refers to extracting or “mining" data [1]. The main functions of data mining are applying knowledge from large amounts of data. Data mining techniques various methods and algorithms in order to discover and extract are used to operate on large volumes of data to discover hidden patterns of stored data [2]. Data mining and knowledge patterns and relationships helpful in decision making. While discovery applications have got a rich focus due to its data mining and knowledge discovery in database are significance in decision making and it has become an essential frequently treated as synonyms, data mining is actually part of component in various organizations. Data mining techniques the knowledge discovery process. The sequences of steps have been introduced into new fields of Statistics, Databases, identified in extracting knowledge from data are shown in Machine Learning, Pattern Reorganization, Artificial Figure 1. Intelligence and Computation capabilities etc. 63 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011 Knowledge but it becomes costly so clustering can be used as preprocessing approach for attribute subset selection and classification. C. Predication Regression technique can be adapted for predication. Regression analysis can be used to model the relationship between one or more independent variables and dependent variables. In data mining independent variables are attributes already known and response variables are what we want to predict. Unfortunately, many real-world problems are not simply prediction. Therefore, more complex techniques (e.g., logistic regression, decision trees, or neural nets) may be necessary to forecast future values. The same model types can often be used for both regression and classification. For example, the CART (Classification and Regression Trees) decision tree algorithm can be used to build both classification trees (to classify categorical response variables) and regression trees (to forecast continuous response variables). Neural networks too can create both classification and regression models. D. Association rule Association and correlation is usually to find frequent item set findings among large data sets. This type of finding helps businesses to make certain decisions, such as catalogue design, Figure 1: The steps of extracting knowledge from data cross marketing and customer shopping behavior analysis. Various algorithms and techniques like Classification, Association Rule algorithms need to be able to generate rules Clustering, Regression, Artificial Intelligence, Neural with confidence values less than one. However the number of Networks, Association Rules, Decision Trees, Genetic possible Association Rules for a given dataset is generally very Algorithm, Nearest Neighbor method etc., are used for large and a high proportion of the rules are usually of little (if knowledge discovery from databases. These techniques and any) value. methods in data mining need brief mention to have better E. Neural networks understanding. Neural network is a set of connected input/output units and A. Classification each connection has a weight present with it. During the Classification is the most commonly applied data mining learning phase, network learns by adjusting weights so as to be technique, which employs a set of pre-classified examples to able to predict the correct class labels of the input tuples. develop a model that can classify the population of records at Neural networks have the remarkable ability to derive meaning large. This approach frequently employs decision tree or neural from complicated or imprecise data and can be used to extract network-based classification algorithms. The data classification patterns and detect trends that are too complex to be noticed by process involves learning and classification. In Learning the either humans or other computer techniques. These are well training data are analyzed by classification algorithm. In suited for continuous valued inputs and outputs. Neural classification test data are used to estimate the accuracy of the networks are best at identifying patterns or trends in data and classification rules. If the accuracy is acceptable the rules can well suited for prediction or forecasting needs. be applied to the new data tuples. The classifier-training F. Decision Trees algorithm uses these pre-classified examples to determine the set of parameters required for proper discrimination. The Decision tree is tree-shaped structures that represent sets of algorithm then encodes these parameters into a model called a decisions. These decisions generate rules for the classification classifier. of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square B. Clustering Automatic Interaction Detection (CHAID). Clustering can be said as identification of similar classes of G. Nearest Neighbor Method objects. By using clustering techniques we can further identify dense and sparse regions in object space and can discover A technique that classifies each record in a dataset based on overall distribution pattern and correlations among data a combination of the classes of the k record(s) most similar to it attributes. Classification approach can also be used for in a historical dataset (where k is greater than or equal to 1). effective means of distinguishing groups or classes of object Sometimes called the k-nearest neighbor technique. 64 | P a g e www.ijacsa.thesai.org (IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 2, No. 6, 2011 III. RELATED WORK Pandey and Pal [10] conducted study on the student Data mining in higher education

Mining Educational Data to Analyze Students' Performance

Applying Educational Data Mining to Explore Students' Learning

New Potentials for Data-Driven Intelligent Tutoring System Development and Optimization

Automl Feature Engineering for Student Modeling Yields High Accuracy, but Limited Interpretability

Educational Data Mining Using Improved Apriori Algorithm

A Survey of Educational Data-Mining Research

Predicting and Understanding Success in an Innovation-Based Learning Course

Predicting Student Success Using Data Generated in Traditional Educational Environments

Receiver Operating Characteristic (ROC) Area Under the Curve (AUC): a Diagnostic Measure for Evaluating the Accuracy of Predictors of Education Outcomes1,2 Alex J

Implementing Automl in Educational Data Mining for Prediction Tasks

Embedding Machine Learning Algorithm Models in Decision Support System in Predicting Student Academic Performance Using Enrollment and Admission Data

Educational Data Mining: an Advance for Intelligent Systems in Education

Association Rules Mining from the Educational Data of ESOG Web-Based Application Stefanos Ougiaroglou, Giorgos Paschalis