educare: a decision support system for anticipating behaviors among educational actors

Cláudia Antunes

Instituto Superior Técnico, Av Rovisco Pais, 1049-001 Lisboa, Portugal

Abstract. Decision support systems play a core role in data management, due to their ability to provide the tools needed to help in the decision-making process. Education is just another application field where those systems may contribute to improvement. The educare project aims to design a decision support system specifically created to enable the discovery of hidden information about students’ and teachers’ performance.

Keywords: Decision support systems, Educational data mining, Multi-dimensional model for education

1 Introduction

The educational process produces large amounts of data, which can and should be used to understand its actors’ behaviors and to identify the causes of failure and success. Although data have been collected for years, even in digital storage, education poses a set of particular challenges with respect to data analysis and information discovery. Educational Data Mining (EDM) [1] is an emerging discipline with the goal of applying data mining (DM) techniques to data that come from educational settings, such as computer-based tutoring systems or the traditional teaching process. In either case, the data hide students’ usual behaviors, process definitions and coordination, teaching strategies and so on. In all these cases, data are framed in a particular and well-defined context, which can and should be used to better understand them. The reason for creating a dedicated discipline is the identification of a set of peculiarities that characterize educational data, in particular: the temporality of records, the impact of previous events (data) on future results, and the impact of context on behaviors. Together, these three factors mean that applying basic DM tools (like decision trees or association rules) does not adequately answer the questions posed by education professionals. See for example [2], where classification is performed under new considerations, using estimations instead of fully recorded data.

The educare project provides a tool that follows the best practices in decision support, creating a data warehouse and developing the adequate mining operations to reach those goals. In this paper, we describe the educare data warehouse, presenting its general architecture and stating its main dimensions and facts. Each is described in detail, with particular attention to its granularity.
The rest of this paper is structured as follows: Section 2 describes the educational process considered in educare; Section 3 presents the system architecture and the data architecture, giving special attention to the main dimensions and to the different star schemas in the data warehouse. The paper closes with conclusions and guidelines for the tools that will operate over the data warehouse.

2 The Educational Process in educare

The educare project aims to provide the educational community with a prototype of a decision support system that covers the major aspects of educational data analysis, from students’ behaviors to teachers’ strategies, including the organization of programs and subjects. The basic idea is to consider two main entities in the educational context, actors (students and teachers) and curricular units, and to deal with different levels of abstraction for actors (individual, working group and set of actors) and for curricular units (subjects, groups of subjects and programs). Using this conceptual framework, it will be possible to understand the entire educational process, and to prevent and correct problematic situations whenever possible. The distinction between educare and other systems lies in the fact that each educational entity can be addressed by similar approaches, since all of them share the educational context. Indeed, the strategy is to create a system that can be guided by contextual information (background knowledge), in order to anticipate failure situations from either the actors’ or the curricular units’ point of view.

2.1 Business processes

The educational process can be seen as a cycle (as shown in Figure 1), which involves students’ participation, teaching process and quality assurance.

Figure 1 Educational process (cycle: student participation, teaching process, quality assurance)

Accordingly, the educare project considers three main business processes in education: student enrollment and evaluation (SEE), teaching activity (TA) and quality assurance (QA). The first process comprises five atomic activities: application, admittance, subject enrollment, lesson attendance and item evaluation.

Figure 2 Students' enrollment and evaluation process (activities: application, admittance, subject enrollment, lesson attendance, item evaluation)

The second process currently covers only teaching activities, although in the future it may also cover educators’ research and management tasks. The last process corresponds to quality assurance, covering both the organization of subjects and teaching performance.

3 System architecture

Like any decision support system, educare must deal with historical and consolidated data, stored in a way that enables their analysis, and should naturally follow the typical architecture of this kind of system. Figure 3 presents the generic architecture of the educare system (see [3]), which is centered on a data warehouse.

Figure 3 educare architecture (diagram labels: Operational Data, External Data, Data Warehouse, Metadata, Knowledge Base, Data Exploration, Data & Information Visualization, Educational Data Mining)

The data warehouse should be fed by operational data and refreshed every curricular term, in order to remain up to date. This storage should follow the multi-dimensional model, to ease data analysis and to allow data to be explored at different granularities. In this manner, the development of mining processes will be easier, since data are available at the level of detail needed. The interaction with the data warehouse will be distributed, and done through three modules: the data exploration, the data mining, and the data and information visualization tools. Unlike the typical decision support system architecture, educare will maintain an additional repository, hereafter called the knowledge base (KB). This repository will contain an ontology dedicated to the educational context and process, and should have as instances the information discovered by the mining tools, such as sequential patterns describing frequent behaviors, or simply knowledge about curricular plans. Figure 4 presents the detailed architecture of the system, including its back room.

Figure 4 System detailed architecture with back room and front room (diagram labels: Back Room, Front Room, Operational data, Extraction, Data Staging Area, Transformation, Loading, Presentation Servers, Data Marts with Aggregated Data, Data Marts with Atomic Data, DW Bus, Data Exploration Tools, Data & Information Visualization, Educational Data Mining, Knowledge Base)

In order to fill the data warehouse, the system has to provide an additional tool for extracting, transforming and loading the data from the operational sources into the data warehouse, hereafter called the educare ETL tool. This tool comprises services for extracting the data from operational data sources, for transforming them into the most adequate formats and for loading the transformed data into the data warehouse. These services should handle both the first batch (already existing data) and subsequent incremental updates. The entire transformation path between the operational data sources and the data warehouse will be documented in the metadata catalog, enabling later maintenance and extension of the data warehouse.
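The two loading modes mentioned above (first batch and later incremental updates) could be sketched as follows. This is only a minimal illustration under stated assumptions, not the educare ETL tool: the record layout, the use of the term as the unit of incremental loading, and the names `transform` and `load` are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical operational record: (student_nr, subject_code, term, grade).
Record = Tuple[str, str, str, float]


def transform(record: Record) -> Dict[str, object]:
    """Normalise one operational record into a warehouse row (assumed layout)."""
    student_nr, subject_code, term, grade = record
    return {
        "student": student_nr.strip(),
        "subject": subject_code.strip().upper(),
        "term": term,
        "grade": round(float(grade), 1),
    }


def load(warehouse: List[Dict[str, object]], source: Iterable[Record],
         loaded_terms: Set[str]) -> int:
    """Load only the records whose term has not been loaded before.

    With an empty `loaded_terms` this acts as the first batch load;
    on later calls it acts as an incremental update.
    """
    rows = [transform(r) for r in source if r[2] not in loaded_terms]
    warehouse.extend(rows)
    loaded_terms.update(str(row["term"]) for row in rows)
    return len(rows)
```

Calling `load` once with all existing records and then again after a new term has been added to the source loads only the rows of the new term on the second call.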

3.1 Data architecture in educare

In order to represent all those processes and atomic activities, the data will be organized around seven main dimensions: Student, Teacher, Subject, Program, Term, Curricular day and QAItem. Five additional dimensions will also be considered: Lesson type, Curricular topic, Subject QA survey, Teaching QA survey and Working group.

Student The student dimension is responsible for holding identification, personal and application data. It is the central key for understanding all issues related to students and their results.

Teacher Similarly, teacher dimension represents educators, storing their identification, personal and professional data. It may be used for analyzing teaching behaviors.

Subject The subject dimension gathers the data about each course available at some school, from its goals to its description, including the responsibility hierarchy (department and group of disciplines).

Term The term dimension is the main temporal entity, corresponding to the smallest granularity of interest in the main business processes. It holds only descriptive data and the academic year.

Curricular day The curricular day dimension is the most detailed time unit, and describes each day of a specific term. This dimension should contain attributes that distinguish the characteristics of different days in the term, such as the ordinal of the day in the term, the day of the week, etc.

Evaluation Item An evaluation item is an item that students have to complete in order to be evaluated in some subject. Examples of such items are exams, homework or laboratory reports.

Program Program dimension represents data about graduation or post-graduation programs, and should include attributes like description, scientific area, curricular areas.

QAItem Each Quality Assurance Item (QAItem for short) in the dimension corresponds to one different issue evaluated in the quality assurance process, and can be seen as a question in QA surveys.

Lesson type Lesson type dimension records the different kinds of lessons, such as theoretical, practical or laboratories.

Curricular topic Curricular topics correspond to the basic units of knowledge to present on lessons and to evaluate students.

Subject QA survey The Subject QA survey is a degenerate dimension, whose only purpose is to aggregate the QAAs for each QAItem on each tuple Subject – Term.

Teaching QA survey Similarly, the Teaching QA survey serves to aggregate the QAAs for each QAItem about teachers’ performance on each tuple Teacher – Subject – Term – Lesson type.

Working group Finally, the working group dimension represents the group of students that perform some collaborative work in the context of some Subject in a specific Term, when Students are evaluated on some Evaluation item.
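As an illustration of the Curricular day dimension described above, the rows for one term could be generated as in the following sketch. The attribute names and the function `curricular_days` are assumptions made for the example; holiday attributes are left out because they would require an external calendar.

```python
from datetime import date, timedelta
from typing import Dict, List


def curricular_days(first_day: date, last_day: date) -> List[Dict[str, int]]:
    """Build one row per day of a term (attribute names are assumed)."""
    rows = []
    current = first_day
    while current <= last_day:
        rows.append({
            "day": current.day,
            "month": current.month,
            "year": current.year,
            "week": current.isocalendar()[1],      # ISO week number
            "day_of_week": current.isoweekday(),   # 1 = Monday ... 7 = Sunday
            "day_of_term": (current - first_day).days + 1,  # ordinal in the term
        })
        current += timedelta(days=1)
    return rows
```

Each row carries the ordinal of the day in the term and the day of the week, the two distinguishing attributes named in the description of the dimension.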

3.2 The BUS Matrix

Following Kimball’s methodology for designing a data warehouse [4], Figure 5 shows the bus matrix where business processes are crossed with available dimensions, giving a first clue about the granularities for each data mart.

COMMON DIMENSIONS (columns): Term, Calendar Day, Student, Teacher, Subject, Program, Evaluation Item, Lesson Type, Curricular Topic, QA Item, Subject QA Survey, Teacher QA Survey, Working Group

BUSINESS PROCESSES (rows, with their marks):
Application / Admittance: X X X
Subject Enrollment: X X X X
Lesson attendance: X X X X X X X X
Item evaluation: X X X X X X X X
Teaching: X X X X
Subject QA: X X X X X
Teaching QA: X X X X X X X

Figure 5 The educare bus matrix
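A bus matrix can be represented as a mapping from each fact table to the dimensions it references. The sketch below is illustrative only: the dimension sets for three fact tables are read from the foreign keys listed in the schema figures (not from Figure 5 itself), and the dictionary layout and function names are assumptions.

```python
from typing import Dict, List, Set

# Dimensions referenced by three fact tables, taken from their foreign keys
# in the schema figures; the dictionary layout itself is an assumption.
FACT_DIMENSIONS: Dict[str, Set[str]] = {
    "Lesson attendance": {"Student", "Calendar day", "Subject",
                          "Lesson type", "Curricular topic", "Teacher"},
    "Item evaluation": {"Curricular topic", "Calendar day", "Subject",
                        "Student", "Evaluation item", "Working group"},
    "Teaching": {"Subject", "Lesson type", "Teacher", "Term", "Program"},
}


def bus_matrix(facts: Dict[str, Set[str]]) -> List[str]:
    """Render the facts-by-dimensions matrix as text lines with X marks."""
    dims = sorted(set().union(*facts.values()))
    width = max(len(name) for name in facts)
    lines = [" " * width + " | " + " | ".join(dims)]
    for fact, used in facts.items():
        cells = [("X" if d in used else "").center(len(d)) for d in dims]
        lines.append(fact.ljust(width) + " | " + " | ".join(cells))
    return lines


def shared_dimensions(facts: Dict[str, Set[str]], a: str, b: str) -> Set[str]:
    """Dimensions common to two fact tables (candidates for conformed dimensions)."""
    return facts[a] & facts[b]
```

Printing `"\n".join(bus_matrix(FACT_DIMENSIONS))` gives a text version of the matrix, and `shared_dimensions` shows which dimensions two data marts have in common.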

The data in the educare data warehouse will be organized following the multidimensional model, considering two kinds of data: atomic and aggregated data. Next, we will introduce the multi-dimensional model for the three main business processes.

3.3 Student Enrollment and Evaluation Schema

The student enrollment and evaluation process comprises five fact tables, which share the set of previously described dimensions.

Fact tables
Student enrollment: PK,FK1 program; PK,FK2 student; PK,FK3 lastTerm; PK,FK4 firstTerm; final grade; concluded?
Subject enrollment: PK,FK1 subject; PK,FK2 term; PK,FK3 student; approved?; grade
Subject result: PK,FK1 subjectID; PK,FK2 term; PK,FK3 student; nr of enrollments; grade
Item evaluation: PK,FK1 topic; PK,FK2,FK5 day; PK,FK3 subject; PK,FK4 student; PK item; PK working group
Lesson attendance: PK,FK1 student; PK,FK2 day; PK,FK3 subject; PK,FK4 lesson; PK,FK5 topic; PK,FK6 teacher

Dimensions
Program: PK programID; description; alias; name; type
Term: PK termID; description; academic year; season; first day; last day
Subject: PK subjectID; code; name; ECTS; group of disciplines; scientific area; department
Student: PK studentID; student nr; name; document nr; birthdate; birthplace; nationality; address-street; address-postalcode; previous school; application grade; app exam 1; app exam 1 grade; app exam 2; app exam 2 grade
Teacher: PK teacherID; teacher nr; name; document nr; birthdate; birthplace; nationality; address-street; address-postalcode; PhD area; PhD year; MSc area; MSc year; graduation area; graduation year; category; category year; department; scientific area
Evaluation item: PK itemID; description; type
Calendar day: PK dayID; day; month; year; week; day of week; day of term; epoch; holiday?; next2holiday?
Curricular topic: PK topicID; description
Lesson type: PK lessonID; description; type

Figure 6 Student enrollment and evaluation constellation schema

The Student enrollment fact table records the final grade obtained by a student on finishing his or her enrollment in a specific program. This enrollment has two distinct dates: the first term, when the student enrolls in the program, and the last term in which the student studies under the program. The Item evaluation fact table records student results on each evaluation item performed per subject. Its time granularity corresponds to the calendar day, and it also covers evaluation items performed in groups. In this manner, working group is just a foreign key to the degenerate dimension Working group. The Lesson attendance fact table lists the lessons given on a particular day by some teacher in the context of some subject and attended by a specific student. Each lesson is also characterized by the lesson type and the curricular topic covered. The other two fact tables record student enrollment in a specific subject. The Subject enrollment fact table records each enrollment of a student in the subject in a particular term. The Subject result fact table records the result reached by the student in the subject, and the number of times the student has enrolled in that subject. In this manner, the second fact table is an aggregation of the first one.
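The relation between the last two fact tables can be sketched with a small in-memory database. This is a hedged example rather than the educare implementation: the column names, the sample rows, and the choice of the best grade and latest term as "the result" are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE subject_enrollment (
    subject  TEXT,
    term     TEXT,
    student  TEXT,
    approved INTEGER,   -- 1 if approved in that term, else 0
    grade    REAL,
    PRIMARY KEY (subject, term, student)
);
INSERT INTO subject_enrollment VALUES
    ('DB', '2009-1', 's1', 0, 8.0),
    ('DB', '2010-1', 's1', 1, 13.0),
    ('DB', '2010-1', 's2', 1, 16.0);
""")

# Aggregate enrollments into one row per (subject, student).
# Keeping the latest term and the best grade is an assumption of this sketch.
subject_result = conn.execute("""
    SELECT subject,
           student,
           MAX(term)  AS term,
           COUNT(*)   AS nr_of_enrollments,
           MAX(grade) AS grade
    FROM subject_enrollment
    GROUP BY subject, student
    ORDER BY subject, student
""").fetchall()
```

Here student `s1` enrolled twice in the same subject, so the aggregated row reports two enrollments together with a single result.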

3.4 Teaching Activity Schema

The teaching activity process is represented in a data mart with the same name. Each row in the Teaching fact table captures the participation of a teacher in some subject, playing some role (teaching some lesson type) in a particular term in the context of some program.

Fact table
Teaching: PK,FK1 term; PK,FK2 subject; PK,FK3 lesson; PK,FK4 teacher; PK,FK5 program; PK QA survey; QA grade; QA stdev; QA nr answers

Dimensions
Subject: PK subjectID; code; name; ECTS; group of disciplines; scientific area; department
Program: PK programID; description; alias; name; type
Term: PK termID; description; academic year; season; first day; last day
Teacher: PK teacherID; teacher nr; name; document nr; birthdate; birthplace; nationality; address-street; address-postalcode; PhD area; PhD year; MSc area; MSc year; graduation area; graduation year; category; category year; department; scientific area
Lesson type: PK lessonID; description; type

Figure 7 Teaching activity
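The grain of the Teaching fact table (one row per teacher, subject, lesson type, term and program) can be illustrated with the following sketch. It is a simplification under stated assumptions: subject, term and program are kept as plain text keys instead of full dimension tables, and the sample rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE teacher (teacherID INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE lesson_type (lessonID INTEGER PRIMARY KEY, description TEXT);
CREATE TABLE teaching (
    subject  TEXT,
    lesson   INTEGER REFERENCES lesson_type (lessonID),
    teacher  INTEGER REFERENCES teacher (teacherID),
    term     TEXT,
    program  TEXT,
    qa_grade REAL,
    PRIMARY KEY (subject, lesson, teacher, term, program)
);
INSERT INTO teacher VALUES (1, 'Teacher A');
INSERT INTO lesson_type VALUES (1, 'theoretical'), (2, 'laboratory');
INSERT INTO teaching VALUES
    ('DB', 1, 1, '2010-1', 'P1', 7.5),
    ('DB', 2, 1, '2010-1', 'P1', 8.0);
""")

# The roles played by each teacher: one row per lesson type taught.
roles = conn.execute("""
    SELECT t.name, f.subject, f.term, l.description
    FROM teaching AS f
    JOIN teacher AS t ON t.teacherID = f.teacher
    JOIN lesson_type AS l ON l.lessonID = f.lesson
    ORDER BY l.description
""").fetchall()
```

The composite primary key enforces the grain: the same teacher can appear twice for the same subject and term only when the lesson type (the role) differs.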

3.5 Quality Assurance Schema

Finally, the quality assurance process comprises three fact tables: Teaching QA survey, Subject QA survey and Subject execution. Each tuple in the first fact table records the average grade for a specific QAItem, reached by some teacher when teaching some subject in a specific term for a given lesson type, in the context of some program. The set of QAItems for the same tuple teacher-subject-term-program can be aggregated by an entry in the QASurvey degenerate dimension. In a similar way, the second fact table collects data for subject surveys, with each tuple corresponding to the average grade obtained by a particular subject, in the context of some program in a term. Likewise, QAItems referencing the same tuple subject-program-term can be aggregated by an entry in the QASurvey dimension. Lastly, the Subject execution fact table records some metrics at the same granularity: the number of students enrolled, the number of approvals, the number of evaluated students, the rate of approval over evaluated students and the rate of approval over enrolled students.

Fact tables
Teaching QA survey: PK,FK1 subject; PK,FK2 teacher; PK,FK3 program; PK,FK4 lesson; PK,FK6 term; PK QA survey; avg grade; stdev; nr answers
Subject QA survey: PK,FK1 item; PK,FK2 subject; PK,FK3 term; PK,FK4 program; PK QA survey; avg grade; stdev; nr answers
Subject execution: PK,FK1 term; PK,FK2 teacher; PK,FK3 subject; PK QA survey; avg grade; stdev; nr answers; nr evaluated; nr enrollments; nr approved; rate evaluated; rate approved

Dimensions
Lesson type: PK lessonID; description; type
Teacher: PK teacherID; teacher nr; name; document nr; birthdate; birthplace; nationality; address-street; address-postalcode; PhD area; PhD year; MSc area; MSc year; graduation area; graduation year; category; category year; department; scientific area
QA item: PK itemID; question; group
Program: PK programID; description; alias; name; type
Term: PK termID; description; academic year; season; first day; last day
Subject: PK subjectID; code; name; ECTS; group of disciplines; scientific area; department

Figure 8 Quality assurance constellation schema
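The two approval rates kept in the Subject execution fact table follow directly from its three counts. A minimal sketch is given below; the class and attribute names are assumptions, and returning 0.0 when a denominator is zero is a choice made only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubjectExecution:
    """Counts for one subject execution (names are assumed)."""
    nr_enrollments: int
    nr_evaluated: int
    nr_approved: int

    @property
    def rate_approved_over_evaluated(self) -> float:
        # Share of evaluated students who were approved.
        return self.nr_approved / self.nr_evaluated if self.nr_evaluated else 0.0

    @property
    def rate_approved_over_enrolled(self) -> float:
        # Share of enrolled students who were approved.
        return self.nr_approved / self.nr_enrollments if self.nr_enrollments else 0.0
```

For instance, with 100 enrolled, 80 evaluated and 60 approved students, the first rate is 60/80 and the second is 60/100.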

4 Conclusions and Future Work

At first glance, the architecture proposed for the educare decision support system does not introduce anything new for a system with these purposes. Indeed, the general architecture discussed is essentially the standard one, but the data architecture was designed specifically to support several mining processes. The first mining process supported by this architecture is the discovery of frequent students’ behaviors, since all steps in the educational process, from the point of view of students, are recorded at the right granularities. Not only is each path followed by a student recorded, but so are all evaluation items, with their correspondence to the curricular topics covered. In this manner, works like [5] [6] [2] can be fed by our system. From the point of view of teachers, their participation in subjects is also recorded, complemented with quality assurance metrics, giving data for identifying frequent behaviors among teachers and predicting their future outcomes, as in the work described in [7] and [8]. The execution of subjects is also described in detail, with students’ individual results recorded and aggregated at additional granularities. Finally, data about working groups can give valuable information about students’ contributions to collaborative work, for example through the use of social mining methods.

References

1. Baker, B., ed.: Educational data mining 2008. (2008)
2. Antunes, C.: Anticipating Students' Failure as Soon as Possible. In: The Handbook of Educational Data Mining. CRC Publisher (2010) 353-363
3. Kimball, R., Ross, M., Thornthwaite, W., Mundy, J., Becker, B.: The Data Warehouse Lifecycle Toolkit 2nd edn. John Wiley & Sons (2008)
4. Kimball, R., Ross, M.: The Data Warehouse Toolkit - the complete guide to dimensional modeling. Wiley Computer Publishing (2002)
5. Antunes, C.: Acquiring Background Knowledge for Intelligent Tutoring Systems. In: Int'l Conf Educational Data Mining, Montreal, pp.18-27 (2008a)
6. Antunes, C.: Mining Models for Failing Behaviors. In: Int'l Conf on Intelligent Systems Design and Applications, Pisa (2009)
7. Barracosa, J., Antunes, C.: Mining Teaching Behaviors from Pedagogical Surveys. In: Int'l Conf in Educational Data Mining (EDM 2011), Eindhoven, pp.329-330 (2011)
8. Barracosa, J., Antunes, C.: Anticipating Teachers' Performance. In: Proc Int'l Workshop on Knowledge Discovery on Educational Data @ ACM Int'l Conf on Knowledge Discovery and Data Mining (KDDinED@KDD), San Diego, USA (2011)