Chapter 5: Predictive Modelling in Teaching and Learning

Christopher Brooks 1, Craig Thompson 2

1 School of Information, University of Michigan, USA
2 Department of Computer Science, University of Saskatchewan, Canada

DOI: 10.18608/hla17.005

ABSTRACT

This article describes the process, practice, and challenges of using predictive modelling in teaching and learning. In both the fields of educational data mining (EDM) and learning analytics (LA), predictive modelling has become a core practice of researchers, largely with a focus on predicting student success as operationalized by academic achievement. In this chapter, we provide a general overview of considerations when using predictive modelling, the steps that an educational data scientist must consider when engaging in the process, and a brief overview of the most popular techniques in the field.

Keywords: Predictive modeling, machine learning, educational data mining (EDM), feature selection, model evaluation

Predictive analytics are a group of techniques used to make inferences about uncertain future events. In the educational domain, one may be interested in predicting a measurement of learning (e.g., student academic success or skill acquisition), teaching (e.g., the impact of a given instructional style or specific instructor on an individual), or other proxy metrics of value for administrations (e.g., predictions of retention or course registration). Predictive analytics in education is a well-established area of research, and several commercial products now incorporate it in the learning content management system (e.g., D2L,¹ Starfish Retention Solutions,² Ellucian,³ and Blackboard⁴). Furthermore, specialized companies (e.g., Blue Canary,⁵ Civitas Learning⁶) now provide predictive analytics consulting and products for higher education.

In this chapter, we introduce the terms and workflow related to predictive modelling, with a particular emphasis on how these techniques are being applied in teaching and learning. While a full review of the literature is beyond the scope of this chapter, we encourage readers to consider the conference proceedings and journals associated with the Society for Learning Analytics Research (SoLAR) and the International Educational Data Mining Society (IEDMS) for more examples of applied educational predictive modelling.

First, it is important to distinguish predictive modelling from explanatory modelling.⁷ In explanatory modelling, the goal is to use all available evidence to provide an explanation for a given outcome. For instance, observations of age, gender, and socioeconomic status of a learner population might be used in a regression model to explain how they contribute to a given student achievement result. The intent of these explanations is generally to be causal (versus correlative alone), though results presented using these approaches often eschew experimental studies and rely on theoretical interpretation to imply causation (as described well by Shmueli, 2010).

In predictive modelling, the purpose is to create a model that will predict the values (or class, if the prediction does not deal with numeric data) of new data based on observations. Unlike explanatory modelling, predictive modelling is based on the assumption that a set of known data (referred to as training instances in the literature) can be used to predict the value or class of new data based on observed variables (referred to as features in predictive modelling literature). Thus the principal difference between explanatory modelling and predictive modelling lies in the application of the model to future events: explanatory modelling does not aim to make any claims about the future, while predictive modelling does.

More casually, explanatory modelling and predictive modelling often have a number of pragmatic differences when applied to educational data. Explanatory modelling is a post-hoc and reflective activity aimed at generating an understanding of a phenomenon. Predictive modelling is an in situ activity intended to make systems responsive to changes in the underlying data. It is possible to apply both forms of modelling to technology in higher education. For instance, Lonn and Teasley (2014) describe a student-success system built on explanatory models, while Brooks, Thompson, and Teasley (2015) describe an approach based upon predictive modelling. While both methods intend to inform the design of intervention systems, the former does so by building software based on theory developed during the review of explanatory models by experts, while the latter does so using data collected from historical log files (in this case, clickstream data).

The largest methodological difference between the two modelling approaches is in how they address the issue of generalizability. In explanatory modelling, all of the data collected from a sample (e.g., students enrolled in a given course) is used to describe a population more generally (e.g., all students who could or might enroll in a given course). The issues related to generalizability are largely based on sampling techniques. These include ensuring that the sample represents the general population by reducing selection bias, often through random or stratified sampling, and determining the amount of power needed to ensure an appropriate sample, through an analysis of population size and the levels of error the investigator is willing to accept. In a predictive model, a hold out dataset is used to evaluate the suitability of a model for prediction, and to protect against the overfitting of models to data being used for training. There are several different strategies for producing hold out datasets, including k-fold cross validation, leave-one-out cross validation, randomized subsampling, and application-specific strategies.

With these comparisons made, the remainder of this chapter will focus on how predictive modelling is being used in the domain of teaching and learning, and provide an overview of how researchers engage in the predictive modelling process.

1 http://www.d2l.com/
2 http://www.starfishsolutions.com/
3 http://www.ellucian.com/
4 http://www.blackboard.com/
5 http://bluecanarydata.com/
6 http://www.civitaslearning.com/
7 Shmueli (2010) notes a third form of modelling, descriptive modelling, which is similar to explanatory modelling but in which there are no claims of causation. In the higher education literature, we would suggest that causation is often implied, and the majority of descriptive analyses are actually intended to be used as causal evidence to influence decision making.

PREDICTIVE MODELLING WORKFLOW

Problem Identification

In the domain of teaching and learning, predictive modelling tends to sit within a larger action-oriented educational policy and technology context, where institutions use these models to react to student needs in real time. The intent of the predictive modelling activity is to set up a scenario that would accurately describe the outcomes of a given student assuming no new intervention. For instance, one might use a predictive model to determine when a given individual is likely to complete their academic degree. Applying this model to individual students will provide insight into when they might complete their degrees assuming no intervention strategy is employed. Thus, while it is important for a predictive model to generate accurate scenarios, these models are not generally deployed without an intervention or remediation strategy in mind.

Strong candidate problems for a successful predictive modelling approach are those in which there are quantifiable characteristics of the subject being modelled, a clear outcome of interest, the ability to intervene in situ, and a large set of data. Most importantly, there must be a recurring need, such as a class being offered year after year, where the historical data on learners (the training set) is indicative of future learners (the testing set).

Conversely, several factors make predictive modelling more difficult or less appropriate. For example, both sparse and noisy data present challenges when trying to create accurate predictive models. Data sparsity, or missing data, can occur for a variety of reasons, such as students choosing not to provide optional information. Noisy data occurs when a measurement fails to capture the intended data accurately, such as determining a student’s location from their IP address when some students are using virtual private networks (proxies used to circumvent region restrictions, a not uncommon practice in countries such as China). Finally, in some domains, inferences produced by predictive models may be at odds with ethical or equitable practice, such as using models of student at-risk predictions to limit the admissions of said students (exemplified in Stripling et al., 2016).

Data Collection

In predictive modelling, historical data is used to generate models of relationships between features. One of the first activities for a researcher is to identify the outcome variable (e.g., grade or achievement level) as well as the suspected correlates of this variable (e.g., gender, ethnicity, access to given resources). Given the situational nature of the modelling activity, it is important to choose only those correlates available at or before the time at which an intervention might be employed. For instance, a midterm examination grade might be predictive of a final grade in the course, but if the intent is to intervene before the midterm, this data value should be left out of the modelling activity.

In time-based modelling activities, such as the prediction of a student’s final grade, it is common for multiple models to be created (e.g., Barber & Sharkey, 2012), each corresponding to a different time period and set of observed variables. For instance, one might generate predictive models for each week of the course, incorporating into each model the results of weekly quizzes, student demographics, and the amount of engagement the students have had with digital resources to date in the course.

While state-based data, such as data about demographics (e.g., gender, ethnicity), relationships (e.g., course enrollments), psychological measures (e.g., grit, as in Duckworth, Peterson, Matthews, & Kelly, 2007, and aptitude tests), and performance (e.g., standardized test scores, grade point averages) are important for educational predictive models, it is the recent rise of big event-driven data collections that has been a particularly powerful enabler of predictive models (see Alhadad et al., 2015 for a deeper discussion). Event data is largely student activity-based, and is derived from the learning technologies that students interact with, such as learning content management systems, discussion forums, active learning technologies, and video-based instructional tools. This data is large and complex (often in the order of millions of database rows for a single course), and requires significant effort to convert into meaningful features.

Of pragmatic consideration to the educational researcher is obtaining access to event data and creating the necessary features required for the predictive modelling process. The issue of access is highly context-specific, and depends on institutional policies and processes as well as governmental restrictions (such as FERPA in the United States). The issue of converting complex data (as is the case with event-based data) into features suitable for predictive modelling is referred to as feature engineering, and is a broad area of research itself.

Classification and Regression

In statistical modelling, there are generally four types of data considered: categorical, ordinal, interval, and ratio. Each type of data differs with respect to the kinds of relationships, and thus mathematical operations, that can be derived from individual elements. In practice, ordinal variables are often treated as categorical, and interval and ratio variables are considered numeric. Categorical values may be binary (such as predicting whether a student will pass or fail a course) or multivalued (such as predicting which of a given set of possible practice questions would be most appropriate for a student). Two distinct classes of algorithms exist for these applications: classification algorithms are used to predict categorical values, while regression algorithms are used to predict numeric values.

Feature Selection

In order to build and apply a predictive model, features that correlate with the value to predict must be created. When choosing what data to collect, the practitioner should err on the side of collecting more information rather than less, as it may be difficult or impossible to add additional data later, but removing information is typically much easier. Ideally, there would be some single feature that perfectly correlates with the chosen outcome prediction. However, this rarely occurs in practice. Some learning algorithms make use of all available attributes to make predictions, whether they are highly informative or not, whereas others apply some form of variable selection to eliminate the uninformative attributes from the model.

Depending on the algorithm used to build a predictive model, it can be beneficial to examine the correlation between features, and either remove highly correlated attributes (the multicollinearity problem in regression analyses), or apply a transformation to the features to eliminate the correlation. Applying a learning algorithm that naively assumes independence of the attributes can result in predictions with an over-emphasis on the repeated or correlated features. For instance, if one is trying to predict the grade of a student in a class and uses as attributes both in-class attendance on a given day and whether the student asked a question on that day, it is important for the researcher to acknowledge that the two features are not independent (e.g., a student could not ask a question if they were not in attendance). In practice, the dependencies between features are often ignored, but it is important to note that some techniques used to clean and manipulate data⁸ may rely upon an assumption of independence. By determining an informative subset of the features, one can reduce the computational complexity of the predictive model, reduce data storage and collection requirements, and aid in simplifying predictive models for explanation.

8 The authors share an anecdote of an analysis that fell prey to the dangers of assuming independence of attributes when using resampling techniques to boost certain classes of data, in this instance the synthetic minority over-sampling technique (Chawla, Bowyer, Hall, & Kegelmeyer, 2002). In that case, missing data with respect to city and province resulted in a dataset containing geographically impossible combinations, reducing the effectiveness of the attributes and lowering the accuracy of the model.
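The correlation check described in this section can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a method taken from the chapter: the feature names, the six-student values, the greedy keep-first strategy, and the 0.9 threshold are all assumptions made for the example.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear information
    return cov / math.sqrt(var_x * var_y)

def drop_correlated(features, threshold=0.9):
    """Greedily keep a feature only if |r| with every kept feature is below threshold."""
    kept = []
    for name in features:  # dicts preserve insertion order
        if all(abs(pearson(features[name], features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical per-student features for six students (assumed values).
features = {
    "days_attended":   [10, 8, 12, 3, 7, 11],
    "questions_asked": [5, 4, 6, 1, 3, 6],    # closely tracks attendance
    "quiz_average":    [60, 85, 70, 75, 90, 65],
}

kept = drop_correlated(features)
print(kept)  # prints ['days_attended', 'quiz_average']
```

Here the questions feature is dropped because it is almost perfectly correlated with attendance, mirroring the attendance example in the text. Which member of a correlated pair to keep is a judgment call; this sketch simply keeps whichever appears first.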

Missing values in a dataset may be dealt with in several ways, and the approach used depends on whether data is missing because it is unknown or because it is not applicable. The simplest approach is to remove either the attributes (columns) or the instances (rows) that have missing values. There are drawbacks to both of these techniques. For example, in domains where the total amount of data is quite small, the impact of removing even a small portion of the dataset can be significant, especially if the removal of some data exacerbates an existing class imbalance. Likewise, if all attributes have a small handful of missing values, then attribute removal will remove all of the data, which would not be useful. Instead of deleting rows or columns with missing data, one can also infer the missing values from the other known data. One approach is to replace missing values with a “normal” value, such as the mean of the known values. A second approach is to fill in missing values in records by finding other similar records in the dataset, and copying the missing values from their records.

The impact of missing data is heavily tied to the choice of learning algorithm. Some algorithms, such as the naïve Bayes classifier, can make predictions even when some attributes are unknown; the missing attributes are simply not used in making a prediction. The nearest neighbour classifier relies on computing the distance between two data points, and in some implementations the assumption is made that the distance between a known value and a missing value is the largest possible distance for that attribute. Finally, when the C4.5 decision tree algorithm encounters a test on an instance with a missing value, the instance is divided into fractional parts that are propagated down the tree and used for a weighted vote. In short, missing data is an important consideration that both regularly occurs and is handled differently depending upon the machine learning method and toolkit employed.

Methods for Building Predictive Models

After collecting a dataset and performing attribute selection, a predictive model can be built from historical data. In the most general terms, the purpose of a predictive model is to make a prediction of some unknown quantity or attribute, given some related known information. This section will briefly introduce several such methods for building predictive models.

A fundamental assumption of predictive modelling is that the relationships that exist in the data gathered in the past will still exist in the future. However, this assumption may not hold up in practice. For example, it may be the case that (according to the historical data collected) a student’s grade in Introductory Calculus is highly correlated with their likelihood of completing a degree within 4 years. However, if there is a change in the instructor of the course, the pedagogical technique employed, or the degree programs requiring the course, this course may no longer be as predictive of degree completion as was originally thought. The practitioner should always consider whether patterns discovered in historical data should be expected in future data.

A number of different algorithms exist for building predictive models. With educational data, it is common to see models built using methods such as these:

1. Linear Regression predicts a continuous numeric output from a linear combination of attributes.
2. Logistic Regression predicts the odds of two or more outcomes, allowing for categorical predictions.
3. Nearest Neighbours Classifiers use only the closest labelled data points in the training dataset to determine the appropriate predicted labels for new data.
4. Decision Trees (e.g., C4.5 algorithm) are repeated partitions of the data based on a series of single attribute “tests.” Each test is chosen algorithmically to maximize the purity of the classifications in each partition.
5. Naïve Bayes Classifiers assume the statistical independence of each attribute given the classification, and provide probabilistic interpretations of classifications.
6. Bayesian Networks feature manually constructed graphical models and provide probabilistic interpretations of classifications.
7. Support Vector Machines use a high dimensional data projection in order to find a hyperplane of greatest separation between the various classes.
8. Neural Networks are biologically inspired algorithms that propagate data input through a series of sparsely interconnected layers of computational nodes (neurons) to produce an output. Increased interest has been shown in neural network approaches under the label of deep learning.
9. Ensemble Methods use a voting pool of either homogeneous or heterogeneous classifiers. Two prominent techniques are bootstrap aggregating, in which several predictive models are built from random sub-samples of the dataset, and boosting, in which successive predictive models are designed to account for the misclassifications of the prior models.

Most of these methods, and their underlying software implementations, have tunable parameters that change the way the algorithm works depending upon expectations of the dataset. For instance, when building decision trees, a researcher might set a minimum leaf size or maximum tree depth parameter in order to ensure some level of generalizability.

Numerous software packages are available for building predictive models, and choosing the right package depends highly on the researcher’s experience, the desired classification or regression approach, and the amount of data and data cleaning required. While a comprehensive discussion of these platforms is outside the scope of this chapter, the freely available and open-source package Weka (Hall et al., 2009) provides implementations of a number of the previously mentioned modelling methods, does not require programming knowledge to use, and has associated educational materials including a textbook (Witten, Frank, & Hall, 2011) and a series of free online courses (Witten).

While the breadth of techniques covered within a given software package has led to it being commonplace for researchers (including educational data scientists) to publish tables of classification accuracies for a number of different methods, the authors caution against this. Once a given technique has shown promise, time is better spent reflecting on the fundamental assumptions of classifiers (e.g., with respect to missing data or dataset imbalance), exploring ensembles of classifiers, or tuning the parameters of particular methods being employed. Unless the intent of the research activity is to compare two statistical modelling approaches specifically, educational data scientists are better off tying their findings to new or existing theoretical constructs, leading to a deepening of understanding of a given phenomenon. Sharing data and analysis scripts in an open science fashion provides better opportunity for small technique iterations than cluttering a publication with tables of (often) uninteresting precision and recall values.

Evaluating a Model

In order to assess the quality of a predictive model, a test dataset with known labels is required. The predictions made by the model on the test set can be compared to the known true labels of the test set in order to assess the model. A wide variety of measures is available to compare the similarity of the known true labels and the predicted labels. Some examples include prediction accuracy (the raw fraction of test instances correctly classified), precision, and recall.

Often, when approaching a predictive modelling problem, only one omnibus set of data is available for building. While it may be tempting to reuse this same dataset as a test set to assess model quality, the performance of the predictive model will be significantly higher on this dataset than would be seen on a novel dataset (referred to as overfitting the model). Instead, it is common practice to “hold out” some fraction of the dataset and use it solely as a test set to assess model quality.

The simplest approach is to remove half of the data and reserve it for testing. However, there are two drawbacks to this approach. First, by reserving half of the data for testing, the predictive model will only be able to make use of half of the data for model fitting. Generally, model accuracy increases as the amount of available data increases. Thus, training using only half of the available data may result in predictive models with poorer performance than if all the data had been used. Second, our assessment of model quality will only be based on predictions made for half of the available data. Generally, increasing the number of instances in the test set would increase the reliability of the results. Instead of simply dividing the data into training and testing partitions, it is common to use a process of k-fold cross validation in which the dataset is partitioned at random into k segments; k distinct predictive models are constructed, with each model training on all but one of the segments, and testing on the single held out segment. The test results are then pooled from all k test segments, and an assessment of model quality can be performed. The important benefits of k-fold cross validation are that every available data point can be used as part of the test set, no single data point is ever used in both the training set and test set of the same classifier at the same time, and the training sets used are nearly as large as all of the available data.

An important consideration when putting predictive modelling into practice is the similarity between the data used for training the model and the data available when predictions need to be made. Often in the educational domain, predictive models are constructed using data from one or more time periods (e.g., semesters or years), and then applied to student data from the next time period. If the features used to construct the predictive model include factors such as students’ grades on individual assignments, then the accuracy of the model will depend on how similar the assignments are from one year to the next. To get an accurate assessment of model performance, it is important to assess the model in the same manner as it will be used in situ: build the predictive model using data available from one year, and then construct a testing set consisting of data from the following year, instead of dividing data from a single year into training and testing sets.
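The k-fold procedure and the pooled measures described in this section can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the ten (midterm score, outcome) records are invented, a one-feature nearest neighbour rule stands in for whichever classifier is being assessed, and the function names are hypothetical.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Partition range(n) at random into k disjoint, nearly equal test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def nearest_neighbour(train, x):
    """Predict the label of the closest training row (1-NN on one numeric feature)."""
    return min(train, key=lambda row: abs(row[0] - x))[1]

def cross_validate(data, k=5, seed=0):
    """Pool (true, predicted) label pairs from k models, each tested on its held out fold."""
    pooled = []
    for fold in k_fold_indices(len(data), k, seed):
        held_out = set(fold)
        train = [row for i, row in enumerate(data) if i not in held_out]
        pooled += [(data[i][1], nearest_neighbour(train, data[i][0])) for i in fold]
    return pooled

def scores(pairs, positive="pass"):
    """Return (accuracy, precision, recall) for pooled (true, predicted) pairs."""
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical records: (midterm score, final outcome).
data = [(35, "fail"), (42, "fail"), (48, "fail"), (51, "fail"), (58, "pass"),
        (63, "pass"), (67, "pass"), (74, "pass"), (81, "pass"), (90, "pass")]

pooled = cross_validate(data, k=5)
accuracy, precision, recall = scores(pooled)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Every record appears in exactly one test fold, and no record is in both the training and test data of the same model. For the year-to-year assessment recommended above, the random folds would be replaced by a training set drawn from one year and a test set drawn from the following year.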

PREDICTIVE ANALYTICS IN PRACTICE

Predictive analytics are being used within the field of teaching and learning for many purposes, with one significant body of work aimed at identifying students at risk in their academic programming. For instance, Aguiar et al. (2015) describe the use of predictive models to determine whether students will graduate from secondary school on time, demonstrating how the accuracy of predictions changes as students advance from primary school through into secondary school. Predicted outcomes vary widely, and might include a specific summative grade or grade distribution for a student, or a class of achievement (Brooks et al., 2015) in a course. Baker, Gowda, and Corbett (2011) describe a method that predicts a formative achievement for a student based on their previous interactions with an intelligent tutoring system. In lower-risk and semi-formal settings such as massive open online courses (MOOCs), the chance that a learner might disengage from the learning activity mid-course is another heavily studied outcome (Xing, Chen, Stein, & Marcinkowski, 2016; Taylor, Veeramachaneni, & O’Reilly, 2014).

Beyond performance measures, predictive models have been used in teaching and learning to detect learners who are engaging in off-task behaviour (Xing & Goggins, 2015; Baker, 2007) such as “gaming the system” in order to answer questions correctly without learning (Baker, Corbett, Koedinger, & Wagner, 2004). Psychological constructs such as affective and emotional states have also been predictively modelled (D’Mello, Craig, Witherspoon, McDaniel, & Graesser, 2007; Wang, Heffernan, & Heffernan, 2015), using a variety of underlying data as features, such as textual discourse or facial characteristics. More examples of some of the ways predictive modelling has been used in educational data mining in particular can be found in Koedinger, D’Mello, McLaughlin, Pardos, and Rosé (2015).

CHALLENGES AND OPPORTUNITIES

Computational and statistical methods for predictive modelling are mature, and over the last decade, a number of robust tools have been made available for educational researchers to apply predictive modelling to teaching and learning data. Yet a number of challenges and opportunities face the learning analytics community when building, validating, and applying predictive models. We identify three areas that could use investment in order to increase the impact that predictive modelling techniques can have:

1. Supporting non-computer scientists in predictive modelling activities. The learning analytics field is highly interdisciplinary, and educational researchers, psychometricians, cognitive and social psychologists, and policy experts tend to have strong backgrounds in explanatory modelling. Providing support in the application of predictive modelling techniques, whether through the innovation of user-friendly tools or the development of educational resources on predictive modelling, could further diversify the set of educational researchers using these techniques.

2. Creating community-led educational data science challenge initiatives. It is not uncommon for researchers to address the same general theme of work but use slightly different datasets, implementations, and outcomes and, as such, have results that are difficult to compare. This is exemplified in recent predictive modelling research regarding dropout in massive open online courses, where a number of different authors (e.g., Brooks et al., 2015; Xing et al., 2016; Taylor et al., 2014; Whitehill, Williams, Lopez, Coleman, & Reich, 2015) have all done work with different datasets, outcome variables, and approaches.

Moving towards a common and clear set of outcomes, open data, and shared implementations in order to compare the efficacy of techniques and the suitability of modelling methods for given problems could be beneficial for the community. This approach has been valuable in similar research fields and the broader data science community, and we believe that educational data science challenges could help to disseminate predictive modelling knowledge throughout the educational research community while also providing an opportunity for the development of novel interdisciplinary methods, especially related to feature engineering.

3. Engaging in second order predictive modelling. In the context of learning analytics, we define second order predictive models as those that include historical knowledge as to the effects of an intervention in the model itself. Thus a predictive model that used student interactions with content to determine drop out (for instance) would be an example of first order predictive modelling, while a model that also includes historical data as to the effect of an intervention (such as an email prompt or nudge) would be considered a second order predictive model. Moving towards the modelling of intervention effectiveness is important when multiple interventions are available and personalized learning paths are desired.

Despite the multidisciplinary nature of the learning analytics and educational data mining communities, there is still a significant need for bridging understanding between the diverse scholars involved. An interesting thematic undercurrent at learning analytics conferences is the (sometimes heated) discussion of the roles of theory and data as drivers of educational research. Have we reached the point of “the end of theory” (Anderson, 2008) in educational research? Unlikely, but this question is most salient within the subfield of predictive modelling in teaching and learning: while for some researchers the goal is understanding cognition and learning processes, others are interested in predicting future events and success as accurately as possible. With predictive models becoming increasingly complex and incomprehensible to an individual (essentially black boxes), it is important to start discussing more explicitly the goals of research agendas in the field, to better drive methodological choices between explanatory and predictive modelling techniques.

REFERENCES

Aguiar, E., Lakkaraju, H., Bhanpuri, N., Miller, D.

Alhadad, S., Arnold, K., Baron, J., Bayer, I., Brooks, C., Little, R. R., Rocchio, R. A., Shehata, S., & Whitmer, J. (2015, October 7). The predictive learning analytics revolution: Leveraging learning data for student success. Technical report, EDUCAUSE Center for Analysis and Research.

Anderson, C. (2008, June 23). The end of theory: The data deluge makes the scientific method obsolete. Wired. https://www.wired.com/2008/06/pb-theory/

Baker, R. S. J. d. (2007). Modeling and understanding students’ off-task behaviour in intelligent tutoring systems. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’07), April–May 2007, San Jose, CA. New York: ACM.

Baker, R. S. J. d., Corbett, A. T., Koedinger, K. R., & Wagner, A. Z. (2004). Off-task behaviour in the cognitive tutor classroom: When students game the system. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’04), April 2004, Vienna, Austria. New York: ACM.

Baker, R. S. J. d., Gowda, S. M., & Corbett, A. T. (2011). Towards predicting future transfer of learning. Proceedings of the 15th International Conference on Artificial Intelligence in Education (AIED ’11), June–July 2011, Auckland, New Zealand. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

Barber, R., & Sharkey, M. (2012). Course correction: Using analytics to predict course success. Proceedings of the 2nd International Conference on Learning Analytics and Knowledge (LAK ’12), April–May 2012, Vancouver, BC, Canada. New York: ACM.

Brooks, C., Thompson, C., & Teasley, S. (2015). A time series interaction analysis method for building predictive models of learners using log data. Proceedings of the 5th International Conference on Learning Analytics and Knowledge (LAK ’15), March 2015, Poughkeepsie, NY, USA. New York: ACM.

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357.

D’Mello, S. K., Craig, S. D., Witherspoon, A., McDaniel, B., & Graesser, A. (2007). Automatic detection of learner’s affect from conversational cues. User Modeling and User-Adapted Interaction, 18(1–2), 45–80.

Duckworth, A. L., Peterson, C., Matthews, M. D., & Kelly, D. R. (2007). Grit: Perseverance and passion for long-term goals. Journal of Personality and Social Psychology, 92(6), 1087–1101.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I. H. (2009). The Weka data mining software: An update. SIGKDD Explorations Newsletter, 11(1), 10–18. doi:10.1145/1656274.1656278

Koedinger, K. R., D’Mello, S., McLaughlin, E. A., Pardos, Z. A., & Rosé, C. P. (2015). Data mining and education. Wiley Interdisciplinary Reviews: Cognitive Science, 6(4), 333–353.

Lonn, S., & Teasley, S. D. (2014). Student Explorer: A tool for supporting academic advising at scale. Proceedings of the 1st ACM Conference on Learning @ Scale (L@S 2014), March 2014, Atlanta, Georgia, USA. New York: ACM.

Shmueli, G. (2010). To explain or to predict? Statistical Science, 25(3), 289–310. doi:10.1214/10-STS330

Stripling, J., Mangan, K., DeSantis, N., Fernandes, R., Brown, S., Kolowich, S., McGuire, P., & Hendershott, A. (2016, March). Uproar at Mount St. Mary’s. The Chronicle of Higher Education. http://chronicle.com/specialreport/Uproar-at-Mount-St-Marys/30

Taylor, C., Veeramachaneni, K., & O’Reilly, U.-M. (2014, August). Likely to stop? Predicting stopout in massive open online courses. http://dai.lids.mit.edu/pdf/1408.3382v1.pdf

Wang, Y., Heffernan, N. T., & Heffernan, C. (2015). Towards better affect detectors: Effect of missing skills, class features and common wrong answers. Proceedings of the 5th International Conference on Learning Analytics and Knowledge (LAK ’15), March 2015, Poughkeepsie, NY, USA. New York: ACM.

Whitehill, J., Williams, J. J., Lopez, G., Coleman, C. A., & Reich, J. (2015). Beyond prediction: First steps toward automatic intervention in MOOC student stopout. In O. C. Santos et al. (Eds.), Proceedings of the 8th International Conference on Educational Data Mining (EDM2015), June 2015, Madrid, Spain (pp. XXX–XXX). International Educational Data Mining Society. http://www.educationaldatamining.org/EDM2015/uploads/papers/

Witten, I. H. Weka courses. The University of Waikato. https://weka.waikato.ac.nz/explorer

Witten, I. H., Frank, E., & Hall, M. A. (2011). Data mining: Practical machine learning tools and techniques, 3rd ed. San Francisco, CA: Morgan Kaufmann Publishers.

Xing, W., Chen, X., Stein, J., & Marcinkowski, M. (2016). Temporal predication of dropouts in MOOCs: Reaching the low hanging fruit through stacking generalization. Computers in Human Behavior, 58, 119–129.

Xing, W., & Goggins, S. (2015). Learning analytics in outer space: A hidden naive Bayes model for automatic students’ off-task behaviour detection. Proceedings of the 5th International Conference on Learning Analytics and Knowledge (LAK ’15), March 2015, Poughkeepsie, NY, USA. New York: ACM.

PG 68 HANDBOOK OF LEARNING ANALYTICS