Solaiman E, Barney J, Smith M. Building A Formative Assessment System That is Easy To Adopt Yet Supports Long-Term Improvement: A Review of the Literature and Design Recommendations. In: 9th International Conference on Computer Supported Education CSEDU. 2017, Porto, Portugal.

Copyright: This is the author’s manuscript of a paper that was presented at the 9th International Conference on Computer Supported Education CSEDU, held 21-23 April 2017, Porto, Portugal.

Link to Conference website: http://www.csedu.org/

Date deposited: 05/05/2017

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License

Building A Formative Assessment System That is Easy To Adopt Yet Supports Long-Term Improvement: A Review of the Literature and Design Recommendations

Ellis Solaiman, Joshua Barney, and Martin Smith
School of Computing Science, Newcastle University, United Kingdom
{ellis.solaiman, j.barney1, m.smith5}@ncl.ac.uk

Keywords: Formative Assessment, Assessment Design, Review, Feedback Automation, Education Technology

Abstract: Formative assessment has been shown to be an effective teaching tool, yet is infrequently used in practice. With the intent of building a formative e-assessment platform, we examine research on formative practices and supporting computer-based systems with a focus on: institutional barriers to adoption of previous systems; senses in which students and teachers can improve their practices across varying timescales; and collectible data (self-reported or otherwise) necessary or advantageous in supporting these processes. From this research we identify the minimal set of data which adequately supports these processes of improvement, arrive at a set of requirements and recommendations for an innovative system which collects, processes, and presents this data appropriately, and from these requirements design the architecture of an extensible electronic formative assessment system which balances the need for complex long-term analytics with that of accessibility.

1 Introduction

Formative assessment refers to the use of testing to allow both students to reflect on their own learning and teachers to determine the needs of individual students, or groups of students. This can also be described as assessment for learning. Formative assessment is often contrasted with summative assessment, described as assessment of learning, which uses testing as a means for summarizing the performance of students and their level of knowledge at the end of a course (Looney, 2011; Black and Wiliam, 2009). Despite the clear benefits of formative assessment as an effective teaching tool, it is infrequently used in practice. Also, whilst VLEs (Virtual Learning Environments) are widespread in higher education, their formative abilities are often severely under-utilized (Blin and Munro, 2008). Even when change is not resisted by faculty staff, adoption of educational technology tends to proceed at a slow pace, and often innovation is driven by lone individuals deciding to use technologies without backing from above (Russell, 2009). Although perception of student expectation can provide a positive incentive, technical barriers along with a lack of institutional support are key suppressing factors (King and Boyatt, 2015). Currently available education technologies are hampered by technical problems such as lack of integration with systems in use within institutions, and educational data-mining efforts fail due to the absence of tools which are easy for non-experts to use (Romero and Ventura, 2010). A key factor inhibiting the adoption of formative assessment technologies is staff time. Teacher time is an especially scarce resource – those who see the value of formative assessment may still avoid making full use of it for this reason: "These are all great initiatives, but I'm running out of hours in a day ... I'm in overload!" (Hall and Hall, 2010). Research by (Macfadyen and Dawson, 2012) highlights that in order to overcome individual and group resistance to innovation and change, planning processes must create conditions that allow participants to both think and feel positively about change; "conditions that appeal to both the heart and the head". Without investing resources for effective automated data analysis and visualization, large quantities of data will not compel social or cultural change (Kotter and Cohen, 2002); "increasing observability of change is an effective strategy for reducing resistance" (Rogers, 1995).

This paper builds upon and complements previous work and literature reviews such as (Gikandi et al., 2011) by taking a more integrative approach. We attempt, through the development of a holistic theoretical framework, to arrive at the design of a system suitable for active use both as a tool for student and teacher improvement, and as a platform for further empirical research about the impact of various formative strategies. Initially, multiple high-impact journals in the field of educational technology were identified, and the titles and abstracts of all publications from the last 5 years were manually examined. A number of potentially relevant papers were identified and scrutinized in further detail, and common subjects and references were noted. Keywords related to these subjects were used to find further papers using academic search engines. The process of reading papers, discovered through references or search, continued recursively until a clear picture of a range of formative and learning practices emerged. A conceptual framework was developed to organize these practices by time and individual, and the body of collected papers was re-examined in the context of this framework to explore the data-collection requirements of each of the identified practices. Recommendations are made for implementation, and from these recommendations, the design of an extensible system to collect this data is presented.

2 Classification of Feedback by User and Duration

Multiple researchers distinguish types of formative assessment based on their duration. Allal and Schwartz (Allal and Schwartz, 1996) use the terminology of "Level 1" and "Level 2" formative assessments, which respectively either directly benefit the students who are assessed, or use the data gathered to benefit future instructional activities or new groups of students. Alternatively, (Wiliam, 2006) distinguishes between short-, medium-, and long-cycle formative assessment, which operate within and between lessons, within and between teaching units, and across marking periods, semesters or even years. In addition to acting as a helpful framework within which to classify and organize research, this viewpoint is useful as a design tool – collecting data required for longer-term formative processes requires action over a sustained period of time, and this affects the short-term experience of new users. As discussed below, a system must show immediate value in order to have the best chance at being widely adopted, and balancing the conflict between supporting longer-term goals and maintaining accessibility is a design challenge that deserves thinking about explicitly. In addition to distinguishing processes by time, we also do so by the agents involved (see Figure 1). Our framework differs from previous formulations like that of (Wiliam and Thompson, 2007) in that we conceive of teachers not just in terms of the role they can play in providing feedback to students but also as individuals who can benefit from feedback themselves.

Students – Short-term: checking understanding of the subject, identifying own misconceptions. Medium-term: understanding progress and thinking about own learning (meta-cognition). Long-term: subject mastery, perhaps through spaced repetition.

Teachers – Short-term: identifying common misconceptions in the student body, planning the next lesson. Medium-term: understanding broad proficiencies of the class across the whole syllabus. Long-term: constructing next year's course materials, and designing new summative assessments.

Figure 1: Conceptual framework of feedback for students and teachers in the short, medium, and long terms.

3 Feedback Processes and Their Data Collection Requirements

We examine the specifics of feedback processes below, considering issues related by their similar data requirements together. This view allows us to move closer to the design of a real system.

3.1 Formative Feedback Principles

Synthesizing the research literature, (Nicol and Macfarlane-Dick, 2006) present seven principles, reordered for clarity here, arguing that effective formative feedback practice: 1. delivers high quality information to students about their learning; 2. helps clarify what good performance is (goals, criteria, expected standards); 3. provides opportunities to close the gap between current and desired performance; 4. facilitates the development of self-assessment (reflection) in learning; 5. encourages positive motivational beliefs and self-esteem; 6. encourages teacher and peer dialogue around learning; and 7. provides information to teachers that can be used to help shape teaching.

The first three principles mirror (Wiliam and Thompson, 2007)'s theory of formative assessment, which emphasizes establishing where the learners are in their learning, where they are going, and what needs to be done to get them there. Black and Wiliam (Black and Wiliam, 1998) warn that classroom cultures focused on rewards or grades encourage students to game the system, avoid difficult tasks, and, where difficulties or poor results are encountered, "retire hurt". By 2004, this had been strengthened in (Black et al., 2004) into an explicit recommendation to adopt comment-only marking practices; "Students given marks are likely to see it as a way to compare themselves with others; those given only comments see it as helping them to improve. The latter group outperforms the former." A literature review by (Gikandi et al., 2011) provides supporting studies and explicitly reaffirms the above seven principles as "an essential condition for effective formative feedback".

Whilst the above principles may be sufficient to guide the behavior of teachers in conversation with students in a physical classroom, there are pragmatic details to consider for their implementation within computer-based systems. For example, Hattie and Timperley (Hattie and Timperley, 2007) distinguish between task-orientated feedback, such as knowledge of the correctness of a response, and process-orientated feedback, such as a worked-out example answer, stating that the latter is to be preferred. In the context of technology-based instruction, learners elect to view the minimum load of feedback required to achieve closure (Lefevre and Cox, 2016), yet for some question designs a correct answer may hide a multitude of flawed methods and misconceptions (Seppälä et al., 2006). Since students will not rigorously examine worked solutions when they know their answer is correct, it may be that this can only be avoided by better question design.

3.2 Identifying Misconceptions in Student Understanding

Closed question types such as multiple-choice questions can be used powerfully, provided that they are designed with a clear set of educational principles and goals in mind (Nicol, 2007). By analyzing incorrect answers of students on longer-form written exams for mathematics, (Albano and Pepkolaj, 2014) were able to develop effective recovery learning path tests on top of a fairly primitive Learning Management System quiz tool. Careful question design may be able to detect and correct student misconceptions which are understood in advance, but how can this prerequisite information be gathered? In some subjects common misconceptions amongst student bodies have already been identified through academic research initiatives, and the results compiled into documents targeted at teaching staff (one such example would be (Hestenes et al., 1992)), but what if such documents are not readily available?

Interactive questions are capable of recording a greater variety and density of information than traditional test questions. Researchers in (Greiff et al., 2015) present a computerized assessment question where students are expected to optimize the outcome of a simulation by varying input values through graphical sliders. They show how time-series data that records actions taken during the assessment can be used to gain insight into the problem-solving process, and from this go on to draw statistical inferences about the efficacy of various solving strategies. Similarly, (Seppälä et al., 2006) presented an interactive system for examining Computer Science students' beliefs about algorithms for creating a heap data structure. The system required students to simulate the working of given algorithms by manipulating the visual representations of the corresponding data structures on a computer screen. By collecting data on the sequence of actions taken as students progressed towards this goal, it became possible to identify not just one-off slips but systematic misconceptions about the algorithms. These misconceived algorithm variants could then be simulated, and when a student's response matched the sequence generated by the simulation, advanced feedback targeted at the specific misconception could be delivered.
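To make the trace-matching idea concrete, the sketch below (in TypeScript, since we target web technologies) matches a student's recorded action sequence against traces produced by simulating known misconceived algorithm variants. It is illustrative only: the data shapes and names are our own assumptions and are not taken from the system of (Seppälä et al., 2006).

```typescript
// Illustrative sketch (not from the cited system): matching a student's recorded
// action sequence against traces generated by simulating known misconceptions.

interface MisconceptionTrace {
  id: string;          // hypothetical label, e.g. "sift-up-instead-of-sift-down"
  feedback: string;    // targeted feedback shown when this trace is matched
  actions: string[];   // action sequence produced by simulating the misconceived algorithm
}

// Returns targeted feedback if the student's actions exactly reproduce a simulated
// misconception, otherwise undefined (the system falls back to generic feedback).
function diagnose(studentActions: string[], traces: MisconceptionTrace[]): string | undefined {
  const matches = (trace: MisconceptionTrace) =>
    trace.actions.length === studentActions.length &&
    trace.actions.every((action, i) => action === studentActions[i]);
  return traces.find(matches)?.feedback;
}

// Usage: the marking server for an interactive question type would call diagnose()
// with the recorded interaction log and a library of simulated variants for that question.
```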

3.3 Self- and Peer-assessment

As class sizes increase, so does the cost of marking student work in terms of teachers' time and energy. Computer-based marking is clearly sufficient for closed, well-defined questions, but free-text questions, which are known to promote meta-cognition more effectively, pose technical challenges with respect to automated marking and feedback provision. Relevant automated systems do exist, often utilizing a vector-space model; however, they may be overly harsh, fail to recognize the breadth of valid responses, mark only with respect to an example answer, and require the teacher to manually resolve ambiguities (Rodrigues and Oliveira, 2014). Progress has been made over the years to the point where such systems are used in practice, but using them to provide useful feedback to a learner is still an ongoing area of research (Van Labeke et al., 2013). Such a system also fails when the subjective qualities being measured cannot be reduced to linguistic similarity, such as in evaluations of long free-form essays or creative works. In contrast, self-assessment and peer-assessment have been shown to be effective in a variety of contexts such as design, programming, art, and dance. Provided students are happy with the initial workload, such a strategy scales indefinitely, since the number of available reviewers grows with the number of students, and automated systems can be used in hybrid with human markers if necessary to detect unusual reviews or to provide an additional source of information which can be used to increase the quality of a final overall review (Luaces et al., 2015).

(Gehringer, 2014) examines multiple methods for improving review quality, but most involve aggregation of reviews, implying each student would be burdened with a large number of other students' answers to mark, an undesirable workload if formative assessments are to be undertaken regularly. One method of improving review quality which does not necessitate aggregation is calibration, whereby students are presented with multiple example answers which have previously been assigned a mark by the teacher, before they go on to mark a peer's work.

Nicol and Macfarlane-Dick (Nicol and Macfarlane-Dick, 2006) make the case that even 'at-risk' students can learn to be more self-regulating, that the more learning becomes self-regulated the less students depend on external teacher support, that such students are more persistent, resourceful, confident and higher achievers, and that the quality of self-regulation can be developed by making learning processes explicit, through meta-cognitive training, through self-monitoring, and by providing opportunities to practice self-regulation.

Although anonymous peer-assessment may not be any more valid than named peer-assessment with respect to construct validity, pupils self-report lower levels of peer pressure and feel more positive about the process (Vanderhoven et al., 2015).
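As an illustration of how the calibration step described above might be enforced in software, the sketch below gates peer marking on agreement with teacher-marked example answers. The tolerance value and the pass rule are assumptions made purely for illustration; they are not prescribed by (Gehringer, 2014).

```typescript
// Illustrative sketch: a calibration gate for peer review. A student marks a few
// teacher-marked example answers first; only if their marks are close enough to the
// teacher's do they proceed to mark a peer's work.

interface CalibrationAttempt {
  teacherMark: number; // mark in [0, 1] assigned by the teacher to an example answer
  studentMark: number; // mark in [0, 1] the student assigned to the same example
}

function passesCalibration(attempts: CalibrationAttempt[], tolerance = 0.15): boolean {
  // Require every calibration example to be marked within the tolerance of the teacher's mark.
  return attempts.length > 0 &&
    attempts.every(a => Math.abs(a.teacherMark - a.studentMark) <= tolerance);
}

// Usage: the system would only assign real peer work to students for whom
// passesCalibration(...) returns true, re-presenting calibration examples otherwise.
```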
3.4 Understanding Progress, Measuring Difficulty, and Assessment Design

Learning design typically starts by identifying the core aims of learning or abilities of a student to be developed based on the domain of the subject to be taught, and structuring practical activities in such a way as to develop these abilities (MacLean and Scott, 2011). Misconceptions identified previously via empirical measures and documented (for example, (Hestenes et al., 1992)) may be explicitly addressed in order to accelerate the pace of learning. Similarly, hypothesized learning progressions that have been validated empirically may be used to scaffold the learning process more helpfully and, where appropriate, help students evaluate their own progress. These may be in the form of individual construct maps showing distinct levels of progress in the understanding of a concept, but they could also describe the relationship between individual competencies in a syllabus (Wilson, 2009). Given a dataset of student performance on topic-tagged questions over time, it is possible to mechanically extract a concept map showing the dependencies between topics, and in doing so identify conceptual bottlenecks in learning (Lin et al., 2015). (Lin et al., 2015) go on to show how such a generated map can be used to indicate a learner's progress to them, clearly showing where to progress next in terms of the student's learning requirements. Also of interest is the use of factor analysis to gain insight into core competency areas within a curriculum. This is hardly a new idea (Zou and Zhang, 2013); however, accessible integration of it and other statistical tools of similar power into assessment systems, in a way that ensures resultant inferences are valid, is an open area of research.
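The sketch below illustrates one simple way candidate dependencies could be inferred from topic-tagged performance data. The asymmetric co-failure heuristic it uses is our own simplification for illustration, not the extraction method of (Lin et al., 2015).

```typescript
// Illustrative sketch only: inferring candidate prerequisite links between topics from
// topic-tagged performance data, using the heuristic "students who struggle with topic B
// almost always also struggle with topic A" as weak evidence that A precedes B.

type TopicScores = Map<string, number>; // per student: topic -> proportion correct in [0, 1]

function prerequisiteCandidates(
  students: TopicScores[],
  topics: string[],
  passMark = 0.5,
  threshold = 0.9
): Array<{ prerequisite: string; dependent: string }> {
  const links: Array<{ prerequisite: string; dependent: string }> = [];
  for (const a of topics) {
    for (const b of topics) {
      if (a === b) continue;
      // Among students who struggle with b, how many also struggle with a?
      const strugglingWithB = students.filter(s => (s.get(b) ?? 0) < passMark);
      if (strugglingWithB.length === 0) continue;
      const alsoStruggleWithA =
        strugglingWithB.filter(s => (s.get(a) ?? 0) < passMark).length / strugglingWithB.length;
      if (alsoStruggleWithA >= threshold) links.push({ prerequisite: a, dependent: b });
    }
  }
  return links; // edges of a candidate dependency map, intended for a teacher to review
}
```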
3.5 Spaced Repetition and Subject Mastery

The positive effects of spacing learning over time in increasing retention of knowledge are well documented and researched. Examples include early work such as Ebbinghaus's discovery that the forgetting curve becomes shallower over time (and thus the optimal interval between learning repetitions increases with time) (Ebbinghaus, 1885); Spitzer's large-scale study on Iowan school children (Spitzer, 1939); and a large body of studies and inquiry in the 1960s and 1970s including (Landauer and Bjork, 1978). Several businesses, some with equity of over 1 million USD, currently offer products based partly or wholly on a spaced repetition model; these include Synap, Brainscape, and Lingvist. Multiple open-source flash-card programs such as Anki exist, often based on an algorithm from the proprietary software SuperMemo, specifically SM-2, which is described in (Wozniak, 1990) and bears resemblance to the paper-based algorithm described in (Leitner, 1974).

Although the algorithm employed by SuperMemo at the time of writing (SM-17) is far more complicated than that of Anki (SM-2), in that it calculates intervals based on measures of stability, retrievability, and item difficulty, the only input required from the user is a self-assessment of confidence with respect to the answer; the time taken to answer the question is not used to measure how retrievable the information is from memory.

The ability to appropriately schedule specific test questions at timely intervals, independent of larger tests themselves, is a feature which is hard to build on top of a system which does not natively support it. For this reason, if spaced repetition functionality is desired in a system, it is strongly advisable that such functionality be implemented explicitly at a fairly low level of the design.
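For concreteness, the sketch below shows the SM-2 scheduling rule as it is commonly described and implemented in open-source flash-card software; it is a simplified illustration rather than the exact code of any particular product.

```typescript
// Minimal sketch of the SM-2 review-scheduling rule (Wozniak, 1990) as commonly
// implemented. quality is the learner's self-assessed confidence on a 0-5 scale.

interface CardState {
  repetitions: number;   // consecutive successful reviews
  easiness: number;      // easiness factor, conventionally starting at 2.5
  intervalDays: number;  // days until the next review
}

function sm2(state: CardState, quality: number): CardState {
  // Update the easiness factor from the self-reported confidence, floored at 1.3.
  const easiness = Math.max(
    1.3,
    state.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
  );

  if (quality < 3) {
    // Failed recall: restart the repetition sequence with a short interval.
    return { repetitions: 0, easiness, intervalDays: 1 };
  }

  const repetitions = state.repetitions + 1;
  const intervalDays =
    repetitions === 1 ? 1 :
    repetitions === 2 ? 6 :
    Math.round(state.intervalDays * easiness);

  return { repetitions, easiness, intervalDays };
}

// Usage: after a formative question is answered, the scheduler stores the returned state
// and re-queues the question intervalDays later.
```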
4 Recommendations and Requirements for a Formative System

For subjects with well-defined domains (as defined by (Spiro et al., 1995)) it is clear that short-term formative assessment and feedback can provide significant value while at the same time collecting sufficient data to adequately support various long-cycle improvement strategies without overburdening students or teachers, provided it is used appropriately. Ill-defined domains present more of a challenge for automated marking, but work may still be marked scalably through self- and peer-assessment. Identifying misconceptions requires comparatively rich data – at a minimum the content of student answers, for detecting common incorrect responses, but ideally some sort of record of the process the student followed to solve the question. Longer-timescale processes targeted at students, such as long-term adaptive testing and spaced repetition, may to an extent be driven with data collected for and produced by short-term formative feedback, but the majority of existing solutions and research makes use of the student's own evaluation of their confidence in answering a question. As a result of the above, we hypothesize that an extensible and modular system for formative assessment which initially provides knowledge of correct response and optional elaborated feedback, whilst collecting data about student answers and confidence, would be sufficient as a platform for a reasonably wide variety of long-term improvement processes without overburdening users in the short term. The small size and traditional nature of this dataset is perhaps surprising in contrast with research interest in using alternative sources of data, such as student blog frequency or conversations with peers, for learning analytics purposes (e.g. (Cerezo et al., 2016)). Based on the above research, we present the following recommendations for designers of formative assessment systems.

Adoption:

1. Design for an initially low level of emotional investment and time commitment from users, producing immediate and obvious value in such a circumstance.

2. Whilst shaping institutional culture is not something a software tool can actively do, it is possible to lead towards institutional good practice by reducing relevant technical barriers in advance. This should be explicitly planned for and done.

3. Design data visualizations in such a way as to be easily explorable for non-expert users. Where trade-offs between power and accessibility are necessary, the editorial choices involved should be shaped by the processes the system is designed to support.

4. Allow users to easily export any visualizations they create or modify for inclusion in reports, etc., but unobtrusively watermark them with the name of the system used. In doing so, the analytical power of the system will be gradually advertised throughout a faculty or institution.

5. Where teachers or other staff members are assessed via external criteria, ensure that good formative practice using the system will be evaluated positively against such criteria, by providing facilities that aid in documenting relevant evidence.

Feedback:

1. Allow teachers to provide students with "Knowledge of Correct Response"-style feedback automatically. Teachers should be able to provide a worked solution which can be shown upon request. Refraining from showing elaborated feedback unless desired avoids student fatigue.

2. Design the platform to make it easy for teachers to link feedback with specific tasks they would have to undertake in the future to perform desirably.

3. Provide statistical and visualization capabilities to express student progress through the curriculum, however that curriculum is currently conceived of.

4. Where it makes sense to do so, and is not discouraging, make it easy for teachers to frame feedback in terms of a student's previous performances.

Self- and Peer-Assessment:

1. Support rich feedback types such as free text, pictures, and video; or at least enable students to critique pictures and videos, if not produce them from within the system.

2. Allow reconfigurability within the system with respect to peer review, especially with regards to how students are assigned work to review, and with regards to anonymisation.

Assessment Design and Process:

1. Implement hardness measures within assessments to automatically understand the relative difficulties students have with topics, and adjust teaching time per topic accordingly.

2. Where proficiencies correlate across groups of students and across time, use this information to inform learning design, ensuring sufficient time is spent on bottleneck topics for learning, and restructure course content according to any dependencies discovered empirically between topics.

3. Present teachers with reliability metrics of previously taken assessments, provided such automated statistics would not be misleading.

Spaced Repetition:

1. The platform must be capable (as an option for teachers, since such a process may not always be desired) of scheduling formative assessments based on confidence that is self-reported by students, in order to maximize their learning.

5 Design of an extensible formative assessment system

A simple illustration of our proposed architecture can be seen in Figure 2.

Figure 2: Design of a formative assessment system.

The practicalities of question marking may vary dramatically between more advanced question types, and for this reason the responsibility for marking student answers lies not with the core back-end server but with individual marking servers, one for each question type, which communicate with the core back end using standardized messages over a message broker. The client, the core back end, and the database are designed to be agnostic with respect to the types of questions supported. The database schema has a single table for all questions, with schemaless content and markscheme columns (whose structure is question-type specific). All questions hold a single reference to an entry in the question types table, which holds the configuration for accessing the marking server for all questions of that type.

In order for an extension author to write a new question type, first the format of the data expected to be received from the student must be defined, then a marking server written that takes in this data via the message broker and returns some specific feedback (the structure of which must also be defined) and a number between 0 and 1 (so that statistical analysis across heterogeneous question types is possible). The front-end code to produce this data must then be written, which consists of the UI of the question itself along with the business logic required to make it work. A UI component that displays the feedback specific to the question type should also be written, and it is advisable to write a custom statistical view for teaching staff to use for analyzing answers given to individual questions of that question type (the default statistical tools will not be able to understand the structure of student answers without help, since that structure is specific to each question type). With the above completed, the relevant files can be placed in their appropriate places, a new question type created in the database with appropriate marking server configuration, and from that point on questions of that new type may be created.
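The sketch below illustrates the kind of contract an extension author might implement for a new question type under this architecture. All identifiers are hypothetical; the paper specifies only that a marking server receives the student's answer over the message broker and returns type-specific feedback together with a score between 0 and 1.

```typescript
// Illustrative sketch of a question-type extension contract. Names such as
// MarkingRequest and markNumeric are assumptions made for illustration.

interface MarkingRequest<Answer, MarkScheme> {
  questionId: string;
  answer: Answer;          // structure defined by the question type (schemaless in the DB)
  markScheme: MarkScheme;  // structure also defined by the question type
}

interface MarkingResult<Feedback> {
  score: number;      // normalised to [0, 1] so cross-type statistics remain possible
  feedback: Feedback; // type-specific feedback rendered by the matching UI component
}

// Example: a minimal numeric question type with a tolerance-based mark scheme.
type NumericAnswer = { value: number };
type NumericMarkScheme = { expected: number; tolerance: number };
type NumericFeedback = { correct: boolean; expected: number };

function markNumeric(
  req: MarkingRequest<NumericAnswer, NumericMarkScheme>
): MarkingResult<NumericFeedback> {
  const correct =
    Math.abs(req.answer.value - req.markScheme.expected) <= req.markScheme.tolerance;
  return { score: correct ? 1 : 0, feedback: { correct, expected: req.markScheme.expected } };
}

// In the real system this handler would be subscribed to the message broker under the
// configuration stored in the question types table; here we simply export it.
export { markNumeric };
```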
Much of the desired functionality can be accomplished with the system as described above – for example, feedback is trivially possible, and collecting data about student misconceptions can be accomplished by extending the system with complex question types which can be built using web technologies. Self-assessment likewise can be accomplished via a dedicated question type. However, some practices require certain outcomes to occur as the result of previous student performance and/or the time that has passed: adaptive testing assigns students recovery tests as a result of poor performance in previous tests; spaced repetition involves reassigning previously seen questions to students after an interval calculated from past confidence in their answers; peer review needs students to be assigned to review other students' work after they have submitted it. To provide sufficient functionality for these practices whilst keeping the core application simple, we will implement a conceptually separate scheduler which runs both periodically, to ensure time-dependent rules are followed globally, and upon request. Whilst teachers assigning tests and questions manually to students is handled directly by the core application, any other assignment of questions (for any reason) is the responsibility of the scheduler. The rules of the scheduler are not designed to be changed by extension authors, and will likely be fairly opaque to users; instead of seeing the exact rules that are going to be followed, teachers will be presented with a settings UI that allows them to enable, for example, "computer adaptive testing" at the press of a button, and this will enable all of the relevant GUI components as well as the back-end logic designed to support such a work-flow. With the above said, there is no reason a particularly dedicated researcher could not add new rules to the scheduler to support some new process.
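As a rough illustration of how such a scheduler could be organised, the sketch below models assignment rules that can be toggled from the settings UI and evaluated periodically or on request. The rule interface and class names are our own assumptions, not a committed design.

```typescript
// Illustrative sketch of the separate scheduler. The paper specifies only that it runs
// periodically and on request, and owns all automatic question assignment.

interface Assignment {
  studentId: string;
  questionId: string;
  dueDate: Date;
}

interface SchedulingRule {
  enabled: boolean; // toggled from the teacher-facing settings UI
  // Inspect stored performance/confidence data and emit new assignments.
  run(now: Date): Assignment[];
}

class Scheduler {
  constructor(private rules: SchedulingRule[]) {}

  // Called periodically (e.g. by a cron-like timer) and also on demand after submissions.
  tick(now: Date = new Date()): Assignment[] {
    return this.rules.filter(r => r.enabled).flatMap(r => r.run(now));
  }
}

// A spaced-repetition rule, for instance, could query stored card states produced by the
// SM-2 sketch shown earlier and emit assignments whose dueDate falls on or before `now`.
```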

6 Conclusions and areas for future research

By targeting a web platform, a medium ideally suited for the development of interactive and visual tools, we aim to enable learning experiences and questions to be developed with relative ease and in a way that remains coherent and integrated with the other formative practices we hope to encourage. Novel interactive question types are an area of research with potentially rich rewards, as is applying what is currently known about the computer-assisted identification of misconceptions to new subject domains. It is our intention that our system serve as a platform for multiple researchers looking to explore the effectiveness of richly immersive web-based learning tools.

One of the largest barriers to effective use of data in an educational setting is the lack of statistical tools which are easy to use for non-experts. Because our system aims to target practices that take place over a variety of timescales whilst collecting data in a consistent format, we believe it to be suited for the prototyping and subsequent empirical testing of a wide range of graphical statistical tools. Of particular note for researchers is the problem of conveying nuanced statistical ideas which affect the validity of inferences made to staff who have not been statistically trained, especially doing so visually.

Appropriate treatment of issues of reliability and validity has been identified as a key deficiency in existing research ((Gikandi et al., 2011), (Bennett, 2011)). A potential deficiency with the system design presented above is the lack of resistance to questions being incorrectly labeled with the wrong topics. Despite years of research, it remains unclear how components of feedback such as timing affect its overall efficacy. It is our hope that a system such as ours, once developed, will permit long-term A/B testing of variations to these components, although ethical and professional concerns related to delivering a potentially sub-par educational experience to a section of paying students will have to be addressed in order to proceed.

REFERENCES

Albano, G. and Pepkolaj, L. (2014). Formative self-assessment to support self-driven mathematics education at university level. In International Conference on Web-Based Learning, pages 82–91. Springer.
Allal, L. and Schwartz, G. (1996). Quelle place pour l'évaluation formative dans l'enseignement au cycle d'orientation? CO Infos, 178:5–8.
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1):5–25.
Black, P. and Wiliam, D. (1998). Inside the black box: Raising standards through classroom assessment. Phi Delta Kappan, 80(2):139–144, 146–148.
Black, P. and Wiliam, D. (2009). Developing the theory of formative assessment. Educational Assessment, Evaluation and Accountability (formerly: Journal of Personnel Evaluation in Education), 21(1):5–31.
Black, P., Harrison, C., Lee, C., Marshall, B., and Wiliam, D. (2004). Working inside the black box: Assessment for learning in the classroom. Phi Delta Kappan, 86(1):8–21.
Blin, F. and Munro, M. (2008). Why hasn't technology disrupted academics' teaching practices? Understanding resistance to change through the lens of activity theory. Computers & Education, 50(2):475–490.
Cerezo, R., Sánchez-Santillán, M., Paule-Ruiz, M. P., and Núñez, J. C. (2016). Students' LMS interaction patterns and their relationship with achievement: A case study in higher education. Computers & Education, 96:42–54.
Ebbinghaus, H. (1885). Über das Gedächtnis.
Gehringer, E. F. (2014). A survey of methods for improving review quality. In International Conference on Web-Based Learning, pages 92–97. Springer.
Gikandi, J. W., Morrow, D., and Davis, N. E. (2011). Online formative assessment in higher education: A review of the literature. Computers & Education, 57(4):2333–2351.
Greiff, S., Wüstenberg, S., and Avvisati, F. (2015). Computer-generated log-file analyses as a window into students' minds? A showcase study based on the PISA 2012 assessment of problem solving. Computers & Education, 91:92–105.
Hall, R. and Hall, M. (2010). Scoping the pedagogic relationship between self-efficacy and Web 2.0 technologies. Learning, Media and Technology, 35(3):255–273.
Hattie, J. and Timperley, H. (2007). The power of feedback. Review of Educational Research, 77(1):81–112.
Hestenes, D., Wells, M., Swackhamer, G., et al. (1992). Force concept inventory. The Physics Teacher, 30(3):141–158.
King, E. and Boyatt, R. (2015). Exploring factors that influence adoption of e-learning within higher education. British Journal of Educational Technology, 46(6):1272–1280.
Kotter, J. P. and Cohen, D. S. (2002). The Heart of Change: Real-Life Stories of How People Change Their Organizations. Harvard Business Press.
Landauer, T. K. and Bjork, R. A. (1978). Optimum rehearsal patterns and name learning. Practical Aspects of Memory, 1:625–632.
Lefevre, D. and Cox, B. (2016). Feedback in technology-based instruction: Learner preferences. British Journal of Educational Technology, 47(2):248–256.
Leitner, S. (1974). So lernt man lernen. Herder.
Lin, Y.-S., Chang, Y.-C., Liew, K.-H., and Chu, C.-P. (2015). Effects of concept map extraction and a test-based diagnostic environment on learning achievement and learners' perceptions. British Journal of Educational Technology.
Looney, J. (2011). Integrating formative and summative assessment: Progress toward a seamless system? OECD Working Paper.
Luaces, O., Alonso, A., Troncoso, A., Bahamonde, A., et al. (2015). Including content-based methods in peer-assessment of open-response questions. In 2015 IEEE International Conference on Data Mining Workshop (ICDMW), pages 273–279. IEEE.
Macfadyen, L. P. and Dawson, S. (2012). Numbers are not enough. Why e-learning analytics failed to inform an institutional strategic plan. Educational Technology & Society, 15(3):149–163.
MacLean, P. and Scott, B. (2011). Competencies for learning design: A review of the literature and a proposed framework. British Journal of Educational Technology, 42(4):557–572.
Nicol, D. (2007). E-assessment by design: Using multiple-choice tests to good effect. Journal of Further and Higher Education, 31(1):53–64.
Nicol, D. J. and Macfarlane-Dick, D. (2006). Formative assessment and self-regulated learning: A model and seven principles of good feedback practice. Studies in Higher Education, 31(2):199–218.
Rodrigues, F. and Oliveira, P. (2014). A system for formative assessment and monitoring of students' progress. Computers & Education, 76:30–41.
Rogers, E. M. (1995). Diffusion of Innovations: Modifications of a Model for Telecommunications, pages 25–38. Springer Berlin Heidelberg, Berlin, Heidelberg.
Romero, C. and Ventura, S. (2010). Educational data mining: A review of the state of the art. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 40(6):601–618.
Russell, C. (2009). A systemic framework for managing e-learning adoption in campus universities: Individual strategies in context. Research in Learning Technology, 17(1).
Seppälä, O., Malmi, L., and Korhonen, A. (2006). Observations on student misconceptions: A case study of the build–heap algorithm. Computer Science Education, 16(3):241–255.
Spiro, R. et al. (1995). Cognitive flexibility, constructivism and hypertext: Random access instruction for advanced knowledge acquisition. P. Steffe e J. Gale.
Spitzer, H. F. (1939). Studies in retention. Journal of Educational Psychology, 30(9):641.
Van Labeke, N., Whitelock, D., Field, D., Pulman, S., and Richardson, J. T. (2013). OpenEssayist: Extractive summarisation and formative assessment of free-text essays.
Vanderhoven, E., Raes, A., Montrieux, H., Rotsaert, T., and Schellens, T. (2015). What if pupils can assess their peers anonymously? A quasi-experimental study. Computers & Education, 81:123–132.
Wiliam, D. (2006). Formative assessment: Getting the focus right. Educational Assessment, 11(3-4):283–289.
Wiliam, D. and Thompson, M. (2007). Integrating assessment with learning: What will it take to make it work? Lawrence Erlbaum Associates.
Wilson, M. (2009). Measuring progressions: Assessment structures underlying a learning progression. Journal of Research in Science Teaching, 46(6):716–730.
Wozniak, P. A. (1990). Optimization of learning. Unpublished master's thesis, Poznan University of Technology, Poznan, Poland.
Zou, X. and Zhang, X. (2013). Effect of different score reports of web-based formative test on students' self-regulated learning. Computers & Education, 66:54–63.