
This electronic thesis or dissertation has been downloaded from the King’s Research Portal at https://kclpure.kcl.ac.uk/portal/

Bridging the gap between past and present: narrative nonfiction in the primary history classroom

Browning, Emma

Awarding institution: King's College London

The copyright of this thesis rests with the author and no quotation from it or information derived from it may be published without proper acknowledgement.

END USER LICENCE AGREEMENT

Unless another licence is stated on the immediately following page this work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International licence. https://creativecommons.org/licenses/by-nc-nd/4.0/

You are free to copy, distribute and transmit the work under the following conditions:

Attribution: You must attribute the work in the manner specified by the author (but not in any way that suggests that they endorse you or your use of the work).

Non Commercial: You may not use this work for commercial purposes.

No Derivative Works: You may not alter, transform, or build upon this work.

Any of these conditions can be waived if you receive permission from the author. Your fair dealings and other rights are in no way affected by the above.

Take down policy

If you believe that this document breaches copyright please contact [email protected] providing details, and we will remove access to the work immediately and investigate your claim.

Download date: 29. Sep. 2021

Bridging the Gap Between Past and Present: Narrative Nonfiction in the Primary History Classroom

Emma Browning

Thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy

King’s College London
Faculty of Social Sciences and Public Policy
School of Education, Communication and Society


Abstract

Narrative is seen as fundamental to human thought (Turner, 1996), yet in the Key Stage 2 classroom (pupils aged 7-11 years), expository texts are commonly used to support learning in content-based subjects, such as history (Medwell et al., 2017). If narrative is so essential, it might be harnessed as a powerful tool to support learning. This research compares the impact of narrative nonfiction (NNF) and expository text (ET) on the development and retention of conceptual understanding relating to World War One (WWI).

NNF texts combine factual information with narrative devices. Given they present accurate information, embedded in a narrative structure that is thought to mirror the cognitive structures individuals use to make sense of the world around them (Bruner, 1985), NNF texts might support learning. This familiar structure can facilitate the reader’s construction of a strong situation model (van Dijk & Kintsch, 1983), which in turn might support both the development and retention of conceptual understanding of the information presented in the text.

A comparative experiment was conducted in which participants (78 children, aged 9-10 years old) were placed into one of two conditions: in one, information about WWI was conveyed primarily through NNF texts, and in the other, through ETs. Participants completed written pre-, post- and delayed post-assessments to assess development and retention of conceptual understanding. Participants also discussed questions about the texts; these discussions were recorded and transcribed for later coding and analysis. Specific areas of a conceptual understanding of history were observed to develop differently across conditions: generally, participants in the NNF condition showed greater chronological and causal thinking skills. During discussions, those in the NNF condition made a significantly greater number of inferences, whilst those in the ET condition made significantly more incorrect statements. In addition, participants in the NNF condition retained significantly more conceptual understanding at the delayed post-assessments than those in the ET condition, suggesting that NNF texts enhanced retention of information. Overall, these findings suggest that narrative texts have the potential to be powerful learning tools, supporting specific areas of conceptual development. A new model for reading is proposed, highlighting distinctions between narrative and expository reading processes, particularly in terms of the situation models constructed in relation to these different text types. This research has implications for how texts are selected and utilised to support learning in the primary history classroom.

Acknowledgements

Firstly, I would like to thank the two participating schools, and my amazing Year 5 participants, without whom this research would not have been possible. They truly brought this research project to life, and their enthusiasm and interest were an inspiration to me.

I would also like to thank my supervisors, Dr Jill Hohenstein and Dr Aisha Khan-Evans. Not only did Jill inspire me to pursue a PhD in the first place, but she provided invaluable support throughout the whole process, being there every step of the way. I was delighted to obtain funding through the ESRC London Interdisciplinary Social Science Doctoral Training Partnership Studentship Competition and am grateful to all at LISS-DTP for their support.

Finally, I would like to thank my family, friends and colleagues for their continual support and patience throughout this process, and for being there to keep me going even through the hardest of times. I could not have achieved this without you.

Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES

CHAPTER ONE: INTRODUCTION
1.1. Inspiration
1.2. Reading to learn
1.3. The unique potential of narrative
1.4. History learning at the primary level
1.5. The use of narrative and expository texts in the primary classroom
1.6. The issue
1.7. Research questions
1.8. Structure of this thesis

CHAPTER TWO: REVIEW OF THE LITERATURE – INTERACTING WITH TEXTS
2.1. Part One: Reading and comprehension processes
2.1.1. The language debate: phonics vs. whole language
2.1.2. Listening and reading comprehension: the unitary and dual process views
2.1.3. The three mental levels of representation of a text
2.1.4. The importance of prior knowledge
2.2. Part Two: Narrative and expository texts
2.2.1. What is narrative thinking?
2.2.2. Defining narrative
2.2.3. Distinguishing narrative and expository texts
2.2.4. Forms of narrative: the continuum
2.2.5. Memory and the episodic nature of narrative

CHAPTER THREE: A CONCEPTUAL UNDERSTANDING OF HISTORY
3.1. Historical knowledge
3.2. Historical thinking skills
3.3. A curriculum perspective of history learning
3.3.1. The intended curriculum
3.3.2. The implemented curriculum
3.3.3. The attained curriculum

CHAPTER FOUR: THEORETICAL FRAMEWORK
4.1. A social constructivist approach to learning
4.1.1. Reading and meaning making from a social constructivist perspective
4.1.2. Possible difficulties with adopting a social constructivist perspective
4.1.3. History from a constructivist perspective
4.2. Cognitive narratology
4.2.1. Synthesising social constructivism and cognitive narratology
4.2.2. Narrative theory and reading
4.2.3. Narrative theory stages of reading
4.2.4. Bringing together narrative theory stages of reading and mental representations of texts

CHAPTER FIVE: REVIEW OF THE LITERATURE – READING TO LEARN THROUGH NARRATIVE AND EXPOSITORY TEXTS
5.1. Comprehending narrative and expository texts
5.2. Learning from narrative and expository texts
5.2.1. Mental representations and prior knowledge in relation to narrative and expository texts
5.2.2. Development of conceptual understanding in response to narrative and expository texts
5.2.3. Narratives to develop an understanding of unfamiliar, abstract concepts
5.2.4. Narrative and relatability
5.2.5. Narrative and interest: narrative as a learning stimulus
5.3. The reader’s dilemma

CHAPTER SIX: METHODOLOGY AND RESEARCH METHODS
6.1. Overview of research methods
6.2. Choice of methodology
6.3. Data collection
6.3.1. Participants
6.3.2. Setting
6.3.3. Grouping participants into two experimental conditions
6.4. Materials
6.4.1. Questionnaires
6.4.2. Assessments (pre-, post- and delayed post-assessments)
6.4.3. Prior knowledge activities
6.4.4. Reading materials
6.4.5. Discussion questions
6.5. Procedures
6.5.1. Administration of pre-/post-/delayed post-assessments
6.5.2. Interventions
6.6. Ethical Considerations
6.7. Pilot study
6.8. The coding process
6.8.1. Constructing coding schemes
6.8.2. Function coding scheme
6.8.3. Content coding scheme
6.8.4. Truth-value coding scheme
6.8.5. Coding limitations

CHAPTER SEVEN: ASSESSMENT ANALYSIS AND FINDINGS
7.1. Descriptive statistics
7.2. Inferential statistics: the development and retention of conceptual understanding
7.2.1. How did text type affect assessment scores over time?
7.2.2. Ability as an additional factor
7.2.3. Analysis of three different themes of interventions
7.2.4. Analysis of specific areas of conceptual understanding
7.3. Additional factors: descriptive and inferential statistics
7.3.1. Situational interest
7.3.2. Individual interest: ratings of enjoyment of history learning
7.3.3. Individual interest: ratings of interest in WWI
7.3.4. The influence of individual interest on assessment scores
7.3.5. Participant self-evaluations of conceptual understanding
7.3.6. Prior knowledge
7.3.7. Influence of additional, personal factors
7.4. Discussion
7.4.1. The development of conceptual understanding
7.4.2. The retention of conceptual understanding
7.4.3. Additional factors
7.4.4. Conclusion

CHAPTER EIGHT: DISCOURSE ANALYSIS AND FINDINGS
8.1. Descriptive statistics
8.1.1. Function codes
8.1.2. Content codes
8.1.3. Truth-value codes
8.2. Inferential statistics: function and content codes
8.2.1. Frequency of function and content codes across conditions
8.2.2. Possible causes for the differing frequency of codes across conditions
8.2.3. Influence of codes on assessment scores
8.2.4. Reading ability and the production of content codes
8.3. Inferential statistics: truth-value coding
8.3.1. Perceptions of truth-value across conditions
8.3.2. Influence of truth-value perceptions on assessment scores
8.3.3. Justifications for truth-value perceptions
8.4. Discussion
8.4.1. The development of conceptual understanding
8.4.2. The construction of mental representations and the impact on conceptual understanding
8.4.3. Judgements of and justifications for the truth-value of texts

CHAPTER NINE: GENERAL DISCUSSION
9.1. The development of conceptual understanding in response to texts
9.1.1. Substantive knowledge and historical thinking skills
9.1.2. Second-order knowledge
9.1.3. Inaccuracies in conceptual understanding
9.1.4. Conclusions on the development of conceptual understanding
9.2. The retention of conceptual understanding in response to texts
9.3. Mental representations of narrative and expository texts
9.3.1. Stage Two: Interpreting the text. The construction of a textbase
9.3.2. Stage Two: Interpreting the text. The construction of situation models
9.3.3. Stage Three: Translating the text into learning
9.4. Perceptions of truth-value of narrative nonfiction and expository texts
9.4.1. Participants’ truth-value judgements and justifications
9.4.2. Implications of truth-value judgements for using different text types in the classroom
9.5. Limitations
9.5.1. Ecological validity
9.5.2. Measures of learning
9.5.3. The roles of the participants and the researcher
9.5.4. The possible influence of reading ability
9.6. Future directions

CHAPTER TEN: A NEW MODEL OF READING
10.1. Stage One – Accessing a text
10.2. Stage Two – Interpreting a text
10.2.1. Textbase construction
10.2.2. Situation model construction
10.3. Stage Three: Translating the text into learning
10.4. The need for a new model
10.5. Benefits of the narrative and expository pathways

CHAPTER ELEVEN: CONCLUSION
11.1. A contribution to knowledge: theoretical implications
11.2. Pedagogical implications
11.3. A final note

REFERENCES

APPENDICES
Appendix A: G*Power calculation
Appendix B: Details of participants on the Special Educational Needs register
Appendix C: Reading and history levels of participants across schools
C(i): Reading levels
C(ii): History levels
Appendix D: Consent package sent home to parents/guardians
D(i): Information sheet for parents
D(ii): Child-friendly information sheet
D(iii): Consent form
D(iv): Short questionnaire for parents
Appendix E: Questionnaires attached to assessments
E(i): Questionnaire attached to pre-assessments
E(ii): Questionnaire attached to post-assessments
Appendix F: Written assessment
Appendix G: Annotated written assessment
Appendix H: Mark scheme for written assessment
Appendix I: Prior knowledge activities
I(i) Intervention One (War Begins) prior knowledge activities
I(ii) Intervention Two (Trench Life) prior knowledge activities
I(iii) Intervention Three (The Home Front) prior knowledge activities
Appendix J: Example script for War Begins intervention session
Appendix K: Narrative nonfiction and expository texts designed for interventions
K(i): Intervention 1 (War Begins) NNF text
K(ii): Intervention 1 (War Begins) ET
K(iii): Intervention 2 (Trench Life) NNF text
K(iv): Intervention 2 (Trench Life) ET
K(v): Intervention 3 (The Home Front) NNF text
K(vi): Intervention 3 (The Home Front) ET
Appendix L: Discussion questions for each intervention
Appendix M: Additional chronological sequencing task cards
Appendix N: Function coding scheme
Appendix O: Content coding scheme
Appendix P: Truth-value coding scheme
Appendix Q: Rules for codable utterances
Appendix R: Bonferroni calculations for groups of tests
Appendix S: Repeated measures ANOVA and post-hocs run excluding anomaly observed on pre-assessment
Appendix T: Post-hoc analyses exploring main effects for time
Appendix U: The influence of ratings of enjoyment of history learning on assessment scores
Appendix V: The influence of ratings of interest in WWI on assessment scores
Appendix W: Repeated measures ANCOVAs and ANOVAs for additional factors
Appendix X: Grouped analyses for Bonferroni calculations
Appendix Y: Comparisons of raw coding data and arcsine transformed data

List of tables

Table 1: A continuum of the different forms of narrative
Table 2: Comparison of social constructivist and narrative theory stages of reading
Table 3: Percentage of pupils eligible for Pupil Premium and receiving free school meals in participating schools
Table 4: Number of participants from each school
Table 5: Grouping of participants into experimental conditions
Table 6: Prior knowledge activities for each intervention
Table 7: Themes within WWI topic
Table 8: Different features and controlled features across the two text types
Table 9: Flesch scores and number of words across pairs of texts in the three interventions
Table 10: Outline of the stages of the experiment
Table 11: Overview of function codes
Table 12: Overview of truth-value justification codes
Table 13: Cohen’s guidelines for effect sizes
Table 14: Repeated measures ANOVA with condition as a factor
Table 15: Mean scores and standard deviations (SDs) of NNF and ET conditions across the three assessments
Table 16: Post-hoc paired samples t-tests comparing assessment scores over time
Table 17: Post-hoc paired samples t-tests comparing assessment scores over time for each condition
Table 18: Post-hoc independent samples t-tests comparing assessment scores across conditions
Table 19: ANCOVA with condition as a factor and reading level as a covariate
Table 20: Adjusted mean assessment scores for conditions, with reading level as a covariate
Table 21: Post-hoc ANCOVAs with reading level as a covariate
Table 22: Repeated measures ANOVA with condition and a median split of reading level as factors
Table 23: Mean assessment scores for higher and lower level readers across the three assessments
Table 24: Post-hoc paired samples t-tests comparing assessment scores over time for lower and higher level readers
Table 25: Post-hoc independent samples t-tests comparing assessment scores between lower and higher level readers
Table 26: Pearson correlations between reading levels and assessment scores
Table 27: Repeated measures ANOVA for assessment questions relating to Intervention 1: War Begins
Table 28: Mean scores and standard deviations for assessment questions relating to Intervention 1: War Begins
Table 29: Repeated measures ANOVA for assessment questions relating to Intervention 2: Trench Life
Table 30: Mean scores and standard deviations for assessment questions relating to Intervention 2: Trench Life
Table 31: Post-hoc paired samples t-tests comparing assessment scores over time for each condition
Table 32: Post-hoc independent samples t-tests comparing assessment scores across conditions
Table 33: Repeated measures ANOVA for assessment questions relating to Intervention 3: The Home Front
Table 34: Mean scores and standard deviations for assessment questions relating to Intervention 3: The Home Front
Table 35: Repeated measures ANOVA for assessment questions spanning all three intervention themes
Table 36: Mean scores and standard deviations for assessment questions spanning all three intervention themes
Table 37: Post-hoc paired samples t-tests comparing assessment scores over time for each condition
Table 38: Post-hoc independent samples t-tests comparing assessment scores across conditions
Table 39: Repeated measures ANOVA for simple conceptual thinking questions
Table 40: Mean scores and standard deviations for simple conceptual thinking questions
Table 41: Post-hoc paired samples t-tests comparing assessment scores over time for each condition
Table 42: Post-hoc independent samples t-tests comparing assessment scores across conditions
Table 43: Repeated measures ANOVA for complex conceptual thinking questions
Table 44: Mean scores and standard deviations for complex conceptual thinking questions
Table 45: Repeated measures ANOVA for chronological thinking questions
Table 46: Mean scores and standard deviation for chronological thinking questions
Table 47: Mean scores and standard deviation for each condition on sequencing task
Table 48: Repeated measures ANOVA for causal thinking questions
Table 49: Mean scores and standard deviations for causal thinking questions
Table 50: Post-hoc paired samples t-tests comparing assessment scores over time for each condition
Table 51: Post-hoc independent samples t-tests comparing assessment scores across conditions
Table 52: Post-hoc repeated measures ANOVA for causal thinking questions, looking at the interaction effect between condition and time
Table 53: Frequency of participants' ratings of enjoyment of intervention sessions
Table 54: Repeated measures ANOVA with condition and participants’ reported enjoyment of interventions as factors
Table 55: Frequency of participants' ratings of enjoyment of history learning before and after intervention sessions
Table 56: Mann-Whitney U exploring difference in ratings of enjoyment of history learning across conditions, for both pre- and post-questionnaires
Table 57: Mean rank of NNF and ET participants’ ratings of enjoyment of history learning on pre- and post-questionnaires
Table 58: Wilcoxon results exploring differences in ratings of history learning enjoyment over time for each condition
Table 59: Frequency of participants interested in WWI
Table 60: Chi-squares exploring participants’ interest in WWI topic as stated before and after interventions
Table 61: Number of participants in each condition expressing interest in WWI on pre- and post-questionnaires
Table 62: Mean pre-assessment scores for those interested/not interested in WWI
Table 63: Participants' self-evaluations of knowledge of WWI before and after interventions
Table 64: Mann-Whitney U tests comparing self-evaluations of NNF and ET conditions on pre- and post-questionnaires
Table 65: Wilcoxon tests exploring self-evaluations of participants over time, in the NNF and ET conditions
Table 66: Sources of participants’ prior knowledge and developed knowledge
Table 67: Pearson correlations for participants’ pre-assessment scores and post-/delayed post-assessment scores
Table 68: Pearson correlations for participants’ pre-assessment scores and post-/delayed post-assessment scores for the NNF and ET conditions
Table 69: Average number of hours participants spent reading at home, as stated by parents
Table 70: Repeated measures ANOVA with condition and average number of hours spent reading at home in a week as factors
Table 71: Post-hoc repeated measures ANOVAs exploring interaction between time and average number of hours spent reading at home in a week
Table 72: Frequency of participants that selected various hobbies
Table 73: Repeated measures ANOVA with condition and reading as a hobby as factors
Table 74: Mean scores and standard deviations of participants who listed reading as a hobby and those who did not
Table 75: Post-hoc paired samples t-tests comparing assessment scores over time
Table 76: Post-hoc independent samples t-tests comparing assessment scores across participants with and without reading as a hobby
Table 77: Cohen’s guidelines for effect sizes
Table 78: Frequency of overarching and specific function codes
Table 79: Frequency of the five specific content codes
Table 80: Total number of truth-value judgement codes
Table 81: Total number of truth-value justification codes
Table 82: Independent samples t-tests comparing frequency of three overarching function codes across conditions
Table 83: Means and standard deviations for three overarching function codes across conditions
Table 84: MANOVA comparing frequency of three specific function codes across conditions
Table 85: Means and standard deviations for three specific function codes across conditions
Table 86: MANOVA comparing frequency of three historical thinking codes across conditions
Table 87: Means and standard deviations for three historical thinking codes across conditions
Table 88: MANOVA comparing frequency of the two historical knowledge codes across conditions
Table 89: Means and standard deviations for the two historical knowledge codes across conditions
Table 90: MANOVA comparing frequency of five specific content codes across conditions
Table 91: Means and standard deviations for five specific content codes across conditions
Table 92: Pearson correlations between seven codes of interest
Table 93: Summary of hierarchical regression analysis for chronological codes
Table 94: Summary of hierarchical regression analysis for complex codes
Table 95: Summary of hierarchical regression analysis for inaccurate codes
Table 96: Summary of hierarchical regression analysis for collaborative (negotiation) codes
Table 97: Summary of hierarchical regression analysis for hypothetical codes
Table 98: Summary of hierarchical regression analysis for inference codes
Table 99: Pearson correlations between two overarching function codes and post-/delayed post-assessments
Table 100: Pearson correlations between three specific function codes and post-/delayed post-assessments
Table 101: Pearson correlations between three historical thinking content codes and post-/delayed post-assessments
Table 102: Pearson correlations between two historical knowledge content codes and post-/delayed post-assessments
Table 103: Pearson correlations between five specific content codes and post-/delayed post-assessments
Table 104: Summary of hierarchical regression analysis with condition and chronological codes as independent variables
Table 105: Summary of hierarchical regression analysis with condition and chronological codes as independent variables
Table 106: MANCOVA comparing frequency of five specific content codes across conditions
Table 107: MANOVA comparing frequency of three truth-value judgement codes across conditions
Table 108: Means and standard deviations for three truth-value judgement codes across conditions
Table 109: Chi-squares comparing truth-value judgements in each intervention across the two conditions
Table 110: Correlations between three truth-value judgement codes and post-/delayed post-assessments
Table 111: t-tests comparing the frequency of the three truth-value justification codes across conditions
Table 112: Means and standard deviations for the three truth-value justification codes across conditions

List of figures

Figure 1: Model of conceptual understanding of history
Figure 2: The cycle of using and informing pre-existing knowledge structures, through narrative thinking
Figure 3: A model of reading
Figure 4: Adapted model of a conceptual understanding of history
Figure 5: Illustration of differences and similarities across equivalent narrative nonfiction and expository texts
Figure 6: Overview of the content coding scheme
Figure 7: Frequency of pre-assessment scores across conditions
Figure 8: Frequency of post-assessment scores across conditions
Figure 9: Frequency of delayed post-assessment scores across conditions
Figure 10: Points scored on assessment questions relating to the three intervention sessions
Figure 11: Proportion of points for different thinking skills questions on pre-assessments
Figure 12: Proportion of points for different thinking skills questions on post-assessments
Figure 13: Proportion of scores for different thinking skills questions on delayed post-assessments
Figure 14: Mean score of each condition across the three assessments
Figure 15: Participants’ reported enjoyment of NNF and ET intervention sessions
Figure 16: Mean scores of participants with different ratings of enjoyment of history learning
Figure 17: Mean assessment scores of participants who love learning about history across conditions
Figure 18: Mean scores of ET participants who stated an interest/did not state an interest in WWI
Figure 19: Correlation of participants’ pre- and post-assessment scores
Figure 20: Correlation of participants’ pre- and delayed post-assessment scores
Figure 21: Mean assessment scores according to average number of hours spent reading at home in a week
Figure 22: Overall number of codes produced, across coding schemes and conditions
Figure 23: Frequency of historical knowledge content codes, across the three historical thinking content codes
Figure 24: Proportion of specific codes occurring within conceptual thinking codes
Figure 25: Proportion of specific codes occurring within chronological thinking codes
Figure 26: Proportion of specific codes occurring within causal thinking codes
Figure 27: Truth-values responses given across NNF and ET conditions in response to War Begins texts
Figure 28: Truth-values responses given across NNF and ET conditions in response to Trench Life texts
Figure 29: Truth-values responses given across NNF and ET conditions in response to Home Front texts
Figure 30: Strong positive correlation between number of partially factual codes and post-assessment scores, grouped by condition
Figure 31: Strong positive correlation between number of partially factual codes and delayed post-assessment scores, grouped by condition
Figure 32: Moderate negative correlation between number of entirely factual codes and delayed post-assessment scores, grouped by condition
Figure 33: A new model of reading

Chapter One: Introduction

1.1. Inspiration

During my first year of teaching, my Year 6 class and I studied World War Two (WWII). Previously, I had researched the potential of narrative to support young children’s conceptual understanding of evolution (Browning & Hohenstein, 2015), and specifically, how narrative could support children in overcoming the conceptual constraints that make evolution a difficult concept to comprehend. History learning also presents barriers to young learners: the past is a foreign country (Lowenthal, 1985), in that it is distant and unobservable, and is consequently difficult for young learners to conceive of (Wineburg, 2001). As such, I became interested in whether narrative could be utilised in the history classroom to make history more accessible to learners. Alongside delivering discrete history lessons based on WWII to my Year 6 class, I began to read pupils an historical fiction book called ‘Once’, by Morris Gleitzman. This narrative follows a young, fictional Jewish boy who escapes from a Catholic orphanage – where his parents had left him for protection – at the outbreak of WWII. Following his escape, the boy traverses the complex landscape of WWII, beginning to witness, and struggling to understand, the events occurring in the world around him. At the end of every school day, I would read to the class for ten minutes; the children pulled their chairs closer to mine, with even the most reluctant of learners absorbed in the story. During the discussions after reading, I felt that greater strides were made in the children’s conceptual understanding of WWII than in the entire hour-long history lessons taught during the week. This narrative soon became one of my favourite teaching tools, and sparked my curiosity about what specific benefits narrative might provide in relation to history learning, and what it is about narrative that makes it unique.

1.2. Reading to learn When a reader interacts with a text, they construct three levels of mental representation of the text: the surface form, the textbase and the situation model (van Dijk & Kintsch, 1983). The situation model is a representation of the events denoted by a text, and is infused with the reader’s prior knowledge (Zwaan, 2016): this representation is thought to be important in enabling learning from a text (van Dijk & Kintsch, 1983; Zwaan, 2001). However, research suggests that situation models might be differentially constructed in response to narrative and expository texts (Zwaan, 1994), and in response to the reading goals with which a reader approaches a text (which are also often influenced by text type) (Linderholm & van den Broek, 2002; McCrudden & Schraw, 2007; Bailey et al., 2017). Differences in the construction of situation models might consequently influence the quality of learning from a text. It has been suggested that differences in the construction of situation models might result from differences in the typical structure and content of narrative and expository texts (Newberry & Bailey, 2019); yet research also indicates that, when these differences are controlled for, and instructions regarding reading goals are the same, differences in mental representations constructed in response to narrative and expository texts still emerge (Wolfe & Mienko, 2007; Wolfe & Woodwyk, 2010). This suggests that there may be a deeper, underlying difference between narrative and expository texts that influences how these two text types are processed. This might be due to some form of unique processing of narratives, as suggested in the field of narrative theory. When reading a narrative, readers are thought to construct a storyworld (Herman, 2009a). Although the concept of storyworlds theoretically overlaps with that of situation models – to the point that they are considered as almost interchangeable concepts (Gerrig, 1993) – the fact that storyworlds are uniquely associated with narratives suggests that they hold some distinctive properties, distinguishing them from situation models constructed in response to other text types, such as expository texts. It is important to determine the specific nature of situation models constructed in response to different text types, and the influence that these have on learning, in order to effectively utilise different text types to support learning in the classroom.

1.3. The unique potential of narrative As a phenomenon, narrative is ubiquitous (León, 2016), and is thought to be fundamental to the human mind (Turner, 1996). Narrative constitutes a particular mode of thought (Bruner, 1985), or a type of logic (Herman, 2002), which individuals use to understand and make sense of the world around them: narrative is a way of representing knowledge, and a way of structuring the mind. The word narrative itself is derived from the Latin gnarare, meaning ‘to know’. As such, it is arguable that narrative texts might prove to be beneficial to learning, for a number of reasons. Firstly, narratives are thought to mirror the cognitive structures that individuals use for making sense of the world around them (Bruner, 1986; Gudmundsdottir, 1991; Doyle & Carter, 2003). Not only might this familiar, accessible structure support the understanding of new and unfamiliar information, but information encountered in this structure might be more easily assimilated into pre-existing knowledge structures. In this sense, narratives form a natural vehicle for educating learners (Egan, 1986). Secondly, it is thought that readers of narrative texts can transport, or relocate, into their mental representation of a narrative (Gerrig, 1993; Herman, 2009a), in order to transcend the limits of their personal realities and experiences (Browning & Hohenstein, 2015). This potential benefit is particularly pertinent to history learning, because of the distant, unobservable nature of history. Thirdly, narratives are thought to be memorable (Reiss et al., 1999; Norris et al., 2005; Schank & Berman, 2013) because they embed information in a meaningful context. Finally, narrative competence is generally accepted to appear at an early age and in most cultures (Polkinghorne, 1988): it is ‘a human universal on the basis of which transcultural messages about the nature of a shared reality can be transmitted’ (White, 1980:1). Therefore, any benefits that narrative may offer are likely to be relevant to a broad range of learners. However, whilst this positive ‘narrative effect’ (Norris et al., 2005:536) is discussed commonly throughout theoretical literature, there is little empirical research that explicitly compares the effect of narrative and expository texts on learning.

One concern with the use of narratives to support learning in the history classroom is that narratives do not have to be entirely factual. This leads to apprehension that the use of narrative as a learning tool might be counterproductive as it may reduce history to myth (Barthes, 1993), distorting historical reality so that learners pick up misconceptions regarding historical facts and events (Freeman & Levstik, 1988). To address this concern, this thesis focuses on narrative nonfiction (NNF) (Weisman, 2011): this is a type of narrative that is entirely factual, but is written using literary devices and structures associated with narratives, such as close attention to plot structure, character (Alpert, 2006; Gutkind, 2007; Nünning, 2015), and other techniques such as strong sensory language. It is suggested that these are the features that make narrative appealing and engaging to readers (Freeman & Levstik, 1988; Ehlers, 1999). NNF ‘harnesses the power of facts to the techniques of fiction [or narrative]’ (Alpert, 2006:26): it combines accurate information, which is essential when teaching subjects such as history, with narrative devices, which appeal to a reader’s basic cognitive functioning, creating a potentially powerful teaching tool (Bage, 1999).

1.4. History learning at the primary level According to Wineburg, historical thinking is not a natural process, and does not result ‘automatically from psychological development’ (2001:7). Others argue that younger learners are incapable of historical thought because their minds are cognitively underdeveloped for this area of learning (Zaccaria, 1978). This is partly because the experiences children have with history are remote (Wineburg, 2001; von Heyking, 2004): history is largely unobservable, and therefore developing an understanding requires some form of abstract thought. Hallam (1967) found that Piaget’s stages of cognitive development occurred much later in relation to historical thought than in relation to scientific and mathematical thought: specifically, the formal operational stage, during which children are no longer restricted by their concrete, immediate environment (Piaget, 1952), was not observed to occur until 16.2 to 16.6 years of age on an historical task. Such difficulties create a tension in history education, highlighted by Wineburg, between ‘the familiar past’ and ‘the strange and inaccessible past’ (2001:6). The former involves teachers and learners contorting the past to fit with their existing beliefs, in an attempt to make the past more accessible. However, if learners can catch a glimpse of the strange and inaccessible past, the subsequent experiences can be mind-expanding, supporting an understanding of the past without radically distorting it. This highlights the issue of how to communicate the past effectively to those who were not present at the time (Harnett, 2010), without contorting it too greatly in attempts to make it more accessible. Although difficult, this issue may not be irresolvable: opposing Hallam’s (1967) findings, Fillpot (2012) found evidence of sophisticated historical thinking by a third grade (Year 4) pupil who had been taught by the lead teacher of the Bringing History Home (BHH) project.
Fillpot concluded that the greatest restrictions on the ability to think historically are curriculum and teaching methods, rather than age and the distant nature of history. Perhaps the use of narrative could prove to be one such teaching method in allowing teachers and learners to bridge the vast gap between the past and the present.

1.5. The use of narrative and expository texts in the primary classroom Despite claims about the power of narrative, this text type is often overlooked as a teaching tool in particular areas of the curriculum. The 2014 primary National Curriculum promotes the use of both ‘fiction and nonfiction’ to help children develop literacy skills, their ‘knowledge of themselves and the world’ and to ‘gain knowledge across the curriculum’ (DfE, 2013:14). In the context of primary education, fiction is often equated with narrative, whilst nonfiction typically refers to expository texts (these terms will be clarified in section 2.2.3). Little research exists to document primary teachers’ perceptions of children’s literature and how they select it for use in the classroom (Cremin et al., 2008). However, research that does exist suggests that fiction and nonfiction texts might be used for different purposes and to achieve different outcomes. Firstly, it has been found that children’s literature is predominantly used to support the teaching of literacy skills (Adalsteinsdottir et al., 2011), rather than to develop knowledge across the curriculum, and that fiction specifically is typically used when children are learning to read (Duke, 2000). Conversely, expository textbooks are the ‘dominant translation of the curriculum’ (Repoussi & Tutiaux-Guillon, 2010:156), in that they are one of the most widely used resources for conveying the actual content of the curriculum to learners. In particular, expository texts are associated with content-based foundation subjects¹, such as history and science (Medwell et al., 2017), because their key purpose is to inform (Hall et al., 2005; Kuhn et al., 2017). This is reflected in teacher perceptions: teachers often feel informational texts have high educational value, whereas fictional books hold a greater entertainment value (Kotaman & Tekin, 2017) and are associated with reading for enjoyment (Marriott, 1986; Bortnem, 2008).
These perceptions may reflect that narratives are known to evoke the imagination, which is historically associated with fantastical worlds, therefore opposing notions of evidence and fact (Peñaloza & Robles-Piñeros, 2020). As a result, whilst narratives are more commonly associated with learning to read, expository texts are usually utilised when reading to learn.

¹ Content-based subjects such as history and geography are contrasted with skills-based subjects such as English, maths and art. Although learning in most subjects involves both acquiring skills and learning a body of information, skills-based subjects are dominated by the former, and content-based subjects by the latter.

Considering the history classroom more specifically, when primary school teachers were asked about their use of texts to support teaching, historical stories proved popular: 30% of respondents stated that they ‘often’ used historical stories, compared to 19% who ‘often’ used textbooks (Historical Association, 2016). However, when asked how history is largely taught, Key Stage 1 (KS1) and Key Stage 2 (KS2) responses were presented separately, revealing a slightly different picture. In KS1, 58% of respondents either ‘always’, ‘very often’ or ‘often’ used storytelling to teach history, compared to 30% of KS2 teachers. This suggests that whilst historical stories are popular in the primary classroom, popularity decreases from KS1 to KS2. However, these results must be interpreted with caution: the respondents were likely to be history enthusiasts, which is reflected in the fact that 31% of respondents were history graduates, and 67% held a history subject leadership role. Therefore, results might not be representative of general practice. In addition, this data does not make clear how stories were typically being used in the classroom: narratives might be used as a stimulus to spark interest in an historical topic, but not fully integrated into teaching and learning practices in order to unlock their full potential.

1.6. The issue As discussed above, research indicates that both text type and reading goal influence the construction of situation models, and thus may affect how a reader interprets and learns from a text. Yet this area remains relatively unexplored with younger children in the classroom, where narratives are often associated with learning to read, and expository texts with reading to learn. As such, the possible benefits of narratives on learning in the primary classroom are unestablished. This identifies the need for research to explore how young learners respond to different text types, in order to enable practitioners to choose the most appropriate text types for particular circumstances.

Equally, the above highlights difficulties that primary-age pupils might encounter in history learning, due to the distant nature of history: learners need to be encouraged to think outside of their own concrete, everyday experiences, in order to imagine and understand concepts that are largely unobservable. Yet the importance of developing a conceptual understanding of history cannot be overstated: not only does an understanding of history provide us with a sense of who we are, and where we have come from, but it also provides us with both inspiration and moral lessons to take forward into the future. As novelist Robert Heinlein writes, “[a] generation which ignores history has no past – and no future” (1973:241). Therefore, it is crucial to explore how learners can effectively develop a conceptual understanding of history.

1.7. Research questions In light of these two issues, this thesis intends firstly to compare how narrative and expository texts might support young learners in developing and retaining a conceptual understanding of history. It has previously been argued that establishing whether there is a beneficial narrative effect is ‘virtually impossible’ (Norris et al., 2005:553), due to difficulties in comparing approximately equivalent narrative and expository texts. However, this research, through adopting a quasi-experimental approach, intends to take a step towards providing initial evidence for the possible differential effects of narrative and expository texts on learning.

From any differences that may arise, this thesis aims to reflect on the mental representations that readers construct in response to texts, exploring how these might differ in unique ways in relation to narrative and expository texts. In recognition of the fact that narratives are not always perceived as factual, this research also seeks to explore readers’ perceptions of the truth-value of texts – that is, how reliable, or ‘true’ texts are believed to be – and the impact that this might have on the development of conceptual understanding. This led to the development of the following four research questions:

1. How does narrative nonfiction affect the development of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
2. How does narrative nonfiction affect the longer term retention of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
3. How might mental representations diverge across narrative and expository texts for primary-aged readers?
4. How do primary-aged readers perceive and judge the truth-value of narrative nonfiction and expository texts, and what implications might this have for the development and retention of conceptual understanding?

1.8. Structure of this thesis The present chapter has provided the context for this thesis, stating a rationale and setting forth the research questions to be addressed. Following this, Chapters 2 to 5 will encompass the literature review and theoretical framework. Chapter 2 will discuss literature relating to the process of reading, before identifying the distinguishing features of narrative and expository texts. Chapter 3 will consider what constitutes a conceptual understanding of history, contextualising this within the National Curriculum for history in England. Chapter 4 lays out the theoretical framework that this thesis is embedded within, synthesising social constructivist and cognitive narratology approaches and relating these back to the processes of reading discussed in Chapter 2. This chapter will also reflect on how these approaches can be applied to history learning. Finally, a review of the literature that explores learning from narrative and expository texts will be conducted in Chapter 5.

Methodology and research methods are addressed in Chapter 6. Justifications for methodological choices are discussed, before a description of the various methodological stages of this research is given. Chapter 7 presents the findings from the written assessments completed by participants, alongside a brief discussion of these. Similarly, Chapter 8 presents the findings from discourse analyses, with a brief discussion. The main discussion of research findings is presented in Chapter 9, which aims to integrate the findings from the previous two chapters, exploring these in relation to the four research questions. This chapter will also consider limitations of, and future directions for, this research. In light of this general discussion, Chapter 10 will propose a new model of reading, distinguishing possible differential processing pathways taken in response to narrative and expository texts. Finally, Chapter 11 will conclude this thesis, considering both the theoretical and pedagogical implications of this research.

Chapter Two: Review of the literature – Interacting with texts

The first part of this chapter will consider reading processes, including both a brief consideration of processes involved in learning to read (phonics and whole language) and a more detailed discussion on processes involved in reading to comprehend and learn, focusing on the construction of mental representations of texts. The second part of this chapter will go on to define and distinguish different text types, focusing on narrative and various types of narrative. Overall, this section aims to provide a foundation of different text types and reading processes, which will be drawn on moving forward in this thesis to explore how and why these processes might differ across the defined text types.

2.1. Part One: Reading and comprehension processes This section will begin by briefly considering how readers initially access a text when they are learning to read. Following this, it will compare reading and listening comprehension processes: although participants will be provided with print copies of texts during the experiment, exposure to texts will be primarily audial, and therefore it is important to compare these comprehension processes. Next, it will explore an area particularly pertinent to this research: the mental levels of representation constructed during the process of reading. Finally, it will briefly discuss the importance of prior knowledge during reading comprehension.

2.1.1. The language debate: phonics vs. whole language The ways in which we read, understand and interpret texts are widely contested. Policies regarding the teaching of reading in schools have moved between two main approaches: the whole language approach and the phonics approach (Soler & Openshaw, 2007; Williams, 2014). The whole language approach is embedded in the philosophy that learning to read should be a natural process (Stahl & Miller, 1989) in which learners consider the words and sentences in a text as part of a whole text, and therefore embedded in a context. From this perspective, teaching should be integrated into meaningful language tasks (National Reading Panel, 2000). This view of reading heavily influenced the National Literacy Strategy (DfEE, 1998), which suggested that skilled readers should use a variety of graphophonic, semantic and syntactic clues to gain meaning from texts (Williams, 2014). In contrast, the phonics approach to reading claims that readers fragment words and larger units of language using a set of technical skills, such as segmenting, to enable comprehension (Soler & Openshaw, 2007). This approach relies on the correspondence between graphemes and phonemes: using knowledge of letter-sound correspondence, readers can segment words into smaller units, which can aid the translation of written words into more familiar spoken words for early readers. The phonics approach has influenced educational policy in more recent years, being promoted in influential documents such as the Rose Review (Rose, 2006), and remaining a focus at Key Stage 1 in the 2014 National Curriculum (DfE, 2013). The approach suggests that comprehension is a product of reading, in that it is a result of processing which occurs after reading. Conversely, the whole language approach views comprehension as a process: comprehension is actively constructed during the reading process (Williams, 2014).

Whilst ongoing debates question which of these approaches is most effective, it is likely that both approaches are utilised in the reading process. Stanovich (1980) proposes an interactive-compensatory model of reading, in which the process of reading draws simultaneously on several knowledge sources, including whole language and phonic techniques. Despite evidence to support this approach (Stanovich et al., 1981; Faust & Kandelshine-Waldman, 2011), it must be recognised that different processes occurring within this model may be more or less relevant at different stages of reading development. For instance, research shows that systematic phonics instruction only significantly improves reading comprehension skills in younger children (first-grade, or Year 2, and below) (National Reading Panel, 2000). Conversely, Gee’s (1995) meta-analysis evaluating the effectiveness of phonic and whole language approaches on reading comprehension showed a positive effect size in the direction of whole language. Whilst phonemic awareness and accurate word decoding skills are an important prerequisite to effective comprehension (Stahl & Miller, 1989), once children have mastered these skills, it is likely that they rely on other processes – more pertinent to the whole language approach – to effectively comprehend texts on a more complex level.

This thesis is primarily concerned with the comprehension and interpretation of texts, rather than the decoding of texts: it is interested in reading to learn, as opposed to learning to read. Therefore, whilst phonic approaches are recognised to be important to the reading process, this thesis will orientate towards a whole language approach. As the primary modality of texts used in this research project will be audial, the discussion below will explore similarities and differences between reading and listening comprehension processes.

2.1.2. Listening and reading comprehension: the unitary and dual process views There are two views of listening and reading comprehension: the unitary comprehension process model suggests that individuals use the same comprehension processes during listening comprehension and reading comprehension (Sinatra, 1990; Diakidoy et al., 2005), whilst the dual process theory claims that although reading and listening comprehension share common elements, there are important differences between the two processes. This section will explore evidence for each of these theories.

To explore reading and listening comprehension processes, Diakidoy et al. (2005) had children in Grades 2, 4, 6 and 8 (aged 7- to 14-years-old), from across five schools in Cyprus, read two texts (one narrative and one expository) and listen to two texts (one narrative and one expository). All texts were judged by teachers as being appropriate for each grade. Following each text, participants were asked to state whether literal or inferential questions about the text were correct or not. The data collected showed that listening and reading comprehension scores were significantly correlated at all grade levels. Listening comprehension scores were higher than reading comprehension scores in Grade 2, although the two became more closely aligned from Grade 4 onwards, with reading comprehension significantly improving between Grades 2 and 4. This is likely to be because it is between Grades 2 and 4 that decoding skills are expected to be mastered. These findings align with the view that reading and listening comprehension converge at word level (Sinatra, 1990): once children are competent decoders, their listening and reading comprehension processes are the same. However, as questions required participants to state whether a statement was correct or incorrect, there was a 50% chance of participants correctly guessing answers, making the reliability of comprehension measures questionable. In addition, this research was conducted in Greek, and therefore, findings may not be generalisable to other languages.

In support of Diakidoy et al. (2005), poor readers have been found to score significantly lower on comprehension tests when reading a text compared to when listening to a text (Horowitz & Samuels, 1985). The authors suggested that poorer performance when reading was due to barriers posed by problems with decoding skills. From this perspective, listening to a text removes a barrier to comprehension for younger and poorer readers. Cognitive load theory (CLT) suggests that an individual’s working memory can only deal with a limited amount of information (Sweller, 1988): with regard to reading processes, if poorer readers do not have to use processing space to decode words, they can focus all of their working memory on comprehension (Carlisle & Felbinger, 1991). However, Horowitz & Samuels’ (1985) research involved just 18 poorer readers, most of whom showed difficulties with word decoding skills specifically. Yet poorer readers can have difficulties specifically in reading comprehension, listening comprehension, or both areas of comprehension (Carlisle & Felbinger, 1991). The fact that some readers may have difficulties with either listening or reading comprehension suggests that these two skills do not partake of the same processes, otherwise both areas would prove to be difficult. This is consistent with the dual process theory: although reading and listening comprehension share common elements, there are also differences which mean that the two processes are not identical. However, there is little research to suggest exactly what these differences might be (Sinatra, 1990). Some suggest that differences are due to characteristics of spoken language and written language: for instance, spontaneous spoken language is more fragmented than written language, as it is not presented in fully formed sentences (Sinatra, 1990; Carlisle & Felbinger, 1991).
However, whilst this might create processing differences between reading and listening comprehension when listening to spontaneous spoken language, it would not equally apply to the comprehension of written language read aloud.

Overall, it appears likely that once words have been effectively decoded and processed, the processes of meaning construction are similar across both reading and listening comprehension (Sinatra, 1990), particularly when listening comprehension involves a written text spoken aloud as opposed to spontaneous spoken language (Gerrig, 1993). Therefore, a unitary comprehension process view will be adopted in this thesis; when interaction with a text is discussed, it is discussed in relation to texts that are either read or listened to, unless otherwise specified.

2.1.3. The three mental levels of representation of a text Once a reader can access a text, whether through reading or listening comprehension, they begin to construct multiple levels of mental representation of the text: the surface form, the textbase and the situation model (van Dijk & Kintsch, 1983). In the following section, descriptions of each mental representation will be provided. Following this, a reader’s construction of these mental representations will be explored. The presence of these levels of representation in memory will be considered, before a brief discussion of the role of mental levels of representation in learning from a text. Finally, concerns regarding these levels of representation will be laid out.

2.1.3.1. Surface forms, textbases and situation models The surface form is a mental representation of the exact wording of a text, and is thought to decay quickly in memory. The textbase is a representation of the text itself; it contains propositions presented in the text and the relations between these (van Dijk & Kintsch, 1983). Primarily, the textbase relates to the linguistic content and underlying structure of the text. Situation models represent the situations and events denoted by the text (Zwaan, 2016). They are thought to be bound in time and space, and are specific and episodic in nature (Zwaan, 2001; Tapiero, 2007): ‘[t]he situation model is a microworld that includes a spatial setting, agents in pursuit of goals, and causal chains of events that unfold chronologically’ (Graesser et al., 2002:258). This level of mental representation is also infused with a reader’s prior knowledge. Both textbases and situation models are thought to be stored in the episodic memory (van Dijk & Kintsch, 1983), which deals with memories of events and experiences that are primarily temporal and spatial in nature (Souchay et al., 2013) (see section 2.2.5 for a more detailed discussion of the nature of episodic memory). Whilst this section of the literature review will make reference to the surface form, the focus will be on textbases and situation models. This is because this thesis is primarily interested in learning from texts, and as van Dijk & Kintsch highlight, ‘[l]earning from text is not usually learning a text’ (1983:342). Therefore, the surface form level of representation is not of primary concern to this thesis.

2.1.3.2. The construction of textbases and situation models Both textbases and situation models are thought to be constructed and continuously updated during the reading process (Zwaan, 2016). Textbase construction occurs through two levels of processing: microprocessing and macroprocessing. The microstructure contains a list of propositions that represent local meaning in the text (Tapiero, 2007). Coherence is established between these propositions on a local level, for instance, considering referential, temporal and causal relations between propositions. The construction of the microstructure is restricted by working memory capacity, and therefore small units of the text are considered at a time (Tapiero, 2007). Following this, macroprocessing takes these smaller units and organises them on a global level. It forms a hierarchical structure, where some propositions are recognised to be subordinate or superordinate to others. This global macrostructure accurately represents information in the text (Tapiero & Otero, 1999); it essentially represents an accurate summary of the text whilst highlighting the most relevant, important propositions (Caccamise & Snyder, 2005).

If readers have little prior knowledge of the topic presented in the text, or if they are not motivated to integrate prior knowledge with textual information, readers might only construct a textbase (Wolfe & Woodwyk, 2010). However, in most circumstances, readers go on to construct situation models. The construction of situation models is thought to occur through both incremental and global updating (Kurby & Zacks, 2015). The event-indexing model focuses on the incremental updating of situation models (Zwaan et al., 1995a). This model was originally proposed specifically in relation to the processing of narrative texts (although no definition of what constitutes a narrative text was provided) (Zwaan et al., 1995a). However, as this model deals with the processing of event representations in texts (Zwaan, 2016), it is assumed that it can be applied to any text representing a sequence of events, including expository texts. The model proposes that readers monitor event representations in a text on five dimensions: time, space, protagonist (or entity), causation and intentionality (Zwaan et al., 1995a). Each event in the text is indexed on all five of these dimensions. Where a discontinuity occurs on any of these dimensions (for instance, a change in time or space), reading times increase (Zwaan et al., 1995b; Therriault & Rinck, 2007). This is taken to indicate either that readers are updating the relevant index within the situation model (Zwaan et al., 1995a), or that readers are generating inferences to understand this discontinuity in the context of the overall text (Magliano et al., 1999).

During global updating, the whole situation model is updated (Bailey et al., 2017). This process is explained through event segmentation theory (Zacks et al., 2007). While event segmentation occurs whenever an individual encounters and processes a sequence of events (Zacks & Swallow, 2007), this discussion will focus on event segmentation in relation to texts portraying sequences of events. Once a reader perceives an event boundary in a text – when one event ends and another begins – the reader clears the previous situation model from working memory and a new situation model is constructed (Bailey et al., 2017). This has been evidenced by slower reading times at event boundaries, which are often indicated by particular changes in the five dimensions monitored during reading (Zacks et al., 2009). As such, the reader ‘parse[s] a continuous stream of activity into meaningful events’ (Zacks et al., 2007:80), or multiple situation models, to represent the different events or situations denoted in the text. New situation models may be constructed in response to discontinuities in one dimension (Radvansky et al., 1998; Radvansky, 2005), or multiple dimensions (Magliano et al., 1999), but the more dimensions that show discontinuities, the more likely a new situation model is to be constructed (Zacks et al., 2009).

Incremental and global updating of situation models occur in conjunction. This is exemplified by the structure-building framework (Gernsbacher, 1996), which proposes that readers develop mental representations by constructing a foundation onto which they map cohesive textual information, through incremental updating. When information is less coherent, or there are discontinuities, readers shift to a new substructure within the constructed situation model, rather than a new situation model, through a process of global updating. Within this framework, situation models ‘comprise several branching substructures’ (Gernsbacher, 1996:289).

As a reader constructs situation models, they apply relevant prior knowledge from their long-term memory. This might be done for two reasons. Firstly, readers might draw on prior knowledge to make knowledge-based inferences about a text, in order to establish cohesion (Graesser & Kreuz, 1993), or to make sense of discontinuities. Alternatively, readers might draw on prior knowledge relevant to the specific topic of the text in an attempt to understand the text within the context of what they already know. This latter process involves the reader integrating prior knowledge with information from the text, and can disrupt the structure of the original text, as readers rearrange textual information to accommodate prior knowledge (Wolfe & Mienko, 2007; Wolfe & Woodwyk, 2010). The prior knowledge utilised might be episodic knowledge of events similar to those portrayed in the text, or more general world knowledge taken from semantic memory (Tulving, 1983; van Dijk & Kintsch, 1983).

Research has evidenced the existence of the three levels of representation. Fletcher & Chrysler (1990) required fifty-four undergraduate participants to read ten short texts describing five items in a particular linear order (for instance, in terms of how expensive each item was). Participants then read a filler text and some instructions before completing one of three sentence recognition tests. In these tests, participants were given partial sentences from the texts, and asked to select one of two words that completed the sentences as they appeared in the texts. To test for the presence of a surface representation of the text, the word options were the original word and a synonym. To test for the textbase, options were the original word and one that introduced new meaning but was consistent with the linear order of the text. Finally, to test for the situation model, participants chose from the original word or one that introduced new meaning and violated the linear order of the text. Participants chose the correct option significantly more often on the textbase test than the surface level test, and significantly more often on the situation model test than the textbase test. Differing performance across tests provides evidence that multiple levels of representation exist, and that these levels are associated with different retrieval processes. However, the research lacks ecological validity: texts described simple linear sequences and had little authentic reading purpose. The presence of mental representations may differ when reading longer narrative or expository texts with specific reading goals.

Further evidence suggests that mental representations might differ across text types. Johansson et al. (2018) monitored the gaze patterns of 28 university students while they read and then recalled three texts from a computer screen. Two texts were scene descriptions, and the third an expository control text about tuition fees in Germany. In one of the scene descriptions, the order in which parts of the scene were described was congruent with their position in the scene (e.g. the upper part of the scene was described at the beginning of the text, and the lower at the end). In the other, the order of description was incongruent with position in the scene (e.g. the middle part of the scene was described at the beginning of the text). Fixations were found to be significantly shorter whilst reading the control text than whilst reading either scene description. These shorter fixations were taken as evidence of lower cognitive activity: the authors argued that there was greater cognitive activity in response to scene descriptions because the spatial information in the texts required readers to continually update their situation model. The authors also found that, when recalling specific units of information from the expository control text, participants were more likely to fixate on the place on the screen that corresponded to where the relevant words were presented in the text. This contrasted with both the congruent and incongruent scene descriptions, where participants were more likely to fixate on the space corresponding to the visuospatial representation of the scene. This supports the view that a situation model was constructed in response to scene descriptions: information in the text was reorganised to create a visuospatial representation of the scene described. It was suggested that because information in the expository control text was abstract, the textbase (the representation of the text itself) was the most likely representation to become reactivated during recall. However, the authors do not indicate whether no situation model was created in response to the expository control text, or whether one was constructed but not relied on for recall. This research will be discussed further in the section on retention below.

2.1.3.3. Retrieval and retention of information from textbases and situation models

Both textbases and situation models are thought to be stored in the episodic memory (van Dijk & Kintsch, 1983). Despite both mental representations being stored in the episodic memory, the textbase is thought to decay more quickly in memory, whilst the situation model remains accessible for a longer period of time (Kintsch et al., 1990; Fletcher & Chrysler, 1990). This might be a result of how situation models are stored in memory: when constructing situation models, a reader draws on prior knowledge from similar situation models, and in doing so, they link the current situation model with other, similar situation models in pre-existing knowledge structures. As a result, situation models are embedded within a larger structure, relative to similar situation models, in episodic memory. In contrast, textbases are isolated representations of singular texts (van Dijk & Kintsch, 1983).

To explore the retention of mental representations, Kintsch et al. (1990) presented participants with texts, after which participants completed a recognition test. Recognition test questions asked participants whether they had seen given sentences before. Participants were tested on three types of sentences. Firstly, they were tested on ‘old sentences’: these were sentences that appeared as they had in the original text. Secondly, they were tested on paraphrased sentences: these had minimal changes in word order (differing from the original sentences at the surface level) but were identical to the original sentences in meaning. Finally, inference sentences were tested. These were sentences that did not appear in the text, but could be inferred from the text, as they fit into the same situation model as original sentences. One group completed the recognition test immediately after reading texts, a second group after a 40-minute delay, a third after a two-day delay, and a final group after a four-day delay. It was found that surface memory decayed rapidly: a good memory for original sentences was only found in the ‘immediate’ condition. Whilst memory for the textbase was observed to decrease with delay, it showed a slower rate of decay than surface memory. In contrast, situational memory remained at a high level, regardless of delay, suggesting that the situation model is the strongest mental representation in memory. However, the time between reading of texts and the latest recognition test was only four days: it is uncertain how long memory for situation models remains at a high level.

Despite situation models being the slowest mental representation to decay in memory, the process of constructing multiple situation models through event segmentation has been observed to both interfere with and support retrieval of information from situation models. Radvansky et al. (1998) gave university participants a list of sentences to memorise. The sentences all involved a person doing an activity whilst an event was occurring (for example, ‘The [person] was [activity] when the [event]’). Participants were told that all events took place at a party, providing a common spatial location. Participants were either placed in the same-time condition (events in sentences could all occur within the same period of time) or the different-time condition (events could not occur within the same period of time). After attempting to memorise sentences, participants were given a set of test questions asking what particular people were doing whilst an event took place (recalling the activity), or when a person was doing a particular activity (recalling the event). The fan effect paradigm was used to explore the effect of situation models on retrieval of information: the fan effect proposes that the more associations that an individual makes with a particular concept, the greater the interference and therefore the more errors are made in retrieval of information (Anderson, 1974). Radvansky et al. (1998) found no fan effect in the same-time condition, but a fan effect was observed in the different-time condition. The authors suggest that discontinuities in the temporal dimension in the different-time condition meant that multiple situation models were constructed to represent events. Conversely, in the same-time condition, the temporal dimension was continuous, and therefore a single situation model was constructed. It is thought that it is more difficult to retrieve information from multiple situation models than from a single situation model (van Dijk & Kintsch, 1983). However, participants were asked to memorise sentences: rather than reflecting processes that participants use to read for various purposes, results might mirror processes that participants use to support memory. Elsewhere, it has been found that readers react differently to temporal, spatial and causal discontinuities in a text when told to read for memory, as opposed to reading for enjoyment, suggesting that reading for memory disrupts the normal construction of situation models (Zwaan et al., 1995b). In contrast, research has found that event segmentation can improve memory of the temporal order of segmented events when they share the same context (DuBrow & Davachi, 2013) or semantic associations (Zacks et al., 2006). Radvansky (2012) suggests that the influence of event segmentation on memory depends on the circumstances of recall: when recalling information from situation models where there are particular similarities or connections across situation models, segmentation creates a larger organisational frame, improving memory of events. When recalling a single event from one of many situation models between which there are no such connections, retrieval is negatively impacted.

In the previous section, Johansson et al.’s (2018) research was discussed: participants’ gaze patterns were monitored as they read and recalled scene descriptions (congruent or incongruent) or an expository control text. Gaze patterns suggested that those reading the expository text drew on their textbase during recall, but those reading scene descriptions drew on situation models. This research also found that participants in the expository condition recalled significantly fewer textual ideas than those who read scene descriptions: the authors suggest that the construction of situation models in response to scene descriptions supported recall. However, scene descriptions were likely to be more familiar to Swedish participants (beach and country settings) and therefore arguably easier to recall than the expository information about tuition fees in Germany. Also, although participants would fixate on the visuospatial area of an object in a scene representation during recall, in almost all cases, participants recalled scene descriptions in the order that information was presented in texts. Therefore, it could be argued that the active process of recalling the texts from memory did rely more on text sequence (suggesting recall from the textbase) than visuospatial organisation of the scene depicted (the situation model). However, as recall immediately followed reading, these findings cannot be used to infer which textual representations are relied on in longer-term memory; the textbase may have been drawn on when recalling a text because it remained fully available to the reader immediately after having read the text.

Finally, it is important to note that the use of different levels of representation in retrieving information might differ for individual readers in different circumstances. Radvansky et al. (2001) administered a sentence recognition task to older adults (aged 61 to 97 years) and younger adults (aged 18 to 26 years) immediately after they had read four texts on unfamiliar historical topics. Older adults showed better memory for situation models, whereas younger adults showed better memory for the surface form and textbase. The authors argue that perhaps older adults used the surface form and textbase simply as scaffolds to enable the creation of a situation model, whereas younger adults placed more focus on these initial two mental representations. This raises the question of how younger children, specifically those in primary education, construct mental representations of texts, and whether they rely more heavily on one of the three levels. This issue is particularly pertinent given that much of the research cited above was conducted with undergraduate students, rather than with the younger children considered in this thesis.

2.1.3.4. Learning from textbases and situation models

Much of the above research considers only the immediate recall of textbases and situation models, rather than longer-term retention. Arguably, learning only takes place when textual information is integrated with prior knowledge, and can be retained in the longer-term, beyond immediate recall. From this perspective, situation models are crucial for learning to take place, because they initiate the use of prior knowledge. However, this is not to say that the textbase is redundant: the textbase is an essential ‘stepping stone’ (van Dijk & Kintsch, 1983:341) enabling the construction of situation models. What’s more, it is likely that the textbase determines the nature of situation models: the textbase is important in that it contains information on the specific, linguistic style in which a text conveys information (van Dijk & Kintsch, 1983). This influences the way in which a reader processes the information conveyed. This is particularly important in the current research project, where the key difference between narrative and expository texts used is linguistic style.

2.1.3.5. Concerns regarding mental representations of texts

The above discussion presents some concerns relating to the nature of situation models and the body of research conducted about mental representations. Firstly, there appear to be some contradictions in how situation models are viewed. Some perspectives focus on discontinuities in texts and how these can lead to segmentation: ‘breaks’ in particular dimensions can cause previous situation models to ‘be wrapped up’ (Magliano et al., 1999:225) so that new ones can be constructed. This view presents situation models as fragmented representations of texts, focusing on the deconstruction of previously cohesive texts. Conversely, other approaches seem to focus on the situation model as a more cohesive whole: situation models are seen as microworlds within which texts can be framed (Graesser et al., 2002), or overall, cohesive representations where segmented events are seen as substructures of a single situation model (Gernsbacher, 1996), rather than multiple situation models. This leads to the question of the nature of situation models representing texts: is there a single situation model, parsed into smaller substructures, or are there multiple situation models, which are linked or distinct to different degrees? This issue has further implications for the storage of situation models: once situation models are constructed, a reader uses prior knowledge to connect these with other, similar situation models in pre-existing knowledge structures. If multiple situation models are constructed in response to a text, this could pull the segmented text apart further, with different events or situations portrayed in the text stored in relation to similar situations, rather than in relation to the text as a whole.

Further to this, research suggesting that situation models are constructed through different processes depending on reading goal (McCrudden & Schraw, 2007) and text type indicates that the issue may lie less in the nature of situation models in general than in the nature of the different types of situation model that readers construct. When participants reading expository texts have a reading goal of studying, they have been found to have a better memory for textual information (Bohn-Gettler & Kendeou, 2014), and to make different inferences (Linderholm & van den Broek, 2002), compared with when they read for enjoyment. Providing readers with instructions regarding how to read texts has also been observed to influence how situation models are updated: when told to attend to space, readers were more likely to globally update situation models in relation to spatial discontinuities (Bailey et al., 2017). Elsewhere, Wolfe & Woodwyk (2010) suggest that readers of narratives infused with scientific information constructed stronger textbases than readers of equivalent expository texts, who constructed stronger situation models; readers of the narrative also recalled more overall content, whilst readers of the expository texts recalled more scientific content, suggesting that mental representations influenced the type of information recalled from the text. Similarly, Zwaan (1994) found that situational memory was improved for those reading news reports, whereas a better surface memory was present for literary texts: it was suggested that readers allocate processing resources differently according to text type. Despite these various differences observed, research suggesting how the nature of situation models might differ in response to these differential processes, specifically across expository and narrative texts, is lacking (Therriault & Rinck, 2007).

Finally, much of the research described above lacks ecological validity: it is largely conducted in experimental settings, with texts presented to participants in small chunks on computer screens, and reading goals instructed by researchers. Research on unidimensional situation models (constructed in response to texts with discontinuities on only one dimension) focuses largely on the immediate recall of mental representations, whereas research on multidimensional situation models often focuses on how readers respond to discontinuities across the five dimensions when constructing situation models (Therriault & Rinck, 2007). Such research does not replicate interactions with authentic texts in naturalistic circumstances, and therefore it is questionable to what extent these processes will occur during natural reading. As such, further research is required to explore the construction of situation models in naturalistic learning environments, to gain an insight into how best to utilise texts as learning tools.

2.1.4. The importance of prior knowledge

As highlighted above, prior knowledge plays an important role in the comprehension of texts. Because this research explores participants’ interactions with texts on an unfamiliar topic (World War One), it is important to consider the impact of prior knowledge on comprehension in more detail. This section will briefly outline the role of prior knowledge in reading comprehension.

Readers with a greater prior knowledge of a subject have been found to score more highly on comprehension tests for texts related to that subject than those with less prior knowledge (Ozuru et al., 2009; Kostons & van der Werf, 2015). This is because information stated in a text is not always sufficient for the reader to create a coherent mental representation of the text: the reader needs to activate prior knowledge to complete this mental representation in order to fully comprehend the text (Kintsch, 1998). This might involve activating relevant prior knowledge either to contextualise a text, or to make knowledge-based inferences about the text to establish cohesion (Graesser et al., 1994). In support of the latter, McNamara (2001) found that high-knowledge readers were more capable of understanding less coherent texts, because they were able to make knowledge-based inferences to support comprehension. In line with this, readers with less prior knowledge benefit more from highly cohesive texts (McNamara et al., 1996). However, the presence of prior knowledge may not be enough. To utilise this knowledge during reading, it must be activated, that is, brought into a reader’s working memory (Kostons & van der Werf, 2015). Bringing this knowledge into working memory allows readers to establish relationships between their current knowledge and knowledge presented in a text (Mayer, 1979). In a classroom setting, primary school pupils who were taught reading comprehension using prior knowledge questions made more progress between pre- and post-assessments than those who were not: children need to be actively encouraged to utilise prior knowledge (Yusuf & Mohammed, 2013).

Not only can prior knowledge increase comprehension, but readers with greater prior knowledge can also retain new information acquired from texts for longer (Kendeou & van den Broek, 2007): new information is integrated into pre-existing knowledge structures, and the links formed enhance retention (Recht & Leslie, 1988). Although much of the research cited above considers expository texts, evidence shows that the activation of prior knowledge improves reading comprehension of both fiction and nonfiction texts for 10- to 12-year-old readers (Elbro & Buch-Iversen, 2013). However, differences have been observed in prior knowledge activation across different text types. Expository texts have been found to activate readers’ prior knowledge to a greater extent than narrative texts (Wolfe, 2005; Wolfe & Mienko, 2007; Best et al., 2008). Yet more knowledge-based inferences are thought to be made in response to narrative texts (Britton & Gülgöz, 1991; Graesser, 1981), or when reading for entertainment (Linderholm & van den Broek, 2002). Whilst there is contradictory research suggesting that more inferences are generated in response to expository texts, its authors state that the expository texts used were less demanding than is typical of expository texts, and that this may have accounted for the difference (Baretta et al., 2009).

Despite the importance of prior knowledge, there is a danger that it might also inhibit the understanding of texts in some instances. Kucer (2011) found that when fourth graders (aged 9 to 10 years) recalled narrative texts, they produced statements which conflicted with the text: the author suggested that this was because participants’ prior knowledge of a subject could interfere with what was read in the text, essentially overriding textual information (see section 5.2.1 (p.82) for a more detailed description of this study). In addition, misconceptions in prior knowledge can have a negative impact on text comprehension, as readers can reject new, accurate knowledge from a text if it does not fit with their inaccurate prior knowledge (Alvermann et al., 1985; van Loon et al., 2013).

Overall, the activation and utilisation of appropriate prior knowledge during reading enhances text comprehension. It is important here to note that a reader does not always require an in-depth knowledge of the subject of a text, but enough prior knowledge to allow them to access the text. Prior knowledge can be topic-specific or a more general prior knowledge of the world. The importance of prior knowledge in supporting comprehension will be considered later when making methodological decisions (see section 6.4.3).

2.2. Part Two: Narrative and expository texts

Through considering the nature of narrative and expository texts, this section intends to provide clarity on the concept of narrative, and more specifically narrative nonfiction (NNF). This section will begin with a discussion of the broader concept of narrative as a mode of thought (Bruner, 1985), before constructing a definition of narrative text. Next, it will distinguish narrative from expository texts, with regard to the features typical of each text type. Following this, it will consider narrative texts on a continuum of factuality, focusing on the position of NNF on this continuum. Finally, this section will consider the relationship between narrative and episodic memory. With regard to this section, it is important to note that although narratives are transmedial in scope, in that they can be expressed through a wide range of media (Ryan, 2005), this thesis focuses on narrative texts: when the term ‘narrative’ is used, it is referring to narrative texts unless otherwise specified.

2.2.1. What is narrative thinking?

Bruner (1985) proposed two fundamental modes of knowing and thinking: the paradigmatic mode and the narrative mode. The former mode of thinking, also known as the logico-scientific mode, involves thinking about the world using logical categorisations and empirical evidence. The latter is concerned with how meaning is attributed to experiences, particularly relating to human intentions and actions within these experiences. These modes of thought have been observed in writers: when scored on Bruner’s theory of narrative and paradigmatic thought, creative writers scored significantly higher on narrative thought than journalists (Kaufman, 2002). Dahlstrom (2014) demonstrates how these modes of thought can also be considered in relation to text types. Texts that draw on the paradigmatic mode of thought aim to dictate general, abstract truths, which can then be applied to specific situations. Conversely, texts which draw on the narrative mode of thought work in the opposite direction: such texts describe specific cases, from which an individual can infer more general truths.

Although Bruner (1985) states that neither mode of thought should be privileged over, or neglected in favour of, the other, narrative structures are thought to be central to thought processes. It is thought that the existence of an internal story schema (Gudmundsdottir, 1991) helps individuals to understand the world around them: this mental structure carries a set of expectations about how events should progress, and individuals organise their experiences in terms of this schema in order to make sense of them. Therefore, this narrative mode of thought allows individuals to make sense of and organise their lives (Bruner, 1986; Coles, 1989). It is argued that such narrative competence is a natural, instinctive ability (Chomsky, 2006), as it is a skill observable in and integral to most cultures (Polkinghorne, 1988): humans are universally predisposed to ‘“story” their experiences’ (Doyle & Carter, 2003:130). The next section will consider the definition of narrative texts more specifically.

2.2.2. Defining narrative

Narrative is a difficult concept to define comprehensively: it has ‘resisted precise definition’ (Sperry & Sperry, 1996:445). Most definitions recognise that narratives are episodic in nature, in that they represent a sequence of either true or fictitious events (Redrum, 2005). Labov defines a ‘minimal’ narrative as ‘a sequence of two clauses which are temporally ordered’ (1972:360). However, this creates a broad definition, which encapsulates a number of text types. This definition has been narrowed by some, through the addition of further, defining features: there must be a minimal structure in terms of the number of ‘utterances’ or events (Sperry & Sperry, 1996); all represented events must be temporally sequenced (Norris et al., 2005; Avraamidou & Osborne, 2009; Richmond et al., 2011); represented events must be causally connected (Richardson, 2000); or there must be some form of disequilibrium or disruption to the events, which there is an attempt to resolve (Herman, 2007). However, for this last point, it is questionable whether this is an essential defining feature or a feature of an engaging narrative: some common features of narratives are not essential, but are associated with high-quality narratives (Gerrig, 1993). Despite these efforts to define narrative, Redrum (2005) argues that there are texts, such as instructional texts, which could still be defined as narrative within the above definitions, although readers instinctively know that they are not. Instead, Redrum (2005) proposes a contextualised definition of narrative, elsewhere coined the transactional definition (Richardson, 2000). This definition suggests that rather than considering textual features, it is important to consider how the text is actively used by an individual in a particular context, and whether they intend it to be a narrative. Within different linguistic and cultural communities, members will recognise when a narrative is intended by the creator, and therefore will interpret it as a narrative. Fludernik (2000) proposes a similar definition, in which narrative is distinguishable from other broad ‘macrogenres’, such as argumentative and instructive macrogenres, by its communicative function. It is only within the macrogenre of narrative that other genres or text types can be identified by the presence of traditional genre expectations, or key features, which make them identifiable as a specific text type, for instance, a novel or a myth. However, in both Redrum’s and Fludernik’s definitions, narratives will be defined uniquely across different cultural and linguistic communities: such definitions make it difficult to define narratives that are not embedded within a particular context. This raises the question as to whether narratives should be defined by the starting point (i.e. textual features used to construct narratives) or the end point (i.e. the use and interpretation of the final product) (Gerrig, 1993).

Elsewhere, definitions consider the presence of narrators of, and protagonists within, narratives. Avraamidou & Osborne (2009) suggest that the presence of a narrator, or at least the sense of a narrator, distinguishes narrative from other text types. However, if a narrator is someone who provides an account of events, a narrator might also be someone who recounts historical events in an expository text. Rather than considering the narrator, Bruner focuses on the centrality of protagonists and particularity in narratives. He proposes that narratives contain two simultaneous psychological ‘landscapes’ (Bruner, 1986). The first is the landscape of action, which involves what protagonists do within a narrative. This landscape is unique for its particularity: narratives reference particular, context-specific happenings, rather than making generic claims (Bruner, 1991). The second is the landscape of consciousness, which maps protagonists’ thoughts, knowledge and feelings onto their actions, integrating a psychological perspective to the events described. It is the combination of these two landscapes that makes narrative uniquely distinguishable from other text types. From this perspective, narrative is concerned with the human implications and consequences of events (Freeman & Levstik, 1988), and human intention and agency (Bruner, 1991; Boström, 2008), rather than the simple occurrence of events: readers process narratives not only cognitively, but emotionally (Klassen & Froese-Klassen, 2014).

The importance of protagonists and the landscape of consciousness is supported further by research into mental representations of texts, which suggests that readers track protagonists closely, regardless of reading instructions, when constructing situation models (Bailey et al., 2017). Further to this, areas of the brain activated whilst reading and producing narratives are also associated with theory-of-mind processes (the ability to understand that external agents have their own mental states, different to our own), including perspective-taking and empathy (Mar, 2011; Yuan et al., 2018); this suggests that the reader is seeking to understand the protagonist through the landscape of consciousness. This thesis will adopt the definition of narrative given below, which draws on Bruner’s concept of the landscape of consciousness to distinguish narratives from other text types.

A narrative is a representation of a chronologically or causally linked sequence of events, which are characterised by particularity and represented in relation to the thoughts, knowledge and feelings of the protagonist(s) involved in the events.

2.2.3. Distinguishing narrative and expository texts

Whilst a definition of narrative has been provided above, it is important to contrast narrative and expository texts more explicitly, in order to draw a clear distinction between the two. Expository texts are typically distinguished in three ways from narrative texts, in terms of purpose, content and structure (Mosenthal, 1985). However, it must be recognised that the features described are not limited to either narrative or expository texts, but are more commonly associated with one of these two text types. This section will briefly outline these differences; research exploring how differences between narrative and expository texts might cause greater difficulties in accessing expository texts will be outlined in section 5.1.

Firstly, whilst a narrative’s primary purpose is typically to entertain, an expository text’s purpose is usually to inform. It must be noted that these purposes are not necessarily directly translated into the reading goals of the reader; rather, a reader establishes their personal reading goal through a combination of personal intentions (the intentions with which a reader consciously or tacitly approaches a text) and given intentions (externally provided cues that are intended to orient readers), such as the purpose of the text (McCrudden & Schraw, 2007; McCrudden et al., 2010). The intended purpose of the text also influences the linguistic devices employed in texts: narratives employ features such as vivid description, suspense, figurative language and plot development in order to engage and entertain the reader, whereas expository texts are often written in a more scholarly, succinct style in order to clearly convey key information. In terms of content, expository texts tend to focus on a particular topic, providing detailed content relevant to the topic. In contrast, narratives are typically more diverse in content, as they trace events that are not limited to a particular topic. Finally, in terms of structure, narratives typically follow a similar structure. Rumelhart’s (1975) ‘story grammar’ describes a set of ‘rules’ which determine the various constituents and the overall structure of these constituents within a narrative. In contrast, expository texts can be structured in numerous different ways. Meyer (1975) suggests that there are a total of five different text structures commonly used for expository texts: description, sequence, cause/effect, problem/solution and compare/contrast. Expository texts for children that are written on historical topics often use ‘description’ text structures (Meyer, 1985): an author describes a topic, grouping information thematically, rather than sequencing it chronologically.

Whilst the terms narrative and expository are used above, the terms fiction and nonfiction are commonly used in relation to children’s literature. Although these are often used synonymously with the terms narrative and expository (respectively), it is important to distinguish these terms. Rather than referring to a specific text type, the terms fiction and nonfiction refer to the degree of factuality of a text. A fictional text is one in which the content is largely invented; a nonfiction text is one in which the content is factual. Therefore, both narrative and expository texts can be classified as either fiction or nonfiction.

2.2.4. Forms of narrative: the continuum

Narratives exist on a continuum ranging from nonfiction to fiction. This continuum of narrative is illustrated below in Table 1. At the nonfiction end of the continuum is narrative history: this takes the form of true events being recounted in a chronologically linear fashion, referring to the experiences of real historical figures. As the key purpose of this text type is to inform, it maintains the linguistic style of an expository text (Mandler et al., 2011). Next on the continuum is narrative nonfiction (NNF): these texts are also entirely factual, with reference to the experiences of real historical figures, yet unlike narrative histories, the author also employs narrative devices, such as plot structure, figurative language and description, to engage the reader. NNF texts might follow a protagonist, but one who is a real individual from history: information will not be fabricated about this individual, but information may be taken from chronicles about the individual. Moving closer towards the fictional end of the continuum is historical fiction: these narratives are based in a particular historical era or around a particular historical event, but parts of the plot may be elaborated upon, and the thoughts, feelings and motives of protagonists may be assumed by the author, and therefore fictionalised to some extent. Once more, narrative devices are used to engage the reader. Finally, at the furthest end of the continuum, are fictional narratives, which involve largely fictionalised events written using narrative devices.

Table 1: A continuum of the different forms of narrative

| Continuum of forms of narrative | Narrative history (nonfiction end) | Narrative nonfiction | Historical fiction | Fiction (fiction end) |
| --- | --- | --- | --- | --- |
| Factuality of content | Based entirely on fact, with no fictionalised elements. | Based entirely on fact, with no fictionalised elements. | Based on true events/within a real historical context, but protagonists/plot fictionalised to some degree. | Largely fictionalised. |
| Linguistic style | Formal, succinct style, more typical of expository texts. | Employs narrative devices to create plot/engage readers. | Employs narrative devices to create plot/engage readers. | Employs narrative devices to create plot/engage readers. |

Polkinghorne’s (1988) concept of first-order and second-order referents helps to distinguish between the text types discussed above, particularly narrative history, NNF and historical fiction. First-order referents are the events, facts and general content which make up a narrative. In any of these three forms of narrative, when written on the same historical topic, the first-order referents would be highly similar. The difference lies in the second-order referents, that is, in how these first-order events and facts are coherently pieced together and conveyed.

Exactly how the NNF and expository texts used in this research will be differentiated is described in the methodology chapter (see section 6.4.4). However, a clarification on the nature of NNF is important here. The above section sets forth that a definitive feature of narrative is the presence of a landscape of consciousness. While NNF predominantly draws on the features of narratives, the content is entirely factual, and as such, this landscape of consciousness might be lessened if the author is to avoid assuming the thoughts and feelings of protagonists. However, the landscape of consciousness is still apparent to a lesser degree in that the actions of protagonists are documented, and therefore a landscape of consciousness is left open for the reader to interpret a protagonist’s feelings and actions.

2.2.5. Memory and the episodic nature of narrative

Before considering the relationship between episodic memory and narrative, this section will provide a brief discussion on declarative memory, which is a category of long-term memory. Within the declarative memory, there are two conceptualised neurocognitive systems: semantic memory and episodic memory (Tulving, 2005). Semantic memory relates to knowledge of the world: it deals with facts and generalisations about the world around us (Tulving, 1983). This knowledge is context-free, as it lacks specific chronological or spatial details (Martin, 2009). Conversely, episodic memory is context-specific and relates to the remembering of experiences and events. It is argued that episodic memory is a uniquely human skill: it draws on an autonoetic consciousness, and in doing so, allows mental time travel, enabling individuals to remember personal and thought-about events, and to imagine possible future happenings (Tulving, 2005). Autobiographical memory is often considered as synonymous with episodic memory. However, autobiographical memory is a subtype of episodic memory (Wheeler et al., 1997). Episodic memory refers to memory of events, contextualised in a place and time, but these do not need to be personal or self-referential to the individual; for instance, they might be memories of events depicted in a text. Conversely, whilst autobiographical memory acts similarly, it deals with memories that hold personal relevance, or events that have directly occurred to the individual remembering. Pathman et al. (2011) present evidence supporting the divergence between these two types of memory: children aged 9- to 11-years-old showed an autobiographical memory capacity equivalent to that of adults, but their capacity for episodic memory was less developed. The different developmental rates suggest that these two types of memory draw on slightly different cognitive processes. Further to this, slightly different patterns of brain activations have been observed for episodic and specifically autobiographical memories (Gilboa, 2004).

Episodic memory is thought to be closely related to narrative. Research in the field of neuroscience has shown that the posterior cingulate cortex is activated during both episodic memory retrieval (Cabeza & Nyberg, 2000) and the production and comprehension of narrative (Mar, 2004; Yuan et al., 2018). Further to this, there is evidence that readers process events in narratives almost as if they are experiencing the events themselves: an fMRI study showed that when changes in perception and action occurred in narratives, areas of readers’ brains associated with processing those specific changes were activated: for instance, when a protagonist touched an object, brain regions associated with hand grasping were activated (Speer et al., 2009). This suggests that narrative events are processed in a similar way to personally experienced events, thus suggesting that they draw on the episodic memory: narrative is essentially a simulation of the social world, which readers experience while interacting with a narrative (Mar & Oatley, 2008).

Whilst there is little research into the parts of the brain activated during the reading of expository texts, it is hypothesised that such texts will not activate the same parts of the brain as narrative and episodic memory, at least not to the same degree (Mar, 2004; Larison, 2018). This hypothesis is partially supported by research which shows that different areas of the brain are activated when processing sentences with definite articles compared to indefinite articles (Robertson et al., 2000). The authors argue that definite articles create more coherent discourse, closer to narrative (e.g. The grandmother sat at the table. The child played in the backyard) than indefinite articles (e.g. A grandmother sat at a table. A child played in the backyard), where the more generic nature is typical of expository texts. In addition, research into cognition has found that semantic associations predict recall to a greater extent in expository than narrative texts (Wolfe, 2005), suggesting that semantic memory, rather than episodic memory, might be utilised to a greater extent when reading expository texts.

Finally, it has been shown that the episodic richness of a text can influence the retention of information. Herbert & Burt (2004) showed undergraduate students either an ‘episodic poor’ or an ‘episodic rich’ text. Episodic richness was defined as distinctiveness in terms of specific sections of the text: for instance, the ‘episodic rich’ text read ‘a group of school children’, whereas the ‘episodic poor’ text read ‘a group of subjects’. Participants were tested on their retention of textual information after two days, and again five weeks later. It was found that participants who read the ‘episodic rich’ text showed greater retention on both tests. This suggests that episodic memory is effective in enhancing memory. However, both texts were expository, which highlights that whilst episodic richness is associated with narrative, it is not exclusive to narrative.

Whilst this section considers episodic and semantic memory systems separately, there is a body of neuropsychological research suggesting that these two systems are interdependent, influencing each other during the encoding and retrieval of information (Greenberg & Verfaellie, 2010). Therefore, it is likely that both of these systems are important to, and interact during, reading processes. However, the research described above suggests that there may be differences in the extent to which these different memory systems are drawn on during reading processes across different text types.

Chapter Three: A conceptual understanding of history

A conceptual understanding of history can be broken down into two strands: historical knowledge and historical thinking skills. Traditionally, history teaching primarily dealt with the passive transmission of historical knowledge to learners (Dulberg, 2005): that is, passing on the ‘content’, or ‘facts’, of history to learners. More recently, a conceptual understanding of history has been recognised to also encompass historical thinking skills: these are the skills required to process historical information, such as chronological thought. Although these two strands are sometimes seen as contesting each other (Counsell, 2000), both are important in developing a conceptual understanding of history, and they often overlap: history is neither a collection of stories nor a set of skills but it comprises ‘both … we need to pass on the stories, but also impart the skills to hack the stories apart and make new ones’ (Mantel, 2017a). Figure 1 (see footnote 2) further breaks down historical knowledge and historical thinking skills, the specific elements of which will be outlined below. Following this outline of history learning, a curriculum approach to history learning will be considered. In the following chapter, history learning will be considered in relation to the social constructivist theoretical framework that this thesis is embedded in.

Figure 1: Model of conceptual understanding of history

[Figure 1 is a tree diagram with three levels:]

Conceptual understanding of history
  Historical knowledge: substantive knowledge; second-order knowledge
  Historical thinking skills: chronological thinking; causal thinking; historical enquiry/interpretation

2 Note that this model is adapted slightly for the practical purposes of data collection and analysis. See the methodology chapter (figure 4, p.110) for the adapted model.

3.1. Historical knowledge

Historical knowledge comprises substantive and second-order knowledge (Donovan & Bransford, 2005). Substantive knowledge is the foundation of all historical knowledge: it is the acquisition of facts (Rogers, 1979; Freeman & Levstik, 1988; Sheldon, 2010). This may be, for instance, knowledge of significant events, the names of historical figures involved, and so forth. This knowledge is thought to be relatively objective and is essentially the substance, or content, of history, allowing learners to ‘form a network of semantic understandings’ (Harnett, 2000:2). Second-order knowledge is more elusive. Whilst there is no term by which this element is consistently referred to, the concept of it is discussed widely throughout literature on history learning: Coltham & Fines (1971) refer to it as empathy and imagining, whilst Rogers (1979) refers to it as procedural knowledge. It relates to an individual interpretation of the ‘content’ of history, and an understanding of the human implications and consequences of historical events (Freeman & Levstik, 1988). It requires possibility thinking, which involves the ability to consider the perspectives of others, to imagine and make inferences about information presented, and to consider possible alternatives (Cooper, 2018a), in order to more deeply understand the content of history. Essentially, it provides a greater depth of understanding beyond the objective facts, to appreciate history on a more empathetic level. While the focus of this thesis will be on the development of substantive knowledge, second-order knowledge will be considered.

3.2. Historical thinking skills

In addition, there are three key historical thinking skills (see footnote 3). The first is chronological thought. This often encompasses the sequencing of historical periods and the development of a ‘sense of period’, which involves an understanding of the similarities, differences and changes between historical periods (De Groot-Reuvekamp et al., 2014). However, the exploration of one specific historical topic in this thesis (WWI) meant that these aspects were beyond the scope of this research. Chronological thinking with regard to this research will instead encompass the ability to recall chronological information, and to chronologically sequence events within a time period.

3 There are additional thinking skills that are outlined as important to history learning in the National Curriculum (DfE, 2013), such as comparing/contrasting and recognising the significance of historical events. However, this thesis focuses on chronological thought, causal thought and historical enquiry. Chronological and causal thought were selected because this thesis intends to focus on the development of an understanding of historical events and a previous historical era, rather than on an understanding of these events in relation to other time periods/the present day (which thinking skills such as comparing/contrasting and recognising significance would address). Because of the concerns about how young learners might respond to information presented in narratives in terms of its truth-value, historical enquiry was selected as a focus to explore learners’ perceptions of the information presented.

The second historical thinking skill is causal thinking. In the absence of level descriptors for history skills in the 2014 National Curriculum, Freeman (2015) suggests that by the age of nine, pupils should be beginning to suggest the causes and consequences of particular events and changes which they have studied, and that by the age of eleven, they should be able to identify and describe both short- and long-term causes and consequences. Difficulties with understanding causation have been associated with learners lacking the appropriate vocabulary to express ideas about causation: this results in children seemingly making huge leaps from ‘one premise to an unconnected conclusion’ (Cooper & Dilek, 2007:722).

The final historical thinking skill is historical enquiry/interpretation. This involves the ability to locate sources and information, interpret content, evaluate sources’ reliability, validity and authenticity, and to ‘understand how our knowledge of the past is constructed from a range of sources’ (DfE, 2013:189). Learners often feel that historical truth is expressed by teachers and texts, and is shown through photographs and film (Gabella, 1994); instead, learners need to develop an understanding that the writers of a text or the creators of an artefact have their own perspective, which influences how they construct the text/artefact, and therefore how we interpret it (Fillpot, 2012). This is a skill that develops with age. Within a participant sample aged 7- to 14-years-old, it was found that younger children often believed historical accounts at face-value, whilst older children understood that accounts were affected by author biases (Lee, 1998, cited in von Heyking, 2004). However, whilst it has been shown that learners appropriately judge the reliability of a source, this skill might not always be applied within the context of history learning. Barton (1997) found that although fourth and fifth grade students (aged 9- to 11-years-old) could assess the reliability of sources, none of them referred to the evidence, or its perceived reliability, when asked to give their opinion on what happened at the historical event studied, but simply described what they thought must have happened.

These historical thinking skills might overlap with either substantive or second-order knowledge. In terms of substantive knowledge, learners may acquire facts about history that relate to the dates of events, or the causal relationship between two events. In relation to second-order knowledge, learners might consider, for example, the causal relationship between two events in terms of the motives of historical figures causing events, and how history might have been altered if these motives differed. However, historical thinking skills are distinct from historical knowledge in that historical knowledge refers to the content and knowledge that learners acquire relating to history, whereas thinking skills refer to more specific skills that learners need in order to accurately construct and conceptually understand this historical knowledge (for instance, an understanding that the year 1914 chronologically precedes 2020).

3.3. A curriculum perspective of history learning

The curriculum can be considered on three levels: the intended curriculum (as set out in official documentation), the implemented curriculum (how the intended curriculum is implemented by practitioners) and the attained curriculum (the outcomes of learners) (De Groot-Reuvekamp et al., 2014). To paint a holistic picture of the history curriculum, all three of these levels will be considered below.

3.3.1. The intended curriculum

The 2014 National Curriculum for history at Key Stage Two (KS2) appears more content-heavy than the previous 1999 National Curriculum. It provides a brief paragraph detailing the skills expected of learners, before providing a list of the content that pupils are expected to learn. In terms of content, the previous curriculum listed six historical topics to be covered throughout KS2 (DfEE, 1999), whereas the new curriculum proposes nine topics to be covered (DfE, 2013). Of these nine topics, some items listed are specific (e.g. the Roman Empire), whilst some are broader, within which schools can select specific topics (e.g. a study of an aspect or theme in British history beyond 1066). The document details what is required to be covered in KS2, but schools are able to make decisions about when and in which order to teach these topics. Cooper (2018a) highlights that there is no prescribed content within the topics listed, and this allows schools scope in selecting specific areas of enquiry within topics, and teachers creativity in the approaches taken to applying this content.

3.3.2. The implemented curriculum

The way in which the history curriculum is delivered differs amongst teachers according to their personal beliefs and experiences of learning about history (Harnett, 2000). However, while history is often taught as a discrete subject, over recent years there has been a shift towards teaching history as part of integrated topic work (Historical Association, 2016; 2020): learners study an overarching topic (for instance, ‘Spies’), through which most foundation subjects will be taught (for instance, pupils may learn about Alan Turing’s historic role during WWII, but within the context of a computing lesson about encryption). Many of these overarching topics popularly taught across primary schools are history-based. Regardless of whether history is taught discretely or as integrated topic work, it is largely taught thematically, with individual time periods studied ‘as if nothing happened in between them’ (De Groot-Reuvekamp et al., 2014:501), despite the curriculum’s intention that history should be understood by learners as a ‘coherent, chronological narrative’ (DfE, 2013:188).

In terms of historical thinking skills, a survey of primary school teachers found that only 40-55% of schools felt that their planning covered the skills listed in the 2014 curriculum (including causation, chronology, change and continuity, evidence, interpretation and significance) ‘well’ or ‘very well’ (Historical Association, 2016). This might be due to either limits in the time available to teach these skills, or a lack of training on how to teach these skills. In terms of the former, it is suggested that ‘humanities continue to cling by their finger tips in the primary phase, especially in Years 5 and 6’ (Alexander, 2009:44, cited in Cooper, 2018b:616), as the general focus of the curriculum is on core skills, such as English and maths (Alexander, 2009). Although these points were argued in relation to the previous National Curriculum (DfEE, 1999), the demands of the 2014 National Curriculum remain similar. In terms of the latter, there are concerns over the provision of training in history teaching for primary teachers: 42% of a sample of primary teachers received fewer than two days’ training in teaching history during initial teacher training, whilst 63% had received little to no continuing professional development in history teaching since qualifying (Historical Association, 2020).

3.3.3. The attained curriculum

When schools were working under the previous curriculum, Ofsted’s (2011) History for All report found that pupils showed good knowledge of the past, and were able to research historical evidence. However, later Ofsted inspection evidence suggested that pupils’ knowledge and understanding of historical topics had declined since the History for All report, with fewer pupils leaving primary school with a good knowledge of history (Maddison, 2014). Critically, Ofsted’s (2011) report also found that pupils’ understandings of historical interpretations were ‘hazy’, and that knowledge of chronology was ‘underdeveloped’. However, it is difficult to paint a picture of the attained history curriculum in England: an absence of level descriptions to guide the assessment of attainment in history in the 2014 National Curriculum (DfE, 2013) means that no particular approach to assessing history is endorsed, leaving schools to develop their own methods of assessing progress and attainment in history (Freeman, 2015).

Chapter Four: Theoretical Framework

In its approach to learning, this thesis adopts a social constructivist perspective. More specifically, in exploring narrative as a tool for learning, it takes on the perspective of cognitive narratology. The first section of this theoretical framework will briefly outline a social constructivist approach to learning, and more specifically, reading. Possible issues with this approach will be discussed, specifically, tensions between this approach and a curriculum view of history learning. In an attempt to resolve these tensions, a constructivist approach to historiography will be explored, to demonstrate how a social constructivist approach to learning can be applied more specifically to history learning. In the next section, this theoretical framework will consider cognitive narratology, exploring how this approach might synthesise with social constructivism to create a cohesive theoretical framework for this thesis. It will consider a cognitive narratology approach to reading, including how readers learn from texts, while suggesting how a social constructivist view of reading can integrate with this approach. Finally, as this thesis is concerned with how mental representations of texts might differ across narrative and expository texts, the last section of this theoretical framework will consider where the mental representations constructed whilst reading might fit into the described approach to reading. In doing so, it will propose a model of reading that encompasses both narrative and expository texts. This model will be considered and evaluated throughout this thesis.

4.1. A social constructivist approach to learning

Social constructivism views learning as a process of knowledge construction, rather than an outcome; primarily, it adopts the position that a learner’s construction of understanding is the product of social interaction (Vygotsky, 1962), rather than resulting from individual, internal processes. Language is essential to learning because it precedes both thinking and knowledge (Powell & Kalina, 2009). From this perspective, learners negotiate and construct meaning collaboratively in the social space between themselves and others (Mercer & Littleton, 2007), before internalising constructed meaning (Vygotsky, 1981). Vygotsky proposes that all learners have a zone of proximal development (ZPD): this is the difference between what a learner can achieve independently, and what they can achieve with the support of others (Vygotsky, 1978). In constructing new learning, peers or teachers provide scaffolding (Wood et al., 1976) to support a learner in achieving higher levels of thought than they are capable of achieving independently. Applied in the classroom, this approach places an emphasis on the relationships built between learners, teachers and peers, with a focus on providing guidance and facilitation, rather than instruction (Adams, 2006). As this approach does not believe that constructing learning reflects the discovery of an objective, external reality, the validity of knowledge is thought to be judged by consensus between individuals constructing knowledge. Therefore, when a learner ‘misunderstands’ something in a classroom, they are thought to have ‘inadequately synthesized information in order to relay a socially acceptable interpretation’ (Adams, 2006:246).

4.1.1. Reading and meaning making from a social constructivist perspective

The above focuses on the importance of the social space created between individuals in the process of learning. This thesis argues that a social space also exists between a reader and a text. Bakhtin (1981) suggests that at least two ‘voices’, or individuals, must interact for meaning to be constructed: whilst traditionally, these are a speaker and a listener, Bakhtin (1986) argues that reading is no less dialogic than face-to-face conversation. From this perspective, a text may be considered a ‘voice’ – or a representation of the author’s voice – that a reader socially interacts with to construct meaning. When reading, the social space between the reader and the text is the primary site of meaning making. Both the text and reader are essential to establish this social space for meaning construction. Without the reader to interpret a text, a text consists only of symbols on a page: sentences themselves do not formulate or carry definite meaning (Bransford et al., 1972), but are ‘merely ink and paper until someone reads [them]’ (Beach, 2010:23). Rather, people carry meanings; the linguistic elements of sentences initiate the construction of meaning in readers, and to some extent, guide meaning making processes (Bruner, 1966; Kintsch, 1972). Therefore, reading is a process of meaning generation, not meaning reception (Spivey, 1997).

This thesis is interested not only in the negotiation of meaning between readers and texts, but also in how readers might subsequently learn from texts. From a social constructivist perspective, interacting with and learning from a text can be considered in four consecutive stages: meaning construction in a social space, internalisation of meaning, restructuring of the reader’s conceptual system, and finally, new understanding or knowledge (Liu & Matthews, 2005). These stages will be outlined below.

4.1.1.1. Meaning construction in a social space & internalisation of meaning

The first stage of reading involves the construction of meaning in the social space between the reader and the text. This occurs through interpsychological, or intermental, dialogue: the text initiates the construction of meaning, and the reader draws on their prior knowledge of the world (Anderson & Pearson, 1984) and other cognitive and affective influences (Braunger & Lewis, 1997) to co-construct meaning in the social space between themselves and the text. Following this, intrapsychological, or intramental, dialogue occurs: this involves a reader internalising the meaning constructed during interpsychological dialogue (Vygotsky, 1978). This interplay between interpsychological and intrapsychological dialogue allows readers to construct new meaning and knowledge that they may not have been capable of constructing independently. This process is comparable to scaffolding (Wood et al., 1976), as the text facilitates the reader in reaching higher levels of thought.

4.1.1.2. Restructuring of conceptual system and creation of new knowledge or understanding

Novel meaning that is internalised through intrapsychological dialogue is then integrated into a reader’s long-term pre-existing knowledge structures. This constitutes learning from a text. Pre-existing knowledge structures contain both specific and generic knowledge structures: specific knowledge consists of memory representations, for instance of experiences or other texts, whereas generic knowledge structures consist of more generalised knowledge such as schemata and stereotypes (Graesser et al., 1994). Both the stored representations and the way in which they are stored are flexible, and therefore open to change (Squire, 2004).

In considering how socially constructed information is internalised into these pre-existing knowledge structures, it is useful to draw on the work of Piaget (1952). When a reader internalises newly constructed knowledge, one of two processes might occur. The first is assimilation: if the new knowledge fits cohesively into an individual’s pre-existing knowledge structures, it can be assimilated directly into these. The second process is accommodation, which occurs if information does not integrate cohesively into pre-existing knowledge structures; in this case, pre-existing knowledge structures may be altered to accommodate new knowledge. These two processes suggest that the process of learning is not a linear path. Rather, a learner navigates a matrix of ideas (Prawat, 1992): they can enter the matrix from different directions (according to their different starting points in terms of pre-existing knowledge) and travel through the matrix in various directions as they assimilate and accommodate new information into knowledge structures. Along the way, they will continually reorganise their knowledge as they reflect on and accommodate new information.

4.1.2. Possible difficulties with adopting a social constructivist perspective

Firstly, it might be argued that reading a text does not equate to a ‘dialogue’ in a social space. Although the text provides information to initiate the reader’s construction of meaning, the text is static, and therefore unable to respond to the unique interpretations of the reader, or to actively negotiate meaning with the reader. It is even suggested that ‘the semantic core of a source text or the identity of an author [is] lost in a matrix of [the reader’s] prior textual traces’ (Greene & Ackerman, 1995:406): a text’s central meaning can be highly distorted as the reader applies prior knowledge to a text. As such, a reader’s interpretation of the text may strongly diverge from that intended by the author. However, regardless of the strength of influence of a reader’s prior knowledge, it is the text that initially activates meaning construction, and that guides the reader in their activation of prior knowledge. This continues as a dialogue because meaning construction is a continuous process during reading: the text provides information that the reader interprets, and then provides more information which might encourage the reader to reflect back on previous interpretations, thus creating a dialogue between the two.

Furthermore, constructivism has been criticised for its epistemological relativism: there exists no absolute truth, and no one individual’s perception of truth is superior to another’s (Liu & Matthews, 2005). This criticism is particularly pertinent both in regard to the education system, which expects learners to gain a conventional set of understandings about the world around them, and in regard to history learning more specifically. One form of historical knowledge is substantive knowledge, which involves the acquisition of the content, or ‘facts’, of history. If, from a social constructivist perspective, there exists no absolute truth, then there can be no absolute substantive historical truths for learners to acquire. In exploring historiography from a constructivist perspective, the next section will attempt to provide a more detailed explanation of how social constructivism can be applied to history learning, despite its epistemological relativism.

4.1.3. History from a constructivist perspective

A realist theory of history states that historians aim to discover historical facts to be able to describe or explain what happened in the past – an endeavour in which they do not always succeed (Nowell-Smith, 1977). As we are unable to directly observe historical occurrences, this understanding of history is born from interaction with historical artefacts, documents and testimonies: we build theories that explain the existence and nature of this historical evidence (Nowell-Smith, 1977). Realists see what Goldstein (1962) calls ‘the real past’, or the past as it actually happened, as the touchstone against which the truth of historical theories must be measured, or falsified. However, as we cannot directly perceive the past, constructivist theories claim that it is impossible to know what actually happened in the ‘real past’, and therefore impossible to use it as a touchstone for truth. Constructivist theories instead propose that historians construct ‘an historical past’ (Goldstein, 1962), which is a product of the historian’s mind. This reflects a broader constructivist theoretical perspective, and in particular, Bruner’s (1986) concept of possible worlds. Possible worlds are mental representations of reality and are constructed to represent possible realities that we can imagine and experience. As such, an historical past constructed by an historian is a type of possible world, representing their perspective of a possible past. Similarly, history learners can construct their own possible worlds, or historical pasts, using the resources available to them (Wineburg, 2001). This illustrates that history learning is not about learners digesting pre-emptive explanations, but about learners developing their own understanding by organising and contextualising information which is not always completely verified (Bruner, 1996).

The approach to history learning outlined above does not sit neatly with history learning as dictated by the National Curriculum, which suggests that learners should gain a set of conventional understandings about history. Bruner (1986) argues that no one possible world is more correct than any other possible world, as there is no independent reality against which these can be measured; from this, it can be assumed that no one historical past is more correct than any other historical past. This can present problems in history education. For instance, it is considered a fact that the beginning of WWI was announced in Britain in 1914 and that the war closed with the signing of the armistice in 1918: if a history student believed, in their possible, historical world, that WWI began in 1918 and ended in 1922, any history teacher would strive to alter and ‘correct’ this. Therefore, we must hold the belief that the history teacher’s possible, historical world is correct, and the learner’s is not. In light of this, some notion of historical truth must exist. However, Bruner argues that such statements are derived ‘from the nature of language rather than from the world’ (1986:45): time is a social construction that does not exist independently of the human mind, and therefore the statement that ‘WWI began in 1914’ is true only in terms of language.

In consideration of other examples, Bruner’s concept of there being no absolute truth might be usefully applied to history education. Take the example of the mystery of Richard III’s twisted spine. Despite being depicted by Shakespeare as a ‘poisonous bunch-backed toad’ (Shakespeare, 2000, 1.3:246), many historians believed that such descriptions of Richard III as a hunchback were the result of Tudor or Shakespearean propaganda: consequently, in their possible, historical pasts, Richard III was not a hunchback. However, upon the discovery of Richard III’s body in 2012, his spine was discovered to be crooked. Here, a belief held by some historians was falsified due to emerging evidence, thus requiring historians to alter their perceived historical pasts accordingly. This demonstrates that it may be beneficial not to consider absolute truths in history. Rather than constructing a factual image of ‘the real past’, historians construct an historical past which contains beliefs about the past. Whilst no belief is correct or incorrect, these beliefs can be assessed in terms of probability of truth, using the evidence and information available at the time. For instance, there is much evidence to support the fact that WWI began in 1914. Therefore, this is a true belief, unless new evidence comes to light to falsify this fact. In this sense, history is a ‘negotiated practice offering provisional knowledge’ (Bage, 1999:33): historians take historical artefacts, negotiating their meaning and what these artefacts tell us, in order to socially construct some form of collectively accepted set of true beliefs about history (Wineburg, 2001). It is this collectively accepted set of true beliefs that a history learner’s conceptual understanding of history may be compared against. This aligns with a social constructivist perspective, in which the veracity of historical knowledge is judged by consensus amongst individuals (Adams, 2006). As long as historians are open and self-critical about how they select and construct historical true beliefs, the knowledge they create is valuable and can ‘vastly enrich our sense of the possible’ (Bruner, 1996:96).

Whilst the nature of historiography is not pertinent to this thesis, developing ontological and epistemological perspectives relating to this can help us to understand how children may learn about the past. Similarly to historians, history learners construct their own historical pasts in relation to the artefacts, evidence and information with which they interact, alongside texts, teachers and peers in the classroom. When assessing a learner’s historical past, we should consider how viable the learner’s understanding is in relation to the socially constructed, collectively accepted set of true beliefs about history, constructed by historians. When considering misconceptions, this thesis will regard these as inaccurate conceptual understandings, rather than incorrect. When this research discusses learners’ development of a substantive knowledge of history, it is referring to their development of an understanding of a set of socially constructed, collectively accepted true beliefs relating to history.

4.2. Cognitive narratology

Underlying the approach of cognitive narratology is the idea that narrative functions as a powerful tool for thinking and cognition (Herman et al., 2005), both enabling and extending the mind (Herman, 2013): narrative is one of the most basic ‘vehicle[s] of human knowledge’ (Richardson, 2000:168). Cognitive narratology is interested in the role of narrative in the interrelated areas of perception, language, knowledge and memory of the world (Jahn, 2005), that is, how these different areas interact in order for individuals to make sense of the world around them. Herman (2013) distinguishes further between different strands of cognitive narratology. Firstly, he distinguishes between using the cognitive sciences to explore narrative, and using ideas from narrative theory to explore the cognitive sciences. The former of these two is more pertinent to this research, as its primary aim is to utilise ideas from the cognitive sciences to illuminate how narrative might influence learning. Secondly, Herman distinguishes between narrative as a target of interpretation and narrative as a resource for sense-making. Narrative as a target of interpretation relates to how individuals comprehend and interpret narratives in unique ways. Narrative as a resource for sense-making refers to how individuals use narrative to understand the world around them, or how individuals use narrative as a mind-enabling and mind-extending tool. The latter is crucial to this thesis, as it explores how narrative can be used to make sense of the world around us, and the historical world that preceded us. However, narrative as a target of interpretation will also be considered; a reader constructs unique meaning in the social space between themselves and a text, and this is an important consideration when using narrative as a teaching tool. Further comparisons between cognitive narratology and social constructivism will be discussed below, to illustrate how the two approaches can complement one another.

4.2.1. Synthesising social constructivism and cognitive narratology

Cognitive narratology focuses on how internal, cognitive processes influence how we interact with and interpret narratives, whereas social constructivism emphasises the importance of external, social processes in interpreting narratives: it might initially appear that these two approaches do not synthesise well. However, the two approaches share various common elements which make them compatible; these will be discussed below.

Firstly, although cognitive narratology focuses primarily on how narrative interacts with cognitive processes, it accepts that narrative is a socially embedded psychological tool (Herman, 2013). ‘Psychological tools’ are required for thought and emerge through interactions with others, for instance, language (Vygotsky, 1962). Therefore, narrative arguably constitutes a psychological tool in that it initiates meaning construction and thought, and narrative meaning emerges through the interaction between a reader and a text. Furthermore, the sense-making processes initiated by narratives are not internal, but rather ‘tellers and interpreters… construct and jointly evaluate’ mental representations of texts (Herman et al., 2005:349) in response to narratives. This is similar to social constructivism, which sees meaning as being negotiated and constructed in the social space between two individuals during reading: the narrator (the author, or text) and the interpreter (the reader). Therefore, both approaches accept some form of social interaction as a prerequisite to the cognitive internalisation of socially constructed meaning and knowledge.

In addition, both social constructivism and cognitive narratology highlight the importance of individual interpretation in the construction of meaning. Social constructivism purports that learning involves the construction of knowledge within a social space, not the discovery of an external reality: therefore, all individuals’ perceptions of the world are viably unique according to their social experiences and interactions. Similarly, cognitive narratology sees that all narratives represent the formula of ‘seeing X as Y’, in which X is an R-world (the real world) and Y is a P-world (a phenomenal world, as it is individually perceived) (Jahn, 2005). This suggests that no narrative can offer an objective view of an external reality, but that narratives always provide a unique perspective, or come with a particular lens through which the reader can attempt to perceive the world presented.

4.2.2. Narrative theory and reading

Pertinent to this research is firstly the way in which readers understand and interpret narrative texts, and secondly, how understanding of a text may be translated into a conceptual understanding of the historical world that precedes the reader. The discussion below considers how readers understand, interpret and learn from narratives in three separate stages. Whilst these stages have not been explicitly defined in the literature, the elements of each stage are commonly discussed throughout narrative theory literature on reading comprehension. Table 2 (below) considers how the stages map onto the social constructivist stages of reading outlined above in section 4.1. The first stage of understanding a narrative text involves the reader obtaining a literal understanding of the events in a narrative, through the use of technical literacy skills such as decoding words. Whilst this stage does not relate to any of the stages of reading from a social constructivist perspective, the ability to complete this stage allows the social constructivist stages of reading to commence. The second stage involves more interpretive processes, in which the reader actively interacts with the text to create deeper, more subjective levels of meaning and understanding; this stage relates to the social constructivist stage of reading in which meaning is constructed in the social space between the reader and the text. In the final stage, the reader integrates any understanding and information that they have gained from the text into their current repertoire of knowledge. This stage relates to the social constructivist notion of internalising understanding that has been constructed socially, and restructuring the conceptual system to construct new knowledge and understanding. These stages do not necessarily occur consecutively; for instance, stages one and two may occur simultaneously, as might stages two and three. Each of these three stages will be discussed in more detail below.

Table 2: Comparison of social constructivist and narrative theory stages of reading

Social constructivist stages of reading | Narrative theory stages of reading
- | Stage 1 – Accessing the narrative
Stage 1 – Meaning construction in a social space | Stage 2 – Interpreting the narrative
Stage 2 – Internalisation of meaning | Stage 3 – Translating the narrative into learning
Stage 3 – Restructuring of conceptual systems | Stage 3 – Translating the narrative into learning
Stage 4 – New knowledge/understanding | Stage 3 – Translating the narrative into learning

4.2.3. Narrative theory stages of reading

Stage 1 – Accessing the text

The first stage requires a reader to access the text on the most foundational level. This initial stage of understanding requires technical reading skills, such as decoding and knowledge of relevant lexis (Egan, 1988; Rassool, 2009). It involves decoding the text to gain an understanding – in a very literal sense – of the meaning of words combined, and therefore an approximate overall meaning of the text. If a text is at an appropriate level for a reader, this stage should be successful, and acts as a gateway into deeper comprehension processes. As such, this stage is less relevant to the aims of this thesis, and is only briefly outlined here.

Stage 2 – Interpreting the text

This stage has its foundations in a social constructivist approach to learning: learners do not passively respond to readily formulated meaning in a text, but actively interact with it to construct their own, individualised meaning (Fludernik, 1996). This interaction is considered a dialogue between the author (or indeed, the text) and the reader, during which meaning is negotiated and mediated by the purposes of both interactors (Bage, 1999). This raises the question as to the degree to which the text and the reader each construct meaning, and how this meaning is negotiated between the two.

Whilst acknowledging that a reader does interact with a narrative, Egan (1988) suggests that the text is dominant in determining a reader’s final, affective response to the text. He claims that at the beginning of a narrative there are infinite possibilities as to how the narrative might progress; the reader navigates these, predicting potential outcomes. However, the number of possibilities decreases towards the middle of a narrative, as further events unfold, and is reduced to one, final possibility in the text’s conclusion. Therefore, although readers may follow different paths through a narrative, the final, affective meaning is fixed. However, this suggests that every reader will ultimately respond to a text in the same way. Whilst it may be true that particular endings are designed to make readers respond in a certain way, it is an oversimplification to suggest that narratives predetermine a collective response. Rather, a reader’s feelings towards a particular ending – and all events throughout the text – are influenced by the reader’s personal experiences, experiences of similar events (Sarbin, 1986; Fludernik, 1996; Nicolopoulou, 1997; Beach, 2010), and their emotional experiences in relation to particular events (Mar et al., 2011).

Conversely, Bruner (1986) states that all narrative texts have relative indeterminacy: a narrative does not formulate meaning, but allows for the generation of a variety of actualisations on the part of the reader. Narratives ‘guide a search for meanings among a spectrum of possible readings’ (Bruner, 1986:25). In this case, narratives cannot truly be understood and judged by looking at the text, but rather, the final outcome of a reader’s interaction with a text (their actualisation of the text) must be considered. Herman (2013) establishes a middle ground between that of Egan and Bruner, suggesting that narrative texts provide a blueprint, or outline, which the reader uses to actively construct a mental representation of the narrative, or what Herman calls the reader’s unique storyworld. A storyworld is ‘the world evoked by a narrative text or discourse; a global mental model of the situations and events being recounted’ (Herman, 2009a:193), encompassing both space and time (Ryan, 2019). Whilst one narrative text provides the same blueprint for all readers as a common starting point, many unique actualisations, or storyworlds, can be constructed from this initial blueprint. The construction of this storyworld is thought to be initiated by textual clues – such as deictic expressions⁴ – in a narrative. A reader’s storyworld will be continually remodified as they continue to interact with the narrative. Once a storyworld has been constructed, a reader can transport, or relocate, themselves into this world (Gerrig, 1993; Herman, 2005): they become detached from the real world and immersed in experiencing the storyworld (Mar & Oatley, 2008). This process of transportation allows for the suspension of disbelief: the reader can temporarily accept concepts that are unfamiliar, or even unbelievable, to allow for coherence in the storyworld (Bruner, 1986). Research supports this notion of transportation: participants were slower to respond to a tone when reading narratives than expository texts (Britton et al., 1983), suggesting that they were more absorbed in the reading of the narratives.

A further consideration is, once the construction of a storyworld has begun, what interpretive processes readers use to construct their storyworld. Interpretive processes are necessary when interacting with narratives in order to organise and contextualise information given in a narrative, which cannot be verified in other, more disciplined ways (Bruner, 1986). When trying to organise the information in a narrative, a reader will address the gaps in the narrative’s blueprint, in order to establish cohesion. According to Nathanson (2006), these gaps encourage the reader to make inferences and judgements, which are influenced both by clues in the text and the prior knowledge of readers (Sarbin, 1986; Stanovich, 1994). Readers use prior knowledge to make knowledge-based inferences, either online (whilst interacting with a narrative text) or offline (after interacting with a narrative text) (Graesser et al., 1994). Whilst online inferences often directly reflect specific aspects of a reader’s pre-existing knowledge structures, inferences made offline are more novel inferences, which are produced after various cognitive processes which involve searching memory and taking information from various sources of information within a reader’s pre-existing knowledge structures. Therefore, whilst inferences are somewhat limited by a reader’s prior knowledge and experiences, readers do have the capacity to formulate novel inferences. As a result, the reader constructs their own, individualised storyworld.

⁴ Deictic expressions are words or phrases which refer to the time, place, or situation in which a protagonist is acting. These include pronouns, demonstratives and tense, for instance, this, that, now and then.

Stage 3 – Translating the narrative text into learning

Once readers have developed an individualised interpretation of a text – their unique storyworld – a third stage of processing takes place. This final stage involves readers transferring information from their storyworld into pre-existing knowledge structures, therefore learning new information from the text. Whilst both the storyworld and pre-existing knowledge structures exist within an individual’s mind, the storyworld is comparable to some kind of fantasy world: the information stored in it is hypothetical, unrealised, and not yet used by individuals to inform their knowledge of the world. While readers use their pre-existing knowledge structures to help them understand and interpret a narrative, their interpretation of a narrative may also inform these structures, thus influencing the beliefs that a reader holds about the world in which they live (Sarbin, 1986) (see Figure 2).

Figure 2: The cycle of using and informing pre-existing knowledge structures, through narrative thinking

[Figure 2: cycle diagram with two elements, ‘Pre-existing knowledge structures’ and ‘Storyworld construction’, linked in both directions by arrows labelled ‘Narrative thinking’]

Some pre-existing knowledge structures might exist in the form of possible worlds (Bruner, 1986). As possible worlds are mental representations of reality and the world around us, and of alternative possible realities that might exist (Bruner, 1986), they encompass pre-existing knowledge relating to these various versions of reality. One example, as previously discussed, is the historical past (Goldstein, 1962), which contains knowledge relating to a specific historical time period. The construction of such possible worlds is likely to be made possible by the episodic memory, which enables mental time travel, and consequently the imagining of possible past and future happenings (Tulving, 2005).

The concept of a possible world is similar in nature to Herman’s (2013) concept of the reader’s storyworld. The main difference is that Herman’s storyworlds are constructed specifically to represent narratives, whilst possible worlds are constructed to represent a particular reality. However, the underlying similarities between a reader’s possible worlds and their storyworlds may make it easier to transfer information between the two. This parallel between storyworlds and the reader’s internal representation of the ‘real’ world has also been noted by Beach (2010), who suggests that the reader’s storyworld has a counterpart in their private, real-life narratives. Prior knowledge from a reader’s possible worlds is used to inform the construction of their storyworld, and any new knowledge constructed within the storyworld may then be transferred back into a reader’s relevant possible worlds, resulting in learning. A similar process – of transferring knowledge from an imaginative narrative into daily life – has been observed in children when they engage in ‘playworlds’ (Hakkarainen, 2006).

It may be questioned why this new knowledge is constructed in a storyworld first, rather than inserted directly into a possible world. This might be for a number of reasons. Firstly, the storyworld presents a framework in which the reader can place unfamiliar knowledge, allowing this knowledge to become contextualised (Szurmak & Thuna, 2013). This process of embedding information in a context within the storyworld is important: new knowledge and concepts cannot be understood when presented in isolation, because individuals do not deal with the world in terms of single events, but they frame these events within larger structures (Bruner, 1990). Once a concept is understood within the context of a storyworld, it is argued that the reader can then leave details of context behind in their storyworld, extracting context-free information to insert into their possible worlds (Gerrig, 1993). Gerrig & Prentice (1991) have found evidence which they argue supports the existence of this process. Participants read two versions of a story, which contained both true and false context-free statements (assertions which are not tied to a particular setting) and true and false context details (information specific to a particular fictional world). They were later asked to state whether various context-free assertions or context details were true or false in real life. They found that participants took longer to verify whether context-free assertions were true or false, suggesting that this is because the context-free information provided in the stories was integrated into long-term memory structures, and therefore affected participants’ views of the real world, whereas context details were not integrated into longer term memory, and therefore did not interfere with participants’ perceptions of real life. However, it might be argued that the context details more starkly contradicted or more clearly aligned with reality (e.g. the vice-president of the United States is George Bush/Geraldine Ferraro) than the context-free assertions (e.g. penicillin has been a great benefit/has had bad consequences for humankind), thus leading participants to take slightly longer to consider context-free assertions. Finally, the storyworld provides a safe, experimental, hypothetical context – an ‘experimental laboratory’ (Hakkarainen, 2008:293) – in which new knowledge can be negotiated and explored. Within this context, individuals might assess the viability of information, where best it might fit into their current possible world, and whether any elements of their current possible world might have to be adapted to accommodate this new information: using Piaget’s (1952) terms, they will consider how to assimilate and accommodate new information. This process of inserting unfamiliar knowledge into a storyworld (using supporting prior knowledge) before extracting the knowledge into a possible world reflects the idea that knowledge does not enter a learner’s mind as ‘a straightforward copy, but a new, personal reconstruction’ (Wells, 1992:286).

An example may help to illuminate this process further. A reader may have a possible world (or an ‘historical past’) representing a particular period in time, for instance, World War I. This possible world has so far been constructed using knowledge obtained from various sources, but may still be relatively incoherent and sparse in terms of knowledge and information. The reader goes on to interact with a narrative about WWI, negotiating meaning in the social space between themselves and the text, before internalising this to construct a storyworld representing this text. This storyworld will not only contain novel information about WWI provided by the narrative text (obtained from the ‘blueprint’ of the narrative), but also, the reader will activate their current possible world about WWI, in order to utilise relevant prior knowledge to make inferences about the text and build upon the ‘blueprint’ of the narrative. Once any new knowledge from the narrative has been framed in the narrative context of the storyworld, it may be transferred back into the reader’s possible world representing WWI.

4.2.4. Bringing together narrative theory stages of reading and mental representations of texts

Previously, in Chapter 2, the three levels of mental representation of a text constructed whilst reading (surface form, textbase and situation model) were discussed, and possible issues with these were outlined. A primary concern was how these mental representations, specifically situation models, might differ across naturalistic narrative and expository texts, and what implications this might have for learning from these different text types. If mental representations do differ across text types, narrative theory might help to shed some light on where and how narrative representations might diverge. Therefore, synthesising these two approaches may allow a greater understanding of how readers respond to different text types in unique ways. Figure 3 (below) presents a proposed model of how these two approaches can be synthesised. As there is no suggestion in the current literature regarding exactly how narrative and expository situation models might diverge, the model below is proposed in relation to both narrative and expository texts; this research seeks to explore whether this model might be applied to interactions with both text types, or whether it needs to be altered in relation to specific text types.

Figure 3: A model of reading

[Figure: Stage 1 – Accessing a text (blueprint provided by the narrative text; process of decoding the text and comprehending small units of meaning). Stage 2 – Interpreting a text (textbase; segmentation and reorganisation; inferences; situation models; related knowledge; pre-existing knowledge structures containing prior knowledge). Stage 3 – Translating text into learning (learning: assimilation and accommodation).]

Firstly, the textbase and situation model must be considered in relation to the storyworld, to explore how these might overlap. Storyworlds are thought to exist as ‘containers for entities that possess a physical mode of existence… and as networks of relations between these entities’ (Ryan, 2019:63). This bears resemblance to the definition of a textbase, which contains propositions presented in a text and the relations between these (van Dijk & Kintsch, 1983). However, storyworlds also share common features with situation models. A storyworld is ‘the world evoked by a narrative text or discourse; a global mental model of the situations and events being recounted’ (Herman, 2009a:193); similarly, a situation model is a representation of events denoted by the text (Zwaan, 2016). In addition, both consider and trace the spatial, temporal and causal relations between events (Zwaan et al., 1995b; Thon, 2016), whilst integrating prior knowledge with textual information (van Dijk & Kintsch, 1983;

Herman, 2009a). Therefore, it seems that the concept of a storyworld equates more closely to that of a situation model than of a textbase. Similarly, Gerrig (1993) holds the view that situation models and storyworlds are theoretically similar. Until any further light is cast upon differences in the mental representations of narrative and expository texts, this thesis will use the term situation model to refer to the global mental representation of both expository and narrative texts.

The proposed model follows the stages of reading described above, but integrates the mental representations constructed by the reader into this model at the appropriate stages. At the first stage of accessing the text, the text provides a blueprint, of which the reader gains a literal understanding using basic literacy skills, such as decoding. During Stage Two of interpreting a text, the reader will firstly construct a textbase, which represents the relations between the propositions in the blueprint. The reader will then construct situation models to represent the various situations denoted in the text. Where discontinuities occur in time, space, causality, intentionality or protagonist, new situation models may be constructed to represent the new situations portrayed. Readers will draw on relevant prior knowledge to establish cohesion and to make sense of the text in relation to what they already know. Situation models are unique to the reader, as they draw on individualised prior knowledge. They may also deviate from the structure of the original text and the textbase, as readers reorganise textual information to integrate prior knowledge. Finally, in the third stage of reading, a reader will integrate situation models into pre-existing knowledge structures. This may occur either through linking situation models with other, similar situation models, or possible worlds, or through integrating particular information from the situation model into relevant, pre-existing situation models or possible worlds. This may require processes of assimilation and accommodation, during which readers may adapt either their pre-existing knowledge structures or the new information acquired from the text so that the knowledge sits cohesively together. While the figure above illustrates these stages in a linear fashion, it is not necessary for one stage to be completed to initiate the next stage. A reader may already have begun to construct a situation model, but as a reader continues to read and understand a narrative, their textbase may be altered, which in turn will cause changes to their situation model.

Chapter Five: Review of the literature – Reading to learn through narrative and expository texts

This chapter will discuss current literature exploring the possible benefits and limitations of narrative and expository texts when reading to learn. Whilst this thesis focuses on the use of narrative nonfiction specifically, there is very little empirical research exploring the influence of narrative nonfiction on learning, and therefore this discussion will primarily explore research into narrative texts. Firstly, it will consider how effectively primary-age readers comprehend narrative and expository texts, discussing how typical differences between the two text types might influence how accessible these texts are to younger readers. Following this, it will consider empirical research exploring the use of these text types to learn across a range of subjects. Finally, this chapter will address one of the potential issues with using narrative to support learning: the ‘reader’s dilemma’ (Gerrig & Prentice, 1991). This relates to how readers distinguish fact from fiction in texts, in order to judge what information can be applied to reality. It is important to note that the research discussed throughout this chapter uses various terms to refer to different text types, such as ‘stories’ or ‘informational’ texts. While the terms used in the original research papers will be used in discussions below, it is assumed that stories and fictional texts equate to narratives, and that informational and nonfiction texts equate to expository texts.

5.1. Comprehending narrative and expository texts

Research has shown that children often have greater difficulties comprehending expository texts than narratives (Dymock, 2005; Diakidoy et al., 2005). Not only have children been found to answer fewer comprehension questions correctly in response to expository than narrative texts (Best et al., 2008; Kraal et al., 2018), but think-aloud procedures have also shown that readers make more invalid inferences and irrelevant comments in response to expository texts (Kraal et al., 2018; Karlsson et al., 2018). A survey considering a total of 1,860,440 reading comprehension quizzes taken by 150,220 children in 967 schools across the UK (taken on the Accelerated Reader quizzing system⁵) showed that the average percentage of correct questions was lower for nonfiction books than fiction books, suggesting that nonfiction books are either not read as carefully or not understood as fully as fiction books (Topping, 2015). This is thought to be a result of various typical differences between the two text types, including exposure, structure and content, each of which will briefly be considered below.

⁵ The Accelerated Reader software provides computerised comprehension quizzes on books which children choose from their school library. Once a book is read, a child takes a computerised quiz, which provides children and teachers with feedback on their comprehension of the book.

Young readers are thought to have more limited exposure to expository than narrative texts. In kindergarten classrooms (pupils aged 5- to 6-years-old), of time spent reading aloud, teachers spent 83% reading fiction and just 17% reading expository texts (Wright, 2013); continuing this trend, between preschool and third grade (aged 4- to 9-years) it was found that fewer than 9% of all books read aloud were expository (Yopp & Yopp, 2006). In first grade classrooms (aged 6- to 7-years), few expository texts were found in classroom libraries, and on average, just 3.6 minutes per day were spent using expository texts during written language activities (Duke, 2000). Whilst the research cited here was conducted in American schools, there is little reason to believe that a dramatically different picture would be observed in UK schools. In terms of a solution to this issue, increasing the number of expository texts available in classroom libraries has not been seen to increase comprehension of these texts (Baker et al., 2011); alternatively, it is suggested that actively using informational texts to learn during inquiry-based lessons can enhance comprehension (Maloch & Horsey, 2013). Therefore, difficulties with expository comprehension may be resultant of limited meaningful interactions with expository texts within the classroom.

In terms of structure, expository texts can be structured in numerous ways (Meyer, 1975; 1985), which children can struggle to ‘see’ (Hall et al., 2005; Hall-Kenyon & Black, 2010). Without a clear, familiar structure, a text is just a ‘sea of words’ (Dymock, 2005:178). Whilst instruction in text structure can support expository comprehension (Hall et al., 2005; Meyer & Ray, 2011; Hebert et al., 2016), much of this research considers instruction on only one possible text structure (Bohaty et al., 2015). However, Hebert et al.’s (2016) meta-analysis of research exploring the impact of text structure instruction on expository comprehension found that research projects that provided instruction on a greater number of expository text structures resulted in statistically significantly larger effects than research in which fewer text structures were taught. In contrast, narratives typically follow a story grammar (Rumelhart, 1975), meaning that

they generally share similar constituents and overall structures. Therefore, readers are familiar with the typical structure of narratives, and this is thought to support comprehension.

Finally, in terms of content, expository texts are usually based on a specific topic. As an expository text’s primary purpose is typically to inform a reader about a topic, this topic is often relatively unfamiliar to the reader. As such, if expository texts are less cohesive, readers can struggle to make appropriate inferences to establish cohesion, because of their lack of relevant prior knowledge (Graesser et al., 2003). In addition, expository texts often contain technical vocabulary that is specific to the topic of the text, which is less likely to have been encountered previously by the reader (Wright, 2013). This can create barriers to understanding. In contrast, narratives do not focus on a particular topic as such, as they observe various events unfolding over time. Therefore, content and vocabulary are varied, and are not usually as common a problem for readers during narrative comprehension. However, barriers can be overcome by building and activating a reader’s prior knowledge before interacting with expository texts (Maloch & Horsey, 2013), and by teaching children how to use contextual clues to work out the meaning of unfamiliar words (Kuhn et al., 2017).

Overall, it is evident that expository texts can create comprehension difficulties for younger readers, especially when the topic of the text is unfamiliar. However, it appears that these difficulties can be addressed with suitable teaching approaches, to enhance expository comprehension. This research seeks to explore whether narrative might have additional benefits over expository texts once these potential barriers to expository comprehension are removed, and therefore these factors will be controlled for in texts used in this research (see section 6.4.4 for further details on how these factors were controlled for in the construction of texts used). The next section will consider differences between narrative and expository texts that are utilised as learning tools.

5.2. Learning from narrative and expository texts

More research is beginning to explore how both narrative and expository texts might support the development and retention of conceptual understanding in relation to various topics, but mainly in the area of science learning. This section will begin by discussing research that considers the utilisation of prior knowledge and the construction of mental representations in response to narrative and expository texts. It will go on to explore research that directly compares narrative and expository texts and their influence on the development and retention of conceptual understanding of particular topics. The remaining discussion will explore research that considers the benefits of narrative independently of expository texts: this research encompasses the use of narrative to learn about unfamiliar, abstract concepts (specifically the theory of evolution), the relatability of narrative, and finally, how narrative might evoke interest in a reader, and thus be used as a learning stimulus, rather than directly as a learning tool.

5.2.1. Mental representations and prior knowledge in relation to narrative and expository texts

Research has been conducted to explore differences between narrative and expository comprehension in terms of the utilisation of prior knowledge and the construction of mental representations. Wolfe & Mienko (2007) used three different texts (one narrative and two expository) to teach 90 university students about the circulatory system. Firstly, participants completed a questionnaire that assessed prior knowledge of the topic. They were then given five minutes to read one of the three texts, being instructed to read as though studying for a test. The narrative was about a protagonist who built a machine that shrank him; he was then sucked into a woman’s body, and had to travel through it to find a way out. Whilst the plot was fictionalised, information about the circulatory system that was integrated into the narrative was entirely factual. Both expository texts contained identical introductory and concluding paragraphs to the narrative, as well as the same key content about the circulatory system. One text was sequential in nature, presenting information by tracing the path of blood around the body; the other was thematic, presenting information in three separate topical paragraphs (the heart, blood vessels and blood). Length and readability were similar across all texts. Following reading, participants completed a three-minute filler activity, before a free recall task of the text to assess memory of information. Finally, they resat the questionnaire used at the beginning of the experiment, to evaluate learning.

Wolfe & Mienko (2007) found no overall difference between the different text types on either free recall or questionnaires. However, when prior knowledge was considered, differences emerged. Prior knowledge did not influence free recall when the narrative was read; however, in both expository conditions, participants with a greater level of prior knowledge recalled more text content than those with lower levels of prior knowledge. It was found that an intermediate amount of knowledge was optimal for learning from narrative texts, but that the optimal amount of knowledge was higher for approximately equivalent expository texts. In addition, during free recall, those in the expository conditions were marginally more likely to reverse the order of sentences (relative to the number of sentences recalled) than those in the narrative condition. The authors argue this indicates that those who read expository texts were more likely to draw on prior knowledge than those who read the narrative: readers of expository texts who had higher levels of prior knowledge integrated prior knowledge with textual information to construct situation models, which in turn supported free recall. Because readers reorganised textual information to integrate it with prior knowledge in the situation model, readers were more likely to recall information in a different order to that presented in the text. Conversely, those in the narrative condition constructed a mental representation of the text that preserved the order in which information was presented in the text. The authors argue that this is because the reading goal associated with narrative is reading for enjoyment, which leads readers to seek cohesion, whereas the reading goal typical of expository texts is reading to learn, and therefore participants seek to integrate content with prior knowledge. As such, a lack of prior knowledge disrupted the comprehension and subsequent recall of expository texts.

Further research conducted with the same texts supported the original findings: on a post-test free recall essay, participants were more likely to recall expository texts in a list-like way, but narratives were recalled in a narrative form, making references to narrative features such as the protagonist (Clariana et al., 2014). Further to this, this later research found that pre-existing knowledge correlated with performance on post-test recall for expository texts, but not for narratives. However, in both studies, free recall occurred just three minutes after reading texts, and therefore these findings do not indicate whether the utilisation of prior knowledge when reading expository texts might support the longer-term retention of textual information. In addition, if asked to recall the texts, rather than recalling what had been learned from the texts, it is not surprising that participants recalled elements central to the narrative, such as the protagonist. If asked to recall what had been learned from the texts, perhaps the narrative condition would also have reported this in a list-like way: as such, the instructions given at post-recall tasks might have influenced the way in which participants chose to recall information from texts, and subsequently, may have affected correlations between recall and prior knowledge in the narrative condition.

Elsewhere, eye-tracking research has suggested that reading goal, rather than text type, might influence the activation of prior knowledge. Yeari et al. (2015) found that reading times were longer when participants were instructed to read expository texts for study than for entertainment: with the reading goal of studying, reading times were increased because readers were activating prior knowledge to integrate with textual information, thus increasing cognitive demand. However, Wolfe & Mienko (2007) instructed participants to read both expository and narrative texts as though studying, and a difference in the use of prior knowledge still arose across text types. It is likely that a combination of text type and reading instruction (both external, given intentions) interact with a reader’s personal intentions to establish a reader’s reading goal (McCrudden & Schraw, 2007; McCrudden et al., 2010).

Wolfe & Woodwyk (2010) conducted further research comparing a narrative and an expository text in a think-aloud procedure. Texts were similar to those used by Wolfe & Mienko (2007), but shared ten identical ‘common sentences’ containing to-be-learned information about the circulatory system. The narrative followed a protagonist travelling around the circulatory system, integrating common sentences; the expository text embedded common sentences into basic information about the circulatory system. Information in both texts was presented in the same sequential order. Sixty-one university students completed a pre-assessment on the circulatory system, before reading a practice text (on an historical topic) to become familiar with the think-aloud procedure. Participants then read either the narrative or expository text; texts were read one sentence at a time from a computer screen and participants were required to speak aloud any of their thoughts once each sentence was read. Next, participants completed a 3- to 5-minute filler task. Participants were then required to recall as much

of the text as possible. Finally, participants sat a post-assessment, which was identical to the pre-assessment. The authors found that those in the expository condition made significantly more prior knowledge elaborations in think-aloud comments than those in the narrative condition. Whilst free recall for all textual elements was greater in the narrative condition, recall of common sentences was greater in the expository condition. Information was also more often recalled in a different order to that presented in the text in the expository condition.

The results of Wolfe & Woodwyk’s (2010) study replicate the findings of Wolfe & Mienko (2007), discussed above, in that the expository text increased the activation of prior knowledge; the authors suggest that this supported the construction of a situation model in which textual information and prior knowledge were integrated. Because the to-be-learned content was central to the expository text, and therefore to the related situation model, more to-be-learned content was recalled in the expository condition. Yeari et al. (2015) similarly found that participants dedicated more time to reading central information than peripheral information in expository texts; readers were connecting their prior knowledge to this central, textual information during situation model construction. In contrast, narrative processing focuses on the events of the narrative, and establishing relations and cohesion between these, rather than to-be-learned content. Therefore, while participants could recall more events from the narrative overall, they were less likely to recall to-be-learned content that was not central to the progression of the narrative plot. The authors argue that this is because narrative details distract readers from to-be-learned content. An alternative explanation arises from the research of Harp & Meyer (1998), who found that undergraduate students recalled significantly fewer key ideas from expository texts when they contained ‘seductive details’ (defined as interesting yet irrelevant additional details). Seductive details primed schemas in readers that were not appropriate for learning textual information. Whilst Harp & Meyer considered expository texts only, it is arguable that the same reasoning can be applied to Wolfe & Woodwyk’s research: reading a narrative primed a schema (or reading goal) in the reader that was inappropriate for the purposes of learning specific content. 
This is supported further by research showing that 16-year-old Dutch pupils produced significantly more emotive questions, which often reflected incorrect assumptions, in relation to a narrative on the

Industrial Revolution than an expository text on the same subject (Logtenberg et al., 2011): the schema activated might have initiated a more emotional response to the narrative text in comparison to the expository texts. However, this did not influence the overall development of conceptual understanding, as understanding of the Industrial Revolution was not found to be superior in response to expository texts compared to narrative texts.

Despite this argument, Wolfe & Woodwyk’s (2010) research showed that there was no difference between conditions on post-assessment scores: when given prompts in the form of questions, participants in the narrative condition were equally able to retrieve to-be-learned content as those in the expository condition. Perhaps during free recall this information was overshadowed by information important to the flow of the narrative: to-be-learned content was embedded in the mental representation of the text, but was not necessary to the cohesive recall of narrative events. However, the ecological validity of this research may be questioned: whilst the think-aloud procedure allowed an insight into participants’ thought processes during reading, it is questionable how natural these processes were. If participants did not produce a think-aloud statement in response to each sentence, researchers prompted them to do so: participants may have put more conscious effort into activating prior knowledge relevant to the text than they might during natural reading. Therefore, whilst expository texts might encourage prior knowledge elaborations to a greater extent than narrative texts, this effect might be stronger when readers are actively prompted to do so.

To explore the construction of textbases and situation models in response to different text types further, Wolfe & Woodwyk (2010) conducted a second experiment with four conditions: narrative immediate, narrative delay, expository immediate and expository delay. Forty participants were assigned to each of the four conditions. All participants completed a pre-assessment and were then instructed to read their assigned text for six minutes, as though studying for a test. In the immediate conditions, participants then completed a filler task, followed by a sentence recognition task. In this task, participants were shown various sentences and rated their confidence, on a scale of one to six, as to whether these exact sentences appeared in the text they had read. In the delay conditions, participants returned after two days to complete the filler and sentence

recognition tasks. Findings suggested that the textbase was stronger for the narrative than the expository text, although its presence decayed significantly over the two-day delay. Marginally stronger situation models were observed for the expository text than the narrative, and were not observed to decay significantly over the two-day delay. This supports the findings of the first experiment, that expository texts initiate the use of prior knowledge to construct situation models to a greater degree than narrative. In addition, they indicate that textual information was retained to some degree at least two days after encountering texts. However, the marginally significant difference observed between the presence of situation models for the narrative and expository text was recorded at a p-value of 0.059, which is not accepted as statistically significant in much research in the social sciences; no effect sizes were detailed to provide an insight into the strength of this effect. Therefore, whilst participants who read the narrative did show evidence of a stronger textbase, this does not exclude the possibility of these participants also having constructed a strong situation model, not considerably weaker than those constructed in response to the expository text.

The research discussed above considers more experienced readers (undergraduate students), who are not only more likely to have some level of prior knowledge related to the texts read, but may be more adept at utilising this than younger children. Therefore, differences observed between narrative and expository texts might be lessened with younger, less experienced readers. However, Best et al. (2008) similarly found that, with readers aged 8- to 11-years-old, comprehension of expository texts was strongly influenced by world knowledge, whereas comprehension of narratives was influenced to a greater degree by decoding skills. In addition, participants answered fewer comprehension questions correctly regarding expository texts than narratives. Similarly to Graesser et al. (1994), the authors conclude that a lack of prior knowledge can cause issues in the creation of coherent situation models when reading expository texts, as important inferences cannot be made. However, texts compared were basal readers (simple narratives used to teach children to read) and science textbooks: with such different texts, difficulties in accessing expository texts may be due to numerous reasons, such as the structure or content of texts, rather than issues with levels of prior knowledge alone.

Elsewhere, research supports the view that younger children can recall more from narrative than expository texts, but that prior knowledge can have a negative impact on recall. Kucer (2011) recruited 69 fourth-grade children (aged 9- to 10-years-old) to read either a narrative or an expository text aloud. The narrative text was the first chapter of ‘Who Stole the Wizard of Oz’; for the expository text, three sub-topics were taken from a text about rock formations. Both texts were six pages in length and of a similar readability level, although the expository text contained approximately 200 fewer words, and was organised into fewer, longer sentences than the narrative. Once texts were read, participants retold them. Retellings were recorded, and clauses in retellings were labelled as either matching or not matching the text. Clauses that did not match were then assigned a further label, which specified how they deviated from the text. For those who read the narrative, 18% of overall clauses did not match the text, compared to 59% for those reading the expository text: participants reading the narrative accurately recalled more of the text. Of the clauses that did not match the text, participants reading the narrative made significantly more conflicting statements, that contradicted the text. The authors suggest that this is because prior knowledge interfered with the process of interpreting the text: readers already held schemas about specific situations (such as visiting the library to borrow books), and when the text did not conform to these schemas (for example, the protagonist visited the library but did not borrow a book), participants misremembered this information. Conversely, participants had lower levels of prior knowledge relating to the topic of the expository text (rocks), and therefore prior knowledge was less likely to interfere with retellings. 
Further to this, it was found that during recall, those in the expository condition made significantly more summaries (statements that combined at least two distinct ideas in the text into one general idea) and substitutions (statements that modified an idea in the text, but in a way that was still semantically appropriate). It is thought that participants summarised more because this was a way of dealing with an overwhelming amount of new, unfamiliar information, and that they made more substitutions, which still made sense in the context of the book, because they had little prior knowledge and therefore difficulties remembering the information exactly, so filled any gaps with information that they felt was appropriate in the context of the texts. The authors concluded that differences observed in the processing of text types were a result of differing levels of prior knowledge, rather than text type. However, as texts were not controlled for in terms of any other features,

82 and as prior knowledge was not measured in relation to the content of either text, differences cannot be assumed to be attributed purely to levels of prior knowledge.

Wolfe (2005) explored further whether specific textual features might influence the ability to recall narrative and expository texts: 144 university students were recruited to read either three narratives or three expository texts. Eighteen texts were used in total: nine expository texts, which presented basic information on a specific topic, and nine narratives, which all followed protagonists striving towards particular goals. Following the reading of three texts, participants were asked to recall as much of the first text read as possible, followed by the second and third text. Each text was assigned a latent semantic analysis (LSA) value (indicating the level of semantic association in the text) and a construction integration (CI) model value (assessing text organisation). Mirroring the previous findings of Wolfe, overall recall was better for narratives than expository texts. CI values were predictive of the amount of information recalled from narrative and expository texts; LSA values were moderately predictive of recall for narratives, but highly predictive of recall for expository texts. This suggests that semantic associations are more strongly linked to memory of expository texts. Wolfe suggests that this might be because narratives are unpredictable, as the reader observes various events unfolding over time. In contrast, expository texts are predictable if the reader is not too unfamiliar with the topic. For instance, in a text on how the heart functions, a reader is likely to expect there to be information about blood, veins and so forth. Therefore, prior knowledge of the topic might trigger semantic associations, in turn supporting text recall.

A further experiment was conducted by Wolfe (2005) to explore whether familiarity of text content might influence semantic associations and subsequently text recall. Two pairs of narratives were created. In each pair, texts had the same structure, but different content: in one text, the content was expected to be familiar to the participants, and in the other, less familiar. Sixty-four university participants read two texts (one from each pair), completed 30 seconds of arithmetic questions, and then wrote as much information as they could recall from each of the texts. In the texts with familiar information, the LSA predictions were not significant, yet the CI predictions were. This suggests that, whilst semantic associations predict memory of a text to a greater extent for expository than narrative texts, this is not due to the familiarity of content of the texts. However, it might be related instead to the diversity of text content: narratives are more likely to make reference to a diverse range of topics, whereas expository texts often focus on one specific topic, which has a related web of semantic associations.

The research discussed above largely suggests that prior knowledge is important to the comprehension of expository texts, and therefore, when readers have a good level of prior knowledge, situation models are successfully constructed in response to expository texts. Semantic associations may support readers of expository texts in connecting prior knowledge with appropriate textual information. This process supports the development of a conceptual understanding of the educational content central to these texts. However, a lack of prior knowledge when reading expository texts can lead to comprehension difficulties. Conversely, readers recall a greater quantity of general information from narratives, but not specific educational content. This may be because reading goals for narratives differ from those for expository texts: readers seek to establish cohesion in narratives, rather than to learn information. The next section will consider further research comparing narrative and expository texts which are based on the same content, to further explore how effectively readers develop and retain conceptual understanding from texts.

5.2.2. Development of conceptual understanding in response to narrative and expository texts

Arya & Maul (2012) compared seventh and eighth grade students’ (aged 12- to 14-years-old) responses to narrative and expository texts, where the narratives did not contain any fictional elements, and thus constituted narrative nonfiction. The authors designed two ‘scientific discovery narratives’ (SDNs) and two expository texts that were equivalent to SDNs in terms of educational content. SDNs were entirely factual texts, written in the third person, which ‘observed’ scientists beginning to understand different scientific concepts. Expository texts presented information in a style similar to that of seventh grade science texts. One expository text was on radioactivity (the SDN related to Marie and Pierre Curie), and one on the Galilean telescope and its role in understanding space (the SDN related to Galileo). Participants were randomly assigned to one of four groups. In each group, participants were presented with an SDN on one topic and an expository text on the alternative topic. The order of text presentation differed across participants. Participants were then given an assessment on texts, followed by an additional, different assessment on texts a week later. The authors found that participants scored higher on both of the SDNs than expository texts when assessed immediately after reading, but that this result was only significant for one of the SDNs. However, when tested a week after reading texts, participants scored significantly higher on both SDNs than expository texts. This suggests that while one SDN did support the development of conceptual understanding, both supported the retention of this understanding to a greater degree than expository texts. However, as the two assessments were not identical, these differences may partly result from differences across the two assessments, such as which textual content they assessed.

Similarly, Maria & Johnson (1990) found that narratives support the retention of conceptual understanding developed during reading. Fifth and seventh graders (aged between 10- and 13-years-old) with misconceptions about seasonal change were recruited. Participants sat a pre-test, read one of three texts on seasonal change, took an immediate post-test and finally completed a delayed post-test one month later. Of the three texts, one was an inconsiderate expository text, which made no reference to misconceptions, another was a considerate expository text, which made reference to misconceptions, and the final text was a considerate soft expository text, which referenced misconceptions and embedded information in a narrative. Participants in the considerate soft expository text condition showed a better understanding of seasonal change at both the post-test and the delayed post-test than readers of the inconsiderate expository text; however, only at the delayed post-test did readers of the considerate soft expository text also show a better understanding than those in the considerate expository text condition. This suggests that the key benefit of the narrative element of the considerate soft expository text was that it supported the retention of developed conceptual understanding.

Elsewhere, evidence suggests that narratives can also support the development of conceptual understanding more effectively than expository texts. My previous research (Browning & Hohenstein, 2015) compared the use of expository and narrative nonfiction texts to teach the evolution of apes to 62 Year 1 to Year 3 pupils (aged 5- to 8-years-old). Both texts contained the same to-be-learned information and were of approximately the same difficulty. Texts included images to make them engaging to younger children: the narrative nonfiction text contained cartoon-like illustrations and the expository text contained photographs and life-like images. Parents of participants completed a questionnaire which included questions aiming to establish participants’ levels of prior knowledge: parents were asked about conversations regarding evolution with the child and whether the child had interacted with other sources of evolutionary information, such as books. Participants read the texts with the researcher, before the researcher posed seven questions to participants, in the style of a semi-structured interview. Participants’ responses were audio-recorded, and each answer was assigned a score of between 0 and 2: 0 for an irrelevant answer, 1 for a partial answer, or 2 for an answer showing appropriate, coherent understanding. Additionally, answers were coded into reasoning categories. Those who read the narrative nonfiction text scored higher on questions overall than those who read the expository text, showing a more developed understanding of the theory of evolution. In addition, there was evidence that the narrative nonfiction text supported participants in overcoming underlying conceptual constraints that can create barriers to understanding evolution: expressions suggesting an essentialist constraint (the belief that all species are intrinsically unique) occurred over twice as often in the expository condition as in the narrative nonfiction condition. Those in the expository condition also showed more chronological confusion, answering questions about events portrayed early in the text with unrelated information presented much later in the text.

We argued that the narrative nonfiction text helped readers to suspend their disbelief, transcending the essentialist constraint in order to gain a greater conceptual understanding of evolution (Browning & Hohenstein, 2015). In addition, it was argued that the explicit chronological structure of the narratives supported participants in linking events, which in turn supported participants in answering questions. However, the images in texts used in this research might have influenced responses to texts: it is questionable whether the same findings would be obtained if the texts only differed in terms of linguistic and literary features. In addition, the extent of learning is uncertain. The baseline of prior knowledge (a parental questionnaire) only indicated the possible presence of prior knowledge, rather than actual levels of prior knowledge, and therefore, the amount of progress participants made from this point is unclear. Finally, it is uncertain whether understanding observed in discussions with participants was retained in the longer term, or whether participants were able to apply this understanding to the world around them.

Overall, the research cited so far suggests that whilst narrative might support the development and retention of conceptual understanding in relation to scientific concepts for children, adult readers can recall more educational content from expository texts. Baram-Tsabari & Yarden (2005) provide further support that age, or level of education, may influence response to different text types. The authors compared 272 tenth to twelfth grade (aged 14- to 19-years) students’ responses to a piece of ‘adapted primary literature’ (an article from scientific literature that had been modified to make it suitable for students) and a narrative, within the context of the science classroom in high schools in Israel. Both texts were about a specific type of chemical. Texts were randomly assigned to participants, who read their text independently before completing a questionnaire. The questionnaire required participants to write a summary of the text, answer true/false questions on the content (providing explanations for their answers) and answer three open-ended questions on the text. At all grade levels, the narrative increased students’ overall comprehension; however, twelfth grade students showed a greater ability to think critically after reading the adapted primary literature than after reading the narrative. Therefore, for older learners at higher levels of education, where required skills go beyond comprehending information, expository texts may offer additional benefits.

Research presented in this section generally suggests that narrative might support the development of conceptual understanding in relation to more abstract topics that are difficult for younger learners to conceive of, and might support the retention of conceptual understanding in a wider range of subjects. The next section will go on to discuss research which considers further the use of narrative to develop a conceptual understanding of unfamiliar, abstract concepts (specifically adaptation and evolution), without making comparisons to other text types or teaching methods.

5.2.3. Narratives to develop an understanding of unfamiliar, abstract concepts

Kelemen et al. (2014) considered how picture books influenced 5- to 6-year-old and 7- to 8-year-old participants’ conceptual understanding of adaptation. Participants sat a pre-test at the beginning of the experiment: they were presented with both an ancestral population and a contemporary population of a fictitious species, which were portrayed using realistic pictures. Participants were then asked a combination of open- and closed-ended questions to assess their knowledge of adaptation. Researchers considered participants’ responses overall, and placed participants into one of five categories, ranging from Level 0 (participants produced no facts on adaptation) to Level 4 (participants showed a coherent understanding of natural selection across multiple generations). Participants then read a storybook with an adult that described the adaptation of the fictitious species, before completing the test twice more after reading, once in relation to the fictitious species portrayed in the book, and once for a novel fictitious species, to assess the ability to generalise what had been learned. Three months later, children were tested again on both the fictitious and novel fictitious species. At the pre-test, there was very little evidence of any understanding of adaptation: of the younger children, 80% were classified as Level 0 and 0% at Level 4, whilst with older children, 42% were classified as Level 0, and 3% at Level 4. In contrast, on post-tests, just 11% of younger participants were classified at Level 0, and 7% at Level 4. Older participants showed a similar improvement, with 6% classified as Level 0, and 15% as Level 4. A large number of participants were also able to generalise what they had learned to the novel fictitious species.
Three months later, whilst the number of participants classified as Level 0 had increased slightly, fewer were classified as Level 0 than at the original pre-test. This suggests that the storybook was highly effective in teaching children about adaptation, and in allowing them to then generalise and retain this understanding in the longer term.

While Kelemen et al.’s (2014) research was conducted in an experimental setting, with predominantly monolingual, Caucasian children, two of the authors replicated these findings with a racially and ethnically diverse group of children of the same age, most of whom spoke a second language, in the naturalistic environment of their after-school program (Emmons et al., 2016). This suggests that the benefits of narrative are applicable to a wide range of learners. However, whilst participants in both studies showed that they were able to generalise knowledge of adaptation to another fictitious species, it is unclear whether they would then be able to apply this knowledge to an existing species, transferring learning in a fictional context into a real-life context: despite showing an understanding of adaptation, participants may categorise this as a fictional process. In addition, this research does not consider in which specific areas the story supported the development of conceptual understanding, or specifically why this might have been: it focuses more on the finding that young children are capable of developing a conceptual understanding of adaptation, rather than exploring what it is about narrative that makes this development of understanding possible.

With an older sample of students in a school in the Netherlands (aged 15- to 17-years-old), Prins et al. (2017) similarly required participants to read a narrative about the theory of evolution. Participants then completed a questionnaire assessing their ability to retell the narrative, identify information from it, recall information (in response to specific questions) and contextualise information. This aimed to assess understanding of the text. Students were also interviewed by the researchers after this session. A week later, participants sat the same questionnaire again, to assess retention of understanding. Over the course of a week, there was a significant decline in participants’ ability to retell the narrative, but no significant decline on the identify, recall or contextualise questions. This finding highlights that the strong ability to retell narratives that was observed in research discussed previously (Wolfe & Woodwyk, 2010; Kucer, 2011) is likely to be a short-term effect, occurring only when retelling takes place immediately after reading texts. The authors suggest that the decline in ability to retell the text may have been down to participants lacking the appropriate scientific vocabulary to retell the facts in the story. However, participants did show a good retention of conceptual understanding on identify, recall and contextualise questions. This highlights that previous research requiring participants to retell texts immediately after reading is not indicative of the conceptual understanding of educational content that participants developed in response to texts.

During interviews following the narrative intervention in Prins et al.’s (2017) study, participants were asked to compare the narrative to the standard textbooks typically used in class. Responses were mixed: some participants preferred the narrative, claiming that it was engaging, that they wanted to find out what happened at the end, and that they could form an image of the narrative in their head, making it easier to remember. However, some preferred textbooks because they felt they were clearer and there was no need to try to separate the fact from the fiction. This highlights that responses to different text types are also influenced by the personal attributes and individual preferences of readers.

The above research demonstrates that narrative supports the understanding of the unfamiliar, abstract concept of evolution. It is argued that narrative might support the understanding of unfamiliar scientific concepts because of its approach to these concepts: whereas science usually focuses on general principles, narrative explanations provide specific, detailed accounts of phenomena (van Dijk & Kattmann, 2009). In taking this approach, narratives may convey information ‘packaged in an easier format to comprehend’ (Dahlstrom, 2014:13618). In this sense, information becomes more accessible and relatable to learners, an argument that will be discussed further in the section below.

5.2.4. Narrative and relatability

The beneficial learning effects observed in response to narrative may be due to narratives allowing readers to relate to the content provided by the text. Considering a more informal use of narrative to support teaching, Frisch & Saunders (2008) conducted a case study observing undergraduate biology lectures, transcribing all anecdotes or narratives used during lectures. They then interviewed the students. More students reported heightened engagement as a result of narratives told, and almost all used the word ‘relate’ when discussing the narratives, whether this was to suggest that narratives helped them to relate to the lecturer or the subject matter more easily.

For similar reasons, narrative as a tool to initiate active exploration of a concept is becoming more popular in the teaching of mathematics (Padula, 2004; Malinsky & McJunkin, 2008; Muir et al., 2017). This approach has been observed to both engage learners and increase understanding of unfamiliar concepts. Russo & Russo (2018) argue for the importance of a ‘narrative-first approach’ in teaching mathematics: narratives can be read to a class before mathematical investigations are conducted by children, which relate to the content of the book. An example is given of a narrative called ‘Fish Out of Water’, about the rapid growth of a small fish. Year 5 and 6 pupils (aged 9- to 11-years) conducted tasks exploring the exponential growth of the fish. Pupil perceptions were gathered: pupils stated that the story engaged them, motivating them to learn more about mathematics, and that the narrative supported them in making sense of exponential growth as a concept. The concept became more relatable during the story because it was embedded in a familiar context, and was therefore easier to understand conceptually. However, as with all research considering the impact of narrative in isolation, this research does not indicate whether narrative might be more effective in developing a learner’s conceptual understanding than expository texts. For instance, an expository text could give an example of small fish rapidly growing to explain exponential growth; it is unclear whether the narrative context supported learning, or whether the analogy made between the growth of the fish and exponential growth supported understanding. However, further research into the use of narrative in the maths classroom has found that a five-month story-based intervention in mathematics increased progress on standardised test scores for kindergarten children (aged 5- to 6-years) to a greater extent than normal curriculum teaching (Jennings et al., 1992). The authors argued that this was because the stories made unfamiliar mathematical concepts more accessible, to the point that participants were also integrating more mathematical vocabulary into the games that they played.

Such studies suggest that narrative has the power to ‘bridge the personal and the unfamiliar’ (Cleto & Warman, 2019:112), in that it provides a context and structure familiar to readers, within which it embeds unfamiliar information, thus making this unfamiliar content more relatable and accessible. Mar & Oatley (2008) suggest that the relatability of narrative is primarily due to its social dimension: narrative provides simulative experiences embedded within a social context that readers can immerse themselves in, relating to these experiences through consideration of their own social experiences. Alternatively, with abstract concepts such as exponential growth, it might be that illustrating this in a physical example, or analogy, is what supports understanding, rather than the narrative context as such. However, in the approach to narrative advocated by Russo & Russo (2018), narrative is used as a tool to demonstrate a concept before initiating active exploration of this concept through investigations. In such cases, it might be that narrative acts as a stimulus to initiate interest, which in turn motivates learning, rather than narrative directly impacting the development of understanding itself. Research exploring the use of narrative as a tool to initiate learning will be discussed in the following section.

5.2.5. Narrative and interest: narrative as a learning stimulus

Interest is thought to support learning, as it increases attention and consequently cognitive activity (Schraw et al., 2001). More specifically, it has been observed that interest is associated with an increase in the activity of dopamine, a neurotransmitter associated with positive feelings and with both the comprehension and memory of information (Klassen & Froese-Klassen, 2014). Interest can be situational (interest which arises in response to a particular stimulus) or individual (a personally developed longer-term interest in a particular topic) (Krapp et al., 1992). If narrative evokes a situational interest, or supports an individual interest, it might therefore be levels of interest that initiate learning in response to narratives, rather than the structure and content of the narrative itself. This section will explore research that has utilised narrative as a tool to initiate interest, or a context within which to set learning.

Within the science classroom, Murmann & Avraamidou (2014) used a story to provide a context for learning with third to sixth grade children in Copenhagen. The story introduced an emperor who became angry when various animals claimed that they had senses that he did not (for instance, a dog having a stronger sense of smell). He removed the animals’ body parts that enabled these senses, and the animals subsequently contacted children at a nearby school to ask them to write letters to the emperor explaining different senses. The next part of the story invited students to the emperor’s laboratory (a senses exhibition at a science centre), where students could collect information to include in their letter. A final part of the story, delivered after the visit to the exhibition, asked students to defend their animal in a court of law. The students were observed to be engaged with the story: they particularly enjoyed the sense of agency that they had in relation to the animal that they chose to defend, interpreting the story from the animal’s perspective and showing an attachment to their chosen animal. Whilst no measure of conceptual understanding was taken, this heightened level of interest is likely to have increased attention on tasks, and as a result, the development of understanding. One teacher observed that the story context enabled children to be focused and engaged in the topic for a longer period than expected; with other teaching methods, children usually lost interest in a topic within a few weeks. However, some issues were noted from the perspective of teachers: whilst some teachers felt that the story helped to give structure to their lesson, tying different elements of it together, others found it difficult to implement the story in relation to their own practice. The authors highlight how personal teaching philosophies have a large impact on how practitioners use stories in their practice, and this in turn influences how learners respond to narratives. Therefore, it may be difficult to identify a particular effect that narrative has in the classroom, as this effect is dependent on how the narrative is utilised.

Whilst the above research integrated learning within a narrative context, other research considers narrative as a stimulus to initiate learning at the beginning of a lesson. Kokkotas et al. (2010) conducted interventions with five classes (in Athens, Greece) of approximately 20 sixth grade students (aged 11- to 12-years). Teachers were trained to implement interventions by researchers. A story, which integrated scientific information, was read to participants. Following the story, participants were required to independently design an experiment, using a set number of materials with which they were provided, to demonstrate the scientific process observed in the story. Sessions were videotaped and the authors presented transcripts from these. The transcripts showed that participants were highly engaged in the task, and as a result, were able to formulate hypotheses, design appropriate investigations, collect data and draw appropriate conclusions. However, as the analysis of these transcripts was rather subjective, it is possible that any difficulties that participants had may have been overlooked. Despite this, additional, quantitative data was presented. Participants answered questions which required them to explain scientific processes: percentages according to the type of answer given showed that a majority of participants were able to give an explanation of the scientific process, based on their scientific model. However, it is questionable whether engagement and learning were initiated by the narrative stimulus, or whether they arose from the practicality of the following task.

Further research suggests that the use of narrative at the beginning of a lesson may serve little purpose other than to evoke interest. Walan (2019) observed six pre-school children (aged 6) from a Swedish school visiting a science centre, where they were told a story about Berta the dragon. The story included chemical experiments, and aimed to enhance understanding of chemistry in relation to real life. Once the story had been read, children conducted their own experiments with support from the storyteller. It was observed that all children were highly engaged and focused, and interviews with teachers suggested that they largely felt positively about the impact of the story. However, although children spoke about what they were doing during experiments, references to the story were rare, suggesting that children were not necessarily using the story to support their understanding. Moreover, whilst children were engaged, this could have been because they were visiting a science centre outside of school, or because of the theatrical aspects of the story, which was introduced by a fire-breathing hand puppet in a room decorated as a cave. While this research illustrates how narratives and hands-on experiments can be used in conjunction, it does little to illustrate to what extent narratives might have supported engagement in experiments, and the subsequent development of conceptual understanding.

Overall, whilst there is an emerging field of research suggesting the possible benefits of narrative as a learning tool, there is currently little research explicitly comparing narrative and expository texts, causing uncertainty over how far positive effects observed can be attributed to narratives specifically. It is unclear as to whether observed effects are directly attributable to narrative, or whether narrative increases interest and engagement, which subsequently impacts learning. Where comparative experiments do exist, texts are often on different topics, and therefore difficult to directly compare. In addition, few of these comparative experiments are conducted within a naturalistic, classroom setting.

While evidence from the research described above is not always in agreement, a number of key points arise from this examination of the literature. Firstly, that prior knowledge is activated to a greater extent in response to expository texts; this might support the construction of situation models as readers integrate prior knowledge with textual information. As a result, readers may be able to recall more to-be-learned educational content in response to expository texts. Secondly, that readers can recall more general information from narratives than expository texts, but that this does not necessarily benefit the development of conceptual understanding in relation to a particular topic, as they recall information relevant to the narrative, rather than educational content. Thirdly, that narrative might support the development of an understanding of topics which require the reader to transcend conceptual constraints; narratives might make unfamiliar, abstract concepts more relatable and accessible. Narrative may also evoke interest, which in turn is a powerful factor in supporting attention, thus enhancing the development of conceptual understanding. Finally, research suggests that narrative might support the retention of conceptual understanding developed in response to texts. However, little research considers the recall of texts a substantial period of time after interaction with texts, and therefore little research accounts for long-term learning in response to different text types.

As highlighted by Prins et al. (2017), some learners prefer expository texts because there is no need to separate the fact from the fiction. This raises one final consideration: when reading a narrative text, readers face the ‘reader’s dilemma’ (Gerrig & Prentice, 1991), in which they have to distinguish factual from fictional information, and crucially, to decide how to then categorise this information in mental representations and pre-existing knowledge structures. For instance, whilst Emmons et al. (2016) found that young participants could apply knowledge of adaptation from one fictional species to another, there was no indication as to whether they could then generalise their understanding to existing species: their knowledge might not have been translated into pre-existing knowledge structures which they could apply to real life, as children might not transfer details that they are uncertain are ‘real’ (Strouse et al., 2018). This highlights the question of whether, in much of the research above, participants are actually learning information from narratives that they can then apply to reality, or whether they are simply acquiring information for the purposes of their participation in the research, which they quickly disregard because they classify information as fictional or unreliable. The reader’s dilemma, and the implications this has for research in this area, will be considered further in the following section.

5.3. The reader’s dilemma

The current research chose to focus on narrative nonfiction because it is an entirely factual type of narrative. It is thus arguable that the use of narrative nonfiction might help to dispel the reader’s dilemma. However, as these texts are still presented in the format and style typical of a narrative, young readers might not judge them as a reliable, entirely factual source, leading them to make decisions about which content is to be categorised as ‘true’. This is a particularly heightened issue when narrative nonfiction texts are similar to historical fiction texts. Therefore, it is important to consider how readers judge the truth-value of texts, and how this might influence how they interpret information presented in texts. This section of the literature review will briefly consider how capable young children are at distinguishing fact from fiction, how textual features might influence judgements of truth-value, and finally, whether information encountered in narrative might be integrated into pre-existing knowledge structures.

Whilst children as young as three-years-old have shown that they are able to distinguish between fantasy and reality on television (Skolnick & Bloom, 2006), in books (Richert & Smith, 2011), and in terms of fantasy figures and real characters (Sharon & Woolley, 2004), this is often when the differences between reality and fantasy are quite pronounced (for instance, judging whether monsters and fairies are fantastical or real). In contrast, the current research requires participants to judge the truth-value of events with no obvious fantastical elements. Elsewhere, research has suggested that the context within which information is presented might influence judgements of truth-value: participants aged 3- to 5-years-old were more likely to claim that a novel entity introduced was real when it was presented in a scientific context, compared to presentation in a fantastical context (Woolley & Van Reet, 2006). If texts in the current research are presented within the context of the classroom, this might heighten participants’ beliefs that they convey reality. However, Woolley & Van Reet (2006) also found that, with 4- to 5-year-olds, being introduced to a novel entity after hearing a scientific story had less influence on reality status judgements than when novel entities were defined with reference to scientific entities. This may have been because the story was only indirectly connected with the novel entity, or perhaps it related to the nature of the story itself. It might be that children do not consider a story as able to give insights into the reality status of objects.

Other research suggests that stories can influence reality judgements, and that children are in fact reality-prone when considering information in stories: when 4-year-olds were presented with realistic and fantastical stories, and asked to choose the best endings for these, they were more likely to choose ordinary events than events which violated laws of reality for both types of story, whereas adults chose the endings appropriate to the type of story (Weisberg et al., 2013). Similarly, Martarelli & Mast (2013) found that whilst 3- to 8-year-olds could quickly classify realistic stimuli as real, they were slower to categorise fantastical stimuli, and often erred towards judging these as real. This suggests that when uncertain, children are more likely to classify unfamiliar entities as real. However, elsewhere, it has been suggested that young children are naïve sceptics when judging the reality status of information (Woolley & Ghossainy, 2013): Shtulman & Carey (2007) found that 4- to 8-year-olds were more inclined to classify improbable events as impossible than adults. Improbable events are those that violate social or physical regularities but that are not impossible, such as finding an alligator under the bed. The authors argue that if children cannot imagine events being possible, they assume that they are not possible.

In terms of judging levels of realism in texts, children often rely on perceptual features (Kelly, 1981; Chandler, 1997); there is a body of literature that explores how pictures and other manipulative features of books influence learning (Strouse et al., 2018). Problematically, such perceptual features are not always available for older readers, who have to rely more on the linguistic clues in a text, as is the case in the current research. In terms of the specificity of language, it has been found that 3- and 4-year-olds are able to extend properties to larger categories in response to generic statements (Cimpian & Markman, 2008). However, this has similarly been found with more specific language in narrative contexts: it has been observed that 3- to 5-year-old children are able to learn about scientific concepts, and apply this learning in real life situations, regardless of whether information was encountered in realistic or fantastical books (Venkadasalam & Ganea, 2018), or in factual books detailing general statements (e.g. ‘the frog’) or narratives with specific statements (e.g. ‘Sammy the frog’) (Ganea et al., 2011). However, this may be dependent on the world presented in the book: by at least 3-years-old, children have been found to be more likely to generalise information presented in a book if the fictional world is similar to reality, rather than having obviously fictionalised features (Walker et al., 2015). Additionally, the ability to generalise specific information might decline with age: with a sample of 3- to 8-year-olds, Koenig et al. (2015) observed that the oldest children in the sample became more likely to judge speakers giving generic information about novel animals as more knowledgeable than speakers giving specific information; the authors argue that generic language suggests that the speakers have access to more observations to support the generic statements being made. However, it must be considered that, when learning about more familiar topics, it is likely that children’s personal experiences and prior knowledge might influence how they judge the reality status of information in narratives (Corriveau et al., 2015).

In terms of integrating information into pre-existing knowledge structures, research suggests that children (aged 3- to 5-years-old) are less likely to allow fantastical content to integrate into their knowledge of real-life than they are realistic content (Richert & Smith, 2011). Although there is a large body of literature suggesting that children can learn from fiction (or sources that children believe to be fictionalised) (Hopkins & Weisberg, 2017), and integrate this knowledge into pre-existing knowledge structures, it is questionable as to whether this can be applied to all types of learning. For instance, there is research to suggest that children can learn and retain novel words presented in fictional texts through repeated exposure (McLeod & McDade, 2011; Wilkinson & Houston-Price, 2013). However, regardless of whether a text is fictional or factual, most of the language which the text is conveyed through will need to be ‘real’ to enable comprehension: there is little reason for a reader to resist integration of a novel word into knowledge structures simply because a text is fictional. This is very different to learning, for instance, about historical topics.

More specifically, Corriveau et al. (2009) considered how 5- to 7-year-olds judged the truth-value of fictional narratives and historical narratives. It was found that children were more likely to judge characters in historical narratives as ‘real’, but characters in fictional narratives as ‘fictional’. When justifying their views, children were more likely to provide justifications about the historical/real nature of the historical narrative (for instance, ‘he fought in the war’), but more likely to draw on the impossibility of events of the fictional narrative (‘there’s no such thing as invisible sails’). However, in the absence of evidently fictional information in the texts used in the current research, children are less likely to consider events impossible, unless, as suggested by Shtulman & Carey (2007), they believe that not being able to imagine an event means that it is not possible.

In terms of history learning specifically, Butler et al. (2009) explored whether popular, historical films might influence undergraduate students’ recall of a topic. Nine popular films were selected, each one containing at least one major historical inaccuracy. The authors wrote a text to accompany each film: the same accurate information was included as that in the film, but the inaccuracy was corrected. Students were assigned to one of three ‘warning’ categories: they were either given no warning about possible inaccuracies, a general warning, or a specific warning. Participants were instructed to learn the material in the texts, and told that the film clips were simply intended to illustrate this material. Within each of the three warning conditions, participants either read the text and then viewed the clip, viewed the clip and then read the text, or read the text only. A week after materials were presented, participants took a cued-recall test, in which they were firstly required to answer questions about the text only, and then questions about the film clips only. For questions about information that was inconsistent across the texts and films, participants who were given no warning or a general warning often recalled misinformation from the film. However, when given a specific warning, participants often recalled correct information from the text. This highlights that, when presented with historical information, participants integrated information into prior knowledge structures even when aware that there may be inaccuracies; it was only when given specific guidance on these inaccuracies that participants adapted what was integrated into pre-existing knowledge structures. This suggests that learners might require guidance in judging the truth-value of different elements of texts.

Overall, although it appears that children are able to distinguish between reality and fiction competently from a young age, a portion of the research discussed considers slightly younger children than those participating in the current research, and their ability to judge more obviously realistic or fantastical entities. Therefore, it is uncertain as to how far the findings discussed above apply to older children who are dealing with a topic that is relatively unfamiliar to them. However, the current research aims to provide some insight into how realistic participants judge information presented in both narrative nonfiction and expository texts to be, and whether they integrate this into pre-existing knowledge structures to constitute longer-term learning.

Chapter Six: Methodology and research methods

6.1. Overview of research methods

For this research, a quasi-experimental design was implemented. Participants were placed into one of two experimental conditions: experimental condition A (the narrative nonfiction (NNF) condition) or experimental condition B (the expository text (ET) condition). Participants in each condition sat a written pre-assessment on World War One (WWI), before completing three intervention sessions with the researcher, during which they learned about WWI. Each intervention focused on a different theme related to WWI. During interventions, participants first completed short activities in small groups of three to four; these activities aimed to activate relevant prior knowledge to support understanding of the texts to be read. Participants were then read either the NNF or ET by the researcher, before discussing six set questions about the text in their small groups. These discussions were audio-recorded for later transcription, coding and analysis; they were intended to shed light on the first, third and fourth research questions (detailed below). All participants sat a written post-assessment one week after the final intervention, and a written delayed post-assessment six weeks later. These assessments were intended to indicate the development and retention of conceptual understanding, addressing the first and second research questions.

Research questions:

1. How does narrative nonfiction affect the development of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
2. How does narrative nonfiction affect the longer-term retention of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
3. How might mental representations diverge across narrative and expository texts for primary-aged readers?
4. How do primary-aged readers perceive and judge the truth-value of narrative nonfiction and expository texts, and what implications might this have for the development and retention of conceptual understanding?

6.2. Choice of methodology

Although this research is primarily interested in the influence of narrative on learning, both narrative and expository texts were considered and compared, as this allowed for an exploration of whether narrative offers any additional benefits over expository texts. This element of comparison led to the choice of a quasi-experimental design as the method of data collection. As the research sought to explore the influence of a single, pre-determined factor (text type) on learning, an experimental design was appropriate: this design allowed the isolation, to the greatest degree possible, of the independent variable of text type (Creswell, 2014). Limiting possible external factors in order to isolate an independent variable allows for a more ‘convincing claim to be testing for cause and effect’ (Gorard, 2001:133). As such, this approach meant that any learning effects observed could be more reliably attributed to the texts themselves; if only one text type were considered, learning effects might be considered the result of maturation, rather than the result of interacting with a particular text type (Cook & Campbell, 1979). Whilst it is recognised that complete control of external factors is not possible, this does not mean that it cannot be striven for, to minimise the potential impact of these factors.

The data collected was largely quantitative in nature: scores were obtained from written assessments, and codes obtained from transcripts were analysed primarily in terms of their frequency. This approach towards data collection was chosen for a number of reasons. Firstly, Gorard describes an ‘epistemological crisis of confidence’ (2001:7) in educational research in the UK: much research is small-scale and interpretative in nature, which Gorard argues leads to less confident conclusions, and consequently a lack of interest in findings from practitioners. It is arguable that more subjective approaches cannot strongly justify a statement or hypothesis; rather, some degree of objectivity is required to ensure that knowledge gained from research is justifiable (Popper, 2002). By utilising a larger scale of more objective, quantitative data, the chances of the subjective biases of the researcher having an impact on the interpretation of data are reduced. As such, this research aims to provide more confident insights into the effects of text types on learning, which might be of practical interest to practitioners. Secondly, this research intends to provide findings that are more generalisable to a wider population: within UK class sizes of approximately 30 children, identifying teaching approaches that are generally effective in supporting learning is essential, as each child cannot always be individually taught in a style suited to them. The use of quantitative data required a larger participant sample, which in turn increased generalisability. Finally, in the current educational climate, learning is often operationalised through assessments in order to measure the progress and attainment of learners. Such assessments are even often considered to provide a reflection of the quality of educational practitioners themselves. Therefore, this research aims to reflect the social reality of the education system, producing findings that are relevant to practitioners.

Alongside using assessments to gain insights into the development of conceptual understanding, discussions between participants after interacting with texts were audio-recorded. Whilst the data collected from discussions was quantitatively analysed, it was intended to go some way towards capturing learning as a complex, interactive process, rather than an outcome: the codes were designed to explore how knowledge was constructed in the social space between the text, a participant, and their peers.

6.3. Data collection

6.3.1. Participants

The sampling frame comprised five Year 5 classes, recruited from two suburban primary schools in Essex. Year 5 was targeted specifically for this research because the Key Stage Two (KS2) National Curriculum for history details a broader range of skills more appropriate to the research aims than the Key Stage One (KS1) curriculum. Upper KS2 pupils (Years 5 and 6) were thought to be more appropriate participants for this research because they would have experienced more teaching in relation to these skills, compared to lower KS2 pupils (Years 3 and 4). Year 6 was excluded because of the additional pressure on pupils sitting their Statutory Assessment Tests (SATS) in the spring term, when data collection was scheduled to take place. There were no exclusionary criteria; the sampling frame comprised the whole of Year 5 in an attempt to gain a more representative picture of the population being studied. Of this potential sample, 78 children returned completed consent forms. Prior to the recruitment of participants, G*Power was used to calculate the necessary sample size to ensure adequate power for this experiment (see Appendix A). A total sample of 44 participants was required, and therefore, a total of 78 participants overall was sufficient.

Participants included 36 males and 42 females, ranging between 9.8- and 10.7-years-old (M = 10.2 years, SD = 0.29). All participants spoke English as a native language, and most participants (95%) were of White British ethnicity. Six participants were on the Special Educational Needs (SEN) register and a further four were being monitored for SEN (see Appendix B for the specific SEN of these participants). Three further participants were noted by class teachers as being highly disruptive; specific strategies were in place to support these participants. Participants were of a wide range of abilities in both reading and history, working at anywhere between the expected level at the beginning of Year 2 (6- to 7-years-old) and the expected level at the end of Year 5 (9- to 10-years-old). For further information on the spread of reading and history levels within the participant sample, see Appendix C.

6.3.2. Setting

Of the two participating primary schools, one was a three-form-entry primary school, and the other a two-form-entry primary school. The first school (School A) was a community school, rated as ‘Good’ by Ofsted, whilst the second school (School B) was a mainstream academy converter[6], rated as ‘Outstanding’ by Ofsted. School A had a considerably higher proportion of pupil premium (PP) children and children eligible for free school meals (FSM) than School B (see table 3). Both schools were located in close proximity, within five miles of each other.

Table 3: Percentage of pupils eligible for Pupil Premium and receiving free school meals in participating schools[7]

| School | Percentage of Year 5 children eligible for PP | Percentage of Year 5 children receiving FSM | Percentage of pupils in school eligible for FSM at any time during the past 6 years (National Average = 24.9%) |
|---|---|---|---|
| School A | 55% | 30% | 42.7% |
| School B | 9.8% | 6.6% | 10% |

[6] Mainstream academy converters are successful schools that have chosen to convert to academies in order to operate independently of local authorities, and thus with more autonomy to be innovative in their approaches to learning.

[7] These percentages represent the whole Year 5 cohort in each school (the sampling frame), not the specific participant sample, as information about PP and FSM for specific participants was unobtainable from School B.

6.3.3. Grouping participants into two experimental conditions

The participant sample from School A was drawn from three classes (Classes 1, 2 and 3), and in School B from two classes (Classes 4 and 5). To avoid any differences between the two schools potentially influencing either experimental condition, each condition comprised participants from both schools. Full classes were assigned to conditions for practical reasons: whilst the researcher completed an intervention session with one class, the other class(es) completed their usual lessons with the class teacher. This method of placing participants into conditions meant that matching participants across conditions according to particular factors, such as reading ability, was not possible. As a result, additional factors that might have influenced the performance of conditions were considered during data analysis.

As the response rate varied widely across the classes (see table 4), grouping participants into two equal conditions was a difficult process. To begin, Class 4 (from School B) were randomly selected from all five classes (using the randomise function in Excel) to be assigned to the NNF condition (24 children), meaning that Class 5 from School B were placed in the ET condition (14 children). To create conditions approximately equal in terms of number of participants, Classes 1 and 2 from School A were combined (27 children), and placed in the ET condition, whilst Class 3 from School A was placed in the NNF condition (13 children). A total of 41 children were placed in the ET condition, and 37 children in the NNF condition (see table 5). Although the number of male and female participants varied slightly across conditions, combining the classes in this way meant that the number of males and females in each condition was as similar as possible.

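Table 4: Number of participants from each school

| School | Class | Number of participants |
|---|---|---|
| School A | Class 1 | 10 |
| School A | Class 2 | 17 |
| School A | Class 3 | 13 |
| School A total | | 40 |
| School B | Class 4 | 24 |
| School B | Class 5 | 14 |
| School B total | | 38 |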

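Table 5: Grouping of participants into experimental conditions

| School | Class | Condition A (NNF): Male | Condition A (NNF): Female | Condition B (ET): Male | Condition B (ET): Female | Total NNF | Total ET |
|---|---|---|---|---|---|---|---|
| School A | Class 1 | - | - | 3 | 7 | | |
| School A | Class 2 | - | - | 5 | 12 | | |
| School A | Class 3 | 4 | 9 | - | - | | |
| School A total | | | | | | 13 | 27 |
| School B | Class 4 | 17 | 7 | - | - | | |
| School B | Class 5 | - | - | 7 | 7 | | |
| School B total | | | | | | 24 | 14 |
| Gender totals | | 21 | 16 | 15 | 26 | | |
| Overall totals | | | | | | 37 | 41 |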

As the experiment took place over the course of eleven weeks in total, participant attrition was an issue. Although no participants withdrew from the research, a number of participants recorded absences during the course of the experiment. Overall, six participants were absent from at least one of the three intervention sessions, and a further three participants were absent from at least one of the three assessments. This left a sample of 69 participants who completed the entire experiment. As six of these participants were from the ET condition and three from the NNF condition, the conditions remained approximately equal in number (NNF = 34; ET = 35).
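The condition sizes and the final sample reported above can be checked with a short tally. The following is a minimal sketch rather than part of the original analysis: the counts are transcribed from Tables 4 and 5 and from the attrition figures in this section, while the names `CLASSES`, `ABSENT`, `condition_totals` and `final_sample` are hypothetical and introduced purely for illustration.

```python
# Class-level counts transcribed from Tables 4 and 5.
# Each entry: (school, class number, condition, males, females).
CLASSES = [
    ("A", 1, "ET", 3, 7),
    ("A", 2, "ET", 5, 12),
    ("A", 3, "NNF", 4, 9),
    ("B", 4, "NNF", 17, 7),
    ("B", 5, "ET", 7, 7),
]

# Participants excluded because of absence, per condition (section 6.3.3).
ABSENT = {"NNF": 3, "ET": 6}


def condition_totals(classes):
    """Return {condition: (males, females)} summed over classes."""
    totals = {}
    for _school, _class_no, condition, males, females in classes:
        m, f = totals.get(condition, (0, 0))
        totals[condition] = (m + males, f + females)
    return totals


def final_sample(classes, absent):
    """Return {condition: participants who completed every session}."""
    totals = condition_totals(classes)
    return {cond: sum(totals[cond]) - absent[cond] for cond in totals}


if __name__ == "__main__":
    print(condition_totals(CLASSES))
    print(final_sample(CLASSES, ABSENT))
```

Run as a script, the tally reproduces the gender totals in Table 5 and the completed-sample figures (NNF = 34; ET = 35) given in the text.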

Following this, participants were assigned to smaller groups of three or four within their conditions, within which they would complete prior knowledge activities and discussions. Groups of three were chosen because they were thought to be the ideal group size to encourage participation: placing children in pairs might have limited discussion, whereas larger groups carry the danger of some children being suppressed or avoiding participation, and of discussions becoming more difficult for the researcher to follow during transcription. The pilot study confirmed that groups of three worked well. Therefore, a majority of groups were threes, but occasionally groups of four were necessary due to numbers. These groups were mixed ability. Mixed ability grouping is thought to be more supportive of learning than ability grouping (Sapon-Shevin, 1992; Loreman et al., 2005): higher level learners can scaffold and support lower level learners (Vygotsky, 1978), whilst higher level learners can consolidate their learning by articulating, reasoning and justifying their learning as they support others (Ashbridge & Josephidou, 2018). This was especially important in the absence of adult guidance during discussions. To assign participants to mixed ability groups, three larger, approximately equally sized groups were initially created: ‘working towards’ learners, ‘expected’ learners and ‘greater depth’ learners[8]. These were created using participants’ reading and history levels as provided by schools. Once these three groups were constructed, one participant was randomly selected from each of the three groups to create a smaller group of three. This was repeated until all participants were assigned to a group. In some instances, class teachers requested that particular children were not grouped together due to behavioural or friendship problems. If such children were placed in the same group, the group was randomly recreated.
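The grouping procedure described above can be sketched in code. This is an illustrative approximation only, not the procedure as actually carried out: the function name `make_mixed_ability_groups` and its parameters are hypothetical, the way leftover participants are placed into groups of four is an assumption (the text does not specify it), and for simplicity the sketch redraws the whole allocation when a keep-apart request is violated, whereas the text describes recreating only the affected group.

```python
import random


def make_mixed_ability_groups(strata, keep_apart=frozenset(), seed=None,
                              max_tries=1000):
    """Form mixed-ability groups by drawing one member from each stratum.

    strata: dict mapping an attainment level to a list of participant ids.
    keep_apart: collection of frozenset pairs that must not share a group.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        # Shuffle each stratum independently (random selection within level).
        pools = [rng.sample(members, len(members))
                 for members in strata.values()]
        n_groups = min(len(pool) for pool in pools)
        if n_groups == 0:
            raise ValueError("every stratum needs at least one participant")
        # One participant from each stratum per group.
        groups = [[pool[i] for pool in pools] for i in range(n_groups)]
        # Assumption: leftover participants join existing groups in turn,
        # producing occasional groups of four.
        leftovers = [p for pool in pools for p in pool[n_groups:]]
        for i, person in enumerate(leftovers):
            groups[i % n_groups].append(person)
        # Redraw if any pair flagged by teachers ended up together.
        if not any(pair <= set(group)
                   for group in groups for pair in keep_apart):
            return groups
    raise RuntimeError("no valid allocation found within max_tries")


if __name__ == "__main__":
    example = {
        "working towards": ["a1", "a2", "a3"],
        "expected": ["b1", "b2", "b3", "b4"],
        "greater depth": ["c1", "c2", "c3"],
    }
    print(make_mixed_ability_groups(example, {frozenset({"a1", "b1"})}, seed=1))
```

The participant identifiers in the usage example are invented placeholders; in practice the three strata would be built from the reading and history levels supplied by schools.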

6.4. Materials

6.4.1. Questionnaires

Questionnaires were constructed for both parents and participants in order to gain additional data on participants. This data was intended to provide a general picture of the participant sample, and also to collect information on different factors that might have influenced participants’ responses to interventions. These factors were then considered during data analysis. Parent and participant questionnaires will be discussed in more detail below.

6.4.1.1. Parent questionnaires

Within the consent package sent home to parents/guardians of participants, there was a short questionnaire for parents/guardians to complete (Appendix D(iv)). This questionnaire sought to gain additional data on participants, such as gender, age, ethnicity, native languages, learning difficulties and how many hours participants spent reading at home in an average week.

6.4.1.2. Participant questionnaires

Short questionnaires for participants were attached to pre- and post-assessments (Appendix E). The pre-questionnaire (Appendix E(i)) asked participants how many days they usually spent reading at home in a week. Although a similar question about reading regularity was asked of parents, both participant and parent perspectives were thought to be important. The question asked participants to state the number of days that they read at home, rather than average number of hours, as this was thought to be more appropriate for children. The questionnaire also asked participants whether they had a preference for fiction or nonfiction[9], and to select their three favourite hobbies from a list of seven hobbies. The list of hobbies was refined during discussions with Year 5 children during the pilot study. This question was primarily interested in whether reading was a hobby; however, participants were asked to choose from a list of various hobbies, firstly, to gain a genuine, reliable response from children (because they were not immediately thinking about reading, but their hobbies more generally), and secondly, because other hobbies might have been of interest during data analysis.

[8] These terms are used to assess children within schools: children are either ‘working towards’ the expected standard, working at the ‘expected’ standard, or working at a ‘greater depth’ than the expected standard. See Appendix C for more detail on reading and history levels.

Some questions were asked on both pre- and post-questionnaires so that responses could be compared over time, to assess whether intervention sessions influenced responses. Firstly, two questions assessed interest: as discussed in Chapter five, there is evidence suggesting that narrative evokes interest, and that interest is an important motivational factor in learning (Klassen & Froese-Klassen, 2014). To assess individual interest, participants were firstly asked to rate how much they enjoyed learning about history (using a child-friendly Likert scale from one to five) and also whether WWI was a topic which interested them (giving a yes/no response). Finally, to explore levels and sources of prior knowledge, and how participants felt their knowledge developed in response to interventions, participants were asked to self-evaluate their knowledge of WWI (on a scale of one to three), and to explain where they had acquired any knowledge from.

Alongside the questions described above, the post-questionnaire (Appendix E(ii)) asked participants how much they enjoyed intervention sessions. This question aimed to assess participants’ situational interest in response to interventions. Finally, participants were asked to describe the type of text that was read during interventions. This question explored whether participants recognised the text types read. It was phrased as an open-ended question, to avoid guiding participants towards categorising texts in any particular way. Unfortunately, participants did not respond to this question as intended: rather than attempting to classify the text type, they described their opinion of the text, for example, ‘interesting’ or ‘enjoyable’. Consequently, this question was not explored during data analysis.

[9] Although this research uses the terms ‘narrative’ and ‘expository’ texts, schools more commonly refer to texts as ‘fiction’ and ‘nonfiction’: these terms were used as they were more likely to be familiar to participants.

6.4.2. Assessments (pre-, post- and delayed post-assessments)

The pre-, post- and delayed post-assessments were identical. A copy of the assessment given to participants can be found in Appendix F. It consisted of 20 questions: six questions about each of the three intervention themes (War Begins, Trench Life and The Home Front) and two questions which spanned learning across all three interventions (see section 6.4.4 for further detail about these themes).

Assessments aimed to evaluate participants’ conceptual understanding of WWI (see figure 4, below) and therefore assessed both historical knowledge and historical thinking skills. Although questions were designed to elicit both substantive and second-order knowledge, the inclusion of effective second-order questions proved to be difficult. During the pilot study, participants often gave substantive answers to questions designed to elicit second-order responses. For instance, one question asked what effect factory jobs had on the women doing them. This question was intended to elicit second-order knowledge, as participants could discuss how they felt jobs affected women, for instance, in terms of morale. However, all pilot participants gave substantive responses to this question, in that they drew information directly from the text, such as ‘TNT turned their skin yellow’. Because this information was retrieved from the text, participants were not drawing on their own interpretation of historical information (their second-order knowledge). Therefore, it was decided that assessment questions would focus on substantive knowledge only; second-order knowledge was instead considered during discussion questions.

Each of the 20 substantive assessment questions was additionally categorised according to which historical thinking skill it addressed¹⁰. In Chapter three, a model of a conceptual understanding of history was outlined. However, for the practical purposes of assessing conceptual understanding in this research, this model was adapted slightly (see figure 4, with additions to the model shown in red). When categorising questions according to historical thinking skill, this left a number of questions – questions where participants did not draw on chronological or causal thinking skills specifically – labelled as substantive knowledge only. Therefore, an additional historical thinking skill was added: conceptual thinking. This thinking skill was assigned when participants were conceptualising the past: their thinking allowed them to conceive of a previous world.

10 Note that the final thinking skill – historical enquiry/interpretation – is not considered here; this thinking skill is explored separately in relation to truth-value judgements, assessed through audio-recorded discussions.

Figure 4: Adapted model of a conceptual understanding of history

[Figure 4: tree diagram. ‘Conceptual understanding of history’ divides into two branches:
Historical knowledge: substantive knowledge; second-order knowledge.
Historical thinking skills: conceptual thinking; chronological thinking; causal thinking; historical enquiry/interpretation.]

In light of this new model, all questions were categorised as requiring either conceptual, chronological or causal thinking skills. Overall there were three chronological thinking questions and six causal thinking questions. There were fewer chronological thinking questions because opportunities to assess this skill were more limited within the relatively short time frame of WWI. The remaining eleven questions assessed conceptual thinking skills, but were assigned one of two sub-categories: simple conceptual thinking or complex conceptual thinking. As there was a wide range of conceptual thinking questions, it did not seem appropriate to group them into one category. There were six simple conceptual questions, which required participants to either recall simple, single units of information, or to match information up. There were five complex conceptual questions: these were more open-ended questions, requiring participants to recall and cohesively present multiple units of historical information. An annotated assessment, showing which category each question was assigned to, can be found in Appendix G.

Assessments were scored out of 56 points. A pre-determined mark scheme was used to score assessments; this mark scheme was designed prior to the pilot study, and adapted according to the outcomes of the pilot (see section 6.7 for more detail). The mark scheme can be found in Appendix H. For questions in which participants were required to either match or order information, participants were not assigned a point per correct answer: such questions could be guessed, and this method of marking reduced the chance of too many points being assigned for accurately guessed answers. For complex conceptual questions, participants were given points for additional details included (for instance, one point for stating that the Black Hand were a group of assassins, with an additional point for stating that they were Serbian). Whilst this meant that some participants might not gain points because they chose not to include additional information, rather than because they were unaware of it, this was done to give credit when more complex answers were given.

6.4.3. Prior knowledge activities

As discussed in Chapter two, research shows that a reader’s prior knowledge is important in enabling effective text comprehension and initiating the construction of situation models, and that young readers benefit from prior knowledge being activated before reading (see section 2.1.4 for more detail). As neither School A nor School B had formally taught WWI to the Year 5 cohort before, levels of prior knowledge of the topic were expected to be relatively low. This could consequently create a barrier to comprehension. To maximise comprehension of the texts, and to allow for the potential construction of situation models, participants completed a series of short prior knowledge activities directly before interacting with texts, to equip them with, and activate, relevant prior knowledge.

Table 6 shows an overview of prior knowledge activities. These activities were designed to avoid teaching any information that could assist participants in completing assessments or answering discussion questions, but rather to equip them with adequate prior knowledge to access texts. For instance, in the first intervention (War Begins), one prior knowledge activity involved labelling relevant countries on a map of Europe in 1914 (for instance, the Austro-Hungarian Empire). This gave an opportunity for participants to become familiar with the term Austro-Hungarian Empire (which very few participants had encountered before), and the area of Europe that this referred to. It also gave participants an idea of where Germany was in relation to Belgium and France, which would support an understanding of information in the text about Germany invading Belgium on the way to fight in France. Participants completed these activities in their small groups of three to four, before the answers to activities were discussed as a whole class. At this point, the researcher followed a script to ensure that discussions about activities were as similar as possible across conditions. For each intervention session, activities took approximately fifteen to twenty minutes to complete. All prior knowledge activities can be found in Appendix I, and an example script (used in the War Begins interventions) in Appendix J.

Table 6: Prior knowledge activities for each intervention

Intervention 1 (War Begins)
  Prior knowledge activity 1
    Target prior knowledge: Knowledge of names/location of countries in Europe in 1914.
    Key vocabulary: France, Belgium, Germany, Britain, Austro-Hungarian Empire, Russia & Serbia.
    Task to initiate prior knowledge: Participants label countries involved in the outbreak of WWI on a map of Europe in 1914.
  Prior knowledge activity 2
    Target prior knowledge: Knowledge of the terms allies, alliance and Central Powers.
    Key vocabulary: Central Powers, Allies, alliance.
    Task to initiate prior knowledge: Participants notice that the colours of country names placed on map represent two sides. Discuss meaning of ‘allies’ and ‘alliance’.
  Prior knowledge activity 3
    Target prior knowledge: Familiarity with names/roles of key historical figures.
    Key vocabulary: Kaiser Wilhelm II, Archduke Franz Ferdinand, Herbert Asquith, Gavrilo Princip.
    Task to initiate prior knowledge: Participants complete country details on four cards containing details about key figures involved in the build-up to WWI.

Intervention 2 (Trench Life)
  Prior knowledge activity 1
    Target prior knowledge: Knowledge of vocabulary relevant to trenches/layout of trenches.
    Key vocabulary: Sentry, sandbags, dugout, muddy water, firing step, barbed wire.
    Task to initiate prior knowledge: Participants place labels on correct parts of a trench illustration.
  Prior knowledge activity 2
    Target prior knowledge: Knowledge of different types of trenches and their layout.
    Key vocabulary: Front Line, support, reserve, communication, no-man’s land.
    Task to initiate prior knowledge: Participants label the different types of trenches on a bird’s eye view illustration.
  Prior knowledge activity 3
    Target prior knowledge: Understanding of artillery gun/bombardment and the word ‘casualties’.
    Key vocabulary: Artillery, casualties, bombardment.
    Task to initiate prior knowledge: Participants discuss photo of an artillery gun.

Intervention 3 (The Home Front)
  Prior knowledge activity 1
    Target prior knowledge: Knowledge of the Home and Western fronts.
    Key vocabulary: Home Front, Western Front.
    Task to initiate prior knowledge: 1a. Participants sort 8 war photos into ‘Home Front’ and ‘Western Front’ categories. 1b. Participants shown map of Europe with Home Front and Western Front labelled.
  Prior knowledge activity 2
    Target prior knowledge: Knowledge of munitions factories.
    Key vocabulary: Munitions.
    Task to initiate prior knowledge: Participants look at the four Home Front pictures and discuss where they think they were taken (schools and munitions factories).
  Prior knowledge activity 3
    None.

6.4.4. Reading materials

Equivalent NNF and ETs were required for each of the three interventions, leading to the creation of three NNF and three ETs in total. This section will describe the process of writing these texts. Firstly, it will discuss how the topic of the texts and the individual themes of the intervention sessions were decided. Following this, it will describe the research conducted to write the initial narrative drafts of texts. Next, it will consider how ETs were constructed from the initial narrative drafts, focusing on the distinguishing features between these text types. Finally, it will consider how the initial NNF and ET drafts were continually redrafted and edited to ensure comparability.

6.4.4.1. Choice of WWI topic and three individual intervention themes

The KS2 National Curriculum for history states specific topics to be covered, and also broader areas within which schools can select their own topics (DfE, 2013). From looking at a selection of primary schools’ Year 5/6 curriculum maps¹¹, it became clear that particular historical topics were favoured by schools. To choose the topic to be taught during interventions, a list of topics commonly taught in upper KS2 was compiled. From this list, topics that either of the participating schools had covered before or were planning to cover with the Year 5 cohort being sampled were removed. This ensured that participants from neither school should have a greater prior knowledge of the topic to be taught, and that the research would not interfere with any planned learning. Once these topics were removed, WWI was the final remaining topic.

It was decided that different themes within this topic should be covered in each of the three intervention sessions (see table 7 for an overview of these themes). This variation of themes within the encompassing topic allowed for an exploration of whether learning about particular themes might be affected differently by text types. The first theme focused on key events: the key events chosen were those which occurred in the build-up to Britain declaring war on Germany. This theme was selected because it drew largely on causal and chronological thinking skills. The second theme considered unfamiliar experiences. This theme was chosen to explore participants’ ability to understand more distant concepts, unfamiliar to their everyday lives; it focused on the experiences of soldiers in the trenches and trench warfare. The final theme considered more familiar experiences, comparable to experiences in the participants’ own lives, by looking at life on the Home Front. This included learning about the experiences of women and children during the war. This theme, in contrast to the second theme, allowed an exploration into how text types might influence understanding of more accessible, familiar experiences.

11 Curriculum maps are created by schools to outline what is being taught in each year group, and how this relates to the curriculum. These were downloaded from the websites of various schools across Essex.

Table 7: Themes within WWI topic

Theme | Instance of theme | Titles of texts
Key event(s) | Events that occurred in the build-up to WWI | War Begins
Unfamiliar experiences | Life in the trenches and the Battle of the Somme | Trench Life
Familiar experiences | Life on the Home Front, including the experiences of women and children | The Home Front

6.4.4.2. Information sources for writing texts

Research conducted to inform the content of texts was carried out in different ways for the different themes. For all interventions, the NNF texts were written first. For the War Begins texts, a wealth of relevant information was collected from children’s history books and websites. Information acquired was used to create an historically accurate narrative. It was not possible to follow a main protagonist throughout this narrative, as no single person personally witnessed and documented all of the events described; to maintain historical accuracy and factuality in these NNF texts, such a protagonist could not be fictionalised. Instead, the text referred to key historical figures in the build-up to WWI (Archduke Franz Ferdinand, Gavrilo Princip, Kaiser Wilhelm II and Herbert Asquith). Conversely, the Trench Life and Home Front texts both focused on one or two key protagonists, and therefore, whilst general information on these themes was derived from children’s history books and websites, additional information was sought. For the Trench Life intervention, I read the published diary of a soldier named Harry Stinton (Stinton, 2014). This diary was written by Harry during WWI, and slightly adapted (to make it more easily readable) for publication. Information from this diary was used to write the Trench Life NNF text, which followed Harry Stinton’s journey to, and experiences in, the trenches. Additional information about the trenches (for example, on trench design) was woven into the narrative. Similarly, The Home Front text drew on two additional sources of information. Firstly, information was drawn from the audio-recorded interviews of Ethel May Dean, a woman who worked in a munitions factory during WWI (Imperial War Museum, n.d.). Secondly, the letters of a young boy, George Butling, written to his father in France during the war (Butling, 1916-1918) were read. These letters were accessed through the Imperial War Museum library collection. As with the Trench Life NNF text, the information provided by these individuals was used to create historically accurate narratives, following key protagonists. The six texts created can be found in Appendix K.

6.4.4.3. Key differences between narrative nonfiction and expository texts

Once the first drafts of the NNF texts were complete, they were adapted to create three equivalent ETs. At this stage of the writing process, key differences between text types were carefully considered. Important features of narratives are that they are particular and context-specific (Bruner, 1991; Boström, 2008), with some form of landscape of consciousness (Bruner, 1986), and that they are typically read for enjoyment; expository texts typically contain more generic language (Gelman et al., 2013), do not focus on a landscape of consciousness, and are typically read to learn. Therefore, key differences were primarily in the style of writing (specific vs. generic), the presence of protagonists, and the use of linguistic devices to engage the reader. NNF texts followed the experiences of key protagonists throughout (for example, ‘Harry Stinton stood…’): experiences were particular and contextualised, embedded in a specific time and setting. Conversely, ETs considered the more generalised experiences of a population (for example, ‘Soldiers stood…’): information was generic and decontextualised¹². In terms of linguistic devices, NNF texts contained emotive word choices, vivid descriptions and features such as ellipsis to build suspense. In contrast, language in ETs was more succinct and informative. For example, appropriate adjectives were only used to provide relevant information. One final, notable difference was the inclusion of sub-headings in ETs: these were included to enhance the sense that information was grouped thematically, rather than sequenced chronologically. Table 8 lists specific key differences between text types, while Figure 5 illustrates these across a section of the NNF and ETs written on Trench Life.

12 This is with the exception of the War Begins texts, where both NNF and ETs followed the actions of four key historical figures, for reasons discussed in section 6.4.4.2 above.

Table 8: Different features and controlled features across the two text types

Figures/protagonists
  NNF:
    • Key protagonist(s) in a specific context
    • Narrated in third person, but giving the presence of a personal, identifiable ‘voice’. E.g. Harry had to peek through…
  ET:
    • No protagonists – generic information (Gelman et al., 2013)
    • Written in ‘omniscient third-person’ (Wineburg, 2001:13). E.g. The soldiers had to peer through…
  Both text types:
    • Reference to known historical figures where relevant e.g. Herbert Asquith

Information
  Both text types:
    • Historical information that can be verified from a trusted source
  Features controlled:
    • Historical content
    • Information order
    • Text length

Linguistic devices
  NNF:
    • Descriptive language used to set the scene. E.g. Austro-Hungarian flags hung from windows and balconies, fluttering gently in the light summer breeze.
    • Varying sentence structures. E.g. Sharing final cigarettes and cracking jokes, the soldiers’ mouths smiled….
    • Literary devices to create plot and engage readers (figurative language, ellipsis).
  ET:
    • Description used to provide information. E.g. People living in the city had hung Austro-Hungarian flags out of the windows and balconies of their houses.
    • More standard sentence structures. E.g. The soldiers were said to have shared final cigarettes and cracked jokes.
    • Factual descriptions used.

Language
  NNF:
    • Less formal cohesive devices. E.g. Not only this, That very same day…
    • Emotive language choices. E.g. peering
  ET:
    • More formal cohesive devices. E.g. In addition to this, On the same day…
    • Standard language choices. E.g. watching
  Both text types:
    • Specific, technical historical vocabulary. E.g. artillery bombardment, munitions
  Features controlled:
    • Frequency of historical vocabulary
    • Text complexity

Organisational devices
  NNF:
    • Written in chronologically cohesive prose
  ET:
    • Sub-headings used to group information thematically in places

Figure 5: Illustration of differences and similarities across equivalent narrative nonfiction and expository texts

6.4.4.4. Features controlled for across narrative nonfiction and expository texts

Once roughly equivalent NNF and ETs had been created and differentiated according to the features listed in table 8, the pairs of texts were continually altered to ensure that they were highly similar in terms of particular elements. Keeping texts similar enough to be comparable whilst making them different enough to be distinguishable as different text types was a difficult process. During this process, decisions had to be made about which elements of texts should remain consistent across both text types, in order to avoid introducing additional factors that might influence interactions with texts.

In the literature review, it was noted that the content and structure of ETs can make them more difficult for younger readers to comprehend than narratives: these were kept similar across the two conditions, to prevent these factors from negatively affecting learning in the ET condition. Content was controlled by ensuring that pairs of texts contained the same historical information. In addition to this, any history-specific vocabulary used appeared with approximately the same frequency across pairs of texts. Both texts were also structured similarly, with events described in the same chronological sequence. This was additionally important because the order of presentation of information can affect how learners acquire information from a text: initial information encountered can influence the interpretation of later information and its subsequent incorporation into pre-existing knowledge structures (Rumelhart & Norman, 1978; Dennis & Ahn, 2001), and later items might receive less attention (Yates & Curley, 1986). However, as discussed above, sub-headings were used in ETs, to group information thematically within this chronological sequence.

Texts were also assessed using the Flesch readability formulae (Flesch, 2007), to ensure that they were similar in terms of complexity and reading difficulty. This ensured that the difficulty of text types was not a factor that might influence learning across conditions. The Flesch scores of all texts can be found in table 9 below. The grade level shows how many years of education are required for a child to comprehend a text, whilst the reading ease level indicates the simplicity of a text on a scale of 0-100, with 0 being very complex and 100 being very simple. Although the grade levels were above the expected grade level for the participant sample, both teachers and participants in the pilot study felt that texts were of a suitable level. In addition, the literature casts doubt on the accuracy of reading formulae in determining the ease with which a reader can understand particular texts, mainly because reading formulae fail to account for additional factors such as prior knowledge, reading motivation, genre, context, number of inferences required, and so forth (Bruce et al., 1981; Duffy, 1985). In fact, it has been found that making changes to a text in order to make it easier to read in accordance with reading formulae may actually make it more difficult to understand, for instance, in terms of splitting related clauses to create shorter sentences (Davison & Kantor, 1982). Therefore, the Flesch reading formulae were primarily used to ensure that texts were comparable, rather than to dictate whether they were of an appropriate level for participants. The length of texts was also adapted to make them as similar as possible.
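For readers unfamiliar with these measures, the following is a minimal sketch of how the two scores are conventionally calculated. It assumes that the ‘grade level’ and ‘reading ease level’ reported here correspond to the standard Flesch-Kincaid Grade Level and Flesch Reading Ease formulae; the software used to score the texts is not stated, so the function names and the simple vowel-group syllable counter below are illustrative assumptions rather than the procedure actually followed.

```python
import re


def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per group of consecutive vowels."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def text_counts(text):
    """Return (words, sentences, syllables) for a passage of plain English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:['’-][A-Za-z]+)*", text)
    syllables = sum(count_syllables(w) for w in words)
    return len(words), len(sentences), syllables


def flesch_reading_ease(words, sentences, syllables):
    """Standard Flesch Reading Ease: higher scores indicate simpler text."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words, sentences, syllables):
    """Standard Flesch-Kincaid Grade Level: approximate years of schooling required."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
```

Both formulae depend only on average sentence length and average syllables per word, which is why they cannot reflect factors such as prior knowledge or the number of inferences a text requires. Scores produced by a heuristic syllable counter of this kind may differ slightly from those of dedicated readability software.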

Table 9: Flesch scores and number of words across pairs of texts in the three interventions

Texts | NNF grade level | NNF reading ease level | NNF number of words | ET grade level | ET reading ease level | ET number of words
Intervention 1 (War Begins) | 7.34 | 67.04 | 1068 | 7.41 | 67.62 | 1056
Intervention 2 (Trench Life) | 7.35 | 69.79 | 1047 | 7.37 | 71.80 | 1038
Intervention 3 (The Home Front) | 7.49 | 70.91 | 1328 | 7.44 | 71.56 | 1321

6.4.5. Discussion questions

For each intervention, there were a total of six discussion questions. These were presented to participants in their small groups directly after a text had been read. Group discussions were audio-recorded, transcribed and coded for analysis. The first five questions varied across each of the three interventions, addressing the themes specific to interventions. The sixth question was the same across all three interventions, requiring participants to state how reliable they felt texts were, and to justify their opinion. Questions can be found in Appendix L.

The first five questions in each intervention aimed to encourage discussion amongst participants about the theme of the intervention. Questions were designed to avoid soliciting simple answers that could be extracted directly from the text; instead, they were more open-ended and intended to invite discussion. Questions were categorised according to historical knowledge and historical thinking skills. Firstly, they were categorised as either substantive or second-order: substantive questions required participants to recall historical information, whereas second-order questions encouraged participants to make inferences in relation to the texts. These questions did not always elicit the intended type of answer. Secondly, questions were labelled as to which historical thinking skill they encouraged participants to draw on: conceptual thinking, chronological thinking or causal thinking. Appendix L shows the categorisation of each question. Questions were designed to draw on conceptual understanding that was not addressed in the written assessments. This was primarily to minimise the influence that discussions might have on later ability to complete assessments: assessments were intended to measure understanding developed through interactions with the texts, as opposed to interactions with texts and peers during discussions. In addition, this allowed a wider breadth of understanding to be assessed overall.

Of all discussion questions, only one chronological thinking question was included, as it was difficult to assess chronological thinking in relation to the texts: WWI has a short time-span, and two of the texts considered general experiences, rather than historical events that could be chronologically sequenced. However, other questions did leave room for chronological references to be made. The chronological question was the first discussion question of the War Begins intervention, and differed from other discussion questions as it required participants to chronologically sequence six cards depicting events that occurred in the build-up to Britain declaring war on Germany (see Appendix M). Participants were also required to provide the date on which each event occurred. When recordings for this question were transcribed, it became clear that these could not be coded in the same way as the other questions: participants were focused on the physical aspect of the task, and discussions involved frequent ambiguous statements, such as, ‘Wait, this one goes here’, with reference to the cards. Consequently, an alternative analysis was conducted for this question. During interventions, photographs were taken of participants’ completed sequences of cards. These sequences were then marked: one point was awarded for each card in the correct place, and a further point was assigned for each correct date given, providing a possible total of 12 points. This marking scheme was quite rigid: participants could have ordered a sequence of, for example, three cards correctly, but if these cards were placed in the overall incorrect position, the group were given no recognition for the chronological thought shown. However, this issue was present across both conditions, and therefore should not have influenced any differences across conditions. Because analysis of this question does not consider participant discussion and the active dialogical construction of conceptual understanding in relation to the text, this additional chronological sequencing task will be considered alongside analyses of assessments, rather than alongside analyses of discussions.
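The marking rule for the sequencing task, and the rigidity noted above, can be illustrated with a short sketch. The card labels and date placeholders are hypothetical (the actual cards are in Appendix M), and the sketch assumes that each date is credited against the card it was written for, regardless of where that card was placed; the marking description does not specify this detail.

```python
def score_card_sequence(placed_order, correct_order, dates_given, correct_dates):
    """Score one group's completed sequence (at most 2 points per card).

    placed_order, correct_order: lists of card labels, earliest event first.
    dates_given, correct_dates: dicts mapping card label to a date string.
    """
    # One point for each card sitting in exactly the correct position.
    position_points = sum(
        1 for placed, correct in zip(placed_order, correct_order) if placed == correct
    )
    # One further point for each card whose stated date matches the correct date
    # (assumption: dates are judged per card, independently of position).
    date_points = sum(
        1 for card, date in correct_dates.items() if dates_given.get(card) == date
    )
    return position_points + date_points


# Hypothetical labels and placeholder dates for the six cards.
CORRECT_ORDER = ["A", "B", "C", "D", "E", "F"]
CORRECT_DATES = {card: "date-" + card for card in CORRECT_ORDER}

# A group that keeps five cards in the right relative order but shifts them
# all by one place earns no position points under this rule.
SHIFTED_ORDER = ["F", "A", "B", "C", "D", "E"]
```

Under these assumptions a fully correct response scores 12, whereas the shifted sequence scores nothing for position even though most of the relative ordering is right. A measure based on relative order would credit such a response, but that would be a different marking scheme from the one used here.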

6.5. Procedures

The experiment took place over the course of eleven weeks (see table 10 for an overview). Participants were split into two experimental conditions: experimental condition A (the NNF condition) and experimental condition B (the ET condition), as described in section 6.3.1. In the first week, all participants completed the same written pre-assessment on WWI, to provide a baseline and to assess levels of prior knowledge. In week two, participants completed the first intervention session, followed by the second intervention session in week three, and the final intervention session in week four. In week five, participants completed the written post-assessment, which was identical to the pre-assessment and intended to assess the development of conceptual understanding. Six weeks later, participants completed the delayed post-assessment. This was identical to the previous assessments, and intended to measure retention of conceptual understanding. More details on each stage of the experiment are provided below.

Table 10: Outline of the stages of the experiment

Week | Experimental Condition A (NNF) | Experimental Condition B (ET)
Week 1 | Pre-assessment | Pre-assessment
Week 2 | Intervention 1: War Begins (based on NNF text) | Intervention 1: War Begins (based on ET)
Week 3 | Intervention 2: Trench Life (based on NNF text) | Intervention 2: Trench Life (based on ET)
Week 4 | Intervention 3: Home Front (based on NNF text) | Intervention 3: Home Front (based on ET)
Week 5 | Post-assessment | Post-assessment
Week 11 | Delayed post-assessment | Delayed post-assessment

6.5.1. Administration of pre-/post-/delayed post-assessments

All assessments were conducted on Tuesday afternoons in School A, and Thursday afternoons in School B. The researcher administered all assessments. This minimised instrumentation as a threat to validity (Gorard, 2001). Participants were given a maximum of thirty minutes to complete assessments. Participants could have questions read to them if they wished, to minimise the barriers that poorer reading skills might have posed for some.

It must be noted that post-assessments were sat during the week in which KS2 SATs were administered. Although only Year 6 children sat these statutory assessments, both schools chose this week to baseline their Year 5 pupils by administering past SATs papers. Teachers in both schools commented that this had been quite an intense week for pupils, who had sat tests over the course of the week and had been preparing for these tests in previous weeks. This might have had an impact on participants’ performance during post-assessments. However, as this was the case for both schools, this should not result in any difference between participants’ post-assessment scores across schools.

The delayed post-assessment was sat six weeks after the post-assessment, to assess participants’ retention of conceptual understanding. This helped to prevent the research from focusing solely on the short-term development of conceptual understanding; any findings from the research may be more ‘transformative’ than ‘trivial’ (Gorard, 2001:136), as they will indicate whether developed conceptual understanding might be long-term, and therefore a more permanent form of learning.

6.5.1.1. Limitations of written assessments

There were concerns regarding the construct validity of written assessments: assessment scores may reflect a participant’s ability to understand and respond to questions (their reading comprehension) rather than their conceptual understanding of a topic. This might disadvantage lower level readers, misrepresenting their level of conceptual understanding. The extent to which written assessments may reflect participants’ reading ability will be considered during data analysis. In addition, assessments provide a rather limited view of learning: they offer a snapshot of a single point in time, viewing learning as an outcome, and as such can fail to capture the complexities of the processes of learning.

6.5.2. Interventions

All intervention sessions were delivered by the researcher, rather than class teachers. This decision was made to ensure consistency in the delivery of interventions across both conditions and schools. Teachers deliver lessons in unique ways and may have varying levels of prior knowledge relating to WWI. Therefore, teachers delivering interventions would have introduced numerous external factors that could have affected the data. It is recognised that teachers delivering interventions would have been beneficial for various reasons: teachers know their pupils well, and therefore can maximise learning; participants may behave more naturally in response to their class teacher; and the researcher could have observed and taken field notes if not teaching. However, unfortunately, training teachers to deliver interventions was not possible in this research due to time constraints. In addition, ensuring consistency across conditions being compared was central to this research. Finally, the researcher delivering intervention sessions gave an incentive for schools to participate: there was no additional workload for teachers, and teachers would not feel that they were being judged by the observing researcher, which may have caused them to feel uncomfortable and to teach less naturally.

At the beginning of each intervention, participants were briefly introduced to the theme of the intervention, before they completed a small number of prior knowledge tasks in their groups of three to four. The researcher read from a script whilst directing participants in completing these activities, to ensure that both conditions were given the same instructions and exposed to the same information. This ensured that all interventions remained similar, maintaining reliability of treatment implementation to the greatest extent possible (Cook & Campbell, 1979). In one classroom, a Learning Support Assistant (LSA) was present during interventions. This LSA was instructed to support participants in understanding what they should be doing, but to provide no support in terms of developing conceptual understanding.

Following prior knowledge activities, the researcher read the relevant NNF or ET to participants. Participants were provided with print copies of the text so that they could follow along while the researcher read aloud, if desired. This was done to minimise potential barriers to the text for all learners, as this experiment was designed to measure the development of conceptual understanding in response to texts, rather than specific reading skills. The researcher did not discuss the texts or any of the vocabulary with participants once the texts had been read. Definitions of any unfamiliar vocabulary (including alliance and casualties, as identified in the pilot) were integrated into prior knowledge activities.

Finally, after reading the text, the children were given the six discussion questions relating to the text and asked to discuss these in their groups. These questions were identical across conditions. Questions were presented one at a time on a PowerPoint slide. They were read aloud by the researcher and then participants were given a short amount of time (dependent on the question) to answer the question. Participants recorded discussions of these questions on iPads, using an app called Voice Recorder. Participants were able to keep copies of the texts and the prior knowledge activities on their desks so that they could refer back to these if needed: discussion questions aimed to assess understanding, not memory for information. If participants tried to initiate discussion with, or ask questions of, adults in the classroom, they were told to continue discussions amongst themselves: no questions related to conceptual understanding were answered. Once all six discussion questions had been completed, participants were asked to stop the recording and to save the audio-file.

6.5.2.1. Limitations of group discussions

As with assessments, the audio-recorded discussions may have minimised the participation of some children. Within the participant sample, there was a child with speech, language and communication difficulties, who might have found it difficult to express ideas during group discussions. Additionally, personality factors might affect levels of participation and the dynamics of group discussions: more introverted participants may have valuable input, but possibly fewer opportunities, or less willingness, to make their voices heard in a group of more dominant personalities. Further to this, a group of more extroverted participants with good teamwork and discussion skills might have had a much more productive discussion than other groups. This would not necessarily be because they had developed a greater conceptual understanding of the text than others, but perhaps because they were more adept at constructing and expressing this understanding collaboratively. Despite this, the use of the two separate methods of assessment – written assessments and discussions – allowed participants different ways to express their understanding.

Group discussions were completely controlled by participants, with no input or direction from the researcher. Therefore, sometimes participants would continue to finish their discussion of a previous question although a new question was presented, and would subsequently run out of time to answer the new question. Sometimes, once a group felt that they had provided an appropriate answer, they settled for this answer and did not provide any further details, where prompts from an adult might have sparked further discussion and insights into conceptual understanding. Similarly, occasionally groups became distracted and followed temporary tangents during a question. However, overall, groups were good at staying on topic and maximising the time they had to answer questions.

There was a concern that the nature of discussions may have influenced post- and delayed post-assessments. Some participants may have more effectively developed conceptual understanding from discussions than texts, subsequently expressing this knowledge on written assessments. Therefore, any conceptual understanding observed might result from a combination of texts and discussions. However, assessment and discussion questions differed in content, in order to minimise this possibility. During discourse analyses, relationships between codes produced and assessment scores will be considered to explore this possibility further.

Finally, group discussions made it difficult to identify individual participant responses to texts. However, this research does not aim to consider the conceptual understanding of individuals, but wishes to assess, overall, whether either condition caused participants to differentially develop conceptual understanding. Although individual differences would be interesting to observe, looking at these is beyond the scope of this research, and is possibly an avenue for future enquiry.

6.6. Ethical Considerations

Ethical approval was obtained for this research from the E&M Research Ethics Panel (LRS-16/17-4696). Written consent was obtained from both participants and their parents/guardians. Firstly, the researcher briefly spoke to all potential participants in the sampling frame about the research and answered any questions. Following this, a consent package was sent home to the parents/guardians of all potential participants. This included an information sheet for parents, a child-friendly information sheet, two copies of the consent form, and finally a short questionnaire for parents (see Appendix D). Duplicate copies of the consent form were provided so that parents could keep a copy for future reference, if required. The researcher then shared the child-friendly information sheet with children who returned consent forms, obtaining their written consent to participate. Written consent was required from both the parent/guardian and the child in order for a child to participate.

Once consent forms and completed questionnaires were collected, participant data was collated in an Excel document. After assessment scores had also been recorded in this document, data was anonymised: all children were referred to under pseudonyms following the completion of the research project in order to maintain anonymity and confidentiality. Audio-recordings were also saved anonymously, and transcribed without reference to participants’ names.

As interventions delivered history content relevant to the curriculum, interventions were delivered to all participants in the sampling frame as their weekly history lesson: not only was this the most practical option, but it also ensured that the research did not interrupt teachers’ weekly timetables and that no children were excluded from this learning opportunity. However, data was only collected from those who returned consent forms. This was made clear in the consent package sent home to parents. Following the conclusion of the research, each class was provided with copies of all of the NNF and ETs used during interventions. Therefore, if any difference did emerge between the quality of learning achieved through either text, all participants would have an equal opportunity to read all texts and receive the same learning benefits.

Finally, no exclusionary criteria were applied when selecting the participant sample, to ensure that data fairly represented the interests of all children, regardless of any Special Education Needs or learning difficulties. However, the administration of assessments and the delivery of interventions were closely controlled across conditions and schools to ensure comparability: this had the potential to cause equity issues, as the experimental design might limit the participation of those with additional needs, potentially influencing their quality of learning. To overcome this, the needs of specific pupils were discussed with their class teachers, and support was put in place to allow these participants to access the assessments and interventions: support was designed to be minimal, in order to minimise the impact on experimental design, but sufficient to allow access to the interventions. Whilst support was put in place to enable all participants to access interventions, other factors may also have influenced levels of participation; for instance, the levels of support offered by other participants in discussion groups may have either supported or lessened participation. Whilst equality was considered when sampling, equity consequently became an important consideration during experimental design.

6.7. Pilot study

The first stage of the pilot study involved a selection of six upper KS2 teachers reading the texts written for this research and looking through the materials created for prior knowledge activities. Teachers offered feedback on possible alterations to texts and materials, and also suggested whether they felt these resources were suitable for Year 5 pupils. All teachers felt that materials were at an appropriate level for this age group.

Following this process, the full experiment was conducted with fifteen Year 5 children in a rural primary school in Essex. Firstly, pilot participants were observed whilst completing the pre-assessment and the questionnaire attached to the assessment, and asked whether there were any questions that were ambiguous or unclear to them. One change was made to the assessment, and one to the questionnaire. The change made to the assessment clarified questions by using the term ‘World War One’, instead of the abbreviation WWI, which was not recognised by some pilot participants. The short questionnaire was changed to include brief definitions of fiction and nonfiction, as a number of pilot participants asked to be reminded what these terms meant.

The researcher then marked pre-assessments using the pre-determined marking scheme. Three changes were made to the marking scheme to make it more appropriate in relation to pilot participants’ responses. For instance, when asked when WWI began, children’s answers ranged from correct (1914), to very close (1918) to distant (1810). It was thought that those participants who were close to the correct answer should be recognised as having partial knowledge regarding this question, compared to those who were more distant. Therefore, the mark scheme was changed to award participants who were within four years of the correct answer with one point, whilst two points were awarded for the correct answer. Four years was chosen as an appropriate range for one point to be awarded because WWI lasted from 1914 to 1918, and therefore participants responding with any year throughout the duration of WWI were accepted as showing partial understanding.
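The revised marking rule for the start-date question can be sketched in a few lines. This is only an illustration, not the marking scheme itself: the function name `score_start_year` and its parameters are hypothetical, the tolerance is treated as symmetric around 1914 (the text says "within four years" but motivates it by the 1914 to 1918 span), and awarding zero points to more distant answers is an assumption.

```python
def score_start_year(answer: int, correct: int = 1914, tolerance: int = 4) -> int:
    """Score a participant's answer to 'When did World War One begin?'.

    Two points for the correct year, one point for an answer within
    `tolerance` years of it, and (by assumption) zero points otherwise.
    """
    if answer == correct:
        return 2
    if abs(answer - correct) <= tolerance:
        return 1
    return 0


# The three example answers mentioned in the pilot: correct, close, distant.
example_scores = {year: score_start_year(year) for year in (1914, 1918, 1810)}
```

Under these assumptions, an answer of 1918 earns partial credit because it falls within the duration of the war, whereas 1810 does not.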

Next, interventions were delivered to the fifteen pilot participants (one intervention a week). The pilot confirmed that discussions were more effective with three participants per group: two participants per group seemed to lead to shorter and less detailed discussions, whilst with four participants in a group, at least one participant seemed to participate less and needed reminders to join in. Secondly, I had underestimated how much discussion participants were keen to have during prior knowledge activities and after reading the texts, and how keen they were to share their understanding. Because of this, a line was added to the beginning of the scripts that the researcher used during interventions to make clear to children that they would have the opportunity to have discussions later, upon completion of the experiment, and that I was looking forward to hearing all of their ideas when I listened back to their audio-recordings. Finally, some pilot participants struggled with some of the vocabulary in the texts, for instance, ‘alliance’ and ‘casualties’. Because of this, a brief discussion of definitions of these words was integrated into prior knowledge tasks.

6.8. The coding process

All audio-recordings of discussions were transcribed by the researcher. Following transcription, three coding schemes were developed: a function coding scheme, a content coding scheme and a truth-value coding scheme. The function coding scheme and content coding scheme were used to code the first five questions of each intervention¹³, whilst the truth-value coding scheme was used to code the final question of each intervention, in which participants were asked about their perceptions of the truth-value of texts. The function coding scheme aimed to explore how participants constructed conceptual understanding in the social space between themselves, the text and their peers. The main overarching codes within this coding scheme were distinct from one another because they each related to unique strategies that participants employed to construct understanding. Following this, the content coding scheme was developed to explore the content of discussions, that is, the conceptual understanding that participants constructed, and the accuracy/depth of this understanding. In contrast to the function coding scheme, content codes were much more closely related to one another, as they all reflected historical knowledge or historical thinking skills. This difference between coding schemes influences the way in which data obtained from these coding schemes is later analysed. It is also important to note that the function coding scheme precedes the content coding scheme, as content (conceptual understanding) was developed as a result of function (strategies used to construct understanding). The final coding scheme – the truth-value coding scheme – considered the truth-value that participants assigned to texts (how reliable participants felt texts were), and participants’ justifications for these views. The general construction of coding schemes will be discussed below, before each individual coding scheme is explained in more detail. Finally, general difficulties in the construction of coding schemes and limitations of this approach will be considered. The three coding schemes can be found in Appendices N to P.

¹³ With the exception of the War Begins intervention, where only questions 2 to 5 were coded. This was because question 1 was assessed differently (see section 6.4.5 for more detail).

6.8.1. Constructing coding schemes

The same process was followed to construct all three coding schemes. The initial codes were created mainly in response to the themes emerging from the transcripts (these were noted during the transcription process), but also in relation to existing literature. Once draft coding schemes were created, a small number of transcripts were coded to test the coding schemes. Any issues were noted and the coding schemes were amended appropriately in response to these. The coding schemes were gradually developed and refined by repeating this process. Some codes were removed as they occurred too infrequently, making them unsuitable for analysis. This process was then repeated with an additional coder: any disagreements over codes were discussed, and relevant changes were made to coding schemes to make them more reliable.

One key difficulty discovered with two coders was the selection of which utterances should be coded, and where these utterances should begin and end. Often, one coder would select multiple utterances to code individually, whereas the other might group these utterances to assign one code to. Additionally, as the transcripts were of young participants’ discussions, with no adult guidance, utterances were often interrupted or incomplete, and therefore difficult to select for coding identically across two coders. Because of these difficulties, the decision was taken that the primary coder should select the lines to be coded, using a specific set of rules on selecting codable utterances (these rules can be found in Appendix Q). Both coders then coded the selected lines. Codes were subsequently compared, and this method resulted in a much higher proportion of agreement between coders. Because this method was applied consistently, with strict adherence to the rules on selecting codable utterances, and still allowed a valuable comparison between coders, this method was thought to be reliable. Consequently, this method was also used when inter-rater reliability was sought. As the primary coder would ultimately be selecting which sections of text to code for the majority of transcripts once inter-rater reliability had been obtained, the process of establishing and adhering to rules on selecting codable utterances was useful, as it ensured consistency throughout the whole coding procedure. To ensure the reliability of coding, a second coder was provided with each of the three coding schemes and asked to code a randomly selected 20% of transcripts (five transcripts from each intervention) for each coding scheme. These transcripts were taken from across the three interventions to ensure that the coding scheme was equally reliable across interventions. Inter-rater reliability showed strong agreement (McHugh, 2012) between myself and the second coder for each of the three coding schemes (function coding: k = 0.85, percent agreement = 87.06%; content coding: k = 0.81, percent agreement = 82.59%; truth-value coding: k = 0.87, percent agreement = 89.47%). Since there were relatively few disagreements, these were dealt with by selecting the first coder’s codes to be input as final data.
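The two agreement statistics reported above (percent agreement and Cohen's kappa) can be illustrated with a minimal sketch. It assumes that both coders' labels for the same pre-selected utterances are held in two equal-length lists; the function names and the example labels are hypothetical, and this is not the thesis data or the software used in the research.

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Proportion of utterances to which both coders assigned the same code."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement.

    Undefined (division by zero) if chance agreement equals 1, e.g. when
    both coders use a single identical code throughout.
    """
    n = len(coder_a)
    p_observed = percent_agreement(coder_a, coder_b)
    counts_a, counts_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal proportions per code.
    p_expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical codes assigned by two coders to the same six utterances.
coder_a = ["recall", "recall", "inference", "recall", "inference", "inference"]
coder_b = ["recall", "recall", "inference", "inference", "inference", "inference"]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
```

Because kappa discounts the agreement expected by chance, it is lower than raw percent agreement, which is consistent with the pattern in the figures reported for all three coding schemes.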

6.8.2. Function coding scheme

The primary aim of the function coding scheme (Appendix N) was to explore how participants constructed conceptual understanding in the social space between themselves, the text and their peers. To capture this process, codes in this scheme essentially described strategies that participants used to construct conceptual understanding. These codes primarily intended to shed light on the third research question (see section 6.1): participants were likely to have constructed mental representations of texts during the reading process, and discussions aimed to provide an insight into these mental representations, exploring how participants might go on to negotiate and adapt these mental representations in the social space between them. The coding scheme contained two levels of code: there were three overarching codes, with each overarching code encompassing a number of specific codes (see table 11 for an overview of codes (p.135), or Appendix N for a more detailed description of codes, including examples). The first overarching code was ‘textual understanding’: specific codes within this overarching code related to how participants – individually and collaboratively – constructed meaning and appropriate conceptual understanding, using both the text and each other. The second overarching code was the ‘referencing’ code, which explored references that participants made to different sources of prior knowledge. The third and final overarching code was ‘navigation’: this code related to how participants navigated texts to locate relevant information. When applying this coding scheme, each selected utterance was assigned both an overarching code and a specific code. To apply codes, the coder would read through the text, identifying where participants were employing strategies listed in the coding scheme. These would then be coded accordingly. Upon the first reading of transcripts, strategies were sometimes overlooked, and therefore transcripts were read through multiple times to ensure that all strategies were identified.

Table 11: Overview of function codes

| Overarching code | Specific code | Brief description |
| --- | --- | --- |
| Textual Understanding | Direct reading | Participants read information directly from the text |
| Textual Understanding | Support | Participants read short sections from the text to support their answer |
| Textual Understanding | Collaborative (cumulative) | Multiple participants construct a collaborative answer, characterised by agreement with one another |
| Textual Understanding | Collaborative (negotiation) | Multiple participants construct a collaborative answer, characterised by constructively questioning each other |
| Textual Understanding | Assessing accuracy | Participants challenge/correct each other |
| Textual Understanding | Self-correction | A participant corrects themselves |
| Textual Understanding | Hypothetical reasoning | Participants question/imagine hypothetical situations/possibilities |
| Textual Understanding | Summarise | A participant summarises the overall discussion |
| Textual Understanding | Absent information | Participants question information not present in the text |
| Referencing | Prior knowledge | Participants draw on prior knowledge of war |
| Referencing | Personal experiences (in own lives) | Participants draw on personal experiences |
| Referencing | Personal experiences (of relatives) | Participants make reference to a relative’s experiences (e.g. grandparents) |
| Referencing | Within texts | Participants make reference to information elsewhere within the text |
| Referencing | Between texts | Participants make reference to other texts |
| Referencing | Materials | Participants make general reference to materials in the intervention |
| Navigation | Scanning | Participants scan text to locate information |
| Navigation | Paragraph | Participants refer directly to a relevant paragraph to answer a question |
| Navigation | General stage of text | Participants refer to a general part of the text to answer a question |
| Navigation | Page | Participants refer directly to a relevant page to answer a question |
| Navigation | Sub-headings | Participants use sub-headings to locate relevant information |

Many of the codes reflected processes that participants engaged in to establish and express substantive knowledge. However, some codes reflected processes that participants used to construct second-order knowledge: as there were few of these, these are highlighted here. Firstly, the specific ‘hypothetical reasoning’ code is a type of possibility thinking, which reflects elements of second-order knowledge (Cooper, 2018a), as discussed in section 3.1. This code was assigned when participants either imagined themselves in a scenario, or imagined alternative, possible scenarios, related to the text. For instance, ‘If I were in the war….’, or ‘If Germany didn’t invade Belgium…’. Such utterances required participants to go beyond the information presented in the text and, in doing so, to make their own subjective judgements, thus constituting second-order knowledge. Similarly, specific codes within the overarching ‘referencing’ code often contributed to the development of second-order knowledge. These codes required participants to consider their personal knowledge of previous experiences, applying this to support understanding of textual information. Considering personal experiences in relation to information presented in the texts encourages a more subjective understanding of this information.

One of the codes – the collaborative code – was initially created in response to observations made during the transcription process, but further refined using the work of Mercer. Mercer (1996) describes three kinds of talk: disputational talk (characterised by disagreement, with individuals expressing their own opinions in a somewhat competitive manner); cumulative talk (in which all speakers agree with each other, mostly simply listing ideas, and sometimes elaborating on the ideas of others); and finally, exploratory talk (speakers challenge each other in a constructive way, seeking overall agreement through respectfully sharing and dissecting ideas). In this research, cumulative talk was observed similarly to that described by Mercer, which gave rise to the collaborative (cumulative) code. Another type of group talk observed was similar to Mercer’s concept of exploratory talk. However, in this research, such talk was characterised mainly by constructive disagreements and corrections, and therefore the term ‘negotiation’ was thought to be more suitable than ‘exploratory’, giving rise to the collaborative (negotiation) code.

6.8.3. Content coding scheme

The content coding scheme (Appendix O) aimed to identify conceptual understanding expressed by participants, in terms of historical knowledge and historical thinking skills. It also explored the depth and accuracy of this understanding. It primarily aimed to provide insight into the first research question, exploring the development of conceptual understanding across conditions. This coding scheme contains three levels of code. Figure 6 (below) provides an overview of the content coding scheme, which this section will explain in more detail.

Figure 6: Overview of the content coding scheme

[Figure: three levels of code. First level codes (historical thinking): conceptual thinking, chronological thinking, causal thinking. Second level codes (historical knowledge): recall (substantive), inference (second-order). Third level codes (accuracy/depth of historical understanding): basic, partial, mixed, complex, inaccurate.]

Firstly, utterances were assigned one of three historical thinking codes, according to which historical thinking skill participants were showing: conceptual thinking, chronological thinking or causal thinking. Once assigned an historical thinking code, utterances were assigned a second code, indicating historical knowledge: recall or inference. Recall occurred when participants were recalling information from the texts, reflecting substantive knowledge. The inference code was assigned when participants made an appropriate inference in relation to the text. These inferences reflected participants’ second-order knowledge, in that they were developing an understanding not directly expressed within the text. The third and final level of coding occurred only if utterances were coded as recall; if so, they were assigned a final code to describe the accuracy and/or depth of the utterance. These final accuracy codes included basic, partial, mixed, complex and inaccurate codes. Basic codes occurred when participants recalled a basic unit of historical information from the text, and partial codes when participants recalled partial information which was not sufficient to show full understanding. Mixed codes were applied when participants recalled information from the text that was accurate but irrelevant to the discussion or question. Complex codes were assigned when participants recalled at least three related units of information from the text in a single utterance. Finally, inaccurate codes were assigned when information was misremembered or inaccurate in relation to the text.
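The three-level structure described above can be expressed as a small validation routine, which makes the dependency between levels explicit: an accuracy/depth code is only attached to utterances coded as recall. This is a sketch under stated assumptions, not the coding scheme in Appendix O: the lower-case labels, the constant names and the function `validate_content_code` are all hypothetical.

```python
# Hypothetical labels for the three levels of the content coding scheme.
THINKING_CODES = {"conceptual", "chronological", "causal"}            # first level
KNOWLEDGE_CODES = {"recall", "inference"}                             # second level
ACCURACY_CODES = {"basic", "partial", "mixed", "complex", "inaccurate"}  # third level


def validate_content_code(thinking, knowledge, accuracy=None):
    """Check that a combination of codes follows the three-level structure.

    Returns the (thinking, knowledge, accuracy) tuple if valid and raises
    ValueError otherwise.
    """
    if thinking not in THINKING_CODES:
        raise ValueError(f"unknown historical thinking code: {thinking!r}")
    if knowledge not in KNOWLEDGE_CODES:
        raise ValueError(f"unknown historical knowledge code: {knowledge!r}")
    if knowledge == "recall":
        # Recall utterances always receive a third-level accuracy/depth code.
        if accuracy not in ACCURACY_CODES:
            raise ValueError("recall utterances need an accuracy/depth code")
    elif accuracy is not None:
        # Inference utterances stop at the second level.
        raise ValueError("inference utterances take no accuracy/depth code")
    return (thinking, knowledge, accuracy)


# Two hypothetical coded utterances.
recall_example = validate_content_code("chronological", "recall", "complex")
inference_example = validate_content_code("causal", "inference")
```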

One key difficulty in constructing this coding scheme was determining the overall structure of the codes: a balance was sought between the coding scheme being thorough enough to capture interesting insights into discussions, yet simple enough that a second coder could use it without difficulty. For instance, originally there were different third level codes for each of the three historical thinking skill codes (first level codes). However, such a vast number of codes was overcomplicated. Instead, the same codes were adapted for use across all three first-level historical thinking codes. In addition, simplifications had to be made to reduce the subjective interpretation involved in the application of some of the initial codes. For instance, initially, utterances coded as inferences were then additionally classified as either appropriate or inappropriate: it was hoped that this would indicate the relevance of inferences made. However, upon applying the inappropriate inference code in numerous transcripts, it became clear that it was difficult to determine whether information in the utterance had been misrecalled from the text (an inaccurate recall), or whether it was the result of an inappropriate inference. Finally, a specific difficulty was encountered with assigning causal utterances. Participants would often begin utterances with the words ‘because’ and ‘so’ (words which the coding scheme states suggest the beginning of a causal utterance); however, participants seemed to use these to signal that they were beginning a turn in conversation, rather than to indicate causality. In such instances, utterances were only coded as causal if it was clear that causality was intended in the content of the utterance.

6.8.4. Truth-value coding scheme

The truth-value coding scheme (Appendix P) was only used to code the final question of each discussion, when participants were asked about their perceptions of the truth-value of texts. This coding scheme was designed to gain insight into the fourth research question, exploring how participants perceived and judged the truth-value of NNF and ETs. Firstly, truth-value judgement codes were assigned. To do so, utterances were identified in which participants expressed opinions on the truth-value of the text. Each participant was assigned only one truth-value judgement code per discussion. If a participant expressed a perception of truth-value, they were assigned one of three ‘degrees of truth’ codes: ‘entirely factual’, ‘partially factual’ or ‘entirely fictionalised’. Not all participants expressed a view, and therefore some participants were coded as ‘no response’. Some participants expressed a view at one stage, and then changed it during the discussion. In this case, the final view that they expressed was coded.

Following this, truth-value justification codes were assigned, to explore participants’ reasoning behind their truth-value judgements. There were two levels of codes here: overarching and specific codes. The three overarching codes included: composition (consideration of the content/style of texts), source (consideration of where textual information originated/who provided it) and prior knowledge (comparison of information against prior knowledge). Once an overarching code was assigned, an additional specific code was also assigned (see table 12, on the following page). The codes were created entirely in response to the transcripts. However, they do relate to ways in which readers generally evaluate the trustworthiness of texts: when judging the reliability of texts, adults have been found to take into account the source of information, the plausibility of the content of the text, and the author’s credentials (Eyden et al., 2013).

Table 12: Overview of truth-value justification codes

| Overarching code | Specific code | Brief description |
| --- | --- | --- |
| Composition | Making sense | Questioning the degree to which content makes sense |
| Composition | Characters | Considering historical figures/protagonists |
| Composition | Literary devices | Discussion of literary devices (e.g. figurative language, sub-headings) |
| Composition | Facts as evidence | Drawing on ‘facts’ from the text |
| Source | Teacher | Reference to the person who provided the text (the ‘teacher’) |
| Source | Original source | Reference to people who might have witnessed historical events described |
| Source | Author/historian | Reference to the creator of the text |
| Source | Evidence | Reference to the existence of documents, artefacts, photographs, etc |
| Prior knowledge | History | Drawing on prior knowledge of war/relevant history |
| Prior knowledge | Experience | Drawing on personal experiences |

6.8.5. Coding limitations

Whilst difficulties relating to particular coding schemes were discussed above, the general coding process also presented some difficulties. Firstly, the nature of the recordings and subsequent transcriptions presented issues in terms of applying coding schemes. The recorded discussions involved groups of three to four children discussing questions independently of adult input. Therefore, in places, it became difficult to follow discussions, as there were numerous interruptions and sudden changes in the direction of discussions. Some participants struggled to articulate themselves, and would use ambiguous pronouns (for example, ‘he wanted revenge’, not stating who ‘he’ referred to). Without an adult to prompt participants to explain more, such statements remained ambiguous. In addition, discussions were only audio-recorded, and therefore lacked clues such as facial expressions and gestures, which can give additional meaning during the coding process (Callanan et al., 1995). Because of these various issues, each code came with a clearly defined set of rules for application. As new instances arose in the transcripts that rules did not account for, these rules were slowly adapted and refined to encompass these instances. Once the coding schemes were finalised, it was made clear that any utterances that were open to interpretation should be left uncoded.

Chapter Seven: Assessment analysis and findings

This section will consider data collected from the written assessments, with the aim of exploring two of the main research questions, detailed below. Both questions will be considered in conjunction throughout this section. This introduction will begin by outlining the number of participants in analyses, and the application of Bonferroni corrections and effect sizes. The next section will go on to provide descriptive statistics of assessment scores, considering points scored across narrative nonfiction (NNF) and expository text (ET) conditions, alongside points scored for specific types of questions. The following section will consider inferential statistics, which aim to assess whether any statistical differences arose across the two conditions in terms of assessment scores. The subsequent section will consider additional factors that may have influenced participants’ performance on assessments across conditions, such as whether participants held an individual interest in World War One. This section will outline descriptive and inferential statistics on each possible additional factor. Finally, there will be a brief discussion of the findings presented.

1. How does narrative nonfiction affect the development of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
2. How does narrative nonfiction affect the longer-term retention of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?

Of a total of 78 participants, six were absent from at least one of the three intervention sessions. These six participants were removed from a majority of analyses (with the exception of analyses which focused on single interventions, in which only participants absent from the intervention in question were removed). One participant was absent from the post-assessment, and a further two participants absent from the delayed post- assessment. These three participants’ assessment scores were marked as missing cases, and therefore these participants were excluded from all analyses. This left a sample of 69 participants for a majority of the analyses conducted below. Where the number of participants in an analysis differs from 69, the number of participants is stated.

In some places, a large number of tests were run, therefore increasing the chance of type I errors and false findings occurring. To minimise the chances of this, Bonferroni corrections were applied when five or more analyses were run together. Despite a number of variations of the Bonferroni correction being developed, with the aim of increasing statistical power, studies of these variations have found that modified Bonferroni procedures do not result in a large magnitude of power increase (Olejnik et al., 1997). Therefore, the Bonferroni correction was selected for use. How tests were grouped and selected for the application of the Bonferroni correction can be found in Appendix R. Where Bonferroni corrections have been applied, the adjusted alpha will be stated before results of the analyses are reported.
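As a minimal sketch of the correction described above, the adjusted alpha can be obtained by dividing the family-wise alpha by the number of tests run together. The family-wise alpha of 0.05 is an assumption here (it is not stated in this section), although it is consistent with the adjusted alpha of 0.006 reported for groups of nine tests. The function names and the example p-values are hypothetical and purely illustrative.

```python
def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Return the per-test alpha after a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return family_alpha / n_tests


def significant_after_correction(p_values, family_alpha=0.05):
    """Flag which p-values fall below the Bonferroni-adjusted alpha.

    family_alpha=0.05 is an assumed default, not a value taken from the thesis.
    """
    adjusted = bonferroni_alpha(family_alpha, len(p_values))
    return [p < adjusted for p in p_values]


# Nine tests run together, as in the post-hoc comparisons reported later.
adjusted_alpha = bonferroni_alpha(0.05, 9)
print(round(adjusted_alpha, 3))
```

With nine tests, the adjusted alpha is 0.05 / 9, which rounds to the 0.006 quoted in the text.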

When considering effect sizes, Cohen’s (1988) guidelines are used (see table 13). For ANOVAs and variations of the ANOVA, eta squared was calculated and reported, rather than partial eta squared. Eta squared was reported for two reasons. Firstly, it is additive: the effects for all variables together (including error) cannot sum to more than 1. The effect size for individual variables decreases as the number of independent variables increases, and this makes it a useful effect size for comparing the effect of multiple independent variables (Levine & Hullett, 2002). Secondly, reporting eta squared means that effect sizes can be reliably compared against Cohen’s guidelines for effect sizes.

Table 13: Cohen’s guidelines for effect sizes

Effect size   Eta squared (η²)   Pearson correlation coefficients   Cohen’s D
Small         0.01               0.10 – 0.29                        0.2
Moderate      0.06               0.30 – 0.49                        0.5
Large         0.138              0.5 – 1.0                          0.8
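The distinction between eta squared and partial eta squared, and the way table 13 is applied, can be sketched as follows. This is an illustrative outline rather than the procedure used in the thesis: the function names are hypothetical, the thresholds are read as lower bounds for each label, and the "negligible" label for values below the small threshold is an assumption added for completeness.

```python
# Lower bounds for small, moderate and large effects, taken from table 13.
THRESHOLDS = {
    "eta_squared": (0.01, 0.06, 0.138),
    "cohens_d": (0.2, 0.5, 0.8),
}


def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Proportion of total variance attributable to an effect (additive)."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Effect variance relative to effect plus error variance only (not additive)."""
    return ss_effect / (ss_effect + ss_error)


def classify_effect(value: float, measure: str) -> str:
    """Label an effect size using the lower bounds in THRESHOLDS."""
    small, moderate, large = THRESHOLDS[measure]
    size = abs(value)  # the sign only encodes the direction of the effect
    if size >= large:
        return "large"
    if size >= moderate:
        return "moderate"
    if size >= small:
        return "small"
    return "negligible"  # assumed label; table 13 does not name this band


print(classify_effect(0.104, "eta_squared"))
print(classify_effect(-1.66, "cohens_d"))
```

Because eta squared divides by the total sum of squares, the values for all effects and error share one denominator and so cannot sum to more than 1, which is the additivity property referred to above.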

Where significant interaction effects were observed in repeated measures ANOVAs, further post-hoc comparisons were conducted to explore the specific nature of the effect. For these post-hoc comparisons, multiple paired and independent samples t-tests were run to explore the different levels of the variables, to identify more specifically where these were interacting. These post-hoc comparisons are reported following the main findings of repeated measures ANOVAs.

7.1. Descriptive statistics

Assessments were marked out of a possible total of 56 points. Figure 7 shows that pre-assessment scores were mainly clustered below 10 points. Mean scores were similar across conditions (NNF: M=4.62, SD=3.652; ET: M=3.03, SD=2.176). In the NNF condition, there was an anomaly: one participant scored 17 on the pre-assessment. This anomaly will be considered during initial inferential statistics to establish whether it might influence comparisons across conditions. For post-assessments, whilst the range of scores was similar across conditions (see figure 8), the mean score for the NNF condition was higher than for the ET condition (NNF: M=19.77, SD=9.528; ET: M=14.20, SD=9.142). At the delayed post-assessment, the NNF condition showed a greater range of scores than the ET condition (see figure 9), and a higher mean score (NNF: M=18.88, SD=10.554; ET: M=12.26, SD=7.975).

Figure 7: Frequency of pre-assessment scores across conditions

Figure 8: Frequency of post-assessment scores across conditions

Figure 9: Frequency of delayed post-assessment scores across conditions

In consideration of assessment questions relating to the three intervention sessions, the most points were scored on War Begins questions, across all three assessments (see figure 10). This is regardless of the fact that the War Begins intervention occurred three weeks prior to the post-assessment. Trench Life and Home Front interventions were delivered two weeks and one week before the post-assessment, respectively. Progress from pre- to post-assessments, and a slight decline in scores from post- to delayed post-assessments, was observed for all three interventions, and for questions spanning all three interventions, although the decline between post- and delayed post-assessment scores was minimal for Trench Life questions.

Figure 10: Points scored on assessment questions relating to the three intervention sessions

[Bar chart. Y-axis: points scored (0 to 450). X-axis: War Begins, Trench Life, Home Front, and questions spanning all three interventions. Series: pre-, post- and delayed post-assessments.]

Finally, assessment questions were categorised as one of four historical thinking skills. On pre-assessments, a majority of points were scored for simple conceptual thinking questions, with a very small number scored for complex conceptual thinking questions (see figure 11). The number of points scored on complex conceptual thinking questions increased by post-assessments (see figure 12). Proportions of points scored on different thinking skills stayed relatively stable between post- and delayed post-assessments (see figures 12 and 13).

Figure 11: Proportion of points for different thinking skills questions on pre-assessments

[Chart legend: simple conceptual thinking, complex conceptual thinking, chronological thinking, causal thinking]

Figure 12: Proportion of points for different thinking skills questions on post-assessments

[Chart legend: simple conceptual thinking, complex conceptual thinking, chronological thinking, causal thinking]

Figure 13: Proportion of scores for different thinking skills questions on delayed post-assessments

[Chart legend: simple conceptual thinking, complex conceptual thinking, chronological thinking, causal thinking]

7.2. Inferential statistics: the development and retention of conceptual understanding

This section will discuss inferential analyses, exploring whether the development and retention of conceptual understanding differed across the two conditions. It will firstly compare the assessment scores of the NNF and ET conditions. Further analyses will then be conducted to explore whether participant ability influenced results. Next, the performance of the two conditions on questions relating to individual intervention themes will be considered, before examining specific historical thinking skills. Finally, possible additional factors that might have influenced assessment scores will be considered. These include situational and individual interest, prior knowledge and enjoyment of reading.

7.2.1. How did text type affect assessment scores over time?

Two repeated measures ANOVAs were conducted to compare the assessment scores of the NNF and ET conditions over the course of the three assessments. The first ANOVA included the anomaly observed on pre-assessments, and the second excluded this outlier. The same significant effects were observed in both ANOVAs (and in subsequent post-hoc analyses), and therefore the anomaly will not be excluded from further analyses. ANOVA results with the anomaly excluded can be found in Appendix S; results for the ANOVA including the anomaly are presented in table 14.

A significant interaction effect was observed between condition and time, with a small effect size. The NNF condition scored higher on all three assessments (pre-, post- and delayed post-assessments) than the ET condition: whilst the difference between the mean scores of the two conditions on the pre-assessment is relatively small, the difference between conditions’ mean scores increased on post- and delayed post-assessments (see table 15). A significant main effect with a moderate effect size was found for condition, with the NNF condition scoring significantly higher than the ET condition in combined assessments (see table 15). Finally, a significant main effect was found for time, with a large effect size: overall, participants scored more highly on the post-assessment than on the pre-assessment. Whilst scores on the delayed post-assessment were also higher than the pre-assessment, they had declined slightly since the post-assessment (see table 15). A main effect for time was found in a majority of analyses run. Therefore, further main effects for time will be recorded in tables, but only discussed in cases in which they were not found to be significant. Further post-hoc analyses were run to explore where the significant effects lie for time, and also to explore the interaction effect observed between condition and time. These will be discussed in more detail below.

Table 14: Repeated measures ANOVA with condition as a factor

Effects                               df     F         p         η²
Interaction effect (condition*time)   2,66   4.057     0.022*    0.023
Main effect (condition)               1,67   7.802     0.007*    0.104
Main effect (time)                    2,66   100.802   <0.001*   0.696
Note: * indicates a significant effect

Table 15: Mean scores and standard deviations (SDs) of NNF and ET conditions across the three assessments

            Pre-assessment   Post-assessment   Delayed post-assessment   Combined assessment scores
Condition   Mean    SD       Mean    SD        Mean    SD                Mean
NNF         4.62    3.652    19.77   9.528     18.88   10.554            14.42
ET          3.03    2.176    14.20   9.142     12.26   7.975             9.83
Total       3.82    3.079    16.98   9.680     15.57   9.848             12.13

Post-hocs for main effect for time

Further post-hoc analyses were conducted to explore the main effect observed for time. Three paired samples t-tests (collapsed across condition) were run to compare the mean scores of assessments in pairs (pre- and post-assessments, pre- and delayed post-assessments, and post- and delayed post-assessments). A significant effect for time, with a large effect size, was observed between pre- and post-assessments and between pre- and delayed post-assessments. A significant effect for time was also observed between post- and delayed post-assessments, but with a small effect size (see table 16). Scores increased significantly from both pre- to post-assessments and from pre- to delayed post-assessments, but decreased significantly from post- to delayed post-assessments (see table 15, above). Once again, these same effects were noted in a majority of post-hoc analyses exploring the main effect for time. Because of this, and because the overall effect of time alone is not of primary interest to this research, all other post-hocs exploring main effects for time will not be reported in the text, but can be found in Appendix T.

Table 16: Post-hoc paired samples t-tests comparing assessment scores over time

Assessments            df   t         p         Cohen’s D [14]
Pre to post            68   -13.816   <0.001*   -1.66 [15]
Pre to delayed post    68   -12.466   <0.001*   -1.50
Post to delayed post   68   2.984     0.004*    0.36

Post-hocs for interaction effect between condition and time

Further planned comparison analyses were conducted to explore where the significant interaction effect between condition and time lies. Nine post-hoc t-tests were conducted in total, and therefore a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six t-tests were run to explore each condition’s progress over time: three paired samples t-tests were run for the NNF condition and three for the ET condition, comparing mean scores across assessments. Results can be found in table 17. A significant difference was found between pre- and post-assessments and pre- and delayed post-assessments for both conditions, with a large effect size: both conditions showed a significant improvement in scores between these assessments (see figure 14). A significant difference was also observed between post- and delayed post-assessments for the ET condition, with a moderate effect size: scores decreased significantly between the two assessments (see figure 14). No significant effect was observed between post- and delayed post-assessment scores for the NNF condition. This suggests that, whilst both conditions made significant progress in conceptual understanding from the pre-assessment, the NNF condition appeared to retain understanding more effectively between post- and delayed post-assessments.

[14] Cohen’s D for paired samples t-tests was calculated by dividing the t statistic value by the square root of the sample size.
[15] Note that a negative Cohen’s D indicates the direction of the effect, showing that scores increased over time. A positive Cohen’s D shows the opposite, that scores decreased over time.

Table 17: Post-hoc paired samples t-tests comparing assessment scores over time for each condition

                       NNF                                  ET
Assessments            df   t         p         Cohen’s D   df   t        p         Cohen’s D
Pre to post            33   -11.853   <0.001*   -2.03       34   -8.344   <0.001*   -1.41
Pre to delayed post    33   -10.223   <0.001*   -1.75       34   -8.170   <0.001*   -1.38
Post to delayed post   33   1.213     0.234     0.21        34   3.160    0.003*    0.53
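The calculation described in footnote 14 (t statistic divided by the square root of the sample size) can be sketched as below. The function name is hypothetical, and the sample size is taken to be the number of pairs, that is, the degrees of freedom plus one.

```python
import math


def cohens_d_paired(t_statistic: float, n_pairs: int) -> float:
    """Cohen's D for a paired samples t-test, as described in footnote 14."""
    return t_statistic / math.sqrt(n_pairs)


# Pre to post, collapsed across condition (table 16): df = 68, so n = 69.
print(round(cohens_d_paired(-13.816, 69), 2))

# Pre to post for the NNF condition alone (table 17): df = 33, so n = 34.
print(round(cohens_d_paired(-11.853, 34), 2))
```

Applied to the t statistics in tables 16 and 17, this reproduces the reported effect sizes to two decimal places.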

Figure 14: Mean score of each condition across the three assessments

Secondly, three further independent samples t-tests were run, comparing conditions’ scores on each of the three assessments. A significant difference with a moderate effect size was observed between conditions on delayed post-assessments only (see table 18), with the NNF condition scoring significantly higher than the ET condition (see figure 14 above). From these results, it is evident that the NNF and ET conditions did not significantly differ in terms of prior knowledge on pre-assessments, and therefore neither condition was advantaged at the beginning of interventions. Secondly, these results indicate that NNF participants retained a significantly greater conceptual understanding at delayed post-assessments than ET participants.

Table 18: Post-hoc independent samples t-tests comparing assessment scores across conditions

Assessments               df             t       p        Cohen’s D [16]
Pre-assessment            53.5111 [17]   2.188   0.033    0.53
Post-assessment           67             2.476   0.016    0.60
Delayed post-assessment   67             2.947   0.004*   0.71
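The effect sizes in table 18 follow the calculation given in footnote 16 (mean difference divided by the pooled standard deviation), and the decimal degrees of freedom arise when equal variances are not assumed (footnote 17). A minimal sketch of both is given below. The function names are hypothetical, the group sizes of 34 (NNF) and 35 (ET) are inferred from the degrees of freedom in table 17, and the Welch-Satterthwaite formula is assumed to be the adjustment behind the decimal degrees of freedom. Because the inputs are the rounded means and standard deviations from table 15, results agree with the table only approximately.

```python
import math


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)


def cohens_d_independent(m1, sd1, n1, m2, sd2, n2) -> float:
    """Cohen's D for independent samples, as described in footnote 16."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def welch_df(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Welch-Satterthwaite degrees of freedom (equal variances not assumed)."""
    v1, v2 = sd1**2 / n1, sd2**2 / n2
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))


# Pre-assessment means and SDs from table 15; group sizes inferred from table 17.
print(round(cohens_d_independent(4.62, 3.652, 34, 3.03, 2.176, 35), 2))
print(round(welch_df(3.652, 34, 2.176, 35), 1))
```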

Overall, participants in both the NNF and ET condition made significant progress in terms of developing a conceptual understanding of WWI over the course of assessments. However, for participants in the ET condition, conceptual understanding decreased significantly over the six weeks from post- to delayed post-assessment. The NNF condition saw no significant decrease over this time, which led to these participants scoring significantly higher on delayed post-assessments than participants in the ET condition.

7.2.2. Ability as an additional factor

Participants were not matched according to ability across conditions, and therefore academic ability was a factor that might have influenced the performance of the two conditions. Participants’ academic levels in both reading and history were obtained from schools. A Spearman’s rank-order correlation was run to determine the relationship between participants’ reading and history levels, to assess whether these two factors should be considered separately. There was a strong, positive correlation between reading and history levels, which was statistically significant (rs (67) = 0.602, p < 0.001) [18]. Because reading and history levels were strongly correlated, analyses below consider reading level only. Reading level was considered instead of history level because reading levels were recorded as a continuous variable, whereas history levels were recorded as a nominal variable (see Appendix C for more detail on ability levels), and therefore using reading levels increased statistical power.
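For readers unfamiliar with the statistic, a Spearman’s rank-order correlation can be sketched as a Pearson correlation computed on ranks, with tied values sharing their average rank. The sketch below is illustrative only: the function names and the example data are hypothetical and are not the participants’ levels. The degrees of freedom reported alongside the coefficient are the number of participants minus two (69 participants gives 67).

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / (spread_x * spread_y) ** 0.5


def spearman_rho(x, y):
    """Spearman's rank-order correlation: Pearson's r on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Hypothetical ability levels for six pupils (not data from the study).
reading_levels = [2, 3, 3, 4, 5, 6]
history_levels = [1, 2, 3, 3, 4, 4]
print(round(spearman_rho(reading_levels, history_levels), 3))
```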

[16] Cohen’s D for independent samples t-tests was calculated by dividing the mean difference between groups by the pooled standard deviation.
[17] For this t-test, Levene’s test for equality of variances was significant, and therefore equal variances were not assumed. This is the case for all further independent samples t-tests where the degrees of freedom is presented as a decimal.
[18] The correlation statistic reported in text was calculated using the 69 participants included in a majority of analyses. A strong positive correlation, that was statistically significant, was also observed when all 78 participants were included (rs (76) = 0.610, p < 0.001).

7.2.2.1. ANCOVA with reading level as a covariate

An ANCOVA was run with reading level as a covariate, to explore whether reading level influenced assessment scores over time. A significant interaction effect was observed between reading level and time, with a large effect size (see table 19), suggesting that reading ability did influence assessment scores over time. As the reading level covariate was a continuous variable, this analysis did not indicate the direction of this effect; this will be explored in further analyses in the following section (7.2.2.2). The interaction effect observed will be explored through further post-hocs below. Although participants in the NNF condition still scored higher than those in the ET condition with adjustments made according to the reading level covariate (see table 20), no significant interaction effect between condition and time was observed. The main effect for condition approached but did not reach significance, and no significant main effect was observed for time (see table 19). The latter finding suggests that variability in the scores over time was explained by reading level: reading ability allowed for progress to be made over time.

Table 19: ANCOVA with condition as a factor and reading level as a covariate

Effects                                    df     F        p         η²
Interaction effect (reading levels*time)   1,65   10.643   <0.001*   0.188
Main effect (condition)                    1,66   3.704    0.059     0.040
Main effect (time)                         2,65   1.110    0.336     0.019
Interaction effect (condition*time)        2,65   2.317    0.107     0.031

Table 20: Adjusted mean assessment scores for conditions, with reading level as a covariate

Condition   Pre-assessment means   Post-assessment means   Delayed post-assessment means   Combined assessment scores mean
NNF         4.32                   18.50                   17.74                           13.52
ET          3.32                   15.43                   13.37                           10.70
Total       3.82                   16.96                   15.55                           12.11

Post-hocs for interaction effect between reading level and time

Three further ANCOVAs, collapsed across condition, were conducted as post-hoc analyses to explore where the interaction between reading level and time lies. These ANCOVAs compared the assessments in pairs. Significant interaction effects between reading level and time were found between pre- and post-assessments and between pre- and delayed post-assessments, both with a large effect size (see table 21). No significant interaction effect was found between post- and delayed post-assessment scores, suggesting that reading level did not affect the retention of conceptual understanding between these two assessments.

Table 21: Post-hoc ANCOVAs with reading level as a covariate

                       Interaction effect (reading level*time)
Assessments            df     F        p         η²
Pre to post            1,67   25.761   <0.001*   0.268
Pre to delayed post    1,67   21.116   <0.001*   0.232
Post to delayed post   1,67   0.501    0.482     0.007

7.2.2.2. ANOVA with median split for reading level

To explore further how reading level influenced assessment scores over time, a median split for reading level was used as a factor in a repeated measures ANOVA [19]. Again, a significant interaction effect was observed between time and reading level, with a small effect size (see table 22): higher level readers scored higher on pre-, post- and delayed post-assessments than lower level readers (see table 23). This interaction effect will be explored further in post-hoc analyses below. With the exception of a significant main effect for time, no other significant effects were observed.

[19] The two groups created using the median split were unequal. This is because 11 participants were at the median reading level. These participants were put into the ‘higher’ reading level group. Firstly, this made groups more even than if these participants were put into the ‘lower’ reading level group. Secondly, the median reading level was the third highest reading level (this level suggests that these readers are working at the expected level for their age), and therefore it was thought appropriate that these should be classified as higher level rather than lower level readers.

Table 22: Repeated measures ANOVA with condition and a median split of reading level as factors

Effects                                             df     F         p         η²
Main effect (condition)                             1,65   1.379     0.245     0.016
Main effect (time)                                  2,64   100.558   <0.001*   0.664
Interaction effect (condition*time)                 2,64   1.086     0.344     0.005
Interaction effect (reading level*time)             2,64   8.515     0.001*    0.055
Interaction effect (reading level*time*condition)   2,64   0.559     0.574     0.004
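The grouping rule described in footnote 19, in which participants at the median reading level are placed in the ‘higher’ group, can be sketched as follows. The function name, the numeric coding of reading levels and the example values are all hypothetical; the thesis does not specify how levels were coded.

```python
from statistics import median


def median_split(levels):
    """Assign 'higher' to levels at or above the median, 'lower' otherwise."""
    cut_point = median(levels)
    return ["higher" if level >= cut_point else "lower" for level in levels]


# Hypothetical numeric reading levels for seven pupils (not data from the study).
example_levels = [1, 2, 2, 3, 3, 3, 4]
print(median_split(example_levels))
```

Placing ties at the median in the upper group is one of two possible conventions, and it is the one the footnote reports; it leaves the groups unequal whenever several participants share the median level.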

Table 23: Mean assessment scores for higher and lower level readers across the three assessments

                       Pre-assessment   Post-assessment   Delayed post-assessment
Reading level          Mean    SD       Mean    SD        Mean    SD
Lower level readers    2.23    2.046    10.97   7.985     9.23    6.986
Higher level readers   5.03    3.208    21.54   8.316     20.36   9.004
Total                  3.81    3.079    16.94   9.680     15.52   9.848

Post-hocs for interaction effect between reading level and time

Planned comparisons were run to further explore the interaction effect between reading level and time. Nine t-tests were run in total, and therefore a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six paired samples t-tests were run to explore lower level and higher level readers’ assessment scores over time. Significant differences between pre- and post-assessment scores and between pre- and delayed post-assessment scores were found for both lower and higher level readers, all with large effect sizes (see table 24): both groups’ scores significantly increased over time (see table 23 above). No significant difference was observed between post- and delayed post-assessment scores for either group (see table 24). This suggests that participants of both higher and lower reading levels increased their conceptual understanding over the course of the interventions.

Table 24: Post-hoc paired samples t-tests comparing assessment scores over time for lower and higher level readers

                       Lower level readers                 Higher level readers
Assessments            df   t        p         Cohen’s D   df   t         p         Cohen’s D
Pre to post            29   -6.113   <0.001*   -1.12       38   -15.051   <0.001*   -2.41
Pre to delayed post    29   -6.459   <0.001*   -1.18       38   -13.399   <0.001*   -2.15
Post to delayed post   29   2.161    0.039     0.39        38   2.353     0.049     0.38

Three further independent samples t-tests were conducted to compare the mean scores of the lower and higher level readers on each of the three assessments. A significant difference between groups was found for each of the three assessments, each with a large effect size (see table 25), with the higher level readers scoring significantly higher on each of the three assessments (see table 23).

Table 25: Post-hoc independent samples t-tests comparing assessment scores between lower and higher level readers

Assessments               df   t        p         Cohen’s D
Pre-assessment            67   -4.158   <0.001*   1.04
Post-assessment           67   -5.326   <0.001*   1.30
Delayed post-assessment   67   -5.592   <0.001*   1.38

7.2.2.3. Concerns with assessing the influence of reading ability

The above findings suggest that reading level influenced participants’ progress on assessments between pre- and post-assessments, and between pre- and delayed post-assessments. Further to this, they suggest that once reading level was controlled for, the interaction effect between condition and time disappeared. However, there are two further considerations to be made. Firstly, reading level was not seen to influence the retention of conceptual understanding from post- to delayed post-assessments. In the original ANOVA, which compared the assessment scores of the two conditions over time (see section 7.2.1), it was between the post- and delayed post-assessments that a significant difference was observed across the two conditions. Therefore, the findings about the influence of reading level do not refute this original finding. Secondly, higher level readers scored significantly higher on pre-assessments than lower level readers, before they had encountered reading materials in the interventions. This suggests that reading level might have influenced participants’ ability to access assessments, rather than ability to develop conceptual understanding in response to texts.

Three Pearson correlations were conducted to explore the relationship between participants’ reading levels and pre-, post- and delayed post-assessment scores. There was a significant positive correlation between reading levels and assessment scores on all three assessments (see table 26). The correlation between reading levels and assessment scores was moderate on the pre-assessment, and increased to large on the post- and delayed post-assessments. The moderate correlation between reading levels and pre-assessment scores, before any teaching took place, supports the suggestion that higher level readers were more able to access the written assessments than lower level readers.

Table 26: Pearson correlations between reading levels and assessment scores

Assessment                r       df   p
Pre-assessment            0.425   67   <0.001*
Post-assessment           0.565   67   <0.001*
Delayed post-assessment   0.521   67   <0.001*

However, these results might equally reflect the findings from the Spearman’s rank correlation discussed previously (see section 7.2.2), which explored the relationship between reading and history ability: reading levels significantly positively correlated with history levels, and therefore higher level readers might have scored higher on pre-assessments because they also tend to be of a higher level in terms of history ability. This is thought to be unlikely, though, as it remains questionable whether participants’ history levels assigned in school can reflect knowledge of a specific subject (World War One) that participants have not previously studied in school.

Overall, analyses conducted with the factor of reading ability controlled for suggest that reading ability did influence participants’ assessment scores over time. Specifically, it was observed that reading ability influenced progress from pre- to post- and from pre- to delayed post-assessments, with higher level readers making more progress between these assessments than lower level readers. However, reading ability did not influence the retention of understanding between post- and delayed post-assessments, which is where the initial difference between conditions was observed. The finding that reading ability also correlated with pre-assessment scores suggests that reading ability might not necessarily have influenced participants’ responses to the texts, but rather, their ability to access written assessments.

7.2.3. Analysis of three different themes of interventions

The written assessments contained six questions on each of the three themes covered in the intervention sessions. Three repeated measures ANOVAs were conducted, comparing the progress made by conditions on assessment questions relating to each individual theme. For each of these three analyses, only participants who were absent from the specific intervention in question, and those who had missed one of the three assessments, were excluded. The number of participants in each analysis will be stated. As these analyses considered separate interventions, statistical power was reduced. One further repeated measures ANOVA was conducted to explore assessment scores on the two questions that spanned all three intervention themes.

7.2.3.1. Intervention 1: War Begins

A repeated measures ANOVA was conducted for the 73 participants who were present for both the first intervention session and all three assessments. A total of 18 points was available for the six questions considered in this analysis, which assessed participants’ understanding of how WWI began. A significant main effect was observed for condition, with a moderate effect size (see table 27), with the NNF condition scoring significantly higher than the ET condition on assessments overall (see table 28). No significant interaction effect was observed between condition and time, suggesting that for this theme, neither condition performed significantly differently over the course of the three assessments.

Table 27: Repeated measures ANOVA for assessment questions relating to Intervention 1: War Begins

Effect                                df     F        p         η²
Main effect (time)                    2,70   70.299   <0.001*   0.560
Main effect (condition)               1,71   4.661    0.034*    0.062
Interaction effect (condition*time)   2,70   2.261    0.112     0.016

Table 28: Mean scores and standard deviations for assessment questions relating to Intervention 1: War Begins

            Pre-assessment   Post-assessment   Delayed post-assessment   Total
Condition   Mean    SD       Mean    SD        Mean    SD                Mean
NNF         1.78    2.085    7.50    4.359     6.61    4.493             5.296
ET          1.27    1.465    5.22    3.735     4.97    3.670             3.820
Total       1.52    1.804    6.34    4.187     5.78    4.151             4.558

7.2.3.2. Intervention 2: Trench Life

A second repeated measures ANOVA was run to explore assessment questions about life in the trenches and the Battle of the Somme (Intervention 2). In this analysis, 72 participants were included. A total of 15 points was available for these questions. Results are presented in table 29. A significant interaction effect between condition and time was observed, with a small effect size: the NNF condition appeared to show a greater increase in scores by the post- and delayed post-assessments than the ET condition (see table 30). The exact nature of this interaction will be explored in more detail below. A significant main effect for condition was also observed, with a moderate effect size, with those in the NNF condition scoring significantly higher overall on assessments than those in the ET condition (see table 30).

Table 29: Repeated measures ANOVA for assessment questions relating to Intervention 2: Trench Life

Effect                                df     F        p         η²
Main effect (time)                    2,69   56.838   <0.001*   0.519
Main effect (condition)               1,70   9.896    0.002*    0.124
Interaction effect (condition*time)   2,69   5.209    0.008*    0.047

Table 30: Mean scores and standard deviations for assessment questions relating to Intervention 2: Trench Life

            Pre-assessment   Post-assessment   Delayed post-assessment   Total
Condition   Mean    SD       Mean    SD        Mean    SD                Mean
NNF         0.53    0.810    4.72    3.048     4.50    3.167             3.250
ET          0.39    0.599    2.64    2.428     2.50    2.580             1.843
Total       0.46    0.711    3.68    2.930     3.50    3.040             2.547

Post-hocs for interaction effect between condition and time

Planned comparison post-hoc analyses were conducted to explore where the significant interaction effect between condition and time lies for Intervention 2. Nine post-hoc tests were conducted in total, and therefore a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, three paired samples t-tests were run for the NNF condition and three for the ET condition, comparing differences in scores between pairs of assessments. Significant differences were found between pre- and post-assessment scores and pre- and delayed post-assessment scores for both conditions, all with large effect sizes (see table 31): both conditions made progress between these pairs of assessments (see previous table 30). In neither condition was a significant effect observed between post- and delayed post-assessment scores, suggesting that both conditions retained understanding effectively.

Table 31: Post-hoc paired samples t-tests comparing assessment scores over time for each condition

                       NNF                                 ET
Assessments            df   t        p         Cohen’s D   df   t        p         Cohen’s D
Pre to post            36   -9.160   <0.001*   -1.51       37   -6.177   <0.001*   -1.00
Pre to delayed post    36   -8.073   <0.001*   -1.33       37   -5.440   <0.001*   -0.88
Post to delayed post   36   0.944    0.352     0.16        37   0.758    0.453     0.12

Three further independent samples t-tests were run to compare the mean scores of each assessment across the two conditions. A significant difference between conditions was found on post-assessments and delayed post-assessments, both with a moderate effect size (see table 32), with the NNF condition scoring significantly higher than the ET condition on both assessments (see table 30 above). No significant difference was observed between pre-assessments. This suggests that, whilst both conditions made progress on these questions, the NNF condition made more progress than the ET condition by both post- and delayed post-assessments.

Table 32: Post-hoc independent samples t-tests comparing assessment scores across conditions

Assessments                df   t       p        Cohen’s D
Pre-assessments            70   0.827   0.411    0.20
Post-assessments           70   3.208   0.002*   0.75
Delayed post-assessments   70   2.938   0.004*   0.69

7.2.3.3. Intervention 3: The Home Front

Finally, a repeated measures ANOVA was conducted to explore assessment questions based on the third and final intervention: The Home Front. Overall, 16 points were available for these questions, and 72 participants were included in this analysis. No significant main effect was observed for condition, and neither was a significant interaction effect observed for condition and time (see table 33).

Table 33: Repeated measures ANOVA for assessment questions relating to Intervention 3: The Home Front

Effect                                df     F        p         η²
Main effect (time)                    2,69   57.770   <0.001*   0.526
Main effect (condition)               1,70   3.817    0.055     0.052
Interaction effect (condition*time)   2,69   1.829    0.168     0.016

Table 34: Mean scores and standard deviations for assessment questions relating to Intervention 3: The Home Front

            Pre-assessment    Post-assessment   Delayed post-assessment   Total
Condition   Mean    SD        Mean    SD        Mean    SD                Mean
NNF         0.31    0.749     3.92    2.802     3.19    2.765             2.472
ET          0.17    0.447     2.81    2.584     2.06    2.137             1.676
Total       0.24    0.617     3.36    2.734     2.62    2.520             2.074

7.2.3.4. Questions spanning all three interventions

Finally, a repeated measures ANOVA was run to explore scores on the two assessment questions that spanned all three intervention themes. A possible total of 9 points was available for these two questions. As these questions spanned all interventions, participants absent from any of the three interventions were excluded, leaving 69 participants in this analysis. A significant interaction effect between condition and time was observed, with a small effect size (see table 35): participants in the NNF condition scored higher at both post- and delayed post-assessments than those in the ET condition (see table 36). Further post-hocs will be conducted to explore this interaction in more detail below. A significant main effect was also observed for condition (see table 35), with a large effect size, with the NNF condition scoring higher on these questions overall than the ET condition (see table 36).

Table 35: Repeated measures ANOVA for assessment questions spanning all three intervention themes

Effects                               df     F        p         η²
Main effect (condition)               1,67   13.742   <0.001*   0.170
Main effect (time)                    2,66   86.097   <0.001*   0.534
Interaction effect (condition*time)   2,66   4.956    0.010*    0.033

Table 36: Mean scores and standard deviations for assessment questions spanning all three intervention themes

            Pre-assessment    Post-assessment   Delayed post-assessment   Combined assessment scores
Condition   Mean    SD        Mean    SD        Mean    SD                Mean
NNF         1.88    1.409     4.21    1.366     4.44    1.397             3.510
ET          1.11    1.132     3.40    1.882     2.57    2.118             2.362
Total       1.49    1.324     3.80    1.685     3.49    2.019             2.936

Post-hocs for interaction effect between condition and time

Planned comparison post-hoc analyses were conducted to explore where the significant interaction effect between condition and time lies. Nine post-hoc tests were conducted in total, and therefore a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six t-tests were conducted, looking at the NNF and ET conditions individually (see table 37). A significant difference between pre- and post-assessments, and between pre- and delayed post-assessments, was found for each condition, both with a large effect size. Both conditions showed an increase in scores between these pairs of assessments (see previous table 36). No significant difference was found for either condition between post- and delayed post-assessments.

Table 37: Post-hoc paired samples t-tests comparing assessment scores over time for each condition

                       NNF                                   ET
Assessments            df   t         p         Cohen’s D    df   t        p         Cohen’s D
Pre to post            33   -9.208    <0.001*   -1.58        34   -7.919   <0.001*   -1.34
Pre to delayed post    33   -10.217   <0.001*   -1.75        34   -5.009   <0.001*   -0.85
Post to delayed post   33   -1.071    0.292     -0.18        34   2.498    0.018     0.42

Three further independent samples t-tests were run to explore whether there was a significant difference between conditions on each of the three assessments. A significant difference, with a large effect size, was found between conditions on delayed post-assessments (see table 38). The NNF condition scored higher on the delayed post-assessment than the ET condition (see table 36 above). Unexpectedly, this difference in delayed post-assessment scores resulted not only from the ET condition’s scores decreasing slightly between post- and delayed post-assessments, but also from the NNF condition’s scores increasing between these two assessments (see table 36).

Table 38: Post-hoc independent samples t-tests comparing assessment scores across conditions

Assessments                df       t       p         Cohen’s D
Pre-assessments            67       2.500   0.015     0.60
Post-assessments           67       2.031   0.046     0.49
Delayed post-assessments   59.056   4.340   <0.001*   1.04

Overall, it appears that participants’ responses to different text types were influenced by the theme of the text. A significant interaction effect between condition and time was observed for the Trench Life intervention only: whilst both conditions made progress in their conceptual understanding over time, NNF participants developed a greater conceptual understanding and retained this by delayed post-assessments. In addition, whilst both conditions made progress over time on the two questions that spanned all three intervention themes, the NNF condition scored significantly higher on these questions at the delayed post-assessment than the ET condition, suggesting that participants in the NNF condition retained conceptual understanding more effectively.

7.2.4. Analysis of specific areas of conceptual understanding

Assessment questions were assigned to one of four categories according to which historical thinking skill they addressed: simple conceptual thinking, complex conceptual thinking, chronological thinking or causal thinking. Analyses below will explore these four thinking skills in turn, exploring whether condition influenced responses to questions addressing these skills.

7.2.4.1. Simple conceptual thinking

There were six simple conceptual thinking questions, providing a possible total of 18 points. These questions required participants to recall single units of information. A significant interaction effect was found between condition and time (see table 39), with a small effect size. Whilst participants in the ET condition showed an increase in understanding from pre- to post-assessment, this declined slightly by the delayed post-assessment; participants in the NNF condition showed an increase in understanding between all three assessments (see table 40). This interaction effect will be explored further below. A significant main effect was also found for condition, with a large effect size (see table 39): the NNF condition scored significantly higher in assessments overall than the ET condition (see table 40).

Table 39: Repeated measures ANOVA for simple conceptual thinking questions

Effects                               df     F         p         η²
Main effect (condition)               1,67   12.724    0.001*    0.160
Main effect (time)                    2,66   129.719   <0.001*   0.696
Interaction effect (condition*time)   2,66   4.862     0.011*    0.025

Table 40: Mean scores and standard deviations for simple conceptual thinking questions

            Pre-assessment    Post-assessment   Delayed post-assessment   Combined assessment scores
Condition   Mean    SD        Mean    SD        Mean     SD               Mean
NNF         3.24    2.257     9.88    3.883     10.71    5.000            7.941
ET          1.91    1.634     7.06    4.249     6.74     3.697            5.238
Totals      2.57    2.061     8.45    4.286     8.70     4.791            6.590

Post-hocs for interaction effect between condition and time

Post-hoc planned comparisons were conducted to explore the significant interaction effect between condition and time. Nine t-tests were conducted, and therefore a Bonferroni correction with an alpha of 0.006 was applied. Firstly, six t-tests were run to explore the progress that each condition made between pairs of assessments. Results can be found in table 41. A significant difference in assessment scores, with a large effect size, was found between pre- and post-assessments, and between pre- and delayed post-assessments, for both the NNF and ET conditions. Both conditions performed significantly better at post- and delayed post-assessments than at pre-assessments (see table 40, above). For neither condition was the difference between post- and delayed post-assessment scores significant.

Table 41: Post-hoc paired samples t-tests comparing assessment scores over time for each condition

                       NNF                                   ET
Assessments            df   t         p         Cohen’s D    df   t        p         Cohen’s D
Pre to post            33   -12.860   <0.001*   -2.21        34   -8.890   <0.001*   -1.50
Pre to delayed post    33   -10.784   <0.001*   -1.85        34   -9.867   <0.001*   -1.67
Post to delayed post   33   -1.856    0.072     -0.32        34   0.665    0.510     0.11

To explore the difference in assessments across conditions, three further independent samples t-tests were conducted. No significant difference was observed between pre-assessment scores across conditions, although a significant difference was observed between post-assessment scores, with a moderate effect size, and delayed post-assessment scores, with a large effect size (see table 42). The NNF condition scored significantly higher than the ET condition in both of these assessments (see table 40).

Table 42: Post-hoc independent samples t-tests comparing assessment scores across conditions

Assessment                df       t       p         Cohen’s D
Pre-assessment            60.042   2.778   0.007     0.68
Post-assessment           67       2.881   0.005*    0.69
Delayed post-assessment   67       3.751   <0.001*   0.90

7.2.4.2. Complex conceptual thinking

There were five complex conceptual thinking questions, offering a possible total of 15 points. These questions required participants to combine multiple pieces of historical information. No significant interaction effect was found between condition and time for these questions, nor was a main effect found for condition (see table 43). Therefore, there did not appear to be a differential effect of condition on responses to complex conceptual thinking questions.

Table 43: Repeated measures ANOVA for complex conceptual thinking questions

Effect                                df     F        p         η²
Main effect (condition)               1,67   2.416    0.125     0.035
Main effect (time)                    2,66   56.446   <0.001*   0.570
Interaction effect (condition*time)   2,66   0.819    0.445     0.008

Table 44: Mean scores and standard deviations for complex conceptual thinking questions

            Pre-assessment    Post-assessment   Delayed post-assessment   Combined assessment scores
Condition   Mean    SD        Mean    SD        Mean    SD                Mean
NNF         0.21    0.538     4.47    3.250     3.79    2.921             2.824
ET          0.03    0.169     3.43    3.099     2.80    2.677             2.086
Total       0.12    0.404     3.94    3.194     3.29    2.824             2.455

7.2.4.3. Chronological thinking

The questions assessing chronological understanding required learners either to chronologically sequence events or to recall the date of events. There were three chronological questions, offering a potential total of 9 points. A repeated measures ANOVA was run to explore differences between conditions for these assessment questions. No significant main effect was observed for condition, and neither was a significant interaction effect observed between condition and time (see table 45).

Table 45: Repeated measures ANOVA for chronological thinking questions

Effect                                df     F        p         η²
Main effect (condition)               1,67   1.957    0.166     0.028
Main effect (time)                    2,66   37.310   <0.001*   0.353
Interaction effect (condition*time)   2,66   0.871    0.423     0.008

Table 46: Mean scores and standard deviation for chronological thinking questions

            Pre-assessment    Post-assessment   Delayed post-assessment   Combined assessment scores
Condition   Mean    SD        Mean    SD        Mean    SD                Mean
NNF         0.68    0.843     2.44    1.673     2.26    1.959             1.794
ET          0.57    0.948     1.83    1.403     1.91    1.422             1.438
Totals      0.62    0.893     2.13    1.562     2.09    1.704             1.616

On the additional chronological sequencing task, groups were assigned one point for each event placed in the correct chronological position, and one point for each event labelled with the correct date. This gave a potential total of 12 points. In both the NNF and ET conditions, 12 groups completed this task, giving a total of 24 groups. An independent samples t-test was conducted to compare the performance of conditions on this task. A significant difference was observed between conditions, with a large effect size (t(22) = -3.592, p = 0.002, Cohen’s D = 1.47), with the NNF condition scoring just over twice as many points as the ET condition (see table 47).

Table 47: Mean scores and standard deviation for each condition on sequencing task

Condition   Mean   SD
NNF         8.17   2.657
ET          4.08   2.906
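The independent samples t-test on the sequencing task can be approximately reconstructed from the summary statistics in table 47. The sketch below assumes a pooled-variance (equal variances) t-test and a Cohen’s d based on the pooled standard deviation; the thesis does not specify either choice, and the function name is hypothetical. Because the means in table 47 are rounded to two decimal places, the result is expected to be close to, rather than identical to, the reported values.

```python
import math


def t_and_d_from_summary(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Pooled-variance t statistic, degrees of freedom and Cohen's d from summary data."""
    df = n_1 + n_2 - 2
    pooled_variance = ((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df
    pooled_sd = math.sqrt(pooled_variance)
    d = (mean_1 - mean_2) / pooled_sd
    t = d / math.sqrt(1 / n_1 + 1 / n_2)
    return t, df, d


# Means and SDs from table 47; 12 groups completed the task in each condition.
t, df, d = t_and_d_from_summary(8.17, 2.657, 12, 4.08, 2.906, 12)
print(f"t({df}) = {t:.2f}, d = {d:.2f}")
```

The sign of t depends only on which condition is entered first; here the NNF condition is entered first, so t is positive, whereas the reported statistic is negative.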

7.2.4.4. Causal thinking

Assessments contained six causal questions overall, with a potential total of 16 points. A repeated measures ANOVA was run to compare the performance of conditions on these questions. A significant interaction effect was observed between condition and time (see table 48), with a small effect size: participants in the NNF condition scored higher on both the post- and delayed post-assessments than those in the ET condition (see table 49). A significant main effect was also observed for condition, with a moderate effect size (see table 48): participants in the NNF condition scored higher on causal thinking questions overall than those in the ET condition (see table 49). Post-hocs exploring the interaction effect between condition and time will be presented below.

Table 48: Repeated measures ANOVA for causal thinking questions

Effect                                df     F        p         η²
Main effect (condition)               1,67   5.050    0.028*    0.070
Main effect (time)                    2,66   41.774   <0.001*   0.445
Interaction effect (condition*time)   2,66   4.885    0.011*    0.054

Table 49: Mean scores and standard deviations for causal thinking questions

            Pre-assessment    Post-assessment   Delayed post-assessment   Combined assessment
            means             means             means                     scores
Condition   Mean    SD        Mean    SD        Mean    SD                Mean
NNF         0.47    1.134     3.50    2.946     3.15    2.664             2.373
ET          0.49    0.781     1.89    2.111     1.89    1.906             1.419
Totals      0.48    0.964     2.68    2.665     2.51    2.380             1.896

Post-hocs for interaction effect between condition and time

Post-hoc planned comparisons were conducted to explore the interaction effect between condition and time. Nine t-tests were run; consequently, a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six t-tests were conducted to explore the progress of each condition over time. Results can be found in table 50. In the NNF condition, assessment scores differed significantly between pre- and post-assessments with a large effect size; the same was observed for the ET condition, but with a moderate effect size. Both conditions scored significantly higher on the post-assessment than the pre-assessment (see previous table 49). For both conditions, assessment scores differed significantly between pre- and delayed post-assessments, both with a large effect size: again, both conditions scored significantly higher in the delayed post- than the pre-assessment. For neither condition was a significant difference observed between post- and delayed post-assessment scores.

Table 50: Post-hoc paired samples t-tests comparing assessment scores over time for each condition

                       NNF                                  ET
Assessments            df   t        p         Cohen’s D    df   t       p         Cohen’s D
Pre to post            33   -6.889   <0.001*   -1.18        34   4.682   <0.001*   0.79
Pre to delayed post    33   -7.056   <0.001*   -1.21        34   5.320   <0.001*   0.90
Post to delayed post   33   1.234    0.226     0.21         34   0.000   1.000²⁰   0.0

Three further t-tests were conducted to explore whether there were any differences between the performance of conditions on each of the three assessments. No significant differences were found between conditions for any of the three assessments (see table 51). Despite the initial repeated measures ANOVA suggesting that the two conditions performed differently over time, no differences between conditions were observed in post-hoc analyses. However, this might be the result of the Bonferroni correction applied, resulting in a smaller adjusted alpha: the difference between the post- assessment scores of conditions was approaching this adjusted alpha. Alternatively, there could be a difference between groups, in terms of assessment scores over time, that might not have been captured by the selected post-hoc comparisons run here.

Table 51: Post-hoc independent samples t-tests comparing assessment scores across conditions

Assessments               df       t        p       Cohen’s D
Pre-assessment            67       -0.065   0.949   0.02
Post-assessment           59.709   2.609    0.011   0.63
Delayed post-assessment   67       2.267    0.027   0.54
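The effect of the Bonferroni correction on the comparisons in table 51 can be illustrated with a short calculation. The sketch below assumes an uncorrected family-wise alpha of 0.05, which is not stated in this section but is consistent with the reported adjusted alpha (0.05 / 9 ≈ 0.006); the variable names are illustrative.

```python
def bonferroni_alpha(familywise_alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_tests


FAMILYWISE_ALPHA = 0.05  # assumption: conventional threshold, not stated in this section
adjusted_alpha = bonferroni_alpha(FAMILYWISE_ALPHA, n_tests=9)
print(f"adjusted alpha = {adjusted_alpha:.4f}")

# p-values copied from table 51 (independent samples t-tests, causal thinking)
table_51_p_values = {
    "Pre-assessment": 0.949,
    "Post-assessment": 0.011,
    "Delayed post-assessment": 0.027,
}

for assessment, p in table_51_p_values.items():
    print(
        f"{assessment:24s} p = {p:.3f}  "
        f"below 0.05: {p < FAMILYWISE_ALPHA}  "
        f"below adjusted alpha: {p < adjusted_alpha}"
    )
```

Under this assumption, the post- and delayed post-assessment comparisons fall below the uncorrected threshold but not below the adjusted one, which is the pattern described in the paragraph above.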

20 The mean scores in the post-assessment and the delayed post-assessment were identical for the ET condition.

As no significant effects emerged in the post-hoc t-tests presented above, three repeated measures ANOVAs were also conducted as post-hocs. Significant interaction effects between condition and time were found between both pre- and post-assessments (with a moderate effect size) and pre- and delayed post-assessments (with a small effect size) (see table 52). Participants in the NNF condition made significantly more progress between pre- and post-assessments and between pre- and delayed post-assessments than those in the ET condition (see previous table 49).

Table 52: Post-hoc repeated measures ANOVA for causal thinking questions, looking at the interaction effect between condition and time

Assessments            df     F       p        η²
Pre to post            1,67   9.487   0.003*   0.065
Pre to delayed post    1,67   7.721   0.007*   0.050
Post to delayed post   1,67   1.045   0.310    0.015

Overall, both simple conceptual thinking and causal thinking showed significant interaction effects between condition and time. For simple conceptual thinking, whilst both conditions made significant progress from pre- to post- and from pre- to delayed post-assessments, the NNF condition scored significantly higher on both the post- and delayed post-assessments than the ET condition. In terms of causal thinking, a significant interaction effect was observed between condition and time between pre- and post- assessments and between pre- and delayed post-assessments, again with those in the NNF condition making more progress between these assessments than those in the ET condition. In terms of chronological thinking, it appears that neither condition outperformed the other on chronological assessment questions, but that the NNF condition performed better on the additional chronological sequencing task. Finally, condition was not found to influence participants’ complex conceptual thinking skills.

7.3. Additional factors: descriptive and inferential statistics

Background data were collected to allow an exploration of whether various, additional factors might influence participants’ responses to intervention sessions. Data were collected from parents (through questionnaires in the consent package (Appendix D(iv))), participants (in questionnaires attached to assessments (Appendix E)), and from schools. This section of the data analysis will explore these factors. For each factor, descriptive statistics will be presented before inferential statistics are discussed. All descriptive statistics include the 69 participants that were present for the entire experiment. The number of participants differed across inferential statistics depending on missing data, and will be stated before these statistics are reported. Firstly, situational and individual interest will be considered, including differences across conditions and any effect these variables might have on assessment scores. Following this, participants’ self-evaluations of both their prior and developed conceptual understanding of WWI will be explored, including an exploration of sources of participants’ prior knowledge. Prior knowledge, as measured by pre-assessments, will then be considered in relation to post- and delayed post-assessment scores. Finally, the influence of other factors – primarily reading outside of school – will be considered in relation to participants’ performance on assessments.

7.3.1. Situational interest

To measure situational interest, participants were asked to rate their enjoyment of intervention sessions (on a 5-point Likert scale). A majority of participants stated that they really enjoyed or enjoyed the intervention sessions (see table 53), whilst approximately a third of participants stated that they ‘did not mind’ the intervention sessions.

Table 53: Frequency of participants' ratings of enjoyment of intervention sessions

Enjoyment of intervention sessions   Frequency   Percentage of participants
Really enjoyed                       17          24.6%
Enjoyed                              26          37.7%
Did not mind                         21          30.4%
Did not like                         4           5.8%
Really did not like                  0           0.0%
No response                          1           1.4%

A Mann-Whitney U test was conducted to explore whether participants’ enjoyment ratings of interventions differed across conditions. As one participant did not respond to this question, analyses in this section were run with 68 participants. A statistically significant effect was observed for condition (U=346.500, N1=34, N2=34, p=0.003): the mean rank was higher for participants in the NNF condition than in the ET condition²¹ (mean ranks: NNF = 41.31; ET = 27.69), suggesting that participants in the NNF condition showed a greater enjoyment of intervention sessions than those in the ET condition (see figure 15).

Figure 15: Participants’ reported enjoyment of NNF and ET intervention sessions
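The mean ranks reported for the Mann-Whitney U test can be illustrated with a minimal sketch: ratings from both conditions are pooled, ranked from lowest to highest (tied ratings share the average of their ranks), and then averaged within each condition. The ratings below are hypothetical, not the study data, and the function names are illustrative. The final lines apply the standard relation U = R − n(n + 1)/2, where R is a group's rank sum, to the reported ET mean rank as a consistency check.

```python
def average_ranks(values):
    """Rank values from lowest to highest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # mean of the 1-based ranks start+1 .. end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared_rank
        start = end + 1
    return ranks


def mean_ranks_and_u(group_a, group_b):
    """Return (mean rank of A, mean rank of B, smaller U statistic)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    sum_a, sum_b = sum(ranks[:n_a]), sum(ranks[n_a:])
    u_a = sum_a - n_a * (n_a + 1) / 2
    u_b = sum_b - n_b * (n_b + 1) / 2
    return sum_a / n_a, sum_b / n_b, min(u_a, u_b)


def u_from_mean_rank(mean_rank, n):
    """U for one group, recovered from its mean rank and group size."""
    return n * mean_rank - n * (n + 1) / 2


# Hypothetical 5-point ratings (5 = really enjoyed), NOT the study data.
hypothetical_nnf = [5, 4, 4, 3]
hypothetical_et = [4, 3, 3, 2]
print(mean_ranks_and_u(hypothetical_nnf, hypothetical_et))

# Consistency check against the reported ET mean rank (27.69, n = 34).
print(round(u_from_mean_rank(27.69, 34), 1))
```

With the reported values, the ET mean rank of 27.69 and a group size of 34 give a U of approximately 346.5, in line with the statistic reported above.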

The finding that those in the NNF condition showed a greater enjoyment of intervention sessions than those in the ET condition opens up the possibility that situational interest might have positively influenced the development of conceptual understanding in the NNF condition. To explore this possibility, a repeated measures ANOVA was run with condition and participants’ reported enjoyment of interventions as additional factors. No significant interaction effects were observed (see table 54). This suggests that although participants in the NNF condition reported greater levels of enjoyment of the interventions, enjoyment neither influenced performance on assessments over time, nor did it interact with condition to influence assessment scores over time.

21 The mean rank is calculated instead of the rank for ordinal data. It is calculated by collating all scores from both groups of data (in this case, the NNF condition and the ET condition) into one column, before rank ordering these scores from lowest to highest. Once ranks have been assigned, the data is split back into the original two groups. The mean rank of each group is then calculated. Therefore, the higher the mean rank, the more highly rated the interventions were.

Table 54: Repeated measures ANOVA with condition and participants’ reported enjoyment of interventions as factors

Effects                    df      F        p         η²
Condition                  1,61    2.558    0.115     0.036
Time²²                     2,60    67.122   <0.001*   0.609
Condition*time             2,60    1.733    0.185     0.009
Time*enjoyment             6,120   1.746    0.116     0.041
Condition*time*enjoyment   4,122   0.861    0.490     0.004

7.3.2. Individual interest: ratings of enjoyment of history learning

Participants were asked to rate their enjoyment of history learning on a 5-point Likert scale on the pre- and post-questionnaires, to assess individual interest in history generally. On the pre-questionnaire, a majority of participants stated that they either loved or liked learning about history (see table 55). Just 3 participants stated that they did not like learning about history. On the post-questionnaire, whilst a majority of participants still stated that they loved or liked learning about history, the number of participants giving this rating declined.

Table 55: Frequency of participants' ratings of enjoyment of history learning before and after intervention sessions

Rating of enjoyment of   Pre-questionnaire        Post-questionnaire
history learning         Frequency   Percentage   Frequency   Percentage
Love                     17          24.6%        12          17.4%
Like                     24          34.8%        20          29.0%
Do not mind              25          36.2%        29          42.0%
Do not like              3           4.3%         6           8.7%
Really do not like       0           0%           2           2.9%
No response              0           0%           0           0%

Two Mann-Whitney U analyses were conducted to explore whether participants rated enjoyment of history learning differently across conditions, on either pre- or post-questionnaires. A statistically significant difference was observed between the two conditions for ratings on the pre-questionnaire and on the post-questionnaire (see table 56). For both, the mean rank was higher for NNF than ET participants (see table 57), suggesting that NNF participants generally enjoyed history learning more.

22 No post-hocs were conducted for the main effect for time in this analysis or any of those below; with factors removed to explore the main effect of time, these post-hoc tests would be identical to those run for the main effect for time in the initial repeated measures ANOVA that compared assessment scores across conditions over time (see section 7.2.1).

Table 56: Mann-Whitney U exploring difference in ratings of enjoyment of history learning across conditions, for both pre- and post-questionnaires

Rating of history learning   U         N1   N2   p
Pre-questionnaire            437.500   34   35   0.046*
Post-questionnaire           395.500   34   35   0.011*

Table 57: Mean rank of NNF and ET participants’ ratings of enjoyment of history learning on pre- and post-questionnaires

Rating of history    Mean Rank
learning             NNF     ET
Pre-questionnaire    39.63   30.50
Post-questionnaire   40.87   29.30

Further Wilcoxon tests²³ were run to explore whether ratings of enjoyment of history learning changed over the course of the intervention sessions for each condition. For the NNF condition, there was no significant change in ratings from pre- to post-questionnaire. However, for the ET condition there was a significant difference between pre-questionnaire and post-questionnaire ratings (see table 58), with 16 participants giving a lower rating of their enjoyment of history on the post-questionnaire than on the pre-questionnaire. Five participants gave a higher rating on the post-questionnaire, whilst 14 participants gave the same rating. This suggests that the ET condition may have negatively influenced participant perceptions of their enjoyment of history learning.

Table 58: Wilcoxon results exploring differences in ratings of history learning enjoyment over time for each condition

Condition   Z        p
NNF         -1.507   0.132
ET          -2.559   0.010*

23 Wilcoxon tests were run here instead of further Mann-Whitney U tests because the samples were paired, rather than independent.

7.3.3. Individual interest: ratings of interest in WWI

To assess individual interest in WWI specifically, participants were asked whether WWI was a topic that interested them, on both the pre- and post-questionnaires. Participants answered either ‘yes’ or ‘no’. A majority of participants stated an interest in WWI both before and after the intervention sessions (see table 59), although the number interested did decrease between the two questionnaires.

Table 59: Frequency of participants interested in WWI

                  Pre-questionnaire        Post-questionnaire
Interest in WWI   Frequency   Percentage   Frequency   Percentage
Interest          51          73.9%        47          68.1%
No interest       14          20.3%        22          31.9%
No response       4           5.8%         0           0.0%

Two chi-square tests were conducted to explore whether participants responded to this question differently across conditions, on either pre- or post-questionnaires. Results are reported in table 60. No significant difference in interest in WWI was observed on pre-questionnaires across the two conditions. However, a significant difference between participants’ interest in WWI was observed between conditions on post-questionnaires, with more participants in the NNF condition expressing an interest than in the ET condition (see table 61). In the NNF condition, there was very little change in interest between pre- and post-questionnaires: the same number of participants stated an interest in WWI, and the number of participants stating no interest in WWI increased by two²⁴ (see table 61). Conversely, in the ET condition, the number of participants stating an interest in WWI decreased between pre- and post-questionnaires (see table 61). However, a McNemar test showed that the decrease in interest observed in the ET condition was not significant (df=1, p=0.146²⁵)²⁶.

24 Some participants gave no response to this question, which is why the responses of 32 participants are reported here for the pre-questionnaire in the NNF condition, and 34 for the post-questionnaire. The same is true of the ET condition.

25 No chi-square statistic is reported here. For this analysis there was a small number of discordant cells (25 or below); therefore binomial distribution is used and no chi-square statistic is reported.

26 As the McNemar test compared scores across both questionnaires, any participants who did not answer the question on interest on either the pre- or post-questionnaire were excluded, leaving a total of 65 participants in this analysis.

Table 60: Chi-squares exploring participants’ interest in WWI topic as stated before and after interventions

Questionnaire   χ²      df   p        Phi (φ)
Pre             3.047   1    0.081    -0.217
Post            6.256   1    0.012*   -0.301

Table 61: Number of participants in each condition expressing interest in WWI on pre- and post-questionnaires

                              Number of participants
Questionnaire   Response      NNF   ET
Pre             Interest      28    23
                No interest   4     10
Post            Interest      28    19
                No interest   6     16
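The chi-square statistics and phi coefficients in table 60 can be approximately recovered from the counts in table 61. The sketch below assumes a Pearson chi-square for a 2×2 table without continuity correction, with phi taken as √(χ²/N); the thesis does not state these choices explicitly, and the function name is hypothetical. Only the magnitude of phi is computed, since its sign depends on how the categories were coded.

```python
import math


def chi_square_and_phi(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) and |phi| for the 2x2 table

        [[a, b],
         [c, d]]
    """
    n = a + b + c + d
    chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    phi = math.sqrt(chi_square / n)
    return chi_square, phi


# Counts from table 61: rows = interest / no interest, columns = NNF / ET.
pre_chi_square, pre_phi = chi_square_and_phi(28, 23, 4, 10)
post_chi_square, post_phi = chi_square_and_phi(28, 19, 6, 16)

print(f"Pre:  chi-square = {pre_chi_square:.3f}, |phi| = {pre_phi:.3f}")
print(f"Post: chi-square = {post_chi_square:.3f}, |phi| = {post_phi:.3f}")
```

Under these assumptions the recomputed statistics can be compared directly with table 60.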

Overall, the general enjoyment of history learning was maintained by participants in the NNF condition over the course of the interventions, whilst the ET participants’ ratings of enjoyment of history learning decreased. While both the NNF condition and the ET condition maintained an approximately consistent level of interest in WWI over the course of the interventions, the NNF condition showed a significantly higher level of interest in WWI following the interventions than the ET condition.

7.3.4. The influence of individual interest on assessment scores

The above results suggest that the NNF condition maintained levels of individual interest, whereas the ET condition seemed to have a negative effect on individual interest. Further repeated measures ANOVAs were run to explore whether individual interest (both enjoyment of history learning and interest in WWI) influenced the development and retention of conceptual understanding. As this required a large number of tests, and does not directly pursue any of the research questions, only a summary of the main results is reported here. These analyses can be found in full detail in Appendices U and V.

In terms of an individual interest in history generally, it was found that participants’ ratings of enjoyment of history learning on post-questionnaires significantly influenced participants’ progress on assessments over time: those who showed a greater enjoyment of history learning made more progress from pre- to post- and from pre- to delayed post-assessments than those with a lesser enjoyment (see figure 16). No significant difference was found between post- and delayed post-assessments. Further to this, a significant three-way interaction effect was observed between post-questionnaire enjoyment ratings, time and condition. Further post-hoc analyses revealed that those who loved learning about history made significantly more progress from pre- to post-assessments in the ET condition than in the NNF condition, yet also showed a significantly larger decrease in scores from post- to delayed post-assessment. This resulted in participants who loved learning about history making similar progress from pre- to delayed post-assessments in both conditions (see figure 17). However, as this last finding involved a very small and uneven sample size (NNF participants who loved history = 8; ET participants who loved history = 4), it will not be considered further.

Figure 16: Mean scores of participants with different ratings of enjoyment of history learning

Figure 17: Mean assessment scores of participants who love learning about history across conditions

In terms of an interest in WWI specifically, it was found that interest (as rated both before and after interventions) influenced progress on assessments over time. For pre-questionnaire ratings, although both those who showed an interest in WWI and those who did not show an interest made significant progress from pre- to post- and from pre- to delayed post-assessments, those who showed an interest scored significantly higher on both post- and delayed post-assessments. However, those with an interest in WWI showed a significant decrease in scores from post- to delayed post-assessments, whereas those with no interest showed no significant decrease in scores. Post-questionnaire ratings of interest showed a similar pattern: although those who showed an interest in WWI and those who did not both made significant increases in scores from pre- to post- and from pre- to delayed post-assessments, those who showed an interest scored significantly higher on post- and delayed post-assessments.

A three-way interaction was also observed between interest, time and condition for ratings made on the pre-questionnaire. It was found that, although both those who stated an interest and those who did not state an interest in WWI made significant progress from pre- to post- and from pre- to delayed post-assessments in the ET condition, those who stated an interest scored significantly higher on post-assessments. However, these participants also showed a significant decline in assessment scores between post- and delayed post-assessments, leading to them making similar progress overall from pre- to delayed post-assessments as those who did not state an interest (see figure 18). Those with an interest in WWI were not found to score higher on any of the three assessments in the ET condition than in the NNF condition. However, this was a relatively small and uneven sample size (interest N=23; no interest N=10).

Figure 18: Mean scores of ET participants who stated an interest/did not state an interest in WWI

Finally, one further test was run to explore whether an individual interest in WWI (prior to the interventions) might have indicated a heightened prior knowledge. An independent samples t-test was run to compare pre-assessment scores for those interested in WWI and those not interested in WWI. A significant difference between groups was found, with a large effect size (t(50.452) = -3.589, p = 0.001, Cohen’s D = 0.82). Those with an interest in WWI scored higher on pre-assessments, indicating a more advanced prior knowledge (see table 62).

Table 62: Mean pre-assessment scores for those interested/not interested in WWI

Interest stated    N     Mean    SD
Interest           58    4.33    3.295
No interest        16    2.19    1.642
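The comparison reported above can be approximately recomputed from the summary statistics in table 62 alone. The following is a minimal sketch, assuming an unequal-variances (Welch) t-test, which the fractional degrees of freedom suggest, and assuming that Cohen’s D was calculated from the root of the averaged group variances; the thesis does not state either choice, and the function name is hypothetical. Small discrepancies from the reported values are expected because the table means are rounded.

```python
import math

def welch_from_summary(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Welch's t, Welch-Satterthwaite df and Cohen's d from group summaries."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    # Squared standard error contributed by each group
    se2_a, se2_b = var_a / n_a, var_b / n_b
    t = (mean_a - mean_b) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    # Assumption: d uses the square root of the mean of the two variances
    d = abs(mean_a - mean_b) / math.sqrt((var_a + var_b) / 2)
    return t, df, d

# Table 62: no interest (N=16) compared with interest (N=58)
t, df, d = welch_from_summary(2.19, 1.642, 16, 4.33, 3.295, 58)
print(f"t({df:.3f}) = {t:.3f}, d = {d:.2f}")
```

Under these assumptions the recomputed values agree with the reported t, degrees of freedom and effect size to within rounding.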

Overall, it appears that individual interest supported the development of conceptual understanding, but not the retention of this understanding. In addition, those with an individual interest in WWI specifically showed a greater development of conceptual understanding in the ET condition than those who did not state an interest in WWI. However, participants with an individual interest also showed a significant decline in conceptual understanding from post- to delayed post-assessments, leading to them making a similar amount of progress in their conceptual understanding overall to those in the ET condition who showed no interest in WWI. Participants with an individual interest in WWI also had greater levels of topic-specific prior knowledge than those with no such interest.

7.3.5. Participant self-evaluations of conceptual understanding

Participants’ perceptions of their own conceptual understanding of WWI were assessed in both the pre-questionnaire (assessing prior knowledge) and in the post-questionnaire (assessing developed knowledge). In both pre- and post-questionnaires, participants were asked to rate their knowledge of WWI on a scale of 1 to 3, with one being ‘I don’t know anything’, two being ‘I know a little’, and three being ‘I know a lot’. These questions were asked to establish degrees of prior knowledge, and also to gain an insight into participants’ perceptions of their own learning over the course of interventions. Firstly, descriptive statistics will be presented. Inferential analyses will then consider whether participants’ self-evaluations of knowledge differed across conditions, and whether self-evaluations changed over time, in relation to the two conditions. Finally, the sources of self-evaluated knowledge will be briefly considered.

Pre-questionnaire ratings of knowledge of WWI indicate that levels of prior knowledge were generally quite low, with almost half of participants stating that they had no knowledge of WWI (see table 63). However, post-questionnaire ratings show that overall, participants did feel that they had learned more about WWI during intervention sessions, with just 13% still stating that they had no knowledge of WWI.

Table 63: Participants' self-evaluations of knowledge of WWI before and after interventions

                      Pre-questionnaire          Post-questionnaire
Knowledge of WWI      Frequency   Percentage     Frequency   Percentage
A lot of knowledge    2           2.9%           9           13.0%
A little knowledge    33          47.8%          47          68.1%
No knowledge          31          44.9%          9           13.0%
No response           3           4.3%           4           5.8%

Two Mann-Whitney U tests were conducted to explore whether there was a significant difference between the self-evaluations of participants in the NNF and ET conditions on pre- and post-questionnaires. Some participants did not answer these questions: the number of participants included in each analysis is given in the table below. No significant difference was observed between conditions in terms of self-evaluations on either of the questionnaires (see table 64). This suggests that in neither condition did participants feel they had greater levels of prior knowledge or had developed more knowledge in response to interventions.

Table 64: Mann-Whitney U tests comparing self-evaluations of NNF and ET conditions on pre- and post-questionnaires

Questionnaire    U          N     N2    p
Pre              499.000    32    34    0.511
Post             471.000    34    31    0.349

Two Wilcoxon tests were run to assess whether participants’ self-evaluations changed over time in each condition. Participants who did not answer both self-evaluation questions were excluded from analyses, leaving a total of 64 participants. Both conditions showed a significant difference between self-evaluations on pre- and post-questionnaires (see table 65). In the NNF condition, only 1 participant gave a lower self-evaluation of their knowledge of WWI on the post-questionnaire than the pre-questionnaire, whilst 19 participants gave a higher self-evaluation. Twelve participants gave the same self-evaluation on pre- and post-questionnaires. Similarly, in the ET condition, 1 participant gave a lower self-evaluation of knowledge on the post-questionnaire, 10 participants gave a higher self-evaluation, and 19 gave the same self-evaluation. Overall, participants’ self-evaluations increased from pre- to post-questionnaires in both conditions.

Table 65: Wilcoxon tests exploring self-evaluations of participants over time, in the NNF and ET conditions

Condition    Z        p
NNF          4.025    <0.001*
ET           2.652    0.008*
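The direction of change described above can be roughly cross-checked from the reported counts of higher and lower self-evaluations alone. The sketch below uses an exact two-sided sign test, which is a simpler test than the Wilcoxon test actually reported and is swapped in here only because the individual ratings are not available; its p values are therefore not expected to match table 65. The function name is hypothetical.

```python
from math import comb

def sign_test_p(n_higher, n_lower):
    """Exact two-sided sign test; participants with tied ratings are dropped."""
    n = n_higher + n_lower
    k = min(n_higher, n_lower)
    # Probability of k or fewer changes in the rarer direction under a fair coin
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

p_nnf = sign_test_p(n_higher=19, n_lower=1)  # NNF condition counts
p_et = sign_test_p(n_higher=10, n_lower=1)   # ET condition counts
print(f"NNF: p = {p_nnf:.6f}; ET: p = {p_et:.4f}")
```

Even with this less powerful test, both conditions show an increase that is unlikely to be due to chance, in line with the pattern of the Wilcoxon results.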

On both the pre- and post-questionnaires, participants were asked where they had acquired any knowledge of WWI. For the pre-questionnaire responses, answers were grouped into four categories: other people, books/TV, computer games, and experiences. While 36 participants provided a response to this question, some participants gave multiple sources in their response: each source given was assigned to a category. The most commonly stated source of prior knowledge was books/TV (see table 66). Within this category, films or TV were referred to 7 times, books were referred to 16 times, and Horrible Histories (a series of books also made into a TV programme) was mentioned 4 times. No participants mentioned the internet. These results highlight how important it is to develop an understanding of how children learn from different types of texts.

Participants were asked the same question on the post-questionnaire. As expected, there was a new category of answer: the intervention sessions. This time, 52 participants provided responses to the question (some of these provided multiple sources) (see table 66). Answers relating to the intervention sessions included those such as ‘listening to texts’, ‘reading text and discussion’ and ‘memorising facts from the text’.

Table 66: Sources of participants’ prior knowledge and developed knowledge

                                                                            Number of participants
Source            Examples                                                  Prior knowledge    Developed knowledge
Other people      Friends, family                                           18                 14
Books/TV          Books, documentaries, Horrible Histories                  27                 14
Computer games    War-related computer games                                3                  1
Experiences       Museums, WWI dance                                        2                  0
Lessons/texts     Intervention sessions, texts read during interventions    0                  34

Overall, whilst both the NNF and ET conditions showed an increase in self-evaluated knowledge over the course of the interventions, there was no difference between the two conditions’ self-evaluated knowledge either before the interventions or after the interventions. This suggests that participants in neither condition felt that they had benefited to a greater extent in terms of increased conceptual understanding. In stating where they had developed their knowledge of WWI, a majority of participants recognised that they had developed this in response to intervention sessions.

7.3.6. Prior knowledge

This section will explore the potential influence of prior knowledge on assessment scores, where prior knowledge is assessed by pre-assessment scores. Pearson correlations were conducted to explore the relationship between pre-assessment scores and both post- and delayed post-assessment scores. A positive correlation was observed between pre- and post-assessment scores and between pre- and delayed post-assessment scores, both with large effect sizes (see table 67): the higher participants’ pre-assessment scores, the higher their post- and delayed post-assessment scores (see figures 19 and 20). Four further Pearson correlations were conducted to explore whether this pattern was the same across the two conditions: once more, significant, positive correlations were found between pre- and post-assessment scores and between pre- and delayed post-assessment scores for participants in each condition, all with large effect sizes (see table 68).

Table 67: Pearson correlations for participants’ pre-assessment scores and post- /delayed post-assessment scores

Assessment      r        df    p
Post            0.686    67    <0.001*
Delayed post    0.752    67    <0.001*
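To illustrate how strongly the correlations in table 67 depart from zero, each r can be converted to a t statistic using the standard identity t = r * sqrt(df / (1 - r^2)), and squared to give the proportion of shared variance. This is a minimal sketch using only the reported r and df values; the helper name is hypothetical, and describing r of 0.5 or above as a large effect follows Cohen’s conventional benchmark rather than anything stated in the table.

```python
import math

def r_to_t(r, df):
    """t statistic for testing a Pearson correlation against zero."""
    return r * math.sqrt(df / (1 - r ** 2))

# (r, df) as reported in table 67
rows = {"Post": (0.686, 67), "Delayed post": (0.752, 67)}

# For each assessment: (t statistic, proportion of shared variance r^2)
results = {name: (r_to_t(r, df), r ** 2) for name, (r, df) in rows.items()}

for name, (t_value, shared) in results.items():
    print(f"{name}: t = {t_value:.2f}, r^2 = {shared:.3f}")
```

With 67 degrees of freedom, t statistics of this size correspond to p values far below 0.001, consistent with the table, and pre-assessment scores share roughly half of their variance with later scores.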

Figure 19: Correlation of participants’ pre- and post-assessment scores

Figure 20: Correlation of participants’ pre- and delayed post-assessment scores

Table 68: Pearson correlations for participants’ pre-assessment scores and post- /delayed post-assessment scores for the NNF and ET conditions

Condition    Assessment      r        df    p
NNF          Post            0.698    32    <0.001*
NNF          Delayed post    0.759    32    <0.001*
ET           Post            0.643    33    <0.001*
ET           Delayed post    0.683    33    <0.001*

7.3.7. Influence of additional, personal factors

A number of additional factors were considered in relation to participants’ assessment scores, to explore whether they influenced participants’ progress over the course of interventions, particularly in relation to condition. Age, gender and participants’ preferred text type were not found to have any influence on participants’ progress over time, or across conditions. Whether children were entitled to free school meals or eligible for pupil premium was also considered: no significant effect on assessment scores was found (see footnote 27). The repeated measures ANOVAs exploring these factors can be found in Appendix W. Information on additional factors such as ethnicity, whether English was a native language and whether participants were on the special educational needs (SEN) register was also collected. However, there were no participants with English as a second language, only four participants did not identify as White British, and only six participants were on the SEN register. Therefore, no analyses were run for these factors. There were two factors which were found to influence assessment scores over time: the average number of hours a participant spent reading at home in a week and whether the participant classified reading as a hobby. These factors will be explored below.

7.3.7.1. Hours spent reading at home

In the questionnaire sent home to parents, parents were asked how many hours their child spent reading in an average week. The largest group of parents (39.1%) stated that their child spent over four hours reading at home in an average week, whilst very few stated that their child spent less than an hour reading at home in an average week (see table 69).

Table 69: Average number of hours participants spent reading at home, as stated by parents

Hours reading at home    Frequency    Percentage
0 to 1 hours             3            4.3%
1 to 2 hours             11           15.9%
2 to 3 hours             15           21.7%
3 to 4 hours             11           15.9%
4+ hours                 27           39.1%
No response              2            2.9%

27 These analyses were only run for pupils in School A, as data on free school meals and pupil premium for individual participants were unobtainable from School B.

Parents were given ranges to choose from (0 to 1 hours, 1 to 2 hours, and so forth), making this a categorical variable. Therefore, a repeated measures ANOVA was run, with both condition and hours spent reading as factors, to explore the influence of this factor on participants’ assessment scores over time. In this analysis, 67 participants were included, as two parents did not respond to this question. A significant interaction effect was observed between time and the number of hours that parents reported their child read at home in an average week, with a small effect size (see table 70). The more hours a participant was reported to spend on average reading at home, the higher their scores in each of the three assessments (see figure 21). Further post-hoc analyses were conducted to explore this interaction effect; these are reported below.

Table 70: Repeated measures ANOVA with condition and average number of hours spent reading at home in a week as factors

Effects                                 df       F         p          η²
Time                                    2,57     63.729    <0.001*    0.586
Condition                               1,58     3.824     0.055      0.052
Time*condition                          2,57     2.549     0.087      0.018
Time*hours reading at home              8,116    2.019     0.050*     0.057
Time*condition*hours reading at home    6,116    0.462     0.835      0.007

Figure 21: Mean assessment scores according to average number of hours spent reading at home in a week

Further repeated measures ANOVAs were run as post-hocs to explore how the average number of hours a week spent reading at home influenced participants’ progress between paired assessments. Results are presented in table 71. A significant interaction effect between number of hours spent reading and time was observed between both pre- and post-assessments and pre- and delayed post-assessments, both with a moderate effect size. The more hours a child read on average at home, the more progress they made between these pairs of assessments (see figure 21 above). However, number of hours spent reading at home did not influence the retention of understanding from post- to delayed post-assessment.

Table 71: Post-hoc repeated measures ANOVAs exploring interaction between time and average number of hours spent reading at home in a week

Assessments             df      F        p         η²
Pre to post             4,62    3.774    0.008*    0.084
Pre to delayed post     4,62    3.067    0.023*    0.080
Post to delayed post    4,62    1.094    0.368     0.062

7.3.7.2. Reading as a hobby

During the pre-questionnaire, participants were asked to select their three favourite hobbies from a list of seven activities. Whilst computer games was the most frequently selected hobby, reading was also popular, selected by 39.1% of participants (see table 72).

Table 72: Frequency of participants that selected various hobbies

Hobbies               Frequency    Percentage of participants (see footnote 28)
Reading               27           39.1%
TV                    23           33.3%
Board games           9            13.0%
Computer games        40           58.0%
Listening to music    26           37.7%
Playing outside       39           56.5%
Playing sport         28           40.6%

Of all possible hobbies, a significant effect was only observed for reading. Analyses for other hobbies can be found in Appendix W. A repeated measures ANOVA, with both condition and reading as a hobby as additional factors, showed a significant interaction effect between reading as a hobby and time, with a small effect size (see table 73). Those participants who selected reading as a hobby scored higher than those who did not select reading as a hobby on all three assessments (see table 74).

Table 73: Repeated measures ANOVA with condition and reading as a hobby as factors

Effects                              df      F          p          η²
Time                                 2,64    127.294    <0.001*    0.708
Condition                            1,65    10.084     0.002*     0.108
Condition*time                       2,64    4.589      0.014*     0.022
Time*reading as a hobby              2,64    7.349      0.001*     0.041
Time*condition*reading as a hobby    2,64    0.592      0.556      0.02

28 Percentages do not add up to 100% because participants could choose their three favourite hobbies from the list.

Table 74: Mean scores and standard deviations of participants who listed reading as a hobby and those who did not

                       Pre              Post               Delayed post
                       Mean    SD       Mean     SD        Mean     SD
Reading as a hobby     5.00    4.000    22.30    10.347    20.41    10.896
Reading not a hobby    3.05    2.012    13.50    7.517     12.38    7.730

Post-hocs for interaction effect between reading as a hobby and time

Post-hoc planned comparisons were conducted to explore the significant interaction effect between reading as a hobby and time. Nine t-tests were conducted, and therefore a Bonferroni correction with an alpha of 0.006 was applied. Firstly, six t-tests were run to explore the progress that both those with reading as a hobby (N=27) and those without reading as a hobby (N=42) made between pairs of assessments. Results can be found in table 75. Both those with and without reading as a hobby made significant progress from pre- to post- and from pre- to delayed post-assessments, with large effect sizes. No effect was observed for post- to delayed post-assessments for either group.

Table 75: Post-hoc paired samples t-tests comparing assessment scores over time

                        Reading as a hobby                          Reading not a hobby
Assessments             df    t          p          Cohen’s D      df    t          p          Cohen’s D
Pre to post             26    -11.142    <0.001*    -2.14          41    -10.291    <0.001*    -1.59
Pre to delayed post     26    -10.147    <0.001*    -1.95          41    -8.850     <0.001*    -1.37
Post to delayed post    26    2.423      0.023      0.47           41    1.855      0.071      0.29
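The effect sizes in table 75 can be related to the t statistics directly. The sketch below assumes that Cohen’s D for each paired comparison was obtained as t divided by the square root of the number of pairs (df + 1); this formula is not stated in the text, but it reproduces every value in the table to two decimal places. The helper name is hypothetical.

```python
import math

def paired_d(t, df):
    """Cohen's d for a paired t-test, taken as t / sqrt(n) with n = df + 1."""
    return t / math.sqrt(df + 1)

# (df, t, reported Cohen's D) from table 75
table_75 = {
    "hobby, pre to post": (26, -11.142, -2.14),
    "hobby, pre to delayed post": (26, -10.147, -1.95),
    "hobby, post to delayed post": (26, 2.423, 0.47),
    "no hobby, pre to post": (41, -10.291, -1.59),
    "no hobby, pre to delayed post": (41, -8.850, -1.37),
    "no hobby, post to delayed post": (41, 1.855, 0.29),
}

recomputed = {label: paired_d(t, df) for label, (df, t, _) in table_75.items()}

for label, (_, _, reported) in table_75.items():
    print(f"{label}: recomputed {recomputed[label]:.2f}, reported {reported:.2f}")
```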

To explore the difference in assessment scores across groups, three further independent samples t-tests were conducted. Groups scored significantly differently at post- and delayed post-assessments, with large effect sizes (see table 76): those with reading as a hobby scored significantly higher on both assessments than those without (see table 74).

189 Table 76: Post-hoc independent samples t-tests comparing assessment scores across participants with and without reading as a hobby

Assessment                 df        t        p          Cohen’s D
Pre-assessment             34.562    2.352    0.024      0.62
Post-assessment            43.462    3.817    <0.001*    0.97
Delayed post-assessment    67        3.580    0.001*     0.85
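The Bonferroni threshold applied to the nine post-hoc tests in tables 75 and 76 can be sketched as follows. The family-wise alpha of 0.05 is an assumption (it is the value that yields the stated corrected alpha of 0.006 when divided by nine), and p values reported as <0.001 are entered as the upper bound 0.001, which is conservative. The dictionary keys are illustrative labels only.

```python
FAMILY_ALPHA = 0.05  # assumed family-wise alpha; not stated explicitly in the text

# p values from tables 75 and 76; "<0.001" is entered as its upper bound 0.001
p_values = {
    "hobby: pre to post": 0.001,
    "hobby: pre to delayed post": 0.001,
    "hobby: post to delayed post": 0.023,
    "no hobby: pre to post": 0.001,
    "no hobby: pre to delayed post": 0.001,
    "no hobby: post to delayed post": 0.071,
    "between groups: pre": 0.024,
    "between groups: post": 0.001,
    "between groups: delayed post": 0.001,
}

# Bonferroni: divide the family-wise alpha by the number of tests
threshold = FAMILY_ALPHA / len(p_values)
significant = {label: p <= threshold for label, p in p_values.items()}

print(f"Corrected alpha = {threshold:.4f}")
for label, flag in significant.items():
    print(f"{label}: {'significant' if flag else 'not significant'}")
```

Under these assumptions, the six comparisons marked with an asterisk in the tables fall below the corrected threshold, whereas the values of 0.023, 0.024 and 0.071 do not, even though the first two would pass an uncorrected alpha of 0.05.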

Overall, those engaged with reading outside of school developed a greater conceptual understanding in response to the texts than those less engaged with reading. In terms of participants’ self-evaluations of knowledge, it was evident that participants in both conditions felt that they had developed a conceptual understanding of WWI over the course of the interventions. Prior to the interventions, the most common source of prior knowledge was books/TV, suggesting just how important research into learning in response to different text types is. Whilst prior knowledge correlated with both post- and delayed post-assessment scores in both conditions, this does not indicate whether participants with greater levels of prior knowledge made more progress across assessments than those with lower levels.

7.4. Discussion

This discussion will explore the findings presented above in relation to the first two research questions, presented at the beginning of this chapter (see page 142). Firstly, it will focus on the development of conceptual understanding overall and in relation to specific historical thinking skills and themes. Next, it will discuss the retention of conceptual understanding. Following this, it will consider the impact of additional factors on understanding, focusing on situational interest, individual interest and reading ability. This discussion will briefly propose possible theories to explain differences between the two conditions; a greater depth of discussion, synthesising these findings with those gained during discourse analysis, will follow in the general discussion chapter.

7.4.1. The development of conceptual understanding

Both text types supported the development of conceptual understanding between pre- and post-assessments to a similar degree. However, differences between conditions emerged in more specific areas: Trench Life assessment questions, simple conceptual thinking questions and causal thinking questions. In addition, whilst chronological thinking was not observed to develop differently across conditions on assessment questions, participants in the NNF condition outperformed those in the ET condition on the additional chronological sequencing task. These areas of conceptual understanding will be examined in turn below.

7.4.1.1. The development of conceptual understanding on Trench Life assessment questions

Participants in the NNF condition scored significantly higher on post-assessment questions and delayed post-assessment questions relating to the Trench Life theme than those in the ET condition: not only did NNF participants develop a greater conceptual understanding, but they also successfully retained this in the longer-term, over the course of six weeks. This theme was categorised as the ‘unfamiliar experiences’ theme: it described experiences of soldiers living and fighting in trenches, a concept far removed from participants’ everyday experiences. Narrative theory suggests that when reading a narrative, the reader is able to transport or relocate into their mental representation of the narrative (Gerrig, 1993; Herman, 2013); this process of transportation might have supported participants in conceptually understanding the unfamiliar concepts presented in the Trench Life text more concretely. It allows readers to leave behind the rules and constraints of their reality (Browning & Hohenstein, 2015), encouraging them to conceive of more unfamiliar, distant concepts. With regard to more familiar themes, such as the Home Front, transportation might not be necessary to concretise concepts: as the experiences portrayed parallel the everyday experiences of participants, they are already relatable and comparable (for example, children attending school), and as such, participants can make direct comparisons, contextualising textual information in relation to their own lives. As a result, those in the ET condition were equally able to develop a conceptual understanding of the Home Front text as those in the NNF condition.

However, this theory does not explain why conditions performed similarly on War Begins questions, as these texts also referred to concepts unfamiliar to participants (such as alliances between countries and the making of tactical decisions by leaders). Two possibilities might explain this. Firstly, the War Begins texts dealt primarily with a sequence of events, and the Trench Life texts with experiences of soldiers. It might be that the content influenced the way in which these texts were processed: War Begins texts may have been primarily processed chronologically (as a result, participants in the NNF condition outperformed those in the ET condition on the additional chronological sequencing task related to these texts), whereas Trench Life texts might have been processed in terms of understanding events in relation to how the soldiers experienced them. This leads into the second possibility, that differences might be due to the presence of a protagonist: the War Begins NNF text considered the actions of four historical figures, whereas the Trench Life NNF text followed the journey of one, specific protagonist. Perhaps when unfamiliar ideas were portrayed in relation to a specific protagonist, participants were more inclined to view events in the text through the eyes of the protagonist, which might have enabled transportation into the mental representation of the text. In this sense, a protagonist can almost be considered to be a vehicle into the mental representation. These possibilities will be considered further in the general discussion, in light of the discourse analysis findings.

7.4.1.2. The development of simple conceptual thinking

Whilst participants in both conditions developed simple conceptual thinking skills effectively, those in the NNF condition scored significantly higher on such questions on both post- and delayed post-assessments: not only did they develop simple conceptual thinking skills more effectively, but they retained these in the longer-term. This suggests that participants constructed stronger, more appropriate mental representations of NNF texts than ETs, from which they could retrieve understanding to construct accurate responses to simple conceptual thinking questions. As this ability was also observed at delayed post-assessments, this also suggests that NNF participants were more effective at integrating this understanding into longer-term pre-existing knowledge structures. This will be explored in further detail in the general discussion, in light of discussions about the mental representations constructed in response to texts.

7.4.1.3. The development of chronological thinking

The NNF condition outperformed the ET condition on the additional, physical chronological sequencing task. An underlying chronological sequence is thought to be a defining feature of narrative (Richmond et al., 2011), and this might have supported the development of chronological thinking skills in the NNF condition. Whilst information was presented in the same chronological sequence in both NNF and ETs, there were subtle differences between texts in the presentation of chronological information, which might have influenced chronological thought. These will be discussed in detail in section 9.1.1.1 of the general discussion. However, this finding must be treated with caution: it cannot be assumed that narrative invariably supports chronological thinking, as improved chronological thought was only observed in relation to one NNF text, and therefore might result from the theme of the text. As considered above, the War Begins text described a sequence of key events that were chronologically and causally connected, whereas the Trench Life and Home Front texts considered the experiences of groups of people. Arguably, chronology is less pertinent to the theme of these latter two texts. Therefore, processing of the War Begins NNF text specifically might have focused on the chronological structure of the text, thus resulting in enhanced chronological thought.

7.4.1.4. The development of causal thinking

It was also found that NNF texts supported both the development and retention of the ability to think causally. Like chronology, causality is thought to be central to narrative structures (Richardson, 2000). However, this was not reflected in the way that NNF and ETs were written for this research: there were no evident differences in the presentation of causal information across text types. Despite this, causality and chronology are closely related, in that causally linked events must occur in a particular chronological order: for instance, if one event causes another, it must also precede it chronologically. Therefore, enhanced chronological thinking skills might have supported enhanced causal thinking.

7.4.2. The retention of conceptual understanding

As expected, conceptual understanding declined in both conditions over the six weeks between post- and delayed post-assessments. This decrease was significant for the ET condition only, subsequently causing significantly lower delayed post-assessment scores in the ET condition than the NNF condition. This suggests that the NNF texts supported participants’ retention of conceptual understanding over a longer period of time, thus strengthening overall learning. In addition, for questions that spanned all three interventions, the two conditions’ scores were comparable on post-assessments, but the NNF condition scored more highly on delayed post-assessments, once more showing heightened retention of conceptual understanding. Three possible reasons are proposed for these findings. Firstly, narrative texts are closely related to episodic memory (Mar, 2004; Larison, 2018), and retention of textual information has been observed to be greater in response to episodically rich than episodically poor texts (Herbert & Burt, 2004). Perhaps the context-specific, episodically rich nature of the NNF texts supported retention of understanding because it encouraged participants to draw on the episodic memory to a greater extent than the more generalised, episodically poorer ETs. Alternatively, retention might relate to the nature of mental representations constructed. Situation models have been observed to decay more slowly than textbases (Kintsch et al., 1990; Fletcher & Chrysler, 1990), and therefore perhaps participants in the NNF condition constructed stronger situation models representing texts, which they were able to retain for a longer period of time. However, this possibility would require participants to retain situation models over the course of seven to nine weeks between intervention sessions and the delayed post-assessment. As such, improved performance at delayed post-assessments might indicate that conceptual understanding developed in response to NNF texts was more effectively integrated into pre-existing knowledge structures than that from ETs, leading to enhanced retrieval of this understanding in the longer-term. These possibilities will be considered further in the general discussion.

7.4.3. Additional factors

Additional factors that may have influenced the development and retention of conceptual understanding will be discussed in this section. Firstly, participants’ situational and individual interest will be considered, in conjunction with the influence of prior knowledge. Next, whether participants enjoyed reading as a hobby, or read regularly at home, will be explored. Finally, participants’ reading abilities will be examined in relation to assessment scores.

7.4.3.1. Situational and individual interest

This research considered the potential impact of both situational interest and individual interest on the development and retention of conceptual understanding. Findings indicated that whilst participants in the NNF condition showed higher levels of situational interest than those in the ET condition, this did not have any impact on assessment scores over time, or across conditions. This finding supports the suggestions of Krapp et al. (1992), who state that because situational interest is quite suddenly evoked by a stimulus, the effect it has is short-term, and only marginally influences conceptual understanding developed in response to the stimulus. However, situational interest can serve ‘as the basis for the emergence of individual interests’ (Krapp et al., 1992:6).

In terms of individual interest, the NNF condition appeared to maintain participants’ levels of individual interest, whereas the ET condition was more likely to have a negative effect on individual interest. In addition, individual interest appeared to influence responses to the two text types. In the ET condition, participants who stated an interest in WWI developed a significantly greater conceptual understanding by the post-assessment than those who stated no interest. It might be argued that, as individual interest increases attention (Schraw et al., 2001), the increased attention allowed participants to overcome difficulties associated with reading ETs (such as content and structure). However, differences in terms of both content and structure were minimised across text types in this research, and therefore should not have presented barriers that required attention to overcome. Alternatively, it was also observed that an individual interest was typically associated with higher levels of prior knowledge: as these levels of prior knowledge were indicated by pre-assessment scores, this prior knowledge was topic-specific knowledge relating to WWI. In line with research elsewhere, this might indicate that greater levels of prior knowledge are important to the processing of, and possibly enhance the comprehension of, expository texts (Wolfe & Mienko, 2007; Best et al., 2008; Wolfe & Woodwyk, 2010). However, the initially developed conceptual understanding of those with an interest in WWI in response to ETs was not retained by the delayed post-assessments, leading to both those who stated an interest and those who stated no interest developing a similar level of conceptual understanding in response to ETs overall. This reflects one of the primary findings from assessment analyses: ETs do not effectively support the retention of developed conceptual understanding.

7.4.3.2. Reading as a hobby The average number of hours that participants spent reading at home in a week, as reported by parents, impacted participants’ development of conceptual understanding: the more hours spent reading at home, the more progress participants made from pre- to post-assessments, and from pre- to delayed post-assessments. However, number of hours spent reading did not influence retention of understanding from post- to delayed post-assessments. The same pattern was observed for those participants who had selected reading as a hobby. This is not an unexpected result: those who enjoy reading and read more regularly are likely to be more developed, higher level readers, and therefore are likely to have stronger comprehension skills. Neither of these factors influenced responses to different text types, and therefore will not be considered any further. However, these findings do raise the concern that reading to learn in the classroom may only benefit those learners who are more engaged with reading at home, leading to questions of how texts might be made more accessible to pupils who are not keen readers, and read more infrequently outside of school.

7.4.3.3. Reading ability Findings suggested that reading ability influenced participants’ development of conceptual understanding, but not the retention of understanding. Moreover, when reading was controlled for, condition was no longer found to influence performance made on assessments over time. However, this does not necessarily discredit findings for a number of reasons.

Firstly, reading ability was not observed to influence the retention of conceptual understanding; it was only in this area that the NNF texts were observed to support participants to a greater extent than ETs overall. If reading ability did not influence this area, this finding remains valid. Unfortunately, analyses could not be run to see whether reading ability influenced findings relating to specific areas of conceptual development: in isolating more specific dependent variables, there was less power in analyses, making the introduction of a reading level covariate inappropriate.

Secondly, there were concerns about the reliability of measures of reading ability. Reading levels were assessed by class teachers (five teachers in total across the two schools) at different stages in the year. Whilst the assignment of reading levels was likely to be partially informed by children’s scores on written comprehension tasks, there were also likely to be subjective differences in interpreting and assigning reading levels, especially across schools. Therefore, the levels assigned might lack reliability and comparability across teachers and schools. In addition, reading levels were observed to significantly correlate with pre-assessment scores. This suggests that reading levels might not have affected ability to develop conceptual understanding in response to texts, but rather, ability to access written assessments. Whilst participants were told that they could have assessment questions read aloud to them, to support those who might find reading more difficult, few participants requested this. Therefore, written assessments relied more heavily on reading comprehension skills, whereas the texts read by the researcher could also be accessed using listening comprehension skills, which are often stronger than reading comprehension skills for readers who find decoding difficult (Diakidoy et al., 2005). Not only might comprehending assessment questions have been a difficulty for lower level readers, but forming responses to these questions might also have been difficult: reading ability has been found to influence writing ability (Ahmed et al., 2014), and therefore poorer readers may have struggled to construct coherent written responses to questions, even if they did hold the requisite conceptual understanding to answer them.

Overall, while it appears that reading ability influenced participants’ ability to access both texts and written assessments, it is unclear as to the extent that reading ability influenced the effect of condition on the development of conceptual understanding. However, it appears that reading ability did not influence the retention of conceptual understanding. It is hoped that considering reading levels in relation to conceptual understanding shown during participant discussions will provide further insights into the extent that reading level might have influenced the development of conceptual understanding, as opposed to ability to perform on written assessments.

7.4.4. Conclusion Overall, it appears that NNF texts supported the retention of conceptual understanding to a greater extent than ETs, thus supporting longer-term learning. This may be due to the episodic richness of texts, or alternatively, the mental representations constructed in response to NNF texts, and the way in which information from these is transferred into pre-existing knowledge structures. In addition, the NNF texts seem to have supported the development of specific areas of conceptual understanding, including simple conceptual thinking, chronological thinking, causal thinking and a conceptual understanding of unfamiliar experiences, specifically, life in the trenches. Once more, these findings may be a result of the mental representations constructed in response to the different text types, and the ability to transport into these mental representations. Conversely, increased individual interest, and subsequently a greater level of prior knowledge, appeared to support the development of conceptual understanding in relation to ETs, suggesting that topic-specific prior knowledge may be more important to ET than NNF comprehension. Through considering the results presented above in conjunction with the results of discourse analyses, the general discussion aims to shed more light on how mental representations constructed in response to text types might have influenced observed differences in conceptual understanding.

Chapter Eight: Discourse Analysis and Findings

This chapter presents analyses of coded data gained from participant discussions. These discussions were transcribed and coded using three separate coding schemes: the function coding scheme, the content coding scheme, and the truth-value coding scheme. This section of data analysis seeks to provide insight into the three questions detailed below. These intend to then provide further insight into the main research questions, which will be considered during the general discussion.

(a) Does the appearance of different types of talk relate to differences in the development of conceptual understanding across conditions? (b) Does the appearance of different types of talk indicate how participants constructed conceptual understanding, and whether this was differential across conditions? (c) How do readers perceive the truth-value of narrative nonfiction and expository texts, and justify these perceptions, during discourse?

Firstly, this section will outline descriptive statistics to provide an overview of which codes were produced and the frequency with which they were produced. This section will consider all three coding schemes, and will also determine which codes were used frequently enough to be explored further in subsequent, inferential data analysis. Following this, inferential analyses will be reported. The section on inferential analyses will comprise two parts. The first part will explore questions (a) and (b) above, considering whether function and content codes differed across the two conditions, what might have influenced these differences, and what impact these differences might have had on assessment scores. Through looking at function codes specifically, it aims to shed light on the mental representations that participants may have constructed in response to texts. Content codes should indicate the conceptual understanding that function codes have supported the development of. The second part will address question (c) above, through exploring codes within the truth-value coding scheme: it will examine whether truth-value judgements and justifications differed across conditions, and whether these judgements influenced assessment scores.

As in previous analyses of assessment scores, where large numbers of analyses were conducted together (five or more related analyses), Bonferroni corrections were applied to minimise the chance of type I errors occurring (Appendix X). Where Bonferroni corrections were applied, the new alpha is stated in the text. When considering effect sizes in analyses, Cohen’s (1988) guidelines are used (table 77). Eta squared is reported for MANOVAs (see footnote 29). For a majority of the following analyses, all 78 participants were included: as this section focuses on the production of codes, those participants who missed intervention sessions or assessments did not need to be excluded from analyses.

Table 77: Cohen’s guidelines for effect sizes

Effect size   Eta squared (η²)   Pearson correlation coefficient   Cohen’s d
Small         0.01               0.10 – 0.29                       0.2
Moderate      0.06               0.30 – 0.49                       0.5
Large         0.138              0.5 – 1.0                         0.8
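
As a minimal sketch, the eta squared thresholds in table 77 could be applied programmatically as shown below. The function name `classify_eta_squared` and the label 'negligible' for values below 0.01 are illustrative assumptions and do not come from the thesis.

```python
# Thresholds for eta squared, taken from table 77 (Cohen, 1988).
ETA_SQUARED_THRESHOLDS = [(0.138, "large"), (0.06, "moderate"), (0.01, "small")]


def classify_eta_squared(eta_sq: float) -> str:
    """Return the effect-size label for an eta squared value.

    Values below the 'small' threshold are labelled 'negligible';
    this label is an assumption, as table 77 does not name that band.
    """
    for threshold, label in ETA_SQUARED_THRESHOLDS:
        if eta_sq >= threshold:
            return label
    return "negligible"


print(classify_eta_squared(0.166))  # combined effect reported in table 84
print(classify_eta_squared(0.121))  # combined effect reported in table 86
```

Under these thresholds the first value falls in the large band and the second in the moderate band, which agrees with how the corresponding effects are described in the text.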

8.1. Descriptive statistics Across the two conditions and the three coding schemes, 2311 codes were produced overall. Figure 22 shows the proportion of these codes from each coding scheme. Content codes were the most frequently produced codes, whilst truth-value codes appeared least frequently, due to these codes only being applied to the final question in each discussion. Of all codes, 1221 were produced in the NNF condition, and 1090 in the ET condition. The NNF condition produced a greater number of function and content codes, but the frequency of truth-value codes was similar across conditions.

29 MANOVAs only had one independent variable, and therefore eta squared and partial eta squared are equal for these analyses.

Figure 22: Overall number of codes produced, across coding schemes and conditions
[Bar chart. Categories: function codes, content codes, truth-value (judgement) codes, truth-value (justification) codes. Axis: number of codes (0 to 700). Series: NNF, ET.]

8.1.1. Function codes The function coding scheme (Appendix N) contained three overarching codes, with numerous specific codes within them (see table 78, on the following page). For a description of codes, see section 6.8.2 of the methodology chapter. Textual understanding codes occurred most frequently, whilst there were very few instances of navigation codes overall (see table 78).

In terms of specific function codes, many of these occurred relatively infrequently, with the exception of collaborative (cumulative), collaborative (negotiation) and hypothetical codes, all of which occurred over 100 times (see asterisked items in table 78). Because of this, these three specific codes will be considered further during inferential data analysis, whereas the remaining specific codes will only be considered further in terms of the overarching code that encompasses them.

Table 78: Frequency of overarching and specific function codes

Overarching code        Number of codes   Specific code                  Number of codes
Textual understanding   598               Direct                         46
                                          Support                        22
                                          Collaborative (cumulative)*    193*
                                          Collaborative (negotiation)*   154*
                                          Accuracy                       29
                                          Self-correction                6
                                          Hypothetical*                  129*
                                          Summarise                      3
                                          Absent information             16
Referencing             137               Prior knowledge                51
                                          Own lives                      17
                                          Relatives                      3
                                          Within text                    26
                                          Between texts                  3
                                          Prior materials                37
Navigation              26                Scanning                       11
                                          Paragraph                      3
                                          Stage                          8
                                          Page                           1
                                          Sub-headings                   3
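
As a quick consistency check, the specific code frequencies in table 78 can be summed and compared with the overarching totals. The sketch below uses the figures transcribed from the table; the variable names are illustrative only.

```python
# Frequencies transcribed from table 78: overarching total, then specific codes.
function_codes = {
    "Textual understanding": (598, {
        "Direct": 46, "Support": 22, "Collaborative (cumulative)": 193,
        "Collaborative (negotiation)": 154, "Accuracy": 29,
        "Self-correction": 6, "Hypothetical": 129, "Summarise": 3,
        "Absent information": 16,
    }),
    "Referencing": (137, {
        "Prior knowledge": 51, "Own lives": 17, "Relatives": 3,
        "Within text": 26, "Between texts": 3, "Prior materials": 37,
    }),
    "Navigation": (26, {
        "Scanning": 11, "Paragraph": 3, "Stage": 8, "Page": 1,
        "Sub-headings": 3,
    }),
}

# Collect any overarching code whose specific codes do not sum to its total.
mismatches = [
    name for name, (total, specific) in function_codes.items()
    if sum(specific.values()) != total
]
grand_total = sum(total for total, _ in function_codes.values())

print("Mismatched totals:", mismatches)
print("Function codes overall:", grand_total)
```

With the values as transcribed, each set of specific codes sums to its overarching total.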

8.1.2. Content codes Within the content coding scheme (Appendix O), utterances were firstly assigned an historical thinking code: conceptual, chronological or causal thinking (see section 6.8.3. for more detail). Of these codes, conceptual thinking occurred most frequently (figure 23); chronological and causal thinking codes are likely to have occurred less frequently because they are more specific in nature. As conceptual thinking is a broad thinking skill in comparison to chronological thinking and causal thinking (it encompasses any reference to history that is not chronological or causal in nature), only the overarching chronological and causal thinking codes will be considered during inferential analyses.

Next, each utterance was assigned one of two possible historical knowledge codes: recall (substantive knowledge was recalled or retrieved from the text) or inference (supporting the development of second-order knowledge). Recall codes occurred far more frequently than inference codes (see figure 23). Within the overarching chronological thinking code, no inference codes were assigned.

Figure 23: Frequency of historical knowledge content codes, across the three historical thinking content codes
[Chart. Categories: conceptual thinking, causal thinking, chronological thinking. Axis: number of codes (0 to 900). Series: Total, Recall, Inference.]

Finally, each utterance coded as ‘recall’ was assigned one of five specific codes that described the accuracy or depth of the recalled utterance. Of these specific codes, basic codes occurred most frequently. Inaccurate codes also occurred regularly, whilst partial, mixed and complex codes were the most infrequent of the five codes (see table 79).

Table 79: Frequency of the five specific content codes

Specific code   Number of codes
Basic           497
Partial         90
Mixed           85
Complex         105
Inaccurate      254

The three graphs presented below (figures 24, 25 and 26) show the proportion of the five specific content codes produced within each of the three overarching content codes. Proportions were similar across each of the three overarching codes, with two key differences: there was a noticeably greater proportion of complex codes, and a smaller proportion of inaccurate codes (labelled ‘Incorrect’ in the figures), within the chronological category of codes (figure 25).

Figure 24: Proportion of specific codes occurring within conceptual thinking codes
[Chart. Legend: Basic, Partial, Mixed, Complex, Incorrect.]

Figure 25: Proportion of specific codes occurring within chronological thinking codes
[Chart. Legend: Basic, Partial, Mixed, Complex, Incorrect.]

Figure 26: Proportion of specific codes occurring within causal thinking codes
[Chart. Legend: Basic, Partial, Mixed, Complex, Incorrect.]

8.1.3. Truth-value codes The last question of each discussion required participants to attribute a truth-value to the text, and to justify their opinion. There were two stages to this coding process (see section 6.8.4 or Appendix P for more detail). Firstly, within each transcript, each participant was assigned one of four truth-value judgement codes according to the truth-value that they attributed to the text: ‘entirely factual’, ‘partially factual’, ‘entirely fictionalised’ or ‘no response’. No response codes were omitted from data analysis. This was because assumptions could not be made regarding why participants were providing no response: it might have been due to uncertainty on the truth-value, an introverted personality with little inclination to voice ideas, or lack of clarity in trying to express an opinion. Therefore, ‘no response’ codes provided no additional, relevant information regarding participants’ perceptions of the truth-value of texts. As there were 78 participants overall, there was a potential total of 234 truth-value judgement codes (78 responses in each of the three intervention discussions). However, three of the general discussion questions were lost (due to technical difficulties), some participants were absent from interventions, and a number of ‘no response’ codes were recorded. Therefore, 135 truth-value judgement codes were assigned in total. Only four participants classified any of the texts as entirely fictionalised, whilst a majority of responses classified texts as entirely factual (see table 80).

Table 80: Total number of truth-value judgement codes

Code                     Number of codes
Entirely factual         75
Partially factual        56
Entirely fictionalised   4
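
The counts above can be expressed as shares of the assigned judgement codes. The following is a small illustrative sketch using only the figures reported in this section (78 participants, three intervention discussions, and the frequencies in table 80); the variable names are hypothetical.

```python
# Truth-value judgement code frequencies from table 80.
judgement_codes = {
    "Entirely factual": 75,
    "Partially factual": 56,
    "Entirely fictionalised": 4,
}

# 78 participants, each able to give one judgement in each of three discussions.
potential_total = 78 * 3
assigned_total = sum(judgement_codes.values())

# Share of each judgement among the codes that were actually assigned.
shares = {code: count / assigned_total for code, count in judgement_codes.items()}

print(f"Assigned {assigned_total} of a potential {potential_total} codes")
for code, share in shares.items():
    print(f"{code}: {share:.1%}")
```

On these figures, just over half of the assigned judgements classify texts as entirely factual, in line with the description of a majority given in the text.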

Following this, transcripts were coded according to the justifications that participants gave for their truth-value judgement. Codes were no longer limited to one code per participant per transcript. Justifications were coded into three, separate overarching categories: composition, source and prior knowledge. Prior knowledge codes were the least frequent of the three overarching code categories, whilst source codes occurred most frequently (see table 81).

Table 81: Total number of truth-value justification codes

Overarching code   Number of overarching codes   Specific code         Number of codes
Composition        42                            Making sense          25
                                                 Characters            3
                                                 Literary devices      5
                                                 Factual information   9
Source             54                            Teacher               13
                                                 Original source       14
                                                 Author/historian      17
                                                 Evidence              10
Prior knowledge    27                            History               15
                                                 Personal experience   12

8.2. Inferential statistics: function and content codes This section will begin by comparing the frequency of codes across the two conditions. Any codes that differ across conditions will be considered in further analyses, to explore what is possibly causing these differences, and whether these differences might have a further impact on assessment scores. Although assessment and discussion questions were based on different content from the texts in order to minimise the chance that discussions might directly impact assessment scores, it is thought that function and content codes might provide insights into the mental representations that participants constructed in response to texts; these mental representations may in turn influence performance on assessments. For these analyses, both the function and content coding schemes will be considered in conjunction in order to observe any interactions between codes across the different coding schemes. Finally, this section will explore the possible influence of reading ability on the production of codes.

As the total number of codes produced by each of the two conditions differed (NNF: 1221; ET: 1090), proportional data was initially considered to assess whether these unequal numbers might influence data analysis. Firstly, the raw data was converted into proportional data, before an arcsine transformation was applied. MANOVAs and t-tests were then run to explore whether the number of function, content, truth-value judgement and truth-value justification codes differed significantly across the two conditions, using this transformed data. The same MANOVAs and t-tests were run a second time using the raw data. The raw data and the transformed data showed similar results in these analyses: therefore, converting into proportional data was considered unnecessary and the raw data was used for all further analyses. The results of the MANOVAs and t-tests run on arcsine transformed data can be found in Appendix Y.
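
The exact form of the arcsine transformation is not specified in the text. A minimal sketch, assuming the commonly used arcsine square-root form and a hypothetical proportion, is given below; the function name and the example values are illustrative and are not taken from the data.

```python
import math


def arcsine_transform(proportion: float) -> float:
    """Arcsine square-root transform of a proportion in [0, 1], in radians.

    Assumption: the standard arcsin(sqrt(p)) form; the thesis does not
    state which variant of the transformation was applied.
    """
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical example: 8 of a participant's 32 codes are function codes.
raw_count, total_count = 8, 32
proportion = raw_count / total_count
print(proportion, arcsine_transform(proportion))
```

The transform stretches proportions near 0 and 1, which is why it is often applied before parametric tests on proportional data.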

8.2.1. Frequency of function and content codes across conditions Whilst MANOVAs were run to compare a majority of codes, t-tests were run for the three overarching function codes. This is because most of the codes being compared related very closely to one another (for instance, codes were all a type of historical thinking skill, or a type of historical knowledge) and therefore MANOVAs were appropriate. In contrast, the three overarching function codes were distinct from one another, sharing no common elements despite being in the same coding scheme, thus rendering a MANOVA inappropriate. This section will consider function codes, followed by content codes.

8.2.1.1. Function codes Firstly, t-tests compared the three overarching function codes. A significant difference was found between the two conditions in the frequency of referencing codes only, with a moderate effect size (see table 82): the NNF condition produced significantly more of these codes than the ET condition (see table 83). Referencing codes involved participants making reference to any form or source of prior knowledge. No other differences were found.

Table 82: Independent samples t-tests comparing frequency of three overarching function codes across conditions

Code                    df   t       p        Cohen’s d
Textual understanding   76   1.226   0.224    0.28
Referencing             76   2.854   0.006*   0.64
Navigation              76   0.610   0.544    0.14
Note: * denotes a significant effect

Table 83: Means and standard deviations for three overarching function codes across conditions

                        NNF              ET
Code                    Mean    SD       Mean    SD
Textual understanding   8.43    5.585    6.98    4.912
Referencing             2.24    1.606    1.32    1.254
Navigation              0.38    0.639    0.29    0.602
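
The formula used to obtain Cohen’s d is not stated in the text. As a hedged illustration, the sketch below assumes the simple version that pools the two standard deviations with equal weight (group sizes are not reported in this section), and applies it to the referencing means and standard deviations from table 83. The function name is hypothetical.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using an equally weighted pooled standard deviation.

    Assumption: both groups are weighted equally, because group sizes
    are not given alongside tables 82 and 83.
    """
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Referencing codes: NNF mean and SD, then ET mean and SD (table 83).
referencing_d = cohens_d(2.24, 1.606, 1.32, 1.254)
print(round(referencing_d, 2))
```

Under this assumption the result is close to the value of 0.64 reported for referencing codes in table 82; a version weighted by group size could differ slightly.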

Three of the specific function codes occurred frequently enough to warrant inferential analysis. A MANOVA showed a significant difference between conditions in terms of these three specific function codes combined, with a large effect size (see table 84). When the dependent variables were considered individually, a significant difference between conditions was observed for both collaborative (negotiation) and hypothetical codes, both with a moderate effect size (see table 84). Both of these codes appeared significantly more frequently in the NNF condition than in the ET condition (see table 85).

Table 84: MANOVA comparing frequency of three specific function codes across conditions

Variables                                     F       df     p        η²
Combined dependent variables   Condition      4.924   3,74   0.004*   0.166
Individual dependent variables Collab (cum)†  0.113   1,76   0.738    0.001
                               Collab (neg)   5.090   1,76   0.027*   0.063
                               Hypothetical   5.732   1,76   0.019*   0.070
†Note: In tables, collab (cum) refers to the collaborative (cumulative) code, and collab (neg) refers to the collaborative (negotiation) code.

Table 85: Means and standard deviations for three specific function codes across conditions

               NNF              ET
Code           Mean    SD       Mean    SD
Collab (cum)   2.38    2.465    2.56    2.335
Collab (neg)   2.51    2.445    1.49    1.502
Hypothetical   2.19    2.158    1.17    1.580

8.2.1.2. Content codes Firstly, a MANOVA was conducted to explore the frequency of the three historical thinking content codes across the two conditions. Results are reported in table 86. A significant difference was observed for the dependent variables combined, with a moderate effect size. To explore this difference further, the dependent variables were considered individually: a significant difference between conditions was only observed in the frequency of chronological codes, with a moderate effect size. The NNF condition produced significantly more chronological codes than the ET condition (see table 87).

Table 86: MANOVA comparing frequency of three historical thinking codes across conditions

Variables                                      F       df     p        η²
Combined dependent variables   Condition       3.395   3,74   0.022*   0.121
Individual dependent variables Conceptual      3.216   1,76   0.077    0.041
                               Chronological   8.389   1,76   0.005*   0.099
                               Causal          0.004   1,76   0.952    0.000

Table 87: Means and standard deviations for three historical thinking codes across conditions

                NNF               ET
Code            Mean     SD       Mean    SD
Conceptual      11.57    5.645    9.22    5.889
Chronological   1.46     1.282    0.71    1.006
Causal          5.19     2.717    5.15    3.461
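
As an illustrative check, the eta squared values reported for the individual dependent variables in table 86 can be approximated from the F statistics and degrees of freedom. The sketch below uses the standard conversion for partial eta squared; as footnote 29 notes, partial eta squared and eta squared coincide here because there is a single independent variable. The function name is hypothetical.

```python
def eta_squared_from_f(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Individual dependent variables from table 86, each with df = 1, 76.
for code, f_value in [("Conceptual", 3.216), ("Chronological", 8.389), ("Causal", 0.004)]:
    print(code, round(eta_squared_from_f(f_value, 1, 76), 3))
```

The recovered values agree with the table to three decimal places, allowing for rounding of the reported F statistics.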

A further MANOVA was conducted to explore whether conditions produced a different number of historical knowledge codes (see table 88 for results). Once more, a significant effect was observed for the combined dependent variables with a moderate effect size. To explore this combined effect further, individual dependent variables were considered: a significant difference between conditions was found in the production of inference codes, with the NNF condition producing a greater number of inference codes than the ET condition (see table 89).

Table 88: MANOVA comparing frequency of the two historical knowledge codes across conditions

Variables                                  F        df     p        η²
Combined dependent variables   Condition   5.746    2,75   0.005*   0.133
Individual dependent variables Recall      0.799    1,76   0.374    0.010
                               Inference   11.542   1,76   0.001*   0.132

Table 89: Means and standard deviations for the two historical knowledge codes across conditions

            NNF               ET
Code        Mean     SD       Mean     SD
Recall      13.97    6.318    12.54    7.711
Inference   4.24     2.597    2.54     1.804

Finally, a MANOVA was run to explore whether there was a difference in any of the five specific codes (collapsed across the three historical thinking content codes) across conditions. Because a larger number of related analyses were conducted within the MANOVA at this point due to the larger number of dependent variables, a Bonferroni correction was applied for these variables when considered individually, with a new alpha of 0.01. A significant difference was found between the two conditions in terms of these five variables combined, with a large effect size (see table 90). When the dependent variables were considered individually, a significant difference between conditions was found for the number of complex and inaccurate utterances made, both with a moderate effect size. The NNF condition produced significantly more complex utterances, whilst the ET condition produced more inaccurate utterances (see table 91).

Table 90: MANOVA comparing frequency of five specific content codes across conditions

Variables                                   F       df     p         η²
Combined dependent variables   Condition    6.189   5,72   <0.001*   0.301
Individual dependent variables Basic        5.723   1,76   0.019     0.070
                               Partial      0.711   1,76   0.402     0.009
                               Mixed        0.085   1,76   0.772     0.001
                               Complex      9.067   1,76   0.004*    0.107
                               Inaccurate   9.144   1,76   0.003*    0.107

Table 91: Means and standard deviations for five specific content codes across conditions

             NNF              ET
Code         Mean    SD       Mean    SD
Basic        7.62    4.424    5.24    4.346
Partial      1.03    1.118    1.27    1.379
Mixed        1.14    1.357    1.05    1.264
Complex      1.84    1.659    0.90    1.044
Inaccurate   2.35    1.602    4.07    3.110

Overall, a number of codes differed significantly in terms of frequency across conditions. In terms of function coding, the NNF condition produced a significantly greater number of referencing codes than the ET condition, alongside a greater number of collaborative (negotiation) and hypothetical codes. Within the content coding scheme, it was found that chronological codes, inference codes and complex codes were produced significantly more frequently in the NNF condition, whereas the ET condition produced significantly more inaccurate utterances. As these seven codes differed across conditions, they were identified as codes of interest for further exploration.

8.2.2. Possible causes for the differing frequency of codes across conditions Whilst the above results suggest interesting differences between the two conditions, they do not indicate whether condition directly caused these differences. Other factors might have influenced code production, such as the particular combination of participants within discussion groups. Alternatively, condition might have indirectly influenced the production of some codes: if condition caused the production of particular codes, these codes might in turn have contributed to the production of other, subsequently produced codes. To assess what was driving the codes of interest to be produced, hierarchical regressions were run for each code of interest: these analyses allowed an exploration of whether condition and/or other discourse codes might predict the codes of interest, and to what extent they might do so. However, as hierarchical regressions can only take a limited number of predictors, Pearson correlations were firstly run to indicate which codes might be appropriate to input as potential predictors for other codes. Theoretical considerations will also be made when selecting possible predictors: these will be discussed prior to each hierarchical regression in section 8.2.2.2. below.

8.2.2.1. Pearson correlations exploring relationships between codes of interest As seven codes of interest were identified above, a large number of correlations (21) were run in total. Therefore, a Bonferroni correction was applied, with an adjusted alpha of 0.002. Correlations for each of the codes of interest are reported below (see table 92).

Table 92: Pearson correlations between seven codes of interest (see footnote 30)

Code                Chronological   Complex   Inaccurate   Inference   Referencing   Collab (neg)   Hypothetical
Chronological   r   -               0.544     0.101        0.428       0.212         0.117
                p   -               <0.001*   0.377        <0.001*     0.063         0.310
Complex         r   0.544           -         0.007        0.453       0.401         0.038
                p   <0.001*         -         0.951        <0.001*     <0.001*       0.740
Inaccurate      r   0.101           0.007     -            0.006       0.174         0.195
                p   0.377           0.951     -            0.956       0.128         0.087
Inference       r   0.428           0.453     0.006        -           0.432         0.154
                p   <0.001*         <0.001*   0.956        -           <0.001*       0.178
Referencing     r   0.212           0.401     0.174        0.432       -
                p   0.063           <0.001*   0.128        <0.001*     -
Collab (neg)    r   0.117           0.038     0.195        0.154       0.365         -
                p   0.310           0.740     0.087        0.178       0.001*        -
Hypothetical    r   0.329           0.356     0.300        0.337       0.402         0.217          -
                p   0.003           0.001*    0.008        0.003       <0.001*       0.057          -
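
The number of correlations and the adjusted alpha can be illustrated with a short sketch. It assumes a family-wise alpha of 0.05 before correction, which is not stated explicitly in this section; the function name is hypothetical.

```python
from math import comb


def bonferroni_alpha(family_alpha: float, n_tests: int) -> float:
    """Per-test alpha under a Bonferroni correction."""
    return family_alpha / n_tests


# Seven codes of interest give one correlation per unordered pair of codes.
n_codes = 7
n_correlations = comb(n_codes, 2)

# Assumption: an uncorrected alpha of 0.05.
adjusted_alpha = bonferroni_alpha(0.05, n_correlations)
print(n_correlations, round(adjusted_alpha, 3))
```

Under this assumption, the pair count and the adjusted alpha rounded to three decimal places match the 21 correlations and the alpha of 0.002 stated in the text.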

Of the seven codes, only inaccurate codes did not correlate with any of the other codes of interest. Of the remaining six codes of interest, a number of statistically significant correlations were found. Firstly, chronological codes positively correlated with two of the specific content codes: complex codes (with a strong effect size) and inference codes (with a moderate effect size).

Complex and referencing codes showed the greatest number of correlations with other codes. Complex codes significantly positively correlated with chronological codes (as discussed above), and also with inference, referencing and hypothetical codes, all with moderate effect sizes. Similarly, referencing codes significantly positively correlated with both inference and hypothetical codes, with moderate effect sizes. Referencing codes also positively correlated with collaborative (negotiation) codes and complex codes, both with moderate effect sizes.

30 Degrees of freedom for each correlation were 76.

The final three codes to be considered are inference codes, collaborative (negotiation) codes and hypothetical codes. Whilst these codes do show significant positive correlations with other codes, these correlations have been covered in the discussion above. Inference codes positively correlated with chronological, complex and referencing codes, with moderate effect sizes; collaborative (negotiation) codes positively correlated with referencing codes, with moderate effect sizes; and finally, hypothetical codes positively correlated with complex and referencing codes, with moderate effect sizes. In the following section of the data analysis, these results will be used to support the identification of potential predictors to be input into hierarchical regressions exploring the predictors of codes of interest.

8.2.2.2. Hierarchical regressions exploring predictors of discourse codes This research will follow Stevens’ (1996) suggestion that a reliable regression requires approximately fifteen participants per predictor in the social sciences: therefore, with a total of 78 participants, up to five predictors were input for each hierarchical regression. Several considerations were taken into account when selecting possible predictors. Firstly, whether codes had a correlational relationship, as indicated by the analyses above, was considered. Secondly, whether there was a theoretical reason why codes might predict the code in question was considered. Finally, how codes related to each other in terms of the structure of coding schemes was considered. As previously noted, the function codes precede the content codes, in that participants must construct knowledge (as indicated by function codes), in order to express conceptual understanding (as indicated by content codes). Therefore, content codes cannot predict function codes, and were not used as potential predictors for function codes. Similarly, in the content coding scheme, the five specific codes are assigned in relation to the three overarching historical thinking codes. This linear sequence of assigning codes meant that specific codes could not predict previously assigned codes, as these specific codes were assigned as a consequence of the overarching codes.

When potential predictors were input into hierarchical regressions, condition was entered at Block 1 because the condition participants were placed in preceded the production of any codes. Function codes were input at Block 2 and content codes at Block 3 (unless there were no function codes being input): as discussed above, this was because the function codes precede, and are likely to then influence, the content codes. As a number of analyses were run here (six in total), a Bonferroni correction was applied, with an adjusted alpha of 0.009. This section of the results will briefly outline each of the six regressions conducted. Only six hierarchical regressions were conducted despite there being seven codes of interest: no hierarchical regression was run for referencing codes because as an overarching function code, referencing codes preceded all other codes, and therefore there were no possible predictors.

Chronological codes

Firstly, a hierarchical regression was run to explore potential predictors of chronological codes. Chronological codes significantly correlated with complex and inference codes; a correlation between chronological and hypothetical codes was also approaching significance. Complex and inference codes are preceded by chronological codes, and therefore these two codes could not be considered as possible predictors. However, hypothetical codes were considered as a possible predictor. Hypothetical thought requires thinking beyond an observable reality to imagine alternative possibilities; similarly, the conception of time requires some form of abstract thought, as it is not directly observable. Consequently, the ability to think hypothetically might predict chronological thought. Hypothetical codes were therefore input into a hierarchical regression as an independent variable. Condition was entered at Block 1, and the function discourse code (hypothetical) was entered at Block 2.

The model was significant at Block 1 (F(1,76) = 8.389, p = 0.005), with condition significantly predicting chronological codes (see table 93). Condition accounted for 9.9% of the variance in chronological codes. The model was also significant at Block 2 (F(2,75) = 7.373, p = 0.001); however, the addition of the hypothetical code did not result in a significant increase in variance explained (F(1,75) = 5.824, p = 0.018), only explaining a further 6.5% of variance in chronological codes. Overall, 16.4% of variance was accounted for. After the addition of the hypothetical code at Block 2, condition no longer significantly predicted chronological codes.

Table 93: Summary of hierarchical regression analysis for chronological codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           2.212   0.417             5.305   <0.001
         Condition         -0.752   0.260   -0.315   -2.896   0.005*
Block 2  Constant           1.686   0.459             3.672   <0.001
         Condition         -0.585   0.261   -0.245   -2.241   0.028
         Hypothetical       0.164   0.068    0.264    2.413   0.018
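The F statistic for the increase in variance explained between blocks can be approximately recovered from the rounded R² values reported above. The following is a minimal sketch rather than the analysis actually run for this research: the function name `f_change` and its parameters are illustrative assumptions, and the inputs are the rounded percentages from the text, so small discrepancies from the reported statistics are expected.

```python
def f_change(r2_reduced, r2_full, n, k_full, k_added):
    """F statistic for the gain in R^2 when k_added predictors join a model.

    n is the number of cases; k_full is the number of predictors in the
    larger model, not counting the constant. (Hypothetical helper.)
    """
    df1 = k_added
    df2 = n - k_full - 1
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Rounded values reported for chronological codes:
# Block 1 R^2 = 0.099, Block 2 R^2 = 0.164, 78 participants.
f, df1, df2 = f_change(0.099, 0.164, n=78, k_full=2, k_added=1)
print(f"F({df1},{df2}) = {f:.2f}")  # prints F(1,75) = 5.83
```

The result is close to the reported F(1,75) = 5.824; the small difference is consistent with the R² values having been rounded to three decimal places.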

Complex codes

A second hierarchical regression was run to explore possible predictors of complex codes. Chronological, inference, referencing and hypothetical codes correlated with complex codes. Theoretically, enhanced chronological thought might support participants in chronologically linking together more historical ideas in single utterances in order to construct complex utterances. In addition, other codes which suggest deeper comprehension processes (inference and hypothetical) might enable a greater understanding of the text, therefore allowing more complex utterances to be made. Finally, referencing codes might theoretically predict complex codes: as participants make reference to their own prior knowledge and other texts or resources, they may be able to make more connections between information in the text to produce complex utterances. Condition was entered in Block 1, the two function discourse codes (referencing and hypothetical) were entered at Block 2, and the two content discourse codes (chronological and inference) were entered at Block 3.

The model was significant at Block 1 (F(1,76) = 9.067, p = 0.004): condition significantly predicted complex codes (see table 94), accounting for 10.7% of variance. The model was also significant at Block 2 (F(3,74) = 7.744, p < 0.001), with the two function discourse codes predicting a further 13.2% of variance in complex codes. This was a significant increase in variance explained (F(2,74) = 6.434, p = 0.003). However, none of the independent variables, when considered individually, significantly predicted complex codes. Finally, the model was significant at Block 3 (F(5,72) = 10.023, p < 0.001): once more, a significant increase in variance explained was observed (F(2,72) = 10.468, p < 0.001). The content codes entered at Block 3 explained a further 17.1% of variance in complex codes, and overall, Block 3 accounted for 41% of variance. Only chronological codes were observed to significantly predict complex codes when independent variables were considered individually at Block 3.

Table 94: Summary of hierarchical regression analysis for complex codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           2.773   0.499             5.561   <0.001
         Condition         -0.935   0.311   -0.326   -3.011   0.004*
Block 2  Constant           1.498   0.588             2.550   0.013
         Condition         -0.551   0.310   -0.192   -1.777   0.080
         Referencing        0.149   0.084    0.201    1.788   0.078
         Hypothetical       0.251   0.110    0.261    2.293   0.025
Block 3  Constant           0.392   0.592             0.661   0.511
         Condition         -0.188   0.289   -0.066   -0.650   0.518
         Referencing        0.059   0.077    0.079    0.764   0.448
         Hypothetical       0.195   0.103    0.203    1.902   0.061
         Chronological      0.469   0.125    0.391    3.754   <0.001*
         Inference          0.090   0.068    0.148    1.334   0.186

Inaccurate codes

Although no codes correlated with inaccurate codes, referencing, inference and hypothetical codes were considered as potential predictors: as correlations were observed between these three independent variables, the combination of these codes might predict the dependent variable. Participants who draw on their own prior knowledge, make inferences and construct hypothetical situations are possibly more likely to produce inaccurate codes because they are drawing on information that is not directly from the text to construct meaning in relation to the text. This may lead to inaccurate interpretations of the text. Condition was entered at Block 1, the two function codes (hypothetical and referencing) were entered at Block 2, and inference codes were entered at Block 3.

The model was significant at Block 1 (F(1,76) = 9.144, p = 0.003): condition significantly predicted the variance of inaccurate codes (see table 95), predicting 10.7% of variance. The model was also significant at Block 2 (F(3,74) = 10.277, p < 0.001): it explained a significant increase in variance (F(2,74) = 9.786, p < 0.001), predicting a further 18.7% variance in inaccurate codes. When considered individually, both condition and hypothetical codes significantly predicted inaccurate codes. Finally, the model was significant at Block 3 (F(4,73) = 7.618, p < 0.001). However, there was not a significant increase in variance (F(1,73) = 0.042, p = 0.838), with 0% additional variance in inaccurate codes being explained. Overall, 29.4% of variance was accounted for. Condition and hypothetical codes remained significant predictors of inaccurate codes, with condition recording a higher standardised beta value than hypothetical codes.

Table 95: Summary of hierarchical regression analysis for inaccurate codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           0.630   0.914             0.689   0.493
         Condition          1.722   0.569    0.328    3.024   0.003*
Block 2  Constant          -1.930   1.038            -1.860   0.067
         Condition          2.509   0.547    0.478    4.586   <0.001*
         Referencing        0.318   0.193    0.180    1.643   0.105
         Hypothetical       0.484   0.148    0.354    3.278   0.002*
Block 3  Constant          -1.829   1.155            -1.584   0.117
         Condition          2.481   0.586    0.472    4.370   <0.001*
         Referencing        0.330   0.204    0.187    1.620   0.110
         Hypothetical       0.489   0.151    0.358    3.247   0.002*
         Inference         -0.026   0.128   -0.023   -0.205   0.838

Collaborative (negotiation) codes

Referencing codes were the only codes to correlate with collaborative (negotiation) codes, and there was also a theoretical reason why they might predict collaborative (negotiation) codes: if participants are making references to their own, unique sources of prior knowledge, they are introducing new understanding to discussions that other participants may not possess in their own prior knowledge structures, or that is not verifiable in relation to the texts. This could cause higher levels of disagreement and consequently negotiation amongst participants. None of the content discourse codes could be considered as potential predictors, as collaborative (negotiation) precedes them all. Therefore, referencing codes were the only discourse codes to be considered as potential predictors.

Condition was entered at Block 1, and referencing codes were entered at Block 2. The model was not significant at Block 1 (F(1,76) = 5.090, p = 0.027): condition predicted 6.3% of variance in collaborative (negotiation) codes, and did not reach significance (see table 96). However, the model did reach significance at Block 2 (F(2,75) = 6.826, p = 0.002), with referencing codes predicting an additional 9.1% of variance in collaborative (negotiation) codes: this was a significant increase in variance accounted for (F(1,75) = 8.087, p = 0.006). Block 2 overall accounted for 15.4% of variance. When considered individually, referencing codes significantly predicted collaborative (negotiation) codes at Block 2.

Table 96: Summary of hierarchical regression analysis for collaborative (negotiation) codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           3.539   0.730             4.850   <0.001
         Condition         -1.026   0.455   -0.251   -2.256   0.027
Block 2  Constant           2.154   0.851             2.531   0.013
         Condition         -0.621   0.457   -0.152   -1.357   0.179
         Referencing        0.437   0.154    0.318    2.844   0.006*

Hypothetical codes

Complex and referencing codes correlated with hypothetical codes. As hypothetical codes precede content codes, complex codes were not considered as potential predictors. Referencing codes were considered as the only possible predictor. When a reader is interacting with a text, they might make reference to their prior knowledge to interpret the text, which might in turn enable a participant to construct hypothetical thoughts in relation to the text. Condition was entered at Block 1, and referencing was entered at Block 2.

The model was not significant at Block 1 (F(1,76) = 5.732, p = 0.019): condition predicted 7% of variance in hypothetical codes, which was not a significant prediction (see table 97). The model became significant at Block 2 (F(2,75) = 8.396, p = 0.001), with referencing codes predicting a further 11.3% variance in hypothetical codes. This was a significant increase in variance explained (F(1,75) = 10.355, p = 0.002), and led to a total of 18.3% of variance in hypothetical codes being accounted for overall. When possible predictors were considered individually, referencing codes significantly predicted hypothetical codes at Block 2.

Table 97: Summary of hierarchical regression analysis for hypothetical codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           3.208   0.683             4.697   <0.001
         Condition         -1.018   0.425   -0.265   -2.394   0.019
Block 2  Constant           1.761   0.786             2.241   0.028
         Condition         -0.596   0.422   -0.155   -1.410   0.163
         Referencing        0.457   0.142    0.353    3.218   0.002*

Inference codes

Finally, a hierarchical regression was run to explore potential predictors of inference codes. Chronological, complex and referencing codes correlated with inference codes. Having knowledge of the chronological links between historical ideas in the text (and therefore not having to process and link these ideas in other ways) might allow more working memory space for other processes to take place, such as inferencing (Sweller, 1988). Referencing codes were also considered as possible predictors: participants are likely to have drawn on prior knowledge in order to make inferences relating to the text. Finally, hypothetical codes were considered as possible predictors: if a participant is considering hypothetical situations, they could possibly use these considerations to make inferences regarding the text. For instance, imagining how they might feel in a hypothetical situation might help to make an inference about how a protagonist/figure in the text felt in that situation. Condition was entered at Block 1. Referencing and hypothetical codes were entered at Block 2, and chronological codes at Block 3.

The model was significant at Block 1 (F(1,76) = 11.542, p = 0.001), with condition significantly predicting inference codes (see table 98): condition predicted 13.2% of variance in inference codes. The model remained significant at Block 2 (F(3,74) = 8.872, p < 0.001), and the increase in variance explained (13.3%) was also significant (F(2,74) = 6.675, p = 0.007). When considered individually at Block 2, referencing codes significantly predicted inference codes. The model remained significant at Block 3 (F(4,73) = 9.158, p < 0.001). There was a significant increase in variance (F(1,73) = 7.630, p = 0.007), with 7% additional variance in inference codes being explained. Overall, 33.5% of variance in inference codes was explained. When considered individually, both chronological and referencing codes were significant predictors of inference codes at Block 3, with chronological codes recording a higher standardised beta value than referencing codes.

Table 98: Summary of hierarchical regression analysis for inference codes

Block    Variable               B      SE        β        t   p
Block 1  Constant           5.590   0.806             7.378   <0.001
         Condition         -1.707   0.502   -0.363   -3.397   0.001*
Block 2  Constant           3.846   0.947             4.059   <0.001
         Condition         -1.076   0.599   -0.229   -2.155   0.034
         Referencing        0.471   0.177    0.298    2.663   0.009*
         Hypothetical       0.191   0.135    0.156    1.417   0.161
Block 3  Constant           2.925   0.967             3.026   0.003
         Condition         -0.754   0.493   -0.160   -1.531   0.130
         Referencing        0.454   0.169    0.287    2.679   0.009*
         Hypothetical       0.102   0.133    0.084    0.768   0.445
         Chronological      0.569   0.206    0.289    2.762   0.007*

In summary, the above hierarchical regressions suggest that chronological codes significantly predicted complex codes. Chronological and referencing codes significantly predicted inference codes, although chronological codes were the stronger predictor. Referencing codes also predicted both hypothetical and collaborative (negotiation) codes. Finally, both condition and hypothetical codes predicted inaccurate codes, although condition was the stronger predictor. Whilst condition did predict chronological codes, complex codes and inference codes, this was only at Block 1, before additional variables were added.

8.2.3. Influence of codes on assessment scores

This section of data analysis seeks to explore whether the production of particular codes might influence post- and delayed post-assessment scores. Firstly, Pearson correlations were run to identify whether any codes had a correlational relationship with post- or delayed post-assessment scores. Although the above section focused on the selected seven codes of interest, this section will consider all codes again: even if codes did not differ across conditions, they might have influenced assessment scores. For codes that showed a correlational relationship with assessment scores, further hierarchical regressions were run to explore whether these codes predicted assessment scores.

For these analyses, participants were considered in the 24 small groups that they completed the discussion tasks in, as opposed to individually. It was not possible to identify voices in the audio-recordings of discussions in order to connect these with particular participants, and therefore, codes produced by individual participants could not be matched to participants’ individual assessment scores. In order to consider assessment scores in relation to the production of particular codes, mean post- and delayed post-assessment scores were calculated for each of the 24 groups of participants. This was also appropriate theoretically: as participants were part of a group, they were participating in any conversation that took place, whether they uttered a statement or not. Therefore, developed conceptual understanding was collective, and as such, considering the mean assessment scores of groups, which reflect this conceptual understanding, was appropriate. Consequently, the analyses in the section below consider 24 groups of participants, and the mean assessment scores for these participants. Whilst having fewer cases for these analyses meant that statistical power decreased, this was necessary in order to pursue this line of enquiry.
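The group-level approach described above can be sketched as follows: individual assessment scores are averaged within each discussion group, and the resulting group means are then correlated with the number of codes each group produced. This is an illustrative sketch only. The group labels, scores and code counts are invented for demonstration and are not data from this research, and the helper names `group_means` and `pearson_r` are assumptions.

```python
from collections import defaultdict
from math import sqrt


def group_means(records):
    """Average individual scores within each discussion group."""
    by_group = defaultdict(list)
    for group_id, score in records:
        by_group[group_id].append(score)
    return {g: sum(v) / len(v) for g, v in by_group.items()}


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Invented example: (group, individual assessment score)
scores = [("g1", 20), ("g1", 24), ("g1", 22),
          ("g2", 10), ("g2", 14),
          ("g3", 16), ("g3", 18), ("g3", 17)]
# Invented example: number of a given code produced by each group
code_counts = {"g1": 6, "g2": 1, "g3": 3}

means = group_means(scores)
groups = sorted(means)
r = pearson_r([code_counts[g] for g in groups], [means[g] for g in groups])
print(means, round(r, 3))
```

Each group contributes a single case to the correlation, which is why the analyses in this section are based on 24 cases rather than 78.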

8.2.3.1. Pearson correlations exploring relationship between codes and assessment scores

Function codes

Firstly, Pearson correlations were run considering the overarching function codes in relation to both post- and delayed post-assessment scores. The overarching code ‘textual understanding’ was not considered here: this code was extremely broad in nature, considering various strategies used by participants to construct conceptual understanding in relation to texts. Therefore, if this correlated with assessment scores, no specific theories could be proposed as to why. Next, the three specific function codes that occurred frequently enough to warrant analysis were considered. As six analyses were conducted for the three specific function codes, a Bonferroni correction was applied, with an adjusted alpha of 0.009. No significant correlations were observed for either of the overarching function codes or for the three specific function codes (see tables 99 and 100).

Table 99: Pearson correlations between two overarching function codes and post-/delayed post-assessments

                   Post-assessments         Delayed post-assessments
Code               df    r        p         df    r        p
Referencing        22    0.342    0.101     24    0.152    0.478
Navigation         22   -0.173    0.419     24   -0.317    0.131

Table 100: Pearson correlations between three specific function codes and post-/delayed post-assessments

                              Post-assessments         Delayed post-assessments
Code                          df    r        p         df    r        p
Collaborative (cumulative)    22    0.196    0.358     24    0.224    0.293
Collaborative (negotiation)   22    0.292    0.166     24    0.320    0.128
Hypothetical                  22    0.378    0.069     24    0.311    0.139

Content codes

Pearson correlations were run for the three historical thinking codes and the two historical knowledge codes. Because six analyses were conducted for the historical thinking codes, a Bonferroni correction was applied with an adjusted alpha of 0.009. In terms of the three historical thinking content codes, chronological codes showed a strong, positive correlation with delayed post-assessment scores (see table 101). No significant correlations were observed between either of the historical knowledge codes and post- or delayed post-assessment scores (see table 102).

Table 101: Pearson correlations between three historical thinking content codes and post-/delayed post-assessments

                   Post-assessments         Delayed post-assessments
Code               df    r        p         df    r        p
Conceptual         24    0.338    0.106     24    0.413    0.045
Causal             24    0.261    0.219     24    0.117    0.586
Chronological      24    0.477    0.018     24    0.660    <0.001*

Table 102: Pearson correlations between two historical knowledge content codes and post-/delayed post-assessments

                   Post-assessments         Delayed post-assessments
Code               df    r        p         df    r        p
Recall             24    0.386    0.063     24    0.398    0.054
Inference          24    0.172    0.422     24    0.202    0.344

Correlations were then run between each of the five specific content codes and both post- and delayed post-assessment scores. Because multiple related tests were run at this stage, a Bonferroni correction was applied, with an adjusted alpha of 0.005. No significant correlations were observed. The correlation between basic codes and delayed post-assessment scores was approaching significance (table 103).
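The adjusted alpha used at this stage can be illustrated with a short calculation. The sketch below assumes a familywise alpha of 0.05 and assumes that the family comprised ten tests (five specific content codes, each correlated with two sets of assessment scores); both figures are inferred from the description above rather than stated explicitly, and the function name is hypothetical.

```python
def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_tests


n_codes = 5         # specific content codes (basic, partial, mixed, complex, inaccurate)
n_assessments = 2   # post- and delayed post-assessments (assumed family of 10 tests)
alpha = bonferroni_alpha(0.05, n_codes * n_assessments)
print(f"{alpha:.3f}")  # prints 0.005
```

Under this threshold, a correlation with p = 0.008 falls just short of significance.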

Table 103: Pearson correlations between five specific content codes and post-/delayed post-assessments

                   Post-assessments         Delayed post-assessments
Code               df    r        p         df    r        p
Basic              24    0.507    0.012     24    0.529    0.008
Partial            24   -0.306    0.147     24   -0.085    0.692
Mixed              24    0.128    0.551     24    0.129    0.548
Complex            24    0.394    0.057     24    0.385    0.063
Inaccurate         24    0.006    0.979     24   -0.068    0.754

8.2.3.2. Hierarchical regressions exploring discourse codes as predictors of post- and delayed post-assessment scores

In light of the correlations above, one further hierarchical regression was run to explore the relationship between chronological codes and delayed post-assessments. Once more, analyses considered participants in their 24 discussion groups, rather than individually. Following Stevens’ (1996) suggestion that a reliable regression requires approximately fifteen participants per predictor in the social sciences, this allowed two predictors to be input.

Condition was input as a predictor at Block 1 and chronological codes at Block 2. Results can be found in table 104, below. The model was significant at Block 1 (F(1,22) = 15.543, p = 0.001), with condition significantly predicting delayed post-assessment scores: alone, condition accounted for 41.4% of variance in delayed post-assessment scores. In addition, the model was significant at Block 2 (F(2,21) = 12.989, p < 0.001): chronological codes accounted for an additional 13.9% of variance in delayed post-assessment scores, which was a statistically significant increase (F(1,21) = 6.528, p = 0.018). In the final model, both condition and chronological codes statistically significantly predicted delayed post-assessment scores, with chronological codes recording a higher standardised beta value than condition. Overall, a total of 55.3% variance in delayed post-assessment scores was accounted for.

Table 104: Summary of hierarchical regression analysis with condition and chronological codes as independent variables

Block    Variable               B      SE        β        t   Sig. (p)
Block 1  Constant          25.718   2.902             8.863   <0.001
         Condition         -7.235   1.835   -0.643   -3.942   0.001*
Block 2  Constant          17.263   4.205             4.106   0.001
         Condition         -4.559   1.946   -0.405   -2.343   0.029*
         Chronological      1.284   0.503    0.442    2.555   0.018*

The above results suggest that chronological codes predicted the retention of understanding at delayed post-assessments. However, there was a concern that chronological codes might only predict delayed post-assessment scores because they initially predicted post-assessment scores, and that they therefore supported the initial development of conceptual understanding rather than directly supporting its retention. To assess whether this might be the case, an analogous hierarchical regression was conducted, exploring chronological codes as a possible predictor of post-assessment scores.

Condition was entered in Block 1 and the chronological discourse code in Block 2. Results are presented in table 105, below. The model was significant at Block 1 (F(1,22) = 11.953, p = 0.002), with condition significantly predicting post-assessment scores. The model was also significant at Block 2 (F(2,21) = 6.636, p = 0.006). However, the addition of the chronological code did not result in a significant increase in variance explained (F(1,21) = 1.207, p = 0.284): condition alone explained 35.2% of the variability in post-assessment scores, whilst chronological codes explained a further 3.5%, leading to 38.7% of variance in post-assessment scores being accounted for overall. Condition was the only significant predictor of post-assessment scores at Block 2. This suggests that chronological codes directly predicted the retention of conceptual understanding at delayed post-assessments, rather than the development of conceptual understanding at post-assessments.

Table 105: Summary of hierarchical regression analysis with condition and chronological codes as independent variables

Block    Variable               B      SE        β        t   Sig. (p)
Block 1  Constant          25.508   2.870             8.889   <0.001
         Condition         -6.275   1.815   -0.593   -3.457   0.002*
Block 2  Constant          21.504   4.630             4.645   <0.001
         Condition         -5.008   2.143   -0.474   -2.337   0.029*
         Chronological      0.608   0.553    0.223    1.099   0.284

8.2.4. Reading ability and the production of content codes

In the previous chapter, analyses suggested that reading ability might have influenced the development of conceptual understanding in response to texts. However, this might have reflected difficulties in accessing written assessments, rather than texts. To explore this further, a MANCOVA was run to explore the influence of condition on the production of specific content codes (similarly to the MANOVA run in section 8.2.1, p.210), but with reading level as a covariate. Specific content codes were considered in relation to reading ability because these codes showed the accuracy and depth of conceptual understanding. Again, participants were considered in their 24 small discussion groups; as reading level was recorded as a continuous variable, a mean reading level was calculated for each group, to be input as the covariate. Because a larger number of related analyses were conducted within the MANCOVA at this point due to the larger number of dependent variables, a Bonferroni correction was applied for these variables when considered individually, with an adjusted alpha of 0.01.

Results are presented in table 106. A significant difference was still found between the two conditions in terms of the five variables combined, with a large effect size. When the dependent variables were considered individually, complex and inaccurate utterances were approaching significance. However, reading level was not observed to influence the production of the five specific content codes. That complex and inaccurate utterances were only approaching significance may be a result of the reduced power with the addition of a covariate, and the consideration of 24 groups, rather than 78 individual participants. Overall, this suggests that reading level did not influence participants’ ability to produce specific content codes to express their conceptual understanding. In doing so, it provides further indication that assessment scores may be more related to reading level in terms of ability to complete assessments, rather than reading level in terms of the degree of comprehension of texts.

Table 106: MANCOVA comparing frequency of five specific content codes across conditions

Variables                                           F        df     p        η²
Combined dependent variables     Condition          5.632    5,17   0.003*   0.624
                                 Reading Level      1.455    5,17   0.256    0.300
Individual dependent variables   Basic              0.255    1,21   0.619    0.012
(Condition)                      Partial            1.231    1,21   0.280    0.055
                                 Mixed              0.103    1,21   0.751    0.005
                                 Complex            7.317    1,21   0.013    0.258
                                 Inaccurate         7.801    1,21   0.011    0.271
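The effect sizes for the individual dependent variables can be cross-checked against the reported F values. The sketch below assumes that the effect size column reports partial eta squared, computed from F and its degrees of freedom; this is an inference from the fact that the formula reproduces the tabled values, not something stated in the text, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values and degrees of freedom as reported in table 106
reported = [("Complex", 7.317, 1, 21), ("Inaccurate", 7.801, 1, 21)]
for label, f_value, df_effect, df_error in reported:
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 3))
# prints: Complex 0.258, then Inaccurate 0.271
```

Both results agree with the corresponding entries in the table.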

8.3. Inferential statistics: truth-value coding

This section of data analysis will begin by exploring whether there was a difference in participants’ perceptions and judgements of the truth-value of texts across conditions. Firstly, truth-value judgement codes will be compared across conditions, before they will be considered across conditions for each individual intervention theme. Next, analyses will be run to explore whether perceptions of truth-value influenced assessment scores. Finally, justifications for truth-value codes will be considered, to see if participants judged the truth-value of texts in different ways across conditions. Examples of justification codes will be given to illustrate results.

8.3.1. Perceptions of truth-value across conditions

Firstly, a MANOVA explored whether the frequency of truth-value judgements differed across conditions. Results are presented in table 107. A significant difference was observed between conditions for the three codes combined, with a large effect size. When codes were considered individually, a significant difference between conditions, with a moderate effect size, was observed for each of the three codes. ETs were categorised as entirely factual significantly more frequently than NNF texts, and NNF texts as partially factual significantly more frequently than ETs (see table 108). Whilst a significant difference between conditions was also observed for entirely fictional responses, there were only four instances in which participants categorised a text as entirely fictional (all of which occurred in the NNF condition). Therefore, although a significant difference was apparent, there were not enough instances of entirely fictional codes to consider this result further.

Table 107: MANOVA comparing frequency of three truth-value judgement codes across conditions

Variables                                              F         df     p        η²
Combined dependent variables     Condition             5.411     3,73   0.002*   0.182
Individual dependent variables   Entirely factual      5.214     1,75   0.025*   0.065
                                 Partially factual    10.608     1,75   0.002*   0.124
                                 Entirely fictional    4.992     1,75   0.028*   0.062

Table 108: Means and standard deviations for three truth-value judgement codes across conditions

                      NNF                ET
Codes                 Mean    SD         Mean    SD
Entirely factual      0.72    0.779      1.20    1.005
Partially factual     1.03    0.878      0.46    0.636
Entirely fictional    0.11    0.319      0.00    0.000

Following this, three chi-square tests were conducted to explore whether participants’ truth-value judgements differed across conditions for each of the three intervention sessions. Because interventions were considered individually here, each participant only provided one truth-value judgement, and therefore chi-squares could be used to compare responses given across conditions. ‘Entirely fictional’ responses were excluded from these chi-squares. A statistically significant difference between the responses of participants in the NNF and ET conditions was observed for both the Trench Life and Home Front interventions³¹ (see table 109). For both the Trench Life and Home Front interventions, ETs were more frequently classified as entirely factual, whilst NNF texts were more frequently labelled as partially factual (see figures 28 and 29). A similar pattern was observed for War Begins texts, but differences did not reach significance (see figure 27).

³¹ For comparison, the same chi-square tests were run with participants who gave no response to the question included. These led to similar results to those obtained when ‘no response’ codes were excluded from analyses (War Begins: χ²(3) = 4.273, p = 0.233, V = 0.247; Trench Life: χ²(3) = 8.086, p = 0.044, V = 0.322; Home Front: χ²(3) = 10.464, p = 0.015, V = 0.374).

Table 109: Chi-squares comparing truth-value judgements in each intervention across the two conditions

Intervention    χ²       df    p         Effect size (Phi (φ))³²
War Begins      3.348    1     0.067     -0.323
Trench Life     5.370    1     0.020*    -0.324
Home Front      6.379    1     0.012*    -0.372
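For a 2 x 2 contingency table of condition by truth-value judgement, the chi-square statistic and the phi coefficient can be computed directly from the four cell counts. The following is a minimal sketch that assumes a Pearson chi-square without continuity correction; the cell counts are invented for illustration and are not the frequencies observed in this research, and the function name is hypothetical.

```python
from math import sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and signed phi
    for the 2 x 2 table [[a, b], [c, d]]; the test has 1 degree of freedom."""
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / margins
    phi = (a * d - b * c) / sqrt(margins)
    return chi2, phi


# Invented counts. Rows: NNF, ET. Columns: entirely factual, partially factual.
chi2, phi = chi_square_2x2(10, 15, 18, 7)
print(round(chi2, 3), round(phi, 3))
```

The sign of phi depends only on how the rows and columns are ordered; its magnitude equals the square root of chi-square divided by the total number of responses.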

Figure 27: Truth-values responses given across NNF and ET conditions in response to War Begins texts

³² Phi is reported because contingency tables in these analyses are 2 x 2.

Figure 28: Truth-values responses given across NNF and ET conditions in response to Trench Life texts

Figure 29: Truth-values responses given across NNF and ET conditions in response to Home Front texts

8.3.2. Influence of truth-value perceptions on assessment scores

There was a concern that if participants categorised texts as partially factual, they may not translate information from texts into pre-existing knowledge structures; this might negatively impact learning. To explore this possibility further, four Pearson correlations were run to explore whether perceptions of truth-value influenced performance on post- and delayed post-assessments. As these Pearson correlations considered both codes and assessment scores, participants were considered in their 24 discussion groups. It was not possible to conduct correlations for the NNF and ET conditions separately as the sample sizes would be too small, with just 12 groups per correlation. However, when correlation results are presented in graphs below, plots for the NNF and ET conditions are shown in distinct colours to allow an insight into whether conditions showed different patterns of correlation.

Partially factual codes showed a significant, strong, positive correlation with both post-assessment and delayed post-assessment scores (see table 110). For both post- and delayed post-assessments, the ET groups cluster around the point of lower scores and fewer partially factual responses, whereas the NNF groups typically cluster around higher scores and a greater number of partially factual responses (see figures 30 and 31). In contrast, a moderate, negative correlation was observed between delayed post-assessment scores and entirely factual codes (see table 110): this was the opposite trend to that observed for partially factual codes (see figure 32). This does not indicate a direct relationship: groups in the NNF condition typically scored higher on assessments and gave a greater number of partially factual truth-value judgements, and groups in the ET condition typically scored lower and gave more entirely factual judgements. However, it does suggest that the belief that texts were partially factual did not negatively impact performance on assessments.

Table 110: Correlations between three truth-value judgement codes and post-/delayed post-assessments

                     Post-assessments         Delayed post-assessments
Code                 df    r        p         df    r        p
Entirely factual     22   -0.313    0.136     24   -0.421    0.041*
Partially factual    22    0.564    0.004*    24    0.532    0.007*

Figure 30: Strong positive correlation between number of partially factual codes and post-assessment scores, grouped by condition [33]

Figure 31: Strong positive correlation between number of partially factual codes and delayed post-assessment scores, grouped by condition [34]

[33] Note that only 11 groups can be seen plotted for the NNF condition, despite there being 12 NNF groups in total: two of the NNF groups had identical mean scores and ‘partially factual’ responses (mean score = 21.33; ‘partially factual’ responses = 5).

[34] Similarly to above, only 11 groups can be seen plotted for each of the NNF and ET conditions. Two groups had identical mean scores and ‘partially factual’ responses in the NNF condition (mean score = 17.33; ‘partially factual’ responses = 2) and two groups were also identical in the ET condition (mean score = 7.66; ‘partially factual’ responses = 1).

Figure 32: Moderate negative correlation between number of entirely factual codes and delayed post-assessment scores, grouped by condition

8.3.3. Justifications for truth-value perceptions

This section will firstly consider whether truth-value justification codes differed across conditions, before briefly considering some examples of these codes from transcripts, to give an insight into how participants justified their perceptions.

Firstly, three t-tests were conducted, exploring whether participants’ justifications for their truth-value judgements differed across conditions. The three justifications were composition (participants considering the subject and composition of the text), source (consideration of where the text had come from) and prior knowledge (comparing textual information to prior knowledge). A significant difference was observed for composition codes only (see table 111), with those in the NNF condition producing these codes more frequently than those in the ET condition (see table 112).

Table 111: t-tests comparing the frequency of the three truth-value justification codes across conditions

Code               df        t        p         Cohen’s d
Composition        53.198    3.216    0.002*    0.74
Source             76        0.103    0.918     0.02
Prior knowledge    76        0.750    0.456     0.18

Table 112: Means and standard deviations for the three truth-value justification codes across conditions

                   NNF                ET
Code               Mean    SD         Mean    SD
Composition        0.81    0.877      0.29    0.461
Source             0.70    0.845      0.68    0.850
Prior knowledge    0.41    0.798      0.29    0.512
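As a rough illustration of how the effect sizes in Table 111 relate to the means and standard deviations in Table 112, the sketch below computes Cohen's d with an equally weighted pooled standard deviation. This is an assumption: the thesis does not state which pooling formula or group sizes were used, and the function name cohens_d is hypothetical.

```python
import math


def cohens_d(mean_a, sd_a, mean_b, sd_b):
    """Cohen's d using an equally weighted pooled SD (assumes similar group sizes)."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return (mean_a - mean_b) / pooled_sd


# Values copied from Table 112: (NNF mean, NNF SD, ET mean, ET SD)
table_112 = {
    "Composition": (0.81, 0.877, 0.29, 0.461),
    "Source": (0.70, 0.845, 0.68, 0.850),
    "Prior knowledge": (0.41, 0.798, 0.29, 0.512),
}

effect_sizes = {
    code: round(cohens_d(*values), 2) for code, values in table_112.items()
}

for code, d in effect_sizes.items():
    print(f"{code}: d = {d}")
```

Under this assumption, the computed values can be compared directly with the Cohen's d column of Table 111. A sample-size-weighted pooled SD would give slightly different results if the two conditions differed in size.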

There were four specific codes within the overarching composition code (making sense, characters, literary devices and facts). Descriptive statistics presented previously (see section 8.1.3) showed that a majority of the composition codes were categorised as the making sense code: this code was assigned when participants questioned information presented in the texts, judging whether it made sense to them. To illustrate this code further, examples from transcripts are provided below. It is evident that participants are questioning whether information in the text is logical and reliable in relation to how they understand the world around them. For example, in Extract 1, participants appear to be debating how an overarm throw could physically result in the grenade skimming off the back of the car, whilst in Extract 2, one participant questions the likelihood of it being silent when the artillery bombardment finished at the beginning of the Battle of the Somme, when there were so many critically injured soldiers on the battlefield. Similarly, in Extracts 3 and 4, participants question the reliability of particular information in relation to their personal perceptions of how events should have unfolded.

Extract 1, taken from a NNF transcript:
Child 2: Look. [Reading from text] Suddenly one of the Serbian men drew his hand out of his pocket, brought it over his head and sent a grenade soaring through the air towards the Archduke's car. Luckily for the Archduke, the grenade skimmed off the back of his car. So, I don't know it might not
Child 1: Like, because it would have
Child 2: I don't think, because. If it skimmed off the back, it's a roofless car
Child 1: Yeah?
Child 2: So, if he throws overarm, it's going to be a high throw, isn't it?
Child 1: Yeah but because it's high, and because the car was moving, maybe it like, maybe he should have aimed it a bit more forwards to land it in the car, but instead it just like skimmed, hit the back and went flying off

Extract 2, taken from a NNF transcript:
Child 3: Yeah um I think this bit is definitely true [Reading from text] The loud roar of the artillery guns stopped
Child 1: Well they do make a loud noise
Child 3: [Continues reading] And silence settled. I think that's definitely true
Child 2: Because the Germans are faking to have casualties. Wait. No. Because if they have casualties and it was all quiet wouldn't you hear the groans? Of the soldiers? Because it's all quiet you can't hear anything at all

Extract 3, taken from an ET transcript:
Child 1: I think it's all true. Apart from that video bit cos cos I don't think anyone would have time to be recording someone
Child 3: Yeah same
Child 1: Because they'd have to be having a lookout wouldn't they

Extract 4, taken from a NNF transcript:
Child 1: I think like one or two bits might be false. I'm- I don't know, because - because Germany and Britain aren't allies. So how can how would Germany like, how would Germany know that Britain have powerful warships? Because they're not allies

Other specific codes within the overarching composition code included the character and literary device codes. Character codes were assigned when participants referred to the figures or protagonists in the texts, and literary device codes when participants considered the literary nature of the texts when making their truth-value judgement. Although very few of these codes were produced, a key difference between NNF and ETs (in the Trench Life and Home Front interventions) was the presence of at least one key protagonist in the NNF texts, and a key difference between texts in all interventions was the use of literary devices. Examples of these two codes are detailed below. In literary device codes, participants made reference to features such as description, figurative language and exaggeration in the texts: participants in Extracts 5 and 6 seem to be taking these descriptions quite literally, for instance, questioning whether you would hear a dog growling during the war, despite this being a simile to describe the rumble of distant gunfire. However, in Extract 6 there is also recognition that the author has included this information to enhance description. In terms of the character codes, Extracts 7 and 8 show participants considering how events are seen from the point of view of protagonists in the NNF Trench Life text: they are judging the truth-value of the text by considering what this protagonist actually saw, considering the uniqueness of individual perception.

Extract 5 (Literary devices):
Child 2: You wouldn't put a bleary-eyed- a blurry-eyed soldier on sentry because if they can't see that well-
Child 3: You want the best sighted people to do it

Extract 6 (Literary devices):
Child 3: And um yeah. Like this low sound like a dog's growl. You wouldn't really hear a dog's growl in the war
Child 1: It doesn't really make sense
Child 3: And there's not really a maze of communication trenches
Child 2: Yeah there is!
Child 3: Yeah no cos there's just one row then-
Child 2: It was um- it's- it's a description, it's describing it because it's hard to see that in the night. So um yeah it's describing it a little bit better to say it's like that

Extract 7 (Character):
Child 2: I think there was a bloke called Harry Stinton. I think I’ve seen a guy named him and I don’t know – this is from like one person’s point of view
Child 1: Like other people might have thought it was quite sunny

Extract 8 (Character):
Child 1: Oh, some of it – one bit of it could not be true like um. Like it says they saw the character Harry whatever his name was, saw a light, because you don’t know because you’re not in his eyes. You’re not in his body so you don’t know

Overall, whilst perceptions of truth-value differed across conditions, partially factual judgements did not appear to negatively influence participants’ development of conceptual understanding, or their ability to retain this understanding. Specific interventions also influenced participants’ truth-value judgements of NNF and ETs: the truth-value of NNF and ETs was only perceived differently in the Trench Life and Home Front interventions. Finally, participants in the NNF condition were more likely to produce composition utterances when justifying judgements of the truth-value of texts.

8.4. Discussion

This discussion will briefly explore the above analyses in relation to the questions presented at the beginning of this chapter (p.199). In the general discussion, these findings will be considered in conjunction with the assessment findings, in relation to the four main research questions. Therefore, while this section will propose potential theories for differences emerging between conditions, the general discussion will critically examine these theories in more detail. The first section will discuss the conceptual understanding that participants expressed during discussions, focusing on differences across the two conditions. Reasons for these differences will be discussed in the following section, which will consider how participants constructed this conceptual understanding, and what this might indicate about the mental representations constructed in response to texts. Finally, the influence of reading ability and perceptions of truth-value will be considered.

8.4.1. The development of conceptual understanding

In terms of conceptual understanding, it was observed that participants in the NNF condition constructed significantly more complex utterances, in which participants linked multiple pieces of historical information from the text in a single utterance, than those in the ET condition. This suggests a deeper conceptual understanding of the relationships between distinct pieces of historical information presented in texts. Additionally, participants in the NNF condition produced a greater number of chronological codes: this may be because of the underlying chronological sequence that is definitive of narrative (Richmond et al., 2011). In contrast, ET participants produced a larger number of inaccurate utterances, suggesting that these participants were more likely to struggle in constructing appropriate conceptual understanding in response to texts. It may be that this is the result of struggling to construct strong mental representations that accurately represent textual information (Caccamise & Snyder, 2005).

Hypothetical and inference codes both occurred more frequently in the NNF condition than in the ET condition: in the methodology section, these codes were highlighted as two of the three codes that are likely to be indicative of the development of second-order knowledge, as they required participants to make utterances that drew on understanding not directly available in the text, therefore constructing a more subjective, personal understanding of information. The methodology also suggested that referencing codes, in which participants referred to personal prior knowledge, might also indicate heightened second-order knowledge. Whilst more frequent referencing codes occurred in the NNF condition, descriptive statistics showed that only a small proportion of these codes (approximately 15%) made reference to personal prior knowledge. As such, the finding that referencing codes occurred more frequently in the NNF condition does not indicate whether either condition developed second-order knowledge to a greater extent. Each of these codes will be considered further in the section below.

8.4.2. The construction of mental representations and the impact on conceptual understanding

To begin, chronological codes not only occurred more frequently in the NNF condition, but were also found to predict the production of both complex and inference codes. The subtle differences in the presentation of chronological information in the NNF texts may have made chronological information more central to NNF than ETs, leading to the creation of chronologically structured mental representations of these texts, thus increasing the frequency with which participants drew on chronological information during discussions. This suggestion is supported by research discussed previously, suggesting that text type influences which textual information readers more readily recall (Wolfe & Woodwyk, 2010). Consequently, chronological codes may have predicted complex codes because they enabled participants to make chronological links between multiple pieces of textual information with greater ease, due to the chronologically structured mental representation constructed. In terms of chronological codes predicting the production of inference codes, in line with Cognitive Load Theory (Sweller, 1988), it may be that knowing the chronological links between historical ideas in the text (and therefore not having to process and link these ideas in other ways) frees working memory capacity for other processes, such as inferencing.

In addition, both chronological codes and condition predicted delayed post-assessment scores, with chronological codes being the strongest predictor. This suggests that chronological thinking played an influential role in the ability to retain conceptual understanding over a longer period of time. Above, it was discussed that the increased production of chronological codes might have reflected the chronological structure of mental representations of the text. As textbases decay quickly (Kintsch et al., 1990), and are therefore not supportive of longer-term memory of a text, this might indicate that participants developed a chronologically structured situation model in response to NNF texts. However, the fact that condition remained a significant predictor of delayed post-assessment scores alongside chronological codes suggests that it may not have been chronological structure alone that influenced retention of information in texts. Alternatively, it might be that the specific, episodically rich information in NNF texts, including chronological information (e.g. ‘The following day…’), encouraged the use of episodic memory to process the text: it has been found that episodically rich texts enhance the retention of information presented within texts compared to episodically poor texts (Herbert & Burt, 2004). These theories will be considered further in the general discussion.

Referencing codes also occurred more frequently in the NNF than the ET condition, and were found to predict inference codes: this is likely to be a result of participants drawing on different forms of prior knowledge to support the process of inference generation. Referencing codes were also found to predict hypothetical thinking codes. The construction of a situation model requires the integration of a reader’s prior knowledge with information presented in the text, and therefore, more frequent referencing codes in the NNF condition might be indicative of the construction of stronger situation models than in the ET condition. This contradicts findings elsewhere which suggest that prior knowledge is utilised to a greater extent in response to ETs than narrative texts (Wolfe & Mienko, 2007; Wolfe & Woodwyk, 2010). If a higher frequency of referencing codes in the NNF condition allowed participants to construct stronger situation models representing NNF texts, it might be that, as suggested by narrative theory, a reader can then transport, or relocate, into this situation model (Gerrig, 1993; Herman, 2009a). This process of transportation may enable participants to transcend their reality (Browning & Hohenstein, 2015) in order to think hypothetically about alternative situations, thus producing hypothetical thinking codes.

It was also found that both hypothetical codes and condition predicted the production of inaccurate codes, although condition was the strongest predictor. This was despite the fact that hypothetical codes occurred more frequently in the NNF condition and inaccurate codes occurred more frequently in the ET condition. When inputting hypothetical codes as a potential predictor of inaccurate codes, it was suggested that attempts to construct understanding that is not directly available from the text might be more likely to result in misconceptions, as participants are more imaginatively constructing understanding themselves. It might be that, if participants struggled to construct a strong situation model in response to ETs, hypothetical thought became more difficult, and subsequently, hypothetical thoughts that were produced resulted in the construction of misconceptions, and thus inaccurate utterances.

Finally, referencing codes predicted collaborative (negotiation) codes, which also occurred more frequently in the NNF condition. References to prior knowledge encourage the construction of situation models: these situation models are actualisations of the text that are unique to the reader, because they are infused with the reader’s personal prior knowledge. Therefore, in a group of three participants, each participant would have constructed their own, individualised situation models in the social space between themselves and the text. When drawing on these situation models during discussions in the social space between themselves and peers, it is likely that conflicts and tensions might arise between the readers’ unique situation models, which they might then work to resolve through collaborative negotiations. In such a situation, negotiation may have been essential in creating some kind of shared, collectively accepted meaning in the social space between participants and the text. Participants may then internalise this collectively constructed meaning, altering their situation models accordingly.

On a final note, reading ability was considered in relation to the five specific content codes. Once reading ability was controlled for, condition still influenced the production of these codes. Reading ability itself was not observed to influence these codes and therefore the conceptual understanding expressed by participants. Whilst these analyses did consider participants in their small groups, where lower level readers may have been supported by higher level readers, this still indicates that whether groups had an overall higher or lower reading ability did not influence the development or verbal expression of conceptual understanding of groups as a whole.

8.4.3. Judgements of and justifications for the truth-value of texts

Analyses showed a significant difference in terms of participants’ truth-value judgements, with participants more often classifying NNF texts as partially factual, but ETs as entirely factual. When considering interventions individually, this same pattern was observed in each of the three interventions, but only reached significance in the Trench Life and Home Front interventions. Differences between texts in the War Begins intervention might not have reached significance because fewer participants stated a truth-value judgement in these discussions than in the other two interventions: as this was the first intervention, participants may have been less confident in expressing judgements, whilst this effect may have been lessened when they were asked to do so in the following two interventions.

Alternatively, a key difference between NNF and ETs across interventions might have influenced this: in the War Begins intervention, both NNF and ETs referred to historical figures, whereas in the Trench Life and Home Front interventions, NNF texts followed one or two protagonists and ETs referred to broader, general groups of people (e.g. soldiers). Perhaps the presence of protagonists made participants question whether content could be entirely factual. The few justification codes that referred to protagonists in NNF texts did indicate that some participants were considering the fact that NNF texts only considered one individual’s perspective, sometimes questioning how the author could be aware of this individual’s perspective. However, NNF and ETs in the War Begins intervention still showed that NNF texts were more often classified as partially factual and ETs as entirely factual, even if this result did not reach significance. Therefore, it is likely that other factors, alongside the presence of a protagonist, influenced how participants judged the truth-value of texts.

There was a concern that the classification of texts as NNF might negatively influence the conceptual understanding participants developed in response to these texts. However, a positive correlation was observed between the number of partially true judgements and both post- and delayed post-assessment scores: the more partially true judgements that were made within a discussion group, the higher the mean post- and delayed post-assessment scores typically were for that group. This suggests that whilst NNF texts were more likely to be classified as partially factual, this did not negatively influence assessment scores, and thus the development of conceptual understanding in response to these texts.

In terms of participants’ justifications for truth-value judgements, it was found that participants in the NNF condition justified their opinions using composition codes significantly more than those in the ET condition. Within this overarching code category, the specific making sense code was the most frequent to occur: within these codes, participants were often evaluating information presented in the text, comparing whether it made sense in relation to their own understanding of reality. For instance, participants questioned whether a grenade thrown at a roofless car could skim off the back of the car. Perhaps because NNF texts often contained protagonists acting in specific, contextualised settings, participants compared how realistic the events and actions were in relation to their own experiences of life. In doing so, participants appeared to be judging the verisimilitude of NNF texts (Bruner, 1991). Bruner argues that narrative ‘truth’ is judged by its verisimilitude, that is, by the extent to which it appears to be ‘real’ or how far it compares with our expectations of the real world. This reflects a social constructivist view of learning, in that when considering perceptions of reality, a learner should consider how viable these are in relation to their own perceptions, rather than whether they map onto a ‘true’, external image of reality. Conversely, ETs were decontextualised and generic, and therefore participants may have been less inclined to judge information in the same way. Elsewhere, it has been found that older children become more likely to judge speakers giving generic information as more knowledgeable than those giving specific information (Koenig et al., 2015). Generic language might give the impression that an author is an authoritative source of knowledge, who is not observing one situation, but has made enough observations of situations to support their generic statements. 
This might be reflected in the finding that ETs were more likely to be classified as entirely factual.

Chapter Nine: General discussion

The previous two chapters considered assessment and discourse findings independently, briefly discussing these in relation to theories and literature. This general discussion aims to bring together these two chapters: in considering findings in conjunction, it aims to more critically discuss these in relation to pertinent theories and literature. In doing so, it intends to provide insight into the four research questions set out at the beginning of this thesis:

1. How does narrative nonfiction affect the development of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
2. How does narrative nonfiction affect the longer term retention of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
3. How might mental representations diverge across narrative and expository texts for primary-aged readers?
4. How do primary-aged readers perceive and judge the truth-value of narrative nonfiction and expository texts, and what implications might this have for the development and retention of conceptual understanding?

This discussion will firstly aim to explore the first two research questions, examining the differences in developed and retained conceptual understanding across the two conditions. Following this, it will consider the mental representations constructed in response to narrative and expository texts; this part of the discussion will be framed within the process of reading set out in Chapter 4 (see figure 3, p.71). In exploring mental representations constructed, it will reflect on how these may have influenced the development and retention of conceptual understanding. Next, with regard to the fourth research question, this discussion will explore participants’ perceptions of the truth-value of texts and the implications that these perceptions might have for utilising narrative in the classroom. Finally, limitations of the research and possible future directions will be discussed.

9.1. The development of conceptual understanding in response to texts

This section will explore the development of a conceptual understanding of World War One (WWI) across the two conditions. Firstly, this section will consider substantive knowledge and historical thinking skills in conjunction, as these elements of conceptual understanding overlap. Secondly, it will consider the development of second-order knowledge, and finally, the development of inaccurate conceptual understanding.

9.1.1. Substantive knowledge and historical thinking skills

9.1.1.1. Chronological and causal thought

The ability to think chronologically was not observed to differ across conditions on written assessments. However, participants in the NNF condition outperformed those in the ET condition on the additional chronological sequencing task, and also produced more chronological utterances during discussions. These somewhat contradictory findings may be due to differences in methods of assessing chronological thought. On written assessments, participants were required to number events to show chronological sequence, whereas the additional sequencing task involved participants physically sequencing event cards collaboratively in small groups. Not only did participants have the opportunity to express, reason about and negotiate their chronological thoughts verbally during the latter task, but the process of physically ordering events might have made the concept of time more concrete, thus supporting chronological thought: young learners can show their full potential for chronological thought in tasks that are meaningful and appropriate (Hoodless, 2002). Alternatively, it might be that chronological thought became more difficult as time elapsed after interacting with texts: discussions and the additional sequencing task occurred immediately after reading texts, whereas the post-assessment was sat a week after encountering the third and final text. However, the influence of chronological thinking codes on delayed post-assessment scores (to be discussed in section 9.2) suggests that the latter theory is less probable. In terms of causal thinking, participants in the NNF condition showed a greater development of this skill, by both post- and delayed post-assessments, than those in the ET condition; no differences were observed in the frequency of causal utterances across conditions during discussions.

These results may not appear surprising in light of the fact that the chronological and causal sequencing of events is thought to be a definitive feature of narratives (Richardson, 2000; Norris et al., 2005; Richmond et al., 2011). However, the paired NNF and ETs used in this research presented the same chronological and causal information, sequenced identically. This presents the question of why differences were observed across conditions in these two areas. Despite similarities between texts, there were subtle differences in terms of chronological references in particular. A primary difference was that NNF texts made more specific references to time because they followed protagonists within a specific context. For instance, the NNF Trench Life text began a paragraph with ‘The following summer, on the 1st of July 1916,…’. The use of the phrase ‘the following summer’ embeds the event within a specific timeframe, linking this event to that described in the preceding paragraph. In contrast, the equivalent ET paragraph began with ‘On the 1st of July 1916,…’: the absence of a specific, contextualised timeframe meant that this event was chronologically isolated from other events in the text. This illustrates Bruner’s (1991) concept of ‘narrative diachronicity’: narratives are by their very nature durative, and often deal with ‘human time’, rather than the more abstract notion of ‘clock time’. Human time is time to which significance is assigned by the meaning of events occurring within the narrative. Whilst in the ETs, chronology was largely used to show ‘clock time’ (for example, the dates of specific events), the NNF texts dealt with ‘human time’ in that they described events in a protagonist’s life in relation to one another, within a specific, ‘human’ timeframe.

Additionally, whilst NNF texts were written in continuous prose, ETs included thematic sub-headings (e.g. ‘Life in the Trenches’). These sub-headings refocused the reader on the thematic content of the paragraph, rather than any possible chronological connection to the previous paragraph. Finally, in the ETs, significant dates usually fronted sentences (e.g. ‘On the morning of the 28th June, in 1914, crowds of…’), whereas in NNF texts, the same information was embedded within description (e.g. ‘Elsewhere, the skies were crystal clear on the morning of the 28th June, 1914, as crowds…’). These three slight differences in the presentation of chronological information might have contributed to the sense of a strong ‘chronological thread’ in NNF texts: this chronological thread runs throughout the whole text, connecting all events through the use of ‘human time’, rather than clock time. Whilst the chronological sequence of information in ETs meant that a chronological thread was present in these texts, the three features of the presentation of chronological information, described above, weakened this thread.

The above only considers differences in the presentation of chronological information across NNF and ETs; no differences were apparent in the presentation of causal information. However, it might be that the presence of a stronger chronological thread supports both chronological and causal thought. It is known that chronology and causality are closely intertwined: two of the four properties of causality relate to chronology, namely temporal priority (that a cause can never occur after its consequent) and operativity (that a cause is active at the time that its consequent occurs) (van den Broek, 1990). Whilst research elsewhere has highlighted that events with a greater number of causal connections are more easily recalled from narratives than those with fewer causal connections (Black & Bern, 1981), other research places an emphasis on the memorability of causal chains that chronologically unfold (Trabasso & van den Broek, 1985; Kintsch, 1998; Graesser et al., 2002): the chronological sequencing of the causal events is important. Therefore, it can be argued that the strong chronological thread running through the NNF texts enhanced both chronological and causal thought. However, claims over the power of narrative to support chronological and causal thought in relation to history must be made with caution: whilst these findings indicate that narrative might support chronological and causal thought, this was only in relation to events that occurred within a short timeframe. It is uncertain whether narratives could be utilised to support more complex chronological and causal thinking within larger timeframes, for instance, the sequencing of historical periods, and an understanding of the length of these historical periods relative to one another. Despite this, chronology did appear to be central to the interpretation of NNF texts, as will be discussed throughout the coming sections.

9.1.1.2. Simple and complex conceptual thought Participants in the NNF condition showed an enhanced capacity for simple conceptual thought on both post- and delayed post-assessments compared to those in the ET condition. Simple conceptual thought involved the recall of single units of substantive knowledge (that were not chronological or causal in nature). Secondly, although complex conceptual thought was not observed to differ on written assessments across conditions, participants in the NNF condition did produce more frequent complex utterances during discussions, in which they verbally linked multiple units of substantive knowledge. Similarly to the issues with measuring chronological thought, discussed above, it might be that participants only showed their potential for complex conceptual thought when provided with appropriate opportunities to do so; participants may have found it more difficult to transcribe complex ideas than to verbally express them, and additionally, it may not have been clear on written assessments that more complex answers were required. The construct validity of assessments will be discussed further in the limitations section (see section 9.5.2). Despite these possible issues, it was evident that participants in the NNF condition showed an enhanced capacity for simple conceptual thought, and also more frequently constructed complex utterances during discussions, than those in the ET condition.

Enhanced simple conceptual thought and more frequent complex utterances might also be a result of the chronological thread running through NNF texts, as suggested by the finding that chronological utterances predicted the production of complex utterances. Drawing on Herbert & Burt’s (2004) work on how learners respond to unfamiliar topics might help to illustrate how chronology can support these areas of conceptual understanding. Whilst the teacher (or in this case, the author) is an ‘expert’ on a topic, who has a mental representation of propositions organised in a complex web of interrelated facts, this complex web is not accessible to learners (or readers) who are unfamiliar with this topic. As such, learners can struggle to establish relations between propositions presented in texts in meaningful ways, leaving propositions conceptually isolated. This leads to difficulties in understanding the relevance of these propositions, and therefore difficulties in developing an appropriate conceptual understanding of them. However, in the current research, the presence of a strong chronological thread in NNF texts might have allowed participants to establish chronological relations between propositions; this thread provides a familiar structure to embed unfamiliar propositions within, comparable to the narrative structure that we use to organise our own experiences and thus make sense of the world around us (Bruner, 1986). This may support an initial conceptual understanding of propositions, as reflected in NNF participants’ advanced capacity for simple conceptual thought. It may also have supported complex conceptual thought: chronological relations could be used to link multiple units of substantive knowledge, in order to produce complex utterances in response to discussion questions. For participants reading the ETs, in the absence of a strong chronological thread, historical propositions may have been more difficult to establish relations between, and therefore more likely to remain conceptually isolated. This may have resulted in less developed simple conceptual thinking skills, and greater difficulties in formulating complex utterances.

However, these findings dispute those of research elsewhere, which suggests that readers recall more general details but less specific ‘to-be-learned’ content from educational narratives (Wolfe & Woodwyk, 2010) than expository texts. It was suggested that this was because narratives initiate reading goals and schema not appropriate for learning information from texts (Harp & Meyer, 1998): narratives encourage reading for enjoyment, the establishment of cohesion, and therefore a focus on events central to the narrative plot. However, these contradictory findings might be a result of the degree of integration of to-be-learned information in narrative texts. In Wolfe & Woodwyk’s (2010) narrative about the circulatory system, to-be-learned content might not always have been necessary to the progression of the plot: the narrative might still make sense with much of the to-be-learned content removed. In the current research project, to-be-learned content was more often integral to the progression of the narrative: if removed, the narrative would make less sense, and therefore this content was not irrelevant detail to be disregarded. This is supported by the findings of Negrete (2005), who found that undergraduate students were more likely to remember scientific information embedded in a story when the information was central to the development of the story. This has implications for possible further research in this area and educational practice: an important consideration when constructing educational narratives is how the content intended to be learned is integrated into the narrative.

9.1.1.3. Substantive knowledge of unfamiliar experiences: Trench Life Participants in the NNF condition showed a greater substantive knowledge of the ‘unfamiliar experience’ theme (Trench Life) than those in the ET condition, at both post- and delayed post-assessments. As indicated in the assessment discussion (Chapter 7), the construction of a mental representation of the text in the NNF condition may have enabled transportation into this mental representation, allowing the unfamiliar experiences portrayed in the text to become more concrete to readers. Transportation may allow readers to suspend their disbelief regarding information which, whilst fitting comfortably within a narrative, does not sit comfortably with their perception of the reality that they inhabit (Bruner, 1986), allowing them to imagine a possible world that does not necessarily conform to their expectations of reality. This storyworld provides a simulation, conveying unfamiliar information that cannot be directly accessed by the reader in a familiar context (Mar & Oatley, 2008). These findings are consistent with those of Browning & Hohenstein (2015), who found that narratives can support children in transcending specific conceptual constraints in order to gain an understanding of evolution.

Despite the above, Chapter 7 questioned why no difference in conceptual understanding was observed between conditions for War Begins texts, which also dealt with unfamiliar concepts. Two possible reasons were proposed for this: firstly, that processing of the War Begins texts focused on the chronological sequence of events, rather than the experiences of a group of people, or secondly, that War Begins texts lacked a key protagonist. The former is unlikely, as chronology has emerged as central to the processing of NNF texts generally, as evidenced through discourse data. Regarding the latter, participants’ judgements of the truth-value of texts indicated that young readers are sensitive to the presence of protagonists, classifying texts with protagonists as partially factual significantly more than the equivalent ETs. As such, once a reader recognises and empathises with a specific protagonist, they might be more inclined to transport into a mental representation in order to view events portrayed in the text through the eyes of the protagonist; the lack of a protagonist may inhibit this process when reading ETs. The notion that readers can transport into mental representations of narratives but not of ETs suggests that mental representations of narratives might be unique in some way; this will be considered further in section 9.3.

9.1.1.4. Individual interest and expository texts Whilst NNF texts supported the development of specific areas of conceptual understanding, ETs appear to have supported the development of conceptual understanding more effectively for participants with an individual interest in WWI, and consequently, higher levels of prior knowledge, than those with no such interest. It was argued above that the lack of a strong chronological thread in ETs contributed to participants struggling to establish relations between historical propositions presented in the texts. Subsequently, propositions remained conceptually isolated, which caused difficulties for participants attempting to contextualise and understand information. It might be that an enhanced prior knowledge enabled participants with an individual interest in WWI to establish more appropriate relations between propositions in response to ETs, thus supporting the development of conceptual understanding. Chapter 7 noted that research elsewhere supports this, suggesting that prior knowledge is more supportive of expository than narrative comprehension (Wolfe & Mienko, 2007; Best et al., 2008; Wolfe & Woodwyk, 2010). Similarly, McNamara et al. (1996) found that readers with lower levels of prior knowledge benefit from more cohesive texts, whilst readers with a greater level benefit from reading texts with low cohesion. In line with this, those with lower levels of topic-specific prior knowledge in the current research may benefit more from NNF texts, in which a strong chronological thread supports the establishment of cohesion, whereas those with a higher level of topic-specific prior knowledge can use this knowledge to establish relations between propositions in ETs, in the absence of a chronological thread. However, whilst prior knowledge might support the comprehension of ETs, there is no evidence to suggest that it supported the comprehension of ETs to a greater extent than NNF texts.
Despite this, a greater number of referencing codes were observed in the NNF condition; this presents a slight paradox, in that whilst NNF might enable the use of prior knowledge to a greater extent, the use of prior knowledge might be more beneficial to expository comprehension. Referencing codes will be discussed further in section 9.3.2.1.

Despite the enhanced development of substantive knowledge in response to ETs for those with an individual interest in WWI, this understanding was not retained by delayed post-assessments, resulting in this group of participants making no greater development in conceptual understanding than those with no interest in WWI by delayed post-assessments. Possible reasons for difficulties in retaining conceptual understanding will be discussed in section 9.3.3. It should be noted here that, whilst ETs may be beneficial for those with an increased individual interest, and thus a heightened prior knowledge, it is important to pursue ways of ensuring that the initially developed conceptual understanding is retained as longer-term learning.

9.1.2. Second-order knowledge Participants in the NNF condition appeared to develop a greater second-order knowledge than those in the ET condition, through the more frequent generation of inferences and the production of hypothetical thoughts during discussions. With regards to inference generation, this is supported by research elsewhere, suggesting that narratives are associated with the production of a greater number of inferences than expository texts (Britton & Gülgöz, 1991; Graesser, 1981). This is thought to be because readers draw on their personal experiences to make knowledge-based inferences in order to establish textual cohesion. Readers are able to do this because narrative structures mirror those that individuals use to organise and make sense of their own lives (Graesser et al., 1994): narratives are social simulations (Mar & Oatley, 2008). Regarding hypothetical thought, the process of transporting into the mental representation of a narrative might allow this: this process transports the reader into an ‘experimental laboratory’ (Hakkarainen, 2008: 293), in which new understanding and possibilities can be safely experimented with and explored. As such, hypothetical thinking might occur. Hypothetical thinking involved participants either imagining themselves in scenarios presented in the texts (e.g. ‘If I were in the war…’), or imagining relevant but alternative scenarios to those presented in the texts (e.g. ‘If Germany took their troops out of Belgium…’). The former might support second-order knowledge in that participants are seeking to understand how an historical figure might have felt, or why they might have acted in a particular way, thus developing a subjective understanding of the historical figure. 
The latter might support second-order knowledge in that participants are employing possibility thinking (Cooper, 2018a): they are considering possibilities outside of those directly presented in the texts, acknowledging the motives of historical figures in considering how alternative paths might have been taken.

The above indicates that participants may have developed a greater second-order knowledge in the NNF condition than in the ET condition. However, caution must be taken here. Whilst the inference code was only assigned to appropriate inferences, the hypothetical code was part of the function coding scheme, and therefore did not indicate the accuracy or appropriacy of hypothetical thoughts. Therefore, whilst suggesting that participants attempted to construct second-order knowledge, the presence of these codes does not mean that participants were able to construct appropriate second-order knowledge. The following section will consider conceptual inaccuracies or misconceptions formed in response to texts.

9.1.3. Inaccuracies in conceptual understanding Despite the fact that no significant difference was observed in post-assessment scores across the two conditions, participants in the ET condition produced more inaccurate utterances during discussions than those in the NNF condition, suggesting that participants in the ET condition developed a less accurate conceptual understanding of WWI. This might be a result of differences in the ways that assessments and discourse codes measured conceptual understanding. Written assessments only measured accurate conceptual understanding: points were awarded for accurate answers, but not deducted for inaccurate answers, and therefore assessments did not recognise inaccurate conceptual understanding. If inaccurate conceptual understanding had been scored in some way on written assessments, perhaps a difference between the two conditions would have been observed. Whilst discourse codes did recognise inaccurate conceptual understanding, they could not identify the specific nature of misconceptions. While inaccurate codes were only assigned to utterances categorised as ‘recall’ utterances, it was impossible to determine whether inaccurate utterances were a result of misunderstanding information in the text, leading to the retrieval of inaccurate substantive knowledge, or whether they were the result of inaccurate conclusions being made through processes of constructing second-order knowledge, such as inferencing or hypothetical thought.

Evidence from this research supports both of these suggestions: both condition and hypothetical utterances were found to predict the production of inaccurate utterances, with condition being the strongest predictor. With regards to previous arguments, it might be that the weaker chronological thread in the ET condition meant that participants struggled to establish relations between propositions in the text, thus leaving these propositions conceptually isolated, making the development of a conceptual understanding more difficult. This might result in misconceptions, through inaccurate attempts to establish relations between these propositions, and a subsequent lack of understanding of the propositions presented. Additionally, it might be that when participants in the ET condition attempted to make hypothetical utterances, they struggled to do so because they could not transport into their mental representation of the text (to be discussed further in section 9.3.2.4). As such, inaccuracies may have arisen from these attempts at hypothetical thought. This is partially supported by research elsewhere, which has found that more invalid inferences and irrelevant comments are made in response to ETs (Kraal et al., 2018; Karlsson et al., 2018): difficulties in thinking hypothetically might well lead to irrelevant, inaccurate utterances being made, and to the generation of inferences that are not appropriate to the text.

However, despite inaccurate utterances occurring more frequently during discussions in the ET condition, the nature of the discussions meant that there were opportunities for other participants to alter these misunderstandings, or for participants to collaboratively construct a more accurate answer. Therefore, because of the collaborative, dynamic nature of discussions, the production of inaccurate utterances did not necessarily result in the development or retention of inaccurate conceptual understanding.

9.1.4. Conclusions on the development of conceptual understanding Overall, in terms of developing a conceptual understanding of WWI, it appears that NNF texts offered various benefits, including the enhanced development of chronological and causal thinking processes, simple conceptual thinking skills, and the formulation of complex utterances, which all seem to be a result of the chronological thread running through NNF texts. Additionally, the process of transportation into the mental representation of NNF texts might have allowed participants to gain a greater substantive knowledge of unfamiliar concepts (specifically Trench Life), and to develop a stronger second-order knowledge of WWI. In contrast, ETs led to more inaccuracies in conceptual understanding during discussions, which may have arisen from difficulties in establishing relations between propositions presented in the text, or from inappropriate attempts at constructing second-order knowledge, specifically through hypothetical thought. However, those participants with an individual interest in WWI, and consequently an increased prior knowledge of WWI, were more effective at developing conceptual understanding in response to ETs than those with no interest in WWI, perhaps because their prior knowledge supported them in establishing relations between otherwise conceptually isolated propositions presented in ETs.

9.2. The retention of conceptual understanding in response to texts Text type influenced how able participants were to retain initially developed conceptual understanding. Whilst participants did not initially develop a greater conceptual understanding in response to NNF texts, the fact that they could retain their developed conceptual understanding more effectively over the six-week period with no teaching meant that overall learning was stronger in the NNF condition. This mirrors findings elsewhere that greater differences emerge between narrative and expository texts when conceptual understanding is assessed a longer period of time after interactions with texts (Maria & Johnson, 1990; Arya & Maul, 2012).

In seeking to understand why NNF texts supported retention, it was found that both chronological codes and condition significantly predicted delayed post-assessment scores. In line with the suggestion of Herbert & Burt (2004), it is thought that condition predicting delayed post-assessment scores might relate to the episodic richness of narratives. NNF texts were context-specific and rich in episodic detail in terms of time (e.g. ‘The following night…’) and the presence of protagonists (e.g. ‘Harry stood…’). In contrast, ETs were more generalised, and therefore episodically poorer, in terms of both time (e.g. ‘during the nights’) and references to general groups of people (e.g. ‘The soldiers stood…’). This might result in the activation of the episodic memory in processing NNF texts to a greater degree than ETs. This is supported by neurological research, which suggests that similar areas of the brain are activated when comprehending narratives and drawing on the episodic memory (Mar, 2004; Yuan et al., 2018), whereas semantic associations are more important to the processing of ETs (Wolfe, 2005). Although it could be argued here that the episodic richness of expository texts could be increased to achieve a similar effect on retention, without following a specific protagonist within a specific context, it is difficult to see how expository texts can be as episodically rich as narratives: as discussed in section 2.2.5 of the literature review, episodic memory and narrative are thought to be closely related.

Despite this argument, chronological codes were found to be a stronger predictor of delayed post-assessment scores than condition. It might be that the chronological structure of NNF texts supported the retention of conceptual understanding; if a chronological mental representation of the text is constructed, this might establish important links between various parts of the text, which allow more information to be retrieved from the narrative with greater ease (Radvansky, 2012). It is likely that retention of narrative information is due to a combination of episodically rich information embedded within a chronological structure: narrative is ‘residing in memory as separate but unified episodes with rich context and event sequences’ (Clariana et al., 2014:604).

However, the above theories suggest that participants were retrieving information from mental representations of the texts when they were completing delayed post-assessments. As these assessments were sat six weeks after the third and final intervention, it is more likely that participants were retrieving information that they had integrated into pre-existing knowledge structures. If this is the case, then conceptual understanding developed from NNF texts is likely to have maintained some kind of chronological structure when integrated into pre-existing knowledge structures. In order to explore this, it is firstly important to consider the construction of mental representations in response to narrative and expository texts.

9.3. Mental representations of narrative and expository texts This section of the discussion will explore the mental representations constructed in response to NNF and ETs. This discussion will be framed within the stages of reading relevant to this research, as set out in Chapter 4 (see figure 3, p.71). Stage Two of the reading process will be considered first: this stage involves the construction of textbases, followed by situation models. In the discussion regarding situation models, two distinct types of situation models will be proposed in relation to NNF and ETs. Following this, Stage Three of the reading process will be discussed, in which conceptual understanding developed in response to texts is transferred into pre-existing knowledge structures.

9.3.1. Stage Two: Interpreting the text. The construction of a textbase Previously, it was argued that NNF texts used in this research had stronger chronological threads than ETs, and that these allowed participants to establish chronological relations between propositions presented in texts. This process constitutes the construction of a chronologically structured textbase. In contrast, participants in the ET condition might have found it more difficult to establish relations between propositions in the absence of a strong chronological thread, particularly if they lacked the adequate prior knowledge to establish any alternative relations. Consequently, constructed textbases might have been weaker, with propositions, or historical facts, being conceptually isolated (Herbert & Burt, 2004). As these differences existed despite the texts in this research being highly similar in terms of chronological structure, it is possible that the observed effects would be exaggerated further in naturalistic texts, where children’s expository texts on historical subjects are more likely to be thematically rather than chronologically organised (Meyer, 1985). In addition, the topic of WWI was unfamiliar to participants: if the topic were more familiar, participants may have established alternative relations between propositions in response to NNF texts, and been more able to establish these relations in response to ETs. Consequently, textbases may have been equally strong across different text types. This is supported by the finding that participants with a greater interest in WWI, and subsequently a higher level of prior knowledge, developed conceptual understanding more effectively in response to ETs than those with no interest. It is important to note here that while prior knowledge is not actively integrated into textbase representations, the presence of prior knowledge can still support understanding and the ability to link propositions. 
For instance, knowledge of the Austro-Hungarian Empire might support participants in being able to ‘see’ why propositions about this Empire and Germany might be related, as they are aware that these areas shared a border in the same continent. Finally, whilst textbases are not of direct importance to learning processes, it is thought that the textbase determines the nature of situation models, in that it contains information on the specific, linguistic style in which a text conveys information (van Dijk & Kintsch, 1983).

9.3.2. Stage Two: Interpreting the text. The construction of situation models In exploring the construction of situation models, this section will propose the existence of two distinct types of situation models: storyworlds, typically constructed in response to narratives, and conceptual models, typically constructed in response to expository texts. Firstly, it will explore what might initiate the construction of one or the other of these situation models. Following this, it will outline a brief definition of each of these. Finally, it will look at distinguishing features of these situation models, including their structure, processes of transportation and the importance of protagonists. Throughout, it will reflect back on the development and retention of conceptual understanding, illustrating how these distinct situation models might have impacted the findings discussed above.

9.3.2.1. Initiating the construction of distinct situation models An essential part of the construction of a situation model is the activation and integration of a reader’s prior knowledge. Despite previous research suggesting that prior knowledge is utilised more frequently during expository text comprehension (Wolfe & Mienko, 2007; Best et al., 2008; Wolfe & Woodwyk, 2010), the current research found that more references to prior knowledge were made in response to NNF than ETs. Initially, it was assumed that these contradictions may be due to differences in the type of prior knowledge measured. Both Best et al. (2008) and Wolfe & Woodwyk (2010) appeared to only measure participants’ topic-specific prior knowledge, rather than their personal prior knowledge. On this basis, they concluded that when reading an expository text, the reading goal is typically to learn; as a result, readers activate topic-specific prior knowledge, relevant to the subject of the text, in an attempt to understand textual content in light of what they already know. In contrast, when reading a narrative, readers typically read for enjoyment; as narratives mirror structures that individuals use to make sense of their own lives, readers seek cohesion by drawing on personal knowledge to make knowledge-based inferences (Britton & Gülgöz, 1991; Graesser, 1981; Graesser et al., 1994; Elbro & Buch-Iversen, 2013).

The current research did find that participants in the NNF condition generated more inferences, but also, that they made more references to prior knowledge, and that these references predicted the production of inferences. Although prior knowledge in this research encompassed both topic-specific and personal prior knowledge, only approximately 15% of referencing codes referred to personal prior knowledge, whilst the remaining 85% referred to topic-specific prior knowledge. The typical use of different types of prior knowledge in response to different text types, described above, may still stand, yet as a hybrid of narrative and exposition, NNF may encourage simultaneous reading goals of reading to learn and for enjoyment. Therefore, this text type might encourage the use of both topic-specific and personal prior knowledge to a greater extent than either narrative or expository texts individually, alongside encouraging the generation of knowledge-based inferences. Although it is uncertain as to how readers will categorise NNF, and therefore how they respond to it in terms of reading goal, this research found that participants in the NNF condition were more likely to label texts as partially factual³⁵: this indicates that participants did recognise NNF texts as a fusion of narrative and exposition.

9.3.2.2. Distinguishing between and defining storyworlds and conceptual models At this point, it is possible to begin to draw a distinction between the situation models constructed in response to narrative and expository texts. As described above, different text types and their associated reading goals may initiate unique uses of prior knowledge, which in turn may initiate the construction of unique situation models. In relation to a narrative text, references to personal prior knowledge and the generation of knowledge-based inferences help to establish cohesion in the mental representation constructed: the situation model constructed is a cohesive whole, or a ‘microworld’ (Graesser et al., 2002:258). It represents an alternative possible world, or reality. This is reminiscent of Herman’s definition of storyworlds as ‘global mental representations enabling interpreters to frame inferences about the situations, characters, and occurrences’ (2009b:72-73), and therefore, going forward, the term storyworld will be used to refer to narrative situation models. In contrast, rather than representing a cohesive whole, situation models constructed in response to expository texts are more likely to integrate topic-specific prior knowledge with relevant information presented in the text in order to gain an understanding of new information in the context of what a reader already knows: these situation models are individual mental representations of the key concepts presented in the text. Therefore, the term conceptual models will be used to refer to the situation models typically constructed in response to expository texts. Throughout the previous two chapters, it was suggested that NNF texts might support the development of stronger situation models than ETs, thus resulting in the more effective development of conceptual understanding. This section highlights that, alternatively, NNF and ETs might initiate the construction of distinct types of situation models, which offer their own, unique benefits depending on the reader and circumstances of reading. These will be explored in more detail in the following sections.

³⁵ While this difference was only significant for Trench Life and Home Front texts, War Begins texts showed a similar pattern, which did not reach significance.

9.3.2.3. The underlying structure of storyworlds and conceptual models In constructing a storyworld, there needs to be a thread that holds all events in the storyworld together, to ensure cohesion. Throughout the above discussion, it has become evident that chronology is central to the storyworld. Firstly, it was argued that the chronological thread running through NNF texts enabled participants to establish chronological relations between propositions, thus constructing a chronologically structured textbase. However, the fact that chronological codes also predicted delayed post-assessment scores suggests that this chronological structure is maintained beyond the textbase, which decays rapidly (Kintsch et al., 1990), and is unlikely to be strong enough to retrieve information from after six weeks of no teaching. Therefore, it is suggested that the storyworld retains this chronological structure. This might be supported further by the finding that chronological utterances predict the generation of inferences. In Chapter 8, it was suggested that this might have been a result of a chronologically structured textbase reducing cognitive load, as relations between propositions were simple to establish, and therefore more working memory space was available to generate inferences. However, in light of the above discussion, an alternative reason is proposed: narratives are ‘irreducibly durative’ (Bruner, 1991:6), and therefore a chronologically structured textbase might initiate the construction of a storyworld, in which the same chronological structure is retained. If a strong sense of chronology initiates the construction of a storyworld, it is simultaneously encouraging readers to make inferences in order to establish further cohesion within this storyworld.

However, these findings do not sit neatly with event segmentation theory, which suggests that a reader segments a text into separate situation models denoting the different events that occur in a text: discontinuities in the five dimensions monitored during reading might signal the end of one event, and therefore the beginning of a new event (Zacks et al., 2007; Bailey et al., 2017). Whilst the current research does not intend to disregard this theory, in light of the above findings, it proposes a shift in how this theory is framed in relation to narrative. This thesis proposes that what were previously considered as discontinuities on any of the five dimensions might be considered as continuities between different events occurring within a narrative, therefore connecting these otherwise separate events. Through this process, a cohesive, global storyworld is constructed. In contrast, conceptual models constructed in response to expository texts that describe events are more likely to be segmented where discontinuities occur in order to allow for distinct sections to then be integrated with relevant prior knowledge. This results in the creation of multiple conceptual models, with each model representing the different events, and concepts, denoted in the text.

An example from the texts used in this research may illustrate this point further. Trench Life texts explored life in the trenches, before moving on to consider the Battle of the Somme. The beginnings of the paragraphs about the Battle of the Somme are illustrated below:

NNF: ‘The following summer, on the 1st of July 1916, about 100km away from where Harry had fallen asleep after his first sentry duty, mist settled over no-man’s land.’

ET: ‘On the 1st of July 1916, rows of soldiers sat in the Front Line trenches in an area of France called Somme.’

In response to the NNF text, both time and spatial location changed, initiating the beginning of a new event. However, this does not lead to the construction of a new storyworld; the references to time and space link back to the previously described event, and the protagonist remains consistent, thus creating a bridge in the storyworld, connecting Harry’s actions discussed in the previous paragraph to the actions in the subsequent paragraph. As suggested by Clariana et al., ‘elements of narratives such as the story setting, sequential events, and characters should strongly bind intra-story associations’ (2014:603). Conversely, with regard to the ET, the starker change in time and spatial location, with no link back to previous events and no protagonist to connect this paragraph to the preceding one, may initiate the construction of a new conceptual model: the conceptual model on life in the trenches is completed, and a new one is initiated to represent the Battle of the Somme.

These differing structures are exemplified in this research. Firstly, participants in the NNF condition were more able to construct complex utterances than those in the ET condition, linking multiple pieces of textual information; this is a likely result of their cohesive storyworld with established links between events. This is supported by Radvansky (2012), who notes that event segmentation can affect retrieval of information in different ways: if retrieving information when different events share attributes (for instance, the same protagonist, or occurring within the same, overarching chronological timeframe), retrieval is improved, whereas when attempting to retrieve information about an individual event (or conceptual model) that does not share common attributes with other events, retrieval interference can occur. This might additionally explain why those in the NNF condition retained a greater conceptual understanding than those in the ET condition: all events in NNF texts occurred within the same, strong chronological thread, and this provided a link, or bridge, between all events, thus supporting retention of information. Further to this, while participants with an individual interest in WWI developed significantly more conceptual understanding than those with no interest in the ET condition, they could not retain this over time; this may be because conceptual understanding was represented in multiple conceptual models, with no links connecting them, making the retrieval of information more challenging.

The differences in the structure of situation models constructed in response to narrative and expository texts are also supported by research elsewhere. Clariana et al. (2014) found that expository texts were more often recalled in a list-like way (listing individual concepts), whereas narratives were retold in a narrative form, maintaining narrative elements. Similarly, chronological confusion has been observed more frequently in response to expository than narrative texts (Browning & Hohenstein, 2015), and narratives are more likely to be recalled in their original order, whereas more reversals of the order of information occur when recalling expository texts (Wolfe & Mienko, 2007; Wolfe & Woodwyk, 2010). As discussed in Chapter 5 (section 5.2.1, pp.80-81), while the authors presented further evidence that this was because narratives resulted in stronger textbases, whereas expository texts resulted in the construction of stronger situation models (Wolfe & Woodwyk, 2010), the statistics presented regarding situation models were questionable. The current research instead argues that results presented in these studies reflected the presence of stronger textbases and similarly structured storyworlds in response to narratives, and stronger conceptual models in response to expository texts, which would account for differences in the order of textual information recalled: conceptual models are distinct and rearranged to accommodate a reader’s prior knowledge, whereas storyworlds are cohesive, and maintain the structure of the original text.

9.3.2.4. Transportation into situation models

Previously in this discussion, it was suggested that readers can transport into their mental representation of a narrative to support conceptual understanding, but that this is less likely to occur in response to ETs. It might be that readers can transport into storyworlds because these mental representations represent context-specific, cohesive, alternative possible worlds that can be explored; these worlds are structured in a similar way to that in which individuals structure their own reality, and readers can process them as such. This is supported by suggestions that episodic memory draws on autonoetic consciousness, which allows mental time travel, enabling individuals to imagine possible future happenings (Tulving, 2005): this argument could be extended to suggest that it also allows imagining possible pasts. Conversely, readers of ETs are less likely to transport into conceptual models because these do not represent alternative possible worlds, but rather, a set of context-free, generalised concepts. These are not mental representations that readers can experience and ‘live’ in, as they do not parallel the structure of a reader’s reality.

This argument is supported in the finding that the production of referencing codes predicted the production of hypothetical codes, both of which occurred more frequently in the NNF than the ET condition. It is likely that references to prior knowledge support the successful construction of a cohesive storyworld, which subsequently enables transportation into this storyworld, and therefore hypothetical thought. This may appear to raise the question of why referencing codes, rather than inference codes, predicted hypothetical thought, as the current line of argument sets out that knowledge-based inferences are also important to establishing a cohesive storyworld. However, as hypothetical thinking codes preceded inference codes in the coding schemes, this possibility could not be explored (see section 6.8).

9.3.2.5. The role of the protagonist

It is evident that participants were sensitive to the presence of protagonists, as texts that followed protagonists were more likely to be classified as partially factual than texts with no protagonist, or reference only to historical figures, which were more often categorised as entirely factual. This indicates that the presence of a protagonist might also influence how a reader processes a text, possibly encouraging readers to construct a storyworld rather than a conceptual model. The importance of the protagonist in narrative comprehension is supported by neurological research: areas of the brain activated whilst reading narratives are associated with theory-of-mind processes, including perspective-taking and empathy (Mar, 2011; Yuan et al., 2018), and areas of the brain associated with processing particular actions are activated when relevant actions occur in a narrative (Speer et al., 2009). Such research suggests that readers are processing narrative events as though they themselves are active participants in these events. This may relate to the process of transportation: while the cohesive, chronological structure of the storyworld provides an alternative world that a reader can transport into, readers may be motivated to transport into this storyworld to view events through the eyes of a protagonist. This reflects definitions of narratives, which consider the landscape of consciousness that is unique to narratives (Bruner, 1986), and the emotional processing of narrative relating to human agency (Bruner, 1991; Boström, 2008; Klassen & Froese-Klassen, 2014). Although only anecdotal evidence, it is noteworthy that a number of participants showed a great interest in Harry Stinton, the soldier in the Trench Life text: after interventions, they were curious to know whether he had survived the Battle of the Somme, and the remainder of the war, showing empathy for this protagonist.

9.3.2.6. A summary of the distinct nature of storyworlds and conceptual models

Overall, discourse codes and other findings suggest differences in how participants responded to NNF and ETs, which may reflect the nature of the situation models constructed in response to different text types. It is argued that, in response to a narrative text, readers are more likely to construct a chronologically structured, cohesive storyworld: this is a global representation of the text that emphasises the connections between events denoted in the text. Knowledge-based inferences, which draw on a reader’s prior knowledge of personal experiences, are made to establish cohesion. The chronological structure of the storyworld supports chronological and causal thought, and the development of a complex conceptual understanding. The context-specific nature enables participants to transport into the storyworld, which represents an alternative reality, and they are motivated to do so to traverse this storyworld alongside, or through the eyes of, the protagonist. This enables the development of a conceptual understanding of unfamiliar concepts and the development of a greater second-order knowledge of historical concepts. In response to expository texts, conceptual models are constructed: readers segment the text into different conceptual models to represent the different events and concepts denoted in the text. Readers integrate their topic-specific prior knowledge into conceptual models, reorganising textual information so that it fits with prior knowledge, in order to understand the text in the context of what they already know. As such, readers with higher levels of prior knowledge can develop conceptual understanding more effectively in response to this text type than those with lower levels of prior knowledge.
However, the construction of situation models is unlikely to constitute long-term learning, but rather, the way in which situation models, or information from situation models, is embedded into long-term memory constitutes learning from texts. The following section will go on to explore how both storyworlds and conceptual models might be embedded into long-term memory.

9.3.3. Stage Three: Translating the text into learning

To be able to answer delayed post-assessment questions accurately, it is likely that conceptual understanding was integrated into participants’ pre-existing knowledge structures, rather than retrieved from situation models constructed between six and nine weeks previously. To explore this process in relation to storyworlds, distinguishing the storyworld from the concept of possible worlds (Bruner, 1986), and more specifically ‘historical pasts’ (Goldstein, 1962), is important. Firstly, possible worlds represent possible, alternative realities, or worlds. A storyworld represents the world portrayed by a narrative, and the events that occur within this world: it is a type of possible world, but one which might not be believed to be entirely factual. An historical past is another type of possible world, which represents how an individual perceives the world in the past, or a particular time period. These different possible worlds are all context-specific. Once a reader has constructed their storyworld, this storyworld is comparable in structure to a reader’s many possible worlds, or historical pasts: in constructing the storyworld, the reader is likely to have utilised any topic-specific prior knowledge from the relevant historical past. Information in the storyworld that the reader feels holds truth-value might then be translated from the storyworld into the historical past. The information remains context-specific, and it might run parallel to other context-specific details already in the historical past. For instance, a reader may already have an historical past about WWI, within which a particular soldier is crossing no-man’s land to save a fellow soldier. Parallel to this, a reader might then integrate Harry Stinton peering out through a tangle of barbed wire to watch over no-man’s land whilst on sentry duty.

This suggestion that all conceptual understanding relating to WWI is stored in a cohesive historical past is supported by the finding that participants in the NNF condition performed significantly better on the two assessment questions that spanned all three interventions at the delayed post-assessment than those in the ET condition. This shows that NNF participants were more able to retrieve information from all three intervention sessions in conjunction, suggesting that understanding developed in response to all three texts might be stored in some form of cohesive historical past. As a learner encounters more specific ‘episodes’ relating to a topic, knowledge within these episodes can gradually shift from context-specific episodic knowledge to a more generalised, schematic understanding in a process of schematisation (Herbert & Burt, 2004). To continue with the previous example, the same learner might go on to watch a film in which another soldier is looking out over the top of the trench; this context-specific information might also be integrated into the historical past, parallel to the information about Harry Stinton. As more episodes are encountered and accumulated, the learner integrates a general understanding into their semantic memory that soldiers often have to be on sentry duty, which involves looking out from the trench to sense any movement.

In contrast to the above, previous research suggests that only context-free information is transferred into possible worlds, rather than context details (Gerrig & Prentice, 1991; Gerrig, 1993). However, this research considered context details that starkly contradicted or aligned with reality (e.g. the vice-president of the United States is George Bush/Geraldine Ferraro): readers with knowledge of the vice-president can easily reject inaccurate context details. When reading context details on an unfamiliar topic, it might be that these details are integrated into possible worlds: with a lack of appropriate prior knowledge, context details are not easily verifiable to the reader, and are therefore less likely to be dismissed. Despite this, Gerrig and Prentice’s (1991) findings might be supported by the fact that whilst NNF texts were context-specific, participants were able to appropriately answer post- and delayed post-assessment questions which were all general in nature (for instance, ‘Describe the factory jobs which women did during the war’, rather than ‘Describe the factory job that Ethel May Dean did during the war’). However, as observed elsewhere, young readers are able to apply conceptual understanding from specific statements in narratives to real-life situations (Ganea et al., 2011): this might not be a reflection of how information is stored in pre-existing knowledge structures, but of how learners are able to apply context-specific understanding that is stored in possible worlds.

In contrast, conceptual models segment expository texts, with multiple conceptual models representing the different situations and concepts denoted in the text. Topic-specific prior knowledge is integrated into these conceptual models, and these conceptual models are stored in relation to similar conceptual models in a reader’s pre-existing knowledge structures. For instance, a learner might construct a conceptual model about sentry duty; this may then be linked to any other conceptual models related to soldiers or sentry duty. As such, information is more likely to be processed semantically, rather than episodically: conceptual models are linked to other pre-existing conceptual models according to the concepts that they represent. This is supported by research suggesting that semantic associations predict recall to a greater extent in expository than narrative texts (Wolfe, 2005). Information is therefore embedded within a larger structure, which serves as a supporting ‘retrieval system’ (van Dijk & Kintsch, 1983:341). As a result, ETs may have a broader beneficial effect than narratives, as knowledge developed in response to ETs is integrated with a broader range of similarly categorised knowledge, rather than stored within a specific storyworld alongside other textual information.

9.4. Perceptions of truth-value of narrative nonfiction and expository texts

The third and final stage of the reading process, discussed above, involves participants transferring new information from the text into pre-existing knowledge structures, thus initiating learning. At this stage of the reading process, the reader is faced with the ‘reader’s dilemma’ (Gerrig & Prentice, 1991): when reading a text not perceived as entirely factual, a reader needs to determine which information is true in a fictional world only, and which information is also true of reality, and therefore can be translated into pre-existing knowledge structures. This section of the discussion will firstly consider participants’ truth-value judgements of texts, and their justifications for these, before considering the possible implications that these judgements might have for the findings of this research.

9.4.1. Participants’ truth-value judgements and justifications

As discussed in Chapter 8, it was found that truth-value judgements differed significantly across conditions: participants were more likely to judge NNF texts as partially factual, and ETs as entirely factual. This finding is particularly pertinent given that judgements were made blind to text type, as texts were not labelled in any way. In addition, exposure to NNF and ETs occurred within the same learning environments, meaning that the texts were positioned with the same primary reading purpose: to inform. Therefore, truth-value judgements were entirely dependent on the features and general impression of texts.

It was also observed that truth-value judgements were likely to be made in consideration of the presence of one or more protagonists: the only texts that followed protagonists were also the only texts to be classified as partially factual significantly more than their equivalent ET texts. Examples of one of the justification codes – the ‘character’ code – showed some participants posing questions about how the author could know what the protagonist had seen: the author was ‘not in his [the protagonist’s] eyes’. This suggests some kind of recognition that the protagonist’s experiences are unique to that individual: they cannot be observed directly by the author, and therefore cannot be accurately retold by the author, resulting in the categorisation of texts as partially factual. However, this is speculative: very few of these character codes were produced, which means the ideas expressed in them cannot be generalised to the rest of the participant sample. Further to this, this could not have been the sole factor: although not a significant effect, the War Begins NNF text was classified as partially factual more often than the War Begins ET, despite containing no key protagonists. Therefore, truth-value judgements are likely to only partially reflect the presence of a protagonist. It was also found that participants in the NNF condition produced a greater number of composition codes in justifying their truth-value judgements; these codes were largely ‘making sense’ codes, whereby participants critically evaluated whether specific content made sense in relation to their own understanding of reality. In doing so, participants were considering the verisimilitude of texts (Bruner, 1991) in relation to their own experiences. This might have been initiated by the underlying structure and particularity of narrative texts, which reflects how readers organise and understand their own experiences (Bruner, 1986; Gudmundsdottir, 1991; Doyle & Carter, 2003), encouraging participants to compare the narrative content to their own experiences. Therefore, judgements of narrative texts as partially factual might have been partly due to a combination of levels of particularity, structure and the presence of protagonists.

Conversely, in the ET condition, texts were more generalised. It has been found that older children become more likely to judge speakers giving generic information as more knowledgeable than those giving specific information (Koenig et al., 2015). Such speakers are seen as experts, and it is thought that we view experts, such as historians, as having some kind of moral authority (Mantel, 2017a): we trust that these experts will impart factual knowledge to us. Wineburg comments that the way in which ETs are written encourages this: they speak to us ‘in the omniscient third-person. No visible author confronts the reader; instead, a corporate author speaks from a position of transcendence’ (2001:12-13). This might be applicable to the texts used in this research, as participants more often categorised ETs as entirely factual than equivalent NNF texts. There is no need to judge the verisimilitude of the text if readers are placing a greater degree of trust in the author.

It is evident that truth-value judgements did not significantly influence participants’ integration of conceptual understanding into pre-existing knowledge structures: partially factual codes positively correlated with post- and delayed post-assessment scores. Although participants often classified NNF texts as partially factual, their doubts over factuality might be in relation to elements that do not interfere with the development of a conceptual understanding of history. For instance, the participant questioning how the author could know what the protagonist had seen made this comment in relation to Harry Stinton seeing a Very Light. This might lead to questions not regarding whether Very Lights exist, but whether Harry Stinton saw a Very Light at the time and place stated.

However, caution must be taken in assuming that delayed post-assessment scores reflect the integration of conceptual understanding into pre-existing knowledge structures. It might be argued that participants viewed written assessments as reading comprehensions, and therefore there is the danger that learners did not actually integrate learning into pre-existing knowledge structures, but rather regurgitated information from texts to answer assessment questions. However, learners sat delayed post-assessments between seven and nine weeks after interacting with texts, with no copies of the texts present: it is unlikely that participants viewed this as a reading comprehension, suggesting that integration of conceptual understanding into pre-existing knowledge structures did take place. However, another concern is that performance on assessments does not indicate that participants are able to generalise and apply conceptual understanding demonstrated outside of the interventions. ‘It is very easy to pick up beliefs about a real world based on what has been learned about a fictional world… but this is a far cry from enabling us to have justified true beliefs about the world, which is required for acquiring knowledge’ (Jones, 2019:4). Participants’ beliefs that the NNF texts were partially factual might have led them to categorise any resultant conceptual understanding as ‘beliefs’ about WWI, rather than as justified true beliefs about WWI. Further research is required to explore how far learners might apply understanding developed in response to narratives in settings outside of experimental ones.

A final consideration is how participants negotiated their conceptual understanding developed in response to the texts between them. Referencing codes were found to predict collaborative (negotiation) codes, both of which occurred more often in the NNF condition. Referencing codes were assigned when participants made any form of reference to their own pre-existing knowledge. Collaborative (negotiation) codes were assigned when participants collaboratively constructed understanding, negotiating each other’s ideas in a constructive way. It is thought that the more references made to prior knowledge, the more unique the situation models constructed by readers, as these are infused with a greater amount of individualised prior knowledge. As a consequence of this, when drawing information from these situation models in order to answer discussion questions, it is more likely that discrepancies arose between participants, as their situation models were all unique. However, participants were then able to collaboratively negotiate and construct accurate conceptual understanding within the social space between themselves and the text, which participants could then choose whether to internalise, by adapting their situation model accordingly. As both referencing and collaborative (negotiation) codes occurred more frequently in the NNF condition, it might be argued that increased variation in storyworlds enhanced accurate understanding when discrepancies between participants’ storyworlds were encountered. Alternatively, the increased collaborative (negotiation) codes might also be a reflection of participants’ perceptions of the truth-value of texts: if a text is believed to be entirely factual, as the ETs were more often believed to be, there may be less room for disagreement about conceptual understanding amongst participants. 
If a text is believed to be partially factual, there may be more scope for participants to constructively create meaning in the social space between them, as they need to negotiate which parts of a text to believe, thus leading to a greater number of collaborative (negotiation) codes.

9.4.2. Implications of truth-value judgements for using different text types in the classroom

Although it does not appear that truth-value judgements influenced the development and retention of conceptual understanding, practitioners may remain concerned about using NNF texts to support learning if learners are more likely to classify them as only partially factual. However, this concern may be alleviated for two reasons. Firstly, it is important to note that participants’ judgements in this research were made entirely independent of any adult influence. When using texts in the classroom, practitioners can establish that textual information is entirely factual, explaining the concept of NNF texts to learners, either before or after reading the text. This might help to ensure that no textual information is disregarded by learners. However, more research is required to explore how revealing the truth-value of the text before reading might influence a reader’s subsequent response to the text: if told that a text is entirely factual, readers might be more inclined to read explicitly to learn, thus constructing conceptual models in response to a narrative rather than a storyworld.

Secondly, it is not necessarily a disadvantage that readers might judge a text as partially factual. In fact, negotiating the factuality of a text evidences a further historical thinking skill: historical enquiry and interpretation. Speaking for the , Mantel (2017b) argues that this is important both with regard to narrative and expository texts: ‘When the reader of a story asks, ‘How do I know which bits of this are true?’ he must ask that same question of the historian, as well as the novelist’. There is no historian who has actively witnessed the ‘real past’, but rather, the historian presents an ‘historical past’, which is naturally imbued with subjective interpretation (Goldstein, 1962). Therefore, it is a positive step to encourage young learners to actively debate the truth-value of all texts within an historical context, rather than accepting expository texts to be entirely factual. Perhaps NNF texts could be a further way of encouraging learners to construct their own interpretations of history, to essentially become their own historian.

9.5. Limitations

This section will discuss the main limitations of this research. It will begin by considering the ecological validity of the research, focusing on the experimental approach and the texts used in interventions. It will then go on to explore the different measures of learning, considering both how learning was operationalised and the construct validity of assessment tools. The position of participants and the role of the researcher will also be considered, in terms of how participants were considered as a collective, and the difficulties encountered with the researcher conducting interventions. Finally, the possible influence of reading ability on findings will be explored.

9.5.1. Ecological validity

Whilst there were numerous reasons for adopting a quasi-experimental approach in this research, this approach might be criticised for compromising ecological validity, as it is less likely to capture an image replicating learning in a naturalistic environment. In striving to minimise the potential influence of external factors, a rigid learning environment was imposed. For instance, a script was used to deliver the interventions, to maximise consistency across the two conditions. If participants asked questions about the topic being taught, they were told to write these down to be discussed after the entire experiment was complete. Whilst this ensured that no condition received any additional support or information which might influence the results of the research, it also created a restrictive environment unlike that which learners are likely to be used to. This might lead to participants behaving differently in response to the unfamiliar circumstances. In addition, it means that findings may be less generalisable to everyday teaching settings: the classroom is a dynamic place which is fluid and ever-changing, and texts might be utilised and responded to in different ways within such an environment. In light of this anticipated criticism, measures were taken to ensure that the environment was as naturalistic as possible. Interventions were conducted within participants’ everyday classrooms, alongside their everyday peers. They were conducted like a series of lessons, taught by a different teacher. Already, this approach was more naturalistic than much of the research exploring reading processes, which is often conducted within experimental settings.

A further consideration is how representative the texts used in interventions were of authentic narrative and expository texts. Texts were written specifically for this research project, and therefore many elements of these texts could be closely controlled to ensure that fewer additional factors in the texts might influence learning. For instance, despite the fact that a defining feature of narratives is an underlying chronological/causal sequence (Richardson, 2000; Norris et al., 2005; Richmond et al., 2011), NNF and ETs presented information in the same sequence, because the order of information in a text has been observed to influence which information is later remembered (Rumelhart & Norman, 1978; Yates & Curley, 1986; Dennis & Ahn, 2001). Here, controlling for potential external factors may have detracted from key differences between NNF and ETs. Therefore, it is uncertain as to whether similar differences in learning across text types would be observed when texts were more distinctly narrative or expository in nature, thus lessening ecological validity. However, the fact that the NNF and ETs used in this research were likely to be more similar in nature than authentic narrative and expository texts suggests that these findings may in fact be conservative regarding the distinctions between NNF and ETs.

9.5.2. Measures of learning

The development and retention of conceptual understanding was primarily assessed using written assessments, because numerical data was desired for quantitative data analysis. However, it is questionable as to how far learning can reliably be operationalised through points scored on an assessment: learning is a complex process, which written assessments might not fully capture. The assessments also determined what counted as learning: a short, written assessment could not possibly contain questions on all aspects of the texts, and therefore knowledge of only particular parts of texts was assessed. Some participants might have developed a good conceptual understanding of other ideas in the texts that were not assessed. Regardless of this issue, written assessments are a common method of measuring learning in the current educational climate, to the extent that pupils’ performance on tests is often thought to be a reflection of teacher performance. In light of this, arguably, practitioners may be less interested in interpretative research which leads to less secure conclusions (Gorard, 2001), but more interested in seeing how teaching methods impact what is central to their practice: assessments and grades.

This brings into question the construct validity of assessments. As discussed in the methodology chapter, assessment questions designed to assess second-order knowledge only elicited substantive knowledge answers from participants, therefore not measuring what they set out to. Different results observed between assessment scores and discussions also raised the issue of construct validity. For instance, whilst participants in the NNF condition made significantly more complex utterances during discussions than those in the ET condition, no difference across conditions was observed for complex conceptual thinking questions on assessments. This might have been due to participants responding with simple answers because they were not aware that greater complexity was required: whilst longer answer spaces were provided, no indication was given of potential points for each question. This suggests that these measures should be questioned: are they measuring what they intend to measure? However, this was why discussion questions were also used to assess the development of conceptual understanding once texts had been read. Not only did these shed more light on the complexity of the process of developing conceptual understanding from texts, but they also allowed for comparisons to be made with findings from assessments, so that both methods of assessing could be critiqued, discrepancies could be identified, and a more holistic image of conceptual understanding could be captured.

In addition, the use of written assessments to measure learning might exclude some participants. For instance, those participants with any particular learning difficulties which might affect their writing ability or concentration may be less able to express their knowledge through a formal, written assessment. Therefore, it is questionable as to whether their learning was accurately represented in the data collected. However, few participants had specific learning difficulties which their teachers identified might influence their performance on assessments: those who did were given minor additional support where necessary, to enable them to access the task, for instance having questions read to them, or regular reminders to focus. The subsequently coded discussions after texts also allowed participants who were less able to access assessments another way to express their conceptual understanding.

A further consideration regarding how learning was measured relates to the use of group discussions. Discussions were completed in groups of three to four, and therefore required the collaboration of participants. As such, it cannot be claimed that each individual’s conceptual understanding was being assessed in discussions: participants’ conceptual understanding was inevitably influenced by the input of others. Not only does this impact discussions, but it may also have impacted later post- and delayed post-assessment scores: participants’ assessed learning might not be a product of their own, individual interaction with a text, but rather of their interaction with peers alongside the text. For instance, an individual’s knowledge might have been positively influenced by being in a group with a learner who was particularly competent at comprehending and interpreting the texts read. However, firstly, this approach reflects the theoretical framework underpinning this research: the social constructivist view adopted suggests that understanding is created in a social space, whether this social space exists between individuals, or between a text and a reader. Therefore, understanding cannot be considered without acknowledgement of these social spaces. In addition, once understanding has been negotiated, an individual internalises understanding: this process of internalisation is still completed by the individual, and therefore controlled by the individual to some extent. Finally, this approach is likely to enhance ecological validity within the primary classroom setting: in such a setting, it is unlikely that children would be expected to read a text to learn about a topic without an opportunity to discuss what had been read with those around them.

Despite the above, whilst this research makes claims about the influence of narratives on learning, it must be considered that these findings were made when participants had the opportunity to discuss their conceptual understanding of narratives in small groups after reading the texts. It is questionable as to whether the same findings would have been made if texts were read in isolation, with no opportunities for discussion. For instance, there is no evidence to suggest whether inferences were made on-line (during the reading of the text) or off-line (after text comprehension, during retrieval tasks or discussions) (Graesser & Kreuz, 1993). If made off-line, then the discourse after reading texts was crucial in supporting the generation of these inferences. A similar claim can be made about the prior knowledge tasks: if adequate levels of prior knowledge were not activated before interacting with texts, would similar results have been found? Therefore, the development and retention of conceptual understanding in this research cannot be attributed to narratives alone, but must be considered within the context of interventions. Whilst this was true for both the NNF and ET conditions, and therefore is unlikely to have influenced comparisons, it is an important consideration for practitioners hoping to implement narrative to support learning in the classroom, and for research moving forward in this area.

Finally, whilst this research argues that differences observed across the two conditions resulted from the different text types that participants were exposed to, it does not address how participants perceived texts, and whether they identified these as expository or narrative. Whilst a question on the post-questionnaire was intended to explore participants’ judgements of the type of text they had read, participant answers were not appropriate, and therefore participant judgements of text types could not be explored. Further research might consider how differences in responses to different text types may reflect participants’ perceptions and categorisation of texts, rather than the researcher’s perceptions and categorisations.

9.5.3. The roles of the participants and the researcher

Within this research, participants were grouped and considered collectively as ‘conditions’, rather than as individual learners. This approach might fail to capture the essence of learning being a highly individualised process, in which all learners will respond uniquely to particular stimuli. Whilst this may be problematic, this is also an issue that teachers encounter and attempt to deal with daily in the classroom: that is, how to teach a class of approximately thirty children when each child has a unique learning style, and individual learning needs. This research recognises this, and consequently realises the need for generalisable findings in educational research, where research can suggest which methods of teaching might generally be beneficial for larger numbers of children. Further to this, additional data was collected to isolate specific groups of participants who shared particular characteristics (for instance, age, gender, socioeconomic status, reading habits), to allow for analyses regarding whether specific groups responded in different ways. This allowed some form of further insight not into the specific responses of individual children, but into the responses of particular groups of children. While this research does not qualitatively address the individual responses of children, its quantitative nature provides a basis for further, qualitative research in the area (Gorard, 2001).

The decision was made that I was to deliver the intervention sessions, rather than class teachers. Whilst this decision was made for a number of reasons (see section 6.5.2 for justifications), there were also limitations. Firstly, this might have introduced some form of bias into the research: as laid out at the beginning of this thesis, this research hoped to identify possible benefits of using narrative to support learning. As I delivered both the NNF and ET interventions, there was the danger that I could, subconsciously, deliver the interventions in different ways. To control for this possibility as much as possible, I followed a script in each condition. Additionally, teaching participants myself meant that participants were taught by somebody unfamiliar, who did not know them. This might have influenced learning: class teachers are likely to have been able to deliver interventions in a way more effective for the individuals in their class, and therefore might have maximised learning, allowing pupils to reach their full potential. Conversely, I was unfamiliar with the classes, and participants might not have responded as positively to me as to their usual class teacher. Although I am a fully qualified primary school teacher, with many years of teaching experience, I may not have delivered interventions as effectively as they could have been delivered. Finally, as I delivered the interventions myself, it is questionable as to whether the same findings would be observed if different teachers had delivered the interventions. Even if different teachers were trained to deliver interventions in a particular way, each teacher has a unique, individual style of delivering materials, and a personal teaching philosophy which impacts how they utilise tools in the classroom (Murmann & Avraamidou, 2014). The style of delivery might in turn influence learners’ responses to different text types. Further to this, it is recognised that the way in which a narrator, in this case the person delivering the intervention sessions, tells a narrative influences the quality of the narrative, and therefore the degree of engagement on the part of the listener (Norris et al., 2005). Therefore it is questionable as to how far findings were due to text types, and how far they were due to teaching style.

9.5.4. The possible influence of reading ability

Finally, there is a concern that the reading ability of participants might have influenced the results of this research. Reading ability was observed to influence assessment scores over time, and once reading ability was controlled for, condition no longer appeared to influence assessment scores. However, reading ability was only seen to influence the development of conceptual understanding between pre- and post- and between pre- and delayed post-assessments; it did not influence the retention of conceptual understanding between post- and delayed post-assessments, which is where a significant effect was originally observed for condition. In addition to this, there are questions about the reliability of the assigned reading levels, and it is evident that reading ability also influenced ability to access written assessments. Further to this, it was found that reading ability did not influence the accuracy, depth and quality of conceptual understanding as expressed during participant discussions. Finally, as texts were read to participants, this should have removed possible decoding barriers for poorer readers, allowing them to utilise their listening comprehension skills, which have been observed to be stronger than reading comprehension skills in poorer readers, yet similar in stronger readers (Diakidoy et al., 2005). Therefore, whilst evidence initially suggested that reading ability might have influenced results, there are also findings which suggest that this might not be the case. Regardless, further research is required to more specifically consider the relationship between reading ability, enjoyment of reading, and the ability to learn from narrative and expository texts.

9.6. Future directions

The benefits of narrative observed in this research occurred when texts were used within particular intervention sessions; further research is required to explore whether narrative might be an effective learning tool when used within a range of different learning situations. In particular, it is important to explore whether narrative might still effectively support the development of conceptual understanding when read in isolation, without opportunities to discuss content afterwards. It might be that narrative and discussion work in conjunction to support learning, and therefore that, in isolation, narratives are less effective. The same applies to the prior knowledge activities that participants completed before the reading of texts: just how important was the activation of prior knowledge in enabling participants to develop a conceptual understanding of the texts? Future research might explore responses to narrative in isolation, or might attempt to more directly trace how prior knowledge activities and discussions of texts influence conceptual understanding developed in response to texts, in order to establish links between prior knowledge, discourse and resultant conceptual understanding. Additionally, it might explore how different aspects of conceptual understanding might be developed in discourse.

This research project focused on just one, specific topic: World War One. Even within this historical topic, different themes (War Begins, Trench Life and the Home Front) were observed to influence how participants responded to NNF and ETs. Therefore, it is likely that readers respond to texts in different ways depending on the topic of the texts, and consequently, narrative might not be an effective tool in supporting learning in particular areas of history. Further to this, WWI is a topic that all children are at least slightly familiar with, due to the prominence of remembrance days and so forth. However, with a more distant, less familiar topic, such as Ancient Greece, children might find it more difficult to establish whether information is reliable, and this might influence their response to texts. The concerns that narratives may be counterproductive, as they can reduce history to myth (Barthes, 1993), might be particularly pertinent when the topic taught through narrative is completely novel to readers. On the other hand, with a more familiar topic, it would be interesting to explore what impact topic-specific prior knowledge had on learning and responses to the different text types. Herbert & Burt (2004) suggest that once a learner has processed information episodically and as learning continues, learning is then schematised, and more likely to be processed by semantic memory. Therefore, learners might respond more positively to expository texts once they are more familiar with information. Overall, more research is needed to explore whether narrative continues to support the development and retention of conceptual understanding across a broader range of topics, and amongst learners with differing degrees of prior knowledge.

Future research not only needs to explore the use of narrative as a learning tool in different situations, but also the responses of specific groups of learners to narratives. This research did touch on different groups of learners, considering the possible effect of factors such as how often children read at home, whether children received Free School Meals, and gender, on the development of conceptual understanding. However, whilst these measures could be used to assess how particular groups of children performed on assessments, they did not consider the reactions of individual children in response to different text types. As highlighted by Prins et al. (2017), some learners state a preference for expository texts because they are clearer, and there is no need to separate fact from fiction. A further avenue of research might involve observing particular groups of children reading narratives to learn, exploring not only their development of conceptual understanding in response to texts, but also exploring how they interact with a narrative text in order to develop this understanding, and how they feel about learning from such texts. This research has laid down foundations for further such qualitative research to be built upon.

Finally, throughout the course of this research, I became particularly interested in the concepts of the textbase and the situation model (van Dijk & Kintsch, 1983), and how these compared to other mental representations of texts proposed elsewhere, such as the storyworld (Herman, 2009a). While this research explored these concepts, and will propose a new model for reading with two distinct types of situation model in the following chapter, further research might aim to assess the presence of these mental representations more directly, to assess the strength of this model and to suggest any appropriate alterations. If, as this research suggests, mental representations are constructed differently in response to different text types, and this in turn influences learning, it is important to explore different ways to observe and research these mental representations more directly.

Chapter Ten: A new model of reading

In light of the findings of this research, a new model of reading is proposed, which highlights distinctions between the processing of narrative and expository texts, and the potential impact that these differences might have on learning. The model can be found in figure 33, on the following page. This model suggests two alternative pathways that a reader might follow when interacting with a text that predominantly depicts events: one pathway is typically followed in response to expository texts, and the other in response to narratives. It is assumed that narrative nonfiction texts are typically processed along the narrative path. The following explanation of the new model of reading will consider each of the three stages of reading (accessing a text, interpreting a text and translating a text into learning), suggesting how these may be similar or differ across narrative and expository texts. Whilst it is recognised that the response to a text depends on how the text is used, its intended use and who is using it (Mili & Winch, 2019), this model proposes how learners in a classroom might generally respond to narrative and expository texts. After outlining the model, the possible benefits of this new model of reading, in contrast to the old model, as presented in the literature review (see figure 3, page 71), will be explored. Limitations of this model will also be considered.

Figure 33: A new model of reading

[Figure: flow diagram; only its labels are recoverable. Stage 1 – Accessing a text: blueprint (provided by text); process of decoding the text and comprehending small units of meaning. Stage 2 – Interpreting a text: textbase, leading via segmentation to conceptual models and via composition to a storyworld; further labels: topic-specific prior knowledge, knowledge-based inferences, semantic processing, episodic processing, reorganisation, preservation. Stage 3 – Translating text into learning: pre-existing knowledge as a collection of conceptual models representing specific topics; pre-existing knowledge as a collection of possible worlds (including historical pasts). Schematisation: shift from context-specific episodic knowledge to generalised semantic knowledge.]

10.1. Stage One – Accessing a text

Accessing a text involves a reader decoding words and understanding the meaning of these words, in order to gain some form of basic comprehension of the text. This stage essentially allows readers to access the blueprint provided by the text. Whilst this research does not focus on this stage of reading, consideration of the nature of the blueprint of the text, which is accessed at this stage, is important, as this is influential in determining down which pathway the text is subsequently processed. Typically, narrative blueprints follow a familiar narrative structure which is comparable to how we structure our own experiences in life (Gudmundsdottir, 1991; Bruner, 2002; Doyle & Carter, 2003). These blueprints are characterised by an underlying chronological thread and a sense of narrative diachronicity (Bruner, 1991). They are context-specific, particular and episodically rich in nature, as they usually follow a specific protagonist. While they provide adequate information for a reader to understand and follow the narrative, they also leave gaps for the reader to fill in. The typical reading goal initiated is reading for enjoyment, and as such, readers seek to establish cohesion during reading. In contrast, expository blueprints can be presented in a range of different structures. Whilst some may have some form of underlying chronological structure (as in this research), they are likely to lack a strong chronological thread and sense of narrative diachronicity. They usually employ generic language and are episodically poorer than narratives. Whilst there may be gaps in an expository blueprint, these are likely to be far fewer than in narrative blueprints: omitting information would be counterproductive to the purpose of expository texts, which is typically to inform. To build on Herman’s (2013) analogy, rather than providing a blueprint which guides the construction of a unique final product, an expository text is more likely to provide a precise diagram, illustrating the intended final product. The typical reading goal initiated is reading to learn, and as such, readers seek to understand information in the context of what they already know. If a reader is able to decode a text to access the blueprint, they can begin Stage Two of the reading process: interpreting a text.

10.2. Stage Two – Interpreting a text

10.2.1. Textbase construction

If a reader is able to access a text’s blueprint, this is then used to construct a textbase, through a process of establishing relations between the propositions in a text (van Dijk & Kintsch, 1983). When reading a narrative, the underlying chronological thread allows readers to establish familiar, chronological relations between propositions, in order to construct a strong, chronologically structured textbase. Expository texts are likely to lack the same level of chronological thread as narratives, and therefore readers may have to identify different relations between propositions. These may differ according to the structure or content of the text. If the topic of a text is unfamiliar to a reader, this stage might be difficult when reading expository texts. However, other than potential differences in the nature of relations established between propositions, textbases are similar across narrative and expository texts.

10.2.2. Situation model construction

Once readers have constructed a textbase, the processing of narrative and expository texts diverges more distinctly. Two alternative paths are available (see figure 33). It is unclear as to precisely what prompts readers to take either of these paths; however, it is likely that a combination of the reading goal of the reader and the type of text being read ultimately determines which path is taken. Additional factors such as the context and environment that reading takes place within might also be influential, as these are likely to affect reading goal (McCrudden et al., 2010). Firstly, expository texts typically activate the primary goal of reading to learn. This is likely to initiate the construction of conceptual models, through a process of segmentation: the textbase is segmented into individual conceptual models representing the different situations or concepts denoted in the text. In line with event segmentation theory, different conceptual models are constructed where new events begin in the text (Zacks et al., 2007), as indicated by discontinuities in the text on one or more of the five dimensions that are monitored during reading: time, space, causality, entity and intentionality (Zwaan et al., 1995b; Radvansky et al., 1998). Consequently, the reader has various representations of the different situations in a text. Figure 33 demonstrates how topic-specific prior knowledge from pre-existing knowledge structures is integrated into conceptual models, in order for readers to begin to understand textual information in relation to what they already know.

In contrast, narratives typically activate the reading goal of reading for enjoyment. This initiates the construction of a storyworld not through segmentation, but through a process of what will be termed composition. Through this process, readers are searching for connections between events to compose a globally cohesive storyworld. Similarly to expository texts, the reader monitors the five dimensions; however, rather than creating new storyworlds where ‘discontinuities’ occur on any of the five dimensions, the reader uses continuities on the five dimensions to bridge events within the storyworld. Where there are gaps in the textbase, or where there are difficulties bridging events along the five dimensions, the reader uses prior knowledge of personal experiences and the world around them to make knowledge-based inferences, in doing so filling these various gaps. This helps to establish cohesion. They may also draw on topic-specific prior knowledge contained in relevant possible worlds to develop a greater understanding of textual information. The storyworld retains the chronological structure of the textbase, but gives it a fuller form, essentially fleshing out the bones of the textbase. It represents the narrative as an alternative possible reality, or world, that readers can transport themselves into.

10.3. Stage Three: Translating the text into learning

Finally, conceptual understanding developed within these situation models can be integrated into pre-existing knowledge structures. However, this is likely to occur in different ways in relation to storyworlds and conceptual models. The multiple conceptual models constructed in response to expository texts are likely to be integrated into a larger network of conceptual models, according to the links that a reader has drawn between conceptual models and their own topic-specific prior knowledge. During this process, conceptual models are reorganised to enable them to be connected to other similar conceptual models. Information in these models is generalised. It is likely that semantic memory plays a role here, as readers use semantic associations to link similar conceptual models together. This process is comparable to placing a file in the most appropriate place in a filing cabinet. In contrast, the storyworld represents contextualised events, which occurred in a particular time and setting, and is therefore more likely to be processed using the episodic memory. If a possible world (or historical past) already exists in relation to a topic, a learner can transfer information from the storyworld into this possible world. The possible world might have many different strands, as it includes previous episodic experiences of the same topic. The underlying chronological structure of the storyworld is preserved as information is transferred into the possible world. Alternatively, in the absence of a relevant, existing possible world, a new possible world might be constructed. This process is comparable to placing a book on a bookshelf which is organised according to content and genre: the storyworld remains as a cohesive whole, sitting alongside similar storyworlds.

As learners gradually encounter more episodes related to a topic, and as a result, have constructed detailed possible worlds, schematisation may occur (Herbert & Burt, 2004). This is a shift in a learner’s conceptual understanding, from a specific, contextualised conceptual understanding to a more generalised, semantic conceptual understanding. This generalised, semantic understanding results from encountering enough information to understand what can be generalised, and as a result, beginning to recognise more complex links between different strands within the possible world. In doing so, learners gradually build a conceptual understanding of the more complex interrelationships between the propositions presented in texts (Mili & Winch, 2019).

These two pathways reflect Bruner’s (1985) modes of thought, which Dahlstrom (2014) similarly relates to science texts. The expository path of processing draws on the paradigmatic mode of thought, with texts presenting events that are generalised, rather than embedded in a specific context. As such, the text depicts history from the ‘outside’, as though the reader is an omniscient observer. Generalised ideas from the text can then be applied to specific circumstances. For instance, learners might read in the expository text that soldiers usually have to complete sentry duty in the trenches; later, they might be able to apply this to the description of a specific soldier, recognising what they are doing. In contrast, narratives describe specific cases of events: they are context-specific, episodically rich and view history from the ‘inside’. The reader may step into a protagonist’s shoes, allowing them to relate to the protagonist and to view history from their perspective. From these descriptions, a reader can begin to infer generalised ideas. For instance, learners read in the narrative text that Harry Stinton stood on sentry duty; they can later generalise this idea, recognising that it was an experience had by many soldiers.

10.4. The need for a new model

Whilst research has recognised that text type and reading goal can influence inferences made and the recall of textual information (Zwaan, 1994; Linderholm & van den Broek, 2002; Wolfe & Woodwyk, 2010; Bohn-Gettler & Kendeou, 2014; Bailey et al., 2017), this body of research does not look further into how these differential processes might reflect the construction of unique types of situation models in response to different text types. As a result, the previous model for reading suggested that readers respond to all texts that represent events in a similar way; a similar outcome would be obtained from any text on the same subject, regardless of text type. This presented an oversimplification of the ways in which readers interact with texts. According to this old model, due to the similarities in content between the NNF and ETs used in this research, readers would respond to both texts in very similar ways. Therefore, a new model was required to begin to understand why and how readers might respond to expository and narrative texts in unique ways, and the impact that this might have on how readers learn from texts. The new model of reading presented above advances the old model, firstly in that it indicates how different texts might initiate the construction of unique types of situation models, and secondly in that it addresses how these unique types of situation models might influence how a reader develops conceptual understanding in response to different texts. In doing so, this new model refocuses on texts as learning tools, rather than texts as having no real purpose or goal. Much of the literature that the previous model was based on considered inauthentic texts read in artificial situations, therefore not recognising the complexity of natural texts read for genuine, educational purposes.

The advances in this new model offer various benefits over the previous model of reading. Primarily, it offers an indication of how different text types can be used to support learners in different situations: it allows an insight into when expository texts might be the most beneficial and purposeful type of text to use to support learning, and when narrative might be. Further to this, in highlighting the differing processes that might take place in response to the reading of narrative and expository texts, the new model provides an insight into how practitioners can support learners in these different reading processes, to maximise the power of different text types as learning tools.

Whilst it is recognised that this model offers a compartmentalised view of the reading process, it does illustrate how and where reading processes might diverge when reading different text types. This is not to say that only one of the two proposed pathways might be taken when reading; it is likely that a broader continuum of situation models lies between the conceptual model and the storyworld. In fact, whilst NNF is associated with the narrative pathway in this research project, it might be that subtle distinctions between narrative nonfiction and fictional narratives mean that even these text types are processed in slightly different ways. Alternatively, there may be some form of communication between the two different pathways: if a reader who is highly familiar with a topic reads a narrative on that topic, they might construct a storyworld alongside various conceptual models, or they might be more inclined to draw on their topic-specific prior knowledge in the construction of a storyworld. Further research is required to explore more subtle distinctions between situation models constructed in response to various text types, and whether there may be some form of communication between the proposed processing pathways.

10.5. Benefits of the narrative and expository pathways

Whilst neither the narrative nor the expository pathway presented in this model is superior, there are circumstances in which one of the two pathways might be more beneficial for those reading to learn. This research argues that the narrative pathway might enhance the development of conceptual understanding for younger readers who have little prior knowledge of a topic. Alternatively, for readers with a good level of prior knowledge and an established interest in a topic, expository texts may be more beneficial. These readers may recognise more complex interrelations between propositions in a text, enabling them to construct a strong textbase. Whilst a strong textbase is not retained over time, it supports the construction of strong conceptual models. In constructing conceptual models, such readers also have a greater amount of prior knowledge to integrate with textual information, in order to make sense of textual information in relation to what they already know. They can then embed the conceptual models into their pre-existing knowledge structures with ease, as they have an appropriately complex network of conceptual understanding to embed new understanding within. However, a lower level of prior knowledge can inhibit readers of expository texts: in such situations, narratives seem to be a supportive way of introducing learners to relatively new and unfamiliar topics. The chronological thread that runs through a narrative can provide an initial way of establishing chronological relations between propositions, so that readers can begin to develop a basic understanding of these propositions within a familiar structure. As a result, narratives can provide learners with initial encounters with episodes, so that they can become familiarised with a topic, creating a foundation for further learning to be built on. This is supported by research suggesting that prior knowledge is more important to expository comprehension (Wolfe & Mienko, 2007; Best et al., 2008; Wolfe & Woodwyk, 2010; Clariana et al., 2014).

The construction of storyworlds might also offer some additional benefits over conceptual models. It is thought that because the storyworld represents a global, cohesive, alternative reality, a reader can transport into this storyworld. This process of transportation can support the development of different types of knowledge. Firstly, it can support the development of substantive knowledge of unfamiliar concepts. If concepts presented in a text are new and unfamiliar, as discussed above, the storyworld provides a safe, hypothetical context – an ‘experimental laboratory’ (Hakkarainen, 2008:293) – in which new concepts can be explored. This process makes unfamiliar concepts accessible and relatable, through positioning them in a familiar structure. In addition, if the new concept being presented conflicts with or defies the reader’s understanding of the world around them, the storyworld allows the reader to suspend their disbelief (Bruner, 1986) in order to conceive of this unfamiliar information. Secondly, transportation to the storyworld can support the development of second-order knowledge. Within the storyworld, a reader can take the perspective of protagonists and can consider how events might have unfolded differently through hypothetical thought. Therefore, the construction and the experience of the storyworld can encourage subjective, individualised judgements to be made regarding information in the texts. Ultimately, narratives and storyworlds are likely to be relatable to readers, as they mirror the ways in which readers structure and understand their own lives.

A final benefit of storyworlds is that they appear to support the retention of conceptual understanding developed in response to a text to a greater degree than conceptual models. This may result from episodic processing and the integration of storyworlds into pre-existing knowledge structures as a cohesive whole. The bridges in storyworlds that connect information along a chronological thread also support the retrieval of more information once the storyworld is activated. In contrast, conceptual models are stored as separate units, and therefore interference can occur in the retrieval of information, as an individual searches for the appropriate information (Radvansky, 2012).

Chapter Eleven: Conclusion

This research set out to explore four key research questions (detailed below), which collectively aimed to shed light on the potential benefits of narrative as a teaching tool, and the possible unique processing of narratives underlying these benefits. This conclusion will consider the theoretical implications of this thesis, before discussing the pedagogical implications, in relation to the four research questions. It will end on a final note, reflecting on my personal journey and implications for my own, personal practice.

1. How does narrative nonfiction affect the development of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
2. How does narrative nonfiction affect the longer term retention of conceptual understanding of World War One for primary-aged readers, in comparison to expository text?
3. How might mental representations diverge across narrative and expository texts for primary-aged readers?
4. How do primary-aged readers perceive and judge the truth-value of narrative nonfiction and expository texts, and what implications might this have for the development and retention of conceptual understanding?

11.1. A contribution to knowledge: theoretical implications

In seeking answers to these research questions, this thesis provides two main contributions to current knowledge. Firstly, with regard to the first two research questions, this thesis evidences that there is something truly unique about narrative. This unique quality means that narrative is a powerful educational tool not simply because it evokes interest, nor because its typical features make it easier to comprehend than expository texts, but because it initiates particular processes of meaning construction. Primarily, the chronological structure of narrative acts as a familiar organisational frame in which readers can embed unfamiliar, otherwise conceptually isolated information (Herbert & Burt, 2004), thus providing an initial gateway into the development of a conceptual understanding of content embedded in the text. Further to this, narrative overcomes a barrier to history learning for younger children: that history is distant and largely unobservable. This research has demonstrated that narrative can essentially act as a bridge between the past and the present – between the unfamiliar and the personal (Cleto & Warman, 2019) – through encouraging the construction of and transportation to a storyworld, which represents the historical reality portrayed in the text. As such, narrative evokes the imagination in a way that encourages creativity and discovery within this storyworld, which acts as an ‘experimental laboratory’ (Hakkarainen, 2008:293) in which unfamiliar information can be explored without the restrictions of a learner’s own reality. Learners can consequently develop a deeper conceptual understanding of concepts and periods which otherwise seem distant, improbable and beyond their grasp (Jennings et al., 1992; Browning & Hohenstein, 2015; Russo & Russo, 2018). This process of transportation essentially allows readers to view history as active participants from the ‘inside’, rather than passive observers from the ‘outside’. In doing so, narrative can strike a balance between what Wineburg describes as the tension between ‘the familiar past’ and ‘the strange and inaccessible past’ (2001:6): it does not over-familiarise and distort history by attempting to bring history to the learner, but encourages learners to step outside of their familiar reality, into an alternative historical reality, to expand their minds. In transporting into this storyworld, Ian Mortimer’s words in his introduction to The Time Traveller’s Guide to Medieval England are particularly appropriate:

‘As soon as you start to think of the past happening (as opposed to it having happened), a new way of conceiving history becomes possible… [it] allows us … an investigation into the sensations of being alive in a different time’ (2009:1).

This thesis observed these benefits of narrative with regard to history learning specifically. However, as history is arguably inherently narrative, because it deals with chronologically and causally sequenced events and the individuals experiencing these events, it is questionable as to how far the benefits of narrative might extend into other subject areas. Research elsewhere demonstrates that narrative is also beneficial in the science classroom (Murmann & Avraamidou, 2014; Kelemen et al., 2014; Emmons et al., 2016; Prins et al., 2017) and the mathematics classroom (Padula, 2004; Malinsky & McJunkin, 2008; Muir et al., 2017), primarily because it makes unfamiliar, more abstract concepts accessible and relatable (Jennings et al., 1992). Just as history is largely distant and unobservable, so are various abstract scientific and mathematical concepts, such as adaptation and exponential growth. Therefore, it might be that the ability of narrative to allow the reader to transport into a storyworld can support the understanding of unfamiliar, abstract concepts across a range of topics. However, other benefits of narrative observed in this research may be unique to history learning: history is a subject centred on people, their opinions, their motives, their actions, and ultimately, their lives. Being able to transport into a storyworld allows a reader to experience the lives of historical figures, or protagonists, portrayed in the narrative: the storyworld is a simulation of the social world, functioning to create ‘a deep and immersive simulative experience of social interactions for readers … achieving a form of learning through experience’ (Mar & Oatley, 2008:173). The importance of protagonists is highlighted throughout this thesis: they motivate readers to transport into the storyworld, to consider and explore history from the inside. They encourage readers to empathise, and to seek to emotionally understand them, and the events occurring in the narrative around them, in relation to the reader’s own social experiences (Bruner, 1991; Boström, 2008; Klassen & Froese-Klassen, 2014). As such, the development of a second-order knowledge of history becomes possible. This particular beneficial aspect of narrative is likely to be unique to history learning.

A final unique benefit of narrative observed in this thesis was its ability to support the retention of conceptual understanding developed in response to the text, as similarly observed by Maria & Johnson (1990) and Arya & Maul (2012). Narratives are memorable (Schank & Berman, 2013; Norris et al., 2005) because they are episodically rich, and contain a chronological thread that binds events and ideas portrayed within the narrative together: narrative is ‘residing in memory as separate but unified episodes with rich context and event sequences’ (Clariana et al., 2014:604). It is processed using the episodic memory, in a similar way to how individuals process their own experiences in life, therefore creating a rich impression in the memory.

The second contribution made by this thesis is that it proposes unique processing pathways for narrative and expository texts. Whilst much previous theory focuses on either narrative or expository processing, comparing and contrasting processing of two approximately equivalent narrative and expository texts has allowed for further insights into what is unique about each of these text types. Further to this, this thesis refocuses reading theories on how texts are used to learn, rather than considering reading processes in isolation, in doing so highlighting how differential processing might influence how learners develop and retain conceptual understanding from texts containing the same educational content. This takes reading theory and makes it relevant to education. In contrasting narrative and expository texts, this model highlights features that are central to narrative processing, primarily a chronological thread which is characterised by narrative diachronicity, a sense of particularity (Bruner, 1991), and the presence of protagonists. These three features appeared to influence the ways in which texts were identified and subsequently processed by participants. However, this model is an initial proposal; further research may consider how applicable it is for different readers in different circumstances, and whether mental representations might diverge further according to different text types.

In addition to these two main contributions, this research has highlighted that narrative is neither simply a stimulus to evoke interest at the beginning of a lesson, nor a vehicle to convey historical information, as required by a content-heavy history curriculum, but it is an educational instigator, facilitator and mediator of conceptual understanding in its own right. Within the social space between the text, the participant and their peers, a conceptual understanding of WWI was negotiated and constructed, with participants drawing on the text to mediate the construction of understanding. Participant discussions, independent of adult support, showed how capable participants were of utilising the text as a scaffold in order to develop a sophisticated conceptual understanding of WWI. This highlights the role of texts as facilitators to the construction of meaning, rather than as resources that directly convey meaning to be absorbed by the reader: reading is a process of active meaning generation, not passive meaning reception (Spivey, 1997).

11.2. Pedagogical implications

The proposed model of reading has pedagogical implications for how and when teachers choose to use different text types in the history classroom. For instance, it highlights how narratives may be more beneficial at the beginning of an historical topic, when little prior knowledge has been accumulated, whereas expository texts might provide additional benefits later in the teaching sequence, once topic-specific knowledge is more developed. Not only this, but it has implications for how these texts are utilised in the classroom, and how learners are supported in developing understanding in relation to these texts. If choosing to use an expository text, related activities might involve explicitly making connections between topic-specific prior knowledge and textual information, whereas tasks related to narratives might seek ways for learners to depict their storyworld themselves, or to express their experiences of their storyworld.

Despite the observed benefits, this thesis also highlights important considerations that must be made when utilising narratives in the history classroom. Firstly, it must be recognised that narratives are a dynamic teaching tool, and their benefits are likely to vary to some degree depending on the topic being taught. When using a narrative to teach, teachers need to consider what historical knowledge or historical thinking skills are intended to be taught, and whether a particular narrative can facilitate learners developing the intended knowledge or skills. They need to consider the context required around the narrative to support learning, such as the activation of prior knowledge, and the opportunity to discuss the narrative in depth. It needs to be ensured that narratives are high-quality, and that educational content is not superfluous, but integral to the narrative. In addition, there is the issue that utilising texts in the classroom is more beneficial for those learners that enjoy reading outside of school: this raises the question of how to ensure that less engaged readers can access and engage with texts equally.

Finally, whilst findings from this research encourage the use of narrative nonfiction texts in the history classroom, this does not mean that teachers should be limited to the use of entirely factual narratives. Firstly, this would severely limit possible narratives to be used, as historical narrative nonfiction written for children is not widely available. This may partially be because narrative nonfiction texts that follow a protagonist are much more difficult to construct in relation to more distant time periods, where limited chronicles of the lives of specific individuals exist. Secondly, with regard to the fourth and final research question, participants in this research showed that they were capable of being critical of the content of the texts, often providing sophisticated justifications for perceptions of truth-value, and showing an awareness of subtle textual features in these justifications. This is an essential skill not only for reading historical narratives, but for reading historical expository texts, in which authors interpret information in their own, subjective manner; such skills of historical enquiry are important to foster, particularly in an age where such large quantities of historical information, often from unverifiable sources, are available on the internet at the click of a button. In encouraging such historical enquiry, the process of judging the truth-value of information in a text encourages history learners to mirror the processes of historians themselves, in constructing, critically negotiating and reasoning about historical information in a social space.

11.3. A final note

In writing this thesis, I have gained an insight into the true complexity of narrative. Whilst this research has answered my proposed questions, it has opened up so many more for future exploration. My initial thoughts about the benefits of narrative have been challenged; throughout the journey of completing this PhD, I have moved from focusing on the benefits of narrative towards recognising the benefits of both narrative and expository texts in different situations, and for different learners. It has opened my eyes as to how activities can be tailored to text types, so that they support the differential ways in which readers process texts and construct conceptual understanding in response to different text types, and how the comprehension of different text types might be supported in unique ways.

Additionally, I have reflected on my role as a history teacher: as a teacher, I am not the conveyor of historically accurate information, but a facilitator to encourage learners to discover and explore historical information themselves. Similarly, a narrative text can be the teacher, the facilitator and the mediator in the social space between itself and learners. The development of conceptual understanding about WWI observed in this research was a result of the participants collaboratively constructing their own understanding in relation to a text, with no adult input. Yet these learners were incredibly successful in this endeavour. Under the pressure of a heavily content-based National Curriculum, this research has served as a reminder that my position as history teacher is not to focus on conveying as much content as possible to learners in the limited time available, but to foster creative discussion, negotiation, and the freedom to delve into and explore history. When using narratives, the focus should not necessarily be on these texts as a vehicle for conveying content to learners, but on these texts as mediational tools to inspire curiosity, to instigate discussions and negotiations. Whilst this thesis was born from my enjoyment of sharing a narrative with a class, observing my pupils’ engagement as they pulled their chairs closer to mine, and facilitating their development of conceptual understanding through subsequent discussions, this thesis has equally broadened my horizons to encourage me to use narrative in different ways, allowing learners more independence in their interactions with texts. In returning to the classroom, I will adapt the ways in which I use both narrative and expository texts to support the teaching of history in light of these findings, in doing so, exploring the implications of this research in practice.

References

Adalsteinsdottir, K., Kiris, A., Butler, C., Newman, E., Engilbertsson, G., Carter, J., Erol, N., Harnett, P., & Sànchez, P. (2011). Learning and Teaching Children’s Literature in Europe – Final Report. doi:10.13140/RG.2.2.19317.42720

Adams, P. (2006). Exploring social constructivism: theories and practicalities. Education 3-13, 34(3), 243-257. doi:10.1080/03004270600898893

Ahmed, Y., Wagner, R.K., & Lopez, D. (2014). Developmental Relations Between Reading and Writing at the Word, Sentence, and Text Levels: A Latent Change Score Analysis. Journal of Educational Psychology, 106(2), 419-434. doi:10.1037/a0035692

Alexander, R. ed. (2009). Children, Their World, Their Education: Final Report and Recommendations of the Cambridge Primary Review. London: Routledge.

Alpert, A. (2006). Incorporating Nonfiction into Readers’ Advisory Services. Reference & User Services Quarterly, 46(1), 25-32. doi:10.5860/rusq.46n1.25

Alvermann, D.E., Smith, L.C., & Readence, J.E. (1985). Prior Knowledge Activation and the Comprehension of Compatible and Incompatible Text. Reading Research Quarterly, 20(4), 420-436. doi:10.2307/747852

Anderson, J.R. (1974). Retrieval of propositional information from long-term memory. Cognitive Psychology, 6(4), 451-474. doi:10.1016/0010-0285(74)90021-8

Anderson, R.C. & Pearson, P.D. (1984). A schema-theoretic view of basic processes in reading. In Pearson, P.D. (Ed.) Handbook of Reading Research (pp.255-291). Mahwah, NJ: Lawrence Erlbaum Associates.

Arya, D.J. & Maul, A. (2012). The Role of the Scientific Discovery Narrative in Middle School Science Education: An Experimental Study. Journal of Educational Psychology, 104(4), 1022-1032. doi:10.1037/a0028108

Ashbridge, J. & Josephidou, J. (2018). Classroom Organisation and the Learning Environment. In Cooper, H. & Elton-Chalcraft, S. (Eds.) Professional Studies in Primary Education, 3rd edn (pp.121-141). London: Sage.

Avraamidou, L. & Osborne, J. (2009). The Role of Narrative in Communicating Science. International Journal of Science Education, 31(12), 1683-1707. doi:10.1080/09500690802380695

Bage, G. (1999). Narrative Matters: Teaching and Learning History Through Story. London: Falmer Press.

Bailey, H.R., Kurby, C.A., Sargent, J.Q., & Zacks, J.M. (2017). Attentional focus affects how events are segmented and updated in narrative reading. Memory & Cognition, 45(6), 940-955. doi:10.3758/s13421-017-0707-2

Baker, L., Dreher, M.J., Shiplet, A.K., Beall, L.C., Voelker, A.N., Garrett, A.J., Schugar, H.R., & Finger-Elam, M. (2011). Children’s comprehension of informational text: Reading, engaging and learning. International Electronic Journal of Elementary Education, 4(1), 197-227. Available at https://www.iejee.com/index.php/IEJEE/article/view/221

Bakhtin, M.M. (1981). The Dialogic Imagination: Four Essays. Austin: University of Texas Press.

Bakhtin, M.M. (1986). Speech genres and other late essays. Austin: University of Texas Press.

Baram-Tsabari, A. & Yarden, A. (2005). Text Genre as a Factor in the Formation of Scientific Literacy. Journal of Research in Science Teaching, 42(4), 403-428. doi:10.1002/tea.20063

Baretta, L., Tomitch, L.M.B., MacNair, N., Lim, V.K., & Waldie, K.E. (2009). Inference making while reading narrative and expository texts: An ERP study. Psychology & Neuroscience, 2(2), 137-145. doi:10.3922/j.psns.2009.2.005

Barthes, R. (1993). Mythologies. London: Vintage Classics.

Barton, K.C. (1997). “I Just Kinda Know”: Elementary Students’ Ideas about Historical Evidence. Theory & Research in Social Education, 25(4), 407-430. doi:10.1080/00933104.1997.10505821

Beach, L.R. (2010). The Psychology of Narrative Thought. How the Stories We Tell Ourselves Shape Our Lives. Indiana: Xlibris Corporation.

Best, R.M., Floyd, R.G., & McNamara, D.S. (2008). Differential competencies contributing to children’s comprehension of narrative and expository texts. Reading Psychology, 29(2), 137-164. doi:10.1080/02702710801963951

Black, J.B. & Bern, H. (1981). Causal coherence and memory for events in narratives. Journal of Verbal Learning and Verbal Behavior, 20(3), 267-275. doi:10.1016/S0022-5371(81)90417-5

Bohaty, J.J., Hebert, M.A., Nelson, J.R., & Brown, J.A. (2015). Methodological Status and Trends in Expository Text Structure Instruction Efficacy Research. Reading Horizons, 54(2), 35-65. Available at https://scholarworks.wmich.edu/reading_horizons/vol54/iss2/3/

Bohn-Gettler, C.M. & Kendeou, P. (2014). The Interplay of Reader Goals, Working Memory, and Text Structure During Reading. Contemporary Educational Psychology, 39(3), 206-219. doi:10.1016/j.cedpsych.2014.05.003

Bortnem, G.M. (2008). Teacher use of Interactive Read Alouds Using Nonfiction in Early Childhood Classrooms. Journal of College Teaching & Learning, 5(12), 29-44. doi:10.19030/TLC.V5I12.1213

Boström, A. (2008). Narratives as Tools in Designing the School Chemistry Curriculum. Interchange, 39(4), 391-413. doi:10.1007/s10780-008-9072-1

Bransford, J.D., Barclay, J.R., & Franks, J.J. (1972). Sentence Memory: A Constructive Versus Interpretive Approach. Cognitive Psychology, 3(2), 193-209. doi:10.1016/0010-0285(72)90003-5

Braunger, J. & Lewis, J.P. (1997). Building a Knowledge Base in Reading. Portland: Northwest Regional Educational Laboratory.

Britton, B.K., Graesser, A.C., Glynn, S.M., Hamilton, T., & Penland, M. (1983). Use of cognitive capacity in reading: Effects of some content features of text. Discourse Processes, 6(1), 39–57. doi:10.1080/01638538309544553

Britton, B.K., & Gülgöz, S. (1991). Using Kintsch’s computational model to improve instructional text: Effects of repairing inference calls on recall and cognitive structures. Journal of Educational Psychology, 83(3), 329-345. doi:10.1037/0022-0663.83.3.329

Browning, E. & Hohenstein, J. (2015). The use of narrative to promote primary school children’s understanding of evolution. Education 3-13: International Journal of Primary, Elementary and Early Years Education, 43(5), 530-547. doi:10.1080/03004279.2013.837943

Bruce, B.C., Rubin, A., & Starr, K.R. (1981). Why Readability Formulas Fail. IEEE Transactions on Professional Communication, 24, 50-52. doi:10.1109/TPC.1981.6447826

Bruner, J.S. (1966). Toward a Theory of Instruction. Cambridge, Massachusetts: Harvard University Press.

Bruner, J.S. (1985). Narrative and Paradigmatic Modes of Thought. In Eisner, E. W. (Ed.) Learning and Teaching the Ways of Knowing (pp. 97-115). Chicago: University of Chicago Press.

Bruner, J.S. (1986). Actual Minds, Possible Worlds. Cambridge, Massachusetts: Harvard University Press.

Bruner, J.S. (1990). Acts of Meaning. Cambridge, Massachusetts: Harvard University Press.

Bruner, J.S. (1991). The Narrative Construction of Reality. Critical Inquiry, 18(1), 1-21. doi:10.1086/448619

Bruner, J.S. (1996). The Culture of Education. Cambridge, Massachusetts: Harvard University Press.

Bruner, J.S. (2002). Making Stories: Law, Literature, Life. Cambridge, Massachusetts: Harvard University Press.

Butler, A.C., Zaromb, F.M., Lyle, K.B., & Roediger, H.L. III (2009). Using Popular Films to Enhance Classroom Learning: The Good, the Bad, and the Interesting. Psychological Science, 20(9), 1161-1168. doi:10.1111/j.1467-9280.2009.02410.x

Butling, G. (1916-1918). Private Papers of G and E Butling. [letters] (Personal communication, April 23rd 1916 – January 27th 1918).

Cabeza, R. & Nyberg, L. (2000). Imaging Cognition II: An Empirical Review of 275 PET and fMRI studies. Journal of Cognitive Neuroscience, 12(1), 1-47. doi:10.1162/08989290051137585

Caccamise, D. & Snyder, L. (2005). Theory and Pedagogical Practices of Text Comprehension. Topics in Language Disorders, 25(1), 3-18. doi:10.1097/00011363-200501000-00003

Callanan, M.A., Shrager, J., & Moore, J.L. (1995). Parent-Child Collaborative Explanations: Methods of Identification and Analysis. Journal of the Learning Sciences, 4(1), 105-129. doi:10.1207/s15327809jls0401_3

Carlisle, J.F. & Felbinger, L. (1991). Profiles of Listening and Reading Comprehension. The Journal of Educational Research, 84(6), 345-354. doi:10.1080/00220671.1991.9941815

Chandler, D. (1997). Children’s Understanding of What is ‘Real’ on Television: A Review of the Literature. Journal of Educational Media, 23(1), 65-80. doi:10.1080/1358165970230105

Chomsky, N. (2006). Language and Mind. 3rd ed. Cambridge: Cambridge University Press.

Cimpian, A. & Markman, E.M. (2008). Preschool children’s use of cues to generic meaning. Cognition, 107(1), 19-53. doi:10.1016/j.cognition.2007.07.008

Clariana, R.B., Wolfe, M.B., & Kim, K. (2014). The influence of narrative and expository lesson text structures on knowledge structures: alternate measures of knowledge structure. Educational Technology Research and Development, 62, 601-616. doi:10.1007/s11423-014-9348-3

Cleto, S. & Warman, B. (2019). Teaching with Stories: Empathy, Relatability, and the Fairy Tale. Marvels & Tales, 33(1), 102-115. doi:10.13110/marvelstales.33.1.0102

Cohen, J.W. (1988). Statistical Power Analysis for the Behavioural Sciences. 2nd ed. New York: Routledge.

Coles, R. (1989). The Call of Stories: Teaching and the Moral Imagination. Boston: Houghton Mifflin.

Coltham, J.B., & Fines, J. (1971). Educational Objectives for the Study of History: A Suggested Framework. London: Historical Association.

Cook, T.D. & Campbell, D.T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston: Houghton Mifflin.

Cooper, H. (2018a). What is creativity in history? Education 3-13: International Journal of Primary, Elementary and Early Years Education, 46(6), 636-647. doi:10.1080/03004279.2018.1483799

Cooper, H. (2018b). Children, their world, their history education: the implications of the Cambridge review for primary history. Education 3-13: International Journal of Primary, Elementary and Early Years Education, 46(6), 615-619. doi:10.1080/03004279.2018.1483797

Cooper, H., & Dilek, D. (2007). A comparative study on primary pupils’ historical questioning processes in Turkey and England: empathetic, critical and creative thinking. Educational Sciences: Theory & Practice, 7(2), 713-725. Available at http://insight.cumbria.ac.uk/id/eprint/463/1/Cooper_AComparativeStudyOnPrimary.pdf

Corriveau, K.H., Chen, E.E., & Harris, P.L. (2015). Judgements about Fact and Fiction by Children from Religious and Nonreligious Backgrounds. Cognitive Science, 39, 353-382. doi:10.1111/cogs.12138

Corriveau, K.H., Kim, A.L., Schwalen, C.E., & Harris, P.L. (2009). Abraham Lincoln and Harry Potter: Children’s differentiation between historical and fantasy characters. Cognition, 113(2), 213-225. doi:10.1016/j.cognition.2009.08.007

Counsell, C. (2000). Historical knowledge and historical skills: A distracting dichotomy. In Arthur, J. & Phillips, R. (Eds.) Issues in History Teaching (pp. 54-71). London: RoutledgeFalmer.

Cremin, T., Mottram, M., Bearne, E., & Goodwin, P. (2008). Exploring teachers’ knowledge of children’s literature. Cambridge Journal of Education, 38(4), 449-464. doi:10.1080/03057640802482363

Creswell, J.W. (2014). Research Design: Qualitative, Quantitative, and Mixed Methods Approaches. 4th ed. USA: SAGE.

Dahlstrom, M.F. (2014). Using narratives and storytelling to communicate science with nonexpert audiences. Proceedings of the National Academy of Sciences of the United States of America, 111(Supplement 4), 13614-13620. doi:10.1073/pnas.1320645111

Davison, A. & Kantor, R.N. (1982). On the Failure of Readability Formulas to Define Readable Texts: A Case Study from Adaptations. Reading Research Quarterly, 17(2), 187-209. doi:10.2307/747483

De Groot-Reuvekamp, M.J., Van Boxtel, C., Rose, A., & Harnett, P. (2014). The understanding of historical time in the primary history curriculum in England and the Netherlands. Journal of Curriculum Studies, 46(4), 487-514. doi:10.1080/00220272.2013.869837

Dennis, M.J. & Ahn, W. (2001). Primacy in causal strength judgements: The effect of initial evidence for generative versus inhibitory relationships. Memory & Cognition, 29(1), 152-164. doi:10.3758/BF03195749

DfE (2013). The national curriculum in England. Key stages 1 and 2 framework document. London: Department for Education.

DfEE (1998). The National Literacy Strategy: Framework for Teaching. London: Department for Education and Employment.

DfEE (1999). The National Curriculum. Handbook for primary teachers in England. London: Department for Education and Employment.

Diakidoy, I.N., Stylianou, P., Karefillidou, C., & Papageorgiou, P. (2005). The relationship between listening and reading comprehension of different types of text at increasing grade levels. Reading Psychology, 26(1), 55-80. doi:10.1080/02702710590910584

Donovan, M.S. & Bransford, J.D. (2005). How Students Learn: History in the Classroom. Washington: The National Academies Press.

Doyle, W. & Carter, K. (2003). Narrative and Learning to Teach: Implications for Teacher-Education Curriculum. Journal of Curriculum Studies, 35(2), 129-137. doi:10.1080/0022027022000023053

DuBrow, S. & Davachi, L. (2013). The influence of context boundaries on memory for the sequential order of events. Journal of Experimental Psychology: General, 142(4), 1277-1286. doi:10.1037/a0034024

Duffy, T.M. (1985). Readability formulas: What’s the use? In Duffy, T.M. & Waller, R.M. (Eds.) Designing Usable Texts (pp.113-143). Florida: Academic Press.

Duke, N.K. (2000). 3.6 Minutes per Day: The Scarcity of Informational Texts in First Grade. Reading Research Quarterly, 35(2), 202-224. doi:10.1598/RRQ.35.2.1

Dulberg, N. (2005). “The Theory Behind How Students Learn”: Applying Developmental Theory to Research on Children’s Historical Thinking. Theory & Research in Social Education, 33(4), 508-531. doi:10.1080/00933104.2005.10473293

Dymock, S. (2005). Teaching Expository Text Structure Awareness. The Reading Teacher, 59(2), 177-181. doi:10.1598/RT.59.2.7

Egan, K. (1986). Teaching as Storytelling: An Alternative Approach to Teaching and Curriculum in the Elementary School. Chicago: The University of Chicago Press.

Egan, K. (1988). Primary Understanding: Education in Early Childhood. Oxon: Routledge.

Ehlers, M.G. (1999). “No Pictures in My Head”: The Uses of Literature in the Development of Historical Understanding. OAH Magazine of History, 13(2), 5-9. doi:10.1093/maghis/13.2.5

Elbro, C. & Buch-Iversen, I. (2013). Activation of Background Knowledge for Inference Making: Effects on Reading Comprehension. Scientific Studies of Reading, 17(6), 435-452. doi:10.1080/10888438.2013.774005

Emmons, N., Smith, H., & Kelemen, D. (2016). Changing Minds With the Story of Adaptation: Strategies for Teaching Young Children About Natural Selection. Early Education and Development, 27(8), 1205-1221. doi:10.1080/10409289.2016.1169823

Eyden, J., Robinson, E.J., Einav, S., & Jaswal, V.K. (2013). The power of print: Children’s trust in unexpected printed suggestions. Journal of Experimental Child Psychology, 116(3), 593-608. doi:10.1016/j.jecp.2013.06.012

Faust, M. & Kandelshine-Waldman, O. (2011). The effects of different approaches to reading instruction on letter detection tasks in normally achieving and low achieving readers. Reading and Writing, 24(5), 545-566. doi:10.1007/s11145-009-9219-1

Fillpot, E. (2012). Historical Thinking in the Third Grade. The Social Studies, 103(5), 206-217. doi:10.1080/00377996.2011.622318

Flesch (2007). Flesch. [online] Available at http://flesh.sourceforge.net/ [Accessed 8th March 2017].

Fletcher, C.R. & Chrysler, S.T. (1990). Surface forms, textbases, and situation models: Recognition memory for three types of textual information. Discourse Processes, 13(2), 175-190. doi:10.1080/01638539009544752

Fludernik, M. (1996). Towards a ‘Natural’ Narratology. London: Routledge.

Fludernik, M. (2000). Genres, Text Types, or Discourse Modes? Narrative Modalities and Generic Categorization. Style, 34(2), 274-292. Available at https://www.jstor.org/stable/pdf/10.5325/style.34.2.274.pdf?seq=1

Freeman, J. (2015). Assessment and Progression without levels: where do we go from here? Primary History, 69, 8-13. Available at https://www.history.org.uk/publications/resource/8210/assessment-and-progression-without-levels

Freeman, E.B., & Levstik, L. (1988). Recreating the Past: Historical Fiction in the Social Studies Curriculum. The Elementary School Journal, 88(4), 329-337. doi:10.1086/461542

Frisch, J.K. & Saunders, G. (2008). Using stories in an introductory college biology course. Journal of Biological Education, 42(4), 164-169. doi:10.1080/00219266.2008.9656135

Gabella, M.S. (1994). Beyond the Looking Glass: Bringing Students into the Conversation of Historical Inquiry. Theory & Research in Social Education, 22(3), 340-363. doi:10.1080/00933104.1994.10505728

Ganea, P.A., Ma, L., & DeLoache, J.S. (2011). Young Children’s Learning and Transfer of Biological Information From Picture Books to Real Animals. Child Development, 82(5), 1421-1433. doi:10.1111/j.1467-8624.2011.01612.x

Gee, E.J. (1995). ‘The Effects of a Whole Language Approach to Reading Instruction on Reading Comprehension: A Meta-Analysis’, Annual Meeting of the American Educational Research Association. San Francisco, 18-22 April. Available at http://files.eric.ed.gov/fulltext/ED384003.pdf

Gelman, S.A., Ware, E.A., Manczak, E.M., & Graham, S.A. (2013). Children’s Sensitivity to the Knowledge Expressed in Pedagogical and Non-pedagogical Contexts. Developmental Psychology, 49(3), 491-504. doi:10.1037/a0027901

Gernsbacher, M.A. (1996). The Structure-Building Framework: What It Is, What It Might Also Be, and Why. In Britton, B.K. & Graesser, A.C. (Eds.) Models of text understanding (pp.289-311). New Jersey: Lawrence Erlbaum Associates.

Gerrig, R.J. (1993). Experiencing Narrative Worlds: On the Psychological Activities of Reading. New Haven: Yale University Press.

Gerrig, R.J. & Prentice, D.A. (1991). The Representation of Fictional Information. Psychological Science, 2(5), 336-340. doi:10.1111/j.1467-9280.1991.tb00162.x

Gilboa, A. (2004). Autobiographical and episodic memory – one and the same?: Evidence from prefrontal activation in neuroimaging studies. Neuropsychologia, 42(10), 1336-1349. doi:10.1016/j.neuropsychologia.2004.02.014

Goldstein, L.J. (1962). Evidence and Events in History. Philosophy of Science, 29(2), 175-194. doi:10.1086/287860

Gorard, S. (2001). Quantitative Methods in Educational Research: The role of numbers made easy. London: Continuum.

Graesser, A.C. (1981). Prose Comprehension Beyond the Word. New York: Springer-Verlag.

Graesser, A.C. & Kreuz, R.J. (1993). A theory of inference generation during text comprehension. Discourse Processes, 16(1-2), 145-160. doi:10.1080/01638539309544833

Graesser, A.C., McNamara, D.S., & Louwerse, M.M. (2003). What do Readers Need to Learn in Order to Process Coherence Relations in Narrative and Expository Text? In Sweet, A.P. & Snow, C.E. (Eds.) Rethinking reading comprehension. Solving Problems in the Teaching of Literacy (pp.82-98). New York: Guilford Press.

Graesser, A.C., Olde, B., & Klettke, B. (2002). How does the mind construct and represent stories? In Green, M.C., Strange, J.J., & Brock, T.C. (Eds.) Narrative impact: Social and Cognitive Foundations (pp.229-262). New York: Psychology Press.

Graesser, A.C., Singer, M., & Trabasso, T. (1994). Constructing Inferences During Narrative Text Comprehension. Psychological Review, 101(3), 371-395. doi:10.1037/0033-295X.101.3.371

Greenberg, D.L. & Verfaellie, M. (2010). Interdependence of episodic and semantic memory: Evidence from neuropsychology. Journal of the International Neuropsychological Society, 16(5), 748-753. doi:10.1017/S1355617710000676

Greene, S. & Ackerman, J.M. (1995). Expanding the Constructivist Metaphor: A Rhetorical Perspective on Literacy Research and Practice. Review of Educational Research, 65(4), 383-420. doi:10.3102/00346543065004383

Gudmundsdottir, S. (1991). Story-maker, story-teller: narrative structures in curriculum. Journal of Curriculum Studies, 23(3), 207-218. doi:10.1080/0022027910230301

Gutkind, L. (2007). The best creative nonfiction, Vol. 1. New York: Norton.

Hakkarainen, P. (2006). Learning and Development in play. In Einarsdottir, J. & Wagner, J.T. (Eds.) Nordic Childhoods and Early Education. Philosophy, Research, Policy, and Practice in Denmark, Finland, Iceland, Norway and Sweden (pp.183-222). Greenwich, CT: Information Age Publishing.

Hakkarainen, P. (2008). The challenges and possibilities of a narrative learning approach in the Finnish early childhood education system. International Journal of Educational Research, 47(5), 292-300. doi:10.1016/j.ijer.2008.12.008

Hall, K.M., Sabey, B.L., & McClellan, M. (2005). Expository Text Comprehension: Helping Primary-Grade Teachers Use Expository Texts to Full Advantage. Reading Psychology, 26(3), 211-234. doi:10.1080/02702710590962550

Hall-Kenyon, K.M. & Black, S. (2010). Learning From Expository Texts: Classroom-Based Strategies for Promoting Comprehension and Content Knowledge in the Elementary Grades. Topics in Language Disorders, 30(4), 339-349. doi:10.1097/TLD.0b013e3181ff21ea

Hallam, R.N. (1967). Logical Thinking in History. Educational Review, 19(3), 183-202. doi:10.1080/0013191670190303

Harnett, P. (2000). History in the Primary School: Re-Shaping Our Pasts. The Influence of Primary School Teachers’ Knowledge and Understanding of History on Curriculum Planning and Implementation. History Education Research Journal, 1(1), 7-19. doi:10.18546/HERJ.01.1.02

Harnett, P. (2010). Why did you write it like a story rather than just saying the information? Primary History, 4, 16-17. Available at: https://www.history.org.uk/primary/resource/3725/why-did-you-write-it-like-a-story-rather-than-just

Harp, S.F. & Mayer, R.E. (1998). How Seductive Details Do Their Damage: A Theory of Cognitive Interest in Science Learning. Journal of Educational Psychology, 90(3), 414-434. doi:10.1037/0022-0663.90.3.414

Hebert, M., Bohaty, J.J., Nelson, J.R., & Brown, J. (2016). The Effects of Text Structure Instruction on Expository Reading Comprehension: A Meta-Analysis. Journal of Educational Psychology, 108(5), 609-629. doi:10.1037/edu0000082

Heinlein, R.A. (1973). Time Enough For Love. New York: Ace Books.

Herbert, D.M.B. & Burt, J.S. (2004). What do Students Remember? Episodic Memory and the Development of Schematization. Applied Cognitive Psychology, 18(1), 77-88. doi:10.1002/acp.947

Herman, D. (2002). Story Logic: Problems and Possibilities of Narrative. Lincoln: University of Nebraska Press.

Herman, D. ed. (2007). The Cambridge Companion to Narrative. Cambridge: Cambridge University Press.

Herman, D. (2009a). Basic Elements of Narrative. Chichester, UK: Wiley-Blackwell.

Herman, D. (2009b). Narrative ways of worldmaking. In Heinen, S. & Sommer, R. (Eds.) Narratology in the age of cross-disciplinary narrative research (pp.71-87). Berlin: Walter de Gruyter.

Herman, D. (2013). Storytelling and the Sciences of Mind. Cambridge, Massachusetts: The MIT Press.

Herman, D., Jahn, M., & Ryan, M. (2005). Routledge Encyclopedia of Narrative Theory. London: Routledge.

Historical Association. (2016). Report: National Primary History Survey 2015. [online] Available at https://www.history.org.uk/primary/categories/709/news/3033/report-national-primary-history-survey-2015. [Accessed 19 June 2017].

Historical Association. (2020). Historical Association Survey of History in English Primary Schools 2019. Available at https://www.history.org.uk/primary/categories/709/news/3823/primary-history-survey-report. [Accessed 22 July 2020].

Hoodless, P.A. (2002). An investigation into children’s developing awareness of time and chronology in story. Journal of Curriculum Studies, 34(2), 173-200. doi:10.1080/00220270110080962

Hopkins, E.J. & Weisberg, D.S. (2017). The youngest readers’ dilemma: A review of children’s learning from fictional sources. Developmental Review, 43, 48-70. doi:10.1016/j.dr.2016.11.001

Horowitz, R. & Samuels, S.J. (1985). Reading and Listening to Expository Text. Journal of Literacy Research, 17(3), 185-198. doi:10.1080/10862968509547539

Imperial War Museum (n.d.) Dean, Ethel May (Oral History) [online] Available at: https://www.iwm.org.uk/collections/item/object/80009226 [Accessed 16th April 2020].

Jahn, M. (2005). Cognitive Narratology. In Herman, D., Jahn, M., & Ryan, M. (Eds.) Routledge Encyclopedia of Narrative Theory (pp.67-71). London: Routledge.

Jennings, C.M., Jennings, J.E., Richey, J., & Dixon-Krauss, L. (1992). Increasing Interest and Achievement in Mathematics Through Children’s Literature. Early Childhood Research Quarterly, 7(2), 263-276. doi:10.1016/0885-2006(92)90008-M

Johansson, R., Oren, F., & Holmqvist, K. (2018). Gaze patterns reveal how situation models and text representations contribute to episodic text memory. Cognition, 175, 53-68. doi:10.1016/j.cognition.2018.02.016

Jones, T. (2019). Will students gain knowledge of the world by reading fiction? Theory and Research in Education, 17(1), 3-18. doi:10.1177/1477878519832675

Karlsson, J., van den Broek, P., Helder, A., Hickendorff, M., Koornneef, A., & van Leijenhorst, L. (2018). Profiles of young readers: Evidence from thinking aloud while reading narrative and expository texts. Learning and Individual Differences, 67, 105-116. doi:10.1016/j.lindif.2018.08.001

Kaufman, J.C. (2002). Narrative and Paradigmatic Thinking Styles in Creative Writing and Journalism Students. Journal of Creative Behavior, 36(3), 201-219. doi:10.1002/j.2162-6057.2002.tb01064.x

Kelemen, D., Emmons, N.A., Schillaci, R.S., & Ganea, P.A. (2014). Young Children Can Be Taught Basic Natural Selection Using a Picture-Storybook Intervention. Psychological Science, 25(4), 893-902. doi:10.1177/0956797613516009

Kelly, H. (1981). Reasoning about Realities: Children’s Evaluations of Television and Books. New Directions for Child and Adolescent Development, 1981(13), 59-71. doi:10.1002/cd.23219811306

Kendeou, P. & van den Broek, P. (2007). The effects of prior knowledge and text structure on comprehension processes during reading of scientific texts. Memory & Cognition, 35(7), 1567-1577. doi:10.3758/BF03193491

Kintsch, W. (1972). Notes on the Structure of Semantic Memory. In Tulving, E. & Donaldson, W. (Eds.) Organisation of Memory. New York: Academic Press.

Kintsch, W. (1998). Comprehension: A paradigm for cognition. Cambridge: Cambridge University Press.

Kintsch, W., Welsch, D., Schmalhofer, F., & Zimny, S. (1990). Sentence Memory: A Theoretical Analysis. Journal of Memory and Language, 29(2), 133-159. doi:10.1016/0749-596X(90)90069-C

Klassen, S. & Froese-Klassen, C. (2014). The Role of Interest in Learning Science through Stories. Interchange, 45(3-4), 133-151. doi:10.1007/s10780-014-9224-4

Koenig, M.A., Cole, C.A., Meyer, M., Ridge, K.E., Kushnir, T., & Gelman, S.A. (2015). Reasoning about knowledge: Children’s evaluations of generality and verifiability. Cognitive Psychology, 83, 22-39. doi:10.1016/j.cogpsych.2015.08.007

Kokkotas, P., Malamitsa, K., & Rizaki, A. (2010). Storytelling as a Strategy for Understanding Concepts of Electricity and Electromagnetism. Interchange, 41(4), 379-405. doi:10.1007/s10780-010-9137-9

Kostons, D. & van der Werf, G. (2015). The effects of activating prior topic and metacognitive knowledge on text comprehension scores. British Journal of Educational Psychology, 85(3), 264-275. doi:10.1111/bjep.12069

Kotaman, H. & Tekin, A.K. (2017). Informational and fictional books: young children’s book preferences and teachers’ perspectives. Early Child Development and Care, 187(3-4), 600-614. doi:10.1080/03004430.2016.1236092

Kraal, A., Koornneef, A.W., Saab, N., & van den Broek, P.W. (2018). Processing of expository and narrative texts by low- and high-comprehending children. Reading and Writing, 31(9), 2017-2040. doi:10.1007/s11145-017-9789-2

Krapp, A., Hidi, S., & Renninger, K.A. (1992). Interest, Learning, and Development. In Renninger, K.A., Hidi, S., & Krapp, A. (Eds.). The Role of Interest in Learning and Development. New Jersey: Lawrence Erlbaum Associates, Inc.

Kucer, S.B. (2011). Going beyond the author: what retellings tell us about comprehending narrative and expository texts. Literacy, 45(2), 62-69. doi:10.1111/j.1741-4369.2010.00568.x

Kuhn, K.E., Rausch, C.M., McCarty, T.G., Montgomery, S.E., & Rule, A.C. (2017). Utilizing Nonfiction Texts to Enhance Reading Comprehension and Vocabulary in Primary Grades. Early Childhood Education Journal, 45(2), 285-296. doi:10.1007/s10643-015-0763-9

Kurby, C.A. & Zacks, J.M. (2015). Situation models in naturalistic comprehension. In Willems, R.M. (Ed.) Cognitive Neuroscience of Natural Language Use (pp.59-76). Cambridge: Cambridge University Press.

Labov, W. (1972). Language in the Inner City: Studies in the Black English Vernacular. Philadelphia: University of Pennsylvania Press.

Larison, K.D. (2018). Taking the Scientist’s Perspective: The Nonfiction Narrative Engages Episodic Memory to Enhance Students’ Understanding of Scientists and Their Practices. Science & Education, 27(1-2), 133-157. doi:10.1007/s11191-018-9957-z

León, C. (2016). An architecture of narrative memory. Biologically Inspired Cognitive Architectures, 16, 19-33. doi:10.1016/j.bica.2016.04.002

Levine, T.R. & Hullett, C.R. (2002). Eta Squared, Partial Eta Squared, and Misreporting of Effect Size in Communication Research. Human Communication Research, 28(4), 612-625. doi:10.1111/j.1468-2958.2002.tb00828.x

Linderholm, T. & van den Broek, P. (2002). The Effects of Reading Purpose and Working Memory Capacity on the Processing of Expository Text. Journal of Educational Psychology, 94(4), 778-784. doi:10.1037/0022-0663.94.4.778

Liu, C.H. & Matthews, R. (2005). Vygotsky’s philosophy: Constructivism and its criticisms examined. International Education Journal, 6(3), 386-399. Available at https://files.eric.ed.gov/fulltext/EJ854992.pdf

Logtenberg, A., van Boxtel, C., & van Hout-Wolters, B. (2011). Stimulating situational interest and student questioning through three types of historical introductory texts. European Journal of Psychology of Education, 26(2), 179-198. doi:10.1007/s10212-010-0041-6

Loreman, T., Deppeler, J., & Harvey, D. (2005). Inclusive Education: A practical guide to supporting diversity in the classroom. Abingdon: RoutledgeFalmer.

Lowenthal, D. (1985). The Past Is a Foreign Country. Cambridge: Cambridge University Press.

Maddison, M. (2014). The National Curriculum for History from September 2014: the view from Ofsted. Primary History, 66, 5-7. Available at https://www.history.org.uk/primary/resource/7191/the-national-curriculum-for-history-from-september

Magliano, J.P., Zwaan, R.A., & Graesser, A. (1999). The Role of Situational Continuity in Narrative Understanding. In van Oostendorp, H. & Goldman, S.R. (Eds.) The Construction of Mental Representations During Reading (pp.219-246). New Jersey: Lawrence Erlbaum Associates.

Malinsky, M.A. & McJunkin, M. (2008). Wondrous Tales of Measurement. Teaching Children Mathematics, 14(7), 410-413. doi:10.5951/TCM.14.7.0410

Maloch, B. & Horsey, M. (2013). Living Inquiry: Learning From and About Informational Texts in a Second-Grade Classroom. The Reading Teacher, 66(6), 475-485. doi:10.1002/TRTR.1152

Mandler, P., Lang, S., & Vallance, T. (2011). Debates: Narrative in school history. Teaching History, 145, 22-31. Available at https://www.history.org.uk/secondary/resource/5127/debates-narratives-in-school-history

Mantel, H. (2017a). The BBC Reith Lectures – The Day Is for the Living [Audio] Broadcast 13 June 2017. Available at http://www.bbc.co.uk/programmes/b08tcbrp [Accessed 17th June 2017].

Mantel, H. (2017b). The BBC Reith Lectures – The Iron Maiden [Audio] Broadcast 20 June 2017. Available from: http://www.bbc.co.uk/programmes/b08tcbrp [Accessed 28 June 2020].

Mar, R.A. (2004). The neuropsychology of narrative: story comprehension, story production and their interrelation. Neuropsychologia, 42(10), 1414-1434. doi:10.1016/j.neuropsychologia.2003.12.016

Mar, R.A. (2011). The Neural Bases of Social Cognition and Story Comprehension. Annual Review of Psychology, 62(1), 103-134. doi:10.1146/annurev-psych-120709-145406

Mar, R.A. & Oatley, K. (2008). The Function of Fiction is the Abstraction and Simulation of Social Experience. Perspectives on Psychological Science, 3(3), 173-192. doi:10.1111/j.1745-6924.2008.00073.x

Mar, R.A., Oatley, K., Djikic, M. & Mullin, J. (2011). Emotion and narrative fiction: Interactive influences before, during, and after reading. Cognition and Emotion, 25(5), 818-833. doi:10.1080/02699931.2010.515151

Maria, K. & Johnson, J.M. (1990). Correcting misconceptions: Effect of type of text. In National Reading Conference Yearbook, Volume 39 (pp.329-337). Texas: Texas Christian University Press.

Marriott, S. (1986). Teachers’ Use of Fiction in Primary Schools in Northern Ireland. The Irish Journal of Education, 20(2), 97-108. Available at https://www.jstor.org/stable/30077332?seq=1

Martarelli, C.S. & Mast, F.W. (2013). Is It Real or Is It Fiction? Children’s Bias Toward Reality. Journal of Cognition and Development, 14(1), 141-153. doi:10.1080/15248372.2011.638685

Martin, A. (2009). Semantic Memory. In Squire, L.R. (Ed.) Encyclopedia of Neuroscience (pp.561-566). New York: Elsevier.

Mayer, R.E. (1979). Twenty Years of Research on Advance Organizers: Assimilation Theory is Still the Best Predictor of Results. Instructional Science, 8(2), 133-167. doi:10.1007/BF00117008

McCrudden, M.T. & Schraw, G. (2007). Relevance and Goal-Focusing in Text Processing. Educational Psychology Review, 19(2), 113-139. doi:10.1007/s10648-006-9010-7

McCrudden, M.T., Magliano, J.P., & Schraw, G. (2010). Exploring how relevance instructions affect personal reading intentions, reading goals and text processing: A mixed methods study. Contemporary Educational Psychology, 35(4), 229-241. doi:10.1016/j.cedpsych.2009.12.001

McHugh, M.L. (2012). Interrater reliability: the kappa statistic. Biochemia Medica, 22(3), 276-282. doi:10.11613/BM.2012.031

McLeod, A. N. & McDade, H. L. (2011). Preschoolers’ Incidental Learning of Novel Words During Storybook Reading. Communication Disorders Quarterly, 32(4), 256-266. doi:10.1177/1525740109354777

McNamara, D.S. (2001). Reading Both High-Coherence and Low-Coherence Texts. Effects of Text Sequence and Prior Knowledge. Canadian Journal of Experimental Psychology, 55(1), 51-62. doi:10.1037/h0087352

McNamara, D.S., Kintsch, E., Songer, N.B., & Kintsch, W. (1996). Are Good Texts Always Better? Interactions of Text Coherence, Background Knowledge, and Levels of Understanding in Learning From Text. Cognition & Instruction, 14(1), 1-43. doi:10.1207/s1532690xci1401_1

Medwell, J., Wray, D., Moore, G., & Griffiths, V. (2017). Primary English: Knowledge and Understanding. 8th ed. London: SAGE.

Mercer, N. (1996). The Quality of Talk in Children’s Collaborative Activity in the Classroom. Learning and Instruction, 6(4), 359-377. doi:10.1016/S0959-4752(96)00021-7

Mercer, N. & Littleton, K. (2007). Dialogue and the Development of Children’s Thinking. London: Routledge.

Meyer, B.J.F. (1975). The Organization of Prose and its Effects on Memory. Amsterdam, Netherlands: North-Holland Publishing.

Meyer, B.J.F. (1985). Prose analysis: Purposes, procedures, and problems. In Britton, B.K. & Black, J.B. (Eds.), Understanding Expository Text: A Theoretical and Practical Handbook for Analyzing Explanatory Text (pp.11-64). New Jersey: Lawrence Erlbaum Associates.

Meyer, B.J.F. & Ray, M.N. (2011). Structure strategy interventions: Increasing reading comprehension of expository text. International Electronic Journal of Elementary Education, 4(1), 127-152. Available at https://www.iejee.com/index.php/IEJEE/article/view/217/213

Mili & Winch, C. (2019). Teaching through textbooks: Teachers as practitioners of a discipline? Theory and Research in Education, 17(2), 181-201. doi:10.1177/1477878519862547

Mortimer, I. (2009). The Time Traveller’s Guide to Medieval England: A Handbook for Visitors to the Fourteenth Century. London: Vintage Books.

Mosenthal, P.B. (1985). Defining the expository discourse continuum: Towards a Taxonomy of Expository Text Types. Poetics, 14(5), 387-414. doi:10.1016/0304-422X(85)90035-X

Muir, T., Livy, S., Bragg, L., Clark, J., Wells, J. & Attard, C. (2017). Engaging with mathematics through picture books. Albert Park, Australia: Teaching Solutions.

Murmann, M. & Avraamidou, L. (2014). Animals, Emperors, Senses: Exploring a Story-based Learning Design in a Museum Setting. International Journal of Science Education, Part B, 4(1), 66-91. doi:10.1080/21548455.2012.760857

Nathanson, S. (2006). Harnessing the Power of Story: Using Narrative Reading and Writing Across Content Areas. Reading Horizons, 47(1), 1-26. Available at https://pdfs.semanticscholar.org/5e43/2645ede85144b0401e18abd707816399bd13.pdf

National Reading Panel. (2000). Teaching Children to Read: An Evidence-Based Assessment of the Scientific Research Literature on Reading and Its Implications for Reading Instruction. Available at: https://www.nichd.nih.gov/publications/pubs/nrp/Documents/report.pdf. [Accessed 3rd August 2017].

Negrete, A. (2005). Fact via Fiction: Stories that Communicate Science. In Sanitt, N. (Ed.) Motivating science: Science communication from a philosophical, educational and cultural perspective (pp.95-102). Luton: The Pantaneto Press.

Newberry, K.M. & Bailey, H.R. (2019). Does semantic knowledge influence event segmentation and recall of text? Memory & Cognition, 47(6), 1173-1187. doi:10.3758/s13421-019-00926-4

Nicolopoulou, A. (1997). Children and Narratives: Toward an Interpretive and Sociocultural Approach. In Bamberg, M. (Ed.) Narrative Development: Six Approaches (pp.179-216). New Jersey: Lawrence Erlbaum Associates.

Norris, S.P., Guilbert, S.M., Smith, M.L., Hakimelahi, S., & Phillips, L.M. (2005). A theoretical framework for narrative explanation in science. Science Education, 89(4), 535-563. doi:10.1002/sce.20063

Nowell-Smith, P.H. (1977). The Constructionist Theory of History. History and Theory, 16(4), 1-28. doi:10.2307/2504805

Nünning, V. (2015). Narrative Fiction and Cognition: Why We Should Read Fiction. Forum for World Literature Studies, 7(1), 41-61. Available at http://www.fwls.org/plus/view.php?aid=250

Ofsted. (2011). History for All. History in English Schools 2007/10. Available from: https://www.gov.uk/government/publications/history-for-all-strengthes-and-weaknesses-of-school-history-teaching. [Accessed 3rd August 2017].

Olejnik, S., Li, J., Supattathum, S., & Huberty, C.J. (1997). Multiple Testing and Statistical Power With Modified Bonferroni Procedures. Journal of Educational and Behavioural Statistics, 22(4), 389-406. doi:10.3102/10769986022004389

Ozuru, Y., Dempsey, K., & McNamara, D.S. (2009). Prior knowledge, reading skill, and text cohesion in the comprehension of science texts. Learning and Instruction, 19(3), 228-242. doi:10.1016/j.learninstruc.2008.04.003

Padula, J. (2004). The Role of Mathematical Fiction in the Learning of Mathematics in Primary School. Australian Primary Mathematics Classroom, 9(2), 8-14. Available at https://files.eric.ed.gov/fulltext/EJ793882.pdf

Pathman, T., Samson, Z., Dugas, K., Cabeza, R., & Bauer, P.J. (2011). A “snapshot” of declarative memory: Differing developmental trajectories in episodic and autobiographical memory. Memory, 19(8), 825-835. doi:10.1080/09658211.2011.613839

Peñaloza, G. & Robles-Piñeros, J. (2020). Imagination and Narratives to Tell Stories About Natural History. Human Arenas. doi:10.1007/s42087-020-00124-8

Piaget, J. (1952). The origins of intelligence in children. New York: International University Press.

Polkinghorne, D.E. (1988). Narrative Knowing and the Human Sciences. New York: State University of New York.

Popper, K. (2002). The Logic of Scientific Discovery. Oxon: Routledge Classics.

Powell, K.C. & Kalina, C.J. (2009). Cognitive and social constructivism: developing tools for an effective classroom. Education, 130(2), 241-250. Available at https://docdrop.org/static/drop-pdf/ConstructivismDay1-ln36v.pdf

Prawat, R.S. (1992). Teachers’ Beliefs about Teaching and Learning: A Constructivist Perspective. American Journal of Education, 100(3), 354-395. doi:10.1086/444021

Prins, R., Avraamidou, L., & Goedhart, M. (2017). Tell me a Story: the use of narrative as a learning tool for natural selection. Educational Media International, 54(1), 20-33. doi:10.1080/09523987.2017.1324361

Radvansky, G.A. (2005). Situation models, propositions, and the fan effect. Psychonomic Bulletin & Review, 12(3), 478-483. doi:10.3758/BF03193791

Radvansky, G.A. (2012). Across the Event Horizon. Current Directions in Psychological Science, 21(4), 269-272. doi:10.1177/0963721412451274

Radvansky, G.A., Zwaan, R.A., Federico, T., & Franklin, N. (1998). Retrieval From Temporally Organized Situation Models. Journal of Experimental Psychology: Learning, Memory, and Cognition, 24(5), 1224-1237. doi:10.1037/0278-7393.24.5.1224

Radvansky, G.A., Zwaan, R.A., Curiel, J.M., & Copeland, D.E. (2001). Situation Models and Aging. Psychology and Aging, 16(1), 145-160. doi:10.1037/0882-7974.16.1.145

Rassool, N. (2009). Literacy: In Search of a Paradigm. In Soler, J., Fletcher-Campbell, F. & Reid, G. (Eds.) Understanding Difficulties in Literacy Development: Issues and Concepts. (pp.7-31). London: SAGE Publications Ltd.

Recht, D.R. & Leslie, L. (1988). Effect of Prior Knowledge on Good and Poor Readers’ Memory of Text. Journal of Educational Psychology, 80(1), 16-20. doi:10.1037/0022-0663.80.1.16

Redrum, D. (2005). From Narrative Representation to Narrative Use: Towards the Limits of Definition. Narrative, 13(2), 195-204. doi:10.1353/nar.2005.0013

Reiss, M.J., Millar, R., & Osborne, J. (1999). Beyond 2000: Science/biology education for the future. Journal of biological education, 32(2), 68-70. doi:10.1080/00219266.1999.9655644

Repoussi, M. & Tutiaux-Guillon, N. (2010). New Trends in History Textbook Research: Issues and Methodologies toward a School Historiography. Journal of Educational Media, Memory & Society, 2(1), 154-170. doi:10.3167/jemms.2010.020109

Richardson, B. (2000). Recent Concepts of Narrative and the Narratives of Narrative Theory. Style, 34(2), 168-175. doi:10.1002/9780470693780.ch18

Richert, R.A. & Smith, E.I. (2011). Preschoolers’ Quarantining of Fantasy Stories. Child Development, 82(4), 1106-1119. doi:10.1111/j.1467-8624.2011.01603.x

Richmond, G., Juzwik, M.M., & Steele, M.D. (2011). Trajectories of Teacher Identity Development across Institutional Contexts: Constructing a Narrative Approach. Teachers College Record, 113(9), 1863-1905. Available at https://www.tcrecord.org/Content.asp?ContentId=16177

Robertson, D.A., Gernsbacher, M.A., Guidotti, S.J., Robertson, R.R.W., Irwin, W., Mock, B.J., & Campana, M.E. (2000). Functional Neuroanatomy of the Cognitive Process of Mapping During Discourse Comprehension. Psychological Science, 11(3), 255-260. doi:10.1111/1467-9280.00251

Rogers, P.J. (1979). The New History: Theory into Practice. London: The Historical Association.

Rose, J. (2006). Independent Review of the Teaching of Early Reading: Final Report. Nottingham: DfES Publications.

Rumelhart, D.E. (1975). Notes on a schema for stories. In Bobrow, D.G. & Collins, A. (Eds.) Representation and Understanding: Studies in Cognitive Science (pp.211-236). New York: Academic Press.

Rumelhart, D.E. & Norman, D.A. (1978). Accretion, Tuning, and Restructuring: Three Modes of Learning. In Cotton, J.W. & Klatzky, R.L. (Eds.) Semantic factors in cognition. (pp.37-53). New Jersey: Lawrence Erlbaum Associates.

Russo, T. & Russo, J. (2018). Narrative-first approach: Teaching mathematics through picture story books. Australian Primary Mathematics Classroom, 23(2), 8-14. Available at https://www.researchgate.net/publication/326332318_Narrative-first_approach_Teaching_mathematics_through_picture_story_books

Ryan, M.L. (2005). On the Theoretical Foundations of Transmedial Narratology. In Meister, J.C., Kindt, T., & Schernus, W. (Eds.) Narratology beyond Literary Criticism: Mediality, Disciplinarity (pp.1-24). Berlin: Walter de Gruyter.

Ryan, M. (2019). From Possible Worlds to Storyworlds: On the Worldness of Narrative Representation. In Bell, A. & Ryan, M. (Eds.) Possible Worlds Theory and Contemporary Narratology (pp.62-87). Lincoln: University of Nebraska Press.

Sapon-Shevin, M. (1992). Ability Differences in the Classroom: Teaching and Learning in Inclusive Classrooms. In Byrnes, D.A. & Kiger, G. (Eds.) Common Bonds: Anti-Bias Teaching in a Diverse Society (pp.39-52). Wheaton: Association for Childhood Education International.

Sarbin, T. (1986). Narrative Psychology: The Storied Nature of Human Conduct. London: Praeger.

Schank, R.C. & Berman, T.R. (2013). The Pervasive Role of Stories in Knowledge and Action. In Green, M.C., Strange, J.J., & Brock, T.C. (Eds.) Narrative impact: Social and Cognitive Foundations (pp.287-314). New York: Psychology Press.

Schraw, G., Flowerday, T., & Lehman, S. (2001). Increasing Situational Interest in the Classroom. Educational Psychology Review, 13(3), 211-244. doi:10.1023/A:1016619705184

Shakespeare, W. (2000). The Tragedy of King Richard III. Edited by John Jowett. New York: Oxford University Press. 1.3:246.

Sharon, T. & Woolley, J.D. (2004). Do Monsters Dream? Young Children’s Understanding of the Fantasy/Reality Distinction. British Journal of Developmental Psychology, 22(2), 293-310. doi:10.1348/026151004323044627

Sheldon, N. (2010). Jeannette Coltham’s, John Fines’ and Peter Rogers’ Historical Association pamphlets: their relevance to the development of ideas about History teaching. International Journal of Historical Learning Teaching and Research, 9(1), 9-12. Available at https://www.history.org.uk/secondary/categories/8/resource/3220

Shtulman, A. & Carey, S. (2007). Improbable or impossible? How children reason about the possibility of extraordinary events. Child Development, 78(3), 1015-1032. doi:10.1111/j.1467-8624.2007.01047.x

Sinatra, G.M. (1990). Convergence of listening and reading processing. Reading Research Quarterly, 25(2), 115-130. doi:10.2307/747597

Skolnick, D. & Bloom, P. (2006). What Does Batman Think about SpongeBob? Children’s Understanding of the Fantasy/Fantasy Distinction. Cognition, 101(1), B9-B18. doi:10.1016/j.cognition.2005.10.001

Soler, J. & Openshaw, R. (2007). ‘To Be or Not to Be?’: The Politics of Teaching Phonics in England and New Zealand. Journal of Early Childhood Literacy, 7(3), 333-352. doi:10.1177/1468798407083662

Souchay, C., Guillery-Girard, B., Pauly-Takacs, K., Wojcik, D.Z., & Eustache, F. (2013). Subjective experience of episodic memory and metacognition: a neurodevelopmental approach. Frontiers in Behavioural Neuroscience, 7(212), 1-16. doi:10.3389/fnbeh.2013.00212

Speer, N.K., Reynolds, J.R., Swallow, K.M., & Zacks, J.M. (2009). Reading stories activates neural representations of visual and motor experiences. Psychological Science, 20(8), 989-999. doi:10.1111/j.1467-9280.2009.02397.x

Sperry, L.L. & Sperry, D.E. (1996). Early Development of Narrative Skills. Cognitive Development, 11(3), 443-465. doi:10.1016/S0885-2014(96)90013-1

Spivey, N.N. (1997). The Constructivist Metaphor. Reading, Writing, and the Making of Meaning. San Diego: Academic Press.

Squire, L.R. (2004). Memory systems of the brain: A brief history and current perspective. Neurobiology of Learning and Memory, 82(3), 171-177. doi:10.1016/j.nlm.2004.06.005

Stahl, S.A. & Miller, P.D. (1989). Whole Language and Language Experience Approaches for Beginning Reading: A Quantitative Research Synthesis. Review of Educational Research, 59(1), 87-116. doi:10.2307/1170448

Stanovich, K.E. (1980). Toward an Interactive-Compensatory Model of Individual Differences in the Development of Reading Fluency. Reading Research Quarterly, 16(1), 32-71. doi:10.2307/747348

Stanovich, K.E. (1994). Constructivism in Reading Education. The Journal of Special Education, 28(3), 259-274. doi:10.1177/002246699402800303

Stanovich, K.E., West, R.F., & Feeman, D.J. (1981). A longitudinal study of sentence context effects in second-grade children: Tests of an interactive-compensatory model. Journal of Experimental Child Psychology, 32(2), 185-199. doi:10.1016/0022-0965(81)90076-X

Stevens, J. (1996). Applied multivariate statistics for the social sciences. 3rd ed. New Jersey: Lawrence Erlbaum Associates.

Stinton, H. (2014). Harry’s War: A British Tommy’s experiences in the trenches in World War One. Researched and edited by V. Mayo. London: Bloomsbury Publishing.

Strouse, G.A., Nyhout, A., & Ganea, P.A. (2018). The Role of Book Features in Young Children’s Transfer of Information from Picture Books to Real-World Contexts. Frontiers in Psychology, 9(50), 1-14. doi:10.3389/fpsyg.2018.00050

Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285. doi:10.1016/0364-0213(88)90023-7

Szurmak, J. & Thuna, M. (2013). ‘Tell Me a Story: The Use of Narrative as a Tool for Instruction’, Conference of the Association of College and Research Libraries. Indianapolis, 10-13 April. Available at http://www.ala.org/acrl/sites/ala.org.acrl/files/content/conferences/confsandpreconfs/2013/papers/SzurmakThuna_TellMe.pdf

Tapiero, I. (2007). Situation Models and Levels of Coherence: Toward a Definition of Comprehension. New York: Lawrence Erlbaum Associates.

Tapiero, I. & Otero, J. (1999). Distinguishing Between Textbase and Situation Model in the Processing of Inconsistent Information: Elaboration Versus Tagging. In van Oostendorp, H. & Goldman, S.R. (Eds.) The Construction of Mental Representations During Reading (pp.341-366). New Jersey: Lawrence Erlbaum Associates.

Therriault, D.J. & Rinck, M. (2007). Multidimensional situation models. In Schmalhofer, F. & Perfetti, C.A. (Eds.) Higher Level Language Processes in the Brain: Inference and Comprehension Processes (pp.311-327). New Jersey: Lawrence Erlbaum Associates.

Thon, J. (2016). Transmedial Narratology and Contemporary Media Culture. Lincoln: University of Nebraska Press.

Topping, K.J. (2015). Fiction and Non-Fiction Reading and Comprehension in Preferred Books. Reading Psychology, 36(4), 350-387. doi:10.1080/02702711.2013.865692

Trabasso, T. & van den Broek, P. (1985). Causal thinking and the representation of narrative events. Journal of Memory and Language, 24(5), 612-630. doi:10.1016/0749-596X(85)90049-X

Tulving, E. (1983). Elements of episodic memory. Cambridge: Oxford University Press.

Tulving, E. (2005). Episodic Memory and Autonoesis: Uniquely Human? In Terrace, H.S. & Metcalfe, J. (Eds.) The missing link in cognition: Origins of self-reflective consciousness (pp.3-56). New York: Oxford University Press.

Turner, M. (1996). The Literary Mind: The Origins of Thought and Language. Oxford: Oxford University Press.

van den Broek, P. (1990). Causal Inferences and The Comprehension of Narrative Texts. Psychology of Learning and Motivation, 25, 175-196. doi:10.1016/S0079-7421(08)60255-8

van Dijk, E.M. & Kattmann, U. (2009). Teaching Evolution with Historical Narratives. Evolution: Education and Outreach, 2, 479-489. doi:10.1007/s12052-009-0127-2

van Dijk, T.A. & Kintsch, W. (1983). Strategies of Discourse Comprehension. New York: Academic Press.

van Loon, M.H., de Bruin, A.B.H., van Gog, T., & van Merriënboer, J.J.G. (2013). Activation of inaccurate prior knowledge affects primary-school students’ metacognitive judgments and calibration. Learning and Instruction, 24(1), 15-25. doi:10.1016/j.learninstruc.2012.08.005

Venkadasalam, V.P. & Ganea, P.A. (2018). Do objects of different weight fall at the same time? Updating naïve beliefs about free-falling objects from fictional and informational books in young children. Journal of Cognition and Development, 19(2), 165-181. doi:10.1080/15248372.2018.1436058

von Heyking, A. (2004). Historical Thinking in the Elementary Years: A Review of Current Research. Canadian Social Studies, 39(1). Available at https://files.eric.ed.gov/fulltext/EJ1073974.pdf

Vygotsky, L. (1962). Thought and Language. Massachusetts: MIT Press.

Vygotsky, L. (1978). Mind in society: The development of higher psychological processes. Cambridge: Harvard University Press.

Vygotsky, L. (1981). Genesis of Higher Mental Functions. In Light, P., Sheldon, S., & Woodhead, M. (Eds.) Child Development in Social Context 2. Learning to Think (pp.32-41). London: Routledge.

Walan, S. (2019). Teaching children science through storytelling combined with hands-on activities – a successful instructional strategy? Education 3-13, 47(1), 34-46. doi:10.1080/03004279.2017.1386228

Walker, C.M., Gopnik, A., & Ganea, P.A. (2015). Learning to learn from stories: children’s developing sensitivity to the causal structure of fictional worlds. Child Development, 86(1), 310-318. doi:10.1111/cdev.12287

Weisberg, D.S., Sobel, D.M., Goodstein, J., & Bloom, P. (2013). Young Children are Reality-Prone When Thinking about Stories. Journal of Cognition and Culture, 13(3-4), 383-407. doi:10.1163/15685373-12342100

Weisman, K. (2011). The Rising Star of Narrative Nonfiction. Book Links, 20(4), 8-12. Available at https://library.laredo.edu/eds/detail?db=lfh&an=66719633&isbn=9780374399184

Wells, G. (1992). The Centrality of Talk in Education. In Norman, K. (Ed.) Thinking Voices: The Work of the National Oracy Project (pp.283-310). London: Hodder and Stoughton.

Wheeler, M.A., Stuss, D.T., & Tulving, E. (1997). Toward a theory of episodic memory: The frontal lobes and autonoetic consciousness. Psychological Bulletin, 121(3), 331-354. doi:10.1037/0033-2909.121.3.331

White, H. (1980). The Value of Narrativity in the Representation of Reality. Critical Inquiry, 7(1), 5-27. doi:10.1086/448086

Wilkinson, K. S. & Houston-Price, C. (2013). Once upon a time, there was a pulchritudinous princess…: The role of word definitions and multiple story contexts in children’s learning of difficult vocabulary. Applied Psycholinguistics, 34(3), 591–613. doi:10.1017/S0142716411000889

Williams, J.C. (2014). Recent official policy and concepts of reading comprehension and inference: the case of England’s primary curriculum. Literacy, 48(2), 95-102. doi:10.1111/lit.12012

Wineburg, S. (2001). Historical Thinking and Other Unnatural Acts: Charting the Future of Teaching the Past. Philadelphia: Temple University Press.

Wolfe, M.B.W. (2005). Memory for Narrative and Expository Text: Independent Influences of Semantic Associations and Text Organization. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 359-364. doi:10.1037/0278-7393.31.2.359

Wolfe, M.B.W. & Mienko, J.A. (2007). Learning and memory of factual content from narrative and expository text. British Journal of Educational Psychology, 77(3), 541-564. doi:10.1348/000709906X143902

Wolfe, M.B.W., & Woodwyk, J.M. (2010). Processing and memory of information presented in narrative or expository texts. British Journal of Educational Psychology, 80(3), 341-362. doi:10.1348/000709910X485700

Wood, D., Bruner, J.S., & Ross, G. (1976). The Role of Tutoring in Problem Solving. Journal of Child Psychology and Psychiatry, 17(2), 89-100. doi:10.1111/j.1469-7610.1976.tb00381.x

Woolley, J.D. & Ghossainy, M.E. (2013). Revisiting the fantasy-reality distinction: children as native skeptics. Child Development, 84(5), 1496-1510. doi:10.1111/cdev.12081

Woolley, J.D. & Van Reet, J. (2006). Effects of Context on Judgments Concerning the Reality Status of Novel Entities. Child Development, 77(6), 1778-1793. doi:10.1111/j.1467-8624.2006.00973.x

Wright, T.S. (2013). From potential to reality: Content-rich vocabulary and informational text. The Reading Teacher, 67(5), 359-367. doi:10.1002/trtr.1222

Yates, J.F. & Curley, S.P. (1986). Contingency judgment: Primacy effects and attention decrement. Acta Psychologica, 62(3), 293-302. doi:10.1016/0001-6918(86)90092-2

Yeari, M., van den Broek, P., & Oudega, M. (2015). Processing and memory of central versus peripheral information as a function of reading goals: evidence from eye-movements. Reading and Writing, 28(8), 1071-1097. doi:10.1007/s11145-015-9561-4

Yopp, R.H. & Yopp, H.K. (2006). Informational texts as read-alouds at school and home. Journal of Literacy Research, 38(1), 37-51. doi:10.1207/s15548430jlr3801_2

Yuan, Y., Major-Girardin, J., & Brown, S. (2018). Storytelling Is Intrinsically Mentalistic: A Functional Magnetic Resonance Imaging Study of Narrative Production across Modalities. Journal of Cognitive Neuroscience, 30(9), 1298-1314. doi:10.1162/jocn_a_01294

Yusuf, H.O. & Mohammed, S. (2013). Influence of Prior Knowledge Questions on Pupils’ Performance in Reading Comprehension in Primary Schools in Kaduna, Nigeria. Advances in Language and Literacy Studies, 4(1), 134-139. doi:10.7575/aiac.alls.v.4n.1p.134

Zaccaria, M.A. (1978). The Development of Historical Thinking: Implications for the Teaching of History. The History Teacher, 11(3), 323-340. doi:10.2307/491623

Zacks, J.M., Speer, N.K., & Reynolds, J.R. (2009). Segmentation in reading and film comprehension. Journal of Experimental Psychology: General, 138(2), 307-327. doi:10.1037/a0015305

Zacks, J.M., Speer, N.K., Swallow, K.M., Braver, T.S., & Reynolds, J.R. (2007). Event perception: a mind-brain perspective. Psychological Bulletin, 133(2), 273-293. doi:10.1037/0033-2909.133.2.273

Zacks, J.M., Speer, N.K., Vettel, J.M., & Jacoby, L.L. (2006). Event understanding and memory in healthy aging and dementia of the Alzheimer type. Psychology and Aging, 21(3), 466-482. doi:10.1037/0882-7974.21.3.466

Zacks, J.M. & Swallow, K.M. (2007). Event Segmentation. Current Directions in Psychological Science, 16(2), 80-84. doi:10.1111/j.1467-8721.2007.00480.x

Zwaan, R.A. (1994). Effect of genre expectations on text comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 20(4), 920-933. doi:10.1037/0278-7393.20.4.920

Zwaan, R.A. (2001). Situation Model: Psychological. In Smelser, N.J. & Baltes, P.B. (Eds.) International Encyclopedia of the Social & Behavioural Sciences (pp.14137-14141). New York: Elsevier.

Zwaan, R.A. (2016). Situation models, mental simulations, and abstract concepts in discourse comprehension. Psychonomic Bulletin & Review, 23(4), 1028-1034. doi:10.3758/s13423-015-0864-x

Zwaan, R.A., Langston, M.C., & Graesser, A.C. (1995a). The Construction of Situation Models in Narrative Comprehension: An Event-Indexing Model. Psychological Science, 6(5), 292-297. doi:10.1111/j.1467-9280.1995.tb00513.x

Zwaan, R.A., Magliano, J.P., & Graesser, A.C. (1995b). Dimensions of Situation Model Construction in Narrative Comprehension. Journal of Experimental Psychology: Learning, Memory, and Cognition, 21(2), 386-397. doi:10.1037/0278-7393.21.2.386

Appendices

Appendix A: G*Power calculation

Appendix B: Details of participants on the Special Educational Needs register

• Six participants were on their school’s SEN register. Note that the total number of children in the table adds up to seven, because one of the participants falls under two of these categories (ASCD and SLCN).

• Four further participants were being monitored for SEN at the time of the research

Table 113: Number of participants with specific learning difficulties

Learning difficulty                                      Number of children (number of children being monitored)
Learning Difficulties & Disabilities (LDD)               4 (3)
Speech, Language and Communication needs (SLCN)          1 (1)
Attention Deficit Hyperactivity Disorder (ADHD)          1
Autism and Social Communication Difficulties (ASCD)      1

Appendix C: Reading and history levels of participants across schools

C(i): Reading levels

Table 114 shows the spread of reading levels across the participant sample. In primary schools, children’s reading levels are assessed by class teachers, using both observational data (for instance, when reading with children) and summative data (primarily scores on written reading comprehension assessments) to inform the assessment. Each child is assigned a reading level.

For each year group, there is a ‘band’ which children are expected to be working within. For instance, Year 5 children are expected to be working within band 5. For each band, various descriptors are listed explaining what children are expected to do. Children are then assigned a letter according to the depth at which they are working within that band (see Table 115). For instance, if children are beginning to work on the Year 5 descriptors, their reading level will be 5b.

Children may be assigned to bands below their year group (e.g. a Year 5 child might be in band 3), but may not be assigned to bands above their year group (e.g. a Year 5 child cannot be placed in band 6).

Table 114: Reading levels of participants

Year group (age)             Reading levels    Number of participants
Year 2 (age 6-7 years)       2b – 2s+          2
Year 3 (age 7-8 years)       3b – 3s+          1
Year 4 (age 8-9 years)       4b – 4s+          9
Year 5 (age 9-10 years)      5b                5
                             5b+               7
                             5w                13
                             5w+               11
                             5s                22
                             5s+               8

Table 115: Description of letters assigned in reading levels

b     Beginning the band
b+    Beginning the band plus
w     Working within the band
w+    Working within the band plus
s     Secure in the band
s+    Secure in the band plus

C(ii): History levels

Table 116 shows the spread of history levels across the participant sample. In foundation subjects, such as history, children are assessed by their teachers according to what teachers have observed during lessons.

History levels are assigned in different ways across schools. School B assessed history ability using the same system as reading, described above. However, School A categorised children as one of three levels: working towards the expected standard, working at the expected standard, or working at a greater depth than the expected standard.

To ensure that levels were comparable across the two schools, School B’s data was translated into the same format as School A’s data. Participants in School B who were assigned as a level 5b or 5b+ were described as ‘working towards’, those at a level 5w or 5w+ were described as ‘expected’, and those at 5s or 5s+ were described as ‘greater depth’.
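The translation described above can be sketched as a simple lookup. This is a minimal illustration only, assuming the band-5 levels are held as strings such as '5w+'; the names `SUFFIX_TO_CATEGORY` and `translate_level` are hypothetical and are not taken from the study. Levels outside band 5 return no category, because the text does not state how they were translated.

```python
from typing import Optional

# Hypothetical lookup: band-5 letter suffix -> School A category,
# following the mapping stated in the paragraph above.
SUFFIX_TO_CATEGORY = {
    "b": "working towards",
    "b+": "working towards",
    "w": "expected",
    "w+": "expected",
    "s": "greater depth",
    "s+": "greater depth",
}


def translate_level(level: str) -> Optional[str]:
    """Map a School B band-5 level (e.g. '5w+') to a School A category.

    Returns None for any level outside band 5, or with an unknown suffix,
    since the source text does not describe those cases.
    """
    level = level.strip()
    if not level.startswith("5"):
        return None
    return SUFFIX_TO_CATEGORY.get(level[1:])


if __name__ == "__main__":
    for example in ["5b", "5w+", "5s", "4s+"]:
        print(example, "->", translate_level(example))
```

A lookup table is used here rather than chained conditionals so that the six stated pairings remain visible in one place.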

Table 116: History levels in participant sample

History level                              Number of participants
Working towards expected standard          6
Working at expected standard               58
Greater depth than expected standard       14

Appendix D: Consent package sent home to parents/guardians

D(i): Information sheet for parents

INFORMATION SHEET FOR PARTICIPANTS Monday 12th March 2018

REC Reference Number: LRS-16/17-4696

PLEASE RETAIN THIS INFORMATION SHEET IF YOU WOULD LIKE TO REFER BACK TO IT AT ANY TIME.

How can nonfiction and narrative nonfiction texts support learning in the history classroom?

I would like to invite you and your child to participate in this research project which forms part of my PhD research. You and your child should only participate if you want to; choosing not to take part will not disadvantage you, your child, or your child’s education, in any way. This project does not form part of your child’s curriculum or set education, but is completely voluntary. Before you decide whether you and your child want to take part, it is important for you to understand why the research is being done and what your and your child’s participation will involve. Please take the time to read the following information carefully and discuss it with others if you wish. If there is anything that is not clear or if you would like more information, please contact me using the contact details at the end of this information.

What is the purpose of this study?

I am interested in exploring how different types of books might affect learning about history in different ways. Although nonfiction books are often used to support the teaching of history, it is thought that narratives might have a positive effect on learning, for many reasons. ‘Narrative nonfiction’ texts are narratives which are entirely factual, and therefore appropriate to use in the history classroom. This study aims to explore the effects of both nonfiction and narrative nonfiction books on history learning, and whether either text is more effective in supporting learning.

Why have I been invited to take part?

I have given the Year 5 classes in your child’s school a talk about what my research involves and have answered any questions that the children had. Every child in each of these classes is eligible to take part and has received the same information package to take home to their parents/guardians. No children who wish to participate will be excluded. The Year 5 classes in another school will also be asked to participate.

Do I have to take part?

Participation is voluntary. You and your child do not have to take part. You should read this information sheet and if you have any questions you should ask the researcher (please find contact details at the end of this information). You should not agree to take part in this research until you have had all your questions answered satisfactorily. If you and your child choose not to take part, your child will receive the same teaching sessions as participants, but they will not be audio-recorded and no data will be collected.

What Will Your Child Be Asked to Do?

The main part of this research will take place over the course of 5 weeks. Your child will be asked to sign a consent form by the researcher before the study begins. In the first week, your child will be given an assessment on a specific history topic, to assess their knowledge of the topic. An intervention will take place in week 2. This intervention will involve a 30-minute history lesson on a specific topic. Following this lesson, a text relating to the history topic will be read to your child. Your child will then be asked, as part of a small group, to discuss some questions about the history topic. These discussions will be audio-recorded. The intervention should take approximately an hour in total. This sequence will be repeated once in week 3 and once more in week 4. Your child will be given a second assessment about the history topic in week 5. Approximately 9 weeks later, your child will be asked to sit the assessment one final time.

What Will You Be Asked to Do?

Firstly, you will be asked to discuss the study with your child and to ensure that they feel comfortable with the study. A simplified information sheet has been included in this package, which you may like to read with your child. If you feel that your child is prepared to participate, and your child has agreed, please sign the following consent form and answer the questions on the question sheet provided in this pack. You will need to return the consent form and question sheet to the school office or class teacher. There is an additional consent form in this pack for you to keep for your own reference. Once this has been done, no further action will be required from you.

What are the possible risks of taking part?

The risks involved in participating are minimal. There is a risk that your child may experience slight discomfort either at being recorded or studied. However, the study fits into your child’s daily classroom routine to ensure that they are comfortable. The history topics chosen are part of the National Curriculum, so your child will not be exposed to any information which they would not be expected to encounter during their daily school life. If I feel that your child is uncomfortable at all during the study, I will stop any audio-recording and discuss any problems with your child, before consulting their teacher. You may then be contacted by the school. A further risk is that if one type of text is less effective in supporting learning, some children might be disadvantaged by a poorer learning experience than other children. To counter this, when the study is complete, I will provide each class with a copy of the texts, so that all children have the opportunity to read all texts. No further data will be recorded at this stage.

What are the possible benefits of taking part?

When the project has been completed (September 2019), I will be more than happy to send you an email, detailing any findings, if you wish. If so, there is an opportunity to request one when completing the consent form. The findings of the study may suggest new ways in which your child can learn using different types of books, which can be put into practice both in school and at home.

What happens to the data collected, and how will we maintain your confidentiality and privacy?

The recording of your child’s discussion will be kept on my personal, password-protected home computer. All hard copies of the data will be kept in a locked box in my home. I will be using the data to explore how different text types affect learning about history, and will write up a report on my findings. I will discuss no personal details. I will also obtain some information about your child from the school, such as the academic levels which they are currently working at. This information will also be stored securely on my personal, password-protected home computer.

All data collected will remain completely confidential. To protect your privacy, your child and the school will be given a pseudonym, which will be used whenever I discuss the data and in the final report. I will have a master list detailing which pseudonym corresponds to which child. If you agree to participate, you reserve the right to withdraw at any point between signing the consent form and 1st September 2018, when the analysis of data will be completed. You do not have to give a reason for withdrawing. If you choose to withdraw, all of the data stored about you and your child will be deleted. A decision to withdraw at any time, or a decision not to take part, will not affect the standard of care you receive.

What will happen to the results of the study?

I will produce a final report summarising the main findings, which will form a part of my thesis. I also plan to disseminate the research findings through publication upon the completion of the project.

What if I have any questions about the project, or what if something goes wrong?

If you have any questions or require more information about this study, please contact me, Emma Browning, using the following contact details: [email protected]

It is up to you to decide whether to take part or not. If you do decide to take part, please keep this information sheet safe, so that you can consult it at any time.

If this study has harmed you in any way you can contact King's College London using the details below for further advice and information. The details provided below are of my PhD supervisor, who is a senior lecturer at King’s College London.

Jill Hohenstein
Address: Franklin-Wilkins Building, Stamford Street, London SE1 9NH
Email: [email protected]
Phone: 020 7848 3100

D(ii): Child-friendly information sheet

Information Sheet for children

How do children learn about history?

I want to explore how children learn about history when they are reading different types of books. To do this, I will need your help!

First, I will ask you to answer some history questions, to see how much you already know about the history topic that we will be learning about together – the First World War.

The next week, we will begin some history lessons. After a short history lesson, I am going to share some different history books with you. Some of you will look at one type of book, and others will look at another type of book.

These books will all be about the First World War.

I’m going to give you some questions about the history topic which you have been learning about, which I will ask you to discuss together in groups.

I will ask you to record these discussions. These recordings will only be sound recordings, not videos.

We will do this once a week, for 3 weeks.

Once we have completed 3 history lessons, I will ask you to answer some more history questions, to see what you have learnt.

Everything will be kept private.

Would you like to be one of these people?

If you say “yes” now, you can still change your mind later.

Nothing bad will happen to you if you say “no”.

Do you have any questions?

I can talk to you before you say “yes” or “no” if you want to ask any questions or talk more about what will happen.

You can speak to me in school, or your parents/guardians can contact me at: [email protected]

Participant’s name: ______

D(iii): Consent form

D(iv): Short questionnaire for parents

Please complete the following questions.

1. Is your child male or female?

Male Female

2. What is your child’s date of birth? _ _ // _ _ // _ _ _ _

3. To which ethnic group does your child belong?

White – British
White – non-British
Black – British
Black – non-British
Asian – British
Asian – non-British
Mixed – British
Mixed – non-British
Other – British
Other – non-British

4. What is the name of your child’s class/teacher?

______

5. Is English one of your child’s native languages?

Yes, it is their native language/it is one of their native languages
No, it is not my child’s native language
If it is not, please specify your child’s native language ______

6. Does your child have any problems or difficulties which affect their learning? Yes No

If so, please provide more detail of the difficulty and how it might affect your child’s learning.

______

7. On average, approximately how many hours does your child spend reading at home a week? This includes both reading alone and reading with others.

0 0-1 1-2 2-3 3-4 4+

Appendix E: Questionnaires attached to assessments

E(i): Questionnaire attached to pre-assessments

Name: ______ Class: ______

Q1. How many days a week do you usually read at home?

None
1–2 days a week
3–4 days a week
5–6 days a week
Every day

Q2. What type of books do you like to read the most?

Fiction (stories that are not true) Non-fiction (true information books)

I enjoy both fiction and non-fiction equally

Q3. What other hobbies do you enjoy at home? Tick your three favourite.

Watching TV Playing board games Playing outside

Playing computer games Listening to music Reading

Playing sports

Q4. On a scale of 1-5, how much do you enjoy learning about history? Please circle your answer.

1 = I really don’t like history
2 = I don’t like history
3 = I don’t mind history
4 = I like history
5 = I love history

Q5. Is World War One a topic that interests you?

Yes No

Q6. On a scale of 1-3, how much do you know about World War One? Please circle your answer.

1 = I don’t know anything
2 = I know a little
3 = I know a lot

Q7. If you circled 2 or 3, where/how did you learn about World War One? ______

E(ii): Questionnaire attached to post-assessments

Name: ______ Class: ______

Q1. On a scale of 1-5, how much do you enjoy learning about history? Please circle your answer.

1 = I really don’t like history
2 = I don’t like history
3 = I don’t mind history
4 = I like history
5 = I love history

Q2. Is World War One a topic that interests you?

Yes No

Q3. On a scale of 1-5, how much did you enjoy learning about WWI in our lessons together?

1 = I really didn’t enjoy them
2 = I didn’t enjoy them
3 = I didn’t mind them
4 = I enjoyed them
5 = I really enjoyed them

Q4. How would you describe the type of text that we read during the interventions?

______

Q5. On a scale of 1-3, how much do you know about World War One? Please circle your answer.

1 = I don’t know anything
2 = I know a little
3 = I know a lot

Q6. If you circled 2 or 3, where/how did you learn about World War One?

______

Appendix F: Written assessment

The First World War

Q1. When did World War One begin?

______

Q2. Which events caused World War One to begin? Give as many details as you can.

______

Q3. List the main countries on each side at the beginning of World War One.

The Allies          The Central Powers
1 - Britain         1 - Germany
2 - ______          2 - ______
3 - ______
4 - ______

Q4. Match the names to the description of the people:

Herbert Asquith Serbian assassin

Archduke Franz Ferdinand Heir to Austro-Hungarian throne

Kaiser Wilhelm II British Prime Minister

Gavrilo Princip Leader of Germany

Q5. Who were the Black Hand and what was their mission?

______

Q6. What caused Britain to become involved in World War One?

______

Q7. What were the 4 different types of trenches called? Name as many as possible.

1 – ______ 2 – ______ 3 – ______ 4 – ______

Q8. Trenches were designed to be deep, narrow and twisted. Give two reasons explaining why they were designed like this.

1 – ______2 – ______

Q9. What infection did soldiers often suffer from in the trenches? Can you describe how they caught it and how they tried to prevent it?

______

Q10. Order these events which occurred during the Battle of the Somme, from the first event (1) to the last event (4).

German soldiers climbed out of their dugouts to prepare for battle.
British and French soldiers ‘go over the top’.
German soldiers attacked British and French soldiers.
British artillery bombardment of German trenches.

Q11. How many casualties were there on the first day of the Somme?

______

Q12. What caused so many casualties on the first day of the Somme?

______

Q13. Name 3 differences/changes between life before and during the war for people living on the Home Front?

1- ______ 2- ______ 3- ______

Q14. Why did more women begin working in munitions factories after World War One began?

______

Q15. Describe the factory jobs which women did during the war, and the effect that these jobs had on the women doing them.

______

Q16. What was conscription?

______

Q17. What effect did conscription have on families during the war?

______

Q18. What was school life like for children during the war?

______

Q19. Order these events which occurred during World War One, from the first event (1) to the last event (6). Add the year in which these events happened if you can (for example, 1 9 1 4).

The Battle of the Somme began. ______
Conscription was introduced in Britain. ______
Archduke Franz Ferdinand assassinated. ______
The Battle of the Somme ended. ______
Britain declared war on Germany. ______
The Defence of the Realm Act was introduced in Britain. ______

Q20. Match the different groups of people to their wartime experiences. Each group of people can be matched to two different experiences.

Soldiers Children Women Country leaders

Worked with dangerous chemicals such as ballistite and TNT.

Stayed in small villages in France before moving to the trenches.

Were often sent to wait in long queues outside shops to buy food for their families.

Were given tough punishments for misbehaviour, such as having to write out the same sentence 100 times.

Had to be on sentry duty during the night.

Made decisions about their countries’ actions during the war.

Worked long hours from 7 o’clock in the morning to 7 o’clock in the evening.

Introduced rules for citizens during the war.

Appendix G: Annotated written assessment

The First World War

War Begins Questions (Questions 1 - 6)

Q1. When did World War One begin? [2 points] Chronological

______

Q2. Which events caused World War One to begin? Give as many details as you can. [4 points] Causal

______

Q3. List the main countries on each side at the beginning of World War One. [4 points] Simple conceptual

The Allies          The Central Powers
1 - Britain         1 - Germany
2 - ______          2 - ______
3 - ______
4 - ______

Q4. Match the names to the description of the people: [2 points] Simple conceptual

Herbert Asquith Serbian assassin

Archduke Franz Ferdinand Heir to Austro-Hungarian throne

Kaiser Wilhelm II British Prime Minister

Gavrilo Princip Leader of Germany

Q5. Who were the Black Hand and what was their mission? [3 points] Complex conceptual

______

Q6. What caused Britain to become involved in World War One? [3 points] Causal

______

Trench Life Questions (Questions 7 - 12)

Q7. What were the 4 different types of trenches called? Name as many as possible. [4 points] Simple conceptual

1 – ______ 2 – ______ 3 – ______ 4 – ______

Q8. Trenches were designed to be deep, narrow and twisted. Give two reasons explaining why they were designed like this. [2 points] Causal

1 – ______ 2 – ______

Q9. What infection did soldiers often suffer from in the trenches? Can you describe how they caught it and how they tried to prevent it? [3 points] Complex conceptual

______

Q10. Order these events which occurred during the Battle of the Somme, from the first event (1) to the last event (4). [2 points] Chronological

German soldiers climbed out of their dugouts to prepare for battle.
British and French soldiers ‘go over the top’.
German soldiers attacked British and French soldiers.
British artillery bombardment of German trenches.

Q11. How many casualties were there on the first day of the Somme? [2 points] Simple conceptual

______

Q12. What caused so many casualties on the first day of the Somme? [2 points] Causal

______

The Home Front Questions (Questions 13 - 18)

Q13. Name 3 differences/changes between life before and during the war for people living on the Home Front? [3 points] Complex conceptual

1- ______ 2- ______ 3- ______

Q14. Why did more women begin working in munitions factories after World War One began? [3 points] Causal

______

Q15. Describe the factory jobs which women did during the war, and the effect that these jobs had on the women doing them. [2 points] Complex conceptual

______

Q16. What was conscription? [2 points] Simple conceptual

______

Q17. What effect did conscription have on families during the war? [2 points] Causal

______

Q18. What was school life like for children during the war? [4 points] Complex conceptual

______

Questions spanning all three interventions (Questions 19 - 20)

Q19. Order these events which occurred during World War One, from the first event (1) to the last event (6). Add the year in which these events happened if you can (for example, 1 9 1 4). [5 points] Chronological

The Battle of the Somme began. ______
Conscription was introduced in Britain. ______
Archduke Franz Ferdinand assassinated. ______
The Battle of the Somme ended. ______
Britain declared war on Germany. ______
The Defence of the Realm Act was introduced in Britain. ______

Q20. Match the different groups of people to their wartime experiences. Each group of people can be matched to two different experiences. [4 points] Simple conceptual

Soldiers Children Women Country leaders

Worked with dangerous chemicals such as ballistite and TNT.

Stayed in small villages in France before moving to the trenches.

Were often sent to wait in long queues outside shops to buy food for their families.

Were given tough punishments for misbehaviour, such as having to write out the same sentence 100 times.

Made decisions about their countries’ actions during the war.

Had to be on sentry duty during the night.

Worked long hours from 7 o’clock in the morning to 7 o’clock in the evening.

Introduced rules for citizens during the war.

Appendix H: Mark scheme for written assessment

Question 1: When did World War One begin?
Possible points: 2
Answer: 1914
Additional guidance: Give 2 marks if exact year given (1914). Give 1 mark if within 4 years of exact date (1910-1918).

Question 2: Which events caused World War One to begin? Give as many details as you can.
Possible points: 4
Answer:
- Archduke Franz Ferdinand assassinated by the Black Hand
- Austro-Hungarian Empire declared war on Serbia
- Germany supported Austro-Hungarian Empire/Russia and France support Serbia
- Germany invaded Belgium to get to France
- Britain declared war on Germany to protect Belgium
Additional guidance: Give 1 mark for reference to any of the points listed in the answer, up to 4 marks in total. If information is appropriate but in a confused order, deduct a mark.

Question 3: List the main countries on each side at the beginning of World War One.
Possible points: 4
Answer:
Allies:
- Serbia
- France
- Russia
- Belgium
- America
Central Powers:
- Austro-Hungarian Empire
Additional guidance: Give one mark for reference to any of the countries listed, up to 4 marks. Although America not discussed in text, accept as appropriate understanding.

Question 4: Match the names to the description of the people.
Possible points: 2
Answer:
- Herbert Asquith – British Prime Minister
- Archduke Franz Ferdinand – Heir to Austro-Hungarian throne
- Kaiser Wilhelm II – Leader of Germany
- Gavrilo Princip – Serbian assassin
Additional guidance: Give 2 marks for all 4 matched correctly. Give 1 mark for 1 or 2 matched correctly. Give 0 marks for all incorrect.

Question 5: Who were the Black Hand and what was their mission?
Possible points: 3
Answer:
- A group of assassins
- The group were Serbian
- Mission to kill Archduke Franz Ferdinand
Additional guidance: Give 1 mark for reference to each of the points.

Question 6: What caused Britain to become involved in World War One?
Possible points: 3
Answer:
- Germany invaded Belgium
- Britain had promised to protect Belgium (under the Treaty of London 1839)
- Britain gave Germany an ultimatum
- Britain declared war when Germany did not withdraw their troops from Belgium/did not respond to their ultimatum
Additional guidance: Give 1 mark for reference to each of the points, up to 3 marks. Information in brackets is optional – give marks for the main information. Do not give mark for stating that English were allies with France as this was not a direct cause for Britain to become involved.

Question 7: What were the 4 different types of trenches called? Name as many as possible.
Possible points: 4
Answer:
- Front Line
- Reserve
- Support
- Communication
Additional guidance: Give 1 mark for each of the trenches listed. Accept various spellings as long as the word is recognisable. Accept ‘contact trench’ for communication trench, due to the semantic similarity.

Question 8: Trenches were designed to be deep, narrow and twisted. Give two reasons explaining why they were designed like this.
Possible points: 2
Answer:
- Deep to protect soldiers (from enemy snipers)
- Twisted to prevent enemies from rushing through
Additional guidance: Give 1 mark for each of the following points. Information in brackets is optional – give marks for the main information.

Question 9: What infection did soldiers often suffer from in the trenches? Can you describe how they caught it and how they tried to prevent it?
Possible points: 3
Answer:
Infection: Trench foot
Causes: Cold conditions/wet conditions
Prevention: Given (whale) oil (to rub on feet)
Additional guidance: Give 1 mark for naming the infection, 1 mark for giving a cause, and one mark for naming the prevention. Information in brackets is optional – give marks for the main information.

Question 10: Order these events which occurred during the Battle of the Somme, from the first event (1) to the last event (4).
Possible points: 2
Answer:
1- British artillery bombardment of German trenches
2- British and French soldiers ‘go over the top’
3- German soldiers climbed out of their dugouts to prepare for battle
4- German soldiers attacked British and French soldiers
Additional guidance: Give 2 marks for all 4 in the correct order. Give 1 mark for 2-3 in the correct order. Give 0 marks for 1 in the correct place/none in the right order.

Question 11: How many casualties were there on the first day of the Somme?
Possible points: 2
Answer: 20,000
Additional guidance: Give 2 marks for correct answer. Give 1 mark for answer with an extra zero/missing a zero (2000 or 200,000).

Question 12: What caused so many casualties on the first day of the Somme?
Possible points: 2
Answer:
- (British/French artillery bombardment was not effective) because the shells did not explode
- Germans ready in dugouts to attack
Additional guidance: Give 1 mark for each of the points listed. Information in brackets is optional – give marks for the main information.

Question 13: Name 3 differences/changes between life before and during the war for people living on the Home Front?
Possible points: 3
Answer:
- More women were working
- No bonfires/flying kites
- No giving food to chickens/horses
- No gossiping about the army in public places
- Men (aged 18-41) had to join the army
Additional guidance: Give a mark for any relevant changes, up to 3 marks. Examples from the text are listed in the answer. Do not award marks for vague answers such as ‘bombs’.

Question 14: Why did more women begin working in munitions factories after World War One began?
Possible points: 3
Answer:
- Most men had gone to fight (conscription)
- Munitions/weapons/ammunition needed to be made to be sent to fighters (in France)
- Because they wanted to help in the war effort
Additional guidance: Give 1 mark for each relevant point listed, up to 3 marks. Information in brackets is optional – give marks for the main information.

Question 15: Describe the factory jobs which women did during the war, and the effect that these jobs had on the women doing them.
Possible points: 2
Answer:
Jobs: weighing ballistite/packing TNT/making ammunition
Effect: TNT turned women’s hands/faces yellow
Additional guidance: Give 1 mark for any of the jobs listed. Give 1 further mark when any relevant effect is given.

Question 16: What was conscription?
Possible points: 2
Answer:
- A law where men aged 18-41 had to go to war/join the army/fight in France
Additional guidance: Give 1 mark for giving the fact that men had to fight. Give 1 additional mark if specified the age of men.

Question 17: What effect did conscription have on families during the war?
Possible points: 2
Answer:
- Fathers had to go to war, leaving families at home without them
- Fewer men to work on the Home Front/in Britain, so women took over jobs
Additional guidance: Give 1 mark for the optional points given.

Question 18: What was school life like for children during the war?
Possible points: 2
Answer:
- Cold classrooms
- Still sat exams
- Some schools closed to form temporary hospitals for soldiers
- The children sat at long wooden desks
- Children copied from the board into their books to learn
- Children had to write lines if misbehaving
Additional guidance: Give 1 mark for any of the points listed, up to a total of 2 marks.

Question 19: Order these events which occurred during World War One, from the first event (1) to the last event (6). Add the year (1 9 1 4) in which these events happened if you can.
Possible points: 5
Answer:
1 – Archduke Franz Ferdinand assassinated – 1914
2 – Britain declared war on Germany – 1914
3 – The Defence of the Realm Act was introduced in Britain – 1914
4 – Conscription was introduced in Britain – 1916
5 – The Battle of the Somme began – 1916
6 – The Battle of the Somme ended – 1916
Additional guidance: Give 3 marks for all answers ordered correctly. Give 2 marks if 4 to 5 ordered correctly. Give 1 mark if 2 to 3 ordered correctly. Give 0 marks if no answers ordered correctly or only 1 is in the correct position. Give a further 2 marks for all dates correct. Give a further 1 mark if 2 or more correct dates are given. Give no further marks if 0 or 1 correct dates are given.

Question 20: Match the different groups of people to their wartime experiences. Each group of people can be matched to two different experiences.
Possible points: 4
Answer:
Soldiers: Had to be on sentry duty during the night / Stayed in small villages in France before moving to the trenches
Children: Were given tough punishments for misbehaviour, such as having to write out the same sentence 100 times / Were often sent to wait in long queues outside shops to buy food for their families
Women: Worked with dangerous chemicals such as ballistite and TNT / Worked long hours from 7 o’clock in the morning to 7 o’clock in the evening
Country leaders: Made decisions about their countries’ actions during the war / Introduced rules for citizens during the war
Additional guidance: 4 marks if all 8 experiences matched correctly. 3 marks if 6 to 7 experiences matched correctly. 2 marks if 4 to 5 experiences matched correctly. 1 mark if 2 to 3 experiences matched correctly. No marks if 0 to 1 experiences matched correctly.
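The banded marking rules for Questions 19 and 20 can be expressed as a short sketch. The Python below is illustrative only and is not part of the original mark scheme. The function names (`score_q19`, `score_q20`) and their count-based inputs are assumptions. It also assumes that "all dates correct" means all six years, and that ordering is judged by the number of events placed in the correct position.

```python
def score_q19(positions_correct: int, dates_correct: int) -> int:
    """Hypothetical scorer for Question 19 (maximum 5 marks).

    positions_correct: number of the 6 events placed in the correct position.
    dates_correct: number of the 6 events given the correct year.
    """
    # Ordering band: up to 3 marks.
    if positions_correct == 6:
        order_marks = 3
    elif positions_correct >= 4:
        order_marks = 2
    elif positions_correct >= 2:
        order_marks = 1
    else:
        order_marks = 0

    # Date band: up to 2 further marks (assumes "all dates" means all 6).
    if dates_correct == 6:
        date_marks = 2
    elif dates_correct >= 2:
        date_marks = 1
    else:
        date_marks = 0

    return order_marks + date_marks


def score_q20(matches_correct: int) -> int:
    """Hypothetical scorer for Question 20 (maximum 4 marks).

    matches_correct: number of the 8 experiences matched to the right group.
    """
    if matches_correct == 8:
        return 4
    if matches_correct >= 6:
        return 3
    if matches_correct >= 4:
        return 2
    if matches_correct >= 2:
        return 1
    return 0
```

Under this reading, a pupil who places four events in the correct position and gives three correct years would score 2 + 1 = 3 of the 5 available marks on Question 19.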

Appendix I: Prior knowledge activities

I(i) Intervention One (War Begins) prior knowledge activities

Activity 1: Participants asked to place country labels in correct place on a map of Europe in 1914.

Activity 2: Participants notice that colours of country names placed on map represent two sides. Discuss meaning of ‘allies’ and ‘alliance’.

Activity 3: Participants read information about each historical figure, before filling in which country they are associated with.

I(ii) Intervention Two (Trench Life) prior knowledge activities

Activity 1: Participants asked to place labels in correct place on illustration of a trench.

Activity 2: Participants asked to place labels in correct place on bird’s eye view illustration of trench networks.

Activity 3: Participants shown an image of an artillery gun. Participants asked to discuss in groups what it is, before researcher explains what it is to the class. Discuss the meaning of the word ‘casualties’.

I(iii) Intervention Three (The Home Front) prior knowledge activities

Activity 1a: Participants asked to sort 8 photographs under two given headings: The Home Front and the Western Front.

Activity 1b: Labelled map of Europe (in 1914) shown to participants to introduce the concepts of ‘The Home Front’ and the ‘Western Front’.

Activity 2: Participants look at four Home Front pictures and discuss where they might have been taken.

Appendix J: Example script for War Begins intervention session

Black italics = to be read by researcher. Blue bold = instructions for researcher.

INTRODUCTION [5 minutes]

Settle class (they should already be organised into their groups by class teacher).

We have got a lot to do in a short space of time today, so everybody needs to be on their absolute best behaviour, with their best listening skills. If I say 3, 2, 1, it means that I want all eyes this way and listening by 1.

Today, we are going to begin learning about the First World War, and we are going to be learning about how and why the First World War began.

Firstly, we are going to do some short geography tasks, in small groups, to help us to understand which countries and people were involved in WWI.

Then, we are going to read a text together about how WWI began.

After this, I’m going to give you some questions for you to discuss in your groups – you will be recording your discussions about these questions on an iPad. Unfortunately, I will not be able to discuss your learning with you today, but I am looking forward to listening back to all of your fantastic ideas that you record during your discussions, and talking to you about your learning after we have finished all of our sessions together.

TASKS [15-20 minutes]

TASK 1

Open envelope 1, which you will find inside your group’s folder. Inside, you will have a map of Europe in 1914. It looks a little different to maps of Europe today. You will also have some arrows with countries on. These countries are some of the important countries involved in World War I. I would like you to place the arrows on the map, pointing to the correct country. You have 3 minutes.

Give children 3 minutes to work. Find relevant page on PowerPoint and stop children to look at answers. Bring up one arrow at a time on PowerPoint. Note that Belgium is between Germany and France. Encourage children to correct their own maps where necessary. Ask children the following questions and provide the following pieces of information.

Q1. Are there any places that you haven’t heard of before? - The Austro-Hungarian Empire was made up of many countries that we might recognise now, such as Austria, Hungary, Slovakia, Slovenia, and many others. The Austro-Hungarian Empire had power over these countries in 1914.

Q2. What do you notice about the arrows? - The colour of each arrow shows whether countries were with the Allies or the Central Powers. Making an ‘alliance’, or being ‘allies’, is like being friends: it means that countries support each other. The red arrows show the Central Powers, and the yellow arrows show The Allies.

TASK 2

Now we’re going to look at some people who were important at the beginning of World War I, and which countries they came from. In your folder, you have 4 cards with different people on. Read about these people. Which country is each person from? Write this on the cards, using the board pen in your pack.

Give children 2 minutes to work. Find relevant page on PowerPoint and stop children to look at answers together.

READING OF TEXT [10 MINUTES]

Now we’re going to read the text together. You have some copies of the text in your folder if you want to follow along while I read.

Read text.

QUESTIONS [20 MINUTES]

Send children who are not participating out with teacher/LSA. Provide teacher/LSA with a copy of discussion questions to ask children. Give out iPads to remaining groups. Using PowerPoint slides, demonstrate how to open and work the audio-recording app. Give children a minute to open the app and begin recording.

We are going to record your discussions on this app. Unfortunately, we won’t have time to listen to all of your answers today, but I am going to listen to all of them later today.

Remember, you can look back at and use the text to help you answer your questions. Work as a team, and make sure that everyone gets a chance to speak.

Share first question on PowerPoint. Give children 5 minutes for this task. After five minutes, share second question. While children are answering, take photographs of how they ordered the cards for the first question. Continue through all questions until complete. Once all questions are complete, share with children how to save their audio file with the correct name. Instruct children to tidy up folders ready for the next groups (e.g. wipe cards clean, remove arrows from maps). Finish laptop recording and replace texts with alternative versions ready for the next group.

After both interventions are complete, transfer audio files and delete from iPads.

Appendix K: Narrative nonfiction and expository texts designed for interventions

K(i): Intervention 1 (War Begins) NNF text

War Begins

The winter sun glared brightly down on London one day in the early months of 1914. Despite the bitter cold, the city of London and its shipyards were busy and bustling; people rushed to and fro and the clank of metal echoed around the shipyards. Britain was a rich and powerful country. To make Britain even more powerful, Britain’s Prime Minister, Herbert Asquith, had made allies with Russia and France. Not only this, but the British Royal Navy also owned the biggest, fastest, most deadly warship around - the HMS Dreadnought. However, other countries were growing stronger too… Kaiser Wilhelm II, the leader of Germany, was envious of the British Royal Navy’s powerful warships. So that Germany could become more powerful, like their enemies, he ordered for many of his own powerful warships to be built in Germany. Germany also grew stronger by making allies with the Austro-Hungarian Empire.

Elsewhere, the skies were crystal clear on the morning of the 28th June, 1914, as crowds lined the streets of the city of Sarajevo, the capital of Bosnia. Austro-Hungarian flags hung from windows and balconies, fluttering gently in the light summer breeze. Children scrambled through the crowds, peeking through adults’ legs, whilst the adults stood tall, peering past each other. The crowds were hoping to catch a glimpse of an important visitor to the city: Archduke Franz Ferdinand. The crowds cheered loudly for the Archduke as he and his wife paraded slowly down the street in an open-top car. This was the man who was next in line to the Austro-Hungarian throne. He would soon become the Emperor of the Austro-Hungarian Empire, ruling over tens of millions of people. Unfortunately, not everybody gathered in the crowd liked the Archduke… A group of seven Serbian men stood together, watching and waiting, as the Archduke moved closer, and closer. They called themselves the ‘Black Hand’, and they were on a special mission. They did not want the Austro-Hungarian Empire to take over their country, Serbia, and to rule over them! Suddenly, one of the Serbian men drew his hand out of his pocket, brought it over his head, and sent a grenade soaring through the air towards the Archduke’s car. Luckily for the Archduke, the grenade skimmed off the back of his car, bouncing towards the car behind. BANG! The explosion missed the Archduke, but caused injuries in the car behind. Confused, the crowd did not know what the loud bang was. Was it danger, or another cannon firing to celebrate the Archduke? The members of the Black Hand cautiously left the scene of the failed crime, whilst the Archduke was whisked away to a safer place. Once he had reached safety, it was decided that the final part of the route had to be changed, to avoid any further danger. Unfortunately, no one told the driver about this change to the route…

Strolling through the crowds, one of the members of the Black Hand stayed in the area to continue the mission. Thinking that perhaps the Archduke might continue the route as planned, Gavrilo Princip joined the crowds in a different part of the city. He waited. And waited. Until at last, he saw the Archduke’s car turn down the street, and start to crawl slowly towards him. Just as the car reached Princip, one of the men in the car realised that the driver had made a mistake. Urgently, he shouted at the driver to turn around, because the route had been changed. The driver stopped the car, directly in front of Princip. This was Princip’s opportunity. He raised his gun, aimed carefully, and fired. Two shots rang out, one hitting the Archduke, and one hitting the Archduke’s wife, Sophie. Mission accomplished.

The Austro-Hungarian Empire were not happy. They wanted revenge for the death of Archduke Franz Ferdinand. Exactly a month after the death of the Archduke, on the 28th July, they declared war against Gavrilo Princip’s home country, Serbia. Unfortunately, this put other countries in a difficult position. Because Russia and France were allies with Serbia, the two countries both promised to support Serbia. So Russia, France and Serbia were all on the same side, fighting against the Austro-Hungarian Empire. However, because Germany were allies with the Austro-Hungarian Empire, Germany promised to help the Austro-Hungarians. Because of this, Germany declared war on the Austro-Hungarian Empire’s two new enemies, Russia and France.

Kaiser Wilhelm II now had a problem. Looking at a map of the world, his new enemies, Russia, were to the North-East. Yet France were in the opposite direction. Two different wars to fight, in two different directions. Luckily for Kaiser Wilhelm II, he was quick to think up a solution. The Russian army was huge, so it would take a while for Russia to prepare their army to fight. Because of this, the German army would march to France first, by the quickest possible route. Tracing a finger across the map, this route took the German army straight through Belgium… To take this route, the German army would have to successfully invade innocent Belgium on their way to fight in France! Once they had conquered Belgium and France, the German army would return home, ready to fight Russia.

In a flash, Germany’s army were made ready to march to Belgium. That very same day, on the 3rd of August, Kaiser Wilhelm II’s German army began their invasion of Belgium. Unfortunately for Kaiser Wilhelm II, there was a slight problem with his plan. Seventy-five years before, Britain had promised to defend Belgium under the Treaty of London 1839. Kaiser Wilhelm II never thought that Britain would still care about something promised so long ago! But he was wrong. Britain cared very much, and kept their promise. On the same day that the German troops invaded Belgium, a message arrived from Britain for Kaiser Wilhelm II. An ultimatum. It was signed by Herbert Asquith, the British Prime Minister. It read: ‘Withdraw your troops from Belgium by midnight, or Britain will act’. Kaiser Wilhelm II read the message. He did not respond. His army stayed in Belgium.

Back in London, Herbert Asquith waited for a reply, as the clock ticked closer and closer towards midnight. Still no reply. On the stroke of midnight, he had still not heard from Kaiser Wilhelm II. This could only mean one thing. On the morning of 4th August, Britain declared war on Germany…

K(ii): Intervention 1 (War Begins) ET

War Begins

Britain before the war.

In the early months of 1914, the city of London was a busy place. Some of its busiest sites were the shipyards. These were full of workers and activity. Britain was a rich and powerful country. To make Britain even more powerful, Britain’s Prime Minister, Herbert Asquith, had made allies with two other countries, Russia and France. In addition to this, the British Royal Navy owned the biggest, fastest, most deadly warship around - the HMS Dreadnought. However, other countries were also gaining in strength. Kaiser Wilhelm II, the leader of Germany, was well aware of the British Royal Navy’s powerful warships. So that Germany could become more powerful and grow stronger, like their enemies, he ordered for many of his own powerful warships to be built in Germany. In addition, Germany also grew stronger by making allies with the Austro-Hungarian Empire.

The assassination of the Archduke. On the morning of the 28th June, in 1914, crowds of people lined the streets of the city of Sarajevo, which is the capital of Bosnia. People living in the city had hung Austro-Hungarian flags out of the windows and balconies of their houses. The crowds contained both children and adults. These crowds of people were there to see an important visitor to the city: Archduke Franz Ferdinand. People there on the day said that the crowds were cheering loudly for the Archduke. He and his wife paraded down the street in an open-top car. This was the man who was next in line to the Austro-Hungarian throne. He was going to become the next Emperor of the Austro-Hungarian Empire, ruling over tens of millions of people. However, not everybody gathered in the crowd on that day liked the Archduke. A group of seven Serbian men were together, waiting for the Archduke to move closer towards them. They called themselves the ‘Black Hand’, and they were there on a special mission. They did not want the Austro-Hungarian Empire to take over their country, Serbia, and to rule over them. One of the Serbian men had a grenade in his pocket, which he threw at the Archduke’s car. The grenade bounced off the Archduke’s car and moved towards the car behind it. The grenade then exploded. The explosion missed the Archduke, but caused injuries in the car behind. The crowds were confused because they were not certain whether the bang was a sign of danger, or another cannon being fired to celebrate the Archduke. The members of the Black Hand left the scene of the failed crime. The Archduke was taken away to a safer place. Once he had reached safety, it was decided that the final part of the route had to be changed, to avoid any further danger. However, no one had told the driver that they had decided to change the route.

One of the members of the Black Hand stayed in the area to continue the mission. In case the Archduke might continue the route as planned, Gavrilo Princip went to join the crowds in a different part of the city. He waited there until he saw the Archduke’s car turn down the street. The car started driving towards where he was standing. However, when the car got closer to Princip, one of the men in the car realised that the driver had made a mistake. He told the driver to turn around because the route had been changed. The driver stopped the car, directly in front of Princip. This gave Princip the opportunity to raise his gun and fire. He took two shots, one hitting the Archduke, and one hitting the Archduke’s wife, Sophie. He had accomplished the mission.

Declaring war. The Austro-Hungarian Empire wanted to get revenge for the death of Archduke Franz Ferdinand. Exactly a month after the death of the Archduke, on the 28th July, they declared war against Gavrilo Princip’s home country, Serbia. This put other countries in a difficult position. Because Russia and France were allies with Serbia, the two countries both promised to support Serbia. So Russia, France and Serbia were all on the same side, fighting against the Austro-Hungarian Empire. However, because Germany were allies with the Austro-Hungarian Empire, Germany promised to help the Austro-Hungarians. Because of this, Germany declared war on the Austro-Hungarian Empire’s two new enemies, Russia and France.

Kaiser Wilhelm II now had a problem. His new enemies, Russia, were to the North-East. Yet France were in the opposite direction on the map. He had two different wars to fight, and these wars were in two different directions. Kaiser Wilhelm II was quick to come up with a solution. The Russian army was large, so it would take some time for Russia to prepare their army to fight. Because of this, the German army marched to France first, by the quickest possible route. This route took the German army marching straight through Belgium. This meant that the German army had to invade Belgium, who were not involved in the war, on the way to fight their war with France. Once they had conquered Belgium and France, the German army would return home, ready to fight the Russian army.

Germany’s army were quickly made ready to march to Belgium. On the same day, the 3rd of August, Kaiser Wilhelm II’s German army began their invasion of Belgium. However, there was a problem with Kaiser Wilhelm II’s plan. Seventy-five years before, Britain had promised to defend Belgium under the Treaty of London 1839. It is believed that Kaiser Wilhelm II ignored the possibility that Britain would still care about something promised so long ago. However, he was wrong about this. Britain had decided to keep to their promise. On the same day that the German troops invaded Belgium, a message arrived from Britain for Kaiser Wilhelm II. It was an ultimatum, that was signed by Herbert Asquith, the British Prime Minister. It read: ‘Withdraw your troops from Belgium by midnight, or Britain will act’. Kaiser Wilhelm II did not respond to this message, and his army stayed in Belgium.

Herbert Asquith waited for a reply in London as it got closer to midnight. He still had not got a reply from Kaiser Wilhelm II by midnight. This meant that on the morning of the 4th August, Britain decided to declare war on Germany.

K(iii): Intervention 2 (Trench Life) NNF text

Life in the Trenches

It was the bitterly cold winter of 1915. Harry Stinton and his Division of the British army were staying in a small village in France before they moved into the nearby trenches. The distant rumble of gunfire from the trenches reached the ears of the soldiers. This low sound, like a dog’s growl, was a reminder to the soldiers that they were travelling to the Front Line. The sun rose the following morning, casting a warming light over the soldiers. By 10 o’clock, they had begun their march towards the Front Line. All day they marched tiredly onwards. Along the journey, the soldiers noticed small changes which suggested that they were getting closer and closer to the Front Line. French civilians gradually became more scarce, whilst other soldiers became more common. For soldiers who looked closely, English and French guns could be noticed here and there, pointing up at the sky. Cleverly camouflaged, these guns were almost invisible to enemy planes. The rumble of gunshots grew louder, sounding angrier. As the sun began to set, slowly sinking from the sky, Harry’s division entered the nearest village. The village was in ruins, destroyed by the war waging around it. No lights were allowed. Strict orders were given for soldiers not to wander about. Instead, the soldiers hid themselves in the village. Once darkness had fallen over the soldiers, it was time for them to move on. Under the cover of darkness, the soldiers marched to the nearest reserve trenches. From here, they walked through the maze of reserve trenches, before winding their way through the communication trenches and support trenches, until they finally reached the Front Line. This was the most dangerous trench of all.

The following night, Harry stood on the firing step in the trench, peering out over the eerie darkness of no-man’s land, looking for danger… This was Harry’s first ever sentry duty. The edge of the trench, in front of his eyes, was ragged from shell-fire. Harry had to peek through the twisted tangle of barbed wire, which was all that separated him and the trench from no-man’s land. His feet and legs were shivering from cold; they had been soaked with the wet mud of the trench floors on his walk to the firing step. Many soldiers that Harry knew suffered from trench foot, a painful infection which caused soldiers’ feet to swell and hurt because of the cold, wet conditions. Doctors gave soldiers whale oil to rub on their feet, to try to prevent trench foot. Luckily, nights in the trenches were usually quiet, with very little gunfire. Suddenly, a small white light – a ‘Very Light’ – flew up into the air, briefly illuminating no-man’s land. Soldiers sent these up to check whether any enemy soldiers were prowling about in no-man’s land. However, no-man’s land was barren, with only a few dark shapes scattered across the ground: the dead from previous days of fighting. Harry focused his eyes to detect any movement out in no-man’s land. Everything was still. If a single movement was detected, he had been ordered to shoot immediately, without question.

At long last, Harry’s hour of sentry duty was finally done. A bleary-eyed soldier appeared, ready to take the next sentry shift. Harry stumbled along the narrow, twisting path of the trench. Although trenches were designed like this to make it difficult for enemies to rush through them, it also made it difficult for soldiers to find their way through in the darkness of the night. He kept his head low: even though trenches were tall enough to protect men of an average height, Harry still knew of many who had been killed by deadly enemy snipers in the trenches. All because they were taller than the trenches were. Harry felt along the wall of the trench, quickly finding what he was looking for - an old blanket. He pulled this back to reveal a dugout. A flickering candle cast dancing shadows on the wall. Harry carefully stepped over the four men already sleeping, before finding a small gap in which he could rest. Then, he found as comfortable a position as possible to sleep in for the brief four hours before he would be called for sentry duty again. Using his pack as a pillow, he settled down.

The following summer, on the 1st of July 1916, about 100km away from where Harry had fallen asleep after his first sentry duty, mist settled over no-man’s land. Rows of soldiers sat in the Front Line trenches in the French village of Somme, their faces tense with anxiety as they waited for battle. They were preparing to “go over the top”. Ladders leaned against the trench walls, stretching up to the top of the trenches. The soldiers would soon climb up, scramble out of the trenches, and cross no-man’s land to attack the German Front Line. Their commanders had told them that they would be fine, because for the last five days, an artillery bombardment had rained down on the German Front Line. There shouldn’t be many Germans left alive to fight. The deafening sound of the artillery bombardment could still be heard that morning. Sharing final cigarettes and cracking jokes, the soldiers’ mouths smiled, but their eyes were full of fear. At 7:30am, the loud roar of the artillery guns stopped. A strange silence settled. It was time. The soldiers rose from their positions, grasping their guns, and began climbing up the ladders, into no-man’s land…

Little did the British and French soldiers know, the Germans were ready and waiting. Many of the artillery shells did not explode when they fell on the Germans. Because so many shells were needed so quickly, they had been made in a rush, and they had not been made very well. Prepared for an attack, the Germans had hidden in their dugouts. As soon as silence hit that morning when the artillery bombardment stopped, the Germans clambered out of the dugouts. They prepared their guns for the British and French soldiers. Sitting, waiting patiently for the British and French soldiers to approach, they were ready to attack… On the first day of the Battle of the Somme alone, the British Army suffered 20,000 casualties. The battle didn’t end until the 18th November 1916.

K(iv): Intervention 2 (Trench Life) ET

Life in the Trenches

Travelling to the trenches. Divisions of the British army would stay in small villages in France before they moved into the nearby trenches. The distant sound of gunfire from the trenches usually reached the ears of these soldiers. This sound was a reminder to the soldiers that they were soon going to be travelling to the Front Line. In the morning, the soldiers would wake up and begin their march towards the Front Line. They marched all day long, which was often quite tiring work. There were small changes that could be spotted along the journey which suggested that the soldiers were getting closer to the Front Line. French civilians gradually became more scarce and other soldiers became more common. There were also English and French guns pointing up at the sky in some places, which had been cleverly camouflaged to hide them from the view of the enemy planes. The sound of gunshots would also grow increasingly louder. When the sun began to set, the division would enter the village nearest to them. These villages were occasionally just ruins, which had been destroyed by the war which was waging around them. No lights were allowed, and strict orders were given for soldiers not to wander about. Instead, the soldiers would hide themselves somewhere in these villages. Once it was dark, it was time for the soldiers to move on. While it was still dark, the soldiers would march on to the nearest reserve trenches. From here, they continued to walk through the reserve trenches, then through the communication trenches and finally through the support trenches, until they reached the Front Line. This was the most dangerous trench of all.

Life in the trenches. During the night, soldiers stood on the firing step in the trenches to watch out over the dark no-man’s land and look for danger. This was called sentry duty. The edge of the trench was ragged from shell-fire, and the soldiers had to peer through the tangled barbed wire which separated them and the rest of the trench from no-man’s land. Soldiers usually had very cold feet and legs; they were often soaked with the wet mud of the trench floors on their walk to the firing step. Many soldiers suffered from trench foot, a painful infection which caused soldiers’ feet to swell and hurt because of the cold, wet conditions. Doctors gave soldiers whale oil to rub on their feet, to try to prevent trench foot. The nights in the trenches were often quiet, with very little gunfire. Small white lights – which were called ‘Very Lights’ – were sent up into the air, to illuminate no-man’s land. Soldiers sent these up to check whether any enemy soldiers were moving about in no-man’s land. However, no-man’s land was usually barren, with only a few dark shadows on the ground: these were the dead from the previous days of fighting. Soldiers on sentry duty had to focus their eyes to detect any movement in no-man’s land. If any movement was detected, they were ordered to shoot immediately, without question.

Once an hour of sentry duty was done, another soldier would arrive ready to take the next sentry shift. The soldier who had just finished a sentry shift would walk along the narrow, twisting path of the trench. Although trenches were designed like this to make it difficult for enemies to rush through them, it also made it difficult for soldiers to find their way through at night when it was dark. Soldiers also had to keep their heads low: even though trenches were tall enough to protect men of an average height, many soldiers were still killed by enemy snipers in the trenches, because they were taller than the trenches were. Along the walls of the trenches, old blankets would be hanging. These blankets hung in front of dugouts, which were lit by candles. Often, about four men would sleep in a dugout at the same time. Soldiers would have to find a small gap in which they could rest. The soldiers would have four hours to sleep as comfortably as possible before being called up for sentry duty again. They would try to get comfortable, using their packs as pillows.

The Battle of the Somme. On the 1st of July 1916, rows of soldiers sat in the Front Line trenches in an area of France called Somme. Video recordings of the day showed that the soldiers’ faces looked anxious, probably because they were waiting to begin a battle. They were preparing to “go over the top”. Ladders had been put in place against the walls of the trenches, reaching up to the top of the trenches. The soldiers would soon climb up and out of the trenches, crossing no-man’s land to attack the German Front Line. Their commanders had told them that they would be fine. For the last five days, an artillery bombardment had attacked the German Front Line. This meant that there should not be many Germans left alive to fight. The artillery bombardment could still be heard that morning. The soldiers were said to have shared final cigarettes and cracked jokes. Although they were smiling in the video recordings, many soldiers’ eyes showed their fear. At 7:30am, the sound of the artillery guns stopped and everything was silent. The soldiers stood up, prepared their guns, and began to climb up the ladders and into no-man’s land.

However, the British and French soldiers did not know that the Germans were waiting for them. Many of the artillery shells did not explode when they fell on the Germans. Because so many shells were needed so quickly, they had been made in a rush, and had not been made very well. Also, the Germans were prepared for the attack and had hidden in their dugouts. As soon as the artillery bombardment had stopped that morning, the Germans had climbed out of their dugouts and had prepared their guns for the British and French troops. They waited for the British and French soldiers to approach, and then attacked them. On the first day of the Battle of the Somme alone, the British Army suffered 20,000 casualties. The battle did not end until the 18th November 1916.

K(v): Intervention 3 (The Home Front) NNF text

The Home Front

The sun rose over the Home Front, casting its warming rays over the streets of London. Birds twittered in the trees as Ethel May Dean emerged from her house to begin her journey. The Home Front was a very different picture to the Western Front. Although parts of life carried on in the same way as before the war, there were also many changes to Ethel’s life, both small and large. Some small changes, introduced by the Defence of the Realm Act in 1914, included no flying kites or lighting bonfires (these might attract dangerous enemy airships) and no feeding bread to horses and chickens (this was a waste of precious food!). Another rule was no gossiping about the army in public places (positive thoughts were important). But Ethel was part of a much bigger change too. That morning, she joined thousands of workers collected outside the Royal Arsenal in Woolwich – one of the largest munitions factories in Britain – waiting patiently for the day’s work to begin. Although this wasn’t a big change itself, there was something unusual about this scene. Something had changed since the war had begun… most of the workers collecting outside the Royal Arsenal were not men, but like Ethel, they were women! Before the war began, men usually worked while women often stayed at home to care for the children and look after the household. When the war began, many men left home to fight for their country, whilst some chose to stay with their families. That is, until January 1916, when conscription was introduced. This was a law which forced all men between the age of 18 and 41 to leave home and fight in the war, whether they wanted to or not! After this, fewer men were left at the Home Front. This meant that fewer men were working in the munitions factories, creating weapons to send those brave soldiers fighting on the Western Front. Because of this, there was a massive shortage in the weapons that the British army needed. This is where women proudly stepped in. Beginning to work in the munitions factories, they were in high spirits. They were fighting their own battle, by making weapons and shells to send off to the British soldiers fighting for their country, and for their lives.

Inside, the Royal Arsenal was buzzing with activity. The noise of the work and machines was deafeningly loud. Most of the jobs inside the Royal Arsenal were very repetitive, such as making shell cases, operating machinery and filling bullets. Ethel took her place at a long bench of women. Just a small pair of scales sat in front of her. Her job was to weigh a brown-coloured powder called Ballistite, and to fill up small bags of it. The Ballistite was made of two different types of explosive…

Ethel had a long day ahead of her. Having started work at 7 o’clock in the morning, she would not finish until 7 o’clock in the evening. And she earned less money than the men who had worked in this job before her! Even though this seemed unfair, when Ethel started working at the Royal Arsenal there was one job which she felt lucky that she was not given: packing TNT. It was a similar job to her own, but with one big difference. Although the TNT powder was white, every single item that the powder touched would turn… yellow! Even the hands and faces of the women who had to fill bags with TNT turned yellow. These women had their own canteen to eat in, where many of the trays and tables were also turning yellow. Ethel had heard nicknames for these girls: ‘canaries’, due to the yellow colour of their skin. As night fell, Ethel joined the crowd of weary workers which flowed out of the Royal Arsenal, whilst fresh workers arrived to replace them for the night shift.

As Ethel resumed her work the next morning, George Butling sat down at his kitchen table in Liverpool, pulling his writing materials closer. He didn’t have the cheeky smile of his younger brother Eric, and his eyes shone with a seriousness unusual in a 14-year-old boy. Although unusual, this seriousness was not surprising, as George’s father had recently been sent to fight in France. This left George, his younger siblings and his mother to look after the family home. Writing was the only way to keep in contact with their father. George picked up his pen, beginning the letter to his father in the usual way. ‘Dear Father, I hope that you’re in good health…’ His pen sped up, racing across the page as he told his father the latest news from the Home Front…

Unfortunately, not all of George’s news was positive. He began by writing about the price of food going up, yet again! Potatoes, salmon, butter, onions, carrots and cabbages were all becoming more and more expensive. George had read the papers and knew that German submarines, called U-boats, were attacking the ships which carried food into Britain. They did this so that British civilians would have less to eat. Because not much food was arriving in Britain, the food that did arrive became more expensive to buy. As food prices rose, people in George’s neighbourhood panicked. They rushed to the shops, buying lots of supplies of food in case it ran out. Queues stretched on for what seemed like eternity outside of shops. Sometimes, when mothers didn’t want to queue themselves, they let their children skip school to stand in the queues instead! Luckily for George, to stop children from skipping school to queue for food, the first school dinners were introduced.

Even though the German submarines were stopping food from reaching Britain, at least they weren’t harming British soldiers or civilians. As George’s younger brother Ben said, ‘We’re not dead yet!’ Ben was only 3, but he was always so smart and optimistic. Also, just a few weeks before, the family had come up with a plan in case the food prices kept rising… One day, sweating as they worked, they pulled up the paving stones, clearing the garden to make space for their brand new allotment. They planted London Pride, parsley, mint, marigolds and a variety of other herbs and plants. George and Eric had heaved bags of leafmould over from a nearby neighbour. Leafmould would help the plants to grow healthily. If the family struggled to buy food, they could grow it! More families were doing this after a book about looking after your own allotment had been published. This book said that if you could not fight in the war, you could at least make sure that everybody had enough food!

George was also eager to hear back from his father, because he had recently sent him his school report, which glowed with praise. During the war, school went on as usual for George, although he’d heard that some other schools were closed to make temporary hospitals for soldiers. He was so glad that his school was still open, because he loved school, even though it could be tough sometimes. Most days, he sat amongst his classmates at a long row of wooden desks, copying what the teacher wrote on the blackboard into his exercise book. In complete silence. He worked so hard to memorise this information, and this was how most of his learning happened. Behind him, at the back of the classroom, some children shivered with cold; they were too far away from the iron stove by the teacher’s desk to keep warm. Recently, George had lots of exams at school, so he’d been working extra hard to become top of the class! Luckily for George, because he was so hardworking, he missed the punishments that some children got. It was rumoured that one punishment involved staying after school to write out ‘I must not chatter in lesson time’ one hundred times. Just for being caught talking!

K(vi): Intervention 3 (The Home Front) ET

The Home Front

Changes to life on the Home Front. During the First World War, the Home Front was a very different place to the Western Front. Although parts of life still carried on in the same way as they did before the war had begun, there were also many changes to civilians’ lives. Some of these changes were small, and some of them were much larger. Some small changes to life were introduced by the Defence of the Realm Act, in the year of 1914. This Act introduced some new rules for citizens. An example of one of these rules was no flying of kites or lighting of bonfires, as both of these activities might attract dangerous enemy airships. Another rule stated that people could not feed bread to horses and chickens as this was a waste of food. A further rule ordered that there was to be no gossiping about the army in public places, because positive thoughts were important. One of the larger changes to life on the Home Front involved work. Thousands of people worked at the Royal Arsenal in Woolwich, which was one of the largest munitions factories in Britain. However, this was not the big change – the big change was that since the start of the war, most of the workers at the Royal Arsenal were not men, but women. Before the war had begun, men usually went to work while women often stayed at home to care for their children and to look after the household. The war began, and whilst many men left home to fight for their country in the war, some men chose to stay at home with their families. In January 1916, something called conscription was introduced. This was a law which meant that all men between the age of 18 and 41 had to leave home to fight in the war for their country. After this law was introduced, fewer men were left at the Home Front. This meant that fewer men were working in the munitions factories, creating weapons to send to the soldiers fighting on the Western Front. This led to a shortage in the weapons that the British army needed. Women living on the Home Front began to work in munitions factories instead. They created weapons and shells to send off to the British soldiers who were fighting for their country.

Women workers. The Royal Arsenal was a very busy place inside. The sound of the work and machines made it a very noisy place to work. Most of the jobs inside the Royal Arsenal were very repetitive. Examples of such repetitive jobs include making shell cases, operating machinery and filling bullets. Workers often sat at long benches with lots of other women. For one job, women had a small pair of scales on the bench in front of them. Their job was to weigh a brown-coloured powder called Ballistite, and to fill up small bags of it. The Ballistite was made of two different types of explosive. The days were long, with women starting work at 7 o’clock in the morning and not finishing work until 7 o’clock in the evening. The women also earned less money than the men who had been doing these jobs before them. There was one job which many women avoided, if possible, when they started working at the Royal Arsenal. This job was packing TNT. It was similar to weighing and packing Ballistite. However, the TNT powder was a white colour, and everything this powder touched would turn yellow. This included the hands and faces of the women who had to fill the bags with TNT. These women even had their own canteen to eat in, where many of the trays and tables would also turn yellow. The women who worked with TNT were nicknamed ‘canaries’, due to the yellow colour of their skin. In the evening, the day workers would leave the Royal Arsenal. Fresh workers would arrive at the same time to replace them for the night shift.

Family Life. While many women began to work during the war, many children’s fathers were sent to fight in the great war in France. This meant leaving their wives and their children to look after the family home. The only way for wives and children to keep in contact with their husbands and fathers while they were away was to write letters to them. In these letters, families would often ask about the health of their husband or father, and would also tell them all about the latest news that they had from the Home Front.

Food. Not all news from the Home Front was positive. Food prices kept on going up during the war, with items such as potatoes, salmon, butter, onions, carrots and cabbages all becoming more expensive. During the war, German submarines, which were called U-boats, attacked the ships which carried food into Britain. They did this so that British civilians would not have as much food to eat. Because not as much food was arriving in Britain, the food that did arrive became more expensive to buy. With food prices rising, people often panicked, and many people rushed to the shops to get lots of supplies of food in case the food ran out. This created long queues outside of shops. Mothers quite often allowed their children to take time off school to queue for the food. Because of this, the first school dinners were introduced, to encourage children not to skip school to queue for food.

Although the high food prices could make life more difficult for families, whilst the German submarines were trying to prevent food from reaching Britain, they were not attempting to harm the British soldiers or civilians. Many families had also planned to create their own allotments in their gardens in case the food prices continued to rise. They would pull up the paving stones in their gardens and clear the rest of their gardens out to make space for these new allotments. They would plant a variety of different herbs and plants. These might include London Pride, parsley, mint, marigolds, and many more. If they could, families would also collect leafmould from different places, which could be used to help the plants to grow healthily. With these allotments at home, if families were struggling to buy food for themselves, they could try to grow it instead. More families were doing this and growing their own food after a book about looking after your own allotment was published. This book said that if you could not fight in the war, you could at least make sure that everybody had enough food!

School life. Sometimes, children would send their fathers, who were fighting on the Western Front, their school reports to read. During the war, most schools carried on as usual, although some schools were closed in order to make temporary hospitals for soldiers. Although lots of children enjoyed school, it could be tough for all children who attended. The class sat in long rows at wooden desks, copying what the teacher wrote on the blackboard into their own exercise books. This was done in silence. Children were expected to memorise this information, and this was how most of their learning at school happened. At the back of the classroom, children shivered with cold. This was because they were too far away from the iron stove by the teacher’s desk to be able to warm up. Schools still had their children sitting exams during the war. This meant that the children were still expected to work hard and to achieve highly. If children were hardworking, they could avoid the punishments that some children were given. An example of one punishment is having to stay behind after school to write out the sentence ‘I must not chatter in lesson time’ one hundred times. This was given out if children were caught talking during their lessons.

Appendix L: Discussion questions for each intervention

Question | Historical knowledge | Historical thinking skill

War Begins
Q1. Look at the six cards in your folder. Put these events into the order that they happened, starting with the first and earliest event. If you can, label each event with specific dates. (Note that this question was not coded.) | Substantive | Chronological
Q2. Who killed Archduke Franz Ferdinand, and how did this happen? | Substantive | Conceptual
Q3. Why do you think that Kaiser Wilhelm wanted to support the Austro-Hungarian Empire when they declared war against Serbia? | Second-order | Causal
Q4. Why did Germany invade Belgium? | Substantive | Causal
Q5. What could Germany have done to avoid going to war with Britain? | Second-order | Causal

Trench Life
Q1. Describe a soldier’s journey to the Front Line. | Substantive | Conceptual
Q2. Do you think that soldiers enjoyed their time in the trenches? | Second-order | Conceptual
Q3. What was sentry duty, and what did soldiers have to do on sentry duty? | Substantive | Conceptual
Q4. What happened during the Battle of the Somme? Give as much information about the battle as possible. | Substantive | Conceptual
Q5. Why was the British/French artillery bombardment of the German Front Line during the Battle of the Somme unsuccessful? | Substantive | Causal

The Home Front
Q1. What extra rules/laws were introduced on the Home Front during the war, and why were these introduced? | Substantive | Conceptual, partially causal
Q2. More women began to work in munitions factories during WWI. Do you think these women were happy working in factories or not? Why do you think this? | Second-order | Conceptual
Q3. Who were nicknamed ‘canaries’ during the war, and why? | Substantive | Conceptual, partially causal
Q4. Why did food prices rise during the war? What did families and schools do about this? | Substantive | Causal
Q5. How was school life different for children during the war, in comparison with today? | Substantive | Conceptual

Appendix M: Additional chronological sequencing task cards

Card: Germany built many powerful warships of its own. Date: ______
Card: Archduke Franz Ferdinand was assassinated. Date: ______
Card: The Austro-Hungarian Empire declared war on Serbia. Date: ______
Card: Britain declared war on Germany. Date: ______
Card: Germany declared war on Russia and France. Date: ______
Card: Germany invaded Belgium to get to France. Date: ______

Appendix N: Function coding scheme

Function coding.

This coding scheme explores how participants construct historical understanding. It focuses on the processes that participants go through in their discussions in order to make sense of what they have learnt, and to express this coherently.

This coding scheme is to be used to code questions 1 to 5 in each transcript, with the exception of transcripts from the first intervention (War Begins): on these transcripts, this coding scheme is to be used to code questions 2 to 5 only.

Note that a ‘turn’ refers to one or more lines of the transcript during which one participant is speaking, without interruption. A new turn begins when another participant begins speaking.

Coders will:

1. Locate the line numbers listed in the Excel document on the transcript.
2. Decide which specific code the line numbers identified are most relevant to. Record both the overarching code and the specific code (described below).

Outline of coding categories:

Textual Understanding: Relates to how participants – individually and collaboratively – construct meaning and appropriate historical understanding, using both the text and each other.

Referencing: Explores what participants refer to when discussing answers, including prior knowledge, experience, the prior knowledge activity materials, and so forth.

Navigation: Relates to how participants navigate the texts to locate information in order to answer questions.

Each entry below gives the specific code, an explanation of the code, and an example of the code.

Overarching code: Textual Understanding

Specific code: Direct reading
Explanation: Children read answer directly from the text and do not elaborate. Identifiable partly from reader’s intonation, but also when wording is identical to the text used in the intervention.
Example: ‘On the morning of the 28th June, in 1914, crowds of people…’

Specific code: Support
Explanation: A child reads a small section of the text/refers to a specific section of the text before elaborating on this or discussing this in their own words. Here children are using the text to support their own explanation. However, children must add to what they have read, rather than simply repeating it. Only code as ‘support’ if the same child reads from the text and provides additional explanation.
Examples:
‘No it says in the text that….’
[Reading from text] ‘On the morning of the 28th of June….’ [stops reading from text] ‘Oh wait, yes it was Gavrilo Princip on 28th of June’

Specific code: Collaborative (Cumulative or Negotiation)
Explanation: A series of at least 3 turns where each child’s turn relates to the last child’s turn. Creates a larger unit of meaning, or a more detailed unit of meaning, than children’s individual turns could. Ends when the topic is changed, or when there is a turn which does not relate to the previous utterance. Not when children are listing ideas related to the question, which do not relate to each other’s ideas.
Cumulative: Children simply add to each other’s turns, with no corrections or questioning of each other’s turns.
Negotiation: Involves children questioning/correcting each other’s utterances. Critical but constructive. Must involve exchanges of questioning/corrections, not one child questioning/correcting another. Such utterances will be coded as ‘assessing accuracy’ (see below).
Example [Cumulative]:
Child 2: But it missed and got the car behind it. And then it went over but.
Child 3: They changed the route.
Child 2: The driver didn’t know the route. So he stopped right in front of Gavrilo Princip.
Child 3: They shot a bullet at him.
Example [Negotiation]:
Child 2: Um, the actual person. How did he die? He died by a grenade didn't he?
Child 1: He didn't die by a grenade
Child 2: No but
Child 1: No that's the car behind
Child 2: Yeah that's the- oh yeah
Child 3: It's Prin- it's Princip

Specific code: Assessing accuracy
Explanation: Children challenge, question or correct others’ understanding. Involves two turns: one mistake being made and one correction/challenge being made. This does not always have to be a correct challenge. Greater discussions involving multiple disagreements/corrections will be coded as collaborative (negotiation), as described above.
Examples:
‘Did they sleep on sandbags?’
‘I don’t think that’s right’

Specific code: Self-correction
Explanation: When a child recognises that what they have been saying has not been correct, and corrects themselves accordingly.
Example: ‘They were friends with Russia… oh hang on, no with the Austro-Hungarian Empire’

Specific code: Hypothetical reasoning
Explanation: Children question/imagine hypothetical situations/possibilities in relation to events and information in the text. Often signalled by the phrases ‘what if….’, ‘imagine…’, or ‘if I were…’. This does not include inferencing from the text, but refers to questioning/imagining possible, hypothetical situations only.
Examples:
‘Yeah but what if Britain went to their country?’
‘Imagine how many people died’
‘If I was in WWI I wouldn’t have survived for longer than a day’

Specific code: Summarise
Explanation: After children have gradually built an answer to the question over the course of multiple turns, one child might pull this information together in a single turn, recapping/summarising what has been discussed.
Example: [After much discussion over multiple turns] ‘So, what we’re saying is basically that Gavrilo Princip killed the Archduke with a gun and then shot his wife too’

Specific code: Absent information
Explanation: Children discuss or question information that is not present in the text.
Examples:
Child 1: Gavrilo Princip was shot
Child 2: Yeah but where?
Child 1: Where? It was in
Child 2: Like in the head?
‘Were the trenches small?’

Overarching code: Referencing

Specific code: Prior knowledge
Explanation: Children draw on a notion of prior knowledge, whether accurate or not, relevant to war. Information that the child gives is not something which can be inferred from the text, but something which is likely to have been obtained elsewhere. This includes knowledge that participants might have gained from the prior knowledge activities.
Examples:
‘I know that women worked in munitions factories’
‘They had prisoners of war for four years’

Specific code: Personal experiences (in own lives)
Explanation: Children refer to experiences in their own lives when answering a question.
Example: ‘Like when you’re in a really long traffic jam’ [Compared to queueing for food during WWI]

Specific code: Personal experiences (of grandparents/relatives)
Explanation: Children refer to the experiences of their relatives during the war, even if it is simply stating an involvement of a relative in a war.
Example: ‘My Grandad was in World War 2’

Specific code: Within texts
Explanation: Children make connections between different parts of the text when explaining an answer.
Example: Q. What rules were introduced on the Home Front during the war? ‘No feeding bread to horses or chickens, because food supplies were short because of the German U-boats’ (The underlined information comes much later in the text but the child has linked it to this previous rule).

Specific code: Between texts
Explanation: When children make relevant references to information from previous texts to elaborate their answers to questions relating to later texts.
Example: ‘His Dad might have been in the Battle of Somme like Harry Stinton was’ [during discussion in Intervention 3 on the Home Front].

Specific code: Materials
Explanation: Children make a general reference to prior knowledge tasks or the texts to support their answer. Children do not discuss actual information from the text/activities, but simply refer to these resources more generally for support: if they do discuss information, it should be coded as one of the relevant textual understanding codes, or as a prior knowledge code.
Examples:
‘Look at the map’
‘Let’s read the text again’

Overarching code: Navigation

Specific code: Scanning
Explanation: Participants show evidence that they are using scanning techniques to locate specific information, e.g. scanning the text for relevant words in order to find information to answer a question.
Examples:
‘So Britain Britain Britain, where’s Britain?’ [Reads text under breath]
‘Let’s just look for dates on the page and see what they are for’

Specific code: Paragraph
Explanation: Participants refer directly to a particular paragraph which they remember/know is relevant to question. Does not include when a child asks which paragraph certain information is in.
Example: ‘I remember, it’s in the first paragraph’

Specific code: General stage of text
Explanation: Participants refer to a general ‘part’ of the text when retrieving information. Does not include questioning/wondering where information is in the text.
Example: ‘That part was at the end of the story’

Specific code: Page
Explanation: Participants refer directly to a particular page which they remember/know is relevant to question. Does not include questioning/wondering which page it is on (e.g. I can’t remember which page it’s on).
Example: ‘So this one goes last for definite because it’s on the last page of that’

Specific code: Sub-headings
Explanation: Participants refer to sub-headings in the text to locate relevant information.
Example: ‘No this is the designing bit’ [Trench design was one of the sub-headings – design was not mentioned anywhere other than in the actual sub-heading itself]

Appendix O: Content coding scheme

Content coding. This coding scheme aims to identify any historical understanding that participants express, for instance, in terms of historical facts, information and thinking skills. It also explores the depth and accuracy of this understanding. This coding scheme focuses on the content of the transcripts.

This category of coding is to be used to code questions 1 to 5 in each transcript, with the exception of transcripts from the first intervention (War Begins): on these transcripts, this category of coding is to be used to code questions 2 to 5 only.

For each utterance (line numbers for each utterance are listed in Excel document):

1. Select one of the three historical thinking codes (outlined in table i, below).
   - Each utterance can only be coded as one of the categories.
   - For each discussion question, the historical thinking code that the question intended participants to draw on will be listed in the Excel document: it is expected that most utterances in response to the question will be coded as the stated historical thinking code. However, utterances may be coded as a different historical thinking code where necessary.

2. Code as either recall or inference (outlined in table ii, below).
   - Once more, for each discussion question, it will be listed whether the question was intended to encourage participants to recall information or to produce inferences. However, this does not mean that utterances must be coded as that category.

3. If recall was chosen, select a third and final code describing the accuracy/depth of the recalled information (outlined in table iii, below). If inference was chosen, no further code needs to be assigned.
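The three-step procedure above can be sketched as a small consistency check on a coder's entries. This is only an illustrative sketch: the function name, the lower-case labels and the tuple layout are assumptions made for the example, not part of the coding scheme or the Excel document. The five third level labels are taken from table iii.

```python
# Hypothetical labels for the three coding levels (names are illustrative).
THINKING_CODES = {"conceptual", "chronological", "causal"}            # table i
KNOWLEDGE_CODES = {"recall", "inference"}                             # table ii
DEPTH_CODES = {"basic", "partial", "mixed", "complex", "incorrect"}   # table iii


def validate_content_code(thinking, knowledge, depth=None):
    """Check one utterance's codes against the three-step procedure.

    Step 1: exactly one historical thinking code.
    Step 2: either recall or inference.
    Step 3: a third level code only if (and whenever) recall was chosen.
    """
    if thinking not in THINKING_CODES:
        raise ValueError(f"unknown historical thinking code: {thinking!r}")
    if knowledge not in KNOWLEDGE_CODES:
        raise ValueError(f"unknown historical knowledge code: {knowledge!r}")
    if knowledge == "recall":
        if depth not in DEPTH_CODES:
            raise ValueError("recall utterances need a third level code")
    elif depth is not None:
        raise ValueError("inference utterances take no third level code")
    return (thinking, knowledge, depth)
```

For instance, `validate_content_code("causal", "recall", "basic")` passes, whereas an inference entry that carries a third level code is rejected.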

Table i: First Level Codes - Historical Thinking

Conceptual thinking
Definition: When an utterance expresses some form of general, historical thinking, which is not chronological or causal in nature.
Reference to the feelings/thoughts of figures in the texts are included in this category (e.g. It must have been very, very boring).

Chronological thinking
Definition: When an utterance expresses ideas which relate to time or chronology. These might include:
• Reference to dates
• The sequencing of a number of events, often with the use of time conjunctions
Signalling words: Specific dates, ‘then’, ‘next’, ‘after that’

Causal thinking
Definition: When an utterance expresses ideas that are causal in nature.
Causal statements are always formed from at least two parts:
- a cause and an event OR an event and a consequence
  e.g. The shells weren’t made properly so they didn’t explode.
They may also be expressed in more than two parts:
- a cause and an event and a consequence
  e.g. The shells were made in a rush so they didn’t explode which meant that not many Germans were killed.
Participants might not always express both parts of the causal relationship when talking; one part may be already expressed in the question or by another participant, and the speaker omits this.
e.g. Q: Why did Kaiser Wilhelm support the AH Empire? A: Because they were allies.
Signalling words: Often signalled by conjunctions such as ‘because’, ‘so’ and ‘if’. However, these do not always signal causation, but might signal, for example, participants justifying their opinions in some cases.
When ‘because’ is clearly used for purposes other than signalling causation, code as one of the other 2 categories.

Table ii: Second Level Codes - Historical Knowledge

Recall
Recall codes are assigned when participants are recalling information directly from the text. They are an indication of substantive knowledge.
Information that participants discuss must be explicitly mentioned in the texts, although participants will discuss it in their own words. Looking back through texts will help to clarify when information is recalled.
Note: Some information might be incorrectly recalled (this will be explained further in relation to the third level codes, below). In this case, recalled information will not appear in the text but will bear resemblance to information in the text.
Examples:
Conceptual thinking recall: Soldiers got trench foot.
Chronological thinking recall: The Battle of the Somme began on 1st July 1916.
Causal thinking recall: The shells weren’t made properly so they didn’t explode.
Recall codes are then assigned a third level code, assessing the accuracy/depth of the recalled information. These will be discussed in more detail below.

Inference
When participants use information from the text to make their own, appropriate inferences in order to answer a question, rather than recalling information directly from the text. Includes inferences about the feelings/perspective of characters which are not referred to in the texts, e.g. [Life in the trenches] must have been very, very boring.
Note that, for questions which intend inferences to be produced (as labelled, e.g. ‘Did soldiers enjoy their time in the trenches?’), if participants give a yes/no answer only, or repeat the phrasing of the question only (e.g. I don’t think they enjoyed it) this is not categorised as inferencing. Participants must either justify their yes/no answer (e.g. ‘Yes because they spent time with their friends’), or provide a more specific feeling/emotion (e.g. I think that they were bored).
If children state information which is not in the text, and which is also not an appropriate inference to make regarding the text, this utterance must be coded as an incorrect recall utterance (see third level codes below), not as an inference utterance.
If a child makes an inference which is not relevant to the question they are answering, but is still appropriate in relation to the text, it should still be coded as an inference.
Examples:
Historical understanding inference: [Life in the trenches] must have been very, very boring.
Causal inference: Kaiser Wilhelm II could have replied back to Herbert Asquith’s letter to prevent war.
Chronological inference: No examples of chronological inference were found.

Table iii: Third Level Codes – Accuracy/depth of historical understanding

Note that the same five third level codes are applied to each recall utterance. Below are descriptions of how these codes might look for each of the three historical thinking codes. Examples are given where the code has been identified in transcripts (some were not found to occur).

Third level code: Basic
Appropriate responses which contain basic information.
Conceptual thinking: One to two units of complete information given.
e.g. ‘Archduke Franz Ferdinand got shot by Gavrilo Princip’
Chronological thinking: Participant responses:
a) refer to a single date
e.g. ‘It happened on the 28th June’
b) sequence two significant events correctly, where the order of these events is important to understanding
e.g. ‘Germany invaded Belgium and then Britain declared war’
Causal thinking: A simple explanation of a causal factor, either stating the cause or consequence of an event, or both. May be a causal explanation in relation to a question, or to another participant’s comment. (Note: One ‘part’ of the causation may be referred to in the question, and therefore the child may not repeat it. E.g. child may begin with ‘because…’)
e.g. ‘Because Germany needed to get through Bel- to France through the quickest possible route’

Third level code: Partial
Participant’s response draws on relevant information, but this is not fully explained, is vague, or important information is missing.
Conceptual thinking: Partial/vague information relating to the question. Sometimes signalled by the use of ambiguous pronouns.
e.g. ‘They could have responded’ [when the pronoun ‘they’ does not refer back to a previously stated noun].
Chronological thinking: Participant gives a partial explanation of chronological information. Sometimes signalled by the use of ambiguous pronouns.
e.g. ‘1914!’ [In which case there is no prior utterance to suggest what this refers to, although the date is relevant].
Causal thinking: A partial explanation of a causal factor – participants have got a ‘correct’ idea but it is not fully explained. Sometimes signalled by the use of ambiguous pronouns.
e.g. Stating that Germany invaded Belgium ‘to get to France’.

Third level code: Mixed
Participants use historically accurate information in their answer, but apply this incorrectly or inappropriately in relation to the question asked.
Conceptual thinking: Participants draw on accurate information, but that is inappropriate in relation to the question being answered.
e.g. Q. Describe a soldier’s journey to the Front Line A. ‘Soldiers got trench foot’
Chronological thinking: Participants:
a) Discuss a sequence of events, with some in the correct order and some in the incorrect order
b) Use relevant dates from the text but in relation to the wrong events
Causal thinking: Where accurate causal information is given, but this information is confused with information found elsewhere in the text.
e.g. Stating that Kaiser Wilhelm supported the Austro-Hungarian Empire because ‘they made a promise years ago’ (a reference to Britain’s promise to Belgium).

Third level code: Complex
Participant’s response links multiple, distinct ideas.
Conceptual thinking: Three or more distinct but related pieces of historical information are presented.
e.g. ‘There were four different types of trenches, the communication trenches, support trenches, reserve trenches and the Front Line trenches’
Chronological thinking: Participants:
a) sequence three or more significant events correctly, where the order of these events is significant
e.g. ‘Germany invaded Belgium and then Britain sent Germany a note telling them to withdraw. After that, Britain declared war’
b) reason, using information in the text, to support their chronological response.
e.g. ‘Actually I think it was, the following summer, no it says the following summer. That's when they started the actual war’
Causal thinking: Explanation linking multiple causal factors (e.g. two or more causes of an event; two or more consequences of an event).
e.g. ‘Germany had to invade Belgium because they needed the quickest possible route to France because they needed to get back to fight Russia’

Third level code: Incorrect
Participant’s response is incorrect in relation to the texts.
Conceptual thinking: Information is not found in the text. Information has been invented, or is the result of an inappropriate inference.
e.g. ‘They used sandbags as pillows’
Chronological thinking: Participants order a series of events incorrectly/give an incorrect, unrelated date in relation to an event.
Causal thinking: A causal explanation which is not relevant or accurate in relation to the texts. May be a broad inference, but one which cannot be directly inferred from the text.
e.g. ‘Germany wanted more land’

A statement might also be coded as incorrect if only some of the statement is correct, but the rest is incorrect. In the example below, it wasn’t just women who weren’t allowed to, which makes this statement incorrect overall. e.g. ‘Women weren’t allowed to gossip in public places’

Appendix P: Truth-value coding scheme

This coding scheme explores how reliable participants believe the texts are (the truth-value of texts), and their justifications for their perceptions. It is to be used to code question 6 only in each transcript.

Coders will:
1. Locate the line numbers listed in the Excel document on the transcript.
2. Decide whether the utterance is indicating perception of the truth-value of the text or the justification for this perception.
• If the utterance relates to a participant’s perception of the truth-value of a text, assign the ‘degrees of truth’ code category. Then assign a specific code describing how reliable participants felt texts were (see table i below).
NOTE: Each participant can be assigned only one ‘degrees of truth’ code category per transcript. If participants express multiple opinions, code the final opinion that they give in the transcript. Participants must be explicit in their statement in order for it to be coded: the coder should not interpret what the participant means if it is not clear, but rather, should leave the utterance uncoded.
• If the utterance is a justification, choose which of the remaining code categories it best fits into. Then assign an additional, specific code (see table ii below).

Table i: Code category for utterances where participants state truth-value judgements

Code category: Degrees of truth
Relates to how much of the text participants feel is true/fictionalised. Each child can be coded once only for this. Aims to assess proportion of children who think different text types are reliable/unreliable.

Specific code: Entirely factual
Explanation: Participants believe that the entire text is true/reliable.
Example: ‘I think it’s all true they wouldn’t give us a fake text’

Specific code: Partially factual
Explanation: Participants believe that there are both true and fictionalised elements to the text.
Example: ‘So some of this is true and some of this isn’t true. That’s what I think’

Specific code: Entirely fictionalised
Explanation: Participants believe the text is entirely fictionalised, containing no truth.
Example: ‘I think it’s fake’

Table ii: Code categories for justification utterances

Code category: Composition
Participants use evidence about the content/style of the text to judge the reliability of the text.

Specific code: Making sense
Explanation: Participants question whether information in the text makes sense when considering the reliability of the text. May include general references as to whether the overall text makes sense.
Example: ‘Women worked in ammunition places to make guns, grenades….. but they had no women in the Black Hand thing. But how would they get it?’

Specific code: Characters
Explanation: Participants consider the historical figures/protagonists in the text when considering how reliable it is.
Example: ‘Just the characters aren’t true. Maybe they were just saying specific characters from WWI’

Specific code: Literary devices
Explanation: Children discuss literary devices (e.g. figurative language, sub-headings) used in text when discussing reliability of text.
Example: ‘Oh except for the bit when it says the queues go on for eternity because that’s just exaggerating’

Specific code: Facts
Explanation: Children draw on ‘facts’ from the text (e.g. dates), using these as evidence that the text must be true.
Example: ‘Because it gives you exact dates’

Code category: Source
Participants use evidence about the source of the text to judge the reliability of the text.

Specific code: Teacher
Explanation: Participants refer to the person who has provided the text (the ‘teacher’), when considering the reliability of the text.
Example: ‘Miss wouldn’t give us a fake text’

Specific code: Original source
Explanation: Participants refer to people who were there at the time of historic events, who might have witnessed and could have recorded events.
Example: ‘I think it is all true for the allies side but for the Central Powers side you do not know their story. Because people are actually there to record the allies side’

Specific code: Author/historian
Explanation: Participants refer to the creator of the text when considering reliability. Includes statements about ‘how do we/they know that…?’
Example: ‘How do they know though that this actually happened?’

Specific code: Evidence
Explanation: When considering the source of information, participants refer to the existence of documents, artefacts, photographs, etc, which could have informed us.
Example: ‘We have documents, artefacts’

Code category: Prior knowledge
Participants draw on their own knowledge/experiences to judge the reliability of the text.

Specific code: History
Explanation: Participants draw on prior knowledge of war/relevant history to make judgements about the reliability of a text (whether this prior knowledge is accurate or not). Should only be given if the prior knowledge is explicit.
Example: ‘Well this text some of it is true like I know because I know the HMS Dreadnought was the strongest because I know later on there was a big battle’

Specific code: Experience
Explanation: Participants draw on their personal experiences to make judgements about the reliability of a text, or references to relatives living during the war.
Example: ‘Because if you watched a film called [inaudible] it’s like that’

Appendix Q: Rules for codable utterances

Turns • A turn refers to one or more lines of the transcript during which one participant is speaking, without interruption. A new turn begins when another participant begins speaking.
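The definition of a turn can be illustrated with a short sketch that groups consecutive transcript lines by speaker. The representation of a transcript as a list of (speaker, text) pairs, and the names `Turn` and `group_into_turns`, are assumptions made for illustration only; the thesis does not describe the transcripts in this form.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass
class Turn:
    speaker: str
    lines: list


def group_into_turns(transcript):
    """Group consecutive lines by the same speaker into turns.

    `transcript` is assumed to be a list of (speaker, text) pairs in the
    order they were spoken. A new turn begins whenever the speaker changes.
    """
    turns = []
    for speaker, chunk in groupby(transcript, key=lambda line: line[0]):
        turns.append(Turn(speaker, [text for _, text in chunk]))
    return turns
```

Under this sketch, two consecutive lines from Child 2 followed by one line from Child 3 would yield two turns, the first containing both of Child 2's lines.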

Utterances • An utterance is a statement which performs a particular function. This function differs across the three coding schemes. Utterances across the different coding schemes will be discussed in further detail below.

Function coding scheme:

In the function coding scheme, an utterance is one or more turns in the transcript where participants are using one of the strategies (the codes listed in the function coding scheme) to construct historical understanding.

To identify codable utterances, the coder first needs to identify instances where readers are using particular strategies to construct meaning. The length of the utterance will depend on the strategy being used, and guidance is given in the coding scheme to support identification of the length of utterances for different codes.

For instance, one of the strategies which might be used is ‘assessing accuracy’, which involves one participant challenging or correcting another. Once this strategy is identified in the transcript, then the utterance can be identified as the two turns which combined to create this code. In the example below, these two turns constitute one utterance. E.g. Child 1: ‘Archduke France Ferdinand’ Child 3: ‘No, it’s Franz Ferdinand, not France’.

Alternatively, the ‘collaborative’ strategies involve participants combining at least three related units of historical meaning collaboratively, and therefore once this has been identified, this utterance would consist of at least three turns, and possibly more. For instance, the three turns in the example below constitute one utterance. E.g. Child 2: ‘Women had to work from 7 o’clock until 7 o’clock’ Child 3: ‘That’s like 12 hours. They must be so tired’ Child 1: ‘And then they’d still have to go home and look after their children because their husbands aren’t there to help’

Content coding scheme:

In the content coding scheme, an utterance is one or more turns in the transcript where participants express some form of historical understanding. However, there are additional rules for identifying what constitutes ‘some form of historical understanding’:

• Determining the number of utterances in one turn:
  - To decide where an utterance ends, the coder must keep the codes in mind. If a participant expresses historical understanding which fits into two different coding categories, this must be coded as two different utterances. In the example below, the participant makes one basic causal statement, followed by an incorrect statement. Therefore, these are coded as two separate utterances.
    E.g. Child 4: No because they were trying to fill up the bullets so they actually went faster. And made them whiter.
  - If a child expresses a number of closely related units of historical understanding in one turn of speaking, this is coded as a single utterance, because this falls under the ‘complex utterance’ category (see content coding scheme). Although three individual units (highlighted) are identifiable in this one turn, because they are all closely related they are coded as a single utterance.
    E.g. Child 3: Well basically Britain sent Germany a message telling them to remove their troops or they would declare war, but Germany didn’t respond to this so Britain declared war on them.

• Determining utterances across multiple turns:
  - A codable utterance can only involve multiple participants if neither participant’s statement makes sense without the other. The statements must be together to create a basic unit of historical meaning, and therefore are coded as one utterance.
    E.g. Child 1: ‘If you were aged 18 to 41-‘
    Child 2: ‘You had to go to fight’
  - However, if participants express whole ideas similar to each other’s, but each child’s statement does express historical understanding on its own, these are coded as separate utterances.
    E.g. Child 1: ‘One of the rules was not to feed bread to animals’
    Child 2: ‘Because this was a waste of precious food’
  - A single utterance can also be expressed across two or more turns if a participant is interrupted whilst expressing an idea. An utterance can only be continued across multiple turns if there is minimal interruption (only a couple of irrelevant words or an incomplete idea) from other participants. If another participant expresses a complete utterance themselves, then the previous participant’s utterance must end. In the example below, both of child one’s turns (underlined) constitute one codable utterance.
    E.g. Child 1: Well first of all the British and French artillery shells weren’t made properly because they needed so many to do the bombardment
    Child 2: Yeah [inaudible]
    Child 1: So most of them didn’t actually explode

• Repetition:
  - If one participant expresses an idea, and a second participant repeats this information, then the second participant’s statement is not also coded as a second utterance. Only code as an additional utterance if the information contributes additional meaning to the first participant’s utterance. This is because coding looks at understanding of texts as a group, rather than individuals: individual understanding is impossible to untangle in these discussions. Deciding whether a piece of information is being repeated may require looking back through the transcript to determine exactly which information has been repeated: some participants might repeat the same utterance multiple times during a discussion question.

• Reading directly from the text:
  ◦ Instances where participants are directly reading from the text at length are not classified as utterances which express historical understanding.
• Coder interpretation:
  ◦ If a participant makes a vague statement, where the coder cannot understand what the participant is trying to express, this should not be classified as a codable utterance. The coder should not impose their own assumptions to try to interpret what the participant is trying to express.

Truth-value coding scheme:

In the truth-value coding scheme, utterances are either statements that express an opinion on how reliable or truthful the text is, or statements that justify this opinion.

• Statements that express an opinion on how reliable/truthful the text is:
  ◦ Each participant is coded once for their opinion per transcript.
  ◦ If participants express two differing opinions during the discussion (perhaps because the discussion has changed their mind), code only the last opinion that they express.
• Statements that justify opinions of truth-value:
  ◦ Multiple utterances can occur within one turn if participants express two different justifications for their opinion within the same turn.
  ◦ An utterance can be coded across multiple turns if two participants describe a justification together. For instance, one participant might provide a justification and another might provide evidence from the text to support it. The two turns below would be coded as one utterance.
    E.g. Child 2: Yeah because it gives the exact times and dates Child 3: Yeah! Like it says here the battle ended on 18th November 1916.
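The rule that only a participant's last stated opinion is coded can be sketched as a short routine. This is an illustrative sketch only: the function name, the (speaker, opinion) tuple layout and the opinion labels are hypothetical and are not part of the coding scheme itself.

```python
def final_opinions(coded_turns):
    """Return the last truth-value opinion expressed by each participant.

    coded_turns: list of (speaker, opinion) tuples in transcript order.
    The opinion labels used here are hypothetical placeholders.
    """
    latest = {}
    for speaker, opinion in coded_turns:
        # A later opinion from the same speaker overwrites the earlier one,
        # so each participant is coded once per transcript.
        latest[speaker] = opinion
    return latest


turns = [
    ("Child 1", "true"),
    ("Child 2", "unsure"),
    ("Child 1", "not true"),  # Child 1 changes their mind during discussion
]
print(final_opinions(turns))
```

Because the dictionary is keyed by speaker, a participant who changes their mind contributes a single coded opinion, in line with the rule above.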

Appendix R: Bonferroni calculations for groups of tests

Bonferroni correction, if required: $\alpha_I = 1 - (1 - \alpha)^{1/k}$, where $\alpha_I$ = new alpha, $\alpha$ = alpha and $k$ = number of tests.

| Analysis | Number of analyses conducted | Bonferroni correction, if required |
|---|---|---|
| Repeated measures ANOVA | 1 | - |
| Post-hoc t-tests for main effect for time | 3 | - |
| Post-hoc t-tests for interaction effect between time and condition | 9 | 0.006 |
| Spearman’s rank-order correlation | 1 | - |
| Reading levels ANCOVA | 1 | - |
| Post-hoc ANCOVAS for interaction effect between reading level and time | 3 | - |
| Post-hoc ANCOVAS for interaction effect between condition and time | 3 | - |
| Repeated measures ANOVA (median split) | 1 | - |
| Post-hoc t-tests for interaction effect between reading level and time | 9 | 0.006 |
| Post-hoc ANOVAs for interaction effect between reading level and time | 3 | - |
| Pearson correlations to explore relationship between reading level and assessment scores | 3 | - |
| Repeated measures ANOVA for Intervention 1: War Begins | 1 | - |
| Repeated measures ANOVA for Intervention 2: Trench Life | 1 | - |
| Post-hoc t-tests for interaction effect between time and condition for Intervention 2: Trench Life | 9 | 0.006 |
| Repeated measures ANOVA for Intervention 3: The Home Front | 1 | - |
| Repeated measures ANOVA for questions spanning all three intervention themes | 1 | - |
| Post-hoc t-tests for interaction effect between time and condition for questions spanning all three intervention themes | 9 | 0.006 |
| Repeated measures ANOVA for simple conceptual thinking questions | 1 | - |
| Post-hoc t-tests for interaction effect between time and condition for simple conceptual thinking questions | 9 | 0.006 |
| Repeated measures ANOVA for complex conceptual thinking questions | 1 | - |
| Repeated measures ANOVA for chronological thinking questions | 1 | - |
| Independent t-test for chronological sequencing activity | 1 | - |
| Repeated measures ANOVA for chronological thinking questions | 1 | - |
| Post-hoc t-tests for interaction effect between time and condition for causal thinking questions | 9 | 0.006 |
| Post-hoc ANOVAS for interaction effect between time and condition for causal thinking questions | 3 | - |
| Mann-Whitney U for enjoyment of interventions | 1 | - |
| Repeated measures ANOVA | 1 | - |
| Mann-Whitney U for ratings of enjoyment of history learning on pre- and post-questionnaires | 2 | - |
| Wilcoxon tests exploring whether ratings of enjoyment of history change over time | 2 | - |
| Repeated measures ANOVAs exploring influence of enjoyment of history learning ratings on assessment scores | 2 | - |
| Post-hoc repeated measures ANOVAs exploring interaction effect between enjoyment of history learning and time | 3 | - |
| Post-hoc repeated measures ANOVAs exploring interaction effect between condition, enjoyment of history learning and time | 6 | 0.009 |
| Post-hoc repeated measures ANOVAs exploring interaction effect between time and condition for those who love history learning | 3 | - |
| Chi-squares comparing interest in WWI across conditions | 2 | - |
| McNemar exploring ET interest in WWI across pre- and post-questionnaires | 1 | - |
| Repeated measures ANOVAs exploring interest in relation to assessment scores | 2 | - |
| Post-hoc repeated measures ANOVAs exploring interaction effect between time and interest on pre- and post-questionnaire ratings of interest | 6 | 0.009 |
| Post-hoc repeated measures ANOVAs exploring interaction effect between time, interest and condition on pre-questionnaire ratings of interest | 4 | - |
| Post-hoc repeated measures ANOVAs exploring interaction effect between condition and time for participants interest in WWI | 3 | - |
| Post-hoc repeated measures ANOVAs exploring interaction effect between interest and time in ET condition | 3 | - |
| Mann-Whitney U comparing self-evaluations of NNF and ET conditions on pre- and post-questionnaires | 1 | - |
| Wilcoxon analyses exploring self-evaluations of participants over time, in the NNF and ET condition | 1 | - |
| Pearson correlations for participants’ pre-assessment scores and post-/delayed post-assessment scores | 2 | - |
| Pearson correlations for participants’ pre-assessment scores and post-/delayed post-assessment scores in the NNF and ET conditions | 4 | - |
| Repeated measures ANOVA with condition and average number of hours spent reading at home in a week as additional factors | 1 | - |
| Post-hoc repeated measures ANOVAs exploring interaction between time and average number of hours spent reading at home | 3 | - |
| Repeated measures ANOVA with condition and whether reading was a hobby as additional factors | 1 | - |
| Post-hoc repeated measures ANOVAs exploring interaction between time and reading as a hobby | 1 | - |
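The adjusted alpha values listed in this appendix can be checked against the formula given at the head of the table, 1 − (1 − α)^(1/k). The sketch below assumes a starting alpha of 0.05, which is not restated in the table but reproduces the tabulated values; the function name is illustrative. This form of the adjustment is often referred to elsewhere as the Šidák correction.

```python
def adjusted_alpha(alpha: float, k: int) -> float:
    """Per-test alpha for k tests, using 1 - (1 - alpha) ** (1 / k)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return 1 - (1 - alpha) ** (1 / k)


# Assumed family-wise alpha of 0.05 (an assumption, see lead-in).
for k in (6, 9):
    print(f"k = {k}: adjusted alpha = {adjusted_alpha(0.05, k):.3f}")
# k = 6: adjusted alpha = 0.009
# k = 9: adjusted alpha = 0.006
```

Under this assumption, groups of nine tests give the 0.006 threshold and groups of six tests give the 0.009 threshold shown in the table.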

Appendix S: Repeated measures ANOVA and post-hocs run excluding anomaly observed on pre-assessment

Table 117: Repeated measures ANOVA with condition as a factor

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Interaction effect (condition*time) | 2,65 | 3.592 | 0.033* | 0.100 |
| Main effect (condition) | 1,66 | 6.631 | 0.012* | 0.091 |
| Main effect (time) | 2,65 | 97.372 | <0.001* | 0.750 |
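As a check on how the effect sizes in Table 117 relate to the F statistics, the sketch below recovers partial eta squared from F and its degrees of freedom. It assumes that the effect-size column of Table 117 reports partial eta squared; the formula is not stated at this point in the thesis, and the function name is illustrative. The same relation is not assumed to hold for every later table.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic and its df."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (label, F, df_effect, df_error) taken from Table 117
rows = [
    ("condition*time", 3.592, 2, 65),
    ("condition", 6.631, 1, 66),
    ("time", 97.372, 2, 65),
]
for label, f_value, df_effect, df_error in rows:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, df_error):.3f}")
# condition*time: 0.100
# condition: 0.091
# time: 0.750
```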

Table 118: Post-hoc paired samples t-tests comparing mean assessment scores over time for each condition

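| Assessments | NNF df | NNF t | NNF p | NNF Cohen’s D | ET df | ET t | ET p | ET Cohen’s D |
|---|---|---|---|---|---|---|---|---|
| Pre to post | 32 | -11.482 | <0.001* | -2.00 | 34 | -8.344 | <0.001* | -1.41 |
| Pre to delayed post | 32 | -9.895 | <0.001* | -1.72 | 34 | -8.170 | <0.001* | -1.38 |
| Post to delayed post | 32 | 1.519 | 0.139 | 0.26 | 34 | 3.160 | 0.003* | 0.53 |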

Table 119: Post-hoc independent samples t-tests comparing assessment scores across conditions

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre-assessment | 58.498³⁶ | 1.914 | 0.061 | 0.46 |
| Post-assessment | 66 | 2.299 | 0.025 | 0.56 |
| Delayed post-assessment | 66 | 2.736 | 0.008* | 0.66 |
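The between-condition effect sizes in Table 119 are consistent with the common conversion d = t × √(1/n₁ + 1/n₂). The sketch below is illustrative only: the group sizes of 33 (NNF) and 35 (ET) are inferred from the paired-test degrees of freedom in Table 118 (32 and 34), and the thesis does not state here which formula was used.

```python
from math import sqrt


def cohens_d_independent(t_value: float, n1: int, n2: int) -> float:
    """Cohen's d for two independent groups, recovered from t and group sizes."""
    return t_value * sqrt(1 / n1 + 1 / n2)


# Group sizes are an inference from Table 118 (df + 1), not a stated fact.
N_NNF, N_ET = 33, 35
for label, t_value in [("Post-assessment", 2.299), ("Delayed post-assessment", 2.736)]:
    print(f"{label}: d = {cohens_d_independent(t_value, N_NNF, N_ET):.2f}")
# Post-assessment: d = 0.56
# Delayed post-assessment: d = 0.66
```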

36 For this t-test, Levene’s test for equality of variances was significant, and therefore equal variances were not assumed.

Appendix T: Post-hoc analyses exploring main effects for time

Post-hoc comparisons for main effect for time observed for assessment questions relating to Intervention 1: War Begins

Post-hoc paired-samples t-tests, collapsing across condition, were conducted to explore the main effect for time further. These t-tests showed a significant difference in mean scores between both pre- and post-assessments, and pre- and delayed post-assessments, both with a large effect size (see table 120), with participants scoring significantly higher in the post- and delayed post-assessments than in the pre-assessments (see table 121). However, there was no significant difference in assessment scores between the post- and delayed post-assessments (see table 120), suggesting that there was no significant decrease in knowledge by the point of the delayed post-assessment.

Table 120: Post-hoc paired-samples t-tests: Intervention 1 (War Begins)

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 72 | -11.290 | <0.001* | -1.32 |
| Pre to delayed post | 72 | -10.194 | <0.001* | -1.19 |
| Post to delayed post | 72 | 1.779 | 0.079 | 0.21 |
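The paired-samples effect sizes reported in these post-hoc tables are consistent with the conversion d = t / √n, where n = df + 1 is the number of paired observations. The sketch below illustrates this with two rows of Table 120; it is an assumption that this is how the values were derived, and the function name is illustrative.

```python
from math import sqrt


def cohens_d_paired(t_value: float, df: int) -> float:
    """Cohen's d for a paired-samples t-test, recovered from t and df."""
    n_pairs = df + 1  # paired t-test: df = n - 1
    return t_value / sqrt(n_pairs)


for label, t_value in [("Pre to post", -11.290), ("Post to delayed post", 1.779)]:
    print(f"{label}: d = {cohens_d_paired(t_value, 72):.2f}")
# Pre to post: d = -1.32
# Post to delayed post: d = 0.21
```

The sign of d simply follows the sign of t: a negative value here indicates that scores were higher at the later assessment.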

Table 121: Mean scores and standard deviations for assessment questions relating to Intervention 1: War Begins

| Assessments | Mean | SD |
|---|---|---|
| Pre | 1.52 | 1.804 |
| Post | 6.34 | 4.187 |
| Delayed post | 5.78 | 4.151 |

Post-hoc comparisons for main effect for time observed for assessment questions relating to Intervention 2: Trench Life

Paired-samples t-tests, collapsing across condition, were conducted to explore the main effect observed for time. These t-tests showed a significant difference in mean scores between both pre- and post-assessments, and pre- and delayed post-assessments, both with a large effect size (see table 122): overall, participants scored significantly higher on the post- and delayed post-assessments than on the pre-assessments (see table 123). No significant difference in assessment scores was observed between the post- and delayed post-assessments (see table 122), suggesting that there was not a significant decrease in knowledge retained between the post- and delayed post-assessments.

Table 122: Post-hoc paired-samples t-tests: Intervention 2 (Trench Life)

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 71 | -9.984 | <0.001* | -1.18 |
| Pre to delayed post | 71 | -9.008 | <0.001* | -1.06 |
| Post to delayed post | 71 | 0.916 | 0.363 | 0.11 |

Table 123: Mean scores and standard deviations for assessment questions relating to Intervention 2: Trench Life

| Assessments | Mean | SD |
|---|---|---|
| Pre | 0.46 | 0.711 |
| Post | 3.68 | 2.930 |
| Delayed post | 3.50 | 3.040 |

Post-hoc comparisons for main effect for time observed for assessment questions relating to Intervention 3: The Home Front

Three paired-samples t-tests, collapsing across condition, were run to further explore the main effect for time. As expected, there was a significant difference in mean scores from pre- to post- and from pre- to delayed post-assessments (see table 124), with a large effect size, with scores increasing over time (table 125). In contrast to the previous two intervention topics, a significant difference between assessment scores was also observed between post- and delayed post-assessments with a small effect size (see table 124), with scores decreasing significantly over time (see table 125).

Table 124: Post-hoc paired-samples t-tests: Intervention 3 (The Home Front)

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 71 | -10.558 | <0.001* | -1.24 |
| Pre to delayed post | 71 | -8.819 | <0.001* | -1.04 |
| Post to delayed post | 71 | 3.581 | 0.001* | 0.42 |

Table 125: Mean scores and standard deviations for assessment questions relating to Intervention 3: The Home Front

| Assessments | Mean | SD |
|---|---|---|
| Pre | 0.24 | 0.617 |
| Post | 3.36 | 2.734 |
| Delayed post | 2.62 | 2.520 |

Post-hoc comparisons for main effect for time observed for assessment questions spanning all three intervention themes

Paired-samples t-tests were conducted to explore the main effect for time observed. These t-tests showed a significant difference in mean scores between both pre- and post-assessments, and pre- and delayed post-assessments, both with large effect sizes (see table 126); participants’ scores increased significantly from both pre- to post- and from pre- to delayed post-assessments (see table 127). No significant difference was observed between post- and delayed post-assessment scores (see table 126).

Table 126: Post-hoc paired-samples t-tests: all intervention themes

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 68 | -12.084 | <0.001* | -1.45 |
| Pre to delayed post | 68 | -9.887 | <0.001* | -1.19 |
| Post to delayed post | 68 | 1.457 | 0.150 | 0.18 |

Table 127: Mean scores and standard deviations for assessment questions spanning all three intervention themes

| Assessments | Mean | SD |
|---|---|---|
| Pre | 1.49 | 1.324 |
| Post | 3.80 | 1.685 |
| Delayed post | 3.49 | 2.019 |

Post-hoc comparisons for main effect for time observed for simple conceptual thinking assessment questions

Three paired-samples t-tests, collapsing across condition, were run to explore the significant main effect observed for time. Overall, there was a significant difference in the mean scores of participants between both pre- and post- and between pre- and delayed post-assessments, both with large effect sizes (see table 128), with participants scoring significantly higher in the post- and delayed post-assessments than in the pre-assessments (see table 129). However, no significant difference was observed in mean scores between post- and delayed post-assessments (see table 128).

Table 128: Paired-samples t-tests: simple conceptual thinking

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 68 | -14.846 | <0.001* | -1.79 |
| Pre to delayed post | 68 | -13.669 | <0.001* | -1.65 |
| Post to delayed post | 68 | -0.748 | 0.457 | 0.06 |

Table 129: Mean scores and standard deviations for simple conceptual thinking questions

| Assessments | Mean | SD |
|---|---|---|
| Pre | 2.57 | 2.061 |
| Post | 8.45 | 4.286 |
| Delayed post | 8.70 | 4.791 |

Post-hoc comparisons for main effect for time observed for complex conceptual thinking assessment questions

Three paired-samples t-tests, collapsing across condition, were run to explore the significant main effect observed for time. It was found that there was a significant difference with a large effect size in the mean scores of participants between both pre- and post- and between pre- and delayed post-assessments (see table 130), with participants scoring significantly higher in both the post- and delayed post-assessments than in the pre-assessments (see table 131). In addition, a significant difference was found between post- and delayed post-assessment scores, with a small effect size (see table 130), with scores significantly decreasing between these two assessments (see table 131).

Table 130: Paired-samples t-tests: complex historical understanding

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 68 | -10.459 | <0.001* | -1.26 |
| Pre to delayed post | 68 | -9.802 | <0.001* | -1.18 |
| Post to delayed post | 68 | 3.210 | 0.002* | 0.39 |

Table 131: Mean scores and standard deviations for complex conceptual thinking questions

| Assessments | Mean | SD |
|---|---|---|
| Pre | 0.12 | 0.404 |
| Post | 3.94 | 3.194 |
| Delayed post | 3.29 | 2.824 |

Post-hoc comparisons for main effect for time observed for chronological thinking assessment questions

Three paired-samples t-tests, which were collapsed across condition, were run to explore the significant main effect observed for time. A significant difference between assessment scores was found between both pre- and post-assessments and between pre- and delayed post-assessments, both with large effect sizes (see table 132), with participants scoring significantly higher in the post- and delayed post-assessments than in the pre-assessments (see table 133). However, no significant difference was observed between post- and delayed post-assessments (see table 132).

Table 132: Paired-samples t-tests: chronological thinking

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 68 | -7.844 | <0.001* | -0.94 |
| Pre to delayed post | 68 | -7.105 | <0.001* | -0.86 |
| Post to delayed post | 68 | 0.218 | 0.828 | 0.03 |

Table 133: Mean scores and standard deviations for chronological thinking questions

| Assessments | Mean | SD |
|---|---|---|
| Pre | 0.62 | 0.893 |
| Post | 2.13 | 1.562 |
| Delayed post | 2.09 | 1.704 |

Post-hoc comparisons for main effect for time observed for causal thinking assessment questions

Three paired-samples t-tests, collapsing across condition, were run to explore the significant main effect observed for time further. A significant difference with a large effect size in the assessment scores of participants was found between both pre- and post- and between pre- and delayed post-assessments (see table 134), with participants scoring significantly higher in the post- and delayed post-assessments than in the pre-assessments (see table 135). However, no significant difference was observed in mean scores between post- and delayed post-assessments (see table 134).

Table 134: Paired-samples t-tests: causal thinking

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre to post | 68 | -7.854 | <0.001* | -0.95 |
| Pre to delayed post | 68 | -8.428 | <0.001* | -1.01 |
| Post to delayed post | 68 | 1.007 | 0.317 | 0.12 |

Table 135: Mean scores and standard deviations for causal thinking questions

| Assessments | Mean | SD |
|---|---|---|
| Pre | 0.48 | 0.964 |
| Post | 2.68 | 2.665 |
| Delayed post | 2.51 | 2.380 |

Appendix U: The influence of ratings of enjoyment of history learning on assessment scores

Two repeated measures ANOVAs were run to explore whether participants’ ratings of enjoyment of history learning had an influence on assessment scores over time. No significant interaction effects were observed for participants’ enjoyment ratings on the pre-questionnaire (see table 136). However, on the post-questionnaire, a significant interaction effect was observed between enjoyment ratings and time, with a moderate effect size (see table 136): the higher participants’ ratings of their enjoyment of history learning, the more progress was made over the course of the assessments (see figure 34). A significant three-way interaction effect was also observed between post-questionnaire enjoyment ratings, time and condition, with a moderate effect size (see table 136). It appears that in the ET condition, those who stated that they love learning about history made more progress in assessment scores over time than participants who gave lower ratings (see figure 35), although this was not the case in the NNF condition (see figure 36). Further post-hoc tests exploring these two interaction effects are discussed below.

Table 136: Repeated measures ANOVAs with condition and rating of enjoyment of history learning as factors

| Questionnaire | Effects | df | F | p | η² |
|---|---|---|---|---|---|
| Pre-questionnaire | Time | 2,60 | 40.036 | <0.001* | 0.466 |
| | Condition | 1,61 | 1.448 | 0.234 | 0.019 |
| | Condition*time | 2,60 | 0.630 | 0.536 | 0.008 |
| | Enjoyment*time | 2,60 | 1.983 | 0.073 | 0.060 |
| | Time*condition*enjoyment | 6,122 | 1.533 | 0.173 | 0.013 |
| Post-questionnaire | Time | 2,59 | 52.149 | <0.001* | 0.495 |
| | Condition | 1,60 | 0.758 | 0.388 | 0.009 |
| | Condition*time | 2,59 | 0.471 | 0.627 | 0.002 |
| | Enjoyment*time | 8,120 | 2.712 | 0.009* | 0.084 |
| | Time*condition*enjoyment | 6,120 | 4.574 | <0.001* | 0.060 |

Figure 34: Mean scores of participants with different ratings of enjoyment of history learning

Figure 35: Mean scores of participants with different ratings of enjoyment of history learning in ET condition

Figure 36: Mean scores of participants with different ratings of enjoyment of history learning in NNF condition

Post-hocs for interaction effect between enjoyment and time

Three further repeated measures ANOVAs, collapsed across condition, were conducted to explore the interaction effect between rating of enjoyment of history learning and time³⁷. A significant interaction effect between time and enjoyment was found between both pre- and post-assessments (with a moderate effect size) and pre- and delayed post-assessments (with a large effect size) (see table 137): the more that participants enjoyed learning about history, the greater progress they made between these assessments (see figure 34 above). However, no significant interaction effect between time and enjoyment was observed from post- to delayed post-assessments (see table 137), suggesting that enjoyment rating did not influence retention of conceptual understanding.

Table 137: Post-hoc repeated measures ANOVAs with enjoyment of history learning as a factor

| Assessments | df | F | p | η² |
|---|---|---|---|---|
| Pre to post | 4,64 | 3.117 | 0.021* | 0.084 |
| Pre to delayed post | 4,64 | 5.102 | 0.001* | 0.143 |
| Post to delayed post | 4,64 | 0.899 | 0.470 | 0.049 |

37 t-tests were not used for post-hocs because there are five levels to the rating of enjoyment variable, meaning that it cannot be used in an independent samples t-test.

Post-hocs for interaction effect between time, condition and enjoyment

Further ANOVAs were conducted to explore the three-way interaction effect observed between time, condition and rating of enjoyment of history learning. As six ANOVAs were run in total to explore this interaction effect, a Bonferroni correction was applied, with an adjusted alpha of 0.009. Firstly, a two-way ANOVA was conducted to explore the interaction between enjoyment rating and time for each condition. At this adjusted alpha, no significant interaction effect between time and enjoyment was observed in either the NNF or the ET condition (see table 138).

Table 138: Repeated measures ANOVAs for NNF and ET conditions, with enjoyment of history learning as a factor

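| Effect | NNF df | NNF F | NNF p | NNF η² | ET df | ET F | ET p | ET η² |
|---|---|---|---|---|---|---|---|---|
| Time | 2,29 | 34.161 | <0.001* | 0.598 | 2,29 | 31.187 | <0.001* | 0.525 |
| Time*enjoyment | 6,60 | 2.862 | 0.016 | 0.086 | 8,60 | 2.402 | 0.026 | 0.153 |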

In addition, repeated measures ANOVAs were run to explore interactions between condition and time for each of the different enjoyment ratings. Only two participants, both in the ET condition, stated that they really did not like history learning, so no analysis was run for this rating. Therefore, four ANOVAs were run in total. A significant interaction effect between condition and time was found for those who stated that they loved learning about history, with a small effect size (see table 139). No significant interaction effects were observed for any other ratings. For those participants who loved history, more progress was made from the pre- to post-assessment in the ET condition than in the NNF condition. However, retention of information between post- and delayed post-assessments showed a sharper decrease in the ET condition, and a slight increase in the NNF condition (see figure 37, below). Additional post-hocs were run to explore this interaction effect further.

Table 139: Repeated measures ANOVAs for ratings of enjoyment of history learning, with condition as a factor

| Rating of history learning | Effect | No. of participants (NNF) | No. of participants (ET) | df | F | p | η² |
|---|---|---|---|---|---|---|---|
| Do not like | Time | 2 | 4 | 2,3 | 10.246 | 0.046 | 0.776 |
| | Condition*time | 2 | 4 | 2,3 | 9.665 | 0.049 | 0.075 |
| Do not mind | Time | 11 | 18 | 2,26 | 34.846 | <0.001* | 0.681 |
| | Condition*time | 11 | 18 | 2,26 | 1.165 | 0.328 | 0.013 |
| Like | Time | 13 | 7 | 2,17 | 30.361 | <0.001* | 0.705 |
| | Condition*time | 13 | 7 | 2,17 | 3.221 | 0.065 | 0.065 |
| Love | Time | 8 | 4 | 2,9 | 45.078 | <0.001* | 0.818 |
| | Condition*time | 8 | 4 | 2,9 | 9.026 | 0.007* | 0.053 |

Figure 37: Mean assessment scores of participants who love learning about history across conditions

Three final repeated measures ANOVAs were conducted to explore the interaction observed above between condition and time for those participants who love learning about history. A significant interaction effect between condition and time was observed between pre- and post-assessments, with a small effect size, and between post- and delayed post-assessments, with a large effect size (table 140). Participants who love history learning made significantly more progress from pre- to post-assessments in the ET condition than in the NNF condition, yet showed a significantly larger decrease in scores from post- to delayed post-assessment. This resulted in both conditions making similar progress from pre- to delayed post-assessments overall. However, these results must be treated with caution: the number of participants in this analysis was very small, and therefore power is reduced. Also, there were twice as many participants in the NNF condition as in the ET condition, creating uneven group sizes.

Table 140: Repeated measures ANOVAs for participants who love history learning, with condition as a factor

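ANOVA results: condition*time

| Assessments | No. of participants (NNF) | No. of participants (ET) | df | F | p | η² |
|---|---|---|---|---|---|---|
| Pre to post | 8 | 4 | 1,10 | 6.025 | 0.034* | 0.054 |
| Pre to delayed post | 8 | 4 | 1,10 | 0.063 | 0.807 | 0.001 |
| Post to delayed post | 8 | 4 | 1,10 | 14.735 | 0.003* | 0.467 |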

Appendix V: The influence of ratings of interest in WWI on assessment scores

Two repeated measures ANOVAs were conducted to explore whether participants’ interest in WWI (as rated on both pre- and post-questionnaires) influenced assessment scores over time. Participants who did not respond to either of the questions on interest were excluded from analyses: the numbers of participants included in analyses are detailed in the tables of results. In terms of participants’ interest in WWI as stated on the pre-questionnaire, a significant interaction effect was observed between time and interest, with a small effect size (see table 141). The same interaction effect was observed on the post-questionnaire, with a small effect size (see table 141). For ratings made on both pre-questionnaires and post-questionnaires, those interested in WWI made greater progress in assessment scores over time than those uninterested in WWI (see table 142). In addition, a significant three-way interaction was observed for pre-questionnaire responses between interest, time and condition, with a small effect size (see table 141). Those interested in WWI showed similar patterns of progression in both the NNF and ET condition, but scored more highly in the NNF condition (see figure 38). Those not interested in WWI made more progress from pre- to post-assessments in the NNF condition, but showed better retention of understanding in the ET condition (see figure 39). Post-hocs exploring the two interaction effects between time and interest are reported below, followed by post-hocs exploring the three-way interaction effect between time, condition and interest observed for pre-questionnaire responses.

Table 141: Repeated measures ANOVAs with condition and interest in WWI as factors

| Questionnaire | Effects | df | F | p | η² |
|---|---|---|---|---|---|
| Pre-questionnaire (65 participants) | Time | 2,60 | 47.608 | <0.001* | 0.518 |
| | Condition | 1,61 | 0.782 | 0.380 | 0.010 |
| | Condition*time | 2,60 | 0.592 | 0.556 | 0.006 |
| | Interest*time | 2,60 | 4.823 | 0.011* | 0.051 |
| | Time*condition*interest | 2,60 | 3.872 | 0.026* | 0.013 |
| Post-questionnaire (69 participants) | Time | 2,64 | 67.047 | <0.001* | 0.592 |
| | Condition | 1,65 | 2.256 | 0.138 | 0.030 |
| | Condition*time | 2,64 | 0.923 | 0.403 | 0.007 |
| | Interest*time | 2,64 | 4.239 | 0.019* | 0.038 |
| | Time*condition*interest | 2,64 | 1.427 | 0.247 | 0.004 |

Table 142: Mean scores and standard deviations for participants who stated an interest/no interest in WWI on the pre-questionnaire and the post-questionnaire

| Questionnaire | Interest in WWI | N | Pre-assessment Mean | Pre-assessment SD | Post-assessment Mean | Post-assessment SD | Delayed post-assessment Mean | Delayed post-assessment SD |
|---|---|---|---|---|---|---|---|---|
| Pre-questionnaire | Yes | 51 | 4.378 | 0.402 | 19.577 | 1.196 | 17.603 | 1.222 |
| | No | 14 | 2.100 | 0.844 | 9.950 | 2.515 | 9.175 | 2.569 |
| Post-questionnaire | Yes | 47 | 4.210 | 0.443 | 19.203 | 1.322 | 17.434 | 1.334 |
| | No | 22 | 2.885 | 0.714 | 11.948 | 2.130 | 10.542 | 2.149 |

Figure 38: Mean scores of participants who stated an interest in WWI on pre-questionnaires across conditions

Figure 39: Mean scores of participants who stated no interest in WWI on pre-questionnaires across conditions

Post-hocs for interaction effect between interest and time on pre-questionnaire

Further planned comparison analyses were conducted to explore where the significant interaction effect between interest and time lies. Nine post-hoc t-tests were conducted in total, and therefore a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six t-tests were run to explore each group’s progress over time (interest (N=51) and no interest (N=14)): three paired samples t-tests were run for those interested in WWI, and three for those not interested in WWI, comparing mean assessment scores from pre- to post-assessments, pre- to delayed post-assessments, and post- to delayed post-assessments. Results can be found in table 143. Those with an interest in WWI made significant increases from pre- to post-assessment and pre- to delayed post-assessment, with a large effect size, but a significant decrease in score from post- to delayed post-assessment, with a small effect size (see previous table 142). Those with no interest in WWI also made a significant increase in scores between pre- and post-assessments and between pre- and delayed post-assessments (see previous table 142), with a large effect size. However, no significant difference was found between post- and delayed post-assessment scores for those not interested in WWI (table 143).

Table 143: Post-hoc paired samples t-tests comparing mean assessment scores over time for those interested and those not interested in WWI

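| Assessments | Interest df | Interest t | Interest p | Interest Cohen’s D | No interest df | No interest t | No interest p | No interest Cohen’s D |
|---|---|---|---|---|---|---|---|---|
| Pre to post | 50 | -14.087 | <0.001* | -1.97 | 13 | -6.673 | <0.001* | -1.78 |
| Pre to delayed post | 50 | -12.121 | <0.001* | -1.70 | 13 | -5.974 | <0.001* | -1.60 |
| Post to delayed post | 50 | 3.237 | 0.002* | 0.45 | 13 | -0.071 | 0.945 | -0.02 |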

Secondly, three further independent samples t-tests were run, comparing groups’ scores on each of the three assessments. Results can be found in table 144. Those interested in WWI scored significantly higher on post- and delayed post-assessments (see previous table 142), each with a large effect size.

Table 144: Post-hoc independent samples t-tests comparing assessment scores across participants with an interest in WWI and with no interest

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre-assessment | 63 | -2.380 | 0.020 | 0.83 |
| Post-assessment | 46.809³⁸ | -5.706 | <0.001* | 1.37 |
| Delayed post-assessment | 41.893 | -4.176 | <0.001* | 1.03 |
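The non-integer degrees of freedom in this table arise where equal variances were not assumed. In that situation the degrees of freedom are typically obtained from the Welch–Satterthwaite approximation, sketched below. The two samples are hypothetical and are not data from this study; the function name is illustrative.

```python
from statistics import variance


def welch_df(sample_a, sample_b):
    """Welch-Satterthwaite degrees of freedom for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    return (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )


# Hypothetical scores: a tight group and a smaller, more variable group.
group_a = [10, 12, 11, 13, 12, 11]
group_b = [2, 9, 15, 4]
print(welch_df(group_a, group_b))
```

When the two groups have equal sizes and equal variances, the approximation reduces to the usual n₁ + n₂ − 2; otherwise it yields a smaller, generally non-integer value.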

Post-hocs for interaction effect between interest and time on post-questionnaire

The same nine post-hoc t-tests were conducted for the post-questionnaire ratings of interest. As multiple tests were run at this point, a Bonferroni correction was applied, with an adjusted alpha of 0.006. Firstly, six t-tests were run to explore each group’s progress over time (interest (N=47) and no interest (N=22)). Results are displayed in table 145. The same pattern was observed for both those interested in WWI and those not interested in WWI: significant increases in score, with a large effect size, were observed between both pre- and post-assessments and between pre- and delayed post-assessments. No significant difference in scores was observed between post- and delayed post-assessments.

38 For both the t-test looking at post-assessments and the t-test looking at delayed post-assessments, Levene’s test for equality of variances was significant, and therefore equal variances were not assumed.

Table 145: Post-hoc paired samples t-tests comparing mean assessment scores over time for those interested and those not interested in WWI

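| Assessments | Interest df | Interest t | Interest p | Interest Cohen’s D | No interest df | No interest t | No interest p | No interest Cohen’s D |
|---|---|---|---|---|---|---|---|---|
| Pre to post | 46 | -12.664 | <0.001* | -1.85 | 21 | -8.683 | <0.001* | -1.85 |
| Pre to delayed post | 46 | -11.621 | <0.001* | -1.70 | 21 | -6.743 | <0.001* | -1.44 |
| Post to delayed post | 46 | 2.557 | 0.014 | 0.37 | 21 | 1.507 | 0.147 | 0.32 |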

Finally, three further independent samples t-tests were run, comparing groups’ scores on each of the three assessments. Results can be found in table 146. There was no significant difference between scores on pre-assessments, but a significant difference was observed between scores on both post- and delayed post-assessments, both with large effect sizes. For both assessments, those with an interest in WWI scored significantly higher than those with no interest (see table 142).

Table 146: Post-hoc independent samples t-tests comparing assessment scores across participants with an interest in WWI and with no interest

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre-assessment | 67 | -2.140 | 0.036 | 0.58 |
| Post-assessment | 59.698³⁹ | -4.152 | <0.001* | 0.99 |
| Delayed post-assessment | 67 | -3.298 | <0.001* | 0.90 |

Post-hocs for interaction effect between time, interest and condition on pre-questionnaire rating of interest in WWI

To explore the three-way interaction between time, interest and condition, four further repeated measures ANOVAs were run in total. Firstly, participants who stated that they had no interest in WWI were selected: a repeated measures ANOVA was run for these participants with condition as a factor. This was repeated for participants who had stated an interest in WWI. For those who were interested in WWI, a significant interaction effect was observed between condition and time, with a small effect size (see table 147): those interested in WWI made more progress over time in the NNF condition than in the ET condition (see previous figure 38). This interaction effect will be explored further shortly.

39 For this t-test, Levene’s test for equality of variances was significant, and therefore equal variances were not assumed.

Table 147: Repeated measures ANOVAs for those who stated an interest/no interest in WWI (on pre-questionnaire rating), with condition as a factor

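| Level of interest | Effects | No. of participants (NNF) | No. of participants (ET) | df | F | p | η² |
|---|---|---|---|---|---|---|---|
| Interest | Time | 28 | 23 | 2,48 | 95.856 | <0.001* | 0.747 |
| | Condition*time | 28 | 23 | 2,48 | 5.035 | 0.010* | 0.020 |
| No interest | Time | 4 | 10 | 2,11 | 18.969 | <0.001* | 0.651 |
| | Condition*time | 4 | 10 | 2,11 | 1.877 | 0.199 | 0.034 |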

Secondly, NNF participants were selected; a repeated measures ANOVA was run with interest in WWI as a factor. This was repeated for ET participants. For those in the ET condition, a significant interaction effect was observed between time and interest, with a small effect size (see table 148): in the ET condition, participants who showed an interest in WWI outperformed those who showed no interest (see figure 40, below). However, it must be noted that the number of participants was uneven across the two groups. The interaction effect observed between condition and time for those interested in WWI and the interaction effect between time and interest observed in the ET condition will be considered further below.

Table 148: Repeated measures ANOVAs for NNF/ET conditions, with interest (as measured on pre-questionnaires) as a factor

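| Condition | Effects | Number of participants (no interest) | Number of participants (interest) | df | F | p | η² |
|---|---|---|---|---|---|---|---|
| NNF | Time | 4 | 28 | 2,29 | 22.102 | <0.001* | 0.482 |
| NNF | Time*interest | 4 | 28 | 2,29 | 2.740 | 0.081 | 0.065 |
| ET | Time | 10 | 23 | 2,30 | 28.261 | <0.001* | 0.584 |
| ET | Time*interest | 10 | 23 | 2,30 | 7.594 | 0.002* | 0.059 |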

Figure 40: Mean scores for ET participants interested in and not interested in WWI

The interaction effect between condition and time observed for participants who stated an interest in WWI will be explored first. Nine post-hoc t-tests were conducted in total, and therefore a Bonferroni correction was applied, giving an adjusted alpha of 0.006. Firstly, those who showed an interest in WWI were selected, and six t-tests were run to explore each condition’s progress over time: three paired samples t-tests were run for the NNF condition (N=28) and three for the ET condition (N=23), comparing mean assessment scores from pre- to post-assessments, pre- to delayed post-assessments, and post- to delayed post-assessments. Results can be found in table 149. Those with an interest in WWI made significant progress from both pre- to post- and from pre- to delayed post-assessments in each condition, each with a large effect size. Those who showed an interest in WWI also showed a significant decrease in scores from post- to delayed post-assessment in the ET condition, with a large effect size, but not in the NNF condition.

Table 149: Post-hoc paired samples t-tests comparing mean assessment scores over time for each condition

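| Assessments | NNF df | NNF t | NNF p | NNF Cohen’s D | ET df | ET t | ET p | ET Cohen’s D |
|---|---|---|---|---|---|---|---|---|
| Pre to post | 27 | -11.801 | <0.001* | -2.23 | 22 | -8.209 | <0.001* | -1.71 |
| Pre to delayed post | 27 | -10.686 | <0.001* | -2.02 | 22 | -7.105 | <0.001* | -1.48 |
| Post to delayed post | 27 | 0.822 | 0.418 | 0.16 | 22 | 4.483 | <0.001* | 0.93 |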
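The effect sizes reported for these paired comparisons are consistent with computing Cohen's d as the t statistic divided by the square root of the number of pairs (d = t / √N). The thesis does not state the formula at this point, so the sketch below is an assumption that happens to reproduce the tabled values; the function name `paired_cohens_d` is hypothetical.

```python
import math


def paired_cohens_d(t_value, n_pairs):
    """Cohen's d for a paired samples t-test, assuming d = t / sqrt(N)."""
    return t_value / math.sqrt(n_pairs)


# (t, N) pairs taken from table 149: NNF (N=28) and ET (N=23).
comparisons = {
    "NNF pre to post": (-11.801, 28),
    "NNF pre to delayed post": (-10.686, 28),
    "NNF post to delayed post": (0.822, 28),
    "ET pre to post": (-8.209, 23),
    "ET pre to delayed post": (-7.105, 23),
    "ET post to delayed post": (4.483, 23),
}

for label, (t_value, n_pairs) in comparisons.items():
    print(label, round(paired_cohens_d(t_value, n_pairs), 2))
```

Under this assumption the rounded results match the Cohen's D columns for both conditions.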

Secondly, those who stated an interest in WWI were selected (N=51), and three further independent samples t-tests were run, comparing conditions’ scores on each of the three assessments. No significant differences were observed between the two conditions on any of the three assessments at the adjusted alpha level (see table 150).

Table 150: Post-hoc independent samples t-tests comparing assessment scores across conditions

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre-assessment | 49 | 2.175 | 0.035 | 0.63 |
| Post-assessment | 49 | 1.787 | 0.080 | 0.50 |
| Delayed post-assessment | 49 | 2.774 | 0.008 | 0.79 |

Next, nine post-hoc t-tests were conducted to explore the interaction effect between interest and time for those participants in the ET condition. Because multiple tests were run at this point, a Bonferroni correction was applied, giving an adjusted alpha of 0.006. Firstly, participants in the ET condition were selected, and six t-tests were run to explore the two groups’ progress over time (interest (N=23) and no interest (N=10)). Results can be found in table 151. In the ET condition, both those who showed an interest in WWI and those who did not made significant progress from pre- to post- and from pre- to delayed post-assessment, with large effect sizes. However, those with an interest in WWI also showed a significant decrease, with a large effect size, in scores from post- to delayed post-assessment in the ET condition.

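Table 151: Post-hoc paired samples t-tests comparing mean assessment scores over time for each group

| Assessments | Interest df | Interest t | Interest p | Interest Cohen’s D | No interest df | No interest t | No interest p | No interest Cohen’s D |
|---|---|---|---|---|---|---|---|---|
| Pre to post | 22 | -8.209 | <0.001* | -1.71 | 9 | -4.562 | 0.001* | -1.44 |
| Pre to delayed post | 22 | -7.105 | <0.001* | -1.48 | 9 | -5.089 | 0.001* | -1.61 |
| Post to delayed post | 22 | 4.483 | <0.001* | 0.93 | 9 | -2.250 | 0.051 | -0.71 |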

Secondly, participants in the ET condition were selected (N=33), and three further independent samples t-tests were run, comparing groups’ scores on each of the three assessments. Those who showed an interest in WWI scored significantly higher than those who did not on post-assessments in the ET condition (see table 152).

Table 152: Post-hoc independent samples t-tests comparing assessment scores across groups

| Assessments | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Pre-assessment | 31 | -0.921 | 0.365 | 0.36 |
| Post-assessment | 28.789⁴⁰ | -3.103 | 0.004* | 1.05 |
| Delayed post-assessment | 26.656 | -1.385 | 0.178 | 0.48 |

40 For t-tests run for post-assessments and delayed post-assessments, Levene’s test for equality of variances was significant, and therefore equal variances were not assumed.

Appendix W: Repeated measures ANCOVAs and ANOVAs for additional factors

Table 153: Repeated measures ANCOVA with condition as a factor and age as a covariate

| Effect | df | F | p | η² |
|---|---|---|---|---|
| Main effect (condition) | 1,66 | 7.595 | 0.008* | 0.103 |
| Main effect (time) | 2,65 | 0.047 | 0.955 | 0.001 |
| Interaction effect (age*time) | 2,65 | 0.089 | 0.915 | 0.003 |
| Interaction effect (condition*time) | 2,65 | 3.976 | 0.023* | 0.109 |

Table 154: Repeated measures ANOVA with condition and gender as factors

| Effect | df | F | p | η² |
|---|---|---|---|---|
| Main effect (condition) | 1,65 | 8.699 | 0.004* | 0.118 |
| Main effect (time) | 2,64 | 98.627 | <0.001* | 0.755 |
| Interaction effect (condition*time) | 2,64 | 4.243 | 0.019* | 0.117 |
| Interaction effect (gender*time) | 2,64 | 2.126 | 0.128 | 0.062 |
| Interaction effect (condition*time*gender) | 2,64 | 0.807 | 0.451 | 0.025 |
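The effect sizes in table 154 are consistent with partial eta squared recovered from each F ratio and its degrees of freedom, using F × df(effect) / (F × df(effect) + df(error)). This is an assumption about how the final column was obtained, not something stated in the table; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# (F, df effect, df error) taken from table 154.
effects = {
    "condition": (8.699, 1, 65),
    "time": (98.627, 2, 64),
    "condition*time": (4.243, 2, 64),
    "gender*time": (2.126, 2, 64),
    "condition*time*gender": (0.807, 2, 64),
}

for label, (f_value, df_effect, df_error) in effects.items():
    print(label, round(partial_eta_squared(f_value, df_effect, df_error), 3))
```

Under this assumption the rounded values agree with the tabled effect sizes for all five effects.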

Table 155: Repeated measures ANOVA with condition and participants’ preferred text types as factors

| Effect | df | F | p | η² |
|---|---|---|---|---|
| Main effect (condition) | 1,61 | 1.480 | 0.228 | 0.024 |
| Main effect (time) | 2,60 | 28.898 | <0.001* | 0.491 |
| Interaction effect (time*condition) | 2,60 | 1.123 | 0.332 | 0.036 |
| Interaction effect (time*preferred text type) | 4,122 | 0.697 | 0.595 | 0.022 |
| Interaction effect (condition*time*preferred text type) | 4,122 | 0.292 | 0.883 | 0.009 |

Table 156: Repeated measures ANOVA with condition and Free School Meals (FSM) as factors

| Effect | df | F | p | η² |
|---|---|---|---|---|
| Main effect (condition) | 1,29 | 12.669 | 0.001* | 0.394 |
| Main effect (time) | 2,28 | 41.222 | <0.001* | 0.746 |
| Interaction effect (time*condition) | 2,28 | 6.286 | 0.006* | 0.310 |
| Interaction effect (time*FSM) | 2,28 | 0.034 | 0.966 | 0.002 |
| Interaction effect (condition*time*FSM) | 2,28 | 0.094 | 0.910 | 0.007 |

Table 157: Repeated measures ANOVA with condition and Pupil Premium (PP) as factors

| Effect | df | F | p | η² |
|---|---|---|---|---|
| Main effect (condition) | 1,29 | 9.840 | 0.004* | 0.253 |
| Main effect (time) | 2,28 | 39.883 | <0.001* | 0.740 |
| Interaction effect (condition*time) | 2,28 | 5.419 | 0.010* | 0.279 |
| Interaction effect (pupil premium*time) | 2,28 | 1.118 | 0.341 | 0.074 |
| Interaction effect (condition*time*pupil premium) | 2,28 | 1.568 | 0.226 | 0.101 |

Table 158: Repeated measures ANOVA with condition and TV as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 85.161 | <0.001* | 0.727 |
| Condition | 1,65 | 5.515 | 0.022* | 0.078 |
| Time*condition | 2,64 | 2.744 | 0.072 | 0.079 |
| Time*TV as a hobby | 2,64 | 0.369 | 0.693 | 0.011 |
| Time*condition*TV as a hobby | 2,64 | 1.520 | 0.226 | 0.045 |

Table 159: Repeated measures ANOVA with condition and board games as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 38.605 | <0.001* | 0.547 |
| Condition | 1,65 | 3.532 | 0.065 | 0.052 |
| Time*condition | 2,64 | 1.693 | 0.192 | 0.050 |
| Time*board games as a hobby | 2,64 | 0.523 | 0.595 | 0.016 |
| Time*condition*board games as a hobby | 2,64 | 0.334 | 0.717 | 0.010 |

Table 160: Repeated measures ANOVA with condition and computer games as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 96.410 | <0.001* | 0.751 |
| Condition | 1,65 | 8.710 | 0.004* | 0.118 |
| Time*condition | 2,64 | 3.688 | 0.030* | 0.103 |
| Time*computer games as a hobby | 2,64 | 2.018 | 0.141 | 0.059 |
| Time*condition*computer games as a hobby | 2,64 | 0.291 | 0.749 | 0.009 |

Table 161: Repeated measures ANOVA with condition and listening to music as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 90.576 | <0.001* | 0.739 |
| Condition | 1,65 | 6.761 | 0.012* | 0.094 |
| Time*condition | 2,64 | 3.481 | 0.037* | 0.098 |
| Time*music as a hobby | 2,64 | 0.055 | 0.946 | 0.002 |
| Time*condition*music as a hobby | 2,64 | 0.046 | 0.955 | 0.001 |

Table 162: Repeated measures ANOVA with condition and playing outside as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 96.246 | <0.001* | 0.750 |
| Condition | 1,65 | 7.079 | 0.010* | 0.098 |
| Time*condition | 2,64 | 3.699 | 0.030* | 0.104 |
| Time*playing outside as a hobby | 2,64 | 0.254 | 0.777 | 0.008 |
| Time*condition*playing outside as a hobby | 2,64 | 0.252 | 0.778 | 0.008 |

Table 163: Repeated measures ANOVA with condition and sport as a hobby as factors

| Effects | df | F | p | η² |
|---|---|---|---|---|
| Time | 2,64 | 100.161 | <0.001* | 0.758 |
| Condition | 1,65 | 8.004 | 0.006* | 0.110 |
| Time*condition | 2,64 | 4.075 | 0.022* | 0.113 |
| Time*sport as a hobby | 2,64 | 1.208 | 0.306 | 0.036 |
| Time*condition*sport as a hobby | 2,64 | 0.305 | 0.738 | 0.009 |

Appendix X: Grouped analyses for Bonferroni calculations

Bonferroni correction, if required: $\alpha_I = 1 - (1 - \alpha)^{1/k}$, where $\alpha_I$ = new alpha, $\alpha$ = alpha, and $k$ = number of tests.

| Analysis | Number of analyses conducted | Bonferroni correction, if required |
|---|---|---|
| Independent samples t-tests comparing frequency of three overarching function codes across conditions | 3 | - |
| MANOVA comparing frequency of three specific function codes across conditions | 3 | - |
| MANOVA comparing frequency of three historical thinking content codes across conditions | 3 | - |
| MANOVA comparing frequency of two historical knowledge content codes across conditions | 2 | - |
| MANOVA comparing frequency of five specific content codes across conditions | 5 | 0.01 |
| Pearson correlations exploring relationships between seven codes of interest | 21 | 0.002 |
| Hierarchical regressions exploring relationships between codes and condition | 6 | 0.009 |
| Pearson correlations exploring relationship between overarching function codes and assessments | 4 | - |
| Pearson correlations exploring relationship between specific function codes and assessments | 6 | 0.009 |
| Pearson correlations exploring relationship between historical thinking codes and assessments | 6 | 0.009 |
| Pearson correlations exploring relationship between historical knowledge codes and assessments | 4 | - |
| Pearson correlations exploring relationship between specific content codes and assessments | 10 | 0.005 |
| Hierarchical regressions exploring relationships between codes and delayed post-assessments | 2 | - |
| Hierarchical regressions exploring relationships between codes and post-assessments | 2 | - |
| MANOVA comparing frequency of three truth-value judgement codes across conditions | 3 | - |
| t-tests comparing frequency of three truth-value justification codes across conditions | 3 | - |
| Chi-squares comparing frequency of truth-value justifications in three individual interventions | 3 | - |
| Pearson correlations between three truth-value judgements and post-/delayed post-assessments | 4 | - |
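The adjusted alpha values listed in this appendix can be reproduced from the stated formula, new alpha = 1 - (1 - alpha)^(1/k). The sketch below assumes a starting alpha of 0.05, which is not stated in the table itself, and uses a hypothetical function name, `adjusted_alpha`. Corrections of this form are often labelled Šidák corrections elsewhere; the simpler alpha / k gives slightly smaller values.

```python
def adjusted_alpha(k, alpha=0.05):
    """Adjusted alpha for k tests: 1 - (1 - alpha) ** (1 / k)."""
    return 1 - (1 - alpha) ** (1 / k)


# Numbers of tests for which an adjusted alpha is reported in the appendix,
# plus k = 9 for the nine post-hoc t-tests in the interest analyses.
for k in (5, 6, 9, 10, 21):
    print(k, round(adjusted_alpha(k), 3))
```

Rounded to three decimal places, k = 6 gives 0.009 under this formula, whereas dividing 0.05 by 6 would give 0.008, which suggests that the tabled values follow the formula as written.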

Appendix Y: Comparisons of raw coding data and arcsine transformed data

Table 164: Independent t-test comparing frequency of three overarching function codes across conditions

Original data

| Code | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Textual understanding | 76 | 1.226 | 0.224 | 0.28 |
| Referencing | 76 | 2.854 | 0.006* | 0.64 |
| Navigation | 76 | 0.610 | 0.544 | 0.14 |

Arcsine transformed data

| Code | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Textual understanding | 76 | -0.914 | 0.364 | 0.21 |
| Referencing | 76 | 2.339 | 0.022* | 0.53 |
| Navigation | 53.883 | 53.883 | 0.212 | 0.28 |

Table 165: MANOVA comparing frequency of three specific function codes across conditions

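Original data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 4.924 | 3,74 | 0.004* | 0.166 |
| Individual independent variables | Collab (cum) | 0.113 | 1,76 | 0.738 | 0.001 |
| Individual independent variables | Collab (neg) | 5.090 | 1,76 | 0.027* | 0.063 |
| Individual independent variables | Hypothetical | 5.732 | 1,76 | 0.019* | 0.070 |

Arcsine transformed data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 6.409 | 3,74 | <0.001* | 0.206 |
| Individual independent variables | Collab (cum) | 4.560 | 1,76 | 0.036* | 0.057 |
| Individual independent variables | Collab (neg) | 4.008 | 1,76 | 0.049* | 0.050 |
| Individual independent variables | Hypothetical | 4.874 | 1,76 | 0.030* | 0.060 |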

Table 166: MANOVA comparing frequency of three historical thinking content codes across conditions

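Original data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 3.395 | 3,74 | 0.022* | 0.121 |
| Individual independent variables | Conceptual | 3.216 | 1,76 | 0.077 | 0.041 |
| Individual independent variables | Causal | 0.004 | 1,76 | 0.952 | 0.000 |
| Individual independent variables | Chronological | 8.389 | 1,76 | 0.005* | 0.099 |

Arcsine transformed data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 5.601 | 3,74 | 0.002* | 0.185 |
| Individual independent variables | Conceptual | 0.357 | 1,76 | 0.552 | 0.005 |
| Individual independent variables | Causal | 5.227 | 1,76 | 0.025* | 0.064 |
| Individual independent variables | Chronological | 11.075 | 1,76 | 0.001* | 0.127 |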

Table 167: MANOVA comparing frequency of two historical knowledge content codes across conditions

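Original data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined dependent variables | Condition | 5.746 | 2,75 | 0.005* | 0.133 |
| Individual dependent variables | Recall | 0.799 | 1,76 | 0.374 | 0.010 |
| Individual dependent variables | Inference | 11.542 | 1,76 | 0.001* | 0.132 |

Arcsine transformed data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined dependent variables | Condition | 2.557 | 2,75 | 0.084 | 0.064 |
| Individual dependent variables | Recall | 3.684 | 1,76 | 0.059 | 0.046 |
| Individual dependent variables | Inference | 1.985 | 1,76 | 0.163 | 0.025 |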

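Table 168: MANOVA comparing frequency of five specific content codes across conditions

Note: Adjusted alpha = 0.01

Original data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 6.189 | 5,72 | <0.001* | 0.301 |
| Individual independent variables | Basic | 5.723 | 1,76 | 0.019 | 0.070 |
| Individual independent variables | Partial | 0.711 | 1,76 | 0.402 | 0.009 |
| Individual independent variables | Mixed | 0.085 | 1,76 | 0.772 | 0.001 |
| Individual independent variables | Complex | 9.067 | 1,76 | 0.004* | 0.107 |
| Individual independent variables | Incorrect | 9.144 | 1,76 | 0.003* | 0.107 |

Arcsine transformed data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 4.491 | 5,72 | 0.001* | 0.238 |
| Individual independent variables | Basic | 8.896 | 1,76 | 0.004 | 0.105 |
| Individual independent variables | Partial | 2.059 | 1,76 | 0.155 | 0.026 |
| Individual independent variables | Mixed | 0.579 | 1,76 | 0.449 | 0.008 |
| Individual independent variables | Complex | 5.978 | 1,76 | 0.017* | 0.073 |
| Individual independent variables | Incorrect | 13.780 | 1,76 | <0.001* | 0.153 |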

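Table 169: MANOVA comparing frequency of three truth-value judgement codes across conditions

Original data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 5.411 | 3,73 | 0.002* | 0.182 |
| Individual independent variables | Entirely factual | 5.214 | 1,75 | 0.025* | 0.065 |
| Individual independent variables | Partially factual | 10.608 | 1,75 | 0.002* | 0.124 |
| Individual independent variables | Entirely fictionalised | 4.992 | 1,75 | 0.028* | 0.062 |

Arcsine transformed data

| Variables | Effect/code | F | df | p | η² |
|---|---|---|---|---|---|
| Combined independent variables | Condition | 3.677 | 3,74 | 0.016* | 0.130 |
| Individual independent variables | Entirely factual | 5.878 | 1,76 | 0.018* | 0.072 |
| Individual independent variables | Partially factual | 4.966 | 1,76 | 0.029* | 0.061 |
| Individual independent variables | Entirely fictionalised | 4.622 | 1,76 | 0.035* | 0.057 |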

Table 170: Independent t-tests comparing frequency of three truth-value justification codes across conditions

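Original data

| Code | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Composition | 53.198 | 3.216 | 0.002* | 0.74 |
| Source | 76 | 0.103 | 0.918 | 0.02 |
| Prior Knowledge | 76 | 0.750 | 0.456 | 0.18 |

Arcsine transformed data

| Code | df | t | p | Cohen’s D |
|---|---|---|---|---|
| Composition | 76 | 1.568 | 0.121 | 0.36 |
| Source | 76 | -0.758 | 0.451 | 0.17 |
| Prior Knowledge | 76 | 0.053 | 0.958 | 0.01 |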
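For readers unfamiliar with the transformation compared throughout this appendix, the sketch below shows one common form of the arcsine transformation, the arcsine of the square root of a proportion. It is an assumption that code frequencies were expressed as proportions and that this exact form was used; the function name `arcsine_sqrt` and the example values are hypothetical.

```python
import math


def arcsine_sqrt(proportion):
    """Arcsine square-root transform of a proportion in [0, 1], in radians."""
    if not 0.0 <= proportion <= 1.0:
        raise ValueError("proportion must lie between 0 and 1")
    return math.asin(math.sqrt(proportion))


# Hypothetical proportions of coded utterances, for illustration only.
raw = [0.02, 0.10, 0.50, 0.90]
transformed = [arcsine_sqrt(p) for p in raw]
print(transformed)
```

The transform stretches values near 0 and 1 relative to those near 0.5, which is why results for rarely used codes may differ between the original and transformed analyses.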
