
Running head: SYSTEMATIC PHONICS

Reconsidering the evidence that systematic phonics is more effective than alternative methods of instruction

Jeffrey S. Bowers

University of Bristol

Author Note:

Jeffrey S. Bowers, School of Experimental Psychology, University of Bristol.

I would like to thank Patricia Bowers, Peter Bowers, Danielle Colenbrander, Rebecca Marsh, Kathy Rastle, Robert Ross, and Gail Venable for comments on previous drafts and Abla Hatherell for help on compiling the data for Figures 2-3.

Correspondence concerning this article should be addressed to Jeffrey S Bowers,

School of Experimental Psychology, 12a Priory Road, Bristol, BS8-1TU. Email j.bow- [email protected] Personal website: https://jeffbowers.blogs.ilrt.org/research/


Abstract

There is a widespread consensus in the research community that reading instruction in English should first systematically teach children letter (grapheme) to sound (phoneme) correspondences rather than rely on meaning-based reading approaches such as whole language instruction. That is, initial reading instruction should emphasize systematic phonics. In this systematic review I show this conclusion is not justified. First, I review and critique experimental studies that have assessed the efficacy of systematic phonics as summarized in 12 meta-analyses and two government reports. Not only are the results and conclusions of these reports often mischaracterized in the literature, there are serious flaws in analyses that undermine the conclusions that are drawn. Second, I review non-experimental studies that have been used to support the conclusion that systematic phonics is most effective. Again, I show the conclusions are not justified. These findings should not be taken as an argument in support of whole language and related methods, but rather, highlight the need for alternative approaches to reading instruction. Third, I consider why the scientific consensus in support of systematic phonics is so at odds with the data, and briefly outline an alternative approach to reading instruction called Structured Word Inquiry (SWI). SWI takes key insights from both systematic phonics and whole language, but goes beyond either approach by teaching children the logic of their writing system.


Reconsidering the evidence that systematic phonics is more effective than alternative methods of reading instruction

There is a widespread consensus in the research community that early reading instruction in English should emphasize systematic phonics. That is, initial reading instruction should explicitly and systematically teach letter (grapheme) to sound (phoneme) correspondences before focusing on the meaning of written words in isolation and in text. This contrasts with the main alternative method, called whole language, in which children are encouraged to focus on the meanings of words embedded in meaningful text, and where letter-sound correspondences are only taught incidentally when needed (Moats, 2000). Within the psychological research community, the “Reading Wars” (Pearson, 2004) that pitted whole language against phonics are largely settled: systematic phonics is claimed to be more effective. Indeed, it is widely claimed that systematic phonics is better than all alternative methods of reading instruction.

The evidence for this conclusion comes from various sources, including government panels that assessed the effectiveness of different approaches to reading instruction in English (e.g., the US National Reading Panel, 2000; a review commissioned by the English government, Rose, 2006), multiple systematic reviews of experimental research, and non-experimental studies that have tracked the progress of students in England since the requirement to teach systematic phonics in state schools was introduced. The results are claimed to be clear-cut. For example, in his review for the English government, Sir Jim Rose writes:

Having considered a wide range of evidence, the review has concluded that the case for systematic phonic work is overwhelming … (Rose, 2006, p. 20).

Many of the most prominent researchers reach similarly strong conclusions. In his recent book on reading entitled “Reading at the Speed of Sight: How We Read, Why So Many Can’t, and What Can Be Done About It”, Mark Seidenberg (2017) writes:

The phonological pathway requires knowing how print relates to sound, the focus on “phonics” instruction… For reading scientists the evidence that the phonological pathway is used in reading and especially important in beginning reading is about as close to conclusive as research on complex human behavior can get. (p. 124)

Similarly, in his book entitled “Raising Kids Who Read: What Parents and Teachers Can Do”, Daniel Willingham (2017) writes:

… there are few topics in educational psychology that have been more thoroughly studied, and for which the data are clearer… it’s clear that virtually all kids benefit from explicit instruction in the [letter-sound] code, and that such instruction is crucial for children who come to school with weak oral language skills.

The cognitive neuroscientist Stanislas Dehaene, author of the best-selling book “Reading in the Brain: The New Science of How We Read”, writes:


It should be clear that I am advocating here a strong ‘phonics’ approach to teaching, and against a whole-word or whole-language approach… theoretical and laboratory-based arguments converge with school-based studies that prove the inferiority of the whole-word approach in bringing about fast improvements in reading acquisition. (Dehaene, 2011, p. 26).

Countless quotes to this effect could have been included.

This strong consensus has resulted in important policy changes in England and the US. Based on the Rose (2006) review, systematic phonics has been a legal requirement in state-funded primary schools in England since 2007, and, in order to ensure compliance, since 2012 all children (ages 5-6) have completed a national “phonics screen” that measures how well they can sound out a set of words and meaningless pseudowords. Similarly, based on the recommendations of the National Reading Panel (NRP, 2000), systematic phonics instruction was included in the Common Core State Standards Initiative in the US (http://www.corestandards.org/). The Thomas Fordham Foundation concluded that the NRP document is the third most influential policy work in US education history (Swanson & Barlage, 2006).

Nevertheless, despite this strong consensus, I will show that there is no evidence that systematic phonics is better than the main alternative methods used in schools, namely, whole language and balanced literacy. Importantly, this should not be taken as an argument in support of these alternative methods; rather, it should be taken as evidence that all the current methods used in schools are far from ideal. Once this is understood, my hope is that researchers and politicians will be more motivated to consider alternative methods.

Structure of Paper

The remainder of the paper is organized in four main sections. First, I review the most common methods of reading instruction. There are some points of overlap between the alternative methods, but a commitment to systematic phonics entails some specific claims about what constitutes effective early reading instruction. Second, I explore the experimental evidence taken to support systematic phonics. The majority of this section is devoted to a detailed review of the existing meta-analyses that assess the efficacy of systematic phonics under a range of conditions, including for beginning readers and children with reading difficulties. The conclusion from this review is simple: There is no evidence that systematic phonics is better than the most common alternative methods used in schools. I finish this section by briefly considering findings from educational neuroscience taken to support systematic phonics, and again show that the conclusions are unjustified. Third, I review non-experimental research that has assessed the impact of requiring systematic phonics in all English state schools since 2007. Again, the findings provide no evidence that systematic phonics has improved reading. Fourth, I briefly outline one reason why so many researchers have endorsed an unjustified conclusion, and outline an alternative approach that is consistent with the current experimental and non-experimental research.

What is Systematic Phonics and What are the Main Alternatives?

All forms of reading instruction are motivated by one or more of the following facts: (1) written words have pronunciations, (2) written words have a meaning, (3) words are composed of parts, including letters and morphemes, (4) written words tend to occur in meaningful text, and (5) the ultimate goal of reading is to extract meaning from text. Different forms of instruction emphasize some of these points and down-play or ignore others, but there is nevertheless some overlap between different methods, and this complicates the task of comparing methods. For example, whole language instruction focuses on understanding words in the context of text, but it also includes some degree of phonics (e.g., Moats, 2000), and this has implications for how the meta-analyses described below can be interpreted. A further complication is that it is widely claimed that systematic phonics should be embedded in a broader literacy curriculum. For instance, the NRP (2000) emphasizes that systematic phonics should be integrated with other forms of instruction, including phonemic awareness, fluency, and comprehension strategies, and again, this makes it more difficult to make claims regarding systematic phonics per se. Because of these complexities, it is important to review systematic phonics and its relation to alternative methods in some detail so that the claims regarding the importance of systematic phonics can be evaluated.

As noted above, systematic phonics explicitly teaches children grapheme-phoneme correspondences prior to emphasizing the meanings of written words. It is called systematic because it teaches grapheme-phoneme correspondences in a specific sequence as opposed to incidentally or on a 'when-needed' basis. Several versions of systematic phonics exist. Practitioners of synthetic phonics (the version mandated in the UK) teach children the pronunciations associated with graphemes in isolation and then coach students to blend the sounds together. For example, a child might be taught to break up the written word <dog> into its component letters, pronounce each letter in turn (/d/, /ɔ/, /g/), and then blend them together to form the spoken word “dog” (Bowey, 2006). By contrast, in analytic phonics, the graphemes of a given word are not read in isolation. Rather, children identify (analyze) words looking for a common target phoneme across a set of words. For instance, children are taught that a set of written words share the letter <d>, which is pronounced /d/ (Moustafa & Maldonado-Colon, 1998). This approach may also teach children larger units of letter-sound correspondences, such as onsets (the initial consonant or consonants of a syllable) and rimes (the vowel and any following consonants). In other words, synthetic phonics goes from “parts to wholes,” starting with letters and phonemes to build up words, whereas analytic phonics goes from “wholes to parts,” starting with words and breaking them into their component parts.

The main alternative to phonics is whole language, which primarily focuses on the meaning of words presented in text. Teachers are expected to provide a literacy-rich environment for their students and to combine speaking, listening, reading, and writing. Students are taught to use critical thinking strategies and to use context to guess words that they do not recognize. Importantly, whole language typically includes some embedded phonics, but the phonics instruction is not systematically taught (e.g., children are taught to sound out words when they cannot guess the word from context). For example, the authors of the NRP (2000) report write:

Whole language teachers typically provide some instruction in phonics, usually as part of invented spelling activities or through the use of graphophonemic prompts during reading (Routman, 1996). However, their approach is to teach it unsystematically and incidentally in context as the need arises. The whole language approach regards letter-sound correspondences, referred to as graphophonemics, as just one of three cueing systems (the others being semantic/meaning cues and syntactic/language cues) that are used to read and write text. Whole language teachers believe that phonics instruction should be integrated into meaningful reading, writing, listening, and speaking activities and taught incidentally when they perceive it is needed. As children attempt to use written language for communication, they will discover naturally that they need to know about letter-sound relationships and how letters function in reading and writing. When this need becomes evident, teachers are expected to respond by providing the instruction.

The fact that whole language (and related methods) includes non-systematic phonics turns out to be critical to the evaluations of the meta-analyses that follow.

Another approach to reading instruction, called balanced literacy, is designed to combine whole language, with its focus on reading for meaning, with systematic phonics. However, it is often claimed that balanced literacy is effectively just another name for whole language, and that the phonics in balanced literacy is neither given enough emphasis nor taught systematically (e.g., Moats, 2000).

Another teaching method is called whole word or sight word training, in which children are taught to identify individual words (out of context) without breaking down the words into phonemes or other sub-lexical parts. For instance, in order to improve word naming, children might be given a list of written words and then one of the words is read aloud. The child’s task is to select the corresponding written word, with the goal of improving their ability to read the word later (McArthur et al., 2013, 2015). Similarly, the look-say-cover-write method is commonly used in whole word instruction to teach children the spelling of words. In this method a child looks at a word, reads it aloud, covers the word up, and then attempts to spell the word (for review, see Browder & Xin, 1998). Although whole word and whole language methods are different in many ways (most notably in whether words are presented in isolation or text), the two methods are often treated equivalently in the meta-analyses described below, and this has important implications for how the meta-analyses can be interpreted.

Morphological instruction, like whole language or balanced instruction, emphasizes the importance of attaching meaning to words, but it also teaches children to break down words into their meaningful parts (prefixes, bases, and suffixes). For review of this method see Carlisle (2010). Related to this, Structured Word Inquiry (SWI) teaches children the interrelation between all the sub-lexical components of words (morphology, phonology, and etymology) in order to make sense of word spellings, with the aim of improving all aspects of literacy, including reading, spelling, vocabulary, and comprehension (Bowers & Kirby, 2010). Like systematic phonics, this approach explicitly teaches children the mappings between graphemes and phonemes, but children are taught how these mappings are organized within morphemes (Bowers & Bowers, 2017, in press). This approach is discussed in more detail in the final section.

Recently, there has also been growing interest in the importance of improving oral language skills early for the sake of improving reading. Unlike the above methods, this approach does not involve written words at all, and accordingly, can in principle be implemented even earlier than phonics (Fricke, Bowyer‐Crane, Haley, Hulme, & Snowling, 2013). Relatedly, there is recent work highlighting the importance of vocabulary for reading comprehension (e.g., Brown, Mohr, Wilcox, & Barrett, 2017; Neuman, Newman, & Dwyer, 2011; Quinn, Wagner, Petscher, & Lopez, 2015; Valentini, Ricketts, Pye, & Houston-Price, 2018), and accordingly, various forms of reading instruction have been designed to increase vocabulary knowledge.

The overlap between methods, and the claim that systematic phonics should be embedded with other methods, make the task of assessing the efficacy of systematic phonics more difficult. Nevertheless, proponents of systematic phonics are committed to two specific claims about what does and does not constitute good instruction, meaning that this approach can be evaluated.

First, it is claimed that systematic phonics should be taught before meaning-based approaches that focus on the meaning of written words in the context of sentences or the meaningful sub-lexical structure of words (e.g., morphological instruction). For instance, Larkin and Snowling (2008, p. 374) write:

In line with the large body of research demonstrating the importance of phonological skills for literacy development (e.g., Caravolas et al., 2001), the findings from the present study emphasize the importance of teaching with an initial focus on mappings between phonemes and graphemes rather than wider language skills such as morphology

Similar claims that grapheme-phoneme correspondences should be taught prior to any morphological instruction are widespread (e.g., Adams, 1994; Ehri & McCormick, 1998; Henry, 1989). One common justification for delaying morphological instruction is the claim that children are not developmentally prepared for it (Frith, 1985; Ehri, 1997; Nunes, Bryant, & Bindman, 1997). And more pragmatically, it is noted that teaching children about the morphological structure of words at the start of instruction reduces the time that children can be taught systematic phonics given limited class-time (e.g., Taylor, Davis, & Rastle, 2017).

Second, it is claimed that grapheme-phoneme correspondences should be taught systematically (as the name suggests). That is, there should be a program of instruction in which all the relevant grapheme-phoneme mappings are taught explicitly over the course of instruction. This contrasts with the non-systematic phonics included in whole language and other alternative methods in which grapheme-phoneme mappings are only taught when a child is struggling with a given word. In non-systematic phonics, there is no explicit program for teaching grapheme-phoneme mappings, and no guarantee that all the key mappings are the target of instruction. The important point for present purposes is the widespread claim that systematic phonics is more effective than alternative methods that include non-systematic phonics (e.g., Galuschka, Ise, Krick, & Schulte-Körne, 2014; McArthur et al., 2012; National Reading Panel, 2000; Torgerson, Brooks, & Hall, 2006; Rose, 2006, 2009).

To summarize, there are a number of different forms of reading instruction, some of which emphasize letter-sound mappings before other properties of words (e.g., systematic phonics), others that emphasize meaning from the start (e.g., whole language) and others that claim that phonology and meaning should be taught together from the beginning (structured word inquiry). There is no disagreement that reading instruction needs to ultimately incorporate both meaning and phonology, but the widespread consensus in the research community is that instruction needs to systematically teach children the grapheme-phoneme correspondences before meaning-based strategies are emphasized. Accordingly, almost all researchers today claim that systematic phonics is better than whole language, balanced literacy, and all forms of instruction that consider morphology at the start. The evidence for this claim is now considered.


A Review of the Experimental Evidence Taken to Support Systematic Phonics

Multiple systematic reviews, meta-analyses, and government reviews have concluded that systematic phonics is the most effective method of reading instruction for English children of different ages and abilities based on the experimental evidence. In this section I provide detailed critiques of 12 meta-analyses and 2 government reports, as well as a recent review of these meta-analyses. I also briefly consider neuroscience findings that have been taken to support systematic phonics.

National Reading Panel (2000) and Ehri et al. (2001) Meta-Analyses

The seminal report most often taken to support the efficacy of systematic phonics compared to alternative methods was produced by the National Reading Panel (NRP, 2000), with the findings later published in peer-reviewed form (Ehri, Nunes, Stahl, & Willows, 2001). The authors carried out a meta-analysis evaluating the effects of systematic phonics compared to forms of instruction that include unsystematic or no phonics across a range of reading measures, including word naming, nonword naming, and text comprehension tasks. The meta-analysis included 66 treatment-control comparisons taken from 38 experiments. A number of key findings were reported, including:

1) An overall effect of systematic phonics instruction on reading (d = 0.41), and the effects persisted after instruction ended when children were tested 4-12 months after the intervention (d = .27).

2) Effects were larger when phonics instruction began by first grade (d = 0.55) rather than after first grade (d = 0.27).

3) Phonics benefited decoding (naming) of regular words (.67), pseudowords (.60), and miscellaneous words (both regular and irregular) (.40), as well as the spelling of words (.35), reading text orally (.25), and comprehending text (.27).

4) Different forms of systematic phonics instruction (including synthetic and analytic) all improved reading performance to the same extent.

The NRP report has been cited over 20,000 times and continues to be used in support of systematic phonics, with over 1000 citations in 2017.
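For readers less familiar with the d statistic used throughout, the following is a minimal sketch of how a standardized mean difference is commonly computed from a treatment group and a control group. It assumes the pooled-standard-deviation form of Cohen's d; the text does not state the exact formula used in the NRP analysis, and the function name and all scores below are hypothetical, invented purely for illustration.

```python
import math


def cohens_d(mean_treat, mean_ctrl, sd_treat, sd_ctrl, n_treat, n_ctrl):
    """Standardized mean difference using the pooled standard deviation.

    Assumption: this is the textbook pooled-SD form of Cohen's d, not
    necessarily the exact estimator used in any report discussed here.
    """
    pooled_var = (
        (n_treat - 1) * sd_treat ** 2 + (n_ctrl - 1) * sd_ctrl ** 2
    ) / (n_treat + n_ctrl - 2)
    return (mean_treat - mean_ctrl) / math.sqrt(pooled_var)


# Hypothetical reading scores (not taken from any study).
d = cohens_d(
    mean_treat=54.1, mean_ctrl=50.0,
    sd_treat=10.0, sd_ctrl=10.0,
    n_treat=30, n_ctrl=30,
)
print(round(d, 2))
```

With these invented numbers the treatment group scores 4.1 points higher on a scale whose standard deviation is 10, which is what an effect of roughly d = 0.41 amounts to in raw terms.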

However, as detailed below, the authors of the NRP report noted that the benefits of systematic phonics were limited in some important ways. And more importantly, there are fundamental methodological, statistical, and conceptual problems with the analyses that undermine the conclusions that were drawn. I consider these points in turn.

Limitations Considered by the NRP Authors Themselves

Perhaps the most important limitation noted by the authors of the report is that systematic phonics did not help children who were labelled “low achieving” poor readers (d = .15, not significant). These were children above first grade who were below average readers and whose cognitive level was below average or was not assessed. By contrast, so-called “students with a reading disability” (children who were below grade level in reading but at least average cognitively, and who were above first grade in most cases) did benefit (d = .32). The finding that the systematic interventions did not benefit the former group is problematic given that many struggling readers have a range of language problems (e.g., Snowling, 2009), and comorbidity among developmental disorders such as dyslexia, language impairment, attention deficit/hyperactivity disorder and developmental coordination disorder is common (Gooch, Hulme, Nash, & Snowling, 2014).

Second, although the meta-analysis is often taken to support the importance of introducing phonics early, the authors of the NRP emphasized that their evidence for this conclusion was weak. One problem is that the majority of older students (78%) in the various studies included in the NRP analysis were either low achieving readers or students with reading disability, and as noted above, systematic phonics was less effective with these populations (especially the former group). Accordingly, the reduced effectiveness of phonics after grade 2 may reflect the reading disorders of the older children rather than the age of the children per se. Unfortunately, it is not possible to carry out the appropriate analysis given that the NRP meta-analysis only included seven comparisons with normally developing older readers, and four of these came from one study by Vickery, Reynolds, and Cochran (1987). This is problematic given that the Vickery et al. (1987) study employed the Orton-Gillingham method, which was developed for younger students with reading difficulty rather than normally achieving older readers. As noted by Ehri et al. (2001):

Other types of phonics programs might prove more effective for older readers without any reading problems, for example, phonics programs that improve the decoding of multisyllabic words. (p. 428)

So the finding that older children did not improve as much as younger children might simply reflect the fact that many of the older children were given the wrong type of phonics instruction. Furthermore, as discussed below, the Vickery et al. (1987) study should not have been included in the NRP analyses because it did not employ a control condition (Camilli, Vargas, & Yurecko, 2003). Given all these considerations, the NRP provides little or no basis for the conclusion that it is important to introduce systematic phonics early.

Third, although the authors emphasized the longevity of the effects, they also noted that the overall effects roughly halved after a delay, with the effect size of the seven relevant studies declining from d = .51 to d = .27. However, the authors did not break down performance across reading measures in the text of the NRP. When looking at the tables, the largest long-term effect was found for naming nonwords (four studies contributing to an estimate of d = .61) and the smallest for comprehension (five studies contributing to an estimate of d = .08), with other measures in between (often with too few studies to produce reliable estimates of effect sizes). Therefore, there is no evidence that the long-term benefits extend to the most important measures of reading.

In sum, based on the authors' own analysis, the appropriate conclusion from the NRP report is that systematic phonics provides small to moderate short-term benefits on word reading, spelling, and comprehension as long as children are average or above on non-reading cognitive measures, and these effects are substantially reduced following a delay of 4-12 months, with no long-term benefits on comprehension. It is hard to reconcile these findings with the widespread claim that the NRP report provides strong support for systematic phonics.

Camilli et al. (2003, 2006)


Although a number of researchers have challenged the conclusions of the NRP report, here I focus on two re-analyses of the NRP dataset carried out by Camilli and colleagues (Camilli et al., 2003; Camilli, Wolfe, & Smith, 2006). These articles have largely been ignored, but they provide the most forceful argument against the conclusions most commonly drawn from the NRP analysis.

Camilli et al. (2003) made two general points regarding the NRP meta-analysis. First, the authors identified a fundamental conceptual problem with the design of the study that means that the results do not mean what most people think. This point is especially important, not only because it plays the most important role in challenging the conclusions of the NRP report, but also because the same conceptual confusion applies to all subsequent meta-analyses (other than Camilli et al., 2006). Second, they identified a number of statistical and methodological errors in the NRP analysis that also contributed to false conclusions. I consider these in turn.

Regarding the conceptual flaw, the NRP meta-analysis did not test the hypothesis that systematic phonics is more effective than common alternative methods used in schools, including whole language. This might seem a surprising claim given statements by the authors of the NRP report, such as the following statement in the abstract:

Students taught systematic phonics outperformed students who were taught a variety of nonsystematic or non-phonics programs, including basal programs, whole language approaches, and whole word programs [bold added]. (NRP, 2000, p. 2-134).

But in fact, the analyses addressed a different hypothesis, as made clear in the following passage:

…findings provided solid support for the conclusion that systematic phonics instruction makes a more significant contribution to children’s growth in reading than do alternative programs providing unsystematic or no phonics instruction [bold added] (NRP, 2000, p. 2-132).

On first reading these two passages might appear to say the same thing. But they do not, and the difference greatly affects the conclusion that can be drawn. The first passage claims that the NRP analysis tested and confirmed the hypothesis that systematic phonics is better than whole language (and other alternative methods). However, as clarified in the second passage above, the meta-analysis in the NRP study compared systematic phonics to a control condition that combined two separate conditions, namely, (1) intervention studies that included unsystematic phonics and (2) intervention studies that included no phonics. Why is this important? Because most whole language intervention studies within the NRP analysis included non-systematic phonics instruction, it is possible that the advantage of systematic phonics in the NRP analyses was due to poor performance in the no-phonics condition, with children in the systematic and non-systematic phonics conditions doing similarly.

This is a striking limitation of the NRP analysis given that the NRP is widely understood to support systematic phonics over whole language. Indeed, this is the question that matters most given that the introduction of systematic phonics is only justified if it improves reading outcomes compared to the most common alternatives (whole language and related balanced literacy instruction). However, the design of the NRP does not even test this hypothesis.
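The logic of this conceptual point can be illustrated with a small numerical sketch. All group means and the common standard deviation below are hypothetical and chosen only to show that a pooled control condition can produce a positive effect for systematic phonics even when systematic and non-systematic phonics perform identically. For simplicity the sketch assumes equal group sizes and a single shared standard deviation, which real studies would not guarantee.

```python
def standardized_difference(mean_a, mean_b, sd):
    """Difference between two group means in standard deviation units."""
    return (mean_a - mean_b) / sd


# Hypothetical mean reading scores for three kinds of instruction.
means = {
    "systematic": 52.0,
    "non_systematic": 52.0,  # identical to systematic by construction
    "no_phonics": 44.0,
}
sd = 10.0  # assumed common standard deviation

# Pooled control: non-systematic and no-phonics groups combined
# (a simple average, assuming equal group sizes).
pooled_control = (means["non_systematic"] + means["no_phonics"]) / 2

d_vs_pooled = standardized_difference(means["systematic"], pooled_control, sd)
d_vs_non_systematic = standardized_difference(
    means["systematic"], means["non_systematic"], sd
)
d_vs_no_phonics = standardized_difference(
    means["systematic"], means["no_phonics"], sd
)

print(d_vs_pooled, d_vs_non_systematic, d_vs_no_phonics)
```

In this invented scenario the comparison against the pooled control yields d = 0.4, yet the comparison that matters for the whole language debate (systematic versus non-systematic phonics) yields d = 0. The pooled comparison alone cannot distinguish this scenario from one in which systematic phonics is genuinely better than non-systematic phonics.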


Of course, the fact that the NRP failed to distinguish between non-systematic phonics and no phonics does not falsify the claim that phonics is better than whole language. It is quite possible that if the relevant analysis were carried out, systematic phonics would be better than non-systematic phonics interventions. This motivated a series of re-analyses of the NRP dataset by Camilli et al. (2003, 2006) in which the studies were explicitly coded with regard to whether they included systematic phonics, non-systematic phonics, or no phonics. The key question was whether systematic phonics was better than non-systematic phonics. However, before discussing results of this new analysis, the statistical and methodological limitations of the NRP report need to be considered.

Camilli et al. (2003, 2006) noted that some moderator variables that may have contributed to the outcomes were ignored by the NRP analysis. Accordingly, when coding the studies, they added the degree to which the interventions included language-based reading activities, whether treatments were carried out in the regular class or involved tutoring outside the class, and whether basal readers were used (if known). Both the experimental and control groups were coded with regard to these moderator variables.

In addition, the Camilli et al. (2003, 2006) analyses were carried out on a slightly modified dataset given problems with some of the studies and conditions included in the NRP report. For example, the new analysis dropped one study (Vickery et al., 1987) that did not include a control condition (an exclusion condition according to the NRP), and included three studies that were incorrectly excluded (they did fulfil the NRP inclusion criterion), resulting in a total of 40 rather than 38 studies. Furthermore, some conditions from some of the NRP studies were inappropriately included or excluded from the analysis in ways that affected the results. For example, Tunmer and Hoover (1993) compared the efficacy of three different reading programs: a Modified Reading Recovery program (that included systematic phonics), a Standard Reading Recovery program (that only included non-systematic phonics), and a Standard Intervention condition that received diverse support services typically available to at-risk readers. In order to assess the impact of systematic phonics, the NRP analysis compared the Modified Reading Recovery group to the Standard Intervention condition and obtained extremely large effect sizes for word identification (d = 2.94), spelling (d = 1.63), nonword reading (d = 1.49), and an implausibly large effect on oral reading (d = 8.79). However, as noted by Camilli et al. (2003), the authors should have compared the Modified and Standard Reading Recovery conditions that specifically differed in terms of systematic phonics (indeed, the authors of the original study considered the Standard Reading Recovery condition the relevant control condition). When this contrast is considered, the effect sizes are recomputed as follows: word identification (d = -0.12), spelling (d = -0.25), nonword naming (d = -0.12), and oral reading (d = 0.12). Similar problems were observed in the NRP analyses of other studies, and the new analyses were designed to overcome these limitations.

The new analyses assessed the impact of systematic phonics compared to non-systematic phonics once new moderator variables were added and the above problems were addressed. They produced quite different outcomes. Specifically, Camilli et al. (2003) reported that the effect size of systematic phonics compared to non-systematic phonics (d = .24) was roughly half the size of the effect of systematic phonics reported in the NRP report (d = .41). An updated analysis by Camilli et al. (2006) on the same dataset that included another moderator variable revealed an even smaller effect of systematic phonics (d = .12) that was no longer significant.

The Camilli et al. (2003) paper did receive some attention, with some authors claiming the results provided additional support for systematic phonics. For instance, Shanahan (2004) characterized the results as “identical to the NRP’s original conclusion” (p. 261). In response, Camilli et al. (2006) wrote:

Contrary to the beliefs of Shanahan (2004), an average effect size of d = .24 is small and falls well below the threshold of a potent educational intervention. Subsequent research has suggested that the effect is smaller still (d = .12) and not statistically significant…

Despite the fact that the Camilli et al. re-analyses challenged the main conclusion of the highly influential NRP report, the findings have largely been ignored. As far as I am aware, only a single paper by Stuebing et al. (2008) provided a detailed critique of the re-analyses. The authors carried out their own analyses on the Camilli et al. (2003, 2006) dataset. When testing the hypothesis that systematic phonics interventions are more effective than a control condition that included both non-systematic and no phonics interventions (the original NRP hypothesis) they obtained similar results, with an effect size of d = .39 as opposed to an NRP effect size of d = .41. Similarly, when comparing systematic phonics to a non-systematic phonics control condition, Stuebing et al. (2008) reported an effect size of .27, again similar to the Camilli et al. (2003) estimate of .24. Stuebing et al. (2008) concluded that the NRP and Camilli analyses were not in fact in conflict, but rather, the analyses simply asked different questions. They wrote:

The NRP question is analogous to asking about the value of receiving the intervention versus not receiving the intervention. The Camilli et al. (2003) report is analogous to asking what is the value of receiving a strong form of the intervention compared to a receiving weaker forms of the intervention and relative to factors that moderate the outcomes. From our view, both questions are reasonable for intervention studies.

But it is not correct to claim that the two questions are equally reasonable if the findings are to be relevant for teaching policy and practice. The relevant question is whether systematic phonics is better than whole language and other existing alternative teaching practices that include non-systematic phonics instruction. There is no reason to make a change if the change does not improve outcomes. And in fact, non-systematic phonics has always been common in alternative methods of instruction. For example, prior to the mandatory requirement to teach systematic phonics in the UK in 2007, Her Majesty’s Inspectorate (1990) reported on the teaching and learning of reading observed in 470 classes and over 2,000 children. They wrote:

“...phonic skills were taught almost universally and usually to beneficial effect” (p. 2) and that “Successful teachers of reading and the majority of schools used a mix of methods each reinforcing the other as the children’s reading developed” (p. 15).

This same point was made with regard to teaching in the USA. As the authors of the NRP (2000) report wrote:

In the present day, whole language approaches have replaced the whole word method as the alternative to systematic phonics programs. The shift has involved a change from very little letter-sound instruction in 1st grade to a modicum of letter-sounds taught unsystematically… Whereas in the 1960s, it would have been easy to find a 1st grade reading program without any phonics instruction, in the 1980s and 1990s this would be rare. (p. 2-102)

Accordingly, the Camilli et al. (2006) reanalysis shows that there is no evidence that systematic phonics is better than the methods that were employed in England prior to the introduction of systematic phonics, nor better than whole language instruction as practiced in the USA in the 1980s and 1990s. In addition, as noted above, the NRP itself claims that its analysis addresses the question of whether systematic phonics is better than whole language and related teaching practices. That is, the NRP report is itself confused about which question it was addressing.

In a follow-up paper, Camilli, Kim, and Vargas (2008) highlighted some fundamental analytical mistakes and conceptual confusions in the Stuebing et al. (2008) critique. But as Camilli et al. (2008) note, even if the Stuebing et al. re-analysis of Camilli et al. (2003) is accepted at face value (which Camilli et al. do not), it is still the case that Stuebing et al. reported that the difference between systematic phonics and non-systematic phonics (.27) is reduced compared to the headline figure of .41 from the NRP report. It is also important to emphasize that this .27 is an overall average effect, and presumably, the effect sizes are smaller when considering the variables that we should care about most (e.g., reading, vocabulary, and text comprehension) compared to the factor that tends to be the largest but that matters least (e.g., nonword reading). But the breakdown of effect sizes across reading measures was not reported. It is also the case that the .27 effect would likely be reduced by approximately half following a short delay of 4-12 months given the pattern of results reported in the NRP analysis. It is also likely an overestimation given that the updated Camilli et al. (2006) analyses estimated the effect of systematic phonics at d = .12.

It is unfortunate that the Camilli et al. (2008) paper was cited only 7 times between 2008 and 2017 according to Google Scholar, with no response from Stuebing and colleagues. During the same period, the NRP meta-analysis was cited ~15,000 times, and it continues to be cited as strong evidence that systematic phonics is more effective than alternative methods, including whole language. Given the above considerations, the NRP report should not be used to make this claim.

To avoid any confusion, it is important to highlight that the Camilli et al. (2006) results do not suggest that grapheme-phoneme knowledge is unimportant to reading. Indeed, their reanalysis showed that systematic phonics is significantly better than a non-phonics control condition. But with regard to the “reading wars” that pitted systematic phonics against whole language, the Camilli et al. (2006) re-analysis of the NRP (2000) data suggests a draw.

Torgerson et al. (2006)

The Torgerson et al. (2006) meta-analysis was primarily motivated by another key limitation of the NRP report not touched on thus far, namely, the fact that the NRP included studies that employed both randomized and non-randomized designs. Given the methodological problems with non-randomized studies, Torgerson et al. (2006) carried out a new meta-analysis that was limited to randomized controlled trials (RCTs). But it is worth noting two additional limitations of the NRP report that motivated this analysis.

First, the authors were concerned that bias played a role in the 13 RCT studies included in the original NRP report given that the NRP report only considered published studies (studies that obtained null effects may have been more difficult to publish). Indeed, the authors carried out a funnel plot analysis on these 13 studies and concluded that the results provided: “…prima facie evidence for publication bias, since it seems highly unlikely that no RCT has ever returned a null or negative result in this field”. Accordingly, Torgerson et al. (2006) searched for unpublished studies that met their inclusion criteria. They found one additional study that reported an effect size of -0.17 that they included in their analyses. Note, the Camilli et al. (2003, 2006) meta-analyses did not consider the role of bias in inflating effect sizes.

Second, Torgerson et al. noted a number of methodological and statistical problems that they attempted to address. For example, the NRP counted children in the control groups multiple times (each child counted four times in one study) in order to extract 66 comparisons from the 38 trials. This spuriously increased the precision of the estimated effects, and accordingly, the Torgerson et al. analysis avoided double counting participants. The authors also removed two studies that should have been excluded from the NRP analyses (Gittelman & Feingold, 1983, because it did not include a phonics instruction intervention group; Mantzicopolous et al., 1992, because the children in the control condition did not receive a reading intervention, and the attrition rate of the study was extreme, with 437 children randomized and only 168 children tested). Note, these studies were included in the Camilli et al. (2003, 2006) re-analysis.
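To see why reusing the same control children inflates precision, consider the following minimal sketch. It is not a reconstruction of either the NRP or the Torgerson et al. analysis: the study sizes, the common effect size, and the choice of correction (splitting the shared control group across comparisons) are all hypothetical assumptions, and the variance formula is the usual large-sample approximation for a standardized mean difference.

```python
import math

def var_d(d, n_treat, n_ctrl):
    """Large-sample variance approximation for a standardized mean difference."""
    return (n_treat + n_ctrl) / (n_treat * n_ctrl) + d ** 2 / (2 * (n_treat + n_ctrl))

def pooled_se(variances):
    """Standard error of a fixed-effect (inverse-variance) pooled estimate."""
    return math.sqrt(1.0 / sum(1.0 / v for v in variances))

# Hypothetical study: four treatment arms of 20 children each,
# all compared against one shared control group of 20 children.
D, N_TREAT, N_CTRL, ARMS = 0.4, 20, 20, 4

# Naive approach: each comparison reuses all 20 control children
# and the four comparisons are treated as if they were independent.
se_naive = pooled_se([var_d(D, N_TREAT, N_CTRL)] * ARMS)

# One common correction (an assumption here): split the control group
# so that each control child is counted only once across comparisons.
se_split = pooled_se([var_d(D, N_TREAT, N_CTRL // ARMS)] * ARMS)

print(f"naive SE = {se_naive:.3f}, split-control SE = {se_split:.3f}")
```

Under these assumed numbers the naive standard error is roughly 0.16 against roughly 0.25 once each control child is counted once, so the same data look considerably more precise than they are.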

In total, the authors were able to identify 12 studies that compared systematic phonics to a control condition that included unsystematic phonics or no phonics instruction (same number as in the original NRP report, with one study dropped, one unpublished study added). The key positive result was with regard to word reading accuracy, with an effect size estimated to be between .27 and .38 (depending on assumptions built into the analyses). By contrast, no significant effects were obtained for comprehension (d estimates ranging between .24 and .35), or spelling (d = .09).

There are, however, reasons to question the significant word reading accuracy results. This result was largely due to one outlier study (Umbach et al., 1989) that obtained a massive effect on word reading accuracy (d = 2.69).¹ In this study, the control group was taught by two regular teachers with help from two university-supervised practicum students, whereas the experimental group was taught by four master's degree students who were participating in a practicum at a nearby university. Accordingly, there is a clear confound in the design of the study. As reported by Torgerson et al. themselves, when this study was excluded the effect was much reduced (d estimates between .20 and .21) and the effect was only marginal on one analysis (p = .03) and non-significant (p = .09) on another. And once again there was evidence that bias may have inflated the estimate of effect sizes in this study. As Torgerson et al. wrote:

In addition, the strong possibility of publication bias affecting the results cannot be excluded. This is based on results of the funnel plot... It seems clear that a cautious approach is justified (p. 48).

1 For some reason the NRP report estimated the word reading accuracy effect size to be 1.3 whereas Torgerson et al. reported it to be 2.69. Umbach and Darch did not report standardized effect sizes themselves, but reported that the word identification scores from the Woodcock Reading Mastery subtest were 30.43 and 10.36 in the experimental and control conditions, respectively.

If the impact of the outlier study and the bias are considered together (as they should be), no significant effect on word reading accuracy would be obtained. Relatedly, the authors questioned the quality of the existing research, writing:

…none of the 14 trials reported method of random allocation or sample size justification, and only two reported blinded assessment of outcome… all were lacking in their reporting of some issues that are important for methodological rigour. Quality of reporting is a good but not perfect indicator of design quality. Therefore due to the limitations in the quality of reporting the overall quality of the trials was judged to be ‘variable’ but limited.

Another key point to emphasize is that Torgerson et al. once again compared systematic phonics to a control condition that combined non-systematic phonics and no phonics (as in the NRP report). So again, this makes it inappropriate to draw any conclusions regarding whether systematic phonics is better than whole language and other methods that include non-systematic phonics. Nevertheless, despite all these issues, including those raised by Torgerson et al. themselves, the authors concluded:

Two of the main findings of the current review supported those of Ehri et al. (2001), namely that systematic phonics instruction enables children to make better progress in reading accuracy than unsystematic or no phonics, and that this is true for both normally-developing children and those at risk of failure.

These conclusions are not justified. Indeed, the study does not even test the hypothesis that systematic phonics is better than non-systematic phonics.

Rose (2006)

It is worth briefly mentioning the Rose (2006) report that led to the legal requirement to teach systematic synthetic phonics in UK state schools. The main goal of the report was not to compare systematic phonics to alternative methods, but rather, to compare two different sorts of systematic phonics, namely, analytic vs. synthetic phonics. Rose concludes that synthetic phonics is the better approach, but most relevant for present purposes, claims that both forms of systematic phonics are better than alternative methods, writing:

Having considered a wide range of evidence, the review has concluded that the case for systematic phonic work is overwhelming and much strengthened by a synthetic approach (Rose, 2006, p. 20).

Although this report is frequently cited as providing strong support for systematic synthetic phonics, it did not provide a systematic review of the research literature, and research was only one line of evidence that was relied on. Other considerations were theoretical arguments, school visits, and testimonials (e.g., Rose, 2006, notes that one teacher commented “I have never seen results like this in 30 years of teaching”, p. 63). What is quite striking is how little weight was given to the actual research in reaching the recommendation, as shown by quotes such as:

The review’s remit requires a consideration of ‘synthetic’ phonics… ‘through examination of the available evidence and engagement with the teaching profession and education experts’. Having followed those directions, and notwithstanding the uncertainties of research, [bold added] there is much convincing evidence to show from the practice observed that, as generally understood, ‘synthetic’ phonics is the form of systematic phonic work that offers the vast majority of beginners the best route to becoming skilled readers.

This dismissive view of research is again highlighted when Rose (2006) brushed aside serious criticisms of the ‘Clackmannanshire studies’ (Johnston & Watson, 2003, 2004, 2005; Watson & Johnston, 1998) that provided the main empirical support for synthetic phonics. Rose (2006) writes:

Although the research methodology had received some criticism by researchers, the visit provided the review with first-hand evidence of very effective teaching and learning of phonic knowledge and skills of the children in the P1 classes, as well as much useful contextual information, which was also associated with their success. Focusing on the practice observed in the classroom and its supportive context, rather than debating the research, is therefore not without significance for this review. [bold added]

Wyse and Goswami (2008) provide a detailed analysis of why the conclusion that synthetic phonics is better than analytic phonics is not supported by the Rose (2006) report. But most important for present purposes, this report provides no empirical evidence that systematic phonics is better than non-systematic phonics, despite many claims to the contrary (the report has been cited over 1500 times).

McArthur et al. (2012)

This meta-analysis was designed to assess the efficacy of systematic phonics with children, adolescents, and adults with reading difficulties. The authors included studies that used randomization, quasi-randomization, or minimization (which minimizes differences between groups for one or more factors) to assign participants to either a phonics intervention group, or a control group that received no training or alternative training that did not involve any reading activity (e.g., math training). That is, the control group received no phonics at all. Based on these criteria the authors identified 11 studies that assessed a range of reading outcomes, although the amount of data for the different outcome measures varied considerably. Critically, the authors found a significant effect on word reading accuracy (d = 0.47, p = 0.03) and nonword reading accuracy (d = 0.76, p < 0.01), whereas no significant effects were obtained in word reading fluency (d = -0.51; expected direction), reading comprehension (d = 0.14), spelling (d = 0.36), and nonword reading fluency (d = 0.38, the unexpected direction). Additional analyses also failed to observe an effect of type of systematic phonics, training intensity (less or more than two hours per week), training duration (less or more than three months), training group size (one-on-one versus small group training), or training administrator (human vs. computer administration). The authors note that the small sample sizes for many of these dependent measures may explain the non-significant results obtained.

The authors also provide a detailed analysis of possible bias in studies. Although only one study explicitly described their random sequence procedure (Ford, 2009), and no study was reported to be blind regarding either allocation to condition or analysis of the datasets, a funnel plot carried out on the word reading accuracy scores (the only dependent measure to have enough observations for this method) did not provide evidence for bias.

Based on the results, the authors concluded that phonics improved performance, but they were also cautious in their conclusions, writing:

…there is a widely held belief that phonics training is the best way to treat poor reading. Given this belief, we were surprised to find that of 6632 records, we found only 11 studies that examined the effect of a relatively pure phonics training programme in poor readers. While the outcomes of these studies generally support the belief in phonics, many more randomised controlled trials (RCTs) are needed before we can be confident about the strength and extent of the effects of phonics training per se in English-speaking poor word readers.

Nevertheless, the results are frequently taken to provide good evidence in support of systematic phonics.

But there are reasons to question even these modest conclusions of McArthur et al. One notable feature of the word reading accuracy results is that they were largely driven by two studies (Levy et al., 1997; Levy et al., 1999) with effect sizes of d = 1.12 and d = 1.80, respectively. The remaining 8 studies that assessed word reading accuracy reported a mean effect size of .16 (see appendix 1.1, page 63). This is problematic given that the children in the Levy studies were trained on one set of words and then reading accuracy was assessed on another set of words that shared either onsets or rhymes with the trained items (e.g., a child might have been trained on the word beak and later be tested on the word peak; the stimuli were not presented in either paper). Accordingly, the large benefits observed in the phonics conditions compared to a non-trained control group show only that training generalized to highly similar words. The results therefore do not provide any evidence that word reading accuracy improved in general (the claim of the meta-analysis). In addition, both Levy et al. studies taught systematic phonics using one-on-one tutoring. Although McArthur et al. reported that group size did not have an overall impact on performance, their supplemental analyses revealed that the overall effect of training group size was numerically large (d = .62) and approached significance (p = .07). Furthermore, this analysis of training group size conflates one-on-one training when the child worked on a computer by him/herself (three studies, with average effect size d = .25) and the most relevant comparison, namely, one-on-one training studies with a tutor (three studies, with average effect size of d = .93). Accordingly, the large effect size for word reading accuracy may be more the product of one-on-one training with a tutor than of any benefits of phonics per se, consistent with the findings of Camilli et al. (2003).
Absent these two studies, there is no evidence that word reading accuracy improved in the systematic phonics condition, leaving only a benefit for nonword reading.

And once again, this meta-analysis suffers from the limitation of all the previous meta-analyses, namely, it did not compare performance in a systematic vs. unsystematic phonics condition. Indeed, in this meta-analysis, performance in a systematic phonics condition was compared to no extra training at all, or to training on non-reading tasks. Accordingly, this analysis cannot be used to make any claims that systematic phonics is better than standard alternative methods, such as whole language.

Galuschka et al. (2014)

Galuschka et al. carried out a meta-analysis of randomized controlled studies that focused on children and adolescents with reading difficulties. The authors identified twenty-two trials with a total of 49 comparisons of experimental and control groups that tested a wide range of interventions: five comparisons evaluated reading fluency training, three phonemic awareness instruction, three reading comprehension training, 29 phonics instruction, three auditory training, two medical treatments, and four interventions with colored overlays or lenses. Outcomes were divided into reading and spelling measures.

The authors noted that only phonics produced a significant effect, with an overall effect size of g’ = .32, and concluded:

This finding is consistent with those reported in previous meta-analyses... At the current state of knowledge, it is adequate to conclude that the systematic instruction of letter-sound-correspondences and decoding strategies, and the application of these skills in reading and writing activities, is the most effective method for improving literacy skills of children and adolescents with reading disabilities.

However, there are serious problems with this conclusion. Most notably, the overall effect size observed for phonics (g’ = .32) was similar to the outcomes with phonemic awareness instruction (g’ = .28), reading fluency training (g’ = .30), auditory training (g’ = .39), and colour overlays (g’ = .32), with only reading comprehension training (g’ = .18) and medical treatment (g’ = .12) producing numerically reduced effects. The only reason significant results were obtained only for phonics is that there were many more phonics interventions. In order to support their conclusion, the authors need to show an interaction between the phonics results and the alternative methods. They did not report such an analysis, and given the similar size effects across conditions (with small sample sizes), this analysis would almost certainly not be significant.
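The point that significance here tracks the number of comparisons rather than the size of the effect can be illustrated with a rough sketch. The effect sizes and comparison counts are those given above; everything else is an assumption made for illustration, in particular the single per-comparison sampling variance (`ASSUMED_VARIANCE`, a hypothetical value) and the simple fixed-effect pooling, neither of which is taken from Galuschka et al.

```python
import math

# (effect size g', number of comparisons) as reported in the text above.
INTERVENTIONS = {
    "phonics": (0.32, 29),
    "reading fluency": (0.30, 5),
    "phonemic awareness": (0.28, 3),
    "reading comprehension": (0.18, 3),
    "auditory training": (0.39, 3),
    "medical treatment": (0.12, 2),
    "coloured overlays": (0.32, 4),
}

ASSUMED_VARIANCE = 0.15  # hypothetical sampling variance of one comparison
Z_CRIT = 1.96            # two-sided 5% criterion

def z_score(g, k, variance=ASSUMED_VARIANCE):
    """z statistic for a pooled effect of k equally precise comparisons."""
    pooled_se = math.sqrt(variance / k)
    return g / pooled_se

significant = {
    name: abs(z_score(g, k)) > Z_CRIT for name, (g, k) in INTERVENTIONS.items()
}

for name, (g, k) in INTERVENTIONS.items():
    print(f"{name:22s} g' = {g:.2f}  k = {k:2d}  z = {z_score(g, k):.2f}")
```

Under this assumed variance only phonics crosses the conventional threshold, even though auditory training has a numerically larger effect and coloured overlays an identical one. A different assumed variance would shift the z values, but not the dependence on the number of comparisons.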

Furthermore, the authors reported evidence that the results in the published phonics studies were biased using a funnel plot analysis. Using a method called Duval and Tweedie’s trim and fill they measured the extent of publication bias and estimated an unbiased effect size for systematic phonics of g’ = 0.198. Although this small effect was still reported to be significant, the authors (once again) did not assess whether systematic phonics was more effective than non-systematic phonics, let alone show that systematic phonics is more effective than the alternative methods they investigated.

Suggate (2010, 2016)

Suggate (2010) carried out a meta-analysis to investigate the relative advantages of systematic phonics, phonological awareness, and comprehension-based interventions with children at risk of reading problems. The central question was whether different forms of interventions were more effective with different age groups of children who varied from preschool to Grade 7.

The meta-analysis included peer-reviewed randomized and quasi-experimental studies, with control groups receiving either typical instruction or an alternative “in-house” school reading intervention. Three studies (one of each sort of intervention) reported effects larger than 2.00, and these were excluded as outliers. This led to the inclusion of 85 studies, with 116 interventions. Of these interventions, 13 were classified as phonological awareness, 36 as phonics, 37 as comprehension based, and 30 as mixed. Twelve studies were conducted with participants who did not speak English. A range of dependent measures were assessed, including pre-reading (e.g., letter knowledge, phonemic/sound awareness), reading, and comprehension measures.

Averaging over age, similar overall effects were found for phonological awareness (d = .47), phonics (d = .50), meaning-based (d = .58), and mixed (d = .43) interventions. The critical novel finding, however, was that there was a significant interaction between method of instruction and age of child, such that phonics was most useful in Kindergarten for reading measures, but alternative interventions were more effective for older children. As Suggate (2010) writes:

If reading skills per se are targeted, then there is a clear advantage for phonics interventions early and, taking into account sample sizes and available data, comprehension or mixed interventions later.

However, this is not a safe conclusion. First, the difference in effect size in phonics compared to alternative methods was approximately d = .10 in Kindergarten and .05 in Grade 1 (as estimated from Figure 1 in Suggate, 2010). This is not a strong basis for arguing the importance of early systematic phonics. Furthermore, Suggate notes that “there were too few preschool phonological awareness studies to include in Figure 1, preventing strong conclusions about phonics versus [phonological awareness] interventions in preschool”, making any conclusions regarding the importance of systematic phonics in the early years even more tentative. And once again, the treatments were compared to a control condition that combined a range of teaching conditions, and accordingly, it is again unclear whether there was a difference between systematic vs. non-systematic phonics during early instruction.

It is also worth noting that the effect sizes of all interventions may have been overestimated given other factors. For example, 10% of the studies included in the meta-analysis were carried out with non-English-speaking children. Although the overall difference between non-English (d = 0.61) and English (d = .48) studies was reported as nonsignificant, the difference in the effect sizes was borderline (p = .06). Indeed, the phonics intervention that reported the very largest effect size (d = 1.37) was carried out with Hebrew speakers (Aram & Biron, 2004), and this study contributed to the estimate of the phonics effect size in pre-kindergarten. Accordingly, the small advantage of phonics for early English instruction is inflated. In addition, the effect sizes were larger for quasi-experimental studies (d = 0.64) than for randomized-control designs (d = 0.41), p < .001, again raising concerns that the effect size estimates for early phonics instruction were inflated.

But perhaps the most critical limitation is that Suggate’s (2010) conclusion was not supported in a subsequent Suggate (2016) meta-analysis that considered the long-term effects of various forms of reading interventions, as discussed next.

The Suggate (2016) meta-analysis included 71 experimental and quasi-experimental reading interventions that assessed the short- and long-term impacts of phonemic awareness, phonics, fluency, and comprehension interventions on pre-reading, reading, reading comprehension, and spelling measures. The analysis revealed an overall short-term effect (d = 0.37) that decreased in a follow-up test (d = 0.22; with mean delay of 11.17 months). This drop of approximately 50% following a delay is similar to that reported in the NRP (2000).

What is most surprising, however, is that phonics and fluency interventions were the most short-lived whereas comprehension and phonemic awareness interventions produced the best long-term benefits. Specifically, the short- and long-term effects were as follows: phonics, d = .29 and d = .07; fluency, d = .47 and d = .28; comprehension, d = .38 and d = .46; phonemic awareness, d = .43 and d = .38. This poor long-term outcome of phonics was observed despite the fact that the mean grades of children receiving phonics (a mixture of reception and grade 1 children, with mean grade = .45; reception was coded as grade 0) and phonemic awareness (mean grade = .5) were similar, and lower than those of children receiving comprehension (mean grade = 3.1) and fluency (mean grade = 1.25) interventions. Accordingly, this most recent meta-analysis not only found phonics to be the least effective method, but also, the findings contradict the common claim that earlier instruction is more effective than later instruction (e.g., NRP, 2000).
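The contrast between short- and long-term effects is easier to see as a proportion retained. The following short sketch simply divides each long-term effect size by its short-term counterpart, using the values listed above; the "proportion retained" summary is an illustrative convenience introduced here, not a statistic reported by Suggate (2016).

```python
# (short-term d, long-term d) for each intervention type, as listed above.
EFFECTS = {
    "phonics": (0.29, 0.07),
    "fluency": (0.47, 0.28),
    "comprehension": (0.38, 0.46),
    "phonemic awareness": (0.43, 0.38),
}

# Proportion of the short-term effect still present at follow-up.
retained = {name: long / short for name, (short, long) in EFFECTS.items()}

# Print interventions from least to most durable.
for name, ratio in sorted(retained.items(), key=lambda item: item[1]):
    short, long = EFFECTS[name]
    print(f"{name:20s} {short:.2f} -> {long:.2f}  retained = {ratio:.0%}")
```

On these figures phonics retains about a quarter of its short-term effect, fluency about 60%, phonemic awareness close to 90%, and comprehension shows a larger effect at follow-up than immediately after the intervention.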

As with the other meta-analyses, there are additional issues that should be raised. For example, a funnel plot provided evidence for publication bias, especially in the long-term condition, and again, the study does not contrast systematic vs. non-systematic phonics. It is striking that long-term benefits of systematic phonics are so small despite these factors that should be expected to inflate effect sizes. The important implication of this work is that any small benefits of systematic phonics observed in the previous meta-analyses (once all the concerns with them are addressed) are short-lived.

Other meta-analyses and a systematic review of meta-analyses

There are a number of additional relevant meta-analyses and reviews of meta-analyses that should be mentioned briefly as well.

Hammill and Swanson (2006). These authors took a different approach to Camilli et al. (2003, 2006, 2008) in criticizing the NRP (2000) report. Rather than challenging the logic and analyses themselves, they noted that the effect sizes reported in the NRP were small and questioned their significance.

The NRP reported that systematic phonics instruction was effective across a variety of conditions, with 94% of the d values supporting the superiority of phonics instruction over other approaches. However, as noted by Hammill and Swanson, the standard convention in evaluating the magnitude of d values (d = .2 is small, d = .5 medium, and .9 large) reveals that 65% of the significant d values were small. In order to get a better intuitive understanding of the practical significance of the results, the authors converted all these d values to r-type statistics. They noted that the overall effect of .44 corresponds to an r² value of .04. That is, 96% of the variance in reading achievement can be attributed to factors other than the systematic phonics instruction.² The r² value for the follow-up analysis (4-12 months later) was .02.
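For readers who want to check the conversion from d to variance explained, here is a minimal sketch. It assumes the standard formula for two groups of equal size, r = d / sqrt(d² + 4); the text does not say which conversion Hammill and Swanson actually applied, so this is an illustration rather than a reproduction of their calculation.

```python
import math

def d_to_r(d):
    """Convert Cohen's d to a point-biserial r, assuming equal group sizes."""
    return d / math.sqrt(d ** 2 + 4)

def r_squared(d):
    """Proportion of variance explained implied by a given d."""
    return d_to_r(d) ** 2

# The two overall NRP effect sizes mentioned in the text.
for d in (0.44, 0.41):
    r2 = r_squared(d)
    print(f"d = {d:.2f} -> r = {d_to_r(d):.3f}, r^2 = {r2:.3f}, "
          f"unexplained = {1 - r2:.1%}")
```

With this formula both d = .44 and d = .41 imply that a little over 4% of the variance is explained, in line with the rounded figure of .04 quoted above.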

What Hammill and Swanson do not acknowledge, however, is that these small effect sizes translate into real benefits when considering an entire population of children. The real problem is not that the effect sizes in the NRP report were too small to be of any value, it is that the effect sizes were inflated for the reasons noted above, and further, that the meta-analysis did not even test the critical hypothesis of whether systematic phonics is better than alternative methods such as whole language (that typically include non-systematic phonics instruction).

2 Note, the overall effect size reported in the NRP is estimated as .44 or .41, depending on the specific tests that are included in the analyses.

Han (2010) and Adesope, Lavin, Thompson, and Ungerleider (2011). These authors reported meta-analyses that assessed the efficacy of phonics for non-native English speakers learning English. Han (2010) included 5 different intervention conditions and dependent measures and reported the overall effect sizes as 0.33 for phonics, 0.41 for phonemic awareness, 0.38 for fluency, 0.34 for vocabulary, and 0.32 for comprehension. In the case of Adesope et al. (2011), the authors found that systematic phonics instruction improved performance (g = +.40), but they also found that an intervention they called Collaborative Reading produced a larger effect (g = +.48), as did a condition called Writing (Structured & Diary) that produced an effect of g = +.54. Accordingly, ignoring all other potential issues discussed above, these studies do not provide any evidence that phonics is the most effective strategy for reading acquisition.

Sherman (2007). Sherman compared phonemic awareness and phonics instruction with students in grades 5 through 12 who read significantly below grade level expectations. Neither method was found to provide a significant benefit.

Torgerson et al. (2018). Torgerson et al. recently carried out a brief systematic review of existing meta-analyses that assessed the efficacy of systematic phonics. They identified 12 meta-analyses, all of which were considered above. The authors raise a number of concerns regarding design and publication bias of studies included in these meta-analyses and argue that more data (in the form of large randomized controlled studies) are needed before strong conclusions can be drawn. Indeed, they even challenge the common claim that systematic phonics is more effective than balanced literacy instruction that combines systematic phonics with whole language, writing:

Given the evidence from this tertiary review, what are the implications for teaching, policy and research? It would seem sensible for teaching to include systematic phonics instruction for younger readers – but the evidence is not clear enough to decide which phonics approach is best. Also, in our view there remains insufficient evidence to justify a ‘phonics only’ teaching policy; indeed, since many studies have added phonics to whole language approaches, balanced instruction is indicated [bold added].

It is striking how these conclusions are at odds with the many strong claims regarding the benefits of systematic phonics.

Still, the authors are too positive regarding the benefits of systematic phonics given the existing evidence. In part, this is due to the way the authors summarize the findings they do report. But more importantly, it is the consequence of not considering some of the key limitations of the meta-analyses discussed above.

With regards to their own summary of the meta-analyses, they stated that 10 of the 12 meta-analyses showed that there were significant benefits of systematic phonics on at least one reading measure, with effect sizes ranging from small to moderate (Ehri et al. 2001; Camilli, Vargas, and Yurecko 2003; Torgerson, Brooks, and Hall 2006; Sherman 2007; Han 2010; Suggate 2010; Adesope et al. 2011; McArthur et al. 2012; Galuschka et al. 2014; Suggate 2016). Furthermore, they note that non-significant positive effects were found in the remaining meta-analyses (Camilli, Wolfe, and Smith 2006; Hammill and Swanson 2006). This provides a reasonably positive take on the findings, consistent with their conclusion that teaching should include systematic phonics (perhaps along with other methods).

One problem with this description of the results is that it does not indicate which measures tended to be significant over the meta-analyses. In fact, as discussed above, most meta-analyses failed to obtain significant effects for the measure we should care about most, namely, reading comprehension. Indeed, only 1 of 12 studies reported significant effects in comprehension, and this effect was lost after a delay (NPR, 2000). And this characterization of the findings obscures the fact that the benefits did not always extend to the children who are below average in their cognitive capacities (NPR, 2000).

This summary also does not highlight the fact that many of the 12 meta-analyses observed larger effect sizes for non-phonics interventions. For example, Table 3 of Torgerson et al. (2018) shows that synthetic phonics did not produce the largest effect in 5 of the 12 meta-analyses (Adesope et al. 2011; Camilli, Vargas, and Yurecko 2003; Camilli, Wolfe, and Smith 2006; Han 2010; Suggate 2016). In addition, as discussed above, Galuschka et al. (2014) reported similar effect sizes for phonics, phonemic awareness instruction, reading fluency training, and auditory training (g’ = .39), and numerically, the largest effect was obtained with colour overlays. This is not the pattern of results that one might expect to find in a meta-analysis that is commonly taken to support systematic phonics.

In addition, when claiming that 10 of the 12 meta-analyses reported significant benefits of systematic phonics, this included the positive outcomes of Ehri et al. (2001) and Camilli et al. (2003). But this is misguided accounting. The Camilli et al. (2006) meta-analysis was carried out on the same set of studies as the earlier Ehri et al. (2001) and Camilli et al. (2003) analyses (after removing one study and adding 3 from the NRP dataset), and it failed to obtain a significant benefit of systematic phonics. That is, the Camilli et al. (2006) analysis shows that the earlier analyses overestimated the impact of systematic phonics. So rather than providing two sources of evidence for the effectiveness of systematic phonics, the correct conclusion from these three meta-analyses is that systematic phonics does not improve reading. Similarly, Torgerson et al. describe the Suggate (2010) meta-analysis as providing evidence that systematic phonics is most effective for the youngest children, but this conclusion was challenged by a subsequent Suggate (2016) meta-analysis that failed to obtain long-term benefits of systematic phonics. Indeed, the Suggate (2016) finding suggests that the benefits for phonics reported in all the previous meta-analyses are only short-lived.

In addition, the claim that 10 of the 12 meta-analyses reported a significant benefit for systematic phonics does not incorporate a key point highlighted by Torgerson et al. (2018) elsewhere in their review, namely, the evidence that publication and method bias has inflated these effect sizes in at least some of these meta-analyses. When you combine all the above points with the evidence for bias, it is clear that the above summary of the meta-analyses is not appropriate.

Importantly, there are other aspects of the studies included in these 12 meta-analyses that Torgerson et al. (2018) did not consider that further undermine the claim that systematic phonics is better than alternative methods. As detailed above, there were multiple examples of methodological errors in the meta-analyses (counting participants multiple times in ways that artificially inflated the power of studies, Torgerson et al., 2006; excluding studies that should have been included given the inclusion criteria, Camilli et al., 2003; and including studies that should have been excluded given the exclusion criteria, Camilli et al., 2003; Torgerson et al., 2006), examples of including flawed studies that strongly biased the findings in support of systematic phonics (e.g., Umbach et al., 1989; Levy, 1997, 1999), and examples of including non-English studies that biased the results in support of systematic phonics (Suggate, 2010), amongst others. These errors consistently biased the estimated effects of systematic phonics upwards.

Most importantly, however, Torgerson et al. (2018) did not address the key point identified by Camilli et al. (2003, 2006) that compromises all other meta-analyses, namely, systematic phonics was compared to a control condition that included both non-systematic phonics and non-phonics conditions (or only included a non-phonics condition in the case of McArthur et al., 2012). Accordingly, these meta-analyses did not even test the hypothesis that systematic phonics is more effective than whole language and other methods that include non-systematic phonics. Nevertheless, despite being cautious in their conclusion, Torgerson et al. (2018) conclude that teaching should “… include systematic phonics instruction for younger readers” (p. 27).

Summary of Meta-Analyses and Government Reports

The above review of existing meta-analyses and government reports provides no evidence that systematic phonics is better than standard alternative methods used in schools, such as whole language and balanced literacy. Once again, it is important to emphasize that these findings do not challenge the importance of learning grapheme-phoneme correspondences, but these mappings are learned well enough with whole language (and related methods that include incidental phonics) such that the reading outcomes are equivalent with systematic phonics and whole language. Despite this, these same meta-analyses are widely claimed to provide strong or even “overwhelming” (Rose, 2006) evidence for systematic phonics. There can be few areas in psychology in which the research community so consistently reaches a conclusion that is so at odds with available evidence.

Educational Neuroscience and Systematic Phonics

Before moving on to review the non-experimental research, it is worth briefly considering neuroscience results that have been used to support systematic phonics. Researchers adopting this approach claim that brain measures provide unique insights into the mechanisms that support skilled reading and the causes of reading difficulties, and this in turn is claimed to provide insights into how to better design reading interventions. A colorful example of this reasoning can be found in the following analogy:

My firm conviction is that every teacher should have some notion of how reading operates in the child’s brain. Those of us who have spent many hours debugging computer programs or repairing broken washing machines (as I have done) know that the main difficulty in accomplishing these tasks consists in figuring out what the machine actually does to accomplish a task. To have any hope of success, one must try to picture the state in which it is stuck, in order to understand how it interprets the incoming signals and to identify which interventions will bring it back to the desired states.

Children’s brains can also be considered formidable machines whose function is to learn. Each day spent at school modifies a mind-boggling number of synapses. Neuronal preferences switch, strategies emerge, novel routines are laid down, and new networks begin to communicate with each other. If teachers, like the repairman, can gain an understanding of all these internal transformations, I am convinced that they will be better equipped to discover new and more efficient education strategies . . . (Dehaene, 2009, p. 232–233).

Adopting this logic, a number of studies have found that areas of the brain associated with phonological processing are abnormally activated in dyslexia (e.g., Rumsey et al., 1992; Shaywitz & Shaywitz, 2005). This is taken to support systematic phonics in order to improve phonological processing in these children. In addition, a number of studies have found that this abnormal brain activation is normalized following an intervention of systematic phonics (Barquero, Davis, & Cutting, 2014; Eden et al., 2004). Again, this is claimed to provide strong empirical support for systematic phonics. Indeed, the results from neuroimaging studies of reading are thought to provide some of the best evidence to date for a new field called Educational Neuroscience in which brain measures are used to improve teaching across a range of domains (Howard-Jones, 2014).

The first problem with using neuroscience to make claims regarding reading instruction is that there are a range of empirical problems with the relevant neuroscience literature. This includes the fact that the imaging results are quite mixed, with struggling readers showing abnormal activation in many non-phonological brain regions, and non-phonological regions showing normalization following a reading intervention (Bowers, 2017). Similarly, Bishop (2013) reviewed all studies between 2003 and 2011 that reported measures of brain function in children before and after an intervention for language learning difficulties (including reading interventions) and found serious methodological problems in them. These empirical limitations seriously weaken the claims that are made regarding how to best teach reading.

The more important problem, however, is conceptual (Bowers, 2017). Unlike a washing machine, the brain can learn, and there may be various ways to complete a given task. Accordingly, if a neuroscience study provides a new insight into the cause of a learning disorder, it is unclear whether or not interventions should directly target the underlying disorder. Another possibility is that an intervention might be most effective when it is designed to enhance compensatory skills rather than ameliorating the deficit itself. For example, if neuroscience provides evidence that a dyslexic child has a phonological processing deficit, it might be best to develop a reading intervention that emphasizes morphological (semantic) processes (as discussed in the final section). Or perhaps instruction should be designed to combine a compensatory approach along with targeting the deficit itself (e.g., phonics). The neuroscience findings by themselves cannot tell us whether the intervention should target the deficit or not. The only way to determine which approach is most effective is to run behavioral studies, ideally randomized controlled studies, as reviewed above. That is, psychology, not neuroscience, is the discipline that should be informing education, and in the case of reading, the psychology literature has failed to support systematic phonics over whole language and related methods that include non-systematic phonics. For a debate regarding the value of educational neuroscience, see the following exchange (Bowers, 2016a,b; Gabrieli, 2016; Howard-Jones et al., 2016).

Review of the Non-Experimental Evidence Used to Support Systematic Phonics

Rather than carry out experiments, another approach has been to assess the impact of England’s recent introduction of mandatory systematic phonics in state schools on reading outcomes. Below I review three sets of findings of this sort relevant to assessing the efficacy of systematic phonics. Again, it is widely claimed that the findings support systematic phonics, and again, the conclusions do not stand up to scrutiny.

Machin, McNally, and Viarengo (2016)

The authors took advantage of the fact that systematic phonics instruction was phased in slowly in different Local Authorities in England, and accordingly, it was possible to compare children who were part of the phonics trial with children who received standard instruction on various standardized language measures.

In 2005 the “Early Reading Development Pilot” (ERDp) that involved 18 Local Authorities and 172 schools began, with each school receiving funding for a dedicated learning consultant who trained teachers in systematic phonics (typically for 1 year). Then in 2006, the “Communication, Language and Literacy Development Programme” (CLLD) that included a further 32 Local Authorities began, again with each school receiving 1-year funding for a dedicated learning consultant. In order to assess the immediate efficacy of introducing systematic phonics, scores from the communication, language, and literacy components of the Foundation Stage assessment were collected (when children completed Year 1 at age 5). And in order to assess the long-term effects of this intervention, reading scores from Key Stage 1 tests (when children were 7 years of age) and reading scores from Key Stage 2 tests (when children were 11) were collected. These are standardized tests given to all students in state schools, with teachers providing the assessment in the Foundation Stage and Key Stage 1, and the tests externally marked in Key Stage 2. Various statistical methods were used to control for the differences between the schools included in the trials and those not included, and moderator variables included the impact of language background (native English or not) and economic background (operationalized as children receiving or not receiving a free school lunch).

For the ERDp sample the authors reported a highly significant effect of systematic phonics on the Foundation Stage assessment immediately after the intervention (.298), but the effect dissipated on Key Stage 1 tests (.075), and was eliminated on the Key Stage 2 tests (-.018). Similarly, with the CLLD treatment, an initially robust effect (.217) was reduced on the Key Stage 1 tests (.017), and then was lost on the Key Stage 2 tests (.019). So much like the Suggate (2016) meta-analysis, the phonics intervention effect did not persist. However, Machin et al. (2016) highlighted that the effects did persist in the Key Stage 2 tests in the CLLD treatment condition for non-native speakers (.068) and economically disadvantaged children as measured by their receipt of free school meals (.062), with both effects significant at the p < .05 level. They took these small effects to show that phonics does provide long-term benefits for children who are most in need of literacy interventions.

Without a doubt it is high enough to justify the fixed cost of a year’s intensive training support to teachers. Furthermore, it contributes to closing gaps based on disadvantage and (initial) language proficiency by family background.

However, there are both statistical and methodological problems with using these findings to support the efficacy of systematic phonics. With regards to the statistics, it is important to note that the ERDp sample of children did not show a significant advantage for non-native speakers (.045) or for economically disadvantaged children (.050) on the Key Stage 2 tests. Furthermore, for the ERDp sample, there was a tendency for more economically advantaged native English children (not in receipt of free school meals) to read more poorly in the phonics condition in the Key Stage 2 test (-.061), p < .1. As the authors write: “It is difficult to know what to make of this estimate” (p. 22). Note, this negative outcome is of a similar magnitude to the long-term benefits enjoyed by non-native speakers (.068) and economically disadvantaged children (.062) in the CLLD treatment condition, and accordingly, it is difficult to brush this finding aside.

More importantly, this study did not include the appropriate control condition. The advantages in Foundation and Key Stage 1 were the product of intensive training support to teachers in Year 1, but it is possible that similar outcomes would result if training support was given to teachers in whole language instruction, or any other method. As was the case with most of the above meta-analyses, the conclusion the authors drew was not even tested.

Recent Success of English Children on PIRLS

A great deal of attention in the mainstream and social media has been given to the recent success of English children in the “Progress in International Reading Literacy Study” (PIRLS) carried out in 2016. PIRLS assesses reading comprehension in fourth graders across a wide range of countries every five years: 35 countries participated in 2001, 38 in 2006, 48 in 2011, and 50 in 2016. Many supporters of systematic phonics have noted how far up the league table England has moved since 2006 given that systematic phonics was mandated in English state schools in 2007, and the phonics check was introduced in 2012. Specifically, England was in 15th position in 2006 (with a score of 539), joint 11th position in 2011 (score 552), and joint 8th in 2016 (score 559).

In response to the most recent results, Mr Gibb, the Minister of State at the Department for Education, said:

The details of these findings are particularly interesting. I hope they ring in the ears of opponents of phonics whose alternative proposals would do so much to damage reading instruction in this country and around the world.

A Department for Education report for the UK (December, 2016) reported:

The present PIRLS findings provide additional support for the efficacy of phonics approaches, and in particular, the utility of the phonics check for flagging pupils’ potential for lower reading performance in their future schooling.

Similarly, Sir Jim Rose, author of the Rose (2006) report, used “the spectacular success of England shown in the latest PIRLS data” as further evidence in support of systematic synthetic phonics (Rose, 2017).

The London-based newspaper The Telegraph published an article entitled “Phonics revolution: Reading standards in England are best in a generation, new international test results show” (December 5th 2017) that also made the case that the PIRLS results are attributable to the introduction of systematic phonics and the phonics check in England. This headline has received approximately 11.5K hits when entered in Google, and the link to the article has been tweeted or retweeted by many leading academic researchers, again giving the strong impression that there is evidence in support of systematic phonics.

However, once again, these conclusions are unjustified. One important fact ignored in the above story is that English children did well in 2001, ranking 3rd (scoring 553). Of the 6 countries that completed the PIRLS test on all 4 occasions (England, New Zealand, Russian Federation, Singapore, Sweden, and USA), England has gone from 2nd to 3rd position. If the introduction of systematic phonics is used to explain the improved performance from 2006-2016, how is the excellent performance in 2001 explained?

It is also interesting to note that Northern Ireland participated in the last two PIRLS, and they did better than England, ranking 5th and 6th in 2011 and 2016 respectively. This is relevant as the Reading Guidance for Key Stage 1 published by the “Northern Ireland Education & Library Boards” does not include the words “systematic phonics”, nor do children complete the phonics check that was introduced to improve the administration of phonics in English schools. Of course, reading instruction in Northern Ireland does teach children letter-sound correspondences, but this is carried out along with a range of methods that encourage children to encode the meaning of words and passages. For instance, according to the Reading Guidance for Key Stage 1, when children encounter an unknown word, various strategies for naming the word are encouraged, including phonics, using knowledge of context (semantics), and using knowledge of grammar (syntax). This is similar to the National Literacy Strategy in place in England from 1998 to 2006 (prior to the introduction of systematic phonics) that recommended phonics as one of four ‘searchlights’ for learning to read, along with knowledge of context, grammatical knowledge, and graphic knowledge. If the introduction of systematic phonics is used to explain the strong performance of England in 2016, how is the even better performance of Northern Ireland explained?

A final point worth emphasizing is that the PIRLS test assesses reading comprehension, and as noted above only 1 of the 12 meta-analyses reported a benefit for comprehension (NPR, 2000), and only at a short delay, with the effects lost after a year (even putting aside all the problems of the meta-analyses). Attributing the PIRLS gains to phonics is hard to reconcile with all existing experimental research.

The Improving Performance on the Phonics Screening Check in England

Since 2012 the UK government has required all children in state schools in England to complete a phonics screening check in Year 1 in order “to confirm that all children have learned phonic decoding to an age-appropriate standard” (Department for Education, 2012; p. 4). The phonics screening check is composed of 20 one- and two-syllable real words (e.g., day, grit, shin) and 20 pseudowords that can only be read on the basis of learned grapheme-phoneme correspondences (e.g., fape, blan, geck). Children near the end of Year 1 are asked to read the words and pseudowords aloud, with each item marked correct or incorrect. A child who correctly names aloud 32 items (80% of all items) is said to ‘meet the standard’, whereas a child who misses the standard is to be given further support to improve their phonics knowledge (and complete the phonics check again in Year 2).

The test has proved highly controversial (e.g., Grundin, 2018; Wrigley, 2017), but what is critical for present purposes is that performance on the task has improved from 58% of students meeting the standard in 2012 to 81% in 2017 (see Figure 1 for performance across all years). Indeed, in a pilot introduction of the phonics screening test carried out in 300 schools in 2011, only 32% of children met the standard. Before considering this result in any detail, it is perhaps worth noting that the large improvement since 2011 seems somewhat surprising given that systematic phonics was introduced in UK state schools in 2007. In order to explain the results, it must be assumed that systematic phonics in England was poorly administered prior to the phonics check (from 2007-2012), and that the introduction of the phonics check dramatically improved systematic phonics instruction, resulting in improved phonics check results for most children.

Figure 1 Scores on the phonics screening check from 2012-2017.

Despite the impressive improvements on the phonics screening check results, it is difficult to conclude that the findings lend strong support to systematic phonics. One minor point worth mentioning is that the distribution of scores on the phonics screening check has been highly unusual in every year it has been administered, with a consistent spike in performance at the score of 32 (the criterion for meeting the standard), as can be seen in Figure 1. The cut-off score of 32 was announced to teachers in 2012 and 2013, and the score has remained the same ever since, suggesting that teachers have been marking children up in order to pass them. In response to this distribution observed in 2012 Bishop (2012) wrote:

This is most unlikely to indicate a problem inherent in the test itself. It looks like human bias that arises when people know there is a cutoff and, for whatever reason, are reluctant to have children score below that cutoff. As one who is basically in favour of phonics testing, I’m sorry to put another cat among the educational pigeons, but on the basis of this evidence, I do query whether these data can be trusted.

Although the bias is not as marked from 2014-2017, the bias has clearly continued.
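
One simple way to quantify a spike of this kind is to compare the number of children scoring exactly at the cut-off with the number scoring one mark below it. The sketch below is illustrative only: the function name and the score counts are invented for the example and are not taken from the phonics screening check data.

```python
def cutoff_spike_ratio(counts, cutoff=32):
    """Ratio of children scoring exactly at the cutoff to those one mark below.

    `counts` maps each score to the number of children obtaining it.
    In a smooth distribution adjacent scores have similar counts, so the
    ratio should be close to 1; a much larger value indicates a spike.
    """
    below = counts.get(cutoff - 1, 0)
    if below == 0:
        raise ValueError("no children scored one mark below the cutoff")
    return counts.get(cutoff, 0) / below


# Hypothetical counts of children per score (invented, not real data).
toy_counts = {30: 900, 31: 600, 32: 3000, 33: 2800}
print(cutoff_spike_ratio(toy_counts))
```

A ratio well above 1 at the cut-off, with no comparable jump between other adjacent scores, is the pattern that would be expected if borderline children were being marked up.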

Putting aside this issue, if we grant that the introduction of the phonics check has dramatically improved phonics instruction, and this in turn has improved the pseudoword and word naming results on the phonics screening check, the important question is whether these modifications have translated to improved reading more generally. One obvious way to test this is to compare the phonics check results with the National Curriculum Assessments (SATs) carried out at Key Stage 1 and 2 during the years 2012-2017. These are the same tests analyzed by Machin et al. (2016) above (although they analyzed data from before 2012). The results of the phonics check and the Key Stage 1 SAT scores are displayed in Figure 2.

Figure 2. Key Stage 1 SAT results from 2006-2017 and phonics screening check results from 2012-2017. [Chart: percentage of pupils (50% to 100%) by year (2006 to 2017); series: Reading, Writing, Maths, Science, phonics check.]

There have been some claims that improved performance on the phonics screening check is associated with improved performance on the SATs. For instance, Buckingham (2016) writes:

There has also been an improvement in Key Stage 1 (Year 2) reading and writing results since the introduction of the Phonics Screening Check. The proportion of students achieving at or above the target reading level hovered around 85% from 2005 to 2011 but steadily increased to 90% in 2015. There was an even greater improvement in writing in the same period ― a seven percentage point increase. (p. 16).

But this characterization of the findings is inconsistent with a report from the Department for Education (Walker, Sainsbury, Worth, Bamforth, & Betts, 2015). The authors analyzed the reading and writing scores for the KS1 for the two years preceding and following the introduction of the screening check and concluded:

The evidence offered by these analyses is therefore inconclusive in identifying any impact of the PSC on literacy performance at KS1 or on progress in literacy between ages five and seven.

Why the different conclusions? One key point to note is that although the SAT scores did start slowly increasing in 2012 (consistent with Buckingham, 2016), it is not possible to attribute these gains to the phonics screening check because these children completed Year 1 in 2011, and accordingly, were never given the check. As noted by Walker et al. (2015):

These analyses of national data therefore indicate small improvements in attainment at KS1, which were a feature before the introduction of the check and continued at a similar pace following the introduction of the check.

In addition, as can be seen from Figure 2, there is little evidence that the SAT scores for reading and writing improved more than the SAT scores for maths or science between 2013-2015.3

Another important question that can be asked is whether the introduction of the phonics check was associated with improved reading skills at Key Stage 2 when children were in Year 6 (age 11). That is, did the improved teaching of phonics in Year 1 (in response to the phonics check) have any long-lasting effect on reading outcomes? The results from 2017 provide the first relevant data given that children who completed these Key Stage 2 SATs were the first to complete the phonics check in 2012 in Year 1. As can be seen in Figure 3, the reading results went slightly down for the 2017 cohort compared to the 2016 cohort who did not receive the phonics check (while writing results went slightly up). Again, these findings provide no evidence that improved phonics instruction supports long-lasting reading improvement. This is consistent with Suggate’s (2016) meta-analysis that showed early systematic phonics had no long-lasting impact on reading performance, as well as the non-experimental analysis of Machin et al. (2016) who found that the introduction of phonics had negligible long-term effects on SAT scores.

3 Note, the tests changed in 2016, and this accounts for the large dip in performance.

Figure 3

Key Stage 2 SAT results and phonics check scores from 2006-2017

[Line chart not reproduced. Vertical axis: percentage, 50% to 95%; horizontal axis: year, 2007 to 2017; series: Reading, Writing, Maths, Science, phonics check.]

Summary of Non-Experimental Studies

Despite the widespread claim within the research community, mainstream media, and social media that children are reading better in England since the mandatory inclusion of systematic phonics in state schools in 2007 and the introduction of the phonics screening test in 2012, there is little evidence to support this conclusion. Perhaps the best evidence is that word and pseudoword naming has improved on the phonics screening check (ignoring potential problems with the scoring of these data as detailed in Figure 1), but there is little evidence that this improvement is associated with improved short- or long-term effects on reading more generally. It is hard to reconcile the findings with the many strong claims regarding the efficacy of systematic phonics.

Again, this should not be taken as evidence that grapheme-phoneme correspondences are unimportant to reading, nor should it be taken to support whole language and related methods. Rather, it should be taken to challenge the claim that systematic phonics is highly effective, and it should motivate researchers and teachers to look for better methods.

Explaining the Widespread Support for Systematic Phonics and Proposing an Alternative Approach

How can it be that the scientific consensus is so at odds with the data? No doubt sociological and political factors have played a role given the previous acrimonious debate (the “reading wars”) between proponents of phonics and whole language (Pearson, 2004). But I would like to suggest that the most important reason why so many reading researchers are committed to systematic phonics is that most researchers have a mistaken theory of how the English writing system works. As I detail below, this theoretical confusion has not only contributed to a strong bias in support of systematic phonics, it has also made it difficult for researchers to consider alternative approaches that may be more successful. I conclude this section by outlining an alternative approach to reading instruction consistent with both theory and the data reviewed above.

The Mistaken Theory That the English Spelling System is Alphabetic

The main theoretical motivation for phonics for English is the claim that the English spelling system follows the alphabetic principle, with letters representing speech sounds (phonemes). Indeed, over 11,000 articles have used the phrase “alphabetic principle” (as of January 2018 according to Google Scholar), and in the vast majority of these papers (in the sample I have read), this principle is used to motivate phonics instruction for English speaking children. That is, it is argued that early reading instruction should focus on grapheme-phoneme correspondences because this is the foundation of the English writing system. This line of argument is explicit in the National Curriculum for English programmes of study: Key Stages 1 and 2 in England (Department for Education, 2013) where it is written:

Skilled word reading involves both the speedy working out of the pronunciation of unfamiliar printed words (decoding) and the speedy recognition of familiar printed words. Underpinning both is the understanding that the letters on the page represent the sounds in spoken words. This is why phonics should be emphasised in the early teaching of reading to beginners (i.e. unskilled readers) when they start school.

The problem with this argument is that the premise is wrong – the English spelling system does not follow the alphabetic principle. The obvious manifestation of this is that the mappings between graphemes and phonemes in English are highly inconsistent. Even when considering the monosyllabic words included in The Children’s Printed Word Database (Masterson, Stuart, Dixon, & Lovejoy, 2010), ~16% are “irregular” in the sense that they have unexpected pronunciations according to phonics (Bowers & Bowers, 2018a,b), and additional sources of inconsistencies arise in multisyllabic and multimorphemic words (Mousikou, Sadat, Lucas, & Rastle, 2017). The mappings between phonemes and graphemes (used for spelling rather than reading) are even more irregular, with Crystal (2003) estimating that only 56% of English spellings can be derived from phoneme-grapheme correspondences. This is unlike alphabetic writing systems such as Italian and Spanish in which the grapheme-phoneme and phoneme-grapheme mappings are highly regular.

In fact, many researchers acknowledge that the English spelling system is not perfectly alphabetic, but nevertheless claim that it approximates an alphabetic system. For instance, Byrne (1998) writes:

…. Inconsistencies and irregularities in English spelling abound… Nevertheless, English is fundamentally an alphabetic language (pp. 1-2).

The view that the English writing system approximates an alphabetic system is also commonplace and is still used to motivate phonics (e.g., Adams, 1990; Byrne, 1998; Castles, Rastle, & Nation, 2018; Duff, Mengoni, Bailey, & Snowling, 2015; Taylor, Davis, & Rastle, 2017; Wyse & Goswami, 2008).

But in fact, English spellings evolved to jointly represent units of meaning (morphemes) and phonology (phonemes). As Venezky (1967) put it:

The simple fact is that the present orthography is not merely a letter-to-sound system riddled with imperfections, but instead, a more complex and more regular relationship wherein phoneme and morpheme share leading roles (p. 77).

It is important to emphasize that Byrne’s claim that the English spelling system is “fundamentally alphabetic” is inconsistent with Venezky’s claim that phonemes and morphemes “share leading roles”.

To briefly illustrate the morphophonemic nature of English, consider the affix <-ed> in the words , , and . The key point to note is that the affix is spelt consistently despite the fact that the <-ed> is associated with the pronunciations /t/, /d/ and /ɪd/, respectively. Clearly the <-ed> spelling is coding for meaning (marking the past tense) rather than phonology. Or consider the spelling of the base in the words , , and . Here the base is spelt consistently despite changes in the pronunciations of the vowel (it is not * ), again reflecting the fact that spellings are coding for meaning rather than sound. These are not cherry-picked examples: English consistently prioritizes the consistent spelling of morphemes over the consistent spellings of phonemes. Indeed, in order to spell morphemes in a consistent manner, it is necessary to have inconsistent (or perhaps a better term is ‘flexible’) grapheme-phoneme correspondences. A language that prioritizes the consistent spelling of morphemes over phonemes is not “fundamentally alphabetic”. The morphophonemic nature of the English writing system strongly impacts on the words that children need to learn at the start of instruction, with most words containing two or more morphemes (Bowers & Bowers, 2018b). For a brief tutorial on the English spelling system see Bowers and Bowers (2017), and for a critique of the alphabetic principle, see Bowers and Bowers (in press).
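The <-ed> pattern described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the example verbs, their hand-coded final sounds, and the function names `past_tense_sound` and `past_tense_spelling` are introduced here for illustration and are not materials from the studies reviewed. The sketch applies the standard description of the English past-tense suffix (/ɪd/ after /t/ or /d/, /t/ after other voiceless consonants, /d/ elsewhere) to show one spelling mapping onto three pronunciations.

```python
# Final sound of each example base, coded by hand (hypothetical examples).
FINAL_SOUND = {"jump": "p", "play": "eɪ", "paint": "t", "need": "d", "kiss": "s"}

# Voiceless consonants other than /t/ (a final /t/ triggers /ɪd/ instead).
VOICELESS = {"p", "k", "f", "s", "ʃ", "tʃ", "θ"}


def past_tense_sound(base: str) -> str:
    """Return the pronunciation of the <-ed> suffix for a base in FINAL_SOUND."""
    final = FINAL_SOUND[base]
    if final in {"t", "d"}:
        return "ɪd"
    if final in VOICELESS:
        return "t"
    return "d"


def past_tense_spelling(base: str) -> str:
    """The suffix is spelt the same way whatever its pronunciation."""
    return base + "ed"


for base in FINAL_SOUND:
    print(past_tense_spelling(base), "/" + past_tense_sound(base) + "/")
```

The spelling function never consults the pronunciation, which is the point made in the text: the written suffix marks the past tense, and the three pronunciations follow from the final sound of the base.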

Why does this matter? First, it is clear that most researchers in psychology are strongly committed to systematic phonics despite the lack of evidence that systematic phonics is better than alternative methods commonly used in schools. Indeed, researchers systematically ignore contrary evidence (e.g., the Camilli et al., 2008 paper that undermines the conclusions of the NRP report has been cited a total of seven times). This is problematic if teaching practices are to be evidence based. One plausible explanation for this disconnect is that most researchers are so strongly committed to the theory of phonics (due to the alphabetic principle) that they fail to appreciate the weakness of the empirical evidence for systematic phonics.

Second, and perhaps more importantly, this theory has discouraged researchers from considering alternative teaching methods. Specifically, the alphabetic principle makes it difficult to consider the hypothesis that children should be taught about the morphological structure of words, or the logical organization of English spellings as reflected in the interrelation between morphology, etymology, and phonology. To illustrate, consider again the NRP (2000) report. In 449 pages, the word “phoneme” occurs 294 times, “alphabetic” 80 times, “alphabetic principle” four times, whereas “morpheme” occurs once (derivations of “morpheme” a total of four times). In more recent meta-analyses taken to support phonics (Galuschka et al., 2014; McArthur et al., 2012; Rose, 2006, 2009), and a recent meta-analysis that fails to find any long-term benefits of phonics (Suggate, 2016), there are no occurrences of the word “morpheme”. As long as most researchers characterize English as alphabetic, little research will investigate alternative hypotheses to phonics.

Motivating an Alternative Approach to Systematic Phonics

Before outlining an alternative approach to systematic phonics, I would like to highlight some important lessons that should (and should not) be drawn from the above review. First, and most importantly, researchers should abandon their strong commitment to systematic phonics and explore alternative methods. Again, given the highly politicized nature of reading research and teaching practice, I should emphasize that the research also does not support whole language and related methods. Indeed, too many children struggle with reading following both whole language and systematic phonics instruction. Rather, the failure to observe different outcomes for systematic phonics and whole language should be used to motivate more research into new approaches to reading instruction.

Second, when considering new methods, it is important to emphasize that this review does not challenge the widespread claim that children should be taught grapheme-phoneme correspondences. Indeed, as highlighted by the NRP (2000) and Her Majesty’s Inspectorate (1990), grapheme-phoneme correspondences are taught in a wide variety of reading methods, including whole language. The key finding from Camilli et al.’s (2003, 2006) review was not that grapheme-phoneme correspondences were unimportant to learning to read, but that there was no evidence that systematic phonics was more effective than alternative methods that included incidental phonics. Camilli et al.’s (2003, 2006) finding that systematic phonics was more effective than interventions that included no phonics highlights the importance of grapheme-phoneme correspondences.⁴

Third, the review does not falsify the widespread view that grapheme-phoneme correspondences should be systematically taught. Although this might be a tempting conclusion, it is important to note that the reading interventions that included incidental phonics also emphasized meaning-based strategies whereas interventions with systematic phonics did not. One obvious possibility, then, is that systematic phonics is useful, but so are meaning-based strategies, and the complementary strengths of the two approaches lead to similar outcomes. If that is the case, then reading interventions that emphasize both systematic instruction of grapheme-phoneme correspondences and meaning-based approaches might be more effective. Indeed, as noted above, this was the conclusion of Torgerson et al. (2018), who argued that current evidence is consistent with “balanced instruction” that emphasizes both phonological and meaning-based instruction. This conclusion is also consistent with a wealth of evidence that direct instruction across a range of domains is beneficial (e.g., McMullen & Madelaine, 2014).

Fourth, the realization that the English writing system is morphophonemic, with both grapheme-phoneme correspondences and spelling-meaning correspondences (through morphology and etymology), raises the possibility that meaning-based strategies should not only consider the meaning of words in the context of text (as is the case of whole language), but also teach the meaningful and regular sub-lexical organization of words. Indeed, an understanding of the writing system opens up the possibility to systematically study both the phonological and meaningful organization of English spellings.

4 In the same way, the fact that other meta-analyses showed little or no evidence for systematic phonics does not contradict the importance of grapheme-phoneme correspondences, as again, systematic phonics was typically compared to a condition that included both incidental and non-phonics interventions.

Together, these findings and conclusions may provide some insights into how to design better reading instruction. One alternative approach consistent with these observations is called Structured Word Inquiry or SWI (Bowers & Kirby, 2010). As detailed elsewhere (Bowers & Bowers, 2017), SWI is inspired by the linguistic fact that English spellings are morphophonemic rather than alphabetic. On SWI, children from the start should be explicitly taught all the sub-lexical regularities that occur in English spellings, namely, grapheme-phoneme correspondences as well as sub-lexical spelling-meaning correspondences (coded through morphology and etymology). In this way, children can make sense of most English spellings, and learn the meaningful connections between morphologically related words, with the aim of improving reading, spelling, and vocabulary knowledge. A critical feature of this approach is that children can be engaged in generating and testing hypotheses about the words they encounter. For example, children can be presented with words such as play, playful, replay, plays, plane, playmate, and say and develop and test hypotheses about which words are from the same morphological family and which are not. Similarly, children can learn why many words are spelt the way they are ( not *), rather than labelling so many words as exceptions as done in systematic phonics. This may be important given that learning is best when information is organized in a meaningful manner (Bower et al., 1969), and when children can generate plausible explanations as to why some stated fact is true (Dunlosky et al., 2013).
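The word-family exercise above can be sketched as a toy structural check in Python. This is a minimal sketch under loud assumptions, not an SWI tool or the procedure used in any study cited here: the affix and base inventories and the function name `analyse` are hypothetical and deliberately tiny, chosen only to cover the seven example words. A word is counted as a candidate member of the family if it can be written as an optional known prefix, the base, and an optional known suffix or second base.

```python
# Hypothetical, deliberately tiny inventories for this illustration only.
PREFIXES = {"re"}
SUFFIXES = {"ful", "s"}
OTHER_BASES = {"mate"}  # allows compounds such as play + mate


def analyse(word: str, base: str):
    """Return (prefix, base, ending) if the word can be built around the base, else None."""
    start = word.find(base)
    if start == -1:
        return None  # the base spelling does not occur in the word
    prefix, ending = word[:start], word[start + len(base):]
    if prefix and prefix not in PREFIXES:
        return None
    if ending and ending not in SUFFIXES | OTHER_BASES:
        return None
    return (prefix, base, ending)


words = ["play", "playful", "replay", "plays", "plane", "playmate", "say"]
family = [w for w in words if analyse(w, "play") is not None]
print(family)
```

On this check, plane and say are rejected because neither contains the spelling of the base. A purely structural test of this kind only generates hypotheses; confirming family membership would also need evidence of a shared meaning.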

Bowers and Bowers (2017) detail both pedagogical arguments and preliminary empirical evidence in support of SWI. This includes randomized and quasi-randomized controlled studies reporting that SWI improved decoding (Devonshire et al., 2013), spelling (Devonshire & Fluck, 2010), and vocabulary knowledge (Bowers & Kirby, 2010). For example, the Devonshire et al. (2013) study found SWI to be more effective than systematic phonics in children aged 5-7.⁵ To illustrate how SWI can be implemented at the start of instruction, see the following video of a basic lesson in a public school Grade 1 classroom: http://tinyurl.com/zlr27pn. Of course, this video does not provide evidence that this approach is effective, but it does illustrate the nature of the instruction, and demonstrate that the approach can be implemented from the start.

Unfortunately, however, there is currently a serious drawback with SWI; namely, little empirical research has directly compared SWI to systematic phonics or alternative methods. And as noted above, there are two main reasons for this, namely (a) researchers are strongly committed to the alphabetic principle, and (b) researchers claim that the empirical evidence strongly supports phonics. Accordingly, most researchers do not seriously consider alternative methods. Bowers and Bowers (2017) have provided a detailed argument as to why the English spelling system is morphophonemic rather than alphabetic, and here, I show that the empirical evidence does not support systematic phonics. Once it is realized that both theory and data fail to support systematic phonics over alternative methods, my hope is that researchers will be more open to looking at alternative approaches, including SWI.

5 Rastle and Taylor (2018) dismissed the relevance of the Devonshire et al. (2013) paper by questioning whether it included a phonics control condition. This criticism is based on a misreading of the paper as shown by Bowers and Bowers (2018b).

Summary

Despite the strong widespread support for systematic phonics within the psychology literature, there is little evidence that this approach is better than the main alternative methods, including whole language. There can be few areas in psychology in which the strength of claims and the strength of evidence are so disconnected. This should not be used as an argument in support of whole language and related methods, but rather, it should be used to motivate new approaches to reading instruction.

References

Adams, M. J. (1990). Beginning to read: Thinking and learning about print. Cambridge, MA: The MIT Press.

Adesope, O. O., Lavin, T., Thompson, T., & Ungerleider, C. (2011). Pedagogical strategies for teaching literacy to ESL immigrant students: A meta‐analysis. British Journal of Educational Psychology, 81, 629-653.

Anglin, J. M. (1993). Vocabulary development: A morphological analysis. Monographs of the Society for Research in Child Development, 58(10, Serial No. 238).

Barquero, L. A., Davis, N., & Cutting, L. E. (2014). Neuroimaging of Reading Intervention: A Systematic Review and Activation Likelihood Estimate Meta-Analysis. Plos One, 9(1), e83668. doi:10.1371/journal.pone.0083668

Becker, B. J. (2005). Failsafe N or file-drawer number. Publication bias in meta-analysis: Prevention, assessment and adjustments, 111-125.

Bower, G. H., Clark, M. C., Lesgold, A. M., & Winzenz, D. (1969). Hierarchical retrieval schemes in recall of categorized word lists. Journal of Verbal Learning and Verbal Behavior, 8, 323–343. http://dx.doi.org/10.1016/S0022-5371(69)80124-6

Bowers, J. S. (2016a). The practical and principled problems with educational neuroscience. Psychological Review, 123, 600-612.

Bowers, J. S. (2016b). Psychology, not educational neuroscience, is the way forward for improving educational outcomes for all children: Reply to Gabrieli (2016) and Howard-Jones et al. (2016). Psychological Review, 123, 628-635.

Bowers, J. S., & Bowers, P. N. (2017). Beyond Phonics: The Case for Teaching Children the Logic of the English Spelling System. Educational Psychologist, 52, 124-141. DOI: 10.1080/00461520.2017.1288571

Bowers, J. S., & Bowers, P. N. (2018a). The importance of correctly characterizing the English spelling system when devising and evaluating methods of reading instruction. Comment on Taylor, Davis, and Rastle (2017). Quarterly Journal of Experimental Psychology, 71, 1497-1500.

Bowers, J. S., & Bowers, P. N. (2018b). There is no evidence to support the hypothesis that systematic phonics should precede morphological instruction: Response to Rastle and colleagues. PsyArXiv. https://psyarxiv.com/zg6wr/

Bowers, J. S., & Bowers, P. N. (in press). Progress in reading instruction requires a better understanding of the English spelling system. Current Directions in Psychological Science.

Bowers, P. N., & Kirby, J. R. (2010). Effects of morphological instruction on vocabulary acquisition. Reading and Writing: An Interdisciplinary Journal, 23, 515–537. http://dx.doi.org/10.1007/s11145-009-9172-z

Bowers, P. N., Kirby, J. R., & Deacon, S. H. (2010). The effects of morphological instruction on literacy skills: A systematic review of the literature. Review of Educational Research, 80, 144–179. http://dx.doi.org/10.3102/0034654309359353

Bowey, J. A. (2006). Need for systematic synthetic phonics teaching within the early reading curriculum. Australian Psychologist, 41, 79–84.

Browder, D. M., & Xin, Y. P. (1998). A meta-analysis and review of sight word research and its implications for teaching functional reading to individuals with moderate and severe disabilities. The Journal of Special Education, 32(3), 130-153.

Brown, L. T., Mohr, K. A., Wilcox, B. R., & Barrett, T. S. (2017). The effects of dyad reading and text difficulty on third-graders’ reading achievement. The Journal of Educational Research, 1-13.

Buckingham, J. (2016). Focus on Phonics: Why Australia should adopt the Year 1 Phonics Screening Check. Centre for Independent Studies. Research Report 22.

Byrne, B. J. (1998). The foundation of literacy: The child’s acquisition of the alphabetic principle. Philadelphia, PA: Psychology Press.

Camilli, G., Kim, S., & Vargas, S. (2008). A Response to Stuebing et al., “Effects of systematic phonics instruction are practically significant”: The origin of the national reading panel. Education Policy Analysis Archives, 16(16), 1–17.

Camilli, G., Vargas, S., & Yurecko, M. (2003). Teaching children to read: The fragile link between science and federal education policy. Education Policy Analysis Archives, 11(15), 1–51.

Camilli, G., Wolfe, P. M., & Smith, M. L. (2006). Meta-analysis and reading policy: Perspectives on teaching children to read. The Elementary School Journal, 107(1), 27-36.

Carlisle, J. F. (2000). Awareness of the structure and meaning of morphologically complex words: Impact on reading. Reading and Writing: An Interdisciplinary Journal, 12, 169–190. http://dx.doi.org/10.1023/A:1008131926604

Caravolas, M., Hulme, C., & Snowling, M. J. (2001). The foundations of spelling ability: Evidence from a three-year longitudinal study. Journal of Memory and Language, 45, 751–774.

Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19, 5-51.

Crystal, D. (2003). The Cambridge encyclopedia of the English language (2nd Edition). Cambridge, UK: Cambridge University Press.

Dehaene, S. (2011). The massive impact of literacy on the brain and its consequences for education. Human Neuroplasticity and Education (Vatican City), 117, 19–32, 237–238.

Department of Education, Science and Training (2005). Teaching Reading. National Inquiry into the Teaching of Literacy. Canberra: Commonwealth of Australia.

Devonshire, V., & Fluck, M. (2010). Spelling development: Fine-tuning strategy-use and capitalising on the connections between words. Learning and Instruction, 20, 361–371. http://dx.doi.org/10.1016/j.learninstruc.2009.02.025

Devonshire, V., Morris, P., & Fluck, M. (2013). Spelling and reading development: The effect of teaching children multiple levels of representation in their orthography. Learning and Instruction, 25, 85–94. DOI: 10.1016/j.learninstruc.2012.11.007

Duff, F. J., Mengoni, S. E., Bailey, A. M., & Snowling, M. J. (2015). Validity and sensitivity of the phonics screening check: implications for practice. Journal of Research in Reading, 38(2), 109-123.

Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students’ learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4-58. DOI: 10.1177/1529100612453266

Eden, G. F., Jones, K. M., Cappell, K., Gareau, L., Wood, F. B., Zeffiro, T. A., et al. (2004). Neural Changes following Remediation in Adult Developmental Dyslexia. Neuron, 44, 411–422. doi:10.1016/j.neuron.2004.10.019

Ehri, L. C. (1997). Phases of development in learning to read by sight. Journal of Research in Reading, 18, 116–125.

Ehri, L. C., & McCormick, S. (1998). Phases of word learning: Implications for instruction with delayed and disabled readers. Reading and Writing Quarterly, 14, 135–163. http://dx.doi.org/10.1080/1057356980140202

Ehri, L. C., Nunes, S. R., Stahl, S. A., & Willows, D. M. (2001). Systematic phonics instruction helps students learn to read: Evidence from the National Reading Panel’s meta-analysis. Review of Educational Research, 71, 393–447. http://dx.doi.org/10.3102/00346543071003393

Foorman, B., Francis, D., Novy, D., & Liberman, D. (1991). How letter-sound instruction mediates progress in first-grade reading and spelling. Journal of Educational Psychology, 83, 456-469.

Ford, C. (2009). The effect of the backward-chaining method of decoding with computer-assisted instruction on the reading skills of struggling adolescent readers [thesis]. DeKalb, Illinois: Northern Illinois University, 2009.

Fricke, S., Bowyer‐Crane, C., Haley, A. J., Hulme, C., & Snowling, M. J. (2013). Efficacy of language intervention in the early years. Journal of Child Psychology and Psychiatry, 54, 280-290.

Fricke, S., Burgoyne, K., Bowyer‐Crane, C., Kyriacou, M., Zosimidou, A., Maxwell, L., ... & Hulme, C. (2017). The efficacy of early language intervention in mainstream school settings: a randomized controlled trial. Journal of Child Psychology and Psychiatry, 58(10), 1141-1151.

Frith, U. (1985). Beneath the surface of surface dyslexia. In K. E. Patterson, J. C. Marshall, & M. Coltheart (Eds.). Surface dyslexia: Neuropsychological and cognitive studies of phonological reading (pp. 301–330). London: Routledge and Kegan Paul.

Gabrieli, J. D. E. (2016). The promise of educational neuroscience: comment on Bowers (2016). Psychological Review, 123, 613-619.

Galuschka, K., Ise, E., Krick, K., & Schulte-Körne, G. (2014). Effectiveness of treatment approaches for children and adolescents with reading disabilities: a meta-analysis of randomized controlled trials. PloS one, 9(2), e89900.

Gittelman, R., & Feingold, I. (1983). Children with reading disorders—I. Efficacy of reading remediation. Journal of Child Psychology and Psychiatry and Allied Disciplines, 24, 167-191.

Gooch, D., Hulme, C., Nash, H. M., & Snowling, M. J. (2014). Comorbidities in preschool children at family risk of dyslexia. Journal of Child Psychology and Psychiatry, 55(3), 237-246.

Goodwin, A. P., & Ahn, S. (2013). A meta-analysis of morphological interventions in English: Effects on literacy outcomes for school-age children. Scientific Studies of Reading, 17, 257–285. http://dx.doi.org/10.1080/10888438.2012.689791

Grundin, H. U. (2018). Policy and evidence: a critical analysis of the Year 1 Phonics Screening Check in England. Literacy, 52, 39-46.

Han, I. (2010). Evidence-based Reading Instruction for English Language Learners in Preschool through Sixth Grades: A Meta-analysis of Group Design Studies. University of Minnesota, ProQuest Dissertations Publishing, 2009. 3371852.

Hammill, D. D., & Swanson, H. L. (2006). The National Reading Panel’s meta-analysis of phonics instruction: Another point of view. The Elementary School Journal, 107(1), 17-26.

Henry, M. K. (1989). Children’s word structure knowledge: Implications for decoding and spelling instruction. Reading and Writing: An Interdisciplinary Journal, 2, 135–152. http://dx.doi.org/10.1007/BF00377467

Her Majesty’s Inspectorate (HMI) (1990). The Teaching and Learning of Reading in Primary Schools. London: Department of Education and Science (DES).

Howard-Jones, P. (2014b). Neuroscience and education: A review of educational interventions and approaches informed by neuroscience. Full report and executive summary for the Education Endowment Foundation.

Howard-Jones, P. A., Varma, S., Ansari, D., Butterworth, B., De Smedt, B., Goswami, U., ... & Thomas, M. S. (2016). The principles and practices of educational neuroscience: Comment on Bowers (2016). Psychological Review, 123, 620-627.

Johnston, R., & Watson, J. (2003). Accelerating reading and spelling with synthetic phonics: a five year follow up (Edinburgh, The Scottish Executive Education Department; Insight 4).

Johnston, R., & Watson, J. (2004). Accelerating the development of reading, spelling and phonemic awareness skills in initial readers. Reading and Writing, 17, 327-357.

Johnston, R., & Watson, J. (2005). The effects of synthetic phonics teaching of reading and spelling attainment: a seven year longitudinal study. Available online at: http://www.scotland.gov.uk/Resource/Doc/36496/0023582.pdf (accessed 10 December 2006).

Kirby, R. J., & Bowers, P. N. (2017). Morphological instruction and literacy: Binding phonological, orthographic, and semantic features of words (pp. 437-461). In K. Cain, D. Compton, & R. Parrila, (Eds.), Theories of reading development. Amsterdam, The Netherlands: John Benjamins. Doi: 10.1075/swll.15.24kir

Larkin, R. F., & Snowling, M. J. (2008). Morphological spelling development. Reading & Writing Quarterly, 24, 363–376. http://dx.doi.org/10.1080/10573560802004449

Leach, D. J., & Siddall, S. W. (1990). Parental involvement in the teaching of reading: A comparison of hearing reading, paired reading, pause, prompt, praise, and direct instruction methods. British Journal of Educational Psychology, 60, 349-355.

Levy, B., & Lysynchuk, L. (1997). Beginning word recognition: benefits of training by segmentation and whole word methods. Scientific Studies of Reading, 1, 359–387.

Levy, B., Bourassa, D., & Horn, C. (1999). Fast and slow namers: benefits of segmentation and whole word training. Journal of Experimental Child Psychology, 73, 115–138.

Machin, S., McNally, S., & Viarengo, M. (2016). “Teaching to Teach” Literacy. London: London School of Economics Centre for Economic Performance Discussion Paper No 1425.

Mantzicopoulos, P., Morrison, D., Stone, E., & Setrakian, W. (1992). Use of the SEARCH/TEACH tutoring approach with middle-class students at risk for reading failure. Elementary School Journal, 92, 573-586.

Masterson, J., Stuart, M., Dixon, M., & Lovejoy, S. (2010). Children's printed word database: Continuities and changes over time in children's early reading vocabulary. British Journal of Psychology, 101, 221-242. DOI:10.1348/000712608X371744

McArthur, G., Castles, A., Kohnen, S., Larsen, L., Jones, K., Anandakumar, T., & Banales, E. (2013). Sight word and phonics training in children with dyslexia. Journal of Learning Disabilities, 48, 391–407. http://dx.doi.org/10.1177/0022219413504996

McArthur, G., Eve, P. M., Jones, K., Banales, E., Kohnen, S., Anandakumar, T., et al. (2012, December 12). Phonics training for English speaking poor readers. Cochrane Database of Systematic Reviews, CD009115.

McArthur, G., Kohnen, S., Jones, K., Eve, P., Banales, E., Larsen, L., & Castles, A. (2015). Replicability of sight word training and phonics training in poor readers: A randomised controlled trial. PeerJ, 3, e922.

Moats, L. C. (2000). Whole language lives on: The illusion of “balanced” reading instruction. Retrieved from http://www.ldonline.org/article/6394/

Mousikou, P., Sadat, J., Lucas, R., & Rastle, K. (2017). Moving beyond the monosyllable in models of skilled reading: Mega-study of disyllabic nonword reading. Journal of Memory and Language, 93, 169-192.

Moustafa, M., & Maldonado-Colon, E. (1998). Whole-to-parts phonics instruction: Building on what children know to help them know more. The Reading Teacher, 52, 448–458.

McMullen, F., & Madelaine, A. (2014). Why is there so much resistance to Direct Instruction? Australian Journal of Learning Difficulties, 19, 137-151.

National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. Bethesda, MD: National Institute of Child Health and Human Development.

Nunes, T., Bryant, P. E., & Bindman, M. (1997). Morphological spelling strategies: Developmental stages and processes. Developmental Psychology, 33, 637–649.

Pearson, P. D. (2004). The reading wars. Educational policy, 18, 216-252.

Quinn, J. M., Wagner, R. K., Petscher, Y., & Lopez, D. (2015). Developmental relations between vocabulary knowledge and reading comprehension: A latent change score modeling study. Child development, 86, 159-175.

Rose, J. (2006). Independent Review of the Teaching of Early Reading. Nottingham: DfES Publications.

Rose, J. (2009). Identifying and teaching children and young people with dyslexia and literacy difficulties. London, UK: Department for Children, Schools and Families. Retrieved from http://www.teachernet.gov.uk/wholeschool/sen/

Rumsey, J. M., Andreason, P., Zametkin, A. J., Aquino, T., King, A. C., Hamburger, S. D., et al. (1992). Failure to Activate the Left Temporoparietal Cortex in Dyslexia - an O-15 Positron Emission Tomographic Study. Archives of Neurology, 49, 527–534.

Scargle, J. D. (1999). Publication bias (the "file-drawer problem") in scientific inference. arXiv preprint physics/9909033.

Seidenberg, M. (2017). Language at the Speed of Sight: How we Read, Why so Many Can’t, and what can be done about it. Basic Books.

Sherman, K. H. (2007). A Meta-analysis of Interventions for Phonemic Awareness and Phonics Instruction for Delayed Older Readers. University of Oregon, ProQuest Dissertations Publishing 2007: 3285626.

Shulman, L. S. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15, 4-14.

Shaywitz, S. E., & Shaywitz, B. A. (2005). Dyslexia (specific reading disability). Biological Psychiatry, 57, 1301-1309.

Snowling, M. J., & Hulme, C. (2011). Evidence-based interventions for reading and language difficulties: Creating a virtuous circle. British Journal of Educational Psychology, 81, 1–23. http://dx.doi.org/10.1111/j.2044-8279.2010.02014.x

Snowling, M. J., & Hulme, C. (2014). Closing a virtuous circle: Reciprocal influences between theory and practice in studies of reading intervention. Journal of Research on Educational Effectiveness, 7, 300–306. http://dx.doi.org/10.1080/19345747.2014.925307

Stadthagen-Gonzalez, H., Bowers, J. S., & Damian, M. F. (2004). Age of Acquisition Effects in Visual Word Recognition: Evidence From Expert . Cognition, 93, B11-B26.

Stuebing, K. K., Barth, A. E., Cirino, P. T., Francis, D. J., & Fletcher, J. M. (2008). A response to recent reanalyses of the National Reading Panel report: Effects of systematic phonics instruction are practically significant. Journal of Educational Psychology, 100, 123-134.

Suggate, S. P. (2010). Why what we teach depends on when: Grade and reading intervention modality moderate effect size. Developmental Psychology, 46, 1556-1579.

Suggate, S. P. (2016). A meta-analysis of the long-term effects of phonemic awareness, phonics, fluency, and reading comprehension interventions. Journal of Learning Disabilities, 49(1), 77-96.

Swanson, C. B., & Barlage, J. (2006). Influence: A study of the factors shaping education policy. Bethesda, MD: Editorial Projects in Education Research Center.

Taylor, J. S. H., Davis, M. H., & Rastle, K. (2017). Comparing and validating methods of reading instruction using behavioural and neural findings in an artificial orthography. Journal of Experimental Psychology: General, 146(6), 826.

Torgerson, C., Brooks, G., Gascoine, L., & Higgins, S. (2018). Phonics: reading policy and the evidence of effectiveness from a systematic ‘tertiary’ review. Research Papers in Education, 1-31.

Torgerson, C. J., Brooks, G., & Hall, J. (2006). A systematic review of the research literature on the use of phonics in the teaching of reading and spelling (DfES Research Rep. 711). London: Department for Education and Skills, University of Sheffield.

Tunmer, W., & Hoover, W. (1993). Phonological recoding skill and beginning reading. Reading and Writing: An Interdisciplinary Journal, 5, 161-179.

Umbach, B., Darch, C., & Halpin, G. (1989). Teaching reading to low performing first graders in rural schools: A comparison of two instructional approaches. Journal of Instructional Psychology, 16, 23-30.

Valentini, A., Ricketts, J., Pye, R. E., & Houston-Price, C. (2018). Listening while reading promotes word learning from stories. Journal of Experimental Child Psychology, 167, 10-31.

Venezky, R. L. (1967). English orthography: Its graphical structure and its relation to sound. Reading Research Quarterly, 75-105.

Venezky, R. (1999). The American way of spelling. New York: Guilford.

Watson, J., & Johnston, R. (1998). Accelerating reading attainment: the effectiveness of synthetic phonics (The Scottish Office: Education and Industry Department). Available online at: www.standards.dfes.gov.uk/pdf/literacy/rjohnston-phonics

Willingham, D. T. (2015). Raising kids who read: What parents and teachers can do. John Wiley & Sons.

Wrigley, T. (2017). Synthetic phonics and the phonics check: The hidden politics of early literacy. In M. M. Clark et al. (Eds.), Reading the Evidence: Synthetic Phonics and Literacy Learning.

Wyse, D., & Goswami, U. (2008). Synthetic phonics and the teaching of reading. British Educational Research Journal, 34(6), 691-710.


Figure Captions

Figure 1. Performance on the phonics screening check from 2012-2017. A score of 32 is the threshold for the expected standard. The sharp rise in performance at 32 suggests a problem with the marking of the test.

Figure 2. Results on Key Stage 1 SAT tests in reading, writing, maths, and science from 2006-2017 as well as the results of the phonics screening check from 2012-2017. SAT scores to the left of the vertical dashed line were achieved without having completed the phonics screening check in Year 1, and SAT scores to the right of the vertical dashed line were achieved after having completed the phonics check in Year 1. Accordingly, the improved SAT results on reading and writing between 2011-2012 cannot be attributed to the improved administration of phonics.

Figure 3. Results on Key Stage 2 SAT tests in reading, writing, maths, and science from 2007-2017 as well as the results of the phonics screening check from 2012-2017. SAT scores to the left of the vertical dashed line were achieved without having completed the phonics screening check in Year 1, and SAT scores to the right of the vertical dashed line were achieved after having completed the phonics check in Year 1. The finding that SAT reading results did not improve between 2016-2017 indicates that the improved administration of phonics in Year 1 (back in 2012) did not have a long-term impact on the SAT scores.