Do Randomized Controlled Trials Meet the “Gold Standard”?

A Study of the Usefulness of RCTs in the What Works Clearinghouse

ALAN GINSBURG AND MARSHALL S. SMITH
March 2016
AMERICAN ENTERPRISE INSTITUTE

Table of Contents

Executive Summary
Part I: Introduction and Context
    The Context
    Methodology
Part II: Threats to the Usefulness of RCTs
    Evaluators Associated with the Curriculum Developer
    The Curriculum Intervention Is Not Well-Implemented
    Comparison Curricula Not Clear
    Multigrade Curricula Not Adequately Studied
    Student Outcomes Favor Treatment or Are Not Fully Reported
    Outdated Curricula
    Summary
Part III: Observations and Recommendations
Notes

Executive Summary

The What Works Clearinghouse (WWC), which resides in the Institute of Education Sciences (IES), identifies studies that provide credible and reliable evidence of the effectiveness of a given intervention. The WWC gives its highest rating of confidence to only well-implemented Randomized Control Trial (RCT) designs. RCTs are clearly the “gold standard” to minimize bias in outcomes from differences in unmeasured characteristics between treatment and comparison populations. Yet when the treatment is a complex intervention, such as the implementation of an education curriculum, there is a high potential for other sources of serious estimation bias.

Our analysis of the usefulness of each of the 27 RCT mathematics studies (grades 1–12) meeting minimum WWC standards identifies 12 nonselection bias threats, many of which were identified in a 2004 National Research Council (NRC) report. These nonselection bias threats are not neutralized by randomization of students between the intervention and comparison groups, and when present, studies yield unreliable and biased outcomes inconsistent with the “gold standard” designation. Threats to the usefulness of RCTs include:

• Developer associated. In 12 of the 27 RCT studies (44 percent), the authors had an association with the curriculum’s developer.

• Curriculum intervention not well-implemented. In 23 of 27 studies (85 percent), implementation fidelity was threatened because the RCT occurred in the first year of curriculum implementation. The NRC study warns that it may take up to three years to implement a substantially different curricular change.

• Unknown comparison curricula. In 15 of 27 studies (56 percent), the comparison curricula are either never identified or outcomes are reported for a combined two or more comparison curricula. Without understanding the comparison’s characteristics, we cannot interpret the intervention’s effectiveness.

• Instructional time greater for treatment than for control group. In eight of nine studies for which the total time of the intervention was available, the treatment time differed substantially from that for the comparison group. In these studies we cannot separate the effects of the intervention curriculum from the effects of the differences in the time spent by the treatment and control groups.

• Limited grade coverage. In 19 of 20 studies, a curriculum covering two or more grades does not have a longitudinal cohort and cannot measure cumulative effects across grades.

• Assessment favors content of the treatment. In 5 of 27 studies (19 percent), the assessment was designed by the curricula developer and likely is aligned in favor of the treatment.

• Outdated curricula. In 19 of 27 studies (70 percent), the RCTs were carried out on outdated curricula.

Moreover, the magnitude of the error generated by even a single threat is frequently greater than the average effect size of an RCT treatment.

Overall, the data show that 26 of the 27 RCTs in the WWC have multiple serious threats to their usefulness. One RCT has only a single threat, but we consider it serious. We conclude that none of the RCTs provides sufficiently useful information for consumers wishing to make informed judgments about which mathematics curriculum to purchase.

As a result of our findings, we make five recommendations. Note that all reports stemming from the five recommendations should be made public.

Recommendation 1: IES should review our analyses of the 27 mathematics curriculum RCTs and remove those that, in its view, do not provide useful information for WWC users. The IES should make its judgments and rationale public.

Recommendation 2: The IES should examine the other curriculum studies and curriculum RCTs in the WWC. The review should be based on the same criteria as in recommendation 1, and the IES should remove those studies that, in its view, do not provide useful information.

Recommendation 3: The IES should review a representative sample of all the other noncurricula RCT intervention studies in the WWC. The review should use the same criteria and standards as in recommendations 1 and 2. Studies that do not meet the standards established for the reviews of the curriculum studies should be removed from the WWC.

Recommendation 4: Evaluations of education materials and practices should be improved. First, the IES should create an internal expert panel of evaluators, curriculum experts, and users (for example, teachers and administrators) to consider how, in the short term, to improve the current WWC criteria and standards for reviewing RCTs in education.

Second, the IES and the Office of Management and Budget (OMB) should support an ongoing, five-year panel of experts at the NRC or the National Academy of Education to consider what would be an effective evaluation and improvement system for educational materials and practices for the future. It should also consider how this system might be developed and supported and what the appropriate role of the federal government should be in designing, creating, and administering this system.

Recommendation 5: OMB should support a three-year study by a panel of unbiased experts and users convened by the NRC to look at the quality of RCT studies in noneducation sectors. We see no reason to expect that RCTs funded out of the Labor Department, HUD, Human Services, Transportation, or USAID would be immune from many of the flaws we find in the mathematics curriculum RCTs in the WWC.

Part I: Introduction and Context

The What Works Clearinghouse (WWC), instituted in 2002 as part of the Institute of Education Sciences (IES) within the US Department of Education, describes its mission thus: “The goal of the WWC is to be a resource for informed education decision-making. To reach this goal, the WWC reports on studies that provide credible and reliable evidence of the effectiveness of a given practice, program, or policy (referred to as ‘interventions’).”1

The purpose of our review is to determine how useful randomized controlled trials (RCTs) in the WWC might be in helping teachers and school administrators make accurate, informed decisions about their choice of mathematics curricula. The WWC compiles high-quality evidence on curricula’s effectiveness and makes it available online, but for that evidence to be useful, it must present an accurate picture of each curriculum’s

the past two decades, many organizations and US government officials have touted RCTs as the best way to produce serious evidence of the effects of various social and educational interventions, labeling RCTs as the “gold standard” for evaluating government programs.4

Yet not every statistician and scholar has unilateral faith in RCTs. The first concern is that while a single well-done RCT has internal validity, it is carried out with a particular intervention, for a particular sample, at a particular time, and in a particular place, and it produces a valid estimate of the intervention’s effect only in that setting. Therefore, most single RCTs do not have external validity.5

To address this issue, William Shadish, Thomas Cook, and Donald Campbell propose that a meta-analysis of multiple trials could help establish external validity.6 Others argue that the strength of the instruc-
