Abstract
This report identifies mathematics and science curricula, as well as professional development models, at the middle and high school levels that are effective based on their success in increasing student achievement. The goal of the study was to provide some choice to districts and schools that wanted guidance in selecting a curriculum and that wished to use effectiveness as a selection criterion. Unexpectedly, most middle and high school mathematics and science curricula did not have studies of student achievement with comparison groups, and it proved especially difficult to find effects in either math or science for subgroups by sex, minority status, and urban status. Findings strongly suggest that science curricula are more effective when they are inquiry-based, although math curricula can be effective whether they are standards-based or traditional.
REVIEW OF EVALUATION STUDIES OF MATHEMATICS AND SCIENCE CURRICULA AND PROFESSIONAL DEVELOPMENT MODELS
By Beatriz C. Clewell, Clemencia Cosentino de Cohen (The Urban Institute)
and
Patricia B. Campbell, Lesley Perlman (Campbell-Kibler Associates, Inc.)
with
Nicole Deterding, Sarah Manes, Lisa Tsui (The Urban Institute)
and
Shay N.S. Rao, Becky Branting, Lesli Hoey, Rosa Carson (Campbell-Kibler Associates, Inc.)
Submitted to the GE Foundation
December 2004
Acknowledgments
A number of individuals contributed to this effort in various ways. We were fortunate to have the assistance of Gerhard Salinger of the National Science Foundation; Jo Ellen Roseman of Project 2061 at the American Association for the Advancement of Science (AAAS); Joan Abdallah at AAAS; and several staff members at the Center for Science Education at the Education Development Center (Barbara Berns, Jeanne Century, Joe Flynn, Elisabeth Hiles, Jackie Miller, Marian Pasquale, and Judith Sandler) in helping us identify science curricula that might have evaluation studies. We are also deeply indebted to those who reviewed our report and offered useful suggestions for revising it: Jane Butler Kahle at Miami University of Ohio and Linda Rosen of Education & Management Innovations, Inc. Thanks are due to William Bradbury and Cara West, Urban Institute staff who helped in the production of the report. Most of all, we wish to express our appreciation for the responsiveness of curriculum developers and researchers whose curricula are reviewed in this report and who shared evaluation studies with us.
Last, but not least, we thank our program officer, Roger Nozaki, and his colleague, Kelli Wells, both of the GE Foundation, for their insightful comments and suggestions on the report draft that helped to make this document more user-friendly. We thank the GE Foundation for funding this review and for taking an evidence-based approach to school reform. We think it’s the only way to go.
Introduction
This report presents the findings of a review of about four hundred studies evaluating mathematics and science curricula and professional development models for middle school and high school. As requested by the GE Foundation, the main goal of this review was to identify mathematics and science curricula, as well as professional development models, that had been deemed effective based on their success in increasing student achievement. The Foundation’s interest in these findings stems from its desire to initiate a program of funding to foster sustainable improvement in academic achievement of underrepresented and disadvantaged populations.
Historically, curriculum choice at the local level has often been made by a committee that decides which curriculum to adopt based on considerations only peripherally related to student achievement—such as state-imposed standards, recommendations of others, cost, and presentations by publishers’ representatives. Choice of professional development models follows a similar pattern. Indeed, there has been very little else available to guide school districts in their curriculum selection process, since for most curricula and textbooks the only data at hand are publishers’ figures on the number of adoptions. That has been changing. There is a growing movement to assess the effectiveness of math and science curricula through various methodologies, including content analyses, comparative studies, case studies, and synthesis studies.1 And while there have been several studies of the effectiveness of professional development practices, very few have measured the effects of these practices on student achievement.
In this document, we describe the methodology used to conduct this review, present our findings, and end with a summary of conclusions. To provide an international perspective on these topics, the report includes a brief look at the research on mathematics and science education in three countries that are similar in key dimensions to the U.S.
Methodology
Criteria for Selecting Evaluation Studies
We developed a set of criteria to guide the selection of evaluation studies to be included in our review. Studies were expected to have (1) rigorous methodological design; (2) measures of impact on student outcomes (which include, but are not limited to, test scores); (3) comparative data, cross-sectional or longitudinal, with experimental and quasi-experimental designs preferred over others; and (4) high-quality and valid data.
1 The majority of these efforts have been undertaken by the American Association for the Advancement of Science (AAAS), the National Research Council (NRC), and the U.S. Department of Education. The AAAS study used content analysis, the NRC study did not rate specific curricular math programs, and the U.S. Department of Education study reviewed middle school math programs only.
We offer several caveats regarding the quality of the evaluation studies that we report in this document, especially those on mathematics and science curricula. Because of the dearth of studies that met our criteria, we were forced to compromise and include a number of evaluations that did not report effect sizes; a few that did not give the statistical significance levels for differences; and several that lacked non-treatment comparison groups. In some cases we were unable to verify the quality of the data on which findings were based. It was also a source of great disappointment that so few of the studies we identified disaggregated findings by sex, race/ethnicity, or urban school location. We believe, however, that taken as a whole the studies that are included here offer useful insight into the condition of mathematics and science curricula in middle and high school.
Identification of Curricula/Professional Development Models
Using research databases such as the Education Resource Information Center (ERIC) and Education Abstracts, and web sites such as the Northwest Regional Educational Laboratory’s Catalog of School Reform Models, we conducted a literature search of articles and reports pertaining to (1) major mathematics and science curricula used at the middle and high school levels; and (2) empirical studies that examine how teacher professional development in science and mathematics affects student outcomes. The review team also gathered and reviewed relevant documents that were not accessible through traditional research outlets. For science, far more than for mathematics, most of the evaluation studies of recent curricula were unpublished reports of evaluations conducted by the program developers. By contrast, a large number of published mathematics curriculum studies were available for inclusion in this review. Appendix D contains lists of all mathematics and science curricula for which studies were sought.
Primary sources of mathematics reform models were the Northwest Regional Educational Laboratory’s database on whole-school reform models2 and Comprehensive School Reform and Student Achievement: A Meta-Analysis.3 Primary sources of mathematics curricula were National Science Foundation-funded projects; the U.S. Department of Education’s “What Works Clearinghouse” and the Mathematics Expert Panel; the Mathematical Sciences Education Board’s Review of the Evaluation Data on the Effectiveness of NSF-Supported and Commercially Generated Mathematics Curriculum Materials; and the American Association for the Advancement of Science’s Project 2061.
2 http://www.nwrel.org/scpd/catalog/index.shtml
3 Borman, G. D., G. M. Hewes, L. T. Overman, and S. Brown. 2002. Comprehensive School Reform and Student Achievement: A Meta-Analysis. CRESPAR Report No. 59. Baltimore, Md.: CRESPAR/Johns Hopkins University. http://www.csos.jhu.edu/CRESPAR/techReports/report59.pdf
In order to ensure broad coverage of the science curriculum studies, we contacted experts on science curricula at organizations such as the Education Development Center, the Technology Education Research Center (TERC), the National Science Foundation, Project 2061, and others. In view of how few published evaluations of science curricula we were able to identify, we attempted to locate more recently developed curricula that might not yet have produced published evaluation studies. Once these curricula were identified through conversations with experts in the field, we obtained contact information on the developers of these programs and approached each of them to ascertain whether or not they had evaluation data or reports on the effectiveness of their curricula that met our established criteria. Several developers either did not respond to our requests or responded that they had not yet completed evaluation studies. We were able, however, to secure a number of evaluation reports from developers and reviewed these to determine whether or not they met our criteria for inclusion in this study. We also scoured the Internet for sources of information on curricula and on relevant evaluation studies.4 Finally, to facilitate our analysis of the data, we developed matrices into which we entered relevant information on each of the studies that met our criteria. This information included the methodological design, the analytic technique, student outcome areas measured, outcome measures and instruments used, number in the sample, whether data were disaggregated by race/ethnicity and sex, and impact. We also included descriptive information about the curriculum, including the subject matter, targeted grades, curriculum name, whether or not it had a professional development component,5 and its principal instructional features.
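The matrix described above can be pictured as one record per study. The following is only a minimal sketch of such a record, not the instrument the review team actually used: the class name, field names, and the placeholder rows are all hypothetical and were chosen to mirror the items listed in the paragraph.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class StudyRecord:
    """One row of a review matrix. All field names are hypothetical."""
    curriculum_name: str
    subject: str                    # "mathematics" or "science"
    targeted_grades: str            # e.g. "6-8"
    design: str                     # methodological design
    analytic_technique: str
    outcome_areas: List[str]        # student outcome areas measured
    outcome_measures: List[str]     # measures and instruments used
    sample_size: Optional[int]      # None when the study does not report it
    by_race_ethnicity: bool         # data disaggregated by race/ethnicity?
    by_sex: bool                    # data disaggregated by sex?
    impact: str
    has_pd_component: Optional[bool] = None   # None when not reported
    instructional_features: List[str] = field(default_factory=list)


def disaggregated(records: List[StudyRecord]) -> List[StudyRecord]:
    """Return the studies that break out results by sex or race/ethnicity."""
    return [r for r in records if r.by_sex or r.by_race_ethnicity]


# Placeholder rows, invented purely to exercise the structure.
example_rows = [
    StudyRecord("Placeholder Curriculum A", "science", "6-8",
                "quasi-experimental", "ANCOVA", ["achievement"],
                ["state test"], 120, False, True, "positive"),
    StudyRecord("Placeholder Curriculum B", "mathematics", "9-12",
                "cross-sectional", "t-test", ["achievement"],
                ["standardized test"], None, False, False, "mixed"),
]
print([r.curriculum_name for r in disaggregated(example_rows)])
```

Keeping the professional development flag optional reflects the note that most of the mathematics data did not report it.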
We were able to locate very few studies of professional development models that used student achievement measures of effectiveness.6 Most of these studies were found via an extensive review of the literature, including ERIC, Education Abstracts, ProQuest, EbscoHost and others. A matrix containing descriptive information on the professional development models and on the study elements was prepared for all the professional development evaluation studies that we identified.
4 American Association for the Advancement of Science (AAAS)’s Project 2061: http://www.project2061.org/publications/articles/textbook/default.htm; Department of Education's Mathematics and Science Education Expert Panels: http://www.ed.gov/offices/OERI/ORAD/KAD/expert_panel/math-science.html; Eisenhower National Clearinghouse for Mathematics and Science Education: http://www.enc.org; Center for the Social Organization of Schools: http://www.csos.jhu.edu/; Education Development Center, Inc.’s Center for Science Education: http://cse.edc.org; Education Resource Information Center (ERIC): http://eric.ed.gov; Northwest Regional Educational Laboratory: http://www.nwrel.org/scpd/catalog/index.shtml; National Clearinghouse for Comprehensive School Reform: http://www.csrclearinghouse.org/; National Science Resources Center: http://www.nsrconline.org/; The Textbook League: http://www.textbookleague.org/; University of Wisconsin-Madison’s National Center for Mathematics and Science: http://www.wcer.wisc.edu/ncisla/; curriculum company web sites; developer web sites (e.g., at Johns Hopkins and LHS), with follow-up to authors of studies; and subscription electronic journal databases (ProQuest Research Library Plus, JSTOR, Education Abstracts, EbscoHost, and Project MUSE).
5 Most of the data on the mathematics curricula did not provide this information, while that on the science curricula did.
6 The paucity of such studies has been mentioned by several researchers (Harlen 2004; Kennedy 1998; Marek and Methven 1991).
The International Component
Criteria for selecting countries for international comparison. Our report includes an international comparative study of a set of countries that may contribute useful information to the research at hand. The goal of this comparison is to garner additional, corroborating evidence on best practices. This required that countries be selected carefully and purposefully, based on the degree to which their experiences may be useful to the U.S. We selected nations based on two main criteria: (1) average country achievement on different mathematics and science tests (only high achievers were considered);7 and (2) similarity to the U.S. on key features of their educational systems, namely degree of centralization of decisionmaking with respect to the curriculum; degree of centralization of decisionmaking with respect to textbook use; and degree of stratification or selectivity of the system. Focusing on those countries that performed well on average and whose educational systems were closest to the U.S. yielded three nations for comparison: Canada, Australia, and England.8 The section entitled “International Comparative Study” provides a discussion of findings, as well as additional information regarding country selection.
Identification of relevant research on selected nations. We conducted a review of the literature on the three selected countries based on several sources. These included databases such as ERIC and ProQuest, as well as institutional sources (such as OECD, UNESCO, and the World Bank) and studies arising from the tests used for country selection: Trends in International Mathematics and Science Study (TIMSS) and Programme for International Student Assessment (PISA). The review focused first on studies of characteristics of candidate educational systems and then on curriculum, pedagogy, and professional development in the selected countries.
Findings of the Review
Interpreting Results: Some Things to Consider
The validity of the test. Issues for mathematics curriculum studies. During the past 15 years, mathematics curriculum development has moved in two different directions. Traditional curricula have continued the hierarchical structure of mathematics courses broken out by specific subject area: algebra, followed by geometry, followed by a more advanced algebra/trigonometry and pre-calculus. Calculus is made available for accelerated students, usually those who took algebra in the eighth grade. Standards-based curricula tend to be more interdisciplinary, providing students with a range of subject areas the first year and returning to them during each subsequent year, allowing for deeper analysis and understanding that tends to be more focused on longer-term problem solving.
7 Country performance was based on three tests: Trends in International Mathematics and Science Study (TIMSS) 1995 and 1999, and Programme for International Student Assessment (PISA) 2000.
8 England is not a “high achiever”, as it experiences achievement levels similar to the U.S., but it was included for reasons spelled out on page 12, under “The International Comparative Study: Conclusions.”
The content and skills covered by most standardized achievement tests tend to reflect more closely the content and skills covered by traditional mathematics curricula than those covered by standards-based curricula, causing them to have better “content validity” for traditional curricula. Many researchers of standards-based curricula are aware of this and develop their own student achievement tests that more accurately test the skills and content of standards-based curricula. If students taking one curriculum score higher than others on both types of test, there is no question of interpretation. Beyond that, however, judgments of efficacy must take into account the content validity of the tests in order to determine which type of curriculum is more effective.
Issues for science curriculum studies. As inquiry-based science curricula have become a major tool for standards-based reform efforts across the U.S., a dilemma has arisen regarding the appropriateness and credibility of assessments to measure effectiveness of these curricula in terms of student achievement. Basically, the dilemma can be described in the words of Walker and Schaffarzick (1974), who concluded from their review of innovative curricula that: “innovative students do better when the criterion is well-matched to the innovative curriculum, and traditional students do better when the criterion is matched to the traditional curriculum” (p. 94). To address potential lack of “fit”, curriculum developers have developed their own assessments that more closely measure the intended effects on students. Results from standardized tests, however, carry greater credibility and are used by most states for accountability purposes (although less for science than for mathematics). The dilemma posed, which is similar to that faced in mathematics, does not seem, nevertheless, to be as much of a problem in science. Shymansky, Kyle, and Alport (1983), for example, in conducting their large meta-analysis, compared the results of standardized tests to those of other forms of assessment and found very small differences. More recently, Hamilton, McCaffrey, Stecher, Klein, Robyn, and Bugliari (2003) report that the differences between multiple-choice (standardized) and open-response tests of student achievement that they observed in evaluating the effects of standards-based mathematics and science instruction at 11 sites were not significant. Ruby (2001) discovered that the positive relationship of hands-on science and test scores that he found did not differ by type of test.
Measures used. The evaluation studies reported here for mathematics and science curricula used a variety of assessments—some standardized, some state-mandated, some self-developed, and some developed by others specifically for standards-based curricula. The types of assessment tools used in the evaluations are specified for each study, and judgments regarding the content validity of the assessments used should guide determinations about the effectiveness of these curricula.
The size of the difference. Most impact studies for both mathematics and science curricula reported the statistical significance of their results.9 Statistical significance indicates how unlikely it is that an observed difference arose by chance alone. It does not say anything about the size or meaningfulness of a result. (Being statistically significant is a first step: if differences are not statistically significant, there is no reliable evidence that a difference exists.) If differences are statistically significant, then another measure, called an effect size, shows how big the difference is. Effect sizes greater than .4 are considered large; between .2 and .4 are considered moderate; and less than .2 are considered small (Glass, McGaw, and Smith, 1981). Readers can also look at the size of statistically significant differences and decide for themselves how meaningful they are.
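The effect size thresholds above can be illustrated with a short sketch. The report does not say which effect size formula the reviewed studies used, so this example assumes a standardized mean difference with a pooled standard deviation; Glass's own variant divides by the comparison group's standard deviation instead. The function names, the treatment of values exactly at .2 and .4 as moderate, and the scores in the usage lines are assumptions made for illustration only.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(treatment, comparison):
    """Standardized mean difference using a pooled standard deviation.

    Assumption: the pooled-SD form is used here for illustration; the
    report does not specify a formula.
    """
    n_t, n_c = len(treatment), len(comparison)
    s_t, s_c = stdev(treatment), stdev(comparison)
    pooled = sqrt(((n_t - 1) * s_t ** 2 + (n_c - 1) * s_c ** 2)
                  / (n_t + n_c - 2))
    return (mean(treatment) - mean(comparison)) / pooled


def classify(es):
    """Label an effect size with the report's thresholds.

    Assumption: values exactly at .2 or .4 are treated as moderate.
    """
    size = abs(es)
    if size > 0.4:
        return "large"
    if size >= 0.2:
        return "moderate"
    return "small"


# Made-up scores, used only to show the calculation.
es = effect_size([12, 14, 16], [10, 12, 14])
print(es, classify(es))
```

With these invented scores the group means differ by 2 points and the pooled standard deviation is 2, so the effect size is 1.0, which the thresholds label as large.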
The target population or district context. Unfortunately, very few studies presented data on effects by sex, race/ethnicity, or urban school location. Where data are disaggregated by these characteristics, we have highlighted these findings, which enhance our knowledge about the effectiveness of curricula for populations or districts targeted by the Foundation for funding.
Middle and High School Mathematics Curricula: Conclusions
Our review netted 89 middle and high school curricula including eight mathematics curricula that were developed as part of whole school reform efforts (out of 31 whole school reform efforts examined) and 81 other middle and high school mathematics curricula. A total of 156 studies of student mathematics achievement with comparison group data were found for 18 of the curricula (20 percent of the total number of curricula identified). A table listing the math curricula that had credible evaluations appears in appendix A together with overviews of the 18 mathematics curricula and their impact studies. The overviews include: the type of student achievement measure used, the number and direction of the results, and, if available, the size of any differences between groups. If the results were broken out by sex and/or by race/ethnicity, this, too, is indicated. We concluded from our review of these evaluation studies that:
Most middle and high school mathematics curricula do not have studies of student achievement with comparison groups that can be found through literature or web searches.
As indicated above, studies of student achievement with comparison groups could be found for only 20 percent (18) of the curricula. Only three of the studies found specified the curriculum to which the target curriculum was being compared. The rest compared their curriculum to some unnamed curriculum, making comparisons across curricula impossible.
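The coverage figures reported in this section can be checked with simple arithmetic. The sketch below uses only counts stated in the text; the variable names are illustrative.

```python
# Counts taken from the text of this section.
whole_school_reform_math = 8    # math curricula from whole school reform efforts
other_math = 81                 # other middle and high school math curricula
with_comparison_studies = 18    # curricula with comparison-group studies

total_math = whole_school_reform_math + other_math
share = with_comparison_studies / total_math

print(f"{total_math} curricula, {share:.0%} with comparison-group studies")
# prints "89 curricula, 20% with comparison-group studies"
```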
9 Where inferential statistics were used, only differences that reached the conservative minimum acceptable statistical significance level of .05 were included. Inferential statistics were used in all science studies with one exception, which is noted in the description.
If students are going to be judged on the results of an external test, the mathematics curriculum selected should cover the areas and skills that are included on that test (i.e., the test and curriculum should be aligned).
Different mathematics curricula cover different content areas at different times. Three of the 18 curricula (Saxon Math, Direct Instruction, and Advanced Placement Calculus) cover traditional mathematics subject areas (e.g., algebra, geometry), while the remaining 15 integrate traditional mathematics subject areas across years rather than covering a subject area per year. Whether a curriculum is “integrative” or “traditional” has implications for testing. As would be expected, students tend to score higher on tests focusing on the content and skills covered in their curriculum. Traditional math curricula have greater content validity than do standards-based curricula with most standardized and state tests. Integrative mathematics curricula, which are standards based, have greater content validity with standards-based tests than do most traditional curricula.
Studies of six of the curricula (Cognitive Tutor, Connected Mathematics, Interactive Mathematics Program, Prentice Hall: Tools for Success, Saxon Math and University of Chicago School Mathematics Project [UCSMP]) found that students who use the curriculum being tested scored higher than comparison students on a majority of standardized and/or state tests used as well as on a majority of the curriculum-based tests used.
One of the six curricula, Saxon, focuses on traditional course breakdowns (i.e., algebra, geometry), while four curricula—Connected Mathematics, Interactive Mathematics, Prentice Hall, and UCSMP—are integrative. The last curriculum, Cognitive Tutor, includes both traditional and integrative components. Moderate to large achievement differences between target and comparison students, as indicated by effect size, were found in favor of four of the six curricula (Cognitive Tutor, Connected Mathematics, Interactive Mathematics, and Prentice Hall). All six curricula cover middle school and three—Cognitive Tutor, Saxon, and UCSMP—cover high school as well.
The few results broken out by sex were inconsistent.
Only five of the 18 curricula—Cognitive Tutor, Connected Mathematics, Interactive Mathematics, MATH Connections, and Mathematics with Meaning—broke out results by sex. In Connected Mathematics and MATH Connections, no sex differences were found, while in Mathematics with Meaning, boys scored slightly higher than girls. In Cognitive Tutor, the results were mixed. Girls taking Interactive Mathematics were slightly more apt than boys to continue on to three or more years of mathematics.
Connected Mathematics appeared to be reducing racial/ethnic gaps.
Four Connected Mathematics studies looked at the relative growth in achievement by race/ethnicity. In two studies, African-American and Hispanic students showed greater growth than the other Connected Mathematics students. In a third study, African-American students showed greater growth than others, while in a final study Hispanic, White, African-American and Asian-American students’ scores increased while Native American students’ scores decreased.
With the exception of Connected Mathematics, too few results per curriculum were broken out by race to allow us to draw general conclusions regarding racial/ethnic effects.
Five curricula presented results by race/ethnicity (Cognitive Tutor, Connected Mathematics, MATH Connections, Mathematics in Context, and Mathematics with Meaning). A MATH Connections study found no significant difference by race/ethnicity among MATH Connections students, while in Mathematics with Meaning, white students outperformed minority students. Other studies compared target students with comparison students of the same race/ethnicity. African-American Mathematics in Context students in one study were found to score better than comparison students. In one Cognitive Tutor study, African-American students scored better than comparison students, while there was no difference for Hispanic students. In a second study, Hispanic students using Cognitive Tutor did better than comparison students. As detailed in the section above, Connected Mathematics studies provided more consistent evidence that the curriculum was successful in reducing racial/ethnic gaps.
Middle and High School Science Curricula: Conclusions
We identified 80 science curricula at the middle and high school levels.10 As with mathematics, for which eight curricula had been developed as part of whole-school reform efforts, we found seven science curricula that had been developed specifically as a part of whole-school reform.
A total of 45 studies of student achievement in science were found that met our criteria, covering 26 percent (21) of the total science curricula identified. A table listing these curricula as well as brief descriptions of each and the results of the studies are given in appendix B. As with the mathematics curricula, the study descriptions include the type of student achievement measures used, a description of the results from each study, and, if available, the effect sizes of any differences between groups. Results by sex, race/ethnicity, limited English-proficiency (LEP) status, or urban schools are provided where available. Our review of evaluation studies of science curricula led us to the following conclusions:
As with mathematics curricula, most middle and high school curricula do not have evaluation studies of student achievement with comparison groups that can be found through published literature or web searches.
Of the 80 curricula identified, studies that met our criteria (45) could be found for only 21 curricula. None was a publisher’s textbook series. In contrast to the mathematics curricula, however, many of which had been the subjects of multiple studies, most of the science curricula had only one evaluation or study, usually an unpublished work available through the developer. Because the National Science Foundation (NSF) was a pioneer in developing standards-based science curricula (the first wave of these appeared in the sixties), there were published meta-analyses that compared the effect on student achievement of a large group of NSF-funded science programs, all inquiry-based, with the effect of traditional, textbook-based science curricula. We believe that the dearth of evaluation studies of single science curricula can be explained by the relatively recent development of the new generation of science curricula, many of which were also funded by NSF. Not enough time has elapsed for these curricula to have been the subject of multiple studies, and those studies that are available have been conducted mostly as evaluations by the developers of the curricula.
10 Of the 80 science curricula identified, 59 had no evaluations or had evaluations that did not meet our criteria, or evaluations were out of print, or we did not receive a response from the developers.
Science curricula based on the inquiry approach are consistently more effective than traditional science curricula as measured by student achievement.
The preponderance of evidence provided by meta-analyses and evaluations of individual curricula seems to confirm that inquiry-based science curricula produce larger effects on student achievement than do the more “traditional” science curricula. The largest study of this kind (Shymansky, Kyle, and Alport 1983), which was reanalyzed in 1990 (Shymansky, Hedges, and Woodworth), involved 81 studies (reanalysis figures). While this meta-analysis found that inquiry-based science programs had the greatest impact on student achievement and process skill development in the primary grades (with significant differences in effect sizes found at the intermediate elementary level [grades four through six] for attitudes and perceptions only), by the junior high and high school levels, significant impact was found on achievement, attitude, and process skills. Other meta-analyses have reported greater positive effects on student performance for inquiry-oriented science than traditional approaches for high school curricula (Weinstein, Boulanger, and Walberg 1982); inquiry-discovery teaching techniques (Wise and Okey 1983); and an inductive rather than a deductive approach to teaching (although this effect was very small) (Lott 1983).11 No meta-analysis on inquiry-based science curricula of the magnitude undertaken by Shymansky, Kyle, and Alport in 1983 (and Shymansky, Hedges, and Woodworth in 1990) has been published on more recent inquiry-based science curricula, although researchers at Education Development Center are currently conducting such a study, and the National Research Council held a meeting in May 2004 on the topic of evaluating inquiry-based science.
The direction of the effects of inquiry-based science curricula on student achievement and performance is generally positive, as shown in the individual evaluations of the curricula that are identified in this report. Programs showing the greatest positive effects are (1) a set of activity models for use in physical science and technology education courses in middle and high school (Designs/Designs II); (2) a comprehensive, laboratory-based program in which students in grades seven through nine construct their own knowledge through experiential, hands-on learning (Foundational Approaches in Science Teaching [FAST]); (3) curriculum materials to support the development of integrated science understanding for middle school students in urban schools (Center for Learning Technologies in Urban Schools [LeTUS]); (4) a supplemental program for average-to-gifted students in grades two through eight employing problem-based learning to engage students in the study of the concept of systems, specific science content, and the scientific research process (National Science Curriculum for High Ability Learners); (5) a program to promote understanding of physics principles in the context of experiences relating to the daily lives of high school students (Physics Resources and Instructional Strategies for Motivating Students [PRISMS]); and (6) an inquiry-based, technology-supported environmental science curriculum for high school (WorldWatcher/LATE).
11 These earlier meta-analyses used the terms “new” and “innovative” to describe inquiry-based science curricula. The distinction between “new” and “traditional” curricula was set forth by Shymansky and his colleagues (1983), with “new” curricula (a) having been developed after 1955; (b) emphasizing the nature, structure, and processes of science; (c) integrating lab activities as an integral part of the class routine; and (d) emphasizing higher cognitive skills and appreciation of science. “Traditional” curricula were defined as (a) having been developed before 1955; (b) emphasizing knowledge of scientific facts, laws, theories, and applications; and (c) using lab activities as secondary applications of concepts previously covered in class.
It is difficult to determine the effect of these science curricula on different subgroups of students—such as girls, minority group members, and urban students.
Very few of the curricula had studies that met our criteria and disaggregated their findings by sex, language minority status, or urban location. Surprisingly, none of the studies reported data disaggregated by race/ethnicity.
Sex. In one evaluation study of Constructing Ideas in Physical Science (CIPS), participation did not appear to have closed the gender achievement gaps on multiple-choice content, process questions, or open-ended content items. One of two studies on Center for Learning Technologies in Urban Schools (LeTUS) provided evidence that participating in at least one LeTUS unit reduced the boy-girl achievement differences on statewide examinations. Evaluation data on Modeling Instruction in High School Physics show that, in terms of performance, male students consistently outperform female students. The Designs/Designs II evaluation reported that on measures of conceptual knowledge, there was no significant difference in the gains made by girls versus boys. Thus, one (LeTUS) of the four curricula that reported data by sex showed greater gains in (some areas of) achievement for female students. Two (CIPS and Designs/Designs II) showed no differences, and a fourth (Modeling Instruction in High School Physics) showed larger gains for boys. A large meta-analysis of NSF-funded inquiry-based programs (Shymansky, Kyle, and Alport 1983), which was resynthesized in 1990 (Shymansky, Hedges, and Woodworth 1990), found that the NSF-funded “new” science curricula had a significant positive effect on males but not on females in terms of composite performance; analytic skills of females, nevertheless, improved significantly in the inquiry-based science programs.
Race/Ethnicity, LEP Status, or Urban School Attendance. Interestingly, evaluations of two inquiry-based curricula reported positive results for English Language Learners (ELLs). The use of FOSS in fourth and sixth grade classes for ELLs showed a positive relationship between years in the science program and standardized test scores. The evaluation of Expeditionary Learning Outward Bound (ELOB) reported consistent gains in all science subject areas over five years for one school where the number of immigrant (ELL) students grew by 22 percent. This school also had a high percentage of economically disadvantaged students (i.e., students on free and reduced lunch).
The large meta-analysis of NSF-funded inquiry-based programs (Shymansky, Kyle, and Alport, 1983), which was re-synthesized in 1990 (Shymansky, Hedges, and Woodworth), found that the NSF-funded inquiry-based programs, while having a greater effect on all students than did the traditional programs, showed (1) a much greater effect on the composite and achievement scores of urban students than on their suburban or rural counterparts and (2) a much greater effect on the analytic scores of urban students than on their suburban counterparts. WorldWatcher/Learning about the Environment (LATE) reported higher gains for urban students than for suburban students. It is surprising that none of the studies that provided disaggregated data on urban students showed separate outcomes by race/ethnicity.
Most science curricula include a professional development component.
At least 16 of the 21 science curricula for which we report evaluation studies have professional development components. Inclusion of a professional development component as a part of the curriculum is far more prevalent in science than in mathematics because science curricula tend to be more discretionary and variable than mathematics curricula (Kennedy 1998), leading developers to provide more guidance to teachers regarding the appropriate instructional approaches to be used for specific curricula.
Professional Development Programs: Conclusions
We identified 18 evaluation studies of professional development in science and mathematics that used student achievement outcomes as measures of effectiveness.12 A matrix outlining general features of these studies appears in appendix C. Our search for studies that met established criteria was facilitated by the research of Mary Kennedy (1998). The following are the conclusions that we draw from a review of these 18 studies and others:
Providing professional development for teachers of standards-based science curricula is associated with higher levels of student achievement.
The re-analysis of the large 1983 meta-analysis of inquiry-based science curricula (Shymansky, Hedges, and Woodworth 1990) found larger effect sizes on student performance measures for students of teachers in inquiry-based courses who had participated in professional development linked to the use of inquiry-based materials. (Students of teachers using inquiry who had not had professional development still outperformed students in traditional courses, but the former were outperformed by students in inquiry-based courses whose teachers had received professional development.) Similarly, an evaluation of an inquiry-based science curriculum, Project Inquiry, found that teachers who received professional development in implementing standards-based, inquiry-oriented instructional strategies (and who used the specific science materials linked to the program) had students who performed significantly higher on two science assessments (Rose-Baele 2003). An evaluation of Modeling Instruction in High School Physics reached a similar result: the students of high school physics teachers who had completed the modeling workshop series demonstrated much greater gains on a widely used physics assessment tool than did both the students of the same teachers in the year before the teachers participated in the professional development series and a comparison group of high school physics students from a decade earlier (Hestenes 2000).
12 Five of these were connected to specific mathematics or science curricula.
Professional development that is tied to curriculum, to knowledge of subject matter, and/or to how students learn the subject is more effective in terms of improving student achievement than is professional development that focuses only on teaching behaviors.
A number of studies have concluded that the content of professional development is more important than its format and that content should be linked to subject matter knowledge, a specific curriculum, or the process of student learning. In her analysis of 12 studies of professional development models that reported effects on student achievement, Kennedy (1998) found that the models showing the largest effect sizes were those that focused on subject matter knowledge and on student learning of a particular subject. This finding was echoed in the work of McCaffrey, Hamilton, and Stecher (2001) in a large-scale study of high school standards-based mathematics reform in a large urban school district that was part of NSF’s Urban Systemic Initiative program. One of their conclusions was that in order to be effective, professional development for teachers should consider curriculum and instructional practices in combination. In the researchers’ words, “Simple prescriptions for how to teach are unlikely to be effective” (p. 10). Cohen and Hill (1998) addressed the question of whether students of teachers who received professional development focused on student curriculum scored higher on state mathematics assessments in California. They found that teachers who attended curriculum-centered workshops and who had learned about the state assessment system had students who received higher achievement scores on the state test than students of teachers who had not participated in the workshops or learned about assessment.
The amount of professional development provided is an important factor in influencing both change in teaching behavior of teachers and change in the classroom environment.
The amount of professional development provided to teachers is another factor that has received attention in the literature. While several studies suggest that professional development, to be effective, should be intensive and sustained (Kahle and Rogg 1996; Supovitz and Turner 2000), we found only one study that specifically investigated how many hours of professional development were required to effect a change in teaching behavior (towards inquiry-based teaching practice). This study found that behavioral change was only evident after teachers had received a minimum of 80 hours of intensive professional development (Supovitz and Turner 2000). The same study found that it was only after 160 hours of professional development that the teachers’ classroom environment acquired a “culture of investigation.”13 Evaluators of Project Inquiry found that the number of self-reported hours of Project Inquiry-sponsored professional development was positively associated with science achievement of students (Rose-Baele 2003). Kennedy (1998), however, cautions against adopting a “more is better” approach to professional development. She points out that of the 12 professional development models that she investigated, amount of contact time was not the most important factor determining the largest effect sizes (although, coincidentally, the most effective model reported 80 in-service contact hours, which was the minimum effective contact time found by other research [Supovitz and Turner 2000]).
Widely held beliefs about what constitutes effective professional development are not supported by research linked to student achievement.
In her study of 12 professional development models, Kennedy (1998) examined various features of in-service programs that have been hypothesized as being important elements of successful professional development. These features are (1) program intensity as measured by total contact time with teachers (discussed above); (2) dispersal of time (whether it is concentrated or interspersed throughout the school year); (3) classroom visits by experts for consultation or coaching; and (4) whole school or individual provision of professional development. Her conclusions suggest that while professional development in science seems to benefit from distributed time (sessions throughout the academic year), the studies in mathematics do not support the hypothesis that distributed time is beneficial. Four of the programs reviewed by Kennedy provided in-class visitations, yet none produced greater influences on student learning than those that did not. Kennedy’s study also found no compelling evidence that in-service programs working with whole schools are more effective in terms of increasing student achievement in mathematics or science; in fact, the programs in her study that worked with whole schools demonstrated the smallest influences on student learning.
The International Comparative Study: Conclusions
As mentioned earlier, countries were selected based on the degree to which their experiences may be useful to the U.S. We therefore selected, first, nations that were high achieving on international tests of student achievement. Among these, we selected those whose educational systems were closest to that of the U.S. This yielded three nations: Australia, Canada, and England. Two of them, Canada and Australia, are ideal insofar as their systems of education are closest to that of the U.S. One of them, England, is more centralized than the U.S. (with a national curriculum, for example); more selective (i.e., not a comprehensive system of education); and not as high achieving as the others (experiencing achievement levels similar to the U.S., sometimes higher and sometimes lower). All of them, however, share other characteristics that make them good points of reference for the U.S. They are high-income, developed nations and spend similar shares of GDP on kindergarten through twelfth grade education. They are also culturally closer to the U.S. than the Asian and European countries that are generally highest achieving (e.g., Japan and the Netherlands). This provides a natural (albeit partial) control for cultural disparities that may account for some of the observed achievement differences. Lastly, these nations have experienced, like the U.S., pressure from federal authorities seeking to influence educational policy.14 The conclusions below are based on a review of the relevant literatures on these three nations and are presented (selectively) as a complement to the conclusions arising from the literature review of curricula and professional development in U.S. mathematics and science discussed above.
13 “Teacher behavior” in this context refers to teachers’ use of specific pedagogical approaches in instruction, such as inquiry-based teaching practices. “Change in the classroom environment” refers to teacher facilitation of an investigative classroom culture through seating arrangements to stimulate discussion, use of cooperative learning groups, encouraging students to explain concepts to one another, and other such practices.
Curriculum: Trends in the selected countries follow those found to be effective in the U.S.—shift from theory to applications, integration of subjects.
It is important to note that, like the U.S., two of the selected countries—Australia and Canada—have no national curriculum while one does (England). Australia delegates curriculum decisions to the member states, much like the U.S. Canada does the same thing, but de facto experiences convergence in curriculum coverage across provinces due to coordination through a Council of Ministers and through book purchases (the same publishers furnish books for all the provinces).
These countries have experienced a shift from theory to applications and utility. There is greater emphasis on the relevance and importance of math and science. There is also emphasis on integrating mathematics and science with other subjects and disciplines, as well as over time (i.e., building better course/content sequences). These changes, like the pedagogical ones mentioned below, often clash with testing requirements, because existing tests (usually the centerpiece of accountability efforts as well as certification of student achievement levels) tend to focus on acquired knowledge and theory rather than on processes or demonstrated problem-solving skills. This goes back to the issue of “content validity” discussed earlier, that is, the degree to which the material covered in the test and the curriculum are aligned.
Pedagogy: Strategies prevalent in the selected nations are those found to be effective in the U.S. literature.
The tendency in all of these nations has been to transition, in both math and science, away from traditional textbook-based instruction and into inquiry-based, hands-on pedagogical approaches. They emphasize problem-solving skills over rote memorization, and active modeling/activity-based instruction over passive textbook or lecture-based learning. There is also greater emphasis on “data analysis” and on real-world applications, particularly in mathematics. They also are moving towards increased use of technology in classroom instruction. These changes come hand in hand with a decreased emphasis on textbook use (though there is evidence of continued reliance on textbooks) and greater diversity of materials used in the classroom (manipulatives, technology, non-textbook printed materials, etc.).15 These conclusions are true of Canada and Australia and to a lesser extent of England as well.
14 The case of England is an extreme example of this, as it has a national curriculum and unprecedented national government influence in education since 1988.
To summarize, these nations (in particular Canada and Australia, the higher achieving of the three) seem to have shifted from a formal, traditional teaching approach to one centered on applications to the real world, on student interactions (group work) and on student-teacher interactions (interactive learning rather than lectures). This also could be described, partly, as a shift in the locus of responsibility for learning—from the teacher to the student.
Professional Development: There are virtually no studies (outside of the U.S.) of the impact of professional development on teaching practices or on student achievement, but there is widespread recognition of the importance of professional development and the need for evaluation of its impact.
Country studies indicate that professional development is offered by a variety of organizations (schools, boards, professional organizations, universities, central governments or departments of education). There is great variation with respect to all aspects of professional development opportunities—number of days/hours, funding sources, decisions regarding form and content of training, and extent to which teachers take advantage of them. There is, however, a clear emphasis on the importance of professional development (and, more generally, teacher quality) to raise student achievement. This is also true of the need to provide professional development opportunities to elementary school teachers, who often lack the knowledge and confidence needed to teach science. Professional development thus focuses, depending on the need of different teacher populations, on content knowledge and/or pedagogy. Leadership skills are another area of focus of professional development. Unfortunately, evidence on the types or forms or intensity of professional development opportunities that are effective (in these countries) is lacking. In detailed country reports on this topic recently published by OECD, all three nations mentioned the need to obtain evidence of the link between professional development and teacher practices and, ultimately, student learning.16
15 England, the lowest achieving of the three in mathematics, relies on mathematics textbooks and lecture style more than the other two countries.
16 The Canadian report was based on one province, Quebec.
Summary of Conclusions
In this section, we summarize the major conclusions of this review that should be most useful to those wishing to invest in sustainable school reform in science and mathematics.
• Effective mathematics curricula in middle and high school can be either traditional or integrative (standards-based).
• Effective science curricula in middle and high school should be inquiry-based rather than traditional.
• Effective professional development programs are those that focus on content rather than format and that have the following features: