An Inquiry Into Inquiry Science Teaching in Colombia

AN INQUIRY INTO INQUIRY TEACHING IN COLOMBIA

A DISSERTATION

SUBMITTED TO THE SCHOOL OF EDUCATION

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Maria Figueroa

May 2011

© 2011 by Maria Jose Figueroa Cahn Speyer. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/wp841zm8200

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Richard Shavelson, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Edward Haertel

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Maria Araceli Ruiz-Primo

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

ABSTRACT

Science education in different parts of the world has focused on teaching facts and concepts transmitted by a teacher in a lecture-style approach. In contrast, some initiatives, such as inquiry-based science teaching, use scientific inquiry (what scientists do to generate new knowledge) as a basis for teaching science to students. That is, inquiry-science teaching focuses on getting students to do what scientists do and to learn about natural phenomena as scientists do. This is not to say inquiry-science teaching ignores facts and concepts; it goes beyond transmission.

Inquiry-Based Science Education (IBSE) programs have been implemented throughout the world, with the objective of improving science education. Even though IBSE programs have received wide attention and substantial funding, the impact of this approach on students’ learning is unclear. As a small step in clarifying the impact of IBSE on students’ achievement, a quasi-experiment was conducted and reported in this dissertation. More specifically, the study examines achievement differences between inquiry science education and typical science education in five schools in Bogotá, Colombia for overall achievement, achievement by types of knowledge (declarative, procedural, mental model) and proximity of the assessment measure to the curriculum (proximal and distal), and achievement as measured by performance assessments.

Inquiry-based science teaching takes many forms. Moreover, even though studies compare inquiry teaching with other approaches, descriptions of this type of inquiry teaching are vague and vary widely as to classroom implementation. Through a review of the literature focused on empirical studies that compare inquiry teaching with other approaches, I developed a framework used to define inquiry teaching and assess it using a variety of measurement methods. The framework focuses on three basic elements: 1) teachers, 2) students, and 3) curriculum materials, and how they tap into inquiry facets or domains (conceptual, epistemic, and social). This framework guided my comparative study of an IBSE program in Bogotá, Colombia with a more traditional approach teaching the same unit, Body Systems.

Three types of assessments measured fifth grade students’ science achievement: paper and pencil tests with (1) multiple-choice and (2) constructed-response questions and (3) performance assessments. The multiple-choice questions were constructed to test the different types of knowledge; test items were written proximal and distal to the curriculum taught. Of the two performance assessments, one was content rich with a direct link to the curriculum, while the other was content lean with an emphasis on science process skills.

A total of 365 students from both IBSE and the Control group took the paper and pencil tests and a sub-group of 140 students from both groups took the performance assessments. Data were collected from 5 different schools in Bogotá, three that teach science through an IBSE program and two that use a traditional approach. Data were analyzed using a nested design (classrooms within schools within treatment condition) and allowed for a comparison of the IBSE and the Control group science achievement.

The findings were mixed as to the impact of IBSE teaching on achievement. While there was no statistically significant treatment effect as measured by the paper and pencil test, including the multiple-choice and constructed-response questions, there was a significant treatment effect in the content rich performance assessment as well as in the content lean one. Moreover, even though there was no significant treatment effect on the paper and pencil tests, IBSE students consistently outperformed the Control students on all the different measures of science achievement. This result can be explained by the nested design, the large variation among schools (which formed a significant part of the experimental error term), and the consequently low statistical power.

The results, then, suggest that students who learn science through inquiry are able to go beyond concepts and apply them in conducting science investigations. Additional studies with more schools, both to generalize better than I could in this study and to increase statistical power, should be conducted in Colombia and in other countries that are reforming their curriculum through inquiry-based science teaching.

ACKNOWLEDGEMENTS

I am greatly appreciative of all the teachers and students who took part in this project, who were always willing to help and provide the time, logistics, and feedback that made this study possible. I am also thankful to the school administrators from Alianza Educativa, Colegio Las Mercedes, and Gimnasio Sabio Caldas for granting permission to conduct all the assessments at different times.

My heartfelt gratitude also goes out to my advisor Richard Shavelson who is responsible for the successful completion of my dissertation. His untiring and constant effort, commitment, encouragement, guidance and unconditional support helped me greatly in the understanding and writing of the dissertation.

A number of experts in different fields have offered useful advice and encouragement. I want to particularly thank Edward Haertel and Maria Araceli Ruiz-Primo for their valuable suggestions and comments at different stages of this process.

It is a pleasure to thank those who made this thesis possible. ICFES and Pequeños Científicos provided support in the development of the assessments. The team from Centro de Evaluación of Universidad de los Andes worked non-stop with me in data collection. And last, my colleagues from Universidad de Los Andes provided valuable feedback and guidance during this dissertation.

I offer special thanks to my office mates and friends at Stanford, Alice Fu and Jon Shemwell. The whole PhD experience was a great ride with you along. Thanks for the support, patience, and academic growth during these years. To Alice, thank you for the Wednesday meetings and good vibes during this process.

I am grateful to my family and friends, for their patience and constant words of encouragement. I thank my mother Vivian for her support and admiration. I owe my deepest gratitude to my son Emilio, who never complained and always understood the importance of this work. Finally, it would have been next to impossible to write this thesis without Camilo's support, patience and love.

LIST OF TABLES

Table 2.1. Characteristics of the Facets of Inquiry that Students Show when Learning Science through an Inquiry Approach (Adapted from Furtak & Seidel, 2008)
Table 3.1. Characterization of the Studies that Compare Inquiry with Other Teaching Approaches
Table 3.2. Conceptual Approach - Mapping Study’s Inquiry Conception onto the Inquiry Facets
Table 3.3. Research designs
Table 3.4. Critique and Drawbacks
Table 4.1. Examples Types of Assessments Used to Measure Science Achievement
Table 4.2. Outcomes used in Studies that Compare Inquiry with other Approaches
Table 4.3. Relationship between Types of Knowledge and Types of Assessments
Table 5.1. Distribution of Students in Public and Private Schools in Bogotá in 2009
Table 5.2. Managing Institutions of Concession Schools in Bogotá
Table 5.3. Schools Participating in the Studies
Table 5.4. Schools’ Level According to the Results from the 2010 ICFES Exit Exam.
Table 5.5. Schools’ Results in the Science Components of the ICFES Exam (2010).
Table 5.6. Schools’ Results in the 2009 SABER Exams.
Table 6.1. Teachers’ Classroom Practice as Evidenced from One Lesson
Table 6.2. Summary General Information About Each Teacher Based on Interviews
Table 6.3. Characterization of What Students do in Class Based on Teacher Responses to Adapted TIMSS Questionnaire
Table 6.4. Mapping Teachers and Students with the Inquiry Facets
Table 6.5. Student Participants in the Study by Group, School and Class (sample sizes).
Table 6.6. Assessments related with the Paper and Pencil Tests
Table 6.7. Composition of Items in the Human Body System Booklets
Table 6.8. Composition of Items in the Human Body Systems Mapped into the Facets of Inquiry
Table 6.9. Reliabilities of the Paper and Pencil assessments.
Table 6.10. Number of students who participated in the Performance Assessments.
Table 6.11. Reliabilities of the Performance Assessments
Table 7.1. Descriptive Statistics of the Results of the Pre and Post Multiple-Choice Tests
Table 7.2. Correlations among the Paper and Pencil Tests
Table 7.3. Adjusted Marginal Means of Results for Post-Equal-Pre
Table 7.4. Descriptive Statistics of the Results of the Full Post Paper and Pencil Test
Table 7.5. Results of the Nested ANCOVA with Constructed Response and Posttest as Dependent Variables
Table 7.6. Descriptive Statistics of the Overall Adjusted Means of Science Achievement Depending on the Type of Knowledge by Treatment Group
Table 7.7. Correlations of the Results of Science Achievement Depending on the Type of Knowledge
Table 7.8. Results of the Nested ANCOVA for Knowledge Types with Pretest as Covariate
Table 7.9. Descriptive Statistics of the Results of Science Achievement Depending on the Proximity of the Items
Table 7.10. Correlations of the Results of Science Achievement Depending on Proximity
Table 7.11. Results of the Nested ANCOVA for Proximity with Pretest as Covariate
Table 7.12. Descriptive Statistics for the Performance Assessments.
Table 7.13. Results of the Nested ANOVA for Performance Assessments
Table 7.14. Correlations of the Types of Test
Table 7.15. Correlations of the Types of Knowledge and the Performance Assessments
Table 7.16. Correlations of the Types of Knowledge and the Performance Assessments
Table 7.17. Results of the Nested ANOVA for Types of Test

LIST OF FIGURES

Figure 2.1. Continuum representing forms of science instruction (Furtak, 2006)
Figure 2.2. Triangle of Inquiry-Based Science Teaching
Figure 4.1. Schematic of a multilevel assessment of science achievement. Taken from: Ruiz-Primo et al., 2002, p. 372.
Figure 4.2. The relationships between different knowledge types. Source: Shavelson, Ruiz-Primo, Li & Ayala, 2003.
Figure 5.1. Distribution of students in public schools according to SES in Bogotá in 2009. Source: Secretaría de Educación del Distrito, 2009.
Figure 5.2. Map of schools participating in the studies (adapted from Alianza Educativa, 2008).
Figure 6.1. Process of teacher training in Pequeños Científicos. Taken from: Pequeños Científicos. 2004. Institutional Presentation.
Figure 6.2. A print-out of pages of the book: “Santillana Casa Ciencias Naturales 5”.
Figure 6.3. Crossword puzzle (Digestion in ) used in Control group.
Figure 6.4. Fill in the blank activity used in Control Groups teaching science.
Figure 6.5. Example of a change in a question after the input received in a Think-Aloud.
Figure 7.1. Effects size by types of knowledge

CHAPTER 1. INTRODUCTION

Science education in different parts of the world has focused on teaching facts and concepts transmitted by a teacher in a lecture-style approach. In contrast, some initiatives, such as inquiry-based science teaching, use scientific inquiry as a basis for teaching science to students. Inquiry-Based Science Education (IBSE) programs have been implemented in at least 30 countries around the world, with the objective of improving science education (IAP Science Education Programme, 2006). Even though IBSE programs have received wide attention and substantial funding, the impact of this approach is unclear. This dissertation studies the impact, if any, inquiry-based science teaching has on student learning in Colombia.

The Setting

Inquiry-based science education programs have been implemented in Colombia for 10 years by Pequeños Científicos, a program run by Universidad de los Andes in collaboration with schools and Maloka, the Bogotá Science Museum. Of the more than 200 schools in the country that implement Pequeños Científicos, three schools in Bogotá were selected for this study. The three Pequeños Científicos schools (hereafter IBSE schools) and the two comparison schools (hereafter Control schools) come from similar low socio-economic neighborhoods. The schools were selected using the following criteria: all were concession schools, all came from similar socio-economic backgrounds, all had similar results on national standardized tests, all agreed to participate in this study, and all taught the same curriculum at approximately the same time. Teachers in the three IBSE schools have received training from Pequeños Científicos in how to teach science using inquiry. The Control-school teachers use a traditional approach to science teaching, mainly focusing on content coming from a textbook.

As a small step in clarifying the impact of IBSE on students’ achievement, a quasi-experiment was conducted and reported in this dissertation. More specifically, the study examines achievement differences between inquiry science education and traditional science education in five schools in Bogotá, Colombia, for overall achievement, achievement by types of knowledge (declarative, procedural, mental model) and proximity of the assessment measure to the curriculum (proximal and distal), and achievement as measured by performance assessments.

The Problem

Several studies and meta-analyses have examined the impact of IBSE compared to other science teaching approaches (Berg, Bergendahl, Lundberg, & Tibell, 2003; Brederman, 1983; Chang & Mao, 1999; Furtak, Seidel & Iverson, 2009; Klahr & Nigam, 2004; Minner, Levy & Century, 2010; Schneider, Krajcik, Marx, & Soloway, 2002; Tamir, Stavy, & Ratner, 1998). The findings in the literature are varied and sometimes contradictory. Some studies present evidence of the positive impact of scientific inquiry instruction while others present counter evidence. What appears through the fog as illuminated by meta-analyses is a slight advantage for IBSE. This current research intends to help lift the fog a little bit.

The varied definitions of inquiry-based science teaching and the multiple methods used for measuring students’ outcomes are probably two of the causes of the inconsistencies of the results. Additionally, research done on the impact of scientific inquiry has suffered from design flaws including the lack of a clear definition of the meaning of inquiry, especially when dealing with achievement. Results from studies that focus on measuring student science achievement are mixed, with some studies suggesting positive impact of the methodology while other studies present no impact or negative impact.

The present dissertation seeks to compare IBSE and Control students in science achievement. This dissertation begins by presenting a working definition of what inquiry-based science education is in order to establish the science-teaching framework for this study. The dissertation continues with a presentation of several empirical studies that compared inquiry-based teaching with other approaches. The next chapter presents different components related to measuring student science achievement including ways in which inquiry has been measured in empirical studies. The following chapter focuses on a detailed description of the treatment and control groups and the procedures for conducting the study. Finally, the next chapter reports and discusses findings and a concluding chapter brings us back to the original question of the effects of IBSE and possibilities for future research.

The contribution of this dissertation is to provide an in-depth analysis of the impact of inquiry-based science teaching measured through different assessments that tap into different types of knowledge. Since inquiry education not only guides students into learning concepts, but also goes beyond facts, it may present further alternatives to develop skills in students.

CHAPTER 2. DEFINITION OF INQUIRY-BASED SCIENCE TEACHING

The term “scientific inquiry,” as applied to science education, has been used in a variety of ways and contexts (Berg, Bergendahl, Lundberg, & Tibell, 2003; Chang & Mao, 1999; Klahr & Nigam, 2004; Schneider, Krajcik, Marx, & Soloway, 2002; Tamir, Stavy, & Ratner, 1998). The problem, then, is that its definition is not universal and is interpreted differently by different people. Examples include Chang & Mao’s (1999) definition, which focuses on inquiry as hands-on activities, gathering and recording data, and interpreting data and their relationships, and that of Klahr & Nigam (2004), who define it as experiences where students explore without any teacher guiding questions or direction.

Since there are so many conceptions and definitions of inquiry in science education, it is important for me to present my perspective of what “inquiry-based science teaching” means in science education. The development of this conceptual framework is based on four sources: 1) a definition by the National Research Council that focuses on students engaging in science activities, 2) a definition by the National Science Foundation that focuses on teachers, 3) a definition of guided scientific inquiry provided by Furtak (2006), and 4) Duschl’s (2003) facets of scientific inquiry.

The first source informing my definition focuses on students doing activities like those that scientists engage in and is offered by the National Research Council in the United States:

Scientific inquiry refers to the diverse ways in which scientists study the natural world and propose explanations based on the evidence derived from their work. Inquiry also refers to the activities of students in which they develop knowledge and understanding of scientific ideas, as well as an understanding of how scientists study the natural world. - NRC, 1996. p. 23

In this case, the NRC mentions that inquiry is an approach that scientists use to study the natural world. It also refers to activities that students engage in to learn and understand scientific concepts. The combination of what scientists do and what students do is central to studying inquiry.

The second source, presented by the National Science Foundation, focuses on what teachers do to engage students in science “inquiry”:

Inquiry teaching leads students to build their understanding of fundamental scientific ideas through direct experience with materials, by consulting books, other resources, and experts, and through argument and debate among themselves. All this takes place under the leadership of the classroom teacher. - NSF, 1997. p. 7

The National Science Foundation goes a step further in its explanation of what students do in inquiry instruction. It talks, among other things, about students’ “direct experience” and “argument and debate” while guided by a classroom teacher. The leadership of the classroom teacher in setting up the learning process is one of the key issues of this definition of inquiry.

A third source for defining inquiry-based science education deepens the teacher’s role in inquiry teaching, and places teacher-student activities on an “inquiry continuum” ranging from traditional direct instruction to wide-open scientific inquiry (Furtak, 2006). Most science inquiry instruction falls between these two extremes in a condition called “Guided Scientific Inquiry” (Furtak, 2009) (Figure 2.1), where students are guided through investigative approaches toward answers that are known by the teachers. In many cases, curriculum and materials, such as teaching modules and textbooks, serve to define the location of teaching and learning in the inquiry continuum.

[Figure: continuum running from Traditional, Direct Instruction, through Guided Scientific Inquiry, to Open-Ended Scientific Inquiry (Discovery)]

Figure 2.1. Continuum representing forms of science instruction (Furtak, 2006)

Facets in Inquiry-Based Science Teaching

The fourth source used for developing my framework of inquiry-based science teaching comes from Duschl (2003). The author presented a framework of science inquiry teaching based on previous work in the areas of social and cognitive psychology, history and philosophy of science, and educational research. He proposed three domains that are part of inquiry-based science teaching: Epistemic, Conceptual, and Social.

Duschl (2003, p. 42) refers to the epistemic facet as “frameworks used when developing and evaluating scientific knowledge”, the conceptual facet as “structures and cognitive processes used when reasoning scientifically”, and the social facet as “the processes and forums that shape how knowledge is communicated, represented, argued, and debated”.

Ruiz-Primo and Furtak (2004) expanded on these three domains and created four overlapping facets, dividing Duschl’s epistemic domain into methodological and epistemic. This division was suggested since it was necessary to differentiate the methodology required in conducting investigations from the nature of scientific knowledge.

In Taking Science to School, Duschl et al. (2007) presented four strands of scientific proficiency where inquiry teaching plays an important role. These four strands are similar to Furtak’s (2006) and Duschl’s (2003) facets of inquiry: (a) Know, use, and interpret scientific explanations of the natural world; (b) Generate and evaluate scientific evidence and explanations; (c) Understand the nature and development of scientific knowledge; and (d) Participate productively in scientific practices and discourse.

Figure 2.2 presents my conceptualization of inquiry-based science teaching based on the five sources mentioned above. The framework presents three vertices represented by teachers, students, and curriculum and materials. The interior of the triangle presents the three facets or domains that inquiry-based science teaching needs to tap into (conceptual, epistemic, and social) (Duschl, 2003). My framework for inquiry-based science education is therefore defined by the teacher in his or her actions in the classroom, including questions asked, activities used, and the set-up of the learning process, by students in the way they develop knowledge and understanding, and by the curriculum and materials used. In some cases, teachers follow the textbook and materials very closely. The curriculum and materials, then, lead teachers to base their classes on factual and conceptual material and simple procedures (e.g., calculations).

[Figure: triangle with vertices Teacher, Student, and Curriculum/materials; the interior lists the facets Conceptual, Epistemic, and Social (Duschl, 2003)]

Figure 2.2. Triangle of Inquiry-Based Science Teaching

In IBSE classes, in addition, teachers can set up an inquiry environment where students are the ones doing the activities and building their scientific knowledge (Ruiz-Primo, personal communication, Dec, 2009). Since the three domains that inquiry teaching taps into are at the center of inquiry teaching, a more detailed description of each facet is presented below.

Conceptual Facet

Teaching concepts is an important part of inquiry-based science teaching. Students learn concepts through direct instruction and discovery. Furtak and Seidel (2008) describe the conceptual facet as the facts, theories, and principles in science. Concepts are connected both to students’ previous science knowledge and to the development of complex understandings.

In inquiry science, the development of these concepts is important and needs to take into account students’ previous knowledge (Duschl, 2003). Inquiry-based science teaching requires knowledge integration from different areas of science and ways of reasoning (Duschl, 2003). Inquiry-based science teaching aims to enable students to extend and refine their understanding of the natural world. Therefore, science education should focus on identifying important scientific ideas that not only help students understand the essence of a discipline, but also the manner in which those ideas are discovered (American Association for the Advancement of Science, 1990; NRC, 1996; Furtak & Seidel, 2008).

Duschl et al. (2007) present a strand called “Know, use, and interpret scientific explanations of the natural world.” This strand is directly connected to my definition of the conceptual facet. It is defined as students acquiring facts and conceptual structures that incorporate scientific knowledge, and using those facts productively to understand the natural world. In my framework of inquiry-based science teaching, students are led to construct knowledge in a way similar to how scientists do, in addition to a guided process that allows them to discover, think, and individually use those concepts.

Therefore, my definition of the conceptual facet takes into consideration scientifically based facts, theories, and ideas, that students use to reason about and understand the natural world. In this process, students’ previous knowledge is taken into account and is either built upon or changed through the scientific inquiry process.

The conceptual facet is not perceived as the main one in inquiry education. However, in order to learn science, students need to know specific concepts related to the scientific topics they are learning and to reason about them. Despite this perception, then, this facet is a major component of science teaching, including inquiry teaching.

Li, Ruiz-Primo & Shavelson (2006; Li & Shavelson, 2004) categorize cognitive outcomes into four types of knowledge: declarative, knowing that; procedural, knowing how; schematic, knowing why; and strategic, knowing about knowing in a domain. In the case of the conceptual facet of inquiry, most cognitive outcomes can be mapped onto declarative knowledge. In addition, there may be several outcomes that go beyond the facts, where students apply their conceptual knowledge, and these can be mapped onto schematic knowledge.

Epistemic Facet

The second facet of inquiry-based science teaching is the epistemic one. Duschl (2003, p. 42) describes this domain as “frameworks used when developing and evaluating scientific knowledge”. Ruiz-Primo and Furtak (2004) divided Duschl’s epistemic domain into a methodological and an epistemological facet. The former refers to a series of non-linear procedures that include experimental design, executing procedures using instruments, and data representation. The latter facet refers to the understanding of where scientific theories and principles come from. This epistemological facet involves students’ understanding of the nature of science. It is also possible to find the epistemic facet in Duschl et al. (2007) divided into two strands: “Understand the nature and development of scientific knowledge” and “Generate and evaluate scientific explanations”. The former resembles Ruiz-Primo and Furtak’s epistemological facet, while the latter is similar to the methodological facet they propose.

My definition of the epistemic facet focuses on how science can generate knowledge and its limitations, and is divided into two sub-facets: nature of scientific knowledge and procedural.

Nature of scientific knowledge. This sub-facet refers to students’ understanding of the nature of scientific knowledge, its limitations, and scopes. It also refers to students’ understanding of science as a discipline and as a way of knowing (Duschl et al., 2007). This is a desired outcome in inquiry-based science teaching, and needs to be part of the teaching process. When students are able to understand how science occurs, they get closer to thinking like scientists. Inquiry teaching actively leads students in this process.

Procedural. This sub-facet is related to procedures where students design and analyze investigations, constructing and using scientific evidence. It involves planning and carrying out investigations, reviewing what is already known based on experimental evidence, using tools to gather, analyze, and interpret data, and proposing answers, explanations, and predictions (NRC, 1996). Through these processes, students are able to construct knowledge and to understand how scientists warrant knowledge.

This sub-facet was first identified as a separate one by Ruiz-Primo and Furtak (2004) and described as a methodological one. Their work was based on what Bruner (1961) called the “heuristics of discovery”, including designing experiments, executing procedures, recording data, and constructing graphs.

I believe that the procedural and the nature of scientific knowledge sub-facets are the way through which students start constructing science theory and knowledge. Even though it is possible to have this epistemic facet in other types of science instruction, it is an essential component of inquiry. Inquiry-based science teaching may be defined or thought of as only hands-on experiments, but there is much more to it than this. In order to go beyond the hands-on component, it is necessary to lead students into a thinking process that allows them to learn about concepts, procedures, and the nature of scientific knowledge.

The epistemic facet can be mapped onto the declarative, procedural, schematic, and strategic types of knowledge presented by Li, Ruiz-Primo & Shavelson (2006). The declarative, schematic, and strategic types of knowledge include aspects of the nature of scientific knowledge sub-facet, while the procedural type of knowledge includes aspects of the procedural sub-facet of inquiry-based science education.

Social Facet

The third facet of inquiry-based science teaching is social. Duschl (2003) and Furtak (2006) describe this domain as the social processes that shape how knowledge is communicated, represented, argued, and debated. It involves social interactions among students and is an important element in reasoning and generating scientific knowledge. Inquiry-based science teaching fosters students communicating with both teachers and classmates in order to construct their scientific language and thinking process.

My definition of the social facet refers to the communicative processes that are required to build scientific knowledge. It takes into account that scientific knowledge is constructed in collaborative groups and built on previous work, and that scientists work and interact as part of a community of practice (Duschl, 2003).

Examples of Actions that Characterize Inquiry-Based Science Teaching

Table 2.1 Characteristics of the Facets of Inquiry that Students Show when Learning Science through an Inquiry Approach (Adapted from Furtak & Siedel, 2008) a

| Facet of Inquiry | Number | Actions students focus on |
|---|---|---|
| Procedural | 1.1 | Ask Scientifically-Oriented Questions |
| | 1.2 | Design Experiments |
| | 1.3 | Execute Scientific Procedures |
| | 1.4 | Gather and interpret data |
| | 1.5 | Represent Data |
| | 1.6 | Do Hands-On Activities |
| | 1.7 | Formulate hypothesis |
| Nature of scientific knowledge | 1.8 | Reflect on Nature of Science based on direct experiences |
| | 1.9 | Draw explanations based on evidence |
| | 1.10 | Generate and revise theories |
| | 1.11 | Apply scientific knowledge |
| Conceptual | 1.12 | Draw on/Connect to prior knowledge |
| | 1.13 | Explain ideas |
| | 1.14 | Organize concepts and principles |
| Social | 1.15 | Participate in class discussions |
| | 1.16 | Argue/debate scientific ideas |
| | 1.17 | Developing communication skills |
| | 1.18 | Cooperative learning |

a The numbers in this table will be used in Chapter 3 to map the different facets to empirical studies that have compared inquiry with other approaches.

Table 2.1 presents examples of actions that students carry out and that teachers intentionally enable through the design of inquiry learning environments. These actions can be linked directly to the Facets of Inquiry described above. The table clarifies what types of action can be categorized as inquiry teaching in the classroom. The examples are intended to close the gap between my theoretical definition of inquiry-based science teaching and what students are expected to do in an inquiry classroom.

This table provides examples of actions that can be linked to specific facets. The elements of the table are applied in the next chapter when evaluating empirical studies and presenting my own study. By providing these concrete examples, the facets of inquiry-based science teaching can be made visible, as can the characterization of teaching practices.

Concluding Remarks about My Definition of Inquiry-Based Science Teaching

My definition of inquiry-based science teaching involves three approaches: 1) students engaging in science activities, 2) teachers and their instruction, and 3) teachers and students in their interaction with curriculum materials. Inquiry teaching focuses on the domains presented by three facets of inquiry: conceptual, epistemic, and social. The three approaches include the way students engage in scientific activities similar to what a scientist actually does, how teachers approach their students to foster the actions and thinking of a scientist, and how curricular materials can create a learning environment to foster interactions between students and teachers to build conceptual, epistemic and social knowledge and skills. Additionally, the facets further clarify what inquiry looks like in the classroom when it fosters the type of thinking scientists engage in.

The framework presented in this chapter allows me to analyze empirical studies that have compared inquiry teaching with other approaches, and sets the stage for my own study, which compares inquiry-based science teaching with traditional approaches. The framework provides a clear definition of what I mean by inquiry-based science teaching.


CHAPTER 3. HOW INQUIRY HAS BEEN STUDIED

Even though the framework described above defines inquiry-based science teaching as comprising facets or domains, very few studies that compare scientific inquiry teaching with other teaching methods incorporate this perspective in their descriptions of inquiry. When I mapped studies to the facets, most of them related scientific inquiry to an epistemic facet (Von Secker & Lissitz, 1999; Berg et al., 2003; Klahr & Nigam, 2004; Geier et al., 2008; Tamir et al., 1998; Chang & Mao, 1999; Schneider et al., 2002; Pine, Aschbacher, Foley, Jones, Kyle, McPhee, Phelps, & Roth, 2006). A few studies present the epistemic and the conceptual facets in their definition of science inquiry teaching (Geier et al., 2008; Tamir et al., 1998). However, only a small group of studies have presented inquiry teaching as the combination of all three facets (epistemic, conceptual, and social) (Chang & Mao, 1999; Schneider et al., 2002; Pine et al., 2006). Empirical studies have different definitions of inquiry-based science teaching, use a variety of research designs, and reach diverse conclusions when comparing inquiry with other teaching approaches. This chapter provides an analysis of the empirical studies, identifying drawbacks and lessons learnt that were considered in my study.

When analyzing the studies, I expected that the greater the number of facets (up to three) involved in an inquiry science teaching approach, the greater its impact on students’ science achievement would be. However, this was not the case (e.g., Pine et al.; Chang & Mao); these studies showed only partial positive effects of inquiry teaching. In the studies reviewed, then, the inclusion of different facets in the treatment did not guarantee measurable differences in students’ learning. Nevertheless, I still think that the best test of the science-inquiry teaching hypothesis requires that all three facets be present in the treatment and measured as outcomes. Additionally, none of the studies reviewed went beyond fairly superficial descriptions of treatments, whether inquiry or comparison, and none corroborated these descriptions with direct evidence from classrooms, where teachers might be applying teaching strategies and approaches different from the descriptions (cf. Furtak, Shavelson, Shemwell & Figueroa, in press).

Operational Definitions of Inquiry-Based Science Teaching in Studies

Even though the studies reviewed had a similar purpose, that is, testing the hypothesis of the superiority of science-inquiry teaching compared to the “standard” (variously defined) method, they varied in their estimates of the impact of science inquiry on students’ achievement. Some studies found inquiry to be an effective strategy (e.g., Tamir et al., Berg et al., Geier et al.), while others found inquiry to have a partial effect (Chang & Mao, Schneider et al.) or no impact (Pine et al.). In the most recent meta-analysis to date, Furtak et al. (2009) compared nine studies (some included in this analysis), selected on the basis of a framework that included four facets (conceptual, procedural, epistemic, and social). Only studies whose design warranted a causal interpretation of the impact of treatments were selected (randomized controlled experiments or quasi-experiments with a pre-post two-group design, and a cognitive outcome measure). The meta-analysis reported an average effect size of 0.92 (see footnote 1) when comparing an inquiry-based science teaching approach with other teaching approaches. This effect size is considerable given the mixed results presented above. Given this analysis, it seems that inquiry-based science teaching has a large effect when compared to other types of teaching.

However, since the definitions of inquiry-based science teaching differ from study to study, the results need to be analyzed in more detail.
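To give a rough sense of the uncertainty surrounding this average, the sketch below computes an approximate 95% confidence interval from the figures reported for the meta-analysis (mean effect size 0.92, standard deviation 1.02, nine studies). It assumes, purely for illustration, that 1.02 is the standard deviation of the nine study-level effect sizes and that those effect sizes can be treated as an unweighted simple random sample. The function name and the hard-coded critical value are illustrative choices and are not part of the meta-analysis itself.

```python
import math


def mean_confidence_interval(mean, sd, n, t_crit):
    """Return (lower, upper) bounds of a t-based confidence interval for a mean."""
    standard_error = sd / math.sqrt(n)  # standard error of the mean
    margin = t_crit * standard_error
    return mean - margin, mean + margin


# Figures reported for the meta-analysis: mean effect size, SD, number of studies.
MEAN_ES, SD_ES, N_STUDIES = 0.92, 1.02, 9
# Two-sided 95% critical value of Student's t with 8 degrees of freedom.
T_CRIT_DF8 = 2.306

lower, upper = mean_confidence_interval(MEAN_ES, SD_ES, N_STUDIES, T_CRIT_DF8)
print(f"Approximate 95% CI: {lower:.2f} to {upper:.2f}")
```

Under these assumptions the interval runs from roughly 0.14 to roughly 1.70. An interval this wide is consistent with the point made above that the results differ considerably from study to study.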

There are several possible explanations for these inconsistent results. The studies focused on a wide range of research questions and outcomes (e.g., achievement, motivation), lacked a unified definition of science inquiry teaching, employed limited research designs, and deployed instruments of varied technical quality (reliability and validity). In addition, the studies varied in science topics taught and grade levels. All this variation makes it difficult to compare the impact of inquiry teaching reported in the studies. Additionally, the methodological approaches used in the treatments generated several threats to validity. In some of the studies, including Chang and Mao (1999) and Tamir et al. (1998), students in the treatment condition were explicitly taught the skills that were going to be tested; this was not so in the comparison condition. In other cases, participants were selected either by the researchers or by school administrators.

Table 3.1 presents a summary of the research questions or purposes presented in the empirical studies under review here, as well as their content areas and the grade levels in which the studies took place. The main goal in selecting these studies was to find research that provided examples of what has been done regarding the impact of inquiry-based science instruction on students’ outcomes, and that showed designs and methodologies similar to the ones I might use in my own study. Comparative studies instead of descriptive ones were selected. All of the chosen studies had a group taught with an inquiry approach and a comparison group that was taught with another approach. Additionally, all studies used at least one cognitive outcome to measure students’ performance. Also, studies selected for this review were in the science domain, measuring students’ outcomes in different topics such as Earth Science, Physics, Chemistry, and Biology. Finally, all the studies were conducted with K-13 students, age groups that include the grade levels of my study’s participants.

1 The standard deviation for the mean effect size is 1.02. This meta-analysis included only six papers and nine studies.

Table 3.1 Characterization of the Studies that Compare Inquiry with Other Teaching Approaches a

| Study | Research Questions/Purposes | Content Domain | Grade Level |
|---|---|---|---|
| Tamir et al. (1998) | E: Evaluate the effect of different instructional strategies, one that emphasized inquiry and another that did not emphasize inquiry, on students’ performance in solving novel problems in the laboratory. | E: Chemistry, physics, or biology | E: 12th graders |
| Chang & Mao (1999) | E: “Investigate the comparative efficiency between inquiry-group instruction and traditional teaching methods in terms of their effects on student learning of science content and on student attitudes towards the subject matter…” | E: Earth Science | E: Junior high |
| Von Secker & Lissitz (1999) | E: “Provide a baseline evaluation of whether teachers’ decisions to implement the specific instructional emphases recommended in the National Science Education Standards are associated with science achievement and equity.” | E: Chemistry, physics, earth science or biology | E: 10th graders |
| Schneider et al. (2002) | E: “Whether students in an inquiry-based science curriculum would perform as well as students nationally on achievement test items.” | E: Chemistry, physics, earth science or biology | E: 10th, 11th and 12th graders |
| Berg et al. (2003) | E: “Will an expository versus open-inquiry version of the same experiment have different outcomes for our students?” | E: Chemistry | E: 1st year college students |
| Klahr & Nigam (2004) | I: Compare the relative effectiveness of discovery learning and direct instruction and test children’s ability to transfer what they have learned once they have achieved mastery. | E: Physics (balls and ramps) | E: 3rd and 4th graders |
| Pine et al. (2006) | I: Compare the performance of students taught in classrooms with hands-on curricula to students taught in classrooms with textbook curricula. | E: Physical, biological, and earth science | E: 5th graders |
| Geier et al. (2008) | E: “Whether urban student participation in project-based inquiry science curricula leads to demonstrably higher student achievement on state wide assessments over and above general district-wide efforts at reform.” | I: Chemistry, biology (ecology) and physics | E: 7th and 8th graders |

a E: Taken explicitly from the study; I: Taken implicitly from the study.

As mentioned above, the studies generally lacked an explicit definition of both the science inquiry treatment and the alternative treatment. Even though the studies presented an overview of their treatment, none provided a formal and explicit definition of what it meant by inquiry teaching. In some cases, this lack of definition is even reflected in the fact that the word “inquiry” itself is used as part of the definition of the treatment. Studies need to operationalize their treatments, both in carrying out the study and in reporting results, so that evaluations of the impact of this type of teaching are possible and comparable (Furtak et al., in press). Additionally, the instructional methods for the control groups were not well defined, and in some cases were defined as a “lack of treatment”. In those cases researchers used samples of students not participating in the treatment, which makes it difficult to interpret the differences between the inquiry and “other” treatment(s) because of selection bias.

The framework that defines inquiry-based science teaching in this study is based on facets. Even though the studies provided limited definitional information, I was able to map those studies onto the three facets of my framework, to a greater or lesser extent. While only one study (Klahr & Nigam) included a single facet of inquiry-based science teaching, others included two facets or sub-facets (e.g., Von Secker & Lissitz and Berg et al.), and other studies incorporated three or more facets or sub-facets of inquiry teaching (e.g., Schneider et al., Pine et al., Geier et al.). Table 3.2 summarizes the nature of inquiry teaching and the alternative approach following the facets in my framework (see Chapter 2).

Table 3.2 Conceptual Approach - Mapping Study’s Inquiry Conception onto the Inquiry Facets a, b, c

| Study | Alternative Approach | Conception of Inquiry | Procedural | NOSK | Conceptual | Social |
|---|---|---|---|---|---|---|
| Tamir et al. (1998) | I: Curriculum that did not emphasize inquiry: Students who specialized in a physics and/or chemistry curriculum that did not emphasize inquiry. | I: Inquiry oriented curriculum: Group 1: Observation, experimental design, communication and manipulation. Group 2: Same as group 1 with an addition of explicit instruction in the concepts underlying inquiry. | E: 1.2 * Designing experiments. | I: 1.8 Reflecting on nature of science | I: 1.14 Organizing concepts and principles. | |
| Chang & Mao (1999) | E: Traditional instruction: Lectures given by teachers, use of textbooks and other materials, and clear explanation of important concepts to students. Occasional demonstrations were also included. | E: Inquiry-group instruction: Cooperative learning. I: Hands-on activities, gathering and recording data, interpreting data and their relationships. | E: 1.6 Carrying out hands-on activities | I: 1.9 Drawing explanations based on evidence | I: 1.13 Explaining ideas/mental models | E: 1.18 Cooperative learning |
| Von Secker & Lissitz (1999) | E: Teacher-centered instruction: Definition not provided. | E: Student-centered instruction: Critical thinking, opportunities for laboratory inquiry, and less teacher-directed instruction | I: 1.6 Carrying out hands-on activities | I: 1.11 Applying scientific knowledge | | |
| Schneider et al. (2002) | E: National Sample Subgroup: Definition not provided. | E: Project-Based Science (PBS): Collaboration and communication, integrated with computer . Gathering information, analyzing data, expressing results, creating scientific models, writing reports. | E: 1.4 Gathering and interpreting data | I: 1.10 Generating and revising theories | E: 1.14 Organizing concepts and principles | I: 1.17 Developing communication skills |
| Berg et al. (2003) | E: Expository instruction: Laboratory instructors describe the entire experiment in detail to students. | I: Open-inquiry: Students formulate a hypothesis, propose how to test it, and plan, perform, evaluate and discuss their experiments. | E: 1.7 Formulating hypothesis | | | I: 1.16 Arguing/debating scientific ideas. |
| Klahr & Nigam (2004) | E: Direct instruction: where objectives, materials, examples and explanations on how to control variables were provided by the teacher. | E: Discovery-learning: where students explored without any teacher guiding questions or direction. | I: 1.6 Carrying out hands-on activities | | | |
| Pine et al. (2006) | E: Text-based curriculum: Textbooks as instruction tools. Multitude of short, fact-based expositions. | E: Hands-on curriculum: Hands-on activities. | E: 1.6 Carrying out hands-on activities | I: 1.9 Drawing explanations based on evidence | I: 1.14 Organizing concepts and principles | I: 1.17 Developing communication skills |
| Geier et al. (2008) | E: Detroit Public Schools: A combined pool of students receiving no intervention or other interventions (different than the inquiry-based science curriculum) in science. | E: Inquiry-based science curriculum (LeTUS curriculum): Inquiry investigations contextualized by driving questions. Supported by technology. | E: 1.4 Gathering and interpreting data | I: 1.9 Drawing explanations based on evidence | E: 1.14 Organizing concepts and principles | |

* Only applies for group 2.
a E: Explicit; I: Implicit
b NOSK: Nature of Scientific Knowledge
c If information about the relationship of the treatments to facets is not present in the study then the cell is empty. However, if I am not able to map the information to a facet, then I will indicate that this information is Not Available (NA).


All the studies included in their conception of inquiry information that mapped onto the procedural sub-facet of inquiry. It seems, then, that this sub-facet serves as a defining characteristic of inquiry teaching in these studies. Unfortunately, the studies did not provide sufficient information for me to characterize their approach, if any, to the nature of scientific knowledge sub-facet. So, it is possible to infer that the emphasis of inquiry is still directed more towards methods and less towards the thinking processes and reflections on the nature of scientific knowledge.

Research Designs

True experimental designs, with randomization and an appropriate control group, that compare inquiry teaching with other approaches are rare in educational research. This may be because of the difficulty of conducting randomized studies with teachers and students in a school setting. Among the studies reviewed, only one used a true experimental design (Klahr & Nigam, 2004). Other studies used quasi-experimental designs, in which there is an appropriate control group and extensive pretest data but no randomization. These studies can also be used to examine causal impact if carefully carried out (Pine et al., 2006). There were also some “pre-experimental” designs, in which inquiry was compared with another treatment and there was neither randomization nor pretest controls. Table 3.3 presents a summary of the research designs used in the empirical studies analyzed.

Table 3.3 Research designs a, b

| Study | Experimental | Quasi-experimental | Ex-post facto |
|---|---|---|---|
| Tamir et al. (1998) | | I: X1 O1 O2 | |
| | | X2 O1 O2 | |
| | | O1 O2 | |
| Chang & Mao (1999) | | E: X O1 O2 | |
| | | O1 O2 | |
| Von Secker & Lissitz (1999) | | | I: X1 O |
| | | | X2 O |
| Schneider et al. (2002) | | | I: X O |
| | | | O |
| Berg et al. (2003) | | I: X O1 O2 | |
| | | O1 O2 | |
| Klahr & Nigam (2004) | I: R X O1 O2 | | |
| | R O1 O2 | | |
| Pine et al. (2006) | | I: X O1 O2 | |
| | | O1 O2 | |
| Geier et al. (2008) | | | I c: X O |
| | | | O |

a E: Explicit; I: Implicit
b Following Campbell and Stanley (1963): X=treatment, Blank=control, O=observation (measurement)
c This design was replicated one year later.
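The three design categories described in this section (true experimental, quasi-experimental, and pre-experimental) can be read as a simple decision rule over three properties of a study: whether groups were randomized, whether there was a control group, and whether there was a pretest. The sketch below expresses that rule. The function name, the boolean flags, and the example profiles are hypothetical and serve only to illustrate the categories; they are not a coding of the reviewed studies.

```python
def classify_design(randomized: bool, has_control_group: bool, has_pretest: bool) -> str:
    """Label a comparative study design with the categories used in this chapter."""
    if randomized and has_control_group:
        # Randomization plus a control group: true experiment.
        return "true experimental"
    if has_control_group and has_pretest:
        # Control group and pretest data, but no randomization.
        return "quasi-experimental"
    # Neither randomization nor pretest controls.
    return "pre-experimental"


# Illustrative notation patterns in the style of Campbell and Stanley (1963),
# paired with hypothetical (randomized, control group, pretest) flags.
profiles = {
    "R X O1 O2 / R O1 O2": (True, True, True),
    "X O1 O2 / O1 O2": (False, True, True),
    "X O / O": (False, True, False),
}

for notation, flags in profiles.items():
    print(f"{notation}: {classify_design(*flags)}")
```

The rule deliberately checks randomization first, because in the definitions above randomization is what separates a true experiment from every other design.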

Drawbacks and Lessons Learnt of the Studies

Table 3.4 shows the main drawbacks of the studies, which yield a number of lessons learnt. None of the reviewed studies corroborated treatment labels with observable classroom practice; teachers could have applied teaching strategies and approaches different from those labeled. Only one was a true experiment; others were pre-experimental, without adequate controls for selection bias. Others used instrumentation that favored the inquiry treatment.

Finally, there might be different outcomes depending on the facets of knowledge taught in the treatment. It seems that if students are taught a conceptual facet through inquiry, they perform better on a measure of that facet than does the comparison group (e.g., Chang & Mao, 1999; Schneider et al., 2002; Geier et al., 2008). In contrast, if students are taught hands-on activities (procedural facet), they do not necessarily do better in hands-on (procedural) assessments than students taught with an alternative approach (Pine et al., 2006).

Table 3.4 Critique and Drawbacks a

Tamir et al. (1998)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) Lack of detail of the treatment.
(-) Difference between students in groups (one group had students specializing in physics and the other one had students specializing in biology).
(+) Present the tasks in the appendix.
Lessons Learnt:
• Provide the tasks in their entirety for readers.
• As expected, there was greater impact because of the proximity of the outcome as well as the alignment between the treatment and the test.
• Avoid selection bias.

Chang & Mao (1999)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) Advantage of treatment group because of alignment of treatment with outcomes.
(-) The cooperative learning setting used in the treatment might have had an influence on students’ answers to the attitude questionnaire.
(+) Random assignment of classes (not students).
(+) Presented definitions of the treatment and the alternative approach.
Lessons Learnt:
• Use of a close outcome only may limit external validity of results.
• It is realistic to consider random assignment by classes.

Von Secker & Lissitz (1999)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) The definition of the pedagogical practices is generated by a teacher’s self-report questionnaire.
(-) No accurate representation of the instruction happening in the classroom.
(-) Lack of description of the treatment and the outcome.
Lessons Learnt:
• The findings of this study are very limited because of the type of ex-post facto design. The design is critical to what can be concluded.
• Gathering practices only with a self-report provides very limited information.

Schneider et al. (2002)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) Students participating in the treatment were a selected group from a specific high school and were not randomly assigned.
(-) Lack of definition of alternative approach.
(-) Test-taking conditions are different for both groups.
(-) Authors do not mention how the NAEP item sub-set was selected.
Lessons Learnt:
• The outcome is not aligned with the test and it is distal, yet there is a treatment effect. This effect can be explained by the fact of having an alignment between the outcomes and the facet-based inquiry conception.

Berg et al. (2003)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) The results cannot necessarily be attributed to the approach since open-inquiry students spent more time in the experiments.
(-) No measure of achievement was administered.
(+) Authors provided examples of the different outcome measures and formats in the text and in appendices.
(+) The authors were not involved in teaching students during any part of the courses.
Lessons Learnt:
• A measure of achievement is necessary when comparing science performance of students.
• The use of a transfer task.

Klahr & Nigam (2004)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) Direct instruction seems more a form of guided scientific inquiry teaching where the teacher interacts with the students.
(-) If inquiry is considered only as hands-on activities without instruction, it is not surprising to find that there is no learning.
(+) Random assignment.
(+) The use of a transfer task (evaluation of posters).
Lessons Learnt:
• The importance of well defined conditions. If authors had not provided a definition of “direct-instruction”, this category could have been misinterpreted.
• In this study, the alternative approach was much more aligned to the tests than the treatment.

Pine et al. (2006)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) Unequal assessment tasks (3 physics and 1 biology).
(-) Length of one assessment task (three days long).
(+) Use of distal and proximal outcomes.
(+) Authors avoided specific content areas and assessed mainly procedural aspects in order to have less aligned outcomes. Focused more on science skills instead of content.
Lessons Learnt:
• Importance of monitoring classroom implementation with observation instrument.
• Need to balance the content domains in the assessments.
• Use of distal and proximal outcomes measures.

Geier et al. (2008)
Critique and Drawbacks:
(-) Lack of information regarding how both curricula were taught in each class.
(-) No sample of the selected MEAP items was provided.
(-) There were students in the same school that participated in both conditions. The authors did not find significant difference among those groups.
(-) Lack of definition of alternative approach.
(-) Teachers that participated in LeTUS were selected by administrators (selection bias).
(+) Authors are aware of sample bias.
Lessons Learnt:
• Even though no difference was found within students from the same school participating in both conditions, there were significant differences when all groups were analyzed together. Therefore, there is a need to measure different schools.

a (+) Positive aspects; (-) Negative aspects

Concluding Remarks of How Inquiry Has Been Studied

The empirical studies analyzed above provide diverse lessons that were considered in the design of this dissertation study. First, the studies did not clearly define inquiry teaching. That is why I devoted a whole chapter to defining what I mean by inquiry-based science teaching. In addition to pointing to the need for a clear definition of inquiry, the studies made clear to me the importance of developing a classroom observation tool that could be used to determine whether the conception of inquiry teaching espoused was enacted in the classrooms. In my study, then, classroom observation information is compared to my definition of inquiry to verify whether the treatment and control conditions are in fact inquiry or not. The final lesson learned relates to the nature of the research design. If a randomized experiment cannot be carried out, a quasi-experimental design with one or more pretests and a posttest should be used if at all possible, as it was in this dissertation.


CHAPTER 4. MEASURING STUDENTS’ SCIENCE ACHIEVEMENT

There are several different considerations that need to be taken into account when measuring students’ science achievement. This chapter presents some of these considerations, based on both empirical studies and conceptual articles about science achievement.

How Students’ Science Achievement is Measured

Students’ science achievement can be measured in different ways. The most common way is to use multiple-choice questions. Assessment alternatives to multiple-choice questions include evaluation of student science notebooks, Predict-Observe-Explain (POE) questions, performance assessments and other constructed response items (Ruiz-Primo & Shavelson, 1996; Ruiz-Primo, Shavelson, Hamilton, & Klein, 2002; Pine et al., 2006; Shemwell, Fu, Figueroa, Davis, & Shavelson, 2008). Table 4.1 presents a brief description of examples of types of assessments.

Table 4.1 Examples of Types of Assessments Used to Measure Science Achievement

| Type of Assessment | Description |
|---|---|
| Multiple choice | Consists of a stem, a correct answer and several distractors. |
| Constructed response | Consists of the question, stimulus material, and a scoring rubric that includes the grading criteria. |
| Performance assessments | Consist of a challenge that is to be solved using materials provided, a response sheet, and a scoring system that includes the procedures and the solution. |
| POE | Students are presented with an initial condition of a situation with an uncertain outcome, and are asked to predict the outcome, observe what happens, and interpret and explain their observations. |

One of the most common ways in which science achievement is tested is through multiple-choice exams (Ruiz-Primo & Shavelson, 1996). Multiple-choice achievement tests offer benefits such as efficiency, low cost, and high reliability, since they are economical to develop, administer, and score. However, they have been criticized for not measuring some kinds of scientific knowledge, such as the ability to formulate a problem or carry out an investigation (Ruiz-Primo & Shavelson, 1996).

Multiple-choice tests are useful for measuring facts and concepts, but they do not necessarily measure all the outcomes related to knowledge and skills, since answering them is not equivalent to what a scientist does (Ruiz-Primo & Shavelson, 1996). Therefore, it is important to consider other types of measures when assessing students’ science achievement.

However, the assessments per se include diverse other criteria that may determine how well these can measure students’ science achievement. Therefore, a more detailed analysis of elements included in item design needs to be considered both from empirical studies as well as from the way student achievement is measured in practice.

Types of Outcomes in Inquiry Empirical Studies

The empirical studies analyzed in the previous chapter used different measures of student achievement, including achievement tests; local, regional, national or international standardized tests; questions produced by students; and unit problems, among others. While some of the studies provided the full instruments as part of the article (Pine et al., 2006) and others provided examples of the items (Chang & Mao, 1999), a few only mentioned or briefly described the tests used. Additionally, of the outcomes described in the studies, only a few were comparable with each other, mostly state or national assessments. Even though the outcomes varied from one study to the next, a basic comparison of the studies was possible using the information provided and the inquiry facets.

Types and characteristics of outcomes

Table 4.2 shows a comparison of the types of outcomes used in empirical studies that compare inquiry teaching with other approaches. The outcomes can be cognitive, affective or both. When provided, sample items and the intention of authors were taken into account in order to classify the assessment into type of outcomes. All studies presented in Table 4.2 have at least one cognitive outcome. These cognitive outcomes are further characterized by their proximity and alignment with instruction, and the type of knowledge they tap into.

Table 4.2 Outcomes used in Studies that Compare Inquiry with other Approaches

| Study | Outcomes measured | Type of outcome a |
|---|---|---|
| Tamir et al. (1998) | Two variable-based problems (Constructed Response) | C |
| Chang & Mao (1999) | Earth Science achievement test (Multiple-choice) | C |
| | Attitudes towards Earth Science Inventory (Multiple-choice) | A |
| Von Secker & Lissitz (1999) | 10th grade science achievement test (ETS) (Multiple choice) | C |
| Schneider et al. (2002) | NAEP Science Test Items (Multiple-choice) | C |
| Berg et al. (2003) | Questions asked by students. | C |
| | Student’s self-evaluation (Multiple-choice) | A |
| Klahr & Nigam (2004) | Assessment of Control for Variable Strategies (CVS) | C |
| | Ability to evaluate posters. | C |
| Pine et al. (2006) | Performance Assessment | C |
| | TIMSS Achievement (Multiple-choice) | C |
| Geier et al. (2008) | Science items from the Michigan Educational Assessment Program. (Multiple-choice) | C |

a C: Cognitive Outcome; A: Affective Outcome

Proximity

Proximity refers to the distance between the assessment and what is taught in the curriculum. Proximity ranges from immediate to distal and/or remote (Ruiz-Primo, Shavelson, Hamilton, & Klein, 2002; Ruiz-Primo, , Rosenquist, Schultz, Shavelson, Klein, and Hamilton, 1998) (Figure 4.1). Immediate assessments are those directly linked with classroom instruction, such as notebooks or classroom tests; close assessments are embedded assessments related to the teaching unit; proximal assessments measure the same unit but with a different application; distal assessments are large-scale performance assessments from a state or national curriculum framework; and remote assessments are national science achievement tests (Ruiz-Primo, Shavelson, Hamilton, & Klein, 2002). Several of the studies reviewed had assessments that were close to the content taught in the treatment condition (Tamir et al., 1998; Chang & Mao, 1999; Berg et al., 2003; Klahr & Nigam, 2004). That is, the assessments were parallel to the treatment’s content and activities. On the other hand, the performance assessment used in Pine et al. (2006) can be categorized as a proximal assessment, since the knowledge and skills students need are relevant to the curriculum but specific topics can be different. The third category found in assessments from the studies (Pine et al., 2006; Geier et al., 2008; Von Secker & Lissitz, 1999; Schneider et al., 2002) is a combination of the distal and remote categories presented by Ruiz-Primo et al. (2002), the former referring to assessments based on state or national standards, such as large-scale assessments including NAEP, TIMSS, and MEAP, and the latter to general measures of achievement. When comparing inquiry teaching with another approach, the assessment tool needs to include items that are proximal to the instruction as well as items that are distal. Having distal items makes it possible to observe whether students’ learning can be extrapolated to other realms of science. Pine et al. (2006) and Schneider et al. (2003) were able to incorporate these two types of measures in their studies.

Figure 4.1. Schematic of a multilevel assessment of science achievement. Taken from: Ruiz-Primo et al., 2002, p. 372.

Alignment

Alignment refers to the relationship between the test and the “treatment”. Walker and Schaffarzick (1974) found a greater effect size when comparing treatment to control conditions when there was close alignment between what the test measures and what is taught in the treatment condition. In the studies analyzed, alignment can be seen in two ways. One way is for the test and the treatment to be closely parallel. Another way is to map the test onto the facets of inquiry, that is, to show how many of my inquiry facets the test taps into. Some studies with positive effects include assessments closely aligned with the treatment, most of the time through the conceptual facet. The positive impact of the treatment, in cases where the treatment was aligned with the outcome, supports Walker and Schaffarzick’s (1974) findings regarding the positive association between a type of instruction and a test aligned with that instruction. For example, Pine et al. (2006) had two outcomes in their study: performance assessments that Pine or Shavelson, Baxter, and Pine (1991) had constructed, and multiple-choice and constructed-response items from TIMSS. The performance assessments were aligned with the inquiry-based science curriculum and with the procedural and the nature of scientific knowledge sub-facets. The TIMSS assessment was distal to Pine et al.’s treatment and aligned with the procedural and the conceptual facets of my framework.

Types of knowledge

The cognitive outcomes can also be categorized into four types of knowledge that are used to conceptualize science achievement (Li & Shavelson, 2004): declarative, knowing that; procedural, knowing how; schematic, knowing why; and strategic, knowing about knowing in a domain. Declarative knowledge refers to scientific definitions or facts, and students can represent it either in words or by other means such as images. Procedural knowledge refers to knowledge of the sequence of steps or of the if-then production rules needed to complete a task. This knowledge can be simple, such as measuring the amount of liquid in a beaker, or complex, such as designing an experiment to find out how body temperature changes with exercise. Schematic knowledge refers to principles or explanatory models and involves relating complex phenomena to concrete or common explanations. Strategic knowledge includes strategies of monitoring or planning and involves knowing where, when, and how to use the other three types of knowledge (Li & Shavelson, 2004). Figure 4.2 shows how the knowledge types are related to each other.

Figure 4.2. The relationships between different knowledge types. Source: Shavelson, Ruiz-Primo, Li & Ayala, 2003.

Table 4.3 presents a classification of the types of knowledge, the focus of each type, and the assessments to which the types can be linked.

Table 4.3 Relationship between Types of Knowledge and Types of Assessments

Type of knowledge   Focus                                   Type of assessment tasks usually associated with type of knowledge
Declarative         • Concepts                              • Multiple choice items
                    • Content                               • Short answer questions
                    • Facts                                 • Concept maps
                    • Scientific knowledge
Procedural          • Assessing skills                      • Performance assessment
                    • Design of experiments                 • Multiple choice items
                    • Data collection and representation
Schematic           • Explanations of mental models         • Open-ended questions
                    • Analysis                              • Some multiple choice questions (e.g., some items from NAEP or TIMSS)
                    • Comprehension
Strategic           • Application of knowledge              • Constructed response
                                                            • Performance assessment
                                                            • Some multiple choice questions (e.g., some items from NAEP or TIMSS)


Even though there may be a connection between the type of test item and the type of knowledge tested, this is not always the case. For example, multiple-choice questions are commonly used to measure declarative knowledge. However, these types of questions can also measure schematic knowledge (Li, Ruiz-Primo, & Shavelson, 2006). Performance assessments can be used to assess the procedural and strategic types of knowledge in some cases (e.g., Paper Towels) and all four types of knowledge in others (e.g., Electric Mysteries).

Concluding Remarks About Measuring Students’ Science Achievement

The empirical studies reviewed used a variety of assessments to measure students’ science achievement. Additionally, the assessments varied as to proximity, alignment, and types of knowledge tapped. Each study defined inquiry in a different way, and none of them used Duschl’s facets or the inquiry framework presented in this dissertation. However, several of the assessments used in the studies can be aligned with one or several facets, which relates the proximity of the assessment to the inquiry curriculum.

The study presented in this dissertation focuses on comparing students who have learned science through inquiry with students taught through other approaches, using diverse assessments. Several aspects were considered in the design of the assessments. First, the assessments should be diverse and aligned with different components of inquiry-based science teaching. Second, the paper and pencil tests should have multiple-choice and constructed-response questions that tap into declarative, procedural, and schematic types of knowledge. Third, the tests also need to provide tools for students to answer proximal and distal items. Fourth, performance assessments should have both content-rich and content-lean components. Finally, I should use as many science facets as possible, so that the assessment encompasses both traditional methods of science teaching and inquiry-based science teaching. However, it may be very difficult to incorporate the social facet into the assessments.


CHAPTER 5: SCHOOL CONTEXT

The Educational System in Bogotá, Colombia

Bogotá, the capital city of Colombia, had an estimated population of 7,300,000 in 2009 (Secretaría de Educación del Distrito (SED), 2009). The Secretary of Education in Bogotá estimated that 22.4% of the population were children between the ages of 5 and 17, generating the demand for preschool, primary, and secondary education (SED, 2009). In 2009, the city had around 2,400 schools that served school-aged children. Of these schools, 384 were public, while 1,973 were private, together serving 1,611,808 students. Public schools have many sites, and they can have two or even three shifts during the day (morning, afternoon, and night). This means that each of the public schools mentioned above might have from 2 to 18 separate buildings on different campuses that house many students, with a double shift in each. In 2009, 63% of students in Bogotá attended public schools while 37% attended private schools. The majority of students, therefore, attend public schools.

In Bogotá, Socio-Economic Status (SES) is set by a scale of “Estratos” that goes from 0 to 6, 0 being the lowest and 6 the highest. Figure 5.1 shows the distribution of students in public schools according to their SES. In terms of schooling, 76% of students who attend public schools come from the lowest three SES levels. On the other hand, fewer than 2,000 students from the highest two SES levels, corresponding to 0.2%, attend public schools.


Figure 5.1. Distribution of students in public schools according to SES in Bogotá in 2009. Source: Secretaría de Educación del Distrito, 2009.

Three models are used in Bogotá’s public educational system: District Official, Concession Official, and Agreement Official. District Official schools are institutions that are 100% administered by the city; Concession Official schools are public schools that are administered by private institutions; and Agreement Official schools are private institutions that sign an agreement with the Secretary of Education to provide schooling for non-paying students who otherwise would go to public or concession schools.

Table 5.1 shows the distribution of students in public and private schools in Bogotá in 2009. Even though the number of District Official schools is low compared to private schools, each official school operates several sites, for a total of 715 District Official school sites. The table shows that a majority of students (63%) attend public schools, even though public institutions are far fewer than the private schools that educate a smaller percentage of students. Public schools, therefore, enroll more students per institution than private schools.

Table 5.1 Distribution of Students in Public and Private Schools in Bogotá in 2009

Type of School            Number of Schools        Number of Students
                          (Percent of Schools)     (Percent of Students)
District Official         359 (15.2)               837,003 (51.9)
Concession Official       25 (1.1)                 39,947 (2.5)
Agreement Official        335 (14.2)               143,514 (8.9)
Public Schools (Total)    384 (16.3)               1,020,464 (63.3)
Private Schools           1,973 (83.7)             591,344 (36.7)
Total K-12                2,357                    1,611,808

Source: Secretaría de Educación del Distrito, 2009.
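The percentages reported in Table 5.1 can be recomputed from the counts. The following is a minimal sketch in Python and is not part of the original analysis: the counts are copied from the table, while the names `TABLE_5_1`, `percent`, and `table_percentages` are illustrative assumptions.

```python
# Counts copied from Table 5.1 (Secretaría de Educación del Distrito, 2009).
# All variable and function names here are illustrative assumptions.
TABLE_5_1 = {
    # school type: (number of schools, number of students)
    "District Official": (359, 837_003),
    "Concession Official": (25, 39_947),
    "Agreement Official": (335, 143_514),
    "Public Schools (Total)": (384, 1_020_464),
    "Private Schools": (1_973, 591_344),
}
TOTAL_SCHOOLS = 2_357
TOTAL_STUDENTS = 1_611_808


def percent(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


def table_percentages():
    """Recompute (percent of schools, percent of students) for each row."""
    return {
        name: (percent(schools, TOTAL_SCHOOLS), percent(students, TOTAL_STUDENTS))
        for name, (schools, students) in TABLE_5_1.items()
    }


if __name__ == "__main__":
    for name, (pct_schools, pct_students) in table_percentages().items():
        print(f"{name}: {pct_schools}% of schools, {pct_students}% of students")
```

In the table, the public total of students is the sum of the District, Concession, and Agreement rows, whereas the public total of schools (384) is the sum of the District and Concession rows only.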

The Concession School Model in Bogotá

In the late 1990s, the city’s administration realized that more than 140,000 school-aged children were not attending school and decided to set out a three-pronged strategy to meet the demand for primary schooling (Uribe, Murnane, Willet and Somers, 2005). The first aspect of the strategy involved an expansion of the public sector, which included an increase in class sizes, reassignment of teachers from administrative posts to classrooms, and rehabilitation of classrooms in run-down schools. The second involved subsidizing private schools to enroll low-income students, and the third set out to broaden the coverage as well as the quality of public education through a program called Concession Schools.

Concession schools were built in extremely poor areas of the city where the demand for primary and secondary education was higher than the number of places supplied by the official public schools (Barrera-Osorio, 2006). The Concession Schools program is a partnership between the public and the private education sectors, in which private schools administer public schools over a 15-year period (Barrera-Osorio, 2006). In this model, the city provides the infrastructure, selects the students, and pays a pre-agreed sum per full-time student per year. The concession schools provide education to a very low-income population assigned to them by the city and must meet the performance standards set by the Secretary of Education, measured through the Colombian standardized exit exam ICFES (currently called SABER 11).

The Concession Schools were founded on the assumption that they could take advantage of the experience and high performance of the private schools through their administration of the public schools (Rodriguez and Hovde, 2002). The program was designed to overcome several of the most pressing issues faced by public schools, including weak leadership, inability to select their own personnel, lack of labor flexibility, double shifts of students during each day, and restrictions on the enacted curriculum (Patrinos, 2005). The managing institutions agreed to several standards that the concession schools must meet, including a minimum number of hours of instruction, establishment of a single shift of instruction, quality of nutritional provisions, a minimum qualifications profile for teachers and administrators, facility maintenance standards, criteria for the availability of instructional materials, a profile of the students to be served, and the evaluation of achievement by outcomes (Rodriguez and Hovde, 2002). Each managing institution has considerable pedagogic and curricular freedom, since each institution has its own experience in the educational sector.

The private schools were selected through a public procurement process, where bidders were evaluated on their proposed management plans. Two models were used in the concessionary management relationship: the one-to-one experience where one private school administered one public school, and the multiple school experience where an organization or private group took over the management of several schools (Rodriguez and Hovde, 2002).

The program was launched in 2000 with 22 schools. Three more schools were opened soon after, reaching a total of 25 Concession Schools. Currently, almost 40,000 students attend these schools in Bogotá, representing 4% of the city’s public enrollment. Table 5.2 shows the current managing institutions and the number of schools each has.

Table 5.2 Managing Institutions of Concession Schools in Bogotá

Name of Institution    Number of Schools
Colsubsidio            5
Alianza Educativa      5
Don Bosco              5
Cafam                  4
Fe y Alegría           2
La Salle               1
Nuevo Retiro           1
Gimnasio Moderno       1
Calasanz               1

Six years after the program was launched, concession schools showed a lower dropout rate and higher test scores than similar public schools (Barrera-Osorio, 2006). The dropout rate in 2008 for concession schools was 1.6% compared to 4.1% in District Official schools (SED, 2009). Additionally, the concession schools have increased the number of students from this population who enter higher education, currently have waiting lists for admittance, and have low levels of school violence in comparison to other similar public schools (, 2008). Furthermore, the concession schools reach students beyond academics by providing breakfast and lunch to all students, psychological counselling and special support for students with learning disabilities, and by working with the community through workshops directed to parents (El Tiempo, 2008).

Schools Participating in the Studies

All schools selected for the two studies that are part of this dissertation are Concession Schools. As noted above, the schools were selected by identifying similar characteristics: all were concession schools, all served students from similar socio-economic backgrounds, and all had similar results in standardized tests. Three schools are part of the Alianza Educativa, one school is part of Colsubsidio, and one school is managed by Gimnasio Moderno. Two schools from Alianza Educativa were excluded from the study since their recently appointed science teachers did not have an IBSE background. Table 5.3 shows the schools that are part of this dissertation research.

Table 5.3 Schools Participating in the Studies

Group      Name of School          School Administrator
IBSE       Jaime Garzón            Alianza Educativa
           La Giralda              Alianza Educativa
           Santiago de Atalayas    Alianza Educativa
Control    Las Mercedes            Colsubsidio
           Sabio Caldas            Gimnasio Moderno

Figure 5.2 shows a map of Bogotá that represents the distribution of these schools around the city. Even though some schools come from geographically different areas, all of the locations are similar in SES and in cultural characteristics.

Figure 5.2. Map of schools participating in the studies (adapted from Alianza Educativa, 2010).

Alianza Educativa: Jaime Garzón, La Giralda, and Santiago de las Atalayas

The Alianza Educativa is a non-profit association whose promoters are three private schools (Colegio Los Nogales, Colegio San Carlos, Colegio Nueva Granada) and the Universidad de los Andes. The main purpose of Alianza is “to promote, for the good of democracy, a high-quality education in Colombia as the best tool and means for citizens to achieve equal opportunities looking to reach an integral formation that includes the intellectual, social, ethical, and aesthetic education of individuals” (Alianza Educativa, 2008, p. 4).

Alianza administers five schools in different low-income communities of Bogotá. The five Alianza schools serve 6,200 students from K-11 (K-11 in the Colombian system) with an average of 40 students per classroom. The regular schedule of classes is between 7:00 am and 2:30 pm, with extra-curricular activities from 2:30 pm to 4:00 pm. Alianza currently has 245 teachers and 1,817 alumni.

Some of the problems and difficulties present in the communities where the Alianza works include undernourishment, interfamily violence, sexual abuse, teenage pregnancy, drug use, theft, academic gaps, and low motivation for studies (Alianza Educativa, 2008).

Alianza has 5 main goals:

1. To graduate competent high school students.

2. To train a pool of qualified teachers.

3. To develop a model of interinstitutional work.

4. To be a center of influence in the community.

5. To carry out education research.

Pedagogically, Alianza develops its educational model and curriculum based on constructivist principles (Alianza Educativa, 2008):

1. Constructive processes where students construct their knowledge through a gradual process.

2. Previous learning where experiences accumulate and contribute to each student’s construction of knowledge.

3. Performance and assessments where students produce different actions and products that show their diverse levels of understanding.

4. Social interaction where learning is augmented through the interaction with others.

Colsubsidio: Las Mercedes

Colsubsidio is a family compensation fund whose mission is to work for the integral improvement of conditions of the population and for the development of a supportive, harmonic, and equal society (Colsubsidio, 2010). In Colombia, family compensation funds are private institutions that finance themselves by resources that come from 2% of the salaries paid by private and public institutions. By law, each individual and their employer have to be affiliated to one of these funds and both the individual and the employer share the cost of this affiliation. These funds provide different services to their affiliates including recreation, health, housing, education, and training. Colsubsidio is one of the largest family compensation funds in the country (Villa and Duarte, 2002).

In the educational arena, Colsubsidio offers a model that strives for high quality schooling of the affiliates’ children. The main objective of Colsubsidio is to raise the educational level of the Colombian population through a model based on academic quality focused on the labor world and based on moral and social principles and values. Colsubsidio has four private schools and currently administers five concession schools including Las Mercedes (Colsubsidio, 2010).

The mission of Colsubsidio’s concession schools is to form citizens with social and ethical commitment (Instituto para la Investigación Educativa y el Desarrollo Pedagógico, IDEP, 2010). Colsubsidio’s schools have three main goals (IDEP, 2010):

1. The construction of a community where natural leaders will be identified and where specific training will be given in communicative processes, conflict resolution and project management.

2. The implementation of a project where the school accompanies the design of strategies and instruments, and promotes the process of natural leaders.

3. The community’s evaluation of the impact of the schools’ actions to provide feedback to these projects.

The pedagogical model used by Colsubsidio is based on constructivism (IDEP, 2010).

Gimnasio Moderno: Sabio Caldas

Gimnasio Sabio Caldas is administered by Gimnasio Moderno, one of the most traditional schools in Bogotá. The mission of Sabio Caldas is to implement human growth processes fostering the development of competencies for coexistence and labor performance that can lead to a productive and happy life.

The school’s goal is to help students find meaning in who they are, and what they do and learn. The school provides a context where respect, tolerance, responsibility, nutrition, and social work with families constitute the confidence framework so that the community feels committed and happy to be an active part of their children’s educational process (Manual de Convivencia, 2007). Sabio Caldas is a school where the interaction with the community is a model that aims to transcend to other local and district communities. The school aims to create an “oasis” where each member of the community will find elements and opportunities that will permit him or her to advance and enrich his or her life and have a vision of the future (IDEP, 2010).

There are two topics of great importance in Sabio Caldas:

1. Community work, where institutional members work with the community located close to the school.

2. Formation in values, where the school recognizes the loss of these values and the role that the school and families have in this process.

The most common pedagogical strategies in Sabio Caldas include integration centers, pedagogical projects, interest centers, and research. Students in Sabio Caldas are given an education that allows them to enter the labor force with appropriate skills and competencies (Manual de Convivencia, 2007).

Schools’ results in standardized tests

The five schools that are part of these studies have participated in several national standardized tests, including the SABER 11 exit exam, SABER 5 (fifth grade), and SABER 9 (ninth grade). The exit exam is a high-stakes test used as the main selection criterion for entrance into higher education. All graduating students take this multiple-choice exam, which tests knowledge and skills in math, language, biology, physics, chemistry, social studies, and philosophy. SABER 5 and 9 are standardized multiple-choice tests given to 5th and 9th grade students, testing competencies and content knowledge in language, math, and science. These exams are administered every three years and provide information on school performance rather than individual student performance.

Based on each school’s results in the ICFES Saber 11 exit exams, each institution is placed on a scale that ranges from Low to Very Superior. Table 5.4 shows the rating of each school based on the 2010 ICFES results.

Table 5.4 Schools’ Level According to the Results from the 2010 ICFES Exit Exam.

Name of School          ICFES Scale
Jaime Garzón            Superior
La Giralda              Medium
Santiago de Atalayas    High
Las Mercedes            High
Sabio Caldas            High

Note. The data in column 2 are from http://w6.icfes.gov.co:8095/Clas/

Table 5.5 presents the results of schools in the ICFES exit exams for science-related components (Physics, Biology, and Chemistry). This classification provides a scale of achievement for each school by comparing its means with those of others. The lowest classification is “Low” and the highest classification is “Very Superior”.

Table 5.5 Schools’ Results in the Science Components of the ICFES Exam (2010).

Name of School          Biology    Chemistry    Physics
Jaime Garzón            47.12      48.05        46.27
Santiago de Atalayas    47.03      48.48        45.28
La Giralda              44.66      45.19        43.53
Sabio Caldas            45.59      47.44        45.29
Las Mercedes            48.23      47.48        47.23

Note. The data in columns 2, 3, and 4 are from http://w6.icfes.gov.co:8095/Clas/

Table 5.6 presents the results of schools in the SABER exams for Math, Language, and Science. These scores are in general above the national average.

Table 5.6 Schools’ Results in the 2009 SABER Exams.

Name of School          SABER 5                          SABER 9
                        Math    Language    Science      Math    Language    Science
Santiago de Atalayas    337     342         340          331     327         324
Jaime Garzón            353     341         335          338     343         337
La Giralda              327     313         308          318     300         314
Sabio Caldas            306     300         299          311     305         300
Las Mercedes            343     350         341          332     328         332

Note. The data in columns 2, 3, 4, 5, 6, and 7 are from http://w6.icfes.gov.co:8095/Clas/

According to the information presented in this chapter, the students come from similar backgrounds, and the schools serve a low SES population in similar contexts and are all concession schools. Statistical analyses of these data are presented in Chapter 6, Methods.


CHAPTER 6: METHODS

Overview and Research Questions

This study aims to compare science achievement in students that participate in inquiry-based science education programs with students that have not participated in such programs. The comparison between groups looks at total scores in five different assessments, and also explores differences in the types of knowledge demanded by the assessments. The research question for this study, in broad terms, is:

How does the science achievement of students participating in the Colombian IBSE program compare, on average, with the science achievement of students who have not participated in inquiry-based science education programs?

More specifically, this study addresses the following questions:

1. Is there a difference between IBSE and Control students’ performance in the Human Body Systems (HBS) paper and pencil tests?

2. Does IBSE and Control students’ science achievement vary depending on the type of knowledge tested?

3. Does IBSE and non-IBSE students’ science achievement vary depending on the proximity of the assessment used?

4. Do students’ mean achievement scores differ according to their group (IBSE, Control), achievement on posttest (high, medium, low), or content level (rich or lean) of the performance assessments?

5. Do students perform similarly on multiple-choice tests and performance assessments?
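The factors named in the fourth question can be laid out as comparison cells. The sketch below is illustrative only: it assumes the three factors are fully crossed, which the text here does not state, and the names `GROUPS`, `POSTTEST_LEVELS`, `CONTENT_LEVELS`, and `design_cells` are hypothetical.

```python
from itertools import product

# Factor levels as listed in research question 4.
GROUPS = ("IBSE", "Control")
POSTTEST_LEVELS = ("high", "medium", "low")
CONTENT_LEVELS = ("rich", "lean")


def design_cells():
    """Enumerate every (group, posttest level, content level) combination.

    Assumes a fully crossed layout; the actual design may differ.
    """
    return list(product(GROUPS, POSTTEST_LEVELS, CONTENT_LEVELS))


if __name__ == "__main__":
    for cell in design_cells():
        print(cell)
```

Under that assumption there are 2 x 3 x 2 = 12 cells in which mean scores could be compared.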


Participants and Context of this Study

Students from five schools in Bogotá participated in this study. Of these, students from three IBSE schools are the Treatment group while students from two Non-IBSE schools are the Control group. Institutions in Colombia (both IBSE and Control) base their science curriculum on the Science standards presented by the Colombian Ministry of Education (Ministerio de Educación Nacional (MEN), 2004). The standards present a broad range of content topics for fourth and fifth grade, as well as the development of science skills and social skills. The topics related to Human Body Systems (HBS) are part of the “Living Environment” component and include standards such as (MEN, 2004):

• “I identify the different levels of cellular organization in living things.

• I identify objects in my environment that have similar functions as those of my organs, and I can justify my comparison.

• I represent the diverse systems and organs of human beings and I can explain their function.”

Even though all schools in this study base their science curriculum on these standards and are comparable in students’ socio-economic status (SES), there are differences in the way the curriculum is taught. In the following section, both the treatment and the control groups are described from three perspectives: curriculum and materials, teachers, and students. Information about the curriculum and the materials was obtained directly from each of the modules or books used in classes. The Treatment and Control groups with their specific curriculum and materials will be presented first, followed by specific information on teachers and students, which was obtained from classroom observations and from interviews with the teachers. This section of the chapter will be presented by comparing what teachers and students in each of the groups actually do in the classroom.

Treatment Group

The Alianza Educativa manages five schools in very low SES neighborhoods in Bogotá, Colombia. Each school has approximately 1,200 students from K-11, with an average of 40 students in each classroom. As mentioned above, the communities from which the student body of Alianza schools is drawn have several problems, including undernourishment, interfamily violence, sexual abuse, drug use, and low motivation for studying. However, Alianza schools have increased student achievement and shown higher scores than other public schools in their neighborhoods. Fifth grade students from La Giralda, Santiago de Atalayas, and Jaime Garzón, three of the Alianza schools, participated in this study. Two of the Alianza schools, Miravalle and Argelia, were excluded from this study because the teachers are new (they started with the Alianza schools in January 2009) and have neither participated in formal training workshops nor had experience comparable to that of other fifth grade IBSE teachers.

Curriculum and Materials used in the Treatment Group

Since 2002, Alianza schools have implemented an Inquiry-Based Science Curriculum called Pequeños Científicos for grades 0 to 6. The program focuses on the acquisition of scientific knowledge and skills through direct experimentation that involves observation of phenomena, elaboration of hypotheses, design and execution of experiments, analysis of results, and conclusions. Students work in cooperative groups, where each member has a well-defined role such as secretary, time-keeper, materials administrator, or presenter. Each student keeps a written record of the experiences in his or her notebook. Students are actively encouraged to present their ideas, and to discuss and argue about results and conclusions. In this process, the teachers are guides who lead students through questions and observations so that all children construct their own knowledge.

The program includes visits from scientific researchers, teachers from scientific disciplines, and university students who enhance the learning process. Students also share their learning process with their parents and families through homework and assignments.

In class, students are expected to generate predictions, inquire, experiment, and find evidence bearing on their predictions, keep a written record of their observations and results, present their conclusions and discuss their presentation with classmates.

Teachers in Pequeños Científicos participate in a rigorous training program on IBSE that includes workshops, supporting visits, and individual work. All teachers participate in 100 hours of training in inquiry-based teaching. The basic training involves four workshops over a two-year period; additional training is provided during the implementation of the module. The initial workshop includes a presentation of the IBSE strategy, where teachers are “students” in the first module they will teach. This first 20-hour workshop sets the stage for teachers to begin to develop their knowledge of how modules work. The workshop incorporates elements of cooperative groups, development of science skills, and administration of material, among other things. Teachers then go back to their school to implement their first module in the classroom. During this period, teachers receive two supporting visits. Six months later, teachers participate in a second workshop that focuses on the next module that teachers will implement. This second workshop is 16 hours long and builds on the experience teachers have had during the first six months of implementation. During the following six months, teachers receive visits once again and finish the school year with a third 16-hour workshop where the experience of the first year is reviewed and additional modules are presented. Teachers are visited during the second year, and then participate in a fourth and final 8-hour follow-up workshop at the end of this year. Sometimes, this cycle is repeated when there are new teachers in the schools. Figure 6.1 shows the training process of Pequeños Científicos.

Figure 6.1. Process of teacher training in Pequeños Científicos. Taken from: Pequeños Científicos. 2004. Institutional Presentation.

Science teachers in Alianza schools were selected in 2002 to participate in this training process as preparation for teaching the IBSE modules. The schools developed their 0-6th grade science curriculum based on the modules and the methodology provided by Pequeños Científicos. Some teachers in Alianza also had a one-hour weekly meeting in which they worked on professional development and were able to reflect on and continue their work implementing the IBSE modules. Additionally, the supporting visits, made by both teachers from the school and staff of Pequeños Científicos, provided feedback on the teachers’ implementation of IBSE modules. The teachers at the three schools that were part of this study have been working with Pequeños Científicos for more than eight years.

In the Pequeños Científicos program, several of the modules that students study during primary school are based on the INSIGHTS modules developed by the Center for Science Education, a part of the Education Development Center (EDC). The specific INSIGHTS module related to this study, Human Body Systems, has been implemented with fifth graders of Alianza Educativa. The module was translated into Spanish by the Language Department of Universidad de los Andes. The Human Body Systems module allows students to explore how body systems work together to keep their body working. The module starts by introducing students to the needs their bodies have in order to perform different functions. Then the students work with different body systems, exploring how they work and how they interact to make their body function properly (CSE, 2011).

The Human Body Systems module is one of two modules taught in fifth grade. Students work on the different aspects of this topic for approximately half of the school year (5 months). Students have access to experimentation and to other diverse science-related activities. Additionally, they have science textbooks as reference, including the same one used in the Control group. In the treatment schools, students answer an introductory assessment before starting the unit that measures their knowledge about the topics of the module. When they finish, they answer a final assessment that has similar questions. Both assessments are composed of constructed-response questions. Students are evaluated on their growth from before to after studying HBS. The following questions exemplify those found in the assessments (Insights, 2003, pp. 7-11):

• What is digestion?

• What does the blood do?

• How does the oxygen you breathe in get into your bloodstream?

One science class per teacher in the three IBSE schools was videotaped during one class period of approximately 50 minutes; the three teachers were also interviewed toward the end of the study. The information collected in both the videotapes and the interviews coincides with the information above: the teachers use an inquiry-based approach to teach the Human Body Systems module. Further information on teachers and students in the Treatment group is presented below.

Control Group

Two other concession schools different from Alianza Educativa were part of this study. The first school, Las Mercedes, is one of the five concession schools administered by Colsubsidio, a family compensation fund. The second school, Sabio Caldas, is administered by Gimnasio Moderno, a traditional private school in Bogotá. Both schools are located in very low SES neighborhoods in Bogotá, Colombia, with a population comparable to that of the Alianza schools. Each school has approximately 1200 students from K to 11, with an average of 40 students per classroom. The communities from which the student bodies of Las Mercedes and Sabio Caldas are drawn have problems very similar to those of Alianza schools, including undernourishment, interfamily violence, sexual abuse, drug use, and low motivation for studies. However, as is the case with Alianza schools, both Las Mercedes and Sabio Caldas have increased students’ achievement and shown higher scores than other public schools in their neighborhood.

These two schools are comparable to the Alianza Educativa schools, since they are also concession schools from a similar SES background and have comparable results in the ICFES exam (senior-year, school-exit exam). Further information about these schools is provided in the Context chapter.

Curriculum and Materials Used in the Control Group

The fifth-grade HBS science classes in both Control schools are based on the same textbook: Santillana Casa de las Ciencias Naturales 5. The book provides information about different body systems, among other topics. This book is the main source of information for students and teachers. Figure 6.2 shows an image of the type of information presented in the book, in this case the book’s information about the digestive system.


Figure 6.2. A print-out of pages of the on-line version of the book: “Santillana Casa Ciencias Naturales 5” (Editorial Santillana - Casa Ciencias 5. (n.d.)).

Science classes in the two Control schools were videotaped and the fifth grade teachers in each school were interviewed. The information gathered from the class observations and from the interview provided general characteristics of the type of teaching that occurs in each of these teachers’ classrooms. The most prominent characteristic present in both classes is that teaching is teacher-centered. This means that teachers in these classrooms do most of the talking in front of students and direct specific activities carried out by the students. Teachers in these classrooms spend a significant amount of time providing direct information to students, involving them in rote learning.

For example, one of the teachers frequently asked students to copy a direct text into their notebook from either the board or the textbook. Most of the work observed in the classroom focused on teaching declarative knowledge. The activities in these classrooms ranged from a direct narration of information, to revising the previous lesson, to students filling out a crossword puzzle with HBS vocabulary. In order to develop the necessary learning, teachers mentioned that they used the guides and activities from the book, as well as posters that describe different human body systems (see interview section).

Figure 6.3 shows one type of activity given by teachers to their students, taken from the textbook activity bank. The implementation of this activity was actually witnessed through the videotape.

Figure 6.3. Crossword puzzle (Digestion in Humans) used in Control group taken from on-line resources (Editorial Santillana - Casa Ciencias 5. (n.d.)).

Figure 6.4 shows another type of activity, Naming Parts of the Digestive System. This activity also focused on vocabulary. In this case, students needed to fill in the names of the different parts of the digestive system.

Figure 6.4. Fill in the blank activity used in Control Groups teaching science taken from on-line activity bank (Editorial Santillana - Casa Ciencias 5. (n.d.)).

The Human Body Systems unit was taught in a similar time frame in four of the five schools. However, one of the Control schools took two months longer than the other four schools to work through the unit.

Teaching, Teachers and Students in the Treatment and Control Groups

Implementation check through videos and interviews

Five teachers, in total, participated in this study. They all taught two classes per school. Observations of each teacher were carried out in order to describe what each implementation of the science curriculum looked like. All teachers were videotaped and the tapes were observed using the same criteria.

Classroom observation

Fifth grade classes from teachers of both the IBSE and the comparison schools were videotaped when implementing the HBS unit. One class from each of the five teachers was videotaped with the objective of gathering additional information about what occurred in the classroom and obtaining information about the fidelity of implementation in both the Treatment and Control groups.

Table 6.1. Teachers’ Classroom Practice as Evidenced from One Lesson

Activities observed in videos (columns: Teacher 1, Teacher 2, Teacher 3, Teacher 4, Teacher 5)
Type of School: IBSE, IBSE, IBSE, Non IBSE, Non IBSE
Teacher transmits information: xxx xxxx
Teacher grades students notebooks: xxxxx
Teacher asks declarative questions (for example: Which is the main function of the heart?): xx xx x x x
Students answer declarative questions: xx xx xx x xxx
Students complete words the teacher mentions: x xx xxx
Students copy information (from documents, book or the board) in their notebooks: x xxxx xx
Teacher asks students about their previous knowledge: xxx xx xx
Teacher walks around the groups asking questions: xxxx xxx xxx x
Teacher gives instructions for experiments: xx x xx
Students do experiments: xxxx xx xxx
Students do an activity (not an experiment; example: writing/completing a crossword): xxx xxx
Teacher gives feedback to students: xx xx xx
Teacher asks for explanations: xxxx xxx x
Students discuss in their group: xxxx xxxx xx
Students answer questions using evidence: xxxx xxx xx
Students register information with their own words in their notebooks: xxxx x xx

Each science class took approximately the same amount of time (60 minutes). These recordings did not aim to describe teachers’ classroom practices nor characterize their teaching strategies, but to provide information about the facets of inquiry and traditional practices visible in each class. Each video was observed and Table 6.1 was created. The activities in the table show observed moments in the video. In order to describe the activities, the videos were coded. This coding was carried out in several steps. The first step was the definition of the categories, for which two Master’s students and I observed two IBSE videos and one Control video separately. Each video was observed in five-minute intervals. After each interval, each observer described what was observed in that time frame. The main focus during each observation was on the teacher and what he or she was doing (for example, the teacher asked questions or the teacher rotated from group to group). There were several subcategories within each category, although each observer named them differently (e.g., the teacher is teaching versus the teacher is transmitting information). Common categories were created when sharing the observations, extrapolating the main observations instead of focusing on specific details of each teacher. The observations of these three videos provided the general categories that were used when observing the other two videos. The observations of the last two videos only added a couple of activities that had not been described previously. Therefore, for the description of the class session, the presented categories emerged from the observed classroom videos.

While IBSE teachers spent more classroom time asking declarative questions and asking for explanations, and walked around the room giving feedback to students in their groups, Control teachers transmitted information, graded notebooks, and placed themselves at the front of the classroom. In general, students in IBSE classrooms were doing experiments, discussing in their groups, answering questions using evidence, and registering information in their own words in their notebooks. On the other hand, students in Control classrooms were copying information into their notebooks and doing written activities such as filling in a worksheet or crossword individually (even though they were seated in pairs in one school and in groups of four in the other school).

Therefore, based on the classroom observations, the practice of the IBSE teachers matches what was defined in Chapter 2 as inquiry teaching, while Control group teachers are traditional in their approach to science teaching.

Teachers’ interviews

Each teacher was interviewed in March 2011, after all the students’ data was collected. Information gathered through the interview included teaching experience, years of teaching, materials used in the science classroom, professional development in science education, and classroom practices. The interview instrument was translated and adapted from the TIMSS teacher survey (see Appendix A).

Table 6.2 provides general information about each teacher and characterizes his or her teaching practices based on their responses to the interview. Four of the five teachers were women, and their ages ranged between 30 and 50 years. Four teachers taught fourth and fifth grades, and one teacher taught fifth, sixth, and seventh. The number of students per class ranged from 36 to 43, and 4 out of 5 teachers had between 40 and 43 students in their classes. All teachers had at least 10 years of teaching experience, and they all mentioned teaching for approximately 10 years in their current school. The number of hours teaching science varied little, with 3.7 weekly hours for the IBSE teachers and 4 weekly hours for the Control teachers. All teachers held a Teachers’ License.

Table 6.2. Summary of General Information About Each Teacher Based on Interviews

Teacher  Gender  Age range  IBSE  Grades taught  # of students per class  Years teaching  # of years in the school  Weekly science hours  Highest level of formal education  Books used
1        F       40-49      Y     4, 5           41                       16              10                        3.7                   Teaching License                   Pequeños Científicos Modules, Amigos de la naturaleza
2        F       40-49      Y     4, 5           41                       14              10                        3.7                   Teaching License                   Pequeños Científicos Modules, Amigos de la naturaleza
3        F       30-39      Y     4, 5           42                       10              10                        3.7                   Teaching License                   Pequeños Científicos Modules, Amigos de la naturaleza
4        M       40-49      N     4, 5           36                       20              10                        4                     Teaching License                   Casa de las ciencias naturales, Amigos de la naturaleza
5        F       30-39      N     5, 6, 7        43                       15              10                        4                     Teaching License                   Casa de las ciencias naturales, Ciencia y vida, Amigos de la naturaleza

According to the interviews, teachers have comparable ages, experience, academic specialization, and number of students per class. One of the main differences appeared when teachers were asked about their experience and training in IBSE programs. All three IBSE teachers received, over 8 years ago, training in the modules and in inquiry teaching from the Pequeños Científicos program. They received additional training and follow-up classroom visits after their initial workshop. All of them have been teaching science through inquiry for more than 8 years. Additionally, teachers from schools one and two have also studied a specialization in science education, focused on scientific inquiry. None of the Control group teachers had received any training in IBSE, but they have received professional development on topics such as science teaching, science curriculum, and integration of technology into science classes.

The other main differences that appeared from the interviews are related to what students do in class, according to the teachers, and are presented in Table 6.3. IBSE teachers reported that their students plan and design experiments and actually carry out the experiments in almost all class sessions, while Control teachers noted that this happens only seldom. Control teachers say that their students memorize facts and principles in almost all class sessions, while Treatment teachers answered that their students never do. Finally, IBSE teachers have their students relate what they learn in their everyday life to what they are learning in class, whereas this is seldom done by Teacher 5 and never by Teacher 4.

Table 6.3. Characterization of What Students do in Class Based on Teacher Responses to Adapted TIMSS Questionnaire

In my class, students Teacher 1 Teacher 2 Teacher 3 Teacher 4 Teacher 5
a) Observe natural phenomena such as the weather or a About half the lessons About half the lessons Some lessons Some lessons Some lessons growing and describe what they see
b) Watch me do a science Never Never Never Some lessons Some lessons experiment
c) Design or plan experiments or Every or almost every Every or almost every About half the Some lessons Some lessons projects lesson lesson lessons
About half the d) Do experiments or projects About half the lessons About half the lessons Some lessons Some lessons lessons
e) Work together in small groups Every or almost every Every or almost every Every or almost Some lessons Some lessons on experiments or projects lesson lesson every lesson
f) Read their textbooks or other Every or almost every Every or almost every Every or almost Every or almost Every or almost resource materials lesson lesson every lesson every lesson every lesson
g) Have students memorize facts About half the Never Never Never Some lessons and principles lessons
h) Give explanations about Every or almost every Every or almost every About half the About half the Some lessons something they are studying lesson lesson lessons lessons
i) Relate what they are learning Every or almost every Every or almost every Every or almost Never Some lessons in science to their daily lesson lesson every lesson
j) Work individually at their own About half the About half the About half the lessons About half the lessons Some lessons pace lessons lessons

Mapping Observed Classes and Interviews onto the Inquiry Facets

The information provided by the description of the modules, classroom observations, and interviews makes it possible to map, tentatively, each teacher onto the facets of inquiry presented in Chapter 2 (see Table 6.4). Even though this information is only a sample of the practices of each teacher, it does provide evidence of the type of facets present in the classrooms.

Table 6.4. Mapping Teachers and Students with the Inquiry Facets

Facets: Procedural; Nature of Scientific Knowledge; Conceptual; Social

Column headings, in order: Asking Scientifically-Oriented Questions; Designing Experiments; Executing Scientific Procedures; Gathering and interpreting data; Representing Data; Carrying Hands-On Activities; Formulating hypothesis; Reflecting on Nature of Science; Drawing explanations based on evidence; Generating and revising theories; Applying scientific knowledge; Drawing on/Connecting to prior knowledge; Explaining ideas/mental models; Organizing concepts and principles; Participating in class discussions; Arguing/debating scientific ideas; Developing communication skills; Cooperative learning

IBSE Teachers
1  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
2  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
3  ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓
Control Teachers
4  ✓ ✓
5  ✓ ✓ ✓

The Control group teachers focused mainly on the conceptual facet, which reflects a traditional approach to teaching science, while the practice of the IBSE teachers is aligned with my definition of inquiry.

Part A - Assessing and Comparing Students’ Knowledge of Human Body Systems with a Paper and Pencil Test

Subjects

Two hundred and forty-seven fifth-grade students from the three Alianza schools (IBSE schools) and 162 fifth-grade students from Sabio Caldas and Las Mercedes (Control schools), who finished the school year in November 2010, participated in this study. However, of these 409 students, only 365 students (Table 6.5) have both pretest and posttest scores, since some of the students were absent on one of the two test dates. There are no statistically significant mean differences between the Control and IBSE students who were absent. Additionally, five students in total, 2 from the Control group and 3 from the IBSE group, were excluded because they skipped pages on the questionnaire.

In Colombia, fifth grade is equivalent to 6th grade in the United States; therefore, the ages of students who participated in the study ranged from 10 to 12 years. Two classes were selected in each school and, as mentioned above, both classes in a school had the same teacher, who was the only one teaching fifth-grade science in that school.

Table 6.5. Student Participants in the Study by Group, School and Class (sample sizes).

Type of School    School     Class    Number of Students with a Pre and a Post Test
IBSE Schools      School 1   1,1      41
                             1,2      39
                  School 2   2,1      38
                             2,2      38
                  School 3   3,1      37
                             3,2      34
Control Schools   School 4   4,1      27
                             4,2      35
                  School 5   5,1      39
                             5,2      37
TOTAL                                 365
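The class counts in Table 6.5 can be tallied by group as a quick consistency check. The following Python sketch is illustrative only: the dictionary layout and variable names are assumptions, the class counts are those of Table 6.5, and the group sizes (247 and 162) are those given in the description of the subjects.

```python
# Students with both a pretest and a posttest, per class (Table 6.5).
matched = {
    "IBSE": {"1,1": 41, "1,2": 39, "2,1": 38, "2,2": 38, "3,1": 37, "3,2": 34},
    "Control": {"4,1": 27, "4,2": 35, "5,1": 39, "5,2": 37},
}
# Participating students per group, as reported in the text.
participants = {"IBSE": 247, "Control": 162}

# Sum the class counts within each group, then across groups.
group_totals = {group: sum(classes.values()) for group, classes in matched.items()}
overall_total = sum(group_totals.values())

# Share of each group's participants who have both test scores.
share_matched = {g: group_totals[g] / participants[g] for g in participants}

for g in group_totals:
    print(f"{g}: {group_totals[g]} of {participants[g]} students ({share_matched[g]:.1%})")
print("Total:", overall_total)
```

The group totals add up to the 365 students reported in the TOTAL row of the table.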


Instrument 1-Multiple Choice and Constructed Response Items

The paper and pencil assessment was developed in three steps: (1) item development, (2) item revision based on trials and think-aloud protocols, and (3) booklet construction.

Item development.

Test items came from different sources. Some items were developed during a six-day workshop organized by Pequeños Científicos and led by Professors Maria Araceli Ruiz-Primo, Guillermo Solano-Flores, and me. Twenty-three IBSE teachers from Colombia and Panama, as well as disciplinary experts and other science educators, were involved in item development.

Participants were guided in the development of two types of items: some that were close to the Human Body Systems IBSE module (proximal) and others that matched Colombia’s national education standards (distal). For the development of the proximal items, participants used the learning goals and the lesson storyboards of the module.

Participants were also asked to develop items that tapped into three types of knowledge: (1) declarative knowledge (factual, conceptual knowledge) or “knowing that” (e.g., name the system that is composed of the heart and blood vessels); (2) procedural knowledge (step by step procedures) or “knowing how” (e.g., how to interpret a graph); and (3) schematic knowledge (knowledge used to reason about) or “knowing why” (e.g., explain why you breathe faster and your heart beats faster when you exercise).

Once the participants developed the items, each item was reviewed for content, format, knowledge type, and clarity of language. After this review some items were selected and improved, while others were discarded. To ensure an adequate number of items for the final version of the test, two science educators from Universidad de los Andes and I developed additional proximal items using both textbooks.

Distal items were provided by ICFES, the Colombian Institute that carries out all standardized testing in the country. Specifically, items for this assessment were taken word-by-word from the released items from the SABER 2009 fifth grade science test. It was decided to use these distal items instead of those developed in the workshop because they had been field tested, revised and used on a national scale by ICFES.

Two versions of the paper and pencil assessment were produced. The pre-test with 28 items contained fewer items than the post-test and focused more on proximal than on distal items. The post-test contained 37 items and included all of the multiple-choice (MC) items from the pre-test, several distal items from SABER 2009, and six constructed response (CR) questions.

Item revision

Different versions of the items were tested on several occasions. In February 2009, a trial was carried out with 68 students in a city close to Bogotá (Tenjo). One result from this trial indicated that the questions were too difficult for students and lacked clarity. This led to the incorporation of approximately eight figures, in order to allow greater clarity and make the questions easier to interpret. Additionally, the amount of written text that had to be read by students in several questions was reduced.

Other trials were carried out in June and September 2009 with 38 students in one of the Treatment classes (La Giralda) and 30 in a Control class (Las Mercedes), respectively. These students had already finished the HBS module or content but would not participate in this study. This way, the trial provided information on how students from the same context as those in the study would answer the questions. The results from this trial showed that even though the questions worked better, several of them were still too difficult. During these trials, students were asked to underline words that they did not understand in the questions. When two or more students underlined a word, the wording was modified.

Some students in these trials also participated in think-aloud protocols. In this method, students answer the questions in the assessment and simultaneously verbalize what they are thinking, which provides information about the student’s cognitive processes (Johnstone, Bottsford-Miller, and Thompson, 2006).

Before Think-Aloud Protocols

19. In what part of the following figure does the connection between the circulatory and the digestive systems occur?
A. 1
B. 2
C. 3
D. 4

After Think-Aloud Protocols

19. In what part of the following figure does the connection between the circulatory and the digestive systems occur?
A. Esophagus
B. Stomach
C. Large intestine
D. Small intestine

Figure 6.5. Example of a change in a question after the input received in a Think-Aloud.

Students’ think-aloud protocols showed difficulties interpreting some drawings, as well as problems understanding the question. An example of the type of change made to the items after the think-aloud protocols is presented in Figure 6.5. In this case, students initially thought that they needed to fill in the blank next to the number, instead of answering the actual question.

Items from each trial were analyzed using the statistical software Iteman (Item Analysis) in order to identify items that were not working well and to calculate the reliability of the scores. Iteman also provided information regarding the difficulty level of the items as well as their discrimination index.
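Classical item analysis of the kind described above can be sketched in a few lines. The Python example below is a minimal illustration and not Iteman's actual implementation: it assumes that item difficulty is the proportion of correct answers and that the discrimination index is a point-biserial correlation between each item and the score on the rest of the test. The function names and the response matrix are hypothetical and are not data from this study.

```python
from statistics import mean, pstdev

def item_difficulty(responses, item):
    """Proportion of students answering the item correctly (0/1 scores)."""
    return mean(row[item] for row in responses)

def item_discrimination(responses, item):
    """Point-biserial correlation between an item and the rest-of-test score."""
    item_scores = [row[item] for row in responses]
    rest_scores = [sum(row) - row[item] for row in responses]
    sd_item, sd_rest = pstdev(item_scores), pstdev(rest_scores)
    if sd_item == 0 or sd_rest == 0:
        return 0.0  # correlation undefined; treat the item as non-discriminating
    m_item, m_rest = mean(item_scores), mean(rest_scores)
    cov = mean((i - m_item) * (r - m_rest) for i, r in zip(item_scores, rest_scores))
    return cov / (sd_item * sd_rest)

# Hypothetical 0/1 responses: rows are students, columns are items.
responses = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
]
difficulties = [item_difficulty(responses, j) for j in range(4)]
discriminations = [item_discrimination(responses, j) for j in range(4)]
```

In this invented example the last item has a negative discrimination value, since only the lowest-scoring student answers it correctly; an item with that pattern would be a candidate for revision or removal.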

In general, the results of these trials indicated that the items had a high level of difficulty and reading load. This led to their revision. In order to assess their difficulty, the items were presented to elementary school science teachers in a workshop in Chile in October 2009, where they rated the perceived difficulty level of each item, based on the percentage of students in their classes that they thought would be able to answer each question. Teachers classified about 50% of the items as too difficult (where fewer than 30% of students would answer the question). Based on the teachers’ ratings, items were also modified, trying to maintain their difficulty while making them easier for the students to understand. A couple of items were removed from the test.

Excluded items, in general, included those with double negatives. Additionally, all items were revised to unify the distractors so that they were all similar. In some cases, instead of using a negative, or to help students in their comprehension of questions, keywords were highlighted in bold.

The final version of the pretest booklet had 28 questions, while the posttest had 31 multiple-choice questions and 6 constructed response questions. When scoring responses, multiple-choice questions had a 1-point value while the constructed response questions had a 2-point value since each constructed-response question had 2 parts. Table 6.6 summarizes the characteristics of the paper and pencil tests. The final versions of both the pre-test and the complete post-test can be found in the Appendix (Appendix B and C).
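The scoring rule described above can be made concrete with a short sketch. The Python function below is illustrative only: the function and parameter names are hypothetical, and it assumes that each of the two parts of a constructed-response question is worth one point, which is one reading of the 2-point value stated in the text.

```python
MC_ITEMS = 31   # multiple-choice questions on the posttest
CR_ITEMS = 6    # constructed-response questions on the posttest
CR_PARTS = 2    # parts per constructed-response question (assumed 1 point each)

def posttest_score(mc_correct, cr_parts_correct):
    """Total posttest score under the assumed scoring rule.

    mc_correct: list of 31 booleans, one per multiple-choice question.
    cr_parts_correct: list of 6 (part1, part2) boolean pairs.
    """
    if len(mc_correct) != MC_ITEMS or len(cr_parts_correct) != CR_ITEMS:
        raise ValueError("unexpected number of responses")
    mc_points = sum(1 for ok in mc_correct if ok)
    cr_points = sum(int(p1) + int(p2) for p1, p2 in cr_parts_correct)
    return mc_points + cr_points

# Maximum possible score: 31 one-point items plus 6 two-point items.
MAX_SCORE = MC_ITEMS * 1 + CR_ITEMS * CR_PARTS
```

Under these assumptions the constructed-response questions account for 12 of the maximum possible points on the complete posttest.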

Table 6.6. Assessments related to the Paper and Pencil Tests

Type of test                   Description                                                    Number of questions
Pre-test                       The original booklet.                                          28
Post equal Pre                 The post-test contains the same questions as the pre-test.     28
Post Total                     Additional items in the booklet include more distal items.     31
Post + Constructed Response    Post Total with 6 additional constructed response questions.   37

Booklet construction

The human body systems paper and pencil booklets balanced knowledge types (declarative, procedural and schematic) and proximity of the item to the curriculum (proximal v. distal). A group of science educators classified the questions according to knowledge type. Table 6.7 shows the distribution of questions according to the type of knowledge and the proximity to the curriculum. Two versions of the post-test were created in order to balance the distribution of the types of questions (CR or MC) presented to the students. Booklet 1 began with the multiple-choice questions, while Booklet 2 began with the open-ended questions.

Table 6.7. Composition of Items in the Human Body System Booklets (number of questions by type of knowledge and proximity)

                            Declarative           Procedural            Schematic
Type of Test                Proximal   Distal     Proximal   Distal     Proximal   Distal
Pretest / Post_equal_Pre    11         1          4          4          8          0
Post_Total                  12         1          4          4          10         0
Post+CR                     12         1          5          6          10         3

Additionally, the items can be mapped onto the facets of scientific inquiry presented in the framework. Table 6.8 shows the distribution of questions according to the inquiry facet. In this case, only the conceptual and procedural inquiry facets are incorporated into the booklets. In general, declarative and schematic items were mapped onto the Conceptual facet, and procedural items were matched with the Procedural facet.
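The mapping described above can be checked against the item counts reported in Table 6.7. The Python sketch below is illustrative: the data structure and names are assumptions, while the counts are those of Table 6.7 and the resulting facet totals agree with those reported in Table 6.8.

```python
# Items per knowledge type as (proximal, distal), from Table 6.7.
composition = {
    "Pretest / Post_equal_Pre": {"declarative": (11, 1), "procedural": (4, 4), "schematic": (8, 0)},
    "Post_Total": {"declarative": (12, 1), "procedural": (4, 4), "schematic": (10, 0)},
    "Post+CR": {"declarative": (12, 1), "procedural": (5, 6), "schematic": (10, 3)},
}

# Declarative and schematic items map to the Conceptual facet,
# procedural items to the Procedural facet.
FACET = {"declarative": "Conceptual", "schematic": "Conceptual", "procedural": "Procedural"}

def facet_counts(test):
    """Number of items per inquiry facet for one version of the test."""
    counts = {"Conceptual": 0, "Procedural": 0}
    for knowledge_type, (proximal, distal) in composition[test].items():
        counts[FACET[knowledge_type]] += proximal + distal
    return counts

for test in composition:
    print(test, facet_counts(test))
```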

Table 6.8. Composition of Items in the Human Body Systems Mapped into the Facets of Inquiry

Type of Test                Conceptual    Procedural
Pretest / Post_equal_Pre    20 items      8 items
Post_Total                  23 items      8 items
Post+CR                     26 items      11 items

Three examples of translated, revised items selected for the final booklet are presented below:

This first example was categorized as a proximal item tapping into declarative knowledge.

What is the function of the digestive system?
A. Transform food that enters the body.*
B. Carry oxygen through the body.
C. Carry through the body.
D. Regulate the body’s temperature.

The second example was categorized as a distal item (SABER 2009) tapping into schematic knowledge.

When food is digested, where does your body use it?
A. Only in the blood of your body.
B. Only in the stomach of your body.
C. Only in the lungs of your body.
D. In the cells of your body.*

This third example was categorized as a proximal item tapping into procedural knowledge.

Doctor Perez records the rhythm of respiration of different people when they are at rest. He created the following table:

Person              Breaths per minute
Pedro the baby      38
7 year-old girl     25
7 year-old boy      25
10 year-old boy     20
Mother              16

This table suggests that:
A. Boys breathe faster than girls.
B. Older people breathe faster than younger people.
C. Girls breathe faster than boys.
D. Younger people breathe faster than older people.*

This example presents a constructed response proximal question tapping into schematic knowledge.

How are the digestive and the circulatory systems related?

Test administration and data collection.

Test givers were trained in the implementation of this test. All test givers participated in a four-hour training session where they received and read a test implementation manual and were provided with logistical and technical information about the data collection (Appendix D). The training and manual helped standardize test administration and unify instructions and protocols.

All tests were given under standardized testing conditions and classroom setup. In each classroom, two test givers were present to administer the test. They followed the instructions found in the training manual and reported, in a provided form, any irregularities that occurred during the implementation.

Fifth grade students from three IBSE schools and two comparison schools took the Human Body Systems (HBS) pretest during the first 10 days of February, 2010. The same students, with the exception of students from Las Mercedes, took the post-test around the first week of June, 2010. In Las Mercedes, the post-test took place on November the 8th, since the teacher had not finished the HBS unit before then. In all cases, the post-test was given a few days after the teacher finished the unit. The reliabilities of the posttest and the posttest scales are in general high. The reliabilities in the pretest are low, since students had not been exposed to the topics and were probably guessing the answers.

The distal reliability is low given the small number of items, and also due to problems related with how clear those items were.

Table 6.9 presents the reliabilities for the paper and pencil assessments including the sub-scales by type of knowledge.

Table 6.9. Reliabilities of the Paper and Pencil assessments.

Reliabilities based on assessment               Complete Data File
Reliability Pre Test                            0.566
Reliability Post Test                           0.791
Reliability Post Test CR                        0.600
Reliability Post Test MC + CR                   0.831
Reliability Pre Declarative                     0.369
Reliability Pre Procedural                      0.256
Reliability Pre Schematic                       0.258
Reliability Pre Proximal                        0.549
Reliability Pre Distal                          0.079
Reliability Post Declarative                    0.541
Reliability Post Declarative MC - CR            0.636
Reliability Post Procedural                     0.576
Reliability Post Schematic                      0.569
Reliability Post Schematic MC - CR              0.667
Reliability Post Proximal                       0.742
Reliability Post Proximal MC - CR               0.782
Reliability Post Distal                         0.503
Reliability Post Distal MC - CR                 0.590
Reliability Post equal Pre                      0.743
Reliability Post equal Pre Declarative          0.541
Reliability Post equal Pre Procedural           0.411
Reliability Post equal Pre Schematic            0.507
Reliability Post equal Pre Proximal             0.733
Reliability Post equal Pre Distal               0.197
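The text does not name the reliability coefficient behind these values. As an illustration only, the sketch below assumes an internal-consistency coefficient of the Cronbach's alpha type; the function name and the toy item scores are hypothetical and are not data from this study.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one row of item scores per student).

    Assumption: alpha is used here only as a common internal-consistency
    coefficient; the dissertation does not say which coefficient it reports.
    """
    k = len(scores[0])  # number of items
    # Variance of each item across students
    item_vars = [pvariance([row[i] for row in scores]) for i in range(k)]
    # Variance of the students' total scores
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 0/1 item scores for five students on four items
toy_scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
print(round(alpha, 3))
```

With few items, as in the distal scale, a coefficient of this kind tends to be low even when the items are sound, which is consistent with the explanation given for the distal reliability.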

Analysis of data

Descriptive statistics were calculated including the means and standard deviations per class and per school. Specific analyses were done in order to provide information to answer each research sub-question.

1. Is there a difference between IBSE and Control students’ performance in the Human Body Systems (HBS) paper and pencil tests?

A nested analysis of variance (ANOVA) with pretest as dependent variable was done to see if groups were equivalent. There were no differences by treatment, but there were differences by school. Therefore, a nested analysis of covariance (ANCOVA), with pretest as covariate and the posttest as dependent variable, was performed.
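The key feature of the nested design is that the treatment effect is tested against the variation among schools within treatment, not against the variation among students. The sketch below illustrates this under simplifying assumptions: it ignores the class level and the covariate, uses sequential sums of squares, and runs on hypothetical toy scores. It is not the analysis software or the data used in this study.

```python
from statistics import mean

def nested_treatment_f(data):
    """F ratio for Treatment tested against School(Treatment).

    data: {treatment: {school: [scores, ...]}} (hypothetical layout).
    Returns (F, df_treatment, df_school_within_treatment).
    """
    all_scores = [x for schools in data.values()
                  for s in schools.values() for x in s]
    grand = mean(all_scores)
    ss_treat = 0.0   # between-treatment sum of squares
    ss_school = 0.0  # schools-within-treatment sum of squares
    n_schools = 0
    for schools in data.values():
        t_scores = [x for s in schools.values() for x in s]
        t_mean = mean(t_scores)
        ss_treat += len(t_scores) * (t_mean - grand) ** 2
        for s in schools.values():
            ss_school += len(s) * (mean(s) - t_mean) ** 2
            n_schools += 1
    df_treat = len(data) - 1
    df_school = n_schools - len(data)
    f = (ss_treat / df_treat) / (ss_school / df_school)
    return f, df_treat, df_school

# Toy data: three IBSE schools and two Control schools, as in the design
toy = {
    "IBSE": {"s1": [14, 16], "s2": [15, 17], "s3": [10, 12]},
    "Control": {"s4": [9, 11], "s5": [11, 13]},
}
f, df1, df2 = nested_treatment_f(toy)
print(f, df1, df2)
```

With three IBSE schools and two Control schools the denominator has only 3 degrees of freedom, which is why a treatment test of this form has little power however many students are tested.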

2. Does IBSE and Control students’ science achievement vary depending on the type of knowledge tested?

A nested ANCOVA for each type of knowledge as dependent variable, and pretest as covariate was performed. Correlations among sub-scales of types of knowledge2 were done within each group and compared.

3. Does IBSE and non-IBSE students’ science achievement vary depending on the proximity of the assessment used?

A nested ANCOVA with the proximal and distal item sub-scales as dependent variables, and pretest as covariate, was performed. Correlations among sub-scales of proximity were computed within each group and compared.

Part B - Assessing and Comparing Students´ Knowledge of Human Body Systems with Performance Assessments: Paper Towels and Pulse Subjects

Based on students’ performance on the multiple-choice test, the top ten, the middle ten, and the bottom ten scoring students were selected from each school to participate in the performance assessments (the sample was limited by time restrictions and student availability). Even though the top, middle, and bottom students were defined relative to each school, had students been grouped as a whole sample, ignoring school, the selection of student participants would have changed by less than 10 percent. This led us to answer the following question: Do students perform similarly on multiple-choice tests and performance assessments?

2 Appendix E presents the items included in each scale.
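The selection rule can be sketched as follows. This is an illustration only: the function name, the student identifiers, and the way the middle group is centered on the ranking are assumptions, since the text does not describe how ties or the exact middle positions were handled.

```python
def select_by_rank(scores, n=10):
    """Pick the top, middle, and bottom n students of one school.

    scores: {student_id: multiple-choice score} (hypothetical layout).
    Assumption: the middle group is the n students centered on the ranking.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    mid_start = (len(ranked) - n) // 2
    top = ranked[:n]
    middle = ranked[mid_start:mid_start + n]
    bottom = ranked[-n:]
    return top, middle, bottom

# Hypothetical school with 40 students whose scores are 0..39
school = {f"s{i:02d}": i for i in range(40)}
top, middle, bottom = select_by_rank(school)
print(top[0], middle[0], bottom[-1])
```

Applying the same function once per school gives school-relative groups; applying it to the pooled sample gives the alternative selection that the text reports as differing by less than 10 percent.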

Table 6.10 shows the number of students who participated in the performance assessments. It was not possible to carry out the performance assessments for 5 students in school 4, since they were absent on the programmed day. Two students were removed from the database since they did not have a pre-test.

Table 6.10 Number of students who participated in the Performance Assessments.

                                      Paper towels      Pulse
Type of School    School and class    All     CD*       All     CD*
IBSE Schools      School 1,1          20      20        20      20
                  School 1,2          13      13        13      13
                  School 2,1          15      15        15      15
                  School 2,2          17      16        17      16
                  School 3,1          14      13        14      13
                  School 3,2          17      15        17      15
Control Schools   School 4,1          11      11        11      11
                  School 4,2          14      13        15      14
                  School 5,1          15      15        14      14
                  School 5,2          16      16        16      16
TOTAL                                 152     147       152     147

*CD = complete data

Instrument 2 - Paper Towels Performance Assessment

Description of instrument.

The hands-on Paper Towels performance assessment was taken from the Stanford Educational Assessment Laboratory website (http://www.stanford.edu/dept/SUSE/SEAL/Assessments/PaperTowels.htm), and translated into Spanish (Appendix F). A science performance assessment, in general, includes a challenge as an invitation to perform an investigation, a response format, and a scoring system. In this case, the challenge requires students to determine which of three types of paper towels can absorb the most water and which the least. Students were given materials such as a pitcher of water, a beaker, a scale, different types of paper towels, an eyedropper, and a stopwatch, among others. There are no specifications of the steps to be taken in order to solve the challenge; part of the challenge is for students to create a procedure for carrying out the investigation. Responses were registered in a notebook and collected after the assessment ended. The time each student spent responding to the assessment was also recorded in the notebook.

This assessment can be classified as content-lean since no special knowledge is needed to perform the assessment. Moreover, students cannot specifically apply the knowledge and skills learned when studying Human Body Systems. However, this assessment is process-rich, since students need to apply diverse scientific skills in order to come up with one of the many solutions to the challenge.

Revision of performance assessments

The Paper Towels Performance Assessment was tried with sixth grade students. After this trial, several language adjustments were made and it was decided that the first page of the notebook would be individually read with each student to ensure that he or she understood what was called for. Given that there is no large variety of paper towels in Colombia, many trials were carried out, with as many paper towels as possible, in order to find extreme differences between the towel that absorbed the most and the towel that absorbed the least water. The notebook is presented in Appendix F.

Instrument 3 - Pulse Performance Assessment

Description of instrument.

This instrument was an adaptation of the Pulse performance assessment used by the Third International Mathematics and Science Study (TIMSS). In this 4th Grade performance task, students determine a baseline pulse rate and collect data on the changes in that rate with exercise. They then describe the changes in the data and develop an explanation for their observations. The assessment can be accessed at: http://pals.sri.com/tasks/5-8/PulseMS/. This assessment was translated into Spanish and is presented in Appendix G.

This assessment can be classified as content-rich since students can specifically apply the knowledge and skills learned when studying Human Body Systems. This assessment does not require as much process knowledge as does Paper Towels.

Revision of performance assessments

The Pulse Performance Assessment was tried out with fifth grade students from a private school during March 2010. Students had problems understanding the assessment, so the instructions were modified. Students had to find for themselves a way in which to record the data. However, this was very difficult for them. It was decided to keep the data table that came with the original TIMSS task instructions. Instructions appeared several times in the notebook and were read at the beginning.

Test administration and data collection for the Paper Towels and Pulse performance assessments.

As mentioned above, the ten top, ten middle, and ten bottom performers on the multiple-choice test were selected to participate in this part of the study. All these students responded to both the Paper Towels and the Pulse performance assessments during the same session.

The students per school were assigned to start with either the Paper Towels or the Pulse assessment, and then rotated. For each assessment, one trained observer took notes in an established format (Appendix H), writing all the steps of what the student did.

Scoring

The Paper Towels performance assessments were graded using the scoring form taken from the Stanford Educational Assessment Laboratory website to assess students’ notebooks.

The first time Paper Towels was graded, there was very low interrater reliability (0.495), since there was no common ground about what counted as a correct scientific method to determine which paper towel absorbs more water, or about the difference between being careful and being sloppy when measuring. After redefining and unifying these categories, the notebooks were graded a second time and the reliability rose to 0.844.

The Pulse assessments were graded using a rubric from TIMSS. One research assistant and I were trained in the scoring process. Grading of Pulse was less problematic. Additional examples from TIMSS were used for the grading matrix, including the keywords expected in a correct answer. The interrater reliability for this assessment was 0.945. The grading matrix is presented in Appendix I.
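The text does not say which index of interrater reliability was computed. As a minimal sketch, assuming the index is the correlation between the total scores two raters assign to the same notebooks, it could be obtained as follows; the function name and the rater scores are hypothetical.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical total scores given by two raters to the same six notebooks
rater_a = [4, 6, 5, 3, 7, 2]
rater_b = [4, 5, 5, 3, 6, 3]
r = pearson_r(rater_a, rater_b)
print(round(r, 3))
```

A correlation of this kind is insensitive to one rater being systematically more lenient than the other, so an agreement index may be preferable when absolute scores matter.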

Table 6.11 presents the reliabilities for each of the performance assessments.

Table 6.11 Reliabilities of the Performance Assessments

                             Complete Data File
Reliability Paper Towels     0.531
Reliability Pulse            0.552

Analysis of data

Data from Part B were used to answer the following research question:

4. Do students’ mean achievement scores differ according to their group (IBSE, Control), achievement on posttest (high, medium, low) or content level (rich or lean) of the performance assessments?

To characterize differences in science achievement, a nested ANOVA was done by ability (high, medium, or low), with each type of assessment as dependent variable.

Correlations between the two performance assessments and between the performance assessments and the sub-scales that measure different types of knowledge were done within each group and compared.

Part C - Comparing Students’ Performance in Paper and Pencil Tests and Performance Assessments

For this part of the study, all the instruments were used to compare the performance of students of different abilities (high, medium and low) on the different types of tests (paper and pencil, Pulse, and Paper Towels).

Proposed Analysis of Data

This part of the study aims to answer the following research sub-question:

5. Do students perform similarly on multiple-choice tests and performance assessments?

Descriptive statistics provided information about mean differences, and correlations between the paper and pencil and performance assessments provided insight into the relationship between these two types of measures, with particular attention to differences in correlations between IBSE and Control students. A nested ANOVA by ability, with Pulse or Paper Towels and the paper and pencil test as dependent variables, was performed to identify differences among the assessments and interactions.

Correlations among the performance assessments and the paper and pencil tests that measure different types of knowledge were computed by treatment in order to identify relationships among different variables.

CHAPTER 7. RESULTS

This chapter addresses the question of the impact of inquiry-based science education on student achievement. More specifically, it examines achievement differences between inquiry science education and typical science education in Colombia for overall achievement, achievement by types of knowledge, and achievement by the proximity of the achievement measure to the curriculum. The results are organized as follows: (a) comparison of the achievement of the IBSE and Control groups using the paper and pencil test, (b) comparison of the groups using performance assessments, and (c) comparison of results on the paper and pencil tests with the performance assessments.

Part A - Assessing and Comparing Students´ Knowledge of Human Body Systems: Paper and Pencil Test

In this section, data are presented to address the first two research sub-questions. Each of the questions will be followed by the related data collected and analyzed from students in the five dissertation schools.

1. Is there a difference between IBSE and Control students’ performance in the Human Body Systems (HBS) paper and pencil tests?

This part focuses on the results of the paper and pencil tests including the multiple-choice pretest and posttest, and the constructed response questions. Table 7.1 presents descriptive statistics for students’ scores on the multiple-choice questions by type of instruction. There are several differences between the means at pretest and posttest, and for the gain between pre- and posttest.

First, consider pretest mean differences. IBSE schools 1 and 2 have the highest means, while IBSE school 3 and Control school 4 have the lowest means.

When comparing the treatment and the Control groups, IBSE students obtained an average score of 11.84 points on the pretest, while Control students obtained an average of 10.66 points. A nested analysis of variance (ANOVA) with pretest as dependent variable was used to test for differences among pretest scores3. More specifically, the design was Class nested within School and School nested within Treatment (IBSE v. Control).4 There was no treatment effect when performing this ANOVA (F(1,3)=.959; p>.05). There are also no differences among classes within school (F(1,3)=.398; p>.05). However, there is a significant difference in mean performance in the pretest among schools within each treatment condition (F(3,363)=29.85; p<.05). When comparing means, school 1 differs significantly from schools 3, 4, and 5, and school 2 differs significantly from schools 3 and 4. These comparisons even identify differences among IBSE schools, where schools 1 and 2 are significantly different from school 3. These results lead us to conclude that students and schools do not come from the same population and some adjustments will be needed in comparing IBSE and Control groups. Perhaps more importantly, a significant school effect with so few schools reduces the power of all statistical tests in this nested design, because School enters importantly into the error term for statistical tests of the treatment effect. Indeed, the difference between the IBSE and Control groups (Table 7.3) is 2.75 points and the effect size 0.62, which is not insubstantial yet not significant.

Second, mean differences were observed on the posttest (with the same items as on the pretest [“Post-equal-Pre”]). Once again, IBSE schools 1 and 2 have the highest means, showing a gain greater than 3.6 points in the former and greater than 5.5 in the latter. The remaining schools, with the exception of Class 10 in school 5, have gains that range between 1.15 and 2 points. Class 10 has the lowest gain even though this class had the same teacher as Class 9.

3 At Pretest, there was a gender effect, where boys scored higher than girls. There are no gender effects in the subsequent analyses of this study.

4 The ANOVA assumptions were met including normality and homogeneity of variances (F=.806; p>.05).

Table 7.1. Descriptive Statistics of the Results of the Pre and Post Multiple-Choice Tests

                           Pretest   Pretest   Post-equal-   Post-equal-   Gain from
Group     School   Class   mean      SD        Pre mean      Pre SD        Pre to Post
IBSE      1        1       12.88     4.17      16.66         4.42          3.78
          1        2       13.33     3.58      16.97         3.54          3.64
          2        3       12.29     3.51      17.89         3.45          5.60
          2        4       11.66     3.04      17.24         4.00          5.58
          3        5       10.41     3.68      11.68         3.63          1.27
          3        6       10.15     3.16      12.15         3.33          2.00
Control   4        7       10.15     2.32      11.30         2.71          1.15
          4        8       9.49      3.06      11.06         3.27          1.57
          5        9       11.54     2.96      13.41         3.00          1.87
          5        10      11.22     3.66      11.68         3.96          0.46
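The gain column in Table 7.1 is the Post-equal-Pre mean minus the pretest mean for each class. The short sketch below recomputes it from the class means reported in the table; the dictionary layout and variable names are only illustrative.

```python
# class: (pretest mean, Post-equal-Pre mean), values copied from Table 7.1
table_7_1 = {
    1: (12.88, 16.66), 2: (13.33, 16.97),
    3: (12.29, 17.89), 4: (11.66, 17.24),
    5: (10.41, 11.68), 6: (10.15, 12.15),
    7: (10.15, 11.30), 8: (9.49, 11.06),
    9: (11.54, 13.41), 10: (11.22, 11.68),
}

# Gain = posttest mean on the common items minus pretest mean
gains = {c: round(post - pre, 2) for c, (pre, post) in table_7_1.items()}
lowest_gain_class = min(gains, key=gains.get)

for c, g in gains.items():
    print(f"Class {c}: gain {g:.2f}")
print("Lowest gain: class", lowest_gain_class)
```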

Since there are differences in the pretest, this variable will be used as a covariate in subsequent analyses.5 For covariate adjustment to be practical, the correlation between the Pretest (covariate) and the Post-equal-Pre variables should be around .60, and in my data it is, for the multiple-choice measure (Table 7.2). Consequently an analysis of covariance (ANCOVA) was used not only to address selectivity bias (with differences in pretest scores in this quasi-experimental design) but also to increase statistical power.

The ANCOVA assumptions were checked including normality and homogeneity of variances (F=.806; p>.20), linearity between posttest and pretest, and homogeneity of regression slopes.

5 Selectivity bias provides challenges to interpreting difference among groups, my focus, as well as schools and classes. With limited information, the best that could be done is to use the pretest as a covariate to adjust the comparisons at posttest.


Table 7.2. Correlations among the Paper and Pencil Tests

Upper triangle: IBSE (N=227). Lower triangle: CONTROL (N=138).

                              Pretest   Post-Equal-   Posttest   Constructed      Post MC and
                              Total     Pre           Total      Response Total   Constructed
Pretest Total                 1         .684*         .709*      .513*            .711*
Post-Equal-Pre                .545*     1             .980*      .645*            .952*
Posttest Total                .569      .958*         1          .666*            .976*
Constructed Response Total    .304*     .285*         .331*      1                .790*
Post MC and Constructed       .562*     .903*         .950*      .534*            1

* Correlation is significant at the 0.05 level.

The treatment effect (IBSE vs. Control) was not statistically significant (F(1,3)=2.635; p>.05) using the Post-Equal-Pre achievement measure. There was also no class within school effect (F(5,354)=1.348; p>.05). On the other hand, there was a school effect (F(3,354)=23.33; p<.05). Table 7.3 presents the adjusted marginal means when using the pretest as a covariate. There is a covariate-adjusted mean difference in favor of the treatment group, but it is not statistically significant.

Table 7.3 Adjusted Marginal Means of Results for Post-Equal-Pre

                           Post-equal-Pre    Post-equal-Pre
Group     School   Class   adjusted mean     standard error
IBSE      1        1       15.64             .441
          1        2       15.64             .456
          2        3       17.28             .455
          2        4       17.06             .453
          3        5       12.36             .461
          3        6       13.01             .482
Control   4        7       12.16             .540
          4        8       12.37             .480
          5        9       13.31             .447
          5        10      11.80             .459

Overall means: IBSE 15.162; Control 12.410.

Table 7.4 presents the descriptive statistics of the full paper and pencil posttest, including the multiple-choice questions that appeared on the pretest, the additional multiple-choice questions that were included on the posttest, and the constructed response questions given at posttest. The general patterns seen in this table correspond to what was described above. IBSE students performed, on average, better than Control students but, due to lack of power, the effect was not statistically significant.

Table 7.4 Descriptive Statistics of the Results of the Full Post Paper and Pencil Test

                           Constructed   Constructed   Post MC   Post MC
                           Response      Response      and CR    and CR
Group     School   Class   mean          SD            mean      SD
IBSE      1        1       2.61          1.48          23.59     7.09
          1        2       2.64          1.54          24.51     5.05
          2        3       3.39          1.81          26.68     5.46
          2        4       3.00          1.69          25.45     6.55
          3        5       1.24          1.44          17.11     5.94
          3        6       1.03          1.14          17.18     5.37
Control   4        7       .74           .98           16.07     3.83
          4        8       .77           1.03          15.57     4.70
          5        9       1.74          1.09          20.13     4.66
          5        10      1.14          1.00          16.95     5.15

Overall means, Constructed Response: IBSE 2.252; Control 1.236.
Overall means, Post MC and CR: IBSE 21.98; Control 18.07.

Correlations between constructed response questions and other questions (see Table 7.2) indicate that high scores on the pretest are associated with high scores in the Constructed-Response-Total. However, the strength of this association varied between groups, with a low significant correlation between Constructed-Response-Total and Pretest Total in Control (.304), and a moderate significant one in the IBSE group (.513).
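One conventional way to judge whether two correlations from independent groups differ is Fisher's r-to-z transformation. The sketch below applies it to the two correlations just cited, using the group sizes given in Table 7.2. This test is not reported in the study; it is shown only as an illustration, and it treats students as independent observations, ignoring the nesting of students in classes and schools.

```python
from math import atanh, sqrt

def fisher_z_difference(r1, n1, r2, n2):
    """z statistic for the difference between two independent correlations."""
    # Standard error of the difference between the two transformed values
    se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (atanh(r1) - atanh(r2)) / se

# Pretest Total with Constructed-Response-Total: IBSE .513 (N=227), Control .304 (N=138)
z = fisher_z_difference(0.513, 227, 0.304, 138)
print(round(z, 2))
```

An absolute z above 1.96 would indicate a difference at the two-tailed 0.05 level under the independence assumption, which the nested design does not strictly satisfy.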

A nested analysis of covariance (ANCOVA) with pretest as covariate and constructed response or full posttest as dependent variables was used to test for differences in the constructed response scores and in the full posttest combining multiple-choice and constructed response scores. As mentioned above, the design was Class nested within School and School nested within Treatment (IBSE v. Control).6 There was no treatment effect when performing this ANCOVA (Table 7.5), nor an effect by class within school within treatment. On the other hand, the school effect was significant for both assessments. The treatment effect favored the IBSE students in the constructed response and posttest total assessments, with mean differences of 1.02 and 3.91 respectively (Table 7.4). The effect sizes were 0.62 for constructed response and 0.57 for the posttest total assessment. But due to the large school effect, the power of the statistical test was compromised and the difference, as said, was not statistically significant.

Table 7.5 Results of the Nested ANCOVA with Constructed Response and Posttest as Dependent Variables

                             Constructed Response      Posttest + Constructed Response
                             F (df)           p        F (df)           p
Treatment                    2.26 (1,3)       p>.05    2.18 (1,3)       p>.05
School(Treatment)            23.56 (3,354)    p<.05    18.47 (3,354)    p<.05
Class(School(Treatment))     1.06 (5,354)     p>.05    1.93 (5,354)     p>.05

6 The ANCOVA assumptions were met including normal distribution of histogram, homogeneity of variances (F=2.78; p<.05), linearity between posttest and pretest, homogeneity of regression slopes, independence of covariance and treatments, and we assume the covariate was measured without error.


2. Does IBSE and Control students’ science achievement vary depending on the type of knowledge tested?

Recall that the multiple-choice test was designed to tap into three types of student knowledge: declarative, procedural, and schematic. The effects of inquiry science were examined by type of knowledge demanded by the achievement test. Note that one might expect IBSE students to perform better on procedural and perhaps schematic (inquiry) items than on other items. Table 7.6 presents the descriptive statistics of the overall adjusted means by type of knowledge tested and type of instruction. The results for all types of knowledge are similar to those reported above. There is no significant treatment effect although the mean difference is in the predicted direction.

Table 7.6 Descriptive Statistics of the Overall Adjusted Means of Science Achievement Depending on the Type of Knowledge by Treatment Group

           Declarative Post total    Procedural Post total    Schematic Post total
           MC+CR                     MC+CR                    MC+CR
                   Standard                  Standard                 Standard
           Mean    Deviation         Mean    Deviation        Mean    Deviation
IBSE       7.21    1.51              6.37    1.40             6.98    1.60
Control    5.74    2.40              5.52    2.23             5.46    2.55

In the declarative items, IBSE students outperformed the Control students by 1.47 points, with a medium effect size of 0.56. The result in the schematic scale is very similar: IBSE students outperformed the Control students by 1.52 points, and the effect size is 0.55. However, in the procedural items, even though the IBSE students performed better (0.85 points), the effect size was smaller (0.40). Figure 7.1 presents the effect sizes in the different types of knowledge.
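The mean differences quoted above follow directly from the adjusted means in Table 7.6. The sketch below recomputes them and, purely as an illustration, divides each difference by its reported effect size to show the size of the standardizing unit that the reported values imply. The text does not state which standard deviation was used to compute the effect sizes, so the implied value is an inference and not a reported statistic.

```python
# (IBSE adjusted mean, Control adjusted mean), copied from Table 7.6
table_7_6 = {
    "declarative": (7.21, 5.74),
    "procedural": (6.37, 5.52),
    "schematic": (6.98, 5.46),
}
# Effect sizes as reported in the text
reported_effect_size = {"declarative": 0.56, "procedural": 0.40, "schematic": 0.55}

mean_diff = {k: round(ibse - control, 2) for k, (ibse, control) in table_7_6.items()}
# Assumption: effect size = mean difference / some standard deviation
implied_sd = {k: mean_diff[k] / reported_effect_size[k] for k in mean_diff}

for k in mean_diff:
    print(f"{k}: difference {mean_diff[k]:.2f}, implied SD about {implied_sd[k]:.2f}")
```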


Figure 7.1. Effect Size by Types of Knowledge.

Table 7.7 presents the correlations between the pre- and posttest items by types of knowledge. The correlations among types of knowledge for Control students are, in general, lower than for IBSE students. In general, moderate significant correlations were seen among the same type of knowledge at pre- and posttest. However, the correlation between schematic pre- and posttest scores, though not as high, is significant in the Control group. There is a high correlation between declarative post and schematic post in the IBSE group (.680) versus a moderate one in the Control group (.357). Additionally, there is a moderate correlation between procedural post and schematic knowledge at posttest in the IBSE group (.613) versus a low one in the Control group (.248).


Table 7.7 Correlations of the Results of Science Achievement Depending on the Type of Knowledge

Upper triangle: IBSE (N=227). Lower triangle: CONTROL (N=138).

                    Declarative   Procedural   Schematic   Declarative   Procedural   Schematic
                    Pre           Pre          Pre         Post          Post         Post
Declarative Pre     1             .361*        .363*       .474*         .372*        .466*
Procedural Pre      .141          1            .212*       .537*         .533*        .527*
Schematic Pre       .080          .422*        1           .400*         .453*        .467*
Declarative Post    .441*         .216*        .242*       1             .567*        .680*
Procedural Post     .405*         .253*        .135        .237*         1            .248*
Schematic Post      .400*         .207*        .187*       .357*         .613*        1

* Correlation is significant at the 0.05 level (2-tailed).

Three nested ANCOVAs, each with one knowledge type as dependent variable and the pretest as covariate, were run. Table 7.8 presents the results of these ANCOVAs. Once again, there were no treatment effects and no class within school effects.

Table 7.8 Results of the Nested ANCOVA for Knowledge Types with Pretest as Covariate*

                             Declarative        Procedural         Schematic
                             F        p         F        p         F        p
Treatment                    2.347    p>.05     2.698    p>.05     1.934    p>.05
School(Treatment)            15.965   p<.05     19.744   p<.05     16.450   p<.05
Class(School(Treatment))     1.487    p>.05     .379     p>.05     1.671    p>.05

*Degrees of freedom = F(1, 354).

3. Does IBSE and Control students’ science achievement vary depending on the proximity of the assessment used?

Recall that the multiple-choice items varied as to how close they were to the science curriculum and standards in Colombia. Most items were “close” or “proximal” to what the students were learning. A few items taken from the Colombian assessment of achievement were distal: they touched on human body systems but were not directly tied to the students’ curriculum. This section compares IBSE with Control on test items differing by proximity (Table 7.9). The pattern of results is just what we’ve seen earlier: no treatment effect and a big school effect that explains the lack of power. Even though the mean differences between IBSE and Control are in the predicted direction, they are not statistically significant.

Table 7.9 Descriptive Statistics of the Results of Science Achievement Depending on the Proximity of the Items

                           Proximal Posttotal    Distal Posttotal
                           MC+CR                 MC+CR
Group     School   Class   Mean      SD          Mean     SD
IBSE      1        1       15.59     4.17        4.32     1.93
          1        2       16.05     3.38        4.38     1.41
          2        3       16.95     3.15        4.84     1.33
          2        4       16.68     3.48        4.21     1.86
          3        5       11.08     3.44        3.46     1.71
          3        6       11.41     3.14        3.65     1.72
Control   4        7       10.74     2.83        3.45     1.50
          4        8       10.48     3.06        3.11     1.66
          5        9       12.49     2.71        4.13     1.67
          5        10      10.89     3.65        4.00     1.67

Overall means, Proximal: IBSE 14.38; Control 11.65.
Overall means, Distal: IBSE 4.04; Control 3.89.

Correlations between different questions measuring proximity are presented in Table 7.10. Correlations in Control results range from low (Distal Pre with Proximal Pre and Distal Pre with Proximal Post) to moderate. However, all the moderate correlations are significant. On the other hand, all correlations for IBSE groups are moderate and significant. It is interesting to observe that there are higher correlations between proximal and distal items than between distal pre and distal post items in both groups. This is not the case for correlations between proximal pre and proximal post items, whose correlation is the largest in both groups. Low correlations between the pre- and post distal items can be explained by the low reliability of that scale.

Table 7.10 Correlations of the Results of Science Achievement Depending on Proximity

Upper triangle: IBSE (N=227). Lower triangle: CONTROL (N=138).

                 Proximal   Distal   Proximal   Distal
                 Pre        Pre      Post       Post
Proximal Pre     1          .302*    .641*      .594*
Distal Pre       .100       1        .365*      .342*
Proximal Post    .535*      .095     1          .620*
Distal Post      .403*      .244*    .474*      1

* Correlation is significant at the 0.05 level (2-tailed).

Table 7.11 Results of the Nested ANCOVA for Proximity with Pretest as Covariate* 7

                             Proximal           Distal
                             F        p         F        p
Treatment                    2.597    p>.05     .339     p>.05
School(Treatment)            28.770   p<.05     4.858    p>.05
Class(School(Treatment))     1.211    p>.05     .721     p>.05

*Degrees of freedom = F(1, 354).

A nested ANCOVA tested for mean differences between IBSE and Control on the proximity posttest using the corresponding pretest as the covariate. The same pattern of results was observed: no statistically significant treatment effect due to low statistical power; the direction of the mean difference was as hypothesized (Table 7.11). With respect to distal items, there were no effects at all.

7 It is important to highlight that proximal items were analyzed trying to identify and separate items that were proximal to the IBSE curriculum and proximal to the Control curriculum. However, the curriculum in both groups was practically the same. Both texts cover each system and its connection with other systems. There was no intention to specifically create different items because of this similar curriculum. When the items were split into IBSE and Control scales (perhaps making close calls), the scales were small (n(items) of 6 and 7) and of low reliability. A statistical comparison was done with each of these scales and there was no IBSE or Control effect in the scales by IBSE items or Control items.

Part B - Assessing and Comparing Students´ Knowledge of Human Body Systems with Performance Assessments: Paper Towels and Pulse

4. Do students’ mean achievement scores differ according to their group (IBSE, Control), achievement on posttest (high, medium, low) or content level (rich or lean) of the performance assessments?

Recall that for each school, ten high, ten middle, and ten low performing students were selected based on the multiple-choice posttest. These students took both the pulse and paper towels performance assessments. The descriptive statistics for these assessments are presented in Table 7.12.

Table 7.12 Descriptive Statistics for the Performance Assessments.

                           Pulse Total      Paper Towels Total    Paper Towels Process
Group     School   Class   Mean     SD      Mean     SD           Mean     SD
IBSE      1        1       5.95     1.73    5.05     1.39         3.40     .236
          1        2       5.54     1.20    5.00     1.53         3.23     .292
          2        3       6.60     1.45    4.20     1.42         3.07     .272
          2        4       6.31     1.40    5.00     1.46         3.25     .264
          3        5       5.85     1.41    4.62     2.14         3.25     .304
          3        6       5.67     1.54    4.67     1.95         3.34     .272
Control   4        7       3.91     1.22    5.64     1.50         3.36     .318
          4        8       3.85     1.07    4.50     1.65         3.14     .282
          5        9       4.47     1.13    4.29     1.98         2.71     .282
          5        10      4.63     0.96    3.94     1.95         2.31     .264

Overall means, Pulse Total: IBSE 5.99; Control 4.21.
Overall means, Paper Towels Total: IBSE 4.76; Control 4.59.
Overall means, Paper Towels Process: IBSE 3.26; Control 2.88.

The table presents an additional data set that takes into consideration only those questions in the Paper Towels assessment that are related to processes. Recall that the Paper Towels total score depended on using the correct (controlled) processes to investigate why some types of towels held more water and some held less, as well as on arriving at the correct answer. The additional data set excludes the last question in the assessment (the result of which paper towel absorbs more or less water). It turns out students in both groups could answer the last question without relying on the experimentation or on the observations recorded in their notebooks. Rather, they could touch and see the towels and get the right answer! These data are relevant since performance assessments provide direct information about procedural and strategic knowledge, which the excluded question did not test for. This new version of the assessment will be named Paper Towels Process and its reliability is 0.627.

In Pulse, IBSE students outperform Control students by 1.78 points. In Paper Towels Total there was no difference between groups. However, in Paper Towels Process, once again, the IBSE students have a higher score than the Control students.

A nested ANCOVA with ability and treatment as between-subjects variables and performance assessment as the within-subjects variable provided information about the effects among performance assessments. This analysis also tested the interaction between Treatment and Ability.[8] The treatment effect (IBSE vs. Control) was statistically significant for Pulse and for Paper Towels Process (Table 7.13). There was no effect for school or class in any of the assessments (with the exception of Pulse, which showed an effect for School), nor was there a treatment effect for Paper Towels Total. Therefore, in both performance assessments, IBSE students performed significantly better than Control students.

8 The ANOVA assumptions were met, including normality and homogeneity of variances for Pulse (F=.999; p>.20) and for Paper Towels Total (F=1.162; p>.20).

Table 7.13 Results of the Nested ANOVA for Performance Assessments

                                  Pulse             Paper Towels Total   Paper Towels Process
Source                      DF    F        p        F        p           F        p
Treatment                   1     27.476   p<.05    .590     p>.05       4.435    p<.05
Ability                     2     4.433    p<.05    1.846    p>.05       3.102    p<.05
Ability*Treatment           2     .478     p>.05    .697     p>.05       .693     p>.05
School(Treatment)           3     1.886    p>.05    1.703    p>.05       2.348    p>.05
Class(School(Treatment))    5     .411     p>.05    1.006    p>.05       .352     p>.05

Part C - Comparing Students’ Performance on Paper and Pencil Tests and Performance Assessments

In this section, data are presented to address the last research sub-question: Do students perform similarly on paper and pencil tests and performance assessments? The question is followed by the related data collected and analyzed from students who took all three assessments.

5. Do students perform similarly on paper and pencil tests and performance assessments?

I made a distinction between the usual paper and pencil tests used to measure science achievement and performance assessments. I hypothesized that the two types of measures, while both measuring knowledge and understanding, also tapped somewhat different aspects of science achievement, especially inquiry aspects. If the two types of measures are highly correlated, my hypothesis does not hold. However, if there are different patterns of correlation, I may have some relevant evidence for my hypothesis.

Table 7.14 presents the correlations between these assessments. The correlations between the assessments differ depending on treatment condition. There are no statistically significant correlations among tests for the Control group, but there are several significant correlations in the IBSE group.

Table 7.14 Correlations of the Types of Test [9]

IBSE (N=92) above the diagonal; CONTROL (N=55) below the diagonal.

                        Posttest Total   Pulse Total   Paper Towels Total   Paper Towels Process
Posttest Total          1                .406* [10]    .236*                .248*
Pulse Total             .187             1             .050                 .017
Paper Towels Total      .152             .085          1                    .828*
Paper Towels Process    .179             .186          .883*                1

* Correlation is significant at the 0.05 level (2-tailed).

Table 7.15 presents correlations among types of knowledge and types of tests in the Control group. There was a low but significant correlation in the Control group between Pulse and the declarative scale. There are no other significant correlations in this group. On the other hand, there are several low but significant correlations in the IBSE group (Table 7.16), where high scores in Pulse are associated with high scores in the three knowledge types. The Paper Towels assessment is correlated with the schematic and declarative scales.

9 Correlations are probably artificially high since they do not reflect the variation and covariation of scores in the middle of the joint distribution. This is due to the nature of the design that was used.

10 When correlating the Pulse performance assessment with questions in the posttest more directly related with this assessment (circulatory and respiratory system, relation between exercise and body systems) the correlation between the Posttest Total and the performance assessment increases to 0.474.

Table 7.15 Correlations of the Types of Knowledge and the Performance Assessments

CONTROL (N=55)          Declarative Post   Procedural Post   Schematic Post
Pulse Total             .277*              .041              .132
Paper Towels Total      .200               .062              .101
Paper Towels Process    .206               .091              .107

* Correlation is significant at the 0.05 level (2-tailed).

Table 7.16 Correlations of the Types of Knowledge and the Performance Assessments

IBSE (N=92)             Declarative Post   Procedural Post   Schematic Post
Pulse Total             .397*              .392*             .345*
Paper Towels Total      .223*              .150              .262*
Paper Towels Process    .214*              .181              .261*

* Correlation is significant at the 0.05 level (2-tailed).
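
As a rough cross-check on the significance stars in Tables 7.15 and 7.16, the sketch below converts a reported correlation and its group size into an approximate two-tailed p-value. It uses Fisher's z transformation with a normal approximation, which is an assumption made here for illustration; this section does not state which test produced the stars, and statistical packages typically use an exact t test that can differ slightly near the 0.05 boundary. The function name `fisher_z_p_value` is hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p-value for H0: rho = 0 (Fisher z, normal approx.)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r) * math.sqrt(n - 3)  # standardized Fisher z statistic
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# (group, N, performance assessment, correlation with the declarative scale),
# values copied from Tables 7.15 and 7.16.
reported = [
    ("Control", 55, "Pulse Total", 0.277),
    ("Control", 55, "Paper Towels Process", 0.206),
    ("IBSE", 92, "Pulse Total", 0.397),
    ("IBSE", 92, "Paper Towels Process", 0.214),
]

for group, n, assessment, r in reported:
    p = fisher_z_p_value(r, n)
    flag = "significant" if p < 0.05 else "not significant"
    print(f"{group:8s} {assessment:22s} r={r:.3f} p~{p:.3f} ({flag} at 0.05)")
```

Under this approximation, a correlation of about .21 is significant with N=92 but not with N=55, so part of the contrast between the two tables reflects the difference in group size and not only the size of the correlations.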

A nested ANOVA by ability, with Pulse or Paper Towels and the paper and pencil test as dependent variables, was performed. Treatment (IBSE vs. Control), ability, and school were statistically significant for types of test (Table 7.17). There was no effect for ability*treatment or class.

Table 7.17 Results of the Nested ANOVA for Types of Test

                                   Paper Towels Total   Paper Towels Process
Source                      DF     F         p          F         p
Treatment                   1      25.989    p<.05      34.856    p<.05
Ability                     2      139.821   p<.05      166.126   p<.05
Ability*Treatment                  1.130     p>.05      1.018     p>.05
School(Treatment)           3      7.607     p<.05      8.026     p<.05
Class(School(Treatment))    137    .778      p>.05      1.260     p>.05

CHAPTER 8. CONCLUSIONS

Colombian science education standards and several educational reforms around the world now ask students to learn to carry out scientific inquiries in addition to learning facts and concepts in science. Even though inquiry-based science education (IBSE) has grown in Colombia in the past ten years, the impact of these programs has not been systematically evaluated. The objective of this dissertation was to start addressing the question of the impact of IBSE on student achievement, and specifically to examine achievement differences between inquiry science education and typical science education in Colombia for overall achievement, achievement by different types of knowledge tapped (viz. declarative, procedural, schematic), achievement by the proximity of the achievement measure to the curriculum, and achievement as measured by performance assessments.

This is the first effort in Colombia to evaluate the impact of inquiry-based science teaching measured through student achievement; the results are mixed. In general, there were, on average, no statistically significant treatment effects as measured by the paper and pencil test overall or by its multiple-choice or constructed response questions, regardless of whether they tapped different kinds of knowledge or were proximal or distal to the curriculum. There was, however, a significant treatment effect on the performance assessments.

IBSE students demonstrated, on average, “medium” overall performance on the multiple-choice questions, attaining an average of 54% of the possible points. In contrast, Control students consistently performed on average 10% lower than IBSE students on these types of questions. There was a similar trend for IBSE students outperforming Control students on the constructed response items. However, performance of both groups was lower on these items, with IBSE students attaining an average of 37.5% of the possible points and Control students 20.6%. Students’ low performance on these questions may be due to a limited ability to communicate scientific knowledge, potentially caused by a lack of guided exposure to these types of questions or by general literacy problems recognized in Colombian education (e.g., PISA and ICFES state exam results).

Even though there was no statistically significant treatment effect as measured by the variety of paper and pencil tests, IBSE students consistently outperformed Control students on these different measures of science achievement with a substantial effect size of 0.6.
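
The effect sizes quoted in this chapter are standardized mean differences. The exact formula used is not restated in this section, so the sketch below assumes one common choice, Cohen's d with a pooled standard deviation. The function name `cohens_d` and all input values are hypothetical and are not taken from the study's data.

```python
import math


def cohens_d(mean_a: float, sd_a: float, n_a: int,
             mean_b: float, sd_b: float, n_b: int) -> float:
    """Standardized mean difference (group A minus group B) using a pooled SD."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)


# Hypothetical group summaries, for illustration only (not study data).
d = cohens_d(mean_a=10.0, sd_a=2.0, n_a=30, mean_b=8.8, sd_b=2.0, n_b=30)
print(f"d = {d:.2f}")
```

By Cohen's conventional benchmarks, values near 0.5 are usually read as medium and values near 0.8 as large. A sizable d can still fail to reach statistical significance when the number of independent units is small.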

When looking at these results more closely and comparing achievement by types of knowledge (declarative, procedural, and schematic), I found that the results follow a similar trend. IBSE students achieved higher scores than Control students, even though the differences were not significant. The effect size is greater on the declarative and schematic scales (around 0.55) than on the procedural scale (0.40). This last finding strikes me as interesting, since one of the main criticisms of inquiry is that it does not develop conceptual frameworks because it is focused on hands-on activities and procedures.

Additionally, since the conception of inquiry observed in the literature focuses on the hands-on or experimental component, one would expect a greater effect size for IBSE students on the procedural-knowledge scale and a smaller effect size on the declarative- or schematic-knowledge scales. However, in this study, IBSE students performed better on the conceptual facet.

One possible explanation for this last finding comes from an analysis of the types of items used to tap procedural knowledge. Due to the format of the test, the procedural questions focused on skills such as reading tables and graphs and controlling variables, but did not measure in full depth the procedural facet that is developed through inquiry (see Chapter 2 on inquiry teaching). This is one of the limitations of multiple-choice questions when measuring the procedural aspect (Baxter & Shavelson, 1994).

IBSE students also outperformed Control students on proximal items (effect size 0.65). It is important to consider that both groups used the same human body curriculum. From a content perspective, what differentiated the two groups was the teaching method. IBSE students should be able to go beyond the concepts through their experimentation, while Control students are limited to the content in the book.

On the other hand, when comparing the two groups on distal items, IBSE students performed only slightly higher (effect size 0.1). This strikes me as unusual given that IBSE students performed comparatively lower on distal items than on proximal items. The source of these differences might be the low reliability of the scale, since there were only a few distal items in the assessment. Furthermore, distal items are generally not worked on in the classroom, and it is probable that some students guessed when answering the items, thereby lowering the reliability.

Two performance assessments measured students’ skills in doing scientific inquiries. These assessments also provide additional information given the limitations of multiple-choice items mentioned above with regard to measuring the procedural component of inquiry-based teaching. The first assessment (Pulse) was close to the content taught to both groups, although the teaching method varied, while the second assessment (Paper Towels) focused on students’ scientific inquiry skills. There was a significant treatment effect on both the content-rich performance assessment (Pulse, effect size 1.11) and the content-lean assessment (Paper Towels Process, effect size 0.36).

IBSE students are expected to do inquiries better than Control students as measured by both assessments. IBSE students performed “medium-high” on both performance assessments, receiving about 66% of the total possible points. Control students, on the other hand, performed lower and less consistently across the two assessments, correctly answering 46% of questions on Pulse and 56% on Paper Towels.

There are stronger correlations in the IBSE group between the three types of knowledge (declarative, procedural, and schematic) and Pulse. The Pulse performance assessment has procedural elements that are similar to those of the paper and pencil test, since students register information and search for patterns in a table. This performance assessment also has declarative and schematic elements in that students are asked to provide explanations for their pulse data. Consequently, IBSE students had the inquiry knowledge needed to do the performance task and could rely on their declarative, procedural, and schematic knowledge. The correlations for Control students were consistently low. They simply did not know how to carry out science investigations, and they worked these tasks in an erratic manner, with no specific types of knowledge associated with their performance; hence the low correlations with the knowledge-type measures.

The Paper Towels performance assessment does not require the skills of a typical activity in the Human Body Systems (HBS) unit. The fact that IBSE students performed significantly better than Control students implies that the former developed general investigative and problem-solving skills that could be related to strategic knowledge, and therefore to inquiry skills, as a result of the HBS unit.

The significantly higher performance of IBSE students on performance assessments corresponds to the general conception that students who participate in IBSE programs develop investigative skills in greater depth than students who do not. These results differ from those found in other studies (Pine et al., 2006), where there were no significant treatment effects in three out of four performance assessments. In my study, there was an effect of inquiry in both measures, including Paper Towels, where Pine et al. (2006)[11] found none.

This study provides yet another, albeit not conclusive, result about inquiry-based science teaching. Even though there was no statistically significant difference in some of the measures, IBSE students consistently outperformed Control students in all measures with a medium to large effect size. Additionally, there was a significant effect in the performance assessments. In large part, then, the lack of statistical significance could be traced to the low power of these tests, a consequence of the nested design, in which schools nested within treatments varied in the achievement they produced in their students in both treatment conditions.

11 Pine et al. (2006) treatment groups studied three or four “units” for 6–8 weeks each during a school year.

Limitations and Challenges

Previous studies that compared science inquiry teaching to other approaches showed diverse limitations, including inconsistent measures of achievement, lack of observation of the treatment, differences in the treatments, and the lack of a clear definition of inquiry. This dissertation aimed at reducing several of those limitations by providing diverse measures (paper and pencil tests with multiple-choice and constructed response questions, and performance assessments); a detailed description of the treatment and the control, including observation of classes and interviews with the teachers; a detailed construction of items that tap into different types of knowledge; and a clear framework for what inquiry means in the study.

Perhaps one of the most important limitations of previous studies is the lack of classroom observations to compare the conception of inquiry teaching with what gets enacted in the classroom. This study was able to carry out a detailed review of the types of practices that occur in the science classrooms of both the IBSE and the Control groups.

Nevertheless, this study has several limitations, including the selection of schools, sample size, and nested design. Schools were selected by convenience and availability and by trying to match them on a set of characteristics including type of administration (concession), socio-economic status of the community, and results on standardized exams. Even so, the schools and students differed at pretest. Therefore, the data analysis addressed this difference by using a nested design with the pretest as a covariate.

Even though the number of students participating in this study (365) was sufficient, the number of schools participating, given the nested design, was not (cf. Pine et al., 2006). It was a challenge to find both Control and IBSE schools that were willing to participate in the study. Additionally, within the interested IBSE schools, teacher rotation is a major factor, which in this case led to the exclusion of two IBSE schools from the study. Furthermore, it is difficult to follow trained IBSE teachers closely to ensure fidelity of implementation of the treatment. Control schools were difficult to find since it was challenging to identify similar concession schools that were willing to participate in the study and were teaching the same unit. For these reasons, and other logistical ones (money and time), this study included only 2 Control schools and 3 IBSE schools. The small school sample size reduces the statistical power of the design, resulting in low power to detect the consistent mean differences in favor of IBSE students.

Some of the difference could also be due to chance among the five schools. This interpretation is bolstered when the design for the performance tasks is considered. Here we matched students within each school on posttest (low, medium, and high) and in this ability x treatment nested design, we found statistically significant differences.
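
The loss of power from having only five schools can be illustrated with the standard design-effect approximation for clustered samples, deff = 1 + (m - 1) × ICC, where m is the average cluster size and ICC is the intraclass correlation. This is a sketch and not the analysis used in the dissertation: it assumes equal-sized clusters, and the ICC value of 0.10 is a hypothetical placeholder, since no ICC is reported here. Only the totals of 365 students and 5 schools come from the text; the function names are illustrative.

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Variance inflation from cluster sampling, assuming equal cluster sizes."""
    return 1.0 + (avg_cluster_size - 1.0) * icc


def effective_sample_size(n_total: int, n_clusters: int, icc: float) -> float:
    """Number of independent observations the clustered sample is roughly worth."""
    avg_cluster_size = n_total / n_clusters
    return n_total / design_effect(avg_cluster_size, icc)


N_STUDENTS = 365    # reported in the text
N_SCHOOLS = 5       # 3 IBSE + 2 Control, reported in the text
ASSUMED_ICC = 0.10  # hypothetical value, for illustration only

n_eff = effective_sample_size(N_STUDENTS, N_SCHOOLS, ASSUMED_ICC)
print(f"Effective sample size under the assumed ICC: {n_eff:.1f}")
```

With these illustrative inputs, the 365 students carry roughly the information of a few dozen independent observations, which is consistent with the low power described above. A different ICC would change the number but not the direction of the effect.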

Reflections and Directions for Future Research

The results obtained in this dissertation can serve as a stimulus for further research on the impact of inquiry-based science teaching. There are several lessons that can be applied in future research. For instance, the effects of inquiry-based science teaching are directly linked to the teacher. In order to have a better representation of inquiry teaching in studies, it is very important to carry out classroom observations and interviews during the selection process. Additionally, it is not just the number of students that matters: the number of classes within a school and the number of schools also need to be sufficiently large to provide powerful tests of the IBSE effect.

The development of the paper and pencil assessment was a very rich, yet long, process. The final instrument used in this study can now serve as the basis for future research with a greater number of students, classes, and schools. Incorporating aspects such as types of knowledge and proximity makes it possible to carry out more in-depth analysis of aspects where inquiry teaching can have an effect. Additionally, including other types of questions and assessments, such as constructed response questions and performance assessments, provides more information about student science achievement. Therefore, the use of multiple measures was an important aspect of this study and allowed a greater understanding of the impact of inquiry teaching. Future studies should continue to include multiple achievement measures.

Even though the results of this research provide inconclusive evidence about the impact of inquiry-based science teaching, there are several possibilities for further research.

First, additional research can shed light on other ways to measure scientific inquiry skills. Since there were only a few procedural items that tap into inquiry skills, the paper and pencil test showed limitations in measuring some inquiry skills that were tapped with performance assessments. However, there might be additional ways to tap, with paper and pencil tests, the different types of knowledge that are taught through inquiry.

Second, IBSE students showed higher gains in schematic knowledge when compared to Control students. Further research could provide data on the types of skills that students acquire with inquiry, which allow them to go deeper into each concept and apply it to new situations. A question that can be addressed with additional research might focus on the effect of inquiry on student learning abilities.

Last, additional studies can address the difference found between achievement on proximal versus distal items. Even though IBSE students always outperformed Control students, the effect size on the distal items was considerably lower than on proximal items. Measures that include a greater number of distal items can provide further information about what inquiry students are able to achieve. The use of national assessment data, including results of Colombian students’ achievement at grades 5 (SABER 5) and 9 (SABER 9), can shed light on this topic. If these data sets are used to compare inquiry students with control students, further information can be collected regarding the nature of the differences among teaching approaches. Of course, the challenge will be to ensure that teachers labeled inquiry teachers are actually teaching with inquiry!

A Final Note…

The question of how to measure student achievement is very important. This dissertation addressed this question by using different measures to try to get a better picture of what each student knows and can do with scientific knowledge and skills. The initial stages of this research synthesized a framework for inquiry-based science education. This specific study mapped teachers’ practices to all the facets included in the framework. However, there are aspects within the framework that are not necessarily addressed by the measures used in this study, such as the design of experiments or the social facet. Further research can provide additional ways to measure student achievement in all the components that make up inquiry-based science teaching. This way, what a student learns through inquiry teaching can be more accurately represented.

Beyond the specific test-taking skills or performance abilities, inquiry aims to provide a framework for students to think about and approach the natural world in a more systematic and deeper way. It goes beyond specific concepts and leads students to be able to use available information schematically and strategically. In the long run, research can provide information on a key question: “What is the role of inquiry-based science education in 21st century skills?”


LITERATURE CITED

Alianza Educativa. 2008. Proyecto Educativo Institucional. Alianza Educativa. Retrieved December 23, 2010, from http://www.alianzaeducativa.edu.co/images/documentos/pei_2008.pdf

Alianza Educativa. 2010. ¿Quiénes Somos? Alianza Educativa. Retrieved December 23, 2010, from http://www.alianzaeducativa.edu.co/iquienes-somos.html

American Association for the Advancement of Science (1990). Science for All Americans. New York: Oxford University Press.

Baxter, G.P., & Shavelson, R.J. (1994). Science performance assessments: Benchmarks and surrogates. International Journal of Educational Research, 21, 279-298.

Barrera-Osorio, F. (2006). The impact of private provision of public education : empirical evidence from Bogota's concession schools. IDEAS: Economics and Finance Research. Retrieved December 21, 2010, from http://ideas.repec.org/p/wbk/wbrwps/4121.html

Berg, C. A. R., Bergendahl, V. C. B., Lundberg, B. K. S., & Tibell, L. A. E. (2003). Benefiting from an open-ended experiment? A comparison of attitudes to, and outcomes of, an expository versus an open-inquiry version of the same experiment. International Journal of Science Education, 25(3), 351-372.

Bredderman, T. (1983). Effects of Activity-Based Elementary Science on Student Outcomes: A Quantitative Synthesis. Review of Educational Research, 53(4), 499-518.

Bruner, J. S. (1961). The Act of Discovery. Harvard Educational Review, 31(1), 21-32.

Chang, C.-Y., & Mao, S.-L. (1999). Comparison of Taiwan Science Students' Outcomes with Inquiry-Group versus Traditional Instruction. Journal of Educational Research, 92(6), 340-346.

Colsubsidio. (2010). Colsubsidio. Retrieved December 10, 2010, from http://www.colsubsidio.com/porta_serv/educacion/formal.html

CSE: Insights: An Inquiry-Based Elementary School Science Curriculum. (2011). EDC's Center for Science Education (CSE) Home. Retrieved March 30, 2011, from http://cse.edc.org/curriculum/insightsElem/insights6.asp

Duschl, R. A. (2003). Assessment of inquiry. Everyday assessment in the science classroom, 41-59.


Duschl, R. A., Schweingruber, H. A., & Shouse, A. W. (2007). Taking Science to School: Learning and Teaching Science in Grades K-8: National Press.

Editorial Santillana - Casa Ciencias 5. (n.d.). Grupo editorial Santillana, Colombia. Retrieved March 30, 2011, from http://santillana.com.co/docentes/index.php?player_init/Q2FzYV9DaWVuY2lhc181/c3R1ZGVudA==/

El Tiempo. (2008, September 6). Proyecto de colegios en concesión cumple 9 años en funcionamiento. El Tiempo. Retrieved December 23, 2010, from www.eltiempo.com/archivo/documento/CMS-4505006

Furtak, E. M. (2006). The Problem with Answers: An Exploration of Guided Scientific Inquiry Teaching. Science Education, 90(3), 453-467.

Furtak, E. M., & Seidel, T. (2008). Recent Experimental Studies of Inquiry-Based Teaching: A Conceptual Review and Meta-analysis. Paper presented at the National Association of Research in Science Teaching Conference.

Furtak, E. M., Seidel, T., & Iverson, H. (2009). Recent Experimental Studies of Inquiry-Based Teaching: A Meta-analysis and Review. Paper presented at the European Association for Research on Learning and Instruction. August 25-29, Amsterdam, Netherlands.

Furtak, E. M., Shavelson, R. J., Shemwell, J. T., & Figueroa, M. (2009). To Teach or Not to Teach Through Inquiry: Is that the question? Paper presented at the From Child to Scientist: A festschrift to honor the scientific and educational contributions of David Klahr.

Geier, R., Blumenfeld, P. C., Marx, R. W., Krajcik, J. S., Fishman, B., Soloway, E. (2008). Standardized Test Outcomes for Students Engaged in Inquiry-Based Science Curricula in the Context of Urban Reform.

IAP Science Education Programme. (2006). Report of the Working Group on International Collaboration in the Evaluation of Inquiry-Based Science Education (IBSE) programs. Retrieved December 12, 2009 from: http://www.ianas.org/Santiago_Report_SE.pdf

Instituto para la Investigación Educativa y el Desarrollo Pedagógico, IDEP, 2010

Insights: an elementary hands-on science curriculum. Human Body Systems. (Teacher's guide, 2nd ed.). (20032007). Dubuque, Iowa: Kendall/Hunt.

Klahr, D., & Nigam, M. (2004). The Equivalence of Learning Paths in Early Science Instruction: Effects of Direct Instruction and Discovery Learning. Psychological Science, 15(10), 661-667.

Li, M., Ruiz-Primo, M.A., & Shavelson, R.J. (2006). Towards a science achievement framework: The case of TIMSS 1999. In S. Howie & T. Plomp (Eds.), Contexts of learning mathematics and science: Lessons learned from TIMSS. London: Routledge, Pp. 291-311.

Li, M., & Shavelson, R. J. (2004). Validating the links between knowledge and test items from a protocol analysis. Unpublished manuscript.

Manual de Convivencia. (2007). PAGINA OFICIAL DEL GIMNASIO SABIO CALDAS. Retrieved December 23, 2010, from http://sabiocaldas.edu.co/Manual_convivencia.html

Ministerio de Educación Nacional (MEN). (2004). “Estándares básicos de competencias en Ciencias Naturales y Ciencias sociales.” Ministerio de Educación Nacional. Julio 2004.

Minner, D. D., Levy, A. J. and Century, J. (2010), Inquiry-based science instruction— what is it and does it matter? Results from a research synthesis years 1984 to 2002. Journal of Research in Science Teaching, 47: 474–496. doi: 10.1002/tea.20347

National Research Council. (1996). National Science Education Standards. Washington, D.C.: National Press.

National Science Foundation (1997). The Challenge and Promise of K-8 Science Education Reform. Foundations: A monograph for professionals in science, mathematics, and technology education, 1.

Patrinos, H. (2005, October 5). Education Contracting: Scope of Future Research. World Bank. Retrieved December 23, 2010, from citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.168.5608&rep=rep1&type=pdf

Pine, J., Aschbacher, P., Roth, E., Jones, M., McPhee, C., Martin, C., et al. (2006). Fifth graders' science inquiry abilities: A comparative study of students in hands-on and textbook curricula. Journal of Research in Science Teaching, 43(5), 467-484.

Rodriguez, A., & Hovde, K. (2002). CiteSeerX — The Challenge of School Autonomy: Supporting Principals. CiteSeerX. Retrieved December 21, 2010, from http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.124.1536

Ruiz-Primo, M. A., & Furtak, E. M. (2004). Informal Formative Assessment of Students' Understanding of Scientific Inquiry: National Center for Research on Evaluation, Standards, and Student Testing, Center for the Study of Evaluation, Graduate School of Education & Information Studies, University of California, Los Angeles.

Ruiz-Primo, M. A., & Shavelson, R. (1996). Rhetoric and Reality in Science Performance Assessments: An Update. Journal of Research in Science Teaching. 33(10), 1045-1063.

Ruiz-Primo, M. A., Shavelson, R. J., Hamilton, L., & Klein, S. (2002). On the evaluation of systemic science education reform: Searching for instructional sensitivity. Journal of Research in Science Teaching, 39(5), 369-393.

Ruiz-Primo, M. A., Wiley, E., Rosenquist, A., Schultz, S., Shavelson, R. J., Hamilton, L. (1998). Performance Assessment in the Service of Evaluating Science Education Reform.

Schneider, R. M., Krajcik, J., Marx, R. W., & Soloway, E. (2002). Performance of Students in Project-Based Science Classrooms on a National Measure of Science Achievement. Journal of Research in Science Teaching, 39(5), 410-422.

Secretaría de Educación del Distrito (SED). 2009. Boletín estadístico sector educativo Bogota 2009. SEDBOGOTA - Inicio. Retrieved December 20, 2010, from http://www.sedbogota.edu.co//index.php?option=com_content&task=view&id=33&Itemid=174

Shadish, W. R., Cook, T. D., & Campbell, D. T. (2001). Experimental and Quasi-experimental Designs for Generalized Causal Inference: Houghton Mifflin.

Shavelson, R.J., Baxter, G.P., & Pine, J. (1991). Performance assessments in science. Applied Measurement in Education, 4, 347 – 362.

Shavelson, R. Ruiz-Primo, M.A., Li, M. and Ayala, C. C. (2003) Evaluating New Approaches to Assessing Learning. Center For Research On Evaluation, Standards, And Student Testing CSE Report 604, UCLA.

Shemwell, J. Fu, A., Figueroa, M., Davis, R & Shavelson, R. 2008. Assessment in Schools – Secondary Science in McCulloch, G., & Crook, D. The Routledge international encyclopedia of education (pp. 300-310). London: Routledge.

Schwab, J. J. (1962). The teaching of science as enquiry. In J. J. Schwab & P. F. Brandwein, The teaching of science, Cambridge, MA: Harvard University Press.

Solano-Flores, G., & Shavelson, R.J. (1997). Development of performance assessments in science: Conceptual, practical, and logistical issues. Educational Measurement: Issues and Practice, 16, 16–25.

Tamir, P., Stavy, R., & Ratner, N. (1998). Teaching science by inquiry: assessment and learning. Journal of Biological Education, 33(1), 27-32.

Uribe, C., Murnane, R., & Willet, J. (2003). Why do students learn more in some classrooms than in others? Evidence from Bogota. Graduate School of Education. Retrieved December 10, 2010, from gseacademic.harvard.edu/.../Uribe_Murnane_Willett_2003.pdf

Villa, L., & Duarte, J. (2002). Los colegios en concesión de Bogotá, Colombia: Una experiencia innovadora de gestión escolar reformas o mejoramiento continuo. Comisión Vallecaucana por la Educación. Retrieved December 10, 2010, from Http://www.cve.org.co/pdf/nuevos2004/colegiosconcesion.pdf

Von Secker, C. E., & Lissitz, R. W. (1999). Estimating the Impact of Instructional Practices on Student Achievement in Science. Journal of Research in Science Teaching, 36(10), 1110-1126.

Walker, D. F., & Schaffarzick, J. (1974). Comparing Curricula. Review of Educational Research, 44(1), 83-111.


APPENDICES[12]

12 The formatting of some appendices was changed to fit in this dissertation.

Appendix A. Interview Instrument for Teachers Participating in the Study.

Teacher Information

This survey seeks information about the characteristics of the teachers who participated in the research on science teaching. The survey is confidential. Thank you very much for your cooperation.

Name: ______

Institution: ______

Cell phone: ______

E-mail: ______

1. How old are you? Circle only one option.
Under 25 ------ 1
25–29 ------ 2
30–39 ------ 3
40–49 ------ 4
50–59 ------ 5
60 or older ------ 6

2. Sex. Circle only one option.
Female ------ 1
Male ------ 2

3. Duties you perform at the institution: ______

4. How long have you been teaching at your current institution? ______ Number of years

5. Courses you teach (e.g., 5A) and number of students per course (e.g., 42): ______

6. School shift (mark with an X): Morning ______ Afternoon ______ Single shift ______

7. Grades you teach (mark with an X): 1. ______ 2. ______ 3. ______ 4. ______ 5. ______ 6. ______ 7. ______

8. After obtaining your diploma, how many years have you been teaching in total? ______ Number of years teaching full time

9. What is the highest level of education you have attained? Circle only one option.
Did not finish high school (bachillerato) ------ 1
Completed high school ------ 2
You are a normalista superior (graduate of a teacher-training normal school) ------ 3
Earned a licenciado (teaching degree) or pedagogy degree ------ 4
Earned an undergraduate degree ------ 5 Which: ______
Earned a specialization degree ------ 6 Which: ______
Earned a degree from a master's program ------ 7 Which: ______

10. A. During your university education, what was the level of your licenciatura? Circle only one option.
a) Education – Preschool ------ 1
b) Education – Primary ------ 2
c) Education – Secondary ------ 3
d) Education – Other ------ 4

B. What was the emphasis of your licenciatura? If applicable, circle more than one.
a) Mathematics ------ 1
b) Biology ------ 2
c) Chemistry ------ 3
d) Physics ------ 4
e) Humanities ------ 5
f) Languages ------ 6
g) Arts ------ 7
h) Other area ------ 8

Science Teaching

11. How well prepared do you feel to teach the following areas of science? Circle only one option per row.
Response scale: 1 = Very well prepared, 2 = Somewhat prepared, 3 = Not very well prepared, 4 = Not applicable

A. Life sciences
a) Main body structures and their functions in humans and other organisms (plants and animals)
b) Reproduction and development of plants and animals (transmission of general characteristics, life cycles of familiar organisms)
c) Connections among the different systems of the human body
d) Human health (e.g., transmission and prevention of communicable diseases, signs of health and illness, diet, exercise)

B. Physical sciences
a) Classification of objects or materials on the basis of physical properties (e.g., mass, shape, volume, color, hardness, texture, heat and electrical conductivity, magnetic attraction)
b) Common sources and forms of energy and their practical uses (e.g., wind, sun, electricity, water, food)
c) Light (e.g., sources and behavior)
d) Electrical circuits
e) Magnetic properties

12. How many hours of science do fifth-grade students receive per week? ______ (Hours)

13. In science class, how often do students do the following? Circle only one option per row.
Response scale: 1 = Every or almost every class, 2 = About half of the classes, 3 = Some classes, 4 = Never

a) Observe natural phenomena such as the weather or a plant growing and describe what they see
b) Watch me do a science experiment
c) Do experiments or projects
d) Design or plan experiments or projects
e) Work in small groups on experiments or projects
f) Read their textbooks or other resources
g) Memorize facts ("factores") and principles
h) Give explanations about something that is being studied
i) Relate what they have learned in science to their daily lives
j) Work individually at their own pace

14. At the end of the last school year (2010), what percentage of teaching time did you spend on each of the following areas? Write the percentage. The total must add up to 100%.
a) Life science (includes environmental issues) ------ _____%
b) Physical science (includes physics and chemistry topics) ------ _____%
c) Earth science (includes the earth and the solar system) ------ _____%
d) Other, please specify: ______ ------ _____%
Total ------ 100%

15. A. Which textbooks or guides do you use to teach science to fifth-grade students? ______

B. How do you use the books to teach science to fifth-grade students? You may circle both.
As the basis for my classes ------ 1
As an additional resource ------ 2

16. In the last two years, which workshops on these topics have you participated in? Circle only one option per row (Yes or No; codes 1 and 2).
a) Science content (disciplinary)
b) Science teaching (methodology)
c) Science curriculum
d) Integration of information technologies in science
e) Improving critical thinking or inquiry skills
f) Science assessment
g) Inquiry-based science teaching
h) Others: ______

17. In one paragraph, describe yourself as a science teacher. You may give examples of the activities you carry out and tell about the texts and materials you use.

______

Thank you very much.

Appendix B. Final Version of the Pretest

GRADE 5 HUMAN BODY PRETEST

First name: ______ Last name: ______

School name: ______ Class: ______

Science teacher's name: ______

You are: Boy Girl

Since which grade have you been at this school?

0 1 2 3 4 5

Below you will find some questions that you must answer very carefully.

Instructions: To answer the following multiple-choice questions, follow these instructions: 1. Read each question carefully and choose the correct option. 2. Circle the correct answer. 3. There is no time limit. You have enough time to answer all the questions.

EXAMPLES Answer these examples for practice. Circle the correct answer.

EXAMPLE 1

What is the result of adding 25 and 25? A. 30. B. 40. C. 50. D. 60.

THIS IS HOW YOUR ANSWER SHOULD HAVE BEEN CIRCLED: EXAMPLE 1

What is the result of adding 25 and 25? A. 30. B. 40. (C) 50. D. 60.

EXAMPLE 2

In which month is Colombia's independence celebrated? A. September. B. July. C. October. D. January.

THIS IS HOW YOUR ANSWER SHOULD HAVE BEEN CIRCLED: EXAMPLE 2

In which month is Colombia's independence celebrated? A. September. (B) July. C. October. D. January.

NOW STOP AND WAIT UNTIL YOU ARE GIVEN THE INSTRUCTIONS TO BEGIN THE TEST.

CHOMP01 1. What are all living things made of?

A. Organs B. Systems C. Cells D. Tissues

CHOMP02 2. What is the function of the heart?

A. To pump blood. B. To fight diseases. C. To exchange gases. D. To regulate temperature.

CHOMP03 3. Where is the best place to feel your pulse?

A. B. C. D.

CHOMP04 4. Which systems work together during the process of respiration?

A. Respiratory and reproductive. B. Circulatory and respiratory. C. Respiratory and digestive. D. Nervous and respiratory.

CHOMP05 5. What is the function of the digestive system?

A. To transform the food that enters the body. B. To carry oxygen to the whole body. C. To transport nutrients to the whole body. D. To regulate the body's temperature.

CHOMP06 6. Which system includes the heart and the blood vessels?

A. Digestive system. B. Nervous system. C. Circulatory system. D. Respiratory system.

CHOMP07 7. What happens to your pulse and your breathing rate when you run very fast?

A. Your pulse and your breathing rate increase. B. Your pulse increases and your breathing rate decreases. C. Your pulse and your breathing rate decrease. D. Your pulse decreases and your breathing rate increases.

CHOMP08 8. Which of the following parts of the body takes part in mechanical digestion?

A. The kidney. B. The teeth. C. The heart. D. The lung.

CHOMP09 9. People breathe heavily when they exercise. This is because, during physical effort,

A. the body's need for nutrients decreases. B. the body's need for oxygen increases. C. the body's need for blood flow decreases. D. the body's need for carbon dioxide increases.

CHOMP10 10. The heartbeat is a movement that is

A. learned. B. involuntary. C. controlled. D. voluntary.

CHOMD011 11. A researcher wants to know whether the number of red blood cells in the blood of pregnant women is the same as or different from that of adult women who are not pregnant. To find out, from which of the following groups of women would you suggest that the researcher draw and examine blood?

A. B. C. D.

CHOMP12 12. What process occurs in the mouth so that the chemical digestion of food can begin?

A. Chewing B. Salivation C. Ingestion D. Peristalsis

CHOMP13 13. Which of the following options is an example of a voluntary movement?

A. Sweating B. Clapping C. Sneezing D. Digesting

CHOMP14 14. Tomás wants to measure his pulse for one minute. To do so, he must place his fingers somewhere he can feel his pulse and count the beats for

A. 15 seconds and multiply them by 2. B. 20 seconds and multiply them by 4. C. 15 seconds and multiply them by 4. D. 20 seconds and multiply them by 2.

CHOMP15 15. At which place in the following figure does the connection between the circulatory system and the digestive system occur?

A. Esophagus B. Stomach C. Large intestine D. Small intestine

CHOMP16 16. What is a stethoscope used for?

A. To see the inside of the ear. B. To look at the pupil of the eye. C. To measure the pulse in the arm. D. To hear the breathing of the lungs.

CHOMP17 17. Which steps would you follow to observe onion cells under the microscope?

A. B. C. D.

CHOMP18 18. Daniel put a semipermeable bag containing dye into a bottle of clean water, as shown in the following figure:

Which of the following bottles shows what happened the next day?

A. B. C. D.

CHOMP19 19. Red blood cells allow the blood to

A. transport nutrients to the whole body. B. transport wastes to the whole body. C. transport oxygen to the whole body. D. transport water to the whole body.

CHOMD02 20. In the vegetable garden, the experiment shown below was carried out:

This experiment is meant to investigate the effect of:

A. the type of soil on plant growth. B. the shape of the leaves on plant growth. C. the size of the stem on plant growth. D. the sun on plant growth.

CHOMP21 21. Ana wants to know whether flour contains starch. What should Ana add to the flour to find out whether it contains starch?

A. B. C. D.

CHOMP22 22. The figure below represents a model of the rib cage and its different parts. What do the first and second parts of the plastic tube represent?

A. The first part is the bronchus and the second part is the trachea. B. The first part is the trachea and the second part is the diaphragm. C. The first part is the trachea and the second part is the bronchus. D. The first part is the diaphragm and the second part is the bronchus.

CHOMP23 23. When you digest food, where does your body use it?

A. Only in the blood of your body. B. Only in the stomach of your body. C. Only in the lungs of your body. D. In the cells of your body.

CHOMP24 24. How do the heart and the lungs work together?

A. The movement of the lungs helps the heart pump blood. B. The lungs give oxygen to the blood that the heart pumps through the body. C. The heart and the lungs work together to help digest food. D. The heart pumps the blood, and the lungs circulate the blood through the body.

CHOMP25 25. Which of the following diagrams describes how the kidneys work?

A. B. C. D.

CHOMP26 26. In the intestine, the villi (little hairs) allow the diffusion of

A. fewer nutrients because the surface area decreases. B. more nutrients because the surface area increases. C. fewer nutrients because the surface area increases. D. more nutrients because the surface area decreases.

CHOMP27 27. In which of the following processes of the human body does diffusion occur?

A. Chewing of food in the mouth. B. Transformation of food in the stomach. C. Absorption of nutrients in the small intestine. D. Pumping of blood from the heart.

CHOMD03 28. Some students measured the ambient temperature and the temperature of some parrots for eight months; with the results they drew the following graph:

Taking into account the information in the graph, which of the following conclusions is the most accurate?

A. The ambient temperature influences the temperature of the parrots. B. The parrots keep their body temperature constant. C. The parrots change their temperature over the course of the year. D. The ambient temperature at the zoo is constant.

Appendix C. Final Version of the Posttest.

GRADE 5 BOOKLET I SCIENCE TEST

First names: ______ Last names: ______

School name: ______ Class: ______

Science teacher's name: ______

You are: Boy Girl

Since which grade have you been at this school?

0 1 2 3 4 5

Below you will find some questions that you must answer very carefully. This test has two parts: one with multiple-choice questions and another in which you must write your answer.

EXAMPLES Answer these examples for practice. Circle the correct answer.

EXAMPLE 1

What is the result of adding 25 and 25? A. 30. B. 40. C. 50. D. 60.

THIS IS HOW YOUR ANSWER SHOULD HAVE BEEN CIRCLED: EXAMPLE 1

What is the result of adding 25 and 25? A. 30. B. 40. (C) 50. D. 60.

EXAMPLE 2

Which grade are you in? A. Fourth grade of primary school. B. Fifth grade of primary school. C. Sixth grade of secondary school. D. Seventh grade of secondary school.

THIS IS HOW YOUR ANSWER SHOULD HAVE BEEN CIRCLED: EXAMPLE 2

Which grade are you in? A. Fourth grade of primary school. (B) Fifth grade of primary school. C. Sixth grade of secondary school. D. Seventh grade of secondary school.

NOW STOP AND WAIT UNTIL YOU ARE GIVEN THE INSTRUCTIONS TO BEGIN THE TEST. THERE IS NO TIME LIMIT. YOU HAVE ENOUGH TIME TO ANSWER ALL THE QUESTIONS.

PART ONE

CHOMP02 1. What is the function of the heart?

A. To exchange gases. B. To fight diseases. C. To pump blood. D. To regulate temperature.

CHOMP01 2. What are all living things made of?

A. Organs B. Systems C. Cells D. Tissues

CHOMP03 3. Where is the best place to feel your pulse?

A. B. C. D.

CHOMP04 4. Which systems work together during the process of respiration?

A. Respiratory and reproductive. B. Circulatory and respiratory. C. Respiratory and digestive. D. Nervous and respiratory.

CHOMP05 5. What is the function of the digestive system?

A. To transform the food that enters the body. B. To carry oxygen to the whole body. C. To carry nutrients to the whole body. D. To regulate the body's temperature.

CHOMP06 6. Which system includes the heart and the blood vessels?

A. Digestive system. B. Nervous system. C. Circulatory system. D. Respiratory system.

CHOMD06 7. The teacher Carlos mixed food and liquid with chemicals. This mixture was filtered so that the nutrients and the water were removed. Which of the following systems would you relate to the experiment?

A. Nervous. B. Digestive. C. Circulatory. D. Respiratory.

CHOMP07 8. What happens to your pulse and your breathing rate when you run very fast?

A. Your pulse and your breathing rate increase. B. Your pulse increases and your breathing rate decreases. C. Your pulse and your breathing rate decrease. D. Your pulse decreases and your breathing rate increases.

CHOMP08 9. Which of the following parts of the body takes part in mechanical digestion?

A. The kidney. B. The teeth. C. The heart. D. The lung.

CHOMP09 10. People breathe heavily when they exercise. This is because, during physical effort,

A. the body's need for nutrients decreases. B. the body's need for carbon dioxide increases. C. the body's need for blood flow decreases. D. the body's need for oxygen increases.

CHOMP10 11. The heartbeat is a movement that is

A. learned. B. involuntary. C. controlled. D. voluntary.

CHOMD07 12. Doctor Pérez keeps a record of people's breathing rates when they are resting. He made the following table:

Breathing rate
Person: Breaths per minute
Baby Pedro: 38
7-year-old girl: 25
7-year-old boy: 25
10-year-old boy: 20
Mother: 16

The table suggests that

E. Boys breathe faster than girls. F. Older people breathe faster than younger people. G. Girls breathe faster than boys. H. Younger people breathe faster than older people.

CHOMD08 13. Cecilia carried out the following experiment: she put four beans on a plate with a wet napkin and another four beans on a plate filled with water, then placed the two plates on a windowsill and watched what happened. A few days later, Cecilia observed that the beans on the plate with the wet napkin had germinated, while nothing had happened on the plate with water.

What Cecilia has to do to check the results of her experiment is

A. repeat exactly the same experiment. B. use the plate with a damp napkin. C. use two plates, each covered with water. D. repeat the experiment using another type of seed.

CHOMD01 14. A researcher wants to know whether the number of red blood cells in the blood of pregnant women is the same as or different from that of adult women who are not pregnant. To find out, from which of the following groups of women would you suggest that the researcher draw and examine blood?

A. B. C. D.

CHOMP11 15. What process occurs in the mouth so that the chemical digestion of food can begin?

A. Chewing B. Salivation C. Peristalsis D. Ingestion

CHOMP12 16. Which of the following options is an example of a voluntary movement?

A. Sweating B. Clapping C. Sneezing D. Digesting

CHOMP13 17. Tomás wants to measure his pulse for one minute. To do so, he must place his fingers somewhere he can feel his pulse and count the beats for

A. 15 seconds and multiply them by 2. B. 20 seconds and multiply them by 4. C. 15 seconds and multiply them by 4. D. 20 seconds and multiply them by 2.

CHOMP15 18. What is a stethoscope used for?

A. To see the inside of the ear. B. To look at the pupil of the eye. C. To measure the pulse in the arm. D. To hear the breathing of the lungs.

CHOMP14 19. At which place in the following figure does the connection between the circulatory system and the digestive system occur?

A. Esophagus B. Stomach C. Large intestine D. Small intestine

CHOMP16 20. Which steps would you follow to observe onion cells under the microscope?

A. B. C. D.

CHOMP17 21. Ana put a semipermeable bag containing dye into a bottle of clean water, as shown in the following figure:

Which of the following bottles shows what happened the next day?

A. B. C. D.

CHOMP18 22. Red blood cells allow the blood to transport

A. nutrients to the whole body. B. wastes to the whole body. C. oxygen to the whole body. D. water to the whole body.

CHOMD02 23. In the vegetable garden, the experiment shown below was carried out:

This experiment is meant to investigate the effect of:

A. the type of soil on plant growth. B. the shape of the leaves on plant growth. C. the size of the stem on plant growth. D. the sun on plant growth.

CHOMP19 24. Luis wants to know whether flour contains starch. What should Luis add to the flour to find out whether it contains starch?

A. B. C. D.

CHOMP20 25. The figure below represents a model of the rib cage and its different parts. What do the first and second parts of the plastic tube represent?

A. The first part is the bronchus and the second part is the trachea. B. The first part is the trachea and the second part is the diaphragm. C. The first part is the trachea and the second part is the bronchus. D. The first part is the diaphragm and the second part is the bronchus.

CHOMP21 26. When you digest food, where does your body use it?

A. Only in the blood of your body. B. Only in the stomach of your body. C. Only in the lungs of your body. D. In the cells of your body.

CHOMP22 27. How do the heart and the lungs work together?

A. The movement of the lungs helps the heart pump blood. B. The lungs give oxygen to the blood that the heart pumps through the body. C. The heart and the lungs work together to help digest food. D. The heart pumps the blood, and the lungs circulate the blood through the body.

CHOMP23 28. Which of the following diagrams describes how the kidneys work?

A. B. C. D.

CHOMP24 29. In the intestine, the villi (little hairs) allow the diffusion of

A. fewer nutrients because the surface area decreases. B. more nutrients because the surface area increases. C. fewer nutrients because the surface area increases. D. more nutrients because the surface area decreases.

CHOMP25 30. The following drawing compares a piece of red cloth with a piece of a tree leaf.

Looking at the leaf and the cloth, you realize that one is alive and the other is not. Which of the following characteristics allows you to state that the leaf is alive and the cloth is not?

A. The material of the cloth is ordered and that of the leaf is disordered. B. The leaf is made up of cells and the cloth of fibers. C. The color of the cloth is red and that of the leaf is green. D. The surface of the leaf is smooth and that of the cloth is rough.

CHOMP25 31. In which of the following processes of the human body does diffusion occur?

A. Transformation of food in the stomach. B. Chewing of food in the mouth. C. Absorption of nutrients in the small intestine. D. Pumping of blood from the heart.

CHOMD03 32. Some students measured the ambient temperature and the temperature of some parrots for eight months; with the results they drew the following graph:

Taking into account the information in the graph, which of the following conclusions is the most accurate?

A. The ambient temperature influences the temperature of the parrots. B. The parrots keep their body temperature constant. C. The parrots change their temperature over the course of the year. D. The ambient temperature at the zoo is constant.

ANSWER QUESTIONS 33 AND 34 BASED ON THE FOLLOWING INFORMATION

Javier wants to investigate how earwigs live. To do this, he put moist soil without light on one side of a box and dry soil with light on the other side of the box; then he put in eight earwigs. The following drawing shows the experiment.

CHOMD04 33. What question can be answered from this experiment?

A. How long does an earwig live? B. How do earwigs reproduce? C. Where do earwigs live? D. What do earwigs eat?

CHOMD05 34. Javier concluded that earwigs prefer moist soil and darkness. Which data allowed Javier to reach this conclusion?

A. The 8 earwigs stayed in the wooden box. B. The earwigs spread out over the two sides of the box. C. Of the 8 earwigs, 7 went to the moist soil without light. D. The 8 earwigs can live under the conditions of the experiment.

CONGRATULATIONS! YOU HAVE FINISHED THE FIRST PART OF THE TEST. NOW YOU WILL ANSWER THREE QUESTIONS IN WHICH YOU MUST WRITE YOUR ANSWER.

CONTINUE WITH THE SECOND PART OF THE TEST.

PART TWO

CHOMAP01 35. What is the function of the blood in your body? ______

CHOMAP02 36. How are the digestive system and the circulatory system related? ______

CHOMAD01 37. What do the cells of your human body need to survive, and where do they get what they need?

Appendix D. Implementation Manual for the Paper and Pencil Test.

ADMINISTRATION MANUAL FOR THE HUMAN BODY TEST

Purpose of the test: To assess what boys and girls from different institutions have learned about the human body.

Information about the test: The booklet consists of multiple-choice questions and open-ended questions.

Administering the test: Everything concerning the administration of the test will be explained during the practice session.

Information to use before going to the institution to administer the test. Make sure you have the following information complete:

Name of the institution: ______ Address of the institution: ______ Contact at the institution: ______ Cell phone: ______

Information to use before administering the test.

Materials
1. Check the package of tests assigned to you against the following criteria:
• It must contain a sheet to be used as the sign that you tape to the door, which reads "Please Do Not Disturb, Test Administration in Progress".
• It must contain a sheet titled "Administration Form, Human Body Systems Test". On this sheet you must record any irregularities and lessons learned during the administration.
• It must contain a pen with tape and a whiteboard marker.

Students
1. When you arrive at the classroom, greet the person in charge of the students and the students themselves.
2. Check that the students are seated appropriately for the test. That is, students must be far enough apart that they cannot copy from one another.
3. Make sure the students have a pencil and an eraser. If they do not have a pencil, they may use a pen.
4. Get the students engaged with the test and tell them the following:
• Many children in the country will take this test; we appreciate your participation and expect your best effort.
• Ask them to answer all the questions and to put an asterisk next to the number of any question whose answer they do not know.
• Tell them that you will not be able to help them answer the questions during the test.
• Stress to the students that they must complete the test individually and in silence.

Information to use during the administration of the test.
1. Hand out the booklets face down and tell the students not to start filling them in until they are told to.
2. Fill in the first two pages with them, and before starting the test ask whether they have any questions about its administration.
3. Walk quietly around the room and check that the students are answering the questions. Make sure the children are answering the questions in the correct way.
4. Fill in the administration form (time at which the first and the last student handed in the test). Record irregularities and lessons learned if necessary.
5. If students ask you questions, always answer with one of the following responses: Read the question again; if you cannot answer it, move on to the next one. Answer it as best you can. Which do you think is the correct answer? Then choose that answer.
6. Do not give the students any kind of feedback or any help while they answer the questions. Do not clarify the questions. Do not answer questions about content. Do not interact with the students unless they are disrupting the smooth running of the test.
7. Before collecting any booklet, tell the student to check that all the questions have been answered, and write the finishing time in the upper right-hand corner.
8. Ask the teacher what to do with students who finish the test.

Information to use after the administration of the test.
1. Put away all the booklets in the order in which you received them, together with the test administration form and the door sign.
2. Thank the students and the teacher for their cooperation, just as we thank you for yours.

THANK YOU VERY MUCH FOR ADMINISTERING THE TEST

Appendix E. Items included in each scale.

Pre-Test Total Declarative Procedural Schematic Proximal Distal 1 X X 2 X X 3 X X 4 5 X X 6 X X 7 X X 8 X X 9 X X 10 X X 11 X X 12 X X 13 X X 14 X X 15 X X 16 X X 17 X X 18 X X 19 20 X X 21 X X 22 X X 23 X X 24 X X 25 X X 26 X X 27 X X 28 X X


Post Total Declarative Procedural Schematic Proximal Distal 1 X X 2 X X 3 X X 5 X X 6 X X 7 X X 8 X X 9 X X 10 X X 11 X X 12 X X 13 14 X X 15 X X 16 X X 17 X X 18 X X 19 X X 20 X X 21 X X 22 23 X X 24 X X 25 X X 26 X X 27 X X 28 X X 29 X X 30 X X 31 X X 32 X X 33 X X 34 X X 35 X X 36 X X 37 X X


Post-Test Total Control IBSE Pulse_Items 1 X X 2 X 3 X 4 5 X 6 X X 7 X 8 X 9 X 10 X X 11 X 12 X 13 14 15 16 17 X X 18 X 19 20 21 X 22 23 24 X 25 X 26 27 X 28 X 29 30 31 32 33 34 35 X 36 X 37
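The item-to-scale assignments in this appendix can be read as a lookup table for scoring. The following is only a minimal sketch of how a subscale score might be computed from such a table. It assumes dichotomous (1/0) item scores and proportion-correct scoring, neither of which is specified in this appendix. The function name `scale_scores` and the item lists in `EXAMPLE_SCALES` are hypothetical and do not reproduce the actual assignments marked in the tables above.

```python
from typing import Dict, Iterable


def scale_scores(
    responses: Dict[int, int],
    scales: Dict[str, Iterable[int]],
) -> Dict[str, float]:
    """Return the proportion of items answered correctly on each scale.

    responses maps an item number to 1 (correct) or 0 (incorrect).
    scales maps a scale name to the item numbers that belong to it.
    Items missing from responses are counted as incorrect (an assumption).
    """
    scores: Dict[str, float] = {}
    for name, items in scales.items():
        item_list = list(items)
        if not item_list:
            # A scale with no items gets a score of 0.0 rather than an error.
            scores[name] = 0.0
            continue
        correct = sum(responses.get(item, 0) for item in item_list)
        scores[name] = correct / len(item_list)
    return scores


# Hypothetical item assignments, for illustration only.
EXAMPLE_SCALES = {
    "total": [1, 2, 3, 5],
    "declarative": [1, 2],
    "procedural": [3, 5],
}

# One hypothetical student's scored responses (item number -> 1 or 0).
example_responses = {1: 1, 2: 0, 3: 1, 5: 1}

print(scale_scores(example_responses, EXAMPLE_SCALES))
```

Keeping the assignments in a dictionary separate from the scoring function means the same function can be reused for the pretest scales, the posttest scales, and the pulse-related item set by swapping the table.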


Appendix F. Description of the paper towels performance assessment including student notebook.¹³

¹³ CIFE-PQC-DA-2009-XXX: This assessment was developed by Richard Shavelson, with a grant from the U.S. National Science Foundation, and translated into Spanish by the PeqCien team, with due authorization. For more information, see the website of the Stanford Education Assessment Laboratory: http://www.stanford.edu/dept/SUSE/SEAL/

PAPER TOWELS

CIFE-PQC-DA-2009-XXX: PERFORMANCE ASSESSMENT

1. TEST ADMINISTRATION INSTRUCTIONS

DESCRIPTION: Students must find out which paper towel can absorb the most water and which can absorb the least.

TIME: Allow 50 minutes for completion.

MATERIALS PER STUDENT

• 1 booklet (1 observer form, 1 instructions form, 1 science notebook)

• 1 dropper

• 3 petri dishes with lids

• 1 pair of scissors

• 1 pair of tweezers

• 1 magnifying glass

• 1 measuring container, 250 ml

• 1 measuring container, 100 ml

• 1 food scale

• 1 tray (plastic or aluminum, approximately 2 cm deep, 30 cm long, 18 cm wide)

• 1 ruler, 30 cm

• 1 funnel

• 3 clear plastic cups (approximately 1 cup each)

• 1 container of water

• 3 types of paper towel (kitchen paper), each with distinguishing characteristics (e.g., one white, one with blue designs, etc.). There are 3 pieces of each type of towel (9 in total)

PREPARATION: Arrange the materials on each table so that students can see all of them: form a semicircle in front of the student, as shown in the photo (above). All the material must be within the student's reach, and no piece should stand out. In other words, avoid suggesting to the student that the investigation be carried out in a particular way. The name of each material must be shown on a label.

NOTE: Make sure you have all the documents needed to administer and score the assessment, namely

• Observer Form

• Student Booklet, Instructions

• Science Notebook

PAPER TOWELS

First name: ______ Last name: ______ Grade: ______ School: ______

This is a test and you have two challenges:

1. Find out which paper towel can absorb the most water.

2. Find out which paper towel can absorb the least water.

To do this test, follow these directions:

A. Look at each piece of material.

B. Think about how you could use some of these materials to do an experiment and solve the challenges.

C. Remember that you do not need to use all the material.

D. Use the other side of this sheet to make your notes.

E. You have a maximum of 50 minutes to complete the test.

Do you have any questions?

PLEASE TURN THE SHEET OVER TO BEGIN THE TEST


NOTES SHEET

RESULTS: Mark the correct option based on your observations. Make sure to mark both the towel that absorbs the most and the towel that absorbs the least.

1. The paper towel that absorbs the most water is:

[ ] White

[ ] Blue

[ ] Pink-green

2. The paper towel that absorbs the least water is:

[ ] White

[ ] Blue

[ ] Pink-green


SCIENTIFIC NOTEBOOK

First name: ______ Last name: ______ Grade: ______ School: ______

A. Based on your experiment, how did you know which paper towel absorbs the most water and which paper towel absorbs the least water?

How did you know that the ______ towel absorbed the most water? Why? ______

How did you know that the ______ towel absorbed the least water? Why? ______

PLEASE TURN THE SHEET OVER

B. Here are some questions about your experiment. Answer each question with "yes" or "no". The questions relate to the experiment that helped you find out which towels absorbed more or less water.

1. Were all the paper towels the same size?

2. Were all the paper towels completely wet?

3. Did you use the same amount of water to wet each paper towel?

C. Carolina thinks that all the paper towels must be completely wet before deciding which one absorbs the most water and which one absorbs the least water. Luis does not think the paper towels need to be completely wet. What do you think, and why? ______

END OF THE TEST. THANK YOU!


Appendix G. Description of the pulse performance assessment.

SCIENCE PERFORMANCE ASSESSMENT

Investigating your pulse

First name: ______ Last name: ______ Grade: ______ School: ______

This is a test in which you will investigate how your heart rate changes when you exercise.

For this, you have the following materials:

• A stopwatch

• A step that you can step up and down on.

Your task is:

Find out how your pulse changes after stepping up and down on a step. To do this, you will take your pulse and count the beats.

To complete your task:

1. Find your pulse and make sure you know how to take it.

2. Count your pulse for 10 seconds.

3. Write the number of beats counted in 10 seconds in the table below, next to the 0 minutes/at rest row.

4. Complete the table by counting your beats for 10 seconds every minute while you step up and down on a step for 5 minutes.

Remember that you must stop after each minute to write the number of beats you counted in the corresponding column of the table.

Number of minutes        Number of beats counted in 10 seconds

0 minutes/At rest        ______

At minute 1              ______

At minute 2              ______

At minute 3              ______

At minute 4              ______

At minute 5              ______

After completing the table, answer the following questions:


1. Looking at the table, what can you say about how your pulse changed during exercise? ______

2. Why do you think your pulse changed in this way? ______

3. What did you learn in this activity? ______


Appendix H. Format used by observers to record student actions in the performance assessments.

3. ADMINISTRATION OF THE ASSESSMENT - OBSERVATION

First name: ______ Last name: ______ Grade: ______ School: ______

NOTES

Step    Description    Observation

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20

Appendix I. Grading matrix used for the pulse performance assessment.

Pulse Scoring

Item 1 (see table) - Measure "at rest" pulse rate and record in table. Pulse beats are plausible: 7 to 25 counts per 10 seconds (40 to 150 counts per minute).

Total possible points: 1

Item 2 (see table) - Measure "after exercise" pulse rates and record in table.

i) Records pulse at least 4 different times during the exercise (in addition to "at rest" measurement).

ii) Pulse rates are plausible: 7 to 25 counts per 10 seconds at the beginning (40 to 150 counts per minute).

iii) Pulse rate increases with exercise (may level off or slow near the end).

Total possible points: 3

NOTE: The data may go up and down, because students exercise at different intensities from one minute to the next.

Item 3 (see question 1) - Describe how pulse changes during exercise.

i) Description consistent with data presented.

ii) Description includes identification of the trend or pattern in the data.

NOTE: The description refers to the data.

Total possible points: 2

Item 4 (see question 2) - Explain why pulse changes. Includes the following three elements relating to physiological needs during exercise:

i) role of muscle action (exercise results in need for more energy and oxygen in the muscles);

ii) role of blood (more oxygen or food supplied by an increase in blood flow);

iii) connection with heart action or pulse rate (heart is pumping faster to supply more blood).

Total possible points: 3

TOTAL possible points: 9

Student: XXXX

At rest (1 pt)

• 1 pt if coherent at-rest data are recorded

Table (3 pts)

• 1 pt for writing at least 4 data points

• 1 pt if the values are coherent

• 1 pt if there is a trend in the data

Question 1 (2 pts)

• 1 pt if what is described about the data is consistent with the data in the table

• 1 pt if a pattern in the data is identified

Question 2 (3 pts)

• 1 pt for muscles

• 1 pt for blood

• 1 pt for the relationship between heart and pulse rate
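The point rules for Items 1 and 2 of the grading matrix are partly arithmetic, so they can be illustrated with a short sketch. The Python below is only an illustration and not the procedure the scorers used: all function and constant names are hypothetical, the plausibility check is applied to every recorded value (the matrix says "at the beginning"), and the "pulse rate increases with exercise" criterion is approximated by asking whether the highest exercise count exceeds the at-rest count, which a human scorer would judge more flexibly. Note that 7 counts in 10 seconds correspond to 42 beats per minute, so the per-minute range quoted in the matrix appears to be rounded.

```python
from typing import Optional, Sequence

# Plausible range from the grading matrix, in beats counted per 10 seconds.
MIN_COUNT_10S = 7
MAX_COUNT_10S = 25


def is_plausible(count_10s: int) -> bool:
    """True if a 10-second pulse count falls inside the matrix's range."""
    return MIN_COUNT_10S <= count_10s <= MAX_COUNT_10S


def to_beats_per_minute(count_10s: int) -> int:
    """Convert a 10-second count to beats per minute (six 10-s windows)."""
    return count_10s * 6


def score_rest(rest_count: Optional[int]) -> int:
    """Item 1: 1 point for a recorded, plausible at-rest count, else 0."""
    return 1 if rest_count is not None and is_plausible(rest_count) else 0


def score_table(rest_count: Optional[int],
                exercise_counts: Sequence[Optional[int]]) -> int:
    """Item 2: up to 3 points for the exercise rows of the table.

    None marks a row the student left blank.
    """
    recorded = [c for c in exercise_counts if c is not None]
    points = 0
    # i) at least 4 measurements besides the at-rest one
    if len(recorded) >= 4:
        points += 1
    # ii) recorded values are plausible (assumption: check all of them)
    if recorded and all(is_plausible(c) for c in recorded):
        points += 1
    # iii) pulse increases with exercise (assumption: peak exceeds at-rest)
    if rest_count is not None and recorded and max(recorded) > rest_count:
        points += 1
    return points
```

For example, a student who records 12 beats at rest and 15, 18, 20, 21, 20 during exercise would receive `score_rest(12) == 1` and `score_table(12, [15, 18, 20, 21, 20]) == 3` under these assumptions. Items 3 and 4 depend on reading the student's written explanation and are not sketched here.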
