
SCIENTIFIC REASONING SKILLS DEVELOPMENT IN THE INTRODUCTORY BIOLOGY COURSES FOR UNDERGRADUATES

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Melissa S. Schen, M.S.

* * * * *

The Ohio State University 2007

Dissertation Committee:

Professor Anita Roychoudhury, Co-Advisor

Professor Arthur L. White, Co-Advisor

Professor David Haury

Approved by

_____________________________
Co-Advisor

_____________________________
Co-Advisor

Graduate Program in Education

Copyright by

Melissa S. Schen

2007

ABSTRACT

Scientific reasoning is a skill of critical importance to those students who seek to become professional scientists. Yet, there is little research on the development of such reasoning in biology majors. In addition, scientific reasoning is often investigated as two separate entities, hypothetico-deductive reasoning and argumentation, even though these skills may be linked. With regard to argumentation, most investigations examine its use in discussing socioscientific issues, not in analyzing scientific data. As scientists often use the same argumentation skills to develop and support conclusions, this avenue needs to be investigated. This study seeks to address these issues and establish a baseline of both the hypothetico-deductive reasoning and the argumentation of scientific data of biology majors through their engagement in introductory biology coursework.

This descriptive study investigated the development of undergraduates’ scientific reasoning skills by assessing them multiple times throughout a two-quarter introductory biology course sequence for majors. Participants were assessed at the beginning of the first quarter, at the end of the first quarter, and at the end of the second quarter. A split-half version of the revised Lawson Classroom Test of Scientific Reasoning (LCTSR) and a paper-and-pencil argumentation instrument developed for this study were utilized to assess student hypothetico-deductive reasoning and argumentation skills, respectively. To identify factors that may influence scientific reasoning development, demographic information regarding age, gender, science coursework completed, and future plans was collected.

Evidence for course emphasis on scientific reasoning was found in lecture notes, assignments, and laboratory exercises.

This study did not find any trends of improvement in the students’ hypothetico-deductive reasoning or argumentation skills, either during the first quarter or over both quarters. Specific difficulties in the control of variables and direct hypothetico-deductive reasoning were found through analysis of the LCTSR data. Students were also found to have more trouble identifying and rebutting counterarguments than generating initial arguments from scientific data sets. Although no overall improvement was found, a moderate, positive relationship was detected between LCTSR and argumentation scores at each administration, affirming the predicted association. Lastly, no difference was found between biology majors and the other students enrolled in the courses.

Overall, the results found here are similar to those reported in the literature for both hypothetico-deductive reasoning and argumentation, indicating that biology majors may be similar to other populations studied. Also, as no explicit attention was paid to scientific reasoning skills in the two courses, these findings complement those that illustrate a need for direct attention to foster the development of these skills. These results suggest the need to develop direct and explicit methods to improve the scientific reasoning skills of future biological scientists early in their undergraduate years.

DEDICATION

Dedicated to my grandfather

ACKNOWLEDGMENTS

I wish to thank my co-advisor, Arthur White, for his steady support, patient editing, and willingness to share his wealth of statistical knowledge.

I am also grateful to my co-advisor, Anita Roychoudhury, for her constant encouragement and guidance, modeling both good teaching and good research.

I want to thank David Haury for his continuous support and guidance as a committee member, teacher, and section head.

I would like to thank the faculty and staff of the Introductory Biology Program, especially Judy Ridgway, John Cogan, and Amy Kovach for their support of this research and providing access to the students. I am also thankful for the logistical support provided by Rosemarie Thornton and the teaching assistants. Without their patience and help, it would have been impossible to collect data.

I also wish to thank the students who participated in this study, particularly those students who completed the instruments in all three administrations.

Lastly, I thank my friends and family for their love and support through this journey. I am especially grateful for Denny, Scott, Kristi, Tim, and Helen Schen, as well as Jessica Auman, Cynthia Bill, and Rhiannon Light.

VITA

March 15, 1976 ...... Born – Canton, Ohio

1998...... B.A. Biology, Case Western Reserve University

2000...... M.S. Biology, Case Western Reserve University

2000 – 2003 ...... Lecturer, Case Western Reserve University

2006 – 2007 ...... Adjunct Instructor, Columbus State Community College

FIELDS OF STUDY

Major Field: Education

Minor Field: Research Methods

TABLE OF CONTENTS

Page

Abstract ...... ii
Dedication ...... iv
Acknowledgments ...... v
Vita ...... vi
List of tables ...... x
List of figures ...... xiii

Chapter 1: Introduction ...... 1
  Background and setting ...... 1
    The call ...... 1
    Scientific reasoning overview ...... 4
    Scientific reasoning in education ...... 5
    Scientific reasoning in college biology ...... 7
  Statement of problem ...... 9
  Research questions ...... 11
  Definition of terms ...... 12
    Argumentation (dialogic) ...... 12
      Constitutive definition ...... 12
      Operational definition ...... 12
    Deductive reasoning ...... 12
      Constitutive definition ...... 12
      Operational definition ...... 12
    Inductive reasoning ...... 13
      Constitutive definition ...... 13
      Operational definition ...... 13
    Hypothetico-deductive reasoning ...... 13
      Constitutive definition ...... 13
      Operational definition ...... 13
    Scientific reasoning ...... 13
      Constitutive definition ...... 13
      Operational definition ...... 13

Chapter 2: Literature Review ...... 15
  Scientific reasoning ...... 15
    Deductive aspects of science ...... 16
      The philosophy of Popper ...... 16
      Deduction and hypothetico-deductive reasoning ...... 18
    Inductive aspects of science ...... 20
      The philosophy of Kuhn and Lakatos ...... 20
      Induction and argumentation ...... 22
    How do scientists actually reason? ...... 23
  Hypothetico-deductive reasoning ability ...... 27
    Hypothetico-deductive reasoning ability and achievement in college biology ...... 28
    Hypothetico-deductive reasoning ability, student beliefs, and conceptual change ...... 31
    Hypothetico-deductive reasoning ability and testing ...... 34
    Summary ...... 36
  Argumentation ability ...... 38
    Informal reasoning, ability, and expertise ...... 39
    Informal reasoning and college students ...... 43
      Research on general improvements ...... 43
      Interventions ...... 44
      Problems with informal reasoning ...... 46
    Reasoning through argumentation in science ...... 47
      Coordination of theory and evidence ...... 47
      Interventions ...... 50
    Summary ...... 53

Chapter 3: Methods ...... 56
  Research design ...... 56
  Participants ...... 58
  Outcome measures ...... 61
    Dependent variables data collection ...... 61
      Hypothetico-deductive reasoning ...... 61
      Argumentation ...... 66
    Independent variables data collection ...... 68

Chapter 4: Results and Conclusions ...... 70
  Characterization of courses ...... 70
  Hypothetico-deductive reasoning ...... 73
    Initial distributions ...... 73
    Relationship of background characteristics and LCTSR autumn quarter scores ...... 77
    Change in LCTSR scores ...... 80
      Change in overall total scores ...... 81
      Change in LCTSR item scores ...... 85

  Argumentation ...... 89
    Initial distributions ...... 89
    Relationship of background characteristics and argumentation autumn quarter scores ...... 94
    Change in argumentation scores ...... 95
      Change in overall total scores ...... 95
      Change in argumentation subscale scores ...... 99
  Correlation of hypothetico-deductive reasoning and argumentation ...... 103
  Three-time participants ...... 104
    Change in overall SR scores ...... 108
      Change in LCTSR scores ...... 108
      Change in argumentation scores ...... 110
    Change in LCTSR and argumentation subscale scores ...... 111
      LCTSR item comparison ...... 111
      Argumentation subscale comparison ...... 115
    Correlation of HD reasoning and argumentation ...... 117
  Summary of key findings ...... 118

Chapter 5: Discussion and Implications ...... 120
  Limitations of the study ...... 121
  The assumption of natural development of SR skills ...... 123
  Particular findings for hypothetico-deductive reasoning ...... 125
  Particular findings for argumentation ...... 126
  Biology majors as a population ...... 127
  Future work ...... 129

References...... 131

Appendices:

A. Argumentation instrument ...... 139

B. Argumentation instrument scoring rubric...... 142

C. Student demographic information instrument (winter 2007)...... 144

LIST OF TABLES

Table Page

1 Demographics of all participants by major ...... 59

2 Distribution of future plans for all participants by major ...... 60

3 Original LCTSR question distribution to forms A and B ...... 62

4 Principal components analysis rotated factor loadings of LCTSR form A...... 64

5 Principal components analysis rotated factor loadings of LCTSR form B...... 65

6 Principal components analysis rotated factor loadings of argumentation forms A and B ...... 67

7 Reliability of argumentation forms A and B subscales ...... 68

8 Characterization of laboratory exercises by level of inquiry ...... 71

9 Average total LCTSR scores by administration and major ...... 74

10 Average AU1 and AU2 LCTSR scores by lab section...... 75

11 Average WI LCTSR scores by lab section ...... 77

12 Summary data of regression of AU2 LCTSR scores on college major...... 79

13 Stepwise entry regression of AU2 LCTSR scores on college major...... 80

14 Total number of individuals who completed the LCTSR instrument by administration...... 81

15 Descriptive statistics of AU1 and AU2 LCTSR scores by major ...... 82

16 Repeated measures MANOVA comparison of AU1 and AU2 LCTSR scores .. 82

17 Descriptive statistics of AU1 and WI LCTSR scores by major...... 84

18 Repeated measures MANOVA comparison of AU1 and WI LCTSR scores..... 84

19 Repeated measures MANOVA comparison of AU1 LCTSR item-pair scores .. 86

20 P-values from MANOVA post-hoc comparison of AU1 LCTSR item-pair scores...... 87

21 Repeated measures MANOVA comparison of AU2 LCTSR item-pair scores .. 88

22 P-values from MANOVA post-hoc comparison of AU2 LCTSR item-pair scores...... 88

23 Average total argumentation scores by administration and major ...... 90

24 Average AU1 and AU2 argumentation scores by lab section...... 91

25 Average WI argumentation scores by lab section ...... 93

26 Total number of individuals who completed the argumentation instrument by administration...... 95

27 Descriptive statistics of AU1 and AU2 argumentation scores by major ...... 96

28 Repeated measures MANOVA comparison of AU1 and AU2 argumentation scores...... 97

29 Descriptive statistics of AU1 and WI argumentation scores by major...... 98

30 Repeated measures MANOVA comparison of AU1 and WI argumentation scores...... 98

31 Repeated measures MANOVA comparison of AU1 and AU2 argumentation subscale scores by major...... 102

32 Repeated measures MANOVA comparison of AU1 and WI argumentation subscale scores by major...... 103

33 Pearson product moment correlations between LCTSR and argumentation scores by administration...... 104

34 MANOVA demographics comparison of three-time participants with all other participants ...... 106

35 Independent t-test comparison of number of science courses taken by three-time participants and all other participants ...... 106

36 Number of three-time participants who completed the LCTSR and argumentation instruments ...... 107

37 Descriptive statistics of AU1, AU2, and WI LCTSR scores by major for three-time participants ...... 109

38 Repeated measures MANOVA comparison of AU1, AU2, and WI LCTSR scores for three-time participants ...... 109

39 Descriptive statistics of AU1, AU2, and WI argumentation scores by major for three-time participants ...... 110

40 Repeated measures MANOVA comparison of AU1, AU2, and WI argumentation scores for three-time participants ...... 111

41 Repeated measures MANOVA comparison of AU1 LCTSR item-pair scores for three-time participants...... 112

42 P-values from MANOVA post-hoc comparison of AU1 LCTSR item-pair scores for three-time participants...... 113

43 Repeated measures MANOVA comparison of AU2 LCTSR item-pair scores for three-time participants...... 114

44 P-values from MANOVA post-hoc comparison of AU2 LCTSR item-pair scores for three-time participants...... 115

45 Repeated measures MANOVA comparison of AU1 and AU2 argumentation subscale scores by major for three-time participants...... 116

46 Pearson product moment correlations between LCTSR and argumentation scores by administration for three-time participants ...... 117

LIST OF FIGURES

Figure Page

1 Schema of scientific reasoning and epistemology, identifying the roles of hypothetico-deductive and inductive reasoning ...... 5

2 The relationship between hypothetico-deductive reasoning and argumentation to compose scientific reasoning is connected through the relationship between evidence and theory ...... 14

3 Giere et al.’s (2006) model of an ideally complete report of a scientific episode (p. 29) ...... 17

4 Schematic of scientific reasoning, linking deductive and inductive reasoning... 25

5 Distribution of all 460 participants by major ...... 61

6 Mean AU1 LCTSR item-pair scores by major...... 86

7 Mean AU2 LCTSR item-pair scores by major...... 87

8 Mean AU1, AU2, and WI argumentation subscale scores by major...... 100

9 Mean AU1 LCTSR item-pair scores by major for three-time participants...... 112

10 Mean AU2 LCTSR item-pair scores by major for three-time participants...... 114

11 Mean AU1 and AU2 argumentation subscale scores by major for three-time participants ...... 116

CHAPTER 1

INTRODUCTION

“…'To know' science is a statement that one knows not only what a phenomenon is, but also how it relates to other events, why it is important, and how this particular view of the world came to be. Knowing any of these aspects in isolation misses the point. Therefore, in learning science, students, as well as having the opportunity to learn about the concepts of science, must also be given some insight into its epistemology, the practices and methods of science, and its nature as a social practice…” (Driver, Newton, & Osborne, 2000, p. 297)

Background and Setting

The Call

The field of biology is rapidly expanding, increasingly drawing on concepts and theory from other disciplines, such as physics, mathematics, chemistry, engineering, and computer science. As students choose a science major, they need and expect the course and laboratory work that will develop them into a scientist. This work includes the content knowledge and skills necessary to design a solid experiment, analyze the results, and apply the findings to future work both within and across disciplines (Committee on Undergraduate Biology Education to Prepare Research Scientists for the 21st Century [CUBE], 2003). However, freshmen biology majors are entering college under-prepared both in content and in cognitive abilities. The National Science Foundation (NSF) (1996) stated that only 22% of high school graduates had taken biology, chemistry, and physics in 1992. Seymour and Hewitt (1997) also found that approximately 40% of science, mathematics, engineering, and technology majors reported inadequate high school preparation as a problem in their current coursework. These courses are necessary prerequisites for freshmen in their introductory biology courses. In addition, Lawson (1992a) states that as many as 50% of students in freshmen-level biology do not use formal reasoning patterns, which include the ability to develop hypotheses, control variables, and design an experimental protocol – skills crucial in the scientific process.

Other studies in everyday, informal reasoning have found that undergraduates have difficulty evaluating evidence impartially (bias/fairness) (Baron, 1991; Perkins, Farady, & Bushey, 1991; Toplak & Stanovich, 2003), developing arguments with adequate evidence (Baron; Cerbin, 1988; Perkins et al.), and differentiating between and linking evidence with claims (D. Kuhn, 1992; 1993a; 1993b; Shaw, 1996). Taken together, these characteristics suggest that students are not prepared to undertake the demands of an introductory biology course, which has traditionally assumed the necessary content and skills background to be in place.

An additional difficulty for biology majors in the first year is simply staying with their declared major. Seymour and Hewitt (1997) completed a large ethnographic study to determine why approximately 50% of students leave their intended or declared science, mathematics, engineering, or technology (SMET) majors. They found that the students who “switch” comprised both poor-ability and high-ability students, and the percentage was even higher among women and minorities. When analyzing the data, Seymour and Hewitt found that the most common factors related to switching included faculty pedagogy and curriculum design/student assessment. Faculty pedagogy was an issue for 83% of students, including those who switched and those who stayed in the SMET majors. The problems associated with pedagogy were characterized as poor faculty involvement, poor attitudes toward students and toward teaching, and teaching that centered on direct lecture or even reading from the textbook. In addition, students were overwhelmed by the large amounts of work and rote memorization expected of them. Overall, students found that the faculty created an atmosphere that abandoned them with copious amounts of content to memorize and no aid or support. The students perceived this alienating atmosphere as one designed to “weed out” students with lesser academic abilities. Higher-ability students who should not have been “weeded out” instead perceived the work as neither challenging nor intellectually stimulating. These attitudes are not unique to American students, as they were corroborated by Marbach-Ad (2004) in a survey study of first-year students at Tel-Aviv University.

Sigma Xi (1990) surveyed students and faculty in entry-level SMET courses and found that many of the difficulties described by the students in Seymour and Hewitt’s 1997 study were centered in these courses. In addition to those already identified, faculty believed that students did not leave these courses with an understanding of the nature of science. The National Science Foundation’s (NSF) Advisory Committee to the NSF Directorate for Education and Human Resources (1996) also found that students perceived the introductory SMET courses as the major barrier to a continued major in SMET. With this information in hand, the focus must turn to finding ways to improve these courses and lessen the attrition of science majors.

In order to address these issues, Seymour and Hewitt (1997), Marbach-Ad (2004), Sigma Xi (1990), the Committee on Undergraduate Biology Education to Prepare Research Scientists for the 21st Century (2003), and the NSF (1996) all call for more information and research on the introductory biology courses, especially with regard to biology majors. Aside from the causes of attrition, little research has been completed on biology majors’ experiences and requirements as a unique group. As the first experiences are so crucial and students are entering college underprepared, emphasis should be placed on exactly how biology majors are affected and prepared by their early coursework.

Scientific Reasoning Overview

Scientific reasoning (SR), i.e. evidence-based reasoning, is the heart of scientific knowledge generation. It is the method through which evidence is collected and analyzed and linkages between concepts and theories are created. This reasoning has two general aspects: deduction and induction. Science has traditionally been positivist, utilizing deductive logic in the testing of hypotheses to remain objective and collect evidence. One of the strongest proponents of this view of science, Karl Popper, believed that searching for refutations of theories through hypothetico-deductive experimentation and reasoning was the only way to truly conduct science – once a theory was refuted, it was no longer useful in that form (Popper, 1965; 1993; Thornton, 2005). However, as science became increasingly viewed as a social endeavor, the importance of induction was implicitly legitimized by philosophers such as Thomas Kuhn (1993; 1996) and Imre Lakatos (1993). This new view of science recognized the crucial collection of evidence through traditional hypothetico-deductive methods, but also established the importance of bringing both confirming and refuting evidence to bear on, and modify, theoretically driven research paradigms/programs. This more current view of science thus takes into account both deductively derived evidence and inductively derived theories (Figure 1).

[Figure 1 diagram: Current Theory → Hypothetico-Deductive Reasoning (confirming evidence); refuting evidence → Inductive Reasoning → New, Modified Theory]

Figure 1. Schema of scientific reasoning and epistemology, identifying the roles of hypothetico-deductive and inductive reasoning.

Scientific Reasoning in Education

The importance of SR in education is recognized by the “scientific ways of knowing” standards in both Science for All Americans (American Association for the Advancement of Science [AAAS], 1990) and The National Science Education Standards (National Research Council [NRC], 1996). Even though these books are not geared toward undergraduate education, they evidence the importance of SR for all individuals, not just science majors, throughout their education. Both of these books emphasize the nature of science through the linkage of true scientific evidence with theories, the predictive ability of theories, and the changing nature of theories as new evidence becomes available. The underlying ability to understand and participate in these characteristics of science is SR.

Evidence of SR in education is not limited to educational goals and standards. It is also illustrated in several learning theories. Inhelder and Piaget (1958) characterized several stages of intellectual development. The latter two stages, concrete operational and formal operational, are relevant to SR in that these are the stages during which advanced reasoning skills, as described previously, begin to develop. Concrete operational individuals are dependent on concrete experiences and are unable to develop hypotheses based on those experiences. Those individuals who are able to generate and test hypotheses, i.e. use hypothetico-deductive reasoning, have reached formal operational thinking. The attainment of this level of reasoning may not occur for all individuals or in all subject areas (Piaget, 1972), but it is of crucial interest in the sciences. Those who study informal reasoning have also found many parallels with SR. Once relegated to simple “everyday reasoning,” informal reasoning was studied only for its use in day-to-day needs and social issues. However, as evidence accumulated that informal reasoning really was a form of internal argumentation, interest in its place in the sciences increased (D. Kuhn, 1993a). The process of argumentation, especially the framework identified by Toulmin (1958), mirrors the inductive process in science: multiple pieces of data/evidence are given to support a claim/theory and are backed by warrants/justifications. Although not directly Baconian, the principle of a theory emerging from a collection of data is retained. Lastly, individual constructivism and learning through scientific inquiry are based on the premise that knowledge is actively built from experiences and interactions with the world. This knowledge (i.e. evidence) is analyzed and understood in regard to current information (i.e. theory), which in turn is modified by the new knowledge (von Glasersfeld, 1993). Scientific inquiry generates new information in a similar fashion. Current knowledge informs and shapes the conduct of inquiry, which, through reflection on the results, sharpens the current base of knowledge (Hodson, 1996; T. Kuhn, 1996; Leonard, 2000). All three of these learning theories relate to the generation of scientific knowledge through reasoning, from generating a hypothesis, to deducing predictions, to analyzing evidence in light of, and bearing on, current theory. With this epistemological evidence lending theoretical support to SR in education, much information has been gathered on ways to promote SR, both deductive and inductive, in the classroom. However, this research has generally been limited to either hypothetico-deductive reasoning or argumentation per investigation, and most often involves K-12 students, preservice teachers, and inservice teachers.

Scientific Reasoning in College Biology

The most important goal for undergraduate biology curriculums is content acquisition and understanding. However, as noted by Science for All Americans (AAAS, 1990), The National Science Education Standards (NRC, 1996), and Transforming Undergraduate Education in Science, Mathematics, Engineering, and Technology (NRC, 1999), there is more to a science education than content knowledge alone. The acquisition of SR skills and content knowledge are concurrent and dependent on each other (Means & Voss, 1996). Strong reasoners need a breadth and depth of content knowledge from which to identify appropriate and relevant information. Studies have demonstrated that reasoners who display advanced argumentation skills also generally have a higher level of content knowledge from which to draw evidence and alternative explanations (Perkins, Allen, & Hafner, 1983; Voss & Means, 1991). Conversely, strong reasoning skills also aid individuals in making connections between concepts. This is implied by studies revealing that individuals identified with advanced reasoning skills also demonstrate greater conceptual understanding (Daempfle, 2002; Johnson & Lawson, 1998; Lawson, 1980; Sadler & Zeidler, 2005; Zohar, 1996; Zohar & Nemet, 2002).

The content knowledge students obtain is generally delivered in the lecture courses. However, the practice of reasoning skills is rarely identified in the lecture. Instead, lectures have remained times when students sit passively to be inundated with terms, with very little connection to other disciplines or ideas (Carter, Heppner, Saigo, Twitty, & Walker, 1990; CUBE, 2003; Seymour & Hewitt, 1997). The most likely opportunity for students to practice and develop their scientific reasoning skills is in the laboratory, both in research and in courses (Bender, 2005). However, most introductory biology laboratory exercises are still primarily composed of “cookbook” exercises designed with a goal of concept affirmation instead of concept construction. On the other hand, the use of investigative, inquiry-oriented laboratories has been demonstrated to increase students’ use of science process skills, concept comprehension, and achievement (Leonard, 1989). These laboratories have also been the focus of many studies to determine ways in which to help accelerate and improve students’ reasoning skills and overall cognitive development (Daempfle, 2002; Lawson, 1992a). Reasoning skills development in these courses may have stemmed from participating in a process similar to the three phases of scientific inquiry described by Windschitl and Buttemer (2000): generating experiments, analyzing data, and defending conclusions against critiques by peers. By modeling this process in inquiry-oriented laboratories, students replicate the way in which scientists generate knowledge. The critical aspect of these phases is the defense of conclusions against peer critique, whereby scientists learn the socioculturally acceptable standards for generating and backing a conclusion within the realm of science. Lawson (1995) complements this finding, indicating that hypothetico-deductive thought develops through conflict with others and a desire to prove one’s ideas. In addition, Hogan and Maglienti (2001) and Nersessian (1995) assert that scientists develop their reasoning skills through authentic practice within the science culture, including defending conclusions and resolving alternative explanations. Therefore, the use of inquiry in the laboratory not only allows students to practice and refine their reasoning skills, it also provides them with direct experience of the manner in which scientific inquiry builds knowledge.

Statement of Problem

Students who choose to major in biology should expect to be guided into the culture of science. To this end, students should be given the content and the skills necessary, especially SR, which enables students to understand how the conceptual knowledge they are given has been constructed and modified. However, curriculums appear to have an overt focus on content delivered through “cookbook” laboratories and lectures (CUBE, 2003; NRC, 1999; NSF, 1996; Seymour & Hewitt, 1997). It has therefore been generally assumed that biology majors’ reasoning skills will develop through natural participation in science coursework and later in research. In support of this belief, there is evidence that scientists’ reasoning skills develop through actual participation in the social aspects of the science culture (Hogan & Maglienti, 2001). However, as the literature describes initial difficulties regarding the coordination of theory and evidence, as well as hypothetico-deductive reasoning, in their peers, one can infer that biology majors may have similar problems with reasoning skills (e.g. D. Kuhn, 1993a; Lawson, 1992a). As biology majors may experience similar initial difficulties in SR, and a significant portion of this reasoning may not be honed until the individual enters the science community, a large portion of their preparation with regard to reasoning is unaccounted for. For instance, how well do biology majors reason, inductively and deductively, when they begin their curriculum? How well does this reasoning develop? What aspects of the curriculum appear to influence this development?

Currently, these questions are difficult to answer, as research in SR often looks individually at deductive or inductive reasoning skills, focuses on socioscientific issues, and utilizes primarily young individuals (K-12), non-science majors, or lay adults (e.g. D. Kuhn, 1993b; D. Kuhn & Pearsall, 2000; Lawson, 1992a; Sadler, 2004; Zohar & Nemet, 2002). Taken together, the current literature implies, but does not directly provide, knowledge concerning the SR of biology majors with regard to scientific problems. This baseline information is critical in order to address the development of the reasoning skills that will allow these future scientists to participate and practice within the culture of science.

Research Questions

This descriptive study will attempt to address the following questions:

1. What is the initial distribution of hypothetico-deductive reasoning skills of undergraduate biology majors?

2. What is the predictive relationship of student background characteristics, such as gender, major, university rank, years of college attendance, number of previous college-level and Advanced Placement science courses, and future academic plans, with hypothetico-deductive reasoning skills?

3. What is the change (if any) in hypothetico-deductive reasoning skills of undergraduate biology majors during their first two courses?

4. What is the initial distribution of argumentation skills of undergraduate biology majors?

5. What is the relationship of student background characteristics, such as gender, major, university rank, years of college attendance, number of previous college-level and Advanced Placement science courses, and future academic plans, with argumentation skills?

6. What is the change (if any) in argumentation skills of undergraduate biology majors during their first two courses?

7. What is the relationship between hypothetico-deductive reasoning and argumentation skills of undergraduate biology majors during their first two courses?

Definition of Terms

Argumentation (Dialogic)

Constitutive definition. Dialogic argumentation recognizes opposition among assertions, relates supporting and refuting evidence to each assertion, and weighs all of the evidence “in an integrative evaluation of the relative merit of the opposing views” (Kuhn, 1992, p. 157). It is a measure of “the cognitive and affective processes involved in the negotiation of complex issues and the formation or adoption of a position” (Sadler & Zeidler, 2005, p. 73). Strong argumentation displays internal consistency and analysis of evidence through multiple perspectives, while poor argumentation is evidenced by unclear, contradictory, and simple arguments with a single perspective (Sadler & Zeidler, 2005).

Operational definition. Argumentation quality is assessed based on Toulmin’s argumentation pattern (TAP). Subscale and composite scores on the argumentation instrument, based on five aspects of TAP (claims made, evidence used, warrants given, alternative explanations identified, and rebuttals given), will serve as measures of the quality of argumentation. The focus of the assessment is the process and coherency of argumentation, not the actual content of the argument.

Deductive Reasoning

Constitutive definition. Deductive reasoning is reasoning through the process of deduction; “reasoning from the general to the specific, or from premises to a logically valid conclusion” (Agnes, 2002, p. 377).

Operational definition. Deductive reasoning is assessed by the score on the Lawson Classroom Test of Scientific Reasoning (LCTSR). This assessment focuses on deduction in the form of hypothetico-deductive reasoning, as conceived by Inhelder and Piaget (1958).

Inductive Reasoning

Constitutive definition. Inductive reasoning is reasoning through the process of induction; “reasoning from particular facts or individual cases to a general conclusion” (Agnes, 2002, p. 729).

Operational definition. Inductive reasoning is assessed through the argumentation quality subscale and composite scores on the argumentation instrument (see Argumentation).

Hypothetico-Deductive Reasoning

Constitutive definition. Hypothetico-deductive (HD) reasoning is the cognitive process used to generate a hypothesis, devise an appropriate experiment to test the hypothesis, deduce a prediction, and determine the agreement of the evidence with the prediction.

Operational definition. Hypothetico-deductive reasoning is assessed by the score on the Lawson Classroom Test of Scientific Reasoning (LCTSR).

Scientific Reasoning

Constitutive definition. Scientific reasoning (SR) is evidence-based reasoning.

Scientific reasoning encompasses aspects of both deduction and induction to generate, modify, and validate theories based on evidence, which in turn has been discovered through experimentation within a theoretical framework (Figure 2).

Operational definition. Scientific reasoning is assessed through scores on both the LCTSR and argumentation instruments.

Figure 2. The relationship between hypothetico-deductive reasoning and argumentation in composing scientific reasoning, connected through the relationship between evidence and theory.

CHAPTER 2

LITERATURE REVIEW

Scientific Reasoning

What is scientific reasoning? As previously defined, scientific reasoning (SR) is the process of producing scientific knowledge through evidence-based reasoning. This definition, however, is rather broad. Exactly how do scientists use the evidence they have gathered to generate new scientific knowledge? The philosophy of science recognizes two broad types of reasoning using evidence: deduction and induction. Deduction, with its basis in Aristotelian logic, uses general theories to create hypotheses and predict the outcomes, generally using an “if, and, then” model. For example, if all birds have two legs, and the finch is a bird, then it will have two legs. On the other hand, induction, championed by Sir Francis Bacon, uses multiple pieces of evidence from which to create theories. For example, since finches, jays, wrens, eagles, etc. all have two legs, then all birds have two legs. Whether scientists use either or both types of reasoning is still debated in philosophy of science circles. In addition, which type of SR is best employed in the classroom has sparked a small debate in science education. However, very little research has looked at both types concurrently (D. Kuhn & Pearsall, 2000), and for the purposes of this study, both types will be considered.

Deductive Aspects of Science

The philosophy of Karl Popper. Science has traditionally been viewed through a strictly positivistic lens whereby the process of evidence collection was believed to be completely objective and dictated by a deductive method. One of the preeminent philosophers of science who touted this viewpoint was Sir Karl Popper. Popper’s (1965; 1993) main interest was in determining the criterion of demarcation between science and pseudo-science. He stated that it was easy to find confirming evidence for theories when one sets out to look for it. However, the truly genuine test of a theory was its falsifiability; therefore, any theory that cannot be refuted by any conceivable event is nonscientific, and the only truly confirming evidence is that found through an attempt to refute a theory. Popper also disapproved of the use of ad hoc adjustments to a theory following the discovery of refuting evidence. He believed that once a theory was refuted, it could no longer be useful—a belief criticized for ignoring the day-to-day workings of actual scientific practice (T. Kuhn, 1993; Thorton, 2005). However, Popper later qualified this belief by recognizing the importance of the modification of theories by auxiliary hypotheses, provided that the rationale behind the modifications was scientific and not designed simply to evade falsification of the theory (Thorton, 2005). Following these beliefs, Popper is also known for his staunch disapproval of induction as part of the scientific method. If the true demarcation between science and pseudo-science is the ability to disconfirm a theory, then theories built through induction of only confirming evidence are not scientific. Popper stated, “Induction, i.e. inference based on many observations is a myth. It is neither a psychological fact, nor a fact of ordinary life, nor one of scientific procedure” (1965, p. 53). This rebuke of induction has itself drawn some of the strongest criticisms of Popper and is therefore not reflected in the current epistemology of science.

Taken together, Popper’s views of the process of science are readily summed up by Giere, Bickle, and Mauldin’s (2006) diagram of an ideal report of a scientific episode (Figure 3). In this diagram, the model is related to the real world by a hypothesis. This hypothesis will be found to be true if the data, derived from the real world and generated through experimentation, are found to agree with the prediction, derived from the model.

Figure 3. Giere et al.’s (2006) model of an ideally complete report of a scientific episode (p. 29).

This is the hypothetico-deductive process. The scientist observes the real world and conceives of a hypothetical model to represent it, from which a prediction is deduced regarding how the real world functions. Data are generated through an experiment and checked against the prediction. Those data that match confirm the model for the moment, and those that disagree with the prediction refute the model, similar to Popper’s (1965) belief. The best model, then, is the one that withstands continued attempts at refutation.

Deduction and hypothetico-deductive reasoning. How has deduction been used in science education? Through his research into cognitive development, Jean Piaget categorized four stages of intellectual development from infancy to adulthood. These stages are defined by the mental structures and operations the individual acquires and integrates with previous experiences and operations as he or she matures. They are often discussed in terms of “reasoning patterns” or “reasoning abilities,” especially with regard to science education (to avoid other uses of “operation” in science terminology) (Karplus, 1977). As this study focuses on college students, the first two stages, sensory-motor and pre-operational, will not be reviewed. However, the next two stages are important in describing students’ reasoning at the post-secondary level. When individuals are able to modify an object of knowledge through interiorized actions, Piaget states that these actions constitute mental operations. The concrete operational stage (ages 7 to 10 years) bases these operations on objects that are observable and able to be referenced, forming the basis of elementary logic, though not extending beyond the observable. When a child is able to construct new operations that are combinatorial and to reason based on hypotheses that he or she is able to test, the formal operational stage is reached (beginning approximately ages 11-15 years) (Inhelder & Piaget, 1958; Karplus, 1977; Piaget, 1972; 2003). “In other words, formal thinking is hypothetico-deductive” (Inhelder & Piaget, p. 251). An individual’s speed of development through the stages has been found to depend upon many different factors, including societal, cultural, and intellectual stimulations. Although most individuals can expect to advance through the first three stages, the attainment of the fourth stage is not as well defined, depending on the stimulation by environmental factors. Piaget hypothesizes that, between 15 and 20 years, most normal individuals can expect to attain the formal operational stage, although in different areas according to their individual aptitudes and professional specializations (Piaget, 1972). Interestingly, it has been found that only 25% of entering freshmen have attained the formal level of reasoning abilities, 50% are in transition from concrete to formal, and the remaining students have been found to still be concrete reasoners (Chiappetta, 1976; Lawson, 1980; 1992a).

At the college level, then, it would appear that only the concrete and formal operations are of interest; however, current research has led to the characterization of a possible fifth operational stage. Arlin (1975) demonstrated that Piaget’s formal level may actually comprise two separately characterized levels: the formal level, defined by problem solving, and a fifth level, defined by problem finding, or the ability to discover questions in ill-defined problems. In addition, Arlin provided evidence that these two stages are sequential, advancement to the next depending on mastery of the one before, as with the first four of Piaget’s operational stages. Arlin’s work has been taken further to demonstrate that, similar to the first four stages, there is a fifth change in the brain’s electroencephalogram pattern in young adults that correlates with the development of the ability to derive alternative hypotheses involving unseen entities (Lawson, Drake, Johnson, Kwon, & Scarpone, 2000). This “post-formal” skill is a progression from the formal stage. Lawson, Clark et al. (2000) characterized this progression as moving from an ability to develop hypotheses based only on observable entities to an ability to develop hypotheses based on unseen entities.

As both the upper levels of Piagetian reasoning skills and Popper’s philosophy regarding the nature of scientific knowledge construction center on hypothetico-deductive thinking, much research has been conducted, primarily led by Lawson, utilizing Piagetian reasoning levels to assess students in the sciences. The intersection of these two knowledge construction theories in the science classroom has produced a plethora of data linking Piagetian reasoning skills to teaching and learning factors such as science achievement, conceptual change, instructional methods, and learning styles (Lawson, 1992a). Lawson (2003a; 2005) has even gone so far as to utilize some of his research methods and results to affirm Popper’s deductive viewpoint to the exclusion of all other possible interpretations, although he has not escaped criticism for “shoehorning” his data in this matter (Allchin, 2003). With this criticism in mind, does the connection between the assertions of cognitive psychologist Piaget and science philosopher Popper tell the whole story with regard to students’ scientific reasoning skills?

Inductive Aspects of Science

The philosophy of Thomas Kuhn and Imre Lakatos. The importance of deduction in science is recognized without hesitation. However, the use of induction in science has not been so readily accepted, for reasons similar to the refutations given by Popper (1965). In addition, science has traditionally been seen as an individual endeavor instead of a collaborative effort that pools evidence from different quarters to formulate theories. However, the importance of social interactions in scientific knowledge construction has recently been acknowledged. This has primarily been due to two philosophers, Thomas Kuhn and Imre Lakatos, who give implicit regard to induction in their views of scientific knowledge generation. T. Kuhn (1993; 1996) asserted that scientific knowledge is dominated by research paradigms that drive both the direction of and the method by which scientific knowledge is created. These paradigms, which are akin to a theory, influence the manner in which evidence is evaluated and persist until challenged by anomalous data – the process of “normal science.” The escalation of anomalous data confronts the paradigm/theory so that it must then be reconsidered, evaluated, modified, and/or replaced – creating a paradigm shift or “revolution.” The view of Lakatos (1993) is similar to that of T. Kuhn in that “research programs” exist which consist of an accepted “hard core” and “positive heuristic” that guide how and what research is conducted, similar to a paradigm. Lakatos also envisions a protective belt of auxiliary hypotheses, which allows the research program to continue in light of anomalous data so long as it keeps predicting and generating novel information, i.e. “progressing.” Once the program no longer supports the theoretical with the empirical, it is determined to be “degenerating” and may be superseded, or “shelved,” by another research program that is progressing in that area. In each of these theories, Popper’s falsification is not an absolute blow to the theory at hand; rather, refutations are considered anomalous data to be either set aside for the time being or possibly incorporated into auxiliary modifications (T. Kuhn, 1993). Instead, rejection occurs when all the data are considered and inductively give rise to a new and better theory.

The viewpoints of T. Kuhn and Lakatos have also been supported on an individual level by Chinn and Brewer (1993; 1998), who found that there are eight ways in which individuals deal with anomalous data: ignoring the data; rejecting the data due to disbelief; uncertainty with regard to the validity of the data; exclusion due to the irrelevance of the data to the theory; abeyance due to acceptance of the data as valid without knowing what to do with them; reinterpretation that explains the data away within the current theory; peripheral theory change involving minor changes to the theory; and theory change due to a need to abandon the old theory and adopt the new one. Chinn and Brewer found that complete theory change upon an encounter with anomalous data is the most difficult for individuals to enact. This can also be readily inferred and transferred to the vision of scientists working within a paradigm/program, which is consistent with the views of T. Kuhn and Lakatos.

Induction and argumentation. Informal reasoning skills were originally considered “everyday thinking skills.” However, with the claim that informal reasoning skills took the form of an internal dialogic argument, a new definition with regard to scientific reasoning emerged. Deanna Kuhn (1992; 1993a; 1993b) put forth that dialogic argumentation recognizes two opposing assertions, weighs the evidence, and determines the relative merit of each assertion in light of the evidence. This consideration of assertions and the corresponding evidence is similar to that put forward as scientific reasoning by Thomas Kuhn and Imre Lakatos. The implicitly inductive epistemology of science provided by philosophers such as T. Kuhn and Lakatos, as well as theories regarding the structure of arguments have provided a framework with which to investigate inductive reasoning skills in science.

Around the same time that T. Kuhn and Lakatos were developing their theories, Stephen Toulmin (1958) elucidated a general structure within which to view argumentation. This structure begins with the grounds, or data, used to establish a claim. In order to bridge the grounds and claim appropriately and legitimately, one needs to provide a warrant, i.e. a general rule. As warrants themselves may have different degrees of appropriateness and legitimacy related to the problem at hand, backings need to be provided for the warrants, establishing the precise relation of the warrant to the particular data and claim in question. Finally, the claims may need to be modified according to the degree of force that the data have on them. To this end, qualifiers and conditions of exception/rebuttals may also need to be offered. Toulmin, Rieke, and Janik (1984) took this information and created a framework from which to investigate and teach higher order thinking, including consideration and integration of the grounds, claims, warrants, and backing, as well as the counterarguments and rebuttals used in argumentation. Although this framework has been around for decades, it has more recently been applied in numerous studies investigating students’ higher order thinking and reasoning in the sciences, specifically looking at those factors contributing to students’ ability to generate and evaluate sound arguments.

Toulmin’s and T. Kuhn/Lakatos’ theories relate at the heart of their respective processes – a metacognitive approach to the evidence. This approach encompasses evaluating the evidence on its own merit, separate from the theory, yet bearing upon it. The challenge by and evaluation of evidence is similar to that used in argumentation – for a new claim/paradigm to be accepted, scientists must first be able to envision and consider alternative arguments. They must then provide the confirming and anomalous data and utilize the appropriate warrants, backing, and qualifiers to convince others of its appropriateness and veracity while being prepared for counterarguments and rebuttals.

How Do Scientists Actually Reason?

Currently, the nature of science is generally seen as an intermingling of both deductive and inductive aspects. Science for All Americans (AAAS, 1990) describes science as a blend of logic and imagination – logic helps to generate the data, and imagination finds ways to connect the evidence. It also states, “Science can use deductive logic if general principles about phenomena have been hypothesized, but such logic cannot lead to those general principles” (p. 142). This characterization of the process of science includes both Popper’s logical deductive testing of hypotheses and T. Kuhn/Lakatos’ linkage of evidence to form general principles. The hypothetico-deductive work of science is carried out at the core, generating evidence, which is then interpreted in a more inductive scheme, consistent with T. Kuhn’s “normal science.” D. Kuhn and Pearsall (2000) corroborate this view in noting that two categories of scientific-thinking skills are investigated: investigative skills and inferential skills. The investigative skills are those that are deductive, looking at the path from theory to evidence through the design of experiments. On the other hand, the inferential skills are those that construct theory from evidence. Driver et al. (2000) also support this view in their discussion of the place of argument in the science classroom: “…Students need to appreciate that scientific theories are human constructs and that they will not generate a theory, or reach a conclusion, by deduction from the data alone. Instead, they need to postulate possible interpretations and then examine the arguments for each in the light of evidence that is available to them” (p. 299). Even Lawson (2003b), in addressing the applicability of Toulmin’s argumentation framework for developing hypothetico-deductive skills, appears to support this schema, although that has not been his purpose. He emphasizes the use of “if/and/then” reasoning to test alternative hypotheses of a problem; utilizing the claim in Toulmin’s framework as a tentative explanation, students must then supply the data, backing, and warrants through experimentation. Ironically, he also states that students “must once again engage in argumentation to convince others of the veracity of their conclusion” (p. 1389). In each instance, the importance of deduction in generating evidence and the implication of induction in generating theory are present, along with recognition of the critical connection between the two, as shown in Figure 4.

Figure 4. Schematic of scientific reasoning, linking deductive and inductive reasoning.

Although this schema gives the relationship between the two types of reasoning to be investigated here, it does not identify the qualities of good scientific reasoning. With regard to strong formal operational reasoning skills, Lawson (1982) identified five schemata involved in advanced formal reasoning: generating expectations, control of variables, generating possible combinations of causes, probabilistic/correlational reasoning, and proportional reasoning. In addition, those who can reason with unseen entities are considered to have achieved the highest level of Piagetian formal reasoning skills. Lawson (2003b) has also assigned high value to the recognition and testing of alternative explanations, similar to that seen in the argumentation literature (D. Kuhn, 1992; 1993a). This type of scientific thinking has also been described as an evaluative epistemology that values thinking, evaluation, and argument in knowing. Scientists who follow this epistemology have been found to value specific, conservative claims based on a range of solid evidence and consideration of alternative explanations (Hogan & Maglienti, 2001). Tangential to this idea is the recognition and accommodation of anomalous evidence into theory. Weaker reasoners tend either to dismiss anomalous evidence outright or to manipulate this evidence to maintain a confirming bias (Phillips, 1978; Zeidler, 1997). Chinn and Brewer (1993) reaffirm this by noting “theory change” as the highest and most difficult reaction to anomalous data. All in all, strong scientific reasoning is seen as taking into account all evidence, confirming and refuting, to evaluate initial and alternative hypotheses in reference to the current theory. This evaluation must also include a willingness to discard or modify the current theory in light of the accumulated evidence.

In addition to the above findings, strong argumentation skills also seem to mirror the mental skills identified in the expert/novice literature. Perkins et al. (1983) found that skilled reasoners were able to use pattern and model recognition to relate the appropriate evidence to a social issue. Additionally, these reasoners were more likely to evaluate their mental models metacognitively for fit to the problem at hand. Voss and Means (1991) also identified good reasoners as having a “mental style” that actively analyzed arguments and reflected on progress. The expert/novice literature reflects these findings in that experts notice meaningful patterns in information, due in part to their ability to organize information mentally. In addition, experts monitor their own approach to problem solving metacognitively, constantly evaluating and adapting to new problems (Bransford, Brown, & Cocking, 2000). These findings can readily be connected to the previously stated scientific-thinking characteristics. All in all, D. Kuhn and Pearsall (2000) sum up this relationship when they describe the essence of mature scientific thought as the “coordination of theory and [empirical] evidence in a consciously controlled manner” (p. 114). Evidence is interpreted in light of the current theoretical framework/paradigm/program with awareness of the distinction between theories and evidence; however, it is also metacognitively evaluated for its fit within the current state and flux of theories, based on other emerging evidence.

Hypothetico-Deductive Reasoning Ability

Piaget’s operational levels are often discussed in terms of “reasoning patterns” or “reasoning abilities,” especially with regard to science education (Karplus, 1977). Science education researchers have recognized that the upper levels of Piagetian reasoning are defined by hypothetico-deductive thinking and that science is a field dependent on the creation and testing of hypotheses. These researchers have therefore been investigating the relationship of Piaget’s intellectual/reasoning stages with factors such as science achievement, conceptual change, instructional methods, and learning styles for varying ages and years of schooling (Johnson & Lawson, 1998; Lawson, 1983; 1992b; Lawson & Wollman, 2003). In addition, some researchers have proposed a fifth level of intellectual abilities, the “post-formal operational stage,” whereby the development of hypotheses based on observable entities is the central characteristic of the fourth stage and the construction of those based on unobservable entities is the criterion for the fifth, post-formal, stage (Lawson, 2003b; Lawson, Alkhoury, Benford, Clark, & Falconer, 2000; Lawson, Clark et al., 2000; Lawson, Drake et al., 2000).

Hypothetico-Deductive Reasoning Ability and Achievement in College Biology

Due to previous conflicting results concerning the correlation of Piagetian intellectual stage and grades, Lawson (1980) investigated the effects of reasoning ability level and cognitive style, as defined by field-dependence/disembedding ability, in relation to course achievement. In this study, Lawson pre-tested the students for Piagetian intellectual stage, grouping them into concrete, transitional, or formal reasoners, as well as for cognitive style, grouping them into field-dependent, transitional, or field-independent categories. Course achievement was measured by cumulative exam grades on open book, open note essay exams. He found that formal reasoning significantly correlated with field-independence. Lawson also found that Piaget test scores correlated positively and strongly with cumulative exam scores, both with and without the cognitive-style scores accounted for, demonstrating that reasoning abilities do appear to have a positive relationship with course achievement. A few years later, Lawson (1983) was not able to support his earlier findings when he determined that the only consistent correlating factor among several exam question types was cognitive style. Reasoning ability was found to have a low positive association with achievement on a computational question, although not to the degree found in the 1980 study. These studies suggest that reasoning ability may have a greater effect with regard to open essay examination questions than multiple-choice or computational assessments.

Although open essay questions may be the preferred method for demonstrating differences in reasoning skills, there is traditionally a demand for content knowledge measured by high-stakes multiple-choice testing, which emphasizes the importance of a large body of stored information and recall. This type of testing, along with the focus on prerequisites by colleges and universities, led Johnson and Lawson (1998) to investigate the effect of prior knowledge compared to reasoning ability on achievement. This study also took into account method of instruction, hypothesizing that the inquiry method advocated by the National Science Education Standards (NRC, 1996) would better serve those students with formal-reasoning abilities, while an expository instructional method would best serve concrete-reasoning students. It was believed that students with formal-reasoning abilities could generate and test alternative hypotheses on their own, while concrete-reasoning students might need a more direct, step-by-step method for laboratory exercises (Karplus, 1977). Through multiple-choice exam scores, reasoning ability was determined to be the only significant predictive factor with regard to achievement on both the semester exams/quizzes and the cumulative final examination. In addition, reasoning ability, not prior knowledge, demonstrated correlation with achievement by instructional method, though to a greater extent with expository instruction than with inquiry-based instruction. Lawson and Johnson (2002) repeated this study, finding similar results regarding the correlation of instructional method with reasoning levels.

The results from both of these studies conflicted somewhat with previous work by Lawson (1983), in which he found a positive relationship between prior knowledge and achievement on multiple-choice exam questions but no correlation with reasoning ability. However, the positive correlation of reasoning ability with achievement on a computational question was greater than the positive correlation found between prior knowledge and the same question type, similar to the results of his later work. This study was also based on only one exam, not averaged over three, making the results difficult to compare. In addition, although the evidence looks strong regarding the prevalence of reasoning ability over prior knowledge, this conclusion needs to be moderated, as the validity and reliability of the instruments used to test for prior knowledge in all three studies (Johnson & Lawson, 1998; Lawson, 1983; Lawson & Johnson, 2002) were weak, especially in the study by Johnson and Lawson in 1998, where the reliability coefficient was r = 0.2. Finally, in all studies investigating the relationship of reasoning ability and other factors to achievement, the reliability and validity of the semester exams and quizzes are never noted. Understandably, exams must change with the class; however, perhaps some of the differences seen in the results may be attributed to differences in exam questions for a given semester.

Lawson (1993) also examined the mutual, relative impact that homogeneous and heterogeneous pairings based on reasoning abilities had on laboratory partners' academic performance and enjoyment of learning in a lecture-laboratory combination introductory biology course for non-majors. Within each lab section, teaching assistants partnered students by ability with the intent to generate the greatest number of pairing combinations per class: concrete-concrete, transitional-transitional, formal-formal, concrete-transitional, concrete-formal, and transitional-formal. Lawson found a significant increase in the reasoning abilities of the concrete and transitional students paired together; however, laboratory partner reasoning level was not found to be the source of the effect. In addition, laboratory partner reasoning ability did not significantly affect achievement, although some gains were seen in both partners of concrete-formal pairings. When analyzing the student surveys regarding the laboratory experience, a reciprocal positive influence of working with either a much more or less able reasoner was discovered: the concrete students benefited from more able tutoring and the formal students benefited from teaching their less able partners. Although concrete and formal reasoners described their experiences as positive, the transitional students differed significantly on nearly all aspects of the laboratory experience with regard to their concrete or formal partner. This finding indicated that the perceived reasoning ability of one's partner may be critical to a student who is struggling with his or her own reasoning ability.

Lawson inferred that perhaps this difference was due not to a preference for working with the formal reasoners, but rather to a preference not to work with concrete reasoners. The transitional students may not have been secure in their own abilities and could become frustrated when working with less able students. Overall, Lawson demonstrated that although partnering based on reasoning ability may not have an impact on student achievement, it does have some influence on the total laboratory experience.

Hypothetico-Deductive Reasoning Ability, Student Beliefs, and Conceptual Change

Central characteristics of formal reasoning are the ability to consider multiple, alternative explanations for phenomena and to generate hypotheses to test such alternative explanations (Piaget, 1972). These skills are crucial to the process of conceptual change, whereby students first must differentiate between existing concepts, then exchange old, nonviable concepts for new, plausible ones, and finally place these abstract concepts into the appropriate context (Hewson & Hewson, 2003). The reasoning skills found in the formal operational stage aid this process through the testing of old and new concepts, as similarly described for deduction in science knowledge generation. It is therefore desirable to understand the relationship between reasoning level and conceptual change.

Lawson and Weser (1990) explored this association by investigating the degree to which nonscientific beliefs regarding evolution changed throughout a semester in relation to students' initial levels of reasoning ability. It was predicted that those students who had higher reasoning skills would be more likely to hold scientific beliefs initially and also more likely to change nonscientific beliefs. Conversely, concrete reasoners would be more likely to have and retain nonscientific beliefs. The authors did detect the predicted correlation between initial reasoning ability and initially held scientific beliefs, with significant differences found on test items regarding evolutionary perspective and Darwin's theory. In addition, although most students tended to believe that evolution took place, those students identified as formal reasoners were more likely to hold these beliefs strongly. Lawson and Weser also found that more skilled reasoners were better able to change nonscientific beliefs to scientific ones. They argued that formal reasoners' ability to consider alternative views strengthens their capacity for conceptual change. However, mixed results demonstrating regression to more nonscientific beliefs on some test items, as well as the lack of many significant differences among reasoning abilities and belief changes, imply that nonscientific beliefs mixed with religious convictions may be more difficult to change due to the depth of emotional investment.

In contrast to investigating nonscientific beliefs, which may have emotional components, Lawson, Baker, DiDonato, and Verdi (1993) used the introduction of competing theoretical concepts regarding molecular polarity, bonding, and diffusion in order to determine the impact of hypothetico-deductive reasoning skills on conceptual change of scientific concepts. After students were exposed to an inappropriate concept, the researchers found that reasoning skill group was positively and significantly related to performance on the post-test, which required students to explain diffusion through the use of the appropriate concept. These group differences were not seen on the pre-test, as may be initially expected, especially when considering the observations by Lawson and Weser (1990) that demonstrated a relationship of initial scientific beliefs to reasoning abilities. However, as the post-test required the students to consider both the inappropriate and appropriate theoretical explanations, the correlation between formal reasoning skill and performance revealed the importance of these hypothetico-deductive skills in conceptual change. In addition, the authors more closely examined those students who initially scored at a high misconception level on the pre-test. They found that, of those students, none of the concrete reasoners were able to change from holding a misconception to a correct conception, while both transitional and formal reasoners demonstrated some movement towards correct conceptions.

However, with regard to addressing misconceptions through reasoning ability, one missing piece of information is the degree to which the experimenter actively pursues an increase in reasoning abilities as one of the teaching goals, as is often the case in studies where the experimenter is the teacher. If there is a significant focus on developing reasoning skill through practice, possibly on the particular misconception investigated, then the premise of the effect of initial reasoning ability on conceptual change is less valid. The replicability of these experiments is severely limited without knowledge of the degree to which reasoning skills are incorporated in the everyday classroom. Even without this information, these studies, taken together, demonstrate the variety of conceptions that can be influenced by reasoning ability.

Hypothetico-Deductive Reasoning Ability and Hypothesis Testing

Previous work focusing on Piaget's stages of intellectual development, including the studies reviewed here, has centered on the four levels described by Piaget himself. However, recent studies suggest there is a possible fifth level of reasoning ability. The progression of development through the first four Piagetian levels has been associated with changes in brain function, as noted by changes in electroencephalograms (EEGs). A fifth alteration in an individual's EEG has also been found to take place. Lawson, Drake, et al. (2000) hypothesize this change is associated with a post-formal operational stage, which they term "theoretical," denoted by the ability to formulate hypotheses involving unseen entities such as water or gas molecules. They found that in administering quizzes that asked for hypotheses of increasing difficulty, corresponding to the concrete, formal, and theoretical levels of reasoning ability, there was a decrease in the percentage of students who were able to correctly answer the questions. Although no statistical analyses were performed, the percentages of students who succeeded on progressively difficult tasks roughly equaled the average percentages of students found at the corresponding reasoning levels.

Lawson, Clark, et al. (2000) validated these findings by adjusting the previous Lawson Classroom Test of Scientific Reasoning and demonstrated that there are two significantly different levels of hypothesis-testing skill. The first corresponds to the formal reasoning level and hypotheses based on visible entities; the second corresponds to the theoretical reasoning level and hypotheses generated for unseen causal agents. In addition, these two levels have a significantly increasing impact on exam scores and on transfer problems designed to measure theoretical reasoning skills.

Lawson, Alkhoury, et al. (2000) found a similar correlation between higher theoretical reasoning ability and a proposed new hierarchy of scientific concepts. The authors offered a designation of three levels of scientific concepts: descriptive, which involves directly observable entities; hypothetical, which involves entities that cannot be observed due to a restricted time frame (e.g., evolution); and theoretical, which involves concepts that are not directly observable. Using multivariate analysis and pairwise comparisons, the authors found that, overall, the higher the reasoning ability, the greater the proficiency in answering questions regarding more difficult concepts.

These three studies all demonstrate the presence and characteristics of a new level of intellectual development, which Lawson and colleagues have termed "theoretical." Piaget's basic premise of the necessity of completing one stage before advancing to the next is retained in this new schema. However, if one needs to acquire the critical skills to advance through the stages, then those individuals who have not attained these skills should not demonstrate any proficiency on the tests for levels above their current level. Interestingly, all three studies do demonstrate proficiency variability within each reasoning stage, with lower-ability reasoners succeeding on some higher-order questions. This does call into question the validity and reliability of the test of reasoning skills. However, Lawson and his colleagues postulate that lower-ability students may have enough pieces of knowledge to correctly answer a few higher-level questions or that personological factors, such as confidence and creativity, may be interacting with reasoning ability (Lawson, Alkhoury et al., 2000; Lawson, Clark et al., 2000; Lawson, Drake et al., 2000). Overall, though, the area is still relatively new, and it is possible that this intellectual level may be characterized by more than the ability to consider and test unseen entities.

These studies regarding hypothesis testing and the presence of a fifth Piagetian intellectual stage are fairly new but appear promising. The validity of the newer items testing for categorization at the theoretical level has been established through their repeated use in studies demonstrating a significant difference in the abilities and achievement of students in the formal and theoretical stages. Staver and Pascarella (1984) increase this confidence by demonstrating that the method and format of the Piagetian tasks, which comprise the reasoning ability test, have no impact on the overall responses of subjects. Perhaps the greatest weakness in this area of study is the possibility of interaction with other factors that could account for the appearance of upper-level reasoning skills in lower-level reasoners. Although this is a concern in all studies regarding Piagetian levels, more care and scrutiny are needed when establishing a new level. Future research needs to be directed at these possible interactions, if only to discern new individual characteristics that could impact the research presented here or could further characterize Piaget's intellectual stages.

Summary

The literature demonstrates that reasoning ability, when compared to cognitive style and prior knowledge, has a greater overall effect on student achievement, regardless of instructional or testing method. This statement is supported by Cavallo, Rozman, Blickenstaff, and Walker (2003/2004), who also found reasoning ability to be the best predictor of achievement in a group of sophomore and junior science majors. Given the strength of the effect of reasoning ability on achievement, partnering in laboratories based on reasoning ability appears to be a possible avenue for extending this impact from higher-level to lower-level reasoners. However, Lawson (1993) was unable to offer evidence in support of this, although the process still revealed some positive effects on student perceptions of their overall laboratory experiences. With regard to conceptual and belief change, the studies establish an overall positive correlation between reasoning ability and conceptual change. However, this correlation is primarily associated with post-test change and not with initial reasoning about alternative conceptions, as would be expected. This is not a pressing point, as each study measured the degree of conceptual change for two topics that are fundamentally different in their personal nature for different individuals. Concepts regarding evolution are intricately tied to religious beliefs, which are very difficult to disentangle. On the other hand, misconceptions regarding diffusion are easier to address, as the emotional ties are few. Nevertheless, as each study examines different types of misconceptions yet still demonstrates a positive correlation with the appropriate concept on post-test data, together they reveal the common influence of higher reasoning ability on conceptual change. In addition, they demonstrate that nonscientific beliefs with an emotional commitment are more difficult to alter, while those without an emotional connection do not demonstrate this same resistance.

As an additional note, all the studies in this review, save Cavallo et al. (2003/2004), have focused on non-major introductory biology students from Arizona. Although this allows for a more consistent comparison among the results of the studies, it does not offer information regarding the distributions and relationships of reasoning abilities in science majors. As aptitude may be a factor in the progression through the stages, it is possible that science majors may have a higher proportion of formal and theoretical reasoners and demonstrate different relationships among the factors discussed: achievement, cognitive style, prior knowledge, instructional method, and conceptual change (Piaget, 1972). If reasoning ability has the pervasive impact demonstrated here, and if hypothesis-testing skills for both seen and unseen entities are crucial for developing scientists to acquire, then information regarding the distribution and impact of the stages in the population of future scientists, i.e., science majors, is critical.

Argumentation Ability

The use of argumentation theory and Toulmin's argumentation pattern (TAP) is relatively new in science education. Argumentation and TAP have generally been investigated as a measure of informal, everyday reasoning about social issues, due to the similar nature of each. Both argumentation and informal reasoning recognize opposing assertions, weigh evidence for and against each assertion, generally use inductive reasoning, and adopt a justified position in the end (D. Kuhn, 1992; Sadler & Zeidler, 2005; Zohar & Nemet, 2002). D. Kuhn (1993a; 1993b), in her analysis of how individuals conceive of theories and the evidence that bears upon them, identified the parallel nature between informal and scientific reasoning using TAP as a framework. She found that in both informal reasoning about social issues and scientific reasoning during experimentation, individuals had difficulty recognizing the possible falsehood of a theory, identifying evidence that can confirm and refute the theory, and resolving alternative explanations. This finding, and its parallel nature to the current epistemology of science proposed by T. Kuhn (1993; 1996) and Lakatos (1993), has led to the use of TAP to analyze individual ability to coordinate theory and evidence, recognize alternative explanations, and offer rebuttals in science. True to its roots in social issues, however, the expansion of argumentation in the sciences has often focused on its use in the resolution of socioscientific issues (e.g., Sadler, 2004; Sadler & Zeidler; Zohar & Nemet). Although research regarding the use of argumentation with scientific issues is of increasing interest (e.g., Osborne, Erduran, & Simon, 2004), there is still less information on its use in resolving scientific problems. However, one can infer characteristics of argumentation in a strictly scientific context from the literature on informal reasoning of social and socioscientific issues.

Informal Reasoning, Ability, and Expertise

In a shift from the focus on formal, deductive-based reasoning, Perkins et al. (1983) investigated the problems individuals encounter using everyday, informal reasoning. This study utilized a large group of subjects that ranged from ninth-grade students to doctoral students to non-students, with and without college degrees. Perkins et al. interviewed individuals as they pondered a given social issue, such as reinstatement of the draft, television violence, the effectiveness of bottle deposits on littering, and the definition of art. The authors wanted to determine the types and frequencies of the different objections that individuals raised to lines of reasoning. Although this study was in its preliminary stages and categorization of the objections did not follow a formal scheme, two major factors emerged related to skilled reasoners: a large knowledge repertoire and efficient knowledge evocation in the form of pattern and model recognition. Perkins et al. also suggested that a third component, a type of metacognition, must be important. This suggestion came from the discovery of two types of reasoners: naïve reasoners who had a "makes-sense" epistemology and skilled reasoners who demonstrated a "critical epistemology." The naïve reasoners tended to accept "truths" more readily, provided they made intuitive sense. Skilled reasoners, on the other hand, held higher standards with regard to adequate justification, exhibited metacognition through questioning different mental models, and seemed to have general skills with regard to logical thinking and heuristics. These results began to delineate the characteristics of good informal reasoning.

Using the same data, Perkins (1985) investigated the relationship of educational level to informal-reasoning ability. Looking at students and non-students, he analyzed the results in terms of ability to explain the logic of an argument with breadth and depth on six scales. He also gave an additional overall intuitive rating for argument quality. Looking at trends in the data, graduate students performed better than college students, who scored higher than high school students. Upon further analysis of actual gains at each education level, he found that, although all gains were small, the greatest gains were seen in the high school years. Perkins also performed a regression analysis of prior thought on the different problem topics, IQ, age, and years of education on each of the six scales. When pooling the student and non-student groups, IQ was found to be the most significant contributing variable. Perkins draws several conclusions. First, general ability is a major factor with regard to the use of informal reasoning for general social issues; however, as Perkins reviewed several studies that demonstrate an improvement in reasoning skills through instruction, general intelligence or ability cannot be the only factor. Second, students do improve significantly in informal-reasoning ability during the high school years; however, this gain is small. Although the gains demonstrated by the college and graduate students were not as great as those of the high school students, Perkins suggests that student gains during the postsecondary years are most likely in their particular field of expertise and therefore more context-specific than the general topics used to assess them in this study. Taking these results together, Perkins surmises that general informal-reasoning skill gains may have a ceiling due to the combination of general intelligence influences and context-specific expertise as individuals progress through their education. However, as Perkins does not give any demographic information regarding the participants, it is difficult to discern whether the apparent ceiling could be due to the prompts used in investigating informal-reasoning ability.

Including a consideration of Perkins (1985), Voss and Means (1991) conducted a literature review to further identify the important characteristics of good informal reasoners as revealed through argumentation. They also determined that general ability appears to be a major factor in informal reasoning. In addition, age/experience does not affect informal-reasoning ability, as might be expected. Domain knowledge, as suggested previously by Perkins, does appear to have some importance when considering problems that require an informed person. Additionally, Voss and Means summarized the characteristics of good reasoners in an argumentative context as having a "mental style" that actively analyzes arguments, generates different types of arguments, weighs evidence, and reflects on progress. These characteristics are similar to those displayed by scientists when evaluating data and theories (D. Kuhn, 1992; Hogan & Maglienti, 2001).

Means and Voss (1996) built on this information, studying informal-reasoning skill and its relationship to grade, ability, and knowledge levels. Students in grades 5, 7, 9, and 11 were selected from pre-established groupings of high, average, and low ability and interviewed individually regarding ill-structured problems, problem solutions, and problem difficulty. Similar to the scoring by Perkins (1985), each student was assigned a numerical score on several scales to determine the complexity and clarity with which the students discussed the problems. Overall, Means and Voss found that high-ability students were significantly better reasoners than average- and low-ability students, who most often did not differ from each other. Although there was some significance attributed to grade level, it was not to the extent or frequency observed for the effect of ability. It was postulated that this finding could be considered a general knowledge effect. When considering the effect of content knowledge, Means and Voss demonstrated that, with regard to justification, sound argumentation was a function of ability level, not knowledge. However, increased knowledge was positively associated with the number of reasons and qualifiers generated. Although not all the data are presented, the authors also indicate that when knowledge was statistically partialled out, once again, high ability became the significant predicting factor with regard to sound reasoning. These results, taken together, agree with Perkins (1985) and with Voss and Means (1991) that general ability is the most significant factor with regard to informal-reasoning skills. Additionally, although age/grade has some effect, this could possibly be explained by the subject matter used in the problems.

Means and Voss (1996) synthesized their findings to create a two-component model of informal reasoning. The first component is a general informal-reasoning skill, possibly related to one's learned language structures, that allows a person to store, search for, and evaluate information. The second component is content knowledge, which can include subject and personal knowledge combined in mental representations. These components work in tandem; higher-ability individuals are able to readily access and construct models from their content knowledge through their informal-reasoning skills, while lower-ability individuals are able to recall content but unable to mold it to fit different situations.

Informal Reasoning and College Students

Research on general improvements. Several studies have been conducted to determine the development of informal-reasoning skills in undergraduate students throughout their college experience. These studies sometimes focused on "critical thinking" or "everyday reasoning," were primarily conducted in regard to social issues, and used populations of mixed majors. Overall, a general trend of improvement was seen through the four years of college. Going as far back as 1963, Lehmann studied the 1958 entering freshman class of more than 1,000 students at Michigan State University at the beginning and end of their four years. Improving somewhat on Lehmann's method and using a much smaller population of 47 individuals with a matched control group of non-college individuals from the same high school, Pascarella (1987) investigated the development of critical thinking during the first year of college. Both studies determined that there was a significant improvement in the critical-thinking ability of students who attended college with regard to considering, analyzing, and evaluating multiple pieces of evidence and arguments as a whole; however, no specific aspects of the college experience were identified. In addition, Lehmann found that the greatest improvement occurred during the first two years and that students became more flexible in their beliefs, open-minded, and receptive to new ideas throughout the four-year experience. Although these studies cannot easily be transferred to today's students, the results indicate that there is some development of reasoning skills in the undergraduate experience.

In a study that looked at a specific aspect of informal reasoning, "myside" bias, Toplak and Stanovich (2003) also found improvement over the span of four years of undergraduate work. This study utilized 112 students from a diverse background of majors, of which only 25% were in a science or engineering program. Students were asked to argue both their own side ("myside") and another position ("otherside") on three social issues: tuition subsidy, organ donation, and gasoline pricing. For each issue, the number of "myside" arguments was significantly greater than the number of "otherside" arguments. The bias towards "myside" arguments was significantly greater for the issue regarding tuition than for organ donation and gasoline prices, indicating a relationship between "myside" bias and more personal issues. Controlling for age and cognitive ability, Toplak and Stanovich also found that the tendency towards "myside" bias significantly decreased with each year of post-secondary education. Overall, these results indicate that length of time as a student at the university has a dampening effect on "myside" bias, allowing individuals to give more consideration to multiple sides of an issue. This progression appeared to have a greater impact on more personal issues, such as tuition costs and gasoline prices, which may be assumed to reflect more fiscally-minded undergraduate students. This conclusion agrees with Lehmann's (1963) quantitative findings that students became more open-minded and increased their critical thinking throughout their student years at the university.

Interventions. Although the above three studies (Lehmann, 1963; Pascarella, 1987; Toplak & Stanovich, 2003) illustrate a general trend of improvement, there is very little research as to what specifically and naturally influences this improvement in critical-thinking/informal-reasoning skills during the four years of college. However, more research can be found on the types of interventions that lead to an increase in informal-reasoning skills. Research on younger individuals indicated that the explicit instruction of reasoning models may have a direct effect on informal-reasoning skills (Jiménez-Aleixandre, Rodríguez, & Duschl, 2000; Osborne et al., 2004; Zohar, 1996; Zohar & Nemet, 2002). However, due to maturation, these findings cannot be taken as a given for college-age individuals. There are very few studies on informal reasoning and college students aside from those above, which can describe only general trends. In a critical literature review on the improvement of informal and Piagetian hypothetico-deductive reasoning in introductory college biology courses, Daempfle (2002) found only nine studies published in a research journal when limiting his search to those that included college students in an introductory biology course, empirical research on instructional methods, and reasoning as an output variable. Altogether, Daempfle found no outstanding studies, but he did find general characteristics of those studies that demonstrated an increase in both informal- and Piagetian hypothetico-deductive-reasoning skills. These important characteristics included a focus on writing, direct teaching of reasoning models, and length of time on instruction. In addition, most of the studies were conducted in a non-traditional/inquiry/collaborative environment. Therefore, it is difficult to determine whether the characteristics or the environment elicited the positive effects. However, a study by van Gelder and Bissett (2004) investigated the effect of deliberate practice, which included modeling and feedback, by undergraduates in an introductory reasoning course. They found a moderate, significant correlation between hours engaged in deliberate practice and an increase in informal-reasoning skills. Although not all confounding variables could be accounted for due to the voluntary participation of the subjects, this study, combined with the previously mentioned studies, gives a strong indication of the positive effect of direct instruction and/or modeling.

Problems with informal reasoning. Although there is a general trend of improvement throughout the four years of college, there still remains much to be desired and encouraged in undergraduate informal-reasoning skills. Baron (1991) and Perkins et al. (1991) both assert that there are two major areas of difficulty, independent of age, when it comes to informal reasoning: bias/fairness and incomplete evidence/lack of "optimal search." As seen in Toplak and Stanovich (2003), individuals tend to view and create arguments primarily from a "myside" bias. Those individuals who demonstrate better informal-reasoning skills are praised for their ability to consider multiple sides of an argument. In addition, those individuals who consider and seek out multiple pieces of evidence also, by definition, demonstrate better informal reasoning. Perkins et al. (1983) had touched on both these concepts with the proposition that most individuals are poor reasoners who utilize a "makes-sense epistemology," not going beyond their own beliefs or knowledge to discern a problem. This is supported by additional studies illustrating a significant increase in the number and quality of sentences generated for the side of an issue aligned with one's own position and personal interest (Perkins, 1985; Woll, Navarrete, Sussman, & Marcoux, 1998). Ironically, when Baron investigated what undergraduates value with regard to quality of thinking, students valued two-sided arguments.

Cerbin (1988) discusses several areas in which undergraduate students have difficulties in informal reasoning. One area for improvement is undergraduates' failure to explore the consequences of claims. Cerbin also agreed with Baron (1991) and Perkins et al. (1991) that undergraduates present underdeveloped arguments with inadequate use of evidence; however, Cerbin posited that this could partially be due to a lack of well-organized content knowledge. This belief is similar to Means and Voss' (1996) two-component model of informal reasoning, which held content knowledge as important for good argumentation. Both these assertions could also be related to research findings demonstrating the difficulty that undergraduates and other individuals have in differentiating between evidence and claims, and the consequent poor evaluation of the connection between the two. This poor evaluation is possibly due to a focus on the accuracy of the evidence/claims over the plausibility of their linkage (D. Kuhn, 1992; 1993a; 1993b; Shaw, 1996). Once again, ironically, students in Baron's study highly regarded correctness of content along with consistency between conclusions and arguments.

Reasoning Through Argumentation in Science

Coordination of theory and evidence. Deanna Kuhn (1992) focused on the use of argument as a way of reasoned, everyday thinking. She theorized that everyday, informal reasoning mentally took the form of a dialogic argument: recognizing two opposing assertions, weighing the evidence, and determining the relative merit of each assertion.

Using social issues, such as prisoners returning to a life of crime, failure of students in school, and causes of unemployment, D. Kuhn investigated the presence and quality of argumentation in subjects’ thinking on each of the problems. Subjects, ranging from ninth-grade students to individuals in their 20s, 40s, and 60s, were grouped according to education level and sex. Overall, D. Kuhn found that the characteristics of dialogic argument were present in the subjects’ reasoning regarding the social issues; however, they were not present to a great extent. The percentage of success ranged from 25% to 60% on each measure: quality of evidence, generation of alternative theories, creation of counterarguments, and development of rebuttals. When investigating factors contributing to reasoning success, only education level demonstrated a significant correlation. The absence of significant correlations with age, gender, and topic knowledge suggested that informal reasoning via argumentation constitutes a general set of innate abilities/skills, agreeing with the work of Perkins (1985) and Voss and Means (1991). It is also interesting to note that the adolescents were prospectively classified as to education level, yet still showed no significant difference from the other age groups, implying that this general reasoning skill may have peaked by late adolescence, also in agreement with Perkins.

D. Kuhn (1992) also found that only 15% of the subjects displayed an evaluative epistemology that values thinking, evaluation, and argument in knowing, similar to the epistemology of science theorized by T. Kuhn (1996) and Lakatos (1993). This result agreed with findings from her other studies that looked specifically at individuals’ scientific thinking (D. Kuhn, 1993a; 1993b). In these studies, lay adults, adolescents, and children were observed as they created experiments and analyzed data. The results demonstrated that subjects had difficulty differentiating between evidence and theories, especially in recognizing the theory as something separate to reflect upon using both confirming and refuting evidence. D. Kuhn found that in both a social (1992) and a scientific context (1993a; 1993b), the confusion between evidence and theory led subjects to selectively assimilate new positive evidence into their own theories without much evaluation. Individuals were even found to adjust their theories, albeit subconsciously, to fit the new evidence. Without this separation and coordination, it is very difficult to evaluate theories and therefore, by definition, to employ an evaluative/scientific epistemology and approach to problem solving. Together, these findings provided the link, through dialogic argumentation, between everyday, informal reasoning and scientific reasoning. In both instances, individuals recognized opposing assertions, weighed the evidence, and determined the relative merit of each assertion to come to a conclusion, although not without difficulty.

Taking this further, Hogan and Maglienti (2001) attempted to understand where students and scientists currently stand in their abilities to coordinate theory and evidence, and how scientists came to understand the fundamentals of this type of reasoning. They investigated these differences by comparing the responses of middle school students and working scientists to questions judging the validity of hypothetical conclusions based on hypothetical data. By comparing the students, who represented lay people, with the scientists, Hogan and Maglienti found that lay individuals demonstrate faulty scientific reasoning by valuing causal claims without appropriate grounds or warrants, attaching their own personal inferences to the conclusions, and using those inferences to validate vague claims. Scientists, on the other hand, valued conservative claims clearly grounded in the data, appreciated the consideration of alternative explanations, and rated poorly those conclusions with vague statements, illustrating an evaluative epistemology as defined by D. Kuhn (1992). In addition, Hogan and Maglienti found that scientists claimed to have developed their own reasoning skills through observing and modeling mentors, as well as through considering critiques by experts in their field. Overall, this study elucidated the current dichotomy between novice and expert scientific reasoners, as well as the general path through which one becomes an advanced scientific reasoner. Although this study provides critical information regarding the current state of science education, it does not offer a readily feasible way for teachers to aid students in increasing their higher-order thinking skills and scientific reasoning.

Interventions. Some researchers have used the information regarding difficulties in coordinating theory and evidence to determine what types of interventions may help.

Zohar (1996) sought to determine whether a learning environment designed with constructivism and cognitive conflict in mind could increase the number of valid inferences that eighth- and ninth-grade students made regarding causal evidence. Similar to D. Kuhn (1993a; 1993b), students were encouraged to design experiments and analyze data during a four-class-period lesson. Students were first allowed to investigate as they desired, but were then given some instruction and cuing regarding variable control. Comparing pre- and post-tests, Zohar found that the number of valid inferences had increased from 11% to 77%, with a decrease in the total number of inferences, indicating a more systematic and focused approach. The students who scored perfectly on the post-test were also given a transfer problem and a delayed post-test. These students also demonstrated an increased number of valid inferences, with 87.5% on the transfer problem and 85% on the retention problem. However, the number of students included in the transfer/retention portion was small and not representative of the population, no statistical test outcomes were reported for either result, and the learning environment combined many different constructivist elements. Still, the results may be conservatively regarded as positive for the influence of explicit instruction on reasoning through data and conclusions.

Building on this information, Zohar and Nemet (2002) investigated the outcomes of specific, explicit instruction in general reasoning patterns within science content for ninth-grade students. Before instruction, students were able to create arguments, counterarguments, and rebuttals, albeit often with a simple structure containing only one justification. This was similar to findings by Jiménez-Aleixandre et al. (2000), who determined that ninth-grade students’ arguments in genetics were dominated by the identification of claims without the justifications and warrants to support them. Zohar and Nemet discovered that the students who received additional training in argumentation skills within a unit on genetics displayed increased content knowledge, were significantly better at applying that knowledge correctly in socioscientific arguments regarding genetics, and significantly increased the complexity of their arguments in comparison to the control group. Although it is difficult to disentangle the influence of the explicit instruction from the increase in the social aspects of the experimental classroom, the conclusions regarding the positive effects of explicit instruction were much more sound than those of Zohar (1996).

Findings from Zohar and Nemet (2002) are complemented by those of Osborne et al. (2004), who also investigated the usefulness of a specific intervention on the reasoning skills of eighth-grade students. This study was the second phase of a professional-development program that designed argumentation materials for teachers to use in their classrooms. Osborne et al. studied teacher-led group discussions of both scientific and socioscientific issues. They found that students appeared to increase the quality of their argumentation after a year of intervention, as inferred from results that were positive but not statistically significant. However, this study also presents two more interesting points. First, both the experimental and the control groups, which were taught by the same teacher, increased in the quality of their argumentation, suggesting that improvement may be teacher-specific. Second, although eight lessons were taught as argumentation in a scientific context, the quality of argumentation was significantly better in the socioscientific context than in the scientific context. This once again possibly indicates the importance of content-specific knowledge in the creation of strong arguments. In both of these studies, it is not known whether the instruction or the classroom environment that encouraged argumentation contributed to the increased reasoning skills; however, these findings tend to suggest the value of explicit instruction in reasoning skills.

Taken together, these four studies (Jiménez-Aleixandre et al., 2000; Osborne et al., 2004; Zohar, 1996; Zohar & Nemet, 2002) also elucidate two important points regarding students’ ability to generate sound scientific reasoning in the form of arguments: without intervention, students tend to develop simple arguments with few justifications, and students encounter less difficulty in arguing socioscientific issues than scientific issues. These two findings are corroborated by Sadler’s (2004) critical review of the literature linking informal reasoning and socioscientific issues. Regarding the first point, Sadler found that, in general, students tend to display shallow analyses when considering evidence. This coincides with the findings of previous studies in which students valued vague, weakly substantiated conclusions, created simple arguments with single justifications, and tended to lack sufficient warrants and rebuttals. If students do not consider multiple pieces of data at a reasonable depth, then their arguments will in turn be weak and simple. Second, Sadler points out that personal connections to socioscientific issues tend to demonstrate a higher correlation with higher-order thinking/informal reasoning, similar to the positive findings of both Osborne et al. (2004) and Zohar and Nemet (2002). This point is also highlighted by the lack of improvement in students’ reasoning on scientific issues found by Osborne et al., although weakly contradicted by Zohar. Two explanations for this positive connection with socioscientific issues are possible. First, students may be more knowledgeable about, and personally connected to, issues with a social component, granting them a broader source of information from which to draw grounds, warrants, and backing. Second, although Sadler’s literature review supplies both supporting and contradictory evidence, and Sadler and Zeidler’s (2005) work demonstrates that an increase in content knowledge is associated with a decrease in flawed reasoning, it is possible that students’ lack of conceptual understanding may impede their ability to generate sound and complex arguments regarding scientific issues, owing to a lack of known grounds, warrants, and backing.

Summary

Two points need to be made regarding the current state of research on argumentation and informal reasoning. First, there are strong studies linking informal-reasoning ability to general ability and education (D. Kuhn, 1992; Means & Voss, 1996; Perkins, 1985; Voss & Means, 1991). For these results, it is important to consider that some selectivity with regard to ability occurs among individuals who pursue higher education. Therefore, the studies linking informal-reasoning ability to educational level may be informed by those regarding general ability. However, the data demonstrating a positive correlation between education and informal-reasoning ability are also supported by studies illustrating an improvement in critical thinking across four years of college (Lehmann, 1963; Pascarella, 1987; Toplak & Stanovich, 2003). Therefore, it is likely that ability and education level interact to improve informal-reasoning and argumentation skills in informal settings.

Second, it is difficult to determine the influence of content knowledge on informal-reasoning ability and argumentation. Most of the studies discussed here, especially those involving undergraduates, utilized social or socioscientific issues to determine students’ informal-reasoning abilities. As informal reasoning is often considered part of the everyday reasoning domain, it is natural and appropriate to utilize issues from everyday life. Some of these studies have tried to account for prior thought and knowledge about the topics, with mixed results (Means & Voss, 1996; Perkins, 1985; Perkins et al., 1983). Means and Voss (1996) have even tried to explain the confusion with their two-component model of reasoning, separating general informal-reasoning skills from content knowledge. Support for this view can also be found in Perkins and Salomon’s (1989) historical review of the debate over the context-dependency of cognitive skills. They proposed that there are general cognitive skills that adapt to domain-specific areas, with some domains more readily adapted to than others because of use and knowledge base. This view is also supported by studies demonstrating that the more personal the issue and the more intimate the knowledge, the better the reasoning (Perkins, 1985; Sadler & Zeidler, 2005; Toplak & Stanovich, 2003; Woll et al., 1998). Overall, the relationship of content knowledge to informal-reasoning ability and argumentation is tenuous, depending on the topic and participants studied.

Taken together, this information could have an impact when expanding this research to a scientific context for biology majors. One might expect that science majors, who have chosen to attend college and who should have greater knowledge of and interest in their field of study, would exhibit strong argumentation skills. However, the studies that examined reasoning on science problems did not involve college students, and they demonstrated difficulties in science-issue argumentation (Hogan & Maglienti, 2001; Osborne et al., 2004). Therefore, it is problematic to extrapolate these results with any certainty. Altogether, the data from studies of informal reasoning and argumentation provide very little information about undergraduate science majors’ actual abilities in argumentation, especially with regard to its use in science. One can only infer that biology majors will show difficulties with coordinating theory and evidence and with displaying an evaluative epistemology similar to those found in both lay and younger individuals.

CHAPTER 3

METHODS

Research Design

This study is descriptive in nature. The research design was created to elicit information regarding the development of students’ scientific reasoning (SR) abilities through the students’ natural participation in the first two quarters of an undergraduate introductory biology course sequence. No particular intervention with regard to SR was given. To this end, participants were assessed during their regularly scheduled lab sections at the beginning of their first quarter of biology study, at the end of the first quarter, and at the end of the second quarter. At the university chosen, this corresponded to the autumn 2006 and winter 2007 quarters. This pattern of assessment was designed to determine the students’ SR skill proficiency at the beginning of their study, at the mid-point of the course sequence, and at the end of the course sequence.

With this research design, several threats to the internal validity of the study must be addressed. Foremost, the threat due to repeated testing was addressed by the use of two parallel forms of each instrument and by the length of time between administrations. In the first administration at Week 1, Form A of each instrument was given; at Week 10, Form B of each instrument was administered. Form A was then given again at the end of the second quarter, a total of 22 weeks after the first administration. Although the format of the two instrument forms was identical, the items were not. This design, combined with the long periods of time between testing, served to reduce the testing threat. In relation to the preparation of two forms, the threat due to instrumentation is of concern. To address this, both forms of each instrument were piloted to ascertain reliability and correlated for similarity. The argumentation instrument was also face-validated, while the hypothetico-deductive reasoning instrument has been repeatedly validated, as reported in the literature. Lastly, due to the passage of 23 weeks between the data collection events, the threat due to maturation needs to be addressed. This threat is decreased by the relative maturity of the subjects in the study. Although the study focuses on SR development, most participants needed to have reached a higher level of maturation to be enrolled in a biology-majors class at the main campus of the university. In addition, only participants over the age of 18 were considered.

The larger threats to this research design lie in threats to external validity. This is a mostly descriptive study designed to give some indication of a baseline of SR development. The results cannot be generalized much beyond the population used. This is due to the threat of interaction between testing and the students’ natural development: it is difficult to generalize results to individuals who have not been pre-tested, especially when direct testing of SR skills is assumed to be an uncommon experience. In addition, the interaction of self-selection with the study must be considered. Participation in this study was strictly voluntary. This may skew the results toward those students who were enthusiastic about science and eager to gauge their own aptitude with regard to SR skills. However, the large number of participants for the first two administrations and the diversity of the university may ameliorate this threat.

Participants

The target population for this study was biology majors in the first year of coursework in the major curriculum. Biology majors include those declaring a major in biology, biochemistry, entomology, evolution and ecology, microbiology, molecular genetics, plant cellular and molecular biology, and zoology. The accessible population consisted of volunteer students, over the age of 18, enrolled in the introductory biology course sequence for majors at a large Midwestern university during the autumn quarter 2006 and winter quarter 2007. This accessible population is believed to be appropriate for this study due to the diverse student population available at one of the largest universities in the United States. Participants were recruited verbally and in writing during their lecture class and completed the instruments during their lab section. To maintain the anonymity of those choosing to participate, all students were asked to complete the instruments. Data from all participants who volunteered were utilized, including 412 students at the first autumn administration, 344 students at the second autumn administration, and 59 students at the winter administration. Of the original 460 volunteers, 30 participants completed either the hypothetico-deductive (HD) reasoning or the argumentation portion of all three administrations. To account for participants who did not participate in all aspects of the study, all statistical tests were selected to exclude individuals list-wise; i.e., those who did not provide complete data for both dependent variables tested were not included. A summary of all the participant characteristics is presented in Tables 1 and 2, as well as Figure 5. Overall, the typical participant was a 20-year-old sophomore non-biology major with plans to earn a B.S. and attend professional school, having taken three previous college-level science courses and one Advanced Placement (AP) course.
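The list-wise exclusion rule described above can be illustrated with a short sketch. This is a hypothetical illustration with invented records, not the study’s actual analysis code, which was presumably run in a statistics package.

```python
# Hypothetical sketch of list-wise exclusion: a participant enters a given
# statistical test only with complete data on both dependent variables.
# None marks a missing score; the records below are invented.
participants = [
    {"id": 1, "hd": 9,    "arg": 7},     # complete -> kept
    {"id": 2, "hd": None, "arg": 6},     # missing HD score -> excluded
    {"id": 3, "hd": 11,   "arg": None},  # missing argumentation score -> excluded
]
complete = [p for p in participants if p["hd"] is not None and p["arg"] is not None]
print([p["id"] for p in complete])  # only participant 1 remains
```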

Variable                              Biology Majors   Not Biology Majors

Total n                                    141               319
Gender                  Female              74               179
                        Male                67               140
Age (years)             M                19.96             20.36
                        SD                2.01              2.32
University Rank         Mode         Sophomore         Sophomore
                        n                   78               162
Number of Years as an   M                 2.01              2.36
Undergraduate           SD                1.15              1.30
Number of Science       M                 3.49              3.25
Courses Taken           SD                2.26              2.08
Number of AP* Courses   M                 0.97              0.71
Taken                   SD                1.30              1.21

Note. Age, number of years as an undergraduate, number of science courses taken, and number of AP courses taken are normally distributed. *AP = Advanced Placement.

Table 1. Demographics of All Participants by Major

Variable                                   Biology Majors   Not Biology Majors

Total n                                         141               319
Degree Sought        A.S.                         0                 1
                     B.A.                        23                50
                     B.S.                       116               239
                     Graduate                     5                14
Post-Baccalaureate   Job                         11               104
Plans                Bachelor Degree              2                23
                     Science Grad School         35                67
                     Other Grad School            4                29
                     Professional School         96               129

Note. Total n in each variable category does not add up to the overall total n because some individuals indicated multiple degrees sought or several post-baccalaureate plans.

Table 2. Distribution of Future Plans for All Participants by Major

[Figure 5, a bar chart of the number of participants by declared major, appeared here; the chart is not reproducible from the extracted text. Biology was the largest category, with 141 participants; the other majors shown were Agriculture, Allied Medical, Business, Education, Engineering, Humanities, Natural Resources, Other Science, Pharmacy, Undecided, and Other.]

Figure 5. Distribution of all 460 participants by major.

Outcome Measures

Dependent Variables Data Collection

Hypothetico-deductive reasoning. Student hypothetico-deductive (HD) reasoning ability was assessed using the Lawson (2000) revised, multiple-choice edition of the Lawson Classroom Test of Scientific Reasoning (LCTSR). The instrument has well-established validity and reliability; for example, Lawson, Banks, and Logvin (2007) reported a posttest Kuder-Richardson 20 internal-consistency reliability coefficient of 0.79. The LCTSR assesses several aspects related to HD reasoning, such as conservation of weight and volume, probability, proportionality, correlations, control of variables, and HD reasoning directly. The LCTSR consists of 12 scenarios, each followed by two questions. There are six pairs of similar scenarios, each pair addressing one of the aspects related to HD reasoning. Within each scenario, the first question focuses on the scenario content specifically, while the second question asks for the reason the first answer is correct. Each answer choice for the first question has a corresponding reason in the second question.

To respond to the testing threat to internal validity, the LCTSR was piloted to determine the reliability of each half of the split test and was administered as equivalent forms. In addition, it is noted that the full LCTSR has regularly been used in the literature as a pre- and post-test with college students (e.g., Johnson & Lawson, 1998; Lawson, 1980; 1992a). The split divided the six pairs of similar scenarios into two forms, each with six scenarios and twelve questions in total. Participants were given a score based on the total number of items answered correctly, as compared against a right/wrong answer key. The form order for the autumn administrations was randomly decided, and the first form (A) was given again at the end of the winter quarter.

             Form A                              Form B

Original LCTSR    Form A             Original LCTSR    Form B
Question Pair     Question Pair      Question Pair     Question Pair
1, 2              A1, A2             3, 4              B1, B2
7, 8              A3, A4             5, 6              B3, B4
9, 10             A5, A6             15, 16            B5, B6
11, 12            A7, A8             13, 14            B7, B8
17, 18            A9, A10            19, 20            B9, B10
21, 22            A11, A12           23, 24            B11, B12

Table 3. Original LCTSR Question Distribution to Forms A and B

Cronbach’s alpha for the 12 items of Form A was α = 0.53, p < 0.001 (n = 391), and for the 12 items of Form B, α = 0.67, p < 0.001 (n = 318). There was a moderate correlation between the two forms (r = 0.42, n = 299, p < 0.001). Although these Cronbach’s alphas were lower than generally accepted for a social science instrument, it is important to note that Cronbach’s alpha assesses the homogeneity of the instrument with respect to what it purports to measure. Splitting the instrument into two halves that were originally part of one measure reduced, as expected, the original redundancy that had provided substantial interrelatedness among item responses. To confirm this, separate principal components analyses on Forms A and B each demonstrated six factors corresponding to the six scenarios/pairs of questions (Tables 4 and 5). The results from these analyses corroborate the low Cronbach’s alpha values for the LCTSR forms by illustrating that some of the lack of homogeneity is inherent to the instrument and to the process of dividing it into two forms.
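For readers unfamiliar with the statistic, the reliability figures above follow the standard formula α = k/(k−1) × (1 − Σ item variances / variance of total scores). The following is a minimal pure-Python sketch for dichotomously scored items, shown with invented toy data; the study’s values were of course computed from the actual response matrices.

```python
# Minimal sketch of Cronbach's alpha for dichotomous (0/1) item scores:
# alpha = k/(k-1) * (1 - sum of item variances / variance of total scores).
def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def cronbach_alpha(scores):
    """scores: one row per participant, each row a list of k item scores."""
    k = len(scores[0])
    items = list(zip(*scores))             # per-item columns
    totals = [sum(row) for row in scores]  # per-participant total scores
    return k / (k - 1) * (1 - sum(variance(col) for col in items) / variance(totals))

# Perfectly consistent items yield alpha = 1; unrelated items drive it toward 0.
print(cronbach_alpha([[0, 0], [1, 1], [0, 0], [1, 1]]))  # 1.0
```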

                              Component

Item          1      2      3      4      5      6    Communality
A5           .96                                          .94
A6           .96                                          .94
A9                  .91                                   .84
A10                 .90                                   .83
A1                         .88                            .79
A2                         .88                            .78
A7                                .83                     .69
A8                                .83                     .69
A3                                       .76              .64
A4                                       .85              .73
A11                                             .78       .63
A12                                             .80       .65
Eigenvalue  2.25   1.69   1.56   1.36   1.19   1.10      9.15
% Total
Variance    18.8   14.1   13.0   11.4   10.0    9.0      76.3
% Trace     24.6   18.5   17.0   14.9   13.0   12.0     100.0

Note. Varimax orthogonal rotation used. n = 391. Loadings less than 0.3 are not printed. Component 1 = Control of Variables, Component 2 = Probabilistic Thinking, Component 3 = Conservation of Mass, Component 4 = Advanced Control of Variables, Component 5 = Advanced Proportional Thinking, and Component 6 = HD Thinking.

Table 4. Principal Components Analysis Rotated Factor Loadings of LCTSR Form A

                              Component

Item          1      2      3      4      5      6    Communality
B1           .97                                          .96
B2           .95                                          .95
B3                  .90                                   .86
B4                  .88                                   .86
B9                         .85                            .78
B10                        .89                            .80
B5                                .72                     .61
B6                                .80                     .69
B7                                       .64              .54
B8                                       .75              .62
B11                                      .42      .50     .52
B12                                              .86      .79
Eigenvalue  2.97   1.67   1.31   1.07   1.06    0.91     8.99
% Total
Variance    24.7   13.9   10.9    8.9    8.8    7.6      74.8
% Trace     33.0   18.6   14.6   11.9   11.8   10.1     100.0

Note. Varimax orthogonal rotation used. n = 318. Loadings less than 0.3 are not printed. Component 1 = Conservation of Volume, Component 2 = Proportional Thinking, Component 3 = Correlational Thinking, Component 4 = Probabilistic Thinking, Component 5 = Advanced Control of Variables, and Component 6 = HD Reasoning.

Table 5. Principal Components Analysis Rotated Factor Loadings of LCTSR Form B

Argumentation. Student argumentation ability was assessed via a paper-and-pencil instrument developed specifically for this study (Appendix A). Multiple forms of the instrument were piloted to assess equivalence, content validity, and reliability. Experts in biology and education established face validity. Participants were asked to respond to a set of questions designed to elicit their inductive-reasoning patterns. These patterns were assessed through the argumentation supporting participant-derived conclusions from a given scenario and data set relating to evolution (Form A) or ecology (Form B). Students’ abilities to express five aspects of Toulmin’s argumentation pattern (TAP), namely the grounds used, the claims made, the warrants used, and the ability to identify and to rebut counterarguments, were examined using open-ended questions. Argumentation quality was assessed using a rubric with a score ranging from 0 to 2 per item (Appendix B). The assessment focus was not on the correctness of the content of each item, but rather on the presence and articulation of the aspect assessed by that item, as well as internal coherence among answers. Each item was scored independently, and the item scores were then summed to create a composite score for each student. The reliability of the five-item forms was assessed with Cronbach’s alpha: for Form A, α = 0.68, p < 0.001, n = 412; for Form B, α = 0.72, p < 0.001, n = 344. When analyzed by principal components analysis, both forms also demonstrated two subscales relating to the types of questions asked (Table 6).

Form A
                  Component
Item             1      2    Communality
A4              .87              .56
A5              .89              .56
A1                     .73       .56
A2                     .75       .56
A3                     .66       .47
Eigenvalue     2.22    .98      3.20
% Total
Variance       44.4   19.7      64.1
% Trace        69.4   30.6     100.0

Form B
                  Component
Item             1      2    Communality
B4              .87              .81
B5              .90              .83
B1                     .72       .62
B2                     .80       .65
B3                     .70       .55
Eigenvalue     2.38   1.07      3.45
% Total
Variance       47.6   21.3      68.9
% Trace        69.0   31.0     100.0

Note. Varimax orthogonal rotation used. Loadings less than 0.3 are not printed. n = 412 for Form A and n = 345 for Form B. For each of Forms A and B, Component 1 = Alternative Explanations and Component 2 = Initial Argument.

Table 6. Principal Components Analysis Rotated Factor Loadings of Argumentation Forms A and B

The first argumentation subscale (represented by Component 2), comprising items 1, 2, and 3 on each form, represents the participants’ ability to generate and support an argument. The other subscale (represented by Component 1), comprising items 4 and 5, represents the participants’ ability to recognize and rebut an alternative explanation. Each subscale on each form was checked for reliability using Cronbach’s alpha (Table 7). Although not all of the values reach the commonly accepted threshold of α > 0.7, they are sufficient for use in this study.
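The composite and subscale scoring described above can be summarized in a short sketch. The function name and input format are illustrative, not taken from the study.

```python
# Illustrative sketch of the rubric scoring: five items scored 0-2 each,
# summed to a composite, with the two PCA-derived subscales formed from
# items 1-3 (Initial Argument) and items 4-5 (Alternative Explanations).
def score_argumentation(item_scores):
    """item_scores: the five rubric item scores, in item order 1-5."""
    assert len(item_scores) == 5 and all(0 <= s <= 2 for s in item_scores)
    return {
        "composite": sum(item_scores),                     # 0-10 overall
        "initial_argument": sum(item_scores[:3]),          # items 1-3, 0-6
        "alternative_explanations": sum(item_scores[3:]),  # items 4-5, 0-4
    }

print(score_argumentation([2, 1, 2, 0, 1]))
```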

                      Form A                          Form B

              Initial      Alternative       Initial      Alternative
              Argument     Explanations      Argument     Explanations
n                413           412              346           345
Cronbach’s α    0.54          0.76             0.63          0.80

Table 7. Reliability of Argumentation Forms A and B Subscales

Independent Variables Data Collection

Participants were asked for demographic information to correlate with the dependent-variable data. This information was collected during the LCTSR and argumentation instrument administrations at the beginning of the autumn 2006 quarter (AU1) and at the end of the winter 2007 quarter (WI). Participants were asked for their declared or planned major, university rank, number of years at the university, post-graduation goals, number of previous AP and undergraduate science courses taken, gender, and age. In the winter quarter, participants were also asked when they enrolled in the first quarter of the introductory course sequence or its equivalent and the type of institution where they took the equivalent (Appendix C). Each of these variables was collected to account for possible influences and intervening variables affecting the dependent variables.

Because the introductory courses at the Midwestern university where the study was conducted have dual lecture/laboratory components, instructional evidence was collected in the form of lecture and laboratory syllabi, assignment instructions, PowerPoint lecture notes, sample exams, and laboratory manuals. These materials were examined for evidence of emphasis on HD reasoning and argumentation, as well as for the level of inquiry encouraged in the laboratory portion of each course. Laboratory exercises were scored on a scale of 1 to 5, based on the five levels of inquiry identified by Bonstetter (1998), to give an indication of the average level of inquiry involved.

CHAPTER 4

RESULTS AND CONCLUSIONS

Characterization of Courses

The two courses in the introductory biology sequence in this study each consisted of a lecture section and a corresponding lab section. Students attended lecture twice per week and laboratory twice per week for the first biology course (“Biology 1”), but attended laboratory once per week for the second biology course (“Biology 2”). The courses traditionally have had large enrollments, with more than 20 lab sections per quarter. The enrollment in autumn 2006 for Biology 1 was approximately 600 and decreased to approximately 400 in winter 2007 for Biology 2. With these class sizes, the two lecture instructors for Biology 1 and the one instructor for Biology 2 relied on PowerPoint presentations with a focus on factual information. These presentations were provided to the students through the online classroom.

Assessments for the students primarily consisted of multiple-choice exams in lecture and short multiple-choice quizzes on lecture and laboratory information. The questions on the Biology 1 exams were typically factual recall with some application; any data given on exams were for calculation-based problems, such as Mendelian genetics. No exam samples were collected for Biology 2. Other assignments consisted of worksheets for laboratory exercises and a New York Times assignment for Biology 1. This essay assignment required students to investigate a topic in biology based on science articles found in the New York Times. The assignment was largely fact-based, and students were not asked to investigate the socioscientific aspects of their topics; however, news articles were discussed on a regular basis in the lectures for both quarters. Students also completed the laboratory exercises characterized in Table 8 according to Bonstetter (1998).

                      Level of Inquiry

                      Traditional   Structured   Guided
Course                Level 1       Level 2      Level 3   Mean Level

Biology 1 (Autumn)    3             4            1         1.75
Biology 2 (Winter)    7             7            0         1.50

Note. Mean level calculated by multiplying the number of laboratory exercises at each level by that level of inquiry, summing those values, and dividing by the total number of laboratory exercises.

Table 8. Laboratory Exercises Characterization by Level of Inquiry
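The weighted-mean calculation described in the table note can be made concrete with a short sketch (plain Python; the exercise counts are those tabulated above, while the function name is ours):

```python
# Mean level of inquiry: weight each level by its number of laboratory
# exercises, then divide by the total number of exercises (see table note).
def mean_inquiry_level(counts):
    """counts maps Bonstetter inquiry level -> number of lab exercises."""
    return sum(level * n for level, n in counts.items()) / sum(counts.values())

biology_1 = {1: 3, 2: 4, 3: 1}  # Traditional, Structured, Guided
biology_2 = {1: 7, 2: 7, 3: 0}

print(mean_inquiry_level(biology_1))  # 1.75
print(mean_inquiry_level(biology_2))  # 1.5
```

The computed values reproduce the Mean Level column of Table 8.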

Traditional laboratory exercises (Level 1) were more "cookbook," with the instructor providing everything from the topic to the conclusions. Examples of these labs were microscope identification of cell organelles in Biology 1 and multiple dissections in Biology 2. In the structured labs (Level 2), students were given all materials, a step-by-step procedure, and data analysis guidance, but they determined their own conclusions. Setting aside the three dissection labs, this was the dominant type of laboratory exercise employed during this study. In the guided inquiry labs (Level 3), students were provided with topics, questions, and materials but were allowed to develop their own procedures and data analyses. The example of this type of laboratory was the characterization of enzyme kinetics in Biology 1.

Evidence of HD reasoning and argumentation was found in those laboratory exercises classified as Level 2 or 3. With Level 2 exercises, students were encouraged to recognize the hypotheses and predictions and to generate a conclusion based on the evidence they collected. They were often asked to explain what they "expect[ed]" and "why," although students' "reasoning" for answers was required only irregularly. However, one Biology 1 and two Biology 2 Level 2 exercises also asked students to determine or explain away alternative explanations. The Level 3 exercise in Biology 1 was more intellectually demanding with regard to scientific reasoning, as students were required to explain their experimental designs and address multiple hypotheses for their questions of interest. This process was socially constructed within their laboratory groups and as a class, mimicking the process of scientific evidence sharing.

Overall, the pedagogical design and execution of the Biology 1 and Biology 2 courses were what would be expected of a typical large university lecture course. Factual recall was emphasized, with laboratories focused primarily on structured inquiry. This is most likely a common logistical solution for classes with large numbers of students; however, it is also emphasized in the course objectives given in the Biology 2 syllabus. The courses do encourage students to seek science information outside of lecture with the New York Times assignments; however, students are still seeking more factual information on a topic rather than focusing on the way in which that information is gathered and displayed. These pedagogical choices have not been demonstrated to have significant effects on scientific reasoning, although they are implicitly assumed to do so.

Hypothetico-Deductive Reasoning

Initial Distributions

The initial distribution of the LCTSR scores for each administration was first investigated to determine the normality of the data and to discern any readily distinguished patterns (Table 9). The second autumn administration (AU2) appears lower than either the first autumn administration (AU1) or the winter administration (WI). The AU1 and WI administrations appear similar, while the standard deviations for both the AU2 and WI administrations are 0.5 to 1 point larger, indicating more spread of scores in the later administrations. The average scores are approximately 8. This total score on the LCTSR, adjusted for the use of a total point score and the split-half version, corresponds to the low end of scores for formal reasoning ability (Lawson et al., 2007). In addition, the percentage of individuals who scored in this range was slightly higher than the previously reported 50% of non-major biology students (Johnson & Lawson, 1998; Lawson, 1992a). This finding is encouraging but could also be expected, given the assumption that students enrolled in a science major course would have a natural aptitude for science and its related skills. Lastly, because the introductory biology course sequence also serves other majors, biology majors were compared with the other participants. There appeared to be no difference between the biology majors and the other students in the course.

              AU1             AU2             WI
              Bio   Not Bio   Bio   Not Bio   Bio   Not Bio

n             116   275       84    234       32    27
M             8.29  7.98      7.65  7.82      8.28  8.85
SD            1.81  1.86      2.56  2.37      2.80  2.60
% Scores ≥ 8  69.0  62.8      53.0  56.2      65.6  81.5

Note. All populations for each administration are normally distributed, save the WI administration for non-biology majors, which is negatively skewed (-1.69). Bio = biology majors, Not Bio = non-biology majors. AU1 = 1st administration in autumn quarter, AU2 = 2nd administration in autumn quarter, and WI = 3rd administration in winter quarter.

Table 9. Average Total LCTSR Scores by Administration and Major

Because the instruments were administered during the students' regularly scheduled lab sections, a repeated measures MANOVA was performed on the two autumn administrations and a one-way ANOVA was performed on the winter administration to determine whether the different laboratory instructors had any effect on the LCTSR scores or on their change. For the autumn administrations, the assumptions of equality of variances were met (Table 10). There was no significant interaction between lab section and LCTSR score change (F21, 277 = 3.80, p = 0.098). There were, however, significant differences among the 22 autumn lab sections (F21, 277 = 1.78, p = 0.021, effect size = 0.12). With a Bonferroni adjustment for the repeated post-hoc tests, section 3 (overall M = 6.05) was found to differ from both section 12 (overall M = 9.25, p = 0.005) and section 14 (overall M = 8.71, p = 0.026).

AU1 AU2

Autumn Lab Section M SD n M SD n

1 7.95 1.67 20 7.10 2.25 20

2 8.17 1.34 12 8.25 1.49 12

3 7.30 1.42 10 4.80 2.20 10

4 8.29 2.63 7 6.86 2.55 7

5 7.93 1.64 14 7.71 2.30 14

6 7.94 1.57 16 7.44 1.93 16

7 8.60 2.70 5 7.80 1.79 5

8 8.50 1.65 10 6.10 2.77 10

9 8.40 1.35 10 7.70 2.75 10

10 7.50 1.60 8 6.63 1.77 8

11 7.88 2.13 16 7.19 2.37 16

12 9.58 1.00 12 8.25 1.49 12

13 7.82 1.89 11 7.36 2.06 11

14 8.84 1.86 19 8.21 2.15 19

15 8.18 1.47 17 6.65 1.73 17

16 8.38 2.14 13 8.00 2.20 13

17 8.17 1.76 18 7.83 2.36 18

Continued

Table 10. Average AU1 and AU2 LCTSR Scores by Lab Section

Table 10 continued

AU1 AU2

Autumn Lab Section M SD n M SD n

18 8.20 1.85 20 8.00 2.13 20

19 8.20 1.64 5 6.80 2.28 5

20 7.63 1.86 16 6.25 2.57 16

21 8.58 1.61 19 7.68 2.47 19

22 7.53 1.84 19 7.68 2.21 19

For the winter quarter lab sections, a one-way ANOVA was performed. However, the assumption of equality of variances did not hold, and 5 of 12 sections had only one or two participants, which precluded post-hoc tests to identify individual significant differences (Table 11). Therefore, even though the ANOVA demonstrated a significant difference due to section (F12, 46 = 2.89, p = 0.006), no practical information could be gleaned to describe this influence. Because the only statistical differences in the autumn data were found between the lowest and highest average scores, and there was no interaction between score change and lab section, the different laboratory instructors and sections do not appear to be an intervening variable.

Winter Lab Section M SD n

23 8.71 3.15 7

24 11.00 0.00 2

25 8.00 - 1

26 8.88 1.36 8

27 8.50 2.12 2

28 11.50 0.72 2

29 7.80 2.17 5

30 5.43 3.87 7

31 8.00 2.10 6

32 9.67 1.63 6

33 9.83 1.47 6

34 10.4 1.52 5

35 4.00 1.41 2

Table 11. Average WI LCTSR Scores by Lab Section

Relationship of Background Characteristics and LCTSR Autumn Quarter Scores

An exploratory multiple regression analysis was undertaken to determine whether any demographic characteristic, or set of characteristics, influenced LCTSR scores. Given the exploratory nature of the analysis, multiple regressions were performed on both the AU1 and the AU2 LCTSR scores to identify any recurring characteristics and provide cross-validation; the responses to the two administrations were sufficiently repetitive, and scores showed no apparent significant increase, warranting use of the second administration as a cross-validation sample. To keep the ratio of variables to number of individuals low, several stepwise multiple regression analyses were conducted. In addition, because of the repetitive nature of the AU1 and AU2 LCTSR assessments, the AU1 scores were entered by force before the stepwise entry of the other independent variables when the AU2 scores were regressed on them. This accounted for any variance due to a test-retest factor.

The LCTSR scores for each autumn administration were regressed on the demographic variables, including age, gender, years as an undergraduate, number of Advanced Placement courses, and number of college science courses as a set, as well as university rank separately, to determine whether maturation and previous experience influenced LCTSR scores. These variables were chosen based on the relationship of the LCTSR to Piagetian theory, which holds that physical maturation and experience are the factors that influence HD reasoning skills (Piaget, 1972). No consistent predictors were found in either analysis. The LCTSR scores were also regressed on the future plans variables, including degree sought and post-baccalaureate plans, to determine whether future interests may influence HD reasoning. These variables were thought to be possible intervening variables because individuals seeking more science-related degrees and careers were expected to have a greater interest in sharpening their HD skills for their future plans through alternative methods, such as admission test preparation and independent research; this type of preparation may therefore be considered part of HD reasoning experience. These variables also did not demonstrate any consistent relationship with LCTSR scores.

When considering experience as a factor, a final analysis regressed LCTSR scores on participants' choice of major. In the analyses of both the AU1 and AU2 data, a non-biology science major, such as chemistry or physics, was associated with an increase of approximately one point in LCTSR scores, as shown in Tables 12 and 13 (only the regression of the AU2 LCTSR scores is shown for brevity). Although the choice of a non-biology science major is found in both the AU1 and AU2 analyses and is considered cross-validated, other majors were also found to have an effect on LCTSR scores. The analysis of the AU1 data also found a negative effect on LCTSR scores due to choice of an allied health or agriculture major, while the AU2 data demonstrated a positive effect due to declaring a business major. However, these three types of major have very little in common and cannot be considered as cross-validation for each other.

                              Intercorrelation
Variables                 X1    X2     X3     Y     M     SD

AU1 LCTSR Scores (X1)     1.00  0.15   0.03   0.42  8.14  1.76
Other Science Major (X2)        1.00  -0.04   0.17  0.06  0.23
Business Major (X3)                    1.00   0.11  0.02  0.15
AU2 LCTSR Score (Y)                           1.00  7.77  2.45

Note. n = 299. For the Other Science Major, 1 = Declared Major and 0 = Not Declared Major. Only Other Science Major is consistent when both the AU1 and AU2 LCTSR Scores are regressed on College Major.

Table 12. Summary Data of Regression of AU2 LCTSR Scores on College Major

Variables            Step  R2    R2 Change  B     Std. β  t     p

AU1 LCTSR Scores     1     0.18  0.18       0.56  0.40    7.58  0.000*
Other Science Major  2     0.19  0.01       1.23  0.12    2.22  0.028*
Business Major       3     0.20  0.01       1.75  0.11    2.07  0.039*
(Constant)                                  3.14

Note. n = 299. * Significant at α = 0.05. Standard error for predicted AU2 LCTSR scores from full model = 2.21. Adjusted R2 for full model = 0.19. For model: F3, 295 = 24.47, p = 0.000*.

Table 13. Stepwise Entry Regression of AU2 LCTSR Scores on College Major

In each of the statistical analyses, the residuals were checked and no violations of the assumptions for multiple regression were found. The stepwise regression model fit well, and the combination of the three variables accounted for a total of 19.9% of the variance in the LCTSR scores, with the AU1 LCTSR scores contributing the majority (17.5%) and the other science major variable contributing an additional 1.2%. Although these results are cross-validated by a multiple regression analysis on the AU1 LCTSR scores, the small contribution to the variance, the weak intercorrelations, and the large sample size limit the practical significance of the results regarding college major.
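The variance decomposition reported here follows from the fact that each step's R² change is the additional variance explained when a predictor joins the nested model already containing the earlier entries. The following is a minimal pure-Python sketch using simulated data; the variable names and values are illustrative and are not the study's:

```python
import random

def ols_r2(X, y):
    """R^2 from an ordinary least squares fit of y on the columns of X (plus intercept)."""
    n = len(y)
    X = [[1.0] + row for row in X]          # prepend intercept column
    k = len(X[0])
    # Normal equations (X'X) b = X'y, solved by Gaussian elimination with pivoting.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(k)] for r in range(k)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(k)]
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            A[r] = [a - f * p for a, p in zip(A[r], A[col])]
            b[r] -= f * b[col]
    beta = [0.0] * k
    for r in reversed(range(k)):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, k))) / A[r][r]
    yhat = [sum(bc * xc for bc, xc in zip(beta, row)) for row in X]
    ybar = sum(y) / n
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

random.seed(1)
# Hypothetical data: a pretest score plus a 0/1 "other science major" indicator.
pre = [random.gauss(8, 1.8) for _ in range(300)]
major = [random.random() < 0.2 for _ in range(300)]
post = [0.55 * p + 1.0 * m + random.gauss(3, 2) for p, m in zip(pre, major)]

r2_step1 = ols_r2([[p] for p in pre], post)                          # pretest only
r2_step2 = ols_r2([[p, float(m)] for p, m in zip(pre, major)], post)  # add indicator
print(round(r2_step1, 3), round(r2_step2 - r2_step1, 3))  # R^2, then its increment
```

Because the models are nested, the increment `r2_step2 - r2_step1` can never be negative; the stepwise table's "R2 Change" column is exactly this quantity at each step.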

Change in LCTSR Scores

To examine any significant developmental change in HD reasoning, LCTSR scores were analyzed with a repeated measures MANOVA, again considering possible differences between biology majors and all other participants. To best discern any developmental differences, two time periods were investigated: the first covered the first quarter of the introductory biology sequence, while the second encompassed the entire two-quarter sequence. Participants lacking data for either administration were eliminated by the list-wise option for the analysis. This elimination left a total of 27 individuals who completed all three administrations (Table 14). Therefore, two separate analyses were run to take advantage of the greater number of participants in the autumn quarter. These separate analyses also allowed for an investigation into the influence of time in the introductory biology sequence (one quarter versus two) and a better comparison of overall development by comparing the same form at the AU1 and WI administrations. In each analysis, the assumption of homogeneity of variance was met.

        AU1   AU2   WI   AU1 & AU2   AU1 & WI   ALL

LCTSR   391   318   59   299         30         27

Table 14. Total Number of Individuals who Completed the LCTSR Instrument by Administration

Change in overall total scores. Over the course of the first quarter, there was a significant decrease in scores from the beginning to the end of the quarter but no difference between biology majors and non-biology majors (Tables 15 and 16). Primarily due to the difference between the LCTSR AU1 and AU2 scores, there at first appears to be a significant interaction effect between LCTSR score and major; however, this interaction explains only 1.4% of the variance and is therefore of little practical interest. When a Bonferroni correction is applied for the two separate MANOVAs, reducing α to 0.025, this interaction only approaches significance, but a statistically significant difference in the LCTSR scores remains.

                                  M     SD    n

AU1      Biology                  8.49  1.73  79
         Not Biology              8.01  1.76  220
AU2      Biology                  7.66  2.59  79
         Not Biology              7.81  2.40  220

Mean Difference
         Biology – Not Biology    0.16
         AU2 – AU1 Overall       -0.52

Table 15. Descriptive Statistics of AU1 and AU2 LCTSR Scores by Major

Variables Tested                              F      df      p       Effect Size

AU1 vs. AU2 LCTSR Scores                      11.47  1, 297  0.001a  0.04
Biology Majors vs. Not Biology Majors         0.48   1, 297  0.489   0.00
Interaction between LCTSR Scores and Major    4.32   1, 297  0.039b  0.01

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. aStatistically significant difference at α = 0.025. bStatistically significant difference at α = 0.05, but not at the Bonferroni-corrected α = 0.025.

Table 16. Repeated Measures MANOVA Comparison of AU1 and AU2 LCTSR Scores

As it is highly unlikely that students lost reasoning skills over the course of the quarter, other factors must be considered to explain this drop in scores. One possibility is that students, after encountering the workload of their first college-level biology course, began to doubt their ability to do well in a college-level science major course. This may have reduced their self-efficacy with regard to their reasoning ability, leading them to score poorly. The plausibility of this explanation is difficult to determine, as no data were available on the students' self-efficacy. However, Lawson et al. (2007) found that SR ability positively influenced self-efficacy, but not vice versa, reducing the probability of a negative self-efficacy effect in this instance. The finding that the participants had completed an average of more than three college-level science courses also weakens this argument, as the LCTSR is not solely biology-specific. The more likely explanation is twofold. First, Forms A and B were not as equivalent as previously thought. The equivalency of the forms was difficult to determine, given the nature of the instrument and the degree of independence of items when split. However, the Pearson correlation between the two forms, as used in this study, was moderate to substantial (r = 0.54, p = 0.000, n = 79 for biology majors; r = 0.38, p = 0.000, n = 220 for non-biology majors). Second, the AU2 administration was given during the lab section in the last week of the quarter before the holiday break. Upon entry of the data, an increase in random patterns and a lack of sincerity in answers was noted, most keenly for this administration. It is likely that the students opted to complete the instruments as quickly as possible without much effort or interest. A comparison of the AU1 and WI LCTSR scores was used to further explore these possibilities (Tables 17 and 18).
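The form-equivalence check above rests on the Pearson correlation between students' paired totals on the two split-half forms. A minimal pure-Python sketch with made-up paired scores (not study data) shows the computation:

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical paired totals on Form A and Form B (illustrative values only).
form_a = [8, 10, 7, 9, 6, 11, 8, 5]
form_b = [7, 9, 8, 9, 5, 10, 7, 6]
print(round(pearson(form_a, form_b), 2))
```

A coefficient near 1 would indicate the split-half forms rank students almost identically; the moderate values reported in the text suggest the forms were only partially interchangeable.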

                                  M     SD    n

AU1      Biology                  9.00  1.88  14
         Not Biology              8.69  1.40  16
WI       Biology                  8.21  3.04  14
         Not Biology              9.19  2.56  16

Mean Difference
         Biology – Not Biology    0.33
         WI – AU1 Overall        -0.14

Table 17. Descriptive Statistics of AU1 and WI LCTSR Scores by Major

Variables Tested                              F     df     p      Effect Size

AU1 vs. WI LCTSR Scores                       0.12  1, 28  0.735  0.00
Biology Majors vs. Not Biology Majors         0.21  1, 28  0.653  0.00
Interaction between LCTSR Scores and Major    2.36  1, 28  0.136  0.08

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. No statistically significant differences at the Bonferroni-corrected α = 0.025.

Table 18. Repeated Measures MANOVA Comparison of AU1 and WI LCTSR Scores

There were no significant differences found between the AU1 and WI LCTSR scores or between biology and non-biology majors, and no interaction was found between LCTSR scores and major. The lack of difference between the pre-test (AU1) and post-test (WI) administrations of the same form (A), even when the mid-test (Form B) demonstrated a significant decrease, supports the explanation that the drop in AU2 scores was most likely due to non-measured intervening variables (such as attitude or self-efficacy). However, it must be acknowledged that, if self-efficacy in biology is a factor, another quarter may have reinstated students' previous belief in themselves and their abilities. Another factor lends credence to this explanation of the AU2 scores: the administration of the instrument in the winter quarter was completed optionally at the end of a laboratory meeting in which the students completed their final practical exam. Therefore, the few students who completed the instrument during this administration were more likely to be committed to the research process and to answer to the best of their ability. These results are thus likely indicative of an apparent lack of HD reasoning skills development during the two-quarter introductory biology sequence. The lack of development even in this group points to the need for specific attention to improving these skills.

Change in LCTSR item scores. Given the finding that the AU2 LCTSR scores were significantly lower than the AU1 scores, an investigation into differences among individual item scores was undertaken to identify any particular types of scenarios that led to the lower scores. For the AU1 and AU2 LCTSR administrations, a repeated measures MANOVA was completed on the total scores for each scenario/pair of questions. Participants were given a 1 for each correct answer, yielding possible item-pair total scores of 0, 1, or 2. The assumption of equality of variances was not met for the AU1 LCTSR item-pair scores; however, the sample sizes were equal and all multivariate test statistics demonstrated the same F, p, and effect size values. The assumption of equality of variances was met for the AU2 LCTSR item-pair scores. Overall, no distinct patterns were found, as nearly all items in each administration were significantly different from each other (Tables 19 – 22, Figures 6 and 7).
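The item-pair scoring scheme described above can be sketched in a few lines of Python. The answer key and response string below are placeholders, not the actual LCTSR items:

```python
# Each LCTSR scenario is a pair of items; each item scores 0 or 1, so a
# pair total is 0, 1, or 2, as used in the repeated measures MANOVA.
def item_pair_scores(responses, key):
    """responses/key: equal-length answer lists; returns the pair totals."""
    correct = [int(r == k) for r, k in zip(responses, key)]
    return [correct[i] + correct[i + 1] for i in range(0, len(correct), 2)]

key = list("abcdabcdabcd")          # placeholder answer key, not the real one
student = list("abcdabcdabba")      # hypothetical 12-item response string
print(item_pair_scores(student, key))  # [2, 2, 2, 2, 2, 0]
```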


Figure 6. Mean AU1 LCTSR item-pair scores by major.

Variables Tested                                        F        df      p       Effect Size

AU1 LCTSR Item-Pair Scores                              319.73   5, 385  0.000*  0.81
Biology Majors vs. Not Biology Majors                   6327.62  1, 389  0.124   0.01
Interaction between LCTSR Item-Pair Scores and Major    1.58     5, 385  0.164   0.02

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 116, Not Biology n = 275, and Total n = 391. *Experiment-wise statistically significant difference at α = 0.05.

Table 19. Repeated Measures MANOVA Comparison of AU1 LCTSR Item-Pair Scores

Item Pair A3, A4 A5, A6 A7, A8 A9, A10 A11, A12

A1, A2 0.000* 0.000* 0.000* 0.020* 0.000*

A3, A4 0.000* 0.000* 0.000* 0.000*

A5, A6 0.000* 0.057 0.000*

A7, A8 0.000* 0.000*

A9, A10 0.000*

Note. Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05.

Table 20. P-values from MANOVA Post-hoc Comparison of AU1 LCTSR Item-Pair Scores


Figure 7. Mean AU2 LCTSR item-pair scores by major.

Variables Tested                                        F        df      p       Effect Size

AU2 LCTSR Item-Pair Scores                              313.21   5, 312  0.000*  0.68
Biology Majors vs. Not Biology Majors                   2520.61  1, 316  0.582   0.00
Interaction between LCTSR Item-Pair Scores and Major    1.51     5, 312  0.186   0.02

Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 84, Not Biology n = 234, and Total n = 318. *Experiment-wise statistically significant difference at α = 0.05.

Table 21. Repeated Measures MANOVA Comparison of AU2 LCTSR Item-Pair Scores

Item Pair B3, B4 B5, B6 B7, B8 B9, B10 B11, B12

B1, B2 0.446 0.000* 0.000* 1.000 0.000*

B3, B4 0.000* 0.000* 0.045* 0.000*

B5, B6 0.000* 0.000* 0.000*

B7, B8 0.000* 0.000*

B9, B10 0.000*

Note. Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05.

Table 22. P-values from MANOVA Post-hoc Comparison of AU2 LCTSR Item-Pair Scores

As can be seen in both Figures 6 and 7, item pairs 7, 8 and 11, 12 demonstrate the lowest mean scores. On both forms, Items 7 and 8 relate to an experimental scenario regarding the preference of fruit flies between two different variables; these problems are designed to address the identification and control of variables. In addition, Items 11 and 12 on each form directly assess HD thinking and reasoning. It is possible that the scores on Items 7 and 8 are lower due to difficulty reading the numerical values on the figure associated with the scenario; several individuals indicated this issue on their instruments. However, if studied closely, especially on Form B, the actual numerical values are not critical to understanding the problem. With regard to Items 11 and 12, Form B's scenario focused on the consequence of placing red blood cells in a hypertonic solution and its cause. The low scores on these items are somewhat surprising, as the participants completed this experiment earlier in the quarter. These results give some indication of the particular trouble the participants have with regard to HD reasoning. They do appear to be competent in basic conservation of matter and in proportional and probabilistic thinking. However, in both the item pairs highlighted as particularly difficult, the skills investigated are directly related to experimental design, control, and understanding, a critical skill set for science majors.

Argumentation

Initial Distributions

The initial distribution of the argumentation scores for each administration was first investigated to determine the normality of the data and to discern any readily distinguished patterns (Table 23). As with the LCTSR scores, the AU2 administration scores appear lower than either the AU1 or WI scores, with a noticeable decrease in the WI scores of biology majors compared to non-biology majors. The standard deviations of all three administrations appear to be constant, although somewhat large, indicating a wide range of scores. The mean score was approximately between 5 and 6. This finding is interesting, as participants would be expected to do better on the first three questions of the instrument, which relate to the initial argument. Based on the current literature, individuals are more adept at creating an initial argument and have difficulties recognizing, creating, and addressing alternative explanations (D. Kuhn, 1992; Toplak & Stanovich, 2003; Zohar & Nemet, 2002). As the items are scored from 0 to 2, a strong initial argument would yield a total score of 6, which is similar to the current mean. Unfortunately, as this instrument was created for this particular study, there was no basis of comparison to determine the exact meaning of an average score in this range. It is also important to note, however, that when scoring the instruments, if an individual earnestly attempted any of the items, any remaining blanks were given a 0 instead of a "no response." This scoring method could also be skewing the scores lower, especially for the alternative explanation items, 4 and 5.

      AU1             AU2             WI
      Bio   Not Bio   Bio   Not Bio   Bio   Not Bio

n     119   293       91    253       30    29
M     6.17  6.08      5.42  5.40      5.17  6.00
SD    2.43  2.41      2.29  2.53      2.23  2.70

Note. All populations for each administration are normally distributed. AU1 = 1st administration in autumn quarter, AU2 = 2nd administration in autumn quarter, and WI = 3rd administration in winter quarter.

Table 23. Average Total Argumentation Scores by Administration and Major

As with the LCTSR scores, a repeated measures MANOVA was performed for the two autumn administrations and a one-way ANOVA was performed on the winter administration to determine whether there was any effect due to different lab sections. For the autumn administrations, the assumptions of equality of variances were met (Table 24). A significant interaction was found between lab section and argumentation score change (F21, 318 = 1.92, p = 0.010), with an effect size accounting for 11.3% of the variance in scores. There were also significant differences among the autumn lab sections (F21, 318 = 2.32, p = 0.001), with an effect size accounting for 13.3% of the variance. However, with a Bonferroni adjustment for the post-hoc tests, only section 10, which had the lowest mean score (overall M = 4.63), was found to differ from section 12, which had the highest mean score (overall M = 7.29, p = 0.034). This may be of some concern, because section 12 was also one of the highest scoring sections on the LCTSR.
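The "effect size accounting for X% of the variance" phrasing corresponds to an eta-squared value, the ratio of between-group to total sum of squares. A tiny sketch with made-up groups (not study data) shows the computation:

```python
# Eta-squared: proportion of total variance attributable to group membership,
# computed as SS_between / SS_total over hypothetical lab-section scores.
groups = [[6, 7, 5, 6], [8, 9, 8, 7], [5, 6, 6, 5]]
all_scores = [s for g in groups for s in g]
grand = sum(all_scores) / len(all_scores)
ss_total = sum((s - grand) ** 2 for s in all_scores)
ss_between = sum(len(g) * ((sum(g) / len(g)) - grand) ** 2 for g in groups)
eta_sq = ss_between / ss_total
print(round(eta_sq, 3))  # 0.737
```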

AU1 AU2

Autumn Lab Section M SD n M SD n

1 7.00 2.32 21 6.43 2.25 21

2 5.07 2.49 15 4.53 3.18 15

3 6.62 2.10 13 3.38 2.79 13

4 5.82 3.19 11 5.73 2.24 11

5 6.47 2.40 17 6.35 1.94 17

6 6.25 2.02 16 6.06 2.74 16

Continued

Table 24. Average AU1 and AU2 Argumentation Scores by Lab Section

Table 24 continued

AU1 AU2

Autumn Lab Section M SD n M SD n

7 6.57 2.23 7 6.29 2.29 7

8 5.57 1.70 14 4.86 2.38 14

9 6.42 2.50 12 6.50 2.24 12

10 5.13 2.16 16 4.13 2.42 16

11 6.53 2.64 15 5.40 2.64 15

12 7.33 2.06 12 7.25 1.55 12

13 4.10 2.18 10 6.00 2.67 10

14 5.75 2.05 20 5.95 1.85 20

15 7.16 2.12 19 6.00 2.19 19

16 7.21 2.86 14 4.64 2.79 14

17 6.30 2.11 20 5.20 2.42 20

18 5.36 2.48 22 5.86 2.12 22

19 4.90 3.38 10 4.20 1.32 10

20 5.78 2.13 18 4.89 2.52 18

21 7.59 2.06 17 4.76 2.17 17

22 5.57 2.38 21 4.67 2.75 21

For the winter quarter lab sections, a one-way ANOVA was performed; the assumption of equality of variances was met (Table 25). There were no significant differences found among the sections (F11, 58 = 1.43, p = 0.193). Overall, because the only statistical differences in the autumn data were found between the lowest and highest average scores, there once again appears to be little practical concern regarding the possibility of different laboratory instructors and sections being an intervening variable.

Winter Lab Section M SD n

23 6.33 2.25 6

24 8.50 0.71 2

26 5.78 2.05 9

27 5.50 0.71 2

28 6.00 1.41 2

29 3.80 2.59 5

30 3.67 3.50 6

31 4.80 0.45 5

32 6.14 2.12 7

33 5.63 2.78 8

34 7.60 2.70 5

35 4.00 2.83 2

Note. Lab section 25 did not have any participants complete the argumentation instrument.

Table 25. Average WI Argumentation Scores by Lab Section

Relationship of Background Characteristics and Argumentation Autumn Quarter Scores

An exploratory multiple regression analysis was undertaken to determine whether any demographic characteristics influenced argumentation scores. Given the exploratory nature of the analysis, multiple regressions were performed on both the AU1 and the AU2 argumentation scores to identify any recurring characteristics and provide cross-validation. As with the LCTSR, the responses to the two administrations were sufficiently repetitive, and scores showed no significant increase, warranting use of the second administration as a cross-validation sample. To keep the ratio of variables to number of individuals low, several stepwise multiple regression analyses were conducted. In addition, because of the repetitive nature of the AU1 and AU2 argumentation assessments, the AU1 scores were entered by force before the stepwise entry of the other independent variables when the AU2 scores were regressed on them. This accounted for any variance due to a test-retest factor.

As with the LCTSR scores, the argumentation total scores for each autumn administration were regressed on the demographic variables as a group, including age, gender, years as an undergraduate, number of Advanced Placement courses, and number of college science courses, as well as university rank separately, to determine whether maturation and previous experience influenced the scores. Even though these variables were originally chosen based on the relationship of the LCTSR to Piagetian theory, it was believed that experience, especially the number of college science courses, might be a positive influence on argumentation scores. No consistent factors were found in either autumn analysis. Multiple regression of the argumentation scores on the future plans variables, including degree sought and post-baccalaureate plans, or on choice of major also did not demonstrate any consistent influence. Overall, no consistent factors were found between the two administrations, nor were the results consistent with those found with the LCTSR scores.

Change in Argumentation Scores

To examine any significant developmental change in argumentation skills, argumentation scores were analyzed with a repeated measures MANOVA. Once again, possible differences between biology majors and all other participants were considered. The two time periods between administrations, AU1 to AU2 and AU1 to WI, were examined to determine any effects due to time and form. This separation was also important because participants lacking data for either administration were eliminated using the list-wise option in the analysis; this elimination left a total of only 29 individuals who completed all three instruments (Table 26). Therefore, two separate analyses were completed to take advantage of the greater number of participants in the autumn quarter. In all cases, the assumption of homogeneity of variance was met.

                AU1   AU2   WI   AU1 & AU2   AU1 & WI   All Three
Argumentation   412   344   59   340         31         29

Table 26. Total Number of Individuals Who Completed the Argumentation Instrument by Administration
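A minimal sketch of the list-wise elimination described above, with hypothetical participants (`None` marks a missing administration; the identifiers are invented for illustration):

```python
# List-wise deletion: keep only participants with scores at every administration.
participants = {
    "p01": {"AU1": 7, "AU2": 6, "WI": 5},
    "p02": {"AU1": 6, "AU2": None, "WI": 6},   # dropped: missing AU2
    "p03": {"AU1": 8, "AU2": 7, "WI": None},   # dropped: missing WI
    "p04": {"AU1": 5, "AU2": 5, "WI": 6},
}

complete = {pid: scores for pid, scores in participants.items()
            if all(v is not None for v in scores.values())}
# Only p01 and p04 survive; a three-administration analysis is restricted
# to this complete-case subsample, which is why the study ran the two
# shorter comparisons separately to retain the larger autumn samples.
```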

Change in overall total scores. Over the course of the first quarter, there was a significant decrease in argumentation scores from the beginning to the end of the quarter, with the effect size explaining only 5% of the variance in the scores. There was no difference between biology majors and non-biology majors and no interactive effect due to major (Tables 27 and 28). However, the decrease in the argumentation scores is troubling yet again, as it is highly unlikely that students lost reasoning skills over the course of the quarter. The repetition of this effect in the argumentation scores, as in the LCTSR scores, increases the likelihood that an intervening factor was at work. Once again, the possibility of a self-efficacy decrease must be considered, although there is no more reason to assume it affected the argumentation scores than there was for the LCTSR scores. As no factors were found with the regression, this leaves the format of the administration and the different forms of the instrument as the most likely influences.

                     M      SD     n
AU1   Biology        6.39   2.46    89
      Not Biology    6.05   2.40   251
AU2   Biology        5.43   2.29    89
      Not Biology    5.40   2.54   251

Mean difference, Biology – Not Biology:   0.18
Mean difference, AU2 – AU1 (overall):    -0.81

Table 27. Descriptive Statistics of AU1 and AU2 Argumentation Scores by Major

Variables Tested                                      F       df      p       Effect Size
AU1 vs. AU2 Argumentation Scores                      17.93   1, 338  0.000*  0.05
Biology Majors vs. Not Biology Majors                  0.62   1, 338  0.433   0.00
Interaction between Argumentation Scores and Major     0.69   1, 338  0.407   0.00
Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests. *Statistically significant difference at the Bonferroni-corrected α = 0.025.

Table 28. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Scores
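As a check on Table 28's effect sizes, partial eta squared for a single-df effect can be recovered from the F statistic and its degrees of freedom via the standard identity η²ₚ = (F·df_effect)/(F·df_effect + df_error); the numbers below are taken from Table 28.

```python
def partial_eta_squared(F, df_effect, df_error):
    """Partial eta^2 recovered from an F statistic and its degrees of freedom."""
    return (F * df_effect) / (F * df_effect + df_error)

# Table 28's AU1 vs. AU2 comparison: F(1, 338) = 17.93
eta2 = partial_eta_squared(17.93, 1, 338)
# eta2 is about 0.050, i.e. roughly 5% of the variance in scores,
# matching the tabled effect size of 0.05.
```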

The possibility that Form A and Form B were not equivalent is the stronger candidate intervening variable in this instance. The two forms each contained a scenario and data table from a different topic in biology: Form A focused on evolution and Form B focused on ecology. Even though the scenarios and corresponding data sets were designed to be relatively non-specific, this difference could have had an effect. The correlations between the AU1 and AU2 argumentation scores make this possibility a concern: they were not significant for biology majors (r_Biology = 0.20, p = 0.055, n = 89) and only weak for non-biology majors (r_Not Biology = 0.20, p = 0.002, n = 251). However, if the content of the forms was a true problem, a difference in scores would be expected between biology majors and non-biology majors, which was not the case. Another likely cause of the drop in scores is, once again, the tendency of participants to put less effort into the instruments at the end of the quarter. Both these possibilities are supported by the comparison of the AU1 and WI argumentation scores, where the significant decrease is no longer present when comparing the same form (Tables 29 and 30).

                     M      SD     n
AU1   Biology        6.57   2.14   14
      Not Biology    6.82   2.70   17
WI    Biology        5.57   2.85   14
      Not Biology    6.53   2.85   17

Mean difference, Biology – Not Biology:  -0.61
Mean difference, WI – AU1 (overall):     -0.65

Table 29. Descriptive Statistics of AU1 and WI Argumentation Scores by Major

Variables Tested                                      F      df     p      Effect Size
AU1 vs. WI Argumentation Scores                       2.67   1, 29  0.113  0.08
Biology Majors vs. Not Biology Majors                 0.51   1, 29  0.480  0.02
Interaction between Argumentation Scores and Major    0.59   1, 29  0.380  0.03
Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests; Bonferroni-corrected α = 0.025.

Table 30. Repeated Measures MANOVA Comparison of AU1 and WI Argumentation Scores
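The Bonferroni-corrected threshold used in Tables 28 and 30 is simply the family-wise alpha divided by the number of comparisons in the family; a minimal sketch (the values are taken from Table 30):

```python
def bonferroni_alpha(alpha, m):
    """Per-test significance level after a Bonferroni correction for m tests."""
    return alpha / m

# Two repeated-measures comparisons (AU1 vs. AU2, AU1 vs. WI) share the
# family-wise alpha of 0.05, giving the per-test threshold of 0.025.
threshold = bonferroni_alpha(0.05, 2)

p_au1_vs_wi = 0.113                   # Table 30
significant = p_au1_vs_wi < threshold # False: no change from AU1 to WI
```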

There was no significant difference found between the AU1 and WI argumentation scores or between biology and non-biology majors. Also, no interactive effect was found between argumentation scores and major. The lack of difference between the pre-test (AU1) and post-test (WI) administrations of the same form (A), even though the mid-test (Form B) showed a significant decrease, lends support to the explanation that the drop in the AU2 scores was most likely due to non-equivalent forms or a non-measured intervening variable. Also, the few individuals who completed the WI administration were probably more dedicated to providing valid data; their lack of score change over two complete quarters lends credence to the likelihood that the second administration's score decrease was influenced by an outside variable. Once again, though, it must be acknowledged that, if self-efficacy in biology was a factor, it is possible that another quarter may have reinstated previous belief in one's self and abilities. Regardless, the overall lack of improvement in argumentation scores through two quarters of study, in the absence of any specific attention to argumentation, implies the need for directed intervention to improve these skills.

Change in argumentation subscale scores. The argumentation subscale scores were examined to further clarify the composition of the participants' argumentation scores and the difficulties encountered. A visual inspection suggests that participants in this study were similar to those reported in the literature: subscale scores for alternative explanations appear lower than those for argument generation in each administration (Figure 8). A repeated measures MANOVA was utilized to determine whether the ability to generate an argument and the ability to identify and rebut alternative explanations were significantly different or changed over the course of the two quarters. As with the other analyses, scores were compared in two time periods: AU1 to AU2 and AU1 to WI. In each analysis, the assumption of equality of variances was met.

[Bar chart: Mean Score (0–2.2) on the y-axis; Argumentation Subscale by Administration on the x-axis (AU1 Argument, AU1 Alter. Explan., AU2 Argument, AU2 Alter. Explan., WI Argument, WI Alter. Explan.); separate bars for Biology and Not Biology.]

Figure 8. Mean AU1, AU2, and WI argumentation subscale scores by major.

The results of the repeated measures MANOVA indicate that average alternative explanation subscale scores were significantly lower than argument scores for each of the AU1 and AU2 administrations (Table 31). These score differences did not interact with choice of major. This result was expected, based on the current literature (i.e., D. Kuhn, 1992; Toplak & Stanovich, 2003; Zohar & Nemet, 2002). Students were better able to identify and support one conclusion from the given data than to identify any conflicting counterarguments. Difficulties on the argument subscale primarily rested on conclusions that were too broad and on the lack of specific data identification (i.e., the individual identified "the data table" as the source of support). One possible explanation for the difference between the subscales is that some participants seemed unable to consider another person's point of view with regard to the data set. Item 4 on Form B asked, "A graduate student in your lab has an alternative conclusion with regard to the data. What does she believe and why?" Several individuals replied on each administration as individual 13-3 did for AU2: "I still don't know how I am suppost (sic) to know what someone else thinks." Taking this as a serious answer, it is interesting to note the strong bias the participant has for his or her own point of view. Item 5 asked the participants how they would counter the viewpoint offered in Item 4. Many individuals recognized the need for strong empirical evidence to support an argument, often answering that "more research needs to be done" instead of using the data given. Although it is difficult to determine whether this was a serious answer or simply a quick way to finish the instrument, it was not the intended purpose of the item, and such answers were scored 0. The combination of these two types of answers may have contributed to the lower alternative explanation subscale scores. However, as the earnestness of such answers could not be determined, it is assumed that participants had difficulty recognizing and countering alternative conclusions with the information provided.

Another finding from the MANOVA is that the initial decrease in total argumentation score from AU1 to AU2 is due to a drop in each subscale score. This helps characterize the drop in scores from AU1 to AU2 as likely due to an overarching intervening variable that tended to affect all participants equally. This intervening variable could be any of those previously described, such as a decrease in self-efficacy, the different content area used in the two different forms, or lack of effort.

Variables Tested                                                  F        df      p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation (a)                  18.42    1, 338  0.000*  0.05
AU2 Argument vs. AU2 Alternative Explanation (a)                  151.03   1, 338  0.000*  0.31
Interaction among AU1, AU2, and Major (a)                         0.01     1, 338  0.929   0.00
AU1 Argument vs. AU2 Argument (b)                                 9.63     1, 338  0.002*  0.03
AU1 Alternative Explanation vs. AU2 Alternative Explanation (b)   14.71    1, 338  0.000*  0.04
Interaction of Argumentation Subscales and Major (a)              0.35     2, 337  0.706   0.00
Note. (a) Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. (b) Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha. *Statistically significant difference at α = 0.05. Biology n = 89, Not Biology n = 251, and Total n = 340.

Table 31. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Subscale Scores by Major

When comparing the AU1 and WI administrations, a slightly different pattern of results was found, with no significant difference between the AU1 subscales (Table 32). The smaller number of individuals in this analysis reduces the power to detect the difference between the AU1 argumentation subscales that was found in the previous analysis. The WI scores still showed the significant difference between the argument and alternative explanation subscales; however, just as the AU1 results in this comparison must be treated with some skepticism due to the low n, so must this finding. It is also interesting to note that there was a significant increase from the AU1 argument subscale to the WI argument subscale – the first increase observed. This was likely due to the low n, though, as there was no difference between the alternative explanation subscales and no overall difference between the two administrations. Lastly, there were no interactions due to major for either combination.

Variables Tested                                                F      df     p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation (a)                2.01   1, 29  0.167   0.07
WI Argument vs. WI Alternative Explanation (a)                  7.79   1, 29  0.009*  0.21
Interaction among AU1, WI, and Major (a)                        0.03   1, 29  0.869   0.00
AU1 Argument vs. WI Argument (b)                                4.57   1, 29  0.041*  0.14
AU1 Alternative Explanation vs. WI Alternative Explanation (b)  0.39   1, 29  0.537   0.01
Interaction of Argumentation Subscales and Major (a)            0.57   2, 28  0.571   0.04
Note. (a) Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. (b) Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha. Biology n = 14, Not Biology n = 17, and Total n = 31. *Statistically significant difference at α = 0.05.

Table 32. Repeated Measures MANOVA Comparison of AU1 and WI Argumentation Subscale Scores by Major

Correlation of Hypothetico-Deductive Reasoning and Argumentation

To determine the relationship between HD reasoning and argumentation skills, a Pearson correlation was completed for each administration of the instruments, by biology major and non-biology major. In each instance, the resulting correlation was significantly different from 0 and positive, as expected (Table 33). The values for the autumn administrations indicated low to moderate correlations, while the winter administration demonstrated substantial correlations. This discrepancy was most likely due to the greater number of individuals participating in the autumn administrations, which increased the variability of the scores. Even though the winter administration values were much higher for each major, the n in each case was over 25, and therefore the correlations can be considered relatively unbiased. Overall, these findings support the hypothesis that HD reasoning and argumentation skills are positively and moderately correlated.

Correlation                                        n     r      p
AU1 LCTSR and AU1 Argumentation   Biology          114   0.23   0.015*
                                  Not Biology      270   0.20   0.001*
AU2 LCTSR and AU2 Argumentation   Biology           82   0.35   0.001*
                                  Not Biology      231   0.27   0.000*
WI LCTSR and WI Argumentation     Biology           28   0.55   0.003*
                                  Not Biology       24   0.50   0.013*
Note. All significance tests are two-tailed. *Statistically significant difference at α = 0.05.

Table 33. Pearson Product Moment Correlations between LCTSR and Argumentation Scores by Administration
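A two-tailed Pearson correlation of the kind reported in Table 33 can be computed with SciPy; the paired scores below are hypothetical stand-ins, not the study's raw data.

```python
from scipy import stats

# Hypothetical paired scores for a handful of participants; in the study each
# pair is one participant's LCTSR and argumentation score at one administration.
lctsr = [9, 7, 10, 6, 8, 11, 7, 9]
argumentation = [6, 5, 8, 4, 6, 9, 5, 7]

# pearsonr returns r and a two-tailed p-value for H0: rho = 0.
r, p = stats.pearsonr(lctsr, argumentation)
# r > 0 indicates the two skills rise and fall together, mirroring the
# positive LCTSR-argumentation correlations in Table 33.
```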

Three-Time Participants

Of the 460 initial volunteer participants, 27 gave complete data for the LCTSR instrument and 29 gave complete data for the argumentation instrument across all three administrations. Of these small samples, 26 individuals completed both instruments at each administration. As these few individuals were the most dedicated to the study, they warranted closer examination. As previously stated, the collection of data during the winter quarter occurred at the end of the participants' last lab section meeting, when they completed their final practical exam. Therefore, because there was no incentive to participate, it can be inferred that those who chose to complete the instruments took the research more seriously and responded to the best of their abilities. With this in mind, this sub-set of participants may give a more precise picture of the nature of SR development. To be assured that the three-time participants were representative of all the participants, a MANOVA comparison of demographics was performed (Table 34). The assumption of equality of variances was not met; however, the robustness of the MANOVA with a high sample size reduces this threat. In addition, one three-time participant did not provide information on the number of undergraduate science courses completed. Therefore, this variable was compared using an independent samples t-test (Table 35). This analysis also did not meet the assumption of equality of variances, so the test statistic that does not assume equal variances was utilized.

For all of the demographic variables investigated, the three-time participants were not significantly different from all other participants. A slight but non-significant increase in the number of undergraduate science courses taken by three-time participants was found. As this variable was hypothesized to have an intervening influence on the results, this may be a cause for concern. However, the previous multiple regressions conducted on the LCTSR and argumentation scores found no consistent influence due to this factor. This finding, coupled with the non-significance of the t-test, leads to the conclusion that the three-time participants could be considered demographically similar to the other participants.

Demographic                         Population    n     M       SD     F      df      p
Biology Major                       Three-Time    30    0.43    0.50   2.33   1, 451  0.128
(Bio Mj = 1, Other = 0)             All Other    423    0.30    0.46
Age                                 Three-Time    30   20.73    3.08   1.67   1, 451  0.194
                                    All Other    423   20.19    2.14
Gender                              Three-Time    30    0.37    0.49   0.82   1, 451  0.367
(Female = 0, Male = 1)              All Other    423    0.45    0.50
Years as an Undergraduate           Three-Time    30    2.32    1.15   0.08   1, 451  0.772
                                    All Other    423    2.25    1.23
Seeking a B.S.                      Three-Time    30    0.97    0.18   3.63   1, 451  0.058
(BS = 1, Other = 0)                 All Other    423    0.84    0.37
Go to Professional School           Three-Time    30    0.37    0.49   1.89   1, 451  0.170
(Yes = 1, Other = 0)                All Other    423    0.50    0.50

Table 34. MANOVA Demographics Comparison of Three-Time Participants with All Other Participants

Demographic                        Population    n     M      SD     t      df     p
Number of Science Courses Taken    Three-Time    29    4.34   2.88   2.00   30.09  0.054
                                   All Other    405    3.25   2.06

Table 35. Independent T-test Comparison of Number of Science Courses Taken by Three- Time Participants and All Other Participants
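The unequal-variances comparison in Table 35 corresponds to Welch's t-test; a sketch with hypothetical course counts (assuming SciPy; `equal_var=False` selects the statistic that does not assume equal variances and uses the Welch–Satterthwaite degrees of freedom, which is why Table 35 reports a fractional df of 30.09).

```python
from scipy import stats

# Hypothetical counts of science courses taken (not the study's raw data):
# a small, more variable group versus a larger, tighter one.
three_time = [1, 2, 4, 4, 5, 8, 9]
all_other = [2, 3, 3, 3, 4, 4, 3, 2, 3, 4, 3, 3]

# equal_var=False requests Welch's t-test rather than the pooled-variance test.
t, p = stats.ttest_ind(three_time, all_other, equal_var=False)
# t > 0 reflects the slightly higher mean of the small group; whether p clears
# 0.05 depends on the data, as in the study's borderline p = 0.054.
```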

Although there did not appear to be any demographic difference between the three-time participants and all other participants, the small sample size and the disparity between the numbers who completed the LCTSR and argumentation instruments needed to be more closely examined. Table 36 presents the distribution of the three-time participants by instrument type.

                                    LCTSR and Argumentation   LCTSR Only   Argumentation Only   Total
Number of Three-Time Participants   26                        1            3                    30

Table 36. Number of Three-Time Participants Who Completed the LCTSR and Argumentation Instruments

A closer examination of the participants who completed only one instrument at all three administrations did not reveal any particular patterns. The four individuals were a combination of three sophomores and one senior, three females and one male, from a variety of majors, and with different post-baccalaureate plans. They were also from different lab sections. No consistent pattern was found in their individual LCTSR and argumentation scores. However, the individual who completed only the three administrations of the LCTSR instrument did score very low on the WI administration.

As this individual was not the only participant to score low on the WI LCTSR administration, it is unlikely that this had a significant effect.

Change in Overall SR Scores

Because all previous statistical analyses eliminated participants' data using the list-wise option, comparing all three administrations in one statistical test was limited. With only approximately 30 individuals completing either WI instrument, a list-wise elimination of the entire data set while comparing the three administrations together would restrict even the AU1 to AU2 analyses to those same 30 individuals and provide limited information. With a focus on the three-time participants, this is no longer an issue. In comparing the three-time participants as a subsample of the total participants, the main interest is in determining whether a different pattern of results emerges than when all participants were included. To this end, the previous statistical analyses were conducted again on the three-time participant population.

Change in LCTSR scores. The LCTSR scores for all three administrations were analyzed using a repeated measures MANOVA, again taking into consideration biology majors versus non-biology majors (Tables 37 and 38). The assumption of equal variances was met. It was first noted that the mean scores for the three-time participants appeared slightly higher than those of all participants for both the biology and non-biology majors. Only the biology majors appeared to exhibit the same decrease in AU2 scores. When compared in the MANOVA, there was no significant difference among the three administrations and no interaction with biology major. This lack of difference also supports the possibility that the two LCTSR forms were not as different as originally thought and that the original decrease can be attributed to participant attitudes at the end of the quarter. However, it also demonstrates that no increase was seen. If these participants are hypothesized to be more dedicated students, it is possible that there may be a ceiling effect. On the other hand, mean scores averaging 3 to 4 points below the maximum score are unlikely to be the highest the "better" student participants could achieve.

                     M      SD     n
AU1   Biology        9.00   1.86   15
      Not Biology    8.67   1.45   12
AU2   Biology        7.75   3.08   15
      Not Biology    9.13   2.03   12
WI    Biology        8.67   3.06   15
      Not Biology    9.13   2.64   12

Table 37. Descriptive Statistics of AU1, AU2, and WI LCTSR Scores by Major for Three- Time Participants

Variables Tested                              F      df     p      Effect Size
AU1 vs. AU2 vs. WI LCTSR Scores               0.44   2, 24  0.652  0.04
Biology Majors vs. Not Biology Majors         0.46   1, 25  0.506  0.02
Interaction between LCTSR Scores and Major    2.13   2, 24  0.141  0.15
Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests.

Table 38. Repeated Measures MANOVA Comparison of AU1, AU2, and WI LCTSR Scores for Three-Time Participants

Change in argumentation scores. The total argumentation scores were also analyzed via a repeated measures MANOVA, considering all three administrations and differences related to major (Tables 39 and 40). Similar to the LCTSR results for the three-time participants, the assumption of equality of variances was met and there were no significant differences among the three administrations. There was also no interaction between argumentation scores and major. In addition, as opposed to the LCTSR scores, the overall means of each administration appear similar to those of all the participants, possibly indicating a general difficulty with the instrument. Regardless, this second lack of development of SR scores, in a population believed to have earnestly completed the instruments, lends credence to the previous finding that without a specific focus, no improvement in SR will be observed.

                     M      SD     n
AU1   Biology        6.69   2.18   16
      Not Biology    6.63   2.67   13
AU2   Biology        5.85   2.51   16
      Not Biology    5.63   3.14   13
WI    Biology        5.62   2.60   16
      Not Biology    6.31   2.80   13

Table 39. Descriptive Statistics of AU1, AU2, and WI Argumentation Scores by Major for Three-Time Participants

Variables Tested                                      F      df     p      Effect Size
AU1 vs. AU2 vs. WI Argumentation Scores               1.50   2, 26  0.243  0.10
Biology Majors vs. Not Biology Majors                 0.03   1, 27  0.869  0.00
Interaction between Argumentation Scores and Major    0.78   2, 26  0.471  0.06
Note. Wilks' lambda was utilized for the argumentation scores and interaction comparisons' F tests.

Table 40. Repeated Measures MANOVA Comparison of AU1, AU2, and WI Argumentation Scores for Three-Time Participants

Change in LCTSR and Argumentation Subscale Scores

LCTSR item comparison. As no difference was found between the AU1 and AU2 LCTSR scores for the three-time participants, a repeated measures MANOVA was conducted to determine if there was a difference in the mean score patterns of the item pairs (Table 41). The assumption of equality of variances was not met for this analysis; however, MANOVA is relatively robust regarding this assumption. The pattern of AU1 Form A scores is similar to that of all participants (Figure 9). Item pairs A7, A8 and A11, A12 received the lowest scores, again demonstrating a difficulty with the control of variables and direct HD reasoning. An overall significant difference was found among the item pairs with an effect size of 0.83, but no interactions between item-pair scores and major were found. However, the pattern of significant differences found with the item-pair pairwise comparisons changed distinctly (Table 42). The only item pair retaining its significant difference from all other item pairs was A7, A8. The difficulty of this item pair in this population signifies that either the problem was truly difficult to decipher or the participants sincerely had difficulty with this aspect of HD reasoning.

[Bar chart: Mean Score (0–2.2) on the y-axis; AU1 LCTSR Item Pair on the x-axis (A1, A2; A3, A4; A5, A6; A7, A8; A9, A10; A11, A12); separate bars for Biology and Not Biology.]

Figure 9. Mean AU1 LCTSR item-pair scores by major for three-time participants.

Variables Tested                                        F       df     p       Effect Size
AU1 LCTSR Item-Pair Scores                              22.99   5, 24  0.000*  0.83
Biology Majors vs. Not Biology Majors                   0.82    1, 28  0.372   0.03
Interaction between LCTSR Item-Pair Scores and Major    0.91    5, 24  0.493   0.16
Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 13, Not Biology n = 17, and Total n = 30. *Statistically significant difference at α = 0.05.

Table 41. Repeated Measures MANOVA Comparison of AU1 LCTSR Item-Pair Scores for Three-Time Participants

Item Pair   A3, A4   A5, A6   A7, A8   A9, A10   A11, A12
A1, A2      0.033*   1.000    0.000*   1.000     0.023*
A3, A4               0.799    0.000*   1.000     1.000
A5, A6                        0.000*   1.000     0.329
A7, A8                                 0.000*    0.009*
A9, A10                                          0.210
Note. *Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05. Underlined values represent those pairwise comparisons that are different from the same comparisons found using all participants' data.

Table 42. P-values from MANOVA Post-hoc Comparison of AU1 LCTSR Item-Pair Scores for Three-Time Participants
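The 1.000 entries in Tables 42 and 44 are consistent with SPSS-style Bonferroni adjustment, which multiplies each raw pairwise p-value by the number of comparisons and caps the result at 1; a sketch with hypothetical raw p-values:

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capping at 1.0.
    Post-hoc tables that report adjusted p-values print many entries as 1.000
    for exactly this reason."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# A handful of hypothetical raw pairwise p-values (not the study's data).
raw = [0.002, 0.0001, 0.30, 0.08, 0.0006]
adjusted = bonferroni_adjust(raw)
# 0.30 * 5 exceeds 1 and is capped at 1.000, while 0.002 * 5 = 0.010
# remains below the family-wise alpha of 0.05.
```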

When scrutinizing the item-pair data for the AU2 Form B, a new pattern of mean scores was revealed. This graph was flatter in shape, and distinct differences between biology and non-biology majors were evident (Figure 10). The assumption of equality of variances was met, and an overall significant difference among the item pairs with a very large effect size (0.77) was still found by the MANOVA, with no significant interaction with choice of major (Table 43). However, as with Form A, a new pattern of item-pair pairwise comparisons was found (Table 44). For the three-time participants, the only item pair that retained its significant difference from nearly all other item pairs was B11, B12. This once again highlights the difficulties participants had with direct HD reasoning.

[Bar chart: Mean Score (0–2.2) on the y-axis; AU2 LCTSR Item Pair on the x-axis (B1, B2; B3, B4; B5, B6; B7, B8; B9, B10; B11, B12); separate bars for Biology and Not Biology.]

Figure 10. Mean AU2 LCTSR item-pair scores by major for three-time participants.

Variables Tested                                        F       df     p       Effect Size
AU2 LCTSR Item-Pair Scores                              15.98   5, 24  0.000*  0.77
Biology Majors vs. Not Biology Majors                   0.93    1, 28  0.343   0.03
Interaction between LCTSR Item-Pair Scores and Major    1.70    5, 24  0.173   0.26
Note. Wilks' lambda was utilized for the LCTSR scores and interaction comparisons' F tests. Biology n = 13, Not Biology n = 17, and Total n = 30. *Statistically significant difference at α = 0.05.

Table 43. Repeated Measures MANOVA Comparison of AU2 LCTSR Item-Pair Scores for Three-Time Participants

Item Pair   B3, B4   B5, B6   B7, B8   B9, B10   B11, B12
B1, B2      1.000    0.139    1.000    0.274     0.148
B3, B4               0.424    1.000    1.000     0.001*
B5, B6                        0.269    1.000     0.000*
B7, B8                                 1.000     0.000*
B9, B10                                          0.000*
Note. *Post-hoc comparison using a Bonferroni correction gives a statistically significant difference at α = 0.05. Underlined values represent those pairwise comparisons that are different from the same comparisons found using all participants' data.

Table 44. P-values from MANOVA Post-hoc Comparison of AU2 LCTSR Item-Pair Scores for Three-Time Participants

Argumentation subscale comparison. As with the total LCTSR scores, no differences were found among the administrations of the argumentation instrument. It was unknown whether this lack of difference was due to a change in either the subscale scores or the overall score. To better discern this pattern, in particular between AU1 and AU2, repeated measures MANOVAs examining the differences between subscales and quarters were used (Figure 11 and Table 45). The assumption of equality of variances was met and no significant interactions due to major were found. Scores on the alternative explanation subscale were significantly lower than those on the argument subscale in each administration. However, there was no difference in either subscale between AU1 and AU2. This suggests that Form B was no more difficult than Form A and that any change affected both subscale scores, not just one in particular.

[Bar chart: Mean Score (0–2.2) on the y-axis; Argumentation Subscale by Administration on the x-axis (AU1 Argument, AU1 Alter. Explan., AU2 Argument, AU2 Alter. Explan.); separate bars for Biology and Not Biology.]

Figure 11. Mean AU1 and AU2 argumentation subscale scores by major for three-time participants.

Variables Tested                                                F      df     p       Effect Size
AU1 Argument vs. AU1 Alternative Explanation (a)                9.95   1, 28  0.004*  0.26
AU2 Argument vs. AU2 Alternative Explanation (a)                4.31   1, 28  0.047*  0.13
Interaction among AU1, AU2, and Major (a)                       0.53   2, 27  0.595   0.04
AU1 Argument vs. AU2 Argument (b)                               2.65   1, 28  0.115   0.09
AU1 Alternative Explanation vs. AU2 Alternative Explanation (b) 1.12   1, 28  0.300   0.04
Interaction of Argumentation Subscales and Major (a)            0.04   2, 27  0.962   0.00
Note. (a) Wilks' lambda was utilized for the argumentation intra-subscale comparisons' F tests. (b) Inter-subscale comparisons were completed as univariate tests with a Bonferroni correction for alpha. *Statistically significant difference at α = 0.05. Biology n = 13, Not Biology n = 17, and Total n = 30.

Table 45. Repeated Measures MANOVA Comparison of AU1 and AU2 Argumentation Subscale Scores by Major for Three-Time Participants

Correlation of HD Reasoning and Argumentation

A Pearson correlation between the LCTSR and argumentation scores for each administration, by major, was completed to determine if there were any differences between the three-time participants and all other participants (Table 46). However, due to the limited sample sizes (less than 25), bias was a likely consideration. In fact, only four of the six correlations were found to be significantly different from 0. The significant correlations, though, were all positive and stronger than those found using all participants; all could be characterized as substantial. Practically, although the pattern of results is similar to those already found, the differences between the three-time participants and all participants were likely due to bias and are not very informative.

Correlation                                        n    r      p
AU1 LCTSR and AU1 Argumentation   Biology          13   0.22   0.472
                                  Not Biology      17   0.63   0.007*
AU2 LCTSR and AU2 Argumentation   Biology          13   0.76   0.003*
                                  Not Biology      17   0.60   0.012*
WI LCTSR and WI Argumentation     Biology          12   0.77   0.004*
                                  Not Biology      14   0.44   0.112
Note. All significance tests are two-tailed. *Statistically significant difference at α = 0.05.

Table 46. Pearson Product Moment Correlations between LCTSR and Argumentation Scores by Administration for Three-Time Participants

Summary of Key Findings

Overall, several key findings were determined:

1. Students enrolled in an introductory biology course initially had slightly higher LCTSR scores than the individuals described in the literature. These scores did not change overall during the course of the two-quarter sequence. Students appeared to have difficulty with the control of variables and direct HD reasoning.

2. Declaring a science major other than biology may have influenced LCTSR scores; however, declaring a biology major versus a non-biology major was not a factor in any score comparison.

3. Students taking an introductory biology course demonstrated difficulties with generating and rebutting alternative explanations compared to creating an initial argument. These scores did not change overall during the course of the two-quarter sequence.

4. No influencing factor could be determined for argumentation scores. In addition, declaring a biology major versus a non-biology major was not a factor in any score comparison.

5. There was a moderate positive relationship between HD reasoning and argumentation scores.

6. In the subsample of participants who completed instruments in all three administrations, no improvement was seen in LCTSR scores, and similar difficulties with the control of variables and direct HD reasoning were evident.

7. In the same subsample, no improvement was seen in argumentation scores, and similar difficulties with identifying and rebutting alternative explanations were found.

8. Overall, the course offered no direct attention to HD reasoning or argumentation. The lack of improvement in scores over the first quarter and the entire two-quarter sequence may reflect this.

CHAPTER 5

DISCUSSION AND IMPLICATIONS

This study has addressed three gaps in the literature. First, it focused on biology majors and how they compare to other students taking the same introductory coursework. Very little is known about the education of science majors in general and biology majors in particular; this is critical given the attrition of science majors and the increasing impact that science has on daily life. This study attempted to establish a baseline of biology majors' scientific reasoning skills from which interventions could be designed and evaluated. It found that biology majors are not much different from other undergraduate populations in the literature or from other individuals in their introductory courses. Second, the deductive and inductive aspects of scientific reasoning were investigated concurrently. Each type of reasoning is utilized in science, and it is important that both aspects are attended to in science education. By studying them concurrently, the relationships between the two types of reasoning can be established and considered. This study revealed that deductive and inductive reasoning, as measured by the LCTSR and the argumentation instrument, are moderately correlated. Lastly, this study focused on individuals' argumentation abilities regarding scientific scenarios. As argumentation about experimental findings is an important aspect of the culture of science, this skill needs to be assessed using scientific data. The students in this study demonstrated difficulties with the recognition and rebuttal of alternative explanations similar to those seen in other populations in the literature. Overall, this study offers information to begin to fill these gaps in the science education literature.

Limitations of the Study

Although most of the study results are statistically strong, several limitations restrict the practical extension of the findings. The first limitation concerns the instrumentation. The LCTSR has well-documented validity and reliability and has been used repeatedly in studies of undergraduates. However, its original strength and internal reliability lie in the redundant nature of the items assessing the six aspects of HD reasoning. By splitting the instrument in half, this internal reliability was reduced and some validity lost. This study was also the first use of the argumentation instrument. The validity and reliability of the instrument are promising but need to be further developed and studied, especially in comparison to the more common use of oral interviews or discourse analysis to assess argumentation skills. Lastly, the administration of two different forms for each instrument likely influenced the AU2 results. To account for differences among lab sections and to ensure that three-time participants all completed the instruments in the same order, only one form was presented at each administration. An alternative would have been to randomly assign the two forms of each instrument to lab sections for the AU1 administration and reverse the forms for the AU2 administration. This method would have accounted for any difference in the equivalency of the forms. However, ensuring that each participant would then receive the correct form for the WI administration was prohibitive, as it was impossible to know which participants would continue in the study in the second quarter.
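The reliability cost of halving an instrument can be approximated with the Spearman-Brown prophecy formula. The sketch below is illustrative only: the full-test reliability value used is hypothetical, not the LCTSR's actual coefficient.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test's length is multiplied by length_factor
    (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical full-test reliability of 0.80 (illustrative only); halving the
# test (length factor 0.5) predicts the reliability drops to roughly 0.67.
half = spearman_brown(0.80, 0.5)
```

This is consistent with the concern above: a split-half form should be expected to show lower internal reliability than the full instrument from which it was derived.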

Another limitation stems from the instrument administration process. As previously noted, the teaching assistants distributed the instruments in the regular lab sections of the introductory biology courses. Students were given no incentive to participate, which may have limited the validity of the results by limiting participant motivation. This is especially a concern for the winter administration: out of approximately 400 students enrolled in the second introductory biology course, only 59 completed the instruments. Because the instrument was offered as an option at the end of the laboratory practical exam during the last week of the course, there was very little impetus for students to participate. The limited number of individuals who then completed instruments at all three administrations possibly introduced bias into the results concerning both quarters. However, this bias is difficult to assess, as no significant difference was found over the two quarters. Lastly, although many studies have demonstrated an improvement in SR skills over a short period of time (e.g., Lawson, 1992a; Zohar & Nemet, 2002), it is possible that two quarters was not long enough to detect any improvement in SR skills.

A last limitation concerns the support for the theoretical model given in Figure 2. By focusing solely on students in the introductory courses, whose SR skills are limited, only a low positive relationship could be established between HD reasoning and argumentation skills. The relationship between these two aspects of SR needs to be further examined with individuals at presumed higher levels of SR skills, such as upper-level undergraduates, graduate students, and scientists. Studying individuals with higher levels of SR abilities may also better tease out relationships among the various underlying skills identified by the LCTSR and argumentation instruments. Overall, though, the findings of this study cannot be extended much beyond the particular circumstances identified here.

The Assumption of Natural Development of SR Skills

Professors of science have generally assumed that participation in regular undergraduate science coursework will implicitly foster individuals' scientific reasoning as students memorize facts and come to understand concepts. Yet these same professors are troubled time and again when students have difficulty writing laboratory reports or applying new information to an experimental situation. Hogan and Maglienti (2001) established that scientists do not come to truly understand what characterizes good scientific reasoning until they participate in the science process themselves, arguing their conclusions to their peers. This study complements that finding, as the participants did not have an experience even close to that of working in an actual research laboratory. In the coursework, students were examined primarily with factual-recall questions and experienced a low level of inquiry in their laboratory exercises. This two-quarter introductory biology course could be considered typical at a large university. Very little attention was paid to scientific reasoning, even implicitly. With little to no attention, the students in the study did not increase their scores over one or both quarters. Even the biology majors and three-time participants, who would have been expected to improve due to their interest in their own scientific reasoning, did not exhibit any significant improvement. In addition, the number of previous science courses was not found to be an influencing factor on either HD reasoning or argumentation skills. However, although the findings do not establish any SR skills improvement, this conclusion needs to be moderated because the validity and reliability of the instruments were weakened by the way they were administered.

These findings imply that to improve SR skills, explicit attention needs to be focused on them as part of the learning goals of the course. Previous studies (Jiménez-Aleixandre et al., 2000; Osborne et al., 2004; Zohar, 1996; Zohar & Nemet, 2002) have demonstrated that directed intervention improves argumentation skills. The control-group individuals in these studies did not significantly improve their argumentation skills compared to those who experienced an explicit intervention. At the college level, this type of intervention could occur in several ways. One would be a specific course, in addition to regular coursework, designed to teach these skills to biology or all science majors. A more likely scenario is one in which instructors consciously focus on aspects of good reasoning and test students on the use of these skills. Even though this could more readily occur in laboratory courses through practical use and application of SR skills, the lecture can and should be utilized as well, through discussing and modeling strong SR skills. Another possibility is for students to become involved in laboratory research much earlier than their junior and senior years, when most students enroll in independent research. Ideally, the focus on improving students' scientific reasoning earlier in their college careers would also allow them to be more connected to science as a process, reducing the feelings of alienation that commonly lead to attrition (Seymour & Hewitt, 1997).

Particular Findings for Hypothetico-Deductive Reasoning

One expectation of this study was that biology majors would have a higher initial HD reasoning score than the other populations in the literature. According to Piaget (1972), HD reasoning in different topic areas develops based on experience and aptitude. It was assumed that students interested in taking a major-level biology course would have a greater aptitude for SR than students enrolled in a non-majors course. The mean LCTSR score placed the students in the low range of formal-reasoning ability, slightly higher than the concrete-to-transitional range generally found for students enrolled in non-majors biology courses. Also, the percentage of students who scored in this range or higher (53-81%) was greater than the 50% previously reported for non-majors in biology (Johnson & Lawson, 1998; Lawson, 1992a). This implies that instructors of biology majors' courses can expect their students to come in with a somewhat, though not greatly, higher level of HD reasoning.

Another key finding regarding HD reasoning was the students' lower scores on the LCTSR item pairs assessing control of variables and direct HD reasoning. These results indicate that biology majors and other students enrolled in introductory biology courses have difficulty with two skills central to good scientific research. This finding, coupled with the lack of instructional attention, does not bode well for students' future success in laboratory courses or independent research. These skills need to be practiced for students to improve, which is difficult when students are primarily given laboratory exercises in which the design of an experiment is already provided. HD reasoning links the hypothesis, design, and predictions of an experiment together; if the hypothesis and design are already provided, it may be difficult for students to practice the reasoning that links them. Overall, students need more opportunities to design their own experiments to help improve their HD reasoning, preferably in a Level 3 or higher inquiry setting.

Particular Findings for Argumentation

The analysis of argumentation in this study focused on the overall structure and support of the argument, not the correctness of the content used. Overall, the argumentation findings were similar to those in the literature. Students in this study demonstrated significantly lower average scores for the recognition and rebuttal of alternative explanations than for the development of an initial argument (Osborne et al., 2004; Zohar & Nemet, 2002). This finding reflects a "myside" bias and difficulty recognizing other explanations for the same data set (D. Kuhn, 1992; Toplak & Stanovich, 2003). In addition, the difficulties students displayed in initial argument generation reflected those reported by Hogan and Maglienti (2001): overly broad claims and a lack of specific data to support the claims. However, students did recognize the importance of empirical data in supporting their own claims, often citing that "more research was needed" to choose between their conclusion and the alternative. Unfortunately, this reluctance to use the data given also indicates that students may have difficulty taking a stand with regard to their conclusions. This could be due to the perception that science is a collection of correct facts rather than empirically supported theories, a perception reinforced by the focus on factual recall in college biology instruction. Further refinement of the argumentation instrument and a more focused study could help determine whether students truly had difficulty identifying and rebutting alternative explanations or were simply tired of answering questions on an instrument that did not benefit them specifically.

The findings of this study imply that students first need to learn to develop a strong scientific argument. To this end, students must also understand that science is based on a preponderance of empirical data that is repeatedly argued over to develop useful theories and laws. Instructors could implement this in their classrooms simply by modeling a classic biology experiment and the argument the researchers created for their theory or model. Students should also be encouraged to analyze their own experimental data in laboratories and to support their analysis explicitly with theories learned in class. In the laboratory exercises used by the courses in this study, the most common type was structured inquiry, which does provide students the opportunity to analyze their own results (albeit with instructor or laboratory-manual guidance) and come to their own conclusions. In fact, one exercise focused on the historical development of the structure of DNA and the discovery of DNA replication through the analysis of alternative explanations. However, students are often not expected to explicitly explain the theory behind their conclusions or to identify possible alternative scientific explanations. This may be precisely what is needed.

Biology Majors as a Population

This study adds to the little that is known about biology majors as a population and begins to establish a baseline for future work. The most consistent result across the statistical tests was that the biology majors were not significantly different from the other students enrolled in the introductory biology courses. This is somewhat surprising, as many of the other students in the course were alternative majors who needed only one quarter of biology coursework. However, it does characterize biology majors as no more adept at HD reasoning or argumentation with a biology data set than other students enrolled in the same course. It may also imply that biology majors have difficulty recognizing these important underlying aspects of science as a process; instead, they are left to view biology as a science that is primarily a compilation of facts. This is of particular concern for the retention of biology majors. Seymour and Hewitt (1997) documented that one of the leading causes of attrition of science majors is a feeling of isolation due to heavy loads of required fact memorization. If biology majors could understand that science is more a process than a collection of facts, some of that perceived isolation might be alleviated. It is also well understood that many students who choose to major in biology do so with a desire to enter medical school after graduation. For both this population and the remaining biology majors who may go on to graduate school in biology, strong development of SR skills is crucial to help mold better doctors and better researchers.

This study thus provides some evidence that the way the large-lecture introductory biology course is generally taught has not been helping biology majors develop their SR skills early in the curriculum. "Methods of science" is often one of the top objectives of the introductory courses. However, students in these classes may have difficulty understanding these methods because they are not developing the mental reasoning skills, both HD and argumentation, that underlie them. In addition, the main learning goal of the introductory biology course is to understand basic content knowledge. Yet even this aspect of biology education is influenced by SR skills: Johnson and Lawson (1998) found that HD reasoning skill, not prior knowledge, was the better predictor of course achievement. Implicitly addressing these SR aspects of biology in the laboratory and lecture does not appear to be enough to produce measurable improvement. SR skills, both hypothetico-deductive and argumentative, should be specifically and explicitly addressed to ensure that biology majors reach these course and curriculum objectives. By placing more emphasis on SR skills early in the program, the foundation can be laid for the remaining curriculum and work beyond.

Future Work

Future work related to this study will focus on two main areas: instrument refinement and factors that affect SR skills. The first goal is to further refine and validate the argumentation instrument, particularly against oral interviews. Typically, argumentation research is conducted through oral interviews or discourse analysis; to be truly useful, the argumentation instrument will need to provide a similar level of information to these analytical systems. It will also be important to compare the instrument as it stands, with scientific data sets, to a similar version with a socioscientific topic. This could help bridge the gap between argumentation research on socioscientific issues and on scientific data. An end goal of the instrument refinement is to make it not only a better academic research tool but also a useful action-research tool for teachers. The LCTSR, due to its reliability, validity, and ease of use, is often employed in action research conducted both formally and informally. It would be useful if a comparable pencil-and-paper instrument could be developed to assess students' inductive reasoning about scientific data. This could help classroom teachers formatively assess their students' abilities and focus on areas of need.

The other main area of future study will be to follow the development of biology majors' scientific reasoning skills throughout the entire curriculum. This research will allow for the identification of any key types of courses, experiences, or points in the curriculum that particularly aid SR skills development, and may help identify characteristics of these factors that could be developed into interventions. For example, given how intimately SR is tied to the nature of science, it will be important to identify correlations between SR skills and students' understanding of the nature of science. From this and the literature, future work can focus on developing and implementing course and curriculum interventions. Part of this work will also involve identifying the impact of instructors' attitudes toward, and attention to, explicit instruction in scientific reasoning in both the lecture and the laboratory. It will also be of interest to determine whether these influences and interventions have any rolling impact on the K-12 classrooms of biology majors who wish to teach.

Overall, the work in this study and beyond is only beginning to unravel the needs of biology majors as a population. More information is needed regarding the development of SR skills throughout the undergraduate program and factors that affect this development, along with determining methods of intervention. From this point, biology departments can more readily prepare their students for their future, whether it is as researchers, medical professionals, teachers, or just scientifically literate individuals.

REFERENCES

Agnes, M. (Ed.). (2002). Webster's new world college dictionary (4th ed.). Cleveland, OH: Wiley Publishing, Inc.

Allchin, D. (2003). Lawson's shoehorn, or should the philosophy of science be rated 'x'? Science & Education, 12, 315-329.

American Association for the Advancement of Science. (1990). Science for all Americans. New York: Oxford University Press.

Arlin, P. K. (1975). Cognitive development in adulthood: A fifth stage? Developmental Psychology, 11(5), 602-606.

Baron, J. (1991). Beliefs about thinking. In J. F. Voss, D. N. Perkins & J. W. Segal (Eds.), Informal reasoning and education (pp. 169-186). Hillsdale, NJ: Lawrence Erlbaum Associates.

Bender, H. (2005). The study of biology. Change, 37(2), 42-43.

Bonstetter, R. J. (1998). Inquiry: Learning from the past with an eye on the future. Electronic Journal of Science Education, 3(1), Guest Editorial.

Bransford, J. D., Brown, A. L., & Cocking, R. R. (Eds.). (2000). How people learn: Brain, mind, experience, and school (Expanded ed.). Washington, D.C.: National Academy Press.

Carter, J. L., Heppner, F., Saigo, R. H., Twitty, G., & Walker, D. (1990). The state of the biology major. BioScience, 40(9), 678-683.

Cavallo, A. M. L., Rozman, M., Blickenstaff, J., & Walker, N. (2003/2004). Learning, reasoning, motivation, and epistemological beliefs: Differing approaches in college science courses. Journal of College Science Teaching, 33(3), 18-22.

Cerbin, B. (1988). The nature and development of informal reasoning skills in college students. Paper presented at the National Institute on Issues in Teaching and Learning, Chicago, IL. (ERIC Document Reproduction Service No. ED 298805).

Chiappetta, E. L. (1976). A review of Piagetian studies relevant to science instruction at the secondary and college level. Science Education, 60(2), 253-261.

Chinn, C. A., & Brewer, W. F. (1993). The role of anomalous data in knowledge acquisition: A theoretical framework and implications for science instruction. Review of Educational Research, 63(1), 1-49.

Chinn, C. A., & Brewer, W. F. (1998). An empirical test of a taxonomy of responses to anomalous data in science. Journal of Research in Science Teaching, 35(6), 623- 654.

Committee on Undergraduate Biology Education to Prepare Research Scientists for the 21st Century. (2003). BIO2010: Transforming undergraduate education for future research biologists. Retrieved May 8, 2006, from http://www.nap.edu/catalog/10497.html

Daempfle, P. A. (2002). Instructional approaches for the improvement of reasoning in introductory college biology courses: A review of the research. (ERIC Document Reproduction Service No. ED468720)

Driver, R., Newton, P., & Osborne, J. (2000). Establishing the norms of scientific argumentation in classrooms. Science Education, 84(3), 287-312.

Giere, R. N., Bickle, J., & Mauldin, R. F. (2006). Understanding scientific reasoning (5th ed.). Belmont, CA: Thomson Wadsworth.

Harker, A. R. (1999). Full application of the scientific method in an undergraduate teaching laboratory. Journal of College Science Teaching, 29(2), 97-100.

Hewson, M. G., & Hewson, P. W. (2003). Effect of instruction using students' prior knowledge and conceptual change strategies on science learning. Journal of Research in Science Teaching, 40(Supplement), S86-S98.

Hodson, D. (1996). Laboratory work as scientific method: Three decades of confusion and distortion. Journal of Curriculum Studies, 28(2), 115-135.

Hogan, K., & Maglienti, M. (2001). Comparing the epistemological underpinnings of students’ and scientists’ reasoning about conclusions. Journal of Research in Science Teaching, 38(6), 663-687.

Inhelder, B., & Piaget, J. (1958). The growth of logical thinking from childhood to adolescence: An essay on the construction of formal operational structures (A. Parsons & S. Milgram, Trans.). New York: Basic Books, Inc.

Jiménez-Aleixandre, M. P., Rodríguez, A. B., & Duschl, R. (2000). "Doing the lesson" or "doing science": Argument in high school genetics. Science Education, 84(6), 757-792.

Johnson, M. A., & Lawson, A. E. (1998). What are the relative effects of reasoning ability and prior knowledge on biology achievement in expository and inquiry classes? Journal of Research in Science Teaching, 35(1), 89-103.

Karplus, R. (1977). Science teaching and the development of reasoning. Journal of Research in Science Teaching, 14(2), 169-175.

Kuhn, D. (1992). Thinking as argument. Harvard Educational Review, 62(2), 155-178.

Kuhn, D. (1993a). Connecting scientific and informal reasoning. Merrill-Palmer Quarterly, 39(1), 74-103.

Kuhn, D. (1993b). Science as argument: Implications for teaching and learning scientific thinking. Science Education, 77(3), 319-337.

Kuhn, D., & Pearsall, S. (2000). Developmental origins of scientific thinking. Journal of Cognition and Development, 1(1), 113-129.

Kuhn, T. S. (1993). Logic of discovery or psychology of research? In J. H. Fetzer (Ed.), Foundations of philosophy of science: Recent developments (pp. 364-380). New York: Paragon House.

Kuhn, T. S. (1996). The structure of scientific revolutions (3rd ed.). Chicago: The University of Chicago Press.

Lakatos, I. (1993). History of science and its rational reconstructions. In J. H. Fetzer (Ed.), Foundations of philosophy of science: Recent developments (pp. 381-413). New York: Paragon House.

Lawson, A. E. (1980). Relationships among level of intellectual development, cognitive style, and grades in a college biology course. Science Education, 64(1), 95-102.

Lawson, A. E. (1982). The nature of advanced reasoning and science instruction. Journal of Research in Science Teaching, 19(9), 743-760.

Lawson, A. E. (1983). Predicting science achievement: The role of developmental level, disembedding ability, mental capacity, prior knowledge, and beliefs. Journal of Research in Science Teaching, 20(2), 117-129.

Lawson, A. E. (1992a). The development of reasoning among college biology students - a review of research. Journal of College Science Teaching, 21, 338-344.

Lawson, A. E. (1992b). What do tests of "formal" reasoning actually measure? Journal of Research in Science Teaching, 29(9), 965-983.

Lawson, A. E. (1993). Using reasoning ability as the basis for assigning laboratory partners in nonmajors biology. Journal of Research in Science Teaching, 29(7), 729-741.

Lawson, A. E. (1995). Science teaching and the development of thinking. Belmont, CA: Wadsworth Publishing Company.

Lawson, A. E. (2000). Classroom test of scientific reasoning: Multiple choice version, based on Lawson, A. E. (1978). Development and validation of the classroom test of formal reasoning. Journal of Research in Science Teaching, 15(1), 11-24.

Lawson, A. E. (2003a). Allchin's shoehorn, or why science is hypothetico-deductive. Science & Education, 12, 331-337.

Lawson, A. E. (2003b). The nature and development of hypothetico-predictive argumentation with implications for science teaching. International Journal of Science Education, 25(11), 1387-1408.

Lawson, A. E. (2005). What is the role of induction and deduction in reasoning and scientific inquiry? Journal of Research in Science Teaching, 42(6), 716-740.

Lawson, A. E., Alkhoury, S., Benford, R., Clark, B. R., & Falconer, K. A. (2000). What kinds of scientific concepts exist? Concept construction and intellectual development in college biology. Journal of Research in Science Teaching, 37(9), 996-1018.

Lawson, A. E., Baker, W. P., DiDonato, L., & Verdi, M. P. (1993). The role of hypothetico-deductive reasoning and physical analogues of molecular interactions in conceptual change. Journal of Research in Science Teaching, 30(9), 1073- 1085.

Lawson, A. E., Banks, D. L., & Logvin, M. (2007). Self-efficacy, reasoning ability, and achievement in college biology. Journal of Research in Science Teaching, 44(5), 706-724.

Lawson, A. E., Clark, B., Cramer-Meldrum, E., Falconer, K. A., Sequist, J. M., & Kwon, Y.-J. (2000). Development of scientific reasoning in college biology: Do two levels of general hypothesis-testing skills exist? Journal of Research in Science Teaching, 37(1), 81-101.

Lawson, A. E., Drake, N., Johnson, J., Kwon, Y.-J., & Scarpone, C. (2000). How good are students at testing alternative explanations of unseen entities? American Biology Teacher, 62(4), 249-255.

Lawson, A. E., & Johnson, M. (2002). The validity of Kolb learning styles and neo- Piagetian developmental levels in college biology. Studies in Higher Education, 27(1), 79-90.

Lawson, A. E., & Weser, J. (1990). The rejection of nonscientific beliefs about life: Effects of instruction and reasoning skills. Journal of Research in Science Teaching, 27(6), 589-606.

Lawson, A. E., & Wollman, W. T. (2003). Encouraging the transition from concrete to formal cognitive functioning - an experiment. Journal of Research in Science Teaching, 40(Supplement), S33-S50.

Lehmann, I. J. (1963). Changes in critical thinking, attitudes, and values from freshman to senior years. Journal of Educational Psychology, 54(6), 305-315.

Leonard, W. H. (1989). Ten years of research on investigative laboratory instruction strategies. Journal of College Science Teaching, 18, 304-306.

Leonard, W. H. (2000). How do college students best learn science? An assessment of popular teaching styles and their effectiveness. Journal of College Science Teaching, 29(6), 385-388.

Marbach-Ad, G. (2004). Expectations and difficulties of first-year biology students. Journal of College Science Teaching, 33(5), 18-23.

Means, M. L., & Voss, J. F. (1996). Who reasons well? Two studies of informal reasoning among children of different grade, ability, and knowledge levels. Cognition and Instruction, 14(2), 139-178.

National Research Council. (1996). National science education standards. Washington, D.C.: National Academy Press.

National Research Council. (1999). Transforming undergraduate education in science, mathematics, engineering, and technology. Washington, DC: National Academy Press.

National Science Foundation. (1996). Shaping the future: New expectations for undergraduate education in science, mathematics, engineering, and technology. Arlington, VA: National Science Foundation.

Nersessian, N. J. (1995). Should physicists preach what they practice? Constructive modeling in doing and learning physics. Science & Education, 4(3), 203-226.

Osborne, J., Erduran, S., & Simon, S. (2004). Enhancing the quality of argumentation in school science. Journal of Research in Science Teaching, 41(10), 994-1020.

Pascarella, E. T. (1987). The development of critical thinking: Does college make a difference? Paper presented at the Annual Meeting of the Association for the Study of Higher Education, Baltimore, MD. (ERIC Document Reproduction Service No. ED292417)

Perkins, D. N. (1985). Postprimary education has little impact on informal reasoning. Journal of Educational Psychology, 77(5), 562-571.

Perkins, D. N., Allen, R., & Hafner, J. (1983). Difficulties in everyday reasoning. In W. Maxwell (Ed.), Thinking: The expanding frontier (pp. 177-189). Philadelphia, PA: Franklin Institute Press.

Perkins, D. N., Farady, M., & Bushey, B. (1991). Everyday reasoning and the roots of intelligence. In J. F. Voss, D. N. Perkins & J. W. Segal (Eds.), Informal reasoning and education (pp. 83-106). Hillsdale, NJ: Lawrence Erlbaum Associates.

Perkins, D. N., & Salomon, G. (1989). Are cognitive skills context-bound? Educational Researcher, 18(1), 16-25.

Phillips, D. C. (1978). The Piagetian child and the scientist: Problems of assimilation and accommodation. Educational Theory, 28(1), 3-15.

Piaget, J. (1972). Intellectual evolution from adolescence to adulthood. Human Development, 15(1), 1-12.

Piaget, J. (2003). PART I. Cognitive development in children: Piaget - Development and learning. Journal of Research in Science Teaching, 40(Supplement), S8-S18.

Popper, K. R. (1965). Conjectures and refutations: The growth of scientific knowledge. New York: Harper & Row.

Popper, K. R. (1993). Science: Conjectures and refutations. In J. H. Fetzer (Ed.), Foundations of philosophy of science: Recent developments (pp. 341-363). New York: Paragon House.

Sadler, T. D. (2004). Informal reasoning regarding socioscientific issues: A critical review of research. Journal of Research in Science Teaching, 41(5), 513-536.

Sadler, T. D., & Zeidler, D. L. (2005). The significance of content knowledge for informal reasoning regarding socioscientific issues: Applying genetics knowledge to genetic engineering issues. Science Education, 89(1), 71-93.

Seymour, E., & Hewitt, N. M. (1997). Talking about leaving: Why undergraduates leave the sciences. Boulder, CO: Westview Press.

Shaw, V. F. (1996). The cognitive processes in informal reasoning. Thinking and Reasoning, 2(1), 51-80.

Sigma Xi. (1990). Entry-level undergraduate courses in science, mathematics, and engineering: An investment in human resources. Research Triangle Park, NC: Sigma Xi, The Scientific Research Society.

Staver, J. R., & Pascarella, E. T. (1984). The effect of method and format on the responses of subjects to a Piagetian reasoning problem. Journal of Research in Science Teaching, 21(3), 305-314.

Thornton, S. (2005). Karl Popper. Retrieved June 6, 2006, from http://plato.stanford.edu/archives/sum2005/entries/popper/

Toplak, M. E., & Stanovich, K. E. (2003). Associations between myside bias on an informal reasoning task and amount of post-secondary education. Applied Cognitive Psychology, 17, 851-860.

Toulmin, S. E. (1958). The uses of argument. Cambridge, Great Britain: Cambridge University Press.

Toulmin, S., Rieke, R., & Janik, A. (1984). An introduction to reasoning (2nd ed.). New York: Macmillan Publishing Company.

van Gelder, T., & Bissett, M. (2004). Cultivating expertise in informal reasoning. Canadian Journal of Experimental Psychology, 58(2), 142-152.

von Glasersfeld, E. (1993). Questions and answers about radical constructivism. In K. Tobin (Ed.), The practice of constructivism in science education (pp. 24-38). Hillsdale, NJ: Lawrence Erlbaum Associates.

Voss, J. F., & Means, M. L. (1991). Learning to reason via instruction in argumentation. Learning and Instruction, 1, 337-350.

Windschitl, M., & Buttemer, H. (2000). What should the inquiry experience be for the learner? American Biology Teacher, 62(5), 346-350.

Woll, S. B., Navarrete, J. B., Sussman, L. J., & Marcoux, S. (1998). College students' ability to reason about personally relevant issues. Paper presented at the Annual Convention of the American Psychological Association, San Francisco, CA. (ERIC Document Reproduction Service No. ED424556)

Zeidler, D. L. (1997). The central role of fallacious thinking in science education. Science Education, 81(4), 483-496.

Zohar, A. (1996). Transfer and retention of reasoning strategies taught in biological contexts. Research in Science & Technological Education, 14(2), 205-219.

Zohar, A., & Nemet, F. (2002). Fostering students' knowledge and argumentation skills through dilemmas in human genetics. Journal of Research in Science Teaching, 39(1), 35-62.

APPENDIX A

ARGUMENTATION INSTRUMENT

Form A

While climbing in the Rocky Mountains, you find an isolated valley that seems to have been separated from the surrounding area by a landslide centuries ago. You are surprised to find an unknown species of yellow flower from the family Asteraceae. You know that there are several species nearby that could possibly be the original source of this population. You collect data from these populations to try to determine the evolutionary source of the new flower.

Flower       Height (m)   Petal Tip   Leaf Shape   Leaf Edge   Seed Dispersal
Unknown      1.0          Round       Uneven       Ragged      Bird
Daisy        0.75         Pointed     Uneven       Ragged      Wind
Sunflower    2.0          Round       Teardrop     Smooth      Bird
Dandelion    0.25         Pointed     Uneven       Ragged      Wind
Goatsbeard   0.5          Round       Uneven       Smooth      Bird

1. What is a conclusion that you can draw from the data regarding the evolutionary source of this new flower?

2. What data are you using to support this conclusion?

3. What rationale links this data to your conclusion?

4. The local naturalist is excited by your find, but takes an alternative viewpoint with regards to the data. What does he conclude?

5. How would you respond to this viewpoint?

Form B

You have been surveying Lake Erie for the last five years to better understand the relationship among the zebra mussel, quagga mussel, round goby (fish), and small mouth bass. To answer this question, you have collected the following data:

                   Average Mature Density
                   per Year (#/sample)
Organism           00    01    02    03    04    Preferred Food Sources                            Preferred Spawning Habitat
Zebra Mussel       125   117   98    87    74    Protozoa                                          Rocky, shallow
Quagga Mussel      143   135   100   84    121   Protozoa                                          Rocky, shallow
Round Goby         8     11    17    21    10    Mussel larvae, Algae                              Sandy, shallow
Small Mouth Bass   6     9     10    18    22    Immature Round Goby, Immature Mussels, Minnows    Rocky, shallow

1. What is a conclusion that you can draw from the data regarding these relationships?

2. What data are you using to support this relationship?

3. What rationale links this data to your conclusion?

4. A graduate student in your lab takes an alternative viewpoint with regards to the data. What does she conclude?

5. How would you respond to this viewpoint?

APPENDIX B

ARGUMENTATION INSTRUMENT SCORING RUBRIC

Item 1 – Claim made
  0: No claim made, or claim made is irrelevant to data/scenario presented or is too broad/general
  1: Claim made is weakly related to or supported by data/scenario presented
  2: Claim made is clearly related to data/scenario presented and is conservative

Item 2 – Grounds used
  0: No grounds used, or grounds used are irrelevant to data/scenario presented
  1: Grounds given weakly support claim made and/or are too general ("all data")
  2: Grounds given sufficiently support claim made, identifying specific data and trends

Item 3 – Warrants given
  0: No warrants given, or warrant given is irrelevant to data/scenario presented or is completely unclear
  1: Warrant weakly relates grounds to claim or is somewhat unclear
  2: Warrant is valid in light of grounds used and claim made

Item 4 – Counterargument generated
  0: No counterargument generated, or counterargument generated is irrelevant to data/scenario presented or not opposed to initial claim at all
  1: Counterargument given is weakly opposed to initial claim or supported by/related to data/scenario presented (no answer to "why")
  2: Counterargument given is clearly related to data/scenario presented and opposes initial claim

Item 5 – Rebuttal offered
  0: No rebuttal offered, or rebuttal offered is irrelevant to data/scenario presented ("both valid" or "more research needed")
  1: Weak rebuttal offered, not supported by grounds or just expansion on warrant/claim
  2: Rebuttal is clearly identified and supported by grounds, offers new viewpoint

If the individual begins the assessment in earnest and leaves items blank, those items are scored as 0.

APPENDIX C

STUDENT DEMOGRAPHIC INFORMATION INSTRUMENT (WINTER 2007)

1. Are you willing to participate in the study entitled, “Scientific Reasoning Skills Development in the First Courses of Undergraduate Biology Majors”?

YES NO

2. What is your age? ______

3. What is your gender? MALE ______FEMALE ______

4. What is your university rank?

FRESHMAN   SOPHOMORE   JUNIOR   SENIOR   SENIOR+   GRADUATE STUDENT

5. How many years have you attended college as an undergraduate? ______

6. What is your declared or planned major? ______

7. How many advanced placement (AP) science courses have you taken? ______

8. How many college-level science courses have you taken? ______

9. *What quarter did you take Biology 1 or its equivalent? ______

10. *If you did not take Biology 1 at OSU, where did you complete its equivalent?

______

This school follows (please check one): Semesters _____ Quarters _____

11. What degree are you working toward?

A.S. _____ B.A. _____ B.S. _____ M.S. _____

12. What are your plans after completing your degree?

JOB   BACHELOR’S DEGREE   GRADUATE SCHOOL: SCIENCE   GRADUATE SCHOOL: OTHER THAN SCIENCE   PROFESSIONAL SCHOOL (i.e. MEDICAL, DENTAL, ETC…)

If you are not pursuing a career in science, in what field are you planning on pursuing a career? ______

* These questions were only asked in the winter administration to Biology 2.
