Spring 1991 67

Research In Informal Settings: Some Reflections on Designs and Methodology

John J. Koran, Jr.
Professor and Curator, Science Education and Museum Studies
Associate Dean, The Graduate School

Jim Ellis
Research Assistant, Museum Studies

Florida Museum of Natural History
University of Florida
Gainesville, FL 32611

Research in informal settings can be experimental or naturalistic or a combination of methods, all of which can yield rich data. Research designs are selected to explore specific problems; however, even the most carefully thought-out designs and procedures have shortcomings. Frequently, research studies contribute to summative or formative evaluation objectives. All evaluation studies, however, do not necessarily adhere to the rigorous criteria required for research studies and consequently may or may not be strictly defined as "research." This paper focuses on experimental designs and the review of four experimental studies, considering threats to internal and external validity. Internal validity deals with whether the experimental treatments actually produce the observed effects. External validity concerns generalizability of study findings to other settings, exhibits, and subjects. Threats to validity include history, maturation, testing, instrumentation, regression, selection, mortality, and interactions. The studies selected for this analysis were all variations of experimental designs. In each case, the study could have profited from stronger, more focused research designs and methodology as well as from exit interviews of the subjects and from the collection of other types of naturalistic data. It is the authors' opinion that both experimental and naturalistic methods can often be used together to enrich the data base from which to make inferences and contribute to knowledge about learning in informal settings.

68 ILVS Review

INTRODUCTION

Research in informal settings can be experimental or naturalistic or a combination of methods, all of which can yield rich data. Regardless of the methods used, research designs are selected to explore specific issues and problems. However, even the most carefully thought-out designs and procedures can have some shortcomings. This is especially true of field research, where control of variables is often difficult to achieve. While research studies can contribute to evaluation objectives, not all evaluation studies necessarily adhere to the rigorous criteria required for research studies. Consequently, such studies may or may not be strictly defined as "research." This paper will focus on experimental research designs and will critique four published studies.

Experimental Designs: Some Basics

Cook and Campbell (1979) suggest that the word "experiment" implies testing, causal relationships, deliberate manipulation, and inference. For clarity we will use the term to signify a treatment, an outcome measure, some form of assignment of subjects, and a method for comparison to determine the effects of the treatment. A treatment can involve any number of possible manipulations; however, for museums and other informal learning settings,¹ treatments frequently involve having subjects (visitors, students) view or participate in some exhibit or program that includes a planned and/or sequenced set of informational or instructional materials. Outcome indicators or instruments generally take the form of questionnaires, interviews, or other ways of measuring knowledge, comprehension, interests, attitudes, etc. These instruments often measure combined outcomes (e.g., knowledge and attitude), although interactions between cognitive and affective questions can occur, making the results difficult to interpret. Other forms of measurement also are used, such as unobtrusive observation of visitors and coding of pertinent visitor behavior. Depending on the problem they are studying, researchers often assign their subjects to several treatment groups and/or a control group. In order to enhance the ability of the researcher to make an inference about the change observed as a result of a particular treatment, a randomization technique for subject selection and assignment should be utilized. Random selection from a given population assures the researcher that the subjects are representative of that population. Random

1. Informal learning settings can include zoological parks, botanical gardens, nature centers, national and state parks, aquaria, school-based nature trails, most field trip locations, and even many school laboratory activities.

assignment of subjects from a selected population to the treatment and control groups ensures that the study groups are equivalent in all factors other than those associated with the treatments (Cook and Campbell, 1979; Smith and Glass, 1987). Researchers should be aware that the unit of analysis is dependent on the method by which subjects are selected and assigned. If individual visitors are randomly assigned to treatments and control, the unit of analysis is based on the number of visitors in the study; however, if one assigns groups of visitors (e.g., adult education classes), then the unit of analysis is based on the number of groups and not individuals. The latter would require a considerably larger sample size since small numbers of intact groups generally must show robust treatment effects before the effects are measurable. Yet in some situations, class assignment may be easier than splitting up classes.
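The random selection and random assignment just described can be sketched in a few lines of Python. The population, sample size, and two-group split below are hypothetical illustrations, not a procedure from any of the reviewed studies:

```python
import random

def select_and_assign(population, n_subjects, seed=None):
    """Randomly select n_subjects from a population, then randomly
    split them into equal-sized treatment and control groups."""
    rng = random.Random(seed)
    # Random selection: a simple random sample from the population.
    sample = rng.sample(population, n_subjects)
    # Random assignment: shuffle the sample, then split it in half.
    rng.shuffle(sample)
    half = n_subjects // 2
    return sample[:half], sample[half:]

# Hypothetical example: 40 visitors drawn from a pool of 200, split 20/20.
visitors = [f"visitor_{i}" for i in range(200)]
treatment, control = select_and_assign(visitors, 40, seed=1)
```

Selection makes the sample representative of the population; the shuffle-and-split makes the two groups equivalent, in expectation, on everything except the treatment.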

Three Basic Research Designs

Research designs vary according to researcher preferences and types of questions asked. Campbell and Stanley (1973) have written one of the best introductory texts on experimental research designs. Table 1 outlines three of the most basic designs. Design 1 is best suited when the researcher has a large or relatively homogeneous sample and has reason to believe that randomization will be effective in equating the treatment and control groups; hence a pretest to determine whether randomization was effective is not necessary. This type of design would be suitable for many of the museum and informal settings that include the casual visitor as well as students in planned group visits. This design can be used with class (field trip) groups by randomly assigning each student in a particular class to a treatment or control group. The treatment group views an exhibit or otherwise participates in the treatment and then receives the post-test. The control group is first tested with the same post-test and then allowed to view the exhibit or receive the treatment; through this modified design, the control group can benefit from the exhibit or treatment while still serving as a control. Designs 2 and 3 are most suitable when the researcher has greater control of the subjects or groups because at least two testing periods are needed for each group. Laboratory type settings, classrooms, school groups, and other planned groupings associated with museums and informal settings might lend themselves to this type of design. Again, as these are experimental designs, random selection and assignment are used to offset both design threats and sources of bias in selecting subjects (Table 2). A pretest, given to ascertain if randomization was indeed effective, provides data on whether the group means and standard deviations were equal at the outset.
Design 3 provides the researcher with the opportunity to study the effects of pretesting on the treatment (interactions of the treatment and test) and the effects of the pretest on the post-test when no treatment intervenes.

Table 1
Experimental Research Designs for Informal Settings

Design                          Subject      Pretest   Treatment   Post-test
                                Assignment

Design 1: Post-test Only
  treatment group               random       NO        YES         YES
  control group                 random       NO        NO          YES

Design 2: Pretest/Post-test
  treatment group               random       YES       YES         YES
  control group                 random       YES       NO          YES

Design 3*: Solomon four-group
  treatment group               random       YES       YES         YES
  control group                 random       YES       NO          YES
  treatment group               random       NO        YES         YES
  control group                 random       NO        NO          YES

*This design is particularly strong because it controls for most effects that may be caused by the test itself.

Note: YES indicates that a test or treatment is done in the sequence presented. NO indicates that the particular aspect is not active in the design.

(adapted from Campbell and Stanley, 1973)

Table 2
Design Threats to Validity

Design                          History  Maturation  Testing  Instrumentation  Regression  Mortality

Design 1: Post-test Only
  with control group            +        +           +        +                +           +

Design 2: Pretest/Post-test
  with control group            +        +           +        +                +           +

Design 3: Solomon four-group    +        +           +        +                +           +

Note: Plus (+) sign indicates threat is accounted for through random assignment in the design. Question mark (?) indicates that threats to validity may not be fully accounted for by random assignment.

(adapted from Campbell and Stanley, 1973)


As illustrated in Table 2, a number of possible design threats or factors can hinder the interpretation of the results. Campbell and Stanley (1973), in their discussion of experimental and quasi-experimental designs for research, refer to these threats to validity as history (specific events occurring during the study), maturation (growth or change in the subjects during the study), testing (effects of the test on the subjects of the study), instrumentation (changes in rating scales or instruments during the study), regression (scores moving toward the mean regardless of treatment because of the groups' extreme or unique nature), selection (biases based on placement of subjects in groupings), mortality (non-random reasons for subjects dropping out of the groupings), and interactions (various aforementioned threats interacting with each other). These threats may affect the ability of the researcher to observe true treatment effects and to infer what caused them; in addition, the threats may affect generalizability to other settings. Table 2 shows that, for the three designs previously discussed, the majority of these threats are addressed by the random assignment process.

Validity and Reliability

The internal validity of a study deals with whether or not the treatment actually produced the effects observed. External validity is concerned with the degree to which findings are generalizable to other settings, subjects, and exhibits or programs. Lack of external validity results in data that are unique to one type of setting and set of conditions rather than to a broad range of similar conditions. It is up to the researcher to balance the requirements for internal and external validity and, in the end, develop a design that best answers the questions being asked. Another significant threat to any study is instrument validity and reliability. Instrument validity is concerned with whether the instrument is measuring the actual content or characteristics presented to the treatment group. For example, does the instrument measure the content actually taught? Instrument validity often can be established by having the instrument reviewed by experts or subject-area teachers; in addition, the instrument can be compared with the actual exhibit or program. Questions of measurement reliability (how consistent and dependable the instrument is) must also be answered. Three aspects should be considered for determining the reliability of any instrumentation: testing (does the test measure the same thing repeatedly; if using two forms of the same test, are the forms equivalent?), internal consistency (are all of the items on the test measuring the same property or quality?), and inter-observer agreement (if using observers or raters, how well do they agree on their observations?). In addition to design threats, other basic considerations in judging the quality of research are: identifying significant problems to explore, carefully describing them, describing the theoretical basis of the

study, and rigorously adhering to the selection procedures established for identifying visitors for study. The authors use the above criteria and standards to analyze four frequently cited articles that have been published in the area of museum research studies over the last decade. Studies 1 and 2 serve as examples of a Post-test Only/Control Group design (Table 1, Design 1), with the unit of analysis based on the total number of individual students. Study 3 is a Pretest/Post-test example (Table 1, Design 2). Study 4, although using a modified Post-test Only/Control Group design, illustrates the complexities and difficulties in both design and interpretation that can arise when multiple factors are included (knowingly and unknowingly) in an experimental design.
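One common way to quantify the inter-observer agreement discussed above is Cohen's kappa, which corrects raw percent agreement for chance. The choice of kappa and the rating data below are our illustration; the paper does not prescribe a particular index:

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: chance-corrected agreement between two raters
    coding the same sequence of categorical observations."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    # Observed proportion of agreement.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's marginal proportions.
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)

# Hypothetical codings of four visitor behaviors by two observers.
print(cohens_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Kappa of 1.0 means perfect agreement; 0.0 means no better than chance.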

THE STUDIES

STUDY 1: Koran, J. J., Jr., Lehman, J. R., Dierking, L. D., & Koran, M. L. (1983). The relative effects of pre- and post-attention directing devices on learning from a "walk-through" museum exhibit. Journal of Research in Science Teaching, 20(4), 341–346.

Purpose and Rationale: The stated purpose of this study was to determine whether an information panel placed before or after an exhibit had a beneficial effect in focusing visitor attention. The exhibit studied was a walk-through cave. This exhibit allowed the visitor to enter a cave setting containing the characteristic flora, fauna, and geological formations found in Florida caves. The cave as designed had no introductory information preceding it, but it did have an external information panel at its exit. Previous theory on attention, encoding, and processing (Bransford, 1979; Glynn, 1978; Glynn & Di Vesta, 1979; Keele, 1973; Rothkopf, 1970) suggests that an exhibit information panel can have two types of effects. When placed before the exhibit, such a panel serves to converge visitor attention on information intended as exhibit outcome. The visitor experiencing the same panel after the exhibit may have attended to a broad spectrum of information in the exhibit rather than the information intended as exhibit outcome, and the panel may act to diverge that attention. Thus, the group receiving the panel before the cave would be expected to learn more intended information about the cave.

Research Design and Procedures: Twenty-nine seventh and eighth grade students attending a National Science Foundation workshop were randomly assigned to two treatment groups and a control. Treatment 1 viewed the exhibit with the pre-panel. Treatment 2 saw the exhibit with the post-panel. Subjects who did not visit the cave (control) provided a baseline of knowledge. Time spent in the exhibit was controlled for the three groups, and a 25-item multiple choice test was administered

after 20 minutes of exposure to the cave. Visitors were not cued or instructed to follow any particular instructions. Based on the Kuder-Richardson method (K-R20) for establishing internal consistency reliability, a reliability coefficient of .80 was calculated. Internal validity of the instrument was established by a panel of biology teachers, biologists, and cave experts, who reviewed the test to answer the internal validity question, "Does the instrument measure all of the stated objectives of the cave exhibit factually and conceptually?" The overall design was a modified Post-test Only/Control Group design with two treatments (Table 1, Design 1).
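A K-R20 coefficient like the .80 reported here is computed directly from the matrix of right/wrong item responses. A minimal sketch with a hypothetical 0/1 response matrix (the study's raw responses are not published):

```python
def kr20(item_scores):
    """Kuder-Richardson 20 internal-consistency reliability for
    dichotomous (0/1) items. item_scores: rows = subjects, cols = items."""
    n_subjects = len(item_scores)
    k = len(item_scores[0])
    # Sum of p*q over items, where p = proportion answering item correctly.
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in item_scores) / n_subjects
        sum_pq += p * (1 - p)
    # Population variance of the total scores.
    totals = [sum(row) for row in item_scores]
    mean = sum(totals) / n_subjects
    var = sum((t - mean) ** 2 for t in totals) / n_subjects
    return (k / (k - 1)) * (1 - sum_pq / var)

# Hypothetical data: 4 subjects by 3 items.
scores = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(scores))  # 0.75
```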

Findings: Not unexpectedly, the authors reported that both of the treatment groups did better on the 25-item multiple choice test than the control group did. Analysis of variance results were significant, [F(2, 28) = 8.09, p < .01]. Although greater success was predicted for Treatment 1, this did not occur. However, the differences were in the predicted direction.
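The F statistic in a one-way analysis of variance like the one reported above is the between-group mean square divided by the within-group mean square. A minimal sketch (the scores below are invented for illustration, not the study's data):

```python
def one_way_anova_f(*groups):
    """F statistic for a one-way ANOVA across any number of groups."""
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    k = len(groups)
    n = len(all_scores)
    # Between-group sum of squares: group sizes times squared mean deviations.
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    # Within-group sum of squares: deviations from each group's own mean.
    ss_within = sum(
        sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups
    )
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within

# Hypothetical scores for two treatments and a control group.
print(one_way_anova_f([1, 2, 3], [2, 3, 4], [3, 4, 5]))  # 3.0
```

The resulting F is then compared against the F distribution with (k − 1, n − k) degrees of freedom to judge significance.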

Critique of Study: Although the number of subjects in this study was small (N = 29), the n for the two treatment groups was equal, thus justifying the use of analysis of variance. Also, both the reliability and internal validity of the criterion instrument were judged high; hence, the researchers were measuring what they intended to measure. A pretest would have been of value in providing additional information about individual differences in the groupings and allowing for different forms of analyses such as analysis of covariance. In this case, with random assignment, the pretest was not necessary and the researcher can assume that the groups were equated unless there was a flaw in the randomization procedure. As always, due to external validity considerations, generalizations about results can only be applied to a similar sample (NSF summer students, mean IQ 118) in the same type of cave setting. Essentially, external validity was threatened by the unlikelihood of recreating identical cave conditions and having a sample of subjects with similar mean IQs. There is a serious flaw in this study. There should have been an additional control group that walked through the cave with no panels present, thus providing more precise information about the processing effect of individuals without any cues (panels). While this would require a larger sample, say 80 students, it would have provided data on the effects of the exhibit without panels in addition to the baseline provided by visitors who did not see the exhibit (control group). Furthermore, a homogeneous group of students with a mean IQ of 118 is quite unusual. Literature in the area of aptitude × treatment interaction (Koran & Koran, 1980) suggests that higher ability students do well regardless of the types of treatment or cues provided. Brighter students have been shown to use their own processing methods and thus accommodate for variability in the stimulus materials presented to them.
This is one plausible explanation for the lack of differences between treatments, although other reasons are also possible.

STUDY 2: Dierking, L. D., Koran, J. J., Jr., Lehman, J., & Koran, M. L. (1984). Recessing in exhibit design as a device for directing attention. Curator, 27(3), 238–248.

Purpose and Rationale: Neal (1976) calls recessing the "hole in the wall" technique. The researchers in this study define recessing as the placing of an object in an area that is indented or offset in a deepened pocket or hole from the face of the exhibit panel itself. Many museum designers advocate recessing as a device for directing and focusing attention (Neal, 1976). But is it? Recessing in an exhibit should act to increase learning by focusing attention on the recessed object, in this case a particular order of insect. Lack of recessing should result in the observer searching the array of stimuli without a cue as to what organism or organism characteristic to focus on. The study presented the exhibit objects inductively to two treatment groups. Only one of the two treatment groups was provided additional cueing (recessing). The authors proposed that this would be an effective method of testing the value of recessing.

Research Design and Procedures: A simulated museum exhibit was designed to convey information on the characteristics of the beetle (Class Insecta, Order Coleoptera). The exhibit panel consisted of five insect trios (a Coleopteran positioned between two specimens from two other insect orders). The only text materials in the exhibit were labels with the common names of the three insect orders; these labels were displayed at the top of the exhibit case, not adjacent to the specimens. Ninety-nine seventh and eighth grade students were randomly assigned to three groups. Treatment 1 viewed the exhibit case with the Coleopterans recessed. Treatment 2 viewed the identical exhibit, but without the recesses. The control group did not view either exhibit but read a short unrelated article on "Insects and Human Welfare" from The American Biology Teacher. Both treatment and control groups received a one-page set of instructions describing the body parts of the Class Insecta and telling students to "pay close attention to the variations in body, wings, legs, and antennae of each of the orders" (p. 241). The treatment groups were told to try to determine "what makes a beetle a beetle" from an examination of the exhibit panel; the control group was told to try to determine "what makes a beetle a beetle" from the unrelated text they were asked to read. Thus, both treatment and control groups received equivalent instructions. Given these instructions, the subjects had to induce the concept of beetle either from an examination of the exhibit panel (treatment groups) or the unrelated text (control group). The control group spent the same amount of time on their activity as the participants in the

experimental groups. Since the sample was large and randomly assigned, a Post-test Only/Control Group design was selected because it was presumed unlikely that significant differences would occur due to faulty randomization. It was also unlikely that the Control Group would learn any information pertinent to the study by reading the above-mentioned article. A 25-item multiple choice examination was administered to the two treatment groups after viewing the experimental panel. To correctly respond to the items on the instrument, the students needed to pay close attention to the characteristics and differences of the bodies of the insect groups. Questions were based on distinguishing between the three orders of insects and understanding what makes a beetle a beetle. Prior to the study, the instrument was pilot tested and revised as necessary. The Kuder-Richardson (K-R21) internal consistency reliability of this instrument was .69. To perform well on the measure, according to the researchers, subjects would have to use location and recessing as a cue to attend to the distinguishing characteristics of the three insect orders represented.

Findings: Analysis of variance was used in this study to assess the differences between treatments and control. Mean test scores showed that the un-recessed group (M = 14.56) significantly exceeded the recessed group (M = 12.03) and the control group [(M = 10.21), F(3, 96) = 11.06, p < .05]. The variance was approximately equal in all groups, so the assumption of equal variance when using analysis of variance was justified. Since Treatment 1 (recessed group) had a mean score below the mean score for Treatment 2 (un-recessed group), it was concluded that recessing alone did not adequately focus attention. The authors speculated that the recesses may have so directed attention to the beetles that the subjects failed to note the other insects and/or those characteristics that distinguish them from beetles. In fact, recessing may have interfered with subsequent coding of both the relevant and irrelevant attributes of the other specimens. The authors note that attention can be a limiting process (Bransford, 1979; Gagne, 1970, 1973; Keele, 1973). Attention paid to one cue can reduce attention paid to another.

Critique of Study: The large sample in this study, randomly assigned, was an advantage. Differences in variance between each treatment group were small, thus giving the hypothesis a realistic test. The large sample also justified the Post-test Only/Control Group design and negated threats to internal validity. Since the study was short (approximately one hour), history, maturation, selection, and mortality probably did not threaten the outcomes.

The study had at least one omission and one problem. Since the sample was so large, and included seventh and eighth graders, the analysis should have considered the two grade levels individually. Perhaps seventh graders required the extra cue for processing the exhibit materials, but eighth graders did not, or vice versa. Second, the instrument clearly required refinement. A reliability coefficient of .69 is marginal. Less than one-half of the variance in student performance (48%) could be accounted for by this instrument. Greater reliability of the instrument could have assisted in a finer-grained test of treatments against each other and against the control. Also, gender differences could have been important (i.e., attitude of girls vs. boys toward insects). The authors did not indicate the gender composition of the treatment groups. Since the study was conducted under controlled simulated conditions (cued visitors, non-museum setting) and with specially designed exhibits, external validity would be limited to these exhibits, in a similar setting, with similar students. Simulated conditions are very difficult to recreate; consequently, establishing external validity is almost an impossible task unless done by the original researchers. Because this was a very well-controlled study conducted under almost "clinical" conditions, we believe that internal validity was held intact for the reasons cited previously.

STUDY 3: Gennaro, E. D. (1981). The effectiveness of using pre-visit instructional materials on learning for a museum field trip experience. Journal of Research in Science Teaching, 18(3), 275–279.

Purpose: The three stated research questions were (1) "What is learned by schoolchildren attending a specific museum movie-experience?" (2) "To what extent can this be a more effective learning experience if the attention of the students is focused on the concepts and theories in the movie-experience before the field trip to the museum?" and (3) "Does preliminary instruction help one ability group of students more than another?" (p. 275). The subject matter for the study was the Big Bang Theory and plate tectonics (treatment group); the control group received information on the geology of the National Parks.

Rationale: The author reviewed research indicating that "carefully designed curricular materials and advance organizers can be effective instructional strategies for learning new information" (p. 275). Furthermore, literature was cited that presented conflicting results about the effectiveness of advance preparation for students at different achievement levels. The author clearly distinguished between overviews and advance organizers as pre-instructional strategies.

Research design and procedure: The museum experience consisted of a movie-in-the-round. A Pretest/Post-test design with Control Group was used (Table 1, Design 2). The experimental treatment consisted of classroom instructional material drawn from the museum (study sheets, demonstrations, and hands-on experiences) that focused on concepts associated with the Big Bang Theory and plate tectonics. The design consisted of a seven-day experimental period which included an identical pretest (content) administered to both treatment and control groups; a four-day instructional period in which only the treatment group was taught, using the museum-associated materials, and the control group was taught with materials that did not include the subject matter of the treatment materials; a field trip to the museum movie by both groups; and on the last day, the administration of a post-test (same questions as pretest). The experimental groups were composed of five eighth-grade science classes normally taught by the same teacher. The author states that "each of five eighth-grade earth science classes ... were randomly assigned" (p. 276); in fact, it appears that the students (and not the classes) were randomly assigned to the treatment (n = 49) and control (n = 56). During the study, the control group was taught by the regular teacher, and the experimental group was taught by a graduate student in education. Both teachers were certified in the same subject area of the experiment. Instrument design made use of the Taxonomy of Educational Objectives (Bloom et al., 1956) to classify test items. Of 50 items, 6 were factual, 26 were comprehension, and 18 were synthesis and analysis questions. All items in the test instrument were directly related to ideas and concepts presented in the film. The Spearman-Brown Prophecy formula, which correlates test halves, was carried out on the post-test of both control and treatment groups. The reliability reported was .84.
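The Spearman-Brown prophecy formula used here projects the reliability of the full-length test from the correlation between its two halves. A minimal sketch (the 0.5 correlation is a hypothetical input, not the study's value):

```python
def spearman_brown(r_half, factor=2):
    """Spearman-Brown prophecy formula: reliability of a test
    lengthened by `factor`, given the half-test correlation r_half."""
    return factor * r_half / (1 + (factor - 1) * r_half)

# Doubling a test whose split halves correlate at 0.5:
print(spearman_brown(0.5))  # ≈ 0.667
```

Intuitively, a longer test averages out item-level noise, so the projected full-test reliability always exceeds the half-test correlation.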

Findings: Analysis consisted of using step-wise multiple regression and the reporting of means and standard deviations. Class period, pretest scores, treatment type, and interaction of pretest and treatment scores comprised the order of the variables in the regression formula. F and R² values were reported in table format for the above-mentioned variables tested. The author reports that "Pretest scores accounted for 44% of the total variance on the post-test over and above the effect of the period of the day. Treatment accounted for approximately 7% of the variance over and above the effect of period of day and initial ability" (p. 277). No significant interaction effect was found between class period, pretest scores, and treatment type.
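The "over and above" figures are increments in R² as each predictor enters the regression in order. A pure-Python sketch of that increment for two predictors; the data are fabricated to illustrate the computation, not Gennaro's:

```python
def r_squared(y, X):
    """R^2 from ordinary least squares with an intercept, solved via
    Gaussian elimination on the normal equations (pure Python)."""
    n = len(y)
    cols = [[1.0] * n] + [list(col) for col in X]  # intercept column first
    k = len(cols)
    # Normal equations: (A^T A) b = A^T y
    ata = [[sum(cols[i][t] * cols[j][t] for t in range(n)) for j in range(k)]
           for i in range(k)]
    aty = [sum(cols[i][t] * y[t] for t in range(n)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for i in range(k):
        pivot = max(range(i, k), key=lambda r: abs(ata[r][i]))
        ata[i], ata[pivot] = ata[pivot], ata[i]
        aty[i], aty[pivot] = aty[pivot], aty[i]
        for r in range(i + 1, k):
            f = ata[r][i] / ata[i][i]
            for c in range(i, k):
                ata[r][c] -= f * ata[i][c]
            aty[r] -= f * aty[i]
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (aty[i] - sum(ata[i][c] * b[c] for c in range(i + 1, k))) / ata[i][i]
    # R^2 = 1 - residual sum of squares / total sum of squares.
    y_hat = [sum(b[j] * cols[j][t] for j in range(k)) for t in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((y[t] - y_hat[t]) ** 2 for t in range(n))
    ss_tot = sum((yt - mean_y) ** 2 for yt in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: pretest (x1) enters first, then treatment (x2).
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
y = [4, 3, 11, 10, 18]  # constructed as y = 1 + 2*x1 + 3*x2 exactly
r2_pretest = r_squared(y, [x1])
delta_r2 = r_squared(y, [x1, x2]) - r2_pretest  # variance "over and above"
```

With these constructed data the full model fits perfectly (R² = 1), so `delta_r2` is exactly the variance the second predictor adds beyond the first.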

Interpretations: The author concludes that there was value in using pre-visit instructional materials for students of all ability levels. The author recommends that the study be repeated in order to include students who have had only the pre-visit materials, to include students who did not take the pretest (in order to better understand the testing effect), to use different age groupings to determine the appropriateness of the materials and learning experience, to determine the effect of increased time spent on pre-visit materials, and to determine the motivating effect of field trips by presenting them at the beginning of teaching units.

Critique of Study: This study followed clearly outlined procedures. Many of the internal validity threats were accounted for in this study through design considerations; however, the existence of a possible teacher effect may have posed a serious threat to internal validity. Although the experimental group was taught by a teacher who was a graduate student in education and who was specifically trained on the museum-related materials, equal certification in a subject area is not necessarily a good indicator of teaching ability or equality in teaching methods. An additional problem may have existed because of contact between students at the school where the study was conducted. The data reported do not clarify the extent of this threat. Other threats to internal validity may have been present even with the random assignment. The dropping of any students who did not take the pretest, field trip, or post-test may have increased the threat to internal validity due to mortality. It is expected that because of randomization, dropouts for each group should be similar in character; however, the interaction of tests with students (test anxiety, type of test, student ability levels, etc.), as well as other factors, may have affected the dropout rate and thus become a crucial threat to internal validity. Because no data on causes of dropout and dropout numbers for each group were reported, this aspect cannot be evaluated. Equal standard deviations for treatment and control groups on the pretest (5.1), and similar standard deviations on the post-tests (7.6/7.0), suggest low variability between the groups, thus indicating that the mortality factor may not have been as significant as one might initially expect. One research question dealt with ability-treatment interaction; however, no data on student ability levels were reported that would allow the researcher to answer this question.
The conclusion that pre-visit instructional materials were valuable for "students of all ability levels" is not supported by the research findings. Rather, the research supports the conclusion that the pre-visit materials were valuable for the ability levels represented in the particular sample. Instrument threats to validity may also have been present. The content validity of the test used may have been questionable, as it was established only by having the treatment teacher view the film 10 times and study the museum script materials. The use of subject area teachers or other experts would have been advisable.

External validity issues could be raised because of the nature of the sample group in this study. The entire sample group was from one school and was taught regularly by one teacher. All of the subjects were from a science class. To generalize beyond this specific group would require a much broader design that incorporated more classes, schools, teachers, subject areas, and aptitude levels. Additionally, a treatment effect could potentially be influenced by the novelty of having a new and possibly more stimulating teacher. This implies a need for some control of experimenter effect through using either unbiased observers (sitting in on the classes to document the teaching practices used) or some other form of methodological control. The author did, to some extent, indicate concern for external validity in that he described a need for studies that look at other age levels, allowing greater generalization across this variable. This study provides museum educators and researchers with insights into the possible value of their pre-visit materials. Although weaknesses existed in the design, the research described by the author provides the reader with sufficient information either to be able to replicate the study or, more importantly, to develop further studies to determine the effectiveness of pre-visit orientation for museum films as well as field trips in general.

STUDY 4: Peart, B. (1984). Impact of exhibit type on knowledge gain, attitudes, and behavior. Curator, 27(3), 220-237.

Purpose: Peart's stated purpose for this study was to "determine what kind of exhibit had the greatest effect on museum visitor behavior in terms of knowledge gain, attitudinal change, attracting power, holding power, and interaction" (p. 220). The exhibits used in this study included natural history displays on seabird colonies and their interaction with other species (including humans).

Rationale: The author suggests that museums are changing their exhibits from collections with no order to those that illustrate particular "story lines" and that the collections are placed in artificially created "environments or habitats." Research on the effectiveness of these trends has been inconclusive, according to the author. In fact, the author reports that "Knowledge gain does seem to result in some visitors; attitudinal change is minimal, if it occurs at all, and visitor behavior in relation to exhibits is quite varied" (p. 221). Using this rationale, the author developed six research questions that dealt with the effects of five exhibit types on knowledge gain, attitude change, the enhancement of attracting and holding power,2 the enhancement of behavioral interaction of a visitor with other visitors or the exhibit, and the correlation between knowledge gain, attitude change, attracting power, holding power, and interaction.

2. Holding power as defined by Peart is a ratio between actual viewing time and required minimum viewing time.
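Peart's holding-power measure is simply a ratio of observed viewing time to required minimum viewing time. A minimal sketch of how such a measure might be computed for a tracked visitor; the function name and parameters are ours, and the study does not report how the required minimum viewing time was established:

```python
def holding_power(actual_seconds, required_minimum_seconds):
    """Holding power as Peart defines it: observed viewing time
    divided by the minimum time required to attend to the exhibit."""
    if required_minimum_seconds <= 0:
        raise ValueError("required minimum viewing time must be positive")
    return actual_seconds / required_minimum_seconds

# A ratio below 1.0 means the visitor left before the minimum viewing
# time elapsed; above 1.0, the exhibit held attention longer than required.
```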

Research design and procedure: A specific part of a newly developed exhibit was modified to provide five treatments:
1. a word exhibit (only a panel containing text)
2. a picture exhibit (picture panel with an accompanying text panel)
3. an object exhibit (object without text)
4. a standard exhibit (objects and a text panel)
5. a standard exhibit (objects, text panel, and sound)
The text panel, where present, was the same in content throughout the treatments. It is not clear, however, whether Treatment 2 is a photograph of the same three-dimensional object used in Treatments 3, 4, and 5. This is critical because the author defines concrete exhibits as those with three-dimensional objects (Treatments 3, 4, and 5), and abstract exhibits as being two-dimensional and lacking objects (Treatments 1 and 2). The research design consisted of a Post-test Only/Control Group design in which the control group was not exposed to the exhibit treatments. Instrumentation was developed for measuring the knowledge and attitudinal aspects of the study; however, the development process and style of the measures were not reported. Tracking (the "unobtrusive observation and recording of behavior in a predetermined, standard fashion") was used as the measure to determine attracting and holding power (p. 221). Twenty-four study days were selected randomly out of a 37-day period during the summer of 1981. The study sample was composed of 616 randomly selected first-time visitors. No information on first-time visitor characteristics, dropouts, refusals to participate, or the methodology for selecting the original pool of visitors is described. The method of assigning visitors to the treatment and control groups is not explained. The study was composed of two parts. In part one of the study, post-tests were administered to 56 visitors in each of the five treatment categories discussed above, and questionnaires identical to the post-test were administered to 56 control group visitors (N = 336).
In part two, "Unobtrusive tracking of 56 randomly selected visitors to each experimental exhibit type [N = 280]" (pp. 228-229) was used to determine attracting power, holding power, and interaction. The method of recording visitor behavior was reported, but assignment to the treatment groups for tracking was not discussed. It appears that the author used the pool of 616 visitors for the two-part study in the following way: N = 336 for part one and N = 280 for part two. However,

the lack of information on selection and assignment methodology makes any interpretation beyond this questionable. No information is reported as to whether visitors were given any procedural or instructional materials as part of the two studies.
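The missing assignment procedure is easy to document. A hypothetical sketch of what a reported random-assignment step might look like for Peart's six groups of 56; the function, group labels, and seed here are ours, not the study's:

```python
import random

def assign_to_groups(visitor_ids, group_names, per_group, seed=None):
    """Randomly assign an equal number of visitors to each group.

    Returns a dict mapping each group name to a list of visitor ids.
    Visitors beyond per_group * len(group_names) remain unassigned."""
    rng = random.Random(seed)
    pool = list(visitor_ids)
    rng.shuffle(pool)
    needed = per_group * len(group_names)
    if len(pool) < needed:
        raise ValueError("not enough visitors for the requested design")
    return {name: pool[i * per_group:(i + 1) * per_group]
            for i, name in enumerate(group_names)}
```

Reporting even this much (pool size, group labels, and the randomization mechanism) would let a reader judge whether assignment was balanced and reproducible.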

Findings: Knowledge Gain - The researcher concludes that four of the treatment exhibits (#1 - word, #2 - picture, #4 - standard, and #5 - standard plus sound) significantly increased knowledge gain and that three-dimensional exhibits with labels (#4, #5) have the most significant effect. The data reported, however, suggest a more cautious interpretation because, in fact, they indicate that the picture plus label treatment (#2) was more effective than the standard exhibit (#4) but less effective than the standard plus sound treatment (#5). The object-only treatment (#3) showed no significant increase in knowledge gain, supporting the author's conclusion that "the presence of a label is critical" (p. 230).

Attitudinal Change - No significant difference in attitude was reported between the control group and the five treatment groups. On the basis of this evidence, the researcher states that "exhibits do not significantly improve attitudes regardless of exhibit types" (p. 231). From the data presented, however, it is obvious that the majority of the visitors already had favorable outlooks (control 71%, pooled groups 78%) toward the subject of the attitude question.

Attracting Power and Holding Power - Significant differences were reported, and they were interpreted to indicate that three-dimensional exhibits attracted more visitors than two-dimensional exhibits and that sound attracted the most visitors. For holding power, the fact that visitors spent relatively more time at the standard exhibit and at the standard exhibit with sound was interpreted to mean that the more realistic the exhibit, the greater the level of visitor interest.

Interaction - This category was defined by the researcher as "any movement associated with gaining better comprehension of an exhibit" (p. 222). From the observation of significant differences, the author concluded that visitors interacted more with three-dimensional exhibits.
On the basis of the above evidence for attracting power, holding power, and interaction, the researcher concluded that three-dimensional exhibits had greater attracting power than two-dimensional exhibits, and that the former enhance holding power and interaction.

Correlation measurement - Significant positive correlations were reported between interaction and attracting power as well as between interaction and holding power. High positive correlations were reported between holding power and attracting power and between holding power and knowledge gain. The researcher summarized these findings by stating that there was a

correlation between knowledge gain, attitudinal change, attracting power, holding power, and interaction.
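The reported correlations are presumably Pearson product-moment coefficients, though the paper does not say. For readers who wish to check such claims against raw scores, a self-contained sketch of the statistic (our implementation, not the study's analysis code):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient between two samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length samples of size >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

A high positive r between holding power and knowledge gain would be consistent with the author's summary, but without the underlying scores the reader cannot verify it.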

Interpretation: Peart interpreted the results as being "supported by the literature," even though at the outset of the paper he wrote that "research and other literature on the topic (effectiveness of modern exhibit techniques) is inconclusive" (p. 221). In addition, the researcher concluded that concrete exhibits were more effective than abstract ones, that knowledge gain does occur (especially with concrete exhibits that are well designed), that exhibit type significantly affected visitor flow patterns, that labels were critical, that the goal for exhibits should be attitudinal reinforcement rather than attitude change, and that a correlation existed between attracting power, holding power, and interaction.

Critique of Study: In the preparation of any research, there should be an appropriate, if not exhaustive, review of the literature that can be cited as leading to, or suggesting, the current study. In this paper the literature review was incorporated within the introduction, which focused on the institution where the study took place. Only one statement indicated that numerous researchers had previously conducted evaluations that addressed the concerns of the author; however, these were not cited. Similarly, in support of the conclusions of this study, the author cited a number of authors without reference to the nature of their research or how it related to the present findings. Clearly this study did not present enough evidence to indicate that it emerged from a sound theoretical as well as empirical foundation. Consequently, the conclusions could not be interpreted on this basis.

The study's clarification of terminology is of benefit to readers. Accuracy and consistency in definitions are essential to a new field of research such as museum studies. The use of unobtrusive tracking and questionnaires is appropriate; however, questions can be raised about possible interactions between variables such as comprehension and movement, movement and exhibit complexity, and comprehension and individual differences such as prior knowledge, reading ability, attitude, and interest - to name just a few. Further, the design and nature of the instruments, the testing procedures used to determine reliability and validity, and the implementation methodology were all absent. Critical procedural descriptions of how visitors were selected and assigned to treatment and control groups were also absent. Although first-time visitors were reported to have been randomly selected, it is unclear how this pool was then assigned.
The statistical analyses used to determine significance were not clearly described, and the results reported cannot be interpreted due to the lack of clarity in the method of assignment. ANOVA results were reported to indicate no significant difference between sample groups; yet no summary table was reported and the

author proceeded to discuss significant differences between treatment groups. In the discussion of significant differences, standard deviations were reported as ranging from 17.4 (control) to a high of 29.3 for one of the treatment groups. It is important for the reader to be able to review instrument design components in order to analyze adequately and understand the reported results. The instrument may be measuring a very narrow range of attributes or may not be measuring what it is supposed to be measuring. The researcher may have accounted for validity threats; however, without more information about the methodology and reliability of the development and tracking procedures, the relative value of the data is questionable. On the other hand, even measures that have high reliability coefficients may in fact be totally inadequate for the construct or variables being studied, thus again threatening the overall validity of a study.

It is obvious that this study had the potential for clarifying various issues of concern in museum research. For those contemplating such research, much more narrowly defined treatment groups are needed than were found in this study. Prior to the initiation of the study, the researcher should have clearly defined the boundaries of the experimental design, the variables that impact the design, and the analyses to be used. Clarification of design through front-end studies and pilot testing would help to avoid a study with "ceiling effects" (such as those found in the attitude results, where the visitors in both the treatment and the control groups already had very positive attitudes). With such clarification, the results could have been more clearly associated with the various treatments.
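The unreported ANOVA summary table is exactly the kind of detail a reader needs in order to judge the claimed significance. As an illustration only, a pure-Python sketch of the F statistic a one-way ANOVA over treatment-group scores would produce (our code; the study's actual analysis procedure is not described):

```python
def one_way_anova_f(groups):
    """One-way ANOVA over a list of samples (lists of scores).

    Returns (F, df_between, df_within), the values a summary
    table would report alongside group means and SDs."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)
    df_b, df_w = k - 1, n - k
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w
```

Reporting F with its two degrees of freedom (plus group means and standard deviations) is what allows a reader to check a claimed significance level against an F table.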

CONCLUSION
Each of the studies selected for this analysis demonstrates the need for closer attention to overall experimental design. The studies also provide examples of the decisions that researchers make in balancing internal validity with external validity. In the end, such decisions will affect the usefulness of a study in guiding future research and application. Study 4 provides an example of a study that attempts to answer a very broad set of interesting questions with, at best, limited and conflicting results. The difficulties presented by all of the studies highlight the need for careful definition of research questions and close attention to the specific details of the experimental design. Each of the studies might have profited from exit interviews of the subjects and from other types of qualitative data (what the subjects thought of the exhibits and the graphics, the difficulty level of the text material, etc.). Impartial observers could have been used to better document research procedures and visitor activities. More of the subjects could have been tracked to ascertain how long they observed

exhibits, whether they took time to read labels, or whether they took adequate time to take the tests. Exhibit studies in general do not adequately describe these visitor activities. While some researchers maintain that naturalistic/qualitative methods are pre-experimental or the antithesis of experimental, it is the authors' opinion that both experimental and naturalistic methods can often be used together to enrich the data base from which to make inferences and contribute to knowledge about learning in informal settings. None of the studies reviewed took advantage of this possibility. It is obvious that a careful balance based on theoretical, empirical, and contextual constraints will further the interests of museum studies.

Experimental research in informal settings is a difficult process. In order to achieve control and assure internal and external validity, experimental designs by their very nature obtrusively affect the informal nature of the museum experience. If simulated exhibits are used, one could argue that they (or the setting) do not accurately reflect the character of a "routinely" functioning museum. Although problems for study in informal settings abound (such as the influence of advance organizers, attentional factors, perceptions, and perspectives on coding, memory, and retrieval of exhibited information), the ways to most profitably study and measure these influences are elusive. Similarly, the difficulty of randomizing subjects and presenting treatments in a less than sterile and formal way is always present. Finally, developing instruments that reliably and validly measure the myriad cognitive and affective outcomes that could be anticipated (or unanticipated) is a formidable task. While useful data can be derived from studies such as those reviewed, there is still a need to more systematically apply cognitive, behavioral, and developmental theory to areas of informal learning.
Clearly, there is much left to do in this area, but that is what makes the study of learning in informal settings fascinating and compelling...

REFERENCES
Bloom, B. S., Engelhart, M. D., Furst, E. J., Hill, W. H., & Krathwohl, D. R. (1956). Taxonomy of educational objectives: Handbook I: The cognitive domain. New York: David McKay.

Bransford, J. D. (1979). Human cognition: Learning, remembering and understanding. San Francisco, CA: Wadsworth.

Campbell, D. T., & Stanley, J. C. (1973). Experimental and quasi-experimental design for research. Chicago, IL: Rand McNally College Publishing.

Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design and analysis issues for field settings. Boston, MA: Houghton Mifflin.

Dierking, L. D., Koran, J. J., Jr., Lehman, J., & Koran, M. L. (1984). Recessing in exhibit design as a device for directing attention. Curator, 27(3), 238-248.

Gagne, R. M. (1970). Some new views of learning and instruction. Phi Delta Kappan, 51, 463-472.

Gennaro, E. D. (1981). The effectiveness of using pre-visit instructional materials on learning for a museum field trip experience. Journal of Research in Science Teaching, 18(3), 275-279.

Glynn, S. (1978). Capturing readers' attention by means of typographical cueing strategies. Educational Technology, 1, 7-12.

Glynn, S., & Di Vesta, F. (1979). Control of prose processing via instructional and typographical cues. Journal of Educational Psychology, 71, 595-603.

Keele, S. (1973). Attention and human performance. Pacific Palisades, CA: Goodyear Publishing.

Koran, J. J., Jr., & Koran, M. L. (1980). Interaction of learner characteristics with pictorial adjuncts in learning from science text. Journal of Research in Science Teaching, 17(5), 477-483.

Koran, J. J., Jr., Lehman, J. R., Dierking, L. D., & Koran, M. L. (1983). The relative effects of pre- and post-attention directing devices on learning from a "walk-through" museum exhibit. Journal of Research in Science Teaching, 20(4), 341-346.

Neal, A. (1976). Exhibits for the small museum: A handbook. Nashville, TN: Association for State and Local History.

Peart, B. (1984). Impact of exhibit type on knowledge gain, attitudes, and behavior. Curator, 27(3), 220-237.

Rothkopf, E. Z. (1970). The concept of mathemagenic activities. Review of Educational Research, 40(3), 325-336.

Smith, M. L., & Glass, G. V. (1987). Research and evaluation in education and the social sciences. Englewood Cliffs, NJ: Prentice-Hall.