External Validity: from Do-Calculus to Transportability Across Populations

Total Page:16

File Type:pdf, Size:1020Kb

External Validity: from Do-Calculus to Transportability Across Populations Statistical Science 2014, Vol. 29, No. 4, 579–595 DOI: 10.1214/14-STS486 c Institute of Mathematical Statistics, 2014 External Validity: From Do-Calculus to Transportability Across Populations Judea Pearl and Elias Bareinboim Abstract. The generalizability of empirical findings to new environ- ments, settings or populations, often called “external validity,” is es- sential in most scientific explorations. This paper treats a particular problem of generalizability, called “transportability,” defined as a li- cense to transfer causal effects learned in experimental studies to a new population, in which only observational studies can be conducted. We introduce a formal representation called “selection diagrams” for ex- pressing knowledge about differences and commonalities between pop- ulations of interest and, using this representation, we reduce questions of transportability to symbolic derivations in the do-calculus. This re- duction yields graph-based procedures for deciding, prior to observing any data, whether causal effects in the target population can be in- ferred from experimental findings in the study population. When the answer is affirmative, the procedures identify what experimental and observational findings need be obtained from the two populations, and how they can be combined to ensure bias-free transport. Key words and phrases: Experimental design, generalizability, causal effects, external validity. 1. INTRODUCTION: THREATS VS. most studies are conducted with the intention of ap- ASSUMPTIONS plying the results elsewhere means that we usually deem the target environment sufficiently similar to Science is about generalization, and generalization the study environment to justify the transport of requires that conclusions obtained in the laboratory experimental results or their ramifications. be transported and applied elsewhere, in an envi- Remarkably, the conditions that permit such arXiv:1503.01603v1 [stat.ME] 5 Mar 2015 ronment that differs in many aspects from that of transport have not received systematic formal treat- the laboratory. ment. In statistical practice, problems related to Clearly, if the target environment is arbitrary, combining and generalizing from diverse studies are or drastically different from the study environment handled by methods of meta analysis (Glass (1976); nothing can be transferred and scientific progress Hedges and Olkin (1985); Owen (2009)), or hierar- will come to a standstill. However, the fact that chical models (Gelman and Hill (2007)), in which re- sults of diverse studies are pooled together by stan- Judea Pearl is Professor and Elias Bareinboim is Ph.D. dard statistical procedures (e.g., inverse-variance Candidate, Computer Science Department, University reweighting in meta-analysis, partial pooling in hi- of California, Los Angeles, California 90095-1596, USA erarchical modeling) and rarely make explicit dis- e-mail: [email protected]; [email protected]. tinction between experimental and observational This is an electronic reprint of the original article regimes; performance is evaluated primarily by sim- published by the Institute of Mathematical Statistics in ulation. Statistical Science, 2014, Vol. 29, No. 4, 579–595. This To supplement these methodologies, our paper reprint differs from the original in pagination and provides theoretical guidance in the form of limits on typographic detail. what can be achieved in practice, what problems are 1 2 J. PEARL AND E. BAREINBOIM likely to be encountered when populations differ sig- language within which the notion “environment” (or nificantly from each other, what population differ- “population”) is given precise characterization, and ences can be circumvented by clever design and what differences among environments can be encoded and differences constitute theoretical impediments, pro- analyzed. hibiting generalization by any means whatsoever. The advent of causal diagrams (Wright (1921); On the theoretical front, the standard literature Heise (1975); Davis (1984); Verma and Pearl (1988); on this topic, falling under rubrics such as “exter- Spirtes, Glymour and Scheines (1993); Pearl (1995)) nal validity” (Campbell and Stanley (1963), Man- together with models of interventions (Haavelmo ski (2007)), “heterogeneity” (H¨ofler, Gloster and (1943); Strotz and Wold (1960)) and counterfactu- Hoyer (2010)), “quasi-experiments” (Shadish, Cook als (Neyman (1923); Rubin (1974); Robins (1986); and Campbell (2002), Chapter 3; Adelman (1991)),1 Balke and Pearl (1995)) provides such a language consists primarily of “threats,” namely, explanations and renders the formalization of transportability of what may go wrong when we try to transport re- possible. sults from one study to another while ignoring their Armed with this language, this paper departs differences. Rarely do we find an analysis of “licens- from the tradition of communicating “threats” and ing assumptions,” namely, formal conditions under embarks instead on the task of formulating “licenses which the transport of results across differing envi- to transport,” namely, assumptions that, if they held ronments or populations is licensed from first prin- true, would permit us to transport results across ciples.2 studies. The reasons for this asymmetry are several. First, In addition, the paper uses the inferential ma- threats are safer to cite than assumptions. He chinery of the do-calculus (Pearl (1995); Koller and who cites “threats” appears prudent, cautious and Friedman (2009); Huang and Valtorta (2006); Sh- thoughtful, whereas he who seeks licensing assump- pitser and Pearl (2006)) to derive algorithms for de- tions risks suspicions of attempting to endorse those ciding whether transportability is feasible and how assumptions. experimental and observational findings can be com- Second, assumptions are self-destructive in their bined to yield unbiased estimates of causal effects in honesty. The more explicit the assumption, the more the target population. criticism it invites, for it tends to trigger a richer The paper is organized as follows. In Section 2, space of alternative scenarios in which the assump- we review the foundations of structural equations tion may fail. Researchers prefer therefore to declare modeling (SEM), the question of identifiability and threats in public and make assumptions in private. the do-calculus that emerges from these founda- Third, whereas threats can be communicated in tions. (This section can be skipped by readers fa- plain English, supported by anecdotal pointers to miliar with these concepts and tools.) In Section 3, familiar experiences, assumptions require a formal we motivate the question of transportability through simple examples, and illustrate how the solution de- 1Manski (2007) defines “external validity” as follows: “An pends on the causal story behind the problem. In experiment is said to have “external validity” if the distribu- Section 4, we formally define the notion of trans- tion of outcomes realized by a treatment group is the same as portability and reduce it to a problem of symbolic the distribution of outcome that would be realized in an ac- transformations in do-calculus. In Section 5, we pro- tual program.” Campbell and Stanley (1963), page 5, take a vide a graphical criterion for deciding transporta- slightly broader view: ““External validity” asks the question of generalizability: to what populations, settings, treatment bility and estimating transported causal effects. We variables, and measurement variables can this effect be gen- conclude in Section 6 with brief discussions of re- eralized?” lated problems of external validity, these include sta- 2Hern´an and VanderWeele (2011) studied such conditions tistical transportability, and meta-analysis. in the context of compound treatments, where we seek to predict the effect of one version of a treatment from experi- ments with a different version. Their analysis is a special case 2. PRELIMINARIES: THE LOGICAL of the theory developed in this paper (Petersen (2011)). A FOUNDATIONS OF CAUSAL INFERENCE related application is reported in Robins, Orellana and Rot- nitzky (2008) where a treatment strategy is extrapolated be- The tools presented in this paper were developed tween two biological similar populations under different ob- in the context of nonparametric Structural Equa- servational regimes. tions Models (SEM), which is one among several EXTERNAL VALIDITY: FROM DO-CALCULUS TO TRANSPORTABILITY 3 approaches to causal inference, and goes back to In principle, each query Qi ∈ Q should be “well (Haavelmo (1943); Strotz and Wold (1960)). Other defined,” that is, computable from any fully approaches include, for example, potential-outcomes specified model M compatible with A. (See Def- (Rubin (1974)), Structured Tree Graphs (Robins inition 1 for formal characterization of a model, (1986)), decision analytic (Dawid (2002)), Causal and also Section 2.4 for the problem of identifi- Bayesian Networks (Spirtes, Glymour and Scheines cation in partially specified models.) (2000); Pearl (2000), Chapter 1; Bareinboim, Brito I-3. A set D of experimental or non-experimental and Pearl (2012)), and Settable Systems (White and data, governed by a joint probability distribu- Chalak (2009)). We will first describe the generic tion presumably consistent with A. features common to all such approaches, and then summarize how these features are represented in The outputs are: 3 SEM. O-1. A set A∗ of statements which are the logical 2.1 Causal Models as Inference Engines implications of A, separate from the data at hand. For
Recommended publications
  • Internal Validity Is About Causal Interpretability
    Internal Validity is about Causal Interpretability Before we can discuss Internal Validity, we have to discuss different types of Internal Validity variables and review causal RH:s and the evidence needed to support them… Every behavior/measure used in a research study is either a ... Constant -- all the participants in the study have the same value on that behavior/measure or a ... • Measured & Manipulated Variables & Constants Variable -- when at least some of the participants in the study • Causes, Effects, Controls & Confounds have different values on that behavior/measure • Components of Internal Validity • “Creating” initial equivalence and every behavior/measure is either … • “Maintaining” ongoing equivalence Measured -- the value of that behavior/measure is obtained by • Interrelationships between Internal Validity & External Validity observation or self-report of the participant (often called “subject constant/variable”) or it is … Manipulated -- the value of that behavior/measure is controlled, delivered, determined, etc., by the researcher (often called “procedural constant/variable”) So, every behavior/measure in any study is one of four types…. constant variable measured measured (subject) measured (subject) constant variable manipulated manipulated manipulated (procedural) constant (procedural) variable Identify each of the following (as one of the four above, duh!)… • Participants reported practicing between 3 and 10 times • All participants were given the same set of words to memorize • Each participant reported they were a Psyc major • Each participant was given either the “homicide” or the “self- defense” vignette to read From before... Circle the manipulated/causal & underline measured/effect variable in each • Causal RH: -- differences in the amount or kind of one behavior cause/produce/create/change/etc.
    [Show full text]
  • Validity and Reliability of the Questionnaire for Compliance with Standard Precaution for Nurses
    View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Cadernos Espinosanos (E-Journal) Rev Saúde Pública 2015;49:87 Original Articles DOI:10.1590/S0034-8910.2015049005975 Marília Duarte ValimI Validity and reliability of the Maria Helena Palucci MarzialeII Miyeko HayashidaII Questionnaire for Compliance Fernanda Ludmilla Rossi RochaIII with Standard Precaution Jair Lício Ferreira SantosIV ABSTRACT OBJECTIVE: To evaluate the validity and reliability of the Questionnaire for Compliance with Standard Precaution for nurses. METHODS: This methodological study was conducted with 121 nurses from health care facilities in Sao Paulo’s countryside, who were represented by two high-complexity and by three average-complexity health care facilities. Internal consistency was calculated using Cronbach’s alpha and stability was calculated by the intraclass correlation coefficient, through test-retest. Convergent, discriminant, and known-groups construct validity techniques were conducted. RESULTS: The questionnaire was found to be reliable (Cronbach’s alpha: 0.80; intraclass correlation coefficient: (0.97) In regards to the convergent and discriminant construct validity, strong correlation was found between compliance to standard precautions, the perception of a safe environment, and the smaller perception of obstacles to follow such precautions (r = 0.614 and r = 0.537, respectively). The nurses who were trained on the standard precautions and worked on the health care facilities of higher complexity were shown to comply more (p = 0.028 and p = 0.006, respectively). CONCLUSIONS: The Brazilian version of the Questionnaire for I Departamento de Enfermagem. Centro Compliance with Standard Precaution was shown to be valid and reliable.
    [Show full text]
  • Statistical Analysis 8: Two-Way Analysis of Variance (ANOVA)
    Statistical Analysis 8: Two-way analysis of variance (ANOVA) Research question type: Explaining a continuous variable with 2 categorical variables What kind of variables? Continuous (scale/interval/ratio) and 2 independent categorical variables (factors) Common Applications: Comparing means of a single variable at different levels of two conditions (factors) in scientific experiments. Example: The effective life (in hours) of batteries is compared by material type (1, 2 or 3) and operating temperature: Low (-10˚C), Medium (20˚C) or High (45˚C). Twelve batteries are randomly selected from each material type and are then randomly allocated to each temperature level. The resulting life of all 36 batteries is shown below: Table 1: Life (in hours) of batteries by material type and temperature Temperature (˚C) Low (-10˚C) Medium (20˚C) High (45˚C) 1 130, 155, 74, 180 34, 40, 80, 75 20, 70, 82, 58 2 150, 188, 159, 126 136, 122, 106, 115 25, 70, 58, 45 type Material 3 138, 110, 168, 160 174, 120, 150, 139 96, 104, 82, 60 Source: Montgomery (2001) Research question: Is there difference in mean life of the batteries for differing material type and operating temperature levels? In analysis of variance we compare the variability between the groups (how far apart are the means?) to the variability within the groups (how much natural variation is there in our measurements?). This is why it is called analysis of variance, abbreviated to ANOVA. This example has two factors (material type and temperature), each with 3 levels. Hypotheses: The 'null hypothesis' might be: H0: There is no difference in mean battery life for different combinations of material type and temperature level And an 'alternative hypothesis' might be: H1: There is a difference in mean battery life for different combinations of material type and temperature level If the alternative hypothesis is accepted, further analysis is performed to explore where the individual differences are.
    [Show full text]
  • Validity and Reliability in Quantitative Studies Evid Based Nurs: First Published As 10.1136/Eb-2015-102129 on 15 May 2015
    Research made simple Validity and reliability in quantitative studies Evid Based Nurs: first published as 10.1136/eb-2015-102129 on 15 May 2015. Downloaded from Roberta Heale,1 Alison Twycross2 10.1136/eb-2015-102129 Evidence-based practice includes, in part, implementa- have a high degree of anxiety? In another example, a tion of the findings of well-conducted quality research test of knowledge of medications that requires dosage studies. So being able to critique quantitative research is calculations may instead be testing maths knowledge. 1 School of Nursing, Laurentian an important skill for nurses. Consideration must be There are three types of evidence that can be used to University, Sudbury, Ontario, given not only to the results of the study but also the demonstrate a research instrument has construct Canada 2Faculty of Health and Social rigour of the research. Rigour refers to the extent to validity: which the researchers worked to enhance the quality of Care, London South Bank 1 Homogeneity—meaning that the instrument mea- the studies. In quantitative research, this is achieved University, London, UK sures one construct. through measurement of the validity and reliability.1 2 Convergence—this occurs when the instrument mea- Correspondence to: Validity sures concepts similar to that of other instruments. Dr Roberta Heale, Validity is defined as the extent to which a concept is Although if there are no similar instruments avail- School of Nursing, Laurentian accurately measured in a quantitative study. For able this will not be possible to do. University, Ramsey Lake Road, example, a survey designed to explore depression but Sudbury, Ontario, Canada 3 Theory evidence—this is evident when behaviour is which actually measures anxiety would not be consid- P3E2C6; similar to theoretical propositions of the construct ered valid.
    [Show full text]
  • Relations Between Inductive Reasoning and Deductive Reasoning
    Journal of Experimental Psychology: © 2010 American Psychological Association Learning, Memory, and Cognition 0278-7393/10/$12.00 DOI: 10.1037/a0018784 2010, Vol. 36, No. 3, 805–812 Relations Between Inductive Reasoning and Deductive Reasoning Evan Heit Caren M. Rotello University of California, Merced University of Massachusetts Amherst One of the most important open questions in reasoning research is how inductive reasoning and deductive reasoning are related. In an effort to address this question, we applied methods and concepts from memory research. We used 2 experiments to examine the effects of logical validity and premise– conclusion similarity on evaluation of arguments. Experiment 1 showed 2 dissociations: For a common set of arguments, deduction judgments were more affected by validity, and induction judgments were more affected by similarity. Moreover, Experiment 2 showed that fast deduction judgments were like induction judgments—in terms of being more influenced by similarity and less influenced by validity, compared with slow deduction judgments. These novel results pose challenges for a 1-process account of reasoning and are interpreted in terms of a 2-process account of reasoning, which was implemented as a multidimensional signal detection model and applied to receiver operating characteristic data. Keywords: reasoning, similarity, mathematical modeling An important open question in reasoning research concerns the arguments (Rips, 2001). This technique can highlight similarities relation between induction and deduction. Typically, individual or differences between induction and deduction that are not con- studies of reasoning have focused on only one task, rather than founded by the use of different materials (Heit, 2007). It also examining how the two are connected (Heit, 2007).
    [Show full text]
  • Questionnaire Validity and Reliability
    Questionnaire Validity and reliability Department of Social and Preventive Medicine, Faculty of Medicine Outlines Introduction What is validity and reliability? Types of validity and reliability. How do you measure them? Types of Sampling Methods Sample size calculation G-Power ( Power Analysis) Research • The systematic investigation into and study of materials and sources in order to establish facts and reach new conclusions • In the broadest sense of the word, the research includes gathering of data in order to generate information and establish the facts for the advancement of knowledge. ● Step I: Define the research problem ● Step 2: Developing a research plan & research Design ● Step 3: Define the Variables & Instrument (validity & Reliability) ● Step 4: Sampling & Collecting data ● Step 5: Analysing data ● Step 6: Presenting the findings A questionnaire is • A technique for collecting data in which a respondent provides answers to a series of questions. • The vehicle used to pose the questions that the researcher wants respondents to answer. • The validity of the results depends on the quality of these instruments. • Good questionnaires are difficult to construct; • Bad questionnaires are difficult to analyze. •Identify the goal of your questionnaire •What kind of information do you want to gather with your questionnaire? • What is your main objective? • Is a questionnaire the best way to go about collecting this information? Ch 11 6 How To Obtain Valid Information • Ask purposeful questions • Ask concrete questions • Use time periods
    [Show full text]
  • Reliability, Validity, and Trustworthiness
    © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION Chapter 12 Reliability, Validity, and Trustworthiness James Eldridge © VLADGRIN/iStock/Thinkstock Chapter Objectives At the conclusion of this chapter, the learner will be able to: 1. Identify the need for reliability and validity of instruments used in evidence-based practice. 2. Define reliability and validity. 3. Discuss how reliability and validity affect outcome measures and conclusions of evidence-based research. 4. Develop reliability and validity coefficients for appropriate data. 5. Interpret reliability and validity coefficients of instruments used in evidence-based practice. 6. Describe sensitivity and specificity as related to data analysis. 7. Interpret receiver operand characteristics (ROC) to describe validity. Key Terms Accuracy Content-related validity Concurrent validity Correlation coefficient Consistency Criterion-related validity Construct validity Cross-validation 9781284108958_CH12_Pass03.indd 339 10/20/15 5:59 PM © Jones & Bartlett Learning, LLC. NOT FOR SALE OR DISTRIBUTION 340 | Chapter 12 Reliability, Validity, and Trustworthiness Equivalency reliability Reliability Interclass reliability Sensitivity Intraclass reliability Specificity Objectivity Stability Observed score Standard error of Predictive validity measurement (SEM) Receiver operand Trustworthiness characteristics (ROC) Validity n Introduction The foundation of good research and of good decision making in evidence- based practice (EBP) is the trustworthiness of the data used to make decisions. When data cannot be trusted, an informed decision cannot be made. Trustworthiness of the data can only be as good as the instruments or tests used to collect the data. Regardless of the specialization of the healthcare provider, nurses make daily decisions on the diagnosis and treatment of a patient based on the results from different tests to which the patient is subjected.
    [Show full text]
  • Validity and Reliability in Social Science Research
    Education Research and Perspectives, Vol.38, No.1 Validity and Reliability in Social Science Research Ellen A. Drost California State University, Los Angeles Concepts of reliability and validity in social science research are introduced and major methods to assess reliability and validity reviewed with examples from the literature. The thrust of the paper is to provide novice researchers with an understanding of the general problem of validity in social science research and to acquaint them with approaches to developing strong support for the validity of their research. Introduction An important part of social science research is the quantification of human behaviour — that is, using measurement instruments to observe human behaviour. The measurement of human behaviour belongs to the widely accepted positivist view, or empirical- analytic approach, to discern reality (Smallbone & Quinton, 2004). Because most behavioural research takes place within this paradigm, measurement instruments must be valid and reliable. The objective of this paper is to provide insight into these two important concepts, and to introduce the major methods to assess validity and reliability as they relate to behavioural research. The paper has been written for the novice researcher in the social sciences. It presents a broad overview taken from traditional literature, not a critical account of the general problem of validity of research information. The paper is organised as follows. The first section presents what reliability of measurement means and the techniques most frequently used to estimate reliability. Three important questions researchers frequently ask about reliability are discussed: (1) what Address for correspondence: Dr. Ellen Drost, Department of Management, College of Business and Economics, California State University, Los Angeles, 5151 State University Drive, Los Angeles, CA 90032.
    [Show full text]
  • Validity: Theoretical Basis Importance of Validity
    Psychological Measurement mark.hurlstone @uwa.edu.au Validity Validity: Theoretical Basis Importance of Validity Classic & Contemporary Approaches PSYC3302: Psychological Measurement and Its Trinitarian View Unitary View Applications Test Content Internal Structure Response Processes Mark Hurlstone Associations With Univeristy of Western Australia Other Variables Consequences of Testing Other Week 5 Perspectives Reliability & Validity [email protected] Psychological Measurement Learning Objectives Psychological Measurement mark.hurlstone @uwa.edu.au Validity • Introduction to the concept of validity Importance of Validity • Overview of the theoretical basis of validity: Classic & Contemporary 1 Trinitarian view of validity Approaches Trinitarian View 2 Unitary view of validity Unitary View Test Content • Alternative perspectives on validity Internal Structure Response Processes • Contrasting validity and reliability Associations With Other Variables Consequences of Testing Other Perspectives Reliability & Validity [email protected] Psychological Measurement Validity in Everyday Usage Psychological Measurement mark.hurlstone @uwa.edu.au • In everyday language, we say something is valid if it is Validity sound, meaningful, or supported by evidence Importance of Validity • e.g., we may speak of a valid theory, a valid argument, Classic & or a valid reason Contemporary Approaches • In legal terminology, lawyers say something is valid if it is Trinitarian View Unitary View "executed with the proper formalities"—such as a valid
    [Show full text]
  • Accuracy Vs. Validity, Consistency Vs. Reliability, and Fairness Vs
    Accuracy vs. Validity, Consistency vs. Reliability, and Fairness vs. Absence of Bias: A Call for Quality Accuracy vs. Validity, Consistency vs. Reliability, and Fairness vs. Absence of Bias: A Call for Quality Paper Presented at the Annual Meeting of the American Association of Colleges of Teacher Education (AACTE) New Orleans, LA. February 2008 W. Steve Lang, Ph.D. Professor, Educational Measurement and Research University of South Florida St. Petersburg Email: [email protected] Judy R. Wilkerson, Ph.D. Associate Professor, Research and Assessment Florida Gulf Coast University email: [email protected] Abstract The National Council for Accreditation of Teacher Education (NCATE, 2002) requires teacher education units to develop assessment systems and evaluate both the success of candidates and unit operations. Because of a stated, but misguided, fear of statistics, NCATE fails to use accepted terminology to assure the quality of institutional evaluative decisions with regard to the relevant standard (#2). Instead of “validity” and ”reliability,” NCATE substitutes “accuracy” and “consistency.” NCATE uses the accepted terms of “fairness” and “avoidance of bias” but confuses them with each other and with validity and reliability. It is not surprising, therefore, that this Standard is the most problematic standard in accreditation decisions. This paper seeks to clarify the terms, using scholarly work and measurement standards as a basis for differentiating and explaining the terms. The paper also provides examples to demonstrate how units can seek evidence of validity, reliability, and fairness with either statistical or non-statistical methodologies, disproving the NCATE assertion that statistical methods provide the only sources of evidence. The lack of adherence to professional assessment standards and the knowledge base of the educational profession in both the rubric and web-based resource materials for this standard are discussed.
    [Show full text]
  • Introduction to Validity
    Slide 1 Introduction to Validity Presentation to the National Assessment Governing Board This presentation addresses the topic of validity. It begins with some first principles of validity. Slide 2 Validity is important. * “one of the major deities in the pantheon of the psychometrician...” (Ebel, 1961, p. 640) * “the most fundamental consideration in developing and evaluating tests” (AERA, APA, NCME, 1999, p. 9) 2 These two quotes highlight the importance of validity in measurement and assessment. The first of these quotes is by a renowned psychometrician, Robert Ebel. The second quote is from the Standards for Educational and Psychological Testing. Slide 3 Two Important Concepts 1) Construct * a label used to describe behavior * refers to an unobserved (latent) characteristic of interest * Examples: creativity, intelligence, reading comprehension, preparedness 3 There are two important concepts to validity. The first one is the notion of a construct. Constructs can be defined as labels used to describe an unobserved behavior, such as honesty. Such behaviors vary across individuals; although they may leave an impression, they cannot be directly measured. In the social sciences virtually everything is a construct. Slide 4 Construct (continued) * don’t exist -- “the product of informed scientific imagination” * operationalized via a measurement process 4 Linda Crocker and James Algina described constructs as “the product of informed scientific imagination.” Constructs represent something that is abstract, but can be operationalized through the measurement process. Measurement can involve a paper and pencil test, a performance, or some other activity through observation which would then produce variables that can be quantified, manipulated, and analyzed. Slide 5 Two Important Concepts 2) Inference * “Informed leap” from an observed, measured value to an estimate of underlying standing on a construct * Short vs.
    [Show full text]
  • Pre-Service Science Teachers' Discrimination Level Of
    ORIGINAL ARTICLE Pre-service Science Teachers’ Discrimination Level of Science and Pseudoscience Murat Berat Uçar1* Elvan Sahin2 1Department of Mathematics and Science Education, M. R. Faculty of Education, Kilis 7 Aralik University, Kilis, Turkey, 2Department of Mathematics and Science Education, Faculty of Education, Middle East Technical University, Ankara, Turkey *Corresponding author: [email protected] ABSTRACT This quantitative study aimed to examine Turkish pre-service science teachers’ beliefs regarding the demarcating between science and pseudoscience. Participants completed the Science and Pseudoscience Distinction Scale. Data collected from the 123 pre-service science teachers were examined based on the dimensions of the instrument, namely science as a process of inquiry (SCI), demarcating between science and pseudoscience, and pseudoscientific beliefs (PS). This study found that these pre-service science teachers generally did not hold strong beliefs on distinguishing science and pseudoscience. Their beliefs regarding SCI were not highly favorable. Moreover, this study revealed that they had some PS. Considering the role of gender and year in the program, the results of two-way MANOVA indicated that there was no statistically significant difference on the related belief constructs for these pre-service science teachers. Thus, the present study intended to shed light on pre-service science teachers’ mindsets about identifying accurate scientific information rather than pseudoscientific confusions that could aid preparing scientifically
    [Show full text]