Individually-Administered Intelligence Tests: An Application of Anchor Test Norming and Equating Procedures in British Columbia
INDIVIDUALLY-ADMINISTERED INTELLIGENCE TESTS: AN APPLICATION OF ANCHOR TEST NORMING AND EQUATING PROCEDURES IN BRITISH COLUMBIA

by

BARBARA JOYCE HOLMES
B.A., Queen's University, 1964
M.S., University of Wisconsin, 1968

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF EDUCATION
in
THE FACULTY OF GRADUATE STUDIES
Department of Educational Psychology

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
April 1981
Barbara Joyce Holmes, 1981

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Educational Psychology
The University of British Columbia
2075 Wesbrook Place
Vancouver, Canada V6T 1W5

Date [illegible]

ABSTRACT

The purpose of the present study was to simulate the Anchor Test Study for reading achievement tests using five individually-administered intelligence tests: The Wechsler Intelligence Scale for Children—Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), the Slosson Intelligence Test (SIT), the Standard Progressive Matrices (SPM), and the Mill Hill Vocabulary Scale (MHVS). Three major objectives were adopted from the Anchor Test Study: to prepare tables of equivalent score values for the conversion of scores from one test to another; to compare linear and equipercentile equating procedures in the derivation of equivalent scores; and to develop provincially representative norms for the five tests.
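As an illustration only, the two equating procedures named in the second objective can be sketched in Python. Linear equating maps a score to the score on the other test with the same standard (z) score; equipercentile equating maps it to the score with the same percentile rank. The function names and the score lists are hypothetical and are not taken from the thesis, and the sketch makes no attempt at refinements such as adjustments for unequally reliable tests or smoothing of the score distributions.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def linear_equate(x, scores_x, scores_y):
    """Map score x on test X to the test Y score with the same z-score."""
    mx, my = mean(scores_x), mean(scores_y)
    sx, sy = pstdev(scores_x), pstdev(scores_y)
    return my + (sy / sx) * (x - mx)


def percentile_rank(x, scores):
    """Proportion of scores at or below x (a simple, unsmoothed definition)."""
    ordered = sorted(scores)
    return bisect_right(ordered, x) / len(ordered)


def equipercentile_equate(x, scores_x, scores_y):
    """Map x to the smallest Y score whose percentile rank reaches that of x."""
    target = percentile_rank(x, scores_x)
    for y in sorted(scores_y):
        if percentile_rank(y, scores_y) >= target:
            return y
    return max(scores_y)


# Hypothetical score lists, invented purely for illustration.
test_x = [85, 90, 95, 100, 105, 110, 115]
test_y = [70, 80, 90, 100, 110, 120, 130]

print(linear_equate(105, test_x, test_y))          # Y score with the same z-score
print(equipercentile_equate(105, test_x, test_y))  # Y score with the same percentile rank
```

When the two score distributions have the same shape, as in this invented example, the two procedures give the same converted score; they diverge when the shapes differ, which is why the two are worth comparing.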
The rationale for the present study was based on the fact that intelligence tests are commonly used interchangeably on the apparent assumption that an equivalency relationship exists among common purpose tests. The primary focus of the present study was an empirical investigation of the viability of this use. In addition, American and British norm-referenced intelligence tests are interpreted in British Columbia as if the population of children to whom they are applied is identical to the population of children for whom each of the tests was prepared. An ancillary focus was the determination of the relevance of existing norms for use in British Columbia.

All five tests were administered to a stratified random sample of 340 children at three age levels: 115 aged 7½ years, 117 aged 9½ years, and 108 aged 11½ years. The population from which the sample was drawn consisted of all non-Native Indian, English-speaking children at these three age levels attending public and independent schools in British Columbia. This population was further restricted to exclude children in classes for the physically handicapped, emotionally disturbed, and trainable mentally retarded. The stratification variables employed were geographic region, community size, school size, age, and sex. In addition, information was collected on a sixth variable, level of education of the head of the household, to provide a description of the sample using a socioeconomic index.

The tests were first scored using the norms tables in their respective manuals. Statistical tests for differences of means and variances for the B.C. sample compared to the original standardization sample revealed that, in most cases, B.C. children scored significantly higher and with less variability (p < .05). Therefore, new norms tables were prepared for each test. These consisted of IQ conversion tables for the WISC-R, PPVT, and SIT, and percentile ranks associated with raw scores for the SPM and MHVS.
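The two kinds of norms table mentioned above can be illustrated with a minimal Python sketch. It assumes a deviation IQ scale with mean 100 and standard deviation 15 obtained by linear rescaling of raw scores, and a mid-point definition of percentile rank; both are simplifying assumptions, and the function names and raw scores are hypothetical rather than drawn from the thesis.

```python
from statistics import mean, pstdev


def iq_conversion_table(raw_scores, target_mean=100, target_sd=15):
    """Build a raw-score -> deviation-IQ table by linear rescaling of z-scores."""
    m, s = mean(raw_scores), pstdev(raw_scores)
    return {raw: round(target_mean + target_sd * (raw - m) / s)
            for raw in sorted(set(raw_scores))}


def percentile_rank_table(raw_scores):
    """Build a raw-score -> percentile-rank table: percent of the sample
    scoring below the raw score plus half of those scoring exactly at it."""
    n = len(raw_scores)
    table = {}
    for raw in sorted(set(raw_scores)):
        below = sum(1 for r in raw_scores if r < raw)
        at = sum(1 for r in raw_scores if r == raw)
        table[raw] = round(100 * (below + 0.5 * at) / n)
    return table


# Hypothetical raw scores for one age group, invented for illustration only.
sample = [40, 45, 45, 50, 50, 50, 55, 55, 60]

print(iq_conversion_table(sample))    # raw score -> IQ on the 100/15 scale
print(percentile_rank_table(sample))  # raw score -> percentile rank
```

Because the tables are built from the local sample's own mean and spread, a sample that scores higher and less variably than the standardization sample will receive lower IQs from its own table than from the published one.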
The renorming procedure involved lowering and spreading out the IQ score scales to mean 100 and standard deviation 15. As a result, students scored lower with the B.C. norms than with the published norms. This is most pronounced in the lower score ranges.

In the equating phase of the study, the equivalence of each of the PPVT, SIT, SPM, and MHVS to the three WISC-R IQ scales was examined using both psychological and statistical criteria of equivalence. Pairs of tests were defined as nominally parallel (Lord & Novick, 1968) if they were psychologically similar in terms of content and purpose, and statistically similar as defined by a disattenuated correlation coefficient > .70. Thirteen test pairs were identified which satisfied the dual criteria for equivalency. Both linear and equipercentile equating procedures were applied to the observed score distributions of these test pairs. The accuracy of the results was judged by comparison of the conditional root-mean-square errors of equating associated with the equating procedures. These errors averaged 12 score points and were similar across all procedures.

It was concluded that none of the test pairs considered in the study were equivalent, or parallel, and that, consequently, their interchangeable use is erroneous. Further, it was concluded that test equivalence requires a close correspondence of content in terms of item similarity. Without such correspondence, differences between tests render equating inappropriate.

Research Supervisor: Dr. W. T.
Rogers

TABLE OF CONTENTS

ABSTRACT
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGMENT

Chapter

I INTRODUCTION
    The Challenge to School Psychology
    Background to the Problem
    The Anchor Test Study
    Inter-Test Score Conversions
    Equivalent Scores
    Comparable Scores
    Equating Methods
    The Problem
    Delimitations of the Study
    Native Indians
    English-Speaking Children
    Handicapped Children
    Age
    Organization of the Dissertation

II REVIEW OF THE LITERATURE
    Justification for Equating
    Wechsler Intelligence Scale for Children—Revised (WISC-R)
    Slosson Intelligence Test (SIT)
    Peabody Picture Vocabulary Test (PPVT)
    Standard Progressive Matrices (SPM)
    Mill Hill Vocabulary Scale (MHVS)
    Summary
    Inter-Test Score Conversions
    Equating
    Parallel Forms
    Nominally Parallel Tests
    Equating Methods
    Unequally Reliable Tests
    Comparing
    Nonparallel Tests
    Methods
    Summary
    Renorming

III METHODOLOGY
    Sample Design
    Stage I: Schools
    Stage II: Individuals
    Population Sizes
    Sampling Procedures
    Sample Allocation
    Preparation of the School Sampling Frame
    Identification of Schools
    Preparation of the Sampling Frames for Individuals
    Testing
    Tests Used
    Testers
    Testing Procedures
    Scoring and Data Preparation
    Mill Hill Vocabulary Scale
    DATA ANALYSIS
    Preliminary Analyses
    Order of Administration
    Goodness of Fit
    Determination of Norm Relevance
    Central Tendency
    Variance
    Preparation of B.C. Norms
    Wechsler Intelligence Scale for Children—Revised
    Derivation of Subtest Scaled Scores
    Derivation of Verbal, Performance, and Full Scale IQ Scores
    Peabody Picture Vocabulary Test and Slosson Intelligence Test
    Standard Progressive Matrices and Mill Hill Vocabulary Scale
    Statistical Properties of the Tests
    Equating
    Determination of Nominally Parallel Test Pairs
    Assignment of Test Pairs to Equating Methods
    Linear Equating (Equally Reliable Tests)
    Linear Equating (Unequally Reliable Tests)
    Equipercentile Equating
    Summary

IV RESULTS
    Rate of Response
    Representativeness of the Sample
    Results of Preliminary Analyses
    Order Effect
    Normality
    Comparison to Published Norms
    Comparison of WISC-R Results to "White" American Norms
    Preparation of B.C. Norms
    WISC-R Scaled Scores
    WISC-R IQ Scales
    PPVT and SIT IQ Scores
    Statistical Properties of the Tests
    Interpretation of the Norms
    Equating
    Identification of Nominally Parallel Test Pairs
    Designation of Test Pairs to Equating Methods
    A Comparative Examination of Equating Procedures
    I. MHVS and WISC-R Verbal
    II. PPVT and WISC-R Verbal
    Equating Results

V SUMMARY AND DISCUSSION
    Summary
    Norming
    The B.C. Scores
    Use of the New Norms
    The Comparability of the New Norms
    Equating
    The Equatability of Nominally Parallel Test Pairs
    The Use and Interpretation of Intelligence Tests
    Limitations of the Study
    Norming
    Equating
    Directions for Further Research

REFERENCE NOTES
REFERENCES
APPENDIX A: Letters and Consent Forms
APPENDIX B: WISC-R Canadian Substitution Items
APPENDIX C: Project Handbook
APPENDIX D: MHVS Scoring Guides
APPENDIX E: British Columbia Norms Tables
APPENDIX F: ANOVA and Cochran's C
APPENDIX G: Graphic Equipercentile Equating Procedure: PPVT IQ to WISC-R Verbal IQ, Age 7½
APPENDIX H: Normalized Linear Equating Results

LIST OF TABLES

Table
 1  WISC-R Subtests and Skills Measured
 2  Pearson Product-Moment Correlation Coefficient for PPVT IQs and Three WISC-R IQs
 3  Content and Structure of the Tests
 4  Psychologically and Functionally Equivalent Test Pairs as Suggested by their Content, Authors' Claims, Empirical Validity, and Practitioners' Use
 5  Relationship between Parallelism and Type of Score Conversion
 6  Population Size Stratified by Region, Community Size, School Size, and Age
 7  Population Percentages: Region by Age
 8  Percentage of Population within Region by Age
 9  Target Sample Allocation: Region by Age
10  Target Sample Allocation within Region
11  Proportional School Size Sampling Procedure: Region #1, Community Size A, School Size II
12  Measures of Central Tendency and Variability Reported in Test Manuals
13  Example of Computation of Scaled Score Equivalents of Raw Scores: WISC-R Information Subtest, Age 11½, n=108
14  Example of the Graphic Scaled Score Conversion Approach: WISC-R Information Subtest, Age 11½, n=108
15  Form of Reliability Coefficient Computed
16  Rates of Response for Each Test and Age
17  Rates of Response (Percentages) for the Total Sample Design