INDIVIDUALLY-ADMINISTERED INTELLIGENCE TESTS: AN APPLICATION OF ANCHOR TEST

NORMING AND EQUATING PROCEDURES IN BRITISH COLUMBIA

by

BARBARA JOYCE HOLMES

B.A., Queen's University, 1964
M.S., University of Wisconsin, 1968

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF EDUCATION

in

THE FACULTY OF GRADUATE STUDIES

Department of Educational Psychology

We accept this thesis as conforming

to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

April 1981

Barbara Joyce Holmes, 1981

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department of Educational Psychology

The University of British Columbia 2075 Wesbrook Place Vancouver, Canada V6T 1W5

ABSTRACT

The purpose of the present study was to simulate the Anchor Test Study for reading achievement tests using five individually-administered intelligence tests: the Wechsler Intelligence Scale for Children—Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), the Slosson Intelligence Test (SIT), the Standard Progressive Matrices (SPM), and the Mill Hill Vocabulary Scale (MHVS). Three major objectives were adopted from the Anchor Test Study: to prepare tables of equivalent score values for the conversion of scores from one test to another; to compare linear and equipercentile equating procedures in the derivation of equivalent scores; and to develop provincially representative norms for the five tests.

The rationale for the present study was based on the fact that intelligence tests are commonly used interchangeably on the apparent assumption that an equivalency relationship exists among common purpose tests. The primary focus of the present study was an empirical investigation of the viability of this use. In addition, American and British norm-referenced intelligence tests are interpreted in British Columbia as if the population of children to whom they are applied is identical to the population of children for whom each of the tests was prepared. An ancillary focus was the determination of the relevance of existing norms for use in British Columbia.

All five tests were administered to a stratified random sample of 340 children at three age levels: 115 aged 7½ years, 117 aged 9½ years, and 108 aged 11½ years. The population from which the sample was drawn consisted of all non-Native Indian, English-speaking children at these three age levels attending public and independent schools in British Columbia. This population was further restricted to exclude children in classes for the physically handicapped, emotionally disturbed, and trainable mentally retarded. The stratification variables employed were geographic region, community size, school size, age, and sex. In addition, information was collected on a sixth variable, level of education of the head of the household, to provide a description of the sample using a socioeconomic index.

The tests were first scored using the norms tables in their respective manuals. Statistical tests for differences of means and variances for the B.C. sample compared to the original standardization sample revealed that, in most cases, B.C. children scored significantly higher and with less variability (p < .05). Therefore, new norms tables were prepared for each test. These consisted of IQ conversion tables for the WISC-R, PPVT, and SIT, and percentile ranks associated with raw scores for the SPM and MHVS. The renorming procedure involved lowering and spreading out the IQ score scales to mean 100 and standard deviation 15. As a result, students scored lower with the B.C. than with the published norms. This is most pronounced in the lower score ranges.

In the equating phase of the study, the equivalence of each of the PPVT, SIT, SPM, and MHVS to the three WISC-R IQ scales was examined using both psychological and statistical criteria of equivalence. Pairs of tests were defined as nominally parallel (Lord & Novick, 1968) if they were psychologically similar in terms of content and purpose, and statistically similar as defined by a disattenuated correlation coefficient of at least .70. Thirteen test pairs were identified which satisfied the dual criteria for equivalency. Both linear and equipercentile equating procedures were applied to the observed score distributions of these test pairs. The accuracy of the results was judged by comparison of the conditional root-mean-square errors of equating associated with the equating procedures. These errors averaged 12 score points and were similar across all procedures.

It was concluded that none of the test pairs considered in the study were equivalent, or parallel, and that, consequently, their interchangeable use is erroneous. Further, it was concluded that test equivalence requires a close correspondence of content in terms of item similarity. Without such correspondence, differences between tests render equating inappropriate.

Research Supervisor: Dr. W. T. Rogers

TABLE OF CONTENTS

ABSTRACT ii

LIST OF TABLES ix

LIST OF FIGURES xi

ACKNOWLEDGMENT xii

Chapter I INTRODUCTION 1

The Challenge to School Psychology 1
Background to the Problem 3
The Anchor Test Study 4
Inter-Test Score Conversions 6
Equivalent Scores 6
Comparable Scores 7
Equating Methods 8
The Problem 9
Delimitations of the Study 11
Native Indians 11
English-Speaking Children 12
Handicapped Children 12
Age 13

Organization of the Dissertation 13

II REVIEW OF THE LITERATURE 14

Justification for Equating 14
Wechsler Intelligence Scale for Children—Revised (WISC-R) 15
Slosson Intelligence Test (SIT) 17
Peabody Picture Vocabulary Test (PPVT) 19
Standard Progressive Matrices (SPM) 21
Mill Hill Vocabulary Scale (MHVS) 23
Summary 23
Inter-Test Score Conversions 25
Equating 27
Parallel Forms 27
Nominally Parallel Tests 28
Equating Methods 31
Unequally Reliable Tests 37
Comparing 39
Nonparallel Tests 39
Methods 43
Summary 44
Renorming 44

III METHODOLOGY 48

Sample Design 48
Stage I: Schools 49
Stage II: Individuals 52
Population Sizes 52
Sampling Procedures 54
Sample Allocation 54
Preparation of the School Sampling Frame 54
Identification of Schools 60
Preparation of the Sampling Frames for Individuals 61
Testing 62
Tests Used 62
Testers 63
Testing Procedures 63
Scoring and Data Preparation 64
Mill Hill Vocabulary Scale 65

DATA ANALYSIS 66

Preliminary Analyses 67
Order of Administration 67
Goodness of Fit 67
Determination of Norm Relevance 68
Central Tendency 69
Variance 69
Preparation of B.C. Norms 70
Wechsler Intelligence Scale for Children—Revised 71
Derivation of Subtest Scaled Scores 71
Derivation of Verbal, Performance, and Full Scale IQ Scores 76
Peabody Picture Vocabulary Test and Slosson Intelligence Test 77
Standard Progressive Matrices and Mill Hill Vocabulary Scale 78
Statistical Properties of the Tests 78
Equating 80
Determination of Nominally Parallel Test Pairs 81
Assignment of Test Pairs to Equating Methods 82
Linear Equating (Equally Reliable Tests) 83
Linear Equating (Unequally Reliable Tests) 86
Equipercentile Equating 87
Summary 88

IV RESULTS 90

Rate of Response 90
Representativeness of the Sample 91
Results of Preliminary Analyses 96
Order Effect 96
Normality 96
Comparison to Published Norms 99
Comparison of WISC-R Results to "White" American Norms 102
Preparation of B.C. Norms 103
WISC-R Scaled Scores 103
WISC-R IQ Scales 108
PPVT and SIT IQ Scores 108
Statistical Properties of the Tests 109
Interpretation of the Norms 111
Equating 114
Identification of Nominally Parallel Test Pairs 115
Designation of Test Pairs to Equating Methods 116
A Comparative Examination of Equating Procedures 116
I. MHVS and WISC-R Verbal 116
II. PPVT and WISC-R Verbal 127
Equating Results 135

V SUMMARY AND DISCUSSION 149

Summary 149
Norming 153
The B.C. Scores 153
Use of the New Norms 156
The Comparability of the New Norms 157
Equating 157
The Equatability of Nominally Parallel Test Pairs 157
The Use and Interpretation of Intelligence Tests 160
Limitations of the Study 160
Norming 160
Equating 161
Directions for Further Research 161

REFERENCE NOTES 163

REFERENCES 164

APPENDIX A: Letters and Consent Forms 173

APPENDIX B: WISC-R Canadian Substitution Items 188

APPENDIX C: Project Handbook 190

APPENDIX D: MHVS Scoring Guides 200

APPENDIX E: British Columbia Norms Tables 213

APPENDIX F: ANOVA and Cochran's C 250

APPENDIX G: Graphic Equipercentile Equating Procedure: PPVT IQ to WISC-R Verbal IQ, Age 7½ 254

APPENDIX H: Normalized Linear Equating Results 259

LIST OF TABLES

Table
1 WISC-R Subtests and Skills Measured 16
2 Pearson Product-Moment Correlation Coefficients for PPVT IQs and Three WISC-R IQs 20
3 Content and Structure of the Tests 24
4 Psychologically and Functionally Equivalent Test Pairs as Suggested by their Content, Authors' Claims, Empirical Validity, and Practitioners' Use 24
5 Relationship between Parallelism and Type of Score Conversion 44
6 Population Size Stratified by Region, Community Size, School Size, and Age 53
7 Population Percentages: Region by Age 54
8 Percentage of Population within Region by Age 55
9 Target Sample Allocation: Region by Age 57
10 Target Sample Allocation within Region 58
11 Proportional School Size Sampling Procedure, Region #1, Community Size A, School Size II 59
12 Measures of Central Tendency and Variability Reported in Test Manuals 68
13 Example of Computation of Scaled Score Equivalents of Raw Scores, WISC-R Information Subtest, Age 11½, n=108 72
14 Example of the Graphic Scaled Score Conversion Approach, WISC-R Information Subtest, Age 11½, n=108 74
15 Form of Reliability Coefficient Computed 79
16 Rates of Response for Each Test and Age 90
17 Rates of Response (Percentages) for the Total Sample Design 92
18 Sample Percentages by Sex and Age 93
19 Population and Sample Percentages by Region and Age 93
20 Population and Sample Percentages by Community Size and Age 94
21 Population and Sample Percentages by School Size and Age 95
22 Level of Education of Head of Household: Percent in Population and Sample (All Ages Combined) 95
23 Summary of Analysis of Variance for Order of Test Administration 97
24 D Values for Kolmogorov-Smirnov Tests for Goodness of Fit 98
25 Means and Standard Deviations for All Tests Scored using Published Norms Tables 100
26 Results of t Tests and χ² Tests 101
27 t Tests for Differences in Means between Total B.C. Sample and American WISC-R "White" Sample (Mean=102) 102
28 Comparison of Results of Analytical and Graphic Scaled Score Conversions, Arithmetic Subtest, Age 9½ 104

29 Means and Standard Deviations for WISC-R Scaled Scores and Sums of Scaled Scores, B.C. Norms 107
30 Raw Score Means and Standard Deviations for PPVT and SIT 108
31 Internal Consistency Coefficients and Standard Errors of Measurement 110
32 Percent of Total Sample in IQ Classification Categories for B.C. Norms and American Norms 113
33 Observed Score Correlation Coefficients Corrected for Attenuation 115
34 Test Pairs Identified for Equating Methods 117
35 Comparative Example of Linear Equating and Equipercentile Equating for Converting MHVS Raw Scores to WISC-R Verbal IQ Scores (Age 11½) 118
36 Standard Errors of Equating for MHVS Raw Scores to WISC-R Verbal IQ Scores (Age 11½) 120
37 Distributions of MHVS Raw Scores and WISC-R Verbal IQ Scores, Age 11½ 123
38 Equipercentile Points for MHVS and WISC-R Verbal IQ from Graphic Procedure 125
39 Comparative Example of Linear Equating and Equipercentile Equating for Converting PPVT IQs to WISC-R Verbal IQs (Age 7½) 129
40 Standard Errors of Equating for PPVT IQ to WISC-R Verbal IQ Scores (Age 7½) 130
41 Equivalent Scores: PPVT Raw Scores to Standardized WISC-R Verbal Scores (Age 9½) 138
42 Equivalent Scores: PPVT Raw Scores to Standardized WISC-R Verbal Scores (Age 11½) 139
43 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 7½) 140
44 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 7½) 141
45 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 9½) 142
46 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 9½) 143
47 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 11½) 144
48 Equivalent Scores: SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 11½) 145
49 Equivalent Scores: SPM Raw Scores to Standardized WISC-R Performance Scores (Age 11½) 146
50 Equivalent Scores: MHVS Raw Scores to Standardized WISC-R Verbal Scores (Age 7½) 147
51 Equivalent Scores: MHVS Raw Scores to Standardized WISC-R Verbal Scores (Age 9½) 148

LIST OF FIGURES

Figure
1 B.C. Regional Groupings of School Districts 50
2 Test Administration Order (Second Session) for each Age Group 64
3 Relative Cumulative Frequencies for the WISC-R Information Subtest, Age 11½ 75
4 Inter-test Score Conversion Methods and Procedures used for Norming and Equating 89
5 Graphic Norming Procedure: WISC-R Arithmetic Subtest, Age 9½ 106
6 Frequency Distributions of WISC-R Full Scale IQ Scored with American and B.C. Norms (n=340) 112
7 Scatterplot of Obtained Scores with Linear Observed Score and True Score Equating Conversion Lines: MHVS to WISC-R Verbal IQ, Age 11½ 122
8 Cumulative Relative Frequencies for MHVS Raw Scores and WISC-R Verbal IQs, Age 11½ 124
9 Hand-graphed Equipercentile Equating Conversion Line: MHVS to WISC-R Verbal IQ, Age 11½ 126
10 Scatterplot of Obtained Scores with Interpolated, Linear Curve-fitting, and Polynomial Curve-fitting Conversion Lines: Analytical Equipercentile Equating Procedure, MHVS to WISC-R Verbal IQ, Age 11½ 128
11 Scatterplot of Obtained Scores with Linear Observed Score and True Score Equating Conversion Lines: PPVT IQ to WISC-R Verbal IQ, Age 7½ 131
12 Scatterplot of Obtained Scores with Interpolated, Linear Curve-fitting, and Polynomial Curve-fitting Conversion Lines: Analytical Equipercentile Equating Procedure, PPVT IQ to WISC-R Verbal IQ, Age 7½ 133

ACKNOWLEDGMENT

On the last page among these many written, one is overwhelmed by the recollection of the help and support of many and therefore humbly reduced to the admission:

"I did it with the help of my friends"

The fact that their names are not individually listed does not diminish the recognition of their contribution.

Financial support for this research was provided through a grant from the Educational Research Institute of British Columbia and a bursary from the Canadian Association for Educational Psychology.

I wish to express my appreciation to my research supervisor, Dr. Todd Rogers, who provided the methodological expertise for the study and a model of the unrelenting pursuit of excellence. I also wish to thank my committee members, Dr. Buff Oldridge for his always generous support, advice, and grass-roots assistance; and Dr. Juli Conry for her on-going enthusiasm and encouragement. To Drs. David Robitaille and Ralph Hakstian, the university examiners, a word of appreciation for their involvement and stimulating questions. And to Dr. Alan Kaufman, the external examiner, a special thanks for his time and valued comments.

A large measure of indebtedness goes to the 44 school psychologists and psychometricians across the province whose volunteer testing contributions made this study possible. The work is dedicated to them. And thanks to my graduate student colleagues and friends for their willing and multiple contributions in many stages of the project.

A final word of appreciation goes to the 340 children in public and independent schools throughout B.C. who gave their time and energies to helping all of us through five test administrations.

CHAPTER I

INTRODUCTION

The Challenge to School Psychology

Controversy over the use of standardized tests of intelligence for educational decision-making is not new. The history of the testing movement from the initial enthusiasm for classification of mental ability to growing skepticism over the misuse and misinterpretation of such classification is well documented (Cronbach, 1975; Houts, 1976; Kamin, 1974, 1975; Marks, 1976-77).

During the last two decades in the United States the use of group intelligence tests has been restricted and in some cases abolished by legislation and litigation (see, for example, Loretan [1965] and Mercer [1979]). Until very recently individually-administered tests of intelligence have remained relatively free from the criticisms of bias, misuse, and misinterpretation which have characterized the attacks against group tests. However, in October, 1979, a district court in California extended judicial control of intelligence testing to include individual tests on the grounds that they, too, were biased against minority group children (Larry P. vs. Winston Riles, 1979). The court charged the defendants (psychologists and educators) with the task of ensuring proper future use of intelligence tests by providing differential norms for each discrete group with which they are to be used and validation for each specific purpose for which they are to be applied (Larry P. vs. Winston Riles, 1979, p. 105).

It should be noted that, beyond that court's jurisdiction, defenders of standardized tests in general have stressed the need for more informed interpretation and application of test results (Anastasi, 1967, 1976; Clarizio, 1979; Kaufman, 1979; Lennon, 1978; Vitro, 1978).

It is the test practitioners themselves who are being called to task: ". . . the primary responsibility for the improvement of testing continues to rest on the shoulders of test users" (APA, 1974, p. 7). This refers to professional psychologists and educators who are qualified to administer level B and level C tests, defined as follows:

Level B. Tests or aids which require some technical knowledge of test construction and use, and of supporting subjects such as statistics, individual differences, the psychology of adjustment, personnel psychology, and guidance. (Examples: general intelligence and special aptitude tests, interest inventories and personality screening inventories.)

Level C. Tests and aids which require substantial understanding of testing and supporting psychological topics, together with supervised experience in the use of these devices. (Examples: clinical tests of intelligence, and personality tests.) (Cronbach, 1970, p. 18)

In educational settings, intelligence tests of the level B type (e.g., group intelligence tests, the Peabody Picture Vocabulary Test, and the Slosson Intelligence Test) are frequently administered by teachers and principals. Level C testing, however, is the domain of the school psychologist. In fact, school psychology as a profession grew out of the demand for testers capable of administering level C intelligence tests. The resulting symbiotic relationship between the two has resulted in a challenge to the credibility of the profession parallel to that of the tests themselves (Brown, 1976-77; Mercer, 1977; Sarason, 1977). Concern for the appropriateness of test use and interpretation, therefore, is a professional obligation of the school psychologist. The fact that, in recent years, this professional obligation has become a legal mandate, subject to public scrutiny and judicial investigation, makes it all the more impelling.

The responsibility of the school psychologist, however, extends beyond personal practice. As an "expert" in test theory and measurement in the schools there is a further need to ensure that other test users are accurate in their interpretation of test results. It is toward this aim that the present dissertation was directed.

Background to the Problem

In 1922, E. L. Thorndike pointed out that the development of increasing numbers of tests of intelligence necessitated a means for transmitting "a score obtained with one test into the score that is equivalent to it in some other test" (Thorndike, 1922, p. 29). More recently, Lennon (1966a) claimed that "it is generally agreed that there is need for data that will permit conversion of scores or IQ's derived from one test to [equivalent] measures derived from other tests" (p. 168). The rationale for the call for equivalent score conversions is based on the desirability of establishing an empirical means for making comparisons among individuals or groups using common-purpose test scores from different test sources.

It has been suggested that one fallacy in the interpretation of intelligence tests is that comparisons of this nature are often made on what may be an erroneous assumption of direct score correspondence between tests which are seen as psychologically equivalent. The basic misunderstanding underlying this practice has been termed the "jingle fallacy": tests which bear the same name measure the same thing. This has fostered the notion that there is a common and therefore constant conceptual meaning in the numerical representation of IQ (Anastasi, 1976; Ebel, 1972, 1976; Newland, 1973). Despite the fact that it has been repeatedly pointed out that direct score equivalence does not exist from test to test (Anastasi, 1976; Cronbach, 1970; Sattler, 1974; Salvia & Ysseldyke, 1978), there can be little doubt, from a practitioner's point of view, that IQ scores from various tests of intelligence are used interchangeably for educational purposes (Barratt, 1956; Birkemeyer, 1964; Ritter, Duffey, & Fischman, 1973).

The major focus of the present study was directed toward this issue. Specifically, the question examined was whether it was feasible to establish, for individually-administered intelligence tests, the type of score conversions advocated by Thorndike (1922) and Lennon (1966a). A second focus was on the development of provincially representative norms as discussed in a later section.

The Anchor Test Study

The Anchor Test Study (The National Test Equating Study in Reading) addressed similar issues. The study was undertaken in response to the recognized need for statistical equivalence among the eight most widely used reading achievement tests in the United States (Loret, Seder, Bianchini, & Vale, 1974). It had two major objectives: to provide equivalency tables for translating a score on any one of the tests into a score on any other test, and to provide new national norms for each of the tests (Bianchini & Loret, 1974, p. 1).

The study was named after the "anchor test" norming technique which it adopted. This technique involved the identification of one test as the reference scale, or "anchor," to which the scores of other similar tests are converted or normed. The notion was first conceptualized by Cureton (1941) and later supported by Lennon (1966b) as a national solution to the need for a uniform system of units for all tests. Both Lennon and Cureton advocated endorsement by all test publishers of a standard approach to the norming of intelligence and achievement tests in reference to a single global anchor test. The advantage of this approach is the comparability among test scores yielded by the common norm population.

Briefly, the procedure used in the Anchor Test Study was as follows: one of the tests, the Metropolitan Achievement Test (MAT), was restandardized directly on a nationally representative sample of children in Grades 4, 5, and 6. Following that, equivalency tables were established among all eight tests. Finally, norms were derived for the seven non-MAT tests, using the MAT as the reference or "anchor" and reading corresponding norm values from the equivalence tables, as sketched below. Two psychometric questions were examined in this process. The first dealt with the technical feasibility of equating tests which were not designed as parallel, but which were similar in content and function. The second considered the relative efficiency of the linear and equipercentile methods for deriving equivalent scores.
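To make the anchor norming technique concrete, the sketch below composes two lookup tables: the anchor test's own norms (raw score to percentile rank) and an equivalence table relating a second test's raw scores to anchor raw scores. It is an illustration only; the numbers are invented, and the use of Python with NumPy and linear interpolation is an assumption of this sketch rather than the Anchor Test Study's actual procedure.

```python
# A sketch of anchor-test norming with invented numbers. NumPy's
# linear interpolation stands in for reading values off the tables.
import numpy as np

# Anchor norms: raw scores on the anchor test and the percentile
# ranks assigned to them by the anchor's standardization sample.
anchor_raw = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
anchor_pct = np.array([5.0, 25.0, 50.0, 75.0, 95.0])

# Equivalence table: raw scores on a second test ("test X") and the
# anchor raw scores judged equivalent to them by equating.
x_raw = np.array([4, 8, 12, 16, 20])
x_equiv_anchor = np.array([12.0, 22.0, 31.0, 41.0, 48.0])

# Norms for test X follow by composing the two tables:
# X raw -> equivalent anchor raw -> anchor norm value.
x_pct = np.interp(x_equiv_anchor, anchor_raw, anchor_pct)
for raw, pct in zip(x_raw, x_pct):
    print(f"test X raw {raw:2d} -> percentile {pct:.0f}")
```

The point of the composition is that the seven non-anchor tests inherit a common norm population without each being restandardized directly.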

In the following sections, the concepts and terminology associated with score conversion types and methods are briefly introduced prior to a formalization of the research problem. Each of these concepts is discussed in detail in Chapter II.

Inter-Test Score Conversions

Two types of inter-test score conversions are familiar in psychometric literature: equivalency conversions, based on test equating procedures, and comparable score conversions, based on common scaling procedures. Both of these were examined in terms of their technical requirements and their applicability to the present study.

Equivalent Scores

The technical feasibility of equating is determined by the parallelism of the tests involved. According to Angoff,

equating, or the derivation of equivalent scores, concerns itself with the problem of unique conversions which may be derived only across test forms that are parallel, that is, forms that measure, within acceptable limits, the same psychological function (1971, p. 563).

Ideally, parallelism is defined in the restricted mathematical sense of classical true score theory (see Chapter II). Wesman (1958) pointed out that "these conditions are ordinarily most closely approximated where parallel forms of a test have been constructed—forms which are intended to be interchangeable" (p. 8). For practical purposes, however, equivalent score conversions may be desirable among tests which were not specifically constructed in this manner (Flanagan, 1964; Lennon, 1966a; Marco, Petersen, & Stewart, 1979). The eight reading achievement tests in the Anchor Test Study were such an example. In that study, both high content similarity and high disattenuated intercorrelation coefficients were cited as evidence of adequate parallelism among the tests to justify their treatment as parallel (Loret et al., 1974).

In the present study, the question of technical parallelism was examined in reference to tests which are employed as functionally parallel but which are known not to have a high degree of content similarity (as will be discussed in Chapter II). Given this, it was not expected that the tests would satisfy the classical criteria for parallelism. Lord and Novick (1968), however, have suggested an application of test theory to tests which are imperfectly parallel in the classical sense. They considered it psychometrically defensible in some cases to define true score in terms of a "family" of tests all measuring a common psychological function rather than in terms of a single test. They introduced the term nominally parallel to describe tests which measure "important aspects or manifestations of the psychological variable under study" (p. 174) despite the fact that there may be some differences among the tests.

Comparable Scores

Unlike the concept of equivalence, which is specific to the tests themselves and, as has been shown, is synonymous with parallelism, comparability refers solely to test score scales. Comparable scores can be achieved by scaling any tests in a common metric for a common reference group. The purpose of such common scaling is to assign the same numerical value to scores occupying the same rank in their respective distributions, as the sketch below illustrates. This can be done irrespective of test content and carries no implication of the interchangeability of tests.
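As an illustration of common scaling, the following sketch refers raw scores from two tests of different content to the same reference group and expresses both on a normalized scale with mean 100 and standard deviation 15. The data are invented and the Python/SciPy implementation is an assumption of the example; the point is only that equal scale values mark equal ranks, not equal amounts of a common trait.

```python
# A sketch of common scaling: two unrelated tests are scaled on the
# same reference group and metric (mean 100, SD 15). Invented data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
test_a = np.round(rng.normal(30, 6, size=500))   # raw scores, test A
test_b = np.round(rng.normal(70, 12, size=500))  # raw scores, test B

def common_scale(raw, reference_scores):
    """Percentile rank in the reference group, then the corresponding
    normal deviate expressed on a mean-100, SD-15 scale."""
    p = stats.percentileofscore(reference_scores, raw, kind="mean") / 100
    p = min(max(p, 0.001), 0.999)  # keep the normal deviate finite
    return 100 + 15 * stats.norm.ppf(p)

# Scores holding the same rank in their own distributions receive
# about the same scaled value; nothing is implied about the two
# tests measuring the same amount of the same trait.
print(round(common_scale(36, test_a)))  # ~1 SD above the mean of A
print(round(common_scale(82, test_b)))  # ~1 SD above the mean of B
```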

Angoff (1971) described comparing as follows:

[Comparing] may be thought of quite simply as the problem of "equating" tests of different psychological function (p. 590). [The derivation of comparable scores involves] the problem of nonunique conversions of scores across nonparallel forms (p. 563).

As opposed to test equating, Angoff stressed the broader application of the concept of comparing (i.e., any two scales can be made comparable) and the narrower interpretation of the converted scores. Since the score conversions are nonunique, they are not generalizable beyond the group on which they were established. Lord (1950) pointed out that "comparability may be expected to hold approximately, however, with respect to groups sufficiently similar to the basic group. Obviously, the basic group should in general be a representative sample of persons of the type for whom the tests are intended" (p. 1).

The technical considerations in the establishment of comparable scores, therefore, are focused on the nature of the group used for establishing the conversions rather than on the nature of the tests themselves.

Equating Methods

In the Anchor Test Study, two equating methods—linear and equipercentile—were applied to the derivation of the equivalency tables. These methods are generated by different definitions of equivalency: the equipercentile method defines as equivalent scores on two tests corresponding to equal percentile ranks, while the linear method considers scores corresponding to equal standard-score deviates as equivalent (Angoff, 1971, p. 565). The latter procedure involves a linear transformation function of the form Y = aX + b, while the former procedure is an area transformation which serves to stretch and compress one score distribution to conform to the shape of the other. When test score distributions are similar beyond the first two moments, the linear method is preferred, since it involves an adjustment in mean and standard deviation only. When test score distributions are dissimilar, the equipercentile method is necessitated. Both methods are sketched below.
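Both definitions reduce to a few lines of computation. The sketch below, on invented score distributions, derives the linear conversion from means and standard deviations (equal standard-score deviates) and the equipercentile conversion from empirical percentile ranks; neither the data nor this simple quantile interpolation comes from the Anchor Test Study or the present study.

```python
# A sketch of the two equating methods on invented distributions:
# linear (equal standard-score deviates) and equipercentile (equal
# percentile ranks).
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(100, 12, size=400)  # scores on test X
y = rng.normal(100, 15, size=400)  # scores on test Y

def linear_equate(score, x, y):
    """Y = aX + b with a = s_y / s_x and b = m_y - a * m_x, so that
    equated scores share the same standard-score deviate."""
    a = y.std() / x.std()
    b = y.mean() - a * x.mean()
    return a * score + b

def equipercentile_equate(score, x, y):
    """Find the score's percentile rank in X, then return the Y score
    holding the same rank (interpolated empirical quantile)."""
    p = np.mean(x <= score)
    return np.quantile(y, p)

# With near-normal distributions of this kind the two methods give
# similar values; they diverge as the distribution shapes differ.
print(linear_equate(112, x, y))
print(equipercentile_equate(112, x, y))
```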

Test forms which are strictly parallel can be assumed to have identical raw score distributions and therefore to satisfy the requirements for linear equating. An assumption of parallelism of tests such as that made among the reading achievement tests in the Anchor Test Study leads to an hypothesis of the viability of a linear relationship on which to base equating conversions. Further, since linear and equipercentile equating closely approximate one another when score distributions are identical, the score conversions yielded by the two methods can be expected to be similar. The results of the Anchor Test Study, however, failed to support such expectations. It was found that the linear equating method yielded some converted score values which fell outside the corresponding score ranges of the tests (Bianchini & Loret, 1974). Therefore the reported equivalency tables were produced using equipercentile score conversions.

The Problem

The functional equivalence of intelligence tests, defined by the manner in which they are used in common practice, was established earlier. The first purpose of this study was to assess the validity of this use. Five individually-administered intelligence tests were considered: the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), the Slosson Intelligence Test for Children and Adults (SIT), the Standard Progressive Matrices (SPM), and the Mill Hill Vocabulary Scale (MHVS). With the exception of the MHVS, all of these tests are frequently used in schools in British Columbia, the province in which the study was conducted. The WISC-R was chosen to serve as anchor. In addition to being the most widely used individual intelligence test (Beauchamp, Samuels, & Griffore, 1979; Mercer, 1979), it has traditionally served as the criterion test against which other measures of intelligence have been validated (Bersoff, 1980). The remaining tests have the common advantage of requiring considerably less administration time than the WISC-R (10-30 minutes each compared to 1-1½ hours for the WISC-R). Further, these are level B tests. As such, they are more accessible than the WISC-R and are often used as alternatives to it. Thus, the inter-test relationships examined in the present study were between the four "shorter" tests and the WISC-R. The defensibility of substituting these tests for the WISC-R was considered first on logical and then on empirical grounds. The logical analysis consisted of an examination of the psychological parallelism of the four tests to the WISC-R. This was operationally defined in terms of the similarity of the psychological trait measured. The empirical analysis involved an investigation of the applicability of the equating methods used in the Anchor Test Study to the pairs of tests identified as psychologically equivalent.

As with the Anchor Test Study, a preliminary step to the actual equating was the determination of the technical, or statistical, parallelism of the tests. Given that the five tests in the present study were all tests of intelligence as defined by Lennon,

these tests that profess to measure a more or less general ability, or a very small number of such abilities, assumed to underlie or condition performance across a considerable range of tasks primarily intellectual in character (1978, p. 1),

Lord and Novick's concept of nominal parallelism was adopted. In the absence of statistical guidelines for defining nominal parallelism, a disattenuated correlation coefficient of at least .70 between test pairs (r ≥ .70) was selected. Correlations of this magnitude are commonly found between WISC-R scores and scores on other tests of intelligence (Sattler, 1974; Wechsler, 1974).
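The correction for attenuation behind this screening criterion follows from classical true score theory: the correlation between true scores is estimated by dividing the observed correlation by the square root of the product of the two reliabilities. A minimal sketch, with invented reliability and correlation values:

```python
# A sketch of the classical correction for attenuation. The observed
# correlation and the two reliabilities below are invented.
import math

def disattenuated_r(r_xy, rel_x, rel_y):
    """r_xy / sqrt(rel_x * rel_y): the estimated correlation between
    true scores on the two tests."""
    return r_xy / math.sqrt(rel_x * rel_y)

r = disattenuated_r(r_xy=0.62, rel_x=0.90, rel_y=0.85)
print(round(r, 2))  # 0.71 -> this pair would pass the r >= .70 screen
```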

The requirement of a large number of examinees on which to base equivalency conversions additionally provided the possibility of renorming the tests and thus simulating the Anchor Test Study in both its norming and equating aspects. Therefore the scope of the present study was extended to include the administration of the five tests to a representative sample of children in British Columbia and the development of provincial norms. The use of a common standardization sample and common scaling procedures ensured comparability among the test scores.

Delimitations of the Study

Native Indians

Problems of bias in the use of standardized intelligence tests with Native Indian children in British Columbia have been recently summarized by More and Oldridge (1980). Similarly, Hawthorn (1971), in A Survey of the Contemporary Indians of Canada, said:

The matter of intelligence is a serious one. The tests are known to be invalid for all populations except the one for which they were standardized, that is, English-speaking White middle classes of an urban group. . . . Even if a teacher is aware that test results may not be accurate, it would be difficult for him to look at a series of below-normal scores on a child's PRC [Permanent Record Card] and not conclude that the child has low ability. If school systems had adequate facilities for slow learners, almost every Indian child in the country would be in one on the basis of his intelligence and achievement test results. (p. 144)

Hawthorn recommended the abolition of such tests for Indian students.

Intelligence test score means for Native Indian children in British Columbia (Fraser, 1969; Goldstein, 1980) resemble those for children from cultural minority groups in the United States (Kaufman & Doppelt, 1976; Mercer, 1979; Reschly, in press). Mercer (1979) suggested that the bimodal distribution of scores resulting from the significantly lower mean scores of these nonwhite children precludes their treatment as part of the same statistical population as white, middle-class children. Therefore, a national norms group including the proportional representation of such minority group children is inappropriate. Alternately, Mercer advocated separate norms for minority groups (1979). The position statement of the National Association of School Psychologists (NASP, 1976) similarly stated, "All student information should be interpreted in the context of the student's socio-cultural background" (p. 1). The use of subgroup or differentiated norms for different-scoring groups has been recommended as psychometrically desirable (Anastasi, 1976; Angoff, 1971) and, as mentioned earlier in this chapter, mandated as legally binding (Larry P. vs. Winston Riles, 1979).

Bearing in mind the arguments against inclusion of Native Indian children in a representative sample of the total population, and being restricted by practical considerations of sample size from replicating the norming procedure for a Native Indian sample, these children were excluded from the study.

English-Speaking Children

The largely verbal nature of the content and instructions of the tests in this study necessitated, on logical grounds, the restriction of the sample to children who were fluent in the use and comprehension of the English language. This did not exclude children for whom English is a second language, as long as fluency was assured.

Handicapped Children

All children enrolled in classes for the physically handicapped, emotionally disturbed, or trainable mentally retarded were excluded. This is consistent with sample restrictions placed on the original standardization of the WISC-R. (As will be demonstrated in Chapter III, this was the model adopted for sampling in the present study.) It is also in keeping with conscientious testing practice, since these children usually require special testing procedures.

Age

The sample was restricted to children at three ages: 7½, 9½, and 11½ years. These ages correspond closely to grades 2, 4, and 6 and were considered broadly representative of children across the elementary grades where requests for individual intelligence testing are most common.

Organization of the Dissertation

The remainder of the dissertation consists of four chapters. In Chapter II, the review of the literature is presented. This is followed by a detailed outline of the procedures for data collection and data analysis in Chapter III and the corresponding results in Chapter IV. The final chapter contains a summary review of the study and a discussion of the findings.

CHAPTER II

REVIEW OF THE LITERATURE

The review of the literature is presented in three major sections. First, the psychological equivalence of the five intelligence tests considered in this study is examined to establish the justification for the equating phase. This is followed by a discussion of the technical criteria and the methodology for inter-test score conversions. In the final section, the rationale for including the norming phase is presented.

Justification for Equating

In Chapter I a distinction was made between functional and technical equivalence—the former based on actual practice and the latter on statistical criteria. The argument for the functional equivalence of the five tests considered in this study was presented in terms of their common use in educational settings. The determination of technical equivalence rests on statistical indices of the degree of parallelism of test pairs. This is examined in detail later in this chapter.

The subject of the present section is the justification for equating four of the tests (PPVT, SIT, SPM, MHVS) to the WISC-R based on what may be considered their "psychological" equivalence. This is operationally defined in this context as similarity of the psychological trait or function measured. The discussion includes a comparison of the hypothetical constructs underlying the tests insofar as these are reflected in the statements of purpose of their respective authors and in the content and format of the tests themselves. In addition, the results of correlational and factor analytic studies are cited as providing empirical evidence of inter-test similarities. The purpose of this review is to develop a matrix of test pairs which can be considered functionally and psychologically equivalent. This will provide a framework for the further assessment of technical equivalence and the identification of test pairs for equating.

The discussion is organized as follows. First the WISC-R, as anchor or reference test, is described. Subsequently, each of the other tests is examined for similarity of purpose and content to the WISC-R.

Wechsler Intelligence Scale for Children-Revised (WISC-R)

The WISC-R is a test of "global" intelligence which is based on the conceptualization of intelligence as a "multidimensional and multifaceted entity rather than an independent, uniquely defined trait" (Wechsler, 1974, p. 5). In keeping with this multidimensional notion, the test actually consists of a set of ten subtests, each internally homogeneous in content and each "considered necessary for the fuller appraisal of intelligence" (p. 9). The subtests, described in Table 1, are classified as measuring either verbal or performance skills and are combined to yield three summary IQ scores: (1) the Verbal IQ computed from the five verbal subtests; (2) the Performance IQ computed from the five performance subtests; and (3) the Full Scale IQ derived from all ten subtests.

The robustness of this verbal-performance dichotomy has been well substantiated by factor analytic studies (Kaufman, 1975; Reschly, 1978; Silverstein, 1977; Vance & Wallbrown, 1978) and is consistent with a conceptual duality of skills underlying intelligence which is reflected

Table 1

WISC-R Subtests and Skills Measured*

Verbal
  Information: acquired verbal information
  Similarities: verbal concept formation, logical thinking
  Arithmetic: reasoning ability, numerical accuracy in mental arithmetic
  Vocabulary: word knowledge
  Comprehension: social judgment, common sense

Performance
  Picture Completion: visual alertness, visual organization
  Picture Arrangement: visual sequential thinking, nonverbal reasoning
  Block Design: perceptual organization, spatial visualization, visual-motor coordination
  Object Assembly: perceptual organization
  Coding: visual-motor coordination, speed of mental operation, short-term memory

*Summarized from Sattler (1974, pp. 175-187)

in other measures (e.g., Cognitive Abilities Test, Kuhlmann-Anderson Intelligence Tests, Henmon-Nelson Tests of Mental Ability, California Test of Mental Maturity). Hierarchical factor solutions also support the presence of a relatively strong general (g) factor which is contributed to by all subtests (Wallbrown, 1979; Wallbrown, Blaha, Wallbrown, & Engin, 1975).

The WISC-R, like its predecessor the WISC, shares status among individual intelligence tests with the Stanford-Binet as one of the "stately, revered, and venerated devices against which all other tests [are] measured" (Bersoff, 1980, p. 113). Reviewers acclaimed the WISC-R as an "even more valuable tool than its highly praised and well-used predecessor" (Kricher, 1978, p. 355) and "the instrument of choice for the psychological assessment of children's intelligence" (Whitworth, 1978, p. 351). As pointed out in Chapter I, however, it is time-consuming to administer and its use is restricted to persons who have undergone extensive training in administration and interpretation. The combination of its accepted validity and its obvious cost have, since its inception, precipitated "a continual search for a test of mental ability which would produce comparable results" (Covin, 1977b, p. 259).

Slosson Intelligence Test (SIT)

The SIT has been a major contender for the role of short-test alternate to both the WISC-R and the Stanford-Binet, primarily because of its relationship to the latter.

Many of the items in the SIT were adapted directly from the Stanford-Binet and include, at least at the lower age levels, items which could be classified as "performance" tasks (e.g., drawing a circle). Like the Stanford-Binet, however, it is entirely verbal at the upper age levels (including all three ages represented in this study); that is, items are presented verbally and require spoken responses. Although this question-answer mode is uniform, the test content is diverse and "appears to cover a wide range of behavior relevant to successful school performance" (Lessler & Galinsky, 1971). Nicholson (1970) analyzed the functions of the SIT using Sattler's (1965) classification schema and identified items of the following types: language, numerical reasoning, conceptual thinking, social intelligence, memory, and visual-motor. Using a different classification approach, that proposed by Valett (1965), Stone (1975) categorized SIT items as measuring information and comprehension, vocabulary and verbal fluency, arithmetic reasoning, reasoning, memory, and visual motor. Thus the SIT may be considered to measure intelligence in a somewhat general, or varied, sense.

Slosson (1975) himself offered no specific clarification of theoretical intent, although he did refer to IQ (with implied generality) as a "'rough' measure of an individual's capacity to [. . .], judge and retain knowledge" (1975, p. 24). The stated purpose of the test is as an "individual screening instrument" (Slosson, 1975, p. iii) which is accessible to untrained testers and which offers the dual advantages of brevity and ease of administration. In several papers published by the Slosson Educational Publishing firm there are explicit recommendations for the use of the SIT as a substitute for one or both of the WISC and the Stanford-Binet. Poissant (1967) expressed the hope that the time-saving SIT would "replace the use of the Stanford-Binet" in retesting children enrolled in special education programs. She reported a correlation of .89 between the two tests for a small sample of slow-learning children. Armstrong and Jensen (1970) similarly suggested that the SIT could be used as a "substitute" (p. 2) for the Stanford-Binet for screening and retesting purposes. In a later study, Armstrong, Jensen, and Reynolds (1974) made similar claims for the relationship between the SIT and the WISC. They tested 198 elementary school students who had been referred for special class placement. Based on correlations of .93 and .90, respectively, they claimed "equivalence" of SIT IQs to both WISC Verbal and Full Scale IQs (p. 2).

Although previous studies involving the Stanford-Binet are not directly relevant to the present research, they were included to emphasize the fact that claims for the "equivalence" and "substitutability" (hence parallelism) of the short, easy-to-administer test for the longer, psychologist-administered tests have been made. At the same time, warnings against interchangeability have been issued in the Mental Measurements Yearbook (Himelstein, 1972, p. 766); in training-oriented textbooks (Salvia & Ysseldyke, 1978, p. 241; Sattler, 1974, p. 246); and in a series of comparative studies with the WISC-R (Covin, 1977a, 1977b; Lowrance & Anderson, 1979; Martin & Kidwell, 1977; Mize, Calloway, & Smith, 1979). Stewart and Jones (1976), in a ten-year review, summarized the correlational findings for the SIT and Wechsler scales as follows:

(a) the highest degree of relationship is between the SIT and the Wechsler Verbal Scale, and in most studies, there is virtually an equal degree of relationship between the SIT and the Wechsler Full Scale; (b) there is a lower and much more variable degree of relationship between the SIT and the Wechsler Performance Scale; and (c) with samples of subjects characterized by a wide range of IQ scores, the typical correlation falls in the low .80s. (pp. 374-375)

On the basis of the discussion of test content as being both verbally loaded and diverse and on the empirical evidence cited above, the SIT appears defensible as either a verbal or a general test of intelligence. Thus it can be considered psychologically equivalent to both the WISC-R Verbal and Full Scales.

Peabody Picture Vocabulary Test (PPVT)

The PPVT has also been considered as a possible substitute for the WISC-R (Covin, 1977b; Mize, Calloway, & Smith, 1979). Unlike the WISC-R, however, this test has both a single format and content. It requires the examinee to select from a set of four pictures the one picture which defines each stimulus word presented by the examiner. In the author's words, the PPVT was designed "to provide an estimate of a subject's verbal intelligence through measuring his hearing [receptive] vocabulary" (Dunn, 1965, p. 25). Dunn justified the choice of a vocabulary format for a test of intelligence on the basis of its proven construct validity (Terman & Merrill, 1937; Wechsler, 1949). He also clarified its specificity:

one must concede that the PPVT is not providing a comprehensive measure of intellectual functioning. Instead, by means of a short, restricted sample of behavior, it attempts to provide a useful prediction of school success, especially in the areas which call more heavily on verbal intelligence. (Dunn, 1965, p. 33)

At the same time, however, he laid claim to the comparability of PPVT IQs to WISC IQs: "In terms of comparability of scores, the PPVT and Wechsler I.Q. values appear to be very similar" (Dunn, 1965, pp. 33, 41).

Although it would appear somewhat contradictory that the PPVT, which is recognized as a verbal test of intelligence, should be considered in relation to the entire WISC-R, research studies typically report correlations between PPVT IQs and all three WISC-R IQs (Covin, 1977b; Mize, Callaway, & Smith, 1977; Richmond & Long, 1977; Vance, Lewis, & De Bell, 1979; Vance, Prichard, & Wallbrown, 1978). The results of these studies, presented in Table 2, unanimously show low correlations between PPVT and WISC-R Performance IQs.

Table 2

Pearson Product-Moment Correlation Coefficients for PPVT IQs and Three WISC-R IQs

PPVT with:             Mize et al. (1977)  Vance et al. (1979)  Vance et al. (1978)  Covin (1977b)  Richmond & Long (1977)
WISC-R Verbal IQ       .58                 .43                  .66                  .57            .77
WISC-R Performance IQ  .41                 .12                  .50                  .20            .32
WISC-R Full Scale IQ   .57                 .40                  .63                  .43            .65

The tendency for correlations with Full Scale IQs to approach the size of correlations with Verbal IQs is obvious and has been reported in reviews of studies with PPVT and WISC correlations as well. Dunn (1965) and Sattler (1975) cite median correlation coefficients of .67 and .66 respectively for PPVT and WISC Verbal IQs, and of .61 and .63 for PPVT and WISC Full Scale IQs. It may be this factor which encourages comparison of the PPVT to the entire WISC-R. All studies, however, reported highest correlations of PPVT IQs with WISC and WISC-R Verbal IQs. This, coupled with the obvious verbal content of the PPVT and the author's statements of intent, lends credence to the logical designation of the PPVT as a singularly verbal measure of intelligence.

Standard Progressive Matrices (SPM)

The case for categorization of the SPM is not as clear-cut as for either the SIT or PPVT. The test consists of 60 figural designs, or matrices, each with a missing portion. The task for the examinee is to select, from six or eight given alternatives, the piece which correctly completes the pattern. Face validity as well as the absence of explicit verbal content have resulted in the consideration of the SPM as a nonverbal, or performance-type, test. As a result, the test has enjoyed some popularity as a culturally-unbiased and language-free measure of intellectual ability (Anastasi, 1976; Elley & MacArthur, 1962). Results from correlational studies relating scores on Progressive Matrices scales to Wechsler Verbal and Performance scores are, however, equivocal. Birkmeyer (1964, 1965) and Burke and Bingham (1969) found higher correlations between Matrices and Wechsler Performance scores than between Matrices and Wechsler Verbal scores for children and adults. Others (Barratt, 1956; Martin & Wiechers, 1954; Mehrotra, 1968; Stacey & Carleton, 1955), though, failed to find a stronger relationship with one scale than with the other. In fact, in all of these latter studies, the highest correlations were reported between Matrices and Wechsler Full Scale scores. These findings cannot be consistently accounted for by greater variability in the Full Scale scores.

Claims that the Progressive Matrices tests measure intelligence in a more "general" than specific sense have been postulated since its introduction. As summarized in Burke's (1958) review, Spearman himself regarded the 1938 Progressive Matrices as the "best of all" nonverbal test of g (cited in Burke, 1958, p. 210). Vernon and Parry also considered it "an almost perfect g test" (cited in Burke, 1958, p. 210). More recently, Bock (1973) postulated semantic components in the Progressive Matrices tests. This notion has been borne out in factor analytic studies which show high loadings on a verbal factor (Wiedl & Carlson, 1976) or on a single general factor which is itself highly verbal (Burke & Bingham, 1969).

Raven considered the SPM a test of "observation and clear thinking" which by itself would be mistakenly described as a measure of general intelligence (1960, p. 2). Despite his warning, and as indicated by many of the studies reported above, the SPM has become popular in North America for this very purpose. It has additionally, like the SIT and PPVT, been singled out as a candidate for WISC substitute or alternate (Martin & Wiechers, 1954; Mehrotra, 1968).

Despite the fact that some degree of relationship has been proposed between the SPM and each of the three WISC-R scales, the strongest evidence remains in support of the SPM as a nonverbal measure. Certainly item content, response format, and author's intent suggest this. Thus consideration of the SPM as psychologically equivalent to WISC-R Performance is proposed.

Mill Hill Vocabulary Scale (MHVS)

The MHVS, Oral Definitions Form, consists of two parallel sets (A and B) of 44 words which are defined orally by the examinee. This test was constructed by Raven for combined use with the SPM "in place of a single test of 'general intelligence'" (Raven, 1960, p. 3), arguing that:

Non-verbal tests are often misleadingly described as tests of intelligence when they sample only certain aspects of intellectual functioning. The underlying reason for the creation of vocabulary scales to be used together with RPM [Raven's Progressive Matrices] is that the verbal and non-verbal tests are complementary. While RPM is primarily a test of eduction, vocabulary tests represent measures of reproduction, or measures of acquired information. (Raven, Court, & Raven, 1976, p. G14)

In effect, then, Raven proposed a dichotomous measurement instrument having a verbal and a nonverbal component.

The obvious absence of references to the MHVS in testing texts (cf. Anastasi, 1976; Cronbach, 1970; Sattler, 1974) and in the literature attests to the fact that the scale is little known and seldom used in North America. Consequently, it was introduced in the present study to explore its validity as a verbal measure of intelligence.

Summary

A summary of the content and factorial structures of the five tests is presented in Table 3. Using the three WISC-R scales (Verbal, Performance, and Full Scale) for reference, tests which are psychologically similar to these scales can be identified on the basis of their content, author's statement of purpose, and/or the results from empirical validity studies. These are indicated by the appearance of a C, A, or V, respectively, in Table 4. The coincidence of two or more of these

Table 3

Content and Structure of the Tests

WISC-R
  Verbal: Yields Verbal IQ score, supported by factor analytic studies
  Nonverbal: Yields Performance IQ score, supported by factor analytic studies
  General: Yields Full Scale IQ score, which combines Verbal and Performance; constitutes a measure of g

SIT
  Verbal: Verbal format. Similar to S-B, which has a high verbal factor
  General: Item diversity suggests a more general measure

PPVT
  Verbal: Strictly verbal

SPM
  Verbal: Some suggestion of verbal components
  Nonverbal: Overtly nonverbal. Considered as such by Raven
  General: Considered a measure of g by some

MHVS
  Verbal: Strictly verbal

Table 4

Psychologically and Functionally Equivalent Test Pairs as Suggested by their Content (C), Authors' Claims (A), Empirical Validity (V), and Practitioners' Use (P)

            WISC-R Verbal   WISC-R Performance   WISC-R Full Scale
SIT
PPVT
SPM
MHVS
SPM & MHVS

similarity factors suggests strong evidence for the psychological equiv•

alence of test pairs. This is visually represented by the solid line

enclosures in the body of the table. Test pairs having weaker evidence

of psychological equivalence are illustrated with a broken-line enclosure.

Functionally equivalent test pairs are also represented in Table 4

by the letter P. It is obvious from this illustration that evidence

for the psychological equivalence of the test pairs considered in this

study does not generally substantiate the functional equivalence which

is implied in their common use. The research question posed in Chapter

I concerns the further search for empirical corroboration of this equivalence. In the following section, the criteria for determining technical

equivalence and the methods for deriving inter-test score conversions

are presented.

Inter-Test Score Conversions

For purposes of clarity, the distinction between equivalence and

comparability made in Chapter I will be maintained throughout this dissertation. However, since this use of terminology is not consistently

adhered to in the reference sources, a further note of clarification is warranted. Equating and comparing are referred to as types of conversion, where a conversion is an empirical transformation of the score scale of

one test so that it is related to the score scale of another test in some

defined manner. When this transformation yields an equivalency relationship, the score values on the two tests have identical meaning in terms

of the trait being measured. Thus, equal score values represent a quantification of the same amount of the same trait irrespective of the test

used. When a comparable relationship is established between two tests,

the same numerical score values represent the same rank position within a specified group. This is a scale relationship only and carries no connotation regarding equality in the amount of the trait or traits being measured.

A clear example of the difference between the two, and one which exists within the context of this study, is as follows. The PPVT has two parallel forms: Form A and Form B. As such, it does not matter which form is used since they yield equivalent measures of verbal intelligence through receptive vocabulary. An individual would be expected to score similarly on both test forms. The WISC-R subtests, on the other hand, are comparable tests. Each subtest has unique content and is scaled from 1 to 19 with mean

10 and standard deviation 3. Thus a scaled score of 12 has a constant meaning in terms of rank across all subtests (that is, a percentile rank of 75). However, an individual's score of 11 on a given subtest cannot be considered predictive of her score on any other subtest.
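The percentile rank quoted above can be checked under a normality assumption; the following sketch is my own illustration (the normal-distribution assumption is mine, not a claim of the thesis):

```python
# Percentile rank of a scaled score on the WISC-R subtest metric
# (mean 10, SD 3), assuming a normal distribution of scaled scores.
from scipy.stats import norm

mean, sd = 10.0, 3.0
scaled_score = 12
percentile_rank = norm.cdf((scaled_score - mean) / sd) * 100
print(round(percentile_rank))  # ~75, matching the rank cited in the text
```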

Comparability can be seen as the generic relationship, which accounts for much of the intermingling of the terms in the literature. A common definition of comparability is that the distributions of scores on two tests have been made identical for a given group (Angoff, 1971; Lord,

1950; Flanagan, 1951). This applies to equivalent as well as to comparable tests (although, as will be discussed in the next section, the restricted reference group notion is not necessary for equivalent forms).

Thus equivalent scores share the characteristics of comparable scores.

Equivalence, however, is a specific case of comparability which is limited by the additional requirement of parallelism of the two tests.

Equating

Parallel Forms

Angoff (1971) emphasizes that strict adherence to the concept of equating implies two restrictions. The first of these is that the two

instruments in question be measures of the same characteristic (p. 563);

that is, they are parallel forms of the same test. The second is actually

a corollary of the first and requires that "the transformation be independent of the groups of individuals used to develop the conversion" (p.

563). Thus equating applies to parallel test forms and yields unique

score transformations which are subsequently generalizable to any and

all groups with which the tests may be used.

The most rigorous definition of parallelism comes from classical

test theory:

Distinct measurements $X_{ga}$ and $X_{g'a}$ are called parallel measurements if, for every subject $a \in P$, $\tau_{ga} = \tau_{g'a}$ and $\sigma(E_{ga}) = \sigma(E_{g'a})$.

(Lord & Novick, 1968, p. 48)

This definition states that measurement $X_{ga}$ for person $a$ on test $g$ and measurement $X_{g'a}$ for person $a$ on test $g'$ are parallel if, for every member $a$ of the population $P$, the true score on $g$, $\tau_{ga}$, equals the true score on $g'$, $\tau_{g'a}$, and the standard error of measurement on $g$ equals the standard

error of measurement on g'. Lord and Novick add:

Thus parallel measurements measure exactly the same thing in the same scale and, in a sense, measure it equally well for all persons. (p. 48)

Gulliksen (1950) provided a semantic definition for parallelism:

from a common sense point of view, it may be said that two tests are "parallel" when "it makes no difference which test you use." It is certainly clear that, if for some reason one test is better than the other for certain purposes, it does make a difference which test is used, and the tests could not be termed parallel. (p. 11)

More specifically, he outlines two types of criteria for parallel tests.

One of these he terms statistical and summarizes as follows: "In addition to equal means, variances, and reliabilities, parallel tests should have approximately equal validities for any given criterion" (p. 173). The other he refers to as the psychological criterion. This requires that

"the tests should contain items dealing with the same subject matter, items of the same format, etc. In other words, the tests should be par• allel as far as psychological judgment is concerned" (pp. 173-174).

Nominally Parallel Tests

Marco, Petersen, and Stewart (1979) qualify the strict psychometric requirements for equating as "ideal." They suggest that these may be unrealistic in actual testing practice and that "scores must sometimes be equated under less-than-optimum conditions" (p. 1). A common adaptation of parallel-forms test equating is with tests which, although they were not specifically constructed as parallel, are used for the same purpose. In such situations the criteria for parallelism are relaxed to

some extent. Wesman (1958) says:

Assumptions we can make with regard to content and reliability of parallel forms of one test are not readily acceptable when we are dealing with two somewhat different tests of the same general ability. This situation is one in which the problem of equivalence frequently arises. (p. 8)

He suggests that in determining the degree of equivalence of scores in

these circumstances, the size of the correlation coefficient is of prime

importance.

The Anchor Test Study represents this type of application of equating

methodology. According to Jaeger (1973), "The major question to be explored

was the technical feasibility of equating tests that were not designed

to be parallel" (p. 3). In that study, the psychological criteria were evaluated by a panel of experts. They judged the tests to require many of the same skills and abilities but with less than perfect overlap (Jaeger

1973). The statistical criterion was interpreted solely in terms of

inter-test correlation coefficients. When corrected for attenuation,

they were found to be uniformly high. "Of the 207 corrected intercorre-

lations obtained in the Anchor Test Study data . . . only'.20 might be

considered significantly below .95; and 20 were above .89" (Loret et

al., 1974, p. 8).
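The correction for attenuation applied to those intercorrelations is the classical Spearman formula; a brief sketch follows (my illustration; the numerical values are hypothetical, not Anchor Test Study figures):

```python
# Spearman's correction for attenuation: estimate the correlation between
# true scores from the observed correlation and the two reliabilities.
import math

def disattenuate(r_xy: float, r_xx: float, r_yy: float) -> float:
    return r_xy / math.sqrt(r_xx * r_yy)

# Hypothetical values for illustration only:
print(round(disattenuate(r_xy=0.85, r_xx=0.90, r_yy=0.88), 3))  # 0.955
```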

Lord and Novick's (1968) notion of nominally parallel tests formalizes the relaxation of the statistical restrictions imposed on strictly parallel tests. They suggest that the term "does not imply any mathematically defined degree of parallelism in any particular respect" (p. 174). Nominally parallel tests are, rather, defined in terms of functional and psychological criteria. They are considered a "family" of different measurements, each of which measures "important aspects or manifestations of the psychological variable under study" (p. 174). In an

earlier paper, Lord (1964) defined as nominally parallel "any set of test

forms that are used interchangeably in common practice, even though they

are not strictly parallel statistically" (p. 335). A person's true score

is then defined as the expectation of observed scores, or of a weighted

expectation of observed scores, .taken over all nominally parallel tests.

Lord and Novick present this as the generic true score, symbolized by $\zeta_a$ and expressed as:

$$\zeta_a = E_g\,E_k\,Y_{gak} \qquad \text{(p. 174)}$$

where

a is an examinee,

g is a given test, and

k indicates repeated measurements for person a across the population

of tests nominally parallel to g

Thus the generic true score "represents the standing of the examinee with respect to the average of the traits measured by all the G test forms"

(Lord & Novick, 1968, p. 164).

Generic true-score theory differs from classical theory in that there are no assumptions imposed concerning the unbiasedness or non-correlation of errors of measurement. Lord and Novick claim that while this complicates generic true-score theory, it also makes it "applicable in

situations that have troubled psychometricians" (p. 184)—situations in which those assumptions are known to be violated.

Marks and Lindsay (1972), in an empirical examination of test equating under less-than-optimum conditions, endorsed the notion of nominally parallel test forms as "quite obviously the one most frequently encountered in testing practice" (p. 46). They note, however, that the price

of attempting to implement 'statistical relaxations' is the absence of

definitive guidelines regarding how much latitude can be tolerated within

the equating model. As a result, test equators are forced to rely on

admittedly arbitrary choices based on individual judgment.

Marks and Lindsay (1972) attempted to clarify this problem by exploring the effects of four parameters—sample size, test length, test reliability, and inter-test correlation—on the accuracy of test equating. Using

a Monte Carlo procedure, they produced simulated score distributions for

specified values of the four parameters. Then, using a regression approach, they compared the relative sizes of the errors of estimate associated with estimating the true score distributions of one test form from the observed score distributions of the other test form. (It will be noted in the next section that the regression method is a non-standard approach to equating. Marks and Lindsay point out, however, that the use of identically-distributed population densities in their study would yield results quite similar to other linear and curvilinear equating methods.)

The results of the study indicated that the most important factors were sample size and the estimated association between true score distributions. Examination of the interaction between true score correlations of .80, .90, and 1.0 and sample sizes of 100, 250, and 500 suggested that with larger sample sizes (n > 500) increasing test form dissimilarity had relatively little effect. Graphic representation of the interaction of these variables indicated sharply increasing error with lower correlations. The authors conclude:

In practice and where the investigator feels uncomfortable about the similarity of two or more test forms in terms of the dimensions measured or their length, very large samples of observations are indicated. Conversely, small sample sizes, say 250 observations or less, should be discouraged under a nominally parallel definition of test equating. (Marks & Lindsay, 1972, p. 55)
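A compact simulation in the spirit of that study can illustrate the interaction they describe; the design below (the reliability value, the sample sizes, and the sorted-score error criterion) is my own simplification, not Marks and Lindsay's actual procedure:

```python
# A Monte Carlo sketch: generate nominally parallel forms whose true scores
# correlate rho, linear-equate one onto the other, and track the error.
import numpy as np

rng = np.random.default_rng(0)

def equating_error(n, rho, reliability=0.90, reps=200):
    err_sd = np.sqrt(1.0 / reliability - 1.0)  # error SD implied by reliability
    errs = []
    for _ in range(reps):
        true = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
        x = true[:, 0] + rng.normal(0, err_sd, n)   # observed scores, form X
        y = true[:, 1] + rng.normal(0, err_sd, n)   # observed scores, form Y
        y_eq = x.mean() + x.std(ddof=1) / y.std(ddof=1) * (y - y.mean())
        # discrepancy between the equated and target score distributions
        errs.append(np.mean((np.sort(y_eq) - np.sort(x)) ** 2))
    return np.mean(errs)

for n in (100, 250, 500):
    print(n, [round(float(equating_error(n, rho)), 4) for rho in (0.80, 0.90, 1.00)])
```

As in the study, the error grows sharply as the true-score correlation drops, and large samples soften but do not remove the effect.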

Equating Methods

Flanagan (1951) presented an historical review of test equating technology. One of the earliest procedures was to calculate the means for the raw score distributions on two tests. Any given score on one test was then "equated" to the corresponding score on the other test by adding or subtracting a constant equal to the difference between the means.

Flanagan (p. 751) points out the shortcomings of this approach since it assumes uniformity of the difference between the score distributions beyond the first moment.

A second method involved the use of regression procedures to compute the "best estimate" of a score on one test form given the score on the second test form. Thorndike (1922) suggests, however, that this method is satisfactory as an equating approach only when the two tests are perfectly parallel and perfectly reliable, so that there is, in effect, no regression.

When tests correlate imperfectly, the resulting regression line will not be unique for all groups, nor will it provide invertible score conversions.

The former problem violates the requirement for uniqueness (Angoff, 1971, p. 563). The second violates a logical conception of the equivalency relationship, namely that the conversion is fixed and that the resulting table of equivalent score values can be read in either direction (cf.

Marco et al., 1979).

Both the linear and equipercentile methods bear the "tried and true"

stamp of decades of representation in equating literature (Jaeger, 1980).

As stated in Chapter I, these were the two methods applied in the present

study. The equipercentile method derives from a basic definition of

the equivalency relationship initially suggested by Lord (1950) and Flanagan (1951) and defined by Angoff (1971) as follows:

Two scores, one on Form X and the other on Form Y (where X and Y measure the same function with the same degree of reliability), may be considered equivalent if their corresponding percentile ranks in any given group are equal. (p. 563)

Thus for two cumulative percentage distributions $P_X$ and $P_Y$, the $j$th score value on test Y ($Y_j$) is equivalent to the $k$th score value on test X ($X_k$) when $P_{Y_j} = P_{X_k}$. Traditionally, this has been accomplished using a hand-graphing technique (Angoff, 1971; Flanagan, 1951). This requires, first, the computation of percentile ranks for the raw score distributions on each test. These are then plotted and smoothed on arithmetic probability paper. Next, for both tests, raw scores corresponding to selected percentiles are read from the smoothed ogives and plotted against each other on arithmetic graph paper. Finally, a smoothed line is drawn connecting these points and is extended to cover the full range of possible test scores. This final curve records the conversion of scores from test

X to test Y.

Problems associated with this procedure are that it is tedious, non-analytical, and vulnerable to errors in hand-smoothing, particularly in the extremes of the score ranges where data are scant and/or erratic.

In an attempt to address these difficulties and to provide an analytic and verifiable approach to equipercentile equating, Lindsay and Prichard

(1971) developed a computer program which both simulated the manual procedure and provided functional equations for predicting equated scores on one test from the other.

From the definition on the previous page, it can be seen that equipercentile equating involves the determination of pairs of raw scores that cut off equal proportions of the two distributions. This requires the estimation of "missing" raw score points in one distribution to correspond to every cumulative percentage point in the other distribution and vice versa. Lindsay and Prichard assumed that "the best estimate of a 'missing' score point on one distribution lies on a straight line connecting two adjacent CPs [cumulative percentages] and their associated score points" (p. 204). Their procedure was derived as follows:

For the test score distributions X and Y, the raw scores $X_i$ and $Y_i$ and their associated cumulative percentages $P_i$ can be represented as ordered pairs $(X_i, P_i)$ and $(Y_i, P_i)$. Given that an equivalency relationship between $X_k$ and $Y_j$ requires $P_{X_k} = P_{Y_j}$, the estimation problem is to find the $X_n$ associated with the $P_n$ on distribution Y when $P_n$ does not correspond to any $X_i$. Then, from the assumed linear relationship,

$$X_n = X_1 + \frac{P_n - P_1}{P_2 - P_1}\,(X_2 - X_1) \qquad \text{(Lindsay \& Prichard, 1971, p. 204)}$$

where $(X_1, P_1)$ and $(X_2, P_2)$ are from distribution X, and $P_n \neq P_1, P_2$. The method, with appropriate substitution, is similarly used to solve for $Y_n$.

This step of the computer program produces two distributions having an equal number of raw score points and an identical set of cumulative percentage values in each. This corresponds to the production of the first graph described in the manual procedure. The next step involves the generation of linear or polynomial regression equations for fitting the two distributions, and corresponds to the second graph in the manual procedure. The particular advantage of the second step is that it facilitates accurate extrapolation beyond the obtained data points.
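The interpolation logic lends itself to a brief sketch; the following is my own illustration of the same idea (the function names and the sample distributions are assumptions, not Lindsay and Prichard's program):

```python
# Equipercentile equating by linear interpolation between adjacent
# cumulative percentages, in the spirit of the procedure described above.
import numpy as np

def cumulative_percents(scores, points):
    scores = np.sort(scores)
    return np.array([100.0 * np.mean(scores <= p) for p in points])

def equipercentile(y_scores, x_scores):
    """For each raw score on test Y, find the X score cutting off the
    same cumulative percentage, interpolating linearly where needed."""
    y_points, x_points = np.unique(y_scores), np.unique(x_scores)
    p_y = cumulative_percents(y_scores, y_points)
    p_x = cumulative_percents(x_scores, x_points)
    x_equiv = np.interp(p_y, p_x, x_points)   # estimate "missing" points
    return list(zip(y_points.tolist(), x_equiv.tolist()))

rng = np.random.default_rng(1)
y = rng.binomial(40, 0.55, size=500)          # invented raw scores, test Y
x = rng.binomial(50, 0.45, size=500)          # invented raw scores, test X
for y_raw, x_eq in equipercentile(y, x)[:5]:
    print(y_raw, round(x_eq, 2))
```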

The equipercentile equating method can be applied to any pair of parallel measures. When the raw score distributions are dissimilar, the procedure adjusts the shape of the distribution of scores on one test form to correspond to that of the other test form. This provides a curvilinear transformation of scores which serves to compensate for unintentional variations in difficulty level between two tests and yields, as the definition of equivalency states, the same converted score regardless of the test form taken.

If the two raw score distributions are of the same shape, differing in none of their moments beyond the second, the same results can be achieved by a linear transformation which adjusts only the origin and the unit of measurement. The definition for linear equating states that scores on two tests are equivalent if they correspond to equal standard-score deviates,

$$\frac{Y - M_Y}{S_Y} = \frac{X - M_X}{S_X} \qquad \text{(Angoff, 1971, p. 564)}$$

where $M_Y$ and $M_X$ are the means and $S_Y$ and $S_X$ the standard deviations for test forms Y and X. The terms can be rearranged so that the equation takes the familiar linear transformation form, $Y = aX + b$, where $a = S_Y/S_X$ and $b = M_Y - aM_X$. In the present study, Lord's (1950) notation was adopted. The linear equation is shown as:

$$Y^* = \frac{S_X}{S_Y}\,Y + \bar{X} - \frac{S_X}{S_Y}\,\bar{Y} \qquad \text{(Lord, 1950, p. 5)}$$

where $Y^*$ represents the Y score converted to the scale of X, and the means for test forms X and Y are represented by $\bar{X}$ and $\bar{Y}$ respectively.

Jaeger (1980) pointed out that with imperfectly parallel tests, the linear and equipercentile methods will produce different results and that these differences will be most noticeable in the tails of the score

distributions. This effect was reported for the Anchor Test Study.

The moments (the mean, variance, skewness, and kurtosis) of the score distributions for the [eight] tests differed substantially. Thus, a linear transformation based only on the mean and variance of each distribution was generally inadequate, often yielding equivalent scores which fell outside the corresponding raw score range of the tests. (Bianchini & Loret, 1974, pp. 187-188)

Based on the implausibility of some of the linear equating results, this method was rejected in favor of the curvilinear alternative.

The Anchor Test Study was admittedly a pioneering and exploratory endeavor with regard to both equating imperfectly parallel tests and comparing linear and equipercentile methods. As mentioned in the preceding paragraph, the results of that comparison were evaluated on a logical rather than an empirical basis. In an effort to provide more stringent tools for such comparisons, Jaeger (1980) explored a number of statistical

indices for judging the adequacy of linearity for tests having somewhat non-identical score distributions. He suggested that the crux of the practical problem was that methodologists fail to provide specific guidelines for discriminating "between situations where the linear equating method adequately adjusts for differences between the score distributions of two approximately parallel test forms, and situations where a more complex model is needed" (p. 4). The two major findings of his study were (1) that a linear relationship between tests can best be ensured in the developmental stage by balancing the item difficulty distributions between tests (that is, by making the tests as closely parallel as possible),

and (2) that there are, as well, statistical means for determining which method should be used with existing tests.

The first of Jaeger's indices examines the similarity of the cumulative score distributions for the two tests after one has been linear-

equated to the other. If they are significantly different on the

Kolmogorov-Smirnov two-sample test, the need for equipercentile equating

is indicated. A second approach checks for linearity of the results

of the equipercentile method using linear and second- and third-order polynomial regression analyses and examining the increments in $R^2$. When $R^2$ fails to increase significantly in the higher order analyses, a linear relationship is accepted. The final approach involves a direct comparison of results after both methods have been used. Jaeger created a difference score (equipercentile equated score minus linear equated score) for each raw score value on the converted test. He then assumed that if the raw score distributions were similar enough for the relationship to be linear, these difference scores "should not be systematically predictable from any function of corresponding raw scores" (p. 14). He again used regression analyses to check this assumption.
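The first two of these checks translate directly into standard statistical routines; a sketch under my own implementation assumptions (Jaeger's actual computations may have differed):

```python
# Index 1: Kolmogorov-Smirnov two-sample test comparing X scores with
# linear-equated Y scores; a significant difference signals the need for
# equipercentile equating. Index 2: R-squared increments for polynomial
# fits of the equipercentile results; no gain beyond degree 1 supports
# accepting a linear relationship.
import numpy as np
from scipy.stats import ks_2samp

def ks_flags_nonlinearity(x_scores, y_equated, alpha=0.05):
    return ks_2samp(x_scores, y_equated).pvalue < alpha

def r_squared_by_degree(raw, equated, degrees=(1, 2, 3)):
    out = {}
    for d in degrees:
        fitted = np.polyval(np.polyfit(raw, equated, d), raw)
        ss_res = np.sum((equated - fitted) ** 2)
        ss_tot = np.sum((equated - np.mean(equated)) ** 2)
        out[d] = 1.0 - ss_res / ss_tot
    return out
```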

Unequally Reliable Tests

Although it was not emphasized in the preceding discussion, definitions of conversion relationships imply true score components (Flanagan,

1951; Lord, 1950). When tests are equally reliable, observed scores can be used since the relationship between true and observed scores will be approximately the same for both forms. With unequally reliable tests, however, the error components of the two tests are different, resulting

in an unstable relationship between observed scores. In this case, conversions must be based on true rather than observed scores (Angoff, 1971;

Flanagan, 1951, 1964; Lord, 1950).

Angoff (1971) provided a linear conversion equation for unequally reliable tests which is similar to that for linear equating, with the exception that the variance components are derived from estimated distributions of true scores. The mathematical relationship is represented by

$$\frac{Y - M_Y}{S_Y\sqrt{r_{YY}}} = \frac{X - M_X}{S_X\sqrt{r_{XX}}} \qquad \text{(Angoff, 1971, p. 571)}$$

where $M_Y$ and $M_X$ are the means, $S_Y$ and $S_X$ the standard deviations, and $r_{YY}$ and $r_{XX}$ the reliabilities for tests Y and X. Using Lord's (1950) notation

again, the computational formula becomes

$$Y^* = \frac{S_X\sqrt{r_{XX}}}{S_Y\sqrt{r_{YY}}}\,(Y - \bar{Y}) + \bar{X}$$

where the terms are defined as they were in the linear equating formula.
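A sketch of this true-score conversion (my illustration; all numerical values are hypothetical):

```python
# True-score linear conversion for unequally reliable tests: the observed
# standard deviations are replaced by estimated true-score standard
# deviations, S * sqrt(reliability).
import math

def true_score_convert(y, x_mean, y_mean, s_x, s_y, r_xx, r_yy):
    slope = (s_x * math.sqrt(r_xx)) / (s_y * math.sqrt(r_yy))
    return slope * (y - y_mean) + x_mean

# Hypothetical means, SDs, and reliabilities:
print(round(true_score_convert(60, x_mean=50, y_mean=55,
                               s_x=10, s_y=12, r_xx=0.95, r_yy=0.85), 2))
```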

The inclusion of a discussion of unequally reliable tests under the general heading of "equating" is somewhat misleading and requires some qualification. In their discussion of nominally parallel tests,

Lord and Novick (1968) point out the unfairness to some examinees of the 'equivalent' use of tests having unequal reliabilities. When this occurs, a good examinee is put at a disadvantage since it is easier for her to obtain a higher score on a reliable than an unreliable test. This situation can be seen to violate a basic premise of equivalency, namely, that it is a matter of indifference to each examinee which test she takes (cf.

Gulliksen, 1950; Lord, 1977; Marco et al., 1979; Wesman, 1958). Since unequally reliable tests cannot be considered interchangeable, they cannot therefore be equated in any meaningful way (Angoff, 1971; Lord & Novick,

1968).

Angoff (1971), however, in his compendium of equating methods and designs, chose to include the linear conversion procedure using true scores while at the same time emphasizing the non-interchangeability of the tests.

These restrictions make the distinctions among equating procedures both conceptually awkward and difficult to interpret. Linear true score equat• ing was included in the present study, however, because of repeated refer• ences to the use of true scores in the literature (e.g., Flanagan, 1950,

1964) and to round out the methodological examination of equating procedures. 39

Comparing

Nonparallel Tests

Angoff's definition of comparing cited in Chapter I (see p. 7) outlines the oppositional features between that type of conversion and equating. In fact, tests to be compared need not satisfy any of the restrictive criteria for equating. The price of this apparent latitude, however, is that comparable scores are limited in generality. The effect of differences in content for nonparallel tests is that different relationships between the tests will be established for different reference groups

(Angoff, 1964, 1966, 1971; Flanagan, 1964; Lindquist, 1964; Lord, 1950).

In addition, the adoption of different definitions of comparability will

result in different sets of comparable values (Angoff, 1964, 1966, 1971).

For example, a definition of comparability based on equal standard-score

deviates would yield a different conversion than a definition of comparability based on equal percentiles. In contrast to this, all methods

or definitions of equivalency for strictly parallel tests will lead to

the same conversion (Angoff, 1966).

The definition of comparability offered earlier in this chapter

required that the score distribution on one test be made identical to

the score distribution on a second test. The fundamental definition

cited by Angoff (1971) was

that certain agreed-upon moments of the distribution of scores on the two tests be identical with respect to a particular (single) group of examinees. (p. 591)

Angoff suggested further that the "agreed-upon" moments frequently include

only the mean and standard deviation.

Probably the most common procedure of defining comparability is simply to administer the two or more tests (frequently a battery of tests) to a common basic reference group and to scale the tests in such a way that the mean and standard deviation have the same numerical values, respectively, on each of the various tests. (pp. 590-591)

The latter definition reflects a principal application of comparable score scales: to provide the means for profile analysis across a battery of tests of different functions. Many educational and psychological tests are scaled in this manner (e.g., Illinois Test of Psycholinguistic

Abilities, Keymath, McCarthy Scales of Children's Abilities, WISC-R).

A second application involves same-purpose but nonparallel tests.

As suggested in Chapter I, comparability among such tests is frequently assumed when tests have the same number scale. It is now obvious that

the lack of uniformity in standardization procedures makes this assumption untenable.

Millman and Lindlof (1964) report a study which emphasizes this

fact. They administered the vocabulary subtests of three widely used achievement batteries (the California Achievement Tests, the Iowa Tests of Basic Skills and the Metropolitan Achievement Tests) to each of 204

fifth-grade students. Using the equipercentile method, they developed a table of comparable percentile ranks for the tests. Differences among

these rankings were cited as evidence of "the risk involved in comparing

derived scores from different tests standardized on different populations,

as is frequently done when a measure of growth in achievement or a comparison of aptitude and achievement is made" (p. 136).

Lennon's (1966) comparison of three intelligence tests yielded

similar results. He examined the equivalence among scores and IQs derived

from "three widely used mental ability measures for secondary school pupils:

Terman-McNemar Test of Mental Ability, Otis Quick-Scoring Mental Ability

Tests: Gamma Test, and Pintner General Ability Tests: Verbal Series,

Advanced Test" (p. 198). The equating was based on distributions of

scores for three groups of students, one group per test, where the groups had been closely matched with respect to age, grade, and achievement test

scores. A graphic equipercentile procedure was used to produce equival•

ency tables among the1 three tests. The adequacy of the results was judged by examination of these tables rather than according to any statistical

criterion. Lennon pointed out from this that although the differences

among IQs were not great, they were, in some cases, large enough to cause

quite different decisions to be made about an individual depending on

the test used. His conclusions include the necessity of referencing

any IQ to its specific test source since score consistency across tests

cannot be assumed. Consequently, where inter-test comparisons are desired,

these must be based on empirically-determined converted scores rather

than on existing scores from different tests.

There are two issues regarding inter-test score conversions for

nonparallel tests which have received considerable attention: 1) the

technical problems in achieving precision in converted scores, and 2) the

potential for misinterpretation of the results (Angoff, 1964, 1966, 1971;

Flanagan, 1964; Lennon, 1964; Lindquist, 1964). The technical problems

arise from the restrictions stated earlier, namely that such conversions

are applicable only to the specific conditions under which they are derived.

As a result, the establishment of an appropriate reference group becomes

of prime importance. Angoff (1964, 1966, 1971) offers three possibilities: national norms groups, local norms groups, and differentiated norms

groups. The use of a fourth type of group on which to base tables of

comparable scores—that is, on a "happenstance" group of persons for whom scores are available on both tests—has been strongly criticized (Angoff,

1964, 1966, 1971; Lindquist, 1964). Angoff suggested that when tests are quite similar in function, "rough" conversions can be based on the scores from their respective national norms samples. Inevitable differences in the selection of these norms groups, however, violate the premises for comparable scores. The Millman and Lindlof (1964) and Lennon (1966) studies discussed earlier illustrate the inadequacy of this approach.

The most precise and defensible method is to base comparable scores on differentiated norms groups. "The procedure will yield a number of con• version tables, each based on, and appropriate for, a different norms group. . . . The user will be forced to choose the appropriate table with care, keeping in mind the group for which he intends to use it and the purpose for which it is to be applied" (Angoff, 1971, p. 596). An example of differentiated norms group conversions would be the provision of differ• ent tables for each age level covered by the tests. Angoff also endorsed the local norms approach to comparable scores provided appropriate cautions are taken in the selection of the group and in the application of the results.

Problems of misinterpretation of comparable scores arise from their potential confusion with equivalent scores. All four participants—Angoff,

Flanagan, Lennon, and Lindquist—in a symposium entitled "The Equating of

Non-Parallel Test Scores" presented at the 1964 Annual Meeting of the

National Council on Measurement in Education, addressed this concern.

Of the four, Lindquist alone totally rejected such conversions:

No matter what precautions were taken to guard against misuses and misinterpretations, the total effect would probably be increased misuses of the test results, or a more widespread failure to use the test results in the best way possible. (Lindquist, 1964, p. 5)

The other three emphasized that with careful selection and definition of the reference group, and with the use of appropriate conversion proce• dures, satisfactory and useful comparable scores could be provided (Angoff,

1964; Flanagan, 1964; Lennon, 1964).

In summary, while it is recognized that comparable score conversions require careful interpretation guidelines, it is also recognized that, when properly constructed, such conversions can provide valuable information to test users. Angoff (1968, 1971) implied a justification of empirical comparability on the grounds of psychometric self-defense.

He claimed that comparisons are made anyway, and that, therefore,

it behooves us to construct a system which, while under the circumstances it cannot possibly be wholly satisfactory, will at least represent some improvement in the status quo and will avoid some of its obvious imperfections. (Angoff, 1968, p. 13)

Methods

Both linear and equipercentile conversion methods can be used to derive comparable scores between tests (Angoff, 1971). Again, as with equating, the use of a linear conversion is contingent on the shapes of the raw score distributions of the tests. Thus the equations defining the basic linear and equipercentile relationships presented earlier for equating are applicable to comparing. When, however, the tests for which comparable scores are to be derived have been administered to a common group of examinees, the simplest procedure for establishing comparability is to scale the tests so that they have equal means and standard deviations (Angoff, 1971). This latter procedure was adopted in the present study.
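That common-group procedure amounts to a simple standardization; a sketch follows (my illustration; the target metric of 100/15 and the raw-score distributions are assumptions for concreteness):

```python
# Scale each test so the common group's mean and standard deviation take
# the same target values, yielding comparable score scales.
import numpy as np

def scale_to(scores, target_mean=100.0, target_sd=15.0):
    z = (scores - np.mean(scores)) / np.std(scores, ddof=1)
    return target_mean + target_sd * z

rng = np.random.default_rng(3)
test_a = rng.normal(34, 6, 340)    # invented raw scores for two tests
test_b = rng.normal(81, 20, 340)   # given to the same group of examinees
for t in (scale_to(test_a), scale_to(test_b)):
    print(round(float(t.mean()), 1), round(float(t.std(ddof=1)), 1))  # 100.0 15.0
```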

Summary

A summary of the indices of psychological equivalence appeared in

Table 4. In Table 5 the statistical indices associated with parallel, nominally parallel, and nonparallel tests, as described in this chapter, are summarized and related to the relevant type of score conversion.

Table 5

Relationship between Parallelism and Type of Score Conversion

                      Parallel Forms          Nominally Parallel       Nonparallel
                                              Tests                    Tests

Observed score        very high               moderately high
equating              intercorrelations*      intercorrelations*
                      equal reliabilities     equal reliabilities

True score                                    moderately high
equating                                      intercorrelations*
                                              unequal reliabilities

Comparing                                                              low
                                                                       intercorrelations*

*Corrected for attenuation

Renorming

The majority of tests used by school psychologists in British Columbia, as elsewhere in Canada, were constructed and standardized in other countries.

Such adoption of existing tests involves the implicit assumption that

the Canadian population to which the tests are applied is identical to

the "foreign" population for which the norms were prepared. In reference

to the specific tests included in the present study, use of the WISC-R

in British Columbia in 1980 assumes the comparability of British Columbia

schoolchildren to a national sample of American children in 1974 (Wechsler,

1974). Similar use of the PPVT assumes the comparability of British

Columbia schoolchildren to a sample of 4000 children and adolescents in

Nashville, Tennessee in 1958 (Dunn, 1965). The reference group for the

SIT is vaguely described as including children and adults from various schools and professional groups in New York State who appear to have been tested in 1961 or 1962 (Slosson, 1975). Finally, use of the SPM or MHVS involves a comparison with all those children living in Colchester, England in 1943 whose surnames started with the letters A or B (Raven, 1958)!

To date there has been little research to substantiate the applicability of these tests to Canadian children. Laycock (1960) is critical of this situation:

Canada ... is handicapped by its reluctance to spend substantial amounts on educational research. As a result it has comparatively few data of its own on which to base the use of IQ tests or the solution of other educational problems. (p. 236)

Studies that do exist suggest that further investigation with Canadian children is required. Beauchamp, Samuels, and Griffore (1979) administered

two WISC-R subtests—Information and Digit Span—to Canadian and American children to compare differences in performance which could be attributed to cross-cultural influences. Their subjects were students in two grade

3 classes, one in a rural area in the province of Quebec, and the other

45 miles away, in a rural area in New York state. The authors hypothesized that cultural differences would be reflected in scores on the Information

subtest which contains items with American culture-specific bias. They

further hypothesized that such differences would not exist on the Digit

Span subtest which requires only the repetition of numbers. Results of the study were exactly the opposite of those expected. Although some

specific questions on the Information subtest were answered correctly by a greater percentage of students in one class than another, there was no significant difference between mean scaled scores for the two groups on that subtest. There was a significant mean score difference, however, on the Digit Span subtest with Canadian children scoring higher. The fact that differences were found and that those were not predictable prompted the authors to advise cautious interpretation of scores for

Canadian children. In fact, they proposed the standardization of the

WISC-R in Canada. Peters (1976), in a similar vein, reported no evidence of bias against Canadian students in WISC-R content. He did, however, find mean IQ scores for children in a large city in Saskatchewan that were consistently higher than the American standardization sample mean of 100. He reported average Full Scale IQ scores of 109.75,

106.58, and 103.41 for children at ages 7½, 10½, and 13½ respectively.

Various studies using group tests of intelligence confirm a pattern of higher Canadian scores. In 1966, the standardization of the Canadian

Lorge-Thorndike Intelligence Test yielded results which "clearly indicated the need for Canadian norms. The Canadian students displayed both a different average performance [higher] and a different amount of variabil• ity [more homogeneous] from the American students" (Wright, Thorndike,

& Hagen, 1972, p. 7). Kennett (1972) reported an average IQ of 113.0

(standard deviation = 9.7) for girls and 110.7 (standard deviation = 10.2) for boys across five socioeconomic levels on the Otis Quick Scoring Mental

Ability Test. Oldridge (1968) discussed similar results using the California Test of Mental Maturity and suggested:

Those in education who have been involved in the intellectual and/or academic assessment of Canadian school children have observed the consistently higher rating of these students on American tests using American norms. The higher academic achievement is usually explained as being a function of the higher intellectual ability (I.Q.) of the Canadian groups. (p. 1)

Herman (1979) reported that evidence from sequential standardization of intelligence tests in the United States indicated better performance there as well. His paper would suggest that higher IQ scores are a function of time and may reflect general cultural progress rather than national superiority.

Whatever the explanations that may be posed, the studies cited in this section provide ample cause for questioning the adequacy of the existing norms tables for the tests used in the present study. As mentioned in

Chapter I, the need for a large, representative sample on which to base score conversions also satisfied the requirements for an appropriate norming sample. Thus this study included the determination of the applicability of the published "foreign" norms in British Columbia.

CHAPTER III

METHODOLOGY

There were two objectives to the present study: to investigate

the application of linear and equipercentile equating techniques to those pairs of tests identified as psychologically and technically equivalent,

and to establish comparable provincially representative norms for ages

7½, 9½, and 11½ years for each of the five tests. Briefly, the procedure

followed was to administer all tests to each child in the sample; to

establish new norm tables for each test; and to apply equating procedures

to the selected pairs of tests. A complete description of the methodology

is presented in this chapter. The sample design and sampling procedures

are described in the first section. Following that, the data collection

and preparation are described. In the final section, the data analysis

is presented under two headings: norming and equating.

Sample Design

A two-stage stratified sample design was used to select probability

samples, representative of the population of interest in the province,

for each of three age levels: 7½, 9½, and 11½ years. The choice of

stratification variables was based on those included in the standardization

of the WISC-R in the United States in 1973. At that time, the variables used were age, sex, race (white-nonwhite), geographic region, occupation

of head of household and urban-rural residence (Wechsler, 1974). The white-nonwhite variable was omitted from the present study: apart from

the exclusion of Native Indian children (see Chapter I), there was no attempt to classify children according to race. The variable, occupation of head of household, was replaced by years of schooling of head of household and was included for descriptive rather than selection purposes since population figures for this variable were available only for the province as a whole

(see p. 61). In summary, the stratification variables adopted from the original WISC-R standardization were age, sex, geographic region, and community size (urban-rural). One further variable—size of school—was added to these. This variable has been used in other sampling designs for tests of intellectual ability (cf. Wright, Thorndike, & Hagen, 1972).

Stage I: Schools

At the first stage, the sampling frame consisted of a list of all public¹ and independent² schools in British Columbia in which children at one or more of the three age levels were enrolled. Determination of a specific age level was based on corresponding grade levels, with grades 2, 4, and 6 representing ages 7½, 9½, and 11½ respectively. A total of 1465 schools—1312 public and 153 independent—were identified and categorized according to the Stage 1 stratification variables: geographic region, community size, and size of school.³

Geographic region. Seventy-four school districts in the province of

B.C. were grouped into the six regional zones shown in Figure 1 and listed below:

¹Ministry of Education, Report on Education 1976-1977, Victoria, B.C., 1978.

²Listed in Federation of Independent School Associations, B.C. Independent Schools, 1978-1979.

³The Nishga school district was excluded from the study since it exclusively incorporates a Native Indian population. Therefore, schools in all but one district were included in the sampling frame.


1. Okanagan
2. Metro
3. Fraser Valley
4. Vancouver Island
5. Kootenay
6. Northern

These zones represent administrative units within the B.C. school system and have previously been used for sample stratification purposes in educational research designs for the provincial learning assessment program

(B.C. Research, Note 1).

Community size. Three community sizes, defined in terms of total population, were used:

A  under 1,000
B  1,001-50,000
C  over 50,000

The proportions of the population of B.C. represented in these categories according to 1971 census data (Statistics Canada, 1971) were: A - 25%,

B - 25%, and C - 50%. Schools were assigned to community size on the basis of school address and population figures from the Surveys and Mapping

Branch, Ministry of the Environment, Victoria, B.C. (1978-1979). In cases where school location was unspecified, the address was obtained

from the relevant school district office.

Size of school. School size was defined by total student enrollment.

Three categories of school size were determined:

I.   up to 150
II.  151-300
III. over 300

Choice of school size categories was made to ensure the adequate representation of all sizes of schools in the sample.

Stage II: Individuals

The sampling frame for individuals consisted of a list of all non-

Native Indian, English-speaking children enrolled in each selected school.

The lists excluded children with physical, emotional and mental handicaps as outlined in Chapter I. There were two stratification variables at this stage: age and sex.

Age. Three age groups were included: 7½, 9½, and 11½ years. Ages were defined within three months of the midyear; thus 7½ years spanned a 6-month range from 7 years 3 months to 7 years 9 months. Ages 9½ and 11½ were similarly defined. All children whose ages corresponded to these 6-month ranges at the estimated time of testing were listed.

Sex. Within each age group the children were classified by sex.

Population Sizes

Population data were based on the 1976-1977 school year enrollment figures (Ministry of Education, 1978). These were the most recent data available at the time the sample was drawn. Although enrollment numbers have declined since that time, the reduction was judged to be consistent across school districts (Rees, Note 2).

The full sample design is shown in Table 6. Estimates of the population size in each cell were calculated independently for ages 7½, 9½, and 11½. To do this, age levels within schools were assumed to have a one-to-one correspondence with grade levels. It was further assumed that class size was constant across grades. An estimate of the number of children at each age level within each school was then determined by dividing the total school enrollment by the number of grades. For example,

in a school with 320 children and grades K-7 it was estimated that there were 40 children at each level.

Table 6

Population Size Stratified by Region, Community Size, School Size, and Age

Community   School                         Region
Size        Size        1       2       3       4       5       6

Age 7½
A           I           597     34      588     460     313     566
            II          641     —       748     896     184     511
            III         395     —       837     399     71      112
B           I           245     403     352     216     287     105

Age 9½
A           I           524     34      477     335     294     566
            II          641     —       787     896     184     511
            III         395     —       918     559     95      112
B           I           223     326     237     194     220     105
            II          897     1439    1027    893     744     950
            III         1650    3808    1306    1219    1006    1904
C           I           249     693     —       192     —       25
            II          464     2090    —       846     —       193
            III         1473    8062    —       2338    —       1014

Age 11½
A           I           465     17      447     268     289     537
            II          641     —       737     837     190     511
            III         395     —       918     484     95      112
B           I           174     157     212     125     200     105

Note. The population sizes for each age level were calculated by dividing total enrollment per school by number of grades within the school. The procedure is described more fully on page 52.

Using the data in Table 6, population percentages were calculated as a basis for determining sample dispersion. Working independently at each age level, percentages were first calculated for each region, as shown in Table 7, and then within each region and age, for the remaining cells of the design as shown in Table 8.

Table 7

Population Percentages: Region by Age

Region       1          2           3          4             5          6
             South      Greater     South      Vancouver
             Central    Vancouver   Mainland   Island &      Southeast  North
                                               South Coast

Age 7½       14.9       38.4        11.2       17.3          5.9        12.3

Age 9½       15.1       38.2        11.0       17.3          5.9        12.5

Age 11½      14.6       35.9        11.6       17.6          7.0        13.2

Note: The fact that one row of the table does not total 100% is due to the effect of rounding each percent to the nearest tenth.

Sampling Procedures

Sample Allocation

The target sample size was established on the basis of the following considerations.

1. The American standardization sample for the WISC-R included 200 children at each age level. This sample size was approximated as closely

as possible.

2. It was necessary to recognize many practical constraints in terms of financial and personnel resources. The scope of the study required extensive reliance on the availability of volunteer assistance. It was therefore desirable to limit the testing both in terms of individual and total project time commitment.

3. Since sample size is directly related to the precision of the estimation of population parameters, it was necessary to balance the practical need for size restriction against the need for statistical reliability gained with a large sample size. This latter need is even greater for the equating than for the norming requirements in the present study.

Table 8

Percentage of Population within Region by Age

Community   School                         Region
Size        Size        1       2       3       4       5       6

Age 7½
A           I           9.2             12.1    6.1     12.3    10.6
            II          9.9     —       15.4    11.9    7.2     9.5
            III         6.1     —       17.2    5.3     2.8     2.1
B           I           3.8     2.4     7.2     2.9     11.3    2.0
            II          12.7    8.6     21.1    12.8    26.9    19.5
            III         24.2    22.8    26.9    16.2    39.5    32.2
C           I           4.2     4.9     —       2.0     —       1.6
            II          7.2     13.0    —       11.8    —       3.6
            III         22.7    48.0    —       31.1    —       18.9

Age 9½
A           I           8.0             10.0    4.5     11.6    10.5
            II          9.8     —       16.6    12.0    7.2     9.5
            III         6.1     —       19.3    7.5     3.7     2.1
B           I           3.4     2.0     5.0     2.6     8.6     2.0
            II          13.8    8.8     21.6    12.0    29.3    17.7
            III         25.3    23.2    27.5    16.3    39.6    35.4
C           I           3.8     4.2     —       2.6     —       —
            II          7.1     12.7    —       11.3    —       3.6
            III         22.6    49.0    —       31.3    —       18.8

Age 11½
A           I           7.6             9.2     3.6     9.8     9.7
            II          10.5    —       15.2    11.3    6.4     9.2
            III         6.5     —       18.9    6.6     3.2     2.0
B           I           2.8     1.0     4.4     1.7     6.8     1.9
            II          13.1    8.0     24.0    10.1    25.2    18.3
            III         26.6    25.3    28.4    20.6    48.7    36.7
C           I           1.2     2.7     —       2.0     —       —
            II          7.6     9.3     —       11.1    —       3.5
            III         24.1    53.5    —       33.0    —       18.2

Note: The fact that some of the columns of the table do not total 100% is due to the effect of rounding each percent to the nearest tenth.

A target sample size of 180 students (90 girls and 90 boys) at each level was established. Given the fixed population standard deviation of 15 for the WISC-R, this size yielded a standard error of the mean of

1.1 IQ points, as illustrated by the formula:

$$SE_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{15}{\sqrt{180}} \approx 1.1$$
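The computation, and the sample size a tighter standard error would require, can be verified directly (a trivial check; the 1.0-point target is my own example, not a figure from the study):

```python
# Standard error of the mean for a fixed population SD, and the n needed
# to reach a target standard error: n = (sd / target)**2.
import math

sd, n = 15.0, 180
print(round(sd / math.sqrt(n), 2))        # 1.12 IQ points, as in the text

target_se = 1.0
print(math.ceil((sd / target_se) ** 2))   # 225 examinees for a 1.0-point SE
```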

Each total age sample was then allocated according to the population percentages shown in Tables 7 and 8. Corresponding target sample sizes are shown in Tables 9 and 10. Since the cell sample sizes were fairly consistent across ages (as shown in Table 9), it was decided to test one child at each age level within a single school. Schools were first selected on the basis of sample allocation figures for age 7½. Where possible, the same school was used for ages 9½ and 11½. In cases where the sample size was larger at ages 9½ and 11½, or where the age 7½ school contained only primary grades, the sampling procedure was repeated at the other age levels for the particular cell in question.

Table 9

Target Sample Allocation: Region by Age

Region       1          2           3          4             5          6
             South      Greater     South      Vancouver
             Central    Vancouver   Mainland   Island &      Southeast  North
                                               South Coast

Age 7½       14.9%      38.4%       11.2%      17.3%         5.9%       12.3%
             n=27       n=69        n=20       n=31          n=11       n=22

Age 9½       15.1%      38.2%       11.0%      17.3%         5.9%       12.5%
             n=27       n=69        n=20       n=31          n=11       n=23

Age 11½      14.6%      35.9%       11.6%      17.6%         7.0%       13.2%
             n=27       n=65        n=22       n=32          n=12       n=23

Preparation of the School Sampling Frame

In order to provide for sampling flexibility in schools with very small enrollments (there were many rural schools with 1-3 students per age), "superschools" were created consisting of the amalgamation of two or more similar-size schools within the same cell of the sample design.

Superschool amalgamations were restricted to schools within the same school district and within close geographic proximity. The rule adopted for the establishment of "superschools" was that there would be no school unit (a "superschool" is considered one unit) smaller than 2n where n equalled the sample allocation for that cell. When sampling for individuals, a "superschool" was treated as one school unit with all eligible students listed in a single sampling frame.

To ensure equal sampling probability for each child in the population, the following procedure was used (as illustrated in Table 11):

1. Within each cell, the school units were rank ordered from smallest

to largest according to enrollment at age 7½.

Table 10

Target Sample Allocation within Region

Community   School                         Region
Size        Size        1       2       3       4       5       6

Age 7½
A           I           2       —       2       2       1       2
            II          3       —       3       4       1       2
            III         2       —       4       2       —       —
B           I           1       1       1       1       1       1
            II          4       6       4       4       3       4
            III         7       16      5       5       4       7
C           I           1       4       —       1       —       —
            II          2       9       —       3       —       1
            III         6       33      —       10      —       4

Age 9½
A           I           2       —       2       2       1       3
            II          3       —       3       4       1       2
            III         2       —       4       2       1       —
B           I           1       1       1       1       1       1
            II          4       6       4       4       3       4
            III         7       16      5       5       4       8
C           I           1       3       —       1       —       —
            II          2       9       —       3       —       1
            III         6       34      —       10      —       4

Age 11½
A           I           2       —       2       1       1       2
            II          3       —       3       4       1       2
            III         2       —       4       2       —       —
B           I           1       1       1       1       1       1
            II          4       5       5       3       3       4
            III         7       16      6       7       6       9
C           I           —       2       —       1       —       —
            II          2       6       —       4       —       1
            III         7       35      —       11      —       4

Table 11

Proportional School Size Sampling Procedure
Region #1, Community Size A, School Size II (n=3 per age)

School Code    Enrollment (per age) f    cf      cp      cp × n

8913           15                        15      .023    .07
2716           21                        36      .056    .17
1904           22                        58      .090    .27
2209           22                        80      .125    .38
2325           22                        102     .159    .48
8921           22                        124     .193    .58
2337           23                        147     .229    .69
2604           24                        171     .267    .80
2734*          24                        195     .304    .91
2738           24                        219     .342    1.03
8903           24                        243     .379    1.14
8906           24                        267     .417    1.25
1603           26                        293     .457    1.37
2215*          27                        320     .499    1.50
8919           27                        347     .541    1.62
3004           28                        375     .585    1.76
1510           30                        405     .632    1.90
2739           31                        436     .680    2.04
1401           32                        468     .730    2.19
2207           32                        500     .780    2.34
3005           33                        533     .832    2.50
2305           35                        568     .886    2.66
2432*          36                        604     .942    2.83
2605           37                        641     1.000   3.00

*Selected school

2. Cumulative frequencies and cumulative percentages were calculated for

the resulting cell population distributions.

3. The cumulative percentage value was then multiplied by n where n =

student sample allocation for that cell.

4. The schools within a cell were divided into n substrata so that equal

proportions of the cell student population were represented in each

substratum.

Identification of Schools

For each substratum, a single random number was drawn within the range of cumulative frequency values for that substratum (in the example, for the first substratum, a random number between 0 and 200). Continuing with the example in Table 11, the first random number drawn was 180.

Since the 180th student was enrolled in the school coded as #2734, it became the selected school for this substratum. One school from each of the other substrata was similarly identified.
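The selection logic lends itself to a brief sketch (my illustration; the enrollment figures echo the style of Table 11 but are arbitrary, and the helper names are invented):

```python
# Proportional-to-size school selection: order schools by enrollment, form
# n substrata of equal student proportion, and draw one random student
# position within each; the school containing that student is selected.
import random

def select_schools(enrollments, n, seed=0):
    rng = random.Random(seed)
    items = sorted(enrollments.items(), key=lambda kv: kv[1])
    total = sum(size for _, size in items)
    cum, running = [], 0
    for code, size in items:            # cumulative frequencies, as in Table 11
        running += size
        cum.append((code, running))
    chosen = []
    for i in range(n):
        draw = rng.uniform(i * total / n, (i + 1) * total / n)
        chosen.append(next(code for code, cf in cum if draw <= cf))
    return chosen

schools = {"8913": 15, "2716": 21, "1904": 22, "2337": 23, "2734": 24,
           "1603": 26, "2215": 27, "1510": 30, "2305": 35, "2432": 36}
print(select_schools(schools, n=3))     # one school per substratum
```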

Preparation of the Sampling Frames for Individuals

Consent for participation in the study was requested sequentially from district superintendents and school principals. Samples of all letters are included in Appendix A. Consenting principals were asked to provide class lists including the names and birthdates of all children with the exception of those described in the delimitations section of Chapter I. The class lists were sent to the researcher who then prepared the sampling frame according to the age-range specifications and stratified by sex.

Equal numbers of boys and girls were assigned to each cell when the sample size for that cell was even. When cell sample size was odd, the sex of the "odd" person was determined randomly and the female-male quota of the next adjacent cell (within region) was structured to balance the numbers.

Identification of Individuals

Within schools, a random sample of students, stratified by age and sex, was selected. Student names plus parent letters and consent forms were returned to the principals and were subsequently sent to parents directly from the schools. Testing was contingent on the return of the signed parent consent forms.

In order to allow a more complete assessment of the representativeness of the sample, parents were requested to provide information on a socio-economic variable: the level of education, in years, of the head of the household (see parent consent form, Appendix A). Thorndike (1951) found this variable to correlate most highly with children's intelligence in a study of community predictors of intelligence and academic achievement. At the time this study was conducted, census data on which population proportions were based were available on a provincial basis only.⁴ Therefore, the representativeness of the sample on this variable was determined

The following categories, condensed from 1976 census data, were used:

I.   Grade 8 and below
II.  Grades 9-10
III. Grades 11-13
IV.  Post-secondary, non-university
V.   Post-secondary, including university

⁴For future research, these data will be available for school districts from Data Services, Ministry of Education.

Testing

Tests Used

The following five tests were used:

Wechsler Intelligence Scale for Children-Revised (WISC-R)

Slosson Intelligence Test for Children and Adults (SIT)

Peabody Picture Vocabulary Test, Form A (PPVT)

Standard Progressive Matrices, Sets A, B, C, D, and E (SPM)

Mill Hill Vocabulary Scale (MHVS)

Alterations were made to items in the WISC-R and the MHVS; the other tests were administered without change. On the WISC-R, six items in the

Information subtest and one item in the Comprehension subtest were reworded to "Canadianize" U.S.-specific content. The rewordings adopted were recommended by P. E. Vernon (1974, 1976, 1977) on the basis that their use with Canadian children yielded subtest pass percentages similar to those for the American standardization sample. The alternate items are presented in Appendix B.

In the MHVS, the first 20 words in each set were checked for frequency of occurrence in children's textbooks published in the United States

(Carroll, Davies, & Richman, 1971). An arbitrary frequency value of two or less was chosen as the basis for rejecting a word as "uncommon" and therefore unlikely to be in the vocabulary repertoire of school-age children. One word in each set was identified in this manner and replaced with its synonym from the synonym list provided in the Guide (Raven, 1958).

In Set A the word "dress" was substituted for "frock" and in Set B the word "quarrel" was substituted for"squabble." 63

Testers

In all cases, the WISC-R was administered by "level C" (Cronbach, 1970, p. 18) testers. Generally, the same person administered all tests to a given student. In some situations, the non-WISC-R tests were administered by a learning assistance teacher or a first-year graduate student in school psychology, trained to "level B" (Cronbach, 1970, p. 18) testing competence. The testing was completed by 54 volunteers, many of whom were practicing school psychologists in the cooperating school districts, and five paid research assistants.

Testing Procedures

Each test administrator was supplied with a project Handbook (see Appendix C) providing standardization information regarding testing procedures specific to this study. The individual test administration procedures were those described in the respective test manuals with the following alterations:

1. Only 10 WISC-R subtests were given: Mazes and Digit Span were not administered.

2. The "Canadian" items were substituted for the original test items on the WISC-R.

3. For the MHVS, the oral definitions procedures were used (Raven, 1958, p. 29).

Children were tested in two sessions. The WISC-R was administered in the first session and the remaining tests, in counterbalanced order (as shown in Figure 2), in the second.

Individual test packages were prepared consisting of three parts:

1. the request for subject participation form (see Appendix A);

2. the WISC-R protocols, with Canadian substitution items on a separate page; and

3. the remaining tests, numbered and stapled together in one of the assigned sequences.

1, 2, 3    2, 1, 3    3, 1, 2
n=30       n=30       n=30

1, 3, 2    2, 3, 1    3, 2, 1
n=30       n=30       n=30

Figure 2. Test administration order (second session) for each age group. 1 = PPVT; 2 = SIT; 3 = SPM + MHVS.

Equal numbers of packages were prepared for each sequence; test sequence was randomly assigned across all subjects.

All testing was completed between May 1 and December 17, 1979.

Preceding each testing session, student consent for participation was secured. A student code number, birthdate, and sex were recorded on each test form. This procedure allowed identification of each student in terms of each of the stratification variables. Subject anonymity was guaranteed by the absence of student names on any lists or in any records.

Scoring and Data Preparation

Completed test protocols were returned to U.B.C. To ensure consistency, all tests were scored by the researcher following the directions given in the respective manuals. The exception to this was the Mill Hill Vocabulary Scale: the scoring procedures used for that test are described in detail in the following section. Second-party verification of a 10% random sample of tests revealed a .57% error rate for the WISC-R. For all other tests, the error rate was 0%. The data were then coded with 20% random verification and keypunched with 100% verification.

Mill Hill Vocabulary Scale

It was necessary to construct scoring guides for the MHVS since existing "criteria for marking" were incomplete and possibly obsolete. The MHVS Guide (Raven, 1958) offers scoring outlines for only every fifth word on Sets A and B, using criteria established prior to the standardization of the test in England in 1943-1944 (Raven, 1958; Raven & Walshaw, 1944). Additional scoring criteria are available for the first 40 words of the MHVS (words 1-20 from each of Set A and Set B) as they appear in the Crichton Vocabulary Scale (Raven, 1954). Since these criteria were established in England in 1949, it was felt that they should be examined for applicability to current Canadian usage.

The scoring guidelines developed for the present study were based on Canadian Senior Dictionary (Avis, Drysdale, Gregg, & Scargill, 1979) definitions plus meanings commonly given by children in the sample. For each word in both sets, every qualitatively unique meaning was recorded and a tally was kept of the frequency of occurrence for the entire sample.

The list of meanings for each word was then divided into acceptable and unacceptable responses following the general scoring principles for the WISC-R Vocabulary Test (Wechsler, 1974, pp. 161-162). In almost all cases, this procedure resulted in obvious categorizations and yielded criteria similar to the original, with some rewordings and additions of phrases according to current use. There were two problems, however, for which outside expertise was sought. One of these was the question of the acceptability of slang expressions (e.g., "right on" as the meaning for "precise"). The other concerned the reverse polarization of an originally unacceptable response to a currently acceptable one. In the latter case, one definition was involved: the response "mean" to the stimulus word "cruel" (Set B). In the scoring criteria for the Crichton Vocabulary Scale, the definition "mean" was listed as an unacceptable response (Raven, 1954, p. 6). Moreover, that particular definition was not included in the resource dictionary used. It was, however, given as a synonym for cruel by 76% of the present sample.

The questions of acceptability of common usage and of colloquialisms were referred to a linguist. Such issues are commonly resolved on an individual basis in reference to the intention of the given test (Ralstan, Note 3). When the purpose of the test is to measure the vernacular (communications) characteristics of language, common usage, including slang, is acceptable. According to Raven, the aim of the MHVS "is to record a person's present recall of acquired information and ability for verbal " (1958, p. 4). This stated intent was judged to be compatible with the criterion stated above. Therefore slang expressions or colloquialisms and commonly given meanings were considered acceptable.

The complete scoring guides for the MHVS as used in the present study are included in Appendix D.

DATA ANALYSIS

The data analysis consisted of two phases: norming and equating.

In the norming phase of the study, test scores for the B.C. sample were first compared to published test data to determine the applicability of existing norms. In Chapter IV, the need for renorming was established. The preparation of new norms and the computation of the statistical properties of the renormed tests constituted the remainder of this phase.

The equating phase involved the application of test equating techniques to the pairs of tests selected on the basis of their characteristics as outlined in Tables 4 and 5. The determination of relevant test pairs for inclusion in this analysis and the description of the actual procedures used are included in the final section of this chapter.

Preliminary Analyses

Two preliminary analyses were run using raw score data from all five tests. The first was to determine the independence of test results from the order in which the tests were administered. The second was to determine the goodness of fit of the B.C. data to the theoretical normal distribution, in order to judge the adequacy of the assumption of a normal distribution of intelligence test scores in the population.

Order of Administration

For each test, a one-way, fixed effects analysis of variance was run with order of administration as the independent variable. The six sequences shown in Figure 2 were used as six levels of the factor, order.

The null hypothesis tested was that order of test administration had no effect on test scores. As shown in Chapter IV, the results were consistent with the null hypothesis.
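As an illustration, the analysis for one test can be sketched in present-day software as follows (Python; the scores are simulated stand-ins, not the study data):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Hypothetical stand-in data: raw scores on one test for the six
    # administration sequences of Figure 2, roughly 30 children each.
    groups = [rng.normal(loc=82, scale=9, size=30) for _ in range(6)]

    # One-way fixed effects ANOVA with order as the independent variable.
    f_ratio, p_value = stats.f_oneway(*groups)
    print(f"F = {f_ratio:.3f}, p = {p_value:.3f}")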

Goodness of Fit

The distributions of raw scores on the ten WISC-R subtests, the PPVT, the SIT, the SPM, and the MHVS were analyzed for goodness of fit to the theoretical normal distribution employing the Kolmogorov-Smirnov test (Siegel, 1956). As shown in Chapter IV, the K-S statistic showed non-significant departures from normality for all tests.
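This check can be sketched in the same way. Note that, strictly, estimating the normal parameters from the sample calls for a corrected critical value (the Lilliefors refinement); that refinement was not part of the original procedure and is not applied here:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    scores = rng.normal(loc=82, scale=9, size=113)  # stand-in raw scores

    # Kolmogorov-Smirnov one-sample test: D is the largest divergence
    # between the observed and theoretical cumulative distributions.
    mean, sd = scores.mean(), scores.std(ddof=1)
    D, p = stats.kstest(scores, "norm", args=(mean, sd))
    print(f"D = {D:.3f}, p = {p:.3f}")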

Determination of Norm Relevance

IQ scores were derived for each child on the WISC-R, PPVT, and SIT using the procedures and score conversion tables in the respective test manuals. For the SPM and MHVS, the raw score was used since no scaled score conversions were provided by their author (Raven, 1958, 1960). To assess the applicability of the existing norms for each test with B.C. children, measures of central tendency and variability were computed and compared to the published norm data. The published values used in these comparisons are shown in Table 12.

Table 12

Measures of Central Tendency and Variability Reported in Test Manuals

Test      Mean IQ    Standard Deviation
WISC-R    100        15
PPVT      100        15
SIT       100        16

Test    Age     Median Raw Score
SPM     7½      17
        9½      24
        11½     35
MHVS    7½      16
        9½      22
        11½     28

Central Tendency

Each test was treated separately since the unit of interest was the individual test and since the tests are not administered in combination in regular practice. Accordingly, the one-sample t test was used to test for differences in means between the B.C. sample and the published means:

t = \frac{\bar{X} - a}{S_X / \sqrt{n}}    (Glass & Stanley, 1970, p. 293)

where \bar{X} is the B.C. mean,
a is the published mean,
S_X is the B.C. standard deviation, and
n is the sample size.

The published median raw score values were substituted for a for the SPM and MHVS, since it was assumed that the score distributions for the original standardization samples were symmetrical and consequently that the mean and median were equal.

Variance

The test statistic used for assessing corresponding differences in variance was:

\chi^2 = \frac{(n-1) S_X^2}{S_b^2}    (Glass & Stanley, 1970, p. 301)

where S_X^2 is the B.C. variance, and S_b^2 is the published variance.

Since df > 30, the value of the test statistic was converted to a standard normal deviate

z = \sqrt{2\chi^2} - \sqrt{2\,df - 1}    (Glass & Stanley, 1970, p. 520)

for hypothesis testing purposes.
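The two statistics can be sketched as a single routine; the values in the example call are illustrative only:

    import math

    def norm_relevance_tests(mean, sd, n, pub_mean, pub_var):
        """One-sample t test of the B.C. mean against a published mean,
        and a chi-square test of the B.C. variance against a published
        variance, converted to a standard normal deviate for df > 30."""
        t = (mean - pub_mean) / (sd / math.sqrt(n))
        df = n - 1
        chi_sq = df * sd ** 2 / pub_var
        z = math.sqrt(2 * chi_sq) - math.sqrt(2 * df - 1)
        return t, z

    # e.g., a sample of 108 with mean IQ 104.8 and SD 12.9, tested
    # against a published mean of 100 and variance of 15 squared.
    t, z = norm_relevance_tests(104.8, 12.9, 108, 100.0, 15.0 ** 2)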

The results of these two tests revealed that, in most cases, the scores for the B.C. sample were significantly different from the published norm values (see Chapter IV). Therefore it was necessary to renorm the tests for B.C.

Preparation of B.C. Norms

In this section the norming procedures are described for each test individually. IQ conversion tables with mean 100 and standard deviation 15 were prepared for three tests: the WISC-R, PPVT, and SIT. The resulting IQ scales are comparable in the sense described in Chapters I and II. That is, the tests were administered to the same group and scaled "in such a way that the mean and standard deviation have the same numerical values, respectively, on each of the various tests" (Angoff, 1971, p. 590). Therefore, a given IQ score value has a constant meaning on all tests in terms of relative position within the population represented by the B.C. standardization sample.

For the WISC-R and the PPVT, deviation IQ tables were constructed following the procedures used in the preparation of the respective published norms. The PPVT scaling procedure was applied to SIT scores as well. This represents a change in metric for the SIT, since the published norms include mental age and ratio IQ calculations. Normative data for the SPM and MHVS were prepared as percentile ranks associated with the raw score distributions. IQ score conversions were not provided for these tests since it was feared that they would be misused. Specifically, since only the SPM is commonly used (the MHVS is virtually unknown in B.C.), it is likely that the provision of IQ score conversions would result in the isolated reporting of SPM IQs. Since such usage would be in violation of Raven's intent (see Chapter II), the scores on these two tests were reported as in the original test manuals (Raven, 1958, 1960). All norm tables are included in Appendix E.

Wechsler Intelligence Scale for Children-Revised

In accordance with the procedures reported for the WISC-R (Herman, Note 4; Matarazzo, 1972; Wechsler, 1974), tables of normalized scaled score equivalents (mean 10, standard deviation 3) of raw scores were prepared for each of the ten subtests for each age group. IQ equivalents (mean 100, standard deviation 15) were then prepared for three sums of scaled scores: Verbal, Performance, and Full Scale. These procedures are described in detail in this section. In addition, an alternate norming method employing a graphing technique (Angoff, 1971) was applied to the subtest scores for comparative purposes.

Derivation of Subtest Scaled Scores

1. The publisher's method. The procedures followed for normalizing the observed score distributions and transforming them to scaled scores were those used by the Psychological Corporation for the Wechsler tests (Herman, Note 4). The steps are described below and an actual example of the calculations is presented in Table 13.

(i) The frequency distribution of raw scores was recorded with every raw score represented. This corresponds to the first two columns of Table 13.

(ii) The value f′ was computed for each score. This value is the frequency for a score plus the frequency for the score immediately above.

(iii) The first and last frequencies were partitioned off as illustrated.

(iv) The q values for the first and last frequencies (i.e., those above and below the partitions) were computed as f/n.

(v) The remaining q values were calculated by cumulatively dividing the f′ values by 2n from the top down. Thus, for raw score 24, the cumulative f′ is 6, and the q value was calculated as 6/216.

Table 13

Example of Computation of Scaled Score Equivalents of Raw Scores, WISC-R Information Subtest, Age 11½, n=108

Total Raw Score    f     f′    q      x      SS
30
29
28
27
26                 1     1     .009   2.70   18.0
25                 2     3     .018   2.46   17.4
24                 0     2     .028   1.91   15.7
23                 3     3     .042   1.73   15.2
22                 0     3     .056   1.59   14.8
21                 7     7     .088   1.35   14.0
20                 8     15    .157   1.01   13.0
19                 12    20    .250   .67    12.0
18                 12    24    .361   .36    11.1
17                 7     19    .449   .13    10.4
16                 10    17    .528   .07    9.8
15                 6     16    .602   .26    9.2
14                 14    20    .694   .51    8.5
13                 10    24    .806   .86    7.4
12                 4     14    .870   1.13   6.6
11                 6     10    .917   1.38   5.9
10                 4     10    .963   1.79   4.6
9                  1     5     .986   2.20   3.4
8                  0     1     .991   2.37   2.9
7                  1     1     .009   2.70   1.9
6
5
4
3

(vi) The x values recorded in column 5 of Table 13 were read from the normal curve table in Kelley (1924). For the first and last raw scores represented in the frequency distribution (i.e., those with partitioned-off q values), the tables were entered with the q values and the corresponding x values were read. For the remaining q values less than .500, the tables were entered with q and the corresponding x values were recorded. For q values greater than .500, q was considered as p when entering the tables, and the x values were recorded.

(vii) A line was drawn across the last three columns above the first q that exceeds .500. The scaled scores above this line were computed as SS = 10 + 3x. Below the line, scaled scores were computed as SS = 10 - 3x.

The scaled score range 1 to 19 was adopted from the WISC-R. For many of the tests, it was necessary to extrapolate and smooth scores in the extreme tails of the distributions. This was done in a manner that preserved the form of the distribution and provided a progression of scaled scores from age to age.

2. A graphic method. Because of the arbitrariness of smoothing scores in the tails of the distributions using the first approach, an alternate method based on a graphing technique was employed in the hope that it would facilitate the extrapolation procedure. This method was described by Angoff (1971, pp. 515-518) and involved the following steps (as illustrated in Table 14 and Figure 3):

(i) The frequency distribution of a set of raw scores for a test is prepared.

Table 14

Example of the Graphic Scaled Score Conversion Approach, WISC-R Information Subtest, Age 11½, n=108

Raw     Frequency      Cumulative     Percent    Percentile Rank    Normal     Scaled
Score   Distribution   Distribution   Below      (from Fig. 3)      Deviate    Score
30                                                99.86             2.99       19.0
29                                                99.74             2.8        18.4
28                                                99.57             2.63       17.9
27                                                99.25             2.43       17.3
26      1              108            99.07       98.73             2.24       16.7
25      2              107            97.22       97.75             2.0        16.0
24                                                96.3              1.79       15.4
23      3              105            94.44       94.3              1.58       14.7
22                                                91.45             1.37       14.1
21      7              102            87.96       87.5              1.15       13.4
20      8              95             80.56       82.0              0.92       12.8
19      12             87             69.44       74.75             0.67       12.0
18      12             75             58.33       65.8              0.41       11.2
17      7              63             51.85       55.5              0.14       10.4
16      10             56             42.59       44.5              -0.14      9.6
15      6              46             37.04       33.0              -0.44      8.7
14      14             40             24.07       23.5              -0.72      7.8
13      10             26             14.82       15.0              -1.04      6.9
12      4              16             11.11       9.0               -1.34      6.0
11      6              12             5.56        5.0               -1.65      5.0
10      4              6              1.85        2.4               -1.98      4.1
9       1              2              .93         .95               -2.35      3.0
8                                                 .325              -2.73      1.8
7       1              1                          .078              <-3.00     1.0

Figure 3. Relative cumulative frequencies for the WISC-R Information subtest, age 11½ (relative cumulative frequency, on a probability scale, plotted against raw score).

(ii) Relative cumulative frequencies (percent below each score value) are computed. These are plotted and smoothed on arithmetic graph paper.

(iii) New percentile rank values are read from the smoothed curve and corresponding normal deviates (z) are determined using a table of areas of the unit normal distribution.

(iv) Finally, the values are transformed to a scale having the desired mean and standard deviation, in this case 10 and 3 respectively.

Derivation of Verbal, Performance, and Full Scale IQ Scores

Three sums of scaled scores (Verbal, Performance, and Full Scale) were obtained for each child. The Verbal Score and Performance Score are the sums of the five verbal subtest scaled scores and the five performance subtest scaled scores, respectively. The Full Scale Score is the sum of all ten subtest scores. For each age group, the means and standard deviations were computed separately for each sum of scaled scores. As found with the American standardization sample (Wechsler, 1974, p. 23) and as revealed in Table 29 (Chapter IV) for the B.C. sample, these data were similar across ages. Therefore, the respective sums of scaled scores for all age groups combined were used as a basis for constructing the three corresponding IQ tables.

For each IQ scale, the mean and standard deviation were set equal to 100 and 15 respectively. The conversion from sums of scaled scores to equivalent IQ values was accomplished using the following formula:

IQ = \frac{15}{S_X}(X_i - \bar{X}) + 100    (Matarazzo, 1972, p. 509)

where S_X = standard deviation for all age groups combined for the appropriate sum of scaled scores,
X_i = any sum of scaled scores, and
\bar{X} = mean for all age groups combined for the appropriate sum of scaled scores.

As in the WISC-R Manual, for the Verbal and Performance Scales, IQs were extended to 3 2/3 standard deviations on either side of the mean, and range from 45 to 155. For the Full Scale, IQs were extended to 4 standard deviations on either side of the mean, and range from 40 to 160.

The percentile ranks published in the WISC-R Manual (Wechsler, 1974, p. 25) may be applied to the B.C. WISC-R norms. Therefore new percentile ranks were not calculated.
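A minimal sketch of the conversion follows; the example uses the combined-age Full Scale values later reported in Table 29 (Chapter IV) purely as an illustration:

    def iq_equivalent(sum_ss, mean, sd, lo=40, hi=160):
        """Deviation IQ: 15/S_X * (X_i - mean) + 100, truncated to the
        table range (40-160 for the Full Scale; 45-155 for the Verbal
        and Performance Scales)."""
        iq = 15.0 / sd * (sum_ss - mean) + 100.0
        return max(lo, min(hi, round(iq)))

    # A Full Scale sum of scaled scores of 118, with combined-age mean
    # 100.26 and SD 18.26, corresponds to a B.C. Full Scale IQ of 115.
    print(iq_equivalent(118, mean=100.26, sd=18.26))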

Peabody Picture Vocabulary Test and Slosson Intelligence Test

The raw score scales for both the PPVT and SIT were converted to IQ equivalents following the procedure reported in the PPVT Manual (Dunn, 1965, pp. 28-29). The conversion formula was identical to that used in the derivation of the WISC-R IQs. Unlike the WISC-R, however, the PPVT and SIT IQ conversions were calculated independently for each age level. Thus the means and standard deviations entered into the equation were computed for each age separately. The IQs were extended to 4 standard deviations on either side of the mean, ranging from 40 to 160.

Percentile ranks associated with the score distributions for the PPVT and SIT were derived using the formula:

\%ile = \frac{100}{n}\left(\frac{f}{2} + cf\right)    (Dunn, 1965, p. 29)

where n = number of cases in the distribution,
f = frequency of a score, and
cf = cumulative frequency of scores below the given score.
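Given the reading of Dunn's formula adopted above, with cf taken as the cumulative frequency below the score in question, the calculation can be sketched as follows (the frequencies in the example are hypothetical):

    import numpy as np

    def percentile_ranks(freq):
        """Mid-point percentile ranks for a frequency distribution
        ordered from lowest to highest score:
        %ile = 100/n * (f/2 + cumulative frequency below the score)."""
        f = np.asarray(freq, dtype=float)
        n = f.sum()
        cf_below = np.cumsum(f) - f
        return 100.0 * (f / 2.0 + cf_below) / n

    # e.g., hypothetical frequencies for eleven adjacent score values:
    print(percentile_ranks([1, 3, 7, 12, 18, 22, 18, 12, 8, 5, 2]))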

Standard Progressive Matrices and Mill Hill Vocabulary Scale

In the preparation of the norms and in all future calculations, the MHVS score represents the total score on Sets A and B. Normative information for the SPM and MHVS was prepared as raw score distributions with corresponding percentile ranks for each age group, calculated as shown above. This is consistent with the published norm data for these two tests (Raven, 1958, 1960).

Statistical Properties of the Tests

As a final step in the norming phase, internal-consistency reliability coefficients and standard errors of measurement for each test were calculated as described below.

The internal consistency coefficients used and the item components included in their calculations are summarized in Table 15. With the exception of the WISC-R IQ Scales, all coefficients are split-half correlations corrected by the Spearman-Brown formula. Items were excluded from the calculations as noted in order to yield split test halves with equal total scores. Coding is a speeded test for which split-half techniques are inappropriate. Limitations on time and the availability of testers precluded the possibility of obtaining a test-retest reliability coefficient for Coding (cf. Wechsler, 1974); therefore reliability estimates for this subtest are not available. Some of the WISC-R subtests, as well as the PPVT and the SIT, have variable basal levels below which items are scored as correct although they are not in fact included in the questioning.

To avoid artificially inflating internal consistency estimates by including these items, total "odd" score and total "even" score were calculated on the basis of items actually presented, that is, all items within the basal-to-ceiling range. This was not necessary with the WISC-R Similarities and Object Assembly subtests or with the SPM and MHVS since, for each of these, the basal item is fixed at item #1 for all ages.

Table 15

Form of Reliability Coefficient Computed

Test                   Reliability Coefficient                                     Components
WISC-R
  Information          split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
  Similarities         split-half (odd vs. even items), Spearman-Brown correction  item #1 to ceiling item
  Arithmetic           split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
  Vocabulary           split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item¹
  Comprehension        split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
  Picture Completion   split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
  Picture Arrangement  split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
  Block Design         split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item²
  Object Assembly      split-half (odd vs. even items), Spearman-Brown correction  all items³
  Coding               none
  Verbal IQ            reliability of a composite group of tests (Guilford, 1954)  5 verbal tests
  Performance IQ       reliability of a composite group of tests (Guilford, 1954)  4 performance tests
  Full Scale IQ        reliability of a composite group of tests (Guilford, 1954)  9 tests
PPVT                   split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
SIT                    split-half (odd vs. even items), Spearman-Brown correction  basal item to ceiling item
SPM                    split-half (odd vs. even items), Spearman-Brown correction  all items
MHVS                   split-half (Set A vs. Set B), Spearman-Brown correction     item #1 to ceiling item, both sets

¹Excluding item #17. ²Excluding item #1. ³Excluding score of 9 on item #4; scored as 8.

The coefficients for the WISC-R IQ Scales were obtained from the formula for the reliability of a composite group of tests (Guilford, 1954, p. 393). As indicated in Table 15, these are based on subtest combinations excluding Coding.

Standard errors of measurement were calculated for all tests using the formula:

SEM = S_X \sqrt{1 - r_{XX}}

where S_X = standard deviation, and r_{XX} = reliability coefficient.

These were derived using scaled score units for the WISC-R subtests; IQ units for the WISC-R IQ scales, the PPVT, and the SIT; and raw score units for the SPM and MHVS.

Equating

The purpose of the equating phase of the study was to explore empirically the equivalence of four tests (PPVT, SIT, SPM, MHVS) to the WISC-R. The method adopted was to apply test equating procedures to pairs of psychologically and technically equivalent tests and to examine the accuracy of the converted scores. As discussed in Chapters I and II, the operational definition of technical equivalence used in this study was based on the concept of nominal parallelism (Lord, 1964; Lord & Novick, 1968). Two tests were considered nominally parallel if they measured the same psychological function, were used interchangeably, and had a disattenuated correlation coefficient of at least .70.

The first step in the equating phase of the analysis was to determine whether the test pairs identified as psychologically equivalent in Chapter II further satisfied the statistical criterion for equating. These tests were then assigned to observed score or true score equating methods on the basis of the similarity of their reliability coefficients. A summary of test characteristics and equating methods was presented in Table 5 and is described in detail in the following sections.

Determination of Nominally Parallel Test Pairs

Both Angoff (1971) and Lord (1950) stress the use of raw scores in equating procedures. Since WISC-R scores are commonly used only in scaled score and IQ score form, it was necessary to create a new type of WISC-R score corresponding to raw score rather than normalized score distributions. The WISC-R components of interest in the equating phase are the summary Verbal, Performance, and Full Scale scores. It will be recalled that, in the norming procedures, these summary scores were additive combinations of scaled scores on groups of subtests. To derive corresponding raw score combinations, however, a direct summing of subtest raw scores is not possible since they are in different metrics. Therefore it was necessary to rescale the subtests to provide equal score intervals.

To accomplish this, scores on each subtest were transformed to standard (z) scores with mean 0, standard deviation 1. A Verbal score was created for each individual consisting of the sum of z-scores for the five verbal tests. Similarly, a Performance score was created as the sum of z-scores for the five performance tests. The Full Scale scores consisted of the sum of z-scores on all 10 tests. These summary groupings correspond to the subtest combinations for the Verbal, Performance, and Full Scale IQs. To avoid working with negative numbers, the three summary scores were in turn converted to standardized scores with mean 100 and standard deviation 15. These are referred to in the following discussion as "standardized"

to distinguish them from IQ scores.
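The rescaling just described can be sketched as follows, assuming an n x 10 array of subtest raw scores with the five verbal subtests in the first five columns:

    import numpy as np

    def standardized_scores(raw):
        """raw: (n x 10) array of subtest raw scores, columns 0-4 verbal,
        columns 5-9 performance. Each subtest is z-scored, the z-scores
        summed, and the sums rescaled to mean 100, SD 15."""
        z = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)
        sums = {"Verbal": z[:, :5].sum(axis=1),
                "Performance": z[:, 5:].sum(axis=1),
                "Full Scale": z.sum(axis=1)}
        return {name: 100.0 + 15.0 * (s - s.mean()) / s.std(ddof=1)
                for name, s in sums.items()}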

Product-moment correlation coefficients were calculated for all pairings of the three WISC-R standardized scores (Verbal, Performance, and Full Scale) with the raw scores of the remaining four tests. These were corrected for attenuation using the formula

r_{XY} / \sqrt{r_{XX}\, r_{YY}}

where r_{XX} and r_{YY} are internal consistency measures and r_{XY} is the correlation between the tests.
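For example, with illustrative values only:

    import math

    def disattenuate(r_xy, r_xx, r_yy):
        """Correction for attenuation: r_XY / sqrt(r_XX * r_YY)."""
        return r_xy / math.sqrt(r_xx * r_yy)

    # An observed correlation of .62 between tests with internal
    # consistencies .86 and .88 yields a corrected value of about .71,
    # just meeting the .70 criterion for nominal parallelism.
    print(round(disattenuate(0.62, 0.86, 0.88), 2))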

As discussed previously, nominally parallel test pairs were statistically defined as those having disattenuated correlations of .70 or more. Examination of the corrected correlation coefficients in Table 33 (Chapter IV) shows that, with the exception of the SPM, the statistical evidence of parallelism corresponds to the psychological evidence summarized in Table 4. Since the strictly verbal tests, PPVT and MHVS, were considered logically "equatable" only to the WISC-R Verbal Scale, no attempt was made to equate verbal test scores to WISC-R Full Scale scores. The asterisks in Table 33 (Chapter IV) indicate test pairs identified for equating as jointly determined by the psychological and statistical criteria developed for this study.

Assignment of Test Pairs to Equating Methods

In Chapter II, three equating methods were identified based on Angoff's (1971) work: linear observed score equating, linear true score equating, and equipercentile equating. A distinction was made between the applicability of observed score and true score equating methods on the basis of the similarity of the tests' reliability coefficients. Nominally parallel tests with equal, or almost equal, reliabilities can be equated using observed scores. When the reliabilities are unequal, procedures based on true score distributions must be used.

There are no specific guidelines regarding the tolerance of equating procedures for departures from exact equality of reliabilities. Marks and Lindsay (1972) used test pairs with as large as a .20 discrepancy between reliability coefficients and found non-significant effects for reliability on the size of the equating error. A much more conservative difference (> .06) was adopted in the present study to ensure the inclusion of the true score equating method. Examination of the reliability coefficients in Table 31 (Chapter IV) resulted in the further subdivision of the previously identified nominally parallel test pairs into those with "equal" and "unequal" reliabilities. This step corresponded to the assignment of test pairs to observed score and true score equating methods as shown in Table 34 (Chapter IV).

Linear Equating (Equally Reliable Tests)

The use of a linear transformation for the purpose of equating two tests is predicated on the following assumptions:

1. the tests are (nominally) parallel,

2. the tests are equally reliable, and

3. the score distributions of the tests are identical (cf. Angoff, 1971).

In the present study, specific test pairs which satisfied the first two assumptions were selected for equating. Further, since all test score distributions were shown not to depart significantly from normality, the test pairs selected on the basis of assumptions 1 and 2 were additionally considered to satisfy assumption 3.

Within the general linear method, the relevant procedure for the calculation of converted scores is determined by the nature of the data collection design. In the present study, the subjects were selected randomly and all tests were administered to all subjects in counterbalanced order. This design satisfies the requirements for Lord's Case I model (1950, pp. 4-8) provided a fourth assumption can be met, namely:

4. the test results are unaffected by the order of administration of the tests.

It was pointed out earlier that the scores from the five tests in this study were not significantly affected by the order of testing. Therefore, Lord's Case I linear equating model was adopted. This corresponds to Angoff's variant of Designs I and II (1971, p. 575).

Applying Lord's basic linear model for a converted score

Y^* = \frac{S_X}{S_Y}\, Y + \bar{X} - \frac{S_X}{S_Y}\, \bar{Y}    (Lord, 1950, p. 5)

and substituting the terms Y_{it} for the score of individual i on test t (t = PPVT, SIT, MHVS) and X_{iw} for the score of individual i on test w (w = WISC-R), the linear equating formula becomes

Y^*_{it} = \frac{S_w}{S_t}\, Y_{it} + \bar{X}_w - \frac{S_w}{S_t}\, \bar{Y}_t

where \bar{X}_w, \bar{Y}_t, S_w, and S_t are the means and standard deviations of the scores on tests w and t.

Two error measures were estimated corresponding to the two used in the Anchor Test Study (Bianchini & Loret, 1974; Jaeger, 1973). The equating error (S_{Y^*}) reflects "the degree to which the equating results would vary if the same equating procedure and method were applied to a different representative sample of pupils" (Jaeger, 1973, p. 7). The formula used was

S_{Y^*} = \sqrt{ S_w^2 (1 - r_{wt}) \, \frac{(1 + r_{wt}) + z_{it}^2}{n} }    (cf. Lord, 1950, p. 7)

where n = size of sample,
r_{wt} = uncorrected correlation between scores on test w and test t, and
z_{it} = \frac{Y_{it} - \bar{Y}_t}{S_t}.

The other error measure, the conditional root-mean-square error of equating, is an estimate of "the degree to which a score read from the equating tables would differ from the score a pupil would have earned, had he been given the equated test" (Jaeger, 1973, p. 7). The conditional root-mean-square error of equating was computed following Bianchini and Loret (1974, p. 158).

These formulas were applied to all test pairs identified for linear observed score equating (see Table 34, Chapter IV). The score values entered into the equations were raw scores for the PPVT, SIT, and MHVS, and standardized scores for the WISC-R.
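Both the conversion and the equating error can be sketched together; the error expression follows the reconstruction of Lord's formula given above and should be read with that caveat:

    import numpy as np

    def linear_equate(y, x_bar_w, s_w, y_bar_t, s_t):
        """Lord's Case I linear conversion of a score on test t to the
        scale of test w: Y* = (S_w/S_t)(Y - Ybar_t) + Xbar_w."""
        return (s_w / s_t) * (y - y_bar_t) + x_bar_w

    def equating_error(y, y_bar_t, s_t, s_w, r_wt, n):
        """Standard error of equating at score y (after Lord, 1950):
        larger for scores farther from the mean of test t."""
        z = (y - y_bar_t) / s_t
        return np.sqrt(s_w ** 2 * (1 - r_wt) * ((1 + r_wt) + z ** 2) / n)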

Examination of the indices of skewness and kurtosis for the raw score distributions of these tests revealed that, although each individual distribution did not differ significantly from normality, for any given test pair the distributions were not exactly identical. Therefore, to ensure that the results of the linear conversions reflected the influence of test parameters which were not confounded with anomalies in the score distributions, the distributions of scores on all tests identified for observed score equating were normalized in the manner described for the WISC-R subtests, and the equating procedures were reapplied to these scores.

Linear Equating (Unequally Reliable Tests)

Angoff (1971) presented a linear true score equating procedure which is applicable under the same data collection conditions as the previously discussed observed score procedure but which is based on a reduced set of assumptions. Since the restriction concerning equal reliabilities is not required, the assumptions underlying the analogous "Case I" true score method are:

1. the tests are (nominally) parallel,

2. the score distributions of the tests are identical, and

3. the test results are unaffected by the order of administration of the tests (pp. 565-569, 571-573).

Conversion formulae for tests having unequal reliabilities require true scores rather than observed scores (see Chapter II). Therefore, variance estimates of hypothetical true score distributions were used in place of the corresponding observed score components in the equating formulas. The linear true score conversion formula used was:

Y^*_{it} = \frac{S_w \sqrt{r_{ww}}}{S_t \sqrt{r_{tt}}}\, Y_{it} + \bar{X}_w - \frac{S_w \sqrt{r_{ww}}}{S_t \sqrt{r_{tt}}}\, \bar{Y}_t

The equation for the standard error of equating was:

S_{Y^*} = \sqrt{ S_w^2\, r_{ww} (1 - r_{wt}) \, \frac{(1 + r_{wt}) + z_{it}^2}{n} }

where n = number of cases,
r_{wt} = uncorrected correlation between scores on test w and test t, and
z_{it} = \frac{Y_{it} - \bar{Y}_t}{S_t \sqrt{r_{tt}}}.

The conditional root-mean-square error of equating was computed as shown for the linear observed score procedure.
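The true score variant can be sketched in the same terms; following the reconstruction above, the slope is the ratio of estimated true score standard deviations rather than observed score standard deviations:

    import math

    def true_score_equate(y, x_bar_w, s_w, r_ww, y_bar_t, s_t, r_tt):
        """Linear true score conversion (after Angoff, 1971): slope
        S_w*sqrt(r_ww) / (S_t*sqrt(r_tt)) applied to deviation scores."""
        slope = (s_w * math.sqrt(r_ww)) / (s_t * math.sqrt(r_tt))
        return slope * (y - y_bar_t) + x_bar_w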

These formulae were applied to the IQ scores (B.C. normed) of all test pairs which were identified as having unequal reliabilities.⁴ It will be noted that this equating procedure was applied to IQ scores rather than observed scores. Again, this is in keeping with Angoff's (1971) use. Since there is a very close correspondence between WISC-R standardized and WISC-R IQ scores, however, essentially the same results would be achieved regardless of the metric used.

Equipercentile Equating

As described in Chapter II, equipercentile equating involves the identification of corresponding percentiles for two score distributions and the assignment of equivalence to their associated score values. Two procedures, hand-graphing and computerized, were used and their results compared. The steps involved in each are detailed in the following sections.

Hand-graphing (Angoff, 1971, pp. 571-575; Flanagan, 1951, pp. 727-730).

1. A table of relative cumulative frequencies (i.e., percentages of cases falling below each interval) was computed for each distribution.

2. These values were plotted and smoothed on arithmetic probability paper.

3. Score values for given percentiles were read from the smoothed ogives and recorded.

4. These values were plotted against each other on arithmetic graph paper.

5. A smoothed line was drawn connecting these points and was extended to cover the full range of possible test scores.

6. A table of equivalent score values was prepared from this final curve.

⁴Since IQ scores are not reported for the MHVS, raw scores from this test were used in the conversions.

The hand-graphing equipercentile method was applied to two test equating pairs in the present study. For illustrative purposes, these included the pairs of tests having the highest and the lowest intercorrelations. From Table 33, these pairs were identified as WISC-R Verbal: PPVT, age 7½; and WISC-R Verbal: MHVS, age 11½.

Computer analogue. The computerized equipercentile program developed by Lindsay and Prichard (1971) was applied to both the observed and normalized IQ score distributions of all test pairs used in linear equating. This program involved three distinct routines.

1. The interpolation procedure, which yielded two distributions with matched cumulative percentage values and raw score points.

2. The curve-fitting procedure, which provided linear and higher-order equations (to the fifth polynomial) for fitting the two interpolated distributions.

3. The plotting procedure, which produced graphs of the actual smoothed (interpolated) distributions and of the estimated distributions using both linear and curvilinear prediction functions.

As a result, comparisons were made possible between converted scores produced by the linear and equipercentile equating methods and between the graphic and analytical equipercentile procedures.
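The core of the analytical routine can be sketched as follows; plain linear interpolation stands in for the program's interpolation and polynomial curve-fitting procedures:

    import numpy as np

    def equipercentile_equate(scores_t, scores_w, grid):
        """For each value in grid (scores on test t), find its percentile
        rank in the test t distribution and return the test w score with
        the same rank."""
        t_sorted = np.sort(scores_t)
        w_sorted = np.sort(scores_w)
        p_t = (np.arange(1, t_sorted.size + 1) - 0.5) / t_sorted.size
        p_w = (np.arange(1, w_sorted.size + 1) - 0.5) / w_sorted.size
        ranks = np.interp(grid, t_sorted, p_t)   # percentile ranks on t
        return np.interp(ranks, p_w, w_sorted)   # matched scores on w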

Summary

At the conclusion of Chapter II, the three intertest score conversion methods identified for use in this study were summarized according to the criteria of parallelism of the tests involved. Figure 4 presents an extended outline incorporating the methods and procedures for each conversion type as described in the present chapter. The results of all analyses are presented in Chapter IV.

EQUATING: test pairs that are nominally parallel (r_XY ≥ .70).
  Equally reliable (r_XX = r_YY): observed score equating.
    Same-shape distributions: linear equating by Lord's (1950) Case I procedure (assuming no sequence effect).
    Different-shape distributions: equipercentile equating by the graphic (Angoff, 1971; Flanagan, 1951) or the analytical (Lindsay & Prichard, 1971) procedure.
  Unequally reliable (r_XX ≠ r_YY): true score equating, by Angoff's (1971) linear procedure with slope S_X\sqrt{r_{XX}} / (S_Y\sqrt{r_{YY}}) (assuming no sequence effect).

NORMING: nonparallel test pairs (r_XY < .70): comparing, via common scaling to \bar{X} = 100, S_X = 15.

Figure 4. Inter-test score conversion methods and procedures used for norming and equating.

CHAPTER IV

RESULTS

The organization of this chapter parallels that of Chapter III.

The characteristics of the obtained samples are reported first, followed, in turn, by the results of the norming and equating phases of the analyses.

Rate of Response

The target sample size for this study was 180 children at each of the three age levels. The actual number of students tested, together with the number of usable tests, is shown in Table 16 for each age level.

Table 16

Rates of Response for Each Test and Age

Age    Target Sample Size    Students Tested    % Tested    Test      # Usable    % Usable
7½     180                   115                63.9        WISC-R    115         100.0
                                                            PPVT      114         99.1
                                                            SIT       108         93.9
                                                            SPM       99          86.1
                                                            MHVS      114         99.1
9½     180                   117                65.0        WISC-R    117         100.0
                                                            PPVT      113         96.6
                                                            SIT       111         94.9
                                                            SPM       108         92.3
                                                            MHVS      103         88.0
11½    180                   108                60.0        WISC-R    108         100.0
                                                            PPVT      104         96.3
                                                            SIT       100         92.6
                                                            SPM       106         98.2
                                                            MHVS      105         97.2

As seen in this table, WISC-R scores were obtained for each child tested.

For a few students, however, scores were not available on the remaining four tests. For some of these children, testing was not completed because of absenteeism. For the remainder, test protocols were considered unscorable due to errors in administration.

In Table 17 the rate of response is shown for each cell in the sample design, based on the total returns at each age level. The rate of response is calculated as the percentage of the target sample (see Table 10, Chapter III) represented in the actual sample.

Representativeness of the Sample

In Tables 18 through 22 the sample is described in terms of the remaining four stratification variables and the socioeconomic variable (education) for the WISC-R samples. The percentages reported for the variable sex are based on the target sample goal of equal numbers of girls and boys at each age level. For each of the other variables, the tables show the percentages, based on 1976-1977 school enrollment data or 1976 census data (as outlined in Chapter III), that constituted the population, and the percentages that were obtained in the sample.

From Table 19, it can be seen that the relative proportions of children tested in each geographic region correspond closely to the population proportions. The Metro and Kootenay regions are slightly overrepresented while the Okanagan and Fraser Valley regions are somewhat underrepresented.

Table 17

Rates of Response (Percentages) for the Total Sample Design

Age 7½                                 Region
Community Size   School Size   1      2      3      4      5      6
A                I             50.0   50.0   50.0   100.0  50.0
                 II            33.3   -      33.3   50.0   100.0  100.0
                 III           50.0   50.0   0.0
B                I             100.0  100.0  0.0    0.0    100.0  100.0
                 II            75.0   50.0   25.0   75.0   100.0  50.0
                 III           42.9   50.0   80.0   80.0   75.0   100.0
C                I             100.0  175.0  0.0
                 II            0.0    88.9   -      66.7   -      0.0
                 III           66.7   63.6   70.0   50.0

Age 9½
A                I             100.0  -      50.0   100.0  100.0  33.3
                 II            33.3   33.3   50.0   100.0  100.0
                 III           50.0   50.0   0.0    0.0
B                I             100.0  100.0  0.0    0.0    100.0  0.0
                 II            75.0   50.0   25.0   75.0   100.0  50.0
                 III           42.9   50.0   80.0   80.0   75.0   75.0
C                I             0.0    233.0  0.0
                 II            0.0    88.9   -      66.7   0.0
                 III           66.7   61.8   90.0   75.0

Age 11½
A                I             50.0   -      50.0   100.0  100.0  50.0
                 II            33.3   33.3   50.0   100.0  100.0
                 III           0.0    50.0   0.0    0.0
B                I             100.0  100.0  0.0    0.0    100.0  100.0
                 II            75.0   60.0   20.0   100.0  100.0  50.0
                 III           57.1   50.0   83.3   57.1   50.0   88.9
C                I             200.0  -      0.0    -
                 II            0.0    83.3   50.0   0.0
                 III           42.9   60.0   63.6   25.0

Table 18

Sample Percentages by Sex and Age

Age    Female    Male
7½     47.8      52.2
9½     53.8      46.2
11½    47.2      52.8

Table 19

Population and Sample Percentages by Region and Age

Region                 Age    Population    Sample
1. Okanagan            7½     14.9          13.0
                       9½     15.1          12.8
                       11½    14.6          12.0
2. Metro               7½     38.4          41.7
                       9½     38.2          40.2
                       11½    35.9          38.9
3. Fraser Valley       7½     11.2          7.8
                       9½     11.0          8.6
                       11½    11.6          9.3
4. Vancouver Island    7½     17.3          16.5
                       9½     17.3          18.8
                       11½    17.6          17.6
5. Kootenay            7½     5.9           7.8
                       9½     5.9           7.8
                       11½    7.0           8.3
6. Northern            7½     12.3          13.0
                       9½     12.5          12.0
                       11½    13.2          13.9

Table 20

Population and Sample Percentages by Community Size and Age

Community Size        Age    Population    Sample
A: under 1,000        7½     16.9          13.0
                      9½     17.0          14.8
                      11½    16.6          13.0
B: 1,001 to 50,000    7½     42.0          41.7
                      9½     42.1          40.2
                      11½    44.1          47.2
C: over 50,000        7½     41.1          45.2
                      9½     40.9          45.3
                      11½    39.4          39.8

Table 20 shows a slight sample bias toward schools located in large urban rather than rural areas. This is in keeping with the inflated representation of the Metro region discussed for Table 19. Any trend toward the disproportionate inclusion of large units which might be suspected from the data to this point, however, is disproved in Table 21. More of the children tested were enrolled in small schools than was expected from the target sample allocation based on the population percentages. As was seen in Table 17, this proportion is attributed to the testing of more children in small schools in the Metro region than anticipated.

The percentages shown in Table 22 for level of education of head of household were derived for the entire province. As mentioned previously, it was not possible to stratify on this variable. Classification according to level of education was requested from parents after the sample had been drawn (see Parent Consent Form, Appendix A).

Table 21

The percentages shown in Table 22 for level of education of head of household were derived for the entire province. As mentioned previously, it was not possible to stratify on this variable. Classification accord• ing to level of education was requested from parents after the sample had been drawn (see Parent Consent Form, Appendix A). As shown, a much larger Table 21

Population and Sample Percentages by School- Size and Age •

School Size Population Sample

I to 150 7% 12.6 14.7 9% 10.9 14.5 11% 8.7 12.0 II 151 to 300 7% 29.2 27.8 9% 29.1 26.5 11% 27.4 26.8 III over 300 7% 58.2 57.4 9% 60.0 60.0 63.9 61.1

Table 22

Level of Education of Head of Household: Percent in Population and Sample (All Ages Combined)

Education Completed                      Population    Sample (n=258)
Grade 8 and below                        13.3          10.5
Grades 9 to 10                           16.0          18.6
Grades 11 to 13                          27.0          38.0
Post-secondary, non-university           20.0          15.9
Post-secondary, including university     23.6          17.0

Note: The number of parents who responded to this categorization was 258.

As shown, a much larger than expected percentage reported the highest level of schooling completed to be senior secondary (grades 11 to 13).

In summary, although the obtained sample size was less than the target size on which the allocations were based, the sample was dispersed in basically the same relative proportions for all stratification variables.

Results of Preliminary Analyses

Order Effect

The results of analyses of variance to determine if order of test administration had an effect on test performance are summarized in Table 23. An alpha level of .25 was selected since it was more important to protect against a Type II than a Type I error. Three significant F ratios were found. However, with 12 tests, the number of significant findings expected by chance alone is 3 (12 x .25). As seen in Table 23, the significant results were not limited to one test or age level. Thus it was concluded that the results were consistent with what would be expected by chance and that there was no systematic effect attributable to order.

Normality

The results of the Kolmogorov-Smirnov (K-S) one-sample goodness of fit tests are summarized in Table 24. The table reports D values for the point at which the relative cumulative frequency distributions of the theoretical normal and the observed distributions show the greatest divergence.

For D < K-S(α) it is concluded that there is no significant difference between the two distributions at a level of significance of α. Again, to reduce the chance of a Type II error, an alpha level of .20 was adopted.

This is the lowest significance level reported by Siegel (1956). With 45 tests (as shown in Table 24), the expected number of significant results by chance is 9. However, only one of the test score distributions was found to differ significantly from normality at α = .20. Thus the hypothesis of normally distributed intelligence test scores in the population was accepted.

Table 23

Summary of Analysis of Variance for Order of Test Administration

Age    Test    Source      SS           df     MS        F
7½     PPVT    Order       100.934      5      20.187    0.266
               Residual    8182.121     108    75.76
       SIT     Order       152.284      5      30.457    1.004
               Residual    3094.473     102    30.338
       SPM     Order       337.273      5      67.455    1.307
               Residual    4800.758     93     51.621
       MHVS    Order       128.171      5      25.634    1.603*
               Residual    1727.438     108    15.995
9½     PPVT    Order       531.039      5      106.208   1.296
               Residual    8768.102     107    81.945
       SIT     Order       205.454      5      41.091    0.607
               Residual    5961.504     105    56.776
       SPM     Order       691.859      5      138.372   2.252*
               Residual    6268.281     102    61.454
       MHVS    Order       22.535       5      4.507     0.148
               Residual    3261.435     107    30.481
11½    PPVT    Order       447.293      5      89.459    0.834
               Residual    10509.203    98     107.237
       SIT     Order       105.089      5      21.018    0.198
               Residual    9954.492     94     105.899
       SPM     Order       249.115      5      49.823    1.175
               Residual    4238.758     100    42.388
       MHVS    Order       282.376      5      56.475    1.462*
               Residual    3825.003     99     38.636

*p < .25

Table 24

D Values for Kolmogorov-Smirnov Tests for Goodness of Fit

Test                   7½        9½       11½
WISC-R
  Information          0.020     0.044    0.055
  Similarities         0.024     0.074    0.062
  Arithmetic           0.040     0.037    0.079
  Vocabulary           0.031     0.044    0.054
  Comprehension        0.043     0.030    0.029
  Picture Completion   0.026     0.031    0.069
  Picture Arrangement  0.048     0.034    0.107
  Block Design         0.053     0.044    0.034
  Object Assembly      0.040     0.068    0.034
  Coding               0.127*    0.066    0.048
PPVT                   0.102     0.058    0.063
SIT                    0.036     0.055    0.040
SPM                    0.098     0.051    0.031
MHVS: Set A            0.052     0.073    0.062
MHVS: Set B            0.015     0.062    0.074

*D > K-S(.20)

Comparison to Published Norms

The means and standard deviations of the tests scored using the published norms tables in their respective manuals are shown in Table 25. These were compared to the corresponding values reported in the test manuals (as shown in Chapter III, Table 12). The results of t tests for differences in means and χ² tests for differences in variance are shown in Table 26.

Significant mean score differences for the B.C. sample were found for all tests except the WISC-R Verbal IQ at age 11½ and the MHVS at ages 7½ and 11½. It will be recalled that the MHVS scoring procedure was established using the responses from this group of children rather than the original Colchester standardization sample. Therefore the MHVS means reported in Table 25 are based on restandardized scores for the B.C. sample. The results suggest that, when scored for contemporary usage, B.C. 7½ and 11½ year olds can define, on the average, approximately the same number of words on the MHVS as could the original reference group. The fact that the 9½ year olds scored significantly higher may possibly be attributed to a verbally precocious group whose WISC-R Vocabulary subtest and PPVT scores also indicate a higher level of performance than for the other age groups.

Significant differences in variance were also found for 8 of the 15 comparisons made. In each case, the variance for the B.C. sample was less than for the original standardization sample.

Table 25

Means and Standard Deviations for All Tests Scored Using Published Norms Tables

                         Age 7½               Age 9½               Age 11½
                         Mean      S.D.       Mean      S.D.       Mean      S.D.
WISC-R
  Information            10.722    (2.553)    10.641    (2.778)    9.880     (2.611)
  Similarities           11.168    (2.982)    11.243    (2.998)    10.870    (3.293)
  Arithmetic             10.826    (2.894)    11.216    (2.427)    11.259    (2.552)
  Vocabulary             10.719    (2.783)    11.128    (2.355)    9.917     (2.376)
  Comprehension          11.158    (2.965)    10.880    (2.516)    9.824     (2.437)
  Picture Completion     11.339    (2.309)    10.897    (2.513)    10.704    (2.503)
  Picture Arrangement    12.035    (3.267)    11.897    (2.836)    11.157    (2.721)
  Block Design           11.504    (3.360)    11.761    (2.996)    11.654    (2.921)
  Object Assembly        11.851    (2.652)    11.643    (2.998)    11.759    (3.057)
  Coding                 10.491    (2.841)    10.282    (2.834)    10.393    (2.867)
  Verbal IQ              106.078   (13.785)   106.009   (12.465)   101.685   (12.694)
  Performance IQ         109.922   (13.114)   108.855   (12.711)   107.685   (13.531)
  Full Scale IQ          108.157   (12.351)   107.889   (12.756)   104.796   (12.884)
PPVT IQ                  106.526   (15.251)   112.708   (14.348)   110.490   (14.991)
SIT IQ                   114.954   (11.905)   112.523   (12.676)   109.480   (15.418)
SPM (raw)                21.172    (7.241)    31.583    (8.065)    37.575    (6.538)
MHVS (raw)               15.412    (4.052)    23.549    (5.415)    28.410    (6.284)

Table 26

Results of t Tests and χ² Tests

Age    Test                     t         z(a)
7½     WISC-R Verbal IQ         4.73*     -1.19
       WISC-R Performance IQ    8.11*     -1.86
       WISC-R Full Scale IQ     7.08*     -2.63*
       PPVT IQ                  4.57*     0.28
       SIT IQ                   13.05*    -3.71*
       SPM                      5.73*     -(b)
       MHVS                     -1.55     -(b)
9½     WISC-R Verbal IQ         5.21*     -2.54*
       WISC-R Performance IQ    7.54*     -2.29*
       WISC-R Full Scale IQ     6.69*     -2.25*
       PPVT IQ                  9.42*     -0.62
       SIT IQ                   10.41*    -3.05*
       SPM                      9.77*     -(b)
       MHVS                     3.04*     -(b)
11½    WISC-R Verbal IQ         1.38      -2.22*
       WISC-R Performance IQ    5.90*     -1.40
       WISC-R Full Scale IQ     3.87*     -2.03*
       PPVT IQ                  7.14*     0.03
       SIT IQ                   6.15*     -0.48
       SPM                      4.06*     -(b)
       MHVS                     0.67      -(b)

*p < .05
(a) z = \sqrt{2\chi^2} - \sqrt{2\,df - 1} for df > 30.
(b) There is no measure of variability reported in the test manuals for the SPM and MHVS.

Comparison of WISC-R Results to "White" American Norms

The American WISC-R standardization sample included black children in the same proportion as in the population for the age range tested. Separate analyses for black and white children revealed that the former scored one standard deviation below the latter (Kaufman & Doppelt, 1976). As pointed out in Chapter I, Native Indian children in B.C. have been shown to score in a similar manner (Goldstein, 1980). It may therefore be hypothesized that the exclusion of Native Indian children from the B.C. sample served to inflate the IQ scores obtained. To check this, t tests for differences in means were applied to B.C. sample means and WISC-R "white" means. Kaufman and Doppelt (1976) reported the latter to be 102 for each of the three WISC-R IQ scales calculated across all age groups. The results of these t tests are given in Table 27. Using mean 102, significant differences were still found for eight of the nine means.

Table 27

t Tests for Differences in Means between Total B.C. Sample and American WISC-R "White" Sample (mean = 102)

                         7½       9½       11½
WISC-R Verbal IQ         3.17*    3.48*    -0.26
WISC-R Performance IQ    6.48*    5.83*    4.37*
WISC-R Full Scale IQ     5.35*    4.99*    2.26*

*p < .05

Preparation of B.C. Norms

Given the differences noted in the previous section, B.C. norms were prepared for all five tests. The results of renorming are discussed below.

All norms tables are included in Appendix E.

WISC-R Scaled Scores

Two procedures, described in Chapter III, were compared for producing normalized scaled score conversions for WISC-R subtest raw scores. For clarity in the following discussion, these are referred to as "analytical" (i.e., the publisher's method) and "graphic" (Angoff, 1971). The objective in either case was to produce conversion tables relating raw score values for each subtest to scaled scores having range 1-19, mean 10, and standard deviation 3. An example of the results of the two procedures is shown in Table 28 for the Arithmetic subtest at age 9½. The raw score to scaled score conversions associated with the analytical procedure are listed in column 2 and those associated with the graphic procedure are listed in column 3.

Use of the analytical procedure is relatively quick and its results are essentially verifiable. The direct raw score to scaled score conversions, however, are limited by the range of raw scores obtained in the particular sample. For the subtest illustrated, the total raw score range is 1-18. The obtained score range for the B.C. sample was 7-17. This leaves 'gaps' in the extremes of the distributions and necessitates extrapolation of the raw scores to correspond to the scaled score range of 1-19. The rule of thumb adopted for this was to extend the scores in a manner that ensured a progression from age to age (Wechsler, 1974, p. 21). Thus, in general, increasingly higher raw score values were associated with any given scaled score as age increased.

Table 28

Comparison of Results of Analytical and Graphic Scaled Score Conversions, Arithmetic Subtest, Age 9½

Scaled Score    Raw Scores (Analytical)    Raw Scores (Graphic)
1               0-2                        5
2               3-4                        6
3               5-6                        7
4               7                          8
5               8                          9
6               9                          10
7               10                         -
8               11                         11
9               -                          12
10              12                         -
11              -                          13
12              13                         -
13              -                          14
14              14                         15
15              16                         15
16              17                         16
17              18                         17
18              19                         18

                        Analytical    Graphic
Mean                    10.05         9.21
Standard Deviation      2.95          3.04

The graphic approach was attempted in the hope that the shape of the graph would dictate the solution for extrapolation, thereby reducing any arbitrariness in the procedure. The graph produced for the Arithmetic subtest, age 9½, is shown in Figure 5. Guidelines for the positioning and smoothing of the curve were offered by Angoff:

The smoothed curve should in general sweep through the points in such a way as to equalize the divergences of the points on either side of the line. (1971, p. 516)

The positioning of the tails of the curve, however, remains speculative and subject to considerable variability. To judge the efficiency of this procedure, the mean and standard deviation of the scaled scores were calculated and compared to the corresponding values for the analytical procedure. These are shown below the relevant columns in Table 28. The results shown for this example are indicative of those found across subtests and age levels and reveal a consistent grapher bias toward low scores.

The tedious and time-consuming requirements of re-graphing, coupled with the failure of the procedure to fix the conversion line for extreme scores, resulted in the rejection of this procedure in favor of the analytical method. As a result, the WISC-R subtest scaled scores were derived according to the procedures used by the publisher, the Psychological Corporation. Reference to Table 29 shows that the desired mean and standard deviation were achieved with this approach.

The complete conversion tables for all subtests are included in Appendix E. The means and standard deviations for the B.C. sample when scored using these tables are presented in Table 29.


Table 29

Means and Standard Deviations for WISC-R Scaled Scores and Sums of Scaled Scores, British Columbia Norms

                         7½                 9½                 11½
                         Mean     S.D.      Mean     S.D.      Mean     S.D.
Verbal Subtests
  Information            10.05    2.92      10.10    3.09      9.92     3.00
  Similarities           10.04    3.09      10.03    3.03      9.92     2.91
  Arithmetic             9.96     2.88      10.05    2.92      9.97     2.98
  Vocabulary             10.14    2.90      10.09    2.91      10.03    2.97
  Comprehension          9.97     2.87      10.08    3.05      9.96     3.00
Performance Subtests
  Picture Completion     10.29    2.99      10.20    3.08      9.84     2.84
  Picture Arrangement    9.96     3.01      10.02    3.05      9.98     3.01
  Block Design           10.15    2.85      9.94     3.09      9.98     3.07
  Object Assembly        10.05    2.97      9.96     3.00      9.97     2.95
  Coding                 10.12    2.99      9.98     3.06      10.00    3.04
Sums of Scaled Scores
  Verbal Score           50.16    10.74     50.35    11.45     49.80    11.48
  Performance Score      50.57    9.23      50.09    9.71      49.78    10.03
  Full Scale Score       100.73   16.81     100.44   19.05     99.57    19.01

All Ages Combined        Mean     S.D.
  Verbal Score           50.10    11.19
  Performance Score      50.16    9.64
  Full Scale Score       100.26   18.26

WISC-R IQ Scales

To check the apparent similarity of means and standard deviations for subtests and sums of scaled scores across age levels, one-way fixed-effects analyses of variance and Cochran's test for homogeneity of variance

(Winer, 1971) were used. The results are summarized in Appendix F.

In no case were significant differences found for α = .05. Therefore

the Psychological Corporation procedure described in Chapter III was adopted (cf. Wechsler, 1974). The means and standard deviations for the

sums of scaled scores for all age groups combined, as shown in Table 29, were used as the basis for constructing IQs.

PPVT and SIT IQ Scores

IQ conversion tables for the PPVT and SIT were constructed separately at each age level using the appropriate raw score means and standard deviations in the conversion equation in Chapter III. These values are shown

in Table 30.

Table 30

Raw Score Means and Standard Deviations for PPVT and SIT

            7½                9½                11½
       Mean    (S.D.)    Mean    (S.D.)    Mean     (S.D.)

PPVT 68.16 (8.56) 82.41 (9.11) 91.12 (10.31)

SIT    106.45  (5.51)   118.19  (7.49)   129.23   (10.08)
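The conversion equation referred to above is the standard linear transformation to a deviation IQ scale (mean 100, standard deviation 15). A minimal sketch under that assumption, using the age-7½ PPVT values from Table 30, is given below; the function name is introduced here for illustration.

```python
def iq_from_raw(raw, mean, sd):
    """Linear conversion of a raw score to a deviation IQ (mean 100, SD 15)."""
    return round(100 + 15 * (raw - mean) / sd)

# Age-7½ PPVT values from Table 30: mean 68.16, S.D. 8.56.
print(iq_from_raw(68.16, 68.16, 8.56))  # 100: a raw score at the mean
print(iq_from_raw(77, 68.16, 8.56))     # 115: roughly one SD above the mean
```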

Statistical Properties of the Tests

The internal consistency coefficients and associated standard errors of measurement for each test are reported in Table 31. The coefficients were calculated following the procedures identified in Table 15, Chapter III. The standard errors of measurement are in scaled score units for the WISC-R subtests; in IQ units for the three WISC-R IQ scales, the PPVT, and the SIT; and in raw score units for the SPM and MHVS.
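The relation between a reliability coefficient and its standard error of measurement can be made explicit in a few lines. The internal consistency routine shown is coefficient alpha, used here only as a stand-in for the procedures identified in Table 15, Chapter III, which are not reproduced in this section; the function names are assumptions.

```python
import numpy as np

def coefficient_alpha(items):
    """Coefficient alpha for an (examinees x items) matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_var_sum = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_var_sum / total_var)

def sem(sd, reliability):
    """Standard error of measurement in the units of the reported scale."""
    return sd * np.sqrt(1 - reliability)

# An IQ scale (SD 15) with reliability .88 yields an SEM near 5.2,
# in line with the PPVT entries reported in Table 31:
print(round(float(sem(15, 0.88)), 2))  # 5.2
```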

Comparison of these data with the corresponding published data shows that, for the WISC-R (cf. Wechsler, 1974, p. 28), the B.C. reliability coefficients are generally lower, as would be expected from the reduced variability in the B.C. sample. The standard errors of measurement, in turn, are slightly higher for the B.C. sample (cf. Wechsler, 1974, p. 30).

The B.C. reliability coefficients are also lower than the reliability of

.97 reported for the SIT (Slosson, 1977, p. v). However, the latter was based on a test-retest procedure using a group with very high variability.

The findings were reversed for the PPVT (cf. Dunn, 1965, p. 30): higher reliabilities and lower standard errors of measurement were found for the

B.C. sample. Although the variability was comparable for the PPVT standardization sample and the B.C. sample, the original PPVT reliability coefficient was calculated using the parallel form method, which is known to yield lower coefficients than other methods.

Table 31

Internal Consistency Coefficients and Standard Errors of Measurement

                             7½              9½              11½
                        r_xx   SE_M     r_xx   SE_M     r_xx   SE_M

WISC-R
  Information           .69    1.63     .68    1.75     .86    1.12
  Similarities          .68    1.75     .77    1.46     .80    1.30
  Arithmetic            .72    1.52     .65    1.72     .70    1.63
  Vocabulary            .59    1.86     .75    1.46     .82    1.26
  Comprehension         .66    1.68     .98     .43     .71    1.62
  Picture Completion    .61    1.87     .68    1.74     .67    1.63
  Picture Arrangement   .75    1.51     .55    2.05     .44    2.30
  Block Design          .85    1.10     .81    1.35     .77    1.47
  Object Assembly       .63    1.81     .67    1.72     .62    1.82
  Coding                 -       -       -       -       -       -
  Verbal IQ             .86    5.39     .93    4.07     .93    4.08
  Performance IQ        .88    4.98     .83    6.23     .82    6.61
  Full Scale IQ         .86    5.16     .88    5.41     .87    5.63

PPVT                    .88    5.20     .87    5.41     .87    5.41

SIT                     .78    7.04     .79    6.87     .88    5.20

SPM                     .89    2.40     .92    2.28     .83    2.70

MHVS                    .76    1.98     .83    2.23     .88    2.18

Interpretation of the Norms

The renorming procedures for the WISC-R, PPVT, and SIT involved the re-alignment of the IQ scales for these three tests to mean 100 and standard deviation 15. Since the B.C. sample scored higher and with less variability than the American standardization samples using American norms, the effect of rescaling was to lower and spread out the IQ score scales.

The result of this procedure is that below average students will score lower with B.C. than American norms, while students who are much above average will score the same or even higher with B.C. norms. The effects of the rescaling process are shown graphically in Figure 6. The curves were constructed from the actual distributions of WISC-R Full Scale IQ scores pooled across all age groups for the 340 children tested. The peaked curve shows the distribution of IQs when the tests were scored using

American norms. The more symmetrical curve shows the distribution of

IQs for the same tests scored using the new B.C. norms. The effect of renorming on any given score value can be appreciated by imagining the peaked curve to be stretched and compressed into the shape of the second curve. The more pronounced effect on the score changes for below average than above average children can be seen from this diagram.

One of the immediate questions that arises is the effect of this lowering of scores on the way children are classified for educational purposes. Table 32 shows the IQ score ranges and classification categories reported in the manuals for the WISC-R, PPVT, and SIT respectively. The percentages of the B.C. sample falling within each classification are given for the tests scored with American norms and with B.C. norms. The percent represented in each category for B.C.-scored tests corresponds to the percent expected based on the normal curve distribution.


Table 32

Percent of Total Sample in IQ Classification Categories for B.C. Norms and American Norms

                                            American    B.C.      Theoretical
IQ               Classification             Norms       Norms     Normal

WISC-R (n=340)
130 and above    Very Superior                4.1        2.6        2.2
120-129          Superior                    12.4        7.4        6.7
110-119          High Average (Bright)       25.3       15.9       16.1
90-109           Average                     50.6       49.7       50.0
80-89            Low Average (Dull)           6.2       15.3       16.1
70-79            Borderline                   1.2        7.4        6.7
69 and below     Mentally Deficient            .3        1.8        2.2

PPVT (n=331)
125 and above    Very Rapid Learner          16.9        4.2
110-124          Rapid Learner               31.4       19.6
90-109           Average Learner             43.8       47.4
75-89            Slow Learner                 6.0       24.5
Below 75         Very Slow Learner            1.8        4.2

SIT (n=319)
140 and above    Very Superior                2.2         .3
120-139          Superior                    27.6        9.1
110-119          Bright                      28.5       18.5
90-109           Average                     37.6       47.0
80-89            Dull                         3.1       18.8
70-79            Borderline                    .6        5.0
50-69            Mild Retardation              .3         .6
20-49            Moderate Retardation          -          .6
0-19             Severe Retardation            -          -

Note. WISC-R classifications from the WISC-R Manual (Wechsler, 1974, p. 26); PPVT classifications from the PPVT Manual (Dunn, 1965, p. 11); SIT classifications from the SIT Manual (Slosson, 1977, inside front cover).

For all three tests, the shift to lower classifications is obvious and pronounced. In actual numbers, using WISC-R Full Scale IQs, five times as many children score 70 and below using B.C. norms as using American norms. For the

SIT, this number is almost seven times as many. For the below 75 classification on the PPVT, a little more than twice as many children are included when scored with B.C. as opposed to American norms.

As seen in this section, the IQ tests (WISC-R, PPVT, SIT) were rescaled to conform with conventional use and interpretation: that is, with mean 100 and standard deviation 15. The effect of these adjustments is

a lowering of the score value assigned to a given level of performance.

The implications of these results are discussed in Chapter V.

One further set of tables was prepared following discussions with practitioners concerning the usefulness of the results of the study. The percentile ranks of IQ scores corresponding to selected standard deviations were calculated for the B.C. sample using the published norms. These are

included with the norms tables in Appendix E.

Equating

This section is organized as follows. First, nominally parallel

test pairs are identified and assigned to observed score and true score

equating methods. Following this, there is a comparative examination of

the outcomes of applying all four procedures—linear observed score equating,

linear true score equating, graphic equipercentile equating, and analytical

equipercentile equating—to two pairs of scores. The pairs chosen for

this illustration were: 1) WISC-R Verbal and PPVT, age 7½, and 2) WISC-R Verbal and MHVS, age 11½. These represent test pairs having low (corrected r_XY = .68) and high (corrected r_XY = .92) disattenuated correlation coefficients respectively. Although this format necessitates some discussion in order to interpret and assess the significance of the results, a more general discussion of the applicability and suitability of equating these intelligence tests is reserved for Chapter V. Finally, the equivalency tables resulting from the equating procedures are presented.

Identification of Nominally Parallel Test Pairs

The matrix of correlation coefficients, corrected for attenuation, is presented in Table 33. The coefficients were computed using standardized

Table 33

Observed Score Correlation Coefficients Corrected for Attenuation

             WISC-R Standardized    Standardized           Standardized
Raw Score    Verbal Score           Performance Score      Full Scale Score

Age 7½
  PPVT            .68*                   .35                    .62
  SIT             .94*                   .40                    .79*
  SPM             .34                    .50                    .48
  MHVS            .88*                   .38                    .74

Age 9½
  PPVT            .72*                   .52                    .69
  SIT             .76*                   .67                    .80*
  SPM             .46                    .60                    .58
  MHVS            .82*                   .43                    .70

Age 11½
  PPVT            .78*                   .46                    .71
  SIT             .92*                   .67                    .89*
  SPM             .52                    .68*                   .67
  MHVS            .92*                   .59                    .86

*Nominally parallel test pairs

WISC-R scores and raw scores for the other four tests. The statistical criterion for nominal parallelism was defined as corrected r > .70.
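The correction for attenuation is the classical Spearman formula, dividing the observed correlation by the geometric mean of the two reliabilities. In the sketch below, the observed correlation is illustrative only, back-calculated from the reported coefficients rather than taken from the data.

```python
import math

def corrected_r(r_xy, r_xx, r_yy):
    """Correlation corrected for attenuation: r_xy / sqrt(r_xx * r_yy)."""
    return r_xy / math.sqrt(r_xx * r_yy)

# WISC-R Verbal and MHVS at age 11½ have reliabilities .93 and .88
# (Table 34); an observed correlation near .83 disattenuates to about .92.
print(round(corrected_r(0.83, 0.93, 0.88), 2))
```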

Application of this criterion showed that all test pairs identified as psychologically parallel additionally satisfied the criterion for statistical parallelism (with the exception of the SPM and WISC-R Performance at ages 7½ and 9½). Thus the 13 pairs of tests indicated by the asterisks in Table 33 were selected for equating.

It was noted in Chapter III that the linear true score equating procedure was based on converted IQ scores rather than on observed scores. In that chapter it was further pointed out that there was a close correspondence between WISC-R standardized and WISC-R IQ scores. Similarly, the correlation coefficients computed using observed scores and IQ scores are virtually identical. Therefore the statistical evidence of nominal parallelism presented in Table 33 applies to both observed score and IQ score distributions.

Designation of Test Pairs to Equating Methods

As discussed in Chapter III, the critical difference value for the determination of equally reliable tests was set at .06. Test pairs having reliabilities differing by no more than .06 were assigned to observed score equating methods (both linear and equipercentile), while tests whose reliabilities differed by more than .06 were assigned to the linear true score equating method. These designations are shown in Table 34.
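Expressed as a rule, the designation reduces to comparing the two reliability coefficients against the .06 criterion; a minimal sketch follows, with the function name introduced here for illustration.

```python
def equating_method(r_xx, r_yy, tolerance=0.06):
    """Assign a test pair to an equating method by the reliability criterion."""
    if abs(r_xx - r_yy) <= tolerance:
        return "observed score equating (linear and equipercentile)"
    return "linear true score equating"

# From Table 34: WISC-R Verbal and MHVS at age 11½ (.93, .88) differ by
# .05 and stay with observed score equating; WISC-R Verbal and SIT at
# age 7½ (.86, .78) differ by .08 and move to true score equating.
print(equating_method(0.93, 0.88))
print(equating_method(0.86, 0.78))
```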

A Comparative Examination of Equating Procedures

I. MHVS and WISC-R Verbal

Table 35 consists of a series of "equivalent" score values for the

MHVS and WISC-R Verbal IQ (B.C. normed) at age 11½ which were produced by the different equating procedures. It will be noted that the converted IQ

Table 34

Test Pairs Identified for Equating Methods

                                      Disattenuated
                            Age       Correlation      Reliability
Test Pair                   Level     Coefficient      Coefficients(a)

Observed Score Equating (Linear and Equipercentile)

WISC-R Verbal : PPVT         7½          .68            .86, .88
                             9½          .72            .93, .87
                            11½          .78            .93, .87
WISC-R Verbal : SIT         11½          .92            .93, .88
WISC-R Full Scale : SIT     11½          .89            .87, .88
WISC-R Performance : SPM    11½          .68            .82, .83
WISC-R Verbal : MHVS        11½          .92            .93, .88

True Score Equating (Linear)

WISC-R Verbal : SIT          7½          .94            .86, .78
                             9½          .76            .93, .79
WISC-R Full Scale : SIT      7½          .79            .86, .78
                             9½          .80            .88, .79
WISC-R Verbal : MHVS         7½          .88            .86, .76
                             9½          .82            .93, .83

(a) WISC-R reliability listed first

Table 35

Comparative Example of Linear Equating and Equipercentile Equating for Converting MHVS Raw Scores to WISC-R Verbal IQ Scores (Age 11½)

r_XY = .92, r_XX = .93, r_YY = .88

                 Linear Equating                Equipercentile Equating
MHVS          Observed      True                 Interpolated     Curve-Fitting
Raw Scores    Score Y*_O    Score Y*_T  Graphic     Scores      Linear    Polynomial

(1)               -             -       (56.5)      49.00        35.78      48.89
 7              46.95         45.50       63.0      62.00        49.99      61.49
13              61.69         60.65       69.3      64.00        64.20      65.58
15              66.61         65.70       71.7      66.00        68.94      68.57
16              69.06         68.22       72.8      72.00        71.31      70.35
18              73.98         73.27       75.4      74.00        76.04      74.35
19              76.94         75.80       77.0      78.60        78.41      76.50
20              78.89         78.32       78.6      79.20        80.78      78.73
21              81.35         80.85       80.3      81.00        83.15      81.00
22              83.81         83.37       82.5      85.25        85.52      83.31
23              86.26         85.90       84.6      86.50        87.89      85.64
24              88.72         88.42       86.8      87.50        90.26      87.99
25              91.18         90.95       89.2      88.67        92.63      90.36
26              93.64         93.47       91.5      92.60        94.99      92.78
27              96.09         96.00       93.9      93.83        97.36      95.25
28              98.55         98.52       96.8      96.50        99.73      97.79
29             101.01        101.05       99.5      99.50       102.10     100.41
30             103.46        103.57      102.6     101.67       104.47     103.15
31             105.92        106.10      105.7     108.00       106.84     106.00
32             108.38        108.62      108.8     109.00       109.21     108.98
33             110.83        111.15      111.8     112.50       111.58     112.06
34             113.29        113.67      114.7     117.00       113.94     115.23
35             115.75        116.20      117.5     119.50       116.31     118.44
36             118.20        118.72      120.3     123.00       118.68     121.60
38             123.12        123.77      125.5     125.00       123.42     127.29
39             125.58        126.30      128.0     128.00       125.79     129.47
40             128.03        128.82      130.3     131.00       128.16     130.88
41             130.49        131.35      132.9     133.00       130.53     131.20

Ȳ = 28.41    RMS_O = 8.87   RMS_T = 9.62   RMS_G = 9.13   RMS_I = 9.47   RMS_L = 9.40   RMS_P = 9.24

values are reported to two decimal places. Since the purpose of this part of the study was to examine the efficacy of equating procedures, it was desirable to retain precision to the second decimal point. In the preparation of equivalency tables for practitioners' use, however, these would be rounded to the nearest whole number (cf. Loret et al., 1974).

In the first column the raw score values for the MHVS are listed.

The obtained score range on this test was 7-41: the value 1 shown in parentheses was derived from the analytical equipercentile procedure and is described in the section for that procedure.

The next two columns contain the results of linear equating: the converted WISC-R scores from the observed score and true score procedures are shown in the columns labelled Y*_O and Y*_T respectively. The results of the equipercentile equating procedures are shown in columns 4 to 7. The converted scores from the hand-graphing procedure are listed first (column

4), followed by the results from the computerized analytical program (columns

5-7). Column 5 shows the interpolated scores, and columns 6 and 7 the converted scores resulting from the linear and polynomial curve-fitting procedures. Below the table, the following statistics are presented to aid in the interpretation of the results:

Ȳ = mean of the MHVS raw scores

RMS = conditional root-mean-square error of equating. The subscripts O, T, G, I, L, and P indicate the correspondence of these errors to the observed score, true score, graphic, interpolated, linear curve-fitting, and polynomial curve-fitting columns respectively.
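Read computationally, the conditional root-mean-square error is the discrepancy between each examinee's observed criterion score and the score assigned by the conversion line, aggregated across the sample. The sketch below assumes that reading; the function name is illustrative.

```python
import numpy as np

def rms_error_of_equating(y_observed, y_converted):
    """Root-mean-square discrepancy between observed criterion scores
    and the values assigned by a conversion line."""
    y_observed = np.asarray(y_observed, dtype=float)
    y_converted = np.asarray(y_converted, dtype=float)
    return float(np.sqrt(np.mean((y_observed - y_converted) ** 2)))
```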

The standard errors of equating for the two linear procedures were omitted from Table 35 because of space limitations. They are shown in Table 36.

Table 36

Standard Errors of Equating for MHVS Raw Scores to WISC-R Verbal IQ Scores (Age 11½)

MHVS Raw Scores    S_Y*_O    S_Y*_T

 7                  3.15      3.28
13                  2.36      2.44
15                  2.10      2.17
16                  1.98      2.04
18                  1.73      1.78
19                  1.62      1.66
20                  1.50      1.54
21                  1.40      1.42
22                  1.29      1.31
23                  1.20      1.21
24                  1.12      1.12
25                  1.04      1.04
26                   .99       .98
27                   .95       .92
28                   .93       .91
29                   .93       .92
30                   .96       .94
31                  1.00       .99
32                  1.06      1.05
33                  1.13      1.14
34                  1.22      1.23
35                  1.31      1.33
36                  1.42      1.44
38                  1.64      1.68
39                  1.76      1.81
40                  1.88      1.93
41                  2.00      2.06

Linear equating. As shown in Table 35, column 2, the converted values for the linear observed score procedure ranged from 46.95 to 130.49. The actual obtained score range was 49 to 133. Taken across the actual scores, the conditional root-mean-square error was 8.87. The corresponding error for the linear true score procedure was slightly larger: 9.62. The extent of the discrepancies between observed and converted scores is shown more fully in Figure 7. This diagram shows a scatterplot of the obtained scores with both linear conversion lines superimposed on it. Discrepancies between an observed WISC-R Verbal IQ and the corresponding equated value for a given

MHVS raw score are represented by the length of the vertical segment between an observed score point (represented by a closed circle) and the conversion line (e.g., see vertical segment A).
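The linear observed score procedure reduces to the familiar mean-sigma conversion, which matches the means and standard deviations of the two score distributions. The sketch below assumes that form; the function name is introduced here for illustration.

```python
import numpy as np

def linear_observed_score_equating(x_scores, y_scores):
    """Mean-sigma linear equating: returns a function mapping X to Y*."""
    x = np.asarray(x_scores, dtype=float)
    y = np.asarray(y_scores, dtype=float)
    slope = y.std(ddof=1) / x.std(ddof=1)      # match the SDs
    intercept = y.mean() - slope * x.mean()    # match the means
    return lambda raw: slope * raw + intercept
```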

Graphic equipercentile equating. The converted scores listed in the column of Table 35 labelled "Graphic" were derived following the steps outlined in Chapter III and illustrated on the following pages. Table 37 shows the preparation of the relative cumulative frequency distributions for the two tests. The graphs of these values are pictured in Figure 8, and equipercentile points corresponding to selected percentiles from the ogives are recorded in Table 38. The actual conversion line represents the smoothed graph of the equipercentile points plotted against each other (Figure 9).

The final equivalent values for converting MHVS raw scores to WISC-R Verbal

IQs, reported in column 4, Table 35, were read from the graph in Figure 9.

As shown, the converted scores corresponding to the obtained MHVS scores ranged from 63.0 to 132.9. The root-mean-square error of equating was 9.13.
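A computational analogue of reading matched percentile points from the two ogives is sketched below, with simple linear interpolation standing in for hand smoothing. The function name and the mid-percentile convention are assumptions introduced for illustration.

```python
import numpy as np

def equipercentile_convert(x_scores, y_scores, x_new):
    """Return the Y value sharing the percentile rank of x_new in X."""
    x = np.sort(np.asarray(x_scores, dtype=float))
    y = np.sort(np.asarray(y_scores, dtype=float))
    # mid-percentile rank of x_new among the X scores
    p = (np.searchsorted(x, x_new, side="left")
         + np.searchsorted(x, x_new, side="right")) / (2.0 * len(x))
    # invert the Y ogive at rank p by linear interpolation
    ranks = (np.arange(len(y)) + 0.5) / len(y)
    return float(np.interp(p, ranks, y))
```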

Figure 7. Scatterplot of obtained scores with linear observed score and true score equating conversion lines. MHVS to WISC-R Verbal IQ, age 11½.

Table 37

Distributions of MHVS Raw Scores and WISC-R Verbal IQ Scores, Age 11½

MHVS
Raw Score   Frequency   Cumulative Frequency   % Below

41              1             105                99.0
40              1             104                98.1
39              1             103                97.2
38              2             102                95.2
36              4             100                91.4
35              6              96                85.7
34              5              90                81.0
33              8              85                73.3
32              4              77                69.5
31              6              73                63.8
30              4              67                60.0
29             17              63                43.8
28              4              46                40.0
27              7              42                33.3
26              7              35                26.7
25              5              28                21.9
24              2              23                20.0
23              4              21                16.2
22              4              17                12.4
21              3              13                 9.5
20              1              10                 8.6
19              3               9                 5.7
18              1               6                 4.8
16              1               5                 3.8
15              1               4                 2.9
13              1               3                 1.9
 7              2               2

WISC-R
Verbal IQ   Frequency   Cumulative Frequency   % Below

133             1             105                99.0
131             1             104                98.1
128             1             103                97.1
125             1             102                96.2
124             1             101                95.2
123             1             100                94.3
121             4              99                90.5
119             5              95                85.7
117             1              90                84.8
115             2              89                82.9
113             4              87                79.0
112             5              83                74.3
111             1              78                73.3
109             4              77                69.5
108             2              73                67.6
107             3              71                64.8
105             5              68                60.0
104             4              63                56.2
103             3              59                53.3
101             1              56                52.4
100             4              55                48.6
 99             3              51                45.7
 97             4              48                41.9
 96             1              44                41.0
 94             6              43                35.2
 93             5              37                30.5
 92             1              32                29.5
 90             1              31                28.6
 89             6              30                22.9
 88             4              24                19.0
 86             4              20                15.2
 85             1              16                14.3
 84             1              15                13.3
 82             1              14                12.4
 81             5              13                 7.6
 78             1               8                 6.7
 76             2               7                 4.8
 72             1               5                 3.8
 66             2               4                 1.9
 62             1               2                 1.0
 49             1               1


Figure 8. Cumulative relative frequencies for MHVS raw scores and WISC-R Verbal IQs, age 11½.

Table 38

Equipercentile Points for MHVS and WISC-R Verbal IQ from Graphic Procedure

Percentile Rank     MHVS     WISC-R Verbal

 0.5                 6.5         62.5
 0.7                 8.0         64.0
 1.0                 9.5         65.7
 1.4                11.3         67.5
 1.8                12.6         68.8
 2.0                13.2         69.5
 3                  15.0         71.8
 4                  16.8         73.6
 5                  17.8         75.3
 8                  20.0         78.7
12                  22.0         82.2
17                  23.5         85.5
20                  24.3         87.2
24                  25.2         89.4
30                  26.4         92.3
34                  27.0         94.2
40                  28.0         96.8
46                  29.0         99.5
50                  29.5        101.0
54                  30.0        102.6
60                  31.0        105.0
66                  31.7        107.5
70                  32.3        109.2
74                  32.8        111.0
78                  33.5        112.8
82                  34.2        115.5
85                  34.8        116.8
88                  35.5        118.4
90                  36.0        119.8
93                  37.0        122.2
95                  37.7        124.3
97                  39.0        127.3
98                  39.8        129.5
98.4                40.3        130.5
98.8                40.8        132.0


Figure 9. Hand-graphed equipercentile equating conversion line. MHVS to WISC-R Verbal IQ, age 11½.

Analytical equipercentile equating. As described in Chapters II and III, the first step of the analytical procedure was to produce interpolated distributions for both MHVS raw scores and WISC-R Verbal IQ scores. The converted values reported in Table 35 correspond only to the obtained MHVS scores with one exception: the score of 1, shown in parentheses, was produced by the interpolation procedure to correspond to the obtained WISC-R Verbal IQ of 49.

The second step of the procedure, curve-fitting, represented an attempt to fit a regression line to the interpolated score distributions using both a linear and a polynomial functional equation. The bivariate interpolated score distribution and the linear and polynomial regression lines are shown in Figure 10. The root-mean-square errors corresponding to each of these were 9.47, 9.40, and 9.24 respectively. As in Figure 7, the discrepancy between an observed and a converted score is indicated by the length of the vertical segment A.
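A sketch of the curve-fitting step follows. The polynomial degree used by Lindsay and Prichard (1971) is not restated in this section, so the degree shown is only an assumption for illustration.

```python
import numpy as np

def fit_conversion_lines(x_interp, y_interp, degree=3):
    """Fit linear and polynomial regressions to interpolated score pairs."""
    linear = np.poly1d(np.polyfit(x_interp, y_interp, 1))
    poly = np.poly1d(np.polyfit(x_interp, y_interp, degree))
    return linear, poly
```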

The conditional root-mean-square errors of equating (RMS_L and RMS_P) were derived from the linear and polynomial regression equations, respectively, generated by the curve-fitting procedures. These represent a different type of error estimate than that reported by Lindsay and Prichard (1971), as will be discussed in a later section.

II. PPVT and WISC-R Verbal

The results of the second example, in which each of the equating procedures was applied to PPVT IQ and WISC-R Verbal IQ scores at age 7½, are presented in Tables 39 and 40. These tables are identical in format to Tables 35 and 36 for the first example. Figure 11 shows a scatterplot of the obtained scores used in the linear equating procedures and includes the two linear conversion lines produced by the observed score and true score


Figure 10. Scatterplot of obtained scores with interpolated, linear curve-fitting, and polynomial curve-fitting conversion lines: analytical equipercentile equating procedure. MHVS to WISC-R Verbal IQ, age 11½.

Table 39

Comparative Example of Linear Equating and Equipercentile Equating for Converting PPVT IQs to WISC-R Verbal IQs (Age 7½)

r_XY = .68, r_XX = .86, r_YY = .88

              Linear Equating                Equipercentile Equating
PPVT       Observed      True                 Interpolated     Curve-Fitting
IQ         Score Y*_O    Score Y*_T  Graphic     Scores      Linear    Polynomial

 70          71.19         71.52     (<70.0)     70.00        71.36      71.12
 75          76.24         76.51       73.0      71.33        76.25      69.43
 77          77.92         78.18       74.6      72.67        77.88      71.53
 79          79.61         79.84       76.0      76.00        79.51      74.24
 80          81.29         81.51       76.8      76.50        81.14      77.29
 82          82.98         83.17       78.7      80.25        82.77      80.45
 84          84.66         84.84       81.5      81.00        84.41      83.56
 86          86.34         86.50       85.4      85.50        86.04      86.50
 87          88.03         88.16       87.5      88.00        87.67      89.19
 89          89.71         89.83       91.0      93.67        89.30      91.62
 91          91.39         91.49       93.7      95.20        90.93      93.77
 93          93.08         93.16       96.4      96.60        92.56      95.66
 94          94.76         94.82       97.6      99.00        94.20      97.34
 96          96.44         96.49       99.7      99.67        95.82      98.83
 98          98.13         98.15      101.4      99.83        97.46     100.18
100          99.81         99.82      102.8     101.29        99.08     101.45
101         101.50        101.48      103.6     101.57       100.72     102.68
103         103.18        103.14      104.8     102.43       102.35     103.92
105         104.86        104.81      106.0     104.67       103.98     105.18
107         106.55        106.47      107.3     107.14       105.61     106.52
108         108.23        108.14      108.0     107.86       107.24     107.92
110         109.91        109.80      109.4     109.50       108.88     109.42
112         111.60        111.46      111.0     111.50       110.51     110.99
114         113.28        113.13      112.6     112.75       112.14     112.61
115         114.96        114.79      113.4     113.00       113.77     114.26
117         116.65        116.46      115.0     115.00       115.40     115.91
119         118.33        118.12      116.7     119.00       117.03     117.52
121         120.02        119.79      118.4     120.50       118.67     119.07
124         123.38        123.12      121.0     122.00       121.93     121.78
128         126.75        126.44      124.7     123.50       125.18     123.90
130         128.43        128.11      126.6     124.00       126.82     124.75
133         131.80        131.44      129.7     126.33       130.08     126.35
135         133.48        133.10      131.6     127.67       131.71     127.32
137         135.17        134.77      133.8     129.00       133.34     128.65

Ȳ = 100.00    RMS_O = 13.08   RMS_T = 13.00   RMS_G = 13.18   RMS_I = 12.94   RMS_L = 12.89   RMS_P = 12.90

Table 40

Standard Errors of Equating for PPVT to WISC-R Verbal IQ Scores (Age 7½)

PPVT IQs    S_Y*_O    S_Y*_T

 70          2.50      2.53
 75          2.18      2.19
 77          2.07      2.08
 79          1.97      1.98
 80          1.87      1.87
 82          1.78      1.77
 84          1.69      1.68
 86          1.60      1.59
 87          1.52      1.50
 89          1.45      1.43
 91          1.39      1.36
 93          1.33      1.30
 94          1.29      1.25
 96          1.25      1.21
 98          1.23      1.19
100          1.22      1.18
101          1.23      1.18
103          1.24      1.20
105          1.28      1.23
107          1.32      1.28
108          1.37      1.34
110          1.43      1.40
112          1.50      1.48
114          1.58      1.56
115          1.66      1.65
117          1.75      1.74
119          1.84      1.84
121          1.94      1.94
124          2.14      2.16
128          2.36      2.38
130          2.47      2.50
133          2.69      2.73
135          2.81      2.85
137          2.92      2.97


Figure 11. Scatterplot of obtained scores with linear observed score and true score equating conversion lines. PPVT IQ to WISC-R Verbal IQ, age 7½.

methods. The root-mean-square errors corresponding to these are 13.08 and

13.00 respectively. These are somewhat larger than the errors associated with the linear equating procedures in the previous example. It should be noted, however, that the correlation in this example is lower.

The tables and graphs illustrating the graphic equipercentile procedure for these tests appear in Appendix G. A graph of the interpolated score distribution is illustrated in Figure 12, together with the linear and polynomial regression lines resulting from the analytical procedure. The root-mean-square errors corresponding to the three score distributions represented in Figure 12 are 12.94, 12.89, and 12.90 respectively.

Summary. The characteristics of the equating methods and procedures which were demonstrated in the examples in this section are summarized below.

1. Linear Observed Score vs. Linear True Score Equating

The converted WISC-R score distributions resulting from the application of the observed and true score linear procedures are very similar. Likewise the distributions of the standard errors of equating and the size of the conditional root-mean-square errors of equating associated with the two procedures are similar. Thus no obvious advantages accrue from using the true score procedure, at least for the range of discrepancies between the reliability coefficients reported in the present study.

2. Standard Errors of Equating

The standard errors of equating associated with the linear procedures show a greater precision of converted scores closer to the mean. The increase in the size of the standard errors in the extremes of the distributions indicates more instability of converted scores in those areas.


Figure 12. Scatterplot of obtained scores with interpolated, linear curve-fitting, and polynomial curve-fitting conversion lines: analytical equipercentile equating procedure. PPVT IQ to WISC-R Verbal IQ, age 7½.

3. Graphic vs. Interpolated Equipercentile Equating

The analytical procedure was included in the present study for comparison with the traditional hand-graphing procedure. This comparison involves only columns 4 and 5 in Tables 35 and 39 (i.e., those columns labelled "Graphic" and "Interpolated Scores"). As can be seen, these two score distributions are very similar. Again, differences are most evident in the extremes of the distributions where data for graphing are scant. Despite these differences, however, examination of the conditional root-mean-square errors of equating reveals that the two procedures are comparable.

4. The Curve-Fitting Procedures

Curve-fitting, through the application of linear and polynomial regression procedures, was included by Lindsay and Prichard (1971) as a further effort to define the shape of the conversion line. As can be seen by comparing columns 5, 6, and 7, the distributions of predicted scores are similar to the distribution of interpolated scores. This is reflected in the size of the respective root-mean-square errors.

5. Linear vs. Equipercentile Equating

The linear and equipercentile conversion procedures produced very similar results in the central areas of the distributions. Differences in converted score values are most pronounced in the extremes. However, the conditional root-mean-square errors are of similar magnitude for both linear and equipercentile procedures.

A further comment on curve-fitting. In their analytical procedure, Lindsay and Prichard (1971) relied on the standard error of estimate from the regression procedure as a measure of equating error. This, however, is misleading. By definition, the equal-percentile (i.e., interpolated) score distributions are highly correlated. Using the first example given in this section for illustration, the value of R² (coefficient of determination) associated with the interpolated scores is .9719. Consequently, a regression equation can be found which will accurately reproduce the conversion line. The standard errors of estimate associated with the linear and polynomial regression equations for this example are 1.48 and 1.42 respectively. These values, however, are indices of the accuracy of fit of the predicted lines to the equal-percentile line and should not be construed as a measure of equivalency. The appropriate measure is the discrepancy between the observed scores and the converted scores, as is reported in the conditional root-mean-square error of equating. The corresponding values for this measure are 9.40 and 9.24.
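The distinction can be made concrete in a few lines: the standard error of estimate measures the regression line's fit to the interpolated (equal-percentile) pairs, while the conditional root-mean-square error measures how far observed criterion scores sit from the conversion line. All names in the sketch are illustrative.

```python
import numpy as np

def see_and_rms(x_interp, y_interp, x_obs, y_obs, degree=1):
    """Contrast the standard error of estimate with the equating RMS error."""
    x_i = np.asarray(x_interp, dtype=float)
    y_i = np.asarray(y_interp, dtype=float)
    line = np.poly1d(np.polyfit(x_i, y_i, degree))

    # SEE: how closely the regression reproduces the equal-percentile line
    resid = y_i - line(x_i)
    see = np.sqrt(np.sum(resid ** 2) / (len(x_i) - degree - 1))

    # RMS: how far observed criterion scores fall from the conversion line
    y_o = np.asarray(y_obs, dtype=float)
    rms = np.sqrt(np.mean((y_o - line(np.asarray(x_obs, dtype=float))) ** 2))
    return float(see), float(rms)
```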

Equating Results

The test pairs identified for equating were listed in Table 34. With the exception of the two pairs used as examples, the tables of equivalent score values for each of these test pairs are presented in this section. Since the observed score and true score procedures were found to be virtually identical, the true score method was excluded, and both linear and equipercentile procedures were applied to all test pairs. As stated in Chapter III, the graphic equipercentile procedure was applied to only the test pairs in the two examples. Because of the close relation between the graphic and interpolated procedures, there was no advantage in producing the graphic converted scores as well. Finally, the third step of the analytical equipercentile program, namely the graphic representation of the goodness-of-fit of the curve-fitting procedure, was omitted.

To test the adequacy of the assumption of identical test score distributions, the linear equating procedure was reapplied to the normalized distributions of the test pairs identified as having equal reliabilities. This resulted in no improvement in precision of equating as indicated by the error estimates. The tables reporting these equivalent score values appear in Appendix H.

The equating results are presented in Tables 41 to 51. In each case, the obtained scores for the test to be equated are listed in column 1, the linear equated scores (Y*) and their associated standard errors (S_Y*) in columns 2 and 3 respectively, and the results of the analytical equipercentile program in columns 4, 5, and 6. The conditional root-mean-square errors of equating, RMS_O, RMS_I, RMS_L, and RMS_P, are interpreted as they were in the previous examples and appear below the columns to which they apply. The results are presented in the following order:

PPVT to WISC-R Verbal,
SIT to WISC-R Verbal and Full Scale,
SPM to WISC-R Performance, and
MHVS to WISC-R Verbal.

The results shown in Tables 41 to 51, included at the end of this chapter, substantiate those summarized for the two examples given earlier. As shown by the conditional root-mean-square errors, there was very little difference between the linear and equipercentile procedures. Consequently, there is no preference for one method over the other in terms of the accuracy of the converted scores in relation to the obtained scores.

The size of the root-mean-square errors can be judged both in comparison to those found for the Anchor Test Study, and in a practical sense as they pertain to test interpretation. In reference to the former, Linn (1975) reported that "the estimated error for all tests was generally less than one raw score point (substantially less in most cases)" (p. 207). In the present study, the mean conditional root-mean-square errors of equating taken across all 13 test pairs were 11.46, 11.34, 11.48, and 12.56 for linear equating, interpolated scores, linear curve-fitting, and polynomial curve-fitting respectively. In reference to score interpretation, this suggests that a discrepancy of as much as ±12 score points would be expected between an observed and a converted score value. The close correspondence between standardized WISC-R scores and WISC-R IQs was referred to in Chapters III and IV. Therefore, although the results reported here are in the standardized score metric, they would be virtually identical for IQ scores.

Using the obtained WISC-R scores as criteria of the accuracy of conversion, discrepancies of ±12 score points can be seen to fluctuate across two or three of the WISC-R intelligence classifications. Deviations of this magnitude are intolerable for scores on which educational decisions are based. Therefore the errors were interpreted as indices of the non-equatability of the test pairs in this study. This finding and its implications are discussed in the final chapter.

Table 41

Equivalent Scores
PPVT Raw Scores to Standardized WISC-R Verbal Scores (Age 9½)

r_XY = .72, r_XX = .93, r_YY = .87

                Linear Equating         Equipercentile Equating
PPVT                                  Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 51            48.30     3.88            58.85        49.28      58.28
 59            61.47     3.00            75.87        62.42      75.89
 62            66.41     2.68            75.97        67.35      76.73
 63            68.05     2.57            76.95        68.99      76.90
 65            71.34     2.36            77.18        72.27      77.34
 67            74.64     2.16            77.85        75.56      78.11
 69            77.93     1.97            79.66        78.84      79.36
 70            79.57     1.88            79.95        80.48      80.19
 71            81.22     1.79            80.35        82.13      81.16
 73            84.52     1.62            80.54        85.41      83.52
 74            86.16     1.54            85.57        87.05      84.89
 75            87.81     1.47            86.93        88.70      86.38
 76            89.45     1.40            88.81        90.34      87.97
 77            91.10     1.34            90.76        91.98      89.65
 78            92.75     1.29            91.38        93.62      91.41
 79            94.39     1.25            92.21        95.26      93.22
 80            96.04     1.21            93.22        96.91      95.08
 81            97.68     1.19            96.12        98.55      96.97
 82            99.33     1.18            98.58       100.19      98.88
 83           100.98     1.18           101.92       101.83     100.78
 84           102.62     1.19           102.70       103.48     102.68
 85           104.27     1.22           104.11       105.12     104.56
 86           105.91     1.25           105.90       106.76     106.42
 87           107.56     1.30           108.71       108.40     108.25
 88           109.21     1.35           109.92       110.05     110.05
 89           110.85     1.41           111.20       111.69     111.83
 90           112.50     1.48           112.68       113.33     113.61
 91           114.14     1.55           118.12       114.97     115.38
 92           115.79     1.63           119.07       116.61     117.18
 93           117.44     1.72           119.22       118.26     119.03
 94           119.08     1.80           119.29       119.90     120.97
 95           120.73     1.90           121.03       121.54     123.02
 96           122.37     1.99           126.58       123.18     125.25
 97           124.02     2.08           128.07       124.83     127.70
 98           125.67     2.18           130.04       126.47     130.44
 99           127.31     2.28           131.09       128.11     133.54
101           130.60     2.49           145.07       131.39     141.15

Ȳ = 82.41    RMS_O = 12.65   RMS_I = 12.10   RMS_L = 12.65   RMS_P = 12.21

Table 42

Equivalent Scores
PPVT Raw Scores to Standardized WISC-R Verbal Scores (Age 11½)

r_XY = .78, r_XX = .93, r_YY = .87

                Linear Equating         Equipercentile Equating
PPVT                                  Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 57            50.40     3.60            43.55        50.08      44.79
 64            60.51     2.94            65.56        62.84      65.34
 75            76.57     1.97            66.67        78.18      74.01
 77            79.48     1.81            75.10        80.97      76.94
 78            80.93     1.73            77.73        82.36      78.58
 80            83.84     1.58            81.54        85.15      82.15
 81            85.29     1.51            86.54        86.55      84.04
 82            86.75     1.45            89.24        87.94      85.98
 83            88.20     1.38            89.71        89.34      87.95
 84            89.66     1.33            89.72        90.73      89.92
 85            91.11     1.28            89.79        92.13      91.89
 86            92.56     1.23            91.49        93.52      93.84
 87            94.02     1.19            94.67        94.92      95.74
 88            95.47     1.16            97.27        96.31      97.60
 89            96.92     1.14           100.53        97.70      99.38
 90            98.38     1.13           102.21        99.10     101.10
 91            99.83     1.12           103.55       100.49     102.73
 92           101.29     1.12           105.26       101.89     104.29
 93           102.74     1.14           105.56       103.28     105.75
 95           105.65     1.19           106.53       106.07     108.43
 96           107.10     1.22           108.33       107.47     109.65
 97           108.56     1.27           110.48       108.86     110.80
 98           110.01     1.32           110.92       110.26     111.90
 99           111.46     1.37           112.31       111.65     112.94
100           112.92     1.43           113.21       113.04     113.94
101           114.37     1.50           116.83       114.44     114.91
102           115.83     1.56           117.09       115.83     115.87
103           117.28     1.64           118.41       117.23     116.83
105           120.19     1.79           119.22       120.02     118.78
106           121.64     1.87           120.72       121.41     119.79
107           123.10     1.95           121.13       122.81     120.84
108           124.55     2.03           121.78       124.20     121.92
109           126.00     2.12           121.84       125.59     123.03
110           127.46     2.20           122.46       126.99     124.17
112           130.37     2.38           125.41       129.78     126.44
113           131.82     2.46           128.49       131.17     127.53
119           140.54     3.02           130.69       139.54     130.26

Ȳ = 91.12    RMS_O = 11.43   RMS_I = 10.92   RMS_L = 11.24   RMS_P = 11.19

Table 43

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 7½)

r_XY = .94, r_XX = .86, r_YY = .78

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 88            49.80     3.24            65.87        50.00      65.87
 94            66.12     2.30            72.42        69.44      72.41
 96            71.56     2.00            73.29        72.22      72.08
 97            74.28     1.86            74.26        75.00      72.80
 98            77.00     1.72            75.31        77.77      74.54
 99            79.72     1.58            75.95        80.55      77.12
100            82.44     1.46            78.80        83.33      80.34
101            85.16     1.34            80.35        86.11      83.97
102            87.88     1.23            89.52        88.88      87.76
103            90.60     1.14            93.49        91.66      91.54
104            93.32     1.06            97.34        94.44      95.12
105            96.04     1.01            98.34        97.22      98.42
106            98.76     0.98           100.57        99.99     101.38
107           101.48     0.98           102.52       102.77     104.03
108           104.20     1.01           106.30       105.55     106.44
109           106.92     1.07           109.15       108.32     108.72
110           109.64     1.14           111.42       111.10     111.00
111           112.36     1.24           112.75       113.88     113.43
112           115.08     1.35           117.98       116.66     116.13
113           117.80     1.47           118.67       119.43     119.13
114           120.52     1.60           122.90       122.21     122.38
115           123.24     1.73           125.46       124.99     125.70
116           125.96     1.87           126.72       127.77     128.67
118           131.40     2.16           132.51       133.32     130.65

Ȳ = 106.45    RMS_O = 10.19   RMS_I = 10.28   RMS_L = 10.34   RMS_P = 12.18

Table 44

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 7½)

r_XY = .79, r_XX = .86, r_YY = .78

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 88            49.80     3.87            62.96        49.36      62.82
 94            66.12     2.76            68.62        66.13      68.80
 96            71.56     2.41            71.27        71.72      71.57
 97            74.28     2.24            73.37        74.52      73.66
 98            77.00     2.07            76.12        77.31      76.15
 99            79.72     1.91            78.71        80.11      78.95
100            82.44     1.76            81.70        82.90      81.96
101            85.16     1.62            84.23        85.70      85.07
102            87.88     1.50            87.62        88.49      88.20
103            90.60     1.39            93.21        91.29      91.28
104            93.32     1.30            95.79        94.08      94.24
105            96.04     1.24            97.29        96.88      97.07
106            98.76     1.21            98.03        99.67      99.76
107           101.48     1.21           102.56       102.47     102.35
108           104.20     1.25           104.84       105.26     104.88
109           106.92     1.31           106.71       108.06     107.43
110           109.64     1.40           108.57       110.85     110.06
111           112.36     1.51           112.52       113.65     112.85
112           115.08     1.64           117.52       116.45     115.87
113           117.80     1.78           118.49       119.24     119.15
114           120.52     1.93           123.21       122.04     122.68
115           123.24     2.09           126.64       124.83     126.40
116           125.96     2.25           127.50       127.63     130.14
118           131.40     2.60           137.99       133.22     136.54

Ȳ = 106.45    RMS_O = 12.63   RMS_I = 12.58   RMS_L = 12.84   RMS_P = 17.14

Table 45

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 9½)

r_XY = .76, r_XX = .93, r_YY = .79

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 90            43.62     4.24            58.85        45.56      58.64
105            73.62     2.25            75.86        75.05      75.94
107            77.62     2.01            76.95        78.98      78.90
108            79.62     1.89            79.66        80.95      80.59
109            81.62     1.78            80.54        82.92      82.38
110            83.62     1.68            83.32        84.88      84.24
111            85.62     1.58            85.77        86.85      86.17
112            87.62     1.49            90.42        88.81      88.14
113            89.62     1.41            91.11        90.78      90.14
114            91.62     1.34            93.21        92.74      92.14
115            93.62     1.28            95.56        94.71      94.13
116            95.62     1.23            96.12        96.68      96.12
117            97.62     1.20            96.60        98.64      98.10
118            99.62     1.19            98.58       100.61     100.06
119           101.62     1.20           101.61       102.57     102.02
120           103.62     1.22           102.70       104.54     103.98
121           105.62     1.26           105.33       106.51     105.95
122           107.62     1.31           107.58       108.47     107.94
123           109.62     1.38           109.18       110.44     109.97
124           111.62     1.46           111.20       112.40     112.06
125           113.62     1.54           117.74       114.37     114.20
126           115.62     1.64           119.29       116.33     116.41
127           117.62     1.74           119.79       118.30     118.70
128           119.62     1.85           121.03       120.27     121.06
129           121.62     1.96           121.22       122.23     123.49
130           123.62     2.08           125.86       124.20     125.96
131           125.62     2.20           128.07       126.16     128.43
133           129.62     2.45           130.04       130.10     133.18
135           133.62     2.71           145.07       134.03     137.12

Ȳ = 118.19    RMS_O = 12.53   RMS_I = 12.11   RMS_L = 12.46   RMS_P = 14.08

Table 46

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 9½)

r_XY = .80, r_XX = .88, r_YY = .79

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 90            43.62     4.14            59.41        46.42      59.79
105            73.62     2.19            69.05        75.46      72.90
107            77.62     1.96            76.36        79.33      77.07
108            79.62     1.85            78.24        81.27      79.35
109            81.62     1.74            81.86        83.21      81.70
110            83.62     1.64            82.75        85.14      84.08
111            85.62     1.54            85.55        87.08      86.45
112            87.62     1.45            90.64        89.02      88.78
113            89.62     1.37            91.11        90.95      91.04
114            91.62     1.30            94.67        92.89      93.22
115            93.62     1.24            97.00        94.83      95.31
116            95.62     1.20            97.47        96.76      97.31
117            97.62     1.17            97.75        98.70      99.21
118            99.62     1.16            99.55       100.63     101.04
119           101.62     1.16           101.79       102.57     102.81
120           103.62     1.18           103.31       104.51     104.55
121           105.62     1.22           106.01       106.44     106.28
122           107.62     1.28           109.46       108.38     108.04
123           109.62     1.34           110.87       110.32     109.85
124           111.63     1.42           112.34       112.25     111.75
125           113.62     1.50           114.02       114.19     113.74
126           115.62     1.60           115.87       116.13     115.85
127           117.62     1.70           119.66       118.06     118.08
128           119.62     1.80           121.41       120.00     120.40
129           121.62     1.92           121.54       121.94     122.79
130           123.62     2.03           124.59       123.87     125.19
131           125.62     2.15           128.29       125.81     127.51
133           129.62     2.39           129.21       129.68     131.39
135           133.62     2.64           136.78       133.55     132.97

Ȳ = 118.19    RMS_O = 12.14   RMS_I = 9.79   RMS_L = 11.97   RMS_P = 17.83

Table 47

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Verbal Scores (Age 11½)

r_XY = .92, r_XX = .93, r_YY = .88

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 93            46.09     3.13            43.55        47.51      43.57
109            69.90     1.89            62.02        70.91      59.18
112            74.36     1.68            65.56        75.29      70.43
113            75.85     1.61            73.26        76.76      73.59
114            77.34     1.54            77.73        78.22      76.47
116            80.31     1.40            81.61        81.14      81.40
117            81.80     1.34            83.84        82.60      83.49
118            83.29     1.28            84.14        84.07      85.37
119            84.78     1.22            88.70        85.53      87.06
120            86.27     1.16            89.02        86.99      88.60
121            87.75     1.11            90.10        88.45      90.01
122            89.24     1.06            90.76        89.92      91.32
123            90.73     1.02            91.28        91.38      92.58
124            92.22      .98            92.74        92.84      93.79
125            93.71      .94            95.37        94.30      94.99
126            95.19      .92            95.88        95.76      96.20
127            96.68      .89            97.27        97.23      97.44
128            98.17      .88            98.99        98.69      98.71
129            99.66      .88           100.65       100.15     100.05
130           101.15      .88           101.05       101.61     101.44
131           102.63      .89           103.39       103.08     102.89
132           104.12      .90           104.94       104.54     104.41
133           105.61      .93           105.42       106.00     105.98
134           107.10      .96           107.58       107.46     107.60
135           108.59     1.00           109.03       108.93     109.25
136           110.07     1.04           110.46       110.39     110.92
137           111.56     1.09           112.14       111.85     112.59
138           113.05     1.14           115.33       113.31     114.24
139           114.54     1.19           116.08       114.77     115.85
140           116.03     1.25           117.10       116.24     117.39
141           117.51     1.31           120.72       117.70     118.85
144           121.98     1.51           121.13       122.09     122.55
145           123.47     1.57           122.46       123.55     123.50
150           130.91     1.93           125.42       130.86     126.32
151           132.39     2.07           128.49       132.32     126.67
156           139.83     2.32           130.69       139.63     131.23

Ȳ = 129.23    RMS_O = 8.78   RMS_I = 9.00   RMS_L = 8.72   RMS_P = 9.26

Table 48

Equivalent Scores
SIT Raw Scores to Standardized WISC-R Full Scale Scores (Age 11½)

r_XY = .89, r_XX = .87, r_YY = .88

                Linear Equating         Equipercentile Equating
SIT                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 93            46.09     3.51            34.46        47.50      34.50
109            69.89     2.13            61.98        70.93      59.99
112            74.36     1.88            68.51        75.32      71.09
113            75.85     1.80            75.92        76.79      74.23
114            77.33     1.73            76.41        78.25      77.08
116            80.31     1.58            78.49        79.72      79.66
117            81.80     1.51            79.87        81.18      81.99
118            83.29     1.44            84.70        82.65      84.07
119            84.77     1.37            84.76        84.11      85.95
120            86.26     1.31            88.92        87.04      89.16
121            87.75     1.25            92.49        88.50      90.56
122            89.24     1.20            92.89        89.97      91.85
123            90.73     1.15            93.56        91.43      93.07
124            92.21     1.10            94.71        92.90      94.24
125            93.70     1.07            96.07        94.36      95.39
126            95.19     1.03            96.49        95.82      96.53
127            96.68     1.01            97.96        97.29      97.69
128            98.17     1.00            98.94        98.75      98.89
129            99.65      .99            99.38       100.22     100.13
130           101.14      .99            99.74       101.69     101.43
131           102.63     1.00           101.24       103.15     102.79
132           104.12     1.02           104.17       104.61     104.22
133           105.61     1.05           106.25       106.08     105.70
134           107.09     1.09           107.86       107.54     107.25
135           108.58     1.13           109.31       109.01     108.84
136           110.07     1.17           109.61       110.47     110.47
137           111.56     1.23           111.60       111.94     112.12
138           113.05     1.27           114.94       113.40     113.78
139           114.53     1.34           116.09       114.86     115.42
140           116.02     1.41           116.79       116.33     117.04
141           117.51     1.48           119.86       117.79     118.60
144           121.97     1.69           120.65       122.19     122.77
145           123.46     1.77           123.01       123.65     123.93
150           130.90     2.17           127.45       130.97     127.63
151           132.39     2.25           129.06       132.44     128.00
156           139.83     2.68           129.73       139.76     130.02

Ȳ = 129.23    RMS_O = 9.91   RMS_I = 9.79   RMS_L = 9.85   RMS_P = 10.62

Table 49

Equivalent Scores
SPM Raw Scores to Standardized WISC-R Performance Scores (Age 11½)

r_XY = .68, r_XX = .82, r_YY = .83

                Linear Equating         Equipercentile Equating
SPM                                   Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

15             48.21     4.38            40.05        49.08      40.07
20             59.68     3.52            68.81        60.59      68.52
24             68.85     2.85            70.46        69.80      71.11
26             73.44     2.53            71.93        74.41      73.41
27             75.74     2.38            75.99        76.71      75.14
28             78.03     2.23            77.97        79.01      77.21
29             80.32     2.09            78.32        81.31      79.59
30             82.62     1.95            80.71        83.62      82.21
31             84.91     1.82            85.45        85.92      84.99
32             87.21     1.71            86.49        88.22      87.86
33             89.50     1.60            89.94        90.52      90.73
34             91.79     1.51            93.98        92.82      93.54
35             94.09     1.44            97.90        95.13      96.23
36             96.38     1.39            98.62        97.43      98.77
37             98.68     1.37           101.15        99.73     101.14
38            100.97     1.36           103.50       102.04     103.33
39            103.26     1.39           104.57       104.34     105.37
40            105.56     1.43           106.22       106.64     107.28
41            107.85     1.50           108.04       108.95     109.12
42            110.15     1.59           111.61       111.25     110.93
43            112.44     1.69           114.73       113.55     112.78
45            117.03     1.93           117.87       118.15     116.83
46            119.32     2.07           118.08       120.46     119.11
47            121.62     2.21           119.47       122.76     121.58
48            123.91     2.36           125.04       125.06     124.22
49            126.21     2.51           127.39       127.36     126.95
53            135.38     3.15           134.78       136.57     134.78

Ȳ = 37.57    RMS_O = 14.11   RMS_I = 14.37   RMS_L = 14.15   RMS_P = 14.28

Table 50

Equivalent Scores
MHVS Raw Scores to Standardized WISC-R Verbal Scores (Age 7½)

r_XY = .88, r_XX = .86, r_YY = .76

                Linear Equating         Equipercentile Equating
MHVS                                  Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 5             61.46     2.76            65.87        63.03      66.34
 7             68.86     2.32            71.77        70.43      71.05
 8             72.56     2.10            74.26        74.12      74.16
 9             76.26     1.90            78.32        77.82      77.56
10             79.96     1.70            79.27        81.52      81.12
11             83.67     1.52            84.41        85.22      84.78
12             87.37     1.36            89.86        88.92      88.49
13             91.07     1.22            92.94        92.63      92.24
14             94.77     1.12            95.48        96.32      96.03
15             98.48     1.08            99.99       100.02      99.86
16            102.18     1.08           102.59       103.72     103.73
17            105.88     1.14           106.93       107.42     107.66
18            109.58     1.24           112.41       111.12     111.62
19            113.28     1.38           116.95       114.82     115.59
20            116.98     1.55           118.09       118.52     119.48
21            120.69     1.73           124.70       122.22     123.21
22            124.39     1.93           126.60       125.92     126.61
23            128.09     2.14           129.28       129.62     129.49
24            131.79     2.35           131.13       133.32     131.59
26            139.20     2.80           132.51       140.72     132.01

Ȳ = 15.41    RMS_O = 11.44   RMS_I = 11.50   RMS_L = 11.54   RMS_P = 11.44

Table 51

Equivalent Scores
MHVS Raw Scores to Standardized WISC-R Verbal Scores (Age 9½)

r_XY = .82, r_XX = .93, r_YY = .83

                Linear Equating         Equipercentile Equating
MHVS                                  Interpolated     Curve-Fitting
Raw Scores      Y*       S_Y*            Scores      Linear    Polynomial

 6             51.39     3.34            58.85        53.63      58.83
 9             59.70     2.84            75.86        61.83      75.72
12             68.01     2.34            75.97        70.03      77.33
14             73.55     2.02            77.18        75.50      78.33
15             76.32     1.87            79.66        78.24      79.47
16             79.09     1.73            79.95        80.97      81.08
17             81.86     1.59            81.54        83.70      83.10
18             84.63     1.46            86.93        86.44      85.44
19             87.40     1.34            88.52        89.17      88.02
20             90.17     1.24            90.77        91.91      90.72
21             92.94     1.15            92.63        94.64      93.47
22             95.71     1.09            95.56        97.38      96.20
23             98.48     1.06            99.67       100.11      98.90
24            101.25     1.06           102.39       102.85     101.55
25            104.02     1.09           104.08       105.58     104.21
26            106.79     1.14           106.87       108.31     106.93
27            109.56     1.23           109.66       111.05     109.81
28            112.33     1.33           111.95       113.78     112.94
29            115.10     1.44           117.74       116.52     116.41
30            117.87     1.57           119.79       119.25     120.28
31            120.64     1.71           126.58       121.98     124.58
32            123.41     1.86           128.07       124.72     129.24
33            126.18     2.01           131.09       127.45     134.12
34            128.95     2.16           145.07       130.19     138.94

Ȳ = 23.55    RMS_O = 11.22   RMS_I = 10.83   RMS_L = 11.19   RMS_P = 10.93

CHAPTER V

SUMMARY AND DISCUSSION

In this final chapter the purposes, procedures, and results of the

study are summarized. The major findings and their implications are then

discussed and an orientation for further research is suggested.

Summary

The present study addressed two issues concerning the appropriate

use and interpretation of individually-administered tests of intelligence.

One issue concerned the interchangeable use of intelligence tests for the

same purposes in educational settings. The second issue concerned the

appropriateness of interpreting test results in British Columbia using

foreign norms.

Three major objectives were adopted from the Anchor Test Study in

Reading. These were: the preparation of a set of equivalency tables for

translating scores on one test to scores on another test; the comparison

of linear and equipercentile equating methods for the development of the

equivalency tables; and the development of new norms.

Four individually-administered intelligence tests were selected on

the basis of their regular use in schools: the Wechsler Intelligence Scale

for Children—Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT),

the Slosson Intelligence Test (SIT), and the Standard Progressive Matrices

(SPM). A fifth test, the Mill Hill Vocabulary Scale (MHVS) was also

included. Although the MHVS is little known in B.C., it was developed for joint use with the SPM (Raven, 1960). The specific research questions posed were whether the PPVT, SIT, SPM, and MHVS could legitimately be used as alternatives to the WISC-R, and whether existing norms for all five tests were appropriate in B.C.

Three technical issues were examined. The first of these was the distinction between equivalence and comparability. Equivalence defines a specific relationship between tests based on their parallelism and interchangeability. Correspondingly, test equating is the process of identifying pairs of score values between tests which have identical qualitative and quantitative meaning. Comparability, on the other hand, defines a common rank position of score values for a given reference group. Comparing, then, is the process of scaling tests in a manner that associates a given score value with a given rank position across tests for a given population.

The comparability of IQ scores is implied in common usage when a constant meaning is associated with mean 100 and standard deviation 15 or 16.

The fact that IQ scales are derived using different standardization groups and procedures for different tests, however, violates the conditions of comparability. Additionally, the equivalence of IQ tests is implied in common usage when they are used interchangeably for the same purpose.

With the introduction of the notion of comparable scores, the methodological orientation of the study was extended to include the development of comparable score scales for all five tests as well as the preparation of equivalency tables among tests where the latter were feasible as determined by psychological and statistical criteria.

The second technical issue concerned the viability of considering as parallel, and therefore equatable, the WISC-R and each of the other tests. The concept of nominal parallelism (Lord, 1964; Lord & Novick,

1968) was adopted as compatible with the nature and use of the tests considered in this study. The criteria established for nominal parallelism were psychological similarity in terms of content and purpose and statistical similarity as defined by disattenuated correlation coefficients of at least .70.

The third issue concerned the applicability of four equating methods—linear observed score, linear true score, graphic equipercentile, and analytical equipercentile—to nominally parallel tests. This represented an extension of previous equating applications and allowed an exploration of the robustness of linear and equipercentile equating methods to tests which were not strictly parallel in the classical sense.

The population identified for the study was all non-Native Indian, English-speaking children at three age levels—7½ years, 9½ years, and 11½ years—attending public and independent schools in British Columbia. The population was further restricted to exclude children enrolled in classes for the physically handicapped, emotionally disturbed, or trainable mentally retarded. A stratified sampling design was used employing five stratification variables: geographic region, community size, size of school, age, and sex. A random sample of 180 children was identified at each of three age levels to proportionately represent the population described according to these variables. The actual number of students tested was 63 per cent of this target sample: they were shown, however, to maintain the representativeness of the target sample in terms of the stratification variables.

All five tests were administered to each child, with a Canadianized version of the WISC-R given in one testing session, and the other four tests given in counterbalanced order in a second session. The tests were first scored using the norms tables and procedures in their respective manuals.

Statistical tests for differences of means and variances for the B.C. sample compared to the original standardization samples revealed that, in general,

B.C. children scored significantly higher and with less variability (p < .05).

Therefore, new norms tables were prepared for the conversion of raw test scores to IQs and corresponding percentile ranks for the WISC-R, PPVT, and SIT. In keeping with the original format of the SPM and MHVS interpretational information, raw scores and percentile ranks only were reported.

The B.C. rescaling procedures involved lowering and spreading out the score scales to conform to conventional use and interpretation. The result is that below average students score lower with the B.C. norms than with the published norms, while students who are much above average score the same or even higher with B.C. norms.

Both linear and equipercentile equating procedures were applied to the observed score distributions of the following test pairs identified as nominally parallel:

PPVT : WISC-R Verbal, ages 7½, 9½, and 11½
SIT : WISC-R Verbal, ages 7½, 9½, and 11½
SIT : WISC-R Full Scale, ages 7½, 9½, and 11½
SPM : WISC-R Performance, age 11½
MHVS : WISC-R Verbal, ages 7½, 9½, and 11½

The accuracy of the results was judged by comparison of the conditional root-mean-square errors of equating associated with the equating procedures. These errors averaged 12 score points and were similar across all procedures.

In keeping with the organization of Chapters III and IV, the results of the norming portion of the study are discussed first, followed by the results of equating.

Norming

The B.C. Scores

A concern which has been frequently expressed by practitioners to whom the results were presented is that the exclusion of Native Indian and low-scoring children had a spurious effect on scores and that the resulting distributions are "too high." However, the representativeness of the sample for the population defined was substantiated in Chapter IV. The exclusion of other groups of children such as those non-fluent in English or emotionally disturbed is consistent with the American WISC-R standardization procedures.

No children were excluded on the basis of learning disabilities, low academic performance, or disciplinary problems. Personal communication with many principals confirmed that any exclusions made according to the delimitations in Chapter I were restricted to extreme cases.

An alternate hypothesis to that of sampling bias toward high scores is that children in B.C., and perhaps in Canada, perform at a higher level as defined by IQ scores. Certainly support for the generalizability of high IQ scoring patterns was provided by studies referenced in Chapter II in which group intelligence tests were used (Kenneth, 1972; Oldridge, 1968; Wright et al., 1972). More directly relevant, Peters (1976) reported mean WISC-R scores for a sample of 7½, 10½, and 13½ year olds in a city in Saskatchewan that closely resemble those found in the present study. The mean WISC-R Full Scale IQ scores for his sample of 100 children at each age level were 109.75, 106.58, and 103.41 respectively. The corresponding scores for 7½, 9½, and 11½ year olds in the B.C. sample were 108.16, 107.89,

A third hypothesis for the higher scores is that intelligence, in general, is increasing over time. Therefore renorming efforts would be expected to necessitate a lowering of the score scales to maintain the mean

IQ at 100. Doppelt and Kaufman (1977) compared scores on items common to both tests using the 1949 WISC sample and the 1974 WISC-R sample. Using regression equations, they estimated WISC IQs for the WISC-R standardization sample at each age level. They found that the obtained WISC-R IQs were lower than the estimated WISC IQs by 4 to 8 points for children from 6½ to 11½ years old and by 2 to 3 points for older children, with the greatest discrepancies occurring for the lowest scoring children.

Herman (1977) reported a consistent trend toward the lowering of IQ scales in sequential standardizations of intelligence tests in the United

States. He concluded therefore that variations in sampling could not account for the differences, and suggested instead that intelligence is increasing. One possible explanation for this, and one which is consistent with the finding of larger differences for younger children, is the greater emphasis placed on environmental enrichment for pre-school and primary children (Thorndike, 1977). Another might be increasing test-wiseness of children in general (cf. Beauchamp et al., 1979; Herman, 1979).

The British Columbia sample was tested in 1980. If, in fact, intelligence as measured by intelligence tests is increasing, this alone may account for the B.C. differences and be independent of nationality. One interesting observation in the present study, which was also noted in the Doppelt and Kaufman paper, is that greater increases were found in Performance than in Verbal IQs (see Table 25, Chapter IV). It is possible that a general familiarity with games and the increasing acceptability of such games in school settings may contribute to the higher scores.

Another hypothesis for the higher B.C. scores on the WISC-R is that the Canadianized items provided an advantage. This, however, seems unlikely since Vernon's (1977) rationale for the substitutions was to match American pass percentages for Canadian examinees on items for which Canadian pass percentages were otherwise depressed. Examination of the Canadianized content (see Appendix B) does not indicate any obvious easiness of items. In fact, the contrary might be speculated. In particular, the item "Name three oceans that border Canada" was noted to be difficult since the acceptable response had to include mention of the Arctic Ocean. The results of Peters' (1976) study refute the notion of an easier Canadianized version. He administered the WISC-R without any changes and found scores very similar to those reported in this study. Peters' mean Information subtest scores were 11.05, 10.08, and 9.92 for ages 7½, 10½, and 13½. The corresponding means in this study were 10.72, 10.64, and 9.88 for ages 7½, 9½, and 11½.

A limitation noted for the present study is that both American and Canadian items were not administered and the results compared. Considering the results of Peters' study, one would be inclined to suggest that the B.C. WISC-R norms are applicable regardless of whether the Canadian items are used or not. This is not advised, however, unless further research confirms the similarity of scores. For most of the substitution items, a case can be made for the face-value desirability of the Canadian wordings. Canadian testers have for years been making substitutions informally and inconsistently (cf. Peters, 1976; Spreen & Tryk, 1970), based on the beliefs that some items were American-biased and that Canadian children should be oriented toward Canadian culture. Thus the adoption of Vernon's items is recommended since they are empirically based.

Use of the New Norms

A concern regarding the renorming results is the question of receptivity of both practitioners and laypersons to the notion of lowering IQ. The question is not perceived to occur at the level of the conceptual soundness of rescaling. The conventional interpretation of IQ assumes a reference value of the mean equal to 100. Matarazzo (1972) summarized the desirability of endorsing this conceptual constancy, saying that it is

obviously a matter of common sense to select a value for it as would be in line with the order of numerical values of IQ's now in general use. (p. 104)

The interpretation additionally assumes a normal distribution of scores around the mean so that percentile ranks associated with given scores are known. From a practitioner's point of view, assurance that this application of test results is accurate has very obvious justification. It is at the practical level that problems may occur. In an educational milieu of working familiarity with test results from existing norms, the incautious introduction of a new norms reference scale could have misleading and erroneous impact. In particular, the re-testing of children using the same test but different norms, or the comparing of children on the basis of scores derived from different norms tables, would be unfortunate and would, in fact, contribute to the very type of comparative misuse of scores that this study set out to rectify. The onus of responsibility for ensuring correct use comes back to the professional tester. Any score reporting based on B.C. norms requires careful reference and clarification. The results of the study that may prove of most practical value to testers are the percentile ranks associated with selected standard deviations. Since the trend is away from reporting actual numerical score results, these values may be more useful. They would at least serve to avoid the emotionally laden impact on laypersons of lowering actual scores.
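The percentile ranks associated with selected standard deviations can be generated directly from the normal model the interpretation assumes. The sketch below uses the conventional mean of 100 and, for illustration only, a standard deviation of 15; it shows the mechanics and is not a substitute for the empirically derived B.C. values.

    from statistics import NormalDist

    iq = NormalDist(mu=100, sigma=15)   # illustrative IQ scale
    for k in (-2, -1, 0, 1, 2):
        score = 100 + 15 * k
        pr = 100 * iq.cdf(score)        # percentile rank under normality
        print(f"{k:+d} SD (IQ {score}): percentile rank {pr:.1f}")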

The Comparability of the New Norms

The IQ score scales for the WISC-R, PPVT, and SIT are comparable in the sense described in Chapter II; that is, a given score value represents a constant rank position across tests for the population defined in this study. It was noted earlier that the practice of assigning a common score to different tests of intelligence fosters the interpretation of scores as if they were comparable when they, in fact, are not. Thus the results of this phase of the study provide for this correct usage and interpretation of the IQ score scales.

Equating

The magnitude of the conditional root-mean-square errors of equating associated with both linear and equipercentile equating methods was interpreted in Chapter IV as an indication of the non-equatability of the test pairs in this study. In light of this finding, the criteria established for the equivalence of the tests leading to the identification of nominally parallel test pairs are reassessed. Following that, the implications of the results for testing practice are examined.
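For reference, the two families of procedure can each be reduced to a few lines. Linear equating sets equal the scores with equal standard-score deviates; equipercentile equating sets equal the scores with equal percentile ranks. The sketch below shows both ideas in their crudest form, omitting the smoothing and analytic refinements used in practice (cf. Lindsay & Prichard, 1971); the score arrays are assumed inputs.

    from statistics import mean, pstdev

    def linear_equate(score_x, x, y):
        """Test-Y score with the same standard-score deviate as score_x."""
        return mean(y) + (pstdev(y) / pstdev(x)) * (score_x - mean(x))

    def equipercentile_equate(score_x, x, y):
        """Test-Y score with (approximately) the same percentile rank."""
        p = sum(xi <= score_x for xi in x) / len(x)    # rank of score_x on X
        ys = sorted(y)
        return ys[min(int(p * len(ys)), len(ys) - 1)]  # score at that rank on Y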

The Equatability of Nominally Parallel Test Pairs

In Chapters I and II, the justification for the equating phase of this study was established using the concepts of functional, psychological, and statistical equivalence. The functional, or practical, equivalence of four of the intelligence tests used (excluding the MHVS) generated the research question regarding the feasibility of an equating study. Functional equivalence alone, however, is based on the simple face validity of tests having a common name and claim, and was deemed an inadequate justification for the consideration of parallelism. Therefore, psychological and statistical criteria were added.

The determination of psychological equivalence of test pairs was based on an analysis of the similarity of the psychological function measured by each test. Thus tests labelled as verbal measures of intelligence and having obvious verbal content (e.g., WISC-R Verbal and PPVT) were considered psychologically parallel. These designations were further supported by the results of previous empirical studies showing high correlations and/or common factor loadings. The notion of content similarity was, however, defined in a broad sense rather than in terms of the balance of items measuring similar skill categories or domains across tests as in the Anchor Test Study (Jaeger, 1973). Thus a global notion of verbal measurement may be inadequate to account for differences between the unidimensionality of the verbal task required by the PPVT and the multidimensionality of the verbal tasks required by the five WISC-R Verbal subtests. Although Lord and Novick (1968) provided no guidelines for the degree of test similarity required for nominal parallelism, Marks and Lindsay (1972) stressed the notion of slight differences only among tests considered as nominally parallel.

The lack of specificity regarding the size of correlations delimiting nominal parallelism was similarly noted. Again, although Lord and Novick (1968) provided no statistical criterion of similarity, Marks and Lindsay (1972) referred to high correlations. As discussed in Chapter I, the minimum disattenuated correlation coefficient of .70 adopted in the present study was based on known correlations among functionally equivalent individually-administered intelligence tests (Wechsler, 1974).
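The correction for attenuation underlying these disattenuated coefficients divides the observed correlation by the square root of the product of the two tests' reliabilities. A one-function sketch, with invented numbers:

    import math

    def disattenuate(r_xy, r_xx, r_yy):
        """Correlation corrected for attenuation: r_xy / sqrt(r_xx * r_yy)."""
        return r_xy / math.sqrt(r_xx * r_yy)

    # e.g., an observed r of .60 between tests with reliabilities .85 and .80
    print(round(disattenuate(0.60, 0.85, 0.80), 2))   # 0.73, above the .70 criterion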

In Chapter IV it was noted that the root-mean-square errors were of a similar magnitude across all test pairs and equating procedures. For the range of disattenuated correlations reported in this study (.68-.92), there were no appreciable differences in the size of the errors associated with differences in correlation. Therefore, high correlation was concluded to be a necessary although not sufficient requirement for equating. The root-mean-square errors reported in the Anchor Test Study for tests having disattenuated correlations similar to some of those reported in the present study (i.e., > .89) were generally much less than one raw score point (Linn, 1975). Therefore it was additionally concluded that content similarity based on item and domain correspondence is required.

In summary, the results of the present study suggest the following considerations for the determination of nominally parallel test pairs for equating purposes. Despite Lord's (1964) early definition of nominal parallelism based on interchangeable use in common practice, functional equivalence is not a sufficient standard where equating is the goal. Lord and Novick (1968) stressed the notion of psychological equivalence in terms of commonality of the psychological trait measured. In the present study this was adopted as the first, or screening, criterion for the identification of nominally parallel tests. As discussed earlier, to this must be added the notion of content parallelism as used in the Anchor Test Study. This requires a correspondence of items between tests in terms of specific skill areas or domains tested within the more global definition of the psychological trait. Finally, an index of statistical similarity, namely the correlation coefficient, must be applied to confirm empirically the relationships established on the basis of judgmental analysis. The determination of the size of this index is influenced by the size of the sample on which the conversions are to be established. Marks and Lindsay (1972) claimed that, for tests defined as nominally parallel in the stricter sense discussed in this section, correlations of .80 or even lower are acceptable provided the sample size is at least 500. Conversely, smaller samples require higher correlations.

The Use and Interpretation of Intelligence Tests

From a practical point of view, the results of the study emphatically confirm the non-equivalence, that is, the non-interchangeability, of the five individually-administered intelligence tests examined. As discussed in Chapter IV, the size of the errors describing the discrepancy between converted and obtained scores indicated the inadequacy of using a conversion based on the assumption of test equivalence. This finding re-emphasizes the need for careful test use and interpretation and stresses the responsibility placed on the professional school psychologist and psychometrician to ensure the proper understanding of test application.

Limitations of the Study

The limitations of the study derive from issues related to sample size. These are discussed separately in relation to the effect on norming and on equating.

Norming

With respect to norming, the issue of size relates to the problem of response rate. The issue of non-response is always troublesome, and particularly so in a situation such as the present one where there is a heavy reliance on the assistance of volunteers. In the present study, the target sample size was determined and a stratified random sample of children was drawn corresponding to this target size. The actual number of children tested was 63.0% of the target number. These children came from 63.9% of the schools and 92.2% of the school districts originally drawn.

This raises the question of bias in the potential differential inclusion and exclusion of persons in the sample. In the present study, however, the issue of bias is equivocal. The non-response group included not only refusals to participate (at the district or school level), but also those for whom testing arrangements could not be made or were not followed through. Thus the non-responders were not homogeneous with respect to reasons for exclusion. Additionally, as shown in Chapter IV, comparison of sample and population percentages for the stratification variables confirmed the overall representativeness of the sample. Therefore, it is unlikely that the low response rate effected a systematic bias.

Equating

The limitation for the equating phase of the study was related to the general size of the sample, apart from questions of its dispersion. Marks and Lindsay (1972) stressed the need for a large sample (approximately 500) in order to reduce equating error. In the present study, however, sample size would not be expected to compensate for the error attributable to the non-equivalence of test content. Therefore, the more important consideration remains the degree of test parallelism.
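The role of sample size can be illustrated by resampling. The sketch below is a synthetic demonstration only; the population, the single-group design, and all parameter values are invented. It shows the sampling variability of a linearly converted score shrinking, roughly as one over the square root of N, which is the sense in which a sample near 500 reduces equating error.

    import random
    from statistics import mean, pstdev

    random.seed(1)
    pop_x = [random.gauss(100, 15) for _ in range(20000)]         # invented population
    pop_y = [0.9 * xi + 12 + random.gauss(0, 6) for xi in pop_x]

    for n in (100, 500):
        converted = []
        for _ in range(200):                        # resampling replications
            idx = [random.randrange(len(pop_x)) for _ in range(n)]
            x = [pop_x[i] for i in idx]
            y = [pop_y[i] for i in idx]
            b = pstdev(y) / pstdev(x)               # linear equating slope
            a = mean(y) - b * mean(x)               # linear equating intercept
            converted.append(a + b * 85)            # converted score at X = 85
        print(n, round(pstdev(converted), 2))       # spread is smaller at n = 500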

Directions for Further Research

The most obvious need for further research indicated by the norming phase of the study involves an extension of the norms tables to other ages, particularly throughout the WISC-R range of 6 years 0 months to 16 years 11 months. Results of the present study suggest that renorming would consistently result in a lowering of IQs. While this could be postulated for all age groups, it is currently impossible to provide B.C. standardized scores for other than the specific age groups included in the present study.

A second need is for achievement correlates to establish the criterion validity of the B.C. norms. A third orientation for further research regarding intelligence test scores concerns the consistently higher performance reported for Canadian children on standardized tests. Since the reasons for this have not been fully examined, further studies may be considered.

The equating results indicate the need for research to confirm the uses to which intelligence tests are applied. The differential validation of tests would help to avoid the problem of overlapping usage. With regard to test equating, it is unproductive to advocate further equating efforts between tests unless there is a high degree of content similarity. With the tests now available for the individual assessment of children's intelligence, test equating is judged to be inappropriate. Therefore, the more productive orientation is toward specific and individual test validation. Thus the concluding recommendations for this study are seen to correspond to those derived from a different perspective: the judicial analysis of test bias discussed in Chapter I (p. 1).

REFERENCE NOTES

1. B.C. Research. Personal experience regarding sampling procedures for the B.C. Learning Assessment Program, November, 1980.

2. Rees, D. Personal communication regarding 1978-1979 school enrolment in British Columbia, November, 1978.

3. Ralston, M. V. Personal communication regarding language usage. November, 1979.

4. Herman, D. O. Personal communication regarding conversion of raw scores on Wechsler tests, November, 1978.

REFERENCES

American Psychological Association. Standards for educational and psychological tests. Washington, D.C.: Author, 1974.

Anastasi, A. Psychology, psychologists, and psychological testing. American Psychologist, 1967, 22, 297-306.

Anastasi, A. Psychological testing (4th ed.). N.Y.: Macmillan, 1976.

Angoff, W. H. Technical problems of obtaining equivalent scores on tests. Journal of Educational Measurement, 1964, 1, 11-13.

Angoff, W. H. Can useful general-purpose equivalency tables be prepared for different college admissions tests? In A. Anastasi (Ed.), Testing problems in perspective. Washington, D.C.: American Council on Education, 1966.

Angoff, W. H. Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed.). Washington, D.C.: American Council on Education, 1971.

Armstrong, R. J., & Jensen, J. A. Can scores obtained from the Slosson Intelligence Test be used with as much confidence as scores obtained from the Stanford-Binet Intelligence Scale? East Aurora, N.Y.: Slosson Educational Publications, 1970.

Armstrong, R., Jensen, J., & Reynolds, C. Untitled paper. East Aurora, N.Y.: Slosson Educational Publications, 1974.

Avis, W. S., Drysdale, P. D., Gregg, R. J., & Scargill, M. H. Canadian senior dictionary. Toronto: Gage, 1979.

Barratt, E. S. The relationship of the Progressive Matrices (1938) and the Columbia Mental Maturity Scale to the WISC. Journal of Consulting Psychology, 1956, 20, 294-296.

Beauchamp, D. P., Samuels, D. D., & Griffore, R. J. WISC-R Information and Digit Span scores of American and Canadian children. Applied Psychological Measurement, 1979, 3(2), 231-236.

Bersoff, D. N. P. v. Riles: Legal perspective. School Psychology Review, 1980, 9, 112-122.

Bianchini, J. C., & Loret, P. G. Anchor test study. Final report. Project report and volumes 1 through 30: 1974. (ERIC Nos. ED 092 061 through ED 092 631.)

Birkemeyer, F. The relationship between the Coloured Progressive Matrices and individual intelligence tests. Psychology in the Schools, 1964, 1, 309-312.

Birkemeyer, F. The relationship between the Coloured Progressive Matrices and the Wechsler Intelligence Scale for Children. Psychology in the Schools, 1965, 2, 278-280.

Bock, R. D. Word and image: Sources of the verbal and spatial factors in mental test scores. Psychometrika, 1973, 38, 437-457.

Brown, A. E. Intelligence tests and the politics of school psychology. Interchange, 1976-77, 7(3), 17-20.

Burke, H. R. Raven's Progressive Matrices: A review and critical evalua• tion. Journal of Genetic Psychology, 1958, 93, 199-228.

Burke, H. R., & Bingham, W. C. Raven's Progressive Matrices: More on construct validity. Journal of Psychology, 1969, 72, 247-251.

Carroll, J. B., Davies, P., & Richman, B. Word frequency book. N.Y.: American Heritage, 1971.

Clarizio, H. F. In defense of the IQ test. School Psychology Digest, 1979, 8, 79-88.

Covin, T. M. Comparison of SIT and WISC-R among special education candidates. Psychology in the Schools, 1977a, 14, 19-23.

Covin, T. M. Relationship of the SIT and PPVT to the WISC-R. Journal of School Psychology, 1977b, 15, 259-260.

Crofoot, M. J., & Bennett, T. S. A comparison of three screening tests and the WISC-R in special education evaluations. Psychology in the Schools, 1980, 17, 474-478.

Cronbach, L. J. Essentials of psychological testing (3rd ed.). N.Y.: Harper & Row, 1970.

Cronbach, L. J. Five decades of public controversy over mental testing. American Psychologist, 1975, 30, 1-14.

Cureton, E. E. Minimum requirements in establishing and reporting norms on educational tests. Harvard Educational Review, 1941, 11, 287-300.

Doppelt, J. E., & Kaufman, A. S. Estimation of the differences between WISC-R and WISC IQs. Educational and Psychological Measurement, 1977, 37, 417-424.

Dunn, L. M. Expanded manual for the Peabody Picture Vocabulary Test. Circle Pines, Minnesota: American Guidance Service, 1965.

Ebel, R. L. Essentials of educational measurement. Englewood Cliffs, N.J.: Prentice-Hall, 1972.

Ebel, R. L. The social consequences of educational testing. In W. A. Mehrens (Ed.), Readings in measurement and evaluation in education and psychology. N.Y.: Holt, Rinehart & Winston, 1976.

Elley, W. B., & MacArthur, R. S. The Standard Progressive Matrices as a culture-reduced measure of general intellectual ability. Alberta Journal of Educational Research, 1962, 8, 54-65.

Flanagan, J. C. Units, scores, and norms. In E. F. Lindquist (Ed.), Educational measurement. Washington, D.C.: American Council on Education, 1951.

Flanagan, J. C. Obtaining useful comparable scores for non-parallel tests and test batteries. Journal of Educational Measurement, 1964, 1, 1-4.

Fraser, D. Mental abilities of British Columbia Indian Children. Canadian Counsellor, 1969, 3, 42-48.

Glass, G. V., & Stanley, J. C. Statistical methods in education and psychology. Englewood Cliffs, N.J.: Prentice-Hall, 1970.

Goldstain, C. The performance of Cowichan Native Indian Children on three tests of cognitive ability. Unpublished master's thesis, University of British Columbia, 1980.

Guilford, J. P. Psychometric methods. N.Y.: McGraw-Hill, 1954.

Gulliksen, H. Theory of mental tests. N.Y.: Wiley, 1950.

Hawthorn, H. B. (Ed.). A survey of the contemporary Indians of Canada. Economic, political, educational needs and policies (Vol. 2). Ottawa: Information Canada, 1971.

Herman, D. 0. The WISC-R, its development and usage: Some findings from sequential standardizations. Paper presented at the annual meeting of the National Association for School Psychologists, San Diego, March, 1979.

Himelstein, P. Reviews of the Slosson Intelligence Test. In O. K. Buros (Ed.), The seventh mental measurements yearbook. Highland Park, N.J.: Gryphon Press, 1972.

Houts, P. L. Behind the call for test reform and abolition of the IQ. Phi Delta Kappan, 1976, June, 669-673.

Jaeger, R. M. The National test-equating study in reading (The Anchor Test Study). NCME Measurement in Education, Summer 1973, 4 (Whole No. 4).

Jaeger, R. M. Some exploratory indices for selection of a test equating method. Paper presented at the meeting of the American Educational Research Association, Boston, April 1980.

Kamin, L. J. The science and politics of IQ. N.Y.: John Wiley & Sons, 1974.

Kamin, L. J. Social and legal consequences of I.Q. tests as classifica• tion instruments: Some warnings from our past. Journal of School Psychology, 1975, 13, 317-323.

Kaufman, A. S. Factor analysis of the WISC-R at 11 age levels between 6½ and 16½ years. Journal of Consulting and Clinical Psychology, 1975, 43, 135-147.

Kaufman, A. S. Intelligent testing with the WISC-R. N.Y.: Wiley, 1979.

Kaufman, A. S., & Doppelt, J. E. Analysis of WISC-R standardization data in terms of the stratification variables. Child Development, 1976, 47, 165-171.

Krichev, A. Review of WISC-R. In O. K. Buros (Ed.), The eighth mental measurements yearbook (Vol. 1). Highland Park, N.J.: Gryphon Press, 1978.

Kelley, T. L. Statistical method. N.Y.: Macmillan, 1924.

Kennett, K. F. Intelligence and socioeconomic status in a Canadian sample. Alberta Journal of Educational Research, 1972, 18, 45-50.

Larry P. v. Wilson Riles. OPINION, U.S. District Court for Northern District of California (No. C-71-2270 RFP), October 11, 1979.

Laycock, S. R. The use and misuse of I.Q. tests. The B.C. Teacher, 1968, 47, 234-236.

Lennon, R. T. Equating nonparallel tests. Journal of Educational Measurement, 1964, 1, 15-18.

Lennon, R. T. A comparison of results of three intelligence tests. In C. I. Chase & H. G. Ludlow (Eds.), Readings in educational and psychological measurement. Boston: Houghton-Mifflin, 1966a.

Lennon, R. T. Norms: 1963. In A. Anastasi (Ed.), Testing problems in perspective. Washington, D.C.: American Council on Education, 1966b.

Lennon, R. T. Perspective on intelligence testing. NCME Measurement in Education, Spring 1978, 9 (Whole No. 2).

Lindsay, C. A., & Prichard, M. A. An analytical procedure for the equipercentile method of equating tests. Journal of Educational Measurement, 1971, 8, 203-207.

Lindquist, E. F. Equating scores on non-parallel tests. Journal of Educational Measurement, 1964, 1, 5-9.

Linn, R. L. Anchor test study: The long and the short of it. Journal of Educational Measurement, 1975, 12, 201-214.

Lord, F. M. Nominally and rigorously parallel test forms. Psychometrika, 1964, 29, 335-345.

Lord, F. M. Notes on comparable scales for test scores (Research Bulletin). Princeton, N.J.: Educational Testing Service, 1950.

Lord, F. M., & Novick, M. R. Statistical theories of mental test scores. Reading, Mass.: Addison-Wesley, 1968.

Loret, P. G., Seder, A., Bianchini, J. E., & Vale, C. A. Anchor test study. Equivalence and norms tables for selected reading achievement tests (grades 4, 5, 6). Washington, D.C.: U.S. Government Printing Office, 1974.

Loretan, J. 0. The decline and fall of group intelligence testing. Teachers College Record, 1965, 67, 10-17.

Lowrance, D., & Anderson, H. N. A comparison of the Slosson Intelligence Test and the WISC-R with elementary school children. Psychology in the Schools, 1979, 16, 361-364.

Marco, G. L., Petersen, N. S., & Stewart, E. E. A test of the adequacy of curvilinear score equating models. Paper presented at the Computerized Adaptive Testing Conference, Minneapolis, June, 1979.

Marks, R. Providing for individual differences: A history of the intelligence testing movement in North America. Interchange, 7, 3-16.

Marks, E., & Lindsay, C. A. Some results relating to test equating under relaxed test form equivalence. Journal of Educational Measurement, 1972, 9, 45-56.

Martin, A. W., & Wiechers, J. E. Raven's Colored Progressive Matrices and the Wechsler Intelligence Scale for Children. Journal of Consulting Psychology, 1954, 18, 143-144.

Martin, J. D., & Kidwell, J. C. Intercorrelations of the Wechsler Intelligence Scale for Children-Revised, the Slosson Intelligence Test, and the National Educational Developmental Test. Educational and Psychological Measurement, 1977, 37, 1117-1120.

Matarazzo, J. D. Wechsler's measurement and appraisal of adult intelligence (5th ed.). Baltimore: Williams & Wilkins, 1972.

Mehrotra, K. K. A comparative study of WISC and Raven's PM. Psychological Studies, 1968, 13, 47-50.

Mercer, J. R. The struggle for children's rights: Critical juncture for school psychology. The School Psychology Digest, 1977, 6, 4-19.

Mercer, J. R. SOMPA Technical manual. N.Y.: Psychological Corporation, 1979.

Millman, J., & Lindlof, J. The comparability of fifth grade norms of the California, Iowa, and Metropolitan achievement tests. Journal of Educational Measurement, 1964, 1, 135-137.

Ministry of Education. Report on education 1976-77. Victoria, B.C.: Queen's Printers, 1978.

Mize, J. M., Callaway, B., & Smith, J. W. Comparison of reading disabled children's scores on the WISC-R, Peabody Picture Vocabulary Test, and Slosson Intelligence Test. Psychology in the Schools, 1979, 16, 356-358.

More, A. J., & Oldridge, B. An approach to non-discriminatory assessment of Native Indian Children. B. C. Journal of Special Education, 1980, 4, 51-59.

NASP position on unbiased assessment. Communique, 1976, 4(8).

Newland, T. E. Assumptions underlying psychological testing. Journal of School Psychology, 1973, 11, 316-322.

Nicholson, C. L. Analysis of functions of the Slosson Intelligence Test. Perceptual and Motor Skills, 1970, 31, 627-631.

Nunnally, J. C., Jr. Introduction to psychological measurement. N.Y.: McGraw-Hill, 1970.

Oldridge, O. A. A comparison of high ability Canadian and American grade 8 and 9 students in six areas of academic achievement when C.A., I.Q., and grade placement are equated. Paper presented at the Canadian Educational Research Association, St. John's, June, 1968.

Peters, H. D. The validity of the Wechsler Intelligence Scale for Children-Revised. Canadian Journal of Behavioral Science, 1976, 8, 414-417.

Poissant, P. A. A product moment correlation between the Stanford-Binet and Slosson Intelligence Tests with slow learners. East Aurora, N.Y.: Slosson Educational Publications, 1967.

Raven, J. C. Guide to using the Crichton Vocabulary Scale with Progressive Matrices (1947). Sets A, Ab, B. London, England: H. K. Lewis, 1954.

Raven, J. C. Guide to using the Mill Hill Vocabulary Scale with the Progressive Matrices Scales. London, England: H. K. Lewis & Co., 1958.

Raven, J. C. Guide to the Standard Progressive Matrices. Sets A, B, C, D, and E. London, England: H. K. Lewis & Co., 1960.

Raven, J. C., Court, J. H., & Raven, J. Manual for Raven's Progressive Matrices and vocabulary scales. Section 1: General overview. London, England: H. K. Lewis, 1976.

Raven, J. C., & Walshaw, J. B. Vocabulary tests. British Journal of Medical Psychology, 1944, 20, 185-194.

Reschly, D. J. WISC-R factor structures among Anglos, Blacks, Chicanos, and Native-Indian Papagos. Journal of Consulting and Clinical Psychology, 1978, 46, 417-422.

Reschly, D. J. Concepts of bias in assessment and WISC-R research with minorities. In H. Vance & F. Wallbrown (Eds.), WISC-R: Research and interpretation. Washington, D.C.: National Association of School Psychologists, in press.

Richmond, B. 0., & Long, M. WISC-R and PPVT scores for black and white mentally retarded children. Journal of School Psychology, 1977, 15, 261-263.

Ritter, D., Duffey, J., & Fischman, R. Comparability of Slosson and S-B estimates of intelligence. Journal of School Psychology, 1973, 11, 224-227.

Salvia, J., & Ysseldyke, J. E. Assessment in special and remedial education. Boston: Houghton Mifflin, 1978.

Sarason, S. B. The unfortunate fate of Alfred Binet and school psychology. In S. Mietzitis & M. Orme (Eds.), Innovation in school psychology. Toronto: Ontario Institute for Studies in Education, 1977.

Sattler, J. M. Analysis of functions of the 1960 Stanford-Binet Intelligence Scale, Form L-M. Journal of Clinical Psychology, 1965, 21, 173-179.

Sattler, J. M. Assessment of children's intelligence (Revised reprint). Philadelphia, Pa.: W. B. Saunders Co., 1974.

Siegel, S. Nonparametric statistics for the behavioral sciences. N.Y.: McGraw-Hill, 1956.


Silverstein, A. B. Alternative factor analytic solutions for the Wechsler Intelligence Scale for Children-Revised. Educational and Psychological Measurement, 1977, 37, 121-124.

Slosson, R. L. Slosson Intelligence Test (SIT) for Children and Adults. East Aurora, N.Y.: Slosson Educational Publications Inc., 1975.

Spreen, O., & Tryk, H. E. WISC Information subtest in a Canadian population. Canadian Journal of Behavioral Science, 1970, 2, 294-298.

Stacey, C. L., & Carleton, F. O. The relationship between Raven's Colored Progressive Matrices and two tests of general intelligence. Journal of Clinical Psychology, 1955, 11, 84-85.

Stewart, K. D., & Jones, E. C. Validity of the Slosson Intelligence Test: A ten-year review. Psychology in the Schools, 1976, 13, 372-380.

Stone, M. An interpretive profile for the Slosson Intelligence Test. Psychology in the Schools, 1975, 12, 330-333.

Terman, L. M., & Merrill, M. A. Measuring intelligence. Boston: Houghton Mifflin, 1937.

Thorndike, E. L. On finding equivalent scores in tests of intelligence. Journal of Applied Psychology, 1922, 6, 29-33.

Thorndike, R. L. Community variables as predictors of intelligence and academic achievement. Journal of Educational Psychology, 1951, 42, 321-338.

Thorndike, R. L. Causation of Binet IQ decrements. Journal of Educational Measurement, 1977, 14, 197-202.

Thorndike, R. L., Hagen, E., & Wright, E. N. Canadian Cognitive Abilities Test. Form 1. Levels A-F. Grades 3-9. Examiner's manual. Don Mills, Ontario: Thomas Nelson, 1974.

Valett, R. E. Clinical profile for the Stanford-Binet Intelligence Scale. Palo Alto, California: Consulting Psychologists Press, 1957.

Vance, H. B., Lewis, R., & DeBell, S. Correlations of the Wechsler Intelligence Scale for Children-Revised, Peabody Picture Vocabulary Test, and Slosson Intelligence Test for a group of learning disabled students. Psychological Reports, 1979, 44, 735-738.

Vance, H. B., Prichard, K. K., & Wallbrown, F. H. Comparison of the WISC-R and PPVT for a group of mentally retarded students. Psychology in the Schools, 1978, 15, 349-351.

Vance, H. B., & Wallbrown, F. H. The structure of intelligence for black children: A hierarchical approach. Psychological Record, 1978, 28, 31-39.

Vernon, P. E. WISC-R. Canadian Psychological Association Bulletin, 1974, 4, 8-9.

Vernon, P. E. Modification of the WISC-R for Canadian use. Canadian Psychological Association Bulletin, 1976, 6, 4-5.

Vernon, P. E. Final report on modifications of WISC-R for Canadian use. Canadian Psychological Association Bulletin, 1977, 5, 5-7.

Vitro, F. T. In defense of intelligent intelligence testing. Academic Therapy, 14, 223-228.

Wallbrown, F. H. A factor analytic framework for the clinical interpreta• tion of the WISC-R. Paper presented at the annual meeting of the National Association of School Psychologists, San Diego, March, 1979.

Wallbrown, F. H., Blaha, J., Wallbrown, J. D., & Engin, A. W. The hierarchical factor structure of the Wechsler Intelligence Scale for Children-Revised. Journal of Psychology, 1975, 89, 223-235.

Wechsler, D. Manual for the Wechsler Intelligence Scale for Children. N.Y.: Psychological Corporation, 1949.

Wechsler, D. Manual for the Wechsler Intelligence Scale for Children— Revised. N.Y.: Psychological Corporation, 1974.

Wesman, A. C. Comparability vs. equivalence of test scores. Test Service Bulletin, The Psychological Corporation, 1958, No. 53.

Whitworth, R. H. Review of WISC-R. In O. K. Buros (Ed.), The eighth mental measurements yearbook (Vol. 1). Highland Park, N.J.: Gryphon Press, 1978.

Wiedl, K. H., & Carlson, J. S. The factorial structure of the Raven Coloured Progressive Matrices test. Educational and Psychological Measurement, 1976, 36, 409-413.

Winer, B. J. Statistical principles in experimental design (2nd ed.). N.Y.: McGraw-Hill, 1971.

Wright, E. N., Thorndike, R. L., & Hagen, E. Canadian Lorge-Thorndike Intelligence Tests. Technical Supplement. Toronto: Thomas Nelson, 1972.

APPENDIX A

LETTERS AND CONSENT FORMS

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA V6T 1W5
Education Clinic
FACULTY OF EDUCATION

This is a request for your endorsement of a province-wide research project involving the use of intelligence tests in schools in British Columbia. The project is being undertaken as a doctoral dissertation in the division of Educational Psychology at the University of British Columbia, and has received the approval of the Behavioral Sciences Screening Committee for Research and Other Studies Involving Human Subjects at the University. Financial support has been provided through a grant from the Educational Research Institute of B.C.

The following is a brief abstract describing the purposes and procedures involved.

This research project addresses three questions concerning the use of individual intelligence tests in schools in British Columbia.
1. Can test results be interpreted fairly for Canadian children using published norms established in the United States and Britain? The applicability of the existing norms for five individually-administered tests (the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Slosson Intelligence Test, the Peabody Picture Vocabulary Test, the Standard Progressive Matrices, and the Mill Hill Vocabulary Scale) will be assessed in response to this question.
2. Is it feasible to make comparisons between IQ scores which were obtained from different tests of intelligence? Comparisons of this nature are frequently made based on the assumption of direct score correspondence between intelligence tests. Using the WISC-R as an anchor, the feasibility of constructing tables of comparable values among the five tests will be examined.
3. Can the Mill Hill Vocabulary Scale be used in conjunction with the Standard Progressive Matrices to provide verbal and nonverbal estimates of intellectual ability similar to that dichotomy provided by WISC-R Verbal and Performance scores? Raven constructed these tests for use in combination and standardized them jointly in Britain. In North America the Standard Progressive Matrices is commonly used in isolation as a measure of intelligence. The Mill Hill Vocabulary Scale will be introduced in this study to explore its validity as a verbal measure of intelligence and to examine its combined use with the Standard Progressive Matrices as a measure of general intelligence.
The tests will be administered to a random, proportionately representative sample of 540 children, aged 7, 9, and 11 years, from public and independent schools.

The following list of schools was randomly selected from School District No. . Would you please verify the principals' names so that, given your permission, we may contact them directly. Thank you.

I consent to the above schools being contacted to request their participation in this research project. In doing so, I give my support to the study with the understanding that each school is subsequently free to make its own decision regarding participation.

signature

On behalf of School District No. , I decline involvement in this research project.

signature

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA V6T 1W5
Education Clinic
FACULTY OF EDUCATION

Your superintendent has given us permission to contact you to request your support and participation in a research project involving the use of intelligence tests in schools in British Columbia. The project is being undertaken as a doctoral dissertation in the division of Educational Psychology at the University of British Columbia, and has received the approval of the Behavioral Sciences Screening Committee for Research and Other Studies Involving Human Subjects at the University. Financial support has been provided through a grant from the Educational Research Institute of B.C.

The following is a brief abstract describing the purposes and procedures involved.

This research project addresses three questions concerning the use of individual intelligence tests in schools in British Columbia.
1. Can test results be interpreted fairly for Canadian children using published norms established in the United States and Britain? The applicability of the existing norms for five individually-administered tests (the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Slosson Intelligence Test, the Peabody Picture Vocabulary Test, the Standard Progressive Matrices, and the Mill Hill Vocabulary Scale) will be assessed in response to this question.
2. Is it feasible to make comparisons between IQ scores which were obtained from different tests of intelligence? Comparisons of this nature are frequently made based on the assumption of direct score correspondence between intelligence tests. Using the WISC-R as an anchor, the feasibility of constructing tables of comparable values among the five tests will be examined.
3. Can the Mill Hill Vocabulary Scale be used in conjunction with the Standard Progressive Matrices to provide verbal and nonverbal estimates of intellectual ability similar to that dichotomy provided by WISC-R Verbal and Performance scores? Raven constructed these tests for use in combination and standardized them jointly in Britain. In North America the Standard Progressive Matrices is commonly used in isolation as a measure of intelligence. The Mill Hill Vocabulary Scale will be introduced in this study to explore its validity as a verbal measure of intelligence and to examine its combined use with the Standard Progressive Matrices as a measure of general intelligence.
The tests will be administered to a random, proportionately representative sample of 540 children, aged 7, 9, and 11 years, from public and independent schools.

School Code No.

I consent to participate in the research project as described concerning the use of intelligence tests in B.C.

The person who is a qualified WISC-R test administrator for this school is:

Name: Address:

Telephone:

signature.

I do not consent to participation in this research.

signature

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA V6T 1W5
Education Clinic
FACULTY OF EDUCATION

This is a request for your participation in a research project involving the use of intelligence tests in public and independent schools in British Columbia. To ensure the applicability of the results to children attending other than public schools, we have included independent school children in our sample in the same proportion as they exist in the province. We ask that you read the enclosed materials describing the study and sincerely hope that you will agree to be included in this research. The project is being undertaken as a doctoral dissertation in the division of Educational Psychology at the University of British Columbia, and has received the approval of the Behavioral Sciences Screening Committee for Research and Other Studies Involving Human Subjects at the University. Financial support has been provided through a grant from the Educational Research Institute of B.C.

The following is a brief abstract describing the purposes and procedures involved.

This research project addresses three questions concerning the use of individual intelligence tests in schools in British Columbia.
1. Can test results be interpreted fairly for Canadian children using published norms established in the United States and Britain? The applicability of the existing norms for five individually-administered tests (the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Slosson Intelligence Test, the Peabody Picture Vocabulary Test, the Standard Progressive Matrices, and the Mill Hill Vocabulary Scale) will be assessed in response to this question.
2. Is it feasible to make comparisons between IQ scores which were obtained from different tests of intelligence? Comparisons of this nature are frequently made based on the assumption of direct score correspondence between intelligence tests. Using the WISC-R as an anchor, the feasibility of constructing tables of comparable values among the five tests will be examined.
3. Can the Mill Hill Vocabulary Scale be used in conjunction with the Standard Progressive Matrices to provide verbal and nonverbal estimates of intellectual ability similar to that dichotomy provided by WISC-R Verbal and Performance scores? Raven constructed these tests for use in combination and standardized them jointly in Britain. In North America the Standard Progressive Matrices is commonly used in isolation as a measure of intelligence. The Mill Hill Vocabulary Scale will be introduced in this study to explore its validity as a verbal measure of intelligence and to examine its combined use with the Standard Progressive Matrices as a measure of general intelligence.
The tests will be administered to a random, proportionately representative sample of 540 children, aged 7, 9, and 11 years, from public and independent schools.

code no.

Parent Consent Form

I consent to 's participation in the testing research study at School. I am aware that this will involve two testing sessions of approximately one hour each, and that the tests will be returned anonymously to the University of British Columbia for scoring. I understand that confidentiality of test results will be maintained and that no individual scores will be released. I also understand that participation in this project is voluntary and may be terminated at any time.

signature

One of the problems in doing research of this scope is to ensure that the children chosen are truly representative of all the children in the province. To help us judge this, we have already grouped the children according to district, size of school, age, and sex. One further piece of information which we would like to compare to Census Canada data is the level of education of the head of household (that is, the major wage-earner in the family). Would you please put an "X" in front of the category below which best describes the completed level of education of the head of your household.

I Grade 8 and below

II Grades 9 - 10

III Grades 11-13

IV Post-secondary, non-university

V Post-secondary, including university

I am unwilling to have involved in the testing research study.

signature

Request for Subject Participation

(to be read to each subject individually prior to testing)

, as you may know by now, you have been selected to take part in a research project to see how children in British Columbia answer questions on some tests. You were chosen partly because we need children your age, and partly because we need children from this part of B.C. Altogether there are more than 500 children from all over the province doing the same tests that you will do. When we finish I will send these papers with your work to UBC. Your name won't be on them so nobody will know it was you— we only want to see how children answer the questions, okay?

I want you to remember that these tests have nothing to do with your schoolwork and will not count for your grades on your report card. Most children enjoy doing the tests and I'm sure you will too. Before we start, I want you to know that you don't have to do this, but that your help is important for a lot of children in B.C. I would appreciate it if you would agree to work on these tests with me. Okay?

THE UNIVERSITY OF BRITISH COLUMBIA
2075 WESBROOK MALL
VANCOUVER, B.C., CANADA V6T 1W5

FACULTY OF EDUCATION

We are contacting you to request your assistance in a research project involving the use of individual intelligence tests in schools in British Columbia. The project is being undertaken as a doctoral dissertation in the division of Educational Psychology at the University of British Columbia and has received financial support through a grant from the Educational Research Institute of B.C.

Enclosed is a brief abstract describing the purposes of the study. As you know, the process of establishing norms requires a large number of test administrations. The design of this study requires that all tests be given to a random, representative sample of 540 children, aged 7, 9, and 11 years. These children are being selected from 195 public and independent schools (no more than three children per school) in 52 school districts throughout the province of British Columbia.

Our procedure has been to randomly select a stratified sample of schools and then to gain the endorsement of the district superintendents and the support of the school principals. Where possible, we have asked the principals to identify for us the qualified WISC-R tester for his or her school. The principal(s) of the following school(s) have referred your name:

We will be contacting you by telephone shortly to ask for your participation in this project. Before doing so, however, we wanted you to have a chance to acquaint yourself with the procedures. We are now in the process of selecting the children for testing and seeking parental consent. Following this, we ask that all five tests be administered to each of these children. It is anticipated that this will require approximately 2½ hours' time per child. The tests should be administered in two sessions to be scheduled at your convenience sometime before the end of the school year. Detailed directions and test protocols will be supplied. To reduce your time, all scoring will be done at UBC.

We are very aware that this is a busy time for you and that testing demands are considerable. We hope that you will appreciate, however, the desirability of having local norms for these tests, including a "Canadianized" version of the WISC-R. Since we are operating on a limited budget, and since the scope of the project is large, we must rely on the help and support of many of you to make it a success. We do assure you a copy of the results of the study for your future use.

Individually-Administered Intelligence Tests: Determination of Norm Relevance and Inter-Test Comparability in British Columbia

This study has two research orientations. The first is substantive and addresses three questions concerning the use of individual intelligence tests in schools in British Columbia. The second orientation is methodological and concerns the application of equivalent score conversion techniques to intelligence test scores. The following four questions summarize the purposes of the study.
1. Can test results be interpreted fairly for Canadian children using published norms established in the United States and Britain? The applicability of the existing norms for five individually-administered tests (the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Slosson Intelligence Test, the Peabody Picture Vocabulary Test, the Standard Progressive Matrices, and the Mill Hill Vocabulary Scale) will be assessed in response to this question.
2. Is it feasible to make comparisons between IQ scores which were obtained from different tests of intelligence? Comparisons of this nature are frequently made based on the assumption of direct score correspondence between intelligence tests. Using the WISC-R as an anchor, the feasibility of constructing tables of comparable values among the five tests will be examined.
3. Can the Mill Hill Vocabulary Scale be used in conjunction with the Standard Progressive Matrices to provide verbal and nonverbal estimates of intellectual ability similar to that dichotomy provided by WISC-R Verbal and Performance scores? Raven constructed these tests for use in combination and standardized them jointly in Britain. In North America the Standard Progressive Matrices is commonly used in isolation as a measure of intelligence. The Mill Hill Vocabulary Scale will be introduced in this study to explore its validity as a verbal measure of intelligence and to examine its combined use with the Standard Progressive Matrices as a measure of general intelligence.
4. What are the meaningfulness and relative efficiency of the linear and equipercentile score equating techniques when applied to intelligence tests? These techniques have previously been compared for reading achievement tests which have a high degree of content similarity. The application of these techniques to intelligence tests will be explored.

APPENDIX B

WISC-R CANADIAN SUBSTITUTION ITEMS

Canadian Substitution Items for the WISC-R

Information 16 (original item: Bulb)
  Substitution: Who invented the telephone?
  Acceptable answers: Bell, or Graham Bell

Information 17 (original item: 1776)
  Substitution: From which country did most of the first settlers in Canada come?
  Acceptable answers: England, Britain, Scotland, or France

Information 19 (original item: Border)
  Substitution: Name three oceans that border Canada.
  Acceptable answers: Accept Arctic and either Atlantic or Pacific, or both

Information 21 (original item: Chile)
  Substitution: In what continent is Sweden?
  Acceptable answers: Europe

Information 24 (original item: Tall)
  Substitution: How tall is the average Canadian man?
  Acceptable answers: 5'7" - 5'11", or 170-180 centimetres

Information 27 (original item: LA/NY)
  Substitution: How far is it from Toronto to Vancouver?
  Acceptable answers: 1700-2700 miles, or 2700-4300 kilometres

Comprehension 17 (original item: Senators)
  Substitution: Members of Parliament for Senators and Congressmen
  Acceptable answers: Same scoring criteria as in Manual

APPENDIX C

PROJECT HANDBOOK

Tester's Handbook

for the

B.C. Intelligence Testing Research Project

Contents:

Limitations on Children to be Included in the Sample Subjects' Code Numbers and Identification of Alternatives Parent Consent Forms Sample of Parent Consent Forms Child Consent Forms Scheduling Test Sessions and Order of Test Administrations Testing Directions Scoring Directions WISC-R Canadianization Summary of Directions 192

Limitations on Children to be Included in the Sample 1

Below is a segment of the letter sent to principals to guide them in the preparation of school lists from which the children were randomly selected for testing. To guarantee that children for whom testing is inappropriate do not appear in the sample, we have asked that the following children be excluded:

"a) physically, emotionally, or mentally handicapped children

Please exclude all children enrolled in classes for the physically handicapped, emotionally disturbed, or trainable mentally retarded. These children usually require special testing procedures beyond the nature of this study and to include them would be unfair to their particular needs.

b) Native Indian children

Previous studies have indicated that standardized intelligence tests may contain items which are culturally biased against Native Indian children. Current trends in testing emphasize the need for separate norms for these children. Dr. Oldridge is hoping to establish these Native Indian norms for B.C. in the near future. In the meantime, to avoid misapplication of the results of the present study, we have decided to exclude Native Indian children.

c) children who are not fluent in the use and comprehension of the English language

Since most of the tests are verbal, it is essential that the children being tested speak and understand English. Please exclude children in ESL classes, or children for whom you know verbal tests would be unfair. If there are cases where you have a question about degree of proficiency, include the names on the list. If any of these children are selected in our sample, we will include an English proficiency measure in their testing package."

If you have been assigned to test a child whom you feel should not be included on the basis of these restrictions, please let us know and an alternate will be chosen. It is important, however, that no changes or substitutions be made for other reasons. The unbiasedness and representativeness of the sample are dependent on the observance of the random selection procedures.

Subjects' Code Numbers and Identification of Alternates 2

So you know how it was done:

All 7, 9, and 11 year olds in each school were listed, excluding the three categories of children described on the previous page. We further limited potential subjects to those who would be within three months of their half year at the time of testing: e.g., 7-year-olds would be between 7 yrs. 3 mos. and 7 yrs. 9 mos. This was done in an attempt to simulate standardization procedures for the WISC-R, where all children were tested within six weeks of their midyear (see p. 17 in the WISC-R Manual).

One subject and one alternate were randomly selected at each age level in each school. They are coded as in the following example:

9901 - 17 - A
9901 - 34 - B

where 9901 is the school code number
17, 34 are simply order numbers on the school lists
A is the first-drawn subject
B is the alternate subject

The parents of all "A" children were contacted first and these children will appear in the final sample if consent was given. "B" children should be used only if an alternate is required.

The principal of each school has forms ready to contact "B" parents if this is necessary. This step can therefore be taken immediately and doesn't require a contact with UBC. If there are any queries or complications, however, please call us collect at any time.

Parent Consent Forms (sample on next page) 3

Please get from the principal the signed consent form for each child before any testing is done. The code number in the upper right hand corner is our only means of identification of the subject and therefore must appear on each test protocol (enter as name).

We also need to know the level of education of the head of household. Please transcribe this information (the Roman numeral) on the front of the WISC-R protocol.

code no.

Parent Consent Form

I consent to 's participation in the testing research study at School. I am aware that this will involve two testing sessions of approximately one hour each, and that the tests will be returned anonymously to the University of British Columbia for scoring. I understand that confidentiality of test results will be maintained and that no individual scores will be released. I also understand that participation in this project is voluntary and may be terminated at any time.

signature

One of the problems in doing research of this scope is to ensure that the children chosen are truly representative of all the children in the province. To help us judge this, we have already grouped the children according to district, size of school, age, and sex. One further piece of information which we would like to compare to Census Canada data is the level of educa• tion of the head of household (that is, the major wage-earner in the family). Would you please put an "X" in front of the category below which best describes the completed level of education of the head of your household.

I Grade 8 and below

II Grades 9-10

III Grades 11-13

IV Post-secondary, non-university

V Post-secondary, including university

I am unwilling to have involved in the testing research study.

signature

Child Consent Forms (Request for Subject Participation) 5

Stapled to the front of each child's test package is the consent form to be read prior to testing. It is recognized that a large part of a school psychologist's or psychometrician's professional skill involves soliciting and maintaining the cooperation of children during testing. Since this skill is involved in the administration of individual intelligence tests, it will be necessary in many cases to encourage subjects' participation.

On the other hand, however, it is contrary to the purposes of the study and the validity of test results to coerce any child who is seriously opposed to participation. Please use your judgement concerning unwillingness on the part of the child.

If it is necessary to select an alternate, please arrange to have the "B" child tested following procedures outlined on page 2 of this handbook. The principal has been instructed regarding these procedures and has the parent letters and consent forms on hand.

Scheduling Test Sessions and Order of Test Administration

Please schedule two testing sessions for each child on different days (i.e. not morning and afternoon of the same day). During Session I, administer the WISC-R. At Session II, administer the other three tests. Please note that the Standard Progressive Matrices and the Mill Hill Vocabulary Scale are considered as one test and are given in the order: SPM followed by MHVS.

The order of administration is important and we ask that you do not alter it. In each case the WISC-R is given first. The other tests have been counterbalanced for order and randomly assigned to children; this procedure will control for any possible sequence effects. It is therefore important that the order be maintained.
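For illustration only (this sketch is not part of the original handbook), the following Python fragment shows one way such a counterbalanced, randomly assigned Session II order could be generated. The subject codes and the seed below are hypothetical:

    import itertools
    import random

    # The WISC-R is always administered first (Session I); the remaining
    # tests are counterbalanced for order across children.
    # SPM and MHVS count as a single test, given as SPM then MHVS.
    OTHER_TESTS = ["PPVT", "SIT", "SPM+MHVS"]

    def assign_orders(subject_ids, seed=None):
        """Assign each subject one of the 3! = 6 possible Session II orders,
        cycling through the orders so each occurs about equally often."""
        rng = random.Random(seed)
        orders = list(itertools.permutations(OTHER_TESTS))
        ids = list(subject_ids)
        rng.shuffle(ids)  # random assignment of subjects to orders
        return {sid: ["WISC-R"] + list(orders[i % len(orders)])
                for i, sid in enumerate(ids)}

    # Example: six hypothetical subject codes, one per order
    print(assign_orders(["9901-17-A", "9901-34-B", "9902-05-A",
                         "9902-11-A", "9903-02-A", "9903-09-A"], seed=1))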

Testing Directions

Follow the manual procedures as usual for the WISC-R, the Slosson, the Peabody Picture Vocabulary Test, and the Raven's Standard Progressive Matrices. The directions for the Mill Hill Vocabulary Scale accompany the test record forms in the test package.

Scoring Directions

All tests will be scored at UBC: this procedure will both save you time and guarantee the anonymity of results. Please, therefore, ensure that you record responses fully to allow a second party to score accurately.

WISC-R Canadianization

The substitution items for the WISC-R have been enclosed within each protocol. They include alternate wordings and responses for six Information items and one Comprehension item. The items in the protocol have been circled in red to remind you to make the substitutions. Please read only the new wordings in each case.

You are welcome to keep these lists for your own use. The substitutions were recommended by Dr. Philip Vernon of the University of Calgary in the Bulletin of the Canadian Psychological Association. These wordings were found to yield item pass percentages for a Canadian sample which are closer to those of the original norm sample than are pass percentages using the original wording.

The norms which we will provide you as a result of this study will be applicable only to these wordings.

Summary of Directions

1. get parent consent forms from the principal

2. schedule 2 testing sessions per child

Session #I

3. record code number, sex, and birthdate of the child on each protocol

4. record the level of education of the head of household from the green Parent Consent Form onto the top of the WISC-R protocol

5. read the Request for Subject Participation (pale yellow form)

6. administer the WISC-R with the Canadian substitutions (goldenrod form)

Session #II

7. check that the code number, sex, and birthdate (as in #3 above) are on each test protocol

8. administer tests in specified order

Please retain the Parent Consent Forms at the school until all testing is completed.

APPENDIX D

MHVS SCORING GUIDES

Mill Hill Vocabulary Scoring Guides

Set A

1. CAP

1 point—Cover (lid) for a bottle....Like a cork (Q) like a lid.... Wear it (Q) a baseball cap....To put a cap on, put a top on....Cover the top....A small quantity of explosive in a wrapper....Anything like a cap—top of a mushroom....Hat....A thing you wear (put) on your head....What you shoot in a gun.... Something you wear to keep the sun out of your eyes

0 points—Boy's cap

2. LOAF

1 point—Spend time idly....Do nothing....Bread that is shaped and baked as one piece.... Food shaped like a loaf of bread.... Food (Q) you eat it....Dough....You bake it....Loaf of bread....It has a crust....Meatloaf

0 points—White....Meat....It's round.... It's long.... Something you eat....Piece of bread.... Bread

3. FROCK

1 point—Gown....Dress....Loose outer garment....Clothes....You wear it....It keeps you warm

0 points—

3b. DRESS

1 point—(Girls) Wear it....Blouse and skirt combined in one....An outer garment worn by women, girls and babies.... Clothes.... Formal clothes.... Clothing.... To put clothes on....Decorate....Trim.... Adorn.... Gown.... Frock.... Like this (indicating a dress worn)

0 points—Skirt

4. DAMP

1 point—A bit wet....Sort of wet and sort of dry....Grass is damp with dew on it....Slightly wet....Moist....Moisture....Not dry....When it rains....Put the washing out (Q) it's damp

0 points—Dry....Grass is damp....Floor is damp (Q) you mustn't sit on it....Wet

5. NEAR

1 point—Close....Not far....Just beside....Touching....By....Next.... Almost....Together....Near somebody (Q) you're talking to them.... Nearly at it

0 points—Nearly....There

6. UNHAPPY

1 point—Sad....Sorrowful....Unlucky....When something's wrong....Disappointed....If someone hits you....When you don't like nobody....When you cry....Sulky....Not happy

0 points—When you fight....When you've done something....When you're unhappy....Angry....Mad

7. DISTURB

1 point—Bug, bother, hassle, annoy....Destroy the peace, quiet, or rest of....Break in upon with noise or other distraction....Put out of order....Make uneasy....Interrupt....Inconvenience....Wake you up....Make a noise....Disturb people (Q) don't leave them alone....Disturb the cat (Q) it scratches....If you're doing work and somebody talks to you

0 points—Noise....Be angry....Make fun of you....Asleep

8. BATTLE

1 point—A fight between opposing armed forces....Fighting or war.... Fight....Contest....Struggle....Contend....Soldiers have a battle.... A war.... Shooting....When you charge....A duel....You kill each other

0 points—Argument....Army....You win....

9. RECEIVE

1 point—Someone gives you something, you get it....Take something (offered or sent or given)....Take into one's hands or possession....Take, accept, admit or get something....Acquire, obtain, get something....Receive a letter (Q) you get one....It comes to you....Meet someone....Welcome....Brought....Receive a parcel (Q) undo it....Receive something off someone

0 points—Send....Take away....Give....Receive a letter (Q) put it in the pot....Receive anything

10. VIEW

1 point—An act of seeing....Sight....The power of seeing....Something seen....A scene....A way of looking at or considering a matter....Opinion....See....Watch....Vision....Photo....See France from Dover (Q) good view....From a window....See something from a distance

0 points—Nice place.... Show.... Go to view somebody if they've been ill....Get a view of the lake....

11. CONTINUE

1 point—Keep up, keep on, keep doing....When you see part of the show this week and again next week....When you do one part and it continues, you do another part....Go on....Go on with (something) after stopping ....Begin again....Resume....Last....Endure....Do it again.... Finish later on....Follow on.... Continue the same work.... Something that stopped and started.... Stop and then continue on

0 points—Again....Start....Ended....Keep stopping....Go on to something else....When you're doing something....The show will continue next week

12. STARTLE

1 point—Frighten suddenly....Surprise....A sudden shock of surprise or fright....Scare, frighten....Alarm....Shock....Stun....Make someone jump....Shaken....Afraid suddenly....Someone comes in suddenly (Q) you're startled

0 points—Something startled you....Something happens quickly

13. PERFUME

1 point—A smell (Q) comes in a bottle and you spray it on your neck .... Cologne....A liquid having the sweet smell of flowers....A sweet smell....Put a sweet-smelling liquid on....Fragrance....Scent.... Stuff in a bottle (Q) it smells nice, good....What you put in your bath....Ladies wear it when they go out

0 points—Something you put on....Perfume is what you use....Make up .... Powder.... Shampoo.... Sweet.... Ladies wear it (Q).... Smell (Q).... Odor

14. MALARIA

1 point—A disease characterized by periodic chills followed by fever and sweating.... Illness.... Sickness....Makes you ill

0 points—When a man feels sick....You come out in spots....You get a cold

15. MINGLE

1 point—Mix....Associate....Blend....Fuse....Combine....Merge into.... Get in a crowd....Jumble....Mingle in a bunch of children.... Joined together.... Get amongst....Join....Wander in a crowd

0 points—Collect together....

16. FASCINATED

1 point—Delighted....Attracted very strongly....Enchanted by charming qualities....Amazed....Interesting....Something makes you look a lot ....Pleased....When someone looks nice....See a thing and can't believe it... .

0 points—Astonished....Frightened....Amused....A feeling....Something pleasing.... Like the look of....Fancy

17. BRAG

1 point—Say you're the greatest.... Say you can do something better than the other person....Boast....Talk about oneself.... Think you're good....To be proud....Big talk.... Telling a story (Q) make a lot up .... Show off

0 points—Talk too much.... Gloat....Talking about one thing all the time....You brag because you won the game

18. PROSPER

1 point—Be successful....Have good fortune....Flourish....Get more ....Get wealthy.... Better off....Have money (Q) prosper by selling again....Make money....Have good luck

0 points—Do good....Live....Do a thing well....Do it right....When you're given something

19. ANONYMOUS

1 point—By or from a person whose name is not known or not given.... Having no name....Nameless....Unknown....Don't know who did it....A person who wrote something and didn't put his name at the bottom

0 points—Pen name....False....Letter....Won't tell

20. VERIFY

1 point—Prove (something) to be true....Confirm....Test correctness of....Check for accuracy....Get proof....Find out....Identify....Correct or make sure....That a statement is made and correct

0 points—Compare....View closely....Clarify....Give evidence.... Swear ....Witness a thing....Agree

21. RUSE

1 point—Trick....Stratagem....Artifice....Dodge....Wile

0 points—

22. FORMIDABLE

1 point—Hard to overcome....Hard to deal with....To be dreaded.... Appalling....Fearful

0 points—Something you can't do

23. IMMERSE

1 point—Plunge into a liquid....Baptize by dipping under water....Involve deeply....Absorb....Submerge

0 points—

24. DOCILE

1 point—Easily managed.... Obedient.... Easily taught....Willing to learn

0 points—

25. VIRILE

1 point—Manly....Masculine.... Full of manly strength or masculine vigor....Vigorous....Forceful

0 points—

26. SULTRY

1 point—Hot, close, and moist.... Full of passion....Fiery

0 points—

27. STANCE

1 point—Manner of standing, posture....A standing place....Position

0 points—

28. EFFACE

1 point—Rub out....Blot out....Do away with....Destroy....Wipe out

0 points—

29. SENSUAL

1 point—Having to do with the bodily senses rather than with the mind or soul....Caring too much for the pleasures of the senses....Lustful....Lewd

0 points—Sensitive

Mill Hill Vocabulary Scoring Guides

Set B

1. TOMATO

1 point—Fruit....Vegetable....A juicy fruit used as a vegetable— most tomatoes are red when ripe.... Round and red....You eat it.... Red ball with a stalk on

0 points—

2. REST

1 point—Sleep....Ease after work or effort....Freedom from anything that tires, troubles, disturbs or pains....Quiet....Be still, sleep ....Stop moving.... Lie, recline, sit, lean, etc. for rest or ease.... Be at ease.... Relax.... Become inactive....What is left....Stop working ....When you're tired.... Rest on a chair....Nap

0 points—Tired....Restless

3. PATCH

1 point—A piece put on to mend a hole or tear....A piece of cloth, etc. put over a wound or sore....Like a bandaid....Fix something that has a hole in it....A protective pad over an injured eye....A piece of ground—a garden patch....To put patches on....Mend....Darn....Patch your trousers when they are ripped....Cover a hole

0 points—Tear....Bare.... Space....Hole....Patch of sky....Patch your clothes (Q)....Get a patch from Scouts (ie badge).... Something you put on your pants (Q)

4. AFRAID

1 point—Not brave....Feeling fear....Frightened....Filled with fear or apprehension....Fearful....Scared....You cry....You run away....Afraid of a lion

0 points—You're afraid....You're sad

5. CRUEL

1 point—Fond of causing pain to others and delighting in their suffering....Not caring about pain and suffering of others....Causing pain and suffering....Mean....Evil....Showing a cruel nature....Unkind....Bad....Hurt....Be nasty....Bully....Take things from someone....Cruel to a dog (Q) hit it....Horrible....Awful....Beat somebody up....You kill an animal

0 points—Strict....Selfish....Unhappy....Don't be cruel (Q) don't do it....Mad

6. BLAZE

1 point—A bright flame or fire....An intense light, glare....A violent outburst.... Burn with a bright flame....Be on fire....Mark (a tree, trail, etc.) by chipping off a piece of bark.... Fire.... Light (Q) it blazes....When you get in a rage

0 points—Blaze away.... Blade of grass.... Burn (Q)

7. ACHE

1 point—Be in continued pain....Be eager, wish very much....A dull, steady pain....Hurt....Long....Yearn....Throb....Stiff....Not very well....In your tummy (Q) nasty....Headache, toothache....Ailment....Sore....Pain....Soreness

0 points—

8. SQUABBLE

1 point—A petty, noisy quarrel.... Fight....Argue.... Squabble with a boy (Q) take it away from him.... Squabble over something (Q) shout ....Squabble over a skipping rope (Q) both want it

0 points—Talking....With people....Making a noise

8b. QUARREL

1 point—An angry dispute or disagreement....Find fault....Feud....Fight....Same as squabble

0 points—Get angry, mad

9. RAGE

1 point—Mad (Q) really mad....Tantrum....Temper....Violent anger.... Fit of violent anger....Violence....An idea, etc. that is popular for a short time....Great enthusiasm.... Storm.... Be wild....Be angry.... Lose one's temper.... Get in a rage (Q) start a fight

0 points—Terror....Noise....Rage up....Really scared....In a bad mood....Cross

10. SHRIVEL

1 point—Dry up....Wither.... Shrink and wrinkle....Waste away, become useless....Make helpless or useless....Make less....Curl up.... Crinkle up....Go small.... Shrivel up by the heat.... Shrivel up (Q) go dead.... Shrivel up like a crisp.... Loss of flesh.... Shrivel on the fire (Q) be burnt up....Raisins are shrivelled (Prunes)

0 points—Burn up....Warp....Piece of paper is screwed up....Get soft....Turn brown....Go rotten....When you're cold or chilly (i.e. shiver)

11. CONNECT

1 point—Hitch (go, put) together....Join (one thing to another)....Link (two things together)....Fasten together....Unite....Join or link together in an electrical circuit....Connect on the telephone (Q) put you through....Catch a train....With something....Together....Attach....Connect things together....Hip bone connects to the leg bone....Wire up something, wire your speaker into tape deck

0 points—Talk on the phone.... Send a message.... Connect wireless with another station.... Connect two wires (Q)....Plug it in

12. PROVIDE

1 point—Supply.... Furnish.... Supply (or arrange) means of support.... Take care for the future....Get ready....Prepare....Deliver goods.... Save....Care for....Help.... Buy.... Provide a meal (Q) get it ready.... Do something provided you don't do something else....Give people what they need—food and stuff....Give (Q)

0 points—Let....Careful....To receive....Plenty....Share....Provide food

13. STUBBORN

1 point—Not wanting (willing) to do it....Fixed in purpose or opinion....Not giving in to arguments or requests....Characterized by obstinacy....Hard to deal with or manage....Inflexible in opinion or intention....Intractable....Resistant....Refuse....Won't be told....Won't do it....Like a mule (Q)....Won't change his mind

0 points—Sulky....Angry....Quarrelsome....Stern....When you're naughty (Q) doing wrong....Bad....Stupid

14. SCHOONER

1 point—A ship with two or more masts and fore-and-aft sails....Type of boat, ship....Yacht....Fishing boat....Something that floats....Something at sea....Goes on water

0 points—Steam ship....Barge....Submarine....A vehicle (Q)

15. LIBERTY

1 point—Freedom.... The right or power to do as one pleases....Live free....Have your own way....Go out and enjoy yourself....Nothing to do....Take a liberty (Q) do something you're not told to

0 points—Peace....Victory....Justice.... Statue (of Liberty).... Women's rights (Q)

16. COURTEOUS

1 point—Polite.... Thoughtful of others....Pleasant....Nice....Generous ....Manners....Kind....When someone helps another person

0 points—Good....Proud

17. RESEMBLANCE

1 point—A similar appearance.... Likeness.... Emphasizes looking alike ....Like someone else....Like parents....Copied....Just the same.... Like a twin

0 points—Familiar....When a thing happens again....

18. THRIVE

1 point—Grow strong....Grow vigorously.... Be successful.... Grow rich ....Prosper....Improve....Increase....Grow....Be healthy....When you've been ill and you get better.... Successful.... Rich.... Carry on....Go on

0 points—Live a long time....Get through an illness....

19. PRECISE

1 point—Exact....Accurate....Definite....Careful....Strict....Scrupulous .... Prim.... Just so....Neat....Precise date (Q) state a certain date ....To mean what you say....Talk posh (Q) not like a country boy.... Right on

0 points—True....Same....Just....At once....Right away....Sure....Definitely....Abrupt....Tidy....Ladylike....Nice....Cute....Smart....At this very moment

20. ELEVATE

1 point—Lift up....Raise....Put in high spirits....Elate....Turn upwards....Make spirits rise....Go up (in elevator)....Gain height....Pull up....Carry up....Take up....Hold up

0 points—Up....High....Build up....Move....Go up and down....Keep your spirits up....Like stairs (Q)

21. DWINDLE

1 point—Make or become smaller and smaller.... Shrink....Diminish... . Lessen....Decline....Wane

0 points—

22. LAVISH

1 point—Very free or too free in giving or spending....Prodigal.... Very abundant....More than enough.... Given or spent too freely.... Extravagant

0 points—Something posh....Luxurious

23. WHIM

1 point—A sudden fancy or notion....Freakish or capricious idea or desire

0 points—

24. SURMOUNT

1 point—Rise above....Be above or on top of....Go up and across.... Overcome

0 points—

25. BOMBASTIC

1 point—Using bombast (high-sounding, pompous language)

0 points—

26. RECUMBENT

1 point—Lying down....Reclining....Leaning

0 points—

27. ENVISAGE

1 point—Foresee....Visualize....Form a mental image of

0 points—

APPENDIX E

BRITISH COLUMBIA NORMS TABLES

BRITISH COLUMBIA NORMS TABLES

for

Wechsler Intelligence Scale for Children-Revised Peabody Picture Vocabulary Test Slosson Intelligence Test Standard Progressive Matrices Mill Hill Vocabulary Scale

AGES 7½, 9½, 11½

Barbara J. Holmes
University of British Columbia
August, 1980

The tables which are presented in this manual were prepared as a portion of a doctoral dissertation study in the Department of Educational Psychology at the University of British Columbia. Financial support for the study was provided through a grant from the Educational Research Institute of British Columbia and a bursary from the Canadian Association for Educational Psychology.

I wish to express my thanks to my committee chairman, Dr. Todd Rogers, who provided the methodological expertise for the study and a model of the unrelenting pursuit of excellence. I also wish to thank Dr. Buff Oldridge for his always generous support, advice and grass-roots assistance.

A large measure of appreciation goes to the 44 school psychologists and psychometricians across the province whose volunteer testing contributions made this study possible. This work is dedicated to them.

And thanks to my graduate student colleagues and friends for their willing and multiple contributions in many stages of the project.

A final word of appreciation goes to the 340 children in public and independent schools throughout B.C. who gave their time and energies to helping all of us through five test administrations.

Table of Contents

Page

Introduction 1

Standardization Procedures 2
  Stratification of the Sample 2
  WISC-R Canadianization 3
  Preparation of the Norms Tables 3

Canadian Substitution Items for the WISC-R 4

Statistical Properties of the Tests 5

Interpretation of the Tables in Appendix B 5

The Results of Renorming 9

References 10

Appendix A. British Columbia Norms Tables 11

Appendix B. Percentile Ranks for British Columbia in Terms of American Norms 30

List of Tables

Table Page

1 Reliability Coefficients by Age 6

2 Standard Errors of Measurement by Age 7

3 Correlation Coefficients by Age 8

A British Columbia Standardization of the WISC-R: Scaled Score Equivalents of Raw Scores 12

B British Columbia Standardization of the WISC-R: IQ Equivalents of Sums of Scaled Scores 15

C British Columbia Standardization of the PPVT 18

D British Columbia Standardization of the SIT 21

E British Columbia Standardization of the SPM 24

F British Columbia Standardization of the MHVS 27

G British Columbia Sample Scored with American Norms 31

Introduction

Many of the tests used in schools in British Columbia were prepared and standardized in the United States. The purpose of this project was to produce provincially representative norms for four commonly used individual tests of intelligence: the Wechsler Intelligence Scale for Children-Revised (WISC-R), the Peabody Picture Vocabulary Test (PPVT), the Slosson Intelligence Test (SIT), and the Standard Progressive Matrices (SPM). A fifth test, the Mill Hill Vocabulary Scale (MHVS), was also included. Although the MHVS is little known in B.C., it was developed as a complement to the SPM; Raven recommended the combined use of the two tests "in place of a single test of 'general intelligence'" (1960, p. 3).

This paper contains, in condensed form, a description of the standardization procedures and the statistical properties of the tests. The B.C. norms tables for all five tests are included in Appendix A. In Appendix B are tables of selected standard deviations with the corresponding IQ scores and percentile ranks for the B.C. sample when scored using the published (American) norms.

Please direct any request for information beyond what is included in this manual to:

B. Holmes
c/o Education Clinic
Faculty of Education
U.B.C.
Vancouver, B.C. V6T 1Z5

Standardization Procedures

Description of the Population

The norms in this manual are applicable to the population of non-Native Indian¹, English-speaking children at three age levels — 7½ years, 9½ years and 11½ years — attending public and independent schools in British Columbia. The population was further restricted to exclude children enrolled in classes for the physically handicapped, emotionally disturbed or trainable mentally retarded.

Stratification of the Sample

A stratified sampling design was used employing the stratification variables described below:

Geographic region. The province of B.C. was divided into the six administrative regions determined by the Ministry of Education.

Community size. Three community sizes were used:

A  under 1,000
B  1,001 to 50,000
C  over 50,000

Size of school. School size was defined by total student enrollment.

Three size categories were used:

I    under 150
II   151 to 300
III  over 300

Age. Age was defined within 3 months of the midyear. Thus the three age groups included represent the following age ranges:

¹ Native Indian children were excluded in the hope of better serving their testing needs through the provision of separate norms. A project with this purpose is currently being planned (see More & Oldridge, 1980). WISC-R norms for Native Indian children representing three bands in southwestern B.C. are currently available (Seyfort, Spreen & Lahmer, 1980).

7 years 3 months 0 days - 7 years 8 months 30 days
9 years 3 months 0 days - 9 years 8 months 30 days
11 years 3 months 0 days - 11 years 8 months 30 days

Sex. Within each age group, the children were divided equally by sex.

A random sample of 340 children was selected in a manner to proportionately represent the population described according to these variables.
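As an illustration of proportional stratified allocation (a sketch, not the study's actual sampling program; the stratum labels and counts below are invented for the example), a total sample can be divided across strata in proportion to stratum population size:

    import math

    # Hypothetical stratum population counts; the real design crossed
    # region, community size, school size, age, and sex.
    population = {"Region 1/B/II/age 9.5/F": 1200,
                  "Region 1/C/III/age 9.5/M": 5400,
                  "Region 4/A/I/age 7.5/F": 300}

    def proportional_allocation(population, n_total):
        """Allocate n_total sample slots across strata in proportion to
        stratum size, using largest-remainder rounding."""
        total = sum(population.values())
        quotas = {s: n_total * c / total for s, c in population.items()}
        alloc = {s: math.floor(q) for s, q in quotas.items()}
        leftover = n_total - sum(alloc.values())
        # hand out the remaining slots by largest fractional remainder
        for s in sorted(quotas, key=lambda s: quotas[s] - alloc[s],
                        reverse=True)[:leftover]:
            alloc[s] += 1
        return alloc

    print(proportional_allocation(population, 340))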

WISC-R Canadianization

The WISC-R was administered using the Canadian substitution items listed on the following page. These items were recommended by Dr. Philip Vernon of the University of Calgary in the Canadian Psychological Association Bulletin (1977). Vernon found the revised wordings to yield item pass percentages for a Canadian sample which are closer to those of the American standardization sample than are pass percentages using the original wording. Please note that the Canadian substitution items must be used for the B.C. norms tables to be applicable.

Preparation of the Norms Tables

IQs

The WISC-R scaled scores and IQ scores were prepared following the procedures used by the Psychological Corporation (Wechsler, 1974, pp. 21-24). Both PPVT and SIT IQ scores were prepared in the manner described in the PPVT Manual (Dunn, 1965, pp. 28-29). These procedures produce deviation IQs calculated separately for each age level. This represents a change in metric for SIT IQs, since they were originally reported in ratio form (Slosson, 1977).
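A minimal sketch of the deviation-IQ idea, simplified to a linear rescaling (the manuals' actual procedures involve normalization and smoothing, and the raw scores below are hypothetical):

    import statistics

    def deviation_iqs(raw_scores, mean_iq=100.0, sd_iq=15.0):
        """Rescale one age group's raw scores to a deviation-IQ metric
        with the conventional mean of 100 and SD of 15."""
        m = statistics.mean(raw_scores)
        s = statistics.stdev(raw_scores)
        return [round(mean_iq + sd_iq * (x - m) / s) for x in raw_scores]

    # Example: a small hypothetical age group
    print(deviation_iqs([42, 55, 61, 48, 70, 58]))

Because the mean and standard deviation are computed within each age level, the same raw score can map to different IQs at different ages.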

Percentile ranks

The percentile ranks published in the WISC-R Manual (p. 25) may be applied to the B.C. WISC-R norms. For all other tests, the percentile ranks are included in the tables.

Statistical Properties of the Tests

Reliability coefficients, intercorrelations and standard errors of measurement for all tests are presented in Tables 1 through 3. Below each table is information regarding the data included in the calculations.

Interpretation of the Tables in Appendix B

The tables in Appendix B were prepared in response to a request from practicing school psychologists. These tables are to be used when the tests have been scored using the original norms provided in the respective test manuals. The inclusion of percentile ranks allows for interpretation of a score in terms of its rank position within the B.C. population.

Examination of these tables shows consistently higher means on all tests for the B.C. sample than for the respective standardization groups. The varying IQ-percentile relationships across tests are a result of variations in the shapes of the test score distributions and in their standard deviations.

Table 1

Reliability Coefficients by Age

                                   Age
Test                       7½       9½       11½

WISC-R
  Information             .69      .68      .86
  Similarities            .68      .77      .80
  Arithmetic              .72      .65      .70
  Vocabulary              .59      .75      .82
  Comprehension           .66      .98      .71
  Picture Completion      .61      .68      .67
  Picture Arrangement     .75      .55      .42
  Block Design            .85      .81      .77
  Object Assembly         .63      .67      .62
  Coding                    -        -        -
  Verbal IQ               .86      .93      .93
  Performance IQ          .88      .83      .82
  Full Scale IQ           .86      .88      .87

PPVT                      .88      .87      .87
SIT                       .78      .79      .88
SPM                       .89      .92      .83
MHVS                      .76      .83      .88

Note: The reliability coefficients for all tests are split-half correlations corrected by the Spearman-Brown formula. Since this type of reliability is not appropriate for use with Coding, there are no reliabilities reported for this test. The coefficients of the WISC-R IQ Scales were calculated from the formula for the reliability of a composite group of tests (Guilford, 1954, p. 393).
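A sketch of the split-half computation described in the note (the item data are hypothetical; statistics.correlation requires Python 3.10 or later):

    import statistics

    def split_half_reliability(item_matrix):
        """Split the items into two halves by alternating items, correlate
        the half-test scores, then apply the Spearman-Brown correction to
        estimate full-length reliability."""
        half1 = [sum(row[0::2]) for row in item_matrix]
        half2 = [sum(row[1::2]) for row in item_matrix]
        r_hh = statistics.correlation(half1, half2)  # half-test correlation
        return 2 * r_hh / (1 + r_hh)                 # Spearman-Brown step-up

    # Rows are examinees, columns are scored items (1 = pass, 0 = fail)
    items = [[1, 1, 0, 1, 1, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 0],
             [1, 1, 1, 1, 1, 1, 1, 0], [0, 0, 0, 1, 0, 0, 0, 0],
             [1, 1, 0, 1, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1, 0, 0]]
    print(round(split_half_reliability(items), 2))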

Table 2

Standard Errors of Measurement by Age

                                   Age
Test                       7½       9½       11½

WISC-R
  Information             1.63     1.75     1.12
  Similarities            1.75     1.46     1.30
  Arithmetic              1.52     1.72     1.63
  Vocabulary              1.86     1.46     1.26
  Comprehension           1.68      .43     1.62
  Picture Completion      1.87     1.74     1.63
  Picture Arrangement     1.51     2.05     2.30
  Block Design            1.10     1.35     1.47
  Object Assembly         1.81     1.72     1.82
  Coding                    -        -        -
  Verbal IQ               5.39     4.07     4.08
  Performance IQ          4.98     6.23     6.61
  Full Scale IQ           5.16     5.41     5.63

PPVT                      5.20     5.41     5.41
SIT                       7.04     6.87     5.20
SPM                       2.40     2.28     2.70
MHVS                      1.98     2.23     2.18

Note: The standard errors of measurement are in scaled score units for the WISC-R tests; in IQ units for the WISC-R, PPVT, and SIT IQ scales; and in raw score units for the SPM and MHVS.
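The standard errors in Table 2 follow from the classical formula SEM = SD x sqrt(1 - reliability). A one-line check (the SD of 3 is the conventional WISC-R scaled-score standard deviation):

    import math

    def sem(sd, reliability):
        """Standard error of measurement: SD * sqrt(1 - reliability)."""
        return sd * math.sqrt(1.0 - reliability)

    # Age-9½ Comprehension: reliability .98 (Table 1), scaled-score SD 3
    print(round(sem(3.0, 0.98), 2))  # 0.42, agreeing with Table 2's .43
                                     # up to rounding of the reliability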

Table 3

Correlation Coefficients by Age

                        WISC-R
           Verbal   Performance   Full Scale    PPVT     SIT
             IQ         IQ            IQ         IQ       IQ

AGE 7½
  PPVT      .59        .32           .56
  SIT       .77        .35           .68        .62
  SPM       .26        .42           .40        .25      .32
  MHVS      .72        .32           .64        .58      .72    SPM .24

AGE 9½
  PPVT      .64        .46           .62
  SIT       .65        .54           .67        .62
  SPM       .43        .53           .52        .40      .42
  MHVS      .72        .35           .61        .64      .59    SPM .31

AGE 11½
  PPVT      .70        .38           .62
  SIT       .82        .57           .79        .62
  SPM       .45        .54           .56        .34      .54
  MHVS      .81        .48           .75        .72      .77    SPM .44

Note: The correlation coefficients were computed using IQ scores for the WISC-R, PPVT and SIT and raw scores for the SPM and MHVS.

The Results of Renorming

Intelligence tests are traditionally scaled in a manner which assigns the mean score a value of 100 and the standard deviation a value of 15 or 16. Included in the B.C. standardization procedure was the re-alignment of the IQ test score scales for three tests (WISC-R, PPVT, SIT) to mean 100 and standard deviation 15. (The SPM and MHVS were retained as raw score scales with new percentile ranks for B.C.) The B.C. sample scored higher and with less variability than the American standardization samples using American norms; therefore the B.C. rescaling procedure resulted in lowering and spreading out the score scales. The result is that below-average students will score lower with B.C. than with American norms, while students who are much above average will score the same or even higher with B.C. norms.
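A hypothetical numerical illustration of this rescaling (the B.C. mean of 104 and SD of 12 below are invented for the example, not the study's actual values):

    def bc_rescale(x, bc_mean=104.0, bc_sd=12.0):
        """Re-align an American-norm IQ to mean 100, SD 15 on the B.C. metric."""
        return 100.0 + 15.0 * (x - bc_mean) / bc_sd

    print(bc_rescale(85))   # 76.25 -- a below-average child scores lower
    print(bc_rescale(104))  # 100.0 -- the B.C. average maps to 100
    print(bc_rescale(130))  # 132.5 -- a far-above-average child scores higher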

References

Dunn, L.M. Expanded manual for the Peabody Picture Vocabulary Test. Circle Pines, Minn.: American Guidance Service, 1965.

More, A.J. & Oldridge, B. An approach to non-discriminatory assessment of Native Indian children. B.C. Journal of Special Education, 1980, 4(1)-, 51-59.

Raven, J. Guide to the Standard Progressive Matrices. Sets A,B,C,D and E. London, England: H.K. Lewis, 1960.

Seyfort, B., Spreen, O. & Lahmer. A critical look at the WISC-R with Native Indian children. Alberta Journal of Educational Research, 1980, 26(1), 14-24.

Slosson, R.L. Slosson Intelligence Test (SIT) for Children and Adults. East Aurora, N.Y.: Slosson Educational Publications, Inc., 1977.

Vernon, P.E. Final report on modifications of WISC-R for Canadian use. Canadian Psychological Association Bulletin, 1977, 5(1), 5-7.

Wechsler, D. Manual for the Wechsler Intelligence Scale for Children-Revised. N.Y.: Psychological Corporation, 1974.

Appendix A

British Columbia Norms Tables

Table A

British Columbia Standardization of the WISC-R
7 yrs. 3 mos. 0 days through 7 yrs. 8 mos. 30 days

Scaled Score Equivalents of Raw Scores

VERBAL PERFORMANCE

Scaled  Infor-  Simil-   Arith-  Vocab-  Compre-  Scaled  Picture     Picture      Block   Object            Scaled
Score   mation  arities  metic   ulary   hension  Score   Completion  Arrangement  Design  Assembly  Coding  Score

1 0-2 0 0 0-5 0-1 1 0-4 0-1 0-1 0-3 0-20 1 2 3 1 i-2 6-7 2 2 5-6 2 2 4 21 2 3 4 2 3 3-4 8 3-4 7-8 3-4 3 5 .22 3 4 3 4 5 9-H 5 9 5 4 6-8 23 4 5 -5 4 • 5 12-13 6 10 6-8 - 9-11 24-26 5 6 6 5 6 6 14-15 7-8 11 9-12 5 12-13 27-28 6

7 7 6-7 7 16 9 12 13-16 6-9 14 29-32 7 8 8 8 17-18 10 8 13 17-18 10-11 15-16 33-35 8 9 9 8 i9-20 11 9 14 19-21 12-14 17 36-40 9 10 9 10 21 12-i3 10 15 22-23 .15-18 18-19 41-42 10 11 11 22 io 9 14 11 16 24-26 19-22 20-21 43-44 11 i2 12 10 11 23-24 15 12 17 27 23-26 22 45 12 13 13 12 25-26 16 13 i8 28-29 27-31 23 46 13

14 14 13 14 11 27 14 17 19 30-32 32-36 24 47 15 14 15 15 15 12 28-29 18 - 33-35 37-42 16 16 25-26 15 16 30-31 19-20 16 17 20 36-38 43-46 27 48 16 17 17 17 13 32-33 21-22 21 39-41 47-50 28 18 i8 18 17 18 14 34-35 23 22 19 42 51 29 49 18-30 19-30 15-i8 36-64 19 19 24-34 23-26 43-48 52-62 30-33 50

Table A (continued)

British Columbia Standardization of the WISC-R
9 yrs. 3 mos. 0 days through 9 yrs. 8 mos. 30 days

Scaled Score Equivalents of Raw Scores

VERBAL PERFORMANCE

Scaled  Infor-  Simil-   Arith-  Vocab-  Compre-  Scaled  Picture     Picture      Block   Object            Scaled
Score   mation  arities  metic   ulary   hension  Score   Completion  Arrangement  Design  Assembly  Coding  Score

1 0-4 0-2 0-2 0-14 0-4 i 0-8 0-3 0-7 0-6 0-20 1 2 5 3-4 3-4 15-17 5-6 2 9-10 4-6 8-9 7-8 21-22 2 3 6-7 5-6 5-6 18-20 7-8 3 11 7-10 10-11 9-10 23-24 3 4 8 7 7 21 9-10 4 i2 11-14 12 11-12 25-26 4 5 9 8 8 22 11 5 . 13 15-16 13-14 13-14 27 5 6 10 9 9 23 12 6 14 17-21 15 15-16 28-29 6

7 ii io 10 24 13 7 15 22-23 16-19 17 30 7 8 - - 11 25-26 14 8 16 24 20-23 18-20 31-32 8 9 12 11 - 27 15 9 17 25-27 24-27 21 33-35 9 10 13 12-13 12 28-29 16-17 10 18 28-30 28-30 22-23 36-38 10 11 14 14 - 30-31 18 11 19 31 31-34 . 24 39-41 11 12 15 15-16 13 32-33 19 12 - 32-33 35-37 25 42-44 12 13 16 17 - 34-35 20 13 20 34-36 38-41 - 45-47 13

14 17-18 18-19 14 36-37 21 14 21 37 42-47 26-27 48-51 14 15 19 20 - 38-40 22 15 22 38-39 48-49 28 52 15 16 20 21-22 15 41-43 23-25 16 - 40-43 50-52 29 53-62 16 17 21 23-24 i6 44 26 17 23 44 53-55 30 63-65 17 18 22 25-26 17 45 27 18 24 45 56-57 31 66-68 18 19 23-30 27-30 is 46-64 28-34 19 25-26 46-48 58-62 32-33 69-93 19

Table A (continued)

British Columbia Standardization of the WISC-R
11 yrs. 3 mos. 0 days through 11 yrs. 8 mos. 30 days

Scaled Score Equivalents of Raw Scores

VERBAL PERFORMANCE

Scaled  Infor-  Simil-   Arith-  Vocab-  Compre-  Scaled  Picture     Picture      Block   Object            Scaled
Score   mation  arities  metic   ulary   hension  Score   Completion  Arrangement  Design  Assembly  Coding  Score

1 0-6 0-4 0-4 0-16 0-5 1 0-10 0-4 0-13 0-10 0-22 2 7 5 5-6 17-18 6-7 2 11-12 5 14-15 11-12 23-25 3 8 6 7-8 19-20 8-9 3 13-14 6-7 16-17 13-14 26-28 4 9 ' 7 9 2i-22 10 4 15 8-11 18-20 15-16 29-31 5 10 8 io 23-24 11-13 5 - 12-19 21-23 17-18 32-34 6 11 9-10 11 25-26 14-15 6 16 20-25 24-26 19 35-37 7 12-13 11-12 12 27-28 16 7 17 26-27 27-28 20-21 38-40 8 14 13-14 13 29-30 17 8 18 28 29-32 22-23 41-43 9 15 15 - 31-33 18-19 9 19 29-30 33-35 24 44-46 10 16-17 16-17 i4 34-35 20 10 20 31-32 36-38 25 47-48 11 18 18-19 - 36-37 21 11 2l 33 39-41 26-27 49-51 12 19 20 15 38-39 22-23 12 22 34-36 42-45 28 52-55 13 20 21 - 40-42 24-25 13 - 37 46-48 29 56-58 14 21 22 16 43 26 14 23 38-39 49-54 30 59-63 15 22-23 23 - 44 27 15 - 40 55-57 31 64-65 16 24 24 17 45 - 16 - 41-43 58 32 66-72 17 25 25-26 - 46-47 28 17 24 44-45 59 - 73-75 18 26 27 18 48-49 29 18 25 46-47 60 33 76-78 19 27-30 28-30 — 50-64 30-34 19 26 48 61-62 79-93

Table B

British Columbia Standardization of the WISC-R

IQ Equivalents of Sums of Scaled Scores

VERBAL

Sum of            Sum of            Sum of
Scaled    IQ      Scaled    IQ      Scaled    IQ
Scores            Scores            Scores

41 88 81 141 42 89 82 143 43 90 83 144 44 92 84 145 45 93 85 147

46 94 86 148 47 96 87 149 • 48 97 88 151 9 45 49 98 89 152 10 46 50 100 90 154

11 48 51 101 91 .155 12 49 52 102 13 50 53 104 14 52 54 105 15 53 55 107

16 54 56- 108 17 56 57 109 18 57 58 111 19 58 59 112 20 60 60 113

21 61 61 115 22 62 62 116 23 64 63 117 24 65 64 119 25 66 65 120

26 68 66 121 27 69 67 123 28 70 68 124 29 72 69 125 30 73 70 127

31 74 71 128 32 76 72 129 .33 77 73 131 34 78 74 132 35 80 75 133

36 81 76 135 37 82 77 136 38 84 78 137 39 85 79 139 40 86 80 140

Table B (continued)

British Columbia Standardization of the WISC-R

IQ Equivalents of Sums of Scaled Scores

PERFORMANCE

Sum of            Sum of
Scaled    IQ      Scaled    IQ
Scores            Scores

51 101 52 103 53 104 54. 106 15 45 55 108

16 47 56 109 17 48 57 111 18 50 58 112 19 52 59 114 20 53 60 115

21 55 61 117 22 56 62 118 23 58 63 120 24 59 64- 122 25 61 65 123

26 62 66 125 27 64 67 • 126 28 66 68 128 29 67 69 129 30 69 70 131

31 70 71 132 32 72 72 134 33 73 73 136 34 75 74 137 35 76 75 139

36 78 76 140 37 80 77 142 38 81 78 143 39 83 79 145 40 84 80 146

41 86 81 148 42 87 82 150 43 89 83 151 44 90 84 153 45 92 85 154

46 94 86 155 47 95 48 97 49 98 50 100

Table B (continued)

British Columbia Standardization of the WISC-R

IQ Equivalents of Sums of Scaled Scores

FULL SCALE

Sum of          Sum of          Sum of          Sum of
Scaled    IQ    Scaled    IQ    Scaled    IQ    Scaled    IQ
Scores          Scores          Scores          Scores

61 68 101 101 141 134 62 69 102 101 142 134 63 69 103 102 143 135 64 70 104 103 144 136 65 71 105 104 145 137 '

66 72 106 105 146 138 27 40 67 73 107 106 147 138 28 41 68 74 108 106 148 139 29 42 69 74 109 107 149 140 30 42 70 75 110 108 150 "141

31 43 71 76 111 109 151 142 32 44 72 77 112 110 152 142 33 45 73 78 113 .110 153 143 34 46 74 78 114 111 154 144 35 46 75 79 115 112 155 145

36 47 76 80 116 113 156 146 37 48 77 81 117 114 157 147 38 49 78 82 118 115 158 147 39 50 79 82 119 115 159 148 40 50 80 83 120 116 160 149

41 51 81 84 121 117 161 150 42 52 82 85 122 118 162 151 43 53 83 86 123 119 163 152 44 54 84 87 124 120 164 152 45 55 85 88 125 120 165 153

46 55 86 88 126 121 166 154 47 56 87 89 127 .122 167 155 48 57 88 90 128 123 168 156 -49 58 89 91 129 124 169 156 50 59 90 92 130 124 170 157

51 60 91 92 131 125 171 158 52 60 92 93 132 126 172 159 53 61 93 94 133 127 173 160 54 62 94 95 134 128 55 63 95 96 .135 128

56 64 96 96 136 129 57 64 97 97 137 130 58 65 98 98 138 131 59 66 99 99 139 132 60 67 100 100 140 133

Table C

7 yrs. 3 mos. through 7 yrs. 9 mos.

British Columbia Standardization of the Peabody Picture Vocabulary Test

IQ and Percentile Equivalents of Raw Scores

Raw                      Raw
Score   IQ   %ile        Score   IQ   %ile

71 105 62 72 107 67 73 108 71 34 40 74 110 75 35 42 75 112 78

36 44 76 114 82 37 45 77 116 84 38 47 78 117 85 39 49 79 119 -&r- 40 51 80 121 91

41 52 81 122 92 42 54 82 124 93 43 56 83 126 94 44 58 84 128 95 45 59 85 130 96

46 61 86 131 97 47 63 87 133 99 48 65 88 135 99.6 49 66 89 136 50 68 90 138

51 70 91 140 52 72 92 142 53 73 93 144 54 75 1 94 145 55 77 2 95 147

56 79 4 96 149 57 80 6 97 150 58 82 10 98 152 59 84 14 99 154 60 86 18 100 156

61 88 21 101 158 62 89 29 102 159 63 91 38 103 160 64 93 41 65 94 44

66 96 -hi 67 98 50 68 100 53 69 102 56 70 103 58

Table C (continued)

9 yrs. 3 mos. through 9 yrs. 9 mos.

British Columbia Standardization of the Peabody Picture Vocabulary Test

IQ and Percentile Equivalents of Raw Scores

Raw                      Raw
Score   IQ   %ile        Score   IQ   %ile

81 98 JiO 82 99 50 83 101 51 84 103 56 85 104 • 61

46 40 86 106 66 47 42 87 108 72 48 43 88 109 76 49 45 89 111 79 50 47 90 112 80

51 48 91 114 83 52 50 92 116 86 53 52 93 117 88 54 53 94 119 89 55 55 95 121 90

56 56 96 122 93 57 58 97 124 96 58 60 98 126 97 59 62 2 99 127 98 60 63 2 100 129 98

61 65 2 101 131 99 62 66 3 102 132 63 68 4 103 134 64 70 4 104 136 65 71 5 105 137

66 73 5 106 139 67 75 6 107 140 68 76 6 108 142 69 78 7 109 144 70 80 8 110 145

71 81 9 111 147 72 83 9 112 149 73 84 10 113 150 74 86 12 114 152 75 88 16 115 154

76 89 21 116 155 77 91 26 117 157 78 93 30 118 159 79 94 32 119 160 80 96 35

Table C (continued)

11 yrs. 3 mos. through 11 yrs. 9 mos.

British Columbia Standardization of the Peabody Picture Vocabulary Test

IQ and Percentile Equivalents of Raw Scores

Raw                      Raw
Score   IQ   %ile        Score   IQ   %ile

81 85 12 111 129 97 82 87 18 112 130 98 83 88 21 113 132 99 84 90 23 114 133 85 91 24 115 135

86 93 26 116 136 87 94 31 117 138 88 96 38 118 139 89 97 45 119 141 • 90 98 51 120 142

91 100 56 121 144 92 101 60 122 145 93 103 64 123 146 94 104 65 124 148 95 106 66 125 149

96 107 68 126 151 97 109 76 127 152 98 110 74 128 154 99 112 77 129 155 100 • 113 80 130 • 157

101 114 84 131 158 102 116 87 132 160 103 117 88 104 119 •88' 105 120 89

106 122 91 107 123 93 108 125 95 109 126 96 110 128 97

Table D

7 yrs. 3 mos. through 7 yrs. 9 mos.

British Columbia Standardization of the

Slosson Intelligence Test

IQ and Percentile Equivalents of Raw Scores

Raw                      Raw
Score   IQ   %ile        Score   IQ   %ile

111 112 78 112 115 •84 113 118 88 84 40 114 120 91 85 42 115 123 94

86 44 116 126 96 87 47 117 129 97 88 50 118 131 99 89 52 119 134 90 55 120 137

91 58 121 140 92 61 122 142 93 63 123 .145 94 66 124 148 95 69 2 125 150

96 72 3 126 153 97 74 5 127 156 98 77 6 128 159 99 - 80 8 .129 160 100 82 10

101 85 13 102 88 19 103 91 27 104 93 36 105 96 •42

106 99 46 107 102 . 53 108 104 59 109 107 68 110 110 74

Table D (continued)

9 yrs. 3 mos. through 9 yrs. 9 mos.

British Columbia Standarization of the

Slosson Intelligence Test

IQ and Percentile Equivalents of Raw Scores

Raw                      Raw
Score   IQ   %ile        Score   IQ   %ile

121 106 64 122 108 70 123 110 74 124 112 78 125 114 82

126 116 87 127 118 90 88 40 128 120 90 89 42. 129 122 91 90 44 130 124 93

91 46 131 126 95 92 48 132 128 96 93 50 133 130 97 94 52 134 132 99 95 54 135 134

96 56 136 136 97 58 137 138 98 60 138 140 99 62 139 142 100 64 140 144

101 66 141 146 102 68 142 148 103 70 143 150 104 72 144 152 105 74 2 145 154

106 76 3 146 156 107 78 4 147 158 108 80 5 148 160 109 82 8 110 84 11

111 86 14 112 88 21 113 90 28 114 92 34 115 94 40

116 96 43 117 98 45 118 100 47 119 102 52 120 104 58

Table D (continued)

11 yrs. 3 mos. through 11 yrs. 9 mos.

British Columbia Standardization of the Slosson Intelligence Test

IQ and Percentile Equivalents of Raw Scores

Raw                    Raw                    Raw
Score  IQ  %ile        Score  IQ  %ile        Score  IQ  %ile

111 73 141 118 91 112 74 2 142 119 92 113 76 4 143 120 93 114 77 6 144 122 94 115 79 7 145- 124 96

116 80 9 146 125 96 117 82 12 147 126 97 118 83 14 148 128 97 89 40 119 85 .16 149 129 97 90 42 120 86 19 150 131 98

91 43 121 88 22 151 132 98 92 45 122 89 24 152 134 93 46 123 91 26 153 135 94 48 124 92 28 154 137 95 49 125 94 33 155 .138

96 51 126 95 36 156 140 97 • 52 127 97 40 157 141 98 54 128 98 44 158 143 99 55 129 100 48 159 144 100 56 130 101 50 160 146

101 58 131 103 53 161 147 102 60 132 104 58 162 149 103 61 133 106 62 163 150 104 62 134 107 66 164 152 105 64 135 109 70 165 153

106 65 136 110 74 166 155 107 67 137 112 77 167 156 108 68 138 113 ' 82 168 158 109 70 139 114 86 169 159 110 71 140 116 88 170 161

Table E

7 yrs. 3 mos. through 7 yrs. 9 mos.

British Columbia Standardization of the

Standard Progressive Matrices

Percentile Equivalents of Raw Scores

Raw Score %ile

11 6 12 7 13 12 14 17 15 21

16 27 17 35 18 40 19 46 20 51

21 58 22 63 23 65 24 68 25 72

26 76 27 78 28 79 29 81 30 84

31 87 32 90 33 92 34 93 35 94

36 97 37 97 38 98 39 98 40 99

41 99.5

Table E (continued)

9 yrs. 3 mos. through 9 yrs. 9 mos.

British Columbia Standardization of the

Standard Progressive Matrices

Percentile Equivalents of Raw Scores

Raw Score %ile

16 1 17 2 18 3 19 5 20 9

21 13 22 17 23 20 24 22 25 26

26 29 27 30 28 32 29 36 30 42

31 45 32 49 33 53 34 57 35 62

36 67 37' 73 38 77 39 82 40 86

41 89 42 92 43 94 44 94 45 95

46 96 47 98

Table E (continued)

11 yrs. 3 mos. through 11 yrs. 9 mos.

Standard Progressive Matrices

Percentile Equivalents of Raw Scores

Raw Score  %ile

20 1

21 1 22 1 23 2 24 2 25 2

26 3 27 5 28 7 29 9 30 11

31 15 32 18 33 20 34 27 35 35

36 40 37 46 38 53 39 58 40 62

41 67 42 74 43 82 44 84 45 87

46 90 47 92 48 95 49 98

Table F

7 yrs. 3 mos. through 7 yrs. 9 mos.

British Columbia Standardization of the

Mill Hill Vocabulary Scale, Set A plus Set B

Percentile Equivalents of Raw Scores

Total Raw Score  %ile

6 7 1 8 4 9 8 10 11

11 15 12 22 13 27 14 34 15 44

16 54 17 63 18 73 19 82 20 87

21 92 22 96 23 97 24 99 25 99.3

26 99.6

Table F (continued)

9 yrs. 3 mos. through 9 yrs. 9 mos.

British Columbia Standardization of the

Mill Hill Vocabulary Scale, Set A plus Set B

Percentile Equivalents of Raw Scores

Total Raw Score %ile

6 7 8 9 2 10 2

11 2 12 3 13 3 14 4 15 5

16 7 17 8 18 14 19 19 20 24

21 32 22 38 23 44 24 52 25 60

26 66 27 73 28 78 29 82 30 87

31 92 32 96 33 97 34 99

Table F (continued)

11 yrs. 3 mos. through 11 yrs. 9 mos.

British Columbia Standardization of the

Mill Hill Vocabulary Scale, Set A plus Set B

Percentile Equivalents of Raw Scores

Total Raw Score %ile

13 2 14 2 15 3

16 4 17 4 18 5 19 7 20 9

21 11 22 14 23 18 24 21 25 24

26 30 27 37 28 42 29 47 30 52

31 62 32 71 33 77 34 83 35 89

36 93 37 94 38 96 39 98 40 99

41 99.5

Appendix B

Percentile Ranks for British Columbia

in Terms of American Norms

Table G

British Columbia Sample Scored with American Norms

Relation of IQs to Standard Deviations and Percentile Ranks

AGE 7½

Number of    WISC-R              WISC-R                  WISC-R
SDs from     Verbal IQ           Performance IQ          Full Scale IQ          PPVT IQ            SIT IQ
the Mean     Score    %ile       Score    %ile           Score    %ile          Score    %ile      Score    %ile

+3 149 148 99 144 149 151 +3
+2.5 142 141 98 138 99 142 99.9 145 99.9 +2.5
+2 135 99 135 96 132 98 134 95 139 97 +2
+1.5 128 96 128 90 126 92 127 90 133 94 +1.5
+1 121 85 122 82 120 81 119 74 127 84 +1
+.5 114 75 115 68 114 70 112 65 121 72 +.5

0 107 50 109 50 108 50 104 50 115 50 0

-.5 100 33 103 27 102 34 96 33 109 34 -.5
-1 93 20 96 16 96 14 89 12 103 15 -1
-1.5 86 6 90 4 90 6 81 2 97 5 -1.5
-2 79 2 83 1 84 1 74 91 2 -2
-2.5 72 77 78 66 85 1 -2.5
-3 65 70 72 59 79 -3

Table G (continued)

British Columbia Sample Scored with American Norms

Relation of IQs to Standard Deviations and Percentile Ranks

AGE 9½

Number of    WISC-R              WISC-R                  WISC-R
SDs from     Verbal IQ           Performance IQ          Full Scale IQ          PPVT IQ            SIT IQ
the Mean     Score    %ile       Score    %ile           Score    %ile          Score    %ile      Score    %ile

+3 142 148 146 154 151 +3
+2.5 136 99 141 139 99 147 144 +2.5
+2 130 97 135 98 133 96 140 98 138 97 +2
+1.5 124 92 128 94 126 90 133 94 131 90 +1.5
+1 118 82 122 85 120 83 126 83 125 85 +1
+.5 112 69 115 67 113 67 119 66 118 68 +.5

0 106 50 109 50 107 50 112 50 112 50

-.5 100 32 103 32 101 30 105 27 106 35 -.5
-1 94 15 96 18 94 14 98 11 99 15 -1
-1.5 88 7 90 6 88 3 91 7 93 3 -1.5
-2 82 1 83 3 81 1 84 4 86 1 -2
-2.5 76 77 75 77 3 80 -2.5
-3 70 70 68 70 1 73 -3

Table G (continued)

British Columbia Sample Scored with American Norms

Relation of IQs to Standard Deviations and Percentile Ranks

AGE 11½

Number of    WISC-R              WISC-R                  WISC-R
SDs from     Verbal IQ           Performance IQ          Full Scale IQ          PPVT IQ            SIT IQ
the Mean     Score    %ile       Score    %ile           Score    %ile          Score    %ile      Score    %ile

+3 141 150 143 154 156 +3
+2.5 134 143 137 147 149 99 +2.5
+2 128 99.6 136 99 130 98 139 98 141 98 +2
+1.5 121 96 129 94 123 94 132 91 134 95 +1.5
+1 115 87 122 87 117 84 124 81 126 89 +1
+.5 108 69 115 72 110 66 117 67 119 72 +.5

0 102 50 108 50 104 50 109 50 111 50 0

-.5 96 32 101 29 98 24 101 24 103 34 -.5
-1 89 14 94 17 91 12 94 9 96 19 -1
-1.5 83 6 87 6 85 6 86 4 88 7 -1.5
-2 76 4 80 1 78 2 79 3 81 2 -2
-2.5 70 1 73 72 1 71 2 73 1 -2.5
-3 63 66 65 64 1 66 -3

APPENDIX F

ANOVA AND COCHRAN

Table F1

Summary of Analysis of Variance for WISC-R Subtest and Scaled Score Means by Age

Source               df        SS        MS      F

Scaled Scores:
Information          Between    2      2.06     1.03   0.11
                     Within   337   3042.68     9.03
Similarities         Between    2      1.02     0.51   0.06
                     Within   337   3063.95     9.09
Arithmetic           Between    2      0.60     0.30   0.04
                     Within   337   2885.36     8.56
Vocabulary           Between    2      0.69     0.35   0.04
                     Within   337   2889.81     8.58
Comprehension        Between    2      0.90     0.45   0.05
                     Within   337   2985.05     8.86
Picture Completion   Between    2     12.20     6.10   0.69
                     Within   337   2980.31     8.84
Picture Arrangement  Between    2      0.16     0.08   0.01
                     Within   337   3087.76     9.16
Block Design         Between    2      2.78     1.39   0.15
                     Within   337   3041.00     9.02
Object Assembly      Between    2      0.60     0.30   0.03
                     Within   337   2981.36     8.85
Coding               Between    2      1.32     0.66   0.07
                     Within   337   3094.23     9.18

Table F1 (continued)

Source               df         SS         MS      F

Sums of Scaled Scores:
Verbal        Between    2      17.73      8.87   0.07
              Within   337   42440.92    125.94
Performance   Between    2      36.05     18.03   0.19
              Within   337   31432.37     93.27
Full Scale    Between    2      80.39     40.20   0.12
              Within   337  112965.38    335.21

.75F(2,337) = 1.39

Table F2

C Values for Cochran's Test for Homogeneity of Variance

C = (maximum variance)/(sum of variances)

Scaled Scores:

Information           0.35
Similarities          0.35
Arithmetic            0.35
Vocabulary            0.34
Comprehension         0.35
Picture Completion    0.36
Picture Arrangement   0.34
Block Design          0.35
Object Assembly       0.34
Coding                0.34

Sums of Scaled Scores:

Verbal 0.35

Performance 0.36

Full Scale 0.36

Note: All C values are non-significant.
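A sketch of the statistic's computation (the score lists below are hypothetical):

    import statistics

    def cochrans_c(groups):
        """Cochran's C: the largest group variance divided by
        the sum of the group variances."""
        variances = [statistics.variance(g) for g in groups]
        return max(variances) / sum(variances)

    # Example: scaled scores for three age groups (invented values)
    age7  = [9, 11, 10, 12, 8, 10]
    age9  = [10, 9, 11, 10, 12, 9]
    age11 = [8, 10, 11, 9, 10, 12]
    print(round(cochrans_c([age7, age9, age11]), 2))  # near 1/3 when the
                                                      # three variances are similar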

APPENDIX G

GRAPHIC EQUIPERCENTILE EQUATING PROCEDURE
PPVT IQ TO WISC-R VERBAL IQ, AGE 7½

Table G1

Distribution of PPVT IQ Scores and WISC-R Verbal IQ Scores, Age 7½

PPVT                 Cumulative   Percentage        WISC-R               Cumulative   Percentage
IQ      Frequency    Frequency    Below             VIQ     Frequency    Frequency    Below

137 1 114 99.1 129 3 114 97.4 135 1 113 98.3 125 1 111 96.5 133 2 112 96.5 124 2 110 94.7 130 1 110 95.6 123 2 108 93.0 128 2 109 93.9 121 2 106 91.2 124 2 107 92.1 120 2 104 89.5 121 3 105 89.5 119 2 102 87.7 119 5 102 85.1 117 2 100 86.0 117 1 97 84.2 116 1 98 85.1 115 1 96 83.3 115 1 97 84.2 114 4 95 79.8 113 4 96 80.7 112 4 91 76.3 112 6 92 75.4 110 4 87 72.8 109 2 86 73.7 108 5 83 68.4 108 7 84 67.5 107 4 78 64.9 107 2 77 65.8 105 7 74 58.8 105 3 75 63.2 103 3 67 56.1 104 3 72 60.5 101 1 64 55.3 103 7 67 54.4 100 6 63 50.0 101 4 62 50.9 98 1 57 49.1 100 6 58 45.6 96 4 56 45.6 99 5 52 41.2 94 4 52 42.1 96 5 47 36.8 93 3 48 39.5 94 3 42 34.2 91 4 45 36.0 93 5 39 29.8 89 16 41 21.9 92 4 34 26.3 87 3 25 19.3 90 3 30 23.7 86 4 22 15.8 89 2 27 21.9 84 3 18 13.2 88 2 25 20.2 82 7 15 7.0 86 2 23 18.4 80 2 8 5.3 85 1 21 17.5 79 3 6 2.6 84 2 20 15.8 77 1 3 1.8 81 4 18 12.3 75 1 2 .9 80 3 14 9.6 70 1 1 78 1 11 8.8 77 4 10 5.3 76 2 6 3.5 74 3 4 .9 70 1 1 256

Figure G1. Relative cumulative frequencies for PPVT IQs and WISC-R Verbal IQs, age 7½.

Table G2

Equipercentile Points for PPVT IQ and WISC-R Verbal IQ from Graphic Procedure

Percentile   PPVT   WISC-R        Percentile   PPVT   WISC-R
Rank         IQ     VIQ           Rank         IQ     VIQ

0.5 73.5 71.5 93.0 125.2 123.0 0.7 74.3 72.4 95.0 128.0 124.7 1.0 75.3 73.3 97.0 131.5 128.3 1.4 76.2 74.0 98.0 134.3 131.8 1.8 76.8 74.7 98.4 135.5 132.2 2.0 77.3 74.9 98.8 137.0 133.7 3.0 78.5 75.6 99.0 138.2 134.8 4.0 79.6 76.3 5.0 80.4 76.8 8.0 82.5 78.5 12.0 84.5 81.3 17.0 87.0 85.0 20.0 87.3 87.5 24.0 87.8 90.2 30.0 89.8 93.5 34.0 92.0 95.3 40.0 94.3 98.0 46.0 96.5 100.5 50.0 98.5 102.0 54.0 100.5 103.3 60.0 103.5 105.3 66.0 106.7 107.0 70.0 108.8 108.5 74.0 111.0 110.0 78.0 113.3 111.8 82.0 115.7 114.0 85.0 117.8 115.8 88.0 120.3 117.7 90.0 122.0 119.3 258

Figure G2. Hand-graphed equipercentile conversion line (PPVT IQ to WISC-R Verbal IQ, age 7½).
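The graphic procedure pairs the PPVT and WISC-R Verbal IQ values that cut off equal proportions of their respective distributions. A computational sketch of the same idea (not the study's hand-graphing; the score samples below are hypothetical):

    def equipercentile_points(x_scores, y_scores, ranks):
        """For each percentile rank, pair the X score and the Y score that
        fall at that rank in their own distributions."""
        xs, ys = sorted(x_scores), sorted(y_scores)

        def score_at(scores, p):
            # percentile lookup by position, with linear interpolation
            idx = p / 100 * (len(scores) - 1)
            lo = int(idx)
            hi = min(lo + 1, len(scores) - 1)
            return scores[lo] + (idx - lo) * (scores[hi] - scores[lo])

        return [(p, score_at(xs, p), score_at(ys, p)) for p in ranks]

    # Hypothetical PPVT and WISC-R Verbal IQ samples
    ppvt = [74, 82, 89, 95, 100, 105, 112, 119, 128, 137]
    viq  = [70, 80, 88, 94, 100, 104, 110, 117, 124, 135]
    for p, x, y in equipercentile_points(ppvt, viq, [10, 25, 50, 75, 90]):
        print(f"{p:>2}th percentile: PPVT {x:.1f} <-> VIQ {y:.1f}")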

APPENDIX H

NORMALIZED LINEAR EQUATING RESULTS

Table H1

Equivalent Scores Normalized PPVT IQs to Normalized WISC-R Verbal IQs (Age 7½)

rXY = .68,  rXX = .86,  rYY = .88

Normalized Linear Equating

PPVT IQs        Y*        SY*

60 59.982 3.299 67 66.982 2.816 70 69.982 2.614 74 73.982 2.353 77 76.982 2.164 81 80.982 1.926 84 83.982 1.761 86 85.982 1.659 88 87.982 1.565 92 91.982 1.409 95 94.982 1.325 97 96.982 1.289 98 97.982 1.278 99 98.982 1.271 100 99.982 1.268 101 100.982 1.270 102 101.982 1.276 103 102.982 1.287 104 103.982 1.303 106 105.982 1.346 108 107.982 1.404 110 109.982 1.476 112 111.982 1.559 114 113.982 1.652 115 114.982 1.702 117 116.982 1.807 120 119.982 1.976 122 121.982 2.094 124 123.982 2.217 126 125.982 2.344 129 128.982 2.538 133 132.982 2.806 140 139.982 3.290

Mean Y* = 100.000    RMS = 13.454
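The linear part of the procedure maps X to the Y scale so that standardized deviates agree: Y* = mean(Y) + (SD(Y)/SD(X)) * (X - mean(X)). A sketch with hypothetical normalized-IQ samples (not the study's data):

    import statistics

    def linear_equate(x, x_scores, y_scores):
        """Linear equating: send score x on test X to the Y scale by
        matching standardized deviates."""
        mx, sx = statistics.mean(x_scores), statistics.stdev(x_scores)
        my, sy = statistics.mean(y_scores), statistics.stdev(y_scores)
        return my + (sy / sx) * (x - mx)

    # Hypothetical PPVT (X) and WISC-R Verbal (Y) normalized IQs
    ppvt = [85, 92, 100, 104, 111, 120]
    viq  = [83, 90, 100, 105, 112, 118]
    print(round(linear_equate(100, ppvt, viq), 3))  # equated VIQ for PPVT IQ 100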

Table H2

Equivalent Scores Normalized PPVT IQs to Normalized WISC-R Verbal IQs (Age 9½)

rXY = .72,  rXX = .93,  rYY = .87

Normalized Linear Equating

PPVT IQs        Y*        SY*

60 60.318 3.039 68 68.286 2.528 72 72.270 2.281 74 74.262 2.161 75 75.258 2.103 76 76.254 2.044 77 77.250 1.987 79 79.242 1.874 80 80.238 1.820 81 81.234 1.766 83 83.226 1.663 85 85.218 1.565 88 88.206 1.433 90 90.198 1.355 92 92.190 1.289 93 93.186 1.260 94 94.182 1.235 96 96.174 1.195 98 98.166 1.171 100 100.158 1.163 102 102.150 1.172 104 104.142 1.199 106 106.134 1.240 109 109.122 1.328 111 111.114 1.402 112 112.110 1.442 113 113.106 1.485 114 114.102 1.530 116 116.094 1.625 118 118.086 1.726 120 120.078 1.832 122 122.070 1.943 126 126.054 2.175 128 128.046 2.295 130 130.038 2.418 137 137.010 2.860

Mean Y* = 100.000    RMS = 12.432

Table H3

Equivalent Scores Normalized PPVT IQs to Normalized WISC-R Verbal IQs (Age 11½)

rXY = .78,  rXX = .93,  rYY = .87

Normalized        Linear Equating
PPVT IQs          Y*        SY*

60 59.428 3.105 69 68.518 2.519 73 72.558 2.269 75 74.578 2.147 77 76.598 2.028 79 78.618 1.913 83 82.658 1.695 86 85.688 1.546 88 87.708 1.457 89 88.718 1.415 91 90.738 1.340 93 92.758 1.276 96 95.788 1.206 98 97.808 1.179 100 99.828 1.169 102 101.848 1.177 104 103.868 1.202 105 104.878 1.221 106 105.888 1.243 107 106.898 1.269 109 108.918 1.332 110 109.928 1.367 111 110.938 1.406 113 112.958 1.490 115 114.978 1.583 117 116.998 1.683 118 118.008 1.735 119 119.018 1.789 121 121.038 1.900 122 122.048 1.957 124 124.068 2.074 126 126.088 2.194 127 127.098 2.255 130 130.128 2.441 133 133.158 2.632 140 140.228 3.090

Mean Y* = 100.000    RMS = 12.000

Table H4

Equivalent Scores Normalized SIT IQs to Normalized WISC-R Verbal IQs (Age 11½)

rXY = .89,  rXX = .93,  rYY = .88

Normalized        Linear Equating
SIT IQs           Y*        SY*

60 59.625 2.543 67 66.695 2.162 71 70.735 1.951 74 73.765 1.797 77 76.795 1.648 80 79.825 1.505 83 82.855 1.370 84 83.865 1.327 85 84.875 1.286 87 86.895 1.207 88 87.905 1.171 90 89.925 1.103 91 90.935 1.073 93 92.955 1.020 95 94.975 .978 96 95.985 .962 98 98.005 .940 99 99.015 .935 100 100.025 .934 101 101.035 .936 103 103.055 .951 105 105.075 .981 106 106.085 1.001 108 108.105 1.049 109 109.115 1.077 111 111.135 1.141 114 114.165 1.252 116 116.185 1.334 118 118.205 1.421 120 120.225 1.512 123 123.255 1.656 126 126.285 1.805 129 129.315 1.960 133 133.355 2.171 140 140.425 2.552

Mean Y* = 100.000    RMS = 9.429

Table H5

Equivalent Scores Normalized SIT IQs to Normalized WISC-R Full Scale IQs (Age 11½)

rXY = .89,  rXX = .87,  rYY = .88

Normalized        Linear Equating
SIT IQs           Y*        SY*

60 59.601 2.837 67 66.706 2.415 71 70.766 2.180 74 73.811 2.009 77 76.856 1.844 80 79.901 1.685 83 82.946 1.536 84 83.961 1.488 85 84.976 1.443 87 87.006 1.356 88 88.021 1.315 90 90.051 1.241 91 91.066 1.207 93 93.096 1.149 95 95.126 1.103 96 96.141 1.086 98 98.171 1.062 99 99.186 1.056 100 100.201 1.054 101 101.216 1.057 103 103.246 1.074 105 105.276 1.106 106 106.291 1.128 108 108.321 1.181 109 109.336 1.213 111 111.366 1.283 114 114.411 1.405 116 116.441 1.496 118 118.471 1.592 120 120.501 1.693 123 123.546 1.852 126 126.591 2.018 129 129.636 2.189 133 133.696 2.424 140 140.801 2.847

Mean Y* = 100.000    RMS = 10.458

Table H6

Equivalent Scores Normalized MHVS Raw Scores to Normalized WISC-R Verbal IQs (Age 11½)

rXY = .92,  rXX = .93,  rYY = .88

Normalized MHVS   Linear Equating
Raw Scores        Y*        SY*

13 62.768 2.258 16 70.037 1.891 17 72.460 1.773 18 74.883 1.657 19 77.306 1.544 20 79.729 1.436 21 82.152 1.332 22 84.575 1.234 23 86.998 1.144 24 89.421 1.064 25 91.844 .995 26 94.267 .942 27 96.690 .905 28 99.113 .888 29 101.536 .891 30 103.959 .915 32 108.805 1.015 33 111.228 1.088 34 113.651 1.171 36 118.497 1.364 38 123.343 1.579 40 128.189 1.810 41 130.612 1.929 42 133.035 2.050 45 140.304 2.422

Mean = 28.343    RMS = 9.175

Permission is hereby granted to Ms. Barbara J. Holmes to reproduce the WISC-R substitution items from the article, "Final Report on Modifications of WISC-R for Canadian Use" (Canadian Psychological Association Bulletin, 1977, No. 5, pp. 5-7), in her Ed.D. dissertation, "Individually-Administered Intelligence Tests: An Application of Anchor Test Norming and Equating Procedures in British Columbia", University of British Columbia, 1981, and to the National Library of Canada to copy them on microfilm and to lend or sell copies of the film.

Please note that this material appears in Appendix B, p. 188.