
DEMOGRAPHIC VARIABLES AND SCORES IN DISABILITY APPLICANTS

ROBERT BRUCE CLAPP JR.

Bachelor of

Ohio Dominican University

May 1981

Master of Arts

Cleveland State University

December, 1989

Educational Specialist in Counseling

Cleveland State University

December, 2002

Submitted in partial fulfillment of the requirements for the degree

DOCTOR OF PHILOSOPHY IN URBAN EDUCATION

at the

CLEVELAND STATE UNIVERSITY

MAY, 2014

© Copyright by Robert Bruce Clapp Jr. 2014

We hereby approve this dissertation of Robert Bruce Clapp Jr., Candidate for the Doctor of Philosophy in Urban Education degree

This Dissertation has been approved for the

Office of Doctoral Studies,

College of Education and Human Services

and

CLEVELAND STATE UNIVERSITY

College of Graduate Studies by:

______Dr. Kathryn C. MacCluskie, Committee Chairperson; Counseling, Administration, Supervision, and Adult Learning

______Dr. Graham B. Stead, Methodologist; Curriculum and Foundations

______Dr. Sarah M. Toman, Committee Member; Counseling, Administration, Supervision, and Adult Learning

______Dr. Aaron T. Ellington, Committee Member; Counseling, Administration, Supervision, and Adult Learning

______Dr. Deborah Koricke, Committee Member; Center for Effective Living

April 28, 2014 Student’s Date of Defense

DEDICATION

Reverend Robert Clapp

(1924-2011)

He taught me how to believe in God; and to believe in our fellow man.

ACKNOWLEDGEMENTS

To Vanessa, Hannah, and Robby Clapp, for their patience during this long, challenging journey.

To Drs. Sarah M. Toman and Kathryn C. MacCluskie, for amazing support and patient revising.

To Dr. Graham B. Stead, Methodologist, for his guidance and great instruction in the process.

To Drs. Aaron T. Ellington and Deborah Koricke, for their great support in the writing process.

To the Center for Effective Living for providing the data and an excellent learning experience.

ABSTRACT

The American Psychological Association (2012) has published Guidelines for the Assessment of and Intervention with Persons with Disabilities. These guidelines emphasize the importance of recognizing social and cultural diversity among persons with disabilities (guideline 8, p. 49) and the need to apply assessment approaches that are “psychometrically sound, fair, comprehensive, and appropriate for clients with disabilities” (guideline 14, p. 52). In addition, the American Association on Intellectual and Developmental Disabilities (AAIDD, 2011) has indicated that IQ testing is a major tool in assessing intellectual disability. The Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) and the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003) have become two of the most widely used tests in the world for assessing intellectual disability (Chen & Zhu, 2008). Studies of both assessments have focused mainly on people functioning in the average range of cognitive ability. Among people functioning at lower levels, however, possible test bias is of even greater concern, because their scores might be affected more deleteriously by bias than those of a test taker whose ability is in the “average” range.

To illuminate possible psychometric bias against people with disabilities resulting from cultural differences, this study examines whether WAIS-IV or WISC-IV scores in the sample differ by age, gender, and race (specifically, Blacks and Whites), particularly among people being evaluated for disability benefits.

This study investigates whether significant relationships exist between subtest, Index, and Full Scale scores and selected demographic variables. Because the data in this archival analysis are derived from assessments of people who had been referred due to problems resulting from some form of cognitive impairment, the mean test scores are expected to fall below the “average” scores of the standardization sample, which is considered roughly representative of the general U.S. population. Analyses of subscale scores will be conducted to determine whether subtest variability has a significant relationship with any of three demographic variables: age, gender, and race. The purpose of this psychometric analysis is to implement the established practice guidelines in order to help ensure the best possible care for persons being assessed for intellectual disability.

TABLE OF CONTENTS

ABSTRACT

LIST OF TABLES

CHAPTER:

I. INTRODUCTION AND STATEMENT OF THE PROBLEM

Intelligence

Race

Fairness

Research Hypothesis

Summary

II. REVIEW OF THE LITERATURE

Disability

Age

Gender

History of Intelligence Testing and Racial Differences

Nature: The Hereditarian Model

Nurture: The Disparity is argued to be from Environmental Causes

Beyond : Towards a More Complex Understanding

Determining Which Scales to Use

Subscales

Summary

III. METHODOLOGY

Research Hypothesis

Procedure

Participants

Instruments

WISC-IV

WAIS-IV

Analyses

Summary

IV. RESEARCH RESULTS

Introduction

Demographics

Analysis of the WISC-IV scores for Age, Race, and Gender: Research Hypothesis 1

Full Scale (FSIQ)

Indices

Analysis of the WAIS-IV scores for Age, Race, and Gender: Research Hypothesis 2

Full Scale Intelligence Quotient (FSIQ)

Indices

Conclusion

V. RESULTS

Research Hypotheses

First Research Hypothesis

WISC-IV Results

WAIS-IV Results

Discussion

Limitations and Implications for Future Research

Implications for Practice

Summary

REFERENCES

LIST OF TABLES

1. WISC-IV: Indexes and Their Subtests

2. WAIS-IV: Indexes and Their Subtests

3. WISC-IV and WAIS-IV Demographics (Summary of participants)

4. WISC-IV Age (by range categories, as determined by The Psychological Corporation)

5. WAIS-IV Age (categories, as determined by The Psychological Corporation)

6. Means and Standard Deviations (in parentheses) of WISC-IV Scores for Disability Applicants

7. Univariate tests: WISC-IV: Dependent Variable FSIQ by Age, Race, and Gender

8. Multivariate tests for Age, Race, Gender, and WISC-IV

9. WISC-IV source of significance for Age by VCI, PRI, WMI, and PSI

10. WISC-IV source of significance for Gender by VCI, PRI, WMI, and PSI

11. Means and Standard Deviations (in parentheses) of WAIS-IV Scores for Disability Applicants

12. Univariate tests: WAIS-IV: Dependent Variable FSIQ by Age, Race, and Gender

13. Multivariate tests for Age, Race, Gender, and WAIS-IV

14. WAIS-IV source of significance for Age by VCI, PRI, WMI, and PSI

CHAPTER I

INTRODUCTION AND STATEMENT OF THE PROBLEM

We have eliminated the colored versus white factor by admitting at the outset, that our norms cannot be used for the colored population of the United States. Though we have tested a large number of colored persons, our standardization is based upon white subjects only. We omitted the colored population from our first standardization because we did not feel that norms derived by mixing the populations could be interpreted without special provisions and reservations.

(David Wechsler, 1944, as cited in Williams, 1972, p. 4)

Chapter I introduces the rationale for conducting this dissertation research and discusses the importance of the research and the contribution it is expected to make to the field. In addition, the chapter elaborates on the concepts of intelligence, race, and fairness as they apply to research on racial disparity in intelligence quotient (IQ) assessment among people with intellectual disabilities.

Standardized testing for intelligence began during an era when prejudice and discrimination against black people were less subtle than they are today. While much has been written about the history of intelligence testing and the connections to , little current research exists to clarify the status of the newest editions of these high-stakes tests. Specifically, there is a dearth of research on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) and the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003) related to the question of racial differences. This gap is concerning given the relevance of this area of research from both psychological and social perspectives, and within the process of assessing for cognitive disability. Nisbett et al. (2012) stated “IQ is also important because some group differences are large and predictive of performance in many domains. Much evidence indicates that it would be difficult to overcome racial disadvantage if IQ differences could not be ameliorated. IQ tests help us track the changes in intelligence of different groups and of entire nations and to measure the impact of interventions intended to improve intelligence” (p. 131).

Even less research exists on samples at the lower end of the IQ distribution, where these high-stakes tests may have their biggest impact on decisions about diagnoses, treatments, and educational opportunities. For example, Koocher (2003) explained that, as the Supreme Court's ruling in Atkins v. Virginia illustrates, intelligence tests have played a role in death penalty enforcement across the country. When the decision to execute a person may be influenced by a determination of whether the person has the intelligence to understand the crime committed, our ability to measure intelligence is of major significance. However, as will be described later, the courts' views on intelligence testing have wavered and have not always been consistent.

Since 1932, it has been accepted that there exists a one standard deviation difference between test scores for White and Black intelligence test takers (Onwuegbuzie & Daley, 2001). While several authors (reviewed in Chapter 2) have proposed diverse explanations for the causes of these differences, little research is being done today to investigate the status of current scales with current test takers, or the factors within the tests that may continue to contribute to race differences in scores on standardized tests of intelligence. This is especially true of research concerning persons with disabilities, women, and age differences.

When a person does not perform as well as others taking the same tests, assessors have an ethical obligation to be concerned about the potential consequences for test takers. Gasquoine (2009) expressed this ethical concern when he stated:

As clinical neuropsychologists can interpret low scores (or a high number of errors) on neuropsychological tests as indicative of cognitive impairment from structural brain injury these findings place minorities at a higher risk of misdiagnosis than Whites. (p. 250)

In 2008, Suinn and Borrayo cited U.S. Census projections that ethnic minorities would make up 36% of the population in 2010 and 52.3% by 2050. They indicated that research on effective assessment practices needs to increase, with minorities receiving as much research consideration as “Euro-Americans,” and noted a shortage of assessments that access information about culture-specific syndromes. Actual data from the 2010 Census (Census.gov, 2011) indicate that 27.6% of the population did not identify as “white alone”: 12.6% identified as Black or African American alone, 0.9% as American Indian and Alaska Native alone, 4.8% as Asian alone, 0.2% as Native Hawaiian and Other Pacific Islander alone, 6.2% as some other race alone, and 2.9% as two or more races.

Research on the effects of the score disparity among individuals whose scores are lower than average is almost nonexistent.

This lack of exploration of low-end scores persists in spite of findings by Detterman and Daniel (1989) that correlations among mental tests, and between those tests and other cognitive variables, are highest for low-IQ groups. They explained that “subtest intercorrelations are significantly larger for low relative to high IQ groups” (Detterman & Daniel, 1989, p. 352). Because these intercorrelations are higher, misdiagnosis would become an even larger concern if the Black-White disparity were found to be greater for people at the lower end of the intelligence curve. It therefore remains important to determine whether any part of the tests contributes to racial differences, increasing the potential for either group to be misdiagnosed.

Sattler (2008) reported Black-White differences on the WISC-IV for all four Indexes and the Full Scale IQ. Sattler’s (2008) publication indicated the following findings by the Psychological Corporation: “Euro-Americans (N = 1,402) had a mean Full Scale IQ (FSIQ) of 103.24 (SD = 14.52) while African Americans (N = 343) had a mean FSIQ of 91.72 (SD = 15.74)” (p. 280). This 11.52-point gap approximates the previously identified one standard deviation difference in FSIQ performance (Onwuegbuzie & Daley, 2001).

Index scores were also reported for Verbal Comprehension (VCI), Perceptual Reasoning (PRI), Working Memory (WMI), and Processing Speed (PSI) (Sattler, 2008, p. 280). Sattler (2008) reported that Euro-Americans had a VCI mean score of 102.92 (SD = 13.80), while African Americans had a mean score of 91.86 (SD = 15.42). For PRI, Euro-Americans had a mean score of 102.77 (SD = 14.36) while African Americans had a mean score of 91.43 (SD = 15.07). For WMI, Euro-Americans had a mean score of 101.26 (SD = 14.55) while African Americans had a mean score of 96.12 (SD = 15.35). Finally, for PSI, Euro-Americans had a mean score of 101.41 (SD = 14.70) while African Americans had a mean score of 95.00 (SD = 15.66). For more detailed explanations of the Indexes, the reader is referred to Chapter 3 of the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) Technical and Interpretive Manual (Wechsler, 2003).
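To make the sizes of these gaps easier to compare, the sketch below expresses each reported difference in two ways: as a fraction of the 15-point standard deviation on which Wechsler composite scores are normed, and as Cohen's d computed with a pooled standard deviation. It uses only the means and SDs quoted by Sattler (2008). Sattler reports group sizes only for the FSIQ, so reusing N = 1,402 and N = 343 for the Indexes is an assumption, as is the choice of Cohen's d as the effect-size metric; this is an illustrative calculation, not an analysis Sattler reported.

```python
import math

# Means and SDs as reported by Sattler (2008, p. 280).
# Sattler gives N only for FSIQ; reusing those Ns for the Indexes is an assumption.
N_EURO, N_AFR = 1402, 343

SCORES = {
    # scale: (Euro-American mean, SD, African American mean, SD)
    "FSIQ": (103.24, 14.52, 91.72, 15.74),
    "VCI": (102.92, 13.80, 91.86, 15.42),
    "PRI": (102.77, 14.36, 91.43, 15.07),
    "WMI": (101.26, 14.55, 96.12, 15.35),
    "PSI": (101.41, 14.70, 95.00, 15.66),
}


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1: float, sd1: float, n1: int, m2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference (Cohen's d) using the pooled SD."""
    return (m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def gap_table():
    """Return {scale: (raw point gap, gap in 15-point norm SDs, Cohen's d)}."""
    table = {}
    for scale, (m1, sd1, m2, sd2) in SCORES.items():
        raw = m1 - m2
        table[scale] = (raw, raw / 15.0, cohens_d(m1, sd1, N_EURO, m2, sd2, N_AFR))
    return table


if __name__ == "__main__":
    for scale, (raw, norm_units, d) in gap_table().items():
        print(f"{scale}: {raw:5.2f} points, {norm_units:.2f} norm SDs, d = {d:.2f}")
```

Under these assumptions the VCI and PRI gaps come out at roughly twice the size of the WMI and PSI gaps, and the FSIQ gap falls somewhat short of a full standard deviation.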

While the differences in VCI (11.06 points) and PRI (11.34 points) appear larger than those in WMI (5.14 points) and PSI (6.41 points), presenting specific analyses of which, if any, Index scores contribute most to the disparity was not part of the purpose of Sattler’s chapter. Nor was it relevant for Sattler to comment on the contributions of the subscales themselves, but this is a topic of interest for this dissertation. Also appropriately for Sattler’s chapter, the reported data reflect the performance of individuals representative of the general population. Earlier in the text, Sattler (2008) referenced Jensen (1975) and indicated that the disparity may be associated with g loading. However, Sattler (2008) later qualified this:

The present consensus is that it is not possible to make valid inferences about genetic differences among races as long as there are relevant systematic differences among races in , cultural patterns, and environments. These differences influence the development of cognitive skills in complex ways, and no one has succeeded in either estimating or eliminating their effects. Centuries of discrimination have made meaningless direct comparisons of the mental ability of African Americans and Euro Americans. (p. 169)

Comparing the mental ability of groups may be “meaningless,” but comparing the elements of a test that may contribute to misinterpretations of cognitive differences is certainly not, if an objective of the research is to clarify what constitutes a measure of intellectual disability.

Sattler’s (2008) reported findings do not provide information about the WAIS-IV because Sattler’s text focused on the testing of children. Also, the research questions of this dissertation are beyond the scope of his text: Sattler’s results do not indicate how specific subscales affect the disparity in Full Scale IQ, nor do they address racial differences among lower performance scores on either the WISC-IV or the WAIS-IV. Sattler’s text also does not address within-group differences.

Assessment disparity is not merely an academic concern. Psychologists and test administrators have an ethical obligation to attend to any perceived lack of cultural consideration in their work as practitioners. According to the American Psychological Association’s (APA, 2003) Guidelines on Multicultural Education, Training, Research, Practice, and Organizational Change for Psychologists, “Consistent with Standard 2.04 of the APA Code (APA, 1992), multiculturally sensitive practitioners are encouraged to be aware of the limitations of assessment practices, from intakes to the use of standardized assessment instruments” (p. 391).

Because of the well-documented concerns about the historical misuse of these tests, it is important not to assume that the norming processes used by test developers have effectively eliminated biases. Simply put, ignoring the question of race differences in these new assessments could inadvertently perpetuate a myth of one race’s intellectual superiority. Such a myth is of great concern in light of the recognition that the tests may require specific cognitive procedures that are more often used by whites than by blacks, as was determined to be the case in previous studies of similar tests (Helms, 2006).

If differences persist between the performance of whites and blacks, it would be helpful to determine whether they reflect overall test differences or whether some subtests contribute more than others. If subtest-level differences exist, identifying them might allow for modifications of future tests and greater understanding of cultural differences in the cognitive experiences of both groups.

The “Flynn effect” (Flynn, 1987) describes the rise in IQ scores over the past century. In addition, much of the gain in IQ has occurred in the lower range of intelligence. If the disparity between black and white IQ scores is to decrease, the decrease could occur within this group; that is, if IQ gains are concentrated at the lower end of the distribution, any reduction in disparity is more likely to be found there first. While Sattler demonstrated that a gap of roughly one standard deviation was still present for children taking the WISC-IV (Sattler, 2008, p. 280), those data did not come from a sample representing black-white differences at the lower end of intelligence. Sattler also does not provide information on testing adults with the most current scale (WAIS-IV). In addition, as mentioned, these studies do not explore within-group variability.

Many people over the course of history have sought to provide definitions and tests to determine a person’s intelligence. Just as there may be many differing perspectives on the definition of intelligence, and consequently on how it might be tested, there may be many differing personal backgrounds that could influence the way in which an author defines and then tests for intelligence. People’s views on race, their experiences as members of a race, and their ideas about the role of race in intelligence have been reported to play a part in how a person may define and test for intelligence. Thus, another question that arises in the literature about intelligence testing and race is the issue of test fairness. The following paragraphs offer brief descriptions of intelligence, race, and fairness. While each of these topics is directly relevant to this dissertation research, these descriptions are limited by its scope.

Intelligence

Discussion in the literature makes clear that the definition and conceptualization of intelligence remain a topic of debate. Volumes have been written to describe intelligence, to present ideas about what it involves, and to offer theories about how individuals may or may not have it. Boring commented that “intelligence is whatever the tests test” (cited in Beins, 2010, p. 89). Thus, we need to be thoughtful about which test we choose and what the test measures.

In 1958, Wechsler (cited in Onwuegbuzie & Daley, 2001) defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with the environment” (p. 211). However, Onwuegbuzie and Daley (2001) challenged Wechsler’s perspective, noting, “Yet even Alfred Binet, who devised the first IQ test in 1905, declared that intelligence is too complex to summarize with a single number and warned of the ‘brutal pessimism’ that would ensue if IQ tests ever were mistaken as a measure of fixed immutable intelligence” (p. 211).

Onwuegbuzie and Daley also referred to Greenfield (1998), noting that Greenfield’s explanation of intelligence was:

…the ability to acquire competence through learning, socialization, and development from each of the following: (a) technology, (b) linguistic , and (c) social organization, facets that vary from culture to culture. Thus by Greenfield’s definition, intelligence varies from culture to culture. In other words, cultures define intelligence by what is adaptive in their particular social and cultural milieu. (p. 212)

Two additional contributions to the understanding of intelligence are Pinker’s (1997) research on artificial intelligence and Dweck’s (2000) work on self-theories. In the book How the Mind Works, Pinker (1997) compared artificial and human intelligence and proposed that a large part of intelligence has to do with the ability to attain goals. Factors that thwart an individual’s goal attainment can then be seen as affecting that individual’s intelligence. This could be interpreted to support the idea that groups of people whose goals have been blocked may not score as well on intelligence tests. Persons of minority status or with disabilities may experience more obstacles to goal attainment. The tests they are given should not contain obstacles of their own; rather, they should serve as an aid to overcoming them.

Dweck (2000), in the book Self-Theories, compared “entity” versus “incremental” theories of intelligence. Based on her research, individuals who view intelligence as a fixed entity are less likely to perform well than those who view intelligence as able to change from time to time and from situation to situation (incrementally).

One concern about standardized measures is their potential to perpetuate “entity” perspectives of intelligence. Sternberg (2000) described a similar concept, the “implicit theory” of intelligence, noting that intelligence tests “are validated almost exclusively against the societally approved criteria, giving the tests the appearance of that they may not have within a given sociocultural group” (p. 159). Differing conceptualizations of intelligence are likely reflective of, or influenced by, the history of intelligence testing. More information about the history of intelligence testing follows in Chapter 2.

Advances in our understanding of “cognitive plasticity” (see Mercado III, 2009, for a more detailed explanation) have also helped us to understand important critical periods for intervention, based on the remarkable adaptive abilities of the developing brain. From the perspective of cognitive neuroscience, intelligence involves a complex interplay of genetic, physiological, and environmental influences. Such sophisticated understanding of brain development may assist in restructuring our knowledge of the nuances of intelligence and may contribute to more beneficial tools for assessment.

While alternative conceptualizations of intelligence offer promise in increasing fairness and broadening our view of the concept of intelligence, they unfortunately lack, at this time, empirically validated objective measures that lend themselves to research analyses. Suzuki and Valencia’s (1997) definition of intelligence best supports the purposes of this dissertation research: “Intelligence is operationally defined by scores on individually administered standardized intelligence tests” (p. 1103). Because the data available for this study were operationalized as scores on the WAIS-IV and WISC-IV, we also accept the definition of intelligence provided by these scales.

Sometimes, when our only tool is a hammer, the question becomes whether it is our best tool, or whether it can at least be modified into a better one.

Race

The topic of racial differences raises the question: what do you mean by race? The APA addresses this question by defining race in its Guidelines on Multicultural Education, Training, Research, Practice, and Organizational Change for Psychologists (APA, 2003). The APA indicated that the biological definition of race has led to a great deal of controversy. Citing Helms and Cook (1999), the APA explains that the controversy surrounds the recognition that “biological racial categories and phenotypic characteristics have more within-group variation than between-group variation” (p. 380). In the guidelines (APA, 2003), race is viewed not as a biological construct but as a social construct: “Race, then, is the category to which others assign individuals on the basis of physical characteristics, such as skin color or hair type, and the generalizations and stereotypes made as a result” (p. 380). The guidelines quote Helms and Talleyrand (1997): “Thus, people are treated or studied as though they belong to biologically defined racial groups on the basis of such characteristics” (p. 374). For the purposes of this dissertation study, only participants who identified with and were classified within the social construct of either Whites or Blacks will be included. These racial constructs include stereotypes that have resulted from past studies of these racial groups, but this dissertation research is being conducted in the spirit of contributing to the eventual elimination of stereotypes as they may be related to intelligence testing and disability determination. In addition, identifying race may be beneficial if it is determined that one group is being discriminated against, or even treated differently from another, in the assessment of intellectual disability.

Fairness

In light of the history of IQ tests and the potential misuse of IQ test results in disability determination, and because that history makes an “illusory correlation” or stereotypical conclusion not only possible but also hazardous, one of the main reasons for conducting this dissertation research is to determine whether the newest versions of the Wechsler tests are fair. These tests might be found fair if the “Flynn effect” (Flynn, 1987), or modifications to the tests themselves, have rendered the Black/White disparity obsolete or moved it in that direction, at least for persons with intellectual disabilities. Helms (2006) defined ‘fairness’ in testing as “the removal from test scores of systematic variance attributable to experiences of racial or cultural socialization, and it is differentiated from test-score validity and cultural bias” (p. 845).

Even the legal system appears perplexed in attempting to decide about the fairness of IQ testing as it relates to determining a disability. For example, in 1979, in the case of Larry P. v. Wilson Riles, Judge Peckham of California ruled that IQ tests are culturally biased when used to assess Black children for placement in classes for the educable mentally retarded. One year later, in a case brought forward by a conservative parent group, Parents in Action on Special Education v. Joseph P. Hannon, Judge Grady of Illinois ruled that intelligence tests are not racially or culturally biased and do not discriminate against Black children. However, if a racial disparity exists among persons applying for disability determination, whose scores on the newer test versions tend to be lower, it is more likely that the tests may not be fair for them. Given debates within psychology about the fairness of intelligence testing, it is not surprising that the courts have not been able to agree whether these tests are culturally biased.

While promoting an argument for genetic influence, Herrnstein and Murray (1995) conceded that culture must, to some degree, be acknowledged in intelligence test results, and that there would be more potential for the situation to improve if the larger culture in which the disparity exists improved. They also support the premise of this dissertation that a change in disparity may first be observed at the lower end of the score distribution.

In the past few decades, the gap between blacks and whites has narrowed by perhaps 3 points. The narrowing appears to have been mainly caused by a shrinking number of very low scores in the black population rather than an increasing number of high scores. Improvements in the economic circumstances of blacks, in the quality of the schools they attend, in better public health, and perhaps in diminishing racism may be narrowing the gap. (Herrnstein & Murray, 1995, p. 269)

Beyond recognizing that improvements in environmental factors within the culture may close the gap, this quote anticipates the premise of the research proposed in this dissertation: if the gap begins to close, the narrowing is more likely to be discovered first at the lower end of the score distribution. Sattler (2008) appears to agree, citing the finding of Colom, Lluis-Font, and Andrés-Pueyo (2005) that the gains made in performance among blacks on the WISC-IV were “on the lower portion of the IQ distribution” (p. 252).

Findings of reductions in the disparity of test scores (even if just starting in the lower portion of the distribution) could also be an early indication of greater fairness within these cognitive assessments. Without changes in the degree of disparity, IQ assessments are not completely fair tests. Perry, Satiani, Henze, Mascher, and Helms (2008) explained:

Helms (2006) revisited the concept of cultural test bias by reframing the matter under an Individual-Differences Fairness model. According to this approach, CAT (cognitive ability test) scores should not be correlated with racial-cultural constructs (e.g., racial identity). If they are correlated and result in mean differences, then CATs are not ‘fair’ instruments. Helms (2006) suggested that researchers replace the use of the construct of race with individual-difference constructs based on socialization. In a hierarchical regression, fairness would enter ‘conceptual constructs in the first step of an analysis to predict scores and racial groups in a second step’ (Helms, 2006, p. 852). If the second step fails to explain variation in scores above and beyond the first step, the explanations based on racial group would not be adequate models. (pp. 164-165)
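The two-step logic described by Perry et al. (2008) can be sketched as a nested-regression comparison. The example below is a minimal illustration under stated assumptions, not Helms's or Perry et al.'s own procedure: the single “socialization” score is a hypothetical stand-in for the individual-difference constructs Helms describes, the data are simulated rather than drawn from this study, and the increment contributed by race in step 2 is summarized with the standard change-in-R² F statistic. A p-value would require an F distribution function, which is omitted to keep the sketch dependent on NumPy alone.

```python
import numpy as np


def r_squared(X, y):
    """R^2 from an OLS fit of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_fairness_test(socialization, race, scores):
    """Step 1: scores ~ socialization constructs. Step 2: add racial group.

    Returns (R^2 step 1, R^2 step 2, change in R^2, F statistic for the change).
    """
    x1 = np.column_stack([socialization])        # step 1 predictors
    x2 = np.column_stack([socialization, race])  # step 2 adds race
    r2_step1, r2_step2 = r_squared(x1, scores), r_squared(x2, scores)
    n = len(scores)
    q = x2.shape[1] - x1.shape[1]                # number of race terms added
    df_resid = n - x2.shape[1] - 1               # minus 1 for the intercept
    f_change = ((r2_step2 - r2_step1) / q) / ((1.0 - r2_step2) / df_resid)
    return r2_step1, r2_step2, r2_step2 - r2_step1, f_change


def simulate(n=500, race_effect=0.0, seed=0):
    """Synthetic data only: a hypothetical socialization score drives test scores,
    and the 0/1 race indicator is correlated with that score."""
    rng = np.random.default_rng(seed)
    socialization = rng.normal(size=n)
    race = (socialization + rng.normal(size=n) > 0).astype(float)
    scores = 100 + 8 * socialization + race_effect * race + rng.normal(0, 10, n)
    return socialization, race, scores


if __name__ == "__main__":
    for effect in (0.0, 10.0):
        _, _, delta, f_change = hierarchical_fairness_test(*simulate(race_effect=effect))
        print(f"built-in race effect {effect:4.1f}: delta R^2 = {delta:.3f}, F change = {f_change:.1f}")
```

In the simulated null case, race enters only through its correlation with the socialization score, so adding it in step 2 explains almost no additional variance; when a direct race effect is built into the simulation, the change in R² grows. In Perry et al.'s description, the first pattern is the one in which explanations based on racial group would not be adequate models.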

While it is beyond the scope of this project to determine whether the revised IQ scales are completely fair or unfair, the results of this dissertation may indicate whether the scores are biased for people applying for disability. If ethical practice is to be maintained, it is essential to attempt to provide assessments that are as fair as possible to all people being measured. If gaps persist in a measure that continues to cause an entire group of people to exhibit differences in performance, the measure cannot be concluded to be fair, despite reports of validity and reliability statistics. If disparity continues to exist, greater fairness may be achieved in the future by discovering the factors that contribute to it.

Research Hypotheses

To determine whether disparities exist in current assessment tools for intellectual disability, the psychometric properties of the WAIS-IV (Wechsler, 2008) and the WISC-IV (Wechsler, 2003) can be explored for individuals applying for disability benefits. It would be beneficial to determine whether these assessment tools indicate any differences between Blacks and Whites or between males and females, or whether any influence of age is observed. The following research hypotheses guide answering preliminary questions about whether the psychometric properties of these tools are affected by race, gender, or age:

Hypothesis 1: Among people applying for disability, there is a relationship between age, gender, race, and performance on the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003).

Hypothesis 2: Among people applying for disability, there is a relationship between age, gender, race, and performance on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008).

Summary

This first chapter presented an argument for the importance of conducting this dissertation research. Although several less biased models of intelligence have been proposed, they have not been standardized. The assessment tools that have been standardized have been challenged as unfair because of score differences observed among people of different races. This issue cannot be ignored, given the professional mandate to practice ethically when selecting assessment methods for determining intellectual disability. The ethical challenge is compounded by the potential for harm in labeling someone as having a disability on the basis of a test that may be biased. If one group of people is at greater risk of misdiagnosis than another because of a disparity in test scores, people may be inadvertently labeled as having a disability when the issue is really a difference in the way people from different backgrounds respond to the test. Black people have a far longer history of being exploited than Whites, and the review of historical debates about racial differences in intelligence in Chapter 2 underscores this.

Chapter 2, a review of the relevant literature, explores the meaning of intellectual disability; the history of intelligence testing as it relates to racial differences, including how the nature versus nurture debate has been expressed; the choice of scales to be used; the potential role of age, gender, and racial differences in test scores; and the potential contributions of the subscales.

CHAPTER II

REVIEW OF THE LITERATURE

This chapter reviews the relevant research on disability, age, and gender in the measurement of intelligence. It then briefly describes some of the extensively documented history of intelligence testing as it relates to racial disparity in the determination of disability, and discusses how the nature/nurture debate has been expressed in exploring racial differences in IQ test scores. The chapter concludes by explaining which scales were explored and why the population of people applying for disability was chosen.

Disability

The American Association on Intellectual and Developmental Disabilities (AAIDD, 2011) indicated that: (1) IQ testing is a major tool in assessing intellectual disability; (2) a score as high as 75 can indicate limitations in intellectual functioning; and (3) the term “intellectual disability” means the same as “mental retardation.” Because “intellectual disability” is the preferred term, the association changed its name from the American Association on Mental Retardation to AAIDD in 2007. While identifying multiple factors that may contribute to intellectual disability, the AAIDD emphasized:

The overarching for evaluation and classifying individuals with intellectual disabilities is to tailor supports for each individual, in the form of a set of strategies and services provided over a sustained period. The goal is to enhance peoples functioning within their own culture and environment in order to lead a more successful and satisfying life. Some of this enhancement is of in terms of self–worth, subjective well-being, pride, engagement in political action, and other principals of ‘disability identity.’ (p. 2)

According to Foley-Nicpon and Lee (2012), “Now is the time for to give theory, research, and practice with individuals with disability the attention they deserve” (p. 397). Foley-Nicpon and Lee (2012) analyzed the content of 20 years of counseling psychology research on disability.

Despite disability being an important element of counseling psychology’s emphasis on multiculturalism and diversity, and despite the importance counseling psychology places on preventing marginalization and discrimination, disability research comprised “an extremely small amount (from less than 1% to 2.7%) of the counseling psychology literature” (p. 392). Citing 2005 Census Bureau reports, they noted that this is of great concern given that 19% of the U.S. population (about one in five) report having one or more disabilities, constituting the largest minority group in America.

Foley-Nicpon and Lee (2012) indicated that the study of disability is complex because there is an enormous range of disabilities; as a result, disability research tends to become specialized by divisions such as Rehabilitation Psychology (APA Division 22) for physical disability and Clinical Psychology (APA Division 12) for psychological disorders. In a review of 55 articles in the counseling psychology literature, Foley-Nicpon and Lee (2012) found that, among the few disability research articles published, the largest share (38%) were literature reviews or opinion, policy, or legal reports.

Fortunately, the number of empirical studies in disability research, while still sparse, has increased slightly over the past 20 years, a trend the authors encourage. However, another major concern is that the limited disability research being conducted is “one dimensional and focused on the disability itself … but not on how aspects of disability interact with our many other identities including race, class and gender…Literature focusing on the intersection of multiple identities is recommended” (p. 396).

Age

According to Rushton and Jensen (2005), the size of the average Black-White difference does not change beyond age three. This dissertation research continues this exploration to determine whether this remains the case for people applying for disability. Murray (2007) recently observed age-related fluctuations in score differences that challenge Rushton and Jensen’s conclusions. However, Murray found these differences using the Woodcock-Johnson test of intelligence (Woodcock & Johnson, 1989). Kaufman, Johnson, and Liu (2008), using the Kaufman Brief Intelligence Test (Kaufman & Kaufman, 2004), found that IQ decreases with age, but this study did not report any effect of this change on the lower end of intelligence test scores. Age may also influence differences between subscale scores (e.g., crystallized versus fluid scales). Horn and Cattell (1966) differentiated between fluid and crystallized intelligence. Fluid intelligence was defined as problem solving and the ability to perceive relationships and similarities, and does not rely on specific instruction. In later intelligence tests this came to be known as performance intelligence. At the same time, Horn and Cattell also defined crystallized intelligence, which is knowledge accumulated over time. Crystallized skills are affected by knowledge and cultural experiences. In later tests this came to be known as verbal intelligence. Crystallized intelligence can increase with age, while fluid intelligence can decrease with age.

Nisbett et al. (2012), citing Flynn (2009), reported that the decline in fluid (gF) relative to crystallized (gC) intelligence with aging is greatest for “brighter individuals” (p. 143): there appears to be an “age tax” after age 65, with those of higher IQ showing the greatest decline. Age-related differences for lower IQ scores were smaller and may be found in verbal skills. It should be noted, however, that this research was conducted on previous versions of the WAIS.

After age 65, the brighter the person the greater the decline: Whereas those at the median lose an extra 6.35 points (SD= 15) compared to those 1 SD below the median, those 1 SD above the median lose an extra 6.20 points compared to those at the median, and those 2 SDs above the median lose an extra 3.4 points compared to those 1 SD above. Those who are very bright, rather than below average, pay a total penalty of 16 IQ points. The reverse is true of verbal, where there is a ‘bright bonus’ of 6.30 points. (p. 143)
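The quoted total can be checked by summing the three increments, moving from 1 SD below the median to 2 SDs above it; the sum agrees with the rounded figure of 16 points given in the quotation:

```latex
\[
\underbrace{6.35}_{-1\,SD \,\to\, \text{median}}
+ \underbrace{6.20}_{\text{median} \,\to\, +1\,SD}
+ \underbrace{3.4}_{+1\,SD \,\to\, +2\,SD}
= 15.95 \approx 16 \text{ IQ points}
\]
```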

Nisbett et al. (2012) also referenced Blair’s (2006) claim that gF declines more quickly than gC as people age, as the prefrontal cortex deteriorates faster than the rest of the cortex. Because people of different ages also represent different eras, comparing age groups as possible mediators of the racial disparity could also provide information about differences across historical periods. Indeed, Nisbett et al. (2012) reported a “dramatic decline of Black IQ with age,” noting a 5-point difference between Black and White 4-year-olds but a 17-point difference between Black and White 24-year-olds. They added, “this could be as it seems a loss with age. But it could be that younger cohorts of Black (those born 5 years ago) have had more favorable life histories than older cohorts of Blacks (born 24 years ago)” (p. 147).
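As a rough illustration, and assuming the conventional IQ metric with SD = 15 (which this passage does not itself state), the two gaps can be expressed in standard deviation units, showing growth from about one third of an SD to more than one full SD:

```latex
\[
d_{\text{age }4} = \frac{5}{15} \approx 0.33\,SD,
\qquad
d_{\text{age }24} = \frac{17}{15} \approx 1.13\,SD
\]
```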

This dissertation seeks to explore any age-related fluctuations in WISC-IV and WAIS-IV scores, and, more specifically, among persons at the lower end of the scale, where the results carry higher stakes.

Gender

There is some discussion in the literature about the role gender plays in WAIS-IV and WISC-IV results. Most studies have observed no gender differences, while others have observed some. For example, in a sample of 2,200 children (half male, half female), Chen and Zhu (2008) found that “overall factor patterns, loadings, unique variances, and factor co-variances of the WISC-IV generally did not vary with gender” (p. 260).

Other research, however, has found some gender differences. Jackson and Rushton (2006) examined test results for 100,000 participants who took the Scholastic Assessment Test (SAT; College Entrance Examination Board, 1992) and found gender differences. The measured by the verbal and mathematics sections of the SAT (which the authors report ought to parallel g as measured by standardized intelligence tests) was significantly higher for males than for females. Chen and Zhu (2008) indicated that gender invariance is essential to demonstrate, and there is a great deal of interest in understanding whether there are differences between the genders in cognitive abilities. The practice of combining males’ and females’ test results implies that the scales and subscales have the same meaning for both genders. Chen and Zhu (2008) indicated that, “According to standard 7.8 of the ‘Standards for Educational and Psychological Testing’ (American Educational Research Association and National Council on Measurement in Education, 1999), ‘Comparisons across groups are only meaningful if scores have comparable meaning across groups. The standard is intended applicable to settings where scores are implicitly or explicitly presented as comparable in meaning across groups.’” (p. 83).

Most studies have found little or no gender difference in overall performance. Sattler (2008) reported similar WISC-IV average Full Scale IQ scores for boys (100.24) and girls (99.78). This held true for mean differences on the Verbal Comprehension, Perceptual Reasoning, and Working Memory indexes, but the average on the Processing Speed Index was nearly 5 points higher for boys (102.48) than for girls (97.63). However, these data were based on the original Psychological Corporation norm group, so they do not indicate whether this gender difference holds true for youth with cognitive disabilities (Wechsler, 2003, p. 280). In addition, Sattler (2008) did not address adult assessment and so does not report gender differences among adult test takers.

Nisbett et al. (2012) did report gender differences (among participants from the general population), with an advantage for males in “visuo-spatial abilities” (such as object rotation) and for females in “verbal abilities” (fluency and memory). Males’ “verbal scores may be decreased because of a higher prevalence of stuttering, dyslexia, and in males.” They also noted that there are many more “mentally retarded” males than females. “Males are more variable in their performance on some tests of quantitative abilities which results in more males at both the high and low ends of the distribution…This may indicate if we find a gender difference (not already adjusted for by standardization) that it would not be because males are performing higher than females on quantitative measures (which is not the case in median and above median performances)” (p. 145). In general, there may be no gender differences in overall performance, but there may be some in subtests, as indicated:

“Men and women apparently achieve similar IQ results with different brain regions, suggesting that there is no singular underlying neuroanatomical structure to general intelligence and that different types of brain designs may manifest equivalent intellectual performance” (Haier et al., 2005, p. 145).

Researchers may ask whether this standard applies not only to gender but also to age and race, and whether more research is warranted. Review of the literature on gender differences raised several points that contributed to the research questions of this dissertation: Which research findings can be replicated? Do any of these findings hold true for the WAIS-IV? Do they hold true for differences at the high or low end of the scale? Do these differences in any way influence measured differences between races? Does gender contribute to psychometric outcomes indicating intellectual disability, and if so, which scales may be most influenced by gender?

History of Intelligence Testing and Racial Differences

Knowledge of the history of intelligence testing helps explain why it remains important to stay apprised of issues around race in high-stakes testing. While this is not a full account of the history of IQ testing, I will highlight some of the information in the literature about historical factors that may have contributed to the controversy over the differences observed in scores between White and Black test takers. For a more extensive review of the history of intelligence testing, the reader is referred to Beins (2010), Benjamin and Baker (2004), Boake (2002), Gould (1981), Guthrie (2004), and Pickren and Dewsbury (2002).

As early as 1817, Gall and Spurzheim began the practice of phrenology, concluding, and convincing the public, that the size, shape, and bumps of the skull reflected its contents. This led to the establishment of multiple testing clinics within the United States and increased public interest not only in the study of the skull and what it contained, but also in what findings about the skull might mean.

In 1849, Morton packed skulls with packing material to measure “endocranial volume.” He found that he could pack about five cubic inches more material into the skulls of Whites than of Blacks (Rushton & Jensen, 2005). However, as Gould (1981) reported, it was later discovered that Morton may have altered his data and even shaken the skulls differently to allow more material to be packed into selected skulls. In 1869, Galton, Charles Darwin's cousin (also interested in genetic influence), used head size and other “brass instruments” to measure intelligence. Galton introduced the term eugenics to science and had a “lifelong obsession” with selective mating to improve the stock of a race (Welch, 2006). In 1873, Paul Broca weighed the brains of Blacks and Whites and found Whites’ brains to be heavier, with larger frontal lobes and more “complex convolutions” (Rushton & Jensen, 2005). Thus, during the 1800s there was early research interest in intelligence, specifically in measuring the brain and skull on the premise that their size reflected a corresponding amount of intelligence.

People were interested in intelligence and its relationship to race before Binet generated his first intelligence test in 1881. Binet, commissioned by the French government, was looking for “sub-normal children” and began objective testing in the hope of determining which children could benefit most from school. Simon joined Binet in 1905 and added tests of abstract reasoning and general mental ability, leading to the Binet-Simon concept of mental age (Weiten, 2010).

In 1897, Charles Cooley, a founding member and later president of the American Sociological Association, argued for a “deprivation” model. Cooley cited several factors, including racism, to explain the disparity in Black-White intelligence differences. Cooley used the metaphor of corn seeds grown in deprived versus normal environments and the effects on their height (Rushton & Jensen, 2005). In 1904, Spearman discussed the concept of “general intelligence,” concluding that it could be objectively determined and measured. Spearman observed that different measures of mental ability are positively correlated and theorized that all of these tests tapped into a general mental ability that he called “g.” Terman, at Stanford University, developed the Stanford-Binet in 1916 by adding a new scoring method and the IQ (intelligence quotient), where IQ = (MA / CA) × 100 (mental age divided by chronological age, multiplied by 100). Guthrie (2004) expressed concern about Terman and quoted him as having said: “[Mental retardation] represents the level of intelligence which is very, very common among Spanish-Indians and Mexican families of the Southwest and also among Negroes. Their dullness seems to be racial” (p. 61). Guthrie added: “Terman further predicted that when future IQ testing of these groups was done, ‘there will be discovered enormously significant racial differences which cannot be wiped out by any scheme of mental culture’” (p. 61). Thus, Guthrie demonstrated his concern that one of the early contributors to the field of intelligence testing firmly believed that persons of certain other races were less intelligent.
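Terman's ratio formula, IQ = (MA / CA) × 100, can be illustrated with a short Python sketch; the function name `ratio_iq` is hypothetical and chosen only for this example. Modern Wechsler scales, including the WISC-IV and WAIS-IV, report deviation IQs (scaled to a mean of 100 and an SD of 15) rather than ratio IQs, so this sketch is meant only to clarify the historical formula.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Historical ratio IQ: (mental age / chronological age) * 100."""
    if chronological_age <= 0:
        raise ValueError("chronological_age must be positive")
    # Multiply first so whole-number inputs give exact results, e.g. 800 / 10 == 80.0.
    return 100 * mental_age / chronological_age

# A 10-year-old performing like a typical 8-year-old:
print(ratio_iq(8, 10))   # 80.0
# A 10-year-old performing like a typical 12-year-old:
print(ratio_iq(12, 10))  # 120.0
```

A ratio IQ of 100 simply means mental age equals chronological age; values below 100 indicate performance below that typical for the child's age.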

Goddard introduced the study of the Kallikak family to the American public in 1912. This report served to illustrate the problems it attributed to the potential breeding of Whites with “colored” people and how their lack of intelligence predisposed them to lives of crime and poverty (Welch, 2006). Benjamin (2009) provided a review of Goddard’s contributions to the American intelligence testing movement. While Benjamin acknowledged the importance of Goddard’s contributions, he was also concerned about Goddard’s comments about keeping cognitively impaired children in institutions or having them sterilized to prevent breeding. Benjamin was also concerned about Goddard’s contributions to the eugenics movement. Benjamin described how Goddard’s book, The Kallikak Family: A Study in the Heredity of Feeble-Mindedness (1912), helped to perpetuate the argument against “feebleminded people” having children. Gould (1981) claimed that Goddard “altered photographs to suggest mental retardation in the Kallikaks,” indicating that his efforts were conscious acts of “social prejudice” (p. 59). Goddard insisted that Binet’s test measured g, despite Binet’s insistence that his tests did not measure intelligence. Goddard also insisted that intelligence is innate and inherited from one’s family (Welch, 2006).

Wechsler introduced the WISC (Wechsler Intelligence Scale for Children) in 1949 and the WAIS (Wechsler Adult Intelligence Scale) in 1955. Both included performance scales (e.g., assembling blocks) and were less verbal than the Stanford-Binet Intelligence Scale (Terman, 1916). However, in the 1970s these tests also generated controversy over potential cultural bias. For example, one item asked what to do if someone hits you and gave more points for a culturally biased response: contact the authorities. White, Anglo-Saxon examinees were more likely to provide this response, whereas members of other cultural groups were more likely to respond “hit him back” (Weiten, 2010).

Another idea that has informed the understanding of racial differences in intelligence test scores is that some intelligence may be innate while some may be learned. As noted earlier, Horn and Cattell (1966) described this distinction in terms of fluid intelligence (problem solving and the ability to perceive relationships and similarities without relying on specific instruction, later known as performance intelligence) and crystallized intelligence (knowledge accumulated over time and shaped by cultural experiences, later known as verbal intelligence). Crystallized intelligence can increase with age, while fluid intelligence can decrease with age.

Between 1968 and 1972, two figures emerged who challenged the alleged racial bias of intelligence tests: Adrian Dove and Robert Williams, designers of the Chitling Test (Dove, 1968) and the Black Intelligence Test of Cultural Homogeneity (BITCH; Williams, 1972), respectively. Both designed simple measures illustrating that tests written in the language of Black people generated better performance by Black people. This suggested that, to succeed on an IQ test, Black examinees might need greater exposure to White language, essentially demonstrating bilingual abilities that are not expected of White test-takers. In 1968, Black sociologist Adrian Dove developed the Dove Counterbalance Intelligence Test. It came to be known as the Chitling Test (from a 1968 Newsweek article) because one of the questions asked about the cooking time for chitlings. According to Kaplan and Saccuzzo (2001), the Chitling Test is not standardized and has no predictive validity, only face validity. They added that it did not discriminate between those who had been exposed to 1960s Black culture and those who had not. In a telephone conversation with this researcher on June 11, 2008, Dr. Robert Williams reported that he was influenced by the Chitling Test in his construction of the BITCH (Black Intelligence Test of Cultural Homogeneity) Test (1972). This conversation indicated that Williams, like Dove, had a strong interest in the contribution that crystallized skills (verbal intelligence) were making to the perceived racial differences reported in standardized tests.

The BITCH test was developed using results from 100 Black and 100 White high school students (ages 16-18) from St. Louis, half of low SES and half of middle SES. Black subjects appeared intensely interested in the test, while White subjects questioned its validity and appeared tense (sighed, showed signs of discomfort, etc.). The test consisted of 100 multiple-choice items: examinees were presented with a word, term, or expression and asked to select its correct meaning. Black examinees outscored White examinees by a mean of 36 points. Both groups produced skewed distributions. Black examinees’ scores were negatively skewed (clustered at the high end, with a tail of low scores), while White examinees’ scores were positively skewed (clustered at the low end, with a tail of high scores), indicating the test was difficult for Whites but easy for Blacks. When the distributions were combined, they produced a normal curve, with Blacks comprising the upper half and Whites the lower half of the distribution. Williams reported that these results demonstrated that tests derived from the experience of one group, but used to determine the abilities of another, are inherently unfair. BITCH scores correlated only .33 with the language portion of the California Achievement Test (CAT; California Test Bureau, 1973), a weak relationship: some low CAT scorers were among the highest BITCH scorers, and vice versa.
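Squaring the reported correlation shows how little variance the two tests share: roughly 11%, leaving about 89% of the variation in one set of scores unaccounted for by the other.

```latex
\[
r^2 = (0.33)^2 = 0.1089 \approx 11\%\ \text{shared variance},
\qquad 1 - r^2 \approx 0.89
\]
```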

Additional research was conducted with the BITCH test. Long and Anthony (1974) examined the relationship between WISC and BITCH scores for 30 Black high school students enrolled in an EMR (educable mentally retarded) program. All 30 students had been classified as educable mentally retarded using the WISC. The goal was to determine whether BITCH scores could be used to rule out mental retardation. They found that Black EMR students obtained similar scores on the WISC and the BITCH. All 30 students obtained BITCH scores below the 1st percentile, indicating that EMR students scored poorly on both instruments even with the inclusion of culturally specific items. Matarazzo and Wiens (1977) compared WAIS and BITCH scores for 17 Black and 66 White police applicants. They examined the utility of the BITCH as a selection instrument for Black police officers and found that White applicants outscored Black applicants on the WAIS, while Black applicants outscored White applicants on the BITCH. WAIS and BITCH scores were completely uncorrelated. The authors concluded that the BITCH was not useful as a selection instrument because it lacked an adequate ceiling for Black applicants. Young and Rearden (1979) compared BITCH and Shipley Institute of Living Scale scores (Shipley, 1940) for 45 Black Chicago youth, using the Vocabulary and Abstract Reasoning subtests of the Shipley Scale. Their results indicated that BITCH scores correlated negatively with Vocabulary and Abstract Reasoning on the Shipley Scale. Finally, Butler-Omololu, Doster, and Lahey (1984) compared WISC-R, CAT (California Test Bureau, 1973), and BITCH scores for 16 Black and 16 White high school students to examine the influence of cultural factors on test performance. Whites scored significantly better than Blacks on the CAT and the WISC-R, and Blacks scored significantly better than Whites on the BITCH. The authors concluded that cultural factors need to be considered during test construction.

Thus, the overall conclusions about the BITCH test have been that BITCH performance is weakly related, unrelated, or negatively correlated with performance on traditional ability tests grounded in Euro-American culture (e.g., WAIS, WISC, CAT, Shipley). Blacks outscored Whites on the BITCH, Whites outscored Blacks on tests reported as favoring Euro-American culture, and cultural factors were reported as inseparable from the test construction process.

Guthrie (2004) described the work of G. R. Stetson, who in 1897 compared 500 Black and 500 White American schoolchildren in an experiment that required them to memorize and repeat a stanza of poetry. Stetson found that the Black children outperformed the White children, but, as Guthrie (2004) indicated, little was done with the results, and it was concluded that “the memory technique was not a valid measure of intelligence” (p. 63). Guthrie’s conclusion about Stetson’s research may be considered parallel, to some degree, to the conclusions drawn by more recent intelligence theorists about the importance of findings like those of Williams and Dove. The point is that some historical contributors to the field of intelligence testing may have intentionally ignored information that could have contributed to the development of fairer and more culturally sensitive tests. One could simply dismiss these findings because neither the BITCH test nor the Chitling Test (nor Stetson’s findings) was ever found to be a measure of intelligence. But what was lost in that process is the point made by Williams and Dove at the height of the American Civil Rights movement: while tests may be standardized, they may not be fair, a point Helms takes up later.

In 1981, Gould published The Mismeasure of Man, which explores the history of research efforts to demonstrate that Whites are more intelligent than Blacks. In this text, Gould is critical of Darwin, Morton, and Broca for their work in “craniometry” because they assumed that intelligence was a “single, innate, heritable, and measureable thing” (p. 57). Thus, he believed, the research they were doing was not accurate. While they offered correlations between measures such as cranial capacity and intelligence, they were able to do so only because they made this assumption about intelligence.

Gould (1981) stated that he had no doubt that IQ is to some degree inherited, but argued that there is more to the “hereditarian fallacy” than the conclusion that intelligence is inherited: “The hereditarian fallacy resides in two false implications drawn from this basic fact: The equation of ‘heritable’ with ‘inevitable’ and the confusion of within- and between- group heredity” (pp. 185-186). Gould identified three major early contributors who brought intelligence testing to the United States: (1) Goddard, who translated Binet’s work and brought the test back with him from Europe; (2) Terman, developer of the Stanford-Binet; and (3) Yerkes, who persuaded the U.S. Army to adopt “objective testing” that contributed to the Immigration Restriction Act of 1924.

Murray (Herrnstein & Murray, 1996) disputed Gould’s criticisms of early research on cranial capacity, brain size, and intelligence. Murray indicated that research with magnetic resonance imaging (MRI) has found that, when body size is accounted for, a relationship between brain size and intelligence does exist, and that brain size is distributed differently across races. According to Murray (1996), Gould’s work is badly mistaken and largely ignores the research on g; Murray quoted Herrnstein as having said, “You can make g hide, but you can’t make it go away” (p. 559). Murray (Herrnstein & Murray, 1996) went on to argue that g has been empirically linked to IQ scores, “neurophysiological functioning,” and . Murray added that “the higher the g loading of a subtest, the higher its heritability” (p. 560). Hence, the argument for a hereditarian explanation of the racial disparity rests, to some degree, on Spearman’s concept of “g” as evidence of a genetic contribution to intelligence. But do some subtests contribute more strongly to the disparity than others, and, from another perspective, would this render those subtests less fair?

In 1981, Sternberg proposed a triarchic model of intelligence. Sternberg found that, for most people, intelligence could be defined as one of the following: Verbal Intelligence, an excellent ability to use language (for example, J. K. Rowling, author); Practical Intelligence, exceptional problem-solving skills (for example, Bill Gates, innovator and entrepreneur); or a third form, a strong ability to get along with and even influence other people (for example, Oprah Winfrey, celebrity and social activist). Sternberg elaborated that intelligence, in the triarchic model, has multiple layers and levels of influence. Similarly, Gardner (1983) argued that intelligence is more complex than a single factor (like “g”) and proposed eight types of intelligence based on observations of people of achievement and success: (1) Linguistic, (2) Logical, (3) Musical, (4) Spatial, (5) Kinesthetic, (6) Interpersonal, (7) Intrapersonal, and (8) Naturalistic. Gardner also concluded that almost every studied relationship between societal outcomes and IQ test scores explained, at most, 20% of the outcomes; 80% of the elements that contribute to socioeconomic status were factors beyond measured intelligence (Welch, 2006). Murray, in the Afterword of The Bell Curve (Herrnstein & Murray, 1996), was critical of Gardner’s work. Murray argued that “linguistic intelligence” and “logical-mathematical intelligence” (as measured by IQ tests) were better predictors of a person’s success in life than Gardner’s other five types of intelligence. Murray added that Gardner had not succeeded, any better than others, in demonstrating that his types of intelligence were statistically independent of each other; rather, these abilities grouped with one another, as implied by the concept of g.

Mayer, Caruso, Panter, and Salovey (2012) described the growing research zeitgeist favoring alternative views of intelligence as the “hot intelligences” and advocated for their continued inclusion in future research. These “hot intelligences” include social, emotional, and personal intelligences, reflecting people’s skill at interacting within a social context; they contrast with “cool intelligences,” which measure traditional cognitive functions such as the ability to abstract and to manipulate information. Nisbett et al. (2012) agreed with Mayer et al. (2012) but added that, beyond finding correlations between cool and hot intelligences, researchers have had difficulty demonstrating a significant contribution of hot intelligences to general behavior and performance.

In 1987, Flynn published findings indicating IQ gains in 14 separate nations. Flynn (1987) found what has come to be called the “Flynn effect”: average IQ had increased about 3 points every 10 years in different nations, largely because of population-wide gains in fluid (performance) intelligence. More recently, in concluding a discussion of the Flynn effect, Hiscock (2007) stated:

A pervasive increase over time in performance on IQ tests is well established. The magnitude of the increase is especially marked when culture-reduced tests of general intelligence, such as Raven’s Matrices, are used. The Flynn effect also raises scores from Wechsler and Stanford-Binet IQ tests, and the increases are sufficiently large as to present an interpretive problem for practitioners who administer IQ tests to their clients. Not only does the Flynn effect cause published norms for Full Scale IQ to become progressively less appropriate over time, but it also causes different subtest norms to change at different rates. Clinical neuropsychologists who use old versions of IQ tests not only will overestimate IQ but also will risk misinterpreting ipsative indicators such as Verbal-Performance disparities and subtest profiles. (p. 526)

Thus, neuropsychologists and other testing researchers have continued to observe what Flynn reported and to use these findings to raise concerns about test interpretation. Rushton and Jensen (2010), however, were more critical of Flynn’s findings:

Rather than interpreting the secular gain of three IQ points a decade as evidence that people become familiar with the test material over time, requiring periodic updates of the test, Flynn took it to mean that ‘real’ intelligence levels have increased at least in abstract reasoning. Where is there evidence of this ‘familiarity’? (p. 217)
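Hiscock’s (2007) practical point, that published norms become progressively less appropriate as they age, can be illustrated with simple arithmetic. The sketch below is illustrative only: it assumes a constant, linear gain of about 3 Full Scale IQ points per decade (the rate associated with the Flynn effect above), and the function names and example years are hypothetical. Because, as Hiscock notes, different tests and subtests change at different rates, this is not a clinical correction procedure.

```python
# Illustrative sketch only: assumes a constant, linear Flynn-effect gain of
# about 3 Full Scale IQ points per decade. Real gains vary by test and subtest.
GAIN_PER_DECADE = 3.0  # assumed average gain, in IQ points per 10 years


def norm_inflation(norm_year: int, test_year: int,
                   gain_per_decade: float = GAIN_PER_DECADE) -> float:
    """Estimated points by which an obtained score is inflated because the
    test's norms were collected (test_year - norm_year) years earlier."""
    if test_year < norm_year:
        raise ValueError("test_year must not precede norm_year")
    return gain_per_decade * (test_year - norm_year) / 10


def adjusted_score(obtained: float, norm_year: int, test_year: int) -> float:
    """Hypothetical adjustment: subtract the estimated norm inflation."""
    return obtained - norm_inflation(norm_year, test_year)


if __name__ == "__main__":
    # Hypothetical case: a score of 72 obtained in 2010 on norms from 1995.
    print(norm_inflation(1995, 2010))      # 4.5
    print(adjusted_score(72, 1995, 2010))  # 67.5
```

Under these assumptions, fifteen-year-old norms shift the estimate by several points, which matters most for examinees scoring near the lower end of the distribution, where diagnostic cutoffs lie.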

Another major contributing factor to these historical issues was the 1995 publication of the best seller The Bell Curve. When Herrnstein and Murray (1995) published The Bell Curve, they argued that intelligence is largely genetic and that racial differences are an evolutionary by-product. They included provocative conclusions, such as the idea that American Black intelligence was not impacted by slavery, evidenced, they argued, by research indicating that American Blacks performed better on IQ tests than African Blacks. In 1995, Murray added an “Afterword” responding to criticisms of the text. He argued that American social stratification is based on intelligence, and in particular on g, which he argued is largely inherited. Murray summarized the conclusions of The Bell Curve’s controversial 13th chapter:

Mental test scores are generally as predictive of academic and [job performance] for blacks as for other ethnic groups. Insofar as the tests are biased at all, they tend to over-predict, not under-predict, black performance. These factors are useful in the quest to understand why (for example) occupational and wage differences separate blacks and whites, or why aggressive [affirmative action] has produced academic apartheid in our universities. (p. 562)

Herrnstein and Murray (1995) presented a review of more than 156 studies showing a one-standard-deviation difference between Black and White test scores, while arguing that there is no evidence of external or internal test bias contributing to these differences. As mentioned earlier, one year after the initial publication of The Bell Curve, Murray wrote an “Afterword” chapter, which has been included in subsequent editions of the text.

Murray wrote alone because Herrnstein died the year the original text was published and because Murray felt the need to defend the book against attacks. Murray noted that he and Herrnstein accepted the idea of g, a general factor of intelligence on which people differ.

However, lest the reader conclude that Murray (1995) argues strictly from a genetic point of view, he added: “It is scientifically prudent at this point to assume that both environment and genes are involved, in unknown proportions; and most important, people are getting far too excited about the whole issue” (p. 563).

In criticism, Welch (2002) credited anthropologist K. Anthony Appiah with the observation that Herrnstein and Murray’s work “exploited the economic insecurity that many middle-class whites feel about their own futures” (p. 192). Welch (2002) concluded, “Despite its timing, the ideas presented in The Bell Curve reflect the continuity of racial bigotry and prejudice that have legitimized the socioeconomic, cultural, and political oppression of people of African descent” (p. 195). We see, then, a continued debate not only about the role of g but also about the role of genetics and fairness in the racial differences found on these scales.

Nisbett et al. (2012) indicated that Blacks gained 5.5 IQ points on Whites between 1972 and 2002. Rushton (2012) rebutted this, claiming that there was no narrowing in mean Black-White IQ differences, as predicted by heritable g. Rushton (2012) challenged Nisbett et al.’s (2012) report, stating that they were incorrect in reporting a 5.5-point narrowing of the 15-point gap between Whites and Blacks between 1972 and 2002 based on educational gaps, and that they did not adequately describe “how heritable g provides evidence of a significant genetic contribution to Black-White differences” (p. 501). Nisbett et al. (2012) responded to Rushton’s challenge by indicating that his data represented only mean scores as opposed to looking at the gap reductions in terms of and that these data indicated the changes in the gaps are “substantial” (p. 503).

The history of intelligence testing, as reviewed, indicates that some contributors may have been motivated to emphasize a perceived difference between the intelligence of Whites and Blacks based on test scores. Others have contributed by showing that a problem existed and that the environment played a key part in the differences. At the crux of the debate is, on one side, the idea that one group may be intellectually advantaged by nature and, on the other side of this familiar debate, the idea that the difference is due to environmental factors.

Nature: The Hereditarian Model

A review of the more current literature found strong advocates of the hereditarian model. This model holds that genetics plays a strong role in influencing g and in the racial differences found on these high-stakes tests.

Rushton and Ankney (2009) reviewed research on brain size (measured via brain imaging and external head size) and general mental ability. They reported a correlation (r = .63) between brain size and g. In another study, Rushton (2010) argued that brain size was correlated not only with IQ but also with longevity, parental care, intensity of child-rearing practices, and delays in reproductive behavior. Rushton (2010) also used brain size to explain differences in IQ between nations. He concluded, “Central to answering the question of why nations differ in IQ, longevity, crime, and economic ‘development status’, is heritable brain power that evolved in part as a response to natural selection in the colder northern latitudes” (p. 99).

It could be interjected here that an evolutionary explanation might also account for how people of differing races may have been affected by years of oppression and deprivation of intellectual stimulation. Exploring this would require more research than simply hypothesizing that the genetic explanation lies solely in adaptation to climatic differences. It could also be objected that, regrettably, psychology was unable to administer standardized tests to these early nomadic peoples; thus, conclusions about how climate affected their intelligence (as measured by the tools we now use) can only be speculative.

Using large samples, Jensen (1995) reported differences between Blacks and Whites in head size (controlling for age and body size). Jensen added that head size correlated significantly with IQ not only within each racial group but also within families (i.e., between same-sex full siblings, with age partialed out), indicating a relationship between brain size and IQ. In addition, Jensen found that brain size correlated with the g factor of IQ and that this “three-way pattern in IQ, brain size, and other traits” was also found outside the United States (Rushton & Jensen, 2005).

Rushton and Jensen (2005) reviewed the past 30 years of literature on race differences and IQ and described how the “vexing” issue of a 15-point difference in Black-White scores can be traced back to the mid-19th century. They also observed that the size of the average Black-White difference does not change beyond age three. Critical of what they perceived as a “tabula rasa” approach to IQ, Rushton and Jensen (2005) argued that the social sciences and members of ethnic groups need to be more receptive to the idea of genetic differences:

Some have suggested that we cannot expect members of ethnic groups to simply accept the genetic component in mean-group differences in IQ and other traits. Yet, with regard to individuals within families, we do acknowledge that some siblings are more intelligent, more athletic, more physically attractive or more socially charming than others. We also accept that some families are genetically more gifted in certain areas than other families. We should, therefore, by extension, be able to generalize to all the members of the human family. If viewed against the backdrop that group differences are simply aggregated individual differences, the former may be easier to accept than has hitherto been thought. (p. 282)

One reason some members of ethnic groups (and others) may have difficulty simply accepting a genetic component could be the potential illusory correlation between race and intellectual superiority or inferiority.

Goldstein (2005) defined “illusory correlations” as what occurs when “a correlation between two events appears to exist, but in reality the correlation doesn’t exist or is much weaker than you assume it to be” (p. 460). He added that these correlations often appear in the form of stereotypical thinking. It may be hard to read statements about genetic research without concluding that the authors may be implying that the Black-White disparity is attributable to one race being born smarter than another; this could be due to the associated with the testing and the continued dialogues about what constitutes intelligence.

As recently as 2010, Rushton and Jensen argued for the need for a hereditarian perspective and continued to challenge the notion of a “Flynn effect.” Rushton and Jensen (2010) provided a review maintaining that the IQ gap between Blacks and Whites has remained at 15-20 points (1.1 standard deviations) since 1917, when mass testing first started. “Flynn effect” advocates had argued that the average difference between races decreased from the Army Alpha and Beta tests of World War I (1917), to the Army General Classification Test of World War II (1946), to the Armed Forces Qualification Test of the Vietnam era (1968), and that the gap closed by 5.5 points (35%) between 1970 and 1992 (p. 217). Rushton and Jensen (2010) also criticized Nisbett’s claims that Blacks had narrowed the gap in educational achievement by 35% on the National Assessment of Educational Progress (NAEP) tests and that educational interventions had eliminated the gap altogether. Rushton and Jensen (2010) dismissed Flynn’s and Nisbett’s findings that the racial gap is shrinking as flawed research, concluding: “To the contrary, we find there is little or no evidence of narrowing. The evidence presented in its favor rests mainly on insufficient sampling and selective reporting” (p. 217). Rushton and Jensen (2010) continued to argue for the heritability of IQ differences and added, “we present analyses that demonstrate that over the last 54 years there has been no narrowing of the Black-White gap in either IQ or educational achievement.” These authors also predicted that “Black-White differences are greater on more heritable and g-loaded tests” (p. 214). However, Rushton and Jensen (2010) were referencing research on the WISC-III and the WAIS-R; thus, it would be beneficial to explore results of research on the more current scales.

Nurture: The Disparity Is Argued to Stem from Environmental Causes

Other recent articles found in the literature review offered equally passionate arguments that intelligence differences could be explained by external or environmental factors, largely disregarding the hereditarian perspective.

Onwuegbuzie and Daley (2001) identified and then challenged eight premises of the hereditarian theory of intelligence, citing studies relevant to challenging the hereditarian position. The premises they identified were:

(1) Intelligence is unidimensional and structural, with a dominant factor, g, representing some core mental ability. (2) Intelligence is fixed within individuals and across generations. (3) IQ tests accurately measure this fixed core mental ability. (4) IQ tests are equally valid across racial ethnic and cultural groups. (5) Intelligence determines individual’s professional and social standings. (6) The environment plays little or no role in determining individual’s levels of intelligence. (7) The intelligence of populations is deteriorating over time. (8) Scores on IQ tests are consistent with classical statistical and measurement theory. (p. 210)

As mentioned, these authors went on in their article to explore why each of these premises may be challenged from a more environmental perspective.

Continuing to discuss their concerns about the lack of acceptance of a hereditarian model, Rushton and Jensen (2005) indicated that the American Psychological Association (APA) has taken an environmental position. In 1996, the APA established an 11-person task force that concluded that, although White-Black IQ differences exist, “There is certainly no support for a genetic interpretation” (p. 217).

Guthrie (2004), seeking a more environmental explanation, was critical of Jensen and his work on brain size, race, and intelligence, noting:

In 1969, Berkeley educational [psychologist Arthur Jensen] strode into prominence just as the United States was establishing federally financed compensatory programs designed to prepare disadvantaged children for increased learning opportunities. His 1969 article ‘How much can we boost IQ and scholastic achievement’ was aimed toward discrediting the purpose of such programs by revitalizing the tired racist theme that inheritance accounts for 80 percent of the variability in intelligence. Although Jensen attracted a following of supporters, Illinois professor Jerry Hirsch examined Jensen’s research and ‘uncovered literal misrepresentations of a kind and to an extent that erodes all confidence in it (and in him) as a reliable source of information’. Hirsch further questioned the derivation of Jensen’s formula for estimating broad heritability. ‘He did not say how this formula is derived. It has no theoretical justification nor does it estimate heritability broad or narrow.’ (p. 106)

Adding to the environmental argument, Manly (2005) stated that concluding that group differences in test scores are based in heredity is not only erroneous but harmful: “Normative data have been used by social scientists such as , Arthur Jensen, and Richard Herrnstein, whose research agendas lead to dangerous and irresponsible biological and genetic interpretations” (p. 272).

Using age-appropriate IQ measures and conducting research to explore an environmental explanation for the disparity, Brooks-Gunn, Klebanov, and Duncan (1996) tested 483 Black and White five-year-old children born with low birth weight. They found the traditional one-standard-deviation difference in IQ scores, but the disparity decreased when they controlled for economic factors (poverty) and social factors (whether a learning environment existed and whether there was “warmth” in the home). Also from an environmental perspective, as stated earlier, Gardner concluded that IQ explained, at most, 20% of the variation in almost every societal outcome studied, with 80% of the elements that contribute to socioeconomic status being factors beyond measured intelligence, indicating Gardner’s endorsement of a more environmentally based explanation for intelligence (Welch, 2006).

Scott (1994) noted an increase in IQ scores of three points, indicating that Blacks had an increased ability to lead “productive lives in a complex society” (p. 56) despite increased impoverishment during the same period.

Dickens and Flynn (2006) used results from the standardizations of the WISC (WISC-R, WISC-III, and WISC-IV) in 1972, 1989, and 2002 and of the WAIS (WAIS-R and WAIS-III) in 1978 and 1995. They also used standardization results from the Stanford-Binet and the Armed Forces Qualification Test. They argued that the Black-White gap has not been constant, that its supposed constancy is therefore a “myth,” and that it cannot serve as an argument for a genetic origin of IQ differences. They reported that analysis of the studies shows that Black children have made large IQ gains (relative to Whites) since the 1960s:

Blacks have gained 4-7 IQ points on Whites over the past 30 years. Neither change in the ancestry of the individuals classified as Black nor those who identify themselves as Black can explain more than a fraction of the gain. Therefore, the environment has been responsible. (p. 917)

More recently still, one speaker noted that we must still concern ourselves with the potential role the environment plays in intellectual development. The October 2011 APA Monitor (Winerman, 2011) reported that at the 2011 APA convention, Frank Worrell, PhD, stated, “We have spent more than half a century trying unsuccessfully to address the achievement gap” (p. 28). This was in reference to the educational achievement gap between Whites and minorities. Worrell suggested several environmentally based factors contributing to the achievement gap: (1) a lack of diversity in schools, (2) a failure to support immigrant students, (3) too few leaders of color, and (4) a failure to support ESL students. Achievement measures are often cited as evidence of the predictive validity of intelligence tests. Thus, I would argue that Worrell identified clear and specific environmental factors that contribute to the achievement gap and that may also relate to the disparity in test scores.

A review of the literature indicates that there are as many compelling arguments for attributing the disparity between Black and White test scores to environmental factors as there are for more genetically oriented theories.

Although classifying the discussion as another example of psychology’s nature-nurture debate simplifies the question of where the test score disparity originates, other research indicates that the issue is more complex.

Beyond Nature versus Nurture: Towards a More Complex Understanding

While Rushton and Jensen are mostly known for advocating the hereditarian position (that the disparities between Black and White test scores lie in genetic differences), they appear to have taken a more moderate position when explaining their model. Rushton and Jensen (2005) claimed that their “hereditarian model” is actually 50% genetic and 50% environmental. Further, they cited Jensen’s work on twin studies and concluded that an interactive model of both genetic and environmental factors best explained the observed Black-White group differences in IQ, whereas both the genetic-only and the environmental-only explanations were inadequate.

Sattler (2008) further emphasized an interactionist perspective, noting:

The Flynn effect might be due to improvements in educational opportunities and schooling, genetic factors, increased cross-ethnic mating, smaller family size, test sophistication (i.e., improved ability in the population to take intelligence tests) improvements in cognitive stimulation (e.g., availability of cognitively stimulating toys, computers, books, and media) better [nutrition], and improved parental literacy. (p. 252)

In addition, Sattler (2008) stated: “We believe that intelligence scores represent interplay of biological factors, environmental factors, and past learning. If ethnic minority children obtain low scores on intelligence tests, perhaps we need to improve the educational system rather than abandoning standardized tests” (p. 162).

Rather than reducing conclusions like these to a concept like interactionism, however, others have proposed a better and more sophisticated perspective. According to Perry et al. (2008), Helms developed a culturalist model in 1992 as an alternative explanation for the Black-White test score gap. This model challenged both the “implicit-biological” perspective and the “environmental” perspective by explaining that “the culturalist point of view emphasizes racial group differences in CAT (cognitive ability test) scores as a matter of cultural bias in the tests and the testing process itself” (p. 156).

The culturalist model accounts for the illusory correlation that may continue to exist in the group differences, and it aids in our acknowledgement of a history of cultural bias in testing that parallels a known and acknowledged history of racial prejudice and discrimination, which was far more explicit when the tests were first developed.

It is beyond the scope of this dissertation research to identify whether any changes in the historical disparities on these tests are brought about by changes in the genetics or the environment of the test takers. Indeed, studies of the complexities of evolution have indicated that divorcing nature from nurture is almost impossible. This research explores the current culture of testing in the hope of determining whether the disparity has changed (i.e., whether the “Flynn effect” has closed any gaps). It is especially important to see whether this disparity exists among test takers at the below-average end of the score distribution.

Determining Which Scales to Use

The data for this dissertation were gathered at a testing site that used the WAIS-IV and the WISC-IV more than other scales. Most of the participants had been referred for testing by the Bureau of Disability Determination, which most often requested the WAIS-IV and WISC-IV. Watkins, Wilson, Kotz, Carbone, and Babula (2006) cited Prifitera, Saklofske, Weiss, and Rolfus (2005) in noting that the WISC-IV had already surpassed the WISC-III as the most widely used test of cognitive abilities in children. Chen and Zhu (2008) noted that “Wechsler tests are among the most widely used in the world. Roughly 20 countries have standardized these tests so far” (p. 206).

Subscales

Studies similar to this dissertation research have supported g factor theory and have reported that the subscales do not appear to make contributions independent of g. Watkins et al. (2006) completed a factor structure analysis of the WISC-IV among referred students. In their study, 432 Pennsylvania students had been referred for evaluation for special education services. Of the participants, 176 were female and 256 were male; they ranged in age from 6 to 16 years (M = 10.3, SD = 2.7). The participants’ racial background was 89.6% White, 2.5% Black, and 1.6% Hispanic. The authors found 65% eligible for special education services: 37% had learning disabilities, 5% mental retardation, 7% emotional disabilities, 8% gifted, 2% speech disabilities, and 6% multiple disabilities. Full Scale IQ scores were found to be “slightly lower and somewhat more variable than the normative sample” (p. 982), which the authors indicated paralleled other studies of referred students; the distribution of scores was nonetheless normal. Their analysis did not appear to explore what role, if any, age, race, or gender played in the findings, but the results of their factor analysis indicated that g, more than any specific ability, accounted for the variance in the core WISC-IV subtests. This fit with the Carroll (1993) three-stratum theory used to develop the test and indicated, according to the authors, that the same model proposed for the general population also fits “referred students.”

Although it runs against the g factor perspective, research into the effects of the subscales could be justified if such exploration aids in the development of culturally sensitive and fair testing. Shuttleworth-Edwards et al. (2004), in addition to demonstrating the important role that quality education plays in intelligence, referred to multiple studies indicating that the Block Design, Digit Span, Vocabulary, and Arithmetic subtests of the WAIS-III were sensitive to cultural differences. They added that these effects were often in a negative direction and were related to educational deprivation. Whitaker (2008) found that children with lower IQ appeared to have difficulty following instructions on certain subtests, especially Letter-Number Sequencing. He also suggested that, for young people with low IQ, the WISC-IV may yield lower IQ scores than the WAIS-III.

Glass, Ryan, Chater, and Bartels (2009) advised that if we are going to compare subtests, we should limit the comparisons to hypothesis testing rather than jumping to conclusions about their applicability in clinical decision making. They cited research on internal consistency reliability and on what were considered acceptable standards. According to that research, reliabilities in the range of .70 to .90 were generally found acceptable; a level of at least .80 was needed for a measure used to generate hypotheses to be considered adequate; and internal consistency reliabilities needed to be even higher (.95) when important decisions concerning treatment or diagnosis are to be based on a test score. Because the internal consistency reliabilities of the subtests were not as high as recommended for clinical decision making, clinicians need to be careful when interpreting discrepancy scores. They also noted, “Test results for individuals at different ability levels, as well as, a variety of clinical groups, including those with neurobiological disorders (e.g., mental retardation and traumatic brain injury), and should also receive attention” (p. 143). This dissertation was designed to include representation from this understudied population of people with disabilities.
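The reliability standards summarized above, and why they call for caution with discrepancy scores, can be illustrated with a short sketch. The tiers mirror the thresholds as reported here (.80 for generating hypotheses, .95 for important clinical decisions). The difference-score function uses the standard classical-test-theory formula for two scores with equal variances; it is offered as a general illustration, not as a procedure taken from Glass et al. (2009). The function names and the subtest values in the example are hypothetical.

```python
def reliability_use(r: float) -> str:
    """Map an internal-consistency coefficient to the uses summarized above:
    at least .95 for important clinical decisions, at least .80 for
    generating hypotheses."""
    if not 0.0 <= r <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    if r >= 0.95:
        return "clinical decisions"
    if r >= 0.80:
        return "hypothesis generation"
    return "below the .80 standard"


def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Classical-test-theory reliability of the difference between two scores
    with equal variances, given each score's reliability (r_xx, r_yy) and
    their intercorrelation (r_xy)."""
    if r_xy >= 1.0:
        raise ValueError("r_xy must be less than 1")
    return ((r_xx + r_yy) / 2 - r_xy) / (1 - r_xy)


if __name__ == "__main__":
    # Hypothetical subtests: each with reliability .85, intercorrelated .55.
    r_diff = difference_reliability(0.85, 0.85, 0.55)
    print(round(r_diff, 2))         # 0.67
    print(reliability_use(0.85))    # hypothesis generation
    print(reliability_use(r_diff))  # below the .80 standard
```

In this hypothetical case each subtest meets the .80 standard on its own, yet the reliability of their difference falls well below it, which is consistent with the caution Glass et al. (2009) urged when interpreting discrepancy scores.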

Summary

IQ tests, especially the WAIS-IV and WISC-IV, remain a primary resource in determining intellectual disability, yet disabled populations remain inadequately researched. Age, gender, and race have each been considered, albeit to varying degrees, as potentially contributing to variations in intelligence test scores. A review of the history of intelligence testing indicates controversy over racial differences in intelligence test scores, including discussion of the role heredity may play in influencing outcomes of these disability measures. Several of the authors cited have generated alternative descriptions of intelligence, yet the primary method for determining intellectual disability remains standardized testing. Although debate continues over explanations for differences in test results, even the most prominent and widely read advocates of a hereditarian position, Herrnstein and Murray (1995), acknowledged that if the disparity in scores between the races narrows, the narrowing is most likely to appear first among people with lower test scores.

Chapter 2 reviewed key literature on disability and the need for additional research in this area; it also explored the research on age and gender as potential contributing factors in outcomes related to testing for intellectual disability. The chapter also reviewed the history of intelligence testing as it relates to racial disparity in test scores and discussed how the nature/nurture debate has been expressed in exploring the heritability of racial differences in IQ tests. The chapter concluded by exploring the importance of research on age and gender, along with the subscales, in helping to understand which factors may be contributing to the racial disparity among people scoring at the lower end of the range of cognitive functioning. Chapter 3 explores the methodology used to conduct this research.

CHAPTER III

RESEARCH METHODOLOGY

It is important to take advantage of every opportunity to energize and encourage research on culture and cognitive test performance. (Manly, 2005, p. 271)

Chapter three describes the design, method, and statistical details of this dissertation research, including the participants, procedures, and research instruments. Chapter three closes with an explanation of the statistical procedures selected to analyze the following research hypotheses:

Research Hypotheses

Hypothesis 1: Among people applying for disability there is a relationship between age, gender, race, and performance on the WISC-IV (Wechsler, 2003).

Hypothesis 2: Among people applying for disability there is a relationship between age, gender, race, and performance on the WAIS-IV (Wechsler, 2008).

Procedure

Following Institutional Review Board (IRB) approval from Cleveland State University and permission to use scores from the Center for Effective Living, data were collected from existing charts. The data came from an existing archive at the Center for Effective Living, a Midwestern private practice offering psychological services and testing. Licensed professionals at the practice have provided services for more than 30 years. The center works with forensic clients, disability cases, individual adults, teens, children, and families. The data of interest were drawn from three documents:

- a face sheet that included the assessors’ descriptions of the clients’ race and gender;
- a second sheet that included date of birth;
- a third data sheet that contained subtest results on either the WISC-IV or the WAIS-IV, depending on the participant’s age.

These data were obtained as part of psychological testing to determine participant eligibility for disability. Each case included a specific request for assessment of cognitive disability using one of the two studied scales. A double-blind protocol was implemented. Separate parties employed by the center converted the data from confidential to anonymous scores; these staff were trained in HIPAA (Health Insurance Portability and Accountability Act, 1996) compliance and in maintaining confidentiality. They then entered the race, gender, age, and test score data onto an Excel spreadsheet. The spreadsheet was the only material to leave the center, and it was used only for statistical analyses. No identifying information was transferred to the researcher that could link an individual to his or her scores, age, gender, or race.
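The conversion from charts to anonymous spreadsheet rows described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the Center’s actual procedure. The record layout and field names (`ChartRecord`, `case_number`, `test_date`, and so on) are hypothetical. Age is computed as completed years and months, the form used in the Wechsler age bands. Publisher record forms compute chronological age with their own borrowing rules, which may differ slightly near month boundaries.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ChartRecord:
    """Hypothetical chart layout; the field names are illustrative only."""
    name: str
    case_number: str
    date_of_birth: date
    test_date: date
    race: str
    gender: str
    scores: dict  # subtest abbreviation -> scaled score, e.g. {"SI": 7}


def age_years_months(dob: date, on: date) -> tuple:
    """Chronological age as (completed years, completed months)."""
    months = (on.year - dob.year) * 12 + (on.month - dob.month)
    if on.day < dob.day:
        months -= 1  # the current month is not yet complete
    if months < 0:
        raise ValueError("test date precedes date of birth")
    return divmod(months, 12)


def to_anonymous_row(rec: ChartRecord) -> dict:
    """Keep race, gender, age, and scores; drop name, case number, and both dates."""
    years, months = age_years_months(rec.date_of_birth, rec.test_date)
    return {"race": rec.race, "gender": rec.gender,
            "age_years": years, "age_months": months, **rec.scores}


rec = ChartRecord("Jane Doe", "A-123", date(2000, 3, 15), date(2009, 3, 10),
                  "White", "Female", {"SI": 8, "VC": 7})
print(to_anonymous_row(rec))
# {'race': 'White', 'gender': 'Female', 'age_years': 8, 'age_months': 11, 'SI': 8, 'VC': 7}
```

The row keeps only what the analyses need. The name and case number never enter it, and the exact dates are dropped as well, since a date of birth combined with a test date can help re-identify a person.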

During testing, administrators occasionally determined, on the basis of behavioral observations, that a participant might be malingering or too impaired by illness to complete testing effectively. In either case, the participant’s data were not used in the study. Examples include a participant walking out of the testing session, failing to complete sections, or becoming too anxious or agitated to respond.

Participants

Participants included children and adults who were referred either by themselves or by the Social Security Bureau of Disability Determination to the Center for Effective Living. They were referred for testing for cognitive impairment and for determination of eligibility for disability income. The participant data existed in archival records.

Staff removed identifying information before separate, trained staff entered the data into Excel spreadsheets. Participants were of mixed age and gender and had been identified as either White or Black. Participants of mixed race were excluded because of small sample size. Participants’ cognitive functioning was assessed by trained testers who held at least a master’s degree and were supervised by a licensed psychologist. All assessments were completed at the Center for Effective Living between November 2008 and August 2009. These cases were chosen because the records were ready for filing, about to be disposed of, and most accessible to the staff completing the spreadsheets.

Instruments

WISC-IV. According to the WISC-IV Technical and Interpretive Manual (Wechsler, 2003), the WISC-IV is “an individually administered clinical instrument for assessing cognitive ability of children aged 6 years 0 months through 16 years 11 months” (p. 1).

As summarized in Table 1, the WISC-IV measures ability in four areas:

- the Verbal Comprehension Index (VCI), a measure of language command;
- the Perceptual Reasoning Index (PRI), a measure of the ability to manipulate concrete materials or process visual stimuli to solve nonverbal problems;
- the Working Memory Index (WMI), a measure of short-term memory;
- the Processing Speed Index (PSI), a measure of how quickly and correctly someone can process the information needed to complete a task.

These four areas are combined to provide a participant’s Full Scale Intelligence Quotient (FSIQ).

The WISC-IV Technical and Interpretive Manual (Wechsler, 2003) reports the overall reliability (average internal consistency coefficients) of the four Indexes as follows: Verbal Comprehension (VCI), .94; Perceptual Reasoning (PRI), .92; Working Memory (WMI), .92; Processing Speed (PSI), .88; and Full Scale, .97. Flanagan and Kaufman (2009, p. 41) report an overall validity coefficient of .89 in relation to the WISC-III.
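A reliability coefficient translates into measurement precision through the standard error of measurement (SEM = SD × √(1 − r)). The sketch below makes two assumptions: the standard Wechsler composite metric (mean 100, SD 15) and the reliabilities listed above. The ±1.96 SEM band around an obtained score is a simplification, because test manuals often center confidence intervals on an estimated true score instead.

```python
import math

# Composite reliabilities for the WISC-IV as reported above (Wechsler, 2003).
WISC_IV_RELIABILITY = {"VCI": 0.94, "PRI": 0.92, "WMI": 0.92, "PSI": 0.88, "FSIQ": 0.97}


def standard_error_of_measurement(reliability: float, sd: float = 15.0) -> float:
    """SEM = SD * sqrt(1 - r); SD defaults to 15, the composite-score metric."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    return sd * math.sqrt(1.0 - reliability)


def confidence_band(score: float, reliability: float, z: float = 1.96, sd: float = 15.0):
    """Approximate band around an obtained score: score +/- z * SEM."""
    half = z * standard_error_of_measurement(reliability, sd)
    return score - half, score + half


for index, r in WISC_IV_RELIABILITY.items():
    print(f"{index}: SEM = {standard_error_of_measurement(r):.2f}")
# VCI: SEM = 3.67, PRI: SEM = 4.24, WMI: SEM = 4.24, PSI: SEM = 5.20, FSIQ: SEM = 2.60
```

With these reliabilities, the PSI is the least precise composite, with a band about twice as wide as the FSIQ band.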

Each of the Indexes is composed of subtests, which contribute to the Index scores as follows:

- The Verbal Comprehension Index (VCI) is composed of Similarities (SI), Vocabulary (VC), and Comprehension (CO).
- The Perceptual Reasoning Index (PRI) is composed of Block Design (BD), Picture Concepts (PCn), and Matrix Reasoning (MR).
- The Working Memory Index (WMI) is composed of Digit Span (DS) and Letter-Number Sequencing (LN).
- The Processing Speed Index (PSI) is composed of Coding (CD) and Symbol Search (SS).

The subtests, their abbreviations, and their descriptions (Wechsler, 2003) are as follows:

- Block Design (BD): “While viewing a constructed model, or a picture in the Stimulus Book, the child uses red-and-white blocks to re-create the design within a specified time limit” (p. 2).
- Similarities (SI): “The child is presented two words that represent common objects or concepts and describes how they are similar” (p. 2).
- Digit Span (DS): For Digit Span Forward, “the child repeats numbers in the same order as presented aloud by the examiner” (p. 2); for Digit Span Backward, “the child repeats numbers in the reverse order of that presented aloud by the examiner.”
- Picture Concepts (PCn): “The child is presented with two or three rows of pictures and chooses one picture from each row to form a group with a common characteristic” (p. 2).
- Coding (CD): “The child copies symbols that are paired with simple geometric shapes or numbers. Using a key, the child draws each symbol in its corresponding shape or box within a specified time limit” (p. 2).
- Vocabulary (VC): “For Picture Items, the child names pictures that are displayed in the Stimulus Book” (p. 2). “For Verbal Items: the child gives definitions for words that the examiner reads aloud” (p. 2).
- Letter-Number Sequencing (LN): “The child is read a sequence of numbers and letters and recalls the numbers in ascending order and the letters in alphabetical order” (p. 3).
- Matrix Reasoning (MR): “the child looks at an incomplete matrix and selects the missing portion from five response options” (p. 3).
- Comprehension (CO): “The child answers questions based on his or her understanding of general principles and social situations” (p. 3).
- Symbol Search (SS): “The child scans a search group and indicates whether the target symbol(s) matches any of the symbols in the search group within a specified time limit” (p. 3).

Table 1

WISC-IV, Indexes and Their Subtests

Index Subtests

Verbal Comprehension (VCI) Similarities (SI), Vocabulary (VC), and Comprehension (CO).

Perceptual Reasoning (PRI) Block Design (BD), Picture Concepts (PCn), and Matrix Reasoning (MR).

Working Memory (WMI) Digit Span (DS) and Letter-Number Sequencing (LN).

Processing Speed (PSI) Coding (CD) and Symbol Search (SS).
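Each Index score is derived from the sum of its subtests’ scaled scores, and the FSIQ is derived from the sum across all ten core subtests. The composition in Table 1 can therefore be expressed as a simple lookup. The sketch below covers only the summing step. Converting sums to Index and FSIQ standard scores requires the publisher’s norm tables, which are not reproduced here, and the example scaled scores are hypothetical.

```python
# Core-subtest composition of the WISC-IV Indexes, as listed in Table 1.
WISC_IV_INDEXES = {
    "VCI": ("SI", "VC", "CO"),
    "PRI": ("BD", "PCn", "MR"),
    "WMI": ("DS", "LN"),
    "PSI": ("CD", "SS"),
}


def sums_of_scaled_scores(scaled: dict) -> dict:
    """Sum subtest scaled scores for each Index, plus the ten-subtest FSIQ sum.

    These sums are the inputs that the publisher's norm tables convert into
    standard scores; that conversion is deliberately not attempted here.
    """
    missing = [s for subs in WISC_IV_INDEXES.values() for s in subs if s not in scaled]
    if missing:
        raise KeyError(f"missing core subtests: {missing}")
    sums = {idx: sum(scaled[s] for s in subs) for idx, subs in WISC_IV_INDEXES.items()}
    sums["FSIQ"] = sum(sums[idx] for idx in WISC_IV_INDEXES)
    return sums


example = {"SI": 6, "VC": 5, "CO": 7, "BD": 8, "PCn": 7, "MR": 6,
           "DS": 6, "LN": 5, "CD": 7, "SS": 6}  # hypothetical scaled scores
print(sums_of_scaled_scores(example))
# {'VCI': 18, 'PRI': 21, 'WMI': 11, 'PSI': 13, 'FSIQ': 63}
```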

According to the WISC-IV Technical and Interpretive Manual (Wechsler, 2003), the overall subtest reliability coefficients (test-retest and average internal consistency coefficients) are BD = .86, SI = .86, DS = .87, PCn = .82, CD = .85, VC = .89, LN = .90, MR = .89, CO = .81, and SS = .79.

Additional supplemental subtests (Picture Completion, Cancellation, Information, Arithmetic, and Word Reasoning) are available as part of the WISC-IV but are not described here because they were not included in this study. They were excluded because the testing requests for the participants specified that only the primary subtests be used. Future research could explore the potential gains and losses of including the supplemental subtests in relation to the hypotheses.

Flanagan and Kaufman (2004) reported that, according to the test manual, there is validity evidence based on test content, response processes, internal structure, relationships with other variables, and consequences of testing. According to Pearson’s website (Pearsonpsychcorp.com, 2010), careful sampling ensures that the norms are representative of the current population of children in the United States. The WISC-IV normative sample consisted of 2,200 children between the ages of 6:0 and 16:11 years, with 200 children in each of the 11 age groups. The sample was stratified on age, sex, parent education level, region, and race/ethnicity.

WAIS-IV. The WAIS-IV (Wechsler, 2008) is intended for use with adults aged 16 to 90. As summarized in Table 2, it measures cognitive ability with a core battery of 10 subtests that address four domains of intelligence: verbal comprehension, perceptual reasoning, working memory, and processing speed.

The WAIS-IV normative sample of 2,200 adults was stratified by age, gender, education level, ethnicity, and region so that the norms would be representative.

Lichtenberger and Kaufman (2009) also report thirteen special group studies conducted with specific clinical populations.

The WAIS-IV measures ability in four areas:

- the Verbal Comprehension Index (VCI), which measures command of language;
- the Perceptual Reasoning Index (PRI), which measures manipulation of concrete materials or processing of visual stimuli to solve nonverbal problems;
- the Working Memory Index (WMI), which measures short-term memory;
- the Processing Speed Index (PSI), which measures how quickly and correctly someone can process the information needed to complete a task.

These four areas are combined to provide a participant’s Full Scale Intelligence Quotient (FSIQ).

According to Lichtenberger and Kaufman (2009), the overall reliability coefficients (test-retest and average internal consistency coefficients) of the four Indexes are: Verbal Comprehension (VCI) = .96, Perceptual Reasoning (PRI) = .87, Working Memory (WMI) = .88, Processing Speed (PSI) = .87, and Full Scale = .96. Lichtenberger and Kaufman (2009, p. 32) report an overall validity of .94 in relation to the WAIS-IV.

The Indexes are composed of subtests as follows:

- VCI: Similarities (SI), Vocabulary (VC), and Information (IN).
- PRI: Block Design (BD), Matrix Reasoning (MR), and Visual Puzzles (VP).
- WMI: Digit Span (DS) and Arithmetic (AR).
- PSI: Symbol Search (SS) and Coding (CD).

Lichtenberger and Kaufman (2009) provided the following descriptions of the subtests:

Similarities (SI): the examinee is presented two words that represent common objects or concepts and describes how they are similar. Vocabulary (VC): for picture items, the examinee names the object presented visually. For verbal items, the examinee defines words that are presented visually and orally. Information (IN): the examinee answers questions that address a broad range of general knowledge topics. Block Design (BD): working within a specified time limit, the examinee views a model and a picture or a picture only and uses red and white blocks to recreate the design. Matrix Reasoning (MR): the examinee views an incomplete matrix or series and selects the response option that completes the matrix or series. Visual Puzzles (VP): working within a specified time limit, the examinee views a completed puzzle and selects three response options that, when combined, reconstruct the puzzle. Digit Span (DS): for Digit Span Forward, the examinee is read a sequence of numbers and recalls the numbers in the same order. For Digit Span Backward, the examinee is read a sequence of numbers and recalls the numbers in reverse order. Arithmetic (AR): working within a specified time limit, the examinee mentally solves a series of arithmetic problems. Symbol Search (SS): working within a specified time limit, the examinee scans a search group and indicates whether one of the symbols in the target group matches. Coding (CD): using a key, the examinee copies symbols that are paired with numbers within a specified time limit. (p. 25)

According to Lichtenberger and Kaufman (2009), the subtests have the following test-retest reliability coefficients: “SI=.87, VC=.89, IN=.90, BD=.80, MR=.74, VP=.74, DS=.83, AR=.83, SS=.81, and CD=.86” (p. 35).

Table 2

WAIS-IV: Indexes and Their Subtests

Index Subtests

Verbal Comprehension (VCI) Similarities (SI), Vocabulary (VC), and Information (IN).

Perceptual Reasoning (PRI) Block Design (BD), Matrix Reasoning (MR), and Visual Puzzles (VP).

Working Memory (WMI) Digit Span (DS) and Arithmetic (AR).

Processing Speed (PSI) Coding (CD) and Symbol Search (SS).

Additional supplemental subtests (Comprehension [CO], Figure Weights [FW], Picture Completion [PCm], Letter-Number Sequencing [LN], and Cancellation [CA]) are available as part of the WAIS-IV but are not described here because they were not included in this study. They were excluded because the testing requests for the participants specified that only the primary subtests be used. Future research could explore the potential gains and losses of including the supplemental subtests in relation to the hypotheses.

Analyses

The Kolmogorov-Smirnov test was used to check whether the score distributions departed significantly from normality. Research Hypothesis 1 stated: “Among people applying for disability there is a relationship between age, gender, race, and performance on the WISC-IV (Wechsler, 2003).” To answer it, a MANOVA was run to examine relationships between age, gender, race, and performance on the WISC-IV. Because significance was found, a post hoc MANOVA was run on the subtests. Separate analyses were warranted because the Full Scale IQ is composed of the Indexes, and the Indexes are composed of the subtests.

Research Hypothesis 2 stated: “Among people applying for disability there is a relationship between age, gender, race, and performance on the WAIS-IV (Wechsler, 2008).” To answer it, a MANOVA was run to examine relationships between age, gender, race, and performance on the WAIS-IV. Because significance was found, a post hoc MANOVA was run on the subtests. Separate analyses were warranted because the Full Scale IQ is composed of the Indexes, and the Indexes are composed of the subtests.
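The normality check described above can be sketched with Python’s standard library. This is an illustration only: it assumes a one-sample Kolmogorov-Smirnov test against a normal distribution whose mean and SD are estimated from the same scores. The p-value comes from the Kolmogorov limiting distribution. Estimating the mean and SD from the data makes that p-value conservative (the Lilliefors correction addresses this), and SPSS’s computation may differ in detail. A nonsignificant result means only that normality was not rejected.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_normality(scores: list) -> tuple:
    """One-sample K-S test of `scores` against N(sample mean, sample SD).

    Returns (D, asymptotic p-value).
    """
    n = len(scores)
    if n < 3:
        raise ValueError("need at least three scores")
    ref = NormalDist(fmean(scores), stdev(scores))
    d = 0.0
    for i, x in enumerate(sorted(scores)):
        cdf = ref.cdf(x)
        # Compare with the empirical CDF just after (i+1)/n and just before i/n the point.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    lam = math.sqrt(n) * d
    if lam < 0.2:
        return d, 1.0  # P(K > 0.2) is 1 to many decimal places
    # Kolmogorov distribution: P(K > lam) = 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam) for k in range(1, 101))
    return d, min(max(p, 0.0), 1.0)


# Hypothetical, nearly normal scores (evenly spaced normal quantiles):
scores = [NormalDist(100, 15).inv_cdf((i + 0.5) / 200) for i in range(200)]
d, p = ks_normality(scores)
print(f"D = {d:.4f}, p = {p:.3f}")
```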

Summary

This third chapter outlined the research hypotheses, method, procedure, instruments, and analyses used to answer the research questions. Chapter four reports the findings of those analyses for each research hypothesis.

CHAPTER IV

RESEARCH RESULTS

Introduction

Chapter 4 presents the results of the analyses conducted to answer the research questions of this dissertation. As stated in the prior chapter, the hypotheses are:

Hypothesis 1: Among people applying for disability there is a relationship between age, gender, race, and performance on the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003).

Hypothesis 2: Among people applying for disability there is a relationship between age, gender, race, and performance on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008).

The chapter reports the analyses of WISC-IV and WAIS-IV scores in relation to age, race, and gender among people applying for disability income. The data were analyzed with SPSS (PASW Statistics version 18) using GLM (General Linear Model) Multivariate Analysis.

GLM analysis is appropriate when the sample distributions are normal, as they were here. It allows investigation of both individual factors (such as gender) and interactions between factors (such as gender and age). GLM also allows analysis of unbalanced models, in which groups contain unequal numbers of participants, as was the case with this sample. MANOVAs were preferred to multiple ANOVAs to control the inflation of Type I error. Statistical significance was set at .05 because the research was exploratory.
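The Type I error concern can be made concrete with a short calculation. Four separate ANOVAs, one for each Index, each at α = .05, give a probability of at least one false positive of 1 − (1 − .05)⁴ ≈ .185. This figure assumes the four tests are independent. The Index scores are correlated, so the true familywise rate would differ, although it still exceeds .05. The Bonferroni-adjusted level shown for comparison is a common alternative and was not used in this study.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one Type I error) across k independent tests, each at level alpha."""
    if not 0.0 <= alpha <= 1.0 or k < 1:
        raise ValueError("alpha must be in [0, 1] and k must be >= 1")
    return 1.0 - (1.0 - alpha) ** k


def bonferroni_alpha(alpha: float, k: int) -> float:
    """Per-test level that keeps the familywise rate at or below alpha (Bonferroni)."""
    return alpha / k


# Four separate Index ANOVAs (VCI, PRI, WMI, PSI) at alpha = .05:
print(f"{familywise_error(0.05, 4):.3f}")  # 0.185
print(f"{bonferroni_alpha(0.05, 4):.4f}")  # 0.0125
```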

Demographics

This section describes the participants for the WISC-IV and the WAIS-IV by gender and race (Table 3). It also reports participants’ ages for the WISC-IV and the WAIS-IV (Tables 4 and 5), grouped into the age categories determined by the test publisher, The Psychological Corporation.

Table 3

WISC-IV and WAIS-IV Demographics (Summary of participants)

Gender N Race N

WISC-IV

Male 161 White 83

Female 96 Black 174

Total 257 257

WAIS-IV

Male 246 White 214

Female 148 Black 180

Total 394 394

As indicated in Table 3, 161 males and 96 females completed testing with the WISC-IV. Of this group, 83 participants were White and 174 were Black. Table 3 also shows that 246 males and 148 females completed testing with the WAIS-IV. Of this group, 214 were White and 180 were Black.

Tables 4 and 5 show the distribution of participants across the age categories provided by The Psychological Corporation.

Table 4

WISC-IV Age (by range categories; as determined by The Psychological Corporation)

Category Age Range N

1 6.0 – 7.11 45

2 8.0 – 9.11 42

3 10 – 11.11 64

4 12 – 13.11 48

5 14 – 15.11 48

6 16+ 10

Total 257

Table 5

WAIS-IV Age (categories; as determined by The Psychological Corporation)

Category Age Range N

1 16 – 17.11 25

2 18 – 19.11 127

3 20 – 24.11 48

4 25 – 29.11 22

5 30 – 34.11 25

6 35 – 44.11 42

7 45 – 54.11 65

8 55 – 64.11 40

Total 394
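The age ranges in Tables 4 and 5 use years.months notation; for example, 7.11 means 7 years, 11 months. Every category boundary falls on a birthday, so completed years alone determine a participant’s category. The following sketch assigns the category numbers used in the tables. The function and constant names are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Lower bounds (completed years) of the age categories in Tables 4 and 5.
WISC_IV_LOWER = [6, 8, 10, 12, 14, 16]
WAIS_IV_LOWER = [16, 18, 20, 25, 30, 35, 45, 55]
WAIS_IV_UPPER_EXCLUSIVE = 65  # the oldest WAIS-IV category in Table 5 ends at 64.11


def age_category(years: int, lowers: list, upper_exclusive: Optional[int] = None) -> int:
    """Return the 1-based category whose band contains `years` completed years."""
    if years < lowers[0] or (upper_exclusive is not None and years >= upper_exclusive):
        raise ValueError(f"age {years} is outside the tabled categories")
    # bisect_right counts the lower bounds <= years, which is the category number.
    return bisect_right(lowers, years)


print(age_category(10, WISC_IV_LOWER))                           # 3 (10 - 11.11)
print(age_category(47, WAIS_IV_LOWER, WAIS_IV_UPPER_EXCLUSIVE))  # 7 (45 - 54.11)
```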

Analysis of the WISC-IV scores for Age, Race, and Gender: Research Hypothesis 1

Among people applying for disability there is a relationship between age, gender, race, and performance on the WISC-IV (Wechsler, 2003).

Table 6 presents the means and standard deviations for participants applying for disability income who were assessed for cognitive disability with the WISC-IV.

Table 6

Means and Standard Deviations (in parentheses) of WISC-IV Scores for Disability Applicants

Scale FSIQ VCI PRI WMI PSI

Total sample (N =257) 67.80 (18.58) 19.14 (6.39) 22.56 (6.74) 12.40 (4.53) 13.66 (5.36)

Gender

(Male, N = 161) 66.68 (19.58) 18.98 (7.01) 22.47 (7.18) 11.98 (4.52) 12.86 (5.71)

(Female, N = 96) 70.29 (15.83) 19.42 (5.21) 22.70 (5.97) 13.12 (4.48) 15.01 (4.41)

Race

(White, N = 83) 70.14 (16.12) 20.40 (6.14) 23.39 (6.23) 11.98 (4.36) 14.34 (5.52)

(Black, N = 174) 66.68 (19.58) 18.54 (6.43) 22.16 (6.95) 12.61 (4.61) 13.34 (5.26)

Age (Categorical)

1 6.0 – 7.11 (45) 69.71 (19.68) 18.27 (7.59) 24.24 (6.92) 12.20 (4.55) 14.98 (5.48)

2 8.0 – 9.11 (42) 70.95 (18.12) 20.45 (5.31) 22.76 (6.25) 13.17 (4.73) 14.64 (4.69)

3 10 – 11.11 (64) 72.11 (19.78) 20.84 (6.86) 23.30 (7.32) 13.42 (4.29) 14.66 (5.48)

4 12 – 13.11 (48) 66.69 (16.58) 18.88 (5.55) 21.94 (6.24) 12.48 (4.07) 13.10 (5.13)

5 14 – 15.11 (48) 60.50 (15.05) 17.60 (5.56) 20.87 (6.43) 10.69 (4.42) 11.25 (5.05)

6 16+ (10) 58.70 (20.81) 15.30 (6.16) 20.40 (6.77) 11.50 (6.09) 11.50 (5.54)

Full Scale Intelligence Quotient (FSIQ). A univariate analysis of variance was conducted to explore the effects of, and interactions among, gender, race, and age on WISC-IV FSIQ. As Table 7 shows, categorical age was the only significant effect. Age did not interact significantly with any other variable. Because the Indices measure distinct factors that contribute to FSIQ, additional multivariate analyses were conducted on the WISC-IV Indices:

- the Verbal Comprehension Index (VCI), which measures command of language;
- the Perceptual Reasoning Index (PRI), which measures manipulation of concrete materials or processing of visual stimuli to solve nonverbal problems;
- the Working Memory Index (WMI), which measures short-term memory;
- the Processing Speed Index (PSI), which measures how quickly and correctly someone can process the information needed to complete a task.

These four areas are combined to provide a participant’s FSIQ.

Table 7

Univariate tests: WISC-IV: Dependent Variable FSIQ by Age, Race, and Gender

Effect F df p.

Age 2.426 5 .036*

Race 0.372 1 .542

Gender 1.646 1 .201

Race X Gender 0.022 1 .882

Race X Age 0.353 5 .880

Gender X Age 0.052 5 .998

Race X Gender X Age 0.382 4 .821

*p < .05

Indices. A 6 (Age Group) X 2 (Race) X 2 (Gender) factorial multivariate analysis of variance (MANOVA) was conducted on the WISC-IV results. The analysis tested for differences and interactions among age group, race, and gender in performance on the four Indices (VCI, PRI, WMI, and PSI) of the WISC-IV. The MANOVA was preferred over multiple ANOVAs to control the inflation of Type I error.
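The Wilks’ λ statistic reported in the multivariate tables is the ratio det(E) / det(E + H). Here E is the within-groups (error) matrix of sums of squares and cross-products, and H is the corresponding matrix for the effect being tested. The sketch computes λ for the simplest case: a one-way design with two dependent variables and hypothetical data. The factorial models in this chapter apply the same ratio with a separate H for each main effect and interaction, computed here by SPSS. Values near 1 indicate little group separation, and smaller values indicate more.

```python
from statistics import fmean


def _sscp(rows: list, center: tuple) -> list:
    """Sums of squares and cross-products of the rows about a given center."""
    p = len(center)
    m = [[0.0] * p for _ in range(p)]
    for r in rows:
        dev = [r[j] - center[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                m[a][b] += dev[a] * dev[b]
    return m


def _det2(m: list) -> float:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def wilks_lambda(groups: dict) -> float:
    """One-way Wilks' lambda for two dependent variables: det(E) / det(E + H)."""
    all_rows = [r for rows in groups.values() for r in rows]
    grand = tuple(fmean(r[j] for r in all_rows) for j in range(2))
    total = _sscp(all_rows, grand)  # T = E + H in a one-way design
    error = [[0.0, 0.0], [0.0, 0.0]]  # E: pooled within-group SSCP
    for rows in groups.values():
        g = _sscp(rows, tuple(fmean(r[j] for r in rows) for j in range(2)))
        error = [[error[a][b] + g[a][b] for b in range(2)] for a in range(2)]
    return _det2(error) / _det2(total)


# Hypothetical (score1, score2) pairs for two groups whose means differ:
groups = {"a": [(1, 2), (3, 5), (2, 2)], "b": [(11, 12), (13, 15), (12, 12)]}
print(round(wilks_lambda(groups), 4))  # 0.0196, strong separation
```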

Statistical significance was set at .05 because the research was exploratory. Before the MANOVA was computed, a one-sample Kolmogorov-Smirnov test was run on the WISC-IV scores (M = 67.798, SD = 18.576, p = .132). The test indicated that the scores did not depart significantly from normality, which supports the normality assumption of the MANOVA. Table 8 shows that when the Indices are analyzed, both age and gender have significant effects.

Table 8

Multivariate tests for Age, Race, Gender, and WISC-IV

Effect Wilks’ λ F df (error df) p.

Age .829 2.229 20 (767) .002*

Race .977 1.366 4 (231) .247

Gender .956 2.629 4 (231) .035*

Race X Gender .992 .469 4 (231) .758

Race X Age .925 .918 20 (767) .564

Gender X Age .951 .589 20 (767) .922

Race X Gender X Age .941 .886 16 (706) .586

*p < .05

Age had a significant multivariate effect, Wilks’ λ = .829, F(20, 767) = 2.23, p = .002. As shown in Table 9, this effect was reflected only in the PSI: processing speed scores were lower in the older age groups of this sample, indicating that performance declined with age.
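Rao’s F approximation is a common way to convert Wilks’ λ into an F statistic. The degrees of freedom pattern in Tables 8 and 13 is consistent with it. The sketch makes one inference: the univariate error df of the model is 234 for the WISC-IV and 362 for the WAIS-IV. These values follow from the one-df effects, for which the second df equals error df − p + 1 (231 and 359). Using the λ of .829 reported in the text, the approximation reproduces F ≈ 2.23 on (20, 767) df.

```python
import math


def rao_f(wilks: float, p: int, q: int, df_error: int) -> tuple:
    """Rao's F approximation for Wilks' lambda.

    p: number of dependent variables; q: hypothesis df of the effect;
    df_error: univariate error df of the model. Returns (F, df1, df2).
    """
    denom = p * p + q * q - 5
    s = math.sqrt((p * p * q * q - 4) / denom) if denom > 0 else 1.0
    m = df_error - (p - q + 1) / 2
    df1 = p * q
    df2 = m * s - df1 / 2 + 1
    root = wilks ** (1 / s)
    return (1 - root) / root * df2 / df1, df1, df2


# WISC-IV age effect: 4 Indices, 6 age groups (q = 5), inferred error df = 234.
F, df1, df2 = rao_f(0.829, p=4, q=5, df_error=234)
print(f"F({df1}, {df2:.0f}) = {F:.2f}")  # F(20, 767) = 2.23
```

The same function with λ = .799, q = 7, and an error df of 362 gives approximately F(28, 1296) = 2.97 for the WAIS-IV age effect in Table 13.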

Table 9

WISC source of significance for Age by VCI, PRI, WMI, and PSI

Source F df sig

VCI 1.964 5 .085

PRI 1.395 5 .227

WMI 2.031 5 .075

PSI 4.060 5 .001*

*p < .05

Gender had a significant multivariate effect, Wilks’ λ = .956, F(4, 231) = 2.63, p = .035. Table 10 indicates that this effect was attributable to the PSI. Males obtained lower PSI scores (M = 12.86) than females (M = 15.01), reflecting slower processing speed.

Table 10

WISC source of significance for Gender by VCI, PRI, WMI, and PSI

Source F df p.

VCI 0.296 1 .587

PRI 0.104 1 .748

WMI 0.929 1 .336

PSI 7.036 1 .009*

*p < .05

Gender and age were the only significant sources of variance, and those differences were evident only on the Processing Speed Index in this sample.

Analysis of the WAIS-IV scores for Age, Race, and Gender: Research Hypothesis 2

Among people applying for disability there is a relationship between age, gender, race, and performance on the WAIS-IV (Wechsler, 2008).

Table 11 presents the means and standard deviations for participants applying for disability income who were tested for cognitive disability with the WAIS-IV.

Table 11

Means and Standard Deviations (in parentheses) of WAIS-IV Scores for Disability Applicants

Scale FSIQ VCI PRI WMI PSI

Total sample (N = 394) 58.98 (19.84) 18.14 (8.34) 19.82 (7.78) 11.14 (5.38) 10.98 (5.73)

Gender

(Male, N = 246) 58.35 (19.35) 17.81 (6.60) 19.92 (6.82) 10.83 (4.01) 10.25 (4.94)

(Female, N = 148) 60.03 (20.63) 18.70 (11.65) 19.67 (9.18) 11.65 (7.10) 12.21 (6.69)

Race

(White, N = 214) 61.06 (20.91) 19.43 (10.61) 20.66 (8.66) 11.72 (6.35) 11.25 (6.29)

(Black, N = 180) 56.50 (18.21) 16.61 (5.76) 18.83 (6.48) 10.44 (3.84) 10.67 (4.98)

Age (Categorical)

1 16 – 17.11 (25) 58.88 (16.08) 17.68 (3.78) 18.96 (6.21) 10.68 (3.80) 11.56 (5.16)

2 18 – 19.11 (127) 53.04 (16.59) 16.73 (10.12) 17.16 (5.83) 9.99 (3.87) 10.19 (4.37)

3 20 – 24.11 (48) 59.35 (19.20) 16.96 (6.17) 19.67 (6.66) 11.46 (4.70) 11.54 (4.38)

4 25 – 29.11 (22) 70.59 (21.08) 20.45 (5.65) 23.14 (7.56) 12.50 (4.13) 14.50 (6.58)

5 30 – 34.11 (25) 52.60 (20.81) 18.12 (7.87) 18.48 (7.51) 9.44 (4.27) 8.80 (4.14)

6 35 – 44.11 (42) 60.19 (19.63) 17.74 (7.89) 20.71 (7.34) 11.57 (4.75) 10.07 (4.49)

7 45 – 54.11 (65) 61.34 (21.49) 18.75 (10.88) 22.11 (10.96) 12.34 (8.88) 12.03 (8.65)

8 55 – 64.11 (40) 69.93 (21.26) 22.50 (7.41) 23.40 (7.10) 12.58 (4.70) 11.17 (6.05)

Full Scale Intelligence Quotient (FSIQ). A univariate analysis of variance was conducted to explore the effects of, and interactions among, gender, race, and age on WAIS-IV FSIQ. As Table 12 shows, categorical age was the only significant effect. Age did not interact significantly with any other variable.

Table 12

Univariate tests: WAIS-IV: Dependent Variable FSIQ by Age, Race, and Gender

Effect F df p.

Age 5.058 7 .000*

Race 2.717 1 .067

Gender 1.235 1 .267

Race X Gender 0.022 1 .882

Race X Age 0.492 7 .840

Gender X Age 1.067 7 .384

Race X Gender X Age 0.971 7 .452

*p < .05

Indices. Because the Indices measure distinct factors that contribute to FSIQ, an additional multivariate analysis was conducted on the WAIS-IV Indices: the Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), and Processing Speed Index (PSI). These four areas are combined to provide a participant’s FSIQ.

An 8 (Age Group) X 2 (Race) X 2 (Gender) factorial multivariate analysis of variance (MANOVA) was conducted on sample 2 (WAIS-IV test results). The analysis tested for differences or interactions among age group, race, and gender in performance on the four Indices (VCI, PRI, WMI, and PSI) of the WAIS-IV. The MANOVA was preferred over multiple ANOVAs to control the inflation of Type I error.

Statistical significance was set at .05 because the research was exploratory.

Before the MANOVA was computed, a one-sample Kolmogorov-Smirnov test was run on the WAIS-IV scores (M = 58.982, SD = 19.81, p = .757). The test indicated that the scores did not depart significantly from normality, which supports the normality assumption of the MANOVA.

As shown in Table 13, categorical age had a statistically significant multivariate effect, Wilks’ λ = .799, F(28, 1296) = 2.97, p < .001. Table 14 further shows that performance on the PRI, the WMI, and the PSI was each significantly related to categorical age.

Table 13

Multivariate tests for Age, Race, Gender, and WAIS-IV

Effect Wilks’ λ F df (error df) p.

Age .799 2.969 28 (1296) .000*

Race .978 1.983 4 (359) .097

Gender .969 2.897 4 (359) .022

Race X Gender .999 0.086 4 (359) .987

Race X Age .917 1.120 28 (1296) .305

Gender X Age .938 0.833 28 (1296) .716

Race X Gender X Age .934 0.878 28 (1296) .648

*p < .05

Table 14

WAIS-IV source of significance for Age by VCI, PRI, WMI, and PSI

Source F df p.

VCI 1.970 7 .058

PRI 5.178 7 .000*

WMI 2.626 7 .012*

PSI 2.991 7 .005*

*p < .05

Conclusion

This chapter presented the results of the statistical analyses as related to the research questions. Chapter 5 interprets the analyses and offers recommendations for further research and practice.

CHAPTER V

RESULTS

This chapter interprets the results of the analyses and offers recommendations for further research and practice. Overall, a few main effects but no interaction effects were observed. The general psychometric qualities of the scales did not appear to demonstrate significant cultural bias when used with this population.

The demographic information describes individuals applying for disability at the time and location surveyed. Among applicants assessed with the WISC-IV (ages 6 to 16), more were Black (N = 174) than White (N = 83). Among applicants assessed with the WAIS-IV (ages 16 and older), more were White (N = 214) than Black (N = 180). Male applicants outnumbered female applicants for both assessments. The largest age group was applicants aged 18-20 (N = 127), followed by those aged 45-55 (N = 65) and 10-12 (N = 64).

Research Hypotheses

First Research Hypothesis. Among people applying for disability there is a relationship between age, gender, race, and performance on the Wechsler Intelligence Scale for Children (WISC-IV; Wechsler, 2003).

WISC-IV Results. Multivariate analysis of the WISC-IV scores found significant effects (p < .05) for age (Wilks’ λ = .829, F = 2.229, p = .002) and for gender (Wilks’ λ = .956, F = 2.629, p = .035). Closer examination indicated that the Processing Speed Index (PSI) was the only Index to reflect these differences, both for age (PSI F = 4.060, p = .001) and for gender (PSI F = 7.036, p = .009). On the WISC-IV, older participants therefore demonstrated slower processing speed than younger participants, and male applicants demonstrated slower processing speed than females. However, there did not appear to be an interaction between age and gender: females were as likely as males to show decline in relation to age.

Second Research Hypothesis. Among people applying for disability there is a relationship between age, gender, race, and performance on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008).

WAIS-IV Results. Multivariate analysis of the WAIS-IV scores found a significant effect (p < .05) for age (Wilks’ λ = .799, F = 2.969, p < .001). Closer examination indicated that three Indices were significantly related to age:

- Perceptual Reasoning Index (PRI): F = 5.178, p < .001;
- Working Memory Index (WMI): F = 2.626, p = .012;
- PSI: F = 2.991, p = .005.

However, the mean scores show that the age groups with the highest and lowest averages differed across Indices. The same was true of the Full Scale Intelligence Quotient (FSIQ), which also varied with age. Thus, although there is a relationship between age and performance on the WAIS-IV, no specific predictive pattern emerged. Indeed, these findings are likely an artifact of how the age categories were assigned.

- PRI: the highest mean (23.40) was at ages 55-65 and the lowest (17.16) at ages 18-20.
- WMI: the highest mean (12.58) was also at ages 55-65, but the lowest (9.44) was at ages 30-35, with a similarly low mean (9.99) at ages 18-20.
- PSI: the highest mean was at ages 25-30 and the lowest at ages 30-35.
- FSIQ: the highest mean was for 25-30 year olds (70.59) and the lowest for 30-35 year olds (52.60).

Discussion

It is noteworthy that in the current samples the largest group of people applying for disability was young adults, at the age when many are finishing high school and preparing for further education or a career. On the WISC-IV, a lower PSI could be an indication of cognitive difficulties for males under the age of 16 applying for disability. The age effects may also reflect a role of age in some of the slower processing speeds among young adults, but the data include no measure of developmental change within participants. The findings may equally reflect differences in the age of onset of the various cognitive difficulties contributing to disability. Either way, nothing in these findings raises serious concern about the cultural integrity of the test. The test developers provide conversion tables that adjust for age differences. The difference in Processing Speed for males is worthy of more detailed research, but it did not lead the researchers to suspect gender bias in the test. In fact, Sattler (2008) also found a 4-point difference in scores for males over scores for females in the general population.

On the WAIS-IV, age effects were noted on three of the four Indices (PRI, WMI, and PSI). As with the WISC-IV results, these effects would likely be adjusted for by the tests’ conversion tables. Further research could explore whether age differences reflect the differing impact of distinct pathologies on cognitive functioning (for example, anxiety disorders versus attention disorders). Age could affect the performance of applicants with disabilities differently than it affects the general population. Even so, this research did not conclude that these age differences represent a cultural bias of the test for this sample.

As previously mentioned, the American Psychological Association has published Guidelines for the Assessment of and Intervention with Persons with Disabilities (2012). These guidelines emphasize the importance of recognizing social and cultural diversity for persons with disabilities (guideline 8, p. 49). They also stress the need to apply assessment approaches that are “psychometrically sound, fair, comprehensive, and appropriate for clients with disabilities” (guideline 14, p. 52). The two hypotheses of this dissertation examined two major assessment tools (WISC-IV and WAIS-IV) to determine whether they fall within these guidelines.

The first Research Hypothesis stated: Among people applying for disability, there is no relationship between age, gender, race, and performance on the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003).

These data indicated that, on the WISC-IV, older participants had slower processing speeds than younger participants, and male applicants had slower processing speeds than female applicants. However, there did not appear to be an interaction effect between age and gender: females were as likely as males to show a decline in relation to age.

The second Research Hypothesis stated: Among people applying for disability, there is no relationship between age, gender, race, and performance on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008). The data indicated that the Perceptual Reasoning Index (PRI) (F = 5.178, p < .001), Working Memory Index (WMI) (F = 2.626, p = .012), and Processing Speed Index (PSI) (F = 2.626, p = .012) were all significantly influenced by Age. Thus, while there was a relationship between Age and performance on the WAIS-IV, no specific predictive pattern emerged. Again, these findings were likely an artifact of how the age categories were assigned.
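To make concrete what an F value for an Age effect represents, the following is a minimal sketch of a one-way analysis of variance (ANOVA) F ratio written in Python. It is illustrative only: the dissertation does not report its analysis code or software, the function name `one_way_anova_f` is hypothetical, and the index scores below are invented toy values for three hypothetical age categories, not data from this study. The sketch computes only the F ratio (the between-group mean square divided by the within-group mean square); obtaining a p value would additionally require the F distribution with k - 1 and N - k degrees of freedom, where k is the number of age groups and N the number of participants.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F ratio for a list of score groups.

    Each element of `groups` is a list of scores for one (hypothetical)
    age category. F = MS_between / MS_within.
    """
    all_scores = [score for group in groups for score in group]
    grand_mean = mean(all_scores)
    k = len(groups)        # number of groups (age categories)
    n = len(all_scores)    # total number of participants

    # Between-group sum of squares: distance of each group mean from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


# Invented index scores for three hypothetical age categories (not study data).
youngest = [90, 95, 100]
middle = [80, 85, 90]
oldest = [70, 75, 80]

f_ratio = one_way_anova_f([youngest, middle, oldest])
print(f"F(2, 6) = {f_ratio:.2f}")  # prints "F(2, 6) = 12.00"
```

Because the invented group means differ by 10 points while scores vary by only 5 points within each group, the ratio is large. With real data, the way age categories are drawn changes both the between-group and within-group terms, which is consistent with the caution above about how the age categories were assigned.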

Therefore, for both Hypotheses 1 and 2, although some relationships between age and performance, and between gender and performance, were noted, they did not interact with each other or with other variables. There was not enough evidence to fully conclude that either the WISC-IV or the WAIS-IV failed to meet the standards established in the American Psychological Association's Guidelines for the Assessment of and Intervention with Persons with Disabilities (2012). These scales appeared to be "psychometrically sound, fair, comprehensive, and appropriate for clients with disabilities" (guideline 14, p. 52). The two hypotheses of this dissertation examined two major assessment tools to help determine whether they meet these guidelines when used to assess individuals applying for disability, and the study demonstrated the importance of continued research to ensure that assessment tools meet these guidelines.

Limitations and Implications for Future Research

This dissertation study did not identify the etiology of participants' cognitive impairments, or how differences in the stage of progression of differing pathologies might affect age differences in performance. For example, while age and gender differences were observed on the PSI in WISC-IV testing, it is not clear whether these differences were emotionally based or were indicative of processing or perceptual disorders within the participants. There is also no information regarding the relative contribution of organic versus functional factors to the observed test scores, including whether potentially confounding psychiatric disorders were present or absent. These limitations indicate the need for further research that explores the role of differing pathological etiologies in test outcomes among persons applying for disability.

A related factor, also not accounted for in this dissertation research, was the previously documented effect of chronic, ongoing stress on the brain. Nisbett et al. (2012) indicated that "Research suggests that part of the Black-White IQ gap may be attributable to the fact that Blacks, on average, tend to live in more stressful environments than do Whites. This is particularly the case in urban environments, where Black children are exposed to multiple stressors" (p. 152). If this is the case, the absence of a gap between the scores of Black and White applicants in this sample could indicate that persons with disabilities experience a higher degree of stress than persons without disabilities. Future research could compare the stress levels experienced by persons applying for disability with those of persons without disabilities and examine the relationship between these factors and performance on intelligence tests. Stress levels may also, theoretically, interact with age and could have played a part in the differences for participants who live in economic distress.

As mentioned earlier, the largest group of people applying for disability in this sample was 18 to 20 years old. Future research would be helpful in exploring whether this age group reflects a larger trend, as such a trend might support the need for more and longer educational interventions for persons with disabilities leaving high school.

Because the data were drawn from people applying for disability determination, scores on standardized tests were anticipated to be lower than those of the general population. While this focus was desirable in some respects, because the stakes are even higher for individuals with below-average scores, the results of this dissertation cannot be extrapolated to the general population. Further research would be needed to determine the similarities and differences between the findings for this population and for the general population.

Another limitation is that invalid individual test scores were excluded from the analyses. Invalid scores occurred for multiple reasons (suspected malingering, or cognitive or severe psychiatric impairments [for example, ADHD or Anxiety Disorder]) indicating that the participant may not have been capable of responding at the anticipated level of cognitive ability. An additional study could explore the characteristics of individuals who provide invalid test results during disability determination. For example, such a study could examine the socioeconomic status (SES) of individuals who malinger during testing, some of whom may be applying for disability income out of financial desperation. Again, not enough is known about the multiple and complex factors that may contribute to the etiology of the cognitive impairments experienced by the participants, and future research is needed to explore this in greater depth.

This dissertation research did not account for socioeconomic status (SES), and discussion needs to continue about the role SES plays in intelligence test performance. Accounting for SES through participants' zip codes was considered, since no information was requested about income. However, zip codes were not included in the analyzed data because, given fair market housing regulations, it could not be known for certain that zip code was indicative of income range.

Age, as represented in this dissertation research, was cross-sectional. A longitudinal, or even cross-sequential, study of the participants, had one been possible, would have provided a more in-depth understanding of how intelligence may change over time for individuals and how age may interact with different types of disability on cognitive performance measures over time. An additional challenge to this dissertation research was the limitation on retesting with the same instrument.

An additional limitation to the meaning of the results, and a potential extraneous variable, is that this dissertation did not explore the relationship between the test taker and the test administrator, including any effect that the age, gender, or race of either the examiner or the participant may have had on performance. While it is known that testing conditions were similar for all the participants, it is not known whether there were differences among the persons administering the tests. Although some research has ruled this out as a potential variable, future research may be useful in confirming whether that finding still holds for the characteristics of the test administrator.

Another limitation of this dissertation was that it did not include information about members of other minority groups or about people of multiple ethnic backgrounds. Future research focused on demonstrating the absence of bias will also be charged with providing more ethnic categories, allowing greater clarity in describing the diverse groups that make up the population. Even with the defined concept of race used for the purposes of this research, there could certainly have been a wide diversity of backgrounds among the people identified simply as Black or White. Asian, Hispanic, biracial, and multiracial participants were intentionally excluded because smaller numbers of these groups had been tested at the time of data collection. Further research will need to explore any performance differences for these groups and within many cultural and ethnic groups.

This dissertation research was limited to one way of measuring intelligence. Although the WAIS-IV and WISC-IV are popular measures of IQ, they are not the only scales and, given other theories of intelligence, not the only ways to define intelligence. Future research could verify or contradict the results of this dissertation by including the administration of other IQ assessments during the disability application process.

Our study failed, as do most in this area of research, to fully account for the influence of racial socialization and threat. As Manly (2005) indicated, both are under-researched and may have a strong influence on test results. Our study also lacks qualitative histories of the participants, and it could be very valuable to hear more history and information from them. For example, as Manly (2005) stated, it has been hard to obtain information from Black participants: "African Americans have fears of participation in medical research that are justified by a legacy of unethical use of human subjects and skin-deep social science. As a result many well-meaning, but inexperienced, neuropsychological researchers have considerable difficulty enrolling African American participants in studies" (p. 271). It would be very helpful for future research to include, as Perry et al. (2008, p. 165) suggested, Matsumoto's concept of "unpackaging" to better understand how participants ascribe meaning to the test and to the testing situation. Because this dissertation study was archival in nature, we did not obtain information about the test takers' experience of the testing event. Authors such as Ryan (2001) have found test-taking experience to be an important variable in test performance, and future research may seek to gather more information from test takers about their experience with the test.

Future researchers could look at additional races and cultures and explore further how differing disabilities interact with age, gender, and race to affect intelligence test performance (for example, do Hispanic females with ADHD perform differently from Caucasian females with ADHD, and is either group affected by age or by the etiology of the ADHD?). Mayer et al. (2012) described the growing research zeitgeist toward alternative views of intelligence as the "hot intelligences" and advocated for their continued inclusion in future research. These "hot intelligences" include social, emotional, and personal intelligences, reflecting people's skill at interacting within a social context, as contrasted with "cool intelligences," which measure only such traditional cognitive functions as the ability to abstract and to manipulate information. Nisbett et al. (2012) indicated agreement with Mayer et al. (2012) but added that, beyond finding correlations between cool and hot intelligences, it has been hard to demonstrate a significant contribution of hot intelligences to general behavior and performance.

Implications for Practice

Continued research on the scales used by psychologists assessing for disability will remain critical. Ensuring that the tools used in the assessment of disability are "psychometrically sound, fair, comprehensive, and appropriate for clients with disabilities" (guideline 14, p. 52) is an extremely important consideration before providing testing. Additionally, while the tests studied here were adequate for their purpose, it has been argued that, given our profession's growing understanding of intellectual disability, the cognitive assessments we have previously relied upon need additional empirical augmentation to provide a complete picture of an individual being considered as having a disability. The Vineland Adaptive Behavior Scales, Second Edition (Vineland-II; Sparrow, Cicchetti, & Balla, 2005) is considered a "gold standard" (Scattone, Donalaggio, & May, 2011, p. 626) in the measurement of adaptive behaviors. These scales measure Communication, Language Skills, Motor Skills, Daily Living Skills, and Socialization; they provide the clinician with additional information about the individual's ability to function beyond the scope of measuring cognitive impairment and can aid in meeting criteria established in the DSM-V. However, additional research may be needed to help ensure that the combined scales (WAIS-IV/WISC-IV and Vineland-II) do not introduce cultural bias.

Future research could help make the case for gathering greater contextual information about the men, women, and children who are applying for disability. Developing knowledge about how to better assist this group of people would require better integration of both qualitative and quantitative information about the many complex variables that directly contribute to individuals applying for, and potentially having, a disability. This would include research that gathers more in-depth information about factors that contribute to, or interfere with, functioning and adaptation.

Summary

The Diagnostic and Statistical Manual of Mental Disorders (DSM-V; American Psychiatric Association, 2013) classifies intellectual disability (Intellectual Developmental Disorder) under the broader classification of Neurodevelopmental Disorders. This reflects the impact of the AAIDD definition of intellectual disability and a general movement away from reliance on cognitive assessment alone, requiring that assessment also include objective measures of difficulties in adaptive functioning. It is also in keeping with Rosa's Law (Public Law 111-256), a United States federal statute replacing the term "mental retardation" with "intellectual disability." Yet, while cognitive measures will no longer play a solo role in diagnosis, it is highly likely that they will remain a key part of it. The APA's call for ethical, nonbiased testing remains and is articulated especially clearly in the 2012 Guidelines for the Assessment of and Intervention with Persons with Disabilities. Although very limited, this dissertation study provided a reply to the APA's request for applied research directed at the actual implementation of assessment measures, with purposeful exploration of bias in testing. Clearly, an enormous amount of research remains before we can be confident that the assessments we provide meet the standards we have established for ourselves as a profession.

REFERENCES

AAIDD. (2011). FAQ on intellectual disability (pp. 1-3). Retrieved January 21, 2011, from http://www.aaidd.org/content

American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). Washington, DC: American Psychiatric Publishing.

American Psychological Association. (2003). Guidelines on multicultural education, training, research, practice, and organizational change for psychologists. American Psychologist, 58, 377-402.

American Psychological Association. (2012). Guidelines for assessment of and intervention with persons with disabilities. American Psychologist, 67(1), 43-62.

Beins, B. (2010). Teaching measurement through historical sources. History of Psychology, 13(1), 89-94.

Benjamin, L. (2009). The birth of American intelligence testing. Monitor on Psychology, 40(1), 20-21.

Benjamin, L., & Baker, D. B. (2004). From séance to science: A history of the profession of psychology in America. Belmont, CA: Thomson Wadsworth.

Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24(3), 383-405.

Brooks-Gunn, J. B., Klebanov, P. K., & Duncan, G. (1996). Ethnic differences in children's intelligence test scores: Role of economic deprivation, home environment, and maternal characteristics. , 67, 396-408.

Butler-Omololu, C., Doster, J. A., & Lahey, B. (1984). Some implications for intelligence test construction with children of different racial groups. The Journal of Black Psychology, 10, 63-75.

Carroll, J. B. (1993). Human cognitive abilities. Cambridge: Cambridge University Press.

Census data. (2011). United States census 2010 data. Retrieved from http://www.census.gov/2010census/data/

Chen, H., & Zhu, J. (2008). Factor invariance between genders of the Wechsler Intelligence Scale for Children-Fourth Edition. Personality and Individual Differences, 45, 260-266.

College Entrance Examination Board. (1992). Validity study sample of the 1991 SAT administration. New York, NY: College Entrance Examination Board.

Colom, Lluis-Font, & Andres-Pueyo. (2005). The generational intelligence gains are caused by decreasing variance in the lower half of the distribution: Supporting evidence for the nutrition hypothesis. Intelligence, 33, 83-91. In J. M. Sattler (Ed.), Assessment of children: Cognitive foundations (5th ed.). La Mesa, CA: Jerome M. Sattler, Publisher, Inc.

Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other and with cognitive variables are highest for low IQ groups. Intelligence, 13, 349-359.

Dickens, W. T., & Flynn, J. R. (2006). Black Americans reduce the racial IQ gap. Psychological Science, 17(10), 913-920.

Dove, A. (1968). Taking the chitling test. Newsweek, 72, 51-52.

Dweck, C. (2000). Self-theories: Their role in motivation, personality, and development. Philadelphia, PA: Psychology Press.

Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1-11.

Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC-IV assessment (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc.

Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure. Psychological Bulletin, 101, 171-191.

Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time. American Psychologist, 54(1), 5-20.

Foley-Nicpon, M., & Lee, S. (2012). Disability research in counseling psychology journals: A 20-year content analysis. Journal of Counseling Psychology, 59, 392-398.

Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York: Basic Books.

Gasquoine, P. (2009). Race-norming of neuropsychological tests. Review, 19, 250-262.

Glass, L. A., Ryan, J. J., Chater, R. A., & Bartels, J. D. (2009). Discrepancy score reliabilities in the WISC-IV standardization sample. Journal of Psychoeducational Assessment, 27(2), 138-144.

Goddard, H. H. (1912). The Kallikak family: A study in the heredity of feeble-mindedness. New York: Macmillan.

Goldstein, E. B. (2005). Reasoning and decision making. In E. B. Goldstein (Ed.), Cognitive psychology: Connecting mind, research, and everyday experience (pp. 427-478). Belmont, CA: Thomson Wadsworth.

Gould, S. J. (1981). The mismeasure of man. New York: W. W. Norton.

Guthrie, R. V. (2004). Even the rat was white: A historical view of psychology. Boston: Pearson.

Health Insurance Portability and Accountability Act of 1996 (Pub. L. 104-191, 110 Stat. 1936, enacted August 21, 1996).

Helms, J. E. (2002). A remedy for the black-white test-score disparity. American Psychologist, 57, 303-305.

Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment: The quantitative perspective. American Psychologist, 61(8), 845-859.

Helms, J. E., & Cook, D. A. (1999). Using race and culture in counseling and psychotherapy: Theory and process. Boston: Allyn & Bacon.

Helms, J. E., & Talleyrand, R. M. (1997). Race is not ethnicity. American Psychologist, 52, 1246-1247.

Herrnstein, R. J., & Murray, C. (1995). The bell curve: Intelligence and class structure in American life. New York: Simon and Schuster Inc.

Hiscock, M. (2007). The Flynn effect and its relevance to neuropsychology. Journal of Clinical and Experimental Neuropsychology, 29(5), 514-529.

Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA: Thomson Wadsworth.

Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general mental ability from 100,000 17- to 18-year-olds on the Scholastic Assessment Test. Intelligence, 34(5), 479-486.

Jensen, A. R. (1995). Psychological research on race differences. American Psychologist, 50, 1,

Johnson, C. K., & Liu, X. (2008). A CHC theory based analysis of age differences on cognitive abilities and academic skills at ages 22 and 90 years. Journal of Psychoeducational Assessment, 26(4), 350-381.

Kaplan, R. M., & Saccuzzo, D. P. (2001). Psychological testing: Principles, applications and issues (5th ed.). Belmont, CA: Wadsworth.

Kaufman, A., Johnson, C. K., & Liu, X. (2008). A CHC theory based analysis of age differences on cognitive abilities and academic skills at ages 22 and 90 years. Journal of Psychoeducational Assessment, 26(4), 350-381.

Kaufman, A. S., & Kaufman, N. L. (2004). Brief intelligence test, second edition (KBIT-2). Circle Pines, MN: American Guidance Service.

Koocher, G. P. (2003). IQ testing: A matter of life or death. Ethics & Behavior, 13(1), 1-2.

Lichtenberger, E. O., & Kaufman, A. S. (2009). Essentials of WAIS-IV assessment. Hoboken, NJ: John Wiley and Sons, Inc.

Long, P. A., & Anthony, J. J. (1974). The measurement of mental retardation by a culture-specific test. Psychology in the Schools, 11, 310-312.

Manly, J. J. (2005). Advantages and disadvantages of separate norms for African Americans. The Clinical Neuropsychologist, 19, 270-275.

Matarazzo, J. D., & Wiens, A. N. (1977). Black intelligence test of cultural homogeneity and Wechsler adult intelligence scale scores of black and white police applicants. Journal of , 62, 57-63.

Mayer, J. D., Caruso, D. R., Panter, A. T., & Salovey, P. (2012). The growing significance of hot intelligences. American Psychologist, 67(2), 502.

Mercado III, E. (2009). Cognitive plasticity and cortical modules. Current Directions in Psychological Science, 18(3), 153-158.

Murray, C. (2007). The magnitude and components of change in the black-white IQ difference from 1920 to 1991: A birth cohort analysis of the Woodcock-Johnson standardizations. Intelligence, 35, 305-318.

Nisbett, R. E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D. F., & Turkheimer, E. (2012). Intelligence: New findings and theoretical developments. American Psychologist, 67(2), 130-159.

Onwuegbuzie, A. J., & Daley, C. E. (2001). Racial differences in IQ revisited: A synthesis of nearly a century of research. Journal of Black Psychology, 27(2), 209-220.

Perry, J. C., Satiani, A., Henze, K. T., Mascher, J., & Helms, J. E. (2008). Why is there still no study of cultural equivalence in standardized cognitive ability tests? Journal of Multicultural Counseling and Development, 36, 155-167.

Pickren, W. E., & Dewsbury, D. A. (2002). Evolving perspectives on the history of psychology. Washington, DC: American Psychological Association.

Pinker, S. (1997). How the mind works. New York: W. W. Norton & Company.

Psychological Corporation. (2003). WISC-IV scoring and administration manual. San Antonio, TX: Author.

Rushton, J. P. (2010). Brain size as an explanation of national differences in IQ, longevity, and other life-history variables. Personality and Individual Differences, 48, 97-99.

Rushton, J. P. (2012). No narrowing in mean black-white IQ differences predicted by heritable g. American Psychologist, 67(2), 500-501.

Rushton, J. P., & Ankney, C. D. (2009). Whole brain size and general mental ability: A review. International Journal of Neuroscience, 119(5), 691-731.

Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in cognitive ability. Psychology, Public Policy, and Law, 11(2), 235-294.

Rushton, J. P., & Jensen, A. R. (2010). The rise and fall of the Flynn Effect as a reason to expect narrowing of the Black-White IQ gap. Intelligence, 38, 213-219.

Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). La Mesa, CA: Jerome M. Sattler, Publisher, Inc.

Scattone, D., Donalaggio, D., & May, W. (2011). Comparison of the Vineland Adaptive Behavior Scales, Second Edition, and the Bayley Scales of Infant and Toddler Development, Third Edition. Psychological Reports, 109(2), 626-634.

Scott, D. (1994). Cognitive conceit: A review of the bell curve. A Social Policy, 25, 50-59.

Shipley, W. C. (1940). A self-administering scale for measuring impairment and deterioration. Journal of Psychology, 9, 371-377.

Shuttleworth-Edwards, A., Kemp, R., Rust, A., Muirhead, J., Hartman, N., & Radloff, S. (2004). Cross-cultural effects on IQ test performance: A review and preliminary normative indications on WAIS-III test performance. Journal of Clinical and Experimental Neuropsychology, 26(7), 903-920.

Sparrow, S., Cicchetti, D., & Balla, D. (2005). Vineland-II: Vineland adaptive behavior scales survey forms manual. The Psychological Corporation.

Sternberg, R. J. (2000). Implicit theories of intelligence as exemplar stories of success: Why intelligence test validity is in the eye of the beholder. Psychology, Public Policy, and Law, 6(1), 159-167.

Suinn, R., & Borrayo. (2008). The ethnicity gap: The past, present, and future. Professional Psychology: Research and Practice, 39(6), 646-651. (US Census Bureau, 2004) (US interim reports by age, sex, race, and Hispanic origin retrieved from http://www.census.gov/ipc/www/usinterimproj/)

Suzuki, L., & Valencia, R. (1997). Race-ethnicity and measured intelligence: Educational implications. American Psychologist, 52(10), 1103-1114.

Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.

Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006). Factor structure of the Wechsler Intelligence Scale for Children-Fourth Edition among referred students. Educational and Psychological Measurement, 66(6), 975-983.

Wechsler, D. (2003). WISC-IV: Technical and interpretive manual. San Antonio, TX: The Psychological Corporation.

Weiten, W. (2011). Psychology: Themes and variations (8th ed.). Mason, OH: Wadsworth.

Welch, K. C. (2002). The bell curve and the politics of negrophobia. In J. M. Fish (Ed.), Race and intelligence: Separating science from myth (pp. 177-198). Mahwah, NJ: Lawrence Erlbaum.

Whitaker, S. (2008). WISC-IV and low IQ: Review and comparison with the WAIS-III. Educational Psychology in Practice, 24(2), 129-137.

Williams, R. L. (1972). The BITCH-100: A culture-specific test. Paper presented at the American Psychological Association Annual Convention, Honolulu, Hawaii.

Winerman, L. (2011). Where's the progress? Monitor on Psychology, 42(9), 28.

Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson psycho-educational battery-revised. Allen, TX: DLM/Teaching Resources.

www.pearsonpsychcorp.com

Young, A., & Rearden, J. (1979). Black intelligence test of cultural homogeneity and Shipley-Institute of Living Scale scores for black Chicago youths. Psychological Reports, 45, 457-458.
