Psychometric Review of Language Tests for Preschool Children A

Psychometric Review of Language Tests for Preschool Children A Senior Honors Thesis Presented in Partial Fulfillment of the Requirements for graduation with research distinction in Speech and Hearing Science in the undergraduate colleges of The Ohio State University by Abby Kinsey The Ohio State University Project Advisor: Dr. Rebecca McCauley, Department of Speech and Hearing Science 1 Abstract Speech-language pathologists have a large number of norm-referenced tests to choose from as they diagnose language disorders in preschool age children. Because determining the psychometric quality of a test is important to that choice, the purpose of the present study is to review the quality of currently available tests. A standardized search strategy yielded 15 norm- referenced tests published since 1998 that focused on language skills in preschool children. Eleven criteria related to the test manual's documentation of evidence such as reliability and validity were developed for use in the review based on those used in 2 similar studies (McCauley & Swisher, 1984; McCauley and Strand, 2008). Studies such as this can (a) promote ongoing study of existing tests to strengthen evidence of their psychometric quality and (b) encourage improvements in future test development. Further, such studies can increase speech-language pathologists' attention to the importance of psychometric characteristics and the potentially negative effects of poorer tests on the quality of their own work. 2 Table of Contents Abstract……………………………………………………………………………………………2 Acknowledgments…………………………………………………………………………............4 Introduction………………………………………………………………………………………..5 Background………………………………………………………………………………………..6 Method…………………………………………………………………………………………..10 Results……………………………………………………………………………………………17 Conclusion……………………………………………………………………………………….20 References………………………………………………………………………………………..21 Appendix…………………………………………………………………………………………22 3 Acknowledgements I would like to thank my advisor, Dr. Rebecca McCauley for allowing me to work alongside her on this project. Working alongside her I have grown professionally in the field of speech language pathology as well as in the area of research. I would also like to thank Pearson and Pro-Ed for donating many of the tests used in the study. Finally, I would like to thank my friends and family for all the support they have given me throughout this process. 4 Introduction This study is a replication of a 1984 study by Rebecca McCauley and Linda Swisher. The previous study examined both language and articulation tests. This particular study replicates the language portion of the previous study. There are 3 appropriate assessment objectives for which tests are used to (a) determine the existence of a speech or language disorder, (b) determine the goals of intervention, and (c) plan procedures for intervention. Norm-referenced tests are primarily used to determine the existence of a disorder. That is because norm-referenced tests measure the child’s ability compared to the normative scores of the sample of children in the child’s age range. If the child’s score falls a predetermined distance (e.g., 1.5 SD) below the mean score of the normative sample, then it is said that he/she has may have a speech or language disorder. The two purposes for this study are (a) to stimulate discussion of the psychometric characteristics of language tests rather than to serve as a definitive psychometric review and (b) to see how psychometric data has changed for tests in the last 25 years. Therefore, the criteria used in the review focused on a selected sample of a larger number of important psychometric characteristics. They included those used in the previous study (McCauley & Swisher, 1984) as well as an additional one that was added to more fully examine validity evidence and was similar to one used in a more recent psychometric review (McCauley & Strand, 2008). There are three concepts that are important to understand and take into consideration in order to have a true understanding of norm-referenced tests and the standardized process. These concepts are (a) test validity and reliability, (b) the normative sample, and (c) test norms and derived scores. 5 Background on basic psychometric concepts Validity and Reliability A measurement instrument, such as a language test, is described as psychometrically valid if it accurately measures what it is designed to measure. There are four types of validity evidence that can help demonstrate the successful construction of a test. The four types of evidence are evidence of content validity, concurrent validity, predictive validity, and construct validity. Each of these individual characteristics should be considered and present in every language test manual. Content validity procedures involve the systematic examination of the test content to determine whether it covers a representative sample of the behavior domain to be measured (Anastasi & Urbina, 1997). One form of criterion-related validity is termed concurrent validity. Evidence for this type of validity should indicate the effectiveness of a test in determining an individual’s performance in specific activities (Anastasi & Urbina, 1997). If a language test shows validity in this area then the test should correlate well with other tests that are measuring the same ability. Predictive validity, the other form of evidence usually considered under the label criterion- related validity is examined by assessing how closely an individual’s test score can be used to predict future performance on a criterion measure. The information that would be supplied by a test validated in this fashion could provide an important basis for the identification of subgroups differentiated by prognosis (McCauley & Swisher, 1984).Construct validity of a test is the extent to which the test may be said to measure a theoretical construct or trait. (Anastasi & Urbina, 1997, p. 126) This type of validity is studied by a careful comparison of the test author’s description of the construct to be tested to the test’s actual content. In order for a test to measure 6 what it is intended to measure, evidence that there is high reliability must also be reported in a test manual. Reliability refers to the stability with which a test measures a given behavior. There are two types of reliability that should be present in a language test manual to ensure that it is a stable measurement. Test-retest reliability is the first in which the manual for a language test should include. This should examine the extent to which a test taker’s performance is consistent over a period of time. This type of reliability shows the extent to which scores on a test can be generalized over different occasions; the higher the reliability, the less susceptible the scores are to random daily changes in the conditions of the examinee or the testing environment (Anastasi & Urnina, 1997). Interexaminer reliability is the second type of reliability that should be provided in the manual for a test. Evidence of this type of reliability is provided through a study demonstrating a correlation between two scorers. This type of reliability is used to show that the directions and administration procedures are clear enough so that multiple scorers will yield approximately the same score. The Normative Sample Test norms are “a statistical summary of the scores received by a normative sample” (McCauley & Swisher, 1984, p.36). These norms are a basis for comparing a child’s score with peers based on age and language experience. In order to find the existence of a problem, it may be appropriate to obtain norms from different normative groups. Test developers should ideally publish norms for a variety of different groups to allow the test user to answer all relevant assessment questions for all possible test takers. However, it is rare that a representative normative sample is provided for even the most basic question—Is there a language impairment? 7 When a norm-referenced language test is developed, the normative sample should clearly be defined in the test manual. The information that should be provided to clearly define the normative sample include age, geographic residence, and socioeconomic status. These factors are important to know because they can have an effect on the child’s performance, therefore having an effect on how the scores should be interpreted. Age would have an effect because you would expect that a 10 year old child would do better on the test than a 3 year old child. Geographic residence is important as well because it is possible that there would be dialectal differences in individuals which could potentially affect the way in which a child scores on the test. And lastly, socioeconomic status is important to know because studies have shown that lower socioeconomic status can be associated with poorer test performance on existing language tests (Arnold & Reed, 1976; Johnson, 1974). Along with the description of the normative sample, it is also important for the test user to know how the sample was chosen. Often, a test designer will exclude individuals who have a disability or non-normal language abilities. According to McCauley and Swisher (1984), this can present difficulties because even the most deviant scores within the normative sample are still representing normal performance. This makes it hard to tell whether a child that scores just below the lowest score has nonnormal performance which makes it difficult to tell just how different the score has to be before it reflects

Psychometric Review of Language Tests for Preschool Children A

Technology-Enabled and Universally Designed Assessment: Considering Access in Measuring the Achievement of Students with Disabilities— a Foundation for Research

What Teachers Know About Validity of Classroom Tests: Evidence from a University in Nigeria

Curriculum B.Sc. Programme in Clinical Genetics 2020-21

Reliability and Validity: How Do These Concepts Influence Accurate Student Assessment?

On the Validity of Reading Assessments GOTHENBURG STUDIES in EDUCATIONAL SCIENCES 328

Construct Validity in Psychological Tests

EDS 245 Psychology in the Schools Stephen E. Brock, Ph.D., NCSP 1

“Is the Test Valid?” Criterion-Related Validity– 3 Classic Types Criterion-Related Validity– 3 Classic Types

An Evaluation Paradox: the Issues of Test Validity in the Realm of Writing Test

The Predictive Validity of Kindergarten Screening Measures Is Strengthened by Examining the Relationships at Both the Pupil-Level and the Between-School Levels

4. the Status of Test Validation Research

An Unfinished Canvas