Test Review: Comprehensive Test of Phonological Processing (CTOPP)
Name of Test: Comprehensive Test of Phonological Processing (CTOPP)
Author(s): Richard K. Wagner, Joseph K. Torgesen, and Carol A. Rashotte
Publisher/Year: Pro-Ed, 1999
Forms: two forms, ages 5 and 6 years, and 7 through 24 years
Age Range: 5 years, 0 months to 24 years, 11 months
Norming Sample:
The normative sample was assembled by PRO-ED using its staff, its customer base, and sites set up in Tallahassee, Kansas City, and Auburn, WA. The norming was carried out in Fall 1997 and Spring 1998.
Total Number: 1,656
Number and Age: 5 through 24 years
Location: 30 states, 4 major geographic regions
Demographics: Data are presented by region, gender, race, ethnicity, rural or urban residence, family income, parent education, and disability.
Rural/Urban: yes; 78% urban and 22% rural, compared to the U.S. school population of 75% urban and 25% rural
SES: by income range; closely matches the U.S. school population
Other: parent education reported as less than bachelor's degree, bachelor's degree, and graduate degree. Disability categories: no disability, learning disability, speech-language disorder, mental retardation, and other handicap. Percentages close to U.S. school population data are presented.
Comment: Elsewhere, perhaps in the TNL, I read that the percentage of the population with speech-language disorders was estimated to be 7%.
Summary Prepared By: Eleanor Stewart, 27 July 2007
Test Description/Overview:
The manual provides a comprehensive introduction to phonological skills and theory and a description of the subtests. Throughout, the information provided is clear and detailed in all areas (i.e., background, instructions for use, directions for administration, explanation of scores, and standardization).
The test kit consists of an examiner's manual, a hardcover, coil-bound test picture book, record booklets, and a CD, contained in a sturdy cardboard box. The record booklet consists of identifying information, a record of scores, a profile of scores, and a record of item and test performance by subtest. Each subtest is introduced in the booklet with brief, well-labeled information about the materials needed, the ceiling rule, the feedback allowed, scoring, and directions for practice and test items. The examiner's text for test administration is printed in blue. Materials required for the subtests include a stopwatch, the CTOPP CD with test items for Blending Words, Memory for Digits, Nonword Repetition, and Blending Nonwords, and a CD player.
Theory: The framework presented consists of three kinds of phonological processing important to the development of written language: phonological awareness, phonological memory, and rapid naming. Each is described as distinct but interrelated. The authors state, “Phonological awareness, phonological memory, and rapid naming represent three correlated yet distinct kinds of phonological processing abilities. These abilities are correlated rather than independent, in that confirmatory factor analytic studies reveal that the correlations between them are substantially greater than zero. They are distinct rather than undifferentiated, in that the correlations between them are less than one. In general, phonological awareness and phonological memory tend to be more highly correlated with one another than with rapid naming. In addition, the three kinds of phonological processing abilities tend to become less correlated with development. For very young children, phonological awareness and phonological memory can be nearly perfectly correlated” (Wagner, Torgesen, & Rashotte, 1999, pp. 6-7). A schema is presented in Figure 1.1 (paper copy of Chapter 1 on file).
The subtests are constructed on the basis of the research tasks used to explore phonological skills.
Purpose of Test: The purpose of this test is to assess phonological skills, to identify those students who are performing below their peers, to profile strengths and weaknesses for intervention, to document progress, and to use CTOPP as a tool in research.
Areas Tested: Subtests are:
1. Elision*
2. Blending Words*
3. Sound Matching*
4. Memory for Digits*
5. Nonword Repetition*
6. Rapid Color Naming*
7. Rapid Object Naming*
8. Rapid Letter Naming
9. Phoneme Reversal
10. Blending Nonwords
11. Segmenting Words
12. Segmenting Nonwords
*Core Subtests for ages 5 and 6 years. Supplemental subtest at this age: Blending Nonwords. Comment: I learned a great deal from reading this section of the manual and would encourage clinicians to do the same.
Areas Tested: Phonological Awareness: Segmenting, Blending, Elision. Other: see list above.
Who can Administer: The authors state that examiners should have extensive training in testing as well as a thorough understanding of test statistics. Comment: I also think that the examiner should be very familiar with phonology, especially if one wants to consider error data, as I do. Clinically, errors are of interest to me.
Administration Time: The authors state test time is approximately 30 minutes. Comment: I think that more time is likely.
The authors state that during the rapid naming and Phoneme Reversal subtests, the examiner should encourage the student to respond within 2 seconds (rapid naming) or 10 seconds (Phoneme Reversal). If there is no response within the specified time, the item is marked as incorrect.
Test Administration (General and Subtests):
Chapter 2 addresses “Information to Consider Before Testing”. Eligibility, examiner qualifications, testing time, and procedures are briefly discussed. The authors note that many students will not have had experience with the type of tasks included in the CTOPP. They offer practice items with examiner feedback. The student must complete at least one of the trial items in order to proceed with testing. Otherwise, testing should be discontinued.
Procedures include: practice items, feedback on practice items, prompting for timed subtests, discontinue rules, and instructions specific to each form (i.e., the subtests to be given for that age range). Entry points and ceilings are uniform: all subtests begin with the first item, and all but the Sound Matching and rapid naming subtests have a ceiling of three consecutive failed items. These exceptions are noted in the manual and on the record form. Six examples of proper ceiling use are provided.
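Comment: To make the ceiling rule concrete, here is a minimal sketch of how a three-consecutive-misses ceiling could be applied when tallying item scores. It is purely illustrative: the function name and data format are my own, not the publisher's scoring procedure.

```python
def apply_ceiling(item_scores, ceiling_misses=3):
    """Return the scores that count toward the raw score, stopping once
    `ceiling_misses` consecutive items have been failed (scored 0).
    Purely illustrative -- not the publisher's scoring software."""
    counted = []
    consecutive_misses = 0
    for score in item_scores:          # scores are 1 (correct) or 0 (incorrect)
        counted.append(score)
        consecutive_misses = consecutive_misses + 1 if score == 0 else 0
        if consecutive_misses >= ceiling_misses:
            break                      # ceiling reached; testing stops here
    return counted

# Example: testing stops at the third consecutive miss.
responses = [1, 1, 0, 1, 0, 0, 0, 1, 1]
print(apply_ceiling(responses))        # [1, 1, 0, 1, 0, 0, 0]
print(sum(apply_ceiling(responses)))   # raw score of 3
```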
A section on “Situational and Subject Error” outlines sources of error variance, including the environment (e.g., an uncomfortable setting) and examinee characteristics (e.g., fatigue, attitude). The authors encourage examiners to consider the characteristics of the testing situation in the analysis of results. Other sections in this chapter include “Other Information About Testing” and “Sharing the Test Results”. This is a good review of familiar material.
Chapter 3 provides “Administration and Scoring of the CTOPP” for each subtest with age ranges identified. This chapter offers specific instructions for administering and scoring each subtest. Scripted examiner instructions to the student appear in red type. These are the same instructions that appear in blue in the test booklet. Though the instructions in the manual describe using an audiocassette recorder and cassette of stimulus items, the version of CTOPP currently available uses a CD as previously described.
Test Interpretation:
Chapter 4, “Interpreting the CTOPP”, is dedicated to recording, analyzing, and using CTOPP scores, as well as interpreting test scores and composites and conducting discrepancy analyses.
The authors first show how to complete the profile booklet, including how to calculate the examinee's exact age. Several examples of age calculation are provided. Sections II and III, “Record of Scores” and “Profile of Scores”, provide instructions along with a demonstration using a student's record. In the next section, “Test Scores and Their Interpretation”, each of the derived scores is defined and explained briefly. Regarding age and grade equivalent scores, the authors state that these scores have “come under close scrutiny in recent years so much so that the American Psychological Association (1985), among others, has advocated the discontinuance of these scores. In fact, the organization has gone so far as to encourage test publishers to stop reporting test scores as age and grade equivalents. Nevertheless, age and grade equivalents are currently mandated by many educational agencies and school systems. Because these scores are often required for administrative purposes, we provide them (reluctantly)” (Wagner, et al., 1999, p. 42).
Comment: This is the second or third manual in which the authors have made a statement such as this. I understand their reservations but I wonder what the rationale/defense is from the school’s point of view. I have found that clinicians are often “forced” to report these scores. Do these scores somehow align with other scores used?
With respect to percentiles, the authors note that these scores are often the ones used by clinicians to explain results. They point out that “the distance between two percentile ranks becomes much greater as those ranks are more distant from the mean or average (i.e., the 50th percentile)” (Wagner, et al., 1999, p. 42). They state, “Standard scores provide the clearest indication of an examinee’s subtest performance…Standard scores allow examiners to make comparisons across subtests” (p. 44).
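Comment: To illustrate the point about unequal percentile spacing, here is a minimal sketch assuming the usual normal-curve relationship between composite standard scores (mean 100, SD 15) and percentile ranks. It is illustrative only and is not the CTOPP's own conversion tables.

```python
from math import erf, sqrt

def percentile_rank(standard_score, mean=100.0, sd=15.0):
    """Percentile rank of a standard score under a normal distribution."""
    z = (standard_score - mean) / sd
    return 100.0 * 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Equal 15-point steps in standard scores map to very unequal percentile steps:
for score in (70, 85, 100, 115, 130):
    print(score, round(percentile_rank(score), 1))
# 70 -> 2.3, 85 -> 15.9, 100 -> 50.0, 115 -> 84.1, 130 -> 97.7
# The 85-to-100 step spans about 34 percentile points, while the 115-to-130
# step spans about 14: a fixed gap in percentile ranks represents a larger
# ability difference the farther it lies from the 50th percentile.
```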
The next sections cover, with examples, the interpretation of composite and individual subtest scores, discrepancy analyses (calculating a difference score from standard scores and determining whether the difference is statistically significant), and clinical usefulness (in terms of severe discrepancies in difference scores). The last section in this chapter addresses “Cautions in Interpreting Test Results”, with a brief review of test reliability, the use of tests alone to diagnose, and the limitations of the test for intervention planning. Regarding test reliability, the authors state, “Results based on tests having reliabilities of less than .80 should be viewed skeptically” (Wagner, et al., 1999, p. 56).
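Comment: As a worked illustration of the discrepancy analysis described above, a common approach is to ask whether the difference between two standard scores exceeds what measurement error alone would produce, using each score's SEM. The sketch below follows that generic textbook logic with made-up SEM values; it is not the CTOPP manual's own significance table.

```python
from math import erf, sqrt

def difference_significance(score_a, score_b, sem_a, sem_b):
    """Two-tailed z-test for the difference between two standard scores,
    given each score's standard error of measurement (SEM).
    A common textbook approach, not the CTOPP manual's own tables."""
    sem_diff = sqrt(sem_a ** 2 + sem_b ** 2)              # SE of the difference
    z = abs(score_a - score_b) / sem_diff
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(z / sqrt(2.0))))    # two-tailed p-value
    return z, p

# Hypothetical composites of 85 and 100 with made-up SEMs of 5 and 4 points.
z, p = difference_significance(85, 100, 5, 4)
print(round(z, 2), round(p, 3))   # ~2.34, 0.019 -> significant at the .05 level
```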
Standardization: Age equivalent scores, Grade equivalent scores, Percentiles, Standard scores, Stanines, Other (please specify): Composite scores are determined from the sums of standard scores.
Reliability:
The entire norming sample was used. An extensive description of the reliability calculations is given, and standard errors of measurement are provided. Impressive results are reported across subtests and composites (i.e., nothing below .70 on any reliability calculation). The authors state, “The magnitude of these coefficients strongly suggest that the CTOPP possesses little test error and that users can have confidence in the results” (Wagner, et al., 1999, p. 73).
Internal consistency of items: Except for the rapid naming subtests, which used alternate-form reliability (appropriate for “speeded tests”), all subtests were investigated using Cronbach’s coefficient alpha. Composite scores were investigated for internal consistency using Guilford’s formula (again excepting the rapid naming subtests, as noted). Associated SEMs for subtest and composite standard scores are presented (values of 1 and 2 for subtests; 4, 5, and 6 for composites).
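Comment: For readers unfamiliar with coefficient alpha, here is a minimal sketch of how it is computed in general. The response matrix is fabricated purely for illustration and has no connection to the CTOPP item data.

```python
import numpy as np

def cronbach_alpha(items):
    """Coefficient alpha for a persons-by-items matrix of item scores."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]                          # number of items
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the total scores
    return (k / (k - 1)) * (1.0 - item_vars.sum() / total_var)

# Fabricated 0/1 responses for six examinees on four items, illustration only.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(round(cronbach_alpha(responses), 2))   # about 0.67 for this toy matrix
```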
Results: “100% of the alphas for the CTOPP subtests reach .70; 76% attain .80…and 19% attain .90, the optimal level” (Wagner, et al., 1999, p. 68).
Test-retest: The Tallahassee study involved 91 participants in three age ranges (n=32 ages 5 through 7 years, n=30 ages 8 through 17 years, and n=29 ages 18 through 24 years) after a two-week interval. Correlation coefficients ranged between .68 and .97 for the 5 to 7 year range, .72 to .93 for 8 to 17, and .67 to .90 for 18 to 24 years.
Comment from the Buros reviewer: “Although the correlation coefficients are adequate, it is surprising that they are not stronger, particularly at the oldest age levels where one would believe that little development in phonological processing would occur between times of administration” (Hurford & Wright, 2003, p. 227).
Inter-rater: Two staff members at PRO-ED independently scored 30 protocols from 5- and 6-year-olds and 30 protocols from 7- to 24-year-olds. The resulting coefficient was reported to be .98. Comment from Buros reviewer: “Although this method assesses the likelihood of arriving at similar results when the protocols have already been completed, it does not directly assess interrater reliability. To adequately assess this type of reliability requires that two individuals independently administer the CTOPP to the same child, transcribe the responses that the individual provides, and then score the protocol. Although the authors report in the manual that the interrater reliability was .98, this is an incomplete, inadequate, and invalid evaluation of interrater” (Hurford & Wright, 2003, p. 227).
My comment: Isn’t this surprising, given the amount of resources that appear to have been dedicated to the development of this test? The upshot is that the reviewer nevertheless concludes that the test has “acceptable psychometric properties” (Hurford & Wright, 2003, p. 229).
Other: none
Validity:
Content: The authors provide an overview of the research on phonological skills in Chapter 1, “Rationale and Overview”, and in Chapter 7, “Validity of Test Results”, and they discuss the tasks, which were chosen based on those used in research studies to assess these skills. They also discuss the rationale for the format of the subtests and the selection of test items. Results of conventional item analysis and Item Response Theory (IRT) analysis are reported. Differential item functioning analysis was performed (see the section below).
Criterion Prediction Validity: The results of several studies are reported in which the CTOPP was compared with other measures:
1. Using a preliminary version of the test, CTOPP composites were compared with the Word Identification and Word Analysis subtests of the Woodcock Reading Mastery Test-Revised (WRMT-R) in a sample of 216 kindergarten children (53% girls, 26% minority students). After one year of kindergarten, coefficients with the WRMT-R composites were .71 for Phonological Awareness, .42 for Phonological Memory, and .66 for Rapid Naming. The same group, assessed after Grade One, showed stronger values: .80, .52, and .70, respectively (note the modest correlations for Phonological Memory).
2. A revised version of the CTOPP was used in a study of 603 public school students, approximately 100 per grade from kindergarten through Grade 5 (69% Euro-American, 28% African American, 3% other minority groups), in Tallahassee, FL. The criterion measures were the Word Identification and Word Attack subtests of the WRMT-R and the TOWRE Sight Word Efficiency and Phonetic Decoding Efficiency subtests. Partial correlations controlling for age between the CTOPP subtests and the criterion variables (a sketch of this computation follows the list) showed “moderate to strong correlations between CTOPP subtests and the criterion variables support[ing] the criterion-prediction validity of the test”. Results ranged from .25 to .74 (x=.48) (Wagner, et al., 1999, p. 92).
3. The final version of the CTOPP was used in a study of 164 students (approximately 25 each in kindergarten and Grades 2, 5, and 7, 40 in high school, and 22 in college) in what appears to be a university-affiliated school program in Florida. Demographic data were reported: 76% Euro-American, 20% African American, 4% other minority. Using the CTOPP and the WRMT-R Word Identification subtest, partial correlations controlling for age between the CTOPP subtests and the criterion variable of Word Identification were in the moderate to high range (.46 to .66).
4. Students with learning disabilities (n=73) participated in a study in which the CTOPP was administered on two occasions, with the Lindamood Auditory Conceptualization Test (LAC) administered on the first occasion. The WRMT-R Word Attack and Word Identification subtests, the Gray Oral Reading Tests-3rd edition Accuracy, Rate, and Comprehension subtests, and the WRAT-3 were also administered, along with a spelling test. Intervention was initiated and carried out between the two test occasions. “Partial correlations controlling for age and corrected for unreliability between CTOPP subtests, the phonological awareness criterion variable, and the reading and spelling criterion variables…correlations were large for the CTOPP phonological awareness subtests (Elision and Blending Words) and rapid naming subtests (Rapid Digit Naming, Rapid Letter Naming), but variable for the memory subtests (i.e., Memory for Digits, Nonword Repetition)” (Wagner, et al., 1999, pp. 94-95).
5. The CTOPP was correlated with the TOWRE in a study using the entire normative sample. Results indicated correlations “in the low to moderate range both for the kindergarteners and first graders, and for the second graders and older. The correlations are high enough to give support for the validity of CTOPP scores. The coefficients relative to the Phonological Awareness Quotient are particularly strong for the kindergarteners and first graders. The relationship between the CTOPP and TOWRE quotients is particularly encouraging in that it establishes the relationship between phonological processes and word reading fluency” (Wagner, et al., 1999, pp. 96-97). Note: While the CTOPP measures phonological abilities related to reading, the TOWRE measures reading ability.
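Comment: Several of the studies above report partial correlations controlling for age. As a minimal sketch of what that computation involves, a first-order partial correlation removes the shared association with age from both variables; the numbers below are made up for illustration and are not the studies’ data.

```python
from math import sqrt

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation between x and y controlling for z,
    computed from the three pairwise Pearson correlations."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Hypothetical values: a CTOPP subtest vs. Word Identification (r = .60),
# with each measure correlated with age (.50 and .40).  Illustration only.
print(round(partial_correlation(0.60, 0.50, 0.40), 2))   # -> 0.5
```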
Construct Identification Validity: The confirmatory factor analysis that was performed supports the construct validity of the CTOPP. “The model provided an excellent fit to the data…Comparative Fit Index (CFI) for the present model of .99 approached the maximum possible value of 1.00” (Wagner, et al., 1999, p. 98). Age differentiation: Coefficients were calculated and vary from low to high. The authors state, “These coefficients are high enough to demonstrate the developmental nature of the subtest’s contents. Because a relationship with age is a long-acknowledged characteristic of most phonological processing abilities, the data found…support the construct validity of the CTOPP” (p. 101). Group differentiation: Using mean standard scores for the entire normative sample and 8 identified subgroups, outcomes were as expected. Participants identified with speech/learning disabilities and language disabilities performed poorly. African American children scored slightly below average, but the authors defend this outcome by pointing to their earlier research demonstrating that “performance on measures of phonological awareness is more modifiable by environment/language/training experience than is performance of measures of memory or rapid naming” (p. 103).
Comment from the Buros reviewer: “One statement that the authors make regarding using group differentiation to support construct validity is somewhat inconsistent with their results from the differential item functioning analysis. The Delta scores indicated that there was no item bias for the CTOPP among African American vs. non-African American participants. However, there were differences between the standard scores of African American participants and other participants, some as large as a SEM from the mean. The authors claim that 'The differences among groups on measures of phonological awareness should not be taken as evidence of test bias, but rather as confirmation of the fact that differences in language experience in the home and neighborhood can have an impact on the development of phonological awareness in children' (manual, p. 103). Although this may be true, the evidence they use to support this claim is based on a study they performed that, most likely, included participants from the norming group. This renders the argument somewhat circular. As the authors note in the manual, the study of validity is a continuous process. This is one aspect of their validity study that warrants further investigation and verification” (Hurford & Wright, 2003, p. 228).
Differential Item Functioning: Item bias was examined for comparisons of males and females, European Americans and non-European Americans, African Americans and other races, and Hispanic and other ethnic groups. Two methods were used: logistic regression and delta scores. Using regression, with alpha set at .01, the authors and their colleague at PRO-ED found 25 suspect items, which were then eliminated from the final CTOPP. Using the second technique, correlation coefficients for delta scores between the groups ranged from .86 to .99 (x=.98). Therefore, differential item functioning was not evidenced.
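Comment: For context on the second technique, a delta score is a conventional transformation of an item’s difficulty (proportion correct) onto a scale with mean 13 and standard deviation 4, and correlating the two groups’ deltas across items is the usual delta-plot check. The sketch below is a generic illustration with fabricated difficulties, not the authors’ actual analysis.

```python
from statistics import NormalDist, correlation  # correlation needs Python 3.10+

def delta_score(p_correct):
    """ETS-style delta: 13 + 4 * z, where z is the normal deviate for the
    proportion of examinees missing the item (harder item -> larger delta)."""
    return 13.0 + 4.0 * NormalDist().inv_cdf(1.0 - p_correct)

# Fabricated proportions correct for six items in two groups (illustration only).
group_a = [0.90, 0.80, 0.70, 0.60, 0.50, 0.35]
group_b = [0.85, 0.78, 0.66, 0.55, 0.48, 0.30]

deltas_a = [delta_score(p) for p in group_a]
deltas_b = [delta_score(p) for p in group_b]

# A correlation near 1 means the items keep the same relative difficulty in
# both groups, i.e., little evidence of differential item functioning.
print(round(correlation(deltas_a, deltas_b), 3))   # close to 1 for these data
```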
Other: none. Comment: I think they’ve exhausted themselves.
Summary/Conclusions/Observations:
Both Chapters 1 and 7 are valuable for clinicians wanting to have an overview of phonological skills, their theoretical basis, what processes are involved, how to assess, and implications. Given the implications of the student’s test performance in terms of academic achievement in reading, I think that clinicians need to be thoroughly knowledgeable about the information in this manual. The authors have done a very good job of describing why it is important to test phonological processing skills, how this is done, and how to interpret the findings.
My guess is that the average clinician might need to be encouraged to read the entire manual but it is worthwhile to do so.
Despite the reservations about reliability noted, overall the reviewers favour this test for its use of theory, task construction, and sound psychometric foundations. I concur.
For the purpose of TELL development, I think that this test provides an excellent example of a thorough and detailed manual. However, I really had to concentrate to get through the evidence for validity. I finally had to sketch out the various studies in order to understand the range of studies conducted.
Comment from the Buros reviewer: “One minor criticism of the manual concerns the authors' overly cautious remarks regarding interpretation of the results. This overly cautious attitude is present in several places within the manual. These remarks are most likely related to the authors' backgrounds in research. As researchers, the authors were trained to avoid statements that were not totally grounded in the data and to eschew making Type I errors (indicating there are differences, difficulties, or relationships that do not, in reality, exist). Given the massive amount of research examining the relationship between phonological processing and reading and language difficulties, the cautions are not warranted. Finally, the psychometric properties of the CTOPP are appropriate for making rather strong statements concerning one's phonological processing abilities” (Hurford & Wright, 2003, p. 229).
Clinical/Diagnostic Usefulness:
Yes, to this test. I would feel confident that I had identified a student in need, and I would feel supported in reporting results as well as in planning intervention.
References
American Psychological Association (1985). Standards for educational and psychological tests. Washington, DC: Author.
Hurford, D., & Wright, C. (2003). Test review of the Comprehensive Test of Phonological Processing. In B. S. Plake, J. C. Impara, & R. A. Spies (Eds.), The fifteenth mental measurements yearbook (pp. 226-232). Lincoln, NE: Buros Institute of Mental Measurements.
U.S. Bureau of the Census (1997). Statistical abstract of the United States (117th ed.). Washington, DC: U.S. Department of Commerce.
Wagner, R., Torgesen, J., & Rashotte, C. (1999). Comprehensive test of phonological processing. Austin, TX: Pro-Ed.
To cite this document:
Hayward, D. V., Stewart, G. E., Phillips, L. M., Norris, S. P., & Lovell, M. A. (2008). Test review: Comprehensive test of phonological processing (CTOPP). Language, Phonological Awareness, and Reading Test Directory (pp. 1-10). Edmonton, AB: Canadian Centre for Research on Literacy. Retrieved [insert date] from http://www.uofaweb.ualberta.ca/elementaryed/ccrl.cfm