Subtopic High Stakes Accountability Assessment

Join Together: A Nationwide On-Line Community of Practice and Professional Development School Dedicated to Instructional Effectiveness and Academic Excellence within Deaf/Hard of Hearing Education

Objective 2.4 – Assessment

Submitted by: Topical Team Leaders – John Luckner, Ed.D. and Sandy Bowen, Ph.D.

Subtopic – High Stakes Accountability Assessment

Rationale

In the past, high stakes often referred to a high-rolling card game in which large sums of money were bet. The winner stood to gain a lot and, conversely, the loser would lose a lot. The phrase high stakes testing is widely used in education today. Generally, the phrase refers to group achievement tests used to assess accountability of students and school systems (Overton, 2003). The accountability comes out of the education reform movement, the No Child Left Behind Act (NCLB, 2001), and the Individuals with Disabilities Education Act (IDEA, 1997).

The high stakes in education allude to the direct consequences of test scores. The consequences of high stakes testing can affect individuals, schools, school districts, and states. The consequences can be positive, such as teachers receiving additional pay in recognition of excellence in teaching. The consequences can also be negative, such as published “report-card” grades for each school based on test results. Some consequences directly impact students, including tracking, promotion, retention, and graduation. Other consequences may include rewards or sanctions given to schools or districts. High stakes testing continues to evolve as educators and politicians see intended and unintended ramifications of the testing and reporting.

The issue of high stakes testing is highly controversial. Writers and researchers seem to agree that accountability is important to ensure that students are held to high standards and are progressing through the curriculum (Salvia & Ysseldyke, 2001; Thurlow, 2003). However, there are concerns about most aspects of the testing. Specific concerns include the tests’ content validity, nonalignment with state standards, inconsistency in administration of the tests, and errors in scoring, data collection, and analysis (Salvia & Ysseldyke, 2001; Thurlow, 2003).
It is also important to note that many researchers and organizations have long protested the use of the scores from one instrument as the sole or primary basis for important decisions (American Educational Research Association, 2000; International Reading Association, 1999; Moores, 2000; Randall, McAnally, Rittenhouse, Russell, & Sorensen, 2000). In addition, there are documented instances of teachers or administrators cheating in attempts to improve students’ scores (Heubert, 2003).

Assessments for Students who are Deaf or Hard of Hearing

Generally, students who are deaf or hard of hearing are required to take the same high stakes tests that everyone else is taking. Unfortunately, few of the tests are normed for students who are deaf or hard of hearing, making them essentially invalid measurements (Salvia & Ysseldyke, 2001). However, since most students who are deaf or hard of hearing attend school in general education classrooms and high stakes tests are the current measuring sticks in most states, it is necessary to use the same measuring stick. One multiple-skill norm-referenced test series that has been normed using a sample of students who are deaf or hard of hearing is the Stanford Achievement Test series (SAT) by Harcourt Brace Educational Measurement (2002). This test is the most widely chosen test for state schools for the deaf and large programs serving students who are deaf or hard of hearing. Currently, no one is analyzing and reporting the results of the Stanford Achievement Test for students who are deaf or hard of hearing. Since the majority of students who are deaf or hard of hearing are receiving most of their education in general education classes in public schools (U.S. Department of Education, 2002), the high stakes testing experience for these students mirrors the experience of other students across the nation. This results in a lack of clear data about the experiences or test scores of students who are deaf or hard of hearing. There is concern about including such students in the testing pool due to tests not being designed for or normed on this population.

While there are no studies about the accommodations provided for students with hearing loss in high-stakes testing situations, Thurlow (2003) reports a number of concerns regarding accommodations for students with disabilities and English language learners. These concerns include a tendency to over-accommodate, poor criteria for determining accommodations, and inappropriate use of accommodations (or lack thereof) as a way of excluding student scores from reporting or accountability systems. The State Accountability for All Students project has studied the policies and accommodations provided for students with disabilities on state high-stakes assessments (http://www.ssco.org/saas/prelim.findings.march04.pdf). They report that there is too much variability in the provision of accommodations and that students are often under- or over-accommodated. Some states delete the scores of students receiving any accommodations while others include those scores. An interesting sidelight in this study is their finding that in states where a wide scope of accommodations was allowed, more students with disabilities were included in the high-stakes tests.

Fears and Agreement

There is collective alarm and fear in the field of deaf education that high stakes tests may have a negative impact on students who are deaf or hard of hearing (Johnson, 2001, 2004; Moores, 2000, 2001; Mounty, 2001; Randall et al., 2000; Steffan, 2004). Nevertheless, while many voice their fears about the negative aspects of high-stakes testing for students who are deaf and hard of hearing, researchers and leaders also concede a need to be involved. Indeed, Johnson (2001) says this is a sort of “triumph” for students who are deaf or hard of hearing – to be included in the general curriculum and assessment after years of being excluded! As Claire Bugen, superintendent of the Texas School for the Deaf, said at the 2003 national conference of the Conference of Educational Administrators for the Deaf (CEASD), “NCLB challenges us all to become more accountable. We want to know if schools are doing their jobs and children are learning” (cited in Hanson, 2003, p. 27). Similarly, Steffan (2004) expressed the common belief among educators of the deaf and hard of hearing that students should be included in the general curriculum and state testing, but there are still so many “unanswered questions” that this is a fearsome prospect.

Overriding Principles in Assessment

The use of multiple measures of a student’s achievement for progress monitoring is a universal principle in assessment (American Educational Research Association, 2000; International Reading Association, 1999; National Association of the Deaf, 2002; Quenemoen, Thurlow, Moen, Thompson, & Morse, 2003; Salvia & Ysseldyke, 2001). When the outcomes of a sole test carry such deep and far-reaching consequences, it is all the more important for additional sources of information to be added to those outcomes. Martin (2001) argued that “supplemental measures” should be used in addition to high-stakes state assessment. Randall and colleagues suggested using a School Career Portfolio including five areas: (a) life experiences, (b) rigor of program of study, (c) grade point average, (d) content area projects and performances, and (e) standardized test scores (Randall et al., 2000). Heubert and Hauser (1999) explain that when test use is inappropriate, it can weaken the quality of education and reduce the “equality of opportunity.” This understanding accentuates the magnitude of the practical consequences of high-stakes testing for students who are deaf and hard of hearing. Research in this area is desperately needed, but there are some clear messages from long-time leaders and researchers that we can draw from for guidance in remedying the one-test travesty. One portion of the remedy requires examining and adjusting test design and development.

Test Development and Design

One of the first areas to scrutinize is each test’s alignment with the state standards. For example, many reading tests tend to assess student decoding and reading comprehension skills to the exclusion of any other reading benchmarks and standards. Having a wider scope of knowledge and skills to assess could also lead toward multiple measures of skills, which is the first and most important principle in best practices of assessment. Simply stated, the complexities of reading cannot be “captured” in current large-scale assessments (Thompson, Johnstone, Thurlow, & Clapper, 2004). Due to the nature of the difficulties with high-stakes testing in each state, Popham (2004) even went so far as to suggest at the National Task Force on Equity in Testing Deaf Individuals’ conference that states need to analyze their standards to be sure they are worthy of testing. He emphasizes that states should scrutinize each standard and its underlying benchmarks to ensure that they are indeed measurable and teachable.

Another remedy for the ills of high-stakes testing is Universal Design (UD). Because it is based on principles that make content accessible to all learners, UD has the potential for erasing barriers that many test directions and items present to students who are deaf and hard of hearing.

Often, when educators express concerns about typical high-stakes tests, it is because the tests use a multiple-choice format. While this seems a good format because “the answer” is provided along with other options, there are concerns about such items for students with hearing loss (Martin, 2001). Often the language in multiple-choice tests is atypical of daily language and is specific to testing. This kind of specialized language is often unfamiliar to students who are deaf and hard of hearing. Vocabulary can be challenging because low-frequency meanings are often preferred in tests in place of high-frequency meanings when students are given word choices. Test items also often use complex or confusing grammatical structures, which are difficult to comprehend even for students with good English skills. If the test items were rewritten with straightforward grammatical structures and common vocabulary, the test would more likely examine the knowledge and skill of the student rather than the student’s ability to understand the test itself.

The process of improving assessment for students with disabilities should begin with aligning standards with the assessment tools and implementing UD in designing new tests and adjusting old tests (Thompson et al., 2004). If some of the general concerns about the tests were addressed so that test design and development incorporated the principles of UD, and state standards were scrutinized and the tests thoroughly aligned with them, the tests themselves would be much better assessments of student learning and progress. However, even if UD is utilized and alignment with standards occurs, students who are deaf or hard of hearing will still need adjustments in order to be able to best show their academic levels.

The Accommodation Tangle

Another enormous area of consternation is the use of accommodations in the testing situations. As Thompson and colleagues (2004) described, research findings are mixed about the effects of the use of accommodations and the appropriate accommodations to use. In addition, the use of test accommodations is a very controversial topic within the research community. While the topic of accommodations causes controversy among researchers and experts rightly concerned about the validity of results of tests, there should also be concern about giving tests to students when the students do not have cognitive or linguistic access to the directions or the test items. There is no question that much needs to be researched in this area. In the meantime, students who are deaf or hard of hearing should not suffer from the very real negative consequences of low test scores.

What We Can Do

Students who are deaf or hard of hearing receive access to the general education curriculum through appropriate accommodations, and are able to progress in their classes with the individualized use of strategic accommodations. Some of these accommodations should occur prior to the actual test dates. LaSasso (1999) recommended that formal and specific test-taking strategies become part of the curriculum for students who are deaf or hard of hearing. She suggested that when students receive formal instruction and practice with the various types of test items, formats, testing conditions, and the types of information requested, it is likely the tests will be a better reflection of the students’ learning.

Chaleff and Toranzo (2000) reported on a test-taking training program adopted at one school for the deaf. Each class in the program had custom-made goals based on previous SAT results and practice test materials. The trainers used the Test Best (1998) booklet and teacher-made practice materials to target skills development in four areas: (a) practical information, (b) developing test-taking behaviors, (c) applying reading strategies, and (d) learning about the influence of tests.

Another crucial step prior to test time is to ensure that each student’s Individualized Education Program (IEP) includes specific accommodations that the student will use during high-stakes tests. Generally, it makes sense for students to have the same accommodations in testing as they have in the classroom so they are best able to show what they know.

Several common accommodations are recommended for use with students who are deaf or hard of hearing when they take high-stakes tests (Anderson et al., 2001; Mounty, 2001):

• Extra time
• Interpreters
    o For directions only
    o For test items/content interpretation
• Alternate test forms that exclude phonics and music items
• Assistive Listening Devices (ALDs)
• Print directions to accompany the oral presentation of directions
• Demonstration of the directions to accompany the oral presentation of directions

In general, it is recommended that any accommodations and equipment provided to students to achieve access and academic success in their classes should also be provided to them in the testing environment.

A Piece of the Data Pie

As previously mentioned, there is no one collecting national data on high-stakes testing of students with hearing loss. However, there is one small study that sheds some light on the current state of affairs. In 2003, the National Center on Low-Incidence Disabilities (NCLID) collected data disaggregated by disability category from states’ high stakes testing data for 2001, 2002, and 2003 (http://nclid.unco.edu/outcomes). Most telling was the large amount of data that was not available. Only 13 states had the data disaggregated by disability area. These data indicate that students who are deaf or hard of hearing are not testing as well as other students on the state tests. When compared to the scores of all students, the maximum mean scores for students with hearing loss were lower than the lowest mean scores for all students in five of seven grade levels in math and in seven of eight grade levels in reading. In summary, in none of those states did the mean scores for students with hearing loss approach the mean scores for students without disabilities. It should be noted that many data specialists were reluctant to share their data with the NCLID.

While acknowledging that the information paints a bleak picture, and acknowledging that the high-stakes testing process is fraught with peril and errors, it still seems imperative to have the data disaggregated by disability category in order to know how these students are faring. In other words, if high stakes test scores are the way that schools, programs, and states are measuring their success or failure at educating students, then students who are deaf or hard of hearing should be included in such testing and reporting. In addition, it is important to disaggregate the disability data by category so programs and services can also be evaluated and improved.

References

American Educational Research Association (2000). Position statement concerning High Stakes Testing in PreK-12 Education. July, 2000. Retrieved June 3, 2004, from http://www.aera.net/about/policy/stakes.htm

Anderson, C., Boyd, B., Brecklein, K., Dietz, C., Gibson-Harmon, K., & Ishman, S. (1997). Basic academic preparation: A report of the National Task Force on Quality of Services in the Postsecondary Education of the Deaf and Hard of Hearing Students. Rochester, NY: Northeast Technical Assistance Center, Rochester Institute of Technology.

Chaleff, C., & Toranzo, N. (2000). Helping our students meet the standards through test preparation classes. American Annals of the Deaf, 145, 33-40.

Hanson, D. (2003, Summer). No child left behind: What will it take? CSD Spectrum, pp. 24-28.

Harcourt Brace Educational Measurement. (2002). Stanford Achievement Test (10th ed.). San Antonio, TX: Psychological Corporation.

Heubert, J.P. (2003). High stakes testing in a changing environment: Disparate impact, opportunity to learn, and current legal protections. In S.H. Fuhrman & R.F. Elmore (Eds.), Redesigning accountability systems for education (pp. 220-242). Teachers College: Columbia University.

Heubert, J.P., & Hauser, R.M. (Eds.) (1999). High stakes: Testing for tracking, promotion, and graduation. Washington, DC: National Academy Press.

Individuals With Disabilities Education Act Amendments of 1997, Pub. L. No. 105-17, 111 Stat. 37 et seq. (1999).

International Reading Association (1999). High Stakes Assessment in Reading. Retrieved June 3, 2004, from http://www.reading.org/positions/high_stakes.html

Johnson, R. C. (2001, Summer). High-stakes testing and deaf students: Some research perspectives. Odyssey, 2(3), 18-23.

Johnson, R.C. (2004). Educational reform meets deaf education at a national conference. Sign Language Studies, 4, 99-117.

LaSasso, C.J. (1999). Test-taking skills: A missing component of deaf students’ curriculum. American Annals of the Deaf, 144, 35-43.

Martin, D.S. (2001). Multiple-Choice Tests: Issues for Deaf Test Takers. [Issue Brief]. National Task Force on Equity in Testing Deaf Individuals. Gallaudet University: Washington, DC.

Moores, D.F. (2000). High stakes testing: Are the stakes too high? [Editorial]. American Annals of the Deaf, 145, 235-236.

Moores, D.F. (2001). Testing…3, 2, 1. [Editorial]. American Annals of the Deaf, 146, 243-244.

Mounty, J.L. (2001). Standardized testing: Considerations for testing deaf and hard of hearing candidates. [Issue Brief]. National Task Force on Equity in Testing Deaf Individuals. Gallaudet University: Washington, DC.

National Association of the Deaf (2002). NAD Position Statement on High-Stakes Assessment and Accountability. Retrieved June 3, 2004, from http://www.nad.org/infocenter/newsroom/positions/hsaa.html

No Child Left Behind Act of 2001. 20 U.S.C. § 6301 et seq. (2001).

Overton, T. (2003). Assessing learners with special needs: An applied approach (4th ed.). Upper Saddle River, NJ: Pearson Education.

Popham, W.J. (2002, November). High stakes tests: Harmful, permanent, fixable. Presentation at the High Stakes Testing conference at Gallaudet University, Washington, DC. Retrieved June 3, 2004, from http://gri.gallaudet.edu/TestEquity/

Quenemoen, R., Thurlow, M., Moen, R., Thompson, S., & Morse, A.B. (2003). Progress monitoring in an inclusive standards-based assessment and accountability system (Synthesis Report 53). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved July 6, 2004, from http://education.umn.edu/NCEO/OnlinePubs/Synthesis53.html

Randall, K., McAnally, P., Rittenhouse, B., Russell, D., & Sorensen, G. (2000). High stakes testing: What is at stake? [Letter to the Editor]. American Annals of the Deaf, 145, 390-393.

Salvia, J., & Ysseldyke, J. E. (2001). Assessment (8th ed.). Boston: Houghton Mifflin.

Steffan, R. C., Jr. (2004). Navigating the difficult waters of the No Child Left Behind Act of 2001: What it means for education of the deaf. American Annals of the Deaf, 149, 46-50.

Test best on the Stanford Achievement Test ninth edition. (1998). Austin, TX: Steck Vaughn.

Thompson, S.J., Johnstone, C.J., Thurlow, M.L., & Clapper, A.T. (2004). State literacy standards, practice, and testing: Exploring accessibility (Technical Report 38). Minneapolis, MN: University of Minnesota, National Center on Educational Outcomes. Retrieved July 7, 2004, from http://education.umn.edu/NCEO/OnlinePubs/Technical38.htm

Thurlow, M. L. (2003). Biting the bullet: Including special-needs students in accountability systems. In S.H. Fuhrman & R.F. Elmore (Eds.), Redesigning Accountability Systems for Education. Teachers College: Columbia University.

U.S. Department of Education (2002). To assure the free appropriate public education of all children with disabilities. Twenty-fourth annual report to Congress on the implementation of the Individuals with Disabilities Education Act. Washington, DC: Author.
