Why Use Tests and Assessments? Questions and Answers

John Fremer & Janet E. Wall

The terms assessment, measurement, and testing will be used heavily in this book. Although the terms are often used interchangeably, there are some distinctions between them. Testing, generally considered to be the most narrow or specific of the terms, tends to refer to a set of questions that has been compiled to measure a specific concept such as achievement or aptitude. Assessment is broader in scope; it encompasses testing, but can also include measurement via observations, interviews, checklists, and other data gathering instruments. The term assessment is used more often in the clinical setting or for determining preferences, interests, and personality types. The term measurement generally refers to the attempt at quantifying the results of tests and assessments.

This chapter will outline the purpose of testing and assessment, focus on uses, and highlight some of the limitations of all forms of testing.

The concept of testing is one of the major contributions of the field of psychology to society. Carefully developed tests, when used wisely, provide valuable information for decision makers in educational, employment, and clinical settings. It is because of their often‐demonstrated utility that tests and other standardized assessments are so widely used in educational settings. In order to gain the potential benefits that tests offer, it is essential to be aware of their strengths and their limitations. In this chapter, we review these key aspects of high‐quality testing:

• What is a test or assessment?
• What are the major uses of tests?
• What are the key benefits of systematic, high‐quality testing?
• What are the frequent criticisms of testing?
• How can we promote high‐quality testing?

What Is a Standardized Test or Assessment?

During the medieval period in Europe, skilled craftsmen who were members of a guild carried with them symbols of their trade.
We do not have many examples of that practice now, but the stethoscope around a doctor’s neck, the chalk in the hands of a teacher, or the tool belt of a carpenter or telephone line worker all bring to mind that person’s line of work. What might a tester carry to signal his or her professional role? It could be a copy of the Iowa Test of Basic Skills or the Florida Comprehensive Assessment Test (FCAT). Perhaps the Myers‐Briggs Type Indicator (MBTI) or the Minnesota Multiphasic Personality Inventory (MMPI). What about a driver’s test or a military entrance exam? Yet other options could be an SAT‐I, advanced placement test, or a copy of an ACT Assessment.

Basically, testing is a special way of collecting information used to help make decisions about individuals, programs, or institutions. Tests and assessments are generally made up of items or questions that elicit responses from an individual. It is important to note that merely administering some set of questions or performance tasks is only one part of the testing process. If the tests are never scored and the results never used, we have done only part of what is needed. Yet there are instances ranging from the individual classroom level to nationwide assessment where tests are given and little use is ever made of the information. In order for actual measurement to take place as part of testing, one or more of the following steps must take place:

• An individual or group must receive a score along with some guide to interpreting that score.
• The individual or group must be ranked against others who have been tested.
• The individual or group must be classified into some meaningful category; for example, “gifted,” “shows some evidence of obsessive behavior,” “merits a personal interview,” or “needs further evaluation.”
• The performance of the individual or group must be compared against some explicit standard.
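The four steps listed above can be illustrated with a short sketch. This is only an illustration under stated assumptions: the function name `interpret_score`, the reference group, the 25th-percentile threshold, the category labels, and the cut score are all hypothetical and are not drawn from any actual testing program.

```python
from bisect import bisect_left


def interpret_score(score, reference_scores, cut_score):
    """Express one raw score in the four ways the list above describes."""
    ordered = sorted(reference_scores)
    # Rank against others: percentage of the reference group scoring lower.
    percentile = 100.0 * bisect_left(ordered, score) / len(ordered)
    # Classification: hypothetical threshold and category labels.
    if percentile < 25:
        category = "needs further evaluation"
    else:
        category = "no follow-up indicated"
    return {
        "score": score,                        # score on the raw scale
        "percentile_rank": percentile,         # rank against the group
        "category": category,                  # classification
        "meets_standard": score >= cut_score,  # comparison with a standard
    }


# Hypothetical reference group of ten raw scores (not real data).
reference = [12, 15, 18, 20, 21, 23, 25, 27, 28, 30]
report = interpret_score(23, reference, cut_score=22)
print(report)
```

In this sketch a single administration yields all four kinds of information at once. In practice a testing program may supply only one or two of them, which is still enough for measurement to have taken place.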
Most instances of testing very clearly meet one or more of these criteria: An individual who takes a required test receives a score on a well‐defined scale and also receives a good deal of comparison information and an interpretive guide. In other instances the situation is not so straightforward. For example, a teacher asks the class to answer a set of questions and to send in electronic or paper responses. The teacher reads all the responses and makes a judgment as to how well the group as a whole has learned the material covered by the questions. Has measurement taken place? Yes, for the class as a unit, but no for the individuals, if the teacher has not classified their responses in any way. In real life, of course, the teacher may recall the specific responses of some students and either confirm or change his or her perception of their level of understanding. For that subset of students the testing process has actually led to measurement.

The issue, “What is measurement?” is reviewed by Jones (1971), who notes that although “unanimity concerning the meaning of measurement may appear unlikely ... each measurement is purposive ... and the purpose is always ... to acquire information” (p. 335).

What Are the Major Uses of Tests?

We have maintained that the basic purpose of tests is to provide information for decision makers. In the last section we made the case that the process must also include assigning a score, rank, or classification of some type. We now want to describe five major uses of test results, as follows:

• selection or placement
• diagnosis
• accountability evaluations
• judging progress and following trends
• self‐discovery

Selection or Placement

The use of tests to help select individuals for admissions to an institution or special program is so widespread that it is perhaps best described as a standard feature of U.S. society.
Entrance examinations are used as early as entrance into kindergarten and with increased frequency as the student moves up the grades and into college and a profession. Usually test information is combined with grades to make decisions; tests are also frequently used at the college level to grant exemption from or credit for college courses taken while a student is still attending high school (Willingham, Lewis, Morgan, & Ramist, 1990).

When using a test to help make selection or placement decisions, it is essential that the decisions made be of higher quality when the tests are used than when they are not. If students are being accepted for admission to a college, for example, the group that is admitted should perform better than the group that would have been chosen without the use of tests.

How might we determine whether tests had improved our process for selecting college students? We could look at overall grade point average, grades in specific courses, record of successful completion of the freshman year, or persistence to graduation of the students who were admitted. Each of these criteria has been employed to evaluate the value of college admissions tests. Most often, though, it is freshman grade point average (FGPA) that is employed in studies of the value of the SAT I and SAT II and of the ACT Assessment. FGPA is routinely determined by virtually all colleges, so it is an easy bit of criterion information to obtain.

Many thousands of studies of the value of college admissions tests have yielded consistent results. For most colleges, high school grades are the best predictor of college grades (Donlon, 1984). For many colleges, though, admissions test scores are the best single predictors. The most common practice is to use both test scores and high school grades.
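One simple way to examine whether test scores add to the prediction of FGPA is to correlate each predictor, and a combination of the two, with FGPA for an admitted class. The sketch below assumes hypothetical records for eight students and an equal-weight composite of standardized predictors. Published validity studies typically rely on regression with much larger samples, so this shows only the general idea, not the method of any study cited here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def zscores(xs):
    """Standardize values so predictors on different scales can be added."""
    n = len(xs)
    m = sum(xs) / n
    sd = sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]


# Hypothetical records for eight admitted students (not real data).
hs_gpa = [3.9, 3.4, 3.7, 2.9, 3.2, 3.8, 3.0, 3.5]
test_score = [1350, 1180, 1210, 1020, 1250, 1300, 1100, 1150]
fgpa = [3.7, 3.0, 3.3, 2.5, 3.1, 3.6, 2.7, 3.0]

# Assumed equal weighting of the two standardized predictors.
composite = [g + t for g, t in zip(zscores(hs_gpa), zscores(test_score))]

validity = {
    "hs_gpa": pearson(hs_gpa, fgpa),
    "test_score": pearson(test_score, fgpa),
    "both_equal_weight": pearson(composite, fgpa),
}
for name, r in validity.items():
    print(f"{name}: r = {r:.2f}")
```

If the correlation for the combined predictor exceeds that for high school grades alone, that is one piece of evidence that using the test improves selection decisions. The comparison is only as trustworthy as the sample of students behind it.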
Increasingly, colleges are looking at all available information about students, including recommendations, personal essays, past accomplishments, community service record, and other evidence of a student’s potential for college achievement and for subsequent contributions to society. In a classic work on this topic, Willingham & Breland (1982) point out that although “some personal qualities are related to success, some have intrinsic merit in their own right and some are demonstrably related to important institutional objectives” (p. 3).

Whereas a great deal of study has been devoted to evaluating the strengths and limitations of college admissions tests, much less attention has been given to other uses for tests in educational selection and placement settings. It is very common for one or more tests to be used to select students for gifted and talented or special education programs. Ideally the managers of such programs would first define the student characteristics that each program is designed to nurture and develop. Then they would develop selection procedures to choose the most appropriate students for the program. Some combination of prior academic work, teacher recommendations, and test results will typically be most effective in the selection process. Whatever approach is used, the results should be carefully evaluated to make sure that all the information used is making the desired contribution to selecting the group of students for whom the program will be most effective.

Diagnosis

Tests are also used extensively to evaluate students’ special needs. Test results help educators, counselors, and other professionals plan individualized education programs for students or point out specific misconceptions or problem areas that are hindering progress. Often tests help determine the need for counseling services, especially when students are experiencing high personal stress or engaging in substance abuse or other harmful and dangerous behaviors.
The home and workplace are other contexts where physical and psychological problems occur for which tests are often part of the solution. Some of the tests used in diagnostic settings in education measure basic academic skills and knowledge. Has a child mastered basic linguistic and mathematical content? If not, what are the child’s areas of strength and weakness? Often a classroom teacher will ask for special diagnostic testing for a child who is not keeping up with other students or not responding to the teaching methods being employed.
