Reliability and Validity of a Test and Its Procedure

Total Page:16

File Type:pdf, Size:1020Kb

Reliability and Validity of a Test and Its Procedure View metadata, citation and similar papers at core.ac.uk brought to you by CORE 新潟国際情報大学 情報文化学部 紀要 RELIABILITY AND VALIDITY OF A TEST AND ITS PROCEDURE CONDUCTED AT A JAPANESE HIGH SCHOOL Paul Bela Nadasdy Abstract This paper analyses the validity and reliability of an English listening test being used at a private Japanese high school. Through an analysis of the test results,an attempt was made to make salient the qualities and deficiencies of the test and its procedure. The test’s reliability was analysed using a split-half method measuring the coefficient of internal consistency. The split-test’s coefficient results suggested that there was a certain amount of unreliability between the two halves of the test. Although the reliability was below an acceptable level,calculations using the Spearman-Brown formula suggested the possibility of higher coefficiency. Regarding construct,content,criterion-related,and face validity the test appeared valid. Key words:TESTING,VALIDITY,RELIABILITY,HIGH SCHOOL . Introduction Though some have argued whether testing is actually necessary at all, it is generally agreed that it is the most practical way to monitor and systematically rank students. And as tests remain the most popular way to grade students fairly,the quality of their production would seem vital. For test efficiency,validity and reliability need to be present. And as these two condi- tions are important for the effectiveness of testing, it is generally accepted that we can achieve a precise evaluation of our students if they are both consistent. Unsurprisingly, however,the variables that exist in measuring both reliability and validity in tests at times produce a range of results. This paper starts with an analysis of testing in general and of how the examination of validity and reliability is used as a means of quality control in test production. This is followed by an analysis of a listening test that is being used in a high school in Japan. Quantitative and qualitative results are analysed to ascertain whether it is reliable and valid, and this followed by a evaluation of its overall effectiveness. .Review of literature Analysts who have made important contributions within the realm of testing include Paul Bela Nadasdy[情報文化学科] ― 23 ― Oller (1979), Hughes (1989), Bachman (1990), Spolsky (1985), Messick (1996), Fulcher (1997), Cohen et al.(2000),and Chapelle(1999,2003). In defining testing and its usefulness Bachman states that “language tests are indirect indicators of the underlying traits in which we are interested”(1990:33). Davies(1990),Hughes(1989),and Baker(1989)refer to tests in the way that they help us to acquire information,act as a procedure for problem solving,and act as a decision making procedure (Owen 1997:2). Owen also offers an endorsement of testing in that instructors need to monitor student progress independently,which opposes the possibility of inaccurate and biased self-assessment (1997:5). Owen further defines possible motivations for tests in language learning explaining that they assist in ranking students, assist in gauging whether students are able to cope with certain language forms,help us to observe whether learning has been achieved, give useful information relating to forecasting future developments in student performance,and help us to refine what we are teaching and testing. Furthermore, testing can also contribute to establishing whether certain entities are effective such as teachers, schools and teaching methods in comparing them against one another. Among these positive endorsements,Owen also suggests that tests act as a means of control and motivation of our students. However, some commentators draw our attention to the negative reputation that tests have within the teaching community. For example,Hughes (2003)refers to the“mistrust”educators have of tests and testing in general. Validity in testing Two areas should be considered when discussing validity in testing : 1.Consider how closely the test performance resembles the performance we expect outside the test. 2.Consider to what extent evidence of knowledge about the language can be taken as evidence of proficiency. (Owen 1997:13) Referring to the importance of validity in tests, Cohen et al. (2000)state that effective research is impossible or even“worthless”without the presence of validity(2000:105),though they do recommended against aiming for absolute validity. Instead they define the search for validity as being one of minimizing invalidity, maximizing validity, and therefore using measurement in validity as a matter of degree rather than a pursuit of perfection (2000:105). Owen (1997)citing Baker (1989)also considers the accuracy and proficiency of testing and how we evaluate individuals: It is quite useful for understanding tendencies in testing, but...it seems less easy actually to allocate particular tests to one cell rather than another, and...it is not easy to separate ― 24 ― 新潟国際情報大学 情報文化学部 紀要 knowledge of system as a counterpoint to performance from knowledge of a system as indirect evidence of proficiency. (Owen,1997:17) 3.1 Construct, content, criterion-based, and face validity Several categories exist for validity. The following four categories are described by Hughes (1989)and Bachman (1990),these being construct validity,content validity(included within this are internal and external validity),criterion-based validity,and face validity. 3.1.1 Construct validity Construct validity is concerned with the level of accuracy a construct within a test is believed to measure (Brown 1994:256;Bachman & Palmer 1996)and,particularly in ethno- graphic research,“must demonstrate that the categories that the researchers are using are meaningful to the participants themselves”(Cohen et al 2000:110). 3.1.2 Content validity Content validity is concerned with the degree to which the components of a test relate to the real-life situation they are attempting to replicate(Hughes 1989 :22;Bachman 1990:306) and is relevant to the degree to which it proportionately represents. Within the domain of content validity are internal validity and external validity. These refer to relationships between independent and dependent variables when experiments are conducted. External validity occurs when our findings can be related to the general populous, whereas internal validity is related to the elimination of difficult variables within studies. 3.1.3 Criterion-related validity Criterion-related validity “(relates)the results of one particular instrument to another external criterion”(Cohen et al. 2000:111). It contains two primary forms, these being predictive and concurrent validity. Concerning predictive validity, if results from two separate but related experiments or tests produce similar results the original examination is said to have strong predictive validity. Concurrent validity is similar but it is not necessary to have been measured over a span of time and can be “demonstrated simultaneously with another instrument”(2000:112). 3.1.4 Face validity This term relates to what degree a test is perceived to be doing what it is supposed to. In general,face validity in testing describes the look of the test as opposed to whether the test is proved to work or not. 3.2 Messick’s framework of unitary validity Messick’s (1989) framework of unitary validity differs from the previous view which ― 25 ― identifies exclusively content validity,face validity,construct validity,and criterion-related validity as its main elements. Messick considers these sole elements to be inadequate and stresses the need for further consideration of complementary facets of validity, and in particular the examination of scores and construct validity assessment as its key features. Six aspects of validation included in Messick’s paradigm provide “an integrated evaluative judgment of the degree to which empirical evidence and theoretical rationales support the adequacy and appropriateness of inferences and actions based on test scores”(Messick 1989 : 13 cited in Bachman 1990:236). These elements are judgmental/logical analysis, which is concerned with content rele- vance,correlation analyses,which utilizes quantitative analyses in interpreting test scores to gather evidence in support of the test scores,analyses of process,which involves the inves- tigating of test taking,analyses of group difference and change over time,which examines to what extent score properties generalize across population groups, manipulation of tests and test conditions,which is concerned with gathering knowledge about how test intervention affects test scores, and test consequences, which examines elements that affect testing including washback, consequences of score interpretation, and bias in scoring (Bachman 1990;Messick 1996). 3.3 Testing outcomes Considering the above framework defining validity in testing, we need to consider the importance of determining what is appropriate for our students and teaching situations as well as on a larger scale. The importance of analysis in low-stakes testing could be signifi- cant if one considers how data can be collected from the source and used productively. Regarding Chapelle’s(2003)reference to Shepard(1993)in that the primary focuses are testing outcomes and that “a test’s use should serve as a guide to validation”(2003:412),suggests we are in need of a point from where to start our validation analysis from. Chapelle also cites that “as a validation argument is ‘an argument’rather than a ‘thumbs up/thumbs down’ verdict”(Cronbach cited in Chapelle 2003), we start to focus on something that we can generally agree is an important outcome― the result. 4.Reliability in testing Reliability relates to the generalisability,consistency,and stability of a test. Following on from test validity Hughes points out that “if a test is not reliable,it cannot be valid”(2003: 34). Hughes continues that “to be valid a test must provide consistently accurate measure- ments”(2003:50) Therefore it would seem that the higher amount of similarity there is between tests,the more reliable they would appear to be(Hughes:1989). However,Bachman (1990)argues that although the similarity case is relevant,other factors concerning what we are measuring will affect test reliability.
Recommended publications
  • Technology-Enabled and Universally Designed Assessment: Considering Access in Measuring the Achievement of Students with Disabilities— a Foundation for Research
    The Journal of Technology, Learning, and Assessment Volume 10, Number 5 · November 2010 Technology-Enabled and Universally Designed Assessment: Considering Access in Measuring the Achievement of Students with Disabilities— A Foundation for Research Patricia Almond, Phoebe Winter, Renée Cameto, Michael Russell, Edynn Sato, Jody Clarke-Midura, Chloe Torres, Geneva Haertel, Robert Dolan, Peter Beddow, & Sheryl Lazarus www.jtla.org A publication of the Technology and Assessment Study Collaborative Caroline A. & Peter S. Lynch School of Education, Boston College Volume 10, Number 5 Technology-Enabled and Universally Designed Assessment: Considering Access in Measuring the Achievement of Students with Disabilities—A Foundation for Research Patricia Almond, Phoebe Winter, Renée Cameto, Michael Russell, Edynn Sato, Jody Clarke-Midura, Chloe Torres, Geneva Haertel, Robert Dolan, Peter Beddow, & Sheryl Lazarus Editor: Michael Russell [email protected] Technology and Assessment Study Collaborative Lynch School of Education, Boston College Chestnut Hill, MA 02467 Copy Editor: Jennifer Higgins Design: omas Ho mann Layout: Aimee Levy JTLA is a free online journal, published by the Technology and Assessment Study Collaborative, Caroline A. & Peter S. Lynch School of Education, Boston College. Copyright ©2010 by the Journal of Technology, Learning, and Assessment (ISSN 1540-2525). Permission is hereby granted to copy any article provided that the Journal of Technology, Learning, and Assessment is credited and copies are not sold. Preferred citation: Almond, P., Winter, P., Cameto, R., Russell, M., Sato, E., Clarke-Midura, J., Torres, C., Haertel, G., Dolan, R., Beddow, P., & Lazarus, S. (2010). Technology-Enabled and Universally Designed Assessment: Considering Access in Measuring the Achievement of Students with Disabilities—A Foundation for Research.
    [Show full text]
  • What Teachers Know About Validity of Classroom Tests: Evidence from a University in Nigeria
    IOSR Journal of Research & Method in Education (IOSR-JRME) e-ISSN: 2320–7388,p-ISSN: 2320–737X Volume 6, Issue 3 Ver. I (May. - Jun. 2016), PP 14-19 www.iosrjournals.org What Teachers Know About Validity of Classroom Tests: Evidence from a University in Nigeria Christiana Amaechi Ugodulunwa1,Sayita G. Wakjissa2 1,2Department of Educational Foundations, Faculty of Education, University of Jos, Nigeria Abstract: This study investigated university teachers’ knowledge of content and predictive validity of classroom tests. A sample of 89 teachers was randomly selected from five departments in the Faculty of Education of a university in Nigeria for the study. A 41-item Teachers’ Validity Knowledge Questionnaire (TVK-Q) was developed and validated. The data collectedwere analysed using descriptive statistics, t-test and ANOVA techniques. The results of the analysis show that the teachers have some knowledge of content-related evidence, procedure for ensuring coverage and adequate sampling of content and objectives, as well as correlating students’ scores in two measures for predictive validation. The study also reveals that the teachers lack adequate knowledge of criterion-related evidence of validity, concept of face validity and sources of invalidity of test scores irrespective of their gender, academic disciplines, years of teaching experience and ranks. The implications of the results were discussed and recommendations made for capacity building. Keywords:validity, teachers’ knowledge, classroom tests, evidence, university, Nigeria I. Introduction There is increased desire for school effectiveness and improvement at all levels of education in Nigeria. The central role of assessment in the school system for improving teaching and learning therefore demands that classroom tests should be valid and reliable measure of students’ real knowledge and skills and not testwiseness or test taking abilities.
    [Show full text]
  • Curriculum B.Sc. Programme in Clinical Genetics 2020-21
    BLDE (DEEMED TO BE UNIVERSITY) Choice Based Credit System (CBCS) Curriculum B.Sc. Programme in Clinical Genetics 2020-21 Published by BLDE (DEEMED TO BE UNIVERSITY) Declared as Deemed to be University u/s 3 of UGC Act, 1956 The Constituent College SHRI B. M. PATIL MEDICAL COLLEGE, HOSPITAL & RESEARCH CENTRE, VIJAYAPURA Smt. Bangaramma Sajjan Campus, B. M. Patil Road (Sholapur Road), Vijayapura - 586103, Karnataka, India. BLDE (DU): Phone: +918352-262770, Fax: +918352-263303 , Website: www.bldedu.ac.in, E-mail:[email protected] College: Phone: +918352-262770, Fax: +918352-263019, E-mail: [email protected] BLDE (Deemed to be University) Vision: To be a leader in providing quality medical education, healthcare & to become an Institution of eminence involved in multidisciplinary and translational research, the outcome of which can impact the health & the quality of life of people of this region. Mission: To be committed to promoting sustainable development of higher education, including health science education consistent with statutory and regulatory requirements. To reflect the needs of changing technology Make use of academic autonomy to identify dynamic educational programs To adopt the global concepts of education in the health care sector Clinical Genetics 1 BLDE (Deemed to be University) Type Duration Course Hours/ IA Exam Total of of Exam Code Course Name Week Marks Marks Marks Credits paper (Hours) SEMESTER-I BCT 1.1 Fundamentals of T 4 3 30 70 100 2 Cell Biology BCT 1.1P Fundamentals of P 3 3 15 35 50 1 Cell Biology
    [Show full text]
  • Reliability and Validity: How Do These Concepts Influence Accurate Student Assessment?
    LEARNING POINT Reliability and validity: How do these concepts influence accurate student assessment? There are two concepts central to ac- scale reads 50 pounds heavier than Internal consistency reliability can curate student assessment: reliability expected. A second reading shows a be thought of as the degree to which and validity. Very briefly explained, weight that is 60 pounds lighter than responses to all the items on a test reliability refers to the consistency of expected. Clearly, there is something “hang together.” This is expressed as test scores, where validity refers to wrong with this scale! It is not mea- a correlation between 0 and 1, with the degree that a test measures what suring the person’s weight accurately values closer to 1 indicating higher re- or consistently. This is liability or internal consistency within the basic idea behind the measure. Internal consistency reliability for education- values are impacted by the structure al measurements. of the test. Tests that consist of more homogeneous items (i.e., all mea- In practice, education- sure the same thing) will have higher al assessment is not internal consistency values than tests quite as straightfor- that have more divergent content. For ward as the bathroom example, we would expect a test that scale example. There contains only American history items are different ways to to have higher reliability (internal think of, and express, consistency) values than a test that the reliability of educa- contains American history, African tional tests. Depending history, and political science ques- on the format and purpose of the it purports to measure.
    [Show full text]
  • On the Validity of Reading Assessments GOTHENBURG STUDIES in EDUCATIONAL SCIENCES 328
    On the Validity of Reading Assessments GOTHENBURG STUDIES IN EDUCATIONAL SCIENCES 328 On the Validity of Reading Assessments Relationships Between Teacher Judgements, External Tests and Pupil Self-assessments Stefan Johansson ACTA UNIVERSITATIS GOTHOBURGENSIS LOGO GOTHENBURG STUDIES IN EDUCATIONAL SCIENCES 328 On the Validity of Reading Assessments Relationships Between Teacher Judgements, External Tests and Pupil Self-assessments Stefan Johansson ACTA UNIVERSITATIS GOTHOBURGENSIS LOGO © STEFAN JOHANSSON, 2013 ISBN 978-91-7346-736-0 ISSN 0436-1121 ISSN 1653-0101 Thesis in Education at the Department of Education and Special Education The thesis is also available in full text on http://hdl.handle.net/2077/32012 Photographer cover: Rebecka Karlsson Distribution: ACTA UNIVERSITATIS GOTHOBURGENSIS Box 222 SE-405 30 Göteborg, Sweden Print: Ale Tryckteam, Bohus 2013 Abstract Title: On the Validity of Reading Assessments: Relationships Between Teacher Judgements, External Tests and Pupil Self-assessments Language: English with a Swedish summary Keywords: Validity; Validation; Assessment; Teacher judgements; External tests; PIRLS 2001; Self-assessment; Multilevel models; Structural Equation Modeling; Socioeconomic status; Gender ISBN: 978-91-7346-736-0 The purpose of this thesis is to examine validity issues in different forms of assessments; teacher judgements, external tests, and pupil self-assessment in Swedish primary schools. The data used were selected from a large-scale study––PIRLS 2001––in which more than 11000 pupils and some 700 teachers from grades 3 and 4 participated. The primary method used in the secondary analyses to investigate validity issues of the assessment forms is multilevel Structural Equation Modeling (SEM) with latent variables. An argument-based approach to validity was adopted, where possible weaknesses in assessment forms were addressed.
    [Show full text]
  • Construct Validity in Psychological Tests
    CONSTRUCT VALIDITY IN PSYCHOLOGICAL TF.STS - ----L. J. CRONBACH and P. E. MEEID..----- validity seems only to have stirred the muddy waters. Portions of the distinctions we shall discuss are implicit in Jenkins' paper, "Validity for 'What?" { 33), Gulliksen's "Intrinsic Validity" (27), Goo<lenough's dis­ tinction between tests as "signs" and "samples" (22), Cronbach's sepa· Construct Validity in Psychological Tests ration of "logical" and "empirical" validity ( 11 ), Guilford's "factorial validity" (25), and Mosier's papers on "face validity" and "validity gen­ eralization" ( 49, 50). Helen Peak ( 52) comes close to an explicit state­ ment of construct validity as we shall present it. VALIDATION of psychological tests has not yet been adequately concep· Four Types of Validation tua1ized, as the APA Committee on Psychological Tests learned when TI1e categories into which the Recommendations divide validity it undertook (1950-54) to specify what qualities should be investigated studies are: predictive validity, concurrent validity, content validity, and before a test is published. In order to make coherent recommendations constrnct validity. The first two of these may be considered together the Committee found it necessary to distinguish four types of validity, as criterion-oriented validation procedures. established by different types of research and requiring different interpre· TI1e pattern of a criterion-oriented study is familiar. The investigator tation. The chief innovation in the Committee's report was the term is primarily interested in some criterion which he wishes to predict. lie constmct validity.* This idea was first formulated hy a subcommittee administers the test, obtains an independent criterion measure on the {Meehl and R.
    [Show full text]
  • EDS 245 Psychology in the Schools Stephen E. Brock, Ph.D., NCSP 1
    EDS 245 Psychology in the Schools EDS 245: Psychology in the Schools Stephen E. Brock, Ph.D., NCSP California State University, Sacramento Introduction to Psychological Testing IMPORTANT!!! The final exam will emphasize the material covered in this second half of the course. Psychological Assessment: A Credentialing Standard NASP Practice Model: Domains of Practice Stephen E. Brock, Ph.D., NCSP 1 EDS 245 Psychology in the Schools Psychological Assessment: A Credentialing Standard NASP Practice Model: Domains of Practice Domain 1: Data-Based Decision-Making and Accountability: School psychologists have knowledge of varied models and methods of assessment and data collection for identifying strengths and needs, developing effective services and programs, and measuring progress and outcomes. Psychological Assessment: A Credentialing Standard NASP Practice Model: Domains of Practice Domain 1: Data-Based Decision-Making and Accountability: As part of a systematic and comprehensive process of effective decision making and problem solving that permeates all aspects of service delivery, school psychologists demonstrate skills to use psychological and educational assessment, data collection strategies, and technology resources and apply results to design, implement, and evaluate response to services and programs. Psychological Assessment: A Credentialing Standard NASP Practice Model: Domains of Practice Domain 1: Data-Based Decision-Making and Accountability: Examples of professional practices associated with data-based decision making and accountability include the following: School psychologists use a problem-solving frame-work as the basis for all professional activities. School psychologists systematically collect data from multiple sources as a foundation for decision-making and consider ecological factors (e.g., classroom, family, community characteristics) as a context for assessment and intervention in general and special education settings.
    [Show full text]
  • “Is the Test Valid?” Criterion-Related Validity– 3 Classic Types Criterion-Related Validity– 3 Classic Types
    “Is the test valid?” Criterion-related Validity – 3 classic types • a “criterion” is the actual value you’d like to have but can’t… Jum Nunnally (one of the founders of modern psychometrics) • it hasn’t happened yet and you can’t wait until it does claimed this was “silly question”! The point wasn’t that tests shouldn’t be “valid” but that a test’s validity must be assessed • it’s happening now, but you can’t measure it “like you want” relative to… • It happened in the past and you didn’t measure it then • does test correlate with “criterion”? -- has three major types • the specific construct(s) it is intended to measure Measure the Predictive Administer test Criterion • the population for which it is intended (e.g., age, level) Validity • the application for which it is intended (e.g., for classifying Concurrent Measure the Criterion folks into categories vs. assigning them Validity Administer test quantitative values) Postdictive Measure the So, the real question is, “Is this test a valid measure of this Administer test Validity Criterion construct for this population for this application?” That question can be answered! Past Present Future The advantage of criterion-related validity is that it is a relatively Criterion-related Validity – 3 classic types simple statistically based type of validity! • does test correlate with “criterion”? -- has three major types • If the test has the desired correlation with the criterion, then • predictive -- test taken now predicts criterion assessed later you have sufficient evidence for criterion-related
    [Show full text]
  • An Evaluation Paradox: the Issues of Test Validity in the Realm of Writing Test
    An Evaluation Paradox: The Issues of Test Validity in the Realm of Writing Test DINAMIKA ILMU Vol. 17 No. 1, 2017 P-ISSN: 1411-3031; E-ISSN: 2442-9651 doi: http://dx.doi.org/10.21093/di.v17i1.585 An Evaluation Paradox: The Issues of Test Validity in the Realm of Writing Test as the Final School Examination in the Indonesian Senior High School Milieu1 David Imamyartha Universitas Negeri Jember, Indonesia e-mail: [email protected] Gunadi Harry Sulistyo Universitas Negeri Malang, Indonesia e-mail: [email protected] Abstract Even though there are four English language skills in the Indonesia’s national curriculum at upper secondary schools, each of these skills is given an unequal emphasis since only reading and listening skills are formally tested in the national examination. Although writing competence possesses a particular stake as the determinant of students’ achievement after students undergo a three-year education at the upper secondary school level, it appears that the existing writing tests are low in terms of test validity, as demonstrated by a preliminary study. A further study is carried out to probe the issues of test validity by deploying the framework of test validity, which pertains to theory-based validity, context validity, scoring validity, criterion-related validity, and consequential validity in the scrutiny of the existing writing tests. It is revealed that the current writing tests are fraught with validity problems in all of these facets of test validity. This is corroborated by interview data in the preliminary study and the analysis of the existing writing tests.
    [Show full text]
  • The Predictive Validity of Kindergarten Screening Measures Is Strengthened by Examining the Relationships at Both the Pupil-Level and the Between-School Levels
    IDENTIFYING CHILDREN AT RISK: THE PREDICTIVE VALIDITY OF KINDERGARTEN SCREENING MEASURES By S. Suzanne Jacobsen B.Sc, California State Polytechnic University, 1973 M.A., California State Polytechnic University, 1978 A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS OF THE DEGREE OF DOCTOR OF EDUCATION in THE FACULTY OF EDUCATION Department of Educational Psychology and Special Education We accept this thesis as conforming to the required standard THE UNIVERSITY OF BRITISH COLUMBIA August 1990 ® S. Suzanne Jacobsen In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission. The University of British Columbia Vancouver, Canada DE-6 (2/88) Abstract Identifying Children at Risk: The Predictive Validity of Kindergarten Screening Measures The early identification of children who are "at risk" of experiencing learning problems is of interest to educators and policy• makers. Conflicting evidence exists regarding the efficacy of screening measures for identifying children "at risk". The rationale for screening programs is that early identification of problems allows for treatment which may eliminate more severe problems from developing. If a student is identified as "at risk", school personnel may intervene with remedial programs. Subsequently, if the student succeeds, the earlier prediction is no longer valid.
    [Show full text]
  • 4. the Status of Test Validation Research
    University of Nebraska - Lincoln DigitalCommons@University of Nebraska - Lincoln Social and Technical Issues in Testing: Buros-Nebraska Series on Measurement and Implications for Test Construction and Usage Testing 1984 4. The Status of Test Validation Research Lyle F. Schoenfeldt Texas A&M University Follow this and additional works at: https://digitalcommons.unl.edu/burostestingissues Part of the Educational Assessment, Evaluation, and Research Commons Schoenfeldt, Lyle F., "4. The Status of Test Validation Research" (1984). Social and Technical Issues in Testing: Implications for Test Construction and Usage. 6. https://digitalcommons.unl.edu/burostestingissues/6 This Article is brought to you for free and open access by the Buros-Nebraska Series on Measurement and Testing at DigitalCommons@University of Nebraska - Lincoln. It has been accepted for inclusion in Social and Technical Issues in Testing: Implications for Test Construction and Usage by an authorized administrator of DigitalCommons@University of Nebraska - Lincoln. Published in SOCIAL AND TECHNICAL ISSUES IN TESTING: IMPLICATIONS FOR TEST CONSTRUCTION AND USAGE, edited by Barbara S. Plake (Hillsdale, NJ: Lawrence Erlbaum Associates, 1984). Copyright © 1984 by Lawrence Erlbaum Associates. Digital edition copyright © 2012 Buros Center for Testing. The Status of Test Val idation Research Lyle F. Schoenfeldt Texas A & M University More than any other area, validation research is where the "rubber meets the road" in test construction and test usage. The very term validation implies the assessment or measurement of individuals and the relationship of this assessment to some criterion of performance. The success of a test validation effort, or the lack thereof, has implications for the value of the assessment and for the utility of the procedures.
    [Show full text]
  • An Unfinished Canvas
    An Unfinished Canvas A Review of Large-Scale Assessment in K–12 Arts Education Research conducted by SRI International This study was commissioned by The William and Flora Hewlett Foundation. Copies of this and other An Unfinished Canvas reports can be downloaded from http://policyweb.sri.com/cep/publications Suggested citation: Stites, R. & Malin, H. (2008). An unfinished canvas. A review of large-scale assessment in K–12 arts education. Menlo Park, CA: SRI International. Silicon Valley-based SRI International is one of the world’s leading independent research and technology development organizations. Founded as Stanford Research Institute in 1946, SRI has been meeting the strategic needs of clients for more than 60 years. SRI’s Center for Education Policy studies reforms that hold promise for improving the K-16 system of schooling and lifelong learning. The Center conducts research and evaluations on the design, implementation, and impact of educational programs, especially improvement efforts. For more information about SRI, please visit our Web site: www.sri.com. SRI International 333 Ravenswood Avenue Menlo Park, CA 94025 Phone: 650.859.2000 Copyright © 2008 SRI International. All rights reserved. An Unfinished Canvas A Review of Large-Scale Assessment in K–12 Arts Education Regie Stites Heather Malin Center for Education Policy SRI International 2008 CONTENTS Exhibits ................................................................................................................................ iii Acknowledgements................................................................................................................v
    [Show full text]