
GOTHENBURG STUDIES IN EDUCATIONAL SCIENCES 328

On the Validity of Reading Assessments
Relationships Between Teacher Judgements, External Tests and Pupil Self-assessments

Stefan Johansson

ACTA UNIVERSITATIS GOTHOBURGENSIS


© STEFAN JOHANSSON, 2013
ISBN 978-91-7346-736-0
ISSN 0436-1121
ISSN 1653-0101

Thesis in Education at the Department of Education and Special Education

The thesis is also available in full text on http://hdl.handle.net/2077/32012

Cover photo: Rebecka Karlsson

Distribution: ACTA UNIVERSITATIS GOTHOBURGENSIS, Box 222, SE-405 30 Göteborg, Sweden

Print: Ale Tryckteam, Bohus 2013

Abstract

Title: On the Validity of Reading Assessments: Relationships Between Teacher Judgements, External Tests and Pupil Self-assessments
Language: English with a Swedish summary
Keywords: Validity; Validation; Assessment; Teacher judgements; External tests; PIRLS 2001; Self-assessment; Multilevel models; Structural Equation Modeling; Socioeconomic status; Gender
ISBN: 978-91-7346-736-0

The purpose of this thesis is to examine validity issues in different forms of assessment: teacher judgements, external tests, and pupil self-assessments in Swedish primary schools. The data were selected from a large-scale study, PIRLS 2001, in which more than 11,000 pupils and some 700 teachers from grades 3 and 4 participated. The primary method used in the secondary analyses of the validity issues of the assessment forms is multilevel Structural Equation Modeling (SEM) with latent variables. An argument-based approach to validity was adopted, in which possible weaknesses in the assessment forms were addressed. A fairly high degree of correspondence between teacher judgements and test results was found within classrooms, with a correlation of .65 obtained for 3rd graders, a finding well in line with previous research. Grade 3 teachers’ judgements corresponded more closely to test results than did those of grade 4 teachers; the longer period of time spent with the pupils, as well as the teachers’ different education, were suggested as plausible explanations. Pupil gender and socioeconomic status (SES) had significant effects on the teacher judgements, in that girls and pupils with higher SES received higher judgements from teachers than their test results accounted for. Teachers with higher levels of formal competence were shown to have pupils with higher achievement levels, where pupil achievement was measured with both teacher judgements and PIRLS test results. Furthermore, higher correspondence between judgements and test results was demonstrated for teachers with higher levels of competence. Comparisons of classroom achievement were shown to be problematic when based on teachers’ judgements: the judgements reflected different achievement levels despite test results indicating similar performance levels across classrooms. Pupil self-assessments correlated somewhat lower with both teacher judgements and test results than teacher judgements and test results did with each other. However, in spite of their young age, pupils assessed their knowledge and skills in the reading domain relatively well, and no differences in self-assessments were found for pupils of different gender or SES. In summary, a conclusion of the studies of the three forms of assessment is that all have certain limitations; the strengths and weaknesses of the different assessment forms were discussed.

Table of contents

Acknowledgements

Chapter One: Introduction and points of departure
  Purpose
  Guidance for readers

Chapter Two: Assessment of educational achievement
  Common notions of educational assessment
  Assessing reading literacy in Swedish primary schools

Chapter Three: Validating measures of achievement
  Validity
  Early definitions of validity
  Criterion validity
  Content validity
  Construct validity as the whole of validity
  Threats to construct validity
  Validation
  Using an argument structure for validation
  Toulmin’s structure of arguments

Chapter Four: Relations between different forms of assessment: An overview
  Teachers assessing pupil achievement
  Factors influencing teacher judgements
  Pupils assessing their own achievement
  Factors influencing pupil self-assessments

Chapter Five: Methodology
  Data
  Variables
  Methods of analysis
  Latent variable modeling
  Multilevel modeling
  Random slope modeling
  Assessing model fit
  Missing data
  Analytical stages
  The structure of arguments

Chapter Six: Results and Discussion
  Validating teacher judgements for use within classrooms and for classroom comparisons
  Assessment within classroom
  Classroom comparisons
  Pupil self-assessments in relation to other forms of assessment
  Factors influencing teacher judgements and pupil self-assessment
  The influence of SES and gender on pupil self-assessment within classrooms
  Exploring the relationship between teacher competence, teacher judgements and pupil test results

Chapter Seven: Concluding Remarks
  Methodological issues
  Future research

Swedish summary

References

Study I - IV

Acknowledgements

I am very grateful to the many people who, at various stages, commented on my manuscripts and thereby improved this thesis. First, my sincere thanks to my supervisors. Monica Rosén has been my main supervisor throughout my PhD studies. Thank you for all your good advice, and for being loyal, patient and understanding during the long process of becoming a researcher. Eva Myrberg has been my co-supervisor, and I am endlessly grateful for the support you have given me and for sharing your profound knowledge of the complexities of educational science. It is no exaggeration to say that without my supervisors’ commitment, this piece of research would not have been what it is today. Thank you. I would also like to express my deepest gratitude to Jan-Eric Gustafsson for extremely valuable advice at many stages of my studies. Kajsa Yang-Hansen has been a great support throughout my studies and kindly guided me through an array of methodological issues. I am indebted to all the members of the FUR group, who generously offered their help and shared their knowledge with me. Further, I would like to thank the discussants at my planning, mid-stage and final seminars: Gudrun Erickson, Lisbeth Åberg-Bengtsson and Viveca Lindberg. Thanks also to Professor John Hattie, Professor Dylan Wiliam and Professor Patricia Murphy, who gave me many valuable comments on the manuscripts I presented at the conferences of the National research school for graduates in educational assessment. Special thanks to the “assessment people” at Stockholm University, who arranged the annual conferences on educational assessment within the research school. My friends and colleagues Rolf Strietholt, Robert Sjöberg, Nicolai Bodemer and Cecilia Thorsen have provided invaluable support and generously shared thoughts and ideas on various issues. Alastair Henry has been a great help with the English language. Finally, I am grateful to my friends and family. My love Rebecka has always been by my side, supporting me and reminding me of the most important things in life.

Göteborg, January, 2013

Chapter One: Introduction and points of departure

My doctoral research started with an interest in issues of equality in assessment, with the overarching question of how equality in assessment can be achieved in school. In the data material of the Progress in International Reading Literacy Study 2001 (PIRLS), I found a feasible way to study questions of validity in educational assessments. This thesis investigates how different forms of assessment function in the context of Swedish primary school. Relationships between three different assessment forms have been explored: teacher judgements, external test results and pupil self-assessments. Although there are numerous ways of assessing pupil knowledge and skills, these forms of assessment are prominent aspects of teaching, crucial for the assessment of learning as well as for promoting learning. In Sweden, teachers’ assessments are of vital importance, since no external tests for high-stakes examinations or grade retention purposes exist. Moreover, teachers have been considered the single most powerful determinant of pupil learning (Hattie, 2009). Because of the vital role played by teachers in assessment, particular interest is directed in this thesis to teacher assessment. To understand the context of the present thesis, it is worth rewinding to the educational context at the time of the data collection in 2001. At that point in time, the curriculum introduced in 1994¹ was fully implemented, and the deregulation and decentralization of the school system had taken effect. In addition, a new generation of teachers had entered schools: graduates of a revised teacher-training program launched at the end of the 1980s. Furthermore, whereas the former school system was regulated by sharp and distinctive criteria, since 1994 teachers have had to adapt to new assessment criteria and a new grading system². In the former system the formulations of the attainment goals were detailed, while the looser frames of Lpo 94 implied greater responsibility on the part of the teacher to interpret goals and assess pupil knowledge and skills (Tholin, 2006).

¹ Curriculum for the compulsory school system, the pre-school class and the leisure-time centre (Lpo 94)
² The criterion-referenced grading system. This system did not focus on selection as the former norm-referenced system did. The new criterion-referenced system was constructed with the purpose of giving information about pupil achievement measured against centrally formulated goals and locally defined criteria (Klapp-Lekholm, 2008).

It did not take long before serious validity concerns were raised regarding teachers’ assessments. At least two circumstances contributed to an intensified discussion. First, the interpretation of the goals and criteria was problematic from the perspective of equality. Tholin (2006) demonstrated that, when no grading criteria were explicit, the goals and criteria for grade eight varied considerably between schools. Grading criteria for the ninth grade had to be reformulated for use in grade eight, as the students there were also awarded grades. Selghed (2004) showed that teachers had not fully adapted to the new criterion-referenced grading system, but remained with the former norm-referenced strategies of grading. Different interpretations of criteria were probably also present in the school grades prior to grade eight. Issues of equality in grading have also been highlighted by the national authorities (see, for example, The Swedish National Agency for Education, 2007, 2009; Swedish Schools Inspectorate, 2010, 2011). The Swedish National Agency for Education (2007, 2009) has concluded that teacher assessments differ from one teacher to another, even though test results indicate that pupils have similar performance levels. When summative assessments differ between teachers, it is likely that teachers’ formative feedback will differ too, since in practice these concepts often work together (Newton, 2007; Taras, 2005). Second, parallel to the concerns about equality in teacher assessments, international comparative studies have indicated a declining achievement trend in Sweden in both the science and the reading domains (Gustafsson, 2008; Gustafsson & Rosén, 2005; Gustafsson & Yang-Hansen, 2009; Rosén, 2012). While Sweden’s overall achievement declined, the criterion-referenced assessments made by teachers did not indicate an achievement drop. Indeed, pupils were being awarded higher and higher grades; grade inflation was thereby present in most subjects in Swedish schools (Gustafsson & Yang-Hansen, 2009). The results of research on the criterion-referenced system and the results of the international studies have contributed to a deepened interest in validity issues in teachers’ assessments. This, in turn, has consequences for teachers’ assessment practice and teaching professionalism. For example, in order to achieve a more uniform assessment practice among teachers, national tests have been implemented in a greater range of subjects than previously, and in earlier school years. Furthermore, a new authority, the Schools Inspectorate, was established in 2008 and tasked with monitoring and controlling, amongst other things, teachers’ assessments.


It can be concluded that the increased interest in valid assessments around the turn of the millennium has intensified over the past decade, and the discussion about how to validate inferences drawn from teachers’ judgements is vibrant (e.g., Gustafsson & Erickson, in press; The Swedish Schools Inspectorate, 2010, 2011). Against the background of these discussions, the present thesis aims to contribute further to knowledge about the crucial issue of validity in educational assessment.

Purpose

The overall purpose of the thesis is to contribute to knowledge about how different forms of assessment function in Swedish primary school. Focus is directed to teacher judgements, pupil self-assessments and a standardized external test. The thesis consists of an overarching discussion and four separate empirical studies. The relationships between the assessment forms are investigated in the four studies, where, even though the research questions do not concern validity explicitly, validity is nevertheless a common theme. The purpose of the overarching discussion is to provide a comprehensive picture of the validity of the three assessment forms. It has been written with the aim of elaborating and summarizing the results from the studies, and can be read independently by those who do not want to immerse themselves in the studies. The overarching discussion focuses on a number of issues explored in the four sub-studies:

1. How do teacher judgements of reading achievement work within classrooms and for classroom comparisons in grades 3 and 4?
2. How well do primary school pupils assess their own reading achievement?
3. How are pupil gender and socioeconomic status related to teacher judgements and pupil self-assessments?
4. How is teacher competence related to pupil achievement and to teachers’ judgement practice?

Guidance for readers

Swedish PhD theses that have focused on issues of validity in assessment have often concerned secondary and upper secondary school, or university education (e.g., Jönsson, 2008; Klapp-Lekholm, 2008; Selghed, 2004). However, there is a need to investigate these issues in primary school too, particularly in light of the trend towards earlier grade assignment. Moreover, very few studies have

investigated assessment practices within classrooms and between classrooms (teachers) simultaneously. One reason for this may be a lack of analytical techniques for decomposing the variance of performances into individual and aggregated levels. The development of multilevel structural equation modeling (SEM) with latent variables makes it possible to simultaneously estimate effects at the individual level (social characteristics, achievement) and at the class level (group achievement, teacher characteristics); a minimal numerical sketch of this variance decomposition is given at the end of this chapter. In this thesis, all measures of achievement concern knowledge and skills in the reading domain. Reading is considered a fundamental skill that also underpins performance in other subjects. In the PISA study, performances in reading were shown to correlate highly with performances in mathematics and science (The Swedish National Agency for Education, 2001). This is one reason why measures of reading literacy are well suited to indicate school achievement. The IEA (International Association for the Evaluation of Educational Achievement) provides high-quality reading achievement data from 9-10 year olds, and it is these data that have been used in the thesis. Data from the Swedish PIRLS 2001 study have been particularly useful, since this assessment included some national additions, among which was a unique material distributed to the teachers, on which they assessed each and every pupil’s reading achievement in their own classroom. In the overarching discussion, the theoretical framework consists of three parts. The chapter ‘Assessment of educational achievement’ elaborates some definitions of the concept of assessment and provides a context for the types of teacher assessment in focus in the thesis. Thereafter, the chapter ‘Validating measures of achievement’ is devoted to validity theory and models for validation. An argument-based approach to validation is adopted. The starting point is that individual analyses with information from a variety of sources should be combined to provide strong arguments for sound interpretations of assessment results. The final part of the theoretical framework, ‘Relations between different forms of assessment: an overview’, discusses results of research on the relationship between different forms of assessment, particularly the relationship between teacher assessments and test scores/self-assessments. A methodology chapter follows the theoretical part, where the data and the methods used in the different studies are presented. Thereafter, the ‘Results and Discussion’ chapter summarizes and discusses the results of the thesis.

In the chapter ‘Concluding Remarks’ a number of methodological challenges are highlighted and directions for future research are suggested. Then follows a Swedish summary and, finally, the four studies in full.
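To make the multilevel reasoning above concrete, the following sketch (not part of the thesis; the design, numbers and variable names are invented for illustration) simulates pupil scores as classroom effects plus individual deviations and estimates the intraclass correlation (ICC), the share of score variance lying between classrooms. It is this decomposition that multilevel SEM builds on when effects are modeled at both levels simultaneously.

    import numpy as np

    rng = np.random.default_rng(seed=1)

    # Hypothetical design: 100 classrooms with 25 pupils each.
    n_classes, n_pupils = 100, 25
    class_sd, pupil_sd = 5.0, 10.0     # between- and within-classroom SDs

    # Score = grand mean + classroom effect + individual deviation.
    class_effects = rng.normal(0.0, class_sd, size=n_classes)
    scores = (500.0
              + np.repeat(class_effects, n_pupils)
              + rng.normal(0.0, pupil_sd, size=n_classes * n_pupils))
    scores = scores.reshape(n_classes, n_pupils)

    # One-way ANOVA estimates of the two variance components.
    within_var = scores.var(axis=1, ddof=1).mean()
    between_var = scores.mean(axis=1).var(ddof=1) - within_var / n_pupils

    icc = between_var / (between_var + within_var)
    print(f"estimated ICC: {icc:.2f}")   # close to 25 / (25 + 100) = 0.20

A non-trivial ICC is what makes it necessary to separate the levels: analyses that pool pupils across classrooms would otherwise confound classroom- or teacher-level effects with individual ones.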


Chapter Two: Assessment of educational achievement

Although assessment in education is currently a hotly debated phenomenon, systematic assessments have been made for a long time. In fact, assessment is a central part of everyday life: speech, clothes and behaviour are among the things people continuously assess. Education, however, provides a setting where assessments have particular importance. Educational assessments can be made at many different levels (e.g., teachers assessing pupil knowledge, principals assessing teachers, school inspectorates assessing schools, and so forth) and for many different purposes (promoting learning, selection, certification, etc.). Educational assessments can be traced back to China 2000-3000 years ago, where performance-based examinations were conducted to assign different positions in society (e.g., Lundahl, 2006; Madaus & O’Dwyer, 1999). Even though assessment was present in ancient societies, it was in the first half of the 20th century that the major developments in the area of assessment were made. The need for measuring aptitude and achievement increased, and many assessments focused on selection and certification. In response to these new demands, the development of psychometric testing took off (e.g., Binet & Simon, 1916; Spearman, 1904). Further, the objectives of assessment have developed towards monitoring the outcomes of education, with the purpose of driving both curricula and teaching (Gipps, 2001). Ball (2003, 2010) has described a change in the governing of knowledge, resulting in new demands on schools and teachers. New regulations entail an intensified use and gathering of performance data from large-scale assessments like the PISA studies and from national evaluation systems, such as school inspection programs. In recent decades, an increasing focus on improving ‘outputs’ in education and on competition between schools has emerged in Sweden. Older policy technologies like bureaucracy and teacher professionalism have made way for newer ones: market, managerialism and performativity (Englund, Forsberg, & Sundberg, 2012; Myrberg, 2006; Sjöberg, 2010). Government, schools and teachers are now held accountable for the results of assessments of various kinds.

In Sweden, the Schools Inspectorate holds schools accountable not only for violations of rules and regulations, but also for unsatisfactory achievement results. Internationally, too, the trend has been that information about quality and efficiency affects the ways in which educational systems are monitored and reformed at every level and in every sector (Ball, 2003).

Common notions of educational assessment

There are many concepts related to the notion of assessment. ‘Assessment’ and ‘evaluation’ are commonly used and sometimes even used interchangeably. In the UK, ‘assessment’ refers to judgements of pupil work, and ‘evaluation’ to the process of making such judgements (Taras, 2005). Broadfoot (1996) noted that some authors distinguish between ‘assessment’ as the actual process of measurement and ‘evaluation’ as the subsequent interpretation of such measurements against particular performance norms. ‘Evaluation’ is often associated with aggregated levels, such as when schools or countries are being evaluated. Scriven (1967) defined evaluation as follows:

Evaluation is itself a logical activity which is essentially similar whether we are trying to evaluate coffee machines or teaching machines, plans for a house or plans for a curriculum. The activity consists simply in the gathering and combining of performance data with a weighted set of goal scales to yield either comparative or numerical ratings (Scriven, 1967, pp. 2-3).

This definition could also apply to the concept of ‘assessment’, and may be a function of the time and place in which it was written. In general, there is little consensus as to when to use ‘assessment’ and when to use ‘evaluation’. Scriven (1967) emphasized the goals against which performances should be compared, a point Sadler (1989) subsequently expanded upon by describing the multiple criteria that are often used in evaluations intended to support pupil learning. Such multiple criteria have been characterized as fuzzy rather than sharp; each criterion should not be decomposed into parts, and only a small subset is used at any one time. Furthermore, as Gipps (1994) pointed out, ‘assessment’ may also refer to a wide range of methods used to evaluate pupil knowledge and skills, for example large-scale studies, portfolios, teachers’ assessments in their own classrooms, and external test results. Assessments of pupil achievement made by teachers are often called teacher assessments. However, in the US, ‘teacher assessment’ refers to the assessment of teachers’ competencies (Gipps, 1994). The varying uses of ‘teacher assessment’ are perhaps one reason why the term ‘teacher judgement’ is commonly used in previous research to label statements about pupil achievement (e.g., Feinberg & Shapiro, 2009; Hoge & Coladarci, 1989; Martínez, Stecher, & Borko, 2009; Südkamp, Kaiser, & Möller, 2012).


Teacher judgement is also used in the present thesis to denote the assessments teachers carry out. The term teacher ratings could also have been used, but ratings refer rather to single observations of different aspects of a construct. A judgement encapsulates any given information with bearing on the assessment carried out (Taras, 2005). When assessment outcomes (i.e., test results, observations, and portfolios) are aggregated and interpreted by the teacher, the inferences from many different ratings lead to a judgement about pupil achievement. Furthermore, the term assessment often embodies both a summative and a formative meaning, and a distinction between these two concepts has been made in the literature. The terms summative and formative evaluation were coined by Scriven (1967), who underlined that the two concepts can be used in many different contexts and at many different levels. Thus, summative and formative forms of assessment are not merely associated with assessments of pupil knowledge and skills, even though this has been the dominant area of use in the past few years. While summative judgements do not always improve learning, they are nevertheless a necessary condition for learning. Judgements or test results which are summative and are used for selection and grades can also be used in a formative way (see, for example, Harlen, 2011; Newton, 2007; Stobart, 2011). Scriven (1967) and Taras (2005) have emphasized that the assessment process basically leads to a summative judgement, and that an assessment may be solely summative if it stops with the judgement. For an assessment to be formative, a feedback component is required; however, an assessment cannot be solely formative without a summative judgement preceding it. In a situation where the goal is to promote learning, feedback is information about the gap between the actual knowledge level and a reference level, and is used in attempts to lessen this gap (Ramaprasad, 1983; Sadler, 1989). Newton (2007) has described assessments as either summative or descriptive, and not formative, arguing that the formative concept should be seen as a purpose of an assessment. Talk about summative and formative assessments can thereby be misleading, since method and purpose are not separated.

Assessing reading literacy in Swedish primary schools

Since the 1970s, several Swedish-language diagnostic materials have been available as support for teachers’ assessments in primary school (see, for example, Pehrsson & Sahlström, 1999).

One reason to use diagnostic materials was to help teachers follow up pupil language development in a systematic way, while another was the aim that pupil performances should be assessed in an equal manner, independently of which school the pupil attended, which books were used in teaching, or which teaching methods had been applied. Moreover, the diagnostic materials should highlight individual pupils’ strengths and weaknesses within a given subject and in this way contribute to the effective planning of teaching (The Swedish National Agency for Education, 2002). In 2001, when the data for PIRLS were collected, the Swedish National Agency for Education provided assessment support for the subjects Swedish and Swedish as a second language for grades 2 and 7; in addition, national subject tests were provided, but only in grades 5 and 9. In order to facilitate a systematic assessment practice in the primary school years, the Swedish National Agency for Education developed a diagnostic scheme which was to be used over a longer period of time. The diagnostic material launched in 2002³ was more comprehensive, applying to all years of primary school prior to grade 6 (The Swedish National Agency for Education, 2002). Parts of this material were used in the present thesis. In the context of the present study, the Swedish PIRLS 2001 report indicates that over 90% of the teachers (grades 3 and 4) placed great importance on their own professional judgement when assessing pupil achievement in reading (Rosén, Myrberg, & Gustafsson, 2005). Some 10% of the teachers ascribed great importance to written tests (teacher-made or textbook). One reason that teachers on average trusted their own professional judgement to such a great extent might have been their length of experience (M = 17.5 years) and long education (Rosén et al., 2005). Given the open frames for assessment at the beginning of the 21st century, many teachers most likely trusted their own observations and intuition. Gipps, Brown, McCallum and McCallister (1995) explored teacher assessment models in UK primary schools and identified three main types: the ‘intuitives’, the ‘evidence gatherers’ and the ‘systematic planners’. The ‘intuitives’ tended to rely on their ‘gut reaction’, which basically means that they memorized what children could and could not do. The ‘evidence gatherers’ collected as much evidence as possible, from a variety of sources; they felt accountable to parents and principals and therefore tended to rely on written evidence. The ‘systematic planners’ devoted some part of the school week to assessment. These teachers used many and varied assessment techniques.

³ At the time of the data collection in 2001, the syllabuses did not include criteria for minimum pupil achievement levels in grades 1-4. Requirement levels for grade 3 were introduced with the latest curriculum in 2011.

For these teachers, assessment was a kind of diagnosis of how the pupils were doing on the tasks, with the teacher taking notes and planning the next activity accordingly. Based on the primary school teachers’ reports, and given that teachers in grades 3 and 4 in 2001 did not have explicit criteria or national tests to rely on, the results of the PIRLS report seem to be in accordance with the practice of the teacher type Gipps et al. (1995) describe as ‘intuitives’. However, in Gipps et al.’s study the ‘intuitives’ had not adapted to the criterion-referenced system, while the ‘systematic planners’, on the other hand, had. The latter believed in carrying out ongoing formative assessment and note-taking; relying solely on memory was a strategy they found untrustworthy. The “PIRLS teachers” in general had a lengthy education and long experience, and it seems reasonable that they could be flexible and rely on their intuition and expert judgements. Indeed, great flexibility is needed in teaching and assessment for learning to be efficient (Pettersson, 2011). The introduction of the diagnostic materials in 2002 in Swedish primary school was a step toward more systematic observations in teacher assessment, since the diagnostic material was meant to support teachers with criterion-referenced assessment. In 2001, in connection with the PIRLS 2001 study, an initiative to test the diagnostic material was undertaken by letting teachers rate pupil knowledge and skills on the different aspects in the diagnostic material. This dataset is exploited in the current thesis. The observational aspects in the diagnostic material are described in more detail in the Methodology chapter and can be viewed in the Swedish PIRLS report (Rosén et al., 2005).


Chapter Three: Validating measures of achievement

Cross-validation of different assessment forms can provide information about how well the results from one assessment can answer certain questions. As early as 1963, Cronbach stated that the greatest service evaluation can perform is to identify aspects of a program where revision is desirable. This statement relates to the formative aspects of assessment. However, validity must be determined before assessment forms of different kinds can be improved. Via mutual validation of teacher judgements, external test results and pupil self-assessments, it is possible to identify strengths and weaknesses in the different assessment forms. For example, if the inferences from teacher judgements are found invalid for a particular use (e.g., classroom comparisons), the information about invalidity can be used to shape teachers’ judgements. Different assessment forms can also be more or less useful at different levels of the educational system; assessment of individual pupils may require other methods than the evaluation of classrooms or schools. In order to investigate the quality of assessments, validation is not only powerful and useful, but also necessary. The following section provides a background to validity theory and a framework for validation. First, focus is placed on a general understanding of the concept; thereafter, Toulmin’s model of arguments is used as a framework for validation.

Validity

Validity is no longer seen solely as a property of an assessment, but rather in terms of the interpretations and inferences drawn from assessment results. To evaluate the soundness of inferences based on different forms of assessment, validation is required. Messick’s (1989) framework has been proposed as a suitable theory for validating assessments in an educational context (see, for example, Klapp-Lekholm, 2008; Nyström, 2004). One reason for this is that Messick takes the consequences of assessment into account, which, without doubt, are important in many educational settings. In formative assessment, for example, validity hinges on how effectively learning and improvement take place (Stobart, 2011). This therefore becomes an important aspect of consequential validity.


However, Messick’s theory provides limited guidance on how, in practice, these consequences can be investigated (Bachman, 2005). Such an investigation is also beyond the scope of this thesis. Validity theory and validity in practice have been shown to have limited overlap, and this gap has increased with the introduction of broader perspectives on validity (Wolming & Wikström, 2010). Taking the standpoint that validation requires evidence from multiple sources, and that it is a never-ending enterprise, the argument-based approach (Kane, 1992, 2006; Toulmin, 1958/2003) to validating performances provides a logical set of procedures for articulating claims and for collecting evidence to support these claims. These are described in detail below. First, however, this chapter describes the concept of validity and its development from the early 20th century onwards. In the present thesis, construct validity is treated as a unified form of validity. In order to describe how this unified view of validity emerged, an account of how validity was previously broken down into three different subtypes is first provided. In measurement science, a sharp distinction is sometimes drawn between validity and reliability. Most often, reliability is taken as direct evidence of validity, and the two are sometimes regarded as equivalent (Lissitz, 2009). Already in 1954, Cureton stated that validity has two aspects, which he labelled relevance and reliability. In the present thesis, reliability is regarded as a part of the validity concept and as a necessary, but not sufficient, condition for validity (Messick, 1989). The technical aspects of reliability are not covered in any detail here.
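Although the technical aspects are left aside in the thesis, one classical psychometric relation, the correction for attenuation (Spearman, 1904), makes concrete why reliability is a necessary but not sufficient condition for validity. The formula below is a standard textbook result added here for illustration; it is not taken from the thesis.

    \[
    r_{T_X T_Y} \;=\; \frac{r_{XY}}{\sqrt{r_{XX'}\, r_{YY'}}}
    \qquad\Longrightarrow\qquad
    r_{XY} \;\le\; \sqrt{r_{XX'}\, r_{YY'}}
    \]

Here \(r_{XY}\) is the observed correlation between two assessment forms (say, a test and teacher judgements), and \(r_{XX'}\) and \(r_{YY'}\) are their reliabilities. An unreliable measure caps the observable validity coefficient regardless of how relevant its content is, which is the sense in which reliability is necessary; high reliability alone, however, says nothing about whether the right construct is being measured.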

Early definitions of validity

The first definitions of validity were very straightforward. Guilford’s (1946) definition was that a test is valid for anything with which it correlates. Guilford’s definition was further developed by Cureton (1951), who emphasized the relevance of the test’s purposes and uses:

The essential question of test validity is how well a test does the job it is employed to do. The same test may be used for several different purposes, and its validity may be high for one, moderate for another, and low for a third. Hence, we cannot label the validity of a test as “high”, “moderate” or “low” except for some particular purpose (Cureton, 1951, p. 621).

These two definitions point out that, for example, if a test designed to measure word knowledge correlated highly with the construct of intelligence, the test would be a valid measure of intelligence. Cureton’s definition points to the importance of the purposes of a test. It is therefore not possible to draw the conclusion that a particular test is invalid without knowing what the test was purported to measure.

Up to the mid-20th century, validity was viewed as a property of the test itself (Wolming, 1998). However, in the 1950s a more elaborated view of validity emerged. The concept of validity has typically been broken down into three types, one of which comprises two subtypes (Messick, 1989): content validity, criterion-related validity and construct validity. Between 1920 and 1950, criterion validity came to be the gold standard for validity (Angoff, 1988; Cronbach, 1971), although over time the field drifted towards a unified view, in which construct validity was equated with validity.

Criterion validity

The criterion model is often divided into concurrent and predictive validity. Concurrent validity indicates how well performances for the same or similar constructs correlate, e.g., correlations between standardized test scores and teacher judgements. It can be used to validate a new test, which is then compared to some kind of benchmark, i.e., criteria or earlier tests. Predictive validity refers to how well criteria are suited to predicting future performance. The Swedish Scholastic Assessment Test (SweSAT), used for admission to higher education, is an example of a test which aims at predicting future study success. The main limitation of the criterion model is the difficulty of obtaining an adequate criterion and of evaluating it. For example, it can be problematic to conceptualize and operationalize a satisfactory criterion for a latent trait such as reading ability. The criterion model is useful in validating secondary measures, given that some primary measure can be used as a criterion. However, it cannot be used to validate the criterion itself, which has to be validated in another way (Kane, 2006).
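As a worked illustration of the concurrent-validity logic just described, the sketch below correlates simulated teacher judgements with simulated external test scores. All data, scales and numbers are hypothetical; nothing here is drawn from PIRLS or from the thesis analyses.

    import numpy as np

    rng = np.random.default_rng(seed=2)
    n = 200                                  # hypothetical number of pupils

    # External test scores on a PIRLS-like scale (mean 500, SD 70).
    test_scores = rng.normal(500.0, 70.0, size=n)

    # Teacher judgements track the test but add judgement-specific noise.
    judgements = 0.01 * test_scores + rng.normal(0.0, 0.7, size=n)

    # The concurrent validity coefficient is simply the correlation.
    r = np.corrcoef(test_scores, judgements)[0, 1]
    print(f"concurrent validity coefficient: r = {r:.2f}")   # about .7 here

With these settings the coefficient comes out around .7, in the same neighbourhood as the correspondence figures (.63-.80) reported in the research reviewed in Chapter Four; the point of the sketch is only that concurrent validation reduces, in its simplest form, to such a correlation against a benchmark measure.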

Content validity

The content model concerns how well performances in a particular area of activity can serve as an estimate of overall ability in that activity. Content validity depends on how well the performances or tasks in a specific domain can be used to draw inferences about a larger domain. One of the main criticisms of the content model is that the evidence tends to be subjective: content-based analyses tend to rely on expert judgements about the relevance of test tasks. Furthermore, test developers have a tendency to confirm their proposed interpretations (Kane, 2006).


Construct validity as the whole of validity

The construct model of validity was proposed as an alternative to the criterion and content models (Cronbach & Meehl, 1955). Construct validity came to be seen as representing validity theory as a whole (Loevinger, 1957). Cronbach and Meehl suggested that construct validity must be used whenever no criterion or universe of content is accepted as adequate to define the quality being measured. It has been proposed that construct validity can be expressed as the correspondence between the theory for the construct and the instrument measuring the construct (Wolming, 1998). Messick (1989) further elaborated the concept of construct validity. He stated that construct validity is based on an integration of any evidence that bears on the interpretation or meaning of test scores. Messick’s view extends the boundaries of validity beyond the meaning of test scores to include relevance and utility, values and social consequences. Although Messick’s model of construct validity has seen mainstream use, it has also attracted a fair amount of criticism. For example, it has been argued that the aspect of social consequences should not be mixed up with validity (Mehrens, 1997). The current general view of construct validity theory is that it refers to the interpretations and actions that are made on the basis of assessment results (Cronbach, 1972; Messick, 1989; Kane, 2006). However, Borsboom, Cramer, Kievit, Scholten and Franic (2009) argued that this view is a misconception. Instead, they proposed that validity is a property of the measurement instruments, namely whether these instruments are sensitive to variation in the targeted attribute. This view is similar to how validity was first defined: a test is valid if it measures what it should measure. Borsboom et al. thus argue that validity is a property of the assessment itself, not a property of interpretations of assessment results. One problem with the common definition of construct validity (Cronbach & Meehl, 1955; Kane, 2006; Messick, 1989) is that, by regarding validity as a function of evidence, the interpretations of data could be valid under certain conditions but invalid under others (Borsboom et al., 2009). Thus, test results may represent a more “true” ability for some groups of pupils than for others. Furthermore, Lissitz and Samuelsen (2007) argued that the unitary concept of validity is too broad for educational assessments and considered that its main focus should be on the test itself; they suggested that validation of a test should be labelled content validity. Another critique is that the inferences drawn from test interpretations could be unrelated to the test scores (i.e., valid interpretations made on the basis of an invalid test).

As Borsboom and colleagues (2009) made clear, if a test does not measure anything at all, it can never be valid in the first place, and it therefore makes no sense to examine the validity of inferences based on interpretations of such a test. Many researchers agree that the common understanding of the term ‘validity’ should be whether a test measures what it purports to measure, and many textbooks present this rather straightforward definition. The answer to whether a test measures what it purports to measure requires a degree of evidence. Previously, a single correlation coefficient was often accepted as sufficient (Shepard, 1993). However, viewing validity as a property of a test may lead to unreflective conclusions about validity as a whole. Kane (2006) described the unified concept of construct validity, pointing to three major positive effects of construct validation. First, the construct model focuses attention on a broad array of issues which are essential to the interpretations and uses of test scores; the model is thus not simply based on the correlation of test scores with specific criteria in particular settings and populations. Second, construct validity emphasizes the general role of assumptions in score interpretations and the need to check these assumptions. Finally, it allows for the possibility of alternative interpretations and uses of test scores and other forms of assessment.

Threats to construct validity

The two major threats to construct validity are labelled construct underrepresentation and construct-irrelevant variance. Construct underrepresentation occurs, according to Messick (1995), when an assessment is too narrow and fails to include important dimensions or facets of the construct. An example would be a test that aims to capture reading literacy but focuses too much on word knowledge. If an assessment suffers from construct-irrelevant variance, it is too broad, containing systematic variance associated with other, distinct constructs. It can also be related to method variance, in the sense that response sets or guessing propensities affect responses in a manner irrelevant to the interpreted construct. For example, non-cognitive factors such as behaviour and effort might be taken into consideration when teachers assess pupil reading achievement. Construct-irrelevant variance can also concern bias in written test answers: answers written in neat handwriting may bias teachers’ judgements and thereby lead to conclusions about cognitive skills based on a misinterpretation of motor skills. It is thus important to be aware of construct-irrelevant variance in all educational measurement. As Messick (1995) pointed out, this particularly concerns contextualized assessments and authentic simulations of real-world tasks.


Validation

Critical validation is required when examining the proposed interpretations and uses of test scores. Validation is the process by which one validates the interpretations of data arising from a specific procedure. This implies that the test in itself is not the subject of validation; rather, it is the actions and inferences drawn from the test scores that form the focus of validation. For example, a reading test could be used for grading purposes, or as a diagnosis for adjustments in teaching. Each application is based on different interpretations, and evidence that justifies one application may have no relevance for another. Cronbach (1971) stressed that even if every interpretation has its own degree of validity, one can never reach the simple conclusion that a particular test “is valid”:

Validation examines the soundness of all interpretations of a test – descriptive and explanatory interpretations as well as situation-bound predictions (Cronbach, 1971, p. 443).

It is an ongoing process of investigation and, as Cronbach (1988) concluded, a never-ending enterprise. In practical terms, it is never possible to make a final statement about the validity of anything. Therefore, even though one may strive for strong evidence and arguments for reasonable judgements, interpretations of assessments may change over time as new knowledge is generated. The amount of evidence required in the validation process depends, however, on the interpretations and the claims being made. If the results of an assessment have a direct and straightforward interpretation, little or no evidence is needed for validation; that is to say, if the interpretation does not go much beyond a summary of the observed performance. For example, if a teacher reports that a pupil managed to correctly identify 30 out of 40 words in a word knowledge test, this would probably be accepted at face value. A stronger claim about the performance, however, would require more evidence. If the performance were taken as evidence that the pupil had good reading comprehension, we might have to ask for a definition of reading comprehension and why this kind of performance is appropriate as a measure of reading comprehension in general for pupils of this age and gender. In validation, the proposed interpretations are of great importance, and the arguments for the interpretations must be cohesive. To accept a conclusion without critical examination is known as the fallacy of “begging the question” of validity (Kane, 2006).


Using an argument structure for validation

The argument-based approach to validity reflects the general principles of construct validity. Validation, according to Kane (2006), requires two kinds of argument. On the one hand, it requires an interpretive argument, which specifies the proposed interpretations and uses of assessment results by setting out the network of inferences and assumptions leading from the observed performances to the conclusions and decisions based on those performances. On the other hand, there is the validity argument, which provides an evaluation of the interpretive argument. To claim that a proposed interpretation or use is valid is to claim that the inferences are reasonable and the assumptions plausible. In other words, the validity argument evaluates the interpretive argument, beginning with a review of the argument as a whole as a means of determining whether it makes sense. Theoretical models can be used to describe how assessment results can be interpreted and used. To illustrate the validation of the assessment process, Kane, Crooks and Cohen (1999) introduced the bridge analogy, which describes how the inferences in three steps (scoring, generalization and extrapolation) must each hold in order for a conclusion to be valid. One rationale for this analogy is that while a general validity problem can be very difficult to comprehend, it becomes less complex when broken down into components. The model is highly useful not only for the validation of performance assessments, but also for other assessments where scoring, generalization and extrapolation need to be elaborated. In the present thesis, the scoring of the different assessments has already been carried out, and other models for validation can therefore be adequate. The questions in this thesis regard the validity of the inferences made on the basis of different forms of assessment. The research agenda is to either support or problematize the different claims that are made on the basis of the different assessment forms. The Toulmin model (1958) provides a logical structure of arguments for supporting or rejecting claims about a performance, and thus seems appropriate for the objectives of the current thesis.

Toulmin’s structure of arguments

Toulmin (1958/2003) proposed a general framework and terminology for analyzing arguments, which has been used in a variety of contexts. In the field of language testing, Bachman (2005) has expanded upon argument-based approaches by proposing an ‘assessment use argument’ (AUA) framework that links judgements to interpretations about language ability.

The AUA consists of two parts: a validity argument, which provides logical links from performance to interpretation, and a utilization argument, which links the interpretation to test use. In particular, the validity argument of the AUA seems appropriate as a framework for investigating the validity of interpretations made on the basis of, for example, teacher judgements of pupil reading skills. This framework is grounded in Toulmin’s (1958/2003) argument structure. For Toulmin, an argument consists of making claims on the basis of data and warrants. The assertion of a claim carries with it the duty to support the claim and, if challenged, to defend it or, as Toulmin (1958, p. 97) puts it, “to make it good and show that it was justifiable”. A diagram of the structure of arguments is provided in Figure 1 below.

Figure 1. Toulmin diagram. Bachman (2005, p. 9)

A Claim is an interpretation of an assessment result; it concerns what the pupil knows and is able to do. Data are the pupil’s performances on the assessment and the characteristics of the assessment procedure or, as Toulmin (1958, p. 90) explains, the “information on which the claim is based”. Warrants are propositions used to justify the inferences from the data that lead to the claim. Rebuttals are alternative explanations or counterclaims to the claim. Finally, Backing is the evidence used to support the warrant and weaken the rebuttal. Backing can be obtained from the test design and development process, as well as from evidence collected as part of research studies and the validation process. This model will be used as a method of analysis in the overarching discussion about the validity of the three different forms of assessment. In the next chapter, a more concrete approach to validity is taken, where previous research regarding the relation between teacher judgements, external tests and pupil self-assessments is presented.
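To make Toulmin’s terminology concrete, the sketch below encodes the elements of an argument as a simple data structure in Python. The example content is invented in the spirit of the thesis (the correlation cited as backing is the .65 reported in the abstract); the class itself is an illustration, not part of Toulmin’s or Bachman’s frameworks.

    # Schematic rendering of Toulmin's argument elements; the example
    # content is hypothetical and invented for illustration.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class ToulminArgument:
        claim: str                  # interpretation of an assessment result
        data: str                   # performances the claim is based on
        warrant: str                # proposition licensing the step data -> claim
        backing: List[str] = field(default_factory=list)    # evidence for warrant
        rebuttals: List[str] = field(default_factory=list)  # counterclaims

    # Hypothetical example: a teacher judgement of grade 3 reading.
    argument = ToulminArgument(
        claim="The pupil reads with good comprehension for grade 3.",
        data="Teacher judgement based on classroom observations and tasks.",
        warrant="The observed tasks representatively sample grade 3 reading.",
        backing=["Teacher judgements correlate about .65 with external "
                 "test results within classrooms."],
        rebuttals=["Non-cognitive factors such as effort or behaviour may "
                   "have inflated the judgement."],
    )

    # A validity argument must weaken each rebuttal with backing.
    for rebuttal in argument.rebuttals:
        print("To sustain the claim, address:", rebuttal)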


Chapter Four: Relations between different forms of assessment: An overview

In this chapter, an overview of research on validity issues in different forms of assessment is provided. Previous research with a primary focus on assessments of reading achievement, particularly in the primary school years, is presented. The research area of validation of assessments is very broad and includes studies using many different methods and samples. In the US in particular, there has long been interest in evaluating the quality of different assessment forms. In Sweden, studies with a focus on validity aspects of different assessment forms are fewer (Forsberg & Lindberg, 2010). Rather than covering a wide range of studies, the aim of this chapter is to focus on studies closely related to the research objectives of the current thesis. The searches of the assessment literature took as their starting point the most relevant keywords with regard to the research questions in the current thesis. Systematic searches of the literature were conducted using keywords such as ‘teacher judgement’, ‘teacher rating’ and ‘pupil self-assessment’. Primarily Swedish studies, reviews of the literature, and meta-analyses have been selected. Although not all of these relate to the primary school years and reading, they nevertheless provide an overview of results to which the current results can be compared. The references of the review studies have also been explored in some detail, and many were found to be of particular importance for the current purposes; typically, these studies used similar assessment methods and related to the same subject domain as the current research. The intention is to shed light on the complexity of assessments and on what the different assessments can and cannot measure, in terms of scholastic performance at the individual as well as aggregated levels. The first part of the chapter elaborates the relationship between teacher judgements and standardized tests, and how different aspects, such as pupil and teacher characteristics, can influence the assessments.

The next part of the chapter concerns pupil self-assessments and their agreement with other forms of assessment. Here, too, different factors that could affect the validity of self-assessments are discussed.

Teachers assessing pupil achievement

Teacher judgements are one of the most important activities for pupil learning outcomes (Hattie, 2009; Lundahl, 2011). Teacher judgements play an important role in making daily instructional decisions, conducting classroom assessments, determining grades, and identifying pupils with disabilities or those in need of special assistance. Because of their vital role in education, the quality of teacher judgements has been closely examined in various areas of research (e.g., Brookhart, 2012; Hoge & Coladarci, 1989; Harlen, 2005). Much of the research that has examined the quality of teacher judgements has been conducted in the context of the early identification of learning and reading difficulties. One reason for this may be the importance of identifying pupils with difficulties early: the acquisition of early reading skills has proved crucial for future academic performance, and those who are able to read early are also likely to read more, which may set an upward spiral in motion (e.g., Cunningham & Stanovich, 2006). Teachers have a particularly important responsibility for identifying pupils’ skills in reading, and many studies have examined the quality of teacher judgements in relation to external measures of achievement, such as standardized test results (e.g., Black & Wiliam, 2006; Harlen, 2005; Feinberg & Shapiro, 2009). In Sweden, such research is quite rare, especially for the primary school years. One reason for this may be that assessments of younger pupils’ abilities, in accordance with the curricula, have been expressed in a qualitative manner, for example in individual education plans. Studies of the relation between teacher judgements and test results have, however, been conducted for secondary and upper secondary school, where grades and national tests have been used. The Swedish National Agency for Education (2007, 2009) has studied the correspondence between final grades and national tests in the final year of compulsory school and in upper secondary school. The results showed that most pupils received the same grade on the national test as their final grade; the correlation amounted to about .80. However, the results also indicated that the correspondence differed substantially from one teacher to another. This has raised questions concerning equality in assessment, since different teachers seem to interpret the criteria differently. As regards the correspondence within classrooms, Näsström (2005) found that teachers in Swedish upper secondary school are adept at estimating their pupils’ national test grades in math.

In her study, the four grading steps (IG-MVG) were reformulated as a 12-point scale to allow for more nuanced estimations. The correlation between teachers’ estimations of pupils’ national test results and pupils’ actual test scores amounted to .80. In contrast to the studies conducted by the Swedish National Agency for Education, the teachers in Näsström’s study were explicitly asked to estimate their students’ national test performance. One might suspect that the overall mathematics subject grade includes more non-cognitive aspects than do the test-score predictions, but given the consistent findings this seems not to be the case. In a meta-analysis, Südkamp et al. (2012) investigated 75 studies on the accuracy of teacher judgements. Although most of the studies included in their analysis were conducted in the US, studies from all continents except South America were represented. The authors concluded that the relationship between teachers’ judgements of students’ academic achievement and students’ actual test performance was “fairly high”, with a correlation of .63. However, because they found teacher judgements far from perfect, and considering the unexplained proportion of variance, the authors advise that this result be treated with caution. Further, Südkamp et al. found large variability in the correlation across different studies, a finding consistent with, for example, the results of Hoge and Coladarci’s (1989) earlier review of the literature on teacher judgements. Moreover, Südkamp et al. (2012) suggested that judgement and test characteristics were two moderators of the relationship between teacher judgements and pupil achievement. In the US, Meisels, Bickel, Nicholson, Yange, and Atkins-Burnett (2001) examined the relationship between teacher judgements on a curriculum-embedded assessment of language and literacy and a standardized measure from kindergarten through 3rd grade. They concluded that teacher judgements of pupils’ performance could be trusted, since they correlated well with external measures: teacher judgements were strong predictors of achievement scores, and accurately discriminated between pupils who were at risk and those who were not. In another study from the US, Llosa (2007) investigated the relationship between standards-based classroom assessments and standardized tests of English reading proficiency in grades 2-4. The teacher-assessed scores and standardized test scores were aligned to the same standards, and via a multivariate analytic approach Llosa concluded that the correspondence between the two measures was high. Beswick, Willms, and Sloat (2005) used correlational analysis to examine the correspondence between information derived from teacher ratings and from a standardized test with prior evidence of construct validity.

between the two achievement measures of .67, but raised concerns regarding findings showing that teacher judgements were systematically affected by extraneous variables, such as pupil and family characteristics. Teachers rated boys and pupils from lower SES backgrounds lower than the standardized test results indicated. Consequently, the researchers advised caution in the use of teacher ratings in grade retention decisions. Most studies that have examined validity issues of teacher judgements have focused either on the extent to which judgements correlate with standardized test measures (Beswick et al., 2005; Brookhart, 2012; Coladarci, 1986; Hoge & Coladarci, 1989; Meisels et al., 2001; Taylor, Anselmo, Foreman, Schatschneider, & Angelopoulos, 2000) or on the extent to which judgements accurately predict future performance (Gijsel, Bosman, & Verhoeven, 2006; Hecht & Greenfield, 2002; Taylor et al., 2000). The principal focus of these studies has been general teacher judgements of pupil achievement (Hoge & Coladarci, 1989; Perry & Meisels, 1996), emerging reading and literacy skills (Bates & Nettelbeck, 2001; Beswick et al., 2005; Meisels et al., 2001), and reading and learning disabilities (Reeves, Boyle & Christie, 2001; Taylor et al., 2000). Standardized test results have often been used as a criterion against which teacher judgements are measured, rather than the other way around. In this sense, standardized test results are often viewed as more objective and as a more valid measure of achievement. However, low correspondence between test results and teacher judgements may also be caused by low reliability of the tests (e.g., Harlen, 2005). Furthermore, to achieve high construct validity of external test results, tests need to be aligned to the constructs stated in the curricula and syllabi. If they are not, a mismatch between teacher judgements and external test results may appear. In the context of exploring the construct validity of assessment interpretations, an important question is whether the content of standardized tests accords with the content of the subject assessed by the teachers. For example, when results from PIRLS are to be interpreted and used in a national context such as Sweden, it is important to compare the PIRLS framework not only with the Swedish curriculum (Lpo 94) but also with the syllabus for Swedish. If the correspondence is high, there are good grounds to use the results from PIRLS to articulate claims about pupil reading achievement, as well as to use the results as a basis for discussion about, and development of, reading comprehension in Swedish schools. If the correspondence is low, there is a risk that the test fails to capture constructs that may be specific to the particular national setting (The Swedish National Agency for Education, 2007).


Another way to express this is to ask whether the framework in the international studies reflects the content and form of Swedish school education. Such analyses have been carried out by the Swedish National Agency for Education (2006), which explored the alignment between the content of PIRLS 2001 and the Swedish syllabus. More specifically, the agency investigated the agreement between the framework for reading in PIRLS and that in the Swedish curriculum and syllabus, specifically the goals to be attained at the end of the fifth4 year of compulsory school. The Swedish National Agency for Education found the purpose of PIRLS to be well in line with the criteria in Swedish primary schools. This conclusion is also mentioned in a report from the same agency in 2007, although that report emphasizes that the PIRLS test cannot cover the whole Swedish language subject domain, which may also not be the goal of PIRLS. A more in-depth study of the type of knowledge and skills that PIRLS comprises has been conducted by Liberg (2010), who examined the reading tasks in the questionnaires used in PIRLS 2006. Her findings suggested that most tasks in PIRLS involved identifying information in the text and linking different routes to find a context within the text. On the other hand, few items tested the ability to read between the lines, to use one's own experiences, and to interpret the text creatively. However, Liberg (2010) also pointed out that if such tasks were included it would be difficult to score the tests in an equivalent manner across different cultures.

Factors influencing teacher judgements

The Swedish Education Act (2010) states that there shall be educational equality between schools, irrespective of school type and of where in the country the education is provided. Equality in education means, for example, that pupils with a disability or handicap should not be denied appropriate schooling. Furthermore, irrelevant aspects, such as gender, socioeconomic status or other non-cognitive factors, should not be allowed to influence assessment and grading. If teachers have different frames of reference, their assessments will differ from one classroom to another even when achievement levels are the same. This could in turn mean that a pupil in one classroom might be provided with adequate assistance while a pupil in another classroom might not. Consequently, it is crucial that teachers' judgements are in agreement; otherwise equality of education will be jeopardized. This concerns an aspect of

4 The attainment goals in grade 5 were used, since no goals were specified for the school years prior to grade 5 in 2001.

inter-rater reliability, an indication of how well different judgements of similar knowledge and skills agree. However, even though teachers might consistently assess the same knowledge and skills, it does not follow that validity will be high, since the construct validity of the assessed knowledge and skills might be low. Enhanced inter-rater reliability has been claimed when teachers have access to adequate scoring rubrics. Jönsson and Svingby (2007) reviewed the literature on scoring rubrics and arrived at the conclusion that the reliable scoring of performance assessments can be enhanced by the use of rubrics. However, their review also concluded that rubrics do not facilitate valid judgements per se. As previously mentioned, teachers' interpretations of goals and criteria have been shown to be problematic in Sweden (Selghed, 2004; Tholin, 2006). The interpretation of criteria is likely to be influenced by the length of teachers' education and their amount of experience. In 2001, teacher characteristics varied widely in Swedish primary schools (Frank, 2009; Rosén et al., 2005). Teachers' characteristics are one cause of variation in the assessment of pupil knowledge and skills (e.g., Llosa, 2008). However, the characteristics of individual pupils can also affect teachers' judgements. If teachers take account of non-achievement factors, this may threaten the validity of the inferences drawn from teacher judgements. These problems are further elaborated below.

Teacher characteristics

Teachers with higher competence levels are likely to have pupils with higher achievement levels (Hattie, 2009; Darling-Hammond & Bransford, 2005). One hypothesis is that these teachers can more accurately identify their pupils' knowledge and skills and are thereby better at adjusting their teaching to pupils' different knowledge levels. Relatively few studies have investigated the role of formal teacher competence in teachers' judgements of pupil achievement, perhaps because it has been hard to define and establish what a competent teacher is. Further, relevant data indicating teacher competence may be difficult to access. Hanushek (1989, 2003) as well as Hattie (2009) have demonstrated that teachers have a powerful influence on pupil achievement. However, previous research has sometimes arrived at different conclusions about the impact of teacher competence. A reason for this may be the lack of consistency in the indicators and approaches used to measure teacher competence. For example, competence can be measured in terms of pupil outcomes: the higher the pupil performances, the higher the teacher competence. One of the advocates of this view is Hanushek (2003), who claimed that it is teachers' persona, rather than

university degrees or other educational qualifications, that is the key factor. Teacher competence has been measured by, for example, teacher certification, academic degrees and experience, but also by performance as rated by, for example, principals, parents and pupils. In the present thesis a construct of teachers' formal competence was adopted. A similar construct was used by Frank (2009), who found strong effects of teacher competence on pupil reading achievement. Results from studies using single indicators, such as education and experience, are sometimes unpredictable because the length and content of teacher education vary across studies. The differing impact of teacher competence can also be due to the school subject under investigation. In general, stronger effects of teachers' formal competence have been found in the mathematics domain than in the reading domain (Wayne & Youngs, 2003). A review by Wayne and Youngs (2003) demonstrated that teachers with a master's degree were likely to have pupils with higher achievement. Darling-Hammond (2000) has also presented support for the contention that a major in the subject field has an impact on achievement. In addition, it was shown that certification status is also of importance for pupil achievement gains. Contradictory results have, however, been found by Goldhaber and Brewer (2000), although their conclusions were drawn on the basis of a limited data material. In Sweden, Myrberg (2007) has shown that an appropriate teacher education degree has a significant effect on pupil reading achievement; the effect size of holding a subject-appropriate degree amounted to no less than 0.33. Frank (2009) operationalized teacher competence in a latent variable model that included several indicators of teachers' formal training and experience, and estimated significant effects of teacher competence on pupil reading achievement. The analytical challenges in studying the effects of competence on teacher judgements are many. Teacher judgements need to be compared to a criterion (standardized test results), and it must be examined how the relationship between the two measures of achievement varies with different teacher characteristics. In a study from the US, Martínez, Stecher, and Borko (2009) included several teacher characteristics in their analyses of the relationship between teacher judgements of pupil achievement and standardized test scores in grades 3 and 5. Using multilevel modeling, they concluded that teacher judgements varied significantly across classrooms and that pupil achievement on tests could not explain much of the variance. Analyses using so-called random slopes revealed, however, that some teachers assessed pupil achievement in ways that

corresponded more closely to standardized test scores than others. Judgements by teachers whose classroom work was more test-like showed a higher degree of correspondence with pupils' standardized test scores than judgements by other teachers. Teachers' educational level (Bachelor's degree, Master's degree) was not found to have any significant effect on the agreement between teacher judgements and pupil test scores. D'Agostino, Welsh, and Corson (2007) suggested that, with pupil background differences controlled for, teachers who adhered to recommended assessment practices and whose teaching mirrored the test had students with higher achievement. To achieve more credible teacher judgements of pupil achievement, Harlen (2005) suggested training and moderation of teachers' assessments. For example, it was hypothesized that, by discussing assessment-related questions in teacher teams, common assessment practices and frames of reference could be developed.

Student characteristics

Evidence from a vast amount of research using different methods indicates that assessments reflect not only pupil subject knowledge, but also pupil characteristics (Brookhart, 2012; Cross & Frary, 1999; Klapp-Lekholm & Cliffordson, 2008; Llosa, 2008; Thorsen & Cliffordson, 2012). Brookhart (2012), who reviewed the literature on the use of teacher judgements for summative assessments in the US, found that non-achievement factors such as behaviour and effort were considered in teachers' judgements. Klapp-Lekholm and Cliffordson (2008, 2009) studied grading in the final year of compulsory school in Sweden. Using two-level confirmatory factor analysis, they identified a common grade dimension in teachers' grading, which they suggested reflected non-achievement factors (gender, family background and motivation) included in the grading. As early as the beginning of the 1970s, Svensson (1971) demonstrated that girls received slightly better grades than was justified by their national test results, a trend that seems to be relatively stable in Sweden. Similar patterns were found in a study by Emanuelsson and Fischbein (1986), and Reuterberg and Svensson's (2000) study of gender differences in mathematics confirmed the results of previous research. Regarding the assessment of different SES groups' achievement, Svensson (1971) and Reuterberg and Svensson (2000) showed that national test results corresponded fairly well with actual grade levels. Different interpretations of the results from these studies have been proposed (Wernersson, 2010). One interpretation, for example, is that girls are awarded grades higher than justified by achievement; it might be some trait (behaviour, motivation) that teachers take into account during assessment (e.g.,


Emanuelsson & Fischbein, 1986). Another interpretation is that girls complete assigned course work more successfully, thus earning higher judgements (Wernersson, 1989). In summary, much research shows that teacher judgements of pupil achievement are often affected by elements which are not reflected in test results. These elements can relate to both pupil and teacher characteristics. While factors such as pupil effort and attitude are certainly important for pupil achievement in school, they should not be the subject of teachers' assessment. It is crucial for the teaching profession that teachers assess pupil achievement validly and reliably. In the US at least, confidence in teacher judgements is low. Brookhart (2012) suggested that one implication of results showing low credibility for teacher judgements may be a more extensive use of standardized tests. Another implication could be that the teaching profession becomes less autonomous in relation to the assessment of pupils' knowledge and skills.

Pupils assessing their own achievement

While teacher assessments and tests are used both for summative status reports and for feedback, pupil self-assessments are mainly used for formative purposes. However, pupil self-assessments can also be considered summative assessments of pupil achievement (Taras, 2009). Asking pupils to assess their own knowledge and skills is a relatively easy way to obtain information about pupil performance in school. Klenowski (1995) and Ross (2006) suggest that the benefits of self-assessment are more likely to increase if three conditions are met: 1) that teachers and pupils have a common understanding of goals and criteria; 2) that teacher-pupil dialogues focus on evidence for judgements; and 3) that self-assessments (in collaboration with teacher assessment) contribute to a grade. For pupils to be able to assess their own skills correctly, they have to become aware of what they need to learn and where to go next, which is the basis of effective feedback (Hattie & Timperley, 2007). It has been suggested that pupil self-assessment can form an important and integrated part of learning. Black and Wiliam (1998) have argued that if formative assessment is to be effective, pupils need to be trained in self-assessment; they can then understand the main purposes of their learning and how the goals can be achieved. Klenowski (1995) defines self-assessment as "the evaluation or judgement of 'the worth' of one's performance and the identification of one's strengths and weaknesses with a view to improving one's learning outcomes" (Klenowski, 1995, p. 146). This definition focuses on the improvement aspect of self-assessment and thus on

the consequential aspects of validity (Messick, 1989). The self-assessment concept is closely related to self-concept and self-efficacy, two concepts widely studied in psychological research. Self-concept is multidimensional and formed through experiences of the environment (James, 1890/1998). It is influenced especially by environmental underpinnings and the evaluations of significant others (Shavelson, Hubner, & Stanton, 1976). Self-concept judgements involve an evaluation of, among other things, personal characteristics, skills and abilities. Self-efficacy is a more specific construct that primarily concerns the cognitively perceived capability of the self, and can be considered the cognitive dimension of self-concept. Many self-concept researchers have considered academic self-concept to be an explanatory variable for pupil educational outcomes, whereas others assert that self-concept is mainly a consequence, not a cause, of pupil academic achievement (see, for example, Bong & Clark, 1999). Although pupil self-assessment has been an explicit goal in recent Swedish curricula and syllabi, it is also stated in the knowledge requirements of the current curriculum5 that, by the end of year 3, pupils should be able to assess their own and others' competencies: "…pupils in response to questions can give simple assessments of their own and others' texts, and also on the basis of responses work on and clarify their texts in a simple way" (The Swedish National Agency for Education, 2011, p. 216). Whether self-assessment should be used at all in school is a contested issue. Teachers have sometimes argued that self-assessments are not sufficiently accurate to be used for feedback purposes (Ross, 2006). The use of self-assessment may be warranted if there is a high correspondence with other achievement measures, such as teacher judgements. In the 1991 IEA Reading Literacy Study (RLS, 1991), pupils were asked to assess their own reading skills. In most countries, the correlation between performance and self-assessment was between 0.25 and 0.55 for narrative and expository scores on the reading test, and slightly lower for document scores (Elley, 1992). There are also overviews of the accuracy of self-assessment. In Shrauger and Osberg's (1981) review of 50 studies, it was found, with regard to predictions of academic achievement, vocational choice and job performance, that the validity of self-assessments was comparable to that of other forms of assessment such as teacher assessments and tests. In samples of older students, Falchikov and Boud (1989) reviewed 57 studies that compared self-assessed marks with teacher marks, finding substantial correlations between the two. In

5 Curriculum for the compulsory school, preschool class and the leisure-time centre (Lgr 11).

consequence, it was concluded that self-assessments provide trustworthy evidence of pupil achievement. Falchikov and Boud also found better agreement between pupil self-assessments and teachers' assessments at more advanced educational levels. However, in a Swedish study, Fredriksson, Villalba, and Taube (2011) found a weak association between grade 3 pupils' self-assessments of reading skills and their reading test results; the correlation amounted to about .3. Swalander (2006) estimated effects of academic self-concept on grade 8 pupils' reading achievement in IEA's 1991 reading literacy study. The beta values were estimated at .42 for the main sample and .56 for the cross-validation sample. From this review, support for the validity of self-assessments is certainly not overwhelming. Nevertheless, in terms of predictive validity, pupils have been shown to make reasonably trustworthy predictions of their achievement. The modest correlations (.20-.30) presented may cast doubt on the validity of self-assessments and their use in school, and there are reasons to believe that the accuracy of self-assessment varies between pupils, with some being more accurate than others. In the next section, this is examined more closely.

Factors influencing pupil self-assessments

Self-assessment of knowledge and skills is a highly complex activity, and its accuracy can vary due to a number of factors. Factors at the individual level, such as gender and SES, have previously been shown to influence self-assessments (e.g., Reuterberg & Svensson, 2000). Moreover, age and ability may also influence self-assessments. Influences at the system level may come from the teacher and the school, as well as from the school type, making self-assessment a multilevel problem. Previous research demonstrates that pupil self-assessments vary between genders (Swalander, 2006), between pupils with different home background characteristics (Kuncel, Crede, & Thomas, 2005; Swalander & Taube, 2007), and between school subjects (Marsh, 1986). The differences between subject domains may arise because pupils use different frames of reference for their own achievement. Marsh (1986) found that pupils who achieved better in one school subject (e.g., better in maths than in Swedish) tended to overestimate their achievement in their "best" subject and underestimate their achievement in their "weaker" subject, the pupils' estimations of the two subject-specific self-concepts being uncorrelated. This pattern has since been replicated with similar results (e.g., Brunner, Lüdtke, & Trautwein, 2008).


Studies of gender differences in general academic self-concept have, however, often yielded inconclusive results (Skaalvik, 1997). Fredriksson et al. (2011) did not find any significant gender differences for pupils in 3rd grade. Swalander (2006) reported higher general academic self-concept for boys, whereas girls had higher verbal self-concept. Reuterberg and Svensson (2000) showed that boys tended to overestimate their mathematics skills, as did low-achieving pupils with disadvantaged backgrounds. While pupils with high ability estimated their achievement reasonably well, Kuncel et al. (2005) found the self-assessments of low-ability pupils to be less accurate. Older pupils have shown better accuracy in their estimations than younger pupils (Butler & Lee, 2006; Fredriksson et al., 2011). In this matter, however, results are inconclusive. For example, Sperling, Howard, Miller, and Murphy (2001) investigated the agreement between self-assessments and teacher judgements in grades 3-9, finding the relationship to be lower in the older populations. One cause of the divergent results is that the criteria, as well as the methods for self-assessment, differ for pupils of different ages and for different subjects. A limitation of many studies on teacher judgements and pupil self-assessments is that they often rely on only one type of analysis for validation: mainly correlation between teacher judgements and an external criterion (Hoge & Coladarci, 1989; Südkamp et al., 2012; The Swedish National Agency for Education, 2007, 2009), or content investigations of the alignment between the test content and the national standards. Such studies have, for example, been conducted by the Swedish National Agency for Education (2006, 2007, 2010). Even though the information gathered by both approaches is important, when considered individually the information they provide is limited. Validation requires research that relies on multiple sources of evidence. As Kane (2006) pointed out: "Individual studies in a validity argument may focus on statistical analyses, content analyses, or relationships to criteria, but the validity argument as a whole requires the integration of different kinds of evidence from different sources" (p. 23). This is particularly important when assessments are used for more than one purpose.


Chapter Five: Methodology

One of the most crucial aspects of construct validity is how different concepts are measured. Construct validation is concerned with the validity of inferences about unobserved variables (the constructs) on the basis of observed variables (their presumed indicators) (Pedhazur & Pedhazur, 1991). The capacity of the indicators is thus of particular importance for the quality of the measurement of a construct. The operationalization of the concepts used in this thesis is therefore of great importance and will be described in detail, especially the three measures of achievement.

Data

The empirical work in this thesis is based on data from the PIRLS (Progress in International Reading Literacy Study) 2001 study, performed by the IEA (International Association for the Evaluation of Educational Achievement). Sweden participated with two samples, one from grade 3 and one from grade 4. A total of 35 countries participated in PIRLS 2001. The studies in the current thesis draw exclusively on the Swedish data. The numbers of participating schools, teachers and pupils are presented in Table 1 below.

Table 1. Valid N – Schools, Teachers and Pupils6

          Grade 3   Grade 4    Total
Schools       144       146      290
Teachers      351       344      695
Pupils       5271      6044    11315
Girls        2631      2965     5596
Boys         2640      3079     5719

Beside pupils’ reading literacy skills, PIRLS also provides extensive information about the school context and the home environment of the pupils.

6 Source: Rosén et al., 2005, p. 32.


Variables

Pupils, their parents and teachers have provided information on a large number of questions. Information about the questionnaires and variables is available in the Swedish PIRLS report (Rosén et al., 2005), the international user guide (Gonzalez & Kennedy, 2003), and the technical report (Martin, Mullis, & Kennedy, 2003). The teacher judgement items, the reading test, the pupil self-assessment items and home background variables, and the teacher background variables selected for the studies in this thesis are described below.

Teacher judgements

The 2001 Swedish database included a national extension, a questionnaire in which teachers were asked to assess pupils' language skills on a number of aspects. This questionnaire was developed from the national diagnostic material (Språket lyfter). The diagnostic material contained observation aspects used by teachers for assessing and monitoring pupil knowledge and skills in the Swedish language domain. Some adjustments of the observation form were, however, required to make it feasible for large-scale comparative purposes. Instead of providing written comments for each pupil, teachers were asked to rate pupil language achievement on a 10-point scale. The original diagnostic instrument included 18 aspects of Swedish language skills, which in this adjusted version were rephrased into 18 statements for the teachers to consider. The original instrument is available in the Swedish national report of PIRLS 2001 (Rosén et al., 2005). Selected as indicators of teacher judgements were the items relating to aspects of either reading or writing: eight about reading and four about writing. The latter were warranted by the fact that some of the PIRLS test items also required written responses. These teacher judgement items were used in all four studies of this thesis. The rating items and descriptive statistics are presented below.


Table 2. Descriptive statistics for the 12 items of the teacher judgement scale

Variable  Pupil can…                                           Grade 3            Grade 4
                                                          N    Mean   SD     N    Mean   SD
01  Construct sentences correctly                       5208   7.67  2.16  5856   7.47  2.25
02  Recognize frequently used words in an unknown text  5213   8.35  1.93  5855   8.05  1.99
03  Connect a told story with an experience             5162   8.26  1.85  5840   8.01  1.93
04  Use the context to understand a written text        5207   8.05  2.05  5812   7.78  2.15
05  Write a text continuously fluently                  5209   7.84  2.18  5860   7.66  2.22
06  Understand the meaning of a text when reading       5124   8.30  2.00  5767   8.08  2.08
07  Recognize the letter/connect sound                  5136   9.48  1.27  5779   9.25  1.46
08  Read unknown words                                  5133   8.11  2.03  5778   7.85  2.11
09  Reflect on a written story                          5083   8.09  1.90  5768   7.88  1.98
10  Read fluently                                       5135   8.32  2.10  5777   8.36  2.11
11  Improve own written text                            5072   7.11  2.24  5766   6.96  2.31
12  Use a reasonably large vocabulary                   5132   8.30  1.89  5774   8.06  1.98

It may be noted that most items have high mean values, which indicates that teachers consider pupils to be, on average, good readers. The statement "pupil can recognize the letter/connect sound" was rated highest by the teachers, with means well above nine. To "improve own written text" was regarded as more challenging for the pupils, which seems reasonable. The assessment instrument described has been used to indicate an overarching latent construct: teacher judgements of Swedish language skills. The properties of the single items are therefore of secondary significance. The reason for labelling the variable derived from these items 'teacher judgements' rather than 'teacher ratings' is that 'rating' refers to single estimations rather than to the global measure the items ultimately form. Thus, when the items are clustered and modelled into a latent variable, the term 'judgement' is preferred over 'rating'.

Reading test

The concept of reading literacy was coined by the IEA when launching the 1991 reading literacy study. The definition of reading literacy includes a description of what it means to be an able reader. Central to the definition is the ability to understand and use different written forms, both in order to learn, to be a functioning member of society, and to be able to read texts for one's own enjoyment. The definition of reading literacy in the PIRLS 2001 framework was formulated as follows:


…the ability to understand and use those written language forms required by society and/or valued by the individual. Young readers can construct meaning from a variety of texts. They read to learn, to participate in communities of readers, and for enjoyment (Campbell, Kelly, Mullis, Martin, & Sainsbury, 2001, p. 3). This approach to reading draws on a number of theories of reading literacy, which is conceived as a constructive and interactive process in which readers actively construct meaning, drawing on prior knowledge. Readers are assumed to have a positive attitude towards reading and to read both for recreational purposes and to retrieve information. Further, the reading experience can be seen as an interaction between the reader and the text. The reader commands a number of skills, cognitive and metacognitive strategies, and background knowledge. The context of the reading situation promotes commitment and motivation to read, and often places specific requirements on the reader. PIRLS examines three aspects of reading literacy: 1) processes of reading comprehension, 2) purposes of reading, and 3) reading behaviour and attitudes. While the first two aspects together form the basis for the written test, the third aspect is addressed in the background questionnaires. The reading test results in PIRLS 2001 were used in the studies included in this thesis. The item pool comprised 98 tasks, in which pupils had to read various texts and answer questions, both multiple-choice and open-ended. The PIRLS design uses a matrix sampling technique, which implies that not every question was answered by every pupil. Pupils' scores on the different booklets must therefore be combined on a common scale to obtain an overall picture of the results in each country. The item response theory (IRT) scaling procedure yielded five imputed scores, or plausible values, for each pupil's reading literacy. Because each pupil responded to only a subset of the assessment item pool, simple generated scores were not sufficiently reliable for reporting the results; the plausible value proficiency estimates were used to obtain reliable scores (Martin et al., 2003). The derived international scale has a mean of 500 and a standard deviation of 100. The Swedish pupil mean score was about 563 points in grade 4 and about 521 points in grade 3, with standard deviations of 63 and 71 respectively (Mullis, Martin, Gonzalez, & Kennedy, 2003).

Pupil variables

From the pupil questionnaire, information about pupils' reading ability was available from their self-assessments. The items are presented below, along with descriptive statistics for grade 3 pupils.


Table 3. Variables defining pupil self-assessments

Variable      Question/Statement                                             N     Mean   SD
Self_assess1  Reading is very easy for me                                   5138   3.45  0.64
Self_assess2  I do not read as well as other pupils in my class             5121   3.02  1.01
Self_assess3  I understand almost everything I read, when I read on my own  5128   3.49  0.69
Self_assess4  To read aloud is very hard for me                             5138   3.06  1.00

The self-assessment items capture how well pupils estimate their own skills, both with and without reference to other pupils. The four statements concern reading skills in general, rather than specific aspects of the subject Swedish. The rating scale runs from agree a lot (1) to disagree a lot (4). Items Self_assess1 and Self_assess3 were recoded to give the scale the same direction as for the other two variables (a minimal recoding sketch is given after Table 4). The mean values are quite high for all variables, indicating that the 3rd grade pupils assess themselves to be able readers. Compared to the teachers' assessments, the pupil self-assessments are broader, which is reasonable given that younger pupils cannot be expected to evaluate their knowledge on the basis of the more complex statements to which the teachers responded. The self-assessment items were used in Study IV. Also selected from the pupil and parent questionnaires were items indicating gender (Girl = 1, Boy = 0), socioeconomic status, and pupil attitudes towards reading. In the present thesis, five indicators of socioeconomic status were used, all available in the parent questionnaire. The indicators were chosen based on suggestions from previous research (Sirin, 2005). The variables are presented in more detail in Table 4 below.

Table 4. Description of the SES indicators included in the analyses

Number of books at home      About how many books are there in your home? Ordinal variable, 1-5:
                             0-10, 11-25, 26-50, 51-100, more than 100.
Well-off financially         How well off do you think your family is compared to other families?
                             Ordinal variable, 1-5: Not at all well-off, Not well-off, Average,
                             Somewhat well-off, Very well-off.
Annual income                Within which span is your household's annual income? Ordinal variable,
                             1-6: Less than 180 000 SEK, 180 000-269 999 SEK, 270 000-359 999 SEK,
                             360 000-449 999 SEK, 450 000-539 999 SEK, 540 000 SEK or more.
Highest education            Highest educational level in the home. Ordinal variable, 1-8: Some
                             compulsory school, Completed compulsory school, 2 years of upper
                             secondary education, 3 years of upper secondary education,
                             Post-secondary education, 2 years of university studies, University
                             studies - candidate level, University studies - Master level.
Highest occupational level   Highest occupational level in the home. Ordinal variable, 1-3: Blue
                             collar, White collar, Academic.
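As referred to above, the reversal of the two positively worded self-assessment items can be sketched as follows. This is a minimal illustration in Python with pandas; the data values are invented and are not taken from the PIRLS database.

```python
# Sketch of the recoding of Self_assess1 and Self_assess3: on the 1-4
# scale the items are reflected (x -> 5 - x) so that a high value means a
# positive self-assessment on all four items. Values here are made up.
import pandas as pd

df = pd.DataFrame({
    "Self_assess1": [1, 4, 2],   # 'Reading is very easy for me'
    "Self_assess3": [3, 1, 4],   # 'I understand almost everything I read...'
})

def reverse_code(item: pd.Series, scale_max: int = 4) -> pd.Series:
    # On a 1..k scale, x -> (k + 1) - x flips the item's direction and
    # leaves missing values (NaN) untouched.
    return (scale_max + 1) - item

for col in ["Self_assess1", "Self_assess3"]:
    df[col] = reverse_code(df[col])

print(df)  # 1 -> 4, 2 -> 3, 3 -> 2, 4 -> 1
```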


The SES indicators form a latent variable, which was used in Studies I, II and IV. The attitude variables, used in Study IV, form a latent variable consisting of five indicators.

Table 5. Pupil attitudes toward reading

Attitude1  I read only if I have to. Ordinal variable, four alternatives: Agree a lot (1),
           Agree a little (2), Disagree a little (3), Disagree a lot (4).
Attitude2  I would be happy if someone gave me a book as a present. Ordinal variable, four
           alternatives: Agree a lot (1), Agree a little (2), Disagree a little (3),
           Disagree a lot (4).
Attitude3  I think reading is boring. Ordinal variable, four alternatives: Agree a lot (1),
           Agree a little (2), Disagree a little (3), Disagree a lot (4).
Attitude4  I need to read well for my future. Ordinal variable, four alternatives: Agree a lot (1),
           Agree a little (2), Disagree a little (3), Disagree a lot (4).
Attitude5  I enjoy reading. Ordinal variable, four alternatives: Agree a lot (1),
           Agree a little (2), Disagree a little (3), Disagree a lot (4).

The attitude items indicated relatively positive attitudes, with mean values all well above the midpoint of the scale. Descriptive statistics are available in Study IV.

Teacher background variables

Four indicators of the concept of teacher competence were selected. These items were available in the teacher questionnaire. The selected items included information about teachers' education and experience, whether or not teachers were certified, and the extent to which reading pedagogy was a focus of their formal training. These variables were used as indicators of the latent variable teacher competence. The role of teacher competence in teachers' judgements was studied in Studies II and III. The effects of teacher competence are more likely to be revealed when teachers have taught their pupils for a period of time. Therefore, the effect of teacher competence was studied for the 3rd grade sample only: about 70% of the grade 3 classrooms had had one and the same teacher during the first three years of school education, whereas only some 20% of the grade 4 classrooms had had their teacher for more than one year. More detailed information and descriptive statistics on the items presented in this section can be found in the respective studies.

Methods of analysis

In the educational sciences, concepts are often rather abstract and cannot be assigned a quantity in the way that a direct observation or measurement might.


For example, instead of observing one aspect of reading achievement, such as reading fluency, several indicators of the construct must be used. Via appropriate indicators it is possible to operationalize theoretical constructs; however, theoretical constructs are always simplifications of the ‘real world’, something that is important to bear in mind when complex phenomena are studied.

Latent variable modeling

Most concepts in the present thesis have been operationalized as latent variables, and the relations among them have been analysed using Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM). CFA concerns measurement models, which specify the relationships between observed indicators (e.g., test items, observational ratings) and a latent variable (e.g., motivation, attitudes). When measurement models are related to each other, the model becomes a structural model, which specifies relations between latent variables; the analytical tool used for this is SEM. Given that a model fits the data, one advantage of latent variables is that they generally better represent the researcher's theoretical frame of reference. The most powerful advantage, however, is that measurement error is accounted for; it is for this reason that latent variables are said to be free from measurement error (Gustafsson, 2009). Directly observable, or manifest, variables generally contain measurement error. All observed variables are error-laden, and the error represents unexplained variance in the manifest variable, which cannot be explained by the latent variable. The error in an indicator can be either random or systematic. The important point, however, is that the latent variable is not affected by the errors in the indicators, since the 'latent' part has been separated from the 'error' part in the latent variable model. Results of Confirmatory Factor Analysis (CFA) and Structural Equation Modeling (SEM) can provide evidence of the convergent and discriminant validity of theoretical constructs (Brown, 2006). Convergent validity is indicated by evidence that different indicators of theoretically similar constructs are interrelated; for example, indicators of attitudes towards reading load on the same factor. Discriminant validity is indicated by results showing that indicators of theoretically distinct constructs are not highly inter-correlated. This implies that different attributes load on separate factors, and that these factors are not so highly correlated as to indicate that a broader construct has incorrectly been separated into two or more factors. One way to strengthen reliability can thus be to use more indicators of the same construct. What can be problematic is that the

indicators must all reflect the same construct: the more indicators used, the greater the risk of measuring something other than the desired construct. CFA and SEM models are often presented graphically. Usually, latent variables are depicted by circles or ellipses and manifest variables by squares or rectangles. Relations are drawn as arrows pointing in the direction of the dependent variable, and double-headed curved arrows indicate a covariance. Measurement errors are drawn as a short arrow, sometimes with a small circle attached to it. The relations between the latent variable and its manifest indicators are expressed by so-called factor loadings, which indicate the strength of the relationship between the indicator and the latent variable. A high factor loading implies that most of the variance in the indicator is captured by the latent variable. If a factor loading is low, the manifest variable does not contribute very much to the latent variable, either because it is a poor indicator of the construct or because it is heavily error-laden. However, small factor loadings may also depend on the measurement level of the indicator: a dichotomous variable or an ordinal scale with few steps holds much less information than, for example, a total score on a test. The construct of interest also influences the loadings. The important point is that the latent variable absorbs the information in the indicators. Figure 2 displays two measurement models, 1) pupil reading achievement as rated by the teachers and 2) pupil attitudes towards reading, which together form a structural model.

[Figure 2: path diagram. Twelve judgement items (Item1-Item12) load on the latent variable Teacher Judgement with standardized loadings .90, .82, .90, .90, .89, .86, .77, .84, .86, .85, .84 and .84; four attitude items (Att1-Att4) load on the latent variable Attitude with loadings .89, .42, .48 and .84; a structural path of .31 leads from Attitude to Teacher Judgement.]

Figure 2. Model of two latent factors and the relation between them.

The measurement model specifies the relations between a latent variable and its indicators. In Figure 2, most factor loadings are high, especially for the Teacher Judgement variable, indicating that the latent teacher judgement variable accounts for a large proportion of the variance in the observed items.


The relation between the latent variables Teacher Judgement and Attitude depicts a relationship between the dependent variable, teacher judgements, and the independent variable, pupil reading attitudes. The standardized regression coefficient indicates the effect of attitudes on teacher judgements. The result shows that the more positive the attitude, the higher the judgement from the teacher, which is reasonable since attitude is often highly correlated with achievement.
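To make the specification of such a model concrete, the following is a minimal sketch in Python using the semopy package. The thesis itself estimated its models in Mplus; semopy is shown only as an open-source analogue, and the file and column names (pirls_sweden.csv, item1-item12, att1-att4) are hypothetical placeholders.

```python
# Sketch of the CFA/SEM model in Figure 2 with semopy: two measurement
# models and one structural path. Names are illustrative, not the actual
# PIRLS variable names; the thesis used Mplus for estimation.
import pandas as pd
import semopy

df = pd.read_csv("pirls_sweden.csv")  # hypothetical input file

MODEL = """
Judgement =~ item1 + item2 + item3 + item4 + item5 + item6 + item7 + item8 + item9 + item10 + item11 + item12
Attitude =~ att1 + att2 + att3 + att4
Judgement ~ Attitude
"""

model = semopy.Model(MODEL)
model.fit(df)                    # maximum likelihood estimation
print(model.inspect())           # factor loadings and the structural path
print(semopy.calc_stats(model))  # chi-square, CFI, RMSEA, etc.
```

In this lavaan-style syntax, '=~' defines a latent variable by its indicators and '~' specifies the regression of teacher judgements on attitudes, mirroring the .31 path in Figure 2.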

Multilevel modeling

In the social and behavioural sciences, many phenomena of interest are nested within a hierarchical structure. Examples of such hierarchies are pupils nested within classrooms, classrooms nested within schools, patients nested within hospitals, or workers nested within companies. The lowest level of measurement is the micro level, and all higher-level measurements are at the macro level. Traditional statistical analyses assume that observations are independent of each other. The assumption of independence implies that subjects' responses are not correlated with each other, which may be the case when the data are drawn as a simple random sample from a large population. However, when people are clustered within naturally occurring units, such as schools and classrooms, the responses of people from the same cluster are likely to have more in common. With clustered data, traditional statistical analyses that assume independence will produce standard errors that are too small. With standard errors that are too small, incorrect rejections of the null hypothesis (Type I errors) can be made (McCoach, 2010). In multilevel analysis, the degree of relatedness of observations within the same cluster is explicitly estimated, thereby correctly estimating the standard errors and eliminating the problem of inflated Type I error rates. The advantages of multilevel modeling are not only statistical. Multilevel analyses also make it possible to exploit the information in cluster samples to explain both the between-cluster and within-cluster variability of an outcome variable of interest. More advanced models allow the use of predictors at both the individual level and the organizational level (classroom) to explain variance in the dependent variable. It is also possible to test whether the relation between an independent variable and a dependent variable varies significantly across clusters. If the impact of an independent variable on the dependent variable varies across clusters, it is possible to attempt to explain the variability in this relation by using cluster-level variables. One way to deal with multilevel data is to aggregate or disaggregate variables to a common level. This, however, causes problems with loss of statistical power,

and the aggregation or disaggregation may overlook interdependencies and interaction effects across levels. Multilevel structural equation modeling can meet both the conceptual and the statistical demands. In this thesis, multilevel modeling has been applied in order to take account of the dependencies among pupils within classrooms and the differences between classrooms. To investigate variability in, for example, achievement across classrooms, variability in teachers' assessments, or other differences between macro units, two-level modeling can be applied. The basic principle of two-level modeling is to decompose the total variance into one between-group component and one within-group component. The proportion of between-class variance is a measure of the amount of similarity within groups, and corresponds to the intraclass correlation. The intraclass correlation (ICC) is a measure of the degree of dependence between individuals and can be used to determine whether or not a multilevel framework is needed. The intraclass correlation can also be regarded as a measure of group homogeneity: the more individuals share common experiences due to proximity and/or time, the more they tend to have in common. A high degree of dependence can, for example, be found for children born and raised in the same family. If the ICC is low, groups are only slightly different from each other; if it is zero, no group differences exist at all for the variables of interest (Kreft & De Leeuw, 1998). In the example in Figure 3 below, a two-level measurement model of teacher judgements is presented. While one step to enhance validity is to formulate the model at two levels, another involves parcelling the items included in the teacher judgement construct. One empirical advantage of parcels relative to single items is their psychometric merits (see, for example, Bagozzi & Heatherton, 1994; Hau & Marsh, 2004; Kishton & Widaman, 1994; Little, Cunningham, & Shahar, 2002). Advocates of parcels also argue that parcels are preferable in that fewer parameters are needed to define a construct. Due to the enhanced measurement properties, the overall model fit usually becomes more acceptable when parcels are modelled than when single items are used. By parcelling items and formulating the models at two levels, an improved model fit, and thus a theoretically sounder model, was obtained.


[Figure 3: two-level path diagram. In the within part, four parcels (Parcel1-Parcel4) load on the latent variable Teacher JudgementW with loadings .94, .94, .97 and .93; in the between part, the classroom-level parcels load on Teacher JudgementB with loadings .92, .96, .96 and .96.]

Figure 3. Two-level model of teachers' judgements of pupil reading achievement.

Figure 3 is divided into two parts: a within part and a between part. The parcel items at the between level are not directly observable, as they are classroom averages of the teacher judgements; for this reason the 'parcel boxes' at that level are drawn without sharp edges. There are several ways of displaying two-level models, and there is no common standard. In this example, the appearance of the figure follows the models drawn in the user's guide of Mplus, the software used for estimating the latent variable models (Muthén & Muthén, 1997-2012).
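As a concrete illustration of the two ingredients just described, the sketch below forms four mean parcels from the twelve judgement items and estimates the intraclass correlation from an empty (intercept-only) two-level model. It uses statsmodels rather than the Mplus models of the thesis, and all file and column names are hypothetical.

```python
# Sketch: item parcelling and the intraclass correlation (ICC) for the
# teacher judgement scale. ICC = var_between / (var_between + var_within).
# Names are illustrative; the thesis estimated its models in Mplus.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pirls_sweden.csv")  # hypothetical input file

# Form four 3-item mean parcels from the 12 judgement items
items = [f"item{i}" for i in range(1, 13)]
for p in range(4):
    df[f"parcel{p + 1}"] = df[items[3 * p: 3 * p + 3]].mean(axis=1)

# Empty two-level model for one parcel: the total variance is decomposed
# into a between-classroom and a within-classroom component
m = smf.mixedlm("parcel1 ~ 1", df, groups=df["classroom"]).fit()
var_between = float(m.cov_re.iloc[0, 0])  # classroom-level variance
var_within = m.scale                      # pupil-level residual variance
print(f"ICC = {var_between / (var_between + var_within):.3f}")
```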

Random slope modeling

In order to examine whether a within-class relation between two variables varies across classrooms, multilevel models with random slopes (also known as varying slopes) can be used (Brown, 2006; Hox, 2002). The assumption tested with random slope modeling is whether a within-level relationship between a dependent variable Y and an independent variable X varies significantly across clusters. For example, the relation between teachers' judgements (Y) and pupil reading achievement (X) may be assumed to vary across classrooms. If the correspondence between judgements and test results varies between classrooms, a random slope variable can be formulated at the between part of the

model. It is possible to relate teacher-level variables to the random slope variable and thereby investigate effects of the teacher on relationships within classrooms (see also Hox, 2002). If there is too little variation in the relationship between judgements and achievement (in most classrooms the relation is about .60), a random slope variable cannot be formulated. The question addressed in Study III was whether teacher competence moderates the relationship between judgements and achievement, that is, whether highly competent teachers show higher agreement between their judgements and test scores than their less competent counterparts. Figure 4 shows a random slope model.

[Figure 4: random slope diagram. In the within part, a slope S connects Pupil achievementW to Teacher JudgementW; in the between part, Teacher Competence predicts the random slope S.]

Figure 4. The relationships in a random slope model.

In this model, 'S' depicts the relation between pupil test results and teacher judgements. This within relationship is assumed to vary between classrooms, which is depicted as the 'S' ellipse in the between part of the model. Covariates such as teacher competence (Z) can be introduced at the between level to investigate whether or not they influence 'S'. The varying slope thus defines the relationship between teacher judgements and pupil achievement. If teacher competence is positively related to the slope, the effect of pupil achievement on teachers' judgements is stronger for the more competent teachers; that is, for teachers with higher competence levels there is a higher correspondence between their judgements and pupil achievement. Conversely, a negative relation would imply that the correspondence between judgements and achievement is lower for the more competent teachers.
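The corresponding estimation can be sketched with a frequentist mixed model in Python: the model below lets the within-classroom slope of judgements on achievement vary across classrooms, which is the 'S' of Figure 4. Variable and file names are hypothetical, and the thesis estimated the actual model in Mplus.

```python
# Sketch of a random-slope model: the within-classroom regression of
# teacher judgements on pupil achievement is allowed to vary by classroom.
# Names are illustrative; the thesis used Mplus.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("pirls_sweden.csv")  # hypothetical input file

m = smf.mixedlm(
    "judgement ~ achievement",     # within-level relation (the slope 'S')
    df,
    groups=df["classroom"],
    re_formula="~achievement",     # random intercept and random slope
).fit()
print(m.summary())
# A non-trivial slope variance in m.cov_re indicates that the
# judgement-achievement correspondence differs across classrooms. A
# teacher-level predictor of the slope (e.g., competence) corresponds to
# a cross-level interaction: "judgement ~ achievement * competence".
```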


Assessing model fit

An important part of the validation procedure involves determining whether the hypothesized model fits the data. A model fits the data if the observed covariance matrix can be reproduced from the model (Brown, 2006). Several goodness-of-fit indices indicate whether or not a model fits the data. It has to be noted that there are no golden rules or definitive cut-off values for model fit, meaning that care should be taken not to reject a model without careful examination (Bentler, 2007; Goffin, 2007; Markland, 2007). Theoretical considerations are helpful in directing the modification of an initial model, as are statistical tests. However, there are a number of guidelines usually followed by a majority of researchers. Given the many goodness-of-fit indices available, the usual advice is to assess model fit by inspecting a number of fit indices that derive from different principles (Hox, 2002). In the current thesis the most commonly used indices were adopted. Because the χ2 goodness-of-fit test is sensitive to sample size and with large samples nearly always delivers a significant value, it was combined with three other fit indices. The RMSEA (root mean square error of approximation) takes account of both the number of observations and the number of free parameters. The cut-off value for the RMSEA has been suggested to be 0.08, while a value of 0.05 or below indicates a close fit (Loehlin, 2004; Marsh, Hau, & Wen, 2004). The CFI (Comparative Fit Index) depends on the average size of the correlations in the data. The CFI value should be as close as possible to 1.0, with values below 0.95 usually not considered satisfactory (Hu & Bentler, 1999). The SRMR (Standardized Root Mean Square Residual), a measure of residuals computed separately for the within and between levels, was also used. For the SRMR, values of 0.08 or lower are needed for the model to be accepted.
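For reference, two of these indices can be written out explicitly. The sketch below implements one common textbook form of the RMSEA and CFI formulas; the input numbers are purely illustrative and are not results from the thesis.

```python
# Sketch: common textbook formulas for RMSEA and CFI, computed from the
# chi-square statistics of the fitted model and the baseline model.
def rmsea(chi2: float, df: int, n: int) -> float:
    # RMSEA = sqrt(max(chi2 - df, 0) / (df * (n - 1)))
    return (max(chi2 - df, 0.0) / (df * (n - 1))) ** 0.5

def cfi(chi2: float, df: int, chi2_base: float, df_base: int) -> float:
    # CFI = 1 - max(chi2 - df, 0) / max(chi2_base - df_base, chi2 - df, 0)
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 - d_model / d_base if d_base > 0 else 1.0

print(round(rmsea(chi2=120.4, df=53, n=5271), 3))                      # ~0.016, a close fit
print(round(cfi(chi2=120.4, df=53, chi2_base=9500.0, df_base=66), 3))  # ~0.993, above 0.95
```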

Missing data

Missing information on different variables is a common problem in empirical research. A general problem with missing data is that it may affect a study's external validity: results built on a subset of the observations might not correspond to the results that would have been obtained had all observations been included in the analyses. Another problem is loss of statistical power, which implies that effects in the population may go undetected because the sample is too small to give significant results. Missing data can be either random or systematic. If data are missing completely independently of all the observed variables, the mechanism is denoted MCAR (missing completely at

random), while if there is a probability distribution (i.e., the distribution of missingness depends on the observed part of the data), it is called MAR (missing at random). If the missingness is systematic, it is denoted MNAR (missing not at random) (Schafer & Graham, 2002). There are several approaches to handling missing data. For example, listwise deletion removes cases with missing data on any variable; although easy to use, it often results in the loss of a considerable proportion of the original sample and, in turn, of statistical power. The most widely preferred methods for handling missing data in SEM applications are maximum likelihood (ML) estimation and multiple imputation (Brown, 2006). These approaches make use of all the available data. ML has been regarded as the best method for handling missing data in SEM applications: instead of excluding observations, it uses all available information from all observations in the estimation.
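The practical cost of listwise deletion can be made concrete with a small simulation. Everything below is synthetic and makes no claim about the PIRLS data; it merely illustrates the MAR mechanism described above.

```python
# Sketch: simulated MAR missingness, in which the probability that a
# reading score is missing depends on observed SES. Listwise deletion
# then both shrinks the sample and slightly biases the estimated mean
# upwards, because low-SES (on average lower-scoring) pupils drop out
# more often. All data here are simulated.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 5000
score = rng.normal(500, 100, n)                        # 'true' reading scores
ses = 0.3 * (score - 500) / 100 + rng.normal(0, 1, n)  # SES, mildly related to score

df = pd.DataFrame({"score": score, "ses": ses})
p_missing = 1 / (1 + np.exp(2 + df["ses"]))            # higher for low SES
df.loc[rng.random(n) < p_missing, "score"] = np.nan

complete = df.dropna()                                 # listwise deletion
print(f"cases lost: {n - len(complete)} of {n}")
print(f"true mean: {score.mean():.1f}")
print(f"complete-case mean: {complete['score'].mean():.1f}")
```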

Analytical stages

The purpose of this overarching discussion is to provide an overall picture of the validity of teacher judgements, pupil self-assessments and external test results. Furthermore, factors that may influence the teacher judgements in particular are investigated. Sound validation requires a logical and systematic procedure. The Toulmin model (1958/2003), previously presented in the theory section, is one way to structure the analytical steps. In order to justify a claim, i.e., that teacher judgements correspond to pupils' actual level of performance, data must be provided. Such data might, for example, be statements about pupil achievement (e.g., grades). It then needs to be justified, or 'warranted', that the 'relevant' knowledge and skills have been assessed. Rebuttals address threats to a valid claim. Rebuttals may be many, and can be explored through relevant research questions or hypotheses. Support can be provided by evidence suggesting that the threat, expressed in terms of a rebuttal, is false.

The structure of arguments

For the sake of simplicity, only the analytical steps involved in validating teacher judgements are presented. A diagram of the structure of arguments is set out in Figure 5.


Figure 5. Analytical steps when validating inferences of teacher judgements.

The arrow from the data to the claim represents an inference that is justified on the basis of the warrant. The claim here is that teachers' judgements correspond to pupils' actual knowledge in the Swedish language domain. The data is the pupil's overall language performance in the classroom according to the teacher's judgement. In this case, the warrant that justifies the inference from the data to the claim is the following:

Warrant: The observational aspects in the assessment material measure relevant literacy aspects, as stated in the syllabus. The teacher judgements should thus include the appropriate knowledge and skills.

If the data consisted of a pupil's performance on the PIRLS test, the warrant could be that the test is well aligned with the Swedish syllabus and that it measures reading and writing aspects which are critical for pupils in the primary school years. If the data consisted of pupil self-assessments of reading achievement, the warrant might be that the assessment statements strongly relate to reading achievement and that it is possible for primary school children to answer them reliably.

In addition to the warrant that justifies the inference from the data to the claim, a validity argument includes rebuttals, or alternative explanations, that might account for the observed performance and scores on an assessment. Rebuttals weaken the validity of the intended inferences. Specific rebuttals must be articulated and investigated to gather evidence about how such threats might have affected the score-based interpretations. In the case of the teacher judgements, four rebuttals are investigated in this thesis:

Rebuttal 1: Teacher judgements do not correspond closely to other measures of reading achievement.

Rebuttal 2: Teachers do not rate pupil performances consistently, and as a result the judgements reflect different abilities for different pupils. Thus, the teacher judgements reflect differences among teachers' interpretations instead of individual pupil performances.

Rebuttal 3: Teachers take non-achievement factors into account when assessing their pupils. Thus, the judgements do not reflect achievement only but also, for example, gender and SES.

Rebuttal 4: Teacher judgements are not related to competence. Teachers with higher levels of competence do not assess pupil achievement more accurately than others.

To weaken or eliminate the rebuttals and to support the warrants, adequate analyses are required. In this thesis, support is provided by means of the following analytical steps:

Backing 1 – Content coverage: The teacher judgement aspects in the diagnostic material should be aligned with the syllabus.

Backing 2 – Correspondence: The teacher judgements should correlate significantly with measures of the same ability. Evidence of correspondence can be collected by examining the relationship between the teacher judgements and the PIRLS test.

Backing 3 – Two-level modeling: Small between-class effects indicate that teacher judgements work similarly for different teachers, i.e., that they can be used for comparison purposes across classes and schools.

Backing 4 – Non-random external influences: Little or no systematic variation should occur when gender and SES are included in the analyses as independent variables.

Backing 5 – Random slope analyses: The correspondence between judgement and achievement should be higher for teachers with higher competence levels.
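Since the argument structure is essentially a small data structure, it can be written out explicitly. The following Python sketch (all class and field names are invented for illustration, not taken from the thesis) encodes the claim, data, warrant, rebuttals and backings set out above:

```python
# Illustrative encoding of the Toulmin structure used above; rebuttals are
# listed alongside the backings (analytical steps) intended to weaken them.
from dataclasses import dataclass, field

@dataclass
class ToulminArgument:
    claim: str
    data: str
    warrant: str
    rebuttals: list[str] = field(default_factory=list)
    backings: list[str] = field(default_factory=list)

teacher_judgement_argument = ToulminArgument(
    claim="Teacher judgements correspond to pupils' actual knowledge "
          "in the Swedish language domain.",
    data="Pupils' overall language performance according to teacher judgements.",
    warrant="The observational aspects in the assessment material measure "
            "relevant literacy aspects stated in the syllabus.",
    rebuttals=[
        "Judgements do not correspond to other measures of reading achievement.",
        "Judgements reflect teacher interpretations rather than pupil performance.",
        "Judgements take non-achievement factors (gender, SES) into account.",
        "Judgement accuracy is unrelated to teacher competence.",
    ],
    backings=[
        "Content coverage", "Correspondence with PIRLS", "Two-level modeling",
        "Non-random external influences", "Random slope analyses",
    ],
)
print(teacher_judgement_argument.claim)
```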


Chapter Six: Results and Discussion

In this chapter the findings of the research are summarized and discussed. The results of the four studies are presented together, in accordance with the research questions and the analytical steps outlined in the argumentation model presented in the previous chapter. In order to examine validity issues in teacher judgements, the reading test results in the PIRLS 2001 study were used as a criterion. Conversely, if the validity of the test results is in focus, the teacher judgements can be used as a criterion for the test. As indicated by previous research, standardised test results have typically been used to explore issues of validity in teacher judgements; however, it should also be possible to use the teacher judgements as a criterion for the PIRLS test.

Validating teacher judgements for use within classrooms and for classroom comparisons

Different sources warrant that the teacher judgements and the PIRLS test are valid measures of achievement. For example, the test has been suggested to measure relevant aspects of the curriculum (Swedish National Agency for Education, 2006), while the time teachers spend with pupils, observing and documenting their knowledge and skills, gives credibility to the teachers' judgements (Gipps, 1994). However, the validity of the two measures can also be investigated mutually, by statistical means, through the correspondence between them.

Assessment within classrooms

The first step of the analysis was to address the rebuttal stating that teacher judgements do not correspond to other measures of the same construct. The same rebuttal can be specified for the PIRLS test results and the self-assessments. By correlating the teacher judgements and the PIRLS test results, this first step involves a mutual investigation of the validity of inferences from teacher judgements and external tests. The relationship between teacher judgements and pupil test scores on reading achievement was the main focus of the analysis in Study I, which investigated this relationship in both grade 3 and grade 4. Initially, a latent model for teacher judgements was defined. This was done in order to avoid the low reliability of a single-indicator approach.


Further, it extrapolated the single ratings to an overall judgement of pupil language ability. As the intraclass correlation indicated high variability in the judgements between classrooms, the model was formulated as a two-level model in which the total variance was decomposed into within-group and between-group variance. Initially, a two-factor model separating reading and writing was fitted to the data. As the two factors were very highly correlated, the model was reformulated with the 12 manifest variables loading on one single latent factor, defining teacher judgement of overall reading and writing ability. Even though this step improved model fit, a close fit was not obtained, and the model was therefore modified. In order to improve the psychometric properties of the model, the items were parcelled: the 12 items were randomly assigned to four parcels of three items each. Since the focal interest was not at the item level, it seemed theoretically sound to parcel the items in order to obtain a global construct of the teacher judgements. This model was accepted and used in all four studies.

The next step was to relate pupil achievement on PIRLS 2001 to the teacher judgement factor at both the within and the between classroom level. Figure 6 shows a model of the analysis. The relationship at the within classroom level describes the correspondence between the teacher judgements and pupil achievement. The correlation was about .65 in grade 3 and about .60 in grade 4. These results accord reasonably well with other correlational studies of the relationship between teacher judgements and pupil achievement (Hoge & Coladarci, 1989; Meisels et al., 2001; Südkamp et al., 2012).
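The random parceling step is straightforward to express in code. A minimal sketch follows (in Python, with simulated ratings and invented column names):

```python
# 12 judgement items are randomly assigned to four parcels of three items
# each; each parcel score is the mean of its items. Simulated ratings on the
# 1-10 scale; the parcels then serve as indicators of the latent factor.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2001)
items = pd.DataFrame(rng.integers(1, 11, size=(200, 12)),
                     columns=[f"item{i+1}" for i in range(12)])

shuffled = rng.permutation(items.columns)            # random assignment of items
parcels = pd.DataFrame({
    f"parcel{p+1}": items[shuffled[p*3:(p+1)*3]].mean(axis=1)
    for p in range(4)
})
print(parcels.head())
```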

[Figure 6 here: a two-level path diagram. Within level: Teacher Judgement (W) – Pupil Achievement (W), correlation .65. Between level: Teacher Judgement (B) – Pupil Achievement (B), correlation .25.]

Note. p < .05 unless otherwise stated. Model fit: χ2 = 169.45, p = .00, df = 10, CFI = .99, RMSEA = .06, SRMRwithin = .01, SRMRbetween = .04.

Figure 6. Model of the relationship between teacher judgement and pupil test results for PIRLS 2001.


The results showed that there was a higher association between judgements and pupil achievement for teachers in grade 3. In order to investigate whether the difference between grades was statistically significant, a multiple group model was formulated, and the difference was found to be statistically significant according to the chi-square test for nested models. The difference between grades 3 and 4 may be due to the different amounts of time spent with the pupils and the different types of teacher education associated with the two grades, but also to different expectations of pupil knowledge and skills. The results also indicate that the inferences based on the external test should be considered valid. In Toulmin's (2003) terms, the correlational analysis provided support for both teacher judgements and PIRLS test results; the rebuttals stating that the teacher judgements and external test results were not valid measures of achievement were thereby weakened.

Despite the substantial correlation, the part of the variance in the teacher judgements left unaccounted for was nevertheless larger than the part explained by the test results. However, high correlations are, for several reasons, hard to obtain. First, neither of the assessments is a perfect measure of achievement. The PIRLS results were obtained from a single measurement and give a snapshot of what pupils can do; differences in the administration of the test and in pupil motivation when taking it may have affected the reliability of the test results. Second, the teacher judgements could contain non-achievement factors, as has been suggested by previous research (e.g., Brookhart, 2012). This relates to other rebuttals, namely that factors other than achievement influence teacher judgements. Possible influences of such factors were investigated in a second set of analyses. First, however, attention is paid to the between-level part of the model displayed in Figure 6.
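Before moving on, the nested-model comparison mentioned above can be made concrete. A hedged Python sketch of the chi-square difference test follows; the fit values are invented for illustration, not taken from the thesis:

```python
# Chi-square difference test for nested models: the constrained model fixes
# the within-level correlation equal across grades; the free model lets it
# differ. Both sets of fit values below are hypothetical.
from scipy.stats import chi2

chi2_constrained, df_constrained = 190.0, 11   # hypothetical constrained model
chi2_free, df_free = 169.45, 10                # hypothetical free model

delta_chi2 = chi2_constrained - chi2_free
delta_df = df_constrained - df_free
p = chi2.sf(delta_chi2, delta_df)
print(f"delta chi2 = {delta_chi2:.2f}, delta df = {delta_df}, p = {p:.4f}")
# A significant difference means the equality constraint worsens fit, i.e.
# the grade 3 and grade 4 correlations differ significantly.
```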

Classroom comparisons

While the previous section concerned analyses at the individual level, the current section focuses on system-level analyses. The unit of analysis here is the classroom, which means that the variables––judgements, test results and so forth––are averaged measures of the individual observations. The variation existing within and between classrooms is separated, so that it becomes possible to conduct analyses at the different levels simultaneously. Two-level modeling is a powerful tool when differences, for example in achievement, are attributed to different levels. If the school system is largely homogeneous, there is little variation in achievement levels between schools or classrooms; if the level of performance differs substantially between classrooms, it is useful to adopt two-level modeling. The results at a macro level also tend to be more stable, since individual observations are more affected by measurement error than are aggregated observations.

At the classroom level, a rebuttal is directed at how the different measures work for classroom comparisons. To provide support for the contention that teacher judgements would be similar for similar achievement levels across classrooms, attention was paid to the between-level part of the model, in which the pupils' average achievement was correlated with the teachers' average judgements (presented in Figure 6). The analyses revealed modest correlations: about .25 in grade 3 and about .18 in grade 4. Thus, the teachers' average judgements did not correspond well with the classrooms' average test scores. In other words, these results indicate that low-achieving classrooms may receive higher average judgements than high-achieving classrooms, and vice versa. Similar signs of mismatch between teachers' grading and national test results have been noted in the final year of compulsory school and in upper secondary school by the Swedish National Agency for Education (2007, 2009). Strong validity claims about the usefulness of teacher judgements for comparisons across classrooms were thus not warranted by this analysis, and no support was found that could weaken this rebuttal.

However, there were some circumstances in the present study which may have made it difficult for equality of teacher judgements to be established. Although each statement in the teacher rating instrument was anchored in the national syllabus, no explicit criteria were attached to each scale point. This probably made it more difficult to obtain consistent teacher judgements between classrooms. The conditions were quite similar to those which teachers face in their ordinary work. Since the reforms in the 1990s, Sweden has had a highly decentralized system, meaning that each municipality is responsible for organizing and operating school services. A challenge for a decentralized school system is to ensure equality and consistency in teacher assessment, especially since interpretations of criteria have been shown to vary between teachers and schools (Selghed, 2004; Tholin, 2006). At the time of the data collection, no explicit criteria were given to teachers in grades 3 and 4. In the most recent curriculum (Lgr 11), the levels of knowledge pupils are expected to attain are more explicitly stated. Although these changes in the curriculum warrant more equal assessment practices, it remains to be investigated whether these specifications result in more consistent teacher judgements. Additional ways of achieving greater uniformity in understanding and assessment between schools and teachers may be found in moderation (Harlen, 2005; Klenowski & Wyatt-Smith, 2010; Nusche, Halász, Looney, Santiago, & Shewbridge, 2011).


The purpose of moderation is to develop, via collegial discussion, successful strategies for assessment as a means of obtaining a shared understanding of the expectations of pupil knowledge and skills and of the interpretation of standards. The results of Klenowski and Wyatt-Smith's study suggested different ways in which teachers discussed and interacted with one another to reach agreement about the quality of pupil performance. Interpreting standards similarly across teachers is the key to achieving consistency of teacher judgement. Not only are clear criteria important for achieving consistency; time must also be allowed for moderation. Nusche et al. conclude that, in order to ensure that assessment of pupil knowledge and skills is reliable and fair, teachers' moderation practices need to be supported.

Pupil self-assessments in relation to other forms of assessment

The next analysis examined the correspondence between pupil self-assessments, teacher judgements and the PIRLS test results; this was also the primary purpose of Study IV. These analyses address an issue that relates to the accuracy of self-assessment. For self-assessments to be useful, they should correlate with teachers' judgements: pupils interpret teachers' judgements and use the information to make decisions about their own performances (Black & Wiliam, 1998; Fredriksson et al., 2011). According to previous research (e.g., Ross, 2006), teacher judgements have most often been used as a criterion for pupil self-assessments, although comparisons with test scores have also been presented. Unlike many other studies, the present study compares pupil self-assessments to both teacher judgements and pupil test scores. The findings of the previous analysis (presented in Figure 6) showed that teachers who had taught their pupils for a longer period of time showed a higher correspondence between their judgements and the test results. Pupils who have had the same teacher for a longer period of time may, in turn, be in a better position to assess their own knowledge and skills. For this reason, a grade 3 sample was selected. The rebuttal to the credibility of pupil self-assessments is, primarily, that they do not correspond to pupils' actual knowledge and skills. In order to explore whether pupil self-assessments were accurate predictions of reading achievement, they were correlated with teacher judgements and pupil test results. The model was set up at two levels; however, the intraclass correlation (ICC) for the latent self-assessment variable indicated no between-classroom variability.
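As a brief aside, the ICC that governed this decision can be illustrated with simulated data. The moment-based Python sketch below is only a rough illustration, not the estimator used by the SEM software:

```python
# ICC = between-class variance / total variance. A low ICC (near 0) means
# classroom means barely differ, so a between level cannot be modelled for
# that variable. (Using the raw variance of class means slightly overstates
# the between component; this is a rough sketch.)
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
classes = np.repeat(np.arange(40), 25)                  # 40 classes, 25 pupils
class_effect = rng.normal(scale=0.2, size=40)[classes]  # small between variance
score = class_effect + rng.normal(size=classes.size)    # large within variance

df = pd.DataFrame({"cls": classes, "y": score})
between = df.groupby("cls")["y"].mean().var(ddof=1)
within = df.groupby("cls")["y"].var(ddof=1).mean()
icc = between / (between + within)
print(f"ICC = {icc:.3f}")   # close to zero for these simulated data
```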


The low ICC implies that the classroom-level averages for self-assessments were relatively similar across classrooms. Consequently, analyses at the second level could not be conducted with these variables. In Figure 7, the model with the three forms of assessment is presented.

[Figure 7 here: correlations among the three assessment forms. Self-assessment – Pupil achievement: .58; Self-assessment – Teacher judgement: .59; Pupil achievement – Teacher judgement: .65.]

Note. p < .05 unless otherwise stated. Model fit: χ2 = 280.20, p = .00, df = 24, CFI = .99, RMSEA = .05, SRMR = .02.

Figure 7. Model of the relationships between the three assessment forms.

The relationships between self-assessment and the other measures of achievement generated correlations close to .60: the correlation with teacher judgements was .59 and with the PIRLS results .58. As previously noted, the relationship between grade 3 teachers' judgements and pupil test results was .65; even though the relationship between pupil self-assessments and the two other measures was lower, it was nevertheless quite substantial. Similar analyses were conducted in relation to the IEA Reading Literacy Study 1991, showing correlations of .25–.55 for a range of countries (Elley, 1992). The results of the current research thus show that pupils in grade 3 were fairly capable of assessing their own knowledge and skills in the subject of Swedish, although only at the within-classroom level.

One use of self-assessment is, for example, to increase pupils' awareness of their own achievement level and thereby make them more responsible for their own learning. Hansson (2011) argues that the increased responsibility for one's own learning in school could be one cause of the trend of declining knowledge in mathematics. Her findings show that pupils from lower SES groups have the most difficulty in adapting to the demand to take greater responsibility for their studies. Consequently, there may be reason to be cautious about increasing the use of self-assessment. In the current study, differences in the ways in which boys and girls, and pupils from different socioeconomic status groups, assess their Swedish language abilities were therefore also investigated. In the following section, factors influencing both teacher judgements and pupil self-assessment are examined.

Factors influencing teacher judgements and pupil self-assessment

In order to further investigate rebuttals to the validity of teachers' judgements, pupil gender and socioeconomic status (SES) were related to the judgement variable as independent explanatory variables. In this analysis, the correlation between teacher judgements and pupil test results was rephrased as a dependent–independent relationship, with teacher judgements set as dependent on test achievement. The reason for using pupil achievement as the criterion was that the primary focus of the analyses was on the validity of teacher judgements. If the PIRLS test results had been in focus, the effect of gender and SES on test results could instead have been investigated with teachers' judgements controlled for; this step would show whether the test results were associated with any gender or SES bias. However, validity issues in the test are typically investigated with other methods, and the tests have been constructed so that the level of difficulty of the items should not depend on gender. Analyses of gender bias in the PIRLS test instrument have been conducted by Geske and Ozola (2009), who noted a large gender gap in Latvian pupils' achievement in PIRLS 2006; differential item functioning (DIF) analysis showed, however, that the test instrument was not a cause of the gender gap.

A research question stated in the first of the four sub-studies of this thesis was whether teacher judgements and pupil achievement reflected factors other than actual achievement. The rebuttal here is thus that the judgements include other characteristics. SES could be addressed at both the within and the between level. Gender could only be included in the within part of the model, because there was no between-class effect for the gender variable: the relative numbers of girls and boys were about the same in most classes. The model is presented in Figure 8 below.


[Figure 8 here: a two-level path model. Within level: gender and SES have direct effects on both teacher judgement and pupil achievement (coefficients in the range .10–.30), and teacher judgement is regressed on pupil achievement (.60). Between level: classroom SES predicts average pupil achievement (.81) and average teacher judgement (.42), while the path from average achievement to average judgement is not significant.]

Note. p < .01 unless otherwise stated. Model fit: χ2 = 763.86, p = .00, df = 72, CFI = .97, RMSEA = .04, SRMRwithin = .03, SRMRbetween = .04.

Figure 8. A two-level model of the influence of pupils' SES and gender on teacher judgements. Standardized regression coefficients for grade 3.

The model is divided into two parts, the upper part highlighting the within-classroom relationships and the lower part the relationships between classrooms. Within classrooms, the relation between teacher judgements and test results decreased slightly when the other variables were introduced. Both gender and SES have a direct effect on achievement, implying that girls and pupils from homes with higher socioeconomic status have higher levels of achievement. This is in line with previous research on the matter (e.g., Rosén, 1998; Yang, 2003). The research question in focus at the within level was whether effects of pupil gender and SES on teacher judgements can be estimated when test results are held equal for all pupils. Girls and pupils with higher SES not only had higher test results; they also received higher teacher judgements when test results were held equal. The effects were statistically significant and not negligible: girls and pupils with higher SES received higher ratings from the teacher than the test results could account for. Some explanations for these results may be found in previous research.
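The logic of this within-level analysis can be sketched with simulated data. In the Python example below (data, coefficients and variable names are invented; the thesis estimated the corresponding model as a two-level SEM in Mplus), judgement is regressed on achievement, gender and SES, so that the gender and SES coefficients capture judgement differences the test results cannot account for:

```python
# Simulated within-level analysis: with achievement in the model, nonzero
# coefficients for 'girl' and 'ses' mean those groups receive higher
# judgements than their test results account for.
import numpy as np

rng = np.random.default_rng(4)
n = 2000
ses = rng.normal(size=n)
girl = rng.integers(0, 2, size=n).astype(float)
achievement = 0.3 * ses + 0.1 * girl + rng.normal(size=n)
judgement = (0.6 * achievement + 0.15 * girl + 0.10 * ses
             + rng.normal(scale=0.7, size=n))

X = np.column_stack([np.ones(n), achievement, girl, ses])
beta, *_ = np.linalg.lstsq(X, judgement, rcond=None)
for name, b in zip(["intercept", "achievement", "girl", "ses"], beta):
    print(f"{name:>11}: {b: .3f}")
```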


That girls are awarded higher subject grades than their test results can account for has previously been shown in Swedish research (Klapp-Lekholm, 2008; Reuterberg & Svensson, 2000; Svensson, 1971). One interpretation has been that girls are awarded grades that are too high because they tend to conform to the teachers' wishes (e.g., Emanuelsson & Fischbein, 1986). Another interpretation put forward is that girls perform ordinary course work better than boys (e.g., Wernersson, 1989). Following the steps of analysis that address the rebuttals, it must be concluded that there is no ringing endorsement for the contention that teachers do not take factors other than achievement into account. A variable measuring vocabulary or oral skills may be the mediating variable "behind" the effects of gender and SES: it can be hypothesized that girls and pupils with higher SES have more highly developed skills in these areas. Motivation, effort and behaviour, all crucial for continued study success, may also be possible mediators. If teachers do take such aspects into account, it does not imply that "favoured" pupils would manage any less well in school; however, the principle of equality in education would be challenged. An interesting avenue for further research would therefore be to identify possible mediators that can explain teacher-judged achievement differences between boys and girls and between pupils of different SES.

When SES was introduced at the between-classroom level, it was found that, with achievement controlled for, high-SES classes received substantially higher judgements from their teachers. One interpretation of this result could be that high-SES classrooms were rated unfairly high relative to their test results. However, this interpretation does not accord well with, for example, the results of Klapp Lekholm (2008), who found that, rather than teachers giving advantages to schools with a large share of high-SES pupils, a compensatory grading strategy was in evidence. One plausible explanation for the result in the present thesis is that the SES variable holds more information at the between level than the test results do. Teachers have at least some information about their pupils' SES, for example through the school area or pupil intake, whereas the achievement variable does not provide the teacher with any information about the performance of one class relative to another: teachers were informed neither about their own pupils' achievement nor about the performance of other classes on the PIRLS test. The mediating SES variable fully accounted for the effect of class average achievement on average judgements, an effect which, however, was initially quite modest. The effect of average SES on average teacher judgement was about .40, which may be considered quite low given the large achievement differences between different SES groups; this relationship is indicated by the direct effect of classroom SES on class average test results, which was about .80. About 20% of the variance in achievement was between classes, and these differences were thus mostly accounted for by pupil SES.

The influence of SES and gender on pupil self-assessment within classrooms

Differences in pupil self-assessments were investigated with respect to pupil gender, SES and attitudes in Study IV (not included in the model above). If there are differences in how well different groups of pupils can assess their knowledge and skills, the use of self-assessments in school may be problematic. For example, if a pupil with lower SES assesses his or her reading skills inaccurately, the learning conditions will not be as good as for a pupil who assesses them accurately. This may be particularly true if teacher presence in school is low and the share of individualized learning is high. The first analysis showed gender and SES differences in the self-assessments, indicating that girls and pupils with higher SES had higher levels of self-assessment. This was expected, since achievement differences between these groups exist. However, when controlling for achievement differences (test results and teacher judgements), the differences between girls' and boys' and different SES groups' self-assessments disappeared. Pupil attitudes towards reading were positively related to all of the other variables. Thus, the present investigation did not reveal any differences in the self-assessments on the basis of gender or SES. However, as schools become increasingly segregated with respect to SES (Myrberg & Rosén, 2006), it will be important to follow up these results with more recent pupil cohorts. Moreover, individual responsibility is likely to be less pronounced in primary school than in later school years; at the same time, the data in the current thesis come from 2001, and it is plausible that individual responsibility and the use of self-assessments have increased since then.

Exploring the relationship between teacher competence, teacher judgements and pupil test results

Although teachers in different classes did not assess their pupils' knowledge and skills in a similar way, the correlation within classes was high in most cases (.65 on average). This indicates that teachers are largely able to rank order their pupils' achievement in the Swedish language domain. However, a certain variability in the within-classroom relations can also be expected, especially since significant differences across grades have previously been found. One hypothesis which, if supported, would strengthen the validity of teacher judgements is that more competent teachers show a higher correspondence between their judgements and their pupils' achievement. Put another way, the higher the competence level of the teacher, the better able he or she may be to identify pupil knowledge and skills. The hypothesized rebuttal here is thus that no differences will be found in the judgements of teachers with different competence levels. Before investigating this hypothesis, however, teacher competence must be operationalized and its role in relation to pupil achievement determined. This was also the main objective of Study II.

In order to explore the impact of teacher competence on pupil achievement, two achievement measures were selected: (1) test results from PIRLS 2001, and (2) teachers' judgements of pupils' reading and writing skills. To study the effects of teacher competence, the grade 3 sample was selected, for at least two reasons. First, 70% of the pupils had had the same teacher over the first three years in school, typically changing teacher for the first time in grade 4. This implies that grade 4 teachers had not taught their pupils for more than a few months, which makes the investigation of teacher effects difficult. Second, the teachers in grade 3 had different training and experience from most of the grade 4 teachers; for example, their teacher education would have had a greater emphasis on reading pedagogy. Teachers holding primary-school training (småskollärarutbildning and lågstadielärarutbildning) were typically educated in the 1960s–1980s. Teachers with this kind of training have been shown to achieve significantly better results on reading tests measuring grammar and phonological awareness than their more recently educated colleagues (Alatalo, 2011). Among the grade 3 teachers there were several other kinds of training, as well as great variability in experience and further training.

In order to operationalize teacher competence, a latent variable was formulated. The measurement model of teacher competence was based on four manifest variables: (1) appropriate education (coded according to a categorization made by Frank (2009)); (2) number of years of experience in grade 3; (3) whether or not the teacher held a certification; and (4) the amount of reading pedagogy included in the teacher's basic education. An excellent fit to the data was achieved for this measurement model. In order to examine the impact of teacher competence, this variable was related to both achievement measures at the between-level part of the model. The results showed that teacher competence had a significant impact on pupil achievement, regardless of whether achievement was measured by teacher judgements or by PIRLS test results.

This implies that teachers with higher competence levels worked in classrooms with higher achieving pupils. However, potential selection effects could have biased the analyses: teachers with high competence are likely to "select" schools with high achieving pupils, or, alternatively, parents with high SES may choose schools where more competent teachers work. Since pupil SES was shown in Study I to correlate highly with pupil achievement, SES was chosen as the variable controlling for selection effects. The effect of teacher competence on teacher judgements and pupil test results then decreased slightly. However, the SES variable was found to be uncorrelated with teacher competence, which accords with the findings of Frank (2009). Pupil SES did not have as large an effect on teacher judgements as on pupil test results, where SES explained 64% of the variance. One reason for this may be that the teacher judgements were associated with ceiling effects: performance differences across classrooms that actually existed were not captured by the judgements. For example, two high achieving classes would probably receive similarly high average ratings, even though there might have been actual differences in their levels of achievement; thus, there was little variance in the judgement variable for the SES variable to explain. Another explanation may be that selection mechanisms other than SES, such as motivation or effort, were associated with teacher judgements. This forms an important question for further research.

In summary, the results of this investigation showed that it was possible to operationalize a latent teacher competence variable which had a substantial effect on both achievement measures. The next step was to investigate the role of teacher competence in the accuracy of teacher judgements. Although the influence of teacher competence on assessment practice has rarely been investigated, it may provide support for the validation of teachers' judgements – if those with higher competence levels are more accurate in their judgements of their pupils' reading skills. This was the main focus of Study III. More accurate judgements mean, in this case, a higher correspondence with the pupils' reading test results. The results of Study I form the starting point for this hypothesis, since the relationship between teacher judgements and pupil achievement was shown to differ significantly between the grades. The judgements of teachers in grade 3 showed higher correspondence, which is reasonable considering that they had taught their pupils for a longer period of time and thus had a fuller picture of their pupils' abilities. A high correlation between judgements and test results could therefore be an indication of high-quality judgements.


The first step in the analysis was to investigate whether the relationship between teacher judgements and pupil test results varied between classes; basically, this involved testing whether or not the correlation was close to .65 in most classes. To examine this, a multilevel model with random slopes was formulated, and the significance of the variation of the slope between classrooms – i.e., whether the relationship between teachers' judgements and pupil reading achievement varied between classes – was determined. The estimated variance in the slope (.33) was significant, which implies that the relationship between teacher judgements and pupil achievement varied significantly between classes. This variance justified further analyses, and the slope was defined as a new variable at the between-level part of the model. In the next step, the latent teacher competence variable and the classes' average achievement were related to the slope variable as explanatory variables. In order to estimate the effect of teacher competence on the slope, it is necessary to control for group achievement so that the level of achievement is equal for all classes. The results revealed that teacher competence had a significant positive effect on the slope, implying that the effect of achievement on judgements was greater for more competent teachers. In other words, for teachers with a higher level of competence there was a higher correspondence between their pupils' test results and their own judgements of their pupils' reading achievement. These results speak to the importance of an appropriate teacher education for teachers' ability to assess pupil achievement. They also indicate that the test results represent a valid measure of pupil achievement, since the correspondence between these and teachers' judgements was higher for pupils taught by more competent teachers.
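A simplified two-step illustration of the random-slope logic may be helpful. In the Python sketch below (simulated data; the thesis estimated the slopes and their relation to competence jointly in a two-level SEM, not in two separate steps), per-class slopes of judgement on achievement are estimated and then correlated with teacher competence:

```python
# Simulate classes whose judgement-achievement slope increases with teacher
# competence, recover per-class OLS slopes, and relate them to competence.
import numpy as np

rng = np.random.default_rng(5)
n_classes, n_pupils = 60, 25
competence = rng.normal(size=n_classes)
slopes = 0.6 + 0.15 * competence + rng.normal(scale=0.1, size=n_classes)

est_slopes = np.empty(n_classes)
for c in range(n_classes):
    ach = rng.normal(size=n_pupils)
    judge = slopes[c] * ach + rng.normal(scale=0.6, size=n_pupils)
    est_slopes[c] = np.polyfit(ach, judge, 1)[0]   # per-class OLS slope

r = np.corrcoef(competence, est_slopes)[0, 1]
print(f"correlation(competence, class slope) = {r:.2f}")
# A positive correlation mirrors the finding: more competent teachers show
# higher judgement-achievement correspondence.
```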


Chapter Seven: Concluding remarks

The purpose of this thesis has been to explore validity issues in different assessment forms. The research questions have concerned the validity of the inferences made from teacher judgements, test results and pupil self-assessments of reading literacy in grades 3 and 4 in Swedish primary school. In particular, focus has been directed at teacher judgements of reading literacy and their trustworthiness.

From this research it is reasonable to conclude that teachers are largely able to rank order their own pupils in terms of their knowledge and skills. However, the correspondence between teacher judgements and pupil test results on PIRLS varied between teachers; a higher correlation between these variables was found for teachers with higher competence. The role of competent teachers for pupil achievement, as well as for valid judgements, needs to be addressed more comprehensively in future research. Further, pupil self-assessments corresponded relatively well to both teacher judgements and PIRLS test results.

Calibrating teachers' judgements so that pupils in different classrooms are judged similarly when they have similar knowledge and skills remains one of the most problematic issues in a school system aiming at educational equality. Pupils are likely to receive different feedback or grades depending on which teacher they have and which school they attend. Differences in assessment can also lead to an imbalance in the allocation of resources to schools. This is a serious concern for equality in education. In spite of these results, it can nevertheless be concluded that highly competent teachers are in a good position to identify pupil achievement levels, which in turn can mean that, resources permitting, they are able to provide assistance to those who need it most. Informing teachers about the knowledge levels of pupils in other classes and other schools could be beneficial for fairness and equity in the Swedish school system. This would also relate to the formative aspects of assessment, since teachers would be able to get feedback on their assessment practice.

Although teachers' judgements were not consistent across different classes, this does not imply that teachers were not aware of fairness and equality in assessment. Indeed, they seem to anchor their assessments in the performances within their own class, ranking pupils in terms of relative performance rather well. Generally, people tend to look for a starting point, an origin, upon which they can base further estimations.

Tversky and Kahneman (1974) describe the psychological mechanism of 'anchoring', which implies that all estimations are biased towards an initial value. In classrooms with similar performances, estimations of pupils' knowledge and skills can thus turn out very differently: starting with a high value, the subsequent observations will be placed quite close to that value, while a lower starting point will pull the estimations that follow downwards. Teachers' anchoring of performances will thus depend on the starting point; whether this is a low-, average- or high-performing pupil will in all likelihood have consequences for their assessment practice.

Methodological issues

Pupils in a class influence the group climate; they also share common experiences and have the same teachers. Individual observations are therefore not independent of each other, since the group level influences the individual level. Research questions that have to take account of influences from different levels are multilevel problems and have to be examined within a multilevel framework. This approach is quite rare in educational research, due in no small part to the complexity of the techniques and to the large numbers of items and observations needed. However, great advances in the development of multilevel techniques have been made in recent years, and several dissertations in Sweden have made use of advanced multilevel methods (e.g., Frank, 2009; Hansson, 2011; Holfve-Sabel, 2006; Klapp-Lekholm, 2008; Yang, 2003). Most of the variables used in the current thesis also have between-class effects, which were accounted for using multilevel modeling.

An approach using latent variables is also an advantage compared with an approach using single indicators or sum scores of directly observable measures. The correspondence between teacher judgements and the PIRLS test results, for example, is expressed by a correlation coefficient which is error free in the sense that the judgement variable is latent. This is not the case in much of the previous research (see, for example, Hoge & Coladarci, 1989; Südkamp et al., 2012). Further, it should be noted that the correlation expresses the relationship between the assessment instrument the teachers used and the PIRLS results; it does not express the "true" correspondence between teacher judgements and PIRLS test results. In reality, the relationship could be stronger, especially for those teachers who had the most relevant competence for assessing pupil reading achievement. In this thesis, teacher ratings covering 12 aspects of reading and writing were adapted from a diagnostic instrument, which was used as a tool to provide a measure of teacher judgements. The ratings were extrapolated to form an overall measure of teacher judgements. This is important to bear in mind when interpreting the results of this thesis. The "fairly high" correspondence between judgements and test results nevertheless shows that it is possible to use the diagnostic instrument to document teachers' judgements of pupil reading achievement.
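The advantage of latent variables can also be seen through the classical attenuation formula from test theory (a standard psychometric result, not taken from the thesis), which shows how unreliability shrinks observed correlations relative to true-score correlations:

```latex
% r_{xy} is the observed correlation between two fallible measures,
% \rho_{T_x T_y} the correlation between their true scores, and
% r_{xx'}, r_{yy'} the reliabilities of the two measures.
r_{xy} = \rho_{T_x T_y}\,\sqrt{r_{xx'}\, r_{yy'}}
```

For example, with reliabilities of .80 for both measures, a true-score correlation of .81 would appear as an observed correlation of about .65; latent-variable modeling avoids this attenuation by separating measurement error into residual factors.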

Future research

The current investigation brought a validity perspective to bear on the PIRLS 2001 international study. International studies are a hotly debated phenomenon, and their validity is currently being examined from several different angles and at different levels. The items used in international studies are under constant scrutiny; reliability of single items is a necessary condition for validity, and there are several ways to secure validity at this level. Further, the constructs being measured (e.g., reading, mathematics, science) have to be aligned with external criteria that aim to measure the same constructs. The constructs can be validated through content analyses of the frameworks used in the studies and of the steering documents of different countries. Another way is to examine the correspondence between pupil achievement in the studies and an external achievement measure, as was carried out in this thesis.

The impact of the results of international studies has in particular been investigated at the policy level. While the international studies provide valuable results on knowledge trends in different countries, unintended consequences of these studies have also been put forward in the literature. Lending and borrowing of educational policies in a globalised world are hypothesized to cause international educational differences to disappear, with convergence towards a world curriculum (e.g., Baker & Le Tendre, 2005; Pettersson, 2008). If countries try to find solutions for undesirable results by scrutinizing the educational curricula of successful countries, isomorphism may result. The longitudinal design of the international studies facilitates such investigation. Rutkowski and Rutkowski (2009) attempted to address the question of global processes in education: using data from TIMSS 1995, 1999 and 2003, they examined whether pupil responses had become more similar over time. The results did not provide any support for a trend towards isomorphism in educational policy and practice. One reason for this might be that the data represented too narrow a time range to study the effects of globalization. Further research is therefore needed to shed more light on this issue.


Taking advantage of the older IEA studies, it would be possible to study trends in educational policies from a long-term perspective. Another way to address questions of validity in international studies is to examine the correspondence between the international studies and curricula. The "opportunity to learn" (OTL) information that is available for all IEA studies can facilitate such analyses. The OTL concept represents the degree to which the content of the tests is aligned with curricula and actual teaching content. OTL information has been systematically collected from teachers, and through analyses of curricula and textbooks, in many of the IEA studies. However, few attempts have been made to connect the OTL information to analyses of the achievement data. In the current thesis, I used the information about the knowledge and skills measured in the Swedish PIRLS 2001 assessment and compared it with teacher judgements and pupils' self-assessments of the same construct. Even though support for the PIRLS results could be found, it is necessary to explore the results at different levels and across different countries when addressing validity issues in international studies. The results of the international studies are goldmines for further research on these matters.


Swedish summary

Abstract

Many actors in school, and in society at large, have an interest in the validity of the conclusions drawn on the basis of different forms of assessment. This thesis examines validity aspects of different assessment forms, namely teacher judgements, results on a standardised reading test, and pupil self-assessments in grades 3 and 4. Data were obtained from the international study PIRLS 2001 (Progress in International Reading Literacy Study, 2001). Structural equation modeling with latent variables was the principal method of analysis. One of the most important results was that teachers could estimate pupils' language knowledge well within their own class, whereas they found it more difficult to do so in a consistent and equivalent way across classrooms. The results also suggest that factors at the pupil level (gender and SES) influenced teachers' estimates of pupils' skills. The teacher's level of competence was important both for pupils' results in school and for how well the teacher judged the pupils' knowledge. Furthermore, the results showed that pupils' self-assessments correspond relatively well with both the teacher's judgement and the pupil's test result.

Introduction

The main purpose of this thesis is to examine the validity of the interpretations and conclusions drawn from teachers' judgements, pupils' test results and pupils' self-assessments of reading ability. The thesis consists of two parts: an extended summary that highlights validity aspects of different assessment forms, and four articles, each of which discusses the relationship between different assessment forms and the factors that may influence this relationship. Previous research has given varying support to different assessment forms. Teachers' judgements have been considered valid because their correspondence with other measures of performance has been high (e.g., Hoge & Coladarci, 1989; Meisels et al., 2001). Gipps (1994), for example, argued that since teachers have observed their pupils over a long period of time, their judgements can be regarded as valid measures. Others have argued that teachers' judgements cannot be regarded as particularly trustworthy, since little correspondence has been found between them and, for example, test results (Harlen, 2005).


The same pattern of results has been noted for the relationship between pupils' self-assessments and, for example, teachers' judgements. However, this relationship has not been studied much at younger ages. Furthermore, Swedish school research suggests that assessments of pupils' knowledge and skills vary between teachers. The Swedish National Agency for Education (Skolverket, 2007, 2009) has compared final grades with test grades on the national tests in grade 9 and in upper secondary school, and concluded that there is great variation in teachers' grading. There may be reasonable explanations for differences between test results and grades: teachers have followed the pupil over a long period of time and can therefore adjust their assessment if the result on a test does not agree with the observations made during teaching and in earlier assessment situations. However, it may also be that equivalent knowledge and skills are not assessed equivalently by different teachers. Since summative assessments have been seen to vary between classrooms (see, e.g., Skolverket, 2007, 2009), there is reason to believe that assessments with formative purposes may also vary between classes. If teachers cannot accurately identify pupils' knowledge levels, pupils' opportunities to receive adequate help and support, as well as their ability to learn, are affected. That teachers can identify pupils' state of knowledge and give feedback that supports learning is among the most important parts of the teacher's mission (Hattie, 2009).

That school results are influenced by various characteristics of pupils and teachers has long been known. There is also research showing that teachers weigh various personal characteristics into their assessments. Pupils' school performance and knowledge are what most strongly influence the teacher's judgement, but research shows that factors other than knowledge and skills are also assessed, for example the pupil's effort and motivation (Klapp-Lekholm, 2008; Thorsen & Cliffordson, 2011). It is important that the results of different assessments are interpreted in a valid way. Different assessment methods have different strengths and weaknesses, and these must be identified systematically. Validation is an activity that never reaches perfection in all respects, and therefore what is evaluated can never be considered completely valid (Kane, 2006). Several models exist for validating performance, and in this thesis Bachman's (2005) use of so-called arguments has, among other things, provided an example of validation. Bachman builds on Toulmin's (1958/2003) use of logically structured arguments. The model assumes that there is a claim, for example that "the pupil is able to read texts of the level of difficulty required in grade 3", which is based on data, e.g., a teacher judgement. The data may be that the teacher judges that the pupil masters what the claim states, and that the content the teacher has assessed is specified in syllabuses, goals, and so on. The claim that the pupil's reading skills are good can be weakened by so-called rebuttals. Examples of such rebuttals could be (1) that teachers assess different things, so that the pupil's performance could have been assessed differently by another teacher, or (2) that teachers do not assess the reading performance alone but also personal characteristics. In order to eliminate these rebuttals and confirm the claim, support must be gathered. The support may, for example, be a study showing that teachers' judgements do not reflect personal characteristics, or a study showing that teachers' judgements reflect the same knowledge as measured in another way, such as through a written test assessed by others.

Purpose

The overarching purpose of the thesis is to contribute to knowledge about the strengths and weaknesses of different assessment forms. By mutually examining validity aspects of teachers' judgements, standardised tests and pupils' self-assessments, conclusions about strengths and weaknesses can be drawn. The following research questions have guided the four studies:

1. What is the relationship between teachers' judgements and PIRLS reading test results within classrooms, and how do teachers' judgements function for comparisons between classrooms in grades 3 and 4?

2. Do the pupil's gender or social background influence the teacher's judgements?

3. What do pupils' knowledge and skills in reading look like for teachers with higher competence, and does the competence level influence assessment practice?

4. How well can pupils estimate their own knowledge and skills in reading? Are there differences in this ability depending on gender and social background?

Data and methods

To answer the aims and research questions, data from the large-scale study Progress in International Reading Literacy Study 2001 (PIRLS) were used. In 2001, 35 countries took part in the study, and data are available from pupils, teachers and school leaders. The design of PIRLS is described both in the international report (Gonzalez & Kennedy, 2003) and in the Swedish one (Rosén et al., 2005). Unlike many other countries, Sweden participated with one sample from grade 4 and one from grade 3. Since 70% of the grade 3 teachers had taught their pupils since grade 1, there are good opportunities to study the effects of the teacher's instruction. As an addition to the international study design, Sweden had a national extension in which the teachers were asked to estimate their pupils' knowledge in relation to several different aspects of the Swedish language. This questionnaire is included as Appendix 1. The assessment aspects are derived from the diagnostic material "Språket lyfter" (Skolverket, 2002), and instead of formulating their assessments in words, the teachers quantified their assessments on a ten-point scale in connection with the PIRLS 2001 study. The teachers included in the sub-studies rated 12 different statements about the pupils' knowledge in reading and writing, assessing their knowledge levels on a scale from 1 to 10. Furthermore, information about the teachers' education, experience and in-service training has been used in this thesis, as has information about the pupils' social background, gender and self-assessments. Finally, the pupils' results on the PIRLS achievement test were also used.

Methods of analysis

Data in social science research, and especially in education, are often of a hierarchical nature (Gustafsson, 2009; Hox, 2002). This means that individuals are clustered within a classroom, classrooms are clustered within a school, and so on. Individuals within a cluster tend to be more similar to each other than to individuals in other clusters: they share similar experiences, as well as teachers and classmates. These dependencies must be handled statistically for the interpretations of the analyses to be correct. Many statistical tests rest on an assumption of independence between the observations on which the analyses are based, and if this assumption is violated the standard errors will be too small, which can lead to erroneous results (Hox, 2002). Multilevel modeling handles the problem of dependencies within levels. By dividing the variance into within-group and between-group components, it is possible to separate out the variation attributable to individual and group differences, respectively. The so-called intraclass correlation signals whether or not multilevel modeling should be used: it gives the proportion of the variation that can be attributed to differences between groups. An example is differences in reading performance among Swedish pupils in grade 4, which exist both within and between classes. Often, performances within a class differ quite a lot while the classes' average performances are more similar. If, on the other hand, the classes' results vary greatly, the intraclass correlation will be high, indicating heterogeneous performances across classrooms.

In the sub-studies, structural equation modeling (SEM) at several levels was used. SEM has several important advantages over, for example, multiple regression analysis, among them the use of latent variables. A latent variable is based on the covariation between several different indicators and makes it possible to operationalize a concept better than would be possible with only single indicators or manifest variables. Another advantage of latent variables is that they can be said to be free of measurement error: indicators are always associated with some degree of measurement error, but by constructing a latent variable the measurement error is separated out into a so-called residual factor. The analyses were conducted with the program Mplus 6.1 (Muthén & Muthén, 2007–2012), used within the STREAMS analysis environment (Gustafsson & Stahl, 2005). STREAMS lets the user set up models in a simple language which is understood by, among other programs, Mplus. In the four sub-studies, latent variables were created for teachers' judgements, teachers' competence, pupils' self-assessments and pupils' social background. For a more detailed description of the procedures and the statistical properties, the reader is referred to the sub-studies.

Sammanfattande resultat och diskussion Den första delstudien fokuserade relationen mellan lärares bedömningar och elevers provresultat i årskurs 3 och 4. Indikatorerna som bildade den latenta variabeln lärares bedömning delades först in i en två-nivå modell med två faktorer, en läs- och en skrivfaktor. Emellertid passade denna modell data dåligt och indikerade att de två faktorerna var högt korrelerade. Faktorerna visade sig också korrelera högt på både individ- och gruppnivå (ca .96). Det verkade således svårt att separera bedömningen av läsning och skrivning empiriskt eftersom läraren tycktes göra en global bedömning av den språkliga förmågan hos individuella elever. Vidare analyser ledde fram till en latent variabel definierad genom fyra paketsummor av lärares bedömning. En utförligare diskussion av paketerings-förfarandet finns framför allt i den första delstudien. I nästa steg analyserades variationen i bedömningarna och det visade sig att den större delen av variationen hänförde sig till individnivå, alltså inom klasser. Ungefär 70 % av variationen i bedömningarna kunde härledas till inomklassvariation medan ungefär 30 % utgjordes av variation mellan klasser. Genom att introducera elevernas provresultat i PIRLS som en oberoende variabel, förklarades ca 45-50 % av den totala variationen i den beroende variabeln lärares bedömning på individnivå. På gruppnivå var motsvarande procentsats ca 3-4%. Dessa resultat innebär i praktiken att lärares bedömningar följer elevernas prestationer på PIRLS-testet relativt väl inom klassrum. Däremot varierade olika lärares bedömningar kraftigt trots att klassernas medelresultat hölls konstant. På individnivå motsvarar den förklarade variationen en korrelation på ungefär 0.65, vilket är i linje med tidigare forskning som gjorts på

A model of these relations is presented in Figure 1.

[Figure 1. The relation between teacher judgement and pupils' results on the PIRLS reading test. The original path diagram is not reproduced here; it showed the relation between teacher judgement and reading test result at the within-class level (coefficient .65) and at the between-class level (coefficient .25).]

In the following analysis, pupils' social background and gender were related to the teacher judgements and the pupils' test results, primarily to examine whether teachers weighed these factors into their judgements of pupils' reading and writing skills. At the individual level, both gender and social background had an effect on the teacher judgements: girls and pupils from higher social backgrounds received somewhat higher judgements from the teachers, given the same performance level on the test as boys and pupils from lower social backgrounds (a simplified regression sketch follows below). Both SES and gender were also positively related to performance, meaning that girls and pupils with higher SES obtained better test results. That girls tend to be judged somewhat higher than their test results indicate is in line with previous research (Reuterberg & Svensson, 2000). One explanation may be that the girls display knowledge that is not reflected in the test results (e.g., verbal ability) and that the teachers weigh into their judgements (Wernersson, in SOU 2010:15). Another explanation could be that girls are perceived as making a greater effort and are therefore judged slightly higher. A similar line of reasoning applies to the effect of social background.

At the class level, social background had a very strong effect on the class's performance level: classrooms with a high average SES showed considerably higher performance. Class-level SES also had a relatively strong effect on the teacher judgements, and, notably, the effect of test results on the teacher judgements was no longer significant once SES was introduced into the model. The relation between test results and teacher judgements at the class level thus appears to be entirely mediated by social background. When the teachers rated the pupils' knowledge and skills they did not know their own class's, or other classes', test results, whereas they did have information about the pupils' social background, a variable that is highly correlated with the test results.
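The individual-level finding can be made concrete with a hedged sketch: an ordinary least-squares regression of judgement on test score plus a gender dummy, with simulated data and hypothetical effect sizes. This is a deliberate simplification of the multilevel latent-variable model actually used in the study:

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2_000

    girl = rng.integers(0, 2, size=n).astype(float)
    test = rng.normal(size=n) + 0.2 * girl                   # girls score a bit higher
    judgement = 0.65 * test + 0.15 * girl + rng.normal(0, 0.7, n)

    # Least-squares fit of judgement ~ intercept + test + girl.
    X = np.column_stack([np.ones(n), test, girl])
    beta, *_ = np.linalg.lstsq(X, judgement, rcond=None)
    print(dict(zip(["intercept", "test", "girl"], beta.round(2))))
    # A positive "girl" coefficient, with the test score held constant, mirrors
    # the finding that girls receive somewhat higher judgements than their test
    # performance alone accounts for.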

The second study examined the importance of teacher competence for pupils' reading ability. Since effects of teachers presumably emerge over time, only the information from grade 3 was used: these teachers had as a rule taught their pupils since grade 1, whereas the pupils in grade 4 had had their teacher for only about one term. Teacher competence was defined with a latent variable model based on four indicators: 1) whether the teacher was certified, 2) type of teacher education, 3) number of years of experience of teaching grade 3, and 4) whether particular emphasis was placed on reading pedagogy during the teacher's education. The sample contains several different teacher-education programs, and a categorization was made in order to rank the teachers, from those with the most adequate education for teaching reading to younger pupils to those with the least adequate. This coding was validated in two ways. First, use was made of Frank's (2009) description of how much emphasis the various Swedish teacher-education programs have placed on reading since 1969, which formed the basis of the categorization. Second, results from Alatalo's (2011) thesis were used, showing that the teachers trained for the early grades (småskollärare and lågstadielärare) had the highest levels of knowledge about spelling rules and phonological awareness. Alatalo's results were also in line with Frank's (2009) categorization of which teachers have had the greatest emphasis on reading instruction in their education.

A number of studies have shown that teacher competence is important for pupils' school performance. In Study II, teacher competence was related to pupil achievement, defined through both teacher judgements and test results. The results showed that classes with more competent teachers had clearly higher results on the PIRLS 2001 reading test and were also judged higher by their teachers. Possible selection effects were controlled for by taking the pupils' social background into account: more competent teachers might seek out schools and classes where pupils perform at a high level, and parents with high socioeconomic status might choose to place their children in schools with competent teachers. However, SES turned out to have no relation to teacher competence. SES captured most of the variation in class-level results (R² = 0.64), but the teacher judgements did not correlate as highly with SES, indicating that the unexplained variance is attributable to other variables. One reason why the teacher judgements correlated less with SES than the PIRLS test results did may be ceiling effects in the teacher-judgement variable, which can hide differences that actually exist between SES groups: if teachers 1 and 2 both rate their respective classes at an average of nine on the ten-point scale even though the class means differ considerably on the test, variation that exists, but is not manifested in the teachers' judgements, cannot be explained either.
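The ceiling-effect argument can be illustrated directly (a hypothetical sketch, not the thesis data): when judgements are capped at the top of the ten-point scale, a real difference between class means is compressed in the observed judgements.

    import numpy as np

    rng = np.random.default_rng(11)

    # Two hypothetical classes whose true judgement levels differ by two points.
    class_a = rng.normal(7.5, 1.0, size=30)
    class_b = rng.normal(9.5, 1.0, size=30)

    # The ten-point scale caps what a teacher can report.
    capped_a, capped_b = np.clip(class_a, 1, 10), np.clip(class_b, 1, 10)

    print(round(class_b.mean() - class_a.mean(), 2))    # true gap, about 2.0
    print(round(capped_b.mean() - capped_a.mean(), 2))  # observed gap, clearly smaller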

Other conceivable explanations of the variation in teachers' judgements involve further variables, for example motivation; this is a question of great interest for future research in the area.

The third study built on the results of Study II. Its main purpose was to examine whether teacher competence affected the correspondence between teacher judgements and pupil test results, the hypothesis being that teachers with higher competence rate their pupils' knowledge more in agreement with the test results. To examine this, a multilevel analysis was carried out in which the latent teacher-competence variable, formulated at the class level, was related to the within-level relation between judgement and test result. A so-called random-slope technique was applied (illustrated in the sketch below), which in short means that the relation between test result and teacher judgement is allowed to vary across classes. The first step is to check that this assumption is warranted by examining the variance of the slope. Since this variance was significant, it could be concluded that the relation between test and judgement varied across classes, and it therefore makes sense to try to explain this variation with class-level variables. When teacher competence was related to the relation between test result and teacher judgement, the correspondence between test and judgement turned out to be somewhat higher for teachers with higher competence. Perhaps the most important result of this study is thus that teachers with higher competence judge their pupils' knowledge more in accordance with the pupils' test results, which also supports the interpretation that the test measures reading ability well. Since the test can be regarded as reflecting the goals of the Swedish syllabi well, a reasonable conclusion is that teachers with higher competence are better at judging their pupils' levels of knowledge. The implication of this study is thus that higher teacher competence in the classroom matters not only for pupils' achievement but also for the ability to assess pupils' knowledge accurately.
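The random-slope idea can be approximated outside Mplus by estimating the judgement-on-test slope separately in each class and checking whether those slopes co-vary with a class-level moderator. The sketch below does this with simulated data and a hypothetical competence variable; it is a crude stand-in for the latent random-slope model of Study III:

    import numpy as np

    rng = np.random.default_rng(9)
    n_classes, n_pupils = 60, 20

    competence = rng.normal(size=n_classes)    # hypothetical class-level moderator
    # True slope of judgement on test rises with teacher competence.
    true_slopes = 0.6 + 0.15 * competence

    est_slopes = []
    for c in range(n_classes):
        test = rng.normal(size=n_pupils)
        judgement = true_slopes[c] * test + rng.normal(0, 0.6, n_pupils)
        # Per-class OLS slope, approximating the class-specific random slope.
        est_slopes.append(np.polyfit(test, judgement, 1)[0])

    print(round(np.var(est_slopes), 3))                         # slope variance > 0
    print(round(np.corrcoef(competence, est_slopes)[0, 1], 2))  # positive association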

The fourth and final study examined the relations between pupils' self-assessments, teacher judgements, and pupil test results in grade 3. Since pupils interpret information from their teacher when they assess themselves, it seemed appropriate to select the pupils who had had their teachers for a longer period; as noted above, this was the case in grade 3 but not in grade 4. In previous research, pupils' self-assessments have most often been compared with teacher judgements, which in that sense usually constitute the initial criterion (Ross, 2006). Pupils' self-assessments have sometimes also been compared with test results, but such studies are fewer. Study IV offered the possibility of comparing pupils' self-assessments with both test results and teacher judgements, in order to examine how well pupils manage to rate their own knowledge. In addition, it was examined whether boys and girls rate their knowledge differently, and whether pupils of different socioeconomic status rate their knowledge in different ways. The assessments the pupils made concerned how good they considered their reading skills to be, partly relative to other pupils and partly with themselves as the point of reference (e.g., "Reading is very easy for me"). The pupils responded to four statements on a scale from Disagree completely to Agree completely (1-4).

One of the main results of this study was that pupils' self-assessments did not vary much between classrooms, in contrast to the teacher judgements and the reading test results. This means that pupils in different classrooms rate their knowledge and skills at a similar level even though differences in performance exist. The small between-class differences in self-assessment made it impossible to examine the self-ratings at the group level. One reason for this result may be that the pupils took their classmates' knowledge and skills as the point of departure when estimating their own. Another reason may be that the indicators defining the latent variable pupil self-assessment offered only four response options, which may have limited the variation in these indicators (see the sketch below).

In the next step, the relations between pupils' self-assessments, teacher judgements, and reading test results were examined. The correlation between self-assessment and reading test results was slightly lower (.58) than that between teacher judgement and reading test results (.65), while the relation between teacher judgements and pupil self-assessments (.59) was about the same as that between self-assessments and test results. These results indicate that both the test results and the teacher judgements agree relatively well with pupils' self-assessments. It was further examined whether boys and girls differed in how they rated their reading ability. Given the same test performance, there were no significant gender differences in self-assessed reading knowledge, nor did a pupil's socioeconomic status appear to matter much for the self-ratings. In sum, the results showed that the grade 3 pupils rated their general reading skills relatively well, and the correspondence was nearly identical whether the self-assessments were related to teacher judgements or to reading test results.
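The conjecture that the four-option response format limits variation can be checked in principle with a simulation (hypothetical values): discretizing a continuous self-perception onto a 1-4 scale attenuates both its variance and its correlation with the underlying ability.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 5_000

    ability = rng.normal(size=n)
    raw = ability + rng.normal(0, 0.6, n)   # continuous self-perception

    # Collapse the continuous response onto a 4-point agreement scale (1-4).
    likert = np.digitize(raw, bins=[-1.0, 0.0, 1.0]) + 1

    corr = lambda a, b: np.corrcoef(a, b)[0, 1]
    print(round(raw.var(), 2), round(likert.var(), 2))  # variance shrinks
    print(round(corr(raw, ability), 2))     # continuous response, about .86
    print(round(corr(likert, ability), 2))  # coarsened response, noticeably lower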


Conclusions

In this thesis, three measures of pupils' reading ability have been problematized from validity perspectives: teacher judgements, test results, and pupils' self-assessments of reading ability. In Sweden, teacher judgements have been by far the dominant measure of pupils' knowledge, and although tests have been used, this has been to a relatively small extent in the lower grades. The results of the thesis can provide formative feedback on the strengths and weaknesses of different assessment instruments. When the relationship between teacher judgements and test results was examined from both directions, the results indicated that both instruments can be suitable measures of reading skills within classrooms. Teachers, however, appear to have different frames of reference when judging, so that the same knowledge is rated differently by different teachers. This makes teacher judgements problematic to use when comparisons are to be made between pupils in different classes, perhaps above all for summative assessments that underlie individual reports and grades. Summative judgements may, however, also underlie interventions intended to promote pupils' learning, which means that the conditions for learning are not equivalent for pupils in the early years of compulsory school. Even though instruction need not be carried out in the same way in all schools, the Education Act (Skollagen, 2010) prescribes that compulsory education shall be equivalent and that pupils' different circumstances and needs shall be taken into account. If assessment is not equivalent, one consequence may be that some schools give support and feedback to pupils who need it while others do not.

The studies also provided further feedback on the validity of the conclusions drawn from teacher judgements: gender and social background were shown to influence teachers' judgements. Questions of fairness mean that these factors cannot be overlooked, even though there is reason to believe that teachers rated these pupils' performance higher because the pupils may have displayed knowledge not examined in the test. Furthermore, a teacher with higher formal competence may be an important part of the path to a successful assessment practice. Since teachers with higher levels of competence showed stronger relations between test results and judgements, this also suggests that the PIRLS test results are a good measure of pupils' knowledge and skills in the reading domain. This thesis has provided some answers about the validity of different forms of assessment, but in a changing society it is worth remembering that validation is a continuously ongoing activity.


References

Alatalo, T. (2011). Skicklig läs- och skrivundervisning i åk 1-3: Om lärares möjligheter och hinder. [Proficient teaching of reading and writing in grades 1-3: About teachers' opportunities and barriers.] (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/25658
Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Lawrence Erlbaum.
Bachman, L. F. (2005). Building and Supporting a Case for Test Use. Language Assessment Quarterly, 2, 1-34. doi: 10.1207/s15434311laq0201_1
Bagozzi, R. P., & Heatherton, T. F. (1994). A general approach to representing multifaceted constructs: Application to state self-esteem. Structural Equation Modeling, 1, 35-67.
Baker, B., & LeTendre, G. (2005). Global Similarities: World Culture and the Future of Schooling. Stanford, CA: Stanford University Press.
Ball, S. J. (2003). The teacher's soul and the terrors of performativity. Journal of Education Policy, 18, 215-228. doi: 10.1080/0268093022000043065
Ball, S. J. (2010). New Voices, New Knowledges and the New Politics of Education Research: the gathering of a perfect storm? European Educational Research Journal, 9(2), 124-137. http://dx.doi.org/10.2304/eerj.2010.9.2.124
Bates, C., & Nettelbeck, T. (2001). Primary School Teachers' Judgements of Reading Achievement. Educational Psychology, 21, 177-187.
Bentler, P. M. (2007). On tests and indices for evaluating structural models. Personality and Individual Differences, 42, 825-829.
Beswick, J. F., Willms, J. D., & Sloat, E. A. (2005). A Comparative Study of Teacher Ratings of Emergent Literacy Skills and Student Performance on a Standardized Measure. Education, 126, 116-137.
Binet, A., & Simon, T. (1916). The development of intelligence in children. Baltimore: Williams & Wilkins. (Reprinted 1973, New York: Arno Press; 1983, Salem, NH: Ayer Company)
Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education: Principles, Policy and Practice, 5, 7-73.
Black, P., & Wiliam, D. (2011). The Reliability of Assessments. In J. Gardner (Ed.), Assessment and learning (2nd ed., pp. 243-263). Gateshead, UK: Sage.
Bong, M., & Clark, R. E. (1999). Comparison between self-concept and self-efficacy in academic motivation research. Educational Psychologist, 34, 139-154.
Borsboom, D., Mellenbergh, G. J., & van Heerden, J. (2004). The concept of validity. Psychological Review, 111, 1061-1071.
Borsboom, D., Cramer, A. O. J., Kievit, R. A., Zand Scholten, A., & Franić, S. (2009). The end of construct validity. In R. W. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 135-170). Charlotte, NC: IAP.
Broadfoot, P. (1996). Education, assessment and society – a sociological analysis. Buckingham: Open University Press.
Brookhart, S. M. (2012). The use of teacher judgement for summative assessment in the USA. Assessment in Education: Principles, Policy & Practice, 1-22. doi: 10.1080/0969594x.2012.703170


Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. New York, NY: The Guilford Press.
Brunner, M., Lüdtke, O., & Trautwein, U. (2008). The Internal/External Frame of Reference Model Revisited: Incorporating General Cognitive Ability and General Academic Self-Concept. Multivariate Behavioral Research, 43, 137-172.
Butler, Y. G., & Lee, J. (2006). On-Task versus Off-Task Self-Assessments among Korean Elementary School Students Studying English. Modern Language Journal, 90, 506-518.
Campbell, J. R., Kelly, D. L., Mullis, I. V. S., Martin, M. O., & Sainsbury, M. (2001). Framework and Specifications for PIRLS Assessment 2001 (2nd ed.). Chestnut Hill, MA: Boston College.
Coladarci, T. (1986). Accuracy of Teacher Judgments of Student Responses to Standardized Test Items. Journal of Educational Psychology, 78, 141-146.
Cronbach, L. J. (1963). Course Improvement Through Evaluation. Teachers College Record, 64, 672-683.
Cronbach, L. J. (1971). Test Validation. In R. L. Thorndike (Ed.), Educational Measurement (2nd ed., pp. 443-507). Washington, DC: American Council on Education.
Cronbach, L. J. (1988). Five perspectives on the validity argument. In H. Wainer & H. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281-302.
Cross, L. H., & Frary, R. B. (1999). Hodgepodge Grading: Endorsed by Students and Teachers Alike. Applied Measurement in Education, 12, 53-72.
Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation to reading experience and ability 10 years later. Developmental Psychology, 33, 934-945.
Cureton, E. E. (1951). Validity. In E. F. Lindquist (Ed.), Educational measurement (pp. 621-694). Washington, DC: American Council on Education.
D'Agostino, J. V., Welsh, M. E., & Corson, N. M. (2007). Instructional Sensitivity of a State's Standards-Based Assessment. Educational Assessment, 12, 1-22.
Darling-Hammond, L. (2000). Teacher Quality and Student Achievement: A Review of State Policy Evidence. Education Policy Analysis Archives, 8(1), 1-50.
Darling-Hammond, L., Berry, B., & Thoreson, A. (2001). Does Teacher Certification Matter? Evaluating the Evidence. Educational Evaluation and Policy Analysis, 23(1), 57-77.
Darling-Hammond, L., & Bransford, J. (Eds.). (2005). Preparing teachers for a changing world: What teachers should learn and be able to do. San Francisco, CA: Jossey-Bass.
DeVellis, R. F. (2003). Scale Development: Theory and Applications. Thousand Oaks, CA: Sage.
Elley, W. B. (1992). How in the world do students read? The Hague, The Netherlands: The International Association for the Evaluation of Educational Achievement.
Emanuelsson, I., & Fischbein, S. (1986). Vive la difference? A Study on Sex and Schooling. Scandinavian Journal of Educational Research, 30, 71-84.
Englund, T., Forsberg, E., & Sundberg, D. (2012). Vad räknas som kunskap? Läroplansteoretiska utsikter och inblickar i lärarutbildning och skola. [What counts as knowledge? Curriculum-theoretical outlooks and insights into teacher education and school]. Stockholm: Liber.
Falchikov, N., & Boud, D. (1989). Student Self-Assessment in Higher Education: A Meta-Analysis. Review of Educational Research, 59, 395-430.
Feinberg, A. B., & Shapiro, E. S. (2009). Teacher Accuracy: An Examination of Teacher-Based Judgments of Students' Reading With Differing Achievement Levels. Journal of Educational Research, 102, 453-462.


Forsberg, E., & Lindberg, V. (2010). Svensk forskning om bedömning – en kartläggning. [Swedish research about assessment – an overview]. Stockholm: Vetenskapsrådet.
Frank, E. (2009). Läsförmågan bland 9-10-åringar: Betydelsen av skolklimat, hem- och skolsamverkan, lärarkompetens och elevers hembakgrund. [Reading literacy among 9-10-year-olds: The importance of school climate, home-school cooperation, teacher competence and pupils' home background] (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/20083
Fredriksson, U., Villalba, E., & Taube, K. (2011). Do Students Correctly Estimate Their Reading Ability? A Study of Stockholm Students in Grades 3 and 8. Reading Psychology, 32, 301-321. doi: 10.1080/02702711003608279
Geske, A., & Ozola, A. (2009). Different Influence of Contextual Educational Factors on Boys' and Girls' Reading Achievement. US-China Education Review, 6, 38-44.
Gijsel, M. A. R., Bosman, A. M. T., & Verhoeven, L. (2006). Kindergarten risk factors, cognitive factors, and teacher judgments as predictors of early reading in Dutch. Journal of Learning Disabilities, 39, 558-571.
Gipps, C. (1994). Beyond Testing: Towards a theory of educational assessment. London: The Falmer Press.
Gipps, C. (2001). Sociocultural Aspects of Assessment. In G. Svingby & S. Svingby (Eds.), Bedömning av kunskap och kompetens [Assessment of knowledge and competence] (pp. 15-67). Stockholm: Lärarhögskolan i Stockholm, PRIM-gruppen.
Gipps, C., Brown, M., McCallum, B., & McAlister, S. (1995). Intuition Or Evidence? Teachers and National Assessment of Seven Year Olds. Buckingham: Open University Press.
Goffin, R. D. (2007). Assessing the adequacy of structural equation models: Golden rules and editorial policies. Personality and Individual Differences, 42(5), 831-839. doi: 10.1016/j.paid.2006.09.019
Goldhaber, D. D., & Brewer, D. J. (2000). Does Teacher Certification Matter? High School Teacher Certification Status and Student Achievement. Educational Evaluation and Policy Analysis, 22, 129-145.
Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 User Guide for the International Database. Chestnut Hill, MA: Boston College.
Guilford, J. P. (1946). New standards for test evaluation. Educational and Psychological Measurement, 6, 427-438.
Gustafsson, J.-E. (2009). Strukturell ekvationsmodellering. [Structural Equation Modeling]. In G. Djurfeldt & M. Barmark (Eds.), Statistisk verktygslåda 2: Multivariat analys (pp. 269-321). Lund, Sweden: Studentlitteratur.
Gustafsson, J.-E., & Erickson, G. (in press). To trust or not to trust? Teacher marking versus external marking of national tests.
Gustafsson, J.-E., & Myrberg, E. (2009). Resursers betydelse för elevers resultat. [The importance of resources for pupil achievement]. In Skolverket (Ed.), Vad påverkar resultaten i svensk grundskola? Kunskapsöversikt om betydelsen av olika faktorer [What influences the achievement in Swedish compulsory school? Review of the importance of different factors] (pp. 160-206). Stockholm: Skolverket.
Gustafsson, J.-E., & Rosén, M. (2005). Förändringar i läskompetens 1991-2001: En jämförelse över tid och länder. [Changes in reading literacy 1991-2001: A comparison over time and across countries]. Göteborg: Institutionen för pedagogik och didaktik.
Gustafsson, J.-E., & Stahl, P. A. (2005). STREAMS User's Guide, Version 3.0 for Windows 95/98/NT. Mölndal, Sweden: MultivariateWare.


Gustafsson, J.-E., & Yang-Hansen, K. (2009). Resultatförändringar i svensk grundskola. [Achievement changes in Swedish compulsory school]. In Skolverket (Ed.), Vad påverkar resultaten i svensk grundskola? Kunskapsöversikt om betydelsen av olika faktorer [What influences the achievement in Swedish compulsory school? Review of the importance of different factors] (pp. 40-83). Stockholm: Skolverket.
Hansson, Å. (2011). Ansvar för matematiklärande: Effekter av undervisningsansvar i det flerspråkiga klassrummet. [Responsibility for learning in mathematics: Effects of teaching responsibility in the multilingual classroom] (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/26669
Hanushek, E. A. (1989). The impact of differential expenditures on school performance. Educational Researcher, 18, 45-65.
Hanushek, E. A. (2003). The Failure of Input-Based Schooling Policies. The Economic Journal, 113, 64-98.
Harlen, W. (2005). Trusting Teachers' Judgement: Research Evidence of the Reliability and Validity of Teachers' Assessment Used for Summative Purposes. Research Papers in Education, 20, 245-270.
Harlen, W. (2011). On the Relationship between Assessment for Formative and Summative Purposes. In J. Gardner (Ed.), Assessment and learning (2nd ed., pp. 61-80). Gateshead, UK: Sage.
Hattie, J. (2009). Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. London, UK: Routledge.
Hattie, J., & Timperley, H. (2007). The Power of Feedback. Review of Educational Research, 77, 81-112. doi: 10.3102/003465430298487
Hau, K.-T., & Marsh, H. W. (2004). The use of item parcels in structural equation modeling: Non-normal data and small sample sizes. British Journal of Mathematical and Statistical Psychology, 57, 327-351.
Hecht, S. A., & Greenfield, D. B. (2002). Explaining the predictive accuracy of teacher judgments of their students' reading achievement: The role of gender, classroom behavior, and emergent literacy skills in a longitudinal sample of children exposed to poverty. Reading and Writing: An Interdisciplinary Journal, 15, 789-809.
Hoge, R. D., & Coladarci, T. (1989). Teacher-Based Judgments of Academic Achievement: A Review of Literature. Review of Educational Research, 59, 297-313.
Holfve-Sabel, M.-A. (2006). Attitudes towards Swedish comprehensive school. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/10035
Hox, J. (2002). Multilevel Analysis: Techniques and Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1-55.
James, W. (1890/1998). The principles of psychology. Bristol, UK: Thoemmes.
Jönsson, A. (2008). Educative assessment for/of teacher competency: A study of assessment and learning in the "interactive examination" for student teachers. (Doctoral Thesis, Malmö University, Malmö, Sweden). Malmö studies in educational sciences.
Jönsson, A., & Svingby, G. (2007). The use of scoring rubrics: Reliability, validity and educational consequences. Educational Research Review, 2, 130-144.
Kane, M. (1992). An argument-based approach to validity. Psychological Bulletin, 112, 527-535.


Kane, M. (2006). Validation. In R. Brennan (Ed.), Educational measurement (4th ed.). Washington, DC: American Council on Education and National Council on Measurement in Education.
Kane, M. (2009). Validating the Interpretations and Uses of Test Scores. In R. W. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 39-64). Charlotte, NC: IAP.
Kane, M., Crooks, T., & Cohen, A. (1999). Validating Measures of Performance. Educational Measurement: Issues and Practice, 18, 5-17.
Kenny, D. T., & Chekaluk, E. (1993). Early Reading Performance: A Comparison of Teacher-Based and Test-Based Assessments. Journal of Learning Disabilities, 26, 227-236.
Kishton, J. M., & Widaman, K. F. (1994). Unidimensional versus domain representative parceling of questionnaire items: An empirical example. Educational and Psychological Measurement, 54, 757-765.
Klapp Lekholm, A. (2008). Grades and grade assignment: Effects of student and school characteristics. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/18673
Klapp Lekholm, A., & Cliffordson, C. (2008). Discrepancies between School Grades and Test Scores at Individual and School Level: Effects of Gender and Family Background. Educational Research and Evaluation, 14, 181-199.
Klenowski, V. (1995). Student self-evaluation processes in student-centred teaching and learning contexts of Australia and England. Assessment in Education, 2, 145-163.
Klenowski, V., & Wyatt-Smith, C. (2010). Standards-Driven Reform Years 1-10: Moderation an Optional Extra? Australian Educational Researcher, 37, 21-39.
Kreft, I., & De Leeuw, J. (1998). Introducing multilevel modeling. London, UK: Sage.
Kuncel, N. R., Credé, M., & Thomas, L. L. (2005). The validity of self-reported grade point averages, class ranks, and test scores: A meta-analysis. Review of Educational Research, 75, 63-82.
Liberg, C. (2010). Texters, textuppgifters och undervisningens betydelse för elevers läsförståelse. [The importance of texts, text tasks and teaching for pupils' reading comprehension]. In Skolverket (Ed.), Texters, textuppgifters och undervisningens betydelse för elevers läsförståelse: Fördjupad analys av PIRLS 2006 (pp. 8-130). Stockholm: Skolverket.
Lissitz, R. W. (2009). Introduction. In R. W. Lissitz (Ed.), The Concept of Validity: Revisions, New Directions and Applications (pp. 1-15). Charlotte, NC: IAP.
Lissitz, R. W., & Samuelsen, K. (2007). A suggested change in terminology and emphasis regarding validity and education. Educational Researcher, 36, 437-448.
Little, T. D., Cunningham, W. A., Shahar, G., & Widaman, K. F. (2002). To Parcel or Not To Parcel: Exploring the Question, Weighing the Merits. Structural Equation Modeling, 9, 151-173.
Llosa, L. (2007). Validating a Standards-Based Classroom Assessment of English Proficiency: A Multitrait-Multimethod Approach. Language Testing, 24, 489-515.
Llosa, L. (2008). Building and Supporting a Validity Argument for a Standards-Based Classroom Assessment of English Proficiency Based on Teacher Judgments. Educational Measurement: Issues and Practice, 27, 32-42.
Loehlin, J. C. (2004). Latent variable models (4th ed.). Mahwah, NJ: Lawrence Erlbaum Associates.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3, 635-694.


Lundahl, C. (2006). Viljan att veta vad andra vet: Kunskapsbedömning i tidigmodern, modern och senmodern skola. [To know what others know: Assessment in education in pre-modern, modern and late-modern times]. Uppsala: Uppsala universitet.
Lundahl, C. (2011). Bedömning för lärande. [Assessment for learning]. Stockholm: Norstedt.
Madaus, G. F., & O'Dwyer, L. M. (1999). A Short History Of Performance Assessment. Phi Delta Kappan, 80, 688-695.
Markland, D. (2007). The golden rule is that there are no golden rules: A commentary on Paul Barrett's recommendations for reporting model fit in structural equation modelling. Personality and Individual Differences, 42, 851-858.
Marsh, H. W. (1986). Verbal and math self-concepts: An internal/external frame of reference model. American Educational Research Journal, 23, 129-149.
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In Search of Golden Rules: Comment on Hypothesis-Testing Approaches to Setting Cutoff Values for Fit Indexes and Dangers in Overgeneralizing Hu and Bentler's (1999) Findings. Structural Equation Modeling, 11, 320-341.
Martin, M. O., Mullis, I. V. S., & Kennedy, A. M. (2003). PIRLS 2001 Technical Report. Chestnut Hill, MA: Boston College.
Martínez, J. F., Stecher, B., & Borko, H. (2009). Classroom Assessment Practices, Teacher Judgments, and Student Achievement in Mathematics: Evidence from the ECLS. Educational Assessment, 14, 78-102.
McCoach, D. B. (2010). Hierarchical Linear Modeling. In G. R. Hancock & R. O. Mueller (Eds.), The reviewer's guide to quantitative methods in the social sciences (pp. 123-140). New York, NY: Routledge.
Mehrens, W. A. (1997). The Consequences of Consequential Validity. Educational Measurement: Issues and Practice, 16, 16-18.
Meisels, S. J., Bickel, D. D., Nicholson, J., Yange, X., & Atkins-Burnett, S. (2001). Trusting Teachers' Judgments: A Validity Study of a Curriculum-Embedded Performance Assessment in Kindergarten to Grade 3. American Educational Research Journal, 38, 73-95.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational Measurement (3rd ed., pp. 13-103). New York: American Council on Education/Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performance as scientific inquiry into score meaning. American Psychologist, 50, 741-749.
Mullis, I. V. S., Martin, M. O., Gonzalez, E. J., & Kennedy, A. M. (2003). PIRLS 2001 International Report: IEA's Study of Reading Literacy Achievement in Primary Schools. Chestnut Hill, MA: Boston College.
Muthén, B. O. (1994). Multilevel Covariance Structure Analysis. Sociological Methods & Research, 22, 376-398.
Muthén, L. K., & Muthén, B. O. (2007-2012). Mplus User's Guide. Los Angeles, CA: Muthén & Muthén.
Myrberg, E. (2006). Fristående skolor i Sverige: Effekter på 9-10-åriga elevers läsförmåga. [Independent schools in Sweden: Effects on 9-10-year-old pupils' reading achievement]. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/16820
Myrberg, E. (2007). The Effect of Formal Teacher Education on Reading Achievement of 3rd-Grade Students in Public and Independent Schools in Sweden. Educational Studies, 33, 145-162.


Myrberg, E., & Rosén, M. (2006). Reading Achievement and Social Selection in Independent Schools in Sweden: Results from IEA PIRLS 2001. Scandinavian Journal of Educational Research, 50, 185-205.
Newton, P. E. (2007). Clarifying the Purposes of Educational Assessment. Assessment in Education: Principles, Policy & Practice, 14, 149-170.
Nusche, D., Halász, G., Looney, J., Santiago, P., & Shewbridge, C. (2011). OECD Reviews of Evaluation and Assessment in Education: Sweden. Paris: OECD.
Nyström, P. (2004). Rätt mätt på prov. [Validation of educational assessments]. Umeå: Umeå University.
Näsström, G. (2005). Lärares skattningar av sina elevers provresultat. [Teachers' ratings of their students' test results]. Umeå: Institutionen för beteendevetenskapliga mätningar.
Pedhazur, E. J., & Pedhazur-Schmelkin, L. (1991). Measurement, Design and Data Analysis: An Integrated Approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Pehrsson, A., & Sahlström, F. (1999). Kartläggning av läsning och skrivning ur ett deltagarperspektiv: Analysverktyg för alla. [Mapping of reading and writing from a participant perspective: Analysis tools for all]. (Specialpedagogiska rapporter, nr 14). Göteborg: Göteborgs universitet, Institutionen för specialpedagogik.
Perry, N. E., & Meisels, S. J. (1996). How Accurate Are Teacher Judgments of Students' Academic Performance? (Working Paper Series). Washington, DC: U.S. Department of Education, National Center for Education Statistics.
Pettersson, A. (2011). Bedömning – vad, varför och varthän? [Assessment – what, why and where to?] In L. Lindström & V. Lindberg (Eds.), Pedagogisk bedömning: att dokumentera, bedöma och utveckla kunskap [Educational assessment: to document, assess and develop knowledge] (pp. 32-41). Stockholm: HLS Förlag.
Pettersson, D. (2008). Internationell kunskapsbedömning som inslag i nationell styrning av skolan. [International knowledge assessment as an element in national governance of the school]. Acta Universitatis Upsaliensis, Uppsala Studies in Education No 120. Uppsala: Uppsala universitet.
Ramaprasad, A. (1983). On the definition of feedback. Behavioral Science, 28, 4-13.
Reeves, D. J., Boyle, W. F., & Christie, T. (2001). The Relationship between Teacher Assessments and Pupil Attainments in Standard Test Tasks at Key Stage 2, 1996-98. British Educational Research Journal, 27, 141-160.
Reuterberg, S.-E., & Svensson, A. (2000). Köns- och socialgruppsskillnader i matematik – orsaker och konsekvenser. [Gender and social-group differences in mathematics: causes and consequences]. Mölndal: University of Gothenburg.
Rosén, M. (1998). Gender differences in patterns of knowledge. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/13820
Rosén, M. (2012). Förändringar i läsvanor och läsförmåga bland 9- till 10-åringar: Resultat från internationella studier. [Changes in reading habits and reading achievement among 9-10-year-olds: Results from international studies]. In Swedish Government Official Reports (Ed.), Läsarnas marknad, marknadens läsare – en forskningsantologi [The readers' market, the market's readers – a research anthology] (Report No. 2012:10, pp. 111-139). Stockholm: Department of Education.
Rosén, M., Myrberg, E., & Gustafsson, J.-E. (2005). Läskompetens i skolår 3 och 4: Nationell rapport från PIRLS 2001 i Sverige. [Reading literacy in school year 3 and 4: National report from PIRLS 2001 in Sweden]. Göteborg: Göteborgs universitet.
Ross, J. A. (2006). The Reliability, Validity, and Utility of Self-Assessment. Practical Assessment, Research & Evaluation, 11(10), 1-13. Available online: http://pareonline.net/getvn.asp?v=11&n=10


Rutkowski, L., & Rutkowski, D. (2009). Trends in TIMSS responses over time: Evidence of global forces in education? Educational Research and Evaluation, 15(2), 137-152.
Sadler, D. R. (1989). Formative Assessment and the Design of Instructional Systems. Instructional Science, 18, 119-144.
Schafer, J. L., & Graham, J. W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.
Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagne, & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39-83). Chicago, IL: Rand McNally.
Selghed, B. (2004). Ännu icke godkänt: Lärares sätt att erfara betygssystemet och dess tillämpning i yrkesutövningen. [Not yet passed: How teachers experience a criterion-referenced grading system and what they say about its use in Swedish secondary school]. (Doctoral Thesis, Malmö University, Malmö, Sweden).
Shavelson, R. J., Hubner, J. J., & Stanton, G. C. (1976). Self-concept: Validation of construct interpretations. Review of Educational Research, 46, 407-441.
Shepard, L. A. (1993). Evaluating Test Validity. Review of Research in Education, 19, 405-450.
Shrauger, J. S., & Osberg, T. M. (1981). The Relative Accuracy of Self-Predictions and Judgments by Others in Psychological Assessment. Psychological Bulletin, 90, 322-351.
Sirin, S. R. (2005). Socioeconomic Status and Academic Achievement: A Meta-Analytic Review of Research. Review of Educational Research, 75, 417-453.
Sjöberg, L. (2010). Bäst i klassen? Lärare och elever i svenska och europeiska policytexter. [Top of the Class? Teachers and pupils in Swedish and European policy texts]. Retrieved from http://hdl.handle.net/2077/24101
Skaalvik, E. M. (1997). Self-enhancing and self-defeating ego-orientation: Relation with task and avoidance orientation, achievement, self-perceptions, and anxiety. Journal of Educational Psychology, 89, 71-81.
Spearman, C. (1904). "General intelligence," objectively determined and measured. American Journal of Psychology, 15, 201-292.
Sperling, R. A., Howard, B. C., Miller, L. A., & Murphy, C. (2002). Measures of Children's Knowledge and Regulation of Cognition. Contemporary Educational Psychology, 27, 51-79.
Stobart, G. (2011). Validity in Formative Assessment. In J. Gardner (Ed.), Assessment and Learning (2nd ed., pp. 233-242). London: Sage Publications.
Südkamp, A., Kaiser, J., & Möller, J. (2012). Accuracy of teachers' judgments of students' academic achievement: A meta-analysis. Journal of Educational Psychology, 104, 743-762.
Svensson, A. (1971). Relative achievement: School performance in relation to intelligence, sex and home environment. Stockholm: Almqvist & Wiksell.
Swalander, L. (2006). Reading achievement: Its relation to home literacy, self-regulation, academic self-concept, and goal orientation in children and adolescents. (Doctoral Thesis, Lund University, Lund, Sweden).
Swalander, L., & Taube, K. (2007). Influences of family based prerequisites, reading attitude, and self-regulation on reading ability. Contemporary Educational Psychology, 32, 206-230. doi: 10.1016/j.cedpsych.2006.01.002
Swedish Government Official Reports. [SOU] (1942). Betänkande med utredning och förslag angående betygssättningen i Folkskolan. [Report with inquiry and proposals concerning grading in the elementary school]. (Report No. 11). Stockholm, Sweden: Department of Education.
Swedish Government Official Reports. [SOU] (2010). Könsskillnader i skolprestationer – idéer om orsaker. [Gender differences in school performances – ideas about causes]. (Report No. 51). Stockholm, Sweden: Department of Education.


Swedish Schools Inspectorate (2010). Kontrollrättning av nationella prov i grundskolan och gymnasieskolan. [Control marking of national tests for comprehensive school and upper secondary education]. Retrieved 20 July 2012 from http://www.skolinspektionen.se > Publikationer.
Swedish Schools Inspectorate (2011). Lika eller olika? Omrättning av nationella prov i grundskolan och gymnasieskolan. [Same or different? Remarking of national tests for comprehensive school and upper secondary education]. Retrieved 20 July 2012 from http://www.skolinspektionen.se > Publikationer.
Taras, M. (2005). Assessment – Summative and Formative – Some Theoretical Reflections. British Journal of Educational Studies, 53, 466-478.
Taras, M. (2009). Summative Assessment: The Missing Link for Formative Assessment. Journal of Further and Higher Education, 33, 57-69.
Taylor, H. G., Anselmo, M., Foreman, A. L., Schatschneider, C., & Angelopoulos, J. (2000). Utility of kindergarten teacher judgments in identifying early learning problems. Journal of Learning Disabilities, 33, 200-210.
The Swedish Education Act. [Skollagen]. (2010). Report no. 800. Retrieved 15 December from http://www.riksdagen.se > Dokument-Lagar
The Swedish National Agency for Education [Skolverket] (1994). Läroplan för grundskolan (Lpo 94). [Curriculum for the Compulsory School System, the Pre-School Class and the Leisure-time Centre]. Stockholm: Author.
The Swedish National Agency for Education [Skolverket] (2000). Kursplaner för grundskolan. [Syllabi for the compulsory school]. Stockholm: Author.
The Swedish National Agency for Education [Skolverket] (2001). PISA 2000: Svenska femtonåringars läsförmåga och kunnande i matematik och naturvetenskap i ett internationellt perspektiv. [PISA 2000: Swedish 15-year-olds' reading literacy and proficiency in mathematics and science in an international perspective]. Stockholm, Sweden: Author.
The Swedish National Agency for Education [Skolverket] (2002). Språket lyfter! Diagnosmaterial i svenska och svenska som andraspråk för åren före skolår 6. [Progression in Language! Diagnostic material for Swedish and Swedish as a second language for grades 2-5]. Stockholm, Sweden: Author.
The Swedish National Agency for Education [Skolverket] (2006). Med fokus på läsförståelse: En analys av skillnader och likheter mellan internationella jämförande studier och nationella kursplaner. [Reading Literacy in Focus: An analysis of differences and similarities between international comparative studies and national syllabuses]. Stockholm, Sweden: Author.
The Swedish National Agency for Education [Skolverket] (2007). Provbetyg - Slutbetyg - Likvärdig bedömning? En statistisk analys av sambandet mellan nationella prov och slutbetyg i grundskolans årskurs 9, 1998-2006. [Test grades - Final grades - Equality in assessment? A statistical analysis of the relationship between the national tests and the final grade of compulsory school year 9, 1998-2006]. Stockholm, Sweden: Author.
The Swedish National Agency for Education [Skolverket] (2009). Likvärdig betygssättning i gymnasieskolan? En analys av sambandet mellan nationella prov och kursbetyg. [Equality in grading in upper secondary school? An analysis of the relationship between national tests and course grades]. Stockholm: Author.
The Swedish National Agency for Education [Skolverket] (2011). Läroplan för grundskolan, förskoleklassen och fritidshemmet 2011 (Lgr 11). [Curriculum for the Compulsory School System, the Pre-School Class and the Leisure-time Centre]. Stockholm: Author.
Tholin, J. (2006). Att kunna klara sig i ökänd natur: En studie av betyg och betygskriterier – historiska betingelser och implementering av ett nytt system. [Being able to survive in an unknown environment: A study of grades and grading criteria – historical factors and implementation of a new system]. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/16892


Thorsen, C., & Cliffordson, C. (2012). Teachers' Grade Assignment and the Predictive Validity of Criterion-Referenced Grades. Educational Research and Evaluation, 18, 153-172.
Toulmin, S. (1958/2003). The Uses of Argument. Cambridge, UK: Cambridge University Press.
Tversky, A., & Kahneman, D. (1974). Judgment under Uncertainty: Heuristics and Biases. Science, 185(4157), 1124-1131.
Wayne, A. J., & Youngs, P. (2003). Teacher Characteristics and Student Achievement Gains: A Review. Review of Educational Research, 73, 89-122.
Wernersson, I. (1989). Olika kön samma skola? En kunskapsöversikt om hur elevernas könstillhörighet påverkar deras skolsituation. [Different gender, same school? A review of research about how pupils' gender influences their school situation]. Stockholm.
Wolming, S. (1998). Validitet: Ett traditionellt begrepp i modern tillämpning. [Validity: A modern approach to a traditional concept]. Pedagogisk Forskning i Sverige, 3, 81-103.
Wolming, S., & Wikström, C. (2010). The Concept of Validity in Theory and Practice. Assessment in Education: Principles, Policy & Practice, 17, 117-132.
Yang, Y. (2003). Measuring socioeconomic status and its effects at individual and collective levels: A cross-country comparison. (Doctoral Thesis, University of Gothenburg, Göteborg, Sweden). Retrieved from http://hdl.handle.net/2077/15895
