Evaluation of the Statistical Elements of the Teaching Excellence and Student Outcomes Framework

A report by the Methodology Advisory Service of the Office for National Statistics

Gareth James, Daniel Ayoubkhani, Jeff Ralph, Gentiana D. Roarson

July 2019

Contents

List of tables
List of figures
List of acronyms
Executive summary
Summary of recommendations
1. Introduction
  1.1 Context
  1.2 An introduction to TEF and its statistics
  1.3 About this report
    1.3.1 Scope
    1.3.2 Sources of information
    1.3.3 Analytical work
    1.3.4 Qualitative analysis
  1.4 Structure of this report
  1.5 Acknowledgements
  1.6 About the authors
2. Overview of the methods used for core metrics
  2.1 Measures associated with the core metrics
  2.2 Current approaches to combining core metrics to form the starting point in Step 1a
    2.2.1 The 2017 and 2018 methods
    2.2.2 The core metrics and their weights
    2.2.3 Starting points and Step 1a
3. Evaluation of current Step 1a methods and data
  3.1 Reference period: timeliness, comparability and stability
  3.2 The indicators
    3.2.1 Statistical uncertainty in the indicators and super-populations
  3.3 Benchmarking
    3.3.1 General approach
    3.3.2 Selection of the benchmarking factors
    3.3.3 Causality and attribution
    3.3.4 Grouping by provider type: a discussion
  3.4 z-scores and assumptions of normality
    3.4.1 Basics
    3.4.2 Target populations
    3.4.3 Normality assumptions
    3.4.4 Estimation of the standard deviation of d
    3.4.5 Associations between flags and size of provider
    3.4.6 Use of z-scores
  3.5 Correlation between core metrics
  3.6 Comparison of weights between the 2017 and 2018 methods
  3.7 Binary nature of flags
  3.8 Missing metrics
  3.9 Majority mode
  3.10 Comments on the principle of combining metrics
4. Possible adaptations to Step 1a processes
  4.1 Starting points under the current process
    4.1.1 A net total value: differencing the positive and negative flag values
    4.1.2 Penalties for negative flags
    4.1.3 Further thoughts on net measures
  4.2 Development of a single, overall measure that does not compare against benchmarks
  4.3 Sensitivity analysis of alternative options in Step 1a
    4.3.1 Changes to weights
    4.3.2 Changes to thresholds used in flagging rules
    4.3.3 Weighting together metrics for full-time and part-time students
5. Contextual data
  5.1 High and low absolute values
    5.1.1 Principles
    5.1.2 Documentation
    5.1.3 Methods and process
  5.2 Split metrics
  5.3 Data analysis of pilot subject-level metrics
6. Statistical infrastructure
  6.1 Harmonisation
    6.1.1 Disability
    6.1.2 Ethnicity
    6.1.3 Gender identity
    6.1.4 Sex
  6.2 Classifications
    6.2.1 Standard Occupational Classification (SOC)
    6.2.2 Common Aggregation Hierarchy


