This project has received funding from the European Union’s Preparatory Action for Defence Research - PADR programme under grant agreement No 800893 [PYTHIA]

D2.1 – Methods for measuring the correctness of a prediction

WP number and title WP2 – The cognitive factor in foresight

Lead Beneficiary HAWK

Contributor(s) Z&P, ENG, BDI, ICSA

Deliverable type Report

Planned delivery date 31/05/2018

Last Update 25/09/2018

Dissemination level PU

PYTHIA Project
PADR-STF-01-2017 – Preparatory Action on Defence Research
Grant Agreement n°: 800893
Start date of project: 1 February 2018
Duration: 18 months


Disclaimer

This document contains material, which is the copyright of certain PYTHIA contractors, and may not be reproduced or copied without permission. All PYTHIA consortium partners have agreed to the full publication of this document. The commercial use of any information contained in this document may require a license from the proprietor of that information. The PYTHIA Consortium consists of the following partners:

1. Engineering Ingegneria Informatica S.p.A. (ENG), Italy
2. Zanasi & Partners (Z&P), Italy
3. Expert System France (ESF), France
4. Hawk Associates Ltd (HAWK), UK
5. Military University of Technology (WAT), Poland
6. Bulgarian Defence Institute (BDI), Bulgaria
7. Fondazione ICSA (ICSA), Italy
8. Carol I National Defense University (NDU), Romania


Document History

VERSION | DATE | STATUS | AUTHORS, REVIEWER | DESCRIPTION
V0.1 | 01/05/2018 | Draft | Hawk | First draft
V0.2 | 25/05/2018 | Draft | Hawk, Z&P | Second draft
V0.3 | 31/05/2018 | Complete Draft | Hawk, Z&P, ICSA | Version ready for peer review
V0.4 | 05/06/2018 | Final Draft | Hawk, Z&P, ENG, BDI, ICSA | Final draft pre-submission
V0.5 | 12/06/2018 | Final Updated Version | Hawk, Z&P, ENG, BDI, ICSA | Final updated version
V0.6 | 12/06/2018 | Final | Hawk, Z&P, ENG, BDI, ICSA | Final submitted version
V0.7 | 19/09/2018 | Final | Hawk, Z&P, ENG, BDI, ICSA | Final version post-EDA comments


Definitions, Acronyms and Abbreviations

ACRONYMS / ABBREVIATIONS – DESCRIPTION

CRPS Continuous Ranked Probability Score

RMSE Root Mean Square Error

PEs Percentage Errors

CSI Critical Success Index

ETS Equitable Threat Score

FAR False Alarm Ratio

GMRAE Geometric Mean of the RAE

MAE Mean Absolute Error

MAD Mean Absolute Deviation

MAPE Mean Absolute Percentage Error

MAPD Mean Absolute Percentage Deviation

RPSS Ranked Probabilistic Skill Score

ROC Receiver Operating Characteristics

UAPE Unbiased Absolute Percentage Error


Table of Contents

Disclaimer
Table of Contents
Executive Summary
1 Introduction
2 Terminology
3 Context
4 Types of Forecast
4.1 Quantitative versus qualitative methodologies
4.2 Exploratory versus normative methodologies
4.3 Forecast versus foresight methodologies
5 Literature Review
5.1 Informal Methods for Measuring Forecast Accuracy
5.1.1 Objective Performance Feedback
5.1.2 Peer Review and Critical Review Process
5.1.3 Training Approaches
5.1.4 Inferred Probabilities and Blind Retrospective Assessments
5.2 Formal Methods for Measuring Forecast Accuracy
5.2.1 Proper Scoring Rules
5.2.2 Brier scoring rule
5.2.3 Logarithmic scoring rule
5.2.4 Spherical scoring rule
5.2.5 Continuous Ranked Probability Score (CRPS)
5.2.6 Loss function
5.2.7 Root Mean Square Error (RMSE)
5.2.8 Measures based on percentage errors (PEs)
5.2.9 Binary Logistic Models
5.2.10 Critical Success Index (CSI)
5.2.11 Equitable Threat Score (ETS)
5.2.12 False Alarm Ratio (FAR)
5.2.13 Forecast Skill Score (SS)
5.2.14 Geometric Mean of the RAE (GMRAE)
5.2.15 MAE (Mean Absolute Error) / Mean Absolute Deviation (MAD)
5.2.16 MAPE (Mean Absolute Percentage Error) / MAPD (Mean Absolute Percentage Deviation)
5.2.17 Ranked Probabilistic Skill Score (RPSS)
5.2.18 Receiver Operating Characteristics (ROC)
5.2.19 The ‘Goodness Score’, discernment factor, base algorithm
5.2.20 Unbiased Absolute Percentage Error or UAPE
6 Conclusions
7 References


Executive Summary

This document reviews the ‘family’ of approaches to measuring forecast accuracy in order to make recommendations as to the most appropriate choices for PYTHIA. The key points are summarised below:

• The importance of measuring the accuracy of a forecast is that it helps determine whether one forecasting method or system is better than another. In other words, if we do not measure the correctness of our predictions we cannot expect to improve their results

• This is true both for individual forecasters and for comparing results from different forecasting groups. The evaluation of past forecast performance becomes a key factor in improving future forecasts

• When measuring forecast accuracy, we must define the type of forecast, its purpose or aim, and set specific criteria for the evaluation metrics. Standard terms define the data sets, the period over which the data are taken and the forecast horizon

• These evaluation or assessment metrics (the measures used to gauge accuracy) can be broadly categorised into formal or informal methods. The formal approach uses statistical analysis and the informal approach is empirical and uses a critical review process (e.g. expert appraisals and analytical judgement)

• Both approaches carry flaws: the formal approach can be misleading, depending on data size and type of forecast; the informal approach is inherently full of cognitive biases and individualistic interpretation

• Having access to more information does not lead to greater accuracy of forecasts. On the contrary, and perhaps counter-intuitively, more information leads to greater overconfidence in judgements and in their accuracy

• People (including analysts) have an over-inflated opinion of their ability to correctly assess the accuracy of their forecasts. When we make self-assessments of performance we have a tendency to see ourselves as ‘better than average’, which is problematic for self-assessment. We are also incapable of escaping the hindsight memory bias: a tendency to incorrectly recall having made accurate forecasts. Together with confirmation bias and the fallacy of expert opinion, this makes accurately assessing the correctness of a prediction inherently difficult

• The formal approach is a rigorous, scientific methodology. Forecasting accuracy can be measured rigorously but there is some reticence about doing this (Lehner, 2010)


• The informal approach is based primarily on a critical review and lessons-learned approach. Whilst this has its merits and is often the approach of choice for analysts, it may be limited in providing inputs for improvements in forecasting (Lehner, 2010)

• Forecasting is not an exact science and there are many layers at which we must interpret data and results. These can make understanding the accuracy of forecasting complicated.

• In simple terms the accuracy of a forecast can be denoted as the actual value minus the forecast value over a given time period. This is called the forecast error. We can also use forecast bias as a measure of accuracy and some of the more common metrics obtain a numerical figure for forecast error. The greater the error the less accurate the forecast and vice-versa. Discernment is also a factor to take into consideration when assessing the accuracy of any prediction.

• Adoption of any technique should be made simple, and a common understanding as to how to talk about probability should be a priority

• Much scepticism exists in relation to measuring the accuracy of forecasts, especially from those in the Intelligence Community. Evidence from other sectors shows that the accuracy of forecasts can be improved through better approaches and methods for measuring it


1 Introduction

Today’s world is increasingly unpredictable, but we have an increasing amount of information available to help us navigate these uncertainties. One’s ability to predict what might happen in the future is an important and fundamental skill that humans possess. It defines us as a species and we have forever dreamt of an ability to see into that future, the famous crystal ball, in order to reduce uncertainty and prepare ourselves for what lies ahead.

Without the mystique of that crystal ball, we are left, rather bluntly, with data, information and knowledge. Our own minds make judgements about what is most likely to take place in the future. These are predictions about future events. They are based on what has gone on in the past, what is going on in the present and – through memory, information processing, and judgement based on analysis – what might occur in the future. We strive to make better decisions based on these assumptions or predictions and as forecasters we assign probabilities as to the likelihood of their occurrence.

But if we want to measure the correctness of these predictions, how can we do that? This is the work that will be investigated here as part of the research within WP2 for D2.1. How do we appraise or judge whether something that was said in the past is correct in the future? What methods exist for measuring this? And what metrics are used for the measurements? Why should we do it at all? How can it change our approach to making better predictions?

Measuring the accuracy of a prediction is not an easy task. Not only do we need sufficient data to do so, but we also need sufficient willingness from the forecaster to actually go back and check where they may have gone wrong. There are many types of forecast that exist and not all methods for measuring forecast accuracy can be applied. In other words, depending on what we are predicting, this may need to be measured in different ways. The simplest and most intuitive division of the type of method for measuring the correctness of a prediction is by formal (statistical) and informal (judgemental) approaches. Can we apply a mathematical formula to check the error of the forecast? Or do we appraise the forecast using analysis and reasoning? The latter of these can still be formalised (for example, post-forecast reviews can be grouped together into a score) but the distinction remains clear. The former requires raw data (and specifically numbers) so we can express errors using a formula-based approach.

In any case, when we talk about the correctness of a prediction we are talking about how accurate our forecast is (or was). It is good forecasting practice to assign a probability to a forecast. This highlights the all-important question: how likely is it that something is going to happen. It is with this likelihood that we can make judgements about the potential outcomes of particular events and begin to reduce the uncertainty in our decision making process, which ultimately is the goal.


2 Terminology

It is important we include a note here on terminology for the purposes of clear communication. As defined by Lehner, we ‘use the word forecast to signify any statement about a future occurrence, whether a discrete event (e.g. “The peace process will likely break down.”) or a quantity (e.g. “GDP will probably increase more than 3% next year.”). Judgment-based forecasts are forecasts made by experts; they are sometimes called “estimates” or “judgment calls.” Probabilistic forecasts are generally defined as forecast statements expressed with a degree of certainty (e.g. “There is a good chance the recession will end in the next quarter”), where the degree of certainty is often stated quantitatively (e.g. “There is a 70% chance that the recession will end in the next quarter.”). Furthermore, we use the word accuracy to refer to both whether a forecasted event occurred and whether the forecast was expressed with an appropriate degree of certainty’ (Lehner, 2010).

So, a forecast is a prediction about the probability of a future event based on what we know now. This is a facet of doing foresight work. Foresight specifically looks at mapping or predicting different futures where the context of a prediction may change.

Prediction is a word that we use somewhat interchangeably with forecast. All forecasts are predictions but not all predictions are forecasts. The word prediction is used here in a more general sense.

Probability is the extent to which something is likely to happen. We assign probability to judge the likelihood of a future event occurring. This is different from a probabilistic model, which is meant to give a distribution of possible outcomes, rather than a deterministic model, which is meant to give a single solution.

Discernment is also a term used within the context of this research. It means the ability to judge well, but as we will see there are factors related to discernment to take into account when looking at forecast accuracy.


3 Context

It is important that we give some context to this research, in order to place it within the overall picture of WP2. We will define here what we are trying to achieve and the areas we are not covering.

Different forecasting methodologies, together with errors made during the forecasting process (incorrect processing of data, incorrect assumptions about the reliability of the available information, judgement distorted by misconceptions and cognitive biases, etc.), mean that even when forecasters use the same information, their predictions will differ.

WP2 looks at the inputs (human factors) for a model for improved forecasting and foresight, notably the behaviour of a good forecaster, what cognitive factors are involved and how this can be rendered understandable in a practical situation (for the good of the project). Various steps cover the research work, and this document is concerned with the first:

• To review methods commonly used for assessing whether a prediction is correct or not and devise a new one specific for the PYTHIA methodology;

• To review the most notable failures in the history of technology forecasting, attempting to understand the reasons behind those failures, with a particular focus on the analysis of cognitive factor-related aspects that impact the quality of predictions;

• To study the strategies adopted by successful forecasters and identify their main characteristics;

• To elaborate a set of recommendations for improving the accuracy of technology foresight.

For D2.1 the task was ‘to conduct a literature review of the different techniques used to measure the correctness or accuracy of a prediction and provide a method that can be used and tested within the PYTHIA foresight methodology.’ By doing this, we aim for a detailed understanding of the current methods (for measuring the accuracy of a prediction) and how these could be used and applied for PYTHIA based on a tailored approach.

We are less concerned here with what makes a good forecaster, rather with what makes a good forecast. The human factors will be investigated later; here we concentrate primarily on the descriptions of methods for measuring the correctness of a prediction. We assume that a good forecast is an accurate forecast. So it is very important to be able to assess how accurate a forecast is, otherwise we cannot tell whether it is useful. This element of research into measuring forecast accuracy is therefore crucial for PYTHIA.

Our original premise was that expert forecasters are well versed in the habit of assigning probability distributions to their predictions. Based on these percentages or figures, we assumed that it was difficult for observers to assess, after the event, whether a certain prediction was correct or not and to what extent. A good example to illustrate this point is the following: what if an event was foreseen as “60% likely to happen” and did not happen? Was the prediction wrong or not?


This research covers these aspects by reviewing existing literature on the various methods for measuring the correctness of a prediction. Our aim is then to provide a (combined) best-fit method for determining the correctness of a prediction, against which the accuracy of the foresight methodology proposed in PYTHIA could be tested. So, this research will look at how we can better judge what a good forecast is (using the techniques for measuring accuracy) by reviewing the methods commonly used. Recommendations will be proposed as to the method to be used in PYTHIA.

The activities carried out in this work package aim at understanding the role played by the cognitive factor in technology forecasting, both as a source of error and, conversely, as something which contributes to making correct predictions. On the basis of this analysis, recommendations will be produced for forecasters on how to improve their forecasting strategies and, at the same time, to make sure that human factors do not negatively impact the quality of their predictions.


4 Types of Forecast

Whilst it is not the intention of this document to present the different methods of forecasting, it is inevitable that we must discuss this, as forecasting accuracy is dependent on the forecasting method. In other words, different forecasting approaches can lead to different levels of forecasting accuracy. The question for PYTHIA is what type of forecast we need to use to make predictions about future events. This will be settled later in the project and has been addressed in Deliverable 3.1. For now, we will content ourselves with a literature review of the available methods for measuring the correctness of a prediction. Before this, it is important to understand the types of forecast that are used.

We are not looking here at what qualities make a good forecaster; we are looking at how we can make a judgement about how accurate a forecast is. Why? Ultimately, this work is important as it will allow us to evaluate different forecast methods by assessing their accuracy. And conversely, the type of forecast used may determine its accuracy. This is an important feature and why a section on different types of forecast is necessary.

Forecasts can benefit us when we begin to understand the different types of forecasting methods, ‘recognize what a particular forecasting method type can and cannot do, and know what forecast type is best suited to a particular need’ (Nordmeyer, 2018). Business operations understand and appreciate this, and the classifications used by Nordmeyer are highly relevant for understanding which different methods of measuring the accuracy of a prediction can be used within a particular context. Readers are strongly advised to also review part 1.3 (General Features of Methodologies) of Deliverable 3.1, and for convenience we have included those classifications here:

4.1 Quantitative versus qualitative methodologies

Quantitative methodologies represent reality in a numeric form. They usually use mathematical models to study the development of variables that depict the foreseen future. Quantitative approaches often involve experts assigning numerical values to these variables or creating such values on the basis of the numbers of people agreeing with particular statements, usually in the form of simple surveys or questionnaires. Many statistics are generated by means of the collected data and many statistical tools are employed to determine the relationships that can be found between variables.

Advantages of using quantitative methodologies are:

• Possibility to manipulate the information in consistent and reproducible ways, allowing greater precision;
• Possibility to compare data from different sources;
• Results can be represented in the form of tables, graphs and charts, which can communicate very efficiently with people under severe time-shortage and information-overload.

On the other hand, disadvantages are:

• Not all factors can be quantified numerically, such as many important social and political variables;
• Gathering quality data can be difficult or excessively expensive or time-consuming;
• Audiences can be uncomfortable with reading statistical information;
• Models could become very complex and excessively formalised, decreasing the level of involvement of the participants.


Qualitative methodologies are employed where the developments are hard to study using numerical indicators or where numerical data are not available. They are usually methods that stimulate creative thinking and collaboration between the subjects involved. Qualitative methodologies are radically different from the quantitative ones, but they can be used to balance their flaws or to support them. For example, qualitative methodologies can be used to understand the meaning of the numbers produced by quantitative methodologies. Nonetheless, qualitative methodologies still remain less documented than quantitative ones and it can be hard to define their good practice and use cases.

4.2 Exploratory versus normative methodologies

Exploratory methodologies make a projection from the present into the future, in order to foresee events and the evolution of trends. They usually predict the future on the basis of extrapolating past trends. Among these kinds of tools are trend, impact and cross-impact analyses and the Delphi approach.

Normative methodologies work back from the future to the present, trying to understand what trends and events will lead there. They move from the future to the past, considering technologies and resources as constraints to overcome in order to reach the fixed goal. Some examples of normative methodologies are backcasting, relevance trees and some uses of Delphi, known as "goals Delphi".

Both kinds of methodology are valuable and there is no golden rule for choosing one or the other. Usually, normative approaches are more effective when a highly desired future exists and has to be reached. In these cases, normative approaches can be powerful tools to progress towards the shared goal in the best way possible. In every other case, exploratory methodologies should be preferred, especially when there is no shared consensus about what the most desirable future is.

4.3 Forecast versus foresight methodologies

Forecast methodologies predict the most probable future, regardless of whether this future is desired or not, focusing on a single possible scenario. These tools are also known as “predictive methodologies”. Among them, there are SWOT analysis, Backcasting and MACTOR. Foresight methodologies predict different, possible, alternative futures. They are usually exploratory but this is not necessarily the case if alternative ways of reaching a desirable future are considered (normative methodologies). In this case, the term “open methodology” is used too. Some examples of foresight methodologies are Gaming, Expert Panels and Scenario Building.


5 Literature Review

The following sections split the review into two parts: informal and formal methods for measuring the accuracy of a forecast. More work should be done throughout the project on the informal approaches, as these may provide, together with the formal methods, a form of intuitive convergence of techniques. Indeed, a simple training approach that introduced the formal methods would be a useful first step. The research has been conducted using open sources of information and desk-based research. All references are given for each part.

5.1 Informal Methods for Measuring Forecast Accuracy

In 2014, David R. Mandel and Alan Barnes completed a very interesting study into the accuracy of 1,514 strategic intelligence forecasts (Mandel & Barnes, 2014). It is interesting to note here, within the section on informal methods for measuring forecast accuracy, some of the main reasons accuracy assessments are not routinely undertaken.

These relate to several specific points, which in turn are affected by two key perennial organisational and individual problems: lack of planning and subsequent lack of devoted resources to the task; lack of knowledge or understanding regarding their importance. If an organisation does not enforce comprehensive planning phases to analysis or forecast tasks, with the specific mention of ‘forecast accuracy assessments / measurements’, then, especially if the individual is not aware of their importance, these tasks will not be performed.

Despite good reasons to proactively track the accuracy of intelligence forecasts, intelligence organizations seldom keep an objective scorecard of forecasting accuracy (Betts, 2007). There are many reasons why they do not do so. First, analysts seldom use numeric probabilities, which lend themselves to quantitative analyses of accuracy (Horowitz & Tetlock, 2012). Many analysts, including Kent’s ‘poets’ (Kent, 1964), are resistant to the prospect of doing so (Weiss, 2008). Second, intelligence organizations do not routinely track the outcomes of forecasted events, which are needed for objective scorekeeping. Third, only recently have behavioural scientists offered clear guidance to the intelligence community on how to measure the quality of human judgment in intelligence analysis (Fischhoff & Chauvin, 2011; Derbentseva et al., 2011; National Research Council). Finally, there may be some apprehension within the community regarding what a scorecard might reveal.

It is important to note that whilst the quality of human judgment can be measured, how it is measured and why (for what purpose or end or specific improvement), may depend on specific individual or organisational needs. We can envisage an organisation that sets out a requirement for using numeric probabilities and for routinely tracking the outcomes of forecasted events, but this organisation must clearly communicate and explain why this is needed and how it will be achieved within the specific working context of each analyst. Ensuring this would avoid situations where a lack of knowledge or ignorance of the problem occurs (an analyst may not be aware of the advantages of performing such forecast accuracy measurements) or where lack of planning and resources are the main reasons for accuracy assessments not being routinely undertaken (management would need, in this situation, to provide the guidelines and routines to be performed, ensuring compliance through best-practice rules and/or check-lists).

Following on from this, we can list here the main informal methods for measuring forecast accuracy. These are all grouped within the same category and a description is given for each.


5.1.1 Objective Performance Feedback

Objective performance feedback can help reveal forecasters’ judgement characteristics (e.g. over-/under-confidence) longitudinally (Mandel & Barnes, 2014). Performance feedback is ubiquitous in Organizational Behaviour Management (OBM), yet its essential components are still debated. It has been assumed that performance feedback must be accurate, but this assumption has not been well established. Two experiments carried out to investigate feedback accuracy showed that accurate feedback significantly improved performance over the control and inaccurate-feedback groups (Choi et al., 2018).

5.1.2 Peer Review and Critical Review Process

There are several types of peer review. Taking the examples provided by major technical publishers (Elsevier): whilst reviewers play a vital role in academic publishing, their contributions are often hidden. This process can be transposed to the situation of an analyst or person responsible for a forecast or prediction statement. It does not give a metric of accuracy, but rather a way of reviewing the material as part of a feedback loop, which is essential for improving the quality of the intelligence product.

The description offered by Elsevier continues with single blind review, a common practice that can be adopted to review the outputs of forecasts as if they were outputs of written text / publications. The names of the reviewers are hidden from the author / producer of the forecast. This is the traditional method of reviewing and is by far the most common type. Reviewer anonymity allows for impartial decisions – the reviewers will not be influenced by the authors – although reviewers may use their anonymity as justification for being unnecessarily critical or harsh when commenting on the authors’ work.

In double-blind review both the reviewer and the author are anonymous. Author anonymity prevents any reviewer bias, for example based on an author's country of origin or previous controversial work.

Open review, where the reviewer and author are known to each other, could also be used. Some believe this is the best way to prevent malicious comments, stop plagiarism, prevent reviewers from following their own agenda, and encourage open, honest reviewing. Others see open review as a less honest process, in which politeness or fear of retribution may cause a reviewer to withhold or tone down criticism. A more transparent peer review may be part of a wider organisational Critical Review Process.

5.1.3 Training Approaches

Knowing the importance of training as part of improving internal processes – and, as we saw earlier, we identify forecasting as a process which can be improved – there are several training techniques that are specific to understanding the importance of evaluating forecasting accuracy. Again, as an informal method, the client would need to define the training parameters and outcomes, and set objectives and agendas (for example a training session specific to cognitive biases and how they impact forecasts). These relate to overall awareness and improvement programmes that should be run internally. Part of the objective of these training sessions would be to identify the parameters by which one could begin to make individual and group assessments about the accuracy of forecasts. Training data and experimental data could be provided, as well as notable case studies. Structured Analytical Techniques could also be used to discuss and define the measurement parameters.


5.1.4 Inferred Probabilities and Blind Retrospective Assessments

The ‘blind’ part suggests that some information is kept masked from the assessor/participant to eliminate any possible bias. The assessment is retrospective, meaning that it looks back at past events. The approach taken in the study of the accuracy of intelligence products (Lehner, 2010) is founded on two basic ideas - Inferred Probabilities and Blind Retrospective Assessments - in order to establish ground truth. We see this as having some genuine relevance to the informal methods. As part of the study, we are asked to ‘consider the following forecast statement from the declassified key judgments in the 2007 National Intelligence Estimate (NIE) on Prospects for Iraq Stability: “... the involvement of these outside actors is not likely to be a major driver of violence ...” The forecast event is “The involvement of outside actors will not be a major driver of violence in Iraq in the January 2007 to July 2009 timeframe.” In the first step, five different reviewers read the NIE and on the basis of what was written inferred probabilities of 80%, 85%, 75%, 85%, and 70% for the forecast event. As occurred in this case, if multiple reviewers infer similar probabilities, then the average of those inferred probabilities is a fair representation of how intelligence consumers would consistently interpret the product. On the other hand, if multiple reviewers infer very divergent probabilities then that divergence is measurable evidence that the forecast statement was largely meaningless.’ This is an area that will require further study, and during PYTHIA we hope to experiment with trial data to test the applicability of these methods to our project.

5.2 Formal Methods for Measuring Forecast Accuracy

5.2.1 Proper Scoring Rules

Scoring rules are used to measure the accuracy of predictions that assign probabilities to a set of mutually exclusive discrete outcomes (e.g. a weather forecast). Scoring rules assign a numerical score based on the predictive distribution (i.e. a series of forecast events) and on the events or values that actually materialise. If the forecaster quotes the predictive distribution P and the event x materialises, the scoring rule is S(P,x). The function S(P, ·) (i.e. the scoring rule before the actual event happens) takes values in the extended real line [-∞, +∞] (Gneiting & Raftery, 2007). Specific scoring rules may take values in a subset of this range.

The orientation of a scoring rule can be positive or negative. S(P,x) is positively oriented if, for two different forecasts P’ and P’’, S(P’,x) > S(P’’,x) means that P’ is a more accurate probabilistic forecast than P’’. In other words, positively oriented scores have to be high, while negatively oriented ones have to be low, for the forecast to be good.

There exists a variety of metrics to evaluate forecasting activities and, typically, the metrics most used by forecasters are those that are not vulnerable to manipulation. Indeed, if a forecaster can improve his scores by maliciously modifying the forecasts in light of the chosen metric, then it is impossible to know when the forecaster is being truthful and when he is capitalising on the metric (Merkle & Steyvers, 2013). For this reason, proper scoring rules are usually used in forecast evaluation, that is, scoring rules for which the highest expected reward is obtained by reporting the believed probability distribution. Intuitively, it is desirable to make “cheating” difficult: if a forecaster genuinely believes in P, he should have no incentive to report any deviation from P in order to achieve a better score. The use of a proper scoring rule encourages the forecaster to be honest in order to maximise the score (Constantinou & Fenton, 2012). In the following subchapters, the main proper scoring rules are described.


5.2.2 Brier scoring rule

The Brier score was historically the first scoring rule (Brier, 1950) and it was created as a means of verifying weather forecasts. It is one of the metrics that have negative orientation, meaning that the lower the Brier score is for a set of predictions, the more accurate the predictions are. It is applicable to tasks in which predictions must assign probabilities to a set of mutually exclusive discrete outcomes. The Brier score is appropriate for binary and categorical outcomes that can be structured as true or false, but is inappropriate for ordinal variables which can take on three or more values.

When the output of the forecasting activity is binary (e.g. “snow” or “no snow”), the Brier score can be written as:

BS = (1/N) * Σ_{t=1..N} (f_t - o_t)^2

where f_t is the probability that was forecast, o_t is the actual outcome of the event at instance t (1 if the event occurred, 0 otherwise) and N is the number of forecasting instances. This binary form is only proper for binary events.

If a multi-category forecast is to be evaluated (e.g. “cold”, “warm”, “hot”), then the following definition should be used:

BS = (1/N) * Σ_{t=1..N} Σ_{i=1..R} (f_ti - o_ti)^2

where R is the number of possible classes in which the event can fall and N is the overall number of instances over all classes. This last formulation was the original one proposed by Brier and it is a proper scoring rule both for binary and multi-category forecasts.

The Brier score’s flaws concern very rare (or very frequent) events. A forecaster who obtains a 0.4 Brier score in predicting the weather for a European capital in spring could be considered definitely “better” than someone who obtains a 0.5 score in predicting the weather across the Sahara desert. A forecaster who correctly foresees an extremely rare event should be rewarded much more than one who foresees an easier event (Tetlock, 2016).
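To make the two formulations above concrete, the following minimal Python sketch (not part of the deliverable; the function names and example values are ours) computes the binary and multi-category Brier scores.

```python
def brier_binary(forecasts, outcomes):
    """Binary Brier score: mean squared difference between forecast
    probabilities f_t and observed outcomes o_t (1 = event occurred, 0 = not)."""
    n = len(forecasts)
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / n

def brier_multicategory(forecasts, outcomes):
    """Original multi-category Brier score: `forecasts` and `outcomes` are lists
    of probability vectors; each outcome vector is 1 for the class that occurred
    and 0 elsewhere."""
    n = len(forecasts)
    return sum(
        sum((f_i - o_i) ** 2 for f_i, o_i in zip(f, o))
        for f, o in zip(forecasts, outcomes)
    ) / n

# Three binary "rain" forecasts and what actually happened.
print(brier_binary([0.7, 0.2, 0.9], [1, 0, 1]))  # ~0.047: low score, accurate
print(brier_binary([0.7, 0.2, 0.9], [0, 1, 0]))  # ~0.647: high score, inaccurate
```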

5.2.3 Logarithmic scoring rule

One of the best-known metrics for estimating forecasting accuracy is the log-probability (Good, 1952). This score has positive orientation (i.e. the forecaster’s goal is to maximise it to achieve a good prediction).

This metric is local: when an outcome i realises, only the predicted value pi is used to compute the score, not the other outcomes. The base of the logarithm does not significantly change the metric, since logarithms in different bases differ by a constant factor (Roughgarden, 2016). Besides, the scoring rule is never positive, and sometimes a shifted version is used (in those cases, the score must be minimised). The logarithmic score can be written as:

Slog (P, i) = log pi

Given an event, the forecaster will assign a probability pi to its actual realisation and a probability 1 - pi to the opposite event. If a forecaster predicts rain with 70% probability and it actually rains, Slog = log(0.7) = -0.155 while, if it does not rain, Slog = log(1 - 0.7) = -0.523. The first score is higher than the second because the first forecast is better than the second.


The logarithmic scoring rule is the most sensitive one: a prediction of 0% on an event that actually occurs results in an infinite penalty. To avoid this, predictions should never be as extreme as probabilities of 0 or 1. For this reason, many forecasters choose the Brier score instead.
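As a small hedged illustration of the base-10 example above (the helper name and values are ours, not from the deliverable):

```python
import math

def log_score(p_event, occurred):
    """Logarithmic score (base 10, as in the rain example above): the log of the
    probability that was assigned to the outcome that actually happened."""
    p = p_event if occurred else 1.0 - p_event
    # Never positive; a probability of exactly 0 would raise an error here,
    # reflecting the infinite penalty discussed above.
    return math.log10(p)

print(log_score(0.7, True))   # ~ -0.155 (it rained)
print(log_score(0.7, False))  # ~ -0.523 (it did not rain)
```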

5.2.4 Spherical scoring rule

Given a set of c events and the vector r of their respective probabilities r1, …, rc, the spherical scoring rule of the ith event can be written as:

S_spherical(r, i) = r_i / |r| = r_i / sqrt(r_1^2 + … + r_c^2)

The spherical scoring rule has positive orientation and takes values between 0 and 1.
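A minimal sketch of the spherical rule as defined above (the function name and probabilities are illustrative assumptions):

```python
import math

def spherical_score(probs, observed_index):
    """Spherical score: probability assigned to the realised outcome divided by
    the Euclidean norm of the whole probability vector (value in [0, 1])."""
    norm = math.sqrt(sum(p * p for p in probs))
    return probs[observed_index] / norm

# Three mutually exclusive outcomes; the second one materialises.
print(spherical_score([0.2, 0.7, 0.1], 1))  # ~0.95, close to the ideal 1.0
```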

5.2.5 Continuous Ranked Probability Score (CRPS)

The continuous ranked probability score (CRPS) is a much-used measure of performance for probabilistic forecasts of a scalar observation. Given a real-valued random variable X, its cumulative distribution function (CDF) is defined as FX(y) = P(X ≤ y), that is the probability that the random variable X takes a value less than or equal to y. The CRPS can be defined as a quadratic measure of the difference between the foreseen cumulative distribution function (CDF) and the empirical CDF of the observation (M. Zamo, 2017).

Given the CDF F_X(y) of a real-valued random variable X and the observed value z, the CRPS between z and F_X(y) can be defined as:

CRPS(F_X, z) = ∫_{-∞}^{+∞} (F_X(y) - Step(y - z))^2 dy

where Step is a step function equal to:
• 1, if the argument is greater than or equal to 0;
• 0, otherwise.
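One possible numerical sketch of this definition approximates the integral on a finite grid for an assumed Gaussian forecast CDF (the forecast distribution, grid bounds and function names are assumptions for illustration only):

```python
import math

def normal_cdf(y, mu, sigma):
    """CDF of a normal forecast distribution, used here as an example F_X."""
    return 0.5 * (1.0 + math.erf((y - mu) / (sigma * math.sqrt(2.0))))

def crps_numeric(cdf, z, lo=-10.0, hi=10.0, n=20000):
    """Approximate CRPS(F_X, z) = integral of (F_X(y) - Step(y - z))^2 dy."""
    dy = (hi - lo) / n
    total = 0.0
    for i in range(n):
        y = lo + (i + 0.5) * dy
        step = 1.0 if y >= z else 0.0
        total += (cdf(y) - step) ** 2 * dy
    return total

forecast_cdf = lambda y: normal_cdf(y, mu=0.0, sigma=1.0)
print(crps_numeric(forecast_cdf, z=0.0))  # ~0.234 for a standard normal forecast
```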

5.2.6 Loss function

A generic loss function maps the values of one or more variables onto a real number, intuitively representing some "cost". Loss functions are usually used in optimisation problems in which the score has to be minimised. When a forecast f(t, h) of a variable Y(t+h) is made at time t for the future time t+h, a loss (or cost) arises if the forecast turns out to be different from the actual value (Lee, 2007). The loss function of the forecast error is:

e(t+h) = Y(t+h) - f(t, h)

Note that, for a binary event, Y(t+h) can only be 1 or 0, depending on whether the event took place or not. By calculating the loss function at every time position, it is possible to map the distance between the forecasts and the actual events.
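The error series e(t+h) can be tracked with a few lines of Python; the squared-error cost used here is just one possible (assumed) loss choice:

```python
def forecast_errors(actuals, forecasts):
    """Forecast errors e(t+h) = Y(t+h) - f(t, h) at each time step."""
    return [y - f for y, f in zip(actuals, forecasts)]

def squared_loss(errors):
    """One common loss choice: the squared error at each step."""
    return [e * e for e in errors]

actuals   = [1, 0, 1, 1]          # whether the event took place at each horizon
forecasts = [0.8, 0.3, 0.4, 0.9]  # probabilities issued at time t
errors = forecast_errors(actuals, forecasts)
print(errors)              # roughly [0.2, -0.3, 0.6, 0.1]
print(squared_loss(errors))
```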


5.2.7 Root Mean Square Error (RMSE)

Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far from the regression line data points are; RMSE is a measure of how spread out these residuals are. In other words, it tells you how concentrated the data is around the line of best fit. Root mean square error is commonly used in climatology, forecasting, and regression analysis to verify experimental results. (Barnston, 1992) The formula is:

RMSE = sqrt( mean( (f - o)^2 ) )

where:
• f = forecasts (expected values or unknown results);
• o = observed values (known results).

The same formula can be written with the following, slightly different, notation (Barnston, 1992):

RMSE_fo = [ (1/N) Σ_{i=1..N} (f_i - o_i)^2 ]^(1/2)

One can use whichever formula one is most comfortable with, as they perform the same operation. Alternatively, one can find the RMSE by:

1. squaring the residuals;
2. finding the average of the squared residuals;
3. taking the square root of the result.

That said, this can be a lot of calculation, depending on how large the data set is. A shortcut to finding the root mean square error is:

RMSE = sqrt(1 - r^2) * SD_y

where r is the correlation coefficient between forecasts and observations and SD_y is the standard deviation of Y. When standardized observations and forecasts are used as RMSE inputs, there is a direct relationship with the correlation coefficient. For example, if the correlation coefficient is 1, the RMSE will be 0, because all of the points lie on the regression line (and therefore there are no errors).
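The three-step recipe above translates directly into code; this sketch (our own, with example values) computes the RMSE from raw forecasts and observations:

```python
import math

def rmse(forecasts, observations):
    """RMSE: square the residuals, average them, take the square root."""
    residuals = [f - o for f, o in zip(forecasts, observations)]
    mean_sq = sum(r * r for r in residuals) / len(residuals)
    return math.sqrt(mean_sq)

print(rmse([2.5, 0.0, 2.1, 7.8], [3.0, -0.5, 2.0, 7.5]))  # ~0.39
```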

5.2.8 Measures based on percentage errors (PEs)

PEs can be aggregated across periods and across series, but PE-based measures have the following general limitations (Davydenko, 2010):


• Observations with zero actual values cannot be processed (see the sketch below);
• Dividing by low actuals results in extreme percentage values that do not allow for a useful interpretation (since such errors are not necessarily harmful or damaging);
• The evaluation of intermittent demand forecasts therefore becomes intractable, due to a large proportion of zero and close-to-zero actual values;
• All PE-based measures can be misleading when the improvement in accuracy correlates with the actual value on the original scale.
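The sketch below (illustrative only; values are assumptions) shows the first two limitations: a zero actual cannot be processed at all, and a near-zero actual produces an extreme, uninformative percentage error.

```python
def percentage_errors(actuals, forecasts):
    """Percentage errors 100 * (actual - forecast) / actual, where defined."""
    pes = []
    for a, f in zip(actuals, forecasts):
        if a == 0:
            pes.append(None)  # observation with zero actual: cannot be processed
        else:
            pes.append(100.0 * (a - f) / a)
    return pes

print(percentage_errors([100, 0, 0.5], [90, 5, 3]))
# [10.0, None, -500.0] -> the near-zero actual yields an extreme value
```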

5.2.9 Binary Logistic Models

In this type of model, you estimate the probability of an event occurring. It is used to estimate the probability of a binary response based on one or more predictor (or independent) variables (features). It allows one to say that the presence of a risk factor increases the odds of a given outcome by a specific factor. The model itself simply models probability of output in terms of input, and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cut-off value and classifying inputs with probability greater than the cut-off as one class, below the cut-off as the other. The coefficients are generally not computed by a closed-form expression, unlike linear least squares. (NCL)
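A minimal sketch of such a model using scikit-learn (the synthetic data, the 0.5 cut-off and the variable names are assumptions; the deliverable does not prescribe a particular implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic example: two predictor variables and a binary outcome.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]     # estimated probability of the event
labels = (probs > 0.5).astype(int)       # optional classification via a cut-off
print(probs[:5], labels[:5])
```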

5.2.10 Critical Success Index (CSI)

Also known as the Threat Score (TS), this can be expressed as follows:

CSI = TS = Hits / (Hits + Misses + False alarms)

In practice, it can be used to answer the following type of question: how well did the forecast "yes" events correspond to the observed "yes" events? CSI ‘measures the fraction of observed and/or forecast events that were correctly predicted. It can be thought of as the accuracy when correct negatives have been removed from consideration. That is, CSI is only concerned with forecasts that are important (i.e. assuming that the correct rejections are not important).’ It is sensitive to hits and penalises both misses and false alarms, but does not distinguish the source of forecast error (WWRP/WGNE).

5.2.11 Equitable Threat Score (ETS)

The ETS, also known as the Gilbert Skill Score, measures the fraction of observed and/or forecast events that were correctly predicted, adjusted for hits associated with random chance (for example, it is easier to correctly forecast rain occurrence in a wet climate than in a dry climate). The ETS is often used in the verification of rainfall in NWP models because its "equitability" allows scores to be compared more fairly across different regimes. It is sensitive to hits and, because it penalises both misses and false alarms in the same way, it does not distinguish the source of forecast error (WWRP/WGNE).

5.2.12 False Alarm Ratio (FAR)

FAR is the number of false alarms divided by the total number of warnings or alarms in a given study or situation:

FAR = False alarms / (Hits + False alarms)

(WWRP/WGNE)
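The three categorical scores just described (CSI, ETS and FAR) all derive from the same 2x2 contingency table of yes/no forecasts against yes/no observations; a minimal sketch with example counts follows (the counts and the helper name are assumptions):

```python
def categorical_scores(hits, misses, false_alarms, correct_negatives):
    """CSI, ETS and FAR from a 2x2 forecast/observation contingency table."""
    total = hits + misses + false_alarms + correct_negatives
    csi = hits / (hits + misses + false_alarms)
    # hits expected by random chance, used to adjust the Equitable Threat Score
    hits_random = (hits + misses) * (hits + false_alarms) / total
    ets = (hits - hits_random) / (hits + misses + false_alarms - hits_random)
    far = false_alarms / (hits + false_alarms)
    return csi, ets, far

print(categorical_scores(hits=82, misses=23, false_alarms=38, correct_negatives=222))
```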


5.2.13 Forecast Skill Score (SS)

A statistical evaluation of the accuracy of forecasts or the effectiveness of detection techniques. Several simple formulations are commonly used in meteorology. The skill score (SS) is useful for evaluating predictions of temperatures, pressures, or the numerical values of other parameters. It compares a forecaster's root-mean-squared or mean-absolute prediction errors (Ef), over a period of time, with those of a reference technique (Erefr), such as forecasts based entirely on climatology or persistence, which involve no analysis of synoptic weather conditions:

SS = 1 - (Ef / Erefr)

If SS > 0, the forecaster or technique is deemed to possess some skill compared to the reference technique. For binary, yes/no kinds of forecasts or detection techniques, the probability of detection (POD), false alarm rate (FAR), and critical success index (CSI) may be useful evaluators. For example, if A is the number of forecasts that rain would occur when it subsequently did occur (forecast = yes, observation = yes), B is the number of forecasts of no rain when rain occurred (no, yes), and C is the number of forecasts of rain when rain did not occur (yes, no), then:

POD = A / (A + B)
FAR = C / (A + C)
CSI = A / (A + B + C)

For perfect forecasting or detection, POD = CSI = 1.0 and FAR = 0.0. POD and FAR scores should be presented as a pair. (AMS, 2018)
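The evaluators defined in this section can be sketched as follows (A, B, C and the error values are example assumptions):

```python
def pod_far_csi(a, b, c):
    """Yes/no evaluators with A = hits, B = misses, C = false alarms."""
    pod = a / (a + b)   # probability of detection
    far = c / (a + c)   # false alarm rate, as defined in this section
    csi = a / (a + b + c)
    return pod, far, csi

def skill_score(forecast_error, reference_error):
    """SS = 1 - Ef / Erefr; positive values indicate skill over the reference."""
    return 1.0 - forecast_error / reference_error

print(pod_far_csi(a=40, b=10, c=20))                         # (0.8, 0.333..., 0.571...)
print(skill_score(forecast_error=1.8, reference_error=2.4))  # 0.25 -> some skill
```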

5.2.14 Geometric Mean of the RAE (GMRAE)

RAE is the Relative Absolute Error. It can be averaged by taking a geometric mean to get the GMRAE. The GMRAE is used for calibrating models. (Armstrong)
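A minimal sketch, assuming the usual definition of the RAE as the absolute error of the forecast divided by the absolute error of a reference (e.g. naive) forecast; the benchmark and values below are illustrative assumptions:

```python
import math

def gmrae(actuals, forecasts, reference_forecasts):
    """Geometric mean of the Relative Absolute Errors against a benchmark."""
    raes = [abs(a - f) / abs(a - r)
            for a, f, r in zip(actuals, forecasts, reference_forecasts)]
    return math.exp(sum(math.log(x) for x in raes) / len(raes))

actuals   = [100, 110, 120]
forecasts = [ 98, 112, 116]
naive     = [ 95, 100, 110]   # e.g. previous-period values as the benchmark
print(gmrae(actuals, forecasts, naive))  # < 1 means better than the benchmark
```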

5.2.15 MAE (Mean Absolute Error) / Mean Absolute Deviation (MAD)

The average of the absolute differences between forecasts and observations.

5.2.16 MAPE (Mean Absolute Percentage Error) / MAPD (Mean Absolute Percentage Deviation)

A measure of prediction accuracy of a forecasting method in statistics, for example in trend estimation, usually expressed as a percentage (NOAA). MdAPE and MdRAE are further (median-based) variations that provide simple accuracy metrics over data sets.
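Both measures (and their median variants) are one-liners; this sketch with example series is ours, not from the source:

```python
import statistics

def mae(actuals, forecasts):
    """Mean Absolute Error / Mean Absolute Deviation."""
    return statistics.mean(abs(a - f) for a, f in zip(actuals, forecasts))

def mape(actuals, forecasts):
    """Mean Absolute Percentage Error (undefined for zero actuals)."""
    return 100.0 * statistics.mean(abs((a - f) / a) for a, f in zip(actuals, forecasts))

actuals, forecasts = [100, 120, 80], [110, 115, 90]
print(mae(actuals, forecasts))   # ~8.33
print(mape(actuals, forecasts))  # ~8.9 %
```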


5.2.17 Ranked Probabilistic Skill Score (RPSS)

The RPSS is a widely used measure to describe the quality of categorical probabilistic forecasts (Weigel et al.).

5.2.18 Receiver Operating Characteristics (ROC)

The receiver operating characteristic (ROC) curve is a two-dimensional measure of classification performance. In its simplest form it is a parametric plot of the hit rate (or probability of detection) versus the false alarm rate, as a decision threshold is varied across the full range of a continuous forecast quantity. (Marzban)
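A minimal sketch of how such a curve can be traced by sweeping a decision threshold over a continuous forecast quantity (the scores, outcomes and thresholds below are example assumptions):

```python
def roc_points(scores, observed, thresholds):
    """(false alarm rate, hit rate) pairs as the decision threshold is varied."""
    positives = sum(observed)
    negatives = len(observed) - positives
    points = []
    for t in thresholds:
        hits = sum(1 for s, o in zip(scores, observed) if s >= t and o == 1)
        false_alarms = sum(1 for s, o in zip(scores, observed) if s >= t and o == 0)
        points.append((false_alarms / negatives, hits / positives))
    return points

scores   = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]   # continuous forecast quantity
observed = [1,   1,   0,   1,   0,   0]
print(roc_points(scores, observed, thresholds=[0.2, 0.5, 0.85]))
```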

5.2.19 The ‘Goodness Score’, discernment factor, base algorithm

The goodness score is a metric that allows one to look at how 'good' a forecast is by introducing the element of discernment in a quantitative way. It is based on Brier's scoring rule. Every forecaster’s baseline score starts with the inherent uncertainty baked into the system. Indeed, in the case of a forecaster who guesses the climatic average, both [discernment] and [error] will be 0, and thus the total score is exactly the uncertainty. Forecasters can improve their score (i.e. lower it) by increasing discernment, i.e. removing some amount of that uncertainty. Forecasters worsen their score by being inaccurate, which shows up as a higher [error] metric. In general, a forecaster with increasing discernment will probably introduce a little more error as well. The better forecasters increase discernment by more than the added error, thus decreasing the overall score. In a given context, uncertainty is usually fairly constant, because it is a long-term average quantity. The other two scores vary, and the extremes can be interpreted as follows:

• Low error & low discernment = Useless. You are only accurate because you are just guessing the average.
• High error & low discernment = Failure. You are not segmenting the population, and yet you are still worse than guessing the climatic average.
• Low error & high discernment = Ideal. You are making strong, unique predictions, and you are correct.
• High error & high discernment = Try again. You are making strong predictions, so at least you are trying, but you are not guessing correctly.

We can derive a simple formula from this:

[goodness] = [uncertainty] - [discernment] + [error] (Cohen, 2016)
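One way to obtain the three components in the formula above is the classical decomposition of the Brier score into uncertainty, resolution and reliability; treating "discernment" as resolution and "error" as reliability is our interpretation of the source, not a formula it states, and the binning scheme and example data are assumptions.

```python
from collections import defaultdict

def goodness_components(forecasts, outcomes, n_bins=10):
    """Brier-score decomposition: returns (uncertainty, discernment, error),
    interpreting discernment as resolution and error as reliability."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    bins = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        bins[min(int(f * n_bins), n_bins - 1)].append((f, o))
    reliability = resolution = 0.0
    for items in bins.values():
        n_k = len(items)
        f_k = sum(f for f, _ in items) / n_k   # mean forecast in the bin
        o_k = sum(o for _, o in items) / n_k   # observed frequency in the bin
        reliability += n_k * (f_k - o_k) ** 2
        resolution += n_k * (o_k - base_rate) ** 2
    uncertainty = base_rate * (1.0 - base_rate)
    return uncertainty, resolution / n, reliability / n

unc, disc, err = goodness_components([0.9, 0.8, 0.2, 0.1, 0.7, 0.3], [1, 1, 0, 0, 1, 0])
print(unc - disc + err)   # "goodness": lower is better, as with the Brier score
```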

5.2.20 Unbiased Absolute Percentage Error or UAPE

The absolute error is divided by the average of the forecast and actual values. This has also been referred to as the Unbiased Absolute Percentage Error (UAPE) and as the symmetric MAPE (sMAPE). (Armstrong)
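A minimal sketch of this symmetric variant (the example series is an assumption):

```python
import statistics

def uape(actuals, forecasts):
    """Unbiased / symmetric APE: |error| divided by the mean of forecast and actual."""
    return 100.0 * statistics.mean(
        abs(a - f) / ((a + f) / 2.0) for a, f in zip(actuals, forecasts))

print(uape([100, 120, 80], [110, 115, 90]))  # ~8.5 %
```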


6 Conclusions

• The aim is to achieve the best decision-making model that would allow PYTHIA to identify whether one forecasting technique is better than another by using several accuracy metrics at the same time. The convergence or merging of these accuracy measures is an important feature in establishing a better model

• We therefore see the combination of various accuracy metrics, from informal and formal approaches, as key to PYTHIA

• More testing should be done to see which combinations work most effectively in which situation

• A best-fit ‘Accuracy Appraisal Award Amongst Analysts’ could be introduced. This would allow a group of analysts to discuss which accuracy measures worked well with which forecast systems

• Adoption of any technique should be made simple, and a common understanding of the benefits of these techniques should be made clear. The potential for training on the specifics of the importance and relevance of forecasting accuracy (and how to measure it) is seen as key for overall awareness and process improvement

• There are far more formal than informal methods for measuring the accuracy of forecasts. More research should be done within the project to identify informal methods, and these should be tested against trial data

• The formal methods borrow from diverse scientific fields where large amounts of data are available. A convergence of these techniques as well as the informal techniques into a ‘fit for purpose’ approach for the EDA would be very beneficial

• In order to achieve this, we recommend a dedicated workshop with the EDA to develop the convergence techniques into one overall method (for measuring the accuracy of forecasts)

• We recommend, as the most promising formal techniques: MAPE, Brier Score, Forecast Skill Score, Equitable Threat Score, Critical Success Index and Binary Logistic Model, combined with Objective Performance Feedback as the most promising informal technique

• We recommend specifically, in the first instance, the Brier Score as the formal technique to be used. This is a known method that we can refine - based on a better understanding of the specific working context of the EDA - and combine with the selected informal method


• The Brier Score has been recommended and ‘externally’ validated during the Good Judgement Project (GJP). Our aim would be to develop a correct usage of this metric within the EDA context. The Brier Score would, for example, be useful in determining accuracy over the lifetime of a question (calculating a Brier Score for every day an active forecast was taken, then taking the average of those daily Brier Scores). Also, an evaluation that takes into account the rarity of the event can be obtained using the Brier Score. This is a key aspect of having more reliable accuracy and prediction measurements

• This must be done within an overall framework of Communication > Training > Best Practices > Tools > Objective Performance Feedback > Binary / non-Binary measurements

• The PYTHIA consortium should work with the EDA to develop a ‘forecasting accuracy scorecard’ together with an analyst check-list for measuring accuracy, ensuring the techniques are adaptive

• Whilst it has not been the focus of this research, it is important to mention here that the communication of probability and uncertainty is extremely important. How can we measure accuracy if we do not have a common understanding about what constitutes a prediction or the percentages assigned to a verbal statement? For the PYTHIA project, we will make further recommendations as to how best to communicate uncertainty. Initially, one recommendation is to adopt a similar method to that used by the UK MoD’s ‘Uncertainty Yardstick’ (see Figure 1) and to explore in more detail the implications of language and the associated probability ranges. This is not a method for measuring forecasting accuracy, but part of the overall foresight methodology. The EDA should mandate the use of a standardised lexicon of terms – similar to the Uncertainty Yardstick – expressing probability and uncertainty. This approach assumes familiarity with the basic concepts of probability and uncertainty (i.e. what it means to say something like ‘it is 25% likely that Country X has an active nuclear programme’). It also assumes that the analyst has arrived at a probabilistic judgement using a robust method. This method of communicating uncertainty should form part of the overall technology foresight methodology, and as such is not elaborated here.


Figure 1: The “Uncertainty Yardstick”


7 References

AMS (2018). Glossary of Meteorology: Skill. American Meteorological Society. Retrieved from http://glossary.ametsoc.org/wiki/Skill

Weigel, A. P., et al. (n.d.). The Discrete Brier and Ranked Probability Skill Scores. AMETSOC.

Armstrong, J. S. (n.d.). The Forecasting Dictionary. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.452.4833&rep=rep1&type=pdf

Barnston, A. (1992). Correspondence among the Correlation [root mean square error] and Heidke Verification Measures; Refinement of the Heidke Score. Climate Analysis Center.

Brier, G. W. (1950). Verification of forecasts expressed in terms of probability. Monthly Weather Review.

Cohen, J. (2016, June 28). How to measure the accuracy of forecasts. Retrieved from https://blog.asmartbear.com/forecast.html

Constantinou, A., & Fenton, N. (2012). Solving the Problem of Inadequate Scoring Rules for Assessing Probabilistic Football Forecast Models.

National Research Council. Field Evaluation in the Intelligence and Counterintelligence Context: Workshop Summary (p. 114). Washington, DC: National Academies Press.

Mandel, D. R., & Barnes, A. (2014). Accuracy of forecasts in strategic intelligence. Proceedings of the National Academy of Sciences.

Davydenko, A. (2010). Measuring the Accuracy of Judgmental Adjustments to SKU-level Demand Forecasts. Lancaster: Lancaster University.

Derbentseva, N., et al. (2011). Issues in Intelligence Production: Summary of Interviews with Canadian Intelligence Managers. Toronto: Defence Research and Development Canada.

Elsevier. (n.d.). What is Peer Review? Retrieved from https://www.elsevier.com/reviewers/what-is-peer-review

Choi, E., et al. (2018). Effects of Positive and Negative Feedback Sequence on Work Performance and Emotional Responses. Journal of Organizational Behavior Management, 97-115.

Fischhoff, B., & Chauvin, C. (Eds.) (2011). Intelligence Analysis: Behavioral and Social Scientific Foundations. Washington, DC: National Academies Press.

Gneiting, T., & Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association.

Good, I. J. (1952). Rational decisions. Journal of the Royal Statistical Society.

Horowitz, M. C., & Tetlock, P. E. (2012). Trending upward: How the intelligence community can better see into the future. Foreign Policy. Retrieved from http://www.foreignpolicy.com/articles/2012/09/06/trending_upward

Lee, T. H. (2007). Loss Functions in Time Series Forecasting.

Lehner, P., et al. (2010). Measuring the Forecast Accuracy of Intelligence Products.

Zamo, M., & Naveau, P. (2017). Estimation of the Continuous Ranked Probability Score with Limited Information and Applications to Ensemble Weather Forecasts. Mathematical Geosciences, 209-234.

Marzban, C. (n.d.). The ROC Curve and the Area under It as Performance Measures. AMETSOC.

Merkle, E. C., & Steyvers, M. (2013). Choosing a strictly proper scoring rule.


NCL. (n.d.). Binary logistic regression. Data Analysis. Retrieved from https://www.ncl.ac.uk/itservice/dataanalysis/advancedmodelling/regressionanalysis/binarylogisticregression/

NOAA. (n.d.). Glossary_Verification_Metrics.pdf.

Nordmeyer, B. (2018, January 10). Types of Forecasting Methods. Retrieved May 31, 2018, from https://bizfluent.com/info-8195437-types-forecasting-methods.html

Betts, R. K. (2007). Enemies of Intelligence: Knowledge and Power in American National Security. New York: Columbia University Press.

Roughgarden, T. (2016). Scoring Rules and Peer Prediction.

Kent, S. (1964). Words of estimative probability. Retrieved from https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/books-and-monographs/sherman-kent-and-the-board-of-national-estimates-collected-essays/6words.html

Tetlock, P. E. (2016). Superforecasting: The Art and Science of Prediction.

Weiss, C. (2008). Communicating uncertainty in intelligence and other professions. International Journal of Intelligence and CounterIntelligence.

WWRP/WGNE Joint Working Group on Forecast Verification Research. (n.d.). Retrieved from http://www.cawcr.gov.au/projects/verification/verif_web_page.html
