KOMUSO

Information for the Big Data society in official statistics

Peter Stoltze, Statistics Denmark

What is KOMUSO?

 KOMUSO is an imprecise acronym for the ESSnet entitled Quality of multisource statistics.  It is also…

2 What is KOMUSO?

 … a group of Japanese mendicant monks of the Fuke school of Zen Buddhism who flourished during the Edo period of 1600-1868. Komuso were characterized by a straw bascinet (a sedge or reed hood named a tengai or tengui) worn on the head, manifesting the absence of specific ego. They were also known for playing solo pieces on the shakuhachi (a type of Japanese bamboo flute). These pieces, called honkyoku, were played during a meditative practice called suizen, for alms, as a method of attaining enlightenment, and as a healing modality.

3 KOMUSO  ESS.VIP.ADMIN

4 Overall project objective

 This project has a dual purpose:

. To reap the benefits of using administrative data sources for the production of official statistics;

. To guarantee the quality of the output produced using administrative sources, in particular the comparability of the statistics required for European purposes

5 Organization and project period

 Project participants are the NSIs of the Netherlands, Italy, Norway, Hungary, Austria, Lithuania, Ireland, and Denmark (coordinator)

 The first project period runs from January 2016 to April 2017

6 WP1: Checklists for evaluating the quality of input data

 Task 1: Critical review and testing of existing methodology

 Task 2: Commented repository on CROS portal

 Task 3: Consolidated version of checklist (tested with at least 3 different sources)

 Task 4: Identification of possible gaps (recommendations for further work)

7 WP1: Checklist with 17 indicators

 Accuracy (8 indicators)
. Item nonresponse; Misclassification rate; Undercoverage; Overcoverage; Size of revisions (Relative Absolute Revisions); % of units which fail checks; % of units with adjusted variables; % of imputed values

 Timeliness and punctuality (2 indicators)
. Periodicity; Delay to accessing source from end of reference period

 Coherence (2 indicators)
. % of common units across two or more admin sources; % of relevant units in admin data which have to be adjusted to create statistical units

 Comparability (1 indicator)
. Discontinuity in estimate when moving from a survey-based output to an output involving admin data

 Cost and efficiency (2 indicators)
. % of items obtained from admin source and also collected by survey; Cost of using data source

 Use of administrative data (2 indicators)
. % of items obtained exclusively from admin data; % of required variables which are derived using admin data as a proxy
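 As an illustration of how a few of the accuracy indicators listed above could be computed, here is a minimal Python sketch. The function names, the record layout (a list of dicts with `None` marking a missing item) and the exact denominators are assumptions made for this example; they are not the definitions adopted in the WP1 checklist.

```python
def item_nonresponse_rate(records, variable):
    """Share of records in which `variable` is missing (None).

    Assumed definition for illustration only.
    """
    missing = sum(1 for r in records if r.get(variable) is None)
    return missing / len(records)


def coverage_rates(source_ids, target_ids):
    """Return (undercoverage, overcoverage) of a source against a target population.

    Assumed definitions:
      undercoverage = target units absent from the source / target units
      overcoverage  = source units outside the target     / source units
    """
    source, target = set(source_ids), set(target_ids)
    undercoverage = len(target - source) / len(target)
    overcoverage = len(source - target) / len(source)
    return undercoverage, overcoverage


# Toy data, invented for the example.
records = [
    {"id": 1, "income": 100},
    {"id": 2, "income": None},
    {"id": 3, "income": 250},
    {"id": 4, "income": None},
]
nonresponse = item_nonresponse_rate(records, "income")
under, over = coverage_rates(source_ids=[1, 2, 3, 4], target_ids=[2, 3, 4, 5, 6])
print(nonresponse, under, over)
```

 In the toy data, two of four income values are missing, two of five target units are absent from the source, and one of four source units falls outside the target population.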

8 WP2: Methodology for the assessment of the quality of frames for social statistics

 Task 1: Literature review on quality measures and related practices of frames of e.g. person/household and address/dwelling for social statistics

 Task 2: Comparative analysis of frames for social and business statistics

 Task 3: Gap analysis between existing quality measures and related practices relevant to frames for social statistics

 Task 4: Proposal of quality measures and indicators:

. List A contains existing quality measures and related methodology.

. List B contains quality measures that need to be developed, for which a sound methodology is currently lacking.

. Moreover, list B0 will be identified as a subset of list B, consisting of the quality measures that are to be developed within the scope of the current SGA.

 Task 5: Development and test

9 WP2: Frames

 The table quantifies jointly under-coverage, erroneous frame units and domain classification errors, in the (simple) case where the frame units and population elements are in a one-one relationship
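 A joint table of this kind can be sketched in Python as a cross-classification of frame units against population elements. The sketch assumes the one-one case, so that a shared identifier links a frame unit to its population element. The dict layout, the labels "missing from frame" and "not in population", and the toy data are hypothetical and only serve the illustration.

```python
from collections import Counter


def frame_population_table(frame, population):
    """Cross-classify frame domain (row) by true population domain (column).

    frame:      dict unit_id -> domain recorded in the frame
    population: dict unit_id -> true domain in the target population
    Units absent from one side get a dedicated row or column label.
    """
    table = Counter()
    for unit in frame.keys() | population.keys():
        row = frame.get(unit, "missing from frame")       # under-coverage row
        col = population.get(unit, "not in population")   # erroneous-unit column
        table[(row, col)] += 1
    return table


# Toy data, invented for the example.
frame = {1: "A", 2: "A", 3: "B", 4: "B"}
population = {1: "A", 2: "B", 3: "B", 5: "A"}

table = frame_population_table(frame, population)
for cell, count in sorted(table.items()):
    print(cell, count)
```

 Diagonal cells hold correctly classified units, off-diagonal domain cells hold domain classification errors, the "not in population" column holds erroneous frame units, and the "missing from frame" row holds under-coverage.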

10 WP3: Framework for the quality evaluation of statistical output based on multiple sources

 Task 1: Critical review of existing quality measures and approaches to evaluate and compare the quality of the output based on several sources

 Task 2: Tests of the suitability of existing and proposed quality measures and approaches in several domains

 Task 3: Development of an action plan detailing the work that needs to be done in order to develop a theoretical framework for measuring output quality

11 WP3: Quality of output…

 Two examples of basic data configurations: Combining overlapping microdata sources without coverage problems (left) and combining overlapping microdata sources with undercoverage (right)
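 The two configurations can be told apart with simple set operations on unit identifiers. The following Python sketch is an illustration under assumed inputs (two sources and a known target population given as collections of ids); the function name and the toy data are hypothetical, and in practice the target population is usually not fully known.

```python
def combine_sources(source_a, source_b, population):
    """Describe how two overlapping microdata sources cover a target population."""
    a, b, pop = set(source_a), set(source_b), set(population)
    return {
        "overlap": a & b,               # units present in both sources
        "only_a": a - b,                # units present only in source A
        "only_b": b - a,                # units present only in source B
        "undercoverage": pop - (a | b), # population units in neither source
    }


population = {1, 2, 3, 4, 5, 6}

# Left configuration: the sources overlap and together cover the population.
left = combine_sources({1, 2, 3, 4}, {3, 4, 5, 6}, population)

# Right configuration: the sources overlap but miss part of the population.
right = combine_sources({1, 2, 3}, {3, 4}, population)

print(left["undercoverage"], right["undercoverage"])
```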

12 Further outlook

 The first specific grant agreement (SGA) ends in April 2017, so planning of the next project period is in full swing

 Second specific grant agreement 2017Q2-2018Q2 . Continuation of methodological studies

 Third specific grant agreement 2018Q3-2019Q2 . Finalization of a Eurostat handbook

13 Thanks for your attention!

 Homepage at CROS portal: https://ec.europa.eu/eurostat/cros/content/essnet-quality-multisource-statistics_en
