Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys
Total Page:16
File Type:pdf, Size:1020Kb
—————————————————————————————————————————— Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys —————————————————————————————————————————— ISTAT Orietta Luzi (Coordinator) CBS Ton De Waal SFSO Beat Hulliger Marco Di Zio Jeroen Pannekoek Daniel Kilchmann Ugo Guarnera Jeffrey Hoogland Antonia Manzari Caren Tempelman ©ISTAT, CBS, SFSO, EUROSTAT ii ©ISTAT, CBS, SFSO, EUROSTAT Foreword Overview Eurostat and the European National Statistical Institutes (NSIs) need to provide detailed high-quality data on all relevant aspects (economic, social, demographic, environmental) of modern societies. Moreover, the quality declaration of the European Statistical System (ESS) states that continuously improving a programme of harmonized European statistics is one of the major objectives of the ESS. Given the differences among the existing national statistical systems, the development of new approaches aiming at standardizing the statistical production processes for improving the quality of statistics and for a successful harmonization is essential. In particular, one of the two main purposes of the “European Statistics Code of Practice”, adopted by the Statistical Programme Committee on 24 February 2005, is: “To promote the application of best international statistical principles, methods and practices by all producers of European Statistics in order to enhance their quality”. In this framework, the European Community (represented by the Commission of the European Com- munities) is supporting the development of a set of handbooks on current best methods in order to facilitate the implementation of principle 8 of the Code of Practice - “Appropriate statistical proce- dures, implemented from data collection to data validation, must underpin quality statistics”. This work should capitalize on what has already been done in Member Countries, consolidate or comple- ment existing information and present it in a harmonized way. This action is directly related to the recommendations of the Leadership Expert Group (LEG) on Quality (LEG on Quality, 2001). The LEG on Quality stated that Recommended Practices (RPs) can be considered a highly effective harmonization method which can, at the same time, provide to NSIs the necessary flexibility for using the “best” methods in their respective national context. A Recommended Practice Manual (RPM) is a handbook that describes a collection of proven good methods for performing different statistical operations and their respective attributes. The purpose is to help survey managers to choose among the RPs those that are most suitable for use in their surveys in order to ensure quality. The reason for providing a set of good practices is that it is difficult to define the best methods or standards for statistical methodology at European level. In the area ”Data validation methods and procedures during the statistical data production process”, editing and imputation represents a relevant data processing stage mainly due to its potential strong impact on final data as well as on costs and timeliness: a better quality of processes and produced statistical information can be obtained more efficiently by adopting the most appropriate practices for data editing and imputation. Therefore, developing Recommended Practices for editing and imputation is considered an important task. The RPM presented in this handbook focuses on cross-sectional business surveys. Developing a RPM for editing and imputation in this specific statistical area is considered a priority for a number of reasons: • since important economic decisions are taken on the basis of information collected by NSIs, if data with errors or with missing values are not appropriately dealt with, then the quality of those decisions can be jeopardized. For this reason, the best practices and methodologies for the detection of errors and the imputation of missing and erroneous information for the different iii iv types of data in the Community ought to be used in the ESS; • particularly in business surveys, there is a large heterogeneity in practices and methods used for editing and imputation not only at NSIs level, but also at European level. For this reason, comparisons of data from different statistical institutions in the ESS may be difficult; • particularly in business surveys, editing and imputation is recognized as one of the most time and resources-consuming survey processes. Standardizing the editing and imputation process is expected to generate less expensive statistics and to shorten the time from data collection to publication of results while maintaining the quality of results; • surveys conducted by NSIs, and in particular business surveys, are generally characterized by a lack of documentation concerning editing and imputation strategies, methods and practices. In this area, there is a strong need for performing systematic studies and for developing common practical and methodological frameworks. The present RPM was developed in the framework of the European Project ”Recommended Practices for Editing and Imputation in Cross-Sectional Business Surveys (EDIMBUS)”, carried out in 2006/07 with partial financial support of Eurostat. The project was coordinated by the Italian National Sta- tistical Institute (ISTAT) and involved the participation, as partners, of the Centraal Bureau voor de Statistiek Netherlands (CBS), and the Swiss Federal Statistical Office (SFSO). Aim of the Recommended Practices Manual EDIMBUS Developing and disseminating the RPM in the area of business statistics is expected to contribute to the development of a common practical and methodological framework for editing and imputation applications. The RPM represents a guide for survey managers for the design, the implementation, the testing and the documentation of editing and imputation activities in cross-sectional business surveys at both European and NSIs levels. At the European level, the handbook will facilitate the harmonization of statistical data production, thus contributing to the improvement of the comparability of the data produced in the ESS. At single NSIs level, the RPM aims to reduce the heterogeneity of methods and practices for editing and imputation and to improve and standardize the procedures adopted for data editing and im- putation. At individual survey level, the handbook can represent a support for survey managers in the adoption of the appropriate methodology in each specific context. Furthermore, the handbook is meant to disseminate a systematic approach for testing new editing and imputation processes or editing and imputation processes already in use, but that have not been appropriately tested before. Researchers and survey managers can find in the handbook a guide for planning, testing, evaluating and documenting an editing and imputation process. The RPM can also be used as a checklist for documenting and evaluating the effects on the final results of the performed editing and imputation activities. In general, the RPM is expected to contribute to the improvement of the quality and cost-effectiveness of the editing and imputation process, and the timeliness of the publication of the results. It has to be underlined that the reader of the RPM is assumed to have some basic theoretical and practical knowledge on editing and imputation in cross-sectional business surveys. In effect, the good practices and recommendations given in the manual will not all apply in the same way to all survey contexts: their relevance and actual applicability in each specific context are to be carefully evaluated by subject matter experts and designers of the editing and imputation process, taking into account the survey objectives, organization and constraints. ©ISTAT, CBS, SFSO, EUROSTAT v Though the focus of this RPM is on cross-sectional business surveys, many of its elements can be extended to longitudinal business surveys and possibly to some cross-sectional and longitudinal household surveys. Organization of the volume The structure of the handbook follows a general prototype process of editing and imputation in cross-sectional business surveys. Chapter 1 starts with the description of the general framework, with the embedding of the E&I process in the whole survey production process. Quality management aspects are also discussed. Chapter 2 is dedicated to the discussion of how to design and control an editing and imputation process in the area of business statistics. It illustrates the basic concepts, the overall criteria and the key elements to be taken into account when performing these activities. Furthermore, it also describes methods and approaches for testing and tuning editing and imputation methods: these activities allow survey managers to evaluate and monitor their own editing and imputation processes for their continuous improvement. Chapter 3 contains the description of the main types of errors which usually corrupt economic survey data. For each error type, the error detection methods which can be used to identify it are described. The methods which can be used to treat the errors once they have been identified are described in Chapter 4. Chapters 3 and 4 follow a similar structure. First of all, an overview of each method is provided. Then, for each method, the advantages and limitations relating to its use, as well as the most appropriate application context(s) are pointed out. Finally, recommendations are provided on the use of the method. Recommendations are provided in the form of a checklist on how to efficiently use the specific method. In