UNITED NATIONS STATISTICAL COMMISSION and ECONOMIC COMMISSION FOR EUROPE

CONFERENCE OF EUROPEAN STATISTICIANS STATISTICAL STANDARDS AND STUDIES - No. 44

STATISTICAL EDITING

Volume No. 1

METHODS AND TECHNIQUES

UNITED NATIONS New York and Geneva, 1994

CONTENTS

PREFACE

A. REVIEW OF STATISTICAL DATA EDITING METHODS AND TECHNIQUES

- An Introduction to the Data Editing Process (Dania Ferguson, United States Department of Agriculture, National Agricultural Statistics Service)

- A Review of the State of the Art in Automated Data Editing and Imputation (Mark Pierzchala, United States Department of Agriculture, National Agricultural Statistics Service)

- On the Need for Generalized Numeric and Imputation Systems (Leopold Granquist, Statistics Sweden)

- Evaluation of Data Editing Procedures: Results of a Simulation Approach (Emiliano Garcia Rubio and Vicente Peirats Cuesta, National Statistical Institute of Spain)

- A Systematic Approach to Automatic Edit and Imputation (I. Fellegi, Statistics Canada, and D. Holt, University of Southampton, United Kingdom)

B. MACRO-EDITING PROCEDURES

- Macro-Editing - A Review of Methods for Rationalizing the Editing of Survey Data (Leopold Granquist, Statistics Sweden)

- Macro-Editing - The Hidiroglou-Berthelot Method (Eiwor Hoglund Davila, Statistics Sweden)

- Macro-Editing - The Aggregate Method (Leopold Granquist, Statistics Sweden)

- Macro-Editing - The Top-Down Method (Leopold Granquist, Statistics Sweden)

C. IMPLEMENTATION OF DATA EDITING PROCEDURES

- Data Editing in a Mixed DBMS Environment (N.W.P. Cox/D.A. Croot, Statistics Canada)

- Blaise - A New Approach to Computer Assisted Survey Processing (D. Denteneer, J.G. Bethlehem, A.J. Hundepool & M.S. Schuerhoff, Central Bureau of Statistics of the Netherlands)

- SAS Usage in Data Editing (Dania Ferguson, United States Department of Agriculture, National Agricultural Statistics Service)

- A Macro-Editing Application Developed in PC-SAS (Klas Lindstrom, Statistics Sweden)

PREFACE

Data editing methods and techniques may significantly influence the quality of statistical data as well as the cost efficiency of statistical production. The aim of this publication is to assist National Statistical Offices in their efforts to improve and economize their data editing processes.

Different methods and techniques can be used in the various stages of the data editing process; that is in survey management, data capture, data review and data adjustment.

Questions as to what methods and techniques to use for data review and data adjustment, as well as the iterative character of these two steps, are often very sensitive. The publication, therefore, particularly focuses on these issues.

Part A -- Review of Statistical Data Editing Methods and Techniques -- presents a detailed treatment of data editing methods and the considerations pertaining to each. It describes examples and experience gained in statistical practice. To complete the methodological part of the review, the Fellegi-Holt methodology, as one of the most frequently used approaches, is presented here in its entirety.

Part B -- Macro-editing Procedures -- is devoted to macro-editing methods as powerful tools for rationalizing the data review process through error detection at the aggregated level. Significant savings have been achieved in some cases when implementing macro- instead of micro-editing procedures. The most advanced experiences in this respect are reported from Statistics Sweden.

Another important aspect is an appropriate computerization of the data editing process. Part C -- Implementation of Data Editing Procedures -- deals with some of these features. Surprisingly few commercially developed software packages provide efficient facilities for statistical data editing. Taking into account a mixed technological environment, only SAS was reported. Tailor-made software systems prevail. Some experiences with SAS applications and individually prepared systems such as GEIS, DC2 (both prepared at Statistics Canada) and BLAISE (developed in the Netherlands Central Bureau of Statistics) are presented in this part of the publication.

An integral part of the publication is the Bibliography presented in Part D. The intention of the authors in preparing this section was to disseminate knowledge of already published studies and other materials dealing with statistical data editing methods and techniques.

This publication is based on contributions of the group of experts working within the project on Statistical Data Editing in the programme of work of the Conference of European Statisticians. It was compiled and edited by the Statistical Division of the United Nations Economic Commission for Europe.

This material represents an extensive voluntary effort on the part of the authors. The editors express their appreciation and thanks to all authors who contributed to the publication, as well as to all members of the UN Working Group on Statistical Data Editing whose joint efforts contributed significantly to the preparation of this publication.

AN INTRODUCTION TO THE DATA EDITING PROCESS

by Dania P. Ferguson United States Department of Agriculture National Agricultural Statistics Service

Abstract: A primer on the various data editing methodologies, the impact of their usage, available supporting software, and considerations when developing new software.

I. INTRODUCTION

The intention of this paper is to promote a better understanding of the various data editing methods and the impact of their usage as well as the software available to support the use of the various methodologies.

It is hoped that this paper will serve as:

- a brief overview of the most commonly used editing methodologies,
- an aid to organize further reading about data editing systems,
- a general description of the data editing process for use by designers and developers of generalized data editing systems.

II. REVIEW OF THE PROCESS

Data editing is defined as the process involving the review and adjustment of collected survey data. The purpose is to control the quality of the collected data. This process is divided into four (4) major sub-process areas. These areas are:

- Survey Management
- Data Capture
- Data Review
- Data Adjustment

The rest of this section will describe each of the four (4) sub-processes.

1. Survey Management

The survey management functions are: a) completeness checking, and b) quality control including audit trails and the gathering of cost data. These functions are administrative in nature. Completeness checking occurs at both survey and questionnaire levels.

At survey level, completeness checking ensures that all survey data have been collected. It is vitally important to account for all samples because sample counts are used in the data expansion procedures that take place during summary. Therefore, changes in the sample count impact the expansion. A minimal completeness check compares the sample count to the questionnaire count to ensure that all samples are accounted for, even if no data were collected. In the case of a census, the number of returned questionnaires is compared to the number of distributed questionnaires or to an estimated number of questionnaires expected to be returned.
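As a minimal illustration of such a check (the identifiers and record layout below are hypothetical), the sample list can simply be compared with the set of returned questionnaires:

    # Minimal sketch of a survey-level completeness check (hypothetical identifiers).
    # Every sampled unit must be accounted for, even if no data were collected.

    def completeness_check(sample_ids, questionnaires):
        """Return sampled units with no questionnaire and questionnaires not in the sample."""
        returned_ids = {q["sample_id"] for q in questionnaires}
        missing = sorted(set(sample_ids) - returned_ids)    # sampled but unaccounted for
        unexpected = sorted(returned_ids - set(sample_ids)) # returned but never sampled
        return missing, unexpected

    sample_ids = ["A001", "A002", "A003", "A004"]
    questionnaires = [{"sample_id": "A001"}, {"sample_id": "A003"}, {"sample_id": "A999"}]
    missing, unexpected = completeness_check(sample_ids, questionnaires)
    print("missing:", missing)        # ['A002', 'A004']
    print("unexpected:", unexpected)  # ['A999']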

Questionnaire level completeness checking ensures that routing instructions have been followed. Questionnaires should be coded to specify whether the respondent was inaccessible or has refused; this information can be used in verification procedures.

Survey management includes quality control of the process and measures of the impact on the data of the data adjustment that occurs in sub-process No. 4 below. Survey management is a step in the quality control process that assures that the underlying statistical assumptions of a survey are not violated. "Long after methods of calculation are forgotten, the meaning of the principal statistical measures and the assumptions which condition their use should be maintained" (Neiswanger, 1947).

Survey management functions are not data editing functions per se, but many of the functions require accounting and auditing information to be captured during the editing process. Thus, survey management must be integrated into the design of data editing systems.

2. Data Capture

Data capture is the conversion of data to electronic media. The data may be key entered in either a heads down or heads up mode.

a. Heads down data entry refers to data entry with no error detection occurring at the time of entry. High-speed data-entry personnel are used to key data in a "heads down" mode. Data entered in a heads down mode is often verified by re-keying the questionnaire and comparing the two keyed copies of the same questionnaire.

b. Heads up data entry refers to data entry with a review at time of entry. Heads up data entry requires subject matter knowledge by the individuals entering the data. Data entry is slower, but data review/adjustment is reduced since simple inconsistencies in responses are found earlier in the survey process. This mode is especially effective when the interviewer or respondent enters data during the interview. This is known as Computer Assisted Interviewing, which is explained in more detail below.

Data may be captured by many automated methods without traditional key entry. As technology advances, many more tools will become available for data capture. One popular tool is the touch-tone telephone key-pad with synthesized voice computer-administered interview. Optical Character Readers (OCR) may be used to scan questionnaires into electronic form.

The use of electronic calipers and other analog measuring devices for Agricultural and Industrial surveys is becoming more commonplace.

The choice of data-entry mode and data adjustment method have the greatest impact on the type of personnel that will be required and on their training.

3. Data Review

Data review consists of both error detection and the decision as to whether a detected error should be corrected.

a. Manual data review may occur prior to data entry. The data may be reviewed and prepared/corrected prior to key-entry. This procedure is more typically followed when heads-down data entry is used.

b. Automated data review may occur in a batch or interactive fashion. It is important to note that data entered in a heads-down fashion may later be corrected in either a batch or an interactive data review process.

- Batch data review occurs after data entry and consists of a review of many questionnaires in one batch. It generally results in a file of error messages. This file may be printed for use in preparing corrections. The data records may be split into two files, one containing the 'good' records and one containing the records with errors. The latter file may be corrected using an interactive process.

- Interactive data review involves immediate review of the questionnaire after adjustments are made. The results of the review are shown on a video display terminal and the data editor is prompted to adjust the data or override the error flag. This process continues until the questionnaire is considered acceptable by the automated review process. Then the results of the next questionnaire's review by the automated review processor are presented. A desirable feature of interactive data editing software is to present only those questionnaires requiring adjustment.

Computer-Assisted Interviewing (CAI) combines interactive data review with interactive data editing while the respondent is an available source for data adjustment. An added benefit is that data capture (key-entry) occurs at interview time. This method may be used during telephone interviewing and with portable data-entry devices for on-site data collection.

CAI assists the interviewer in the wording of questions and tailors succeeding questions based on previous responses. It is a tool to speed the interview and assist less experienced interviewers. CAI has mainly been used in Computer-Assisted Telephone Interviews (CATI), but as technological advances are made in the miniaturization of personal computers, more applications will be found in Computer Assisted Personal Interviewing (CAPI).

c. Data review (error detection) may occur at many levels (a short code sketch following this list illustrates some of these checks).

- Item level - Validations at this level are generally named "range checking", since items are validated against a range. Example: age must be > 0 and < 120. In more complex range checks the range may vary by strata or some other identifier. Example: if strata = "large farm operation" then acres must be greater than 500.

- Questionnaire level - This level involves across-item checking within a questionnaire. Example 1: If married = 'yes' then age must be greater than 14. Example 2: Sum of field acres must equal total acres in farm.

- Hierarchical - This level involves checking items in related sub-questionnaires. Data relationships of this type are known as "hierarchical data" and include situations such as questions about an individual within a household. In this example, the common household information is on one questionnaire and each individual's information is on a separate questionnaire. Checks are made to ensure that the sum of the individuals' data for an item does not exceed the total reported for the household.

d. Across-questionnaire level edits involve calculating valid ranges for each item from the survey data distributions or from historic data for use in outlier detection. Data analysis routines that are usually run at summary time may easily be incorporated into data review at this level. In this way, summary-level errors are detected early enough to be corrected during the usual error correction procedures. The across-questionnaire checks should identify the specific questionnaire that contains the questionable data. Across-questionnaire level edits are generally grouped into two types: statistical edits and macro edits.

- Statistical Edits use the distributions of the data to detect possible errors. These procedures use current data from many or all questionnaires or historic data of the statistical unit to generate feasible limits for the current survey data. Outliers may be identified in reference to the feasible limits. Research has begun in the more complicated process of identifying inliers (Mazur, 1990). Inliers are data falling within feasible limits, but identified as suspect due to a lack of change over time. A measurable degree of change is assumed in random variables. If the value is too consistent then it might have simply been carried forward from a prior questionnaire rather than newly reported. The test therefore consists of a comparison to the double root residual of a sample unit over time. If the test fails then the change is not sufficiently random and the questionnaire should be investigated. At USDA-NASS this test is applied to slaughter weight data. The assumption is that the head count of slaughtered hogs may not vary by much from week to week, but the total weight of all slaughtered hogs is a random variable and should show a measurable degree of change each week.

- Macro Edits are a review of the data at an aggregate level. Inconsistencies are traced to the individual records involved. Much of the current work in this area is being carried out by Leopold Granquist (1991) of Statistics Sweden. His work is based on the belief that it is very desirable to determine the impact of serious errors on the summary in order to avoid making adjustments that will not be of consequence at summary level.
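As a minimal sketch of the item-level and questionnaire-level checks listed above (the field names, limits, and record layout are hypothetical, drawn only from the examples in the text), each edit can be expressed as a simple predicate over one record:

    # Minimal sketch of item-level and questionnaire-level edit checks
    # (hypothetical field names and limits, following the examples above).

    def review_record(rec):
        errors = []
        # Item level: range checks.
        if not (0 < rec["age"] < 120):
            errors.append("age out of range")
        if rec["strata"] == "large farm operation" and rec["acres"] <= 500:
            errors.append("large farm operation must report more than 500 acres")
        # Questionnaire level: across-item checks.
        if rec["married"] == "yes" and rec["age"] <= 14:
            errors.append("married respondent must be older than 14")
        if sum(rec["field_acres"]) != rec["acres"]:
            errors.append("sum of field acres must equal total acres")
        return errors

    record = {"age": 13, "married": "yes", "strata": "large farm operation",
              "acres": 600, "field_acres": [400, 150]}
    for message in review_record(record):
        print(message)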

The data review process should allow for detection of errors of different levels of severity. It should also allow for the decision to be made whether to correct an error.

4. Data Adjustment (Data Editing and Imputation)

Manual data adjustment occurs when a person selects a more reasonable value. It may involve writing down, for key entry, the adjustments to be posted to the survey data file using a batch procedure. "Manual" data adjustments may also take place interactively, as in the process of "heads-up" data entry or interactive data review.

Automated data adjustments occur as a result of computer actions. A desirable option in any system allowing computer actions is to allow for the overriding of those actions at some level. Batch data adjustment results in a file of corrected (edited/imputed) records with accompanying messages to report on the computer actions taken to make the adjustments.

The data may be imputed following a wide range of methodologies, some being much easier to program than others. The simplest involves the completion of calculations within one questionnaire (such as obtaining a missing sum at the bottom of a column).

Automated imputations generally fall into one of five categories.

a. Deterministic - where only one correct value exists, as in the missing sum at the bottom of a column of numbers. A value is thus determined from other values on the same questionnaire.

b. Model based - use of averages, medians, regression equations, etc. to impute a value.

c. Deck - A donor questionnaire is used to supply the missing value.

Hot deck - a donor questionnaire is found from the same survey as the questionnaire with the missing item. The "nearest neighbour" search technique is often used to expedite the search for a donor record. In this search technique, the deck of donor questionnaires comes from the same survey and shows similarities to the receiving record, where similarity is based on other data on the questionnaire that correlate with the data being donated. For example, similar size and location of farm might be used for donation of fuel prices.

Cold deck - same as hot deck except that the data is found in a previously conducted similar survey.

d. Mixed - In most systems there is usually a mixture of categories used in some fixed, ranked fashion for all items (a short code sketch at the end of this section illustrates such a hierarchy). Statistics Canada's GEIS (Generalized Edit and Imputation System), for example, first uses a deterministic approach. If it is not successful, then a hot deck approach is tried. This is followed by a model based approach. If all these approaches fail, then a manual imputation occurs through a human review process.

For more detailed explanations of methods a) through d), read the paper by Giles and Patrick (1986).

e. Expert Systems - Expert systems are only recently being applied to data editing, and much research is beginning in this area. "An expert system is an intelligent computer program that uses knowledge and inference procedures to solve problems that are difficult enough to require significant human expertise for their solution. Every expert system consists of two principal parts: the knowledge base and the inference engine. The knowledge base contains both factual and heuristic knowledge. Factual knowledge is items commonly agreed upon by spokesmen in a particular field. Heuristic knowledge is the less rigorous, more experiential and more judgmental knowledge of performance or what commonly constitutes the rules of "good judgment" or the art of "good guessing" in a field. A widely used representation for the knowledge base is the rule or IF/THEN statement. The IF part lists a set of conditions in some logical combination. Once the IF part of the rule is satisfied, the THEN part can be concluded or problem solving action taken. Expert systems with knowledge represented in rule form are called rule based systems" (Magnas, 1989). The inference engine makes inferences by determining which rules are satisfied by facts, ordering the satisfied rules, and executing the rule with the highest priority.

GEIS from Statistics Canada contains a rule set which is manipulated to generate the knowledge base. The Fellegi and Holt methodology is used as the inference engine algorithm to derive the minimum set of values to be imputed. It provides an interface between the user and the data editing system, allowing experimentation with the data before and during data editing. This data analysis provides information that is used to change rules and task specifications.

"Computer-Assisted Data Editing" is one manifestation of the use of expert systems. This application is found at the U.S. Department of Energy - Energy Information Administration (EIA). EIA has developed a computer-assisted data editing application for the PC based on the LEVEL5 inference engine. It has resulted in quality improvements and increased speed in the data adjustment process with the use of personnel having little or no subject matter knowledge. This is an specially significant accomplishment since energy data requires extensive coding for types of engines and fuels, (Magnas, 1989).

Expert data editing systems make so-called intelligent imputations. In an expert system, the computer mimics human actions. For example, a subject-area expert may specify a hierarchy of methods to be used in imputing an item. One item may use a deterministic approach followed by a hot deck approach, while another item might require a model based approach. Each item on the questionnaire would be resolved according to its own hierarchy of approaches, the next being tried automatically when the method before it has failed. The SPEER (Structured Program for Economic Editing and Referrals) system from the U.S. Bureau of the Census is an application of intelligent imputation.

Each of the above approaches can be carried out in very simple or very sophisticated ways. The more sophisticated approaches tend toward heuristic processes, where the methodology adapts to the data distributions and combinations of items that are encountered.
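As a minimal sketch of such a hierarchy of approaches (the field names, the particular methods, and the edit that an imputed value must pass are all hypothetical illustrations, not a rendering of GEIS or SPEER), each item can carry an ordered list of imputation methods, the next being tried when the previous one fails:

    # Minimal sketch of a per-item hierarchy of imputation methods:
    # deterministic first, then hot deck, then model based (hypothetical data).
    from statistics import median

    def deterministic(rec, donors):
        # Only one value is possible: the missing total is the sum of its parts.
        return sum(rec["field_acres"]) if rec["field_acres"] else None

    def hot_deck(rec, donors):
        # Donate from the "nearest" record of the same survey (similar farm size).
        candidates = [d for d in donors if d.get("total_acres") is not None]
        if not candidates:
            return None
        donor = min(candidates, key=lambda d: abs(d["farm_size_class"] - rec["farm_size_class"]))
        return donor["total_acres"]

    def model_based(rec, donors):
        # Fall back to the median of the reported values.
        values = [d["total_acres"] for d in donors if d.get("total_acres") is not None]
        return median(values) if values else None

    HIERARCHY = {"total_acres": [deterministic, hot_deck, model_based]}

    def impute_item(rec, item, donors):
        for method in HIERARCHY[item]:
            value = method(rec, donors)
            if value is not None and value >= 0:   # the edit an imputed value must pass
                return value, method.__name__
        return None, "manual review"

    recipient = {"total_acres": None, "field_acres": [], "farm_size_class": 3}
    donors = [{"total_acres": 420, "farm_size_class": 2},
              {"total_acres": 900, "farm_size_class": 4}]
    print(impute_item(recipient, "total_acres", donors))   # falls through to the hot deck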

III. CONSIDERATIONS IN IMPLEMENTATION

Data review and adjustment is a cyclical process. The adjustments require further review in order to catch any new inconsistencies introduced by the adjustments.

1. Minimizing the number of iterations of this cyclical process is the subject of much discussion. Of particular note are the ideas of Fellegi & Holt (1976) in Statistics Canada and the advent of interactive data editing.

In the Fellegi & Holt approach, the logic that was used for data review is analyzed by the machine and used to generate machine logic that identifies the minimum set of items on a questionnaire needing to be corrected to resolve an inconsistency. Correction of these items would ensure that no new inconsistencies would be introduced during data adjustment. Thus, reducing the iterations of the data review/adjustment cycle.

Examples of software systems based on the Fellegi & Holt methodology are:

CAN-EDIT from Statistics Canada (implementation of an early version of the Fellegi & Holt methodology)
DIA from INE, Spain (an extension of Fellegi & Holt)
SPEER from the U.S. Bureau of the Census
GEIS from Statistics Canada
AERO from CSO, Hungary
DAISY from INS, Italy.

Interactive data editing reduces the time frame needed to complete the cyclical process of review and adjustment, although it does not necessarily decrease the number of iterations of the cycle. The decrease in time to complete the process is achieved through the elimination of the need to order and file questionnaires between iterations. In order to be prepared for the next edit error listing in a batch system, all questionnaires must be ordered so that they may later be located for adjustment. Ordering is required because the current batch is added to previous batches which may still be unresolved, as new errors were introduced in the process of solving previous ones. In an interactive data editing environment, the questionnaires are presented and resolved as they are keyed or in one adjustment session, thereby eliminating the need to re-order and re-file the questionnaires. Examples of interactive editing systems are:

BLAISE by the Netherlands Central Bureau of Statistics (supports CATI and CAPI)
LXES by the U.S. National Agricultural Statistics Service
IMPS, CONCOR module, by the U.S. Bureau of the Census.

2. Many different technologies are applied in developing data editing systems:

Databases are especially well suited for editing hierarchical data and historic or coded data that require multiple file look-ups.

Examples of systems built on database technology are:

The CAN-EDIT (based on RAPID) and GEIS (Oracle-based) systems developed by Statistics Canada.

The GODAR (based on BOS and RAPID) system from Republican Statistical Office of Serbia (reported in 1991).

Statistical packages are well suited environments in which to build systems that use statistical and macro-editing methods. This also easily accommodates edits on continuous data.

Examples of these systems are:

The Survey Processing System (SPS) from the U.S. Department of Agriculture, NASS
CASCADA from INE, Spain.

Micro-computer environments are well suited to interactive data editing. Computer assisted telephone and personal interviewing are further applications of interactive data adjustment where the respondent is the source for the "manual" adjustments.

Examples of micro-computer based editing systems are:

CONCOR by the U.S. Bureau of the Census (A portion of IMPS)

BLAISE by the Netherlands, Central Bureau of Statistics.

The BLAISE system integrates the data entry, data editing and computer assisted interviewing functions in a micro-computer environment. This results in major savings of the time spent in re-specifying information common to these processes.

IV. CONCLUDING REMARKS

Many options are available at each stage of the data editing process. Once the desired options are identified, a decision must be made whether to use existing software. When choosing appropriate software, the List of Evaluation Criteria for Software on Data Editing (1991) prepared within the ECE/UNDP Statistical Computing Project - Phase 2 could be of valuable assistance.

Final thoughts for those designing new editing systems: The use of a machine should be to expedite and assist the human procedures. Use of human resources and impact on office work flows should be the primary consideration. The human effort required to develop and maintain data review/correction logic must be considered.

The integration of questionnaire design, computer assisted interviewing, data entry, analysis, and summarization into the edit process greatly enhances Survey Management and reduces redundancy of the data definition portions of the program code.

A REVIEW OF THE STATE OF THE ART IN AUTOMATED DATA EDITING AND IMPUTATION

by Mark Pierzchala United States Department of Agriculture National Agricultural Statistics Service

Abstract: This paper explores some general approaches to the automation of data editing and imputation and summarizes the progress made up until September 1988 in each of the approaches. The state of the art in four institutions, Statistics Canada, U.S. Bureau of the Census, Netherlands Central Bureau of Statistics, and National Agricultural Statistics Service (NASS) is reviewed.

I. INTRODUCTION

1. Background

The editing process in NASS is cyclic, batch oriented, and involves professional review of computer edit printouts. The process involves hand coding and correction of items in an initial edit and entry process followed by redundant hand correction and data entry of fields that violate edits. The system requires a certain amount of paper shuffling as questionnaires are filed, pulled, and refiled as many times as the corresponding records fail the edits. In some surveys, such as the June Agricultural Survey, imputation is done by the commodity specialist not only for item nonresponse but also for total nonresponse, and in either case with unknown effects on the distributions of the data. In data editing, the emphasis is on within-record (within-questionnaire) checks, leaving between-record analysis to post-edit analysis packages. Relatively few corrections are carried out by the computer; almost all must be made by specialists, again with unknown effects on univariate and multivariate distributions. Edit failures are listed on printouts with no indication as to which of the fields is most likely to need correction. The effect of the edits, taken either individually or in combination, on the quality of the data is unknown. From a human perspective, the process can be tedious as the individual must work through hundreds of pages of computer printouts, not knowing the success or failure of corrections and imputations until the next editing cycle. The desire to eliminate or reduce these problems and also to broaden the agency's perspective on the editing process is the impetus for this study.

2. The study

The comparative study concerns four survey institutions, NASS, Statistics Canada, the Netherlands Central Bureau of Statistics, and the U.S. Bureau of the Census. These institutions operate in different environments and thus have taken different approaches to reducing editing problems. The environment of each organization includes the survey environment and a historical environment. The former includes the types of surveys conducted, the tightness of survey deadlines, and the ways in which the populations are multivariately distributed. The latter concerns the history of how things have been done in the organization in the past.

II. TERMS OF REFERENCE

As this is a comparative study of the implementation of generalized or multipurpose editing systems involving four organizations in three different countries, terms of reference are needed.

1. Definition of the term editing

The definition of the term editing varies. For example, editing may be considered either as a validating procedure or as a statistical procedure. Both procedures aim to reduce errors in data sets but each has its strengths and weaknesses. Additionally, editing can be done at the record level or at some level of aggregation of individual records.

1.1 Editing as a validating procedure

As a validating procedure, editing is a within-record action with the emphasis on detecting inconsistencies, impossibilities, and suspicious situations and correcting them. Examples of validation include: checking to see if the sum of parts adds up to the total, checking that the number of harvested acres is less than or equal to that of planted acres, and checking if a ratio falls within certain bounds as set by a subject matter specialist based on expert knowledge in the field. Validation procedures may also be thought of as being established without reference to collected data.

1.2 Statistical edit

As a statistical procedure, checks are based on a statistical analysis of respondent data (Greenberg and Surdi, 1984). A statistical edit usually follows validation in an editing system. It may refer to a between-record checking of current survey data or to a time series procedure using historical data of one firm. As a between-record check, the emphasis is on detecting outliers of either univariate or multivariate distributions. One manifestation of between-record checking would be edit limits generated from distributions of a subset of current records. The most usual subset would be the first n records that enter the system. The error limits then would be applied to all records, the n records being run through the system again.
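A minimal sketch of such a between-record check follows, assuming quartile-based limits computed from the first n records; the limit rule and the data are illustrative assumptions, not any agency's production edit:

    # Minimal sketch of a between-record statistical edit: limits are derived from
    # the distribution of the first n records, then applied to every record
    # (the quartile-based limit rule and the data are illustrative assumptions).
    import statistics

    def feasible_limits(values, k=3.0):
        """Quartile-based limits: [Q1 - k*IQR, Q3 + k*IQR]."""
        q1, _, q3 = statistics.quantiles(values, n=4)
        iqr = q3 - q1
        return q1 - k * iqr, q3 + k * iqr

    records = [102, 98, 110, 95, 104, 101, 99, 97, 103, 100,  # first n records
               108, 96, 940, 105, 102]                        # later records, one suspect

    n = 10
    low, high = feasible_limits(records[:n])
    outliers = [(i, v) for i, v in enumerate(records) if not (low <= v <= high)]
    print(f"limits: [{low:.1f}, {high:.1f}]  outliers: {outliers}")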

As a time series check, the aim is to customize edit limits for each firm, based on that firm's historical data as fitted to time series models. This approach has been used in the Energy Information Administration of the U.S. Department of Energy using spectral analysis (Dinh, 1987). Cathy Mazur of NASS is also investigating the use of a time series approach to edit daily livestock slaughter data. The use of historical time series to check data may be one way in which to detect inliers, that is, data which should greatly deviate from the mean but do not.

1.3 Macro-editing

These are edits which are run on aggregations of data, perhaps at some summary level or in some economic cell. Leopold Granquist of Statistics Sweden is developing some of these ideas (Granquist, 1987). They have also been mentioned in Statistics Canada as a future research topic (Sande, 1987). The U.S. Bureau of the Census does macro-editing though the process is not at a high level of automation. The aim of macro-editing is to edit data to find inconsistencies at the publishing level. It should be possible to trace the inconsistencies at the aggregate level to the individual records involved. Macro-editing focuses on those records in which corrections will have an impact at the particular aggregate level.
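A minimal sketch of the idea, assuming hypothetical cells, thresholds and data: aggregates are compared with the previous period, and a suspicious cell is traced back to the records that contribute most to it:

    # Minimal sketch of macro-editing: compare aggregates with the previous period,
    # then trace a suspicious cell back to the records that drive the difference
    # (cell definitions, thresholds, and data are hypothetical).
    from collections import defaultdict

    current = [{"id": 1, "cell": "dairy", "value": 120},
               {"id": 2, "cell": "dairy", "value": 115},
               {"id": 3, "cell": "dairy", "value": 9800},   # suspect record
               {"id": 4, "cell": "grain", "value": 300}]
    previous_totals = {"dairy": 250, "grain": 310}

    totals = defaultdict(float)
    for rec in current:
        totals[rec["cell"]] += rec["value"]

    for cell, total in totals.items():
        ratio = total / previous_totals[cell]
        if not (0.8 <= ratio <= 1.2):                 # cell changed too much: flag it
            contributors = sorted((r for r in current if r["cell"] == cell),
                                  key=lambda r: r["value"], reverse=True)
            print(f"cell {cell}: total {total} vs {previous_totals[cell]} previously")
            for r in contributors[:3]:                # drill down to the largest records
                print(f"  record {r['id']} contributes {r['value']}")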

2. Priorities of development efforts

Examples of priorities in improving the editing process are: rationalizing the process, streamlining the process, and expanding the system's capabilities in handling expanded survey requirements. Amongst other things, rationalization refers to statistical defensibility, maintaining univariate and multivariate distributions, and the consistent handling of errors and missing data. Streamlining focuses on performing tasks more efficiently with more powerful tools. Expansion of capabilities means larger edit programs, more flexibility at the local level, a larger number of variables, retention of the data as it is first keyed, more kinds of edit functions, and a system which does not preclude the addition of any features.

Organizations with differing priorities will allocate research and development resources differently. For example, rationalization of the edit process requires the development of theory and algorithms, whereas streamlining requires the acquisition of new hardware and new systems development.

3. Manner of making corrections and imputations; the roles of people and machines

The manner in which corrections are made can be a very contentious issue. Implementation of new technology will at least have the effect of modifying the way in which editing tasks are done. This includes tasks performed by the subject-matter specialist (in NASS the agricultural statistician), the clerk, and the data entry operator. In the most extreme manifestation of automation, certain parts of the jobs of these people could be eliminated. The resolution of this issue may be as much a personnel management question as a statistical one. There are several ways in which corrections could be made. Some examples:

- The subject-matter specialist takes action on edit failures in a cyclic process using paper printouts in a batch processing system.

- The subject-matter specialist takes action on edit failures in an interactive computer session.

- The computer takes action on some edit failures without review. The more difficult cases are referred to the subject matter specialist to contact the respondent or otherwise deal with the matter.

- The data entry operator corrects some types of errors at time of data entry leaving other errors for the computer or the specialist to handle.

The size of the survey, the time frame of the survey, and the resources that the survey organization is willing to commit to editing will all determine which mix of computer actions and personal actions is possible (see para. 7 further on). In economic surveys, it is also unlikely that any generalized system will be able to handle all records.

The choices made concerning the roles of people and machines will affect the acquisition of hardware and software. A specialist correcting errors interactively will require some type of terminal or personal computer to do so. If only the specialist is to make corrections, then research should focus on giving the specialist more powerful tools. If the editing function of the computer is to be increased (for whatever reason), or if greater defensibility is desired, then research should be directed towards algorithms and theory.

4. The future role of CATI and CAPI

Computer Assisted Telephone Interviewing (CATI) and Computer Assisted Personal Interviewing (CAPI) are technologies which can perform record validation at the time of data collection. If an organization is collecting data primarily through these new technologies, then it may be redundant to commit large resources to the validation part of an edit program. If, on the other hand, the organization must collect data through the mail, (as must the U.S. Bureau of the Census), or otherwise continue to use paper questionnaires, then further development of the validation programs is probably justified. One consideration is the timing of the implementation of CATI and CAPI. If implementation is 10 years away, then it is more justifiable to develop a new validation system than if it is two years away. Another consideration is how much of the validation program will be transferred to CATI and CAPI. In NASS, CATI data are run through an editing program as not all edits are contained within CATI.

5. Location and dispersion of the work

One of the organizations surveyed, the Netherlands Central Bureau of Statistics, is located in one building and therefore dissemination of new technology is aided by the proximity of resource personnel. The other organizations have many locations. The issue here is whether the same tasks are carried out in many locations and whether different tasks are carried out in different locations. If the former is true, there is a problem of support, training, commitment, and consistency. In this case, the organization may need simple-to-use systems as expertise is shared by telephone or word of mouth. If different tasks are carried out in different places, then there is a problem of coordination between parts of the system. For example, the U.S. Bureau of the Census' data are key-entered in Jeffersonville, Indiana, while editing is carried out in Suitland, Maryland. In this case, the separation of functions enforces a division of labor which might preclude the implementation of some systems. In other words, greater resources may have to be committed to making the interface between many users and the system more understandable. Hardware and training costs may also be higher in organizations with many locations.

6. Specialization of survey processing tasks

Specialization impacts the processing of survey data in that all of the specialized editing tasks, both before and during the survey, must be coordinated in an overall editing system. An example of specialization is one person creating an editing program and another using it to process survey data. In a modern survey organization, survey processing may be divided into tasks performed by tens or hundreds of people. The greater the amount of specialization, the more the system will have to be constructed in modules embedded in an overall system that will coordinate the work of many different people. One effect of automation may be to improve productivity enough to allow fewer people to handle more tasks. For example, survey processing from questionnaire design to writing and testing of edits may be handled by one small group of people.

7. Time frames, sample sizes, and editing resources of surveys

The size of the survey, the time frame of the survey, and the resources that the survey organization is willing to commit to editing will all determine which mix of computer actions and personal actions is possible. For example, a great number of survey schedules to be processed in a limited time may preclude personal action on each record. As another example, an organization with tight deadlines may not be able to let specialists enter data and correct it at the same time, as the speed that comes with specialization is required. On the other hand, an organization with declining staff numbers and tightening deadlines may be forced to adopt heretofore unneeded technologies. It may have to improve the productivity of the editing personnel in their handling of each record, or it may have to allow the computer to handle more of the routine errors without review, referring only difficult cases to the specialist.

8. Statistics to be derived and analyses to be conducted vis-a-vis the effect of editing and imputation on distributions of the data

Different types of imputations have different effects on the marginal and joint distributions of the data. For example, in item nonresponse, one possibility is to impute the average of the item computed from the good records into the incomplete records; another is to donate reported values from good records into the incomplete records (hot-deck imputation). In the latter case, the distribution will not be changed as much. Both methods will give the same estimated average (at least in the limit), but the first method will understate the magnitude of the standard error. This is an issue of whether or not distributions must be maintained. Some statistics are not sensitive to altered distributions, for example averages, totals and proportions (although their standard errors are). Other statistics, such as measures of dispersion or multivariate analyses, are sensitive to altered distributions. If distributions are to be maintained, then it may be better to leave the bulk of the editing, correction and imputation to the computer. That is, some imputation procedures, including hand imputation, may not be suitable for some statistics and analyses.
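A small, purely illustrative simulation on synthetic data shows the effect: both methods recover roughly the same mean, but imputing the average shrinks the spread and hence understates the standard deviation:

    # Illustrative simulation (synthetic data): mean imputation vs. donating observed
    # values. Both recover roughly the same average, but mean imputation understates
    # the spread of the distribution.
    import random
    import statistics

    random.seed(1)
    population = [random.gauss(100, 20) for _ in range(2000)]
    observed = population[:1600]          # respondents
    missing_count = 400                   # nonrespondents to be imputed

    mean_imputed = observed + [statistics.mean(observed)] * missing_count
    hot_deck_imputed = observed + [random.choice(observed) for _ in range(missing_count)]

    for label, data in [("mean imputation", mean_imputed), ("donor imputation", hot_deck_imputed)]:
        print(f"{label:17s} mean={statistics.mean(data):6.1f} stdev={statistics.stdev(data):5.1f}")
    print(f"{'respondents only':17s} mean={statistics.mean(observed):6.1f} stdev={statistics.stdev(observed):5.1f}")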

Any imputation scheme rests on (sometimes implied) assumptions about the distributions of data for nonrespondents compared to those of respondents. These assumptions should be stated and critically examined as to their validity.

9. The degree of variability of the population affects imputation

As observed by Tanks Barr, (1988, personal communication), it may be easier to impute for missing values in a gas station survey than for agricultural items because gas station prices vary much less than items in an agricultural survey.

10. Planned uses of the data and the availability of the data

This point derives from point 8 above. If record level data must be released to other organizations, then the collecting organization is obliged to leave the multivariate distributions as intact as possible, as not all future data uses are known in advance. Once the data are outside the organization, there is no way to tell how they will be utilized, that is, whether multivariate analyses will be carried out or statistics will be generated that require distributions to remain intact. For example, the Economic Research Service of the U.S. Department of Agriculture obtains record level data through NASS from the Farm Costs and Returns Survey (FCRS). As NASS does not know every future use of the data, the editing procedure should maintain the multivariate distributions (NASS does not currently impute in the FCRS). At the least, if many imputations are carried out, they should be flagged and a description of imputation procedures should also be included with the data.

11. Types of surveys

The types of surveys being processed, such as economic, social, or production surveys, will be reflected in the complexity of the editing programs with regard to such items as routing, type of data (categorical or continuous), the degree of inter-relation between the fields as expressed through edits, the reliability of the edits, and the type and complexity of the edits themselves. These attributes affect the degree to which the editing process can be automated.

12. Previous survey experience

Organizations have different experiences regarding the degree of noncooperation, item nonresponse, partial nonresponse, and the frequency of errors in collected data. The relative amounts of resources spent on each survey step will be different. As a result, different organizations will have different perspectives and priorities in the development of new systems. Systems which may be justified in some organizations on the basis of the tradeoff between cost and data quality may not be justified in others. The validity of the editing and imputation procedures as well as their defensibility is also at stake. An organization with high rates of nonresponse may have greater difficulty in establishing a workable and defensible system in which the computer makes the bulk of corrections. For example, it would be harder to implement hot-deck imputation in a survey with 30% nonresponse than in one with 5% nonresponse because donor records may not be available for all recipient records in the former case.

13. Hardware

Computer intensive procedures, or interactive handling of the data, may be too expensive on leased mainframes if the organization is charged according to resources used. This may result in greater reliance on hand processing or a situation in which some features are not even considered due to the cost. On the other hand, microcomputers may not have the capacity to handle some editing functions or they may not have access to historical information.

14. Software environment and programming support

The term software environment refers to whether or not the editing system will reside in a database environment. If editing is to be carried out in a database environment, the question is whether the database will be shared out between locations or be centralized. In the latter case, at least part of the edit will have to be carried out on the computer carrying the database. Programming support refers to the amount of support available to customize editing programs for each survey, to modify a generalized program for each survey, or to support a program in different environments (editing on microcomputers as opposed to a mainframe, for example), as well as maintaining existing programs.

15. Purposes and costs of editing

See Granquist's "On the need for generalized numeric and imputation system" in this publication, and Pullum, Harpham and Ozsever (1986), for good discussions on the purposes of editing systems. These papers address the tradeoffs between improvements in data quality and costs of editing. In the former paper, Granquist estimates that editing takes from 20 to 40 percent of survey budgets in periodic surveys in Statistics Sweden and wonders if the benefits are worth the expenditures. In the latter paper, which discusses the editing of the World Fertility Survey, it is reported that estimates derived from raw tapes in 6 countries were essentially the same as those derived from edited data tapes. In other words, the machine editing had no appreciable effect on the analysis other than delaying the production of statistics by one year. The authors of these two papers do not question the basic necessity of editing, but consider that some editing resources could be allocated to other areas to improve data quality or that editing could be done in better ways.

Pullum et al. cite 5 reasons why the World Fertility Survey implemented stringent editing policies. They are cited here as general beliefs as to why editing is done.

- To produce a gain in the yield of the fieldwork, that is, to minimize the number of responses excluded from analysis.

- To improve the validity of the findings, that is, to remove systematic errors that may lead to bias.

- To improve the correspondence between the structure of the questionnaire and that of the responses, the net effect being the easing of further tabulation and analysis.

- Users have more confidence in data which are internally consistent because such consistency reflects on the entire process of data collection and preparation.

- The perception that editing is a hallmark of professional survey research.

In his review of the Pullum et al. World Fertility Survey paper, Granquist maintains that only reasons 3 and 4 really benefit from editing as it was carried out there, that is, through a generalized edit system. Granquist (1984a) describes the following purposes of editing:

- To give detailed information about the quality of the survey.

- To provide basic data for the improvement of the survey.

- To tidy up the data.

Granquist further believes that Generalized Edit Systems usually apply too many checks, that editing systems do not essentially improve data quality, and that editing systems can give a false impression of data quality.

16. Productivity and costs of editing

Another way to consider the effect of editing costs on how automation is approached is to plot the rapidly declining costs of computing against labor costs that are either constant or climbing. Kinds of automation considered too expensive 5 to 10 years ago (for example, computationally intensive programs or interactive handling of corrections) may be less expensive now, or in the future, than remaining with a labor intensive status quo.

III. THE FELLEGI AND HOLT SCHOOL OF EDIT AUTOMATION AND IMPUTATION

The literature emanating from this school of thought is concerned primarily with the stage of editing known as data validation. This school is characterized by its foundation in set theory, borrows heavily from techniques in Operations Research, Statistics, and Computer Science (Sande, 1979), and is guided by certain principles: that each record satisfy all edits, that correction be accomplished by as few changes as possible, that editing and imputation both be part of the same process, and that any imputation procedure retain the structure of the data. Automation of editing and imputation are required because some of the above desired principles are beyond the ability of the human editors. Automation may not be cheaper than the more labor intensive methods, but the computer can apply all edits quickly and consistently (Fellegi and Holt, 1976). Emphasis is placed on the rationalization and the defensibility of the editing process. Statistics Canada (where Fellegi is Chief Statistician of Canada) and the U.S. Bureau of the Census are implementing this approach.

1. Changing as few fields as possible in correction of data

In their 1976 paper, Fellegi and Holt outline a set theoretic approach which, if applied to categorical data or to linear edits of continuous data, would lead to the identification of a minimal set of fields that need to be corrected in order to clean the record. The corrections, if made according to the editing rules, guarantee that the whole record will pass all edits. This result can be somewhat extended since some nonlinear edits can be rendered into a linear form (e.g. a ratio edit can be rendered into two linear inequalities) (Giles and Patrick, 1986). This approach requires that a complete set of edits be generated from the explicit edits written by the subject-matter specialist. The idea is that there are implied edits which can be generated by logical implication from the explicit edits. For example, if 1 < a/b < 2 and 2 < b/c < 4 are explicit edits, then 2 < a/c < 8 is an implied edit obtained algebraically from the explicit edits. The complete set of edits is the union of the explicit edits and the implicit edits. Once the complete set of edits is determined, a minimal set of fields can be determined for every possible set of edit failures. The determination of a minimal set of fields is called error localization. There are still some cases involving nonlinear edits in which it is generally impossible to find minimal sets because the complete set of implied edits cannot be found. The minimal set does exist, however (Greenberg, personal communication).
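The goal of error localization can be illustrated by brute force on a toy record: try change sets of increasing size and keep the first one whose fields can be given values satisfying every edit. The sketch below uses the two explicit ratio edits above and a coarse grid of candidate values; it is only an illustration of the goal, not the Fellegi-Holt procedure, which reasons from the complete set of explicit and implied edits:

    # Brute-force illustration of error localization: find the smallest set of fields
    # that can be changed so that every edit passes (toy grid search, not Fellegi-Holt).
    from itertools import combinations, product

    EDITS = [lambda r: 1 < r["a"] / r["b"] < 2,     # explicit edit: 1 < a/b < 2
             lambda r: 2 < r["b"] / r["c"] < 4]     # explicit edit: 2 < b/c < 4
    CANDIDATES = range(1, 41)                       # coarse search space for new values

    def passes_all(rec):
        return all(edit(rec) for edit in EDITS)

    def minimal_fields(rec):
        for size in range(len(rec) + 1):            # try smaller change sets first
            for subset in combinations(sorted(rec), size):
                for values in product(CANDIDATES, repeat=size):
                    trial = dict(rec, **dict(zip(subset, values)))
                    if passes_all(trial):
                        return subset, trial
        return None, None

    record = {"a": 10, "b": 2, "c": 40}             # fails both edits
    fields, repaired = minimal_fields(record)
    print("minimal change set:", fields, "->", repaired)
    # No single field can satisfy both edits here, so two fields (b and c) must change.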

2. Editing and imputation as the same process

In the Fellegi and Holt automated editing process, imputation constraints, when taken together, are called a feasible region and are derived from the set of complete edits. Corrections or imputations falling within this feasible region are guaranteed to pass the edits. Fellegi and Holt show that for categorical data or for continuous data under linear edits, either there is a feasible region or some edits are in conflict. In practice, there are some types of nonlinear edits which are not amenable to the determination of a feasible region. In such cases, the imputation can be run through the edits again to ensure that all imputations conform to the edits. In any case, it is a precept of this school of thought that all corrections and imputations pass all edits, although this may not be strictly adhered to in practice.

3. Retaining the structure of the data

One of the major objectives of the Fellegi and Holt school is to retain the structure of the data. This means that univariate and multivariate distributions of survey data reflect as nearly as possible the distributions in the population. Statistics Canada is doing this already by the use of hot-deck imputation. The U.S. Bureau of the Census uses hot-decking for agricultural surveys, for some demographic surveys, and the decennial censuses. Hot-deck imputation seeks to find a record similar to that of the incomplete record on the current set of survey records and to impute the missing variables from the complete record to the incomplete record. This can be done one variable at a time, the aim being to preserve the univariate distributions, or all variables at once, the aim then being to preserve the multivariate distributions. Retaining structure is important if there is to be multivariate analysis, if not all uses of the data are known in advance (e.g., it is not known who will have access to it), or if statistics which depend on the distribution (e.g., quantiles) are to be calculated.

4. Implementation

Implementation of the approach of Fellegi and Holt has proved to be a challenge for nonlinear edits and continuous data. Checking the consistency of explicit edits, the generation of implied edits and the determination of an acceptance region require Operations Research (OR) methods (Sande, 1979). In hot-deck imputation, procedures from OR are needed to minimize the search for donor records. For a minimal set of fields, a best corresponding set of matching variables must be determined. An exact match between a candidate and donor record may not be possible in the continuous case, thus a distance function is used to define similarity. Some numerical imputations are not guaranteed to pass edits as are categorical imputations, thus redonation may be necessary (Giles and Patrick, 1986). A donor record may have characteristics similar to those in the candidate record, but the operation may have a different size, thus scaling is required. Continuous edit checks that are linear are amenable to known Operations Research procedures whereas non-linear edits (such as conditional checks) are not. In the words of Brian Greenberg, U.S. Bureau of the Census, "To the extent that the methods developed by Fellegi and Holt for categorical data and by Sande for continuous data under linear constraints are employed in these (editing and imputation) routines, a high level of rigor will be introduced into this system. Any success in developing procedures to systematically address the comparable criterion for conditional numerical, conditional categorical, or mixed edits will be a fine methodological advance" (Greenberg, 1987a). In some records, more than one minimal set of fields may exist. If so, some procedure is needed to determine which set should be corrected. One method is to assign weights to reflect the relative reliability (in the opinion of the subject matter expert) of each field. Thus, if multiple minimal sets are found, the least reliable set of fields is updated.
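A minimal sketch of donor selection with a distance function and scaling of the donated value follows; the matching variables, the distance measure, and the scaling rule are illustrative assumptions:

    # Minimal sketch of continuous hot-deck donation: a distance function selects
    # the nearest donor on matching variables, and the donated value is scaled by
    # the relative size of the two operations (all variables are hypothetical).
    import math

    def distance(candidate, donor, match_vars):
        return math.sqrt(sum((candidate[v] - donor[v]) ** 2 for v in match_vars))

    def donate(candidate, donors, target, match_vars, size_var):
        usable = [d for d in donors if d.get(target) is not None]
        donor = min(usable, key=lambda d: distance(candidate, d, match_vars))
        scale = candidate[size_var] / donor[size_var]        # adjust for size of operation
        return donor[target] * scale

    candidate = {"acres": 200, "region": 3, "fuel_expense": None}
    donors = [{"acres": 180, "region": 3, "fuel_expense": 900},
              {"acres": 950, "region": 3, "fuel_expense": 4300}]
    value = donate(candidate, donors, "fuel_expense", ["acres", "region"], "acres")
    print(round(value, 1))   # nearest donor (180 acres) scaled up to 200 acres -> 1000.0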

5. Manifestations

Both Statistics Canada and the U.S. Bureau of the Census have implemented this editing philosophy to a certain degree. Neither system fully automates the editing process. Since the systems are not fully automated, some records are reviewed by the specialist. These records are either too difficult to be dealt with by the machine, or are referred to the specialist according to certain pre-determined criteria such as size of firm.

5.1 United States Bureau of the Census

In the U.S. Bureau of the Census, present implementation of the philosophy of Fellegi and Holt resides in a sub-system called the SPEER System (Structured Program for Economic Editing and Referrals). SPEER handles continuous data under ratio edits, and has six main components: Edit Generation, Edit Analysis, Edit Checking, Error Localization, Imputation, and Diagnostics (Greenberg, 1987a). From survey to survey, it is the Imputation module which requires great change. In the Census experience, the Edit Generation, Edit Checking, and Error Localization modules remain virtually unchanged (Greenberg, 1987a). SPEER resides within a larger editing system. This reflects the fact that there are a number of tasks (such as SIC code assignment, GEO assignment) that SPEER is not designed to perform. Additivity checks are also handled in SPEER. Other types of checks can be handled before or after SPEER is invoked or in special satellite routines within SPEER itself. Changes made outside SPEER at times cause violations of edits within SPEER. Census has adapted the approach of Fellegi and Holt as far as possible to increase the degree of automation. Greenberg has extended the approach of Fellegi and Holt into the realm of ratio edits. However, this is done by considering the ratio edits as a set, doing what is possible within that set and sending the result to the broader system for further processing.

Imputation modules are applied one field at a time. These imputation modules consist of a series of rules that are utilized in a sequence until one of the rules generates a value that will satisfy the edits.

These modules are easy to create and can easily be revised to accommodate new understandings about the data (Greenberg and Surdi, 1984). When the imputation modules fail, the record is output to the specialist. In the interactive process, the statistician is presented with a list of fields in error and with ranges within which the value of each field must fall. The specialist enters a value for one field at a time, and each time the computer recalculates the ranges for the remaining fields to be changed. The result of the determination of a minimal set of fields and of the calculation of feasible regions is that the cyclic process of error printouts, error correction, and more error printouts is diminished or eliminated.
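
The "sequence of rules" idea can be sketched as follows: candidate imputation rules for a single field are tried in order until one yields a value that passes the edits, and the record is referred to the specialist if none does. The rules, field names and edit check below are hypothetical, not the Bureau's actual modules.

def impute_field(record, field, rules, passes_edits):
    # Try each rule in turn; keep the first value that satisfies the edits.
    for rule in rules:
        value = rule(record)
        if value is not None:
            trial = dict(record, **{field: value})
            if passes_edits(trial):
                return value
    return None   # no rule worked: output the record to the specialist

# Invented rules for a hypothetical "employees" field.
rules = [
    lambda r: r.get("employees_prior_year"),                        # historic value
    lambda r: r["payroll"] / 30.0 if r.get("payroll") else None,    # ratio to payroll
]
passes = lambda r: 1 <= r["employees"] <= 10000
rec = {"payroll": 900.0, "employees": None, "employees_prior_year": None}
print(impute_field(rec, "employees", rules, passes))   # -> 30.0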

Brian Greenberg, Principal Researcher in the Statistical Research Division, views the editing process in two stages: (1) automated batch runs for all records, and (2) manual review for specially targeted records. It is not desirable to remove the analyst review component of the process. The aim is to provide the analyst with more information on the review document coming out of the batch run to assist in review tasks. The analyst review tasks should be done in an interactive mode working with the computer. The objectives of the analysts' job would not fundamentally change though the mechanics and logistics might.

The Bureau of the Census has processed one large survey, the Census of Construction Industries, with their system which included the SPEER sub-system. This was done on a mainframe because of the number of records involved (several hundred thousand). For two surveys in the Economic Surveys Division, the 1987 Enterprise Summary Report, and the 1987 Auxiliary Establishment Report, the Bureau of the Census is considering employing a combination of mainframe and microcomputers. The mainframe would be used for batch processing and automated imputation and would refer difficult cases to the specialist to handle on a microcomputer. The number of cases handled on the microcomputer would depend on the referral criteria which in turn would depend on how much the editing and imputation algorithms on the mainframe were trusted. Referral criteria can include the magnitude of changes made by SPEER or the size of the firm involved. In addition, the Industry Division has developed computer programs based on the SPEER methodology, and they have been used for the 1986 Annual Survey of Manufactures and the 1987 Census of Manufactures. The Agricultural Division of the Bureau of the Census is considering using the system for the Agricultural Economic and Land Ownership Survey, for which data collection will start in 1989.

5.2 Statistics Canada

In the Statistics Canada survey processing system for economic surveys, two modules will handle distinct parts of the editing process. The first module is the Data Collection and Capture (DC2) module, the second is the Generalized Edit and Imputation System (GEIS). DC2 is in the prototype stage whereas the GEIS has recently been completed and documented. Different modules are being created for different parts of the edit process because in Canada, the response unit may be different from the statistical unit. For example, a firm might provide data for two factories on one questionnaire. In this case, the responding unit would be the firm and the statistical units would be the factories. DC2 would customize the forms to the respondent, do some basic editing at that level, and flag questionnaires for follow-up. All document control (status codes, etc.), a substantial amount of correction and all necessary follow-up are done in this preliminary edit. GEIS is meant to handle data at the statistical unit level, that is after the data have been processed by DC2. Only unresolved cases or cases of minor impact are passed to the Generalized Edit and Imputation System as a last resort, at which point an effort is made to solve all problems by imputation (Kovar, 1990a, b).

For now, GEIS will handle data which have not been processed by DC2. In this instance, it is expected that the amount of hand editing will be held to a minimum. Hand checking will be confined primarily to making sure that numeric data are entered in numeric fields and the like, and that control data on the first page is correct. GEIS has not yet been used in a production mode as the developers are still looking for clients. It is in the GEIS system that the philosophy and techniques of the Fellegi and Holt school of editing are currently in place.

Currently, GEIS handles only edits that are linear and data that are positive, but within these constraints most edits and data can be handled. Many nonlinear edits can be recast in a linear form and negative data values can be given as a difference of two positive numbers (e.g., profits = income - outgo). In the future, these constraints are to be relaxed. GEIS is embedded in the relational data base management system ORACLE, which facilitates the organization and the handling of data (Kovar, 1990a,b). This aids in monitoring the edit and imputation process.
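
A minimal sketch of holding edits in a linear form over nonnegative variables, including the recasting of a possibly negative quantity (profit) as the difference of two nonnegative fields, follows; the variables and edits are invented for illustration and are not GEIS's own.

import numpy as np

variables = ["income", "outgo", "wages"]
# Edits in the form a.x <= b over nonnegative x:
#   wages <= income       ->  -income + wages <= 0
#   outgo <= 2 * income   ->  -2*income + outgo <= 0
A = np.array([[-1.0, 0.0, 1.0],
              [-2.0, 1.0, 0.0]])
b = np.array([0.0, 0.0])

def failed_edits(record):
    x = np.array([record[v] for v in variables])
    lhs = A @ x
    return [i for i, (value, bound) in enumerate(zip(lhs, b)) if value > bound]

rec = {"income": 100.0, "outgo": 250.0, "wages": 60.0}
print(failed_edits(rec))                 # -> [1]: outgo exceeds twice income
print(rec["income"] - rec["outgo"])      # profit, recovered as a difference, may be negative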

GEIS as an editing program consists of four main parts: specification of edits, analysis of edits, application of edits, and outlier detection (Kovar, 1990a,b). The specification of edits is done by a subject matter specialist working together with a methodologist. Specification is typically done on a micro-computer. The system performs a syntax check and also checks that variables employed in the edits have been specified in the questionnaire.

Further checking of the edits occurs in the analysis of the edits. This can be accomplished because the edits are linear and the data are positive. The edit analysis checks the consistency of the edits. The analysis also identifies redundant edits, that is, edits which do not further restrict the feasible region of the data values. The system then generates the acceptable ranges for all variables, the extreme points of the feasible region, and the set of implied edits (Kovar, 1990a,b and Sande, 1979). This part of the system aids the analyst in determining if the edits are meaningful. It also helps to verify whether all edits are entered correctly.
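
One way to picture the consistency check is as a feasibility test on the linear system: explicit edits A x <= b over nonnegative data are consistent only if the feasible region is non-empty, which can be tested with a zero-objective linear programme. The sketch below uses scipy for this test; it illustrates the idea only and is not GEIS's own routine.

import numpy as np
from scipy.optimize import linprog

def edits_consistent(A, b):
    c = np.zeros(A.shape[1])            # zero objective: any feasible point will do
    res = linprog(c, A_ub=A, b_ub=b)    # default bounds keep x >= 0
    return res.success

# A consistent pair of edits ...
print(edits_consistent(np.array([[1.0, -1.0]]), np.array([0.0])))        # True
# ... versus a contradictory pair: x1 - x2 <= -5 and x2 - x1 <= -5
print(edits_consistent(np.array([[1.0, -1.0], [-1.0, 1.0]]),
                       np.array([-5.0, -5.0])))                          # False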

In the application of the edits, an error localization procedure is invoked to determine the minimal number of fields to be corrected. Alternatively, the same procedure can be used to find the minimally weighted set of fields to be corrected. The latter alternative utilizes additional information on the reliability of the fields as judged by the specialist. If an edit failure can be cleared up by the imputation of only one value for a variable, then that value is imputed, that is, the error localization procedure handles deterministic cases. Uncorrected or unimputed records are passed on to the imputation procedure. In the imputation procedure, two general methods are available: donor imputation and other imputation procedures. Donor imputation is implemented by hot-deck imputation. This is meant to be the primary method of imputation. Hot-deck imputation is preferred because it retains the structure of the data. Other imputation procedures include imputation of historic values (which can be trend adjusted), imputation of means, and ratio and regression estimators. These methods are backup methods used when the hot-deck procedure fails. They will not preserve the structure of the data as effectively as the hot-deck method. GEIS also has a facility which allows a choice of imputation methods by field.
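
A simplified sketch of donor (hot-deck) imputation in this spirit: the nearest donor under a distance function on matching fields supplies the missing value, optionally rescaled for the size of the operation. The distance, the matching fields and the fallback behaviour are illustrative choices, not GEIS's.

def hot_deck_impute(candidate, donors, match_fields, target, scale_field=None):
    # Scaled absolute differences on the matching fields define similarity.
    def distance(donor):
        return sum(abs(donor[f] - candidate[f]) / (abs(candidate[f]) + 1.0)
                   for f in match_fields)
    usable = [d for d in donors if d.get(target) is not None]
    if not usable:
        return None   # a backup method (historic value, mean, ratio) would go here
    best = min(usable, key=distance)
    value = best[target]
    if scale_field is not None:
        # Adjust for the differing size of the donor operation.
        value *= candidate[scale_field] / best[scale_field]
    return value

donors = [{"acres": 100.0, "yield": 40.0}, {"acres": 500.0, "yield": 45.0}]
candidate = {"acres": 480.0, "yield": None}
print(hot_deck_impute(candidate, donors, ["acres"], "yield"))   # -> 45.0 (nearest donor)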

Outlier detection is in the form of a statistical edit that operates on all records at once and cannot be applied at the same time as the other edits. The module can serve two distinct purposes: to determine the edit bounds, or to identify outlying values which can be flagged for imputation or for other considerations in subsequent modules (Kovar, 1990a,b).
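
As a generic illustration of such a module (the rule below is a common resistant-bounds device, not necessarily the one used in GEIS), bounds derived from the quartiles of each variable over all records can be used to flag outlying values for imputation or review.

import statistics

def outlier_flags(values, k=3.0):
    # Derive resistant bounds from the quartiles of the variable over all records.
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= v <= high) for v in values]

sales = [110, 95, 120, 105, 990, 100, 115]
print(outlier_flags(sales))   # the value 990 is flagged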

The mathematical procedures needed for optimization and search are written in C. GEIS is being produced in several releases, with new features available in each release. Methodologists do not feel that they have all the answers yet and would like to provide a wide selection of alternatives for planning the edit and imputation. However, offering maximum flexibility and maximum satisfaction results in a system which lacks consistency. A more unifying theoretical basis is needed (Sande, 1987). The GEIS system is not as easy to use as desired and a great deal of intervention is still necessary. Statistics Canada expects system implementation to require a substantial amount of time.

6. Future research

The following list of topics must be researched in order to fully implement the goals of the Fellegi and Holt school. This list was compiled from literature and from personal communication with people from the U.S. Bureau of the Census and Statistics Canada.

- Non-linear edits (including conditional edits), in order to generate the set of implied edits (and hence the complete set of edits) and to determine a minimal set of fields for non-linear edits.

- Negative values of variables.

- Implicitly defined constants.

- What to do with multiple solutions (multiple minimal sets) in error localization.

- Variance estimation in the presence of imputed data (Sande, 1987).

- Zero values versus missing values, that is, does a blank on a questionnaire represent a zero or has the item been skipped.

- More intelligent or expert systems.

- Automated macro-edits carried out on tabulations of statistics rather than micro data, with comparison between cells, between variables with historic data, and with other data sources, in order to avoid embarrassing data discontinuities, identify design and estimation problems, and lead to the formulation of improved micro-edits (Sande, 1988).

- Determination of which imputation option to use in which context.

- How to edit and impute when the data from a reporting unit includes data at the location level, establishment level, and all Canada level (Sande, 1988).

- What should be done when this year's imputation must be based on last year's imputation.

- Mixed edits (Giles, 1987).

- The blend in a multipurpose system between general routines and survey-specific procedures (Greenberg, personal communication).

7. Problems with this approach

Following is a list of problems gleaned from literature and from personal communication:

- The theory is not worked out for all cases. It can be implemented for categorical data, continuous data under linear edits, and ratio edits but not for some kinds of nonlinear edits such as conditional or mixed edits. (Though it may be possible to render some nonlinear edits into a linear form).

- Programming complexity (Fellegi and Holt, 1976).

- Distinguishing valid zeroes from missing data (Kovar). This is a problem for Statistics Canada because some of their survey data are obtained from computerized tax files with no recourse to the tax form in determining what a blank means.

- The choice of the appropriate imputation method in various contexts.

- The determination of which minimal set to correct when multiple minimal sets are found.

- The subject matter specialists may not be happy with going from a system in which specialist action is required to a system in which the computer takes most of the actions.

8. The desirable features of the system

Following is a collection of features from several papers which were either explicitly stated or implied. Priority is not given to the features. These features may or may not be implemented at this stage.

8.1 Methodological features

- There should be an orderly framework and philosophy for the development of edit and imputation procedures.

- Each record should satisfy edits by changing the fewest possible fields. ("Keeping the maximum amount of data unchanged"), (Fellegi and Holt, 1976).

- Imputation should be based as much as possible on the good fields of any record to be imputed (Fellegi and Holt, 1976).

- The frequency structure of the data file should be maintained for joint frequencies as well as for marginal frequencies.

- Imputation rules should be derived from corresponding edit rules without explicit specification.

- Imputations should not violate edits.

- The system should supply methodologically sound modules to be assembled by the user (Kovar).

- Defensibility of methods should be a priority (may somewhat constrain the user), (Kovar).

- When imputing for a deletion due to edit failures, one should endeavour to utilize the reported value although it is incorrect (Example: the respondent answers in pounds where tons are requested).

- Feedback should be provided on the impact of the system on estimates.

- The system should detect univariate outliers (Kovar).

8.2 System-oriented features

- It should not be necessary to specify imputation rules (Fellegi and Holt, 1976).

- The edit program should logically deduce a set of implied edits.

- Each record should be edited only once.

- Edit procedures should be implemented in a generalized editing system as part of the automation of the editing process. Thus, systems development need not be delayed by specific edit specifications. (However, it still may be desirable in some cases to introduce survey specific procedures. See the last item under paragraph 6, Future research).

- Program rigidity should be reduced (Fellegi and Holt, 1976). That is, changes to the program should be easy to accomplish without making errors.

- The generalized editing system should be embedded in a relational data base management system (Sande, 1987). The database environment should allow:
  . easy monitoring of imputation process (Kovar);
  . measurement of impact (of editing process) on estimates (Kovar);
  . measurement of the effect of imputation on particular states of the process (Kovar).

- The software should be portable between computers of varying types (Sande, 1987).

- The system should be modular, allowing the development of distinct stages of the editing and imputation process. It should also allow ease of updating and the ability to add new modules.

- Records as captured by the system should be kept in a file separate from those being edited and corrected. This allows evaluation of the technique and also allows one to go back to the respondent with data as the respondent gave it.

8.3 Subject matter specialist oriented features

- The subject matter specialists should be an integral part of a team implementing automated editing and imputation procedures.

- Only edits should have to be specified in advance as imputation rules would be generated by the computer (Fellegi and Holt, 1976).

- The subject specialist should be able to inspect the set of implied edits derived from the explicitly specified edits in order to evaluate the explicitly specified edits.

- In any survey, different sets of edit specifications (corresponding to different parts of the questionnaire) should be able to be created concurrently by two or more specialists. (The system would take care of any reconciliation), (Fellegi and Holt, 1976).

- Respecification of edits and imputation rules should be done easily (Greenberg, and others).

- The specialist should be able to experiment with the explicit edits either before the survey is conducted or after a small subset of the records are in, thus gaining information on the explicit edits.

- The generalized edit and data correction system should allow respecification of edits without reprogramming (Fellegi and Holt, 1976).

- The edit programme should be in modules so that the subject expert can easily enter or change explicit edits, and new versions of the system can easily be installed.

- The edit programme should provide feedback on how the edits are affecting data quality.

- The system should be comprehensible to and easily modifiable by the users of the sub-system.

- The system should be flexible and satisfying to the user, (give the users what they want, Kovar).

- The system should provide the user with a menu including a choice of functions applicable to the data or subsets of the data (Sande, 1987).

- A sample of edit failures should be checked. Statistics should be collected on the frequency with which each field is involved in an edit failure, the frequency with which each edit is failed and the frequency with which each field is identified for change (Sande, 1987).

IV. STREAMLINING AND INTEGRATING THE SURVEY PROCESS; THE BLAISE SYSTEM FROM THE NETHERLANDS

In this approach, as implemented by the Netherlands Central Bureau of Statistics, the automation of the editing process, although of importance, is only one part of the automation of the overall survey process. In this approach, no new theoretical tools are implemented to rationalize the editing process. The idea is to take the current cyclic batch process performed on mainframes and to put it on microcomputers, thus creating an interactive process. Because the survey process is now handled with essentially one integrated system of modules, the data need to be specified once only. That is, the specialist does not have to write an editing program, as it is derived automatically from the specification of the questionnaire on the computer. In addition, CATI and CAPI modules are generated automatically from the BLAISE questionnaire. The emphasis is on streamlining current methods and on integrating most computerized survey functions.

The subject matter specialist plays two important parts, the specification of the questionnaire and the resolution of the data. In the resolution of the data, the specialist is either a data checker/corrector or a data typist/checker/corrector.

1. Subject matter specialist as error checker/corrector

The data are entered normally, by very fast data entry personnel. The file is passed to the subject matter specialist who uses the microcomputer to correct errors one questionnaire at a time. After corrections are made, the record is re-edited on the spot and redisplayed with new error messages, if any. Thus the batch cycle is changed to an interactive micro cycle (micro having two meanings here, microcomputer, and each record cycling by itself through the editing program until it is correct). The questionnaires are available to the specialist for reference. The specialist sees the errors on the microcomputer screen along with error messages. The questions as they appeared on the questionnaire are not presented on the screen; rather, mnemonic variables are displayed. The questions are available from a help screen if needed. The record can be routed to a holding file of difficult questionnaires, if necessary. The system does not generate a minimal set of fields to be corrected as in the Fellegi and Holt school. It does display the number of times each field is involved in an edit failure. The premise is that the fields flagged most frequently should be the first to be corrected as they are the ones most likely to be causing the problems.
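
The ordering heuristic can be sketched as a simple count: tally how often each field occurs among the failed edits of a record and present the most frequently implicated fields first. The data structures below are invented for the example; Blaise derives the edits themselves from the questionnaire specification.

from collections import Counter

def fields_by_failure_count(failed_edits):
    # failed_edits: list of sets, each the fields involved in one failed edit.
    counts = Counter(field for edit in failed_edits for field in edit)
    return counts.most_common()

failed = [{"age", "year_of_birth"}, {"age", "children"}, {"children", "marital_status"}]
print(fields_by_failure_count(failed))
# [('age', 2), ('children', 2), ('year_of_birth', 1), ('marital_status', 1)]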

2. Subject-matter specialist as data typist/checker/corrector

The data are entered by the specialist, who is not as fast as the regular entry personnel. However, the record is edited as it is entered. The specialist entering the data is also qualified to correct errors. Thus, data entry and reconciliation are combined. The extra time in data entry is offset by the one time handling of the record. Codes can be entered interactively.

3. The desirable features of the Blaise system

Following is a collection of features which were explicitly stated or implied in several papers. The features are not prioritized. These features may or may not be implemented at the present time.

- Forms should not need to be prepared for entry (Bethlehem, 1987).

- The cyclic nature of editing should be removed (Bethlehem, 1987).

- The work should be concentrated as much as possible in the same department (Bethlehem, 1987).

- The work should be done as much as possible on the same computer (Bethlehem, 1987).

- There should be a reduction in time needed for editing (Bethlehem, 1987).

- The system should be applicable to different surveys.

- Superfluous activities should be eliminated.

- Error checking should be an intelligent and interactive process carried out between the subject matter specialist and the computer.

- The structure of the questionnaire and the properties of the resulting data should be specified only once (Bethlehem, 1987).

- Editing automation should be part of a larger automation process. (This is really the crux of the matter: a Blaise questionnaire containing the total information about the questionnaire is constructed, from which all products fall out, including edit programs, CATI and CAPI instruments, etc. The Blaise questionnaire is considered a knowledge base in an artificial intelligence context).

- Data entry should have an interactive capability.

- An interface with statistical packages should be possible without the necessity of respecification of the data.

- The system should be user friendly (Bethlehem, 1987).

- The system should be based on microcomputers such as IBM AT/XTs and compatibles (Bethlehem, 1987).

- The updating of questionnaires from one survey to another should be easy (Bethlehem, 1987).

4. Impetus of the project

The Netherlands Central Bureau of Statistics implemented a data editing research project. The objective was to assemble management data on the editing process through the analysis of the processing of four selected surveys of various types and characteristics. For example, in the initial hand editing stage of survey processing, the steps in the editing process were listed and evaluated for their contributions to improved data quality. Editing activities were classified into three types: real improvements, preparation for data entry, and superfluous activities (such as writing a minus sign for missing data). In one survey, the relative share of the time spent on these activities was 23%, 18%, and 59% respectively. These measurements were made by inspection of the questionnaires after the surveys were completed. Other quantitative and qualitative aspects of survey processing were measured, such as rates of edit failure, time needed to clean a file, as well as the ways in which personnel interacted with the computer systems. Some of the findings of the project (Bethlehem, 1987):

- Different people from different departments were involved.

- Different computer systems were involved.

- Not all activities were aimed at quality improvement.

- Manual check of complex routing structures was difficult and time consuming.

- The editing process was cyclic.

- Repeated specifications of the data were necessary. The term repeated specification refers to the practice of specifying variables, valid values for the variables, relationships between variables, and routes to be followed depending upon values of the variables in the survey stream. These items were specified for the questionnaire (on paper or as a CATI or CAPI instrument), and again in data entry software, in an editing and analysis program, and in a summary program.

This problem was compounded if the various tasks were carried out on different computers using different software. In those cases, significant resources were spent just transferring data from one location to the next. This last point led to the design of the Blaise system (Denteneer, et al., 1987).

V. NASS DEVELOPMENT EFFORT

The theme underlying NASS's Survey Processing System (SPS) is one of integration. The term integration impacts the SPS in two major ways. Firstly, this term refers to one of the impetuses of the development of the SPS, that is, the integration of NASS surveys into one coherent sequence of surveys. This was originally done under the name of the Integrated Survey Program and is now known as the Quarterly Agricultural Survey Program. Secondly, the term refers to the integration of the distinct steps of the survey process under a unifying system. As such, the SPS has modules for data capture, data validation, and statistical editing although this latter module is not fully developed or utilized. In the future, the SPS will also encompass modules for imputation, analysis, summary and reports with further connections to a public use data base and a secure agency data base. A specifications generator is to be developed and will serve as the unifying feature of the SPS. It is envisioned that the specifications generator will output files for further processing into paper questionnaires, CATI and CAPI instruments, and an editing system. Integration will also serve to ensure the consistency of editing procedures across all surveys.

The implementation of the Integrated Survey Program served as an impetus to the development of the SPS because the previous edit and summary system could not handle the requirements of the new program. For example, there was a need to be able to process each section of the questionnaire differently as regards completion codes and refusals in order to summarize the sections independently. In addition, the old system could not handle the large number of variables demanded by the new survey system and it would not allow data to be compared between records.

Beyond the system limitations mentioned above, a number of new capabilities are desired for statistical purposes. The term editing was expanded to include statistical edits which involve cross-record comparisons at the time of the more traditional data validation (Vogel, et al., 1985). It was also desired that the effect of imputations and non-response be known at various levels of aggregation, that procedures be consistent across all surveys, and that NASS procedures be statistically defensible. Editing and imputation of missing data are not considered as parts of the same process in the sense of Fellegi and Holt. That is, the edits are not used to define a feasible region for imputation. However, nothing in the system prevents records with imputed values from being run through the edits once again.

NASS is probably one of the few agencies in the world to have programmed its editing software in the statistical language SAS (R). The use of SAS for editing has sparked some interest in an international editing research group because of its portability, its wide use as an analysis tool, its flexibility, and its amenability to developments in statistical editing and in micro-macro combination edits (Atkinson, 1988b). These advantages also apply to NASS, that is, nothing is precluded, therefore options are maintained.

1. Limitations of the old system (Generalized Edit System).

Following is a list of limitations gleaned from various NASS reports, conversations with NASS personnel, or noted from personal experience:

- An artificial limit to the number of parameter cards is often exceeded.

- Parameters are difficult to write.

- Error printouts are difficult to understand.

- Types and numbers of edit functions are limited.

- The system is not updated systematically.

- The system is not supported with training.

- The manual is written in an undefined jargon and contains few examples.

- Comparisons between records are not allowed, that is, the statistical distributions of the data were not reviewed.

- The system only points out errors; it does not correct them (Vogel, et al., 1985).

- The manual resolution of errors can vary from state to state (Vogel, et al., 1985).

- There is no built-in method to evaluate the effect of editing (Vogel, et al., 1985).

- Cross-record processing is not allowed.

- The system cannot handle the thousands of variables required by the new system of surveys.

2. Current and desired features of the Survey Processing System

2.1 Broad aims

- Procedures should be objective, repeatable, and statistically defensible (Vogel, et al., 1985).

- Procedures should be consistent across all surveys.

2.2 Impact of editing, contribution of nonresponse

- The edit system should allow a review of the number of edit actions by type and by their individual effect on the final indication (Vogel, et al., 1985).

- The edit system should allow the contribution from nonresponse to be known (Vogel, et al., 1985).

- The Board should be able to monitor how data editing, imputation for nonresponse, and adjustment for outliers affect the estimates (Vogel, et al., 1985).

- The edit system should allow the commodity statistician to monitor the relationship between raw and edited data (Vogel, et al., 1985).

- The system should retain survey data as it is reported (Vogel, et al., 1985). This would allow some measurement of how the editing process affects the estimates.

- Statistician edits should appear in separate fields from the reported data (Vogel, et al., 1985).

- Reported data should be compared with edited data (Vogel, et al., 1985).

- Statistician editing should occur only after data are in computer media (Vogel, et al., 1985).

2.3 Statistical edit versus data validation

- Data validation should be distinguished from statistical editing. Statistical editing would follow data validation (at least in the computer program) (Vogel, et al., 1985).

- Development of statistical edits at the time of edit should be carried out, (Vogel, et al., 1985 & Barr, 1984).

- Data errors identified during the statistical edit and analysis process should be resolved using statistical procedures to ensure consistency across surveys and across States (Vogel, et al., 1985).

- Data analysis routines should be incorporated into the statistical edit stage of the edit (Vogel, et al., 1985).

- The system should have the ability to detect outliers and inliers (Vogel, et al., 1985).

- There should be an audit trail for adjustments from outliers (Vogel, et al., 1985 & Barr, 1984).

- Interface with analysis tools should be allowed (Ferguson, 1987).

2.4 Imputation

- Imputation procedures for both list and area frame surveys should be incorporated (Barr, 1984).

2.5 Added capacity and flexibility

- The system should allow item code validation of questionnaires by state and version.

- The system should have the ability to process states individually with their own error limits.

2.6 Ease of use

- The system should have menu driven systems that are easy to use (Vogel, et al., 1985 & Barr, 1984).

- The elimination of the hand edit is a goal (Vogel, et al., 1985).

- Data verification should be on an interactive basis (Vogel, et al., 1985).

- Customized data listings should be available (Ferguson, 1987).

- The system should provide easy and timely access to data at all levels, i.e., reporter, county, district (Barr, 1984).

- Edit checks should be easy to specify (Ferguson, 1987).

- An error description should be provided in the error printout (Ferguson, 1987).

2.7 System attributes

- The system should provide the capability to interface with the following (Barr, 1984):

- data entry systems;

- microcomputers/minicomputers;
- the NASS system;
- statistical packages;
- LSF;
- CATI and hand held data recording devices;
- report generator.

- The system should start with the capabilities of the NASS Generalized Edit System and build from there (Barr, 1984).

- There should be no artificial limit to the number of edits (Barr, 1984).

- The system should be capable of editing and maintaining several levels of data within or between records including the ability to use previously reported data (Barr, 1984).

- There should be flexibility in data entry formats including full screen, on- and off-line procedures, entry without item codes, etc. (Barr, 1984).

- The system should have survey management capabilities (Barr, 1984).

- The system should meet all agency security needs (Barr, 1984).

- A specification generator should be developed from which files for paper questionnaires, CATI and CAPI instruments, and an editing system can be generated from one specification of the survey variables.

3. Implementation

The new Survey Processing System is being written and documented. It is being used to process the Quarterly Agricultural Surveys, the Farm Costs and Returns Survey, and the Prices Paid by Farmers Survey.

The emphasis so far is on handling expanded survey requirements. These include an increase in the numbers of edits and variables and the use of cross-record checks to improve data validation as it is currently handled in the agency. The system can access previously reported data and, since it is in SAS, it has the capability of comparing data between records. Though data correction is still made by printout, English messages instead of error codes are printed out. It is far easier to write edits than previously and there are no limits to the number of edits that can be written. Research on some features listed above has yet to begin. This includes work on statistical edits, the automation of all or most of the corrections, and the elimination of the hand edit before data entry.

3.1 Broad aims

If the objectives of editing are objectivity, repeatability, and statistical defensibility, then they have not been fully attained in most of NASS's surveys. The current NASS editing systems primarily use within record computer checks and a subject matter specialist. In that the SPS is written in SAS, it is capable of accommodating procedures which would accomplish these goals. The attainment of these objectives is firstly a matter of definition and theory and secondly a matter of systems development.

3.2 Impact of editing, contribution of nonresponse

The Survey Processing System allows a review of the number of edit actions by type but does not allow a review of their effects on the final indications. The contribution from nonresponse for crops is made available to the Agricultural Statistics board before estimates are set. There is no provision for monitoring how data editing and adjustment for outliers affect the estimates. The system has the capability of allowing the commodity statistician to monitor the relationship between raw and edited data but as this capability has not been used, the programming code has been commented out. (That is, it is still there if anyone wants to use it). The system does not retain survey data as it is reported, nor do statistician edits appear in separate fields from the raw data. The issue of comparing reported data with edited data is problematic because CATI reports and paper reports are mixed in the same file and CATI reports are, in effect, edited at the time of data collection. Statistician edits occur both before and after the data are in computer media, not solely afterwards.

3.3 Statistical edit versus data validation

Data validation is distinguished from statistical editing in the Survey Processing System. That is, a place is reserved for a statistical editing module if the research on how best to implement a statistical edit is carried out for each survey. A statistical edit is distinguished from statistical analysis in that a statistical edit is carried out at the time of the data validation. NASS's Prices Paid by Farmers survey employs a statistical edit to search for outliers. The June Agricultural Survey edit program also does some cross-record checking at the field and tract levels but does not check for outliers. An audit trail for adjustments from outliers exists for the Farm Costs and Returns Survey but not for the other surveys. An interface with analysis tools is in place.

The concept of statistical editing in NASS remains undeveloped. NASS's analysis packages serve some of the same purposes as a statistical edit, in that the analysis packages serve to detect outliers. A major difference between the analysis packages and the statistical edit as listed above is that the analysis packages are run after data validation and imputation and not at the time of data validation. Another difference is that edit limits are not generated from the analysis packages. Statistical editing may refer to cross-record comparisons of current survey records for one or a set of variables. The concept may also refer to using historical data within a record in order to set individual error limits for each firm.

3.4 Imputation

The capability to implement procedures for both list and area frame surveys does not yet exist within the Survey Processing System. Imputations are being carried out in the Quarterly Agricultural Surveys but these imputations are not done within the SPS. They are carried out between data validation (in the SPS) and analysis. Though imputed values are not rerun through the validation edit, it is possible to see the effects of some imputations in the analysis package. If at that point there are any glaring imputation mistakes, they can be corrected.

3.5 Added capacity and flexibility

The Survey Processing System allows as an option the validation of questionnaires by state and version. The capability of states to process data with their own error limits is also available.

3.6 Ease of use

The Survey Processing System has some menu driven systems in place with which to generate parameters for editing and analysis. The elimination of the hand edit is an unrealized goal for NASS as a whole, although there are states that have eliminated it. Data validation is not on an interactive basis, but since the system is in SAS, this could be done either on the mainframe or on microcomputers. In order to install interactive editing, an editing interface would have to be written so the records could be immediately re-edited when changes are made. Customized data listings are available and this capability is being updated. The system allows easy and timely access to data at all levels because it is written in SAS. The specification of edit checks is much easier than before although not as easy as desired. However, a better specifications generator will be created in the future. Error descriptions instead of error codes are provided in the printout.

3.7 System attributes

The Survey Processing System, utilizing the many SAS capabilities, has the capability to interface with all present or anticipated NASS data handling systems. The SPS has all the capabilities of the old system and many more. There is no artificial limit to the number of edits. The system can edit and maintain several levels of data within or between records and can access previously reported data. Data can be entered in a variety of ways including by item code, full screen entry, and on and off-line procedures. The system has some survey management capabilities but these are to be improved. The system fully meets all of NASS's security needs. The specifications generator has yet to be developed beyond its rudimentary beginnings.

4. Discussion

Not all of the desired features listed above have been implemented or even researched. However, considerable progress has been made especially in the realm of system development. Much effort has been put forth in order to preserve maximum flexibility in the Survey Processing System. Thus the system has the potential to accommodate almost any mathematical or statistical procedure, to be used on many kinds of computers, and to allow more powerful tools to be put in the hands of the editing personnel. The limiting factors are the development of theory, research, money, and the vision of management.

4.1 The role of the data editors in the field

Aside from the listing of some desired features of the system, no explicit discussion has been offered in NASS documents as to the job content of the data editors, that is the data entry personnel, the clerks, and the statisticians, as regards editing. Several scenarios have already been presented in section II.3. Though some changes have taken place from the standpoint of the field editors, the SPS has not yet impacted their job in a major way. However, future implementations of the SPS have the potential to change their job content dramatically, from reducing the amount of editing the personnel are expected to do to supplying more powerful tools to do the same job. Keeping in mind the constraints of resources and goals of NASS, such as objectivity, repeatability, and defensibility, the editors themselves should be brought into the process as regards how they are to act in the system. For example, if interactive capability is to be introduced, the people who are to use the system should have a hand in the design of the interfaces. This includes clerks as well as statisticians.

4.2 Integration of CATI and CAPI with the SPS

Even with the introduction of CATI and CAPI, some sort of editing system will be needed, as not all editing functions are carried out with these data collection technologies. Some consideration will have to be given as to how the editing functions will be divided between the SPS and CATI and CAPI. For example, it is not likely that cross-record checks could be successfully carried out in a CAPI environment if those checks were to involve records collected on another computer. On the other hand, CATI records collected on a LAN could be checked with a statistical edit conducted on a LAN.

4.3 Research and implementation

Two research projects are being conducted in the Research and Applications Division. The first, by Antoinette Tremblay and Ralph V. Matthews, is entitled "A Track of Wheat Objective Yield Raw Data to Final Summary". This study tracks data from its reception in the office to final summary. Estimates are calculated at each stage of the editing and summary process to track the effect of the editing process on the level of estimates. The aim of this report falls within the realm of collecting management information. That is, it attempts to measure the effects of current editing procedures. This report, and others like it, will impact the SPS indirectly by pointing out areas in which research should be conducted.

The second project by Cathy Mazur is entitled "Statistical Edit System for Weekly Slaughter Data". The aim of this research is to determine whether it is possible to use each firm's historical data in a time series model to edit current data. It is possible that this research will be incorporated directly into the SPS for some specialized surveys, especially for those dealing with agricultural business firms.

A statistical edit has been implemented for the Prices Paid by Farmers Survey. It remains to be seen how far this approach can be extended to other surveys as there are some differences between this survey and the larger NASS surveys. In the Prices Paid survey, all reports are run through the system in batch. Records in each of ten regions in the country are compared on an item by item basis and regional distributions are generated for each item. A second run is used to correct data which are outliers based on current reports from the first run. A second type of statistical edit based on historical data is used in the survey. Data from the previous collection period are used to generate error limits for the next period. These error limits are manually reviewed before being used, and can be changed if necessary.
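
The historical limits can be pictured with a small sketch: limits for the next period are derived from the distribution of the previous period's reports for an item (here with an invented median-plus-IQR rule, not the survey's own formula) and are then subject to the manual review described above.

import statistics

def error_limits(previous_values, k=2.0):
    # Limits from the previous period's distribution for one item and region.
    q1, med, q3 = statistics.quantiles(previous_values, n=4)
    iqr = q3 - q1
    return (med - k * iqr, med + k * iqr)

prior_prices = [3.10, 3.25, 2.95, 3.40, 3.05, 3.30, 3.15]
low, high = error_limits(prior_prices)
print(round(low, 2), round(high, 2))   # limits to be reviewed, and changed if necessary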

Further research remains to be done in the areas of statistical editing for NASS's large surveys; automated imputation; inspection and analysis of edits in current NASS surveys; macro-edits; statistical edits; and automation of current editing procedures, that is, interactive capability. In these areas, four groups of people must, at times, interact: systems developers, statistical researchers, users in headquarters who plan surveys and write editing programs, and data editors in the field. (See further VI. Further Research).

4.4 Statistical defensibility, objectivity, and repeatability

It is possible that the definition of the term statistically defensible will change with different types of surveys. For example, in some commodity surveys, the sole intent may be to publish averages, totals, or rates. In these cases, the only access to the data would reside within NASS, and in theory at least, all direct uses of the data are known in advance. In other surveys, such as the Farm Costs and Returns Survey, record-level data may be released to or used by others, so that not all uses of the data are known in advance. In the former case, imputations of averages for item nonresponse may be statistically defensible. In the latter case, this procedure probably would not be defensible, as it would have the effect of changing the structure of the data, that is, changing the marginal and joint distributions of the data. (NASS does not impute for the FCRS survey). As another example, in the Fellegi and Holt vision of data editing and imputation, imputations must agree with edits. In the NASS processing of the QAS, imputations are not run through the editing system again although implausible imputations may be detected in the analysis packages. Is this defensible? In order to attain the goal of statistical defensibility, the term will have to be defined at some point. The same will have to be done for the terms objectivity and repeatability.

4.5 Imputation

Imputation should not be performed to such an extent that major problems in data collection are masked. That is, no imputation method can overcome high nonresponse and error rates. Furthermore, if record level data are to be released to others, then imputations should be flagged, and the manner in which they were made should be documented and available to the user. To the extent that imputations are justifiable, whether they arise from correction of errors or through nonresponse, they should be made in an efficient and defensible manner.

According to Dale Atkinson (1988a), imputations for total nonresponse in the QAS are possible if one makes use of ancillary data such as list frame control data or previous survey data. Imputations in the QAS are based upon extensive modelling of any previous survey or control data which are available for a particular nonresponse record. The current methods do not preserve distributions, for which the primary benefit (at least for the QAS) would be better variance estimation. (This reflects how the data are used, i.e., expansions or averages reported, only NASS uses the data, etc.). The benefits do not outweigh the costs of trying to maintain a distributional structure for each item. Methods are now based primarily upon the logic used when manually imputing data. Atkinson suggests research to: 1) compare the NASS exact imputation procedures against alternative approaches used outside of NASS and widely discussed in the statistical literature, and 2) investigate ways to compensate for variance understatement resulting from imputation. He also states that statistical defensibility needs to be addressed.

VI. FURTHER RESEARCH

1. Describe the problem

Collect management data concerning the editing process. What are the resources being used and on which survey steps are they being used? How much cycling is there in the edit process for each survey? How tight are deadlines to be in the future? How well do enumerators follow skip patterns? What percent of the data is imputed and how does this vary by topic or item? How are estimates changed as a function of the current editing process? The answers to these questions are necessary in order to compare the magnitude of possible improvements to the costs of implementing new systems and to the costs of remaining in the current system.

Current NASS operating procedure involves performing a comprehensive hand edit on questionnaires before data entry. Thus there is no way to measure the quality or necessity of the editing process by referring to computer files. Also, it is not possible to measure changes in data between the time they are collected and after they have been processed. The only way to collect this kind of data would be to conduct a survey of questionnaires used in NASS surveys such as the Farm Costs and Returns Survey, the Objective Yield Surveys, and the Quarterly Agricultural Surveys. The ultimate sampling unit would be the questionnaire. The respondents would be NASS statisticians in the state offices. A stratified random sample of the questionnaires would be collected and each questionnaire would be rated in various ways. Data items would include measures of enumerator performance such as whether the correct skip pattern was followed. Other data items would measure office performance in terms of data quality and productivity, including rating the percent of editing activity as it appears on the questionnaire as to whether it was harmful, superfluous, or good (contributed to data quality).

Other management data could be collected from headquarters by tracking certain computer files. The data to be collected here would include the extent of cycling being done for the various surveys as well as the distribution of questionnaire arrivals into the system. For a random sample of offices, every error printout file could be saved for the particular survey. After the edits for that office are clean, a matching process between files would be executed, the matching being done on the ID. It would then be possible to determine a distribution of occurrences of IDs in terms of the number of times each questionnaire passed through the computer edit.
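
Assuming each saved error printout has been reduced to the list of questionnaire IDs appearing on it, the matching step amounts to counting ID occurrences across the successive printouts, which yields the distribution of the number of times each questionnaire passed through the computer edit. A sketch (file handling omitted, lists invented):

from collections import Counter

printouts = [                       # one list of IDs per edit run, in chronological order
    ["A01", "A02", "A03", "A07"],
    ["A02", "A03"],
    ["A03"],
]
cycles = Counter(id_ for run in printouts for id_ in run)   # edit passes per questionnaire
distribution = Counter(cycles.values())                     # how many IDs cycled 1, 2, 3... times
print(dict(distribution))   # {1: 2, 2: 1, 3: 1}: two questionnaires appeared once, etc.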

A third type of management data that could be collected would be the time spent on various survey tasks. This would have nothing to do with time sheet recordings as used in the Administrative Records System. This measurement would be done by selected employees keeping a log of how their time was spent on each survey task. It would attempt to detect superfluous activity (e.g., filing and refiling of questionnaires) that would not appear through an inspection of questionnaires. That is, it would help to point out where new technologies could streamline the process. It would also provide baseline data for measuring productivity gains.

The purpose of collecting this management information would be to determine where research resources should be spent regarding at least two technologies. These are new editing procedures and the introduction of CAPI. For example, the rate of occurrence of enumerators following a wrong route may be less than 1% or as high as 10%. In the former case, CAPI might not be justified based on this phenomenon alone, whereas in the latter case it might be. Other vital information might be gathered in the realm of non-sampling errors, the effect of imputations whether done by hand or by machine, and the success in training enumerators. In this latter realm, one could imagine a whole different emphasis on training according to the feedback which could be gained by this agency self inspection. For example, it may be that data quality is compromised more by enumerators following incorrect paths through questionnaires than by their misunderstanding of individual questions. If so, future training schools for the survey would emphasize following the correct path through the questionnaire.

2. Classification of errors and edits

Construct a classification of edits and of errors occurring on NASS questionnaires. Determine the success of detecting the errors and the success of correcting them. List the kinds of edits being used by NASS in its surveys according to the edit classification. Determine if the edits in use by NASS would be amenable to the determination of a minimal set and feasible regions. Perhaps a subset of the edits would be amenable to such an approach and could thus be utilized to reduce the need for review by survey specialists. One other aspect of this research is to see if NASS over-edits its data. Statistics Sweden has gained a 50% reduction of error signals by inspecting and analysing edits in one survey, eliminating redundant edits in some cases and broadening bounds in others (Granquist, 1988). This reduction in noise was accomplished without a reduction in data quality and, according to Granquist, with a slow increase in data quality, as specialists had more time to concentrate on true problems. After every survey, NASS reviews the number of times each edit is invoked and will adjust edits accordingly. This recommendation would go beyond this routine analysis by analysing patterns of edit failures. That is, are there edits which always or usually fail at the same time because they concern the same field?
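
The pattern analysis suggested here could begin with a simple co-failure count: for each record, note which edits failed, then tally how often each pair of edits fails together; pairs that almost always fail jointly point to edits that concern the same field or are redundant. A sketch with an invented input format:

from collections import Counter
from itertools import combinations

def edit_cofailure_counts(failures_per_record):
    # failures_per_record: one set of failed edit identifiers per record.
    pair_counts = Counter()
    for failed in failures_per_record:
        for pair in combinations(sorted(failed), 2):
            pair_counts[pair] += 1
    return pair_counts

records = [{"E1", "E3"}, {"E1", "E3", "E7"}, {"E3"}, {"E1", "E3"}]
print(edit_cofailure_counts(records).most_common(2))
# [(('E1', 'E3'), 3), (('E1', 'E7'), 1)]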

3. Macro-edits

Macro-edits should be able to do two things: edit data at the aggregate level, and trace inconsistencies at the aggregate level back to individual questionnaires. Macro-edits focus the analysis on those errors which have an impact on published data. These tasks are already performed to a large extent by hand referral to analysis package output. Research should be pursued along the lines of automating macro-edits.

4. Statistical edits

Research on automated statistical editing, both in batch and interactive modes should be conducted. The best way to detect outliers and the best way to resolve the status of the suspicious data should be determined. The use of high resolution work stations in conjunction with interactive data analysis packages should also be explored.

5. Imputation

Conduct research on imputation, its impact on the edits and the maintaining of distributions (see Atkinson, 1988a). Some questions which should be answered: What is the extent of item nonresponse, partial nonresponse and total nonresponse? What are the proper imputation methods for each kind of nonresponse? How do these vary by survey topic? For example, can item nonresponse in crops and livestock be handled in the same way? By sampling frame? By survey? Is it defensible to hand impute for total nonresponse in JES tracts? Why, in the QAS, are imputed livestock records not used in the summary while imputed crops and grain stocks are fed into the summary? How do agricultural populations differ in structure from other populations and how are imputation procedures used by other organizations applicable to agency needs?

6. Statistical defensibility, objectivity and repeatability

Conduct research on a definition of statistical defensibility, objectivity, and repeatability as they apply to the editing and imputation process.

7. The Bureau of Census SPEER software

Inspect the Bureau of Census software when it becomes available. The Bureau of Census has offered to allow NASS to inspect the software they are developing for the IBM AT.

8. The Netherlands Central Bureau of Statistics Blaise software

Continue to inspect the Blaise software as it is sent from the Central Bureau of Statistics in the Netherlands. This research should be carried out regardless of the editing research, as it might be applicable to the CATI and CAPI work done in this agency. It could also stimulate research into the concept of integrating all aspects of the survey, from questionnaire design to summary.

9. Microcomputers

Determine the feasibility of using microcomputers, either solely or in LANs, to perform various editing tasks. NASS already has some microcomputer based editing and summary programs in place for special purposes, including the Peanut Stocks survey, and a system in use in Pakistan. The next logical step is to see if the Survey Processing System on the microcomputer can handle NASS's questionnaires, especially the Farm Costs and Returns Survey and the Quarterly Agricultural Surveys. Possible productivity gains could be estimated if the research in point A is carried out.

ON THE NEED FOR GENERALIZED NUMERIC AND IMPUTATION SYSTEMS

by L. Granquist Statistics Sweden

Abstract: The aim of this paper is to discuss the needs, objectives, use and achievements of numeric generalized systems for editing based on the Fellegi-Holt methodology. Findings of a study on the computerized editing of the World Fertility Survey are presented.

I. INTRODUCTION

The systematic approach to editing given in the famous paper by Fellegi and Holt (1976) has had an enormous influence on the development of generalized software for editing. Those principles of editing are adopted as the underlying methodology in systems for editing and imputation of qualitative data (CAN-EDIT, AERO, DIA) and now also of quantitative data (NEIS, SPEER, Canada's planned system). In the following, all those systems are denoted GS (generalized systems). The basic principles of the Fellegi-Holt methodology are:

- data in each record should satisfy all edits;

- imputation rules should be derived from the edit rules without explicit specification.

However, evaluations of the GS applications are very sparse, as are critical discussions on means and ends: which problems are actually solved and which are not, whether the means are rational considering what can be accomplished, and so on. The aim of this paper is to discuss and question the needs, objectives, uses and achievements especially of numeric GS. It is based on the findings of a study of the machine editing of the World Fertility Survey (WFS), as they are reported in Pullum et al (1986): The machine editing of large sample surveys: The experience of the World Fertility Survey. These experiences are consistent with other reported experiences of editing.

The Pullum paper discusses the machine editing of the WFS. Although GS were not used, the evaluation of the WFS editing is in fact an evaluation of the basic principles underlying GS.

The authors of the study present their findings in the form of recommendations for similar types of surveys or survey projects and it is stated that all recommendations are valid for any editing process. Some of them are given in this paper and comments are made on their implications for GS.

II. THE STUDY OF THE MACHINE EDITING OF WFS

1. WFS and its aims concerning editing

"The World Fertility Survey (WFS) conducted 42 large surveys in developing countries using complex schedules and did as much of the data processing as possible within the countries.

One of its major ambitions was to produce data of first-rate quality. This goal was sought in every phase of operation including questionnaire design, sample selection, interviewing, data processing and analysis of findings.

The aims concerning editing were:

- WFS should be a hallmark of professional survey research;

- WFS should serve as a vehicle to introduce modern editing to statistical offices in developing countries".

It means that experts on editing were involved in developing the program for every country. The checking and updating were wholly computerized. The reconciliation was performed manually, that is, a knowledgeable individual examined the original questionnaire alongside the error printout and wrote out a correction statement.

Hence the error identification and the subsequent adjustment operation were not automated. This could have been achieved by a GS. The WFS surveys fulfil all conditions for applying the completely automated options of GS. (Evidently, WFS can furnish excellent data for testing and evaluating GS under ideal conditions for this type of editing system.)

2. The study

Six countries were selected for this study, simply on the basis of the availability of early raw data files. It is claimed in the report that "if these countries are representative with respect to data quality, it is probably because their data tend to be poorer than average. That is, the effects of editing may tend to be somewhat exaggerated with that choice of countries".

The study consisted of comparing the dirty (unedited) and clean (edited) pairs of files in:

- diagnostic marginal distributions;
- two-way tables;
- fertility rates; and
- multivariate analyses.

3. The scope of the WFS study

The study is limited to the machine editing in WFS. This editing was preceded by two kinds of edits, which were entirely manually done:

- Field edits: Undertaken during the interview by the interviewer and his supervisor with the possibility of getting new information from the respondent.

- Office edits: Carried out by clerks before the data entry operation and consisting of coding and checking of answers.

Structural editing was excluded from the study. The report says: "It will be assumed that structural checking must always be done. If it is not done, then it is difficult to interpret the files properly".

However, structural editing cannot be done in existing GS.

Maybe it is not absolutely necessary to have an automated structural editing function in a GS. In the WFS study, all records with original structural errors were matched away from the files. It resulted in surprisingly low percentages of cases lost, namely 2.32, 1.51, 6.19, 0.36, 0.00 and 0.56 per cent for the six countries selected for the study.

Thus the dirty file consisted of data, which had been edited by field and office edits only. The clean files were the finally processed files, where all cases with structural errors had been matched away.

4. Findings

The most notable weakness in the dirty estimates was that the fertility rates were too low. This deficiency was traced to a feature of one particular program.

The rather elaborate logit regression and multiple regression differed surprisingly little between the dirty and clean files.

The multivariate analyses were relatively insensitive to the editing. Specifically, inferential changes about the magnitude of effects and their statistical significance are almost always less than differences between two independent and clean samples.

The cost of the machine editing is crudely estimated to be an average delay of approximately one year. Compared to the benefits, such delays are excessive.

My summary of these findings is that (i) the machine editing had no impact at all on the analysis, and (ii) the machine editing delayed every national survey by one whole year.

Furthermore, the study led to the following overall statement:

"Consistency is more desirable because of later data processing convenience than because of its value for analysis, but it should be achieved quickly and practically and never by referring back to the physical questionnaires".

The latter part of this statement is in accordance with the Fellegi-Holt methodology. What can be questioned is whether the automatic imputation methods are in accordance with the first part, which says that it does not matter which imputation methods are used.

III. OUTCOMES OF THE STUDY

1. A few concepts

To interpret and discuss the findings of the WFS study and to clarify the following comments and conclusions, a few concepts concerning errors and editing problems are reviewed. These concepts are discussed in detail in Granquist (1984).

Errors are classified either as negligence errors or as misunderstanding errors on the basis of two dimensions, number of cases and reason for the error. Misunderstanding errors are those affecting a number of records and are consequences of ignorance, poor training or misunderstanding, or are committed deliberately for some reason or other. All other errors are classified as negligence errors.

Generally, a negligence error is the result of carelessness by the respondent or by the survey process up to the editing phase. In a repetition of the survey, the same error should probably not be found for the same item of the same record (questionnaire).

In the processing phase of a survey, errors cause problems of two kinds, namely:

- they may distort the quality of the data, that is, the data may not meet the quality requirements;

- they cause problems in the processing of the data.

The first kind of errors is called quality errors and the second kind is called process trouble errors. A specific error may be a quality error as well as a process trouble error.

Bearing in mind that negligence errors sometimes affect the quality, it can be said that process trouble errors are likely to originate among the negligence errors and the anticipated misunderstanding errors, and that quality errors are likely to originate among the unknown misunderstanding errors.

2. The editing problem

Errors occur in every survey both in the collection phase and in the production phase. Furthermore, every particular survey has to meet some quality requirements, although these generally are not very precisely formulated. The inevitability of errors and the presence of quality requirements are traditionally dealt with by microediting, without first posing the question: what is the problem caused by errors in data and how should it be rationally solved?

3. The editing of WFS

The WFS study is focused on the WFS editing problem. The authors say: "The study has been motivated by conflicting reactions from users of WFS data. Some users are pleased that the data tapes are free from internal inconsistencies. Other users are frustrated by the length of time between the fieldwork and the emergence of the principal results (often more than three years and rarely less than two)."

The latter consider the editing task to consist of fighting process trouble errors and think that WFS spent too many resources and too much time on it (the study estimated the average delay at one year).

Editing in the WFS sense according to the Pullum paper is intended "to detect whether the various responses are consistent with one another and with the basic format of the survey instrument and to resolve any detected inconsistencies through adjustment. Editing is not properly described as the correction of errors, conversely a good many errors, of many kinds, will not even be touched by the editing process. (The point here is that there is no way of genuinely validating any of the responses)."

This is a perfect diagnosis of the objective and the effects of modern GS. That is why the evaluation of the machine editing in WFS can be considered as an evaluation of GS applied to a special type of survey.

GS editing should not be considered as a correction of errors and, consequently, use of a GS should not be motivated by quality reasons. The users of GS should be aware of this. A GS should be regarded as one facility or tool among others for solving the process trouble errors in the survey. This limitation of GS is discussed in Granquist (1983).

One very plausible explanation for the fact that there was no improvement in quality at all is that the data capture operation of WFS was extremely successful. The field edits removed all such errors (and a good many of the process trouble errors). A data capture editing procedure might have been a useful tool. If the editing problem was to fight process trouble errors (due to the quality of all survey operations of WFS), then the editing task could have been excellently solved by, for example, the Dutch system BLAISE (see Bethlehem et al (1987)). A GS should not be applied in such a case.

WFS uncritically adopted the following four statements concerning editing survey data:

(i) "Editing is generally believed to produce a gain in the yield of the fieldwork"

Major components of the cost of a survey are:

- the design of the questionnaire;
- the sample;
- interviewing (the field-work).

Editing is small compared to these; it will make more cases and specific items usable and will minimize the number of responses which must be excluded from analysis."

My comments: WFS is talking about removing formal errors (process trouble errors), which make cases and specific items unusable. The cost of editing may be small compared to the cost of carrying out a new survey, but nevertheless, the costs of editing are heavy. Concerning periodic surveys, editing as it is now carried out by our offices takes 20-40% of the survey budget.

(ii) "Editing is believed to improve the validity of the findings"

Estimates based on edited data will tend to be closer to the population values which they are intended to estimate. This is based on the belief that discrepancies in data tend to be systematic rather than random and will introduce a certain kind of bias."

My comments: It is essential to note that the most apparent weakness in GS and computer-assisted editing on the whole is just the handling of misunderstanding errors (systematic errors). This can be read in Hill (1978) and is discussed in Granquist (1983).

The only GS which has faced this problem seriously is the Spanish DIA. It has a solution for anticipated or known systematic formal errors, which is incorporated within the Fellegi-Holt methodology. However, it can be questioned whether this is a rational solution. The statement below concerning imputation methods in GS is valid also for this deterministic imputation method.

In WFS, there was no improvement of the quality at all, maybe because there were no systematic errors present in the dirty data files. However, there is no improvement to expect if the editing process is not focused on the detection of systematic errors (misunderstanding errors); see Granquist (1983).

(iii) "Editing improves the correspondence between the structure of the questionnaire and that of responses"

Internally consistent data greatly facilitate tabulation, even though the conclusions may not be affected."

(iv) "A user has more confidence in data which are internally consistent."

Only statements (iii) and (iv) are real benefits of editing in the sense it was carried out in WFS. However, it is very important to note that this does not necessarily mean that all inconsistencies have to be removed. It is sufficient to remove the errors which obstruct the tabulation and are not otherwise detectable by a user.

An explanation of the excessive time spent on editing in WFS may be that the ambition was to remove all format errors. Everything that could be checked was checked. Please note that this mistake is committed in every editing process. Every GS is designed to execute all possible checks in order to meet any requirement of the user, who is encouraged to use as many checks as he believes necessary. This has caused every GS to be very complicated and very expensive to run. Subject-matter specialists cannot use the software without considerable help from EDP-specialists.

The mistake of too many checks applies even more to periodic business surveys, the scope of numeric GS. There are many other possibilities of checking, due to the existence of relations (correlations) between certain variables (items) and the possibility of making comparisons (in general, "differences" or "ratios" are used as checks) with data from earlier periods.
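As an illustration of the kind of period-to-period ratio checks mentioned above, a minimal sketch follows; the acceptance bounds and the data layout are assumptions made only for the example, not taken from any particular survey.

def ratio_check(current, previous, lower=0.5, upper=2.0):
    # Flag the item as suspicious when the period-to-period ratio falls
    # outside the predetermined acceptance bounds.
    if previous == 0:
        return current != 0          # any change from a zero base is queried
    return not (lower <= current / previous <= upper)

# Hypothetical example: a value reported by one unit in two consecutive periods.
print(ratio_check(current=480, previous=150))   # True  -> flagged for review
print(ratio_check(current=160, previous=150))   # False -> accepted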

"When viewed in the context of sampling errors, data entry errors, and other non- sampling errors, the significance of sophisticated editing procedures appears diminished. One might even argue that meticulous editing gives the user a false sense of confidence in the data. (It is difficult, however, to assess the improvement in data quality from editing, relative to the total survey error, because we do not know the magnitude of the total survey error)".

However, the most serious problem is that numerous checks in editing processes convey an illusory confidence in the quality of data to the survey staff.

4. Is editing in the WFS or GS sense necessary?

The importance, whether psychological or more analytic, of agreement among subtotals, etc. cannot be ignored. Especially for advanced and comparative analysis, it is indeed important that the data be clean. Data users expect them to be clean, and if they are not, users will embark on their own cleaning exercises. One cannot deny the consensus among statisticians, users and data processors that data files should be free of inconsistencies.

The study accepts that it is indeed desirable to run edit checks using software and specifications prepared in advance and to achieve consistency in the data even though this consistency may have a negligible impact on the analysis. The issue, then, is not whether the data should be internally consistent but how this condition will be achieved.

The study indicates that machine editing can and should be done with greater efficiency.

The main recommendation is that the potential difficulty of achieving consistency should be anticipated and a strategy be developed well in advance, dependent on the circumstances.

These findings seem to justify generalized software for complete automatic editing according to the Fellegi-Holt methodology.

This becomes an issue of cost-benefit analysis. The cost of using a particular GS may be calculated relatively easily. The benefit is of course more difficult to estimate. From the study, we know that the editing may not have any impact on the quality. The dirty and the edited files gave the same results. This means that any reasonable imputation method will remove inconsistencies. The only condition is that the method should be very fast and cheap.

The methods for imputation in modern GS seem to be far too sophisticated in the light of the results of the WFS. This, in combination with my own conclusion from the study that too many checks were used, as in all other editing processes, leads to the conclusion that such generalized software is too sophisticated for cleaning statistical data files of formal errors.

This paragraph can be summarized as follows:

- it is necessary to remove certain errors from the data files;
- every check has to be justified by the error it is intended to detect, and this error has to be one which a user studying the published tables could detect or which may cause trouble in the further data processing;
- imputations should be very simple to permit fast and cheap computer runs.

5. WFS experiences applied to quantitative editing

There are three additional problems concerning the editing of periodic business surveys compared to interview surveys with mainly qualitative data, namely:

- single errors may have an impact on the estimates;
- the checks on quantitative data may only indicate suspicious data, some of which may be correct;
- it is possible to go back to the respondent for verification of suspicious data.

In WFS, the editing cost tens of thousands of dollars a year per country and above all prolonged each survey by one full year. According to the Pullum paper, the main reason is the time it took to handle the error messages by going back to the physical questionnaires.

This is only the first step to verify suspicious data in a typical periodic business survey editing process. Very often, this has to be followed by a contact with the respondent, which is very expensive. This should imply that the checking system be tuned carefully to the data to be edited. Generally, this is not done.

At Statistics Sweden and in almost every statistical office, the checking system is designed according to:

- the mass checks principle;
- the safety first principle.

This implies that the system has too many checks which identify far too many data as suspicious. Besides, most of the errors found are of no importance to the estimates.

These facts explain why the results from the WFS study may be so important. It discovered that editing did not change the estimates. Translated to a typical well-planned periodic survey, it means that there are very few errors which may have an impact on the quality (given that there are no systematic errors present in the raw data file).

Thus, if there are only a few important errors in a data file, the editing process should be designed in a manner appropriate to those errors, and not as if there were lots of errors. Certainly, it should not be based on possible checks. In that case, it is not justified to use the automatic versions of numeric GS.

6. Drawbacks of GS in economic surveys

A GS has three features which distinguish it from an error detecting system (ED), that is, a system with only manual reconciliation. These features are:

- the rule analyzer;
- the automatic procedure;
- the formalized checks.

The role of the rule analyzer is to guarantee that the automatic "correction" procedure always imputes values which permit the record to pass all the editing checks. This principle is essential in the Fellegi-Holt methodology, but causes the GS to be very complicated and imposes restrictions on the possibilities to detect errors other than formal errors. This is because only "domain" checks can be used. An observation on an item has to belong to the domain of the variable (validity check) and to the multivariate domain of any combination of any of the variables of the statistical unit (consistency check). In numeric GS, much work is spent on minimizing this multivariate domain of all items of a survey.

A trade-off of the rule analyzer is that it can check the editing rules against certain inconsistencies. However, this is of very limited value because, in practice, the checks are formulated with sufficient care; this is discussed in detail in the BOC annual conference report (1987).

The automatic correction procedure determines which fields to impute and then executes the imputation. The procedure may be of value only in surveys in which the ingoing quality of data is high. Systematic errors should be sparse and generally anticipated.

The checks have to be formulated in a special way, but this is not a severe restriction for the user. Much worse is that only formal errors can be handled (because of the rule analyzer). In periodic economic surveys, it is probably necessary to point out suspicious data. Serious errors may be located within the multivariate domain of the variables.

In brief, the rule analyzer and the automatic correction procedure are not needed in well-planned surveys and impose restrictions on the detection of certain kinds of serious errors in economic surveys.

7. Macro editing, an alternative to GS

The merit of GS is that they can reduce the resources spent on editing. However, GS are not efficient tools for every kind of survey. Alternatives are needed, in particular for periodic business surveys. Such an alternative is macro editing.

Generally, error detecting programs, like the systems used in WFS, are used for the editing of economic surveys. The basic problem of all traditional record-by-record editing is to discern those observations which contain the most important errors. Such procedures exclude any possibility of assessing the importance of an error at the moment the error messages are handled. Every flagged observation has the same weight and claims about the same amount of resources irrespective of the importance of the suspected error. Many errors have a negligible impact on the estimates because the magnitude of the error is small or the errors cancel out.

The only methods to solve those problems are:

- to construct checks more sensitive to observations of high relative importance than to observations of low relative importance;
- to tune the whole system of checks very carefully to the data to be edited.

But even in cases where efforts have been made to focus on potentially important errors, there are clear indications that too many resources are spent on editing in comparison with the outcome. Indeed, such endeavours have improved the editing process substantially, but they are far from sufficient.

Granquist (1987) discussed this in detail in relation to a macro-editing procedure (MAEP) applied to the delivery and order-book situation survey. This MAEP was found superior to the traditional editing system, which had been tuned to the material as well as possible. The principal idea behind MAEP is to study the observations which have the greatest impact on the estimates.

MAEP has been constructed as an interactive menu programme written in APL. There are three functions to select the records to be studied, namely:

- the highest positive changes;
- the highest negative changes;
- the highest contributions.

These functions can be applied to the total (the whole industry) and each of the 38 branches. For a selected function and domain of study, the screen shows the 15 records of the indata file which have the highest value (weighted) of the selected variable, sorted top down.

Having all the top 15 records on the screen, the user can select any of the records shown with all its contents to find out if an error has been committed. If one is identified, he can up-date the record directly on the screen and immediately see the effects.
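The selection logic can be illustrated with a small sketch; this is not the original APL menu programme, and the record layout, field names and weights are assumptions made for the example.

def top_down(records, function, branch=None, n=15):
    # Show the n records with the highest weighted value of the selected
    # function, sorted top down, optionally restricted to one branch.
    if branch is not None:
        records = [r for r in records if r["branch"] == branch]
    keys = {
        "contribution":    lambda r: r["weight"] * r["value"],
        "positive_change": lambda r: r["weight"] * (r["value"] - r["previous"]),
        "negative_change": lambda r: r["weight"] * (r["previous"] - r["value"]),
    }
    return sorted(records, key=keys[function], reverse=True)[:n]

# Hypothetical indata: three responding units in two branches.
indata = [
    {"id": 1, "branch": 12, "weight": 4.0, "value": 900, "previous": 300},
    {"id": 2, "branch": 12, "weight": 1.5, "value": 120, "previous": 115},
    {"id": 3, "branch": 7,  "weight": 2.0, "value": 80,  "previous": 400},
]
for record in top_down(indata, "positive_change"):
    print(record["id"], record["weight"] * (record["value"] - record["previous"]))

The interactive updating of a selected record and the immediate recalculation of the totals described above would sit on top of such a selection step.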

Another method which has proved to be usable is to apply an error-detecting system (EDIT-78) twice. First, checks are run on the aggregates and then on the records of the file with the suspicious aggregates.

In Hidiroglou et al (1986), a method called "statistical edits" is described. It is intended to be included in the new GS which Statistics Canada is developing, described in detail in Giles (1987). A GS based mainly on such "statistical edits" (a better word for macro editing in this context) may be a good solution for focusing the editing on important errors in well-planned surveys.

IV. FINAL REMARKS

GS do not solve any quality problems. They can only clean up the data files. However, this is done to an unnecessary extent and too sophisticated methods for imputation are generally used. The software is too complicated to develop and too difficult and expensive to use in relation to what can really be accomplished. This is very much due to an unconscious adoption of the mass checks approach to editing and to features of the Fellegi-Holt methodology. This software encourages checking of all that can be checked instead of focusing the editing process on the detection and handling of important errors. In the way such systems are used, they give the producer an illusory confidence in the quality of the data and the user of the survey a false sense of confidence in the data.

According to the experiences of WFS and other evaluation programs, such generalized software is justified if:

- it can be used by subject-matter specialists without help from EDP-specialists;
- it is cheap to run;
- it is used with common sense, that is, only necessary checks are included;
- the user of the software is aware of its limitations concerning the improvement of the quality of the data.

In other words, this kind of editing (that is, the cleaning operation) should be considered as the first step in an editing operation focused on misunderstanding or systematic errors.

All editing processes are designed as if there are numerous errors present in the data. They should instead focus on important errors and not on possible checks.

An alternative to the discussed numeric edit and imputation approach for the cleaning of data of periodic economic surveys is to use the above outlined macro-editing methods in combination with carefully selected checks on the record level.

EVALUATION OF DATA EDITING PROCEDURES: RESULTS OF A SIMULATION APPROACH

by Emiliano Garcia Rubio and Vicente Peirats Cuesta, National Statistical Institute of Spain

Abstract: There is a general controversy about the impact of editing on the data quality. Arguments for and against editing are widely used; however, there are not many studies to support any evidence. The present paper describes one of those studies. It has been conducted within the frame of the Data Editing Joint Group (DEJG) of the ECE/UNDP Statistical Computing Project, which had, as an important task, the production of information about the suitability of the different types of editing.

I. INTRODUCTION

The study presented here compares two editing approaches of a data file subject to errors by means of different experiments. These approaches are:

- Record Level Imputation (RLI): Based on the Fellegi and Holt methodology, RLI claims to take advantage of the existing relationship among the record variables. As variables in a questionnaire are generally correlated, the value of one variable has "some information" on the values of other variables. That is, there is some redundancy in each questionnaire. The Fellegi and Holt methodology, and other imputation strategies, try to take advantage of this redundancy in order to edit the records; and

- Weight Adjustment at the Aggregated Level (WAL): WAL consists of the expansion of the "good records" as an estimate of the "total records". This approach is based on the assumption that the distribution of the "good" records is the same as the distribution of the "total records" (a small illustrative sketch follows below).
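The WAL idea can be sketched as follows; the counts and the variable are hypothetical and serve only to show the expansion of the good records to the size of the full file.

def wal_estimate(good_counts, n_total):
    # Expand the code counts observed in the edit-passing ("good") records to
    # the total number of records, assuming both share the same distribution.
    n_good = sum(good_counts.values())
    return {code: n_total * count / n_good for code, count in good_counts.items()}

# Hypothetical example: counts of a two-code variable among the good records.
print(wal_estimate({"1": 4770, "6": 5220}, n_total=10300))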

In this study, only "qualitative data" have been used and only "random errors" have been considered. The variables are classified in two types:

- "flow variables", that is, the variables whose values determine the flow of the questionnaire, and, in this sense, are control variables; and

- "semantic variables", that is, those irrelevant for the flow of the questionnaire. 53

The experiments aim to verify the hypothesis that for "qualitative data" and "random errors", assuming redundancy among the questionnaire variables, the estimates obtained with RLI are better than the estimates produced with WAL.

The paper is divided into four parts. The first part introduces the methodology involved. The second part describes the files and the steps of the experiment. The third part presents the results of the experiment. The fourth part presents concluding remarks.

II. METHODOLOGY

In an experiment about editing and errors, we must define what we consider erroneous data and what the possible types of errors are.

Erroneous data are the data rejected by any type of edit rule. We say that the rejected data contain either a "random error" or a "systematic error". Although the distinction is not clear-cut, we say that "random errors" are those errors caused by a sum of small undetermined factors; they are well described by an error model and only slightly affect the data distributions. Systematic errors, on the other hand, may be caused by well determined mechanisms and may bias the data distributions.

Here we only deal with random errors, which were added to the study data in the following way:

Let us denote by A(1),...,A(i),...,A(r) the valid codes of the variable A and by N the number of records. For each field, A in our example, we chose an a priori global error level εA, for instance 2%, 5% or 10%. We also randomly chose the records to be modified. That is, for an expected number of N·εA records, we replaced at random the previous code A(i) by A(j), (j≠i), or by X (a generic invalid code). We applied this process to each record field or to a selected subset of the records.
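A minimal sketch of this error mechanism is given below; the codes, the error level and the random seed are illustrative assumptions.

import random

def add_random_errors(values, valid_codes, eps, invalid="X", seed=1):
    # With probability eps, replace the recorded code by another valid code
    # A(j), j != i, or by the generic invalid code X, chosen with equal
    # probabilities; otherwise keep the value unchanged.
    rng = random.Random(seed)
    perturbed = []
    for v in values:
        if rng.random() < eps:
            perturbed.append(rng.choice([c for c in valid_codes if c != v] + [invalid]))
        else:
            perturbed.append(v)
    return perturbed

clean = ["1", "1", "6", "1", "6", "6", "1", "1"]
print(add_random_errors(clean, valid_codes=["1", "6"], eps=0.25))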

Prior to adding the errors, the procedure for building the file of "good records" was:

- to define a set of conflict rules and valid codes for each variable.

- to apply these rules to a file and to remove the records detected with erroneous data.

General observations on the experiment are:

Firstly, the way of selecting the code to modify a variable A depends on what we want to evaluate. 1) If no information is held on the type of errors of variable A, then we change A(i) to A(j) (j≠i) or to X with probability 1/r. 2) If we know something about the possible type of errors, then we choose a priori probabilities:

Σ(j≠i) εi(j) + εi(X) = εA

where:
εi(j) is the probability of changing the code A(i) to A(j),
εi(X) is the probability of changing the code A(i) to the generic invalid code X, and
εA is the error level for variable A.

Secondly, the same rules used to obtain the "good records" file will be used to check the simulated file.

Thirdly, it is clear that any editing procedure does not totally detect the "errors" added to the file. Given this fact, in the analysis of the results we can distinguish between detected and not detected errors.

To close this section, we now describe indicator Φ, used to compare the two imputation strategies.

It is a comprehensive indicator of the distance between two relative distributions for the codes of variable A:

Φ = Σ(i=1,...,r) |f(A(i)) - g(A(i))|

where f(A(i)) is the relative frequency of the code A(i) in the first distribution and g(A(i)) in the second. In this study, only the marginal distributions have been considered.
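A small sketch of this indicator follows; the example uses the per-1000 frequencies reported later for the variable SEX, purely as an illustration.

def phi(f, g):
    # Sum of absolute differences between two relative distributions of the
    # codes of one variable.
    codes = set(f) | set(g)
    return sum(abs(f.get(c, 0) - g.get(c, 0)) for c in codes)

good   = {"1": 484, "6": 515}   # frequencies (per 1000) in FILE.GOOD
edited = {"1": 483, "6": 516}   # frequencies (per 1000) in FILE.EDITED
print(phi(good, edited))        # 2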

For each file, we also considered the total number of records and the relative marginal distributions of the valid and invalid codes of each field.

III. DESCRIPTION OF THE EXPERIMENT, FILES AND STEPS

1. The experiment files

In this study, two different surveys and files were considered:

a) - The EPA file, a file with data from the Spanish labour force survey, "Encuesta de Poblacion Activa" (EPA). EPA is a complex survey containing a hierarchical file edited in several steps, which combines both manual and automatic procedures. DIA1, a generalized software package for qualitative data editing, is used in the automatic imputation. For this study, we selected the step in which the most important variables and the "auxiliary variables" are edited using DIA. (The "auxiliary variables" are variables that substitute blocks of real variables of the questionnaire in order to edit the flow of the questionnaire.)

1 DIA is a generalized software for data editing of qualitative data developed by INE, Spain.

Figure 1 presents the variables in the EPA file (Only the variables relevant for the study are described).

Figure 1

VARIABLE CODES DESCRIPTION (*)

AGE       16-99     Age in years
SEX       1,6       Sex (1: male, 6: female)
NAC1      1,6       Nationality (Spanish or not)
SILA      B,1,6     Occupational status
SERVM     B,1,6
ESPEX     B,1,6
FUNEX     B,1,6
NOTRAB    B,1,6
BUSCA     B,1,6     If the interviewee is looking for another job
RZBUS     B,1,5     Why is he (she) looking for another job
OFEMP     B,1,3
CICLO1    1,6
AUFIN     0,1,6     Auxiliary var.: block for foreigners
AUESTU    0,1       Auxiliary var.: block for studies
AUOCUP    0,1,6     Auxiliary var.: block for characteristics of the job
AUF       0,1,6     Auxiliary var.: block for people without job
AUG       0,1,2,3   Auxiliary var.: block for people looking for a job
AUSIDI    0,1
AUMUN1    0,1

(*) = Blank

To add flexibility to the study, we decided to create an artificial file, called MEPA.

b) - The MEPA file. It is a file with a subset of the EPA variables, and the codification of some of them has been simplified. We built an artificial file, which initially had all the possible combinations of the variable codes (the cartesian product: 202.500 records).

Figure 2 presents a description of the variables and codes.

Figure 2

VARIABLE CODES DESCRIPTION

AGE     1-9     Age in decades
SILA    1-3     Occupational status; the meaning of the codes is:
                -1-: working
                -2-: not working and looking for a job
                -3-: not working and not looking for a job
OCUP    B, 1-9  Occupation. The codes have no meaning. (The idea is to use it in the different experiments with more or less redundance with other variables.)
ACT     B, 1-9  Branch activity of the job. The use of the codes is here similar to the variable OCUP
BUSCA   B, 1-6  If he (she) is looking for another job
                -1-: Yes
                -2-: No
RZBUS   B, 1-4  Why is he (she) looking for another job
                -1-: to improve the current job
                -2-: as a complement to the current job
                -3-: because the current job is unstable
                -4-: other reasons
TYPE    B, 1-4  What type of job is he (she) looking for
                -1-: full-time
                -2-: part-time
                -3-: of another type
                -4-: any of them

Figure 3 presents the flow of the MEPA questionnaire

Figure 3

Variable    Answer : Action

AGE
SILA        (2) : Jump to TYPE     (3) : End of the survey
OCUP
ACT
BUSCA       (6) : End of the survey
RZBUS
TYPE

2. The experiment steps

Each experiment had the following steps:

- Step 1 - Creating the "consistent file" (FILE.GOOD)

Initially we had an ORIGINAL.FILE. As we said before, the EPA file was a real quarterly file, and the MEPA file was a simulated file with all possible combinations of the codes. For each experiment, the ORIGINAL.FILE was processed against the editing rules with DIA, obtaining a file without errors. This is FILE.GOOD, which contains the "correct data", that is, the data we tried to approach in our estimations;

- Step 2 - Producing an "erroneous file" (FILE.ERROR)

We introduced controlled errors in FILE.GOOD, as described in chapter II, and we obtained the FILE.ERROR. Generally, we selected the erroneous codes with proportional probabilities;

- Step 3 - Detection of errors in "FILE.ERROR"

The FILE.ERROR was processed against DIA. The program produces the following two files:

- FILE.GOOD2 with the error-free records from FILE.ERROR. This file was used to obtain the marginal distributions, that we call WAL. (Weight Adjustment at Aggregated Level); and

- FILE.ERROR2 with the erroneous records from FILE.ERROR.

- Step 4 - Imputation of FILE.ERROR2

FILE.ERROR2 was imputed using DIA and the file resulting from this imputation was appended to FILE.GOOD2 obtaining FILE.EDITED. This file was used to obtain the marginal distributions RLI estimates.

- Step 5 - Computing the Indicator Φ for WAL and RLI

From the distributions of the variables in FILE.GOOD (step 1), and the distributions of the variables in FILE.GOOD2 and FILE.EDITED (steps 3 and 4), we computed the indicator Φ for WAL and RLI:

Let's denote by F the marginal distribution of a variable (A in our example), in the FILE.GOOD, by F1 the same in FILE.GOOD2, and by F2 in the FILE.EDITED. Then Fi is the marginal frequency of the code A(i). If we denote:

D1i = |Fi - F1i|
D2i = |Fi - F2i|

then the Φ indicators are obtained by:

ΦWAL = Σ(i=1,...,r) |Fi - F1i| = Σ D1i

ΦRLI = Σ(i=1,...,r) |Fi - F2i| = Σ D2i

Comparing the indicator values ΦWAL and ΦRLI, we say that the smaller the indicator value, the better the editing method (because when the indicator is small, the distance between the estimation and the true data is also small).

IV. RESULTS OF THE EXPERIMENTS

1. EPA File:

Step 1: Once the EPA.ORIGINAL.FILE had been processed with DIA, we obtained the EPA.FILE.GOOD with 139.796 records.
Step 2: After introducing errors, we had the EPA.FILE.ERROR.
Step 3: We detected errors with DIA, obtaining: EPA.FILE.GOOD2, with 136.580 records, and EPA.FILE.ERROR2, with 3.216 records.
Step 4: We imputed the erroneous records with DIA and appended them to those of EPA.FILE.GOOD2. The resulting file is EPA.FILE.EDITED. We obtained the RLI estimators from this file.

The following tables present the indicators obtained for WAL and RLI in EPA:

FREQUENCIES AND INDICATORS IN EPA

VAR.1: SEX

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1      484      477    7      483    1
6      515      522    7      516    1
              ΦWAL=14        ΦRLI=2

Where:

X  : Invalid code
B  : No answer (blank code)
F  : Frequency (per 1000) in the EPA.FILE.GOOD
F1 : Frequency (per 1000) of the good data in the EPA.FILE.GOOD2 (this is for WAL)
D1 : Difference between F and F1
F2 : Frequency (per 1000) of the edited data in the EPA.FILE.EDITED (this is for RLI)
D2 : Difference between F and F2

ΦWAL = Σ D1i

ΦRLI = Σ D2i

VAR.2: NAC1

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1      998      998    0      995    3
6        1        1    0        4    3
               ΦWAL=0        ΦRLI=6

VAR.3: CICLO1

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0        0        0    0        2    2
1     1000     1000    0      997    3
               ΦWAL=0        ΦRLI=5

VAR.4: AUESTU

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1        0        0    0        0    0
6      998     1000    2      999    1
               ΦWAL=2        ΦRLI=1

VAR.5: AUSIDI

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0       10       10    0       10    0
1      989      989    0      989    0
               ΦWAL=0        ΦRLI=0

VAR.6: AUMUN1

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0     1000     1000    0      997    3
1        0        0    0        2    2
               ΦWAL=0        ΦRLI=5

VAR.7: SERVM

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      991      991    0      991    0
1        0        0    0        0    0
6        8        8    0        8    0
               ΦWAL=0        ΦRLI=0

VAR.8: ESPEX

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      998      998    0      998    0
1        1        1    0        1    0
6        0        0    0        0    0
               ΦWAL=0        ΦRLI=0

VAR.9: FUNEX

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      998      998    0      998    0
1        0        0    0        0    0
6        1        1    0        1    0
               ΦWAL=0        ΦRLI=0


VAR.10: NOTRAB

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      985      991    6      975   10
1        0        0    0        0    0
6       13        8    5       23   10
              ΦWAL=11       ΦRLI=20

VAR.11: BUSCA

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      607      612    5      607    0
1       13       13    0       13    0
6      379      374    5      379    0
              ΦWAL=10        ΦRLI=0

VAR.12: AUFIN

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0      998      998    0      995    3
1        1        1    0        3    2
6        0        0    0        0    0
               ΦWAL=0        ΦRLI=5

VAR.13: AUOCUP

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0      606      611    5      606    0
1      392      387    5      392    0
6        0        0    0        0    0
              ΦWAL=10        ΦRLI=0

VAR.14: AUF

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0      403      398    5      403    0
1       92       74   18       93    1
6      503      527   24      503    0
              ΦWAL=47        ΦRLI=1

VAR.15: OFEMP

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        3        4    1        5    2
1       22       19    3       23    1
2       79       69   10       80    1
3      894      905   11      890    4
              ΦWAL=25        ΦRLI=8

VAR.16: AUG

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
0      390      385    5      390    0
1      503      527   24      503    0
2        0        0    0        0    0
3      105       87   18      105    0
              ΦWAL=47        ΦRLI=0

VAR.17: RZBUS

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B      986      986    0      986    0
1        3        3    0        3    0
2        5        5    0        5    0
3        0        0    0        0    0
4        2        2    0        2    0
5        0        0    0        0    0
               ΦWAL=0        ΦRLI=0

VAR.18: SILA

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1        8        8    0        8    0
2      378      379    1      368   10
3        0        0    0        0    0
4        1        1    0        1    0
5       14        8    6       24   10
6       92       73   19       92    0
7       23       20    3       19    4
8      480      506   26      483    3
              ΦWAL=55       ΦRLI=27

VAR.19: AGE

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
16      21       21    0       21    0
17      22       22    0       22    0
18      21       21    0       21    0
19      22       22    0       22    0
20      23       21    2       23    0
...    ...      ...  ...      ...  ...
99       0        0    0        0    0
              ΦWAL=31        ΦRLI=6

A summary of the indicators obtained for the most significant variables is given hereafter:

TABLE 1

VARIABLE   SEX   ...   NOTRAB   BUSCA   ...   AUG   ...   SILA   AGE   TOTAL
ΦWAL        14   ...       11      10   ...    47   ...     55    31     252
ΦRLI         2   ...       20       0   ...     0   ...     27     6      86

The column TOTAL is the sum of the indicators over all variables. The tables show an increase in data quality when RLI is used (ΦRLI = 86 < ΦWAL = 252).

Nevertheless, the result is not uniform if we analyze it for each variable in the study: for some of the variables there is an obvious increase in quality (for instance SEX, BUSCA, AUG); for other variables there is a decrease in quality (e.g. NOTRAB). To discover the reason for these differences in behavior, we repeated the experiment with the artificial survey: MEPA.

2. MEPA file:

Experiments with the EPA file suggested the idea that, in theory, it is possible to distinguish two kinds of variables:

64

- FLOW-VARIABLES. They are variables whose answer determines the questionnaire flow. A good example is the SILA variable in the MEPA questionnaire.

- NON-FLOW-VARIABLES, or semantic variables, whose answer is irrelevant to the questionnaire flow. For instance, the OCUP variable.

Our hypothesis is that the internal redundancy of the records can help improve the results of the imputations when we are treating "random errors". In this sense, an error in a "flow variable" can easily be detected and imputed thanks to the redundancy with the rest of the "routing information".

For semantic variables, the situation is not so clear. Let us imagine an error in a typical "non-flow variable", detected because its code is incompatible with another variable. In such a case the incompatibility rule detects the incompatible variables but not exactly which one is erroneous.

Given this fact, we were interested in testing the impact of imputing or not imputing the errors depending on the kind of variable. This was another reason for designing a fictitious file: to better analyze both kinds of variables.

Two experiments were carried out with the MEPA file.

a) First experiment: with incomplete control of the routing

This first experiment was conducted with a set of edits where two of the edits needed to control the questionnaire were missing.

Edits used:

A/ SKIP PATTERN EDITS (EDITS CONTROLLING THE ROUTING)

SILA(1)OCUP(B)
SILA(¬1)OCUP(¬B)
OCUP(¬B)ACT(B)
ACT(¬B)BUSCA(B)
ACT(B)BUSCA(¬B)
BUSCA(1)RZBUS(B)
BUSCA(¬B)RZBUS(¬B)
RZBUS(¬B)TYPE(B)
SILA(2)TYPE(B)

B/ SEMANTIC EDITS

AGE(1)OCUP(9)
OCUP(1)ACT(¬1,2)
OCUP(2)ACT(2)
OCUP(3-5)ACT(¬3-5)
RZBUS(2)TYPE(1)

With the initially specified set of edits, the MEPA.ORIGINAL.FILE only had 11.385 GOOD RECORDS. These records were repeated 10 times in order to produce a MEPA.FILE.GOOD with 113.850 records.

Here, errors were introduced at random with different probabilities for the different variables. After following the steps described above, we got the results to be analyzed. Table 2 and Table 3 present the related summary. For variables such as AGE, whose redundancy is small and whose level of error was low, there was a small increase in quality with RLI (when the errors are imputed). Surprisingly enough, this was not the case for the other variables in the example, such as SILA and BUSCA.

- SILA has a slight decrease in quality (7 against 5).
- BUSCA has an even more important decrease (12 against 4).

TABLE 2

VARIABLE   AGE   SILA   BUSCA
ΦWAL         7      5       4
ΦRLI         4      7      12

TABLE 3 shows the errors in each variable: the level, the number of errors and the number of imputations.

TABLE 3

VARIABLE   ERROR LEVEL   NUMBER OF ERRORS   NUMBER OF IMPUTATIONS
AGE             4%              4681                  631
SILA            6%              6867                 6738
BUSCA           4%              4613                 4251

The conclusion after this experiment was that the Record Level Imputation produced worse estimates than the Weight Adjustment at the Aggregated Level.

As this result was in clear contradiction with our previous hypothesis and results, we kept on experimenting to confirm, or not, the unexpected results. First we checked the process, trying to detect any error. This led us to the discovery that two SKIP PATTERN RULES were missing in the test.

b) Second experiment: with complete control of the routing

After including the missing skip pattern rules, the rules for this second experiment were:

A/ SKIP PATTERN EDITS

SILA(1)OCUP(B)
SILA(¬1)OCUP(¬B)
OCUP(¬B)ACT(B)
OCUP(B)ACT(¬B)   (missing before)
ACT(¬B)BUSCA(B)
ACT(B)BUSCA(¬B)
BUSCA(1)RZBUS(B)
BUSCA(¬1)RZBUS(¬B)
RZBUS(¬B)TYPE(B)
SILA(2)TYPE(B)
SILA(¬2)RZBUS(B)TYPE(¬B)   (missing before)

B/ SEMANTIC EDITS

AGE(1)OCUP(9)
OCUP(1)ACT(¬1,2)
OCUP(2)ACT(2)
OCUP(3-5)ACT(¬3-5)
RZBUS(2)TYPE(1)

1) After introducing the new set of rules, only 7.821 records out of 202.500 were "correct records" (in the previous experiment 11.385 were correct). These correct records were repeated to produce a file with acceptable frequencies. The file so generated, MEPA.FILE.GOOD, had 81.360 records.

2) The errors were produced similarly as in the previous experiments. In several trials we changed the error level of the variables. The results of the experiment, with error levels of 8% for SILA and ACT and 2% for the other variables, were:

VAR.: AGE

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1       95       95    0       94    1
2      113      113    0      112    1
3      113      112    1      112    1
4      113      113    0      113    0
5      113      113    0      113    0
6      113      112    1      112    1
7      113      113    0      113    0
8      113      113    0      113    0
9      113      111    2      113    0
               ΦWAL=4        ΦRLI=4

VAR.: SILA

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B        0        0    0        0    0
1      955      959    4      955    0
2       26       24    2       26    0
3       17       16    1       17    0
               ΦWAL=7        ΦRLI=0

VAR.: OCUP

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B       44       40    4       44    0
1       35       34    1       35    0
2      141      139    2      139    2
3       53       51    2       51    2
4       53       51    2       52    1
5       53       51    2       52    1
6      159      162    3      160    1
7      159      161    2      161    2
8      159      162    3      160    1
9      141      143    2      142    1
              ΦWAL=23       ΦRLI=11

VAR.: ACT

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B       44       40    4       44    0
1      104      102    2      102    2
2       86       97   11       98   12
3      139      134    5      134    5
4      139      137    2      136    3
5      139      153   14      152   13
6       86       84    2       84    2
7       86       84    2       83    3
8       86       81    5       80    6
9       86       84    2       83    3
              ΦWAL=49       ΦRLI=49

VAR. : BUSCA

COD      F       F1   D1       F2   D2
X        0        0    0        0    0
B       44       40    4       44    0
1      896      899    3      895    1
6       59       59    0       59    0
               ΦWAL=7        ΦRLI=1

TABLE 4 shows the number of errors and imputations for some variables in this example.

TABLE 4

VARIABLE   ERROR LEVEL   ERRORS   IMPUTATIONS
AGE             2%         1651           232
SILA            8%         6504          6495
OCUP            2%         1588          1078
ACT             8%         6620          1832
BUSCA           2%         1577          1582

TABLE 5 shows a summary of the indicators.

TABLE 5

VARIABLE   AGE   SILA   OCUP   ACT   BUSCA
ΦWAL         4      7     23    49       7
ΦRLI         4      0     11    49       1

Thus, we can state an improvement in data quality with RLI (that is, RLI's indicators are smaller than WAL's). This is especially clear for SILA and BUSCA, typical "flow-variables".

However, as far as OCUP and ACT are concerned, the results are not as clear. First, if we examine the rules, we see that these "non-flow-variables" are highly related to each other. That is, the information necessary to detect the error and impute one of them must be taken from the other. Second, the ratio:

R = NUMBER OF IMPUTATIONS / NUMBER OF ERRORS is:

ROCUP = 0.7 (approximately) and RACT = 0.3; consequently OCUP has been imputed much more frequently than ACT. OCUP has undergone a partial increase in quality but ACT none at all.

Some preliminary conclusions could be the following:

- when the errors occur in variables such as SILA, which is an example of strong redundancy, it is easy to improve their quality through the related variables.

- when the errors occur in variables such as ACT, only partially related with one or two other variables, the increase in quality is not so clearly guaranteed because the minimum change principle may select either an erroneous variable or a correct variable for imputation.

V. FINAL CONCLUSIONS

The results of this experiment confirm that there is an increase in data quality when applying automatic imputation (based on the Fellegi and Holt methodology) to random errors in surveys such as those used in these experiments.

Keeping in mind that:

- The increase in quality depends on the redundancy of the data. If there is none at all, there is no support for the imputations. In that case, in theory and more or less in practice too, the results with and without imputation should be similar.

- The flow of the questionnaire introduces some redundancy.

- The quality of the whole editing process matters. If the imputations are not carefully made, that is, with good tools and well specified rules, the result of imputation can be a decrease in data quality.

- The efficiency of the process is measured in terms of costs of time and resources in general. It is necessary to consider that using the good data as an estimator is cheaper and quicker. However, a well designed automatic imputation procedure can result in a very efficient process.

After this experiment we feel the need for other studies comparing the quality impact and the cost of different types of imputation: manual versus automatic imputation, a combination of both methods, etc. The indicators and the whole procedure described here could be a useful approach to conduct and analyze the results.

A SYSTEMATIC APPROACH TO AUTOMATIC EDIT AND IMPUTATION2

by I.P. Fellegi, Statistics Canada and D. Holt, University of Southampton, United Kingdom

This article is concerned with the automatic editing and imputation of survey data using the following criteria.

1. The data in each record should be made to satisfy all edits by changing the fewest possible items of data (fields);

2. As far as possible the frequency structure of the data file should be maintained;

3. Imputation rules should be derived from the corresponding edit rules without explicit specification.

A set of procedures is developed for identifying the fewest number of fields which may be changed to make the resulting data satisfy all edits. Several methods are proposed for determining the specific values to be imputed into each field.

I. INTRODUCTION

The last stage of the processing of survey data, just prior to tabulations and data retrieval, usually involves some form of editing. By editing we mean

A. the checking of each field of every survey record (the recorded answer to every question on the questionnaire) to ascertain whether it contains a valid entry; and

B. the checking of entries in certain predetermined combinations of fields to ascertain whether the entries are consistent with one another.

Checking Type A may involve determining whether the entry of any field is an invalid blank (for some fields blank entries may be valid -- e.g., the reported income of minors -- but we mean here invalid blanks), and whether the recorded entries are among a set of valid codes for the field. Examples are: age should not be blank or negative; marital status should not be blank; number of children should be less than 20.

While edits of Type A are immediate consequences of the questionnaire and code structure, edits of Type B are usually specified on the basis of extensive knowledge of the subject matter of the survey. Conceptually, edits of Type B specify in one form or another sets of values for specified combinations of fields which are jointly unacceptable (or, equivalently, sets of values which are jointly acceptable). Examples are: if age is less than 15 years, then marital status should be single; the acreage reported to be under different specified crops should be equal to total reported acreage under cultivation; the total value of production reported in a given industry divided by total manhours worked should be between two predetermined constants.

2 This paper was originally published in the Journal of the American Statistical Association, March 1976, Volume 71, Number 353, pp 17-35. It is presented in this publication by courtesy of the authors.
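As a rough illustration of how Type A (validity) and Type B (consistency) edits of this kind can be expressed, consider the following sketch; the field names and thresholds come from the examples just given, while the representation as Python predicates is an assumption made for the illustration, not the authors' formalism.

# Each edit is a predicate that returns True when a record FAILS it.
type_a_edits = [
    lambda r: r["age"] is None or r["age"] < 0,               # age blank or negative
    lambda r: r["marital_status"] is None,                    # marital status blank
    lambda r: r["children"] >= 20,                            # number of children < 20
]
type_b_edits = [
    # if age is less than 15 years, then marital status should be single
    lambda r: r["age"] < 15 and r["marital_status"] != "single",
]

def failed_edits(record):
    # Return the indices of all edits the record fails.
    return [i for i, edit in enumerate(type_a_edits + type_b_edits) if edit(record)]

print(failed_edits({"age": 12, "marital_status": "married", "children": 0}))   # [3]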

This article essentially deals with computer edits. When a record fails some of the edits, we have, theoretically, five options:

1. Check the original questionnaires in the hope that the original questionnaire is correct and the edit failures are caused by coding error or error introduced during the conversion of data to machine-readable form;

2. Contact the original respondent to obtain the correct response (or verify that the reported response was correct in its original form);

3. Have clerical staff "correct" the questionnaire using certain rules which would remove the inconsistencies;

4. Use the computer to "correct" the questionnaire, also using certain rules which would remove the inconsistencies;

5. Drop all records which fail any of the edits or at least omit them from analyses using fields involved in failed edits.

Option 5 would involve an implicit assumption that the statistical inferences are unaffected by such deletions. This is equivalent to assuming that the deleted records have the same distribution as the satisfactory records. If such an assumption must be made it would seem much more sensible to make it through imputation techniques. At any rate, when population totals have to be estimated, some form of correction would still be necessary: if the correction is not made through explicit imputation, it would have to be made through some form of weighting. However, implicit in weighting is the imputation of the appropriate mean to all fields of all records which are involved in failed edits. We believe that a better imputation results if we make use of the valid parts of questionnaires and impute for as few fields as possible.

Corrections of Type 1 and 2, particularly 2, are certainly worthwhile if feasible: one should, whenever possible, avoid "manufacturing" data instead of collecting it. Often, however, it is not possible to recontact the respondents, mostly due to limitations of time and money. In these cases, 1, 3, and 4 are left open to us. However, the incidence of cases where 1 is fruitful can and probably should be minimized by a rigorous application of quality controls to the coding and data conversion operations.

At any rate, whether or not we use 1 and 2, we usually reach a point where 3 or 4 has to be applied. Of these alternatives, we have a strong preference for 4: if the data are to be "corrected" using predetermined rules, it is usually much more preferable, although not necessarily more economical, to let the computer apply these rules rather than clerks. Contrary to clerks, the computer applies the rules consistently and fast; moreover, if editing is done by a computer but correction by clerks, usually the data have to be edited again to insure that all inconsistencies have been removed. This may lead to a long series of computer edits alternating with clerical corrections, and the interface between the two operations involves the usually relatively slow process of input and output, as well as potentially complex logistic and control operations.

The major disadvantage of a procedure involving the computer for both editing and correction is the potential programming complexity and the relative rigidity of the computer programs, i.e., the complexity of the programs is usually such that changes are time consuming, expensive and error-prone. Edit and correction rules are often more or less independently specified with no full assurance that the corrections will render the data consistent with respect to the edits. Also, in the absence of some overall philosophy of correction and in the light of the very complex rules often used, the net impact of the corrections on the data is unforeseeable and, possibly even worse, not readily capable of a systematic post-implementation evaluation.

This article proposes an approach which potentially overcomes (or at least ameliorates) these problems. With respect to complexity, the approach presented simplifies the programming task by identifying a uniform and particularly simple form for all edits, which also assists in overcoming the potential rigidity of computer programs. In fact, the "rigidity" usually arises because the edits are frequently specified in the form of logically complex and interrelated chains or networks. The article breaks these down into a series of simple and unrelated edit rules of a common form; once this form of edits is implemented it is relatively easy to add additional edit rules or remove some of the edits already specified. This ease of modifying the edits should facilitate carrying out a set of initial edits in exploratory fashion (possibly on a sample of records) and their modification, as desirable, prior to data correction.

In effect, the approach lends itself to the implementation of a generalized edit and data correction system driven by subject-matter specification, consisting of specifying inconsistent combinations of codes. The simplicity with which edit specifications can be altered without the need to reprogram provides the most important payoff of the approach: it facilitates experimentation with alternative edit specifications and permits evaluation of their impact on the "corrected" data. Should this impact be found unacceptable, the entire edit system can be respecified with ease and rerun without delay.

Data correction is often the most complex of the survey data processing operations. This article presents an approach which, optionally, altogether eliminates the necessity for a separate set of specifications for data corrections: the needed corrections are automatically deduced from the edits themselves. On one hand, this should simplify specification of correction routines and, on the other hand, insure that the corrections are always consistent with the edits. The consistency of corrections with edits becomes a particularly important issue if we want the flexibility of changing the initially specified edits. A fuller evaluation of the benefits offered by this approach is presented in Section 6.

In an earlier paper, Freund and Hartley (1967) proposed a least squares type approach to data correction in the case of quantitative variables with arithmetic edits. The approach assumed a metric underlying the data. However, for qualitative data (i.e., coded data such as sex, occupation, etc.) there is often no basis for assuming a metric, and such an approach to data correction (imputation) is not realistic. In fact, the central concern of this article is with qualitative data, although some of the theoretical results also apply to quantitative data. These will be pointed out specifically. A subsequent paper by one of the authors deals with quantitative data specifically.

There are three criteria for imputation of qualitative data which we have attempted to meet, the first of which is overridingly important in the rest of this article.3

1. The data in each record should be made to satisfy all edits by changing the fewest possible items of data (fields). This we believe to be in agreement with the idea of keeping the maximum amount of original data unchanged, subject to the constraints of the edits, and so manufacturing as little data as possible. At the same time, if errors are comparatively rare, it seems more likely that we will identify the truly erroneous fields. This criterion appears to be reasonable, particularly for qualitative (coded) data, since it provides the only feasible measure of the extent of changes due to imputations.

2. It should not be necessary to specify imputation rules; they should derive automatically from the edit rules. This would insure that imputed values will not continue to fail edits, simplify the task of specifying edits and imputations, simplify their computer implementation, facilitate the implementation of subsequent changes to specifications, and generally lead to a more controlled operation.

3. When imputation takes place, it is desirable to maintain, as far as possible, the marginal and preferably even the joint frequency distributions of the variables, as reflected by the "correct" records, i.e., those which pass the edits.

The idea of Criterion 3 can be explained in terms of an example. Suppose that in a survey we collect data about age, sex, occupation and labor-force status. If, say, in a given record, labor-force status is identified as having been reported in error, usually we know no more about the particular record than that it represents an individual belonging to that subpopulation which has identical age, sex and occupation to that shown on the given record. In the absence of other information, the logical inference about labor-force status would be the average labor-force status of this subpopulation as measured by the error-free records; since the average of a set of codes does not make sense, we replace the average by a code value randomly selected from this subpopulation. Even in the case of quantitative variables, where the average is a sensible quantity, a suitably chosen value from the subpopulation might still be preferable, since imputing the average repeatedly would distort the distributions.

In concluding this introduction, we emphasize that the motivation for this article is not to make editing and imputation so simple and routine as to tempt survey takers and processors to replace direct observation and measurement by computer imputations. The approach to editing is predicated on the assumption that one wants to alter the original observations to the least possible extent. We provide a systematic approach and tools to achieve this. While the amount of imputation is thus minimized, it may still be unacceptably high in particular surveys, depending on the quality of data collection and conversion, relative to the particular requirements of statistical inference for which the survey was taken. Thus, while the question of the impact of automatic editing and imputation on the subsequent statistical inference is outside the scope of this article, we strongly recommend that this impact be studied in the context of the particular applications.

3 By imputation for a given record we mean changing the values in some of its fields to possible alternatives with a view to insuring that the resultant data record satisfies all edits. To the extent that an originally recorded value may be an invalid blank, the term encompasses data correction necessitated by partial non-response or by prior clerical editing.

II. THE NORMAL FORM OF EDITS

At the beginning, let us restrict ourselves to records containing only qualitative (coded) data, i.e., data which are not subject to a meaningful metric. Edits involving such data will be referred to as logical edits, as opposed to quantitative arithmetic edits, which are meaningful only in the presence of a metric.

Edits of qualitative (coded) data, as expressed by experts with a knowledge of the subject matter of the survey, are usually conveyed in the form of narratives, flow charts, or decision tables. Whatever the medium, however, an edit expresses the judgment of some experts that certain combinations of values or code values in different fields (corresponding to the different questions on the questionnaire) are unacceptable.

Let each editable record contain N fields. Denote by A_i the set of possible values or code values which may be recorded in Field i, the number of such values being n_i (not necessarily finite).

The concept that a particular record (questionnaire), say, a, has in Field i a code belonging to some subset of A_i, say, A_i^0, can be concisely denoted by

a ∈ A_i^0.   (2.1)

Mathematically, this notation is not entirely rigorous because a, being a vector of N components, is not a member of the set of scalars A_i^0. What we really understand by (2.1) is that a is a member of the Cartesian product A_1 × A_2 × ... × A_{i-1} × A_i^0 × A_{i+1} × ... × A_N. However, we will simplify the notation by using A_i^0 instead of the corresponding Cartesian product -- the context always clarifying whether A_i^0 is actually a subset of A_i, or a subset of the total code space A_1 × A_2 × ... × A_N. Now the concept of an edit restriction can be made more precise. The combination of code values in different fields which are unacceptable is a subset of the code space and can, in general, be expressed as

f(A_1^0, A_2^0, ... , A_N^0),   (2.2)

where the A_i^0 are subsets of the code space and the function f connects these subsets through the ∩ (set intersection) and ∪ (set union) operations. The function f thus defines a subset of the code space and becomes an edit if, through some prior knowledge, it is declared a set of unacceptable code combinations, in the sense that a record a is said to fail the edit specified by f whenever

a ∈ f(A_1^0, A_2^0, ... , A_N^0).   (2.3)

A repeated application to f of the "distributive law" connecting the set union and intersection operations will transform the right side of (2.3) to a form consisting of the union of intersections of sets given by

f(A_1^0, A_2^0, ... , A_N^0) = (A_{i1}^1 ∩ A_{i2}^1 ∩ ... ∩ A_{im1}^1) ∪ (A_{j1}^2 ∩ A_{j2}^2 ∩ ... ∩ A_{jm2}^2) ∪ ... ∪ (A_{k1}^r ∩ A_{k2}^r ∩ ... ∩ A_{kmr}^r).   (2.4)

Now it is easy to show that a ∈ f will occur if and only if a is a member of any one (or more) of the sets defined by the brackets on the right side of (2.4). So the edit defined by (2.3) can be broken down into a series of edits of the form

a ∈ ∩_{i∈s} A_i*,   (2.5)

where s is a suitably defined index set on the fields of the questionnaire, the A_i* are subsets of the code space of the form given by (2.1), and (2.5) defines a subset of the code space such that any record in it fails an edit.

What (2.5) states is perhaps intuitively obvious: any edit, whatever its original form, can be broken down into a series of statements of the form "a specified combination of code values is not permissible".
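To make the normal form concrete, the following minimal sketch (an illustration only, not the generalized system discussed later; the field names and code labels are hypothetical) represents a normal-form edit as a mapping from each explicitly entering field to the code subset with which it enters, and checks whether a record falls into the failure set so defined.

```python
# Illustrative sketch of a normal-form edit for coded (qualitative) data.
# Fields not listed in the edit implicitly enter with their full code set A_i,
# so they never prevent the edit from failing.
edit = {
    "age": {"0-14"},
    "marital_status": {"Married", "Divorced", "Widowed", "Separated"},  # "Ever Married"
}

def fails(record, edit):
    """A record fails a normal-form edit iff, for every field entering the edit
    explicitly, the record's code lies in the edit's code subset."""
    return all(record[field] in codes for field, codes in edit.items())

record = {"age": "0-14", "marital_status": "Married", "relationship_to_head": "Spouse"}
print(fails(record, edit))   # True: this combination of code values is not permissible
```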

Several points should be noted.

1. Formula (2.5) indicates that instead of testing each record as to whether it falls into the kinds of edit failure sets shown on the left side of (2.4), we can test them with respect to the simpler sets on the right side of (2.5). We will write

∩_{i∈s} A_i* = F,   (2.6)

to indicate that the set on the left side is such that any record in it would be an edit failure. Moreover, clearly, (2.6) is not altered substantively if we write instead

∩_{i=1}^{N} A_i* = F,   (2.7)

where for all fields i ∉ s we put A_i* = A_i (the set of all possible code values for Field i). We say that Field i enters an edit of the form (2.7) explicitly if A_i* is a proper subset of the set of codes for Field i.

2. Formulas (2.6) or (2.7) will be referred to as the normal form of edits. What we have shown is that any complex edit statement can be broken down into a series of edits, each having the normal form. Since the derivation of (2.6) involved only the repeated application of the distributive law connecting the ∪ and ∩ operators, it provides an operationally feasible procedure of recasting edit rules into a series of edits in the normal form. We will assume throughout this article that, unless otherwise stated, all edit rules are provided in the normal form. The set of such edit rules in the normal form, as specified by the subject matter experts, will be referred to as explicit edits (to distinguish them from logically implied edits -- a concept to be introduced later).

3. Hereafter it will be assumed that the a priori specified edits (usually provided by experts of the subject matter of the survey) are in the normal form (2.6) or (2.7). However, before proceeding, it is relevant to ask: how much extra work do we shift onto the subject-matter experts by requiring them to adhere to the formal syntax provided by the normal form? The language of edit specifications which they are used to contains, essentially, two types of statements.

a. Simple validation edits, stating that the permissible code values for Field i are given by the set A_i, any other value being an error (response error, data conversion error, etc.). This can be converted into the normal form very easily and automatically. For each field the set A_i can be expanded to include a single code, say, c_i, which is to represent all the invalid codes of Field i. During the loading of the unedited data into a data base (or some other equivalent operation), all invalid codes (including invalid blanks) in Field i are replaced by the code c_i. Now a series of edits of the normal form

{c_i} = F,   i = 1, 2, ... , N,

is equivalent to other forms of specifying the simple validation edits.

b. More complex consistency edits involving a finite set of codes are typically of the form: "Whenever a ∈ f(A_1', A_2', ... , A_N'), then it should follow that a ∈ g(A_1'', A_2'', ... , A_N'')," i.e., whenever a record has certain combinations of code values in some fields, it should have some other combinations of code values in some other fields. Since g has the same structure as f (both being constructed from unions or intersections of code sets), we can take the complement of the set represented by g (which will also formally have the same structure as f); then the edit statement just given is logically equivalent to: "Whenever a ∈ f ∩ ḡ, then the record fails the edit." This latter statement, however, is formally equivalent to (2.3), so it can also be converted into the normal form (2.6). Hence, whether the originally specified edits are given in a form which defines edit failures explicitly or in a form which describes conditions that must be satisfied, the edit can be converted into a series of edits in the normal form, each specifying conditions of edit failure.

Since the conversion of consistency edits into a series of edits in the normal form is an operation which can be completely specified in terms of an algorithm whose steps consist only of the elementary operations of Boolean algebra (taking complements, applying the distributive laws, etc.), a computer algorithm can easily be written to accomplish this, thereby permitting subject-matter experts to specify edits in the language they are used to. Despite the simplicity of such an algorithm, we decided not to implement it, but rather to ask our subject-matter experts to adhere to the extremely simple syntax provided by the normal form. The advantages of doing so will become clear in Section 6. Suffice it to say here that the form of edit specification provided by (2.5) was, precisely due to its simplicity, fully embraced by the subject-matter experts who were involved in the first application of the system based on the methodology of this article.

4. As has already been emphasized, we have to draw a sharp distinction between logical edits (i.e., edits involving a questionnaire, each field of which contains a finite number of codes) and quantitative edits (i.e., edits involving a questionnaire, each field of which is measured on a continuous scale). The normal form of edits is both a natural and operationally feasible way of identifying edit conflicts of the logical kind (i.e., that certain code combinations are declared edit failures). Arithmetic edits, as shown shortly, can also be cast into the normal form, but this is not their natural form, which is a series of equalities or inequalities. However, since both logical and arithmetic edits can be cast in the normal form, the theoretical results of Section 3 (which depend on the possibility of casting edits in the normal form) apply to both logical and arithmetic edits. The implementation strategy of Sections 4 and 5, which is mostly based on the assumption that the edits are actually stated in the normal form, is likely to be practicable as stated only for logical edits. Possible implementation strategies for arithmetic edits are specifically indicated whenever relevant, but they will be taken up in more detail in a forthcoming paper by one of the authors.

We will indicate in the following how linear arithmetic edits can be expressed in the normal form.

We postulate (although this is not entirely necessary) for the remainder of this article that all arithmetic expressions involved in the edit specifications are linear. Such expressions may involve equalities or inequalities and would take the following form: whenever M(a_1, a_2, a_3, ... , a_N) ≤ 0, then the record fails this edit; where a_1, a_2, a_3, ... , a_N are variables corresponding to the fields of the questionnaire and can assume values in the sets A_1, A_2, ... , A_N; and where the function M is a linear expression. Without loss of generality, suppose that a_1 has a nonzero coefficient.

Then one can rewrite the preceding expression as

a_1 ≤ L(a_2, a_3, ... , a_N),

where L is also a linear function. Finally, this inequality can be expressed in the normal form as a set of edits

A_1^0 ∩ A_2^0 ∩ A_3^0 ∩ ... ∩ A_N^0 = F,

where

A_1^0 = {a_1 : a_1 ≤ L(a_2, a_3, ... , a_N)},   A_2^0 = {a_2},   A_3^0 = {a_3},   ... ,   A_N^0 = {a_N},

and there is one such set of edits corresponding to each possible choice of the values a_2, a_3, ... , a_N such that a_2 ∈ A_2, a_3 ∈ A_3, ... , a_N ∈ A_N.

To illustrate, suppose that in a survey of farms, three fields on a questionnaire relate, respectively, to number of cultivated acres, unimproved acres, and total acres of land. The corresponding variables are a_1, a_2 and a_3, the set of possible values of the three variables being A_1, A_2, and A_3 (each presumably the set of nonnegative numbers). An edit may be

a_1 + a_2 ≠ a_3,

i.e., that a record satisfying this relation is an edit failure. This can be transformed into a set of edits in the normal form simply as

{a_3 : a_3 ≠ a_1 + a_2} ∩ {a_1} ∩ {a_2} = F,   for all a_1 ≥ 0 and a_2 ≥ 0;

i.e., for any a_1 and a_2, a combination of the values a_1, a_2 and any a_3 which is not equal to the sum of a_1 and a_2 is an edit failure.

5. Often the set of possible values of all fields is finite. Such is the case, for example, if all the characteristics on the questionnaire involve qualitative codes (e.g., sex, industry, labor-force status). Even if some of the characteristics involve a conceptually infinite or a very large number of possible code values (such as age, income, etc.), the edits may only distinguish explicitly some finite number of ranges. For example, if income enters only two edits distinguishing between the ranges 5,000-10,000 and 7,000-12,000, respectively, then the totality of all edits requires distinguishing only between the five ranges 0-4,999; 5,000-6,999; 7,000-10,000; 10,001-12,000; 12,001+, and we may consider for purposes of editing the field "income" as assuming only five possible code values. In such cases quantitative fields can be treated for purposes of editing and imputation as if they were qualitative (coded) fields.

When the set of possible values of all fields is finite, a simple possible implementation of an edit system is provided in Section 5.
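To make point 5 concrete, the following short sketch (a hypothetical illustration, not part of any system described here) shows how the two income edits mentioned above induce a five-code recoding of the quantitative field for editing purposes.

```python
import bisect

# The edit ranges 5,000-10,000 and 7,000-12,000 only require distinguishing
# five income "codes"; any reported income can be mapped to one of them.
breakpoints = [5000, 7000, 10001, 12001]   # lower bounds of the ranges above 0-4,999
labels = ["0-4,999", "5,000-6,999", "7,000-10,000", "10,001-12,000", "12,001+"]

def income_code(income):
    """Map a reported income to the coarsest range the edits need to distinguish."""
    return labels[bisect.bisect_right(breakpoints, income)]

print(income_code(6200))    # 5,000-6,999
print(income_code(11500))   # 10,001-12,000
```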

6. We accept a convention that whenever one of the fields of a record contains an invalid entry (i.e., one which is not among the set of possible values), we consider the record as failing all edits which that field enters explicitly. The code value "blank" may or may not be among the set of possible (i.e., permissible) values of a field; e.g., blank in the income field might be a possible value in a general population survey, whereas blank in the age field would not be a possible value.

Example: The following example (clearly not realistic) illustrates the procedure of converting edits into the normal form.

Suppose in a demographic survey one of the edits specified by subject-matter experts is: "If a person's age is ≤ 15 years or he (she) is an elementary school student, then relationship to head of household should not be head and marital status should be single".

This edit can be converted into the normal form in a series of steps:

[(Age ≤ 15) or (Elementary School)] implies [(not Head) and (Single)]
[(Age ≤ 15) or (Elementary School)] and not [(not Head) and (Single)] = Failure
[(Age ≤ 15) or (Elementary School)] and [(Head) or (not Single)] = Failure
(Age ≤ 15) and (Head) = Failure
(Age ≤ 15) and (not Single) = Failure
(Elementary School) and (Head) = Failure
(Elementary School) and (not Single) = Failure

The last four statements together are equivalent to the originally specified edit. They are in the normal form.
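As a check on such conversions, the brute-force sketch below (illustrative only; the field and code labels are simplified stand-ins for those of the example) enumerates the sixteen relevant code combinations and confirms that the four normal-form edits fail exactly the same records as the originally specified "if ... then ..." statement.

```python
from itertools import product

ages = ["<=15", ">15"]
students = ["Elementary", "Not elementary"]
relationships = ["Head", "Not head"]
statuses = ["Single", "Not single"]

def original_edit_fails(age, student, rel, status):
    # "If age <= 15 or elementary school student, then not head and single."
    condition = (age == "<=15") or (student == "Elementary")
    requirement = (rel == "Not head") and (status == "Single")
    return condition and not requirement

def normal_form_fails(age, student, rel, status):
    # The four normal-form edits obtained by the derivation above.
    return any([
        (age == "<=15") and (rel == "Head"),
        (age == "<=15") and (status == "Not single"),
        (student == "Elementary") and (rel == "Head"),
        (student == "Elementary") and (status == "Not single"),
    ])

for combo in product(ages, students, relationships, statuses):
    assert original_edit_fails(*combo) == normal_form_fails(*combo)
print("The four normal-form edits reproduce the original edit exactly.")
```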

III. THE COMPLETE SET OF EDITS

A record which passes all the stated edits is said to be a "clean" record, not in need of any "correction." Conversely, a record which fails any of the edits is in need of some corrections. Unfortunately, generally one knows only which edits are failed, but not which fields are causing the edit failures. In this section we will try to build a bridge leading to an inference from the knowledge of the edits which failed to the identification of the fields which need to be changed to remove the edit failures. The main idea can be intuitively summarized as follows (a precise treatment is given subsequently). Suppose that a record fails some of the specified edits. For the sake of simplicity, suppose also that there is a single field, say, Field i, which enters all the failed edits. Then it should be possible to find a value for that field which will convert all the failed edits to satisfied ones (and not convert any satisfied edits to failed ones). In fact, if this could not be done, then this would mean that in the presence of the current values in the other fields, all the possible values of Field i result in some edit failures; in this case, so to speak, the value in Field i is irrelevant; the other fields are the cause of at least some of the edit failures. So, one might feel, there must be at least one failed edit which involves the other fields but not Field i. This would appear to contradict our initial assumption, namely that Field i enters all the failed edits. Indeed, this would be an actual contradiction if all the edits were explicitly stated.

As it is, often the initially specified edits imply logically some other edits. However, if we could be sure that all edits (explicit and implied) are explicitly known, under the simple assumptions stated it should be possible to convert all failed edits to satisfied ones by changing the value in a single field. Thus, while the initially specified edits are all that are necessary to identify the records which pass or fail the edits, the edits implied by them logically must also be considered if one wishes to investigate systematically the field(s) which must be changed to "correct" the record (change it in such a fashion that the changed record passes all the edits).

Some examples might further clarify the notion.

Example 1: Suppose that a questionnaire contains three fields.

Field                                  Possible codes
Age                                    0-14, 15+
Marital Status                         Single, Married, Divorced, Widowed, Separated
Relationship to Head of Household      Head, Spouse of Head, Other

Suppose there are two edits,

I    (Age = 0-14) ∩ (Mar. Stat. = Ever Married) = F
II   (Mar. Stat. = Not Now Married) ∩ (Rel. to Head = Spouse) = F,

where Ever Married stands for the subset of marital status codes {Married, Divorced, Widowed, Separated} and Not Now Married stands for the subset of codes {Single, Divorced, Widowed, Separated}.

Suppose that a record has the values Age = 0-14, Marital Status = Married, Relationship to Head of Household = Spouse. This record fails Edit I, passes Edit II. In an attempt to try to correct this record through imputation, we may consider changing the field Marital Status. It is easy to verify that, leaving Age and Relationship to Head of Household unchanged, all possible Marital Status codes would result in a record which would fail one or the other of the two edits. One would suspect, therefore, that there must be a hidden conflict between the current values of Age and Relationship to Head of Household -- irrespective of Marital Status. In fact, in this simple example, it is intuitively clear that there is a conflict between the age being between zero and 14 years and relationship to head of household being spouse. The existence of such a conflict, as one which is implied logically by the two stated edits, can be formally established as follows (a more rigorous method is provided by the Lemma, stated later in this section).

Edits I and II can be restated (using the arrow ⇒ to mean "implies") as

I    (Age = 0-14) ⇒ (Marital Status = Single)
II   (Marital Status = Not Now Married) ⇒ (Rel. to Head = Not Spouse).

Since it is obviously true that

(Marital Status = Single) ⇒ (Marital Status = Not Now Married),

we obtain, combining I and II,

(Age = 0-14) ⇒ (Rel. to Head = Not Spouse).

Finally, the preceding statement can be recast, equivalently, as

III  (Age = 0-14) ∩ (Rel. to Head = Spouse) = F

Thus Edit III is logically implied by Edits I and II.

It can be shown, using the methods provided later in this section, that no new edits can logically be derived from these three edits. A set of edits having this property will be called a complete set of edits.

We can now illustrate the main result of this section as follows. The current record, as just described, fails Edits I and III. In attempting to correct this record, we will select fields which, between them, explicitly figure in all the failed edits. We say that such fields "cover off" the failed edits. Now in this instance, Edits I and III are covered off either by the single field Age, or by any two of the three fields, or by the three fields together. According to the main theorem of this section, any combination of fields which covers off all the failed edits can be changed so that the resulting record will pass all edits. In this instance we can change Age to 15+ (leaving the other two fields unchanged) and this will result in a conflict-free record; alternatively, we could appropriately change any two fields (leaving the third field unchanged) to obtain a conflict-free record. In such a situation one can argue that the combined evidence presented by all the fields together seems to point to the single field of Age being in error, rather than some combination of two fields being in error.

The point to note in this example is the importance of identifying, together with the initial edits (Edits I and II), the logically implied edit (Edit III). Edits I and II only suffice to identify that the current record is subject to a conflict. However, it is only after having identified the logically implied Edit III that we are in a position to determine systematically the field(s) which have to be changed to remove all inconsistencies.

Example 2: Suppose that a questionnaire has four quantitative fields, the information recorded in these fields being denoted, respectively, by the variables a, b, c, and d. Suppose that there are two arithmetic edits connecting these variables, each indicating a condition of edit failure (it is a uniform convention adopted throughout that we specify conditions of edit failure, rather than the opposite conditions of edit consistency):

I    a + c + d ≤ b    or, equivalently,    a - b + c + d ≤ 0
II   2b ≤ a + 3c      or, equivalently,    -a + 2b - 3c ≤ 0.

Let the current record contain the values: a = 3, b = 4, c = 6, d = 1. It is easy to see that the first edit is passed; the second is failed. It is not immediately clear, even in this simple example, which fields (variables) need to be changed to satisfy the edits. However, if we write the logically implied edits, the situation clarifies immediately. It is easy to verify that the following three inequalities are all implied by the preceding two (they are linear combinations with positive weights of those two inequalities):

III  b - 2c + d ≤ 0
IV   a - c + 2d ≤ 0
V    2a - b + 3d ≤ 0.

Now one can verify that Edits II, III, and IV fail; I and V are satisfied. Looking for variables which, between them, enter all the failed edits, it is immediately obvious that variable c enters all the failed edits, or alternatively, any pair of variables also enter (between them) all the failed edits. Thus, according to the results of this section, one can remove all inconsistencies by choosing a suitable value for c (leaving the values of the other three variables unchanged); alternatively, one would have to change the values of at least two variables. Thus, the only single variable which can be suitably changed is c. In effect, changing c to any value between zero and 5/3 will result in a record which satisfies all the edits, e.g., c = 1 would be a suitable imputation.
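The arithmetic of Example 2 is easily checked. The minimal sketch below (an illustration, not part of the methodology itself) evaluates the failure conditions of the two explicit and three implied edits for the record a = 3, b = 4, c = 6, d = 1, and confirms that imputing c = 1 removes every failure.

```python
# Failure conditions of the explicit edits (I, II) and the implied edits (III-V).
def failed_edits(a, b, c, d):
    failures = {
        "I":   a - b + c + d <= 0,
        "II":  -a + 2*b - 3*c <= 0,
        "III": b - 2*c + d <= 0,
        "IV":  a - c + 2*d <= 0,
        "V":   2*a - b + 3*d <= 0,
    }
    return [name for name, failed in failures.items() if failed]

print(failed_edits(3, 4, 6, 1))   # ['II', 'III', 'IV'] -- variable c enters all of them
print(failed_edits(3, 4, 1, 1))   # []  -- the imputation c = 1 satisfies every edit
```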

Under more general assumptions we will identify a minimal set of fields which, when changed, will result in satisfying all the edits. But first we need a general method enabling us to write all the edits logically implied by the explicitly stated edits.

We will first state and prove a Lemma which provides a method of deriving implied edits from a set of edits. It will be shown later (Theorem 2) that all implied edits can be derived by repeated applications of the Lemma.

Lemma: If e_r are edits for all r ∈ s, where s is any index set,

e_r:  ∩_{j=1}^{N} A_j^r = F   for all r ∈ s,

then, for an arbitrary choice of i (1 ≤ i ≤ N), the expression

e*:  ∩_{j=1}^{N} A_j* = F   (3.1)

is an implied edit, provided that none of the sets A_j* is empty, where

A_j* = ∩_{r∈s} A_j^r,   j = 1, ... , N; j ≠ i,

A_i* = ∪_{r∈s} A_i^r.

The proof is presented in the Appendix.

It is relevant to emphasize that the edits e_r (r ∈ s) in the statement of the Lemma may represent any subset of the set of edits. For example, if we start with, say, ten edits, we may try to derive an implied edit from any two, three, four, etc. of them. Also, implied edits, once derived, can participate in the derivation of further implied edits. The edits from which a particular new implied edit is derived are called "contributing edits".

It will be shown subsequently (Theorem 2) that all implied edits can be generated through a repeated application of this Lemma. For reasons of presentation, we postpone the statement of Theorem 2 to the end of this section.

To illustrate the Lemma, we return to Example 1. There are two edits

e1: (Age = 0-14) ∩ (Rel. to Head = Any code) ∩ (Mar. Stat. = Ever Married) = F

e2: (Age = Any code) ∩ (Rel. to Head = Spouse) ∩ (Mar. Stat. = Not Now Married) = F.

Letting Marital Status play the role of Field i of the Lemma (the generating field), we obtain

A_3* = (Mar. Stat. = Ever Married) ∪ (Mar. Stat. = Not Now Married) = (Mar. Stat. = Any code),

while for the other two fields,

A_1* = (Age = 0-14) ∩ (Age = Any code) = (Age = 0-14)

A_2* = (Rel. to Head = Any code) ∩ (Rel. to Head = Spouse) = (Rel. to Head = Spouse).

Thus we obtain an implied edit,

e*: (Age = 0-14) ∩ (Rel. to Head = Spouse) ∩ (Mar. Stat. = Any code) = F.

This is the same implied edit derived earlier using a more heuristic approach.
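For logical edits the Lemma is easy to mechanize. The sketch below (an illustration using Python sets; the code labels are abbreviated from Example 1) reproduces the implied edit just derived, with Marital Status as the generating field.

```python
# Each edit maps every field to the code set with which it enters the edit
# (the full code set when the field is not explicitly involved).
DOMAINS = {
    "Age": {"0-14", "15+"},
    "Mar. Stat.": {"Single", "Married", "Divorced", "Widowed", "Separated"},
    "Rel. to Head": {"Head", "Spouse", "Other"},
}
e1 = {"Age": {"0-14"},
      "Mar. Stat.": {"Married", "Divorced", "Widowed", "Separated"},   # Ever Married
      "Rel. to Head": DOMAINS["Rel. to Head"]}
e2 = {"Age": DOMAINS["Age"],
      "Mar. Stat.": {"Single", "Divorced", "Widowed", "Separated"},    # Not Now Married
      "Rel. to Head": {"Spouse"}}

def implied_edit(edits, generating_field):
    """Apply the Lemma: intersect the code sets of every field except the
    generating field, whose code sets are united; return None if some
    intersection is empty (in which case no implied edit results)."""
    new_edit = {}
    for field in DOMAINS:
        sets = [e[field] for e in edits]
        if field == generating_field:
            new_edit[field] = set().union(*sets)
        else:
            new_edit[field] = set.intersection(*sets)
            if not new_edit[field]:
                return None
    return new_edit

print(implied_edit([e1, e2], "Mar. Stat."))
# {'Age': {'0-14'}, 'Mar. Stat.': all five codes, 'Rel. to Head': {'Spouse'}}
```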

If all the sets A_j^r are proper subsets of A_j, the complete set of code values for Field j, but A_i* = A_i, then the implied edit (3.1) is said to be an essentially new edit. In fact, in this case (3.1) does not involve all the fields explicitly involved in the edits e_r (Field i being explicitly involved in the edits e_r but not in (3.1)). Field i is referred to as the generating field of the implied edit.

In Example 1, we used Marital Status as the generating field. The codes for this field were proper subsets of the set of all possible marital status codes in Edits e1 and e2. However, in the implied edit e*, the code set corresponding to this field is the set of all possible codes of Marital Status. Thus, e* would be called an essentially new implied edit. However, using Age as the generating field, we would obtain

(Age = Any code) ∩ (Rel. to Head = Spouse) ∩ (Mar. Stat. = Divorced, Widowed, Separated) = F.

This would not be an essentially new implied edit according to our terminology, because in one of the edits (e2) the generating field (Age) is represented by the set of all possible codes for that field. Intuitively, one can also see why we would not call this edit an "essentially new implied edit": while it is an implied edit, it is simply a weaker form of e2 and, thus, does not add to our understanding of the constraints imposed by the edits on the data.

Referring to the first paragraph of this section, we are now in a position to define unambiguously the concept of knowing all the relevant edits (explicit and implied) which the data are required to satisfy. The set of explicit (initially specified) edits, together with the set of all essentially new implied edits, is said to constitute a complete set of edits and is denoted by Ω. Clearly, any finite set of explicit edits corresponds to a complete set of edits.

Define Ω_K as that subset of Ω which involves only Fields 1, 2, ... , K. Formally, Ω_K consists of those edits

e_r:  ∩_{i=1}^{N} A_i^r = F

for which

A_j^r = A_j,   j = K + 1, K + 2, ... , N,

where A_j is the complete set of code values of Field j. The following theorem holds.

Theorem 1: If a_i^0 (i = 1, 2, ... , K - 1) are, respectively, some possible values for the first K - 1 fields, and if these values satisfy all edits in Ω_{K-1}, then there exists some value a_K^0 such that the values a_i^0 (i = 1, 2, ... , K) satisfy all edits in Ω_K. The proof is presented in the Appendix. We just note here that it depends on the set of edits being complete with respect to the essentially new implied edits, but not necessarily all possible implied edits.

In terms of Example 1, if e1, e2, and e* are a complete set of edits Ω corresponding to e1 and e2, then the subset Ω_2 of this set, involving only Fields 1 and 2 (i.e., Age and Relationship to Head), consists of e* (since e* is the only edit which does not involve the third field, Marital Status). Ω_1, the subset of edits involving only Field 1 (Age), is empty. Thus, with K = 3, Theorem 1 states that if a record has some code for each of Age and Relationship to Head which satisfies e*, then there is at least one code for Marital Status which, together with the current codes for Age and Relationship to Head, would result in the record satisfying all edits in Ω_3 (e1, e2, and e*).

Corollary 1: Suppose that a questionnaire has N fields, and assume that Fields 1, ... , K - 1 have values a_i^0 (i = 1, ... , K - 1) such that all edits in Ω_{K-1} are satisfied; then there exist values a_i^0 (i = K, ... , N) such that the values a_i^0 (i = 1, 2, ... , N) satisfy all edits. The proof follows immediately by repeated application of Theorem 1.

We are now in a position to state precisely the meaning of the intuitively stated paragraph at the beginning of this section.

Corollary 2: Suppose that a record (questionnaire) has N fields having the values a_i (i = 1, ... , N). Suppose that S is a subset of these fields having the property that at least one of the fields i ∈ S enters each failed edit, i.e., each edit failed by the given record. Then values a_i^0 (i ∈ S) exist such that the imputed record consisting of the values a_i (i ∉ S), together with a_i^0 (i ∈ S), satisfies all edits. The proof is given in the Appendix.

We now see the full impact of having a complete set of edits. We have only to select any set of fields having the property that at least one of them is involved in each failed edit and it then follows that a set of values exists for these fields which, together with the unchanged values of the other fields, will result in an imputed record satisfying all edits. If the set of fields which we select is the set containing the smallest number of fields (minimal set), then Corollary 2 states that by changing the values in these fields (but keeping values in other fields unchanged) we will be able to satisfy all the edits. Thus, if we have a complete set of edits, Corollary 2 provides an operational procedure whereby, given a questionnaire, the smallest number of fields can be identified, which, if changed, will result in all edits being satisfied by the given questionnaire.
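When the number of fields is small, the minimal set of fields called for by Corollary 2 can be found by exhaustive search over field subsets. The following sketch merely illustrates the idea on Example 1 (a production system, such as the one outlined in Section 5, would organize this search more carefully); the dictionary representation of normal-form edits is the same assumption used in the earlier sketches.

```python
from itertools import combinations

DOMAINS = {
    "Age": {"0-14", "15+"},
    "Mar. Stat.": {"Single", "Married", "Divorced", "Widowed", "Separated"},
    "Rel. to Head": {"Head", "Spouse", "Other"},
}

def explicit_fields(edit):
    """Fields entering a normal-form edit through a proper subset of their codes."""
    return {f for f, codes in edit.items() if codes != DOMAINS[f]}

def minimal_cover(failed):
    """Smallest set of fields such that every failed edit explicitly involves
    at least one of them (brute force over subsets)."""
    involved = [explicit_fields(e) for e in failed]
    candidates = sorted(set().union(*involved))
    for size in range(1, len(candidates) + 1):
        for subset in combinations(candidates, size):
            if all(set(subset) & fields for fields in involved):
                return set(subset)

# The record of Example 1 fails Edits I and III, both of which explicitly
# involve Age, so Age alone covers off the failed edits.
edit_I   = {"Age": {"0-14"},
            "Mar. Stat.": {"Married", "Divorced", "Widowed", "Separated"},
            "Rel. to Head": DOMAINS["Rel. to Head"]}
edit_III = {"Age": {"0-14"},
            "Mar. Stat.": DOMAINS["Mar. Stat."],
            "Rel. to Head": {"Spouse"}}
print(minimal_cover([edit_I, edit_III]))   # {'Age'}
```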

It should be emphasized that the Lemma and Theorem 1 apply to all types of edits which can be expressed in the normal form. As indicated earlier, the normal form of edits is a natural language to users in describing edit inconsistencies for logical edits, applying to coded data. Linear arithmetic edits applicable to quantitative data can be expressed in the normal form, although for these latter types of edits the normal form does not represent a natural language. Thus, as will be seen in Section 5, the Lemma provides the basis for an operationally feasible method of deriving implied edits from logical edits. While, in the case of linear arithmetic edits, the Lemma does not directly result in an operational procedure of deriving implied edits, it still holds as a theoretical result, together with Theorems 1 and 2. In Section 5 we will develop an appropriate algorithm applicable to linear arithmetic edits which can be shown to be equivalent to the procedure of the Lemma.

While the Lemma provides a method of generating essentially new implied edits, it does not guarantee that repeated application of that method will, in fact, generate all essentially new implied edits. That this is, in fact, the case is stated by Theorem 2.

Theorem 2: If

e_p:  ∩_{i=1}^{N} A_i^p = F

is an edit which is logically implied by the explicit edits, then it can be generated by the edit generation procedure of the Lemma. The proof is given in the Appendix.

A useful result of the method of generating implied edits is that any internal inconsistency in the explicit edits is identified. We have an intuitive notion of what inconsistent edits may mean: requirements, concerning the data, that are self-contradictory. However, as just indicated, an edit represents a restriction on the code space, and a series of edits represents a series of such restrictions. They cannot, in themselves, be contradictory. They can, however, contradict the initial field by field identification of permissible code values. Thus, a set of edits is said to be inconsistent if they jointly imply that there are permissible values of a single field which would automatically cause edit failures, irrespective of the values in the other fields (clearly, such values should not be in the set of possible values for the field). Thus, an inconsistent set of edits means that there is an implied edit e_r of the form

e_r:  A_i^r = F,

where A_i^r is a proper subset of the set of possible values for some Field i. However, if A_i^r is a subset of the possible values of Field i, then this edit could not be an originally specified edit. Since the edit generating process identifies all implied edits, it follows that this edit will also be generated. It is a simple matter to computer check the complete set of edits to identify implied edits of this type and, thus, to determine whether the set of originally specified edits is inconsistent.
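The computer check just mentioned reduces to scanning the complete set of edits for any edit that explicitly involves a single field. A minimal sketch, using the same dictionary representation of normal-form edits assumed in the earlier illustrations, might look as follows.

```python
# Flag edits in the complete set that explicitly involve exactly one field:
# such an edit declares some permissible codes of that field impossible, which
# signals that the originally specified edits are inconsistent.
def inconsistent_edits(complete_edits, domains):
    flagged = []
    for edit in complete_edits:
        explicit = [f for f, codes in edit.items() if codes != domains[f]]
        if len(explicit) == 1:
            flagged.append((explicit[0], edit[explicit[0]]))
    return flagged

DOMAINS = {"Age": {"0-14", "15+"}, "Rel. to Head": {"Head", "Spouse", "Other"}}
# A hypothetical implied edit involving only Age would be flagged:
edits = [{"Age": {"0-14"}, "Rel. to Head": DOMAINS["Rel. to Head"]}]
print(inconsistent_edits(edits, DOMAINS))   # [('Age', {'0-14'})]
```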

IV. IMPUTATION

We have developed a method which, corresponding to any current record, identifies a set of fields (in particular the smallest set) whose values can be changed in such a fashion that the resulting record would satisfy all edits. The next step is the choice of suitable values for those fields. We will consider a number of ways of performing this second step.

Primarily we will confine ourselves to the so-called "hot-deck" type imputation methods. These essentially consist of imputing for a field of the current record the value recorded in the same field of some other record which, however, passed all the relevant edits. This method attempts to maintain the distribution of the data as represented by the records which passed the edits.

Let us assume that we have a record {a_i^0 : i = 1, ... , N} of which the first K fields are to be imputed, this being the minimal set of fields. We assume throughout the sequel that we have a complete set of edits.

Method 1, Sequential Imputation: Let us begin by imputing for Field K and then systematically impute for Fields K - 1, K - 2, ... , 1.

Consider all of the M edits (if any) in which Field K is specifically involved, but not Fields 1, 2, ... , K - 1:

e_r:  ∩_{i=K}^{N} A_i^r = F,   r = 1, ... , M.   (4.1)

Since the Kth Field is explicitly involved in each edit, A_K^r is never the whole set of possible values of Field K.

Now consider a given record. Clearly, among the edits in (4.1), we may disregard those which the given record satisfies on account of its values in Fields K + 1, K + 2, ... , N since these edits will remain satisfied irrespective of what value we impute in Field K. In other words, for a given record we may disregard those edits in (4.1) for which

a_i^0 ∉ A_i^r for at least one i = K + 1, K + 2, ... , N,

and consider only those M′ edits for which

a_i^0 ∈ A_i^r for all i = K + 1, K + 2, ... , N.

Loosely speaking, these are all the edits among (4.1) which the given record fails or may fail, depending on its value in Field K.

Now if we want to satisfy all these edits by imputing a value to Field K, the value to be imputed must satisfy

a_K* ∈ ∩_{r=1}^{M′} Ā_K^r,   (4.2)

where Ā_K^r is the complement of A_K^r with respect to A_K. This is always possible, since, if the set on the right side of (4.2) was empty, it would mean, according to Theorem 1, that there is an edit which the given record fails and which involves only Fields K + 1, K + 2, ... , N. This is contrary to the choice of the Fields 1, 2, ... , K for imputation.

Having imputed a_K*, we have all edits satisfied which involve Field K and any of the Fields K + 1, K + 2, ... , N (but which do not involve Fields 1, 2, 3, ... , K - 1). We next consider all edits involving Fields K - 1, K, ... , N, but not Fields 1, 2, ... , K - 2, and will satisfy these, as before, by imputing a suitable value for Field K - 1 (leaving all other fields unchanged). We continue until all the Fields 1, 2, ... , K have been imputed, and hence, by the construction, all the edits satisfied.

Example 3: Suppose that a record contains five fields, each with its possible set of codes.

Sex        Age       Mar. Status    Rel. to Head       Education
Male       0-14      Single         Wife               None
Female     15-16     Married        Husband            Elementary
           17+       Divorced       Son or Daughter    Secondary
                     Separated      Other              Post-secondary
                     Widowed

The following five edits apply (they are a complete set).

e1: (Sex = Male) ∩ (Rel. Head = Wife) = F

e2: (Age = 0-14) ∩ (Mar. Stat. = Ever Married) = F

e3: (Mar. Stat. = Not Married) ∩ (Rel. Head = Spouse) = F

e4: (Age = 0-14) ∩ (Rel. Head = Spouse) = F

e5: (Age = 0-16) ∩ (Educ. = Post-secondary) = F

Let the current record contain the characteristics

Field 1 = Sex = Male
Field 2 = Age = 12
Field 3 = Marital Status = Married
Field 4 = Rel. to Head = Wife
Field 5 = Education = Elementary.

It is easy to verify that Edits e1, e2, e4 fail. No single field covers off all these edits. There are three pairs of fields which do: Sex and Age, Age and Relationship to Head, Marital Status and Relationship to Head. Suppose a decision has been made to impute Sex and Age, leaving the other three fields unchanged (thus K = 2). Now consider all edits which involve Age (the Kth variable), but do not involve Sex (variable K - 1 = 1): these are the edits e2, e4, e5. Thus, M = 3. However, e5 is satisfied on account of the values of Fields K + 1, K + 2, K + 3 (the three fields which are not imputed). Specifically, because Education ≠ Post-secondary, e5 will be satisfied whatever Age we impute; thus, e5 need not be considered when imputing Age (therefore, M′ = 2). So we have to consider the sets Ā_2^2 and Ā_2^4 (the complements of the code sets for Age in Edits e2 and e4). According to (4.2), we have to choose an Age value a_2* such that

a_2* ∈ Ā_2^2 ∩ Ā_2^4 = (Age = 15+) ∩ (Age = 15+) = (Age = 15+).

Thus, we can impute any age greater than or equal to 15 years.

Having imputed Age (suppose 22 years), we next impute Sex. Again, we consider all edits involving Sex (there are no other fields to impute). Only e1 involves Sex. We check whether e1 is satisfied on account of the values in the fields already imputed or left unchanged -- this is not the case. Thus, now M = M′ = 1. Hence, we have to impute a code, a_1*, for Sex such that

a_1* ∈ Ā_1^1 = (Sex = Not Male).

Clearly, the only feasible imputation is Sex = Female.

It should be pointed out that this method appears to be predicated on being able to find, at each step, edits which involve only one of the fields to be imputed, together with fields which do not need to be imputed or which have already been imputed.

We may, however, reach a point where we cannot choose a field satisfying this criterion. Suppose that after K - k fields have been imputed according to this procedure, there is no failed edit which involves Field k without some of the remaining Fields k - 1, k - 2, ... , 1. In this case we may simply impute any value for Field k and then continue with Field k - 1.

Conceptually, the actual imputation can be as follows: having determined from (4.2) the set of acceptable values, one of which must be imputed, we search (with a random starting point) among the records which passed all edits or among those which were already imputed but for which the given field was not imputed. We accept as the current imputed value for each field the first acceptable value (as defined by (4.2)) encountered during the search. By so doing, the various acceptable values will be chosen with probability proportional to their occurrence in the population as a whole, as long as the errors among the original records occur more or less in random order.
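A minimal sketch of this search, assuming the clean (and previously imputed) records are available as an in-memory list -- an assumption of the illustration, not of the article -- is given below; it starts at a random position and accepts the first donor whose value lies in the acceptable set defined by (4.2).

```python
import random

def hot_deck_value(field, acceptable_values, donors, rng=random):
    """Search the donor records from a random starting point and return the
    first value of `field` that lies in the acceptable set."""
    start = rng.randrange(len(donors))
    for offset in range(len(donors)):
        candidate = donors[(start + offset) % len(donors)][field]
        if candidate in acceptable_values:
            return candidate
    raise ValueError("no donor supplies an acceptable value for %r" % field)

# Hypothetical donor pool; for the record of Example 3, (4.2) restricts Age to 15+.
donors = [
    {"Sex": "Female", "Age": "15-16"},
    {"Sex": "Male", "Age": "17+"},
    {"Sex": "Female", "Age": "0-14"},
]
print(hot_deck_value("Age", {"15-16", "17+"}, donors))
```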

The procedure is subject to some obvious disadvantages. We are not taking full account of other information in the record which might be associated (correlated) with the field being imputed, except that the imputed value will be consistent with the largest possible number of originally reported fields in the record. Also, since the fields are imputed one by one, the joint distribution of values imputed to individual fields will be different from that in the population (except when the distributions involved are independent), although their marginal distribution will be the same. This last disadvantage is true of most sequential imputation methods. The method can be improved if we restrict the search to those records which have in some specified other fields identical (or similar, according to some definition) values to those in the current record.

Method 2, Joint Imputation: Consider all the edits. Assume that Fields 1 to K are to be imputed and consider, as with Method 1, only those M" edits which the given record can potentially fail, depending on the choice of values for imputation to Fields 1, ... , K, i.e.,

e_r:  ∩_{i=1}^{N} A_i^r = F,   where a_i^0 ∈ A_i^r for all i = K + 1, ... , N; r = 1, ... , M″.

Consider the sets

A_i* = ∩_{r=1}^{M″} A_i^r,   i = K + 1, ... , N.   (4.3)

A_i* cannot be empty, since a_i^0 ∈ A_i^r, r = 1, ... , M″. If we choose any previously processed (i.e., edited and imputed) record, whose values in Fields K + 1, ... , N are within the corresponding sets (4.3), then, since that record satisfies all the edits, its values in Fields 1, 2, ... , K may be used for the current imputation and will automatically satisfy all the edits.

Note that there is now no need to calculate the possible values of the fields to be imputed, but we do have to identify the sets (4.3) of other fields. These sets identify "ranges" for each of the fields left unchanged in such a way that any other previously accepted record, whose values in Fields K + 1 to N fall in these ranges, will provide suitable values for imputation in Fields 1 to K.

This is not likely to be as difficult as it might first appear, since the number of edits for which a_i^0 ∈ A_i^r for all i = K + 1, ... , N may not be too large.

We illustrate this procedure using Example 3. There are two fields to impute, Sex and Age. We first consider all the edits which these two fields enter: e1, e2, e4, e5. Of these, e5 will be satisfied whatever we impute for Sex and Age, because the Education code of the current record is Elementary school (i.e., not Post-secondary). Thus, only e1, e2 and e4 need to be considered when imputing Age and Sex, so M″ = 3. The sets A_i* of (4.3) are constructed as follows.

A_3* = A_3^1 ∩ A_3^2 ∩ A_3^4 = (Mar. Stat. = Any code) ∩ (Mar. Stat. = Ever Married) ∩ (Mar. Stat. = Any code) = (Mar. Stat. = Ever Married)

A_4* = A_4^1 ∩ A_4^2 ∩ A_4^4 = (Rel. to Head = Wife) ∩ (Rel. to Head = Any code) ∩ (Rel. to Head = Spouse) = (Rel. to Head = Wife)

A_5* = A_5^1 ∩ A_5^2 ∩ A_5^4 = (Education = Any code).

So if we search the previously processed "clean" records until we encounter one whose Marital Status is one of the codes subsumed under Ever Married, and whose Relationship to Head code is Wife, and whose Education code is any code, we can impute for the current record the Sex and Age codes of that record.
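The sketch below illustrates Method 2 on Example 3 (an illustration only; the donor record shown is hypothetical). It computes the ranges (4.3) for the fields left unchanged and borrows Sex and Age from the first clean donor whose values fall inside them.

```python
DOMAINS = {"Sex": {"Male", "Female"}, "Age": {"0-14", "15-16", "17+"},
           "Mar. Stat.": {"Single", "Married", "Divorced", "Separated", "Widowed"},
           "Rel. to Head": {"Wife", "Husband", "Son or Daughter", "Other"},
           "Education": {"None", "Elementary", "Secondary", "Post-secondary"}}

# e1, e2 and e4 are the potentially failed edits of Example 3, in normal form;
# fields not listed enter with their full code set.
e1 = {"Sex": {"Male"}, "Rel. to Head": {"Wife"}}
e2 = {"Age": {"0-14"}, "Mar. Stat.": {"Married", "Divorced", "Separated", "Widowed"}}
e4 = {"Age": {"0-14"}, "Rel. to Head": {"Wife", "Husband"}}

def joint_ranges(potential_edits, unchanged_fields):
    """Formula (4.3): intersect, over the potentially failed edits, the code
    sets of each field left unchanged."""
    return {f: set.intersection(*(e.get(f, DOMAINS[f]) for e in potential_edits))
            for f in unchanged_fields}

def joint_impute(record, impute_fields, ranges, donors):
    for donor in donors:                       # the search order could be randomized
        if all(donor[f] in codes for f, codes in ranges.items()):
            return {**record, **{f: donor[f] for f in impute_fields}}
    return None                                # fall back to Method 1 in practice

record = {"Sex": "Male", "Age": "0-14", "Mar. Stat.": "Married",
          "Rel. to Head": "Wife", "Education": "Elementary"}
donors = [{"Sex": "Female", "Age": "17+", "Mar. Stat.": "Married",
           "Rel. to Head": "Wife", "Education": "Secondary"}]
ranges = joint_ranges([e1, e2, e4], ["Mar. Stat.", "Rel. to Head", "Education"])
print(joint_impute(record, ["Sex", "Age"], ranges, donors))
```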

Note that this method may also be simply extended to take account of information in the other fields which are not explicitly involved in edits with the particular fields to be imputed, but which are thought to be closely associated. In fact, if the number of fields and the number of codes for each field is not too large, one may altogether avoid the determination of the "ranges" A_i* and simply borrow the values to be imputed from a record which passed all edits and which, in all relevant fields not to be imputed, agrees identically with the given record. Here, "relevant" implies involvement in edits which may potentially fail depending on the choice of values for imputation.

In terms of Example 3, Marital Status and Relationship to Head were relevant for the imputation of Sex and Age, but Education was not. Note that the concept of which fields are relevant in this sense may vary from one record to another. If the record in Example 3 had an Education code of Post-secondary, then Education would also have been relevant in imputing Age.

This method takes much more account of the information contained in the other data fields of the record to be imputed. The sets of values A_i* essentially define jointly a subset of the whole population to which the record under consideration belongs and which is smaller than the whole population. The use of other fields thought to be well associated with the field to be imputed strengthens the use of the information contained within the record. On the other hand, since we are searching for a record belonging to a smaller subset of the population, it will probably take longer than with Method 1 to find a suitable record -- although, at least partly offsetting this is the fact that there is only one search instead of K.

Since the imputation is done jointly for the K fields, it is not just the values of the single fields which will now appear in the same proportions as in the population, but also the incidence of combinations of values. Thus, no longer is there a danger of imputing into the same record two reasonably common values which almost never occur together.

Clearly, the relative merits of Methods 1 and 2 depend on the complexity of the edits, the amount of information contained in some fields in a questionnaire about others, the number of fields and their possible code values, and the frequency of edit failures. Almost certainly any attempt to use Method 2 will need a default option in case no previously accepted record can be found within a reasonable search period. In fact, the system implemented at Statistics Canada uses Method 2 as the main imputation procedure, with Method 1 as a default option.

V. IMPLEMENTATION PROCEDURES

This section is devoted to an outline of a number of practical methods developed to facilitate the implementation of the previous sections. A generalized system for editing and imputation has, in fact, been developed based on the concepts of this article.4

5.1 The Application of Logical Edits to a Record

The entire implementation strategy of a system of automatic editing and imputation hinges on the possibility of representing logical edits (in the normal form) as well as coded data records in the form of extremely simple strings of 0's and 1's ("bits," according to computer terminology). The method will be illustrated using Example 3.

The column headings of Table 1 correspond to a description of all possible codes of all possible fields of Example 3; one column corresponds to each possible code, and a set of columns corresponds to one field. Each line in the table is a representation of one edit.

Each edit in the normal form can be fully characterized in terms of the code sets with which each field enters that edit. For example, e3 can be written as

e3: (Sex = Any code) ∩ (Age = Any code) ∩ (Mar. Stat. = Not Married) ∩ (Rel. Head = Spouse) ∩ (Education = Any code) = F.

This edit is fully characterized by the five code sets corresponding to the five fields. These code sets, in turn, can be characterized by entering in Table 1 an entry of 1 for each member of the set, and 0 elsewhere. Thus, for e3, we enter 1's in the columns corresponding to the two possible Sex codes, the three possible Age codes, the four possible Education codes; however, of the five possible Marital Status codes, only the four columns corresponding to the description Not Married are completed with 1; in the Married column we enter 0; and of the four columns corresponding to the Rel. Head codes we enter 1 only in the two columns of Husband and Wife (i.e., Spouse), putting 0's in the other two columns. Table 1 thus provides a unique representation of the five edits of Example 3. It is called the logical edit matrix.

4 For further information about the system see Podehl W. M. (1974) and Graves R.B. (1976).

Table 1. Representation of the five Edits and One "Current record" of Example 3

                  Sex          Age               Marital Status                Rel. to Head                 Education
Edit            Male Fem   0-14 15-16 17+   Sgl  Mar  Div  Sep  Wid   Wife  Husb  Son/Dau  Other   None  Elem  Sec  Post-sec
e1               1    0     1    1    1      1    1    1    1    1      1     0      0       0       1     1    1      1
e2               1    1     1    0    0      0    1    1    1    1      1     1      1       1       1     1    1      1
e3               1    1     1    1    1      1    0    1    1    1      1     1      0       0       1     1    1      1
e4               1    1     1    0    0      1    1    1    1    1      1     1      0       0       1     1    1      1
e5               1    1     1    1    0      1    1    1    1    1      1     1      1       1       0     0    0      1
Current record   1    0     1    0    0      0    1    0    0    0      1     0      0       0       0     1    0      0

Having represented all edits in this fashion, we can similarly represent a data record. To represent the current record of Example 3, under each field we enter 1 in one of the columns, 0's elsewhere: the column where 1 is entered corresponds to the code for that field of the current record. Comparing the last row of Table 1 with the current record of Example 3, the representation becomes almost self-explanatory.

Now it is very easy to apply to our current record each of the edits. Conceptually, we superimpose the current record on each of the rows corresponding to the edits. It is easy to verify that the current record fails an edit if and only if all the 1's of the data record are overlaid on 1's in the row corresponding to that edit. Put differently, if an edit is viewed as a vector of 0's and 1's (in this example, a vector of 18 components) and if the current record is similarly viewed, the data record would fail an edit if and only if the scalar product of the two vectors is equal to the number of fields (five, in this example).

A faster and more effective equivalent algorithm would consist of selecting from the preceding edit matrix those columns corresponding to the code values of the current record. In the example, we would obtain for the current record the first five columns in Table 2. We add a last column, which is simply the product of the entries in each of the rows. A 0 in the last column indicates that the current record passes the edit; a 1 indicates that the edit is failed. Thus, we see once again that the record of Example 3 passes Edits e3 and e5, but it fails Edits e1, e2, and e4. The validity of this algorithm is easily verified.
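A bit-vector version of this check is straightforward to write. The sketch below (an illustration, not the generalized system referred to in this article) encodes two of the edits of Table 1 as 18-component 0/1 vectors and applies the column-selection test just described to the current record of Example 3.

```python
# Column layout of Table 1: one column per possible code, grouped by field.
FIELDS = {
    "Sex": ["Male", "Female"],
    "Age": ["0-14", "15-16", "17+"],
    "Mar. Stat.": ["Single", "Married", "Divorced", "Separated", "Widowed"],
    "Rel. to Head": ["Wife", "Husband", "Son or Daughter", "Other"],
    "Education": ["None", "Elementary", "Secondary", "Post-secondary"],
}

def record_columns(record):
    """Column indices selected by the record's code values, one per field."""
    cols, offset = [], 0
    for field, codes in FIELDS.items():
        cols.append(offset + codes.index(record[field]))
        offset += len(codes)
    return cols

def edit_fails(edit_row, record):
    """The edit fails iff every selected column carries a 1 (product equals 1)."""
    return all(edit_row[c] == 1 for c in record_columns(record))

e1 = [1,0, 1,1,1, 1,1,1,1,1, 1,0,0,0, 1,1,1,1]   # (Sex = Male) and (Rel. Head = Wife)
e3 = [1,1, 1,1,1, 1,0,1,1,1, 1,1,0,0, 1,1,1,1]   # (Not Married) and (Rel. Head = Spouse)
record = {"Sex": "Male", "Age": "0-14", "Mar. Stat.": "Married",
          "Rel. to Head": "Wife", "Education": "Elementary"}
print(edit_fails(e1, record), edit_fails(e3, record))   # True False
```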

It can be readily seen that if a given edit is represented by a string of 1's for every possible code of a given field (such as Sex in e2), this indicates that a record will pass or fail that edit irrespective of its code in the given field.

Table 2. Representation of Edit Algorithm

Edit    Male   0-14   Married   Wife   Elementary   Product
e1       1      1        1        1         1          1
e2       1      1        1        1         1          1
e3       1      1        0        1         1          0
e4       1      1        1        1         1          1
e5       1      1        1        1         0          0

Thus, Edit e2 will pass or fail depending on the code for Age and for Marital Status -- but irrespective of the code for Sex, Relationship to Head, or Education. No edit can be represented by a set of 0's for all code values in a field, since this would imply that the edit could not be failed by any record.

Single-field edits can also be included in the representation illustrated by Table 1. All we need do is expand the code set for every field by the inclusion of an extra code labeled "Invalid." Then a single field validation edit, say, for Sex, would be represented by 1 in the column corresponding to the Invalid code of the field Sex, the other columns of Sex being filled out with 0's, and all other columns of all other fields being filled out with 1's (indicating that for this particular edit the fields other than Sex are irrelevant, and as far as Sex is concerned, the only code which would fail this edit would be a code Invalid).

Note that, as far as editing only is concerned, implied edits need not be considered -- if the initially stated explicit edits all pass, no implied edit can fail. Thus, e4 would not need to be considered, since it has been shown to be implied by e2 and e3. Nevertheless, when we come to imputation, the implied edits become relevant.

5.2 The Derivation of a Complete Set of Logical Edits

Having represented logical edits in the form of the logical edit matrix, it is easy to devise a procedure implementing the Lemma for generating new edits.

To illustrate, refer to Table 1. Let us derive an implied edit from e2 and e3, using Marital Status as the generating field. The construction of the new edit from e2 and e3 (represented as in Table 1) can be graphically illustrated.

Edit            Sex    Age       Mar. Stat.                                Rel. to Head    Education
e2              all    {0-14}    {Married, Divorced, Separated, Widowed}   all             all
e3              all    all       {all codes other than Married}            {Wife}          all
                and    and       or                                        and             and
Implied edit    all    {0-14}    all codes                                 {Wife}          all

(Here "all" means that every code of the field carries a 1, i.e., the field is not specifically involved in the edit.)

In the representation of the implied edit for each field, except the generating field, we enter 0 for a code if any of the contributing edits has 0 there -- entering 1 only if all the contributing edits have 1 in that code. For the generating field this is reversed: we enter 1 for a code if any of the contributing edits has 1 there -- entering 0 only if all the contributing edits have 0 in that code.

The new edit produced is a valid, essentially new implied edit unless:

1. one of the fields contains all 0's;
2. the generating field does not contain all 1's;
3. the new edit is already contained in an existing edit.

This last condition is easily checked, since, for the new edit to be redundant there must be an edit already identified which has a 1 in every location where the new edit has a 1. Note that the new edit is already in the normal form (2.7) and can be added to the list of edits.

This "implied edit" satisfies both of the first two criteria just listed. In fact, it is Edit e 4, of Table 1. Thus, because of Condition 3, it would not be added to the list of edits (e4 is already part of that list).

In very simple cases one could combine all edits in pairs, triples and higher combinations, using each field as the generating field in turn, until all combinations have been exhausted. It is clear that for all but the simplest of situations this would be tedious and a screening operation is required to eliminate fruitless combinations. The following rules could form the basis of a suitable algorithm. The proof of each rule has either been given previously or is straightforward.

1. For any set of edits to produce an essentially new edit, they must have a field in common which is specifically involved in each of them (through a proper subset of its values).
2. No essentially new edit will be produced from an implied edit and a subset of the edits from which it was itself generated.
3. A combination of edits using a particular Field i as the generating field need not be considered if some subset of the proposed combination using the same Field i has already resulted in an essentially new implied edit.

Remember that any new edit which involves just one field indicates that the original explicit edits are themselves inconsistent. Remember also that, having derived all implied edits, it is necessary to add the single field valid code checks to form the complete set of logical edits.

With these simple conditions it is possible to develop an algorithm to generate all essentially new implied edits and at the same time insure that the explicit edits are mutually consistent. In fact, the general system for editing and imputation, mentioned earlier, includes a module for the derivation of all essentially new implied edits. Note that the consistency of the edits is validated at the edit generation stage before data processing begins, when corrections are easily made, rather than at the data processing stage when, through a chance combination of data values, the belated discovery of the existence of inconsistencies could cause extensive dislocation of processing schedules, together, possibly, with the need to reformulate the edits and reprocess the data.

5.3 The Derivation of a Complete Set of Arithmetic Edits

It is easy to derive formal rules which correspond, in the case of arithmetic edits involving linear expressions, to a repeated application of the Lemma to generate essentially new implied edits. To save space, we consider only the case where the fields clearly divide into two classes: those involved in only logical edits and those involved in only arithmetic edits. In such a case, a complete set of edits can separately be derived for the logical edits (using the method of Section 5.1) and for the arithmetic edits (as outlined shortly). We also restrict ourselves to linear arithmetic edits.

A linear edit, expressing an edit failure, takes the form

f(a_1, a_2, ..., a_N) ≥ 0,

where the inequality is strict or weak according to the original edit specification, and where f is a linear function of the variables a_1, a_2, ..., a_N. Any record for which f ≥ 0 fails this particular edit. It is easy to see that, as well as edit failures specifically of this type, this form also includes the two most common other types of arithmetic edits, where a linear identity must be satisfied and where a linear expression must lie within a constant range (both of which can be translated into two inequalities representing edit failures). Similarly, a rational expression of the fields within a constant range can also be translated into edits of this form. Clearly, by multiplying by plus or minus one, the inequality can always be standardized in the direction just shown.

Since we have limited ourselves to linear edits, each expression is completely described by N + 1 coefficients and an indicator of the type of inequality (strict or otherwise).

Edit    Constant    Field 1    Field 2    ...    Field N    Indicator
e_r     α_0^r       α_1^r      α_2^r      ...    α_N^r      δ_r

For any Field i not involved in the edit e_r, α_i^r is zero. For purposes of standardization the first nonzero coefficient should be ±1 (this can always be achieved by dividing through by its absolute value). δ_r = 1 in the case of strict inequality, zero in the case of weak inequality. The derivation of a complete set of arithmetic edits is now achieved as follows.

Theorem 3: An essentially new implied edit e_t is generated from edits e_r and e_s, using Field i as the generating field, if and only if α_i^r and α_i^s are both nonzero and of opposite sign. The coefficients of the new edit, α_k^t, are given by

α_k^t = α_i^r α_k^s − α_i^s α_k^r ;   k = 0, 1, ..., N,

where r and s are so chosen that α_i^r > 0 and α_i^s < 0, and δ_t = δ_r δ_s. These coefficients of the new edit need to be standardized by making the first nonzero coefficient equal to ±1.

In effect, the theorem simply states that from two linear inequalities where the inequality signs are in the same direction, a variable can be eliminated by taking their linear combinations if and only if the variable has coefficients in the two inequalities which are of the opposite sign. The proof is contained in the Appendix. It can also be shown that, in the case of arithmetic edits, repeated application of Theorem 3 will derive all essentially new implied edits. The proof follows directly from that of Theorem 2, but it is rather cumbersome and will be omitted.

Thus, in this most common of situations, when simple inequalities express edit failures, obtaining a complete set of edits is quite straightforward. By simply combining the suitable edits using as generating fields all possible fields, a complete set of edits will be achieved.
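As an illustration only, the elimination rule of Theorem 3 can be coded directly on the coefficient representation just described. In the sketch below, an edit is held as its coefficient list (index 0 is the constant term) plus the strictness indicator δ; the example edits at the end are hypothetical.

```python
# Sketch of Theorem 3: eliminating the generating field i from two linear
# edits whose coefficients for field i have opposite signs.
# An edit is stored as (coeffs, delta): coeffs = [alpha_0, ..., alpha_N],
# delta = 1 for a strict inequality, 0 for a weak one.

def implied_arithmetic_edit(edit_r, edit_s, i):
    coeffs_r, delta_r = edit_r
    coeffs_s, delta_s = edit_s
    a_r, a_s = coeffs_r[i], coeffs_s[i]
    if a_r == 0 or a_s == 0 or (a_r > 0) == (a_s > 0):
        return None                      # no essentially new edit from this pair
    if a_r < 0:                          # choose r, s so that alpha_i^r > 0, alpha_i^s < 0
        coeffs_r, coeffs_s = coeffs_s, coeffs_r
        delta_r, delta_s = delta_s, delta_r
        a_r, a_s = a_s, a_r
    # alpha_k^t = alpha_i^r * alpha_k^s - alpha_i^s * alpha_k^r
    new = [a_r * cs - a_s * cr for cr, cs in zip(coeffs_r, coeffs_s)]
    new[i] = 0.0                         # zero by construction: the generating field drops out
    # Standardize: make the first nonzero coefficient equal to +/-1.
    scale = next((abs(c) for c in new if c != 0), None)
    if scale is None:
        return None
    return [c / scale for c in new], delta_r * delta_s

# Hypothetical example: failure regions  x1 + x2 - 10 >= 0  and  -x1 + 3 >= 0.
# Eliminating field 1 yields (up to scaling) the implied failure edit x2 >= 7.
print(implied_arithmetic_edit(([-10, 1, 1], 0), ([3, -1, 0], 0), 1))
```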

Note that the set of coefficients of the arithmetic edits e_r can be used in a fashion analogous to the method of Section 5.1 to determine which edits are failed by a given record. As in Section 5.1, we start with the initial explicit edits; if the record passes all of these, the implied edits need not be considered. As before, every record which has an invalid code in a field is considered as failing all edits which that field enters explicitly.

5.4 Identification of the Minimal Set of Fields for Imputation

Whether the edits be logical, arithmetic or a mixture, once a complete set has been developed we are in a position to process the records. If any edits are failed by a particular record, then the minimal set of fields for imputation needs to be identified.

Consider an (R' x N) matrix, R' being the number of failed edits in the complete set (including the single field edits) and N the number of fields on the questionnaire. Let the (r, n)th cell of the matrix be zero unless field n is specifically involved in edit r (the edit being failed by the data values of the record under consideration), in which case it is one.

For example, the following matrix indicates that Edit 1 is satisfied and, thus, does not appear in the matrix, Edit 2 is failed and Fields 1, 2 are involved in it, etc. Edit R is failed and Fields 2, 3, ... , N are involved in it.

                 Field
Edit     1    2    3    ...    N
e2       1    1    0    ...    0
...
eR       0    1    1    ...    1

This matrix, which we term the failed edit matrix, can be generated by computer.

The problem of identifying the smallest number of fields to be changed to make all edits satisfied is now reduced to the problem of choosing the smallest set of fields (columns) which together have at least one 1 in each failed edit (row).

Clearly, in simple cases we could examine each field (column) separately to see if it entered all the failed edits (rows), and then each pair of fields, and then triples, and so on. If we expect most records to be corrected by the imputation of very few fields, this may be satisfactory, but in the case of a large questionnaire this could be very slow.

A possible alternative procedure is as follows.

1. All satisfied edits may be ignored (rows containing zeros only).
2. Identify all fields failing single field edits (validity checks, incorrect blanks, etc.). By definition these must be included in the minimal set. All failed edits which explicitly involve these fields will now be covered off, so we can generate a modified failed edit matrix by deleting all edits in which the chosen fields appear explicitly. If no edits remain, the minimal set has been identified.
3. If Step 2 did not eliminate all edits, identify the edit involving the fewest fields (select an edit arbitrarily if there is a tie). At least one of the fields involved in this edit must be in the minimal set.
4. For each such field generate a modified failed edit matrix as in Step 2. Keep a record of the combination of fields so far selected.
5. Repeat Steps 3 and 4 until the first modified failed edit matrix vanishes.
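A minimal sketch of this search, assuming the failed edit matrix is held as one set of fields per failed edit, is given below; it is an illustration of Steps 1-5 rather than the algorithm as actually implemented.

```python
# Sketch of the minimal-set search of Section 5.4.
# failed_edits: list of sets, one per failed edit, each holding the fields
# that explicitly enter that edit for the record under consideration.

def minimal_sets(failed_edits, chosen=frozenset()):
    """Return the field sets reached by repeatedly picking the failed edit
    involving the fewest fields and branching on its fields (Steps 3-5)."""
    remaining = [e for e in failed_edits if not (e & chosen)]   # drop covered edits
    if not remaining:
        return {chosen}                                         # matrix has vanished
    pivot = min(remaining, key=len)                             # edit with fewest fields
    solutions = set()
    for field in pivot:
        solutions |= minimal_sets(remaining, chosen | {field})
    return solutions

# Failed edit matrix of the example record below (Male, 0-14, Divorced,
# Wife, Post-secondary): the fields entering each failed edit e1, ..., e5.
failed = [
    {"Sex", "Rel. to Head"},        # e1
    {"Age", "Mar. Stat."},          # e2
    {"Mar. Stat.", "Rel. to Head"}, # e3
    {"Age", "Rel. to Head"},        # e4
    {"Age", "Education"},           # e5
]

candidates = minimal_sets(failed)
smallest = min(len(s) for s in candidates)
print([sorted(s) for s in candidates if len(s) == smallest])
# Expected, per the worked example: [['Age', 'Rel. to Head']]
```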

To illustrate the algorithm, we return to Example 3. Suppose a current record (different from Table 1) is as follows

Sex = Male Age = 0-14 Mar. Stat. = Divorced Rel. to Head = Wife Education = Post-secondary

A look at the edit representation part of Table 1 will show that all five edits are failed, so we get the following failed edit matrix for the given current record.

Edits   Sex   Age   Mar.Stat.   Rel. to Head   Education
e1       1     0        0            1             0
e2       0     1        1            0             0
e3       0     0        1            1             0
e4       0     1        0            1             0
e5       0     1        0            0             1

A 1 corresponding to, say, Sex and Rel. to Head in e1 indicates that these fields explicitly enter e1; similarly, the other edits. The structure of this tabulation is immediately derived from Table 1 and the current record.

Now we start a cycle as just indicated.

Cycle 1: Select an edit involving the fewest number of fields. In the present case all edits involve two fields, so we arbitrarily select e1. Fields in minimal set after Cycle 1: Sex or Rel. to Head.

Next, eliminate all the edits which involve either Sex or Rel. to Head. We obtain two modified failed edit matrices, corresponding to the choice of Sex or Rel. to Head for imputation.

Cycle 2: The following tabulation shows the modified failed edit matrix if Sex is to be imputed from Cycle 1.

Edits   Age   Mar.Stat.   Rel. to Head   Education
e2       1        1            0             0
e3       0        1            1             0
e4       1        0            1             0
e5       1        0            0             1

Selecting e2 (arbitrarily, since all edits involve the same number of fields) we could impute Age or Mar. Stat.

Possible fields in minimal set:

Sex and Age
Sex and Mar. Stat.

If we select Rel. to Head in Cycle 1, we obtain the modified failed edit matrix shown in the following tabulation.

Edits   Sex   Age   Mar.Stat.   Education
e2       0     1        1           0
e5       0     1        0           1

Selecting e5 arbitrarily, we would get the following possible fields in minimal set:

Rel. to Head and Age
Rel. to Head and Education.

Corresponding to the four possible sets of fields in the minimal set after Cycle 2, we get the following four modified failed edit matrices.

If Sex and Age is imputed from Cycle 2, we obtain

Edits   Mar. Stat.   Rel. to Head   Education
e3          1             1             0

If Sex and Mar. Stat. is imputed from Cycle 2, we obtain

Edits   Age   Rel. to Head   Education
e4       1         1             0
e5       1         0             1

If Rel. to Head and Age is imputed from Cycle 2, we obtain

Empty.

Finally, if Rel. to Head and Education is imputed from Cycle 2, the modified failed edit matrix becomes

Edits   Sex   Age   Mar.Stat.
e2       0     1        1

Since, at the end of Cycle 2 we obtained an empty modified failed edit matrix, we do not need to continue: the fields in the minimal set are Rel. to Head and Age.

Of course, in this simple example the result could be directly verified from the first failed edit matrix: Rel. to Head and Age between them cover off all failed edits, but no other pair of fields does.

Other approaches to the problem may be developed which could well be more efficient. The proposed procedure is a relatively simple iterative one for identifying all possible minimal sets for imputation purposes.

5.5 Imputation

We will now translate the two methods of imputation proposed in Section 4 into the representation developed in this section and suggest algorithms for both methods. This will be done in terms of Example 3 of Section 4. The same example is represented by Table 1 of Section 5.1. Thus, both edits and current record are as shown in Table 1 (the current record of Section 5.4 was different!).

As in Section 4, we have decided to impute for the current record the fields Sex and Age. This could have been an outcome of the algorithm outlined in Section 5.4. (For purposes of illustration, another current record was used.)

Method 1: Suppose we impute first Age (Field 2), then Sex (Field 1).

1. Identify the edits which are satisfied on account of the values which the current record has in the other three fields, irrespective of what we may impute for Age and Sex. A look at the column of Table 1 corresponding to the current value of Mar. Stat. (= Married) indicates that Edit e3 will be satisfied irrespective of the current values of any of the other fields. The current value of Rel. to Head (= Wife) does not result in the automatic satisfaction of any edit, but the current value of Education (= Elementary) results in e5 being satisfied irrespective of the other fields.

Thus only e1, e2, and e4 can possibly impose constraints on the imputation of Age and Sex.

2. Consider those edits among e1, e2, and e4 which involve Age but not Sex. Again, a look at Table 1 shows that e1 does not involve Age. The edits e2 and e4 both involve Age and neither involves Sex. So we have to impute for Age in such a way that these two edits are satisfied. Theorem 1 insures that this can always be done. Looking at the Age portion of e2 and e4, we get from Table 1

Edit   0-14   15-16   17+
e2       1      0      0
e4       1      0      0

If we combine column by column the codes in such a way that the resulting row contains 1 whenever any of the rows contained 1, otherwise it contains 0, we obtain in this simple case the representation of possible imputations for Age.

0-14   15-16   17+
  1      0      0

Now if we impute for Age any code where the preceding representation contains 0, we will have both Edits e2 and e4 satisfied. So the acceptable imputations for Age are 15-16 or 17+.

3. We search (with a random starting point) the previously accepted records until we find one with one of the Age codes just listed. We impute the value of Age from that record.

4. In this simple example we now only have Sex to impute (more generally, we would proceed sequentially field by field repeating steps 1, 2, and 3). The edits among e1, e2, and e4 which involve Sex are: e1 alone. The only possible value of Sex which results in e1 being satisfied is Female (the only 0 entry for Sex in e1). So the acceptable imputation for Sex is Female.

5. Repeat step 3 for Sex.
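The following sketch translates Method 1 into code, under the same assumed data layout as the earlier sketches (edits as dictionaries of 0/1 lists, previously accepted records as a simple list of dictionaries); the donor search is reduced to a circular scan from a random starting point.

```python
# Sketch of Method 1 (sequential imputation), Section 5.5.
import random

def involves(edit, field):
    """A field is specifically involved in an edit if its 0/1 row is not all 1's."""
    return not all(edit[field])

def flag(edit, field, code, fields):
    """The 0/1 entry of the edit for a given code of a given field."""
    return edit[field][fields[field].index(code)]

def impute_method_1(record, to_impute, edits, fields, donors, seed=1):
    rng = random.Random(seed)
    record = dict(record)
    pending = list(to_impute)
    while pending:
        field = pending.pop(0)
        kept = [f for f in fields if f != field and f not in pending]
        # Step 1: edits not already satisfied by the fields we are keeping
        # (an edit can still fail only if every kept field hits a 1 in it).
        live = [e for e in edits.values()
                if all(flag(e, f, record[f], fields) == 1 for f in kept)]
        # Step 2: among these, the edits involving this field but none of the
        # fields still waiting to be imputed.
        relevant = [e for e in live if involves(e, field)
                    and not any(involves(e, f) for f in pending)]
        # A code is acceptable if no relevant edit has a 1 for it.
        acceptable = [c for c in fields[field]
                      if all(flag(e, field, c, fields) == 0 for e in relevant)]
        # Step 3: search the previously accepted records from a random
        # starting point for one with an acceptable code and copy it.
        start = rng.randrange(len(donors))
        for k in range(len(donors)):
            donor = donors[(start + k) % len(donors)]
            if donor[field] in acceptable:
                record[field] = donor[field]
                break
    return record
```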

Method 2:

1. This step is identical with that of Method 1. We need to concern ourselves only with e1, e2, and e4, since these are the only edits which may possibly fail, depending on the imputed values for Age and Sex -- thus, it is these we have to be careful to satisfy.

2. Consider the fields not to be imputed: Mar. Stat., Rel. to Head, Education. We want to consider the possible constraining effect of the current codes of these fields on the imputations which are to be made. It is easy to verify, on the basis of the material in Section 5.1, that this procedure will result in an imputation which will pass all the edits.

For each field not to be imputed, identify those codes (columns of Table 1) which contain 1 in each of the relevant edits: e1, e2, e4. This corresponds to the logical AND or set intersection operation for these code sets.

We obtain

Mar. Stat.: Married, Divorced, Separated, Widowed
Rel. to Head: Wife
Education: Any code.

3. Search (with a random starting point) the previously accepted records until one is found which in the fields not to be imputed has any combination of the codes just listed. Impute from that record its values in those fields for which the current record must be imputed. For example, if the first such record found has the following values in the fields not to be imputed: Mar. Stat. = Separated, Rel. to Head = Wife, Education = Secondary; and if for that record Age = 22, Sex = Female, then we impute for the current record Age = 22, Sex = Female.
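Method 2 can be sketched in the same style: for every field that is not to be imputed, find the codes carrying a 1 in all of the constraining edits, and then search the previously accepted records for a donor whose kept fields all fall within those code sets. The data layout and the helper are the same assumptions as before.

```python
# Sketch of Method 2 (joint imputation from a single donor), Section 5.5.
import random

def flag(edit, field, code, fields):
    return edit[field][fields[field].index(code)]

def impute_method_2(record, to_impute, constraining_edits, fields, donors, seed=1):
    """constraining_edits: the edits kept after Step 1 (those that may still
    fail depending on the imputed values)."""
    record = dict(record)
    kept = [f for f in fields if f not in to_impute]
    # Step 2: for each kept field, the codes with a 1 in every constraining
    # edit (logical AND / set intersection over the code sets).
    allowed = {f: {c for c in fields[f]
                   if all(flag(e, f, c, fields) == 1 for e in constraining_edits)}
               for f in kept}
    # Step 3: search the previously accepted records from a random starting
    # point for a donor matching the allowed codes, and copy its values for
    # the fields to be imputed.
    rng = random.Random(seed)
    start = rng.randrange(len(donors))
    for k in range(len(donors)):
        donor = donors[(start + k) % len(donors)]
        if all(donor[f] in allowed[f] for f in kept):
            for f in to_impute:
                record[f] = donor[f]
            return record
    return None   # no suitable donor found in this pass
```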

It is a simple matter to modify these methods to utilize other information in the questionnaire which is not linked explicitly through the edits to the fields requiring imputation. The problem is simply one of limiting the search of previously processed records to those that have particular values in particular fields, determining the acceptable values or ranges of values from the record requiring imputation. Thus, for example, one might as an additional constraint only impute income from records which have the same sex, age and occupation, even though there may be no explicit edit linking income with these fields.

To illustrate this point in terms of the preceding example, the imputation for Age and Sex of the particular current record was not actually constrained by Education, in the sense that the record from which these two fields would be imputed could have any code in its Education field. However, we could, as an additional constraint, insist that whenever Age is imputed but Education is not, the record from which we impute should have the same Education status as that of the current record. As an extreme, we could insist that the record from which we impute should have identical codes to those of the current record. Of course, the more constraints we put on the record from which we impute (in addition to the constraints imposed by the logic of the edits), the longer, generally, the search will be through the previously processed records until we find a suitable one from which to impute.

VI. THE BENEFITS OF THE PROPOSED METHOD

In summary, it may be useful to reiterate the advantages of the approach proposed here. These can be grouped under three different headings: methodological, systems, and subject matter benefits.

6.1 Methodological Benefits

1. The approach provides an orderly framework and philosophy for the development of edit and imputation procedures for surveys, in the sense that all procedures follow consistently and, largely, predictably from the edit specifications provided by subject-matter experts.
2. The procedure preserves the maximum amount of reported data, in that it minimizes the amount of imputation.
3. The procedure guarantees that records which have already been corrected will satisfy all the edit constraints.
4. Imputations are based on full information, i.e., the imputation of any field of a current record takes advantage of all the reported information of that record which is not subject to imputation (and which is logically related, through the edits, to the fields to be imputed).
5. For each questionnaire a log can be kept of all edits failed. This facilitates subsequent evaluation and the tracing of specific edit and imputation actions.
6. For each edit statement a log can be kept of all questionnaires failing it.
7. The uncorrected data can be preserved and cross-tabulated with the corrected data.

The major significance of these advantages lies in their evaluation possibilities. For example, a sample of questionnaires can be edited and imputed before the processing of the bulk of the survey data begins. If it turns out that some particular edits are failed by an inordinately large proportion of records, one would become suspicious of either the validity of the particular edit, the field procedures followed, or the ability of respondents to answer some particular questions. Similarly, crosstabulation of uncorrected data with corrected data can provide revealing early feedback on the likely gross and net effect of imputations.

6.2 Systems-Oriented Benefits

1. Systems development need not be constrained by potential delays in obtaining particular subject-matter edit specifications. A generalized edit and imputation system can be developed which is independent of any particular subject-matter application. As mentioned earlier, such a generalized system has, in fact, been developed at Statistics Canada. Specific subject-matter edit specifications become input to an already implemented system.
2. The edit and imputation problem is completely defined (at least in terms of system specifications) and is therefore relatively straightforward to implement.
3. The approach lends itself, throughout, to modularity of systems. It separates clearly the following distinct stages: analysis of edits (in the form of derivation of implied edits); editing (this itself is modular, due to the nature of the normal form of edits); and, finally, imputation. Such a modular approach to systems implementation facilitates changes in specifications.

6.3 Subject-Matter-Oriented Benefits

1. Given the availability of a generalized edit and imputation system, subject-matter experts can readily implement a variety of experimental edit specifications whose impact can therefore be evaluated without extra effort involving systems development. This is particularly important given the generally heuristic nature of edit specifications.
2. Specifications need not be presented in the form of integrated flowcharts or decision tables, but in that of completely independent statements of the variety of conditions which should be considered as edit failures. Thus, the development of edit specifications can proceed simultaneously by a variety of subject-matter experts. This is an important consideration in the case of multisubject surveys, such as a census.
3. Only the edits have to be specified in advance, since the imputations are derived from the edits themselves for each current record. This represents a major simplification for subject-matter experts in terms of the workload of specifying a complete edit and imputation system.
4. Feedback is available throughout the application of the system. The first feedback, in fact, can take place prior to the availability of any data. When edits are specified by subject-matter experts, these are first analysed to derive all essentially new implied edits. The implied edits are available for review. They have been found to be very useful in practice. Clearly, whenever an implied edit is judged inappropriate, at least one of the originally specified edits must also be inappropriate. Since it is easy to identify those initially-specified edits from which a given implied edit has been derived (this is, in fact, part of the feedback from the system), such a review represents a potentially useful device to screen the originally specified edits. Other feedback features of the system include a variety of optional and default tabulations showing the number of records failing edits, the number of times each edit has been failed, etc.
5. Since each edit statement (in the normal form) is completely self-contained, and because the imputations are automatically derived from the edits, the total complex of edit and imputation system can be re-specified with complete ease by dropping some edit statements previously specified or by adding new edit statements to those previously specified. This feature, together with the possibility of early feedback throughout the operation of editing and imputation, represents a powerful combination, if used with appropriate care.

VII. FURTHER CONSIDERATIONS

Nonuniqueness of the smallest number of fields to be imputed: There is no reason that the method of obtaining the smallest set of fields, described in Section 3, should lead to a unique solution. In fact, it is quite possible that more than one minimal set of fields can be identified.

One method of dealing with this problem, particularly if Method 2 is used for imputation, is to identify all the minimal sets of fields which could be imputed and to accept that set which can be imputed soonest in the search. For the cost of this more complicated computer search technique, one will likely get the benefit of finding a suitable record after a shorter search than when looking for a single set of fields to impute. Of course, one could much more simply arbitrarily or randomly decide between the alternatives.

A priori weighting of fields for reliability: It often happens that one has an a priori belief that some fields are less likely to be in error than others. If one field is the product of a complicated coding operation and the other is a self-coded response, then, intuitively, one is inclined to believe that the simple self-coded response is more likely to be error-free. By and large we have not tried to quantify this, believing that the data alone, by identifying the smallest possible set of fields to be changed, would more readily indicate the fields most likely to be in error. However, where there is more than one minimal set of fields, an a priori weight of the reliability of each field could easily be used to select among the minimal sets of fields the one to be imputed. For example, we could simply take the set with the lowest product of a priori weights.

Note that one could carry further the notion of a priori weights to the extent of choosing not the smallest number of fields which would convert all failed edits to satisfied ones, but rather the set with the smallest product of weights. The method of selecting the minimal set (see Section 5) can easily be modified to accommodate this change.

Instead of using a priori weights, one can edit the records in one pass through the computer and determine for each field the proportion of edits entered by the fields which are failed. This may provide a less subjective measure of the relative reliability of different fields.

APPENDIX

Proof of Lemma

If the sets A_j^* are not empty, then (3.1) is formally a valid edit. In fact, it is easy to verify that any set of values on the left side of (3.1) is included in the left side of one of the edits e_r and is, thus, an edit failure according to the edit e_r. This completes the proof.

Proof of Theorem 1

Assume that the theorem is false. Then there are values for the first K − 1 fields, say, a_i^0 (i = 1, 2, ..., K − 1), which satisfy all the edits in Ω_{K−1} but which have the property that, for every possible value of Field K, one of the edits in Ω_K is violated.

Identify one such failed edit in Ω_K corresponding to each possible value a_K of Field K,

e_r :  ∩_{i=1}^{K} A_i^r = F   (r = 1, 2, ..., R),   (A.1)

where

a_i^0 ∈ A_i^r   (i = 1, ..., K − 1)
a_K ∈ A_K^r.

Note that some proper subset A_K^r of the complete set of values A_K of Field K must enter each edit in (A.1), i.e., A_K^r cannot be equal to A_K since, if this were so, there would be an edit in Ω_{K−1} which is failed by (a_1^0, a_2^0, ..., a_{K−1}^0) -- contradicting our original assumption.

Consider the following edit, implied by the edits (A.1), and formed using the Lemma from the preceding edits e_r, with Field K as the generating field.

∩_{i=1}^{K−1} { ∩_{r=1}^{R} A_i^r }  ∩  { ∪_{r=1}^{R} A_K^r } = F.   (A.2)

Since a_i^0 ∈ A_i^r (i = 1, 2, ..., K − 1) for all r, the intersections ∩_{r=1}^{R} A_i^r are not empty. Hence, (A.2) is the expression for a valid edit. Also, according to our assumption, every possible value a_K of Field K is included in one of the sets A_K^r, so

∪_{r=1}^{R} A_K^r = A_K.

Therefore, (A.2) reduces to the form

∩_{i=1}^{K−1} { ∩_{r=1}^{R} A_i^r } = F.   (A.3)

Since this edit is in Ω_{K−1} and it rules out (a_1^0, a_2^0, ..., a_{K−1}^0) as an invalid combination, we have a contradiction with the way the values a_i^0 were chosen. It must follow, therefore, that the assumption was false and that, in fact, there is at least one value of Field K, say, a_K^0, which, together with a_i^0 (i = 1, ..., K − 1), satisfies all edits in Ω_K.

Clearly, the indexing of the fields was arbitrary and Theorem 1 holds for any set of K fields.

Note that we implicitly assumed that (A.3), an implied edit, is a member of the set of edits. Note also that (A.3) is an essentially new implied edit according to the definition following the Lemma. Thus, the validity of the theorem depends on the set of edits being complete with respect to essentially new implied edits, but not necessarily all possible edits.

Proof of Corollary 2

Suppose, without loss of generality, that S consists of the Fields K, K + 1, ..., N. Thus, at least one of Fields K, ..., N appears in each failed edit. Then there are no failed edits involving only the Fields 1, ..., K − 1, so that the values a_i^0 (i = 1, 2, ..., K − 1) satisfy all edits in Ω_{K−1}. Thus, from Corollary 1, there exist values a_i^0 (i = K, ..., N) such that a_i^0 (i = 1, ..., K − 1), together with a_i^0 (i = K, ..., N), satisfy all the edits.

Proof of Theorem 2

The statement that the edit ∩_{i=1}^{N} A_i^p = F is logically implied by the explicit edits means that every record which fails this edit will also fail at least one of the explicit edits. Consider some values

a_i^0 ∈ A_i^p ,   i = 1, ..., N.   (A.4)

(This is possible, since the sets A_i^p are not empty.)

It follows from (A.4) that the record having the values a_i^0 (i = 1, ..., N) must fail at least one of the explicit edits. Select one explicit failed edit corresponding to each possible value of a_N ∈ A_N^p. Suppose there are R_N such edits corresponding to all the possible values of a_N ∈ A_N^p:

e_{r_N} :  ∩_{i=1}^{N} A_i^{r_N} = F;   r_N = 1, ..., R_N.

Now consider the following edit e^* implied by the edits e_{r_N} (using the Lemma), with Field N as the generating field.

e^* :  ∩_{i=1}^{N} A_i^q = F,  where

A_i^q = ∩_{r_N=1}^{R_N} A_i^{r_N} ;   i = 1, ..., N − 1,

and

A_N^q = ∪_{r_N=1}^{R_N} A_N^{r_N}.

This is an implied edit, since none of the intersections is empty (A_i^q includes at least the value a_i^0; i = 1, 2, ..., N − 1).

Now A_N^q includes the set A_N^p, since the edits were so constructed as to insure that every a_N ∈ A_N^p was included in one of the R_N edits and, thus, in one of the sets A_N^{r_N}.

Since the value a_{N−1}^0 was arbitrarily chosen from the set of values A_{N−1}^p, an edit like e^* could be derived for every value of a_{N−1} ∈ A_{N−1}^p. Consider one such edit generated for each value of a_{N−1} ∈ A_{N−1}^p,

e^*_{r_{N−1}} :  ∩_{i=1}^{N} A_i^{r_{N−1}} = F;   r_{N−1} = 1, ..., R_{N−1},   (A.5)

where, according to the construction of the edits e^*, we have

A_N^{r_{N−1}} ⊇ A_N^p ;   r_{N−1} = 1, ..., R_{N−1},   (A.6)

i.e., the right side is a subset of the left side. Consider the edit implied by the edits of (A.5), using Field N − 1 as the generating field.

e^{**} :  ∩_{i=1}^{N} A_i^s = F,  where

A_i^s = ∩_{r_{N−1}=1}^{R_{N−1}} A_i^{r_{N−1}} ;   i = 1, ..., N − 2, N;

A_{N−1}^s = ∪_{r_{N−1}=1}^{R_{N−1}} A_{N−1}^{r_{N−1}}.

Due to the construction of this edit we have

A_{N−1}^s ⊇ A_{N−1}^p ,

and due to (A.6) we have

A_N^s ⊇ A_N^p .

Continuing in this way with N − 2, N − 3, ..., 1, we can generate an edit

∩_{j=1}^{N} A_j^x = F,

where A_j^x ⊇ A_j^p, j = 1, ..., N. Clearly, the preceding edit is failed by every record which fails the edit in the statement of Theorem 2. Hence, the edit of Theorem 2 can indeed be generated from the explicit edits by the procedure of the Lemma.

Proof of Theorem 3

To prove Theorem 3, write Edits e_r and e_s as

e_r :  f(a_1, a_2, ..., a_N) ≥ 0
e_s :  g(a_1, a_2, ..., a_N) ≥ 0.

A record which satisfies the first (second) inequality fails the edit e_r (e_s).

We can express the variable corresponding to the generating field which, without loss of generality, we assume to be Field N. Observing that α_N^r > 0 and α_N^s < 0, we obtain

a_N ≥ f'(a_1, a_2, ..., a_{N−1})
a_N ≤ g'(a_1, a_2, ..., a_{N−1}),

where the coefficients β_k^r of f' and β_k^s of g' are given by

β_k^r = −α_k^r / α_N^r
β_k^s = −α_k^s / α_N^s ;   k = 0, 1, ..., N − 1.

Now suppose that a given record has values a_1^0, a_2^0, ..., a_{N−1}^0 such that

g'(a_1^0, a_2^0, ..., a_{N−1}^0) ≥ f'(a_1^0, a_2^0, ..., a_{N−1}^0).   (A.7)

In this case, whatever the value of the record in the Nth Field, it will fail at least one of the edits e_r or e_s. Indeed, in this case we will either have a_N ≥ g', or a_N ≤ f', or g' > a_N > f'. In any of these cases at least one of e_r or e_s will fail. For example, if a_N ≥ g', then, because of (A.7), we also have a_N ≥ f', so e_r fails.

Thus, (A.7) is an implied edit (actually an essentially new implied edit), which can be rewritten as

h(a_1, a_2, ..., a_{N−1}) ≥ 0.

The coefficients α_k of h are clearly given by

α_k = α_k^r / α_N^r − α_k^s / α_N^s ;   k = 0, 1, ..., N − 1.   (A.8)

Multiplying (A.8) by the positive quantity −α_N^s α_N^r, we obtain Theorem 3. It is easy to verify that the strict inequality applies only when both e_r and e_s involve strict inequalities.

It remains to show that if α_N^r and α_N^s are of the same sign (but both different from zero), then no essentially new edit can be implied by e_r and e_s using this procedure. Indeed, suppose that α_N^r and α_N^s are of the same sign (say, both positive). We obtain, instead of f' and g',

e_r :  a_N ≥ f'(a_1, a_2, ..., a_{N−1})
e_s :  a_N ≥ g'(a_1, a_2, ..., a_{N−1}).

Suppose that

e_t :  h(a_1, a_2, ..., a_{N−1}) ≥ 0

is an essentially new edit implied by e_r and e_s. Select some values a_1^0, a_2^0, ..., a_{N−1}^0 which fail e_t, i.e., for which

h(a_1^0, a_2^0, ..., a_{N−1}^0) ≥ 0.

If e_r and e_s are different edits, these values can also be so chosen that the corresponding values of f' and g' are not equal, say f' > g'. Now choose a value a_N^0 so that

f' > g' > a_N^0.

Clearly, the record a_1^0, a_2^0, ..., a_{N−1}^0, a_N^0 will fail e_t, yet it will satisfy both e_r and e_s. So e_t cannot be an edit implied by e_r and e_s (using Field N as the generating field) if α_N^r and α_N^s are of the same sign.

MACRO-EDITING -- A REVIEW OF SOME METHODS FOR RATIONALIZING THE EDITING OF SURVEY DATA

by Leopold Granquist Statistics Sweden

Abstract: The paper presents descriptions, studies, results and conclusions (including recommendations) on: The Aggregate Method, The Hidiroglou-Berthelot Method (Statistical Edits), The Top-Down Method, The Box-Plot Method and the graphical Box Method. By simulations on survey data in a production environment it has been found that these methods can reduce the manual verifying work of suspected data by 35 - 80 % as compared to corresponding traditional micro-editing methods without any loss in quality. They are easy to understand and can easily be implemented in existing computer-assisted error detecting systems by just adding programs to the existing system. The particular features of the methods are all compared and the main finding is that the methods are very similar and that they all aim at finding efficient boundaries to generally used micro edits. Boundaries should only be based on statistics on the weighted keyed-in data.

I. INTRODUCTION

The paper is mainly devoted to an overview of studies on macro-editing methods. Emphasis is given to the rational aspects of macro-editing as compared to micro-editing by presenting the results of some studies and by discussing the problems connected with microediting. The methods are described in a brief and schematic form to make them easy to understand. It may serve as a basis for considering macro-editing methods when designing an editing system for a survey with quantitative data.

Detailed descriptions of the methods and studies are found in the references given in the text and in the reference list. Stress is laid on the specific features of every method in order to facilitate a choice.

The paper concludes with a summing-up discussion on macro-editing versus micro-editing methods.

II. WHY AND WHEN MACRO-EDITING

The essential problem of traditionally applied micro-editing procedures might be formulated as follows: Too many checks with too narrow bounds produce too many error messages which have to be verified manually by clerks. The clerks are not able to assess the importance of a suspected error. Every flagged item has the same weight and claims the same amount of resources, but many errors have a negligible impact on the estimates as they are small or cancel out. Generally, the bounds of the checks are subjectively set on the principle "safety first", which means that only those data are accepted for which there are no reasons to suspect any error. For example, a very generally used check in business surveys at Statistics Sweden is to flag every item which indicates that the relative change since the previous survey exceeds ± 10 per cent. A considerable amount of over-editing is a general consequence of such micro-editing procedures.

This paper deals with such checks of quantitative data which flag "suspicious" data for a manual review. This type of check may be considered the opposite of validating checks, which indicate data that are erroneous. In Ferguson's article "An Introduction to the Data Editing" in this publication, they are called "Statistical Edits". Such procedures use distributions of current data from many or all questionnaires or historic data to generate feasible limits for the current survey data.

In this paper, macro-editing means a procedure for pointing out suspicious data by applying statistical checks/edits based on weighted keyed-in data. The upper and lower limits of a macro-editing check (macro edit) should be based only on: i) the data to be edited, and ii) the importance of the item on the total level.

The studies on the methods reported below have been simulation studies on real survey data. The results have been compared with the results of the micro-editing methods applied when the survey was processed. The changes made as a result of the micro-editing process were entered into a change file and the study consisted in investigating (by calculating a few measures) which data in the change file were flagged by the macro-editing method and which were not. The rationalizing effect was measured as the reduction of the number of flagged data, and the "quality loss" as the impact of the remaining errors (the errors found by the micro-editing of the survey, but not flagged by the macro-editing method under study).

III. THE EXPERIMENTAL DATA

The studies were carried out at Statistics Sweden and used data from the Survey on Employment and Wages in Mining, Quarrying and Manufacturing (SEW) and the Survey of Delivery and Orderbook Situation (DOS).

1. The Survey on Employment and Wages in Mining, Quarrying and Manufacturing

SEW is a monthly sample survey on employment and wages in the Swedish industry. The number of reporting units (establishments) is about 3000. The main variables are: number of workers, number of working hours, the payroll and wages/hour.

A traditional editing procedure is applied, but with inter-active up-dating, which includes checking against the edits. This editing procedure was recently revised by extending the acceptance limits of the ratio checks, co-ordinating all the checks and tuning the checks. The verifying work was then reduced by 50 % without any drop in quality. It should be noted that the rationalizing effect of the macro-editing methods studied here is compared to the revised procedure.

2. The Survey on Delivery and Orderbook Situation

DOS is a monthly sample survey of enterprises. There are about 2000 reporting units (kind-of-activity units) in a sample drawn once a year from the Swedish Register of Enterprises and Establishments. DOS estimates changes in the deliveries and the orderbook situation (changes and stocks) for both the domestic and the foreign market (six variables) for the total Swedish manufacturing industry and for 38 divisions (classified according to the Swedish Standard Industrial Classification of all Economic Activities).

The questionnaire is computer printed. The entry of the questionnaire data is carried out in three batches every production cycle by professional data-entry staff. The Top-Down macro-editing method is applied.

IV. MACRO-EDITING METHODS

Descriptions, studies and results are given on The Aggregate Method, The Hidiroglou-Berthelot Method (Statistical Edits) and The Top-Down Method.

The Box-Plot Method and The Box-Method are also reviewed, but as modifications or developments of the Aggregate Method and the Top-Down Method respectively. The Box-Method is under development and is here treated as an idea.

The Cascade of Tables Method developed by Ordinas (1988) for the editing of the Survey of Manufactures in Spain is only mentioned. This is because it is somewhat different from the type of macro-editing methods treated here. We classify it as an out-put editing method.

1. The Aggregate Method

The Aggregate Method is described in detail in Granquist's article in the latter part of this publication. It was developed in SAS as a prototype of a complete editing system for the SEW, to be run on the main-frame. A modified version of the method is reported in detail in Lindström's article later in this publication. It was developed in PC-SAS as a prototype for the SEW to be run on a personal computer.

1.1 Description of the method

The aggregate method can be defined as editing on an aggregate level followed by editing on a micro-level of the flagged aggregates. The basic idea is to carry out checks first on aggregates and then on the individual records of the suspicious (flagged) aggregates. All records belonging to a flagged aggregate of any of the variables form the error file. The checks on the individual level are then carried out on that error file.

The acceptance bounds are set manually on the basis of the distributions of the check functions. The most essential feature of the aggregate method is that the acceptance limits are set manually by reviewing lists of sorted observations of the check functions. Only the "s" largest observations and the "m" smallest observations of the check functions are printed on the lists.

Both the check function A on the aggregate level and the check function F on the individual level have to be functions of the weighted (according to the sample design) value(s) of the keyed-in data of the variable to be edited. By using the weighted values in function A, the checks on the aggregate level can be calculated in the same way as is done in traditional micro-editing. The macro-editing process can then be run as smoothly as a micro-editing process.

The lists of the sorted observations can be used either directly as a basis for reviewing observations manually (if identifiers are printed out together with the observations) or indirectly as a basis for setting acceptance limits for an error-detecting program, which then can produce error messages of suspected data for manual reviewing. The advantage of using the error-detecting program is that to the reviewer the process can be made to look identical to the old one. To implement the Aggregate Method, only programs for printing the lists of the sorted observations are needed. Such programs can easily be added to the old system.
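The two-stage logic of the method can be sketched schematically as follows; the record layout, check functions and limits in the sketch are hypothetical placeholders, not the SAS programs actually used for the SEW.

```python
# Schematic sketch of the Aggregate Method: check the weighted aggregates
# first, then check individually only the records of flagged aggregates.

def aggregate_method(records, ratio_limits, max_weighted_change):
    """records: dicts with 'aggregate', 'weight', 'current', 'previous'."""
    # Check function A on the aggregate level: ratio of weighted totals.
    totals = {}
    for r in records:
        cur, prev = totals.get(r["aggregate"], (0.0, 0.0))
        totals[r["aggregate"]] = (cur + r["weight"] * r["current"],
                                  prev + r["weight"] * r["previous"])
    lo, hi = ratio_limits
    flagged_aggregates = {a for a, (cur, prev) in totals.items()
                          if prev == 0 or not lo <= cur / prev <= hi}
    # The records of flagged aggregates form the error file; check function F
    # on the individual level: here, the absolute weighted change.
    error_file = [r for r in records if r["aggregate"] in flagged_aggregates]
    return [r for r in error_file
            if abs(r["weight"] * (r["current"] - r["previous"]))
               > max_weighted_change]

records = [
    {"aggregate": "SSIC 3511", "weight": 4.0, "current": 120.0, "previous": 100.0},
    {"aggregate": "SSIC 3511", "weight": 4.0, "current": 40.0, "previous": 95.0},
    {"aggregate": "SSIC 3522", "weight": 2.0, "current": 101.0, "previous": 100.0},
]
print(aggregate_method(records, ratio_limits=(0.85, 1.15),
                       max_weighted_change=100.0))
```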

Improvements by providing the lists with statistics or graphs. Obvious improvements are to provide the lists of the distribution tails with such statistics as the median, the quartiles, the range and the interquartile range, or with graphs. Here, Box-Plots (described in detail in Ordinas (1988)) are recommended.

1.2 The mainframe application on the SEW

The checks used in this application of the Aggregate Method consist of a ratio and a difference check. A value is indicated as suspicious only if it is rejected by both checks.

The aggregates used were the 89 four-digit SSIC-groups (SSIC = The Swedish Standard Industrial Classification). The questionnaires were processed in lots on arrival at the office. There were four or more editing runs every month.

The macro-editing process was preceded by a micro-editing process in order to
- fight those errors which cannot be detected by the macro-editing method,
- make it possible to carry out the macro-editing.

The studies made on mainframe applications are described below.

An experimental study

The first study was a pure experiment to find out how methods, computer programs and so on should be constructed. The editing procedure was applied to the whole body of the edited data for a selected month. We then detected two serious errors, which had not been detected when the survey was processed. This was an indication that micro-editing does not always detect even serious errors.

Study No. 1

The next study was done on a survey round a few months after it was processed. This time the records of the data file were divided into lots exactly as when the round was originally processed.

Results:

Number of records: 2951
Number of flagged records: 274
Number of errors found: 76

When the round was originally processed the number of flagged records was 435 and the number of errors found was 205. Thus, we got a reduction by 161 flagged records = 34 per cent.

The impact of the remaining errors was calculated for each variable, for all the 89 groups for which SEW data are published. See Table 1 in the annex.

Study No. 2

The simulation was run in parallel with the regular processing in order to eliminate the risk of the results of the regular editing influencing the bounds for the checks of the aggregate method.

Results:

Number of records: 2996
Number of flagged records: 225
Number of errors found: 50

When the round was regularly processed, the number of flagged records was 389. The number of errors found was 110. Thus, we got a reduction by 164 flagged records = 42 per cent. The impact of the remaining errors was calculated for each variable, for all the 89 groups for which SEW data are published. See Table 2 in the annex.

This study was carefully analyzed. The most important findings were that the acceptance intervals of the checks should be wider and should not be symmetric around 1 for the ratios and around zero for the differences. Furthermore, the best strategy is to set the limits as close as possible to the first outlier on both sides.

If the limits had been changed according to the findings, the number of flagged data would have been 134, which means a reduction of 66 % of the verifying work. For the corresponding quality table on the impact of the remaining errors on the estimates, see Table 2 in the annex.

1.3 The PC-application on the SEW (Study No 3)

A prototype of the same editing process was developed in PC-SAS for micro computers. It was evaluated on the SEW data from August, 1989. However, the realization of the Aggregate Method was somewhat modified. Instead of forming an error file from the aggregate check consisting of all questionnaires of all aggregates which had failed at least one edit, the records of a flagged aggregate were given a specific signal, telling which of the four variables did not pass the edit. The check on the micro level was then applied only to those questionnaires which belonged to the aggregates which had failed the check for that variable. This version of the method may imply a small reduction in the number of flagged data, as fewer records are checked on the micro level than in the realization described previously.

The result of this study was a reduction in the number of flagged data by nearly 80 per cent.

However, the loss in quality was slightly higher than in the preceding studies (see Table 3 in the annex) due to the modification, the unusually large number of questionnaires in the second processing round and/or the wider acceptance interval. It was found that this loss in quality was caused by a few large errors, which did not cause the aggregates to be flagged. These errors were very easy to detect. One interesting method of detecting those errors was to let all data pass a ratio check with very wide acceptance intervals. This could be done in the micro-editing part of the procedure or as a final check (described in detail in Lindström (1990b)).

The aggregate checks as such were then questioned. Their only advantage is that they can save storage space or computer time. However, when there are no problems with either the storage capacity or the computer cost, the aggregate checks can be skipped.

The method is still a macro-editing method, but the term "Aggregate Method" may not be adequate. When the tails of the distribution of the check function are provided with Box-Plots (which is recommended), this method is called the Box-Plot Method.

1.4 Conclusion

The simulations show that the Aggregate Method can form the main part of the editing process in the production processing of a survey. The method reduces the verifying work by 35 - 80 per cent without losses in quality or timeliness. The macro-editing concept is a realistic alternative or complement to micro-editing methods, and can be applied during the processing of the data under the same conditions as computer-assisted micro-editing methods, which reduces the manual verifying work to a considerable extent.

For small surveys the Box-Plot Method should be considered.

2. The Top-Down Method

The Top-Down Method is described in Granquist's article in the latter part of this publication and in Lindblom (1990). The method has been implemented in the mainframe application of "The Survey of Delivery and Orderbook Situation" (DOS). The production system is written in APL. A prototype written in PC-SAS for running on a micro computer has been developed and is reported in Lindblom (1990).

2.1 Description of the method

The idea behind the method is to sort the values of the check functions (which are functions of the weighted keyed-in values) and start the manual review from the top or from the bottom of the list and continue until there is no noticeable effect on the estimates.

The method is described as it is applied in the DOS production system. The generalization is obvious.

The procedure is governed by an inter-active menu program, written in APL. The in-data file is successively built up by the records of the three batches from the data-entry stage. There are three functions to select the records to be studied, i.e.

i) the 15 highest positive changes
ii) the 15 highest negative changes
iii) the 15 greatest contributions

which for every variable can be applied to the total and to the 38 branches. For a selected function and domain of study, the screen shows the following list for the 15 records of the in-data file sorted top-down:

IDENTITY   DATA   WEIGHT   WEIGHTED VALUE   TOTAL
......

The operator selects a record and immediately gets the entire content of the record on the screen. If an error is identified he can up-date the record on the screen and at once see the effects. The record usually loses its position on the top 15 list and the total is changed. The operator goes on until further editing does not change the total.
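A minimal sketch of the top-down list, using a simplified record layout (identity, weight, keyed-in current and previous values) and ignoring the menu program and the three data-entry batches, may clarify the idea; the numbers are invented.

```python
# Schematic sketch of the Top-Down Method: sort the weighted values of a
# check function and review from the top until the total is no longer
# noticeably affected.

def top_list(records, key, n=15):
    """Return the n records with the largest value of the check function."""
    return sorted(records, key=key, reverse=True)[:n]

def weighted_change(r):
    return r["weight"] * (r["current"] - r["previous"])

def weighted_contribution(r):
    return r["weight"] * r["current"]

records = [
    {"id": "0017", "weight": 5.0, "current": 900.0, "previous": 300.0},
    {"id": "0042", "weight": 2.0, "current": 210.0, "previous": 200.0},
    {"id": "0108", "weight": 8.0, "current": 95.0, "previous": 90.0},
]

total = sum(weighted_contribution(r) for r in records)
print("Total:", total)
for r in top_list(records, key=weighted_change):
    print(r["id"], r["weight"], weighted_change(r), weighted_contribution(r))
# The reviewer would inspect the top record(s), correct any errors, and stop
# as soon as further corrections no longer change the total noticeably.
```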

2.2 Experiences

A study expressly designed to compare the Top-Down Method with a corresponding micro-editing method has not yet been carried out. However, the implementation of the Top-Down Method in the DOS processing system has made such an evaluation study unnecessary.

In 1985 a micro-editing procedure was developed with the intention that it should be the editing procedure for DOS. When the system was run for the first time it produced so many error messages that the subject matter specialists realized that they had neither the resources nor the energy to handle them, especially as they knew by experience that only a small percentage of the flagged data really were marred by detectable errors. They had constructed the checks on the basis of all the experience and subject matter knowledge they had gained during years of work with the survey.

The procedure was flexible, user-friendly and easy to fit to the data to be edited. Yet the procedure did not work.

The Top-Down procedure, which had been developed as a complementary or reserve procedure, had to be taken into use at once. The staff is very satisfied with it. It is continuously providing the staff with new information on the subject matter and problems of the survey.

Since the first processing with the macro-editing method, the number of records for manual review has decreased slowly. The subject-matter statisticians have become convinced that there is no need for editing on the industry level. The Top-Down lists are now only produced at the total manufacturing industry level. Though there still seems to be a certain amount of over-editing, it is doubtless the most rational editing procedure at Statistics Sweden.

According to Anderson (1989a) the method is also considered as the most efficient out-put editing method in use at the Australian Bureau of Statistics.

2.3 Conclusion

We have shown that the Top-Down Method can be used as an editing method during the processing of a survey without losses in quality and timeliness. The method can reduce the verifying work by 50-75 per cent. The subject-matter clerks are very satisfied because they feel in control of the editing task and can see the effects of their work.

The method should not be applied to more than ten variables of a survey at the same time.

3. The Hidiroglou-Berthelot Method (Statistical Edits)

The Hidiroglou-Berthelot Method (the HB-Method) is described as a micro-editing method in more detail in Hidiroglou et al. (1986). The method is a ratio check inspired by Tukey's Exploratory Data Analysis (EDA) methods (described in more detail in Tukey (1977)), and in the paper it is considered a solution to some problems connected with the traditional ratio-check method. It is in use at Statistics Canada and there known as "Statistical Edits".

As a macro-editing method it is reported in Davila (1989). At Statistics Sweden the HB-Method has been studied on both DOS and SEW data. Only the DOS study has been reported in English, in Davila (1989).

3.1 Description of the method

The HB-Method is a ratio method, for which the bounds are automatically calculated from the data to be edited. The method uses the robust parameters median, quartiles (Q_i) and interquartile distances (D_{R,Qi}) instead of the mean and standard deviation, to prevent the bounds from being influenced by single outliers. Then the lower (l) and the upper (u) bounds would be:

l = R_MEDIAN − k * D_{R,Q1}
u = R_MEDIAN + k * D_{R,Q3}

However, such a straightforward application of the ratio method has two drawbacks,

i) the outliers on the left tail may be difficult to detect,
ii) the method does not take into account that the variability of ratios for small businesses is larger than the variability for large businesses.

The HB-Method solves these drawbacks by a symmetric transformation followed by a size transformation.

The symmetric transformation

S_i = 1 − R_MEDIAN / R_i        if 0 < R_i < R_MEDIAN
S_i = R_i / R_MEDIAN − 1        if R_i ≥ R_MEDIAN

The size transformation

E_i = S_i * ( MAX ( X_i(t), X_i(t+1) ) )^U ,      0 ≤ U ≤ 1

E_Q1 and E_Q3 are the first and third quartiles of the transformed values E_i,

D_Q1 = MAX ( E_MEDIAN − E_Q1 , |A * E_MEDIAN| )
D_Q3 = MAX ( E_Q3 − E_MEDIAN , |A * E_MEDIAN| )

which gives the lower and upper limits of the checks:

l = E_MEDIAN − C * D_Q1
u = E_MEDIAN + C * D_Q3

A is considered a constant, equal to 0.05, which means that there are only two parameters, U and C, which have to be set in advance to get the method to run. The real reason behind the symmetric transformation is to get rid of one parameter. We have found that the parameters are not very sensitive. The same values can be used for many variables of a survey.
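The transformations and limits can be expressed directly in code. The sketch below assumes weighted value pairs for two occasions and a simple interpolated quartile definition; the quartile convention and the parameter values shown are illustrative assumptions, not those of the production implementations.

```python
# Sketch of the Hidiroglou-Berthelot (HB) bounds.  x_prev and x_curr hold the
# weighted keyed-in values for two occasions; A, U and C as in the text.

def quantile(sorted_values, q):
    """Simple quantile with linear interpolation between closest ranks."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = int(pos), min(int(pos) + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def hb_bounds(x_prev, x_curr, A=0.05, U=0.5, C=4.0):
    ratios = [c / p for p, c in zip(x_prev, x_curr)]
    r_med = quantile(sorted(ratios), 0.5)
    # Symmetric transformation.
    s = [1 - r_med / r if r < r_med else r / r_med - 1 for r in ratios]
    # Size transformation.
    e = [s_i * max(p, c) ** U for s_i, p, c in zip(s, x_prev, x_curr)]
    e_sorted = sorted(e)
    e_med = quantile(e_sorted, 0.5)
    d_q1 = max(e_med - quantile(e_sorted, 0.25), abs(A * e_med))
    d_q3 = max(quantile(e_sorted, 0.75) - e_med, abs(A * e_med))
    lower, upper = e_med - C * d_q1, e_med + C * d_q3
    flagged = [i for i, e_i in enumerate(e) if not lower <= e_i <= upper]
    return (lower, upper), flagged

# Hypothetical weighted values for ten units on two occasions.
x_prev = [100, 120, 80, 150, 95, 110, 130, 90, 70, 105]
x_curr = [105, 118, 82, 450, 96, 108, 133, 91, 69, 104]
print(hb_bounds(x_prev, x_curr))
```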

Figure 1 may help explain the transformations and the method. The figure has been produced by the prototype of the Box Method for the SEW data. The acceptance limits have been calculated for a few values of the parameters U and C. This approach can be applied when implementing the HB-Method as an operation in the production process. In that way the acceptance bounds would be completely determined by the data to be edited.

To make the method a macro-editing method we only had to inflate the keyed-in values of the variable X, in principle by the inverse of the sampling fraction.

3.2 The study on the Survey of Delivery and Orderbook Situation (DOS)

In the study, the keyed-in data of January 1988 were used together with the final data of December 1987. A change file was formed from the January 1988 data edited with the Top-Down Method and the original keyed-in data of the same month. The HB-Method was tuned and evaluated against the errors (changes) found by the current editing procedure, the Top-Down procedure, i.e. the data on the change file. The change file is shown in Table 4 in the annex.

Table 5 in the annex shows the impact of the changes found by the HB-Method.

Table 6 in the annex shows the impact on the other variables edited by the HB-Method, using the values of the parameters C and U which were chosen for the variable "deliveries to the domestic market".

3.3 The study on the SEW

A study designed almost exactly like the one reported above has been carried out by Statistics Sweden on data of the SEW. There the method was compared with the current editing procedure (a traditional one) and with the Aggregate Method, which is not yet in use for that survey. The HB-Method was found to be superior to the current procedure and equal to or better than the Aggregate Method. This study is reported in Swedish only.

4. The Box-Plot Method

In the Box-Plot Method, distributions of the check functions of the weighted keyed-in values are displayed as box-plots. The acceptance intervals for the checks are then set by the staff on the basis of these graphs and put into the regular error-detecting program. According to Anderson (1989a) and the studies on the Aggregate Method reported in this paper, the definition of extreme outliers may serve as a guideline for efficient limits. The only difference from the method reported in Anderson (1986) is that the values should be weighted by (approximately) the inflation factor (according to the sample design).

The concept of Box-Plots was introduced by Tukey (1977). The Box-Plot Method as a micro-editing method is reported in Anderson (1989b). It is suggested as a macro-editing method in the discussion on the Aggregate method.

Anderson (1989a) reports an experiment carried out on data from the Australian Bureau of Statistics (ABS) survey "Average Weekly Earnings" (AWE). ABS extended the bounds of every ratio check used in that survey to (Q1-3*IQR, Q3+3*IQR), where Q1, Q3 and IQR are respectively the first and the third quartile and the interquartile range.
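In code, the extended bounds amount to a few lines. The sketch below is only an illustration: it computes the extreme-outlier limits for an arbitrary vector of check values (for instance, the ratios of the weighted values of two consecutive periods), and the quartile convention is an assumption.

import numpy as np

def extreme_outlier_bounds(check_values):
    """(Q1 - 3*IQR, Q3 + 3*IQR) for one check function."""
    q1, q3 = np.percentile(check_values, [25, 75])
    iqr = q3 - q1
    return q1 - 3.0 * iqr, q3 + 3.0 * iqr

def extreme_outliers(check_values):
    """Boolean mask of the values left for manual review."""
    values = np.asarray(check_values, dtype=float)
    lower, upper = extreme_outlier_bounds(values)
    return (values < lower) | (values > upper)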

The study indicates that 75 per cent of the resources for the manual reviewing of flagged data from the whole editing process could be saved by limiting the manual review to "extreme outliers". In Anderson (1989a) the same evaluation method as in the studies reported above is used. The remaining errors in the data had no significant effect on the estimates.

In the later version of the report, Anderson (1989b), it is suggested that the lower and upper bounds of the checks should be set on the basis of a manual analysis of box-plots of the check functions. The bounds can then be modified by taking into account outliers near the bounds for extreme outliers. Above all, the survey staff get full control of, and responsibility for, the data editing.

5. The Box Method

The Box Method is a graphical macro-editing method developed at Statistics Sweden.

The basic principles are to utilize computer graphics to visualize the distribution of the check function of the weighted data and the interactivity of a computer to get indications of when to stop the manual verifying work. The method may be considered as a combination of a generalized Box-Plot method and the Top-Down Method.

The keyed-in data are weighted and then put into the check function. Any mathematical expression may be used as a check function. The values of the function are plotted on the screen, and acceptance regions of any shape can also be provided. The reviewer draws a box around the observations he wants to review. The screen then displays, for the data points inside the box, those items of the corresponding records that have been selected in advance. For every check function the user can select the items of the records to be displayed. A change is entered interactively and some data (statistics) on the impact of the change are displayed.
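The selection step can be illustrated without the graphics. The sketch below, with hypothetical field names, simply returns the identities of the records whose weighted check-function points fall inside a rectangular box specified by the reviewer; the interactive drawing, record display and updating of the real system are not reproduced.

def records_in_box(records, x_key, y_key, box):
    """Return the identities of the records whose (x, y) point lies in the box.

    box = (x_min, x_max, y_min, y_max); the records are assumed to carry
    already weighted (inflated) values under the keys x_key and y_key.
    """
    x_min, x_max, y_min, y_max = box
    return [rec["identity"]
            for rec in records
            if x_min <= rec[x_key] <= x_max and y_min <= rec[y_key] <= y_max]

# Example with two hypothetical records: previous and current weighted values.
sample = [{"identity": "A", "prev": 120.0, "curr": 118.0},
          {"identity": "B", "prev": 95.0, "curr": 910.0}]
print(records_in_box(sample, "prev", "curr", (0, 200, 500, 1000)))  # ['B']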

The method may also be used as a tool to find appropriate values of acceptance regions for other editing methods (e.g. the HB-Method).

IV. SUMMARY

1. The choice of the methods

The basic principle of all the reported macro-editing methods is that the acceptance regions are determined solely by the distributions of the received observations of the check functions. The keyed-in values of the variable to be checked are weighted by the inflation factor before they are put into the check function.

In the Aggregate, Box-Plot, HB and Top-Down methods all the values of the check function are sorted by size.

The tails of the distributions are displayed and analyzed in the Aggregate and the Box-Plot methods in order to set the acceptance bounds for the checks. In the HB-Method this setting of the acceptance limits is done automatically.

In the Top-Down and the Box methods the effects of the detected errors on the estimates "determine" how far the manual review should go. In the Top-Down Method the manual review work starts with the extreme values and goes towards the median value, while in the Box Method the records for manual reviewing are selected by the reviewer from a graphical display of the values of the check function. This selection may be supported by guide-lines (acceptance regions) displayed in the same graph.

The choice of method should be based on the number of variables to be edited by the macro-editing method and on how the staff wishes to work.

2. Macro-editing versus micro-editing methods

Macro-editing is not a new concept. It has always been used in data editing, but only as a final check. Another term for it is output checking.

What is new is that such output checking methods can be used in data editing procedures in the same way as micro-editing methods and that they have proved to be much more efficient than traditionally applied micro-editing procedures. The studies reported here reduce the work by 35-80 per cent.

Macro-editing methods of the type described in this paper may be considered a statistical way of providing micro-editing checks with efficient acceptance limits. The limits are based only on the data to be edited. The methods bring a kind of priority thinking to the verifying work, and data are edited according to their impact on the estimates. The macro-editing methods solve the general problem inherent in micro-editing methods, i.e. that they produce too many error signals without giving any guidance as to how the resources of the verifying work are to be allocated. We have seen that with micro-editing procedures even very large errors are not always detected, due to the large number of flagged data.

However, there is no limit to the number of cases that can be selected for a manual review. The reviewer can select all the cases he deems necessary. The difference from micro-editing methods is that the cases are selected in priority order, i.e. according to the impact they may have on the estimates. The selection is mainly done by the reviewer, which means that he governs and has full responsibility for the whole scrutinizing work. In micro-editing procedures, the reviewer is dominated by the computer and cannot see the effects of his work.

Both procedures focus on randomly appearing negligence errors and utilize the same principle to point out outliers or extreme observations. Micro-editing procedures flag data by criteria fixed in advance, based on historical data, while macro-editing procedures focus on the data which, at that very moment and relative to the estimates, are the most extreme ones.

Systematic errors, e.g. when many respondents misunderstand a question in the same way or deliberately give wrong answers, cannot (in principle) be detected by either macro-editing or micro-editing methods. This kind of error has to be detected by other types of methods. Below, a few references are given to methods which focus on misunderstanding errors.

In Mazur (1990), the Response Analysis Survey approach is presented, which implies that in an on-going survey, a special survey on how the respondents answer certain questions is conducted by a self-response touch-tone technique. By this technique, errors are detected which cannot be found by traditional editing methods, which reduces the bias in the estimates.

In Werking et al. (1988), a method to find so-called in-liers in survey data is presented. In repetitive surveys there are respondents who report the same figures every period, even to questions which certainly require the answers to change between periods. They are termed in-liers because they always lie between the bounds of traditional edits. In these experiments it is found that such in-liers may cause considerable bias in the estimates.

What we have proved is that the kind of macro-editing methods outlined here certainly provide a more efficient way than traditional micro-editing methods of reaching the same "quality" standard, and that they may release resources for editing misunderstanding errors.

ANNEX

THE AGGREGATE METHOD

Study No. 1 on the SEW data

TABLE 1: Number of aggregates by the total relative difference of the estimates in per cent.

DIFFERENCE        WORKERS   HOURS   PAY-ROLL   WAGES/HOUR
0 < - < 0.05         4         1        6           6
0.1 - 0.4            4         5        5           8
0.5 - 0.9            1         2        4           2
2.5                  1         0        0           0

Study No. 2 on the SEW data

TABLE 2: Number of aggregates by the total relative difference in per cent of the estimates. The figures within parentheses show the outcome of the first experiment of the same study.

DIFFERENCE        WORKERS   HOURS    PAY-ROLL   WAGES/HOUR
0 < - < 0.05      4 (4)     3 (2)    13 (12)    10 (8)
0.1 - 0.4         3 (3)     6 (6)     4 (4)      8 (8)
0.5 - 0.9         1 (1)     1 (1)     6 (5)      3 (3)
1.0 - 1.9         1 (1)     2 (1)                1 (1)
2.0 - 2.9
3.0 - 3.9           (1)       (1)
4.0 - 4.9                             (1)

Study No. 3 on the SEW data

TABLE 3: Number of aggregates by the total relative difference in per cent of the estimates. The figures within parentheses show the outcome of study no. 2 (see TABLE 2).

DIFFERENCE        WORKERS   HOURS     PAY-ROLL   WAGES/HOUR
0                 72 (78)   45 (77)   38 (66)    36 (68)
0.0 - 0.1          1 (4)     6 (2)     6 (12)    15 (8)
0.1 - 0.4         10 (3)    10 (6)    14 (4)     18 (8)
0.5 - 0.9          3 (1)     6 (1)     6 (5)      6 (3)
1.0 - 1.9          0 (1)     9 (1)    15          7 (1)
2.0 - 2.9          1         3         2          3
3.0 - 3.9          1 (1)     2 (1)     1          1
4.0 - 4.9          0         2         2 (1)      1
> 4.9              0         5         4          1

THE HIDIROGLOU-BERTHELOT METHOD

The Study on the DOS Data

TABLE 4 Change-file of the variable "domestic deliveries" (SEK 1000's)

OBS   DECEMBER    JANUARY OLD   JANUARY NEW   CHANGE
1     14648285    7403131       7403          -7395728
2     1560        1605000       1605          -1603395
3     1032        973693        974           -972719
4     1500        925000        925           -924075
5     1302        902805        903           -901902
6     962         593536        594           -592942
7     1926        501675        502           -501173
8     8473        38010         25328         -12682
9     110893      107891        101520        -6371
10    31745       42201         37201         -5000
11    5000        8300          5200          -3100
12    21930       19013         16730         -2283
13    3465        3084          1007          -2077
14    2879        5495          3650          -1845
15    22839       22400         20977         -1423
16    7344        11758         10840         -918
17    1651        1601          1076          -525
18    4072        4061          3691          -370
19    127         262           114           -148
20    8839        1200          1130          -70
21    434         480           462           -18
22    831         664           649           -15
23    11990       9790          9780          -10
24    4834        4404          4398          -6
25    301         116           111           -5
26    1882        4995          4994          -1
27    6245        5658          5659          1
28    274190      271455        271456        1
29    47500       38509         38539         30
30    148         35            80            45
31    1046        1860          1936          76
32    14551       14000         14169         169
33    5383        463           1032          569
34    20400       2500          3600          1100
35    173860      165000        167261        2261
36    6180        1702          4131          2429
37    6000        6000          14300         8300
38    294600      54            54000         53946

The sum of all changes: 12 997 728 (SEK 1000's)

TABLE 5 Impact of changes found by the HB-Method.

Number of changes found   Accumulated sum of changes   Accumulated sum relative to the sum of all changes
 7                         5 550 152                   42.7 %
 8                         5 562 834                   42.8 %
 9                        12 958 565                   99.7 %
10                        12 959 665                   99.71 %
11                        12 960 234                   99.71 %
12                        12 960 304                   99.71 %
13                        12 960 305                   99.71 %
14                        12 962 566                   99.73 %
15                        12 964 995                   99.75 %
16                        12 966 840                   99.76 %

TABLE 6: Impact on the other variables edited by the HB-Method with the values of the parameters C and U which were used for the variable "deliveries to the domestic market".

VARIABLE                  DELIVERIES                 ORDERBOOK                  STOCKS
                          Domestic     Foreign       Domestic     Foreign       Domestic     Foreign
Number of changes         38           19            109          72            31           30
Number of HB changes      9            4             6            6             4            4
Number of flags           28           25            30           35            12           10
Sum of all changes        12 997 728   1 157 332     9 080 144    825 626       4 316 568    6 096 551
Sum of HB changes         12 958 562   1 029 825     8 701 388    485 480       4 209 380    5 856 540
Relative to the sum
of all changes            99.7 %       88.98 %       95.82 %      58.80 %       97.52 %      96.06 %

MACRO-EDITING - THE HIDIROGLOU-BERTHELOT METHOD (STATISTICAL EDITS)

by Eiwor Hoglund Davila Statistics Sweden

Abstract: The intention of this report is to evaluate a method presented by Hidiroglou and Berthelot in 1986. This heuristic method has been studied using data from the Swedish Delivery and Orderbook Survey. Although this method is rather complicated, it proved to be an efficient tool for detecting divergent observations in large periodical surveys.

I. INTRODUCTION

The Hidiroglou-Berthelot method has been studied using data from the Swedish Delivery and Orderbook Survey. During 1987 and 1988, the current method of editing was subjected to a study in which both the edited and the unedited data were kept. This allowed the parameters of the Hidiroglou-Berthelot method to be estimated with the help of statistical methodology. The Hidiroglou-Berthelot method could not be adapted so that all observations that had been changed in the regular editing process were found. However, it was shown to be able to identify the most important of these observations.

A general problem in all surveys is the appearance of incorrect or divergent values among the observations. There are many different reasons for such values, e.g. a misunderstanding of the questions or of the purpose of the questionnaire, faults committed at data entry, or inaccurate or deliberately erroneous answers. The reason might even be that a correct value differs noticeably from the rest of the observations. Consequently, it is desirable to find good methods for detecting divergent observations without spending too much effort and time. The rectification of erroneous values improves the quality of the data and of course strengthens the credibility of any estimate produced.

The majority of the business surveys at Statistics Sweden are of a periodical type, for example the monthly Delivery and Orderbook Survey. The data are usually collected through mail questionnaires, and different test procedures for detecting errors and diverging observations are then applied, such as matchings with registers or checks on relationships between different variables and periods. Unfortunately, this consumes a large part of the survey budget - apart from the fact that the data are often overedited. Efforts to find and implement more efficient methods for statistical editing are therefore justified.

The intention of this report is to evaluate a method presented by Hidiroglou and Berthelot (1986). In Canada, the Hidiroglou-Berthelot method, below referred to as the H&B method, has been adapted to several surveys, e.g. the Delivery, Stock and Order Survey described by Lalande (1988), the Current Shipment, Inventories and Orders survey described by Tambay (1986) and the Wholesale-Retail survey described by Berthelot (1985).

II. THE HIDIROGLOU-BERTHELOT EDIT

Numerous methods have been proposed for detecting divergent observations in large periodical surveys. Some suggest that the problem should be treated as a hypothesis-testing problem, either with or without the assumption of a certain distribution for the data. Other methods use ratios of current period data to previous period data and set upper and lower bounds according to some rule which may depend on the distribution of the ratios of the data. Observations outside the bounds are considered to be divergent. Usually, the methods try to make use of the distribution of the ratios in the construction of bounds around the mean or the median value either using the standard deviation or the quartiles. Most of the suggested methods have some drawbacks, mainly because the variables in business-surveys usually exhibit very skewed distributions.

The Hidiroglou-Berthelot edit examined in this paper is a heuristic method that has been developed using parts of several other methods. By using information provided by the data itself, the intention is to identify all big changes, regardless of whether they are increases or decreases. Given data for a variable from two consecutive periods,

( X_i(t) , X_i(t+1) ) ,   i = 1, 2, ..., n

the individual relative change for each element is defined as:

R_i = X_i(t+1) / X_i(t)

H&B claim that, to be able to find and treat both increases and decreases in the same way, R_i has to be transformed in the following manner:

S_i = 1 - R_median / R_i ,      0 < R_i < R_median
S_i = R_i / R_median - 1 ,      R_i >= R_median

where R_median is the median of the R_i ratios.

Half of the S_i's are less than zero and the other half are greater than zero.

The transformation does not provide a symmetric distribution of the observations; according to H&B, however, it ensures an equally good detection of divergent observations in both tails of the distribution. This is perhaps more obvious if S_i is rewritten as:

S_i = (R_i - R_median) / R_i ,        0 < R_i < R_median
S_i = (R_i - R_median) / R_median ,   R_i >= R_median

To make use of the magnitude of the observations, a second transformation is performed:

E_i = S_i * { MAX( x_i(t) , x_i(t+1) ) }^U

U is proposed by H&B to be a value between 0 and 1.

The E transformation is a rescaling of the S's that keeps the order and sign of the elements. It makes it possible to put more importance on a relatively small change in a "large" element than on a relatively large change in a "small" element. The choice of the U-value governs the importance associated with the magnitude of the data: U=1 gives full importance to the size of the element x_i and U=0 gives no importance at all to its size. In the latter case, E_i = S_i. The effects of different choices of U are illustrated in Example 1.

EXAMPLE 1

In this example, R_median = 1.25 and A = 0.05.

x_i(t)    x_i(t+1)    R_i        S_i       E_i (U=0.1)    E_i (U=0.4)     E_i (U=0.9)
1         5           5          3         3.52           5.71            12.77
10        5           0.5        -1.5      -1.89          -3.77           -11.91
100       5           0.05       -24       -38.04         -151.43         -1514.30
1000      5           0.005      -249      -496.82        -3946.38        -124795.62
10000     5           0.0005     -2499     -6277.20       -99486.98       -9948698.19
100000    5           0.00005    -24999    -79053.78      -2499900.00     -790537792.47
5         1           0.2        -5.25     -6.17          -9.99           -22.35
5         10          2          0.6       0.76           1.51            4.77
5         100         20         15        23.77          94.64           946.44
5         1000        200        159       317.25         2519.98         79688.77
5         10000       2000       1599      4016.51        63657.34        6365733.66
5         100000      20000      15999     50593.28       1599900.00      505932802.98
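The transformations of Example 1 can be reproduced with a few lines of code. The following minimal sketch assumes the twelve value pairs of the table above and fixes R_median at 1.25, as stated in the example.

# Reproduce the S and E transformations of Example 1 for three choices of U.
pairs = [(1, 5), (10, 5), (100, 5), (1000, 5), (10000, 5), (100000, 5),
         (5, 1), (5, 10), (5, 100), (5, 1000), (5, 10000), (5, 100000)]
R_MEDIAN = 1.25  # as stated in the example

for x_t, x_t1 in pairs:
    r = x_t1 / x_t
    s = 1 - R_MEDIAN / r if r < R_MEDIAN else r / R_MEDIAN - 1
    e = {u: s * max(x_t, x_t1) ** u for u in (0.1, 0.4, 0.9)}
    print(f"R={r:g}  S={s:g}  "
          f"E(0.1)={e[0.1]:.2f}  E(0.4)={e[0.4]:.2f}  E(0.9)={e[0.9]:.2f}")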

The example shows that negative E_i's (and S_i's) indicate a decrease and positive E_i's (and S_i's) an increase. The greater U becomes, the greater the dispersion of E will be. The E_i's are distributed around zero, and those E_i's that are too small or too big are considered as possible outliers. In this context, the H&B definition of an outlier is an observation "whose trend for the current period to a previous period, for a given variable of the element vector X(t), differs significantly from the corresponding overall trend of other observations belonging to the same subset of the population". H&B construct upper and lower limits for an interval around E by means of the following quantities:

D_Q1 = MAX{ E_median - E_Q1 , |A * E_median| }

D_Q3 = MAX{ E_Q3 - E_median , |A * E_median| }

A is an arbitrary value suggested by H&B to be 0.05. E_median, E_Q1 and E_Q3 are the median and the first and third quartiles of the transformation E. The |A * E_median| term is a protection against flagging too many non-outliers when the E_i's are clustered around a single value with only a few deviations, that is, when E_median - E_Q1 or E_Q3 - E_median is less than |A * E_median|. In Example 1 above, with U=0.4 and A=0.05, E_median = -1.13, which means that if E_Q1 is in the interval [-1.1865, -1.13], then E_median - E_Q1 < |A * E_median| = 0.0565 and thus D_Q1 = 0.0565. All values outside the interval {E_median - C * D_Q1 , E_median + C * D_Q3}, where C is a constant, are treated as outliers. Increasing the value of C gives a wider interval and a lower number of outliers. The point in using the quartiles instead of standard deviations is to avoid too much influence from the outliers.

EXAMPLE 2

Using the values of Example 1 again, with U=0.4 and A=0.05 gives

E_median = (-3.77 + 1.51)/2 = -1.13

E_Q1 = -3946.38

E_Q3 = 2519.98

D_Q1 = -1.13 - (-3946.38) = 3945.25

D_Q3 = 2519.98 - (-1.13) = 2521.11

C=10 gives the interval ( -1.13 - 10 * 3945.25 , -1.13 + 10 * 2521.11) = ( -39453.63 , 25209.97).

In Example 1, two observations in each tail would be classified as outliers. If C=30, the interval is (-118358.63, 75632.17) and only one observation in each tail would be identified as a possible outlier.

III. APPLICATION OF H&B APPROACH

In the application of the H&B method, three parameters, A, U and C, had to be estimated. H&B do not give any indications about the choice of the parameters other than that A should be very small and 0 <= U <= 1. A large number of combinations of parameter values had to be tried in order to see what happened to the material. The material used for the study contained both the raw-data set and the final data set edited according to the currently used method. Each element whose value had been changed was considered to be an outlier. Note that the meaning of the word outlier here is not exactly the same as in the definition given by H&B. The data could thus be separated into two populations, one population of outliers and one population of non-outliers. This was exploited when estimating the parameters. In fact, the problem was treated as a classification problem in the sense of Anderson, T.W. (1958). The attention was focused on finding some combination of the parameters A, U and C that would minimize the probability of misclassification of an element. For that purpose, the following points were defined:

(i) As many outliers as possible should be correctly classified.
(ii) As few non-outliers as possible should be misclassified.

These two points contradict each other and a loss function would have been useful. Unfortunately, this was not possible since no information about the consequences of misclassification was available at the time of writing this report. A third aspect of interest was that (iii) the identified outliers ought to have both a large impact on the estimates and a monthly change that clearly diverges from the previous month. The method of this paper considers (i) and (ii) in a first stage and point (iii) in a second stage. A grid of different combinations of the parameters U and C was run through. Parameter A was fixed at 0.05 (see sections 3.1 and 4.1). For each combination, the edit identified a number of observations as possible outliers, and the ratio (number of outliers correctly identified by H&B)/(total number of outliers) gave a measure that was used when evaluating the method. This measure can be regarded as the empirical probability of "good" classification = 1 - Pr(misclassification). After all the combinations of U and C had been tried, one was chosen for the examination of the effect on the other variables.
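The tuning described above can be sketched as a simple grid search. The function below is only an illustration: it assumes a flagging routine with the signature hb_flags(x_prev, x_curr, U=..., C=...) (for instance, an implementation of the H&B edit) and a boolean vector is_outlier marking the observations that the regular editing actually changed; the grid values follow Table 2 below, but the routine itself is not that of the original study.

import numpy as np

def tune_uc(x_prev, x_curr, is_outlier, hb_flags,
            u_grid=np.arange(0.0, 1.01, 0.1), c_grid=range(15, 49, 2)):
    """For every (U, C) pair, count correctly identified outliers and total flags."""
    results = {}
    for u in u_grid:
        for c in c_grid:
            flagged = np.asarray(hb_flags(x_prev, x_curr, U=u, C=c))
            hits = int(np.sum(flagged & np.asarray(is_outlier)))
            results[(round(float(u), 1), int(c))] = (hits, int(np.sum(flagged)))
    return results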

IV. DATA AND RESULTS

The basic idea in the current editing method for the Delivery and Orderbook Survey at Statistics Sweden is to keep on checking and changing values for as long as there is an effect on the estimates. To be more precise: the values are compared to their corresponding values of the previous month by means of the ratio. The 15 largest relative changes in each tail are picked out to be checked and possibly corrected. Finally, the estimates are calculated. This procedure is repeated over and over until the estimates are considered to be stable. The method is described in more detail in Granquist, 1987-04-09: Macro-editing - The Top-Down Method.

The data used for estimating the parameters of the H&B edit came from the monthly Delivery and Orderbook survey mentioned earlier. There were six variables of interest included in the raw data file: Domestic Delivery, Export Delivery, Domestic Order, Export Order, Domestic Stock and Export Stock. The variable chosen for the adaptation of the parameters, Domestic Delivery, was the one having the most non-zero observations left after excluding missing values, since such observations have to be taken care of separately. The data used in the H&B method were the original values reported in the month of January 1988 and the final values of December 1987, where the latter had been checked with the current edit. Comparison between the January 1988 data edited with the currently used method and the corresponding raw data of the same month identified the population of outliers. For the variable "Domestic Delivery", the population of outliers consisted of the 38 observations in Table 1 and the population of non-outliers of 1511 observations (see Figure 1).

TABLE 1: THE POPULATION OF OUTLIERS (DOMESTIC DELIVERY)

OBS   DECEMBER    JANUARY OLD VALUE   JANUARY NEW VALUE   CHANGE
1     14648285    7403131             7403                -7395728
2     1560        1605000             1605                -1603395
3     1032        973693              974                 -972719
4     1500        925000              925                 -924075
5     1302        902805              903                 -901902
6     972         593536              594                 -592942
7     1926        501675              502                 -501173
8     8473        38010               25328               -12682
9     110893      107891              101520              -6371
10    31745       42201               37201               -5000
11    5000        8300                5200                -3100
12    21930       19013               16730               -2283
13    3465        3084                1007                -2077
14    2879        5495                3650                -1845
15    22839       22400               20977               -1423
16    7344        11758               10840               -918
17    1651        1601                1076                -525
18    4072        4061                3691                -370
19    127         262                 114                 -148
20    8839        1200                1130                -70
21    434         480                 462                 -18
22    831         664                 649                 -15
23    11990       9790                9780                -10
24    4834        4404                4398                -6
25    301         116                 111                 -5
26    1882        4995                4994                -1
27    6245        5658                5659                +1
28    274190      271455              271456              +1
29    47500       38509               38539               +30
30    148         35                  80                  +45
31    1046        1860                1936                +76
32    14551       14000               14169               +169
33    5383        463                 1032                +569
34    20400       2500                3600                +1100
35    173860      165000              167261              +2261
36    6180        1702                4131                +2429
37    6000        6000                14300               +8300
38    294600      54                  54000               +53946

FIGURE 1:

Plot of the logarithmed values of the variable Domestic Delivery for the two successive periods. The outliers are marked, as are the outliers found by the H&B edit (U=0.4, C=41).

1. Estimation of the parameters

According to the method described in section 3, several combinations of different values of the parameters A, U and C were tried. Changing A for some fixed value of U and C had no impact at all on the probability of misclassification. In this particular case,

E_median - E_Q1 > |A * E_median|   and

E_Q3 - E_median > |A * E_median|   for small values of A.

Therefore, A was set to 0.05, as suggested by H&B, which simplified the further computations. The remaining parameters U and C were then systematically varied against each other. The results are shown in Table 2.

TABLE 2: RESULTS OF THE VARIATION OF U AND C (number of correctly identified outliers)/(total number of observations classified as outliers by H&B method).

                                                  U
C \ U    0.0     0.1     0.2     0.3     0.4     0.5     0.6     0.7     0.8     0.9     1.0
47      7/21    7/20    7/20    7/22    7/23    9/30    9/32   10/37   10/43   10/51   11/62
45      7/21    7/20    7/20    7/22    7/24    9/31    9/33   10/38   10/43   10/54   11/63
43      7/23    7/21    7/22    7/22    9/28    9/33    9/34   10/39   10/46   10/56   11/64
41      7/24    7/23    7/22    7/25    9/28    9/33    9/34   10/40   10/47   10/58   11/64
39      7/26    7/23    7/22    7/25    9/29    9/33   10/37   10/43   10/47   11/61   11/67
37      7/28    7/25    7/23    7/26    9/30    9/35   10/38   10/43   19/48   11/61   11/73
35      7/28    7/25    7/24    7/27    9/33    9/35   10/39   10/44   10/52   11/64   12/79
33      7/29    7/25    7/26    8/29    9/33   10/38   10/41   10/44   10/53   12/67   13/84
31      7/31    7/27    7/29    8/29    9/36   10/39   10/42   10/47   11/58   13/73   13/85
29      7/31    7/31    7/31    8/31   10/38   10/40   10/43   11/49   11/61   13/75   14/92
27      7/32    7/31    8/32    8/31   10/39   10/41   11/45   11/52   12/67   13/82   15/96
25      7/35    7/34    8/32    8/33   11/45   11/45   11/48   11/57   13/72   13/84   15/98
23      7/37    7/38    8/33   10/41   11/45   11/47   11/56   12/63   13/77   13/89   15/105
21      7/39    9/42    9/38   10/43   11/46   11/51   12/58   12/70   13/81   14/97   16/108
19      8/43    9/43   10/42   10/45   11/50   12/56   12/62   12/76   13/86   15/102  16/119
17      9/48    9/44   10/46   11/49   12/54   12/62   12/70   12/82   14/98   15/110  16/131
15      9/52   10/49   10/52   12/55   12/62   12/67   12/81   13/92   14/105  16/124  16/139

As can be seen, no combination of the parameters could identify more than about 40% of the 38 outliers. An additional number of combinations of U and C, for U > 1 and C > 47, were also tried out, without any markedly different results.

2. The outliers

What kind of outliers were then found? The sum of the absolute values of the changes shown in Table 1 equalled 12 997 728. This was taken as a base for comparison with the results of section 4.1.

TABLE 3: CHANGES REPRESENTED BY THE OUTLIERS IDENTIFIED BY THE H&B EDIT

Sum of all 38 changes = 12 997 728.

# of outliers found by H&B   Sum of changes they represent   % of all changes
 7                            5 550 152                      100 * (5 550 152 / 12 997 728) = 42.70
 8                            5 562 834                      100 * (5 562 834 / 12 997 728) = 42.80
 9                           12 958 565                      100 * (12 958 565 / 12 997 728) = 99.70
10                           12 959 665                      100 * (12 959 665 / 12 997 728) = 99.71
11                           12 960 234                      100 * (12 960 234 / 12 997 728) = 99.71
12                           12 960 304                      100 * (12 960 304 / 12 997 728) = 99.71
13                           12 960 305                      100 * (12 960 305 / 12 997 728) = 99.71
14                           12 962 566                      100 * (12 962 566 / 12 997 728) = 99.73
15                           12 964 995                      100 * (12 964 995 / 12 997 728) = 99.75
16                           12 966 840                      100 * (12 966 840 / 12 997 728) = 99.76

Considering the three aspects outlined in section 3.2, together with a restriction on the number of observations flagged by the H&B edit for a check, the number of combinations diminished considerably. For convenience, combinations given by U=0.4 or 0.5 and a C between 15 and 43 would be preferable. Another point of interest is what the observations flagged by the H&B method looked like.

TABLE 4: THE OBSERVATIONS FLAGGED BY H&B METHOD USING U=0.4 AND C=41

OBS   DECEMBER    JANUARY        OBS   DECEMBER    JANUARY
1*    14648285    7403131        15    22773       103
2*    1560        1605000        16    11434       789
3*    1032        973693         17    16166       1100
4*    1500        925000         18    61536       6040
5*    1302        902805         19    105933      8310
6*    972         593536         20    156455      7568
7*    1926        501675         21    11363       293
8*    8473        38010          22    19849       483
9*    294600      54             23    8412        34
10    657         7063           24    248         3
11    315         4985           25    66730       723
12    5500        250            26    264900      700
13    10396       34             27    18006       1478
14    5382        274            28    15074       663

All of the observations flagged by the H&B edit have been exposed to a relatively large change. With the currently used method, the first 9 of these observations, marked with a `*', as well as 29 others, were checked and changed (see Table 1).

3. Effects on the other survey variables

Using one of the best combinations, U=0.4 and C=41, to check the effect on the other variables, the results in Table 5 were produced.

TABLE 5: EFFECTS ON THE OTHER VARIABLES, U=0.4 AND C=41

VARIABLE                  DOMESTIC     EXPORT       DOMESTIC     EXPORT       DOMESTIC     EXPORT
                          DELIVERY     DELIVERY     ORDER        ORDER        STOCK        STOCK
# OUTLIERS                38           19           109          72           31           30
# OUTLIERS FOUND BY
  H&B METHOD              9            4            6            6            4            4
# OBS FLAGGED BY
  H&B METHOD              28           25           30           35           12           10
SUM ALL CHANGES           12 997 728   1 157 332    9 080 144    825 626      4 316 568    6 096 551
SUM CHANGES FOUND BY
  H&B METHOD              12 958 562   1 029 825    8 701 388    485 480      4 209 380    5 856 540
% OF ALL CHANGES          99.70        88.98        95.82        58.80        97.52        96.06

Looking at the percentages of changes represented by the outliers that were found, the results are, with one exception, in agreement with the current method. Obviously, there are a few observations in each population of outliers that are responsible for the largest changes.

V. CONCLUDING COMMENTS

The purpose of this study was to adapt the Hidiroglou-Berthelot edit to Swedish material. From a theoretical point of view, this method is difficult to understand and rather complicated. This is a disadvantage above all when the method has to be adapted to local conditions or perhaps developed further. However, the Hidiroglou-Berthelot edit succeeds in its objective of finding observations that have been subject to a relatively large change from one period to a succeeding one. Some points that deserve mentioning are:

1) The edit has to be performed only once to identify all suspect observations.
2) The material itself supplies the information needed.
3) There are two possibilities, the parameters U and C, to control the number of observations to be flagged.
4) Little time and effort is spent on checking observations of minor importance.

Points 2 and 3 show the periodical adaptation and flexibility that exist within the limits of the edit.

MACRO-EDITING - THE AGGREGATE METHOD

by Leopold Granquist Statistics Sweden

Abstract: The purpose of the paper is to describe and present some results of a macro-editing method which has been tested on data of the Swedish Monthly Survey on Employment and Wages (MSEW) in mining, quarrying and manufacturing. It is shown that the aggregate macro-editing method in this case considerably reduces manual work without decreasing quality or influencing timeliness.

I. INTRODUCTION

The study was carried out in the form of a simulation of a macro-editing method with a prototype of a complete editing process for the MSEW. Thus the prototype may be considered as an alternative editing procedure to the regular one. The main point of this prototype simulation method is to show that macro-editing methods can furnish the main part of the editing in the production of a survey. Usually, macro-editing is applied at the final stage of the whole survey operation to ensure that there is no single error which may have distorted an estimate. It is shown that the aggregate macro-editing method in this case reduces the amount of manual work to a considerable extent, without losses in quality or timeliness.

II. MSEW SURVEY AND ITS TRADITIONAL EDITING

The MSEW is a sample survey on employment, absences, discharges and wages in mining, quarrying and manufacturing. Data are collected every month from about 3000 establishments belonging to about 2500 enterprises. They report the number of workers on a normal day during the end of the month, the number of working hours for a chosen period, the payroll for those working hours, the number of working days in the month and data about absences, newly recruited employees, discharges and overtime.

The Swedish standard industrial classification of all economic activities (SNI) is applied. It is identical with the 1968 ISIC up to and including the four-digit level and has in addition a two-digit national subclassification.

In the MSEW processing a traditional micro-editing procedure is used, which means that a computer programme is run in batch with checks prepared in advance, pointing out data as "suspicious" or not consistent with other data. The reconciliation is completely manual, but as data entry is carried out interactively, changed data are shown directly on the screen and checked against the editing programme.

This computer-assisted editing is preceded by a manual pre-edit process, aimed at finding out whether the questionnaire can be processed and at facilitating the data entry. However, this pre-editing work is much more comprehensive than necessary.

The data entry is carried out in another department, without any editing at all. The idea of integrating the pre-edit and the data entry operations has not yet been accepted.

The questionnaires are processed in lots on arrival at the office. The last questionnaires are processed interactively. There are four or more editing runs in batch every month. The records which have been updated are processed together with new questionnaires from later runs.

Within the production system there is also a macro-editing process. However, this process, the cell-list editing , is only applied when the time schedule permits it and as a final editing of the micro-edited data.

In this cell-list editing, the estimate, ratio and difference to the estimate of the preceding month for every domain of study are printed out on a so called cell-list, one for every main variable. The lists are then scrutinized manually. If there is a "suspicious" estimate, a search is made for "deviating" records to find out whether there are errors in those records.

III. REVISION OF THE EDITING PROCESS

In our study, we have made revisions of both the micro- and the macro-editing processes within the original system.

1. The micro-editing

Concerning the micro-editing, we have reduced the number of flagged data by about 50%, without any drop in quality.

For this revision we used our interactive editing programme GRUS. We studied every check separately and how it interacted with any other check in the system.

We found that some checks were unnecessary in the sense that they did not discover any errors. It was noted that they were not redundant in the "Fellegi-Holt" sense.

Another finding was that almost every check had too narrow bounds. They had been set according to the safety-first principle. In this case only deviations of up to ±10 per cent from the values of the previous month were accepted.

By studying the impact of every single check and of all combinations of single checks on data from an MSEW survey, we constructed a new, rather well-tuned system. The main difference from the old system was much wider acceptance limits in the checks. In fact they could have had still wider limits, but at the time the subject-matter specialists were not willing to accept any greater changes.

The immediate result of using the new system was a slight rise in quality, as measured by the fact that more errors were found, since twice as much time was available to reconcile each flagged item. We can now state that the time for the manual reconciliation work has been reduced by approximately 50 per cent. With regard to quality, we can only state that it is at the same level as before.

2. The aggregate method

The idea behind the aggregate method was very simple: to use an error-detecting system (in our case EDIT-78) twice. Checks are run first on aggregates and then on the records belonging to the suspicious (flagged) aggregates.

One important feature of the procedure is that the acceptance limits are based on the distribution of the check functions of the data to be edited. This is done by manual analysis of printouts of the tails of the check variable distribution.

These printouts can also be considered as an alternative realization of our editing method, because identifiers are printed out as well. The reason why we use the EDIT-78 programme is that it contains an error-message procedure which prints out in the same message every detected inconsistency and all suspicious data of the same record, which facilitates the manual reconciliation. Above all, we use EDIT-78 because it contains an updating procedure.

3. The first experiment

The procedure was applied to the whole file of the edited and published data of the MSEW for a selected month.

The methods were applied to the following variables: number of working hours, pay-roll, wages per hour and number of workers. The checks for each variable consisted of a combined ratio and difference check against the values of the preceding month, i.e. both the ratio and the difference had to be rejected to indicate the present value of the variable as suspicious.
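To illustrate the two-stage logic, the sketch below runs a combined ratio-and-difference check first on the inflated aggregates and then only on the records of the flagged aggregates. It is a minimal sketch using pandas; the column names, bounds and grouping variable are placeholders, and the error-message and updating facilities of EDIT-78 are not reproduced.

import pandas as pd

def combined_check(curr, prev, ratio_bounds, diff_bounds):
    """Flag only if BOTH the ratio and the difference fall outside their bounds."""
    ratio = curr / prev
    diff = curr - prev
    ratio_out = (ratio < ratio_bounds[0]) | (ratio > ratio_bounds[1])
    diff_out = (diff < diff_bounds[0]) | (diff > diff_bounds[1])
    return ratio_out & diff_out

def aggregate_edit(df, group_col, curr_col, prev_col, weight_col,
                   ratio_bounds=(0.9, 1.1), diff_bounds=(-1000, 1000)):
    data = df.copy()
    data["w_curr"] = data[curr_col] * data[weight_col]
    data["w_prev"] = data[prev_col] * data[weight_col]

    # Stage 1: check the aggregates (e.g. four-digit industry groups)
    agg = data.groupby(group_col)[["w_curr", "w_prev"]].sum()
    agg_flags = combined_check(agg["w_curr"], agg["w_prev"], ratio_bounds, diff_bounds)
    suspicious_groups = agg_flags[agg_flags].index

    # Stage 2: check the individual records of the suspicious aggregates only
    subset = data[data[group_col].isin(suspicious_groups)]
    rec_flags = combined_check(subset["w_curr"], subset["w_prev"], ratio_bounds, diff_bounds)
    return subset[rec_flags]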

This experiment aimed at finding out how methods and computer programmes for testing checking methods and procedures should be constructed. Besides, we also happened to detect two serious errors which had not been found in the processing of the data of that month. This was an indication that micro-editing may not always detect even serious errors.

IV. EVALUATIONS BY SIMULATION STUDIES

1. Simulation on processed data

The simulation study was carried out on unedited data and the evaluation was done against the edited data for the selected month. Thus this evaluation cannot indicate any improvement of the quality. We can only find out whether the new procedure gives equal or lower quality and note differences in the manual reconciliation work.

The editing process of the MSEW surveys cannot consist of this macro-editing method only. There is a need for an additional cleaning of the data for the following reasons:

- to fight those errors which cannot be detected by the macro-editing method;
- to make it possible to carry out the macro-editing.

In our simulation study, this additional cleaning operation consisted of a manual reconciliation of errors found by an error-detection programme dedicated to finding certain types of errors in certain variables. Missing values and errors in the variable "number of working days" were handled by this special programme, which preceded the macro-editing procedure.

In our study, the total number of records flagged by this cleaning and preparatory programme was 161. The number of records in the survey was 2951. The flagged values were reconciled against the edited data file.

In order to test the macro-editing procedure under realistic conditions, the records of the data file were divided into lots exactly as when that MSEW survey was processed. However, the records from the last runs were put together into one editing round (instead of three).

All data were inflated by the inflation factors used in the normal estimating procedure. The records were then sorted and aggregated into four-digit SNI-groups. There are 89 such groups in the MSEW.

For every editing round, the procedure indicated in III.2 was executed. The reconciliation consisted in checking with the edited data. Flagged values from this macro-editing procedure were considered as errors found and revised if the micro-editing had changed the original value. By matching the macro-edited file with the micro-edited file we found out those errors in the production process which had not been detected by our procedure.

The simulation study resulted in 274 flagged data out of the 2951 records. 76 errors were found. (The number of error messages was 190, out of which 57 were corrected). These results should be compared with 435 flagged data, out of which 205 errors were found in the micro-editing process when the MSEW of that month was processed. Thus we got a reduction of 161 flagged values (34 %).

But did this simulated macro-editing detect the most serious errors? This question may be answered by Table 1, which shows for each variable the number of four-digit aggregates (the columns) by the absolute difference in per cent in the estimates (the rows) due to the errors not found by the aggregate method.

Table 1

Difference (%)   Workers   Working hours   Pay-roll   Wages/hour
0.05                4            1             6           6
0.1 - 0.4           4            5             5           8
0.5 - 0.9           1            2             4           2
2.5                 1            0             0           0

2. Simulation during current processing

This simulation study was carried out almost exactly like the one described above. The only difference was that this one was run in parallel with the normal processing of the survey. Thus we eliminated any possibility of being influenced by knowing the results of the micro-editing process when we set the boundaries for the checks of our macro-editing procedure.

However, the same evaluation method had to be applied. Of course, we should have preferred to evaluate the macro-editing method by processing the data entirely concurrently with the normal processing. This might be done in the future if the subject-matter specialists are interested in the procedure.

The cleaning operation resulted in 260 flagged records out of the 2996 totally processed records of that month.

The macro-editing procedure flagged 225 data out of which 50 errors were found. (The number of error messages was 139 out of which 45 records were revised). The micro-editing of the same variables when these data were processed flagged 389 data out of which 110 were revised. Thus the reduction of flagged data compared to the usual process was 164 or 42%.

However, the macro-editing procedure seemed to be less successful than in the previous simulation in detecting the most serious errors found in the normal processing. The results are given in Table 2.

Table 2

Number of aggregates by the absolute difference in % in the estimates due to the errors not found by the aggregate method.

Difference (%)   Workers   Working hours   Pay-roll   Wages/hour
0.05                 4           2             12          8
0.1 - 0.4            3           6              4          8
0.5 - 0.9            1           1              5          3
1.0 - 1.9            1           1              0          1
2.0 - 2.9            0           0              0          0
3.0 - 3.9            1           1              0          0
4.0 - 4.9            0           0              1          0

This simulation study was carefully analyzed. First, we investigated those errors with an impact on the aggregates of more than 2% which were discovered by the usual editing but not by our simulated process. There were two errors which were accepted by the ratio checks but not by the difference checks and which were at the border of the acceptance regions of the ratio checks. A slight change upwards of the lower boundary of the ratio check should have caused these errors to be discovered by our macro-editing.

Then the impact of the boundaries of the checks on the efficiency of the error detection was analyzed. The finding was that the acceptance intervals of the checks should be wider. The strategy had been to set the limits as close to the main body of the data as possible. A better strategy seemed to be the setting of the limits as close as possible to the first outlier.

The findings of the analysis can be summed up as follows. If the simulation had been carried out with a considerably higher upper bound of the acceptance intervals of the ratio and the difference checks, then we would have obtained the following results:

The number of data flagged by the macro-editing checks would have been reduced to 134, which means a 66 per cent reduction of flagged values compared with the corresponding micro-editing of these data.

The quality would not have been affected, as shown in Table 3. The figures within parentheses are the corresponding figures from Table 2.

Table 3

Number of aggregates by the absolute difference in % in the estimate due to the errors not found by the aggregate method.

Difference (%)   Workers   Working hours   Pay-roll   Wages/hour
0.05              4 (4)        3 (2)       13 (12)     10 (8)
0.1 - 0.4         3 (3)        6 (6)        4 (4)       8 (8)
0.5 - 0.9         1 (1)        1 (1)        6 (5)       3 (3)
1.0 - 1.9         1 (1)        2 (1)        0 (0)       1 (1)
2.0 - 2.9         0 (0)        0 (0)        0 (0)       0 (0)
3.0 - 3.9         0 (1)        0 (1)        0 (0)       0 (0)
4.0 - 4.9         0 (0)        0 (0)        0 (1)       0 (0)

V. CONCLUSIONS

On the basis of the two simulations we can state that the macro-editing procedure will lead to a substantial reduction in reconciliation work.

There is no notable loss in quality. It is true that not all errors detected by the micro-editing process are detected by the macro-editing procedure. However, these errors are small and many of them are not found by the ratio checks of the micro-editing system either; they are found by scrutinizing flagged values of other items and checks.

It is quite clear that it is important to learn how to tune the macro-editing procedure and to fit it properly to the micro-editing procedure to get an efficient and well coordinated editing system. This can only be done by gaining experience through using the system in the processing of the MSEW. We have learned that the acceptance intervals of the ratio checks should not necessarily be symmetrical and that they should be rather wide. (The "safety first" principle when defining acceptance regions does not apply to macro-editing methods either.)

There are no timeliness problems with this kind of macro-editing method. It can be applied during the processing of the data under the same conditions as computer-assisted micro-editing methods.

The macro-editing concept is a realistic alternative or complement to micro-editing methods, which reduces the amount of manual reconciliation work to a considerable extent, without losses in quality or timeliness.

MACRO-EDITING - THE TOP-DOWN METHOD

by Leopold Granquist Statistics Sweden

Abstract: The paper gives a brief description of the Delivery and Orderbook Situation survey of the Swedish manufacturing industry and of its editing and imputation. It points out the main features of the applied macro-editing procedure and tries to explain why this procedure is superior to the micro-editing procedure.

I. INTRODUCTION

The purpose of this paper is to give an example of a macro-editing procedure for error detection in individual observations. It focuses on its advantages over the traditional micro-editing procedure. Both procedures are applicable within the production of the monthly survey on the Delivery and Orderbook Situation of the Swedish manufacturing industry (DOS). The paper argues that the two different methods of detecting negligence errors are based on the same assumptions and that the macro-editing procedure (MAEP) is superior to the micro-editing procedure (MIEP).

II. THE DELIVERY AND ORDERBOOK SITUATION SURVEY

The DOS is a monthly sample survey of enterprises. There are about 2000 reporting units, kind-of-activity units, sampled once a year from the Swedish register of enterprises and establishments. It estimates changes in deliveries and the orderbook situation (changes and stocks) every month for both the domestic and the foreign market, for the total Swedish manufacturing industry and for the 38 branches (classified according to the Swedish standard industrial classification of all activities).

The questionnaire is computer-printed. The reported values for the two previous months are preprinted for the six variables; the questionnaire thus contains three columns. If the respondent cannot give data for the calendar month, he has to indicate the period to which the reported data refer.

The entry of the questionnaire data is carried out in three batches every production cycle by professional data-entry staff. The last questionnaires are entered directly interactively and then checked by the editing program, which contains checks of formal errors on the micro level.

The whole production process has recently been thoroughly revised. The new system replaced the old one (from 1970) in 1986.

Within the new system there are, besides the two editing procedures, two automatic correction procedures: the automatic adjustment procedure and the automatic imputation procedure. Since the correction processes are identical irrespective of the choice of the editing procedure, they will be described first.

III. THE AUTOMATIC CORRECTION PROCEDURES

Data are adjusted automatically if the report period differs from the present calendar month. This is performed before editing and thus adjustments will be checked. In the old system, these adjustments were carried out manually.

Imputations are only made for non-response (unit as well as partial non-response). The imputation procedure utilizes data reported by the unit for the six previous months, which are adjusted by means and trends from the branch to which the unit belongs. Imputations form part of the estimation procedure and are neither reported back to the respondents nor used in imputations for succeeding months. They are carried out after editing, and thus they are not checked.
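Purely as an illustration, the sketch below shows one possible reading of such an imputation rule: the unit's average over its six previous reports, scaled by a branch trend (the ratio of the current to the previous branch mean). This is an assumption made for the example only and not necessarily the rule actually used in the DOS system.

def impute(unit_history, branch_mean_curr, branch_mean_prev):
    """Illustrative imputation: unit level from its six previous reports,
    adjusted by an assumed branch trend. Not the documented DOS rule."""
    if not unit_history:
        return None
    unit_level = sum(unit_history) / len(unit_history)
    branch_trend = branch_mean_curr / branch_mean_prev
    return unit_level * branch_trend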

IV. THE MICRO-EDITING PROCEDURE

The MIEP was supposed to be the main editing procedure. However, it was only used for the first few months and will therefore be described very briefly here.

The MIEP is a traditional record-checking procedure. Such procedures are used in practically all the surveys carried out by Statistics Sweden. Data classified as erroneous or suspect according to specified checks are printed on error lists or displayed on the screen. Error messages are then scrutinized manually. Very often they involve telephone calls to the respondents. Corrections are entered interactively and then again checked by the editing rules. However, the number of corrections (detected errors) is relatively small.

The system has an "Okay"-function (key) which allows the approval of original data or changes found to be correct, even though they had not passed all checks.

Validity checks, consistency checks and ratio checks (against the previous month) are used. The acceptance regions are intervals, which can be updated at any time during the editing process.

When the new production system was run for the first time, the MIEP was not very successful. It produced so many error messages that the subject-matter specialists realized that they had neither resources nor energy to handle all error messages, especially since they knew from the experience gained from the old system that only a small percentage of the flagged data indicated detectable errors. They had built up the checks on the basis of all the experience and subject-matter knowledge they had gained during years of work with the survey. The procedure was flexible, user-friendly and easy to fit to the data to be edited. In spite of all that, the procedure did not work. Of course this was a great disappointment.

A basic problem of all traditional record-by-record editing is to identify the observations which contain the most important errors. The MIEP, like all automatic editing programmes built on the same or similar editing principles, does not provide any possibility of assessing the importance of an error when the error messages are handled. Every flagged observation has the same weight and claims about the same amount of resources, irrespective of the importance of the suspected error. Many errors have a negligible impact on the estimates, because their magnitude is small or because they cancel out. But when handling the error messages, the importance of a specific item is not known. There is a need for rules of thumb to provide priorities. However, it is very difficult, maybe even impossible, to construct practical priority rules. The only way to solve problems in such a procedure is:

- to construct checks more sensitive to observations of high relative importance than to observations of low relative importance, and
- to fit the whole system of checks very carefully to the data to be edited.

But even in cases where efforts have been spent to focus the editing on potentially important errors, there are clear indications that too many resources are spent on editing in comparison with the outcome. Indeed, the editing process has been considerably improved, but this is far from sufficient.

V. THE MACRO-EDITING PROCEDURE

The basic idea behind the MAEP is to study the observations which have the greatest impact on estimates.

The MAEP is an interactive menu programme, written in APL. The in-data file is successively built up by the records from the three batches from the data-entry stage and the last received records, which are entered interactively. There are three functions to select the records to be studied, namely:

- the highest positive changes, - the highest negative changes, - the greatest contributions.

These functions can be applied to the total (the whole manufacturing industry) and to every one of the 38 branches. For a selected function and domain of study, the screen shows the 15 records of the in-data file which have the highest value (weighted) of the selected variable, sorted top-down. The fifteen rows of records have the following columns:

- Identity,
- observation,
- expansion factor (weight),
- inflated value (= "observation" x "weight"), and
- sum.

With all the "top 15" records on the screen, the operator can select any of the records to be shown with its entire content and then examine it to see whether an error has been committed. If an error is identified, he can update the record directly on the screen and immediately see the effects.

The record usually loses its position on the "top 15" list, another record enters the list and the sum value is changed.

Scrutinizing of the "top 15" list goes on until the operator can see that further editing will only have negligible effects. All records can theoretically be inspected manually in this manner. In practice, only a few records in each of the three batches entered are entirely scrutinized.
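The selection logic can be sketched in a few lines. The following non-interactive sketch uses hypothetical field names and simply returns a top list with the columns shown above (the overall sum is interpreted here as the total of the inflated values); the interactive updating of the APL program is not reproduced.

def top_list(records, key, descending=True, n=15):
    """Return the n records with the largest (or smallest) weighted value of `key`."""
    rows = [{"identity": rec["identity"],
             "observation": rec[key],
             "weight": rec["weight"],
             "inflated": rec[key] * rec["weight"]}
            for rec in records]
    rows.sort(key=lambda r: r["inflated"], reverse=descending)
    total = sum(r["inflated"] for r in rows)  # interpreted as the "sum" column
    return rows[:n], total

# The three selection functions then correspond to sorting on the change
# variable top-down (highest positive changes), bottom-up (highest negative
# changes, descending=False), and on the level variable itself (greatest
# contributions).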

VI. PROS AND CONS OF THE TWO EDITING PROCEDURES

It should be noted that both procedures (MIEP and MAEP) focus on the same type of errors, namely random negligence errors, and use the same principle of detection, namely pointing out "outliers" (extreme observations). The difference is that the MIEP flags suspicious observations by criteria fixed in advance based on historical data, whilst the MAEP concentrates on the observations which, at that very moment and relative to the estimates, are the most extreme ones. The MIEP always flags a certain percentage of the correct observations, which then also have to be inspected. All flagged observations have to be scrutinized regardless of their impact on the estimates. With the MAEP, it is possible to stop the investigations as soon as the effects on the estimates appear to be negligible.

If the selection criteria are equally good, the MAEP offers the advantage of just flagging observations with important errors.

The staff are very satisfied with the new editing methods. They are continuously being trained on the subject matter and on the production and presentation problems of the survey. The user, with his growing knowledge and skill, focuses his work on the most important errors in the data of the studied period. In traditional editing programs, the user is dominated by the errors and distributions of earlier periods. With the MAEP, the user need not handle a great number of unnecessarily flagged data, and he has the satisfaction of immediately seeing the impact of his work.

DATA EDITING IN A MIXED DBMS ENVIRONMENT

by N.W.P. Cox and D.A. Croot Statistics Canada

Abstract: The article presents two data editing facilities: GEIS for edit and imputation functions and DC2 for data collection and capture functions. GEIS and DC2 are components of the Generalized Survey Function Development (GSFD) project developed by Statistics Canada. The adaptability and functionality of these tools in a mainframe and microcomputer database management environment are particularly highlighted.

I. INTRODUCTION

The 1960s saw the beginning of significant computerization of statistical surveys. The period was characterized by computer systems for surveys being built independently of one another. Systems analysis was done in isolation for each survey and there was no use of structured methodologies. A system was built for a given target computer and often consisted of large programs that read and wrote master files. System design was driven by design of these complex master files. Modularity, when introduced as a system concept, was implemented through definition of subroutines within the master program.

The introduction of statistical subject-matter specialists (methodologists) put emphasis on the development of theories, then statistically sound methods, for all phases of surveys. Specifications to systems would normally cover discrete stages of the survey, hence promoting modularity of survey functions. This led to specialization of mainline programs to some extent. Communication was still carried out through a series of master files and reference files which, in a production environment, were usually tape files.

A statistical organization typically had distinct computer environments for major survey processing areas and systems were built for a specific hardware and software configuration. As separate programs were introduced for specialized survey functions, different computer hardware environments might be targeted for different programs, although any one program would only run in its design environment. The resulting mixed computer environments were plagued by data file conversion routines as one moved data from one machine to another.

This period (the 60's and 70's) saw computer developers concentrate primarily on function, with data management being almost an afterthought, relegated to the realm of keeping an inventory of master files. By the 80's, data was increasingly regarded as a valuable resource in its own right. The issue became one of linking all the data stored in separate master files in order to maximize information retrieval.

Progressively, data base technology has provided powerful data management facilities, isolating data from programs and permitting data sets to be easily and dynamically linked to one another. The informatics area at Statistics Canada has now deemed data base technology to be sufficiently mature that its role in integrating data both within and between survey processing functions can be actively encouraged. A notable and recent example at Statistics Canada of the integrating role subsumed by data base technology is the General Survey Function Development Project.

II. FORMS OF EDITING

Editing has always been a significant part of the survey taking process in national statistical offices. It has been universally perceived as an important step that improves the quality of published information. Since the advent of cheaper and more plentiful computing power in the early sixties, editing has relied progressively less on the instantaneous application of personal knowledge.

The explosion of editing opportunities afforded by technological advance encourages examination of the fundamentals of editing:

- What is editing? And what principle should drive its application?

- When should editing be done to most enhance reliability without disturbing operational work flows?

- How do we use emerging technologies in an appropriate and cost effective manner for the benefit of both respondents and the statistical office?

The editing process usually comprises several different activities forming a general sequence of basic operations:

- An anomalous condition is detected, which is caused by error, misunderstanding or misinterpretation.

- A diagnosis is made to identify the data item or items most likely to have caused the anomaly.

- An action is performed to initiate modification to the identified data item or items.

- The record, or response, is re-edited to confirm acceptability, or the original condition is indicated as acceptable.

There is considerable scope for discretion by the survey designer in determining the nature of the edits to be done and the stage in the survey process at which it is appropriate to do them. However, it is axiomatic that the earlier in the process an error can be detected, the less the cost of its correction is likely to be.

1. Data Entry Editing

Two significant characteristics of data editing at the time of data entry are, first, that we are essentially limited to those edits that can be performed on a single record. Second, depending on the circumstances of data entry, we may well be in direct contact with the respondent, in which case many errors can be referred immediately to the respondent for revised input. The presence or absence of the respondent affects the design of the data correction process, but does not change the situation for the detection of errors, which for data entry is usually confined to intra-record edits.

The types of edit appropriate for data entry time are considered to be as follows:

i) The validation of fields: for example, that numeric fields contain numeric values and date fields contain dates.

- Type: integer value for an integer variable.

- Range: upper and lower bounds may be expressed for numeric variables.

- Format: some types of data are subject to a predefinable pattern; examples include postal codes, telephone numbers, addresses, social insurance numbers, etc.

- Identifier validation. It is usually particularly important to establish quickly the identity of a respondent for a survey; as a result almost all surveys assign a unique identifier to each respondent. It is preferable to do this in as economical and unambiguous a form as possible (e.g. in a machine readable form such as a bar code), but in any event it is almost always beneficial to apply certain checks, for example range checking, checking against a valid list of all identifiers, check-digit verification, etc.

ii) Fitting a data model

A set of edit rules is a model of reality that we can use to structure a survey response set and suppress irrelevant detail. At this stage, we are likely to be dealing also with edits that are inter-field. This model may prevent logical impossibilities, for example someone starting his current employment before his date of birth, or merely detect implausible situations, such as a person 21 years of age earning more than $1 million a year.

Whilst the former category can often impose rules emanating from outside the survey, for example the laws of nature or of accounting practice, the latter can be quite controversial. We may be almost certain that a 21 year old person will not be earning more than $1 million a year, and if the mode of capture is such that access to the respondent is possible, there is a natural desire to correct the "error" at this early stage.
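
The kinds of data entry edits listed above can be summarized in a small sketch. The following C fragment is only illustrative: the field names, the $1 million threshold and the simple sum-of-digits check-digit scheme are assumptions made for the example, not the rules of any particular survey:

    /* A minimal sketch of data entry edits: a range check, a simple
       check-digit verification for a respondent identifier, and an
       inter-field plausibility edit.  Field names and limits are illustrative. */
    #include <stdio.h>

    /* Range edit: integer age between given bounds. */
    static int valid_age(int age) { return age >= 0 && age <= 120; }

    /* Identifier edit: the last digit must equal the sum of the other digits
       modulo 10 (one simple check-digit scheme; real identifiers use
       scheme-specific rules). */
    static int valid_id(long id)
    {
        int check = (int)(id % 10);
        long rest = id / 10;
        int sum = 0;
        while (rest > 0) { sum += (int)(rest % 10); rest /= 10; }
        return sum % 10 == check;
    }

    /* Inter-field plausibility edit: flag very high income at a young age. */
    static int plausible_income(int age, double income)
    {
        return !(age <= 21 && income > 1000000.0);
    }

    int main(void)
    {
        int age = 21;
        double income = 1500000.0;
        long id = 1234560;   /* digits sum to 21, so the check digit should be 1:
                                the identifier 1234560 is therefore invalid */

        printf("age valid:        %s\n", valid_age(age) ? "yes" : "no");
        printf("identifier valid: %s\n", valid_id(id) ? "yes" : "no");
        printf("income plausible: %s\n", plausible_income(age, income) ? "yes" : "no");
        return 0;
    }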

2. Statistical Macro Editing

Since this is the subject of another paper to the JGDE, it will not be dealt with at length here. Statistical or macro editing, viewed in terms of the four stages of editing described at the beginning of this section, differs from record level editing first in the detection of the anomalous condition. Here, it is based upon analysis of an aggregate rather than an individual record. The diagnosis following may be based upon the identification of the particular record, or records, causing the anomaly. The modification action may apply to individual records, or may be applied at the aggregate level. The final step of re-editing to confirm acceptability may again apply to individual records if the modification action was so applied. If modification and re-editing are done only at the aggregate level, the subsequent utility of the micro data base is subject to obvious limits.

Most examples of macro editing to date could be characterized as macro error detection in order to localize the diagnostic action, with, ultimately, correction at the record level (micro correction).

3. Automatic Edit & Imputation

With data entry editing, the possibility always exists to contact the respondent to resolve an edit anomaly: by telephone or using a follow-up form. The threshold between data entry editing and automatic edit and imputation has been defined at Statistics Canada to be that point in the survey process when there will be no further contact with the respondent.

Automatic edit and imputation has been slow to play an important role in survey editing. It has been surmised that a mistrust of automation has been a factor leading to over-editing; hence an inordinately large proportion of response records are rejected and subjected to a costly, time-consuming and potentially subjective review. The outcome of this review often is that most of the records are not in error. A 1985 Statistics Canada survey of users did find a general willingness of subject matter divisions to adopt more sophisticated approaches to imputation, if the software were available.

For an automatic edit and imputation system to become a reality, one first had to have a generalized methodology addressing the question. Since such a methodology would involve complex algorithms and extensive data search strategies, it was also necessary to await developments in computer systems technology. A landmark methodological development was the 1976 paper by Fellegi and Holt (Fellegi and Holt, 1976). This paper espoused a methodology oriented to making minimum change to a data record. All efforts at Statistics Canada to develop generalized edit and imputation systems have incorporated this underlying philosophy of the Fellegi and Holt paper.

The core methodology since 1976 is:

- to specify a set of edits that defines a region5 in a multi-dimensional space within which all valid records would fall;

5 This is strictly true for edits involving continuous quantitative variables, but with qualitative variables 'region' has to be interpreted as an acceptance set of discrete values.

- the set of edits is applied to each data record and if the record does not fall within the defined region, it becomes a candidate6 to have one or more fields imputed;

- an error localization algorithm is applied to determine the fields that are to be imputed;

- a choice from several possible techniques is made to supply new values for the fields to be imputed.
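
A minimal sketch of the first two steps may help to fix ideas. In the C fragment below, a hypothetical set of three linear edits (a two-part balance written as a pair of inequalities, plus a non-negativity condition) defines the acceptance region, and a record failing any edit would become a candidate for imputation; the edits and data are illustrative and are not taken from GEIS:

    /* A hedged sketch of the core idea: linear edits of the form
       a1*x1 + ... + ak*xk <= b define an acceptance region, and a record
       that violates any edit becomes a candidate (recipient) for imputation. */
    #include <stdio.h>

    #define NVAR  3
    #define NEDIT 3

    /* coefficient matrix A and right-hand side b of the edits A*x <= b */
    static const double A[NEDIT][NVAR] = {
        { 1.0,  1.0, -1.0 },   /* part1 + part2 <= total */
        {-1.0, -1.0,  1.0 },   /* total <= part1 + part2 */
        {-1.0,  0.0,  0.0 }    /* part1 >= 0             */
    };
    static const double b[NEDIT] = { 0.0, 0.0, 0.0 };

    /* Returns the number of failed edits; 0 means the record lies in the region. */
    static int failed_edits(const double x[NVAR])
    {
        int fails = 0;
        for (int e = 0; e < NEDIT; e++) {
            double lhs = 0.0;
            for (int v = 0; v < NVAR; v++)
                lhs += A[e][v] * x[v];
            if (lhs > b[e] + 1e-9)             /* small tolerance for rounding */
                fails++;
        }
        return fails;
    }

    int main(void)
    {
        double good[NVAR] = { 40.0, 60.0, 100.0 };   /* parts add up to the total */
        double bad[NVAR]  = { 40.0, 60.0, 180.0 };   /* fails one balance edit    */

        printf("record 1: %d failed edit(s)\n", failed_edits(good));
        printf("record 2: %d failed edit(s)\n", failed_edits(bad));
        return 0;
    }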

A major technique for supplying imputed field values is donor imputation. In this technique, one takes data values from a donor record chosen to be as similar to the recipient record as possible. A consequent implementation issue is the choice of an algorithm for searching for a suitable donor. A simple sequential search may function well for small data sets, but with the more usual application of automatic edit and imputation methods to large data sets, a more efficient search strategy is required.

For qualitative variables, "sequential hot-decking" can be used. In this approach, the recipients and donors are merged. The file is sorted randomly, or randomly within each imputation class. The records are processed one at a time. When a recipient is encountered, the selected donor is the most recent donor that has exact matching values. This can require considerable storage, depending on the number of matching variables and the number of valid values for each one.
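
A minimal sketch of sequential hot-decking is given below in C. The matching variables (region and size class), their numbers of categories and the variable to impute are hypothetical; the table of most recent donor values, one cell per combination of matching values, illustrates the storage requirement mentioned above:

    /* Sequential hot-decking for one qualitative variable: records are
       processed in (already randomized) order and a recipient takes the
       value of the most recent donor with the same values on the matching
       variables.  The variables and codes are illustrative. */
    #include <stdio.h>

    #define MISSING -1

    struct rec { int region; int sizeclass;  /* matching variables  */
                 int activity;               /* variable to impute  */ };

    static void sequential_hot_deck(struct rec *r, int n)
    {
        /* last donor value seen for each (region, sizeclass) cell */
        int last[3][3];
        for (int i = 0; i < 3; i++)
            for (int j = 0; j < 3; j++)
                last[i][j] = MISSING;

        for (int i = 0; i < n; i++) {
            int *cell = &last[r[i].region][r[i].sizeclass];
            if (r[i].activity == MISSING) {
                if (*cell != MISSING)
                    r[i].activity = *cell;   /* impute from most recent donor */
            } else {
                *cell = r[i].activity;       /* record becomes the new donor  */
            }
        }
    }

    int main(void)
    {
        struct rec data[] = {
            {0, 1, 4}, {0, 1, MISSING}, {2, 2, 7}, {0, 1, 5}, {2, 2, MISSING}
        };
        int n = sizeof data / sizeof data[0];

        sequential_hot_deck(data, n);
        for (int i = 0; i < n; i++)
            printf("record %d: activity = %d\n", i, data[i].activity);
        return 0;
    }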

For quantitative data, one searches for the closest donor in terms of distance using the structure of a "k-d tree"3. The tree is built with the set of donor records. Each recipient record traverses the tree. The use of a tree data structure greatly reduces the number of donor records to be considered.

An imputation methodology has to ensure that imputed values satisfy the edits. One possible method (and that employed in the Generalized Edit and Imputation System (GEIS)) is, instead of selecting the single "most similar" donor record, to choose several "most similar" records. If the first record is not capable of donating values that pass the edits, then one considers the second, and so on.
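
The following C sketch illustrates this idea for a single quantitative field. A simple linear scan over the donors stands in for the k-d tree search used in a production system, and the matching field, the edit and the data are illustrative assumptions:

    /* A hedged sketch of donor imputation for one quantitative field: donors
       are tried in order of distance on a matching variable and the closest
       donor whose value lets the recipient pass a simple edit is used. */
    #include <stdio.h>
    #include <math.h>

    struct rec { double size;    /* matching field, reported by every record */
                 double value;   /* field to be imputed on the recipient     */ };

    static int passes_edit(const struct rec *r) { return r->value <= 5.0 * r->size; }

    /* Impute recipient->value from the nearest acceptable donor; returns 1 on success. */
    static int donor_impute(struct rec *recipient, const struct rec *donors, int ndon)
    {
        int used[64] = {0};                     /* sketch: supports up to 64 donors */
        for (int k = 0; k < ndon; k++) {        /* try donors in order of distance  */
            int best = -1;
            double bestdist = 0.0;
            for (int i = 0; i < ndon; i++) {
                double d = fabs(donors[i].size - recipient->size);
                if (!used[i] && (best < 0 || d < bestdist)) { best = i; bestdist = d; }
            }
            used[best] = 1;
            struct rec trial = { recipient->size, donors[best].value };
            if (passes_edit(&trial)) {          /* closest donor satisfying the edit */
                recipient->value = trial.value;
                return 1;
            }
        }
        return 0;                               /* no suitable donor: manual review */
    }

    int main(void)
    {
        struct rec donors[] = { {10.0, 30.0}, {12.0, 70.0}, {11.0, 40.0} };
        struct rec recipient = { 11.5, -1.0 };  /* value is missing */

        if (donor_impute(&recipient, donors, 3))
            printf("imputed value: %.1f\n", recipient.value);
        else
            printf("no suitable donor found\n");
        return 0;
    }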

A final issue to be addressed in any automatic edit and imputation methodology is how to handle recipient records for whom imputed value(s) cannot be found. One approach is to have these records reviewed manually.

The algorithms and information processing techniques necessary to:

- analyze the set of edits to verify that a valid region in a multidimensional space is defined;

- determine the fields to be imputed;

- build a donor search tree;

- retrieve an acceptable donor;

are demanding in terms of machine resources and required the advent of powerful processors, large memory and comprehensive Relational Data Base Management Systems (RDBMS) before practicable, cost-beneficial production systems could be built.

6 Such a record is referred to as a recipient record in the imputation literature.

3 The k-d tree is a generalization of the simple binary tree used for sorting and searching. The k-d tree structure provides an efficient mechanism for examining only those records closest to the query record, thereby greatly reducing the computation required to find the best matches [Friedman et al (1977)].

III. GENERALIZED SURVEY FUNCTION DEVELOPMENT PROJECT

1. Concepts

The General Survey Function Development Project (GSFD) was initiated in 1986 in response to the following developments:

- A decision to redesign the whole complex of business surveys was taken in order to improve quality through better integration of business statistics and to achieve significant savings in the process;

- The many technical developments that have characterized the 80's opened up possibilities for developing general survey processing tools in a comprehensive and integrated fashion;

The first step was to form a team of methodologists and computer specialists, encouraging advances at the same time on both the methodological and system development fronts.

Although initially oriented to business surveys, GSFD products are being developed so that they can be adapted or extended for application to social surveys. The major goals are to provide high leverage to the process of survey design; to reduce the cost of survey and systems design; to reduce the lead time to implement a survey; and to help standardization of survey methodology.

Note that the latter goal leads to an interesting discussion over the benefits of standardization versus flexibility. Although it is desirable to use the general survey software as a motivating tool for standardization, this could have the side effect of forcing certain survey-specific modifications outside the GSFD framework. It is therefore important to develop the system such that the required flexibility is available to the designer without jeopardizing the promotion of standards.

Specific goals are:

- Increase the standardization of survey development and operation, without unduly limiting the scope of applicable methods;

- Facilitate experimentation with methods during the survey design stage;

- Encourage repeated use of successful methods and systems wherever applicable;

- Provide a user-friendly standard interface for each function and for each method of collection;

- Integrate survey functions into a single control environment based on a single logical data base;

- Provide performance and quality measures across survey functions;

- Provide portability across several computer architectures in order to foster the accessibility of all functions from a variety of Statistics Canada clients.

2. Industry Standards

The design objectives of GSFD are such that, as a minimum, GSFD products must be available (or easily made available) on the following industry standard operating systems4: MVS/XA, UNIX5, DOS-OS/2. Furthermore, its underlying software must conform to International and Government of Canada standards: for example, ISO standard SQL/RDBMS for data management, and the C language for 3GL purposes.

3. Components

The GSFD System includes eight major parts (see Figure 1):

- the Frame, which defines the universe and from which the sample is drawn;

- the Relational Data Base, which serves all operational functions and thereby forms the central control mechanism of the design;

- the Design and Specification function, which a survey statistician uses to design the survey;

- the Sampling function;

- the Collection and Capture function;

- the Estimation function;

- the Retrieval and Dissemination function;

- the Utilities function.

4 An exception exists with the Collection and Capture function. In its first implementation this function will operate only in a UNIX environment, since UNIX has been chosen as the operating system to support collection and capture operations both in Statistics Canada's Regional Offices and at Headquarters.

5 UNIX is a proprietary operating system of AT&T Corporation, but in the context of this paper it represents any UNIX type of operating system.

Figure 1: Major components of the GSFD (the Frame; the Sampling, Collection & Capture, Edit & Imputation, Estimation, Retrieval & Dissemination, Design & Specification and Utilities functions; all linked through the central Relational Data Base).

The power and simplicity of the Relational Data Base Management System (RDBMS) that manages the data in the Relational Data Base are key to the GSFD system and it is probable that this ambitious initiative would not have been attempted if this generation of data management software had not been available. The commercial RDBMS product ORACLE was selected in 1987 for the project, partly because of its ability to operate in a wide variety of computing environments.

Survey data editing facilities are provided in two GSFD functions: the Edit and Imputation function that is known by the acronym GEIS; and the Data Collection and Capture function (known as DC2).

GEIS and DC2 are described further in the following two sections.

IV. DC2 FUNCTION

1. Hardware/Software Environment

DC2 is the GSFD component that handles questionnaire preparation and provides corresponding data collection/editing facilities. The decisions made about the practical realization of the DC2 architecture are closely in accordance with the tenets of the GSFD project.

The DC2 software is being developed to run on UNIX6 platforms in a windowing terminal environment. The UNIX platforms will be configured to provide a client/server network running on an Ethernet LAN. The major part of the system will be written in the programming language C. This language's main feature of interest, other than its portability and availability, is the balance between low- and high-level features that allows systems to be efficient and yet portable. Since ORACLE has been chosen as the relational database management system that provides SQL support at Statistics Canada, it will be used within DC2, although its use will be constrained in the following ways:

- standard SQL will be used as the Data Definition and Data Manipulation Language wherever possible;

- ORACLE-specific features will be used only when they provide significant value, and even then these usages will be isolated.

6 The goal is to stay within the mainstream of UNIX evolution to avoid the problems associated with becoming too closely tied to one or a small number of vendors.

PROLOG, which has the advantage of being a well known and relatively standard AI language, has been chosen as the means of specifying procedural edits to DC2.

2. Editing Role

The philosophy employed within the GSFD project is reflected in the DC2 product. DC2 automates all editing processes now employed in the data collection and capture phase. All manual scanning of documents done before capture will be eliminated, through edits and routing instructions defined for the application and applied by the DC2 system. Traditional field-level edits (type, range, format, etc.) are supported, as is comparison of historical data with current data (a form of plausibility verification). Plausibility and accounting edits involving more than one data item will be applied as soon as all the necessary values are available.

Release 2 introduces the functionality of CATI to the system. In particular, this entails the complexities of navigation through a script that is not only acquiring data, but determining the actual structure of the instance. Although still restricted to intra-record edits, the interacting requirements of branching/routing and editing, together with the undesirability of repetitious questioning, make this a significant challenge.

The software is being developed with expandability in mind and therefore, in the long term, more sophisticated edits could be introduced. For example, there could be instances where some form of inter-record processing may be of benefit in capture editing. However, the methodological and operational problems caused by the partial availability of data from other records must first be resolved.

V. GEIS FUNCTION

This section describes the GEIS (Generalized Edit and Imputation System) implementation.

GEIS is a system of generalized programs to edit and impute survey response data. These programs are packaged into six sub-systems (see Figure 2) that are used to support the operations of specifying, analysing, testing, and applying sets of survey specific edit and imputation criteria. Because the methodology underlying GEIS is based on the assumptions of linearity of edits and non-negativity of data, the implementation can rely heavily on linear programming techniques.

GEIS has been developed in a UNIX environment and is available to users in two operating system environments: MVS on the IBM mainframe, and DOS on an IBM-AT compatible microcomputer. The system comprises programs written in the C programming language, a menu system using the ORACLE SQL*FORMS product, and reports written using the ORACLE SQL*REPORT product. The C programs interface to the ORACLE database using the ORACLE PRO*C product. All the facilities of GEIS are accessible through the menu system, which is implemented as a hierarchical arrangement of forms. One can use this menu system to develop and test one or more sets of edit and imputation requirements for one or more questionnaires.

Using the GEIS menu system is very much the same on DOS and MVS but obtaining access to the menu system, or running GEIS programs directly, is operating system dependent. Under control of DOS all programs are executed online with program output being displayed on the screen. Under control of MVS most programs are executed in batch.

1. GEIS Components

Figure 2: The six principal components of the GEIS (Select Questionnaire and Tables; Implement Edit Strategy; Outlier Detection; Determine Fields to Impute; Implement Imputation Strategy; Reports).

An overview of the main components of GEIS is presented below.

(a) Select Questionnaire and Tables.

The first step in GEIS is to identify the survey questionnaire and specify the ORACLE tables that contain survey data for that questionnaire. For further information on the use of ORACLE tables in GEIS, see the heading "Data Model for GEIS" later in this section.

(b) Implement Edit Strategy

A survey specific set of linear edits forms the cornerstone of GEIS and, since all GEIS components depend on them, the quality of the imputed data will only be as good as the quality of these edits. This component provides options to specify edits and to evaluate their quality.

Edits are specified interactively. The user identifies the edit, provides the linear equation, including variable names, coefficients and constant, together with an indication of whether the edit is interpreted as pass or fail. There is also a facility to update the edits, associate comments, and maintain date-added and date-updated values. During the edit specification process, syntax verification of the edits takes place. The verified edits are stored on the database.

An analysis option of this component provides information that helps the user to establish a consistent (i.e., not self-contradictory) set of edits. Once a consistent set of edits has been established, redundant edits7 can be identified. In addition, the lowest and highest values possible for each variable are found, given the set of edits as constraints. This analysis option employs linear programming algorithms that require the assumptions of linearity of the edits and non-negativity of the data. The algorithms used are: the Revised Simplex Method, the Product Form Inverse and the Givens Transformation.

This component generates two reports which help the user to prepare the edits. The first lists the extremal points that are generated from the polyhedron defined by a consistent set of linear edits. These points can provide information about the validity of the edit set, since each point can be interpreted as the worst that a record can be whilst remaining acceptable to the edits. An examination of the extremal points may cause the user to make certain edits more restrictive while relaxing others. The second report lists the implied edits that are generated by taking linear combinations of members of the edit set. The implied edits uncover conditions that are being imposed on the fields but which were not directly specified. Some of these conditions may be undesirable and require modification of the original set of edits.

The last option of the Implement Edit Strategy component applies an edit set to a survey data table and supplies the user with results about edits and records passed, missed, or failed. These results serve as a diagnostic aid permitting a user to further assess the nature of the edits and data.
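
The kind of summary this option produces can be sketched as follows. In the hypothetical C fragment below, each linear edit is applied to each record and the numbers of records that pass, fail, or cannot be evaluated because of missing values are counted; the edits and data are illustrative and far simpler than a real application:

    /* A hedged sketch of an edit application report: for each linear edit
       (sum of coefficients times fields <= constant) count the records that
       pass, fail, or are missed because a required field is missing. */
    #include <stdio.h>
    #include <math.h>

    #define NVAR  2
    #define NEDIT 2
    #define NREC  4
    #define NA    NAN                          /* missing value marker */

    static const double A[NEDIT][NVAR] = { {  1.0, -1.0 },   /* x1 <= x2 */
                                           { -1.0,  0.0 } }; /* x1 >= 0  */
    static const double b[NEDIT] = { 0.0, 0.0 };

    int main(void)
    {
        double data[NREC][NVAR] = {
            { 10.0, 25.0 }, { 30.0, 25.0 }, { NA, 25.0 }, { 5.0, 5.0 }
        };
        int pass[NEDIT] = {0}, fail[NEDIT] = {0}, missed[NEDIT] = {0};

        for (int r = 0; r < NREC; r++) {
            for (int e = 0; e < NEDIT; e++) {
                double lhs = 0.0;
                int missing = 0;
                for (int v = 0; v < NVAR; v++) {
                    if (A[e][v] == 0.0) continue;          /* field not involved */
                    if (isnan(data[r][v])) { missing = 1; break; }
                    lhs += A[e][v] * data[r][v];
                }
                if (missing)          missed[e]++;
                else if (lhs <= b[e]) pass[e]++;
                else                  fail[e]++;
            }
        }
        for (int e = 0; e < NEDIT; e++)
            printf("edit %d: passed %d, failed %d, missed %d\n",
                   e + 1, pass[e], fail[e], missed[e]);
        return 0;
    }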

Using the information gathered from employing the options of this component, a user can delete or modify edits to eliminate inconsistency, delete or modify redundant edits, or add edits which further restrict the bounds (upper, lower, or both). By removing the redundant edits, the user creates a minimal set of edits. This set defines the same region as the original set but is more efficient to use since it has a smaller number of edits.

7 Redundant edits are those which do not further restrict the feasible region of data values in the presence of the other edits.

(c) Outlier Detection

This component considers all the data records concurrently, not individually. It determines upper and lower acceptance bounds for each user-specified variable by observation of the current data or, for ratios (of the variable's current to previous values), by observation of the current and previous values. This serves two distinct purposes: first, it determines upper and lower bounds that could be specified as quantitative edits; second, it identifies outlying values that can be flagged for imputation or excluded from being donated in the Donor Imputation component.

When information for the Outlier Detection component is specified, the user can restrict the records to be used in the outlier detection by specifying data record exclusion statements, in SQL format, on either or both of the current and historical data tables.
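
One common way of deriving such acceptance bounds, sketched below in C, is to place them a multiple of the interquartile distance beyond the first and third quartiles of the observed values; this is given only as an illustration and is not claimed to be the exact algorithm implemented in GEIS:

    /* A hedged sketch of quartile-based acceptance bounds: values outside
       [Q1 - c*IQD, Q3 + c*IQD] are flagged as outliers.  Data and the
       multiplier c are illustrative. */
    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Linear interpolation between order statistics for quantile q in [0,1]. */
    static double quartile(const double *sorted, int n, double q)
    {
        double pos = q * (n - 1);
        int lo = (int)pos;
        double frac = pos - lo;
        return lo + 1 < n ? sorted[lo] * (1.0 - frac) + sorted[lo + 1] * frac
                          : sorted[lo];
    }

    int main(void)
    {
        double x[] = { 12.0, 14.5, 13.2, 15.0, 11.8, 14.1, 13.7, 95.0, 12.9 };
        int n = sizeof x / sizeof x[0];
        double c = 3.0;                        /* multiplier controlling strictness */

        qsort(x, n, sizeof x[0], cmp_double);
        double q1 = quartile(x, n, 0.25), q3 = quartile(x, n, 0.75);
        double iqd = q3 - q1;
        double lower = q1 - c * iqd, upper = q3 + c * iqd;

        printf("acceptance bounds: [%.2f, %.2f]\n", lower, upper);
        for (int i = 0; i < n; i++)
            if (x[i] < lower || x[i] > upper)
                printf("outlier: %.2f\n", x[i]);
        return 0;
    }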

(d) Determine Fields to Impute

When a record fails one or more quantitative edits, there might be several combinations of fields that could be imputed so that the record would pass the set of edits. Given the underlying premise of changing as little data as possible, this GEIS component finds all those combinations that will minimize the number of fields to be changed. In the event that multiple solutions are found, all solutions are printed on an Error Localization report, but only one solution is chosen randomly and stored on the database. The user also has the option of minimizing a weighted number of fields to be imputed so that fields of greater reliability have less chance of being changed.

The error localization problem is formulated as a cardinality constrained linear program and is solved using Chernikova's algorithm. For each data record, the program formulation that determines which fields are to be imputed incorporates the edit set and the minimum sum of weights rule. The formulation includes missing and negative data values and accounts for the fields identified for imputation by Outlier Detection.
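
The objective of error localization can be illustrated without reproducing Chernikova's algorithm. In the Fellegi-Holt setting, once the failed edits (including implied edits) are known, one seeks the set of fields with minimum total weight such that every failed edit involves at least one field in the set. The exhaustive search below, over a tiny hypothetical example, is only a sketch of that objective, not of the production algorithm:

    /* A hedged sketch of error localization as a weighted set-covering
       search over the failed edits.  The incidence matrix and the field
       weights (reliabilities) are illustrative. */
    #include <stdio.h>

    #define NVAR  4
    #define NFAIL 2

    /* involved[e][v] = 1 if failed edit e involves field v */
    static const int involved[NFAIL][NVAR] = { { 1, 1, 0, 1 },
                                               { 0, 1, 1, 0 } };
    static const double weight[NVAR] = { 1.0, 2.0, 1.0, 1.0 };

    int main(void)
    {
        int best = -1;
        double bestw = 0.0;

        for (int s = 1; s < (1 << NVAR); s++) {        /* every subset of fields */
            int covers = 1;
            double w = 0.0;
            for (int e = 0; e < NFAIL; e++) {
                int hit = 0;
                for (int v = 0; v < NVAR; v++)
                    if ((s >> v & 1) && involved[e][v]) hit = 1;
                if (!hit) { covers = 0; break; }
            }
            for (int v = 0; v < NVAR; v++)
                if (s >> v & 1) w += weight[v];
            if (covers && (best < 0 || w < bestw)) { best = s; bestw = w; }
        }

        printf("fields to impute (weight %.1f):", bestw);
        for (int v = 0; v < NVAR; v++)
            if (best >> v & 1) printf(" x%d", v + 1);
        printf("\n");
        return 0;
    }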

(e) Implement Imputation Strategy

Having determined which fields of which records require imputation, this component provides the options of deterministic, donor, and estimator imputation to supply valid values.

Deterministic imputation determines which fields of a record requiring imputation can take only one possible value, considering the edits applied, the other data values present in the record, and the other fields selected for imputation.

This option reads the ORACLE database table identifying records containing fields to impute. Each data record containing at least one field to impute is fetched and examined by applying the set of edits also retrieved from the database. If a deterministic value is found then it replaces the original field value in the data record and the associated imputation status is updated to reflect that a deterministic value was supplied. Deterministic imputation usually precedes donor and estimator imputation.
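
A minimal sketch of the idea, assuming a simple balance edit part1 + part2 = total together with non-negativity, is given below in C; when the other two fields are reported, the missing field can take only one value:

    /* A minimal sketch of deterministic imputation: when the edits and the
       values already present in the record leave only one possible value
       for a field, that value is supplied directly.  The balance edit and
       the field names are hypothetical. */
    #include <stdio.h>
    #include <math.h>

    struct rec { double part1, part2, total; };   /* NAN marks a field to impute */

    /* Returns 1 if a deterministic value for part2 could be derived. */
    static int impute_part2(struct rec *r)
    {
        if (isnan(r->part2) && !isnan(r->part1) && !isnan(r->total)) {
            double v = r->total - r->part1;
            if (v >= 0.0) {                       /* must also satisfy non-negativity */
                r->part2 = v;
                return 1;
            }
        }
        return 0;
    }

    int main(void)
    {
        struct rec r = { 40.0, NAN, 100.0 };
        if (impute_part2(&r))
            printf("part2 deterministically imputed as %.1f\n", r.part2);
        else
            printf("no deterministic value found\n");
        return 0;
    }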

In donor and estimator imputation, the objective is to supply new values to ensure that the resulting data record will pass all the required edits whilst preserving the underlying distributional structure of the data. In other words, the objective is not to reproduce the true micro-data values, but rather to establish internally consistent data records that will yield good aggregate estimates. This objective can be approximated with donor imputation but usually cannot be met if one uses the estimators.

Donor imputation uses the nearest neighbour approach. The flagged fields to impute are supplied with values from a similar, clean record (donor) where similarity is based on the reported, non-missing values. This procedure operates on a set of variables defined by an edit set and tends to preserve the structure of the data, since all variables in one edit set are imputed at the same time. That is, not only are the values imputed, but so is their interrelationship. To ensure that the edits are satisfied, several nearest neighbours are found, and the closest one that produces a record which satisfies the edits is used to impute for the record, if such a donor exists.

In the first step of donor imputation, the user specifies any fields (must-match fields) that are to be mandatory in establishing a match between donor and recipient records. Next, the set of all matching fields is determined for all records. For each recipient record, the matching fields are determined by an analysis of the edits together with the fields to impute for that recipient, whilst including the pre-specified must-match fields.

The matching field values are transformed onto the interval (0,1) and the new values stored in a database table. These transformed values are used in the calculation of the distance measure between a recipient and a possible donor record in the nearest neighbour search performed in the next processing step.

In the final step of donor imputation a k-d search tree is constructed using the potential donor records and the set of all matching fields for all recipient records. Certain donor records can be optionally excluded from being added to the search tree by use of an exclusion statement in SQL format. Furthermore, a minimum number or percentage of donors that must exist for the process to continue can be specified. The nearest neighbour search for a suitable donor is performed on the search tree. A potential donor is a suitable donor for a recipient record if the recipient record passes the post-imputation edits after all pertinent fields have been imputed using the corresponding fields of the potential donor. GEIS gives the user the flexibility to specify less restrictive edits in the post-imputation edit set. Note that only the raw field values are transferred between donor and recipient records, and that if a user requires scaling of the values, then this must be done outside of the GEIS system.

Estimator imputation allows a user to replace missing and invalid values using a predetermined method. There are six available methods:

- the field value from a previous time period for the same respondent is used;

- the mean field value based on all records from the previous time period is imputed;

- the mean field value based on all records from the current time period is imputed;

- a ratio estimate, using values from the current time period is imputed;

- the value from the previous time period for the same respondent, with a trend adjustment calculated from an auxiliary variable, is imputed;

- the value from the previous time period for the same respondent, with a trend adjustment calculated from the change in reported values, is imputed.
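
Two of these methods are sketched below in C as an illustration; the data, the choice of auxiliary variable and the treatment of missing values are assumptions made for the example only:

    /* A hedged sketch of two estimator imputation methods: (i) the mean of
       the current period's reported values and (ii) the previous value for
       the same respondent adjusted by the trend in an auxiliary variable. */
    #include <stdio.h>

    /* (i) current-period mean of the reported (non-missing) values */
    static double current_mean(const double *x, const int *reported, int n)
    {
        double sum = 0.0;
        int m = 0;
        for (int i = 0; i < n; i++)
            if (reported[i]) { sum += x[i]; m++; }
        return m > 0 ? sum / m : 0.0;
    }

    /* (ii) previous value carried forward with a trend adjustment taken
       from an auxiliary variable observed in both periods */
    static double trend_adjusted(double prev_value, double aux_prev, double aux_curr)
    {
        return aux_prev != 0.0 ? prev_value * (aux_curr / aux_prev) : prev_value;
    }

    int main(void)
    {
        double curr[]  = { 120.0, 95.0, 0.0, 130.0 };   /* current period values  */
        int reported[] = { 1, 1, 0, 1 };                /* third value is missing */

        printf("mean imputation:  %.1f\n", current_mean(curr, reported, 4));
        printf("trend imputation: %.1f\n", trend_adjusted(88.0, 40.0, 44.0));
        return 0;
    }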

Note that since these methods cannot preserve the structure of the data as well as does donor imputation, they are primarily intended to serve backup purposes.

The user specifies the choice of estimator8 in an imputation specifications table, and this is used to drive the estimator imputation programs. Each program, one per algorithm, accesses the imputation specifications table to determine which fields of a questionnaire are to be subjected to the algorithm. The set of records requiring fields to be imputed is retrieved and, using the specification table, the appropriate imputation action is performed and the records are stored with their new values. No post-imputation editing is available automatically, although sub-sets of imputed records may be selected and re-submitted to the Determine Fields to Impute GEIS component to verify whether imputed records pass the original edits.

(f) Reports

Various reports are generated to ease the monitoring of the imputation process. These include tabulations of fields to be imputed by reason for imputation; match field status; distribution of donors used to impute recipients; method of imputation used by field; completion of imputation; etc..

2. Data Model for GEIS

The data model for GEIS contains three primary entities: questionnaire specification; survey data for a given time period; and linear edits. A given questionnaire identifier can be associated with many survey data sets, each storing the data for a different data collection period. Only up to two of these data sets can be used in a production run: one being identified to GEIS as the current survey data to be edited and imputed; an optional second data set representing a previous time period to be used in Outlier Detection and in certain of the Estimator Imputation methods. The current survey data set is linked for a processing run to two sets of linear edits: an initial edit set and a post-imputation edit set (if different).

The data model is set up to enable subsets of edits to be run against subsets of the survey data, thus allowing for tuning the edits for certain partitions of the data.

8 The user has several options available to control the use of Estimator Imputation, for example: records may be prevented from contributing to the means used in Estimator Imputation by an exclusion statement in SQL format; also, a minimum number or percentage of usable records may be imposed.

GEIS stores the data of its entities, together with all its reference specifications, intermediate data sets and status information, as tables in an ORACLE data base. For example, data corresponding to a particular time period of a questionnaire is kept in one ORACLE table, where each row of this table contains all the data for one respondent. With so many tables used, the data management problem would have been very onerous without the facilities provided by the ORACLE data base management system.

VI. USE OF DBMS

Since 1987, the strategic direction for informatics at Statistics Canada has focused, among other things, on the use of a RDBMS together with the ISO standard interface language SQL. The GSFD suite of software modules, including the GEIS editing and imputation software, has been developed since the informatics strategic direction was put in place and is thus based on a SQL/RDBMS architecture.

The nature of the contribution to editing of the particular RDBMS employed at Statistics Canada (ORACLE), in the context of the GSFD development, is enumerated below:

(a) ORACLE has provided one of the means of developing a portable system that can be run on both small and large computer architectures (the other main facilitator of portability has been the use of the C language). ORACLE itself exists in identical form on many platforms. At Statistics Canada, RDBMS provides an identical environment on the IBM compatible mainframe under MVS, under Unix on Sun equipment, under Xenix on the Intel 386 architecture, and under DOS on the Intel 386 architecture. Since all data input and output in GEIS is handled by ORACLE, the system is truly isolated from any operating system dependency.

On porting to several platforms the following difficulties arose. ORACLE itself consisted of several component products, e.g., SQLPLUS and SQLFORMS, and it was not always possible to maintain the same version number of these products across different platforms. This led to implementation difficulties. This issue of working with different versions of products on different platforms has also shown up with the C compilers used with the Oracle product.

(b) GEIS employs many ORACLE data tables. ORACLE stores the specifications of these tables in its data dictionary. Several GEIS components reference this data dictionary to verify table and field names. Not having to build this dictionary speeded up development and greatly enhanced the generality of the system.

The underlying approach of GSFD, of which GEIS is just one component, implies that ORACLE data tables are used throughout the survey processing cycle. Then only one data dictionary is required to handle all the different functions of survey processing.

(c) The use of the DBMS has given the system data independence. Data tables are described outside of the programming system and reference to data variables can be made without reference to a table layout and without knowledge of irrelevant tables/variables. This data independence means a table specification in the Oracle data base can be changed, e.g., new variables added, without affecting existing GEIS programs.

(d) The data base allows convenient sharing of data between users with no administrative problems.

(e) Though no security features are directly implemented in GEIS, a user can invoke the security features of Oracle, such as granting access privileges, to ensure that data base tables are not accessed or updated incorrectly.

(f) The ORACLE RDBMS provides a forms product that is effectively identical on all operating platforms and this allows the user to access the GEIS system in the same manner on each machine. This has obvious benefits in terms of minimizing training needed. At present the forms interface is used on each operating platform, but by using networking it is feasible to employ a client/server architecture at the RDBMS level such that the forms interface operates on one machine, a micro, say, while the computational processing operates on another more powerful machine.

(g) The non-procedural SQL language allows one to concentrate on the nature of the editing problem and minimizes the need for data manipulation tricks. The problem environment for editing involves checking sets of data against rule sets. This type of processing maps well to a relational model and thus is very conveniently carried out in SQL. Furthermore one can employ a fully normalized design for data tables which corresponds closely with the conceptual model of the system. Data can be selected from several tables at once by using the join facility of SQL, thus avoiding the need to navigate procedurally through these tables in order to build up composite items of data. The cleanliness of a SQL implementation has simplified the development and testing of the system together with enhancing its maintainability. SQL is also a convenient tool for the user to query GEIS tables from outside of the system.

The degree of normalization employed in setting up GEIS tables is equivalent to providing a columnar view of the data. This is valid for most processes but it does lead to a performance penalty when a record oriented approach is necessary for certain operations.

(h) Because SQL is a powerful 4GL, any given SQL statement embodies the results of much analysis. In reality, one has to be careful during the development and maintenance stage to respect the power of SQL and revisit the underlying analysis behind a SQL statement as a precursor to making any changes to it. Instead of rethinking SQL statements during any maintenance activity, an easy trap to fall into is to add sections of procedural code thus compromising the simplicity obtained by using SQL.

(i) The use of the data base software has provided transparent restart facilities, making the processing immune to any system failure and, since all data is held in one repository, easing system backup.

(j) The use of a RDBMS is considered to have a performance penalty. This is particularly true on general purpose mainframes, where the overhead of supporting many types of processing makes CPU cycles comparatively expensive, but it has to be evaluated in terms of such matters as the reduction in development and maintenance costs and the total costs for all the survey processing, including manual intervention (which is reduced through the use of an editing package such as GEIS). With a RDBMS, one can achieve certain performance improvements by manually optimizing SQL code, but there usually are automatic optimization features provided by the RDBMS supplier. With GEIS, performance issues were initially wrongly attributed to the use of a RDBMS whereas they were due to other causes, such as the use of a complex linear programming (LP) algorithm (Chernikova's algorithm).

VII. IMPLEMENTATION STRATEGIES

1. Phased Development

The traditional methodology of systems development assumes that the development team has access to a set of detailed specifications covering all aspects of the system. This is not the contemporary approach. With GEIS, even more than with most developments, it was necessary to construct a prototype to prove the methodological framework. An early prototype was also necessary to establish the viability of an MS-DOS micro as a platform for such complex software. The succeeding development comprised several releases of the product: from an initial version just handling the edit specifications and edit analysis elements of the system, to the current version that is functionally complete and represents software that has received considerable tuning.

The phased approach, with each release representing a useable product, encouraged feedback from both subject matter and methodology users. This helped refine the specifications, identified performance issues, and was of immense help in development of the user interface.

2. User Environments

At Statistics Canada, survey processing applications range in size from those which could fit on a micro-computer to those requiring the processing and storage of a mainframe or larger minicomputer. The portability of GEIS enables editing and imputation on any compatible computer architecture. One possibility likely to occur when introducing GEIS processing into an existing survey is that the current system may be incompatible with the GEIS architecture; for example, the non edit and imputation portions of the survey processing may never reside on an Oracle data base. It has been accepted that the advantages of a standard edit and imputation methodology can be worth the cost of introducing front-end and back-end interfaces between existing survey systems and the GEIS system.

Strategies of editing and imputation and cost allocation between functions differ between surveys and have to be accommodated by a generalized system. GEIS is unique in that it is a production engine, and also a test bed for methodology research. The latter mode gives the methodologist the ability to test different approaches and sequences of edit and imputation operations as part of establishing a survey edit and imputation strategy. Both research and production can coexist in the same GEIS survey database.

A generalized system such as GEIS partitions into human interface elements suitable for interactive environments, e.g., specification and analysis of edits, and elements requiring high computational power, e.g., the error localization task. We are now close to the time when a cooperative network can be set up for running the GEIS system. Its architecture lends itself to having the front-end user interface elements running on a micro-based workstation that is networked to a larger processor specialized to supply the high floating point throughput needed for the linear programming algorithms.

VIII. ISSUES

The development of generalized software products, such as GEIS, under the umbrella of the GSFD project has involved systems, methodology and subject matter specialists. The significance of the effort involved, the newness (to Statistics Canada) of the approaches taken, and the potential for long-term impact on survey processing applications have caused certain issues to be raised and debated.

1. Technical Issues

An expectation arises from the rapid implementation of preliminary releases of a generalized system: the expectation that the initial rapid development pace can be maintained until completion of the system. There are several factors that can slow development in its later stages. When using new technologies there usually is a significant learning curve to follow before one can move from a functioning system to an efficiently performing one. A further corollary of using leading edge technologies is that new versions often arrive while development is continuing, and time is required to adapt the software systems to these new versions.

The use of a non-procedural language such as SQL has to be approached with care. Its power can be a two-edged sword in that, not knowing the underlying processing activities invoked, one can find that subtle variations of the formulation of a high level non-procedural SQL statement can have very large performance consequences.

The management of survey database environments being introduced into survey processing through the generalized products of the GSFD project has to be addressed. For example, a position of survey data base administrator becomes mandatory to manage the data tables in an Oracle data base and to be the focal point for requirements. Through the control of indexes on certain GEIS data tables, a DBA has the power to improve the runtime performance of the system. The DBA also has to understand that heavy updating of a database table (as happens in imputation processing) can lead to the physical fragmentation of the table, with its concomitant performance penalty; rectifying this requires table maintenance.

2. Managerial Issues

The high degree of automation described in this paper requires powerful DBMS facilities, with a robust, extensible and open data structure, together with protection of concurrent updates and data security. These features incur a cost in memory size and CPU consumption; therefore, the advent of computers with larger, cheaper memories and cheaper CPU cycles was an enabling technology for the GSFD developments. Nevertheless, for a single survey, the use of a general system such as GSFD will cost more, under usage-based computing prices, than it would with a custom developed system. The benefits of more advanced methodology, particularly the savings in human operating costs, and the reduced development expenses are often less visible. The corporate savings accruing from investment in and use of standard, powerful general software should not leave individual survey managers, methodologists or systems analysts in the dilemma of having to rationalize extra local costs without strategic support.

The use of standards, inhibiting unnecessary arbitrary decisions, is valuable in reducing confusion in inter-personal communications, raising productivity in development, and enhancing the utility of survey data. The adoption of standards, for instance edit methods that preserve the underlying distribution, imposed through standard software, reduces risk, speeds the development cycle and simplifies periodic tuning of the survey. However, standardization can be viewed as the antithesis of flexibility, and all methodologists will attempt to optimize the design of a survey, which will frequently incorporate survey specific features. One has to be careful with general systems not to make them a straitjacket, but rather a facilitating tool that encourages the adoption of standards whilst permitting needed flexibility where and when required.

The data processing industry has made great strides towards the adoption of a client server architecture, where an assumption is made that all or most computers will be linked to networks of autonomous, cooperative computers, rather than having a large monolithic central processor. The growth of the latter was primarily due to the economies of scale, as expressed by Grosch's law, and this law no longer applies. The specialization of CPUs for particular tasks, together with the potential economies in commercial software licences, suggests that the client server model offers more promise for the future of computing in national statistical offices. To take an obvious metaphor, the telephone as an instrument is rather unimpressive; however, the power and reach of worldwide voice communications is very impressive.

All functions of GSFD will contain a specification facility, in the case of GEIS with extensive support for research to arrive at the eventual specifications, which is best served by a high degree of user interaction. The functions also contain a production engine, where the emphasis shifts to greater computational intensity. In the future, we expect both types of work to move further towards their extremes: more computational intensity on the one hand and, even more, greater attention to user friendly interfaces on the other. The decoupling of work types supported by the client server model offers the computing environment for the future of the GSFD developments.

BLAISE - A NEW APPROACH TO COMPUTER-ASSISTED SURVEY PROCESSING

by D. Denteneer, J.G. Bethlehem, A.J. Hundepool and M.S. Schuerhoff Netherlands Central Bureau of Statistics

Abstract: The article presents BLAISE, a system for survey data processing developed by the Netherlands Central Bureau of Statistics. The basis of BLAISE is a powerful, structured language which describes the questionnaire and its editing procedures. With the thus specified questionnaire as input, the system automatically generates software modules for data processing.

I. INTRODUCTION

Data collected by the Netherlands Central Bureau of Statistics have to go through a comprehensive process of editing. The consistency of data is checked and detected errors are corrected. Traditionally, this is partly carried out manually and partly by computer. In the process, different departments and different computer systems are involved. Research into a more efficient data editing process (Bethlehem (1987)) has led to the development of a new system in which all work is carried out on one computer system (a network of microcomputers) and by one department.

The basis of the BLAISE system is a powerful, structured language (also called BLAISE) which describes the questionnaire: questions, possible answers, routing, range checks and consistency checks. With the thus specified questionnaire as input, the system automatically generates software modules for data processing, and these modules are not restricted for use in data editing only.

One of the most important modules is the so-called CADI-machine (Computer Assisted Data Input). This program provides an intelligent and interactive environment for data entry and data editing of data collected by means of questionnaires on forms. Experiments with CADI have shown a considerable reduction in processing time while still yielding high quality data.

Another important module is the CATI-machine (Computer Assisted Telephone Interviewing). The BLAISE system automatically produces the software needed for telephone interviewing, including call management. A module which resembles the CATI-machine is the CAPI-machine (Computer Assisted Personal Interviewing). In the near future, BLAISE will be able to generate the software for this type of face-to-face interviewing with hand-held computers.

BLAISE also automatically generates setups and system files for the statistical package SPSS. Interfaces to other statistical packages (e.g. TAU, PARADOX, STATA) are scheduled for the near future.

II. THE BLAISE SYSTEM

BLAISE is the name of a new system for survey data processing. BLAISE controls a large part of the survey process: design of the questionnaire, data collection, data editing and data analysis. The basis of the BLAISE system is the BLAISE language, a language for the specification of the structure and contents of a questionnaire. BLAISE derives its name from the famous French theologian Blaise Pascal (1623-1662). Pascal is not only the name of the Frenchman but also of the well-known programming language, and the BLAISE language has its roots, for a large part, in this programming language.

In the BLAISE philosophy, the first step in carrying out a survey is to design a questionnaire in the BLAISE language. Since this specification forms the basis of every subsequent step in the survey process, the language should be simple and readable. Users without much computer experience should be able to construct and read BLAISE questionnaires.

The BLAISE language supports structured questionnaire design. Traditional questionnaires are controlled by goto's. This jumping makes the questionnaire quite unreadable. It is hard to check whether the questionnaire is correct, i.e. whether everyone answers the proper questions. BLAISE does not have this kind of jumping. Instead, routing through the questionnaire is controlled by IF-THEN-ELSE structures. This makes the structure very clear. Almost at a glance, one can distinguish in which cases which questions are asked.

The structured approach to questionnaire design simplifies the job of maintenance and updating of questionnaires. For surveys which are repeated regularly, minor modifications are often sufficient to produce an updated questionnaire. A group of questions about a particular subject can be kept in a single block (sub-questionnaire) e.g. a household block or an income block. If these blocks are stored in a library or database, it is easy to use the same block in different questionnaires. The design of a new questionnaire may reduce to putting together a number of blocks. Use of standardized blocks furthermore improves the exchangeability of results from different surveys.

The Blaise language is one part of the BLAISE system. The BLAISE machine is another part. The BLAISE machine is a software system which takes the BLAISE questionnaire as input, and produces software modules as output. The system is summarized in figure 1.

The user may choose between a number of options for output, thereby establishing the way in which the survey is carried out. For data collection, the user may choose between CATI (computer assisted telephone interviewing), CAPI (computer assisted personal interviewing with hand-held computers), or PAPI (a printed questionnaire for traditional paper and pencil interviewing). For CADI (computer assisted data input), the user has two options: he can either enter and edit forms in an interactive session at the computer, or he can interactively edit records which have been entered beforehand by data typists without error checking. For further processing of the "clean" data file, the user can choose between a number of interfaces which automatically generate setups and system files for software like SPSS.

FIGURE 1: THE BLAISE SYSTEM

[Diagram: a questionnaire description in the BLAISE language enters the BLAISE system, which generates, for data collection, modules for CAPI (computer assisted personal interviewing), CATI (computer assisted telephone interviewing), PAPI (paper and pen interviewing) and CADI (computer assisted data input and data editing), and, for data processing and data analysis, setups and system files for tabulation, analysis and database software.]

III. THE BLAISE LANGUAGE

The input of the BLAISE system is a description of the questionnaire in the BLAISE language. This language is a clear, structured language, which can also be read by non-experienced computer users. Figure 2 gives an example of a simple questionnaire in BLAISE. Writing a BLAISE questionnaire is like preparing an apple pie; the preparation consists of three parts:

Part 1: Gathering the ingredients

For the preparation of an apple pie, all ingredients must first be ready. Likewise, in a BLAISE questionnaire, all questions must first be specified. This takes place in the quest-paragraph. A question consists of an identifying name, the text of the question and a specification of valid answers. Three types of answers can be distinguished in the example. Sex, MarStat, PosHH and Job have user-defined answer categories. Note that such answers consist of a short name, used for checking and routing purposes in other parts of the questionnaire, and an extended description, used for explication purposes in generated software modules. The question Age expects an integer answer which is restricted to the range 0, 1, 2, ..., 120. The question KindJob expects an open answer. Any text may be entered, as long as the length of the text does not exceed 80 characters. The specification of valid answers at the same time specifies range checks on the answers.

FIGURE 2: A SIMPLE BLAISE QUESTIONNAIRE

QUESTIONNAIRE Dmo;
QUEST
  Seq (key) "Sequence number of interview": 1..10000;
  Sex "What is the sex of the respondent":
    (Male   "Male"
     Female "Female");
  Age "What is the age of the respondent": 0..120;
  MarStat "What is the marital status of the respondent":
    (Married "Married"
     NotMar  "Not married");
  Job "Does the respondent have a job?":
    (Yes "Yes, has a job"
     No  "No, does not have a job");
  KindJob "What kind of job does respondent have?": STRING[80]
ROUTE
  Seq; Sex; Age; MarStat; PosHH;
  IF Age > 15 THEN
    Job;
    IF Job = Yes THEN KindJob; ENDIF;
  ENDIF;
CHECK
  IF Age < 15 THEN MarStat = NotMar ENDIF;
  IF (PosHH = Single) OR (PosHH = Child) THEN MarStat = NotMar ENDIF;
ENDQUESTIONNAIRE

Part 2: Mixing the ingredients

In the preparation phase, all ingredients are mixed in the proper order. The analogue for the BLAISE questionnaire is the route-paragraph. It describes in which situations which questions must be answered and the order in which the questions must be answered. The route-paragraph clearly shows the structured approach of BLAISE. Instead of goto's, control structures like IF-THEN-ELSE are used. In the route-paragraph, the short names of the questions are used.

Specification of the name of a question at a certain point means that the question must be asked at that point. In this example, it is obvious that the questions Sex, Age, MarStat, and PosHH are asked of every respondent. Only persons older than 15 are asked whether they have a job. The question KindJob is only asked if the respondent is older than 15 and has a job. In CATI or CAPI applications, the route-paragraph guides the interviewer through the questionnaire. In a CADI application, the route-paragraph generates checks on the routing in the completed questionnaire.

Part 3: Testing the result

After finishing the pie, a small piece of it is tasted to see whether the result is tasty. Likewise, a questionnaire should contain consistency checks. In a BLAISE questionnaire, checks are specified in the check-paragraph. In particular, consistency checks are specified (range checks are automatically generated from the quest-paragraph and route checks are implied by the route-paragraph). The interpretation of this check-paragraph is very simple: someone under 15 may not be married, and single persons and children may not be married either.

The approach sketched above would do for small applications, but not for huge ones containing thousands of questions. The BLAISE solution for such large questionnaires is the block concept. These larger questionnaires can always be seen as a collection of several subquestionnaires, each treating a distinct subject. Each subquestionnaire can be described in a block, with question definitions, route statements, and check statements, and such a block can be included as a "super question" in the main questionnaire. Thus a BLAISE questionnaire can be built from several blocks, where each block treats a different subject. Moreover, blocks can be nested, so subquestionnaires can be divided into smaller parts. It is always possible to refer to questions in previously defined blocks.

The block concept also provides numerous advantages for questionnaire design. When using blocks, it is possible to design block libraries to store blocks that can simply be included in any questionnaire description. Such libraries may promote standardization of questionnaire parts that are frequently used, such as a household box or a set of income questions. In particular, blocks are important for the design of hierarchical questionnaires. In a household survey, for example, a block can contain the questions asked of every member of the household. The whole questionnaire is then simply defined as an array of blocks, as sketched below.
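
The following sketch, written in Python rather than in BLAISE itself, is only an illustration of the composition idea described above; the class, block and question names (Question, Block, household_box, person_block) are hypothetical and do not come from the BLAISE system.

    # Illustrative sketch (not BLAISE): blocks as reusable groups of questions
    # that can be nested and repeated to form a hierarchical questionnaire.
    from dataclasses import dataclass, field
    from typing import List, Union

    @dataclass
    class Question:
        name: str
        text: str

    @dataclass
    class Block:
        name: str
        items: List[Union["Question", "Block"]] = field(default_factory=list)

    # A standard "household box" kept in a block library and reused in surveys.
    household_box = Block("Household", [
        Question("Size", "How many persons live in the household?"),
        Question("Type", "What type of household is this?"),
    ])

    # A block with the questions asked of every member of the household.
    person_block = Block("Person", [
        Question("Sex", "What is the sex of this person?"),
        Question("Age", "What is the age of this person?"),
    ])

    # The whole questionnaire: the household box plus an array of person blocks.
    def build_questionnaire(n_members: int) -> Block:
        return Block("HouseholdSurvey",
                     [household_box] + [person_block for _ in range(n_members)])

    if __name__ == "__main__":
        q = build_questionnaire(3)
        print(q.name, "contains", len(q.items), "blocks")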

The paragraph structure and the block concept are the backbone of the BLAISE language. In this general discussion, we have ignored the possibility of local variables, types to be defined in a type-paragraph, external files, and variable texts, to name just a few. Details on these matters can be found in the BLAISE reference manual (Bethlehem et al., 1989c).

Of course, the BLAISE language can be used after the questionnaire has been designed. However, BLAISE can also be of great importance during the design. The block structure of BLAISE makes it easy to separate the subjects in a questionnaire and to distribute them among the questionnaire designers. They can execute and test their parts independently. Integration of the blocks is easily done by including all of them in a main questionnaire.

IV. THE CADI-MACHINE

One of the data processing applications which can be produced by the BLAISE system is the CADI-machine. The CADI machine is an intelligent and interactive system for data input and data editing. Data can be processed in two ways. In the first approach, the subject-matter specialist sits behind a microcomputer with a pile of forms and processes the forms one by one. He enters the data where appropriate, and after completion of the form, he activates the check option to test routing and consistency (range errors are checked during data input). Detected errors are highlighted and explained on the screen. Errors can be corrected by consulting the form or calling the supplier of the information. After elimination of all errors, a clean record is written to file. If the specialist does not succeed in producing a clean record, a special indication is added to the record, indicating the problems. These hard cases can be dealt with later on, also with the CADI-machine.

In the second approach, the data have already been entered beforehand by data typists, without error checking. With the CADI machine, the records of the file can be edited. One by one, records are read from the file, the check option is activated and detected errors can be inspected and corrected. Again, the system keeps track of the status of the records.

The screen layout of the CADI-machine is the result of an attempt to reconstruct a paper form on a computer screen. However, redundant information is removed: questions are denoted by sequence number and name, rather than the full question text. The full information is always available via help windows.

The routing statements from the route-paragraph are interpreted as a series of checks. This implies that routing errors are treated in the same way as errors arising from checks. It also implies that the CADI-machine does not force the user to answer certain questions or skip other questions. Rather, the user may answer any question and has complete freedom to page through a questionnaire. This is done to encourage the user to copy the form exactly as it was filled in.

On the other hand, this implies that the user will need to skip questions manually, which costs some extra time. To improve speed, the CADI machine supports keys to skip pages backwards and forward. The additional (psychological) advantage is that a user may turn pages, just as if he was working with a paper form.

During data entry, the user is not bothered by routing or consistency checks. This allows for fast data entry and, again, encourages the user to copy the form exactly as it was filled in. Checks are performed when the user presses a check-key or at the end of the form. Thereafter, the screen changes slightly. The top right hand corner shows whether any errors have occurred. The names of questions that are involved in errors are followed by error counts. The user may jump to such a question (keys are available to jump to questions that are involved in errors) and inspect the error messages through an error window. Such an error message consists of a statement of the error and a list of the questions that are involved in this particular error.
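
A minimal sketch, in Python rather than CADI code, of the bookkeeping just described: checks are evaluated after data entry, errors are counted per question, and each error message carries the list of questions involved. The two checks correspond loosely to the example of Figure 2; the record layout is hypothetical.

    # Illustrative sketch (not the CADI-machine): evaluate consistency checks on
    # a completed record, count errors per question and attach to each error
    # message the list of questions involved.
    from typing import Callable, Dict, List, Tuple

    # A check is a predicate on the record plus a message and the questions involved.
    Check = Tuple[Callable[[dict], bool], str, List[str]]

    checks: List[Check] = [
        (lambda r: not (r["Age"] < 15 and r["MarStat"] == "Married"),
         "Someone under 15 may not be married", ["Age", "MarStat"]),
        (lambda r: not (r["Job"] == "No" and r.get("KindJob")),
         "KindJob answered although respondent has no job", ["Job", "KindJob"]),
    ]

    def run_checks(record: dict):
        error_counts: Dict[str, int] = {}
        messages: List[Tuple[str, List[str]]] = []
        for ok, message, questions in checks:
            if not ok(record):
                messages.append((message, questions))
                for q in questions:
                    error_counts[q] = error_counts.get(q, 0) + 1
        return error_counts, messages

    if __name__ == "__main__":
        record = {"Age": 14, "MarStat": "Married", "Job": "No", "KindJob": "baker"}
        counts, msgs = run_checks(record)
        print(counts)   # {'Age': 1, 'MarStat': 1, 'Job': 1, 'KindJob': 1}
        for message, involved in msgs:
            print(message, "- questions involved:", involved)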

A large statistical office spends a lot of time on manually coding open answers, that is: translating text strings into hierarchically ordered numerical codes. One way to ease this time consuming task is to translate the text strings into numerical codes automatically. With such methods, approximately 70% of the answers can be translated correctly. The remaining 30% has to be dealt with by coding specialists. This is at variance with the BLAISE philosophy, which states that in principle a record should be correct after one cycle of data entry and editing. Another approach to this problem is therefore used in BLAISE. Questions of the coding type are dealt with interactively, as for other questions. During the entry of the answer, two tools are available to facilitate coding. Firstly, there is online information about the descriptions of the next possible digits and secondly, during every stage of entry, an alphabetically sorted list of descriptions and synonyms can be consulted. The list of synonyms can also be used for non-hierarchical codes to facilitate, for instance, the numbering of municipalities. Provided that the list of synonyms used is of good quality, the major part of the questions can be coded directly, as the sketch below illustrates.
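
The sketch below, in Python rather than BLAISE, only illustrates the two coding aids described above: showing the descriptions of the next possible digits for a partially entered code, and looking up candidate codes in an alphabetically sorted synonym list. The small code table and synonym list are invented.

    # Illustrative sketch (not BLAISE code) of interactive coding support:
    # 1) show the descriptions of the next possible digits for a partial code,
    # 2) look up candidate codes from an alphabetically sorted synonym list.
    hierarchical_codes = {          # hypothetical fragment of a coding scheme
        "1":   "Food",
        "11":  "Bread and cereals",
        "111": "Bread",
        "112": "Rice",
        "2":   "Clothing",
        "21":  "Garments",
    }

    synonyms = sorted([             # alphabetically sorted synonym list
        ("baguette", "111"),
        ("jeans", "21"),
        ("loaf", "111"),
        ("rice", "112"),
    ])

    def next_digit_descriptions(partial: str):
        """Descriptions of codes one digit longer than the code entered so far."""
        return {code: text for code, text in hierarchical_codes.items()
                if code.startswith(partial) and len(code) == len(partial) + 1}

    def lookup_synonym(answer: str):
        """Candidate codes whose synonym matches the typed answer text."""
        answer = answer.lower()
        return [code for word, code in synonyms if word.startswith(answer)]

    if __name__ == "__main__":
        print(next_digit_descriptions("1"))    # {'11': 'Bread and cereals'}
        print(next_digit_descriptions("11"))   # {'111': 'Bread', '112': 'Rice'}
        print(lookup_synonym("loaf"))          # ['111']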

We do not yet have extensive results about the performance of this approach, but several tests show that the method is very promising. It will be used in full scale production for coding goods in the household budget survey.

V. THE CATI-MACHINE AND THE CAPI-MACHINE

The CADI-machine concentrates on processing data which have previously been collected on paper forms. However, the BLAISE system can also be used for data collection. The system is able to produce the software required to carry out CATI-interviewing or CAPI-interviewing.

CATI-programs have been around for a long time, and it does not seem necessary to include a full discussion of them here (Nicholls and Groves (1985)). The main point about the BLAISE CATI-machine is that it is fully integrated with a larger system for survey data processing, thus eliminating the need for yet another language to specify a questionnaire. One aspect of the CATI-machine is still worth mentioning: a user cannot skip questions that must be answered, nor can he answer questions that must be skipped. This dynamic routing implies that an interviewer cannot violate the routing structure of the questionnaire. Note the difference with the static routing as supported by the CADI-machine. This difference arises from the completely different uses of CADI and CATI. With CADI, we want to copy a form and check what has gone wrong; with CATI, we want to prevent anything going wrong.

Just as with the CADI-machine, a user can move backwards through a questionnaire, turning pages. Keys are implemented to move backwards just one question, a complete page, or to go to the beginning of the interview. Moving backwards through the questionnaire is, of course, constrained by the routing structure of the questionnaire: a user can only move backwards to questions that are on the routing path. The interviewer can always change previously answered questions. This implies that the routing can be changed at any time during the interview. Previously entered answers will reappear if the routing is restored.
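
A minimal sketch, in Python and not the CATI-machine itself, of this dynamic routing: the route is re-evaluated after every answer, answers to questions that drop off the route are kept, and they reappear when the routing is restored. The route function mirrors the example of Figure 2; the answers are invented.

    # Illustrative sketch (not the CATI-machine): dynamic routing re-evaluated
    # after every answer.  Answers to questions that drop off the route are kept,
    # so they reappear if the routing is later restored.
    def route(answers: dict) -> list:
        """Return the questions currently on the routing path (cf. Figure 2)."""
        path = ["Sex", "Age", "MarStat"]
        if answers.get("Age", 0) > 15:
            path.append("Job")
            if answers.get("Job") == "Yes":
                path.append("KindJob")
        return path

    answers = {"Sex": "Male", "Age": 40, "Job": "Yes", "KindJob": "baker"}
    print(route(answers))          # ['Sex', 'Age', 'MarStat', 'Job', 'KindJob']

    # The interviewer goes back and changes Age; Job and KindJob drop off the
    # route but their answers are not discarded.
    answers["Age"] = 12
    print(route(answers))          # ['Sex', 'Age', 'MarStat']

    # Restoring the routing makes the previously entered answers reappear.
    answers["Age"] = 40
    print("KindJob" in route(answers), answers["KindJob"])   # True baker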

The CATI-machine applies dynamic error checking. As soon as a consistency error is encountered, an error message is displayed on the screen, together with a list of all questions involved. The user may select the question to correct the error.

The BLAISE language provides special blocks to describe a non-response conversation or an appointment conversation, including complex routing and check-paragraphs. These may be invoked in the route-paragraph of the questionnaire or by means of a function key.

The CAPI-machine is a program to be used on a hand-held computer. CAPI (computer assisted personal interviewing) is a recent and promising development in the area of data collection. CAPI combines the advantages of face-to-face interviewing with those of CATI interviewing. The program in the hand-held computer is in control of the interview, i.e. it determines the routing and checks the consistency of answers. Since checking takes place during the interview, errors can be corrected on the spot. After successful completion of the interview, a clean record is stored in the hand-held computer. Later on, the data can be transmitted to the office by modem and telephone, or by mailing a diskette.

The emergence of powerful hand-held computers, running under MS-DOS, has greatly facilitated the task of generating programs for CAPI. In fact, a CAPI-program is a complete CATI-program, except that the call management module is stripped off.

VI. CONCLUSION

Experiments carried out by the Netherlands Central Bureau of Statistics with the predecessor of the BLAISE system (the interview language QUEST on a hand-held microcomputer running under CP/M; Bemelmans-Spork and Sikkel, 1985a and 1985b) showed that interviewers were able to work with the new CAPI technology, and that the use of microcomputers in a face-to-face interview gave the respondents no reason for anxiety. Owing to the success of these experiments, the Netherlands Central Bureau of Statistics started in 1987 with full-scale use of hand-held computers in a regular survey, see Van Bastelaer et al (1987). The data of the Labour Force Survey are now collected by means of hand-helds. Starting in the spring of 1987, ultimately 300 interviewers equipped with hand-helds will visit 10 000 addresses each month. After a day of interviewing, the batteries of the hand-held computer are recharged at night, and the information collected that day is transmitted by telephone to the central office. The next morning, the interviewer will find a recharged machine with a clean workspace, ready for new interviews.

The BLAISE system has been tested since the middle of 1986, in its various stages of development. The BLAISE environment, the CADI-machine, and the SPSS setup facility have been used successfully for a substantial number of surveys. The other options are available as prototypes and serious testing is scheduled for 1988. The BLAISE system is implemented to run on (a network of) microcomputers under MS-DOS. To use the BLAISE environment, one needs a microcomputer with 640 Kb of RAM, and a hard disk. To run the generated applications, one needs a microcomputer with 640 Kb of memory. A hard disk is not necessary, but eases processing of larger applications in the CADI or CATI approach.

The generation of setups to serve packages for statistical analysis and tabulation is clearly an open-ended part of the BLAISE system. Ideally, one would like the BLAISE system to provide setups for any package that could fruitfully be used in survey data processing. No attempt has yet been made, however, to solve this problem in such generality. In fact, the BLAISE system currently generates setups for the SPSS statistical package only (for several of its versions). In the future, a standard interface will be developed that allows for easy adaptation to many packages.

The BLAISE system has been carefully documented (also in English) (Bethlehem et al., 1987a, 1989d). There are extensive descriptions of the use of the BLAISE environment and of the CADI-machine. The language is described in two documents. There is a BLAISE tutorial that gives an easy-to-read introduction to the BLAISE language, and there is a BLAISE reference manual that gives a minute and formal description of the BLAISE language. Experience has shown that knowledge of these documents suffices to use the BLAISE system successfully, even for those who did not have previous experience with computers.

SAS USAGE IN DATA EDITING

by Dania P. Ferguson United States Department of Agriculture, National Agricultural Statistics Service

Abstract: The article describes experiences using SAS for data editing application development in survey processing. The usage of SAS reduces the necessary development and maintenance resources for data editing applications. The Survey Processing System (SPS) is presented as an example of a generalized system written in SAS. The SPS was developed by the U.S. Department of Agriculture's National Agricultural Statistics Service.

I. INTRODUCTION

SAS is a statistical analysis system written by SAS Institute in Cary, North Carolina, United States of America. It is a statistical analysis language that allows data users to analyze their data without need of extensive EDP training. SAS has the power of a 4th Generation programming language. There are interfaces from SAS to various data bases including DB2 and Oracle. "Many relational databases allow the definition of views on the tables managed by the database. SAS/ACCESS software can be used to process these views and retrieve the DBMS data for use in the SAS system" (in detail in SAS Institute INC. (1990)). SAS runs on a wide range of operating systems (UNIX, MVS, CMS, VMS, PRIMOS, AOS/VS, OS/2, MS-DOS, PC DOS, VSE, and Macintosh - JMP interactive statistical analysis only).

The most obvious applications of statistical analysis packages to data editing are statistical and macro editing as well as outlier detection. SAS is well suited for these purposes. SAS is easy to use for simple within-record editing (Atkinson (1991)), yet powerful enough to use as a programming language for building generalized data editing systems.

II. SAS USAGE (MANIFESTATIONS)

SAS is used for data editing by many Statistical offices. The degree to which it is used, however, varies greatly. Some examples of SAS usage in National Statistical Offices are presented below:

Canada

Statistics Canada uses both mainframe and PC SAS. Subject matter specialists use SAS to write application specific programs whilst computer specialists use SAS to write generalized routines to perform intensive calculations for the Oracle based Generalized Edit and Imputation System (GEIS) (Cox (1991)).

France

In France, I.N.S.E.E. uses SAS for imputation, analysis, and camera-ready copy. There is a generalized SAS module for survey outlier computations; the rest are application-specific routines. SAS is also used to create quick quality control check tables.

Hungary

In the Hungarian Statistical Office, the statisticians use SAS for analysis. They start with a clean file from the mainframe. Both PC and mainframe SAS are used to edit and create a final summary.

Spain

Spain has written a generalized system in SAS and dBase III for macro editing requiring level by level disaggregation to localize errors (Ordinas (1988)).

Sweden

Statistics Sweden has used both PC and mainframe SAS to perform macro editing of many surveys. The aggregate method was used for interactive error detection on the PC (see Lindstrom "A macro-editing application developed in PC-SAS" in this publication). The input calculations are application specific, but aggregation and error detection are generalized. Statistics Sweden is hopeful that the SAS graphics now available for the Macintosh (SAS/JMP) will support the box-plot method of macro editing (see Granquist "Macro-editing -- a review of methods for rationalizing the editing of survey data" in this publication). Marking a point on a graph to see its identifier and other associated data is a requirement for the box-plot method.

Statistics Sweden has also developed a prototype macro editing procedure using a top-down method on the PC (Lindblom (1990)). SAS/AF was used to build the menu system and SAS/FSP to edit records.

United States of America

(a) U. S. Department of Agriculture, National Agricultural Statistics Service (USDA, NASS)

USDA-NASS uses SAS on the mainframe and micros. The State Statistical Offices use PC-SAS for data editing and summarization. Some applications use macro editing by the top-down and aggregate methods 1) for outlier detection and 2) to check the aggregate fit against a time series trend line.

Headquarters developed a generalized data editing system for processing National surveys in SAS. The system runs batch on the mainframe. Parameters are entered and validated on the micro using SAS/AF and SAS/FSP.

The data editing system supports survey management functions by providing an audit trail, error counts, and a status report.

The data editing system supports macro editing by the aggregate, top-down, and statistical methods. Charts are provided with the statistical edits along with a detailed listing of the identifiers and data associated with outlying points. Deterministic imputation and the substitution of previous period data are supported by the edit system. An imputation procedure is written in SAS as a general application for the Crop Surveys to substitute weighted averages within each stratum, as illustrated in the sketch below.
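
The following is only a minimal sketch, in Python rather than the NASS SAS routine, of the idea of substituting weighted averages within each stratum for missing values; the record layout, weights and values are hypothetical.

    # Illustrative sketch (not the NASS SAS routine): impute missing values of a
    # survey item with the weighted average of reported values in the same stratum.
    def impute_weighted_stratum_mean(records):
        # First pass: weighted sums of reported values per stratum.
        sums, weights = {}, {}
        for r in records:
            if r["value"] is not None:
                sums[r["stratum"]] = sums.get(r["stratum"], 0.0) + r["weight"] * r["value"]
                weights[r["stratum"]] = weights.get(r["stratum"], 0.0) + r["weight"]
        # Second pass: substitute the stratum mean where the value is missing.
        for r in records:
            if r["value"] is None and weights.get(r["stratum"], 0.0) > 0:
                r["value"] = sums[r["stratum"]] / weights[r["stratum"]]
        return records

    if __name__ == "__main__":
        data = [
            {"stratum": 1, "weight": 10, "value": 120.0},
            {"stratum": 1, "weight": 30, "value": 100.0},
            {"stratum": 1, "weight": 20, "value": None},   # imputed with 105.0
            {"stratum": 2, "weight": 5,  "value": 50.0},
        ]
        print(impute_weighted_stratum_mean(data))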

Sample select, Optimal Sample Allocation, Analysis, Data Listing, and Summary modules have also been written in SAS. Together they form a generalized Survey Processing System with batch links to a mainframe ADABAS data base.

(b) U.S. Department of Commerce:

The various Bureaus within the Department process data independently. In fact, the centers within the Bureaus often develop software for their individual use. While SAS is not used for data editing, it is used for research by some statisticians. SAS is also used for some summaries.

Edward Cary Bean, Jr. at the U.S. Bureau of the Census proposed the Vanguard System (Bean (1988)) for information processing. The system was to be written using SAS as the programming language. Development was delayed due to budget constraints. Only the file structures and a heads-down key entry module have been implemented. The design included a generalized data editing module.

III. SAS FEATURES

SAS is a user-friendly system developed to be used by statisticians to manipulate their data. SAS provides a series of procedures (PROCs) for use in statistical manipulations. These include such procedures as:

MEANS, FREQency, MULTIVARIATE, SUMMARY, MATRIX, CLUSTER, REGression and TRANSPOSE.

The procedures work on SAS datasets created using the SAS procedures: FSEDIT, CONVERT, DBASE; or the DATA step. The FSEDIT procedure is a full screen editor for key entry. Many conversion procedures are available to convert existing files to SAS format. The DATA step affords the most flexibility in file conversion and data manipulation. In the DATA step, one is not concerned with read/write loops or the definition of working storage. SAS performs these operations automatically, although the defaults may be overridden, if one so desires.

SAS supports many reporting functions with its PRINT, CHART, FORMS, PLOT, GRAPH, and CALENDAR procedures.

SAS provides an FSCALC spreadsheet procedure for the mainframe. The SAS/AF (Applications Facility) allows users to program menu driven control of the execution of the various SAS procedures. SAS/AF can be used to build an application quickly as well as to develop computer based training.

SAS provides for interactive editing using the FSEDIT procedure. With FSEDIT, range checks can be made on fields within a record. Fields can be validated based on value lists. Consistency checks can be done between fields on the record. The FSEDIT procedure can be used again after aggregation using the SAS SUMMARY procedure or a DATA step to calculate the aggregate.

If the features of FSEDIT are insufficient for interactive editing, one can write an enhanced editing procedure. This is easily accomplished with a combination of other SAS system features, such as:

SCL - Screen Control Language, SAS/AF, SAS DATA steps, and many SAS procedures.

The DATA step can be used to merge data sets and to edit hierarchical data. SAS statistical procedures can be used to calculate edit limits based on the distribution of the data. These limits can then be used as parameters to a DATA step editing routine, as the sketch below illustrates.
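
The sketch below, in Python rather than SAS, only illustrates the idea of computing edit limits from the distribution of the data and then using them as parameters for an editing pass; the two-standard-deviation rule, the group names and the values are assumptions made for the example.

    # Illustrative sketch (in Python rather than SAS): calculate edit limits from
    # the distribution of the data and then use them as parameters for an editing
    # pass over the records.
    from statistics import mean, stdev

    def calculate_limits(records, k=2.0):
        """Per-group limits: mean +/- k standard deviations."""
        by_group = {}
        for r in records:
            by_group.setdefault(r["group"], []).append(r["value"])
        return {g: (mean(v) - k * stdev(v), mean(v) + k * stdev(v))
                for g, v in by_group.items() if len(v) > 1}

    def apply_limits(records, limits):
        """Editing pass: flag records whose value falls outside the group limits."""
        flagged = []
        for r in records:
            lo, hi = limits.get(r["group"], (float("-inf"), float("inf")))
            if not lo <= r["value"] <= hi:
                flagged.append(r)
        return flagged

    if __name__ == "__main__":
        data = [{"group": "A", "value": v} for v in (10, 11, 9, 10, 12, 35)]
        limits = calculate_limits(data)
        print(limits)
        print(apply_limits(data, limits))   # the record with value 35 is flagged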

SAS has many features that make it well suited for data editing and general survey processing use. SAS is 1) easy to use by the statisticians and 2) a very powerful programming language for use by computer specialists. SAS supports enough programming tools that it can be used for generalized systems development. It is a tremendous advantage to have the programmer and statisticians speaking to the computer in the same language. It improves communications and thus, quality and productivity.

The following section presents a generalized system that was written in SAS.

IV. SURVEY PROCESSING SYSTEM (SPS)

1. Overview

The Survey Processing System was developed by the U.S. Department of Agriculture's National Agricultural Statistics Service (USDA-NASS) using SAS as the programming language. SAS was chosen mainly because NASS statisticians were already familiar with it and it could easily be used as both the programming and the parameter specification language. It afforded the unique ability of users and programmers to speak to the computer in the same language, while allowing each to work at their own level of computer expertise.

On one annual survey, the use of this system reduced the parameter coding and maintenance from 1 statistical staff year to 1 staff month. The EDP staff support was reduced from 3 staff months to 1 staff week. These figures are more interesting when one considers that a very powerful data editing system was already in use. The gain was due mostly to the fact that the system design took advantage of SAS as a common communication tool.

SPS lets the statisticians write their parameters in SAS. The system merely provides tools to shelter the statistician from the operating system and file maintenance. The design sought to simplify the path between the statisticians and their data and to take the programmer out of the communication path between the statisticians and the computer. The design forces edit specifications to be written in a modular, reusable form and automatically stores them in a shared library.

The Survey Processing System consists of the following sub-systems:

Specifications Interpreter, Sample Select, Optimal Sample Allocation, Data Validation, Data Listings, Data Analysis, and Summary.

2. Data Validation Sub-system

The Data Validation Sub-system is a batch data editing system. It is used to process large national surveys. Data is keyed and submitted by the State Statistical Offices (SSO's) to a mainframe. The parameters are compiled in Headquarters for most surveys. For some larger surveys the SSO's submit lists of upper and lower limits by commodity for their State.

The following is a list of the steps in the data validation process:

(a) Keyed data is validated as follows:
    - Identifier information is checked against the sample master.
    - State and Questionnaire Version are used to validate entries allowed on a questionnaire.

(b) The keyed data is posted to an edit data file. Optionally, an audit trail file of records before and after update is created with a date/time stamp.

(c) (Optional) Consistency checks and critical range checks are applied to the edit data file.

(d) (Optional) Ranges are generated by calculating statistics for each specified field on the edit data file, at present the mean and standard error. Ranges are usually calculated by region or stratum for each field. In a Price survey, the average price of each item (gasoline, nails) is calculated for each Farm Production region. The average price of each type of fertilizer is calculated for each Fertilizer region. Then the standard error is used to calculate limits that flag the 5% tails for each price.

(e) (Optional) Range checks are applied to the value of each entry (see the sketch after this list):
    - Default ranges can be specified for each questionnaire entry on each questionnaire version.
    - State-specific ranges override the default ranges set by Headquarters for each entry.

Ranges can be specified by parameter, or calculated using data from previous or the current survey (step (d)).

(f) A status report is generated showing the run totals for each state. These include:
    - a count of errors by (parameter assigned) error number and description;
    - a count of records with critical errors (critical errors must be resolved before summarizing).
Each State receives its own status and Headquarters receives a one page status of all states on a daily basis.

(g) (Optional) A report is generated listing samples for which a questionnaire has not been submitted (missing report listing). This step is run on every batch starting at the midpoint of a survey.
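
A minimal sketch, in Python rather than SAS, of the range-check logic of step (e): default ranges per questionnaire entry, overridden by state-specific ranges where these exist. All entry names, states and limits are hypothetical.

    # Illustrative sketch (not the SPS code): apply range checks where default
    # ranges per questionnaire entry can be overridden by state-specific ranges.
    default_ranges = {            # per questionnaire entry: (lower, upper)
        "corn_acres":  (0, 5000),
        "wheat_price": (1.0, 10.0),
    }
    state_ranges = {              # state-specific overrides
        ("IA", "corn_acres"): (0, 20000),
    }

    def range_errors(record):
        """Return the entries of one keyed record that fall outside their range."""
        errors = []
        state = record["state"]
        for entry, value in record["entries"].items():
            lower, upper = state_ranges.get((state, entry), default_ranges[entry])
            if not lower <= value <= upper:
                errors.append((entry, value, (lower, upper)))
        return errors

    if __name__ == "__main__":
        rec = {"state": "IA", "entries": {"corn_acres": 12000, "wheat_price": 12.5}}
        print(range_errors(rec))
        # corn_acres passes because of the Iowa override; wheat_price is flagged.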

V. CONCLUSION

SAS is used to write quick edit logic for a one-time survey as well as to write generalized editing systems. It is used to write complex statistical subroutines for data base systems. SAS has many features that make it well suited for data editing and general survey processing use. SAS is 1) easy to use by statisticians and 2) a very powerful programming language for use by computer specialists.

SAS is easy to learn for non-EDP personnel. This gives subject matter specialists the opportunity to edit and analyze their data without EDP staff, thus providing the freedom and flexibility required for pilot surveys and small or one-time surveys.

SAS reads and writes many file structures, which allows SAS to be used for routines that enhance existing systems. SAS should be thought of as a replacement for 3rd Generation Languages (3GLs). Using SAS for data editing and other purposes will soon result in increased programming efficiency. SAS procedures provide many utilities that would otherwise need to be coded in 3GLs. This frees the programming staff to concentrate on more serious development tasks.

As machine costs decrease and computer specialist salaries increase we need to find ways to quickly and easily write reusable, maintainable systems. SAS is a solution that is especially well suited to Survey Processing needs. SAS gives subject matter specialists tools to process their own data. Most importantly, SAS provides a language common to computer specialists and subject matter specialists.

A MACRO-EDITING APPLICATION DEVELOPED IN PC-SAS

by Klas Lindstrom Statistics Sweden

Abstract: The article presents a system for the macro-editing of the Survey of Employment and Wages in Mining, Quarrying and Manufacturing, developed in Statistics Sweden. The system runs on a PC using SAS software.

I. INTRODUCTION

The Swedish monthly survey on employment and wages in Mining, Quarrying, and Manufacturing is a sample survey. Data are collected every month from about 3000 establishments belonging to about 2500 enterprises. The main variables collected are: the Number of operation days for the period, the Number of operation days for the month, the Number of working hours for the period, the Sum of wages for the period and the Number of employed.

II. THE MACRO-EDITING METHOD

The macroediting method used was the aggregate method. The idea behind this method is to use an error-detecting system twice. Checks are first made on aggregates and then on records belonging to suspicious (flagged) aggregates. The acceptance limits for the checks are based on the distribution function of the check variables.

The edits used are based on computing the following variables at branch level. The weights for the previous period are used in computing these variables.

T1 = 100,0 * WORK_HOUR(August) / WORK_HOUR(June)
S1 = WORK_HOUR(August) - WORK_HOUR(June)

T3 = 100,0 * SUM_WAGE(August) / SUM_WAGE(June)
S3 = SUM_WAGE(August) - SUM_WAGE(June)

T4 = 100,0 * HOURLY_WAGE(August) / HOURLY_WAGE(June)
S4 = HOURLY_WAGE(August) - HOURLY_WAGE(June)

T5 = 100,0 * EMPLOYED(August) / EMPLOYED(June)
S5 = EMPLOYED(August) - EMPLOYED(June)

These ratios and differences at branch level are listed in ascending order. On the basis of these lists the acceptance limits for the checks at branch level are determined. The checks which are used at branch level are of the following type:

If (Tx < lower limit_Tx & Sx < lower limit_Sx) or (Tx > upper limit_Tx & Sx > upper limit_Sx) then flag the branch.
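
The following is only a sketch, in Python rather than SAS, of the two-level flagging described above: a branch is flagged when its ratio T and difference S are both below their lower limits or both above their upper limits, and only the establishments of flagged branches are then inspected. The branch totals, establishment data and limits are invented for the example.

    # Illustrative sketch (not the production SAS application) of the aggregate
    # method: checks on branch aggregates first, then on the establishments
    # belonging to flagged branches.
    def t_and_s(current, previous):
        """Ratio (in percent) and difference of a check variable."""
        return 100.0 * current / previous, current - previous

    def suspicious(t, s, t_limits, s_limits):
        (t_lo, t_hi), (s_lo, s_hi) = t_limits, s_limits
        return (t < t_lo and s < s_lo) or (t > t_hi and s > s_hi)

    def flag_branches(branch_totals, t_limits, s_limits):
        """First pass: checks on the branch aggregates."""
        return [branch for branch, (aug, jun) in branch_totals.items()
                if suspicious(*t_and_s(aug, jun), t_limits, s_limits)]

    if __name__ == "__main__":
        # WORK_HOUR totals per branch: (August, June)
        branch_totals = {"B1": (130000, 100000), "B2": (101000, 100000)}
        t_limits, s_limits = (87.0, 120.0), (-4000.0, 4000.0)
        flagged = flag_branches(branch_totals, t_limits, s_limits)
        print(flagged)                                    # ['B1']

        # Second pass: the same ratios and differences, but only for the
        # establishments belonging to flagged branches.
        establishments = [
            {"branch": "B1", "aug": 30000, "jun": 15000},
            {"branch": "B1", "aug": 10000, "jun": 10200},
            {"branch": "B2", "aug": 5000, "jun": 4000},   # branch not flagged
        ]
        for e in (e for e in establishments if e["branch"] in flagged):
            t, s = t_and_s(e["aug"], e["jun"])
            print(e["branch"], "T =", round(t, 1), "S =", s)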

III. THE SAS-APPLICATION

For the test a simple prototype was created using SAS on the PC. The prototype was built using the SAS/AF software. It contains a number of menus to be filled in by the user.

Figure 1 shows the main menu. The editing is supposed to be done for a number of batches. Each batch is composed of the forms which have been data-entered at the same time. This reflects the production system, where the forms arriving up to a specified date are data-entered at the same time. To edit a batch the user passes through the different alternatives on the menu.

The second and third alternatives on the menu were not used in the evaluation. The "cleaning" of the records was moved to alternative 6. The purpose of the "cleaning" is to handle records with item nonresponse.

Figure 1:

Main Menu

1. Import a data entry batch

2. Microedit for cleaning

3. Updating (cleaning)

4. Create macro lists

5. Macroedit

6. Microedit

7. Update

The first thing to do is to import the data entry file from a diskette into the SAS-system. At the same time two of the variables are imputed, namely Number of operation days for the month and Number of operation days for the period, when they are missing. This is done by choosing alternative 1 on the Main Menu and then entering the batch number and the file name on the menu displayed in figure 2.

This will create a SAS-job which will import the data to a SAS-dataset and impute the variables Number of operation days for the period and Number of operation days for the month. As a result a list of the input records and a SAS-dataset will be produced.

Figure 2:

A SAS-dataset named miaX will be created by importing the file. X is the batch number

Give the batch number

Give the file name to import

Once the batch has been imported, the next step is to create the macro lists which shall be used to determine the acceptance limits for the macro editing. This is done by choosing alternative 4 on the Main Menu and then entering the batch number on the menu displayed in figure 3. As a result a SAS-job is created that will produce the desired lists.

Figure 3:

Give the batch number for the macro listings

On the basis of the lists it is possible to determine the acceptance limits for the macroediting. Choose alternative 5 on the Main Menu and then enter the acceptance limits together with the batch number on the menu displayed, see figure 4. This will create a SAS-job that will produce a list of the flagged branches and a SAS-dataset.

The SAS dataset will contain all the variables from the input and new variables indicating whether the record belongs to a flagged branch and in that case which check has caused the flagging.

Lists on the establishment level are also produced here. The lists are the same as on branch level but the differences and the ratios are calculated on establishment level. Only establishments belonging to a flagged branch for that variable appear on the list.

Figure 4:

Give the batch number for macroediting
Give the limits for the checking

  Check            Ratio                 Difference
                   lower     upper       lower     upper
                   limit     limit       limit     limit
  1 WORK HOUR      ----      ----        ----      ----
  2 SUM WAGE       ----      ----        ----      ----
  3 HOURLY WAGE    ----      ----        ----      ----
  4 EMPLOYED       ----      ----        ----      ----

These lists are used for determining the acceptance limits on the micro level. When the limits have been determined, the user chooses alternative 6 on the Main Menu and the menu in figure 5 will be displayed.

Here you enter the number of the batch together with the acceptance limits for the checks on micro level. As a result a SAS-job will be created that produces a new SAS-dataset containing all previously existing variables together with flags from the micro editing. The job will also produce a list of all flagged records.

Figure 5:

Give the batch number for microediting
Give the limits for the checking

  Check            Ratio                 Difference
                   lower     upper       lower     upper
                   limit     limit       limit     limit
  1 WORK HOUR      ----      ----        ----      ----
  2 SUM WAGE       ----      ----        ----      ----
  3 HOURLY WAGE    ----      ----        ----      ----
  4 EMPLOYED       ----      ----        ----      ----

On the basis of the error list a decision about which records to correct can be made. Then you choose alternative 7 on the Main Menu. This will result in the menu shown in figure 6 being displayed. On this menu you enter the batch number and thereafter the flagged records will be displayed one by one on the screen. Here it is possible to correct the record before continuing to the next flagged record.

IV. THE RESULTS

A simulation of the editing was done on the survey for August 1989. The data was checked against the June 1989 survey.

The data for macroediting was divided into two batches, the first one with 1090 records and the second one with 1961 records.

Before the data was edited the variables Number of operation days for the period and Number of operation days for the month were imputed. When those variables had no value they received a value based on the value for the previous month. In the first batch 25 records were imputed and in the second batch 26 records were imputed.

After the imputation the macro lists were created. These were used for determining the acceptance limits for editing rules at macro level.

Eight different lists were produced. The lists contain the difference between the present and the previous survey at branch level, and the ratio between the present and previous survey. These lists were produced for the variables Number of worked hours, Sum of wages, Hourly wage and Number of employed. This gives two lists for every variable, one sorted in ascending order by the value of the difference, and one sorted in ascending order by the value of the ratio.

On the basis of the lists the limits for the acceptance interval on macro level were determined. The limits are presented in Table 1. The results of using these limits are presented in Table 2.

Table 1

Acceptance limits for the macroediting on branch level

                               Batch 1                  Batch 2
                           ratio   difference       ratio   difference
Number of worked hours
  lower limit               87%       -4000          84%      -14000
  upper limit              120%        4000         140%       25000
Sum of wages
  lower limit               87%      -30000          89%     -300000
  upper limit              113%      150000         150%     1000000
Hourly wage
  lower limit               95%        -3.0          95%        -4.0
  upper limit              105%         3.0         110%         6.0
Number of employed
  lower limit               95%         -40          90%        -100
  upper limit              105%          40         105%         100

Table 2

Number of branches and flagged branches per checking rule and batch at the macroediting on branch level.

                               Batch 1                   Batch 2
                        Number of    Flagged      Number of    Flagged
                        branches     branches     branches     branches
Number of worked hours      78          13            86          13
Sum of wages                78          15            86          13
Hourly wage                 78          19            86           7
Number of employed          78          17            86           9
All checking rules          78          40            86          24

For all the flagged branches, lists were produced of the same type as earlier, but now on establishment level instead of branch level. The lists contained all establishments belonging to a flagged branch for the variable. The number of establishments belonging to a flagged branch is presented in Table 3.

Table 3

Number of establishments belonging to a flagged branch per checking rule.

                          Batch 1    Batch 2
Number of worked hours      175        302
Sum of wages                 89        308
Hourly wage                 249        194
Number of employed          204        159

On the basis of these lists the acceptance limits for the editing on establishment level were determined. The limits are presented in Table 4.

Table 4

Acceptance limits for the editing on establishment level.

                               Batch 1                  Batch 2
                           ratio   difference       ratio   difference
Number of worked hours
  lower limit               85%      -10000          70%      -14000
  upper limit              150%       10000         130%        8000
Sum of wages
  lower limit               75%     -500000          76%     -700000
  upper limit               40%      400000         150%      700000
Hourly wage
  lower limit               85%       -10.0          80%        10.0
  upper limit              115%        10.0         115%        10.0
Number of employed
  lower limit               85%         -40          83%         -40
  upper limit              115%          50         110%          20

These rules were applied to the material together with a number of validity checks. The result of this editing is presented in Table 5.

An establishment may have been flagged by more than one checking rule. This means that the total number of flagged establishments is less than the sum of the different checking rules.

V. COMPARING THE MACRO-EDITING AND THE PRODUCTION EDITING

The flagged establishments in the macroediting were imputed with the edited values from production: when a record had been corrected in production, the same corrections were applied to the macroedited record. The number of corrected establishments is presented in Table 5.

Table 5

Number of flagged and corrected establishments after editing on establishment level.

                               Batch 1                   Batch 2
                         Flagged   Corrected       Flagged   Corrected
Number of worked hours      13                        23
Sum of wages                20                        22
Hourly wage                 16         15             21         31
Number of employed          18                        26
Validity checks             22         21             53         50
All                         80         36            118         81

After the corrections had been applied, estimates were made for the Number of worked hours, the Sum of wages, the Hourly wage and the Number of employed on branch level. The absolute percentage deviation was then computed between the estimates of the production and the estimates of the experiment. These results are presented in Table 6.

Table 6

Number of deviations between data edited in production and in the experiment as a function of the absolute percentage deviation for these estimates on branch level. The results from a previous study are given in parentheses.

Absolute percentage deviation   Number of employed   Work hours   Sum of wage   Hourly wage   Sum
0                                    72(78)             45(77)      36(66)        36(68)      191(289)
0,0-0,1                               1(4)               6(2)        6(12)        15(8)        28(26)
0,1-0,4                              10(3)              10(6)       14(4)         18(8)        52(21)
0,5-0,9                               3(1)               6(1)        6(5)          6(3)        21(10)
1,0-1,9                               0(1)               9(1)       15             7(1)        31(3)
2,0-2,9                               1                  3           2             3            9(0)
3,0-3,9                               1(1)               2(1)        1             1            5(2)
4,0-4,9                               0                  2           2(1)          1            5(1)
>4,9                                  0                  5           4             1           10(0)

When comparing the results from this study with the results from the previous study it can be seen that the deviations are larger in this study.

A closer look at the branches with a deviation greater than 5% shows that there are 5 branches which cause those deviations. For those branches it should be sufficient to correct one record per branch. Then the deviations should be less than 5%.

The reason why these records were not flagged at the macroediting is that the difference and ratio for the branch are inside the acceptance interval. This means that the branch will not be flagged and the establishments in that branch will not be checked at all in the microediting.

VI. FURTHER EDITING

To try to find the records behind the large deviations, the data was microedited once more. At this microediting, ratio checks were applied to all records which did not belong to a flagged branch. The following ratios were used in the checks:

T1=100,0*WORK_HOUR (August)/WORK_HOUR (June)

T3=100,0*SUM_WAGE (August)/SUM_WAGE (June)

T4=100,0*HOURLY_WAGE (August)/HOURLY_WAGE (June)

T5=100,0*EMPLOYED (August)/EMPLOYED (June)

The ratio checks were of the following type:

If (Tx < lower limit_Tx) or (Tx > upper limit_Tx) then flag the establishment.

Tests were made with different limits for the ratio checks. The results of these tests are presented in Table 7.

Table 7

Number of flagged and thereof corrected establishments at different limits for the ratio checks.

Upper   Lower        Number of flagged               Number of corrected
limit   limit    Batch 1   Batch 2   Total       Batch 1   Batch 2   Total
 500      20        0         15       15            0         12      12
 400      25        3         19       22            2         15      17
 300      33        9         25       36            4         16      20
 200      50       45         68      113           17         29      46

The changes of the estimates which these corrections led to are presented in Tables 8,9,10, and 11.

Table 8

Number of deviations between data edited in production and in the experiment as a function of the absolute percentage deviation for these estimates on branch level after further microediting with ratio<20 or ratio>500 for not flagged branches.

Absolute percentage deviation   Number of employed   Work hours   Sum of wage   Hourly wage   Sum
0                                     72                 47           40            39        198
0,0-0,1                                1                  5            6            16         28
0,1-0,4                               10                 12           16            20         58
0,5-0,9                                3                  9            8             5         25
1,0-1,9                                0                 10           14             6         30
2,0-2,9                                1                  3            2             1          7
3,0-3,9                                1                  1            0             0          2
4,0-4,9                                0                  0            1             1          2
>4,9                                   0                  1            1             0          2

Table 9

Number of deviations between data edited in production and in the experiment as a function of the absolute percentage deviation for the estimates on branch level after further microediting with ratio<25 or ratio>400 for not flagged branches.

Absolute percentage deviation   Number of employed   Work hours   Sum of wage   Hourly wage   Sum
0                                     72                 48           41            40        201
0,0-0,1                                1                  6            6            18         31
0,1-0,4                               10                 12           16            18         56
0,5-0,9                                3                  8            8             4         23
1,0-1,9                                0                 10           14             6         30
2,0-2,9                                1                  3            2             1          7
3,0-3,9                                1                  1            0             0          2
4,0-4,9                                0                  0            1             1          2
>4,9                                   0                  0            0             0          0

Table 10

Number of deviations between data edited in production and in the experiment as a function of the absolute percentage deviation for the estimates on branch level after further microediting with ratio<33 or ratio>300 for not flagged branches.

Absolute percentage deviation   Number of employed   Work hours   Sum of wage   Hourly wage   Sum
0                                     72                 48           41            40        201
0,0-0,1                                1                  7            6            16         30
0,1-0,4                               10                 12           17            20         59
0,5-0,9                                3                  8            8             4         23
1,0-1,9                                0                  8           12             6         26
2,0-2,9                                1                  4            3             1          9
3,0-3,9                                1                  1            0             0          2
4,0-4,9                                0                  0            1             1          2
>4,9                                   0                  0            0             0          0

Table 11

Number of deviations between data edited in production and in the experiment as a function of the absolute percentage deviation for the estimates on branch level after further microediting with ratio<50 or ratio>200 for not flagged branches.

Absolute percentage deviation   Number of employed   Work hours   Sum of wage   Hourly wage   Sum
0                                     73                 51           43            42        209
0,0-0,1                                1                  7           11            16         35
0,1-0,4                                9                 14           17            20         60
0,5-0,9                                3                  7            7             4         29
1,0-1,9                                0                  5            8             5         18
2,0-2,9                                1                  3            2             1          7
3,0-3,9                                1                  1            0             0          2
4,0-4,9                                0                  0            0             0          0
>4,9                                   0                  0            0             0          0

As can be seen from the tables, the records which caused the largest deviations are flagged when the acceptance limits of the ratio checks are <25 and >400. Continuing the checking with smaller acceptance intervals lessens the deviations, but at the same time the number of flagged records increases.

The selected strategy, combining the macroediting with ratio checks with very wide acceptance intervals for records belonging to a non-flagged branch, gives the flags and corrections presented in Table 12.

Table 12

Number of flagged and corrected establishments by batch and type of flag, for the macroedited data with ratio checks for ratio<25 or ratio>400.

                            Number of flagged           Number of corrected
                             establishments               establishments
                         Batch 1  Batch 2  Total      Batch 1  Batch 2  Total
Number of worked hours      13       23      36
Sum wage                    20       22      42          15       31      46
Hourly wage                 16       21      37
Number of employed          18       26      44
Validity checks             22       53      75          21       50      71
Ratio checks                 3       19      22           2       13      17
Total                       83      137     220          38       96     134

A comparison between the results in the production and the experiment is presented in table 13.

Table 13

Number of edited, flagged and corrected establishments in production and in the experiment.

                 Number        Flagged            Corrected
                 edited     number     %       number     %
Production        3051       1087     36         306     28
Experiment        3051        220      7         134     61

Macroediting caused a reduction of the number of flagged establishments of 80 percent; 61 percent of the flagged establishments were corrected compared with 28 percent at production.

The reduction in the number of flags is probably slightly greater than it would be if the method were used in ordinary production, since in that case the editing would be divided into more batches, and this would increase the number of flags from the macroediting.