The Small Size of Caribbean Countries Makes Anonymization Relatively More Difficult and Standard Methods Are Not Always Directly Applicable
ISSN 1727-9917
ECLAC subregional headquarters for the Caribbean
Studies and Perspectives Series, No. 49

Dissemination of Caribbean census microdata to researchers
Including an experiment in the anonymization of microdata for Grenada and Trinidad and Tobago

Francis Jones
Kristin Fox

This document has been prepared by Francis Jones, Population Affairs Officer, of the Statistics and Social Development Unit of the subregional headquarters for the Caribbean of the Economic Commission for Latin America and the Caribbean (ECLAC), and Kristin Fox, consultant (formerly Databank Manager, the Derek Gordon Databank, University of the West Indies). The views expressed in this document, which has been reproduced without formal editing, are those of the authors and do not necessarily reflect the views of the Organization.

United Nations publication
ISSN 1727-9917
LC/L.4134
LC/CAR/L.486
Copyright © United Nations, February 2016. All rights reserved.
Printed at United Nations, Santiago, Chile
S.15-01381

Member States and their governmental institutions may reproduce this work without prior authorization, but are requested to mention the source and inform the United Nations of such reproduction.

ECLAC – Studies and Perspectives Series – The Caribbean – No. 49 Dissemination of Caribbean census microdata...

Contents

Abstract
Introduction
I. Dissemination of census microdata
   A. Technical disclosure control methods for census data
      1. Disclosure control concepts and scenarios
      2. Analysis of disclosure risk and methods of disclosure control
   B. Administrative arrangements for access to census data
      1. Public use files
      2. Licensed use files
      3. Secure data laboratories
      4. Remote access facilities
      5. Data archives
   C. The legal context for release of microdata
II. The creation of microdata release files for Grenada and Trinidad and Tobago
   A. Caribbean census datasets
   B. The creation of microdata release files
      1. Removal of direct identifying variables
      2. Sampling of records
      3. Analysis of disclosure risk, the sampling fraction, and recoding of indirect identifying variables
      4. Data swapping
      5. Recoding of non-identifying variables
   C. Release conditions: licensed or public use
III. Discussion: the dissemination of Caribbean census microdata
   A. Demand for Caribbean census microdata
   B. The utility of small samples
   C. Modes of dissemination
IV. Conclusions
Bibliography
Annex
Studies and Perspectives Series: issues published

Tables

TABLE 1 MEASURES OF DISCLOSURE RISK FOR A 10 PER CENT SAMPLE OF RECORDS FROM THE GRENADA 2011 CENSUS (9 825 PERSONS)
TABLE 2 MEASURES OF DISCLOSURE RISK FOR SAMPLES OF ANONYMIZED RECORDS FROM THE GRENADA 2011 CENSUS
TABLE 3 MEASURES OF DISCLOSURE RISK FOR SAMPLES OF ANONYMIZED RECORDS FROM THE TRINIDAD AND TOBAGO 2011 CENSUS
TABLE 4 VARIABLES TO BE REMOVED FROM THE MICRODATA FILES
TABLE 5 RECODING OF VARIABLES FOR ANONYMIZATION, GRENADA 2011 CENSUS
TABLE 6 RECODING OF VARIABLES FOR ANONYMIZATION, TRINIDAD AND TOBAGO 2011 CENSUS

Figures

FIGURE 1 NUMBER OF POPULATION UNIQUES IN SAMPLES OF RECORDS FROM THE 2011 CENSUS OF GRENADA WITH RESPECT TO SELECTED SETS OF KEY VARIABLES
FIGURE 2 NUMBER OF POPULATION UNIQUES IN SAMPLES OF RECORDS FROM THE 2011 CENSUS OF TRINIDAD AND TOBAGO WITH RESPECT TO SELECTED SETS OF KEY VARIABLES
FIGURE 3 SAMPLE SIZES FOR 10 AND 20 PERCENT SAMPLES OF CENSUS RECORDS COMPARED WITH SAMPLE SIZES OF TYPICAL HOUSEHOLD SURVEYS
FIGURE A.1 THE NUMBER OF POPULATION UNIQUES IN A 20 PERCENT SAMPLE OF RECORDS FROM THE 2011 CENSUS OF GRENADA CALCULATED USING DIFFERENT TREATMENTS OF MISSING DATA

Abstract

Caribbean census microdata are not easily accessible to researchers. Although there are well-established and commonly used procedures (technical, administrative and legal) for disseminating anonymized census microdata to researchers, they have not been widely used in the Caribbean.
The small size of Caribbean countries makes anonymization relatively more difficult, and standard methods are not always directly applicable. This study reviews commonly used methods of disseminating census microdata and considers their applicability to the Caribbean. It demonstrates the application of statistical disclosure control methods using the census datasets of Grenada and Trinidad and Tobago, and considers various possible designs of microdata release file in terms of disclosure risk and utility to researchers. It then considers how various forms of microdata dissemination (public use files, licensed use files, remote data access and secure data laboratories) could be used to disseminate census microdata. It concludes that there is scope for a substantial expansion of access to Caribbean census microdata and that, through collaboration with international organisations and data archives, this can be achieved with relatively little burden on statistical offices.

Introduction

Over the last twenty-five years, statistical offices worldwide have increasingly sought to meet the demand from researchers for greater access to statistical microdata. Statistical disclosure control methods have been developed to enable statisticians to release microdata in a controlled way which protects the privacy and statistical confidentiality of individuals and other entities. There has been a substantial growth in the volume of microdata made available to researchers in universities and other organizations. Census microdata are among the most useful to social researchers because of the range of social and demographic information collected in censuses, the information which is available about small population subgroups, and the