Mosaic data files: Documentation of harmonized variables, Version 1.8

Dr. Siegfried Gruber
Max Planck Institute for Demographic Research
Laboratory of Historical Demography
Konrad-Zuse-Straße 1, 18057 Rostock - Germany
+49 381 2081-248
mailto:[email protected]
http://www.demogr.mpg.de
http://www.censusmosaic.org

Sep. 12th, 2013

Table of Contents

About Mosaic data files and their distribution
Overview of variables
    General overview
    Table 1: Harmonized variables of Mosaic data files by data level and alphabetic order
    Table 2: Order of harmonized variables in Mosaic data files
Variable description in alphabetic order
    age (age)
    country code (country)
    enumeration identification number (id_enum)
    first name (fname)
    group quarter status (gq)
    household identification number (id_hhold)
    household size (hhsize)
    household weight (hhwt)
    last name (lname)
    literacy (lit)
    marital status (marst)
    occupational code in OCCHISCO (occhisco)
    occupational title (occupat)
    person identification number (id_pers)
    person weight (perwt)
    place code (place)
    presence of person at enumeration (presence)
    quality flag age (qage)
    quality flag household (qhhold)
    quality flag household relationship code (qrelate)
    quality flag marital status (qmarst)
    quality flag sex (qsex)
    region (region)
    relationship to household head (relate)
    relationship to household head original (relateor)
    religion (relig)
    sex (sex)
    type of enumeration (enumtype)
    urban-rural status (urban)
    year of enumeration (year)

About Mosaic data files and their distribution

The Mosaic project distributes European historical census microdata. Every interested researcher can download data for his/her own research free of charge. The only condition is that you register as a user with the Mosaic project. Please keep in mind that most data released by the Mosaic project is not representative of a region or country, especially in cases where the data stems from research concentrating on one specific settlement.

The following requirements exist for a data file to be included in the Mosaic project:
• The data source should list individual persons, preferably by name
• The data source should list all persons of a settlement or area, not only household heads, men, or adult people
• The data source should list individuals by residence units (houses, hearths, domestic groups, or households)

Characteristics that should be either given explicitly or possible to infer:
• Age
• Sex
• Relationship to household head
• Marital status
• Place of enumeration
• Year of enumeration
• (first name)
• (last name)

Harmonized Mosaic data files are distributed via the data section of the Mosaic website (http://www.censusmosaic.org/cgi-bin/index_data.plx).
There is a zipped file available for each dataset, which contains three files:
• a CSV file with codes for each variable
• a CSV file with value labels for each variable
• a readme file with the citation you must use for this dataset and a short description of the dataset

The CSV file with codes for each variable can be imported into any software able to read CSV files. Scripts are available for importing CSV files into SPSS and R. The CSV file with value labels for each variable can be used by persons who do not want to use statistical software for analyzing the data, but prefer to use a spreadsheet program like MS Excel or OpenOffice.org Calc.

Note: These files use Unicode (UTF-8) for character encoding, the “.” as the decimal separator, and the semicolon as the column delimiter, and they list the variable names in the first line of the file.

Overview of variables

General overview

This file contains the documentation of the 30 harmonized variables of the Mosaic data files. These consist of variables of three different levels:
• 4 variables defining the data set,
• 8 variables defining the household, and
• 18 variables defining the person.

Variables defining the data set: These are variables which are the same for all cases of one data set. This information is provided either as part of the form used for the enumeration, or as information on material accompanying the enumeration, or as information from other sources.

Variables defining the household: These are variables which are the same for all persons of one household. Some of them are the same for many or even all persons of one data set (e.g. region). This information is partly provided by the enumeration form, but sometimes derived from other sources. It also includes one quality flag variable.

Variables defining the person: These are variables which pertain to individual persons.
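
The file conventions and data levels described above can be illustrated with a short Python sketch. It is not one of the import scripts distributed by the Mosaic project; it only assumes the documented layout (UTF-8, semicolon as delimiter, variable names in the first line, “.” as decimal separator). The sample rows and their values are invented for illustration, and the helper names read_mosaic_csv and varying_within are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample in the documented layout: semicolon-delimited,
# variable names in the first line, "." as decimal separator.
# All values below are invented for illustration only.
SAMPLE = """id_hhold;year;hhsize;hhwt;age
1;1867;2;1.00;40
1;1867;2;1.00;38
2;1867;1;1.00;63
"""


def read_mosaic_csv(handle):
    """Return all rows as dicts keyed by the variable names in the first line."""
    return list(csv.DictReader(handle, delimiter=";"))


def varying_within(rows, variable, group_by=None):
    """Return the groups in which `variable` takes more than one value.

    With group_by=None the whole file is treated as one group (data set level).
    """
    seen = defaultdict(set)
    for row in rows:
        key = row[group_by] if group_by else "data set"
        seen[key].add(row[variable])
    return sorted(key for key, values in seen.items() if len(values) > 1)


rows = read_mosaic_csv(io.StringIO(SAMPLE))

# Data set level: one value for all cases of the data set.
print(varying_within(rows, "year"))
# Household level: one value for all persons of a household.
print(varying_within(rows, "hhsize", "id_hhold"))
# Person level: values may differ within a household.
print(varying_within(rows, "age", "id_hhold"))
```

For a downloaded file, the in-memory sample would be replaced by something like open(path, encoding="utf-8", newline=""), where path is a placeholder for the location of the unzipped CSV file with codes. Values are read as text, so numeric variables such as hhwt still need an explicit conversion, for example with float().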
Person-level information is generally provided by the enumeration itself, but some variables, such as the four quality flag variables, have been constructed and added.

The following table gives basic information about all variables:
• variable name
• variable label
• number of digits or characters
• kind of data (numeric or alphabetic data)
• origin of data (original or added data)
• comparison to IPUMS-International
• comparison to NAPP
• data level (data set, household, or person)

In general, most variables are designed according to their IPUMS-International (https://international.ipums.org/international/) and/or NAPP (http://www.nappdata.org/napp/citation.shtml) counterparts and can therefore be used easily in comparative research with data from these databases.

Table 1: Harmonized variables of Mosaic data files by data level and alphabetic order

variable name | variable label | no. of digits or characters | kind of data | origin of data | IPUMS-International | NAPP | data level
country | country code | 3 | numeric | original or added | CNTRY: 3 digits | CNTRY: 3 digits | data set
enumtype | type of enumeration | 2 | numeric | added | not existing | not existing | data set
id_enum | enumeration identification number | 6 | numeric | added | SAMPLE: 4 digits | SAMPLE: 4 digits | data set
year | year of enumeration | 4 | numeric | original or added | YEAR: 4 digits | YEAR: 4 digits | data set
gq | group quarter status | 1 | numeric | added | GQ: 2 digits | GQ: 1 digit | household
hhsize | household size | 4 | numeric | computed | PERSONS: 3 digits | NUMPERHH: 4 digits | household
hhwt | household weight | 4 (2 decimal places) | numeric | computed | WTHH: 8 digits (2 decimal places) | HHWT: 4 digits | household
id_hhold | household identification number | 14 | numeric | computed | SERIAL: 10 digits (identification within sample) | SERIAL: 10 digits (identification within sample) | household
place | place code | 15 | numeric | original or added | many different variables | many different variables | household
qhhold | quality