Mosaic Data Files Harmonized Variables Version 1.9A
Mosaic data files: Documentation of harmonized variables, Version 1.9

Dr. Siegfried Gruber
University of Graz
Southeast European History and Anthropology
Mozartgasse 3, 8010 Graz - Austria
+43 316 380-2377
email: [email protected]
http://geschichte.uni-graz.at/de/suedost/
http://www.censusmosaic.org

April 7th, 2015

Table of Contents

About Mosaic data files and their distribution
Overview of variables
    General overview
    Table 1: Harmonized variables of Mosaic data files by data level and alphabetic order
    Table 2: Order of harmonized variables in Mosaic data files
Variable description in alphabetic order
    age (age)
    country code (country)
    enumeration identification number (id_enum)
    first name (fname)
    group quarter status (gq)
    household identification number (id_hhold)
    household size (hhsize)
    household weight (hhwt)
    last name (lname)
    literacy (lit)
    marital status (marst)
    occupational code in OCCHISCO (occhisco)
    occupational title (occupat)
    person identification number (id_pers)
    person weight (perwt)
    place code (place)
    presence of person at enumeration (presence)
    quality flag age (qage)
    quality flag household (qhhold)
    quality flag household relationship code (qrelate)
    quality flag marital status (qmarst)
    quality flag sex (qsex)
    region (region)
    relationship to household head (relate)
    relationship to household head original (relateor)
    religion (relig)
    sex (sex)
    type of enumeration (enumtype)
    urban-rural status (urban)
    year of enumeration (year)

About Mosaic data files and their distribution

The Mosaic project distributes European historical census microdata. Any interested researcher can download data for their own research free of charge. The only condition is that you register as a user with the Mosaic project and cite the data properly.

Please keep in mind that most data released by the Mosaic project is not representative of a region or country, especially where the data stems from research concentrating on one specific settlement.

The following requirements exist for a data file to be included in the Mosaic project:
• The data source should list individual persons, preferably by name
• The data source should list all persons of a settlement or area, not only household heads, men, or adults
• The data source should list individuals by residence units (houses, hearths, domestic groups, or households)

Characteristics that should be either given explicitly or possible to infer:
• Age
• Sex
• Relationship to household head
• Marital status
• Place of enumeration
• Year of enumeration
• (first name)
• (last name)

Harmonized Mosaic data files are distributed via the data section of the Mosaic website (http://www.censusmosaic.org/cgi-bin/index_data.plx).
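As a minimal sketch of working with a downloaded codes file outside SPSS or R, the Python snippet below reads a CSV that follows the conventions stated in this documentation: UTF-8 encoding, semicolons between columns, “.” as the decimal separator, and variable names in the first line. The function names, the file name mentioned in the comment, and all sample values are assumptions made for illustration; only the variable names (id_hhold, id_pers, age, sex, hhwt) are taken from this documentation.

```python
import csv
import io


def read_mosaic_rows(text_stream):
    """Read semicolon-delimited rows (header in the first line) into dicts."""
    reader = csv.DictReader(text_stream, delimiter=";")
    return list(reader)


def load_mosaic_csv(path):
    """Open a codes file from disk as UTF-8 and read all rows.

    The path is supplied by the caller, e.g. a hypothetical
    "example_codes.csv" extracted from a downloaded zip file.
    """
    with open(path, encoding="utf-8", newline="") as handle:
        return read_mosaic_rows(handle)


# Invented two-person sample standing in for a real data file.
sample = io.StringIO(
    "id_hhold;id_pers;age;sex;hhwt\n"
    "1;1;40;1;1.5\n"
    "1;2;38;2;1.5\n"
)
rows = read_mosaic_rows(sample)

# All values arrive as strings; convert numeric variables explicitly.
# float() expects "." as the decimal separator, matching the files.
weights = [float(row["hhwt"]) for row in rows]
ages = [int(row["age"]) for row in rows]
```

Reading every value as a string and converting only where needed keeps identification numbers and codes exactly as they appear in the file; whether that suits a given analysis is a design choice, not a requirement of the data files.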
There is a zipped file available for each dataset, which contains three files:
• a CSV file with codes for each variable
• a CSV file with value labels for each variable
• a readme file with the citation you must use for this dataset and a short description of the dataset

The CSV file with codes for each variable can be imported into any software able to read CSV files. Scripts are available for importing CSV files into SPSS and R. The CSV file with value labels for each variable is intended for users who do not want to analyze the data with statistical software, but prefer a spreadsheet program such as MS Excel or LibreOffice Calc.

Note: These files use Unicode (UTF-8) for character encoding, the “.” as the decimal separator, and the semicolon as the column delimiter, and they list the variable names in the first line of the file. If your software uses the “,” as the decimal separator, you have to open the CSV file with an editor and replace the “.” with “,” before importing the file.

Data checking

Basic checks have been performed to ensure that Mosaic data files do not contain obvious errors. Errors can arise at different stages of producing these data files:
• Enumeration of the census (including a possible copying of the data)
• Transcribing the census
• Coding the census

Additional problems arise from missing or illegible data. The basic checks can only detect impossible combinations of persons’ characteristics (like grandparents being younger than their grandchildren), not other possible errors. Any changes made to the variables id_hhold, relate, age, sex, or marst are flagged in the corresponding variables qhhold, qrelate, qage, qsex, and qmarst.

The following checks have been performed:
• All values must be included in the list of acceptable values.
• Every person and household needs a unique identification number.
• Every household includes exactly one household head. The only exceptions are incomplete households.
• The number of married men has to match the number of married women. Mosaic census files do not include societies which allowed same-sex marriages, and include only a few societies which allowed polygamy.
• Couples must have the correct relationship codes. Persons coded as spouse or child of the household head must really be the head's spouse or child, and not the spouse or child of another member of the household.
• Age and marital status are cross-checked to detect children recorded as married.
• Marital status and relationship codes have to match each other.
• Age and relationship to the household head have to match each other.

Geo-referencing of places

All places are coded and geo-referenced. The table of places with their latitude and longitude will be made available in the future.

Overview of variables

General overview

This file contains the documentation of the 30 harmonized variables of the Mosaic data files. These variables belong to three different levels:
• 4 variables defining the data set,
• 8 variables defining the household, and
• 18 variables defining the person.

Variables defining the data set: These are variables which are the same for all cases of one data set. This information is provided either as part of the form used for the enumeration, or as information on material accompanying the enumeration, or as information from other sources.

Variables defining the household: These are variables which are the same for all persons of one household. Some of them are the same for many or even all persons of one data set (e.g. region). This information is partly provided by the enumeration form, but sometimes derived from other sources. It also includes one quality flag variable.

Variables defining the person: These are variables which pertain to individual persons. This information