Data Introduction

M. Benno Blumenthal and John del Corral
5 June 2008
International Research Institute for Climate and Society

Outline

Introduction
Ingest
    Uploading Excel File
Adding Additional Metadata
    Independent Variables: Time and Locale
    Dependent Variables
Diagnosing Common Data Problems
    Self Consistency
    Geographical Consistency
Summary

Abstract

This is a short description of how to convert numbers in an Excel file and a shapefile into a dataset that can be used for temporal and spatial analysis.

Goals

- Want to analyze fields temporally and spatially
- Starting with Excel tables (possibly never analyzed) and shapefiles

Plan of Action

1. Read Excel tables and shapefiles
2. Augment information in files
3. Preliminary analysis to find and remove flaws

Figure: Ethiopia Malaria Data
Ingest

- Locate the Excel file or shapefile
- Scan and display the table structure
- Edit/enhance the data descriptions

Upload Excel File

Figure: Screen asking for file to upload

The Browse button lets one locate the file on one’s local machine. The Upload button transmits the file to the data library.

Response to Excel File Upload

Figure: Screen with results of Excel file upload

The Add Metadata button then lets one proceed to the next screen, which allows adding additional information to the dataset.

Adding Additional Metadata

Figure: Screen to edit metadata

Screen to Edit Metadata

This screen allows adding additional information. The top of the page allows adding a description of the new dataset.
Information for each column can be added as well: ensuring that the columns are properly recognized as dates or numeric values is particularly useful.

Independent Variables: Time and Locale

While ultimately we would like to consider the data as a function of time and locale, most likely the Excel file has information about time and some kind of spatial entity (such as district or state).

Time

The best-case scenario is that one of the columns is clearly time, i.e. formatted as an Excel-standard date. We then simply designate the column as an independent variable, and the software extracts a sorted list of values as our time axis.

Precisely describing time

- When speaking about time, we usually imply a start and an end: January 2008 implies the entire month, whereas 1 January 2008 implies the entire day.
- When analyzing data, we usually need to know the interval corresponding to each data point.
- For the sorts of data we have (days to weeks to months to years), some intervals can be precisely specified in Excel and some cannot. Example: weekly sea surface temperature data.
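The idea that a time label implies a start and an end can be made concrete with a small sketch. The Python below is only an illustration, not the data library's implementation; the function names and the half-open [start, end) convention are assumptions introduced here.

```python
from datetime import date, timedelta
from typing import Tuple

Interval = Tuple[date, date]  # half-open: start included, end excluded


def day_interval(day: date) -> Interval:
    """'1 January 2008' implies the entire day."""
    return day, day + timedelta(days=1)


def week_interval(first_day: date) -> Interval:
    """A weekly value covers seven days starting at first_day."""
    return first_day, first_day + timedelta(days=7)


def month_interval(year: int, month: int) -> Interval:
    """'January 2008' implies the entire month."""
    start = date(year, month, 1)
    # The end is the first day of the next month; December rolls into January.
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


if __name__ == "__main__":
    print(month_interval(2008, 1))
    print(day_interval(date(2008, 1, 1)))
    print(week_interval(date(2008, 1, 1)))
```

Storing the interval rather than a single date removes the ambiguity for weekly data, where a lone spreadsheet date does not say whether it marks the start, middle, or end of the week.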
Specified Time

- Specify the time dependence
- Specify column interpretation and time connection

Locale

The best-case scenario is that one of the columns in the Excel file is clearly an identifier for a spatial entity, i.e. each identifier is unique and corresponds to a value in the corresponding column in the shapefile. We then simply designate the column as an independent variable, and the software extracts a sorted list of values as our entity axis.

Specified Locale

- Specify the locale id (choice guided by the shapefile)
- Specify column interpretation and locale connection

Dependent Variables

Dependent variables are the data to be analyzed, and can be most simply provided as columns in the Excel file. In that case the Add Metadata screen lists the columns, and additional descriptive information can be added.
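The best-case condition for locale described earlier (identifiers that are unique and that match the corresponding shapefile column) can be checked before designating the column as an independent variable. The sketch below is a hypothetical helper written for illustration, not part of the data library; it assumes the identifiers have already been read into plain Python lists.

```python
from collections import Counter
from typing import Dict, Iterable, List


def check_locale_ids(table_ids: Iterable[str],
                     shapefile_ids: Iterable[str]) -> Dict[str, List[str]]:
    """Compare locale identifiers in the table with those in the shapefile."""
    shape_counts = Counter(shapefile_ids)
    table_set = set(table_ids)  # table rows repeat an id once per time step
    return {
        # An id that appears more than once in the shapefile is ambiguous.
        "duplicated_in_shapefile": sorted(
            k for k, n in shape_counts.items() if n > 1),
        # Table ids with no matching shape cannot be mapped.
        "missing_from_shapefile": sorted(table_set - set(shape_counts)),
        # Shapes that never receive data.
        "unused_by_table": sorted(set(shape_counts) - table_set),
        # Sorted list of distinct values, usable as an entity axis.
        "entity_axis": sorted(table_set),
    }


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    report = check_locale_ids(["B", "A", "A", "C"], ["A", "B", "B", "D"])
    for key, values in report.items():
        print(key, values)
```

Empty lists for the first two entries correspond to the best case; anything else points to identifiers that need editing before the locale connection can be specified.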
Variables from multiple columns

Sometimes variables are stored as multiple columns in the file (most frequently a different month in each column), in which case a pattern can be provided which properly combines the columns. The screen provides an opportunity to give such a pattern, and to remove the corresponding combined columns from the dataset.

For example, consider the sample dataset 1993-2005 Madagascar Highlands incidence. On ingest, we find 26 columns: 12 for monthly cases, 12 for monthly incidence, year, and district. So in this case there is a clear column for locale, but no such column for time. So we define the time variable as going from Jan 1993 to Dec 2005 in steps of one month. We then translate the column names to English so that we can easily generate the month names from the time with our current software. The pattern that corresponds to incidence is then

    select incid_%b[T] as incid from data_central_highl_incidence WHERE district='%s[district]' AND year=%Y[T]

where T is our time variable and we have used it to generate both the year and the month-part of the column name.
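To see what the pattern above produces, the following Python sketch expands it for one value of T and one district. It is an illustration under stated assumptions, not the data library's own code: the `expand` and `monthly_axis` helpers are hypothetical, the month abbreviations are assumed to be capitalized English ones (Jan, Feb, ...), and only the three codes that appear in the pattern (%b, %Y, %s) are handled.

```python
import re
from datetime import date
from typing import Dict, List

# Assumed spelling of the month part of the translated column names.
MONTH_ABBR = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

PATTERN = ("select incid_%b[T] as incid from data_central_highl_incidence "
           "WHERE district='%s[district]' AND year=%Y[T]")


def monthly_axis(first_year: int, last_year: int) -> List[date]:
    """Time axis in steps of one month, January first_year to December last_year."""
    return [date(y, m, 1)
            for y in range(first_year, last_year + 1)
            for m in range(1, 13)]


def expand(pattern: str, values: Dict[str, object]) -> str:
    """Replace each %code[name] token using the value bound to name."""
    def substitute(match):
        code, name = match.group(1), match.group(2)
        value = values[name]
        if code == "b":   # abbreviated month name taken from a date
            return MONTH_ABBR[value.month - 1]
        if code == "Y":   # four-digit year taken from a date
            return str(value.year)
        if code == "s":   # plain string
            return str(value)
        raise ValueError("unsupported code: %" + code)
    return re.sub(r"%([A-Za-z])\[(\w+)\]", substitute, pattern)


if __name__ == "__main__":
    axis = monthly_axis(1993, 2005)
    # "ExampleDistrict" is a made-up district name.
    print(expand(PATTERN, {"T": axis[0], "district": "ExampleDistrict"}))
```

Each point on the time axis selects a different column (through the month name) and a different row (through the year), which is how twelve monthly columns collapse into a single variable that depends on T and district.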
Diagnosing Common Data Problems

Excel files created by hand over long periods of time that have never been analyzed are unlikely to have perfectly
