How to Download and Load CSO Data Into Postgresql
Lab 4 DT786 Getting CSO data into R
In this lab we will cover:
1) data.frame and SpatialPolygonsDataFrame objects.
2) Loading shape files for the Dublin Electoral Divisions (dublin.eds).
3) Adding an additional column to a data.frame.
4) Saving the changes in a shape file.
5) Plotting the contents of a shape file.
6) Downloading education and car ownership data from the CSO.
7) Adding the education data from the CSO to an R data.frame.
8) Adding the car ownership data from the CSO to an R data.frame.
First we must load the required packages:
library(spdep)
library(maptools)
library(RColorBrewer)
library(classInt)
1.1) In R, a data frame (data.frame) stores several vectors of equal length in a single variable. The vectors can all be of different types. For example, a data frame may contain many columns, and each column might hold factors, strings, or numbers. Data frames are tightly coupled collections of variables which share many of the properties of matrices and of lists, and they are the fundamental data structure used by most of R's modelling software. There are different ways to create and manipulate data frames. Here are some examples.
L3 <- LETTERS[1:3]
df1 <- data.frame(cbind(x=1, y=1:10), fac=sample(L3, 10, replace=TRUE))
The same with automatic column names:
df2 <- data.frame(cbind(1, 1:10), sample(L3, 10, replace=TRUE))
is.data.frame(df1)
The data.frame structure is used as part of spatial classes such as SpatialPolygonsDataFrame and SpatialPointsDataFrame to hold the attributes of spatial objects.
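As a further illustration of mixed column types, the sketch below builds a small data frame by hand and inspects it. The column names and values (name, pop, urban) are invented for this example and are not part of the lab data.

```r
# Build a small data frame whose columns have three different types.
# All names and values here are hypothetical.
eds <- data.frame(
  name  = c("ED A", "ED B", "ED C"),
  pop   = c(1500L, 3200L, 8700L),
  urban = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE   # keep strings as character, not factor
)

# One class per column: this is what distinguishes a data frame from a matrix.
col_types <- sapply(eds, class)
print(col_types)

# Compact summary of the structure.
str(eds)

# Index like a matrix (row, column) or like a list (by column name).
eds[2, "pop"]
eds$name
```

str() is usually the quickest way to see both the column types and the first few values of an unfamiliar data frame.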
A SpatialPolygonsDataFrame is made up of a data.frame and an object of class SpatialPolygons, which is a list of objects of class Polygons, which is made up of a list with Polygon class objects. See lab3 on examining the structure of spatial objects.
There is also a SpatialPointsDataFrame, which is structured in a similar way, with point coordinates in place of polygons.
2) How to load shape files for the Dublin Electoral Divisions (dublin.eds).
The Dublin Electoral Division data was exported from PostgreSQL as a shape file. It contains all the geometry and some of the attribute data (MALE1_1 and FEMALE1_1) for the Dublin EDs. The file should be copied from student distrib to C:\My-R-Dir. The file can be loaded into R with the command:
dublin.eds <- readShapePoly("C:\\My-R-Dir\\dublinEds.shp")
Note that this is the same data as used on the Spatial Databases course, but in R it is called dublin.eds.
Examine dublin.eds:
names(dublin.eds)
is(dublin.eds)
is(dublin.eds$MALE1_1)
dublin.eds$FEMALE1_1
coordinates(dublin.eds)
getClass(dublin.eds)
# Examine slots
slotNames(dublin.eds)
dublin.eds@data
dublin.eds@polygons
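The slot commands above can be tried without any spatial packages. The sketch below defines a toy S4 class, ToySpatialDF, that only mimics the idea of an attribute table stored next to a list of geometries. It is a hypothetical stand-in, not the real SpatialPolygonsDataFrame class, and the values are invented.

```r
library(methods)  # S4 support; loaded explicitly in case the script is run with Rscript

# Toy class (hypothetical): one attribute table plus a list of geometries.
setClass("ToySpatialDF",
         representation(data = "data.frame", polygons = "list"))

toy <- new("ToySpatialDF",
           data = data.frame(MALE1_1 = c(700, 1200),
                             FEMALE1_1 = c(800, 1300)),
           polygons = list("geometry 1", "geometry 2"))  # placeholders only

slots <- slotNames(toy)   # names of the slots defined by the class
print(slots)
is(toy)                   # class membership
toy@data                  # the attribute table, one row per geometry
toy@data$MALE1_1          # a single attribute column
length(toy@polygons)      # number of geometry entries
```

The point to take away is that @ reaches into a slot of the object, while $ then selects a column of the data.frame held in that slot.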
3) Add a new column called POP to dublin.eds to store the sum of the male and female populations for each ED.
dublin.eds@data$POP <- dublin.eds@data$MALE1_1 + dublin.eds@data$FEMALE1_1
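After adding a derived column it is worth checking the result. The following is a minimal sketch on a plain data.frame that stands in for dublin.eds@data. The three rows of counts are invented, and only the column names come from the lab.

```r
# Stand-in for dublin.eds@data with hypothetical counts.
ed_data <- data.frame(MALE1_1   = c(700, 1200, 4100),
                      FEMALE1_1 = c(800, 1300, 4600))

# Vectorised addition: one sum per row (per ED).
ed_data$POP <- ed_data$MALE1_1 + ed_data$FEMALE1_1

# Sanity checks: no missing totals, and POP agrees with a row-wise sum.
has_missing <- anyNA(ed_data$POP)
pop_ok <- all(ed_data$POP == rowSums(ed_data[, c("MALE1_1", "FEMALE1_1")]))

print(ed_data)
summary(ed_data$POP)
```

If either input column contained NA for an ED, the sum for that ED would also be NA, which is why the anyNA() check is included.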
4) Save the above changes in a file set called dubeds1.
details <- paste("C:\\My-R-Dir", "dubeds1", sep="/")
writePolyShape(dublin.eds, details)
A new file called dubeds1.shp will be created in your working directory, C:/My-R-Dir. This new file will have a population column calculated in the current R session.
5) Plot the contents of dubeds1.shp
Load the file that you saved in step 4. It should contain the data on the Dublin EDs and a total population:
dublin.eds <- readShapePoly("C:\\My-R-Dir\\dubeds1.shp")
Load a colour library:
library(RColorBrewer)
Set some colours:
pop8 <- brewer.pal(8,'Set2')
Use ranges for thematic colouring:
spplot(dublin.eds, "POP", col.regions=pop8, at=c(500,1000,2000,3000,4000,5000,6000,7000,8000), main='Dublin Population')
You should get a thematic map of population by ED.
6) Downloading education and car ownership data from CSO.
The CSO census for 2006 consists of over 70 themes. Each of these themes contains several components. We will use three steps to get CSO data into R.
1) Download a particular topic for a given area. We will download the education and car ownership data for the Dublin Electoral Divisions.
2) Load the new topic into a data.frame specifically designed for that topic.
3) Move the new topic to the Electoral Division (dublin.eds) table that contains all topics and the geometry of each ED.
Download Education information
Go to the CSO Census 2006 reports page for loading into PostgreSQL: http://census.cso.ie/census/ReportFolders/ReportFolders.aspx
Follow the steps shown in the screen shots: scroll to Dublin City.
Select Dublin City and download as a CSV file.
Set the dimension order.
Save the file as DublinEducation.csv in C:\My-R-Dir.
Edit DublinEducation.csv in TextPad or Notepad and delete the header:
"Theme 10 - 4 : Persons aged 15 and over by sex, principal economic status and highest level of education completed, 2002"
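As an alternative to deleting the header by hand, read.csv can skip leading lines. The sketch below assumes the theme title occupies exactly one line above the column names. It writes a tiny stand-in file to a temporary location, so the rows and values are invented and only the overall layout mirrors the download.

```r
# Write a small stand-in file: a title line, a column-name line, two data rows.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  '"Theme title line (stand-in for the CSO header)"',
  '"Geographic Area","No formal education","Primary education"',
  '"ED A",10,200',
  '"ED B",25,340'
), csv_path)

# skip = 1 drops the title line, so the next line is used as the header.
dubeduc_demo <- read.csv(csv_path, header = TRUE, skip = 1)

# read.csv replaces spaces in column names with dots by default.
print(names(dubeduc_demo))
print(dubeduc_demo)
```

The dotted column names produced here are of the same form as those in the real dubeduc table, for example No.formal.education.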
Each generic theme has one or more components. It is the theme components that actually get downloaded. We are interested only in the educational parts of Theme 10 - 4. Note that the data from the CSO does not contain geometry. Selected columns from the downloaded table will later be loaded into the big ED frame dublin.eds. You must have dublin.eds loaded into R. Now, with C:\My-R-Dir as the working directory, read the file into R:
dubeduc <- read.csv(file="DublinEducation.csv", header=TRUE)
Compare the column names of the two tables:
names(dubeduc)
names(dublin.eds)
The education columns correspond as follows:
names(dubeduc)                                   names(dublin.eds)
"Geographic.Area"                                "SAPS_LABEL"
"No.formal.education"                            "FORMAL_EDU"
"Primary.education"                              "PRIMARY_ED"
"Lower.secondary.education"                      "LOWER_SECO"
"Upper.secondary"                                "UPPER_SECO"
"Technical.or.vocational.qualification"          "TECHNICAL1"
"Upper.secondary.and.technical.or.vocational"    "UPPER_S_01"
"Non.degree"                                     "NON_DEGREE"
"Primary.degree"                                 "PRIMARY_DE"
"Professional.qualification..degree.status."     "PROFESSION"
"Both.degree.and.professional.qualification"     "BOTH_DEGRE"
"Post.graduate.certificate.or.diploma"           "POSTGRADUA"
"Post.graduate.degree..masters."                 "POSTGRA_01"
"Doctorate..PhD."                                "DOCTORATE1"
"Not.stated"
"Total"
Add the CSO data to dublin.eds as follows
dublin.eds@data$FORMAL_EDU <- dubeduc$"No.formal.education"
dublin.eds@data$PRIMARY_ED <- dubeduc$"Primary.education"
dublin.eds@data$PROFESSION <- dubeduc$"Professional.qualification..degree.status."
Check that the data has been correctly updated:
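The assignments above copy values by position, so they are only correct if both tables list the EDs in the same order. A more defensive variant aligns the rows on the ED label first. This is a sketch on two small stand-in tables with invented labels and counts, and it assumes the labels are spelled identically in both sources, which should be checked on the real data.

```r
# Stand-ins for dublin.eds@data and dubeduc (hypothetical labels and values).
eds <- data.frame(SAPS_LABEL = c("ED A", "ED B", "ED C"),
                  stringsAsFactors = FALSE)
cso <- data.frame(Geographic.Area     = c("ED C", "ED A", "ED B"),
                  No.formal.education = c(30, 10, 20),
                  stringsAsFactors = FALSE)

# For each ED in eds, find the row of cso that has the same label.
idx <- match(eds$SAPS_LABEL, cso$Geographic.Area)

# Stop early if any ED has no counterpart in the CSO table.
stopifnot(!anyNA(idx))

# Copy the values in the order of eds, not in the order of cso.
eds$FORMAL_EDU <- cso$No.formal.education[idx]
print(eds)
```

With the real objects the same idea would use dublin.eds@data$SAPS_LABEL and dubeduc$Geographic.Area as the two keys.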
sideBySide <-paste(dublin.eds@data$SAPS_LABEL, dublin.eds@data$PROFESSION, dubeduc$"Geographic.Area", dubeduc$"Professional.qualification..degree.status.")
writeLines(sideBySide)
You do not have to check the rest of the data. Add the rest of the education data in a similar fashion. You can save the newly entered data as follows:
details <- paste("C:\\My-R-Dir", "dubeds2", sep="/")
writePolyShape(dublin.eds, details)
Print the map of Professionals.
Load a colour library if necessary:
library(RColorBrewer)
Set some colours:
pop8 <- brewer.pal(8,'Set2')
Get the ranges for thematic colouring. Eight colours need nine break points, running from the minimum to the maximum value:
lower <- min(dublin.eds@data$PROFESSION)
upper <- max(dublin.eds@data$PROFESSION)
brks <- seq(lower, upper, length.out=9)
Now plot the map with the above intervals:
spplot(dublin.eds, "PROFESSION", col.regions=pop8, at=brks, main='Dublin Professionals')
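To see how break points turn values into colour classes, the sketch below computes equal-width breaks for a handful of invented values and classifies them with cut(). cut() is used here only as an illustration; the plotting function's own handling of values that lie exactly on a break may differ.

```r
# Hypothetical counts standing in for dublin.eds@data$PROFESSION.
values <- c(12, 40, 95, 130, 260, 412)

n_classes <- 8   # one class per colour in an 8-colour palette

# n_classes intervals need n_classes + 1 break points.
breaks <- seq(min(values), max(values), length.out = n_classes + 1)
print(breaks)

# Class index (1..8) for each value; include.lowest keeps the minimum in class 1.
cls <- cut(values, breaks = breaks, include.lowest = TRUE, labels = FALSE)
print(cls)

# Number of values in each class; empty classes are common with skewed data.
counts <- tabulate(cls, nbins = n_classes)
print(counts)
```

Equal-width classes often leave several classes empty when a few EDs have very large values. The classInt package loaded at the start of the lab provides classIntervals() for other classification schemes.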
You should get a map depicting the number of professionally qualified people per ED. How would you make a map displaying the density of professionals per ED?
The following updates all the dublin.eds education fields with the CSO values.
dublin.eds@data$"FORMAL_EDU" <- dubeduc$No.formal.education
dublin.eds@data$"PRIMARY_ED" <- dubeduc$Primary.education
dublin.eds@data$"LOWER_SECO" <- dubeduc$Lower.secondary.education
dublin.eds@data$"UPPER_SECO" <- dubeduc$Upper.secondary
dublin.eds@data$"TECHNICAL1" <- dubeduc$Technical.or.vocational.qualification
dublin.eds@data$"UPPER_S_01" <- dubeduc$Upper.secondary.and.technical.or.vocational
dublin.eds@data$"NON_DEGREE" <- dubeduc$Non.degree
dublin.eds@data$"PRIMARY_DE" <- dubeduc$Primary.degree
dublin.eds@data$"PROFESSION" <- dubeduc$"Professional.qualification..degree.status."
dublin.eds@data$"BOTH_DEGRE" <- dubeduc$"Both.degree.and.professional.qualification"
dublin.eds@data$"POSTGRADUA" <- dubeduc$"Post.graduate.certificate.or.diploma"
dublin.eds@data$"POSTGRA_01" <- dubeduc$"Post.graduate.degree..masters."
dublin.eds@data$"DOCTORATE1" <- dubeduc$"Doctorate..PhD."
dublin.eds@data$"NOT_STATED" <- dubeduc$"Not.stated"
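The same update can be driven by a lookup table of column names, which avoids retyping one assignment per column. The sketch below uses only three of the name pairs from the lab, applied to small stand-in tables with invented values. The vector col_map is a name introduced for this example.

```r
# Target column name (in the ED table) -> source column name (in the CSO table).
col_map <- c(FORMAL_EDU = "No.formal.education",
             PRIMARY_ED = "Primary.education",
             LOWER_SECO = "Lower.secondary.education")

# Stand-ins for dubeduc and dublin.eds@data (hypothetical values).
cso <- data.frame(No.formal.education       = c(10, 20),
                  Primary.education         = c(200, 340),
                  Lower.secondary.education = c(150, 180))
eds <- data.frame(SAPS_LABEL = c("ED A", "ED B"), stringsAsFactors = FALSE)

# Fail early if a source column is misspelled or missing.
missing_cols <- setdiff(col_map, names(cso))
stopifnot(length(missing_cols) == 0)

# Copy each source column into its target column.
for (target in names(col_map)) {
  eds[[target]] <- cso[[col_map[[target]]]]
}
print(eds)
```

With the real objects the loop body would assign to dublin.eds@data[[target]] from dubeduc, and col_map would list all fourteen pairs.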
It is a good idea to save your data after changes, incrementing the file version as follows:
writePolyShape(dublin.eds, paste("C:\\My-R-Dir", "dubeds3", sep="/"))
Add the car data, "Theme 15 - 1 : Number of households with cars, 2006", in a similar fashion.