In These Notes We Will Work with SQL in an R Environment

#########################################
# Introduction to R                     #
# SQL IN R                              #
# Author: J. Priestley, Ph.D.           #
#########################################

#In these notes we will work with SQL in an R environment

#let's bring in the pennstate2 file...
PS2 <- read.csv("C:\\Users\\Mommy\\Documents\\JENNIFER\\KENNESAW STATE WORK\\WEBSITE\\STAT4030\\DATA\\pennstate2.csv")
head(PS2)

#to get started in SQL, let's load the sqldf package
install.packages("sqldf")   #only needed the first time
library(sqldf)

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# code chunk 1               #
# Basic SQL Queries          #
# Using Select, Limit, As    #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#to use SQL in R, you have to start each execution with the sqldf function
#within the sqldf function, you are writing SQL code - not R code.
#therefore, some of the logic/operators will be different

#the asterisk operator in SQL represents ALL columns
?sqldf   #opens the help page for sqldf
sqldf('select * from PS2')

#if we only want to retain Sex, Tattoo and Looks, we can do this using Select:
sqldf('select Sex, Tattoo, Looks from PS2')

#to limit the number of observations returned for analysis, we can use the "limit" clause:
sqldf('select Sex, Tattoo, Looks from PS2 limit 10')
#but...be aware that this is not a random sample...it's the first 10 obs

#we can create new variables from existing columns using mathematical operators
#and then name the new column using the AS keyword...
sqldf('select Sex, ((HtChoice-Height)/Height)*100 as PCTDIFF from PS2')

#cool...but this only went to the console...if we want to keep this for later analysis,
#we need to create a new dataframe...
PS3 <- sqldf('select Sex, HtChoice, Height, ((HtChoice-Height)/Height)*100 as PCTDIFF from PS2')
PS3

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# code chunk 2               #
# Basic SQL Queries          #
# Using Where, And, Or       #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#a few of the observations had HtChoice values of 2 - that does not make sense.
#let's only select those observations where HtChoice is at least 60 inches (5 feet):
sqldf('select * from PS2 where HtChoice >= 60')

#we could also select just the males...
#standard SQL puts string values in single quotes, so the R string is wrapped in double quotes
sqldf("select * from PS2 where Sex = 'Male'")

#AND keeps rows that meet both conditions; OR keeps rows that meet at least one
sqldf("select Sex, Tattoo, NumPrces from PS2 where Sex = 'Male' AND Height > 70")
sqldf("select Sex, Tattoo, NumPrces from PS2 where Sex = 'Male' OR HtChoice > 70")

#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# code chunk 3               #
# Basic SQL Queries          #
# Using Like, Group By       #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#the LIKE operator can be used to select rows using a pattern that occurs in the variable values
#LIKE matches text patterns, so use it with character columns - not numeric columns
#without a % wildcard, LIKE has to match the whole value (ignoring case)
sqldf("select Sex, Tattoo, NumPrces from PS2 where Anypeirces like 'No'")

#another example of this would be code like this
#(KSU2 is a different data frame that is not loaded in these notes):
sqldf("select Sex, GPA, KSUID from KSU2 where KSUID like '0002%'")
#this code will return rows where the KSUID value - which is a character variable -
#begins with 0002 in the first 4 places

#note that you can reverse this and use:
sqldf("select Sex, GPA, KSUID from KSU2 where KSUID like '%99'")
#this will return rows with KSUID values which END in 99.

#the Group By clause in SQL performs aggregation - like an avg or count - within each group...
sqldf('select Sex, count(Sex) N, avg(NumPrces) AVG_NumPrces, stdev(NumPrces) StdDev from PS2 group by Sex')
#the idea here is that we are coding:
#select variable, count(variable) name given to the new column ... from ... group by ...
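
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#
# code chunk 4               #
# Self-contained sketch      #
# Using Like, Group By       #
#~~~~~~~~~~~~~~~~~~~~~~~~~~~~#

#the queries above need pennstate2.csv on disk, and the KSU2 examples need a data frame
#that is never created in these notes. as a minimal sketch, the toy data frame below
#lets the LIKE and GROUP BY patterns run on any machine that has sqldf installed.
#ASSUMPTION: the name "toy" and every value in it are made up for illustration -
#they are not taken from the course data.

library(sqldf)

toy <- data.frame(
  Sex      = c("Male", "Female", "Female", "Male", "Female", "Male"),
  NumPrces = c(0, 2, 4, 1, 3, 2),
  KSUID    = c("000211", "000299", "000399", "000245", "000512", "000699"),
  stringsAsFactors = FALSE   #keep KSUID as character so the leading zeros survive
)

#IDs that BEGIN with 0002 (the % wildcard stands for "anything after this")
starts_0002 <- sqldf("select Sex, KSUID from toy where KSUID like '0002%'")
starts_0002

#IDs that END in 99 (the % wildcard stands for "anything before this")
ends_99 <- sqldf("select Sex, KSUID from toy where KSUID like '%99'")
ends_99

#one row per Sex, with the number of observations and the mean number of piercings
by_sex <- sqldf("select Sex, count(Sex) as N, avg(NumPrces) as AVG_NumPrces from toy group by Sex")
by_sex

#storing each result in its own data frame (as with PS3 earlier) keeps it for later analysis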
