
Statistics with R, 2012

1. Exercises (factors, data frames)

1. The data structure factor is used for representing categorical data. Create a factor variable of places of residence (2 x , , , , , Salo): first as a character vector, then converted to a factor.

- Use the functions c() and factor().

2. Now you can test the function levels().
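A minimal sketch of exercises 1 and 2. The town names other than Salo are hypothetical stand-ins, because the original list of places is incomplete; the variable names are also only illustrative.

```r
# Hypothetical places of residence; only "Salo" comes from the exercise text.
residence_chr <- c("Turku", "Turku", "Helsinki", "Tampere", "Oulu", "Salo")

# Convert the character vector to a factor.
residence <- factor(residence_chr)

# levels() lists the distinct categories of the factor.
levels(residence)

# table() counts the observations in each category.
table(residence)
```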

3. We can also form a factor variable from a numeric variable by using the function cut(). Let's draw a random sample of size 20 from the uniform distribution on (0,1) and classify the values into four classes to form a new factor variable f.

- Also use the function runif(), and check the levels with levels().
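One possible way to do this, assuming four equal-width classes (the exercise does not fix the break points) and a seed that is set only to make the sample reproducible:

```r
set.seed(1)        # assumed seed, only for reproducibility
x <- runif(20)     # 20 draws from the uniform distribution on (0, 1)

# Split the interval into four equal-width classes (assumed break points).
f <- cut(x, breaks = c(0, 0.25, 0.5, 0.75, 1))

levels(f)          # interval labels created by cut()
table(f)           # number of observations per class
```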

4. Change the levels of f to small, medium, large and huge.

- The code is of the form: levels(f) <- c("small", ...)
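A short sketch of the renaming step. The factor f is rebuilt here so the example runs on its own; the seed and break points are assumptions.

```r
set.seed(1)                                        # assumed seed
f <- cut(runif(20), breaks = c(0, 0.25, 0.5, 0.75, 1))

# Replace the interval labels, in order, with descriptive names.
levels(f) <- c("small", "medium", "large", "huge")

levels(f)
```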

5. Put the original values and the classified values together to form a data frame.

- Use the function data.frame().
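A sketch of how the two vectors might be combined. The column names value and class, the seed, and the break points are illustrative choices, not part of the exercise.

```r
set.seed(1)                                        # assumed seed
x <- runif(20)
f <- cut(x, breaks = c(0, 0.25, 0.5, 0.75, 1),
         labels = c("small", "medium", "large", "huge"))

# One row per observation: the original value next to its class.
d <- data.frame(value = x, class = f)

head(d)
```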

6. Change the levels of the factor, by combining large and huge classes.

- The code is of the form: levels(f) <- c(..., "large", "large")
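Assigning the same label to two levels merges them. A self-contained sketch, with an assumed seed and break points:

```r
set.seed(1)                                        # assumed seed
f <- cut(runif(20), breaks = c(0, 0.25, 0.5, 0.75, 1),
         labels = c("small", "medium", "large", "huge"))

# Giving "large" and "huge" the same label combines the two classes.
levels(f) <- c("small", "medium", "large", "large")

levels(f)   # three levels remain
table(f)
```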

7. A factor variable can also be generated with the function gl(). Form a factor variable with four classes using this function.

- When calling gl(), the first argument is the number of levels and the second is the number of replications.
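For illustration, a call with four levels and five replications per level (the number of replications and the labels are arbitrary choices):

```r
# Four levels, each repeated five times in a row: 1 1 1 1 1 2 2 ...
g <- gl(4, 5)
g

# The optional labels argument names the levels.
g_named <- gl(4, 5, labels = c("small", "medium", "large", "huge"))
table(g_named)
```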

8. A data frame is a list, whose components are vectors of the same length. The component vectors can be numeric vectors, logical vectors, character vectors, or factors. Form a data frame, which consists of the following information about planets:

Planet   Category     Mass    Rings
Mercury  Terrestrial    0.06  No
Venus    Terrestrial    0.82  No
Earth    Terrestrial    1.00  No
Mars     Terrestrial    0.11  No
Jupiter  Gas giant    317.80  Yes
Saturn   Gas giant     95.20  Yes
Uranus   Gas giant     14.60  Yes
Neptune  Gas giant     11.20  Yes

- Use the functions c() and data.frame().
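The planet table could be entered as follows. This is one possible layout; rep() is used only to shorten the repeated values.

```r
planets <- data.frame(
  Planet   = c("Mercury", "Venus", "Earth", "Mars",
               "Jupiter", "Saturn", "Uranus", "Neptune"),
  Category = c(rep("Terrestrial", 4), rep("Gas giant", 4)),
  Mass     = c(0.06, 0.82, 1.00, 0.11, 317.80, 95.20, 14.60, 11.20),
  Rings    = c(rep("No", 4), rep("Yes", 4))
)

planets
```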

9. Use the function str() to see what you have in the data frame. What classes are the variables? Build the data frame again, this time setting options(stringsAsFactors=FALSE) first. How did the classes change? Give new column names to the data frame.

- The code is of the form: names(data frame) <- c("name1", ...)
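A sketch of the comparison on a shortened two-row table. Since R 4.0.0 the default of stringsAsFactors is already FALSE, so the argument is passed explicitly to data.frame() here to show both behaviours regardless of the R version. The new column names are hypothetical.

```r
# Character columns converted to factors (the default before R 4.0.0).
as_factors <- data.frame(Planet = c("Mercury", "Venus"),
                         Mass   = c(0.06, 0.82),
                         stringsAsFactors = TRUE)

# Character columns kept as character vectors.
as_chars <- data.frame(Planet = c("Mercury", "Venus"),
                       Mass   = c(0.06, 0.82),
                       stringsAsFactors = FALSE)

str(as_factors)            # Planet is a factor
str(as_chars)              # Planet is a character vector

# Rename the columns (hypothetical new names).
names(as_chars) <- c("name", "mass")
names(as_chars)
```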

10. Add a column 'diameter' to the dataset, which consists of values (0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883).

- Use the function c() and, for example, data.frame().
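Two common ways to add the column, shown on a reduced version of the planet table so the example runs on its own (the names planets and planets2 are illustrative):

```r
planets <- data.frame(
  Planet = c("Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"),
  Mass   = c(0.06, 0.82, 1.00, 0.11, 317.80, 95.20, 14.60, 11.20),
  stringsAsFactors = FALSE
)
diameter <- c(0.382, 0.949, 1, 0.532, 11.209, 9.449, 4.007, 3.883)

# Option 1: build a new data frame from the old one plus the new column.
planets2 <- data.frame(planets, diameter = diameter)

# Option 2: assign the column directly with $.
planets$diameter <- diameter

names(planets2)
```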

11. Components of a data frame can be accessed the same way as those of lists. Let's practice accessing data frame components using $, '[ ]' and '[[ ]]'.

- For example, for a data frame called df: df[3] gives the third column as a one-column data frame, and df[[3]] gives it as a vector.
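The access operators can be compared on a small example data frame (df and its three rows are only for illustration):

```r
df <- data.frame(
  Planet   = c("Mercury", "Venus", "Earth"),
  Category = rep("Terrestrial", 3),
  Mass     = c(0.06, 0.82, 1.00),
  stringsAsFactors = FALSE
)

df$Mass         # column as a vector, selected by name
df[["Mass"]]    # the same vector
df[[3]]         # the same vector, selected by position
df[3]           # still a data frame, with one column
df[2, ]         # second row as a one-row data frame
df[2, "Mass"]   # a single value
```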

12. Form a subset of the created data frame using the function subset(). Select only those rows where Mass < 1.00.

- The code is of the form: subset(data frame, logical expression)
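A sketch on a reduced planet table. Note that the comparison is strict, so Earth (Mass equal to 1.00) is not selected.

```r
planets <- data.frame(
  Planet = c("Mercury", "Venus", "Earth", "Mars",
             "Jupiter", "Saturn", "Uranus", "Neptune"),
  Mass   = c(0.06, 0.82, 1.00, 0.11, 317.80, 95.20, 14.60, 11.20),
  stringsAsFactors = FALSE
)

# Column names can be used directly inside the logical expression.
light <- subset(planets, Mass < 1.00)
light
```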

Statistics with R, autumn 2012

2. Exercises (lists, writing to and reading from a file, the apply() function)

1. R has a set of built-in datasets. You can explore them with the command data(). Let's inspect the dataset called airquality.

2. First determine how many variables there are in the dataset and what classes they are.

- Use str().

3. Save the data frame airquality as a .txt file, for example in your own R folder, using the function write.table(). Make use of the option "sep".

4. Now, using the read.table() function, read the .txt data back into R.

5. If you are working with only one dataset at a time, you can attach the dataset to R so that you don't have to write the name of the dataset every time. Attach your current dataset to R with attach(). (With detach() you can undo this.)
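A sketch of the write, read, and attach steps. The file is written to R's temporary directory here, which is an assumption made so the example runs anywhere; replace path with a location in your own folder. The tab separator is likewise just one possible choice for sep.

```r
# Assumed location: a file in the session's temporary directory.
path <- file.path(tempdir(), "airquality.txt")

# Write tab-separated text, without the row numbers.
write.table(airquality, file = path, sep = "\t", row.names = FALSE)

# Read it back; header = TRUE takes the column names from the first line.
aq <- read.table(path, header = TRUE, sep = "\t")
str(aq)

# After attach(), columns can be used without the aq$ prefix.
attach(aq)
mean_temp <- mean(Temp)
detach(aq)
mean_temp
```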

6. The apply() function is a very useful tool in many different cases. It also has a few variants called tapply(), sapply(), lapply() and mapply(). Look through the help pages of these functions.

7. Let's create a 5x4 matrix filled with random numbers from the standard normal distribution.
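To complement the help pages, here is a brief sketch of what each variant does. The small input lists and the anonymous function are made up for illustration.

```r
# tapply(): apply a function to groups defined by a second vector.
temp_by_month <- tapply(airquality$Temp, airquality$Month, mean)

# lapply(): apply a function to each list element, return a list.
as_list <- lapply(list(a = 1:3, b = 4:6), sum)

# sapply(): like lapply(), but simplifies the result to a vector if possible.
as_vector <- sapply(list(a = 1:3, b = 4:6), sum)

# mapply(): apply a function to several vectors in parallel.
pair_sums <- mapply(function(x, y) x + y, 1:3, 4:6)

temp_by_month
as_list
as_vector
pair_sums
```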

- The code is, for example, of the form: matrix(rnorm(n*m),nrow=n)

8. a) Calculate the sum of each row of the matrix with the function apply().

- For a matrix m, the code is of the form: apply(m,1,sum)

b) Calculate the sum of each column of the matrix with the function apply(). How can you reach the same results (just as easily) without the apply function?

- For a matrix m, the code is of the form: apply(m,2,sum)
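A sketch of exercises 7 and 8 together. The seed is an assumption for reproducibility, and rowSums() and colSums() are one possible answer to the question about avoiding apply().

```r
set.seed(42)                           # assumed seed
m <- matrix(rnorm(5 * 4), nrow = 5)    # 5 rows, 4 columns

row_sums <- apply(m, 1, sum)           # margin 1: one sum per row
col_sums <- apply(m, 2, sum)           # margin 2: one sum per column

# The dedicated functions give the same results.
all.equal(row_sums, rowSums(m))
all.equal(col_sums, colSums(m))
```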

9. Let's now use a built-in dataset called iris. Calculate the mean of each of the numeric variables with the function sapply().

- First find out which columns are numeric; then the code is, for example: sapply(iris[1:4],mean)
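A sketch that finds the numeric columns programmatically instead of hard-coding positions 1 to 4 (the variable names are illustrative):

```r
# TRUE for each numeric column, FALSE for the factor Species.
is_num <- sapply(iris, is.numeric)
is_num

# Mean of every numeric column, returned as a named vector.
means <- sapply(iris[is_num], mean)
means
```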