Dutra/Gleaton

Univariate

Dutra/Gleaton 9/19/2017

Intro

The variables described below are based on a few of our research questions regarding education, drug use, and criminal records. The first variable asks whether the respondent has ever been arrested. The second applies only to respondents who answered yes to the first, and asks whether they have been arrested once or more than once. The third variable is the highest education level achieved. Lastly, we looked at age, recorded as year of birth, as our fourth variable.

# Recode refused (6), legitimate skip (7), and don't know (8) as missing
addhealth$H4CJ1[addhealth$H4CJ1 == 6] <- NA
addhealth$H4CJ1[addhealth$H4CJ1 == 7] <- NA
addhealth$H4CJ1[addhealth$H4CJ1 == 8] <- NA
arrest <- table(addhealth$H4CJ1)
table(addhealth$H4CJ1)

##
##    0    1
## 3610 1457

# Bar graph of the arrest table computed above
barplot(arrest, main = "Ever Been Arrested", col = "blue", ylab = "Frequency", xlab = "No = 0, Yes = 1", ylim = c(0,4000))

This bar graph shows whether the respondent had ever been arrested. Respondents who had never been arrested answered 0, for No, and those who had been arrested answered 1, for Yes. The possible responses for this variable were 0, 1, 6, 7, and 8, where 6 = refused, 7 = legitimate skip, and 8 = don't know. Very few respondents answered 6, 7, or 8, so we decided to graph only the data we were focused on, which was whether the respondent had ever been arrested. Based on the graph, far more respondents answered No (3,610) than Yes (1,457).

# Recode refused (6), legitimate skip (7), and don't know (8) as missing
addhealth$H4CJ2[addhealth$H4CJ2 == 6] <- NA
addhealth$H4CJ2[addhealth$H4CJ2 == 8] <- NA
addhealth$H4CJ2[addhealth$H4CJ2 == 7] <- NA
num_arrest <- table(addhealth$H4CJ2)
table(addhealth$H4CJ2)

##
##   1   2
## 720 765

# Bar graph of the number-of-arrests table computed above
barplot(num_arrest, main = "Number of Arrests", col = "darkgreen", ylab = "Frequency", xlab = "Once = 1, More Than Once = 2")
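The recoding used for both arrest variables sets each non-response code to NA in a separate assignment. As a minimal sketch, the same result can be reached in one step with `%in%`, and `prop.table()` turns the counts into shares. The data frame `toy` and its values are made up for illustration and are not taken from `addhealth`; the missing codes 6, 7, and 8 follow the description in this report.

```r
# Hypothetical toy data standing in for addhealth$H4CJ1 (values are made up)
toy <- data.frame(H4CJ1 = c(0, 1, 0, 6, 7, 1, 8, 0))

# Codes treated as missing: 6 = refused, 7 = legitimate skip, 8 = don't know
missing_codes <- c(6, 7, 8)

# Recode all three codes to NA in a single assignment
toy$H4CJ1[toy$H4CJ1 %in% missing_codes] <- NA

# Frequency table of the remaining responses (NA is dropped by default)
arrest_toy <- table(toy$H4CJ1)
print(arrest_toy)

# Share of valid responses in each category
arrest_share <- prop.table(arrest_toy)
print(round(arrest_share, 2))
```

Reporting shares alongside counts makes it easier to compare variables that have different numbers of valid responses.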

If the respondent answered Yes to the previous question, they were asked whether they had been arrested once (1) or more than once (2). The possible responses for this variable were 1, 2, 6, 7, and 8, where 6 = refused, 7 = legitimate skip, and 8 = don't know. We again recoded 6, 7, and 8 as missing so that the graph shows only the data we were focused on: whether the respondent had been arrested once or more than once. Among respondents who had been arrested, slightly more reported being arrested more than once (765) than once (720).

table(addhealth$H4ED2)

##
##    1    2    3    4    5    6    7    8    9   10   11   12   13   98
##   16  383  835  182  327 1702 1012  199  256   59   31   39   72    1

summary(addhealth$H4ED2)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##   1.000   4.000   6.000   5.724   7.000  98.000    1390

# xlim limits the visible x-axis to the range 0 to 12
hist(addhealth$H4ED2, main = "Highest Level of Education", xlab = "8th Grade - Doctorate", col = "yellow", ylim = c(0,2000), xlim = c(0,12))
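The summary above reports a maximum of 98, which comes from a single response in the table. The following sketch shows how one such code affects the mean but not the median. The vector `educ` is made up for illustration, and treating 98 as a non-substantive code that should be set to missing is an assumption, not something stated in this report.

```r
# Hypothetical toy values standing in for addhealth$H4ED2 (values are made up)
educ <- c(3, 6, 6, 7, 2, 6, 9, 98, 3, 7)

# Assumption: 98 is a non-substantive code and is treated as missing
educ_clean <- educ
educ_clean[educ_clean == 98] <- NA

# The mean is pulled upward by the single 98
mean_raw   <- mean(educ)
mean_clean <- mean(educ_clean, na.rm = TRUE)

# The median is the same with or without the 98
median_raw   <- median(educ)
median_clean <- median(educ_clean, na.rm = TRUE)

print(c(mean_raw = mean_raw, mean_clean = mean_clean))
print(c(median_raw = median_raw, median_clean = median_clean))
```

Because the median depends only on the ordering of the values, a single extreme code leaves it unchanged here, which is one reason to prefer it for a coded variable like this.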

This histogram shows the highest level of education received by the respondent. 1 = 8th grade or less, 2 = some high school, 3 = high school graduate, 4 = some vocational/technical training after high school, 5 = completed vocational/technical training, 6 = some college, 7 = bachelor's degree, 8 = some graduate school, 9 = completed a master's degree, 10 = some graduate training beyond a master's degree, 11 = completed a doctoral degree. The histogram is roughly symmetric around its peak at 6. The best indicator of center would be the median in this case, since it is not affected by the single response coded 98 that appears in the table. The median is 6, some college.

hist(addhealth$H4OD1Y, main = "Respondents Date of Birth - Year", xlab = "Year Born", col = "purple", ylim = c(0,1000), xlim = c(1974, 1983))
summary(addhealth$H4OD1Y)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    1974    1978    1979    1979    1980    1983    1390

This histogram shows the year in which each respondent was born, and therefore, indirectly, their age. I would say it has a symmetric distribution. Both the mean and the median birth year are 1979. Few respondents were born at either end of the range, near 1974 or in the early 1980s. The mode is 1978.
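Since birth year is used here as a stand-in for age, the conversion and the mode can be computed directly. The sketch below uses made-up birth years rather than `addhealth$H4OD1Y`, and the `reference_year` of 2008 is an assumption that should be replaced with the actual interview year. Base R has no built-in function for the statistical mode, so it is taken from a frequency table.

```r
# Hypothetical birth years standing in for addhealth$H4OD1Y (values are made up)
birth_year <- c(1976, 1978, 1978, 1979, 1980, 1978, 1981, NA, 1979)

# Assumption: the year in which respondents were interviewed
reference_year <- 2008

# Approximate age; birth month and interview date are ignored
age <- reference_year - birth_year
print(summary(age))

# Mode: the most frequent birth year, taken from the frequency table
year_counts <- table(birth_year)
mode_year <- as.numeric(names(year_counts)[which.max(year_counts)])
print(mode_year)
```

If several years tie for the highest count, `which.max()` returns only the first of them, so the frequency table itself is worth checking.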