Describing Univariate Distributions
Sean and Tom

    library(ggplot2); library(descr); library(knitr)
    addHealth <- read.csv("/Users/YungOG/Desktop/Math 315/Projects/Data/AddHealth_Wave_IV.csv", stringsAsFactors = FALSE)

Intro

We are exploring cigarette smoking as the explanatory variable, along with three response variables, all of which pertain to a person's health. The first response variable asks whether smoking increases the chance of getting gum disease, gingivitis, or tooth loss. The second asks whether smoking increases the chance of getting asthma, chronic bronchitis, or emphysema. The third asks whether smoking increases the chance of getting cancer, lymphoma, or leukemia.

Categorical

    kable(freq(addHealth$H4TO1))

Table rows: 0, 1, 6, 8, NA's, Total

    # Set codes 6, 7, and 8 to NA so only the 0/1 answers remain
    addHealth$H4TO1 <- ifelse(addHealth$H4TO1 %in% c(6, 7, 8), NA, addHealth$H4TO1)
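The recodes in this report all follow one pattern: certain numeric codes are replaced with NA. A minimal sketch of that pattern as a reusable function is shown here. The helper name `recode_missing` and the toy data frame are hypothetical and only stand in for the real AddHealth columns.

```r
# Toy data standing in for two AddHealth columns (values are made up).
toy <- data.frame(
  H4TO1  = c(0, 1, 1, 6, 8, 1, 0, NA),
  H4ID5F = c(0, 0, 1, 6, 0, 7, 0, 1)
)

# Hypothetical helper: set any value listed in `codes` to NA.
# Values that are already NA stay NA.
recode_missing <- function(x, codes = c(6, 7, 8)) {
  x[x %in% codes] <- NA
  x
}

# Apply the same recode to every column of interest in one step.
vars <- c("H4TO1", "H4ID5F")
toy[vars] <- lapply(toy[vars], recode_missing)

# Check the result: only 0, 1, and NA should remain.
table(toy$H4TO1, useNA = "ifany")
```

Writing the recode once avoids repeating a near-identical `ifelse()` line for each code and each variable, which is where copy-and-paste mistakes tend to creep in.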
The first graph, for H4TO1, has two bars: the first shows people who have NEVER smoked a cigarette, and the second shows people who HAVE smoked a cigarette in their life. About 36% of people in the data have never smoked a cigarette, while the majority have. Because this variable has only two categories, no and yes, skew and modality do not describe the graph in a meaningful way. The responses cover just those two categories. There is no midpoint, but the mode is that more people have smoked than not. There are no outliers for this variable.

    kable(freq(addHealth$H4ID9A))

Table rows: 0, 1, NA's, Total

    # Set code 6 to NA
    addHealth$H4ID9A <- ifelse(addHealth$H4ID9A == 6, NA, addHealth$H4ID9A)
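The bar charts described in this report plot the percentage of respondents in each category. A rough sketch of how such a chart could be produced is shown here with made-up responses, not the AddHealth data. It uses base R so that it runs without extra packages; the variable names `smoked`, `smoked_f`, and `pct` are illustrative.

```r
# Made-up 0/1 responses (0 = never smoked, 1 = has smoked); NA = missing.
smoked <- c(0, 1, 1, 1, 0, 1, NA, 1, 0, 1)

# Label the codes so the axis reads "No"/"Yes" rather than 0/1.
smoked_f <- factor(smoked, levels = c(0, 1), labels = c("No", "Yes"))

# table() drops NA by default, so these are percentages of valid responses.
pct <- round(100 * prop.table(table(smoked_f)), 1)

# Draw the bar chart with the percentage on the y axis.
barplot(pct,
        ylab = "Percent of valid responses",
        main = "Ever smoked a cigarette (toy data)",
        ylim = c(0, 100))
```

Computing the percentages before plotting makes it easy to quote the same numbers in the written description of the graph.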
This graph shows the data from question H4ID9A in the addHealth data, which asks whether the respondent has had gum disease or tooth loss due to cavities in the last four weeks. The first bar, under 0, shows that about 97% of people did not have gum disease or tooth loss in the last four weeks, while only about 3% did. Because this variable has only two categories, no and yes, skew and modality do not describe the graph in a meaningful way. The responses cover just those two categories. There is no midpoint, but the mode is that people have not been diagnosed with gum disease or tooth loss. There are no outliers for this variable.

    kable(freq(addHealth$H4ID5A))

Table rows: 0, 1, NA's, Total

    # Set code 6 to NA
    addHealth$H4ID5A <- ifelse(addHealth$H4ID5A == 6, NA, addHealth$H4ID5A)
This graph shows the data for question H4ID5A, which asks whether the respondent has had cancer, lymphoma, or leukemia. The first bar, under 0, shows that about 98.7% of people do not have any of these three health problems, while about 1.3% of people do have one of them. Because this variable has only two categories, no and yes, skew and modality do not describe the graph in a meaningful way. The responses cover just those two categories. There is no midpoint, but the mode is that more people DO NOT have cancer, lymphoma, or leukemia. There are no outliers for this variable.

    kable(freq(addHealth$H4ID5F))

Table rows: 0, 1, 6, NA's, Total

    # Set codes 6 and 7 to NA
    addHealth$H4ID5F <- ifelse(addHealth$H4ID5F %in% c(6, 7), NA, addHealth$H4ID5F)
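Each description in this report names the mode, meaning the most common category. A small sketch of how the mode and its share of valid responses could be computed is shown here. The responses are made up, and the names `diagnosed`, `mode_value`, and `mode_share` are illustrative.

```r
# Made-up 0/1 responses (0 = no diagnosis, 1 = diagnosis); NA = missing.
diagnosed <- c(0, 0, 0, 1, 0, NA, 0, 0, 1, 0)

# Count each category; table() ignores missing values by default.
counts <- table(diagnosed)

# The mode is the category with the largest count.
# which.max() returns the first category if there is a tie.
mode_value <- names(counts)[which.max(counts)]

# Share of valid responses that fall in the modal category.
mode_share <- max(counts) / sum(counts)

mode_value
round(100 * mode_share, 1)
```

For a two-category variable, the mode and its percentage summarize the distribution completely, since the other category's share is whatever remains.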
Our final graph shows the data for question H4ID5F, which asks whether the respondent has had any of three health problems: asthma, chronic bronchitis, or emphysema. The first bar, under 0, shows that about 85% of respondents DID NOT have one of the three conditions listed above, and the bar under 1 shows the respondents who DID have one of them. Because this variable has only two categories, no and yes, skew and modality do not describe the graph in a meaningful way. The responses cover just those two categories. There is no midpoint, but the mode is that more people DO NOT have asthma, chronic bronchitis, or emphysema.