Stats1 Definitions and Variables
Evan Misshula
2020-01-31
Evan Misshula Stats1 Definitions and Variables 2020-01-31 1 / 73 Outline
1 Data Management and Descriptive Statistics
2 Two graphs
Evan Misshula Stats1 Definitions and Variables 2020-01-31 2 / 73 Topics
Topics for masters statisics 1 Frequency Distribution Presentation, & Central Tendency I 2 Measures of Dispersion 3 Data Distribution and Variance 4 Hypothesis Testing Unit of Analysis 5 Basic Probability, Inference 6 Significance Tests
Evan Misshula Stats1 Definitions and Variables 2020-01-31 3 / 73 Topics Continued
Topics for masters statisics II 1 Chi Square, Expected values & Mean Testing Continued 2 Analysis of Variance (ANOVA) 3 Associations Nominal and Ordinal Data, Bivariate Correlation, 4 Pearson’s r and Spearman’s Rho 5 Bivariate Regression 6 Multiple Regression
Evan Misshula Stats1 Definitions and Variables 2020-01-31 4 / 73 Variables
Definition (Variables) Variables are characteristics that change from person to person or object to object in a population of interest Variables can take different values or levels They are called variables because they vary between cases Each variable has a level of measurement.
Evan Misshula Stats1 Definitions and Variables 2020-01-31 5 / 73 The four levels are: Nominal Ordinal Interval Ratio There are four levels of measurement
Measurement
Levels of Measurement Each variable has a level of measurement. The level of measurement is important because it determines the type of analysis.
Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 The four levels are: Nominal Ordinal Interval Ratio
Measurement
Levels of Measurement Each variable has a level of measurement. The level of measurement is important because it determines the type of analysis.
There are four levels of measurement
Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Nominal Ordinal Interval Ratio
Measurement
Levels of Measurement Each variable has a level of measurement. The four levels are: The level of measurement is important because it determines the type of analysis.
There are four levels of measurement
Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Measurement
Levels of Measurement Each variable has a level of measurement. The four levels are: The level of measurement is Nominal important because it Ordinal determines the type of Interval analysis. Ratio There are four levels of measurement
Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Hierarchy of measurement
Evan Misshula Stats1 Definitions and Variables 2020-01-31 7 / 73 Nominal Variable
Categorical, Nominal or Qualitative data The word nominal means names A nominal variable ONLY describes something The only fuction is to label and categorize NO INHERENT NUMERIC QUANTITY, NO Ranking of levels or ordering scheme Numbers can be nominal level if there is no quantity associated with them. Categories should be distinct, mututally exclusive, and completely exhaustive
Evan Misshula Stats1 Definitions and Variables 2020-01-31 8 / 73 Nominal Variable Examples
Examples Sex Religious Affiliation Race Zip code State
Evan Misshula Stats1 Definitions and Variables 2020-01-31 9 / 73 Ordinal Variables
Definition (Ordinal Variables) Ordinal variables have a ranking The root of Ordinal is “Ord” for order The exact amount of difference is unknown rank in most wanted
Evan Misshula Stats1 Definitions and Variables 2020-01-31 10 / 73 Ordinal Variable Examples
Examples rating at a restaurant rank in the police department letter grade Service ratings 1=Poor 2=Fair 3=Good 4=Excellent
Evan Misshula Stats1 Definitions and Variables 2020-01-31 11 / 73 Interval variables
Definition (Interval variable) Interval Variables have inherently numeric values We can talk about the difference between two items
Evan Misshula Stats1 Definitions and Variables 2020-01-31 12 / 73 Interval Variable Gotcha
Interval Variable Gotcha Temperature has difference ’0’ is arbitrary Does not indicate that there is no temperature
Evan Misshula Stats1 Definitions and Variables 2020-01-31 13 / 73 Ratio Variables
Ratio Variables the same as interval but with a true 0 Number of children Age Prior Arrests Commute time
Evan Misshula Stats1 Definitions and Variables 2020-01-31 14 / 73 Variable Type
Dependent and Independent Dependent Variable is what you are trying to predict Independent is what you are using There are many names for each
Evan Misshula Stats1 Definitions and Variables 2020-01-31 15 / 73 Other names for dependent variables
Synonyms for Dependent outcome response
Evan Misshula Stats1 Definitions and Variables 2020-01-31 16 / 73 Other names for independent variables
Synonyms for Independent feature explanatory causal control
Evan Misshula Stats1 Definitions and Variables 2020-01-31 17 / 73 Validity and Reliability
Definition (Validity) Validity: Addresses the question of whether the variable used actually reflects the concept or theory you seek to examine. Reliability: A measure is reliable if it is consistent and stable. stable is if it remains the same when measured in the same group reliable is if the same person in different groups will score similarly
Evan Misshula Stats1 Definitions and Variables 2020-01-31 18 / 73 Univariate Descriptive Statistics
Univariate descriptive statistics describes one variable Univariate statistics is composed of: Central Tendency Measures of Dispersion Form of the distribution
Evan Misshula Stats1 Definitions and Variables 2020-01-31 19 / 73 a batting average
a shooting percentage
an average length of stay for pretrial detention
a median income for a zip code
Definition of a statistic
Example (Example statistics)
Definition (What is a statistic?) A statistic is one number that summarizes many numbers
Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 a shooting percentage
an average length of stay for pretrial detention
a median income for a zip code
Definition of a statistic
Example (Example statistics) a batting average Definition (What is a statistic?) A statistic is one number that summarizes many numbers
Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 an average length of stay for pretrial detention
a median income for a zip code
Definition of a statistic
Example (Example statistics) a batting average
Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers
Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 a median income for a zip code
Definition of a statistic
Example (Example statistics) a batting average
Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers an average length of stay for pretrial detention
Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 Definition of a statistic
Example (Example statistics) a batting average
Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers an average length of stay for pretrial detention
a median income for a zip code
Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 The measure of central tendency you choose depends on the level of measurement of the variable
Measures of Central Tendency (MCT)
Goal of Central Tendency We want the single best number that describes typical case
Evan Misshula Stats1 Definitions and Variables 2020-01-31 21 / 73 Measures of Central Tendency (MCT)
Goal of Central Tendency We want the single best number that describes typical case
The measure of central tendency you choose depends on the level of measurement of the variable
Evan Misshula Stats1 Definitions and Variables 2020-01-31 21 / 73 Three measures Mean, Median and Mode
Definition (Mode) The mode is the most frequently occurring value It is the only MCT appropriate for Nominal Data
Evan Misshula Stats1 Definitions and Variables 2020-01-31 22 / 73 Median
Definition (Median) The median is the value in where half the values are greater and half are less
Evan Misshula Stats1 Definitions and Variables 2020-01-31 23 / 73 Mean
Definition (Mean) The mean is the sum of the values divided by the number of values
Evan Misshula Stats1 Definitions and Variables 2020-01-31 24 / 73 Calculate the Mode
Calculation example rdata <- c() rdata <- c(2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7) rdata <- c(rdata,7, 7, 8, 8, 8, 9, 9, 10) table(rdata) table(rdata)[order(table(rdata),decreasing=TRUE)]
rdata 2 3 4 5 6 7 8 9 10 1 2 1 3 3 4 3 2 1 rdata 7 5 6 8 3 9 2 4 10 4 3 3 3 2 2 1 1 1
Evan Misshula Stats1 Definitions and Variables 2020-01-31 25 / 73 Median
Median is the value that splits the distribution into two equal parts often used with distributions that are skewed or have outliers
myMedian <- function(x) { x1 <- x[order(x,decreasing = F)] l <- length(x) if(l %% 2 == 0) { return( .5*(x1[.5*l]+x1[.5*l+1])) } else { return( x1[(l+1)/2]) } }
Evan Misshula Stats1 Definitions and Variables 2020-01-31 26 / 73 Testing the median function
Do we get the same result for our own median function as the one built into R
median(rdata) myMedian(rdata)
[1] 6.5 [1] 6.5
Evan Misshula Stats1 Definitions and Variables 2020-01-31 27 / 73 Mean
Mean is also known as the average Only for Interval or Ratio level data Assumes order and equality of intervals Very sensitive to outliers and skew P X X = n
myMean <- function(x) { myMean <- sum(x)/length(x) return(myMean) } mean(rdata) myMean(rdata)
[1] 6.25 [1] 6.25
Evan Misshula Stats1 Definitions and Variables 2020-01-31 28 / 73 You need to understand what you are describing/modeling
textbook answer use the median in presence of skew and outliers
Mean and Outliers
Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data
Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 textbook answer use the median in presence of skew and outliers
Mean and Outliers
Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data
You need to understand what you are describing/modeling
Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 Mean and Outliers
Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data
You need to understand what you are describing/modeling
textbook answer use the median in presence of skew and outliers
Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 Skew
Definition (Defining skew) A distribution is skew if it is not symmetric If a distribution is skewed it is conventional to use the median not the mean median moves less for a given outlier than the mean
Evan Misshula Stats1 Definitions and Variables 2020-01-31 30 / 73 Skew examples
Below are two asymmetrical distributions: (a) positive (right) skew, (b) negative (left) skew.
Evan Misshula Stats1 Definitions and Variables 2020-01-31 31 / 73 Skew examples
In the masters program you are also responsible for recognizing different levels of kurtosis
Evan Misshula Stats1 Definitions and Variables 2020-01-31 32 / 73 Example
Choose the most appropriate measure of central tendency (MCT) for:
LAWCODE ARRESTBORO ARRESTPRECINCT AGEGROUP PERPSEX PERPRACE
Evan Misshula Stats1 Definitions and Variables 2020-01-31 33 / 73 the measures we should use are dictated by the variable’s level of measurement
Recap
Recap so far level of measurement limit on the information in a variable central tendency describe the typical case measures of dispersion tell us how typical the typical case is
Evan Misshula Stats1 Definitions and Variables 2020-01-31 34 / 73 Recap
Recap so far level of measurement limit on the information in a variable central tendency describe the typical case measures of dispersion tell us how typical the typical case is
the measures we should use are dictated by the variable’s level of measurement
Evan Misshula Stats1 Definitions and Variables 2020-01-31 34 / 73 R example
Open Stats1.R Choose the most appropriate MCT for: central tendency (MCT) for:
LAWCODE ARRESTBORO ARRESTPRECINCT AGEGROUP PERPSEX PERPRACE
Evan Misshula Stats1 Definitions and Variables 2020-01-31 35 / 73 R code for MCT
MCT R code setwd(’~/Documents/peter/2020/stats1/data/’) ## list.files() ## read in YTD arrests arr <- read.table("NYPD_Arrest_Data__Year_to_Date_2019.csv", header=T,sep=",",nrow=214618) head(arr[1:3,1:4])
ARREST_KEY ARREST_DATE PD_CD PD_DESC 1 206892169 12/31/2019 907 IMPAIRED DRIVING,DRUG 2 206888084 12/31/2019 739 FRAUD,UNCLASSIFIED-FELONY 3 206890433 12/31/2019 122 HOMICIDE, NEGLIGENT, VEHICLE,
Evan Misshula Stats1 Definitions and Variables 2020-01-31 36 / 73 R can help us
first get the answer then make it look good
Some simple descriptives
Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?
Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 first get the answer then make it look good
Some simple descriptives
Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?
R can help us
Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 Some simple descriptives
Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?
R can help us
first get the answer then make it look good
Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 how many arrests each month
Var1 Freq 1 8348 2 9027 3 9092 4 9364 5 7605 6 8570 7 8630 8 8279 9 7846 10 9806 11 8533 12 7290
Evan Misshula Stats1 Definitions and Variables 2020-01-31 38 / 73 What about percent?
V1 8.20 8.80 8.90 9.10 7.40 8.40 8.40 8.10 7.70 9.60 8.30 7.10 100.00
Evan Misshula Stats1 Definitions and Variables 2020-01-31 39 / 73 What about leagal category and month?
First, let’s look at the raw numbers
V1 F I M V 48 3,025 21 5,155 99 45 3,206 23 5,643 110 41 3,366 24 5,540 121 66 3,541 21 5,598 138 36 3,070 13 4,394 92 49 3,245 12 5,164 100 61 3,555 18 4,889 107 57 3,374 12 4,736 100 41 3,181 17 4,513 94 91 4,037 21 5,513 144 64 3,516 13 4,876 64 62 2,905 14 4,244 65 661 40,021 209 60,265 1,234
Evan Misshula Stats1 Definitions and Variables 2020-01-31 40 / 73 Let’s get the mean, median for arrests?
mnArr <- as.data.frame.matrix(table(month(arrDate),arr$LAW_CAT_CD)) mean(mnArr$F) median(mnArr$F)
[1] 3335.083 [1] 3305.5
Evan Misshula Stats1 Definitions and Variables 2020-01-31 41 / 73 a rough graph
barplot(mnArr$F)
Evan Misshula Stats1 Definitions and Variables 2020-01-31 42 / 73 vnMisshula Evan
0 1000 2000 3000 4000 5000 6000 7000 tt1Dfiiin n Variables and Definitions Stats1 000-14 73 / 43 2020-01-31 Better graph
myLabels <- c("jan","feb","mar","apr","may","jun") myLabels1 <- c("jan","jun") FelRange <- c(0,max(mnArr$F)*1.1) myAts=(2000*(0:4)) myYlabs=prettyNum(2000*(0:4),big.mark=",") myData <- mnArr$F names(myData) <- myLabels barplot(myData,border="orange",col="steelblue",axes=T,ylab="arrests",ylim=c(0,9000),las=1) title("Plot of monthly felony arrests")
Evan Misshula Stats1 Definitions and Variables 2020-01-31 44 / 73 Plot of monthly felony arrests
8000
6000
arrests 4000
2000
0 jan feb mar apr may jun
Evan Misshula Stats1 Definitions and Variables 2020-01-31 45 / 73 Spread
Two distributions can have the same mean but very different spread of values the amount of variation is very important there can be a little, there can be a lot
Evan Misshula Stats1 Definitions and Variables 2020-01-31 46 / 73 Variation is important because the typical case can be misleading
think of the crash or Bill Gates
Variability
More names for spread variation, dispersion
Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 think of the crash or Bill Gates
Variability
More names for spread variation, dispersion
Variation is important because the typical case can be misleading
Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 Variability
More names for spread variation, dispersion
Variation is important because the typical case can be misleading
think of the crash or Bill Gates
Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 Central Tendency vs. Variability
Central Tendency vs. Variability Central Tendency shows typical case Measures of dispersion show spread of values around the typical case
Evan Misshula Stats1 Definitions and Variables 2020-01-31 48 / 73 Variability Examples
small data examples d1 <- c( 7, 6, 3, 3, 1) d2 <- c( 3, 4, 4, 5, 4) d3 <- c( 4, 4, 4, 4, 4) c(mean(d1), median(d1), sd(d1)) c( mean(d2), median(d2), sd(d2)) c( mean(d3), median(d3), sd(d3))
[1] 4.00000 3.00000 2.44949 [1] 4.0000000 4.0000000 0.7071068 [1] 4 4 0
Evan Misshula Stats1 Definitions and Variables 2020-01-31 49 / 73 K(1002 − PK p2) IQV = i=1 i 1002(K − 1)
Measures of Dispersion
Nominal variables proportion in modal category Index of Qualitative Variation
Evan Misshula Stats1 Definitions and Variables 2020-01-31 50 / 73 Measures of Dispersion
Nominal variables proportion in modal category Index of Qualitative Variation K(1002 − PK p2) IQV = i=1 i 1002(K − 1)
Evan Misshula Stats1 Definitions and Variables 2020-01-31 50 / 73 MOD Ordinal
Measures of dispersion for ordinal variables proportion in modal category Index of Qualitative Variation
Evan Misshula Stats1 Definitions and Variables 2020-01-31 51 / 73 MOD Interval/Ratio
Measures of dispersion for interval/ratio variables range variance standard deviation
Evan Misshula Stats1 Definitions and Variables 2020-01-31 52 / 73 Computing Proportion
Proportion in modal category formula Number modal ∗ 100 Total N
Evan Misshula Stats1 Definitions and Variables 2020-01-31 53 / 73 Computing Proportion
Proportion in modal category code rdata <- c(1,3,3,3,3,5) freqTable <- table(rdata) ordFreqTable <- freqTable[order(freqTable,decreasing = T)] propOrdFTable <- prop.table(ordFreqTable) 100*propOrdFTable[1]
3 66.66667
Evan Misshula Stats1 Definitions and Variables 2020-01-31 54 / 73 Computing IQV
Computing IQV Fear of Crime resp Not Concerned at All 3 A Little Concerned 4 Quite Concerned 6 Very Concerned 20
Evan Misshula Stats1 Definitions and Variables 2020-01-31 55 / 73 Computing IQV in R
Computing IQV in R myCr <- c(rep(1,3),rep(2,4),rep(3,6), rep(4,20)) myFreq <- table(myCr) myFreq
myCr 1 2 3 4 3 4 6 20
Evan Misshula Stats1 Definitions and Variables 2020-01-31 56 / 73 Computing IQV proportions
continuing the calculation of IQV K <- length(myFreq) myPropFreq <- prop.table(myFreq) sqProp <- apply(X=myPropFreq,MARGIN = 1,FUN = function(x){return(x^2)}) sumSqProp <- sum(sqProp) IQV <- (K/(K-1))*(1-sumSqProp) IQV
[1] 0.7689011
Evan Misshula Stats1 Definitions and Variables 2020-01-31 57 / 73 Recipe to function
The style of this is called imperative if we want to make it so we can use it we make it a function
IQV <- function(myCr) { myFreq <- table(myCr) K <- length(myFreq) myPropFreq <- prop.table(myFreq) sqProp <- apply(X=myPropFreq,MARGIN = 1, FUN = function(x){return(x^2)}) sumSqProp <- sum(sqProp) IQV <- (K/(K-1))*(1-sumSqProp) return(IQV) }
Evan Misshula Stats1 Definitions and Variables 2020-01-31 58 / 73 Interval Ratio Variables
There are three common measures of variability range variance standard deviation
Evan Misshula Stats1 Definitions and Variables 2020-01-31 59 / 73 Range
Calculating range
Range = Xmaximum − Xminimum
myData<-c(35,60,80,93,98) myRange <- max(myData)-min(myData) myRange range(myData)
[1] 63 [1] 35 98
Evan Misshula Stats1 Definitions and Variables 2020-01-31 60 / 73 Range advantages and disadvantages
Advantages of the range easy to compute interval includes all data
Disadvantages of the range crude because: it relies on only two points it ignores N-2 points it is very sensitive to outliers
Evan Misshula Stats1 Definitions and Variables 2020-01-31 61 / 73 Use of range
should never be used as the sole measure More examples
d1 <- c(10,39,39,40,40,40,40,41,41,47) d2 <- c(39,39,40,40,40,40,41,41,41,88) d3 <- c(39,39,40,40,40,40,41,41,41,42) myRanges<-c(max(d1),max(d2),max(d3))-c(min(d1),min(d2),min(d3)) myRanges
[1] 37 49 3
Evan Misshula Stats1 Definitions and Variables 2020-01-31 62 / 73 Variance and standard deviation
Variance and standard deviation appropriate for Interval/Ratio data not used for nominal or ordinal data there is a one-to-one map from variance to standard deviation
[1] 10 39 39 40 40 40 40 41 41 47 [1] 100.0111 [1] 100.0111
Evan Misshula Stats1 Definitions and Variables 2020-01-31 63 / 73 Fomulas for variance and standard deviation
Formulas for sample variance
PN 2 i=1(xi −(x)) var(x) = N Formulas for standard deviation d1 p var(d1) sd(x) = (var(x)) md1 <- mean(d1) sd(d1) sumSq <- unlist(lapply(X=(d1-md1),function(x){return(x^2)}))sqrt(myVar) myVar <- sum(sumSq)/(length(d1)-1) myVar [1] 10.00056 [1] 10.00056 [1] 10 39 39 40 40 40 40 41 41 47 [1] 100.0111 [1] 100.0111
Evan Misshula Stats1 Definitions and Variables 2020-01-31 64 / 73 long mathematical derivation not going to do it
Khan Academy Link
Sample vs. Population Standard Deviation
Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low
Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Khan Academy Link
Sample vs. Population Standard Deviation
Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low
long mathematical derivation not going to do it
Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Sample vs. Population Standard Deviation
Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low
long mathematical derivation not going to do it
Khan Academy Link
Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Another example
Example of univariate analysis Number of arrests (1,5,7,8,9)
arrestData <- c(1,5,7,8,9) mean(arrestData) sd(arrestData)
[1] 6 [1] 3.162278
Evan Misshula Stats1 Definitions and Variables 2020-01-31 66 / 73 Standard deviation insight
Insight into standard deviation The reason why the sum of the deviations from the mean is always zero: The mean serves as a balance point for the distribution The total distance of individual scores above the mean is exactly equal to the total distance of those below the mean The result is the positive and negative numbers cancelling each other out The solution is to get rid of the negative signs Squaring the deviations makes every number positive
Evan Misshula Stats1 Definitions and Variables 2020-01-31 67 / 73 Standard Deviation
Drawbacks of standard deviation It takes into account all values Drawbacks of standard deviation in the distribution Standard deviation and Values far from the mean are variance are sensitive to given extra weight because outliers (extreme values) deviations from the mean are squared
Evan Misshula Stats1 Definitions and Variables 2020-01-31 68 / 73 Stem and Leaf plot
a display useful in summarizing distribution Goes back to Aurther Bowley’s work in early 1900’s Popularized by John Tukey in 1977’s Exploratory Data Analysis
Evan Misshula Stats1 Definitions and Variables 2020-01-31 69 / 73 Computing a Stem and Leaf
class scores Class Scores: 75 8 36 36 55 55 42 36 50 42 100 83 27 58 55 55 27 83 17 58 82 27 92 50 42
Evan Misshula Stats1 Definitions and Variables 2020-01-31 70 / 73 Read then into R
myScores<-c(75,8,36,36,55,55,42,36,50,42,100,83,27,58,55,55,27,83,17,58,82,27,92,50,42) myOrdScores <- myScores[order(myScores)] stem(myScores,scale=1)
The decimal point is 1 digit(s) to the right of the |
0 | 87 2 | 777666 4 | 22200555588 6 | 5 8 | 2332 10 | 0
Evan Misshula Stats1 Definitions and Variables 2020-01-31 71 / 73 Display Stem
stem(myScores,scale=2)
The decimal point is 1 digit(s) to the right of the |
0 | 8 1 | 7 2 | 777 3 | 666 4 | 222 5 | 00555588 6 | 7 | 5 8 | 233 9 | 2 10 | 0
Evan Misshula Stats1 Definitions and Variables 2020-01-31 72 / 73 Next time
Topics for next time Probability The Gaussian Distribution and the “Normal” Curve Risk of Error
Evan Misshula Stats1 Definitions and Variables 2020-01-31 73 / 73