Stats1 Definitions and Variables

Evan Misshula

2020-01-31

Evan Misshula Stats1 Definitions and Variables 2020-01-31 1 / 73 Outline

1 Data Management and Descriptive Statistics

2 Two graphs

Evan Misshula Stats1 Definitions and Variables 2020-01-31 2 / 73 Topics

Topics for masters statisics 1 Frequency Distribution Presentation, & Central Tendency I 2 Measures of Dispersion 3 Data Distribution and 4 Hypothesis Testing Unit of Analysis 5 Basic Probability, Inference 6 Significance Tests

Evan Misshula Stats1 Definitions and Variables 2020-01-31 3 / 73 Topics Continued

Topics for masters statisics II 1 Chi Square, Expected values & Mean Testing Continued 2 Analysis of Variance (ANOVA) 3 Associations Nominal and Ordinal Data, Bivariate Correlation, 4 Pearson’s r and Spearman’s Rho 5 Bivariate Regression 6 Multiple Regression

Evan Misshula Stats1 Definitions and Variables 2020-01-31 4 / 73 Variables

Definition (Variables) Variables are characteristics that change from person to person or object to object in a population of interest Variables can take different values or levels They are called variables because they vary between cases Each variable has a .

Evan Misshula Stats1 Definitions and Variables 2020-01-31 5 / 73 The four levels are: Nominal Ordinal Interval Ratio There are four levels of measurement

Measurement

Levels of Measurement Each variable has a level of measurement. The level of measurement is important because it determines the type of analysis.

Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 The four levels are: Nominal Ordinal Interval Ratio

Measurement

Levels of Measurement Each variable has a level of measurement. The level of measurement is important because it determines the type of analysis.

There are four levels of measurement

Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Nominal Ordinal Interval Ratio

Measurement

Levels of Measurement Each variable has a level of measurement. The four levels are: The level of measurement is important because it determines the type of analysis.

There are four levels of measurement

Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Measurement

Levels of Measurement Each variable has a level of measurement. The four levels are: The level of measurement is Nominal important because it Ordinal determines the type of Interval analysis. Ratio There are four levels of measurement

Evan Misshula Stats1 Definitions and Variables 2020-01-31 6 / 73 Hierarchy of measurement

Evan Misshula Stats1 Definitions and Variables 2020-01-31 7 / 73 Nominal Variable

Categorical, Nominal or Qualitative data The word nominal means names A nominal variable ONLY describes something The only fuction is to label and categorize NO INHERENT NUMERIC QUANTITY, NO Ranking of levels or ordering scheme Numbers can be nominal level if there is no quantity associated with them. Categories should be distinct, mututally exclusive, and completely exhaustive

Evan Misshula Stats1 Definitions and Variables 2020-01-31 8 / 73 Nominal Variable Examples

Examples Sex Religious Affiliation Race Zip code State

Evan Misshula Stats1 Definitions and Variables 2020-01-31 9 / 73 Ordinal Variables

Definition (Ordinal Variables) Ordinal variables have a ranking The root of Ordinal is “Ord” for order The exact amount of difference is unknown rank in most wanted

Evan Misshula Stats1 Definitions and Variables 2020-01-31 10 / 73 Ordinal Variable Examples

Examples rating at a restaurant rank in the police department letter grade Service ratings 1=Poor 2=Fair 3=Good 4=Excellent

Evan Misshula Stats1 Definitions and Variables 2020-01-31 11 / 73 Interval variables

Definition (Interval variable) Interval Variables have inherently numeric values We can talk about the difference between two items

Evan Misshula Stats1 Definitions and Variables 2020-01-31 12 / 73 Interval Variable Gotcha

Interval Variable Gotcha Temperature has difference ’0’ is arbitrary Does not indicate that there is no temperature

Evan Misshula Stats1 Definitions and Variables 2020-01-31 13 / 73 Ratio Variables

Ratio Variables the same as interval but with a true 0 Number of children Age Prior Arrests Commute time

Evan Misshula Stats1 Definitions and Variables 2020-01-31 14 / 73 Variable Type

Dependent and Independent Dependent Variable is what you are trying to predict Independent is what you are using There are many names for each

Evan Misshula Stats1 Definitions and Variables 2020-01-31 15 / 73 Other names for dependent variables

Synonyms for Dependent outcome response

Evan Misshula Stats1 Definitions and Variables 2020-01-31 16 / 73 Other names for independent variables

Synonyms for Independent feature explanatory causal control

Evan Misshula Stats1 Definitions and Variables 2020-01-31 17 / 73 Validity and Reliability

Definition (Validity) Validity: Addresses the question of whether the variable used actually reflects the concept or theory you seek to examine. Reliability: A measure is reliable if it is consistent and stable. stable is if it remains the same when measured in the same group reliable is if the same person in different groups will score similarly

Evan Misshula Stats1 Definitions and Variables 2020-01-31 18 / 73 Univariate Descriptive Statistics

Univariate descriptive statistics describes one variable Univariate statistics is composed of: Central Tendency Measures of Dispersion Form of the distribution

Evan Misshula Stats1 Definitions and Variables 2020-01-31 19 / 73 a batting average

a shooting percentage

an average length of stay for pretrial detention

a median income for a zip code

Definition of a statistic

Example (Example statistics)

Definition (What is a statistic?) A statistic is one number that summarizes many numbers

Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 a shooting percentage

an average length of stay for pretrial detention

a median income for a zip code

Definition of a statistic

Example (Example statistics) a batting average Definition (What is a statistic?) A statistic is one number that summarizes many numbers

Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 an average length of stay for pretrial detention

a median income for a zip code

Definition of a statistic

Example (Example statistics) a batting average

Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers

Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 a median income for a zip code

Definition of a statistic

Example (Example statistics) a batting average

Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers an average length of stay for pretrial detention

Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 Definition of a statistic

Example (Example statistics) a batting average

Definition (What is a statistic?) a shooting percentage A statistic is one number that summarizes many numbers an average length of stay for pretrial detention

a median income for a zip code

Evan Misshula Stats1 Definitions and Variables 2020-01-31 20 / 73 The measure of central tendency you choose depends on the level of measurement of the variable

Measures of Central Tendency (MCT)

Goal of Central Tendency We want the single best number that describes typical case

Evan Misshula Stats1 Definitions and Variables 2020-01-31 21 / 73 Measures of Central Tendency (MCT)

Goal of Central Tendency We want the single best number that describes typical case

The measure of central tendency you choose depends on the level of measurement of the variable

Evan Misshula Stats1 Definitions and Variables 2020-01-31 21 / 73 Three measures Mean, Median and Mode

Definition (Mode) The mode is the most frequently occurring value It is the only MCT appropriate for Nominal Data

Evan Misshula Stats1 Definitions and Variables 2020-01-31 22 / 73 Median

Definition (Median) The median is the value in where half the values are greater and half are less

Evan Misshula Stats1 Definitions and Variables 2020-01-31 23 / 73 Mean

Definition (Mean) The mean is the sum of the values divided by the number of values

Evan Misshula Stats1 Definitions and Variables 2020-01-31 24 / 73 Calculate the Mode

Calculation example rdata <- c() rdata <- c(2, 3, 3, 4, 5, 5, 5, 6, 6, 6, 7, 7) rdata <- c(rdata,7, 7, 8, 8, 8, 9, 9, 10) table(rdata) table(rdata)[order(table(rdata),decreasing=TRUE)]

rdata 2 3 4 5 6 7 8 9 10 1 2 1 3 3 4 3 2 1 rdata 7 5 6 8 3 9 2 4 10 4 3 3 3 2 2 1 1 1

Evan Misshula Stats1 Definitions and Variables 2020-01-31 25 / 73 Median

Median is the value that splits the distribution into two equal parts often used with distributions that are skewed or have outliers

myMedian <- function(x) { x1 <- x[order(x,decreasing = F)] l <- length(x) if(l %% 2 == 0) { return( .5*(x1[.5*l]+x1[.5*l+1])) } else { return( x1[(l+1)/2]) } }

Evan Misshula Stats1 Definitions and Variables 2020-01-31 26 / 73 Testing the median function

Do we get the same result for our own median function as the one built into R

median(rdata) myMedian(rdata)

[1] 6.5 [1] 6.5

Evan Misshula Stats1 Definitions and Variables 2020-01-31 27 / 73 Mean

Mean is also known as the average Only for Interval or Ratio level data Assumes order and equality of intervals Very sensitive to outliers and skew P X X = n

myMean <- function(x) { myMean <- sum(x)/length(x) return(myMean) } mean(rdata) myMean(rdata)

[1] 6.25 [1] 6.25

Evan Misshula Stats1 Definitions and Variables 2020-01-31 28 / 73 You need to understand what you are describing/modeling

textbook answer use the median in presence of skew and outliers

Mean and Outliers

Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data

Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 textbook answer use the median in presence of skew and outliers

Mean and Outliers

Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data

You need to understand what you are describing/modeling

Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 Mean and Outliers

Mean, outliers and typical case the mean is sensitive to outliers outliers can be important crashes (stock market) frequent (arrestees) billionaires in income data

You need to understand what you are describing/modeling

textbook answer use the median in presence of skew and outliers

Evan Misshula Stats1 Definitions and Variables 2020-01-31 29 / 73 Skew

Definition (Defining skew) A distribution is skew if it is not symmetric If a distribution is skewed it is conventional to use the median not the mean median moves less for a given outlier than the mean

Evan Misshula Stats1 Definitions and Variables 2020-01-31 30 / 73 Skew examples

Below are two asymmetrical distributions: (a) positive (right) skew, (b) negative (left) skew.

Evan Misshula Stats1 Definitions and Variables 2020-01-31 31 / 73 Skew examples

In the masters program you are also responsible for recognizing different levels of kurtosis

Evan Misshula Stats1 Definitions and Variables 2020-01-31 32 / 73 Example

Choose the most appropriate measure of central tendency (MCT) for:

LAWCODE ARRESTBORO ARRESTPRECINCT AGEGROUP PERPSEX PERPRACE

Evan Misshula Stats1 Definitions and Variables 2020-01-31 33 / 73 the measures we should use are dictated by the variable’s level of measurement

Recap

Recap so far level of measurement limit on the information in a variable central tendency describe the typical case measures of dispersion tell us how typical the typical case is

Evan Misshula Stats1 Definitions and Variables 2020-01-31 34 / 73 Recap

Recap so far level of measurement limit on the information in a variable central tendency describe the typical case measures of dispersion tell us how typical the typical case is

the measures we should use are dictated by the variable’s level of measurement

Evan Misshula Stats1 Definitions and Variables 2020-01-31 34 / 73 R example

Open Stats1.R Choose the most appropriate MCT for: central tendency (MCT) for:

LAWCODE ARRESTBORO ARRESTPRECINCT AGEGROUP PERPSEX PERPRACE

Evan Misshula Stats1 Definitions and Variables 2020-01-31 35 / 73 R code for MCT

MCT R code setwd(’~/Documents/peter/2020/stats1/data/’) ## list.files() ## read in YTD arrests arr <- read.table("NYPD_Arrest_Data__Year_to_Date_2019.csv", header=T,sep=",",nrow=214618) head(arr[1:3,1:4])

ARREST_KEY ARREST_DATE PD_CD PD_DESC 1 206892169 12/31/2019 907 IMPAIRED DRIVING,DRUG 2 206888084 12/31/2019 739 FRAUD,UNCLASSIFIED-FELONY 3 206890433 12/31/2019 122 HOMICIDE, NEGLIGENT, VEHICLE,

Evan Misshula Stats1 Definitions and Variables 2020-01-31 36 / 73 R can help us

first get the answer then make it look good

Some simple descriptives

Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?

Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 first get the answer then make it look good

Some simple descriptives

Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?

R can help us

Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 Some simple descriptives

Questions to ask our data how many arrests of each type were there what % of arrests are in each category what about severity?

R can help us

first get the answer then make it look good

Evan Misshula Stats1 Definitions and Variables 2020-01-31 37 / 73 how many arrests each month

Var1 Freq 1 8348 2 9027 3 9092 4 9364 5 7605 6 8570 7 8630 8 8279 9 7846 10 9806 11 8533 12 7290

Evan Misshula Stats1 Definitions and Variables 2020-01-31 38 / 73 What about percent?

V1 8.20 8.80 8.90 9.10 7.40 8.40 8.40 8.10 7.70 9.60 8.30 7.10 100.00

Evan Misshula Stats1 Definitions and Variables 2020-01-31 39 / 73 What about leagal category and month?

First, let’s look at the raw numbers

V1 F I M V 48 3,025 21 5,155 99 45 3,206 23 5,643 110 41 3,366 24 5,540 121 66 3,541 21 5,598 138 36 3,070 13 4,394 92 49 3,245 12 5,164 100 61 3,555 18 4,889 107 57 3,374 12 4,736 100 41 3,181 17 4,513 94 91 4,037 21 5,513 144 64 3,516 13 4,876 64 62 2,905 14 4,244 65 661 40,021 209 60,265 1,234

Evan Misshula Stats1 Definitions and Variables 2020-01-31 40 / 73 Let’s get the mean, median for arrests?

mnArr <- as.data.frame.matrix(table(month(arrDate),arr$LAW_CAT_CD)) mean(mnArr$F) median(mnArr$F)

[1] 3335.083 [1] 3305.5

Evan Misshula Stats1 Definitions and Variables 2020-01-31 41 / 73 a rough graph

barplot(mnArr$F)

Evan Misshula Stats1 Definitions and Variables 2020-01-31 42 / 73 vnMisshula Evan

0 1000 2000 3000 4000 5000 6000 7000 tt1Dfiiin n Variables and Definitions Stats1 000-14 73 / 43 2020-01-31 Better graph

myLabels <- c("jan","feb","mar","apr","may","jun") myLabels1 <- c("jan","jun") FelRange <- c(0,max(mnArr$F)*1.1) myAts=(2000*(0:4)) myYlabs=prettyNum(2000*(0:4),big.mark=",") myData <- mnArr$F names(myData) <- myLabels barplot(myData,border="orange",col="steelblue",axes=T,ylab="arrests",ylim=c(0,9000),las=1) title("Plot of monthly felony arrests")

Evan Misshula Stats1 Definitions and Variables 2020-01-31 44 / 73 Plot of monthly felony arrests

8000

6000

arrests 4000

2000

0 jan feb mar apr may jun

Evan Misshula Stats1 Definitions and Variables 2020-01-31 45 / 73 Spread

Two distributions can have the same mean but very different spread of values the amount of variation is very important there can be a little, there can be a lot

Evan Misshula Stats1 Definitions and Variables 2020-01-31 46 / 73 Variation is important because the typical case can be misleading

think of the crash or Bill Gates

Variability

More names for spread variation, dispersion

Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 think of the crash or Bill Gates

Variability

More names for spread variation, dispersion

Variation is important because the typical case can be misleading

Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 Variability

More names for spread variation, dispersion

Variation is important because the typical case can be misleading

think of the crash or Bill Gates

Evan Misshula Stats1 Definitions and Variables 2020-01-31 47 / 73 Central Tendency vs. Variability

Central Tendency vs. Variability Central Tendency shows typical case Measures of dispersion show spread of values around the typical case

Evan Misshula Stats1 Definitions and Variables 2020-01-31 48 / 73 Variability Examples

small data examples d1 <- c( 7, 6, 3, 3, 1) d2 <- c( 3, 4, 4, 5, 4) d3 <- c( 4, 4, 4, 4, 4) c(mean(d1), median(d1), sd(d1)) c( mean(d2), median(d2), sd(d2)) c( mean(d3), median(d3), sd(d3))

[1] 4.00000 3.00000 2.44949 [1] 4.0000000 4.0000000 0.7071068 [1] 4 4 0

Evan Misshula Stats1 Definitions and Variables 2020-01-31 49 / 73 K(1002 − PK p2) IQV = i=1 i 1002(K − 1)

Measures of Dispersion

Nominal variables proportion in modal category Index of Qualitative Variation

Evan Misshula Stats1 Definitions and Variables 2020-01-31 50 / 73 Measures of Dispersion

Nominal variables proportion in modal category Index of Qualitative Variation K(1002 − PK p2) IQV = i=1 i 1002(K − 1)

Evan Misshula Stats1 Definitions and Variables 2020-01-31 50 / 73 MOD Ordinal

Measures of dispersion for ordinal variables proportion in modal category Index of Qualitative Variation

Evan Misshula Stats1 Definitions and Variables 2020-01-31 51 / 73 MOD Interval/Ratio

Measures of dispersion for interval/ratio variables variance

Evan Misshula Stats1 Definitions and Variables 2020-01-31 52 / 73 Computing Proportion

Proportion in modal category formula Number modal ∗ 100 Total N

Evan Misshula Stats1 Definitions and Variables 2020-01-31 53 / 73 Computing Proportion

Proportion in modal category code rdata <- c(1,3,3,3,3,5) freqTable <- table(rdata) ordFreqTable <- freqTable[order(freqTable,decreasing = T)] propOrdFTable <- prop.table(ordFreqTable) 100*propOrdFTable[1]

3 66.66667

Evan Misshula Stats1 Definitions and Variables 2020-01-31 54 / 73 Computing IQV

Computing IQV Fear of Crime resp Not Concerned at All 3 A Little Concerned 4 Quite Concerned 6 Very Concerned 20

Evan Misshula Stats1 Definitions and Variables 2020-01-31 55 / 73 Computing IQV in R

Computing IQV in R myCr <- c(rep(1,3),rep(2,4),rep(3,6), rep(4,20)) myFreq <- table(myCr) myFreq

myCr 1 2 3 4 3 4 6 20

Evan Misshula Stats1 Definitions and Variables 2020-01-31 56 / 73 Computing IQV proportions

continuing the calculation of IQV K <- length(myFreq) myPropFreq <- prop.table(myFreq) sqProp <- apply(X=myPropFreq,MARGIN = 1,FUN = function(x){return(x^2)}) sumSqProp <- sum(sqProp) IQV <- (K/(K-1))*(1-sumSqProp) IQV

[1] 0.7689011

Evan Misshula Stats1 Definitions and Variables 2020-01-31 57 / 73 Recipe to function

The style of this is called imperative if we want to make it so we can use it we make it a function

IQV <- function(myCr) { myFreq <- table(myCr) K <- length(myFreq) myPropFreq <- prop.table(myFreq) sqProp <- apply(X=myPropFreq,MARGIN = 1, FUN = function(x){return(x^2)}) sumSqProp <- sum(sqProp) IQV <- (K/(K-1))*(1-sumSqProp) return(IQV) }

Evan Misshula Stats1 Definitions and Variables 2020-01-31 58 / 73 Interval Ratio Variables

There are three common measures of variability range variance standard deviation

Evan Misshula Stats1 Definitions and Variables 2020-01-31 59 / 73 Range

Calculating range

Range = Xmaximum − Xminimum

myData<-c(35,60,80,93,98) myRange <- max(myData)-min(myData) myRange range(myData)

[1] 63 [1] 35 98

Evan Misshula Stats1 Definitions and Variables 2020-01-31 60 / 73 Range advantages and disadvantages

Advantages of the range easy to compute interval includes all data

Disadvantages of the range crude because: it relies on only two points it ignores N-2 points it is very sensitive to outliers

Evan Misshula Stats1 Definitions and Variables 2020-01-31 61 / 73 Use of range

should never be used as the sole measure More examples

d1 <- c(10,39,39,40,40,40,40,41,41,47) d2 <- c(39,39,40,40,40,40,41,41,41,88) d3 <- c(39,39,40,40,40,40,41,41,41,42) myRanges<-c(max(d1),max(d2),max(d3))-c(min(d1),min(d2),min(d3)) myRanges

[1] 37 49 3

Evan Misshula Stats1 Definitions and Variables 2020-01-31 62 / 73 Variance and standard deviation

Variance and standard deviation appropriate for Interval/Ratio data not used for nominal or ordinal data there is a one-to-one map from variance to standard deviation

[1] 10 39 39 40 40 40 40 41 41 47 [1] 100.0111 [1] 100.0111

Evan Misshula Stats1 Definitions and Variables 2020-01-31 63 / 73 Fomulas for variance and standard deviation

Formulas for sample variance

PN 2 i=1(xi −(x)) var(x) = N Formulas for standard deviation d1 p var(d1) sd(x) = (var(x)) md1 <- mean(d1) sd(d1) sumSq <- unlist(lapply(X=(d1-md1),function(x){return(x^2)}))sqrt(myVar) myVar <- sum(sumSq)/(length(d1)-1) myVar [1] 10.00056 [1] 10.00056 [1] 10 39 39 40 40 40 40 41 41 47 [1] 100.0111 [1] 100.0111

Evan Misshula Stats1 Definitions and Variables 2020-01-31 64 / 73 long mathematical derivation not going to do it

Khan Academy Link

Sample vs. Population Standard Deviation

Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low

Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Khan Academy Link

Sample vs. Population Standard Deviation

Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low

long mathematical derivation not going to do it

Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Sample vs. Population Standard Deviation

Sample vs. Population Standard Devaiation They both use the difference between the mean ($ µ $ or $ X $)andeachobservation(individualXvalue)Thesampleuses“n − 100insteadofNtoadjustforsamplingerror Sample statistics are used to estimate population parameters Samples tend to have less variation than the populations n-1 is used to adjust for this by inflating a value we know to be artificially low

long mathematical derivation not going to do it

Khan Academy Link

Evan Misshula Stats1 Definitions and Variables 2020-01-31 65 / 73 Another example

Example of univariate analysis Number of arrests (1,5,7,8,9)

arrestData <- c(1,5,7,8,9) mean(arrestData) sd(arrestData)

[1] 6 [1] 3.162278

Evan Misshula Stats1 Definitions and Variables 2020-01-31 66 / 73 Standard deviation insight

Insight into standard deviation The reason why the sum of the deviations from the mean is always zero: The mean serves as a balance point for the distribution The total distance of individual scores above the mean is exactly equal to the total distance of those below the mean The result is the positive and negative numbers cancelling each other out The solution is to get rid of the negative signs Squaring the deviations makes every number positive

Evan Misshula Stats1 Definitions and Variables 2020-01-31 67 / 73 Standard Deviation

Drawbacks of standard deviation It takes into account all values Drawbacks of standard deviation in the distribution Standard deviation and Values far from the mean are variance are sensitive to given extra weight because outliers (extreme values) deviations from the mean are squared

Evan Misshula Stats1 Definitions and Variables 2020-01-31 68 / 73 Stem and Leaf plot

a display useful in summarizing distribution Goes back to Aurther Bowley’s work in early 1900’s Popularized by John Tukey in 1977’s Exploratory Data Analysis

Evan Misshula Stats1 Definitions and Variables 2020-01-31 69 / 73 Computing a Stem and Leaf

class scores Class Scores: 75 8 36 36 55 55 42 36 50 42 100 83 27 58 55 55 27 83 17 58 82 27 92 50 42

Evan Misshula Stats1 Definitions and Variables 2020-01-31 70 / 73 Read then into R

myScores<-c(75,8,36,36,55,55,42,36,50,42,100,83,27,58,55,55,27,83,17,58,82,27,92,50,42) myOrdScores <- myScores[order(myScores)] stem(myScores,scale=1)

The decimal point is 1 digit(s) to the right of the |

0 | 87 2 | 777666 4 | 22200555588 6 | 5 8 | 2332 10 | 0

Evan Misshula Stats1 Definitions and Variables 2020-01-31 71 / 73 Display Stem

stem(myScores,scale=2)

The decimal point is 1 digit(s) to the right of the |

0 | 8 1 | 7 2 | 777 3 | 666 4 | 222 5 | 00555588 6 | 7 | 5 8 | 233 9 | 2 10 | 0

Evan Misshula Stats1 Definitions and Variables 2020-01-31 72 / 73 Next time

Topics for next time Probability The Gaussian Distribution and the “Normal” Curve Risk of Error

Evan Misshula Stats1 Definitions and Variables 2020-01-31 73 / 73