Week 1 Basic Statistical Concepts, Part I

Week 1 Objectives

We will give an introduction to the statistical package R, and to statistics. The introduction to R is included in a different pdf file and a script file. After motivating the need for statistics in engineering and scientific research, we will introduce fundamental notions that form the foundation for the material in later weeks. In particular, we will introduce the notions of population, census, sample, and sampling variability, and will define such basic quantities as the proportion, average, variance, and percentiles.

Outline

1 Why Statistics?

2 Populations, Samples, and Census
   Some Sampling Concepts
   Random Variables and Statistical Populations

3 Proportions, Averages, Variances, and Percentiles
   Population Proportions and Sample Proportions
   Population Averages and Sample Averages
   Population Variance and Sample Variance
   Sample Percentiles

Why Statistics?

Example (Examples of Engineering/Scientific Studies)
• Comparing the compressive strength of two or more cement mixtures.
• Comparing the effectiveness of three cleaning products in removing four different types of stains.
• Predicting failure time on the basis of stress applied.
• Assessing the effectiveness of a new traffic regulatory measure in reducing the weekly rate of accidents.
• Testing a manufacturer's claim regarding a product's quality.
• Studying the relation between salary increases and employee productivity in a large corporation.

These studies require Statistics due to the intrinsic variability:
• The compressive strength of different preparations of the same cement mixture will differ. The figure in http://personal.psu.edu/acq/401/fig/HistComprStrCement.pdf shows 32 compressive strength measurements (in megapascals) of test cylinders (6 in. diameter, 12 in. high), made with a water/cement ratio of 0.4 and measured on the 28th day after they were made.
• Under the same stress, two beams fail at different times.
• The proportion of defective items of a certain product will differ from batch to batch.
Intrinsic variability renders the objectives of the case studies, as stated, ambiguous.

The objectives of the case studies can be made precise if stated in terms of averages or means:

• Comparing the average compressive strength of two different cement mixtures.
• Estimation of average failure time on the basis of stress applied.
• Estimation of the average proportion of defective items.

Moreover, because of variability, the words "average" and "mean" have a technical meaning which can be made clear through the concepts of population and sample.

Populations, Samples, and Census

Definition
A population is a well-defined collection of objects or subjects, of relevance to a particular study, which are exposed to the same treatment or method.

Population members are called units. The objective of a study is to investigate certain characteristic(s) of the units of the population(s) of interest.

Example (Populations and Unit Characteristics)
• All water samples taken from a lake. Characteristics: Mercury concentration; Concentration of other pollutants.
• All items of a certain manufactured product (that have been, or will be, produced). Characteristic: Proportion of defective items.
• All students enrolled in Big Ten universities during the 2019-2020 academic year. Characteristics: Favorite type of music; Political affiliation.
• Two types of cleaning products. Characteristic: Cleaning effectiveness.

Populations that consist of the same type of units but differ in the treatment, or method, applied to them are called treatment populations.

Example (Treatment Populations)
• The concentration of pollutants in water samples is analyzed by two different labs. Water samples sent to Lab 1 constitute population 1, and those sent to Lab 2 constitute population 2.
• The time to failure of beams is studied under different stress conditions. The beams subjected to each stress condition constitute different populations.

Census and Samples

• Full (i.e., population-level) understanding of a characteristic can only be achieved by examining all population units. This is called a census.
• However, taking a census can be time consuming and expensive: the 2000 U.S. Census cost $6.5 billion, while the 2010 Census cost $13 billion.
• Moreover, a census is not feasible if the population is hypothetical or conceptual, i.e., not all members are available for examination.
• Because of the above, we typically settle for examining all units in a sample, which is a subset of the population.

Due to the intrinsic variability, the sample properties/attributes of the characteristic of interest will differ from the population-level properties/attributes. For example:

• The average mercury concentration in 25 water samples will differ from the overall mercury concentration in the lake.
• The proportion in a sample of 100 PSU students who favor expanding the use of solar energy will differ from the corresponding proportion of all PSU students.
• The relation between a bear's chest girth and weight in a sample of 10 bears will differ from the corresponding relation in the entire population of 50 bears in a forested region.

The GOOD NEWS is that, if the sample is suitably drawn, then sample properties approximate the population properties.

Figure: Population and sample relationships between chest girth and weight of black bears. (Horizontal axis: Chest Girth; vertical axis: Weight.)

Sampling Variability

Sample properties of the characteristic of interest also differ from sample to sample. For example:
1 The number of US citizens, in a sample of size 20, who favor expanding solar energy will (most likely) be different from the corresponding number in a different sample of 20 US citizens.
2 The average mercury concentration in two sets of 25 water samples drawn from a lake will differ.
The term sampling variability is used to describe such differences in the properties of the characteristic of interest from sample to sample.

Figure: Illustration of Sampling Variability. (Horizontal axis: Chest Girth; vertical axis: Weight.)

Parameters and Statistics

• Population-level properties/attributes of the characteristic(s) of interest are called (population) parameters. Examples of parameters include averages, proportions, percentiles, and the correlation coefficient.
• The corresponding sample properties/attributes of characteristics are called statistics.
• Sample statistics approximate the corresponding population parameters but are not equal to them.
• Statistical inference deals with the uncertainty issues which arise in approximating parameters by statistics. The tools of statistical inference include point and interval estimation, hypothesis testing, and prediction.

Some Sampling Concepts

For valid statistical inference the sample must be representative of the population. For example, a sample of PSU basketball players is not representative of PSU students if the characteristic of interest is height. Typically it is hard to tell whether a sample is representative of the population. So we define a sample to be representative if it allows for valid statistical inference (a circular definition!).

The only guarantee for that comes from the sampling method, i.e., the method used to collect the sample. The good news is that there are several sampling methods which guarantee representativeness.

Definition
A sample of size n is a simple random sample (s.r.s.) if the selection process ensures that every sample of size n has an equal chance of being selected.

• In simple random sampling every member of the population has the same chance of being included in the sample.

Example
A class consists of 35 male and 15 female students. To obtain a sample of 2, one flips a coin and selects a female student if it lands H, or a male student if it lands T. Does every student have the same chance of being included in the sample? Is this an s.r.s.?

• The fact that every member of the population has the same chance of being included in the sample does not, by itself, imply that the sample is an s.r.s.

Example
To select a sample of 2 students from a population of 20 male and 20 female students, one selects at random one male and one female student. Does every student have the same chance of being included in the sample? Is this an s.r.s.?
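
The distinction in this example can be checked with a few lines of arithmetic. The R sketch below is only an illustration (the variable names are ours, not part of the course script): it compares the chance that a given student is included, and the number of samples that can occur, under the one-male-one-female scheme and under simple random sampling.

```r
# Population of 20 male and 20 female students (sizes from the example).
n_male <- 20
n_female <- 20
N <- n_male + n_female

# Scheme in the example: pick one male and one female at random.
incl_prob_scheme <- 1 / n_male        # inclusion chance of any given student
n_pairs_scheme <- n_male * n_female   # only mixed pairs can occur

# Simple random sampling of size 2: all choose(N, 2) pairs are equally likely.
incl_prob_srs <- 2 / N                # inclusion chance of any given student
n_pairs_srs <- choose(N, 2)           # number of possible pairs

# Both schemes give each student the same inclusion chance ...
print(c(scheme = incl_prob_scheme, srs = incl_prob_srs))
# ... but the scheme can never produce a same-gender pair, so not every
# sample of size 2 has the same chance of being selected.
print(c(scheme = n_pairs_scheme, srs = n_pairs_srs))
```

Equal inclusion chances together with unequal chances for samples is exactly what the bullet above describes.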

Another sampling method for obtaining a representative sample is called stratified sampling.

Definition
A stratified sample consists of simple random samples from each of a number of groups (which are non-overlapping and make up the entire population) called strata.

Examples of strata include: ethnic groups, age groups, and production facilities.

• A common method of choosing the within-strata sample sizes ensures that the sample representation of each stratum equals its population representation (proportional allocation).

The stratified sample in the last example used proportional allocation.

• If the units in the different strata differ in terms of the characteristic under study, stratified sampling is preferable to s.r.s.

If different production facilities differ in terms of the proportion of defective products, a stratified sample is preferable for estimating the overall proportion of defective items produced.
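
As a rough illustration of proportional allocation, the following R sketch draws a stratified sample from a made-up population. The three facilities, their sizes, and the total sample size are assumptions chosen for the illustration; they do not come from the course material.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical population: 1000 items from three production facilities.
stratum_sizes <- c(A = 500, B = 300, C = 200)
facility <- rep(names(stratum_sizes), times = stratum_sizes)
unit_id <- seq_along(facility)

n <- 50  # total sample size (assumed)

# Proportional allocation: each stratum's share of the sample equals its
# share of the population. (In general, rounding may require a small
# adjustment so that the stratum sample sizes add up to n.)
alloc <- round(n * stratum_sizes / sum(stratum_sizes))

# Draw a simple random sample within each stratum.
strat_sample <- unlist(lapply(names(alloc), function(h) {
  ids <- unit_id[facility == h]
  ids[sample.int(length(ids), alloc[[h]])]
}))

# Sample shares by facility match the population shares.
print(table(facility[strat_sample]) / n)
```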

• Conceptually, selecting a s.r.s. of size n from a population of N units corresponds to the following steps:

STEP 1: Assign to each unit a number from 1 to N.
STEP 2: Write each number on a slip of paper, place the N slips of paper in an urn, and shuffle them.
STEP 3: Select n slips of paper at random, one at a time and without replacement.

• Because of STEP 3, simple random sampling is also referred to as sampling without replacement.
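
In R, the three steps can be sketched with the built-in function sample(). The population size and sample size below are arbitrary values chosen for illustration.

```r
set.seed(401)  # arbitrary seed, only for reproducibility

N <- 50  # population size (illustrative)
n <- 5   # sample size (illustrative)

units <- 1:N                                     # STEP 1: label the units 1..N
srs <- sample(units, size = n, replace = FALSE)  # STEPS 2-3: shuffle, draw n slips
print(srs)
```

The selected numbers identify the units that make up the sample; no number can appear twice because replace = FALSE.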

If the selection of slips of paper in STEP 3 of the previous slide is done with replacement, we say that we have sampled with replacement. For example, rolling a die 10 times gives a sample with replacement from the numbers 1, 2,..., 6. Sampling with replacement is easier to work with from a mathematical point of view. When the population is very large, sampling with and without replacement are practically equivalent.
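
A short R sketch of sampling with replacement follows. The helper no_repeat_prob() is our own illustrative function, not part of the course script; it computes the chance that draws with replacement contain no repeated unit, which gives one way to see why the two schemes become practically equivalent for large populations.

```r
set.seed(6)  # arbitrary seed, only for reproducibility

# Ten rolls of a die: a sample with replacement from 1, 2, ..., 6.
rolls <- sample(1:6, size = 10, replace = TRUE)
print(rolls)

# Chance that n draws WITH replacement from N units contain no repeats,
# i.e., could equally have come from sampling without replacement.
no_repeat_prob <- function(N, n = 10) prod((N - 0:(n - 1)) / N)
print(sapply(c(6, 100, 10000, 1e6), no_repeat_prob))
```

For N = 6 the chance is 0 (ten rolls must repeat a face), while for very large N it is close to 1.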

Non-representative Sampling

• Non-representative samples arise whenever there is selection bias. That is, the sampling process excludes, or leaves under-represented, parts of the population.
• Two examples of non-representative samples are self-selected and convenience samples.
• Self-selected samples often occur in opinion surveys. For example, in political surveys, those who feel that things are running smoothly, or who have no strong feelings, are less likely to respond than activists.

• Convenience samples are made up of the most easily accessed units. For example, randomly selecting students from your classes will not result in a sample that is representative of all PSU students, since your classes are mostly composed of students with the same major as you.
• The Literary Digest poll of 1936 is often used to illustrate the misleading potential of self-selected and convenience samples.

Example (The Literary Digest poll of 1936)
The magazine had been extremely successful in predicting the results of US presidential elections, but in 1936 it predicted a 3-to-2 victory for Republican Alfred Landon over the Democratic incumbent Franklin Delano Roosevelt. This prediction was based on 2.3 million responses to a survey sent to 10 million voters randomly selected from phone books. On the other hand, Gallup correctly predicted the outcome of that election by surveying only 50,000 people.

Random Variables and Statistical Populations

Variable = a Numerical Characteristic

• Numerical characteristics such as length, mercury concentration, or number of accidents are called quantitative.
• Non-quantitative characteristics are called qualitative or categorical. Examples include gender, make of car, eye color, political affiliation, and strength of opinion.
• For statistical purposes, the categories of qualitative characteristics are labeled with numbers, e.g., 'male' = −1, 'female' = +1. For characteristics such as "strength of opinion", the chosen numbers should reflect the implicit ordering.
• A characteristic expressed as a number is called a variable.

Types of Variables

Quantitative variables expressing measurements on a continuous scale are called continuous. Measurements of length, strength, weight, or time to failure are examples of continuous variables. Qualitative variables, as well as quantitative variables expressing counts, e.g., number of defective items in a batch, are called discrete. When two or more characteristics are measured on each population unit, we have bivariate or multivariate variables. Example of bivariate: Salary increase and productivity. Example of multivariate: Age, income, education level.

Random Variables and Statistical Populations

• When a unit is randomly sampled from a population, the value of its variable will be denoted by X (or Y, or Z, etc.).
• Because of the intrinsic variability, X is not known a priori, and thus it is called a random variable (r.v.).
• The population from which a random variable is drawn is called the underlying population of the r.v.
• The collection of the variable values of all population units is called the statistical population.
• The statistical population of an r.v. should not be confused with the set of values a variable can take.

Example (Examples of Statistical Populations)
1 A list of the weight of every PSU student is the statistical population of the r.v. weight.
2 A list of 1s and 0s representing every student's opinion on whether or not solar energy should be expanded is the statistical population of the r.v. expressing opinion on solar energy.

Sampling from the Statistical Population

It should be intuitively clear that taking a sample of n units from some population, and recording the variable of each sampled unit, is equivalent to taking a sample of n units from the statistical population of the random variable. Henceforth, the word sample will be mainly used to denote a sample from the statistical population. Thus, samples will be thought of as a collection of numbers. Moreover,

1 The numbers are not known a priori, so they are r.v.'s.
2 A sample of size n will be denoted by X1, X2,..., Xn.
3 Once the sample values are recorded they will be denoted by x1, x2,..., xn.

Population Proportions and Sample Proportions

Proportions are relevant whenever the variable of interest is categorical, or has been categorized.

Definition

1 If the population has $N$ units, and $N_i$ units are in category $i$, then the population proportion for category $i$ is

$$p_i = \frac{\#\{\text{population units of category } i\}}{\#\{\text{population units}\}} = \frac{N_i}{N}.$$

2 If a sample of size $n$ is taken, and $n_i$ sample units are in category $i$, then the sample proportion for category $i$ is

$$\hat{p}_i = \frac{\#\{\text{sample units of category } i\}}{\#\{\text{sample units}\}} = \frac{n_i}{n}.$$

Example
1 In a sample of 1000 adults, 72% favor tougher penalties for drunk driving. Is the correct notation for 0.72 $p$ or $\hat{p}$?
2 In a population of 80 engineering majors taking a required statistics class, 40 are enthusiastic about having computer labs. In a s.r. sample of 20 of these students, 8 are enthusiastic. What is the correct notation for 40/80 = 0.5 and for 8/20 = 0.4?

Always remember that, under s.r. sampling, $\hat{p}$ approximates, but in general is different from, $p$.
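
The relation between the population proportion and a sample proportion can be explored with a small simulation. The R sketch below uses the numbers of the second example (40 enthusiastic students out of 80, sample of 20); the coding 1 = enthusiastic, 0 = not, and the variable names, are our own choices for illustration.

```r
set.seed(20)  # arbitrary seed, only for reproducibility

# Statistical population: 1 = enthusiastic, 0 = not (40 of 80 students).
v <- c(rep(1, 40), rep(0, 40))
p <- sum(v == 1) / length(v)       # population proportion, 40/80

x <- sample(v, size = 20)          # s.r. sample of 20 students
p_hat <- sum(x == 1) / length(x)   # sample proportion; changes with the sample

print(c(p = p, p_hat = p_hat))
```

Re-running the last three lines without resetting the seed gives a different sample proportion each time, while the population proportion stays fixed.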

Population Averages and Sample Averages

Consider a population of $N$ units, and let $v_1, v_2, \ldots, v_N$ denote the statistical population corresponding to some variable. Then the population average or population mean, denoted by $\mu$, is the arithmetic average of all values in the statistical population. Thus,

$$\mu = \frac{1}{N}\sum_{i=1}^{N} v_i.$$

If the random variable $X$ denotes the value of the variable of a randomly selected population unit, then the population mean is also called the expected value of $X$, or mean value of $X$, and is denoted by $\mu_X$ or $E(X)$.

Example
In a population of 500 tin plates, the number of plates with 0, 1 and 2 scratches is $N_0 = 190$, $N_1 = 160$ and $N_2 = 150$. Thus, in the statistical population $v_1, \ldots, v_{500}$, 190 of the $v_i$ equal 0, 160 equal 1, and 150 equal 2. The population mean is

$$\mu = \frac{1}{500}\sum_{i=1}^{500} v_i = \frac{0 \times N_0}{500} + \frac{1 \times N_1}{500} + \frac{2 \times N_2}{500} = 0.92.$$

If a tin plate is selected at random and $X$ is the rv denoting the number of scratches, the mean value of $X$ is 0.92 and we write $\mu_X = 0.92$, or $E(X) = 0.92$.

Two important properties of the mean

• The mean $\mu$ satisfies

$$\sum_{i=1}^{N}(v_i - \mu) = 0.$$

Indeed, $\sum_{i=1}^{N}(v_i - \mu) = \sum_{i=1}^{N} v_i - N\mu = N\mu - N\mu = 0$.

• For any number $a$ we have

$$\sum_{i=1}^{N}(v_i - \mu)^2 \le \sum_{i=1}^{N}(v_i - a)^2.$$

Indeed, setting the first derivative of $\sum_{i=1}^{N}(v_i - a)^2$ with respect to $a$ equal to zero and solving for $a$ gives $a = \mu$. The second derivative, $2N$, is positive, so $a = \mu$ is the minimizer.

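Use the statistical population of the variable scratches for the 500 tin plates and the following R commands to illustrate the two properties of the mean:

V <- c(rep(0, 190), rep(1, 160), rep(2, 150))  # statistical population of scratches
sum(V - mean(V))      # property 1: deviations from the mean sum to 0 (up to rounding error)
sum((V - mean(V))^2)  # property 2: the sum of squared deviations about the mean ...
sum((V - 1)^2)        # ... is no larger than about any other number, here a = 1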

If a sample of size $n$ is taken, and $x_1, x_2, \ldots, x_n$ denote the variable values of the sample units, then the sample average or sample mean, denoted by $\bar{x}$, is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.$$

Under s.r. sampling, a sample mean approximates, but in general is different from, the population mean.

Example
If a s.r. sample of $n = 100$ is taken from the 500 tin plates, it could be that there are $n_0 = 40$, $n_1 = 34$ and $n_2 = 26$ plates with 0, 1 and 2 scratches. In this case, $\bar{x} = 0.86$.
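
The following R sketch illustrates how sample means vary around the population mean for the tin plate population. The number of repeated samples (five) and the seed are arbitrary choices made for the illustration.

```r
set.seed(500)  # arbitrary seed, only for reproducibility

V <- c(rep(0, 190), rep(1, 160), rep(2, 150))  # statistical population
mu <- mean(V)                                  # population mean
print(mu)

# Sample means from five different s.r. samples of size 100.
xbar <- replicate(5, mean(sample(V, size = 100)))
print(xbar)

# The particular sample in the example: 40, 34 and 26 plates.
x_example <- c(rep(0, 40), rep(1, 34), rep(2, 26))
print(mean(x_example))
```

Each of the five sample means should be near, but typically not equal to, the population mean.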

Proportions are Averages!

A proportion is always the mean of a suitably defined random variable. To see this, consider the example with the tin plates, where $N_1 = 160$ out of $N = 500$ have one scratch. Then, for the random variable $X$ which takes the value 1 if a tin plate has one scratch and the value 0 otherwise, the statistical population $v_1, \ldots, v_{500}$ consists of 160 1s and 340 0s. Thus,

$$\mu_X = \frac{1}{500}\sum_{i=1}^{500} v_i = \frac{160}{500} = 0.32.$$

But 0.32 = N1/N is the proportion of tin plates with 1 scratch.

Population Variance and Sample Variance

Let $v_1, v_2, \ldots, v_N$ be a statistical population with mean $\mu$.

Definition
The population variance, $\sigma^2$, is defined as

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(v_i - \mu)^2.$$

The standard deviation is the positive square root of the variance: $\sigma = \sqrt{\sigma^2}$.

If the rv $X$ denotes a randomly selected value from the statistical population, then the population variance is also called the variance of $X$, and is denoted by $\sigma_X^2$ or $\mathrm{Var}(X)$. The standard deviation of $X$ is $\sigma_X = \sqrt{\sigma_X^2}$.

A simpler computational formula for the variance is

$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} v_i^2 - \mu^2.$$

Example
Consider the tin plate example, so the statistical population $v_1, \ldots, v_{500}$ has 190 of the $v_i$ equal to 0, 160 equal to 1, 150 equal to 2, and $\mu = 0.92$. Then,

$$\sigma^2 = \frac{0^2 \times 190}{500} + \frac{1^2 \times 160}{500} + \frac{2^2 \times 150}{500} - 0.92^2 = 0.6736.$$
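
The agreement between the definition and the computational formula can be checked in R for the tin plate population. This is a small sketch with variable names of our own choosing; note that R's built-in var() divides by N − 1 rather than N, so it has to be rescaled to give the population variance.

```r
V <- c(rep(0, 190), rep(1, 160), rep(2, 150))  # statistical population
mu <- mean(V)
N <- length(V)

sigma2_def <- mean((V - mu)^2)    # definition: average squared deviation
sigma2_comp <- mean(V^2) - mu^2   # computational formula
print(c(definition = sigma2_def, computational = sigma2_comp))

# var() divides by N - 1; rescaling by (N - 1)/N recovers sigma^2.
print(var(V) * (N - 1) / N)

print(sqrt(sigma2_def))           # population standard deviation
```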

If $x_1, x_2, \ldots, x_n$ denotes a sample from the statistical population, the sample variance and its computational formula are:

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right].$$

The sample standard deviation is $S = \sqrt{S^2}$. Under s.r. sampling, $S^2$ approximates, but in general is different from, $\sigma^2$.

Example
Consider the s.r. sample of $n = 100$ tin plates, which has 40, 34 and 26 plates with 0, 1 and 2 scratches. Then,

$$S^2 = \frac{1}{99}\left[138 - 73.96\right] = 0.647.$$
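
The sample variance of the example can be reproduced in R. The sketch below, with illustrative variable names, evaluates both the definition and the computational formula and compares them with the built-in var(), which uses the n − 1 divisor.

```r
x <- c(rep(0, 40), rep(1, 34), rep(2, 26))  # the sample from the example
n <- length(x)

S2_def <- sum((x - mean(x))^2) / (n - 1)        # definition
S2_comp <- (sum(x^2) - sum(x)^2 / n) / (n - 1)  # computational formula

print(c(S2_def, S2_comp, var(x)))  # all three agree
print(sqrt(S2_def))                # sample standard deviation S
```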

Why Divide by n − 1?

• Because this way the mean of the sample variances resulting from all possible samples taken with replacement is equal to the population variance. A simple demonstration of this follows:

Take a s.r.s. of size two from the infinite (conceptual) population of all coin tosses, i.e., toss a coin twice, setting 0 for H and 1 for T. The possible samples are {0, 0}, {0, 1}, {1, 0}, {1, 1}. Verify that the four sample variances average to 0.25. In Chapter 3, we will see that the variance of this infinite population of all coin tosses equals the population variance of {0, 1}, which is (check!) 0.25.
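
The verification suggested above can be done in a few lines of R. The sketch lists the four equally likely samples and averages the sample variances computed with divisor n − 1 and, for comparison, with divisor n; the variable names are our own.

```r
# The four equally likely samples of size two (0 = H, 1 = T).
samples <- list(c(0, 0), c(0, 1), c(1, 0), c(1, 1))

s2_n_minus_1 <- sapply(samples, var)                        # divisor n - 1
s2_n <- sapply(samples, function(x) mean((x - mean(x))^2))  # divisor n

print(mean(s2_n_minus_1))  # average of the four sample variances
print(mean(s2_n))          # the same average when dividing by n

pop <- c(0, 1)
print(mean((pop - mean(pop))^2))  # population variance of {0, 1}
```

With divisor n − 1 the average matches the population variance; with divisor n it comes out only half as large.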



Roughly speaking, the (1 − α)100th sample percentile separates the part of the sample having the (1 − α)100% smaller values from the part having the α100% larger values. Thus:
• The 90th sample percentile separates the largest 10% from the lower 90% of the values in the set.
• The 50th sample percentile is also called the sample median.
• The 25th, the 50th and the 75th sample percentiles are also called sample quartiles.
• The 25th and 75th percentiles are the lower quartile and upper quartile, respectively.
• The distance between the lower and upper quartiles is called the interquartile range, or IQR.

Order Statistics as Sample Percentiles

Let $X_1, \ldots, X_n$ be an s.r. sample from a continuous population. The ordered sample values are denoted

\[ X_{(1)}, X_{(2)}, \ldots, X_{(n)} . \]

Thus, $X_{(1)} < X_{(2)} < \cdots < X_{(n)}$.

$X_{(i)}$, the ith smallest sample value, is defined to be the

\[ 100 \, \frac{i - 0.5}{n} \text{-th sample percentile.} \]


Example

An s.r.s. of 10 black bears' weights is: 154 158 356 446 40 154 90 94 150 142. Give the order statistics, and state the sample percentiles they correspond to.

Solution: The R command sort( c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142) ) returns the order statistics: 40, 90, 94, 142, 150, 154, 154, 158, 356, 446. These order statistics are the 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th and 95th sample percentiles. For example, $X_{(3)} = 94$ is the 100(3 − 0.5)/10 = 25th sample percentile. In R these percentile levels are obtained with: 100*(1:10 - 0.5)/10


In the above example none of the order statistics corresponds to the median or the 90th percentile. In general, if n is even, none of the order statistics corresponds to the median, because 100(i − 0.5)/n = 50 would require i = (n + 1)/2, which is not an integer. For example, if n = 4 then $X_{(2)}$ is the $100 \cdot \frac{1.5}{4} = 37.5$th sample percentile, while $X_{(3)}$ is the $100 \cdot \frac{2.5}{4} = 62.5$th sample percentile. Depending on n, the above definition may not identify other percentiles of interest. In such cases, percentiles are defined by interpolation.
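The slides do not specify which interpolation rule is meant. As one possible illustration, the sketch below assumes linear interpolation between the points (100(i − 0.5)/n, X(i)), applied to the bear weights from the earlier example. R's `quantile()` with `type = 5` is documented as using the same (i − 0.5)/n positions, so it is shown for comparison; the choice of `type = 5` is an assumption, not something the slides prescribe.

```r
# Bear weights from the earlier example.
weights <- c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142)
n   <- length(weights)
pct <- 100 * (1:n - 0.5) / n   # percentile level of each order statistic

# Assumed rule: linear interpolation between (pct[i], X(i)).
interp <- approx(x = pct, y = sort(weights), xout = c(50, 90))$y

# quantile() with type = 5 is based on the same (i - 0.5)/n positions.
q5 <- quantile(weights, probs = c(0.25, 0.50, 0.90), type = 5)

print(interp)  # interpolated median and 90th percentile
print(q5)      # 25th, 50th and 90th percentiles under type = 5
```

Under this rule the median falls halfway between X(5) = 150 and X(6) = 154, and the 90th percentile halfway between X(9) = 356 and X(10) = 446.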

Hand Calculation of Sample Median

Definition

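Let $X_{(1)}, X_{(2)}, \ldots, X_{(n)}$ denote the ordered sample values in a sample of size n. The sample median is defined as

\[ \tilde{X} = \begin{cases} X_{\left(\frac{n+1}{2}\right)}, & \text{if } n \text{ is odd} \\[6pt] \dfrac{X_{\left(\frac{n}{2}\right)} + X_{\left(\frac{n}{2}+1\right)}}{2}, & \text{if } n \text{ is even} \end{cases} \]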
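The definition translates directly into a short R function. This is a sketch for illustration; `sample_median` is a hypothetical helper name, and in practice the built-in `median()` would be used. It is applied to the bear weights from the earlier example (n = 10, even) and to the same data with the last value dropped (n = 9, odd).

```r
# Sample median computed from the order statistics, following the definition.
sample_median <- function(x) {
  xs <- sort(x)       # order statistics X(1), ..., X(n)
  n  <- length(xs)
  if (n %% 2 == 1) {
    xs[(n + 1) / 2]                   # n odd: the middle order statistic
  } else {
    (xs[n / 2] + xs[n / 2 + 1]) / 2   # n even: average of the two middle ones
  }
}

bears <- c(154, 158, 356, 446, 40, 154, 90, 94, 150, 142)

print(sample_median(bears))       # even case, n = 10
print(sample_median(bears[-10]))  # odd case, n = 9
print(median(bears))              # built-in median, for comparison
```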


Example (Relation Between $\tilde{X}$ and $\bar{X}$)

Find the sample median of $X_1 = 2.3$, $X_2 = 3.2$, $X_3 = 1.8$, $X_4 = 2.5$, $X_5 = 2.7$.

Solution. Here, $X_{(1)} = 1.8$, $X_{(2)} = 2.3$, $X_{(3)} = 2.5$, $X_{(4)} = 2.7$, $X_{(5)} = 3.2$. Since the sample size is odd,

\[ \tilde{X} = X_{\left(\frac{n+1}{2}\right)} = X_{(3)} = 2.5 . \]

For this data, $\bar{X} = \tilde{X} = 2.5$.

If $X_{(5)}$ is changed to 4.2, then $\bar{X} = 2.7$ but $\tilde{X} = 2.5$. Thus $\bar{X}$ is affected by outliers, whereas $\tilde{X}$ is not. In general, if the distribution of the data is positively skewed, $\bar{X} > \tilde{X}$, and if it is negatively skewed, $\bar{X} < \tilde{X}$.
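The effect of changing the largest value can be checked with R's built-in `mean()` and `median()`. This is a small sketch; the names `x` and `x_out` are illustrative.

```r
# Original sample from the example.
x <- c(2.3, 3.2, 1.8, 2.5, 2.7)

# Same sample with the largest value, 3.2, replaced by 4.2.
x_out <- replace(x, which.max(x), 4.2)

print(c(mean = mean(x),     median = median(x)))      # mean and median coincide
print(c(mean = mean(x_out), median = median(x_out)))  # only the mean moves
```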

Hand Calculation of Sample Quartiles and Sample IQR

Definition

The sample lower quartile, or SLQ, is defined as
• the median of the smallest n/2 values, if n is even;
• the median of the smallest (n + 1)/2 values, if n is odd.

The sample upper quartile, or SUQ, is defined as
• the median of the largest n/2 values, if n is even;
• the median of the largest (n + 1)/2 values, if n is odd.

The sample interquartile range, or SIQR, is defined as

SIQR = SUQ − SLQ.


Example

Find the lower and upper quartiles of the n = 9 observations 9.39, 7.04, 7.17, 13.28, 9.00, 7.46, 21.06, 15.19, 7.50.

Solution. Since n is odd, the SLQ is the median of the smallest 5 (= (n + 1)/2) values:

7.04, 7.17, 7.46, 7.50, 9.00

and the SUQ is the median of the largest 5 (= (n + 1)/2) values:

9.00, 9.39, 13.28, 15.19, 21.06.

Thus SLQ = 7.46, and SUQ = 13.28.
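The hand-calculation rule can be sketched in R as follows. `hand_quartiles` is a hypothetical helper written for this illustration, not a function from the course script; it takes the median of the smallest and largest ceiling(n/2) values, which covers both the even and the odd case of the definition.

```r
# Sample quartiles and SIQR following the hand-calculation definitions.
hand_quartiles <- function(x) {
  xs <- sort(x)
  n  <- length(xs)
  k  <- ceiling(n / 2)               # n/2 if n is even, (n + 1)/2 if n is odd
  slq <- median(xs[1:k])             # median of the k smallest values
  suq <- median(xs[(n - k + 1):n])   # median of the k largest values
  c(SLQ = slq, SUQ = suq, SIQR = suq - slq)
}

obs <- c(9.39, 7.04, 7.17, 13.28, 9.00, 7.46, 21.06, 15.19, 7.50)
res <- hand_quartiles(obs)
print(res)
```

R's built-in `quantile()` and `IQR()` use an interpolation rule by default, so in general they need not agree with these hand definitions for every sample.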
