Summary Papers- Applied Multivariate Statistical Modeling- Estimation (Point and Interval) -Chapter Four Author: Mohammed R. Dahman. Ph.D. in MIS ED.10.18 License: CC-By Attribution 4.0 International

Citation: Dahman, M. R. (2018, October 24). AMSM- Estimation (Point and Interval)- Chapter Four. https://doi.org/10.31219/osf.io/fc9zh

1. Preface
Estimation is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. This paper explains point and interval estimation in detail, presents four steps for constructing an interval estimate, and extends the discussion to the scenario of more than one population.

2. Introduction
Let's first discuss the term estimation. It is a division of statistics and signal processing that determines the values of parameters from measured and observed empirical data. Estimation is carried out in order to approximate the true value of a function or of a parameter of a population. It is based on observations from samples, which are subsets of the target population or function. Several statistics are used to perform the task of estimation. It is essential to understand this concept in order to progress further in the field of data analysis.

To simplify the definition above: if I want to understand the behavior of a population, I draw sample (1), perhaps sample (2), and so on up to sample (n). Why draw these samples? As discussed in previous chapters, I want to estimate the values of the population parameters using the statistics of these samples. Note the word "estimate"; it is the focal point of this summary paper.

3. Estimation in Statistics
Estimation statistics is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. It is distinct from null hypothesis significance testing (NHST), which we will cover in upcoming chapters and which is considered to be less informative. Estimation statistics, or simply estimation, is also known as the new statistics, a distinction introduced in the life sciences and a wide range of other experimental sciences where NHST still remains prevalent, despite estimation statistics having been recommended as preferable for several decades ("Research that Matters", 2002; Cohen, 1994).

4. Map of Estimation
The figure below illustrates the map of estimation in statistics. We are going to discuss each part in detail.

Estimation
  Point
  Interval
    Single population: CI for mean; CI for variance
    Two populations: CI for mean difference; CI for ratio of variances

Figure 1: Map of estimation

Remember that statisticians use sample statistics to estimate population parameters. For example, sample means are used to estimate population means, and sample proportions are used to estimate population proportions.


1. Point estimate: A point estimate of a population parameter is a single value of a statistic. For example, the sample mean $\bar{x}$ is a point estimate of the population mean $\mu$. Similarly, the sample proportion $p$ is a point estimate of the population proportion $P$. In addition, the sample variance $s^2$ is a point estimate of the population variance $\sigma^2$.
2. Interval estimate: An interval estimate is defined by two numbers, between which a population parameter is said to lie. It can be written as $P(L \le \theta \le U) = 1 - \alpha$, where $L$ is the lower boundary, $U$ is the upper boundary, and $\theta$ is the population parameter. Let's now discuss the interval estimate for a single population and for two populations.
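As a small illustration of the point estimates listed in item 1, the sketch below computes a sample mean, a sample variance, and a sample proportion. The data values and variable names are hypothetical and chosen only for this example.

```python
from statistics import mean, variance

# Hypothetical sample of measurements (illustrative values, not from the paper).
sample = [6.1, 7.4, 6.8, 7.9, 7.0, 6.5, 7.3]

# Point estimate of the population mean: the sample mean.
x_bar = mean(sample)

# Point estimate of the population variance: the sample variance (divisor n - 1).
s_squared = variance(sample)

# Hypothetical yes/no survey answers (1 = yes); the sample proportion
# is a point estimate of the population proportion.
answers = [1, 0, 0, 1, 0, 1, 0, 0, 0, 1]
p_hat = sum(answers) / len(answers)

print(x_bar, s_squared, p_hat)
```

Each result is a single number, which is exactly what distinguishes a point estimate from the interval estimate in item 2.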

a. Single population: Let's assume that I have a single population. I then draw the samples "S1", "S2", ..., "Sn" and calculate the statistics from each sample, for example the sample mean $\bar{x}$ and the sample variance $s^2$. If I collect these statistics in vectors, as in the figure below, then each vector follows a certain distribution. Please refer to chapter three (Dahman, 2018a).

[Figure: samples S1, S2, ..., Sn drawn from the population; sample i yields $\bar{x}_i$ and $s_i^2$, which are collected into the vectors $\bar{X} = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n)^T$ and $S^2 = (s_1^2, s_2^2, \dots, s_n^2)^T$]

The figure below will help to decide the type of distribution we can use.

Population  | sigma    | n >= 30        | n < 30
Normal      | known    | Z distribution | Z distribution
Normal      | unknown  | Z distribution | t distribution
Non-normal  | known    | Z distribution | No
Non-normal  | unknown  | Z distribution | No

Figure 2: type of distribution according to population
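The decision rule of Figure 2 can be sketched as a small function. This is a minimal illustration; the function name `choose_distribution` and its return labels are assumptions made for this example and do not come from the original text.

```python
def choose_distribution(population_normal: bool, sigma_known: bool, n: int) -> str:
    """Return 'z', 't', or 'none' following the branches of Figure 2."""
    if n >= 30:
        # Every n >= 30 branch of Figure 2 leads to the Z distribution.
        return "z"
    if not population_normal:
        # Small sample from a non-normal population: Figure 2 answers "No".
        return "none"
    # Small sample from a normal population: Z if sigma is known, t otherwise.
    return "z" if sigma_known else "t"


print(choose_distribution(population_normal=True, sigma_known=False, n=12))
```

Encoding the figure this way makes it easy to check which branch applies before computing any interval.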

In the single-population case, for either the mean or the variance, we have to understand the confidence interval before we learn how to calculate it.


• Confidence Interval: Statisticians use a confidence interval to express the precision and uncertainty associated with a particular sampling method. A confidence interval consists of three parts:
1) A confidence level. The probability part of a confidence interval is called the confidence level. It describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. Here is how to interpret a confidence level. Suppose we collected all possible samples from a given population and computed a confidence interval for each sample. Some confidence intervals would include the true population parameter; others would not. A 95% confidence level means that 95% of the intervals contain the true population parameter; a 90% confidence level means that 90% of the intervals contain the population parameter; and so on. See the figure below.
2) A statistic. Any sample statistic, such as a mean, a proportion, or a variance.
3) A margin of error. In a confidence interval, the range of values above and below the sample statistic is called the margin of error. For example, suppose the local newspaper conducts an election survey and reports that the independent candidate will receive 30% of the vote. The newspaper states that the survey had a 5% margin of error and a confidence level of 95%. These findings result in the following confidence interval: we are 95% confident that the independent candidate will receive between 25% and 35% of the vote.

Note: Many public opinion surveys report interval estimates, but not confidence intervals. They provide the margin of error, but not the confidence level. To interpret survey results clearly you need to know both. We are much more likely to accept survey findings if the confidence level is high (say, 95%) than if it is low (say, 50%).

The confidence level describes the uncertainty of a sampling method. The statistic and the margin of error define an interval estimate that describes the precision of the method. The interval estimate of a confidence interval is defined by the sample statistic +/- margin of error. For example, suppose we compute an interval estimate of a population parameter and describe it as a 95% confidence interval. This means that if we used the same sampling method to select different samples and compute different interval estimates, the true population parameter would fall within the range defined by the sample statistic +/- margin of error 95% of the time (see the figure below). Confidence intervals are preferred to point estimates because they indicate (a) the precision of the estimate and (b) the uncertainty of the estimate.


Assume I collected (n) samples and built an interval around the statistic of each one. Look at $\bar{x}_4$, the statistic from sample 4. Its interval does not contain the population mean; it lies to the right of the upper bound. The intervals from the other samples do contain the mean, so (n-1) samples captured the population mean and one sample (sample 4) did not. A 95% confidence level says that about 95% of such samples produce an interval that contains the mean and about 5% produce an interval that misses it. So the (n-1) samples belong to the 95%, and sample 4 belongs to the 5% that miss. Note that this 5% is the error rate $\alpha$, not the margin of error.
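The repeated-sampling interpretation above can be checked with a short simulation. This is a sketch under stated assumptions: the population is taken to be normal with a known standard deviation, and the parameter values, the function name `coverage`, and the seed are all hypothetical choices for illustration.

```python
import random
from statistics import NormalDist


def coverage(true_mu=7.0, sigma=3.0, n=76, level=0.95, trials=2000, seed=0):
    """Fraction of simulated samples whose interval contains true_mu."""
    rng = random.Random(seed)
    # Two-sided critical value, e.g. about 1.96 for a 95% level.
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample and compute its mean.
        sample = [rng.gauss(true_mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # Count the sample if its interval captures the true mean.
        if x_bar - margin <= true_mu <= x_bar + margin:
            hits += 1
    return hits / trials


fraction = coverage()
print(fraction)
```

The printed fraction should be close to 0.95: most simulated samples behave like the (n-1) samples in the text, and a small share behaves like sample 4.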

• Calculate Confidence Interval: the process to obtain the confidence interval for either the mean or the variance depends on the type of distribution, as shown in figure 2.
1) Obtain CI for the population mean: the scenario here depends on whether $\sigma$ is known or unknown, as well as on the sample size.
▪ Use Z distribution: the steps are straightforward, as follows:
1. Collect the sample of size (n): $x = (x_1, x_2, \dots, x_n)^T$.
2. Compute the sample mean and standard deviation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$.
3. Choose alpha and obtain the upper and lower values of Z: $P\left(L \le \frac{\bar{x}-\mu}{\sigma_{\bar{x}}} \le U\right) = 1 - \alpha$, where $\sigma_{\bar{x}} = \sigma/\sqrt{n}$, $L = -z_{\alpha/2}$, and $U = z_{\alpha/2}$.
4. Develop the interval: $\bar{x} - z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x} + z_{\alpha/2}\,\sigma_{\bar{x}}$.

If you are interested in how the formula in step 4 is derived, you may refer to (Dean W. & Wichern, 2007).
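For orientation, the algebra that connects step 3 to step 4 can be sketched in three lines. This is only an outline of the rearrangement, writing $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ for the standard deviation of the sample mean and taking the bounds as $\pm z_{\alpha/2}$; it is not a substitute for the full derivation in the cited text.

```latex
\begin{align*}
P\!\left(-z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma_{\bar{x}}} \le z_{\alpha/2}\right) &= 1-\alpha \\
P\!\left(-z_{\alpha/2}\,\sigma_{\bar{x}} \le \bar{x}-\mu \le z_{\alpha/2}\,\sigma_{\bar{x}}\right) &= 1-\alpha \\
P\!\left(\bar{x}-z_{\alpha/2}\,\sigma_{\bar{x}} \le \mu \le \bar{x}+z_{\alpha/2}\,\sigma_{\bar{x}}\right) &= 1-\alpha
\end{align*}
```

The second line multiplies through by $\sigma_{\bar{x}}$; the third subtracts $\bar{x}$ and multiplies by $-1$, which reverses the inequalities and isolates $\mu$.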

Example: you have collected a sample of size (n=76) and computed ($\bar{x} = 7$, $s = 4$); the population standard deviation is given as $\sigma = 3$. Construct a 95% confidence interval for $\mu$.

A 95% interval means $100(1 - \alpha) = 95$; solving for alpha gives $\alpha = 0.05$ and $\alpha/2 = 0.025$. Applying the formula from step (4) gives
$7 - z_{0.025}\frac{3}{\sqrt{76}} \le \mu \le 7 + z_{0.025}\frac{3}{\sqrt{76}}$
Find the value of $z_{0.025}$ from the table:
$7 - 1.96\frac{3}{\sqrt{76}} \le \mu \le 7 + 1.96\frac{3}{\sqrt{76}}$


The result is $6.33 \le \mu \le 7.67$. From this result you can see the difference between a point estimate and a confidence interval. With a point estimate we simply say $\mu = 7$. With an interval we say that $\mu$ falls between two values (6.33 and 7.67).

Example extended: what happens if the population standard deviation $\sigma$ is unknown? As illustrated in figure 2, we can still use the Z distribution as long as the sample size is at least 30. The only change is that the sample statistic $s$ is used instead of the parameter $\sigma$. So you follow exactly the same calculation and replace the value of $\sigma$ with $s = 4$.
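Both versions of the example can be reproduced with a few lines of code. The sketch below is an illustration only; the helper name `z_interval` is an assumption, and the critical value is taken from the standard library's `NormalDist` instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sd, n, level=0.95):
    """Two-sided Z interval for the mean: x_bar +/- z * sd / sqrt(n)."""
    alpha = 1 - level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when level = 0.95
    margin = z * sd / sqrt(n)
    return x_bar - margin, x_bar + margin


# Sigma known (sigma = 3), as in the first part of the example.
known = z_interval(x_bar=7, sd=3, n=76)

# Sigma unknown and n >= 30: use the sample value s = 4 in its place.
unknown = z_interval(x_bar=7, sd=4, n=76)

print(known, unknown)
```

With $\sigma = 3$ the interval is roughly (6.33, 7.67); with $s = 4$ it widens to roughly (6.10, 7.90), because a larger standard deviation gives a larger margin of error.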

▪ Use t distribution: consider the same example. If the sample size is less than 30 and $\sigma$ is unknown, I can't use the Z distribution; instead, I use the t distribution, $t_{n-1} = \frac{\bar{x}-\mu}{s/\sqrt{n}}$. The same four steps apply as for the Z distribution. The only change is in the formula:
$\bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}}$
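As a worked illustration of the t interval, assume a hypothetical smaller sample of size n = 25 with the same statistics ($\bar{x} = 7$, $s = 4$) and a 95% level. These numbers are chosen for illustration and are not part of the original example. Taking the table value $t_{0.025,\,24} \approx 2.064$:

```latex
\begin{align*}
7 - 2.064\,\frac{4}{\sqrt{25}} &\le \mu \le 7 + 2.064\,\frac{4}{\sqrt{25}} \\
7 - 1.651 &\le \mu \le 7 + 1.651 \\
5.35 &\le \mu \le 8.65
\end{align*}
```

The interval is wider than the Z-based ones because the t critical value exceeds 1.96 and the sample is smaller.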

2) Obtain CI for the population variance: I follow the same technique to obtain the CI for the population variance; the only change is the distribution. Recall from chapter three (Dahman, 2018a) that the sample variance, in the form $(n-1)s^2/\sigma^2$, follows the chi-square distribution. Thus, following exactly the same steps, I can write the formula as follows:

$\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}} \le \sigma^2 \le \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$

Note that in the formula above the numerators are the same; the difference is in the denominators. The chi-square value depends on the degrees of freedom (n-1), and for the alpha value ($\alpha$) there are two sides, the upper (U) and the lower (L); see the figure below. The subscript gives the area to the right of the critical value, so the value $\chi^2_{1-\alpha/2,\,n-1}$ associated with L is less than the value $\chi^2_{\alpha/2,\,n-1}$ associated with U. Because the critical values sit in the denominators, the larger value U produces the lower limit of the interval and the smaller value L produces the upper limit. In other words, $\frac{(n-1)s^2}{\chi^2_{\alpha/2,\,n-1}}$ is always less than $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2,\,n-1}}$.


Example: you have collected a sample of size (n=30) and computed ($s^2 = 25$). Construct a 90% confidence interval for $\sigma^2$. You have all the information you need. First determine the value of alpha. Given a 90% CI, $1 - \alpha = 0.90$, so $\alpha = 0.10$ and $\alpha/2 = 0.05$. Now apply the formula above: $\frac{(29)25}{\chi^2_{0.05,\,29}} \le \sigma^2 \le \frac{(29)25}{\chi^2_{0.95,\,29}}$. With the table values $\chi^2_{0.05,\,29} = 42.56$ and $\chi^2_{0.95,\,29} = 17.71$, the result is $17.04 \le \sigma^2 \le 40.94$, or equivalently $4.13 \le \sigma \le 6.40$.
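The arithmetic of this example can be sketched in code. The Python standard library has no chi-square quantile function, so the two critical values for 29 degrees of freedom are entered by hand from a chi-square table; the function and constant names are assumptions made for illustration.

```python
from math import sqrt


def variance_interval(n, s_squared, chi2_upper, chi2_lower):
    """CI for the population variance.

    chi2_upper: critical value with right-tail area alpha/2 (the larger one).
    chi2_lower: critical value with right-tail area 1 - alpha/2 (the smaller one).
    """
    numerator = (n - 1) * s_squared
    # The larger critical value gives the lower limit, and vice versa.
    return numerator / chi2_upper, numerator / chi2_lower


# Table values for 29 degrees of freedom (assumed inputs, read from a table).
CHI2_0_05_29 = 42.557
CHI2_0_95_29 = 17.708

var_lo, var_hi = variance_interval(30, 25, CHI2_0_05_29, CHI2_0_95_29)
sd_lo, sd_hi = sqrt(var_lo), sqrt(var_hi)

print(var_lo, var_hi, sd_lo, sd_hi)
```

Taking square roots of the two limits turns the interval for the variance into an interval for the standard deviation.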

b. Two populations: we have understood the concept of a CI for a single population. What about two populations? Recall the picture of the interval estimate, $P(L \le \theta \le U) = 1 - \alpha$, which describes the scenario for a single population.

With two populations the picture is the same with a minor change: $P(L \le \theta_1 - \theta_2 \le U) = 1 - \alpha$. Note that $\theta_1$ is the parameter of the first population and $\theta_2$ is the parameter of the second population.
• Confidence Interval for two populations: the definition is the same as for a single population in the section above. Statisticians use a confidence interval for two populations to express the precision and uncertainty associated with samples from two populations for a particular sampling method. A confidence interval consists of three parts: the confidence level, the statistics, and the margin of error.
• Calculate the confidence interval for two populations: the process to obtain the confidence interval for samples from two populations follows the same map as illustrated in figure 2.
1) Obtain CI for the difference of two population means: the scenario here depends on whether $\sigma$ is known or unknown, as well as on the sample size.
▪ Use Z distribution: the four steps are the same as in the single-population case. The final formula is as follows (for the mathematical derivation, see (Dean W. & Wichern, 2007)):
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
This formula is applicable under two conditions. The first condition is that $\sigma_1$ and $\sigma_2$ must be known and not equal. The second is that the sample sizes must be larger than 30.
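The two-population Z interval can be sketched as follows. All input numbers are hypothetical, and the helper name `two_mean_z_interval` is an assumption for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_mean_z_interval(x1_bar, x2_bar, sigma1, sigma2, n1, n2, level=0.95):
    """Z interval for mu1 - mu2 with known, unequal population sigmas."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Standard error of the difference between the two sample means.
    std_error = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    diff = x1_bar - x2_bar
    return diff - z * std_error, diff + z * std_error


# Hypothetical inputs: both samples are larger than 30.
low, high = two_mean_z_interval(7, 5, sigma1=3, sigma2=4, n1=76, n2=64)
print(low, high)
```

If the resulting interval excludes zero, as it does for these illustrative inputs, the data suggest that the two population means differ.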

Let's look at the first condition. If you know the two parameters ($\sigma_1$ and $\sigma_2$) and they differ, that's fine; but what if they are equal? In this case the formula above does not apply, and we have to use an extension of it based on the pooled variance $S^2_{pooled}$:
$S^2_{pooled} = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$
This quantity replaces $\sigma^2$ in the formula above. The new arrangement is:
$(\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $S_{pooled} = \sqrt{S^2_{pooled}}$. So now you know which formula to use when the two population variances are equal. One more note: in the case that $n_1 = n_2 = n$, $S^2_{pooled} = \frac{S_1^2 + S_2^2}{2}$.
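The pooled-variance version can be sketched in the same way. The inputs are hypothetical, and the function names `pooled_variance` and `pooled_z_interval` are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Weighted average of the two sample variances."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


def pooled_z_interval(x1_bar, x2_bar, n1, s1_sq, n2, s2_sq, level=0.95):
    """Z interval for mu1 - mu2 when the population variances are equal."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    # Square root of the pooled variance, times sqrt(1/n1 + 1/n2).
    s_pooled = sqrt(pooled_variance(n1, s1_sq, n2, s2_sq))
    margin = z * s_pooled * sqrt(1 / n1 + 1 / n2)
    diff = x1_bar - x2_bar
    return diff - margin, diff + margin


# Hypothetical inputs with equal sample sizes, so the pooled variance
# reduces to the plain average of the two sample variances.
sp_sq = pooled_variance(40, 4, 40, 6)
low, high = pooled_z_interval(7, 5, 40, 4, 40, 6)
print(sp_sq, low, high)
```

With equal sample sizes the pooled variance here is simply (4 + 6) / 2 = 5, which matches the shortcut noted in the text.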


▪ t distribution: Now consider the second condition: what if the sample sizes are less than 30? The answer is simple: we use the t distribution. Replace Z with t, with $(n_1 + n_2) - 2$ degrees of freedom. The formula is:
$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2;\,(n_1+n_2)-2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \le \mu_1 - \mu_2 \le (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2;\,(n_1+n_2)-2}\,S_{pooled}\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
where $S_{pooled}$ is the square root of the pooled variance $S^2_{pooled}$.
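As a worked illustration of the two-sample t interval, assume hypothetical samples with $n_1 = n_2 = 10$, $\bar{x}_1 = 7$, $\bar{x}_2 = 5$, $S_1^2 = 4$, $S_2^2 = 6$, and a 95% level. These numbers are chosen for illustration only. With 18 degrees of freedom the table value is $t_{0.025;\,18} \approx 2.101$:

```latex
\begin{align*}
S^2_{pooled} &= \frac{(10-1)\,4 + (10-1)\,6}{10 + 10 - 2} = 5 \\
S_{pooled}\sqrt{\frac{1}{10} + \frac{1}{10}} &= \sqrt{5 \times 0.2} = 1 \\
(7 - 5) - 2.101 \times 1 &\le \mu_1 - \mu_2 \le (7 - 5) + 2.101 \times 1 \\
-0.10 &\le \mu_1 - \mu_2 \le 4.10
\end{align*}
```

Here the interval contains zero, so with samples this small the data do not rule out equal population means.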

2) Obtain CI for the ratio of two population variances: the technique is the same as for the CI of two population means. The only minor change in the picture is $P\left(L \le \frac{\sigma_1^2}{\sigma_2^2} \le U\right) = 1 - \alpha$. The same four steps are followed: (1) collect the samples, (2) compute the statistic, in this case the variance, (3) decide the alpha value, and finally (4) construct the interval. Remember from chapter three (Dahman, 2018a) that if from population one I have a sample of size $n_1$ with sample variance $s_1^2$ and population variance $\sigma_1^2$, then $\frac{(n_1 - 1)s_1^2}{\sigma_1^2}$ follows the $\chi^2$ (chi-square) distribution. The same holds for population two. The ratio of the two variances therefore has chi-square quantities in both the numerator and the denominator; dividing each by its degrees of freedom gives a ratio that follows the F distribution with two degrees of freedom:
$F_{n_1-1,\,n_2-1} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}$
Finally, you can construct the interval as below. Note the values $(\alpha/2)$ and $(1 - \alpha/2)$; as in the single-population case, each gives the area to the right of the critical value, so $F_{\alpha/2}$ is the larger value and produces the lower limit.
$\frac{s_1^2/s_2^2}{F_{\alpha/2;\,n_1-1,\,n_2-1}} \le \frac{\sigma_1^2}{\sigma_2^2} \le \frac{s_1^2/s_2^2}{F_{1-\alpha/2;\,n_1-1,\,n_2-1}}$
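The interval for the variance ratio can be sketched in code. The Python standard library has no F quantile function, so the critical value is entered by hand from an F table; the sample values are hypothetical and the function name `variance_ratio_interval` is an assumption. The sketch takes each F critical value to be indexed by the area to its right, so the larger critical value yields the lower limit.

```python
def variance_ratio_interval(s1_sq, s2_sq, f_upper, f_lower):
    """CI for sigma1^2 / sigma2^2.

    f_upper: F critical value with right-tail area alpha/2 (the larger one).
    f_lower: F critical value with right-tail area 1 - alpha/2 (the smaller one).
    """
    ratio = s1_sq / s2_sq
    return ratio / f_upper, ratio / f_lower


# Hypothetical samples: n1 = n2 = 11, so both degrees of freedom equal 10.
# Table value for right-tail area 0.05 with (10, 10) degrees of freedom.
F_0_05_10_10 = 2.98
# With equal degrees of freedom, the value for right-tail area 0.95
# is the reciprocal of the value for right-tail area 0.05.
F_0_95_10_10 = 1 / F_0_05_10_10

# 90% interval (alpha/2 = 0.05) for hypothetical variances 6 and 4.
low, high = variance_ratio_interval(6, 4, F_0_05_10_10, F_0_95_10_10)
print(low, high)
```

An interval that contains 1, as this one does, is compatible with the two population variances being equal.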


• "Research that Matters". (2002). FAQs | Research that matters, results that make sense. Retrieved October 23, 2018, from https://effectsizefaq.com/
• Cohen, J. (1994). The earth is round (p < .05). American Psychologist, 49(12), 997–1003. Retrieved from http://www.iro.umontreal.ca/~dift3913/cours/papers/cohen1994_The_earth_is_round.pdf
• Dahman, M. R. (2018a). AMSM- - Chapter Three. OSF Preprints. https://doi.org/10.31219/OSF.IO/H5AUC
• Dean W., R. A., & Wichern, J. (2007). Applied Multivariate Statistical Analysis (6th ed.). Pearson Prentice Hall.
