1. Preface 2. Introduction 3. Estimation in Statistics 4. Map of Estimation
Summary Papers- Applied Multivariate Statistical Modeling- Estimation (Point and Interval) -Chapter Four
Author: Mohammed R. Dahman. Ph.D. in MIS
ED.10.18
License: CC-By Attribution 4.0 International
Citation: Dahman, M. R. (2018, October 24). AMSM- Estimation (Point and Interval)- Chapter Four. https://doi.org/10.31219/osf.io/fc9zh

1. Preface
In statistics, estimation is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. This paper gives a thorough explanation of point and interval estimation, explains four important steps for understanding interval estimation, and addresses the scenario of more than one population.

2. Introduction
Let us first discuss the term estimation. Estimation is a branch of statistics and signal processing that determines the values of parameters from measured, observed empirical data. It is carried out to measure and diagnose the true value of a function or of a particular population, based on observations of samples, which are subsets of the target population or function. Several statistics are used to perform estimation. It is essential to understand this concept in order to progress further in the field of data analysis.
To simplify the definition above: suppose I want to construct an experiment to understand the behavior of a population. I would draw a sample (1), perhaps a sample (2), and so on up to sample (n). Why draw these samples? As discussed in previous chapters, I want to estimate the values of the population parameters using these samples' statistics. Note the word "estimate": it is the focal point of this summary paper.

3. Estimation in Statistics
Estimation statistics is a data analysis framework that uses a combination of effect sizes, confidence intervals, precision planning, and meta-analysis to plan experiments, analyze data, and interpret results. It is distinct from null hypothesis significance testing (NHST), which we will cover in upcoming chapters and which is considered less informative. Estimation statistics, or simply estimation, is also known as the new statistics, a distinction introduced in psychology, medical research, the life sciences, and a wide range of other experimental sciences where NHST remains prevalent, even though estimation statistics has been recommended as preferable for several decades ("Research that Matters", 2002; Cohen, 1994).

4. Map of Estimation
The map below illustrates estimation in statistics; we will discuss each branch in detail.

Estimation
  Point: mean, variance
  Interval:
    Single population: CI for mean, CI for variance
    Two populations: CI for mean difference, CI for ratio of variances
Figure 1: Map of estimation

Remember that statisticians use sample statistics to estimate population parameters. For example, sample means are used to estimate population means, and sample proportions are used to estimate population proportions.

1. Point estimate: A point estimate of a population parameter is a single value of a statistic. For example, the sample mean $\bar{x}$ is a point estimate of the population mean $\mu$. Similarly, the sample proportion $p$ is a point estimate of the population proportion $P$, and the sample variance $s^2$ is a point estimate of the population variance $\sigma^2$.

2. Interval estimate: An interval estimate is defined by two numbers between which a population parameter is said to lie. It has the form $P(L \le \theta \le U) = 1 - \alpha$, where $L$ is the lower bound, $U$ is the upper bound, $\theta$ is the population parameter, and $1 - \alpha$ is the confidence level. Let us now discuss the interval estimate for a single population and for two populations.

a. Single population: assume I have a single population from which I draw samples $S_1, S_2, \dots, S_n$. From each sample I can calculate statistics, for example the sample mean $\bar{x}_i$ and the sample variance $s_i^2$. If I collect these statistics into vectors,
$\bar{X} = (\bar{x}_1, \bar{x}_2, \dots, \bar{x}_n)^T, \quad S^2 = (s_1^2, s_2^2, \dots, s_n^2)^T,$
then these vectors follow a certain (sampling) distribution. Please refer to chapter three (Dahman, 2018a).
The figure below helps decide which type of distribution we can use.

Normal population
  $\sigma$ known: $n \ge 30$ → Z distribution; $n < 30$ → Z distribution
  $\sigma$ unknown: $n \ge 30$ → Z distribution; $n < 30$ → t distribution
Non-normal population
  $\sigma$ known: $n \ge 30$ → Z distribution; $n < 30$ → none (neither Z nor t applies)
  $\sigma$ unknown: $n \ge 30$ → Z distribution; $n < 30$ → none (neither Z nor t applies)
Figure 2: Type of distribution according to population

In the single-population case, for either the mean or the variance, we need to understand the confidence interval before we learn how to calculate it.

• Confidence Interval: Statisticians use a confidence interval to express the precision and uncertainty associated with a particular sampling method. A confidence interval consists of three parts:
1) A confidence level: the probability part of a confidence interval is called the confidence level.
The confidence level describes the likelihood that a particular sampling method will produce a confidence interval that includes the true population parameter. Here is how to interpret a confidence level. Suppose we collected all possible samples from a given population and computed a confidence interval for each sample. Some confidence intervals would include the true population parameter; others would not. A 95% confidence level means that 95% of the intervals contain the true population parameter, a 90% confidence level means that 90% of the intervals contain it, and so on. See the figure below.
2) A statistic: any sample statistic, such as a mean, proportion, or variance.
3) A margin of error: in a confidence interval, the range of values above and below the sample statistic is called the margin of error. For example, suppose the local newspaper conducts an election survey and reports that the independent candidate will receive 30% of the vote. The newspaper states that the survey had a margin of error of 5 percentage points and a confidence level of 95%. These findings give the following confidence interval: we are 95% confident that the independent candidate will receive between 25% and 35% of the vote.
Note: many public opinion surveys report interval estimates but not confidence intervals; that is, they provide the margin of error but not the confidence level. To interpret survey results clearly, you need to know both. We are much more likely to accept survey findings if the confidence level is high (say, 95%) than if it is low (say, 50%).

The confidence level describes the uncertainty of a sampling method, while the statistic and the margin of error define an interval estimate that describes the precision of the method. The interval estimate of a confidence interval is defined by the sample statistic ± margin of error. For example, suppose we compute an interval estimate of a population parameter.
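The rule just stated, interval estimate = sample statistic ± margin of error, can be sketched in a few lines of Python. This is a minimal illustration, not the author's method: the helpers `interval_estimate` and `z_margin_of_error` are hypothetical names, the second one assumes a population mean with known $\sigma$ and the usual margin $z_{\alpha/2}\,\sigma/\sqrt{n}$, and the numbers in the mean example are made up.

```python
import math
from statistics import NormalDist


def interval_estimate(statistic, margin_of_error):
    """Return the interval (L, U) = statistic -/+ margin of error."""
    return statistic - margin_of_error, statistic + margin_of_error


def z_margin_of_error(sigma, n, confidence=0.95):
    """Margin of error z_{alpha/2} * sigma / sqrt(n) for a mean with known sigma.

    Assumption: the Z case of Figure 2 applies (the sample mean is normally distributed).
    """
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 when confidence = 0.95
    return z * sigma / math.sqrt(n)


# Newspaper example from the text: statistic 30%, margin of error 5 points.
low, high = interval_estimate(0.30, 0.05)
print(f"{low:.2f} to {high:.2f}")  # prints: 0.25 to 0.35

# Hypothetical mean example (made-up numbers): x_bar = 50, sigma = 10, n = 100.
low, high = interval_estimate(50, z_margin_of_error(sigma=10, n=100))
print(f"{low:.2f} to {high:.2f}")  # prints: 48.04 to 51.96
```

Because the margin in the second helper scales with $1/\sqrt{n}$, quadrupling the sample size would halve it while keeping the same confidence level.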
We might describe this interval estimate as a 95% confidence interval. This means that if we used the same sampling method to select many different samples and computed an interval estimate (sample statistic ± margin of error) from each one, about 95% of those intervals would contain the true population parameter (see the figure below). Confidence intervals are preferred to point estimates because they indicate (a) the precision of the estimate and (b) the uncertainty of the estimate.

Assume I collected n samples. Look at $\bar{x}_4$, the statistic from sample 4: it lies beyond the upper bound, so the interval built around it does not contain the population mean $\mu$. The intervals from the other samples do contain $\mu$. I therefore have n − 1 intervals that contain the population mean and one (from sample 4) that does not. A 95% confidence level says that, in the long run, about 95% of intervals constructed this way contain $\mu$ and about 5% ($\alpha = 0.05$) do not. Here the n − 1 covering intervals belong to the 95%, and sample 4 belongs to the 5% that miss $\mu$; note that this 5% is $\alpha$, not the margin of error.

• Calculate Confidence Interval: the procedure for obtaining the confidence interval for either the mean or the variance depends on the type of distribution, as shown in Figure 2.
1) Obtain the CI for the population mean: here the scenario depends on whether $\sigma$ is known or unknown, as well as on the sample size.
▪ Use Z distribution: the steps are straightforward:
1. Collect a sample of size n: $x = (x_1, x_2, \dots, x_n)^T$.
2. Compute the sample mean and standard deviation: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$; $s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}$;
3. $x - \mu$