The Scalar Algebra of Means, Covariances, and Correlations
Recommended publications
-
Lecture 22: Bivariate Normal Distribution
6.5 Conditional Distributions - General Bivariate Normal
Statistics 104 (Colin Rundel), Lecture 22, April 11, 2012

Let Z1, Z2 ~ N(0, 1) be independent standard normals, which we will use to build a general bivariate normal distribution. Their joint density is

    f(z1, z2) = (1 / 2π) exp( −(z1² + z2²) / 2 )

We want to transform these unit normals to have the following arbitrary parameters μX, μY, σX, σY, ρ:

    X = σX Z1 + μX
    Y = σY [ ρ Z1 + √(1 − ρ²) Z2 ] + μY

General Bivariate Normal - Marginals
First, let's examine the marginal distributions of X and Y:

    X = σX Z1 + μX = σX N(0, 1) + μX = N(μX, σX²)

    Y = σY [ ρ Z1 + √(1 − ρ²) Z2 ] + μY
      = σY [ ρ N(0, 1) + √(1 − ρ²) N(0, 1) ] + μY
      = σY [ N(0, ρ²) + N(0, 1 − ρ²) ] + μY
      = σY N(0, 1) + μY
      = N(μY, σY²)

General Bivariate Normal - Cov/Corr
Second, we can find Cov(X, Y) and ρ(X, Y):

    Cov(X, Y) = E[ (X − E(X)) (Y − E(Y)) ]
              = E[ (σX Z1 + μX − μX) (σY [ρ Z1 + √(1 − ρ²) Z2] + μY − μY) ]
              = E[ (σX Z1) (σY [ρ Z1 + √(1 − ρ²) Z2]) ]
              = σX σY E[ ρ Z1² + √(1 − ρ²) Z1 Z2 ]
              = σX σY ρ E[Z1²]
              = σX σY ρ

    ρ(X, Y) = Cov(X, Y) / (σX σY) = ρ

General Bivariate Normal - RNG
Consequently, if we want to generate a Bivariate Normal random variable, we can use this transformation of independent standard normals directly (a sketch follows below).

Multivariate Change of Variables
Let X1, …, Xn have a continuous joint distribution with pdf f defined on a set S.
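A minimal sketch of the RNG idea above, assuming NumPy is available: it draws independent standard normals and applies the transformation X = σX Z1 + μX, Y = σY[ρ Z1 + √(1 − ρ²) Z2] + μY, then checks that the sample correlation is close to ρ. The parameter values are illustrative, not taken from the lecture.

```python
import numpy as np

def bivariate_normal_rng(n, mu_x, mu_y, sigma_x, sigma_y, rho, seed=0):
    """Generate n draws of (X, Y) from the bivariate normal built from two
    independent standard normals, as in the construction above."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n)
    z2 = rng.standard_normal(n)
    x = sigma_x * z1 + mu_x
    y = sigma_y * (rho * z1 + np.sqrt(1 - rho**2) * z2) + mu_y
    return x, y

# Illustrative parameters (not from the lecture itself)
x, y = bivariate_normal_rng(100_000, mu_x=1.0, mu_y=-2.0,
                            sigma_x=2.0, sigma_y=0.5, rho=0.7)
print(np.corrcoef(x, y)[0, 1])  # should be close to 0.7
```
-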
Applied Biostatistics: Mean and Standard Deviation
Health Sciences M.Sc. Programme, Applied Biostatistics
Mean and Standard Deviation

The mean
The median is not the only measure of central value for a distribution. Another is the arithmetic mean or average, usually referred to simply as the mean. This is found by taking the sum of the observations and dividing by their number. The mean is often denoted by a little bar over the symbol for the variable, e.g. x̄. The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later. The median is a very useful descriptive statistic, but not much used for other purposes.

Median, mean and skewness
The sum of the 57 FEV1s is 231.51 and hence the mean is 231.51/57 = 4.06. This is very close to the median, 4.1, so the median is within 1% of the mean. This is not so for the triglyceride data. The median triglyceride is 0.46 but the mean is 0.51, which is higher. The median is 10% away from the mean. If the distribution is symmetrical the sample mean and median will be about the same, but in a skew distribution they will not. If the distribution is skew to the right, as for serum triglyceride, the mean will be greater; if it is skew to the left, the median will be greater. This is because the values in the tails affect the mean but not the median. Figure 1 shows the positions of the mean and median on the histogram of triglyceride.
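A small illustration of the mean/median relationship under right skew, assuming NumPy; the data below are simulated from a lognormal distribution only to mimic a right-skewed variable like serum triglyceride, and are not the handout's dataset.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated right-skewed values (lognormal); illustrative only.
trig = rng.lognormal(mean=-0.75, sigma=0.4, size=500)

print(f"mean   = {trig.mean():.3f}")
print(f"median = {np.median(trig):.3f}")
# For right-skewed data the mean exceeds the median, because the long upper
# tail pulls the mean upward but leaves the median essentially unchanged.
```
-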
1. How Different Is the T Distribution from the Normal?
Statistics 101–106, Lecture 7 (20 October 98), © David Pollard

Read M&M §7.1 and §7.2, ignoring starred parts. Reread M&M §3.2. The effects of estimated variances on normal approximations; t-distributions; comparison of two means: pooling of estimates of variances, or paired observations.

In Lecture 6, when discussing comparison of two Binomial proportions, I was content to estimate unknown variances when calculating statistics that were to be treated as approximately normally distributed. You might have worried about the effect of variability of the estimate. W. S. Gosset ("Student") considered a similar problem in a very famous 1908 paper, where the role of Student's t-distribution was first recognized.

Gosset discovered that the effect of estimated variances could be described exactly in a simplified problem where n independent observations X1, …, Xn are taken from a normal distribution, N(μ, σ). The sample mean, X̄ = (X1 + … + Xn)/n, has a N(μ, σ/√n) distribution. The random variable

    Z = (X̄ − μ) / (σ/√n)

has a standard normal distribution. If we estimate σ² by the sample variance, s² = Σi (Xi − X̄)² / (n − 1), then the resulting statistic,

    T = (X̄ − μ) / (s/√n)

no longer has a normal distribution. It has a t-distribution on n − 1 degrees of freedom.

Remark. I have written T, instead of the t used by M&M page 505. I find it causes confusion that t refers to both the name of the statistic and the name of its distribution. As you will soon see, the estimation of the variance has the effect of spreading out the distribution a little beyond what it would be if σ were used.
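A quick numerical comparison of the t-distribution with the standard normal, assuming SciPy is available; the degrees of freedom shown are illustrative choices, not values from the lecture.

```python
from scipy import stats

# Compare two-sided tail probabilities P(|T| > 2) for a few degrees of freedom
# against the standard normal, to see the extra spread from estimating sigma.
for df in (3, 10, 30):
    t_tail = 2 * stats.t.sf(2.0, df)
    print(f"df={df:2d}:  P(|T| > 2) = {t_tail:.4f}")
print(f"normal: P(|Z| > 2) = {2 * stats.norm.sf(2.0):.4f}")
# The t tails are heavier for small df and approach the normal as df grows.
```
-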
ProUCL Version 3.0 User Guide (EPA/600/R04/079, April 2004)
ProUCL Version 3.0 User Guide, April 2004, EPA/600/R04/079

by Anita Singh, Lockheed Martin Environmental Services, 1050 E. Flamingo Road, Suite E120, Las Vegas, NV 89119; Ashok K. Singh, Department of Mathematical Sciences, University of Nevada, Las Vegas, NV 89154; and Robert W. Maichle, Lockheed Martin Environmental Services, 1050 E. Flamingo Road, Suite E120, Las Vegas, NV 89119.

Table of Contents
  Authors
  Table of Contents
  Disclaimer
  Executive Summary
  Introduction
  Installation Instructions
  Minimum Hardware Requirements
  A. ProUCL Menu Structure
    1. File
    2. View
    3. Help
  B. ProUCL Components
    1. File
      a. Input File Format
      b. Result of Opening an Input Data File
-
Characteristics and Statistics of Digital Remote Sensing Imagery (1)
Characteristics and statistics of digital remote sensing imagery (1)

Digital Image
• With a raster data structure, each image is treated as an array of pixel values.
• Image data are organized as rows and columns (or lines and pixels), starting from the upper left corner of the image.
• Each pixel (picture element) is treated as a separate unit.

Statistics of Digital Images
Statistics help to:
• Look at the frequency of occurrence of individual brightness values in the image displayed;
• View individual pixel brightness values at specific locations or within a geographic area;
• Compute univariate descriptive statistics to determine if there are unusual anomalies in the image data; and
• Compute multivariate statistics to determine the amount of between-band correlation (e.g., to identify redundancy).

It is necessary to calculate fundamental univariate and multivariate statistics of the multispectral remote sensor data. This involves identification and calculation of:
– maximum and minimum values
– the range, mean, standard deviation
– the between-band variance-covariance matrix
– the correlation matrix, and
– frequencies of brightness values
The results of the above can be used to produce histograms. Such statistics provide information necessary for processing and analyzing remote sensing data.

A "population" is an infinite or finite set of elements. A "sample" is a subset of the elements taken from a population, used to make inferences about certain characteristics of the population (e.g., training signatures). Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution: most values are clustered around the central value, and the frequency of occurrence declines away from this central point.
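A sketch of the band statistics described above, assuming NumPy; the 3-band array here is random synthetic data standing in for real imagery, so the printed values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic 3-band image (rows x cols x bands); stands in for real imagery.
img = rng.integers(0, 256, size=(100, 100, 3)).astype(float)

bands = img.reshape(-1, img.shape[-1])          # one column per band
print("min: ", bands.min(axis=0))
print("max: ", bands.max(axis=0))
print("mean:", bands.mean(axis=0))
print("std: ", bands.std(axis=0, ddof=1))

# Between-band variance-covariance and correlation matrices.
print("covariance:\n", np.cov(bands, rowvar=False))
print("correlation:\n", np.corrcoef(bands, rowvar=False))
```
-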
Linear Regression
eesc BC 3017 statistics notes: LINEAR REGRESSION

Systematic variation in the true value
Up to now, we have been thinking about measurement as sampling of values from an ensemble of all possible outcomes in order to estimate the true value (which would, according to our previous discussion, be well approximated by the mean of a very large sample). Given a sample of outcomes, we have sometimes checked the hypothesis that it is a random sample from some ensemble of outcomes, by plotting the data points against some other variable, such as ordinal position. Under the hypothesis of random selection, no clear trend should appear. However, the contrary case, where one finds a clear trend, is very important. A clear trend can be a discovery, rather than a nuisance! Whether it is a discovery or a nuisance (or both) depends on what one finds out about the reasons underlying the trend. In either case one must be prepared to deal with trends in analyzing data.

Figure 2.1 (a) shows a plot of (hypothetical) data in which there is a very clear trend. The y axis scales concentration of coliform bacteria sampled from rivers in various regions (units are colonies per liter). The x axis is a hypothetical index of regional urbanization, ranging from 1 to 10. The hypothetical data consist of 6 different measurements at each level of urbanization. The mean of each set of 6 measurements gives a rough estimate of the true value for coliform bacteria concentration for rivers in a region with that urbanization level. The jagged dark line drawn on the graph connects these estimates of true value and makes the trend quite clear: more extensive urbanization is associated with higher true values of bacteria concentration.
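A minimal regression sketch in the spirit of the hypothetical example above, assuming NumPy; the simulated coliform concentrations and the fitted slope and intercept are made up for illustration and are not the data behind the notes' Figure 2.1.

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical data: 6 measurements at each urbanization level 1..10,
# with concentration trending upward; values are invented for illustration.
urban = np.repeat(np.arange(1, 11), 6).astype(float)
conc = 200.0 + 150.0 * urban + rng.normal(0.0, 120.0, size=urban.size)

# Ordinary least-squares fit of a straight line (degree-1 polynomial).
slope, intercept = np.polyfit(urban, conc, deg=1)
print(f"fitted line: conc ≈ {intercept:.1f} + {slope:.1f} * urbanization")

# Mean at each level, analogous to the jagged line connecting level means.
level_means = [conc[urban == u].mean() for u in range(1, 11)]
print(np.round(level_means, 1))
```
-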
Basic Statistical Concepts
Basic Statistical Concepts

Statistical Population
• The entire underlying set of observations from which samples are drawn.
  – Philosophical meaning: all observations that could ever be taken for the range of inference
    (e.g., all barnacle populations that have ever existed, that exist, or that will exist)
  – Practical meaning: all observations within a reasonable range of inference
    (e.g., barnacle populations on that stretch of coast)

Statistical Sample
• A representative subset of a population.
  – What counts as being representative: unbiased and, hopefully, precise.

Strategies
• Define survey objectives: what is the goal of the survey or experiment? What are your hypotheses?
• Define the population parameters to estimate (e.g., number of individuals, growth, color, etc.).
• Implement a sampling strategy:
  – measure every individual (think of the implications in terms of cost, time, and practicality, especially if the measurement is destructive), or
  – measure a representative portion of the population (a sample).

Sampling
• Goal: every unit and combination of units in the population (of interest) has an equal chance of selection.
  – This is a fundamental assumption in all estimation procedures.
• How: many ways if the underlying distribution is not uniform.
  – In the absence of information about the underlying distribution, the only safe strategy is random sampling.
  – Costs: sometimes difficult, and may lead to its own source of bias (if sample size is low). Much more about this later.

Sampling Objectives
• To obtain an unbiased estimate of a population mean
• To assess the precision of the estimate (i.e.
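A small sketch of simple random sampling and the precision of the resulting estimate, assuming NumPy; the "population" of counts here is synthetic and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic population (e.g., counts per sampling unit); values are made up.
population = rng.poisson(lam=40, size=10_000)

# Simple random sample without replacement: every unit has an equal chance.
sample = rng.choice(population, size=50, replace=False)

mean = sample.mean()
se = sample.std(ddof=1) / np.sqrt(sample.size)   # standard error of the mean
print(f"sample mean = {mean:.2f}, standard error = {se:.2f}")
print(f"true population mean = {population.mean():.2f}")
```
-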
The Central Limit Theorem (Review)
Introduction to Confidence Intervals – Solutions
STAT-UB.0103 – Statistics for Business Control and Regression Models

The Central Limit Theorem (Review)

1. You draw a random sample of size n = 64 from a population with mean µ = 50 and standard deviation σ = 16. From this, you compute the sample mean, X̄.

(a) What are the expectation and standard deviation of X̄?
Solution: E[X̄] = µ = 50; sd[X̄] = σ/√n = 16/√64 = 2.

(b) Approximately what is the probability that the sample mean is above 54?
Solution: The sample mean has expectation 50 and standard deviation 2. By the central limit theorem, the sample mean is approximately normally distributed. Thus, by the empirical rule, there is roughly a 2.5% chance of being above 54 (2 standard deviations above the mean).

(c) Do you need any additional assumptions for part (b) to be true?
Solution: No. Since the sample size is large (n ≥ 30), the central limit theorem applies.

2. You draw a random sample of size n = 16 from a population with mean µ = 100 and standard deviation σ = 20. From this, you compute the sample mean, X̄.

(a) What are the expectation and standard deviation of X̄?
Solution: E[X̄] = µ = 100; sd[X̄] = σ/√n = 20/√16 = 5.

(b) Approximately what is the probability that the sample mean is between 95 and 105?
Solution: The sample mean has expectation 100 and standard deviation 5. If it is approximately normal, then we can use the empirical rule to say that there is a 68% chance of being between 95 and 105 (within one standard deviation of its expectation).
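A quick check of the approximations in problems 1(b) and 2(b) using the exact normal CDF, assuming SciPy is available; this only verifies the empirical-rule answers given above.

```python
from scipy import stats

# Problem 1(b): n = 64, mu = 50, sigma = 16, so the sd of the mean is 16/8 = 2.
p_above_54 = stats.norm.sf(54, loc=50, scale=2)
print(f"P(sample mean > 54) ≈ {p_above_54:.4f}")      # about 0.023, i.e. ~2.5%

# Problem 2(b): n = 16, mu = 100, sigma = 20, so the sd of the mean is 20/4 = 5.
p_95_105 = stats.norm.cdf(105, loc=100, scale=5) - stats.norm.cdf(95, loc=100, scale=5)
print(f"P(95 < sample mean < 105) ≈ {p_95_105:.4f}")  # about 0.68
```
-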
Calculating Variance and Standard Deviation
VARIANCE AND STANDARD DEVIATION

Recall that the range is the difference between the upper and lower limits of the data. While this is important, it does have one major disadvantage: it does not describe the variation among the values. For instance, both of these sets of data have the same range, yet the spread of their values is clearly different.

    90, 90, 90, 98, 90    Range = 8
    1, 6, 8, 1, 9, 5      Range = 8

To better describe the variation, we will introduce two other measures of variation: variance and standard deviation (the variance is the square of the standard deviation). These measures tell us how much the actual values differ from the mean. The larger the standard deviation, the more spread out the values. The smaller the standard deviation, the less spread out the values. This measure is particularly helpful to teachers as they try to find whether their students' scores on a certain test are closely related to the class average.

To find the standard deviation of a set of values:
a. Find the mean of the data
b. Find the difference (deviation) between each of the scores and the mean
c. Square each deviation
d. Sum the squares
e. Dividing by one less than the number of values, find the "mean" of this sum (the variance*)
f. Find the square root of the variance (the standard deviation)

*Note: In some books, the variance is found by dividing by n. In statistics it is more useful to divide by n − 1.

EXAMPLE
Find the variance and standard deviation of the following scores on an exam: 92, 95, 85, 80, 75, 50

SOLUTION
First we find the mean of the data:

    Mean = (92 + 95 + 85 + 80 + 75 + 50) / 6 = 477 / 6 = 79.5

Then we find the difference between each score and the mean (deviation); a sketch completing the arithmetic follows below.
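A short sketch completing the worked example, assuming NumPy; it follows steps a–f above for the scores 92, 95, 85, 80, 75, 50.

```python
import numpy as np

scores = np.array([92, 95, 85, 80, 75, 50], dtype=float)

mean = scores.mean()                      # step a: 79.5
deviations = scores - mean                # step b
squared = deviations ** 2                 # step c
total = squared.sum()                     # step d: 1317.5
variance = total / (len(scores) - 1)      # step e: divide by n - 1
std_dev = np.sqrt(variance)               # step f

print(f"mean = {mean}, variance = {variance:.2f}, std dev = {std_dev:.2f}")
# Expected: variance = 263.5 and standard deviation ≈ 16.2
```
-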
Nearest Neighbor Methods, by Philip M. Dixon
Statistics Preprints (Statistics, Iowa State University), 12-2001

Nearest Neighbor Methods
Philip M. Dixon, Iowa State University

Recommended Citation: Dixon, Philip M., "Nearest Neighbor Methods" (2001). Statistics Preprints. 51. http://lib.dr.iastate.edu/stat_las_preprints/51

Abstract
Nearest neighbor methods are a diverse group of statistical methods united by the idea that the similarity between a point and its nearest neighbor can be used for statistical inference. This review article summarizes two common environmetric applications: nearest neighbor methods for spatial point processes and nearest neighbor designs and analyses for field experiments. In spatial point processes, the appropriate similarity is the distance between a point and its nearest neighbor. Given a realization of a spatial point process, the mean nearest neighbor distance or the distribution of distances can be used for inference about the spatial process. One common application is to test whether the process is a homogeneous Poisson process. These methods can be extended to describe relationships between two or more spatial point processes. These methods are illustrated using data on the locations of trees in a swamp hardwood forest. In field experiments, neighboring plots often have similar characteristics before treatments are imposed.
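A sketch of the mean nearest-neighbor distance idea for a spatial point pattern, assuming NumPy and SciPy; the points are simulated, and the comparison uses the standard expectation 1/(2√λ) for a homogeneous Poisson process (a well-known result quoted here for illustration, not taken from this preprint; edge effects are ignored).

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(11)

# Simulate a roughly Poisson pattern: n points uniform on the unit square.
n = 400
pts = rng.uniform(0.0, 1.0, size=(n, 2))

# Distance from each point to its nearest neighbor (k=2: itself plus the neighbor).
tree = cKDTree(pts)
dists, _ = tree.query(pts, k=2)
mean_nn = dists[:, 1].mean()

# Under complete spatial randomness with intensity lambda = n / area,
# the expected nearest-neighbor distance is 1 / (2 * sqrt(lambda)).
lam = n / 1.0
expected = 1.0 / (2.0 * np.sqrt(lam))
print(f"observed mean NN distance = {mean_nn:.4f}, CSR expectation ≈ {expected:.4f}")
```
-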
Annex: Calculation of Mean and Standard Deviation
Annex: Calculation of Mean and Standard Deviation

• A cholesterol control is run 20 times over 25 days, yielding the following results in mg/dL: 192, 188, 190, 190, 189, 191, 188, 193, 188, 190, 191, 194, 194, 188, 192, 190, 189, 189, 191, 192.
• Using the cholesterol control results, follow the steps described below to establish QC ranges. An example is shown below.

1. Make a table with 3 columns, labeled A, B, C.
2. Insert the data points on the left (column A).
3. Add the data in column A.
4. Calculate the mean: add the measurements (sum) and divide by the number of measurements (n).

    Mean = (x1 + x2 + x3 + … + xn) / n = 3809 / 20 = 190.5 mg/dL

5. Calculate the variance and standard deviation (see formulas below):
   a. Find the difference between each data point and the mean (xi − x̄) and write it in column B.
   b. Square each value in column B and write the result in column C.
   c. Add column C. The result is 71 mg²/dL².
   d. Now calculate the variance: divide the sum in column C by n − 1, which is 19. The result is 4 mg²/dL² (rounded).
   e. The variance has little value in the laboratory because the units are squared.
   f. Now calculate the SD by taking the square root of the variance.
   g. The result is 2 mg/dL (rounded).

    A: data point xi (mg/dL)    B: xi − x̄      C: (xi − x̄)²
    192                          1.5            2.25
    188                         −2.5            6.25
    190                         −0.5            0.25
    190                         −0.5            0.25
    189                         −1.5            2.25
    191                          0.5            0.25
    188                         −2.5            6.25
    193                          2.5            6.25
    188                         −2.5            6.25
    190                         −0.5            0.25
    191                          0.5            0.25
    194                          3.5           12.25
    194                          3.5           12.25
    188                         −2.5            6.25
    192                          1.5            2.25
    190                         −0.5            0.25
    189                         −1.5            2.25
    189                         −1.5            2.25
    191                          0.5            0.25
    192                          1.5            2.25
    Sum: ∑x = 3809              ∑ = −1          ∑(xi − x̄)² = 71

(The deviations sum to −1 rather than exactly 0 because the mean was rounded to 190.5.)

    SD = s = √( ∑(xi − x̄)² / (n − 1) ) = √(71 / 19) ≈ 2 mg/dL

The square root returns the result to the original units.
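A small sketch of the same calculation and a resulting QC range, assuming NumPy; the ±2 SD limits follow a common Levey-Jennings convention and are shown only as an illustration, not as this annex's prescribed rule.

```python
import numpy as np

control = np.array([192, 188, 190, 190, 189, 191, 188, 193, 188, 190,
                    191, 194, 194, 188, 192, 190, 189, 189, 191, 192], dtype=float)

mean = control.mean()       # 190.45, reported above (rounded) as 190.5
sd = control.std(ddof=1)    # ≈ 1.93, reported above (rounded) as 2

print(f"mean = {mean:.2f} mg/dL, SD = {sd:.2f} mg/dL")

# Illustrative QC range at mean ± 2 SD (a common convention; confirm against
# the laboratory's own QC policy).
low, high = mean - 2 * sd, mean + 2 * sd
print(f"QC range (±2 SD): {low:.1f} to {high:.1f} mg/dL")
```
-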
Experimental Investigation of the Tension and Compression Creep Behavior of Alumina-Spinel Refractories at High Temperatures
ceramics, Article

Experimental Investigation of the Tension and Compression Creep Behavior of Alumina-Spinel Refractories at High Temperatures

Lucas Teixeira 1,*, Soheil Samadi 2, Jean Gillibert 1, Shengli Jin 2, Thomas Sayet 1, Dietmar Gruber 2 and Eric Blond 1

1 Université d'Orléans, Université de Tours, INSA-CVL, LaMé, 45072 Orléans, France
2 Chair of Ceramics, Montanuniversität Leoben, 8700 Leoben, Austria
* Corresponding author

Received: 3 September 2020; Accepted: 18 September 2020; Published: 22 September 2020

Abstract: Refractory materials are subjected to thermomechanical loads during their working life, and consequent creep strain and stress relaxation are often observed. In this work, the asymmetric high temperature primary and secondary creep behavior of a material used in the working lining of steel ladles is characterized, using uniaxial tension and compression creep tests and an inverse identification procedure to calculate the parameters of a Norton-Bailey based law. The experimental creep curves are presented, as well as the curves resulting from the identified parameters, and a statistical analysis is made to evaluate the confidence of the results.

Keywords: refractories; creep; parameters identification

1. Introduction
Refractory materials, known for their physical and chemical stability, are used in high temperature processes in different industries, such as iron and steel making, cement and aerospace. These materials are exposed to thermomechanical loads, corrosion/erosion from solids, liquids and gases, gas diffusion, and mechanical abrasion [1].