Chapter 3. Univariate Statistics


Empirical distribution: Histogram

A histogram shows the number of data points in each data bin.

Syntax

    [n,xout]=hist(data)                   %n: row vector of the number of data points in each bin
                                          %xout: bin locations
    hist(data)
    hist(data, # of bins)
    hist(data, vector of data bins)

Updated functions: hist → histogram

    [n, edges]=histcounts(data)
    center=edges(1:end-1)+diff(edges)/2   %bin centers computed from the bin edges

Empirical distribution: Histogram

    x=randn(1000, 1);
    histogram(x)
    hist(x, 22)         %gives similar results
    histogram(x, 50)    %50 bins
    y=-2:0.1:2;
    hist(x,y)           %not pretty (y is read as bin centers)
    histogram(x,y)      %much better (y is read as bin edges)

Empirical distributions

How do we describe a dataset? Discrete parameters:
• min, max, mean
• median, quartiles
• standard deviation
• variance
• skewness
• kurtosis

Mean: Why different definitions?
• Arithmetic mean: $\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
• Geometric mean: $\bar{x} = (x_1 \cdot x_2 \cdots x_N)^{1/N}$
• Harmonic mean: $\bar{x} = N \Big/ \sum_{i=1}^{N} \frac{1}{x_i}$

Median: write a median function

    function m=mymedian(x)
    a=sort(x);                      %order the data
    b=length(x);
    b2=floor(b/2);
    if (b/2 > b2)                   %odd number of points; same test as: if mod(b,2)
        m=a(b2+1);                  %middle value
    else
        m=0.5*(a(b2)+a(b2+1));      %average of the two middle values
    end
    end

Quantiles

Quantiles divide ordered data into (approximately) equal-sized subsets.
• 4-quantiles: quartiles
• 100-quantiles: percentiles
• 1st quartile: 25th percentile
• 2nd quartile: median: 50th percentile

Quartiles

x=1:15, what is the 3rd quartile?
1. Use the median to divide the data into 2 subsets (do not include the median value).
2. The lower quartile is the median of the lower half; the upper (3rd) quartile is the median of the upper half.

The median of 1:15 is 8, so the upper half is 9:15 and the 3rd quartile is 12.
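The two-step rule above can be sketched in MATLAB as follows. This is only an illustration of the median-of-halves method for x=1:15; the variable names (half, lower, upper, q1, q3) are chosen here and do not come from the slides.

```matlab
% Median-of-halves quartiles (a sketch; names are illustrative)
x = sort(1:15);               % ordered data
n = numel(x);
half = floor(n/2);            % size of each half; the middle value is dropped when n is odd
lower = x(1:half);            % lower half: 1..7
upper = x(end-half+1:end);    % upper half: 9..15
q1 = median(lower);           % lower (1st) quartile
q3 = median(upper);           % upper (3rd) quartile
```

For this data set q1 is 4 and q3 is 12. Other quartile conventions exist and can give slightly different values for the same data.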
MATLAB uses linear interpolation:

    prctile(x,[25 50 75])

Dispersion of the data: Central moments

• n-th moments: $\mu'_n = \frac{1}{N}\sum_{i=1}^{N} x_i^n$
• n-th central moments: $\mu_n = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^n$
    - 1st: $\mu_1 = 0$
    - 2nd: $\mu_2$ → variance
    - 3rd: $\mu_3$ → skewness
    - 4th: $\mu_4$ → kurtosis

Moment statistics

• Variance and standard deviation: $\mu_2 = \sigma^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$
• Skewness: $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^3 \Big/ \sigma^3$
• Kurtosis: $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4 \Big/ \sigma^4$

Moment statistics

Skewness
• > 0: the distribution has a longer tail to the right of the mean
• = 0: the distribution is symmetric around the mean
• < 0: the distribution has a longer tail to the left of the mean

Kurtosis (some define it as $\frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^4 \Big/ \sigma^4 - 3$)
• > 3: "wide" distribution
• = 3: normal distribution
• < 3: "narrow" distribution

Which variable is needed to compare the mean with the median?

Moment statistics

• Variance: var(x)
• Standard deviation: std(x)
• How do the variance, skewness and kurtosis of the "red" data compare to those of the "blue" data?

[Figure: histograms of the "red" and "blue" data, x from -4 to 4]

Moment statistics

How do the variance, skewness and kurtosis of the "red" data compare to those of the "blue" data?

[Figure: histograms of the "red" and "blue" data, x from -5 to 20]

Dealing with NaN

    x=[1:120, NaN];
    mean(x), var(x)          %both return NaN
    nanmean(x), nanvar(x)    %ignore the NaN; newer releases: mean(x,'omitnan'), var(x,'omitnan')
    skewness(x)
    kurtosis(x)

How do we remove the NaN values?
    x(isnan(x))=[]       %option 1: delete the NaN entries in place
    x=x(~isnan(x))       %option 2: keep only the non-NaN entries

NaN==NaN always returns 0; you must use isnan.

Organic matter data

    org=load('organicmatter_one.txt');
    %check out the data
    plot(org,'o-'), ylabel('wt %')
    %histogram
    %sqrt of the number of data is often a good first guess of intervals to use
    hist(org, 8)

Statistics:

    mean(org)                  12.3
    median(org)                12.5
    std(org)                   1.17
    var(org)                   1.36
    skewness(org)              -0.25
    kurtosis(org)              2.47
    prctile(org,[25,50,75])    [11.4 12.5 13.3]

Histogram: customized

    org=load('organicmatter_one.txt');
    [n,xout]=hist(org,8);    %n: row vector with the number of data in each bin
                             %xout: bin locations
    bar(xout, n, 'r')        %red bar
    %3d bar
    bar3(xout, n, 'b')

Sensitivity to outliers

    sodium = load('sodiumcontent.txt');
    whos sodium
    hist(sodium,11)
    %add an outlier
    sodium2=sodium;
    sodium2(121,1) = 0.1;    %same as sodium2=[sodium;0.1];

Which variable is most sensitive?

Sensitivity to outliers

                original    outlier
    Mean        5.7         5.6
    Median      6.0         6.0
    Std         1.1         1.2
    Skewness    -1.1        -1.5
    Kurtosis    3.7         6.1

[Figure: histogram of the sodium data, x from 0 to 8]

boxplot

    boxplot(org)

• The box shows the lower quartile, median, and upper quartile values.
• Whiskers show the most extreme data within 1.5 times the interquartile range (25th to 75th percentile) from the ends of the box (25th and 75th percentiles).
• Red + signs: outliers

    load carsmall
    boxplot(MPG,Origin)    %MPG is a vector of numbers, Origin a vector of strings that define the "group"

Box plot: group assignment {}

    data=[sodium; sodium2];
    name(1:length(sodium))={'original'};          %group label for each original data point
    ed= length(sodium);
    name(ed+1:ed+length(sodium2))={'outlier'};    %group label for the data with the outlier
    boxplot(data, name)

Statistical distribution

• Discrete probability distribution
• Continuous probability distribution
    - f(t): PDF, probability density function
    - F(x): CDF, cumulative distribution function, $F(x) = \int_{-\infty}^{x} f(t)\,dt$

Discrete distribution: Poisson, $P(k;\lambda) = \frac{\lambda^k e^{-\lambda}}{k!}$. k is a non-negative integer; λ > 0 is the expected number of events and need not be an integer.

Continuous PDF: Boltzmann

Gaussian (normal) distributions

• Parameters
    - Mean µ
    - Standard deviation σ
• PDF: $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
• CDF: $F(x) = \int_{-\infty}^{x} f(t)\,dt$

Syntax

    Y=pdf(name, x, p1, ...)
    Y=cdf(name, x, p1, ...)

name: distribution name
x: values at which to evaluate the function
pi: parameters for the distribution

Gaussian

    Y=pdf('norm',data vector, mean,std)
    Y=cdf('norm',data vector, mean,std)

Or

    Y=normpdf(data vector, mean,std)
    Y=normcdf(data vector, mean,std)

Distributions

    Beta                         Lognormal
    Binomial                     Nakagami
    Birnbaum-Saunders            Negative Binomial
    Burr Type XII                Noncentral F
    Chi-Square                   Noncentral t
    Exponential                  Noncentral Chi-Square
    Extreme Value                Normal
    F                            Poisson
    Gamma                        Rayleigh
    Generalized Extreme Value    Rician
    Generalized Pareto           Student's t
    Geometric                    t Location-Scale
    Hypergeometric               Uniform (Continuous)
    Inverse Gaussian             Uniform (Discrete)
    Logistic                     Weibull
    Loglogistic

Gaussian distribution

µ1=0, σ1=0.2; µ2=2, σ2=1; µ3=-2, σ3=0.5; µ4=0, σ4=3

    mu=[0, 2, -2, 0];sig=[0.2,1,0.5,3];
    x=linspace(-5,5,100);
    for i=1:4
        xpdf(:,i)=pdf('norm',x,mu(i),sig(i));    %one column per (mu, sigma) pair
        xcdf(:,i)=cdf('norm',x,mu(i),sig(i));
    end
    subplot(2,1,1), plot(x,xpdf)
    subplot(2,1,2), plot(x,xcdf)

Gaussian distribution

[Figure: PDF (top) and CDF (bottom) of the four Gaussian distributions]

Central limit theorem

The sum of a large number of independent and identically distributed random variables, each with finite mean and variance, is approximately normally distributed (the 2nd fundamental theorem of probability).

    for i=1:2000
        x=rand(1000,1) < 0.5;    %1000 bets, each won with probability 0.5
        heads=sum(x);            %number of wins
        tails=1000-heads;        %number of losses
        y(i)=heads-tails;        %net gain with $1 per bet
    end
    histogram(y)

[Figure: histograms of y if the winning odds are 50% and if the winning odds are 45%]

Can you afford going to Vegas?

What are the probabilities of losing $50 and $100 if the winning odds are 50%?
    ymean=mean(y)
    ystd=std(y)
    cdf('norm',-50,ymean,ystd)     %probability of ending at -$50 or lower
    cdf('norm',-100,ymean,ystd)    %probability of ending at -$100 or lower

Gaussian distribution

    Vegas                                      Poll
    n draws of 1 or -1                         n samples of yes (1) or no (0)
    with a winning odds of p                   p: probability of yes
    Total earn/loss                            Total vote
    mean: n*(p-(1-p))                          mean: n*p
    std: 2*sqrt(n*p*(1-p))                     std: sqrt(n*p*(1-p))

The poll result: a Gaussian with a mean of p and a standard deviation of sqrt(p*(1-p))/sqrt(n)

Polling uncertainty

A Gaussian distribution with a mean of p and a standard deviation of sqrt(p*(1-p))/sqrt(n)

(1) If p=50% and 1,000 people are sampled, what is the 95% confidence interval of the polling result?
(2) If p=30% and 1,000 people are sampled, what is the 95% confidence interval of the polling result?

[Figure: standard normal PDF with the central 68% marked, and CDF with the central 95% marked (2.5% in each tail), x from -5 to 5]

Central limit theorem

• Let X1, X2, X3, ... be a set of n independent and identically distributed (not necessarily normal) random variables having finite values of mean µ and variance σ². As the sample size n increases, the distribution of the sample average approaches the normal distribution with a mean µ and variance σ²/n irrespective of the shape of the original distribution.
• The PDF of the sum of two or more independent variables is the convolution of their densities (if these densities exist). The convolution of a number of density functions tends to the normal density as the number of density functions increases without bound, under the conditions stated above.

Gaussian distribution

Propagation of error (normal distribution)

If $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ are independent, then

$aX \sim N(a\mu_X, (a\sigma_X)^2)$

$X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$

Estimate of the errors

• Constant error: $Y = \frac{X_1 + X_2 + \dots + X_n}{n}$, $\sigma_Y^2 = \frac{\sum \sigma^2}{n^2} = \frac{\sigma^2}{n}$
• Weighted error: $Y = \frac{\sum w_i X_i}{\sum w_i}$, $\sigma_Y^2 = \frac{\sum w_i^2 \sigma_i^2}{\left(\sum w_i\right)^2}$

Central limit theorem

The log of a product of random variables that take only positive values tends to have a normal distribution, which makes the product itself have a log-normal distribution.

Log-normal distribution

If Y is a random variable with a normal distribution, then X = exp(Y) has a log-normal distribution.

[Figure: PDF and CDF of the log-normal distribution]

Log-normal distribution

• If $X_1, \dots, X_n$ are independent log-normally distributed variables and $Y = \prod_i X_i$, then Y is a log-normally distributed variable as well.
• If $Y = \sum_i X_i$, then Y can be reasonably approximated by another log-normal distribution.

Log-normal distribution

    mu=[0, 0, 1, 1];sig=[1/4,1/2,1,2];
    x=linspace(0,3,100);
    for i=1:4
        xpdf(:,i)=pdf('logn',x,mu(i),sig(i));    %mu and sig are the mean and std of log(x)
        xcdf(:,i)=cdf('logn',x,mu(i),sig(i));
    end
    subplot(2,1,1), semilogx(x,xpdf)
    subplot(2,1,2), semilogx(x,xcdf)

[Figure: atmospheric aerosol size distribution]

Chi-squared distribution

One of the most widely used distributions in statistical significance tests.
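As a rough sketch in the same style as the Gaussian and log-normal examples, the chi-squared PDF and CDF can be plotted for a few degrees of freedom. The degrees of freedom and the x range below are arbitrary choices made for illustration, not values from the slides; chi2pdf, chi2cdf and chi2inv are assumed to be available from the Statistics and Machine Learning Toolbox.

```matlab
% Sketch only: chi-squared PDF/CDF for several degrees of freedom
dof = [1, 2, 4, 8];               % degrees of freedom (illustrative choice)
x = linspace(0.01, 20, 200)';     % start just above 0: the dof=1 density is unbounded at 0
xpdf = zeros(numel(x), numel(dof));
xcdf = zeros(numel(x), numel(dof));
for i = 1:numel(dof)
    xpdf(:,i) = chi2pdf(x, dof(i));
    xcdf(:,i) = chi2cdf(x, dof(i));
end
subplot(2,1,1), plot(x, xpdf), ylabel('PDF')
subplot(2,1,2), plot(x, xcdf), ylabel('CDF')
% 95th percentile for each dof: the critical value a test statistic
% must exceed to be significant at the 5% level
crit = chi2inv(0.95, dof);
```

The vector crit holds one critical value per entry of dof. The critical value grows with the degrees of freedom, which is why the degrees of freedom must be stated along with any chi-squared test result.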