Statistical Analysis of Hydrologic Data
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Use of Proc Iml to Calculate L-Moments for the Univariate Distributional Shape Parameters Skewness and Kurtosis
Statistics 573 USE OF PROC IML TO CALCULATE L-MOMENTS FOR THE UNIVARIATE DISTRIBUTIONAL SHAPE PARAMETERS SKEWNESS AND KURTOSIS Michael A. Walega Berlex Laboratories, Wayne, New Jersey Introduction Exploratory data analysis statistics, such as those Gaussian. Bickel (1988) and Van Oer Laan and generated by the sp,ge procedure PROC Verdooren (1987) discuss the concept of robustness UNIVARIATE (1990), are useful tools to characterize and how it pertains to the assumption of normality. the underlying distribution of data prior to more rigorous statistical analyses. Assessment of the As discussed by Glass et al. (1972), incorrect distributional shape of data is usually accomplished conclusions may be reached when the normality by careful examination of the values of the third and assumption is not valid, especially when one-tail tests fourth central moments, skewness and kurtosis. are employed or the sample size or significance level However, when the sample size is small or the are very small. Hopkins and Weeks (1990) also underlying distribution is non-normal, the information discuss the effects of highly non-normal data on obtained from the sample skewness and kurtosis can hypothesis testing of variances. Thus, it is apparent be misleading. that examination of the skewness (departure from symmetry) and kurtosis (deviation from a normal One alternative to the central moment shape statistics curve) is an important component of exploratory data is the use of linear combinations of order statistics (L analyses. moments) to examine the distributional shape characteristics of data. L-moments have several Various methods to estimate skewness and kurtosis theoretical advantages over the central moment have been proposed (MacGillivray and Salanela, shape statistics: Characterization of a wider range of 1988). -
The Statistical Analysis of Distributions, Percentile Rank Classes and Top-Cited
How to analyse percentile impact data meaningfully in bibliometrics: The statistical analysis of distributions, percentile rank classes and top-cited papers Lutz Bornmann Division for Science and Innovation Studies, Administrative Headquarters of the Max Planck Society, Hofgartenstraße 8, 80539 Munich, Germany; [email protected]. 1 Abstract According to current research in bibliometrics, percentiles (or percentile rank classes) are the most suitable method for normalising the citation counts of individual publications in terms of the subject area, the document type and the publication year. Up to now, bibliometric research has concerned itself primarily with the calculation of percentiles. This study suggests how percentiles can be analysed meaningfully for an evaluation study. Publication sets from four universities are compared with each other to provide sample data. These suggestions take into account on the one hand the distribution of percentiles over the publications in the sets (here: universities) and on the other hand concentrate on the range of publications with the highest citation impact – that is, the range which is usually of most interest in the evaluation of scientific performance. Key words percentiles; research evaluation; institutional comparisons; percentile rank classes; top-cited papers 2 1 Introduction According to current research in bibliometrics, percentiles (or percentile rank classes) are the most suitable method for normalising the citation counts of individual publications in terms of the subject area, the document type and the publication year (Bornmann, de Moya Anegón, & Leydesdorff, 2012; Bornmann, Mutz, Marx, Schier, & Daniel, 2011; Leydesdorff, Bornmann, Mutz, & Opthof, 2011). Until today, it has been customary in evaluative bibliometrics to use the arithmetic mean value to normalize citation data (Waltman, van Eck, van Leeuwen, Visser, & van Raan, 2011). -
Applied Biostatistics Mean and Standard Deviation the Mean the Median Is Not the Only Measure of Central Value for a Distribution
Health Sciences M.Sc. Programme Applied Biostatistics Mean and Standard Deviation The mean The median is not the only measure of central value for a distribution. Another is the arithmetic mean or average, usually referred to simply as the mean. This is found by taking the sum of the observations and dividing by their number. The mean is often denoted by a little bar over the symbol for the variable, e.g. x . The sample mean has much nicer mathematical properties than the median and is thus more useful for the comparison methods described later. The median is a very useful descriptive statistic, but not much used for other purposes. Median, mean and skewness The sum of the 57 FEV1s is 231.51 and hence the mean is 231.51/57 = 4.06. This is very close to the median, 4.1, so the median is within 1% of the mean. This is not so for the triglyceride data. The median triglyceride is 0.46 but the mean is 0.51, which is higher. The median is 10% away from the mean. If the distribution is symmetrical the sample mean and median will be about the same, but in a skew distribution they will not. If the distribution is skew to the right, as for serum triglyceride, the mean will be greater, if it is skew to the left the median will be greater. This is because the values in the tails affect the mean but not the median. Figure 1 shows the positions of the mean and median on the histogram of triglyceride. -
Comparison of Harmonic, Geometric and Arithmetic Means for Change Detection in SAR Time Series Guillaume Quin, Béatrice Pinel-Puysségur, Jean-Marie Nicolas
Comparison of Harmonic, Geometric and Arithmetic means for change detection in SAR time series Guillaume Quin, Béatrice Pinel-Puysségur, Jean-Marie Nicolas To cite this version: Guillaume Quin, Béatrice Pinel-Puysségur, Jean-Marie Nicolas. Comparison of Harmonic, Geometric and Arithmetic means for change detection in SAR time series. EUSAR. 9th European Conference on Synthetic Aperture Radar, 2012., Apr 2012, Germany. hal-00737524 HAL Id: hal-00737524 https://hal.archives-ouvertes.fr/hal-00737524 Submitted on 2 Oct 2012 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. EUSAR 2012 Comparison of Harmonic, Geometric and Arithmetic Means for Change Detection in SAR Time Series Guillaume Quin CEA, DAM, DIF, F-91297 Arpajon, France Béatrice Pinel-Puysségur CEA, DAM, DIF, F-91297 Arpajon, France Jean-Marie Nicolas Telecom ParisTech, CNRS LTCI, 75634 Paris Cedex 13, France Abstract The amplitude distribution in a SAR image can present a heavy tail. Indeed, very high–valued outliers can be observed. In this paper, we propose the usage of the Harmonic, Geometric and Arithmetic temporal means for amplitude statistical studies along time. In general, the arithmetic mean is used to compute the mean amplitude of time series. -
A Skew Extension of the T-Distribution, with Applications
J. R. Statist. Soc. B (2003) 65, Part 1, pp. 159–174 A skew extension of the t-distribution, with applications M. C. Jones The Open University, Milton Keynes, UK and M. J. Faddy University of Birmingham, UK [Received March 2000. Final revision July 2002] Summary. A tractable skew t-distribution on the real line is proposed.This includes as a special case the symmetric t-distribution, and otherwise provides skew extensions thereof.The distribu- tion is potentially useful both for modelling data and in robustness studies. Properties of the new distribution are presented. Likelihood inference for the parameters of this skew t-distribution is developed. Application is made to two data modelling examples. Keywords: Beta distribution; Likelihood inference; Robustness; Skewness; Student’s t-distribution 1. Introduction Student’s t-distribution occurs frequently in statistics. Its usual derivation and use is as the sam- pling distribution of certain test statistics under normality, but increasingly the t-distribution is being used in both frequentist and Bayesian statistics as a heavy-tailed alternative to the nor- mal distribution when robustness to possible outliers is a concern. See Lange et al. (1989) and Gelman et al. (1995) and references therein. It will often be useful to consider a further alternative to the normal or t-distribution which is both heavy tailed and skew. To this end, we propose a family of distributions which includes the symmetric t-distributions as special cases, and also includes extensions of the t-distribution, still taking values on the whole real line, with non-zero skewness. Let a>0 and b>0be parameters. -
TR Multivariate Conditional Median Estimation
Discussion Paper: 2004/02 TR multivariate conditional median estimation Jan G. de Gooijer and Ali Gannoun www.fee.uva.nl/ke/UvA-Econometrics Department of Quantitative Economics Faculty of Economics and Econometrics Universiteit van Amsterdam Roetersstraat 11 1018 WB AMSTERDAM The Netherlands TR Multivariate Conditional Median Estimation Jan G. De Gooijer1 and Ali Gannoun2 1 Department of Quantitative Economics University of Amsterdam Roetersstraat 11, 1018 WB Amsterdam, The Netherlands Telephone: +31—20—525 4244; Fax: +31—20—525 4349 e-mail: [email protected] 2 Laboratoire de Probabilit´es et Statistique Universit´e Montpellier II Place Eug`ene Bataillon, 34095 Montpellier C´edex 5, France Telephone: +33-4-67-14-3578; Fax: +33-4-67-14-4974 e-mail: [email protected] Abstract An affine equivariant version of the nonparametric spatial conditional median (SCM) is con- structed, using an adaptive transformation-retransformation (TR) procedure. The relative performance of SCM estimates, computed with and without applying the TR—procedure, are compared through simulation. Also included is the vector of coordinate conditional, kernel- based, medians (VCCMs). The methodology is illustrated via an empirical data set. It is shown that the TR—SCM estimator is more efficient than the SCM estimator, even when the amount of contamination in the data set is as high as 25%. The TR—VCCM- and VCCM estimators lack efficiency, and consequently should not be used in practice. Key Words: Spatial conditional median; kernel; retransformation; robust; transformation. 1 Introduction p s Let (X1, Y 1),...,(Xn, Y n) be independent replicates of a random vector (X, Y ) IR IR { } ∈ × where p > 1,s > 2, and n>p+ s. -
Estimating the Mean and Variance of a Normal Distribution
Estimating the Mean and Variance of a Normal Distribution Learning Objectives After completing this module, the student will be able to • explain the value of repeating experiments • explain the role of the law of large numbers in estimating population means • describe the effect of increasing the sample size or reducing measurement errors or other sources of variability Knowledge and Skills • Properties of the arithmetic mean • Estimating the mean of a normal distribution • Law of Large Numbers • Estimating the Variance of a normal distribution • Generating random variates in EXCEL Prerequisites 1. Calculating sample mean and arithmetic average 2. Calculating sample standard variance and standard deviation 3. Normal distribution Citation: Neuhauser, C. Estimating the Mean and Variance of a Normal Distribution. Created: September 9, 2009 Revisions: Copyright: © 2009 Neuhauser. This is an open‐access article distributed under the terms of the Creative Commons Attribution Non‐Commercial Share Alike License, which permits unrestricted use, distribution, and reproduction in any medium, and allows others to translate, make remixes, and produce new stories based on this work, provided the original author and source are credited and the new work will carry the same license. Funding: This work was partially supported by a HHMI Professors grant from the Howard Hughes Medical Institute. Page 1 Pretest 1. Laura and Hamid are late for Chemistry lab. The lab manual asks for determining the density of solid platinum by repeating the measurements three times. To save time, they decide to only measure the density once. Explain the consequences of this shortcut. 2. Tom and Bao Yu measured the density of solid platinum three times: 19.8, 21.4, and 21.9 g/cm3. -
Simple Mean Weighted Mean Or Harmonic Mean
MultiplyMultiply oror Divide?Divide? AA BestBest PracticePractice forfor FactorFactor AnalysisAnalysis 77 ––10 10 JuneJune 20112011 Dr.Dr. ShuShu-Ping-Ping HuHu AlfredAlfred SmithSmith CCEACCEA Los Angeles Washington, D.C. Boston Chantilly Huntsville Dayton Santa Barbara Albuquerque Colorado Springs Ft. Meade Ft. Monmouth Goddard Space Flight Center Ogden Patuxent River Silver Spring Washington Navy Yard Cleveland Dahlgren Denver Johnson Space Center Montgomery New Orleans Oklahoma City Tampa Tacoma Vandenberg AFB Warner Robins ALC Presented at the 2011 ISPA/SCEA Joint Annual Conference and Training Workshop - www.iceaaonline.com PRT-70, 01 Apr 2011 ObjectivesObjectives It is common to estimate hours as a simple factor of a technical parameter such as weight, aperture, power or source lines of code (SLOC), i.e., hours = a*TechParameter z “Software development hours = a * SLOC” is used as an example z Concept is applicable to any factor cost estimating relationship (CER) Our objective is to address how to best estimate “a” z Multiply SLOC by Hour/SLOC or Divide SLOC by SLOC/Hour? z Simple, weighted, or harmonic mean? z Role of regression analysis z Base uncertainty on the prediction interval rather than just the range Our goal is to provide analysts a better understanding of choices available and how to select the right approach Presented at the 2011 ISPA/SCEA Joint Annual Conference and Training Workshop - www.iceaaonline.com PR-70, 01 Apr 2011 Approved for Public Release 2 of 25 OutlineOutline Definitions -
Statistics and GIS Assistance Help with Statistics
Statistics and GIS assistance An arrangement for help and advice with regard to statistics and GIS is now in operation, principally for Master’s students. How do you seek advice? 1. The users, i.e. students at INA, make direct contact with the person whom they think can help and arrange a time for consultation. Remember to be well prepared! 2. Doctoral students and postdocs register the time used in Agresso (if you have questions about this contact Gunnar Jensen). Help with statistics Research scientist Even Bergseng Discipline: Forest economy, forest policies, forest models Statistical expertise: Regression analysis, models with random and fixed effects, controlled/truncated data, some time series modelling, parametric and non-parametric effectiveness analyses Software: Stata, Excel Postdoc. Ole Martin Bollandsås Discipline: Forest production, forest inventory Statistics expertise: Regression analysis, sampling Software: SAS, R Associate Professor Sjur Baardsen Discipline: Econometric analysis of markets in the forest sector Statistical expertise: General, although somewhat “rusty”, expertise in many econometric topics (all-rounder) Software: Shazam, Frontier Associate Professor Terje Gobakken Discipline: GIS og long-term predictions Statistical expertise: Regression analysis, ANOVA and PLS regression Software: SAS, R Ph.D. Student Espen Halvorsen Discipline: Forest economy, forest management planning Statistical expertise: OLS, GLS, hypothesis testing, autocorrelation, ANOVA, categorical data, GLM, ANOVA Software: (partly) Shazam, Minitab og JMP Ph.D. Student Jan Vidar Haukeland Discipline: Nature based tourism Statistical expertise: Regression and factor analysis Software: SPSS Associate Professor Olav Høibø Discipline: Wood technology Statistical expertise: Planning of experiments, regression analysis (linear and non-linear), ANOVA, random and non-random effects, categorical data, multivariate analysis Software: R, JMP, Unscrambler, some SAS Ph.D. -
A Logistic Regression Equation for Estimating the Probability of a Stream in Vermont Having Intermittent Flow
Prepared in cooperation with the Vermont Center for Geographic Information A Logistic Regression Equation for Estimating the Probability of a Stream in Vermont Having Intermittent Flow Scientific Investigations Report 2006–5217 U.S. Department of the Interior U.S. Geological Survey A Logistic Regression Equation for Estimating the Probability of a Stream in Vermont Having Intermittent Flow By Scott A. Olson and Michael C. Brouillette Prepared in cooperation with the Vermont Center for Geographic Information Scientific Investigations Report 2006–5217 U.S. Department of the Interior U.S. Geological Survey U.S. Department of the Interior DIRK KEMPTHORNE, Secretary U.S. Geological Survey P. Patrick Leahy, Acting Director U.S. Geological Survey, Reston, Virginia: 2006 For product and ordering information: World Wide Web: http://www.usgs.gov/pubprod Telephone: 1-888-ASK-USGS For more information on the USGS--the Federal source for science about the Earth, its natural and living resources, natural hazards, and the environment: World Wide Web: http://www.usgs.gov Telephone: 1-888-ASK-USGS Any use of trade, product, or firm names is for descriptive purposes only and does not imply endorsement by the U.S. Government. Although this report is in the public domain, permission must be secured from the individual copyright owners to reproduce any copyrighted materials contained within this report. Suggested citation: Olson, S.A., and Brouillette, M.C., 2006, A logistic regression equation for estimating the probability of a stream in Vermont having intermittent -
Business Statistics Unit 4 Correlation and Regression.Pdf
RCUB, B.Com 4 th Semester Business Statistics – II Rani Channamma University, Belagavi B.Com – 4th Semester Business Statistics – II UNIT – 4 : CORRELATION Introduction: In today’s business world we come across many activities, which are dependent on each other. In businesses we see large number of problems involving the use of two or more variables. Identifying these variables and its dependency helps us in resolving the many problems. Many times there are problems or situations where two variables seem to move in the same direction such as both are increasing or decreasing. At times an increase in one variable is accompanied by a decline in another. For example, family income and expenditure, price of a product and its demand, advertisement expenditure and sales volume etc. If two quantities vary in such a way that movements in one are accompanied by movements in the other, then these quantities are said to be correlated. Meaning: Correlation is a statistical technique to ascertain the association or relationship between two or more variables. Correlation analysis is a statistical technique to study the degree and direction of relationship between two or more variables. A correlation coefficient is a statistical measure of the degree to which changes to the value of one variable predict change to the value of another. When the fluctuation of one variable reliably predicts a similar fluctuation in another variable, there’s often a tendency to think that means that the change in one causes the change in the other. Uses of correlations: 1. Correlation analysis helps inn deriving precisely the degree and the direction of such relationship. -
Approaching Mean-Variance Efficiency for Large Portfolios
Approaching Mean-Variance Efficiency for Large Portfolios ∗ Mengmeng Ao† Yingying Li‡ Xinghua Zheng§ First Draft: October 6, 2014 This Draft: November 11, 2017 Abstract This paper studies the large dimensional Markowitz optimization problem. Given any risk constraint level, we introduce a new approach for estimating the optimal portfolio, which is developed through a novel unconstrained regression representation of the mean-variance optimization problem, combined with high-dimensional sparse regression methods. Our estimated portfolio, under a mild sparsity assumption, asymptotically achieves mean-variance efficiency and meanwhile effectively controls the risk. To the best of our knowledge, this is the first time that these two goals can be simultaneously achieved for large portfolios. The superior properties of our approach are demonstrated via comprehensive simulation and empirical studies. Keywords: Markowitz optimization; Large portfolio selection; Unconstrained regression, LASSO; Sharpe ratio ∗Research partially supported by the RGC grants GRF16305315, GRF 16502014 and GRF 16518716 of the HKSAR, and The Fundamental Research Funds for the Central Universities 20720171073. †Wang Yanan Institute for Studies in Economics & Department of Finance, School of Economics, Xiamen University, China. [email protected] ‡Hong Kong University of Science and Technology, HKSAR. [email protected] §Hong Kong University of Science and Technology, HKSAR. [email protected] 1 1 INTRODUCTION 1.1 Markowitz Optimization Enigma The groundbreaking mean-variance portfolio theory proposed by Markowitz (1952) contin- ues to play significant roles in research and practice. The optimal mean-variance portfolio has a simple explicit expression1 that only depends on two population characteristics, the mean and the covariance matrix of asset returns. Under the ideal situation when the underlying mean and covariance matrix are known, mean-variance investors can easily compute the optimal portfolio weights based on their preferred level of risk.