Probability Distribution

Prof. (Dr.) Rajib Kumar Bhattacharjya
Indian Institute of Technology Guwahati, Guwahati, Assam
Email: [email protected]  Web: www.iitg.ernet.in/rkbc
Visiting Faculty, NIT Meghalaya

Probabilistic Approach

A random variable X is a variable described by a probability distribution. The distribution specifies the chance that an observation x of the variable will fall in a specified range of X. A set of observations x_1, x_2, x_3, ..., x_n of the random variable is called a sample. Samples are assumed to be drawn from a hypothetical infinite population possessing constant statistical properties; the properties of one sample may differ from those of another.

The probability P(A) of an event is the chance that it will occur when an observation of the random variable is made:

P(A) = \lim_{n \to \infty} \frac{n_A}{n}

where n_A is the number of observations falling in the range of event A and n is the total number of observations.

1. Total probability: P(A_1) + P(A_2) + P(A_3) + \dots + P(A_n) = P(\Omega) = 1
2. Complementarity: P(\bar{A}) = 1 - P(A)
3. Conditional probability: P(B \mid A) = \frac{P(A \cap B)}{P(A)}, the probability that B occurs given that A has already occurred.

Joint probability of independent events: P(A \cap B) = P(A)\,P(B)

Example: The probability that annual precipitation will be less than 120 mm is 0.333. What is the probability of two successive years with precipitation less than 120 mm?

P(R < 120) = 0.333, so P(C) = 0.333^2 = 0.111.

[Figure: frequency histogram of annual precipitation by year.]

Relative frequency:

f_s(x_i) = \frac{n_i}{n}

which is equivalent to P(x_i - \Delta x \le X \le x_i), the probability that the random variable X will lie in the interval [x_i - \Delta x, x_i].

Cumulative frequency function:

F_s(x_i) = \sum_{j=1}^{i} f_s(x_j)

This estimates P(X \le x_i), the cumulative probability of x_i. These functions are estimated from sample data; the corresponding functions for the population are:

Probability density function:

f(x) = \lim_{n \to \infty,\ \Delta x \to 0} \frac{f_s(x)}{\Delta x}

Probability distribution function:

F(x) = \lim_{n \to \infty,\ \Delta x \to 0} F_s(x)

whose derivative is the probability density function:

f(x) = \frac{dF(x)}{dx}

The cumulative probability P(X \le x) can be expressed as

P(X \le x) = F(x) = \int_{-\infty}^{x} f(u)\,du

so that

P(x_i) = P(x_i - \Delta x \le X \le x_i)
       = \int_{x_i - \Delta x}^{x_i} f(x)\,dx
       = \int_{-\infty}^{x_i} f(x)\,dx - \int_{-\infty}^{x_i - \Delta x} f(x)\,dx
       = F(x_i) - F(x_i - \Delta x)
       = F(x_i) - F(x_{i-1})

[Figure: sample relative-frequency and cumulative-frequency functions, with the corresponding population density function f(x) and distribution function F(x).]

Normal probability density function:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left[-\frac{(x - \mu)^2}{2\sigma^2}\right]

Standardizing with z = \frac{x - \mu}{\sigma} gives

f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \qquad -\infty \le z \le \infty

Cumulative probability of the standard normal distribution:

F(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} e^{-u^2/2}\,du

It can be approximated as F(z) = B for z < 0 and F(z) = 1 - B for z \ge 0, where

B = \frac{1}{2}\left[1 + 0.196854\,|z| + 0.115194\,|z|^2 + 0.000344\,|z|^3 + 0.019527\,|z|^4\right]^{-4}

Ex. What is the probability that the standard normal random variable z will be less than -2? Less than 1? What is P(-2 < Z < 1)?

Solution:
P(Z < -2) = F(-2) = 1 - F(2) = 1 - 0.9772 = 0.0228
P(Z < 1) = F(1) = 0.8413
P(-2 < Z < 1) = F(1) - F(-2) = 0.8413 - 0.0228 = 0.8185 ≈ 0.818
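The worked values above can be reproduced numerically. The following is a minimal Python sketch (not part of the original slides) that evaluates the standard normal cumulative probability exactly through the error function rather than the polynomial approximation for B; the function name std_normal_cdf is illustrative:

```python
import math

def std_normal_cdf(z: float) -> float:
    """Cumulative probability F(z) of the standard normal distribution,
    computed exactly as F(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Worked example from the slides: P(Z < -2), P(Z < 1), P(-2 < Z < 1)
p_lt_minus2 = std_normal_cdf(-2.0)   # ~0.0228
p_lt_1 = std_normal_cdf(1.0)         # ~0.8413
print(f"P(Z < -2)     = {p_lt_minus2:.4f}")
print(f"P(Z < 1)      = {p_lt_1:.4f}")
print(f"P(-2 < Z < 1) = {p_lt_1 - p_lt_minus2:.4f}")  # ~0.8185
```

The same function also gives F(1.5) ≈ 0.9332 and F(0.8) − F(−1.2) ≈ 0.673, the values used in Example 2 below.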
Example 2: The annual runoff of a stream is modeled by a normal distribution with mean 5000 ha-m and standard deviation 1000 ha-m.
i. Find the probability that the annual runoff in any year is more than 6500 ha-m.
ii. Find the probability that it is between 3800 and 5800 ha-m.

Solution: Let X be the random variable denoting the annual runoff. Then z is given by

z = \frac{X - 5000}{1000} \sim N(0, 1)

(i) P(X \ge 6500) = P\left(z \ge \frac{6500 - 5000}{1000}\right) = P(z \ge 1.5) = 1 - P(z \le 1.5) = 1 - F(1.5)

With F(1.5) = 0.9332,

P(X \ge 6500) = 1 - 0.9332 = 0.0668

(ii) P(3800 \le X \le 5800) = P\left(\frac{3800 - 5000}{1000} \le z \le \frac{5800 - 5000}{1000}\right) = P(-1.2 \le z \le 0.8)
    = F(0.8) - F(-1.2) = F(0.8) - [1 - F(1.2)] = 0.7881 - (1 - 0.8849) = 0.673

Statistical parameters

Expected value:

E[x] = \mu = \int_{-\infty}^{\infty} x f(x)\,dx

It is the first moment about the origin of the random variable, a measure of the midpoint or central tendency of the distribution. The sample estimate of the mean is the average

\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

The variability of the data is measured by the variance \sigma^2:

E[(x - \mu)^2] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx

The sample estimate of the variance is

s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2

The asymmetry of the distribution about the mean is measured by the skewness, the third moment about the mean:

E[(x - \mu)^3] = \int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx

The coefficient of skewness \gamma is defined as

\gamma = \frac{1}{\sigma^3} E[(x - \mu)^3]

A sample estimate of \gamma is given by

C_s = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)(n - 2)\,s^3}

Fitting a probability distribution

A probability distribution is a function representing the probability of occurrence of a random variable. By fitting a distribution function, we can extract the probabilistic information of the random variable. Fitting can be carried out by the method of moments or the method of maximum likelihood.

Method of moments

Developed by Karl Pearson in 1902. (Karl Pearson, 27 March 1857 – 27 April 1936, was an English mathematician.) He considered that good estimates of the parameters of a probability distribution are those for which the moments of the probability density function about the origin are equal to the corresponding moments of the sample data. The moments are

E[x] = \mu = \int_{-\infty}^{\infty} x f(x)\,dx = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

E[(x - \mu)^2] = \sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2

E[(x - \mu)^3] = \int_{-\infty}^{\infty} (x - \mu)^3 f(x)\,dx = \frac{n \sum_{i=1}^{n} (x_i - \bar{x})^3}{(n - 1)(n - 2)}

Method of maximum likelihood

Developed by R. A. Fisher in 1922. (Sir Ronald Aylmer Fisher, 17 February 1890 – 29 July 1962, known as R. A. Fisher, was an English statistician, evolutionary biologist, mathematician, geneticist, and eugenicist.) He reasoned that the best value of a parameter of a probability distribution is the value that maximizes the likelihood, i.e. the joint probability of occurrence of the observed sample.

Suppose that the sample space is divided into intervals of length dx and that a sample of independent and identically distributed observations x_1, x_2, x_3, ..., x_n is taken. The value of the probability density at X = x_i is f(x_i), and the probability that the random variable falls in the interval containing x_i is f(x_i)\,dx. Since the observations are independent, their joint probability of occurrence is

f(x_1)\,dx \; f(x_2)\,dx \; f(x_3)\,dx \cdots f(x_n)\,dx = \left[\prod_{i=1}^{n} f(x_i)\right] dx^n

Since dx is fixed, it is sufficient to maximize

\max\; L = \prod_{i=1}^{n} f(x_i), \qquad \max\; \ln L = \sum_{i=1}^{n} \ln f(x_i)
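To make the two fitting ideas concrete, here is a short Python sketch, assuming a small hypothetical data set; the helper names sample_moments and log_likelihood are invented for this illustration. It computes the sample moments used by the method of moments and evaluates ln L for a candidate density:

```python
import math

def sample_moments(x):
    """Sample estimates used by the method of moments: mean, variance
    (n - 1 denominator), and coefficient of skewness C_s."""
    n = len(x)
    mean = sum(x) / n
    var = sum((xi - mean) ** 2 for xi in x) / (n - 1)
    s = math.sqrt(var)
    cs = n * sum((xi - mean) ** 3 for xi in x) / ((n - 1) * (n - 2) * s ** 3)
    return mean, var, cs

def log_likelihood(x, pdf):
    """ln L = sum over the sample of ln f(x_i) for a candidate density f."""
    return sum(math.log(pdf(xi)) for xi in x)

# Hypothetical annual runoff sample (ha-m), for illustration only
data = [4100.0, 5200.0, 6300.0, 4800.0, 5500.0, 4700.0, 6100.0, 5000.0]
mean, var, cs = sample_moments(data)
s = math.sqrt(var)
print(f"mean = {mean:.1f}, variance = {var:.1f}, Cs = {cs:.3f}")

def normal_pdf(xi):
    """Normal density with parameters set by the method of moments."""
    return math.exp(-0.5 * ((xi - mean) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))

print(f"ln L (normal, moment fit) = {log_likelihood(data, normal_pdf):.2f}")
```

A moment fit simply equates \mu and \sigma^2 with the sample values printed above; a maximum-likelihood fit would instead search for the parameter values that maximize ln L.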
Example: The exponential distribution can be used to describe various kinds of hydrological data, such as interarrival times of rainfall events. Its probability density function is f(x) = \lambda e^{-\lambda x} for x > 0. Determine the relationship between the parameter \lambda and the first moment about the origin, \mu.

Solution:

Method of moments:

\mu = E[x] = \int_{-\infty}^{\infty} x f(x)\,dx = \int_{0}^{\infty} x \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}

Method of maximum likelihood:

\ln L = \sum_{i=1}^{n} \ln f(x_i) = \sum_{i=1}^{n} \ln\left(\lambda e^{-\lambda x_i}\right) = \sum_{i=1}^{n} (\ln\lambda - \lambda x_i) = n \ln\lambda - \lambda \sum_{i=1}^{n} x_i

\frac{\partial \ln L}{\partial \lambda} = \frac{n}{\lambda} - \sum_{i=1}^{n} x_i = 0 \;\Rightarrow\; \frac{1}{\lambda} = \frac{1}{n} \sum_{i=1}^{n} x_i = \bar{x} \;\Rightarrow\; \lambda = \frac{1}{\bar{x}}

Both methods therefore give the same relationship, \mu = 1/\lambda, estimated from the sample mean.

Commonly used distribution families:
• Normal family: normal, lognormal, lognormal-III
• Generalized extreme value family: EV1 (Gumbel), GEV, and EVIII (Weibull)
• Exponential/Pearson type family: exponential, Pearson type III, log-Pearson type III

Normal distribution:

f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \qquad -\infty \le x \le +\infty

[Figure: normal density curves for several values of \sigma (with \mu fixed) and several values of \mu (with \sigma fixed).]

Log-normal distribution:

f(y) = \frac{1}{\sigma_y\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{y - \mu_y}{\sigma_y}\right)^2}, \qquad x > 0,\; y = \ln(x)

Extreme value (EV) distributions
• Extreme values are the maximum or minimum values of sets of data, e.g. annual maximum discharge or annual minimum discharge.
• When the number of selected extreme values is large, their distribution converges to one of the three forms of EV distributions, called Type I, II and III.

EV type I (Gumbel) distribution

If M_1, M_2, ..., M_n is a set of daily rainfall or streamflow values and X = max(M_i) is the maximum for the year, and if the M_i are independent and identically distributed, then for large n, X has an extreme value type I or Gumbel distribution:

f(x) = \frac{1}{\beta} e^{-\left[y + e^{-y}\right]}, \qquad y = \frac{x - \mu}{\beta}

with parameters estimated by

\beta = \frac{\sqrt{6}\, s_x}{\pi}, \qquad \mu = \bar{x} - 0.5772\,\beta

The distribution of annual maximum streamflow follows an EV1 distribution.

[Figure: EV1 density curves for several parameter values.]

EV type III distribution

If W_i are the minimum streamflows on different days of the year, let X = min(W_i) be the smallest.
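As a worked sketch of the EV1 (Gumbel) fitting formulas above, the following Python example, using hypothetical annual maximum discharges, estimates \mu and \beta by the method of moments and evaluates an exceedance probability. It assumes the standard Gumbel cumulative form F(x) = exp(-e^{-y}), which corresponds to the density f(x) given above:

```python
import math

def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates of the EV1 (Gumbel) parameters:
    beta = sqrt(6) * s_x / pi and mu = mean - 0.5772 * beta."""
    n = len(annual_maxima)
    xbar = sum(annual_maxima) / n
    s_x = math.sqrt(sum((x - xbar) ** 2 for x in annual_maxima) / (n - 1))
    beta = math.sqrt(6.0) * s_x / math.pi
    mu = xbar - 0.5772 * beta
    return mu, beta

def gumbel_cdf(x, mu, beta):
    """Non-exceedance probability F(x) = exp(-exp(-y)) with y = (x - mu) / beta."""
    y = (x - mu) / beta
    return math.exp(-math.exp(-y))

# Hypothetical annual maximum discharges (m^3/s), for illustration only
peaks = [1200.0, 980.0, 1450.0, 1100.0, 1700.0, 1320.0, 900.0, 1550.0, 1250.0, 1400.0]
mu, beta = fit_gumbel_moments(peaks)
q = 1800.0
p_exceed = 1.0 - gumbel_cdf(q, mu, beta)
print(f"mu = {mu:.1f}, beta = {beta:.1f}")
print(f"P(annual maximum > {q:.0f}) = {p_exceed:.3f}; return period = {1.0 / p_exceed:.1f} years")
```

The reciprocal of the exceedance probability is the return period of the chosen discharge.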