Application of the Truncated Triangular and the Trapezoidal Distributions

Total Page:16

File Type:pdf, Size:1020Kb

Application of the Truncated Triangular and the Trapezoidal Distributions Section on Survey Research Methods Application of the Truncated Triangular and the Trapezoidal Distributions Jay J. Kim ORM, NCHS, 3311 Toledo rd, Hyattsville, MD, 20782 Abstract them and investigate some of their properties in this paper. It has been suggested that the truncated triangular distribution be used for masking Multiplicative noise has the following form: microdata. The random variable which follows the truncated triangular distribution serves as a yxeiiii==,1,2, n, multiplicative noise factor. The natural candidate distribution is the one which is th where y is the masked variable for the i unit centered at 1, but truncated symmetrically i around 1, because it is not desirable to use such as person, household, establishment or some values which are too close to 1 due to governmental unit, xi is the corresponding un- confidentiality concerns. Instead of truncating masked variable and ei is the noise. the middle section of the distribution, we can assign low probabilities for that part of the Since noise is generated independent of the distribution. This procedure will render some original data, x and e are independent. Thus variations of the trapezoidal distribution. Two i i other approaches are to use the upside-down triangular distribution or parabloa for the Ey()= ExEe ()(). middle section of the distribution. In this paper, we will derive the probability density Usually E(e)= 1 is used, and thus function for each of the four distributions and investigate their properties. Ey()= Ex (). KEY WORDS: confidentiality, masking, Also truncated triangular distribution 22 Vy()=++ VxVe () ()µµ Vx () Ve (). 1 Introduction ex Since around 1980, the U.S. Energy Information Thus 2 Administration (EIA) has been using Vy()− µx Ve () Vx()= 2 (1) multiplicative noise for masking the number of Ve()+ µe heating and cooling days and marginal rates for utility, such as electricity and gas in their public Since the data disseminating agencies provide use micro data file from the Residential Energy the mean and variance of the noise distribution to Consumption Survey. Evans, et al (1998) the data users, users can estimate the variance of proposed use of multiplicative noise to mask characteristics from the masked data using the economic data. They considered noise which formula in equation (1). follows distributions such as normal and truncated normal distributions. Kim and Winkler 2. Noise Distributions. (2002, 2003) considered a noise distribution following the truncated normal distribution. 2.1 Triangular Distribution Massell and Russell (2006) proposed to use a truncated triangular distribution for masking Since a truncated triangular distribution is a economic data. In this paper, we will consider modified form of a triangular distribution, we noise that follows the truncated triangular, consider a triangular distribution first. modified trapezoidal, double triangular and parabola-filled truncated triangular distributions. As the above probability density functions are not available at this moment, we will derive 2723 Section on Survey Research Methods 2.2 Truncated Triangular Distribution When noise following a triangular distribution is used, using 1 for m is not desirable. This is because multiplication by 1 does not change the original value at all. That is, if the original value is multiplied by 1 for a unit, the unit does not get e any protection from disclosure. Even worse, the a m b probability of noise being 1 is the highest in this Figure 1. Triangular Distribution type of distribution. This implies that the largest number of units do not get protection. Thus Massell and Russell suggested using noise that follows the triangular distribution whose middle Triangular distribution of e, as shown above has portion, or the part near 1, is truncated. The the following form. truncated triangular distribution mentioned above is shown in Figure 2. ⎧ 2 ea− ⎪ , aem≤< ⎪bama−− fe()= ⎨ (2) ⎪ 2 be− , meb≤< ⎩⎪babm−− Note that m is the mode of this distribution. If it is a symmetric distribution around m, m is also the mean and median of the variable e. If the distribution is symmetric around m, i.e., ma−=− bm then equation (2) reduces to e m cd ab ⎧ ea− ⎪ , aem≤< Figure 2. Truncated Triangular Distribution ⎪()ma− 2 fe()= ⎨ (3) ⎪ be− , meb≤< ⎩⎪()bm− 2 The expected value of e is Suppose the distribution is truncated at b and c , cb> , as shown above, which are around m. mab++ One of the vertices, denoted by b in equation (2), Ee()= (4) is now denoted by d. In this case, the probability 3 density function (pdf) of e has the following form: Suppose the triangular distribution is symmetric around m, i.e., ma bm or ab2 m, −=− += ⎧ 2(ea− ) then the above expectation reduces to m. ⎪ , aem≤< ⎪km()()−− a d a fe()= ⎨ (7) The variance of e is ⎪ 2(de− ) , med≤< ⎩⎪kd()()−− m d a b222−++ abam − mab() + Ve()= . (5) 18 where k needs be determined. If ma−=− bm or ab+=2 m, then the above Now variance reduces to bd2(ea−− ) 2( de ) ∫∫de+ de ()bm− 2 ackmada()()−− kdmda ()() − − Ve()= (6) 6 2724 Section on Survey Research Methods ()ba−−22 () dc symmetric triangular distribution with m=1 will = + . (8) provide an unbiased mean of the original data. kmada()()()()−− kdmda − − Example 1. In order for the function in equation (7) to become a pdf, equation (8) needs to be 1. Thus If a = 0.5, d = 1.5, m = 1, b = 0.75 and c = 1.25, then E(e) without truncation is ()()()()dmba−−+−−22 madc k = ()()()damadm−−− 1++ 0.75 1.25 Ee()== 1. 3 Using the above k, the truncated triangular distribution of e, truncated at b and c is, The expected value of e after the truncation is ⎧ 2(dm− ) Ee()= 1, since m = 1. ⎪ 22(),ea− ⎪()()()()ba−−+−− dm dc ma ⎪ for a≤< e b As shown in the formulas, they are the same. fe()= ⎨ ⎪ 2(ma− ) (),de− The variance of e of the truncated triangular ⎪()()()()ba−−+−−22 dm dc ma ⎪ distribution is, ⎩ forced≤< (9) 1 Ve()= ⎡⎤222 If the triangular distribution is symmetric around 18⎣⎦()()()()ba−−+−− dm dc ma m and the truncation is also symmetric around m, 62 62 then equation (9) can be simplified as follows: {()()()()ba−−+−− dm dc ma 22 2 +−()()()()()ba dmdc − − ma −⎣⎡ 3 ba + ⎧ ea− ⎪ , aeb≤< 2 22 ⎪()dc− 2 ++()32cd +() a + d − 422()() ba + cd +⎦⎤} (13) fe()= ⎨ (10) ⎪ de− , ced≤< ⎩⎪ ()dc− 2 If the truncation is symmetric around m and the triangular distribution is also symmetric around The expected value of e without assuming m, then symmetry of the distribution and truncation is 22 2 16mc+−++ 8 md 2( d 2 dc 3 c ) Ve()=− m (14) ()()(2)dmba−−2 ba + 12 Ee()= 22 3[(ba−−+−− ) ( dm ) ( dc ) ( ma )] Example 2. Using the data in Example 1, V(e) without truncation is .04167, but after truncation, ()()(2)madc−−2 cd + + (11) it is .1146. The variance of e after truncation is 3[(ba−−+−− )22 ( dm ) ( dc ) ( ma )] 2.75 times that of the variable e without truncation. Suppose again that the triangular distribution is symmetric around m and the mid-section of the The variance of the truncated triangular distribution is truncated symmetrically around m. distribution is greater, since the values around Then the mean, which have the highest frequencies, are not there anymore. E(e)=m (12) 2.3 Trapezoidal Distribution Thus, the mean of the symmetrically truncated triangular distribution, where the mid-section of If we connect the top two points in the truncated the distribution is symmetrically truncated, is the part of the truncated triangular distribution, we same as that of the un-truncated triangular have a trapezoidal distribution. Instead of not distribution. Consequently, the data masked by allowing noise (e) to have any positive noise following the symmetrically truncated probability between b and c, we can allow some 2725 Section on Survey Research Methods small (equal) probability. In that case, we have a Suppose the trapezoidal distribution is modified trapezoidal distribution, which is symmetric around m, then Ex()= m . shown in Figure 4. To derive the probability density function for the modified trapezoidal 2.4 Modified Trapezoidal Distribution distribution, we start with the trapezoidal distribution in Figure 3. When numbers around 1 are multiplied for masking records, it is better to avoid using 1 or numbers too close to 1. This is the reason why the truncated normal and triangular distributions are considered as candidates for a noise distribution, where truncation is done very near to 1. An alternative approach to the truncated distribution is using a distribution which allows a small probability, instead of a zero probability, in the mid-section of the noise distribution. The shape of the noise distribution then will have a variation of the trapezoidal distribution (see a b m c d Figure 4 below), where there is no connectivity Figure 3. Trapezoidal Distribution in the two highest points and the mid-section is dropped down. Let the trapezoidal distribution have the following form. ⎧ 2 k ⎪ 1 (),ea−≤< aeb ⎪()()daba−− ⎪ ⎪ ⎪ 2 k fe()=≤<⎨ 2 , b e c (15) da ⎪ − e ⎪ a b m c d ⎪ Figure 4. Modified Trapezoidal Distribution ⎪ 2 k3 (),de−≤< ced ⎩⎪()()dadc−− Depending on values of k’s, the height of top line k changes. Suppose kkk13== and k2 = . Then q da− If f (b) = f (c), kkk=== , 123dcba ⎧ 2kea− +−− ⎪ , aeb≤< ⎪daba−− then the distribution has the following form: ⎪ 2k fe()=≤<⎨ , b e c (17) ⎪ ()daq− ⎧ 2 ea− ⎪ ⎪ , aeb≤< 2kde− dcbaba+−− − ⎪ , ced≤< ⎪ ⎩dadc−− ⎪ 2 fe()=≤<⎨ , b e c (16) dcba ⎪ +−− 2k b ⎪ 2 de− ∫∫f() e de=− ( e a ) de + ⎪ , ced≤< ()()daba−−a ⎩dcbadc+−− − Ω 22kkcd The above is the same form as given by van ∫∫1()de+− d e de (18) Dorp and Kotz. qd()−−− abc ()() d a d c 2726 Section on Survey Research Methods The equation above needs to sum up to 1 in order for the function in equation (17) to be a probability density function.
Recommended publications
  • 2.1 Introduction Civil Engineering Systems Deal with Variable Quantities, Which Are Described in Terms of Random Variables and Random Processes
    2.1 Introduction Civil Engineering systems deal with variable quantities, which are described in terms of random variables and random processes. Once the system design such as a design of building is identified in terms of providing a safe and economical building, variable resistances for different members such as columns, beams and foundations which can sustain variable loads can be analyzed within the framework of probability theory. The probabilistic description of variable phenomena needs the use of appropriate measures. In this module, various measures of description of variability are presented. 2. HISTOGRAM AND FREQUENCY DIAGRAM Five graphical methods for analyzing variability are: 1. Histograms, 2. Frequency plots, 3. Frequency density plots, 4. Cumulative frequency plots and 5. Scatter plots. 2.1. Histograms A histogram is obtained by dividing the data range into bins, and then counting the number of values in each bin. The unit weight data are divided into 4- kg/m3 wide intervals from to 2082 in Table 1. For example, there are zero values between 1441.66 and 1505 Kg / m3 (Table 1), two values between 1505.74 kg/m3 and 1569.81 kg /m3 etc. A bar-chart plot of the number of occurrences in each interval is called a histogram. The histogram for unit weight is shown on fig.1. Table 1 Total Unit Weight Data from Offshore Boring Total unit Total unit 2 3 Depth weight (x-βx) (x-βx) Depth weight Number (m) (Kg / m3) (Kg / m3)2 (Kg / m3)3 (m) (Kg / m3) 1 0.15 1681.94 115.33 -310.76 52.43 1521.75 2 0.30 1906.20 2050.36 23189.92 2.29 1537.77
    [Show full text]
  • Stat 345 - Homework 3 Chapter 4 -Continuous Random Variables
    STAT 345 - HOMEWORK 3 CHAPTER 4 -CONTINUOUS RANDOM VARIABLES Problem One - Trapezoidal Distribution Consider the following probability density function f(x). 8 > x+1 ; −1 ≤ x < 0 > 5 > <> 1 ; 0 ≤ x < 4 f(x) = 5 > 5−x ; 4 ≤ x ≤ 5 > 5 > :>0; otherwise a) Sketch the PDF. b) Show that the area under the curve is equal to 1 using geometry. c) Show that the area under the curve is equal to 1 by integrating. (You will have to split the integral into pieces). d) Find P (X < 3) using geometry. e) Find P (X < 3) with integration. Problem Two - Probability Density Function Consider the following function f(x), where θ > 0. 8 < c (θ − x); 0 < x < θ f(x) = θ2 :0; otherwise a) Find the value of c so that f(x) is a valid PDF. Sketch the PDF. b) Find the mean and standard deviation of X. p c) If nd , 2 and . θ = 2 P (X > 1) P (X > 3 ) P (X > 2 − 2) c) Compute θ 3θ . P ( 10 < X < 5 ) Problem Three - Cumulative Distribution Functions For each of the following probability density functions, nd the CDF. The Median (M) of a distribution is dened by the property: 1 . Use the CDF to nd the Median of each distribution. F (M) = P (X ≤ M) = 2 a) (Special case of the Beta Distribution) 8 p <1:5 x; 0 < x < 1 f(x) = :0; otherwise b) (Special case of the Pareto Distribution) 8 < 1 ; x > 1 f(x) = x2 :0; otherwise c) Take the derivative of your CDF for part b) and show that you can get back the PDF.
    [Show full text]
  • STAT 345 Spring 2018 Homework 5 - Continuous Random Variables Name
    STAT 345 Spring 2018 Homework 5 - Continuous Random Variables Name: Please adhere to the homework rules as given in the Syllabus. 1. Trapezoidal Distribution. Consider the following probability density function. 8 x+1 ; −1 ≤ x < 0 > 5 > 1 < 5 ; 0 ≤ x < 4 f(x) = 5−x > ; 4 ≤ x ≤ 5 > 5 :>0; otherwise a) Sketch the PDF. b) Show that the area under the curve is equal to 1 using geometry. c) Show that the area under the curve is equal to 1 by integrating. (You will have to split the integral into peices). d) Find P (X < 3) using geometry. e) Find P (X < 3) with integration. 2. Let X have the following PDF, for θ > 0. ( c (θ − x); 0 < x < θ f(x) = θ2 0; otherwise a) Find the value of c which makes f(x) a valid PDF. Sketch the PDF. b) Find the mean and standard deviation of X. θ 3θ c) Find P 10 < X < 5 . 3. PDF to CDF. For each of the following PDFs, find and sketch the CDF. The Median (M) of a continuous distribution is defined by the property F (M) = 0:5. Use the CDF to find the Median of each distribution. a) (Special case of Beta Distribution) p f(x) = 1:5 x; 0 < x < 1 b) (Special case of Pareto Distribution) 1 f(x) = ; x > 1 x2 4. CDF to PDF. For each of the following CDFs, find the PDF. a) 8 0; x < 0 <> F (x) = 1 − (1 − x2)2; 0 ≤ x < 1 :>1; x ≥ 1 b) ( 0; x < 0 F (x) = 1 − e−x2 ; x ≥ 0 5.
    [Show full text]
  • Hand-Book on STATISTICAL DISTRIBUTIONS for Experimentalists
    Internal Report SUF–PFY/96–01 Stockholm, 11 December 1996 1st revision, 31 October 1998 last modification 10 September 2007 Hand-book on STATISTICAL DISTRIBUTIONS for experimentalists by Christian Walck Particle Physics Group Fysikum University of Stockholm (e-mail: [email protected]) Contents 1 Introduction 1 1.1 Random Number Generation .............................. 1 2 Probability Density Functions 3 2.1 Introduction ........................................ 3 2.2 Moments ......................................... 3 2.2.1 Errors of Moments ................................ 4 2.3 Characteristic Function ................................. 4 2.4 Probability Generating Function ............................ 5 2.5 Cumulants ......................................... 6 2.6 Random Number Generation .............................. 7 2.6.1 Cumulative Technique .............................. 7 2.6.2 Accept-Reject technique ............................. 7 2.6.3 Composition Techniques ............................. 8 2.7 Multivariate Distributions ................................ 9 2.7.1 Multivariate Moments .............................. 9 2.7.2 Errors of Bivariate Moments .......................... 9 2.7.3 Joint Characteristic Function .......................... 10 2.7.4 Random Number Generation .......................... 11 3 Bernoulli Distribution 12 3.1 Introduction ........................................ 12 3.2 Relation to Other Distributions ............................. 12 4 Beta distribution 13 4.1 Introduction .......................................
    [Show full text]
  • MARLAP Manual Volume III: Chapter 19, Measurement Uncertainty
    19 MEASUREMENT UNCERTAINTY 19.1 Overview This chapter discusses the evaluation and reporting of measurement uncertainty. Laboratory measurements always involve uncertainty, which must be considered when analytical results are used as part of a basis for making decisions.1 Every measured result reported by a laboratory should be accompanied by an explicit uncertainty estimate. One purpose of this chapter is to give users of radioanalytical data an understanding of the causes of measurement uncertainty and of the meaning of uncertainty statements in laboratory reports. The chapter also describes proce- dures which laboratory personnel use to estimate uncertainties. This chapter has more than one intended audience. Not all readers are expected to have the mathematical skills necessary to read and completely understand the entire chapter. For this reason the material is arranged so that general information is presented first and the more tech- nical information, which is intended primarily for laboratory personnel with the required mathe- matical skills, is presented last. The general discussion in Sections 19.2 and 19.3 requires little previous knowledge of statistical metrology on the part of the reader and involves no mathe- matical formulas; however, if the reader is unfamiliar with the fundamental concepts and terms of probability and statistics, he or she should read Attachment 19A before starting Section 19.3. The technical discussion in Sections 19.4 and 19.5 requires an understanding of basic algebra and at least some familiarity with the fundamental concepts of probability and statistics. The discus- sion of uncertainty propagation requires knowledge of differential calculus for a com- Contents plete understanding.
    [Show full text]
  • Generating Random Variates
    CHAPTER 8 Generating Random Variates 8.1 Introduction................................................................................................2 8.2 General Approaches to Generating Random Variates ....................................3 8.2.1 Inverse Transform...............................................................................................................3 8.2.2 Composition.....................................................................................................................11 8.2.3 Convolution......................................................................................................................16 8.2.4 Acceptance-Rejection.......................................................................................................18 8.2.5 Special Properties.............................................................................................................26 8.3 Generating Continuous Random Variates....................................................27 8.4 Generating Discrete Random Variates.........................................................27 8.4.3 Arbitrary Discrete Distribution...........................................................................................28 8.5 Generating Random Vectors, Correlated Random Variates, and Stochastic Processes ........................................................................................................34 8.5.1 Using Conditional Distributions ..........................................................................................35
    [Show full text]
  • Tumor Detection Using Trapezoidal Function and Rayleigh Distribution
    IOSR Journal of Electronics and Communication Engineering (IOSR-JECE) e-ISSN: 2278-2834,p- ISSN: 2278-8735.Volume 12, Issue 3, Ver. IV (May - June 2017), PP 27-32 www.iosrjournals.org Tumor Detection Using Trapezoidal Function and Rayleigh Distribution. *Mohammad Aslam. C1 Satya Narayana. D2 Padma Priya. K3 1 Vaagdevi Institute of Technology and Science, Proddatur, A.P 2 Rajiv Gandhi Memorial College Of Engineering And Technology, Nandyal, A.P 3jntuk College of Engineering, Kakinada, A.P Corresponding Author: Mohammad Aslam. C1 Abstract: Breast cancer is one of the most common cancers worldwide. In developed countries, among one in eight women develop breast cancer at some stage of their life. Early diagnosis of breast cancer plays a very important role in treatment of the disease. With the goal of identifying genes that are more correlated with the prognosis of breast cancer. An algorithm based on Trapezoidal and Rayleigh distributions. Property is used for the detection of the cancer tumors. Fuzzy logic and markov random fields work in real time. We also try to show its performance by comparing it with the existing real time algorithm that uses only markov random fields. This is not only suitable for the detection of the tumors, but also suitable for various general applications like in the detection of object in an image etc. Keywords: Tumors, Rayleigh distribution texture, Segmentation, Fuzzy Logic. --------------------------------------------------------------------------------------------------------------------------------------- Date of Submission: 19-06-2017 Date of acceptance: 26-07-2017 ----------------------------------------------------------------------------------------------------------------------------- ---------- I. Introduction Breast Cancer is one of the main sources of malignancy related passings in ladies. There are accessible distinctive strategies for the discovery of malignancy tumors, these techniques are utilized for the finding of disease in ladies at early stages.
    [Show full text]
  • Determining the 95% Confidence Interval of Arbitrary Non-Gaussian Probability Distributions
    XIX IMEKO World Congress Fundamental and Applied Metrology September 6−11, 2009, Lisbon, Portugal DETERMINING THE 95% CONFIDENCE INTERVAL OF ARBITRARY NON-GAUSSIAN PROBABILITY DISTRIBUTIONS France Pavlovčič, Janez Nastran, David Nedeljković University of Ljubljana, Faculty of Electrical Engineering, Ljubljana, Slovenia E-mail: [email protected], [email protected], [email protected] Abstract − Measurements are nowadays permanent at- drift, the time drift, and the resolution, all have the rectan- tendants of scientific research, health, medical care and gular distribution. Sometimes also the A-type uncertainties treatment, industrial development, safety and even global do not match with the normal distribution as they are result economy. All of them depend on accurate measurements of regulated quantities, or are affected by the resolution of and tests, and many of these fields are under the legal me- measuring device. In the latter case the probability distribu- trology because of their severity. How the measurement is tion consists of two Dirac functions, one at the lower meas- accurate, is expressed by uncertainty, which is obtained by urement reading and the other at the upper measurement multiplication of standard deviations by coverage factors to reading regarding the resolution. If the distribution is un- increase trustfulness in the measured results. These coverage known the coverage factor is calculated from Chebyshev's factors depend on degree of freedom, which is the function inequality [2] and is 4.472 for the 95% confidence interval. of the number of implied repetitions of measurements, and The uncertainty, the extended or the standard one, does not therefore the reliability of the results is increased.
    [Show full text]
  • Generalized Trapezoidal Distributions
    1 Generalized Trapezoidal Distributions J. René van Dorp, Samuel Kotz Department of Engineering Management & Systems Engineering, The George Washington University, 1776 G Street, N.W., Washington D.C. 20052 (e-mail: [email protected]). Submitted April 24, 2002; Revised, March 2003 Abstract. We present a construction and basic properties of a class of continuous distributions of an arbitrary form defined on a compact (bounded) set by concatenating in a continuous manner three probability density functions with bounded support using a modified mixture technique. These three distributions may represent growth, stability and decline stages of a physical or mental phenomenon. Keywords: Mixtures of Distributions, Risk AnalysisÄ Applied Physics 1. INTRODUCTION Trapezoidal distributions have been advocated in risk analysis problems by Pouliquen (1970) and more recently by Powell and Wilson (1997). They have also found application as membership functions in fuzzy set theory (see, e.g. Chen and Hwang (1992)). Our interest in trapezoidal distributions and their modifications stems mainly from the conviction that many physical processes in nature and human body and mind (over time) reflect the form of the trapezoidal distribution. In this context, trapezoidal distributions have been used in the screening and detection of cancer (see, e.g. Flehinger and Kimmel, (1987) and Brown (1999)). Trapezoidal distributions seem to be appropriate for modeling the duration and the form of a phenomenon which may be represented by three stages. The first stage can be To appear in Metrika, Vol. 58, Issue 1, July 2003 2 viewed as a growth-stage, the second corresponds to a relative stability and the third represents a decline (decay).
    [Show full text]
  • RCEM – Reclamation Consequence Estimating Methodology Interim
    RCEM – Reclamation Consequence Estimating Methodology Interim Examples of Use U.S. Department of the Interior Bureau of Reclamation June 2014 Mission Statements The U.S. Department of the Interior protects America’s natural resources and heritage, honors our cultures and tribal communities, and supplies the energy to power our future. The mission of the Bureau of Reclamation is to manage, develop, and protect water and related resources in an environmentally and economically sound manner in the interest of the American public. Interim RCEM – Reclamation Consequence Estimating Methodology Examples of Use U.S. Department of the Interior Bureau of Reclamation June 2014 RCEM – Examples of Use Interim Table of Contents Introduction ................................................................................................................... 1 General Usage ........................................................................................................... 1 Description of Examples ........................................................................................... 1 Format of Examples .................................................................................................. 3 Graphical Approach ...................................................................................................... 3 Uncertainty and Variability ........................................................................................... 8 Examples ......................................................................................................................
    [Show full text]
  • Selecting and Applying Error Distributions in Uncertainty Analysis1
    SELECTING AND APPLYING ERROR DISTRIBUTIONS 1 IN UNCERTAINTY ANALYSIS Dr. Howard Castrup Integrated Sciences Group 14608 Casitas Canyon Road Bakersfield, CA 93306 1-661-872-1683 [email protected] ABSTRACT Errors or biases from a number of sources may be encountered in performing a measurement. Such sources include random error, measuring parameter bias, measuring parameter resolution, operator bias, environmental factors, etc. Uncertainties due to these errors are estimated either by computing a Type A standard deviation from a sample of measurements or by forming a Type B estimate based on experience. In this paper, guidelines are given for selecting and applying various statistical error distribution that have been found useful for Type A and Type B analyses. Both traditional and Monte Carlo methods are outlined. Also discussed is the estimation of the degrees of freedom for both Type A or Type B estimates. Where appropriate, concepts will be illustrated using spreadsheets, freeware and commercially available software. INTRODUCTION THE GENERAL ERROR MODEL The measurement error model employed in this paper is given in the expression x = xtrue + (1) where x = a value obtained by measurement xtrue = the true value of the measured quantity = the measurement error. With univariate direct measurements, the variable is composed of a linear sum of errors from various measurement error sources. With multivariate measurements consists of a weighted sum of error components, each of which is make up of errors from measurement error sources. More will be said on this later. APPLICABLE AXIOMS In order to use Eq. (1) to estimate measurement uncertainty, we invoke three axioms [HC95a]: Axiom 1 Measurement errors are random variables that follow statistical distributions.
    [Show full text]
  • Uncertainty-Aware Principal Component Analysis
    Uncertainty-Aware Principal Component Analysis Jochen Gortler,¨ Thilo Spinner, Dirk Streeb, Daniel Weiskopf, and Oliver Deussen 0 0 0 0 0 0 0 0 0 0 0 0 (a) Regular PCA (b) s = 0.5 (c) s = 0.8 (d) s = 1.0 Fig. 1. Data uncertainty can have a significant influence on the outcome of dimensionality reduction techniques. We propose a generalization of principal component analysis (PCA) that takes into account the uncertainty in the input. The top row shows the dataset with varying degrees of uncertainty and the corresponding principal components, whereas the bottom row shows the projection of the dataset, using our method, onto the first principal component. In Figures (a) and (b), with relatively low uncertainty, the blue and the orange distributions are comprised by the red and the green distributions. In Figures (c) and (d), with a larger amount of uncertainty, the projection changes drastically: now the orange and blue distributions encompass the red and the green distributions. Abstract—We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the PCA sample covariance matrix that respects potential uncertainty in each of the inputs, building the mathematical foundation of our new method: uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data.
    [Show full text]