The Probability Distribution Functions of Emission Line Flux Measurements
Mon. Not. R. Astron. Soc. 000, 000–000 (0000) Printed 6 April 2016 (MN LaTeX style file v2.2)

The probability distribution functions of emission line flux measurements and their ratios

R. Wesson1, D.J. Stock2, P. Scicluna3
1 European Southern Observatory, Alonso de Córdova 3107, Casilla 19001, Santiago, Chile
2 Department of Physics and Astronomy, University of Western Ontario, London, ON N6K 3K7, Canada
3 Kiel University, Institute of Theoretical Physics and Astrophysics, Leibnizstr. 15, 24118 Kiel, Germany

Received:

ABSTRACT

Many physical parameters in astrophysics are derived using the ratios of two observed quantities. If the relative uncertainties on measurements are small enough, uncertainties can be propagated analytically using simplifying assumptions, but for large normally distributed uncertainties, the probability distribution of the ratio becomes skewed, with a modal value offset from that expected in Gaussian uncertainty propagation. Furthermore, the most likely value of a ratio A/B is not equal to the reciprocal of the most likely value of B/A. The effect is most pronounced when the uncertainty on the denominator is larger than that on the numerator. We show that this effect is seen in an analysis of 12,126 spectra from the Sloan Digital Sky Survey. The intrinsically fixed ratio of the [O III] lines at 4959 and 5007 Å is conventionally expressed as the ratio of the stronger line to the weaker line. Thus, the uncertainty on the denominator is larger, and non-Gaussian probability distributions result. By taking this effect into account, we derive an improved estimate of the intrinsic 5007/4959 ratio. We obtain a value of 3.012 ± 0.008, which is slightly but statistically significantly higher than the theoretical value of 2.98. We further investigate the suggestion that fluxes measured from emission lines at low signal to noise are strongly biased upwards.
We were unable to detect this effect in the SDSS line flux measurements, and we could not reproduce the results of Rola and Pelat, who first described this bias. We suggest that the magnitude of this effect may depend strongly on the specific fitting algorithm used.

Key words: atomic data – methods: data analysis – methods: statistical

arXiv:1604.01205v1 [astro-ph.IM] 5 Apr 2016

1 INTRODUCTION

A great deal of information in the physical sciences is derived from the ratios of observed quantities. The measurements of quantities are always associated with an uncertainty, and the uncertainty should of course be propagated into the resulting ratio. When doing so, if the fractional uncertainty is small, one can use truncated Taylor expansions to derive approximate expressions for the uncertainties on derived quantities. However, when dealing with real data, the fractional uncertainty is often not small, and biases may result from any invalid approximations made in propagating the uncertainties.

We outline in this paper some properties of the probability distributions of ratios which can significantly affect the interpretation of results when the signal to noise ratio of measurements is relatively low. We first describe a number of mathematical axioms, and the biases which result from them, and then we show that these biases can be detected in observational data. We derive an improved value of the intrinsic [O III] 4959/5007 ratio by ensuring that the biases described are minimised. We then consider whether line flux measurements at low signal to noise ratios are strongly biased upwards, as has previously been suggested. Finally we discuss the circumstances in which these biases may lead to erroneous conclusions.

2 THE PROBABILITY DISTRIBUTION OF THE RATIO OF GAUSSIANS

If two quantities X and Y have independent Gaussian probability density functions, both with mean zero and variance of unity, then the probability distribution of their ratio X/Y, f(X/Y), is a Lorentz distribution (Marsaglia 1964, Hinkley 1969, Marsaglia 2006):

\[ f(X/Y) \propto \frac{1}{1 + x^{2}} \tag{1} \]

where x denotes the value of the ratio X/Y.

The ratio of two non-zero quantities with Gaussian probability distributions has a probability density function which is Lorentz-like, but cannot be expressed in closed form (Hinkley 1969). It can be approximated by a Gaussian distribution when the fractional uncertainty is small, but becomes increasingly skewed as the fractional uncertainties increase. The mean of this ratio distribution is mathematically undefined and the mode of the distribution is not equal to the ratio of the modes of the two quantities. Also, for two quantities X and Y, the mode of the probability distribution of X/Y is not the same as the reciprocal of the mode of the distribution of Y/X.

We have carried out Monte Carlo analyses to illustrate these axioms. Firstly, we drew pairs of independent random numbers from a Gaussian distribution with mean of zero and variance of unity and took their ratio. Figure 1 shows the distribution of 1,000,000 such ratios, together with a Lorentz distribution having γ = 1.0 and x0 = 0.

Figure 1. The distribution of the ratios of two normally distributed variables is a Lorentz distribution.

Figure 2 shows the results of three further Monte Carlo simulations, this time for the ratios of variables with normal distributions and non-zero means. We drew random numbers from Gaussian distributions with means of 30 and 10; 12 and 4; and 9 and 3. The standard deviations of the distributions were 1.7 and 1.0 in each case to simulate the common case of Poissonian noise. Thus, the ratio of the quantities is 3.0 but the fractional uncertainty varies. The figure shows the probability distributions scaled such that the peak is at unity in each case; it can be seen that when the fractional uncertainty is small, the mode of the probability distribution of the line ratio is very close to 3.0, and the distribution is very close to Gaussian. But as the fractional uncertainty increases, the skew of the distribution increases and the mode is offset to lower values. For the case where the signal to noise ratios of the two values are 6 and 2, the mode of the resulting probability distribution is 2.3.

Figure 2. Scaled probability density functions for the ratios of non-zero normally distributed variables.

Figure 3 shows two sets of probability distributions, both derived from the same set of 1,000,000 pairs of randomly chosen numbers with Gaussian uncertainties. As before, the distributions are scaled such that the peak is at unity for ease of comparison. The first panel shows the probability distribution of the ratio of the larger quantity to the smaller quantity, for large and for small uncertainties. When the uncertainties are small (5 per cent on the smaller quantity in this example), the ratio distribution is approximately Gaussian, though a divergence from Gaussian can already be seen when a Gaussian function derived by non-linear least squares fitting to the observed distribution is overplotted. When the uncertainties are large, the ratio distribution is highly non-Gaussian. The second panel of the figure shows the probability distributions of the ratio of the smaller quantity to the larger quantity, and in this case it is apparent that even when the uncertainties are large, the ratio distribution is still fairly well approximated by a Gaussian distribution.

Figure 3. Scaled probability density functions for ratios where the denominator has the larger uncertainty (left) and where the numerator has the larger uncertainty (right).

SNR range    [O III] ratio    [N II] ratio
0-3          2.71             –

We thus summarise by making the following general points about the probability distributions of ratios:

• The probability distribution of the ratio of two Gaussian variables is not Gaussian, but can in certain circumstances be approximated as such;
• when the uncertainty on the denominator is smaller than that on the numerator, a Gaussian approximation is reasonable even when the uncertainty is relatively large;
• when the uncertainty in the denominator is larger than that on the numerator, a Gaussian approximation is not reasonable even when the uncertainty is relatively small;
• when the ratio distribution is non-Gaussian, the mode of its probability distribution is not equal to the ratio of the modes of the input Gaussian distributions.

These points result from values close to zero becoming more probable as the uncertainties increase. When the denominator has large uncertainties and thus a significant probability of being close to or less than zero, the probability distribution of the ratio becomes highly non-Gaussian with a heavy tail and a significant probability of very large values. As the uncertainty of the denominator tends to zero, the probability distribution of the ratio tends towards a simple scaling of that of the numerator.

For the skewed distributions resulting from large uncertainties on the denominator, the skew is always to the right. Taking the mode of a sample of ratios in which the uncertainty of the denominator is large will lead to an underestimate of the true value of the ratio. Considering the mean, at intermediate signal to noise ratios it will give an overestimate of the true value of the ratio due to the right hand skew of the distribution, but as the signal to noise of the denominator tends towards zero, the distribution tends towards a Cauchy distribution, in which the mean is undefined and the probability distribution of a sample mean is the same as that of an individual sample. The median is almost unbiased except when the signal to noise ratio of the denominator is very low and the distribution becomes significantly bimodal.
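The behaviour of the mode and the median described in this section can be checked with a short Monte Carlo experiment. The following Python sketch is an illustration only and is not the code used for the figures in this paper: the function name `ratio_summary`, the random seed, the histogram window and the 0.05 bin width are all assumptions made for this example. It draws Gaussian numerators and denominators with the means and standard deviations quoted above, then estimates the mode of the ratio from a histogram peak and the median from the sample.

```python
import numpy as np


def ratio_summary(mu_num, mu_den, sd_num=1.7, sd_den=1.0,
                  n=1_000_000, seed=0):
    """Return (histogram mode, sample median) of num/den.

    num ~ N(mu_num, sd_num^2) and den ~ N(mu_den, sd_den^2), drawn
    independently. The binning below is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    num = rng.normal(mu_num, sd_num, n)
    den = rng.normal(mu_den, sd_den, n)
    ratio = num / den

    # Histogram over a window around the true ratio; samples outside
    # the window (the heavy tails) are simply not counted.
    true_ratio = mu_num / mu_den
    edges = np.linspace(0.0, 3.0 * true_ratio, 181)
    counts, edges = np.histogram(ratio, bins=edges)

    # Take the centre of the most populated bin as the mode estimate.
    peak = int(np.argmax(counts))
    mode = 0.5 * (edges[peak] + edges[peak + 1])

    return float(mode), float(np.median(ratio))


if __name__ == "__main__":
    # The three pairs of means used for Figure 2; true ratio is 3.0.
    for mu_num, mu_den in [(30.0, 10.0), (12.0, 4.0), (9.0, 3.0)]:
        mode, median = ratio_summary(mu_num, mu_den)
        print(f"means {mu_num:4.0f}, {mu_den:4.0f}: "
              f"mode ~ {mode:.3f}, median ~ {median:.3f}")
```

The sample mean is deliberately not reported, since it is dominated by the heavy tail when the denominator can approach zero. The histogram mode is a coarse estimate whose precision is limited by the chosen bin width.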