The Relationship Between Fourier and Mellin Transforms, with Applications to Probability
The relationship between Fourier and Mellin transforms, with applications to probability

Dave Collins
[email protected]

Abstract

The use of Fourier transforms for deriving probability densities of sums and differences of random variables is well known. The use of Mellin transforms to derive densities for products and quotients of random variables is less well known. We present the relationship between the Fourier and Mellin transforms, and discuss the use of these transforms in deriving densities for algebraic combinations of random variables. Results are illustrated with examples from reliability analysis.

1 Introduction

For the purposes of this paper, we may loosely define a random variable (RV) as a value in some domain, say $\mathbb{R}$, representing the outcome of a process based on a probability law. An example would be a real number representing the height in inches of a male chosen at random from a population in which height is distributed according to a Gaussian (normal) law with mean 71 and variance 25. Then we can say, for example, that the probability of the height of an individual from the population being between 66 and 76 inches is about .68.

For deriving such information about "nice" probability distributions (e.g., the height distribution above), we integrate the probability density function (pdf); in the case of the Gaussian the pdf is

$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right),$$

where $\mu$ is the mean and $\sigma^2$ is the variance.[1]

A question that frequently arises in applications is, given RVs $X$, $Y$ with densities $f(x)$, $g(y)$, what is the density of the random variable $X + Y$? (The answer is not $f(x) + g(y)$.) A less frequent but sometimes important question is, what is the density of the product $XY$? In this paper, after some brief background on probability theory, we provide specific examples of these questions and show how they can be answered with convolutions, using the Fourier and Mellin integral transforms. In fact (though we will not go into this level of detail), using these transforms one can, in principle, compute densities for arbitrary rational functions of random variables [15].

[1] In this paper "nice" means RVs whose range is $\mathbb{R}^n$, with finite moments of all orders, and which are absolutely continuous with respect to Lebesgue measure, which implies that their pdfs are smooth almost everywhere and Riemann integrable. We will only deal with nice distributions.

1.1 Terminology

It is necessary to mention a few cases in which the terminology used in probability theory may cause confusion:

• "Distribution" (or "law") in probability theory means a function that assigns a probability $0 \le p \le 1$ to every Borel subset of $\mathbb{R}$, not a "generalized function" as in the Schwartz theory of distributions.

• For historical reasons going back to Henri Poincaré, the term "characteristic function" in probability theory refers to an integral transform of a pdf, not to what mathematicians usually refer to as the characteristic function of a set. For that concept, probability theory uses "indicator function", symbolized $I$; e.g., $I_{[0,1]}(x)$ is 1 for $x \in [0,1]$ and 0 elsewhere. In this paper we will not use the term "characteristic function" at all.

• We will be talking about pdfs being in $L^1(\mathbb{R})$, and this should be taken in the ordinary mathematical sense of a function on $\mathbb{R}$ which is absolutely integrable. More commonly, probabilists talk about random variables being in $L^1$, $L^2$, etc., which is quite different: in terms of a pdf $f$, it means that $\int |x| f(x)\,dx$, $\int x^2 f(x)\,dx$, etc. exist and are finite. It would require an excursion into measure theory to explain why this makes sense; suffice it to say that in the latter case we should really say something like "$L^1(\Omega, \mathcal{F}, P)$", which is not at all the same as $L^1(\mathbb{R})$.
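Before turning to the probability background, the claim in Section 1 that $P(66 \le X \le 76) \approx .68$ for the height distribution is easy to check by integrating the Gaussian pdf directly. The following Python sketch is purely illustrative (it is not part of the original development, and it assumes NumPy and SciPy are available):

```python
# Numerical check of the height example from Section 1 (an illustrative
# sketch, not part of the paper; assumes NumPy and SciPy are installed).
# X ~ N(71, 25), so mu = 71 and sigma = sqrt(25) = 5.
import numpy as np
from scipy.integrate import quad

mu, sigma = 71.0, 5.0

def gaussian_pdf(x):
    """The N(mu, sigma^2) density given above."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# P(66 <= X <= 76): integrate the pdf over one standard deviation on
# either side of the mean.
prob, abserr = quad(gaussian_pdf, 66.0, 76.0)
print(f"P(66 <= X <= 76) ~= {prob:.4f}")  # prints about 0.6827
```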
2 Probability background

For those with no exposure to probability and statistics, we provide a brief intuitive overview of a few concepts. Feel free to skip to the end of the section if you are already familiar with this material (but do look at the two examples there).

Probability theory starts with the idea of the outcome of some process, which is mapped to a domain (e.g., $\mathbb{R}$) by a random variable, say $X$. We will ignore the underlying process and just think of $x \in \mathbb{R}$ as a "realization" of $X$, with a probability law or distribution which tells us how much probability is associated with any interval $[a, b] \subset \mathbb{R}$. "How much" is given by a number $0 \le p \le 1$. Formally, probabilities are implicitly defined by their role in the axioms of probability theory; informally, one can think of them as degrees of belief (varying from 0, complete disbelief, to 1, complete belief), or as ratios of the number of times a certain outcome occurs to the total number of outcomes (e.g., the proportion of coin tosses that come up heads).

A probability law on $\mathbb{R}$ can be represented by its density, or pdf, which is a continuous function $f(x)$ with the property that the probability of finding $x$ in $[a, b]$ is $P(x \in [a, b]) = \int_a^b f(x)\,dx$. The pdf is just like a physical density: it gives the probability "mass" per unit length, which is integrated to measure the total mass in an interval. Note the defining characteristics of a probability measure on $\mathbb{R}$:

1. For any $[a, b]$, $0 \le P(x \in [a, b]) \le 1$.

2. $P(x \in (-\infty, \infty)) = 1$.

3. If $[a, b] \cap [c, d] = \emptyset$, then $P(x \in [a, b] \cup [c, d]) = P(x \in [a, b]) + P(x \in [c, d])$.

From these properties and general properties of the integral it follows that if $f$ is a continuous pdf, then $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.

Though we don't need them here, there are also discrete random variables, which take values in a countable set as opposed to a continuous domain. For example, a random variable representing the outcome of a process that counts the number of students in the classroom at any given moment takes values only in the nonnegative integers. There is much more to probability, and in particular a great deal of measure-theoretic apparatus has been ignored here, but it is not necessary for understanding the remainder of the paper.

The Gaussian or normal density was mentioned in Section 1. We say that $X \sim N(\mu, \sigma^2)$ if it is distributed according to a normal law with mean or average $\mu$ and variance $\sigma^2$. The mean $\mu$ determines the center of the normal pdf, which is symmetric; $\mu$ is also the median (the point such that half the probability mass is above it, half below) and the mode (the unique local maximum of the pdf). If the pdf represented a physical mass distribution over a long rod, the mean $\mu$ is the point at which it would balance. The variance is a measure of the variability or "spread" of the distribution. The square root of the variance, $\sigma$, is called the standard deviation, and is often used because it has the same unit of measure as $X$.
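The defining characteristics of a probability measure listed above can likewise be verified numerically for the $N(71, 25)$ density. The sketch below is an illustration under the same assumptions as before (NumPy and SciPy available), checking the unit total mass and additivity over essentially disjoint intervals:

```python
# Illustrative check (not from the paper) of properties 2 and 3 above
# for the N(71, 25) density; assumes NumPy and SciPy.
import numpy as np
from scipy.integrate import quad

mu, sigma = 71.0, 5.0

def pdf(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Property 2: the total probability mass is 1.
total, _ = quad(pdf, -np.inf, np.inf)
print(f"P(x in (-inf, inf)) = {total:.6f}")  # 1.000000

# Property 3: additivity. [60, 70] and [70, 80] share only the single
# point {70}, which carries zero probability under a continuous density,
# so the probabilities of the two intervals simply add.
p1, _ = quad(pdf, 60.0, 70.0)
p2, _ = quad(pdf, 70.0, 80.0)
p_union, _ = quad(pdf, 60.0, 80.0)
print(np.isclose(p1 + p2, p_union))  # True
```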
Formally, given any RV $X$ with pdf $f$, its mean is $\mu = \int_{-\infty}^{\infty} x f(x)\,dx$ (the average of $x$ over the support of the distribution, weighted by the probability density). This is usually designated by $E(X)$, the expectation or expected value of $X$. The variance of $X$ is $E\big((X - \mu)^2\big) = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx$ (the weighted average of the squared deviation of $x$ from its mean value).

Figure 1 plots the $N(71, 25)$ density for heights mentioned in Section 1. The central vertical line marks the mean, and the two outer lines are at a distance of one standard deviation from the mean.

The definite integral of the normal pdf can't be expressed in closed form; an approximation is often found as follows. It is easy to show that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} \sim N(0, 1)$; also, from the properties of a probability measure, for any random variable $X$,

$$P(a \le X \le b) = P(X \le b) - P(X \le a).$$

It therefore suffices to have a table of values of $P(X \le b)$ for the $N(0, 1)$ distribution. (Viewed as a function of $b$, $P(X \le b)$ is called the cumulative distribution function.) Such tables are found in all elementary statistics books, and give, e.g., $P(66 \le X \le 76) \approx .682$.

Figure 1: $N(71, 25)$ pdf for the distribution of heights.

Many applications use random variables that take values only on $[0, \infty)$, for example to represent incomes, life expectancies, etc. A frequently used model for such RVs is the gamma distribution, with pdf

$$f(x) = \frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}\, x^{\alpha - 1} e^{-x/\beta} \quad \text{if } x > 0, \qquad 0 \text{ otherwise.}$$

(Notice that aside from the constant $\frac{1}{\Gamma(\alpha)\,\beta^{\alpha}}$, which normalizes $f$ so it integrates to 1, and the extra parameter $\beta$, this is the kernel of the gamma function $\Gamma(\alpha) = \int_0^{\infty} x^{\alpha - 1} e^{-x}\,dx$, which accounts for the name.) Figure 2 shows a gamma(4, 2) pdf ($\alpha = 4$, $\beta = 2$). Because a gamma density is never symmetric, but skewed to the right, the mode, median, and mean occur in that order and are not identical. For an income distribution this means that the typical (most likely) income is smaller than the "middle" income, which is smaller than the average income (the latter is pulled up by the small number of people who have very large incomes).
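The ordering mode < median < mean for a right-skewed gamma density can also be checked numerically. The sketch below is illustrative (not from the paper) and uses SciPy's gamma distribution, parameterized by shape $\alpha$ and scale $\beta$:

```python
# Illustrative check (not from the paper) that mode < median < mean for
# the right-skewed gamma(4, 2) density; assumes SciPy.
from scipy.stats import gamma

alpha, beta = 4.0, 2.0            # shape and scale, as in Figure 2
X = gamma(a=alpha, scale=beta)

mode = (alpha - 1.0) * beta       # argmax of the pdf (valid for alpha >= 1)
median = X.ppf(0.5)               # half the probability mass on each side
mean = X.mean()                   # equals alpha * beta for the gamma law

print(f"mode = {mode:.2f}, median = {median:.2f}, mean = {mean:.2f}")
# prints roughly: mode = 6.00, median = 7.34, mean = 8.00
```

For an income distribution modeled this way, these three numbers correspond to the most likely income, the "middle" income, and the average income, in that order.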