Math 541: Statistical Theory II
Maximum Likelihood Estimation
Lecturer: Songfeng Zheng

1 Maximum Likelihood Estimation

Maximum likelihood is a relatively simple method of constructing an estimator for an unknown parameter θ. It was introduced by R. A. Fisher, a great English mathematical statistician, in 1912. Maximum likelihood estimation (MLE) can be applied in most problems, it has a strong intuitive appeal, and it often yields a reasonable estimator of θ. Furthermore, if the sample is large, the method yields an excellent estimator of θ. For these reasons, the method of maximum likelihood is probably the most widely used method of estimation in statistics.

Suppose that the random variables X_1, ..., X_n form a random sample from a distribution f(x|θ); if X is a continuous random variable, f(x|θ) is a pdf, and if X is a discrete random variable, f(x|θ) is a probability mass function. The symbol | indicates that the distribution also depends on a parameter θ, where θ could be a real-valued unknown parameter or a vector of parameters. For every observed random sample x_1, ..., x_n, we define

$$ f(x_1, \ldots, x_n \mid \theta) = f(x_1 \mid \theta) \cdots f(x_n \mid \theta) \qquad (1) $$

If f(x|θ) is a pdf, then f(x_1, ..., x_n|θ) is the joint density function; if f(x|θ) is a pmf, then f(x_1, ..., x_n|θ) is the joint probability. We call f(x_1, ..., x_n|θ) the likelihood function. As the notation suggests, the likelihood function depends on the unknown parameter θ, and it is denoted by L(θ).

Suppose, for the moment, that the observed random sample x_1, ..., x_n came from a discrete distribution. If an estimate of θ must be selected, we would certainly not consider any value of θ for which it would have been impossible to obtain the data x_1, ..., x_n that were actually observed. Furthermore, suppose that the probability f(x_1, ..., x_n|θ) of obtaining the actual observed data is very high when θ has a particular value, say θ = θ_0, and is very small for every other value of θ. Then we would naturally estimate the value of θ to be θ_0. When the sample comes from a continuous distribution, it would again be natural to try to find a value of θ for which the probability density f(x_1, ..., x_n|θ) is large, and to use this value as an estimate of θ. For any given observed data x_1, ..., x_n, this reasoning leads us to consider a value of θ at which the likelihood function L(θ) is a maximum and to use this value as an estimate of θ.

The meaning of maximum likelihood is as follows: we choose the parameter value that makes the likelihood of the data at hand as large as possible. With discrete distributions, the likelihood is the same as the probability, so we choose the parameter value under which the observed data are most probable. Viewed before the data are observed, maximizing the likelihood function gives a function of the n random variables X_1, ..., X_n, which we call the maximum likelihood estimator θ̂. Once actual data are plugged in, the estimator takes a particular numerical value, which is the maximum likelihood estimate.

MLE requires us to maximize the likelihood function L(θ) with respect to the unknown parameter θ. From Eqn. (1), L(θ) is defined as a product of n terms, which is not easy to maximize directly. Maximizing L(θ) is equivalent to maximizing log L(θ), because log is a monotonically increasing function. We call log L(θ) the log likelihood function and denote it by l(θ), i.e.,

$$ l(\theta) = \log L(\theta) = \log \prod_{i=1}^{n} f(X_i \mid \theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta) $$

Maximizing l(θ) with respect to θ gives the MLE.
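When l(θ) cannot be maximized in closed form, it can be maximized numerically. The sketch below is an added illustration, not part of the original notes: it assumes an exponential model f(x|θ) = θ e^{-θx} and a made-up data vector, and checks a numerical maximizer of l(θ) against the closed-form MLE θ̂ = n / Σ X_i for that model.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical data, assumed to come from an Exponential(theta) model with
# pdf f(x|theta) = theta * exp(-theta * x); the model and numbers are illustrative.
x = np.array([0.8, 1.3, 0.2, 2.1, 0.9, 1.7, 0.4, 1.1])

def neg_log_likelihood(theta):
    # l(theta) = n*log(theta) - theta*sum(x_i); return its negative for minimization
    return -(len(x) * np.log(theta) - theta * x.sum())

# Maximize l(theta) by minimizing -l(theta) over theta > 0.
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 50.0), method="bounded")

print("numerical MLE        :", res.x)
print("closed form n/sum(x) :", len(x) / x.sum())  # the two should agree closely
```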
2 Examples

Example 1: Suppose that X is a discrete random variable with the following probability mass function, where 0 ≤ θ ≤ 1 is a parameter:

    x          0       1       2           3
    P(X = x)   2θ/3    θ/3     2(1−θ)/3    (1−θ)/3

The following 10 independent observations were taken from such a distribution: (3, 0, 2, 1, 3, 2, 1, 0, 2, 1). What is the maximum likelihood estimate of θ?

Solution: Since the sample is (3, 0, 2, 1, 3, 2, 1, 0, 2, 1), the likelihood is

$$ L(\theta) = P(X=3)P(X=0)P(X=2)P(X=1)P(X=3) \times P(X=2)P(X=1)P(X=0)P(X=2)P(X=1) \qquad (2) $$

Substituting from the probability distribution given above, we have

$$ L(\theta) = \prod_{i=1}^{n} P(X_i \mid \theta) = \left(\frac{2\theta}{3}\right)^{2} \left(\frac{\theta}{3}\right)^{3} \left(\frac{2(1-\theta)}{3}\right)^{3} \left(\frac{1-\theta}{3}\right)^{2} $$

Clearly, the likelihood function L(θ) is not easy to maximize. Let us look at the log likelihood function:

$$ l(\theta) = \log L(\theta) = \sum_{i=1}^{n} \log P(X_i \mid \theta) $$
$$ = 2\left(\log\frac{2}{3} + \log\theta\right) + 3\left(\log\frac{1}{3} + \log\theta\right) + 3\left(\log\frac{2}{3} + \log(1-\theta)\right) + 2\left(\log\frac{1}{3} + \log(1-\theta)\right) $$
$$ = C + 5\log\theta + 5\log(1-\theta) $$

where C is a constant that does not depend on θ. The log likelihood function is clearly easier to maximize than the likelihood function. Setting the derivative of l(θ) with respect to θ to zero,

$$ \frac{dl(\theta)}{d\theta} = \frac{5}{\theta} - \frac{5}{1-\theta} = 0, $$

and the solution gives the MLE θ̂ = 0.5. Recall that the method of moments estimate is θ̂ = 5/12, which is different from the MLE.

Example 2: Suppose X_1, X_2, ..., X_n are i.i.d. random variables with density function f(x|σ) = (1/(2σ)) exp(−|x|/σ). Find the maximum likelihood estimate of σ.

Solution: The log-likelihood function is

$$ l(\sigma) = \sum_{i=1}^{n} \left[ -\log 2 - \log\sigma - \frac{|X_i|}{\sigma} \right] $$

Setting the derivative with respect to σ to zero,

$$ l'(\sigma) = \sum_{i=1}^{n} \left[ -\frac{1}{\sigma} + \frac{|X_i|}{\sigma^2} \right] = -\frac{n}{\sigma} + \frac{\sum_{i=1}^{n} |X_i|}{\sigma^2} = 0, $$

which gives the MLE for σ:

$$ \hat{\sigma} = \frac{\sum_{i=1}^{n} |X_i|}{n} $$

Again this is different from the method of moments estimate, which is

$$ \hat{\sigma} = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{2n}} $$

Example 3: Find the maximum likelihood estimates of the parameters μ and σ for the normal density

$$ f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}, $$

based on a random sample X_1, ..., X_n.

Solution: In this example we have two unknown parameters, μ and σ, so the parameter θ = (μ, σ) is a vector. We first write out the log likelihood function:

$$ l(\mu, \sigma) = \sum_{i=1}^{n} \left[ -\log\sigma - \frac{1}{2}\log 2\pi - \frac{1}{2\sigma^2}(X_i - \mu)^2 \right] = -n\log\sigma - \frac{n}{2}\log 2\pi - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (X_i - \mu)^2 $$

Setting the partial derivatives to zero, we have

$$ \frac{\partial l(\mu,\sigma)}{\partial \mu} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (X_i - \mu) = 0 $$

$$ \frac{\partial l(\mu,\sigma)}{\partial \sigma} = -\frac{n}{\sigma} + \sigma^{-3} \sum_{i=1}^{n} (X_i - \mu)^2 = 0 $$

Solving these equations gives the MLEs for μ and σ:

$$ \hat{\mu} = \bar{X} \quad \text{and} \quad \hat{\sigma} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^2} $$

This time the MLE is the same as the method of moments estimate. From these examples, we see that the maximum likelihood estimate may or may not coincide with the method of moments estimate.
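As a quick numerical check on Example 3, the closed-form MLEs μ̂ = X̄ and σ̂ = sqrt((1/n) Σ (X_i − X̄)²) can be compared with a direct numerical maximization of l(μ, σ). The sketch below is an added illustration, not part of the original notes; the simulated sample size and the true parameter values are arbitrary assumptions.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=200)   # assumed true mu = 2.0, sigma = 1.5

# Closed-form MLEs from Example 3: sample mean, and the 1/n (not 1/(n-1)) standard deviation.
mu_hat = x.mean()
sigma_hat = np.sqrt(((x - mu_hat) ** 2).mean())

def neg_log_likelihood(params):
    mu, sigma = params
    if sigma <= 0:
        return np.inf
    # l(mu, sigma) = -n*log(sigma) - (n/2)*log(2*pi) - sum((x - mu)^2) / (2*sigma^2)
    return -(-len(x) * np.log(sigma)
             - 0.5 * len(x) * np.log(2 * np.pi)
             - ((x - mu) ** 2).sum() / (2 * sigma ** 2))

res = minimize(neg_log_likelihood, x0=[0.0, 1.0], method="Nelder-Mead")
print("closed form:", mu_hat, sigma_hat)
print("numerical:  ", res.x)   # the two should agree to several decimal places
```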
Example 4: The Pareto distribution has been used in economics as a model for a density function with a slowly decaying tail:

$$ f(x \mid x_0, \theta) = \theta x_0^{\theta} x^{-\theta-1}, \qquad x \ge x_0, \quad \theta > 1 $$

Assume that x_0 > 0 is given and that X_1, X_2, ..., X_n is an i.i.d. sample. Find the MLE of θ.

Solution: The log-likelihood function is

$$ l(\theta) = \sum_{i=1}^{n} \log f(X_i \mid \theta) = \sum_{i=1}^{n} \left( \log\theta + \theta\log x_0 - (\theta+1)\log X_i \right) = n\log\theta + n\theta\log x_0 - (\theta+1) \sum_{i=1}^{n} \log X_i $$

Setting the derivative with respect to θ to zero,

$$ \frac{dl(\theta)}{d\theta} = \frac{n}{\theta} + n\log x_0 - \sum_{i=1}^{n} \log X_i = 0 $$

Solving the equation yields the MLE of θ:

$$ \hat{\theta}_{MLE} = \frac{1}{\overline{\log X} - \log x_0} $$

Example 5: Suppose that X_1, ..., X_n form a random sample from a uniform distribution on the interval (0, θ), where the parameter θ > 0 is unknown. Find the MLE of θ.

Solution: The pdf of each observation has the following form:

$$ f(x \mid \theta) = \begin{cases} 1/\theta, & \text{for } 0 \le x \le \theta \\ 0, & \text{otherwise} \end{cases} \qquad (3) $$

Therefore, the likelihood function has the form

$$ L(\theta) = \begin{cases} \dfrac{1}{\theta^n}, & \text{for } 0 \le x_i \le \theta \ (i = 1, \ldots, n) \\ 0, & \text{otherwise} \end{cases} $$

It can be seen that the MLE of θ must be a value of θ for which θ ≥ x_i for i = 1, ..., n and which maximizes 1/θ^n among all such values. Since 1/θ^n is a decreasing function of θ, the estimate will be the smallest possible value of θ such that θ ≥ x_i for i = 1, ..., n. This value is θ̂ = max(x_1, ..., x_n); it follows that the MLE of θ is θ̂ = max(X_1, ..., X_n).

It should be remarked that in this example, the MLE θ̂ does not seem to be a suitable estimator of θ. We know that max(X_1, ..., X_n) < θ with probability 1, and therefore θ̂ surely underestimates the value of θ.

Example 6: Suppose again that X_1, ..., X_n form a random sample from a uniform distribution on the interval (0, θ), where the parameter θ > 0 is unknown.
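The remark at the end of Example 5, that θ̂ = max(X_1, ..., X_n) systematically underestimates θ, can be checked by simulation. The sketch below is an added illustration, not part of the original notes; the true θ, sample size, and number of replications are arbitrary assumptions. It uses the fact that for a Uniform(0, θ) sample, E[max X_i] = nθ/(n+1) < θ.

```python
import numpy as np

rng = np.random.default_rng(1)
theta_true = 3.0      # assumed true parameter
n = 10                # assumed sample size
reps = 100_000        # assumed number of simulated samples

# For each replication, the MLE from Example 5 is the sample maximum.
samples = rng.uniform(0.0, theta_true, size=(reps, n))
mle = samples.max(axis=1)

print("average MLE over replications:", mle.mean())          # noticeably below theta_true
print("theory n*theta/(n+1):", n * theta_true / (n + 1))     # about 2.727 here
```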