8 | Maximum Likelihood Estimation

Maximum likelihood (ML) estimation is a general principle for deriving point estimators in probabilistic models. ML estimation was popularized by Fisher at the beginning of the 20th century, but had already found application in the works of Laplace (1749-1827) and Gauss (1777-1855) (Aldrich, 1997). ML estimation is based on the following intuition: the most likely parameter value of a probabilistic model that generated an observed data set should be that parameter value for which the probability of the data under the model is maximal. In this section, we first make this intuition more precise and introduce the notions of (log) likelihood functions and ML estimators (Section 8.1). We then exemplify the ML approach by discussing ML parameter estimation for univariate Gaussian samples (Section 8.2). Finally, we consider ML parameter estimation for the GLM, relate it to ordinary least squares estimation, and introduce the restricted ML estimator for the variance parameter of the GLM (Section 8.3).

8.1 Likelihood functions and maximum likelihood estimators

Likelihood functions

The fundamental idea of ML estimation is to select, as a point estimate of the true but unknown parameter value that gave rise to the data, that parameter value which maximizes the probability of the data under the model of interest. To implement this intuition, the notion of the likelihood function and its maximization is invoked. To introduce the likelihood function, consider a parametric probabilistic model $p_\theta(y)$ which specifies the probability distribution of a random entity $y$. Here, $y$ models data and $\theta$ denotes the model's parameter with parameter space $\Theta$. Given a parametric probabilistic model $p_\theta(y)$, the function

$$L_y : \Theta \to \mathbb{R}_{\geq 0}, \quad \theta \mapsto L_y(\theta) := p_\theta(y) \tag{8.1}$$

is called the likelihood function of the parameter $\theta$ for the data $y$.
Note that the specific nature of $\theta$ and $y$ is left unspecified, i.e., $\theta$ and $y$ may be scalars, vectors, or matrices. Notably, the likelihood function is a function of the parameter $\theta$, while it also depends on $y$. Because $y$ is a random entity, different data samples from the probabilistic model $p_\theta(y)$ result in different likelihood functions. In this sense, there is a distribution of likelihood functions for each probabilistic model, but once a data realization has been obtained, the likelihood function is a (deterministic) function of the parameter value only. This is in stark contrast to PDFs and PMFs, which are functions of the random variable's outcome values (Section 5 | Probability and random variables). Stated differently, the input argument of a PDF or PMF is a value of the random variable, and its output is the probability density or mass of this value for a fixed value of the model's parameter. In contrast, the input argument of a likelihood function is a parameter value, and its output is the probability density or mass that the model with this parameter value assigns to a fixed value of the random variable modelling the data. If the same random variable value and parameter value are submitted to the PDF or PMF of a model and to the corresponding likelihood function, the outputs of both functions are identical. It is thus the functional dependencies that distinguish likelihood functions from PDFs and PMFs, not their functional form.

Maximum likelihood estimators

The ML estimator of a given probabilistic model $p_\theta(y)$ is that parameter value which maximizes the likelihood function. Formally, this can be expressed as

$$\hat{\theta}_{ML} := \arg\max_{\theta \in \Theta} L_y(\theta). \tag{8.2}$$

Eq. (8.2) should be read as follows: $\hat{\theta}_{ML}$ is defined as that argument of the likelihood function $L_y$ for which $L_y(\theta)$ assumes its maximal value over all possible parameter values $\theta$ in the parameter space $\Theta$.
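The distinction between a PDF and a likelihood function, and the reading of eq. (8.2), can be illustrated with a minimal Python sketch. It assumes, purely for illustration, a univariate Gaussian model with unknown expectation and known variance 1; the function names, the data value, and the candidate set are hypothetical and not part of the original text.

```python
import math

def gaussian_pdf(y, mu, sigma2):
    """PDF view: density of N(mu, sigma2) as a function of the outcome y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def make_likelihood(y, sigma2=1.0):
    """Likelihood view: fix the data y and return L_y, a function of mu only."""
    def likelihood(mu):
        return gaussian_pdf(y, mu, sigma2)
    return likelihood

y_obs = 1.3                         # hypothetical data realization
L_y = make_likelihood(y_obs)

# Identical inputs give identical outputs; only the argument roles differ.
same_value = L_y(0.5) == gaussian_pdf(y_obs, 0.5, 1.0)

# A different realization of y yields a different likelihood function.
L_other = make_likelihood(-0.4)

# Reading of eq. (8.2) on a small candidate set: arg max returns the
# maximizing argument (a parameter value), not the maximal likelihood value.
candidates = [0.0, 0.5, 1.0, 1.5, 2.0]
mu_hat = max(candidates, key=L_y)
print(same_value, mu_hat)
```

Building the likelihood as a closure over the data mirrors the subscript notation $L_y$: the data are fixed once, and only the parameter remains as an argument.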
Note that from a mathematical viewpoint, the definition in eq. (8.2) is not overly general, because it tacitly assumes that $L_y$ in fact has a maximizing argument and that this argument is unique. Also note that instead of values for $\hat{\theta}_{ML}$, one is often interested in functional forms that express $\hat{\theta}_{ML}$ as a function of the data $y$. Concrete numerical values of $\hat{\theta}_{ML}$ are referred to as ML estimates, while functional forms of $\hat{\theta}_{ML}$ are referred to as ML estimators.

There are essentially two approaches to ML estimation. The first approach aims to obtain functional forms of ML estimators (sometimes referred to as closed-form solutions) by analytically maximizing the likelihood function with respect to $\theta$. The second approach, often encountered in applied computing, builds on the former and systematically varies $\theta$ given an observation of $y$ while monitoring the numeric value of the likelihood function. Once this value appears to be maximal, the variation of $\theta$ stops, and the resulting value is used as an ML estimate. In the following, we consider the first approach, which is of immediate relevance for basic parameter estimation in the GLM, in more detail.

From Section 3 | Calculus, we know that candidate values for the ML estimator $\hat{\theta}_{ML}$ fulfil the requirement

$$\frac{d}{d\theta} L_y(\theta) \Big|_{\theta = \hat{\theta}_{ML}} = 0. \tag{8.3}$$

Eq. (8.3) is known as the likelihood equation and should be read as follows: at the location of $\hat{\theta}_{ML}$, the derivative $\frac{d}{d\theta} L_y$ of the likelihood function with respect to $\theta$ is equal to zero. If $\theta \in \mathbb{R}^p$, $p > 1$, the statement implies that at the location of $\hat{\theta}_{ML}$, the gradient $\nabla L_y$ with respect to $\theta$ is equal to the zero vector $0_p$. Clearly, eq. (8.3) corresponds to the necessary condition for extrema of functions. By evaluating the necessary derivatives of the likelihood function and setting them to zero, one may thus obtain a set of equations which can hopefully be solved for an ML estimator.
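The two approaches and the likelihood equation (8.3) can be compared in a short Python sketch. As an assumption made only for this illustration, the model is a binomial PMF for $y$ successes in $n$ trials with success probability $\theta$; for this model, setting the derivative of the likelihood to zero gives the closed-form estimator $\hat{\theta}_{ML} = y/n$. The function names, the data values, and the grid resolution are hypothetical.

```python
import math

def binomial_likelihood(theta, y, n):
    """L_y(theta) = p_theta(y) for y successes in n trials."""
    return math.comb(n, y) * theta ** y * (1.0 - theta) ** (n - y)

def grid_ml_estimate(y, n, steps=1000):
    """Numerical approach: vary theta on a grid and keep the maximizer."""
    grid = [k / steps for k in range(1, steps)]      # grid on the open interval (0, 1)
    return max(grid, key=lambda theta: binomial_likelihood(theta, y, n))

def likelihood_derivative(theta, y, n, h=1e-6):
    """Central finite-difference approximation of dL_y/dtheta."""
    upper = binomial_likelihood(theta + h, y, n)
    lower = binomial_likelihood(theta - h, y, n)
    return (upper - lower) / (2.0 * h)

y, n = 7, 20                          # hypothetical data: 7 successes in 20 trials
theta_closed_form = y / n             # analytical solution of the likelihood equation
theta_grid = grid_ml_estimate(y, n)   # numerical ML estimate
slope_at_estimate = likelihood_derivative(theta_closed_form, y, n)

print(theta_closed_form, theta_grid, slope_at_estimate)
```

The grid search only ever returns a numerical estimate for the data at hand, whereas the closed form $y/n$ is an estimator in the sense above: it expresses $\hat{\theta}_{ML}$ as a function of the data. The finite-difference slope at $y/n$ is approximately zero, as eq. (8.3) requires.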
The log likelihood function

To simplify the analytical approach for finding ML estimators as sketched above, one usually considers the logarithm of the likelihood function, the so-called log likelihood function. The log likelihood function is defined as (cf. eq. (8.1))

$$\ell_y : \Theta \to \mathbb{R}, \quad \theta \mapsto \ell_y(\theta) := \ln L_y(\theta) = \ln p_\theta(y). \tag{8.4}$$

Because the logarithm is a monotonically increasing function, the location in parameter space at which the likelihood function assumes its maximal value corresponds to the location in parameter space at which the log likelihood function assumes its maximal value. Using either the likelihood function or the log likelihood function to find a maximum likelihood estimator is thus equivalent, as both identify the same maximizing value (if it exists). The use of log likelihood functions instead of likelihood functions in ML estimation is primarily of a pragmatic nature. First, probabilistic models often involve PDFs with exponential terms, which the log transform removes. Second, independence assumptions often give rise to factorized probability distributions, whose products the log transform turns into sums. Finally, from a numerical perspective, one often deals with PDF or PMF values that are rather close to zero and that the log transform stretches to a broader range. In analogy to eq. (8.3), the log likelihood equation for the maximum likelihood estimator is given by

$$\frac{d}{d\theta} \ell_y(\theta) \Big|_{\theta = \hat{\theta}_{ML}} = 0. \tag{8.5}$$

Like eq. (8.3), the log likelihood equation can be extended to multivariate $\theta$ in terms of the gradient of $\ell_y$, and like eq. (8.3), it can be solved for $\hat{\theta}_{ML}$.

We next aim to exemplify the idea of ML estimation in a first example (Section 8.2). To do so, we first discuss two additional assumptions that simplify the application of the ML approach considerably: the assumption of a concave log likelihood function and the assumption of independent data random variables with associated PDFs. Finally, we summarize the ML method in a recipe-like manner.
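The equivalence of the two maximization problems and the numerical argument for the log transform can be checked with a small Python sketch. It assumes, for illustration only, independent Gaussian observations with unknown expectation and known variance 1, so that the likelihood is a product of densities; the data values, the grid, and all function names are hypothetical.

```python
import math

def gaussian_pdf(y, mu, sigma2=1.0):
    """Density of N(mu, sigma2) evaluated at y."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

def likelihood(mu, data):
    """Product of densities (independent observations assumed)."""
    value = 1.0
    for y in data:
        value *= gaussian_pdf(y, mu)
    return value

def log_likelihood(mu, data, sigma2=1.0):
    """Sum of log densities; the exponential term is removed analytically."""
    constant = -0.5 * math.log(2.0 * math.pi * sigma2)
    return sum(constant - (y - mu) ** 2 / (2.0 * sigma2) for y in data)

small = [0.8, 1.1, 1.4, 0.9, 1.3]            # hypothetical observations
grid = [k / 100 for k in range(0, 201)]      # candidate values for mu on [0, 2]

# Both functions are maximized at the same grid point.
mu_from_L = max(grid, key=lambda mu: likelihood(mu, small))
mu_from_l = max(grid, key=lambda mu: log_likelihood(mu, small))

# With many observations the product underflows to 0.0, the sum does not.
large = small * 400                          # 2000 observations
L_large = likelihood(1.1, large)
l_large = log_likelihood(1.1, large)
print(mu_from_L, mu_from_l, L_large, l_large)
```

For the larger data set, the likelihood evaluates to exactly 0.0 in double precision for every candidate value, so a search over it cannot distinguish candidates, while the log likelihood remains finite and informative.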
Concave log likelihood functions

If the log likelihood function is concave, then the necessary condition for a maximum of the log likelihood function is also sufficient. Recall that a multivariate real-valued function $f : \mathbb{R}^n \to \mathbb{R}$ is called concave if, for all input arguments $a, b \in \mathbb{R}^n$, the straight line connecting the points $(a, f(a))$ and $(b, f(b))$ lies on or below the function's graph. Formally,

$$f(ta + (1 - t)b) \geq t f(a) + (1 - t) f(b) \quad \text{for } a, b \in \mathbb{R}^n \text{ and } t \in [0, 1]. \tag{8.6}$$

Here, $ta + (1 - t)b$ for $t \in [0, 1]$ describes a straight line in the domain of the function, while $t f(a) + (1 - t) f(b)$ for $t \in [0, 1]$ describes a straight line in the range of the function. Leaving mathematical subtleties aside, it is roughly correct that concave functions have a single maximum, or in other words, that a critical point at which the gradient vanishes is guaranteed to be a maximum of the function. Thus, if the log likelihood function is concave, finding a parameter value for which the log likelihood equation holds is sufficient to identify a maximum at this location. In principle, whenever applying the ML method based on the log likelihood equation, it is thus necessary to show that the log likelihood function is concave and that the necessary condition for a maximum is hence also sufficient. However, such an approach is beyond the level of rigour herein, and we content ourselves with stating without proof that the log likelihood functions of interest in the following are concave.

Independent data random variables with probability density functions

A second assumption that simplifies the application of the ML method is the assumption of independent data variables with associated PDFs.

The General Linear Model | © 2020 Dirk Ostwald CC BY-NC-SA 4.0