M- and Z-Estimation


STAT/BMI 741, University of Wisconsin-Madison
Empirical Processes & Survival Analysis, Lecture 1
Lu Mao ([email protected])

Objectives

By the end of this lecture, you will
1. know what the term "empirical process" means;
2. get (re-)familiarized with M- and Z-estimators and their asymptotic properties;
3. be somewhat acquainted with defining and solving infinite-dimensional estimating equations (e.g., Nelson-Aalen, Cox model);
4. be able to heuristically derive functional derivatives.

Contents

1.1 Motivation and Warm-Up
1.2 Asymptotics for Z-Estimators
1.3 Infinite-Dimensional Estimating Functions
1.4 Application: Classical Survival Analysis Methods as NPMLE

Notation

We deal with an i.i.d. sample of size n and use X to denote the generic observation until further notice.

Empirical measure P_n:

  P_n f(X) = n^{-1} \sum_{i=1}^n f(X_i)

Underlying probability measure P:

  P f(X) = E f(X)

Standardized empirical measure G_n:

  G_n f(X) = \sqrt{n} (P_n - P) f(X) = n^{-1/2} \sum_{i=1}^n \{f(X_i) - E f(X)\}

By the (ordinary) central limit theorem,

  G_n f(X) \rightsquigarrow N(0, Var f(X)),

where \rightsquigarrow denotes weak convergence. The empirical process is defined as

  \{G_n f : f \in F\},

where F is a class of functions, e.g.,

  F = \{f_\theta : \theta \in \Theta \subset R^p\}, or
  F = \{all uniformly bounded non-decreasing (or bounded-variation) functions on [0, \tau]\}.

Motivation and Warm-Up

Why consider varying f? Take the classical empirical process

  G_n 1(X \le t), t \in R,

which converges weakly (uniformly in t) to a tight Brownian bridge G_F with covariance function

  F(t \wedge s) - F(t) F(s),

where F(t) = Pr(X \le t). Furthermore, G_F has continuous sample paths:

  G_F(s) = G_F(t) + o(1), F-a.s., as d(s, t) \to 0,

where d(s, t) = |F(s) - F(t)|.
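The three measures above can be illustrated numerically. The following sketch is not from the slides: the Exp(1) sample and the choice f(x) = x^2 are illustrative assumptions. It computes P_n f, compares it with P f = E X^2 = 2, and forms G_n f.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
X = rng.exponential(scale=1.0, size=n)  # generic i.i.d. sample (assumed Exp(1))

f = lambda x: x**2  # an example element f of the class F

# Empirical measure: P_n f(X) = n^{-1} sum_i f(X_i)
Pn_f = f(X).mean()

# Population measure: P f(X) = E f(X); for Exp(1), E X^2 = 2
P_f = 2.0

# Standardized empirical measure: G_n f = sqrt(n) (P_n - P) f,
# which the CLT says is approximately N(0, Var f(X))
Gn_f = np.sqrt(n) * (Pn_f - P_f)

print(Pn_f, Gn_f)
```

By the law of large numbers P_n f is close to P f, while G_n f stays of constant order as n grows, which is exactly why the standardized process is the right object to study.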
The uniform (weak) convergence and the continuity of the sample paths of the limiting distribution have two important implications.

1. Confidence band:

  \sqrt{n} \sup_{t \in [a,b]} |P_n 1(X \le t) - F(t)| \rightsquigarrow \sup_{t \in [a,b]} |G_F|,

so confidence bands for F(.) on [a, b] can be constructed using (estimated) quantiles of the right-hand side of the above display.

2. Asymptotic continuity (AC):

  G_n 1(X \le x_n) = G_n 1(X \le x) + o_P(1) for all x_n, x such that d(x_n, x) \to 0.

To see this, use strong approximation to construct a version of G_F such that

  \sup_t |G_n 1(X \le t) - G_F(t)| \to_{a.s.} 0.

Then, with probability one,

  G_n 1(X \le x_n) = G_F(x_n) + o(1)         (uniform convergence)
                   = G_F(x) + o(1)           (continuity of G_F)
                   = G_n 1(X \le x) + o(1).  (pointwise convergence)

The AC property is tremendously useful in the general setting, especially when one wishes to replace the (erratically behaved) sample-average operation P_n with the population-average operation P, which is usually smoother.

What is the general setting? The problem of uniform weak convergence with varying f can be framed as weak convergence in the space of all bounded functions, i.e.,

  G_n f \rightsquigarrow G_P(f) in (l^\infty(F), ||.||_F),   (1.1)

for some tight process G_P, where l^\infty(F) denotes the space of bounded functionals on F and ||.||_F the supremum norm. The limiting process G_P, if it exists at all, is always Gaussian with covariance function

  \sigma(f, g) = P fg - (P f)(P g).

A more remarkable fact is that the tight process G_P has uniformly continuous sample paths with respect to the standard-deviation metric

  \rho(f, g) = \sqrt{P(f - P f - g + P g)^2}.

So, if the uniform weak convergence (1.1) holds, one is allowed to
1. construct confidence bands for the functional P f, f \in F;
2. use the AC condition G_n f_n = G_n f + o_P(1), provided that \rho(f_n, f) \to 0.
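The confidence-band recipe can be sketched numerically. Everything concrete here is an assumption of the illustration (standard normal data, the grid, the Monte Carlo scheme): the quantile of sup|G_F| over the whole line is the classical Kolmogorov limit, approximated below by simulating a discretized Brownian bridge, and the band is F_n plus or minus that quantile over sqrt(n).

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte Carlo quantile of sup_{u in [0,1]} |B(u)| for a Brownian bridge B,
# simulated as a pinned random walk: B(u) = W(u) - u W(1)
m, reps = 1000, 2000
sups = np.empty(reps)
for r in range(reps):
    z = rng.standard_normal(m)
    w = np.cumsum(z) / np.sqrt(m)            # Brownian motion on a grid of [0, 1]
    b = w - np.linspace(1 / m, 1, m) * w[-1]  # bridge: pinned to 0 at u = 1
    sups[r] = np.abs(b).max()
q95 = np.quantile(sups, 0.95)  # should be near the Kolmogorov 95% point, 1.358

# 95% confidence band for F over a grid: F_n(t) +/- q95 / sqrt(n)
n = 500
X = rng.standard_normal(n)
t_grid = np.linspace(-3, 3, 121)
Fn = (X[:, None] <= t_grid).mean(axis=0)  # empirical CDF P_n 1(X <= t)
lower, upper = Fn - q95 / np.sqrt(n), Fn + q95 / np.sqrt(n)
print(q95)
```

The discretization biases the simulated quantile slightly downward; in practice one would use the exact Kolmogorov distribution rather than this Monte Carlo stand-in.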
More applications of empirical process theory include quantifying the modulus of continuity

  \phi_n(\delta) = \sup_{f, g \in F, \rho(f, g) < \delta} |G_n(f - g)|,

which is useful in deriving rates of convergence for M-estimators.

Why empirical processes (EP) in survival analysis?
1. Survival analysis deals with processes (indexed by time), a natural setting for EP;
2. Survival analysis is usually concerned with non- and semi-parametric models, whose infinite-dimensional parameters are difficult to handle (or outright intractable) with traditional tools of asymptotics;
3. While martingale theory offers an elegant conceptual framework, the range of problems it can handle is limited, especially when one ventures beyond the conventional univariate setting into the realms of, e.g., recurrent events, multivariate failure times, and (semi-)competing risks. EP theory is much more powerful and versatile than martingale theory.

In this series of lectures we aim to
1. give an expository introduction to EP theory as a tool of asymptotic statistics;
2. illustrate the power and versatility of EP techniques with a sample of survival analysis problems.

In the process, we will
1. be as mathematically precise as need be;
2. not be overly concerned with regularity conditions, e.g., measurability in non-separable Banach spaces such as l^\infty(F);
3. stick to the principle of "intuition first, technicality later".

M-Estimation

Consider estimation of \theta, whose true state of nature \theta_0 satisfies

  \theta_0 = \arg\max_\theta P m_\theta,

e.g.,
1. Classical parametric model with density p_\theta(X):

  m_\theta(X) = \log\{(p_\theta / p_{\theta_0})(X)\};

2.
A mean regression model E_\theta(Y | Z) = \mu(Z; \theta):

  m_\theta(Y, Z) = -(Y - \mu(Z; \theta))^2.

A natural estimator is

  \hat\theta_n = \arg\max_\theta P_n m_\theta,

e.g.,
1. Maximum likelihood estimator for the classical parametric model:

  \hat\theta_n^{ML} = \arg\max_\theta P_n \log p_\theta(X);

2. Least squares estimator for the mean regression model:

  \hat\theta_n^{LS} = \arg\min_\theta P_n (Y - \mu(Z; \theta))^2.

Z-Estimation

When m_\theta(X) is smooth in \theta, one can compute \hat\theta_n by solving the "score" equation

  P_n \dot m_\theta(X) = 0, where \dot m_\theta = (\partial / \partial\theta) m_\theta.

We thus consider Z-estimators defined as solutions to estimating equations of the type

  P_n \psi_\theta = 0,

where the root, denoted \hat\theta_n, is expected to estimate \theta_0 satisfying P \psi_{\theta_0} = 0.

Formally, a Taylor expansion gives

  0 = \sqrt{n} P_n \psi_{\hat\theta_n} = \sqrt{n} P_n \psi_{\theta_0} + (P_n \dot\psi_{\tilde\theta_n}) \sqrt{n} (\hat\theta_n - \theta_0),

where \tilde\theta_n lies between \theta_0 and \hat\theta_n. If P_n \dot\psi_{\tilde\theta_n} \to P \dot\psi_{\theta_0} (e.g., if \sup_\theta |(P_n - P) \dot\psi_\theta| \to_P 0 and P \dot\psi_\theta is continuous at \theta_0), then

  \sqrt{n} (\hat\theta_n - \theta_0) = -(P \dot\psi_{\theta_0})^{-1} G_n \psi_{\theta_0} + o_P(1)
    \rightsquigarrow N(0, (P \dot\psi_{\theta_0})^{-1} P \psi_{\theta_0}^{\otimes 2} (P \dot\psi_{\theta_0}^T)^{-1}),

where the asymptotic variance can be consistently estimated by a (moment-based) sandwich estimator. But this derivation needs (argument-wise, i.e., for each fixed x) smoothness conditions on the estimating function \psi_\theta(x).

This argument is not useful for deriving the asymptotic distributions of, e.g., the pth sample quantile, which is an approximate zero of

  \Psi_n(\theta) := P_n \{1(X \le \theta) - p\}.

But if the following holds:

  G_n \psi_{\hat\theta_n} = G_n \psi_{\theta_0} + o_P(1),   (1.2)

i.e., \sqrt{n} P_n (\psi_{\hat\theta_n} - \psi_{\theta_0}) = \sqrt{n} P (\psi_{\hat\theta_n} - \psi_{\theta_0}) + o_P(1), then the derivative of P_n \psi_\theta can effectively be replaced by that of P \psi_\theta. Condition (1.2) is an asymptotic continuity (AC) condition; it holds if a uniform central limit theorem for the empirical process \{G_n \psi_\theta : ||\theta - \theta_0|| < \delta\} holds and \rho(\psi_{\hat\theta_n}, \psi_{\theta_0}) \to_P 0, as discussed in the previous section.
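The non-smooth quantile example can be made concrete. The sketch below is an illustration under assumed inputs (standard normal data, p = 1/2, and bisection as the root-finder): it locates the sample median as an approximate zero of Psi_n even though Psi_n is a step function with no derivative in theta.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.5
n = 20_000
X = rng.standard_normal(n)  # assumed N(0,1), so the true median is theta_0 = 0

# Psi_n(theta) = P_n { 1(X <= theta) - p }: a non-smooth estimating function
Psi_n = lambda theta: np.mean(X <= theta) - p

# The pth sample quantile is an approximate zero of Psi_n.
# Psi_n is non-decreasing in theta, so bisection on a bracket works.
lo, hi = X.min(), X.max()
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if Psi_n(mid) < 0:
        lo = mid
    else:
        hi = mid
theta_hat = 0.5 * (lo + hi)

# theta_hat is only an *approximate* zero: Psi_n jumps in steps of 1/n,
# so |Psi_n(theta_hat)| is of order 1/n rather than exactly 0.
print(theta_hat, Psi_n(theta_hat))
```

This is precisely the situation where the Taylor-expansion route fails but the AC condition (a uniform CLT over the indicator class) still delivers asymptotic normality of theta_hat.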
Z-Estimation: Consistency

Theorem 1.1 (Consistency of Z-estimators). Suppose
(1.1a) \sup_\theta |(P_n - P) \psi_\theta| \to_P 0;
(1.1b) for every sequence \{\theta_n\} with P \psi_{\theta_n} \to 0, we have \theta_n \to \theta_0.
If P_n \psi_{\hat\theta_n} = o_P(1), then \hat\theta_n \to_P \theta_0.

Remark 1.1 (Conditions for consistency). Condition (1.1a) is a uniform-law-of-large-numbers-type condition; condition (1.1b) says that \theta_0, as the zero of P \psi_\theta, need be not only unique but also well separated (cf. van der Vaart, 1998, Figure 5.2).

Exercise 1.1 (A sufficient condition for well-separatedness). If P \psi_\theta is continuous in \theta, which ranges over a compact set, then a unique zero \theta_0 is also well separated.

Proof of Theorem 1.1. By (1.1a),

  P \psi_{\hat\theta_n} = P_n \psi_{\hat\theta_n} + o_P(1) = o_P(1).   (1.3)

The result follows from (1.1b) (by sub-sequence arguments).

Exercise 1.2 (Completion of the proof of Theorem 1.1). Show that, given (1.1b), (1.3) implies \hat\theta_n \to_P \theta_0 using sub-sequence arguments.

Theorem 1.2 (Asymptotic normality of Z-estimators). Suppose
(1.2a) the asymptotic continuity condition holds, i.e., G_n \psi_{\hat\theta_n} = G_n \psi_{\theta_0} + o_P(1);
(1.2b) the function P \psi_\theta is differentiable at \theta_0 with an invertible derivative V_{\theta_0}.
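A minimal numerical sketch of the asymptotic-normality machinery for a smooth estimating function (the Exp(theta) model, sample size, and seed are assumptions of the illustration, not from the slides): the score psi_theta(x) = 1/theta - x has the closed-form Z-estimator theta_hat = 1/mean(X), and the moment-based sandwich formula (P psi_dot)^{-1} P psi^2 (P psi_dot)^{-1} can be plugged in directly to estimate the asymptotic variance.

```python
import numpy as np

rng = np.random.default_rng(3)
theta0 = 2.0  # assumed true exponential rate
n = 5000
X = rng.exponential(scale=1 / theta0, size=n)

# Score estimating function of the Exp(theta) likelihood:
# psi_theta(x) = 1/theta - x, whose root is theta_hat = 1 / mean(X)
theta_hat = 1.0 / X.mean()

# Plug-in sandwich variance (P psi_dot)^{-1} P psi^2 (P psi_dot)^{-1};
# for this model it should be close to theta0^2 = 4
psi = 1.0 / theta_hat - X        # psi_{theta_hat}(X_i)
psi_dot = -1.0 / theta_hat**2    # d/dtheta psi_theta, constant in x
avar = psi_dot**-1 * np.mean(psi**2) * psi_dot**-1
se = np.sqrt(avar / n)           # standard error of theta_hat

print(theta_hat, se)
```

Here the sandwich collapses to the inverse Fisher information because the model is correctly specified; its value is that the same formula remains valid under misspecification, which is the point of using P psi^2 rather than the model-based information.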