Quick viewing(Text Mode)

Maximum Weighted Likelihood Estimation

Maximum Weighted Likelihood Estimation

Maximum Weighted Likelihood Estimation

by

Steven Xiaogang Wang

B.Sc, Beijing Polytechnic University, P.R. China, 1991.

M.S., University of California at Riverside, U.S.A., 1996. „

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

Doctor of Philosophy

in

THE FACULTY OF GRADUATE STUDIES

Department of

We accept this thesis as conforming

to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA

June 21, 2001

©Steven Xiaogang Wang, 2001 In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

Department

The University of British Columbia Vancouver, Canada

DE-6 (2/88) Maximum Weighted Likelihood Estimation

Steven X. Wang

Abstract

A maximum weighted likelihood method is proposed to combine all the relevant data from different sources to improve the quality of statistical inference especially when the sample sizes are moderate or small. The linear weighted likelihood estimator (WLE), is studied in depth. The weak consistency, strong consistency and the asymptotic normality of the WLE are proved. The asymptotic properties of the WLE using adaptive weights are also established. A procedure for adaptively choosing the weights by using cross-validation is proposed in the thesis. The analytical forms of the "adaptive weights" are derived when the WLE is a of the MLE's. The weak consistency and asymptotic nor• mality of the WLE with weights chosen by cross-validation criterion are established. The connection between WLE and theoretical information theory is discovered. The derivation of the weighted likelihood by using the maximum entropy principle is pre• sented. The approximations of the distributions of the WLE by using saddlepoint approximation for small sample sizes are derived. The results of the application to the disease mapping are shown in the last chapter of this thesis.

ii Contents

Abstract iii

Table of Contents iii

List of Figures vi

List of Tables vii

Acknowledgements viii

1 Introduction 1 1.1 Introduction 1 1.2 Local Likelihood and Related Methods 2 1.3 Relevance Weighted Likelihood Method 6 1.4 Weighted Likelihood Method 6 1.5 A Simple Example 8 1.6 The Scope of the Thesis 11

2 Motivating Example: Normal Populations 14 2.1 A Motivating Example 15 2.2 Weighted Likelihood Estimation 16 2.3 A Criterion for Assessing Relevance 18

iii 2.4 The Optimum WLE 22 2.5 Results for Bivariate Normal Populations 24

3 Maximum Weighted Likelihood Estimation 28 3.1 Weighted Likelihood Estimation 28 3.2 Results for One-Parameter Exponential Families 29 3.3 WLE On Restricted Parameter Spaces 32 3.4 Limits of Optimum Weights 38

4 Asymptotic Properties of the WLE 48 4.1 Asymptotic Results for the WLE 49 4.1.1 Weak Consistency 50 4.1.2 Asymptotic Normality 60 4.1.3 Strong Consistency 70 4.2 Asymptotic Properties of Adaptive Weights 74 4.2.1 Weak Consistency and Asymptotic Normality 74 4.2.2 Strong Consistency by Using Adaptive Weights 77 4.3 Examples 78 4.3.1 Estimating a Univariate normal 78 4.3.2 Restricted Normal 79 4.3.3 Multivariate Normal Means 81 4.4 Concluding Remarks 83

5 Choosing Weights by Cross-Validation 85 5.1 Introduction 85 5.2 Linear WLE for Equal Sample Sizes 87 5.2.1 Two Population Case 88

iv 5.2.2 Alternative Matrix Representation of Ae and be 93 5.2.3 Optimum Weights X°f By Cross-validation 95 5.3 Linear WLE for Unequal Sample Sizes 96 5.3.1 Two Population Case 97 5.3.2 Optimum Weights By Cross-Validation 99 5.4 Asymptotic Properties of the Weights 102 5.5 Simulation Studies 107

6 Derivations of the Weighted Likelihood Functions 112 6.1 Introduction 112 6.2 Existence of the Optimal Density 114 6.3 Solution to the Isoperimetric Problem 116 6.4 Derivation of the WL Functions 117

7 Saddlepoint Approximation of the WLE 125 7.1 Introduction 125 7.2 Review of the Saddlepoint Approximation 125 7.3 Results for Exponential Family 128 7.4 Approximation for General WL Estimation 134

8 Application to Disease Mapping 137 8.1 Introduction 137 8.2 Weighted Likelihood Estimation 138 8.3 Results of the Analysis 141 8.4 Discussion 143

Bibliography 146

v List of Figures

2.1 A special case: the solid line is the max\p-a\

broken line represents the MSE(MLE). The X-axis represents the

value of Ai. The parameters have been to be the following: =

l,p = 0.5 and C = 1 21

8.1 Daily hospital admissions for CSD # 380 in the summer of 1983. . . 138 8.2 Hospital Admissions for CSD # 380, 362, 366 and 367 in 1983. ... 139

vi List of Tables

5.1 MSE of the MLE and the WLE and standard deviations of the squared

errors for samples with equal sizes for N(0,1) and iV(0.3,1) 108 5.2 Optimum weights and their standard deviations for samples with equal

sizes from N(0,1) and N(0.3,1) 109 5.3 MSE of the MLE and the WLE and their Standard deviations for

samples with equal sizes from V(3) and V(3.6) 110 5.4 Optimum weights and their standard deviations for samples with equal sizes from V(3) and P(3.6) Ill

7.1 Saddlepoint Approximations of T(n + 1) = n\ 128 7.2 Saddlepoint Approximation of the Spread Density for m = 10 132

8.1 Estimated MSE for the MLE and the WLE 142 8.2 Correlation matrix and the weight for 1984 144 8.3 MSE of the MLE and the WLE for CSD 380 145

vii Acknowledgments

I am most grateful to my supervisor, Dr. James V. Zidek, whose advice and support are a source of invaluable guidance, constant encouragement and great in• spiration throughout my Ph.D. study at University of British Columbia. I am also very grateful to my co-supervisor, Dr. Constance van Eeden, who provides invaluable advice and rigorous training in technical writing. I wish to thank my thesis committee members Dr. John Petkau and Dr. Paul Gustafson whose insights and helps are greatly appreciated. I also wish to thank Dr. Ruben Zamar for his suggestions. Finally, I would like to thank my wife, Ying Luo, for her understanding and support. My other family members all have their shares in this thesis.

viii Chapter 1

Introduction

I.l Introduction

Recently there has been increasing interest in. combining information from diverse sources. Effective methods for combining information are needed in a variety of fields, including engineering, environmental sciences, geosciences and medicine. Cox (1981) gives an overview on some methods for the combination of data such as weighted means and pooling in the presence of over-dispersion. An excellent survey of more current techniques of combining information and some concrete examples can be found in the report Combining Information written by the Committee on Applied and Theoretical Statistics, U.S. National Research Council (1992). This thesis will concentrate on the combination of information from separate sets of data. • When two or more data sets derive from similar variables measured on different samples of subjects are available to answer a given question, a judgment must be made whether the samples and variables are sufficiently similar that the data sets may be directly merged or whether some other method of combining information that only partially merges them is more appropriate. For instance, data sets may be

1 1.2. LOCAL LIKELIHOOD AND RELATED METHODS 2 time series data for ozone levels of each year. Or they may be hospital admissions for different geographical regions that are close to each other. The question is how to combine information from data sets collected under different conditions (and with • differing degrees of precision and bias) to yield more reliable conclusions than those available from a single information source.

1.2 Local Likelihood and Related Methods

Local likelihood, introduced by Tibshirani and Hastie (1987), extends the idea of local fitting to likelihood-based regression models. Local regression may be viewed as a special case of the local likelihood procedure. Staniswalis (1989) defines her version of local likelihood in the context of non-parametric regression as follows:

W(9) = J2w{^^)logf(yi]e) i=i where X{ are fixed and b is a single unknown parameter. Recently, versions of local likelihood for estimation have been proposed and discussed. The general form of local likelihood was presented by Eguchi and Copas (1998). The basic idea is to infuse local adaptation into the likelihood by considering

n x — t ! L(t;x1,x2,...,xn) = K{--^—)logf{xi,e),

i=l . where K = K(^^) is a kernel function with center t and bandwidth h. The local

maximum likelihood estimate 0t of a parameter in a statistical model f(x; 9) is defined by maximizing the weighted version of the likelihood function which gives more weight to sample points near t. This does not give an unbiased estimating equation as it stands, and so the local likelihood approach introduces a correction factor to ensure consistent estimation. The resulting local maximum likelihood estimator (LMLE), 1.2. LOCAL LIKELIHOOD AND RELATED METHODS 3 say 9t,h, depends on the controllable variables t and h through the kernel function K,

and, intuitively, it is natural to think that Qt,h gains more information about the data around t in the sample space. Detailed discussions of the local likelihood method can be found in Eguchi and Copas (1998). The LMLE might be related to the local

M-estimator proposed by Hardle and Gasser (1984). The local M-estimator is defined as

i=l

The term weighted likelihood has been used in the statistics literature besides the local likelihood approach. Dickey and Lientz (1970) propose the use of what they called weighted likelihood ratio for tests of simple hypothesis. Dickey (1971) proposes the weighted utility-likelihood along the same line of argument. Assume a statistical model in which the observed data vector D G En occurs with the mass or density function (D\9), depending continuously on an unknown parameter vector

9 E Er. Suppose that one suspects the unknown parameter 9 of belonging to a given

Borel set H C ET. Let H denote a Borel measurable alternative such that H(~)H = 0 with P(H) + P(H) = 1. The key feature of their proposal is to use weighted likelihood ratio defined as follows:

$(D\H) where $(D\H) = J (D\9)dP(9\H). The reason that they called it weighted likelihood ratio is because the quantity &(D\H) is the summary of the evidence in D for H. A modern name for their weighted likelihood ratio might be odds ratio.

Markatou, Basu and Lindsay (1997,1998) propose a method based on the weighted likelihood equation in the context of robust estimation. Their approach can be de- 1.2. LOCAL LIKELIHOOD AND RELATED METHODS 4

scribed as follows: Suppose that {X±, X2, • Xn} is a random sample with distribution f(x;9). The weighted likelihood equation is defined as

n Q ^2w(xi,F)—logf{xi;6)

i=l where F is the empirical cumulative distribution function. The weight function w(Xi, F) is selected such that it has value close to 1 if there is no evidence of model violation at x from the empirical distribution function. The weight function will be very close to 0 or exactly 0 at Xi if the empirical cumulative distribution func• tions indicates lack of fit at or near X{. Thus, the role of the weight function is to down-weight points in the sample that are inconsistent with the assumed model.

Hunsberger (1994) also uses the term "weighted likelihood" to arrive at kernel estimators for the parametric and non-parametric components of semi-parametric regression models. Consider the model with Xi\(Yi,Ti = ti) having the distribution

f(Xi;Xi) where A; = yiB0 + g(ti). Furthermore let /, the conditional density of

X\(Y,T), be arbitrary but known. Then xB0 is the parametric portion, Bo being the unknown parameter to be estimated that relates the covariate y to the response. Here g is the non-parametric portion of the model, the only assumption on g being that it is a smooth function of t. Assume yi = r(i;) + r/i where r is a smooth funtion and the rji are independent random error terms with E(r]i) = 0 and En2 — a2. Now

Xi can be rewritten using the model for the y's to obtain Aj = r/j/30 + h(ti), where

h(ti) =• r(ti)P0 + g(ti) is the portion that depends on t. The main purpose is to estimate /?o and Qi = h(ti),i = 1, 2,n in the semi-parametric model by maximizing a weighted likelihood function

w WW 0) = J2 Y, C-^Y°9f{Xf, P, 9i)/n2b i 3

1 with respect to B and 6, where 0 = (6\, 92,9n) . In the weighted likelihood function, 1.2. LOCAL LIKELIHOOD AND RELATED METHODS 5

w is a kernel that assigns zero weights to the observations Xj that correspond to tj

outside a neighborhood of tj.

Besides these "weighted likelihood" approaches, it should be noted that the term

weighted likelihood has been used in other contexts as well. Newton and Raftery

(1994) introduce what they called weighted likelihood bootstrap as a way to simu•

late approximately from a posterior distribution. The weighted likelihood function is

defined as

n

i=l

where the random weight vector w = (wn>i, wnj2,wn>n) has some probability dis•

tribution determined by the statistician. The function L is not a likelihood function

in the usual sense. It is considered by Newton and Raftery to be good approximation

to the posterior.

Rao (1991) introduces his definition of the weighted maximum likelihood to deal

with irregularities of the observation times in the longitudinal studies of the growth

rate. To be more specific, he defines the weighted likelihood as

Ln(P) = l[f{xl,n,tl,n,d\{xj

where A(.) is a known nondecreasing function on [a,b] and tiin are observation times such that

0, = £u,n < t\,n < ••• < tkn,n =

The term weighted likelihood is also used by Warm (1987) in the context of item response theory. His proposal can be described as maximizing w(9)L(x\9) instead of the traditional likelihood function L(x\9). The weight function, w(9) is carefully chosen such that the new estimator is unbiased to the order of n~l. 1.3. RELEVANCE WEIGHTED LIKELIHOOD METHOD 6

1.3 Relevance Weighted Likelihood Method

Hu and Zidek (1997) give a very general method for using all relevant sample infor• mation in statistical inference. They base their theory on what they call relevance-

weighted likelihood (REWL). Let x = (xi, x2,xn) denote a realization ofthe random

vector X = (Xi, X2,Xn). Let /; be the unknown probability density function of the Xi which are assumed to be independent. Inferential interest is on a probabil• ity density function /. At least in some qualitative sense, the fi are thought to be

"like" /. Consequently the thought to be of value in our inferential analysis though the Xi are independently drawn from a population different from our study population. The relevance-weighted likelihood is defined as

n

REWL(0) =Hf(xi;9)Xi i=i where A;, i = 1,2,n, are weight functions.

REWL plays the same role in the "relevant-sample" case as the likelihood in the conventional ("exact-information") case. It can be seen that this method generalizes local likelihood. The asymptotic properties of the Relevance Maximum Weighted

Likelihood Estimator can be found in Hu (1997). Hu, Rosenberger and Zidek (2000) extend the results for independent sequences in Hu (1997) to dependent sequences.

1.4 Weighted Likelihood Method

In this thesis, we are interested in a context which is different from that of all the methods described above. Suppose that we are interested in a single population. A parameter or parameter vector, t?i, is of inferential interest. Information from other related populations, Population 2, Population 3 and so on, are available together with the direct information from Population 1. Let m denote the total number of popula- 1.4. WEIGHTED LIKELIHOOD METHOD 7

tions whose distributions are thought to "resemble" Population 1. Let nl5n2, ...,nm denote the number of observations obtained from each population mentioned above.

Let Xi, X2,Xm be random variables or vectors with marginal probability density

t functions /i(.; 6>i), /2(.; 6>2),/m(.; 6>m), where Xi = (Xil,Xi2,...,Xini). The joint

distribution of (Xi,X2, ...,Xm) is not assumed. We are interested in the probability density function /i(.; 9i) : 9\ € O of a study variable or vector of variables X, 9\ being an unknown parameter or vector of parameters. At least in some qualitative sense,

the /2(.; 62), /m(-; 9m) are thought to be "similar to" /i(.; 9i).

For fixed X = x, the weighted likelihood (WL) is defined as: m

A WL(^1) = n/1(xi;c91) S

i=i

where A = (A1; A2,Am)' is the "weight vector" which must be specified by the analyst.

We say that 9~x is a maximum weighted likelihood estimator (WLE) for 9\ if h = arg sup WL(0i).

The uniqueness of the maximizer is not assumed.

In this thesis, we assume that Xn,X{2, ...,Xirii, i=l,2,...,m, are independent and identically distributed random variables. The WL then becomes

m n; wL(0o=nn^^)Ai-

i=i 3=1

Hu (1997) proposes a paradigm which abstracts that of non-parametric regression and function estimation. There information about 9\ builds up because the number of populations grows with increasingly many in close proximity to that of B\. This is the paradigm commonly invoked in the context of non-parametric regression but it is not always the most natural one. In contrast we postulate a fixed number of 1.5. A SIMPLE EXAMPLE 8 populations with an increasingly large number of samples from each population. Our paradigm may be more natural in situations like that in which James-Stein estimator obtained, where a specific set of populations is under study.

1.5 A Simple Example

The advantages of using WLE might be illustrated by the following example. A coin is tossed twice. Let 6\ = P(H) for this coin. Let 6\ denote the MLE of

6\. It then follows that 6X = Si/2 where Si =Xi + X2 and the X's are independent Bernoulli random variables. If unknown to the statistician, 6 = 1/2, then

P (0i = 1/2) P ({X, = 0; X2 = l}D{X1 = l;X2 = 0}) = 1/2;

The probability for the MLE to conclude that either the fair coin has no head or no tail is 50%. It is clear that the probability of making a nonsensical decision about the fair coin in this case is extremely high due to the small sample size. Suppose that another coin, not necessarily a fair one, is tossed twice as well. The question here is whether we can use the result from the second experiment to derive a better estimate for 6\. The answer is affirmative if we combine the information obtained from the two experiments.

Suppose for definiteness 92 = P(H) = 0.6 for the second coin. Let 92 denote the

MLE of 02• Thus, 92 = S2/2 = (Y1 + Y2)/2 where Yx and Y2 are independent Bernoulli 1.5. A SIMPLE EXAMPLE 9 random variables. It then follows that

P (l2 = fj) = P(YX = Y2 = 0) = 0.16;

P (§2 = 1/2) = P ({Yx = 0; Y2 = 1} n {Y1 = 1; Y2 = 0}) = 0.48;

P(92 = l) = P (Fi = Y2 = 1) = 0.36.

Consider a new estimate which is a linear combination of 6\ and 02:

9\ = Ai#i + A202,

where Ai and A2 are relevant weights.

The optimum weights will be discussed in later chapters. Pretend that we do not know how to choose the best weights. We might just set each of the weights to be

1/2. It follows that

= P (Sl = 0; S2 = 0) = 0.04;

P (0i = 1/4 = >({Si = l;52 = 0}n{Si=0;S2 = l}) = 0.20;

P (0X - 1/2 = P ({Si = 1; S2 = 1} n {Si = 2; S2 = 0} n {Si = 0; S2 = 1}) = 0.37;

P (0i = 3/4 = P (Si = 1; S2 = 2} n {Si = 2; S2 = 1}) = 0.30;

P = P (Si = 2; S2 = 2) = 0.09.

Note that the probability of making a nonsensical decision has been greatly reduced to 0.13 (0.04+0.09) through the introduction of the information obtained from the second coin. Furthermore, it can be verified that

MSE{9X) = 1/8; MSE{9X) = 1-02 x 1/16 « 1/16. 1.5. A SIMPLE EXAMPLE 10

Thus,

MSE(9l)/MSE(9l) « 0.5.

We see that the MSE of the WLE is only about 50% of that of the MLE.

Due to the small sample size of the first experiment, for arbitrary 9X, the prob• ability of making a nonsensical decision is 9\ + (1 — 0i)2 with a minimum value of

50%. By incorporating the relevant information in a very simple way, that proba• bility is greatly reduced. In particular, if the second coin is indeed a fair coin and

9i is arbitrary, the probability of making a nonsensical decision is then reduced to

\[9\ + (1 - 0i)2] which is less or equal to 12.5%.

We would like to consider the reduction of MSE by using the simple average of

the two MLE's in this case. Let 0i — \9X + \92. For arbitrary 9\ and 02 with sample size of 2, we have

MSE(9X) = Var(d1) = ^61(l-dl)

2 MSE(9X) = Var(9x) +. Bias(Bx)

l 2 = =\0i(i-el) + -92(i-92) + -4(e1-92) .

It follows that, for 0i 7^ 0 or 1,

2 MSE(BX) = |0i(l-0i) + |02(l-02) + f(0i-02)

2 1 192(1 - 92) + l(9x - 92) 4 |0i(l-0i)

Assume that 0i, 92 £ [0.35,0.65], it then follows that

0.35*0.65 < (l-0i)0i;

2 (0i-02) < 0.09;

02(1-02) < 0.25. 1.6. THE SCOPE OF THE THESIS 11

We then have

^ < 0.72.

MSE(91) ~

Therefore, for a wide range of values of 9X and 92, the simple average of 9\ and 92 will

produce at least 28% reduction in the MSE compared with the traditional MLE, 6\.

The maximum reduction is achieved if these two coins happen to be of the same type,

i.e. 9\ = 92. We remark that the upper bound on the bias in this case is 0.15. Notice that the weights are chosen to be 0.5. However they are not the optimum weights

which minimize the MSE of a weighted average of 9\ and B2. The optimum weights will be studied in later chapters.

1.6 The Scope of the Thesis

In Chapter 2 we will show that certain linear combinations of the MLE's derived from two possibly different normal populations achieves smaller MSE than that of the traditional one sample MLE for finite sample sizes. A criterion for assessing the relevance of two related samples is proposed to control the magnitude of possible bias introduced by the combination of data. Results for two normal populations are shown in this chapter.

The weighted likelihood function which uses all the relevant information is formally proposed in Chapter 3. Our weighted likelihood generalizes the REWL of Hu (1994).

Results for exponential families are presented in this chapter. The advantages of using a linear WLE on restricted parameter spaces are demonstrated. A set of optimum weights for the linear WLE is proposed and the non-stochastic limits of the proposed optimum weights when the sample size of the first population goes to infinity are found.

Chapter 4 is concerned with the asymptotic properties of the WLE . The weak 1.6. THE SCOPE OF THE THESIS 12 consistency, strong consistency and asymptotic normality of the WLE are proved when the parameter space is a of Rp,p > 1. The asymptotic results proved here differ from those of Hu (1997) because a different asymptotic paradigm is used.

Hu's paradigm abstracts that of non-parametric regression and function estimation.

There information about 9\ builds up because the number of populations grows with increasingly many in close proximity to that of 6\. This is the paradigm commonly invoked in the context of non-parametric regression but it is not always the most nat• ural one. In contrast we postulate a fixed number of populations with an increasingly large number samples from each. Asymptotically, the procedure can rely on just the data from the population of interest alone. The asymptotic properties of the WLE using adaptive weights, i.e. weights determined from the sample, are also established in this chapter. These results offer guidance on the difficult problem of specifying A.

In Chapter 5 we address the of choosing the adaptive weights by using the cross- validation criterion. Stone (1974) introduces and studies in detail the cross-validatory choice and assessment of statistical predictions. The K-group estimators in Stone

(1974) and Geisser (1975) are closely related to the linear WLE . Breiman and Fried• man (1997) also demonstrate the benefit of using cross-validation to obtain the linear combination of predictions that achieve better estimation in the context of multi• variate regression. Although there are many ways of dividing the entire sample into subsamples such as a random selection of the validation sample, we use the simplest leave-one-out approach in this chapter since the analytic forms ofthe optimum weights are tractable for the linear WLE . The weak consistency and asymptotic normality of the WLE based on cross-validated weights are established in this chapter.

We develop a theoretical foundation for the WLE in Chapter 6. Akaike (1985) reviewed the historical development of entropy and discussed the importance of the maximum entropy principle. Hu and Zidek (1997) discovered the connection between 1.6. THE SCOPE OF THE THESIS 13

relevance weighted likelihood and maximum entropy principle for the discrete case. We shall show that the weighted likelihood function can be derived from the maximum entropy principle for the continuous case.

In the context of weighted likelihood estimation, the i.i.d. assumption is no longer valid. Observations from different samples follow different distributions. The saddle- point approximation technique in Daniels (1954) is then generalized for the non i.i.d. case to derive very accurate approximate distributions of the WLE for exponential families in Chapter 7. The saddlepoint approximation for estimating equations pro• posed in Daniels (1983) is also generalized to derive the approximate density of the

WLE derived from estimating equations.

The last chapter of this thesis applies the WLE approach to disease mapping data. Weekly hospital admission data are analyzed. The data from a particular site and neighboring sites are combined to yield a more reliable estimate to the average weekly hospital admissions compared with the traditional MLE. Chapter 2

Motivating Example: Normal

Populations

Combining information from disparate sources is a fundamental activity in both sci• entific research and policy decision making. The process of learning, for example, is one of combining information: we are constantly called upon to update our beliefs in the light of new evidence, which may come in various forms. In some cases, the nature of the similarity among different populations is revealed through some geo• metrical structure in the parameter space, e.g. the means of several populations are all points on a circular helix. From the value of relevant variables it might then be possible to obtain a great deal of information about the parameter of primary infer• ential interest. Therefore, we should be able to construct a better estimate of the parameter of primary interest by combining data derived from different sources. How might one combine the data in a sensible way? To illustrate some of the fundamental characteristics of our research, it is useful to consider a simple example and examine the inferences that can be made.

14 2.1. A MOTIVATING EXAMPLE 15

2.1 A Motivating Example

In this subsection we consider the following simple example:

Xi = ati + a, ei~ N(0,(T1), i = l,...,n (2.1)

2 Yi = Bti + e'ii e'i^N(0,a 2), i = l,...,n, (2.2)

where the {ii}"=1 are fixed. The {ei}'s are i.i.d.. So are the {e^j's. While Covfa, e'j) =

0 if i ^ j; Cov(ei,e'i) = po\0~2 for all i, and for the purpose of our demonstration we

assume p, o\ and a2 are known although that would rarely be the case in practice.

Note that a bivariate normal distribution is not assumed in the above model. In fact, only the marginal distributions are specified; no joint distribution is assumed although we do assume the correlation structure in this case.

The parameters, a and 8, are of primary interest. They are thought or expected to be not too different due to the "similarity" of the two experiments. The error terms are assumed to be i.i.d. within each sample. The joint distributions of the

(Xi,Yi),i = l,...,n are not specified. Only the correlations between samples are assumed to be known. The objective is to get reasonably good estimates for the parameters without assuming a functional form of the joint distribution of (Xi, Yi) which may be unknown to the investigator.

Assuming marginal normality, the marginal likelihoods for a and 8 are

" (x,~a U)2

L1(x1,x2,...,xn; a) cc T[exp( ' 2 * ),

i=i 1 °l

1 L2{yi,y2,yn; 8) oc ]~[exp(-^ ).

Therefore, ignoring the constants,

, r / x ^(Xj-a tj)2

In U(a) = Y~2 , 2.2. WEIGHTED LIKELIHOOD ESTIMATION 16

2 i r ta\ (Vi-P U) In LM = -}Z- 2 2 • »=i 2 The MLE's based on the Xi and respectively for a and /? are

n

i=i «=.——>

i=l

n E*i J/i

i=i

2.2 Weighted Likelihood Estimation

If we know that \a — B\ < C, where C is a known constant according to past studies or expert opinions, this information ("direct information" or "prior information") might be used to yield a better estimate of the parameter. An extremely important aspect of the problem of combining information that we have just described is that we can incorporate the relevant information into the likelihood function by assigning possibly different weights to different samples. We next show how it is done.

The weighted likelihood (WL) for inference about a is defined as:

Xl X2 WL(a) = L1(x1,x2,...,xn;a) L1(y1,y2, ...,yn;a) , (2.3)

where Ai and A2 are weights selected according to the relevance of the likelihood to which they attached. The non-negativeness of the weights is not assume although the optimum weights are actually non-negative.

Note that y2,yn; a) instead of L2((yi, y2,yn\ B) is used to define

WL(a) since a is of our primary interest at this stage and the marginals distri• butions of the F's are thought to resemble the marginal distributions of the X's.

Note that the WL depends on the distributions of the X's. But it does not depend 2.2. WEIGHTED LIKELIHOOD ESTIMATION 17

on the distribution of Y's. Since X and Y are not independent, the weights of the

likelihood functions are designed to reflect the dependence that is not expressed in

the marginals. Notice that the joint distribution of the X's and y's does not appear

in the WL and no assumptions are made about it.

The maximum weighted likelihood estimator (WLE) is obtained by maximizing

the weighted likelihood function for given weights Ax and A2.

From (2.3) we get

In WL(Q;) = Ai In Li{xx, x2, xn; a) + A2 In Ll(y1,y2, ...,yn; a),

din WL(a) Ai , A2 ^ , \

1 i=i 1 t=i So the WLE for a is

Al „ , ^2 n a = —a + —B.

Ai + A2 Ai + A2 Without loss of generality, we can write

a = Ai a + A2 ft,

where Ai + A2 = 1.

The WLE is a linear combination of the MLE for a and B, a and ft under the

over-simplified model. The weights Ai and A2 reflect the importance of a and J3. In•

tuitively, the inequality A2 < Ai should be satisfied because direct sample information

from {Xj}"=1 should be more reliable than relevant sample information from {Fi}"=1.

Obviously, the WLE is the MLE obtained from the marginal distribution if the weight for the second marginal likelihood function is set to be zero. This may happen when evidence suggests that the seemingly relevant sample information is actually totally irrelevant. In that case, we would not want to include that information and thereby accrue too much bias into our estimator. 2.3. A CRITERION FOR ASSESSING RELEVANCE 18

We call the estimator derived from the weighted likelihood function the WLE in line with the terminology found in the literature although the estimator described here differs from others proposed in those published papers. In particular, we work with the problem in which the number of samples are fixed in advance while the authors of the REWLE were interested in the problem where the number of populations goes to infinity. The distinction here should be clear.

2.3 A Criterion for Assessing Relevance

The weighted likelihood estimator is a linear combination of the MLE's derived from the likelihood function under the condition of marginal normality. We would like to find a good WLE under our model since there is no guarantee that any WLE will outperform the MLE in the sense of achieving a smaller MSE. Obviously, the WLE will be determined by the weights assigned to the likelihood function and the MLE's obtained from the marginals. The value of any estimator will depend on our choice of a loss function. The most commonly used criterion, the mean squared error (MSE), is selected as the of the performance of the WLE. The next proposition will give a lower bound for the ratio ^ such that the WLE will outperform the MLE obtained from the marginal distribution. For simplicity, we make the assumption of equal , i.e. o\ = o\.

Proposition 2.1 Let a be the WLE and a, the MLE of a. If \a — 8\ < C, then

E(a - a) = A2 (8 - a)

\E(a-a)\ < A2C

Var(a) < Var(a). 2.3. A CRITERION FOR ASSESSING RELEVANCE 19

If p < 1 and A2 > 0, then

Ai C12 cS;2 max MSE(d) < MSE(a) iff -± > , 4 v x JJ <*-P\

If p = 1, i/ien max MSE(a) > MSE(a) with equality iff Xi = 1 and A2 = 0. |a—6|

E(a) = a,

£(/3) = b,

E(a) = Ai a + A2 /?.

It follows that

E(a -a) = X2(B- a).

Thus ct is not an unbiased estimator of a unless a = B. However the absolute bias is

bounded by A2C if \B - a\ < C.

It can also be checked that

a2 Var(a) =

Var0) = ^,

2 n (~ a\ Cov(ix,

2 Var{a) = A Var(a) + X\ V.ar0) + 2XX A2 Cov(a, /3)

2 2 (A + A + 2Ai A2 p)°^. 2.3. A CRITERION FOR ASSESSING RELEVANCE 20

It can be seen that the MSE of WLE is a function of a and 8. So is the bias function for a. We would like to choose weights that are independent of a and 8 which are unknown.

Let us consider the MSE's of a and a under the assumption \a — 8\ < C,

2 E(a — a) = Var(a) = 2,

E(a - a)2 = Var(a) + (Bias)*

a .2 2 2 z 2 < (X + X + 2X1X2p)^ + X 2C

(i) If p = 1, we have Var(a) = Var(a), as well as max\p-a\i MSE(a) =

Var(a) . Equality is achieved only when Ai is set to 1 and A2 to 0.

(ii) If p < 1, it follows that

Var(a) < Var(a).

Furthermore,

2 2

2 2 2 max E(a - a) < E(& - a) ^2 XL X2 p + X\C <2 Xx A2 . \P—a\

We then have

max E(a - a)2 < E(a - a)2 > ' v v 2 ?-a|

2 In conclusion, we have max^^a\ 2(i-P)a • °

, The optimum weights which achieve the minimum of the MSE in this case will be positive as will be shown in the next section. Thus, we will not consider the case

when A2 < 0.

We remark that the reduction in the MSE is independent of the assumption that

of equals o\. In general, it can be verified that , if p < 0\jo2, then

max MSE(a) < MSE(a) iff ^ > ^f + i^-^i) (2 4)

\P-a\

= If o\ = of °21 then the equation (2.4) is reduced to the inequality in the previous Proposition, i.e.,

2 A C2 9 max MSE(a) < MSEia) iff -± > K 2 |a-/J|

From equation (2.4), observe that we will have Ai = 1 and A2 = 0 for a number of special cases:

1) C —> oo and p < 1. Here we have little information about the upper bound for the distance between the two parameters, or we do not have any prior information at all.

2) Sf —>• oo and p < 1. We already have enough data to make inference and the addi• tional relevant data has little value in terms of estimating the parameter of primary interest.

3) af —> 0 and p < 1. This means that the precision ofthe data from the first sample is good enough already.

p

0.0 0.2 0.4 0.6 0.8 1.0 Weight Function

Figure 2.1: A special case: the solid line is the max\p_a\

broken line represents the MSE(MLE). The X-axis represents the value of Ax. The parameters have been set to be the following: = l,p = 0.5 and C — 1. 2.4. THE OPTIMUM WLE 22

A special case is discussed here to gain some insight into the situation. The upper bound of mean square error for WLE as a function of Ai is shown in the Figure 1. It can be seen that the upper bound of the MSE for the WLE will be less than that of the MLE if Ai lies in a certain range. From the Proposition 2.1, the cut-off point should be 0.5 which is exactly the case shown in the graph. In fact, the minimum

of the max\i3-a\

2.4 The Optimum WLE

In the previous section we found the threshold for Ai and A2 in order to make WLE outperform the MLE in the sense of obtaining a MSE and which are smaller than that of MLE uniformly on the set {(a, 8) : \8 — a\ < C}. Because there are

numerous cases under which \X and A2 satisfy the threshold, we might want to find the optimum WLE in the sense of minimizing the maximum of MSE and variance.

Proposition 2.2 The optimum WLE under mean squared error is

- 1 + K . 1 3 a — a H :— 8, 2 + K 2 + KP'

2 + K" 2 + K where K = C St

2 Proof: rnax\p_a\

G{Xi, A2) = max\p-a\

dG(XiM) _

axdXi1 = °.

5G(Ai,A2) ex2 ~

Ai + A2 = 1. 2.4. THE OPTIMUM WLE 23

In fact, the above equation has a unique solution. It can be verified that the minimum is achieved at that point by checking the second derivative. The solution to these equations are A^ = and A2 = where K = jjzfijp- Note that the weights X{ and A2 satisfy the condition of ^ > y because ^ = K + 1 > y. This implies that with the optimum weights given above the WLE will achieve a smaller MSE than the MLE according to Proposition 2.1 Thus, the optimum weights for estimating a are

\* _ I+K 1 2+ftT'

A* — 1 A2 — 2+K' The optimum WLE's for a and 8 are

a = l±K _L_ R

" 2+K ~ 2+K lJ->

r 2+K r ' 2+K

where if = ^_p^2. Note that the optimum WLE for ^ is obtained by the argument of symmetry, o Notice that A2 < \ for all K > 0. This implies that, for the purpose of estimating a, a should never get a weight which is less than that of b if we want to obtain the optimum WLE. This is consistent with our intuition since relevant sample information (the {Fjj's) should never get larger weight than direct sample information (the {-^}'s) when the sample sizes and variances are all equal. It should be noted that the WLE under a marginal normality assumption is a linear combination of the MLE's. Furthermore, the optimum WLE is the best linear combination of MLE's in the sense of achieving the smallest upper bound for MSE compared to other linear combination of MLE's. The optimization procedure is used to obtain the minimax solution in the sense that basically we are minimizing the upper bound of the MSE function over the set {{a, 8) :\a-B\< C}. 2.5. RESULTS FOR BIVARIATE NORMAL POPULATIONS 24

2.5 Results for Bivariate Normal Populations

This subsection contains some results for bivariate normal distributions. The follow•

ing results should be considered as immediate corollaries of Propositions 2.1 and 2.2

established in the previous section.

Corollary 2.1 Let

/ 2 2 Xl Px a po N l,..,n. Yi \ po2 o2 If \Px — PY\ < C and p < 1, then the optimum WLE's for estimating the marginal

means are

l M 1 r, v — + y +- v 1 2+M ^ 2+M

^ - 2+M Y + 2+M A '

where M = 2 2 (l-p) a •

Furthermore,

E 2 2 ^ax\itx-iiY\

2 2 max^x-ny^c E(fiY ~ PY) < E(Y - fiY) ; o2 Var(flx) < Var(X) = n 2

Var(fiY) < Var(Y) = —; n

Cov(fiX,py) > Cov(X,Y). n _ Proof: Let ti = 1 in Proposition 2.1. Then Sf — 1 = n. By letting a — X

and b — Y, we can apply Proposition 2.2 to get the optimum WLE . Therefore the optimum WLE will have smaller MSE and variance than the MLE, X, obtained from the marginal normal distribution.

. Observe that fiX + PY = X + Y. If we take Var on both sides, we then have the following

Var (fix) + Var fay) + 2 Cov(fiX, fiy) = Var(X) + Var(Y) + 2 Cov(X, Y). 2.5. RESULTS FOR BIVARIATE NORMAL POPULATIONS 25

It follows that

Cov(p,x,p,Y) > Cov(X,Y). o

From Corollary 2.1, we draw the conclusion that the optimum WLE out-performs

the MLE when \JJLx — p,y\ < C is true.

Corollary 2.2 Under the conditions of Corollary 2.1,

- p fix — X —> 0 as n —» oo.

Proof: Let M = 2 g2 • Thus, M -> oo as n -4 oo. Therefore, -> 0 as n —> co. By Markov's inequality, we have, for all e > 0,

E Y P(\px-X\>e)=p(^\X-Y\>e^j < ^ ^~ ^ _> 0 n ^ oo.o

Corollary 2.3 77ie optimum WLE is strongly consistent under the conditions of

Corollary 2.1.

Proof: Consider

P(\fix-px\>e) = \P(\(jix-X)-{nx-X)\>e)

where M = 2 (flp" g2 as before. But

'(^-^>i)-Klf^l>5)^5Ts),*(*-p),?<^ where Mi is a constant. Furthermore

p0-Y-"-i>2)^—1^ ^w = ^-

where M2 is a constant. Thus,

oo P(\jix-Hx\ > e) < oo. n=l 2.5. RESULTS FOR BIVARIATE NORMAL POPULATIONS 26

By the Borel-Cantelli Lemma (Chung 1968), we conclude that fix nx- o

It follows from Corollary 2.1 that the optimum WLE is preferable to the MLE. How• ever the relevant information will play a decreasingly important role as the direct sample size increases.

Motivated by our previous results, it seems that the linear combination of MLE's is a convenient way to combine information if the marginal distributions are assumed to be normal. A more general result is given as follows when the normality condition is relaxed.

Theorem 2.1 Let Xi, X2,Xn be i.i.d. random variables with E(Xi) = a and

Var(Xi) = o\ < oo, and let Yi,Y2, ...,Yn be i.i.d. random variables with E(Yi) = 6 and Var(Yi) < oo. Let a and B denote some estimates of a and ft respectively.

Assume E(a) = a, E0) = B, and \B — a\ < C. Let a = Ai a + A2 $, where

2 = = an = s0 su ose 2+7?' ^ 2+K d K {i-P)Var{a) • ^ PP P — cor(a,J3) < 1 while

Var(a) is assumed known.

IfVar0) < Var(a), then \E(a - a)\ < A2 C and

Var(a) < I^ar(o;),

2 2 max\a-p\

a — a —> 0 if Var(a) —> 0 as n —>• oo.

Proof: We consider the variance of weighted estimate d of a:

Var(a) = Var{X1a + X2P)

2 2 = A Var(a) + X 2 Var0) + 2 A: A2 Cov(aJ)

2 2 < A Var(a) + X 2 Var{a) + 2 Xx X2 p ^/Var(a) \]var(B)

2 < X\ Var(a) + X 2 Var(&) + 2 Ax A2 Var(a)

= Var(a). 2.5. RESULTS FOR BIVARIATE NORMAL POPULATIONS 27

Thus, we have Var(6t) < Var(a). Furthermore,

2 MSE\p-a\

2 2 = A Var(&) + \\ Var0) + 2 Ai A2 Cov(a,a) + \\ (B - a)

2 2 < (A + \\ + 2 Ai A2 p)Var(a) + \\ C .

Now we are in a position to apply Proposition 2.2 to get the weights for the best linear combination of a and B. Consequently we have the best linear combination,

< a = Ai a + A2 J3, where the weights Ai = |±f, A2 = ^ and K = (1_p) Var(a) • Also,

2 2 we have max\p_a\

Maximum Weighted Likelihood Estimation

3.1 Weighted Likelihood Estimation

Suppose that we are interested in a single population. A parameter or parame• ter vector, di, is of inferential interest. Information from other related popula• tions, Population 2, Population 3 and so on, are available together with the di• rect information from Population 1. Let m denote the total number of populations

whose distributions are thought to "resemble" Population 1. Let ni,n2, ...,nm de• note the number of observations obtained from each population mentioned above.

Let Xi, X2,Xm be random variables or vectors with marginal probability density

functions /i(.; 6X), /2(.; 92),fm{.\ 0m), where Xs = (Xix, Xi2,Xin.y. We assume

that Xn,Xi2, ...,XiHi are i.i.d random variables in this thesis. The joint distribution

of (Xij, X2j,Xmj) is not assumed. We are interested in the probability density

function fi(.;9x) : Oi € 0 of a study variable or vector of variables X, 9X being an unknown parameter or vector of parameters. At least in some qualitative sense, the

/2(-; #2), /m(-; Om) are thought to be "similar to" /i(.; 9X).

28 3.2. RESULTS FOR ONE-PARAMETER EXPONENTIAL FAMILIES 29

For fixed X = x, the weighted likelihood (WL) is defined as:

m rii wL(^)=nn^(^;^)Ai' i=l j=l

where A = (Ai, A2,Am)* is the "weight vector". This must be specified by the analyst. It follows that m rii log WL(0O = E Xi l°9 Mxv> h)- i=l j=l

We say that 9X is a maximum weighted likelihood estimator (WLE) for 9\ if

6i = arg sup WL(0i).

In many cases the WLE may be obtained by solving the estimating equation:

(d/ddi) log WL(di) = 0.

Note that the uniqueness of the WLE is not assumed.

Throughout this thesis, 9\ is the parameter of primary inferential interest. We will use 9\ and 9\ to denote the WLE and the true value of 9\ in the sequel.

3.2 Results for One-Parameter Exponential Fam•

ilies

Exponential family models play a central role in classical statistical theory for inde• pendent observations. Many models used in statistical practice are from exponential families and they are analytically tractable.

Assume that Xn, Xi2,Xln; are independent random variables which follow the same distribution from the exponential family with one parameter, #i, that is,

h(x; 9l) = exp (A^S^) + B^) + Ci{x)). 3.2. RESULTS FOR ONE-PARAMETER EXPONENTIAL FAMILIES 30

The likelihood function for nx observations which are i.i.d from the above distri• bution family can be written as

^(Jn,^,..,^;^) = exp iA^J^Sixu) + nx B^,) + Y/C1(xu) ) . V i=i i=i It follows that

s x n In L^Xn, X12,Xlni; 9i) = Ai(6i) ^ ( v) + i B^) + Constant. 3=1 It then follows that

i=l

1 m (i'^OTlx ) + B;(01))

n _ 1 where Tlxr 1) = ^E5K). j'=i It is known (Lehmann 1983, p 123) that for the exponential family, the necessary and sufficient condition for an unbiased estimator to achieve the Cramer-Rao lower bound is that there exists D(9i) such that

1 <91n Li(Xn,Xl2, Xini;0i) nlD1(91)(T(x )-9l), V0x. 001

Theorem 3.1 Assume that, for any given i, Xij %'~" f(x;9i),j = l,2,...,nj. The

WLE of 6\ is a linear combination of the MLE's obtained from the marginals if

o9\ i.e., the WLE of 9\ will be the linear combination of the MLE's if the Cramer-Rao lower bound can be achieved by the MLE's derived from the marginals.

Proof: Under the condition

din Li(xn,Xi2,...,xini;8i)

= niD1{e1) (T(^) - 0X),

d61 3.2. RESULTS FOR ONE-PARAMETER EXPONENTIAL FAMILIES 31 it can be seen that T(x}) is the traditional MLE for 9\ which achieves the Cramer-Rao lower bound and is unbiased as well. Then we have

din WL =J2X>ni D {d )(T(xi)-B ) 09, 1 1 1

Thus the WLE is given by m

m where U = rii/ ]P Aj/i;. i=i Therefore the WLE of 9\ is a linear combination of the MLE's obtained from the marginals. This completes the proof, o

Thus, for normal distributions, Bernoulli, exponential and Poisson distributions the WLE is a linear combination of the MLE's obtained from the marginal distribu• tions.

Theorem 3.2 For distributions of the exponential family form, suppose the MLE of

9\ has the form of g(T(x1)) where T(x}) is the sufficient statistic. Then the WLE of ( m \ m 2~2 U\ T{xl) I, where U = n^/ ]T \ni. i=i ) i=i Proof: As seen above

dlnLi(sii,3;i2, ...,xlni\9i) = ni (A'MTix^ + B'M).

dQx Consequently, the WLE satisfies

^Xim (X(01)T(^) + B'M) =o i=i which implies that

T A[(9i) (f>A, ^)) +B'M = 0 ,i=i m where *i = n*/ Ajnj. i=i 3.3. WLE ON RESTRICTED PARAMETER SPACES 32

Therefore the WLE of 9\ takes the form

m ( 5>A, TV) i=i m where U = rii/ E ^"i- This completes the proof, o.

Therefore the WLE has the same functional form as the MLE obtained from the marginals if we confine our attention to the one-parameter exponential family. The only modification made by the WLE is to use the linear combination of sufficient statistics from the two samples instead of using the sufficient statistic from a single sample. The advantage of doing this is that it may be a better estimator in terms of variance and MSE.

3.3 WLE On Restricted Parameter Spaces

The estimation of parameters in restricted parameter spaces is an old and difficult problem. An overview of the history and some recent developments can be found in van Eeden (1996). van Eeden and Zidek (1998) consider the problem of combin• ing sample information in estimating ordered normal means, van Eeden and Zidek

(2001) also consider estimating one of two normal means when their difference is bounded. We are concerned with combining the sample information when a number of populations in question are known to be related.

Often, prior to gathering of the current data in a given investigation, relevant information in some form is available from past studies or expert opinions. All statis• ticians must make use of such information in their analysis although they might differ in the degree of combining information. A linear combination of the estimates from each population is straightforward to use. In this section, we assume that the WLE is a linear combination of the individual MLE. We remark that the results of this sec- 3.3. WLE ON RESTRICTED PARAMETER SPACES 33 tion hold for the general case where the new estimator is a linear combination of the estimator derived from each sample. We need the following Lemma on optimization to prove the major theorem of this section. We emphasize that we do not require the Aj's to be non-negative. n

Lemma 3.1 Let A = (Ai, A2,Am)' and Aj = 1. Let A be a symmetric i=i invertible m x m matrix. The weight function, X, which minimizes the objective function, A*^4A, is given by the following formula:

A* = -^i-

provided llA 1 > 0.

Proof: Using the Lagrange method for maximizing an objective function under con• straints, we only need to maximize

G = XlA X - c(l*A - 1),

m subject to 2~2 Xi = 1. i=l Differentiating the function G gives

^ = 2AX-cl. oX

Setting |f = 0, it follows that

X = -A~ll. 2

Now 1 = 1* A = -l^-1!. 2 2 So c = lM-U' Therefore,

VA-1!' 3.3. WLE ON RESTRICTED PARAMETER SPACES 34

Since XtA A is a quadratic function of A;, therefore, A A A has its global minimum at the stationary point. This completes the proof.o As before, we would assume that the parameters are not too far apart. The following theorem will show that some benefit could be gained if we take advantage of such information, namely that, sufficient precision may be gained at the expense of bias so as to reduce the MSE. However maximizing that gain may entail negative weights.

Theorem 3.3 Assume that Xij, i — 1,2,m, j = 1, 2,n; are random variables

with E(Xij) = Qi, and m is the the number of related data sources and rii is th>e

sample size for each source. The marginal distributions are assumed to be known. The joint distribution across those related sources is not assumed. Instead, the covariance

structure of the joint distribution is assumed to be known. Furthermore, Q\,Q2, ...,Qm

t are all finite and \9i — 6i\ < Ci where Ci is a known constant. Let 0 = (Q\,Q2, ••••,9m)

where Qi — Qi(xn,Xi2, Xini) is the MLE for the parameter Qi derived the distribution

of the X^. Suppose E(9i) = Qi and V = cov(0) are known and (V + BB1)'1 is

inveriible.

Then the minimax linear WLE for Qi is:

0~i = ^ ^' i=l

where: , (V + BB^l A* = (Aj, Aj,A^sJt V(V + BB1)-1!'

V = cov(d)mxm; B = (0, C2, C3,Cm)*.

Proof: We are seeking the best linear combination of these MLE's derived from the marginal distributions. As before, let us consider the MSE of the WLE. 3.3. WLE ON RESTRICTED PARAMETER SPACES 35

it Writing #1 = A 9 = J2 Xi 9iy we can calculate i=i

2 MSE(9~i) = E[Y^ X & - Or)} (since E A, = 1) i=i

£ = EE xx3(el-el)(93-el)\ i=l j=l m m = EE *i *3 - 0i) & - Qi)} i=l j=l m m

= E E A' Xi E\.& -°r + ei- Oi)0j ~ 8j + Oj - Oi)) i=l j=l m m

cov Xi Xi Oj) 9 )(9 - - 9,)) = Ei=l Ej=l ( 0u + (di - X 3 m m A A X X d d = EE ' i cov(9i,0j) + EE * i & - M i ~ 0i) i=l j=l - i=l j=l

< A'cou(0)A + \tBBt\ (by the assumptions)

= ' \'(V + BB*)\.

Let A = V + BBt. By applying Lemma 3.1, we conclude that the optimal linear WLE is: m Ol = E Xi ^' i=l

where A' = (A;, AS,XmY = ^B^V o '

It can be seen that the max MSE function is a quadratic form in Ai,A2,...Am and involves only the first and the second moments of the marginal distributions and the joint distributions. The whole procedure consists of two stages. The first step is to work out the functional form of the MSE. The second step is to work out the optimum weights to construct the optimum WLE. Note that the optimum weights are functions of the matrices V and B. Let's consider some special cases: 3.3. WLE ON RESTRICTED PARAMETER SPACES 36

(i) B = 0 and V = a21.

This implies that all the data can be pooled together in such a way that equal weights should be attached to the MLE derived from each marginal distribution since now we

have 6i = 92 = ... = 9m. That is, A* = ^ for all i for i = 1,2,m. This is consistent with our intuition for the i.i.d. normal case.

(ii) B = 0 and V = diag(o\, a2,a2).

The optimum WLE is then given by:

01 2 + + m 5 if + ^ ° "" A ° 01 = m •

k=l k Note that most current pooling methods estimate the population parameter 6 is a weighted average of the estimates obtained from each data set:

m

9= i=1 m

i=l *

2 provided that the Var(9i) = crt are known. The optimal weights under the assumption are proportional to the precision in the ith data set, that is, to the inverse of the variance. This is a simple example of a weighted least squares estimator. Therefore the optimum WLE coincides with the weighted least squares estimator under the assumption B = 0 and V. = diag(a2, a2,a2)-

Furthermore, let us apply Theorem 3.3 to our motivating example since it is also a special case of the theorem. To be consistent with the notation used in Section 2, we still use a and b to denote the parameters of interest. We need to work out the matrix V and BB1 before applying the theorem since both are assumed to be known.

As before, we have

O"2 2 2 Var(a) = Var(b) = ^ and cov(a, b) = 3.3. WLE ON RESTRICTED PARAMETER SPACES 37

To write those in a matrix form, we have

J_ / °2 pa2 V = cov{{a,bY) St \ pa2 o2

As for the matrix BB1, we have

0 0 BB1 = 0 c2

Adding the above two matrices together gives

o po* V + BB* = 2 S 2 2 2 t pa o + C Sf

Therefore,

-i -i o* po* • {V + BB1)-1 2 S po2 o2 + C2S2

2 o2 + C2S2 -po2 s t W + BBH -po*

o2 + C2S2 -po* a4 - p2a4 + C2S2 o2 t -po* o2

Thus, we have

S2 o2 + C2S2 -po2 | ^ 1 (V + BB1)'1! = o4 - p2oA + C2S2o2 2 2 —poz O* 1

(1 - p)o2 + C2S2

4 2 4 2 2 2 a - p o + C S o (1 - p)o2

Furthermore,

si lt(K+ BBt)'H = o^-p2o^C2S2o2 ^ - ^ + ^ 3.4. LIMITS OF OPTIMUM WEIGHTS 38

Therefore the optimum weight function, A* = (A*, A^)', is

(l-p)a2+C2S2 \ 2(1-p)

2 where K = c s?

These are exactly the weights we derived in Proposition 2.2. We therefore have shown that Theorem 3.3 can be used to derive the results we found earlier in Propo• sition 2.2.

3.4 Limits of Optimum Weights

The weights are of fundamental importance. Therefore it seems worthwhile to inves• tigate the behavior of the optimum weights as the sample sizes get large. Since the weights are all fixed numbers for fixed sample sizes, thus the limits are non-stochastic

as well. For simplicity, we assume that C2' = C3 = ... = Cm — C, where C is a constant.

Theorem 3.4 Let B = (0, C, C,C)\ C > 0. LetV = cov{0), where 0 = (§u 02,Bmy.

Re-write V = o\W, where the of is a function ofni andW is a function ofn,, n2, ...nm.

2 l l Assume lim o~ (ni) = 0 and lim W(ni,n2, ...,nm)~ = W0~. n\—too ni—>oo Then

1 {V + BBty ! J J lim ^V* lim mxm

WQ1BBtWg where S„ = WrT1 -

Proof: From the matrix identity (Rao 1965), 3.4. LIMITS OF OPTIMUM WEIGHTS 39 it follows that

^W^BB^W'1 -1 1 + ^BW-'B

\W-lBBlW -l w-1 - -1 1 + \BW~XB

1 -1 W^BB'W' = w al + BlW-lB'

Therefore,

(a2W + BB1)-1! lim A* = lim nX^oo V(afW + BB*)-n Q-jjo-fW + BB1)-1!

ni^oo al!l(a2W -rBB^-1! fw~l — WzlMB^Wzl\ i \VV a2+BtW-iB ) 1

= lim 1 t 1 m-yoo -,t (xxr-i _ W~ BBW- \ A \VV al+BW-lB )

(W-l _ W0^BB^\ 0 V B'W^B )

t (wi _ w0->BBtw-i\ 0 V BW^B ) 1 LJmxmJ-

_1 Wg1BBtW^ l where Sn BtWg^B -. This completes the proof, o

Corollary 3.1 Under the conditions of Theorem 3.4, assume that V = Diag(a\, of,a, m

2 2 2 If a (ni) —> 0 and a (ni)/a (ni) —> ji > 0, for i > 2, as nx —)• oo and ]C 7i > 0, then i=2

lim A* = (1,0,0,...,())'. 3.4. LIMITS OF OPTIMUM WEIGHTS 40

Proof: Consider the following

WZxBBlWrl

t i n- /, x Diag{l,j2,-,7m)BBDiag{l,^-nm)\ ,

= [Diag(1,72, ...,7m t . U BtDiag(l,-y2,-,'Ym)B ( C2(0,72,-,7m)'(0,72,-,7m) -,1m) Diag (l,j2, m 2 C £7* V i=2 7i 7m 7 = (1,72,-,7m)* - IS I 0,72 E '-"' E ' ) i=2 i=2 E7i V / (1,0,...,0)(. i=2

It follows that 1*51 = 1. This implies that

lim A* = (1,0,...,())'. o n—Kx> Next we will consider the case where the covariance matrix is not a diagonal

matrix but has a special form. Suppose that cov(6) = ^-V0 where

„m—1 / 1 P P' \

1 P „m-2 P (3.2)

y pm—l pm~2 pm~^ where /) / 0 and p2 < 1. This kind of covariance structure is found in first order autoregressive model with lage 1 effect, namely AR(1) model, in time series analysis. If the goal of inference is to predict the average response at the next time point given observations from current and a fixednumbe r of previous time points, then this type of covariance structure is of our interest. By Proposition 2.2, the optimum weights

will go to (1,0) if m = 2. This is because K = ,x_ ?^/nx goes to infinity. As a result, the optimum weights A = (^f-, 2+K) ~~^ (1>0)- 3.4. LIMITS OF OPTIMUM WEIGHTS 41

A 2

Corollary 3.2 For m > 2, if cov(0) = ^-Vb, where V0 takes the form as in (3.2), then

P+ 2 2 umA IL, D0-p ' A)-p '""' D0-p*'D0-p*J '

2 where D0 = 1 + (m - 2)(1 - p) .

Proof: It can be verified that

/ 1i -p n0 0n 0n \ -p 1 + p2 -p 0 0 l 1 = WV l - p2 2 0 0 -p 1 + p -p 0 0

It follows that

C (-P, i - p + p2, (i - p)2,(i - p)2, i - y, 1 -p2 P

B1WQ11

B^^B 1 -p2 We then have

Do-P (-p, 1 - p + p2, (1 - p)2,(1 - p)2,1 - pf. (1 - P2) A) 3.4. LIMITS OF OPTIMUM WEIGHTS 42

Therefore,

1 - p ( \

(1- pf i-p + pz

(1- pf Do-P (1 - Pf SI 2 (1 - P )D0

(1- pf (1 - pf 1 -- p V 1-p 2 2 P(I-P+P2) p{i-p) p(i- ) pii-py* 1 P 2 -p + 1- p \ D00 ' ' D0 D0 D0 D0

It can be verified that lim 1*51 =-^-^(1 — j^). This completes the proof, o

Note that the limits of A2 and A^ take different form than that of A3,A^_x. The

_1 reason is that B = (0, C, C,C) ignores the first row or column of W0 when

l multiplied with it and the last row of W0~ is different than the other rows in that matrix. Positive Weights If the estimators are uncorrelated, that is, p = 0, then we show below that asymp• totically all weights are positive. Furthermore, all the weights are always positive in this case.

Theorem 3.5 As before, let V be the covariance matrix of (9\, 92, • 9m) and let

B = (0, C, C, ...,Cy. Furthermore, assume that V = diag(o\,o\, ...,ofJ, where the of, i = 1,2,m, are known. Then A* > 0 for all i = 1,2,m and for all n.

Proof: Observe that V~l = diag (-\,-\-).

The matrix (V + BB*)-1 can be written as

(V + BB')-1 = V-1 - V ' l +

1 where t0 = B'V^B and Tx = (V^B^V- ). 3.4. LIMITS OF OPTIMUM WEIGHTS 43

It may be verified that ™ Q2 ^ of' i=2 1 and 0 0 0 \

2 2 0 c c

2 0 c Therefore, we have

(V + BB1)-1 = diag(\,\,...,±-) Ti \°1 °2 Vi/ l + *o 0 n \ 0 0 0

2 2 0 n c c 1+to

2 2 o o ... 4- c c Furthermore,

1 1 1 1 1 (V + BB )' 1 diag 2 ' 2' ' Ti 1 °\ ^2 1 + to 0 0 \ 0 0

2 2 l 0 1 c c 1 °l T?O-2 l + *o

2 1 0 c 1 i / V / V / / 1 \

C2 1 1 a2, a: 1+40 u

2 V- C

l+t0 \ "l, l+to u since

m 1 io 1 1

1+V <7?l+t0" 3.4. LIMITS OF OPTIMUM WEIGHTS 44

Hence, •1 1 m 1 1' (V + BB')'1 1 = 4 + T-V T -2 •

Therefore,

K = ^r—>o. J_ + _J_ V -L 3=2 3 Also, for 2 < i < m, I I l+ A: = * \ > o.

This completes the proof, o Negative Weights

The above theorem gives conditions that insure positive weights. However, weights

will not always be non-negative. In Corollary 3.2 , we have

D + l 2 1 1 1 lim A* = (i, -f> ° P(~P + P ), PJ ~ Pf ?. Pi - Pf Pi ~ P) V ?

2 2 2 2 "l^oo y ' DQ- p D0 - p ' '"' DQ - p ' DQ - p J

2 2 where p < 1 and D0 = 1 + (m — 2)(1 — p) . It follows that D0 > I provided

m > 2. Hence the sign of those asymptotic weights taking the form ^1^,2 are m determined by the sign of p. Notice that hm A* = 1 and lim Ai = 1*. Hence m

lim A2 = — 2^2 bm A*. Therefore, we have the following. Til—S-OO j_gni—•oo

1) If p > 0, then lim A* > 0, i = 3,m, and A2 < 0. Jll—»oo

2) If p < 0, then lim A* < 0, i = 3,...,m, and A2 > 0.

A general result on negative weights is shown below.

1 Theorem 3.6 Assume B = (0, C, C,C) and m > 3. Lei kFo" = (aij)mxm. Let

m 2 e a i = YI ij- Assume au — -^r— > 0. Then the asymptotic weight, lim A*, is negative 3=2 h if ^ < for some i > 1. 61 E et 3.4. LIMITS OF OPTIMUM WEIGHTS 45

Proof: By Theorem 3.4, we have

(V + BB*)-1! S l lim A* = lim , — mxm 1 n^oo n^oo V(V + BB*)- ! l*5mxml

1 where 5mxm = W0 - °Btiy-iB° • We then have

l i W0 B = (C^aii,C^a2i,...,C^ami) = C(ei, e2,em)*. . J=2 j=2 j=2 Therefore,

WQ-^B'WQ-1, B~*wFrB

f C(ei,e2, ••.,em) C(ei,e2, -,em)

(0,C,C,...,C)C(e1,e2,...,emy

2 C (ei, e2, em)'(ei, e2, • em).

= (an+ei,ai2 + e2,...,ami+em)' m

i=2 ( ^ an + ei ^ eie2

ai2 + e2 e2e! e2 e2em m

i=2 c ei C77je2 ... e 1 y m m y V / 3.4. LIMITS OF OPTIMUM WEIGHTS 46

It follows that

( e \ / on + ei ^ ei E * i=l • m e ai2 + e2 e2 E i 9 1 — m i=i E ei \ amX + em) J i=2 I 6m J^J I \ 1 = 1 /

ei(ei+E e; an + ei j=2 m

i = 2 m e2(ei+ E ev) a2i + e2 i=2 m E e, i = 2

(ei+E e0 a i + i=2 m m i=2

e\ e ei em e: t , a i - ——,a2 —) . = (an m 2 mi e E * i=2 i=2 i=2 1 Note that W0 is symmetric which implies that the matrix W0 is also symmetric. Thus, we have

m m

3=2 3=2 By the assumption of the theorem, we have

l^mxml — °11 m^~~ > 0, Eei i=2 Therefore, lim A* < 0 if an < ^r1- for any i such that i > 2. o ni->oo ti i = 2 We can construct a simple example where the non-asymptotic weights can actually be negative. Suppose that information from three populations is available and one single observation is obtained from each population. Further, we assume that the

three random variables, X\, X2, and X3, say, follow a multivariate normal distribution 3.4. LIMITS OF OPTIMUM WEIGHTS 47

with covariance matrix as follows:

/ 1.0 0.7 0.3 \ V = I 0.7 1.0 0.7 V 0.3 0.7 1.0

Also, assume that C2 = C3 = 1. Thus B = (0,1,1)'. It follows that

( 1.0 0.7 0.3 \ ( 0 0 0 ^ ( 1.0 0.7 0.3 X V + BB1 = 0.7 1.0 0.7 + 0 1.0 1.0 0.7 2.0 1.7 0.3 0.7 1.0 0 1.0 1.0 0.3 1.7 2.0

Hence, approximately

( 1.67 -1.34 0.89 ^

(V + BB1)'1 = -1.34 2.88 -2.24 0.89 -2.24 2.27

We then have

(V + BB) 1 = (1.22, -0.71, 0.92)*, and

1\V + BB1)'1! = 1.43

It follows that

(V + BB*)-1! A* = (0.85,-0.49,0.64). V(V + BB')-1!

Thus, A2 is negative in this example. The negative weights in this example might be due to the collinearity. If we replace 0.7 by 0.3 in the covariance matrix, all the weights will be positive. Chapter 4

Asymptotic Properties of the WLE

Throughout this chapter, 9\ is the parameter of primary inferential interest although in their extension of the REWL, Hu and Zidek (1997) consider simultaneous inference for all the 0's. Recall that for fixed X = x, the weighted likelihood (WL) is defined as: m rii nii/i^i)*. w i=l j=l

where A = (Ai, A2,Am)* is the "weight vector". It follows that

logWL(x, 0i) = E E Xi log hfai'i i=l 3=1 We say that 0\ is a maximum weighted likelihood estimator (WLE) for 0i if

0i = arg sup WL(x, 0i).

We will use 0J to denote the true value of 0i in the sequel. The asymptotic results proved here differ from those of Hu (1997) because a dif• ferent asymptotic paradigm is used. Hu's paradigm abstracts that of non-parametric regression and function estimation. There information about 0i builds up because the number of populations grows with increasingly many in close proximity to that of 9\.

48 4.1. ASYMPTOTIC RESULTS FOR THE WLE 49

This is the paradigm commonly invoked in the context of non-parametric regression but it is not always the most natural one. In contrast we postulate a fixed number of populations with an increasingly large number samples from each. Asymptotically, the procedure can rely on just the data from the population of interest alone. These results offer guidance on the difficult problem of specifying A.

We also consider the general version of the adaptively weighted likelihood in which the weights are allowed to depend on the data. Such likelihood arises naturally when the responses are measured on a sequence of independent draw on discrete random variables. In that case the likelihood factors into powers of the common probability mass function at successive discrete points in the sample space. (The multinomial likelihood arises in precisely this way for example). The factors in the likelihood may well depend on a vector of parameters deemed to be approximately fixed during the sampling period. The sample itself now "self-weights" the likelihood factors according to their degree of relevance in estimating the unknown parameter.

In Section 4.1, we present our extension of the classical large sample theory for the the asymptotic results for the maximum likelihood estimator. Both consistency and asymptotic normality of the WLE for the fixed weights are shown under appropriate assumptions. The weights may be "adaptive" that is, allowed to depend on the data. In Section 4.2 we present the asymptotic properties of the WLE using adaptive weights.

4.1 Asymptotic Results for the WLE

In this section, establish the consistency and asymptotic normality of the WLE under appropriate conditions. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 50

4.1.1 Weak Consistency

Consistency is a minimal requirement for any good estimate of the parameter of interest. In this and the next sub-section, we will give a general conditions that en- sure the consistency of the WLE's. Our analysis concerns a-finite probability spaces

(X, T, fii),i = 1,2 under suitable regularity conditions. We assume that the probabil• ity measures fa is absolutely continuous with respect to one another; that is, suppose there exists no set (event) E G T for which fa(E) = 0 and faj(E) ^ 0, or fa(E) = 0 and fJ-j(E) ^ 0 for i ^ j. Let v be a measure that dominates fa, i = 1, 2, for example

(pi + fa>)/2an d fr, i = 1,2. By the Radon-Nikodym theorem (Royden 1988, p. 276), there exist measurable functions fi(x), i = 1,2, called probability densities, unique up to sets of (probability) measure zero in v, 0 < fi(x) < oo (a.e. v),i = 1,2, ...,m such that

p.i{E)= j fi(x)du(x), i = l,2. for all £6^.

Define the Kullback-Leibler information number as:

f K(fllf2) = E1^log^^ = J logj^h{x)du{x).

In this expression, log(fi(X)/f2(X)) is defined as +oo if fi(x) > 0 and f2(x) = 0,

so the expectation could be +oo. Although log(fi(x)/f2(x)) is defined as — oo when

fi{x) =0 and f2(x) > 0, the integrand, log(fi(x)/f2{x))fi(x) is defined as zero in this case. The next lemma gives well known result.

Lemma 4.1 (Shannon-Kolmogorov Information Inequality) Let f\(x) and f2(x) be densities with respect to v. Then

K(f1,f2) = E1 (log^^j = j logj&hWM*) > 0,

with equality if and only if f\(x) = f2(x) (a.e. v). 4.1. ASYMPTOTIC RESULTS FOR THE WLE 51

Proof: (See for example, Ferguson 1996, p. 113).

Let 9\ denote the true value of 9i and 9° — (9\, B2,9m), for ^ e 9,i =

2,3, ...,m. Throughout this chapter, the following assumptions are assumed to hold except where otherwise stated.

Assumption 4.1 The parameter space O is compact and separable.

Assumption 4.2 For each i = l,..,m, assume {Xij : j = 1,2,rij.} are i.i.d. random variables having common probability density functions with respect to v.

Assumption 4.3 Assume fi(x,9\) = fi(x,9[)(a.e.) v) implies that 9\ = 9[ for any

6\, 9[ G © and the densities fi(x, 9) have the same support for all 9 G O.

Assumption 4.4 For any 8\ G 0 and for any open set OC0, assume

0 sup \log(h0?inf IM/i(x; 9 x)/h(x; 6>x))|

sup \log(h{x^)/h{x; d,))\ inf |/op(/i(a;;0f)//i(a;;0i))| are each measurable in x and

1 112 ^[supiiogyy; ;!]'

Oiee JiK-X-ifVi) where K > 0 is a constant independent of 9{, i = 1, 2,.., m.

(n) n) n) Assumption 4.5 Let n = (m, n2,nm). Assume A = (Ax , A2 ,A^)' satisfies

t t \W^W = (w1,W2,...,Wm)=(l,0,...,0) while

2 max nl max ItUj — A-^l < O (n)~6) as m —> oo, l

/or some 8 > 0.

Assumption 4.5 will be satisfied if nif i = 2,m are in the same order of nx and also \wi-\f]\ = O (n^)/2) . 4.1. ASYMPTOTIC RESULTS FOR THE WLE 52

In this chapter, we require the density functions to be upper semi-continuous. Let |.|| be defined as Euclidean norm; that is

W2 = CE°Z)1,2> 1=1 for any x = (xi, ^2, ...jXg)'.

Definition 4.1 A real-valued function, g(0), defined on the parameter space, Q, is ,

said to be upper semi-continuous on O, if for all 9 G © and for any sequence 6n G Q

such that lim \\9n — 9\ \ = 0, we have g(9) > lim supA function is called lower

semi-continuous if g(9) < liminfg(#n) whenever lim \\9n — 9\ \ = 0. We need to show that sup U(x] 9i) = 1°9JJ0^ for some open set O is measurable if fi(x; 9i) is upper semi-continuous.

Proposition 4.1 If U(x; 9,) is lower semi-continuous in 9\ for all x, then sup U(x

\\6i-e0\\

sup U(x;91) = sup U(x;91) (4.2)

\0i-e\\

Let s = sup U(x; 9X). It follows that for any 9\ G D, U(x\ 9\) < s. For any given |0i-0?| 0, there exist 91(e) such that \9l(e) - 9°\ < R and

U(x,9{(e)) >s-e/2. (4.3)

Since D is a dense subset of {9\ : \6\—9\\ < i?}, then there exist a sequence 9^(6) G D such that lim 9^\e) = 91(e). Since U(x;9i) is lower semi-continuous, then, for fixed n—foo e and some S > 0, there exist 0** G D such that — 0JI < 5 and

[7(x; 0J) - e/2 < {/(a; 0**)- (4.4) 4.1. ASYMPTOTIC RESULTS FOR THE WLE 53

Thus, combining equation (4.3) and (4.4), it follows that for any given e > 0, there exist 9{* e D such that U(x\9\*) > s - e. Equation (4.2) is then established. We then have

{x : sup U(x; #i) < a} |0i-0?| (0 = n?=1{x : U(x; 9?) < a, 6? £ D}.

Since {x : U(x;9i) < a} is measurable. Therefore the set {x : sup U(x; 9±) < a} \ei-e°\

Therefore, if f\(x;9i) is upper semi-continuous in 9\ for all x and the open set

is defined as {9X : \9\ — 9\\ < R, R > 0}, then the Assumption 4.4 is automatically satisfied because loq^'6A is lower semi-continuous and

3 /i(x;0i) AM?) sup log sup log 9i-e°\

for any denumerable set D dense in {9X ; \9\ — 9°\ < R} by Proposition 4.1. The

log measurability of inf fi(x,0i) also follows. |0i-0?|

2 Lemma 4.2 Let Aij(x; 9) be measurable function in x for all 9. If Egi[A(Xij; 9i)] <

K for some constant K independent of 9i, then

1 i=i J=I

for any 92,93, ...,0m, 9{eQ,i = 1,2, ...m.

n Proof: Put Aij = Aij(Xij) and Bi3 = (w^ - x\ ^)Aij. Observe that by the Cauchy- Schwartz Inequality, for any i, i',j, and /, we have

(»)i 1 2 2 Ego\BijBi'ji\ < \wi — X) - X^f I ^1 EeoA jEeoA ,., 4.1. ASYMPTOTIC RESULTS FOR THE WLE 54 in view of the finite second moment condition on the Aij(X). Further,

/ i HI. <«i \ * 771 7l{

1 1=1 3=1 i=l j=l ^ m m ni ?V

1 i=i i'=i j=l j'=i

< O (—- 1 max n\ max liu; — A,-n^|2 \nf/l

< o 1+S 0, as ni —> oo. n by Assumption 4.5. It then follows that

771 71, 771 7li sup ±£i>-A,("'M« < ^ £ £ h ->

1 t=i j=i

for any 91,62,93, ...,8m,0i e Q,i = 1,2, ...,m. o

If we set Aij = log ^ffilg)) and

It then follows from Lemma 4.2 that

—<5ni(X, 0i) 0 (4.5) ni

for any 02, 03, 0m, 0; G O, i = 2, 3,m. The above result will be used to establish the following theorem which will be applied to prove the weak consistency and asymptotic normality of the WLE. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 55

Theorem 4.1 For 9, ^ 9\, ( m n; m i=l j=l i=l j=l

for any 92,6>3,0m, 9{e&,i = 2,3,m.

Proof:

With Pgo measure 1, m rii m nt

|W TT TT , , ^ s i=l j=l i=l j=l if and only if

W„,(X,91)>0. where

^n),nJl{Xj3\9l) ni~l~{ ji\Xj,Vi y Observe that

nl~l~{ Jl{Xj,Vl)

1 / /i(^ij,#i) o / a N

By equation (4.5) we have 1 1 ~ Peot

-Sni (X, 9\) Hi

for any #i € 0 and any 92, 95,9m By the weak law of large numbers,

i , /ipfij-;fl?) h{x^e\)

ni /i(^ij;0i) /I(AIJ-;0I) 4.1. ASYMPTOTIC RESULTS FOR THE WLE 56 when 6\ ^ 9\ by Lemma 4.1 and Assumption 4.3.

Therefore lim P0O (Wni > 0) = 1 for all 6X ^ 0?, 92,03,9m. o ni->oo

3 6 For any open set O, let Zi3(0) = m^log^^ ' ^. We are now in a position to prove the weak consistency of the WLE.

Theorem 4.2 Suppose that logf\(x;9) is upper semi-continuous in 6 for all x. As•

sume that for every 9\ ^ 9\ there is an open set Ng1 such that 9\ € Ngx C O. Then for any sequence of maximum weighted likelihood estimates 0(™^ of'9\, and for all e > 0,

lim Pgo (||0ini) -0°|| > e) =0, ni—>oo \ /

for any 92,93, ...,9m,9i e Q,i = 2,3, ...,m.

Proof: The proof of this theorem given below resembles the proof of weak consistency ofthe MLE in Schervish (1995).

For each 9\ ^ 9\, let NJfi\k = 1, 2,... be a sequence of open balls centered at 0i and of radius at most l/k such that for all k,

(*+i) N: C Ng^ c e.

It follows that Hfcli NJ,? = {^i}- Thus, for fixed Xij, Zij(Ng?) increases with k and therefore has a limit as k —>• oo. Note that loq{l^,9A is lower semi-continuous in 0i for each x. So,

lim ZIJW) > log J

The limit in the last expression is not required to be finite. Observe that

fi(Xij-,9^) fi{Xif, 9\) log E = Ea inf log < EgO SUP < oo, fi(Xij',9[) 1 e[ee f\(Xij;9[) 4.1. ASYMPTOTIC RESULTS FOR THE WLE 57 by Assumption 4.4. This implies that Z\j(Ng') are integrable. Using the monotone convergence theorem, we then have

) f lim EeoZ^N^) = E* lim Zlj(< ) > E*(logffi''®\) > 0.

Thus, we can choose k* = k*(0i) so that EgoZij(N^k ^) > 0. Let be the interior

of 'NJfi ^ for each Q\ G 0. Let e > 0 and N0 be the open ball of radius e around d\. Now, 0 \ A^o is a compact set since 0 is compact. Also,

{JV£:0i"ee\JVo}

7 is an open cover of 0 \ A^. Therefore, there exist a finite sub-cover, A ^, Ng*2, NQP such that EeoZ^iN*,) > 0, I = 1, 2, ...,p. We then have

^(ll*iB,)-0ill>e) 0

= P o e for some e (flf

;=i

;=i V 1 i=i j=i /

p / 1 m n; _. m = E- EE"MNk) + - EE(A*(n) - ^)^-(^) < o 1=1 \ 1 i=l j=l 1 i=l j=l P / -. ni ^ m rij \ • = E^ - E^(^) + - EE^n) - «*) w.) < o . (=1 \ 1 j=l 1 i=l j=l /

If we show the last expression goes to zero as n\ goes to infinity, then

P*> (ifr0 ~ei\\ >e) —>0 as m'^oo.. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 58

2 Since EgoZ^N*,) < E9i (sup llog^f^]l) < K < oo by Assumption 4.4, it

1 ' / follows from Lemma 4.2 that

1 m Hi

n-i 1 i=i j=i Also, 1 ni p

- ZXj(N*0[) -A EgoZXj(N*g[) > 0, for any 6[ e 9 \ N0, by the Weak Law of Large Numbers and the construction of NZ. Thus, for any

6[eO\N0,

-X^(Ar;,) + -^^(ASn)-^)^(iV;,)<0 —>0 as m'-xx).

1 J=l 1 i=l j=1 / This implies that

P / ni ^ m rii \

P A E *° - E ^ Wx) + - E E( ^ - ^)^(^) <0U0aam4oo. 1=1 \ 1 j=l 1 i=\ j=l J Thus the assertion follows, o

In the next theorem we drop Assumption 4.1 which assumes the compactness of the parameter space and replace it with a slightly different condition. At the same time we keep Assumption 4.2-4.5.

Theorem 4.3 Suppose logfi(x;9) is upper semi-continuous in 9 for all x. Assume

that for every 9X ^ 9X there is an open set Ngx such that 9X e Ng1 C O . In addition,

assume that there is a compact subset C of Q such that 9X G C and

J G 0 < Ego I inf log ;)^a[ < K < oo, (4.6)

\e[£ccn@ fx(Xi:j-9'x) J

c where K is a constant independent of 92, •••,9m.

Then for any sequence of maximum weighted likelihood estimates of 9X and for all e > 0

Bl) lim P*>(||0i -01||>e) =0, ni—>oo \ / 4.1. ASYMPTOTIC RESULTS FOR THE WLE 59

for any 92,93,9m, 9{eQ,i = 2,3,ra.

Proof: Let and e be as in the proof of Theorem 4.2, and let N*i,N*2,N*p be

an open cover of C \ NQ with EgoZij(N^) > 0. Then

P* (\\%ni) - %\\ > *) P

p N c < Y, e° (^ini) e k) + (^ni) e c n e) *=i

k=l (1 _nl_ 1 _m_ \ It follows from the proof of Theorem 4.2 that the first term of last expression goes to zero as n goes to infinity. By the Weak Law of Large Numbers, we have

I E *« 0, by equation (4.6).

m I \ — P 0 If we show that ^ ^ 2~^(X Wi)Zij(Cc 6) —0, then the second expression ni fl i=ij=i goes to zero. Consequently, the result of the theorem will follow. Observe that m rii

A ^EE(ASn)-^)%(GCn0) = E^( in)-^E^ne). ni i=l i=\ i=l Ul • Ui j=l By the Weak Law of Large Numbers, it follows that

c IS a n e 3 inf^l€c ne l°Q f\x --9i)) ^ ^ number by the condition of this theorem. By Assumption 4.5, it follows that

Tl' ( \ — {\r - wt) —> 0, as m ->• oo. (4.8) n\ 4.1. ASYMPTOTIC RESULTS FOR THE WLE 60

Combining equation (4.7) and (4.8), we then have

, m rii

1 1=1 j=l 4.1.2 Asymptotic Normality

To obtain asymptotic normality of the WLE, more restrictive conditions are needed. In particular, some conditions will be imposed upon the first and second derivative of the likelihood function. For each fixed n, there may be many solutions to the likelihood equation even if the WLE is unique. However, as will be seen in the next theorem, there generally exist a sequence of solutions of this equation that are asymptotically normal. Assume that 0i is a vector defined in RP with p, a positive integer, i. e.

0i = (011, 012, • 0\P) and the true value of the parameter is 0° = (0n, 0°2,6\p). Write

ip(x; 6i) = ——logfi(x; 0i), a p column vector, (701 and

tp =—ip(x;9x), a p by p matrix.

Then, for any j, the Fisher Information matrix is defined as

Assuming that the partial derivatives can be passed under the sign in

f fi(x;6x)di/(x) = 1, we find that, for any j,,

E^{Xlj^) = J ^^^jf^x^duix)^ J ^f1(x;e°1)du(x)=0, (4.9) so that I(0i) is in fact the covariance matrix of tp,

0 I(e°1) = coveoiJj(Xlj;6 1). 4.1. ASYMPTOTIC RESULTS FOR THE WLE 61

If the second partial derivatives with respect to 9\ can also be passed under the integral sign, then f(d2/d92)fi(x; 9\)dv(x) = 0, and

0 I(91) = -Eeoi;(Xl;9 l).

To simplify the notation, let

WLni(x;91) = —logWL(x;81) and WLni(x; 0°) = —logWL(x; 0i)|„1=flo

In the next theorem we assume that the parameter space is an open subset of RP.

Theorem 4.4 Suppose:

(1) for almost all x the first and second partial derivatives of fi(x;9) with respect to 9 exist, are continuous in 9 £ Q, and may be passed through the integral sign in

ff1(x;9)du{x) = l;

(2) there exist three functions G\(x), G^O^) and Gz(x) such that for all 9<1,---,9m,

Ego\Gi(Xij)\2 < Ki < oo,l = 1,2,3,, i = l,...,m, and in some neighborhood of 9\ each component of ip(x) (respectively ip(x)) «s bounded in absolute value by G\(x)

(respectively G2(x)) uniformly in 9\ G 6. Further,

&logh{x;e,)

d9ikld9ik2d9ik^ ki,k,2,kz = 1, ..,p, is bounded by Gs(x) uniformly in 9\ G 0;

(3) I(9\) is positive definite.

Then there exists a sequence of roots 9^^ of the weighted likelihood equation that is weakly consistent and

^(f'-^AiV^^))-1)), as m->oo.

Proof: 1. Existence of consistent roots.The proof of existence of consistent roots resembles the proof in Lehmann (1983, p 430-432). Let d be small enough so that 4.1. ASYMPTOTIC RESULTS FOR THE WLE 62

Sa = {9l : ||0! - 0°|| < a} C 6 and let

4(a) = {x : log WL(x; 0?) > log WL(x, 0?) for all boundary points 9\ of Sa}

= {x : log WL(x; 0°) > sup log WL(x, 0*)}. e^eSa

The set In(a) is measurable since log WL(x.; 9\) is measurable and sup06e5a logWL(x, 9\) is measurable by Proposition 4.1.

We will show that Peo(X. 6 7n(a)) —>• 1 for all sufficiently small enough a. That is,

for any given e, there exist N€ such that, for any n > Ne, we have Peo(X. e /n(a)) >

1 — e. This implies that In(a) is not an empty set when n > Ne. To prove the claim,

we expand the log weighted likelihood on the boundary of Sa about the true value 9\ and divide it by to find

—log WL(x; 9\) - —log WL(x; 0?) = Si + S2 + S3 ni rii

where

Kl=l 2 P P 2ni 1 Jfel=l*2=1 j v v p

(^i ~ 01*1) (0i*2 - 0?*2)(0i*3 - 0?*) 6n"(Ei E E 3 *1=1 *2 = 1 *3 = 1

i=i j=i and

(^dlogfiixij-^i) A 4W = EE 39 10i=0?> i=i j=i i*i

l8,=,t =1 j=1 ' 4.1. ASYMPTOTIC RESULTS FOR THE WLE 63

a; and |Cfcifc2fc3( ij)l < 1 by assumption.

By the Weak Law of Large Numbers and (4.9)

— 2, — k=e?—^° as ru-too, (4.10)

2 1 ^ d logfl(Xlj;9l) pe0 q0

— Z_^ MM k=«? —>"V^i) flsm->oo (4.11)

where Ikik2{6i) is the (fci,fc2) element of the information matrix 1(9®). By Lemma 4.2, we then have

1 v~>v>/,(n) sdlogfi{Xij\6\). pe0 , .

Wi 0 aS ni 412 — 2^ ZJA' ~ > 7ti~^ l*i=«? ~^ ^ °°- ( )

32;

a -LD 1-^)-I^--IM^0 m->«>•. (4-13) i=i j=\ ^ m rii A n) w 0 flsn 4 14 ^Eni i=i (j=i ! - ')W^ i^°°' ( - ) To prove that the maximum difference -^log WL(x; 9\) — ^log WL(x; 9®) over

all the boundary points 9\ of Sa is negative with Peo probability tending to 1 for sufficiently small a, we will show that, with Pgo probability tending to 1, the maximum

of S2 for all the boundary points 9\ of Sa is negative while and IS3I are small

compared to |52|.

We begin with S\. Observe that

x Wi —Akl{x) = — 2^ M——1*1=*? + zr.l^Z^( i ~ >—m k=*?- 1 j=i 1=1 j=i

By (4.10) and (4.12), it then follows that, for any 92, •••,9m,

-4(X)-^0 as 00. (4.15)

Further, for any boundary point #J such that — = a, we have

iSii<^Ekwx)i.

Bi =1 4.1. ASYMPTOTIC RESULTS FOR THE WLE 64

For any given a, it follows from (4.15) that with Pgo probability tending to 1

2 -V \Akl(X)\

with Peo probability tending to 1. Next consider

p p

lfcl _ 2S2 = - E E^ 0°kl)Iklk2(0i)(0ik2 - 0ik2)

k l fc2 = l 1= (4.18)

+ E - +.W0?))(*i*a -

1 fci=ifc2=i For the second term in the above expression, consider

i 1R , r (ao, 1 ^dflogfi(Xlj;9i)i (0,

y —Bklk2+lklk2[9i) =— > —™—™ e1=ey + Jfcifc2l i)

1 ni ni ^ oViklc>Vlk2

+ ^E2> -«0 l«irff- i=i j=i

By (4.11) and (4.13), it then follows that, for any ki and k2,

—Bklk2 + IklkM) ^ 0 as m oo. (4.19)

Thus, for any boundary points such that \\9\ — 9°\\ = a, we have

I E X>*i " -•*?*,) I < A2- (4.20)

A;i=lfc2 = l By equation (4.19) and (4.20), it follows that, for given a

v v . (0i , — d ) ( B 2 3 I E X>^k - k)(^ ^ + W*i ))(*!* - 0l2)\ < p a (4.21) • fe1=ifc2=i ni with Pgo probability tending to 1. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 65

Let us examine the first term in (4.18). Since 1(8®) is a symmetric and positive

l definite matrix, there exits a matrix Bpxp such that B B = Diag(l, 1,1) the

t identity matrix and —I(8\) = BDiag(5i, 52,5P)B where Si < 0,i = 1,2, ...,m.

b b 2 Let £ = (6,6, ...,£„)* = B(8 n - 8°n,d 12 - 0°2, ..,0>p - 0°,)'. Then we have ||£|| = ||0} - 0°||2 = a2. It follows that

V V rn -EE o&i - wtfxc - *W = E<^2 ^5 V (4-22)

fc1=lfc2 = l (=1

2 where S* — max{5i, S2,Sp} < 0. We see that there exist a0 such that 5* +p ao < 0.

Combining (4.21) and (4.22), it follows that with Peo probability tending to 1 there

exists c > 0 such that for a < a0

2 S2 < —ca . (4.23)

Note that lOtii^fca^)! < 1- Thus for 53 we only consider

m m

Tlx 1 i=i j=i -.rii m rii

G A (n) 3(* ) = - E y + - E E( * - "O^Oy). J'=l i=l 3=1 By the Weak Law of Large Numbers with probability tending to 1, m (4.24) <2(i + #3) Tii ^ 1 J=l where we use the inequality \EZ\ < E\Z\ < 1 + E\Z\2 for any random variable Z.

By (4.14), it follows that with Peo probability tending to 1

1 m nt (4.25) J ' i ._i 1 =i i=i

Hence by (4.24) and (4.25) , for any given a and'0f such that ||0j - 0?|| = a,

\S3\<^(1 + K3)a\ (4.26) 4.1. ASYMPTOTIC RESULTS FOR THE WLE 66

Finally combining (4.17), (4.23) and (4.26), we then have, with Peo probability tending to 1, for a < ao,

2 max(51 + S2 + S3) < -ca + (p + ~(1 + Kz))a?

e\€Sa 2

which is less than zero if a < c/[p + ^(1 + K3)). This completes the proof of our claim that for any sufficiently small a the Pgo probability tends to 1 that

max log WL(x; 0f) < log WL{x, 0°). e'leSa

For any such a and x G /71(a), it follows that there exists at least one point 9^

ni with ||0( ^ — 9\\\ < a at which WL(x; 9\) has a local maximum, -^-WLn.(x; 0i)| _s(ni;

0. Since Pgo (X G J„(a)) —> 1 as nx —> 00 for all sufficiently small a, it then fol• lows that for such fixed a > 0, there exists a sequence of roots 9^{x\a) such that

Peo (ll^V) -0°|| < a) -» 1. It remains to show that we can determine such a sequence of roots, which does not depend on a. Let 0^ be the closest root to 9\ among all the roots to the likelihood equation for every fixed n\. This closest root always exists. The reason for this can be seen as follows. If there is a finite number of roots within the closed ball ||0i — Q±\\ < a, we can always find such a root. If there are infinitely many roots in that sphere which is compact, then there exists a convergent sequence of roots

inside the sphere 0~f^ such that lim ||0^ — 0°|| = inf ||0i — 0i||, where Vni is the set of all the roots to the likelihood equation. Then the closest root exists since the limit of this sequence of roots 0^ is again a root by the assumed continuity of the

{ lY ni) ^WLni(91). Thus 9 x does not depend on a and Peo (||0i * - 0?|| < a) -> 1. '

2. Asymptotic Normality. Expand -^-log WL(x; 0x)as

»1 m rii

0 {n WLni(x-91) = WLni(x;9 1)+ / ^^Al V(^;0? + i(01-0?))a't(01-0?), Jo i=i j=i 4.1. ASYMPTOTIC RESULTS FOR THE WLE 67

m rii , > where WLni(x;9\) = EE A^VO^?)- i=ij=i Now let #i = #~[ni\ where 9^ is any weakly consistent sequence of roots satisfying

WLni(x; (j^) = 0, and divide by y/n[ to get:

l ni) —WLni{x-X) = BniV^T(0S - 0?), (4.27) where 1 r1 _m. B A n) e ni) ni „- / £ £ * ^; ?+*$ -

Note that

m rii m nt (n) w X WLni(x;9°) = E£ ^( ^) + ££(\ -«;0^«;O

i=l j=l i=l _;'=1

j=l i=l j=l By (4.27), it follows that

^ *tl lit IL-i

1 V 3=1 V 1 i=i j=i

From the Central Limit Theorem, because Eeoip(Xij-,9°) = 0 and coveoil>(Xij]9l) =

1(9°), we find that

l "*

n V i i=1

n) If we show -±= £ E(A- - w^X^ 9°) % 0 and Bni ^ 1(9°), then by the i=lj=l multivariate version of Slutsky's theorem (see for example, Sen and Singer (1993), p. 130.) we have

l 1 ±=(§™ - 91) = B~l^=WL ni A my^Z* ~ JV(0,/(6I?)- ) 77.1 V*1! 4.1. ASYMPTOTIC RESULTS FOR THE WLE 68

Now we prove

i=ij=i Let Kx = E E(A! - wOV'^ijI^i)- 1=1 J'=l We then have

(-1=1114,11 >e) < g-EEEEiA.(">-«'.ii^)-'»-i ^ * ' i=l i' = l j = l j' = l < 0 ( J —>• 0, as —>• co, \™i/ by hypothesis (2) of this theorem.

(ii) Bni —^ 1(9®) as n\ —oo.

H Let Bni = < + B , where

i /-l «i < = --/ /^m^Oi + t^-e^dt, 1 -70 j=i

711 70 i=l j=l

! First, we prove B ni 1(9®) as n\ —> oo.

Let Sp = {9i : ||0i - 9\\\ < p}. Note that Eeoip(Xij,0i) is continuous in 0], by condition (1), so there is a p > 0 such that

< P=> lEeo^Xij-A) + T(9°)\< e. (4.28)

For any t € TZ such that 0 < t < 1, then

0 ll^^-^IKP^II^ + ^i" -^) - 0?||

By equation (4.28) and (4.29), we then have

B1) 0 MI$ " *ill < P) < 0? + ^i" - 0?)) + Z(0?)| <-c). 4.1. ASYMPTOTIC RESULTS FOR THE WLE 69

ni) Note that P0o(||^ - 0j|| < p) —> 1. We then have

0 l) 0 Peo(\Ego7p(X1f,e + t(9^ -9°l)) + I(e Y

This result implies that

ni) Peo (9° + t(0i - 0°) eSp)-^l as m -»• oo. (4.31)

From the Uniform Strong Law of Large Numbers, Theorem 16(a) in Ferguson (1991),

with Peo probability 1, there is an integer N[ such that

Til

Hi (4.32) >Nr => sup\—f]TJ>{Xij,el)-E9oi>(Xu,6i)\

Then, assuming N is so large that rii > A7/ => ||0[ni^ — 0j|| < p, then

] 'nx-W)\ < / -^(xir,91+t(9(r -^))+m) dt j° ni j=i • .

= / - £^ fe: *? + - 0?)) - Egoip (X , 9° + t(9^ - 9 1 n v 7 v ir Jo i -=1

ni) +Eeo^(Xlj; 0? + i(0i - 0°)) + 7(0?)left

1 r / 1 ni < / SUp -^(X^ej-Ego^Xij-A) V JO SiGSp "1 ^7

m) + sup *W 0? + i(0~! - 0?)) + 7(0?) )dt 0

0 as ni —> oo by equation (4.30), (4.31) and (4.32). p

Secondly, we prove B™ —e-> Q as rii —> oo. By Lemma 4.2, every component of

Bj/ goes to 0 in probability. Thus

] ] B dt 0 as ni oo. \ "\ < ^-EEh-t mxl3-0?+1(9^ - 0?))

This completes the proof, o 4.1. ASYMPTOTIC RESULTS FOR THE WLE 70

REMARK: If there is a unique root of the weighted likelihood equation for every n, as in many applications, this sequence of roots will be consistent and asymptotically normal. Small et. al. (2000) discuss the multiple root problems in estimation and propose various methods for selecting among the roots if the solution to the estimating equation is not unique.

4.1.3 Strong Consistency

We prove strong consistency of the WLE in this subsection. Recall that

±Sni (X, 8X) = 1 £ f> - tWj^^y

To prove the strong consistency, we prove the following lemma:

Lemma 4.3 Under Assumptions 4-1- 4-5

sup —Sni(x,0i) (4.33)

for any d2, 03, ...,8m, 8{ G 6, i = 2, 3,m.

Proof: By Lemma 4.2, we then have

/i(*«j,g?) /i(*y,«i) '

It follows by the Borel-Cantelli Lemma that, m rii 0, a.s. [Poo].

1 1=1 7 = 1 By the definition of Aij, it follows that 4.1. ASYMPTOTIC RESULTS FOR THE WLE 71

Therefore, we have

sup —Sni(x;0i) —)• 0, a.s. [Peo]

for any 0j ^ 0°, 02, 03,0m, 0, G 6, i = 1, 2,m. o

Theorem 4.5 Suppose:

(1) Q is compact;

(2) logfi(x; 0) is upper semi-continuous in 0 for all x;

1 (3) there exists a function K(x) such that Ego\K(X\)\ < oo and /og^ ^,'^ < K{x), for all x and 0 € O;

Then for any sequence of maximum weighted likelihood estimates 9^ of 9\,

0™ —• 0? a.s. [P,?]

/or any 02,03,9m, 0, G O, i = 2, 3,ra.

Proof: Let 9\ be the parameter of interest and let

1 . , 1 l Wni = —logWL(91) -—logWL(9 i) .. ni m = - E EA -( W/I(A^-; 00 - '°s/i(*y; 0?)) Ul j=l i=l 1 nl 1 c/ 5 x 1 = -E3=1 (^'^) + - " ( '^)

^hereU(Xlj,91) = log^^). Let p > 0 and I> = (0i G 0 : ||0i - 0°|| > p). Then D is compact by condition (1). We then have, (c.f. Ferguson 1996, p. 109)

Peo (limsup sup — Vr/pfy^) < sup p(0:) 1=1, (4.34)

y ni->oo 8xeD ni ~( dieD J where p(0i) = / log^^h{x;9\)du{x) < 0 for 0i G D by Lemma 4.1. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 72

Furthermore, p,(9i) is upper semi-continuous since

f ei ] f Hid,) > f \imsuplog )^ f^e^duix) > limsup /l0g -^p^Ux-9®)du{x).

Hence p(9i) achieves its maximum value on D. Let 5 = sup p(9i); then 5 < 0 by

Lemma 4.1 and Assumption 4.3. Then by Lemma 4.3, with Pdo measure 1, there

exists an Ni such that, for all nx > Ni,

1 sup Sni(X.,9i) < -5/2. 0ieD ni

Observe that

1 711 1 \ 1 ni Sni (X, #1) ( ni -E^i^i) + -5ril(X)e1) < sup - J2u(Xlj;el)+suV n n l~[ l J BxtDUx^ dleD

It follows that, with Peo measure 1, there exists an ./V such that for all ni > N,

sup sup ( — jhu{Xlj;91) + —Sni(XA)) <5/2

But, for all ni > AT,

1 m 1 / 1 711 1 \ } ] - U(Xtf 0? ) + Sni(X; 8~t) = sup - V U(Xlj; 6,) + -5B1(X, 9{) > 0. ni*i p7 ni 0iee \n\ ^ "i J

since 1 ni 0 ni J2U(Xl];9°1) + -Sni(X;9 ) = 0 j=i Ul

c ni) ni) This implies that the WLE, 0~j e D for m > N; that is ||^ - 0X\\ < p. Since p

is arbitrary, the theorem follows, o .

The proof of the above theorem resembles the famous proof given by Wald (1949) which established the strong consistency of MLE except that we have to deal with

the extra term, ^Sni(X.,8i).

Again, a slightly different condition is required if 6 is not compact. 4.1. ASYMPTOTIC RESULTS FOR THE WLE 73

Theorem 4.6 Suppose:

(1) there is a compact subset C of 0 such that 9\ e C and

J Ego sup log 0 < 0; 0],eccne JivAij;yiJ

f,2) i/iere exzsi a function K(x) such that Eeo\K(X)\ < oo and loff^jgoj < K(x), for all x and 9 € C;

(3) logfi(x; 9) is upper semi-continuous in 9 for all x;

Then for any sequence of maximum weighted likelihood estimates 9^^ of 9\,

fifO _> a.s. [Pgo]

for any 92,93, ...,0m,9i € Q,i = 2,3, ...,m.

Proof:

Let D = (9i : \\9i - 9°\\ > p) as in the proof of Theorem 4.5 such that C n D ± . It follows that C fl D is also compact. It follows from the proof of Theorem 4.5 that,

with PQO measure 1, there exits an Ni such that, for all nx > Ni,

sup (-JTu(Xir,9i) + -Sni(X,9i))<5/2<0,

where U(X^ 0X) =

Also, with Peo measure 1, there exits an N2 such that, for all ni > N2,

1 .Wl. — V sup t/(Xi,-;0i) < 5 (4.35)

by the Strong Law of Large Numbers and the fact that EgoeCcnQU(Xij; 9{) < 0.

As in the proof of Lemma 4.3, it can be shown that

— sup Sni(X;0i) —•(), a.s. [Pgo]. ni 0ieccne 4.2. ASYMPTOTIC PROPERTIES OF ADAPTIVE WEIGHTS 74

It follows that, with Pgo measure 1, there exist an A^, such that, for all n\ > N3,

1 ni 1

— V sup U(Xlj;91) + — sup 5ni(X;0x) < 5/2 < 0. ni~~^eieccne 0ieccne

It implies that, with Pgo measure 1, for all rii > N3,

1 ni 1

sup —J"U(Xlj;91)+ sup — Sni(X;0i) < 0. (4.36) flieone ~^ ei€Ccne ™i

Therefore, it follows that, with Pgo measure 1, there exist an N* = max(N2, N3), such that for all n > N*,

sup \—f]u(xlj;e1) + —sni(x-,e1))

1 ni 1 / 1 711 1 \ 0 -V [/(^s^V-^n^X;^ ) = sup -V^l,^!) + -5ni(X;01) > 0. n m j^t i 0iee \ni nx J since the sum is equal to 0 for 9\ = 9\. This implies that the WLE, 0\ G Dc for

1 rii > N*; that is, H^" ^ ~ 9X\\ < p. Since p is arbitrary, the theorem follows, o .

4.2 Asymptotic Properties of Adaptive Weights

At the practical level, we might want to let the relevant weight vector to be a function of the data. This section is concerned with the asymptotic properties of the WLE using adaptive weights. Assumption 4.1-4.4 are assumed to hold in this section.

4.2.1 Weak Consistency and Asymptotic Normality

In this subsection, we adopt the following additional condition:

Assumption 4.6 (Weak Convergence Condition). Assume: (i) lim ^- < oo, for i = 1,2, ...,m; 4.2. ASYMPTOTIC PROPERTIES OF ADAPTIVE WEIGHTS 75

n n (ii) the adaptive relevant weight vector A^ ^(X) = (A[ ^(X), A2"^(X),Am^(X))* sat• isfies, for any e > 0,

A^ (X) —> Wi, as n\ —> oo,

where (wi,w2, wmy = (1,0,0)*. Let

, ^(X,W = Jrgg(A}" (X)-«,)^^gg.

We then have the following lemma.

Lemma 4.4 If the adaptive relevance weight vector satisfies Assumption 4-6, then

(X, 0X) % 0, as nx -»• oo

for any 92,93, ...,0m,0i G 0,z = 2,3, ...,m.

lo f Proof: Let Tj = fl 9 f\\f-fe% for i = 1, 2,m. Then

1 i m ^ = ^(nx)-,)r,

m 1

By the weak law of large numbers, for any i = 1, 2,m,

ru nt ^ fi [Xij; 0!) /i (A%-; 0X)

It then follows that, for any i = 1, 2,m

ni V / n,:

for any #2,#3, 9m,^ 6 0,J = 2,3,m, by Assumption 4.6. o We then have the following theorems: 4.2. ASYMPTOTIC PROPERTIES OF ADAPTIVE WEIGHTS 76

Theorem 4.7 For each 9\, the true value of 6i, and each 9i ^ 9®i> ( m rii m ni \ nn/i(^^?)A,(")(x) > nnA(^i«i)t)(x) = i. for any 9 , 9 ,9 , 9 € 6, i = 2, 3,m. 2 3 m t=l{ j=l i=lj=l J Theorem 4.8 Suppose that the conditions of Theorem 4-2 are satisfied. Then for any sequence of maximum weighted likelihood estimates 9^ of 9\ constructed with adaptive weights X\ (X), and for all e > 0,

for any 92, 93,9m, Qi e O, % = 2, 3,m.

Theorem 4.9 Suppose that the conditions of Theorem 4-3 are satisfied. Then for any sequence of maximum weighted likelihood estimates 9^^ of 9\ constructed with adaptive weights Aj(X), and for all e > 0

l) lim Peo(||^ -^||>e)=0,

for any 92, 93,9m, 9iG&,i = 2, 3,m. We remark that the proofs of Theorem 4.7 - 4.9 are identical to the proofs of Theorem 4.1 - 4.3 except that the fixed weights are replaced by adaptive weights and the utilization of Lemma 4.2 is replaced everywhere by Lemma 4.4. We are now in a position to establish the asymptotic normality for the WLE constructed by adaptive weights. We assume that the parameter space is an open subset of RP.

Theorem 4.10 (Multi-dimensional) Suppose that the conditions of Theorem 4-4 are satisfied. Then there exists a sequence of roots of the weighted likelihood function based on adaptive weights 9^ that is weakly consistent and

as ni —> oo. 4.2. ASYMPTOTIC PROPERTIES OF ADAPTIVE WEIGHTS 77

4.2.2 Strong Consistency by Using Adaptive Weights

To establish the strong consistency of the WLE constructed by the adaptive weights, we need a condition that is stronger than Assumption 4.6. We hence assume the following condition:

Assumption 4.7 (Strong Convergence Condition) Assume that: (i) lim ?i < oo, for i = 1,2, ...,m; ni->-oo U1

( N) N) } (ii) the adaptive relevant weight vector \W(X) = (A 1 (X), A2 (X),A£ (X))* sat- isfies

B) AJ (X) wt, a.s. [P0O],

where (w1,w2,wm)' = (1, 0,0)*.

Lemma 4.5 If the adaptive relevance weight vector satisfies Assumption 4-7, then

0, a.s. [Pgo],

for any 92,83, ...,9m,.0i eO,i = 2,3,..., m.

By the Strong Law of Large Numbers, for any i = 1, 2,m,

where Eg0An = Eg0 sup log /i(*u;0i) < oo. This implies that, for any i = 1, 2,m, 0i ee

0, a.s. [Peo] by Assumption 4.7. Since 4.3. EXAMPLES. 78 it then follows that

[P o]. sup 0, a.s. e 0iG0 This completes the proof.o We then have the following theorems:

Theorem 4.11 Suppose the conditions of Theorem 4-5 are satisfied. Then for any sequence of maximum weighted likelihood estimates 9^^ of 6\ constructed by adaptive weights A^(X),

for any 92,93, ...,9m,9i G Q,i = 2,3, ...,m.

Theorem 4.12 Suppose the conditions of Theorem 4-6 are satisfied. Then for any sequence of maximum weighted likelihood estimates 9^^ of 9\ constructed by adaptive weights Aj(X),

• §M el a.s. [Pgo],

for any 92,93,...,9m,9i e Q,i = 2,3,...,m.

4.3 Examples.

In this section we demonstrate the use of our theory in some examples.

4.3.1 Estimating a Univariate normal Mean.

Suppose Xij are independent random variables that follow a normal distribution with mean 9i and variance 1. Assume 0 = (—00,00) and C = [—M,M].

1 We need to verify the condition that, for 9\ G C, 0 < Ego ^inffl'i6Ccn0 ^^/iffi'a' ) ) — Kc < 00 for some constants M and Kc. 4.3. EXAMPLES. 79

We then have

-\{x-e\f if \X\>M,

-\{x - 9lf + \{x - M)2 if 0

-\{x - 9\)2 + \{x + M)2 if -M

It then follows that

f (X-- 9°)

1 tJ 1 Eeo inf log ' = hi + Ii2 + Ii3, i = l,2,...,m, where

hi = - f \{x-9«Y~exp-^2l2dx,

|x|>M M

121 = / " + - M)2) -^eXP"{X~9i)2/2dX' 0 0

7 i = 2 2 8 3 / [~(x - 0?) + \{x + M) ) -i^cxp-^) ^.

—M The first term ^ goes to zero as M goes to infinity. It can be verified that hi + hi =

2 2 M + o(M ). It then follows that there exist M0 > 0 such that Iu + hi + hi > 0

2 C for M > MQ. If we choose K = 2M0 , it then follows that, for i = 1,2, ...,m,j =

1, 2, ...,Tli,

/i(-Xij-;0?) \. 2 J J Z 0 < £0o inf log ;\' \)[ ) <2M < oo, /or a« M > M0.

4.3.2 Restricted Normal Means.

A simple but important example is presented in this subsection. That problem is

treated by van Eeden and Zidek (2001). Let Xn, Xini be i.i.d. normal random variables each with mean 9\ and variance a2. We now introduce a second random

sample drawn independently of the first one from a second population: X2\, ...,X2n2, 4.3. EXAMPLES. 80

2 i.i.d. normal random variables each with mean 62 and variance a . Population 1 is of inferential interest while Population 2 is the relevant population. However, 102— I < C for a known constant C > 0. Assumptions 4.2 and 4.3 are obviously satisfied for this example. The condition (4.6) in Theorem 4.3 is satisfied as shown in the previous example. If we show that Assumption 4.5 is also satisfied, then all the conditions assumed will be satisfied for this example. To verify the final assumption, an explicit expression for the weight vector is needed.

x t f Let mXi. = ij, i = l,...,m,V = Cov{{Xl.,X2)) and B = (0, C) . It follows that

It can be shown that the "optimum" WLE in this case, the one that minimizes the maximum MSE over the restricted parameter space, takes the following form

"l — ^I^-I. ~r /\2^V2.) where (V + BB1)-1! (A;,A;)* l^V + BB^-n' We find that

It follows that

l l 1 1 1\V + BB)-l 2 + 2 u /ni a /n2 + C Thus, we have

2 a /n2+C A; = 4.3. EXAMPLES. 81

Finally

K = 1 — '

\a* n2 ni/

Estimators of this type are considered by van Eeden and Zidek (2000).

ni 2 s It follows that |A| ^ — Wi\ = O(^), i — 1, 2. If we have n2 = 0(n ~ ), then Assump• tion 4.5 will b e satisfied. Therefore, we do not require that the two sample sizes approach to infinity at the same rate for this example in order to obtain consistency and asymptotic normality. The sample size of the relevant sample might go to infinity at a much higher rate. Under the assumptions made in the subsection, it can be shown that the conditions of Theorem 4.4 are satisfied. The maximum of the likelihood estimator in this example is unique for any fixed sample size. Therefore, we have

4.3.3 Multivariate Normal Means.

Let X = (Ai,Xm), where for i = 1,..., m,

xi = Y^Xij/m *~ N(6i, l/m). i=i Assume that the Oi are "close" to each other. The objective is to obtain a reasonable good estimate of 6\. If the sample size from the first population is relatively small, we choose WLE as the estimator. In the normal case, the WLE, 9i, takes the following form: m 9\ = ^ \Xi. 4.3. EXAMPLES. 82

Note that the James-Stein estimator of the parameter 9 = (9i,...,0m) is given by

C(X) = (Ci(X),..,UX)), where / \ m-2 C*(X) = 1 Xi.

The quantity, P-2 1 - m _ '

i=i can be viewed as a weight function derived from the weight in the James-Stein esti• mator.

Consider the following choice of weights of James-Stein type :

1 m-2 Ai(X) = 1- 1+6 rn _ )

i=l 1 ,1 m-2

m - I m -„ )' z _ 2'3' '••'m'

t=i for some 5 > 0 and c > 0. It can be verified that E A* = 1 and A; > 0, i = 2,3,m. i=i It follows that

1 m-2 m-2 Peo( "i- Y.x? + c 2=1 i=i m-2„ 1

< 14, -geo- (since Xf > 0) nl+5e c = O n i+5 / • We then have

1 m-2 > e) = O (4.37) 1+6 n n i EA? + C i=l-

Pe0(Ai < 0) = O (4.38) 1+6 n 4.4. CONCLUDING REMARKS 83

.We consider the following two scenarios (i) If we set 5 = 0, it follows that

Aj(X) - u>i 0, for i = 1,2, ...,m. Assumption 4.6 is then satisfied. Therefore, the weak consistency and asymptotic normality of the WLE constructed by this set of weights will follow.

(ii) If 5 > 0,

Ai(X) - Wi 0, a.s. [Peo], for i = 1,2, ...,m, then this leads to strong consistency. Since strong consistency implies weak consistency, asymptotic normality of the WLE using adaptive weights will follow in this case.

4.4 Concluding Remarks

In this chapter we have shown how classical large sample theory for the maximum likelihood estimator can be extended to the adaptively weighted likelihood estimator. In particular, we have proved the weak consistency of the latter and of the roots of the likelihood equation under more restrictive conditions. The asymptotic normality of the WLE is also proved. Observations from the same population are assumed to be independent although the observations from different populations obtained at the same time can be dependent. In practice weights will sometimes need to be estimated. Assumption 4.6 states conditions that insure the large sample results obtain. In particular, they obtain as long as the samples drawn from populations different from that of inferential interest are of the same order as that of the drawn from the latter. 4.4. CONCLUDING REMARKS 84

This finding could have useful practical implications since often there will be a dif• ferential cost of drawing samples from the various populations. The overall cost of sampling may be reduced by judiciously collecting a relatively larger amount of inexpensive data, that although biased, nevertheless increases the accuracy of the estimator. Our theory suggests that as long as the amount of that other data is about the same as obtained from the population of direct interest (and the weights are chosen appropriately), the asymptotic theory will hold. Chapter 5

Choosing Weights by

Cross-Validation

5.1 Introduction

This chapter is concerned with the application of the cross-validation criterion to the choice of optimum weights. This concept is an old one. In its most primitive but nevertheless useful form, it consists of controlled and uncontrolled division of the data sample into two subsamples. For example, the subsample can be selected by deleting one or a few observations or it can be a random sample from the dataset. Stone (1974) conducts a complete study of the cross-validatory choice and assessment of statistical predictions. Stone (1974) and Geisser (1975) discuss the application of cross-validation to the so-called K-group problem which uses a linear combinations of the sample means from different groups to estimate a common mean. Breiman and Friedman (1997) also demonstrate the benefit of using cross-validation to obtain the linear combination of predictions to achieve better estimation in the context of multivariate regression. Although there are many ways of dividing the entire sample into subsamples such

85 /

5.1. INTRODUCTION . 86 as a random selection technique, we use the simplest leave-one-out approach in this chapter since the analytic forms of the optimum weights are then completely tractable for the linear WLE. We will denote the vector of parameters and the weight vector

by 0 = (di, 92,9m) and A — (Ai, A2,Am) respectively. Assume that ||0||.< oo

pt pt for i = 1,2, ...,m. Let X°e and A° be the optimum weight vector for samples with m equal and unequal sizes. We require that E \ = 1 in this chapter. i=i Suppose that we have m populations which might be related to each other. The probability density functions or probability mass functions are of the form fi(x;9i) with 6i as the parameter for population i. Assume that

-Xii) X12, X13, Xini ~ fi(x;9i)

X21, X22, X23, X2n2 ~ f2(x]02)

x Xml, Xm2, Xm3, Xmrim ~ fm( ',9m) where, for fixed i, the {Xij} are observations obtained from population i and so on. Assume that observations obtained from each population are independent of those from other populations and E(Xij) = 4>(9i),j = 1, 2,rij. The population parameter

of the first population, 9X, is of inferential interest. Taking the usual path, we predict

Xij by (9[~j)), the WLE of its mean without using the X\j. Note that (9~[ is a function of the weight vector A by the construction of the WLE. A natural measure for the discrepancy of the WLE is the following:

2 D(X) = f2(xl3-d>(9[^)) . (5.1)

The optimum weights are derived such that the minimum of D(\) is achieved for m

fixed sample sizes n\, n2,nm and E A* = 1. j=i We will study the linear WLE by using cross-validation when E(Xij) = 9i,j = 1.2, ...nj for any fixed i. The asymptotic properties of the WLE are established in 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 87 this chapter. The results of simulation studies are shown later in this chapter.

5.2 Linear WLE for Equal Sample Sizes

Stone (1974) and Geisser (1975) discuss the application of the the cross-validation approach to the so-called K-group problem. Suppose that the data set S consists of n observations in each of K groups. The prediction of the mean for the ith group is constructed as:

(ii = aXi. + (1 — a)X_. m n n where Y.. = E ^ij and X^ = ^ ]T) -^0'- ^ we are interested in group 1, then i=lj=l j=l the prediction for group 1 becomes

A1=(i-^Q)x1. + g^„.

We remark that the above formula is a special form of linear combination of the sample means. The cross-validation procedure is used by Stone (1974) to derive the value of ct. We consider general linear combinations. Let 0^ denote the WLE by using the cross-validation rule when the sample sizes are equal. If

m where ^ = 1. We assume ni = n = ... = n = n in this section. i=i 2 m

In this section, we will use cross-validation by simultaneously deleting Xij, X2j, ...,X, for each fixed j. That is, we delete one data point from each sample at each step. This might be appropriate if these data points are obtained at the same time point and strong associations exist among these observations. By simultaneously deleting 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 88

Xij,X2j, ...,Xmj for each fixed j, we might achieve numerical stability of the cross- validation procedure. An alternative approach is to delete a data point from only the first sample at each step. It will be studied in the next section.

Let XL be the sample mean of the zth sample with jth element in that sample excluded. A natural measure for the discrepancy of 9\ might be:

n / m

3=1 \ i=l m n m n m m = E xi -2 E ^ E vt" + E E E j=l j=l i=l j=l i=l k=l n m n m m n x = E l -2 E ^ E x^3)+EEA*A*E ^."^ j=l i=l j=l i=l fc=l j=l

= c{X) - 2\%(X) + A*Ae(X)X

_i) where c(X) = £ Xj,, (6.(20), = E ^y4 , and (4e(20)tt = £ X^X^, j=l j=l j=l % — 1, 2,n, k = 1, 2,m. An optimum weight vector by using the cross-validation rule is defined to be a (m) m

x = weight vector which minimizes the objective function, De and satisfies E i 1- i=l

For expository simplicity, let be = be(X) and Ae = Ae(2Q in this chapter. 5.2.1 Two Population Case

For simplicity, first consider a simple case of two populations, i.e.

Xu, X12, X13, Xin ~ fl(x',9l)

l X2i, X22, X23, X2n ~ f2(x;92)

with E(Xij) = 9i and E(X2j) = 92. Let a\ and cr| denote the variances of X\3 and

are X2j respectively. Let p — cor(Xi3,X2j). Let 9° = (fl?,^) where 0j and 0° the true values for 9i and 02 respectively. 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 89

We seek the optimum weights such. that Ai + A2 = 1 and they minimize the objective function which is defined as follows:

D ] X J) 2 e = £ (*y - ^[: - Aa^) - 7(Ai + A2 - 1).

Differentiating with respect to Ai and A2, we have

J) J) ]) dx\- = - £*i: - Ai^: - ^x[: ) -7 = 0,

dD, (2)

-E^ (Xy - A^ - XiXt") - 7 <9A2 It follows that

E - - AxAt* - A^) = o.

Note that Ai + A2 = 1. We then have

Af (X) = 1 j) j) £(x[: -x2: )i 3=1 (5.2)

Af (X) E

Lemma 5.1 The following identity holds:

pt e e Af = 1-Af and \°2 = S 2/S x, where

- X f + ^ \ 2 2 i)2 E (*y - ^) , n 2 Se (cr — ccw ) 2 (n-iy

2 2 where a\ = £ E - *i.) and cot) = £ E - *i.) (X2j - X2). 3=1 3=1 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 90

Proof: Observe that

X\ ^ — 7 {.Xii + Xi2 + ...Xij_i + Xij+i + ... + Xin) n — 1 .

_±_x lj n-1 13 3=1 ~Xi Xi.

n — 1

JL where en = - T. n n—1 Let St = ±it (x{ 3) ~x 2 J)Y- lt then follows that n j=i v ' y

5 e ^2 i =n V-Ef (en^i.-^T^)-(enn-1 .--^-Xn-1 2/,) ) j=i v 7

= ^( ^pC .-X f-2^pC-X .)f^(X -Xy) n n—n 1 1 2 *—l ' 2 lj 3=1 +(^i)2E(^-^)2)

= el (Xi. - X2f - 2en^1(Xi. - X2f + —^ f^X^ - Xvf ^ j=l

= ^ (en - (Xi. - X2f + -^-L— J2{Xlj - X2]f 3-

n(n — 2) 1 "

x x 2 7^T)2 c*i. - *y+n(n_i)2 - ^ 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 91

Let Sf = i f2(x[ j) - X[~3)){X[ 3) - Xij). It follows that

Xl S2 = ^ E ^en(XL - X2.) - ^-[(^lj - ^2j)^ ^(en-X"i. - ~zr\ 3) ~

e _ x x = - E ( "(^i- ^2.) - ~~[( ij ~ 2j) \ {enXi. - enXij)

= -^TT) E(*« - *y)(*i. - Xy) (since E (*i- " Xy) = °)

x x Xi. E( ii - 2j) - Exi+ E-^'^'

n V j=i 3=1 / n 7^2- ( *? - cov)

This completes the proof, o

The value of the \°pt can be seen as some sort of measure of relevance between

pt the two samples. If that measure is almost zero, the formula for A2 will reflect that

pt by assigning a very small value to A2 . This implies that there is no need to combine

the two populations if the difference between the two sample means is relatively large

or the second sample has little information of relevance to the first. The weights

chosen by the cross-validation rule will then guard against the undesirable scenario in which too much bias might be introduced into the estimation procedure. On the other hand, if the second sample does contain valuable information about the parameter of interest, the cross-validation procedure will recognize that by assigning a non-zero

P value to A2 *. 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 92

Proposition 5.1 If p < then

Proof: By the Weak Law of Large Numbers, it follows that

~v P<)\ aO. 17 pe\ no AI. > tf1, A 2. —t/2,

j=i

Therefore,

b\ — cov —• o\ — p0~\O2.

Thus condition p < a\ja2 implies that of > cov for sufficiently large n. Thus, A^* eventually will be positive, o

We remark that the condition p < a\/a2 is satisfied if a2 < o\ or p < 0. If

the condition p < o~i/o~2 is not satisfied, then A^ will have negative sign for suffi• ciently large n. However, the value of A^' will converge to zero as shown in the next Proposition.

Proposition 5.2 If 6\ ^ Q\, then, for any given e > 0,

pt Peo{\\T - 1| < e) —• 1 and Peo{\\°2 \ < e).—>• 1.

Proof: From Lemma 5.1, it follows that the second term of S\ goes to zero in prob• ability as n goes to infinity while the first term converges to (9\ — 0°)2 in probability. 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 93

Therefore we have

2 St ^ (9° - 8°2) as n oo, where (6\ — 90,)2 / 0 by assumption.

i Moreover, we see that Sf = 0P( ). By definition of Af, it follows that

, „ ,5f. P<>0

|A21 = | —| —> 0 as n —>• oo.

This completes the proof, o The asymptotic limit of the weights will not exist if 9\ equals 99,. This is because the cross-validation procedure will not be able to detect the difference of the two populations involved since there is none. This can be rectified by defining Af = where c > 0. We remark that the knowledge of the variances and covariances is not assumed.

5.2.2 Alternative Matrix Representation of Ae and be

To study the case of more than two populations, it is necessary to derive an alternative matrix representation of Xopt. It can be verified that

-(-j)-(-j) _ . , _ _ 1 w _ _ 1 X n—1- n — 1

n n 2 2— — ^ — ^ X x— i f ^ ~\ x — 6nXj.X/j_ ~7%ij%k. T kj i. ~r \ 7) %ij kj 77 -L J. n X

JL where e„ — - T.

" 71—1 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 94

Thus, we have

n n , 1 ' v (—j) {—j) v > / 2 ^7i / I \2 \

•^i. %k. / j I &nXi.%k. ~ ~^%ij-Ek. ~ ^•^fcj-^'i. "F ( — ~j XijXkj J

j=l j=l \ /

71 n n 1

_ Xk Xi + _ l - j=l j=l J=l X*J'a'*J n E ~ n_\ - E ^ (n _ ]_) E 71

2 2 2 77. 1 ^ > z [n — l) n

n — 177

en(n - 2)a;i xfc. + -- Y](xij - Xi)(xkj - xk) + -XiXk. 77 — 1 77 ^—' 77—1

(e*(n - 2) + 070, + ^ covl, where

^ l "

C07Jifc = -^2(Xij -Xi.)(xkj -Xk.).

Recall that, for 1 < i < m and 1 < k < m,

(-J)^-J) Ae(ik) = E^

.7=1 It follows that

A^^E + (el(n-2) + -^J 00* (5.3)

where Eijt = covik and 0 = (xh,xm.). 5.2. LINEAR WLE FOR EQUAL SAMPLE SIZES 95

We also have

X^

3=1

\S - \ ( __L_ \

An + / J (Cn-^lj Cn-^lJ I Cn^i. -^^ij j j=l ^ ' e " n - 1 ^—'

It then follows that

be(x) = A1-e%. (5.4)

where ^4i is the first column of AE and Ei is the first column of the sample covariance matrix E.

5.2.3 Optimum Weights Af* By Cross-validation

We are now in a position to derive the optimum weights when sample sizes are equal.

m Proposition 5.3 The optimum weight vector which minimizes De ^ takes the fol•

lowing form

l XT = (1,0,0,0)* - el (A;% - ^^A-e i \ .

Proof:

m By differentiating De ^ — u (l'A — 1) and setting the result to zero, it follows that 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 96

It then follows that

1 Af = A: (

We then have

1 = i*Af =

Thus, 2 (1 - l'A^&e).

Therefore,

l Af* = A-e be +

Since is a quadratic function of A and A > 0, the minimum is achieved at the point Af. Furthermore, by equation (5.3) and (5.4), we have

1 2 1 A-% = A; (AX - e Ex) = (1, 0, 0,0)* - e^" ^.

Denote the optimum weight vector by \°pt. It follows that

This completes the proof, o

We remark that Ae is invertible since E is invertible. We remark that the ex• pression of the weight vector in the two population case can also be derived by using the matrix representation given as above. The detailed calculation is quite similar to that given in the previous subsection.

5.3 Linear WLE for Unequal Sample Sizes

In the previous section, we discussed choosing the optimum weights when the samples sizes are equal. In this section, we propose cross-validation methods for choosing 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 97

adaptive weights for unequal sample sizes. If the sample sizes are not equal, it is

not clear whether the delete-one-column approach is a reasonable one. For example,

suppose that there are 10 observations in first sample and there are 5 observations in

the second. Then there is no observation to delete for the second sample for half of

the cross-validation steps. Furthermore, we might lose accuracy in prediction deleting

one column for small sample sizes. Therefore we propose alternative method which

delete only one data point from the first sample and keep all the data points from

the rest of samples if the sample sizes are not equal.

5.3.1 Two Population Case

Let us again consider the two population case in which only two populations are

considered. The optimum weights Xopt are obtained by minimizing the following

objective function: m

J) D^\X) = YJ{Xi]-^x[- -\2X2) , 3 = 1

X where YJ Aj = 1 and Xlm = J2 ik- We 'remark that the major difference i=l k^j between and is that only the jth data point of the first sample is left out

for the jth term in Du^•

Under the condition that Ax + A2 = 1, we can rewrite as a function of Ai:

«i . . .. . 2

3=1

ni 2

< i) = J^((xlj-x2) + \1(x2.-x - j) • 3=1

(2) We differentiate D\ with respect to Ai- It then follows that

- +- xi~j))) - xii~j)) • 3=1 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 98

We then have

E(^2.-X|JV ^2

Consider ni /"V "vr(-J')'\

.7=1 m E (*2- - - ir^rr**1- - (*2- - x^ 3=1 V 1 7

E (A2. - A\)(A\ _ xy) - —- E(Ai. - Xy)(X2. - Xl3)

3=1 1 J'=l 1 "l

2 nifXx. - A\) + —- E(Ax. - XX])Xl3

nx _

77-1 — 1 and m u 2 2 s 2 = E^ --^)

i=i 1

2 2 x 2 2 2 E(^i- - * -) A - + (^-ZT) = + ^-rr(^ni — 1 - - *-) —'E^ - ni E^- - ^ 3=1 3=1 3=1

We then have

2 nAXx -X2f -^T

Af = ,— — ^AT; A2 = l-Af. (5.5)

Proposition 5.4 7/0? ^ 0°, £/>en Af* % 1 and Af ^ 0. Proof: From equation (5.5), it follows that 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 99

By the Weak Law of Large Numbers, we have

p 2 e°. 2

ol —> a

2 (XL-x2.) % (0?-02°)Vo.

It then follows that 2 ^2 0.

n1(X1.-X2.)2 + _L_a2 We then have Af'^1.

The last assertion of the theorem follows by the fact that Ai + A2 = 1. o

5.3.2 Optimum Weights By Cross-Validation

We derive the general formula for the optimum weights by cross-validation when the sample sizes are not all equal. The objective function is defined as follows:

= EU^-A^-E^ 3=1 \ i=2 ni.ni/ -. m \

£ X\3 - 2 £ Xy A, (XL + —j-C*!. - *y)) + £ XpCi. 3 = 1 3 = 1 \ 1 t=2 /

Ai +E ( (^i- + ^-Ti^- - *y) j + EA^- j=l \ ^ ' i=2

t = c{X)-2b{X)\u + \ uA{X)\u where

6i = VXy (x + (Xx. - Ay)) = n X2 - -^-a2 -f-f V L ni - 1 / x ni - 1 3=1 bi = niXiXi,, i = 2,...,m; 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 100 and

= E(*..+^(*.---M)> = '0?. + (^*?

«ij = fi\XiXj,, j 7^ 1 or j ^ 1.

It then follows that

A = nx ( where

dij = 0, i 7^ 1 or j / 1.

By the elementary rank inequality, it follows

rank (A) < rank(9t§) + rank(D) = 2.

Therefore, we have

rank(A) < m if m > 2.

It then follows that A is not invertible for m > 2. Thus the Lagrange method will not work in this case since it involves the inversion of the matrix A. To solve this problem, we can rewrite the objective function in terms of m

A2, A3, ...,Am only, that is, we replace Ai by 1 — E-V The original minimization i=2 problem is then transformed into a minimization problem without any constraint. As we will see in the following derivation, the new objective function is a quadratic

function of A2, A3,Am. Thus, the minimum of the new objective function exists and is unique. 5.3. LINEAR WLE FOR UNEQUAL SAMPLE SIZES 101

By replacing Ai by 1 - J2 A*, we then have i=2

m

b(Xy\u = 6iAi + ^Mi i=2 m \ ro i — A») bi + J2biXi ( i=2 / t=2 m

i=2 m 2

= bi+n1E(X1.Xi.-^i. + ^-rya?)Ai i=2 ^

= fei+nx^^^-^ + ^-^-^Ai. i=2

Since ^4 is symmetric, we also have

2 A*J4Au = A a + 2Ax AjOii + Aja^Afc u E i=2 i=2 k=2

/ . m \ 2 / m\m mm au+ 2 1_ A< A aii A a A = i-EA0 ( E E* + E E*^ i=2 / \ i=2 / i=2 i=2 Jfc=2

an - 2an E + E E A*anAfc ) ^ i=2 i=2 k=2 / ( m mm \ m m A ai E ** - E E Ai°iiAfc) + E E AiOyAfc i=2 i=2 A:=2 / i=2 k=2 im=2 i=2 k=2mm

= an ~ 2 E(an ~~ °i*)Ai + E E A^a^' + °u ~~ 2aii)Afc- 5.4. ASYMPTOTIC PROPERTIES OF THE WEIGHTS 102

We then have

= c(X)-2bl-2n1f2(el(9l-§1) + ^-af)xi 1=1 \ rn-1 J m mm Qu an)^i 2a )X - 2 E( ~ + E £ Ai(flij + an - u k i=2 i=2 k=2 m ( 1 1 \

= au - 2b, + c{X) - 2nx V 0^ - 6X) + -a\ + —(an - au) A,

a a + E X] Aj( u + n - 2ail)Afc i=2 k=2 m ni — O1n1 —- 26i + c(X) - 2nx J2 ff^

mm/ 1 \ +ni

1 2 1 Ar<- ) = (A2>A3>...JAm)' = ^ i?C7- l where C is a m - 1 by m - 1 matrix, and for i = 1, 2, ...m — 1, j = 1,2,m — 1,

2< 2 Cij = 0j+i 0j+i + 9\ — 29i+i§i + ^ 1) ^ '

We then have

1 t t Af* = (A1>(^- )) ) . where Af* = 1 — 1*A°P*^-1^. We remark that C is indeed invertible for m = 2 and m = 3. It is not clear whether C is invertible for m > 3. Therefore, the g-inverse of the matrix A should be considered in order find the optimum weight vector.

5.4 Asymptotic Properties of the Weights

In this section, we derive the asymptotic properties ofthe cross-validated weights. Let 0^ be the MLE based on the first sample of size n\. Let 9\~^ and respectively 5.4. ASYMPTOTIC PROPERTIES OF THE WEIGHTS 103 be the MLE and WLE based on m samples without the jth data point from the first sample. This generalizes the two cases where either only the jth data point is deleted from the first sample or jth data point from each sample is deleted. Note that 9[~^

is a function of the weight function A. Let ^Dni be the average discrepancy in the cross-validation which is defined as

1 1 711 2

ni ni . 3=1 Let A^ be the optimum weights chosen by using the cross-validation. Let 9° =

(0°, 02, 9m)) where 0? is the true values for 9\. We then have the following theorem.

Theorem 5.1 Assume that

(1) ^Dni has a unique minimum for any fixed n\; 3) (2) £ E U{e\r ) - mi) ^Oasn,^ oo; 3=1

(3) Pgo ^ E (Xij ~ (0{rJ)))2 < 1 for some constant 0 < K < oo;

(4) Pgo ( 0(0^) - (9^) > Af) = o(^) for some constant 0 < M < oo; then

t A^)^^0 = (l,0,0,...,0) . (5.6)

Proof: Consider

3 = 1 1E - ^(^")J') ) + - tt&r*)))

1 ni 2 1 711 \ 2 i 2 (*, - >)) + £ E (*(*•') - *$"">)

=ni i j=i 2 ni + • 5.4. ASYMPTOTIC PROPERTIES OF THE WEIGHTS 104

Note that

^EN - Wi"'"')) - {e[-])))

3 = 1 1 .ni.

3 = 1

J) J) O?) - <^i~)) 0(*i~) - <^TJ))) +-ni1 E j=i.

= Si + S2 where * = ^E(^-^?))^

ni 1 3=1 We first show that Si 0. Consider

Pflo(|5i|>e)

j) j) = Peo (e < |Si| and (b{6[- ) - (f>(6{~ <) M for all j)

l) l) +Peo (e < |Si| and \(§[~ ) - (6~[~ ) \ > M for some f)

< Me<|Si|<-El*U-^)| +^ >M)

V Ul 3=1 J 1=1

1 711 \ 1} M - - m)) + miV» (l^) - i-)l > ) 711 3=1 J i 1 ni 1 \

The first term goes to zero by the Weak Law of Large Numbers. The second term also goes to zero by assumption (4). We then have

P0o(|Si| > e) —> 0 as nx —)• oo. (5.7) 5.4. ASYMPTOTIC PROPERTIES OF THE WEIGHTS 105

We next show that S2 —^ 0 as nx —> oo. Consider

Peo(\S2\>e)

Pgo f^e < j^l and X-Jh < M /or a// j)

0 +Pgo (e < |52| and L$~ ) - > M /or some l^j

711 M

< Pgo\e< \S2\ <

ni < Poo | -U < 1 3=1 /

= Peo

The first term goes to zero by assumption (2). The second term also goes to zero by assumption (4). We then have

Pe°(\S2\ > e) —> 0 as % —>• oo. (5.8)

It then follows that

1 1 _ni_ / 2 1 ni 2 —D (X)-= - ) ~ H0[~J))) + Rn ni E (*y - 0(^)j + - E (5.9) 1 1 3=1 1 3=1 where Rn —> 0. Observe that the first term is independent of A. Therefore the second

term must be minimized with respect to A to obtain the minimum of ^Dni(\). We see that the second term is always non-negative. It then follows that, with probability tending to 1,

1 1 ni 2 1

3) —An(A) > -E (*y - Wi~ )) = -Dni(w), 3 = 1

j) { j) (cu) since (f>(9[- ) = (f>(9 - ) for A = w0 = (1,0,0,0)* for fixed m. 5.4. ASYMPTOTIC PROPERTIES OF THE WEIGHTS 106

Finally, we will show that

,(cv) P0° ; ^ —> w0, as rii —> oo.

Suppose that A^ —^ w0 + d where ol is a non-zero vector. Then there exist n0 such that for ni > no,

{cv) -Dni (\ ) >-Dni(w). Ul V / Til

This is a contradiction because A^ is the vector which minimizes ^Dni for any

fixed rii and the minimum of ^Dm(A) is unique by assumption, o To check the assumptions of the above theorem, let us consider the linear WLE

for two samples with equal sample sizes. Assumption (1) is satisfied since ^Dni(\) is a quadratic form in A and its minimum is indeed unique for each fixed ni. To check Assumption (2), consider

J=l ]=1

Til 1i:f-J-rE^-^ n-i ^f—' \ ni — 1 ^—'

±E(^T^-^T^)-"f

nx ^ \ ni — 1 ni — 1 '

Next we consider

ni 14—' \ / ni 1 i=i 1 i=i ni — E (^y _ (—l~^x\. -Xij ni~[\ \ni-l ni - 1

2 Ul n \ 1 2 p ( -J — ^E {Xij — Ai.) —^ var(Xn) < oo as ni —> oo. 5.5. SIMULATION STUDIES 107

For the last assumption of the previous theorem, consider

ni)i (CTJ) XL - (A^Xj. + \rx2)

(cu) XL - X,

It then follows from Lemma 5.1 that

(

Bl) (al-covnX^-X.y '•o(|^i )-^°)|>e) ='Peo n-2 > [(Xi.-x^ + ^EtXy-x^p i=l (al-covfiX.-X^ < 2 2 Ego (n - 2) e (xi:-x2y 1 Ego a\ — cov = o{n) < (n- 2)2e2' X\. — X2

2 since Xx. - X2. 0° - 0° ^ 0 and d - cow a\ - cov(Xu,X2l). Thus the

assumptions of the theorem are all satisfied. We then have

(cu) {cv) i; 0.

This is consistent with the result of Proposition 5.2.

Since the cross-validated weight function converges in probability to w0. There•

fore the asymptotic normality of 0X of using cross-validated weights follows by Theo•

rem 4.10.

5.5 Simulation Studies

To demonstrate and verify the benefits of using cross-validation procedures described in previous sections, we perform simulations according to the following algorithm which deletes jth point from each sample, i.e. delete-one-column approach. Step 1:

Draw random samples of size n from fi(x; 0j) and f (x; i 2 '2l> 5.5. SIMULATION STUDIES 108

2 2 MSE(WLE) n MSE(MLE) SD of (MLE-0?) MSE(WLE) SD of (WLE-00) MSE(MLE) 5 0.203 0.451 0.130 0.360 0.638

10 0.100 0.317 0.075 0.274 0.751

15 0.069 0.262 0.057 0.238 0.826 20 0.051 0.227 0.042 0.204 0.809

25 0.041 0.203 0.035 0.187 0.843 30 0.035 0.187 0.031 0.177 0.895

35 0.030. 0.173 0.028 0.166 0.931 40 0.025 0.159 0.023 0.153 0.932

45 0.023 0.151 0.023 0.151 0.997 50 0.020 0.141 0.020 0.141 1.007

55 0.018 0.135 0.019 0.139 1.066

60 0.017 0.129 0.018 0.133 1.057

Table 5.1: MSE of the MLE and the WLE and standard deviations of the squared errors for samples with equal sizes for N(0,1) and Af(0.3,1).

Step 2: Calculate the cross-validated optimum weights by using (5.2);

Step 3. Calculate the (MLE- 0°)2 and (WLE- 0°)2;

Repeat Step 1-3 for 1000 times. Calculate the averages and standard deviations

of the squared differences of MLE and WLE to the value of the true parameter 6^

respectively. Calculate the averages and standard deviations of the optimum weights.

We generate random samples from N(0,af) and N(c, of). For simplicity, we set

=

improvement in the MSE. For example, if we set o\ — a2 = 1 and c = 1, the ratio of the MSE for MLE and WLE will almost be 1 which means that the cross-validation procedure will not consider the second sample useful in that case. Some of the result 5.5. SIMULATION STUDIES 109 for c = 0.3 is shown Table 5.1.

n AVE. of Ai AVE. of A2 SD of Ai and A2 5 0.710 0.290 0.027 10 0.725 0.275 0.053 15 0.734 0.266 0.064 20 0.750 0.250 0.075 25 0.755 0.245 0.080 30 0.765 0.235 0.085 35 0.777 0.223 0.089 40 0.785 0.216 0.092 45 0.785 0.215 0.092 50 0.788 0.220 0.094 55 0.792 0.208 0.094 60 0.807 0.193 0.097

Table 5.2: Optimum weights and their standard deviations for samples with equal sizes from JV(0,1) and AT(0.3,1).

It is obvious from the table that the MSE of WLE is much smaller than that of the MLE for small and moderate sample sizes. The standard deviations of the squared differences for the WLE is less or equal to that of the MLE. This suggests that not only does WLE achieve smaller MSE but also its MSE has less variation than that of MLE. Intuitively, as the sample size increases, the importance of the second sample diminishes. As indicated by Table 5.2, the cross-validation procedure begins to realize this by assigning a larger value to Ai as the sample size increases. Although quite slowly, the optimum weights do increase towards the asymptotic weights (1,0) for the normal case. 5.5. SIMULATION STUDIES 110

We repeat the algorithm for Poisson distribution with V(3) and 7->(3.6). Some of the results are shown in Table 5.3 and Table 5.4. The result for Poisson distributions is somewhat different than that from the normal. The striking difference can be seen from the ratio of the average MSE of WLE and average of MSE of WLE. The WLE achieves smaller MSE on average when the sample sizes are less than 30. Once it is over 30, it seems that we should not combine the two samples. This is not the case for the normal until the sample size reaches 45.

We remark that the reduction in MSE will disappear if we set c = 1.5 in the above case.. Thus the cross-validation procedure will not combine two samples if the second sample does not help to predict the behavior of the first. We also emphasize that the value c in both cases are not assumed to be known to the cross-validation procedure.

n MSE(wle) MSE(mle) SD MSE(wle) SD MSE{mle) 5 0.312 0.558 0.235 0.484 0.753 15 0.142 0.377 0.127 0.357 0.896 25 0.120 0.347 0.114 0.338 0.950 30 0.104 0.323 0.104 0.323 1.000 35 0.077 0.277 0.081 0.284 1.054 40 0.074 0.272 0.076 0.275 1.025 45 0.072 0.268 0.075 0.274 1.045 50 0.057 0.238 0.065 0.255 1.141 ' 55 0.054 0.233 0.060 0.245 1.098 60 0.046 0.215 0.052 0.229 1.132

Table 5.3: MSE of the MLE and the WLE and their Standard deviations for samples with equal sizes from V(3) and V(3.6). 5.5. SIMULATION STUDIES 111

n AVE. of Ai AVE. of A2 SD of Ai and A2 5 0.710 0.289 0.027 10 0.729 0.270 0.057 15 0.738 0.261 0.065 20 ; 0.754 0.245 0.077 25 0.754 0.245 0.078 30 0.768 0.231 0.086 35 0.777 0.222 0.091 40 0.789 0.210 0.093 45 0.797 0.202 0.097 50 0.799 0.200 0.095 55 0.812 0.187 0.097 60 0.820 0.179 0.096

Table 5.4: Optimum weights and their standard deviations for samples with equal sizes from V(3) and "P(3.6)

We remark that simulations of using the delete-one-point approach have also been done. They give quite similar results. Chapter 6

Derivations of the Weighted

Likelihood Functions

In this chapter, we shall develop the theoretical foundation for using weighted like• lihood. Hu and Zidek (1997) discuss the connection between relevance weighted likelihood and maximum entropy principle for the discrete case. We also show that the weighted likelihood function can be derived from maximum entropy principle advocated by Akaike for the continuous case.

6.1 Introduction

Akaike (1985) reviewed the historical development of entropy and discussed the im• portance of the maximum entropy principle. Hu and Zidek (1997) discovered the connection between relevance weighted likelihood and maximum entropy principle for the discrete case. We offer a proof for the continuous case.

We first state the maximum entropy principle: all statistical activities are directed to maximize the expected entropy of the predictive distribution in each particular ap• plication. When a parametric model {p(x;6);6 e 6} of the distribution of a future

112 6.1. INTRODUCTION 113 observation x is given, the goodness of a particular model {p(x;9);9 € 0} as the predictive distribution of x is evaluated by the relative entropy

dx. p(x;9) where f(x) is the true distribution. In this expression, logf(x)/p(x;9) is defined as

-f-oo if f(x) > 0 and p(x; 9), so the expectation could be +oo. Although logf(x)/p(x; 9) is defined as — oo when f(x) — 0 and p(x; 9) > 0, the integrand, f(x)logf(x)/p(x; 9) is defined as zero in this case. The relative entropy is a natural measure of discrepancy between two probability functions.

Hence, maximizing B(f;p(.;9)) is equivalent to minimizing I(f,p(.;9)) with re• spect to 9. Without any restrictions, the desired density function is f(x) itself. However, the true density function f(x) is indeed unknown. We therefore use a set of

density functions, fi(x), f2(x),fm(x), say, to represent the true density function. The density function fi (x} represents the density function which is thought to be the "closest" to the true density function f(x). This resembles the idea of compromised MLE proposed by Easton (1991). To be consistent with our use of relative entropy, we use it in interpreting "re• semblance" of any density function g(x) to the fi(x) and define that term to mean

= 1, 2, 3..., ra.

The fa reflects the maximum allowable deviation from the density function fi(x). If ai is set to be zero, then g(x) takes exact the same functional form of fi(x). For a given set of density functions, we seek a probability density function which minimizes I(fi,g) = J f\(x)log ^rdx over all probability densities g satisfying

(6.1) 6.2. EXISTENCE OF THE OPTIMAL DENSITY 114 where a, are finite non-negative constants. This idea of minimizing the relative entropy under certain constraint is also similar to the approach outlined in Kull- back(1959, Chapter 3) for the hypothesis testing.

6.2 Existence of the Optimal Density

Let V be a reflexive Banach space. Let £ be a non-empty closed convex subset of V. Define a function 1(g) on £ into 7Z where g is a continuous function. We are concerned with minimization problem:

mi 1(g). gee To avoid trivial cases, we assume that the function 1(g) is proper, i.e. it does not take the value —oo and is not identically equal to -t-oo. We then have the following theorem.

Theorem 6.1 Assume that 1(g) is convex, lower semi-continuous and proper with respect to g. In addition, assume that the set V is bounded, i.e. there exist a constant

M, say, such that

sup 1(g) < M.

gev

Then the minimization problem has at least one solution. The solution is unique if the function 1(g) is strictly convex on T>.

Proof: (See, for example, Ekeland 1976, p 35. ) o Consider IJ spaces (1 < p < oo). It is known that IP (1 < p < oo) is reflexive

(Royden 1988, 227). Define 1(g) = / fx(x)log^dx for some given density h(x). It can be seen that 1(g) is a proper convex function and also continuous in g on LP. We also assume that 1(g) < oo. 6.2. EXISTENCE OF THE OPTIMAL DENSITY 115

Define

£i = {9 • J fi{x)log-^-^dx < cii, J g(x)dx = 1, g(x) >0,ge Lp}, i = 2,m, and

£ =

Lemma 6.1 The following inequality holds:

2 V (f1J2)/2

where V(h,f2) = J \f,(x) - f2(x)\dx.

Proof: (See Kullback, 1954) o

p l Lemma 6.2 If \g(x) — fi{x)\ ~ < Bi, for all x, 1 < p < 00 and Bi is a constant, i=2,...,m, then the set £ is bounded subset of LP.

Proof: It suffices to show that each £, is bounded in LP. For any density in £i, i = 2,.., m, we have

j\g(x)-fi(x)\pdx = f\g{x)-fi{x)\\g{x)-fi{x)\p-ldx

< BiJ \g(x) - fi(x)\dx

< Biy/2I(fi,g)

= BiV2a~.

This implies that \\g — fi\\p < Ci. By the Mankowski Inequality for LP spaces with 1 < p < 00, we have

\\g\\P <\\g- fi\\P + ll/tllp

It then follows that £i is bounded in LP. o We are now in a position to establish the existence of the optimal solution. 6.3. SOLUTION TO THE ISOPERIMETRIC PROBLEM 116

Theorem 6.2 For a given set of density functions fx, f2,fm, the minimization problem (6.1) has a unique solution if the admissible density functions satisfy the

conditions

\g(x) — fi(x)\p~l < Bi, for all x and g e Lp, 1 < p < oo.

Proof: Note that log function is strictly convex. The assertion of this theorem follows from Theorem 6.1 and Lemma 6.2. o

6.3 Solution to the Isoperimetric Problem

In order to find the solution of the problem, it is useful to state certain results in the calculus of variations. In 1744, Euler formulated the general Isoperimetric problem as the following:

Find a vector function (yi, y2,ym) which minimizes the functional

b

Z[yi,y2,--,ym] = J f{x,y1(x),...,ym(x),y[(x),...,ym(x))dx, (6.2) a and satisfies the initial conditions

Vi(a) = Vi, * = 1,2, ..,m, (6.3) as well as the terminal conditions

b yi{b) = y i, i = l,2,.,m, (6.4) while b

fi(x,yi(x),...,ym(x),y[(x),...,ym{x))dx = z = l,2, ...,m, (6.5)

where h,l2, ...,im are given numbers. We now state a fundamental theorem in the calculus of variations. 6.4. DERIVATION OF THE WL FUNCTIONS 117

l m Theorem 6.3 For (y\, y2, •••ym) € C[a, b] to be a solution ofthe Isoperimetric prob•

lem [(6.2)-(6.5)], it is necessary that there exist m + 1 constants t = (to, t\,tm) ^

(0,0) such that, for k = 1,2,m,

I I h yk(x,y1,y2, .:,ym,y'i,y'2, -••,ym,*) - T^h y,\x,y1,y2, ...,ym,y\,y2, ...,y'm,t) = 0 (6.6) where hi, = and

Vk dyk

r h (x, yx, y2, ...,ym, y[, y'2, y'm, t) = t0f(x, 3/1,3/2, ym, Vi, y'2, y'm)

P=i Proof: See, for example, Theorem 2 on p.90 in Giaquinta and Hildebrant (1996). o Note that the values of a and b are arbitrary. They can take the value of +00 and —00. The original proof is given in Bliss (1930). Detailed discussions can also be found in that paper.

6.4 Derivation of the WL Functions

We now establish the necessary condition for the optimal solution to the minimization

problem. Assume that the density functions /1, f2,fm are continuous and twice differentiable.

Theorem 6.4 (Necessary Condition) For g* to be the optimal solution, it is necessary

that it is a mxiture distribution, i.e., there exist constants t\,t\, ...,t*m such that m t* = 1, and i=i m

fc=i

Proof: There exist constants b2, 63,bm such that I(fi, g) = h < ai, % = 2, 3,m for any particular choice of g satisfying the constraints (6.1). We seek the optimal 6.4. DERIVATION OF THE WL FUNCTIONS 118 solution which is a point in a certain manifold in the function spaces. Thus the optimization problem can be re-formulated in the context of calculus of variations as follows

min7(g) = min [ fi(x) logdx geE g J g(x) such that g satisfies the following constraints:

fi(x) log^~ dx = bi, i = 2,...,m; 9{x) I g(x)dx = 1 and g(x) > 0.

Define ^(x,g) = fi(x)log^ + l0g(x) + f: lkfk(x)log^. By Theorem 6.3, it follows fc=i that the necessary condition for g* to be the optimal solution is that it has to satisfy the Euler-Lagrange equation, i.e. dip d dip . . ^-^W)=0' (6-9) where g = ||.

Notice that ip{x,g) is not a function of g . It implies that |4 = 0. The Euler equation then becomes dip

It follows that

.AM + iQ f JtW = o fa) 0 x=\ k g(x) We then have

g*(x)=J2fkfk(x), k=l

where t{ = l/l0, t* = lk/lo, k = 2,m. Since we seek a density function, it then follows that the sum of the ti must m m = be 1 since Jg*(x)dx = YJ % ^ also follows that g*(x) = YJ Kfk(x) > 0. fe=i fc=i 6.4. DERIVATION OF THE WL FUNCTIONS 119

Since if g*(x) = YJ t*kfk{x) < 0 for all x e K with Pi(K) > 0, the constraints fc=i f fi(x)log(fi(x)/g*(x))dx = h > 0 then will not be satisfied. This completes the proof, o Consider the minimization problem without any constraint. We then seek the optimal density function g* such that it minimizes I(f\,g) for any given f\(x). Ac• cording to Theorem 6.4, the necessary condition of the optimal function g* is that

g*{x) = t\f1(x).

Since t* = 0,i = 2,3, ..,m, then t{ = 1. It then follows that g*(x) = f\(x), a.e..

Furthermore, we have I(f\,g) > I(fi,g*) — I{fi, /i) = 0 for any density function g.

This result is also known as the Shannon-Kolmogorov Information Inequality. We establish the uniqueness of the optimal solution in the next theorem. m Theorem 6.5 (Uniqueness) Suppose g* = Y2tf.fi(x) w^ the t* chosen so that'g* i=i m satisfies the constraints (6.1) and J^t* = 1,0 < t* < 1, i = l,2,...,m. Then i=l g* uniquely minimizes I(fi,g) over all probability densities g satisfying constraints

(6.1).

. Proof: Suppose that there exist a probability density function g0 such that

f fi(x) log^~ dx < [ fi{x) log dx, J go{x) J g*(x) while

fi(x)log ^\ dx < a i = 2,...,m. / i: It follows that

fi{x) log g*{x) dx < f (x) log g (x) dx J j x 0 while

fi(x) log g*(x) dx< fi(x) log g (x) dx, i = . m. J J 0 2, 6.4. DERIVATION OF THE WL FUNCTIONS 120

We then have

771 „ 771 „

E*i / fiix) lo9 9*(x) dx < / h(x) lo9 9o(x) dx i=l J i=l J m p. m

/ E*.*/i(a;) l°9 9*(x) dx < Y^ifi^) lo9 9o(x) dx i=l J i=l

g*{x) log g*(x) dx < g*(x) log g (x) dx J j 0

It follows that I(g*,g0) < 0. However, we know that I(g*,ga) > 0. Therefore, g*{x) = go{x) for all x. This completes the proof, o

The weights t* are functions of ai, a2,am. To describe the relationships between t* and a,, we have the next theorem.

0 Theorem 6.6 Suppose there exists a = (ai, a2,am)* and5° = (Si, S2,<5m)' such

771 that there exists go = ^2Ufi(x) with ti chosen so that go achieves the equalities in 7 = 1 771 the constraints (6.1) and U = 1, 0 < U < 1, i = 1,2, ...,m for any a suc/i that i=i \a,i — a°\ < Sf. Then U are monotone functions of ai. Moreover,

< 0, i = 2,...,m,

da{

0,1 k^i

Proof: Let i(x) = fi(x) — fi(x). Therefore,

771

g*(x) = fx(x) + Y^^kix) k=2 1

and j i(x)dx = 0, i = 2,...,m. It also follows that

m fi(x) = g*(x) + i(x) - ^k4>k{x) > 0. fc=2 6.4. DERIVATION OF THE WL FUNCTIONS 121

This implies that m -[i(x)-Y,tkk{x)]<9*(x)- (6-10) fc=2 Since g* satisfies the constraints (6.1), it follows that, for 2 < i < m

l [M*)Jiy J logy dx]J Idtin = dtiJ g*(x) J-[//<(*) ^g Jl{x) dx] jfc=i

= - f [g*(x) + Hx) -f]W*(i)]^ dx V ^ 9 [X)

= ~ f 9*(x) ^f\dx- [ fciz) -f^tkMx)] dx J 9 (x) J 9*{x)

< - J i(x)dx + Jg*(x)^j^dx by (6.10)

= 0.

Therefore, it follows that, for i = 2, ...,m,

dti 1 r9oo - >»— < 0. °a% du It also follows that d da;

since rjx + rj2 + ... + tm = 1. This completes the proof, o

Theorem 6.7 The weights ti are all between 0 and 1.

Proof: Note that if we set a; = 0, then ti — 1; if a, = oo, then rjj = 0. Since rjj is a

monotone function of at for any fixed a3,i ^ j, it follows that 0 < ti < 1. o

The distributions functions fi, /2,fm are, in fact, unknown. We have to derive the optimum function by using samples from different populations. The derivation of 6.4. DERIVATION OF THE WL FUNCTIONS 122 the weighted likelihood function for the discrete case is given in Hu and Zidek (1994). We now generalize their derivation of the weighted likelihood function.

Theorem 6.8 Assume that the optimal distribution takes the functional form f(x).

Given Xi3,i = 1,2, ...,m,j = l,2,...,n;, the optimization problem considered in the

this section is equivalent to optimizing the weighted likelihood function.

Proof: By the proof of Theorem 6.4 and the Lagrange theorem, we need to choose the optimal density function g* which minimizes

The minimization problem considered is then equivalent to maximizing first term in the above equation, i.e

m maxY^rJi / fi(x)logf(x)dx

m max E k ! log f(x; 6)dFi(x).

However distributions fi(x), f2{x),fm(x) are unknown. Their natural estimate in non-parametric context would replace by its empirical counterpart. Assume that the the optimal density function takes the same functional form of f\. The estimate of the parameter of the optimal distribution would be found as 6.4. DERIVATION OF THE WL FUNCTIONS 123

This implies that the estimate of parameter of the optimal density is equivalent to finding the WLE derived from the weighted likelihood function if the functional form of the optimal density function is known. This completes the proof, o We have shown that the optimal function takes the form

m

9*(x) = ^2tkfk(x). k=l However the density function g* does not exist if the constraints define an empty set. Let us consider a relative simple situation where three populations are involved. Recall that the ti need to satisfy the following condition:

h + t2 + h = i,

0 < U < 1, i = 1,2,3.

The above condition is equivalent to the following:

0<*2 + *3

0 < t2 < 1;0 < t3 <1.

If we set a2 = a3 = 0, then there is no probability distribution satisfying the con•

straints since a probability density function can not take the functional form of f2 and

z /3 at the same time if f2i fz- The reason is as follows. In order to satisfy the con•

dition a2 = 0, the weight t2 must be set to 1. We must also have t3 = 1 for the same

reason. Clearly, this set of weights no longer satisfies the condition t\ + t2 + t3 = 1.

Lemma 6.3 The following inequality holds: m m

I{fi,J2tkfk)

-log(^2tkfk(x)) <-^2tklogfk(x). k=l fc=l 6.4. DERIVATION OF THE WL FUNCTIONS 124

We then have

h{x) /(/*,$>/*) = [logm Mx)dx

k=l m < / [logfi(x) - Yjklog'fk(x)]fj(x)dx "* k=l m m [Y,tklogfi(x) - y^Jklogfk(x)]fi{x)dx / k=l k=l m

k=l This completes the proof, o Let D = (d^) where ( 1 if i = l dij — J(fiJj) if i = 2,3, ...m.

t and omxi = (l,a2,....,am) .

Theorem 6.9 (Existence) The optimal solution does not exist if

rank(D) < rank(B) (6.12)

where £mx(m+1) = [Dmxm : a]. m m Proof: Note that /(/i, YJ hfk) is bounded by YJ £<./(/;,//;) by Lemma 6.3. Set k=l k=l m m ' ai = YJ tkI(fi, fk), i = 2, 3,m. Note that Y_) tk = 1. We then have the following k=l k=l simultaneous linear equations in tf.

Dt = a.

By a result from elementary linear algebra, the assertion of the Theorem follows, o Chapter 7

Saddlepoint Approximation of the

WLE

7.1 Introduction

In the context of weighted likelihood estimation, the i.i.d. assumption is no longer valid. Furthermore, the sample sizes are usually moderate or even very small. The powerful saddlepoint technique is applied to derive the approximate distribution of WLE from exponential family. The saddlepoint approximation for estimating equa• tions proposed in Daniels (1983) is further generalized to derive the approximate density of the WLE derived from an estimating equation.

7.2 Review of the Saddlepoint Approximation

In a pioneering paper, Daniels (1954) introduced a new idea into statistics by applying saddlepoint techniques to derive a very accurate approximation to the distribution of the sample mean for the i.i.d. case. It is a general technique which allows us to

125 7.2. REVIEW OF THE SADDLEPOINT APPROXIMATION 126

compute asymptotic expansions of of the form

/ evw{z){z)dz (7.1) Jv when the real parameter v is large and positive. Here w and are analytic functions of z in a domain of the complex plane which contains the path of integration V. This technique is called the method of steepest descent and is used to derive saddlepoint approximations to density function of a general statistic.

Consider the integral (7.1). In order to find the approximation we deform arbi• trarily the path of integration V provided we remain the domain where w and (j) are analytic. We deform the path V such that

(i) the new path of integration passes through a zero of the derivative w (z);

(ii) the imaginary part of w, lw(z) is constant on the new path.

If we write

z = x + iy, z0 = x0 + iy0,

w(z) = u(x,y) + iv(x,y), w'(z0) = 0, and denote by S the surface (x,y) —> u(x,y), then by Cauchy-Riemann differential equations du dv du dv dx dy' dy dx'

it then follows that the point (x0,y0) can not be a maximum or minimum but a saddlepoint on the surface S. Moreover, the orthogonal trajectories to the level curves u(x,y) = constant are given the the curves v(x,y) — constant. It follows that the paths on S corresponding to the orthogonal trajectories of the level curves are paths of steepest descent. We will truncate the integration at certain point on the paths of steepest descent. The major part of the integration is then used to approximate 7.2. REVIEW OF THE SADDLEPOINT APPROXIMATION 127 the integration on the complex plane. Detailed discussions can be found in Daniels (1954).

Suppose that Xi, X2,Xn are i.i.d. real-valued random variables with den•

sity / and Tn(Xi, X2,Xn) is real-valued statistic with density /„. Let Mn(a) =

at f e fn{t)dt be the moment-generating function, and Kn(a) = logMn(a) be the cumulant-generating function. Further suppose that the moment generating func•

tion Mn(a) exists for real a in some non-vanishing interval that contains the origin. By Fourier inversion,

irx fn(x) = -!- I" Mn(zr)e- dr 2TT 7_OO

= 2^ / M-(nz)e~nZXdz

rr+ioo / \ n

- / • exp( n(Rn(z) — zx) J dz, 2iri1 Jr-ioo ^ ' where I is the imaginary axis in the complex plane, r is any in the interval where the moment generating exists, and

Rn(z) = Kn(nz)/n.

Applying the saddlepoint approximation to the last integral gives the approxima• tion of /„ with uniform error of order n~1:

9n{x) = y2irg,^a exp(n[Rn{a0)-a0x}), (7.2) where «o is the saddlepoint determined by the equation

K(«o) = *,

where R!n and R'^ denote the first two derivatives of R^. Detail discussions of the saddlepoint can be found in Daniels (1954) and Field and Ronchetti (1990). 7.3. RESULTS FOR EXPONENTIAL FAMILY 128

n n! Stirling Saddlepoint r.e. of Stirling r.e. of saddlepoint 1 1 . 0.92 0.99 0.07786 ' 0.00102 2 2 1.92 1.99 0.04050 0.00052 3 6 5.83 5.99 0.02730 0.00028 4 24 23.50 24.00 0.02058 0.00017 5 120 118.02 119.99 0.01651 0.00016 6 720 710.08 719.94 0.01378 0.00008 7 5040 4980.40 5039.69 0.01183 0.00006 8 40320 39902.87 40318.05 0.01036 0.00005 9 362880 359537.62 362865.91 0.00921 0.00004

Table 7.1: Saddlepoint Approximations of r(?7, + 1) = n\

It is known that the Stirling.formula serves as a very good approximation to the

Gamma function. The comparison of accuracies between the Stirling formula and the saddlepoint approximation based on the the first two terms is given in the above table with the last two columns for the relative error for using Stirling formula and saddlepoint approximation respectively. The saddlepoint approximation is given by \/2~/Trj"+1/2e~n (l + j^)- Notice that the expression before (l + ^) is exactly the

Stirling Formula.

7.3 Results for Exponential Family

The saddlepoint approximation stated in the last section is for a general statistic constructed by a series of i.i.d. random' variables. In this section, we derive the saddlepoint approximation to the distribution of a sum of a finite number of random variables that are independent but not identically distributed. 7.3. RESULTS FOR EXPONENTIAL FAMILY 129

Suppose that we consider the distribution of the following statistic:

m

S(X) = Tj(Xji, Xj2, Xjni),

where X{3- are i.i.d for any given i. But (Xij) and (Xjj) with i ^ % do not follow the same distribution in general.

Theorem 7.1 The saddlepoint approximation to the density function of the random variable defined by the convolution is given by:

/ \1/2

fs(x) exp\^ii^^Ri(nial)lnx-alx)j (7.3) , 27rgi^'(nia5)/ni \ i=l where OJQ satisfying the following equation

m

1=1

Proof: The moment generating function of 5(X) is Mi x M2... x Mm, where Mm is

the moment generating function of Ti(Xn, Xi2, ...,Xini). By Fourier inversion,

/S(.i.-a.....m,(^) = ^ J^M^ir) M2(ir)...Mm(ir)e-^dr

zx = ^ [' Mx(nxz) M2{nlz)...Mm{n1z)e-^ dz 2m Jz

rT+ioo / . m . \ - / exp I ni ( Ri[n-\_z) Ini — zx) ) dz, 2mWr-io o V Kl=1 ') where Ri is the cumulant generating funtion of Tj.

Applying the saddlepoint technique we derive the approximate density for 5(X):

/ \1/2 n n n a n fs, dx) = exp ( i ( Rd i o)l i — OLQX^ J(ni,H2 "m) v ' 27rE^i'(niao)/ni V i=i i=i where a:^ is the root of the equation YJ -Rj(^i«o)/ni = x- ° t=i 7.3. RESULTS FOR EXPONENTIAL FAMILY 130

Example 7.1 (The Sum of Sample Means) Let us examine the distribution of Wn where

Wn = —(Xu + X12 + ... + Xlni) + -(X21 + X22 + ... + X2n2).

ni n2

ni n2 The moment generating function of Wn is Mi(^-) x M2(~) where M\ and M2

are the moment generating functions of Xn and X2i respectively. Let K\ and K2 be

the cumulant generating function of Xn and X2i respectively. It then follows

Ri(niz)/ni = riiKi [ — ] /rii = —K{ f — z ) , i = l,2. \ rii J m \m J The saddlepoint in this case is then a root of the equation

K —Kx (z +—TT * \—z )=x. oz rii oz \ n2 )

The saddlepoint approximate density of Wn is

\ 1/2

f I \ I NI

2TT (j^Kx(al) + ^K2(al))

exp (ni[Ki(ao) + —K2(—a*0) - a*0t) ] . ni n V V Til Tlo2 / I

1 Assume that n\ = n2 = n and A^ ) = = ii'. We then have

x = fw»( ) o o 32 R,, exp (n(2K(a*0) - a*0x)) .

\27r2-^K(a0) J

The sample average of the combined sample is Wn/2. It then follows that

1/2

1/2 2n where CHQ is the root of the equation

2—AT(z) = x = 2y. 7.3. RESULTS FOR EXPONENTIAL FAMILY 131

It then implies that aig indeed satisfies the following equation

This is exactly the saddlepoint a0 for an i.i.d. sample with size 2n. Thus, the

saddlepoint approximation of the density of Wn/2 by Theorem 7.1 when the random variables from both samples are indeed i.i.d. is exactly the saddlepoint approximation of the sample mean of a i.i.d. sample with size 2n.

Example 7.2 (Spread Density for the Exponential) If Yi, Y2,Ym+\ are independent, exponentially distributed random variables with common mean 1, then the spread,

— V(m+i) the difference between maxima and minima of the sample, has the density

This is also the density of the sum Xi + X2 + ... + Xm of independent, exponentially distributed random variables Yj with respective means 1,1/2,1/m. A proof of this claim is sketched in Problem 1.13.13 in Feller (1971).

It follows that the cumulant generating function of the sum £>(X) = X\ + X2... + Xm mm. m • is YJ Ri(z) = — YJ ^n(l — zlJ)- The equation YJ R\{z) — x m Theorem 7.1 becomes i=l j=l i=l YJ 1/ (j< — z) = x which needs to be solved numerically. Due to the unequal variances of Yj, the normal approximation does not work well. Lange (1999) calculates the saddlepoint approximation for this particular example. Note that the following table is part of Table 17.2 in Lange (1999). The last column is the difference between the exact density and the saddlepoint approximation. It can be seen that saddlepoint approximation gives a very accurate approximation. Lange (1999) also shows that the saddlepoint approximation out-performs the Edge- worth expansion in this example. 7.3. RESULTS FOR EXPONENTIAL FAMILY 132

X Exact Error 0.5 .00137 -.00001 1.0 .05928 -.00027 1.5 .22998 .00008 2.0 .36563 .00244 2.5 .37874 .00496 3.0 .31442 .00550 3.5 .22915 .00423 4.0 .15508 .00242 4.5 .10046 .00098 5.0 .06340 .00012 5.5 .03939 -.00026 6.0 .02424 -.00037 6.5 .01483 -.00035 7.0 .00904 -.00028 7.5 .00550 -.00021 8.0 .00334 -.00014

Table 7.2: Saddlepoint Approximation of the Spread Density for m = 10.

We now consider the saddlepoint approximation to the distribution of the WLE in the general exponential family. It has been shown that the WLE takes the form V m \

9 ( Zj AJTJ(XJI, Xini) j for some smooth function g under fairly general conditions.

Theorem 7.2 Assume that g is a smooth function. The saddlepoint approximation 7.3. RESULTS FOR EXPONENTIAL FAMILY 133

to the density of g (j£ X^X^,X^^j is

1/2 ( \ ni SwLEiy) 27r£X(A mO/n i=i i 1 J / m 1 '.exp I ni( J^i2i(Ajniao)M - ao9~l(y)) \ t=i

where satisfies the following equation

m

i=l

Proof: We first derive the approximate density function for S = YJ AjIi(Yjx, Yini). i=i The cumulant generating function of AjTj is Ri(Xiz) where Ri(z) is the cumulant generating function of Tj. By Theorem 7.1, the approximate density function of S = E AjTj is given by t=i

1/2

nx exp [ nx (^2 Rii^i^o )/ni - a^x) 27rY:fl;'(A n <)/n i=i i 1 1 i=i where is the root of the equation

m ^i?-(AjnxQ;^)/nx = x. i=l

m

Y2, AjTj(Yix,Xini ( i=i is given by

1/2 \ ni IWLE{V) =

. 27rE^'(AlniO/m \ i=i

1 xexp I ni(^Ri(\iniao)/ni - a*0g (y)) l i=i g\9-{y)) 7.4. APPROXIMATION FOR GENERAL WL ESTIMATION 134 where OJQ satisfies the following equation

m

w YJ^iniz){a 0)/n1 = g-\y). i=i This completes the proof.

7.4 Approximation for General WL Estimation

The saddlepoint approximation to estimating equations for the i.i.d. case is devel•

oped by Daniels (1983). We generalize the techniques to the WLE derived from the

estimation equation constructed by the Weighted Likelihood Function. Recall that

the WLE is the root of the following estimating equation:

m n; E^EJ-W;«I)=0- (7.4) i=l j=l 1

For simplicity, let tp(Xij\6i) = •J^logf(Xij\6i). The estimating equation for WLE

can be written as

m rii

Yt^^(Xij;91) = 0. (7.5) t=l 3=1 Assume that ib(Xij-,6) is a monotone decreasing function of 0 and Aj > 0, i = l,2,...,m. Write m rii

S(a) = ^2\l^(XlJ;a), i=l j=\ where a is a fixed number.

Let aig be the root of the equation

m ^ a y^njAj—Ki(ni\jZ, a) = x. 2=1 We have the following theorem:

Theorem 7.3 Let 0\ be the WLE derived from the weighted likelihood equation. Then

poo

P0l (0! > a) = P(S > 0) ~ / fs(x, a)dx (7.6) Jo 7.4. APPROXIMATION FOR GENERAL WL ESTIMATION 135 and 1/2 \ 7li fs(x) =

(7.7) 2?r i=Ei nihiX^g-sK^nxXiZ, a)\z=Qs ( m i=i

and exp (K~i(t, a)) = Eg{exp (tip(Xij, a)) ,i = 1, 2,m.

Proof: The moment generating function of S(a) is

Ms(t,a) = exp(Ks(t,a)) m ni = (7 7 ( \ exp tX Xi a x II /'•••/ \ iY,^( i> } ) Ylf( ij^i)dxn...dxini i=l Voo -oo V J=l / ^ m = HexpikiiXAa))11' i=l

m (y^nji^^Ai^a) i=i /

where exp (K~i(t, a)) = Egiexp(ttp(Xij,a)) ,i = 1,2, ...,m. The function Ms(t,a) is assumed to converge for real i in a non-vanishing interval containing the origin.

It then follows that m i 7 ( \ dr fs(x) = ^— / exp 1 Y niKjjirXj, a) j exp (-irx) -oo T+ioo ni niKi(n\XiZ,a) — n\xz dz 2ni T —lOO

T+ZOO 7 ( m n- \ j exp yni — A"i(riiAj2;, a) — xz) J dz 2TTZ

The saddlepoint OJQ is the root of the equation

d En»Aj—ifi(n]AiZ,a) = x. i=l 7.4. APPROXIMATION FOR GENERAL WL ESTIMATION 136

It then follows that the saddlepoint approximation to the density fs(x) is given by

/ \1/2

fs(x) = 111 \ ^Wi*i-^Ki(ni\iZ,a)\ s) ( z=a

m

nl(^^ni^iKi{nl^izia)\z=as ~ xa0 ) ) • ( 0 We can then deploy the device used in Field and Hampel (1982) and Daniels (1983)

Pg1(91>a) = P01(S(a)>O). (7.8)

We then have /•OO

P9l (6X > a) = P(S > 0) ~ / fs(x, a)dx. Jo In general, the saddlepoint approximation is very computational intensive since the normalizing constant needs to be calculated by numerical integration. We remark that the saddlepoint approximations to the WLE proposed in this chapter are for fixed weights. The saddlepoint approximation to the WLE with adaptive weights needs further study. Chapter 8

Application to Disease Mapping

8.1 Introduction

In this chapter, we present the results of the application of the maximum weighted likelihood estimation to parallel time series of hospital-based health data. Specifically, the weighted likelihood method is illustrated on daily hospital admissions of respiratory disease obtained from 733 census sub-division (CSD) in Southern Ontario over the May-to-August period from 1983 to 1988. Our main interest is on the estimation of the rate of weekly hospital admissions of certain densely populated areas. The association between air pollutants and respiratory morbidity are studied in Zidek et al. (1998) and Burnett (1994). For the purpose of our demonstration, we will consider the estimation of the rate of weekly hospital admissions of CSD # 380 which has the largest yearly hospital admissions total among all CSD's from 1983 to 1988. The estimation of the rate of weekly admissions is a challenging task due to the sparseness of the data set. The original data set contains many 0's which represent no hospital admissions on most of the days in the summer. On certain days of the summer, however, quite a number of

137 8.2. WEIGHTED LIKELIHOOD ESTIMATION 138

.9 o

I

uwuu 20 40 60 80 100 120 Days in the summer of 1983

Figure 8.1: Daily hospital admissions for CSD # 380 in the summer of 1983. people with respiratory disease went to hospital to seek treatments due perhaps to the high level of pollution in the region. For CSD # 380 that has the largest number of hospital admissions among all the CSD's, there are no hospital admission for a total of 112 days out of 123 days in the summer of 1983. However there were 17 hospital admissions on day 51. The daily counts of this CSD are shown in Figure 8.1. The problem of data sparseness and high level of variation is quite obvious. Thus we will consider the estimation of the rate of weekly admissions instead of the daily counts.

There are 17 weeks in total. For simplicity, the data obtained in the last few days of each year are dropped from the analysis since they do not constitute a whole week.

8.2 Weighted Likelihood Estimation

We assume that the total number of hospital admissions of a week for a particular

CSD follows Poisson distribution, i.e., for year q, CSD i and week j,

q Y* ~ V {e i3) , j = 1, 2,.., 17;'t = 1, 2,733; q=l,2,6. 8.2. WEIGHTED LIKELIHOOD ESTIMATION 139

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 CSD 380: Weeks in 1983

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 CSD 362: Weoks in 1983

1 2 3 4 5.6 7 8 9 10 11 12 13 14 15 16 17 CSD 367: Weeks in 1983

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 CSD 367: Weeks in 1983

Figure 8.2: Hospital Admissions for CSD # 380, 362, 366 and 367 in 1983.

The raw estimate of which is Y£ is highly unreliable. The sample size is only 1 in this case. Also each CSD may contain only a small group of people whose lung conditions are susceptible to the high levels of air pollution. Therefore we think that it is desirable to combine neighboring CSD's in order to produce a more reliable estimate. For any given CSD, the neighboring CSD's are defined to be CSD's that are in close proximity to the CSD of interest, CSD #380 in our analysis. To study the rate of weekly hospital admissions in a particular CSD, we would expect that neighboring subdivisions contain relevant information which might help us to derive 8.2. WEIGHTED LIKELIHOOD ESTIMATION 140 a better estimate than the traditional sample average. The importance of combining disease and exposure data is discussed in Waller et al. (1997). The Euclidean distance between the target CSD and any other CSD in the dataset is calculated by using the longitudes and latitudes. CSD's whose Euclidean distances are less than 0.2 from the target CSD are selected as the relevant CSD. For CSD # 380, neighboring CSD's are

CSD # 362, 366 and 367. The time series plots of weekly hospital admissions for those CSD's in 1983 are shown in Figure 8.2. It seems that the hospital admissions of these CSD's at a given week might be related since the major peaks in the time series plot occurred at roughly the same time point. However, including data from other

CSD's might introduce bias. The weight function defined in the WLE can control the degree of bias introduced by the combination of data from other CSD's.

Ideally, we would assume that 0?- = Q\ for j = 1,2, ...,17. But this assumption does not hold due to seasonality. For example, week 8 always has the largest hospital admissions for CSD 380. By examining the data more closely, we realize that the 8th week for each year is a week with more than normal hospital admissions. In 1983, there are 21 admissions in week 8 while the second largest weekly count is only 7 in week 15. In fact, week 8 is an unusual week through 1983 to 1988. The air pollution level might further explain this phenomenon by using the method proposed in Zidek et al. (1998). Thus, the assumption is violated at week 8. One alternative is to perform the analysis on a window of a few weeks and repeat the analysis while we move the window one week forward. This is equivalent of assuming that 0?- take the same value for a period of a few weeks instead of the entire summer. In this chapter, we will simply exclude the observations from week 8 and proceed with the assumption that 0?- = Q\ for j = 1,2, ...7,9, ..17. The fact that the sample means and sample variances of the weekly hospital admissions for those 16 weeks of CSD 380 are quite close to each other supports our assumption. 8.3. RESULTS OF THE ANALYSIS 141

Let YL to be the overall sample average for a particular CSD i for a given year. For Poisson distributions, the MLE of 6\ is the sample average of the weekly admissions of CSD #380 and the WLE is a linear combination of the sample average for each CSD according to Theorem 3.1. Thus, the weighted likelihood estimate ofthe average weekly hospital admissions for a CSD, 6\, is

4 WLE" = ^2X^1 a = 1,2.., 6. i=l For our analysis, the weights are selected by the cross-validation procedure proposed in Chapter 5. Recall that the cross-validated weights for equal sample sizes can be calculated as follows:

j 9 where bq{y) = £ Y^vf \ and Aq(y)lk = £ Y^Y ^, i = 1, 2, 3,4; k = 1, 2, 3, 4. j=i • j=\

8.3 Results of the Analysis

We assess the performance ofthe MLE and the WLE by comparing their MSE's. The MSE of the MLE and the WLE are defined as, for q = 1,2,.., 6,

g 2 MSEUei) = Eel(Y h-ei)

In fact, the 9\ are unknown. We then estimate the MSEM and MSEw by replacing 9\ by the MLE. Under the assumption of Poisson distributions, the estimated MSE for the MLE is given by:

q MSE M = var(Yu)/16, o=l,2,...6. 8.3. RESULTS OF THE ANALYSIS 142

The estimated MSE for the WLE is give as following: ( m i=i

4 4 / m

i=l k=l \t=l

The estimated MSE for the MLE and the WLE are given in the following table. It can be seen from the table that the MSE for the WLE is much smaller than that of the MLE. In fact, the average reduction of the MSE by using WLE is about 25%.

q 9 q q Year 7 MLE 7 WLE 16 MSE M 16 MSE W M^E w/MSE M

1 .185 .174 .101 .084 0.80 2 .328 .282 .241 .131 0.87 3 .227 .257 .286 .143 0.54 4 .151 .224 .159 .084 0.96 5 .303 .322 .298 .130 0.80 6 .378 .412 .410 .244 0.54

Table 8.1: Estimated MSE for the MLE and the WLE.

Combining information across these CSD's might also help us in the prediction since the patterns exhibited in one neighboring location in a particular year might manifest itself at the location of interest the next year. To assess the performance of the WLE, we also use the WLE derived from one particular year to predict the overall weekly average of the next year. The overall prediction error is defined as the average of those prediction errors. To be more specific, the overall prediction errors 8.4. DISCUSSION 143 for the WLE and the traditional sample average are defined as follows:

1 PRED £(n-FT)2; V

1 +I PRED J2{WLE^-Yl y

The average prediction error for the MLE, PredM, is 0.065 while the Predw, the average prediction error for the WLE, is 0.047 which is about 72% of that of the MLE.

8.4 Discussion

Bayes methods are popular choices in the area of disease mapping. Manton et al. (1989) discuss the Empirical Bayes procedures for stabilizing maps of cancer mortality rates. Hierarchical Bayes generalized linear models are proposed for the analysis of disease mapping in Ghosh et al. (1999). But it is not obvious how one would specify a neighborhood which needs to be defined in these approaches. The numerical values of the weight functions can be used as a criterion to appropriately define the neighborhood in the Bayesian analysis. We will use the following example to demonstrate how a neighborhood can be defined by using the weight functions derived from the cross-validation procedure for the WLE.

From Table 8.3, we see that there is strong linear association between CSD 380 and CSD 366. However, the weight assigned to CSD 366 is the smallest one, It shows that CSD's with higher correlation contain less information for the prediction since they might have too similar a pattern to the target CSD for a given year to be helpful in the prediction for the next year. Thus CSD 366 which has the smallest weight should not be included the analysis. Therefore, the "neighborhood" of CSD 380 in 8.4. DISCUSSION 144

CSD 380 CSD 362 CSD 366 CSD 367 Weights CSD 380 1.000 0.421 0.906 0.572 0.455 CSD 362 0.421 1.000 0.400 0.634 0.202 CSD 366 0.906 0.400 1.000 0.553 0.128 CSD 367 0.572 0.634 0.553 1.000 0.215

Table 8.2: Correlation matrix and the weight function for 1984. the analysis should only include CSD 362 and CSD 367. In general, we might examine those CSD which are in close proximity to the target CSD. We can calculate the weight for each CSD selected by using the cross-validation procedure. The CSD with small weights should be dropped from the analysis since they are not deemed to be helpful or relevant to our analysis according to the cross- validation procedure. We remark that the weight function can also be helpful in selecting an appropriate distribution that takes into account the spatial structure. Ghosh et al. (1999) propose a very general hierarchical Bayes spatial generalized model that is considered broad enough to cover a large number of situations where a spatial structure needs to be incorporated. In particular, they propose the following:

0i = qi = z-b + Ui + Vi, i = 1, 2,m where the qi are known constants, Xi are covariates, Ui and Vi are mutually indepen• dent with Vi J~d' N(0, a2) and the Ui have joint pdf

m ( -E&-«,)H/(^) i=l j^i where uiij > 0 for all 1 < i ^ j < m. The above distribution is designed to take into account the spatial structure. In their paper, they propose to use uii = 1 if location % and j are neighbors. They also mention the possibility of using the inverse 8.4. DISCUSSION 145 of the correlation matrix as the weight function. The weights function derived from the cross-validation procedure might be a better choice since it takes account of the spatial structure without any specific model assumptions.

The predictive distribution for the weekly total will be Poisson (WLE). We can then derive the 95% predictive interval for the weekly average hospital admissions.

This might be criticized as failing to take into account the the uncertainty of the unknown parameter. Smith (1998) argues that the traditional plug-in method has a small MSE compared to the posterior mean under certain circumstances. In particu• lar, it has a smaller MSE when the true value of the parameter is not large. Let CIw and CIM be the 95% predictive intervals of the weekly averages calculated from the

WLE and the MLE respectively. The results are shown in the following table.

Year CIM CIw 1983 [0,3] [0, 3] 1984 [0, 5] [0,4]

1985 [0, 4] [0, 4] 1986 [0, 3] [0, 4] 1987 [0, 4] [0, 5] 1988 [0, 5] [0, 6]

Table 8.3: MSE of the MLE and the WLE for CSD 380.

We remark that this chapter is merely a demonstration of the weighted likelihood method. Further analysis is needed if one wants to compare the performances of the

WLE, the MLE and the Bayesian estimator in disease mapping. Bibliography

[1] Akaike, H. (1985). Prediction and entropy, In: A Celebration of Statistics 1-24,

Edited by Atkinson, A. C. and Fienberg, S. E., Springer-Verlag, New York.

[2] Bliss, G. A. (1930). The problem of lagrange in the calculus of variations. The

American Journal of Mathematics 52 673-744.

[3] Breiman, L. and Friedman, H. J. (1997). Predicting multivariate responses in

multiple regression, Journal of Royal Statistical Society: Series B 36 111-147.

[4] Burnett, R. and Krewski, D. (1994). Air pollution effects on hospital admission

rates: A random effects modeling approach. The Canadian Journal of Statistics 22 441-458.

[5] Cox. D. R. (1981). Combination of data. Encyclopedia of Statistical Sciences 2 45-52, John Wiley k Sons, Inc., New York.

[6] Csiszar, I. (1975) I-divergence geometry of probability distributions and mini• mization problems. The Annals of Probability 3 146-158.

[7] Daniels, H. E. (1954). Saddlepoint approximation in statistics. The Annals of

Mathematical Statistics 25 59-63.

[8] Daniels, H. E. (1983). Saddlepoint approximation for estimating equations.

Biometrika 70 89-96.

146 [9] Dickey, J. M. (1971) The weighted likelihood ratio, linear hypotheses on normal

location parameters. The Annals of Mathematical Statistics 42 204-223.

[10] Dickey, J. M and Lientz, B. P. (1970) The weighted likelihood ratio, sharp hy•

potheses about chances, the order of a Markov chain. The Annals of Mathematical Statistics 41 214-226.

[11] Easton, G. (1991). Compromised maximum likelihood estimators for location.

Journal of the American Statistical Association 86 1051-1064.

[12] Eguchi, S. and Copas, J. (1998). A class of local likelihood methods and near-

parametric asymptotics. Journal of Royal Statistical Society, Series B 60 709-724.

[13] Ekeland, I. and Temam, R. (1976). Convex Analysis and Variational Problems. American Elsevier Publishing Company Inc., New York.

[14] Feller, W. (1971) An Introduction to Probability Theory and Its Applications, Vol 2. John Wiley k, Sons, Inc., New York.

[15] Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman and Hall, New York.

[16] Field, C. A. and Hampel, F. R. (1982). Small Sample Asymptotics Distributions of M-estimators of Location. Biometrika 69 29-46.

[17] Field, C. A. and Ronchetii, E. (1990). Small Sample Asymptotics. Institute of Mathematical Statistics, Hayward.

[18] Geisser, S. (1975). The predictive sample reuse method with applications, Jour•

nal of the American Statistical Association 70 320-328.

147 [19] Genest, C. and Zidek, J. V. (1986). Combining probability distributions: a cri•

tique and an annotated bibliography. Statistical Science 1 114-148.

[20] Ghosh, M., Natarajan, K., Waller, L. A. and Kim, D. (1999). Hierarchical Bayes

GLMs for the analysis of spatial data: An application to disease mapping. Jour•

nal of Statistical Planning and Inference 75 305-318.

[21] Giaquinta, M. and Hildebrandt, S. (1996). Calculus of Variations. Springer- Verlag Series, New York.

[22] Hardle, W. and Gasser, T. (1984). Robust nonparametric function fitting. Journal of the Royal Statistical Society, Series B 46 42-51.

[23] Hu, F. (1994). Relevance Weighted Smoothing and A New Bootstrap Method, Ph.D. Dissertation, Department of Statistics, University of British Columbia, Canada.

[24] Hu, F. (1997). The asymptotic properties of the maximum-relevance weighted likelihood estimators. The Canadian Journal of Statistics 25 45-59.

[25] Hu, F., Rosenberger, W. F. and Zidek, J. V. (2000). Relevance weighted likelihood for dependent data. Metrika 51 223-243.

[26] Hu, F. and Zidek, J. V. (1997). The relevance weighted likelihood with applications. In: Empirical Bayes and Likelihood Inference 211-235, edited by Ahmed, S. E. and Reid, N., Springer, New York.

[27] Hunsberger, S. (1994). Semiparametric regression in likelihood-based models. Journal of the American Statistical Association 89 1354-1365.

[28] Kullback, S. (1954). Certain inequalities in information theory and the Cramer-Rao inequality. The Annals of Mathematical Statistics 25 745-751.

[29] Kullback, S. (1959). Information Theory and Statistics. Lecture Notes-Monograph Series Volume 21, Institute of Mathematical Statistics.

[30] Lange, K. (1999). Numerical Analysis for Statisticians. Springer-Verlag, New York.

[31] Lehmann, E. L. (1983). Theory of Point Estimation. John Wiley & Sons Inc., New York.

[32] Markatou, M., Basu, A. and Lindsay, B. (1997). Weighted likelihood estimating equations: The discrete case with applications to logistic regression. Journal of Statistical Planning and Inference 57 215-232.

[33] Markatou, M., Basu, A. and Lindsay, B. (1998). Weighted likelihood equations with bootstrap root search. Journal of the American Statistical Association 93 740-750.

[34] Manton, K. G., Woodbury, M. A., Stallard, E., Riggan, W. B., Creason, J. P. and Pellon, A. C. (1989). Empirical Bayes procedures for stabilizing maps of U.S. cancer mortality rates. Journal of the American Statistical Association 84 637-650.

[35] National Research Council (1992). Combining Information: Statistical Issues and Opportunities for Research. National Academy Press, Washington, D.C.

[36] Newton, M. A. and Raftery, A. E. (1994). Approximate Bayesian inference with the weighted likelihood bootstrap. Journal of the Royal Statistical Society, Series B 56 3-48.

[37] Rao, B. L. S. (1991). Asymptotic theory of weighted maximum likelihood estimation for growth models. In: Statistical Inference in Stochastic Processes 183-208, edited by Prabhu, N. U. and Basawa, I. V., Marcel Dekker, Inc., New York.

[38] Rao, C. R. (1965). Linear Statistical Inference and Its Applications. John Wiley & Sons, Inc., New York.

[39] Royden, H. L. (1988). Real Analysis. Prentice Hall, New York.

[40] Savage, L. J. (1954). The Foundations of Statistics. John Wiley & Sons, Inc., New York.

[41] Schervish, M. J. (1995). Theory of Statistics. Springer-Verlag, New York.

[42] Small, C. G., Wang, J. and Yang, Z. (2000). Eliminating multiple root problems in estimation. Statistical Science 15 313-341.

[43] Smith, R. L. (1998). Bayesian and frequentist approaches to parametric predictive inference. Bayesian Statistics 6 589-612.

[44] Staniswalis, J. G. (1989). The kernel estimate of a regression function in likelihood-based models. Journal of the American Statistical Association 84 276-283.

[45] Stone, M. (1974). Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society, Series B 36 111-147.

[46] Tibshirani, R. and Hastie, T. (1987). Local likelihood estimation. Journal of the American Statistical Association 82 559-567.

[47] van Eeden, C. (1996). Estimation in restricted parameter spaces: some history and some recent developments. Statistics & Decisions 17 1-30.

[48] van Eeden, C. and Zidek, J. V. (1998). Combining sample information in estimating ordered normal means. Technical Report #182, Department of Statistics, University of British Columbia.

[49] van Eeden, C. and Zidek, J.V. (2001). Estimating one of two normal means when their difference is bounded. Statistics & Probability Letters 51 277-284.

[50] Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. The Annals of Mathematical Statistics 20 595-601.

[51] Waller, L. A., Louis, T. A. and Carlin, B. P. (1997). Bayes methods for combining disease and exposure data in assessing environmental justice. Environmental and Ecological Statistics 4 267-281.

[52] Warm, T. A. (1987). Weighted likelihood estimation of ability in item response theory. Psychometrika 54 427-450.

[53] Zidek, J. V., White, R. and Le, N. D. (1998). Using spatial data in assessing the association between air pollution episodes and respiratory morbidity. Statistics for the Environment 4: Pollution Assessment and Control 111-136.



