Kaplan: Econometrics, Distributional and Nonparametric

Distributional and Nonparametric Econometrics Second edition David M. Kaplan Copyright © 2013, 2018, 2020, 2021 David M. Kaplan Licensed under the Creative Commons Attribution–NonCommercial–ShareAlike 4.0 In- ternational License (the “License”); you may not use this file or its source files except in compliance with the License. You may obtain a copy of the License at https: //creativecommons.org/licenses/by-nc-sa/4.0/legalcode, with a more readable summary at https://creativecommons.org/licenses/by-nc-sa/4.0. First nice edition, May 2020; second edition, January 2021 Updated March 11, 2021 To the tails. —DMK An economist was standing with one foot in a bucket of boiling water and the other foot in a bucket of ice. When asked how he felt, he replied, “On average I feel just fine.” Variation of quote attributed to Mark Twain As retold by Hansen(2020a, p. 29) Brief Contents Contents viii Preface xvii Textbook Learning Objectives xix Notation 1 Statistical Software Overview5 I Writing, Coding, and Logic9 1 Writing and Typesetting 11 2 R: Some Basics 29 3 Logic 51 II Quantile Methods 55 Introduction 57 4 Quantiles: Description and Prediction 59 5 Quantile Regression: Description and Prediction 69 6 Quantile Regression: Causality 77 v vi BRIEF CONTENTS 7 Quantile Regression: Endogeneity 87 III Distributional Methods 95 8 One-Sample, Two-Sided 97 9 Two-Sample, Two-Sided 105 10 Stochastic Dominance 109 11 Multiple Testing 115 IV Bootstrap and Friends 123 12 Bootstrap: Basics 125 13 Bootstrap Extensions and Subsampling 137 14 Bayesian Bootstrap 147 V Nonparametric Regression 163 Introduction 165 15 Nonparametric Methods: Preliminaries 167 16 Local (Kernel) Regression 173 17 Series and Sieves 189 18 Model Selection 195 19 Multiple Regressors 209 20 Nonparametric Regression in R 215 BRIEF CONTENTS vii VI Partial Identification 223 Introduction 225 21 Missing Data 227 22 Interval Data 239 23 Ordinal Data 243 Bibliography 251 Index 261 Contents Contents viii Preface xvii Textbook Learning Objectives xix Notation 1 Statistical Software Overview5 I Writing, Coding, and Logic9 1 Writing and Typesetting 11 1.1 LATEX ...................................... 11 1.2 Writing Advice ................................. 13 1.2.1 Striving................................. 14 1.2.2 Suppositions .............................. 15 1.2.3 Structure ................................ 15 1.2.4 Simplicity................................ 17 1.2.5 Segues (and Sentence Structure) ................... 18 1.2.6 Summary ................................ 21 1.2.7 Shawn’s Suggestions (bonus!)..................... 22 1.3 Plagiarism.................................... 24 1.4 Common Minor Mistakes ........................... 24 Exercises..................................... 28 2 R: Some Basics 29 2.1 Getting Help .................................. 29 2.2 Getting Started................................. 30 2.2.1 Running R ............................... 30 viii CONTENTS ix 2.2.2 Packages................................. 30 2.2.3 RStudio Interface............................ 31 2.2.4 Readability............................... 31 2.3 Data Types................................... 32 2.4 Basic Data Manipulation............................ 34 2.4.1 Numerical Operations ......................... 34 2.4.2 Combining Data ............................ 34 2.4.3 String Manipulation .......................... 35 2.5 Functions .................................... 37 2.6 Data File Input................................. 38 2.7 Basic Statistics................................. 39 2.8 Basic Plotting (Graphs) ............................ 40 2.9 Saving Text Output .............................. 41 2.10 Probability Distributions and Random Numbers............... 41 2.11 Control Flow: If, Loops, Errors........................ 42 2.11.1 If-Else Statements ........................... 42 2.11.2 For and While Loops.......................... 44 2.11.3 Try-Catch, Warnings, Errors ..................... 44 2.12 Time and Timing................................ 45 2.13 Parallel Computing (On Your Laptop).................... 46 2.14 Simulation: Example #1............................ 46 2.15 Simulation: Example #2............................ 48 Exercises..................................... 50 3 Logic 51 3.1 Terminology................................... 51 3.2 Assumptions .................................. 53 3.3 Theorems .................................... 53 II Quantile Methods 55 Introduction 57 4 Quantiles: Description and Prediction 59 4.1 Description ................................... 60 4.2 Formal Definitions ............................... 60 4.3 Prediction.................................... 60 4.4 Estimation and Sample Quantiles....................... 62 4.5 Censoring.................................... 64 4.6 Robustness and Efficiency........................... 66 x CONTENTS 4.7 Inference..................................... 67 5 Quantile Regression: Description and Prediction 69 5.1 Description ................................... 70 5.1.1 Conditional Quantile Function .................... 70 5.1.2 CQF Models .............................. 71 5.1.3 Monotonicity.............................. 71 5.2 Prediction.................................... 72 5.3 QR with Misspecification ........................... 72 5.3.1 “Best” Linear Predictor ........................ 73 5.3.2 “Best” Linear Approximation ..................... 73 5.4 Estimation ................................... 73 5.5 Asymptotic Properties............................. 74 5.6 Inference..................................... 75 5.7 Censoring.................................... 75 6 Quantile Regression: Causality 77 6.1 Background: Potential Outcomes and ATE ................. 77 6.2 Quantile Treatment Effects .......................... 79 6.3 Background: Random Coefficients ...................... 80 6.4 A Random Coefficients Model for QR .................... 81 6.4.1 The Model ............................... 81 6.4.2 Monotonicity and Identification.................... 82 6.4.3 Heteroskedasticity ........................... 83 6.5 Unconditional Quantile Regression ...................... 84 Exercises..................................... 85 7 Quantile Regression: Endogeneity 87 7.1 Instrumental Variables Quantile Regression ................. 87 7.1.1 Reminder: Usual IV Regression.................... 88 7.1.2 IVQR Identification .......................... 88 7.1.3 IVQR Estimation............................ 89 7.1.4 IVQR Inference............................. 90 7.2 Other Approaches to Endogeneity....................... 90 7.2.1 Triangular Model............................ 90 7.2.2 Local Quantile Treatment Effect ................... 90 7.3 Panel Data with Fixed Effects......................... 91 Exercises..................................... 93 III Distributional Methods 95 CONTENTS xi 8 One-Sample, Two-Sided 97 8.1 Warning: Weights ............................... 97 8.2 Discrete and Categorical Distributions.................... 98 8.3 Preliminary Results for Continuous Distributions.............. 98 8.4 Goodness-of-Fit Testing ............................ 98 8.5 Kolmogorov–Smirnov Test........................... 99 8.6 Uniform Confidence Band........................... 100 8.6.1 Test Inversion: Scalar .........................100 8.6.2 Test Inversion: Vectors and Functions . 101 8.6.3 Uniform Confidence Band....................... 101 8.A ECDF: Asymptotic Properties......................... 103 9 Two-Sample, Two-Sided 105 9.1 Setup ...................................... 106 9.2 Exact Finite-Sample Testing.......................... 106 9.3 Asymptotic KS................................. 107 10 Stochastic Dominance 109 10.1 First-Order Stochastic Dominance ......................110 10.2 Null of Dominance ............................... 110 10.2.1 KS Test................................. 111 10.2.2 Dirichlet Test.............................. 111 10.3 Null of Non-Dominance ............................111 Exercises..................................... 114 11 Multiple Testing 115 11.1 Multiple Testing: Concepts and Terms.................... 116 11.1.1 Familywise Error Rate......................... 116 11.1.2 Interpretation as Confidence Set ...................117 11.1.3 Alternatives to FWER......................... 117 11.1.4 Other Ways to Improve Power .................... 118 11.2 One-Sample, Two-Sided ............................118 11.2.1 KS and Dirichlet............................ 119 11.3 Two-Sample and/or One-Sided ........................ 119 Exercises..................................... 122 IV Bootstrap and Friends 123 12 Bootstrap: Basics 125 12.1 Introduction................................... 126 xii CONTENTS 12.2 Preliminaries: The Plug-in Principle .....................126 12.2.1 Example: Mean............................. 126 12.2.2 Example: OLS ............................. 127 12.2.3 Other Types of Parameters ......................127 12.3 The Real World and the Bootstrap World..................128 12.3.1 The Real World............................. 128 12.3.2 The Bootstrap World .........................129 12.4 Empirical Bootstrap .............................. 130 12.5 Standard Errors................................. 131 12.6 Confidence Intervals .............................. 131 12.6.1 CI Properties.............................. 132 12.6.2 Normal CI, Bootstrapped SE .....................132 12.6.3 Root Method.............................. 133 12.6.4 Percentile Bootstrap CI ........................135 12.6.5 Studentized Bootstrap CI .......................135

Kaplan: Econometrics, Distributional and Nonparametric

Least Squares Regression Principal Component Analysis a Supervised Dimensionality Reduction Method for Machine Learning in Scientiﬁc Applications

Nonparametric Regression

Enabling Deeper Learning on Big Data for Materials Informatics Applications

COVID-19 and Machine Learning: Investigation and Prediction Team #5

Kernel Smoothing Methods

Input Output Kernel Regression: Supervised and Semi-Supervised Structured Output Prediction with Operator-Valued Kernels

Kernel Smoothing Toolbox ∗ for MATLAB

Krls: a Stata Package for Kernel-Based Regularized Least

Iprior: an R Package for Regression Modelling Using I-Priors

Fourier Features Let Networks Learn High Frequency Functions in Low Dimensional Domains

Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach

Deconvoluting Kernel Density Estimation and Regression for Locally Diferentially Private Data Farhad Farokhi