Econometric Data Science
Francis X. Diebold
University of Pennsylvania
October 22, 2019

Copyright © 2013-2019, by Francis X. Diebold. All rights reserved. All materials are freely available for your use, but be warned: they are highly preliminary, significantly incomplete, and rapidly evolving. All are licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License. (Briefly: I retain copyright, but you can use, copy and distribute non-commercially, so long as you give me attribution and do not modify. To view a copy of the license, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.) In return I ask that you please cite the books whenever appropriate, as: "Diebold, F.X. (year here), Book Title Here, Department of Economics, University of Pennsylvania, http://www.ssc.upenn.edu/~fdiebold/Textbooks.html." The painting is Enigma, by Glen Josselsohn, from Wikimedia Commons.

Introduction

Numerous Communities Use Econometrics

Economists, statisticians, analysts, "data scientists" in:
- Finance (commercial banking, retail banking, investment banking, insurance, asset management, real estate, ...)
- Traditional industry (manufacturing, services, advertising, brick-and-mortar retailing, ...)
- e-Industry (Google, Amazon, eBay, Uber, Microsoft, ...)
- Consulting (financial services, litigation support, ...)
- Government (treasury, agriculture, environment, commerce, ...)
- Central banks and international organizations (Fed, IMF, World Bank, OECD, BIS, ECB, ...)

Econometrics is Special

Econometrics is not just "statistics using economic data". Many properties and nuances of economic data require knowledge of economics for successful analysis.
- Emphasis on predictions, guiding decisions
- Observational data
- Structural change
- Volatility fluctuations ("heteroskedasticity")
- Even trickier in time series: trend, seasonality, cycles ("serial correlation")

Let's Elaborate on the "Emphasis on Predictions Guiding Decisions"...

Q: What is econometrics about, broadly?
A: Helping people to make better decisions
- Consumers
- Firms
- Investors
- Policy makers
- Courts

Forecasts guide decisions. Good forecasts promote good decisions. Hence prediction holds a distinguished place in econometrics, and it will hold a distinguished place in this course.

Types/Arrangements of Economic Data

- Cross section. Standard cross-section notation: i = 1, ..., N
- Time series. Standard time-series notation: t = 1, ..., T

Much of our discussion will apply to both cross-section and time-series environments, but still we have to pick a notation.

A Few Leading Econometrics Web Data Resources (Clickable)

Indispensable:
- Resources for Economists (AEA)
- FRED (Federal Reserve Economic Data)

More specialized:
- National Bureau of Economic Research
- FRB Phila Real-Time Data Research Center
- Many more

A Few Leading Econometrics Software Environments (Clickable)

- High-level: EViews, Stata
- Mid-level: R (also CRAN; RStudio; R-bloggers), Python (also Anaconda), Julia
- Low-level: C, C++, Fortran

"High-level" does not mean "best", and "low-level" does not mean "worst". There are many issues.

Graphics Review

Graphics Help us to:
- Summarize and reveal patterns in univariate cross-section data. Histograms and density estimates are helpful for learning about distributional shape. Symmetric, skewed, fat-tailed, ...
- Summarize and reveal patterns in univariate time-series data. Time-series plots are useful for learning about dynamics. Trend, seasonal, cycle, outliers, ...
- Summarize and reveal patterns in multivariate data (cross-section or time-series). Scatterplots are useful for learning about relationships. Does a relationship exist? Is it linear or nonlinear? Are there outliers? (A sketch of all three plot types follows below.)
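
The three plot types just listed are easy to produce in any of the software environments above. Below is a minimal matplotlib sketch; the simulated yield series, the seed, and the figure layout are illustrative assumptions, not course code or course data.

```python
# A minimal sketch (not from the course materials): histogram, time-series
# plot, and scatterplot with matplotlib, using simulated stand-ins for the
# 1-year and 10-year bond-yield series shown in the next few slides.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
y1 = 5.0 + np.cumsum(rng.normal(scale=0.1, size=500))   # stand-in "1-year yield"
y10 = y1 + 1.0 + rng.normal(scale=0.3, size=500)        # stand-in "10-year yield"

fig, axes = plt.subplots(1, 3, figsize=(12, 3))

# Histogram: distributional shape (symmetric? skewed? fat-tailed?)
axes[0].hist(y1, bins=30)
axes[0].set_title("Histogram")

# Time-series plot: dynamics (trend, seasonal, cycle, outliers)
axes[1].plot(y1)
axes[1].set_title("Time-series plot")

# Scatterplot: relationship between two series (linear? nonlinear? outliers?)
axes[2].scatter(y1, y10, s=5)
axes[2].set_title("Scatterplot")

plt.tight_layout()
plt.show()
```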
Histogram Revealing Distributional Shape: 1-Year Government Bond Yield

Time Series Plot Revealing Dynamics: 1-Year Government Bond Yield

Scatterplot Revealing Relationship: 1-Year and 10-Year Government Bond Yields

Some Principles of Graphical Style

- Know your audience, and know your goals.
- Appeal to the viewer.
- Show the data, and only the data, within the bounds of reason.
- Avoid distortion. The sizes of effects in graphics should match their sizes in the data. Use common scales in multiple comparisons.
- Minimize, within reason, non-data ink. Avoid chartjunk.
- Choose aspect ratios to maximize pattern revelation. Bank to 45 degrees.
- Maximize graphical data density.
- Revise and edit, again and again (and again). Graphics produced using software defaults are almost never satisfactory.

Probability and Statistics Review

Moments, Sample Moments and Their Sampling Distributions

- Discrete random variable, y
- Discrete probability distribution p(y)
- Continuous random variable y
- Probability density function f(y)

Population Moments: Expectations of Powers of R.V.'s

Mean measures location:

\mu = E(y) = \sum_i p_i y_i   (discrete case)

\mu = E(y) = \int y f(y) \, dy   (continuous case)

Variance, or standard deviation, measures dispersion, or scale:

\sigma^2 = \mathrm{var}(y) = E(y - \mu)^2

- \sigma is easier to interpret than \sigma^2. Why?

More Population Moments

Skewness measures skewness (!):

S = \frac{E(y - \mu)^3}{\sigma^3}

Kurtosis measures tail fatness relative to a Gaussian distribution:

K = \frac{E(y - \mu)^4}{\sigma^4}

Covariance and Correlation

Multivariate case: joint, marginal and conditional distributions

f(x, y), f(x), f(y), f(x|y), f(y|x)

Covariance measures linear dependence:

\mathrm{cov}(y, x) = E[(y - \mu_y)(x - \mu_x)]

So does correlation:

\mathrm{corr}(y, x) = \frac{\mathrm{cov}(y, x)}{\sigma_y \sigma_x}

Correlation is often more convenient. Why?

Sampling and Estimation

Sample: \{y_i\}_{i=1}^N \sim \mathrm{iid} \; f(y)

Sample mean:

\bar{y} = \frac{1}{N} \sum_{i=1}^N y_i

Sample variance:

\hat{\sigma}^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N}

Unbiased sample variance:

s^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N - 1}

More Sample Moments

Sample skewness:

\hat{S} = \frac{\frac{1}{N} \sum_{i=1}^N (y_i - \bar{y})^3}{\hat{\sigma}^3}

Sample kurtosis:

\hat{K} = \frac{\frac{1}{N} \sum_{i=1}^N (y_i - \bar{y})^4}{\hat{\sigma}^4}

Still More Sample Moments

Sample covariance:

\widehat{\mathrm{cov}}(y, x) = \frac{1}{N} \sum_{i=1}^N [(y_i - \bar{y})(x_i - \bar{x})]

Sample correlation:

\widehat{\mathrm{corr}}(y, x) = \frac{\widehat{\mathrm{cov}}(y, x)}{\hat{\sigma}_y \hat{\sigma}_x}

Exact Finite-Sample Distribution of the Sample Mean (Requires iid Normality)

Simple random sampling: y_i \sim \mathrm{iid} \; N(\mu, \sigma^2), i = 1, ..., N

\bar{y} is unbiased and normally distributed with variance \sigma^2 / N:

\bar{y} \sim N\!\left(\mu, \frac{\sigma^2}{N}\right),

and we estimate \sigma^2 using s^2, where

s^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N - 1}.

\mu \in \left[\bar{y} \pm t_{1-\alpha/2}(N-1) \, \frac{s}{\sqrt{N}}\right]   w.p. 1 - \alpha

\mu = \mu_0 \implies \frac{\bar{y} - \mu_0}{s / \sqrt{N}} \sim t(N-1),

where t_{1-\alpha/2}(N-1) denotes the appropriate critical value of the Student's t density with N - 1 degrees of freedom.

Large-Sample Distribution of the Sample Mean (Requires iid, but not Normality)

Simple random sampling: y_i \sim \mathrm{iid} \; (\mu, \sigma^2), i = 1, ..., N

\bar{y} is consistent and asymptotically normally distributed with variance v:

\bar{y} \overset{a}{\sim} N(\mu, v),

and we estimate v using \hat{v} = s^2 / N, where

s^2 = \frac{\sum_{i=1}^N (y_i - \bar{y})^2}{N - 1}.

This is an approximate (large-sample) result, due to the central limit theorem. The "a" is for "asymptotically", which means "as N \to \infty".

As N \to \infty:   \mu \in \left[\bar{y} \pm z_{1-\alpha/2} \, \frac{s}{\sqrt{N}}\right]   w.p. 1 - \alpha

As N \to \infty:   \mu = \mu_0 \implies \frac{\bar{y} - \mu_0}{s / \sqrt{N}} \sim N(0, 1)
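
As a rough illustration of these formulas, the following sketch computes the sample mean, its estimated standard error, the exact and large-sample 95% intervals, and the t-statistic for H0: mu = 12. The data are simulated (lognormal) and merely echo the shape of the wage sample summarized next; scipy supplies the t and normal critical values. All numbers it prints are illustrative, not the course's results.

```python
# A minimal sketch (synthetic data, not the course wage sample): sample
# moments, exact and large-sample 95% intervals for the mean, and the
# t-statistic for H0: mu = 12, mirroring the test reported in the wage table.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
y = rng.lognormal(mean=2.3, sigma=0.6, size=1000)   # hypothetical positive, skewed sample
N = len(y)
mu0 = 12.0                                          # null value, as in the wage example

ybar = y.mean()
s2 = y.var(ddof=1)            # unbiased sample variance, divisor N - 1
se = np.sqrt(s2 / N)          # estimated standard error of the sample mean

alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df=N - 1)   # exact interval (iid normality)
z_crit = stats.norm.ppf(1 - alpha / 2)          # large-sample interval (CLT)
ci_t = (ybar - t_crit * se, ybar + t_crit * se)
ci_z = (ybar - z_crit * se, ybar + z_crit * se)

# t-statistic for H0: mu = mu0, approximately N(0, 1) for large N.
t_stat = (ybar - mu0) / se
p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))

print(f"ybar = {ybar:.3f}, exact CI = {ci_t}, large-sample CI = {ci_z}")
print(f"t(H0: mu = {mu0}) = {t_stat:.2f}, p = {p_value:.3f}")
```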
Wages: Distributions

Wages: Sample Statistics

                     WAGE         log WAGE
Sample Mean          12.19        2.34
Sample Median        10.00        2.30
Sample Maximum       65.00        4.17
Sample Minimum       1.43         0.36
Sample Std. Dev.     7.38         0.56
Sample Skewness      1.76         0.06
Sample Kurtosis      7.93         2.90
Jarque-Bera          2027.86      1.26
                     (p = 0.00)   (p = 0.53)
t(H0: mu = 12)       0.93         -625.70
                     (p = 0.36)   (p = 0.00)

Regression

Regression

A. As curve fitting. "Tell a computer how to draw a line through a scatterplot." (Well, sure, but there must be more...)

B. As a probabilistic framework for optimal prediction.

Regression as Curve Fitting

Distributions of Log Wage, Education and Experience

Scatterplot: Log Wage vs. Education

Curve Fitting

Fit a line:

y_i = \beta_1 + \beta_2 x_i

Solve:

\min_{\beta_1, \beta_2} \sum_{i=1}^N (y_i - \beta_1 - \beta_2 x_i)^2

"least squares" (LS, or OLS); "quadratic loss"

Actual Values, Fitted Values and Residuals

Actual values: y_i, i = 1, ..., N
Least-squares fitted parameters: \hat{\beta}_1 and \hat{\beta}_2
Fitted values: \hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_i, i = 1, ..., N ("hats" denote fitted things...)
Residuals: e_i = y_i - \hat{y}_i, i = 1, ..., N

Log Wage vs. Education with Superimposed Regression Line

\widehat{LWAGE} = 1.273 + 0.081 \, EDUC

Multiple Linear Regression (K RHS Variables)

Solve:

\min_{\beta_1, ..., \beta_K} \sum_{i=1}^N (y_i - \beta_1 - \beta_2 x_{i2} - \cdots - \beta_K x_{iK})^2

Fitted hyperplane:

\hat{y}_i = \hat{\beta}_1 + \hat{\beta}_2 x_{i2} + \cdots + \hat{\beta}_K x_{iK}

More compactly:

\hat{y}_i = \sum_{k=1}^K \hat{\beta}_k x_{ik},

where x_{i1} = 1 for all i.

Wage dataset:

\widehat{LWAGE} = 0.867 + 0.093 \, EDUC + 0.013 \, EXPER

(A code sketch of this least-squares fit appears at the end of this section.)

Regression as a Probability Model

An Ideal Situation ("The Ideal Conditions", or IC)

1. The data-generating process (DGP) is:

y_i = \beta_1 + \beta_2 x_{i2} + \cdots + \beta_K x_{iK} + \varepsilon_i

\varepsilon_i \sim \mathrm{iid} \; N(0, \sigma^2), i = 1, ..., N,

and the fitted model matches it exactly.

1.1 The fitted model is correctly specified
1.2 The disturbances are Gaussian
1.3 The coefficients (\beta_k's) are fixed
1.4 The relationship is linear
1.5 The \varepsilon_i's have constant unconditional variance \sigma^2
1.6 The \varepsilon_i's are uncorrelated

2. \varepsilon_i is independent of (x_{i1}, ..., x_{iK}), for all i

2.1 E(\varepsilon_i | x_{i1}, ..., x_{iK}) = 0, for all i
2.2 var(\varepsilon_i | x_{i1}, ..., x_{iK}) = \sigma^2, for all i

(Written here for cross sections.
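
To connect the curve-fitting view of regression to code, here is a sketch of the multiple-regression fit via ordinary least squares with numpy. The simulated EDUC and EXPER variables and the coefficient values used to generate the data are assumptions for illustration only; they are not the course's wage dataset.

```python
# A minimal sketch of the least-squares fit described above, on simulated
# data (the slides' coefficients 0.867, 0.093, 0.013 come from the course's
# wage dataset, which is not reproduced here).
import numpy as np

rng = np.random.default_rng(0)
N = 1000
educ = rng.integers(8, 21, size=N).astype(float)    # hypothetical years of schooling
exper = rng.integers(0, 41, size=N).astype(float)   # hypothetical years of experience
lwage = 0.9 + 0.09 * educ + 0.01 * exper + rng.normal(scale=0.5, size=N)

# Design matrix with a constant column (x_{i1} = 1 for all i).
X = np.column_stack([np.ones(N), educ, exper])

# beta_hat minimizes sum_i (y_i - beta_1 - beta_2*educ_i - beta_3*exper_i)^2.
beta_hat, *_ = np.linalg.lstsq(X, lwage, rcond=None)

fitted = X @ beta_hat        # y_hat_i
resid = lwage - fitted       # e_i = y_i - y_hat_i

print("beta_hat:", np.round(beta_hat, 3))
print(f"mean residual (zero by construction, up to rounding): {resid.mean():.2e}")
```

np.linalg.lstsq is used rather than forming (X'X)^{-1} X'y explicitly because it is numerically more stable; a dedicated package such as statsmodels would also report standard errors and diagnostics.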