Luke Stein Stanford graduate economics core (2006–7) Reference and review (last updated April 11, 2012)

Contents

1 Econometrics
 1.1 Probability foundations
 1.2 Random variables
 1.3 Transformations of random variables
 1.4 Properties of random variables
 1.5 Distributions
 1.6 Random samples
 1.7 Convergence of random variables
 1.8 Parametric models
 1.9 Statistics
 1.10 Point estimation
 1.11 Decision theory
 1.12 Hypothesis testing
 1.13 Time-series concepts
 1.14 Ordinary least squares
 1.15 Linear Generalized Method of Moments
 1.16 Linear multiple-equation GMM
 1.17 ML for multiple-equation linear models
 1.18 Unit root processes
 1.19 Limited dependent variable models
 1.20 Inequalities
 1.21 Thoughts on MaCurdy questions

2 Microeconomics
 2.1 Choice theory
 2.2 Producer theory
 2.3 Comparative statics
 2.4 Consumer theory
 2.5 Choice under uncertainty
 2.6 General equilibrium
 2.7 Games
 2.8 Dominance
 2.9 Equilibrium concepts
 2.10 Basic game-theoretic models
 2.11 Repeated games with complete information
 2.12 Collusion and cooperation
 2.13 Principal-agent problems
 2.14 Social choice theory

3 Macroeconomics
 3.1 Models
 3.2 Imperfect competition
 3.3 General concepts
 3.4 Dynamic programming mathematics
 3.5 Continuous time
 3.6 Uncertainty
 3.7 Manuel Amador
 3.8 John Taylor

4 Mathematics
 4.1 General mathematical concepts
 4.2 Set theory
 4.3 Binary relations
 4.4 Functions
 4.5 Correspondences
 4.6 Linear algebra

5 References

Statistical tables

1 Econometrics

1.1 Probability foundations

Basic set theory (C&B 1.1.4) All sets A, B, C satisfy:
1. Commutativity: A ∪ B = B ∪ A and A ∩ B = B ∩ A;
2. Associativity: A ∪ (B ∪ C) = (A ∪ B) ∪ C and A ∩ (B ∩ C) = (A ∩ B) ∩ C;
3. Distributive laws: A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C) and A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C);
4. DeMorgan's Laws: (A ∪ B)^C = A^C ∩ B^C and (A ∩ B)^C = A^C ∪ B^C.

Disjointness (C&B 1.1.5) Two events are disjoint (a.k.a. mutually exclusive) iff A ∩ B = ∅. Events {A_i} are pairwise disjoint or mutually exclusive iff A_i ∩ A_j = ∅ for all i ≠ j. Two events with nonzero probability cannot be both mutually exclusive and independent (see ex. 1.39).

Partition (C&B 1.1.6) A_1, A_2, ... is a partition of S iff:
1. S = ∪_i A_i (i.e., covering);
2. A_1, A_2, ... are pairwise disjoint (i.e., non-overlapping).

Sigma algebra (C&B 1.2.1) A collection of subsets of S (i.e., a subset of the power set of S) is a sigma algebra, denoted B, iff:
1. ∅ ∈ B;
2. If A ∈ B, then A^C ∈ B (closed under complementation; along with the first axiom this gives S ∈ B);
3. If {A_i} ⊆ B, then ∪_i A_i ∈ B (closed under countable unions).

A cdf completely determines the probability distribution of a random variable if its probability function is defined only for events in the Borel field B^1, the smallest sigma algebra containing all the intervals of real numbers of the form (a, b), [a, b), (a, b], [a, b]. If probabilities are defined for a larger class of events, two random variables may have the same cdf but not the same probability for every event (see C&B p. 33).

Probability function (C&B 1.2.4, 8–9, 11) Given a sample space S and associated sigma algebra B, a function P: B → R is a probability function iff it satisfies the Kolmogorov Axioms (Axioms of Probability):
1. P(A) ≥ 0 for all A ∈ B;
2. P(S) = 1;
3. If {A_i} ⊆ B are pairwise disjoint, then P(∪_i A_i) = Σ_i P(A_i) (countable additivity for pairwise disjoint sets).

For any probability function P and A, B ∈ B:
1. P(∅) = 0;
2. P(A) ≤ 1;
3. P(A^C) = 1 − P(A);
4. P(B ∩ A^C) = P(B) − P(A ∩ B) ("B but not A" is B minus both A and B);
5. P(A ∪ B) = P(A) + P(B) − P(A ∩ B);
6. If A ⊆ B, then P(A) ≤ P(B).

If {C_i} partitions A, then P(A) = Σ_i P(A ∩ C_i).

Probability space (Hansen 1-31, 1-28) (Ω, F, P) where:
1. Ω is the universe (e.g., S, the sample space);
2. F is the σ-field (e.g., B^1);
3. P a probability measure (e.g., P, the probability measure that governs all random variables).

A random variable X induces a probability measure P_X defined by P_X(B) ≡ P(X ∈ B). This gives the probability space (R, B, P_X).

Counting (C&B sec. 1.2.3) The number of possible arrangements of size r from n objects is:

              No replacement      With replacement
  Ordered     n!/(n−r)!           n^r
  Unordered   C(n, r)             C(n+r−1, r)

where C(n, r) ≡ n!/(r!(n−r)!). (Unordered with replacement is a.k.a. "stars and bars.")

Conditional probability (C&B 1.3.2, ex. 1.38) For A, B ∈ S with P(B) > 0, the conditional probability of A given B is P(A|B) ≡ P(A ∩ B)/P(B).
1. If A and B are disjoint (A ∩ B = ∅), then P(A|B) = P(B|A) = 0.
2. If P(B) = 1, then ∀A, P(A|B) = P(A).
3. If A ⊆ B, then P(B|A) = 1 and P(A|B) = P(A)/P(B).
4. If A and B are mutually exclusive, P(A|A ∪ B) = P(A)/[P(A) + P(B)].
5. P(A ∩ B ∩ C) = P(A|B ∩ C) · P(B|C) · P(C).

Bayes' Rule (C&B 1.3.5) A formula for "turning around" conditional probabilities: P(A|B) = P(B|A) · P(A)/P(B). More generally, if {A_i} partition the sample space and B is any set, then ∀i,

P(A_i|B) = P(B|A_i) · P(A_i) / Σ_j P(B|A_j) · P(A_j).

Independence of events (C&B 1.3.7, 9, 4.2.10) A, B statistically independent iff P(A ∩ B) = P(A) · P(B), or identically iff P(A|B) = P(A) (this happens iff P(B|A) = P(B)).
1. If A and B are independent, then the following pairs are also independent: A and B^C, A^C and B, A^C and B^C.
2. Two events with nonzero probability cannot be both mutually exclusive and independent (see ex. 1.39).
3. If X, Y independent r.v.s, then for any A, B ⊆ R, events {X ∈ A} and {Y ∈ B} are independent events.

Mutual independence of events (C&B 1.3.12) A collection of events {A_i} are mutually independent iff for any subcollection A_{i1}, ..., A_{ik} we have P(∩_j A_{ij}) = Π_j P(A_{ij}). Note that pairwise independence does not imply mutual independence.
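A quick numerical sanity check of Bayes' Rule with a three-event partition may help; the priors and likelihoods below are made-up numbers, not anything from the notes:

```python
# Hypothetical check of Bayes' Rule with a partition {A_1, A_2, A_3}.
priors = [0.5, 0.3, 0.2]          # P(A_i); must sum to 1
likelihoods = [0.9, 0.5, 0.1]     # P(B | A_i)

# Law of total probability: P(B) = sum_i P(B|A_i) P(A_i)
p_b = sum(p * l for p, l in zip(priors, likelihoods))

# Bayes' Rule: P(A_i|B) = P(B|A_i) P(A_i) / P(B)
posteriors = [p * l / p_b for p, l in zip(priors, likelihoods)]

print(p_b)                                  # P(B) = 0.62
print(posteriors)                           # posterior over the partition
assert abs(sum(posteriors) - 1) < 1e-12    # posteriors sum to 1
```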

1.2 Random variables

Random variable (1.4.1, 1.5.7–8, 10, sec. 3.2) A function X: S → R where S is a sample space.
1. Continuous iff its cdf is a continuous function and discrete iff its cdf is a step function (i.e., if sample space is countable).
2. Identically distributed iff ∀A ∈ B^1, P(X ∈ A) = P(Y ∈ A), or identically iff ∀x, F_X(x) = F_Y(x).
3. Note identical distribution says nothing about (in)dependence.

Random vector (C&B 4.1.1) An n-dimensional random vector is a function X: S → R^n where S is a sample space.

Measurability (Hansen 1-28, 4-11) A r.v. X: (Ω, F) → (R, B) is F-measurable iff ∀B ∈ B, {ω ∈ Ω: X(ω) ∈ B} ∈ F (i.e., the preimage of every element of B is an element of F). If F and G are both σ-fields with G ⊆ F, then X is G-measurable =⇒ X is F-measurable (i.e., if the preimage of every B is in G, it is also in its superset F).

Smallest σ-field (Hansen 4-12) The smallest σ-field that makes a r.v. Z: (Ω, F) → (R, B) measurable is σ(Z) ≡ {G ⊆ Ω: ∃B ∈ B, G = Z^{−1}(B)} (i.e., the set of preimages of elements of B).

Independence of r.v.s (C&B 4.2.5, 7, p. 154, 4.3.5) X and Y independent r.v.s (written X ⊥⊥ Y) iff any of the following equivalent conditions hold:
1. ∀A, B, P(X ∈ A, Y ∈ B) = P(X ∈ A) · P(Y ∈ B).
2. F_{XY}(x, y) ≡ P(X ≤ x, Y ≤ y) = F_X(x) F_Y(y).
3. f(x, y) = f_X(x) f_Y(y) (i.e., joint pdf/pmf is the product of marginal pdfs/pmfs).
4. f(y|x) = f_Y(y) (i.e., conditional pdf/pmf equals marginal pdf/pmf).
5. ∃g(x), h(y), ∀x, y, f(x, y) = g(x)h(y) (i.e., joint pdf/pmf is separable). Note that functional forms may appear separable, but limits may still depend on the other variable; if the support set of (X, Y) is not a cross product, then X and Y are not independent.

For any functions g(t) and h(t), X ⊥⊥ Y =⇒ g(X) ⊥⊥ h(Y).

Independence of random vectors (C&B 4.6.5) X_1, ..., X_n mutually independent iff for every (x_1, ..., x_n), the joint pdf/pmf is the product of the marginal pdfs/pmfs; i.e., f(x_1, ..., x_n) = Π_i f_{X_i}(x_i).
1. Knowledge about the values of some coordinates gives us no information about the values of the other coordinates.
2. The conditional distribution of any subset of the coordinates, given the values of the rest of the coordinates, is the same as the marginal distribution of the subset.
3. Mutual independence implies pairwise independence, but pairwise independence does not imply mutual independence.

Mean independence (Metrics P.S. 3-4c, Metrics section) X is mean independent of Y (written X ⊥⊥_m Y) iff E(X|Y) = E(X).
1. Mean independence is not symmetric (i.e., X ⊥⊥_m Y does not imply that Y ⊥⊥_m X).
2. Independence implies mean independence (i.e., X ⊥⊥ Y =⇒ X ⊥⊥_m Y ∧ Y ⊥⊥_m X).
3. X ⊥⊥_m Y =⇒ E[X|g(Y)] = E[X] for any function g(·).
4. X ⊥⊥_m Y =⇒ Cov(X, g(Y)) = 0 for any function g(·).

Cumulative distribution function (C&B 1.5.1, 3, p. 147) F_X(x) ≡ P(X ≤ x). By the Fundamental Theorem of Calculus, (d/dx) F_X(x) = f_X(x) for a continuous r.v. at continuity points of f_X. A function F is a cdf iff:
1. lim_{x→−∞} F(x) = 0 and lim_{x→∞} F(x) = 1;
2. F(·) nondecreasing;
3. F(·) right-continuous; i.e., ∀x_0, lim_{x↓x_0} F(x) = F(x_0).

A random vector X has joint cdf F_X(x_1, ..., x_n) ≡ P(X_1 ≤ x_1, ..., X_n ≤ x_n). By the Fundamental Theorem of Calculus, ∂^n F_X(x)/(∂x_1 ··· ∂x_n) = f_X(x) for a continuous (in all dimensions) random vector at continuity points of f_X.

Probability mass function (C&B 1.6.1, 5, 4.1.3) For a discrete r.v., f_X(x) ≡ P(X = x). A function f_X is a pmf iff:
1. ∀x, f_X(x) ≥ 0;
2. Σ_x f_X(x) = 1.

f_X gives the probability of any event: P(X ∈ B) = Σ_k 1(x_k ∈ B) f_X(x_k). A discrete random vector X has joint pmf f_X(v) ≡ P(X = v).

Marginal pmf (C&B 4.1.6, p. 178) For a discrete random vector, f_{X_i}(x_i) ≡ P(X_i = x_i) = Σ_{x_{−i} ∈ R^{n−1}} f_X(x); i.e., hold X_i = x_i, and sum f_X over all remaining possible values of X. We can also take the marginal pmf for multiple i by holding these and summing f_X over all remaining possible values of X.

Conditional pmf (C&B 4.2.1) For (X, Y) a discrete random vector, f(y|x) ≡ P(Y = y|X = x) = f(x, y)/f_X(x), where f(x, y) is the joint pmf and f_X(x) is the marginal pmf.

Probability density function (C&B 1.6.3, 5, 4.1.10) For a continuous r.v., f_X(x) is defined as the function which satisfies F_X(x) = ∫_{−∞}^{x} f_X(t) dt for all x. A function f_X is a pdf iff:
1. ∀x, f_X(x) ≥ 0;
2. ∫_R f_X(x) dx = 1.

f_X gives the probability of any event: P(X ∈ B) = ∫_R 1(x ∈ B) f_X(x) dx. A continuous (in all dimensions) random vector X has joint pdf f_X(x_1, ..., x_n) iff ∀A ⊆ R^n, P(X ∈ A) = ∫···∫_A f_X(x_1, ..., x_n) dx_1 ··· dx_n.

Marginal pdf (C&B p. 145, 178) For a continuous (in all dimensions) random vector, f_{X_i}(x_i) ≡ ∫···∫ f_X(x) dx_1 ··· dx_{i−1} dx_{i+1} ··· dx_n; i.e., hold X_i = x_i, and integrate f_X over R^{n−1} in all X_j for j ≠ i. We can also take the marginal pdf for multiple i by holding these and integrating f_X over R in all X_j that aren't being held.

Conditional pdf (C&B 4.2.3, p. 178) For (X, Y) a continuous random vector, f(y|x) ≡ f(x, y)/f_X(x) as long as f_X(x) ≠ 0, where f(x, y) is the joint pdf and f_X(x) is the marginal pdf. We can also condition for/on multiple coordinates: e.g., for (X_1, X_2, X_3, X_4) a continuous random vector, f(x_3, x_4|x_1, x_2) ≡ f(x_1, x_2, x_3, x_4)/f_{X_1X_2}(x_1, x_2), where f is a joint pdf and f_{X_1X_2} is the marginal pdf in X_1 and X_2.

Borel Paradox (4.9.3) Be careful when we condition on events of probability zero: two events of probability zero may be equivalent, but the probabilities conditional on the two events are different!

Stochastic ordering (C&B ex. 1.49, ex. 3.41-2) cdf F_X stochastically greater than cdf F_Y iff F_X(t) ≤ F_Y(t) at all t, with strict inequality at some t. This implies P(X > t) ≥ P(Y > t) at all t, with strict inequality at some t. A family of cdfs {F(x|θ)} is stochastically increasing in θ iff θ_1 > θ_2 =⇒ F(x|θ_1) stochastically greater than F(x|θ_2). A location family is stochastically increasing in its location parameter; if a scale family has sample space [0, ∞), it is stochastically increasing in its scale parameter.

Support set (C&B eq. 2.1.7) Support set (a.k.a. support) of a r.v. X is 𝒳 ≡ {x: f_X(x) > 0}, where f_X a cdf or pdf (or in general, any nonnegative function).
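The marginal and conditional pmf definitions are easy to check numerically. A minimal sketch, using a made-up 2×3 joint pmf (the numbers are assumptions for illustration only):

```python
import numpy as np

# A small joint pmf f(x, y) on a 2x3 support; entries sum to 1.
f = np.array([[0.10, 0.20, 0.10],
              [0.25, 0.15, 0.20]])   # rows index x, columns index y

f_X = f.sum(axis=1)                  # marginal pmf of X: sum over y
f_Y = f.sum(axis=0)                  # marginal pmf of Y: sum over x
f_Y_given_x0 = f[0] / f_X[0]         # conditional pmf f(y | x = x_0)

print(f_X, f_Y, f_Y_given_x0)
# Independence would require the joint to factor into the marginals
# (condition 3 above); here it does not:
print(np.allclose(f, np.outer(f_X, f_Y)))   # False
```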

1.3 Transformations of random variables

Transformation R¹ → R¹ (C&B 2.1.3, 5, 8) A discrete r.v. can be transformed into a discrete r.v. A continuous r.v. can be transformed into either a continuous or a discrete r.v. (or mixed!). When Y = g(X) and 𝒴 ≡ g(𝒳) (where 𝒳 is the support of X):
1. If g monotone increasing on 𝒳, then F_Y(y) = F_X(g^{−1}(y)) for y ∈ 𝒴;
2. If g monotone decreasing on 𝒳 and X a continuous r.v., then F_Y(y) = 1 − F_X(g^{−1}(y)) for y ∈ 𝒴.

If g monotone, f_X continuous on 𝒳, and g^{−1} has continuous derivative on 𝒴, then:

f_Y(y) = f_X(g^{−1}(y)) · |d/dy g^{−1}(y)| for y ∈ 𝒴, and 0 otherwise.

If {A_i}_{i=0}^{k} partitions 𝒳, with P(X ∈ A_0) = 0; f_X continuous on each A_i; and there exist {g_i}_{i=1}^{k} satisfying:
1. g(x) = g_i(x) for x ∈ A_i,
2. g_i monotone on A_i,
3. ∃𝒴, ∀i, g_i(A_i) = 𝒴 (i.e., all A_i have the same image under their respective g_i's) [Hansen note 2-15 suggests this need not hold],
4. ∀i, g_i^{−1} has a continuous derivative on 𝒴;
then:

f_Y(y) = Σ_{i=1}^{k} f_X(g_i^{−1}(y)) · |d/dy g_i^{−1}(y)| for y ∈ 𝒴, and 0 otherwise.

Probability integral transformation (C&B 2.1.10, ex. 2.10) If Y = F_X(X) (for X continuous) then Y ∼ Unif(0, 1). Can be used to generate random samples from a particular distribution: generate a uniform random variable and apply the inverse of the cdf of the target distribution. If X is discrete, Y is stochastically greater than Unif(0, 1).

Transformation R² → R² (C&B p. 158, 185) Let U = g_1(X, Y) and V = g_2(X, Y) where:
1. (X, Y) has pdf f_{XY} and support A;
2. g_1 and g_2 define a 1-to-1 transformation from A to B ≡ {(u, v): u ∈ g_1(A), v ∈ g_2(A)} (i.e., the support of (U, V));
3. Inverse transform is X = h_1(U, V) and Y = h_2(U, V);
then:

f_{UV}(u, v) = f_{XY}(h_1(u, v), h_2(u, v)) |J| for (u, v) ∈ B, and 0 otherwise;

where J is the Jacobian,

J ≡ det [ ∂x/∂u  ∂x/∂v ; ∂y/∂u  ∂y/∂v ].

If the transformation is not 1-to-1, we can partition A into {A_i} such that 1-to-1 transformations exist from each A_i to B which map (x, y) ↦ (u, v) appropriately. Letting x = h_{1i}(u, v) and y = h_{2i}(u, v) be the inverses, and J_i the Jacobian, on A_i:

f_{UV}(u, v) = Σ_i f_{XY}(h_{1i}(u, v), h_{2i}(u, v)) |J_i| for (u, v) ∈ B, and 0 otherwise.

For generalization to the R^n → R^n case for n > 2, see C&B p. 185.

Convolution formulae (C&B 5.2.9, ex. 5.6) X ⊥⊥ Y both continuous. Then:
1. f_{X+Y}(z) = ∫_R f_X(w) f_Y(z − w) dw.
2. f_{X−Y}(z) = ∫_R f_X(w) f_Y(w − z) dw.
3. f_{XY}(z) = ∫_R |1/w| f_X(w) f_Y(z/w) dw.
4. f_{X/Y}(z) = ∫_R |w| f_X(wz) f_Y(w) dw.

1.4 Properties of random variables

Expected value, mean (C&B 2.2.1, 5–6, 4.6.6)

E g(X) ≡ ∫_𝒳 g(x) f_X(x) dx if X continuous, or Σ_{x∈𝒳} g(x) f_X(x) if X discrete;

provided the integral or sum exists and that E|g(X)| ≠ ∞. For constants a, b, c and functions g_1, g_2 such that E(g_1(X)), E(g_2(X)) exist:
1. E[a g_1(X) + b g_2(X) + c] = a E(g_1(X)) + b E(g_2(X)) + c (i.e., expectation is a linear operator);
2. If ∀x, g_1(x) ≥ 0, then E g_1(X) ≥ 0;
3. If ∀x, g_1(x) ≥ g_2(x), then E g_1(X) ≥ E g_2(X);
4. If ∀x, a ≤ g_1(x) ≤ b, then a ≤ E g_1(X) ≤ b.

The mean is the MSE minimizing predictor for X; i.e., min_b E(X − b)² = E(X − E X)². If X_1, ..., X_n mutually independent, then E[g_1(X_1) ··· g_n(X_n)] = E[g_1(X_1)] ··· E[g_n(X_n)].

Conditional expectation (C&B p. 150; Hansen 4-14–6; Hayashi 138–9) a.k.a. regression of Y on X. E(Y|X) is a r.v. which is a function of X. For discrete (X, Y), E(g(Y)|x) ≡ Σ_𝒴 g(y) f(y|x). For continuous (X, Y), E(g(Y)|x) ≡ ∫_R g(y) f(y|x) dy. Conditional expected value has all the standard properties of expected value. Also:
1. E[g(X)|X] = g(X) for any function g.
2. E[g(X)h(Y)|X] = g(X) E[h(Y)|X] for any functions g and h.
3. X ⊥⊥ Y =⇒ E(Y|X) = E(Y) (i.e., knowing X gives us no additional information about E Y).
4. E(Y|X) = E(Y) =⇒ Cov(X, Y) = 0.
5. E(Y|X) is the MSE minimizing predictor of Y based on knowledge of X (i.e., min_{g(x)} E[Y − g(X)]² = E[Y − E(Y|X)]²).

Let X be a r.v. that takes values in (R, B), let G = σ(X) (i.e., the smallest sigma field measuring X), and assume E|Y| < ∞. Conditional expected value of Y given X is defined implicitly (and non-uniquely) as satisfying:
1. E|E(Y|X)| < ∞;
2. E(Y|X) is G-measurable (i.e., E(Y|X) cannot rely on more information than X does);
3. ∀G ∈ G, ∫_G E(Y|X) dP(ω) = ∫_G Y dP(ω) (i.e., E[E(Y|X)|X ∈ G] = E[Y|X ∈ G]);
where the notation ∫_B · dP_X(x) means ∫_B · f_X(x) dx if X is continuous, and means Σ_{x∈B} · f_X(x) if X is discrete.

Two-way rule for expectations (C&B p. 58, ex. 2.21) If Y = g(X), then E g(X) = E Y; i.e., ∫_R g(x) f_X(x) dx = ∫_R y f_Y(y) dy.

Law of Iterated Expectations (C&B 4.4.3; Hansen 4-21) E X = E[E(X|Y)], provided the expectations exist. More generally, when L ⊆ M (i.e., L contains less information, M contains more), E[X|L] = E[E(X|M)|L] = E[E(X|L)|M].

Median (C&B ex. 2.17–18) m such that P(X ≤ m) ≥ 1/2 and P(X ≥ m) ≥ 1/2. If X continuous, the median minimizes absolute deviation; i.e., min_a E|X − a| = E|X − m|.

Mode (C&B ex. 2.27) f(x) is unimodal with mode equal to a iff a ≥ x ≥ y =⇒ f(a) ≥ f(x) ≥ f(y) and a ≤ x ≤ y =⇒ f(a) ≥ f(x) ≥ f(y).
1. Modes are not necessarily unique.
2. If f is symmetric and unimodal, then the point of symmetry is a mode.

Symmetric distribution (C&B ex. 2.25–26) If f_X is symmetric about a (i.e., ∀ε, f_X(a + ε) = f_X(a − ε)), then:
1. X and 2a − X are identically distributed;
2. If a = 0, then M_X is symmetric about 0;
3. a is the median;
4. If E X exists, then E X = a.
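The probability integral transformation is the basis of inverse-cdf sampling. A minimal sketch, assuming an Exponential(β) target with β = 2 (the parameter is made up; the inverse cdf is F^{−1}(u) = −β log(1 − u)):

```python
import numpy as np

rng = np.random.default_rng(0)

# To sample Exponential(beta): apply the inverse cdf to uniform draws.
beta = 2.0
u = rng.uniform(size=100_000)
x = -beta * np.log(1 - u)
print(x.mean(), x.var())   # approx beta = 2 and beta^2 = 4

# Forward direction: applying the cdf to the draws recovers Unif(0, 1).
y = 1 - np.exp(-x / beta)
print(y.mean(), y.var())   # approx 1/2 and 1/12
```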

5. For odd k, the kth central moment µ_k is zero (if it exists); if the distribution is symmetric about 0, then all odd moments are zero (if they exist).

Moment (C&B 2.3.1, 11; Hansen 2-37) For n ∈ Z, the nth moment of X is µ'_n ≡ E X^n. Also denote µ'_1 = E X as µ. The nth central moment is µ_n ≡ E(X − µ)^n.
1. Two different distributions can have all the same moments, but only if the variables have unbounded support sets.
2. A distribution is uniquely determined by its moments if all moments are defined and lim_{n→∞} Σ_{k=1}^{n} µ'_k r^k/k! exists for all r in a neighborhood of zero.

Variance (C&B 2.3.2, 4, p. 60, 4.5.6, ex. 4.58) Var X ≡ µ_2 = E(X − E X)² = E X² − (E X)². Often parametrized as σ².
1. For constants a, b, if Var X ≠ ∞, then Var(aX + b) = a² Var X;
2. Assuming variances exist, Var(aX + bY) = a² Var X + b² Var Y + 2ab Cov(X, Y);
3. Var[Y − E(Y|X)] = E[Var(Y|X)].

Multivariate variance (?) Var X ≡ E[XX'] − E[X] E[X]'. Thus:
1. Var(X + Y) = Var(X) + Cov(X, Y) + Cov(X, Y)' + Var(Y);
2. Var(AX) = A Var(X) A'.

Conditional variance (C&B p. 151, 4.4.7; Greene 81–4) a.k.a. scedastic function. Var(Y|X) ≡ E[(Y − E[Y|X])²|X] = E[Y²|X] − (E[Y|X])².
1. X ⊥⊥ Y =⇒ Var(Y|X) = Var(Y).
2. Conditional variance identity: provided the expectations exist,

Var(Y) = E[Var(Y|X)] + Var[E(Y|X)],

where the first term is residual variance and the second is regression variance. Implies that on average, conditioning reduces the variance of the variable subject to conditioning (Var(Y) ≥ E[Var(Y|X)]).

Standard deviation (C&B 2.3.2) σ ≡ √(Var X).

Covariance (C&B 4.5.1, 3, ex. 4.58–9; Greene 77) Cov(X, Y) ≡ E[(X − E X)(Y − E Y)] = E[(X − E X)Y] = E[X(Y − E Y)] = E(XY) − (E X)(E Y). If X, Y, Z all have finite variances, then:
1. Cov(X, Y) = Cov[X, E(Y|X)];
2. Cov[X, Y − E(Y|X)] = 0;
3. Cov(X, Y) = E[Cov(X, Y|Z)] + Cov[E(X|Z), E(Y|Z)];
4. Cov(X, Y + Z) = Cov(X, Y) + Cov(X, Z);
5. Cov(aX + bY, cX + dY) = ac Var(X) + bd Var(Y) + (ad + bc) Cov(X, Y).

Multivariate covariance (Hansen 5-28; Hayashi 75–6) Cov(X, Y) ≡ E[(X − E X)(Y − E Y)'] = E(XY') − (E X)(E Y'). Thus:
1. Cov(AX, BY) = A Cov(X, Y) B';
2. Cov(X, Y) = Cov(Y, X)'.

Correlation (C&B 4.5.2, 5, 7) Corr(X, Y) ≡ ρ_{XY} ≡ Cov(X, Y)/(σ_X σ_Y).
1. Corr(a_1 X + b_1, a_2 Y + b_2) = Corr(X, Y).
2. Correlation is in the range [−1, 1], with ±1 indicating a perfectly linear relationship (+1 for positive slope, −1 for negative slope), by the Cauchy-Schwarz Inequality.
3. X ⊥⊥ Y =⇒ Cov(X, Y) = ρ_{XY} = 0 (assuming finite moments); note however that zero covariance need not imply independence.

Skewness (C&B ex. 2.28; Greene 66) α_3 ≡ µ_3 · (µ_2)^{−3/2}, where µ_i is the ith central moment. Measures the lack of symmetry in the pdf. Zero for any normal, t, or uniform; 2 for exponential; 2√(2/r) for χ²_r; 2/√a for gamma(a, b).

Kurtosis (C&B ex. 2.28) α_4 ≡ µ_4 · µ_2^{−2}, where µ_i is the ith central moment. Measures the "peakedness" of the pdf. α_4 = 3 for any normal. (Sometimes normalized by subtracting 3.)

Moment generating function (C&B 2.3.6–7, 11–12, 15, 4.2.12, 4.6.7, 9) M_X(t) ≡ E e^{tX} as long as the expectation exists for t in a neighborhood of 0. If M_X exists, then ∀n ∈ Z, n ≥ 0,

µ'_n ≡ E X^n = (d^n/dt^n) M_X(t) |_{t=0}.

1. It is possible for all moments to exist, but not the mgf.
2. If r.v.s have equal mgfs in some neighborhood of 0, then the variables are identically distributed (i.e., an extant mgf characterizes a distribution).
3. If the mgfs of a sequence of r.v.s converge to M_X in some neighborhood of zero, then the cdfs of the sequence converge to F_X at all points where F_X is continuous.
4. For constants a, b, if M_X exists, then M_{aX+b}(t) = e^{bt} M_X(at).
5. For X ⊥⊥ Y, M_{X+Y}(t) = M_X(t) M_Y(t). For X_1, ..., X_n mutually independent, M_{ΣX_i} = Π_i M_{X_i}.
6. For X_1, ..., X_n mutually independent and Z ≡ (a_1 X_1 + b_1) + ··· + (a_n X_n + b_n), then M_Z(t) = e^{t(Σ_i b_i)} Π_i M_{X_i}(a_i t).

Characteristic function (C&B sec. 2.6.2) φ_X(t) ≡ E e^{itX}, where i = √(−1).
1. The cf always exists.
2. A cf completely determines a distribution: if the cfs of a sequence of r.v.s converge to φ_X in some neighborhood of zero, then the cdfs of the sequence converge to F_X at all points where F_X is continuous.
3. For X ∼ N(0, 1), φ_X(t) = e^{−t²/2}.
4. We can recover probability from a cf: for all a, b such that P(X = a) = P(X = b) = 0,

P(X ∈ [a, b]) = lim_{T→∞} (1/2π) ∫_{−T}^{T} [(e^{−ita} − e^{−itb})/(it)] φ_X(t) dt.

Other generating functions (C&B sec. 2.6.2) Cumulant generating function ≡ log[M_X(t)], if the mgf exists. Factorial moment generating function (a.k.a. probability-generating function when X is discrete) ≡ E t^X, if the expectation exists.

1.5 Distributions

Normal distribution (C&B p. 102–4, 2.1.9, 3.6.5, 4.2.14, 4.3.4, 6, 5.3.3; Wikipedia) Normal (a.k.a. Gaussian) particularly important because it is analytically tractable, has a familiar symmetric bell shape, and the CLT shows that it can approximate many distributions in large samples. If X is normal with mean (and median) µ and variance σ², then X ∼ N(µ, σ²) with pdf

f_X(x) = (1/√(2πσ²)) e^{−(x−µ)²/(2σ²)} = (1/σ) φ((x − µ)/σ).

f_X has maximum at µ and inflection points at µ ± σ. Moments are E X = µ, E X² = µ² + σ², E X³ = µ³ + 3µσ², E X⁴ = µ⁴ + 6µ²σ² + 3σ⁴.

Stein's Lemma: If g(·) is differentiable with E|g'(X)| < ∞, then E[g(X)(X − µ)] = σ² E g'(X).

Z ≡ (X − µ)/σ is distributed N(0, 1) (i.e., "standard normal"). E[Z^k] = 0 if k odd, E[Z^k] = 1 · 3 · 5 ··· (k − 1) if k even. CDF denoted Φ(·); pdf is

φ(z) ≡ f_Z(z) = (1/√(2π)) e^{−z²/2}.

1. P(|X − µ| ≤ σ) = P(|Z| ≤ 1) ≈ 68.26%;
2. P(|X − µ| ≤ 2σ) = P(|Z| ≤ 2) ≈ 95.44%;
3. P(|X − µ| ≤ 3σ) = P(|Z| ≤ 3) ≈ 99.74%.

Independence and zero-covariance are equivalent for linear functions of normally distributed r.v.s. If normally distributed random vectors are pairwise independent, they are mutually independent.

Given iid sample X_i ∼ N(µ, σ²), log-likelihood is

L(x) = −(n/2) log(2π) − (n/2) log(σ²) − (1/(2σ²)) Σ_i (x_i − µ)².

Many distributions can be generated with manipulations/combinations of normals:

1. Square of standard normal is χ²_1.
2. If X ∼ N(µ, σ²), Y ∼ N(γ, τ²), and X ⊥⊥ Y, then X + Y ∼ N(µ + γ, σ² + τ²) (i.e., independent normals are additive in mean and variance).
3. The sum and difference of independent normal r.v.s are independent normal as long as the variances are equal.
4. Ratio of independent standard normals is Cauchy (σ = 1, θ = 0); look for the kernel of the exponential distribution when dividing normals.

Bivariate normal distribution (C&B 4.5.10, ex. 4.45) Parameters µ_X, µ_Y ∈ R; σ_X, σ_Y > 0; ρ ∈ [−1, 1]; and pdf (on R²):

f(x, y) = (2πσ_X σ_Y √(1 − ρ²))^{−1} exp( −[1/(2(1 − ρ²))] [ ((x − µ_X)/σ_X)² − 2ρ((x − µ_X)/σ_X)((y − µ_Y)/σ_Y) + ((y − µ_Y)/σ_Y)² ] ).

1. The marginal distributions of X and Y are N(µ_X, σ²_X) and N(µ_Y, σ²_Y) (note marginal normality does not imply joint normality).
2. The correlation between X and Y is ρ.
3. For any constants a and b, the distribution of aX + bY is N(aµ_X + bµ_Y, a²σ²_X + b²σ²_Y + 2abρσ_Xσ_Y).
4. All conditional distributions of Y given X = x are normal: Y|X = x ∼ N(µ_Y + ρ(σ_Y/σ_X)(x − µ_X), σ²_Y(1 − ρ²)), where ρ(σ_Y/σ_X) = Cov(X, Y)/σ²_X.

Multivariate normal distribution (Hansen 5-17–35; MaCurdy p. 6; Greene 94) p-dimensional normal, N_p(µ, Σ), has pdf

f(x) = (2π)^{−p/2} |Σ|^{−1/2} exp( −(1/2)(x − µ)'Σ^{−1}(x − µ) ),

where µ = E[X] and Σ_{ij} = Cov(X_i, X_j).

A linear transformation of a normal is normal: if X ∼ N_p(µ, Σ), then for any A ∈ R^{q×p} with full row rank (which implies q ≤ p), and any b ∈ R^q, we have AX + b ∼ N_q(Aµ + b, AΣA'). In particular, Σ^{−1/2}(X − µ) ∼ N(0, I), where Σ^{−1/2} = (Σ^{1/2})^{−1} = HΛ^{−1/2}H'.

The following transformations of X ∼ N_p(µ, Σ) are independent iff AΣB' = Cov(AX, BX) = 0:
1. AX ∼ N(Aµ, AΣA') and BX ∼ N(Bµ, BΣB');
2. AX ∼ N(Aµ, AΣA') and X'BX ∼ χ²_{rank(BΣ)} (where BΣ is an idempotent matrix);
3. X'AX ∼ χ²_{rank(AΣ)} and X'BX ∼ χ²_{rank(BΣ)} (where AΣ and BΣ are idempotent matrices).

Chi squared distribution (C&B 5.3.2; Hansen 5-29–32; MaCurdy p. 6; Greene 92) χ²_n (chi squared with n degrees of freedom) has mean n and variance 2n. Can be generated from normal:
1. If Z ∼ N(0, 1), then Z² ∼ χ²_1 (i.e., the square of a standard normal is a chi squared with 1 degree of freedom);
2. If X_1, ..., X_n are independent with X_i ∼ χ²_{p_i}, then Σ X_i ∼ χ²_{Σ p_i} (i.e., independent chi squared variables add to a chi squared, and the degrees of freedom add);
3. If X ∼ N_n(µ, Σ), then (X − µ)'Σ^{−1}(X − µ) ∼ χ²_n;
4. If X ∼ N_n(0, I) and P_{n×n} is an idempotent matrix, then X'PX ∼ χ²_{rank(P)} = χ²_{tr(P)};
5. If X ∼ N_n(0, I) then the sum of the squared deviations from the sample mean, X'M_ι X, ∼ χ²_{n−1};
6. If X ∼ N_n(0, Σ) and B_{n×n}Σ is an idempotent matrix, then X'BX ∼ χ²_{rank(BΣ)} = χ²_{tr(BΣ)}.

Student's t distribution (C&B 5.3.4; Greene 69–70) If X_1, ..., X_n are iid N(µ, σ²), then √n(X̄ − µ)/σ ∼ N(0, 1). However, we will generally not know σ. Using the sample variance rather than the true variance gives √n(X̄ − µ)/s ∼ t_{n−1}.

Generally, N(0, 1)/√(χ²_{n−1}/(n − 1)) ∼ t_{n−1}. If a t distribution has p degrees of freedom, there are only p − 1 defined moments. t has thicker tails than normal.

t_1 is Cauchy distribution (the ratio of two independent standard normals). t_∞ is standard normal.

Snedecor's F distribution (C&B 5.3.6–8) (χ²_p/p)/(χ²_q/q) ∼ F_{p,q}. The F distribution is also related by transformation with several other distributions:
1. 1/F_{p,q} ∼ F_{q,p} (i.e., the reciprocal of an F r.v. is another F with the degrees of freedom switched);
2. (t_q)² ∼ F_{1,q};
3. (p/q)F_{p,q}/(1 + (p/q)F_{p,q}) ∼ beta(p/2, q/2).

Lognormal distribution (C&B p. 625) If X ∼ N(µ, σ²), then Y ≡ e^X is lognormally distributed. (Note: a lognormal is not the log of a normally distributed r.v.)

E Y = e^{µ+(σ²/2)};
Var Y = e^{2(µ+σ²)} − e^{2µ+σ²}.

Exponential families (C&B 3.4; Mahajan 1-5–6, 11) Any family of pdfs or pmfs that can be expressed as

f(x|θ) = h(x)c(θ) exp( Σ_{i=1}^{k} w_i(θ) t_i(x) ),

where h(x) ≥ 0, {t_i(x)} are real-valued functions, c(θ) ≥ 0, {w_i(θ)} are real-valued functions, and the support does not depend on θ.

Includes normal, gamma, beta, χ², binomial, Poisson, and negative binomial. C&B Theorem 3.4.2 gives results that may help calculate mean and variance using differentiation, rather than summation/integration.

Can be re-parametrized as:

f(x|η) = h(x)c*(η) exp( Σ_{i=1}^{k} η_i t_i(x) ),

over "natural parameter space" H ≡ {η = (η_1, ..., η_k): ∫_R h(x) exp(Σ_{i=1}^{k} η_i t_i(x)) dx < ∞}, where for all η ∈ H we have c*(η) ≡ [∫_R h(x) exp(Σ_{i=1}^{k} η_i t_i(x)) dx]^{−1} to ensure the pdf integrates to 1.

The joint distribution of an iid sample from an exponential family will also be an exponential family (closure under random sampling).

Location and Scale families (C&B 3.5.1–7, p. 121) If f(x) a pdf, µ, σ constants with σ > 0, then g(x) also a pdf:

g(x) ≡ (1/σ) f((x − µ)/σ).

X ∼ g(x) iff ∃Z ∼ f, X = σZ + µ. Assume X and Z exist; P(X ≤ x) = P(Z ≤ (x − µ)/σ), and if E Z and Var Z exist, then E X = σ E Z + µ and Var(X) = σ² Var Z.
1. Family of pdfs f(x − µ) indexed by µ is the "location family" with standard pdf f(x) and location parameter µ.
2. Family of pdfs (1/σ)f(x/σ) indexed by σ > 0 is the "scale family" with standard pdf f(x) and scale parameter σ. (e.g., exponential.)
3. Family of pdfs (1/σ)f((x − µ)/σ) indexed by µ, and σ > 0 is the "location-scale family" with standard pdf f(x), location parameter µ, and scale parameter σ. (e.g., uniform, normal, double exponential, Cauchy.)

Stable distribution (Hansen 5-15) Let X_1, X_2 be iid F, and define Y = aX_1 + bX_2 + c. Then F is a stable distribution iff ∀a, b, c, ∃d, e such that dY + e ∼ F.
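The chi squared facts above lend themselves to a quick simulation check. A minimal sketch (sample sizes and degrees of freedom are arbitrary choices, not from the notes):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Squares of standard normals: Z^2 ~ chi^2_1, and independent chi squared
# variables add, with degrees of freedom adding.
z = rng.standard_normal(size=(200_000, 5))
q = (z**2).sum(axis=1)            # should be chi^2 with 5 degrees of freedom

print(q.mean(), q.var())          # approx n = 5 and 2n = 10
# Kolmogorov-Smirnov test against the chi^2_5 cdf (large p-value expected)
print(stats.kstest(q, stats.chi2(df=5).cdf))
```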

1.6 Random samples

Random sample, iid (5.1.1) R.v.s X_1, ..., X_n are a random sample of size n from population f(x) (a.k.a. n iid r.v.s with pdf/pmf f(x)) if they are mutually independent, each with marginal pdf/pmf f(x). By independence, f(x_1, ..., x_n) = Π_i f(x_i).

Statistic (C&B 5.2.1; Mahajan 1-7) A r.v. Y ≡ T(X_1, ..., X_n), where T is a real or vector-valued function T(x_1, ..., x_n) whose domain includes the support of (X_1, ..., X_n). The distribution of Y is called its sampling distribution. Alternately, any measurable function of the data (as distinct from a parameter—a function of the distribution).

Unbiased estimator (Hansen 5-14; C&B 7.3.2) A statistic θ̂ is unbiased for θ iff E_θ(θ̂) = θ for all θ. That is, if Bias[θ̂] ≡ E_θ θ̂ − θ = 0 for all θ.

Sample mean (C&B 5.2.2, 4, 6–8, 10, p. 216, 5.5.2) X̄ ≡ (1/n) Σ_i X_i (i.e., the arithmetic average of the values in a random sample). For any real numbers, the arithmetic average minimizes SSR (i.e., x̄ ∈ argmin_a Σ_i (x_i − a)²).
1. If E X_i = µ < ∞, then E X̄ = µ (i.e., X̄ is an unbiased estimator of µ).
2. If Var X_i = σ² < ∞, then Var X̄ = σ²/n.
3. If X_i have mgf M_X(t), then M_X̄(t) = [M_X(t/n)]^n.
4. (Law of Large Numbers) If {X_i} iid, E X_i = µ < ∞ and Var X_i = σ² < ∞, then the series X̄_n →^as µ (this also implies convergence in probability, the "Weak" LLN).

The distribution of the X_i, together with n, characterize the distribution of X̄:
1. If X_i ∼ N(µ, σ²), then X̄ ∼ N(µ, σ²/n).
2. If X_i ∼ gamma(α, β), then X̄ ∼ gamma(nα, β/n).
3. If X_i ∼ Cauchy(θ, σ), then X̄ ∼ Cauchy(θ, σ).
4. If X_i ∼ (1/σ)f((x − µ)/σ) are members of a location-scale family, then X̄ = σZ̄ + µ, where {Z_i}_{i=1}^{n} is a random sample with Z_i ∼ f(z).

Sample variance (C&B 5.2.3–4, 6; Greene 102–4)

s² ≡ (1/(n−1)) Σ_i (X_i − X̄)² = (1/(n−1)) [Σ_i X_i² − nX̄²].

Sample standard deviation is s ≡ √(s²). If Var(X_i) = σ² < ∞, then E s² = σ² (i.e., s² is an unbiased estimator of σ²). s²_{aX} = a²s²_X. For any real numbers {x_i}_{i=1}^{n}, we have Σ_i (x_i − x̄)² = Σ_i x_i² − nx̄².

Sample covariance (Greene 102–4)

s_{XY} ≡ (1/(n−1)) Σ_i (X_i − X̄)(Y_i − Ȳ) = (1/(n−1)) [Σ_i X_iY_i − nX̄Ȳ].

If Cov(X_i, Y_i) = σ_{XY} < ∞, then E s_{XY} = σ_{XY} (i.e., s_{XY} is an unbiased estimator of σ_{XY}). s_{aX,bY} = ab·s_{XY}. For any real numbers {x_i, y_i}_{i=1}^{n}, we have Σ_i (x_i − x̄)(y_i − ȳ) = Σ_i x_iy_i − nx̄ȳ.

Sample correlation (Greene 102–4) r_{XY} ≡ s_{XY}/(s_X s_Y). r_{aX,bY} = (ab/|ab|) r_{XY}.

Order statistic (C&B 5.4.1–4) The order statistics of a sample X_1, ..., X_n are the ordered values from X_(1) (the sample minimum) to X_(n) (the sample maximum). Thus the sample median is

M ≡ X_((n+1)/2) if n is odd; (1/2)(X_(n/2) + X_(n/2+1)) if n is even.

If {X_i}_{i=1}^{n} iid continuous r.v.s, then

F_{X_(j)}(x) = Σ_{k=j}^{n} C(n, k) [F_X(x)]^k [1 − F_X(x)]^{n−k};

f_{X_(j)}(x) = [n!/((j−1)!(n−j)!)] f_X(x) [F_X(x)]^{j−1} [1 − F_X(x)]^{n−j}.

See C&B 5.4.3 for discrete r.v.

Samples from the normal distribution (C&B 5.3.1, 6) {X_i}_{i=1}^{n} iid N(µ, σ²) gives:
1. X̄ ⊥⊥ s² (can also be shown with Basu's Theorem);
2. X̄ ∼ N(µ, σ²/n);
3. Var(s²) = 2σ⁴/(n − 1);
4. (n − 1)s²/σ² ∼ χ²_{n−1}.

If {X_i}_{i=1}^{n} iid N(µ_X, σ²_X) and {Y_i}_{i=1}^{m} iid N(µ_Y, σ²_Y),

(s²_X/s²_Y)/(σ²_X/σ²_Y) = (s²_X/σ²_X)/(s²_Y/σ²_Y) ∼ F_{n−1,m−1}.

1.7 Convergence of random variables

[Diagram: implications among modes of convergence — L^s convergence implies L^r convergence for s ≥ r ≥ 0; L^r convergence (r > 0) implies convergence in probability; almost sure convergence implies convergence in probability; convergence in probability implies convergence in distribution.]

See more LLNs and CLTs in "Time-series concepts."

Convergence in probability (C&B 5.5.1–4, 12; Hansen 5-41; Hayashi 89; D&M 103; MaCurdy p. 9) {X_i}_{i=1}^{∞} converges in probability to X iff ∀ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0, or equivalently lim_{n→∞} P(|X_n − X| < ε) = 1. Written as X_n →^p X or X_n − X = o_p(1) or plim_{n→∞} X_n = X.
1. Convergence in probability is implied by almost sure convergence or convergence in L^p (for p > 0).
2. Convergence in probability implies convergence in distribution (but not conversely).
3. (Weak Law of Large Numbers) If {X_i} iid with E X_i = µ < ∞ and Var X_i = σ² < ∞, then the series X̄_n →^p µ (stronger result gives convergence almost surely).
4. (Continuous Mapping Theorem) If X_n →^p X and h is a continuous function, then h(X_n) →^p h(X).
5. If E X_n → µ and Var X_n → 0, then X_n →^p µ.

Uniform convergence in probability (Hayashi 456–7; MaCurdy p. 14; D&M 137) {Q_i(θ)}_{i=1}^{∞} converges in probability to Q_0(θ) uniformly iff sup_{θ∈Θ} ‖Q_n(θ) − Q_0(θ)‖ →^p 0. That is, ∀ε > 0, lim_{n→∞} P(sup_{θ∈Θ} ‖Q_n(θ) − Q_0(θ)‖ ≥ ε) = 0, or equivalently lim_{n→∞} P(sup_{θ∈Θ} ‖Q_n(θ) − Q_0(θ)‖ < ε) = 1.

This is stronger than pointwise convergence. Uniform convergence in probability is the regularity condition required to pass plims through functions or to reverse the order of differentiation and integration.

Little o error notation (D&M 108–13; Hansen 5-42; MathWorld) Roughly speaking, a function is o(z) iff it is of lower asymptotic order than z. f(n) = o(g(n)) iff lim_{n→∞} f(n)/g(n) = 0. If {f(n)} is a sequence of random variables, then f(n) = o_p(g(n)) iff plim_{n→∞} f(n)/g(n) = 0. We write X_n − X = o_p(n^{−γ}) iff n^γ(X_n − X) →^p 0.
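A simulation can confirm the sampling-distribution facts for normal samples. A minimal sketch (µ, σ, n, and the replication count are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# With X_i iid N(mu, sigma^2): Xbar ~ N(mu, sigma^2/n),
# (n-1)s^2/sigma^2 ~ chi^2_{n-1}, and Xbar independent of s^2.
mu, sigma, n, reps = 1.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))

xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)          # ddof=1 gives the unbiased estimator

print(xbar.mean(), xbar.var())      # approx mu = 1 and sigma^2/n = 0.4
print(s2.mean())                    # approx sigma^2 = 4 (unbiasedness)
q = (n - 1) * s2 / sigma**2
print(q.mean(), q.var())            # approx n-1 = 9 and 2(n-1) = 18
print(np.corrcoef(xbar, s2)[0, 1])  # approx 0 (independence of Xbar, s^2)
```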

Big O error notation (D&M 108–13; MathWorld) Roughly speaking, a function is O(z) iff it is of the same asymptotic order as z. f(n) = O(g(n)) iff |f(n)/g(n)| < K for all n > N, for some positive integer N and some constant K > 0. If {f(n)} is a sequence of random variables, then f(n) = O_p(g(n)) iff f(n)/g(n) is bounded in probability.

Order symbols (D&M 111-2)
O(n^p) ± O(n^q) = O(n^{max{p,q}}).
o(n^p) ± o(n^q) = o(n^{max{p,q}}).
O(n^p) ± o(n^q) = O(n^p) if p ≥ q; o(n^q) if p < q.
O(n^p) · O(n^q) = O(n^{p+q}).
o(n^p) · o(n^q) = o(n^{p+q}).
O(n^p) · o(n^q) = o(n^{p+q}).
These identities cover sums, as long as the number of terms summed is independent of n.

Asymptotic equivalence (D&M 110–1) f(n) =^a g(n) iff lim_{n→∞} f(n)/g(n) = 1.

Convergence in L^p (Hansen 5-43–5; Hayashi 90; MaCurdy p. 11) {X_i}_{i=1}^{∞} converges in L^p to X iff lim_{n→∞} E(|X_n − X|^p) = 0. Note this requires the existence of a pth moment. Written X_n →^{Lp} X. Convergence in L² is a.k.a. convergence in mean square/quadratic mean.
1. Convergence in L^p is implied by convergence in L^q for q ≥ p.
2. Convergence in L^p implies convergence in L^j for j ≤ p.
3. Convergence in L^p (for p > 0) implies convergence in probability and in distribution (but not conversely).
4. (Continuous Mapping Theorem extension) If X_n →^{Lp} X and h is a continuous function, then h(X_n) →^{Lp} h(X).

Convergence in L² to a constant requires lim_{n→∞} E[(X_n − X)'(X_n − X)] = lim_{n→∞} (Bias² + Var) = 0. Thus it is necessary and sufficient that squared bias and variance go to zero.

Almost sure convergence (C&B 5.5.6, 9; Hayashi 89; D&M 106, 131; MaCurdy p. 10) {X_i}_{i=1}^{∞} converges almost surely to X iff ∀ε > 0, P(lim_{n→∞} |X_n − X| < ε) = 1. Written X_n →^as X. Also known as strong convergence, or convergence almost everywhere.
1. Almost sure convergence implies convergence in probability and in distribution (but not conversely).
2. (Strong Law of Large Numbers) If {X_i} iid with E X_i = µ < ∞ and Var X_i = σ² < ∞, then the series X̄_n →^as µ.
3. (Strong Law of Large Numbers, niid) If {X_i} niid with E X_i = 0 and Σ_i Var(X_i)/i² < ∞, then the series X̄_n →^as 0.
4. (Continuous Mapping Theorem extension) If X_n →^as X and h is a continuous function, then h(X_n) →^as h(X).

Convergence in distribution (C&B 5.5.10–13; Hayashi 90–1; Greene 119–20; D&M 107) {X_i}_{i=1}^{∞} converges in distribution to X iff lim_{n→∞} F_{X_n}(x) = F_X(x) at all points where F_X is continuous. Written as X_n →^d X or X_n − X = O_p(1) or as "X is the limiting distribution of X_n."
1. Convergence in distribution is implied by almost sure convergence, convergence in L^p (for p > 0), or convergence in probability.
2. Convergence in distribution implies convergence in probability if the series converges to a constant.

Central Limit Theorem for iid samples (C&B 5.5.14–15; Hansen 5-60–65; Hayashi 96) Lindeberg-Levy CLT: √n(X̄_n − µ)/σ →^d N(0, 1) as long as the iid X_i's each have finite mean, and finite, nonzero variance. Note a weaker form requires mgfs of X_i to exist in a neighborhood of zero.

In the multivariate case, iid X_i ∼ (µ, Σ) satisfy √n(X̄_n − µ) →^d N(0, Σ). Proved using the Cramér-Wold Device.

We also have CLTs for niid samples (Lyapounov's Theorem), an ergodic stationary mds CLT, and a CLT for MA(∞) processes.

Central Limit Theorem for niid samples (Hansen 5-62; D&M 126; MaCurdy p. 21–2) [Lyapounov's Theorem] If X_i ∼ niid(µ, σ²_i) and a (2 + δ)th moment exists for each X_i, then

√n(X̄_n − µ) →^d N(0, σ̄²),

where σ̄² ≡ lim_{n→∞} (1/n) Σ_i σ²_i, as long as the X_i's each have finite mean, and finite, nonzero variance.

Implies that if ε_i ∼ niid(0, σ²) (with extant (2 + δ)th moment), and {z_i} a series of (non-stochastic) constants, then n^{−1/2} Z'ε →^d N(0, σ²S_{zz}), where S_{zz} ≡ lim_{n→∞} (1/n) Σ_i z_i² = lim_{n→∞} (1/n) Z'Z.

Slutsky's Theorem (C&B 5.5.17; Hayashi 92–3) If X_n →^d X and Y_n →^p a, where a is a constant, then:
1. Y_nX_n →^d aX;
2. X_n + Y_n →^d X + a.

Delta Method (C&B 5.5.24, 26, 28; Wikipedia; Hayashi 93–4) Let {X_i}_{i=1}^{∞} be a sequence of r.v.s satisfying √n(X_n − θ) →^d N(0, σ²). For a given function g and specific θ, suppose g'(θ) exists and g'(θ) ≠ 0. Then:

√n[g(X_n) − g(θ)] →^d N(0, σ²[g'(θ)]²).

If g'(θ) = 0, but g''(θ) exists and g''(θ) ≠ 0, we can apply the second-order Delta Method and get n[g(X_n) − g(θ)] →^d (1/2)σ²g''(θ)χ²_1.

Alternate formulation: If B is an estimator for β, then the variance of a function h(B) ∈ R is Var(h(B)) ≈ ∇h(β)' Var(B) ∇h(β). If h(B) is vector-valued, the variance is H Var(B) H', where H = ∂h/∂β' (i.e., H_{ij} ≡ ∂_j h_i(β)).

1.8 Parametric models

Parametric model (Mahajan 1-1–2) Describes an (unknown) probability distribution P that is assumed to be a member of a family of distributions 𝒫. We describe 𝒫 with a parametrization: a map from a (simpler) space Θ to 𝒫 such that 𝒫 = {P_θ: θ ∈ Θ}. If Θ is a "nice" subset of Euclidean space, and the mapping P_θ is "smooth," then 𝒫 is called a parametric model. A regular parametric model if either all P_θ are continuous, or all are discrete.

Parameter (Mahajan 1-2) Mapping from the family of distributions 𝒫 to another space (typically a subset of Euclidean space). Can be explicitly or implicitly defined. A function of the distribution (as distinct from a statistic—a function of the data).

Identification (Mahajan 1-3–4; C&B 11.2.2) A parameter is (point) identified if it is uniquely determined for every distribution; i.e., if the mapping is one-to-one. When the parameter is implicitly defined as the solution to an optimization problem (e.g., θ(P) = argmax_{b∈Θ} Q_0(b, P)), identification corresponds to existence of a unique solution to the optimization.

Two elements θ_1, θ_2 ∈ Θ are observationally equivalent iff they imply P_{θ_1} = P_{θ_2}. Identification of θ means there is no other element of Θ observationally equivalent to θ; i.e., P_{θ_1} = P_{θ_2} =⇒ θ_1 = θ_2.

Identification in exponential families (Mahajan 1-5–6, Metrics P.S. 5-1) For iid sampling from an exponential family where η_i(θ) = θ_i, if the k × k (Fisher Information) matrix

I(θ*) ≡ E[ (d log p(x, θ*)/dθ) (d log p(x, θ*)/dθ)' ]

is nonsingular for every θ* ∈ Θ, then θ is identified.

Conditional homoscedasticity (Mahajan 2-17) The assumption that Var(Y|Z) = σ²; i.e., variance of Y does not depend on Z.
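The Delta Method's first-order formula can be checked by simulation. A minimal sketch, assuming iid Exponential draws with mean θ = 2 and g(x) = log(x), so σ² = θ² and g'(θ) = 1/θ, giving limit variance σ²[g'(θ)]² = 1 (all choices are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# sqrt(n)(g(Xbar_n) - g(theta)) -> N(0, sigma^2 * g'(theta)^2) = N(0, 1)
theta, n, reps = 2.0, 500, 20_000
xbar = rng.exponential(theta, size=(reps, n)).mean(axis=1)

t = np.sqrt(n) * (np.log(xbar) - np.log(theta))
print(t.mean(), t.var())   # approx 0 and 1
```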

Regression model (Mahajan 1-6–7, 3-9; Hayashi 465–6; Metrics P.S. 7-7) {Y_i, X_i}_{i=1}^{n}, where Y_i ∈ R the dependent variable (a.k.a. response) and X_i ∈ R^d the covariates or explanatory variables. Possible specifications include:
1. E(Y_i|X_i) = g(X_i) for some unknown function g (nonparametric regression);
2. E(Y_i|X_i) = g(X_i'θ_0) for some unknown θ_0 ∈ R^d and unknown function g (single-index model; semi-parametric);
3. E(Y_i|X_i) = X_i'θ_0 for some unknown θ_0 ∈ R^d;
4. (Y_i|X_i) ∼ N(X_i'θ_0, σ²) for some unknown θ_0 ∈ R^d and σ² ∈ R_+ (Gaussian regression model; θ_0 is identified and conditional MLE θ̂ = (X'X)^{−1}X'Y is consistent if E(X_iX_i') is nonsingular; conditional MLE σ̂² = (1/n)(Y − Xθ̂)'(Y − Xθ̂)).

Linear regression model with non-stochastic covariates (Mahajan 1-10–1, 18–9; Metrics P.S. 5-3) For the two-dimensional Gaussian regression model with X_i = (1, x_i)' known. The parameter (θ_0, σ²) is identified as long as the x_i are not all identical, with complete sufficient statistic (Σ Y_i, Σ Y_i², Σ x_iY_i). MLE computed in problem set.

Seemingly Unrelated Regressions (Mahajan 2-11, 21) {Y_i, X_i}_{i=1}^{n} where Y_i ∈ R^m and X_i = (X_{1i}', ..., X_{mi}')' ∈ R^{(m²)}. We assume Y_i are (multivariate) normal distributed with means E[Y_{si}] = x_{si}'β_s where β_s ∈ R^m, and the parameter of interest is β = (β_1', ..., β_m')' ∈ R^{(m²)}.

Probit model (Mahajan 2-12–3, 3-9–10; Hayashi 466) iid {W_i}_{i=1}^{n} = {Y_i, Z_i}_{i=1}^{n} where Y_i ∈ {0, 1} and Y_i have conditional distribution P(Y_i = 1|Z_i) = Φ(θ'Z_i) where Φ(·) is the cdf of the standard normal. θ is identified and MLE is consistent if E(Z_iZ_i') is nonsingular. Alternate motivation is threshold crossing model: Y*_i = θ'Z_i − ε_i where ε_i ⊥⊥ Z_i and standard normal, and Y_i = 1{Y*_i > 0}.

Nonlinear least squares (Mahajan 2-16–7, 3-6–7, 17–9) iid {Y_i, Z_i}_{i=1}^{n} where Y_i have conditional expectation E(Y|Z) = ψ(Z, θ). The parameter θ can also be defined implicitly as θ = argmin_{b∈Θ} E[Y − ψ(Z, b)]². Identification condition is that for all b ≠ θ, we have P(ψ(Z, b) ≠ ψ(Z, θ)) > 0. See Mahajan 3-17–9 for asymptotic properties, including heteroscedasticity robust asymptotic variance matrix.

Linear instrumental variables (Mahajan 2-22–3, 3-11–2) Y_i = X_iθ + ε_i with moment conditions E[(Y_i − X_iθ)Z_i] = 0 for random vector Z_i. The Z_i are instruments for the endogenous regressors X_i (endogenous because E ε_iX_i ≠ 0). Identification condition for θ is that E(Z_iX_i') has full column rank and that the dimension of Z_i be at least the dimension of X_i.

1.9 Statistics

Sufficient statistic (Mahajan 1-8–10; C&B 6.2.1) T(X) is sufficient for {P_θ: θ ∈ Θ} (or more compactly for θ) iff the conditional distribution of X given T(X) does not depend on θ; i.e., p(x|T(x)) = p(x|T(x), θ). Once the value of a sufficient statistic is known, the sample does not carry any further information about θ. Useful for:
1. Decision theory: base decision rules on sufficient statistics (for any decision rule, we can always come up with a rule based only on a sufficient statistic that has the same risk);
2. Dealing with nuisance parameters in hypothesis testing: find sufficient statistics for the nuisance parameters and condition decision rules on them;
3. Unbiased estimation: look for unbiased estimators that are functions of sufficient statistics.

Any one-to-one function of a sufficient statistic is also sufficient. Outside exponential families, it is rare to have a sufficient statistic of smaller dimension than the data.

Factorization theorem (Mahajan 1-9; C&B 6.2.6) In a regular parametric model {P_θ: θ ∈ Θ}, a statistic T(X) (with range 𝒯) is sufficient for θ iff there exists a function g: 𝒯 × Θ → R and a function h such that f(x, θ) = g(T(x), θ)h(x) for all x and θ.

Minimal sufficient statistic (Mahajan 1-12, 19; C&B 6.2.11) T(X) is minimal sufficient if it is sufficient, and for any other sufficient statistic S(X) we can find a function r such that T(X) = r(S(X)). This means that a minimal sufficient statistic induces the coarsest possible partition on the data; i.e., it has achieved the maximal amount of data reduction possible while still retaining all information about the parameter.

Any one-to-one function of a minimal sufficient statistic is minimal sufficient. If a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.

Likelihood function (Mahajan 1-13–4) L(x, θ) ≡ p(x, θ). This is the same as the pdf/pmf, but considered as a function of θ instead of x.

The likelihood ratio Λ(x, ·) ≡ L(x, ·)/L(x, θ_0), where θ_0 ∈ Θ is fixed and known, with the support of P_θ a subset of the support of P_{θ_0} for all θ ∈ Θ. The likelihood ratio is minimal sufficient for θ.

Ancillary statistic (Mahajan 1-14; C&B 6.2.16) S(X) is ancillary for θ iff the distribution of S(X) does not depend on θ. It is first-order ancillary iff E S(X) does not depend on θ.

Complete statistic (Mahajan 1-16; C&B 6.2.21, 28) T: 𝒳 → 𝒯 is complete iff for every measurable real function g: 𝒯 → R such that ∀θ ∈ Θ, E_θ[g(T)] = 0 implies that g(T) = 0 almost everywhere. Equivalently, T is complete if no non-constant function of T is first-order ancillary.

If a minimal sufficient statistic exists, then any complete statistic is also a minimal sufficient statistic.

Basu's theorem (Mahajan 1-19; C&B 6.2.24, 28) If T(X) is a complete minimal sufficient statistic, then T(X) is independent of every ancillary statistic. Note the minimal wasn't really necessary: if a minimal sufficient statistic exists, then any complete statistic is also a minimal sufficient statistic.

Statistics in exponential families (Mahajan 1-11, 17; C&B 6.2.10, 25) By the factorization theorem, T(X) ≡ (T_1(X), ..., T_k(X)) is sufficient for θ. (N.B. The statistic must contain all the T_i.)

If X is an iid sample, then T(X) ≡ (Σ_i T_1(X_i), ..., Σ_i T_k(X_i)) is a complete statistic if the set {(η_1(θ), ..., η_k(θ)): θ ∈ Θ} contains an open set in R^k. (Usually, all we'll check is dimensionality.)

1.10 Point estimation

Estimator (Mahajan 2-2) Any measurable function of the data. Note this must not be a function of any parameters of the distribution.

Extremum estimator (Hayashi 446) An estimator θ̂ such that there is a scalar ("objective") function Q_n(θ) such that θ̂ maximizes Q_n(θ) subject to θ ∈ Θ ⊆ R^p. The objective function depends not only on θ, but also on the data (a sample of size n).

Analogy principle (Mahajan 2-2–3) Consider finding an estimator that satisfies the same properties in the sample that the parameter satisfies in the population; i.e., seek to estimate θ(P) with θ(P_n) where P_n is the empirical distribution which puts mass 1/n at each sample point. Note this distribution converges uniformly to P.

Consistent estimator (Mahajan 3-2; Hansen 5-41–2; C&B 10.1.1, 3) The sequence of estimators {θ̂_n}_{n=1}^{∞} is consistent for θ iff θ̂_n →^p θ(P). The sequence is superconsistent iff θ̂_n − θ = o_p(n^{−1/2}). Superconsistency implies consistency.

If lim_{n→∞} Var_θ θ̂_n = 0 (variance goes to zero) and lim_{n→∞} E_θ θ̂_n = θ (bias goes to zero) for every θ ∈ Θ, then {θ̂_n} is consistent (sufficient, not necessary, by Chebychev's Inequality).

Consistency with compact parameter space (Mahajan 3-4–5, 11; Hayashi 457) Let $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}Q_n(W,b) \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$. (In the last equivalence we have suppressed dependence on the data $W$.) This covers M-estimators, MLE, and GMM estimators. Suppose that:

1. $\Theta$ is a compact subset of $\mathbb{R}^d$ [generally not satisfied];
2. $Q_n(b)$ is continuous in $b$ for any realization of the data $W$ ["usually easily checked"];
3. $Q_n(b)$ is a measurable function of the data for all $b\in\Theta$ [generally assumed].

These conditions ensure that $\hat\theta_n$ is well-defined. Suppose there exists a function $Q_0(b)$ such that:

1. Identification: $Q_0(\cdot)$ is uniquely (globally) maximized on $\Theta$ at $\theta\in\Theta$;
2. Uniform convergence: $Q_n(\cdot)$ converges uniformly in probability to $Q_0(\cdot)$ [can be verified by checking more primitive conditions; in particular, for M-estimators a uniform weak LLN will suffice].

Then $\hat\theta_n \to_p \theta$.

Consistency without compact parameter space (Mahajan 3-5, 6; Hayashi 458) Let $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}Q_n(W,b) \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$ as above. Suppose that:

1. The true parameter $\theta \in \operatorname{interior}(\Theta)$;
2. $\Theta$ is a convex set;
3. $Q_n(b)$ is concave in $b$ for any realization of the data $W$ [will be true for MLE, since the log-likelihood is concave, if only after a re-parametrization];
4. $Q_n(b)$ is a measurable function of the data for all $b\in\Theta$.

These conditions ensure that $\hat\theta_n$ is well-defined. Suppose there exists a function $Q_0(b)$ such that:

1. Identification: $Q_0(\cdot)$ is uniquely (globally) maximized on $\Theta$ at $\theta\in\Theta$;
2. Pointwise convergence: $Q_n(b) \to_p Q_0(b)$ for all $b\in\Theta$.

Then $\hat\theta_n$ exists with probability approaching 1 and $\hat\theta_n \to_p \theta$. See Mahajan 3-6 for M-estimators.

Uniform (Weak) Law of Large Numbers (Mahajan 3-6; Hayashi 459) Suppose $\{W_i\}_i$ is ergodic stationary and that:

1. $\Theta$ is compact;
2. $q(W_i,b)$ is continuous in $b$ for all $W_i$;
3. $q(W_i,b)$ is measurable in $W_i$ for all $b$;
4. $E_P[\sup_{b\in\Theta}|q(W_i,b)|] < \infty$.

Then $\frac{1}{n}\sum_i q(W_i,b)$ converges uniformly to $E[q(W_i,b)]$, and $E[q(W_i,b)]$ is a continuous function of $b$.

Asymptotically normal estimator (Mahajan 3-2, 13–4) The sequence of estimators $\{\hat\theta_n\}$ is ($\sqrt{n}$) asymptotically normal iff $\sqrt{n}(\hat\theta_n - \theta(P)) \to_d N(0, V(P))$ for some symmetric positive definite matrix $V(P)$ (somewhat inaccurately referred to as the asymptotic variance of $\hat\theta_n$). Suppose that:

1. $\hat\theta_n$ is consistent for $\theta$;
2. $\theta \in \operatorname{interior}(\Theta)$;
3. $Q_n(b)$ is twice continuously differentiable in a neighborhood $N$ of $\theta$;
4. $\sqrt{n}\,\frac{\partial Q_n(\theta)}{\partial\theta} \to_d N(0,\Sigma)$;
5. Uniform convergence of the Hessian: there exists a matrix $H(b)$ that is continuous and nonsingular at $\theta$ such that
\[ \sup_{b\in N}\Big\|\frac{\partial^2 Q_n(b)}{\partial\theta\,\partial\theta'} - H(b)\Big\| \to_p 0. \]

Then $\sqrt{n}(\hat\theta_n - \theta) \to_d N(0, H(\theta)^{-1}\Sigma H(\theta)^{-1})$.

Asymptotic variance (C&B 10.1.9) If $k_n[\hat\theta_n - \theta] \to_d N(0,\sigma^2)$ for some sequence of constants $\{k_n\}$, then $\sigma^2$ is the asymptotic variance.

Asymptotically efficient estimator (C&B 10.1.11–2) $\{\hat\sigma_n\}$ is asymptotically efficient if $\sqrt{n}[\hat\theta_n - \theta] \to_d N(0,\sigma^2)$ and $\sigma^2$ is the CRLB. Under regularity conditions, the MLE is consistent and asymptotically efficient.

Maximum Likelihood estimator (Mahajan 2-4–10, 36; Hayashi 448–9, 463–5) $\hat\theta \equiv \operatorname{argmax}_{b\in\Theta}L(X,b)$. Equivalently, for iid data, $\hat\theta = \operatorname{argmax}_{b\in\Theta}\frac{1}{n}\sum\log p(X_i,b)$. Estimating* $\theta = \operatorname{argmax}_{b\in\Theta}E_{P_\theta}\log p(X,b)$. An M-estimator with $q(X_i,b) \equiv -\log p(X_i,b)$. Note:

1. The identification condition is that the parameter being estimated is identified.
2. MLE need not always exist, and if it does, need not be unique.
3. We may not be able to get a closed-form expression for $\hat\theta$, and therefore have to characterize it as the solution to a maximization problem (or its FOCs).
4. The expected value of the (log) likelihood is uniquely maximized at the true parameter as long as $\theta$ is identified; i.e., the Kullback-Leibler Divergence
\[ K(b,\theta) \equiv E_\theta\Big[\log\Big(\frac{p(X,\theta)}{p(X,b)}\Big)\Big] > 0 \quad \text{for all } b \neq \theta, \]
or equivalently, $p(x,b) = p(x,\theta)$ for all $x$ implies $b = \theta$.
5. (Invariance property) If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE of $h(\theta)$.

Consistency for MLE (Hayashi 463–465; Mahajan 3-8–10) Let $\{y_t, x_t\}$ be ergodic stationary, and let $\hat\theta$ be the conditional MLE that maximizes the average log conditional likelihood (derived under the assumption that $\{y_t, x_t\}$ is iid). Suppose conditions (specified on Hayashi 464) allow us to apply a general consistency theorem. Then $\hat\theta \to_p \theta_0$ despite the fact that the MLE was derived under the iid assumption.

Asymptotic normality for MLE (Mahajan 3-20–2; C&B 10.1.12; Hayashi 474–6; D&M 258, 260–3, 270–4) Suppose $\{W_i\} \equiv \{Y_i, Z_i\}$ is an iid sample, that $Z$ is ancillary for $\theta$, and that $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}\frac{1}{n}\sum\log p(Y_i|Z_i,b) \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$. Define the score $s(W_i,b) \equiv \frac{\partial\log p(Y_i|Z_i,b)}{\partial b}$ and the Hessian $H(W_i,b) \equiv \frac{\partial s(W_i,b)}{\partial b'} = \frac{\partial^2\log p(Y_i|Z_i,b)}{\partial b\,\partial b'}$. Suppose:

1. $\hat\theta_n$ is consistent for $\theta$—generally fails either because the number of parameters increases with sample size, or because the model is not asymptotically identified (even if it is identified by any finite sample);
2. $\theta \in \operatorname{interior}(\Theta)$;
3. $p(Y|Z,b)$ is twice continuously differentiable in $b$ for any $(Y,Z)$;
4. $E[s(W,\theta)] = 0$ and $-E[H(W,\theta)] = E[s(W,\theta)s(W,\theta)']$ (this is stated as an assumption, but is the Information Equality and hence holds if its requirements do);
5. $\frac{1}{\sqrt{n}}\sum s(W_i,\theta) \to_d N(0,\Sigma)$ for some $\Sigma > 0$;
6. $E(\sup_{b\in N}\|H(W,b)\|) < \infty$, which implies via ULLN that
\[ \sup_{b\in N}\Big\|\frac{1}{n}\sum_{i=1}^{n}H(W_i,b) - E[H(W,b)]\Big\| \to_p 0; \]
7. $E[H(W,\theta)]$ is nonsingular (only required at the true parameter).

Then $\hat\theta_n$ is asymptotically normal with variance $I(\theta)^{-1}$ (note this is the Fisher information for one observation, not the joint distribution). The MLE is not necessarily unbiased, but in the limit the variance attains the CRLB, hence the MLE is asymptotically efficient. No GMM estimator can achieve lower variance.

Estimation of the asymptotic variance can be done by estimating either the Hessian or the score (either gives a consistent expression using the Fisher Information Equality).
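As a numerical check on the efficiency claim above, a small Monte Carlo sketch, assuming an iid exponential model: the MLE of the rate is $\hat\lambda = 1/\bar X$, the per-observation information is $I(\lambda) = 1/\lambda^2$, so the asymptotic variance of $\sqrt{n}(\hat\lambda - \lambda)$ should be $\lambda^2$:

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, reps = 2.0, 200, 2000

# MLE for the exponential rate: lambda_hat = 1 / sample mean
draws = rng.exponential(scale=1/lam, size=(reps, n))
lam_hat = 1 / draws.mean(axis=1)

# CRLB: per-observation Fisher information is 1/lambda^2,
# so n * Var(lam_hat) should approach lambda^2
print(n * lam_hat.var())   # ~ lam**2 = 4
print(lam_hat.mean())      # ~ lam (small finite-sample bias)
```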

∗This is “quasi-ML” if used for non-iid data. It can be consistent even for (non-iid) ergodic stationary processes—see Hayashi 464–5.

M-estimator (Mahajan 2-15–6; Hayashi 447) $\hat\theta \equiv \operatorname{argmin}_{b\in\Theta}\frac{1}{n}\sum q(X_i,b)$ (assuming iid data). Estimating $\theta = \operatorname{argmin}_{b\in\Theta}E_P\,q(X,b)$.

1. MLE is an M-estimator with $q(X_i,b) \equiv -\log L(X_i,b)$.
2. The sample mean is an M-estimator with $q(X_i,b) \equiv (X_i - b)^2$.

Asymptotic normality for M-estimator (Mahajan 3-14–7; Hayashi 470–4; D&M 593) Suppose $\{W_i\}$ is an iid sample, and $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}\frac{1}{n}\sum q(W_i,b) \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$. Define the "score" $s(W_i,b) \equiv \frac{\partial q(W_i,b)}{\partial b}$ and the Hessian $H(W_i,b) \equiv \frac{\partial s(W_i,b)}{\partial b'} = \frac{\partial^2 q(W_i,b)}{\partial b\,\partial b'}$. Suppose:

1. $\hat\theta_n$ is consistent for $\theta$;
2. $\theta \in \operatorname{interior}(\Theta)$;
3. $q(W,b)$ is twice continuously differentiable in $b$ for any $W$;
4. $\frac{1}{\sqrt{n}}\sum s(W_i,\theta) \to_d N(0,\Sigma)$ for some $\Sigma > 0$;*
5. $E(\sup_{b\in N}\|H(W,b)\|) < \infty$, which implies via ULLN that $\sup_{b\in N}\|\frac{1}{n}\sum_{i=1}^{n}H(W_i,b) - E[H(W,b)]\| \to_p 0$;
6. $E[H(W,\theta)]$ is nonsingular (only required at the true parameter).

Then $\hat\theta_n$ is asymptotically normal with variance $(E[H(W,\theta)])^{-1}\,\Sigma\,(E[H(W,\theta)])^{-1}$. "Under appropriate conditions for a ULLN to apply," this variance is estimated by:
\[ \hat V = \Big[\tfrac{1}{n}\sum H(W_i,\theta_n)\Big]^{-1}\underbrace{\Big[\tfrac{1}{n}\sum s(W_i,\theta_n)s(W_i,\theta_n)'\Big]}_{\hat\Sigma}\Big[\tfrac{1}{n}\sum H(W_i,\theta_n)\Big]^{-1}. \]

Method of Moments estimator (Mahajan 2-19–20) Suppose iid data from a parametric model with $\theta\in\mathbb{R}^d$ identified and the first $d$ moments of $P_\theta$ existing: $\{m_j(\theta)\}_{j=1}^{d} \equiv \{E_\theta X^j\}_{j=1}^{d}$. The Method of Moments estimator sets population moments equal to sample moments:
\[ m_j(\hat\theta) = \hat m_j \equiv \frac{1}{n}\sum_{i=1}^{n}X_i^j \quad \text{for all } j\in\{1,\dots,d\}. \]
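As a concrete instance of the method of moments, the first two moments of a Gamma distribution (shape $\alpha$, scale $\beta$) identify both parameters: $E\,X = \alpha\beta$ and $\operatorname{Var}X = \alpha\beta^2$. A minimal sketch with simulated data:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, beta = 3.0, 1.5            # true shape and scale
x = rng.gamma(alpha, beta, size=5000)

# Match first two moments: E X = alpha*beta, Var X = alpha*beta^2
m, v = x.mean(), x.var()
alpha_hat = m**2 / v
beta_hat = v / m
print(alpha_hat, beta_hat)        # near (3.0, 1.5)
```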
Generalized Method of Moments estimator (Mahajan 2-20–1; Hayashi 447, 468) GMM estimates a parameter $\theta\in\mathbb{R}^d$ satisfying $E[m(X,\theta)] = 0$, where $m\colon \mathcal{X}\times\Theta\to\mathbb{R}^m$ is a vector of $m$ moment conditions. If $\theta$ is identified, it is the unique solution to the moment conditions.

When $m \geq d$ we typically can't find a $\hat\theta$ satisfying all moment conditions, so instead we seek $\hat\theta = \operatorname{argmin}_{b\in\Theta}Q_n(b)$, where
\[ Q_n(b) \equiv \Big(\frac{1}{n}\sum m(X_i,b)\Big)'S\Big(\frac{1}{n}\sum m(X_i,b)\Big) \]
for some specified weight matrix $S_{m\times m}$, symmetric and positive definite. The quadratic form in $S$ defines a norm. Correct specification requires the orthogonality condition $E[m(X_i,\theta)] = 0$.

Extremum estimators can typically be thought of as GMM estimators when we characterize them by FOCs. Includes MLE and M-estimators that can be characterized by FOCs.

Consistency for GMM estimators (Mahajan 3-10–1; Hayashi 467–8) Let $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$ (as above), where
\[ Q_n(b) = -\frac{1}{2}\Big(\frac{1}{n}\sum_{i=1}^{n}m(X_i,b)\Big)'S_n\Big(\frac{1}{n}\sum_{i=1}^{n}m(X_i,b)\Big). \]
The true parameter satisfies $E_P[m(W,\theta)] = 0$ and hence uniquely maximizes the limit function $Q_0(b) = -\frac{1}{2}E[m(W,b)]'S\,E[m(W,b)]$. Suppose that:

1. $\Theta$ is a compact subset of $\mathbb{R}^d$;
2. $m(b)$ is continuous in $b$ for any realization of the data;
3. $m(b)$ is a measurable function of the data for all $b\in\Theta$ (this ensures that $\hat\theta$ is measurable);
4. The weight matrices $S_n$ converge in probability to some symmetric positive definite matrix $S$.

Suppose further that:

1. Identification: $E[m(W,b)] \neq 0$ for any $b \neq \theta$;
2. Dominance to ensure the ULLN applies: $E[\sup_{b\in\Theta}\|m(W,b)\|] < \infty$.

Then $\hat\theta_n \to_p \theta$. Showing identification and dominance for nonlinear GMM is quite difficult and usually just assumed. If the objective function is concave, we can replace compact $\Theta$; continuous, measurable $m$; and dominance by the requirement that $E[m(W,b)]$ exist and be finite for all $b\in\Theta$.

Asymptotic normality for GMM estimator (Mahajan 3-24–6; Hayashi 478–81) Let $\hat\theta \equiv \operatorname{argmax}_{b\in\Theta}-\frac{1}{2}[m_n(b)]'W_n[m_n(b)] \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$, where $m_n(b) \equiv \frac{1}{n}\sum m(X_i,b)_{m\times 1}$. Define the Jacobian $M_n(b)_{d\times m} \equiv \frac{1}{n}\sum\frac{\partial m(b)}{\partial b} = \frac{\partial m_n(b)}{\partial b}$. Suppose:

1. The matrix $M_n(\theta_n)W_nM_n(\bar b_n)$ is invertible;
2. $\sqrt{n}\,m_n(\theta) \to_d N(0, E[m(X,\theta)m(X,\theta)']) \equiv N(0, S(\theta))$ (by a CLT);
3. $M_n(b_n) \to_p E[\frac{\partial m(X,\theta)}{\partial b}] \equiv M(\theta)$ (by a ULLN);
4. $W_n \to_p W$.

Then $\hat\theta_n$ is asymptotically normal with variance
\[ [M(\theta)WM(\theta)']^{-1}M(\theta)WS(\theta)WM(\theta)'[M(\theta)WM(\theta)']^{-1}. \]
If we choose $W = S(\theta)^{-1}$ (the efficient choice), the asymptotic variance reduces to $[M(\theta)S(\theta)^{-1}M(\theta)']^{-1}$.

Assuming conditions for a ULLN apply, we can estimate terms using consistent estimators: $\hat S \equiv \frac{1}{n}\sum m(X_i,\hat\theta_n)m(X_i,\hat\theta_n)'$ and $\hat M \equiv \frac{1}{n}\sum\frac{\partial m(X_i,\hat\theta_n)}{\partial b}$.

Efficient GMM (Mahajan 3-26–7; Hayashi 212–3) Given the above GMM estimator, if $W = S(\theta)^{-1}$ (the inverse of the variance of the moment conditions), then the asymptotic variance reduces to $\tilde V = [M(\theta)S(\theta)^{-1}M(\theta)']^{-1}$. This is the lowest variance (in the matrix sense) that can be achieved. Therefore the optimal choice of weighting matrices is any sequence of random matrices that converges to $S^{-1}$. A natural choice is to develop preliminary estimates $\tilde\theta_n \to_p \theta$ (often using the identity as weighting matrix) and generating
\[ W_n = \Big[\frac{1}{n}\sum m(X_i,\tilde\theta_n)\,m(X_i,\tilde\theta_n)'\Big]^{-1}. \]
Under conditions necessary to implement the ULLN, $W_n \to_p S^{-1}$.

Minimum Distance estimator (Mahajan 2-23–4) Estimator $\hat\theta_n \equiv \operatorname{argmin}_{b\in\Theta}g_n(b)'S_ng_n(b)$ for some square weight matrix $S_n$. GMM is a special case where the $g_n$ are sample averages of some function.

Asymptotic normality for MD estimator (Mahajan 3-22–4) Let $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}-\frac{1}{2}[g_n(b)]'W_n[g_n(b)] \equiv \operatorname{argmax}_{b\in\Theta}Q_n(b)$. Define the Jacobian [transpose of the way we usually write derivatives?] $G_n(b) \equiv \frac{\partial g_n(b)}{\partial b}$. Suppose:

1. The matrix $G_n(\theta_n)W_nG_n(\bar b_n)$ is invertible (where $\bar b_n$ is a point between $\theta$ and $\hat\theta_n$ for which a mean value expansion holds);
2. $\sqrt{n}\,g_n(\theta) \to_d N(0, S(\theta))$ (by a CLT);
3. $G_n(b_n) \to_p G(\theta)$ (by a ULLN);
4. $W_n \to_p W$.

Then $\hat\theta_n$ is asymptotically normal with variance
\[ [G(\theta)WG(\theta)']^{-1}G(\theta)WS(\theta)WG(\theta)'[G(\theta)WG(\theta)']^{-1}. \]
If we choose $W = S(\theta)^{-1}$ (the efficient choice), the asymptotic variance reduces to $[G(\theta)S(\theta)^{-1}G(\theta)']^{-1}$.

*If $\{W_i\}$ is non-iid ergodic stationary, then $\Sigma$ is the long-run variance of $\{s(W_i,\theta)\}$; Gordin's conditions are sufficient for this convergence.

Uniformly minimum variance unbiased estimator (Mahajan 2-27–9; Metrics P.S. 6-1; C&B 7.3.7, 17, 19–20, 23, 7.5.1) An unbiased estimator $\phi(X)$ of a quantity $g(\theta)$ is a UMVUE (a.k.a. best unbiased estimator) iff $\phi$ has finite variance and for every unbiased estimator $\delta(X)$ of $g(\theta)$ we have $\operatorname{Var}\phi(X) \leq \operatorname{Var}\delta(X)$ for all $\theta$. Note:

1. Unbiased estimators may not exist;
2. Not every unbiased estimator is a UMVUE;
3. If a UMVUE exists, it is unique;
4. (Rao-Blackwell) If $h(X)$ is an unbiased estimator of $g(\theta)$, and $T(X)$ is a sufficient statistic for $\theta$, then $\phi(T) \equiv E[h(X)|T]$ is unbiased for $g(\theta)$ and has variance (weakly) lower than $h(X)$ for all $\theta$—this means we need only consider statistics that are functions of the data through sufficient statistics;
5. If $\phi(T)$ is an unbiased estimator of $g(\theta)$ and is a function of a complete statistic $T(X)$, then all other unbiased estimators that are functions of $T$ are equal to $\phi(T)$ almost everywhere;
6. (Lehmann-Scheffé) If $\phi(T)$ is a function (only) of a complete statistic $T(X)$, then $\phi(T)$ is the unique UMVUE of $E\,\phi(T)$;
7. (Hausman Principle) $W$ is a UMVUE for $E\,W$ iff it is uncorrelated with every unbiased estimator of 0 (practically impossible to prove, except for the case where $W$ is a function only of a complete statistic).

Fisher Information (Mahajan 2-30–1; Metrics P.S. 6-3)
\[ I(\theta) \equiv E_\theta\Big[\underbrace{\Big(\tfrac{\partial}{\partial\theta}\log f(x,\theta)\Big)}_{\text{score}}\Big(\tfrac{\partial}{\partial\theta}\log f(x,\theta)\Big)'\Big]. \]
Fisher information for an iid sample is $n$ times the information for each individual observation. For the (univariate) normal,
\[ I(\mu,\sigma^2) = \begin{bmatrix}\sigma^{-2} & 0 \\ 0 & \tfrac{1}{2}\sigma^{-4}\end{bmatrix}; \qquad I(\mu,\sigma^2)^{-1} = \begin{bmatrix}\sigma^{2} & 0 \\ 0 & 2\sigma^{4}\end{bmatrix}. \]

Cramér-Rao Inequality (Mahajan 2-30–1, 5; C&B 7.3.9–11, 15) Given a sample $X \sim f(x|\theta)$ and an estimator $\phi(X)$ with $E_\theta\,\phi(X) = g(\theta)$, suppose that

1. The support does not depend on $\theta$;
2. The pdf $f(x|\theta)$ is differentiable in $\theta$ almost everywhere;
3. $E_\theta|\phi| < \infty$ (or per C&B, $\operatorname{Var}_\theta\phi < \infty$);
4. The operations of differentiation and integration can be interchanged in $\frac{d}{d\theta}\int\phi(x)f(x,\theta)\,dx$;
5. The Fisher Information $I(\theta)$ is nonsingular. Note that under the previous conditions,
   (a) $I(\theta) = \operatorname{Var}_\theta[\frac{\partial}{\partial\theta}\log f(x,\theta)]$;
   (b) If $f(\cdot,\theta)$ is twice differentiable and double integration and differentiation under the integral sign can be interchanged, then (Fisher Information Equality):
   \[ I(\theta) = -E\Big[\frac{\partial^2}{\partial\theta\,\partial\theta'}\log f(x,\theta)\Big]. \]

Then
\[ \operatorname{Var}_\theta\phi(X) \geq \Big(\frac{dg(\theta)}{d\theta}\Big)'I(\theta)^{-1}\Big(\frac{dg(\theta)}{d\theta}\Big). \]
The bound is attained iff there exists a function $a(\theta)$ such that
\[ a(\theta)[\phi(x) - g(\theta)] = \frac{\partial}{\partial\theta}\log f(x|\theta). \]

1.11 Decision theory

Loss function (Mahajan 2-24–5) We observe $x\in\mathcal{X}$ with unknown distribution $P_\theta\in\mathcal{P}$. The loss function $l\colon \mathcal{P}\times A \to \mathbb{R}_+$ (where $A$ is the action space) gives the loss $l(P_b,a)$ that occurs if the statistician chooses action $a$ and the true distribution is $P_b$. Common examples when estimating a parameter $v(P)$ include:

1. Quadratic loss: $l(P,a) \equiv [v(P) - a]^2$;
2. Absolute value loss: $l(P,a) \equiv |v(P) - a|$;
3. 0-1 loss: the action space is $\{0,1\}$ and $l(P_\theta,a)$ is zero if $\theta\in\Theta_a$, one otherwise.

Decision rule (Mahajan 2-25) A mapping from the sample space $\mathcal{X}$ to the action space $A$.

Risk function (Mahajan 2-25) Expected loss (expectation is taken using the true distribution): $R(P_\theta,\phi) = E_\theta[l(P_\theta,\phi)]$.

Mean squared error risk function (Mahajan 2-25–6; Greene 109–11; C&B 7.3.1) If the loss function $l(\cdot,\cdot)$ is quadratic loss, the risk function is the MSE risk function: $R(P_\theta,\phi) \equiv E_\theta[\phi(X) - g(\theta)]^2$, where $g(\theta)$ is the parameter being estimated. Note $R(P_\theta,\phi) = \operatorname{Bias}_\theta[\phi(X)]^2 + \operatorname{Var}_\theta[\phi(X)]$.

Typically it is infeasible to minimize MSE, since it depends on the (presumably unknown) parameter $g(\theta)$. So we often use the UMVUE instead.

1.12 Hypothesis testing

Hypothesis testing (Mahajan 4-1–2) We observe data $x$ from distribution $P\in\mathcal{P}$ (usually $\{P_\theta : \theta\in\Theta\}$), and must decide whether $P\in\mathcal{P}_H\subseteq\mathcal{P}$. The null hypothesis $H$ is that $P\in\mathcal{P}_H\subseteq\mathcal{P}$, or equivalently $\theta\in\Theta_H$. If $\mathcal{P}_H$ is a singleton we call $H$ simple, otherwise composite (identically for $K$). The alternate hypothesis $K$ is $P\in\mathcal{P}_K\subseteq\mathcal{P}$ (or $\theta\in\Theta_K$), where $\mathcal{P}_H\cap\mathcal{P}_K = \emptyset$ and we assume the maintained hypothesis $P\in\mathcal{P}_H\cup\mathcal{P}_K$.

Test statistic (Mahajan 4-2) A test statistic (a.k.a. test function) is a decision rule $\delta\colon \mathcal{X}\to\{0,1\}$, where 1 corresponds to rejecting $H$ and 0 to accepting it. Typically we evaluate a test over this action space using the 0-1 loss function. Equivalently defined by a critical region in the sample space $C \equiv \{x\in\mathcal{X} : \delta(x) = 1\}$.

Type I/Type II error (Mahajan 4-2) Type I error: rejecting $H$ when in fact $\theta\in\Theta_H$. Type II error: accepting $H$ when $\theta\in\Theta_K$.

Power (Mahajan 4-2–3) $\beta_\delta(\theta) \equiv P_\theta(\delta(X) = 1)$, for $\theta\in\Theta_K$. The chance of (correctly) rejecting $H$ given that the true parameter is $\theta\in\Theta_K$. One minus the probability of Type II error at a given $\theta$.

Size (Mahajan 4-3; C&B 8.3.5) For $\theta\in\Theta_H$, the power function $\beta_\delta(\theta) \equiv P_\theta(\delta(X) = 1)$ gives the probability of a Type I error (rejecting $H$ incorrectly). Size is the largest probability of this occurring: $\sup_{\theta\in\Theta_H}\beta_\delta(\theta)$.

Level (Mahajan 4-3; C&B 8.3.6) A test is called level $\alpha$ for some $\alpha\in[0,1]$ if its size is at most $\alpha$.
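To make the size/power definitions concrete, a minimal Monte Carlo sketch (illustrative, not from the notes) of the power function $\beta_\delta(\mu)$ for a one-sided test of a normal mean with known $\sigma = 1$:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
n, alpha, reps = 25, 0.05, 4000
crit = norm.ppf(1 - alpha)            # reject H: mu <= 0 if sqrt(n)*xbar > crit

def power(mu):
    # Monte Carlo estimate of beta(mu) = P_mu(reject)
    xbar = rng.normal(mu, 1, size=(reps, n)).mean(axis=1)
    return (np.sqrt(n) * xbar > crit).mean()

for mu in [0.0, 0.2, 0.5, 1.0]:
    print(mu, power(mu))   # power(0) ~ size ~ 0.05, increasing in mu
```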

Distance function principle (Mahajan 4-7–8) When the hypothesis is framed in terms of a parameter $\theta$ for which we have a good estimate $\hat\theta$, it may be reasonable to reject if the distance between the estimate and $\Theta_H$ is large; i.e., reject if $T(x) \equiv \inf_{b\in\Theta_H}d(\hat\theta,b) > k$, where $d(\cdot,\cdot)$ is some distance function.

p-value (Mahajan 4-8–9; C&B 8.3.26) The smallest level of significance at which a researcher using a given test would reject $H$ on the basis of observed data $x$. The magnitude of the p-value can be interpreted as a measure of the strength of the evidence for $H$.

C&B offers a different conception: a test statistic with $p(x)\in[0,1]$ for all $x\in\mathcal{X}$, and $P_\theta(p(X)\leq\alpha)\leq\alpha$ for every $\theta\in\Theta_H$ and every $\alpha\in[0,1]$.

If we observe an iid sample from a normal with known variance and use $T(x) = \sqrt{n}\bar X/\sigma$, the p-value is $\Phi(-\sqrt{n}\bar X/\sigma)$.

Uniformly most powerful test (Mahajan 4-10, 19; C&B 8.3.11) Let $\mathcal{C}$ be a set of tests. A test $c^*\in\mathcal{C}$ with power function $\beta^*(\theta)$ is a UMP class $\mathcal{C}$ test of $H$ against $K\colon\theta\in\Theta_K$ iff for every $c\in\mathcal{C}$ and every $\theta\in\Theta_K$ we have $\beta^*(\theta)\geq\beta_c(\theta)$. That is, for every $\theta\in\Theta_K$, the test $c^*$ is the most powerful class $\mathcal{C}$ test. In many cases, UMP level $\alpha$ tests do not exist.

Simple likelihood ratio statistic (Mahajan 4-10) For simple null and alternate hypotheses, $L(x,\theta_H,\theta_K) \equiv p(x|\theta_K)/p(x|\theta_H)$ (where $p$ is either a pdf or pmf). By convention, $L(x,\theta_H,\theta_K) = 0$ when both numerator and denominator are zero.

Neyman-Pearson Lemma (Mahajan 4-12; C&B 8.3.12) Consider testing $H\colon\theta = \theta_0$ against $K\colon\theta = \theta_1$, where the pdf/pmf is $p(\cdot|\theta)$. Consider the (randomized) likelihood ratio test function:
\[ \phi_k(x) = \begin{cases} 1, & L(x,\theta_0,\theta_1) > k; \\ 0, & L(x,\theta_0,\theta_1) < k; \\ \gamma\in(0,1), & L(x,\theta_0,\theta_1) = k. \end{cases} \]

1. If $\phi_k$ is a size $\alpha$ ($>0$) test (i.e., $E_{\theta_0}[\phi_k(X)] = \alpha$), then $\phi_k$ is the most powerful test in the class of level $\alpha$ tests.
2. For each $\alpha\in[0,1]$ there exists an MP size $\alpha$ likelihood ratio test of the form $\phi_k$.
3. If a test $\tilde\phi$ is an MP level $\alpha$ test, then it must be a level $\alpha$ likelihood ratio test. That is, there exist $k$ and $\gamma$ such that for any $\theta\in\{\theta_0,\theta_1\}$ we have $P_\theta(\tilde\phi(X)\neq\phi_k(X)) = 0$.

Corollary: the power of an MP test is $\geq$ its size. Consider the test that rejects for all data with probability $\alpha$; it has power $\alpha$, so the MP test can't do worse.

Monotone likelihood ratio models (Mahajan 4-16–8; C&B 8.3.16) $\{P_\theta : \theta\in\Theta\}$, where $\Theta\subseteq\mathbb{R}$, is an MLR family if $L(x,\theta_1,\theta_2)$ is a monotone function of some function of the data $T(x)$ in the same direction for all pairs $\theta_2 > \theta_1$. If $L(x,\theta_1,\theta_2)$ is increasing (decreasing) in $T(x)$, the family is increasing (decreasing) MLR in $T(x)$.

In the one-parameter exponential family $f(x|\theta) = h(x)\exp[\eta(\theta)T(x) - B(\theta)]$,
\[ L(x,\theta_1,\theta_2) = \exp\big[(\eta(\theta_2)-\eta(\theta_1))T(x) - (B(\theta_2)-B(\theta_1))\big]; \]
and if $\eta(\theta)$ is strictly increasing in $\theta\in\Theta$, the family is MLR increasing in $T(x)$.

Suppose $\{P_\theta : \theta\in\Theta\}$, where $\Theta\subseteq\mathbb{R}$, is MLR increasing in $T(x)$. Let $\delta_t(\cdot)$ be the test that rejects if $T(x) > t$ (and possibly with some random probability if $T(x) = t$). Then:

1. The power function $\beta_{\delta_t}(\theta)$ is increasing in $\theta$ for all $\theta\in\Theta$ and all $t$;
2. If $E_{\theta_0}[\delta_t(X)] = \alpha > 0$, then $\delta_t(\cdot)$ is UMP level $\alpha$ for testing $H\colon\theta\leq\theta_0$ against $K\colon\theta>\theta_0$.

Unbiased test (Mahajan 4-20; C&B 8.3.9; Greene 157–8) A test $\phi$ for $H\colon\theta\in\Theta_H$ against $K\colon\theta\in\Theta_K$ is unbiased of level $\alpha$ iff

1. $\beta_\phi(\theta)\leq\alpha$ for all $\theta\in\Theta_H$ (i.e., level $\alpha$); and
2. $\beta_\phi(\theta)\geq\alpha$ for all $\theta\in\Theta_K$ (i.e., power $\geq\alpha$ everywhere).

If $\phi$ is biased, then there are some $\theta_H\in\Theta_H$ and $\theta_K\in\Theta_K$ such that $\beta_\phi(\theta_H) > \beta_\phi(\theta_K)$; i.e., we are more likely to reject under some true value than under some false value.

Uniformly most powerful unbiased test (Mahajan 4-20–3) A test $\phi^*$ for $H\colon\theta\in\Theta_H$ against $K\colon\theta\in\Theta_K$ is UMPU of level $\alpha$ iff it is unbiased at level $\alpha$ and is more powerful (for all $\theta\in\Theta_K$) than any other unbiased test of level $\alpha$.

The first condition is not strictly necessary, since the constant randomized level $\alpha$ test is unbiased. If $\phi$ is UMP level $\alpha$, it will also be UMPU level $\alpha$.

See Mahajan 4-23 for UMPU tests in one-parameter exponential families.

Likelihood ratio test (Mahajan 4-24–5, 34–6; Greene 159–67; C&B 10.3.1, 3; D&M 275) Test that rejects for large values of
\[ L(X) \equiv \frac{\sup_{\theta\in\Theta_K}p(x,\theta)}{\sup_{\theta\in\Theta_H}p(x,\theta)}. \]
Because distributional properties are often complex (and because $\Theta_H$ is often of smaller dimension than $\Theta = \Theta_H\cup\Theta_K$, which means the sup over $\Theta_K$ equals the sup over $\Theta$), we often use
\[ \lambda(x) \equiv \frac{\sup_{\theta\in\Theta}p(x,\theta)}{\sup_{\theta\in\Theta_H}p(x,\theta)} = \frac{p(x,\hat\theta)}{p(x,\hat\theta_C)}, \]
where $\hat\theta$ is the MLE and $\hat\theta_C$ is the constrained MLE. We will generally base our test on some monotonic function of $\lambda(x)$, often $2\log\lambda(x) = 2[\log p(x,\hat\theta) - \log p(x,\hat\theta_C)]$. Written more generally, we often have
\[ 2n[Q_n(\hat\theta) - Q_n(\hat\theta_C)] \to_d \chi^2_r, \]
where $r$ is the number of restrictions (and where here $Q_n(b) \equiv \frac{1}{n}\sum\log p(x_i,b)$).

Wald statistic (Hayashi 489–91; Greene 159–67; Mahajan 36–8; D&M 278) Consider an estimator $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}Q_n(W,b)$ satisfying:

1. $\sqrt{n}(\hat\theta - \theta_0)$ has Taylor expansion $\sqrt{n}(\hat\theta - \theta_0) = -\Psi^{-1}\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} + o_p$;
2. $\sqrt{n}\,\frac{\partial Q_n(\theta_0)}{\partial\theta} \to_d N(0,\Sigma)$ for some positive definite $\Sigma$;
3. $\sqrt{n}(\hat\theta - \theta_0)$ converges in distribution (to something);
4. $\Sigma = -\Psi$.

These conditions are satisfied for ML and efficient GMM under certain conditions. Under the null $H_0\colon a_{r\times 1}(\theta_0) = 0$, with $A(\theta) \equiv \frac{\partial a(\theta)}{\partial\theta'}$ and $A(\theta_0)$ of full row rank (no redundant restrictions),
\[ W \equiv n\,a(\hat\theta)'[A(\hat\theta)\hat\Sigma^{-1}A(\hat\theta)']^{-1}a(\hat\theta) \to_d \chi^2_r. \]
Caution: $\Sigma$ is the inverse of the asymptotic variance of the estimator.

Note the restricted estimator doesn't enter the calculation of the Wald statistic, so the size of the resulting tests may be limited. Also, the Wald statistic (unlike the LR or LM statistics) is not invariant to the restriction formulation.

Lagrange multiplier statistic (Hayashi 489, 491–93; Greene 159–67; Mahajan 38–9; D&M 275–8) Consider an estimator $\hat\theta_n \equiv \operatorname{argmax}_{b\in\Theta}Q_n(W,b) \in \mathbb{R}^p$ satisfying conditions 1–4 as for the Wald statistic above; these are satisfied for ML and efficient GMM under certain conditions. Under the null $H_0\colon a_{r\times 1}(\theta_0) = 0$, with $A(\theta) \equiv \frac{\partial a(\theta)}{\partial\theta'}$ and $A(\theta_0)$ of full row rank (no redundant restrictions), define the constrained estimator $\hat\theta_c \equiv \operatorname{argmax}_{b\in\Theta}Q_n(W,b)$ s.t. $a(b) = 0$. Then
\[ LM \equiv n\,\underbrace{\Big(\tfrac{\partial Q_n(\hat\theta_c)}{\partial\theta}\Big)'}_{1\times p}\underbrace{\tilde\Sigma^{-1}}_{p\times p}\underbrace{\Big(\tfrac{\partial Q_n(\hat\theta_c)}{\partial\theta}\Big)}_{p\times 1} \to_d \chi^2_r, \]
where $\tilde\Sigma$ is a consistent estimator under the null (i.e., constructed using $\hat\theta_c$).

Asymptotic significance (Mahajan 39) A test $\delta_n(\cdot)$ has asymptotic significance level $\alpha$ iff $\lim_{n\to\infty}P_P(\delta_n(X) = 1)\leq\alpha$ for all $P\in\mathcal{P}_0$.

Asymptotic size (Mahajan 39) The asymptotic size (a.k.a. limiting size) of a test $\delta_n(\cdot)$ is $\lim_{n\to\infty}\sup_{P\in\mathcal{P}_0}P_P(\delta_n(X) = 1)$.

Consistent test (Mahajan 39) A test $\delta_n(\cdot)$ is consistent iff $\lim_{n\to\infty}P_P(\delta_n(X) = 1) = 1$ for all $P\in\mathcal{P}_1$.
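For a scalar case where the Wald, LR, and LM statistics all have closed forms, take iid Poisson($\lambda$) data and $H_0\colon\lambda = \lambda_0$ (per-observation information $I(\lambda) = 1/\lambda$, MLE $\hat\lambda = \bar x$). A minimal sketch with simulated data; the formulas are the standard textbook ones, worked out here for illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
lam0, n = 2.0, 400
x = rng.poisson(lam=2.2, size=n)   # true lambda = 2.2, testing H0: lambda = 2.0
xbar = x.mean()                    # Poisson MLE

# LR: 2[l(xbar) - l(lam0)]; Wald: uses I at the MLE; LM: score and I at the null
LR   = 2 * n * (xbar * np.log(xbar / lam0) - (xbar - lam0))
Wald = n * (xbar - lam0)**2 / xbar
LM   = n * (xbar - lam0)**2 / lam0

print(LR, Wald, LM)  # all asymptotically chi-squared(1) under H0
```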

1.13 Time-series concepts

Stationarity (Hayashi 98–9; Hansen B1-11; D&M 132) $\{z_i\}$ is (strictly) stationary iff for any finite set of subscripts $i_1,\dots,i_r$, the joint distribution of $(z_i, z_{i_1}, z_{i_2},\dots,z_{i_r})$ depends only on $i_1 - i,\dots,i_r - i$ but not on $i$.

Any (measurable) transformation of a stationary process is stationary.

Covariance stationarity (Hayashi 99–100, 401; Hansen B1-11) $\{z_i\}$ is covariance- (or weakly) stationary iff:

1. $\mu \equiv E[z_i]$ does not depend on $i$, and
2. $\Gamma_j \equiv \operatorname{Cov}[z_i, z_{i-j}]$ (the $j$-th order autocovariance) exists, is finite, and depends only on $j$ but not on $i$.

A strictly stationary process is covariance-stationary as long as the variance and covariances are finite.

LLN for covariance-stationary processes with vanishing autocovariances:

1. If $\lim_{j\to\infty}\gamma_j = 0$, then $\bar y \xrightarrow{L^2} \mu$;
2. If $\sum_{j\in\mathbb{Z}}\gamma_j < \infty$, then $\lim_{n\to\infty}\operatorname{Var}(\sqrt{n}\,\bar y) = \sum_{j\in\mathbb{Z}}\gamma_j < \infty$ (called the "long-run variance").

White noise (Hayashi 101; Hansen B1-28) $\{\varepsilon_i\}$ is white noise iff:

1. It is covariance-stationary,
2. $E[\varepsilon_i] = 0$ for all $i$ (zero mean), and
3. $\Gamma_j \equiv \operatorname{Cov}[\varepsilon_i,\varepsilon_{i-j}] = 0$ for $j\neq 0$ (no serial correlation).

An iid sequence with mean zero and finite variance is an "independent" (i.e., iid) white noise process. Note: iid white noise $\implies$ stationary mds with finite variance $\implies$ white noise.

Ergodicity (Hayashi 101–2, 402–5; Hansen B1-12, 26; D&M 132–3) $\{z_i\}$ is ergodic iff it is stationary and for any two bounded functions $f\colon\mathbb{R}^k\to\mathbb{R}$ and $g\colon\mathbb{R}^l\to\mathbb{R}$,
\[ \lim_{n\to\infty}E[f(z_i,\dots,z_{i+k})\cdot g(z_{i+n},\dots,z_{i+n+l})] = E[f(z_i,\dots,z_{i+k})]\cdot E[g(z_{i+n},\dots,z_{i+n+l})]. \]
(Note the RHS needs no limit in $n$, since by stationarity it doesn't affect $g(\cdot)$.) Intuitively, ergodicity represents "asymptotic independence": two elements sufficiently far apart are nearly independent.

A sequence of the form $\{Z(z_i,\dots,z_{i+k})\}$ is also ergodic.

The Ergodic Theorem is the LLN for ergodic processes: if $\{z_i\}$ is ergodic stationary with $E[z_i] = \mu$, then $\bar z_n \equiv \frac{1}{n}\sum_i z_i \xrightarrow{as} \mu$.

Gordin's CLT for zero-mean* ergodic stationary processes: suppose $\{z_t\}$ is ergodic stationary and

1. $E[z_tz_t']$ exists and is finite;
2. $E[z_t|z_{t-j}, z_{t-j-1},\dots] \xrightarrow{L^2} 0$ as $j\to\infty$ (i.e., knowing about the distant past does you no good in estimating today);
3. $\sum_{j=0}^{\infty}\sqrt{E[r_{tj}'r_{tj}]} < \infty$, where $r_{tj} \equiv E[z_t|z_{t-j}, z_{t-j-1},\dots] - E[z_t|z_{t-j-1}, z_{t-j-2},\dots]$ are the "revisions" to conditional expectations as the information set increases.

Then:

1. $E[z_t] = 0$;
2. Autocovariances are absolutely summable (i.e., $\sum_{j\in\mathbb{Z}}\|\Gamma_j\| < \infty$);
3. $\bar z$ is asymptotically normal with long-run variance as in the LLN for covariance-stationary processes with vanishing autocovariances:
\[ \sqrt{n}\,\bar z \to_d N\Big(0, \sum_{j\in\mathbb{Z}}\Gamma_j\Big). \]

Martingale (Hayashi 102–3; Hansen B1-14; D&M 133) $\{x_i\}$ is a martingale with respect to $\{z_i\}$ (where $x_i\in z_i$) iff $E[x_i|z_{i-1}, z_{i-2},\dots,z_1] = x_{i-1}$ for $i\geq 2$.

Random walk (Hayashi 103; Hansen B1-14) An example of a martingale. Let $\{g_i\}$ be an iid white noise process. The sequence of cumulative sums $\{z_i\}$, where $z_i = g_1 + \cdots + g_i$, is a random walk.

Martingale difference sequence (Hayashi 104; D&M 134; Hansen B1-14) $\{g_i\}$ is an mds iff $E[g_i|g_{i-1}, g_{i-2},\dots,g_1] = 0$ for $i\geq 2$. The cumulative sum of an mds is a martingale; conversely, the first differences of a martingale are an mds. An mds has no serial correlation (i.e., $\operatorname{Cov}(g_i,g_j) = 0$ for all $i\neq j$).

Ergodic stationary martingale differences CLT (Hayashi 106–7; Hansen B1-15–6)† Given a stationary ergodic mds $\{g_i\}$ with $E[g_ig_i'] = \Sigma$,
\[ \sqrt{n}\,\bar g \equiv \frac{1}{\sqrt{n}}\sum_{i=1}^{n}g_i \to_d N(0,\Sigma). \]

Lag operator (Hayashi 369–74; Hansen B1-31–3) $L^jx_t \equiv x_{t-j}$. Filter $\alpha(L) \equiv \alpha_0 + \alpha_1L + \alpha_2L^2 + \cdots$. A filter applied to a constant is $\alpha(L)c = \alpha(1)c = c\sum_{j=0}^{\infty}\alpha_j$.

1. If the coefficients are absolutely summable ($\sum|\alpha_j| < \infty$) and $\{x_t\}$ is covariance-stationary, then $y_t \equiv \alpha(L)x_t$ converges in $L^2$ and is covariance-stationary; if the autocovariances of $\{x_t\}$ are absolutely summable, then so are the autocovariances of $\{y_t\}$.
2. As long as $\alpha_0 \neq 0$, the inverse $\alpha(L)^{-1}$ is well defined.
3. If $\alpha(z) = 0 \implies |z| > 1$ (the "stability condition"), then the coefficients of $\alpha(L)^{-1}$ are absolutely summable.

Moving average process (Hayashi 366–8, 402; Hansen B1-28–9, 36–7) $y_t = \mu + \sum_{j=0}^{\infty}\psi_j\varepsilon_{t-j}$, with $\{\varepsilon_t\}$ white noise and $\operatorname{Var}\varepsilon_t = \sigma_\varepsilon^2$. Equivalently, $(y_t - \mu) = \psi(L)\varepsilon_t$. Given absolute summability $\sum_{j\in\mathbb{Z}}|\psi_j| < \infty$, a sufficient condition for convergence (in $L^2$):

1. $E[y_t] = \mu$;
2. Autocovariances $\gamma_j = \sigma_\varepsilon^2\sum_{k=0}^{\infty}\psi_{j+k}\psi_k$ (if we have an MA(q) process, $\gamma_j = (\psi_j\psi_0 + \psi_{j+1}\psi_1 + \cdots + \psi_q\psi_{q-j})\sigma_\varepsilon^2$ for $|j|\leq q$);
3. Autocovariances $\gamma_j$ are absolutely summable (i.e., $\sum_{j\in\mathbb{Z}}|\gamma_j| < \infty$);
4. Long-run variance $\sum_{j\in\mathbb{Z}}\gamma_j = \sigma_\varepsilon^2[\psi(1)]^2$;
5. If $\{\varepsilon_t\}$ is iid white noise, $\{y_t\}$ is ergodic stationary.

CLT for MA($\infty$): if $\{\varepsilon_t\}$ is iid white noise, and absolute summability $\sum_{j\in\mathbb{Z}}|\psi_j| < \infty$ holds, then
\[ \sqrt{n}[\bar y - \mu] \to_d N\Big(0, \sum_{j\in\mathbb{Z}}\gamma_j\Big) = N\big(0, \sigma_\varepsilon^2[\psi(1)]^2\big). \]

Autoregressive process of degree one (Hayashi 376–8, 385) $y_t = c + \phi y_{t-1} + \varepsilon_t$, with $\{\varepsilon_t\}$ white noise and $\operatorname{Var}\varepsilon_t = \sigma_\varepsilon^2$. Equivalently, $(1 - \phi L)y_t = c + \varepsilon_t$, or $(1 - \phi L)(y_t - \mu) = \varepsilon_t$ where $\mu \equiv c/(1-\phi)$.

1. If $|\phi| < 1$, then the unique covariance-stationary solution is the MA($\infty$) $y_t = \mu + \sum_{j=0}^{\infty}\phi^j\varepsilon_{t-j}$; it has absolutely summable coefficients.
   • $E\,y_t = \mu$,
   • $\gamma_j = \phi^j\sigma_\varepsilon^2/(1-\phi^2)$,
   • Autocovariances $\gamma_j$ are absolutely summable,
   • $\rho_j \equiv \gamma_j/\gamma_0 = \phi^j$,
   • Long-run variance $\sum_{j\in\mathbb{Z}}\gamma_j = \sigma_\varepsilon^2/(1-\phi)^2$.
2. If $|\phi| > 1$, then the unique covariance-stationary solution is an MA($\infty$) of future values of $\varepsilon$.
3. If $|\phi| = 1$ ("unit root"), there is no covariance-stationary solution.

Autoregressive process of degree p (Hayashi 378–80, 385)
\[ y_t = c + \phi_1y_{t-1} + \cdots + \phi_py_{t-p} + \varepsilon_t, \]
with $\{\varepsilon_t\}$ white noise and $\operatorname{Var}\varepsilon_t = \sigma_\varepsilon^2$. Equivalently, $\phi(L)y_t = c + \varepsilon_t$ where $\phi(L) \equiv 1 - \phi_1L - \cdots - \phi_pL^p$ (note the minus signs). Equivalently, $\phi(L)(y_t - \mu) = \varepsilon_t$ where $\mu \equiv c/\phi(1) = c/(1 - \sum\phi_j)$.

Assuming stability (i.e., $\phi(z) = 0 \implies |z| > 1$),

1. The unique covariance-stationary solution is the MA($\infty$) $y_t = \mu + \phi(L)^{-1}\varepsilon_t$; it has absolutely summable coefficients;
2. $E\,y_t = \mu$;
3. Autocovariances $\gamma_j$ are absolutely summable;
4. Long-run variance $\sum_{j\in\mathbb{Z}}\gamma_j = \sigma_\varepsilon^2/[\phi(1)]^2$.

*Hansen B1-26 gives a slightly different statement for nonzero-mean ergodic stationary processes.
†This is actually a corollary of Billingsley's stronger CLT (which does not require ergodic stationarity) stated in Hansen B1-16.
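The AR(1) autocovariance formula $\gamma_j = \phi^j\sigma_\varepsilon^2/(1-\phi^2)$ above is easy to verify by simulation; a minimal sketch with simulated Gaussian shocks:

```python
import numpy as np

rng = np.random.default_rng(5)
phi, sigma, n = 0.7, 1.0, 200_000

# Simulate a stationary AR(1): y_t = phi*y_{t-1} + eps_t
eps = rng.normal(0, sigma, n)
y = np.empty(n)
y[0] = eps[0] / np.sqrt(1 - phi**2)   # draw y_0 from the stationary distribution
for t in range(1, n):
    y[t] = phi * y[t - 1] + eps[t]

# Compare sample autocovariances with gamma_j = phi^j * sigma^2 / (1 - phi^2)
for j in range(4):
    sample = np.cov(y[j:], y[:n - j])[0, 1] if j else y.var()
    print(j, sample, phi**j * sigma**2 / (1 - phi**2))
```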

ARMA process of degree (p, q) (Hayashi 380–3, 385; Hansen B1-30, 43–7) Autoregressive/moving average process with $p$ autoregressive lags and $q$ moving average lags:
\[ y_t = c + \phi_1y_{t-1} + \cdots + \phi_py_{t-p} + \theta_0\varepsilon_t + \cdots + \theta_q\varepsilon_{t-q}, \]
with $\{\varepsilon_t\}$ white noise and $\operatorname{Var}\varepsilon_t = \sigma_\varepsilon^2$. Equivalently,
\[ \phi(L)y_t = c + \theta(L)\varepsilon_t, \]
where $\phi(L) \equiv 1 - \phi_1L - \cdots - \phi_pL^p$ (note the minus signs) and $\theta(L) \equiv \theta_0 + \theta_1L + \cdots + \theta_qL^q$. Equivalently, $\phi(L)(y_t - \mu) = \theta(L)\varepsilon_t$, where $\mu \equiv c/\phi(1) = c/(1 - \sum_j\phi_j)$. Note if $\phi(z)$ and $\theta(z)$ have a common root, we can factor it out and get an ARMA($p-1$, $q-1$).

Assuming stability (i.e., $\phi(z) = 0 \implies |z| > 1$),

1. The unique covariance-stationary solution is the MA($\infty$) $y_t = \mu + \phi(L)^{-1}\theta(L)\varepsilon_t$; it has absolutely summable coefficients;
2. $E\,y_t = \mu$;
3. Autocovariances $\gamma_j$ are absolutely summable.

If $\theta(z) = 0 \implies |z| > 1$ (here called the "invertibility condition"), we can write the process as an AR($\infty$): $\theta(L)^{-1}\phi(L)y_t = (c/\theta(1)) + \varepsilon_t$.

For ARMA(1,1) $y_t = c + \phi y_{t-1} + \varepsilon_t - \theta\varepsilon_{t-1}$, the long-run variance is
\[ \sum_{j\in\mathbb{Z}}\gamma_j = \sigma^2\Big(\frac{1-\theta}{1-\phi}\Big)^2. \]
For ARMA(p,q) $y_t = c + [\sum_i\phi_iy_{t-i}] + \varepsilon_t - [\sum_j\theta_j\varepsilon_{t-j}]$, the long-run variance is
\[ \sum_{j\in\mathbb{Z}}\gamma_j = \sigma_\varepsilon^2\frac{[\theta(1)]^2}{[\phi(1)]^2} = \sigma_\varepsilon^2\Big(\frac{1-\sum_j\theta_j}{1-\sum_i\phi_i}\Big)^2. \]

Autoregressive conditional heteroscedastic process (Hayashi 104–5; Hansen B1-38) ARCH(q) is a martingale difference sequence $\{g_i\}$ with conditional variance $\operatorname{Var}[g_i|g_{i-1},\dots,g_1] = \zeta + \alpha_1g_{i-1}^2 + \cdots + \alpha_qg_{i-q}^2$. For example, ARCH(1) is:
\[ g_i = \varepsilon_i\sqrt{\zeta + \alpha g_{i-1}^2}, \]
with $\{\varepsilon_i\}$ iid, $E\,\varepsilon_i = 0$, and $\operatorname{Var}\varepsilon_i = 1$.

1. $E[g_i|g_{i-1},\dots,g_1] = 0$;
2. $\operatorname{Var}[g_i|g_{i-1},\dots,g_1] = E[g_i^2|g_{i-1},\dots,g_1] = \zeta + \alpha g_{i-1}^2$, which depends on the history of the process (own conditional heteroscedasticity);
3. Strictly stationary and ergodic if $|\alpha| < 1$ and $g_1$ is drawn from an appropriate distribution (or the process is started in the infinite past);
4. If the process is stationary, $E[g_i^2] = \zeta/(1-\alpha)$.

GARCH process (Hansen B1-38–9) GARCH(p,q) is a sequence $\{r_i\}$ with conditional variance given by
\[ \operatorname{Var}[r_i|r_{i-1},\dots,r_1] \equiv \sigma_i^2 = \omega + \alpha_1r_{i-1}^2 + \cdots + \alpha_qr_{i-q}^2 + \beta_1\sigma_{i-1}^2 + \cdots + \beta_p\sigma_{i-p}^2. \]
For $\{r_i\}$ a GARCH(1,1) process, define $v_i \equiv r_i^2 - \sigma_i^2$. Then we can write $\{r_i^2\}$ as an ARMA(1,1) and $\{\sigma_i^2\}$ as an AR(1), with shocks $\{v_i\}$:
\[ r_i^2 = \omega + (\alpha_1+\beta_1)r_{i-1}^2 + v_i - \beta_1v_{i-1}, \]
\[ \sigma_i^2 = \omega + (\alpha_1+\beta_1)\sigma_{i-1}^2 + \alpha_1v_{i-1}. \]
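A minimal GARCH(1,1) simulation sketch (simulated Gaussian innovations); per the AR(1) representation of $\sigma_i^2$ above, the implied unconditional variance is $\omega/(1-\alpha_1-\beta_1)$, assuming $\alpha_1+\beta_1 < 1$:

```python
import numpy as np

rng = np.random.default_rng(6)
omega, alpha1, beta1, n = 0.1, 0.1, 0.85, 100_000

# Simulate GARCH(1,1): sigma2_i = omega + alpha1*r_{i-1}^2 + beta1*sigma2_{i-1}
r = np.empty(n)
sigma2 = np.empty(n)
sigma2[0] = omega / (1 - alpha1 - beta1)     # start at the unconditional variance
r[0] = np.sqrt(sigma2[0]) * rng.normal()
for i in range(1, n):
    sigma2[i] = omega + alpha1 * r[i-1]**2 + beta1 * sigma2[i-1]
    r[i] = np.sqrt(sigma2[i]) * rng.normal()

print(r.var(), omega / (1 - alpha1 - beta1))  # sample vs. implied E[r^2]
```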
Local level model (Hansen B1-41–2) $X_t$ is the "true" latent variable, $Y_t$ is the observed value:
\[ Y_t = X_t + \eta_t, \qquad X_t = X_{t-1} + \varepsilon_t, \]
with $\{\eta_t\}$ (measurement error) and $\{\varepsilon_t\}$ independent iid shocks. $Y_t$ is an ARMA(1,1) with $Y_t = Y_{t-1} + \varepsilon_t + \eta_t - \eta_{t-1}$.*

Estimating number of lags (Greene 834–5) Calculate sample autocorrelations and partial autocorrelations using OLS:

• acf: For $\rho_k$, regress $y_t$ on $y_{t-k}$ (and potentially a constant);
• pacf: For $\rho_k^*$, regress $y_t$ on $y_{t-1},\dots,y_{t-k}$ (and potentially a constant), and use the coefficient on $y_{t-k}$.

Typically, an AR(p) will have an acf declining monotonically, and a pacf irregular out to lag $p$, after which it will abruptly drop to zero and remain there. Typically, an MA(q) will have a pacf declining monotonically, and an acf irregular out to lag $q$, after which it will abruptly drop to zero and remain there. An ARMA(p,q) is a mixture of the above.

Estimating AR(p) (Hayashi 392–4, 547–9) Suppose $\{y_t\}$ with $y_t = c + [\sum_{j=1}^{p}\phi_jy_{t-j}] + \varepsilon_t$, and $\{\varepsilon_t\}$ iid white noise (note we need iid for ergodicity and stationarity). Then we can use OLS to regress $y_t$ on $(1, y_{t-1},\dots,y_{t-p})'$, since the model satisfies linearity, stationary ergodicity, independence of regressors from errors, conditional homoscedasticity, mds and finite second moments of $g$, and the rank condition. We have consistent, asymptotically normal estimation of $\phi$, as well as consistent variance estimation.

Under Gaussian AR(1), conditional ML for $\phi$ is numerically equal to the OLS estimate, and conditional ML $\hat\sigma^2_{CML} = \frac{SSR}{n}$.

Maximum likelihood with serial correlation (Hayashi 543–47) Note that treating a serially correlated process as if it were iid may give consistent ("quasi-") ML estimators, since it is an M-estimator no matter the data's distribution.

1. Exact ML: use the fact that
\[ f(y_n,\dots,y_0) = \frac{f(y_n,\dots,y_0)}{f(y_{n-1},\dots,y_0)}\cdot\frac{f(y_{n-1},\dots,y_0)}{f(y_{n-2},\dots,y_0)}\cdots f(y_0) = \Big[\prod_{t=1}^{n}f(y_t|y_{t-1},\dots,y_0)\Big]f(y_0). \]
The log-likelihood is generally nonlinear due to the last term.

2. Conditional ML: use the (linear)
\[ \log f(y_n,\dots,y_1|y_0) = \sum_{t=1}^{n}\log f(y_t|y_{t-1},\dots,y_0). \]

∗ It is not clear to me what the white noise sequence and MA coefficients are that generate this sequence, but since εt + ηt − ηt−1 is covariance stationary with two nonzero autocovariances, we should be able to fit one.

1.14 Ordinary least squares

Ordinary least squares model (Hayashi 4–12, 34, 109–10, 123, 126; Mahajan 6-4–6, 12–3) Let
\[ X_i = \begin{bmatrix}x_{i1}\\ \vdots\\ x_{iK}\end{bmatrix}_{K\times 1}, \quad \beta = \begin{bmatrix}\beta_1\\ \vdots\\ \beta_K\end{bmatrix}_{K\times 1}, \quad Y = \begin{bmatrix}y_1\\ \vdots\\ y_n\end{bmatrix}_{n\times 1}, \quad \varepsilon = \begin{bmatrix}\varepsilon_1\\ \vdots\\ \varepsilon_n\end{bmatrix}_{n\times 1}, \quad X = \begin{bmatrix}X_1'\\ \vdots\\ X_n'\end{bmatrix}_{n\times K}. \]
The model may assume:

1. Linearity (p. 4): $Y = X\beta + \varepsilon$;
2. Strict exogeneity (p. 7–9): $E[\varepsilon|X] = 0$ (n.b., conditional on regressors for all observations); implies:
   • $E[\varepsilon] = 0$,
   • $E[x_{jk}\varepsilon_i] = 0$ for all $i = 1,\dots,n$, $j = 1,\dots,n$, $k = 1,\dots,K$,
   • $\operatorname{Cov}(\varepsilon_i, x_{jk}) = 0$ for all $i = 1,\dots,n$, $j = 1,\dots,n$, $k = 1,\dots,K$;
3. No multicollinearity (p. 10): $\operatorname{rank}(X) = K$ with probability 1 (full column rank);
4. Spherical error variance (p. 10–2): $E[\varepsilon\varepsilon'|X] = \sigma^2I_n$, or equivalently,
   • Conditional homoscedasticity: $E[\varepsilon_i^2|X] = \sigma^2$ for all $i = 1,\dots,n$,
   • No correlation between observations: $E[\varepsilon_i\varepsilon_j|X] = 0$ for $i\neq j$;
5. Normal errors (p. 34): $\varepsilon|X \sim N(0,\sigma^2I_n)$, which given the above assumptions implies $\varepsilon\perp X$ and $\varepsilon\sim N(0,\sigma^2I_n)$;
6. Ergodic stationarity (p. 109): $\{y_i, X_i\}$ is jointly stationary and ergodic (implies unconditional homoscedasticity, but allows conditional heteroscedasticity);
7. Predetermined regressors (p. 109–10): all regressors are orthogonal to the contemporaneous error term: $E[x_{ik}\varepsilon_i] = 0$ for all $i = 1,\dots,n$, $k = 1,\dots,K$; also written $E[g_i] = 0$, where $g_i \equiv X_i\varepsilon_i$;
8. Rank condition (p. 110): the $K\times K$ matrix $\Sigma_{xx} \equiv E[X_iX_i']$ is nonsingular (and hence finite);
9. $g_i$ is an mds with finite second moments (p. 110): $\{g_i\} \equiv \{X_i\varepsilon_i\}$ is an mds (which implies assumption 7); the $K\times K$ matrix $S \equiv E[g_ig_i']$ is nonsingular;
10. Finite fourth moments for regressors (p. 123): $E[(x_{ik}x_{ij})^2]$ exists and is finite for all $k, j = 1,\dots,K$;
11. Conditional homoscedasticity (p. 126): $E[\varepsilon_i^2|x_i] = \sigma^2 > 0$ (implies $S = \sigma^2\Sigma_{xx}$).

Ordinary least squares estimators (Hayashi 16–8, 19, 27–8, 31, 51–2; Mahajan 6-6–11) The OLS estimator for $\beta$ is $b \equiv \operatorname{argmin}_{\tilde\beta}SSR(\tilde\beta) \equiv \operatorname{argmin}_{\tilde\beta}(Y - X\tilde\beta)'(Y - X\tilde\beta)$. FOCs give the "normal equations" (a.k.a. "moment equations") $X'Xb = X'Y \iff X'e = 0$, where $e$ is the OLS residual $e \equiv Y - Xb$. By no multicollinearity, $X'X$ is nonsingular (hence positive definite and satisfying the SOC), so
\[ b = (X'X)^{-1}X'Y = S_{XX}^{-1}s_{XY}; \]
$b$ minimizes the normalized distance function $\frac{1}{2n\hat\sigma^2}(Y - X\tilde\beta)'(Y - X\tilde\beta) = \frac{1}{2n\hat\sigma^2}SSR$.

1. Unbiased: $E[b|X] = \beta$ (under assumptions 1–3).
2. Gauss-Markov: $b$ is the most efficient among linear unbiased estimators,* i.e., the best linear (in $Y$) unbiased estimator (assumptions 1–4).
3. $\operatorname{Cov}(b,e|X) = 0$, where $e \equiv Y - Xb$ (assumptions 1–4).
4. $\operatorname{Var}[b|X] = \sigma^2(X'X)^{-1}$ (assumptions 1–4); therefore a common estimator is $\widehat{\operatorname{Var}}(b|X) \equiv s^2(X'X)^{-1}$. Under the normal error assumption, the CRLB is
\[ I^{-1} = \begin{bmatrix}\sigma^2(X'X)^{-1} & 0\\ 0 & 2\sigma^4/n\end{bmatrix}, \]
so $b$ achieves the CRLB and is UMVUE—this result is stronger than Gauss-Markov, but requires normal errors.

The OLS estimator for $\sigma^2$ is
\[ s^2 \equiv \frac{SSR}{n-K} = \frac{e'e}{n-K}. \]
Unbiased (assumptions 1–4). Under the normal error assumption, $\operatorname{Var}(s^2|X) = 2\sigma^4/(n-K)$, which does not meet the CRLB but is UMVUE.

Maximum likelihood estimator for OLS model (Hayashi 49; Mahajan 6-18–9) Assuming normal errors, the OLS estimate for $\beta$ is also the ML estimate. $\hat\beta_{ML} = b$;
\[ \hat\sigma^2_{ML} = \tfrac{1}{n}e'e = \tfrac{1}{n}SSR = \tfrac{n-K}{n}s^2. \]

Asymptotic properties of OLS estimator (Hayashi 113, 115) The OLS estimator $b$ satisfies:

1. Consistent: $b \to_p \beta$ (under assumptions 1, 6–8).
2. Asymptotically normal: $\sqrt{n}(b - \beta) \to_d N(0,\operatorname{Avar}(b))$, where $\operatorname{Avar}(b) = \Sigma_{xx}^{-1}S\Sigma_{xx}^{-1}$ (assumptions 1, 6, 8–9).
3. Consistent variance estimation: assuming existence of a consistent estimator $\hat S$ for $S \equiv E[g_ig_i']$, the estimator $\widehat{\operatorname{Avar}}(b) \equiv S_{xx}^{-1}\hat SS_{xx}^{-1}$ is consistent for $\operatorname{Avar}(b)$ (assumption 6).

The OLS error variance estimator $s^2$ (a.k.a. variance of residuals) is consistent for $E[\varepsilon_i^2]$, assuming the expectation exists and is finite (assumptions 1, 6–8).

Estimating S (Hayashi 123–4, 127–8) If $\hat\beta$ (usually $b$) is consistent, then $\hat\varepsilon_i \equiv y_i - x_i'\hat\beta$ is consistent. Then under assumptions 1, 6, and 10, a consistent estimator for $S$ is
\[ \hat S \equiv \frac{1}{n}\sum_{i=1}^{n}\hat\varepsilon_i^2\,x_ix_i'. \]
Under assumptions 1, 6–8, and 11 (note with conditional homoscedasticity we no longer require finite fourth moments), we can use $\hat S \equiv s^2S_{xx}$, where $s^2$ is the (consistent) OLS estimate of $\sigma^2$.

Biases affecting OLS (Hayashi 188–9, 194–7) OLS estimation is inconsistent due to, e.g.:

1. Simultaneity (p. 188–9);
2. Errors-in-variables (e.g., classical measurement error) (p. 194–5);
3. Unobserved variables (p. 196–7).

Best linear predictor (Mahajan 6-2–3; Hayashi 139–40) Population analogue of OLS. Suppose $(y,x)$ is a r.v. with $y\in\mathbb{R}$ and $x\in\mathbb{R}^k$. The best linear predictor of $y$ given $x$ is
\[ \hat E^*(y|x) \equiv L(y|x) = x'E[xx']^{-1}E[xy]. \]
If one regressor in $x \equiv [1\;\tilde x']'$ is constant, then $\hat E^*(y|x) \equiv \hat E^*(y|1,\tilde x) = \mu + \gamma'\tilde x$, where $\gamma = \operatorname{Var}[\tilde x]^{-1}\operatorname{Cov}[\tilde x,y]$ and $\mu = E[y] - \gamma'E[\tilde x]$.

Fitted value (Hayashi 18) $\hat Y \equiv Xb = PY$. Thus OLS residuals $e = Y - \hat Y$.

Projection matrix (Hayashi 18–9; Mahajan 6-9–10; D&M 292–3) $P \equiv X(X'X)^{-1}X'$. Projects onto the space spanned by the columns of $X$ ($PX = X$). Fitted values $\hat Y \equiv Xb = PY$. Idempotent and symmetric.

An oblique projection matrix $P^\Omega \equiv X(X'\Omega^{-1}X)^{-1}X'\Omega^{-1}$ (and its associated annihilator) are idempotent, but not symmetric.

Annihilator matrix (Hayashi 18–9, 30–1; Mahajan 6-9–10) $M \equiv I - P$. Residuals $e = Y - \hat Y = MY = M\varepsilon$, and thus are orthogonal to $X$ (since $MX = 0$). Sum of squared residuals $SSR \equiv e'e = Y'MY = Y'e = e'Y = \varepsilon'M\varepsilon$. $M$ is idempotent and symmetric. $\operatorname{rank}(M) = \operatorname{tr}(M) = n - K$.

Standard error of the regression (Hayashi 19) $SER \equiv \sqrt{s^2}$ [a.k.a. standard error of the equation].

Sampling error (Hayashi 19, 35) $b - \beta = (X'X)^{-1}X'\varepsilon$. Under the normal error assumption, $b - \beta|X \sim N(0,\sigma^2(X'X)^{-1})$.

$R^2$ (Hayashi 20–1, 67; Mahajan 6-11–2; Greene 250–3) In population, the fraction of variation in $y$ attributable to variation in $x$ is
\[ 1 - \frac{\operatorname{Var}\varepsilon}{\operatorname{Var}y} = \frac{\operatorname{Var}(x'\beta)}{\operatorname{Var}y} = (\rho_{y,x'\beta})^2. \]
The ("uncentered") sample analogue is $\hat Y'\hat Y/Y'Y = 1 - e'e/Y'Y$.

Centered (should be used if the regressors include a constant):
\[ \frac{\sum(\hat y_i - \bar y)^2}{\sum(y_i - \bar y)^2} = 1 - \frac{e'e}{\sum(y_i - \bar y)^2}. \]
Be careful when comparing equations on the basis of fit (i.e., $R^2$): the equations must share the same dependent variable to get a meaningful comparison.

Standard error (Hayashi 35–7)
\[ SE(b_k) \equiv \sqrt{s^2[(X'X)^{-1}]_{kk}} = \sqrt{[\widehat{\operatorname{Var}}(b|X)]_{kk}}. \]

Robust standard error (Hayashi 117, 211–2)
\[ SE^*(b_k) \equiv \sqrt{\tfrac{1}{n}\widehat{\operatorname{Avar}}(b_k)} \equiv \sqrt{\tfrac{1}{n}\big[S_{xx}^{-1}\hat SS_{xx}^{-1}\big]_{kk}}. \]

18 2 2 If σ is unknown, use s to get t-ratio: CX,ε ˜ ≡ Cε, we get transformed modely ˜ = Xβ˜ +ε ˜ which Short and long regressions (Mahajan 6-33–7) Suppose we have satisfies assumptions of classical linear regression model. best linear predictors ¯ ¯ bk − βk bk − βk Thus we have the unbiased, efficient estimator: tk ≡ p ≡ ∼ tn−K ∗ 0 0 s2[(X0X)−1] SE(b ) Eb (y|x1, x2) = x1β1 + x2β2, kk k ˆ ˜ 0 −1 ˜ βGLS ≡ argmin(y − Xβ) V (y − Xβ) ∗ 0 ˜ Eb (y|x1) = x δ. under null. Accept for tk close to zero—t distribution sym- β 1 metric about zero. 0 −1 −1 0 −1 ≡ (X V X) X V y Then β1 = δ iff either E[x1x2] = 0 or β2 = 0 since ¯ Level-α confidence interval for βk is bk ±SE(bk)·tα/2(n−K). ˆ 2 0 −1 −1 Var(βGLS|X) ≡ σ (X V X) . 0 −1 0  δ = β1 + E(x1x ) E(x1x )β2 . ¯ 1 2 OLS robust t test (Hayashi 117–9) Test for H : βk = βk. Under This can also be seen as an IV estimator with instruments null, we have V −1X. Note this result is the population analogue of the Frisch- Waugh result for b . √ ¯ ¯ 1 n(bk − βk) bk − βk d Note consistency and other attractive properties rely on tk ≡ q ≡ ∗ −→ N(0, 1). strict exogeneity of regressors; if they are merely predeter- SE (bk) Avar(\bk) mined, the estimator need not be consistent (unlike OLS, 1.15 Linear Generalized Method of Mo- which is). ments OLS F test (Hayashi 39–44, 53; Mahajan 6-16–7, 20–5) Assuming nor- Feasible generalized least squares (Hayashi 59, 133-7, 415–7; Ma- mal errors. Test for H : R#r×K βK×1 = r#r×1, where This material is also covered (with slightly different hajan 6-25–9; D&M 298–301, 295) In practice, the covariance ma- rank(R) = #r (i.e., R has full row rank—there are no re- notation—notably the exchange of x with z—in Mahajan trix σ2V is generally not known (even up to a constant). dundant restrictions). Accept if F ∼ F#r,n−K (under null) 7. is small, where: Feasible GLS uses an estimate Vb , and “under reasonable conditions, yields estimates that are not only consistent but Linear GMM model (Hayashi 198–203, 212, 225–6, 406–7) The model also asymptotically equivalent to genuine GLS estimates, and may assume: (Rb − r)0[R(X0X)−1R0]−1(Rb − r)/#r therefore share their efficiency properties. However, even F ≡ 0 L s2 when this is the case, the performance in finite samples of 1. Linearity (p. 198): yi = ziδ + εi with zi, δ ∈ R ; 0 0 −1 FGLS may be much inferior to that of genuine GLS.” K = (Rb − r) [RVar(\b|X)R ] (Rb − r)/#r 2. Ergodic stationarity (p. 198): xi ∈ R instru- Caution that FGLS is that consistency is not guaranteed if ments; unique, non-constant elements of {(y , z , x )} (SSR − SSR )/#r n − K i i i = R U = (λ2/n − 1) the regressors are merely predetermined (rather than strictly are jointly ergodic stationary (stronger than individu- SSRU /(n − k) #r exogenous). ally ergodic stationary); −n/2 3. Orthogonality (p. 198): All K elements of x predeter- (and λ ≡ LU /LR = (SSRU / SSRR) the likelihood ra- Weighted least squares (Hayashi 58–9; Mahajan 6-27–8) GLS where i tio). The first and second expressions are based on Wald there is no correlation in the error term between observations mined (i.e., orthogonal to current error term): E[gi] ≡ 0 principle; the third on LR. (i.e., the matrix V is diagonal). GLS divides each element E[xiεi] = E[xi(yi − ziδ)] = 0; F ratio is square of relevant t ratio. F is preferred to multiple of X and Y by the standard deviation of the relevant error 4. Rank condition for identification (p. 200): Σxz ≡ term, and then runs OLS. 
That is, observations with a higher 0 t tests—gives an eliptical rather than rectangular confidence E[xizi]K×L has full column rank (i.e., rank L); conditional variance get a lower weight. region. • K > L: Overidentified/determined (use GMM es- Can also think of as timator), OLS robust Wald statistic (Hayashi 118, 122) Test for ˆ X 1 0 ˜ 2 • K = L: Just/exactly identified/determined (GMM H : R#r×K βK×1 = r#r×1, where rank(R) = #r (i.e., β = argmin 2 (yi − xiβ) . σi estimator reduces to IV estimator), R has full row rank—there are no redundant restrictions). β˜ i Under null, • K < L: Not/under identified/determined; Thus the “weights” typically passed to OLS software are the 5. Requirements for asymptotic normality (p. 202–3): 0 0 −1 d 2 reciprocal of the variances (not the reciprocal of the standard W ≡ n(Rb − r) [RAvar(\b)R ] (Rb − r) −→ χ#r. deviations). •{gi} ≡ {xiεi} is a martingale difference sequence 0 For H : a#a(β) = 0, where A#a×K (β) ≡ ∇a(β) is continu- with K × K matrix of cross moments E(gigi) non- ous and of full row rank, under null, Frisch-Waugh Theorem (D&M 19–24; Hayashi 72–4; Greene 245–7) singular; Suppose we partition X into X1 and X2 (and therefore β √ • S ≡ limn→∞ Var[ ng¯] = Avar(¯g) = −1 d into β and β ). Then OLS estimators W ≡ na(b)0A(b)Avar(\b)A(b)0 a(b) −→ χ2 . 1 2 1 P 1 P 2 0 #a Avar( n gi) = Avar( n εi xixi); 0 −1 0 b1 = (X1X1) X1(Y − X2b2); • By assumption 2 and ergodic stationary mds CLT, Generalized least squares (Hayashi 54–9, 415–7; Mahajan 6-25–9; 0 2 0 0 −1 0 S = E[gigi] = E[εi xixi]; b2 = (X2M1X2) (X2M1Y ) D&M 289–92, 295) Without assumption 4 (spherical error vari- 2 0 2 0 −1 0 6. Finite fourth moments (p. 212): E[(xikzil) ] exists and ance), we have E[εε |X] ≡ σ V with Vn×n 6= I assumed to = (X˜ X˜ ) X˜ Y˜ 2 2 2 is finite for all k and l; be nonsingular. Gauss-Markov no longer holds; the t and F ˜ ˜ 2 tests are no longer valid; however, the OLS estimator is still where X2 ≡ M1X2 and Y ≡ M1Y are the residuals from 7. Conditional homoscedasticity (p. 225–6): E(εi |xi) = 2 0 2 0 unbiased. regressing X2 and Y respectively on X1. σ , which implies S ≡ E[gigi] = E[εi xixi] = σ2 E[x x0 ] = σ2Σ . Note: By the orthogonal decomposition of V −1, we can write In addition, the residuals from regressing Y on X equal the i i xx −1 0 2 V ≡ C C for some (non-unique C). Writingy ˜ ≡ Cy, X˜ ≡ residuals from regressing Y˜ on X˜2. • By assumption 5, σ > 0 and Σxx nonsingular;

1.15 Linear Generalized Method of Moments

This material is also covered (with slightly different notation—notably the exchange of x with z) in Mahajan 7.

Linear GMM model (Hayashi 198–203, 212, 225–6, 406–7) The model may assume:

1. Linearity (p. 198): $y_i = z_i'\delta + \varepsilon_i$ with $z_i, \delta \in \mathbb{R}^L$;
2. Ergodic stationarity (p. 198): $x_i \in \mathbb{R}^K$ instruments; the unique, non-constant elements of $\{(y_i, z_i, x_i)\}$ are jointly ergodic stationary (stronger than individually ergodic stationary);
3. Orthogonality (p. 198): all $K$ elements of $x_i$ are predetermined (i.e., orthogonal to the current error term): $E[g_i] \equiv E[x_i\varepsilon_i] = E[x_i(y_i - z_i'\delta)] = 0$;
4. Rank condition for identification (p. 200): $\Sigma_{xz} \equiv E[x_iz_i']_{K\times L}$ has full column rank (i.e., rank $L$);
   • $K > L$: overidentified/determined (use GMM estimator),
   • $K = L$: just/exactly identified/determined (GMM estimator reduces to IV estimator),
   • $K < L$: not/under identified/determined;
5. Requirements for asymptotic normality (p. 202–3):
   • $\{g_i\} \equiv \{x_i\varepsilon_i\}$ is a martingale difference sequence with the $K\times K$ matrix of cross moments $E(g_ig_i')$ nonsingular;
   • $S \equiv \lim_{n\to\infty}\operatorname{Var}[\sqrt{n}\,\bar g] = \operatorname{Avar}(\bar g) = \operatorname{Avar}\big(\frac{1}{n}\sum g_i\big) = \operatorname{Avar}\big(\frac{1}{n}\sum\varepsilon_ix_i\big)$;
   • By assumption 2 and the ergodic stationary mds CLT, $S = E[g_ig_i'] = E[\varepsilon_i^2x_ix_i']$;
6. Finite fourth moments (p. 212): $E[(x_{ik}z_{il})^2]$ exists and is finite for all $k$ and $l$;
7. Conditional homoscedasticity (p. 225–6): $E(\varepsilon_i^2|x_i) = \sigma^2$, which implies $S \equiv E[g_ig_i'] = E[\varepsilon_i^2x_ix_i'] = \sigma^2E[x_ix_i'] = \sigma^2\Sigma_{xx}$. Note:
   • By assumption 5, $\sigma^2 > 0$ and $\Sigma_{xx}$ is nonsingular;
   • $\hat S = \hat\sigma^2\frac{1}{n}\sum_ix_ix_i' = \hat\sigma^2S_{xx}$ with $\hat\sigma^2$ consistent, so we no longer need assumption 6 for consistent estimation of $S$;
8. $\{g_t\} \equiv \{x_t\varepsilon_t\}$ satisfies Gordin's condition (p. 406–7): it is ergodic stationary and
   (a) $E[g_tg_t']$ exists and is finite;
   (b) $E[g_t|g_{t-j}, g_{t-j-1},\dots] \xrightarrow{L^2} 0$ as $j\to\infty$;
   (c) $\sum_{j=0}^{\infty}\sqrt{E[r_{tj}'r_{tj}]} < \infty$, where $r_{tj} \equiv E[g_t|g_{t-j}, g_{t-j-1},\dots] - E[g_t|g_{t-j-1}, g_{t-j-2},\dots]$;
   and the long-run covariance matrix $S = \operatorname{Avar}(\bar g) \equiv \sum_{j\in\mathbb{Z}}\Gamma_j$ is nonsingular.

Instrumental variables estimator (Hayashi 205–6) A method of moments estimator based on $K$ orthogonality (moment) conditions. For the just identified GMM model ($K = L$; identification also requires nonsingularity of $\Sigma_{xz} \equiv E[x_iz_i']$),
\[ \hat\delta_{IV} \equiv S_{xz}^{-1}s_{xy} \equiv \Big(\frac{1}{n}\sum_ix_iz_i'\Big)^{-1}\Big(\frac{1}{n}\sum_ix_iy_i\Big). \]
Numerically equivalent to the linear GMM estimator with any symmetric positive definite weighting matrix. If the instruments are the same as the regressors ($x_i = z_i$), reduces to the OLS estimator. For properties, see GMM estimators.

If there is one non-constant regressor and one non-constant instrument, $\operatorname{plim}\hat\delta_{IV} = \operatorname{Cov}(x,y)/\operatorname{Cov}(x,z)$.

Linear GMM estimator (Hayashi 206–12, 406–7; D&M 219–20) For the overidentified case ($K > L$), we cannot generally satisfy all $K$ moment conditions, so the estimator is
\[ \hat\delta(\hat W) \equiv \operatorname{argmin}_{\tilde\delta}\tfrac{1}{2}g_n(\tilde\delta)'\hat Wg_n(\tilde\delta) \equiv \operatorname{argmin}_{\tilde\delta}\tfrac{1}{2n^2}(Y - Z\tilde\delta)'X\hat WX'(Y - Z\tilde\delta), \]
with $g_n(\tilde\delta) \equiv \frac{1}{n}\sum_ix_i(y_i - z_i'\tilde\delta)$ and some symmetric positive definite matrix $\hat W \to_p W$ (where $\hat W$ can depend on the data, and $W$ is also symmetric and positive definite).
\[ \hat\delta_{GMM} \equiv (S_{xz}'\hat WS_{xz})^{-1}S_{xz}'\hat Ws_{xy} \equiv (Z'X\hat WX'Z)^{-1}Z'X\hat WX'Y. \]
If $K = L$ (just identified case), this reduces to the IV estimator for any choice of $\hat W$.

1. Consistency: $\hat\delta(\hat W) \to_p \delta$ (under assumptions 1–4).
2. Asymptotic normality: $\sqrt{n}(\hat\delta(\hat W) - \delta) \to_d N(0,\operatorname{Avar}(\hat\delta(\hat W)))$, where
\[ \operatorname{Avar}(\hat\delta(\hat W)) = (\Sigma_{xz}'W\Sigma_{xz})^{-1}\,\Sigma_{xz}'WSW\Sigma_{xz}\,(\Sigma_{xz}'W\Sigma_{xz})^{-1} \]
(assumptions 1–4, 5 or 8). Note the sampling error is $\hat\delta(\hat W) - \delta = (S_{xz}'\hat WS_{xz})^{-1}S_{xz}'\hat Wg_n(\delta)$.
3. Consistent variance estimation: if $\hat S$ is a consistent estimator of $S$, then (assumption 2)
\[ \widehat{\operatorname{Avar}}(\hat\delta(\hat W)) = (S_{xz}'\hat WS_{xz})^{-1}S_{xz}'\hat W\hat S\hat WS_{xz}(S_{xz}'\hat WS_{xz})^{-1}. \]
4. Consistent estimation of $S$: if $\hat\delta$ is consistent, $S \equiv E[g_ig_i']$ exists and is finite, and assumptions 1, 2, [5 implied?] and 6 hold, then
\[ \hat S \equiv \frac{1}{n}\sum_i\hat\varepsilon_i^2x_ix_i' = \frac{1}{n}\sum_i(y_i - z_i'\hat\delta)^2x_ix_i' \to_p S. \]
For estimation of $S$ under assumption 8 in place of 5, see Hayashi 407–12 on kernel estimation and VARHAC.
5. Choosing instruments: adding (correct) instruments generally increases efficiency, but also increases finite-sample bias; "weak" instruments (that are almost uncorrelated with regressors) make the asymptotics poorly approximate finite-sample properties.

The GMM estimator can be seen as a GLS estimator: GMM minimizes $e'X\hat WX'e = e'\hat W_{GLS}e$, where $\hat W_{GLS} \equiv X\hat WX'$.

Efficient GMM (Hayashi 212–5; D&M 588–9, 597–8, 607–8) The lower bound $\operatorname{Avar}(\hat\delta(\hat W)) \geq (\Sigma_{xz}'S^{-1}\Sigma_{xz})^{-1}$ is achieved if $\hat W$ is chosen so that $W \equiv \operatorname{plim}\hat W = S^{-1}$. Two-step efficient GMM:

1. Calculate $\hat\delta(\hat W)$ for some $\hat W$ (usually $S_{xx}^{-1}$); use the residuals $\hat\varepsilon_i \equiv y_i - z_i'\hat\delta(\hat W)$ to obtain a consistent estimator $\hat S$ of $S$.
2. Calculate $\hat\delta(\hat S^{-1})$.

Note the requirement of estimating $\hat S^{-1}$ harms small-sample properties, since it requires the estimation of fourth moments. Efficient GMM is often outperformed in bias and variance by equally-weighted GMM ($\hat W = I$) in finite samples.

GMM hypothesis testing (Hayashi 211–2, 222–4; Mahajan 7-7–12; D&M 617–8) Assume 1–5. Under the null, the Wald principle gives:

1. $H_0\colon\delta_l = \bar\delta_l$,
\[ t_l \equiv \frac{\hat\delta_l(\hat W) - \bar\delta_l}{SE_l^*} \to_d N(0,1). \]
2. $H_0\colon R_{\#r\times L}\delta_{L\times 1} = r_{\#r\times 1}$, where $\operatorname{rank}(R) = \#r$ (i.e., $R$ has full row rank—there are no redundant restrictions),
\[ W \equiv n(R\hat\delta(\hat W) - r)'\big(R[\widehat{\operatorname{Avar}}(\hat\delta(\hat W))]R'\big)^{-1}(R\hat\delta(\hat W) - r) \to_d \chi^2_{\#r}. \]
3. $H_0\colon a_{\#a}(\delta) = 0$, where $A_{\#a\times L}(\delta) \equiv \nabla a(\delta)$ is continuous and of full row rank,
\[ W \equiv n\,a(\hat\delta(\hat W))'\big(A(\hat\delta(\hat W))[\widehat{\operatorname{Avar}}(\hat\delta(\hat W))]A(\hat\delta(\hat W))'\big)^{-1}a(\hat\delta(\hat W)) \to_d \chi^2_{\#a}. \]
(Note $W$ is not numerically invariant to the representation of the restriction.)

The distance principle statistic is
\[ J(\hat\delta_r(\hat S^{-1}),\hat S^{-1}) - J(\hat\delta_u(\hat S^{-1}),\hat S^{-1}) \to_d \chi^2_{\#a}, \]
where $J$ is $2n$ times the minimized objective function:
\[ J(\hat\delta,\hat S^{-1}) \equiv n\,g_n(\hat\delta)'\hat S^{-1}g_n(\hat\delta) \equiv \frac{1}{n}(Y - Z\hat\delta)'X\hat S^{-1}X'(Y - Z\hat\delta). \]
The LM statistic is in Mahajan 7-7–12 or D&M 617.

Note that unlike the Wald, these statistics—which are numerically equal in finite samples—are only asymptotically chi squared under efficient GMM. If the hypothesis' restrictions are linear and the weighting matrix is optimally chosen, they are also numerically equal to the Wald statistic.

Testing overidentifying restrictions (Hayashi 217–21; Mahajan 7-13–6; D&M 232–7, 615–6) Assuming overidentification ($K > L$), a consistent estimator $\hat S$ (ensured by assumption 6), and assumptions 1–5, then
\[ J \equiv n\,g_n(\hat\delta(\hat S^{-1}))'\hat S^{-1}g_n(\hat\delta(\hat S^{-1})) \to_d \chi^2_{K-L}. \]
To test a subset of orthogonality conditions (as long as the $K_1$ remaining—non-suspect—instruments are enough to ensure identification), calculate the $J_1$ statistic using the remaining instruments, and $C \equiv J - J_1 \to_d \chi^2(K - K_1)$ (this is a distance principle test, since $J$ is $2n$ times the minimized objective function). In finite samples, use the leading submatrix of $\hat S$ to calculate $J_1$.
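A minimal two-step efficient GMM sketch for an overidentified linear IV model ($K = 2$ instruments, $L = 1$ regressor), including the $J$ statistic; the data-generating process is simulated and illustrative:

```python
import numpy as np

rng = np.random.default_rng(9)
n, delta = 1000, 1.0

# Overidentified linear IV: one endogenous regressor z, two instruments x
x = rng.normal(size=(n, 2))
u = rng.normal(size=n)
z = x @ [1.0, 0.5] + 0.8 * u + rng.normal(size=n)
y = z * delta + u
Z = z[:, None]

Sxz = x.T @ Z / n
sxy = x.T @ y / n

def gmm(W):
    # delta_hat(W) = (Sxz' W Sxz)^{-1} Sxz' W sxy
    A = Sxz.T @ W
    return np.linalg.solve(A @ Sxz, A @ sxy)

# Step 1: W = Sxx^{-1}; Step 2: W = S_hat^{-1} from first-step residuals
d1 = gmm(np.linalg.inv(x.T @ x / n))
e = y - Z @ d1
S_hat = (x * (e**2)[:, None]).T @ x / n
d2 = gmm(np.linalg.inv(S_hat))

# J statistic for overidentifying restrictions: chi-squared(K - L = 1)
g = x.T @ (y - Z @ d2) / n
J = n * g @ np.linalg.solve(S_hat, g)
print(d2, J)
```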

Two-Stage Least Squares (Hayashi 226–31; D&M 215–24) Under assumption 7 (conditional homoscedasticity), efficient GMM becomes 2SLS. Minimizing the normalized distance function
\[ \frac{1}{2n\hat\sigma^2}(Y - Z\tilde\delta)'P_x(Y - Z\tilde\delta) \]
yields the estimator
\[ \hat\delta_{2SLS} \equiv \hat\delta(\hat S^{-1}) = \hat\delta([\hat\sigma^2S_{xx}]^{-1}) = \hat\delta(S_{xx}^{-1}) = (S_{xz}'S_{xx}^{-1}S_{xz})^{-1}S_{xz}'S_{xx}^{-1}s_{xy}. \]
Note $\hat\delta_{2SLS}$ does not depend on $\hat\sigma^2$.

The asymptotic variance $\operatorname{Avar}(\hat\delta_{2SLS}) = \sigma^2(\Sigma_{xz}'\Sigma_{xx}^{-1}\Sigma_{xz})^{-1}$ is estimated by
\[ \widehat{\operatorname{Avar}}(\hat\delta_{2SLS}) \equiv \hat\sigma^2(S_{xz}'S_{xx}^{-1}S_{xz})^{-1} \equiv \hat\sigma^2n(Z'P_xZ)^{-1}, \]
with $\hat\sigma^2 \equiv \frac{1}{n}\sum_i(y_i - z_i'\hat\delta_{2SLS})^2$.

If errors are heteroscedastic, 2SLS is inefficient GMM. The robust covariance estimate (since $\hat W = S_{xx}^{-1}$) is
\[ \widehat{\operatorname{Avar}}(\hat\delta_{2SLS}) = (S_{xz}'S_{xx}^{-1}S_{xz})^{-1}S_{xz}'S_{xx}^{-1}\hat SS_{xx}^{-1}S_{xz}(S_{xz}'S_{xx}^{-1}S_{xz})^{-1}. \]

Other characterizations of 2SLS: minimize $\|P_x(Y - Z\delta)\|^2 = (Y - Z\delta)'P_x(Y - Z\delta)$, or else note:
\[ \hat\delta_{2SLS} = (Z'X(X'X)^{-1}X'Z)^{-1}Z'X(X'X)^{-1}X'Y = (Z'P_xZ)^{-1}Z'P_xY, \]
where the projection matrix $P_x \equiv X(X'X)^{-1}X'$. So we can view 2SLS as:

1. IV regression of $Y$ (or $P_xY$) on $Z$ with $P_xZ$ (fitted values from regressing $Z$ on $X$) as instruments, or;
2. OLS regression of $Y$ (or $P_xY$) on $P_xZ$ (fitted values from regressing $Z$ on $X$)—this is why it's called "two-stage least squares."

Caution: don't view the standard errors reported by these regressions as relevant for 2SLS, since they ignore errors from the "first stage" (i.e., calculation of the fitted values $P_xZ$). (A numerical sketch follows the next entry.)

Durbin-Wu-Hausman test (D&M 237–42; MaCurdy) Suppose we do not know whether certain regressors are endogenous or not, and therefore whether or not they need to be included as instruments. Assuming endogeneity, OLS will be inconsistent and we prefer 2SLS; assuming exogeneity, both OLS and 2SLS estimates will be consistent and we prefer OLS (it is BLUE, and unbiased under appropriate assumptions).

Test by estimating either artificial regression using OLS:
\[ Y = Z\beta + P_xZ^*\delta + \hat\varepsilon \qquad \text{or} \qquad Y = Z\beta + M_xZ^*\eta + \check\varepsilon; \]
i.e., including either fitted values or residuals from a first-stage regression of the suspect regressors $Z^*$ on known instruments $X$. If the $Z^*$ are exogenous, the coefficient $\delta$ or $\eta$ should be zero.

Interpretation note: although the test is often interpreted as a test for exogeneity, the test is actually for the consistency of the OLS estimate. OLS can be consistent even if the $Z^*$ are endogenous.

An analogous test will work for a nonlinear model as long as the suspect regressors $Z^*$ enter linearly; if the $Z^*$ enter nonlinearly, failure is due to Jensen's inequality.
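A minimal 2SLS sketch with simulated data, computing $\hat\delta_{2SLS} = (Z'P_xZ)^{-1}Z'P_xY$ and the conventional (conditional homoscedasticity) variance estimate:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 2000

# Endogenous regressor z, instruments X (with constant); true delta = (0.5, 2.0)
x2 = rng.normal(size=(n, 2))                     # excluded instruments
X = np.column_stack([np.ones(n), x2])
u = rng.normal(size=n)
z = 1 + x2 @ [1.0, -0.5] + 0.7 * u + rng.normal(size=n)
Z = np.column_stack([np.ones(n), z])
y = Z @ [0.5, 2.0] + u

# 2SLS: delta_hat = (Z'Px Z)^{-1} Z'Px Y, with Px Z the first-stage fitted values
PxZ = X @ np.linalg.solve(X.T @ X, X.T @ Z)
delta_hat = np.linalg.solve(PxZ.T @ Z, PxZ.T @ y)

# Conventional 2SLS variance estimate: sigma2_hat * (Z'PxZ)^{-1}
e = y - Z @ delta_hat
sigma2_hat = e @ e / n
V = sigma2_hat * np.linalg.inv(PxZ.T @ Z)
print(delta_hat, np.sqrt(np.diag(V)))
```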
2 1 P 0 ˆ 2 See Hayashi 284 for summary of multiple-equation GMM es- (See Hayashi 267 for huge partitioned matrix representation; withσ ˆ ≡ i(yi − ziδ2SLS) . n timators. This material is also covered (with slightly different can also be written with Y , ε, and β stacked vectors, and X If errors are heteroscedastic, 2SLS is inefficient GMM. Ro- notation—notably the exchange of x with z—in Mahajan 7. and Z block matrices), where −1 bust covariance estimate (since Wc = Sxx ) is Linear multiple-equation GMM model (Hayashi 259–265, 269–  1 0  70, 274–5) Hayashi 270 summarizes comparison to single- P x z \ˆ n i1 i1 Avar(δ2SLS) = equation GMM.  .  Sxz ≡  ..  , 0 −1 −1 0 −1 −1 0 −1 −1 Model has M linear equations, where equation m has L |{z}   (SxzSxx Sxz) SxzSxx SSb xx Sxz(SxzSxx Sxz) . m P P 1 P 0 Km× Lm xiM z regressors and Km instruments. Assumptions are exactly n iM  1 P  2 parallel to single-equation GMM. xi1yi1 Other characterizations of 2SLS minimize kPx(Y − Zδ)k = n (Y − Zδ)0P (Y − Zδ), or else note: s ≡  .  . x 1. Each equation must satisfy its (equation-by-equation) xy  .  |{z}   rank condition for identification. P 1 P x y ˆ 0 0 −1 0 −1 0 0 −1 0 Km×1 n iM iM δ2SLS = (Z X(X X) X Z) Z X(X X) X Y 2. The vector of all (yi, zi, xi) must be jointly ergodic 0 −1 0 = (Z PxZ) Z PxY stationary—this is stronger than equation-by-equation 1. If each equation is exactly identified, then choice of ergodic stationarity. 0 −1 0 weighting matrix doesn’t matter, and multiple-equation where the projection matrix Px ≡ X(X X) X . So can 3. Asymptotic normality requires that {gi} be a mds with GMM is numerically equivalent to equation-by-equation view 2SLS as: nonsingular second moment, where IV estimation.   1. IV regression of Y (or PxY ) on Z with PxZ (fitted val- xi1εi1 2. Equation-by-equation GMM corresponds to using a ues from regressing Z on X) as instruments, or; g ≡  .  . block diagonal weighting matrix. i  .  2. OLS regression of Y (or P Y ) on P Z (fitted values |{z} x x P x ε from regressing Z on X)—this is why it’s called “two- Km×1 iM iM 3. Assuming overidentification, efficient multiple-equation GMM will be asymptotically more efficient than stage least squares.” P P By ergodic stationary mds CLT, S Km× Km = 0 equation-by-equation efficient GMM; they will only be E[gig ] where Caution: Don’t view standard errors reported by these re- i asymptotically equivalent if efficient weighting matrix 0 √ X is block diagonal (i.e., if E[εimεihximx ] = 0 for all gressions are relevant for 2SLS, since they ignore errors from ¯ ¯ 1 ih S ≡ lim Var[ ng] = Avar(g) = Avar( n gi) m 6= h; under conditional homoscedasticity, this is “first stage” (i.e., calculation of fitted values PxZ). n→∞  0 0  E[εimεih] = 0). E[εi1εi1xi1xi1] ··· E[εi1εiM xi1xiM ] Durbin-Wu-Hausman test (D&M 237–42; MaCurdy) Suppose we =  . . .  . do not know whether certain regressors are endogenous or  . .. .  Full-Information Instrumental Variables Efficient (Hayashi not, and therefore whether or not they need to be included E[ε ε x x0 ] ··· E[ε ε x x0 ] iM i1 iM i1 iM iM iM iM 275–6; Mahajan 7–16-7) Efficient GMM estimator with: as instruments. Assuming endogeneity, OLS will be incon- sistent and we prefer 2SLS; assuming exogeneity, both OLS 4. Consistent estimation of S requires finite fourth mo- 2 0 and 2SLS estimates will be consistent and we prefer OLS (it ments E[(ximkzihj ) ] for all k = 1,...,Km; j = • Conditional homoscedasticity (E(εε |X) = ΣM×M ⊗ is BLUE, and unbiased under appropriate assumptions). 
1,...,Lh; and m, h = 1,...,M. In×n).

Full-Information Instrumental Variables Efficient (Hayashi 275–6; Mahajan 7-16–7) Efficient GMM estimator with:

• Conditional homoscedasticity (E(εε′|X) = Σ_{M×M} ⊗ I_{n×n}).

The weighting matrix is Ŝ⁻¹, where

    Ŝ ≡ (1/n) Σ_i X_i Σ̂ X_i′,

a block matrix with (m, h) block σ̂_mh (1/n) Σ_i x_im x_ih′; here X_i is the (ΣK_m × M) block-diagonal matrix of instruments for the ith observation,

    Σ̂ ≡ (1/n) Σ_i ε̂_i ε̂_i′,   σ̂_mh ≡ (1/n) Σ_i ε̂_im ε̂_ih = (1/n) Σ_i (y_im − z_im′ δ̂_m)(y_ih − z_ih′ δ̂_h)

for some consistent estimator δ̂_m of δ_m (usually 2SLS).

Suppose we have linearity; joint ergodic stationarity; orthogonality; rank condition; mds with finite second moments; conditional homoscedasticity; and E[z_im z_ih′] exists and is finite for all m, h. Then δ̂_FIVE is consistent, asymptotically normal, and efficient.

Three-Stage Least Squares (Hayashi 276–9, 308; Mahajan 7-17; MaCurdy) Efficient GMM estimator with:

• Conditional homoscedasticity (E(εε′|X) = Σ_{M×M} ⊗ I_{n×n}),
• The set of instruments the same across equations (X = I_{M×M} ⊗ X̃_{n×K}, where X̃ are the instruments for each equation).

The weighting matrix is Ŝ⁻¹ = Σ̂⁻¹ ⊗ ((1/n) Σ_i x̃_i x̃_i′)⁻¹ = Σ̂⁻¹ ⊗ ((1/n) X̃′X̃)⁻¹, where Σ̂ ≡ (1/n) Σ_i ε̂_i ε̂_i′ is the matrix of the σ̂_mh calculated as in FIVE using 2SLS residuals. The normalized distance function

    (1/(2n)) (Y − Zδ̃)′ (Σ̂⁻¹ ⊗ P_X̃) (Y − Zδ̃)

yields estimator

    δ̂_3SLS = (Z′(Σ̂⁻¹ ⊗ P_X̃)Z)⁻¹ Z′(Σ̂⁻¹ ⊗ P_X̃)y = (Ẑ′(Σ̂⁻¹ ⊗ I)Ẑ)⁻¹ Ẑ′(Σ̂⁻¹ ⊗ I)y,

where Ẑ ≡ (I ⊗ P_X̃)Z are the fitted values from the "first-stage" regression of the Z_m's on X̃.

    Avarˆ(δ̂_3SLS) = n (Z′(Σ̂⁻¹ ⊗ P_X̃)Z)⁻¹.

Suppose we have linearity; joint ergodic stationarity; orthogonality; rank condition; mds with finite second moments; conditional homoscedasticity; E[z_im z_ih′] exists and is finite for all m, h; and x_im = x̃_i (common instruments). Then δ̂_3SLS is consistent, asymptotically normal, and efficient.

3SLS is more efficient than multiple-equation 2SLS unless either

1. Σ is diagonal, in which case the two estimators are asymptotically equivalent (although different in finite samples), or
2. Each structural equation is exactly identified, in which case the system is exactly identified, and 2SLS and 3SLS are computationally identical in finite samples.

Seemingly Unrelated Regressions (Hayashi 279–83, 309; Mahajan 7-17) [a.k.a. Joint Generalized Least Squares] Efficient GMM estimator with:

• Conditional homoscedasticity (E(εε′|X) = Σ_{M×M} ⊗ I_{n×n}),
• The set of instruments the same across equations (X = I_{M×M} ⊗ X̃, where X̃ are the instruments for each equation),
• The set of instruments the union of all the regressors.

Note this is the 3SLS estimator; it just has another name when the instruments are the union of all regressors. Here, all regressors satisfy "cross" orthogonality: each is predetermined in every other equation. Since the regressors are a subset of the instruments, the residuals used to estimate Σ̂ are from OLS (when regressors are a subset of instruments, 2SLS becomes OLS). The normalized distance function

    (1/(2n)) (Y − Zδ̃)′ (Σ̂⁻¹ ⊗ I) (Y − Zδ̃)

yields estimator

    δ̂_SUR = (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹ Z′(Σ̂⁻¹ ⊗ I)y,
    Avarˆ(δ̂_SUR) = n (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹.

Suppose we have linearity; joint ergodic stationarity; orthogonality; mds with finite second moments; conditional homoscedasticity; and x_im = ∪_m z_im (we need neither the rank conditions—per Hayashi 285 q. 7—nor the assumption that E[z_im z_ih′] exists and is finite for all m, h; it is implied). Then δ̂_SUR is consistent, asymptotically normal, and efficient.

If each equation is just identified, then all the regressors must be the same and SUR is called the "multivariate regression"—it is numerically equivalent to equation-by-equation OLS. If some equation is overidentified, we gain efficiency over equation-by-equation OLS as long as some errors are correlated across equations (i.e., σ_mh ≠ 0); we are taking advantage of exclusion restrictions vs. multivariate regression.

Multiple equation Two-Stage Least Squares (MaCurdy) Efficient GMM estimator with:

• Conditional homoscedasticity across equations (Σ = σ² I_{M×M}),
• Conditional homoscedasticity (E(εε′|X) = Σ_{M×M} ⊗ I_{n×n} = σ² I_{nM×nM}),
• The set of instruments the same across equations (X = I_{M×M} ⊗ X̃_{n×K}, where X̃ are the instruments for each equation).

The weighting matrix doesn't matter up to a factor of proportionality, and is thus Ŝ⁻¹ = I ⊗ ((1/n) Σ_i x̃_i x̃_i′)⁻¹ = I ⊗ ((1/n) X̃′X̃)⁻¹. The normalized distance function

    (1/(2n)) (Y − Zδ̃)′ (diag(σ̂₁², …, σ̂_M²)⁻¹ ⊗ P_X̃) (Y − Zδ̃)

yields estimator

    δ̂_2SLS = (Z′(I ⊗ P_X̃)Z)⁻¹ Z′(I ⊗ P_X̃)y = (Ẑ′Ẑ)⁻¹ Ẑ′y,

where Ẑ ≡ (I ⊗ P_X̃)Z are the fitted values from the first-stage regression. This is the same estimator as equation-by-equation 2SLS, and the estimated within-equation covariance matrices are also the same; joint estimation gives the off-diagonal (i.e., cross-equation) blocks of the covariance matrix. The covariance matrix simplifies since Σ = σ²I:

    Avarˆ(δ̂_2SLS) = n σ̂² (Z′(I ⊗ P_X̃)Z)⁻¹ = n σ̂² (Ẑ′Ẑ)⁻¹;

however, if Σ ≠ σ²I, we instead get

    Avarˆ(δ̂_2SLS) = n (Ẑ′Ẑ)⁻¹ Ẑ′(Σ̂ ⊗ I)Ẑ (Ẑ′Ẑ)⁻¹.
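A minimal numpy sketch of feasible SUR/JGLS as defined above (two equations, simulated data; the DGP and all names are illustrative): Σ̂ comes from equation-by-equation OLS residuals, then δ̂_SUR = (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹ Z′(Σ̂⁻¹ ⊗ I)y.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 1000
    # Two-equation system with distinct regressors and correlated errors.
    x1 = np.column_stack([np.ones(n), rng.normal(size=n)])
    x2 = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
    Sigma_true = np.array([[1.0, 0.6], [0.6, 1.0]])
    eps = rng.multivariate_normal(np.zeros(2), Sigma_true, size=n)
    y1 = x1 @ np.array([1.0, 2.0]) + eps[:, 0]
    y2 = x2 @ np.array([-1.0, 0.5, 1.5]) + eps[:, 1]

    # Stack as y = Z delta + eps with Z block-diagonal across equations.
    Z = np.block([[x1, np.zeros((n, 3))],
                  [np.zeros((n, 2)), x2]])
    y = np.concatenate([y1, y2])

    # Step 1: equation-by-equation OLS residuals give Sigma_hat.
    b1 = np.linalg.lstsq(x1, y1, rcond=None)[0]
    b2 = np.linalg.lstsq(x2, y2, rcond=None)[0]
    E = np.column_stack([y1 - x1 @ b1, y2 - x2 @ b2])
    Sigma_hat = E.T @ E / n

    # Step 2: GLS with weighting matrix Sigma_hat^{-1} kron I_n.
    W = np.kron(np.linalg.inv(Sigma_hat), np.eye(n))
    delta_sur = np.linalg.solve(Z.T @ W @ Z, Z.T @ W @ y)
    avar_hat = n * np.linalg.inv(Z.T @ W @ Z)   # SE = sqrt(diag(avar_hat/n))
    print(delta_sur)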

Multiple-equation GMM with common coefficients (Hayashi 286–9) When all equations have the same coefficient vector δ ∈ R^L:

1. Linearity becomes y_im = z_im′ δ + ε_im for m = 1, …, M; i = 1, …, n.
2. The rank condition requires full column rank of the stacked (ΣK_m × L) matrix

       Σ_xz ≡ [E(x_i1 z_i1′); …; E(x_iM z_iM′)].

Note here Σ_xz is a stacked rather than a block matrix, and the rank condition is weaker than in the standard multiple-equation GMM model—rather than requiring equation-by-equation rank conditions, it suffices for one equation to satisfy the rank condition.

    δ̂_GMM ≡ (S_xz′ Ŵ S_xz)⁻¹ S_xz′ Ŵ s_xy

(as in the standard multiple-equation GMM estimator), but here S_xz and s_xy are the stacked matrices

    S_xz ≡ [(1/n) Σ_i x_i1 z_i1′; …; (1/n) Σ_i x_iM z_iM′],   s_xy ≡ [(1/n) Σ_i x_i1 y_i1; …; (1/n) Σ_i x_iM y_iM].

Imposition of conditional homoscedasticity (and other appropriate assumptions as above) gives common-coefficient versions of the FIVE, 3SLS, and SUR estimators—the latter is the "random effects" estimator.

Random effects estimator (Hayashi 289–90, 292–3) Efficient multiple-equation GMM (SUR) where all equations have the same coefficient vector δ ∈ R^L, conditional homoscedasticity applies, and the set of instruments for all equations is the union of all regressors. We can write the model as y_i = Z_i δ + ε_i, where y_i (M×1) stacks y_i1, …, y_iM; Z_i (M×L) stacks z_i1′, …, z_iM′; and ε_i (M×1) stacks ε_i1, …, ε_iM; or as y = Zδ + ε by stacking the above (note Z is no longer block-diagonal as it was for SUR). Then

    δ̂_RE = ((1/n) Σ_i Z_i′ Σ̂⁻¹ Z_i)⁻¹ ((1/n) Σ_i Z_i′ Σ̂⁻¹ y_i) = (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹ Z′(Σ̂⁻¹ ⊗ I)y;
    Avar(δ̂_RE) = (E[Z_i′ Σ⁻¹ Z_i])⁻¹;
    Avarˆ(δ̂_RE) = ((1/n) Σ_i Z_i′ Σ̂⁻¹ Z_i)⁻¹ = n (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹.

Pooled OLS (Hayashi 290–3, 309–10) Inefficient multiple-equation GMM where all equations have the same coefficient vector δ ∈ R^L, calculated by pooling all observations and applying OLS. Equivalent to GMM with weighting matrix Ŵ = I_M ⊗ ((1/n) Σ_i x_i x_i′)⁻¹ rather than Σ̂⁻¹ ⊗ ((1/n) Σ_i x_i x_i′)⁻¹. Caution: running a standard OLS will give incorrect standard errors. Using matrix notation as in the random effects estimator,

    δ̂_pOLS = ((1/n) Σ_i Z_i′Z_i)⁻¹ ((1/n) Σ_i Z_i′y_i) = (Z′Z)⁻¹Z′y;
    Avar(δ̂_pOLS) = (E[Z_i′Z_i])⁻¹ E[Z_i′ Σ Z_i] (E[Z_i′Z_i])⁻¹;
    Avarˆ(δ̂_pOLS) = ((1/n) Σ_i Z_i′Z_i)⁻¹ ((1/n) Σ_i Z_i′ Σ̂ Z_i) ((1/n) Σ_i Z_i′Z_i)⁻¹ = n (Z′Z)⁻¹ Z′(Σ̂ ⊗ I)Z (Z′Z)⁻¹.

1.17 ML for multiple-equation linear models

Structural form (D&M 212–3; Hayashi 529–30) Each equation presents one endogenous variable as a function of exogenous and endogenous variables (and an error term). It is conventional to write simultaneous equation models so that each endogenous variable appears on the left-hand side of exactly one equation, "but there is nothing sacrosanct about this convention." Can also be written with the errors a function of all variables ("one error per equation").

Reduced form (D&M 213–4; Hayashi 529–30) Each endogenous variable is on the left side of exactly one equation ("one endogenous variable per equation"), and only predetermined variables (exogenous variables and lagged endogenous variables) are on the right side (along with an error term).

Full Information Maximum Likelihood (Hayashi 526–35; MaCurdy) Application of (potentially quasi-) maximum likelihood to the 3SLS model with two extensions: iid (dependent variables, regressors, and instruments), and conditionally homoscedastic normally distributed errors. The model requires:

1. Linearity: y_tm = z_tm′ δ_m + ε_tm for m = 1, …, M;
2. Rank conditions: E[x_t z_tm′] of full column rank for all m;
3. E[x_t x_t′] nonsingular;
4. The system can be written as a "complete system" of simultaneous equations—let y_t be a vector of all endogenous variables (those in y_t1, …, y_tM, z_t1, …, z_tM but not in x_t; caution: y_t does not merely collect y_t1, …, y_tM as usual):
   • The number of endogenous variables equals the number of equations (i.e., y_t ∈ R^M);
   • The structural form Γ_{M×M} y_t + B_{M×K} x_t = ε_t has nonsingular Γ;
5. ε_t|x_t ∼ N(0, Σ) for positive definite Σ;
6. The vector that collects the elements of y_t (≡ y_t1, …, y_tM, z_t1, …, z_tM) and x_t is iid (stronger than a jointly ergodic stationary mds);
7. (δ, Γ) is an interior point of a compact parameter space.

We can rewrite the complete system as a reduced form

    y_t = Π′ x_t + v_t,   Π′ ≡ −Γ⁻¹B,   v_t ≡ Γ⁻¹ε_t.

x_t is orthogonal to the vector of errors v_t, so this is a multivariate regression model; given our normality assumption, we have

    y_t|x_t ∼ N(−Γ⁻¹B x_t, Γ⁻¹Σ(Γ⁻¹)′),

and we can estimate δ (on which Γ and B depend) and Σ by maximum likelihood. Generally use concentrated ML: first maximize over Σ to get Σ̂(δ) = (1/n) Σ_{t=1}^n (Γy_t + Bx_t)(Γy_t + Bx_t)′, and then maximize over δ. The estimator is consistent and asymptotically normal (and asymptotically equivalent to 3SLS; per Hayashi 534 and MaCurdy, this equivalence does not carry over to equation systems that are nonlinear in endogenous variables and/or errors—nonlinear FIML is more efficient than nonlinear 3SLS) even if ε_t|x_t is not normal, as long as the model is linear in y's and errors (not necessarily in regressors or parameters).

Maximum likelihood for SUR (Hayashi 525–7; MaCurdy P.S. 3 1(v)) A specialization of FIML, since SUR is 3SLS with the set of instruments the set of all regressors. In FIML notation, that means z_tm is a subvector of x_t; therefore y_t just collects the (sole remaining endogenous variables) y_t1, …, y_tM and the structural form parameter Γ = I. We get a closed-form expression (for some consistent estimator Σ̂), unlike for general FIML:

    δ̂(Σ̂) = (Z′(Σ̂⁻¹ ⊗ I)Z)⁻¹ Z′(Σ̂⁻¹ ⊗ I)y.

The LR statistic is valid even if the normality assumption fails to hold.
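Returning to the common-coefficient estimators above, here is a small numpy sketch (simulated data; the DGP and all names are illustrative) of pooled OLS with its sandwich covariance n(Z′Z)⁻¹Z′(Σ̂ ⊗ I)Z(Z′Z)⁻¹, computed unit-by-unit to avoid forming the nM × nM Kronecker product, alongside the random-effects GLS estimator.

    import numpy as np

    rng = np.random.default_rng(2)
    n, M, L = 800, 3, 2   # n units, M equations, L common coefficients
    delta_true = np.array([1.0, -0.5])
    Zi = rng.normal(size=(n, M, L))
    A = rng.normal(size=(M, M))
    Sigma_true = A @ A.T / M + np.eye(M)          # positive definite
    eps = rng.multivariate_normal(np.zeros(M), Sigma_true, size=n)
    Y = Zi @ delta_true + eps                     # (n, M)

    # Pooled OLS: stack all n*M observations and run OLS.
    Z = Zi.reshape(n * M, L)
    y = Y.reshape(n * M)
    d_pols = np.linalg.solve(Z.T @ Z, Z.T @ y)

    # Sigma_hat from pooled-OLS residuals, reshaped back to (n, M).
    Ehat = (y - Z @ d_pols).reshape(n, M)
    Sigma_hat = Ehat.T @ Ehat / n

    # Sandwich covariance for pooled OLS (correct standard errors).
    bread = np.linalg.inv(Z.T @ Z)
    meat = sum(Zi[i].T @ Sigma_hat @ Zi[i] for i in range(n))
    avar_pols = n * bread @ meat @ bread          # divide by n for Var(d_pols)

    # Random effects (common-coefficient SUR): GLS with Sigma_hat.
    S_inv = np.linalg.inv(Sigma_hat)
    A1 = sum(Zi[i].T @ S_inv @ Zi[i] for i in range(n))
    b1 = sum(Zi[i].T @ S_inv @ Y[i] for i in range(n))
    d_re = np.linalg.solve(A1, b1)
    print(d_pols, d_re)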

Limited Information Maximum Likelihood (Hayashi 538–42; MaCurdy) Consider the mth equation in the FIML framework, and partition the regressors into endogenous ỹ_t and predetermined x̃_t:

    y_tm = z_tm′ δ_m + ε_t = ỹ_t′ γ_m + x̃_t′ β_m + ε_t

(z_tm is L_m × 1, ỹ_t is M_m × 1, x̃_t is K_m × 1, and L_m = M_m + K_m). Combining with the relevant rows from the FIML reduced form y_t = Π′x_t + v_t gives a complete (1 + M_m)-equation system

    Γ̄ ȳ_t + B̄ x_t = ε̄_t

with

    ȳ_t ≡ [y_tm; ỹ_t],   ε̄_t ≡ [ε_tm; ṽ_t],   Γ̄ ≡ [1, −γ_m′; 0, I_{M_m}],   B̄ ≡ [−β_m′, 0; −Π̃′],

where Γ̄ is (1+M_m)×(1+M_m), B̄ is (1+M_m)×K, and x̃_t is assumed to be the first K_m elements of x_t.

We estimate this structural form by ML as in FIML. The estimator is consistent, and asymptotically normal even if ε_t|x_t is not normal, as long as the model is linear in y's and errors (not necessarily in regressors or parameters). It has the same asymptotic distribution as 2SLS (again, if the specification is linear), but generally outperforms 2SLS in finite samples.

1.18 Unit root processes

Integrated process of order 0 (Hayashi 558) A sequence of r.v.s is an I(0) process if it:

1. is strictly stationary;
2. has long-run variance Σ_{j∈Z} γ_j ∈ (0, ∞).

It can be written as δ + u_t, where {u_t} is a zero-mean stationary process with positive, finite long-run variance.

Integrated process of order d (Hayashi 559) A sequence of r.v.s {ξ_t} is an I(d) process if its dth differences Δ^d ξ_t ≡ (1 − L)^d ξ_t are an I(0) process.

Unit root process (Hayashi 559–60, 565–6) An I(1) process, a.k.a. difference-stationary process. If first differences are mean-zero, the process is called "driftless." If Δξ_t ≡ δ + u_t ≡ δ + ψ(L)ε_t, the process can be written

    ξ_t = ξ_0 + δt + Σ_{s=1}^t u_s = δt + ψ(1) Σ_{s=1}^t ε_s + η_t + (ξ_0 − η_0),
          [time trend]  [stochastic trend]  [stationary process]

where η_t ≡ α(L)ε_t and α_j = −(ψ_{j+1} + ψ_{j+2} + ⋯). Eventually, the time trend (assuming there is one, i.e., δ ≠ 0) dominates the stochastic trend (a.k.a. driftless random walk), which in turn dominates the stationary process.

Brownian motion (Hansen B2-5–7, 18, 21; Hayashi 567–72) The random CADLAG (continuous from right, limit from left) function W: [0, 1] → R (i.e., an element of D[0, 1]) such that:

1. W(0) = 0 almost surely;
2. For 0 ≤ t₁ < t₂ < ⋯ < t_k ≤ 1, the random variables W(t₂) − W(t₁), W(t₃) − W(t₂), …, W(t_k) − W(t_{k−1}) are independent normals with W(s) − W(t) ∼ N(0, s − t);
3. W(t) is continuous in t with probability 1.

If {ξ_t} is a driftless I(1) process, so that Δξ_t is zero-mean I(0) with λ² the long-run variance of {Δξ_t} and γ₀ ≡ Var(Δξ_t), then jointly (i.e., the W(·) below are all the same):

    (1/T²) Σ_{t=1}^T ξ_{t−1}² →d λ² ∫₀¹ W(r)² dr,
    (1/T) Σ_{t=1}^T Δξ_t ξ_{t−1} →d (1/2)(λ² W(1)² − γ₀) ≡ σ_Δξ² ∫₀¹ W(u) dW(u)

(the last expression assumes {Δξ_t} is iid white noise).

Function Central Limit Theorem (Hansen B2-9–17) [Donsker's Invariance Principle] A sequence {Y_t} (where Y₀ = 0) can be mapped into a D[0, 1] function by

    W_n(u) ≡ Y_⌊un⌋,   or equivalently   W_n(u) ≡ Y_t for u ∈ [t/n, (t+1)/n).

If Y_t ≡ Σ_{s=1}^t ε_s for {ε_s} iid white noise, then n^{−1/2} W_n(u) ≡ n^{−1/2} Y_⌊un⌋ →d σ_ε W(u), where W(·) is a Brownian motion.

Dickey-Fuller Test (Hayashi 573–7; Hansen B2-2–4) Tests whether an AR(1) process has a unit root (the null hypothesis). In the case without drift, use OLS to regress y_t = ρ y_{t−1} + ε_t where {ε_t} is iid white noise. Under the null,

    T(ρ̂_T − 1) = [(1/T) Σ_t Y_{t−1} ε_t] / [(1/T²) Σ_t Y_{t−1}²] →d [∫₀¹ W(u) dW(u)] / [∫₀¹ W(u)² du] ≡ DF.

Tests with drift or non-zero mean are covered in Hayashi.

1.19 Limited dependent variable models

Binary response model (D&M 512–4, 517–21; MaCurdy) Dependent variable y_t ∈ {0, 1} and information set at time t is Ω_t. Then

    E[y_t|Ω_t] = Pr[y_t = 1|Ω_t] ≡ F(h(x_t, β)),

where h(x_t, β) is the index and F(·) the transformation. If E[y_t|Ω_t] = F(x_t′β), then the FOCs for maximizing the log likelihood are the same as for weighted least squares with weights [F(1 − F)]^{−1/2}, which is the reciprocal of the square root of the error variance. The FOCs may not have a finite solution if the data set doesn't identify all parameters.

Probit model (Hayashi 451–2, 460, 466–7, 477–8; D&M 514–5; MaCurdy) Binary response model with E[y_t|Ω_t] = Φ(x_t′β). Can also think of it as y_t* = x_t′β + ε_t where ε_t ∼ N(0, 1) and y_t = I{y_t* > 0}. The log likelihood function (QML if non-iid ergodic stationary)

    Σ_t [ y_t log Φ(x_t′β) + (1 − y_t) log(1 − Φ(x_t′β)) ]

is concave. The MLE is identified iff E[x_t x_t′] is nonsingular. The MLE is consistent and asymptotically normal if the identification condition holds and {y_t, x_t} is ergodic stationary.

Logit model (Hayashi 508–1; D&M 515) Binary response model with E[y_t|Ω_t] = Λ(x_t′β), where Λ(·) is the logistic function: Λ(v) ≡ exp(v)/(1 + exp(v)). Can also think of it as y_t* = x_t′β + ε_t where the ε_t have an extreme value distribution and y_t = I{y_t* > 0}. The log likelihood function (QML if non-iid ergodic stationary)

    Σ_t [ y_t log Λ(x_t′β) + (1 − y_t) log(1 − Λ(x_t′β)) ]

is concave. The MLE is identified iff E[x_t x_t′] is nonsingular. The MLE is consistent and asymptotically normal if the identification condition holds and {y_t, x_t} is ergodic stationary.
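A self-contained Newton-Raphson sketch of the logit MLE above, on simulated data (the DGP is an illustrative assumption). Concavity of the log likelihood makes Newton's method well-behaved here, and the inverse information −H⁻¹ estimates the covariance.

    import numpy as np

    rng = np.random.default_rng(3)
    n = 2000
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    beta_true = np.array([-0.5, 1.0, 2.0])
    p = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
    y = rng.binomial(1, p)

    # Newton-Raphson on the (globally concave) logit log likelihood.
    beta = np.zeros(3)
    for _ in range(25):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))    # Lambda(x'beta)
        grad = X.T @ (y - mu)                      # score
        H = -(X * (mu * (1 - mu))[:, None]).T @ X  # Hessian (negative definite)
        step = np.linalg.solve(H, grad)
        beta = beta - step                         # Newton update
        if np.max(np.abs(step)) < 1e-10:
            break

    se = np.sqrt(np.diag(np.linalg.inv(-H)))       # inverse-information SEs
    print(beta, se)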

Truncated response model (Hayashi 511–7; D&M 534–7; MaCurdy) The model is y_t = x_t′β₀ + ε_t with ε_t|x_t ∼ N(0, σ₀²), but we only see observations where y_t > c for some known threshold c. The density for each observation is

    f(y_t | y_t > c) = f(y_t) / Pr(y_t > c).

Note that if y_t ∼ N(μ₀, σ₀²),

    E[y_t | y_t > c] = μ₀ + σ₀ λ(v),
    Var[y_t | y_t > c] = σ₀² [1 − λ′(v)] = σ₀² (1 − λ(v)[λ(v) − v]),

where v ≡ (c − μ₀)/σ₀ and λ(v) ≡ φ(v)/(1 − Φ(v)) is the inverse Mills ratio. Could estimate

    E[y_t | y_t > c] = x_t′β₀ + σ₀ λ((c − x_t′β₀)/σ₀)

by nonlinear least squares, but ML is preferred because it is asymptotically more efficient (and we have already imposed full distributional assumptions). Extension to a nonlinear index function (in place of x_t′β) is straightforward.

Tobit model (Hayashi 518–21; D&M 537–42; MaCurdy) [a.k.a. censored response model] All observations are observed, but we do not see y_t* if it is less than some known threshold c; "like a truncated regression model combined with a probit model, with the coefficient vectors restricted to be proportional to each other." The model is

    y_t = max{x_t′β₀ + ε_t, c},   where y_t* ≡ x_t′β₀ + ε_t,

ε_t|x_t ∼ N(0, σ₀²), and {x_t, y_t} iid. The log likelihood (a mix of density and mass) is

    Σ_{y_t>c} log[(1/σ₀) φ((y_t − x_t′β₀)/σ₀)] + Σ_{y_t=c} log Φ((c − x_t′β₀)/σ₀).

Reparametrize to get concavity.

Sample selection model (D&M 542–5; MaCurdy) Rather than truncating according to the dependent variable, truncate on another variable correlated with it. Suppose y_t* (e.g., wage) and z_t* (e.g., weeks) are latent variables with

    [y_t*; z_t*] = [x_t′β; w_t′γ] + [u_t; v_t],   [u_t; v_t] ∼ N(0, [σ², ρσ; ρσ, 1]),

and x_t and w_t exogenous or predetermined. We observe z_t (the sign of z_t*) and y_t = y_t* if z_t* > 0. For observations where z_t* > 0 (and hence we observe y_t*), the model becomes

    y_t = x_t′β + ρσ φ(w_t′γ)/Φ(w_t′γ) + ε_t.

Maximum likelihood is the best estimation technique (called "generalized Tobit" here), but the model is also sometimes estimated using either

1. Heckman two-step: first use probit estimation on all observations to estimate γ̂, and then use OLS on the observations where z_t* > 0 to estimate β̂; note the reported covariance matrix in the second step will not be correct.
2. Nonlinear least squares: only uses data on observations where z_t* > 0; generally has identification problems, since estimation is principally based on exploiting the nonlinearity of φ(·)/Φ(·).
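A sketch of Tobit estimation by (Q)ML as described above, using scipy's generic optimizer on simulated censored data (the DGP is illustrative). For simplicity it uses a (β, log σ) parametrization to keep σ > 0, rather than Hayashi's concavity-restoring reparametrization (β/σ, 1/σ).

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(4)
    n, c = 3000, 0.0
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true, sigma_true = np.array([0.5, 1.0]), 1.5
    ystar = X @ beta_true + sigma_true * rng.normal(size=n)
    y = np.maximum(ystar, c)                       # censored at known c

    def neg_loglik(theta):
        beta, sigma = theta[:-1], np.exp(theta[-1])
        xb = X @ beta
        ll_unc = norm.logpdf(y, loc=xb, scale=sigma)   # density part (y > c)
        ll_cen = norm.logcdf((c - xb) / sigma)         # mass part (y = c)
        return -np.sum(np.where(y > c, ll_unc, ll_cen))

    res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
    beta_hat, sigma_hat = res.x[:-1], np.exp(res.x[-1])
    print(beta_hat, sigma_hat)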
1.20 Inequalities

Bonferroni's Inequality (C&B 1.2.10–11) Bounds below the probability of an intersection in terms of individual events: P(A ∩ B) ≥ P(A) + P(B) − 1; or more generally,

    P(∩_{i=1}^n A_i) ≥ Σ_{i=1}^n P(A_i) − (n − 1) = 1 − Σ_{i=1}^n P(A_i^C).

Boole's Inequality (C&B 1.2.11) Bounds above the probability of a union in terms of individual events: P(∪_i A_i) ≤ Σ_i P(A_i).

Chebychev's Inequality (C&B 3.6.1–2, 5.8.3; Greene 66) A widely applicable but "necessarily conservative" inequality: for any r > 0 and g(x) nonnegative,

    P(g(X) ≥ r) ≤ E g(X)/r.

Letting g(x) = (x − μ)²/σ², where μ = E X and σ² = Var X, gives P(|X − μ| ≥ tσ) ≤ t⁻². See C&B 5.8.3 for a (complex) form that applies to the sample mean and variance rather than the (presumably unknown) population values.

Markov's Inequality (C&B 3.8.3) Like Chebychev, but imposes conditions on Y to provide more information at the equality point. If P(Y ≥ 0) = 1 (i.e., Y is nonnegative) and P(Y = 0) < 1 (i.e., Y is not trivially 0), and r > 0, then P(Y ≥ r) ≤ E Y/r with equality iff P(Y = r) = 1 − P(Y = 0).

Numerical inequality lemma (C&B 4.7.1) Let a and b be any positive numbers, and p and q positive (necessarily > 1) satisfying 1/p + 1/q = 1. Then (1/p)a^p + (1/q)b^q ≥ ab, with equality iff a^p = b^q.

Hölder's Inequality (C&B 4.7.2) Let positive p and q (necessarily > 1) satisfy 1/p + 1/q = 1; then

    |E XY| ≤ E|XY| ≤ (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.

Cauchy-Schwarz Inequality (C&B 4.7.3; Wikipedia)

    |E XY| ≤ E|XY| ≤ √((E X²)(E Y²)).

This is Hölder's Inequality with p = q = 2. Implies that Cov(X, Y) ≤ √(Var(X) Var(Y)). Equality obtains iff X and Y are linearly dependent.

Minkowski's Inequality (C&B 4.7.5)

    [E|X + Y|^p]^{1/p} ≤ [E|X|^p]^{1/p} + [E|Y|^p]^{1/p}

for p ≥ 1. Implies that if X and Y have finite pth moment, then so does X + Y. Implies that E|X̄|^p ≤ E|X|^p.

Jensen's Inequality (C&B 4.7.7) If g(·) is a convex function, then E g(X) ≥ g(E X). Equality obtains iff, for every line a + bx tangent to g(x) at x = E X, P(g(X) = a + bX) = 1 (i.e., if g(·) is linear almost everywhere in the support of X).
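A quick Monte Carlo illustration of two of these inequalities (the distribution is chosen arbitrarily):

    import numpy as np

    rng = np.random.default_rng(7)
    x = rng.exponential(scale=2.0, size=10**6)    # illustrative distribution

    # Jensen: for convex g, E g(X) >= g(E X). Here g(x) = x^2.
    print(np.mean(x ** 2), np.mean(x) ** 2)       # first exceeds second

    # Chebychev: P(|X - mu| >= t*sigma) <= 1/t^2 (a conservative bound).
    mu, sigma, t = x.mean(), x.std(), 2.0
    print(np.mean(np.abs(x - mu) >= t * sigma), 1 / t ** 2)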

1.21 Thoughts on MaCurdy questions

Default answer (the blanks ___ are filled in for the specific problem):

If we assume ___, we know that ___ is consistent, assuming correct specification and, e.g., ergodic stationarity, regressors orthogonal to the contemporary error term, instruments orthogonal to the contemporary error term, and appropriate rank condition(s) for identification.

We can test the null hypothesis that ___, which corresponds to ___ = 0, using a Wald test on Model ___. The statistic W ≡ ___ is asymptotically distributed χ²(___) under the null; thus we can(not) reject the null hypothesis. Note we have used normal/robust standard errors.

We can also test the null hypothesis using a distance principle test, where model ___ is the unrestricted model, and model ___ represents the restrictions imposed by the null hypothesis. Using a suitably normalized distance function Q_T ≡ ___, the statistic LR ≡ 2T(Q_r − Q_u) = ___ is asymptotically distributed χ²(___) under the null.

[Note we cannot use the multiple equation 2SLS estimates to conduct a distance principle test, since generating a suitably normalized distance function would require having estimates for the variance of the error for each equation (along with the SSR for each equation).]

[Note that the validity of this distance function requires the restricted and unrestricted models to use the same weighting matrix; since ___ ≠ ___, the test statistic is not valid. However, given the regression output available, this is the best we can do.]

Thus we can(not) reject the null hypothesis.

If we instead assume ___, our tests are no longer valid; in this case ___ is consistent (again, assuming correct specification), so. . .

Thoughts on models

1. Two models with and without a restriction is for LR testing.
2. A model that includes only fitted values of an endogenous variable is 2SLS.
3. A model that includes both a (potentially) endogenous variable and its fitted values or residuals is for a DWH test.
4. A model that includes lagged residuals is for testing serial correlation.

Which model and standard errors to use

A less restrictive model will still be consistent and asymptotically more efficient; it is only harmed in finite-sample properties; the same goes for robust standard errors.

Check for exact identification—models often simplify in these cases.

Look ahead at all the questions; often it's best to make an assumption early, since later questions will ask you to give up that assumption.

1. Heteroscedasticity across observations
   • Parametrize and use GLS.
   • Parametrize and use ML.
   • Others?
2. Homoscedasticity (across observations)
   • 3SLS is efficient GMM.
   • FIML is asymptotically equivalent to 3SLS.
3. Homoscedasticity (across observations), uncorrelated errors across equations (i.e., Φ diagonal)
   • 3SLS is efficient GMM; 2SLS is not efficient GMM (e.g., cannot use the minimized distance function for LR testing), but it is asymptotically distributed identically to 3SLS.
   • 2SLS should generally have better finite-sample properties since it uses a priori knowledge of the form of the S matrix.
   • FIML, LIML, 3SLS, and 2SLS are all asymptotically equivalent.
4. Homoscedasticity (across observations), uncorrelated errors across equations, homoscedasticity across equations (i.e., Φ = σ²I)
   • 2SLS and 3SLS are both efficient GMM; they are asymptotically equal and identically distributed.
   • 2SLS should generally have better finite-sample properties since it uses a priori knowledge of the form of the S matrix (it assumes that S is σ²I; even though it is actually only diagonal, the resulting estimator is numerically identical to what we would get if we only assumed diagonal S: equation-by-equation 2SLS).
   • FIML, LIML, 3SLS, and 2SLS are all asymptotically equivalent.

Hypothesis tests

Wald: basically always possible as long as the estimator is consistent and standard errors are calculated appropriately.

• Be careful when using formulae from Hayashi; they are generally based on the asymptotic variance matrix; regression output doesn't estimate this, it estimates 1/n times the asymptotic variance matrix.
• State assumptions that justify consistency of the estimation technique and the standard errors used.

Distance principle / LR

• Need to use 2n times a suitably normalized maximized objective function, i.e., one that satisfies "the property":

    Var[√T ∂Q_T/∂θ |_{θ₀}] =ᵃ ± ∂²Q_T/∂θ∂θ′ |_{θ₀}

(where =ᵃ denotes asymptotic equality).
• For efficient GMM, Q_T ≡ ½ (X′e)′ Ω̂⁻¹ (X′e) satisfies "the property," where Ω̂ consistently estimates the variance of √T X′e.
• We cannot generally use LR tests in the presence of heteroscedasticity.
• For GMM estimators, must use the efficient weighting matrix for LR to be valid.
• Restricted and unrestricted estimates should use the same weighting matrix to be valid.
• Including the 2n and the normalization, use:

1. MLE: 2·loglik. Reported value is usually loglik; required normalization is 2.
   • Valid no matter what, if the model is correctly specified.
2. LS: SSR/σ̂² = e′e/σ̂². Reported value is usually SSR; required normalization is 1/σ̂².
   • Invalid under heteroscedasticity.
   • Validity under serial correlation???
   • Make sure to use the same estimate σ̂² in both restricted and unrestricted models (typically SSR/n or SSR/(n−k) from one of the two models).
3. 2SLS (single-equation): (1/σ̂²) e′P_z e. Reported value is usually e′P_z e; required normalization is 1/σ̂².
   • Invalid under heteroscedasticity (not efficient GMM).
   • Validity under serial correlation???
   • Make sure to use the same estimate for σ̂² in both restricted and unrestricted models (typically SSR/n or SSR/(n−k) from one of the two models).
4. SUR / JGLS: e′(Φ̂⁻¹ ⊗ I)e. Reported value is usually as desired; no normalization is needed.
   • Invalid under heteroscedasticity across observations (not efficient GMM; however, it's fine with heteroscedasticity across equations).
   • Validity under serial correlation???
   • Watch out: this isn't actually valid, since restricted and unrestricted estimation will use different weighting matrices; we often comment that this isn't valid, but use it anyway since that's the best output available.
5. 2SLS (multiple-equation): e′(diag(σ̂₁², …, σ̂_M²)⁻¹ ⊗ P_z)e. Reported value is usually e′(I ⊗ P_z)e; no normalization can get us a usable distance function.
   • Invalid under heteroscedasticity across observations (not efficient GMM; however, it's fine with heteroscedasticity across equations).
   • Invalid if errors are correlated across equations (i.e., Φ is not a diagonal matrix).
   • Validity under serial correlation???
   • I don't think any of the regression programs actually give this output (they typically give e′(I ⊗ P_z)e; maximizing this objective function gives the same estimator, but not the same maximized objective function); thus we can't do LR testing here, even though it may be theoretically valid.
6. 3SLS: e′(Φ̂⁻¹ ⊗ P_z)e. Reported value is usually as desired; no normalization is needed.
   • Invalid under heteroscedasticity across observations (not efficient GMM; however, it's fine with heteroscedasticity across equations).
   • Validity under serial correlation???
   • Watch out: this isn't actually valid, since restricted and unrestricted estimation will use different weighting matrices; we often comment that this isn't valid, but use it anyway since that's the best output available.
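A compact numeric illustration of the Wald and LS distance-principle (LR) statistics above, on simulated data (the DGP is illustrative); note the same σ̂² is used in both the restricted and unrestricted distance functions, as required.

    import numpy as np

    rng = np.random.default_rng(5)
    n = 1000
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    y = X @ np.array([1.0, 0.5, 0.0]) + rng.normal(size=n)  # b3 = 0 is true

    # Unrestricted OLS, and restricted OLS imposing H0: b3 = 0.
    b_u = np.linalg.lstsq(X, y, rcond=None)[0]
    ssr_u = np.sum((y - X @ b_u) ** 2)
    Xr = X[:, :2]
    b_r = np.linalg.lstsq(Xr, y, rcond=None)[0]
    ssr_r = np.sum((y - Xr @ b_r) ** 2)

    # LS distance-principle statistic: (SSR_r - SSR_u)/sigma2_hat, using the
    # SAME sigma2_hat (here SSR_u/n) in both terms; ~ chi2(1) under H0.
    sigma2_hat = ssr_u / n
    LR = (ssr_r - ssr_u) / sigma2_hat

    # Wald statistic for the same hypothesis.
    V = sigma2_hat * np.linalg.inv(X.T @ X)       # estimated Var(b_u)
    W = b_u[2] ** 2 / V[2, 2]
    print(LR, W)   # both ~ chi2(1) under H0; close in large samples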

Multiple equation systems

1. Structural equation: can have multiple endogenous variables; typically written as explicitly solved for one endogenous variable.
2. Reduced form equation: only one endogenous variable per equation (and one equation per endogenous variable).

Instruments

1. Must be correlated with the endogenous variable and uncorrelated with the error.
2. Typically things that are predetermined—not meant technically; they were actually determined before the endogenous variable.
3. Predetermined variables, lagged endogenous variables, and interactions of the above.
4. In 2SLS, check the quality of the instruments' correlation with the endogenous variable (note this does not check the lack of correlation with the error) by looking at the first-stage regression:
   • R²;
   • t-statistics on each instrument;
   • F-statistic for the model as a whole, and potentially the F-statistic on excluded instruments.

Omitted variables

When we get asked to find out what happens if variables are excluded (i.e., an incorrectly specified model is estimated), a good tool is the Frisch-Waugh Theorem.

Finding probability limits/showing consistency (a numerical illustration appears after the lists below)

1. Solve as an explicit function of the data and use an LLN with CMT/Slutsky.
2. Solve as an explicit function of the data and show bias → 0 and variance → 0.
3. Solve explicitly, find the probability that |θ̂ − θ₀| < ε, and show the limit of this probability is 1 (per C&B 468).
4. MaCurdy: if the estimator is defined by L_T(θ̂) ≡ (1/T) Σ_t l_t(θ̂) = 0, show that l_t(θ₀) ∼ niid(0), and that it satisfies an LLN so that L_T(θ₀) →p 0.
5. MaCurdy: if the estimator is defined by minimizing H_T(θ̂)′ M_T H_T(θ̂) with H_T ≡ (1/T) Σ_t h_t, show that h_t(θ₀) ∼ niid(0), and that it satisfies an LLN so that H_T(θ₀) →p 0.
6. Aprajit/Hayashi: general consistency theorems with and without a compact parameter space.

Finding asymptotic distributions

1. CLTs for:
   • iid sample
   • niid sample
   • ergodic stationary process
   • ergodic stationary mds
   • MA(∞)
2. Delta method
3. MLE is asymptotically normal with variance equal to the inverse of the Fisher info (for a single observation, not the joint distribution)

Unreliable standard errors

1. Remember standard errors are asymptotic estimates except in Gaussian OLS; therefore finite-sample inference may be inaccurate.
2. If you run a multiple-stage regression technique in separate stages (e.g., sticking in fitted values along the way).
3. If you stick something into the regression that doesn't "belong" (e.g., fitted values for a DWH test—although for some reason this one may be OK; inverse Mills for sample selection, …).
4. Heteroscedastic errors (when not using robust standard errors).
5. Serially correlated errors (when not using HAC standard errors).
6. Poor instruments (see above).
7. When regression assumptions fail (e.g., using regular standard errors when inappropriate, failure of fourth-moment assumptions, …).
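The numerical illustration promised above, of finding a probability limit by simulation: with an endogenous regressor, the OLS slope converges, but to β + Cov(z, u)/Var(z) rather than to β (all numbers are illustrative).

    import numpy as np

    rng = np.random.default_rng(6)
    beta = 1.0
    for n in [10**3, 10**4, 10**5, 10**6]:
        u = rng.normal(size=n)
        z = rng.normal(size=n) + 0.5 * u     # Cov(z,u) = 0.5, Var(z) = 1.25
        y = beta * z + u
        b_ols = (z @ y) / (z @ z)
        print(n, b_ols)                      # -> 1 + 0.5/1.25 = 1.4, not 1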

2 Microeconomics

2.1 Choice Theory

Rational preference relation (Choice 4; MWG 6–7) A binary relation ≽ is a rational preference relation iff it satisfies

1. Completeness: ∀x, y, x ≽ y ∨ y ≽ x (NB: implies x ≽ x);
2. Transitivity: ∀x, y, z, (x ≽ y ∧ y ≽ z) ⟹ x ≽ z (which rules out cycles, except where there's indifference).

If ≽ is rational, then ≻ is both irreflexive and transitive; ∼ is reflexive, transitive, and symmetric; and x ≻ y ≽ z ⟹ x ≻ z.

Choice rule (Choice 6) Given a choice set B and preference relation ≽, the choice rule C(B, ≽) ≡ {x ∈ B : ∀y ∈ B, x ≽ y}. This correspondence gives the set of "best" elements of B. If ≽ is complete and transitive and B is finite and non-empty, then C(B, ≽) ≠ ∅.

Revealed preference (Choice 6–8; MWG 11) We observe choices and deduce a preference relation. Consider revealed preferences C: 2^B → 2^B satisfying ∀A, C(A) ⊆ A. Assuming the revealed preference sets are always non-empty, there is a well-defined preference relation ≽ (complete and transitive) satisfying ∀A, C(A, ≽) = C(A) iff C satisfies HARP (or WARP, and the set of budget sets contains all subsets of up to three elements).

Houthaker's Axiom of Revealed Preferences (Choice 6–8) A set of revealed preferences C: 2^B → 2^B satisfies HARP iff ∀x, y ∈ U ∩ V such that x ∈ C(U) and y ∈ C(V), it is also the case that x ∈ C(V) and y ∈ C(U). In other words, suppose two different choice sets both contain x and y; if x is preferred to all elements of one choice set, and y is preferred to all elements of the other, then x is also preferred to all elements of the second, and y is also preferred to all elements of the first.

Weak Axiom of Revealed Preference (MWG 10–11) Choice structure (B, C(·))—where B is the set of budget sets—satisfies WARP iff B, B′ ∈ B; x, y ∈ B; x, y ∈ B′; x ∈ C(B); and y ∈ C(B′) together imply that x ∈ C(B′). (Basically HARP, but only for budget sets.)

Generalized Axiom of Revealed Preference (Micro P.S. 1.3) A set of revealed preferences C: A → B satisfies GARP if for any sequences A₁, …, A_n and x₁, …, x_n where

1. ∀i ∈ {1, …, n}, x_i ∈ A_i;
2. ∀i ∈ {1, …, n−1}, x_{i+1} ∈ C(A_i);
3. x₁ ∈ C(A_n);

then x_i ∈ C(A_i) for all i. (That is, there are no revealed preference cycles except for revealed indifference.)

Utility function (Choice 9–13; MWG 9) Utility function u: X → R represents ≽ on X iff x ≽ y ⟺ u(x) ≥ u(y).

1. This turns the choice rule into a maximization problem: C(B, ≽) = argmax_{y∈B} u(y).
2. A preference relation can be represented by a utility function only if it is rational (complete and transitive).
3. If |X| is finite, then any rational preference relation can be represented by a utility function; if |X| is infinite, this is not necessarily the case.
4. If X ⊆ R^n, then ≽ (complete, transitive) can be represented by a continuous utility function iff ≽ is continuous (i.e., lim_{n→∞}(x_n, y_n) = (x, y) and ∀n, x_n ≽ y_n imply x ≽ y).
5. The property of representing ≽ on X is ordinal (i.e., invariant to monotone increasing transformations).
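A tiny pure-Python check of HARP on hypothetical choice data (the menus and choices are invented for illustration; C maps each menu, a frozenset, to its chosen set):

    from itertools import combinations

    C = {
        frozenset({"a", "b"}): {"a"},
        frozenset({"b", "c"}): {"b"},
        frozenset({"a", "b", "c"}): {"a"},
    }

    def satisfies_harp(C):
        # For menus U, V and any x in C(U), y in C(V) with x, y in U & V,
        # HARP requires x in C(V) and y in C(U).
        for U, V in combinations(C, 2):
            common = U & V
            for x in C[U] & common:
                for y in C[V] & common:
                    if x not in C[V] or y not in C[U]:
                        return False
        return True

    print(satisfies_harp(C))   # True for the data above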

Interpersonal comparison (Choice 13) It is difficult to weigh utility tradeoffs between people. Two possible systems are Rawls' "veil of ignorance" (which effectively makes all the choices one person's), and a system of "just noticeable differences" (which suffers transitivity issues).

Continuous preference (Choice 11; MWG 46–7, Micro P.S. 1-5) ≽ on X is continuous if for any sequence {(x_n, y_n)}_{n=1}^∞ with lim_{n→∞}(x_n, y_n) = (x, y) and ∀n, x_n ≽ y_n, we have x ≽ y. Equivalently, ≽ is continuous iff for all x, the upper and lower contour sets of x are both closed sets. ≽ is rational and continuous iff it can be represented by a continuous utility function.

Monotone preference (Choice 15–6; MWG 42–3) ≽ is monotone iff x ≥ y ⟹ x ≽ y (i.e., more of something is better). (MWG uses x ≫ y ⟹ x ≻ y; this is not equivalent.) Strictly/strongly monotone iff x > y ⟹ x ≻ y. ≽ is (notes) monotone iff u(·) is nondecreasing; ≽ is strictly monotone iff u(·) is monotone increasing. Strictly monotone ⟹ (notes or MWG) monotone.

Locally non-satiated preference (Choice 15–6; MWG 42–3) ≽ is locally non-satiated on X iff for any y ∈ X and ε > 0, there exists x ∈ X such that ‖x − y‖ ≤ ε and x ≻ y (i.e., there are no "thick" indifference curves). ≽ is locally non-satiated iff u(·) has no local maxima in X. Strictly monotone ⟹ MWG monotone ⟹ locally non-satiated (on, e.g., R₊ⁿ).

Convex preference (Choice 15–6; MWG 44–5) ≽ is convex on X iff (de facto, X is a convex set, and) x ≽ y and x′ ≽ y together imply that ∀t ∈ (0, 1), tx + (1−t)x′ ≽ y (i.e., one never gets worse off by mixing goods). Equivalently, ≽ is convex on X iff the upper contour set of any y ∈ X (i.e., {x ∈ X : x ≽ y}) is a convex set. Can be interpreted as diminishing marginal rates of substitution. ≽ is strictly convex on X iff X is a convex set, and x ≽ y and x′ ≽ y (with x ≠ x′) together imply that ∀t ∈ (0, 1), tx + (1−t)x′ ≻ y. ≽ is (strictly) convex iff u(·) is (strictly) quasi-concave.

Homothetic preference (MWG 45, Micro P.S. 1.6) ≽ is homothetic iff for all λ > 0, x ≽ y ⟺ λx ≽ λy. (MWG uses ∀λ ≥ 0, x ∼ y ⟹ λx ∼ λy.) A continuous preference relation is homothetic iff it can be represented by a utility function that is homogeneous of degree one (note it can also be represented by utility functions that aren't).

Separable preferences (Choice 18–9) Suppose ≽ on X × Y is represented by u(x, y). Then preferences over x do not depend on y iff there exist functions v: X → R and U: R × Y → R such that U is increasing in its first argument and ∀(x, y), u(x, y) = U(v(x), y). Note that this property is asymmetric. Preferences over x given y will be represented by v(x), regardless of y.

Quasi-linear preferences (Choice 20–1; MWG 45) Suppose ≽ on X = R × Y is complete and transitive, and that

1. The numeraire good ("good 1") is valuable: (t, y) ≽ (t′, y) iff t ≥ t′;
2. Compensation is possible: for every y, y′ ∈ Y, there exists some t ∈ R such that (0, y) ∼ (t, y′);
3. No wealth effects: if (t, y) ≽ (t′, y′), then for all d ∈ R, (t + d, y) ≽ (t′ + d, y′).

Then there exists a utility function representing ≽ of the form u(t, y) = t + v(y) for some v: Y → R. (Note it can also be represented by utility functions that aren't of this form.) Conversely, any preference relation ≽ on X = R × Y represented by a utility function of the form u(t, y) = t + v(y) satisfies the above conditions. (MWG uses a slightly different formulation.)

Lexicographic preferences (MWG 46) A preference relation ≽ on R² defined by (x, y) ≽ (x′, y′) iff x > x′ or (x = x′ ∧ y ≥ y′). Lexicographic preferences are complete, transitive, strongly monotone, and strictly convex; however, they are not continuous and cannot be represented by any utility function.

2.2 Producer theory

Competitive producer behavior (Producer 1–2) Firms choose a production plan (technologically feasible set of inputs and outputs) to maximize profits. Assumptions include:

1. Firms are price takers (applies to both input and output markets);
2. Technology is exogenously given;
3. Firms maximize profits; this should be true as long as
   • the firm is competitive,
   • there is no uncertainty about profits, and
   • managers are perfectly controlled by owners.

Production plan (Producer 4) A vector y = (y₁, …, y_n) ∈ R^n where an output has y_k > 0 and an input has y_k < 0.

Production set (Producer 4) Set Y ⊆ R^n of feasible production plans; generally assumed to be non-empty and closed.

Free disposal (Producer 5) y ∈ Y and y′ ≤ y imply y′ ∈ Y.

Shutdown (Producer 5) 0 ∈ Y.

Nonincreasing returns to scale (Producer 5) y ∈ Y implies αy ∈ Y for all α ∈ [0, 1]. Implies shutdown.

Nondecreasing returns to scale (Producer 5, Micro P.S. 2-1) y ∈ Y implies αy ∈ Y for all α ≥ 1. Along with shutdown, implies that π(p) = 0 or π(p) = +∞ for all p.

Constant returns to scale (Producer 5) y ∈ Y implies αy ∈ Y for all α ≥ 0; i.e., both nonincreasing and nondecreasing returns to scale.

Convexity (Producer 6) y, y′ ∈ Y imply ty + (1−t)y′ ∈ Y for all t ∈ [0, 1]. Vaguely "nonincreasing returns to specialization." If 0 ∈ Y, then convexity implies nonincreasing returns to scale. Strictly convex iff for t ∈ (0, 1), the convex combination is in the interior of Y.

Transformation function (Producer 6, 24) A function T: R^n → R with T(y) ≤ 0 ⟺ y ∈ Y. Can be interpreted as the amount of technical progress required to make y feasible. The set {y : T(y) = 0} is the transformation frontier. The Kuhn-Tucker FOC gives the necessary condition ∇T(y*) = λp, which means the price vector is normal to the production possibility frontier at the optimal production plan.

Marginal rate of transformation (Producer 6–7) When the transformation function T is differentiable, the MRT between goods k and l is

    MRT_{k,l}(y) ≡ [∂T(y)/∂y_l] / [∂T(y)/∂y_k].

Measures the extra amount of good k that can be obtained per unit reduction of good l. Equals the slope of the transformation frontier.

Production function (Producer 7) For a firm with only a single output q (and inputs −z), defined as f(z) ≡ max q such that T(q, −z) ≤ 0. Thus Y = {(q, −z) : q ≤ f(z)}, allowing for free disposal.

Marginal rate of technological substitution (Producer 7) When the production function f is differentiable, the MRTS between goods k and l is

    MRTS_{k,l}(z) ≡ [∂f(z)/∂z_l] / [∂f(z)/∂z_k].

Measures how much of input k must be used in place of one unit of input l to maintain the same level of output. Equals the slope of the isoquant.

Profit maximization (Producer 7–8) The firm's optimal production decisions are given by the correspondence y: R^n ⇉ R^n,

    y(p) ≡ argmax_{y∈Y} p · y = {y ∈ Y : p · y = π(p)}.

Resulting profits are given by π(p) ≡ sup_{y∈Y} p · y.

Rationalization: profit maximization functions (Producer 9–11, 13)

1. Profit function π(·) is rationalized by production set Y iff ∀p, π(p) = sup_{y∈Y} p · y.
2. Supply correspondence y(·) is rationalized by production set Y iff ∀p, y(p) ⊆ argmax_{y∈Y} p · y.
3. π(·) or y(·) is rationalizable if it is rationalized by some production set.
4. π(·) and y(·) are jointly rationalizable if they are both rationalized by the same production set.
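A small numerical sketch of profit maximization over a discretized production set Y = {(q, −z) : q ≤ √z} (an illustrative technology); it also verifies the WAPM-implied inequality (p′ − p)·(y(p′) − y(p)) ≥ 0, i.e., the Law of Supply discussed below.

    import numpy as np

    z_grid = np.linspace(0.0, 25.0, 5001)
    plans = np.column_stack([np.sqrt(z_grid), -z_grid])   # (output, -input)

    def y_of(p):
        return plans[np.argmax(plans @ p)]                # argmax p.y over Y

    p1, p2 = np.array([3.0, 2.0]), np.array([2.0, 3.0])
    y1, y2 = y_of(p1), y_of(p2)
    print((p2 - p1) @ (y2 - y1) >= 0)                     # True (Law of Supply)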

We seek a Y that rationalizes both y(·) and π(·). Consider an "inner bound": all production plans the firm chooses must be feasible (Y^I ≡ ∪_{p∈P} y(p)). Consider an "outer bound": Y can only include points that don't give higher profits than π(p) (Y^O ≡ {y : p · y ≤ π(p) for all p ∈ P}); if Y is convex and closed and has free disposal, and P = R₊ⁿ \ {0}, then Y^O = Y. A nonempty-valued supply correspondence y(·) and profit function π(·) on a price set are jointly rationalized by production set Y iff:

1. p · y = π(p) for all y ∈ y(p) (adding-up);
2. Y^I ⊆ Y ⊆ Y^O; i.e., p · y′ ≤ π(p) for all p, p′, and all y′ ∈ y(p′) (Weak Axiom of Profit Maximization).

If we observe a firm's choices for all positive price vectors on an open convex set P, then necessary conditions for rationalizability include:

1. π(·) must be a convex function;
2. π(·) must be homogeneous of degree one; i.e., π(λp) = λπ(p) for all p ∈ P and λ > 0;
3. y(·) must be homogeneous of degree zero; i.e., y(λp) = y(p) for all p ∈ P and λ > 0.

Loss function (Producer 12) L(p, y) ≡ π(p) − p · y. This is the loss from choosing y rather than the profit-maximizing feasible production plan. The outer bound can be written Y^O = {y : inf_p L(p, y) ≥ 0}.

Hotelling's Lemma (Producer 14) ∇π(p) = y(p), assuming differentiable π(·). Equivalently, ∇_p L(p, y)|_{p=p′} = ∇π(p′) − y = 0 for all y ∈ y(p′). An example of the Envelope Theorem. Implies that if π(·) is differentiable at p, then y(p) must be a singleton.

Substitution matrix (Producer 15–6) The Jacobian of the optimal supply function, Dy(p) = [∂y_i/∂p_j]. By Hotelling's Lemma, Dy(p) = D²π(p) (the Hessian of the profit function), hence the substitution matrix is symmetric. Convexity of π(·) implies positive semidefiniteness.

Law of Supply (Producer 16) (p′ − p) · (y(p′) − y(p)) ≥ 0; i.e., supply curves are upward-sloping. The Law of Supply is the finite-difference equivalent of positive semidefiniteness of the substitution matrix. Follows from WAPM (p · y(p) ≥ p · y(p′)).

Rationalization: y(·) and differentiable π(·) (Producer 15) y: P → R^n (the correspondence ensured to be a function by Hotelling's Lemma, given differentiable π(·)) and differentiable π: P → R on an open convex set P ⊆ R^n are jointly rationalizable iff

1. p · y(p) = π(p) (adding-up);
2. ∇π(p) = y(p) (Hotelling's Lemma);
3. π(·) is convex.

The latter two properties imply WAPM. The second describes the first-order condition of the maximization problem; the third describes the second-order condition.

Rationalization: differentiable y(·) (Producer 16) Differentiable y: P → R^n on an open convex set P ⊆ R^n is rationalizable iff

1. y(·) is homogeneous of degree zero;
2. The Jacobian Dy(p) is symmetric and positive semidefinite.

We construct π(·) by adding-up, ensure Hotelling's Lemma by symmetry and homogeneity of degree zero, and ensure convexity of π(·) by Hotelling's Lemma and positive semidefiniteness.

Rationalization: differentiable π(·) (Producer 17) Differentiable π: P → R on a convex set P ⊆ R^n is rationalizable iff

1. π(·) is homogeneous of degree one;
2. π(·) is convex.

We construct y(·) by Hotelling's Lemma, and ensure adding-up by homogeneity of degree one; convexity of π(·) is given.

Rationalization: general y(·) and π(·) (Producer 17–9) y: P ⇉ R^n and π: P → R on a convex set P ⊆ R^n are jointly rationalizable iff for any selection ŷ(p) ∈ y(p),

1. p · ŷ(p) = π(p) (adding-up);
2. (Producer Surplus Formula) for any p, p′ ∈ P,
       π(p′) = π(p) + ∫₀¹ (p′ − p) · ŷ(p + λ(p′ − p)) dλ;
3. (p′ − p) · (ŷ(p′) − ŷ(p)) ≥ 0 (Law of Supply).

Producer Surplus Formula (Producer 17–20) π(p′) = π(p) + ∫₀¹ (p′ − p) · ŷ(p + λ(p′ − p)) dλ.

1. "Works in the opposite direction of Hotelling's Lemma: it recovers the firm's profits from its choices, rather than the other way around."
2. If π(·) is differentiable, integrating Hotelling's Lemma along the linear path from p to p′ gives the PSF; however, the PSF is more general (it doesn't require differentiability of π(·)).
3. As written, the integral is along a linear path, but it is actually path-independent.
4. The PSF allows calculation of the change in profits when the price of good i changes by knowing only the supply function for good i; we need not know the prices or supply functions for other goods: π(p_{−i}, b) − π(p_{−i}, a) = ∫_a^b ŷ_i(p_i) dp_i.

Single-output case (Producer 22) For a single-output firm with free disposal, the production set is described as {(q, −z) : z ∈ R₊^m, q ∈ [0, f(z)]}. With positive output price p, profit-maximization requires q = f(z), so firms maximize max_{z∈R₊^m} p f(z) − w · z, where w ∈ R₊^m are input prices.

Cost minimization (Producer 22, Micro P.S. 2-4) For a fixed output level q ≥ 0, firms minimize costs, choosing inputs according to a conditional factor demand correspondence:

    c(q, w) ≡ inf_{z : f(z)≥q} w · z;
    Z*(q, w) ≡ argmin_{z : f(z)≥q} w · z = {z : f(z) ≥ q, and w · z = c(q, w)}.

Once these problems are solved, firms solve max_{q≥0} pq − c(q, w). By the envelope theorem, ∂c(q, w)/∂w = Z*(q, w).

Rationalization: single-output cost function (Producer 23, Micro P.S. 2-2) Conditional factor demand function z: R × W ⇉ R^n and differentiable cost function c: R × W → R for a fixed output q on an open convex set W ⊆ R^m of input prices are jointly rationalizable iff

1. c(q, w) = w · z(q, w) (adding-up);
2. ∇_w c(q, w) = z(q, w) (Shephard's Lemma);
3. c(q, ·) is concave.

Other necessary properties follow from corresponding properties of profit-maximization, e.g.,

1. c(q, ·) is homogeneous of degree one in w;
2. Z*(q, ·) is homogeneous of degree zero in w;
3. If Z*(q, ·) is differentiable, then the matrix D_w Z*(q, w) = D²_w c(q, w) is symmetric and negative semidefinite;
4. Under free disposal, c(·, w) is nondecreasing in q;
5. If the production function has nondecreasing (nonincreasing) RTS, the average cost function c(q, w)/q is nonincreasing (nondecreasing) in q;
6. If the production function is concave, the cost function c(q, w) is convex in q.
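A finite-difference check of Hotelling's Lemma for an illustrative single-output technology f(z) = √z, whose profit function has the closed form π(p, w) = p²/(4w) with netput y(p, w) = (p/(2w), −(p/(2w))²):

    import numpy as np

    def profit(p, w):
        return p ** 2 / (4 * w)

    def netput(p, w):
        z = (p / (2 * w)) ** 2
        return np.array([p / (2 * w), -z])        # (output, -input)

    p, w, h = 3.0, 2.0, 1e-6
    grad = np.array([
        (profit(p + h, w) - profit(p - h, w)) / (2 * h),   # d pi / d p
        (profit(p, w + h) - profit(p, w - h)) / (2 * h),   # d pi / d w
    ])
    print(grad, netput(p, w))   # gradient of pi equals the netput vector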

Monopoly pricing (MWG 384–7) Suppose demand at price p is x(p), continuous and strictly decreasing at all p for which x(p) > 0. Suppose the monopolist faces cost function c(q). The monopolist solves max_p p x(p) − c(x(p)) for the optimal price, or max_{q≥0} p(q)q − c(q) for the optimal quantity (where p(·) = x⁻¹(·) is the inverse demand function). Further assumptions:

1. p(q), c(q) continuous and twice differentiable at all q ≥ 0;
2. p(0) > c′(0) (ensures that supply and demand curves cross);
3. There exists a unique socially optimal output level q^o ∈ (0, ∞) such that p(q^o) = c′(q^o).

A solution q^m ∈ [0, q^o] exists, and satisfies the FOC p′(q^m)q^m + p(q^m) = c′(q^m). If p′(q) < 0, then p(q^m) > c′(q^m); i.e., the monopoly price exceeds the socially optimal price.

2.3 Comparative statics

Implicit function theorem (Producer 27–8) Consider x(t) ≡ argmax_{x∈X} F(x, t). Suppose:

1. F is twice continuously differentiable;
2. X is convex;
3. F_xx < 0 (strict concavity of F in x; together with convexity of X, this ensures a unique maximizer);
4. ∀t, x(t) is in the interior of X.

Then the unique maximizer is given by F_x(x(t), t) = 0, and

    x′(t) = − F_xt(x(t), t) / F_xx(x(t), t).

Note by strict concavity, the denominator is negative, so x′(t) and F_xt(x(t), t) have the same sign.

Envelope theorem (Clayton Producer I 6–8) Given a constrained optimization v(θ) = max_x f(x, θ) such that g₁(x, θ) ≤ b₁; …; g_K(x, θ) ≤ b_K, comparative statics on the value function are given by

    ∂v/∂θ_i = ∂f/∂θ_i |_{x*} − Σ_{k=1}^K λ_k ∂g_k/∂θ_i |_{x*} = ∂L/∂θ_i |_{x*}

(for Lagrangian L) for all θ such that the set of binding constraints does not change in an open neighborhood. Can be thought of as an application first of the chain rule, and then of the FOCs.

Envelope theorem (integral form) (Clayton Producer II 9–10) [a.k.a. Holmstrom's Lemma] Given an optimization v(q) = max_x f(x, q), the envelope theorem gives us v′(q) = f_q(x(q), q). Integrating gives

    v(q₂) = v(q₁) + ∫_{q₁}^{q₂} ∂f/∂q (x(q), q) dq.

Increasing differences (Producer 30–1, Micro P.S. 2-4′) F: X × T → R (with X, T ⊆ R) has ID (a.k.a. weakly increasing differences) iff for all x′ > x and t′ > t,

    F(x′, t′) + F(x, t) ≥ F(x′, t) + F(x, t′).

Strictly/strongly increasing differences (SID) iff F(x′, t′) + F(x, t) > F(x′, t) + F(x, t′). Assuming F(·, ·) is sufficiently smooth, all of the following are equivalent:

1. F has ID;
2. F_x(x, t) is nondecreasing in t for all x;
3. F_t(x, t) is nondecreasing in x for all t;
4. F_xt(x, t) ≥ 0 for all (x, t);
5. F(x, t) is supermodular.

Additional results:

1. If F(·, ·) and G(·, ·) both have ID, then for all α, β ≥ 0, the function αF + βG also has ID.
2. If F has ID, and g₁(·) and g₂(·) are nondecreasing functions, then F(g₁(·), g₂(·)) has ID.
3. Suppose h(·) is twice differentiable. Then h(x − t) has ID in (x, t) iff h(·) is concave.

Supermodularity (Producer 37) F: X → R on a sublattice X is supermodular iff for all x, y ∈ X, we have F(x ∧ y) + F(x ∨ y) ≥ F(x) + F(y). If X is a product set, F(·) is supermodular iff it has ID in all pairs (x_i, x_j) with i ≠ j (holding the other variables x_{−ij} fixed).

Submodularity (Producer 41) F(·) is submodular iff −F(·) is supermodular.

Topkis' Theorem (Producer 31–2, 8) If

1. F: X × T → R (with X, T ⊆ R) has ID,
2. t′ > t,
3. x ∈ X*(t) ≡ argmax_{ξ∈X} F(ξ, t), and x′ ∈ X*(t′),

then min{x, x′} ∈ X*(t) and max{x, x′} ∈ X*(t′). In other words, X*(t) ≤ X*(t′) in strong set order. This implies sup X*(·) and inf X*(·) are nondecreasing; if X*(·) is single-valued, then X*(·) is nondecreasing.

If F: X × T → R (with X a lattice and T fully ordered) is supermodular in x and has ID in (x, t); t′ > t; and x ∈ X*(t) and x′ ∈ X*(t′), then (x ∧ x′) ∈ X*(t) and (x ∨ x′) ∈ X*(t′). In other words, X*(·) is nondecreasing in t in the stronger set order.

Monotone Selection Theorem (Producer 32) Analogue of Topkis' Theorem for SID. If F: X × T → R with X, T ⊆ R has SID, t′ > t, x ∈ X*(t), and x′ ∈ X*(t′), then x′ ≥ x.

Milgrom-Shannon Monotonicity Theorem (Producer 34) X*(t) ≡ argmax_{x∈X} F(x, t) is nondecreasing in t in SSO for all sets X ⊆ R iff F has the single-crossing condition (which is non-symmetric): for all x′ > x and t′ > t,

    F(x′, t) ≥ F(x, t) ⟹ F(x′, t′) ≥ F(x, t′), and
    F(x′, t) > F(x, t) ⟹ F(x′, t′) > F(x, t′).

MCS: robustness to objective function perturbation (Producer 34–5) [Milgrom-Shannon] X*(t) ≡ argmax_{x∈X} [F(x, t) + G(x)] is nondecreasing in t in SSO for all functions G: X → R iff F(·) has ID. Note Topkis gives sufficiency of ID.

Complement inputs (Producer 40–2) Restrict attention to price vectors (p, w) ∈ R₊^{m+1} at which the input demand correspondence z(p, w) is single-valued. If the production function f(z) is increasing and supermodular, then z(p, w) is nondecreasing in p and nonincreasing in w. That is, supermodularity of the production function implies price-theoretic complementarity of inputs. If the profit function π(p, w) is continuously differentiable, then z_i(p, w) is nonincreasing in w_j for all i ≠ j iff π(p, w) is supermodular in w.

Substitute inputs (Producer 41–2) Suppose there are only two inputs. Restrict attention to price vectors (p, w) ∈ R₊³ at which the input demand correspondence z(p, w) is single-valued. If the production function f(z) is increasing and submodular, then z₁(p, w) is nondecreasing in w₂ and z₂(p, w) is nondecreasing in w₁. That is, submodularity of the production function implies price-theoretic substitutability of inputs in the two-input case. If there are ≥ 3 inputs, feedback between inputs with unchanging prices makes for unpredictable results. If the profit function π(p, w) is continuously differentiable, then z_i(p, w) is nondecreasing in w_j for all i ≠ j iff π(p, w) is submodular in w.

LeChatelier principle (Producer 42–45) Argument (a.k.a. the Samuelson-LeChatelier principle) that firms react more to input price changes in the long run than in the short run, because the firm has more inputs it can adjust in the long run. Does not consistently hold; it only holds if each pair of inputs are substitutes everywhere or complements everywhere. Suppose a twice differentiable production function f(k, l) satisfies either f_kl ≥ 0 everywhere, or f_kl ≤ 0 everywhere. Then if the wage w_l increases (decreases), the firm's labor demand will decrease (increase), and the decrease (increase) will be larger in the long run than in the short run.

2.4 Consumer theory

Budget set (Consumer 2) Given prices p and wealth w, B(p, w) ≡ {x ∈ R^n_+ : p · x ≤ w}.

Utility maximization problem (Consumer 1–2, 6–8) max_{x∈R^n_+} u(x) such that p · x ≤ w, or equivalently max_{x∈B(p,w)} u(x). Assumes:

1. Perfect information,
2. Price taking,
3. Linear prices,
4. Divisible goods.

Construct the Lagrangian L = u(x) + λ(w − p · x) + Σ_i μ_i x_i. If u is concave and differentiable, the Kuhn-Tucker conditions (FOCs, nonnegativity, complementary slackness) are necessary and sufficient. Thus ∂u/∂x_k ≤ λp_k with equality if x_k > 0. For any two goods consumed in positive quantities, p_j/p_k = (∂u/∂x_j)/(∂u/∂x_k) ≡ MRS_jk. The Lagrange multiplier on the budget constraint λ is the value in utils of an additional unit of wealth; i.e., the shadow price of wealth or marginal utility of wealth or income.

Indirect utility function (Consumer 3–4) v(p, w) ≡ sup_{x∈B(p,w)} u(x). Homogeneous of degree zero.

Marshallian demand correspondence (Consumer 3–4) [a.k.a. Walrasian or uncompensated demand] x : R^n_+ × R_+ ⇒ R^n_+ with x(p, w) ≡ {x ∈ B(p, w) : u(x) = v(p, w)} = argmax_{x∈B(p,w)} u(x).

1. Given continuous preferences, x(p, w) ≠ ∅ for p ≫ 0 and w ≥ 0.
2. Given convex preferences, x(p, w) is convex-valued.
3. Given strictly convex preferences, x(p, w) is single-valued.
4. x(p, w) is homogeneous of degree zero.

Walras' Law (Consumer 4) Given locally non-satiated preferences:

1. For x ∈ x(p, w), we have p · x = w (i.e., Marshallian demand is on the budget line, and we can replace the inequality constraint with equality in the consumer problem);
2. For z ∈ z(p) (where z(·) is excess demand z(p) ≡ x(p, p · e) − e), we have p · z = 0;
3. v(p, w) is nonincreasing in p and strictly increasing in w.

Expenditure minimization problem (Consumer 9) min_{x∈R^n_+} p · x such that u(x) ≥ ū, where ū > u(0) and p ≫ 0. Finds the cheapest bundle that yields utility at least ū. Equivalent to cost minimization for a single-output firm with production function u. If p ≫ 0, u(·) is continuous, and ∃x̂ such that u(x̂) ≥ ū, then the EMP has a solution.

Expenditure function (Consumer 9) e(p, ū) ≡ min_{x∈R^n_+} p · x such that u(x) ≥ ū.

Hicksian demand correspondence (Consumer 9) [a.k.a. compensated demand] h : R^n_+ × R ⇒ R^n_+ with h(p, ū) ≡ argmin_{x∈R^n_+} p · x such that u(x) ≥ ū.

Relating Marshallian and Hicksian demand (Consumer 10) Suppose preferences are continuous and locally non-satiated, and p ≫ 0, w ≥ 0, ū ≥ u(0). Then:

1. x(p, w) = h(p, v(p, w)),
2. e(p, v(p, w)) = w,
3. h(p, ū) = x(p, e(p, ū)),
4. v(p, e(p, ū)) = ū.

[Diagram: the UMP–EMP square relating x(p, w), v(p, w), h(p, ū), and e(p, ū). Roy: x = −∇_p v / (∂v/∂w); Slutsky: ∂x_i/∂p_j + (∂x_i/∂w) x_j = ∂h_i/∂p_j; Shephard: h = ∇_p e; adding-up: e = p · h; identities v = u(x), x = h(p, v), h = x(p, e), v(p, e) = ū, e(p, v) = w.]

Rationalization: h and differentiable e (Consumer 11) Hicksian demand function h : P × R → R^n_+ and differentiable expenditure function e : P × R → R on an open convex set P ⊆ R^n are jointly rationalizable by expenditure-minimization for a given utility level ū of a monotone utility function iff:

1. e(p, ū) = p · h(p, ū) (adding-up—together with Shephard's Lemma, ensures e(·, ū) is homogeneous of degree one in prices);
2. ∇_p e(p, ū) = h(p, ū) (Shephard's Lemma—equivalent to the envelope condition applied to e(p, ū) = min_h p · h);
3. e(·, ū) is concave in prices.

Rationalization: differentiable h (Consumer 12) A continuously differentiable Hicksian demand function h : P × R → R^n_+ on an open convex set P ⊆ R^n is rationalizable by expenditure-minimization with a monotone utility function iff:

1. Hicksian demand is increasing in ū;
2. The Slutsky matrix

$$D_p h(p, \bar u) = \begin{pmatrix} \frac{\partial h_1(p,\bar u)}{\partial p_1} & \cdots & \frac{\partial h_n(p,\bar u)}{\partial p_1} \\ \vdots & \ddots & \vdots \\ \frac{\partial h_1(p,\bar u)}{\partial p_n} & \cdots & \frac{\partial h_n(p,\bar u)}{\partial p_n} \end{pmatrix}$$

is symmetric;
3. The Slutsky matrix is negative semidefinite (since e(·, ū) is concave in prices);
4. The Slutsky matrix satisfies D_p h(p, ū) p = 0 (i.e., h(·, ū) is homogeneous of degree zero in prices).

Rationalization: differentiable x (?) The Slutsky matrix can be generated using the Slutsky equation. Rationalizability requires Marshallian demand to be homogeneous of degree zero, and the Slutsky matrix to be symmetric and negative semidefinite. [Potentially also positive everywhere and/or increasing in w?]

Rationalization: differentiable e (?) Rationalizability requires e to be:

1. Homogeneous of degree one in prices;
2. Concave in prices;
3. Increasing in ū;
4. Positive everywhere, or equivalently nondecreasing in all prices.

Slutsky equation (Consumer 13–4) Relates Hicksian and Marshallian demand. Suppose preferences are continuous and locally non-satiated, p ≫ 0, and demand functions h(p, ū) and x(p, w) are single-valued and differentiable. Then for all i, j,

∂x_i(p, w)/∂p_j [Total] = ∂h_i(p, u(x(p, w)))/∂p_j [Substitution] − (∂x_i(p, w)/∂w) x_j(p, w) [Wealth],

or more concisely, ∂x_i/∂p_j = ∂h_i/∂p_j − (∂x_i/∂w) x_j. Derived by differentiating h_i(p, ū) = x_i(p, e(p, ū)) with respect to p_j and applying Shephard's lemma.

Normal good (Consumer 15) Marshallian demand x_i(p, w) increasing in w. By the Slutsky equation, normal goods must be regular.

Inferior good (Consumer 15) Marshallian demand x_i(p, w) decreasing in w.

Regular good (Consumer 15) Marshallian demand x_i(p, w) decreasing in p_i.

Giffen good (Consumer 15) Marshallian demand x_i(p, w) increasing in p_i. By the Slutsky equation, Giffen goods must be inferior.

Substitute goods (Consumer 15–6) Goods i and j are substitutes iff Hicksian demand h_i(p, ū) is increasing in p_j. Symmetric relationship. In a two-good world, the goods must be substitutes.

Complement goods (Consumer 15–6) Goods i and j are complements iff Hicksian demand h_i(p, ū) is decreasing in p_j. Symmetric relationship.
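The Slutsky-matrix conditions above can be spot-checked numerically. A minimal sketch, assuming a hypothetical Cobb-Douglas consumer (Marshallian demand x_1 = αw/p_1, x_2 = (1−α)w/p_2) and building the matrix from the Slutsky equation by finite differences:

```python
import numpy as np

alpha = 0.3
def x(p, w):                       # Cobb-Douglas Marshallian demand
    return np.array([alpha, 1 - alpha]) * w / p

def slutsky(p, w, eps=1e-6):
    n, xx = len(p), x(p, w)
    dx_dw = (x(p, w + eps) - xx) / eps
    S = np.empty((n, n))
    for j in range(n):
        dp = np.zeros(n); dp[j] = eps
        S[:, j] = (x(p + dp, w) - xx) / eps + xx[j] * dx_dw  # Slutsky equation
    return S

p, w = np.array([2.0, 3.0]), 10.0
S = slutsky(p, w)
print(np.allclose(S, S.T, atol=1e-4))                    # symmetric
print(np.all(np.linalg.eigvalsh((S + S.T) / 2) < 1e-6))  # negative semidefinite
print(np.allclose(S @ p, 0, atol=1e-4))                  # S p = 0
```

All three checks print True; the prices and wealth here are arbitrary illustrative values.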

Gross substitute (Consumer 15–7) Good i is a gross substitute for good j iff Marshallian demand x_i(p, w) is increasing in p_j. Not necessarily a symmetric relationship.

Gross complement (Consumer 15–7) Good i is a gross complement for good j iff Marshallian demand x_i(p, w) is decreasing in p_j. Not necessarily a symmetric relationship.

Engel curve (Consumer 15–6) [a.k.a. income expansion curve] For a given price p, the locus of Marshallian demands at various wealth levels.

Offer curve (Consumer 16–7) [a.k.a. price expansion path] For a given wealth w and prices (for goods other than good i) p_{−i}, the locus of Marshallian demands at various prices p_i.

Roy's identity (Consumer 17–8) Gives Marshallian demand from indirect utility:

x_i(p, w) = − (∂v(p, w)/∂p_i) / (∂v(p, w)/∂w).

Derived by differentiating v(p, e(p, ū)) = ū with respect to p and applying Shephard's lemma. Alternately, by applying the envelope theorem to the utility maximization problem v(p, w) = max_{x : p·x≤w} u(x) (giving ∂v/∂w = ∂L/∂w = λ and ∂v/∂p = ∂L/∂p = −λx).

Consumer welfare: price changes (Consumer 20–2, 4) The welfare change in utils when prices change from p to p′ is v(p′, w) − v(p, w), but because utility is ordinal, this is meaningless. It is more useful to have a dollar-denominated measure. So measure the amount of additional wealth required to reach some reference utility, generally either previous utility (CV) or new utility (EV).

If preferences are quasi-linear, then CV = EV. On any range where the good in question is either normal or inferior, min{CV, EV} ≤ CS ≤ max{CV, EV}.

Compensating variation (Consumer 21, 3) How much less wealth the consumer needs to achieve the same utility at prices p′ as she had at p (compensating for the price change—the consumer faces both new prices and new wealth).

CV ≡ e(p, ū) − e(p′, ū) = w − e(p′, ū),

which gives—when only price i is changing—the area to the left of the Hicksian demand curve corresponding to the old utility ū, by the consumer surplus formula and Shephard's Lemma:

CV = ∫_{p′_i}^{p_i} ∂e(p, ū)/∂p_i dp_i = ∫_{p′_i}^{p_i} h_i(p, ū) dp_i.

Equivalent variation (Consumer 21, 3) How much additional expenditure is required at old prices p to achieve the same utility as consumption at p′ (equivalent to the price change—the consumer faces either new prices or revised wealth).

EV ≡ e(p, ū′) − e(p′, ū′) = e(p, ū′) − w,

which gives—when only price i is changing—the area to the left of the Hicksian demand curve corresponding to the new utility ū′:

EV = ∫_{p′_i}^{p_i} ∂e(p, ū′)/∂p_i dp_i = ∫_{p′_i}^{p_i} h_i(p, ū′) dp_i.

Marshallian consumer surplus (Consumer 23–4) Area to the left of the Marshallian demand curve: CS ≡ ∫_{p′_i}^{p_i} x_i(p, w) dp_i.

Price index (Consumer 25–6) Given a basket of goods consumed at quantity x given price p, and quantity x′ given price p′,

1. Laspeyres index: p′·x / p·x = p′·x / e(p, ū) (basket is old purchases). Overestimates welfare effects of inflation due to substitution bias.
2. Paasche index: p′·x′ / p·x′ = e(p′, ū′) / p·x′ (basket is new purchases). Underestimates welfare effects of inflation.
3. Ideal index: e(p′, ū) / e(p, ū) for some fixed utility level ū, generally either u(x) or u(x′); the percentage compensation in the wealth of a consumer with utility ū needed to make him as well off at the new prices as he was at the old ones.

Paasche ≤ Ideal ≤ Laspeyres. Substitution biases result from using the same basket of goods at new and old prices. Other biases include:

1. New good bias,
2. Outlet bias.

Aggregating consumer demand (Consumer 29–32)

1. Can we predict aggregate demand knowing only aggregate wealth (not its distribution)? True iff indirect utility functions take Gorman form: v_i(p, w_i) = a_i(p) + b(p)w_i with the same function b(·) for all consumers.
2. Can aggregate demand be explained as though there were a single "positive representative consumer"?
3. (If 2 holds), can the welfare of the representative consumer be used as a proxy for some welfare aggregate of individual consumers? (i.e., Do we have a "normative representative consumer"?)

2.5 Choice under uncertainty

Lottery (Uncertainty 2–4) A vector of probabilities adding to 1 assigned to each possible outcome (prize). The set of lotteries for a given prize space is convex.

Preference axioms under uncertainty (Uncertainty 5–6) In addition to (usual) completeness and transitivity, assume preferences are:

1. Continuous: For any p, p′, p″ ∈ P with p ≿ p′ ≿ p″, there exists α ∈ [0, 1] such that αp + (1 − α)p″ ∼ p′.
2. Independent: [a.k.a. substitution axiom] For any p, p′, p″ ∈ P and α ∈ [0, 1], we have p ≿ p′ ⟺ αp + (1 − α)p″ ≿ αp′ + (1 − α)p″.

Expected utility function (Uncertainty 6–10) Utility function U : P → R has expected utility form iff there are numbers (u_1, ..., u_n) for each of the n (certain) outcomes such that for every p ∈ P, U(p) = Σ_i p_i u_i. Equivalently, for any p, p′ ∈ P and α ∈ [0, 1], we have U(αp + (1 − α)p′) = αU(p) + (1 − α)U(p′).

Unlike a more general utility function, an expected utility function is not merely ordinal—it is not invariant to arbitrary increasing transformations, only to affine transformations. If U(·) is an expected utility representation of ≿, then V(·) is also an expected utility representation of ≿ iff ∃a ∈ R, ∃b > 0 such that V(p) = a + bU(p) for all p ∈ P.

Preferences can be represented by an expected utility function iff they are complete, transitive, and satisfy continuity and independence (assuming |P| < ∞; otherwise we also need the "sure thing principle"). Obtains since both require indifference curves to be parallel straight lines.

Bernoulli utility function (Uncertainty 12) Assuming prize space X is an interval on the real line, the Bernoulli utility function u : X → R is assumed increasing and continuous.

von Neumann-Morgenstern utility function (Uncertainty 12) An expected utility representation of preferences over lotteries characterized by a cdf over prizes X (an interval on the real line). If F(x) is the probability of receiving less than or equal to x, and u(·) is the Bernoulli utility function, then the vN-M utility function is U(F) ≡ ∫ u(x) dF(x).
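A short computation of CV, EV, and Marshallian CS for a doubling of p_1, assuming a hypothetical Cobb-Douglas consumer (so e(p, ū) and v(p, w) have closed forms); it illustrates min{CV, EV} ≤ CS ≤ max{CV, EV} (all three are negative here, since the price rises):

```python
import numpy as np
from scipy.integrate import quad

alpha, w = 0.3, 10.0
p, p_new = np.array([1.0, 1.0]), np.array([2.0, 1.0])   # price of good 1 doubles

def v(p, w):    # Cobb-Douglas indirect utility
    return (np.log(w) - alpha*np.log(p[0]) - (1-alpha)*np.log(p[1])
            + alpha*np.log(alpha) + (1-alpha)*np.log(1-alpha))

def e(p, u):    # expenditure function (inverts v in w)
    return np.exp(u + alpha*np.log(p[0]) + (1-alpha)*np.log(p[1])
                  - alpha*np.log(alpha) - (1-alpha)*np.log(1-alpha))

CV = w - e(p_new, v(p, w))       # evaluated at old utility
EV = e(p, v(p_new, w)) - w       # evaluated at new utility
CS, _ = quad(lambda p1: alpha*w/p1, p_new[0], p[0])   # Marshallian surplus
print(CV, CS, EV)                # CV <= CS <= EV for this normal good
```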

Risk aversion (Uncertainty 12–4) A decision maker is (strictly) risk-averse iff for any non-degenerate lottery F(·) with expected value E_F = ∫ x dF(x), the lottery δ_{E_F} which pays E_F for certain is (strictly) preferred to F. Stated mathematically, ∫ u(x) dF(x) ≤ u(∫ x dF(x)) for all F(·), which by Jensen's inequality obtains iff u(·) is concave.

The following notions of u(·) being "more risk-averse" than v(·) are equivalent:

1. F ≿_u δ_x ⟹ F ≿_v δ_x for all F and x.∗
2. Certain equivalent c(F, u) ≤ c(F, v) for all F.
3. u(·) is "more concave" than v(·); i.e., there exists an increasing concave function g(·) such that u = g ∘ v.
4. Arrow-Pratt coefficient A(x, u) ≥ A(x, v) for all x.

∗ Note this does not mean F ≿_u G ⟹ F ≿_v G where G is also a risky prospect—this would be a stronger version of "more risk averse."
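A small check of notion 2 above, assuming CRRA Bernoulli utility u(x) = (x^{1−σ} − 1)/(1 − σ) and a hypothetical 50/50 lottery over $50 and $150: the certain equivalent falls as σ rises, and always lies below E_F = 100:

```python
import numpy as np

def crra(x, sigma):
    return np.log(x) if sigma == 1 else (x**(1 - sigma) - 1) / (1 - sigma)

def certain_equivalent(prizes, probs, sigma):
    eu = np.dot(probs, crra(prizes, sigma))
    # invert u to get c with u(c) = E[u(x)]
    return np.exp(eu) if sigma == 1 else (1 + (1 - sigma)*eu)**(1 / (1 - sigma))

prizes, probs = np.array([50.0, 150.0]), np.array([0.5, 0.5])
for sigma in (0.5, 1.0, 2.0, 5.0):
    print(sigma, certain_equivalent(prizes, probs, sigma))
    # ~93.3, ~86.6, 75.0, ~59.3 -- decreasing in sigma, all below 100
```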

Certain equivalent (Uncertainty 13–4) c(F, u) is the certain payout such that δ_{c(F,u)} ∼_u F, or equivalently u(c(F, u)) = ∫ u(x) dF(x). Given risk aversion (i.e., concave u), c(F, u) ≤ E_F.

Absolute risk aversion (Uncertainty 14–6) For a twice differentiable Bernoulli utility function u(·), the Arrow-Pratt coefficient of absolute risk aversion is A(x) ≡ −u″(x)/u′(x).

u(·) has decreasing (constant, increasing) absolute risk aversion iff A(x) is decreasing (constant, increasing) in x. Under DARA, if I will gamble $10 when poor, I will gamble $10 when rich. Since R(x) = xA(x), we have IARA ⟹ IRRA.

Certain equivalent rate of return (Uncertainty 16) A proportionate gamble pays tx where t is a non-negative random variable with cdf F. The certain equivalent rate of return is cr(F, x, u) ≡ t̂ where u(t̂x) = ∫ u(tx) dF(t).

Relative risk aversion (Uncertainty 16) For a twice differentiable Bernoulli utility function u(·), the coefficient of relative risk aversion is R(x) ≡ −xu″(x)/u′(x) = xA(x).

u(·) has decreasing (constant, increasing) relative risk aversion iff R(x) is decreasing (constant, increasing) in x. An agent exhibits DRRA iff the certain equivalent rate of return cr(F, x) is increasing in x. Under DRRA, if I will invest 10% of my wealth in a risky asset when poor, I will invest 10% when rich. Since R(x) = xA(x), we have DRRA ⟹ DARA.

First-order stochastic dominance (Uncertainty 17–8) cdf G first-order stochastically dominates cdf F iff G(x) ≤ F(x) for all x. Equivalently, for every nondecreasing function u : R → R, ∫ u(x) dG(x) ≥ ∫ u(x) dF(x). Equivalently, we can construct G as a compound lottery starting with F and followed by (weakly) upward shifts.

Second-order stochastic dominance (Uncertainty 18–21) cdf G second-order stochastically dominates cdf F (where F and G have the same mean∗) iff for every x, ∫_{−∞}^x G(y) dy ≤ ∫_{−∞}^x F(y) dy. Equivalently, for every (nondecreasing?) concave function u : R → R, ∫ u(x) dG(x) ≥ ∫ u(x) dF(x). Equivalently, we can construct F as a compound lottery starting with G and followed by mean-preserving spreads.

Demand for insurance (Uncertainty 21–3) A risk-averse agent with wealth w faces a probability p of incurring a loss L. She can insure against this loss by buying a policy that will pay out a in the event the loss occurs, at cost qa.

If insurance is actuarially fair (q = p), the agent fully insures (a = L) for all wealth levels. If p < q, the agent's insurance coverage a will decrease (increase) with wealth if the agent has decreasing (increasing) absolute risk aversion.

Portfolio problem (Uncertainty 23–5) A risk-averse agent with wealth w must choose to allocate investment between a "safe" asset that returns r and a risky asset that pays return z with cdf F.

If risk-neutral, the agent will invest everything in the asset with the higher expected return (r or E z). If (strictly) risk-averse, she will invest at least some in the risky asset as long as its expected excess return is positive. (To see why, consider the marginal utility of investing in the risky asset at investment a = 0.)

If u is more risk-averse than v, then u will invest less in the risky asset than v for any initial level of wealth. An agent with decreasing (constant, increasing) absolute risk aversion will invest more (the same, less) in the risky asset at higher levels of wealth.

Subjective probabilities (Uncertainty 26–8) We relax the assumption that there are objectively correct probabilities for various states of the world to be realized. If preferences over acts (bets) satisfy a set of properties "similar in spirit" to the vN-M axioms (completeness, transitivity, something like continuity, the sure thing principle, and two axioms that have the flavor of substitution), then decision makers' choices are consistent with some utility function and some prior probability distribution (Savage 1954).

Savage requires an exhaustive list of possible states of the world. There is no reason to assume different decision makers are using the same implied probability distribution over states, although we often make a "common prior" assumption, which implies that "differences in opinion are due to differences in information."

2.6 General equilibrium

Walrasian model (G.E. 3–4, 5) An economy E ≡ ((u^i, e^i)_{i∈I}) comprises:

1. L commodities (indexed l ∈ L ≡ {1, ..., L});
2. I agents (indexed i ∈ I ≡ {1, ..., I}), each with
   • Endowment e^i ∈ R^L_+, and
   • Utility function u^i : R^L_+ → R.

Given market prices p ∈ R^L_+, each agent chooses consumption to maximize utility given a budget constraint: max_{x∈R^L_+} u^i(x) such that p · x ≤ p · e^i, or equivalently x ∈ B^i(p) ≡ {x : p · x ≤ p · e^i}.

We often assume (some or all of):

1. ∀i, u^i(·) is continuous;
2. ∀i, u^i(·) is increasing; i.e., u^i(x′) > u^i(x) whenever x′ ≫ x;
3. ∀i, u^i(·) is concave;
4. ∀i, e^i ≫ 0.

Walrasian equilibrium (G.E. 4, 17, 24–9) A WE for economy E is a vector of prices and allocations (p, (x^i)_{i∈I}) such that:

1. Agents maximize their utilities: max_{x∈B^i(p)} u^i(x) for all i ∈ I;
2. Markets clear: Σ_{i∈I} x^i_l = Σ_{i∈I} e^i_l for all l ∈ L, or equivalently Σ_{i∈I} x^i = Σ_{i∈I} e^i.

Under assumptions 1–4 above, a WE exists (proof using a fixed point theorem). WE are not generally unique, but are locally unique (and there are an odd number of them). A price adjustment process ("tatonnement") may not converge to an equilibrium.

Feasible allocation (G.E. 4) An allocation (x^i)_{i∈I} ∈ R^{IL}_+ is feasible iff Σ_{i∈I} x^i ≤ Σ_{i∈I} e^i.

Pareto optimality (G.E. 5) A feasible allocation x ≡ (x^i)_{i∈I} for economy E is Pareto optimal iff there is no other feasible allocation x̂ such that u^i(x̂^i) ≥ u^i(x^i) for all i ∈ I with strict inequality for some i ∈ I.

Edgeworth box (G.E. 6–10) Graphical representation of the two-good, two-person exchange economy. The bottom left corner is the origin for one consumer; the upper right corner is the origin for the other consumer (with the direction of the axes reversed). The budget line has slope −p_1/p_2 and passes through the endowment e.

1. The locus of Marshallian demands for each consumer as relative prices shift is her offer curve. WE are intersections of the two consumers' offer curves.
2. The set of PO allocations is the locus of points of tangency between the two consumers' indifference curves, generally a path from the upper right to the lower left of the Edgeworth box.
3. The portion of the Pareto set that lies between the indifference curves that pass through e is the contract curve: PO outcomes preferred by both consumers to their endowments.

First Welfare Theorem (G.E. 11) If ∀i, u^i(·) is increasing (i.e., u^i(x′) > u^i(x) whenever x′ ≫ x) and (p, (x^i)_{i∈I}) is a WE, then the allocation (x^i)_{i∈I} is PO. Note implicit assumptions such as:

1. All agents face the same prices;
2. All agents are price takers;
3. Markets exist for all goods, and individuals can freely participate;
4. Prices are somehow arrived at.

Proof by adding the results of Walras' Law across consumers at a potentially Pareto-improving allocation. The result shows that such an allocation cannot be feasible.

∗ If E_F > E_G, there is always a concave utility function that will prefer F to G.
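A minimal sketch of computing a WE for a hypothetical two-agent, two-good Cobb-Douglas exchange economy (an Edgeworth-box economy): normalize p_2 = 1 and find the price at which the market for good 1 clears; by Walras' Law the other market then clears automatically:

```python
import numpy as np
from scipy.optimize import brentq

# Two Cobb-Douglas agents: u_i = a_i*log(x1) + (1 - a_i)*log(x2)
a = np.array([0.3, 0.7])
e = np.array([[1.0, 0.0],    # agent 1's endowment
              [0.0, 1.0]])   # agent 2's endowment

def excess_demand_good1(p1):
    p = np.array([p1, 1.0])           # normalize p2 = 1
    wealth = e @ p                    # p . e^i for each agent
    x1 = a * wealth / p1              # Marshallian demand for good 1
    return x1.sum() - e[:, 0].sum()

p1_star = brentq(excess_demand_good1, 1e-6, 1e6)
print(p1_star)   # 1.0 for these (hypothetical) parameters
```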

Second Welfare Theorem (G.E. 11-3) If allocation (e^i)_{i∈I} is PO and

1. ∀i, u^i(·) is continuous;
2. ∀i, u^i(·) is increasing; i.e., u^i(x′) > u^i(x) whenever x′ ≫ x;
3. ∀i, u^i(·) is concave;
4. ∀i, e^i ≫ 0;

then there exists a price vector p ∈ R^L_+ such that (p, (e^i)_{i∈I}) is a WE.

Note this does not say that starting from a given endowment, every PO allocation is a WE. Thus decentralizing a PO allocation is not simply a matter of identifying the correct prices—large-scale redistribution may be required as well.

Proof by the separating hyperplane theorem. Consider the set of changes to total endowment that strictly improve every consumer's utility; by concavity of u^i(·), this set is convex. Separate this set from 0, and show that the resulting prices are nonnegative, and that at these prices, e^i maximizes each consumer's utility.

Excess demand (G.E. 18) z^i(p) ≡ x^i(p, p · e^i) − e^i, where x^i is the agent's Marshallian demand. Walras' Law gives p · z^i(p) = 0. Aggregate excess demand is z(p) ≡ Σ_{i∈I} z^i(p). If z(p) = 0, then (p, (x^i(p, p · e^i))_{i∈I}) is a WE.

Sonnenschein-Mantel-Debreu Theorem (G.E. 30) Let B ⊆ R^L_{++} be open and bounded, and f : B → R^L be continuous, homogeneous of degree zero, and satisfy p · f(p) = 0 for all p. Then there exists an economy E with aggregate excess demand function z(p) satisfying z(p) = f(p) on B.

The interpretation is that without special assumptions, pretty much any comparative statics result could be obtained in a GE model. However, Brown and Matzkin show that if we can observe endowments as well as prices, GE may be testable.

Gross substitutes property (G.E. 32–5) A Marshallian demand function x(p) satisfies the gross substitutes property if for all k, whenever p′_k > p_k and p′_{−k} = p_{−k}, then x_{−k}(p′) > x_{−k}(p); i.e., all pairs of goods are (strict) gross substitutes. Implies that the excess demand function satisfies gross substitutes. If every individual satisfies gross substitutes, then so does aggregate excess demand.

If aggregate excess demand satisfies gross substitutes:

1. The economy has at most one (price-normalized) WE.
2. If z(p∗) = 0 (i.e., p∗ are WE prices), then for any p not colinear with p∗, we have p∗ · z(p) > 0.
3. The tatonnement process dp/dt = αz(p(t)) with α > 0 converges to WE prices for any initial condition p(0).
4. Any change that raises the excess demand for good k will increase the equilibrium price of good k.

Incomplete markets (Jackson) Under incomplete markets, Walrasian equilibria may:

1. Fail to be efficient, and even fail to be constrained efficient (i.e., there may be more efficient outcomes that are feasible under the restricted trading regime);
2. Fail to exist ("although only rarely");
3. Have prices/trades that depend on the resolution of uncertainty.

Rational expectations equilibrium (Jackson) Suppose l (primitive) goods, state space S (with |S| < ∞), and n agents, each of whom has

• endowment e_i : S → R^l_+,
• preferences u_i : R^l_+ × S → R,
• information I_i (including information contained in e_i and u_i).

The allocations x_i : S → R^l_+ (or equivalently x_i ∈ R^{|S|·l}_+) and prices p : S → R^l_+ are an REE iff:

1. Information revelation: x_i is measurable with respect to I_i ∨ I_prices for all i;
2. Market clearing: Σ_i x_i(s) ≤ Σ_i e_i(s) for all s ∈ S;
3. Optimizing behavior: x_i ∈ argmax u_i[x_i(s)] such that x_i is measurable with respect to I_i ∨ I_prices and p(s) · x_i(s) ≤ p(s) · e_i(s), for all i.

2.7 Games

Game tree (Bernheim 2–5) Description of a game comprising:

1. Nodes,
2. A mapping from nodes to the set of players,
3. Branches,
4. A mapping from branches to the set of action labels,
5. Precedence (a partial ordering),
6. A probability distribution over branches for all nodes that map to nature.

We assume:

1. There is a finite number of nodes;
2. There is a unique root—a node that has no predecessors and is a predecessor for everything else;
3. There is a unique path (following precedence) from the root to each terminal node (those nodes without successors).

We also add:

1. Information: a partition over the set of nodes such that
   • The same player makes the decisions at all nodes within any element of the partition;
   • The same actions are available at all nodes within any element of the partition;
   • No element of the partition contains both a node and its predecessor.
2. Payoffs: a mapping from terminal nodes to a vector of utilities.

Perfect recall (Bernheim 6) Informally, a player never forgets either a decision he made in the past or information that he possessed when making a previous decision.

Perfect information (Bernheim 7) Every information set is a singleton.

Complete information (Bernheim 13, 73–4) Each player knows the payoffs received by every player at every terminal node. Per Harsanyi, a game of incomplete information can be written as a game of imperfect information by adding nature as a player whose choices determine hidden characteristics. The probability governing Nature's decisions is taken to be common knowledge. A NE of this "Bayesian game" is a Bayesian NE.

Strategy (Bernheim 7–8) A mapping that assigns a feasible action to all information sets for which a player is the decision-maker (i.e., a complete contingent plan). Notation is:

1. S_i the set of player i's feasible strategies;
2. S ≡ Π_j S_j the set of feasible strategy profiles;
3. S_{−i} ≡ Π_{j≠i} S_j the set of feasible strategy profiles for every player but i.

Payoff function (Bernheim 8) g_i : S → R gives player i's expected utility if everyone plays according to s ∈ S.

Normal form (Bernheim 8) A description of a game as a collection of players {1, ..., I}, a strategy profile set S, and a payoff function g : S → R^I where g(s) = (g_1(s), ..., g_I(s)).

Revelation principle (Bernheim 79) In a game with externalities, when the mechanism designer doesn't have information about preferences, agents will have an incentive to understate or exaggerate their preferences. The revelation principle states that in searching for an optimal mechanism within a much broader class, the designer can restrict attention to direct revelation mechanisms (those that assume agents have revealed their true preferences) for which truth-telling is an optimal strategy for each agent.
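Given the normal form above, equilibrium concepts in the next subsections can be brute-forced for small finite games. A tiny sketch, assuming a hypothetical 2×2 coordination game, that finds strategy profiles from which no player gains by a unilateral deviation (the PSNE concept formalized below):

```python
from itertools import product

S = [(0, 1), (0, 1)]                     # S_i for each of two players
def g(s):                                # payoff profile g(s)
    return (2, 2) if s == (0, 0) else (1, 1) if s == (1, 1) else (0, 0)

def no_profitable_deviation(s):
    for i in range(len(S)):
        for si in S[i]:
            dev = list(s); dev[i] = si   # unilateral deviation by player i
            if g(tuple(dev))[i] > g(s)[i]:
                return False
    return True

print([s for s in product(*S) if no_profitable_deviation(s)])  # [(0, 0), (1, 1)]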

Proper subgame (Bernheim 91) Consider a node t in an extensive form game, with information set h(t) and successors S(t). Then {t} ∪ S(t) (along with the associated mappings from information sets to players, from branches to action labels, and from terminal nodes to payoffs) is a proper subgame iff:

1. h(t) = {t} (the information set for t is a singleton); and
2. ∀t′ ∈ S(t), we have h(t′) ⊆ S(t) (every player knows we are at t).

System of beliefs (Bernheim 94) Given decision nodes X, information sets H (including the information set h(t) containing t ∈ X), and φ(h) the player who makes the decision at information set h ∈ H, a system of beliefs is a mapping μ : X → [0, 1] such that ∀h ∈ H we have Σ_{t∈h} μ(t) = 1. That is, a set of probability distributions over the nodes in each information set.

Strictly mixed strategy (Bernheim 98) Behavior strategy profile δ is strictly mixed if every action at every information set is selected with strictly positive probability. Note since every information set is reached with strictly positive probability, one can completely infer a system of beliefs using Bayes' rule.

2.8 Dominance

Dominant strategy (Bernheim 13) s_i is a (strictly) dominant strategy iff for all ŝ ∈ S with ŝ_i ≠ s_i, we have g_i(s_i, ŝ_{−i}) > g_i(ŝ_i, ŝ_{−i}).

Dominated strategy (Bernheim 15) ŝ_i is a (strictly) dominated strategy iff there exists some probability distribution ρ over S_i ≡ {s_i^1, ..., s_i^M} such that for all s_{−i} ∈ S_{−i},

Σ_{m=1}^M ρ^m g_i(s_i^m, s_{−i}) > g_i(ŝ_i, s_{−i}).

Iterated deletion of dominated strategies (Bernheim 15–6) We iteratively delete (strictly) dominated strategies from the game. Relies on common knowledge of rationality (i.e., everyone is rational, everyone knows everyone is rational, everyone knows everyone knows, ...). If this yields a unique outcome, the game is "dominance solvable." The order of deletion is irrelevant.

For two player games, the strategies that survive iterated deletion of dominated strategies are precisely the set of rationalizable strategies. This equivalence holds for games with more than two players only if we do not insist on independence in defining rationalizability; if we do insist on independence, the set of rationalizable strategies is smaller.

Weakly dominated strategy (Bernheim 20) ŝ_i is a weakly dominated strategy iff there exists some probability distribution ρ over S_i ≡ {s_i^1, ..., s_i^M} such that for all s_{−i} ∈ S_{−i},

Σ_{m=1}^M ρ^m g_i(s_i^m, s_{−i}) ≥ g_i(ŝ_i, s_{−i}),

with strict inequality for some s_{−i} ∈ S_{−i}.

We cannot iteratively delete weakly dominated strategies; unlike for strict domination, the order of deletion matters.

Rationalizable strategy (Bernheim 27)

1. A 1-rationalizable strategy is a best response to some (independent†) probability distribution over other players' strategies.
2. A k-rationalizable strategy is a best response to some (independent) probability distribution over other players' (k − 1)-rationalizable strategies.
3. A rationalizable strategy is k-rationalizable for all k.

For two player games, rationalizable strategies are precisely those that survive iterated deletion of dominated strategies. This equivalence holds for games with more than two players only if we do not insist on independence in defining rationalizability; if we do insist on independence, the set of rationalizable strategies is smaller.

Pure strategy Nash equilibrium (Bernheim 29–33) s∗ ∈ S is a PSNE iff for all i and all s_i ∈ S_i, we have g_i(s_i∗, s_{−i}∗) ≥ g_i(s_i, s_{−i}∗). The strategies played in a PSNE are all rationalizable.

A finite game of perfect information has a PSNE [Zermelo]. Proved using backwards induction.

If S_1, ..., S_I are compact, convex Euclidean sets and g_i is continuous in s and quasiconcave in s_i, then there exists a PSNE. By Berge's Theorem, the best response correspondence is upper hemi-continuous. By quasiconcavity of g_i, the best response correspondence is convex-valued. Thus by Kakutani's Fixed Point Theorem, it has a fixed point.

Mixed strategy Nash equilibrium (Bernheim 56–9) We define a new game where the strategy space is the set of probability distributions over (normal form) strategies in the original game (i.e., the set of mixed strategies). A MSNE of the original game is any PSNE of this new game. Players must be indifferent between playing any strategies included (with strictly positive probability) in a MSNE.

Another approach is to consider randomization over actions at each information set (behavior strategies). However, Kuhn's Theorem assures us that for any game of perfect recall there are mixed strategies that yield the same distribution over outcomes as any combination of behavior strategies; it also gives that there are behavior strategies that yield the same distribution over outcomes as any combination of mixed strategies.

Every finite game has a MSNE.

Trembling-hand perfection (Bernheim 65–7) There is always some risk that another player will make a "mistake." The Bernheim notes include two equivalent rigorous definitions; the original definitions are for finite games, but there is an extension available for infinite games. Key notes:

1. In a THPE, no player selects a weakly dominated strategy with positive probability.
2. For two player games, an MSNE is a THPE iff no player selects a weakly dominated strategy with positive probability.
3. For games with more than two players, the set of THPE may be smaller than the set of MSNE where no player selects a weakly dominated strategy with positive probability; if we allow correlations between the trembles of different players, the sets are the same.

Correlated equilibrium (Bernheim 70) In a finite game ({S_i}_{i=1}^I, g), a probability distribution δ∗ over S is a CE iff for all i and all s_i chosen with strictly positive probability, s_i solves

max_{s′_i ∈ S_i} E_{s_{−i}}[g_i(s′_i, s_{−i}) | s_i, δ∗];

i.e., player i has no incentive to defect from any strategy s_i, assuming that other players respond per δ∗.

2.9 Equilibrium concepts

PSNE ⊆ rationalizable ⊆ ISD. Rationalizable strategies are a best response based on some prior. PSNE are best responses based on a common prior.

Normal form equilibrium concepts (static games): THPE ⊆ MSNE; PSNE ⊆ MSNE; MSNE are made up of rationalizable strategies. BNE are MSNE of the "extended game" that includes nature choosing types.

Extensive form equilibrium concepts (dynamic games):

[Diagram: containment of extensive form concepts—MSNE ⊇ {WPBE, SPNE} ⊇ PBE ⊇ SE ⊇ EFTHPE.∗]

∗ Unclear whether PBE is the intersection of WPBE and SPNE, or a subset of the intersection.
† The question is whether to allow other players' randomized strategies to be correlated with each other.
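A short sketch of the iterated deletion procedure from §2.8, assuming a hypothetical 2×3 bimatrix game; for brevity it checks domination by pure strategies only, whereas the definition above also allows domination by mixtures:

```python
import numpy as np

A = np.array([[1, 1, 0],      # row player's payoffs
              [0, 0, 2]])
B = np.array([[0, 2, 1],      # column player's payoffs
              [3, 1, 0]])

def iterated_deletion(A, B):
    rows, cols = list(range(A.shape[0])), list(range(A.shape[1]))
    changed = True
    while changed:
        changed = False
        for r in rows[:]:   # is row r strictly dominated by another row?
            if any(all(A[r2, c] > A[r, c] for c in cols) for r2 in rows if r2 != r):
                rows.remove(r); changed = True
        for c in cols[:]:   # is column c strictly dominated by another column?
            if any(all(B[r, c2] > B[r, c] for r in rows) for c2 in cols if c2 != c):
                cols.remove(c); changed = True
    return rows, cols

print(iterated_deletion(A, B))   # ([0], [1]): this game is dominance solvable
```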

Bayesian Nash equilibrium (Bernheim 73–4) Per Harsanyi, we write our game of incomplete information as a game of imperfect information where nature selects hidden characteristics. A BNE is a MSNE of this "Bayesian game." A pure strategy BNE is a PSNE of the Bayesian game. Characterized by a set of decision rules that determine each player's strategy contingent on his type.

Subgame perfect Nash equilibrium (Bernheim 92) A MSNE in behavior strategies δ∗ is a SPNE iff for every proper subgame, the restriction of δ∗ to the subgame forms a MSNE in behavior strategies. For finite games, we can find SPNE using backwards induction on the subgames of the extensive form. A SPNE need not be a WPBE, and a WPBE need not be a SPNE.

Sequentially rational strategy (Bernheim 94) Behavior strategy profile δ is sequentially rational given a system of beliefs μ iff for all information sets h ∈ H, the actions for player φ(h) at h ∪ [∪_{t∈h} S(t)] are optimal starting from h, given an initial probability over h governed by μ, and given that other players will adhere to δ_{−φ(h)}.

Weak perfect Bayesian equilibrium (Bernheim 93–5) Implausible equilibria can still be SPNE, because we lack beliefs at each information set. Behavior strategy profile δ∗ and system of beliefs μ∗ are a WPBE iff:

1. δ∗ is sequentially rational given beliefs μ∗, and
2. Where possible, μ∗ is computed from δ∗ using Bayes' rule; i.e., for any information set h with Pr(h|δ∗) > 0, for all t ∈ h,

μ∗(t) = Pr(t|δ∗) / Pr(h|δ∗).

Note the only restriction on out-of-equilibrium beliefs is that they exist. A SPNE need not be a WPBE, and a WPBE need not be a SPNE.

Perfect Bayesian equilibrium (Bernheim 93–5) (δ∗, μ∗) is a PBE iff it is a WPBE in all proper subgames. This ensures it is also a SPNE.

Consistent strategy (Bernheim 98) Behavior strategy profile δ is consistent given a system of beliefs μ iff there exists a sequence of strictly mixed behavior strategy profiles δ_n → δ such that μ_n → μ, where μ_n is generated from δ_n by Bayes' rule.

Sequential equilibrium (Bernheim 98) (δ∗, μ∗) is a SE iff it is sequentially rational and consistent. SE places an additional restriction on beliefs vs. WPBE, hence SE ⊆ WPBE; also, one can show that SE are SPNE, so a SE is also a PBE.

Extensive form trembling hand perfection (Bernheim 102–5) The "agent normal form" is the normal form that would obtain if each player selected a different agent to make her decisions at every information set, and all of a player's agents acted independently with the object of maximizing the player's payoffs. An EFTHPE is a THPE in the agent normal form of the game. EFTHPE ⊆ SE; for generic finite games, they are the same.

2.10 Basic game-theoretic models

Other models and examples are throughout the Bernheim lecture notes.

Cournot competition (Bernheim 11, 46–50) Firms simultaneously choose quantity. Inverse demand is P(Q) (monotonically decreasing); the cost to firm i of producing quantity q_i is c_i(q_i), where c_i(0) = 0. Normal form:

1. Strategies: S_i = R_+;
2. Payoffs: g_i(s) = P(Σ_j s_j) s_i − c_i(s_i).

To ensure PSNE existence, we need quasiconcavity of g_i in s_i (we generally don't worry about unboundedness of the strategy set). Sufficient conditions are c_i(·) convex and P(·) concave. The former rules out increasing returns to scale. The latter "is not a conventional property of demand functions, but is satisfied for linear functions."

Note:

1. Characterized by strategic substitutes; i.e., best response curves are downward sloping.
2. Production is spread among firms.

Bertrand competition (Bernheim 11, 37) Firms simultaneously choose price; consumers purchase from the low-price firm. Demand is Q(p) (monotonically decreasing); the cost to firm i of producing quantity q_i is c_i(q_i), where c_i(0) = 0. Normal form (two-firm case):

1. Strategies: S_i = R_+;
2. Payoffs: g_i(s) = 0 if s_i > s_{−i}; s_i Q(s_i) − c_i(Q(s_i)) if s_i < s_{−i}; and ½[s_i Q(s_i) − c_i(Q(s_i))] if s_i = s_{−i}.

Note:

1. The one-firm case is monopoly (in which case demand matters); more than one firm yields the perfectly competitive outcome, since only the marginal cost of the second-most efficient firm matters—the properties of the demand curve are then irrelevant.
2. Characterized by strategic complements; i.e., best response curves are upward sloping.
3. All production is done by the most efficient firm.

Bertrand competition—non-spatial differentiation (Bernheim 38–40) Firm i chooses the price for good i. Demand for good i is given by Q(p_i, p_{−i}). Strategic complements; i.e., best response curves are upward sloping.

Bertrand competition—horizontal differentiation (Bernheim 40–2) a.k.a. Hotelling spatial location model. Consumers are indexed by θ ∈ [0, 1], representing location. Each consumer purchases zero or one unit, with payoff 0 if no purchase, and v − p_i − t(x_i − θ)² from purchasing a type x_i good at price p_i. Here v is the value of the good and t is unit transport cost. If firms cannot choose product type x_i, prices are strategic complements; i.e., best response curves are upward sloping.

Bertrand competition—vertical differentiation (Bernheim 42–5) Consumers are indexed by θ ∈ [0, 1], representing the value attached to quality. Each consumer purchases zero or one unit, with payoff 0 if no purchase, and θv_i − p_i from purchasing a quality v_i good at price p_i.

Sequential competition (Bernheim 107–12) First one firm selects price/quantity, then the other firm follows. The leader always does (weakly) better than in the simultaneous choice model; whether the follower does better or worse than in simultaneous choice depends on whether there are strategic complements or substitutes. This also determines which firm does better in the sequential choice model.

Herfindahl-Hirschman index (Bernheim 37) H ≡ 10000 × Σ_i α_i², where α_i is the market share of firm i. When all N firms evenly split the market, H = 10000/N.

Lerner index (Bernheim 37) L ≡ Σ_i α_i L_i, where L_i ≡ (p_i − c′_i(q_i))/p_i is firm i's margin and α_i is the market share of firm i.

Monopolistic competition (Bernheim 135–8)

1. Products are distinct, and each firm faces a downward-sloping demand curve;
2. The decisions of any given firm have negligible effects on any other firm (note this does not hold for the Hotelling model, where firms have a measurable effect on their neighbors);
3. There is free entry, with zero profits.

Can formalize as a vector of N differentiated commodities (for N large) and a numeraire good y, where the representative consumer has utility

u(x, y) = y + g(Σ_{i=1}^N f(x_i))
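A quick numeric sketch of the Cournot model above, assuming linear inverse demand P(Q) = a − bQ and constant marginal costs (hypothetical parameters); iterating the best responses converges here because each has slope −1/2 (strategic substitutes):

```python
a, b = 100.0, 1.0
c = [10.0, 20.0]          # constant marginal costs of the two firms

def best_response(q_other, ci):
    # argmax_q (a - b*(q + q_other))*q - ci*q; FOC: a - 2bq - b*q_other - ci = 0
    return max(0.0, (a - ci - b*q_other) / (2*b))

q = [10.0, 10.0]
for _ in range(100):
    q = [best_response(q[1], c[0]), best_response(q[0], c[1])]
print(q)   # -> [(a - 2c1 + c2)/(3b), (a - 2c2 + c1)/(3b)] = [33.33..., 23.33...]
```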

and the curvature of f(·) gives the extent of substitutability of the goods.

The relationship between equilibrium variety and optimal variety is dictated by:

1. When a product is added, the revenues generated fall short of the incremental consumer surplus because firms can't perfectly price discriminate; this biases towards too little entry.
2. Firms don't take into account the effect of introducing a product on the profits of others; if goods are substitutes, this biases towards too much entry.

If goods are complements, these biases reinforce and we have too little variety relative to the social optimum.

Entry deterrence (Bernheim) An incumbent can take some action with long-term commitment prior to the arrival of an entrant that makes entry less attractive. For example:

1. Selecting to produce a middle-of-the-road product (Hotelling model); note the deterrent action is the same as what the firm would do if it faced no entry with certainty (entry is "blockaded").
2. Producing a proliferating range of products (e.g., RTE cereal).
3. Preemptive investment in production capacity; shifts a portion of marginal cost to a sunk cost. Note the entrant may also limit capacity to reduce the threat to the incumbent.

Vickrey auction (Bernheim 22–3, 82–3) a.k.a. second-price auction. I bidders simultaneously submit sealed bids for an item that each values at v_i. The winning bidder pays the second highest bid. Bidding p_i = v_i weakly dominates any other pure strategy, and is not weakly dominated by any other strategy. The analysis does not depend on complete vs. incomplete information—everyone bidding their valuation is a BNE.

First-price sealed bid auction (Bernheim 83–7) I bidders simultaneously submit sealed bids for an item that each values at v_i. The winning bidder pays his own bid. The symmetric BNE gives a decision rule where each bids below his valuation. Realized revenues typically differ from the Vickrey (second-price sealed bid) auction, but expected revenue is the same given independent private valuations (per the revenue equivalence theorem).

English auction (Bernheim 87) a.k.a. ascending price auction. The posted price of the good is slowly increased until only one bidder remains. Staying in until the posted price exceeds one's valuation is a weakly dominant strategy; outcomes are equivalent to the Vickrey (second-price sealed bid) auction.

Dutch auction (Bernheim 87) a.k.a. descending price auction. The posted price of the good is slowly decreased until a bidder buys. Outcomes are equivalent to the first-price sealed bid auction.

Public good (Bernheim 51–4) A non-rival and non-excludable good.

Spence signaling model (Bernheim 218–35) Workers choose an education level; education does nothing for a worker's productivity, but is less costly on the margin for more productive workers. All workers have the same outside opportunities. Three types of WPBE:

1. Separating equilibria: different types choose different education levels.
2. Pooling equilibria: different types choose the same education levels. (Note pooling equilibria with strictly positive levels of education are highly inefficient—education adds nothing to productivity, nor does it differentiate workers.)
3. Hybrids: some members of differently-typed groups pool, others separate.

An equilibrium satisfies the equilibrium dominance condition, a.k.a. the "intuitive criterion," iff whenever an individual gets some level of education that no one should get in equilibrium, which no low-productivity worker should ever choose, and which some high-productivity worker could conceivably choose, the firm assumes he is a high-productivity worker with certainty.

2.11 Repeated games with complete information

Infinitely repeated game (Bernheim 164–5) A supergame formed by (potentially) infinite repetitions of some stage game (i.e., the game played in each period). Note there need not actually be infinite repetitions, but there must be a nonzero possibility in each stage that the game will continue.

Given the absence of terminal nodes, we need a mapping from strategies to expected payoffs, e.g.,

1. Discounted payoffs: u_i(v_i) = Σ_{t=0}^∞ δ^t v_i(t), where the discount factor may reflect both time preference and the probability of continuation.
2. Average payoffs: u_i(v_i) = lim_{T→∞} (1/T) Σ_{t=1}^T v_i(t).
3. Overtaking criterion: a strategy is preferred iff there is some T beyond which all partial sums of stage game payoffs exceed the corresponding partial sums for other strategies; n.b., only a partial ordering.

Feasible payoff (Bernheim 169–70) The convex hull of payoff vectors from pure strategy profiles in the stage game. Note this is potentially larger than the set of payoffs achievable through mixed strategies, since we allow for correlations.

Individually rational payoff (Bernheim 170–1) (Stage game) payoff vectors where each player receives at least his minmax payoff

π_i^m ≡ min_{δ_{−i}} max_{δ_i} π_i(δ).

Folk Theorem (Bernheim 171–2, 5, 7–8) Consider a supergame formed by repeating a finite stage game an infinite number of times; suppose players use the average payoff criterion. Then the set of feasible and individually rational payoffs is precisely the set of average payoffs for Nash equilibria (which need not be SPNE).

1. "Anything can happen;" this makes comparative statics problematic.
2. Inability to write binding contracts is not very damaging; anything attainable through a contract can also be obtained through some self-enforcing agreement (if there is no discounting).

For discounted payoffs, all feasible payoffs that strictly exceed minmax payoffs for every player are the average payoff for some NE for all discount rates sufficiently close to 1. Subject to some technical conditions, versions of the folk theorem (with and without discounting) hold for SPNE.

Nash reversion (Bernheim 176–7) a.k.a. "grim trigger" strategy. Players attempt to support cooperation, reverting to a static (stage-game) equilibrium as punishment if someone deviates. If a Nash reversion strategy is a NE, then it is a SPNE.

Finitely repeated game (Bernheim 179–80) If there is a unique (up to payoffs) NE for the stage game, there is a unique SPNE for the repeated game, consisting of the repeated stage game equilibrium. However, cooperation may be possible when the stage game has multiple NE.

Stick-and-carrot equilibrium (Bernheim 185–6) If a firm strays from the (cooperative) equilibrium path, all firms including itself punish it for one period. If any firm does not participate in the punishment (including the punished firm itself), it gets punished again in the next period. The punishment is the "stick;" the fact that punishment will end as soon as the firm punishes itself is the "carrot."

2.12 Collusion and cooperation

Core (Jackson; MWG 653–4) The core is the set of allocations not blocked by any coalition. A coalition will block an allocation if there is some allocation feasible for the coalition such that each member is strictly better off ("strong blocking"), or such that every member is weakly better off and some member is strictly better off ("weak blocking"). Core allocations must be Pareto optimal.

The set of Walrasian equilibria is a subset of the core (basically, the FWT) since there are no externalities. In other settings, the core could be empty due to externalities.

Core convergence (Jackson; MWG 655–7) As we increase the number of "replicas" of a pure exchange economy, the core of the replicated economy converges to (equal treatment replicas of) the W.E. of the original economy.
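A quick check of the Nash reversion logic from §2.11 above, assuming a standard prisoner's-dilemma parameterization (hypothetical payoffs): cooperation is sustainable exactly when the discounted value of cooperating forever beats a one-shot deviation followed by permanent punishment:

```python
u_coop = 3.0   # per-period payoff on the cooperative path
u_dev  = 5.0   # one-shot payoff from deviating
u_nash = 1.0   # stage-game NE payoff (the punishment)

# Cooperate forever: u_coop/(1-d).  Deviate once, then punished: u_dev + d*u_nash/(1-d).
delta_crit = (u_dev - u_coop) / (u_dev - u_nash)
print(delta_crit)        # 0.5: cooperation is a SPNE for all delta >= 0.5

for d in (0.4, 0.6):
    coop = u_coop / (1 - d)
    dev = u_dev + d * u_nash / (1 - d)
    print(d, coop >= dev)   # False at 0.4, True at 0.6
```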

37 Nontransferable utility game (Jackson) Feasible allocations for In a simple game, i is a “veto player” iff V (S) = 1 =⇒ i ∈ S. • Two-sided matching (cf, roommate matching); a coalition T must be listed explicitly. The core is nonempty iff there is at least one veto player. Shapley values are then: • One-to-one matching (cf, firm and workers); Transferable utility game (Jackson; MWG 676) Feasible alloca- ( • Strict preferences. |S| 1 SV # of veto players , i is a veto player; tions for a coalition S are V (S) ≡ R+ . Generally assume: φ = i 0, otherwise. The algorithm is not strategy-proof: there may be profitable 1. The normalization V (∅) = 0; If the set of veto players is a carrier, φSV ∈ core. manipulations (lying by rejecting proposals), but they are 2. V (I) ≥ V (S) + V (I \ S) for all I ⊆ S, where I is the typically difficult to implement. set of all players; Convex game (Jackson; MWG 683) A TU game V (·) is convex iff S ⊆ T and i 6∈ T implies 3. Superadditivity (a stronger version of the prior assump- tion): V (S ∪ T ) ≥ V (S) + V (T ) when S ∩ T = ∅. V (S ∪ {i}) − V (S) ≤ V (T ∪ {i}) − V (T ); 2.13 Principal-agent problems

In a TU game, an allocation can be strictly blocked iff it can i.e., there are increasing marginal returns as coalitions grow. Moral hazard (Jackson) “Hidden action” problems. Agent takes be weakly blocked; the weak and strong cores are therefore For a convex game, the core is nonempty; the Shapley value (non-contractable) action e, outcome π has some distribu- identically is in the core. tion that varies based on e. Generally assume risk-neutral principal and risk-averse (and/or liability-limited) agent, so |I| X X {x ∈ R : xi = V (I) and ∀S ⊆ I, xi ≥ V (S)}. One-sided matching (Jackson) Players are each allocated one ob- that optimal solution is not merely “selling the firm to the i i ject; each player has ordinal preferences over objects. Given agent.” strict preferences, we can find the unique (weak) core alloca- tion using the Top Trading Cycles algorithm: Principal structures optimal contract—payment w(π) as a Shapley value (Jackson; MWG 673, 679–81) Given a characteristic function of realized outcome—that, for a desired effort level I function V : 2 → R+, the Shapley value is value function: 1. Look for cycles among individuals’ top choices; e∗, 2. Assign each member of cycles her top choice; X |S|!(|I| − |S| − 1)! φSV(V ) ≡ V (S∪{i})−V (S). 1. Maximizes principal’s expected payoff: E[π − w(π)|e∗]; i |I|! 3. Return to step 1, with already assigned individu- S⊆I s.t. i6∈S als/objects removed from consideration. 2. Satisfies the agent’s participation constraint (i.e., indi- vidual rationality constraint): E[u(w(π))|e∗] − g(e∗) ≥ That is, the average marginal value that i contributes over We can support this core as a competitive equilibrium by as- u¯, where g(·) is the agent’s cost of effort, andu ¯ is his all possible orderings in which members could be added to signing a common price to the objects in each cycle as they reservation utility; a coalition. It is a normative allocation, and can be seen are assigned. The price for cycles assigned in the same round as a positive description under certain kinds of bargaining can vary across cycles, but must be strictly lower than the 3. Satisfies the agent’s incentive compatibility constraint: regimes. prices for objects assigned in previous rounds. ∗ e ∈ argmaxe E[u(w(π))|e] − g(e). The Shapley value is the unique value function satisfying: The algorithm is strategy-proof: truth telling dominates.

Adverse selection (Jackson) “Hidden information” problems. Two-sided matching (Jackson) Two sets M and W , which may 1. Symmetry: If we relabel agents, the Shapley values are Can be mitigated using, e.g., warantees, repeated interac- have different cardinalities. Every individual is either relabeled accordingly. tion, reputation mechanisms, signaling/screening. matched to someone in the other set, or remains unmatched. 2. Carrier: T ⊆ I is a carrier iff V (S ∩ T ) = V (S) for all P Can find two core allocations (here strict = weak) using Gale- S ⊆ I; if T is a carrier, then φ (V ) = V (I) = Signaling (Jackson) Agents choose a costly signal (which per i∈T i Shapley [a.k.a. Deferred acceptance] algorithm: Suppose M V (T ). Spence is often assumed to be meaningless other than for “propose,” then its informational value), and then principals Bertrand bid 3. Dummy: i is a dummy iff V (S ∪ {i}) = V (S) for all for contracts with agents. S ⊆ I; if i is a dummy, φi(V ) = 0. Note Carrier =⇒ 1. Each M proposes to his favorite W to whom he has not yet proposed (unless he would prefer to stay single); Dummy. A multitude of pure strategy sequential equilibria typically 4. Additivity: φ(V + W ) = φ(V ) + φ(W ); implies that 2. Each W who has been proposed to accepts her favorite exist—both pooling (various types choose same signal) and φ(λV ) = λφ(V ). Convenient, but not mathematically proposal (unless she would prefer to stay single)—this separating (various types choose different signals). clear why this should hold. could involve breaking an engagement she made in a previous round; Screening (Jackson) Bertrand principals announce a schedule of Simple game (Jackson) A TU game where 3. Return to step 1, where the only unengaged M pro- contracts they are willing to engage in (pairs of signals and pose (i.e., those who either had a proposal rejected last wages), and then agents choose a contract (and hence a sig- 1. V (S) ∈ {0, 1}, round, or had an engagement broken). nal). 2. S ⊆ T =⇒ V (S) ≤ V (T ), The core is a lattice; Gale-Shapley algorithm yields “best” No pooling equilibria can exist, and there is only one sep- for proposing group (and “worst” for other group). Thus arating equilibrium that can exist (but may not). Thus in 3. V (S) = 1 =⇒ V (I \ S) = 0, if we get same result from M-proposing and W -proposing contrast with signaling, where our problem was a multitude 4. V (I) = 1. algorithms, core is a singleton. Note this result relies on: of PSNE, screening games may have no PSNE.
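The deferred acceptance sketch referenced under two-sided matching above, assuming a hypothetical 3×3 market in which everyone prefers any match to staying single (preference lists ordered best-first):

```python
m_prefs = {"m1": ["w1", "w2", "w3"],
           "m2": ["w1", "w3", "w2"],
           "m3": ["w2", "w1", "w3"]}
w_prefs = {"w1": ["m2", "m1", "m3"],
           "w2": ["m1", "m3", "m2"],
           "w3": ["m1", "m2", "m3"]}

def deferred_acceptance(m_prefs, w_prefs):
    rank = {w: {m: i for i, m in enumerate(ps)} for w, ps in w_prefs.items()}
    nxt = {m: 0 for m in m_prefs}              # index of next W to propose to
    engaged = {}                                # w -> m
    free = list(m_prefs)
    while free:
        m = free.pop(0)
        w = m_prefs[m][nxt[m]]
        nxt[m] += 1
        if w not in engaged:                    # W accepts her first proposal
            engaged[w] = m
        elif rank[w][m] < rank[w][engaged[w]]:  # W breaks her engagement
            free.append(engaged[w])
            engaged[w] = m
        else:
            free.append(m)                      # rejected; proposes again later
    return engaged

print(deferred_acceptance(m_prefs, w_prefs))   # {'w1': 'm2', 'w2': 'm1', 'w3': 'm3'}
```

Running the W-proposing version and comparing would reveal whether this core is a singleton.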

2.14 Social choice theory

Voting rule (Jackson) n individuals must choose among a set A of alternatives. Each individual has a complete, transitive preference relation ≻_i over A that does not have indifference. A voting rule R(≻) ≡ R(≻_1, ..., ≻_n) gives a social welfare ordering over A (which in general allows indifference). The corresponding strict social welfare ranking is P(≻).

Neutrality (Jackson) For A = {a, b} (i.e., |A| = 2), consider ≻ and ≻′ with ≻′_i the reverse of ≻_i for each i. Then R(·) is neutral (over alternatives) iff aR(≻)b ⟺ bR(≻′)a.

Anonymity (Jackson) Let π(·) be a permutation over {1, ..., n}. Define ≻′_i ≡ ≻_{π(i)}. Then R(·) is anonymous (over individuals) iff aR(≻)b ⟺ aR(≻′)b.

Monotonicity (Jackson) Consider ≻ and ≻′ with ≻_j = ≻′_j for all j ≠ i, where a ≻_i b while b ≻′_i a. Then R(·) satisfies monotonicity iff bR(≻)a ⟹ bR(≻′)a. R(·) satisfies strict monotonicity iff bR(≻)a ⟹ bP(≻′)a.

May's Theorem (Jackson) Let A = {a, b} (i.e., |A| = 2). Then R(·) is complete, transitive (which has no bite for |A| = 2), neutral, anonymous, and satisfies strict monotonicity iff it is majority rule. If we replace strict monotonicity with monotonicity, we get that R(·) must be a quota rule.

Unanimity (Jackson) R(·) is unanimous iff a ≻_i b for all i implies aP(≻)b.

Arrow's independence of irrelevant alternatives (Jackson) Consider ≻ and ≻′ with a ≻_i b ⟺ a ≻′_i b for all i. R(·) satisfies AIIA iff aR(≻)b ⟺ aR(≻′)b; i.e., R(≻) over {a, b} only depends on ≻ over {a, b}, not on preferences over any other ("irrelevant") alternatives.

Arrow's Theorem (Jackson; Micro P.S. 4) Let |A| ≥ 3. Then R(·) is complete, transitive, unanimous, and satisfies AIIA iff it is dictatorial (i.e., there is some i such that ∀≻, aP(≻)b ⟺ a ≻_i b).

The theorem is tight; i.e., giving up any of completeness, transitivity, unanimity, or AIIA allows a non-dictatorial R(·).

Condorcet winner (Jackson) a ∈ A is a Condorcet winner iff it is majority preferred to every other alternative.

Condorcet-consistent rule (Jackson) A voting rule that picks the Condorcet winner if there is one.

Gibbard-Satterthwaite Theorem (Jackson) Let 3 ≤ |A| < ∞, and F(·) be a social choice function (note we do not require a full ordering). Then F(·) has range A (implied by unanimity) and is strategy-proof (i.e., DSIC) iff it is dictatorial.

Vickrey-Clarke-Groves mechanism (Jackson) Let θ_i ∈ Θ_i be the type of individual i, and θ̂_i be his announced type. Suppose d(θ) makes ex post efficient decisions. Then a mechanism with transfers

t_i(θ̂) = Σ_{j≠i} u_j(d(θ̂), θ̂_j) + x_i(θ̂_{−i})

for any x_i(·) is dominant-strategy incentive compatible (DSIC). Conversely, if (d, t) is DSIC, d(·) is ex post efficient, and Θ_i is sufficiently rich such that ∀v_i ∈ {v_i : D → R} there exists θ_i such that v_i(·) = u_i(·, θ_i), then t(·) must satisfy the above condition.

Note that Gibbard-Satterthwaite says that a general DSIC mechanism must be dictatorial. However, here we have restricted ourselves to quasilinear preferences, and get a DSIC mechanism without being dictatorial. Although it reaches an efficient decision, it is not always balanced (payments do not total to zero), and hence is not typically overall efficient.

Pivotal mechanism (Jackson) An example of a VCG mechanism where

t_i(θ̂) = Σ_{j≠i} u_j(d(θ̂), θ̂_j) − max_d Σ_{j≠i} u_j(d, θ̂_j).

That is, the transfers are the externality imposed on others by choosing d(θ̂) instead of d(θ̂_{−i}). Ensures feasibility, since transfers are always nonpositive. However, the budget generally doesn't balance—there is a need to "burn money."

3 Macroeconomics

3.1 Models

Two period intertemporal choice model (Nir 1-11) Maximize U(C_0, C_1) subject to C_0 + S_0 ≤ Y_0 and C_1 ≤ RS_0. The constraints can be rewritten C_0 + C_1/R ≤ Y_0.

Neoclassical growth model (discrete time) (Nir 3-3–5, 11-3, 8, 10, 16–28, 12-12–26) (a.k.a. Ramsey model) Maximize Σ_t β^t U(c_t) subject to:

1. c_t + k_{t+1} ≤ F(k_t, n_t) + (1 − δ)k_t for all t (the RHS can be denoted f(k_t, n_t), or f(k_t), since the lack of disutility of working ensures that n_t = 1);
2. k_{t+1} ≥ 0 for all t;
3. c_t ≥ 0 for all t;
4. k_0 is given.

We further assume U(·) satisfies Inada conditions; that the production function is constant returns to scale and satisfies F_k > 0, F_kk < 0, F_n > 0, and F_nn < 0; and the TVC lim_{t→∞} β^t U′(c_t) f′(k_t) k_t = 0.

The problem can be rewritten in "SL canonical form" as maximizing Σ_t β^t U(f(k_t) − k_{t+1}) subject to k_{t+1} ∈ [0, f(k_t)], the given k_0, and conditions on f(·) and U(·) as above. In functional equation form, we write V(k) ≡ max_{k′∈[0,f(k)]} U(f(k) − k′) + βV(k′) (again, with additional conditions as above).

The steady state satisfies βf′(k∗) = 1. Note the utility function does not affect the steady state (although it will affect dynamics).

Linearization of the intereuler

U′(f(k_t) − k_{t+1}) − βf′(k_{t+1}) U′(f(k_{t+1}) − k_{t+2}) = 0

about the steady state gives:

$$\begin{pmatrix} k_{t+2} - k^* \\ k_{t+1} - k^* \end{pmatrix} \approx \begin{pmatrix} 1 + \frac{1}{\beta} + \beta \frac{U'(c^*)}{U''(c^*)} f''(k^*) & -\frac{1}{\beta} \\ 1 & 0 \end{pmatrix} \begin{pmatrix} k_{t+1} - k^* \\ k_t - k^* \end{pmatrix}$$

The optimal policy function g(k) satisfies:

1. g(·) is single-valued, since the value function is strictly concave.
2. g(0) = 0, since that is the only feasible choice.
3. g(k) is in the interior of Γ(k), since otherwise exactly one of U′(c) and U′(c′) would be infinite, violating the Intereuler condition.
4. g′(k) > 0, since as k increases, the marginal cost of saving goes down, while the marginal benefit stays the same.
5. There is a unique k∗ > 0 such that g(k∗) = k∗.
6. k > k∗ ⟹ k > g(k) > k∗, and k < k∗ ⟹ k < g(k) < k∗; i.e., capital moves closer to (without crossing over) the steady-state level.
7. The sequence k_0, g(k_0), g(g(k_0)), ... displays monotonic and global convergence.

Endowment economy, Arrow-Debreu (Nir 13-4–6) At every period the household gets y_t units of the good; there is no storage, and trading in all commodities (goods delivered in each period) takes place in period 0. Maximize Σ_t β^t U(c_t) subject to Σ_t p_t c_t ≤ Σ_t p_t y_t (budget constraint), and ensure y_t = c_t (market clearing).

Endowment economy, sequential markets (Nir 13-7–9) At every period the household gets y_t units of the good; there is no storage, and trading in "assets" (loans across periods) takes place each period. Maximize Σ_t β^t U(c_t) subject to c_t + a_{t+1} ≤ y_t + R_t a_t (budget constraint), and ensure y_t = c_t, a_{t+1} = 0 (market clearing).
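Returning to the pivotal mechanism in §2.14 above: a minimal implementation, assuming a hypothetical binary public decision d ∈ {0, 1} and quasilinear valuations; only the pivotal agent pays, and transfers are nonpositive:

```python
# Hypothetical announced valuations v[i][d] for decision d in {0, 1}
v = {"a": {0: 0.0, 1: 4.0},
     "b": {0: 0.0, 1: -1.0},
     "c": {0: 0.0, 1: -2.0}}

def pivotal(v):
    decisions = (0, 1)
    d_star = max(decisions, key=lambda d: sum(vi[d] for vi in v.values()))
    transfers = {}
    for i in v:
        others = lambda d: sum(vj[d] for j, vj in v.items() if j != i)
        # externality imposed on others by the chosen decision
        transfers[i] = others(d_star) - max(others(d) for d in decisions)
    return d_star, transfers

print(pivotal(v))   # (1, {'a': -3.0, 'b': 0.0, 'c': 0.0}): only "a" is pivotal
```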

Production economy, Arrow-Debreu (Nir 13-10–6) The household owns capital rented for production by competitive firms, which also rent labor from households. Capital depreciates at rate δ. There is no storage, and trading in all commodities (goods delivered, capital rental, and labor in each period) takes place in period 0.

1. Households maximize Σ_t β^t U(c_t) subject to Σ_t p_t[c_t + k_{t+1}] ≤ Σ_t p_t[r_t k_t + (1 − δ)k_t + n_t w_t].
2. Firms maximize (in each period) p_t F(k_t, n_t) − p_t r_t k_t − p_t w_t n_t.
3. Markets clear: c_t + k_{t+1} = F(k_t, n_t) + (1 − δ)k_t.

Note the rental rate of capital (r_t) and the wage rate (w_t) are measured in units of the consumption good.

Production economy, sequential markets (Nir 13-25) As above, the household owns capital rented for production by competitive firms, which also rent labor from households. Capital depreciates at rate δ. There is no storage, and each period, goods delivered, capital rental, and labor are traded. Note there is no inter-period trading, so we do not need a price for goods.

1. Households maximize Σ_t β^t U(c_t) subject to c_t + k_{t+1} ≤ r_t k_t + (1 − δ)k_t + n_t w_t.
2. Firms maximize (in each period) F(k_t, n_t) − r_t k_t − w_t n_t.
3. Markets clear: c_t + k_{t+1} = F(k_t, n_t) + (1 − δ)k_t.

Real Business Cycles model (Nir 18-7–14) The benchmark RBC model is the NCGM with endogenous hours, ownership of firms and capital by households, and stochastic productivity A_t multiplying labor (so the production function is F(k_t, A_t n_t)). We specified:

1. CRRA utility: U(c, n) = (c^{1−σ} − 1)/(1 − σ) − n^{1+χ}/(1 + χ);
2. Cobb-Douglas production: y_t = A_t k_t^α n_t^{1−α};
3. Productivity given by log A_t = ρ log A_{t−1} + ε_t with ε_t distributed iid normal.

Formulated recursively (assuming n = 1):

1. Households have V(k, K) = max_{c,k′} [U(c) + βV(k′, K′)] subject to c + k′ = R(K)k + W(K), where K is aggregate capital, and R(K) and W(K) are the rental and wage rates as functions of aggregate capital. We further require the "forecast" of future capital to be rational: K′ = G(K), where G(·) is the optimal aggregate policy function.
2. Firms maximize F(K) − RK − W, which gives FOCs R(K) = F_k(K) + 1 − δ and W(K) = F_n(K).
3. Markets clear: C + K′ = F(K) + (1 − δ)K.
4. Consistency: G(K) = g(K, K).

Optimal taxation—primal approach (Nir 10) Solve the HH problem and combine the FOC and budget constraint with the government budget constraint to get one expression that ties together allocations but has no prices or taxes. Then the government maximizes HH utility over allocations subject to this constraint.

Note that in the case with capital,∗ the problem is not stationary; i.e., we cannot use our typical dynamic programming tools, and need to worry about the government's ability to commit to a future tax policy.

3.2 Imperfect competition

Imperfect competition model (Nir 21) Consider two stages of production: the final good manufacturer operates competitively, purchasing inputs from intermediate good producers, who are small enough to take general price levels and aggregate demand as given, but whose products are imperfect substitutes in the production process.

Imperfect competition: final goods (Nir 21) The final good producer has Dixit-Stiglitz production technology:

Y = [∫_0^1 Q_j^ρ dj]^{1/ρ},

which is constant elasticity of substitution (1/(1 − ρ) between all pairs of inputs) and constant returns to scale. ρ ∈ (0, 1), with ρ → 1 corresponding to a competitive market where all inputs are perfect substitutes.

Conditional factor demands are therefore

Q_j(Y) = [p_j/λ]^{1/(ρ−1)} Y = [p_j/P]^{1/(ρ−1)} Y,

where the Lagrange multiplier λ on [∫_0^1 Q_j^ρ dj]^{1/ρ} ≥ Y also satisfies λ = E∗/Y ≡ P, the (aggregate) average cost. Solving for this gives

λ = P = [∫_0^1 p_j^{ρ/(ρ−1)} dj]^{(ρ−1)/ρ}.

The own-price elasticity of demand is η ≡ 1/(ρ − 1), ignoring the effects through P.

Imperfect competition: intermediate goods (Nir 21) Intermediate goods producers have identical production technologies (affected by the same shocks): Q_j = z k_j^α n_j^{1−α} − φ, where φ > 0 is an overhead cost (since there aren't really economic profits). Has constant marginal cost, and therefore increasing returns to scale (because of φ).

The monopolistic problem is to maximize over prices pQ(p) − c(Q(p)); the first-order condition gives markup ≡ p/MC = η/(1 + η) = 1/ρ.

3.3 General concepts

Cobb-Douglas production function (Macro P.S. 1) F(K, N) = zK^α N^{1−α}. Constant returns to scale. Fraction α of output goes to capital, and 1 − α to labor.

Constant relative risk aversion utility function (Nir 14-15, 18-72–4) U(c) = (c^{1−σ} − 1)/(1 − σ), with σ > 0 (reduces to log utility as σ → 1;† empirically we expect σ ∈ [1, 5]). Relative risk aversion is σ. Higher risk aversion also corresponds to a higher desire to smooth consumption over time.

Additive separability with stationary discounting (Nir 1-24) The general assumption that U(C_0, C_1) = u(C_0) + βu(C_1) with β ∈ (0, 1).

Intertemporal Euler equation (Nir 1-28, 11-5) u′(c_t) = βRu′(c_{t+1}), where R is the return on not-consuming (e.g., f′(k_{t+1}) ≡ F_k(k_{t+1}, n_{t+1}) + (1 − δ)). This first-order difference equation is given by the FOCs of the maximization problem for two consecutive time periods.

Intertemporal elasticity of substitution (Nir 1-31, Max notes) −d log(C_0/C_1) / d log R = −d log(C_0/C_1) / d log(U′(C_0)/U′(C_1)). For the CRRA utility function U(c) = c^{1−σ}/(1 − σ), the intereuler is C_0^{−σ} = βC_1^{−σ}R, so IEoS = σ^{−1} = RRA^{−1}.

General equilibrium (Nir 2-11) A competitive equilibrium is a set of allocations and prices such that

1. Actors (i.e., households, firms, ...) maximize their objectives;
2. Markets clear; and
3. Resource constraints are satisfied.

Overlapping generations (Nir 17-15–6, 17-32) In an endowment OLG economy, a competitive equilibrium is Pareto optimal iff Σ_{t=0}^∞ 1/p_t∗ = ∞, where the interest rate is R_t = p_t/p_{t+1}. In an OLG economy with production, a competitive equilibrium is dynamically efficient iff F_K(K∗) + 1 − δ ≥ 1.

Inada conditions (Nir 3-5) We generally assume utility satisfies Inada conditions, which imply that the nonnegativity constraint on consumption will not bind:

1. U(0) = 0;
2. U is continuously differentiable;
3. U′(x) > 0 (U is strictly increasing);
4. U″(x) ≤ 0 (U is strictly concave);
5. lim_{x→0+} U′(x) = ∞;
6. lim_{x→∞} U′(x) = 0.

∗We generally need to impose restrictions on first-period taxation, since otherwise the government will tax capital very highly then (since it’s supplied perfectly inelastically). †Note the Taylor approximation of c1−σ − 1 about σ = 1 is c1−σ − 1 ≈ (c0 − 1) − c0(log c)(σ − 1) = (1 − σ) log c. Thus as σ → 1, we have U(c) = (c1−σ − 1)/(1 − σ) ≈ log c.
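A quick numerical check of the limit in footnote † (a minimal Python sketch; the test value c = 2.5 is an arbitrary assumption):

import math

def crra(c, sigma):
    # CRRA utility (c^(1-sigma) - 1)/(1-sigma), with log utility as the sigma -> 1 limit
    if abs(sigma - 1.0) < 1e-12:
        return math.log(c)
    return (c**(1.0 - sigma) - 1.0) / (1.0 - sigma)

c = 2.5  # arbitrary test value
for sigma in [1.5, 1.1, 1.01, 1.001]:
    print(sigma, crra(c, sigma))   # values approach log(2.5) ~ 0.9163
print("log limit:", math.log(c))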

Inada conditions (Nir 3-5) We generally assume utility satisfies the Inada conditions, which imply that the nonnegativity constraint on consumption will not bind:

1. U(0) = 0;
2. U is continuously differentiable;
3. U'(x) > 0 (U is strictly increasing);
4. U''(x) ≤ 0 (U is concave);
5. lim_{x→0+} U'(x) = ∞;
6. lim_{x→∞} U'(x) = 0.

Transversality condition (Macro P.S. 2) In the NCGM, lim_{t→∞} β^t U'(c_t) f'(k_t) k_t = 0.

No Ponzi condition (Max notes) Credit constraint on agents, which should never bind but suffices to prevent the optimality of a "doubling" strategy. For example, ∀t, a_t ≥ −k̄ for some k̄.

Dynamic systems (Nir 12-4–11, Macro P.S. 5-4) Let the sequence {x_t} evolve according to x_{t+1} = W x_t. Using the eigen decomposition W = PΛP^{−1} gives the "decoupled system" P^{−1} x_{t+1} = ΛP^{−1} x_t. Then if x̃_t ≡ P^{−1} x_t, we have x̃_{ti} = λ_i^t x̃_{0i} and hence x_t = PΛ^t x̃_0. For the 2 × 2 case, if

1. |λ_1| > 1 and |λ_2| > 1, the system "explodes" unless x̃_{0,1} = x̃_{0,2} = 0 (source);
2. |λ_1| < 1 and |λ_2| < 1, the system converges for any x̃_{0,1} and x̃_{0,2} (sink);
3. |λ_1| < 1 and |λ_2| > 1, the system converges for any x̃_{0,1} as long as x̃_{0,2} = 0 (saddle path).

The speed of convergence for a state variable equals one minus the slope of the optimal policy function (1 − g'(k)).

First Welfare Theorem (Nir 13-27) The competitive equilibrium is Pareto optimal. So if we know our solution to the social planner's problem is the unique Pareto optimal solution, we know it must equal the competitive equilibrium.

Ramsey equilibrium (Nir 20-3, final solutions) A Ramsey equilibrium is allocations, prices, and taxes such that:

1. Households maximize utility subject to their budget constraints, taking prices and taxes as given; and
2. Government maximizes households' utility while financing government expenditures (i.e., meeting its budget constraint).

3.4 Dynamic programming mathematics

Metric space (Nir 6-3–4) A set S and a distance function ρ: S × S → R such that:

1. ∀x, y ∈ S, ρ(x, y) ≥ 0;
2. ∀x, y ∈ S, ρ(x, y) = 0 ⟺ x = y;
3. ∀x, y ∈ S, ρ(x, y) = ρ(y, x);
4. Triangle inequality: ∀x, y, z ∈ S, ρ(x, z) ≤ ρ(x, y) + ρ(y, z).

Normed vector space (Nir 6-5–6) A vector space S and a norm ‖·‖: S → R such that:

1. ∀x ∈ S, ‖x‖ ≥ 0;
2. ∀x ∈ S, ‖x‖ = 0 ⟺ x = 0;
3. Triangle inequality: ∀x, y ∈ S, ‖x + y‖ ≤ ‖x‖ + ‖y‖;
4. Scalar multiplication: ∀x ∈ S, ∀α ∈ R, ‖αx‖ = |α| ‖x‖.

Note any normed vector space induces a metric space with ρ(x, y) ≡ ‖x − y‖.

Supremum norm (R^n) (Nir 6-7–8) ‖·‖_s: R^n → R with ‖x‖_s ≡ sup_{i=1,...,n} |x_i|.

Euclidean norm (Nir 6-9) ‖·‖_E: R^n → R with ‖x‖_E ≡ √(Σ_{i=1}^n |x_i|²).

Continuous function (Nir 6-10; Micro math 3) f: S → R is continuous at x iff ∀y ∈ S and ∀ε > 0, there exists δ > 0 such that ‖y − x‖ < δ ⟹ |f(y) − f(x)| < ε. Equivalently, iff for all sequences x_n converging to x, the sequence f(x_n) converges to f(x).

Supremum norm (real-valued functions) (Nir 6-7–8) Let C(X) denote the set of bounded, continuous functions from X to R. Then ‖·‖_s: C(X) → R with ‖f‖_s ≡ sup_{x∈X} |f(x)|.

Convergent sequence (Nir 7-4) The sequence {x_i}_{i=0}^∞ in S converges to x ∈ S (or equivalently, the sequence has limit x) iff ∀ε > 0, there exists n such that i > n ⟹ ‖x_i − x‖ < ε. In other words, there is a point in the sequence beyond which all elements are arbitrarily close to the limit.

Cauchy sequence (Nir 7-9) The sequence {x_i}_{i=0}^∞ in S is Cauchy iff ∀ε > 0, there exists n such that i > n and j > n together imply ‖x_i − x_j‖ < ε. In other words, there is a point in the sequence beyond which all elements are arbitrarily close to each other. Every convergent sequence is Cauchy (shown via the triangle inequality), but not every Cauchy sequence is convergent.

Complete metric space (Nir 7-10–16) A metric space (S, ρ) is complete iff every Cauchy sequence in S converges to some point in S. Importantly, if C(X) is the set of bounded, continuous functions from X to R, and ‖·‖_s: C(X) → R is the sup norm, then (C(X), ‖·‖_s) is a complete metric space.

Contraction (Nir 8-3–4) If (S, ρ) is a metric space, the operator T: S → S is a contraction of modulus β ∈ (0, 1) iff ∀x, y ∈ S, we have ρ(Tx, Ty) ≤ βρ(x, y). That is, T brings any two elements closer together. Every contraction is a continuous mapping.

Contraction Mapping Theorem (Nir 8-5) [a.k.a. Banach fixed point theorem.] If T is a contraction on a complete metric space, then:

1. T has exactly one fixed point V* such that TV* = V*; and
2. The sequence {V_i} where V_{i+1} = TV_i converges to V* from any starting V_0.

Contraction on subsets (Nir 8-11) Suppose T is a contraction on a complete metric space (S, ρ), with fixed point V* = TV*. Further suppose Y ⊆ S is a closed set, that Z ⊆ Y, and that ∀y ∈ Y, Ty ∈ Z. Then V* ∈ Z.

Blackwell's sufficient conditions for a contraction (Nir 8-12–3) Blackwell gives sufficient (not necessary) conditions for an operator to be a contraction on the metric space B(R^n, R) (the set of bounded functions R^n → R) with the sup norm. An operator T is a contraction if it satisfies:

1. Monotonicity: if ∀x, f(x) ≤ g(x), then ∀x, Tf(x) ≤ Tg(x); and
2. Discounting: there exists β ∈ (0, 1) such that for all α ≥ 0, f(·), and x, we have T(f(x) + α) ≤ T(f(x)) + βα [slight abuse of notation].

Principle of Optimality (Nir 9) Under certain conditions, the solutions to the following two problems are the same:

1. Sequence problem: W(x_0) = max_{{x_{t+1}}} Σ_{t=0}^∞ β^t F(x_t, x_{t+1}) such that x_0 is given and ∀t, x_{t+1} ∈ Γ(x_t).
2. Functional equation: V(x) = max_{x'∈Γ(x)} [F(x, x') + βV(x')].

Assume:

1. Γ(x) is nonempty for all x;
2. For all initial conditions x_0 and feasible plans {x_t}, the limit u({x_t}) ≡ lim_{n→∞} Σ_{t=0}^n β^t F(x_t, x_{t+1}) exists (although it may be ±∞).

Then:

1. If W(x_0) is the supremum over feasible {x_t} of u({x_t}), then W satisfies the FE.
2. Any solution V to the FE that satisfies the boundedness condition lim_{n→∞} β^n V(x_n) = 0 is a solution to the SP.
3. Any feasible plan {x_t*} that attains the supremum in the SP satisfies W(x_t*) = F(x_t*, x_{t+1}*) + βW(x_{t+1}*) for all t.
4. Any feasible plan {x_t*} that satisfies W(x_t*) = F(x_t*, x_{t+1}*) + βW(x_{t+1}*) for all t and lim sup_{t→∞} β^t W(x_t*) ≤ 0 attains the supremum in the SP.

So the approach is: solve the FE; pick the unique solution that satisfies the boundedness condition; construct a plan from the policy corresponding to this solution; and check the limit condition to make sure this plan is indeed optimal for the SP.
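The CMT is the engine behind value function iteration: applying T repeatedly from any initial guess converges to the unique fixed point. A minimal NumPy sketch, assuming log utility and full depreciation (chosen only because that case has the known closed-form policy k' = αβk^α; grid and parameter values are illustrative):

import numpy as np

# Value function iteration for V(k) = max_{k'} [log(k^alpha - k') + beta*V(k')].
alpha, beta = 0.3, 0.95             # illustrative parameter values
grid = np.linspace(0.05, 0.5, 200)  # capital grid (an assumption of the sketch)
V = np.zeros(len(grid))             # any starting V0 works, by the CMT
for _ in range(1000):
    c = grid[:, None]**alpha - grid[None, :]        # consumption for each (k, k')
    util = np.where(c > 0, np.log(np.where(c > 0, c, 1.0)), -np.inf)
    V_new = (util + beta * V[None, :]).max(axis=1)  # apply the Bellman operator T
    if np.max(np.abs(V_new - V)) < 1e-8:            # sup-norm convergence
        break
    V = V_new
policy = grid[(util + beta * V[None, :]).argmax(axis=1)]
print(np.max(np.abs(policy - alpha * beta * grid**alpha)))  # small, up to grid error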

Dynamic programming: bounded returns (Nir 10) Given the SP/FE as above, assume:

1. x takes on values in a convex subset of R^l, and Γ(x) is nonempty and compact-valued for all x. (This implies assumption 1 above.)
2. The function F is bounded and continuous; β ∈ (0, 1). (Together with assumption 1, this implies assumption 2 above.)
3. For each x', the function F(·, x') is increasing in each of its first arguments.
4. Γ(·) is monotone; i.e., x_1 ≤ x_2 ⟹ Γ(x_1) ⊆ Γ(x_2).
5. F is strictly concave.
6. Γ is convex; i.e., if x_1' ∈ Γ(x_1) and x_2' ∈ Γ(x_2), then θx_1' + (1 − θ)x_2' ∈ Γ(θx_1 + (1 − θ)x_2).
7. F is continuously differentiable on the interior of the set on which it is defined.

Then:

1. Under assumptions 1 and 2, the operator T where TV(x) ≡ max_{x'∈Γ(x)} [F(x, x') + βV(x')] maps bounded continuous functions into bounded continuous functions, is a contraction (and therefore has a unique fixed point), and is the value function for the corresponding FE.
2. Under assumptions 1–4, the value function V (unique fixed point of T as defined above) is strictly increasing.
3. Under assumptions 1–2 and 5–6, the value function V is strictly concave, and the corresponding optimal policy correspondence g is a continuous function.
4. Under assumptions 1–2 and 5–7, the value function V is differentiable (by Benveniste-Scheinkman) and satisfies the envelope condition V'(x_0) = ∂F/∂x |_{(x_0, g(x_0))}.

Benveniste-Scheinkman Theorem (Nir 10-25) Suppose:

1. X ⊆ R^l is a convex set;
2. V: X → R is a concave function;
3. x_0 ∈ interior(X), and D is a neighborhood of x_0 with D ⊆ X;
4. Q: D → R is a concave differentiable function with Q(x_0) = V(x_0) and ∀x ∈ D, Q(x) ≤ V(x).

Then V(·) is differentiable at x_0 with V'(x_0) = Q'(x_0). The envelope condition of a value function is sometimes called the Benveniste-Scheinkman condition.

3.5 Continuous time

Dynamic systems in continuous time (Nir 14-2–3, 7) In R, the system ẋ_t = ax_t (i.e., dx/dt = ax ⟺ ∫ dx/x = ∫ a dt ⟺ log x_t = at + c ⟺ x_t = x_0 e^{at}) converges iff Re(a) < 0. In R^n, the system ẋ_t = Ax_t decouples to x̃̇_t = Λx̃_t, where x̃_t ≡ P^{−1}x_t and the eigen decomposition is A = PΛP^{−1}. Therefore x̃_t = e^{Λt} x̃_0 = diag(e^{λ_1 t}, ..., e^{λ_n t}) x̃_0. For the 2 × 2 case (recall det A = Π λ_i and tr A = Σ λ_i): det A < 0 gives a saddle path; det A > 0 and tr A > 0 is unstable; det A > 0 and tr A < 0 is a sink.

Linearization in continuous time (Nir 14-6, Max notes) ẋ_t = f(x_t) ⟹ ẋ_t ≈ D_f(x_*)(x_t − x_*). [Cf. discrete time: x_{t+1} = g(x_t) ⟹ x_{t+1} − x_* ≈ D_g(x_*)(x_t − x_*).]

NCGM in continuous time (Nir 14-8–15) Maximize ∫_0^∞ e^{−ρt} U(c_t) dt such that k̇_t = W_t + R_t k_t − c_t − δk_t. The Hamiltonian is

H ≡ e^{−ρt} [U(c_t) + λ_t (W_t + R_t k_t − c_t − δk_t)] ≡ e^{−ρt} U(c_t) + μ_t (W_t + R_t k_t − c_t − δk_t),

where the term multiplying the costate equals k̇_t (and μ_t ≡ e^{−ρt} λ_t). First order conditions ∂H/∂c = 0 and ∂H/∂k = −(d/dt)(e^{−ρt} λ_t) = e^{−ρt}(ρλ_t − λ̇_t) (or equivalently, ∂H/∂k = −μ̇_t), along with the TVC lim_{t→∞} e^{−ρt} λ_t k_t = 0 (or equivalently, lim_{t→∞} μ_t k_t = 0), characterize the solution. Solving gives:

ċ_t = [U'(c_t)/U''(c_t)] (δ − R_t + ρ),
k̇_t = W_t + R_t k_t − c_t − δk_t, where W_t + R_t k_t = y_t by CRS.

Imposing functional forms for U(·) and F(·) and log-linearizing allows us to get a continuous time dynamic system for k̂_t and ĉ_t.

Log-linearization (Nir 14-24, Max notes) Write every variable x as e^{log x} and then linearize in log x about the steady state log x_*. Use notation x̂ ≡ log x − log x_* ≈ (x − x_*)/x_*. Then:

1. x ≈ x_*(1 + x̂);
2. xy ≈ x_* y_* (1 + x̂ + ŷ);
3. x^α y^β ≈ x_*^α y_*^β (1 + αx̂ + βŷ);
4. f(x) ≈ f(x_*) + f'(x_*) x_* x̂ = f(x_*)(1 + ηx̂), where η is the elasticity of f at x_*.

Given Y = F(K, L), log-linearization gives Ŷ = η_{YK} K̂ + η_{YL} L̂, where η_{YX} is the elasticity of Y with respect to X: η_{YX} ≡ F_X(K_*, L_*) X_* / F(K_*, L_*). Note that x̂̇ ≡ (d/dt)(log x − log x_*) = (d/dt) log x = ẋ/x.

3.6 Uncertainty

Uncertainty: general setup (Nir 16-3–6) s ≡ {y_1, ..., y_n} is the set of possible states of the economy; s_t is the realization at time t; s^t ≡ (s_0, ..., s_t) is the "history" at time t; π(·) is the probability function. Often we assume a Markov process: π(s_{t+1}|s^t) = π(s_{t+1}|s_t); i.e., only the previous period matters. The process is described by a transition matrix P where P_ij = π(s_{t+1} = y_j | s_t = y_i), with Σ_j P_ij = 1 for all i. The "invariant distribution" is the eigenvector π_{1×n} = πP.

Lucas tree (Nir 16-22–4) Assume a von Neumann-Morgenstern utility function, thus objective function Σ_{t=0}^∞ Σ_{z^t} β^t π(z^t) U(c_t(z^t)) = E_0 Σ_{t=0}^∞ β^t U(c_t(z^t)). The price (at time 0) of a claim that delivers one unit at time t given history z^t is p_t(z^t). The (Arrow-Debreu) budget constraint is

Σ_{t=0}^∞ Σ_{z^t} p_t(z^t) c_t(z^t) ≤ Σ_{t=0}^∞ Σ_{z^t} p_t(z^t) y_t(z^t),

where y_t is a (stochastic) endowment. Normalizing p_0 = 1 gives FOC

p_t(z^t) = β^t π(z^t) U'(y_t(z^t))/λ = β^t π(z^t) U'(y_t(z^t))/U'(y_0).

Risk-free bond pricing (Nir 16-25–6) An asset purchased at time t that pays 1 consumption unit at time t + 1. The price is

q_t^{RF}(z^t) = Σ_{z_{t+1}} p_{t+1}(z_{t+1}, z^t) / p_t(z^t)
             = β Σ_{z_{t+1}} [π(z_{t+1}, z^t)/π(z^t)] [U'(y_{t+1}(z_{t+1}, z^t))/U'(y_t(z^t))]
             = β E_t[U'(y_{t+1}(z_{t+1}, z^t))] / U'(y_t(z^t)).
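A minimal sketch of the objects above, assuming an illustrative two-state chain and CRRA marginal utility u'(c) = c^{−σ} (all numbers are hypothetical):

import numpy as np

beta, sigma = 0.95, 2.0
y = np.array([0.9, 1.1])                 # endowment in each state
P = np.array([[0.8, 0.2],
              [0.3, 0.7]])               # P[i, j] = Pr(s' = j | s = i); rows sum to 1

# Invariant distribution pi = pi P, i.e., the eigenvalue-1 left eigenvector of P.
vals, vecs = np.linalg.eig(P.T)
pi = np.real(vecs[:, np.argmax(np.real(vals))])
pi = pi / pi.sum()
print("invariant distribution:", pi)     # [0.6 0.4] for these numbers

# Risk-free bond price in each current state: q(s) = beta * E[u'(y') | s] / u'(y(s)).
q = beta * (P @ y**(-sigma)) / y**(-sigma)
print("bond price by state:", q)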

Stock pricing (Nir 16-26–7) An asset purchased at time t that pays dividend d_t(z^t). The price is

q_t^{tree}(z^t) = [Σ_{s=t+1}^∞ Σ_{z^s} p_s(z^s) d_s(z^s)] / p_t(z^t)
               = E_t Σ_{s=t+1}^∞ β^{s−t} [U'(y_s(z^s))/U'(y_t(z^t))] d_s(z^s).

3.7 Manuel Amador

Lucas cost of business cycles (Manuel) We find that the welfare impact of eliminating the business cycle is very small (< 0.04%), especially compared to the impact of raising the growth rate of consumption. Complaints include the facts that:

1. A representative agent is bad for measuring utility and for estimating the variance in consumption.
2. The model treats shocks as transitory; treating consumption as persistent results in significantly higher "cost" estimates.
3. Uncertainty could also affect the consumption growth rate, which requires an endogenous growth model (not the NCGM).

Incomplete markets: finite horizon (Manuel) Suppose a two period model with uncertain income in the second period, and incomplete markets in which only a risk-free bond is available at return R. The consumer chooses savings a to maximize u(y_0 − a) + β E u(y_1 + Ra), yielding FOC (intereuler)

u'(y_0 − a*) = βR E u'(y_1 + Ra*).

If instead there were no uncertainty in the second period, the analogous FOC would be

u'(y_0 − â) = βR u'(E y_1 + Râ).

By Jensen's inequality, optimal savings is higher in the uncertain world (a* > â) as long as u''' > 0. Thus in a two period model, the difference between savings in complete and incomplete markets depends on the sign of u'''.

Incomplete markets: infinite horizon (Manuel) Again, we suppose uncertain income, and the availability of only a risk-free bond. Taking the FOC and envelope condition of

V(x) = max_{a∈[φ,x]} [u(x − a) + Σ_{s∈S} βπ(s) V(Ra + y(s))]

gives that V'(x) ≥ βR E[V'(x')]. Thus if βR ≥ 1, V'(x) is a nonnegative supermartingale and converges to a finite value by Doob's Convergence Theorem. Thus x converges, but cannot converge to a finite value since x' = Ra*(x) + y, the RHS of which is stochastic. So x → ∞. Thus in an infinite-horizon model, complete and incomplete markets with βR = 1 necessarily look different.

Hall's martingale hypothesis (Manuel) Suppose u(c) = Ac − Bc² (which may be a second-order Taylor expansion of the true utility function), and that only a risk-free bond is available, with Rβ = 1. The intereuler gives that E c_{t+1} = c_t; that is, {c_t} follows an AR(1) process. However, many models have c today as a predictor of c tomorrow, even without the same intereuler (e.g., the Aiyagari model). Tests are always rejected, but are generally run with aggregate data, with resulting problems of:

1. Goods aggregation (i.e., how do you define consumption given a basket of goods?);
2. Agent aggregation (i.e., even if the intereuler holds for each of heterogeneous agents, it may not hold in aggregate);
3. Time aggregation (i.e., how do you deal with data that comes in discrete time period "chunks"?).

The presence of durable goods introduces an MA component to {c_t}, making it an ARMA process.

One-sided lack of commitment (Manuel) One reason markets might be incomplete is a lack of commitment. Suppose a risk-averse consumer can borrow from a risk-neutral lender, but can't commit to repayment. Suppose Rβ = 1; the lender takes the stochastic endowment and gives the consumer consumption c(s) today, promising the consumer future lifetime value w(s). Thus the value to the lender is

P(v) = max_{{c(s),w(s)}_s} Σ_{s∈S} π(s)[y(s) − c(s) + βP(w(s))]

such that

[PK (μ)]: Σ_{s∈S} π(s)[u(c(s)) + βw(s)] ≥ v,
[IC (λ(s))]: u(c(s)) + βw(s) ≥ u(y(s)) + β v_autarky, where v_autarky ≡ E u(y(s'))/(1 − β).

FOCs and the envelope condition give that 1/u'(c(s')) = 1/u'(c(s)) + λ(s'), so consumption is weakly increasing over time. This is not consistent with G.E., where we therefore must have Rβ < 1.

Two-sided lack of commitment (Manuel) [Kocherlakota] Suppose two risk-averse consumers who each get a stochastic endowment with no aggregate shock. Neither can commit to remain in an insurance contract. The value to the insurer in excess of autarky is

Q(Δ_0, s_0) = max_{c,{Δ(s)}_s} u(1 − c) − u(1 − y(s_0)) + β Σ_{s∈S} π(s) Q(Δ(s), s)

such that

[PK (μ)]: u(c) − u(y(s_0)) + β Σ_{s∈S} π(s)Δ(s) ≥ Δ_0,
[IC-1 (βπ(s)λ(s))]: Δ(s) ≥ 0,
[IC-2 (βπ(s)θ(s))]: Q(Δ(s), s) ≥ 0.

FOCs and the envelope condition give that Q'(Δ_0, s_0) = λ(s) + (1 + θ(s)) Q'(Δ(s), s); thus either consumption stays the same, increases to the lowest level that satisfies IC-1, or falls to the highest level that satisfies IC-2.

Bulow-Rogoff defaultable debt model (Manuel) An agent (who need only have monotone preferences) cannot commit to repay a risk-neutral insurer; if he defaults, he will only be able to save in a complete-markets "Swiss bank account" that pays expected return R. If wealth

W(s^t) ≡ Σ_{τ≥t} Σ_{s^τ} π(s^τ|s^t) y(s^τ)/R^{τ−t}

is finite for all s^t, and we impose a no Ponzi condition (natural borrowing constraint) that debt

D(s^t) ≡ Σ_{τ≥t} Σ_{s^τ} π(s^τ|s^t) p(s^τ)/R^{τ−t} ≤ W(s^t)

for all s^t, then debt will always be nonpositive: if the agent were ever a borrower, he would do better to defect. Thus to get (e.g., international) debt, we need either:

1. Infinite wealth in some state s^t,
2. A limited ability to save in nearly-complete markets, or
3. Lenders' ability to punish.

Hyperbolic discounting (Manuel) The agent has a higher discount rate between today and tomorrow than between tomorrow and following periods (time-inconsistent). Often parameterized as "β-δ discounting":

u(c_t) + β Σ_{τ=1}^∞ δ^τ u(c_{t+τ}).

Caution: suppose an agent only likes candy bars when he's healthy. If he knows he's healthy today, he may prefer 1 today to 2 tomorrow, but 2 the day after tomorrow to 1 tomorrow. This may wind up looking like hyperbolic discounting, but isn't.
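A minimal numerical illustration of a β-δ preference reversal (payoffs valued linearly; all parameter and payoff values are hypothetical):

beta, delta = 0.5, 0.95

today_1    = 1.0                    # 1 unit today is undiscounted
tomorrow_2 = beta * delta * 2.0     # a t+1 payoff is discounted by beta*delta
print(today_1 > tomorrow_2)         # True: takes 1 today over 2 tomorrow

later_1 = beta * delta * 1.0        # 1 at t+1, valued from today
later_2 = beta * delta**2 * 2.0     # 2 at t+2, valued from today
print(later_2 > later_1)            # True: from today, prefers waiting for 2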

3.8 John Taylor

Taylor Rule (Taylor) A policy rule (or family of policy rules, given different coefficients) for the nominal interest rate as a function of (four-quarter average) inflation and the GDP output gap (expressed as a percentage deviation from trend):

i = π + 0.5y + 0.5(π − 2) + 2 = 1.5π + 0.5y + 1.

The "greater than one" principle suggests that the coefficient on inflation (here 1.5) should exceed one.

Granger causality (Taylor) p_t Granger causes y_t iff the error from predicting y_t based on lagged y and p, σ²_prediction(y_t | p_{t−1}, p_{t−2}, ..., y_{t−1}, y_{t−2}, ...), is less than the prediction error from predicting y_t based on lagged y only, σ²_prediction(y_t | y_{t−1}, y_{t−2}, ...). That is, the hypothesis that p does not Granger cause y is the hypothesis that the coefficients on lagged ps are all zero. Note this is not a "philosophical" point, and does not address causation vs. correlation issues.

Okun's Law (Taylor) (Y − Y*)/Y = −2.5(u − u*), where Y* is potential GDP and u* is the natural rate of unemployment.

Philips Curve (Taylor) Connection between inflation and either the GDP gap or unemployment (by Okun's Law).

1. Traditional (short-run): π = φ_sr y = −2.5 φ_sr (u − u*).
2. Expectations-augmented (long-run): π = π̂ + φ_lr y = π̂ − 2.5 φ_lr (u − u*).

Cagan money demand (Taylor) m_t − p_t = a y_t − b i_t, where money supply m, price level p, and output gap y are all in logs. The LHS is the real money balance; when the interest rate is higher, demand for money is lower (semi-log specification). Note Lucas prefers the log-log specification m_t − p_t = a y_t − b log(i_t).

Rational expectations models (Taylor) Variables in the system are a function of exogenous shocks, lags, and expected future values of variables. General solution method:

1. Set up in vector notation (may require time shifting and substitution).
2. Use the method of undetermined coefficients to get a (deterministic) difference equation in the coefficients (γ_i) on the MA(∞) representation of the variables.
3. Guess a form for the particular solution, and solve for the coefficients.
4. Eigen decompose the coefficient matrix (A = HΛH^{−1}) in the homogeneous part of the deterministic difference equation to get a decoupled equation (in μ_i ≡ H^{−1}γ_i H).
5. Apply the stability condition to pin down some element(s) of μ_0.
6. Use γ_0 = Hμ_0 + μ_0 as another equation to solve for the solution.

Cagan model (Taylor) Rational expectations model given by

m_t − p_t = −β(p̂_{t+1} − p_t).

Derived from the Cagan money demand equation with:

1. Either a zero coefficient on y_t, or y_t = 0 for all t; and
2. r_t ignored in i_t = r_t + π̂_{t+1} = r_t + p̂_{t+1} − p_t.

Price reactions to money supply shocks are not quite one-for-one, since there's an anticipated future deflation which causes people to hold some of the extra money.

Dornbush model (Taylor) Rational expectations model given by

m_t − p_t = −α(ê_{t+1} − e_t),
p_t − p_{t−1} = β(e_t − p_t).

(Note higher e is currency depreciation.) Prices adjust gradually, but the exchange rate overshoots: by the first equation, if the money supply is shocked there must be expected future currency appreciation (e ↓); but there must be long run depreciation (e ↑) by the second equation.

Time inconsistency (Taylor) Three types of policy plans π_2 that aim to maximize social welfare S(x_1, x_2, π_1, π_2), where π are policies and x are responses (including expectations of policy):

1. Consistent (discretion): take x_1 as given when choosing π_2.
2. Optimal (rule-based): recognize that x_1 responds to π_2.
3. Inconsistent (cheating): promise optimal, but then name consistent.

Lucas supply function (Taylor) y_t = φ(π_t − π̂_t) + ε_t. Can be seen directly from the expectations-augmented Philips curve.

4 Mathematics

4.1 General mathematical concepts

Elasticity (Wikipedia) The elasticity of f(x) is

∂ log f(x)/∂ log x = (∂f/∂x)·(x/f(x)) = (∂f/f)/(∂x/x) = x f'(x)/f(x).

Euler's law (Producer 14) If f is differentiable, it is homogeneous of degree k iff p·∇f(p) = kf(p). One direction is proved by differentiating f(λp) = λ^k f(p) with respect to λ, and then setting λ = 1. Implies that if f is homogeneous of degree one, then ∇f is homogeneous of degree zero.

Strong set order (Producer 32) A ≤ B in the strong set order iff for all a ∈ A and b ∈ B with a ≥ b, we have a ∈ B and b ∈ A. Equivalently, every element in A \ B is ≤ every element in A ∩ B, which is ≤ every element in B \ A.

Meet (Producer 36) For x, y ∈ R^n, the meet is x ∧ y ≡ (min{x_1, y_1}, ..., min{x_n, y_n}). More generally, on a partially ordered set, x ∧ y is the greatest lower bound of x and y.

Join (Producer 36) For x, y ∈ R^n, the join is x ∨ y ≡ (max{x_1, y_1}, ..., max{x_n, y_n}). More generally, on a partially ordered set, x ∨ y is the least upper bound of x and y.

Sublattice (Producer 37) A set X is a sublattice iff ∀x, y ∈ X, we have x ∧ y ∈ X and x ∨ y ∈ X. Any sublattice in R^n can be described as an intersection of sets of the forms:

1. A product set X_1 × ··· × X_n; or
2. A set {(x_1, ..., x_n): x_i ≤ g(x_j)}, where g(·) is an increasing function.

Orthants of Euclidean space (?)

1. R^n_+ ≡ {x: x ≥ 0} ≡ {x: x_i ≥ 0 ∀i}, which includes the axes and 0.
2. {x: x > 0} ≡ {x: x_i ≥ 0 ∀i} \ {0}, which includes the axes, but not 0.
3. R^n_++ ≡ {x: x ≫ 0} ≡ {x: x_i > 0 ∀i}, which includes neither the axes nor 0.
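A small sketch of the meet and join operations in R^n defined above, checking that a product set (hence a sublattice) is closed under both (values are arbitrary):

import numpy as np

def meet(x, y):  # componentwise greatest lower bound in R^n
    return np.minimum(x, y)

def join(x, y):  # componentwise least upper bound in R^n
    return np.maximum(x, y)

x = np.array([0.2, 0.9])
y = np.array([0.7, 0.4])
print(meet(x, y), join(x, y))   # [0.2 0.4] and [0.7 0.9]
in_box = lambda z: np.all((0 <= z) & (z <= 1))      # the box [0,1] x [0,1]
print(in_box(meet(x, y)) and in_box(join(x, y)))    # True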

Sum of powers (?)

Σ_{i=1}^n i = n(n + 1)/2
Σ_{i=1}^n i² = n(n + 1)(2n + 1)/6
Σ_{i=1}^n i³ = n²(n + 1)²/4
Σ_{i=1}^n i⁴ = n(n + 1)(2n + 1)(3n² + 3n − 1)/30

Geometric series (?)

Σ_{i=0}^n aR^i = a(1 − R^{n+1})/(1 − R)
Σ_{i=0}^∞ aR^i = a/(1 − R) (if |R| < 1)

4.2 Set theory

Supremum limit (Math camp) lim sup_n A_n ≡ ∩_{n=1}^∞ ∪_{k=n}^∞ A_k. The set of all elements appearing in an infinite number of the A_i.

Infimum limit (Math camp) lim inf_n A_n ≡ ∪_{n=1}^∞ ∩_{k=n}^∞ A_k. The set of all elements appearing in all but a finite number of the A_i.

Partition (Math camp) {A_α} such that A_α ⊆ A and:

1. Non-empty: A_α ≠ ∅;
2. Exhaustive: ∪ A_α = A;
3. Non-overlapping: A_i ≠ A_j ⟹ A_i ∩ A_j = ∅.

Convex set (Math camp) ∀x and x' ∈ S, ∀t ∈ [0, 1], tx + (1 − t)x' ∈ S (i.e., the linear combination of any two points in S is also in S).
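A spot-check of the power-sum and geometric-series formulas above (the choices n = 25, a = 2, R = 0.9 are arbitrary):

import numpy as np

n = 25
i = np.arange(1, n + 1)
assert i.sum()      == n*(n + 1)//2
assert (i**2).sum() == n*(n + 1)*(2*n + 1)//6
assert (i**3).sum() == (n*(n + 1)//2)**2
assert (i**4).sum() == n*(n + 1)*(2*n + 1)*(3*n**2 + 3*n - 1)//30

a, R, m = 2.0, 0.9, 40
finite = a*(1 - R**(m + 1))/(1 - R)
assert abs(sum(a*R**k for k in range(m + 1)) - finite) < 1e-12
print("identities check out; the infinite sum tends to", a/(1 - R))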

4.3 Binary relations

Complete (Math camp) aRb ∨ bRa.

Transitive (Math camp) aRb ∧ bRc ⟹ aRc.

Symmetric (Math camp) aRb ⟺ bRa.

Negative transitive (Math camp) aRb ⟹ cRb ∨ aRc.

Reflexive (Math camp) aRa.

Irreflexive (Math camp) aRa never holds; equivalently, aRb ⟹ a ≠ b.

Asymmetric (Math camp) aRb ⟹ ¬(bRa).

Equivalence relation (Math camp) A binary relation that is reflexive, symmetric, and transitive.

4.4 Functions

Function (Math camp) A binary relation that is:

1. Well defined on its range: ∀x, ∃y such that xRy;
2. Uniquely defined on its range: xRy ∧ xRz ⟹ y = z.

Surjective (Math camp) Range is Y. (a.k.a. "onto.")

Injective (Math camp) x ≠ x' ⟹ f(x) ≠ f(x').

Bijective (Math camp) Both surjective and injective. (a.k.a. "one-to-one.")

Monotone (C&B p. 50) g(x) is monotone on its domain iff either u > v ⟹ g(u) > g(v) (increasing), or u > v ⟹ g(u) < g(v) (decreasing). A monotone function is one-to-one and onto from its domain to its image (note, not necessarily to its range).

Convex function (Hansen 2-22) g: R^n → R is convex iff ∀x, y ∈ R^n, ∀λ ∈ [0, 1], λg(x) + (1 − λ)g(y) ≥ g(λx + (1 − λ)y).

Binomial theorem (C&B 3.2.2) For any x, y ∈ R and integer n ≥ 0, (x + y)^n = Σ_{i=0}^n C(n, i) x^i y^{n−i}. Special cases: 1 = (p + (1 − p))^n = Σ_{i=0}^n C(n, i) p^i (1 − p)^{n−i} and 2^n = Σ_{i=0}^n C(n, i).

Gamma function (C&B p. 99) Γ(α) ≡ ∫_0^∞ t^{α−1} e^{−t} dt = ∫_0^1 [log(1/t)]^{α−1} dt for α > 0. (Γ is also defined everywhere else except for 0, −1, −2, ....)

1. Γ(α + 1) = αΓ(α), when α > 0.
2. Γ(n) = (n − 1)! for integer n > 0.
3. Γ(1/2) = √π.

[Figure: plot of Γ(x) for −4 ≤ x ≤ 4.]

Beta function (C&B eq. 3.3.17) B(α, β) ≡ Γ(α)·Γ(β)/Γ(α + β).

Logistic function (D&M 515; Metrics P.S. 5-4a)

Λ(x) ≡ (1 + e^{−x})^{−1} ≡ e^x/(1 + e^x).

The inverse is Λ^{−1}(μ) ≡ ℓ(μ) ≡ log[μ/(1 − μ)]. The first derivative is

λ(x) ≡ Λ'(x) = e^x/(1 + e^x)² = Λ(x)(1 − Λ(x)) = Λ(x)Λ(−x).

Taylor series (C&B 5.5.20–1) If g(x) has derivatives of order r (i.e., g^{(r)}(x) exists), then for any constant a the Taylor polynomial of order r about a is

g(x) ≈ T_r(x) ≡ Σ_{i=0}^r [g^{(i)}(a)/i!](x − a)^i.

The remainder from the approximation, g(x) − T_r(x), equals ∫_a^x g^{(r+1)}(t)·(x − t)^r/r! dt. This error always tends to 0 faster than the highest-order explicit term; i.e., lim_{x→a} [g(x) − T_r(x)]/(x − a)^r = 0.

Derivative of power series (Hansen 2-29) If g(t) ≡ a_0 + a_1 t + a_2 t² + ··· = Σ_{i=0}^∞ a_i t^i, then ∂^k g(t)/∂t^k |_{t=0} = a_k k!. This is useful for calculating moments if we can express the mgf as a power series.

Taylor series examples (?) f(x) ≈ f(x_0) + ∇f(x_0)·(x − x_0). Note that for concave (convex) f(·), the LHS is weakly less (greater) than the RHS. For small ε, we have:

e^{nε} ≈ 1 + nε;
(1 + αε)^n ≈ 1 + αnε;
log(1 + ε) ≈ ε;
(a^ε − 1)/ε ≈ log a.

The second-order expansion is f(x) ≈ f(x_0) + ∇f(x_0)·(x − x_0) + ½(x − x_0)·∇²f(x_0)(x − x_0).

Mean value expansion (Mahajan 3-13; Hayashi 470–1) If h: R^p → R^q is a continuously differentiable function, then h(·) admits a mean value expansion about θ:

h(b) = h(θ) + [∂h(b̄)/∂b](b − θ),

where b̄ is a value lying between b and θ (note the MVT applies to individual elements of h(·), so b̄ actually differs from element to element in the vector equation).
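A numerical spot-check of the small-ε approximations above and of the logistic derivative identity λ(x) = Λ(x)(1 − Λ(x)) (the evaluation points are arbitrary):

import numpy as np

eps, n, a, alpha = 0.01, 5, 3.0, 2.0
print(np.exp(n*eps), 1 + n*eps)              # 1.0513 vs 1.05
print((1 + alpha*eps)**n, 1 + alpha*n*eps)   # 1.1041 vs 1.10
print(np.log(1 + eps), eps)                  # 0.00995 vs 0.01
print((a**eps - 1)/eps, np.log(a))           # 1.1047 vs 1.0986

Lam = lambda x: 1/(1 + np.exp(-x))
x, h = 0.7, 1e-6
num_deriv = (Lam(x + h) - Lam(x - h)) / (2*h)  # central difference
print(num_deriv, Lam(x)*(1 - Lam(x)))          # agree to ~1e-10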

4.5 Correspondences

Correspondence (Micro math 5) φ: X ⇉ Y is a mapping from elements of X to subsets of Y (i.e., for x ∈ X, we have φ(x) ⊆ Y).

Lower semi-continuity (Micro math 7; Clayton G.E. I) φ: X ⇉ Y is lower semi-/hemi-continuous at x ∈ X iff for every open set G ⊆ Y with φ(x) ∩ G ≠ ∅, there is an open set U(x) ⊆ X containing x such that if x' ∈ U(x), then φ(x') ∩ G ≠ ∅. Intuitively, any element in φ(x) can be "approached" from all directions. For a single-valued correspondence (i.e., a function), lsc ⟺ usc ⟺ continuous.

Upper semi-continuity (Micro math; Clayton G.E. I; Bernheim 32) φ: X ⇉ Y is upper semi-/hemi-continuous at x ∈ X iff for every open set G ⊆ Y containing φ(x), there is an open set U(x) ⊆ X containing x such that if x' ∈ U(x), then φ(x') ⊆ G. Identically, if for any sequences x_t → x and y_t → y with y_t ∈ φ(x_t) for all t, we have y ∈ φ(x). Intuitively, φ(x) does not "suddenly contain new points"; i.e., the graph of the correspondence is a closed set. For a single-valued correspondence (i.e., a function), usc ⟺ lsc ⟺ continuous.

Brouwer's Fixed Point Theorem (Clayton G.E. I) If a function f: A → A is continuous, and A is compact and convex, then f(x) = x for some fixed point x ∈ A.

Kakutani's Fixed Point Theorem (Clayton G.E. I; Bernheim 32) Suppose S is compact and convex, and γ: S ⇉ S is convex-valued and upper semi-continuous. Then there exists a fixed point s* ∈ S such that s* ∈ γ(s*).

4.6 Linear algebra

Matrix (Amemiya Ch. 11) Matrix A_{n×m} ≡ {a_ij} has n rows and m columns.

1. Square iff n = m.
2. Transpose A' ≡ {a_ji} is m × n.
3. Symmetric iff square and ∀i, j, a_ij = a_ji; or equivalently, iff A = A'.
4. Diagonal iff ∀i ≠ j, a_ij = 0 (i.e., nondiagonal elements are 0).

Matrix multiplication (Amemiya Ch. 11) If c is a scalar, cA = Ac = {ca_ij}. If A is n × m and B is m × r, then C = AB is an n × r matrix with c_ij = Σ_{k=1}^m a_ik b_kj. If AB is defined, then (AB)' = B'A'.

Determinant (Amemiya Ch. 11; Greene 23–6) |A| ≡ Σ_i (−1)^{i+j} a_ij |A_ij|, where j is an arbitrarily chosen integer and A_ij is the matrix that deletes the ith row and jth column from A. This is the volume of the parallelotope with edges along the columns of A.

1. The determinant of a 2 × 2 matrix A is a_11 a_22 − a_12 a_21.
2. The determinant of a 3 × 3 matrix A is a_11 a_22 a_33 − a_11 a_32 a_23 − a_21 a_12 a_33 + a_21 a_32 a_13 + a_31 a_12 a_23 − a_31 a_22 a_13.
3. |A| = |A'|.
4. For diagonal A, |A| = Π_i a_ii; thus |I| = 1.
5. |cA_{n×n}| = c^n |A|.
6. If A, B are both n × n, then |AB| = |A| |B|; together with |I| = 1, this gives |A^{−1}| = |A|^{−1}.
7. If any column/row of A is all zeroes, |A| = 0; switching two adjacent columns/rows changes the sign of the determinant; if two columns/rows are identical, the determinant is zero.
8. For any eigenvalue λ, |A − λI| = 0.
9. |A_{n×n}| = Π_i λ_i (i.e., the determinant is the product of the eigenvalues).

Linear independence, rank (Amemiya Ch. 11; Greene 20–3, 39) Vectors {x_i} are linearly independent iff Σ_i c_i x_i = 0 ⟹ ∀i, c_i = 0.

1. The following are equivalent, and called nonsingularity of square A: |A| ≠ 0; A^{−1} exists; A is column independent or row independent; ∀y, ∃x, Ax = y; Ax = 0 ⟹ x = 0.
2. Column rank (the maximum number of linearly independent columns) equals row rank, which implies rank A = rank A'.
3. A_{n×m} is full rank iff its rank equals min(n, m); a square matrix is full rank iff it is nonsingular.
4. rank(A) = rank(A'A) = rank(AA'); in particular, A_{n×m} with m ≤ n (i.e., "tall") is full rank iff A'A is nonsingular.
5. A nonsingular ⟹ rank(AB) = rank(B).
6. rank(A_{n×n}) equals the number of nonzero eigenvalues (not necessarily distinct).

Matrix inverse (Amemiya Ch. 11; Greene 30–1) A^{−1}A = AA^{−1} = I for any matrix A such that |A| ≠ 0.

1. If A, B are both n × n, and |A| ≠ 0, |B| ≠ 0, then (AB)^{−1} = B^{−1}A^{−1}.
2. If A^{−1} exists, then (A')^{−1} = (A^{−1})'.
3. The inverse of a 2 × 2 matrix is [a b; c d]^{−1} = (1/|A|)[d −b; −c a] = (1/(ad − bc))[d −b; −c a].
4. For diagonal A, the inverse is diagonal with elements a_ii^{−1}.

Orthogonal matrix (Amemiya Ch. 11) A square matrix A_{n×n} is orthogonal iff A' = A^{−1}. This corresponds to having columns that are orthogonal and normalized.

Trace (Amemiya Ch. 11; Metrics section; Greene 41–2) tr(A_{n×n}) ≡ Σ_i a_ii (i.e., the sum of the diagonal elements).

1. tr(cA) = c tr(A).
2. tr(A') = tr(A).
3. tr(A + B) = tr(A) + tr(B).
4. tr(A_{n×m} B_{m×n}) = tr(BA).
5. The trace is the sum of the eigenvalues: tr(A_{n×n}) = Σ_i λ_i.
6. tr(A_{n×m} A') = tr(A'A) = Σ_{i=1}^n Σ_{j=1}^m a_ij² = Σ_{j=1}^m a_j'a_j = Σ_{j=1}^m tr(a_j a_j'), where a_j are the columns of A.

Caution: in general, tr(AB) ≠ tr(A)·tr(B).

Eigen decomposition (Mathworld) For any square A, with Λ a diagonal matrix containing the eigenvalues of A, and P a matrix whose columns are the eigenvectors of A (ordered as in Λ), we have A = PΛP^{−1}.

Orthogonal decomposition (Amemiya Ch. 11; Greene 37–8) a.k.a. spectral decomposition. For any symmetric A, there exist H_{n×n} and Λ_{n×n} such that A = HΛH', where H'H = I (i.e., H is orthogonal) and Λ is diagonal. Note that H'AH = Λ. This is just the eigen decomposition for a symmetric matrix; here the matrix of eigenvectors is orthogonal.

Matrix squareroot (Greene 42) Given a positive definite symmetric matrix A = HΛH', we can get a (symmetric!) matrix squareroot A^{1/2} ≡ HΛ^{1/2}H', where Λ^{1/2} is the diagonal matrix that takes the squareroot of the diagonal elements of Λ.
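A minimal NumPy illustration of the spectral decomposition and the matrix squareroot, also confirming the trace and determinant eigenvalue identities (the test matrix is an arbitrary assumption):

import numpy as np

A = np.array([[2.0, 0.5],
              [0.5, 1.0]])                    # symmetric positive definite
lam, H = np.linalg.eigh(A)                    # eigh gives orthogonal H and real lam
assert np.allclose(H @ np.diag(lam) @ H.T, A) # A = H Lambda H'
assert np.allclose(H.T @ H, np.eye(2))        # H'H = I
A_half = H @ np.diag(np.sqrt(lam)) @ H.T      # A^(1/2) = H Lambda^(1/2) H'
assert np.allclose(A_half @ A_half, A)        # its square recovers A
print(np.trace(A), lam.sum())                 # trace = sum of eigenvalues
print(np.linalg.det(A), lam.prod())           # determinant = product of eigenvalues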

Eigenvector, eigenvalue (Amemiya Ch. 11; Woodford 670–2) Note some results may only hold for symmetric matrices (not made explicit in Amemiya). Let the orthogonal decomposition of symmetric A be A = HΛH', where H is orthogonal and Λ is diagonal.

1. The diagonal elements of Λ ≡ D(λ_i) are the eigenvalues (a.k.a. characteristic roots) of A; the columns of H are the corresponding eigenvectors (a.k.a. characteristic vectors).
2. If h is the eigenvector corresponding to λ, then Ah = λh and |A − λI| = 0. (This is an alternate definition for eigenvectors/eigenvalues.)
3. Matrix operations f(A) can be reduced to the corresponding scalar operation: f(A) = HD[f(λ_i)]H'.
4. rank(A_{n×n}) equals the number of nonzero eigenvalues (not necessarily distinct).
5. A_{n×m}B_{m×n} and BA have the same nonzero eigenvalues.
6. Symmetric A_{n×n} and B_{n×n} can be diagonalized by the same orthogonal matrix H iff AB = BA.
7. |A_{n×n}| = Π_i λ_i (i.e., the determinant is the product of eigenvalues).
8. tr(A_{n×n}) = Σ_i λ_i (i.e., the trace is the sum of eigenvalues).
9. A and A^{−1} have the same eigenvectors and inverse eigenvalues.

A 2 × 2 matrix A has two explosive eigenvalues (outside the unit circle) iff either

Case I: |A| + tr A ≥ −1, |A| − tr A ≥ −1, and |A| ≥ 1; or
Case II: |A| + tr A ≤ −1, |A| − tr A ≤ −1, and |A| ≤ 1 (trivially).

Positive definiteness, &c. (Amemiya Ch. 11; Greene 46–9) Symmetric A is positive definite iff ∀x ≠ 0, x'Ax > 0. Also written A > 0; if A − B is positive definite, write A > B. If the inequality isn't strict, A is positive semidefinite (a.k.a. nonnegative definite). Negative definite and negative semidefinite (a.k.a. nonpositive definite) are similarly defined.

1. A symmetric matrix is positive definite iff its eigenvalues are all positive; similar for other (semi-)definiteness.
2. If A > 0, all diagonal elements are > 0 (consider a quadratic form with a unit vector).
3. B'B ≥ 0; if B has full column rank, then B'B > 0.
4. A > 0 ⟹ A^{−1} > 0 (i.e., the inverse of a positive definite matrix is positive definite).
5. ∀B_{n×k} "tall" (i.e., n ≥ k), A_{n×n} ≥ 0 ⟹ B'AB ≥ 0. If B is full rank (i.e., rank(B) = k), then A_{n×n} > 0 ⟹ B'AB > 0.
6. For A_{n×n} and B_{n×n} both positive definite, A ≥ B ⟺ B^{−1} ≥ A^{−1}, and A > B ⟺ B^{−1} > A^{−1}.
7. A ≥ 0 ⟹ |A| ≥ 0 and A > 0 ⟹ |A| > 0; note the natural analogues for negative (semi)definite matrices do not hold.

Orthogonal completion (Hansen 5-24) For a "wide" matrix A_{n×k} (i.e., n < k) with full row rank (i.e., all rows are linearly independent), we can construct a (non-unique) (k − n) × k matrix A_⊥ such that:

1. The rows of A and A_⊥ are linearly independent (i.e., |(A', A_⊥')| ≠ 0); and
2. The rows of A_⊥ are orthogonal to the rows of A (i.e., A_⊥A' = 0_{(k−n)×n}, or identically, AA_⊥' = 0_{n×(k−n)}).

For a "tall" matrix B_{n×k} (i.e., n > k) with full column rank (i.e., all columns are linearly independent), we can construct a (non-unique) n × (n − k) matrix B_⊥ such that:

1. The columns of B and B_⊥ are linearly independent (i.e., |(B, B_⊥)| ≠ 0); and
2. The columns of B_⊥ are orthogonal to the columns of B (i.e., B_⊥'B = 0_{(n−k)×k}, or identically, B'B_⊥ = 0_{k×(n−k)}).

Idempotent matrix (Hansen 5-33–4; Amemiya 37; Hayashi 30–1, 36–7, 244–5) P is idempotent iff PP = P.

1. If P is idempotent, then so is I − P.
2. Every eigenvalue of a (symmetric?) idempotent matrix is either 0 or 1.
3. A symmetric and idempotent matrix is positive semidefinite.
4. For any X_{n×k}, we can find an idempotent projection matrix (that projects onto the subspace of R^n spanned by the columns of X): P_X ≡ X(X'X)^{−1}X'.
5. For any tall X_{n×k} (i.e., n > k), we can find an idempotent projection matrix (that projects onto the subspace of R^n not spanned by the columns of X): I − P_X = P_{X⊥}.
6. rank(P) = tr(P) if P is idempotent.

Matrix derivatives (Ying handout; Greene 51–53; MaCurdy) [Note the derivative convention may be the transpose of MWG's.]

1. ∂(Ax)/∂x = A'.
2. ∂(x'Ax)/∂x = (A + A')x.
3. ∂(x'Ax)/∂A = xx'.
4. ∂ log|A|/∂A = (A')^{−1}.
5. ∂ log|A|/∂A^{−1} = −A'.
6. ∂ log|A|/∂w = |A|^{−1}·∂|A|/∂w = tr[A^{−1}·∂A/∂w] (may require symmetric A?).
7. ∂A^{−1}/∂w = −A^{−1}[∂A/∂w]A^{−1} (may require symmetric A?).

Partitioned matrices (Hayashi 670–673; Greene 32–3) Writing A_diag ≡ diag(A_11, ..., A_MM) for a block-diagonal matrix:

det[A_11 A_12; A_21 A_22] = |A_22|·|A_11 − A_12 A_22^{−1} A_21| = |A_11|·|A_22 − A_21 A_11^{−1} A_12|.

[A_11 ··· A_1N; ...; A_M1 ··· A_MN][c_1; ...; c_N] = [Σ_i A_1i c_i; ...; Σ_i A_Mi c_i].

A_diag [c_1; ...; c_M] = [A_11 c_1; ...; A_MM c_M].

A_diag B_diag = diag(A_11 B_11, ..., A_MM B_MM).

A_diag' B A_diag = [A_11'B_11 A_11 ··· A_11'B_1M A_MM; ...; A_MM'B_M1 A_11 ··· A_MM'B_MM A_MM].

A_diag' B c = [A_11'B_11 c_1 + ··· + A_11'B_1M c_M; ...; A_MM'B_M1 c_1 + ··· + A_MM'B_MM c_M].

A_diag^{−1} = diag(A_11^{−1}, ..., A_MM^{−1}).

The inverse of a 2 × 2 partitioned matrix is

[A_11 A_12; A_21 A_22]^{−1} = [A_11^{−1} + A_11^{−1}A_12 F_2 A_21 A_11^{−1}, −A_11^{−1}A_12 F_2; −F_2 A_21 A_11^{−1}, F_2],

where F_2 ≡ (A_22 − A_21 A_11^{−1} A_12)^{−1}. The upper left block can also be written as F_1 ≡ (A_11 − A_12 A_22^{−1} A_21)^{−1}.
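A quick numerical verification of the 2 × 2 block-inverse formula above on a random partitioned matrix (the block sizes and seed are assumptions; A_22 is shifted by 5I only to keep the blocks well-conditioned):

import numpy as np

rng = np.random.default_rng(0)
A11, A12 = rng.normal(size=(2, 2)), rng.normal(size=(2, 3))
A21, A22 = rng.normal(size=(3, 2)), rng.normal(size=(3, 3)) + 5*np.eye(3)
A = np.block([[A11, A12], [A21, A22]])

iA11 = np.linalg.inv(A11)                        # requires A11 invertible
F2 = np.linalg.inv(A22 - A21 @ iA11 @ A12)       # Schur complement inverse
top_left  = iA11 + iA11 @ A12 @ F2 @ A21 @ iA11
top_right = -iA11 @ A12 @ F2
bot_left  = -F2 @ A21 @ iA11
Ainv = np.block([[top_left, top_right], [bot_left, F2]])
print(np.allclose(Ainv, np.linalg.inv(A)))       # True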

Kronecker product (Hayashi 673; Greene 34–5) For A_{M×N} and B_{K×L},

A ⊗ B = [a_11 B ··· a_1N B; ...; a_M1 B ··· a_MN B]

is MK × NL. The operation is not commutative.

1. (A ⊗ B)(C ⊗ D) = AC ⊗ BD (assuming the matrix multiplications are conformable);
2. (A ⊗ B)' = A' ⊗ B';
3. (A ⊗ B)^{−1} = A^{−1} ⊗ B^{−1};
4. tr(A_{M×M} ⊗ B_{K×K}) = tr(A)·tr(B);
5. |A_{M×M} ⊗ B_{K×K}| = |A|^K·|B|^M.

Deviation from means (Hansen 5-34–5; Greene 14–5) M_ι x is the vector of x_i − x̄, and x'M_ι x = Σ_i (x_i − x̄)², where

M_ι ≡ I − ι(ι'ι)^{−1}ι' = I − (1/n)ιι'

is the symmetric idempotent matrix with 1 − 1/n on the diagonal, and −1/n off the diagonal.∗
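A minimal sketch of the demeaning matrix at work (the data vector is arbitrary):

import numpy as np

n = 5
iota = np.ones((n, 1))
M = np.eye(n) - iota @ iota.T / n                # M_iota = I - (1/n) iota iota'
x = np.array([3.0, 1.0, 4.0, 1.0, 5.0])
print(M @ x)                                     # the vector of x_i - x_bar
print(x @ M @ x, ((x - x.mean())**2).sum())      # both give the sum of squared deviations
assert np.allclose(M @ M, M) and np.allclose(M, M.T)  # symmetric idempotent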

5 References

Amemiya  Amemiya: Introduction to Econometrics and Statistics
Bernheim  Bernheim: Economics 203 notes
C&B  Casella, Berger: Statistical Inference, 2 ed.
Choice  Levin, Milgrom, Segal: Introduction to Choice Theory notes
Clayton  Featherstone: Economics 202 section notes
Consumer  Levin, Milgrom, Segal: Consumer Theory notes
D&M  Davidson, MacKinnon: Estimation and Inference in Econometrics
G.E.  Levin: General Equilibrium notes
Greene  Greene: Econometric Analysis, 3 ed.
Hansen  Hansen: Economics 270/271 notes
Hayashi  Hayashi: Econometrics
Jackson  Jackson: Economics 204 lectures
MaCurdy  MaCurdy: Economics 272 notes
Mahajan  Mahajan: Economics 270/271 notes
Manuel  Amador: Economics 211 lectures
Math camp  Fan: Math camp lectures
MathWorld  mathworld.wolfram.com
Max notes  Floetotto: Economics 210 section
Micro math  Levin, Rangel: Useful Math for Microeconomists notes
MWG  Mas-Colell, Whinston, Green: Microeconomic Theory
Nir  Jaimovich: Economics 210 notes
Producer  Levin, Milgrom, Segal: Producer Theory notes
Taylor  Taylor: Economics 212 notes
Uncertainty  Levin: Choice Under Uncertainty notes
Wikipedia  wikipedia.com
Woodford  Woodford: Interest and Prices

∗ In Greene, M_ι is called M^0.

48 Index

Γ(·) (Gamma function), 45 Autoregressive conditional heteroscedastic process, 17 Φ(·) (standard normal cdf), 7 Autoregressive process of degree p, 16 α3 (skewness), 7 Autoregressive process of degree one, 16 α4 (kurtosis), 7 Autoregressive/moving average process of degree (p, q), 16 β-δ discounting, 43 Axioms of Probability, 4 χ2 distribution, 8 µ (expected value), 6 B(·) (Beta function), 45 1 µn (central moment), 7 B (Borel field), 4 0 µn (moment), 7 Banach fixed point theorem, 41 φ(·) (characteristic function), 7 Basic set theory, 4 φ(·) (standard normal pdf), 7 Basu’s theorem, 11 ρ (correlation), 7 Bayes’ Rule, 4 σ (standard deviation), 7 Bayesian Nash equilibrium, 36 σ2 (variance), 7 Benveniste-Scheinkman Theorem, 42 0-1 loss, 14 Bernoulli utility function, 32 2SLS (Two-Stage Least Squares), 21 Bertrand competition, 36 3SLS (Three-Stage Least Squares), 22 Bertrand competition—horizontal differentiation, 36 Bertrand competition—non-spatial differentiation, 36 Absolute risk aversion, 33 Bertrand competition—vertical differentiation, 36 Absolute value loss, 14 Best linear predictor, 18 Action space, 14 Best linear unbiased estimator, 18 Adding-up, 29 Best unbiased estimator, 13 Additively separability with stationary discounting, 40 Beta function, 45 Adverse selection, 38 Biases affecting OLS, 18 Aggregating consumer demand, 32 Big O error notation, 10 AIIA (Arrow’s independence of irrelevant alternatives), 39 Bijective, 45 Almost sure convergence, 10 Billingsley CLT, 16 Alternate hypothesis, 14 Binary response model, 24 Analogy principle, 11 Binomial theorem, 45 Ancillary statistic, 11 Bivariate normal distribution, 8 Annihilator matrix, 18 Blackwell’s sufficient conditions for a contraction, 41 Anonymity, 39 BLP (best linear predictor), 18 AR(1) (autoregressive process of degree one), 16 BLUE (best linear unbiased estimator), 18 AR(p) (autoregressive process of degree p), 16 BNE (Bayesian Nash equilibrium), 36 ARCH (autoregressive conditional heteroscedastic) process, 17 Bonferroni’s Inequality, 25 ARMA process of degree (p, q), 16 Boole’s Inequality, 25 Arrow’s independence of irrelevant alternatives, 39 Borel field, 4 Arrow’s Theorem, 39 Borel Paradox, 5 Arrow-Pratt coefficient of absolute risk aversion, 33 Brouwer’s Fixed Point Theorem, 46 Ascending price auction, 37 Brownian motion, 24 Associativity, 4 BS (Benveniste-Scheinkman Theorem), 42 Asymmetric, 45 Budget set, 31 Asymptotic equivalence, 10 BUE (best unbiased estimator), 13 Asymptotic normality for GMM estimator, 13 Bulow-Rogoff defaultable debt model, 43 Asymptotic normality for M-estimator, 13 Asymptotic normality for MD estimator, 13 Cagan model, 44 Asymptotic normality for MLE, 12 Cagan money demand, 44 Asymptotic properties of OLS estimator, 18 CARA (constant absolute risk aversion), 33 Asymptotic significance, 15 Carrier axiom, 38 Asymptotic size, 15 Cauchy sequence, 41 Asymptotic variance, 12 Cauchy-Schwarz Inequality, 25 Asymptotically efficient estimator, 12 cdf (cumulative distribution function), 5 Asymptotically normal estimator, 12 CE (correlated equilibrium), 35 Autocovariance, 15 Censored response model, 25

49 Central Limit Theorem for MA(∞), 16 Continuous r.v., 5 Central Limit Theorem for ergodic stationary mds, 16 Contract curve, 33 Central Limit Theorem for iid samples, 10 Contraction, 41 Central Limit Theorem for niid samples, 10 Contraction Mapping Theorem, 41 Central Limit Theorem for zero-mean ergodic stationary processes, 16 Contraction on subsets, 41 Central moment, 7 Convergence almost surely, 10 Certain equivalent, 33 Convergence in Lp, 10 Certain equivalent rate of return, 33 Convergence in distribution, 10 Characteristic function, 7 Convergence in mean square, 10 Characteristic root, 46 Convergence in probability, 9 Characteristic vector, 46 Convergence in quadratic mean, 10 Chebychev’s Inequality, 25 Convergent sequence, 41 Chi squared distribution, 8 Convex function, 45 Choice rule, 27 Convex game, 38 CLT (Central Limit Theorem) for MA(∞), 16 Convex preference, 28 CLT (Central Limit Theorem) for ergodic stationary mds, 16 Convex set, 45 CLT (Central Limit Theorem) for iid samples, 10 Convexity, 28 CLT (Central Limit Theorem) for niid samples, 10 Convolution formulae, 6 CLT (Central Limit Theorem) for zero-mean ergodic stationary processes, 16 Core, 37 CMT (Continuous Mapping Theorem), 9 Core convergence, 37 CMT (Contraction Mapping Theorem), 41 Correlated equilibrium, 35 Cobb-Douglas production function, 40 Correlation, 7 Coefficient of absolute risk aversion, 33 Correspondence, 46 Coefficient of relative risk aversion, 33 Cost function, 29 Commutativity, 4 Cost minimization, 29 Compensated demand correspondence, 31 Cost of business cycles, 43 Compensating variation, 32 Counting, 4 Competitive producer behavior, 28 Cournot competition, 36 Complement goods, 31 Covariance, 7 Complement inputs, 30 Covariance stationarity, 15 Complete, 45 Cram´er-RaoInequality, 14 Complete information, 34 Cram´er-Wold Device, 10 Complete metric space, 41 Critical region, 14 Complete statistic, 11 CRLB (Cram´er-Raolower bound), 14 Completeness, 27 CRRA (coefficient of relative risk aversion), 33 Conditional expectation, 6 CRRA (constant relative risk aversion), 33 Conditional factor demand correspondence, 29 CRRA (constant relative risk aversion) utility function, 40 Conditional homoscedasticity, 10 CRS (constant returns to scale), 28 Conditional pdf, 5 CS (Marshallian consumer surplus), 32 Conditional pmf, 5 Cumulant generating function, 7 Conditional probability, 4 Cumulative distribution function, 5 Conditional variance, 7 CV (compensating variation), 32 Condorcet winner, 39 Condorcet-consistent rule, 39 DARA (decreasing absolute risk aversion), 33 Consistency for for MLE, 12 Decision rule, 14 Consistency for GMM estimators, 13 Defaultable debt, 43 Consistency with compact parameter space, 11 Deferred acceptance algorithm, 38 Consistency without compact parameter space, 12 Delta Method, 10 Consistent estimator, 11 Demand for insurance, 33 Consistent strategy, 36 DeMorgan’s Laws, 4 Consistent test, 15 Derivative of power series, 45 Constant relative risk aversion utility function, 40 Descending price auction, 37 Constant returns to scale, 28 Determinant, 46 Consumer welfare: price changes, 32 Deviation from means, 48 Continuous function, 41 Diagonal matrix, 46 Continuous Mapping Theorem, 9, 10 Dickey-Fuller Test, 24 Continuous preference, 28 Difference-stationary process, 24

50 Direct revelation mechanism, 34 Extensive form trembling hand perfection, 36 Discrete r.v., 5 Extremum estimator, 11 Disjointness, 4 Distance function principle, 14 F distribution, 8 Distributive laws, 4 F test, 19 Dixit-Stiglitz aggregator, 40 Factorial moment generating function, 7 Dobb’s Convergence Theorem, 43 Factorization theorem, 11 Dominant strategy, 35 FCLT (Function Central Limit Theorem), 24 Dominated strategy, 35 Feasible allocation, 33 Donsker’s Invariance Principle, 24 Feasible generalized least squares, 19 Dornbush model, 44 Feasible payoff, 37 DRRA (decreasing relative risk aversion), 33 FGLS (Feasible Generalized Least Squares), 19 Dummy axiom, 38 FIML (Full Information Maximum Likelihood), 23 Durbin-Wu-Hausman test, 21 Finitely repeated game, 37 Dutch auction, 37 First Welfare Theorem, 33, 41 DWH (Durbin-Wu-Hausman) test, 21 First-order ancillary statistic, 11 Dynamic programming: bounded returns, 42 First-order stochastic dominance, 33 Dynamic systems, 41 First-price sealed bid auction, 37 Dynamic systems in continuous time, 42 Fisher Information, 10, 14 Fisher Information Equality, 14 Edgeworth box, 33 Fitted value, 18 Efficient GMM, 13, 20 FIVE (Full-Information Instrumental Variables Efficient), 21 EFTHPNE (Extensive form trembling hand perfect equilibrium), 36 Folk Theorem, 37 Eigen decomposition, 46 Free disposal, 28 Eigenvalue, 46 Frisch-Waugh Theorem, 19 Eigenvector, 46 Full Information Maximum Likelihood, 23 Eigenvector, eigenvalue, 46 Full-Information Instrumental Variables Efficient, 21 Elasticity, 44 Function, 45 Elasticity of intertemporal substitution, 40 Function Central Limit Theorem, 24 EMP (expenditure minimization problem), 31 FWL (Frisch-Waugh-Lovell) Theorem, 19 Empirical distribution, 11 Endowment economy, Arrow-Debreu, 39 Gale-Shapley algorithm, 38 Endowment economy, sequential markets, 39 Game tree, 34 Engle curve, 32 Gamma function, 45 English auction, 37 GARCH process, 17 Entry deterrence, 37 GARP (Generalized Axiom of Revealed Preference), 27 Envelope theorem, 30 Gauss-Markov theorem, 18 Envelope theorem (integral form), 30 Gaussian distribution, 7 Equilibrium dominance, 37 Gaussian regression model, 11 Equivalence relation, 45 General equilibrium, 40 Equivalent variation, 32 Generalized Axiom of Revealed Preference, 27 Ergodic stationary martingale differences CLT, 16 Generalized least squares, 19 Ergodic Theorem, 16 Generalized Method of Moments estimator, 13 Ergodicity, 16 Generalized Tobit model, 25 Estimating AR(p), 17 Geometric series, 45 Estimating S, 18 Gibbard-Satterthwaite Theorem, 39 Estimating number of lags, 17 Giffen good, 31 Estimator, 11 GLS (Generalized Least Squares), 19 Euclidean norm, 41 GMM (Generalized Method of Moments), 13, 19 Euler’s law, 44 GMM hypothesis testing, 20 EV (equivalent variation), 32 Gordin’s CLT, 16 Excess demand, 34 Gordin’s condition, 16 Expected utility function, 32 Gorman form, 32 Expected value, mean, 6 Granger causality, 44 Expenditure function, 31 “Grim trigger” strategy, 37 Expenditure minimization problem, 31 Gross complement, 32 Exponential families, 8 Gross substitute, 32

51 Gross substitutes property, 34 Instrumental variables estimator, 20 Groves mechanism, 39 Integrated process of order d, 24 GS (Gale-Shapley) algorithm, 38 Integrated process of order 0, 24 GS (Gibbard-Satterthwaite) Theorem, 39 Intereuler equation, 40 Interpersonal comparison, 28 H¨older’sInequality, 25 Intertemporal elasticity of substitution, 40 Hall’s martingale hypothesis, 43 Intertemporal Euler equation, 40 Hamiltonian, 42 Irreflexive, 45 Hansen’s test of overidentifying restrictions, 20 Iterated deletion of dominated strategies, 35 HARP (Houthaker’s Axiom of Revealed Preferences), 27 Iterated expectations, 6 Hausman Principle, 14 IV (instrumental variables) estimator, 20 Heckman two-step, 25 Herfindahl-Hirshman index, 36 J statistic, 20 Heteroscedasticity robust asymptotic variance matrix, 11 Jacobian, 6 Heteroscedasticity-robust standard error, 18 Jensen’s Inequality, 25 Hicksian demand correspondence, 31 JGLS (Joint Generalized Least Squares), 22 Holmstrom’s Lemma, 30 Join, 44 Homothetic preference, 28 Joint cdf, 5 Hotelling spatial location model, 36 Joint Generalized Least Squares, 22 Hotelling’s Lemma, 29 Joint pdf, 5 Houthaker’s Axiom of Revealed Preferences, 27 Joint pmf, 5 Hyperbolic discounting, 43 Hypothesis testing, 14 Kakutani’s Fixed Point Theorem, 46 Kocherlakota two-sided lack of commitment model, 43 I(0) process, 24 Kolmogorov Axioms or Axioms of Probability, 4 I(1) process, 24 Kronecker product, 47 I(d) process, 24 Kuhn’s Theorem, 35 ID (increasing differences), 30 Kullback-Liebler Divergence, 12 Ideal index, 32 Kurtosis, 7 Idempotent matrix, 47 Identification, 10 Lag operator, 16 Identification in exponential families, 10 Lagrange multiplier statistic, 15 IEoS (intertemporal elasticity of substitution), 40 Landau symbols, 9, 10 Imperfect competition model, 40 Laspeyres index, 32 Imperfect competition: final goods, 40 Law of Iterated Expectations, 6 Imperfect competition: intermediate goods, 40 Law of Large Numbers for covariance-stationary processes with vanishing autocovariances, 15 Imperfect information, 34 Law of Large Numbers for ergodic processes (Ergodic Theorem), 16 Implicit function theorem, 30 Law of Large Numbers for iid samples, 9, 10 Inada conditions, 40 Law of Large Numbers for niid samples, 10 Income expansion curve, 32 Law of Supply, 29 Incomplete information, 34 LeChatelier principle, 30 Incomplete markets, 34 Lehmann-Scheff´eTheorem, 14 Incomplete markets: finite horizon, 43 Lerner index, 36 Incomplete markets: infinite horizon, 43 Level, 14 Increasing differences, 30 Lexicographic preferences, 28 Independence of events, 4 lhc (lower hemi-continuity), 46 Independence of r.v.s, 5 Likelihood function, 11 Independence of random vectors, 5 Likelihood ratio test, 15 Independent, 4, 5 lim inf, 45 Indirect utility function, 31 lim sup, 45 Individually rational payoff, 37 Limited Information Maximum Likelihood, 24 Inferior good, 31 Limiting size, 15 Infimum limit, 45 LIML (Limited Information Maximum Likelihood), 24 Infinitely repeated game, 37 Lindeberg-Levy CLT, 10 Information Equality, 14 Linear GMM estimator, 20 Injective, 45 Linear GMM model, 19 Inner bound, 29 Linear independence, rank, 46

52 Linear instrumental variables, 11 Mean independence, 5 Linear multiple-equation GMM estimator, 21 Mean squared error risk function, 14 Linear multiple-equation GMM model, 21 Mean value expansion, 45 Linear regression model with non-stochastic covariates, 11 Measurability, 5 Linearization in continuous time, 42 Median, 6 Little o error notation, 9 Meet, 44 LLN (Law of Large Numbers) for covariance-stationary processes with vanishing autocovariances, Method of Moments estimator, 13 15 Metric space, 41 LLN (Law of Large Numbers) for ergodic processes (Ergodic Theorem), 16 mgf (moment generating function), 7 LLN (Law of Large Numbers) for iid samples, 9 Milgrom-Shannon, 30 Local level model, 17 Milgrom-Shannon Monotonicity Theorem, 30 Locally non-satiated preference, 28 Minimal sufficient statistic, 11 Location and Scale families, 8 Minimum Distance estimator, 13 Log-linearization, 42 Minkowski’s Inequality, 25 Logistic function, 45 Mixed strategy Nash equilibrium, 35 Logit model, 24 MLE (Maximum Likelihood estimator), 12 Lognormal distribution, 8 MLR (monotone likelihood ratio), 15 Long-run variance, 16 Mode, 6 Loss function, 14, 29 Moment, 7 Lottery, 32 Moment generating function, 7 Lower hemi-continuity, 46 Monopolistic competition, 36 Lower semi-continuity, 46 Monopoly pricing, 29 lsc (lower semi-continuity), 46 Monotone, 45 Lucas cost of busines cycles, 43 Monotone comparative statics, 30 Lucas supply function, 44 Monotone likelihood ratio models, 15 Lucas tree, 42 Monotone preference, 28 Lyapounov’s Theorem, 10 Monotone Selection Theorem, 30 Monotonicity, 39 M-estimator, 12 Moral hazard, 38 MA (moving average) process, 16 Moving average process, 16 Maintained hypothesis, 14 MRS (marginal rate of substitution), 31 Marginal pdf, 5 MRT (marginal rate of transformation), 28 Marginal pmf, 5 MRTS (marginal rate of technological substitution), 28 Marginal rate of substitution, 31 MSE (mean squared error), 14 Marginal rate of technological substitution, 28 MSNE (mixed strategy Nash equilibrium), 35 Marginal rate of transformation, 28 Multiple equation 2SLS (Two-Stage Least Squares), 22 Marginal utility of wealth, 31 Multiple equation TSLS (Two-Stage Least Squares), 22 Markov process, 42 Multiple equation Two-Stage Least Squares, 22 Markov’s Inequality, 25 Multiple-equation GMM with common coefficients, 22 Marshallian consumer surplus, 32 Multivariate covariance, 7 Marshallian demand correspondence, 31 Multivariate normal distribution, 8 Martingale, 16 Multivariate regression, 22 Martingale difference sequence, 16 Multivariate variance, 7 Matrix, 46 Mutual exclusivity, 4 Matrix derivatives, 47 Mutual independence of events, 4 Matrix inverse, 46 Matrix multiplication, 46 Nash reversion, 37 Matrix squareroot, 46 Natural parameter space, 8 Matrix transpose, 46 NCGM (neoclassical growth model), 39 Maximum Likelihood estimator, 12 NCGM in continuous time, 42 Maximum likelihood estimator for OLS model, 18 Negative transitive, 45 Maximum likelihood for SUR, 23 Neoclassical growth model (discrete time), 39 Maximum likelihood with serial correlation, 17 Neutrality, 39 May’s Theorem, 39 Neyman-Pearson Lemma, 14 MCS (monotone comparative statics), 30 NLS (nonlinear least squares), 11 MCS: robustness to objective function perturbation, 30 No Ponzi condition, 41 mds (martingale difference sequence), 16 Nondecreasing returns to scale, 28

53 Nonincreasing returns to scale, 28 PE (Pareto efficiency), 33 Nonlinear least squares, 11 Perfect Bayesian equilibrium, 36 Nonsingularity, 46 Perfect information, 34 Nontransferable utility game, 38 Perfect recall, 34 Normal distribution, 7 Philips Curve, 44 Normal equations, 18 Pivotal mechanism, 39 Normal form, 34 pmf (probability mass function), 5 Normal good, 31 PO (Pareto optimality), 33 Normative representative consumer, 32 Point identification, 10 Normed vector space, 41 Pooled OLS, 23 NPL (Neyman-Pearson Lemma), 14 Portfolio problem, 33 NTU (nontransferable utility) game, 38 Positive definiteness, &c., 47 Null hypothesis, 14 Positive representative consumer, 32 Numerical inequality lemma, 25 Power, 14 Power series, 45 O error notation, 10 Preference axioms under uncertainty, 32 o error notation, 9 Price expansion path, 32 Observational equivalence, 10 Price index, 32 Offer curve, 32, 33 Primal approach to optimal taxation, 40 Okun’s Law, 44 Principle of Optimality, 41 OLG (overlapping generations), 40 Probability density function, 5 OLS F test, 19 Probability function, 4 OLS R2, 18 Probability integral transformation, 6 OLS t test, 18 Probability mass function, 5 OLS (ordinary least squares), 17 Probability space, 4 OLS residual, 18 Probability-generating function, 7 OLS robust t test, 19 Probit model, 11, 24 OLS robust Wald statistic, 19 Producer Surplus Formula, 29 One-sided lack of commitment, 43 Production economy, Arrow-Debreu, 40 One-sided matching, 38 Production economy, sequential markets, 40 One-to-one, 45 Production function, 28 Onto, 45 Production plan, 28 Optimal taxation—primal approach, 40 Production set, 28 Order statistic, 9 Profit maximization, 28 Order symbols, 10 Projection matrix, 18 Ordinary least squares estimators, 18 Proper subgame, 35 Ordinary least squares model, 17 “The Property”, 26 Orthants of Euclidean space, 44 Proportionate gamble, 33 Orthogonal completion, 47 PSF (Producer Surplus Formula), 29 Orthogonal decomposition, 46 PSNE (pure strategy Nash equilibrium), 35 Orthogonal matrix, 46 Public good, 37 Other generating functions, 7 Pure strategy Nash equilibrium, 35 Outer bound, 29 Overlapping generations, 40 Quadrants of Euclidean space, 44 Overtaking criterion, 37 Quadratic loss, 14 Quasi-linear preferences, 28 p-value, 14 Paasche index, 32 R2, 18 Parameter, 10 Ramsey equilibrium, 41 Parametric model, 10 Ramsey model, 39 Pareto optimality, 33 Random effects estimator, 23 Pareto set, 33 Random sample, iid, 9 Partition, 4, 45 Random variable, 5 Partitioned matrices, 47 Random vector, 5 Payoff function, 34 Random walk, 16 PBE (perfect Bayesian equilibrium), 36 Rank, 46 pdf (probability density function), 5 Rao-Blackwell Theorem, 14

Rational expectations equilibrium, 34
Rational expectations models, 44
Rational preference relation, 27
Rationalizable strategy, 35
Rationalization: h and differentiable e, 31
Rationalization: y(·) and differentiable π(·), 29
Rationalization: differentiable π(·), 29
Rationalization: differentiable e, 31
Rationalization: differentiable h, 31
Rationalization: differentiable x, 31
Rationalization: differentiable y(·), 29
Rationalization: general y(·) and π(·), 29
Rationalization: profit maximization functions, 28
Rationalization: single-output cost function, 29
RBC (Real Business Cycles) model, 40
RE (random effects) estimator, 23
Real Business Cycles model, 40
Reduced form, 23
REE (rational expectations equilibrium), 34
Reflexive, 45
Regression, 6
Regression model, 11
Regular good, 31
Relating Marshallian and Hicksian demand, 31
Relative risk aversion, 33
Revealed preference, 27
Revelation principle, 34
Revenue equivalence theorem, 37
Risk aversion, 32
Risk function, 14
Risk-free bond pricing, 42
Robust standard error, 18
Robust t test, 19
Robust Wald statistic, 19
Roy’s identity, 32
RRA (relative risk aversion), 33
rXY (sample correlation), 9
s² (sample variance), 9
Sample correlation, 9
Sample covariance, 9
Sample mean, 9
Sample median, 9
Sample selection model, 25
Sample standard deviation, 9
Sample variance, 9
Samples from the normal distribution, 9
Sampling error, 18
Samuelson-LeChatelier principle, 30
Scedastic function, 7
Score, 14
Screening, 38
SE (sequential equilibrium), 36
Second Welfare Theorem, 34
Second-order stochastic dominance, 33
Second-price auction, 37
SEE (standard error of the equation), 18
Seemingly Unrelated Regressions, 11, 22
Separable preferences, 28
Sequential competition, 36
Sequential equilibrium, 36
Sequentially rational strategy, 36
SER (standard error of the regression), 18
Shadow price of wealth, 31
Shapley value, 38
Shephard’s Lemma, 29, 31
Short and long regressions, 19
Shutdown, 28
SID (strictly increasing differences), 30
Sigma algebra, 4
Signaling, 37, 38
Simple game, 38
Simple likelihood ratio statistic, 14
Single-crossing condition, 30
Single-output case, 29
Singularity, 46
Size, 14
Skewness, 7
Slutsky equation, 31
Slutsky matrix, 31
Slutsky’s Theorem, 10
SM (supermodularity), 30
Smallest σ-field, 5
Snedecor’s F distribution, 8
Sonnenschein-Mantel-Debreu Theorem, 34
Spectral decomposition, 46
Spence signaling model, 37, 38
SPNE (subgame perfect Nash equilibrium), 36
Square matrix, 46
SSO (strong set order), 44
Stable distribution, 8
Stage game, 37
Standard deviation, 7
Standard error, 18
Standard error of the regression, 18
“Stars and bars”, 4
Stationarity, 15
Statistic, 9
Statistics in exponential families, 11
Stein’s Lemma, 7
Stick-and-carrot equilibrium, 37
Stochastic ordering, 5
Stock pricing, 43
Strategy, 34
Strict dominance, 35
Strict stationarity, 15
Strictly increasing differences, 30
Strictly mixed strategy, 35
Strong Law of Large Numbers, 9, 10
Strong set order, 44
Strongly increasing differences, 30
Structural form, 23
Student’s t distribution, 8
Subgame perfect Nash equilibrium, 36

Subjective probabilities, 33
Sublattice, 44
Submodularity, 30
Substitute goods, 31
Substitute inputs, 30
Substitution matrix, 29
Sufficient statistic, 11
Sum of powers, 44
Sup norm, 41
Superconsistent estimator, 11
Supergame, 37
Supermodularity, 30
Support, 5
Support set, 5
Supremum limit, 45
Supremum norm (Rⁿ), 41
Supremum norm (real-valued functions), 41
SUR (seemingly unrelated regressions), 11, 22
Sure thing principle, 32
Surjective, 45
sXY (sample covariance), 9
Symmetric, 45
Symmetric distribution, 6
Symmetric matrix, 46
System of beliefs, 35
t distribution, 8
t test, 18
Taylor Rule, 44
Taylor series, 45
Taylor series examples, 45
Test function, 14
Test statistic, 14
Testing overidentifying restrictions, 20
THPE (trembling-hand perfect equilibrium), 35
Three-Stage Least Squares, 22
Threshold crossing model, 11
Time inconsistency, 44
Tobit model, 25
Top Trading Cycles algorithm, 38
Topkis’ Theorem, 30
Trace, 46
Transferable utility game, 38
Transformation R¹ → R¹, 6
Transformation R² → R², 6
Transformation frontier, 28
Transformation function, 28
Transitive, 45
Transitivity, 27
Transpose, 46
Transversality condition, 41
Trembling-hand perfection, 35
Truncated response model, 25
TSLS (Three-Stage Least Squares), 22
TSLS (Two-Stage Least Squares), 21
TU (transferable utility) game, 38
TVC (transversality condition), 41
Two period intertemporal choice model, 39
Two-sided lack of commitment, 43
Two-sided matching, 38
Two-Stage Least Squares, 21
Two-way rule for expectations, 6
Type I/Type II error, 14
uhc (upper hemi-continuity), 46
ULLN (Uniform Law of Large Numbers), 12
UMP (uniformly most powerful) test, 14
UMP (utility maximization problem), 31
UMPU (uniformly most powerful unbiased) test, 15
UMVUE (uniformly minimum variance unbiased estimator), 13
Unanimity, 39
Unbiased estimator, 9
Unbiased test, 15
Uncertainty: general setup, 42
Uncompensated demand correspondence, 31
Uniform (Weak) Law of Large Numbers, 12
Uniform convergence in probability, 9
Uniformly minimum variance unbiased estimator, 13
Uniformly most powerful test, 14
Uniformly most powerful unbiased test, 15
Unimodality, 6
Unit root process, 24
Upper hemi-continuity, 46
Upper semi-continuity, 46
usc (upper semi-continuity), 46
Utility function, 27
Utility maximization problem, 31
Variance, 7
Variance of residuals, 18
VCG (Vickrey-Clarke-Groves) mechanism, 39
Veto player, 38
Vickrey auction, 37
Vickrey-Clarke-Groves mechanism, 39
von Neumann-Morgenstern utility function, 32
Voting rule, 39
Wald statistic, 15
Walras’ Law, 31
Walrasian demand correspondence, 31
Walrasian equilibrium, 33
Walrasian model, 33
WAPM (Weak Axiom of Profit Maximization), 29
WARP (Weak Axiom of Revealed Preference), 27
WE (Walrasian equilibrium), 33
Weak Axiom of Profit Maximization, 29
Weak Axiom of Revealed Preference, 27
Weak dominance, 35
Weak Law of Large Numbers, 9
Weak perfect Bayesian equilibrium, 36
Weak stationarity, 15
Weakly dominated strategy, 35
Weakly increasing differences, 30
Weighted least squares, 19

White noise, 16
White’s standard error, 18
WID (weakly increasing differences), 30
Wiener process, 24
WLS (Weighted Least Squares), 19
WPBE (weak perfect Bayesian equilibrium), 36

X̄ (sample mean), 9

Zermelo’s Theorem, 35
0-1 loss, 14

Standard normal critical values
(two-tailed: e.g., 5% of observations from N(0,1) will fall outside ±1.96)
(one-tailed: e.g., 5% of observations from N(0,1) will fall above 1.64)

Significance   50%     25%     10%     5%      2.5%    1%      0.5%    0.25%   0.1%    0.05%   0.025%  0.01%
two-tailed     0.67    1.15    1.64    1.96    2.24    2.58    2.81    3.02    3.29    3.48    3.66    3.89
one-tailed    (0.00)   0.67    1.28    1.64    1.96    2.33    2.58    2.81    3.09    3.29    3.48    3.72
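These critical values can be regenerated from the standard normal quantile function. A minimal sketch in Python (assuming SciPy is available; the snippet is illustrative, not part of the original notes):

    # For significance level a, the two-tailed critical value solves
    # P(|Z| > z) = a and the one-tailed critical value solves P(Z > z) = a.
    from scipy.stats import norm

    sig = [0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
           0.005, 0.0025, 0.001, 0.0005, 0.00025, 0.0001]
    for a in sig:
        print(f"{a:9.4%}  two-tailed {norm.isf(a / 2):5.2f}"
              f"  one-tailed {norm.isf(a):5.2f}")

For example, norm.isf(0.025) returns 1.96, matching the 5% two-tailed entry above.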

Chi-squared critical values
(one-tailed: e.g., 5% of observations from χ²(1) will exceed 3.84)

Significance
df    50%     25%     10%     5%      2.5%    1%      0.5%    0.25%   0.1%    0.05%   0.025%  0.01%
 1    0.45    1.32    2.71    3.84    5.02    6.63    7.88    9.14   10.83   12.12   13.41   15.14
 2    1.39    2.77    4.61    5.99    7.38    9.21   10.60   11.98   13.82   15.20   16.59   18.42
 3    2.37    4.11    6.25    7.81    9.35   11.34   12.84   14.32   16.27   17.73   19.19   21.11
 4    3.36    5.39    7.78    9.49   11.14   13.28   14.86   16.42   18.47   20.00   21.52   23.51
 5    4.35    6.63    9.24   11.07   12.83   15.09   16.75   18.39   20.52   22.11   23.68   25.74
 6    5.35    7.84   10.64   12.59   14.45   16.81   18.55   20.25   22.46   24.10   25.73   27.86
 7    6.35    9.04   12.02   14.07   16.01   18.48   20.28   22.04   24.32   26.02   27.69   29.88
 8    7.34   10.22   13.36   15.51   17.53   20.09   21.95   23.77   26.12   27.87   29.59   31.83
 9    8.34   11.39   14.68   16.92   19.02   21.67   23.59   25.46   27.88   29.67   31.43   33.72
10    9.34   12.55   15.99   18.31   20.48   23.21   25.19   27.11   29.59   31.42   33.22   35.56
11   10.34   13.70   17.28   19.68   21.92   24.72   26.76   28.73   31.26   33.14   34.98   37.37
12   11.34   14.85   18.55   21.03   23.34   26.22   28.30   30.32   32.91   34.82   36.70   39.13
13   12.34   15.98   19.81   22.36   24.74   27.69   29.82   31.88   34.53   36.48   38.39   40.87
14   13.34   17.12   21.06   23.68   26.12   29.14   31.32   33.43   36.12   38.11   40.06   42.58
15   14.34   18.25   22.31   25.00   27.49   30.58   32.80   34.95   37.70   39.72   41.70   44.26
16   15.34   19.37   23.54   26.30   28.85   32.00   34.27   36.46   39.25   41.31   43.32   45.92
17   16.34   20.49   24.77   27.59   30.19   33.41   35.72   37.95   40.79   42.88   44.92   47.57
18   17.34   21.60   25.99   28.87   31.53   34.81   37.16   39.42   42.31   44.43   46.51   49.19
19   18.34   22.72   27.20   30.14   32.85   36.19   38.58   40.88   43.82   45.97   48.08   50.80
20   19.34   23.83   28.41   31.41   34.17   37.57   40.00   42.34   45.31   47.50   49.63   52.39
21   20.34   24.93   29.62   32.67   35.48   38.93   41.40   43.78   46.80   49.01   51.17   53.96
22   21.34   26.04   30.81   33.92   36.78   40.29   42.80   45.20   48.27   50.51   52.70   55.52
23   22.34   27.14   32.01   35.17   38.08   41.64   44.18   46.62   49.73   52.00   54.22   57.07
24   23.34   28.24   33.20   36.42   39.36   42.98   45.56   48.03   51.18   53.48   55.72   58.61
25   24.34   29.34   34.38   37.65   40.65   44.31   46.93   49.44   52.62   54.95   57.22   60.14
26   25.34   30.43   35.56   38.89   41.92   45.64   48.29   50.83   54.05   56.41   58.70   61.66
27   26.34   31.53   36.74   40.11   43.19   46.96   49.64   52.22   55.48   57.86   60.18   63.16
28   27.34   32.62   37.92   41.34   44.46   48.28   50.99   53.59   56.89   59.30   61.64   64.66
29   28.34   33.71   39.09   42.56   45.72   49.59   52.34   54.97   58.30   60.73   63.10   66.15
30   29.34   34.80   40.26   43.77   46.98   50.89   53.67   56.33   59.70   62.16   64.56   67.63
31   30.34   35.89   41.42   44.99   48.23   52.19   55.00   57.69   61.10   63.58   66.00   69.11
32   31.34   36.97   42.58   46.19   49.48   53.49   56.33   59.05   62.49   65.00   67.44   70.57
33   32.34   38.06   43.75   47.40   50.73   54.78   57.65   60.39   63.87   66.40   68.87   72.03
34   33.34   39.14   44.90   48.60   51.97   56.06   58.96   61.74   65.25   67.80   70.29   73.48
35   34.34   40.22   46.06   49.80   53.20   57.34   60.27   63.08   66.62   69.20   71.71   74.93
36   35.34   41.30   47.21   51.00   54.44   58.62   61.58   64.41   67.99   70.59   73.12   76.36
37   36.34   42.38   48.36   52.19   55.67   59.89   62.88   65.74   69.35   71.97   74.52   77.80
38   37.34   43.46   49.51   53.38   56.90   61.16   64.18   67.06   70.70   73.35   75.92   79.22
39   38.34   44.54   50.66   54.57   58.12   62.43   65.48   68.38   72.05   74.73   77.32   80.65
40   39.34   45.62   51.81   55.76   59.34   63.69   66.77   69.70   73.40   76.09   78.71   82.06
41   40.34   46.69   52.95   56.94   60.56   64.95   68.05   71.01   74.74   77.46   80.09   83.47
42   41.34   47.77   54.09   58.12   61.78   66.21   69.34   72.32   76.08   78.82   81.47   84.88
43   42.34   48.84   55.23   59.30   62.99   67.46   70.62   73.62   77.42   80.18   82.85   86.28
44   43.34   49.91   56.37   60.48   64.20   68.71   71.89   74.93   78.75   81.53   84.22   87.68
45   44.34   50.98   57.51   61.66   65.41   69.96   73.17   76.22   80.08   82.88   85.59   89.07
46   45.34   52.06   58.64   62.83   66.62   71.20   74.44   77.52   81.40   84.22   86.95   90.46
47   46.34   53.13   59.77   64.00   67.82   72.44   75.70   78.81   82.72   85.56   88.31   91.84
48   47.34   54.20   60.91   65.17   69.02   73.68   76.97   80.10   84.04   86.90   89.67   93.22
49   48.33   55.27   62.04   66.34   70.22   74.92   78.23   81.38   85.35   88.23   91.02   94.60
50   49.33   56.33   63.17   67.50   71.42   76.15   79.49   82.66   86.66   89.56   92.37   95.97
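The same table can be regenerated with the chi-squared quantile function. A minimal sketch (assuming SciPy; illustrative, not part of the original notes):

    # chi2.isf(a, df) returns the x solving P(X > x) = a for
    # X ~ chi-squared with df degrees of freedom.
    from scipy.stats import chi2

    sig = [0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
           0.005, 0.0025, 0.001, 0.0005, 0.00025, 0.0001]
    for df in range(1, 51):
        print(f"{df:2d}  " + "  ".join(f"{chi2.isf(a, df):6.2f}" for a in sig))

For example, chi2.isf(0.05, 1) returns 3.84, matching the df = 1, 5% entry above.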

Standard normal critical values (% of obs. from N(0,1))

  x   above x  outside ±x |   x   above x  outside ±x |   x   above x  outside ±x |   x   above x  outside ±x
0.00  50.0%  100.0% | 1.00  15.9%  31.7% | 2.00  2.3%   4.6% | 3.00  0.13%   0.27%
0.02  49.2%  98.4%  | 1.02  15.4%  30.8% | 2.02  2.2%   4.3% | 3.02  0.13%   0.25%
0.04  48.4%  96.8%  | 1.04  14.9%  29.8% | 2.04  2.1%   4.1% | 3.04  0.12%   0.24%
0.06  47.6%  95.2%  | 1.06  14.5%  28.9% | 2.06  2.0%   3.9% | 3.06  0.11%   0.22%
0.08  46.8%  93.6%  | 1.08  14.0%  28.0% | 2.08  1.9%   3.8% | 3.08  0.10%   0.21%
0.10  46.0%  92.0%  | 1.10  13.6%  27.1% | 2.10  1.8%   3.6% | 3.10  0.097%  0.19%
0.12  45.2%  90.4%  | 1.12  13.1%  26.3% | 2.12  1.7%   3.4% | 3.12  0.090%  0.18%
0.14  44.4%  88.9%  | 1.14  12.7%  25.4% | 2.14  1.6%   3.2% | 3.14  0.084%  0.17%
0.16  43.6%  87.3%  | 1.16  12.3%  24.6% | 2.16  1.5%   3.1% | 3.16  0.079%  0.16%
0.18  42.9%  85.7%  | 1.18  11.9%  23.8% | 2.18  1.5%   2.9% | 3.18  0.074%  0.15%
0.20  42.1%  84.1%  | 1.20  11.5%  23.0% | 2.20  1.4%   2.8% | 3.20  0.069%  0.14%
0.22  41.3%  82.6%  | 1.22  11.1%  22.2% | 2.22  1.3%   2.6% | 3.22  0.064%  0.13%
0.24  40.5%  81.0%  | 1.24  10.7%  21.5% | 2.24  1.3%   2.5% | 3.24  0.060%  0.12%
0.26  39.7%  79.5%  | 1.26  10.4%  20.8% | 2.26  1.2%   2.4% | 3.26  0.056%  0.11%
0.28  39.0%  77.9%  | 1.28  10.0%  20.1% | 2.28  1.1%   2.3% | 3.28  0.052%  0.10%
0.30  38.2%  76.4%  | 1.30  9.7%   19.4% | 2.30  1.1%   2.1% | 3.30  0.048%  0.097%
0.32  37.4%  74.9%  | 1.32  9.3%   18.7% | 2.32  1.0%   2.0% | 3.32  0.045%  0.090%
0.34  36.7%  73.4%  | 1.34  9.0%   18.0% | 2.34  0.96%  1.9% | 3.34  0.042%  0.084%
0.36  35.9%  71.9%  | 1.36  8.7%   17.4% | 2.36  0.91%  1.8% | 3.36  0.039%  0.078%
0.38  35.2%  70.4%  | 1.38  8.4%   16.8% | 2.38  0.87%  1.7% | 3.38  0.036%  0.072%
0.40  34.5%  68.9%  | 1.40  8.1%   16.2% | 2.40  0.82%  1.6% | 3.40  0.034%  0.067%
0.42  33.7%  67.4%  | 1.42  7.8%   15.6% | 2.42  0.78%  1.6% | 3.42  0.031%  0.063%
0.44  33.0%  66.0%  | 1.44  7.5%   15.0% | 2.44  0.73%  1.5% | 3.44  0.029%  0.058%
0.46  32.3%  64.6%  | 1.46  7.2%   14.4% | 2.46  0.69%  1.4% | 3.46  0.027%  0.054%
0.48  31.6%  63.1%  | 1.48  6.9%   13.9% | 2.48  0.66%  1.3% | 3.48  0.025%  0.050%
0.50  30.9%  61.7%  | 1.50  6.7%   13.4% | 2.50  0.62%  1.2% | 3.50  0.023%  0.047%
0.52  30.2%  60.3%  | 1.52  6.4%   12.9% | 2.52  0.59%  1.2% | 3.52  0.022%  0.043%
0.54  29.5%  58.9%  | 1.54  6.2%   12.4% | 2.54  0.55%  1.1% | 3.54  0.020%  0.040%

0.56  28.8%  57.5%  | 1.56  5.9%   11.9% | 2.56  0.52%  1.0%  | 3.56  0.019%   0.037%
0.58  28.1%  56.2%  | 1.58  5.7%   11.4% | 2.58  0.49%  0.99% | 3.58  0.017%   0.034%
0.60  27.4%  54.9%  | 1.60  5.5%   11.0% | 2.60  0.47%  0.93% | 3.60  0.016%   0.032%
0.62  26.8%  53.5%  | 1.62  5.3%   10.5% | 2.62  0.44%  0.88% | 3.62  0.015%   0.029%
0.64  26.1%  52.2%  | 1.64  5.1%   10.1% | 2.64  0.41%  0.83% | 3.64  0.014%   0.027%
0.66  25.5%  50.9%  | 1.66  4.8%   9.7%  | 2.66  0.39%  0.78% | 3.66  0.013%   0.025%
0.68  24.8%  49.7%  | 1.68  4.6%   9.3%  | 2.68  0.37%  0.74% | 3.68  0.012%   0.023%
0.70  24.2%  48.4%  | 1.70  4.5%   8.9%  | 2.70  0.35%  0.69% | 3.70  0.011%   0.022%
0.72  23.6%  47.2%  | 1.72  4.3%   8.5%  | 2.72  0.33%  0.65% | 3.72  0.0100%  0.020%
0.74  23.0%  45.9%  | 1.74  4.1%   8.2%  | 2.74  0.31%  0.61% | 3.74  0.0092%  0.018%
0.76  22.4%  44.7%  | 1.76  3.9%   7.8%  | 2.76  0.29%  0.58% | 3.76  0.0085%  0.017%
0.78  21.8%  43.5%  | 1.78  3.8%   7.5%  | 2.78  0.27%  0.54% | 3.78  0.0078%  0.016%
0.80  21.2%  42.4%  | 1.80  3.6%   7.2%  | 2.80  0.26%  0.51% | 3.80  0.0072%  0.014%
0.82  20.6%  41.2%  | 1.82  3.4%   6.9%  | 2.82  0.24%  0.48% | 3.82  0.0067%  0.013%
0.84  20.0%  40.1%  | 1.84  3.3%   6.6%  | 2.84  0.23%  0.45% | 3.84  0.0062%  0.012%
0.86  19.5%  39.0%  | 1.86  3.1%   6.3%  | 2.86  0.21%  0.42% | 3.86  0.0057%  0.011%
0.88  18.9%  37.9%  | 1.88  3.0%   6.0%  | 2.88  0.20%  0.40% | 3.88  0.0052%  0.010%
0.90  18.4%  36.8%  | 1.90  2.9%   5.7%  | 2.90  0.19%  0.37% | 3.90  0.0048%  0.0096%
0.92  17.9%  35.8%  | 1.92  2.7%   5.5%  | 2.92  0.18%  0.35% | 3.92  0.0044%  0.0089%
0.94  17.4%  34.7%  | 1.94  2.6%   5.2%  | 2.94  0.16%  0.33% | 3.94  0.0041%  0.0081%
0.96  16.9%  33.7%  | 1.96  2.5%   5.0%  | 2.96  0.15%  0.31% | 3.96  0.0037%  0.0075%
0.98  16.4%  32.7%  | 1.98  2.4%   4.8%  | 2.98  0.14%  0.29% | 3.98  0.0034%  0.0069%

% of observations from χ²(1)

  x     above x |   x     above x |   x     above x |   x     above x |   x     above x |   x     above x
0.000  100.0% | 1.250  26.4% | 2.500  11.4% | 3.750  5.3% | 5.000  2.5% | 6.250  1.2%
0.025  87.4%  | 1.275  25.9% | 2.525  11.2% | 3.775  5.2% | 5.025  2.5% | 6.275  1.2%
0.050  82.3%  | 1.300  25.4% | 2.550  11.0% | 3.800  5.1% | 5.050  2.5% | 6.300  1.2%
0.075  78.4%  | 1.325  25.0% | 2.575  10.9% | 3.825  5.0% | 5.075  2.4% | 6.325  1.2%
0.100  75.2%  | 1.350  24.5% | 2.600  10.7% | 3.850  5.0% | 5.100  2.4% | 6.350  1.2%
0.125  72.4%  | 1.375  24.1% | 2.625  10.5% | 3.875  4.9% | 5.125  2.4% | 6.375  1.2%
0.150  69.9%  | 1.400  23.7% | 2.650  10.4% | 3.900  4.8% | 5.150  2.3% | 6.400  1.1%
0.175  67.6%  | 1.425  23.3% | 2.675  10.2% | 3.925  4.8% | 5.175  2.3% | 6.425  1.1%
0.200  65.5%  | 1.450  22.9% | 2.700  10.0% | 3.950  4.7% | 5.200  2.3% | 6.450  1.1%
0.225  63.5%  | 1.475  22.5% | 2.725  9.9%  | 3.975  4.6% | 5.225  2.2% | 6.475  1.1%
0.250  61.7%  | 1.500  22.1% | 2.750  9.7%  | 4.000  4.6% | 5.250  2.2% | 6.500  1.1%
0.275  60.0%  | 1.525  21.7% | 2.775  9.6%  | 4.025  4.5% | 5.275  2.2% | 6.525  1.1%
0.300  58.4%  | 1.550  21.3% | 2.800  9.4%  | 4.050  4.4% | 5.300  2.1% | 6.550  1.0%
0.325  56.9%  | 1.575  20.9% | 2.825  9.3%  | 4.075  4.4% | 5.325  2.1% | 6.575  1.0%
0.350  55.4%  | 1.600  20.6% | 2.850  9.1%  | 4.100  4.3% | 5.350  2.1% | 6.600  1.0%
0.375  54.0%  | 1.625  20.2% | 2.875  9.0%  | 4.125  4.2% | 5.375  2.0% | 6.625  1.0%
0.400  52.7%  | 1.650  19.9% | 2.900  8.9%  | 4.150  4.2% | 5.400  2.0% | 6.650  0.99%
0.425  51.4%  | 1.675  19.6% | 2.925  8.7%  | 4.175  4.1% | 5.425  2.0% | 6.675  0.98%
0.450  50.2%  | 1.700  19.2% | 2.950  8.6%  | 4.200  4.0% | 5.450  2.0% | 6.700  0.96%
0.475  49.1%  | 1.725  18.9% | 2.975  8.5%  | 4.225  4.0% | 5.475  1.9% | 6.725  0.95%
0.500  48.0%  | 1.750  18.6% | 3.000  8.3%  | 4.250  3.9% | 5.500  1.9% | 6.750  0.94%
0.525  46.9%  | 1.775  18.3% | 3.025  8.2%  | 4.275  3.9% | 5.525  1.9% | 6.775  0.92%
0.550  45.8%  | 1.800  18.0% | 3.050  8.1%  | 4.300  3.8% | 5.550  1.8% | 6.800  0.91%
0.575  44.8%  | 1.825  17.7% | 3.075  8.0%  | 4.325  3.8% | 5.575  1.8% | 6.825  0.90%
0.600  43.9%  | 1.850  17.4% | 3.100  7.8%  | 4.350  3.7% | 5.600  1.8% | 6.850  0.89%
0.625  42.9%  | 1.875  17.1% | 3.125  7.7%  | 4.375  3.6% | 5.625  1.8% | 6.875  0.87%
0.650  42.0%  | 1.900  16.8% | 3.150  7.6%  | 4.400  3.6% | 5.650  1.7% | 6.900  0.86%
0.675  41.1%  | 1.925  16.5% | 3.175  7.5%  | 4.425  3.5% | 5.675  1.7% | 6.925  0.85%
0.700  40.3%  | 1.950  16.3% | 3.200  7.4%  | 4.450  3.5% | 5.700  1.7% | 6.950  0.84%
0.725  39.5%  | 1.975  16.0% | 3.225  7.3%  | 4.475  3.4% | 5.725  1.7% | 6.975  0.83%
0.750  38.6%  | 2.000  15.7% | 3.250  7.1%  | 4.500  3.4% | 5.750  1.6% | 7.000  0.82%
0.775  37.9%  | 2.025  15.5% | 3.275  7.0%  | 4.525  3.3% | 5.775  1.6% | 7.025  0.80%
0.800  37.1%  | 2.050  15.2% | 3.300  6.9%  | 4.550  3.3% | 5.800  1.6% | 7.050  0.79%
0.825  36.4%  | 2.075  15.0% | 3.325  6.8%  | 4.575  3.2% | 5.825  1.6% | 7.075  0.78%
0.850  35.7%  | 2.100  14.7% | 3.350  6.7%  | 4.600  3.2% | 5.850  1.6% | 7.100  0.77%
0.875  35.0%  | 2.125  14.5% | 3.375  6.6%  | 4.625  3.2% | 5.875  1.5% | 7.125  0.76%
0.900  34.3%  | 2.150  14.3% | 3.400  6.5%  | 4.650  3.1% | 5.900  1.5% | 7.150  0.75%
0.925  33.6%  | 2.175  14.0% | 3.425  6.4%  | 4.675  3.1% | 5.925  1.5% | 7.175  0.74%
0.950  33.0%  | 2.200  13.8% | 3.450  6.3%  | 4.700  3.0% | 5.950  1.5% | 7.200  0.73%
0.975  32.3%  | 2.225  13.6% | 3.475  6.2%  | 4.725  3.0% | 5.975  1.5% | 7.225  0.72%
1.000  31.7%  | 2.250  13.4% | 3.500  6.1%  | 4.750  2.9% | 6.000  1.4% | 7.250  0.71%
1.025  31.1%  | 2.275  13.1% | 3.525  6.0%  | 4.775  2.9% | 6.025  1.4% | 7.275  0.70%
1.050  30.6%  | 2.300  12.9% | 3.550  6.0%  | 4.800  2.8% | 6.050  1.4% | 7.300  0.69%
1.075  30.0%  | 2.325  12.7% | 3.575  5.9%  | 4.825  2.8% | 6.075  1.4% | 7.325  0.68%
1.100  29.4%  | 2.350  12.5% | 3.600  5.8%  | 4.850  2.8% | 6.100  1.4% | 7.350  0.67%
1.125  28.9%  | 2.375  12.3% | 3.625  5.7%  | 4.875  2.7% | 6.125  1.3% | 7.375  0.66%
1.150  28.4%  | 2.400  12.1% | 3.650  5.6%  | 4.900  2.7% | 6.150  1.3% | 7.400  0.65%
1.175  27.8%  | 2.425  11.9% | 3.675  5.5%  | 4.925  2.6% | 6.175  1.3% | 7.425  0.64%
1.200  27.3%  | 2.450  11.8% | 3.700  5.4%  | 4.950  2.6% | 6.200  1.3% | 7.450  0.63%
1.225  26.8%  | 2.475  11.6% | 3.725  5.4%  | 4.975  2.6% | 6.225  1.3% | 7.475  0.63%

% of observations from χ²(2)

  x    above x |   x    above x |   x    above x |   x    above x |   x     above x |   x     above x
0.00  100.0% | 2.50  28.7% | 5.00  8.2% | 7.50  2.4% | 10.00  0.67% | 12.50  0.19%
0.05  97.5%  | 2.55  27.9% | 5.05  8.0% | 7.55  2.3% | 10.05  0.66% | 12.55  0.19%
0.10  95.1%  | 2.60  27.3% | 5.10  7.8% | 7.60  2.2% | 10.10  0.64% | 12.60  0.18%
0.15  92.8%  | 2.65  26.6% | 5.15  7.6% | 7.65  2.2% | 10.15  0.63% | 12.65  0.18%
0.20  90.5%  | 2.70  25.9% | 5.20  7.4% | 7.70  2.1% | 10.20  0.61% | 12.70  0.17%
0.25  88.2%  | 2.75  25.3% | 5.25  7.2% | 7.75  2.1% | 10.25  0.59% | 12.75  0.17%
0.30  86.1%  | 2.80  24.7% | 5.30  7.1% | 7.80  2.0% | 10.30  0.58% | 12.80  0.17%
0.35  83.9%  | 2.85  24.1% | 5.35  6.9% | 7.85  2.0% | 10.35  0.57% | 12.85  0.16%
0.40  81.9%  | 2.90  23.5% | 5.40  6.7% | 7.90  1.9% | 10.40  0.55% | 12.90  0.16%
0.45  79.9%  | 2.95  22.9% | 5.45  6.6% | 7.95  1.9% | 10.45  0.54% | 12.95  0.15%
0.50  77.9%  | 3.00  22.3% | 5.50  6.4% | 8.00  1.8% | 10.50  0.52% | 13.00  0.15%
0.55  76.0%  | 3.05  21.8% | 5.55  6.2% | 8.05  1.8% | 10.55  0.51% | 13.05  0.15%
0.60  74.1%  | 3.10  21.2% | 5.60  6.1% | 8.10  1.7% | 10.60  0.50% | 13.10  0.14%
0.65  72.3%  | 3.15  20.7% | 5.65  5.9% | 8.15  1.7% | 10.65  0.49% | 13.15  0.14%
0.70  70.5%  | 3.20  20.2% | 5.70  5.8% | 8.20  1.7% | 10.70  0.47% | 13.20  0.14%
0.75  68.7%  | 3.25  19.7% | 5.75  5.6% | 8.25  1.6% | 10.75  0.46% | 13.25  0.13%
0.80  67.0%  | 3.30  19.2% | 5.80  5.5% | 8.30  1.6% | 10.80  0.45% | 13.30  0.13%
0.85  65.4%  | 3.35  18.7% | 5.85  5.4% | 8.35  1.5% | 10.85  0.44% | 13.35  0.13%
0.90  63.8%  | 3.40  18.3% | 5.90  5.2% | 8.40  1.5% | 10.90  0.43% | 13.40  0.12%
0.95  62.2%  | 3.45  17.8% | 5.95  5.1% | 8.45  1.5% | 10.95  0.42% | 13.45  0.12%
1.00  60.7%  | 3.50  17.4% | 6.00  5.0% | 8.50  1.4% | 11.00  0.41% | 13.50  0.12%
1.05  59.2%  | 3.55  16.9% | 6.05  4.9% | 8.55  1.4% | 11.05  0.40% | 13.55  0.11%
1.10  57.7%  | 3.60  16.5% | 6.10  4.7% | 8.60  1.4% | 11.10  0.39% | 13.60  0.11%
1.15  56.3%  | 3.65  16.1% | 6.15  4.6% | 8.65  1.3% | 11.15  0.38% | 13.65  0.11%
1.20  54.9%  | 3.70  15.7% | 6.20  4.5% | 8.70  1.3% | 11.20  0.37% | 13.70  0.11%
1.25  53.5%  | 3.75  15.3% | 6.25  4.4% | 8.75  1.3% | 11.25  0.36% | 13.75  0.10%
1.30  52.2%  | 3.80  15.0% | 6.30  4.3% | 8.80  1.2% | 11.30  0.35% | 13.80  0.10%
1.35  50.9%  | 3.85  14.6% | 6.35  4.2% | 8.85  1.2% | 11.35  0.34% | 13.85  0.098%
1.40  49.7%  | 3.90  14.2% | 6.40  4.1% | 8.90  1.2% | 11.40  0.33% | 13.90  0.096%
1.45  48.4%  | 3.95  13.9% | 6.45  4.0% | 8.95  1.1% | 11.45  0.33% | 13.95  0.093%
1.50  47.2%  | 4.00  13.5% | 6.50  3.9% | 9.00  1.1% | 11.50  0.32% | 14.00  0.091%
1.55  46.1%  | 4.05  13.2% | 6.55  3.8% | 9.05  1.1% | 11.55  0.31% | 14.05  0.089%
1.60  44.9%  | 4.10  12.9% | 6.60  3.7% | 9.10  1.1% | 11.60  0.30% | 14.10  0.087%
1.65  43.8%  | 4.15  12.6% | 6.65  3.6% | 9.15  1.0% | 11.65  0.30% | 14.15  0.085%
1.70  42.7%  | 4.20  12.2% | 6.70  3.5% | 9.20  1.01% | 11.70  0.29% | 14.20  0.083%
1.75  41.7%  | 4.25  11.9% | 6.75  3.4% | 9.25  0.98% | 11.75  0.28% | 14.25  0.080%
1.80  40.7%  | 4.30  11.6% | 6.80  3.3% | 9.30  0.96% | 11.80  0.27% | 14.30  0.078%
1.85  39.7%  | 4.35  11.4% | 6.85  3.3% | 9.35  0.93% | 11.85  0.27% | 14.35  0.077%
1.90  38.7%  | 4.40  11.1% | 6.90  3.2% | 9.40  0.91% | 11.90  0.26% | 14.40  0.075%
1.95  37.7%  | 4.45  10.8% | 6.95  3.1% | 9.45  0.89% | 11.95  0.25% | 14.45  0.073%
2.00  36.8%  | 4.50  10.5% | 7.00  3.0% | 9.50  0.87% | 12.00  0.25% | 14.50  0.071%
2.05  35.9%  | 4.55  10.3% | 7.05  2.9% | 9.55  0.84% | 12.05  0.24% | 14.55  0.069%
2.10  35.0%  | 4.60  10.0% | 7.10  2.9% | 9.60  0.82% | 12.10  0.24% | 14.60  0.068%
2.15  34.1%  | 4.65  9.8%  | 7.15  2.8% | 9.65  0.80% | 12.15  0.23% | 14.65  0.066%
2.20  33.3%  | 4.70  9.5%  | 7.20  2.7% | 9.70  0.78% | 12.20  0.22% | 14.70  0.064%
2.25  32.5%  | 4.75  9.3%  | 7.25  2.7% | 9.75  0.76% | 12.25  0.22% | 14.75  0.063%
2.30  31.7%  | 4.80  9.1%  | 7.30  2.6% | 9.80  0.74% | 12.30  0.21% | 14.80  0.061%
2.35  30.9%  | 4.85  8.8%  | 7.35  2.5% | 9.85  0.73% | 12.35  0.21% | 14.85  0.060%
2.40  30.1%  | 4.90  8.6%  | 7.40  2.5% | 9.90  0.71% | 12.40  0.20% | 14.90  0.058%
2.45  29.4%  | 4.95  8.4%  | 7.45  2.4% | 9.95  0.69% | 12.45  0.20% | 14.95  0.057%

% of observations from χ²(3)

  x    above x |   x    above x |   x    above x |   x    above x |   x     above x |   x     above x
0.00  100.0% | 2.50  47.5% | 5.00  17.2% | 7.50  5.8% | 10.00  1.9% | 12.50  0.59%
0.05  99.7%  | 2.55  46.6% | 5.05  16.8% | 7.55  5.6% | 10.05  1.8% | 12.55  0.57%
0.10  99.2%  | 2.60  45.7% | 5.10  16.5% | 7.60  5.5% | 10.10  1.8% | 12.60  0.56%
0.15  98.5%  | 2.65  44.9% | 5.15  16.1% | 7.65  5.4% | 10.15  1.7% | 12.65  0.55%
0.20  97.8%  | 2.70  44.0% | 5.20  15.8% | 7.70  5.3% | 10.20  1.7% | 12.70  0.53%
0.25  96.9%  | 2.75  43.2% | 5.25  15.4% | 7.75  5.1% | 10.25  1.7% | 12.75  0.52%
0.30  96.0%  | 2.80  42.3% | 5.30  15.1% | 7.80  5.0% | 10.30  1.6% | 12.80  0.51%
0.35  95.0%  | 2.85  41.5% | 5.35  14.8% | 7.85  4.9% | 10.35  1.6% | 12.85  0.50%
0.40  94.0%  | 2.90  40.7% | 5.40  14.5% | 7.90  4.8% | 10.40  1.5% | 12.90  0.49%
0.45  93.0%  | 2.95  39.9% | 5.45  14.2% | 7.95  4.7% | 10.45  1.5% | 12.95  0.47%
0.50  91.9%  | 3.00  39.2% | 5.50  13.9% | 8.00  4.6% | 10.50  1.5% | 13.00  0.46%
0.55  90.8%  | 3.05  38.4% | 5.55  13.6% | 8.05  4.5% | 10.55  1.4% | 13.05  0.45%
0.60  89.6%  | 3.10  37.6% | 5.60  13.3% | 8.10  4.4% | 10.60  1.4% | 13.10  0.44%
0.65  88.5%  | 3.15  36.9% | 5.65  13.0% | 8.15  4.3% | 10.65  1.4% | 13.15  0.43%
0.70  87.3%  | 3.20  36.2% | 5.70  12.7% | 8.20  4.2% | 10.70  1.3% | 13.20  0.42%
0.75  86.1%  | 3.25  35.5% | 5.75  12.4% | 8.25  4.1% | 10.75  1.3% | 13.25  0.41%
0.80  84.9%  | 3.30  34.8% | 5.80  12.2% | 8.30  4.0% | 10.80  1.3% | 13.30  0.40%
0.85  83.7%  | 3.35  34.1% | 5.85  11.9% | 8.35  3.9% | 10.85  1.3% | 13.35  0.39%
0.90  82.5%  | 3.40  33.4% | 5.90  11.7% | 8.40  3.8% | 10.90  1.2% | 13.40  0.38%
0.95  81.3%  | 3.45  32.7% | 5.95  11.4% | 8.45  3.8% | 10.95  1.2% | 13.45  0.38%
1.00  80.1%  | 3.50  32.1% | 6.00  11.2% | 8.50  3.7% | 11.00  1.2% | 13.50  0.37%
1.05  78.9%  | 3.55  31.4% | 6.05  10.9% | 8.55  3.6% | 11.05  1.1% | 13.55  0.36%
1.10  77.7%  | 3.60  30.8% | 6.10  10.7% | 8.60  3.5% | 11.10  1.1% | 13.60  0.35%
1.15  76.5%  | 3.65  30.2% | 6.15  10.5% | 8.65  3.4% | 11.15  1.1% | 13.65  0.34%
1.20  75.3%  | 3.70  29.6% | 6.20  10.2% | 8.70  3.4% | 11.20  1.1% | 13.70  0.33%
1.25  74.1%  | 3.75  29.0% | 6.25  10.0% | 8.75  3.3% | 11.25  1.0% | 13.75  0.33%
1.30  72.9%  | 3.80  28.4% | 6.30  9.8%  | 8.80  3.2% | 11.30  1.0% | 13.80  0.32%
1.35  71.7%  | 3.85  27.8% | 6.35  9.6%  | 8.85  3.1% | 11.35  1.00% | 13.85  0.31%
1.40  70.6%  | 3.90  27.2% | 6.40  9.4%  | 8.90  3.1% | 11.40  0.97% | 13.90  0.30%
1.45  69.4%  | 3.95  26.7% | 6.45  9.2%  | 8.95  3.0% | 11.45  0.95% | 13.95  0.30%
1.50  68.2%  | 4.00  26.1% | 6.50  9.0%  | 9.00  2.9% | 11.50  0.93% | 14.00  0.29%
1.55  67.1%  | 4.05  25.6% | 6.55  8.8%  | 9.05  2.9% | 11.55  0.91% | 14.05  0.28%
1.60  65.9%  | 4.10  25.1% | 6.60  8.6%  | 9.10  2.8% | 11.60  0.89% | 14.10  0.28%
1.65  64.8%  | 4.15  24.6% | 6.65  8.4%  | 9.15  2.7% | 11.65  0.87% | 14.15  0.27%
1.70  63.7%  | 4.20  24.1% | 6.70  8.2%  | 9.20  2.7% | 11.70  0.85% | 14.20  0.26%
1.75  62.6%  | 4.25  23.6% | 6.75  8.0%  | 9.25  2.6% | 11.75  0.83% | 14.25  0.26%
1.80  61.5%  | 4.30  23.1% | 6.80  7.9%  | 9.30  2.6% | 11.80  0.81% | 14.30  0.25%
1.85  60.4%  | 4.35  22.6% | 6.85  7.7%  | 9.35  2.5% | 11.85  0.79% | 14.35  0.25%
1.90  59.3%  | 4.40  22.1% | 6.90  7.5%  | 9.40  2.4% | 11.90  0.77% | 14.40  0.24%
1.95  58.3%  | 4.45  21.7% | 6.95  7.4%  | 9.45  2.4% | 11.95  0.76% | 14.45  0.24%
2.00  57.2%  | 4.50  21.2% | 7.00  7.2%  | 9.50  2.3% | 12.00  0.74% | 14.50  0.23%
2.05  56.2%  | 4.55  20.8% | 7.05  7.0%  | 9.55  2.3% | 12.05  0.72% | 14.55  0.22%
2.10  55.2%  | 4.60  20.4% | 7.10  6.9%  | 9.60  2.2% | 12.10  0.70% | 14.60  0.22%
2.15  54.2%  | 4.65  19.9% | 7.15  6.7%  | 9.65  2.2% | 12.15  0.69% | 14.65  0.21%
2.20  53.2%  | 4.70  19.5% | 7.20  6.6%  | 9.70  2.1% | 12.20  0.67% | 14.70  0.21%
2.25  52.2%  | 4.75  19.1% | 7.25  6.4%  | 9.75  2.1% | 12.25  0.66% | 14.75  0.20%
2.30  51.3%  | 4.80  18.7% | 7.30  6.3%  | 9.80  2.0% | 12.30  0.64% | 14.80  0.20%
2.35  50.3%  | 4.85  18.3% | 7.35  6.2%  | 9.85  2.0% | 12.35  0.63% | 14.85  0.19%
2.40  49.4%  | 4.90  17.9% | 7.40  6.0%  | 9.90  1.9% | 12.40  0.61% | 14.90  0.19%
2.45  48.4%  | 4.95  17.5% | 7.45  5.9%  | 9.95  1.9% | 12.45  0.60% | 14.95  0.19%

% of observations from χ²(4)

  x    above x |   x    above x |   x    above x |   x    above x |   x     above x |   x     above x
0.00  100.0% | 2.50  64.5% | 5.00  28.7% | 7.50  11.2% | 10.00  4.0% | 12.50  1.4%
0.05  100.0% | 2.55  63.6% | 5.05  28.2% | 7.55  11.0% | 10.05  4.0% | 12.55  1.4%
0.10  99.9%  | 2.60  62.7% | 5.10  27.7% | 7.60  10.7% | 10.10  3.9% | 12.60  1.3%
0.15  99.7%  | 2.65  61.8% | 5.15  27.2% | 7.65  10.5% | 10.15  3.8% | 12.65  1.3%
0.20  99.5%  | 2.70  60.9% | 5.20  26.7% | 7.70  10.3% | 10.20  3.7% | 12.70  1.3%
0.25  99.3%  | 2.75  60.0% | 5.25  26.3% | 7.75  10.1% | 10.25  3.6% | 12.75  1.3%
0.30  99.0%  | 2.80  59.2% | 5.30  25.8% | 7.80  9.9%  | 10.30  3.6% | 12.80  1.2%
0.35  98.6%  | 2.85  58.3% | 5.35  25.3% | 7.85  9.7%  | 10.35  3.5% | 12.85  1.2%
0.40  98.2%  | 2.90  57.5% | 5.40  24.9% | 7.90  9.5%  | 10.40  3.4% | 12.90  1.2%
0.45  97.8%  | 2.95  56.6% | 5.45  24.4% | 7.95  9.3%  | 10.45  3.3% | 12.95  1.2%
0.50  97.4%  | 3.00  55.8% | 5.50  24.0% | 8.00  9.2%  | 10.50  3.3% | 13.00  1.1%
0.55  96.8%  | 3.05  54.9% | 5.55  23.5% | 8.05  9.0%  | 10.55  3.2% | 13.05  1.1%
0.60  96.3%  | 3.10  54.1% | 5.60  23.1% | 8.10  8.8%  | 10.60  3.1% | 13.10  1.1%
0.65  95.7%  | 3.15  53.3% | 5.65  22.7% | 8.15  8.6%  | 10.65  3.1% | 13.15  1.1%
0.70  95.1%  | 3.20  52.5% | 5.70  22.3% | 8.20  8.5%  | 10.70  3.0% | 13.20  1.0%
0.75  94.5%  | 3.25  51.7% | 5.75  21.9% | 8.25  8.3%  | 10.75  3.0% | 13.25  1.0%
0.80  93.8%  | 3.30  50.9% | 5.80  21.5% | 8.30  8.1%  | 10.80  2.9% | 13.30  0.99%
0.85  93.2%  | 3.35  50.1% | 5.85  21.1% | 8.35  8.0%  | 10.85  2.8% | 13.35  0.97%
0.90  92.5%  | 3.40  49.3% | 5.90  20.7% | 8.40  7.8%  | 10.90  2.8% | 13.40  0.95%
0.95  91.7%  | 3.45  48.6% | 5.95  20.3% | 8.45  7.6%  | 10.95  2.7% | 13.45  0.93%
1.00  91.0%  | 3.50  47.8% | 6.00  19.9% | 8.50  7.5%  | 11.00  2.7% | 13.50  0.91%
1.05  90.2%  | 3.55  47.0% | 6.05  19.5% | 8.55  7.3%  | 11.05  2.6% | 13.55  0.89%
1.10  89.4%  | 3.60  46.3% | 6.10  19.2% | 8.60  7.2%  | 11.10  2.5% | 13.60  0.87%
1.15  88.6%  | 3.65  45.5% | 6.15  18.8% | 8.65  7.0%  | 11.15  2.5% | 13.65  0.85%
1.20  87.8%  | 3.70  44.8% | 6.20  18.5% | 8.70  6.9%  | 11.20  2.4% | 13.70  0.83%
1.25  87.0%  | 3.75  44.1% | 6.25  18.1% | 8.75  6.8%  | 11.25  2.4% | 13.75  0.81%
1.30  86.1%  | 3.80  43.4% | 6.30  17.8% | 8.80  6.6%  | 11.30  2.3% | 13.80  0.80%
1.35  85.3%  | 3.85  42.7% | 6.35  17.4% | 8.85  6.5%  | 11.35  2.3% | 13.85  0.78%
1.40  84.4%  | 3.90  42.0% | 6.40  17.1% | 8.90  6.4%  | 11.40  2.2% | 13.90  0.76%
1.45  83.5%  | 3.95  41.3% | 6.45  16.8% | 8.95  6.2%  | 11.45  2.2% | 13.95  0.75%
1.50  82.7%  | 4.00  40.6% | 6.50  16.5% | 9.00  6.1%  | 11.50  2.1% | 14.00  0.73%
1.55  81.8%  | 4.05  39.9% | 6.55  16.2% | 9.05  6.0%  | 11.55  2.1% | 14.05  0.71%
1.60  80.9%  | 4.10  39.3% | 6.60  15.9% | 9.10  5.9%  | 11.60  2.1% | 14.10  0.70%
1.65  80.0%  | 4.15  38.6% | 6.65  15.6% | 9.15  5.7%  | 11.65  2.0% | 14.15  0.68%
1.70  79.1%  | 4.20  38.0% | 6.70  15.3% | 9.20  5.6%  | 11.70  2.0% | 14.20  0.67%
1.75  78.2%  | 4.25  37.3% | 6.75  15.0% | 9.25  5.5%  | 11.75  1.9% | 14.25  0.65%
1.80  77.2%  | 4.30  36.7% | 6.80  14.7% | 9.30  5.4%  | 11.80  1.9% | 14.30  0.64%
1.85  76.3%  | 4.35  36.1% | 6.85  14.4% | 9.35  5.3%  | 11.85  1.9% | 14.35  0.63%
1.90  75.4%  | 4.40  35.5% | 6.90  14.1% | 9.40  5.2%  | 11.90  1.8% | 14.40  0.61%
1.95  74.5%  | 4.45  34.9% | 6.95  13.9% | 9.45  5.1%  | 11.95  1.8% | 14.45  0.60%
2.00  73.6%  | 4.50  34.3% | 7.00  13.6% | 9.50  5.0%  | 12.00  1.7% | 14.50  0.59%
2.05  72.7%  | 4.55  33.7% | 7.05  13.3% | 9.55  4.9%  | 12.05  1.7% | 14.55  0.57%
2.10  71.7%  | 4.60  33.1% | 7.10  13.1% | 9.60  4.8%  | 12.10  1.7% | 14.60  0.56%
2.15  70.8%  | 4.65  32.5% | 7.15  12.8% | 9.65  4.7%  | 12.15  1.6% | 14.65  0.55%
2.20  69.9%  | 4.70  31.9% | 7.20  12.6% | 9.70  4.6%  | 12.20  1.6% | 14.70  0.54%
2.25  69.0%  | 4.75  31.4% | 7.25  12.3% | 9.75  4.5%  | 12.25  1.6% | 14.75  0.52%
2.30  68.1%  | 4.80  30.8% | 7.30  12.1% | 9.80  4.4%  | 12.30  1.5% | 14.80  0.51%
2.35  67.2%  | 4.85  30.3% | 7.35  11.9% | 9.85  4.3%  | 12.35  1.5% | 14.85  0.50%
2.40  66.3%  | 4.90  29.8% | 7.40  11.6% | 9.90  4.2%  | 12.40  1.5% | 14.90  0.49%
2.45  65.4%  | 4.95  29.2% | 7.45  11.4% | 9.95  4.1%  | 12.45  1.4% | 14.95  0.48%

% of observations from χ²(5)

  x    above x |   x    above x |   x    above x |   x    above x |   x     above x |   x     above x
0.00  100.0% | 2.50  77.6% | 5.00  41.6% | 7.50  18.6% | 10.00  7.5% | 12.50  2.9%
0.05  100.0% | 2.55  76.9% | 5.05  41.0% | 7.55  18.3% | 10.05  7.4% | 12.55  2.8%
0.10  100.0% | 2.60  76.1% | 5.10  40.4% | 7.60  18.0% | 10.10  7.2% | 12.60  2.7%
0.15  100.0% | 2.65  75.4% | 5.15  39.8% | 7.65  17.7% | 10.15  7.1% | 12.65  2.7%
0.20  99.9%  | 2.70  74.6% | 5.20  39.2% | 7.70  17.4% | 10.20  7.0% | 12.70  2.6%
0.25  99.8%  | 2.75  73.8% | 5.25  38.6% | 7.75  17.1% | 10.25  6.8% | 12.75  2.6%
0.30  99.8%  | 2.80  73.1% | 5.30  38.0% | 7.80  16.8% | 10.30  6.7% | 12.80  2.5%
0.35  99.7%  | 2.85  72.3% | 5.35  37.5% | 7.85  16.5% | 10.35  6.6% | 12.85  2.5%
0.40  99.5%  | 2.90  71.5% | 5.40  36.9% | 7.90  16.2% | 10.40  6.5% | 12.90  2.4%
0.45  99.4%  | 2.95  70.8% | 5.45  36.3% | 7.95  15.9% | 10.45  6.3% | 12.95  2.4%
0.50  99.2%  | 3.00  70.0% | 5.50  35.8% | 8.00  15.6% | 10.50  6.2% | 13.00  2.3%
0.55  99.0%  | 3.05  69.2% | 5.55  35.2% | 8.05  15.4% | 10.55  6.1% | 13.05  2.3%
0.60  98.8%  | 3.10  68.5% | 5.60  34.7% | 8.10  15.1% | 10.60  6.0% | 13.10  2.2%
0.65  98.6%  | 3.15  67.7% | 5.65  34.2% | 8.15  14.8% | 10.65  5.9% | 13.15  2.2%
0.70  98.3%  | 3.20  66.9% | 5.70  33.7% | 8.20  14.6% | 10.70  5.8% | 13.20  2.2%
0.75  98.0%  | 3.25  66.2% | 5.75  33.1% | 8.25  14.3% | 10.75  5.7% | 13.25  2.1%
0.80  97.7%  | 3.30  65.4% | 5.80  32.6% | 8.30  14.0% | 10.80  5.5% | 13.30  2.1%
0.85  97.4%  | 3.35  64.6% | 5.85  32.1% | 8.35  13.8% | 10.85  5.4% | 13.35  2.0%
0.90  97.0%  | 3.40  63.9% | 5.90  31.6% | 8.40  13.6% | 10.90  5.3% | 13.40  2.0%
0.95  96.6%  | 3.45  63.1% | 5.95  31.1% | 8.45  13.3% | 10.95  5.2% | 13.45  2.0%
1.00  96.3%  | 3.50  62.3% | 6.00  30.6% | 8.50  13.1% | 11.00  5.1% | 13.50  1.9%
1.05  95.8%  | 3.55  61.6% | 6.05  30.1% | 8.55  12.8% | 11.05  5.0% | 13.55  1.9%
1.10  95.4%  | 3.60  60.8% | 6.10  29.7% | 8.60  12.6% | 11.10  4.9% | 13.60  1.8%
1.15  95.0%  | 3.65  60.1% | 6.15  29.2% | 8.65  12.4% | 11.15  4.8% | 13.65  1.8%
1.20  94.5%  | 3.70  59.3% | 6.20  28.7% | 8.70  12.2% | 11.20  4.8% | 13.70  1.8%
1.25  94.0%  | 3.75  58.6% | 6.25  28.3% | 8.75  11.9% | 11.25  4.7% | 13.75  1.7%
1.30  93.5%  | 3.80  57.9% | 6.30  27.8% | 8.80  11.7% | 11.30  4.6% | 13.80  1.7%
1.35  93.0%  | 3.85  57.1% | 6.35  27.4% | 8.85  11.5% | 11.35  4.5% | 13.85  1.7%
1.40  92.4%  | 3.90  56.4% | 6.40  26.9% | 8.90  11.3% | 11.40  4.4% | 13.90  1.6%
1.45  91.9%  | 3.95  55.7% | 6.45  26.5% | 8.95  11.1% | 11.45  4.3% | 13.95  1.6%
1.50  91.3%  | 4.00  54.9% | 6.50  26.1% | 9.00  10.9% | 11.50  4.2% | 14.00  1.6%
1.55  90.7%  | 4.05  54.2% | 6.55  25.6% | 9.05  10.7% | 11.55  4.2% | 14.05  1.5%
1.60  90.1%  | 4.10  53.5% | 6.60  25.2% | 9.10  10.5% | 11.60  4.1% | 14.10  1.5%
1.65  89.5%  | 4.15  52.8% | 6.65  24.8% | 9.15  10.3% | 11.65  4.0% | 14.15  1.5%
1.70  88.9%  | 4.20  52.1% | 6.70  24.4% | 9.20  10.1% | 11.70  3.9% | 14.20  1.4%
1.75  88.3%  | 4.25  51.4% | 6.75  24.0% | 9.25  9.9%  | 11.75  3.8% | 14.25  1.4%
1.80  87.6%  | 4.30  50.7% | 6.80  23.6% | 9.30  9.8%  | 11.80  3.8% | 14.30  1.4%
1.85  86.9%  | 4.35  50.0% | 6.85  23.2% | 9.35  9.6%  | 11.85  3.7% | 14.35  1.4%
1.90  86.3%  | 4.40  49.3% | 6.90  22.8% | 9.40  9.4%  | 11.90  3.6% | 14.40  1.3%
1.95  85.6%  | 4.45  48.7% | 6.95  22.4% | 9.45  9.2%  | 11.95  3.5% | 14.45  1.3%
2.00  84.9%  | 4.50  48.0% | 7.00  22.1% | 9.50  9.1%  | 12.00  3.5% | 14.50  1.3%
2.05  84.2%  | 4.55  47.3% | 7.05  21.7% | 9.55  8.9%  | 12.05  3.4% | 14.55  1.2%
2.10  83.5%  | 4.60  46.7% | 7.10  21.3% | 9.60  8.7%  | 12.10  3.3% | 14.60  1.2%
2.15  82.8%  | 4.65  46.0% | 7.15  21.0% | 9.65  8.6%  | 12.15  3.3% | 14.65  1.2%
2.20  82.1%  | 4.70  45.4% | 7.20  20.6% | 9.70  8.4%  | 12.20  3.2% | 14.70  1.2%
2.25  81.4%  | 4.75  44.7% | 7.25  20.3% | 9.75  8.3%  | 12.25  3.2% | 14.75  1.1%
2.30  80.6%  | 4.80  44.1% | 7.30  19.9% | 9.80  8.1%  | 12.30  3.1% | 14.80  1.1%
2.35  79.9%  | 4.85  43.4% | 7.35  19.6% | 9.85  8.0%  | 12.35  3.0% | 14.85  1.1%
2.40  79.1%  | 4.90  42.8% | 7.40  19.3% | 9.90  7.8%  | 12.40  3.0% | 14.90  1.1%
2.45  78.4%  | 4.95  42.2% | 7.45  18.9% | 9.95  7.7%  | 12.45  2.9% | 14.95  1.1%
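Every entry in these tail-probability tables is a value of the relevant survival function, P(X > x) = 1 − F(x). A minimal sketch (assuming SciPy; illustrative, not part of the original notes):

    # Tail probabilities via the survival function.
    from scipy.stats import norm, chi2

    print(norm.sf(1.96))       # P(Z > 1.96), about 0.025: the "above x" column
    print(2 * norm.sf(1.96))   # P(|Z| > 1.96), about 0.05: the "outside ±x" column
    for df in range(1, 6):
        print(df, chi2.sf(3.84, df))  # P(chi-squared(df) > 3.84); about 5% at df = 1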