Mathematical Methods for

Andreas M. Hefti

15 August 2014 (Draft)

Introductory maths course composed of three sequences:

1 Real analysis and maths for micro sequence: Andreas Hefti

2 Dynamic programming sequence (maths for macro): Michelle Rendall

3 Probability theory sequence (maths for econometrics): Marc Sommer

See http://www.econ.uzh.ch/dpe/courses.html for course material, dates and venues.

The micro sequence consists of a lecture and an exercise session.

Lecture:

18.08: 08:15 - 12:00, 14:00 - 15:30 @ KOL-E-21

19.08, 21.08, 22.08: 08:15 - 12:00 @ KOL-E-21

Exercises (Jean-Michel Benkert)

25.08 - 26.08: 08:30 - 12:00 @ KOL-E-21

27.08 - 28.08: 08:30 - 12:00 @ PLD-E-04

Exam:

The exam (22.09, 10:00 - 11:30 @ KOL-H-312) is 90 minutes long: 30 minutes each for the micro, macro and empirics sequences.

The exam is closed book

For the micro sequence we expect you to have command over all definitions (≈ 30) as well as over the theorems (≈ 40), but not the propositions, lemmata, corollaries (of course, you still need to understand them)

While linear algebra is a course prerequisite, there will not be any specific questions on it in the exam.

In this dense (crash) course you

1 study the main mathematical building blocks required by modern microeconomic theory, and

2 are trained in their usage (proofs!)

Focus is on deriving the mathematical apparatus required to understand (develop) economic theory.

=⇒ We concentrate more on (abstract) tools and less on their applications (they will come in the core courses).

Overview (Micro sequence):

1 Preliminaries: Mathematical statements; implication; proof techniques; sets; functions; relations

2 Topology: Metric spaces; norms; sequences; convergence; open, closed, compact, convex sets

3 Continuity (Lipschitz and uniform); uniform convergence

4 Differentiability; directional and partial derivatives

5 Implicit function theorem

6 Homogeneous functions; (quasi-)concavity/convexity

7 Constrained and unconstrained optimization; Lagrange's method; constraint qualification; envelope theorem

8 Uni- and multivariate (Riemann) integration; fundamental theorem; convergence theorem; techniques; Leibniz rule; Fubini's theorem

9 Correspondences: closed graph; upper/lower hemi-continuity

10 Fixed point theory: Brouwer; Kakutani; Tarski; Banach

11 Hyperplanes: supporting and separating hyperplane theorems

12 Linear algebra (course prerequisite)

1. Preliminaries

1.1. Mathematical statements

A statement is a "sentence" that can either be true or false, but never both ("tertium non datur"). Examples: 4 > 2. What about "I am lying now"? What about "I always lie"?

Let A be a statement. Then ¬A ("not A") also is a statement. ¬A is true (false) if A is false (true). Example: A = "Every student loves micro". Hence ¬A = "No student loves micro". Correct?

Suppose A and B are statements. Then new statements can be formed, among them the conjunction A ∧ B, the disjunction A ∨ B and the implication A ⇒ B, with truth values:

A | B | A ∧ B | A ∨ B | A ⇒ B
T | T |   T   |   T   |   T
T | F |   F   |   T   |   F
F | T |   F   |   T   |   T
F | F |   F   |   F   |   T

If A ⇒ B is true, then A is sufficient for (the truth of) B and B is necessary for (the truth of) A.

A ⇒ B :⇔ ¬A ∨ B

A = "Madonna is a man", B = "Snow is black". Is A ⇒ B true? The truth of the implication depends ONLY on the truth of the involved partial statements!

If A is both necessary and sufficient for B, then A ⇔ B ("A is equivalent to B"). A = "X is yellow", B = "X is a lemon". Are A and B equivalent?

There are three ways to prove the truth of A ⇒ B.

1 Direct proof: uses ((A ⇒ C) ∧ (C ⇒ B)) ⇒ (A ⇒ B). Show that A ⇒ C and C ⇒ B are true.

2 Proof by contraposition: prove ¬B ⇒ ¬A.

3 Indirect proof (by contradiction): assume that B is false but A is true. Show that A and ¬B together imply C, but it is already known that C is false. Hence ¬B must be false if A is true. Thus A implies B.

1.2. Sets

(Georg Cantor) A set is an unordered collection of distinct items, called elements. For a ≠ b: {a, b} = {b, a} = {a, b, a}

N = {1, 2, 3, ...}

{n ∈ N : n divides 15} = {1, 3, 5, 15}

∅ ≡ {x ∈ X : x ≠ x} is the empty set (of X)

(Bertrand Russell) Is the following recursive definition a set? Let M be the set of all sets that do not contain themselves as elements.

Note: neither "sets" nor "elements" are actually well-defined objects. Only the allowed operations (the "calculus") are relevant!

Definition 1.1 (Quantifiers)
Let M denote a set and E a property which any x ∈ M either satisfies or not.
1 ∀ (universal quantifier): (∀x ∈ M : E(x)) ⇔ {x ∈ M : E(x)} = M
2 ∃ (existential quantifier): (∃x ∈ M : E(x)) ⇔ {x ∈ M : E(x)} ≠ ∅

Proposition 1.1
1 ¬(∀x ∈ M : E(x)) ⇔ (∃x ∈ M : ¬E(x))
2 ¬(∃x ∈ M : E(x)) ⇔ (∀x ∈ M : ¬E(x))

Negation: "revert" quantifiers and connectives (keep the order)!

(∀x∃y : E(x, y)) and (∃y∀x : E(x, y)) are different statements! Example: E(x, y) = "Proposition y is trivial to student x"

X ⊂ Y ≡ ∀x ∈ X : x ∈ Y
X = Y ⇔ (X ⊂ Y) ∧ (Y ⊂ X)

Let A, B ⊂ X.
Intersection: A ∩ B ≡ {x ∈ X : (x ∈ A) ∧ (x ∈ B)}
Union: A ∪ B ≡ {x ∈ X : (x ∈ A) ∨ (x ∈ B)}
Difference: A\B ≡ {x ∈ X : (x ∈ A) ∧ (x ∉ B)}
Complement: A^c ≡ X\A
Power set: P(X) = {Y : Y ⊂ X}
∅_X ≡ {x ∈ X : x ≠ x}

Use Venn diagrams for illustrations (but not for proofs).

Proposition 1.2 Let X,Y be two sets.

1 The elements of ∅X have ANY property

2 ∅_X = ∅_Y ≡ ∅ (the empty set is unique)
3 ∅ ⊂ X, Y

Is P(∅) = P({∅})?

Proposition 1.3 (DeMorgan Laws)

Let {Ai : i ∈ I} be a family of subsets of a set. Then:

1 (⋂_i A_i)^c = ⋃_i A_i^c
2 (⋃_i A_i)^c = ⋂_i A_i^c

Two objects a, b together form a new object (a, b), the (ordered) pair.

If X, Y are sets, then the (cartesian) product X × Y is a set: the set of all ordered pairs (x, y), x ∈ X and y ∈ Y.

Similarly X_1 × ... × X_n ≡ ∏_{j=1}^n X_j is a new set, the product set.

x ∈ ∏_{j=1}^n X_j is written as (x_1, ..., x_n).

x_j is the j-th component of x, sometimes written x_j = pr_j(x) (the j-th projection of x).

1.3. Functions

Definition 1.2 (Function) X,Y are sets. A function (mapping) f from X to Y is a rule that assigns to each element of X exactly one element of Y .

A function is specified by its domain (X), its codomain (Y ) and its mapping rule f, notation: f : X → Y, x 7→ f(x)

Two functions f, g are equal (f = g) if they have the same domain, codomain and f(x) = g(x), x ∈ X.

Image of f: im(f) = f(X) = {y ∈ Y : ∃x ∈ X : y = f(x)}

Let f : X → Y be a function.

f is injective (or one-to-one) if x ≠ y ⇒ f(x) ≠ f(y)
f is surjective (or onto) if im(f) = Y
f is bijective if f is both one-to-one and onto
Inverse image: f^{-1}(C) = {x ∈ X : f(x) ∈ C}, C ⊂ Y
Composition: if f : X → U and g : U → Y, then g ∘ f : X → Y, x ↦ g(f(x))

Examples of functions:

Identity: id_X : X → X, x ↦ x

Absolute value: |·| : R → [0, ∞), |x| ≡ x if x ≥ 0 and |x| ≡ −x if x < 0

(Infinite) sequence: ϕ : N → X is a sequence (in X), often just denoted by (x_n), ϕ(n) = x_n

Proposition 1.4
For C ⊂ Y we have f(f^{-1}(C)) ⊂ C. Moreover, f is bijective if and only if (iff) ∃ a mapping g : Y → X with g ∘ f = id_X and f ∘ g = id_Y. Then g is uniquely determined.

If f is bijective, then the inverse function f^{-1} is the uniquely determined mapping f^{-1} : Y → X with f ∘ f^{-1} = id_Y and f^{-1} ∘ f = id_X.

Proposition 1.5 Let f : X → Y , A, B ⊂ X and C,D ⊂ Y .

1 f^{-1}(C ∪ D) = f^{-1}(C) ∪ f^{-1}(D)
2 f^{-1}(C ∩ D) = f^{-1}(C) ∩ f^{-1}(D)
3 f(A ∪ B) = f(A) ∪ f(B)
4 f(A ∩ B) ⊂ f(A) ∩ f(B)

1.4. Relations

Frequently, we care about how the elements of a set X relate to each other, for example via an order relation > or a preference relation.

A (binary) relation on X is any subset R ⊂ X × X. Notation: (x, y) ∈ R is denoted xRy. Further, R is
reflexive if xRx, x ∈ X
transitive if (xRy) ∧ (yRz) ⇒ (xRz)
symmetric if xRy ⇒ yRx

A relation R on X is called an equivalence relation if R is reflexive, transitive and symmetric. Frequently, equivalence relations are denoted by ∼. For any x ∈ X the set [x] ≡ {y ∈ X : y ∼ x} is the equivalence class of x. Example: X is the population of Zurich. Let x ∼ y if x and y have the same parents.

A relation ≤ is called an order on X if ≤ is reflexive, transitive and antisymmetric, i.e. (x ≤ y) ∧ (y ≤ x) ⇒ (y = x).

If ≤ is an order on X, then the pair (X, ≤) is a (partially) ordered set. If ∀x, y ∈ X :(x ≤ y) ∨ (y ≤ x) then X is a totally ordered (or linearly ordered) set.

Qualify the following relations

⊂ between subsets of a set

The relation ≥ on R^n defined by x ≥ x′ if x_j ≥ x′_j for all j = 1, ..., n.

1.5. The completeness axiom

(R, ≤) is a totally ordered set that, unlike e.g. Q, additionally satisfies the completeness axiom: for any nonempty A, B ⊂ R with a ≤ b ∀a ∈ A, b ∈ B, ∃c ∈ R such that a ≤ c ≤ b for a ∈ A, b ∈ B.

A subset A ⊂ R is bounded from above if ∃b ∈ R: a ≤ b ∀a ∈ A. Every such b is called an upper bound of A. A subset A ⊂ R is bounded from below if ∃b ∈ R: a ≥ b ∀a ∈ A. Every such b is called a lower bound of A.

The smallest upper bound of A ⊂ R is called the supremum of A, sup(A). The largest lower bound of A ⊂ R is called the infimum of A, inf(A). Note that sup(A), inf(A) ∉ A is possible. If sup(A) ∈ A or inf(A) ∈ A, we write sup(A) = max(A), inf(A) = min(A).

It follows from the completeness axiom that
1 Every nonempty A ⊂ R bounded from above has a supremum.
2 Every nonempty A ⊂ R bounded from below has an infimum.
3 If they exist, sup(A) and inf(A) are uniquely determined.

When working with sup and inf the following characterization is mostly useful:

Theorem 1.1 (A characterization of sup and inf)
Consider a nonempty set A ⊂ R. Then
1 x < sup(A) ⇔ ∃a ∈ A: x < a
2 x > inf(A) ⇔ ∃a ∈ A: x > a

Provided the sup exists, the first statement in the theorem can be reformulated as: ∀ε > 0 ∃a ∈ A: a + ε > sup(A).

2. Basic topology

2.1. Metric spaces

Definition 2.3 (Metric)
Let X be a set. A function d : X × X → [0, ∞) is a metric on X if
1 d(x, y) = 0 ⇔ x = y
2 d(x, y) = d(y, x) (symmetry)
3 d(x, y) ≤ d(x, z) + d(z, y) (triangle inequality)

1. - 3. are "natural" requirements for measuring a distance. If d is a metric on X, (X, d) is a metric space. Subsets of metric spaces are again metric spaces (with the induced metric).

Definition 2.4 (Norm)
Let X = R^n. A function ‖·‖ : R^n → [0, ∞) is a norm on R^n if
1 ‖x‖ = 0 ⇔ x = 0
2 ‖λx‖ = |λ| ‖x‖, λ ∈ R
3 ‖x + y‖ ≤ ‖x‖ + ‖y‖

Examples: |x|, and ‖x‖ ≡ ∑_{i=1}^n |x_i|

(R^n, ‖·‖) is a normed (vector) space.

As (x, y) ↦ ‖x − y‖ is a metric (induced by the norm), any normed space is also a metric space.

Definition 2.5 (Ball)
(X, d) is a metric space. For a ∈ X and r > 0 the set B_X(a, r) ≡ {x ∈ X : d(a, x) < r} is called the (open) ball around a with radius r.

Definition 2.6 (Neighborhood) A subset U ⊂ X of a metric space is a neighborhood of a ∈ X if ∃r > 0 with B(a, r) ⊂ U

Balls are special neighborhoods

Theorem 2.2 (Hausdorff property of metric spaces)
Let x, y be two points in a metric space with x ≠ y. Then ∃ neighborhoods U of x and V of y such that U ∩ V = ∅.

Hence two different points always have disjoint neighborhoods!

Definition 2.7 (Convergence)

A sequence (x_n) in X is convergent with limit a if for any neighborhood U ⊂ X of a ∃N: x_n ∈ U for n ≥ N. Then we write lim_{n→∞} x_n = a or x_n → a (n → ∞).

Theorem 2.3 The following statements are equivalent

1 lim_{n→∞} x_n = a
2 ∀ε > 0 ∃N: d(x_n, a) < ε for n ≥ N

Example: d(x, y) = |x − y|, x_n = 1/n has lim_{n→∞} 1/n = 0. Note that 0 ∉ {1/n : n ∈ N}.

If (x_n) is a sequence we can construct a new sequence (x_{n_k}) by "cancelling" some members of (x_n). (x_{n_k}) is called a subsequence of (x_n).

Theorem 2.4 (Characterization of convergence)

A sequence (x_n) converges to a if and only if every subsequence (x_{n_k}) has a further subsequence that converges to a.

Theorem 2.5 The limit of a convergent sequence is unique.

Remark: for (R^n, ‖·‖), where ‖·‖ is a norm, convergence of a sequence is invariant to the specific choice of norm. Reason: on R^n all norms are equivalent (see exercises) and generate the same neighborhoods (but not the same balls). Consequence: a vector sequence (x_n) in R^n converges iff it converges componentwise.

2.2. Topological notions

In the following consider the normed space X = (R^n, ‖·‖).

A point a ∈ A ⊂ R^n is an interior point of A if ∃ neighborhood U of a with U ⊂ A.

A is an open set if every point of A is interior. Example: B(a, r) is an open set.

A is a closed set if A^c is open.

Int(A) = {a ∈ A : a is an interior point of A} is the interior of A.

Examples: intervals on the real line; X is both open and closed (clopen).

Theorem 2.6
Let I be an index set and O_i ⊂ R^n.
1 If O_i is open ∀i ∈ I, then ⋃_{i∈I} O_i is open.
2 If O_1, ..., O_n are open, then ⋂_{i=1}^n O_i is open.

What about closed sets? DeMorgan rules imply: Arbitrary intersections of closed sets are closed. Finite unions of closed sets are closed.

Definition 2.8 (Boundary)
For A ⊂ R^n a point x ∈ R^n is a boundary point of A if any neighborhood U of x contains points both from A and A^c. The set of all boundary points of A is denoted by ∂A.

The set Ā = ∂A ∪ A is the closure of A (or the closed hull of A). Consequence: ∂A = Ā\Int(A). Moreover: ∂A is closed.

Theorem 2.7
For A ⊂ R^n the following statements are equivalent:
1 A is closed
2 A contains its boundary
3 Any sequence (x_n) in A that converges in R^n has its limit in A

Definition 2.9 (Compacta)
A ⊂ R^n is bounded if ∃M > 0 such that ‖x‖ ≤ M ∀x ∈ A. If A is both closed and bounded, then A is a compact set.

As compact sets are of central importance in optimization, it is useful to have multiple ways of describing such sets. A collection of open subsets O_i is an open cover for A ⊂ R^n if A ⊂ ⋃ O_i.

Theorem 2.8
For A ⊂ R^n the following statements are equivalent:
1 A is compact
2 Every sequence in A has a subsequence that converges in A
3 Every open cover for A has a finite subcover

Consider A ⊂ X. A point x ∈ A is isolated if ∃ε > 0 such that for any x′ ∈ A: x′ ≠ x ⇒ ‖x′ − x‖ ≥ ε.

Proposition 2.6 If A ⊂ X is compact and every a ∈ A is isolated, then A is finite.

Remark: Explains why the equilibrium set of a regular competitive economy has a finite number of (normalized) market equilibria.

Definition 2.10 (Convex sets)
A ⊂ R^n is a convex set if for x, y ∈ A and λ ∈ [0, 1] we have λx + (1 − λ)y ∈ A.

How do convex and compact sets relate? Remark: a convex set is a special case of a connected set. Arbitrary intersections of convex sets are convex.

3. Continuity

Let f : X → Y be a function. Generally, it is very difficult to adequately describe f(X). Therefore, it is often vital to study the qualitative properties of functions. Example: do small changes in x induce arbitrarily small changes in f(x)? The notion of continuity, one of the most central concepts in analysis, makes the idea of such "small changes" precise.

In the following we consider X ⊂ (R^n, ‖·‖_X), Y = (R^m, ‖·‖_Y), but many results of this section hold for general metric spaces.

Definition 3.11 (Continuity)
A function f : X → Y is continuous at x ∈ X if for any neighborhood V of f(x) ∃ a neighborhood U of x such that f(U) ⊂ V.

f is continuous if it is continuous ∀x ∈ X. C(X, Y) is the set of all continuous functions from X to Y.

As continuity is of central importance, it is desirable to have many equivalent ways of defining it.

Theorem 3.9
Let f : X → Y. The following statements are equivalent:
1 f is continuous at x
2 ∀ε > 0 ∃δ > 0: ‖f(x) − f(x′)‖ < ε for any x′ ∈ X with ‖x − x′‖ < δ

3 Every sequence (x_n) in X with lim_{n→∞} x_n = x satisfies lim_{n→∞} f(x_n) = f(x)

Sums and products of continuous functions are continuous. Vector-valued functions f = (f^1, ..., f^m) : X → R^m are continuous at x ∈ X iff every component function is continuous at x.

Definition 3.12 (Lipschitz continuity)
A function f : X → Y is Lipschitz(-continuous) if ∃L > 0: ‖f(x′) − f(x)‖ ≤ L ‖x′ − x‖ for any x, x′ ∈ X.

Proposition 3.7 If f : X → Y is Lipschitz, f is continuous.

Example: the norm (function) ‖·‖ is Lipschitz (hence also continuous). But 1/x : (0, 1) → R is not Lipschitz.

Proposition 3.8 (Compositions and continuity) X,Y,Z are metric spaces, f : X → Y is continuous at x and g : Y → Z is continuous at f(x). Then g ◦ f : X → Z is continuous at x.

The converse is generally false.

Definition 3.13 (Uniform continuity)
A function f : A ⊂ R^k → R^m is uniformly continuous if ∀ε > 0 ∃δ > 0 ∀x, y ∈ A:

‖x − y‖ < δ ⇒ ‖f(x) − f(y)‖ < ε

1/x : (0, 1) → R is not uniformly continuous.

Theorem 3.10
Suppose that A ⊂ R^k is compact and f ∈ C(A, R^m). Then f is even uniformly continuous.

How does the continuity property interplay with our previous topological notions?

Theorem 3.11 (Topological version of continuity)
Let f : X → Y. Then the following statements are equivalent:
1 f ∈ C(X, Y)
2 f^{-1}(O) is open in X for any open O ⊂ Y
3 f^{-1}(A) is closed in X for any closed A ⊂ Y

Consequence: if f ∈ C(X, R) then {x ∈ X : f(x) < r} is open and {x ∈ X : f(x) = r} is closed. Note that images of open (closed) sets need not be open (closed); consider e.g. f(x) = x² and O = (−1, 1).

Theorem 3.12
If f ∈ C(X, Y) and X is convex (hence connected), then f(X) is connected.

Consequence: If f ∈ C(X, R) then f(X) is an interval.

Theorem 3.13 If f ∈ C(X,Y ) and X is compact, then f(X) also is compact.

For real-valued functions we then get the central result:

Theorem 3.14 (Theorem of the max/min)
If X is compact and f ∈ C(X, R), then f has a maximum and a minimum on X.

3.1. Uniform convergence and continuity

Let f, f_n : A ⊂ R^k → R^m for n ∈ N.

Definition 3.14 (Pointwise convergence)

The sequence (fn) converges pointwise to f if:

∀x ∈ A : f_n(x) → f(x) (n → ∞)

Definition 3.15 (Uniform convergence)

The sequence (fn) converges uniformly to f if:

sup_{x∈A} ‖f_n(x) − f(x)‖ → 0 (n → ∞)

Note: uniform convergence ⇒ pointwise convergence, but not vice versa.

Proposition 3.9
Let A ⊂ R^k and suppose that f_n ∈ C(A, R), n ∈ N, converges uniformly to f : A → R. Then also f ∈ C(A, R).

Hence C(A, R) is a closed set; further, C(A, R) is a vector space.

Suppose that A ⊂ R^k is compact.

‖f‖_∞ ≡ sup_{x∈A} |f(x)| is a norm on C(A, R).

If (f_n) is a sequence in (C(A, R), ‖·‖_∞), then (f_n) converges in (C(A, R), ‖·‖_∞) iff (f_n) converges uniformly.

The space (C(A, R), ‖·‖_∞) is a complete normed vector space (a Banach space).

4. Differentiability

In the following let X be an open subset of R^n.

Definition 4.16 (Differentiability)
A function f : X → R^m is differentiable at x_0 ∈ X if there exists a linear operator A_{x_0} : R^n → R^m such that
lim_{x→x_0} ‖f(x) − f(x_0) − A_{x_0}(x − x_0)‖ / ‖x − x_0‖ = 0.

f is differentiable at x_0 iff f can be approximated linearly at x_0.

If f is differentiable at x_0, then f is continuous at x_0.

The linear map A_{x_0} is uniquely determined and it is convenient to denote it by A_{x_0} = Df(x_0). If f is differentiable at every x ∈ X, f is a differentiable function and the mapping x ↦ Df(x) is the derivative of f. If Df is continuous, then f is called continuously differentiable and f ∈ C¹(X, R^m).

Let f : X → R^m, x_0 ∈ X and v ∈ R^n\{0}. Because X is open, there is ε > 0 such that the mapping t : (−ε, ε) → R^m, t ↦ f(x_0 + tv) is well-defined.

Definition 4.17 (Directional derivatives)
If the above function is differentiable at 0, its derivative D_v f(x_0) = lim_{t→0} (f(x_0 + tv) − f(x_0))/t is called the directional derivative of f at x_0 in direction v.

If f is differentiable at x_0, then f has a directional derivative in any non-zero direction v, and D_v f(x_0) = Df(x_0)v.

Definition 4.18 (Partial derivatives)
The directional derivatives in the directions of the standard coordinates of R^n are called partial derivatives, and

∂_k f(x_0) = D_{e_k} f(x_0) for k = 1, ..., n.

If f is differentiable at x_0, then Df(x_0)v = ∑_{k=1}^n ∂_k f(x_0) v_k, v = (v_1, ..., v_n).

If f is differentiable at x_0, its derivative at x_0 can be represented by the (Jacobian) matrix

 1 1  ∂1f (x0) . . . ∂nf (x0)  . .  [Df(x0)] =  . .  m m ∂1f (x0) . . . ∂nf (x0) Draft Theorem 4.15 f is continuously differentiable iff every coordinate function j f : X → R is continuously partially differentiable.

Example: f : R³ → R², (x, y, z) ↦ (e^x cos(y), sin(xz)), is continuously differentiable and

[Df(x, y, z)] =
( e^x cos(y)  −e^x sin(y)  0         )
( z cos(xz)   0            x cos(xz) )

If f : X → R is differentiable at x_0, then its derivative can be represented by the gradient ∇f(x_0) = (∂_1 f(x_0), ..., ∂_n f(x_0)). The gradient (a vector) points in the direction of the steepest ascent of f.

If f, g : X → R^m are differentiable at x_0, then D(f + αg)(x_0) = Df(x_0) + αDg(x_0).

Chain rule: The composition g ◦ f is differentiable at x0 if the involved functions are differentiable and D(g ◦ f)(x0) = Dg(f(x0))Df(x0)

Product rule: if f, g ∈ C¹(X, R), then also fg ∈ C¹(X, R) and D(fg) = gDf + fDg.

Similar statements hold if fg is a vector-valued function (see MWG, p. 927).

Higher-order derivatives: f : X → R^m is m-times continuously differentiable iff f is m-times continuously partially differentiable. Notation: f ∈ C^m(X, R^m).

Hessian matrix: let f ∈ C²(X, R). Then D²f(x) = [∂²f(x)/(∂x_i ∂x_j)]_{i,j=1,...,n}, a symmetric matrix.

A function f ∈ C²(X, R) can be represented by its Taylor expansion:
f(y) = f(x) + Df(x)(y − x) + ½ (y − x) · D²f(x)(y − x) + o(‖y − x‖²)

The Landau notation o(‖y − x‖²) means: lim_{y→x} o(‖y − x‖²)/‖y − x‖² = 0, i.e. the remainder vanishes faster than quadratically.

5. Implicit Function Theorem

In economics, the variables of interest (e.g. consumer demand functions) are frequently defined only implicitly by a system of (nonlinear) equations F(x(q), q) = 0.

Consider a system of N equations depending on N endogenous variables x = (x_1, ..., x_N) and M parameters q = (q_1, ..., q_M):

f^1(x; q) = 0
...
f^N(x; q) = 0

We are frequently interested in how the solution vector x(q) depends on the parameters (comparative statics). =⇒ The IFT provides an answer without the need to explicitly solve the system!

Suppose that (x̄, q̄) solves F(x̄; q̄) = 0. The Implicit Function Theorem gives conditions under which we can locally solve the equations at (x̄, q̄) for x as a function of q. Locally: there are neighborhoods V about q̄ and U about x̄ such that for any q ∈ V the system F = 0 has a unique solution x = x(q) ∈ U.

Theorem 5.16 (Comparative statics)
Suppose that F ∈ C^q(X, R^N), where X ⊂ R^{N+M} is open, and F(x̄; q̄) = 0 for some (x̄, q̄) ∈ X. If Det(D_x F(x̄; q̄)) ≠ 0, then the system can be locally solved at (x̄, q̄), where x = x(q) are C^q functions of the parameters q. Moreover, we have that

D_q x(q̄) = −[D_x F(x̄; q̄)]^{-1} D_q F(x̄; q̄)

Note that the IFT does not assert the existence of a solution of F(x; q) = 0 (this must eventually be shown e.g. by a fixed point theorem), but it tells us how the endogenous variables vary with the exogenous parameters provided such a solution exists.

We can use Cramer's rule, known from the theory of linear equation systems, to calculate the derivatives of the implicitly defined functions. Cramer's rule states that for Ax = b, where A is a nonsingular n-square matrix, the solution x = (x_1, ..., x_n) can be calculated by x_j = Det(A_j)/Det(A), where A_j is obtained by replacing the j-th column of A by the vector b.

6. Homogeneous functions

Many functions in economics are naturally homogeneous, e.g. demand functions (Walras' law), profit functions, ...

Definition 6.19 (Homogeneous function)
A function f : R^n → R defined on an open cone is homogeneous of degree k if f(λx) = λ^k f(x), λ > 0.

A set S is a cone if x ∈ S ⇒ λx ∈ S, λ > 0. Terminology: increasing (k > 1), constant (k = 1), decreasing (0 < k < 1) returns to scale.

Proposition 6.10
If f(x) is a real-valued C¹ function that is homogeneous of degree k, then its first-order partial derivatives are homogeneous of degree k − 1.

Proposition 6.11 If f(x) is a real-valued homogeneous C1 function defined on the positive orthant, then the tangent planes to the level sets have constant slopes along each ray from the origin.

Example: Homogeneous ⇒ linear income expansion paths (no aggregation problems as indirect utility satisfies Gorman form)

Theorem 6.17 (Euler's Theorem)
Suppose that f(x) is a real-valued C¹ function defined on the positive orthant. f is homogeneous of degree k iff x · ∇f(x) = kf(x).

Definition 6.20 (Homothetic function)
A function v : R^n → R is homothetic if it is a monotone transformation of a homogeneous function.

A real-valued function is monotone if x ≥ y ⇒ f(x) ≥ f(y). If u is C¹ and homothetic on the positive orthant, then the slopes of the tangent planes to the level sets of u are constant along rays from the origin.

Remark: a property of a real-valued function is called ordinal if it is preserved under any monotonic transformation. Properties that are not invariant to monotonic transformations are cardinal. Homogeneity is a cardinal property of a function, whereas homotheticity is ordinal. The same type of distinction holds for the concave and quasi-concave functions explored next.

7. Concavity and Quasi-Concavity

In this section we consider functions f : A → R, where A ⊂ R^n is convex.

Definition 7.21 (Concavity) f is concave if

f(λx + (1 − λ)x′) ≥ λf(x) + (1 − λ)f(x′)

holds ∀x, x′ ∈ A and any λ ∈ [0, 1]. If the inequality is strict for x ≠ x′ and all λ ∈ (0, 1), then f is strictly concave.

f is (strictly) convex iff −f is (strictly) concave.

If f is concave, then the level sets of f bound convex sets from below.

Proposition 7.12
If f : A → R is concave, then for any x_0 ∈ A the upper level set C⁺_{x_0} = {x ∈ A : f(x) ≥ f(x_0)} is a convex set.

=⇒ If f is a utility function, concavity implies that the indifference curves have "the right shape" (convex, i.e. decreasing MRS).

Proposition 7.13
A C¹ function f on an open, convex set A is concave iff

f(x) − f(x_0) ≤ ∇f(x_0) · (x − x_0) ∀x, x_0 ∈ A

Moreover, f is strictly concave iff the inequality is strict for x ≠ x_0.

For convexity reverse the inequality.

Proposition 7.14
A C² function f on an open, convex set A is concave iff D²f(x) is NSD on A. If D²f(x) is ND on A, then f is strictly concave.

f is concave iff

f(α_1 x_1 + ... + α_k x_k) ≥ α_1 f(x_1) + ... + α_k f(x_k)

for any collection of vectors x_1, ..., x_k ∈ A and non-negative numbers α_1, ..., α_k with α_1 + ... + α_k = 1.

If A ⊂ R, this is known as Jensen’s inequality. Sums of concave (convex) functions are concave (convex).

If f is concave and F (u) is concave and increasing, then F (f(x)) is concave.

If f is concave (convex) on A, then f is also continuous on Int(A).

Definition 7.22 (Quasiconcavity)
f : A → R is quasiconcave on a convex set A if ∀a ∈ R the upper level set C⁺_a ≡ {x ∈ A : f(x) ≥ a} is a convex subset. Similarly, f is quasiconvex if C⁻_a ≡ {x ∈ A : f(x) ≤ a} is a convex subset.

Theorem 7.18 (Characterization of q-concavity) The following statements are equivalent:

1 f is q-concave on A
2 ∀x, y ∈ A and any λ ∈ [0, 1]:

f(x) ≥ f(y) ⇒ f(λx + (1 − λ)y) ≥ f(y)

3 ∀x, y ∈ A and any λ ∈ [0, 1]:

f(λx + (1 − λ)y) ≥ min {f(x), f(y)}

f is strictly q-concave if for x ≠ y ∈ A and λ ∈ (0, 1): f(x) ≥ f(y) ⇒ f(λx + (1 − λ)y) > f(y)

f is (strictly) quasiconvex iff −f is (strictly) quasiconcave. f is quasiconcave if f is concave.

If f is quasiconcave (-convex) and F is increasing, then F(f) is quasiconcave (-convex).

Sums of quasiconcave (-convex) functions need not be quasiconcave (-convex).

Proposition 7.15
A C¹ function f is quasiconcave on an open, convex set A iff

f(x) ≥ f(x_0) ⇒ ∇f(x_0) · (x − x_0) ≥ 0 ∀x, x_0 ∈ A

If the second inequality is strict for x ≠ x_0, then f is strictly quasiconcave.

Proposition 7.16
If a C² function f is quasiconcave on an open, convex set A, then

v · ∇f(x) = 0 ⇒ v · D²f(x)v ≤ 0, x ∈ A

The above condition is only necessary (a mistake in MWG, p. 935). The condition means: the Hessian D²f(x) must be NSD on the subspace {v ∈ R^n : v · ∇f(x) = 0}.

Proposition 7.17
A C² function f is strictly quasiconcave on an open, convex set A if for x ∈ A:

v · v = 1, v · ∇f(x) = 0 ⇒ v · D²f(x)v < 0

Determinantal tests for the last two propositions exist, using appropriately bordered Hessian matrices (see MWG, p. 938).

8. Classical optimization

In this section we consider f : X → R, X ⊂ R^n. f has a (global) maximum at x* ∈ X if f(x*) − f(x) ≥ 0 ∀x ∈ X. x* is called the max(imum) (point) or arg max of f; f(x*) is called the maximum value. x* is called an extreme point of f if it is a max or a min of f. f has a (global) minimum at x* iff −f has a maximum at x*.

Proposition 8.18
Suppose that F(u) is a strictly increasing function on the codomain of f. Then x* maximizes f(x) on X iff x* maximizes F(f(x)) on X.

f has a local max at x* if there is a neighborhood S around x* such that f(x*) − f(x) ≥ 0 ∀x ∈ S. If f is differentiable, then a point x ∈ X with ∇f(x) = 0 is called a stationary point of f (also: critical point, rest point).

Proposition 8.19 (FOC)
If f is differentiable and x ∈ Int(X) is a local extreme point of f, then x is stationary.

A stationary point therefore is a candidate for an extreme point (the FOC is only a necessary condition; the point could be a saddle point). To characterize local maximizers (minimizers) of f, we need to look at the SOC.

Theorem 8.19
Suppose that f ∈ C²(X, R) and x ∈ Int(X) is a stationary point of f.
1 If x is a local max of f, then D²f(x) is NSD.
2 If D²f(x) is ND, then x is a local max.
3 If D²f(x) is indefinite, then x is a saddle point of f.

Replace "negative" by "positive" for local minimizers. A saddle point is a stationary point that is neither a min nor a max (i.e. f is increasing in some directions and decreasing in others).

Theorem 8.20 (Global optima)
Let f ∈ C¹(X, R) be concave (convex), X convex and x ∈ Int(X). Then x maximizes (minimizes) f iff x is stationary.

9. Constrained Optimization

9.1. Equality constraints

Consider the problem

max_x f(x) s.t. g_1(x) = b_1, ..., g_m(x) = b_m   (1)

where all functions are defined on an open subset A ⊂ R^n and n ≥ m.

The set C = {x ∈ A : g_j(x) = b_j, j = 1, ..., m} is the constraint set. The definitions of a local or global (constrained) max are analogous to the unconstrained problem, except that we only consider feasible points x ∈ C.

Theorem 9.21 (FOC)

Suppose that both the objective and constraint functions of problem (1) are differentiable and x ∈ C is a local constrained maximizer. Assume that the m × n Jacobian [∂g_i(x)/∂x_j]_{i=1,...,m; j=1,...,n} has full rank, i.e. rank m (the constraint qualification; CQ). Then there exist numbers λ_j ∈ R (called Lagrange multipliers), one for each constraint, such that ∇f(x) = ∑_{j=1}^m λ_j ∇g_j(x).

Constraint qualification: the gradients ∇g_j(x) are linearly independent.

Lagrange's trick (assuming the CQ is satisfied): (x, λ) is a critical point of the Lagrange function:

L(x, λ) = f(x) − ∑_{j=1}^m λ_j (g_j(x) − b_j)

Upshot: no new theory required (an unconstrained optimization problem). If A is convex, (x, λ) is a stationary point of L and L is concave in x, then x is a global constrained maximizer of f on C.

Proposition 9.20 (SOC)
Suppose that f and g_1, ..., g_m are C² and consider problem (1). If (x, λ) satisfies the CQ and the FOC in theorem 9.21, and the Hessian of L w.r.t. x is ND on the linear constraint set {v : Dg(x)v = 0}, i.e.

v ≠ 0, Dg(x)v = 0 =⇒ v · D²_x L(x, λ)v < 0,

then x is a strict local constrained maximizer of f on C.

See proposition 14.37 for a determinantal test (bordered Hessian).

9.2. Inequality constraints

Consider the problem

max_{x∈R^n} f(x) s.t. g_1(x) = b_1, ..., g_M(x) = b_M, h_1(x) ≤ c_1, ..., h_K(x) ≤ c_K   (2)

where n ≥ M + K. Problem (2) is sometimes called a (nonlinear) programming problem. The constraint set is defined as before and denoted by C.

Suppose that x ∈ C is a local maximizer of problem (2) and the CQ is satisfied. Then there are multipliers λ_m ∈ R, one for each equality constraint, and λ_k ∈ R_+, one for each inequality constraint, such that
1 ∇f(x) = ∑_{m=1}^M λ_m ∇g_m(x) + ∑_{k=1}^K λ_k ∇h_k(x)
2 λ_k(h_k(x) − c_k) = 0, k = 1, ..., K, i.e. λ_k = 0 for any inactive inequality constraint.

The CQ must hold only for constraints that are binding (active) at x. The CQ ensures that x is also a maximizer in the locally linearised problem. Non-binding inequality constraints can be ignored. SOC as before (only for binding constraints).

Theorem 9.22 is perhaps easier to remember as a Lagrange recipe. We ignore equality constraints for simplicity.

1 Suppose x ∈ C is a local maximizer. Set up the natural Lagrangian: L(x, λ) = f(x) − ∑_{k=1}^K λ_k (h_k(x) − c_k)
2 Derive the FOC w.r.t. x of L.

3 Write down all complementary slackness conditions, i.e. λ_k ≥ 0 and λ_k(h_k(x) − c_k) = 0.

Proposition 9.21
Suppose that M = 0 and every inequality constraint h_k(·) is a quasiconvex function. Suppose that x* ∈ C satisfies the Kuhn-Tucker FOC and the CQ holds at x*. If ∇f(x) · (x′ − x) > 0 for any x, x′ ∈ C with f(x′) > f(x), then x* is a global maximizer.

A function with the property that ∇f(x) · (x′ − x) ≤ 0 ⇒ f(x′) ≤ f(x) is called pseudoconcave. Hence the FOC are also sufficient if f is concave, or if f is quasiconcave with non-vanishing gradient at critical points.

Proposition 9.22
If the constraint set C of problem (2) is convex and f is strictly quasiconcave, then there can be at most one global constrained maximizer.

Proposition 9.23

Let M = 0 for simplicity and f be quasiconcave. Suppose further that all constraint functions are quasiconvex.

1 For any fixed c = (c_1, ..., c_K) let Z(c) denote the set of global constrained maximizers of f on C. Then Z(c) is a convex set.
2 Let V(c) denote the maximum value function associated with problem (2). If f is concave and the constraint functions are convex, then V(·) is concave.

Remark: proposition 9.23 has several applications; e.g. cost functions (or expenditure functions) are naturally concave.

9.3. Envelope theorem

Let f, h_1, ..., h_k be C¹ functions and suppose that x*(a) maximizes f(x, a) on the constraint set h_1(x, a) = 0, ..., h_k(x, a) = 0, where a is a parameter. The function v(a) = f(x*(a), a) is called the maximum value function. The IFT tells us how x* depends on a. Similarly, the envelope theorem tells us how v depends on a:

Theorem 9.23 (Envelope theorem)
Suppose that x*(a) and the Lagrange multipliers λ*_1(a), ..., λ*_k(a) are C¹ functions of a and the CQ holds at x*(a). Then v′(a) = ∂L(x*(a), λ*(a), a)/∂a, where L(·) is the natural Lagrangian for the maximization problem.

Roy's identity and Hotelling's lemma are important applications of the envelope theorem.

10. (Riemann) Integration

10.1. Step functions

Let −∞ < a < b < ∞ and I = [a, b].

Z ≡ {α_0, ..., α_n} is a dissection of I if n ∈ N and a = α_0 < α_1 < ... < α_n = b

A bounded function f : I → R is called a step function if ∃ a dissection Z of I such that f is constant on each (α_{j−1}, α_j):

f|(α_{j−1}, α_j) = e_j, 1 ≤ j ≤ n

T(I, R) is the set of all step functions f : I → R.

Definition 10.23 (Integral for step functions)
If f ∈ T(I, R) with dissection Z, we call ∫_{(Z)} f ≡ ∑_{j=1}^n e_j (α_j − α_{j−1}) the integral of f with dissection Z.

Lemma 10.1 (Dissection invariance of ∫)
Suppose f ∈ T(I, R), and Z, Z′ are two dissections for f. Then ∫_{(Z)} f = ∫_{(Z′)} f.

For f ∈ T(I, R) we can define ∫_I f = ∫_a^b f(x)dx using an arbitrary dissection Z of f.

Note that ∫_a^b : T(I, R) → R, f ↦ ∫_a^b f, defines a (linear) mapping, which we also call the integral.

10.2. The R-integral for bounded functions

Let f : I → R be an arbitrary bounded function. Can the area "under the graph" of f be approximated by areas "below" certain step functions?

Upper R-integral: R(f) ≡ inf{∫_I g : g ∈ T(I, R), g ≥ f}
Lower R-integral: r(f) ≡ sup{∫_I e : e ∈ T(I, R), e ≤ f}
Note that r(f) ≤ R(f).

Definition 10.24 (Riemann integral)
A bounded function f : I → R is R-integrable over I if r(f) = R(f) ≡ ∫_I f. Then ∫_I f is the Riemann integral of f over I.

Not all bounded functions are R-integrable.

Theorem 10.24 (Characterizing R-integrable functions)

Consider a bounded function f : I → R. The following statements are equivalent:

1 f is R-integrable
2 ∀ε > 0 ∃ e, g ∈ T(I, R), e ≤ f ≤ g, such that ∫_I g − ∫_I e < ε

Theorem 10.24 is useful for finding important subclasses of R-integrable functions.

Theorem 10.25

Any f ∈ C(I, R) is R-integrable on I.

Theorem 10.26
Any monotone f : I → R is R-integrable.

10.3. Properties of the integral

Theorem 10.27 (Linearity)
The mapping ∫_I : C(I, R) → R is linear, i.e. for g, h ∈ C(I, R) and s, r ∈ R:
∫_I (sg + rh) = s ∫_I g + r ∫_I h

Linearity also holds if it is only known that g, h are R-integrable.

Theorem 10.28 (Positivity)
If f : I → R is R-integrable and f(x) ≥ 0, x ∈ I, then ∫_I f ≥ 0.

If f_1, f_2 : I → R are R-integrable and f_1(x) ≤ f_2(x), x ∈ I, then ∫_I f_1 ≤ ∫_I f_2.

Theorem 10.30 If f, f1, f2 : I → R are R-integrable, then |f|, min{f1, f2} and max{f1, f2} are also R-integrable on I.

Theorem 10.31
If f : I → R is R-integrable, then |∫_I f| ≤ ∫_I |f| ≤ (b − a)‖f‖_∞.

Oriented integral: ∫_a^b f = −∫_b^a f

Theorem 10.32 (Area Additivity)

If f : I → R is R-integrable and x, y, z ∈ I, then ∫_x^y f = ∫_x^z f + ∫_z^y f.

10.4. Fundamental theorem of calculus

Let f ∈ C(I, R). The integral then defines a mapping F : I → R, x ↦ ∫_a^x f(s)ds.

The integral is Lipschitz-continuous in the upper bound:

Proposition 10.24 For f ∈ C(I, R):

|F(x) − F(y)| ≤ ‖f‖_∞ |x − y|, x, y ∈ I

Theorem 10.33
Let f : I → R be R-integrable, and suppose that f is continuous at x ∈ I. Then F is differentiable at x and F′(x) = f(x).

Definition 10.25 (Antiderivative)
A function F ∈ C¹(I, R) is called an antiderivative (Stammfunktion) of f : I → R if F′(x) = f(x), x ∈ I.

Note that antiderivatives are unique up to an additive constant Draft Theorem 10.34 (Fundamental theorem) Every f ∈ C(I, R) has an antiderivative, and every antiderivative F of f satisfies:

F(x) = F(a) + ∫_a^x f(s)ds, x ∈ I

Consequence: every antiderivative F of f ∈ C(I, R) satisfies ∫_a^b f = F(b) − F(a) ≡ F|_a^b. Hence integration is trivial if an antiderivative is known. Generalizations to higher dimensions: Gaussian theorem, Stokes' theorem, ...

Can we interchange integration and limit?

Proposition 10.25 (Convergence theorem)

Suppose that fn ∈ C(I, R), n ∈ N.

1 If (f_n) converges uniformly to f, then lim ∫_I f_n = ∫_I lim f_n = ∫_I f
2 If ∑ f_n converges uniformly, then ∫_I (∑_{n=0}^∞ f_n) = ∑_{n=0}^∞ ∫_I f_n

The theorem may fail under pointwise convergence. For f ∈ C(I, R) ∃ a sequence of step functions that converges uniformly to f on I. Many of the previous results could also be proven using this theorem.

10.5. Integration: Techniques

By the fundamental theorem we can use the rules of differentiation to derive rules of integration as well.

Main tools: Integration by parts and integration by substitution (see exercises for examples)

Theorem 10.35 (Integration by parts) For f, g ∈ C1(I, R):

∫_a^b f g′ = fg|_a^b − ∫_a^b f′ g

Theorem 10.36 (Integration by substitution)

Let f ∈ C(I, R) and ϕ ∈ C¹([a, b], R) with −∞ < a < b < ∞ and ϕ([a, b]) ⊂ I. Then:

∫_a^b f(ϕ(x)) ϕ′(x)dx = ∫_{ϕ(a)}^{ϕ(b)} f(y)dy

10.6. Improper R-integral

What about integrating f on non-compact intervals? Idea: Consider the restriction of f on compact subintervals, and take limits.

Definition 10.26 (Improper integral)
Let −∞ ≤ a < b ≤ ∞. A function f : (a, b) → R is admissible if f|[c, d] is R-integrable for any compact [c, d] ⊂ (a, b). The admissible function f : (a, b) → R is improperly R-integrable if ∃c ∈ (a, b) such that the limits lim_{α↓a} ∫_α^c f and lim_{β↑b} ∫_c^β f exist. Then ∫_a^b f ≡ lim_{α↓a} ∫_α^c f + lim_{β↑b} ∫_c^β f.

If f :[a, b] → R is R-integrable, then f is also improperly R-integrable, and the integrals coincide.

Example: ∫_1^∞ x^γ dx exists iff γ < −1, and then equals −1/(1 + γ).

10.7. Integration in R^n

The R-integral can be generalized to Rn. We now illustrate how this works.

Suppose that each I_i ⊂ R is a (not necessarily compact) interval in R. The object Q = ∏_{i=1}^n I_i is called a rectangle in R^n.

The measure of Q is µ(Q) = ∏_{i=1}^n |I_i|.

Let Q_j, 1 ≤ j ≤ J, be rectangles in R^n such that all Q_j are disjoint and Q = ⋃_{j=1}^J Q_j. The set P = {Q_j : 1 ≤ j ≤ J} is a partition of Q.

Wlog we can consider product partitions, where each I_i is partitioned into disjoint intervals. The function g : Q → R is a step function if ∃ a partition P such that g = ∑_{j=1}^J c_j χ_{Q_j}. If g is a step function, then its R-integral is ∫_Q g = ∑_{j=1}^J c_j µ(Q_j).

For bounded f : Q → R we define (as before):
R(f) ≡ inf{∫_Q g : g a step function, g ≥ f}
r(f) ≡ sup{∫_Q e : e a step function, e ≤ f}

Definition 10.27 (R-integral)
Let f : Q → R be a bounded function. f is R-integrable if r(f) = R(f) ≡ ∫_Q f ≡ ∫_Q f dµ.

The n-dimensional integral shares the essential properties of the one-dimensional case.

In particular: theorem 10.24, theorem 10.25 (for compact Q) and theorems 10.27 - 10.32 are valid also if n > 1.

The integral is defined via limits ⇒ how to calculate ∫ f in applications? If n = 1 this is easy if an antiderivative is known. If n > 1, the important Fubini theorem tells us that we can split integration into iterated one-dimensional integrations, where the order of integration is irrelevant:

Theorem 10.37 (Fubini's theorem)
Let Q = [a, b] × [c, d], f ∈ C(Q, R). Then:

∫_Q f = ∫_a^b (∫_c^d f(x, y)dy) dx = ∫_c^d (∫_a^b f(x, y)dx) dy

Fubini's theorem applies also to n > 2.

One can extend the notion of integration to areas other than rectangles (→ measure theory).

[Figure: a region Ω ⊂ Q bounded below by u(x) and above by v(x), for x ∈ [a, b]]

If f :Ω ⊂ Q → R is integrable, then

∫_Ω f(x, y)dxdy = ∫_a^b (∫_{u(x)}^{v(x)} f(x, y)dy) dx

As a nice application we can use Fubini to obtain an elegant proof of the following fact:

Proposition 10.26 (Differentiation under the integral)
Suppose that Q = [a, b] × [c, d] and f ∈ C(Q, R). If
1 f(·, t) is R-integrable ∀t ∈ [c, d]
2 ∂f(x, t_0)/∂t exists ∀x ∈ [a, b]
3 ∂f(x, t)/∂t ∈ C(Q, R)

then (d/dt) ∫_a^b f(x, t_0)dx = ∫_a^b (∂f(x, t_0)/∂t) dx

Theorem 10.38 (Leibniz rule)
Suppose that u, v ∈ C¹(R, R), f ∈ C¹(R², R) and u(t_0) ≠ v(t_0). Then

(d/dt) ∫_{u(t_0)}^{v(t_0)} f(x, t_0)dx = f(v(t_0), t_0) v′(t_0) − f(u(t_0), t_0) u′(t_0) + ∫_{u(t_0)}^{v(t_0)} f_t(x, t_0)dx

Finally, we present the n-dimensional analogue of theorem 10.36 (the proof is highly technical!). A function Φ ∈ C¹(U, V) is a diffeomorphism if Φ is bijective and Φ^{-1} ∈ C¹(V, U).

Proposition 10.27 (Rule of substitution)
Let U ⊂ R^n be open and Φ ∈ C¹(U, R^n) a diffeomorphism from U to Φ(U). Further, consider Ω ⊂ U and f : Φ(Ω) → R. Then
∫_{Φ(Ω)} f dµ = ∫_Ω (f ∘ Φ)|Det(DΦ)| dµ
where f is integrable iff (f ∘ Φ)|Det(DΦ)| is integrable.

Example: with this theorem it is possible to show that ∫_{−∞}^∞ e^{−x²} dx = √π.

11. Correspondences

Definition 11.28 (Correspondence) A correspondence f from a set A to a set B is a rule that maps each x ∈ A to a nonempty subset f(x) of B.

While a function maps point-to-point, a correspondence maps point-to-set.

Correspondences are sometimes given the notation f : A ⇒ B to distinguish them from functions. But note that a correspondence f actually is a function f : A → P(B). Motivation: solutions to optimization problems or to equilibrium equations need not be unique.

The graph of a correspondence f is the set graph(f) = {(a, b) ∈ A × B : b ∈ f(a)}.

Definition 11.29 (Closed graph)
The correspondence f : X ⊂ R^n ⇒ R^m has a closed graph if for any two convergent sequences (x_k) in X and (y_k) in R^m, with y_k ∈ f(x_k) and lim_k x_k = x ∈ X, we have that lim_k y_k ∈ f(x).

Thus f has a closed graph iff graph(f) is a (relatively) closed subset of X × R^m.

Theorem 11.39
Let f : X ⊂ R^n ⇒ R^m, where f(x) = f_1(x) × f_2(x) × ... × f_m(x) and f_i : X ⇒ R. If each f_i has the closed graph property, so does f.

Definition 11.30 (Upper hemicontinuity)
The correspondence f : X ⊂ R^n ⇒ R^m is uhc at x_0 if for every open set V that contains f(x_0) there exists a neighborhood U of x_0 such that f(x) ⊂ V, x ∈ U.

A correspondence is uhc if it is uhc at every point. A function f : X → R^m is continuous iff it is uhc (as a singleton-valued correspondence). A correspondence f : X ⊂ R^n ⇒ R^m is compact-valued at a point x ∈ X if f(x) is compact in R^m.

Proposition 11.28 (Sequence characterization of uhc)
f : X ⊂ R^n ⇒ R^m is uhc at x_0 ∈ X if for any sequence (x_n) in X converging to x_0 and any sequence (y_n) with y_n ∈ f(x_n), there are y ∈ f(x_0) and a subsequence (y_{n_k}) that converges to y. If f is compact-valued, then the converse is also true.

The next proposition is a central ingredient e.g. in the existence proof of Walrasian or Nash equilibria.

Theorem 11.40
Let f : X ⊂ R^n ⇒ Y ⊂ R^m be a correspondence and Y compact. Then f is uhc iff f has a closed graph.

Definition 11.31 (Lower hemicontinuity)
The correspondence f : X ⊂ R^n ⇒ R^m is lhc at x_0 if for each y_0 ∈ f(x_0) and each neighborhood V of y_0 there is a neighborhood U of x_0 such that f(x) ∩ V ≠ ∅, x ∈ U.

f is lhc at x_0 ∈ X iff for every sequence (x_n) in X with lim x_n = x_0 and every y ∈ f(x_0) there is a sequence (y_n) in R^m with y_n ∈ f(x_n) and y_n → y.

uhc: if the image sets of a convergent sequence of domain-points contain a convergent sequence of image points, then the image of the domain-limit must contain the limit of the image-sequence.

lhc: if a domain-sequence converges, then any point in the image of the domain-limit can be reached as the limit of image points y_n ∈ f(x_n).

A correspondence that is both uhc and lhc is called continuous.

12. Fixed Point Theorems

The most frequent technique for establishing existence of a solution to a system of equilibrium equations is setting up the problem as finding a fixed point of a correspondence.

Let f : A ⇒ B and A ⊂ B. An element a ∈ A with a ∈ f(a) is a fixed point of f.

Consider f : A ⇒ B, A ⊂ B. If g(x) ≡ f(x) + x, then a ∈ A is a zero of f iff a is a FP of g.

Proposition 12.30 (Brouwer FP theorem)
Suppose that A ⊂ R^n is non-empty, compact, convex and f : A → A is a continuous self-mapping. Then f has a FP.

Frequently, the following generalization of the Brouwer FPT is invoked (e.g. in the existence-proof of general equilibrium).

Theorem 12.41 (Kakutani FP theorem)
Suppose that A ⊂ R^n is non-empty, compact, convex and f : A ⇒ A is a uhc correspondence with the property that f(x) is non-empty and convex for each x ∈ A. Then f has a FP.

Kakutani and Brouwer rely on f being (upper hemi-)continuous. There is also a FPT without the requirement that f be continuous.

Proposition 12.31 (Tarski FPT)
Let f : [0, 1]^n → [0, 1]^n be a nondecreasing function, i.e. f(x′) ≥ f(x) if x′ ≥ x. Then f has a FP.

Remark: the Tarski FPT holds on general non-empty compact sublattices of R^n.

Note that all previous FPTs assert existence but NOT uniqueness of a FP. Moreover, they are static concepts, i.e. they do not entail any notion of how the FP might be (dynamically) reached. A mapping f : A → A, where A ⊂ R^n, is called a contraction if ∃q ∈ (0, 1) such that ‖f(x′) − f(x)‖ ≤ q‖x′ − x‖, x, x′ ∈ A.

Proposition 12.32 (Banach FPT)
If f : A → A, A ⊂ R^n closed, is a contraction, then there is one and only one FP a ∈ A of f. Moreover, the successive approximation x_{t+1} ≡ f(x_t) converges to a for any initial condition x_0 ∈ A.

13. Hyperplane theorems

Definition 13.32 (Hyperplane)
Let c ∈ R and p ∈ R^n, p ≠ 0. Then the hyperplane generated by p, c is the set H_{p,c} = {v ∈ R^n : p · v = c}. The sets {v ∈ R^n : p · v ≥ c} and {v ∈ R^n : p · v ≤ c} are called the upper and lower halfspaces of H_{p,c}.

Suppose K ⊂ R^n is a convex set. Then the hyperplane H supports K (is a supporting hyperplane) if (i) K is contained in one halfspace of H and (ii) H contains a point of K̄.

Proposition 13.33 (Supporting hyperplane theorem)
Let K ⊂ R^n be a convex set. For any x ∈ ∂K̄ there exists a supporting hyperplane H containing x, i.e. there are p ≠ 0 and c ∈ R such that p · x ≥ c ≥ p · y ∀y ∈ K.

The hyperplane may not be unique.

Proposition 13.34 (Separating hyperplane theorem)
Suppose that K_1, K_2 ⊂ R^n are two convex sets with Int(K_1) ≠ ∅ and Int(K_1) ∩ K_2 = ∅. Then there exists a hyperplane that separates K_1 from K_2, i.e. K_1 and K_2 lie in opposite halfspaces. In other words, there exists (p, c), p ≠ 0, such that p · k_2 ≥ c ≥ p · k_1, k_1 ∈ K_1, k_2 ∈ K_2.

14. Linear Algebra (Course prerequisite)

14.1. Vectors

A vector x = (x_1, ..., x_n)ᵀ is an element of R^n.

A matrix M = [a_ij]_{m×n}, with entries a_ij for rows i = 1, ..., m and columns j = 1, ..., n,

is an element of R^{m×n}. An n-vector can be viewed as an n × 1 matrix.

If c_1, ..., c_m ∈ R, then c_1 v_1 + ... + c_m v_m is a linear combination of the vectors v_1, ..., v_m ∈ R^n. The vectors v_1, ..., v_m in R^n are linearly dependent if there are numbers c_1, ..., c_m, not all zero, such that c_1 v_1 + ... + c_m v_m = 0, and linearly independent otherwise.

For V ⊂ R^n the set S[V] of all linear combinations of vectors in V is the span of V. A set B of vectors in R^n is a basis for R^n if S[B] = R^n and the vectors in B are linearly independent. Example: B = {e_1, e_2} = {(1, 0)ᵀ, (0, 1)ᵀ} is a basis for R². Consequence: every v ∈ R^n can be uniquely represented as v = ∑_{j=1}^n c_j b_j, where c_1, ..., c_n ∈ R and b_j ∈ B are the basis vectors.

14.2. Matrices

Matrices play a fundamental part in almost every form of scientific economic analysis. From game theory to dynamical systems to econometrics: matrices (and their properties) are omnipresent. Matrices can be viewed just as convenient rectangular arrays of scalars, or as linear transformations between vector spaces.

Matrix addition: if A = [a_ij]_{m×n} and B = [b_ij]_{m×n} and λ ∈ R, then A + λB = [a_ij + λb_ij]_{m×n}.

Inner product of two vectors v, w ∈ R^n: v · w = ∑_{j=1}^n v_j w_j.

Matrix multiplication: if A = [a_ij]_{m×p} and B = [b_ij]_{p×n}, then C = AB = [c_ij]_{m×n} with c_ij = a_{i·} · b_{·j} (row i of A times column j of B).

Definition 14.33 (Linear transformation)
A function T : R^n → R^m is called a linear transformation (linear operator, linear map) if T(λv + µw) = λT(v) + µT(w) for λ, µ ∈ R and v, w ∈ R^n.

If A is an m × n matrix, then T(x) = Ax is a linear transformation from R^n to R^m. Conversely, any linear transformation T : R^n → R^m can be represented by an m × n matrix A = [T(e_1) ··· T(e_n)]. This corresponds to the standard basis representation of T.

Example: the linear map T : R³ → R², T(x_1, x_2, x_3) = (2x_1 − 3x_2, x_1 − 2x_2 + x_3), has the representation matrix
A =
( 2 −3 0 )
( 1 −2 1 )

Remark: the representation matrix A depends on the basis and therefore generally is not unique. If T : R^n → R^k and S : R^k → R^m have representation matrices A and B, then the composition S ∘ T is a linear map with matrix representation BA. Consequence: the composition of linear maps can be represented by matrix multiplication.

14.2.1. Determinants

Definition 14.34 (Determinant)

Let A = [a_ij]_{n×n} be a square matrix. Then the determinant of A is the real number Det(A) = ∑_{j=1}^n (−1)^{i+j} a_ij Det(A_ij), where A_ij is the submatrix obtained by deleting row i and column j (the expansion is valid for any fixed row i).

Example: Det( (a_11 a_12; a_21 a_22) ) = a_11 a_22 − a_12 a_21.

Geometrically, the absolute value of the determinant corresponds to the volume of the parallelepiped spanned by the column vectors of A. If Det(A) ≠ 0, the sign of Det(A) specifies the orientation of the respective linear transformation (there are always two orientations in R^n).

Let A = [a_ij]_{n×n} be a square matrix.
1 Det(A) = Det(Aᵀ), where Aᵀ is the transpose of A.
2 Interchanging two rows only reverts the sign of Det(A).
3 Multiplying all elements of a row by a constant c multiplies the determinant by c.
4 Adding a multiple of a row to another row leaves the determinant unchanged.
5 Det(AB) = Det(A)Det(B)

Note that because of 1., all of the above statements also hold for columns.

14.2.2. Eigenvalues

Suppose that T : R^n → R^n is a linear transformation (an endomorphism). We know that T can be represented by an n × n matrix A. As A depends on the underlying choice of basis, one can try to find some order among the other matrix representations of the same transformation. If A is a basis representation of T, one can show that any possible basis representation of T can be written as S^{-1}AS, where S is called the change-of-basis matrix. The square matrix S^{-1}AS is called similar to A. Similar but not identical matrices are different basis representations of a single linear transformation T. Remark: similarity is an equivalence relation on the space of the corresponding square matrices.

One would therefore expect similar matrices to share important properties. One of them is that they have the same eigenvalues.

Definition 14.35 (Eigenvalue)
A scalar λ ∈ C is an eigenvalue of an n × n matrix A if there exists a (possibly complex-valued) n-vector v ≠ 0 (called an eigenvector) such that Av = λv.

A linear transformation Av generally changes direction and magnitude of the vector v. If however v is an eigenvector of A (with eigenvalue λ), then Av only scales its length according to λ while keeping (or flipping) its direction.

Proposition 14.36
λ is an eigenvalue of A iff Det(A − λI) = 0, where I is the identity matrix.

Det(A − λI) is called the characteristic polynomial of A. If A is an n-square matrix, there are exactly n eigenvalues of A (counting multiplicity). If A is symmetric (A = Aᵀ), then all eigenvalues of A are real. AB and BA have the same eigenvalues. Det(A) is the product of all eigenvalues of A.

14.2.3. Quadratic Forms and Definiteness Types

Q = x · Ax = ∑_{i,j=1}^n a_ij x_i x_j, where A is an n-square matrix, is called a quadratic form. Wlog we can assume A to be symmetric. Quadratic forms appear naturally in economic problems in the context of concavity/convexity of a function (e.g. in the SOC of certain optimization problems).

Definition 14.36 (Definiteness)
A symmetric square matrix A is positive definite (PD) if its associated quadratic form satisfies Q > 0 for any choice of x ≠ 0. If Q ≥ 0, then A is positive semidefinite (PSD).

A is negative (semi-)definite (ND/NSD) iff −A is PD (PSD). A is indefinite if it is not semidefinite. These five categories form a partition over the set of all symmetric matrices.

A minor of order k of an n-square matrix A is the determinant of a k × k submatrix obtained by deleting n − k rows and n − k columns. A principal minor of order k is the minor obtained from A by deleting all but k rows and the same k columns. A leading principal minor of order k is the principal minor obtained by keeping only the first k rows (and columns).

Theorem 14.42 (Characterization of definiteness)
Consider a symmetric square matrix A. The following statements are equivalent:

1 A is PD
2 All eigenvalues of A are positive
3 All leading principal minors of A are positive

A is PSD iff all principal minors of A are non-negative.

As the concept of definiteness is related to quadratic forms, the notion of definiteness also makes sense for non-symmetric square matrices A.

A is PD iff the symmetric matrix A + Aᵀ is PD.

A is PD iff A^{-1} is PD.

If −A is PD, then the matrix kA, where k > 0, is stable (i.e. all its eigenvalues have negative real parts).

SOC of optimization problems sometimes require verifying the definiteness of a matrix on a constraint set, usually a linear subspace (from the FOC). Question: when is a symmetric matrix A (positive) definite on a linear subspace {x ∈ R^n : Bx = 0}, where B = [b_ij]_{m×n}?

Example: consider A = (a b; b c) and B = (r, s) with r ≠ 0. Then A is positive definite on {x ∈ R^n : Bx = 0} iff as² − 2bsr + r²c > 0, or equivalently, iff

−Det
( 0 r s )
( r a b )  > 0
( s b c )

This is just A bordered with the coefficients of the linear constraint.

Proposition 14.37

To determine the definiteness of a symmetric n-matrix A (or a general quadratic form) on a linear constraint set Bx = 0 consisting of m equations, construct the symmetric (n + m)-square matrix H by bordering A by the coefficients B of the linear constraints:

H =
( 0   B )
( Bᵀ  A )

1 If Det(H) and the last n − m leading principal minors of H have the same sign as (−1)^m, then A is PD on the constraint set.

2 If Det(H) has the same sign as (−1)^n and if the last n − m leading principal minors of H alternate in sign, then A is ND on the constraint set.

There exists a similar test for the case of PSD (NSD) on a constraint set, which is much more intricate and rarely required in applications (see MWG, p. 938).

A square matrix A is nonsingular if Ax = 0 ⇔ x = 0.

Definition 14.37 (Inverse)
Let A be a square matrix. A is invertible if ∃ a square matrix A^{-1} such that A^{-1}A = I.

Proposition 14.38 Let A be a square matrix. The following statements are equivalent.

1 A is non-singular
2 A^{-1} exists
3 Det(A) ≠ 0
4 0 is not an eigenvalue of A
5 Ax = b has a unique solution for each b
6 All rows and all columns of A are linearly independent (A has full rank)