Functional Analysis and Optimization

Kazufumi Ito∗ November 29, 2016

Abstract

In this monograph we develop the function space method for optimization problems and operator equations in Banach spaces. Optimization is one of the key components of mathematical modeling of real world problems and the solution method provides an accurate and essential description and validation of the mathematical model. Optimization problems are encountered frequently in engineering and the sciences and have widespread practical applications. In general we optimize an appropriately chosen cost functional subject to constraints, for example equality constraints and inequality constraints. The problem is cast in a function space, and it is important to formulate the problem in a proper function space framework in order to develop an accurate theory and effective algorithms. Many of the constraints for the optimization are governed by partial differential and functional equations. In order to discuss the existence of optimizers it is essential to develop a comprehensive treatment of the constraint equations. In general the necessary optimality condition is in the form of operator equations.

Non-smooth optimization has become a very basic modeling tool and enlarges and enhances the applications of the optimization method in general. For example, in the classical variational formulation we analyze the non-Newtonian energy functional and the nonsmooth friction penalty. For imaging/signal analysis the sparsity optimization is used by means of L^1 and TV regularization. In order to develop an efficient solution method for large scale optimizations it is essential to develop a set of necessary conditions in equation form rather than as a variational inequality. The Lagrange multiplier theory for constrained minimizations and nonsmooth optimization problems is developed and analyzed. The theory derives the complementarity condition for the multiplier and the solution. Consequently, one can derive the necessary optimality system for the solution and the multiplier for a general class of nonsmooth and nonconvex optimizations. In the light of the theory we derive and analyze numerical algorithms based on the primal-dual active set method and the semi-smooth Newton method.

The monograph also covers the basic materials for real analysis, functional analysis, measure theory, operator theory and PDE theory, which makes the book self-contained, comprehensive and complete. The analysis component is naturally connected to the optimization theory. The necessary optimality condition is in general written as nonlinear operator equations for the primal variable and the Lagrange multiplier. The Lagrange multiplier theory of a general class of nonsmooth and

∗Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA

non-convex optimizations is based on functional analysis tools. For example, the necessary optimality condition for constrained optimization is in general a nonsmooth and pseudo-monotone type operator equation. To develop the mathematical treatment of PDE constrained minimization we also present the PDE theory and the linear and nonlinear C0-semigroup theory on Banach spaces, and we give a full discussion of the well-posedness of the control dynamics and optimization problems. That is, the existence and uniqueness of solutions to a general class of controlled linear and nonlinear PDEs and Cauchy problems for nonlinear evolution equations in Banach spaces are discussed, especially via the linear operator theory, the (pseudo-) monotone operator and mapping theory and the fixed point theory. Throughout, the monograph demonstrates the theory and algorithms using concrete examples and describes how to apply them to motivated applications.

1 Introduction

In this monograph we develop the function space approach for optimization problems, e.g., nonlinear programming in Banach spaces, convex and non-convex nonsmooth variational problems, control and inverse problems, image/signal analysis, material design, classification, and resource allocation. We also develop the basic functional analysis tools for nonlinear equations and Cauchy problems in Banach spaces in order to establish the well-posedness of the constrained optimization. The function space optimization method allows us to study the well-posedness of a general class of optimization problems systematically using functional analysis tools, and it also enables us to develop and analyze numerical optimization algorithms based on the calculus of variations and the Lagrange multiplier theory for constrained optimization. The function space analysis provides the regularizing procedure to remedy the ill-posedness of the constraints and of the necessary optimality system, and thus one can develop stable and effective numerical algorithms. The convergence analysis for the function space optimization guides us in analyzing and improving the stability and effectiveness of algorithms for the associated large scale optimizations.

A general class of constrained optimization problems in function spaces is considered and we develop the Lagrange multiplier theory and effective solution algorithms. Applications are presented and analyzed for various examples including control and design optimization, inverse problems, image analysis and variational problems. Non-convex and non-smooth optimization becomes increasingly important for the advanced and effective use of optimization methods, and thus these are essential topics of the monograph.

We introduce the basic function space and functional analysis tools needed for our analysis. The monograph provides an introduction to functional analysis, real analysis and convex analysis and their basic concepts with illustrated examples. Thus, one can use the book as basic course material for functional analysis and nonlinear operator theory. The nonlinear operator theory and its applications to PDE problems are presented in detail, including classical variational optimization problems in Newtonian and non-Newtonian mechanics and fluids. Emerging applications of optimization include imaging and signal analysis, classification and machine learning, often formulated as PDE-constrained optimization. There are numerous applications of the optimization theory and variational methods in all sciences and to real world applications. For example, the following are

motivated examples.

Example 1 (Quadratic programming)

min (1/2) x^t A x − a^t x over x ∈ R^n

subject to Ex = b (equality constraint) and Gx ≤ c (inequality constraint),

where A ∈ R^{n×n} is a symmetric positive definite matrix, E ∈ R^{m×n}, G ∈ R^{p×n}, and a ∈ R^n, b ∈ R^m, c ∈ R^p are given vectors. The quadratic programming is formulated on a Hilbert space X for the variational problem, in which (Ax, x)_X defines the quadratic form on X.
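For the equality-constrained case the necessary optimality (KKT) conditions form a linear system coupling the primal variable x and the Lagrange multiplier. The following is a minimal numerical sketch (the matrices and vectors are illustrative choices, not data from the text):

import numpy as np

# Equality-constrained QP: min (1/2) x^t A x - a^t x  subject to  E x = b.
# The KKT system couples x and the multiplier mu:
#   [ A  E^t ] [ x  ]   [ a ]
#   [ E  0   ] [ mu ] = [ b ]
A = np.array([[2.0, 0.0], [0.0, 4.0]])   # symmetric positive definite (illustrative)
a = np.array([1.0, 1.0])
E = np.array([[1.0, 1.0]])               # one equality constraint x1 + x2 = 1
b = np.array([1.0])

n, m = A.shape[0], E.shape[0]
KKT = np.block([[A, E.T], [E, np.zeros((m, m))]])
sol = np.linalg.solve(KKT, np.concatenate([a, b]))
x, mu = sol[:n], sol[n:]
print("x =", x, "multiplier =", mu)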

Example 2 (Linear programming) The linear programming minimizes (c, x) subject to the linear constraints Ax = b and x ≥ 0. Its dual problem is to maximize (b, y) subject to A∗y ≤ c. It has many important classes of applications. For example, the optimal transport problem is to find the joint distribution function µ(x, y) ≥ 0 on X × Y that transports the probability density p_0(x) to p_1(y), i.e.,

∫_Y µ(x, y) dy = p_0(x),  ∫_X µ(x, y) dx = p_1(y),

with minimum energy

∫_{X×Y} c(x, y)µ(x, y) dxdy = (c, µ).
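On a finite grid the optimal transport problem is exactly a finite-dimensional linear program in the entries of µ. A minimal sketch (the grid, densities and cost are illustrative choices, not from the text):

import numpy as np
from scipy.optimize import linprog

n = 20
x = np.linspace(0.0, 1.0, n)
p0 = np.exp(-50 * (x - 0.3) ** 2); p0 /= p0.sum()    # source density (illustrative)
p1 = np.exp(-50 * (x - 0.7) ** 2); p1 /= p1.sum()    # target density (illustrative)
C = (x[:, None] - x[None, :]) ** 2                    # transport cost c(x, y)

# Marginal constraints: sum_j mu_ij = p0_i and sum_i mu_ij = p1_j.
A_eq = np.zeros((2 * n, n * n))
for i in range(n):
    A_eq[i, i * n:(i + 1) * n] = 1.0                  # row sums of mu
    A_eq[n + i, i::n] = 1.0                           # column sums of mu
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([p0, p1]),
              bounds=(0, None))
print("optimal transport cost:", res.fun)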

Example 3 (Integer programming) Let F be a continuous function on R^n. The integer programming problem is to minimize F(x) over the integer variables x, i.e.,

min F(x) subject to x ∈ {0, 1, ···, N}^n.

The binary optimization minimizes F(x) over the binary variables x ∈ {0, 1}^n. The network optimization problem is one of the important applications.

Example 4 (Obstacle problem) Consider the variational problem:

min J(u) = ∫_0^1 ( (1/2)|du/dx(x)|^2 − f(x)u(x) ) dx  (1.1)

subject to u(x) ≤ ψ(x)

over all displacement functions u ∈ H_0^1(0, 1), where the function ψ represents the upper bound (obstacle) for the deformation. The functional J represents the Newtonian deformation energy and f(x) is the applied force.

Example 5 (Function Interpolation) Consider the interpolation problem:

min ∫_0^1 (1/2)|d^2u/dx^2(x)|^2 dx over all functions u,  (1.2)

subject to the interpolation conditions

u(x_i) = b_i,  du/dx(x_i) = c_i,  1 ≤ i ≤ m.
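Since the interpolation conditions fix both the value and the slope at every node, the minimization of the bending energy decouples over the subintervals, and on each subinterval the Euler-Lagrange equation u'''' = 0 holds, so the minimizer is the piecewise cubic Hermite interpolant. A quick numerical sketch (the data values are illustrative choices):

import numpy as np
from scipy.interpolate import CubicHermiteSpline

# Interpolation data (illustrative): u(x_i) = b_i, u'(x_i) = c_i.
xi = np.array([0.0, 0.4, 1.0])
bi = np.array([0.0, 1.0, 0.5])
ci = np.array([1.0, 0.0, -1.0])

# On each subinterval the minimizer of the bending energy with value and
# slope fixed at both ends is the cubic Hermite polynomial.
u = CubicHermiteSpline(xi, bi, ci)
t = np.linspace(0.0, 1.0, 5)
print(u(t))
print(u.derivative()(t))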

Example 6 (Image/Signal Analysis) Consider the deconvolution problem of finding the image u that satisfies (Ku)(x) = f(x) for the convolution operator K,

(Ku)(x) = ∫_Ω k(x, y)u(y) dy,

over a domain Ω. It is in general ill-posed and we use the (Tikhonov) regularization formulation to obtain a stable deconvolution solution u, i.e., for Ω = [0, 1]

min ∫_0^1 ( |Ku − f|^2 + (α/2)|du/dx(x)|^2 + β u(x) ) dx over all possible images u ≥ 0,  (1.3)

where α, β ≥ 0 are the regularization parameters. The first regularization term is the quadratic variation of the image u. Since |u| = u when u ≥ 0, the second regularization term is the L^1 sparsity term. The two regularization parameters are chosen so that the variation of the image u is regularized.
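A minimal discrete sketch of the regularized deconvolution (1.3) by projected gradient descent (the kernel, data, parameters and grid scaling are illustrative choices, not from the text):

import numpy as np

n = 100
x = np.linspace(0.0, 1.0, n)
K = np.exp(-200 * (x[:, None] - x[None, :]) ** 2)
K /= K.sum(axis=1, keepdims=True)                     # illustrative smoothing kernel
u_true = (np.abs(x - 0.5) < 0.15).astype(float)       # illustrative image
f = K @ u_true
D = (np.eye(n, k=1) - np.eye(n)) * n                  # forward difference d/dx
alpha, beta, step = 1e-4, 1e-3, 0.1

u = np.zeros(n)
for _ in range(1000):                                 # projected gradient descent
    grad = 2 * K.T @ (K @ u - f) + alpha * D.T @ D @ u + beta
    u = np.maximum(0.0, u - step * grad)              # projection keeps u >= 0
print("residual:", np.linalg.norm(K @ u - f))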

Example 7 (L0 optimization) Problems of optimal resource allocation, material distribution and optimal time scheduling can be formulated as

min F(u) + ∫_Ω h(u(ω)) dω  (1.4)

with h(u) = (α/2)|u|^2 + β|u|_0, where |u|_0 = 0 if u = 0 and |u|_0 = 1 otherwise.

Since

∫_Ω |u(ω)|_0 dω = vol({u(ω) ≠ 0}),

the solution to (1.4) provides the optimal distribution of u minimizing the performance F(u) with an appropriate choice of the (regularization) parameters α, β > 0.
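As a worked scalar instance (an illustration, not the general theory developed later): minimizing h(u) − gu = (α/2)|u|^2 − gu + β|u|_0 over u ∈ R gives the candidate u = g/α with value β − g^2/(2α) when u ≠ 0, versus the value 0 at u = 0. Hence the minimizer is the hard threshold

u* = g/α if g^2 > 2αβ,  u* = 0 otherwise,

which shows how the parameters α, β select the support {u ≠ 0}.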

min −log p(y|x) + β p(x) over all admissible parameters x,  (1.5)

where p(y|x) is the conditional probability density of the observation y given the parameter x and p(x) is derived from the prior probability density for the parameter x. It is the maximum a posteriori estimation of the parameter x given the observation y. In the case of inverse problems it has the form

φ(x|y) + α p(x)  (1.6)

where x is in a function space X, φ(x|y) is the fidelity and p is a regularization semi-norm on X. The constant α > 0 is the Tikhonov regularization parameter.

Example 9 (Optimal Control Problem) Consider the constrained optimization of the form

min F(x) + H(u) subject to E(x, u) = 0 and u ∈ C,

where x ∈ X and u ∈ U are the state and control variables in Hilbert spaces X and U, respectively. The equality constraint E represents the model equations for x = x(u) ∈ X as a function of u ∈ C, a closed convex set in U. The separable cost functionals are appropriately selected to achieve a specific performance. Such optimization problems

arise in optimal control problems, inverse medium problems and optimal design problems.

Example 10 (Machine Learning) Consider the constrained optimization of the form

L(p, q) =

Multi-dimensional versions of these examples and more illustrated examples will be discussed and analyzed throughout the monograph.

Discretization An important aspect of the function space optimization is the approximation method and theory. For example, let B_i^n(x) be the linear B-spline defined by

B_i(x) = (x − x_{i−1})/(x_i − x_{i−1}) on [x_{i−1}, x_i],
B_i(x) = (x − x_{i+1})/(x_i − x_{i+1}) on [x_i, x_{i+1}],  (1.7)
B_i(x) = 0 otherwise,

and let

u^n(x) = Σ_{i=1}^{n−1} u_i B_i(x) ∼ u(x).  (1.8)

Then,

(d/dx) u^n = (u_i − u_{i−1})/(x_i − x_{i−1}),  x ∈ (x_{i−1}, x_i).

Let x_i = i/n and ∆x = 1/n; then one can define the discretized problem of Example 4:

min Σ_{i=1}^n ( (1/2)|(u_i − u_{i−1})/∆x|^2 − f_i u_i ) ∆x

subject to u_0 = u_n = 0, u_i ≤ ψ_i = ψ(x_i),

where f_i = f(x_i). The discretized problem is a quadratic programming problem for ū ∈ R^{n−1} with

A = (1/∆x) tridiag(−1, 2, −1) ∈ R^{(n−1)×(n−1)},  b = ∆x (f_1, f_2, ···, f_{n−1})^t.

Thus, we will discuss the convergence analysis of the approximate solutions {u^n} to the solution u as n → ∞ in an appropriate function space.
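The following is a minimal finite-dimensional sketch of a primal-dual active set iteration for this discretized obstacle problem (the force, obstacle and grid are illustrative choices; the general theory is developed later in the monograph):

import numpy as np

n = 50
dx = 1.0 / n
x = np.linspace(dx, 1.0 - dx, n - 1)                  # interior grid points
A = (np.diag(2 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / dx
b = dx * 10.0 * np.ones(n - 1)                        # f = 10 (illustrative)
psi = 0.05 + 0.5 * (x - 0.5) ** 2                     # obstacle (illustrative)

# KKT system: A u + mu = b, mu >= 0, u <= psi, mu (u - psi) = 0,
# with the complementarity written as mu = max(0, mu + c (u - psi)).
u, mu, c = np.zeros(n - 1), np.zeros(n - 1), 1.0
active = np.zeros(n - 1, dtype=bool)
for _ in range(20):
    active = mu + c * (u - psi) > 0                   # estimated contact set
    inact = ~active
    u = psi.copy()                                    # u = psi on the active set
    u[inact] = np.linalg.solve(A[np.ix_(inact, inact)],
                               b[inact] - A[np.ix_(inact, active)] @ psi[active])
    mu = np.zeros(n - 1)
    mu[active] = (b - A @ u)[active]                  # multiplier on the active set
    if np.array_equal(mu + c * (u - psi) > 0, active):
        break                                          # active set has settled
print("contact points:", int(active.sum()))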

In general, one has the optimization of the form

min F_0(x) + F_1(x) over x ∈ X subject to E(x) ∈ K,

where X is a Banach space, F_0 is C^1, F_1 is a nonsmooth functional and E(x) ∈ K represents the constraint conditions, where K is a closed convex cone. For the existence of a minimizer we require a coercivity condition, lower-semicontinuity and the continuity of the constraint in a weak or weak star topology of X. For the necessary optimality condition, we develop a generalized Lagrange multiplier theory:

F_0′(x*) + λ + E′(x*)*µ = 0,  E(x*) ∈ K,

C_1(x*, λ) = 0,  (1.9)

C_2(x*, µ) = 0,

where λ is the Lagrange multiplier for a relaxed derivative of F_1 and µ is the Lagrange multiplier for the constraint E(x) ∈ K. We derive the complementarity conditions C_1 and C_2 so that (1.9) is a complete set of equations for determining (x*, λ, µ).
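For instance, for a pointwise constraint u ≤ ψ with multiplier µ, a standard form of the complementarity condition (a special case, not the general C_2 derived later) is

µ = max(0, µ + c(u − ψ)) for any fixed c > 0,

which is equivalent to µ ≥ 0, u ≤ ψ, µ(u − ψ) = 0, and is the basis of the primal-dual active set and semi-smooth Newton methods mentioned above.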

We discuss the solution method for linear and nonlinear equations in Banach spaces of the form

f ∈ A(x)  (1.10)

where f ∈ X*, the dual space of a Banach space X, and A is a pseudo-monotone mapping X → X*, and the Cauchy problem

(d/dt) x(t) ∈ A(t)x(t) + f(t),  x(0) = x_0,  (1.11)

where A(t) is an m-dissipative mapping in X × X. Also, we note that the necessary optimality system (1.9) is a nonlinear operator equation and we use the nonlinear operator theory to analyze solutions to (1.9). We also discuss the PDE constrained optimization problems, in which f and f(t) are functions of control, design and medium parameters. We minimize a proper performance index defined for the state and control functions subject to the constraint (1.10) or (1.11). Thus, we combine the analysis of the PDE theory and the optimization theory to develop a comprehensive treatment and analysis of the PDE constrained optimization problems.

The outline of the monograph is as follows. In Section 2 we introduce the basic function space and functional analysis tools needed for our analysis. It provides an introduction to functional analysis and real analysis and their basic concepts with illustrated examples: for example, the dual space of a normed space, the Hahn-Banach theorem, the open mapping theorem, the closed range theorem, distribution theory and weak solutions to PDEs, and spectral theory.

In Section 3 we develop the Lagrange multiplier method for a general class of (smooth) constrained optimization problems in Banach spaces. The Hilbert space theory is introduced first and the Lagrange multiplier is defined as the limit of the penalized problem. The constraint qualification condition is introduced and we derive the (normal) necessary optimality system for the primal and dual variables. We discuss the minimum norm problem and the duality principle in Banach spaces.

In Section 4 we present the linear operator theory and C0-semigroup theory on Banach spaces. In Section 5 we develop the monotone operator theory and pseudo-monotone operator theory for nonlinear equations in Banach spaces and the nonlinear semigroup theory in Banach spaces, as well as the fixed point theory for nonlinear equations in Banach spaces. In Section 7 we develop the basic convex analysis tools and the Lagrange multiplier theory for a general class of convex non-smooth optimization. We introduce and analyze the augmented Lagrangian method for constrained optimization and non-smooth optimization problems.

In Section 8 we develop the Lagrange multiplier theory for a general class of non-smooth and non-convex optimizations.

In this course we discuss the well-posedness of evolution equations in Banach spaces. Such problems arise in PDE dynamics and functional equations. We develop the linear and nonlinear theory for the corresponding solution semigroups. The lectures include, for example, the Hille-Yosida theory and Lumer-Phillips theory for linear semigroups, the Crandall-Liggett theory for nonlinear contractive semigroups and the Crandall-Pazy theory for nonlinear evolution equations. Especially, (numerical) approximation theory for PDE solutions is discussed based on the Trotter-Kato theory, the Takahashi-Oharu theory, the Chernoff theory and the operator splitting method. The theory and its applications are examined and demonstrated using many motivated PDE examples including linear dynamics (e.g. heat, wave and hyperbolic equations) and nonlinear dynamics (e.g. nonlinear diffusion, conservation laws, Hamilton-Jacobi and Navier-Stokes equations). A new class of PDE examples is formulated and the detailed applications of the theory are carried out. The lectures also cover the basic elliptic theory via the Lax-Milgram theorem, the Minty-Browder theory and convex optimization. Functional analytic methods are also introduced for the basic PDE theory.

The students are expected to have basic knowledge in real and functional analysis and PDEs. Lecture notes will be provided. Reference book: "Evolution Equations and Approximation", K. Ito and F. Kappel, World Scientific.

2 Basic Function Space Theory

In this section we discuss the basic theory and concepts of functional analysis and introduce the function spaces which are essential to our function space analysis of a general class of optimizations. We refer to the more comprehensive treatise and discussion of functional analysis in, e.g., Yosida [].

2.1 Complete Normed Space

In this section we introduce the vector space, the normed space and the Banach space.

Definition (Vector Space) A vector space over the scalar field K is a set X, whose elements are called vectors, and in which two operations, addition and scalar multiplication, are defined, with the following familiar algebraic properties.
(a) To every pair of vectors x and y corresponds a vector x + y, in such a way that

x + y = y + x and x + (y + z) = (x + y) + z;

X contains a unique vector 0 (the zero vector) such that x + 0 = x for every x ∈ X, and to each x ∈ X corresponds a unique inverse vector −x such that x + (−x) = 0.
(b) To every pair (α, x) with α ∈ K and x ∈ X corresponds a vector αx, in such a way that

1x = x,  α(βx) = (αβ)x,

and such that the two distributive laws

α(x + y) = αx + αy,  (α + β)x = αx + βx

hold. The symbol 0 will of course also be used for the zero element of the scalar field. It is easy to see that the additive inverse element −x satisfies −x = (−1)x, since 0 = 0x = (1 − 1)x = x + (−1)x for each x ∈ X. In what follows we consider vector spaces only over the field R or the field C.

Definition (Subspace, Basis) (1) A subset S of X is a (linear) subspace of a vector space X if αx_1 + βx_2 ∈ S for all x_1, x_2 ∈ S and α, β ∈ K.
(2) A family of vectors {x_i}_{i=1}^n is linearly independent if Σ_{i=1}^n α_i x_i = 0 if and only if α_i = 0 for all 1 ≤ i ≤ n. The dimension dim(X) is the largest number of linearly independent vectors in X.
(3) A basis {e_α} of a vector space X is a set of linearly independent vectors that, in linear combinations, can represent every vector in X, i.e., x = Σ_α a_α e_α.

Example (Vector Space) (1) Let X = C[0, 1] be the vector space of continuous functions f on [0, 1]. X is a vector space with

(α f + β g)(t) = α f(t) + β g(t), t ∈ [0, 1]

S = the space of polynomials of order less than n is a linear subspace of X and dim(S) = n. The family {t^k}_{k=0}^∞ of monomials is linearly independent, and thus dim(X) = ∞.
(2) The vector space of real number sequences x = (x_1, x_2, x_3, ···) is a vector space with

(αx + βy)_k = αx_k + βy_k, k ∈ N. The family of unit vectors e_k = (0, ···, 1, 0, ···) (with 1 in the k-th entry), k ∈ N, is a basis.

(3) Let X be the vector space of square integrable functions on (0, 1). Let e_k(x) = √2 sin(kπx), 0 ≤ x ≤ 1. Then {e_k} is an orthonormal basis, i.e.,

(e_k, e_j) = ∫_0^1 e_k(x)e_j(x) dx = δ_{k,j},

and

f(x) = Σ_{k=1}^∞ f_k e_k(x), the Fourier sine series,

where f_k = ∫_0^1 f(x)e_k(x) dx. Note that e_{n+1} is not linearly dependent on (e_1, ···, e_n): if it were, e_{n+1} = Σ_{i=1}^n β_i e_i, but β_i = (e_{n+1}, e_i) = 0, which is a contradiction.
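A quick numerical illustration of the sine expansion (the test function f(x) = x(1 − x) and the quadrature are illustrative choices):

import numpy as np

# Fourier sine coefficients f_k = integral_0^1 f(x) e_k(x) dx, e_k = sqrt(2) sin(k pi x),
# approximated by a midpoint-rule Riemann sum on a fine grid.
n = 4000
x = (np.arange(n) + 0.5) / n
dx = 1.0 / n
f = x * (1.0 - x)
K = 50
e = np.sqrt(2.0) * np.sin(np.pi * np.outer(np.arange(1, K + 1), x))  # rows e_k
fk = e @ f * dx                        # coefficients f_k
partial = fk @ e                       # partial sum of the sine series
print("L2 error of the partial sum:", np.sqrt(((f - partial) ** 2).sum() * dx))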

Definition (Normed Space) A norm |·| on a vector space X is a real-valued functional on X which satisfies |x| ≥ 0 for all x ∈ X with equality if and only if x = 0,

|αx| = |α||x| for every x ∈ X and α ∈ K,

|x + y| ≤ |x| + |y| (triangle inequality) for every x, y ∈ X.

A normed space is a vector space X which is equipped with a norm |·| and will be denoted by (X, |·|). The norm of a normed space X will be denoted by |·|_X or simply by |·| whenever the underlying space X is understood from the context.

Example (Normed space) Let X = C[0, 1] be the vector space of continuous functions f on [0, 1]. The following are two normed spaces based on X: X_1 is equipped with the sup norm

|f|_{X_1} = max_{x∈[0,1]} |f(x)|,

and X_2 is equipped with the L^2 norm

|f|_{X_2} = ( ∫_0^1 |f(x)|^2 dx )^{1/2}.

Definition (Banach Space) Let X be a normed space.
(1) A sequence {x_n} in a normed space X converges to the limit x* ∈ X if and only if lim_{n→∞} |x_n − x*| = 0 in R.
(2) A subspace S of a normed space X is said to be dense in X if each x ∈ X is the limit of a sequence of elements in S.
(3) A sequence {x_n} in a normed space X is called a Cauchy sequence if and only if lim_{m,n→∞} |x_m − x_n| = 0, i.e., for all ε > 0 there exists N such that |x_m − x_n| < ε for all m, n ≥ N. If every Cauchy sequence in X converges to a limit in X, then X is complete and X is called a Banach space.

A Cauchy sequence {x_n} in a normed space X has a unique limit if it has an accumulation point. In fact, for any limit x*

lim_{n→∞} |x* − x_n| = lim_{n→∞} lim_{m→∞} |x_m − x_n| = 0.

Example (continued) Two norms |x|1 and |x|2 on a vector space X are equivalent if there exist 0 < c ≤ c¯ < ∞ such that

c|x|_1 ≤ |x|_2 ≤ c̄|x|_1 for all x ∈ X.

(1) All norms on R^n are equivalent. But for X = C[0, 1] consider x^n(t) = t^n. Then |x^n|_{X_1} = 1 but |x^n|_{X_2} = (1/(2n+1))^{1/2} for all n. Thus, |x^n|_{X_2} → 0 as n → ∞ and the two norms of X_1 and X_2 are not equivalent.
(2) Every bounded sequence in R^n has a convergent subsequence in R^n (Bolzano-Weierstrass theorem). Consider the sequence x_n(t) = t^n in X = C[0, 1]. Then x_n is not convergent in X_1 despite |x_n|_{X_1} = 1 being bounded.

Definition (Hilbert Space) If X is a vector space, a functional (·,·) defined on X × X is called an inner product on X provided that for every x, y, z ∈ X and α, β ∈ C

(x, y) = \overline{(y, x)},

(α x + β y, z) = α (x, z) + β (y, z),

(x, x) ≥ 0, and (x, x) = 0 if and only if x = 0,

where c̄ denotes the complex conjugate of c ∈ C. Given such an inner product, a norm on X can be defined by

|x| = (x, x)^{1/2}.

A vector space (X, (·, ·)) equipped with an inner product is called a pre-Hilbert space. If X is complete with respect to the induced norm, then it is called a Hilbert space. Every inner product satisfies the Cauchy-Schwarz inequality

|(x, y)| ≤ |x| |y| for x, y ∈ X

In fact, for all t ∈ R,

0 ≤ |x + t(x, y)y|^2 = |x|^2 + 2t|(x, y)|^2 + t^2|(x, y)|^2|y|^2.

Since this quadratic in t is nonnegative, its discriminant is nonpositive, so we must have |(x, y)|^2 − |x|^2|y|^2 ≤ 0, which shows the Cauchy-Schwarz inequality.

Example (Hilbert space) (1) R^n with inner product

(x, y) = Σ_{k=1}^n x_k y_k for x, y ∈ R^n.

(2) Consider the space ℓ^2 of square summable real number sequences x = (x_1, x_2, ···) and define the inner product

(x, y) = Σ_{k=1}^∞ x_k y_k for x, y ∈ ℓ^2.

Then ℓ^2 is a Hilbert space. In fact, let {x^n} be a Cauchy sequence in ℓ^2. Since

|x_k^n − x_k^m| ≤ |x^n − x^m|_{ℓ^2},

{x_k^n} is a Cauchy sequence in R for every k. Thus, one can define a sequence x by x_k = lim_{n→∞} x_k^n. Since

lim_{m→∞} |x^m − x^n|_{ℓ^2} = |x − x^n|_{ℓ^2}

for each fixed n, x ∈ ℓ^2 and |x^n − x|_{ℓ^2} → 0 as n → ∞.
(3) Consider the space X_2 of continuously differentiable functions on [0, 1] vanishing at x = 0, x = 1 with inner product

(f, g) = ∫_0^1 (df/dx)(dg/dx) dx for f, g ∈ X_2.

Then X_2 is a pre-Hilbert space. Every normed space X is either a Banach space or a dense subspace of a Banach space Y whose norm |x|_Y satisfies

|x|Y = |x| for every x ∈ X.

In this latter case Y is called the completion of X. The completion of a normed space proceeds as in Cantor’s Construction of real numbers from rational numbers.

Completion The set of Cauchy sequences {x_n} of the normed space X can be classified according to the equivalence {x_n} ∼ {y_n} if lim_{n→∞} |x_n − y_n| = 0. Since ||x_n| − |x_m|| ≤ |x_n − x_m| and R is complete, it follows that lim_{n→∞} |x_n| exists. We denote by {x_n}′ the class containing {x_n}. Then the set Y of all such equivalence classes x̃ = {x_n}′ is a vector space with

{x_n}′ + {y_n}′ = {x_n + y_n}′,  α{x_n}′ = {αx_n}′,

and we define

|{x_n}′|_Y = lim_{n→∞} |x_n|.

It is easy to see that these definitions of the vector sum, scalar multiplication and norm do not depend on the particular representations for the classes {x_n}′ and {y_n}′. For example, if {x_n} ∼ {y_n}, then lim_{n→∞} ||x_n| − |y_n|| ≤ lim_{n→∞} |x_n − y_n| = 0, and thus lim_{n→∞} |x_n| = lim_{n→∞} |y_n|. Since |{x_n}′|_Y = 0 implies that lim_{n→∞} |x_n − 0| = 0, we have {x_n}′ = {0}′ and thus |·|_Y is a norm. Since an element x ∈ X corresponds to the Cauchy sequence

x̄ = {x, x, . . . , x, . . . }′ ∈ Y,

X is naturally contained in Y.

To prove the completeness of Y, let {x̃_k} = {{x_n^{(k)}}′} be a Cauchy sequence in Y. Then for each k, we can choose an integer n_k such that

|x_m^{(k)} − x_{n_k}^{(k)}|_X ≤ k^{−1} for m > n_k.

We show that the sequence {x̃_k} converges to the class containing the Cauchy sequence {x_{n_k}^{(k)}} of X. To this end, note that

|x̃_k − x̄_{n_k}^{(k)}|_Y = lim_{m→∞} |x_m^{(k)} − x_{n_k}^{(k)}|_X ≤ k^{−1},

where x̄_{n_k}^{(k)} denotes the class of the constant sequence with value x_{n_k}^{(k)}. Since

|x_{n_k}^{(k)} − x_{n_m}^{(m)}|_X = |x̄_{n_k}^{(k)} − x̄_{n_m}^{(m)}|_Y ≤ |x̄_{n_k}^{(k)} − x̃_k|_Y + |x̃_k − x̃_m|_Y + |x̃_m − x̄_{n_m}^{(m)}|_Y

≤ k^{−1} + m^{−1} + |x̃_k − x̃_m|_Y,

{x_{n_k}^{(k)}} is a Cauchy sequence of X. Let x̃ = {x_{n_k}^{(k)}}′. Then,

|x̃ − x̃_k|_Y ≤ |x̃ − x̄_{n_k}^{(k)}|_Y + |x̄_{n_k}^{(k)} − x̃_k|_Y ≤ |x̃ − x̄_{n_k}^{(k)}|_Y + k^{−1}.

Since, as shown above,

|x̃ − x̄_{n_k}^{(k)}|_Y = lim_{m→∞} |x_{n_m}^{(m)} − x_{n_k}^{(k)}|_X ≤ lim_{m→∞} |x̃_m − x̃_k|_Y + k^{−1},

we prove that lim_{k→∞} |x̃ − x̄_{n_k}^{(k)}|_Y = 0 and so lim_{k→∞} |x̃ − x̃_k|_Y = 0. Moreover, the above proof shows that the correspondence x ∈ X to x̄ ∈ Y is isomorphic and isometric and the image of X by this correspondence is dense in Y, that is, X is dense in Y. □

Example (Completion) Let X = C[0, 1] be the vector space of continuous functions f on [0, 1]. We consider the two normed spaces for X: X_1 is equipped with the sup norm

|f|_{X_1} = max_{x∈[0,1]} |f(x)|,

and X_2 is equipped with the L^2 norm

|f|_{X_2} = ( ∫_0^1 |f(x)|^2 dx )^{1/2}.

We claim that X_1 is complete but X_2 is not; the completion of X_2 is L^2(0, 1), the space of square integrable functions. First, we discuss X_1. Let {f_n} be a Cauchy sequence in X_1. Since

|f_n(x) − f_m(x)| ≤ |f_n − f_m|_{X_1},

{f_n(x)} is a Cauchy sequence in R for every x ∈ [0, 1]. Thus, one can define a function f by f(x) = lim_{n→∞} f_n(x). Since

|f(x) − f(x̂)| ≤ |f_n(x) − f_n(x̂)| + |f_n(x) − f(x)| + |f_n(x̂) − f(x̂)| and |f_n(x) − f(x)| → 0 uniformly in x ∈ [0, 1], the candidate f is continuous. Thus, X_1 is complete. Next, we discuss X_2. Define a Cauchy sequence {f_n} by

 1 1 1 1 1 1 2 + n (x − 2 ) on [ 2 − n , 2 + n ]  1 1 fn(x) = 0 on [0, 2 − n ]  1 1 1 on [ 2 + n , 1]

The candidate limit f in this case is defined by f(x) = 0 for x < 1/2, f(1/2) = 1/2 and f(x) = 1 for x > 1/2, and it is not in X. Consider the Cauchy sequences g_n and h_n in the equivalence class {f_n}:

g_n(x) = 0 on [0, 1/2 − 1/n],  g_n(x) = 1 + n(x − 1/2) on [1/2 − 1/n, 1/2],  g_n(x) = 1 on [1/2, 1],

h_n(x) = 0 on [0, 1/2],  h_n(x) = n(x − 1/2) on [1/2, 1/2 + 1/n],  h_n(x) = 1 on [1/2 + 1/n, 1].

Thus, the candidate limits g (g(x) = 0 for x < 1/2, g(x) = 1 for x ≥ 1/2) and h (h(x) = 0 for x ≤ 1/2, h(x) = 1 for x > 1/2) belong to the same equivalence class as f = {f_n}. Note that f, g and h differ only at x = 1/2. In general, functions in an equivalence class differ on a set of measure zero. The completion Y of X_2 is denoted by

L^2(0, 1) = the space of square integrable measurable functions on (0, 1)

with

|f|^2_{L^2} = ∫_0^1 |f(x)|^2 dx.

Here, the integral is defined in the sense of Lebesgue as discussed in the next section. The concept of completion states that for every f ∈ Y = X̄ (the completion of X) there exists a sequence f_n in X such that |f_n − f|_Y → 0 as n → ∞ and |f|_Y = lim_{n→∞} |f_n|_X.

For the example X_2, for every square integrable function f ∈ L^2(0, 1) = Y = X̄_2 there exists a sequence of continuous functions f_n ∈ X_2 such that |f − f_n|_{L^2(0,1)} → 0.

Example (H_0^1(0, 1)) The completion of the pre-Hilbert space of continuously differentiable functions on [0, 1] vanishing at x = 0, x = 1 with respect to the H_0^1 inner product

(f, g)_{H_0^1} = ∫_0^1 (df/dx)(x)(dg/dx)(x) dx

is denoted by H_0^1(0, 1). It can be proved that

H_0^1(0, 1) = the space of absolutely continuous functions on [0, 1] with square integrable derivative and vanishing at x = 0, x = 1,

with the same inner product, where the integral is defined in the sense of Lebesgue. Here, f is an absolutely continuous function if there exists an integrable function g such that

f(x) = f(0) + ∫_0^x g(s) ds,  x ∈ [0, 1],

and then f is almost everywhere (a.e.) differentiable with (d/dx)f = g a.e. in (0, 1). The linear spline functions B_k^n(x), 1 ≤ k ≤ n − 1, belong to H_0^1(0, 1).

2.2 Measure Theory

In this section we discuss the basic measure theory.

Definition (1) A topological space is a set E together with a collection τ of subsets of E, called open sets, satisfying the following axioms: the empty set and E itself are open, any union of open sets is open, and the intersection of any finite number of open sets is open. The collection τ of open sets is then also called a topology on E. We assume that a topological space (E, τ) satisfies the Hausdorff axiom of separation:

for every distinct x_1, x_2 ∈ E there exist disjoint open sets G_1, G_2 such that x_1 ∈ G_1, x_2 ∈ G_2.

(2) A metric space E is a topological space with metric d if for all x, y, z ∈ E

d(x, y) ≥ 0 for x, y ∈ E, and d(x, y) = 0 if and only if x = y,

d(x, y) = d(y, x)

d(x, z) ≤ d(x, y) + d(y, z).

For x_0 ∈ E and r > 0, B(x_0, r) = {x ∈ E : d(x, x_0) < r} is the open ball of E with center x_0 and radius r. A set O is called open if for every point x ∈ O there exists an open ball B(x, r) contained in the set O.

(3) A collection of subsets F of E is a σ-algebra if

for all A ∈ F,Ac = E \ A ∈ F

for an arbitrary family {A_i} in F, the countable union ∪_{i=1}^∞ A_i ∈ F.

(4) A measure µ on (E, F) assigns to each A ∈ F a nonnegative real value µ(A) and satisfies the σ-additivity

µ(∪_{i=1}^∞ A_i) = Σ_{i=1}^∞ µ(A_i)

for all disjoint sets {A_i} in F.

Theorem (Monotone Convergence) A finitely additive measure µ is σ-additive if and only if for every nondecreasing sequence {A_k} in F with A = ∪_{k≥1} A_k we have lim_{n→∞} µ(A_n) = µ(A).

Proof: Since {A_k} is nondecreasing, A is the disjoint union

A = A_1 + (A_2 \ A_1) + (A_3 \ A_2) + ···,

and thus by σ-additivity

µ(A) = µ(A_1) + (µ(A_2) − µ(A_1)) + (µ(A_3) − µ(A_2)) + ··· = lim_{n→∞} µ(A_n).

Conversely, let A_1, A_2, ··· ∈ F be pairwise disjoint with Σ_{k=1}^∞ A_k ∈ F. The sets B_n = Σ_{k=1}^n A_k are nondecreasing with union Σ_{k=1}^∞ A_k, and thus by the assumption and finite additivity

µ(Σ_{k=1}^∞ A_k) = lim_{n→∞} µ(B_n) = lim_{n→∞} Σ_{k=1}^n µ(A_k) = Σ_{k=1}^∞ µ(A_k). □

Example (Metric space) A normed space X is a metric space with d(x, y) = |x − y|. A point x ∈ E is the accumulation point of a sequence {x_n} if d(x_n, x) → 0 as n → ∞. The closure Ā of a set A is the collection of all accumulation points of A. The boundary of a set A is the collection of points x such that every open ball at x contains both points of A and of A^c. A set B in E is closed if B = B̄, i.e., every convergent sequence {x_n} in B has its limit in B. The complement A^c of an open set A is closed.

Examples (σ-algebra) There are the trivial σ-algebras

F_0 = {Ω, ∅},  F* = all subsets of Ω.

Let A be a subset of Ω; the σ-algebra generated by A is

F_A = {Ω, ∅, A, A^c}.

A finite set of subsets A_1, A_2, ···, A_n of Ω which are pairwise disjoint and whose union is Ω is called a partition of Ω. It generates the σ-algebra A = {A = ∪_{j∈J} A_j}, where J runs over all subsets of {1, ···, n}. This σ-algebra has 2^n elements. Every finite σ-algebra is of this form. The smallest nonempty elements {A_1, ···, A_n} of this algebra are called atoms.

Example (Countable measure) Let Ω have a countable decomposition {D_k}, i.e.,

Ω = Σ_k D_k,  D_i ∩ D_j = ∅, i ≠ j.

Let F = F* and P(D_k) = α_k > 0 with Σ_k α_k = 1. For the Poisson measure on the nonnegative integers,

D_k = {k},  m(D_k) = e^{−λ} λ^k / k!

for λ > 0.

Definition For any family C of subsets of E, we can define the σ-algebra σ(C) as the smallest σ-algebra A which contains C. The σ-algebra σ(C) is the intersection of all σ-algebras which contain C; it is again a σ-algebra, i.e.,

A = ∩_α A_α,

where the A_α are all the σ-algebras that contain C. The following construction of the measure space and the measure is essential for the triple (E, F, m) on an uncountable space E. If (E, O) is a topological space, where O is the set of open sets in E, then the σ-algebra B(E) generated by O is called the Borel σ-algebra of the topological space E, (E, B(E)) defines the measurable space, and a set B in B(E) is called a Borel set. For example, (R^n, B(R^n)) is the measurable space where B(R^n) is the σ-algebra generated by the open balls in R^n.

Caratheodory Theorem Let B = σ(A), the smallest σ-algebra containing an algebra A of subsets of E, and let µ_0 be a σ-additive measure on (E, A). Then there exists a unique measure µ on (E, B) which is an extension of µ_0, i.e., µ(A) = µ_0(A), A ∈ A.

Definition A map f from a measurable space (X, A) to another measurable space (Y, B) is called measurable if f^{−1}(B) = {x ∈ X : f(x) ∈ B} ∈ A for all B ∈ B. If f : (R^n, B(R^n)) → (R^m, B(R^m)) is measurable, we say f is a Borel function. In particular, every continuous function R^n → R^m is a Borel function since the inverse images of open sets in R^m are open in R^n.

Example (Measure space) Let Ω = R and B(R) be the Borel σ-algebra. Note that

(a, b] = ∩_n (a, b + 1/n),  [a, b] = ∩_n (a − 1/n, b + 1/n) ∈ B(R).

Thus, B(R) coincides with the σ-algebra generated by the semi-closed intervals. Let A be the algebra of finite disjoint sums of semi-closed intervals (a_i, b_i] and define µ_0 by

µ_0(Σ_{k=1}^n (a_k, b_k]) = Σ_{k=1}^n (F(b_k) − F(a_k)),

where x ∈ R → F(x) is nondecreasing and right continuous and the left limit exists everywhere. We obtain the measure µ on (R, B(R)) by the Caratheodory Theorem if µ_0 is σ-additive on A.

We now prove that µ_0 is σ-additive on A. By the monotone convergence theorem it suffices to prove that

µ_0(A_n) ↓ 0 for A_n ↓ ∅, A_n ∈ A.

Without loss of generality one can assume that A_n ⊂ [−N, N]. Since F is right continuous, for each A_n and ε > 0 there exists a set B_n ∈ A with closure B̄_n ⊂ A_n such that

µ_0(A_n) − µ_0(B_n) ≤ ε 2^{−n}.

Since ∩_n B̄_n ⊂ ∩_n A_n = ∅, the collection of sets {[−N, N] \ B̄_n} is an open covering of the compact set [−N, N]. By the Heine-Borel theorem there exists a finite subcovering:

∪_{n=1}^{n_0} ([−N, N] \ B̄_n) = [−N, N],

and thus ∩_{n=1}^{n_0} B_n = ∅. Thus, since A_{n_0} ⊂ A_k for k ≤ n_0,

µ_0(A_{n_0}) = µ_0(A_{n_0} \ ∩_{k=1}^{n_0} B_k) ≤ µ_0(∪_{k=1}^{n_0} (A_k \ B_k)) ≤ Σ_{k=1}^{n_0} µ_0(A_k \ B_k) ≤ ε.

Since ε > 0 is arbitrary, µ_0(A_n) → 0 as n → ∞. □

Next, we define the integral of a measurable function f on (E, F, µ).

Definition (Elementary function) An elementary function f is defined by

f(x) = Σ_{k=1}^n x_k I_{A_k}(x),

where {A_k} is a partition of E, i.e., A_k ∈ F are disjoint and ∪ A_k = E. Then the integral of f is given by

∫_E f(x) dµ(x) = Σ_{k=1}^n x_k µ(A_k).

Theorem (Approximation) For every measurable function f ≥ 0 on (E, F) there exists a sequence of elementary functions {f_n} such that 0 ≤ f_n(x) ≤ f(x) and f_n(x) ↑ f(x) for all x ∈ E.

Proof: For n ≥ 1, define a sequence of elementary functions by

f_n(x) = Σ_{k=1}^{n2^n} ((k − 1)/2^n) I_{k,n}(x) + n I_{f(x)>n},

where I_{k,n} is the indicator function of the set {x : (k − 1)/2^n < f(x) ≤ k/2^n}. It is easy to verify that f_n(x) is monotonically nondecreasing and f_n(x) ≤ f(x), and thus f_n(x) → f(x) for all x ∈ E. □

Definition (Integral) For a nonnegative measurable function f on (E, F) we define the integral by

∫_E f(x) dµ(x) = lim_{n→∞} ∫_E f_n(x) dµ(x),

where the limit exists since ∫_E f_n(x) dµ(x) is a nondecreasing number sequence. Note that f = f^+ − f^− with f^+(x) = max(0, f(x)), f^−(x) = max(0, −f(x)). So, we can apply the above Theorem and Definition to f^+ and f^− and define

∫_E f(x) dµ(x) = ∫_E f^+(x) dµ(x) − ∫_E f^−(x) dµ(x).

If ∫_E f^+(x) dµ(x), ∫_E f^−(x) dµ(x) < ∞, then f is integrable and

∫_E |f(x)| dµ(x) = ∫_E f^+(x) dµ(x) + ∫_E f^−(x) dµ(x).
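A small numerical illustration of this dyadic approximation (the function and grid are illustrative choices; for f > 0 one can write f_n = min((⌈2^n f⌉ − 1)/2^n, n)):

import numpy as np

# Dyadic approximation of a nonnegative measurable function:
# f_n = (k-1)/2^n on {(k-1)/2^n < f <= k/2^n}, and f_n = n where f > n.
def f_n(f_vals, n):
    return np.where(f_vals > n, n, (np.ceil(f_vals * 2.0 ** n) - 1.0) / 2.0 ** n)

x = np.linspace(0.01, 1.0, 7)
f_vals = 1.0 / x                       # illustrative (unbounded) function
for n in [1, 2, 4, 8]:
    print(n, np.max(np.abs(np.minimum(f_vals, n) - f_n(f_vals, n))))
# The levels increase monotonically in n and converge to f pointwise.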

Corollary Let f be a measurable function on (R, B(R), µ) with µ((a, b]) = F(b) − F(a). Then we have, as n → ∞,

Σ_{k=1}^{n2^n} f((k − 1)/2^n)(F(k/2^n) − F((k − 1)/2^n)) + f(n)(1 − F(n)) → ∫_0^∞ f(x) dF(x),

which is the Lebesgue-Stieltjes integral with respect to the measure dF.

The space L of measurable functions on (E, F) is a metric space with the metric

d(f, g) = ∫_E |f − g|/(1 + |f − g|) dµ.

In fact, d(f, g) = 0 if and only if f = g almost everywhere, and since

|f + g|/(1 + |f + g|) ≤ |f|/(1 + |f|) + |g|/(1 + |g|),

d satisfies the triangle inequality.

Theorem (Completeness) (L, d) is a complete metric space.

Proof: Let {f_n} be a Cauchy sequence in (L, d). Select a subsequence f_{n_k} such that

Σ_{k=1}^∞ d(f_{n_k}, f_{n_{k+1}}) < ∞.

Then,

∫_E Σ_k |f_{n_k} − f_{n_{k+1}}|/(1 + |f_{n_k} − f_{n_{k+1}}|) dµ < ∞.

Since |f| ≤ 2|f|/(1 + |f|) for |f| ≤ 1,

Σ_{k=1}^∞ |f_{n_k} − f_{n_{k+1}}| < ∞ a.e.,

and {f_{n_k}} converges almost everywhere to some f ∈ L. Moreover, d(f_{n_k}, f) → 0, and since a Cauchy sequence with a convergent subsequence converges, (L, d) is complete. □

Definition (Convergence in Measure) A sequence {f_n} of measurable functions on (E, F, µ) converges to f in measure if

for every ε > 0,  lim_{n→∞} µ({x ∈ E : |f_n(x) − f(x)| ≥ ε}) = 0.

Theorem (Convergence in measure) {f_n} converges in measure to f if and only if {f_n} converges to f in the d-metric.

Proof: For f, g ∈ L,

d(f, g) = ∫_{|f−g|≥ε} |f − g|/(1 + |f − g|) dµ + ∫_{|f−g|<ε} |f − g|/(1 + |f − g|) dµ ≤ µ(|f − g| ≥ ε) + ε/(1 + ε)

holds for all ε > 0. Thus, if f_n converges in measure to f, then f_n converges to f in the d-metric. Conversely, since

d(f, g) ≥ (ε/(1 + ε)) µ(|f − g| ≥ ε),

if f_n converges to f in the d-metric, then f_n converges in measure to f. □

Corollary The completion of C(E) with respect to the L^p(E)-norm

( ∫_E |f|^p dµ )^{1/p}

is L^p(E) = {f ∈ L : ∫_E |f|^p dµ < ∞}.

2.3 Baire's Category Theory

In this section we discuss the uniform boundedness principle and the open mapping theorem. The underlying idea of the proofs of these theorems is the Baire theorem for complete metric spaces. It is concerned with the decomposition of a space as a union of subsets. For instance, we have the countable union

R^2 = ∪_{k,j} S_{k,j},  S_{k,j} = (k, k + 1] × (j, j + 1].

On the other hand, we have the uncountable union

R^2 = ∪_{α∈R} ℓ_α,  ℓ_α = {α} × R,

but ℓ_α, a one dimensional affine subspace of R^2, has no interior points. The question is: can we represent R^2 as a countable union of such sets? It turns out that the answer is no.

Definition (Category) Let (X, d) be a metric space. A subset E of X is called nowhere dense if its closure does not contain any open set. X is said to be of the first category if X is the union of a countable number of nowhere dense sets. Otherwise, X is said to be of the second category, i.e., if X = ∪_k E_k, then the closure of at least one of the E_k's has non-empty interior.

Baire-Hausdorff Theorem A complete metric space is of the second category.

Proof: Let {E_k} be a sequence of closed sets and assume ∪_n E_n = X in a complete metric space X. We will show that at least one of the E_k's has non-empty interior. Assume otherwise that no E_n contains an interior point. Let O_n = E_n^c; then O_n is open and dense in X for every n ≥ 1. Thus, O_1 contains a closed ball B_1 = {x : d(x, x_1) ≤ r_1} with r_1 ∈ (0, 1/2). Since O_2 is open and dense, there exists a closed ball B_2 = {x : d(x, x_2) ≤ r_2} with r_2 ∈ (0, 1/2^2) contained in B_1 ∩ O_2. By repeating the same process, there exists a sequence {B_k} of closed balls B_k = {x : d(x, x_k) ≤ r_k} such that

B_{k+1} ⊂ B_k,  0 < r_k ≤ 2^{−k},  B_k ∩ E_k = ∅.

The sequence {x_k} is a Cauchy sequence in X and, since X is complete, x* = lim x_n ∈ X exists. Since

d(x_n, x*) ≤ d(x_n, x_m) + d(x_m, x*) ≤ r_n + d(x_m, x*) → r_n

as m → ∞, x* ∈ B_n for every n, i.e., x* ∉ E_n for every n, and thus x* ∉ ∪_n E_n = X, which is a contradiction. □

Baire Category Theorem Let M be a set of the first category in a complete metric space X. Then the complement M^c is dense in X.

Theorem (Totally bounded) A subset M of a complete metric space X is relatively compact if and only if it is totally bounded, i.e., for every ε > 0 there exists a family of finitely many points {m_1, m_2, ···, m_n} in M such that for every point m ∈ M, d(m, m_k) < ε for at least one m_k.

Theorem (Banach-Steinhaus, uniform boundedness principle) Let X, Y be Banach spaces and let T_k, k ∈ K, be a family (not necessarily countable) of continuous linear operators from X to Y. Assume that

sup_{k∈K} |T_k x|_Y < ∞ for all x ∈ X.

Then there exists a constant c such that

|T_k x|_Y ≤ c|x|_X for all k ∈ K and x ∈ X.

Proof: Define

X_n = {x ∈ X : |T_k x| ≤ n for all k ∈ K}.

Then X_n is closed and by the assumption we have X = ∪_n X_n. It follows from the Baire category theorem that int(X_{n_0}) is not empty for some n_0; select x_0 ∈ X and r > 0 such that B(x_0, r) ⊂ int(X_{n_0}). We have

|T_k(x_0 + rz)| ≤ n_0 for all k and z ∈ B(0, 1),

which implies

r|T_k|_{L(X,Y)} ≤ n_0 + |T_k x_0| ≤ 2n_0 for all k ∈ K. □

The uniform boundedness principle is quite remarkable and surprising, since from pointwise estimates one derives a global (uniform) estimate. We have the following examples.

Examples (1) Let X be a Banach space and B be a subset of X. If for every f ∈ X* the set f(B) = {⟨f, x⟩ : x ∈ B} is bounded in R, then B is bounded.
(2) Let X be a Banach space and B* be a subset of X*. If for every x ∈ X the set ⟨B*, x⟩ = {⟨f, x⟩ : f ∈ B*} is bounded in R, then B* is bounded.

Theorem (Banach closed graph theorem) A closed linear operator defined on all of a Banach space X into a Banach space Y is continuous.

2.4 Dual space and Hahn-Banach theorem

In this section we discuss the dual space of a normed space and the Hahn-Banach theorem.

Definition (Dual Space) Let X be a normed space and let X* denote the collection of all bounded linear functionals on X. X* is called the dual space of X. We define the duality product ⟨f, x⟩_{X*×X} by

⟨f, x⟩_{X*×X} = f(x) for x ∈ X, f ∈ X*.

For f ∈ X* we define the operator norm of f by

|f|_{X*} = sup_{x≠0} |f(x)|/|x|_X.

Theorem (Dual space) The dual space X* equipped with the operator norm is a Banach space.

Proof: Let {f_n} be a Cauchy sequence in X*, i.e.,

|f_n(φ) − f_m(φ)| ≤ |f_n − f_m|_* |φ| → 0 as m, n → ∞,

for all φ ∈ X. Thus, {f_n(φ)} is Cauchy in R, and we define the functional f on X by

f(φ) = lim fn(φ), φ ∈ X. n→∞

Then f is linear and for all ε > 0 there exists N such that for m, n ≥ N

|f_m(φ) − f_n(φ)| ≤ ε|φ|.

Letting m → ∞, we have

|f(φ) − f_n(φ)| ≤ ε|φ| for all n ≥ N,

and thus f ∈ X* and |f_n − f|_* → 0 as n → ∞. □

Examples (Dual space) (1) Let ℓ^p be the space of p-summable infinite sequences x = (s_1, s_2, ···) with norm |x|_p = (Σ_k |s_k|^p)^{1/p}. Then (ℓ^p)* = ℓ^q, 1/p + 1/q = 1, for 1 ≤ p < ∞. Let c = ℓ^∞ be the space of uniformly bounded sequences x = (ξ_1, ξ_2, ···) with norm |x|_∞ = sup_k |ξ_k| and let c_0 be the space of uniformly bounded sequences with lim_{n→∞} ξ_n = 0. Then (c_0)* = ℓ^1.
(2) Let L^p(Ω) be the space of p-integrable functions f on Ω with

|f|_{L^p} = ( ∫_Ω |f(x)|^p dx )^{1/p}.

Then L^p(Ω)* = L^q(Ω), 1/p + 1/q = 1, for 1 ≤ p < ∞. Especially, L^2(Ω)* = L^2(Ω) and L^1(Ω)* = L^∞(Ω), where

L^∞(Ω) = the space of essentially bounded functions on Ω

with |f|_{L^∞} = inf M where |f(x)| ≤ M almost everywhere on Ω.
(3) Consider the linear functional f(x) = x(1). Then f is in X_1* but not in X_2*, since for the sequence defined by x^n(t) = t^n, t ∈ [0, 1], we have f(x^n) = 1 but |x^n|_{X_2} = (1/(2n+1))^{1/2} → 0 as n → ∞. With respect to the inner product (x, y) = ∫_0^1 x′y′ dt on continuously differentiable functions with x(0) = 0, however, since

|x(t)| = |x(0) + ∫_0^t x′(s) ds| ≤ ( ∫_0^1 |x′(t)|^2 dt )^{1/2},

f is bounded, i.e., f ∈ X_2*, in that setting.
(4) The total variation V_0^1(g) of a function g on the interval [0, 1] is defined by

V_0^1(g) = sup_{P∈P} Σ_{i=0}^{n−1} |g(t_{i+1}) − g(t_i)|,

where the supremum is taken over all partitions P = {0 ≤ t_0 ≤ ··· ≤ t_n ≤ 1} of the interval [0, 1]. Then BV(0, 1) = {uniformly bounded functions g with V_0^1(g) < ∞} and |g|_{BV} = V_0^1(g). It is a normed space if we assume g(0) = 0. For example, every monotone function is of bounded variation. Every real-valued BV-function can be expressed as the difference of two increasing functions (Jordan decomposition theorem).

Hahn-Banach Theorem Suppose S is a subspace of a normed space X and f : S → R is a linear functional satisfying f(x) ≤ |x| for all x ∈ S. Then there exists an F ∈ X* such that F(x) ≤ |x| for all x ∈ X and F(s) = f(s) for all s ∈ S.

In general, the theorem holds for a vector space X and a sub-additive, positively homogeneous functional p, i.e., for all x, y ∈ X and α > 0

p(x + y) ≤ p(x) + p(y),  p(αx) = αp(x).

Every linear functional f on a subspace S satisfying f(x) ≤ p(x) has an extension F on X satisfying F(x) ≤ p(x) for all x ∈ X.

Proof: First, we show that there exists a one-step extension of f, i.e., for x_0 ∈ X \ S there exists an extension f_1 of f on the subspace Y_1 spanned by (S, x_0) such that f_1(x) ≤ p(x) on Y_1. Note that for arbitrary s_1, s_2 ∈ S,

f(s_1) + f(s_2) = f(s_1 + s_2) ≤ p(s_1 + s_2) ≤ p(s_1 − x_0) + p(s_2 + x_0).

Hence, there exists a constant c such that

sup_{s∈S} {f(s) − p(s − x_0)} ≤ c ≤ inf_{s∈S} {p(s + x_0) − f(s)}.

Extend f by defining, for s ∈ S and α ∈ R,

f_1(s + αx_0) = f(s) + αc,

where f_1(x_0) = c is determined as above. First note that the representation s + αx_0 ∈ Y_1 is unique, since if s_1 + αx_0 = s_2 + βx_0 then s_2 − s_1 = (α − β)x_0 and thus s_1 = s_2 and α = β. Next, we claim that

f_1(s + αx_0) ≤ p(s + αx_0) for all α ∈ R and s ∈ S.

Indeed, by the definition of c, for α > 0

f_1(s + αx_0) = α(c + f(s/α)) ≤ α(p(s/α + x_0) − f(s/α) + f(s/α)) = p(s + αx_0),

and for α < 0

f_1(s + αx_0) = −α(−c + f(s/(−α))) ≤ −α(p(s/(−α) − x_0) − f(s/(−α)) + f(s/(−α))) = p(s + αx_0).

If X is a separable normed space, let {x_1, x_2, . . . } be a countable dense subset of X \ S. Select vectors from this dense set, independent of S, recursively, and extend f to the subspaces Y_n = span{S, x_1, ···, x_n} recursively as described above. Thus, we have an extension F on Y = span{S, x_1, x_2, . . . }. Since Y is dense in X, the final extension F is defined by the continuous extension of F from Y to X = Ȳ. In general, the remaining proof is done by using Zorn's lemma. □

Remark If X is a Hilbert space, then we have the orthogonal decomposition theorem

X = S ⊕ S⊥.

Thus, one can define the extension of f by

f(x) = f(x_1) for x = x_1 + x_2, x_1 ∈ S, x_2 ∈ S^⊥.

Corollary Let X be a normed space. For each x_0 ∈ X there exists an f ∈ X* such that f(x_0) = |f|_{X*}|x_0|_X.

Proof of Corollary: Let S = {αx_0 : α ∈ R} and define f(αx_0) = α|x_0|_X. By the Hahn-Banach theorem there exists an extension F ∈ X* of f such that F(x) ≤ |x| for all x ∈ X. Since

−F(x) = F(−x) ≤ |−x| = |x|,

we have |F(x)| ≤ |x|, in particular |F|_{X*} ≤ 1. On the other hand, F(x_0) = f(x_0) = |x_0|; thus |F|_{X*} = 1 and F(x_0) = f(x_0) = |F||x_0|. □

Example (Hahn-Banach Theorem) (1) |x| = sup_{|x*|=1} ⟨x*, x⟩.
(2) Next, we prove the geometric form of the Hahn-Banach theorem. It states that given a convex set K containing an interior point and x_0 ∉ K, then x_0 is separated from K by a hyperplane. Let K be a convex set such that 0 is an interior point of K. A hyperplane is a maximal proper affine space x_0 + M, where M is a subspace of X. We introduce the Minkowski functional that defines the distance from the origin measured by K.

Definition (Minkowski functional) Let K be a convex set in a normed space X such that 0 is an interior point of K. Then the Minkowski functional p of K is defined by

p(x) = inf{r > 0 : x/r ∈ K}.

Note that if K is the unit ball in X, then p(x) = |x|.

Lemma (Minkowski functional) (1) 0 ≤ p(x) < ∞. (2) p(αx) = αp(x) for α ≥ 0. (3) p(x_1 + x_2) ≤ p(x_1) + p(x_2). (4) p is continuous. (5) K̄ = {x : p(x) ≤ 1}, int(K) = {x : p(x) < 1}.

Proof: (1) Since K contains a ball centered at 0, for every x ∈ X there exists an r > 0 such that x/r ∈ K, and thus p(x) is finite. (2) For α > 0,

p(αx) = inf{r : αx/r ∈ K} = inf{αr′ : x/r′ ∈ K} = α inf{r′ : x/r′ ∈ K} = αp(x).

(3) Let p(x_i) < r_i < p(x_i) + ε, i = 1, 2, for arbitrary ε > 0, and let r = r_1 + r_2. By the convexity of K, since

(r_1/r)(x_1/r_1) + (r_2/r)(x_2/r_2) = (x_1 + x_2)/r ∈ K,

we have p(x_1 + x_2) ≤ r ≤ p(x_1) + p(x_2) + 2ε. Since ε > 0 is arbitrary, p is subadditive. (4) Let ε be the radius of a closed ball centered at 0 and contained in K. Since εx/|x| ∈ K, p(εx/|x|) ≤ 1 and thus p(x) ≤ |x|/ε. This shows that p is continuous at 0. Since p is sublinear,

p(x) = p(x − y + y) ≤ p(x − y) + p(y) and p(y) = p(y − x + x) ≤ p(y − x) + p(x),

and thus −p(y − x) ≤ p(x) − p(y) ≤ p(x − y). Hence, p is continuous since it is continuous at 0. (5) follows from the continuity of p. □

Lemma (Hyperplane) H is a hyperplane in a vector space X if and only if there exist a nonzero linear functional f on X and a constant c such that H = {x : f(x) = c}.

Proof: If H is a hyperplane, then H is the translation of a subspace M in X, i.e., H = x_0 + M. If x_0 ∉ M, then span[M ∪ {x_0}] = X. If for x = αx_0 + m, m ∈ M, we let f(x) = α, we have H = {x : f(x) = 1}. Conversely, if f is a nonzero linear functional on X, then M = {x : f(x) = 0} is a subspace. Let x_0 ∈ X be such that f(x_0) = 1. Since f(x − f(x)x_0) = 0, X = span[{x_0}, M] and {x : f(x) = c} = {x : f(x − cx_0) = 0} = cx_0 + M is a hyperplane. □

Theorem (Mazur's Theorem) Let K be a convex set with a nonempty interior in a normed space X. Suppose V is an affine space in X that contains no interior points of K. Then there exist x* ∈ X* and a constant c such that the hyperplane {x : ⟨x*, x⟩ = c} separates V and K:

⟨x*, v⟩ = c for all v ∈ V and ⟨x*, k⟩ < c for all k ∈ int(K).

Proof: By an appropriate translation one may assume that 0 is an interior point of K. Let M be the subspace spanned by V. Since V does not contain 0, there exists a linear functional f on M such that V = {x ∈ M : f(x) = 1}. Let p be the Minkowski functional of K. Since V contains no interior points of K, f(x) = 1 ≤ p(x) for all x ∈ V. By the homogeneity of p, f(x) ≤ p(x) for all x ∈ M. By the Hahn-Banach theorem, there exists an extension F of f from M to X with F(x) ≤ p(x) for all x ∈ X. From the Lemma, p is continuous; hence F is continuous and F(x) < 1 for x ∈ int(K). Thus, H = {x : F(x) = 1} is the desired closed hyperplane. □

Example (Hilbert space) If X is a Hilbert space and S is a subspace of X, then the extension F is defined as

F(s) = lim f(s_n) for s ∈ S̄ (with s_n ∈ S, s_n → s),  F(s) = 0 for s ∈ S̄^⊥,

where X = S̄ ⊕ S̄^⊥.

Example (Integral) Let X = L^2(0, 1), S = C(0, 1) and let f(x) = R-∫_0^1 x(t) dt be the Riemann integral of x ∈ S. Then the extension F is F(x) = L-∫_0^1 x(t) dt, the Lebesgue integral of x ∈ X.

Example (Alignment) (1) For a Hilbert space X, f = x_0 is self-aligned.
(2) Let x_0 ∈ X = L^p(Ω), 1 ≤ p < ∞. For f ∈ X* = L^q(Ω) defined by

f(t) = |x_0(t)|^{p−2} x_0(t), t ∈ Ω,

we have

f(x_0) = ∫_Ω |x_0(t)|^p dt.

Thus, |x_0|^{p−2} x_0 ∈ X* is aligned with x_0.

A function g is called right continuous if lim_{h→0+} g(t + h) = g(t). Let

BV_0[0, 1] = {g ∈ BV[0, 1] : g is right continuous on [0, 1) and g(0) = 0}.

Let C[0, 1] be the normed space of continuous functions on [0, 1] with the sup norm. For x ∈ C[0, 1] and g ∈ BV_0[0, 1] define the Riemann-Stieltjes integral by

∫_0^1 x(t) dg(t) = lim_{∆t→0} R(x, g, P) = lim_{∆t→0} Σ_{k=1}^n x(z_k)(g(t_k) − g(t_{k−1}))

for partitions P = {0 = t_0 < ··· < t_n = 1} with z_k ∈ [t_{k−1}, t_k] and ∆t = max_k (t_k − t_{k−1}). There is a version of the Riesz representation theorem.

Theorem (Dual of C[0, 1]) There is a norm-preserving linear isomorphism from C[0, 1]* to BV_0[0, 1], i.e., for f ∈ C[0, 1]* there exists a unique g ∈ BV_0[0, 1] such that

f(x) = ∫_0^1 x(t) dg(t)

and |f|_{C[0,1]*} = |g|_{BV}. That is, C[0, 1]* ≅ BV_0[0, 1], the space of right continuous functions of bounded variation.

Proof: Define the normed space of uniformly bounded functions on [0, 1] by

B = {x : [0, 1] → R : x is bounded on [0, 1]} with norm |x|_B = sup_{t∈[0,1]} |x(t)|.

By the Hahn-Banach theorem, an extension F of f from C[0, 1] to B exists with |F| = |f|. For any s ∈ (0, 1], define χ_{[0,s]} ∈ B and define v(s) = F(χ_{[0,s]}), v(0) = 0. Then, for any partition {t_k},

Σ_k |v(t_k) − v(t_{k−1})| = Σ_k sign(v(t_k) − v(t_{k−1}))(v(t_k) − v(t_{k−1}))

= F( Σ_k sign(v(t_k) − v(t_{k−1}))(χ_{[0,t_k]} − χ_{[0,t_{k−1}]}) )

≤ |F| |Σ_k sign(v(t_k) − v(t_{k−1}))χ_{(t_{k−1},t_k]}|_B ≤ |f|.

Thus, v ∈ BV[0, 1] and |v|_{BV} ≤ |F|. Next, for any x ∈ C[0, 1] define the function

z(t) = Σ_k x(t_{k−1})(χ_{[0,t_k]} − χ_{[0,t_{k−1}]}).

Note that the difference

|z − x|_B = max_k max_{t∈[t_{k−1},t_k]} |x(t_{k−1}) − x(t)| → 0 as ∆t → 0+.

Since F is bounded, F(z) → F(x) = f(x) and

F(z) = Σ_k x(t_{k−1})(v(t_k) − v(t_{k−1})) → ∫_0^1 x(t) dv(t),

so we have

f(x) = ∫_0^1 x(t) dv(t)

and |f(x)| ≤ |x|_{C[0,1]}|v|_{BV}. Hence |F| = |f| = |v|_{BV}. □

Riesz Representation Theorem For every bounded linear functional f on a Hilbert space X there exists a unique element y_f ∈ X such that

f(x) = (x, y_f) for all x ∈ X, and |f|_{X*} = |y_f|_X.

We define the Riesz map R : X* → X by

Rf = y_f ∈ X for f ∈ X*

for a Hilbert space X.

Proof: If f = 0 we take y_f = 0. If f ≠ 0, there exists a z ∈ X with |z| = 1 such that (x, z) = 0 for all x with f(x) = 0. Note that for every x ∈ X we have f(f(x)z − f(z)x) = 0. Thus,

0 = (f(x)z − f(z)x, z) = (f(x)z, z) − (f(z)x, z),

which implies

f(x) = (x, f(z)z)

in the real case (in the complex case, f(x) = (x, \overline{f(z)}z)). The existence then follows by taking y_f = f(z)z. For the uniqueness, let f(x) = (x, u_1) = (x, u_2) for all x ∈ X. Then (x, u_1 − u_2) = 0 and by taking x = u_1 − u_2 we obtain |u_1 − u_2| = 0, which implies the uniqueness. □

Definition (Reflexive Banach space) For a Banach space X we define an injection i of X into X** = (X*)* (the double dual) as follows. If x ∈ X then ix ∈ X** is defined by ix(f) = f(x) for f ∈ X*. Note that

|ix|_{X**} = sup_{f≠0} |f(x)|/|f|_{X*} ≤ sup_{f≠0} |f|_{X*}|x|_X/|f|_{X*} = |x|_X,

and thus |ix|_{X**} ≤ |x|_X. But by the Hahn-Banach theorem there exists an f ∈ X* such that |f|_{X*} = 1 and f(x) = |x|_X. Hence |ix|_{X**} = |x|_X and i : X → X** is an isometry. If i is also surjective, then we say that X is reflexive and we may identify X with X**.

Definition (Weak and Weak Star Convergence) (1) A sequence {x_n} in a normed space X is said to converge weakly to x ∈ X if lim_{n→∞} f(x_n) = f(x) for all f ∈ X*.
(2) A sequence {f_n} in the dual space X* of a normed space X is said to converge weakly star to f ∈ X* if lim_{n→∞} f_n(x) = f(x) for all x ∈ X.

Notes (1) If x_n converges strongly to x ∈ X then x_n converges weakly to x, but not conversely.
(2) If x_n converges weakly to x, then {|x_n|_X} is bounded and |x| ≤ lim inf |x_n|_X, i.e., the norm is weakly lower semi-continuous.
(3) A bounded sequence {x_n} in a normed space X converges weakly to x ∈ X if lim_{n→∞} f(x_n) = f(x) for all f ∈ S, where S is a dense subset of X*.
(4) Let X be a Hilbert space. If a sequence {x_n} of X converges weakly to x ∈ X, then x_n converges strongly to x if and only if lim_{n→∞} |x_n| = |x|. In fact,

|x_n − x|^2 = (x_n − x, x_n − x) = |x_n|^2 − (x_n, x) − (x, x_n) + |x|^2 → 0 as n → ∞

if x_n converges weakly to x and |x_n| → |x|.

Example (Weak Convergence) (1) Let X = L^2(0, 1) and consider the sequence x_n = √2 sin(nπt), t ∈ (0, 1). Then |x_n| = 1 and x_n converges weakly to 0. In fact, the family {sin(kπt)} spans a dense subspace of L^2(0, 1) and

(x_n, sin(kπt))_{L^2} → 0 as n → ∞

for all k, and thus x_n converges weakly to 0. Since |0| = 0 ≠ 1 = lim |x_n|, the sequence x_n does not converge strongly to 0.
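A quick numerical check of this example (the grid quadrature and test function are illustrative choices):

import numpy as np

# x_n(t) = sqrt(2) sin(n pi t) has norm 1 while (x_n, phi) -> 0 for smooth phi.
m = 20000
t = (np.arange(m) + 0.5) / m
dt = 1.0 / m
phi = t * (1.0 - t)                       # illustrative test function
for n in [1, 10, 100, 1000]:
    xn = np.sqrt(2.0) * np.sin(n * np.pi * t)
    print(n, (xn * xn).sum() * dt, (xn * phi).sum() * dt)
# The norm stays 1 while the inner products against phi decay to 0 as n grows.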

In general, if x_n → x weakly and y_n → y strongly in a Hilbert space, then (x_n, y_n) → (x, y), since

(x_n, y_n) − (x, y) = (x_n − x, y) + (x_n, y_n − y).

But note that (x_n, x_n) = 2∫_0^1 sin^2(nπt) dt = 1 ≠ 0, so (x_n, x_n) does not converge to 0. In general, (x_n, y_n) does not necessarily converge to (x, y) if x_n → x and y_n → y only weakly in X.

Alaoglu Theorem If X is a normed space and {x*_n} is a bounded sequence in X*, then there exists a subsequence that converges weakly star to an element of X*.

Proof: We prove the theorem for the case when X is separable. Let {x*_n} be a sequence in X* such that |x*_n| ≤ 1 and let {x_k} be a dense sequence in X. The sequence {⟨x*_n, x_1⟩} is a bounded sequence in R and thus has a convergent subsequence, denoted by {⟨x*_{n_1}, x_1⟩}. Similarly, the sequence {⟨x*_{n_1}, x_2⟩} contains a convergent subsequence {⟨x*_{n_2}, x_2⟩}. Continuing in this manner, we extract subsequences {⟨x*_{n_k}, x_k⟩} that converge in R. Thus, the diagonal sequence {x*_{nn}} converges on the dense subset {x_k}, i.e., {⟨x*_{nn}, x_k⟩} converges for each x_k. We will show that x*_{nn} converges weakly star to an element x* ∈ X*. Let x ∈ X be fixed. For arbitrary ε > 0 there exist N and coefficients α_k such that

|x − Σ_{k=1}^N α_k x_k| ≤ ε/3.

Thus,

|⟨x*_{nn}, x⟩ − ⟨x*_{mm}, x⟩| ≤ |⟨x*_{nn}, x − Σ_{k=1}^N α_k x_k⟩| + |⟨x*_{nn} − x*_{mm}, Σ_{k=1}^N α_k x_k⟩| + |⟨x*_{mm}, x − Σ_{k=1}^N α_k x_k⟩| ≤ ε

for sufficiently large n, m. Thus, {⟨x*_{nn}, x⟩} is a Cauchy sequence in R and converges to a real number ⟨x*, x⟩. The functional ⟨x*, x⟩ so defined is obviously linear and |x*| ≤ 1 since |⟨x*_{nn}, x⟩| ≤ |x|. □

Example (Weak Star Convergence) (1) Let X = L^1(Ω) and L^∞(Ω) = X*. Thus, a bounded sequence in L^∞(Ω) has a weakly star convergent subsequence. Let X = L^∞(0, 1) and let {x_n} be the sequence in X defined by

x_n(t) = −1 on [0, 1/2 − 1/n],  x_n(t) = n(t − 1/2) on [1/2 − 1/n, 1/2 + 1/n],  x_n(t) = 1 on [1/2 + 1/n, 1].

Then |x_n|_X = 1 and {x_n} converges weakly star to the sign function x(t) = sign(t − 1/2). But, since |x_n − x|_{L^∞} = 1 for all n, {x_n} does not converge strongly to x.
(2) Let X = c_0, the space of infinite sequences x = (x_1, x_2, ···) that converge to zero with norm |x|_∞ = max_k |x_k|. Then X* = ℓ^1 and X** = ℓ^∞. Let {x*_n} in ℓ^1 = X* be defined by x*_n = (0, ···, 1, 0, ···), where 1 appears in the n-th term only. Then x*_n converges weakly star to 0 in X*, but x*_n does not converge weakly to 0, since ⟨x*_n, x**⟩ = 1 for all n for x** = (1, 1, ···) ∈ X**.

Example (L^1(Ω) bounded sequence) Let X = L^1(0, 1) and let {x_n} ⊂ X be the sequence defined by

x_n(t) = n for t ∈ (1/2 − 1/(2n), 1/2 + 1/(2n)),  x_n(t) = 0 otherwise.

Then |x_n|_{L^1} = 1 and

∫_0^1 x_n(t)φ(t) dt → φ(1/2)

for all φ ∈ C[0, 1], but the linear functional f(φ) = φ(1/2) is not bounded on L^∞(0, 1). Thus, x_n does not converge weakly in L^1. Actually, the sequence x_n converges weakly star in C[0, 1]* (to the Dirac measure at t = 1/2).

Eberlein-Shmulyan Theorem A Banach space X is reflexive if and only if the closed unit ball of X is weakly sequentially compact, i.e., every bounded sequence of X contains a subsequence that converges weakly to an element of X.

Next, we consider linear operators T : X → Y.

Definition (Linear Operator) Let X, Y be normed spaces. A linear operator T on D(T) ⊂ X into Y satisfies

T (α x1 + β x2) = α T x1 + β T x2 where the domain D(T ) is a subspace of X. Then a linear operator T is continuous if and only if there exists a constant M ≥ 0 such that

|T x|Y ≤ M |x|X for all x ∈ X.

For a continuous linear operator we define the operator norm |T | by

|T | = sup |T x|Y = inf M, |x|X =1 where we assume D(T ) = X. A continuous linear operator on a normed space X into Y is called a bounded linear operator on X into Y and will be denoted by T ∈ L(X,Y ). The norm of T ∈ L(X,Y ) is denoted either by |T |L(X,Y ) or simply by |T |. Note that the space L(X,Y ) is a Banach space with the operator norm and X∗ = L(X,R). Open Mapping theorem Let T be a continuous linear operator from a Banach space X onto a Banach space Y . Then, T is an open map, i.e., maps every open set to of X onto an open set in Y . Proof: Suppose A : X → Y is a surjective continuous linear operator. In order to prove that A is an open map, it is sufficient to show that A maps the open unit ball in X to a neighborhood of the origin of Y . Let U = B1(0) and V = B1(0) be a open unit ball S of 0 in X and Y , respectively. Then X = k∈N kU. Since A is surjective, ! [ [ Y = A(X) = A kU = A(kU). k∈N k∈N Since Y is a Banach, by the Baire’s category theorem there exists a k ∈ N such that  ◦  ◦ A(kU) 6= ∅. Let c ∈ A(kU) . Then, there exists r > 0 such that Br(c) ⊆

28  ◦ A(kU) . If v ∈ V , then

 ◦ c, c + rv ∈ Br(c) ⊆ A(kU) ⊆ A(kU).

 ◦ and the difference rv = (c + rv) − c of elements in A(kU) satisfies rv ∈⊆ A(2kU). 2k  Since A is linear, V ⊆ A r U . Thus, it follows that for all y ∈ Y and  > 0, 2k there exists x ∈ X such that |x|X < r |y|Y and |y − Ax|Y < . (2.1) δ Fix y ∈ δV , by (2.1), there is some x1 ∈ X with |x1| < 1 and |y − Ax1| < 2 . By (2.1) one can find a sequence {xn} inductively by −(n−1) −n |xn| < 2 and |y − A(x1 + x2 + ··· + xn)| < δ2 . (2.2)

Let sn = x1 +x2 +···+xn. From the first inequality in (2.2), {sn} is a Cauchy sequence, and since X is complete, {sn} converges to some x ∈ X. By (2.2), the sequence Asn tends to y in Y , and thus Ax = y by continuity of A. Also, we have ∞ X |x| = lim |sn| ≤ |xn| < 2. n→∞ n=1 This shows that every y ∈ δ V belongs to A(2U), or equivalently, that the image A(U) δ of U = B1(0) in X contains the open ball 2 V ⊂ Y .  Corollary (Open Mapping theorem) A bounded linear operator T on Banach spaces is bounded invertible if and only if it is one-to-one and onto. Definition (Closed Linear Operator) (1) The graph G(T ) of a linear operator T on the domain D(T ) ⊂ X into Y is the set (x, T x): x ∈ D(T )} in the product space X × Y . Then T is closed if its graph G(T ) is a closed linear subspace of X × Y , i.e., if xn ∈ D(T ) converges strongly to x ∈ X and T xn converges strongly to y ∈ Y , then x ∈ D(T ) and y = T x. Thus the notion of a closed linear operator is an extension of the notion of a bounded linear operator. (2) A linear operator T is said be closable if xn ∈ D(T ) converges strongly to 0 and T xn converges strongly to y ∈ Y , then y = 0. For a closed linear operator T , the domain D(T ) is a Banach space if it is equipped by the graph norm 2 2 1 |x|D(T ) = (|x|X + |T x|Y ) 2 . d 2 Example (Closed linear Operator) Let T = dt with X = Y = L (0, 1) is closed and dom (A) = H1(0, 1) = {f ∈ L2(0, 1) : absolutely continuous functions on [0, 1] with square integrable derivative}.

If yn = T xn, then Z t xn(t) = xn(0) + yn(s) ds. 0 2 If xn ∈ dom(T ) → x and yn → y in L (0, 1), then letting n → ∞ we have Z t x(t) = x(0) + y(s) ds, 0

29 i.e., x ∈ dom (T ) and T x = y. In general if for λ I + T for some λ ∈ R has a bounded inverse (λ I + T )−1, then T : dom (A) ⊂ X → X is closed. In fact, T xn = yn is equivalent to

−1 xn = (λ I + T ) (yn + λ xn

Suppose xn → x and yn → y in X, letting n → ∞ in this, we have x ∈ dom (T ) and T x = T (λ I + T )−1(λ x + y) = y. Definition (Dual Operator) Let T be a linear operator X into Y with dense domain D(T ). The dual operator of T ∗ of T is a linear operator on Y ∗ into X∗ defined by

∗ ∗ ∗ hy , T xiY ∗×Y = hT y , xiX∗×X for all x ∈ D(T ) and y∗ ∈ D(T ∗). In fact, for y∗ ∈ Y ∗ x∗ ∈ X∗ satisfying

hy∗, T xi = hx∗, xi for all x ∈ D(T ) is uniquely defined if and only if D(T ) is dense. The only if part follows since if ∗ ∗ D(T ) 6= X then the Hahn-Banach theory there exits a nonzero x0 ∈ X such that ∗ hx0, xi = 0 for all D(T ), which contradicts to the uniqueness assumption. If T is bounded with D(T ) = X then T ∗ is bounded with kT k = kT ∗k. Examples Consider the gradient operator T : L2(Ω) → L2(Ω)n as

T u = ∇u = (Dx1 u, ··· Dxn u) with D(T ) = H1(Ω). The, we have for v ∈ L2(Ω)n

∗ X T v = −div v = − Dxk vk with domain D(T ∗) = {v ∈ L2(Ω)n : div v ∈ L2(Ω) and n·v = 0 at the boundary ∂Ω}. In fact by the divergence theorem Z Z Z (T u, v) = ∇u · v (n · v)u ds − u(div v) dx = (u, T ∗v) Ω ∂Ω Ω

1 1 ∗ 2 1 for all v ∈ C (Ω). First, let u ∈ H0 (Ω) we have T v = −div v ∈ L (Ω) since H0 (Ω) is dense in L2(Ω). Thus, n · v ∈ L(∂Ω) and n · v = 0. Definition (Hilbert space Adjoint operator) Let X, Y be Hilbert spaces and T be a linear operator X into Y with dense domain D(T ). The Hilbert self adjoint operator of T ∗ of T is a linear operator on Y into X defined by

∗ (y, T x)Y = (T y, x)X for all x ∈ D(T ) and y ∈ D(T ∗). Note that if we let T 0 : Y ∗ → X∗ is the dual operator of T , then ∗ 0 T RY ∗→Y = RX∗→X T where RX∗→X and RY ∗→Y are the Riesz maps.

30 Examples (self-adjoint operator) Let X = L2(Ω) and T be the Laplace operator

n X T u = ∆u = Dxkxk u k=1

2 1 ∗ with domain D(T ) = H (Ω) ∩ H0 (Ω). Then T is sel-adjoint, i.e., T = T . In fact Z Z Z ∗ (T u, v)X = ∆u v dx = ((n · ∇u)v − (n · ∇v)u) ds + ∆v u dx = (x, T v) Ω ∂Ω Ω for all v ∈ C1(Ω). Closed Range Theorem Let X and Y be Banach spaces and T a closed linear op- erator on D(T ) ⊂ X into Y . Assume that D(T ) is dense in X. The the following properties are all equivalent. (a) R(T ) is closed in Y . (b) R(T ∗) is closed in X∗. (c) R(T ) = N(T ∗)⊥ = {y ∈ Y : hy∗, yi = 0 for all y∗ ∈ N(T ∗)}. (d) R(T ∗) = N(T )⊥ = {x∗ ∈ X∗ : hx∗, xi = 0 for all x∗ ∈ N(T )}. Proof: If y∗ ∈ N(T ∗)⊥, then hT x, y∗i = hx, T ∗y∗i for all x ∈ dom(T ) and N(T ∗)⊥ ⊂ R(T ). If y ∈ R(T ), then

hy, y∗i = hx, T ∗y∗i for all y∗ ∈ N(T ∗) and R(T ) ⊂ N(T ∗)⊥. Thus, R(T ) = N(T ∗)⊥. Since T is closed, the graph G = G(T ) = {(x, T x)inX × Y : x ∈ dom(T )} is closed. Thus, G is a Banach space equipped with norm |{x, y}| = |x| + |y| of X × Y . Define a continuous linear operator S from G into Y by

S{x, T x} = T x

Then, the dual operator S∗ of S is continuous linear operator from Y ∗ intoG∗ and we have h{x, T x},S∗y∗i = hS{x, T x}, y∗i = hT x, y∗i = h{x, T x}, {0, y∗}i for x ∈ dom(T ) and y∗ ∈ Y ∗. Thus, the functional S˜ Sy˜ ∗ = S ∗ y∗ − {0, y∗} : Y ∗ → ∗ ∗ ∗ ∗ X ×Y vanishes for all elements of G. But, since h{x, T x}, {x , y1} for all x ∈ dom(T ), ∗ ∗ ∗ ∗ ∗ is equivalent to hx, x i = h−T x, y1i for all x ∈ dom(T ), i.e., −T y1 = x ,

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ S y = {0, y } + {−T y1, y1} = {−T y1, y + y1}

∗ ∗ ∗ ∗ ∗ ∗ ∗ for all y ∈ Y . Since y1 is arbitrary, R(S ) = R(T ) × Y . Therefore, R(S ) is closed in X∗ × Y ∗ if and only if R(T ) is closed and since R(S) = R(T ), R(S) is closed if and only if R(T ) is closed in Y . Hence we have only prove the equivalency of (a) and (b) in the special case of a S. If T is bounded, then (a) implies(b). Since R(T ) a Banach space, by the Hahn- Banach theorem, one can assume R(T ) = Y without loss of generality

∗ ∗ ∗ hT x, y1i = hx, T y1i

31 By the open mapping theorem, there exists c > 0 such that for each y ∈ Y there exists an x ∈ X with T x = y and |x| ≤ c |y|. Thus, for y∗ ∈ Y ∗, we have

|hy, y∗i| = |hT x, y∗i| = |hx, T ∗y∗i| ≤ c|y||T ∗y∗|. and |y∗| ≤ sup |hy, y∗i| ≤ c|T ∗y∗| |y|≤1 and so (T ∗)−1 exists and bounded and D(T ∗) = R(T ∗) is closed. ∗ ∗ Let T1 : X → Y1 = R(T ) be defined by T1x = T x For y1 ∈ Y1

∗ ∗ ∗ hT1x, y1i = hx, T y i = 0, for all x ∈ X

∗ ∗ ∗ ∗ −1 and since R(T ) is dense in R(T ), y1 = 0. Since R(T ) = R(T1 ) is closed, (T1 ) exists ∗ on Y1 = (Y1)

2.5 Distribution and Generalized Derivatives In this section we introduce the distribution (). The concept of distribution is very essential for defining a generalized solution to PDEs and provides the foundation of PDE theory. Let D(Ω) be a vector space of all infinitely many con- ∞ tinuously differentiable functions C0 (Ω) with compact support in Ω. For any compact ∞ set K of Ω, let DK (Ω) be the set of all functions f ∈ C0 (Ω) whose support are in K. Define a family of on D(Ω) by

s pK,m(f) = sup sup |D f(x)| x∈K |s|≤m where  ∂ s1  ∂ sn Ds = ··· ∂x1 ∂xn P where s = (s1, ··· , sn) is nonnegative integer valued vector and |s| = sk ≤ m. Then, DK (Ω) is a locally convex topological space. ∞ Definition (Distribution) A linear functional T defined on C0 (Ω) is a distribution if for every compact subset K of Ω, there exists a positive constant C and a positive integer k such that

s |T (φ)| ≤ C sup|s|≤k, x∈K |D φ(x)| for all φ ∈ DK (Ω).

Definition (Generalized Derivative) A distribution S defined by

∞ S(φ) = −T (Dxk φ) for all φ ∈ C0 (Ω)

is called the distributional derivative of T with respect to xk and we denote S = Dxk T .

In general we have

s |s| s ∞ S(φ) = D T (φ) = (−1) T (D φ) for all φ ∈ C0 (Ω).

32 This definition is naturally followed from that for f is continuously differentiable Z Z ∂ Dxk fφ dx = − f φ dx Ω Ω ∂xk s and thus Dxk f = Dxk Tf = T ∂ f. Thus, we let D f denote the distributional deriva- ∂xk tive of Tf . Example (Distribution) (1) For f is a locally integrable function on Ω, one defines the corresponding distribution by Z ∞ Tf (φ) = fφ dx for all φ ∈ C0 (Ω). Ω since Z |Tf (φ)| ≤ |f| dx sup |φ(x)|. K x∈K

(2) T (φ) = φ(0) defines the Dirac delta δ0 at x = 0, i.e.,

|δ0(φ)| ≤ sup |φ(x)|. x∈K (3) Let H be the Heaviside function defined by

 0 for x < 0 H(x) = 1 for x ≥ 0

Then, Z ∞ 0 DTH (φ) = − H(x)φ (x) dx = φ(0) −∞ and thus DTH = δ0 is the Dirac delta function at x = 0. 2 (4) The distributional solution for −D u = δx0 satisfies Z ∞ 00 − uφ dx = φ(x0) −∞

∞ 1 for all φ ∈ C0 (R). That is, u = 2 |x − x0| is the fundamental solution, i.e.,

Z ∞ Z x0 Z ∞ 00 0 0 − |x − x0|φ dx = φ (x) dx − φ (x) dx = 2φ(x0). −∞ ∞ x0 In general for d ≥ 2 let

 1 log|x − x | d = 2  4π 0 G(x, x0) = 2−d  cd |x − x0| d ≥ 3.

Then ∆G(x, x0) = 0, x 6= x0. d and u = G(x, x0) is the fundamental solution to to −∆ in R ,

−∆u = δx0 .

33 In fact, let B = {|x−x0| ≤ } and Γ = {|x−x0| = } be the surface. By the divergence theorem Z Z ∂ ∂ G(x, x0)∆φ(x) dx = φ(G(x, x0) − G(x, x0)φ(s)) ds d R \B(x0) Γ ∂ν ∂ν Z 2−d ∂φ 1−d 1 = ( − (2 − d) φ(s)) ds → φ(x0) Γ ∂ν cd

That is, G(x, x0) satisfies Z − G(x, x0)∆φ dx = φ(x0). Rd In general let L be a linear diffrenrtial operator and L∗ denote the formal adjoint operator of L An locally integrable function u is said to be a distributional solution to Lu = T where L with a distribution T if Z u(L∗φ) dx = T (φ) Ω

∞ for all φ ∈ C0 (Ω). Definition (Sovolev space) For 1 ≤ p < ∞ and m ≥ 0 the Sobolev space is

W m,p(Ω) = {f ∈ Lp(Ω) : Dsf ∈ Lp(Ω), |s| ≤ m} with norm   1 Z p X s p |f|W m,p(Ω) =  |D f| dx . Ω |s|≤m That is, s 1 1 |D f(φ)| ≤ c |φ| q with + = 1. L p q m,p s Remark (1) X = W (Ω) is complete. In fact If {fn} is Cauchy in X, then {D fn} p p s s p is Cauchy in L (Ω) for all |s| ≤ m. Since L (Ω) is complete, D fn → g in L (Ω). But since Z Z Z s s s lim fnD φ dx = fD φ dx = g φ dx, n→∞ Ω Ω s s we have D f = g for all |s| ≤ m and |fn − f|X → 0 as n → ∞. (2) Hm,p ⊂ W 1,p(Ω). Let Hm,p(Ω) be the completion of Cm(Ω) with respect to W I,p(Ω) m,p m norm. That is, f ∈ H (Ω) there exists a sequence fn ∈ C (Ω) such that fn → f s s p and D fn → g strongly in L (Ω) and thus Z Z s |s| |s| s D fn(φ) = (−1) Dsfnφ dx → (−1) g φ dx Ω Ω which implies gs = Dsf and f ∈ W 1,p(Ω). (3) If Ω has a Lipschitz continuous boundary, then

W m,p(Ω) = Hm,p(Ω).

34 2.6 Lax-Milgram Theorem ann Banach-Necas-Babuska The- orem Let X be a Hilbert space. Let σ be a (complex-valued) sesquilinear form on X × X satisfying σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y)

σ(x, α y1 + β y2) =α ¯ σ(x, y1) + β¯ σ(x, y2), |σ(x, y)| ≤ M |x||y| for all x, y ∈ X (Bounded) and Re σ(x, x) ≥ δ |x|2 for all x ∈ X and δ > 0 (Coercive). Then for each f ∈ X∗ there exist a unique solution x ∈ X to

σ(x, y) = hf, yiX∗×X for all y ∈ X and −1 |x|X ≤ δ |f|X∗ . Proof: Let us define the linear operator S from X∗ into X by

Sf = x, f ∈ X∗ where x ∈ X satisfies σ(x, y) = hf, yi for all y ∈ X.

The operator S is well defined since if x1, x2 ∈ X satisfy the above, then σ(x1−x2, y) = 2 0 for all y ∈ X and thus δ |x1 − x2|X ≤ Re σ(x1 − x2, x1 − x2) = 0. ∗ Next we show that dom(S) is closed in X . Suppose fn ∈ dom(S), i.e., there exists ∗ xn ∈ X satisfying σ(xn, y) = hfn, yi for all y ∈ X and fn → f in X as n → ∞. Then

σ(xn − xm, y) = hfn − fm, yi for all y ∈ X

Setting y = xn − xm in this we obtain

2 δ |xn − xm|X ≤ Re σ(xn − xm, xn − xm) ≤ |fn − fm|X∗ |xn − xm|X .

Thus {xn} is a Cauchy sequence in X and so xn → x for some x ∈ X as n → ∞. Since σ and the dual product are continuous, thus x = Sf. Now we prove that dom(S) = X∗. Suppose dom(S) 6= X∗. Since dom(S) is closed there exists a nontrivial x0 ∈ X such that hf, x0i = 0 for all f ∈dom(S). Consider ∗ the linear functional F (y) = σ(x0, y), y ∈ X. Then since σ is bounded F ∈ X and x0 = SF . Thus F (x0) = 0. But since σ(x0, x0) = hF, x0i = 0, by the coercivity of σ ∗ x0 = 0, which is a contradiction. Hence dom(S) = X .  Assume that σ is coercive. By the Lax-Milgram theorem A has a bounded inverse S = A−1. Thus, dom (A˜) = A−1H. Moreover A˜ is closed. In fact, if

xn ∈ dom (A˜) → x and fn = Axn → f in H,

35 then since xn = Sfn and S is bounded, x = Sf and thus x ∈ dom (A˜) and Ax˜ = f. If σ is symmetric, σ(x, y) = (x, y)X defines an inner product on X. and SF coincides with the Riesz representation of F ∈ X∗. Moreover,

hAx, yi = hAy, xi for all x, y ∈ X. and thus A˜ is a self-adjoint operator in H. 1 2 Example (Laplace operator) Consider X = H0 (Ω), H = L (Ω) and Z σ(u, φ) = (u, φ)X = ∇u · ∇φ dx. Ω Then, ∂2 ∂2 Au = −∆u = −( 2 u + 2 u) ∂x1 ∂x2 and ˜ 2 1 dom (A) = H (Ω) ∩ H0 (Ω). for Ω with C1 boundary or convex domain Ω. For Ω = (0, 1) and f ∈ L2(0, 1)

Z 1 d d Z 1 y u dt = f(x)y(x) dx 0 dx dx 0 is equivalent to Z 1 d d Z 1 y ( u + f(s) ds) dx = 0 0 dx dx x 1 for all y ∈ H0 (0, 1). Thus, d Z 1 u + f(s) ds = c (a constant) dx x

d 1 and therefore dx u ∈ H (0, 1) and d2 Au = − u = f in L2(0, 1). dx2 Example (Elliptic operator) Consider a second order elliptic equation

∂u Au = −∇ · (a(x)∇u) + b(x) · ∇u + c(x)u(x) = f(x), = g at Γ u = 0 at Γ ∂ν 1 0 where Γ0 and Γ1 are disjoint and Γ0 ∪ Γ1 = Γ. Integrating this against a test function φ, we have Z Z Z Z Auφ dx = (a(x)∇u · ∇φ + b(x) · ∇uφ + c(x)uφ) dx − gφ dsx = f(x)φ(x) dx, Ω Ω Γ1 Ω for all φ ∈ C1(Ω) vanishing at Γ . Let X = H1 (Ω) is the completion of C1(Ω) 0 Γ0 vanishing at Γ0 with inner product Z (u, φ) = ∇u · ∇φ dx Ω

36 i.e., 1 1 HΓ0 (Ω) = {u ∈ H (Ω) : u|Γ0 = 0} Define the bilinear form σ on X × X by Z σ(u, φ) = (a(x)∇u · ∇φ + b(x) · ∇uφ + c(x)uφ. Ω Then, by the Green’s formula Z 1 σ(u, u) = (a(x)|∇u|2 + b(x) · ∇( |u|2) + c(x)|u|2) dx Ω 2 Z Z 2 1 2 1 2 = (a(x)|∇u| + (c(x) − ∇ · b) |u| ) dx + + n · b|u| dsx. Ω 2 Γ1 2 If we assume 1 0 < a ≤ a(x) ≤ a,¯ c(x) − ∇ · b ≥ 0, n · b ≥ 0 at Γ , 2 1 then σ is bounded and coercive with δ = a. The Banach space version of Lax-Milgram theorem is as follows. Banach-Necas-Babuska Theorem Let V and W be Banach spaces. Consider the linear equation for u ∈ W

a(u, v) = f(v) for all v ∈ V (2.3) for given f ∈ V ∗, where a is a bounded bilinear form on W × V . The problem is well-posed in if and only if the following conditions hold:

a(u, v) inf sup ≥ δ > 0 u∈W |u| |v| v∈V W V (2.4) a(u, v) = 0 for all u ∈ W implies v = 0

Under conditions we have the unique solution u ∈ W to (2.3) satisfies 1 |u| ≤ |f| ∗ . W δ V Proof: Let A be a bounded linear operator from W to V ∗ defined by

hAu, vi = a(u, v) for all u ∈ W, v ∈ V.

The inf-sup condition is equivalent to for any w?W

|Aw|V ∗ ≥ δ|u|W , and thus the range of A, R(A), is closed in V ∗ and N(A) = 0. But since V is reflexive and ∗ hAu, viV ∗×V = hu, A viW ×W ∗

37 from the second condition N(A∗) = {0}. It thus follows from the closed range and −1 open mapping theorems that A is bounded.  Next, we consider the generalized Stokes system. Let V and Q be Hilbert spaces. We consider the mixed variational problem for (u, p) ∈ V × Q of the form

a(u, v) + b(p, v) = f(v), b(u, q) = g(q) (2.5) for all v ∈ V and q ∈ Q, where a and b is bounded bilinear form on V × V and V × Q. If we define the linear operators A ∈ L(V,V ∗) and B ∈ L(V,Q∗) by

hAu, vi = a(u, v) and hBu, qi = b(u, q) then it is equivalent to the operator form:

 AB∗   u   f      =   . B 0 p g

Assume the coercivity on a 2 a(u, u) ≥ δ |u|V (2.6) and the inf-sup condition on b

b(u, q) inf sup ≥ β > 0 (2.7) q∈P u∈V |u|V |q|Q Note that inf-sup condition that for all q there exists u ∈ V such that Bu = q and 1 ∗ |u|V ≤ β |q|Q. Also, it is equivalent to |B p|V ∗ ≥ β |p|Q for all p ∈ Q. Theorem (Mixed problem) Under conditions (2.6)-(2.7) there exits a unique solution (u, p) ∈ V × Q to (2.5) and

|u|V + |p|Q ≤ c (|f|V ∗ + |g|Q∗ )

Proof: For  > 0 consider the penalized problem

a(u, v) + b(v, p) = f(v), for all v ∈ V (2.8) −b(u, q) + (p, q)Q = −g(q), for all q ∈ Q.

By the Lax-Milgram theorem for every  > 0 there exists a unique solution (u, p) ∈ V × Q. From the first equation and (2.7),

β |p|Q ≤ |f − Au|V ∗ ≤ |f|V ∗ + M |u|V .

Letting v = u and q = p in the first and second equation and (2.7), we have

2 2 δ |u|V +  |p|Q ≤ |f|V ∗ ||u|V + |p|Q|g|Q∗ ≤ C (|f|V ∗ + |g|Q∗ )|u|V ), and thus |u|V and thus |p|Q are bounded uniformly in  > 0. Thus, (u, p) has a weakly convergent subspace to (u, p) in V × Q and (u, p) satisfies (2.5). 

38 2.7 Compact Operator A compact operator is a linear operator A from a Banach space X to another Banach space Y , such that the image under A of any bounded subset of X is a relatively compact subset of Y , i.e., its closure is compact. For example, the integral operator is a concrete example of compact operators. A Fredholm integral equation gives rise to a compact operator A on function spaces. A compact set in a Banach space is bounded. The converse doses not hold in general. The the closed unit sphere is weakly star compact but not strongly compact unless X is of finite dimension. Let C(S) be the space of continuous functions on a compact metric space S with norm |x| = maxs∈S |x(s)|. the Arzela-Ascoli theorem gives the necessary and sufficient condition for a subset {xα} being (strongly) relatively compact. Arzela-Ascoli theorem A family {xα} in C(S) is strongly compact if and only if the following conditions hold:

sup |xα| < ∞ (equi-bouned) α

0 0 lim sup |xα(s ) − xα(s )| = 0 (equi-continuous). δ→0+ α, |s0−s00|≤δ

Proof: By the Bolzano-Weirstrass theorem for a fixed s ∈ S {xα(s)} contains a con- vergent subsequence. Since S is compact, there is a countable dense subset {sn} in S such that for every  > 0 there exists n satisfying

sup inf dist(s, sj) ≤ . s∈S 1≤j≤n

Let us denote a convergent subsequence of {xn(s1)} by {xn1 (s1)}. Similarly, the se- quence {hxn1 (s2)} contains a a convergent subsequence {xn2 (s2)}. Continuing in this ∗ manner we extract subsequences {xnk (sk)}. The diagonal sequence {xnn(s)} converges for s = s1, s2, ··· , simultaneously for the dense subset {sk}. We will show that {xnn} is a Cauchy sequence in C(S). By the equi-continuity of {xn(s)} for all  > 0 there 0 00 0 0 0 00 exists δ such that |xn(s ) − xn(s )| ≤  for all s , s satisisfying dist(s , s ) ≤≤ δ. Thus, for s ∈ S there exists a j ≤ k such that

|xnn(s) − xmm(s)| ≤ |xnn(s) − xmm(sj)| + |xnn(sj) − xmm(sj)| + |xmm(s) − xmm(sj)|

≤ 2 + |xnn(sj) − xmm(sj)| which implies that limm,m→∞ maxs∈S |xnn(s) − xmm(s)| ≤ 2 for arbitrary  > 0.  For the space Lp(Ω), we have d p Frechet-Kolmogorov theorem Let Ω be a subset of R . A family {xα} of L (Ω) functions is strongly compact if and only if the following conditions hold: Z p sup |xα| ds < ∞ (equi-bouned) α Ω Z p lim sup |xα(t + s) − xα(s)| ds = 0 (equi-continuous). |t|→0 α Ω

39 Example (Integral Operator) Let k(x, y) be the kernel function satisfying Z Z |k(x, y)|2 dxdy < ∞ Ω Ω Define the integral operator A by Z Au = k(x, y)u(y) dy for u ∈ X = L2(Ω). Ω Then A ∈ L(X) i compact. This class of operators A is called the Hilbert-Schmitz operator. Rellich-Kondrachov theorem Every uniformly bounded sequence in W 1,p(Ω) has a q ∗ dp subsequence that converges in L (Ω) provided that q < p = d−p . Fredholm Alternative theorem Let A ∈ L(X) be compact. For any nonzero λ ∈ C, either 1) The equation Ax − λ x = 0 has a nonzero solution x, or 2) The equation Ax − λ, x = f has a unique solution x for any function f ∈ X. In the second case, (λ I − A)−1 is bounded. Exercise Problem 1 T : D(T ) ⊂ X → Y is closable if and only if the closure of its graph G(T ) in X × Y the graph of a linear operator S. Problem 2 For a closed linear operator T , the domain D(T ) is a Banach space if it is equipped by the graph norm

2 2 1 |x|D(T ) = (|x|X + |T x|Y ) 2 .

3 Constrained Optimization

In this section we develop the Lagrange multiplier theory for the constrained minimiza- tion in Banach spaces.

3.1 Hilbert space theory In this section we develop the Hilbert space theory for the constrained minimization. Theorem 1 (Riesz Representation) Let X be a Hilbert space. Given F ∈ X∗, consider the minimization 1 J(x) = (x, x) − F (x) over x ∈ X 2 X There exists a unique solution x∗ ∈ X and x∗ satisfies

(v, x∗) = F (v) for all v ∈ X (3.1)

∗ ∗ ∗ The solution x ∈ X is the Riesz representation of F ∈ X and it satisfies |x |X = |F |X∗ .

40 Proof: Suppose x∗ is a minimizer. Then 1 1 (x∗ + tv, x∗ + tv) − F (x∗ + tv) ≥ (x∗, x∗) − F (x∗) 2 2 for all t ∈ R and v ∈ X. Thus, for t > 0 t (v, x∗) − F (v) + |v|2 ≥ 0 2 Letting t → 0+, we have (v, x∗) − F (v) ≥ 0 for all v ∈ X. If v ∈ X, then −v ∈ X and this implies (3.1). ∗ ∗ (Uniqueness) Suppose x1, x2 are minimizers. Then, ∗ ∗ (v, .x1) − (v, x2) = F (v) − F (v) = 0 for all v ∈ X

∗ ∗ ∗ ∗ 2 ∗ ∗ ∗ ∗ ∗ ∗ Letting v = x1 − x2, we have |x1 − x2| = (x1 − x2, x1 − x2) = 0, i.e., x1 = x2. 

(Existence) Suppose {xn} is a minimizing sequence, i.e., J(xn) is decreasing and limn→∞ J(xn) = δ = infx∈X J(x). Note that

2 2 2 2 |xn − xm| + |xn + xm| = 2(|xn| + |xm| )

Thus, 1 x + x x + x |x −x |2 = J(x )+J(x )−2J( n m )−F (x )−F (x )+2F ( n m ) ≤ J(x )+J(x )−2δ. 4 n m m n 2 n m 2 n m ∗ This implies {xn} is a Cauchy sequence in X. Since X is complete there exists x = ∗ ∗ limn→∞ xn in X and J(x ) = δ, i.e, x is the minimizer.  Remark The existence and uniqueness proof of the theorem holds if we assume J is uniformly convex functional with modulus φ, i.e.,

J(λx1 + (1 − λ)x2) ≤ J(x1) + (1 − λ)J(x2) − λ(1 − λ)φ(|x1 − x2|) where φ is a function that is increasing and vanishes only at 0. In fact 1 x + x φ(|x − x |) ≤ J(x ) + J(x ) − 2J( n m ) ≤ J(x ) + J(x ) − 2δ. 4 n m m n 2 n m The necessary and sufficient optimality is given by 0 ∈ ∂J(x∗) where ∂J(x∗) is the subdifferential of J at x∗. Example (Mechanical System)

Z 1 1 d Z 1 min | u(x)|2 − f(x)u(x) dx (3.2) 0 2 dx 0 1 over u ∈ H0 (0, 1). Here 1 1 X = H0 (0, 1) = {u ∈ H (0, 1) : u(0) = u(1) = 0} is a Hilbert space with the inner product

Z 1 d d (u, v)H1 = u v dx. 0 0 dx dx

41 H1(0, 1) is a space of absolute continuous functions on [0, 1] with square integrable derivative, i.e., Z x d u(x) = u(0) + g(x) dx, g(x) = u(x), a.e. and g ∈ L2(0, 1). 0 dx ∗ The solution yf = x to (3.8) satisfies

Z 1 d d Z 1 v yf dx = f(x)v(x) dx. 0 dx dx 0 By the integration by part

Z 1 Z 1 Z 1 d f(x)v(x) dx = ( f(s) ds) v(x) dx, 0 0 x dx and thus Z 1 d Z 1 d ( yf − f(s) ds) v(x) dx = 0 0 dx x dx 1 for all v ∈ H0 (0, 1). This implies that d Z 1 yf − f(s) ds = c = a constant dx x Integrating this, we have

Z x Z 1 yf = f(s) dsdx + cx 0 x

Since yf (1) = 0 we have Z 1 Z 1 c = − f(s) dsdx. 0 x d 1 Moreover, dx yf ∈ H (0, 1) and d2 − y (x) = f(x) a.e. dx2 f with yf (0) = yf (1) = 0. Now, we consider the case when F (v) = v(1/2). Then,

Z 1/2 Z 1/2 Z 1/2 d d 1/2 1/2 1 |F (v)| = | v dx| ≤ ( | v| dx) ( 1 dx) ≤ √ |v|X . 0 dx 0 dx 0 2 That is, F ∈ X∗. Thus, we have

Z 1 d d ( yf − χ(0,1/2)(x)) v(x) dx = 0 0 dx dx

1 for all v ∈ H0 (0, 1), where χS is the characteristic function of a set S. It is equivalent to d y − χ (x) = c = a constant. dx f (0,1/2)

42 Integrating this, we have

 (1 + c)x x ∈ [0, 1 ]  2 yf (x) =  1 1 cx + 2 x ∈ [ 2 , 1].

1 Since yf (1) = 0 we have c = − 2 . Moreover yf satisfies d2 1 − y = 0, for x 6= y (0) = y (1) = 0 dx2 f 2 f f

1 − 1 + d 1 − d 1 + yf (( 2 ) ) = yf ( 2 ) ), dx yf (( 2 ) ) − dx yf (( 2 ) ) = 1. Or, equivalently d2 − yf = δ 1 (x) dx2 2 where δx0 is the Dirac delta distribution at x = x0. Theorem 2 (Orthogonal Decomposition X = M ⊕ M ⊥) Let X be a Hilbert space and M is a linear subspace of X. Let M be the closure of M and M ⊥ = {z ∈ X : (z, y) = 0 for all y ∈ M} be the orthogonal complement of M. Then every element u ∈ X has the unique representation

⊥ u = u1 + u2, u1 ∈ M and u2 ∈ M .

Proof: Consider the minimum distance problem

min |x − u|2 over x ∈ M. (3.3)

As for the proof of Theorem 1 one can prove that there exists a unique solution x∗ for problem (3.3) and x∗ ∈ M satisfies

(x∗ − u, y) = 0, for all y ∈ M

∗ ⊥ ∗ ∗ ⊥ i.e., u − x ∈ M . If we let u1 = x ∈ M and u2 = u − x ∈ M , then u = u1 + u2 is the desired decomposition.  P Example (Closed subspace) (1) M = span{(yk, k ∈= {x ∈ X; x = k∈K αk yk} is a subspace of X. If K is finite, M is closed. (2) Let X = L2(−1, 1) and M = {odd functions} = {f ∈ X : f(−x) = f(x), a.e.} and 2 then M is closed. Let M1 = span{sin(kπt)} and M2 = span{cos(kπt)}. L (0, 1) = M1 ⊕ M2. (3) Let E ∈ L(X,Y ). Then M = N(E) = {x ∈ X : Ex = 0} is closed. In fact, if xn ∈ M and xn → x in X, then 0 = limn→∞ Exn = E limn→∞ xn = Ex and thus x ∈ N(E) = M. Example (Hodge-Decomposition) Let X = L2(Ω)3 with where Ω is a bounded open set in R3 with Lipschitz boundary Γ. Let M = {∇u, u ∈ H1(Ω)} is the gradient field. Then, M ⊥ = {ψ ∈ X : ∇ · ψ = 0, n · ψ = 0 at Γ}

43 is the divergence free field. Thus, we have the Hodge decomposition

X = M ⊕ M ⊥ = {grad u, u ∈ H1(Ω)} ⊕ {the divergence free field}.

In fact ψ ∈ M ⊥ Z Z Z ∇φ · ψ = n · ψφ ds − ∇ψφ dx = 0 Γ Ω and ∇u ∈ M, u ∈ H1(Ω)/R is uniquely determined by Z Z ∇u · ∇φ dx = f · ∇φ dx Ω Ω for all φ ∈ H1(Ω). Theorem 3 (Decomposition) Let X,Y be Hilbert spaces and E ∈ L(X,Y ). Let ∗ ∗ E ∈ L(Y,X) is the adjoint of E, i.e., (Ex, y)Y = (x, E y)X for all x ∈ X and y ∈ Y . Then N(E) = R(E∗)⊥ and thus from the orthogonal decomposition theorem X = N(E) ⊕ R(E∗). Proof: Suppose x ∈ N(E), then

∗ (x, E y)X = (Ex, y)Y = 0 for all y ∈ Y and thus N(E) ⊂ R(E∗)⊥. Conversely, x∗ ∈ R(E∗)⊥, i.e.,

(x∗,E∗y) = (Ex, y) = 0 for all y ∈ Y

∗ ∗ ⊥ Thus, Ex = 0 and R(E ) ⊂ N(E).  Remark Suppose R(E) is closed in Y , then R(E∗) is closed in X and

Y = R(E) ⊕ N(E∗),X = R(E∗) ⊕ N(E)

∗ Thus, we have b = b1 + b2, b1 ∈ R(E), b2 ∈ N(E ) for all b ∈ Y and

2 2 2 |Ex − b| = |Ex − b1| + |b2| . Example (R(E) is not closed) Consider X = L2(0, 1) and define a linear operator E ∈ L(X) by Z t (Ef)(t) = f(s) ds, t ∈ (0, 1) 0 Since R(E) ⊂ {absolutely continuous functions on [0, 1]}, R(E) = L2(0, 1) and R(E) is not a closed subspace of L2(0, 1).

3.2 Minimum norm problem We consider 2 min |x| subject to (x, yi)X = ci, 1 ≤ i ≤ m. (3.4)

Theorem 4 (Minimum Norm) Suppose {yi} are linearly independent. Then there ∗ ∗ Pm m exits a unique solution x to (3.4) and x = i=1 βiyi where β = {βi}i=1 satisfies

Gβ = c, Gi,j = (yi, yj).

44 m ⊥ ∗ Proof: Let M = span{yi}i=1. Since X = M ⊕ M , it is necessary that x ∈ M, i.e., m ∗ X x = βiyi. i=1 ∗ n From the constraint (x , yi) = ci, 1 ≤ i ≤ m, the condition for β ∈ R follows. In ∗ ⊥ fact, x ∈ X satisfies (x, yi) = ci, 1 ≤ i ≤ m if and only if x = x + z, z ∈ M . But 2 ∗ 2 2 |x| = |x | + |z| .  n,n Remark {yi} is linearly independent if and only if the symmetric matrix G ∈ R is positive definite since n X 2 n (x, Gx)Rn = | xk yk|X for all x ∈ R . k=0 Theorem 5 (Minimum Norm) Suppose E ∈ L(X,Y ). Consider the minimum norm problem min |x|2 subject to Ex = c If R(E) = Y , then the minimum norm solution x∗ exists and x∗ = E∗y, (EE∗)y = c. Proof: Since R(E∗) is closed by the closed range theorem, it follows from the decom- position theorem that X = N(E) ⊕ R(E∗). Thus, we x∗ = E∗y if there exists y ∈ Y such that (EE∗)y = c. Since Y = N(E∗) ⊕ R(E), N(E∗) = {0}. Since if EE∗x = 0, then (x, EE∗x) = |E∗x| = 0 and E∗x = 0. Thus, N(EE∗) = N(E∗) = {0}. Since R(EE∗) = R(E) it follows from the open mapping theorem EE∗ is bounded invertible ∗ and there exits a unique solution to (EE )y = c.  Remark If R(E) is closed and c ∈ R(E), we let Y = R(E) and Theorem 5 is valid. For example we define E ∈ L(X,Rm) by

m Ex = ((y1, x)X , ··· , (ym, x)X ) ∈ R .

m m m If {yi}i=1 are dependent, then span{yi}i=1 is a closed subspace of R Example (DC-motor Control) Let (θ(t), ω(t))) satisfy

d dt ω(t) + ω(t) = u(t), ω(0) = 0 (3.5) d dt θ(t) = ω(t), θ(0) = 0 This models a control problem for the dc-motor and the control function u(t) is the current applied to the motor, θ is the angle and ω is the angular velocity. The objective to find a control u on [0, 1] such the desired state (θ,¯ 0) is reached at time t = 1 and with minimum norm Z 1 |u(t)|2 dt. 0 Let X = L2(0, 1). Given u ∈ X we have the unique solution to (3.5) and

R 1 t−1 ω(1) = 0 e u(t) dt

R 1 t−1 θ(t) = 0 (1 − e )u(t) dt

45 Thus, we can formulate this control problem as problem (3.4) by letting X = L2(0, 1) with the standard inner product and

t−1 t−1 y1(t) = 1 − e , y2(t) = e c1 = θ,¯ c2 = 0.

Example (Linear Control System) Consider the linear control system

d x(t) = Ax(t) + Bu(t), x(0) = x ∈ Rn (3.6) dt 0 where x(t) ∈ Rn and u(t) ∈ Rm are the state and the control function, respectively and system matrices A ∈ Rn×n and B ∈ Rn×m are given. Consider a control problem of finding a control that transfers x(0) = 0 to a desired statex ¯ ∈ Rn at time T > 0, i.e., x(T ) =x ¯ with minimum norm. The problem can be formulated as (3.4). First, we have the variation of constants formula: Z t At A(t−s) x(t) = e x0 + e Bu(s) ds 0

At d At At At where e is the matrix exponential and satisfies dt e = Ae = e A. Thus, condition x(T ) =x ¯ is written as Z T eA(T −t)Bu(t) ds =x ¯ 0 Let X = L2(0,T ; Rm) and it then follows from Theorem 4 that the minimum norm solution is given by t u∗(t) = BtEA (T −t)β, Gβ =x, ¯ provided that the control Gramean

T Z t G = eA(T −s)BBteA (T −t) dt ∈ Rn×n 0 is positive on Rn. If G is positive, then control system (3.6) is controllable and it is equivalent to t BteA x = 0, t ∈ [0,T ] implies x = 0. since T Z t (x, Gx) = |BteA tx|2 dt 0 Moreover, G is positive if and only if the Kalman rank condition holds; i.e.,

rank [B, AB, ··· An−1B] = n.

In fact, (x, Gx) = 0 implies BteAttx = 0, t ≥ 0 and thus Bt(At)k = 0 for all k ≥ 0 and thus the claim follows from the Cayley-Hamilton theorem. Example (Function Interpolation) 1 (1) Consider the minimum norm on X = H0 (0, 1): Z 1 d 2 1 min | u| dx over H0 (0, 1) (3.7) 0 dx

46 subject to u(xi) = ci, 1 ≤ i ≤ m, where 0 < x1 < ··· < xm < 1 are given nodes. 1 Let X = H0 (0, 1). From the mechanical system example we have the piecewise linear function yi(x) ∈ X such that (u, yi)X = u(xi) for all u ∈ X for each i. From Theorem 3 the solution to (3.7) is given by

m ∗ X u (x) = βi yi(x). i=1

∗ That is, u is a piecewise linear function with nodes {xi}, 1 ≤ i ≤ m and nodal values ci ci at xi, i.e., βi = , 1 ≤ i ≤ n. yi(xi) (2) Consider the Spline interpolation problem (1.2):

Z 1 2 1 d 2 min | 2 u(x)| dx over all functions u(x) 0 2 dx

0 subject to the interpolation conditions u(xi) = bi, u (xi) = ci, 1 ≤ i ≤ m. (3.8) 2 2 Let X = H0 (0, 1) be the completion of the pre-Hilbert space C0 [0, 1] = the space of twice continuously differentiable function with u(0) = u0(0) = 0 and u(1) = u0(1) = 0 2 with respect to H0 (0, 1) inner product;

Z 1 d2 d2 (f, g)H2 = 2 f 2 g dx, 0 0 dx dx i.e.,

d d2 H2(0, 1) = {u ∈ L2(0, 1) : u, u ∈ L2(0, 1) with u(0) = u0(0) = u(1) = u0(1) = 0}. 0 dx dx2

2 ∗ Problem (3.8) is a minimum norm problem on x = H0 (0, 1). For Fi ∈ X defined by Fi(v) = v(xi) we have

Z 1 d2 d2 (y , v) − F (v) = ( y + (x − x )χ ) v dx, (3.9) i X i 2 i i [0,xi] 2 0 dx dx

∗ 0 and for F˜i ∈ X defined by F˜i(v) = v (xi) we have

Z 1 d2 d2 (˜y , v) − F˜ (v) = ( y˜ − χ ) v dx. (3.10) i X i 2 i [0,xi] 2 0 dx dx Thus, d2 d2 y + (x − x )χ = a + bx, y˜ − χ =a ˜ + ˜bx dx2 i i [0,xi] dx2 i [0,xi] where a, a˜ , b, ˜b are constants and yi = RFi andy ˜i = RF˜i are piecewise cubic functions 0 0 on [0, 1]. One can determine yi, y˜i using the conditions y(0) = y (0) = y(1) = y (0) = 0. From Theorem 3 the solution to (3.8) is given by

m ∗ X u (x) = (βi yi(x) + β˜i y˜i(x)) i=1

47 ∗ That is, u is a piecewise cubic function with nodes {xi}, 1 ≤ i ≤ m and nodal value ci and derivative bi at xi. Example (Differential form, Natural Boundary Conditions) (1) Let X be the Hilbert space defined by du X = H1(0, 1) = {u ∈ L2(0, 1) : ∈ L2(0, 1)}. dx Given a, c ∈ C[0, 1] with a > 0 c ≥ 0 and α ≥ 0 consider the minimization Z 1 1 du α min ( (a(x)| |2 + c(x)|u|2) − uf) dx + |u(1)|2 − gu(1) 0 2 dx 2 over u ∈ X = H1(0, 1). If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies Z 1 du∗ dv (a(x) + c(x) u∗v − fv) dx + α u(1)v(1) − gv(1) = 0 (3.11) 0 dx dx for all v ∈ X. By the integration by parts Z 1 d du∗ du∗ du∗ (− (a(x) )+c(x)u∗(x)−f(x))v(x) dx−a(0) (0)v(0)+(a(1) (1)+α u∗(1)−g)v(1) = 0, 0 dx dx dx dx du∗ 1 1 assuming a(x) dx ∈ H (0, 1). Since v ∈ H (0, 1) is arbitrary, d du∗ − (a(x) ) + c(x) u∗ = f(x), x ∈ (0, 1) dx dx (3.12) du∗ du∗ a(0) (0) = 0, a(1) (1) + α u∗(1) = g. dx dx (3.11) is the weak form and (3.12) is the strong form of the optimality condition. (2) Let X be the Hilbert space defined by du d2u X = H2(0, 1) = {u ∈ L2(0, 1) : , ∈ L2(0, 1)}. dx dx Given a, b, c ∈ C[0, 1] with a > 0 b, c ≥ 0 and α ≥ 0 consider the minimization on X = H2(0, 1) Z 1 2 1 d u 2 du 2 2 α 2 β 0 2 0 min ( (a(x)| 2 | +b(x)| | +c(x)|u| )−uf) dx+ |u(1)| + |u (1)| −g1u(1)−g2u (1) 0 2 dx dx 2 2 over u ∈ X = H2(0, 1) satisfying u(0) = 1. If u∗ is a minimizer, it follows from Theorem 1 that u∗ ∈ X satisfies Z 1 2 ∗ 2 ∗ d u d v du dv ∗ ∗ ∗ 0 0 0 (a(x) 2 2 +b(x) +c(x) u v−fv) dx+α u (1)v(1)+β (u ) (1)v (1)−g1v(1)−g2v (1) = 0 0 dx dx dx dx (3.13) for all v ∈ X satisfying v(0) = 0. By the integration by parts Z 1 2 2 d d ∗ d d ∗ ∗ ( 2 (a(x) 2 u ) − (b(x) u ) + c(x)u (x) − f(x))v(x) dx 0 dx dx dx dx

d2u∗ du∗ d2u∗ +(a(1) (1) + β (1) − g )v0(1) − a(0) (0)v0(0) dx2 dx 2 dx2

d d2u∗ du∗ d d2u∗ du∗ +(− (a(x) + b(x) + α u∗ − g )(1)v(1) − (− (a(x) ) + b(x) )(0)v(0) = 0 dx dx2 dx 1 dx dx2 dx

48 assuming u∗ ∈ H4(0, 1). Since v ∈ H2(0, 1) satisfying v(0) = 0 is arbitrary,

d2 d2 d du∗ (a(x) u∗) − (b(x) ) + c(x) u∗ = f(x), x ∈ (0, 1) dx2 dx2 dx dx

d2u∗ u∗(0) = 1, a(0) (0) = 0 dx2

d2u∗ du∗ d d2u∗ du∗ a(1) (1) + β (1) − g = 0, (− (a(x) ) + b(x) )(1) + α u∗(1) − g = 0. dx2 dx 2 dx dx2 dx 1 (3.14) (3.13) is the weak form and (3.14) is the strong (differential) form of the optimality condition. Next consider the multi dimensional case. Let Ω be a subdomain in Rd with Lips- chitz boundary and Γ0 is a closed subset of the boundary ∂Ω with Γ1 = ∂Ω \ Γ0. Let Γ is a smooth hypersurface in Ω.

1 1 HΓ0 (Ω) = {u ∈ H (Ω) : u(s) = 0, s ∈ Γ0} with norm |∇u|L2(Ω). Let Γ is a smooth hyper-surface in Ω and consider the weak form of equation for u ∈ H1 (Ω): Γ0 Z Z (σ(x)∇u · ∇φ + u(x)(~b(x) · ∇φ) + c(x) u(x)φ(x)) dx + α(s)u(s)φ(s) ds Γ1 Z Z Z − g0(s)φ(s) ds − f(x)φ(x) dx − g(s)u(s) ds = 0 Γ Ω Γ1 for all H1 (Ω). Then the differential form for u ∈ H1 (Ω) is given by Γ0 Γ0  −∇ · (σ(x)∇u +~bu) + c(x)u(x) = f in Ω    n · (σ(x)∇u +~bu) + α u = g at Γ1     [n · (σ(x)∇u +~bu)] = g0 at Γ.

By the divergence theorem if u satisfies the strong form, then u is a weak solution. Conversely, letting φ ∈ C0(Ω \ Γ) we obtain the first equation, i.e., ∇ · (σ∇u + ~bu) ∈ L2(Ω) Thus, n·(σ∇u+~bu) ∈ L2(Γ) and the third equation holds. Also, n·(σ∇u+~bu) ∈ 2 L (Γ1) and the third equation holds.

3.3 Lax-Milgram Theory and Applications Let H be a Hilbert space with scalar product (φ, ψ) and X be a Hilbert space and X ⊂ H with continuous dense injection. Let X∗ denote the of X. H is identified with its dual so that X ⊂ H = H∗ ⊂ X∗ (i.e., H is the pivoting space). The dual product hφ, ψi on X∗ × X is the continuous extension of the scalar product of H restricted to H × X. This framework is called the Gelfand triple. Let σ is a bounded coercive bilinear form on X ×X. Note that given x ∈ X, F (y) = σ(x, y) defines a bounded linear functional on X. Since given x ∈ X, y → σ(x, y) is a

49 bounded linear functional on X, say x∗ ∈ X∗. We define a linear operator A from X into X∗ by x∗ = Ax. Equation σ(x, y) = F (y) for all y ∈ X is equivalently written as an equation Ax = F ∈ X∗. (3.15) It from the Lax-Milgram theorem that A has the bouned inverse A−1 ∈ L(X∗,X) Also,

hAx, yiX∗×X = σ(x, y), x, y ∈ X, and thus A is a bounded linear operator. In fact,

|Ax|X∗ ≤ sup |σ(x, y)| ≤ M |x|. |y|≤1

Let R be the Riesz operator X∗ → X, i.e.,

∗ ∗ ∗ ∗ |Rx |X = |x | and (Rx , x)X = hx , xi for all x ∈ X, then Aˆ = RA represents the linear operator Aˆ ∈ L(X,X). Moreover, we define a linear operator A˜ on H by Ax˜ = Ax ∈ H with dom (A˜) = {x ∈ X : |σ(x, y)| ≤ cx |y|H for all y ∈ X}. That is, A˜ is a restriction of A on dom (A˜). It thus follows from the Lax-Milgram theorem dom (A˜) = A−1H. We will use the symbol A for all three linear operators as above in the lecture note and its use should be understood by the underlining context.

3.4 Quadratic Constrained Optimization In this section we discuss the constrained minimization. First, we consider the case of equality constraint. Let A be coercive and self-adjoint operator on X and E ∈ L(X,Y ). Let σ : X × X → R is a symmetric, bounded, coercive bilinear form, i.e.,

σ(α x1 + β x2, y) = α σ(x1, y) + β σ(x2, y) for all x1, x2, y ∈ X and α, β ∈ R

2 σ(x, y) = σ(y, x), |σ(x, y)| ≤ M |x|X |y|X , σ(x, x) ≥ δ |x| for some 0 < δ ≤ M < ∞. Thus, σ(x, y) defines an inner product on X. In fact, since σ is bounded and coercive, pσ(x, x) = p(Ax, x) is an equivalent norm of X. Since y → σ(x, y) is a bounded linear functional on X, say x∗ and define a linear ∗ ∗ ∗ operator from X to X by x = Ax. Then, A ∈ L(X,X ) with |A|X→X∗ ≤ M and ∗ hAx, yiX∗×X = σ(x, y) for all x, y ∈ X. Also, if RX∗→X is the Riesz map of X and define Aˆ = RX∗→X A ∈ LX,X), then

(Ax,ˆ y)X = σ(x, y) for all x, y ∈ X,

50 and thus Aˆ is a self-adjoint operator in X. Throughout this section, we assume X = X∗ and Y = Y ∗, i.e., we identify A as the self-adjoint operator A˜. Consider the unconstrained minimization problem: for f ∈ X∗ 1 min F (x) = σ(x, x) − f(x). (3.16) 2 Since for x, v ∈ X and t ∈ R

F (x + tv) − F (x) t = σ(x, v) − f(v) + σ(v, v), t 2 x ∈ X → F (x) ∈ R is differentiable and we have the necessary optimality for (3.16):

σ(x, v) − f(v) = 0 for all v ∈ X. (3.17)

Or, equivalently Ax = f in X∗ ∗ Also, if RX∗→X is the Riesz map of X and define A˜ = RX∗→X A ∈ L(X,X), then

(Ax,˜ y)X = σ(x, y) for all x, y ∈ X, and thus A˜ is a self-adjoint operator in X. Throughout this section, we assume X = X∗ and Y = Y ∗, i.e., we identify A as the self-adjoint operator A˜. Consider the equality constraint minimization; 1 min (Ax, x) − (a, x) subject to Ex = b ∈ Y, (3.18) 2 X X or equivalently for f ∈ X∗ 1 min σ(x, x) − f(x) subject to Ex = b ∈ Y. 2 The necessary optimality for (3.18) is: Theorem 6 (Equality Constraint) Suppose there exists ax ¯ ∈ X such that Ex¯ = b then (3.18) has a unique solution x∗ ∈ X and Ax∗ − a ⊥ N(E). If R(E∗) is closed, then there exits a Lagrange multiplier λ ∈ Y such that

Ax∗ + E∗λ = a, Ex∗ = b. (3.19)

If range(E) = Y (E is surjective), there exits a Lagrange multiplier λ ∈ Y is unique. Proof: Suppose there exists ax ¯ ∈ X such that Ex¯ = b then (3.18) is equivalent to 1 min (A(z +x ¯), z +x ¯) − (a, z +x ¯) over z ∈ N(E). 2 Thus, it follows from Theorem 2 that it has a unique minimizer z∗ ∈ N(E) and (A(z∗ +x ¯) − a, v) = 0 for all v ∈ N(E) and thus A(z∗ +x ¯) − a ⊥ N(E). From the orthogonal decomposition theorem Ax∗ − a ∈ R(E∗). Moreover if R(E∗) is surjective, from the closed range theorem R(E∗) = R(E∗) and there exist exists a λ ∈ Y satisfying

51 Ax∗ +E∗λ = a. If E is surjective (R(E) = Y ), then N(E∗) = 0 and thus the multiplier λ is unique.  Remark (Lagrange Formulation) Define the Lagrange functional

L(x, λ) = J(x) + (λ, Ex − b).

The optimality condition (3.19) is equivalent to ∂ ∂ L(x∗, λ) = 0, L(x∗, λ) = 0. ∂x ∂λ Here, x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y is the dual variable. Remark (1) If E is not surjective, one can seek a solution using the penalty formula- tion: 1 β min (Ax, x) − (a, x) + |Ex − b|2 (3.20) 2 X X 2 Y for β > 0. The necessary optimality condition is

∗ Axβ + β E (Exβ − b) − a = 0

If we let λβ = β (Ex − b), then we have

 ∗      AE xβ a I = . E − β λβ b the optimality system (3.19) is equivalent to the case when β = ∞. One can prove the monotonicity J(xβ) ↑, |Exβ − b|Y ↓ as β → ∞ In fact for β > βˆ we have

β 2 β 2 J(xβ) + 2 |Exβ − b|Y ≤ J(xβˆ) + 2 |Exβˆ − b|Y

βˆ 2 βˆ 2 J(xβˆ) + 2 |Exβˆ − b|Y ≤ J(xβ) + 2 |Exβ − b|Y . Adding these inequalities, we obtain

ˆ 2 2 (β − β)(|Exβ − b|Y − |Exβˆ − b|Y ) ≤ 0, which implies the claim. Suppose there exists ax ¯ ∈ X such that Ex¯ = b then from Theorem 6 β J(x ) + |Ex − b|2 ≤ J(x∗). β 2 β and thus |xβ|X is bounded uniformly in β > 0 and

lim |Exβ − b| = 0 β→∞

Since X is a Hilbert space, there exists a weak limit x ∈ X with J(x) ≤ J(x∗) and Ex = b since norms are weakly sequentially lower semi-continuous. Since the optimal ∗ ∗ solution x to (3.18) is unique, lim xβ = x as β → ∞. Suppose λβ is bounded in Y ,

52 ∗ then there exists a subsequence of λβ that converges weakly to λ in Y . Then, (x , λ) ∗ ∗ ∗ satisfies Ax + E λ = a. Conversely, if there is a Lagrange multiplier λ ∈ Y , then λβ is bounded in Y . In fact, we have

∗ ∗ ∗ ∗ ∗ (A(xβ − x ), xβ − x ) + (xβ − x ,E (λβ − λ )) = 0 where

∗ ∗ ∗ ∗ 2 ∗ β (xβ − x ,E (λβ − λ )) = β (Exβ − b, λβ − λ ) = |λβ| − (λβ, λ ).

Thus we obtain ∗ ∗ 2 ∗ β (xβ − x ,A(xβ − x )) + |λβ| = (λβ, λ ) ∗ ∗ and hence |λβ| ≤ |λ |. Moreover, if R(E) = Y , then N(E ) = {0}. Since λβ satisfies ∗ E λβ = a − Axβ ∈ X, we have

∗ −1 λβ = Esβ, sβ = (E E) (a − Axβ) and thus λβ ∈ Y is bounded uniformly in β > 0. (2) If E is not surjective but R(E) is closed. We have λ ∈ R(E) ⊂ Y such that

Ax∗ + E∗λ = a by letting Y = R(E) is a Hilbert space. Also, we have (xβ, λβ) → (x, λ) satisfying

Ax + E∗λ = a

Ex = PR(E)b since 2 2 2 |Ex − b| = |Ex − PR(E)b| + |PN(E∗)b| . Example (Parameterized constrained problem)

1 min 2 ((Ay, y) + α (p, p)) − (a, y)

subject to E1y + E2p = b where x = (y, p) ∈ X1 × X2 = X and E1 ∈ L(X1,Y ), E2 ∈ L(X2,Y ). Assume Y = X1 and E1 is bounded invertible. Then we have the optimality

 Ay∗ + E∗λ = a  1   ∗ α p + E2 λ = 0    ∗ ∗  E1y + E2p = b

1 1 Example (Stokes system) Let X = H0 (Ω) × H0 (Ω) where Ω is a bounded open set in 2 1 1 R . A Hilbert space H0 (Ω) is the completion of C0(Ω) with respect to the H0 (Ω) inner product Z (f, g)H1(Ω) = ∇f · ∇g dx 0 Ω

53 Consider the constrained minimization Z 1 2 2 ( (|∇u1| + |∇u2| ) − f · u) dx Ω 2 over the velocity field u = (u1, u2) ∈ X satisfying ∂ ∂ div u = u1 + u2 = 0. ∂x1 ∂x2 From Theorem 5 with Y = L2(Ω) and Eu = div u on X we have the necessary opti- mality −∆u + gradp = f, div u = 0 which is called the Stokes system. Here f ∈ L2(Ω) × L2(Ω) is a applied force p = −λ∗ ∈ L2(Ω) is the pressure.

3.5 Variational Inequalities Let Z be a Hilbert lattice with order ≤ and Z∗ = Z and G ∈ L(X,Z). For exam- ple, Z = L2(Ω) with a.e. pointwise inequality constarint. Consider the quadratic programming: 1 min (Ax, x) − (a, x) subject to Ex = b, Gx ≤ c. (3.21) 2 X X Note that the constrain set C = {x ∈ X, Ex = b, Gx ≤ c} is a closed convex set in X. In general we consider the constrained problem on a Hilbert space X:

F0(x) + F1(x) over x ∈ C (3.22)

1 where C is a closed convex set, F : X → R is C and F1 : X → R is convex, i.e.

F1((1 − λ)x1 + λ x2) ≥ (1 − λ) F1(x1) + λ F (x2) for all x, x2 ∈ X and 0 ≤ λ ≤ 1.

Then, we have the necessary optimality in the form of the variational inequality: Theorem 7 (Variational Inequality) Suppose x∗ ∈ C minimizes (3.22), then we have the optimality condition

0 ∗ ∗ ∗ F0(x )(x − x ) + F1(x) − F1(x ) ≥ 0 for all x ∈ C. (3.23)

∗ ∗ Proof: Let xt = x + t (x − x ) for all x ∈ C and 0 ≤ t ≤ 1. Since C is convex, xt ∈ C. Since x∗ ∈ C is optimal,

∗ ∗ F0(xt) − F0(x ) + F1(xt) − F1(x ) ≥ 0, (3.24) where ∗ F1(xt) ≤ (1 − t)F1(x ) + tF (x) and thus ∗ ∗ F1(xt) − F1(x ) ≤ t (F1(x) − F1(x )).

54 Since F (x ) − F (x∗) 0 t 0 → F 0(x∗)(x − x∗) as t → 0+, t 0 + the variational inequality (3.23) follows from (3.24) by letting t → 0 .  Example (Inequality constraint) For X = Rn F 0(x∗)(x − x∗) ≥ 0, x ∈ C implies that F 0(x∗) · ν ≤ 0 and F 0(x∗) · τ = 0, where ν is the outward normal and τ is the tangents of C at x∗. Let X = R2, C = [0, 1] × [0, 1] and assume x∗ = [1, s], 0 < s < 1. Then, ∂F ∂F (x∗) ≤ 0, (x∗) = 0. ∂x1 ∂x2 If x∗ = [1, 1] then ∂F ∂F (x∗) ≤ 0, (x∗) ≤ 0. ∂x1 ∂x2 Example (L1-optimization) Let U be a closed convex subset in R. Consider the mini- mization problem on X = L2(Ω) and C = {u ∈ U a.e. in Ω}; Z 1 2 min |Eu − b|Y + α |u(x)| dx. over u ∈ C. (3.25) 2 Ω Then, the optimality condition is Z ∗ ∗ ∗ (Eu − b, E(u − u ))Y + α (|u| − |u |) dx ≥ 0 (3.26) Ω for all u ∈ C. Let p = −E∗(Eu∗ − b) ∈ X and u = v ∈ U on |x − x¯| ≤ δ, otherwise u = u∗. From (4.31) we have 1 Z (−p(v − u∗(x)) + α (|v| − |u∗(x)|) dx ≥ 0. δ |x−x¯|≤δ Suppose u∗ ∈ L1(Ω) and letting δ → 0+ we obtain −p(v − u∗(¯x)) + α (|v| − |u∗(¯x)|) ≥ 0 a.e.x ¯ ∈ Ω (Lebesgue points of |u∗| and for all v ∈ U. That is, u∗ ∈ U minimizes −pu + α |u| over u ∈ U. (3.27) ∗ u∗ Suppose U = R it implies that if |p| < α, then u = 0 and otherwise p = α |u∗| = α sign(u∗). Thus, we obtain the necessary optimality (E∗(Eu∗ − b))(x) + α ∂|u|(u∗(x)) = 0, a.e. in Ω, where the sub-differential of |u| (see, Section ) is defined by   1 s > 0 ∂|u|(s) = [−1, 1] s = 0  −1 s < 0.

55 3.5.1 Inequality constraint problem Consider the quadratic programing problem. Theorem 8 (Quadratic Programming) Let Z be a Hilbert space lattice with order ≤. For E ∈ L(X,Y ) and G ∈ L(X,Z) consider the quadratic programming (3.21): 1 min (Ax, x) − (a, x) subject to Ex = b, Gx ≤ c. 2 X If there existsx ¯ ∈ X such that Ex¯ = b and Gx¯ ≤ c, then there exits unique solution to (3.21) and (Ax∗ − a, x − x∗) ≥ 0 for all x ∈ C, (3.28) where C = {Ex = b, Gx ≤ c}. Let Z = Z+ + Z− with positive corn Z+ and negative cone Z−. Moreover, if we assume the regular point condition

0 ∈ int {(E(x − x∗),G(x − x∗) − Z− + G(x∗) − c): x ∈ X}, (3.29) then there exists λ ∈ Y and µ ∈ Z+ such that

 Ax∗ + E∗λ + G∗µ = a    Ex∗ = b (3.30)    ∗ ∗  Gx − c ≤ 0, µ ≥ 0 and (Gx − c, µ)Z = 0.

Proof: First note that (3.21) is equivalent to 1 min (A(z +x, ¯ z +x ¯) − (a, z +x ¯) subject to Ez = 0, Gz ≤ c − Gx¯ =c ˆ 2 X X

Since Cb = {Ez = 0, Gz ≤ cˆ} is a closed convex set in the Hilbert space N(E). As in the proof of Theorem 1 the minimizing sequence zn ∈ Cb satisfies 1 |z − z | ≤ J(z ) + J(z ) − 2 δ → 0 4 m n n m as m ≥ n → ∞. Since Cb is closed, there exists a unique minimizer z∗ ∈ Cb and z∗ satisfies (A(z∗ +x ¯) − a, z − z∗) ≥ 0 for all z ∈ Cb, which is equivalent to (3.28). Moreover, it follows from the Lagrange Multiplier theory (below) if the regular point condition holds, then we obtain (3.30).  m ∗ Example Suppose Z = R . If (Gx − c)i = 0, then i is called an active index and let Ga the (row) vectors of G corresponding to the active indices. Suppose

 E  is surjective (Kuhn-Tuker condition), Ga then the regular point condition holds. Remark (Lagrange Formulation) Define the Lagrange functional;

L(x, λ, µ) = J(x) + (λ, Ex − b) + (µ, (Gx − c)+).

56 for (x, λ) ∈ X × Y, µ ∈ Z+; The optimality condition (3.30) is equivalent to ∂ ∂ ∂ L(x∗, λ, µ) = 0, L(x∗, λ, µ) = 0, L(x∗, λ, µ) = 0 ∂x ∂λ ∂µ Here x ∈ X is the primal variable and the Lagrange multiplier λ ∈ Y and µ ∈ Z+ are the dual variables. Remark Let A ∈ L(X,X∗) and E : X → Y is surjective. If we define G is natural injection X to Z. Define µ ∈ X∗ by

−µ = Ax∗ − a + E∗λ.

Since ht(x∗ − c) − (x∗ − c), µi = (t − 1) hx∗ − c, µi ≤ 0 for t > 0 we have hx∗ − c, µi = 0. Thus, hx − c, µi ≤ 0 for all x − c ∈ Z+ ∩ X ∗ If we define the dual cone X+ by ∗ ∗ ∗ ∗ X+ = {x ∈ X : hx , xi ≤ 0 for all x ∈ X and x ≤ 0},

∗ + then µ ∈ X+. Thus, the regular point condition gives the stronger regularity µ ∈ Z for the Lagrange multiplier µ. Example (Source Identification) Let X = Z = L2(Ω) and S : L(X,Y ) and consider the constrained minimization: Z 1 2 α 2 min |Sf − y|Y + f dx + |f| 2 Ω 2 subject to −f ≤ 0. The second term represents the L1(Ω) regularization for the sparsity of f. In this case, A = α I + S∗S. α > 0, a = S∗y − β and G = −I. The regular point condition (??) is given by

0 ∈ int {(−Z + f ∗) − Z− − f ∗},

Since −Z + f ∗ contains Z− and the regular point condition holds. Hence there exists µ ∈ Z+ such that

α f ∗ + S∗Sf ∗ + β + µ = S∗y, µ ≥ 0 and (−f ∗, µ) = 0.

For example, the operator S is defined for Poisson equation:

−∆u = f in Ω ∂ with a boundary condition + u = 0 at Γ = ∂Ω and Sf = Cu with a measurement ∂u ˆ operator C, e.g., Cu = u|Γ or Cu = u|Ωˆ with Ω, a subdomain of Ω. Remark (Penalty Method) Consider the penalized problem

β β J(x) + |Ex − b|2 + | max(0, Gx − c)|2 . (3.31) 2 Y 2 Z The necessary optimality condition is given by

∗ ∗ + Ax + E λβ + G µβ = a, λβ = β (Exβ − b), µβ = β (Gxβ − c) .

57 It follows from Theorem 7 that there exists a unique solution xβto (3.21) and satisfies β β J(x ) + |Ex − b|2 + | max(0, Gx − c)|2 ≤ J(x∗) (3.32) β 2 β Y 2 β Z

Thus, {xβ} is bounded in X and has a weak convergence subsequence that converges to x in X. Moreover, we have

Ex = b and max(0, Gx − c) = 0 and J(x) ≤ J(x∗) since the norm is weakly sequentially lower semi-continuous. Since ∗ ∗ the minimizer to (3.21) is unique, x = x and limβ→∞ J(xβ) = J(x ) which implies ∗ that xβ → x strongly in X. Suppose λβ ∈ Y and µβ ∈ Z are bounded, there exists a subsequence that converges ∗ weakly to (λ, µ) in Y × Z and µ ≥ 0. Since limβ→∞ J(xβ) = J(x ) it follows from (3.32) that (µβ, Gxβ − c) → 0 as β → ∞ and thus (µ, Gx∗ − c) = 0. Hence (x∗, λ, µ) satisfies (3.30). Conversely, if there is a ∗ ∗ Lagrange multiplier (λ , µ ) ∈ Y , then (λβ, µβ) is bounded in Y × Z. In fact, we have

∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ (xβ − x ,A(xβ − x )) + (xβ − x ,E (λβ − λ )) + (xβ − x ,G (µβ − µ )) ≤ 0 where

∗ ∗ ∗ ∗ 2 ∗ β (xβ − x ,E (λc − λ )) = β (Exβ − b, λβ − λ ) = |λβ| − (λβ, λ ) and

∗ ∗ ∗ ∗ ∗ 2 ∗ β (xβ − x ,G (µβ − µ )) = β (Gxβ − c − (Gx − c), µβ − µ ) ≥ |µβ| − (µβ, µ ).

Thus we obtain

∗ ∗ 2 2 ∗ β (xβ − x ,A(xβ − x )) + |λβ| + |µβ| ≤ (λβ, λ) + (µβ, µ )

∗ ∗ and hence |(λβ, µβ)| ≤ |(λ , µ )|. Moreover, we have Lemma Consider the penalized problem β β J(x) + |E(x)|2 + | max(0,G(x))|2 . 2 Y 2 Z over x ∈ C, where C is a closed convex set. Let xβ is a minimizer. Then

λβ = β E(xβ), µβ = max(0, Gxβ) are uniformly bounded. Proof: The necessary optimality condition is given by

0 0 ∗ 0 ∗ (J (xβ) + (E (xβ)) λβ + (G (xβ)) µβ, x − xβ) ≥ 0 for all x ∈ C

λβ = β E(xβ), µβ = β max(0,G(xβ)).

58 For all feasible solutionx ¯ ∈ C β β J(x ) + |E(x )|2 + | max(0,G(x ))|2 ≤ J(¯x). (3.33) β 2 β Y 2 β Z

Assume that {xβ} is bounded in X (for example, J is J(x) → ∞ as |x| → ∞, x ∈ |calC or C is bounded) and thus has a subsequence that converges weakly to x∗ ∈ C. ∗ ∗ Moreover, if E(xβ) → E(x ) and G(xβ) → G(x ) weakly in Y and Z, respectively (i.e., E and G are weakly sequentially continuous), then

E(x∗) = 0 and max(0,G(x∗)) = 0. and J(x∗) ≤ J(¯x). Since J is weakly sequentially lower semi-continuous. Hence x∗ is ∗ ∗ a minimizer of (3.34) and limβ→∞ J(xβ) = J(x ). If J is convex, xβ → x in X. Suppose λβ ∈ Y and µβ ∈ Z are bounded, there exists a subsequence that converges ∗ weakly to (λ, µ) in Y × Z and µ ≥ 0. Since limβ→∞ J(xβ) = J(x ) it follows from (3.33) that (µβ,G(xβ)) → 0 as β → ∞ and thus (µ, G(x∗)) = 0. Hence (x∗, λ, µ) satisfies (3.39). Next we will show that λβ ∈ Y and µβ ∈ Z are bounded under the regular pint condition. Note that

0 0 ∗ ∗ 0 0 ∗ ∗ 0 ∗ (λβ,E (xβ)(x − xβ)) = (λβ,E (x )(x − x ) + (E (xβ) − E (x ))(x − x ) + E (xβ)(x − xβ))

0 0 ∗ ∗ 0 0 ∗ ∗ 0 ∗ (µβ,G (xβ)(x − xβ)) = (µβ,G (x )(x − x ) + (G (xβ) − G (x ))(x − x ) + G (xβ)(x − xβ)) and it follows from the regular point condition that there exists ρ > 0 and x ∈ C and z ≤ 0 such that 0 ∗ ∗ −ρ λβ = E (x )(x − x )

0 ∗ ∗ ∗ −ρ µβ = G (x )(x − x ) − z + G(x ). Since for all x ∈ C

0 0 0 (J (xβ), x − xβ) + (λβ,E (xβ)(x − xβ) + (µβ,G (xβ)(x − xβ)) ≥ 0 we obtain

2 2 0 ∗ 0 0 ∗ ∗ ρ (|λβ| + |µβ| ) ≤ (J (xβ), x − xβ) − (λβ, (E (xβ) − E (x ))(x − x ))

0 0 ∗ ∗ ∗ −(µβ, (G (xβ) − G (x ))(x − x )) − (µβ,G(xβ)) + (µβ,G(xβ) − G(x )).

∗ ∗ where we used (µβ, z) ≤ 0. Since |x − x | ≤ ρ M and |xβ − x | → 0, there exist a constant M˜ > 0 such that for sufficiently large β > 0 ˜ |λβ| + |µβ| ≤ M. Example (Quadratic Programming on Rn) Let X = Rn. Given a ∈ Rn, b ∈ Rm and c ∈ Rp consider 1 min (Ax, x) − (a, x) 2

subject to Bx = b, Cx ≤ c

59 n×n m×n p×n where A ∈ R is symmetric and positive, B ∈ R and C ∈ R . Let Ca the (row)  B  vectors of C corresponding to the active indices and assuming is surjective, Ca the necessary and sufficient optimality condition is given by

 Ax∗ + Btλ + Ctµ = a    Bx∗ = b    ∗ ∗  (Cx − c)j ≤ 0 µj ≥ 0 and (Cx − c)jµj = 0.

Example (Minimum norm problem with inequality constraint)

2 min |x| subject to (x, yi) ≤ ci, 1 ≤ i ≤ m.

∗ Assume {yi} are linearly independent. Then it has a unique solution x and the necessary and sufficient optimality is given by

∗ Pm x + i=1 µiyi = 0

∗ ∗ (x , yi) − ci ≤ 0, µi ≥ 0 and ((x , yi) − ci)µi = 0, ; 1 ≤ i ≤ m.

That is, if G is Grammian (Gij = (yi, yj)) then

∗ ∗ (−Gx − c)i ≤ 0 µi ≥ 0 and (−Gx − c)iµi = 0, 1 ≤ i ≤ m.

Example (Pointwise Obstacle Problem) Consider

Z 1 1 d Z 1 min | u(x)|2 − f(x)u(x) dx 0 2 dx 0 1 subject to u( ) ≤ c 2

1 over u ∈ X = H0 (0, 1). Let f = 1. The Riesz representation of

Z 1 1 F1(u) = f(t)u(t) dt, F2(u) = u( ) 0 2 are 1 1 1 1 y (x) = x(1 − x). y (x) = ( − |x − |), 1 2 2 2 2 2 respectively. Assume that f = 1. Then, we have 1 1 µ u∗ − y + µ y = 0, u∗( ) = − ≤ c, µ ≥ 0. 1 2 2 8 4

1 ∗ 1 ∗ 1 Thus, if c ≥ 8 , then u = y1 and µ = 0. If c ≤ 8 , then u = y1 − µ y2 and µ = 4(c − 8 ).

60 3.6 Constrained minimization in Banach spaces and La- grange multiplier theory In this section we discuss the general nonlinear programming. Let X,Y be Hilbert spaces and Z be a Hilbert lattice. Consider the constrained minimization;

min J(x) subject to E(x) = 0 and G(x) ≤ 0. (3.34) over a closed convex set C in X. Here, where J : X → R, E : X → Y and G : X → Z are continuously differentiable. Definition (Derivatives) (1) The functional F on a Banach space X is Gateaux differentiable at u ∈ X if for all d ∈ X F (u + td) − F (u) lim = F 0(u)(d) t→0 t exists. If the G-derivative d ∈ X → F 0(u)(d) ∈ R is liner and bounded, then F is Frechet differentiable at u. (2) E : X → Y , Y ia Banach space is Frechet differentiable at u if there exists a linear bounded operator E0(u) such that

|E(u + d) − E(u) − E0(u)d| lim Y = 0. |d|X →0 |d|X

If Frechet derivative u ∈ X → E0(u) ∈ L(X,Y ) is continuous, then E is continuously differentiable. Definition (Lower semi-continuous) (1) A functional F is lower-semi continuous if lim inf F (xn) ≥ F ( lim xn) n→∞ n→∞ (2) A functional F is weakly lower-semi continuous if

lim inf F (xn) ≥ F (w − limn→∞ xn) n→∞ Theorem (Lower-semicontinuous) (1) Norm is weakly lower-semi continuous. (2) A convex lower-semicontinuous functional is weakly lower-semi continuous. ∗ ∗ ∗ Proof: Assume xn → x weakly in X. Let x ∈ F (x), i.e., hx , xi = |x ||x|. Then, we have 2 ∗ |x| = lim hx , xni n→∞ and ∗ ∗ |hx , xni| ≤ |xn||x |. Thus, lim inf |xn| ≥ |x|. n→∞ (2) Since F is convex, X X F ( tkxk) ≤ tk F (xk) k k

61 PP for all convex combination of xk, i.e., k tk = 1, tk ≥ 0. By the Mazur lemma there exists a sequence of convex combination of weak convergent sequence ({xk}, {F (xk)}) to (x, F (x)) in X × R that converges strongly to (x, F (x)) and thus

F (x) ≤ lim inf n → ∞ F (xn). Theorem (Existence) (1) A lower-semicontinuous functional F on a compact set S has a minimizer. (2) A weak lower-semicontinuous, coercive functional F has a minimizer.

Proof: (1) One can select a minimizing sequence {xn} such that F (xn) ↓ η = infx∈S F (x). Since S is a compact, there exits a subsequaence that converges strongly to x∗ ∈ S. Since F is lower-semicontinuous, F (x∗) ≤ η. (2) Since F (x) → ∞ as |x| → ∞, there exits a bounded minimizing sequence {xn} of F . A bounded sequence in a reflexible Banach space X has a weak convergent subsequence ∗ ∗ to x ∈ X. Since F is weakly lower-semicontinuous, F (x ) ≤ η.  Next, we consider the constrained optimization in Banach spaces X,Y :

min F (y) subject to G(y) ∈ K (3.35) over y ∈ C, where K is a closed convex cone of a Banach space Y , F : X → R and G : X → Y are C1 and C is a closed convex set of X. For example, if K = {0}, then G(y) = 0 is the equality constraint. If Y = Z is a Hilbert lattice and

K = {z ∈ Z : z ≤ 0}, then G(y) ≤ 0 is the inequality constraint. Then, we have the Lagrange multiplier theory: Theorem (Lagrange Multiplier Theory) Let y∗ ∈ C be a solution to (3.35) and assume the regular point condition

0 ∈ int {G0(y∗)(C − y∗) − K + G(y∗)} (3.36) holds. Then, there exists a Lagrange multiplier

+ ∗ λ ∈ K = {y ∈ Y :(y, z) ≤ 0 for all z ∈ K} satisfying (λ, G(y ))Y = 0 such that 0 ∗ 0 ∗ ∗ ∗ (F (y ) + G (y ) λ, y − y )X ≥ 0 for all y ∈ C. (3.37) Proof: Let F = {y ∈ C : G(y) ∈ K} be the feasible set. The sequential tangent cone of F at y∗ is defined by

∗ ∗ yn − y + T (F, y ) = {x ∈ X : x = lim , tn → 0 , yn ∈ F}. n→∞ tn It is easy show that

(F 0(y∗), y − y∗) ≥ 0 for all y ∈ T (F, y∗).

Let ∗ ∗ ∗ ∗ C(y ) = ∪s≥0, y∈C {s(y − y )},K(G(y )) = {K − t G(y ), t ≥ 0}.

62 and define the linearizing cone L(F, y∗) of F at y∗ by

L(F, y∗) = {z ∈ C(y∗): G0(y∗)z ∈ K(G(y∗))}.

It follows the Listernik’s theorem [] that

L(F, y∗) ⊆ T (F, y∗), and thus (F 0(y∗), z) ≥ 0 for all z = y − y∗ ∈ L(F, y∗), (3.38) Define the convex cone B by

B = {(F 0(y∗)C(y∗) + R+,G0(y∗)C(y∗) − K(G(y∗))} ⊂ R × Y.

By the regular point condition B has an interior point. From (3.38) the origin (0, 0) is a boundary point of B and thus there exists a hyperplane in R × Y that supports B at (0, 0), i.e., there exists a nontrivial (α, λ) ∈ R × Y such that

α (F 0(y∗)z + r) + (λ, G0(y∗)z − y) ≥ 0 for all z ∈ C(y∗), y ∈ K(G(y∗)) and r ≥ 0. Setting (z, r) = (0, 0), we have (λ, y) ≤ 0 for all y ∈ K(G(y∗)) and thus

λ ∈ K+ and (λ, G(y∗)) = 0.

Letting (r, y) = (0, 0) and z ∈ L(F, y∗), we have α ≥ 0. If α = 0, the regular point condition implies λ = 0, which is a contradiction. Without loss of generality we set α = 1 and obtain (3.37) by setting (r, y) = 0.  Corollary (Nonlinear Programming ) If there existsx ¯ such that E(¯x) = 0 and G(¯x) ≤ 0, then there exits a solution x∗ to (3.34). Assume the regular point condition

0 ∈ int {E0(x∗)(C − x∗) × (G0(x∗)(C − x∗) − Z− + G(x∗))}, then there exists λ ∈ Y and µ ∈ Z such that

(J 0(x∗) + (E0(x∗))∗λ + (G0(x∗)∗µ, x − x∗) ≥ 0 for all x ∈ C,

E(x∗) = 0 (3.39)

∗ ∗ G(x ) ≤ 0, µ ≥ 0 and (G(x ), µ)Z = 0.

3.7 Control problem Next, we consider the parameterized nonlinear programming. Let X,Y,U be Hilbert spaces and Z be a Hilbert lattice. We consider the constrained minimization of the form

min J(x, u) = F (x) + H(u) subject to E(x, u) = 0 and G(x, u) ≤ 0 over u ∈ Cˆ, (3.40) where J : X × U → R, E : X × U → Y and G : X × U → Z are continuously differentiable and Cˆ is a closed convex set in U. Here we assume that for given u ∈ Cˆ

63 there exists a unique x = x(u) ∈ X that satisfies E(x, u) = 0. Thus, one can eliminate x ∈ X as a function of u ∈ U and thus (3.40) can be reduced to the constrained minimization over u ∈ Cˆ. But it is more advantageous to analyze (3.40) directly using Langrange Theorem as Theorem (Lagrange Caluclus) Assume F and H is C1. Suppose that for given v ∈ Cˆ there exists a unique x(t) = x(u + t (v − u)) ∈ X for u(t) = u + t(v − u) in a neighborhood B(x, δ) in X and t ∈ [0, δ] that satisfies E(x(t), u + t (v − u)) = 0. Assume there exist a p satisfies the adjoint equation

∗ 0 Ex(x, u) p + F (x) = 0 (3.41) and for all 0 < t < 1 and v ∈ Cˆ

(E(x(t), u(t)) − E(x, u) − Ex(x(t) − x(0)) − Eu(t(v − u)), p) (3.42) +F (x(t)) − F (x) − F 0(x)(x(t) − x) = o(t)

Then, we have

J(x(t), ut(v − u) − J(x, u) ∗ 0 lim = (Eu(x, u) p + H (u), v − u). t→0 t

Proof For v ∈ Cˆ let u(t) = u + t(v − u) and x(t) = x(u + t(v − u))

0 0 J(x(t), u(t)) − J(x, u) = F (x)(x(t) − x(0)) + H (u)(th) + o(t)1 = o(t), (3.43) 0 1 = F (x(t)) − F (x) − F (x(t))(x(t) − F (x)

From (3.41) 0 F (x)(x(t) − x) = −(Ex(x, u)(x(t) − x(0)), p) (3.44) Since E(x(t), u(t)) = E(x(0), u(0)) = 0 define

Ex(x, u)(x(t) − x(0)) + Eu(x, u)(u(t) − u) =

Ex(x, u)(x(t) − x(0)) + Eu(x, u)(u(t) − u) − E(x(t), u(t) − E(x(0, u(0)) = 2 (3.45) and thus it follows from it follows from (3.43)–(3.45) that

J(x(t), u(t) − J(x, u) = (Eu(x, u)(u(t) − u), p) + H(u)(t (v − u)) + o(t) + 1 + 2 and from the assumption (3.42) the claim holds.  Corollary (Lagrange Multiplier theory (revised)) Let (x∗, u∗) be a minimizer of (3.40), The conditions in Theorem hols at (x∗, u∗). Then the necessary optimality condition is given by

∗ ∗ ∗ 0 ∗ (Eu(x , u ) p + H (u), v − u ) for all v ∈ Cˆ where p satisfies the adjoint equation

∗ ∗ 0 ∗ Ex(x , u )p + F (x ) = 0.

64 1 Remark (1) If E(x, u) is C and Ex(¯x, u¯) is bijective it follows from the implicit function theorem (see, Section??) there exists a neighborhood B((¯x, u¯), δ) such that E(x, u) = 0 has a unique solution x given u. Moreover, t → x(t) ∈ X is continuously differentiable and d E (x, u) x(0) + E (x, u)(v − u) = 0. x dt u Thus, assumption (3.42) holds. ∗ 2 |x(t)−x(0)|2 (2) Suppose E : Y → Y and F are C , p ∈ Y and t → 0 as t → 0, then assumption (3.42) holds. (3) The penalty formulation is given by

β min J (x, u) = F (x) + H(u) + |E(x, u)|2 subject to u ∈ Cˆ. β 2

Assume (xβ, uβ) is a solution. Then the necessary optimality condition is given by

∗ 0 Ex(xβ, uβ) λβ + F (xβ) = 0

∗ 0 (Eu(xβ, uβ) λβ + H (uβ), v − uβ) = 0 for all v ∈ Cˆ (3.46)

λβ E(xβ, uβ) − β = 0.

Consider the iterative method for (xn, un) for as follows. If we apply E(x, u) ∼ + + E(xn, un) + Ex(xn, un)(x − xn) + Eu(u − u). the linearization of E at (xn, un) + + for solutions to (3.46), then we have the update formula for (x , u ), given (xn, un) with 0 < β ≤ ∞ by solving the system:

+ + λ+ E(xn, un) + Ex(xn, un)(x − xn) + Eu(u − u) − β = 0

∗ + 0 + Ex(xn, un) λ + F (x ) = 0

∗ + 0 + + (Eu(xn, un) λ + H (u ), v − u ) = 0 for all v ∈ Cˆ, which is a sequential programming as in Section 10. In general the optimality condition is written as u=Ψ(λ+) and thus the system is for (x+, λ+). Corollary (General case) Let (x∗, u∗) be a minimizer of (3.40) and assume the reg- ular point holds. It follows from Theorem 8 that the necessary optimality condition is given by

 J (x∗, u∗) + E (x∗, u∗)∗λ + G (x∗, u∗)∗µ = 0  x x x    ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ˆ  (Ju(x , u ) + Eu(x , u ) λ + Gu(x , u ) µ, u − u ) ≥ 0 for all u ∈ C (3.47)  E(x∗, u∗) = 0     G(x∗, µ∗) ≤ 0, µ ≥ 0, and (G(x∗, u∗), µ) = 0.

65 Example (Control problem) Consider the optimal control problem

Z T min (`(x(t)) + h(u(t))) dt + G(x(T )) 0 d (3.48) subject to x(t) − f(x, u) = 0, x(0) = x dt 0

u(t) ∈ Uˆ = a closed convex set in Rm.

We let X = H1(0,T ; Rn),U = L2(0,T : Rm),Y = L2(0,T ; Rn) and

Z T d J(x, u) = (`(x) + h(u)) dt + G(x(T )),E(x, u) = f(x, u) − x 0 dt and Cˆ = {u ∈ U, u(t) ∈ Uˆ a.e in (0,T )}. From (3.47) we have

Z T 0 ∗ ∗ ∗ d 0 (` (x (t))φ(t) + (p(t), fx(x , u )φ(t) − φ) dt + G (x(T ))φ(T ) = 0 0 dt for all φ ∈ H1(0,T ; Rn) with φ(0) = 0. By the integration by part

Z T Z T 0 ∗ ∗ ∗ t 0 ∗ dφ ( (` (x ) + fx(x , u ) p) ds + G (x (T )) − p(t), ) dt = 0. 0 t dt Since φ ∈ H1(0,T ; Rn) satisfying φ(0) = 0 is arbitrary, we have

Z T 0 ∗ 0 ∗ ∗ ∗ t p(t) = G (x (T )) + (` (x ) + fx(x , u ) p) ds t and thus the adjoint state p is absolutely continuous and satisfies d − p(t) = f (x∗, u∗)tp(t) + `0(x∗), p(T ) = G0(x∗(T )). dt x From the the optimality, we have

Z T 0 ∗ ∗ ∗ t ∗ (h (u ) + fu(x , u ) p(t))(u(t) − u (t)) dt 0 for all u ∈ Cˆ, which implies

0 ∗ ∗ ∗ t ∗ (h (u (t)) + fu(x (t), u (t)) p(t))(v − u (t)) ≥ 0 for all v ∈ Uˆ at Lebesgue points t of u∗(t). Hence we obtain the necessary optimality is of the form of Two-Point-Boundary value problem;

 d x∗(t) = f(x∗, u∗), x(0) = x  dt 0   0 ∗ ∗ ∗ t ∗ (h (u (t)) + fu(x (t), u (t)) p(t), v − u (t)) ≥ 0 for all v ∈ Uˆ    d ∗ ∗ t ∗ ∗ − dt p(t) = fx(x , u ) p(t) + lx(x ), p(T ) = Gx(x (T )).

66 ˆ m α 2 ˜ Let U = [−1, 1] coordinate-wise and h(u) = 2 |u| , f(x, u) = f(x) + Bu. Thus, the optimality condition implies

∗ t ∗ (α u (t) + B p(t), v − u (t))Rm ≥ 0 fur all |v|∞ ≤ 1.

Or, equivalently u∗(t) minimizes α |u|2 + (Btp(t), u) over |u| ≤ 1 2 ∞ Thus, we have  Btp(t)  1 if − α > 1    t t u∗(t) = B p(t) B p(t) (3.49) − α if | α | ≤ 1    Btp(t) −1 if − α < −1. Example (Coefficient estimation) Let ω be a bounded open set in Rd and Ω˜ be a subset in Ω. Consider the parameter estimation problem Z Z min J(u, c) = |y − u|2 dx + |∇c|2 dx Ω˜ Ω subject to −∆u + c(x)u = f, c ≥ 0 1 1 1 ∗ over (u, c) ∈ H0 (Ω) × H (Ω). We let Y = (H0 (Ω)) and

hE(u, c), φi = (∇u, ∇φ) + (c u, φ)L2(Ω) − hf, φi

1 for all φ ∈ H0 (Ω). Since

∗ hJx + Exλ, hi = (∇h, ∇λ) + (c h, λ)L2(ω) − ((y − u) χΩ˜ , h) = 0,

1 1 for all h ∈ H0 (Ω), i.e., the adjoint λ ∈ H0 (Ω) satisfies

−∆λ + c λ = (y − u) χΩ˜ . Also, there exits µ ≥ 0 in L2(Ω) such that

∗ ∗ hJc + Ec λ, di = α (∇c , ∇d) + (d u, λ)L2(Ω) − (µ, d) = 0 for all d ∈ H1(Ω) and

hµ, c − c∗i = 0 and hµ, di ≥ 0 for all d ∈ H1(Ω) satisfying d ≥ 0.

3.8 Minimum norm solution in Banach space In this section we discuss the minimum distance and the minimum norm problem in Banach spaces. Given a subspace M of a normed space X, the orthogonal complement M ⊥ of M is defined by

M ⊥ = {x∗ ∈ X∗ : hx∗, xi = 0 for all x ∈ M}.

67 Theorem If M is a closed subspace of a normed space, then

(M ⊥)⊥ = {x ∈ X : hx, x∗i = 0 for all x∗ ∈ M ⊥} = M

Proof: If there exits a x ∈ (M ⊥)⊥ but x∈ / M, define a linear functional f by

f(α x + m) = α on the space spanned by x + M. Since x∈ / m and M is closed |f(α x + m)| |f(x + m)| 1 |f| = sup = sup = < ∞, m∈M, α∈R |α x + m|X m∈M |x + m|X infm∈M |x + m|X and thus f is bounded. By the Hahn-Banach theorem, f can be extended to x∗ ∈ X∗. Since hx∗, mi = 0 for all m ∈ M, thus x∗ ∈ M ⊥. It is contradicts to the fact that ∗ hx , xi = 1.  Theorem (Duality I) Given x ∈ X and a subspace M, we have

∗ ∗ d = inf |x − m|X = max hx , xi = hx0, xi, m∈M |x∗|≤1, x∗∈M ⊥

∗ ⊥ ∗ where the maximum is attained by some x0 ∈ M with |x0| = 1. If the infimum on ∗ the first equality holds for some m0 ∈ M, then x0 is aligned with x − m0.

Proof: By the definition of inf, for any  > 0, there exists m ∈ M such that |x−m| ≤ d + . For any x∗ ∈ M ⊥, |x∗| ≤ 1, it follows that

∗ ∗ ∗ hx , xi = hx , x − mi ≤ |x ||x − m| ≤ d + .

∗ ∗ Since  > 0 is arbitrary, hx , xi ≤ d. It only remains to show that hx0, xi = d for some ∗ ∗ x0 ∈ X . Define a linear functional f on the space spanned by x + M by f(α x + m) = α d

Since |f(α x + m)| d |f| = sup = = 1 m∈M, α∈R |α x + m|X infm∈M |x + m|X ∗ f is bounded. By the Hahn-Banach theorem, there exists an extension x0 of f with ∗ ∗ ∗ ⊥ ∗ |x0| = 1. Since hx0, mi = 0 for all m ∈ M, x0 ∈ M . By construction hx0, xi = d. ???Moreover, if |x − m0| = d for some m0 ∈ M, then

∗ ∗ hx0, x − m0i = d = |x0||x − m0|.

3.8.1 Duality Principle

Corollary (Duality) If m0 ∈ M attains the minimum if and only if there exists a ∗ ⊥ nonzero x ∈ M that is aligned with x − m0. ∗ ⊥ Proof: Suppose that x ∈ M is aligned with x − m0,

∗ ∗ ∗ |x − m0| = hx , x − m0i = hx , xi = hx , xi ≤ |x − m| for all m ∈ M. 

68 Theorem (Duality principle) (1) Given x∗ ∈ X∗, we have

∗ ∗ ∗ min |x − m |X∗ = sup hx , xi. m∗∈M ⊥ x∈M, |x|≤1

∗ ⊥ (2) The minimum of the left is achieved by some m0 ∈ M . ∗ ∗ (3) If the supremum on the right is achieved by some x0 ∈ M, then x − m0 is aligned with x0. Proof: (1) Note that for m∗ ∈∈ M ⊥ |x∗ − m∗| = sup hx∗ − m∗, xi ≥ sup hx∗ − m∗, xi = sup hx∗, xi. |x|≤1 |x|≤1, x∈M |x|≤1, x∈M

∗ ∗ ∗ Consider the restriction of x to M (with norm supx∈M, |x|≤1hx , xi. Let y be the ∗ ∗ ∗ ∗ ∗ ⊥ Hahn-Banach extension of this restricted x . Let m0 = x − y . Then m0 ∈ M and ∗ ∗ ∗ ∗ |x − m0| = |y | = supx∈M, |x|≤1hx , xi. Thus, the minimum of the left is achieved ∗ ⊥ by some m0 ∈ M . If the supremum on the right is achieved by some x0 ∈ M, then ∗ ∗ ∗ ∗ ∗ ∗ ∗ x −m0 is aligned with x0. Obviously, |x0| = 1 and |x −m0| = hx , x0i = hx −m0, x0i.  In all applications of the duality theory, we use the alignment properties of the space and its dual to characterize optimum solutions, guarantee the existence of a solution by formulating minimum norm problems in the dual space, and examine to see if the dual problem is easier than the primal problem. Example 1(Chebyshev Approximation) Problem is to to find a polynomial p of degree less than or equal to n that ”best” approximates f ∈ C[0, 1] in the the sense of the sup-norm over [a, b]. Let M be the space of all polynomials of degree less than or equal to n. Then, M is a closed subspace of C[0, 1] of dimension n + 1. The infimum of |f − p|∞ is achievable by some p0 ∈ M since M is closed. The Chebyshev theorem states that the set Γ = {t ∈ [0, 1] : |f(t) − p0(t)| = |f − p0|∞} contains at least n + 2 ⊥ ∗ points. In fact, f −p0 must be aligned with some element in M ⊂ C[0, 1] = BV [0, 1]. Assume Γ contains m < n + 2 points 0 ≤ t1 < ··· < tm ≤ 1. If v ∈ BV [0, 1] is aligned with f − p0, then v is a piecewise continuous function with jump discontinuities only at these tk’s. Let tk be a point of jump discontinuity of v. Define m q(t) = Πj6=k(t − tj) Then, q ∈ M but hq, vi= 6 0 and hence v∈ / M ⊥, which is a contradiction. Next, we consider the minimum norm problem;

∗ ∗ min |x |X∗ subject to hx , yki = ck, 1 ≤ k ≤ n. where y1, ··· , yn ∈ X are given linearly independent vectors. ∗ ∗ ∗ Theorem Letx ¯ ∈ X satisfying the constraints hx¯ , yki = ck, 1 ≤ k ≤ n. Let M = span{yk}. Then, ∗ ∗ ∗ d = min |x |X∗ = min |x¯ − m |. ∗ ∗ ⊥ hx¯ ,yki=ck m ∈M By the duality principle, d = min |x¯∗ − m∗| = sup hx¯∗, xi. m∗ıM ⊥ x∈M, |x|≤1

69 n Write x = Y a where Y = [y1, ··· , yn] and a ∈ R . Then

d = min |x∗| = max hx¯∗, Y ai = max (c, a). |Y a|≤1 |Y a|≤1

∗ Thus, the optimal solution x0 must be aligned with Y a. Example (DC motor control) Let X = L1(0, 1) and X∗ = L∞(0, 1).

min |u|∞

Z 1 Z 1 subject to et−1u(t) dt = 0, (1 + et−1)u(t) = 1. 0 0

t−1 t−1 Let y1 = e and y2 = 1 + e . From the duality principle

min |u|∞ = max a2. hyk,ui=ck |a1y1+a2y2|1≤1

The control problem now is reduced to finding constants a1 and a2 that maximizes a2 R 1 t−1 ∞ subject to 0 |(a1 − a2)e + a2| dt ≤ 1. The optimal solution u ∈ L (0, 1) should be t−1 1 t−1 aligned with (a1 − a2)e + a2 ∈ L (0, 1). Since (a1 − a2)e + a2, can change sign at most once. The alignment condition implies that u must have values |u|∞ and can change sign at most once, which is the so called bang-bang control.

4 Linear Cauchy problem and C0-semigroup the- ory

In this section we discuss the Cauchy problem of the form d u(t) = Au(t) + f(t), u(0) = u ∈ X dt 0 1 in a Banach space X, where u0 ∈ X is the initial condition and f ∈ L (0,T ; X). Such problems arise in PDE dynamics and functional equations. We construct the mild solution u(t) ∈ C(0,T ; X):

Z t u(t) = S(t)u0 + S(t − s)f(s) ds (4.1) 0 where a family of bounded linear operator {S(t), t ≥ 0} is C0-semigroup on X.

Definition (C0 semigroup) (1) Let X be a Banach space. A family of bounded linear operators {S(t), t ≥ 0} on X is called a strongly continuous (C0) semigroup if

S(t + s) = S(t)S(s) for t, s ≥ 0 with S(0) = I

|S(t)φ − φ| → 0 as t → 0+ for all φ ∈ X.

(2) A linear operator A in X defined by

S(t)φ − φ Aφ = lim (4.2) t→0+ t

70 with S(t)φ − φ dom(A) = {φ ∈ X : the strong limit of lim in X exists}. t→0+ t is called the infinitesimal generator of the C0 semigroup S(t).

In this section we present the basic theory of the linear C0-semigroup on a Banach space X. The theory allows to analyze a wide class of the physical and engineering dynamics using the unified framework. We also present the concrete examples to demonstrate the theory. There is a necessary and sufficient condition (Hile-Yosida Theorem) for a closed, densely defined linear A in X to be the infinitesimal generator of the C0 semigroup S(t). Moreover, we will show that the mild solution u(t) satisfies Z ∗ hu(t), ψi = hu0, ψi + (hx(s),A ψi + hf(s), ψi ds (4.3) for all ψ ∈ dom (A∗). Examples (1) For A ∈ L(X), define a sequence of linear operators in X

X 1 S (t) = (At)k. N k! k Then X 1 |S (t)| ≤ (|A|t)k ≤ e|A| t N k! and d S (t) = AS (t) dt N N−1 At Since S(t) = e = limN→∞ SN (t) in the operator norm, we have d S(t) = AS(t) = S(t)A. dt (2) Consider the the hyperbolic equation

ut + ux = 0, u(0, x) = u0(x) in (0, 1). (4.4)

Define the semigroup S(t) of translations on X = L2(0, 1) by

[S(t)u0](x) =u ˜0(x − t), whereu ˜0(x) = 0, x ≤ 0, u˜0 = u0 on [0, 1].

Then, {S(t), t ≥ 0} is a C0 semigroup on X. If we define u(t, x) = [S(t)u0](x) with 1 u0 ∈ H (0, 1) with u0(0) = 0 satisfies (4.6) a.e.. The generator A is given by

Aφ = −φ0 with dom(A) = {φ ∈ H1(0, 1) with φ(0) = 0}.

In fact S(t)u − u u˜ (x − t) − u˜ (x) 0 0 = 0 0 = u0 (x), a.e. x ∈ (0, 1). t t 0 d if u0 ∈ dom(A). Thus, u(t) = S(t)u0 satisfies the Cauchy problem dt u(t) = Au(t) if u0 ∈ dom(A).

71 (3) Let Xt is a Markov process, i.e.

x 0,Xt E [g(Xt+h|calF t] = E [g(Xh)]. for all g ∈ X = L2(Rn). Define the linear operator S(t) by

0,x (S(t)u0)(x) = E [u0(Xt)], t ≥ 0, u0 ∈ X.

The semigroup property of S(t) follows from the Markov property, i.e.

0,x 0,x 0,x Xt x S(t+s)u0 = E [u0(Xt+s)] = E[E[g(Xt+s|Ft] = E[E u0(Xs)]] = E [(S(t)u0)(Xs)] = S(s)(S(t)u0).

0,x n The strong continuity follows from that Xt − x is a.s for all x ∈ R . If Xt = Bt is a Brownian motion, then the semigroup S(t) is defined by

Z 2 1 − |x−y| [S(t)u ](x) = √ e 2σ2 t u (y) dy, (4.5) 0 n 0 ( 2πtσ) Rn and u(t) = S(t)u0 satisfies the heat equation. σ2 u = ∆u, u(0, x) = u (x) in L2(Rn). (4.6) t 2 0 4.1 Finite difference in time Let A be closed, densely defined linear operator dom(A) → X. We use the finite difference method in time to construct the mild solution (4.1). For a stepsize λ > 0 consider a sequence {un} in X generated by

un − un−1 = Aun + f n−1, (4.7) λ with 1 Z nλ f n−1 = f(t) dt. λ (n−1)λ Assume that for λ > 0 the resolvent operator

−1 Jλ = (I − λ A) is bounded. Then, we have the product formula:

n−1 n n X n−k k u = Jλ u0 + Jλ f λ. (4.8) k=0 n In order to u ∈ X is uniformly bounded in n for all u0 ∈ X (with f = 0), it is necessary that M |J n| ≤ for λω < 1, (4.9) λ (1 − λω)n for some M ≥ 1 and ω ∈ R. Hille’s Theorem Define a piecewise constant function in X by

k−1 uλ(t) = u on [tk−1, tk)

72 Then, max |uλ − u(t)|X → 0 t∈[0,T ] as λ → 0+ to the mild solution (4.1). That is,

t [ t ] S(t)x = lim (I − A) n x n→∞ n exists for all x ∈ X and {S(t), t ≥ 0} is the C0 semigoup on X and its generator is A, where [s] is the largest integer less than s ∈ R. Proof: First, note that M |J | ≤ λ 1 − λω and for x ∈ dom(A) Jλx − x = λJλAx, and thus λ |J x − x| = |λJ Ax| ≤ |Ax| → 0 λ λ 1 − λω as λ → 0+. Since dom(A) is dense in X it follows that

+ |Jλx − x| → 0 as λ → 0 for all x ∈ X.

Define the linear operators Tλ(t) and Sλ(t) by t − t S (t) = J k and T (t) = J k−1 + k (J k − J k−1), on (t , t ]. λ λ λ λ λ λ λ k−1 k Then, d T (t) = AS (t), a.e. in t ∈ [0,T ]. dt λ λ Thus,

Z t d Z t Tλ(t)u0−Tµ(t)u0 = (Tλ(s)Tµ(t−s)u0) ds = (Sλ(s)Tµ(t−s)−Tλ(s)Sµ(t−s))Au0 ds 0 ds 0 Since s − t T (s)u − S (s)u = k T (t )(J − I)u on s ∈ (t , t ]. λ λ λ λ k−1 λ k−1 k By the bounded convergence theorem

|Tλ(t)u0 − Tµ(t)u|X → 0 as λ, µ → 0+ for all u ∈ dom(A2). Thus, the unique limit defines the linear operator S(t) by S(t)u0 = lim Sλ(t)u0. (4.10) λ→0+ 2 for all u0 ∈ dom(A ). Since M |S (t)| ≤ ≤ Meωt λ (1 − λω)[t/n]

73 2 and dom(A ) is dense, (4.10) holds for all u0 ∈ X. Moreover, we have

n+m n m S(t + s)u = lim Jλ = Jλ Jλ u = S(t)S(s)u λ→0+ and limt→0+ S(t)u = limt→0+ Jtu = u for all u ∈ X. Thus, S(t) is the C0 semigroup on X. Moreover, {S(t), t ≥ 0} is in the class G(M, ω), i.e.,

|S(t)| ≤ Meωt.

Note that Z t Tλ(t)u0 − u0 = A Sλu0 ds. 0

Since limλ→0+ Tλ(t)u0 = limλ→0+ Sλ(t)u0 = S(t)u0 and A is closed, we have Z t Z t S(t)u0 − u0 = A S(s)u0 ds, S(s)u0 ds ∈ dom(A). 0 0 If B is a generator of {S(t), t ≥ 0}, then S(t)x − x Bx = lim = Ax t→0+ t if x ∈ dom(A). Conversely, if u0 ∈ dom(B), then u0 ∈ dom(A) since A is closed and t → S(t)u is continuous at 0 for all u ∈ X and thus Z t 1 + A S(s)u0 ds = Au0 as t → 0 . t 0 Hence S(t)u − u Au = 0 0 = Bu 0 t 0 That is, A is the generator of {S(t), t ≥ 0}. Similarly, we have

n−1 Z t Z t X n−k k + Jλ f = Sλ(t − s)f(s) ds → S(t − s)f(s) ds as λ → 0 k=0 0 0 by the Lebesgue dominated convergence theorem. 

The following theorem state the basic properties of C0 semigroups: Theorem (Semigroup) (1) There exists M ≥ 1, ω ∈ R such that S ∈ G(M, ω) class, i.e., |S(t)| ≤ M eωt, t ≥ 0. (4.11)

(2) If x(t) = S(t)x0, x0 ∈ X, then x ∈ C(0,T ; X) 1 (3) If x0 ∈ dom (A), then x ∈ C (0,T ; X) ∩ C(0,T ; dom(A)) and d x(t) = Ax(t) = AS(t)x . dt 0 (4) The infinitesimal generator A is closed and densely defined. For x ∈ X Z t S(t)x − x = A S(s)x ds. (4.12) 0

74 (5) λ > ω the resolvent is given by Z ∞ (λ I − A)−1 = e−λsS(s) ds (4.13) 0 with estimate M |(λ I − A)−n| ≤ . (4.14) (λ − ω)n Proof: (1) By the uniform boundedness principle there exists M ≥ 1 such that |S(t)| ≤ M on [0, t0] For arbitrary t = k t0+τ, k ∈ N and τ ∈ [0, t0) it follows from the semigroup property we get

k k log |S(t0)| ω t |S(t)| ≤ |S(τ)||S(t0| ≤ Me ≤ Me with ω = 1 log |S(t )|. t0 0 (2) It follows from the semigroup property that for h > 0

x(t + h) − x(t) = (S(h) − I)S(t)x0 and for t − h ≥ 0 x(t − h) − x(t) = S(t − h)(I − S(h))x0 Thus, x ∈ C(0,T ; X) follows from the strong continuity of S(t) at t = 0. (3)–(4) Moreover, x(t + h) − x(t) S(h) − I S(h)x − x = S(t)x = S(t) 0 0 h h 0 h and thus S(t)x0 ∈ dom(A) and x(t + h) − x(t) lim = AS(t)x0 = Ax(t). h→0+ h Similarly, x(t − h) − x(t) S(h)φ − φ lim = lim S(t − h) = S(t)Ax0. h→0+ −h h→0+ h

Hence, for x0 ∈ dom(A) Z t Z t Z t S(t)x0 − x0 = S(s)Ax0 ds = AS(s)x0 ds = A S(s)x0 ds (4.15) 0 0 0

If xn ∈ don(A) → x and Axn → y in X, we have Z t S(t)x − x = S(s)y ds 0 Since 1 Z t lim S(s)y ds = y t→0+ t 0 x ∈ dom(A) and y = Ax and hence A is closed. Since A is closed it follows from (4.15) that for x ∈ X Z t S(s)x ds ∈ dom(A) 0

75 and (4.12) holds. For x ∈ X let

1 Z h xh = S(s)x ds ∈ dom(A) h 0

+ Since xh → x as h → 0 , dom(A) is dense in X. (5) For λ > ω define Rt ∈ L(X) by

Z t −λs Rt = e S(s) ds. 0

Since A − λ I is the infinitesimal generator of the semigroup eλtS(t), from (4.12)

−λt (λ I − A)Rtx = x − e S(t)x.

−λt Since A is closed and |e S(t)| → 0 as t → ∞, we have R = limt→∞ Rt satisfies

(λ I − A)Rφ = φ.

Conversely, for φ ∈ dom(A) Z ∞ R(A − λ I)φ = e−λsS(s)(A − λ I)φ = lim e−λtS(tφ − φ = −φ 0 t→∞ Hence Z ∞ R = e−λsS(s) ds = (λ I − A)−1 0 Since for φ ∈ X

Z ∞ Z ∞ M |Rx| ≤ |e−λsS(s)x| ≤ M e(ω−λ)s|x| ds = |x|, 0 0 λ − ω we have M |(λ I − A)−1| ≤ , λ > ω. λ − ω Note that Z ∞ Z ∞ Z ∞ Z ∞ (λ I − A)−2 = e−λtS(t) ds eλ sS(s) ds = e−λ(t+s)S(t + s) ds dt 0 0 0 0 Z ∞ Z ∞ Z ∞ = e−λσS(σ) dσ dt = σe−λσS(σ) dσ. 0 t 0 By induction, we obtain

1 Z ∞ (λ I − A)−n = tn−1e−λtS(t) dt. (4.16) (n − 1)! 0 Thus, Z ∞ −n 1 n−1 −(λ−ω)t M |(λ I − A) | ≤ t e dt = n . (n − 1)! 0 (λ − ω) We then we have the necessary and sufficient condition:

76 Hile-Yosida Theorem A closed, densely defined linear operator A on a Banach space X is the infinitesimal generator of a C0 semigroup of class G(M, ω) if and only if M |(λ I − A)−n| ≤ for all λ > ω (4.17) (λ − ω)n Proof: The sufficient part follows from the previous Theorem. In addition, we describe the Yosida construction. Define the Yosida approximation Aλ ∈ L(X) of A by J − I A = λ = AJ . (4.18) λ λ λ Define the Yosida approximation:

t t Aλt − Jλ Sλ(t) = e = e λ e λ .

Since M |J k| ≤ λ (1 − λω)k we have ∞ − t X M k t k ω t |S (t)| ≤ e λ |J |( ) ≤ Me 1−λω . λ k! λ λ k=0 Since d S (s)S (t − s) = S (s)(A − A )S (t − s), ds λ λˆ λ λ λˆ λˆ we have Z t Sλ(t)x − Sλˆ(t)x = Sλ(s)Sλˆ(t − s)(Aλ − Aλˆ)x ds 0 Thus, for x ∈ dom(A)

2 ωt |Sλ(t)x − Sλˆ(t)x| ≤ M te |(Aλ − Aλˆ)x| → 0 as λ, λˆ → 0+. Since dom(A) is dense in X this implies that

S(t)x = lim Sλ(t)x exist for all x ∈ X, λ→0+ defines a C0 semigroup of G(M, ω) class. The necessary part follows from (4.16)  Theorem (Mild solution) (1) If for f ∈ L1(0,T ; X) define

Z t x(t) = x(0) + S(t − s)f(s) ds, 0 then x(t) ∈ C(0,T ; X) and it satisfies

Z t Z t x(t) = A x(s) ds + f(s) ds. (4.19) 0 0 (2) If Af ∈ L1(0,T ; X) then x ∈ C(0,T ; dom(A)) and

Z t x(t) = x(0) + (Ax(s) + f(s)) ds. 0

77 1,1 R t 0 d 0 1 (3) If f ∈ W (0,T ; X), i.e. f(t) = f(0) + 0 f (s) ds, dt f = f ∈ L (0,T ; X), then Ax ∈ C(0,T ; X) and

Z t Z t A S(t − s)f(s) ds = S(t)f(0) − f(t) + S(t − s)f 0(s) ds. (4.20) 0 0 Proof: Since Z t Z τ Z t Z t S(t − s)f(s) dsdτ = ( S(τ − s)d τ)f(s) ds, 0 0 0 s and Z t A S(s) ds = S(t) − I 0 we have x(t) ∈ dom(A) and

Z t Z t Z t A x(s) ds = S(t)x − x + S(t − s)f(s) ds − f(s) ds. 0 0 0 and we have (4.19). (2) Since for h > 0

x(t + h) − x(t) Z t S(h) − I 1 Z t+h = S(t − s) f(s) ds + S(t + h − s)f(s) ds h 0 h h t if Af ∈ L1(0,T ; X)

x(t + h) − x(t) Z t lim = S(t − s)Af(s) ds + f(t) h→0+ h 0 a.e. t ∈ (0,T ). Similarly,

x(t − h) − x(t) Z t−h S(h) − I 1 Z t = S(t − h − s) f(s) ds + S(t − s)f(s) ds −h 0 h h t−h

Z t → S(t − s)Af(s) ds + f(t) 0 a.e. t ∈ (0,T ). (3) Since

S(h) − I 1 Z h Z t+h x(t) = ( S(th − s)f(s) ds − S(t + f − s)f(s) ds h h 0 t

Z t f(s + h) − f(s) + S(t − s) ds, 0 h

+ letting h → 0 , we obtain(4.20).  It follows from Theorems the mild solution Z t x(t) = S(t)x(0) + S(t − s)f(s) ds 0

78 satisfies Z t Z t x(t) = x(0) + A x(s) + f(s) ds. 0 0 Note that the mild solution x ∈ C(0,T ; X) depends continuously on x(0) ∈ X and f ∈ L1(0,T ; X) with estimate

Z t |x(t)| ≤ M(eωt|x(0)| + eω(t−s)|f(s)| ds). 0

Thus, the mild solution is the limit of a sequence {xn} of strong solutions with xn(0) ∈ 1,1 1,1 dom(A) and fn ∈ W (0,T ; X), i.e., since dom(A) is dense in X and W (0,T ; X) is dense in L1(0,T ; X),

|xn(t) − x(t)|X → 0 uniformly on [0,T ] for |xn(0) − x(0)|x → 0 and |fn − f|L1(0,T ;X) → 0 as n → ∞. Moreover, the mild solution x ∈ C(0,T : X) is a weak solution to the Cauchy problem d x(t) = Ax(t) + f(t) (4.21) dt ∗ in the sense of (4.3), i.e., for all ψ ∈ dom(A ) hx(t), ψiX×X∗ is absolutely continues and d hx(t), ψi = hx(t), ψi + hf(t), ψi a.e. in (0,T ). dt If x(0) ∈ dom(A) and Af ∈ L1(0,T ; X), then Ax ∈ C(0,T ; X), x ∈ W 1,1(0,T ; X) and d x(t) = Ax(t) + f(t), a.e. in (0,T ) dt If x(0) ∈ dom(A) and f ∈ W 1,1(0,T ; X), then x ∈ C(0,T ; dom(A)) ∩ C1(0,T ; X) and d x(t) = Ax(t) + f(t), everywhere in [0,T ]. dt 4.2 Weak-solution and Ball’s result Let A be a densely defined, closed linear operator on a Banach space X. Consider the Cauchy equation in X: d u = Au + f(t), (4.22) dt where u(0) = x ∈ X and f ∈ L1(0, τ; X) is a weak solution to of (4.22) if for every ψ ∈dom(A∗) the function t → hu(t), ψi is absolutely continuous on [0, τ] and d hu(t), ψi = hu(t),A∗ψi + hf(t), ψi, a.e. in [0, τ]. (4.23) dt It has been shown that the mild solution to (4.22) is a weak solution. Lemma B.1 Let A be a densely defined, closed linear operator on a Banach space X. If x, y ∈ X satisfy hy, ψi = hx, A∗ψi for all ψ ∈dom(A∗), then x ∈ dom(A) and y = Ax.

79 Proof: Let G(A) ⊂ X × X denotes the graph of A. Since A is closed G(A) is closed. Suppose y 6= Ax. By Hahn-Banach theorem there exist z, z∗ ∈ X∗ such that hAx, zi + hx, z∗i = 0 and hy, zi + hx, z∗i= 6 0. Thus z ∈dom(A∗) and z∗ = A∗z. By the ∗ condition hy, zi + hx, z i = 0, which is a contradiction.  Then we have the following theorem. Theorem (Ball) There exists for each x ∈ X and f ∈ L1(0, τ; X) a unique weak solution of (4.22)satisfying u(0) = x if and only if A is the generator of a strongly continuous semigroup {T (t)} of bounded linear operator on X, and in this case u(t) is given by Z t u(t) = T (t)x + T (t − s)f(s) ds. (4.24) 0 Proof: Let A generate the strongly continuous semigroup {T (t)} on X. Then, for some M, |T (t)| ≤ M on t ∈ [0, τ]. Suppose x ∈dom(A) and f ∈ W 1,1(0, τ; X). Then we have d hu(t), ψi = hAu(t) + f(t), ψi = hu(t),A∗ψi + hf(t), ψi. dt 1 1,1 For (x, f) ∈ X × L (0, τ; X) there exista a sequence (xn, fn) in dom(A) × W (0, τ; X) such that |xn − x|X + |fn − f|L1(0,τ;X) → 0 as n → ∞ If we define

Z t un(t) = T (t)xn + T (t − s)fn(s) ds, 0 then we have Z t ∗ hun(t), ψi = hx, ψi + (hun(s),A ψi + hfn(s), ψi) ds 0 and Z t |un(t) − u(t)|X ≤ M (|xn − x|X + |fn(s) − f(s)|X ds). 0 Passing limit n →]∞, we see that u(t) is a weak solution of (4.22). Next we prove that u(t) is the only weak solution to (4.22) satisfying u(0) = x. Let u˜(t) be another such weak solution and set v = u − u˜. Then we have

Z t hv(t), ψi = h v(s) dt, A∗ψi 0

∗ R t for all ψ ∈dom(A ) and t ∈ [0, τ]. By Lemma B.1 this implies z(t) = 0 v(s) ds ∈dom(A) d and dt z(t) = Az(t) with z(0) = 0. Thus z = 0 and hence u(t) =u ˜(t) on [0, τ]. Suppose that A such that (4.22) has a unique weak solution u(t) satisfying u(0) = x. For t ∈ [0, τ] we define the linear operator T (t) on X by T (t)x = u(t) − u0(t), where u0 is the weak solution of (4.22) satisfying u(0) = 0. If for t = nT + s, where n is a nonnegative integer and s ∈ [0, τ) we define T (t)x = T (s)T (τ)nx, then T (t) is a semigroup. The map θ : x → C(0, τ; X) defined by θ(x) = T (·)x has a closed graph by the uniform bounded principle and thus T (t) is a strongly continuous semigroup. Let B be the generator of {T (t)} and x ∈dom(B). For ψ ∈dom(A∗) d hT (t)x, ψi| = hBx, ψi = hx, A∗ψi. dt t=0

80 It follows from Lemma that x ∈dom(A) and Ax = Bx. Thus dom(B) ⊂dom(A). The proof of Theorem is completed by showing dom(A) ⊂dom(B). Let x ∈dom(A). Since for z(t) = T (t)x Z t hz(t), ψi = h z(s) dt, A∗ψi 0 R t R t it follows from Lemma that 0 T (s)x ds and 0 T (s)Ax ds belong to dom(A) and Z t T (t)x = x + A T (s)x ds 0 (4.25) Z t T (t)Ax = Ax + A T (s)Ax ds 0 Consider the function Z t Z t w(t) = T (s)Ax ds − A T (s)x ds. 0 0 It then follows from (4.25) that z ∈ C(0, τ; X). Clearly w(0) = 0 and it also follows from (4.25) that d hw(t), ψi = hw(t),A∗ψi. (4.26) dt for ψ ∈dom(A∗). But it follows from our assumptions that (4.26) has the unique solution w = 0. Hence from (4.25)

Z t T (t)x − x = A T (s)x ds 0 and thus T (t)x − x lim = Ax t→0+ t which implies x ∈dom(B). 

4.3 Lumer-Phillips Theorem The condition (4.17) is very difficult to check for a given A in general. For the case M = 1 we have a very complete characterization. Lumer-Phillips Theorem The followings are equivalent: (a) A is the infinitesimal generator of a C0 semigroup of G(1, ω) class. (b) A − ω I is a densely defined linear m-dissipative operator,i.e.

|(λ I − A)x| ≥ (λ − ω)|x| for all x ∈ don(A), λ > ω (4.27) and for some λ0 > ω R(λ0 I − A) = X. (4.28) Proof: It follows from the m-dissipativity

−1 1 |(λ0 I − A) | ≤ λ0 − ω

81 Suppose xn ∈ dom(A) → x and Axn → y in X, the

−1 −1 x = lim xn = (λ0 I − A) lim (λ0 xn − Axn) = (λ0 I − A) (λ0 x − y). n→∞ n→∞ Thus, x ∈ dom(A) and y = Ay and hence A is closed. Since for λ > ω

−1 λ I − A = (I + (λ − λ0)(λ0 I − A) )(λ0 I − A), if |λ−λ0| < 1, then (λ I − A)−1 ∈ L(X). Thus by the continuation method we have λ0−ω (λ I − A)−1 exists and 1 |(λ I − A)−1| ≤ , λ > ω. λ − ω It follows from the Hile-Yosida theorem that (b) → (a). (b) → (a) Since for x∗ ∈ F (x), the dual element of x, i.e. x∗ ∈ X∗ satisfying ∗ 2 ∗ hx, x iX×X∗ = |x| and |x | = |x|

he−ωtS(t)x, x∗i ≤ |x||x∗| = hx, x∗i we have for all x ∈ dom(A)

e−ωtS(t)x − x 0 ≥ lim h , x∗i = h(A − ω I)x, x∗i for all x∗ ∈ F (x). t→0+ t which implies A − ω I is dissipative.  Theorem (Dissipative I) (1) A is a ω-dissipative

|λ x − Ax| ≥ (λ − ω)|x| for all x ∈ dom (A). if and only if (2) for all x ∈ dom(A) there exists x∗ ∈ F (x) such that

hAx, x∗i ≤ ω |x|2. (4.29)

(2) → (1). Let x ∈ dom(A) and choose x∗ ∈ F (0) such that hA, x∗i ≤ 0. Then, for any λ > 0,

|x|2 = hx, x∗i = hλ x − Ax + Ax, x∗i ≤ hλ x − Ax, x∗i − ω |x|2 ≤ |λ x − Ax||x| − ω |x|2, which implies (2). (1) → (2). From (1) we obtain the estimate 1 − (|x − λ Ax| − |x|) ≤ 0 λ and 1 hAx, xi− = lim = (|x| − |x − λ Ax|) ≤ 0 λ→0+ λ ∗ ∗ which implies there exists x ∈ F (x) such that (4.29) holds since hAx, xi− = hAX, x i ∗ for some x ∈ F (x).  Thus, Lumer-Phillips theorem says that if m-diisipative, then (4.29) hold for all x∗ ∈ F (x).

82 Theorem (Dissipative II) Let A be a closed densely defined operator on X. If A and A∗ are dissipative, then A is m-dissipative and thus the infinitesimal generator of a C − 0-semigroup of contractions.

Proof: Let y ∈ R(I − A) be given. Then there exists a sequence xn ∈ dom(A) such that y=xn − Axn → y as n → ∞. By the dissipativity of A we obtain

|xn − xm| ≤ |xn − xm − A(xn − xm)| ≤ |y−ym|

Hence xn is a Cauchy sequence in X. We set x = limn→∞xn. Since A is closed, we see that x ∈ dom (A) and x − Ax = y, i.e., y ∈ R(I − A). Thus R(I − A) is closed. Assume that R(I − A) 6= X. Then there exists an x∗ ∈ X∗ such that

h(I − A)x, x∗) = 0 for all x ∈ dom (A).

By definition of the dual operator this implies x∗ ∈ dom (A∗) and (I − A)∗x∗ = 0. ∗ ∗ ∗ ∗ Dissipativity of A* implies |x | < |x − A x | = 0, which is a contradiction.  Example (revisited example 1)

Aφ = −φ0 in X = L2(0, 1) and for φ ∈ H1(0, 1)

Z 1 0 1 2 2 (Aφ, φ)X = − φ (x)φ dx == (|φ(0)| − |φ(1)| ) 0 2 Thus, A is dissipative if and only if φ(0) = 0, the in flow condition. Define the domain of A by dom(A) = {φ ∈ H1(0, 1) : φ(0) = 0} The resolvent equation is equivalent to d λ u + u = f dx and Z x u(x) = e−λ (x−s)f(s) ds, 0 and R(λ I − A) = X. By the Lumer-Philips theorem A generates the C0 semigroup on X = L2(0, 1). Example (Conduction equation) Consider the heat conduction equation:

d X ∂2u u = Au = a (x) + b(x) · ∇u + c(x)u, in Ω. dt ij ∂x ∂x i,j i j

Let X = C(Ω) and dom(A) ⊂ C2(Ω). Assume that a ∈ Rn×n ∈ C(Omega) b ∈ Rn,1 and c ∈ R are continuous on Ω¯ and a is symmetric and

m I ≤ a(x) ≤ MI for 0 < m ≤ M < ∞.

83 2 Then, if x0 is an interior point of Ω at which the maximum of φ ∈ C (Ω) is attained. Then, X ∂2u ∇φ(x ) = 0, a (x ) (x ) ≤ 0. 0 ij 0 ∂x ∂x 0 ij i j and thus (λ φ − Aφ)(x0) ≤ ω φ(x0) where ω ≤ max c(x). x∈Ω 2 Similarly, if x0 is an interior point of Ω at which the minimum of φ ∈ C (Ω) is attained, then (λ φ − Aφ)(x0) ≥ 0

If x0 ∈ ∂Ω attains the maximum, then ∂ φ(x ) ≤ 0. ∂ν 0 Consider the domain with the Robin boundary condition: ∂ dom(A) = {u ∈ α(x) u(x) + β(x) u = 0 at ∂Ω} ∂ν with α, β ≥ 0 and infx∈∂Ω(α(x) + β(x)) > 0. Then,

|λφ − Aφ|X ≥ (λ − ω)|φ|X . (4.30) for all φ ∈ C2(Ω). It follows from the he Lax Milgram theory that

−1 2 2 (λ0 I − A) ∈ L(L (Ω),H (Ω), assuming that coefficients (a, b, c) are sufficiently smooth. Let

−1 dom(A) = {(λ0 I − A) C(Ω)}.

Since C2(Ω) is dense in dom(A), (??) holds for all φ ∈ dom(A), which shows A is dissipative. Example (Advection equation) Consider the advection equation

ut + ∇ · (~b(x)u) = ν ∆u.

Let X = L1(Ω). Assume ~b ∈ L∞(Ω) Let ρ ∈ C1(R) be a monotonically increasing function satisfying ρ(0) = 0 and ρ(x) = s 1 sign(x), |x| ≥ 1 and ρ(s) = ρ(  ) for  > 0. For u ∈ C (Ω) Z ∂ ~ ~ 1 0 (Au, u) = (ν u − n · b u, ρ(u)) ds + (b u − ν ∇u, ρ(u) ∇u) + (c u, ρ(u)). Γ ∂n  where 1 1  (~b u, ρ0 (u) ∇u) ≤ ν (∇u, ρ0 (u) ∇u) + meas({|u| ≤ }).     4ν

84 Assume the inflow condition u on {s ∈ ∂Ω: n · b < 0}. Note that for u ∈ L1(Rd)

1 (u, ρ(u)) → |u|1 and (ψ, ρ(u)) → (ψ, sign0(u)) for ψ ∈ L (Ω) as  → 0. It thus follows (λ − ω) |u| ≤ |λ u − λ Au|. (4.31) Since C2(Ω) is dense in W 2,1(Rd) (4.31) holds for u ∈ W 2,1(Ω). Example (X = Lp(Ω)) Let Au = ν ∆u + b · ∇u with homogeneous boundary condition u = 0 on X = Lp(Ω). Since Z Z h∆u, u∗i = (∆u, |u|p−2u) = −(p − 1) (∇u, |u|p−2∇u) Ω Ω and

2 p−2 (p − 1)ν p−2 |b|∞ p (b · ∇u, |u| u) 2 ≤ |(∇u, |u| ∇u) 2 + (|u| , 1) 2 L 2 L 2ν(p − 1) L we have hAu, u∗i ≤ ω|u|2 for some ω > 0. Example (Fractional PDEs I) In this section we consider the nonlocal diffusion equation of the form Z ut = Au = J(z)(u(x + z) − u(x)) dz. Rd Or, equivalently Z Au = J(z)(u(x + z) − 2u(x) + u(x − z)) dz (Rd)+ for the symmetric kernel J in Rd. It will be shown that Z Z (Au, φ)L2 = J(z)(u(x + z) − u(x))(φ(x + z) − φ(x)) dz dx Rd (Rd)+ and thus A has a maximum extension. Also, the nonlocal Fourier law is given by Z Au = ∇ · ( J(z)∇u(x + z) dz). Rd Thus, Z (Au, φ)L2 = J(z)∇u(x + z) · ∇φ(x) dz dx Rd×Rd Under the kernel J is completely monotone, one can prove that A has a maximal monotone extension.

85 4.4 Jump diffusion Model for American option In this section we discuss the American option for the jump diffusion model σ2 σ2x2 u + (x − )u + u + Bu + λ = 0, u(T, x) = ψ, t 2 x 2 xx

(λ, u − ψ) = 0, λ ≤ 0, u ≥ ψ where the generator B for the jump process is given by Z ∞ s Bu = k(s)(u(x + s) − u(x) + (e − 1)ux) ds. −∞ The CMGY model for the jump kernel k is given by  −M|s| 1+Y +  Ce |s| = k (s) s ≥ 0 k(s) =  Ce−G|s||s|1+Y = k−(s) s ≤ 0 Since Z ∞ Z ∞ Z ∞ k(s)(u(x + s) − u(x)) ds = k+(s)(u(x + s) − u(x)) ds + k−(s)(u(x − s) − u(x)) ds −∞ 0 0

Z ∞ k+(s) + k−(s) Z ∞ k+(s) − k−(s) = (u(x + s) − 2u(x) + u(x − s)) ds + (u(x + s) − u(x − s)) ds. 0 2 0 2 Thus, Z ∞ Z ∞ ( k(s)(s)(u(x + s) − u(x) ds)φ dx −∞ −∞

Z ∞ Z ∞ = ks(s)(u(x + s) − u(x))(φ(x + s) − φ(s)) ds dx −∞ 0

Z ∞ Z ∞ + ( ku(s)(u(x + s) − u(s)))φ(x) dx −∞ 0 where k+(s) + k−(s) k+(s) − k−(s) k (s) = , k (s) = s 2 u 2 and hence Z ∞ Z ∞ (Bu, φ) = ks(s)(u(x + s) − u(x))(φ(x + s) − φ(s)) ds dx −∞ −∞

Z ∞ Z ∞ Z ∞ + ( ku(s)(u(x + s) − u(s)))φ(x) dx + ω uxφ dx. −∞ −∞ −∞ where Z ∞ ω = (es − 1)k(s) ds. −∞ If we equip V = H1(R) by Z ∞ Z ∞ 2 Z ∞ 2 2 σ 2 |u|V = ks(s)|u(x + s) − u(x)| ds dx + |ux| dx, −∞ −∞ 2 −∞ then A + B ∈ L(V,V ∗) and A + B generates the analytic semigroup on X = L2(R).

86 4.5 Numerical approximation of nonlocal operator In this section we describe our higher order integration method for the convolution; Z ∞ k+(s) + k−(s) Z ∞ k+(s) − k−(s) (u(x+s)−2u(x)+u(x−s)) ds+ (u(x+s)−u(x−s)) ds. 0 2 0 2 For the symmetric part, Z ∞ 2 u(x + s) − 2u(x) + u(x − s) s ks(s) 2 ds, −∞ s where we have u(x + s) − 2u(x) + u(x − s) s2 ∼ u (x) + u (x) + O(s4) s2 xx 12 xxxx

We apply the fourth order approximation of uxx by u(x + h) − 2u(x) + u(x − h) 1 u(x + 2h) − 4u(x) + 6u(x) − 4u(x − h) + u(x − 2h) u (x) ∼ − xx h2 12 h2 and we apply the second order approximation of uxxxx(x) by u(x + 2h) − 4u(x) + 6u(x) − 4u(x − h) + u(x − 2h) u (x) ∼ . xxxx h4 Thus, one can approximate Z h 2 2 u(x + s) − 2u(x) + u(x − s) s ks(s) ds h s2 − 2 by u − 2u + u 1 u − 4u + 6u − 4u + u ρ ( k+1 k k−1 − k+2 k+1 k k−1 k−2 ) 0 h2 12 h2 ρ u − 4u + 6u − 4u + u + 1 k+2 k+1 k k−1 k−2 , 12 h2 where Z h Z h 2 2 1 2 4 ρ0 = s ks(s) ds and ρ1 = s ks(s) ds. h h2 h − 2 − 2 The remaining part of the convolution

1 Z (k+ 2 )h u(xk+j + s)ks(s) ds 1 (k− 2 )h can be approximated by three point quadrature rule based on s2 u(x + s) ∼ u(x ) + u0(x )s + u00(x ) k+j k+j k+j 2 k+j with u − u u0(x ) ∼ k+j+1 k+j−1 k+j 2h u − 2u + u u00(x ) ∼ k+j+1 k+j k+j−1 . k+j h2

87 That is,

1 Z (k+ 2 )h u(xk+j + s)ks(s) ds 1 (k− 2 )h u − u u − 2u + u ∼ ρku + ρk k+j−1 k+j+1 + ρk j+k+1 k+j j+k−1 0 k+j 1 2 2 2 where 1 k R (k+ 2 )h ρ0 = 1 ks(s) ds (k− 2 )h

1 k 1 R (k+ 2 )h ρ1 = h 1 (s − xk)ks(s) ds (k− 2 )h

1 k 1 R (k+ 2 )h 2 ρ2 = h2 1 (s − xk) ks(s) ds. (k− 2 )h For the skew-symmetric integral

Z h 2 ρ3 2 ku(s)(u(x + s) − u(x − s)) ds ∼ ρ2 ux(x) + h uxxx(x) h 6 − 2 where Z h Z h 2 1 2 3 ρ2 = 2sku(s) ds, ρ3 = 2s ku(s) ds. h h2 h − 2 − 2 We may use the forth order difference approximation

u(x + h) − u(x − h) u(x + 2h) − 2u(x + h) + 2u(x − h) − u(x − 2h) u (x) ∼ − x 2h 6h and the second order difference approximation

u(x + 2h) − 2u(x + h) + 2u(x − h) − u(x − 2h) u (x) ∼ xxx h3 and obtain

h Z 2 ku(s)(u(x + s) − u(x − s)) ds h − 2 u − u u − 2u + 2u − u ρ u − 2u + 2u − u ∼ ρ ( k+1 k−1 − k+2 k+1 k−1 k−1 ) + 3 k+2 k+1 k−1 k−1 . 2 2h 6h 6 h Consider the nonlocal PDE equations of the form

Example (Fractional PDEs II) Consider the fractional equation of the form

Z 0 0 g(θ)u (t + θ) dθ = Au, u(0) = u0, −t

88 where the kernel g satisfies

g > 0, g ∈ L1(−∞, 0) and non-decreasing.

For example, the case of the Caputo (fractional) derivative has 1 g(θ) = |θ|−α. |Gamma(1 − α)

d ∂ Define z(t, θ) = u(t + θ), θ ∈ (−∞, 0]. Then, dt z = ∂θ z. Thus, we define the linear operator A on Z = C((−∞, 0]; X) by

Z 0 Az = z0(θ) with dom(A) = {z0 ∈ X : g(θ)z0(θ) dθ = Az(0)} −∞ Theorem 4.1 Assume A is m-dissipative in a Banach space X. Then, A is dissipative and R(λ I − A) = Z for λ > 0. Thus, A generates the C0-semigroup T (t) on Z = C((−∞, 0]; X).

Proof: First we show that A is dissipative. For φ ∈ dom (A) suppose |φ(0)| > |φ(θ)| for all θ < 0. Define 1 Z θ g(θ) = g(θ) dθ.  θ− For all x∗ ∈ F (φ(0))

Z 0 ∗ 0 hx , g(θ)(φ )dθi −∞

Z 0 g(θ) − g(θ − ) = hx∗, hφ(θ) − φ(0)), i dθ ≤ 0 −∞  since hx∗, φ(θ) − φ(0)i ≤ (|φ(θ)| − |φ(0)|)|φ(0)| < 0, θ < 0. Thus, Z 0 Z 0 ∗ 0 0 ∗ lim hx , g(θ)(φ )dθi = h g(θ)φ dθ, x i < 0. (4.32) →0+ −∞ −∞ But, since there exists a x∗ ∈ F (φ(0)) such that

hx∗, Axi ≥ 0 which contradicts to (4.32). Thus, there exists θ0 such that |φ(θ0)| = |φ|Z . Since ∗ ∗ ∗ hφ(θ), x i ≤ |φ(θ)| for x ∈ F (φ(θ0)), θ → hφ(θ), x i attains the maximum at θ0 and ∗ 0 thus hx φ (θ0)i = 0 Hence,

0 ∗ 0 |λ φ − φ |Z ≥ hx , λ φ(θ0) − φ (θ0), i = λ |φ(θ0)| = λ |φ|Z . (4.33)

For the range condition λφ − Aφ = f we note that

φ(θ) = eλθφ(0) + ψ(θ)

89 where Z 0 ψ(θ) = eλ(θ−ξ)f(ξ) dξ. θ Thus, Z 0 (∆(λ) I − A)φ(0) = g(θ)ψ0(θ) dθ) −∞ where Z 0 ∆(λ) = λ g(θ)eλθ dθ > 0 −∞ Thus, Z 0 φ(0) = (∆(λ) I − A)−1 g0(θ)ψ(θ) dθ. −∞ Since A is dissipative and

λψ − ψ0 = f ∈ Z, ψ(0) = 0, 1 thus |ψ| ≤ |f| . Thus φ = (λ I − A)−1f ∈ Z. Z λ Z  Example (Renewable system) We discuss the renewable system of the form

dp0 P P R L dt = − i λi p0(t) + i 0 µi(x)pi(x, t) dx

(pi)t + (pi)x = −µi(x)p, p(0, t) = λi p0(t)

1 d for (p0, pi, 1 ≤ i ≤ d) ∈ R × L (0,T ) . Here, p0(t) ≥ 0 is the total utility and λi ≥ 0 is the rate for the energy conversion to the i-th process pi. The first equation is the energy balance law and s is the source = generation −consumption. The second equation is for the transport (via pipeline and storage) for the process pi and µi ≥ 0 is the renewal rate andµ ¯ ≥ 0 is the natural loos rate. {λi ≥ 0} represent the distribution of the utility to the i-th process. Assume at the time t = 0 we have the available utility p0(0) = 1 and pi = 0. Then we have the following conservation

Z t p0(t) + pi(s) ds = 1 0 if t ≤ L. Let X = R × L1(0,L)d. Let A(µ) defined by

X X Z L Ax = (− λi p0 + µi(x) dx, −(pi)x − µi(x)pi) i i 0 with domain

11 d dom(A) = {(p0, pi) ∈ R × W (0,L) : pi(0) = λi p0}

Let  s |s| >   |s| sign(s) =  s  |s| ≤ 

90 Then, P R L (A(p0, p), (sign0(p0), sign(p)) ≤ −( i λi)|p0| + | 0 µipi dx|

P R L i(Ψ(pi(0)) − Ψ(pi(L)) − 0 µipisign dx) where  |s| |s| >   Ψ(s) =  s2  2 + 2 |s| ≤  Since sign → sign, Ψ → |s| by the Lebesgue dominated convergence theorem, we have

(A(p0, p), (sign0(p0), sign0(p)) ≤ 0.

The resolvent equation A(p0, p) = (s, f), (4.34) has a solution R x x R x − 0 µi R − s µi pi(x) = λip0e + 0 e f(s) ds

R L L P 0 µi R ( i λi)(1 − e ) p0 = s + 0 µipi(x) dx

Thus, A generates the contractive C0 semigroup S(t) on X. Moreover, it is cone preserving S(t)C+ ⊂ C+ since the resolvent is positive cone preserving. Example (Bi-domain equation) Example (Second order equation) Let V ⊂ H = H∗ ⊂ V ∗ be the Gelfand triple. Let ρ be a bounded bilinear form on H × H, µ and σ be bounded bilinear forms on V × V . Assume ρ and σ are symmetric and coercive and µ(φ, φ) ≥ 0 for all φ ∈ V . Consider the second order equation

ρ(utt, φ) + µ(ut, φ) + σ(u, φ) = hf(t), φi for all φ ∈ V. (4.35)

Define linear operators M (mass), D (dampping), K and (stiffness) by

(Mφ, ψ)H = ρ(φ, ψ), φ, ψ ∈ HhDφ, ψi = µ(φ, ψ) φ, ψ ∈ V hKφ, ψiV ∗×V , φ, ψ ∈ V

Let v = ut and define A on X = V × H by

A(u, v) = (v, −M −1(Ku + Dv)) with domain dom (A) = {(u, v) ∈ X : v ∈ V and Ku + Dv ∈ H} The state space X is a Hilbert space with inner product

((u1, v1), (u,v2)) = σ(u1, u2) + ρ(v1, v2) and 2 E(t) = |(u(t), v(t))|X = σ(u(t), u(t)) + ρ(v(t), v(t))

91 defines the energy of the state x(t) = (u(t), v(t)). First, we show that A is dissipative:

−1 (A(u, v), (u, v))X = σ(u, v)+ρ(−M (Ku+Dv), v) = σ(u, v)−σ(u, v)−µ(v, v) = −µ(v, v) ≤ 0

Next, we show that R(λ I − A) = X. That is, for (f, g) ∈ X the exists a solution (u, v) ∈ dom (A) satisfying

λ u − v = f, λMv + Dv + Ku = Mg, or equivalently v = λ u − f and

λ2Mu + λDu + Ku = Mg + λ Mf + Df (4.36)

Define the bilinear form a on V × V

a(φ, ψ) = λ2 ρ(φ, ψ) + λ µ(φ, ψ) + σ(φ, ψ)

Then, a is bounded and coercive and if we let

F (φ) = (M(g + λ f)φ)H + µ(f, φ) then F ∈ V ∗. It thus follows from the Lax-Milgram theory there exists a unique solution u ∈ V to (4.36) and Dv + Ku ∈ H. For example, consider the wave equation

1 c2(x) utt + κ(x)ut = ∆u

∂u [ ∂n ] + αu = γ ut at Γ In this example we let V = H1(Ω)/R and H = L2(Ω) and define Z Z σ(φ, ψ) = (∇φ, ∇ψ) dx + αφψ ds Ω Γ Z Z µ(φ, ψ) = κ(x)φ, ψ dx + γ)φ, ψ ds ∂Ω Γ Z 1 ρ(φ, ψ) = 2 φψ dx Ω c (x) Example (Maxwell system for elecro-magnetic equations)

 Et = ∇ × H, ∇E = ρ

µ Ht = −∇ × E, ∇ · B = 0 with boundary condition E × n = 0 where E is Electric field, Bµ H is Magnetic field and D =  E is dielectric with , µ is electric and magnetic permittivity, respectively Let X = L2(Ω)d × L2(Ω)d with the norm defined by Z ( |E|2 + µ |H|2) dx. Ω

92 The dissipativity follows from Z Z Z (E · (∇ × H) − H · (∇ × E)) dx = ∇ · (E × H) dx = n · (E × H) ds = 0 Ω Ω ∂Ω Let ρ = 0 and thus ∇ · E = 0. The range condition is equivalent to 1  E + ∇ × (∇ × E − g) = f µ The weak form is given by 1 1 ( E, ψ) + ( ∇ × E, ∇ × ψ) = (t, ψ) + (g, ∇ × ψ). (4.37) µ µ for ψ ∈ V = {H1(Ω) : ∇ cot ψ = 0, n × ψ = 0 at ∂Ω} Since |∇ × ψ|2 = |∇ψ|2 for ∇ · ψ = 0. the right hand side of (4.37) defines the bounded coercive quadratic form on V × V , it follows from the Lax-Milgram equation that (??) has a unique solution in V .

4.6 Dual semigroup Theorem (Dual semigroup) Let X be a reflexive Banach space. The adjoint S∗(t) of the C0 semigroup S(t) on X forms the C0 semigroup and the infinitesimal generator of S∗(t) is A∗. Let X be a Hilbert space and dom(A∗) be the Hilbert space with graph ∗ norm and X−1 be the strong dual space of dom(A ), then the extension S(t) to X−1 defines the C0 semigroup on X−1. Proof: (1) Since for t, s ≥ 0

S∗(t + s) = (S(s)S(t))∗ = S∗(t)S∗(s) and ∗ hx, S (t)y − yiX×X∗ = hS(t)x − x, yiX×X∗ → 0. for x ∈ X and y ∈ X∗ Thus, S∗(t) is weakly star continuous at t = 0 and let B is the generator of S∗(t) as S∗(t)x − x Bx = w∗ − lim . t Since S(t)x − x S∗(t)y − y ( , y) = (x, ), t t for all x ∈ dom(A) and y ∈ dom(B) we have

hAx, yiX×X∗ = hx, ByiX×X∗ and thus B = A∗. Thus, A∗ is the generator of a w∗− continuous semigroup on X∗. (2) Since Z t S∗(t)y − y = A∗ S∗(s)y ds 0 for all y ∈ Y = dom (A∗). Thus, S∗(t) is strongly continuous at t = 0 on Y .

93 ∗ ∗ (3) If X is reflexive, dom (A ) = X . If not, there exists a nonzero y0 ∈ X such that ∗ ∗ ∗ −1 ∗ hy0, x iX×X∗ = 0 for all x ∈ dom(A ). Thus, for x0 = (λ I −A) y0 hλ x0 −Ax0, x i = ∗ ∗ ∗ ∗ ∗ −1 ∗ ∗ hx0, λ x − A x i = 0. Letting x = (λ I − A ) x0 for x0 ∈ F (x0), we have x0 = 0 and thus y0 = 0, which yields a contradiction. ∗ ∗ ∗ ∗ (4) X1 = dom (A ) is a closed subspace of X and is a invariant set of S (t). Since A ∗ is closed, S (t) is the C0 semigroup on X1 equipped with its graph norm. Thus,

∗ ∗ ∗ (S (t)) is the C0 semigroup on X−1 = X1

∗ ∗ and defines the extension of S(t) to X−1. Since for x ∈ X ⊂ X−1 and x ∈ X

hS(t)x, x∗i = hx, S∗(t)x∗i,

∗ ∗ S(t) is the restriction of (S (t)) onto X. 

4.7 Stability Theorem (Datko 1970, Pazy 1972). A strongly continuous semigroup S(t), t ≥ 0 on a Banach space X is uniformly exponentially stable if and only if for p ∈ [1, ∞) one has Z ∞ |S(t)x|p dt < ∞ for all x ∈ X. 0 Theorem. (Gearhart 1978, Pruss 1984, Greiner 1985) A strongly continuous semigroup on S(t), t ≥ 0 on a Hilbert space X is uniformly exponentially stable if and only if the half-plane {λ ∈ C : Reλ > 0} is contained in the resolvent set ρ(A) of the generator A with the resolvent satisfying

−1 |(λ I − A) |∞ < ∞

4.8 Sectorial operator and Analytic semigroup In this section we have the representation of the semigroup S(t) in terms of the inverse Laplace transform. Taking the Laplace transform of d x(t) = Ax(t) + f(t) dt we have xˆ = (λ I − A)−1(x(0) + fˆ) where for λ > ω Z ∞ xˆ = e−λtx(t) dt 0 is the Laplace transform of x(t). We have the following the representation theory (inverse formula). Theorem (Resolvent Calculus) For x ∈ dom(A2) and γ > ω

1 Z γ+i∞ S(t)x = eλt(λ I − A)−1x dλ. (4.38) 2πi γ−i∞

94 ω0 Proof: Let Aµ be the Yosida approximation of A. Since Re σ(Aµ) ≤ < γ, we 1 − µω0 have Z γ+i∞ 1 λt −1 uµ(t) = Sµ(t)x = e (λ I − Aµ) x dλ. 2πi γ−i∞ Note that λ(λ I − A)−1 = I + (λ I − A)−1A. (4.39) Since 1 Z γ+i∞ eλt dλ = 1 2πi γ−i∞ λ and Z γ+i∞ |(|λ − ω|−2| dλ < ∞, γ−i∞ we have 2 |Sµ(t)x| ≤ M |A x|, uniformly in µ > 0. Since µ (λ I − A )−1x − (λ I − A)−1x = (ν I − A)−1(λ I − A)−1A2x, µ 1 + λµ λ where ν = , {u (t)} is Cauchy in C(0,T ; X) if x ∈ dom(A2). Letting µ → 0+, 1 + λµ µ we obtain (4.38).  Next we consider the sectorial operator. For δ > 0 let π Σδ = {λ ∈ C : arg(λ − ω) < + δ} ω 2 be the sector in the complex plane C. A closed, densely defined, linear operator A on a Banach space X is a sectorial operator if M |(λ I − A)−1| ≤ for all λ ∈ Σδ . |λ − ω| ω

For 0 < θ < δ let Γ = Γω,θ be the integration path defined by

± π Γ = {z ∈ C : |z| ≥ δ, arg(z − ω) = ±( 2 + θ)},

π Γ0 = {z ∈ C : |z| = δ, |arg(z − ω)| ≤ 2 + θ}. For 0 < θ < δ define a family {S(t), t ≥ 0} of bounded linear operators on X by 1 Z S(t)x = eλt(λ I − A)−1x dλ. (4.40) 2πi Γ Theorem (Analytic semigroup) If A is a sectorial operator on a Banach space X, then A generates an analytic (C0) semigroup on X, i.e., for x ∈ X t → S(t)x is an analytic function on (0, ∞). We have the representation (4.40) for x ∈ X and M |AS(t)x| ≤ θ |x| X t X

95 Proof: The elliptic operator A defined by the Lax-Milgram theorem defines a sectorial operator on Hilbert space. Theorem (Sectorial operator) Let V,H are Hilbert spaces and assume H ⊂ V ∗. Let ρ(u, v) is bounded bilinear form on H × H and

2 ρ(u, u) ≥ |u|H for all u ∈ H

Let a(u, v) to be a bounded bilinear form on V × V with

2 σ(u, u) ≥ δ |u|V for all u ∈ V.

Define the linear operator A by

ρ(Au, φ) = a(u, φ) for all φ ∈ V.

Then, for Re λ > 0we have

−1 1 |(λ I − A) |L(V,V ∗) ≤ δ

−1 M |(λ I − A) |L(H) ≤ |λ|

−1 M |(λ I − A) |L(V ∗,H) ≤ √ |λ|

−1 M |(λ I − A) |L(H,V ) ≤ √ |λ|

Proof: Let a(u, v) to be a bounded bilinear form on V × V . For f ∈ V ∗ and <λ > 0, Define MH, H by (Mu, v) = ρ(u, v) for all v ∈ H ∗ and A0 ∈ L(V,V ) by hA0u, vi = σ(u, v) for v ∈ V −1 −1 Then, A = M A0 and (λ I − A)u = M f is equivalent to

λρ(u, φ) + a(u, φ) = hf, φi, for all φ ∈ V. (4.41)

It follows from the Lax-Milgram theorem that (4.41) has a unique solution, given f ∈ V ∗ and Re λρ(u, u) + a(u, u) ≤ |f|V ∗ |u|V . Thus, −1 1 |(λ I − A) | ∗ ≤ . L(V ,V ) δ Also, 2 2 2 |λ| |u|H ≤ |f|V ∗ |u|V + M |u|V = M1 |f|V ∗ M for M1 = 1 + δ2 and thus √ −1 M1 |(λ I − A) | ∗ ≤ . L(V ,H) |λ|1/2

96 For f ∈ H ⊂ V ∗ 2 δ |u|V ≤ Re λ ρ(u, u) + a(u, u) ≤ |f|H |u|H , (4.42) and 2 |λ|ρ(u, u) ≤ |f|H |u|H + M|u|V ≤ M1|f|H |u|H Thus, M |(λ I − A)−1| ≤ 1 . L(H) |λ| Also, from (4.42) 2 2 δ |u|V ≤ |f|H |u|H ≤ M1 |f| . which implies M |(λ I − A)−1| ≤ 2 . L(H,V ) |λ|1/2

4.9 Approximation Theory

In this section we discuss the approximation theory for the linear C0-semigroup. Equiv- alence Theorem (Lax-Richtmyer) states that for consistent numerical approximations, stability and convergence are equivalent. In terms of the linear semigroup theory we have

Theorem (Trotter-Kato theorem) Let X and Xn be Banach spaces and A and An be the infinitesimal generator of C0 semigroups S(t) on X and Sn(t) on Xn of G(M, ω) class. Assume a family of uniformly bounded linear operators Pn ∈ L(X,Xn) and En ∈ L(Xn,X) satisfy

PnEn = I |EnPnx − x|X → 0 for all x ∈ X (4.43)

Then, the followings are equivalent. (1) there exist a λ0 > ω such that for all x ∈ X

−1 −1 |En(λ0 I − An) Pnx − (λ0 I − A) x|X → 0 as n → ∞, (4.44)

(2) For every x ∈ X and T ≥ 0

|EnSn(t)Pnx − S(t)x|X → as n → ∞. uniformly on t ∈ [0,T ]. Proof: Since for λ > ω Z ∞ −1 −1 En(λ I − A) Pnx − (λ I − A) x = EnSn(t)Pnx − S(t)x dt 0 (1) follows from (2). Conversely, from the representation theory

Z γ+i∞ 1 −1 −1 EnSn(t)Pnx − S(t)x = (En(λ I − A) Pnx − (λ I − A) x) dλ 2πi γ−i∞ where −1 −1 −1 −1 (λ I − A) − (λ0 I − A) = (λ − λ0)(λ I − A) (λ0 I − A) .

97 Thus, from the proof of Theorem (Resolvent Calculus) (1) holds for x ∈ dom(A2). But 2 since dom(A ) is dense in X, (2) implies (1). 

Remark (Stability) If An is uniformly dissipative:

|λun − Anun| ≥ (λ − ω) |un| for all un ∈ dom(An) and some ω ≥ 0, then An generates ω contractive semigroup Sn(t) on Xn. Remark (Consistency) (λI − An)Pnun = Pnf

Pn(λI − A)u = Pnf we have (λ I − An)Pn(u − un) + PnAu − AnPnu = 0 Thus |Pn(u − un)| ≤ M |PnAu − AnPnu| The consistency(4.44) foll0ws from

|PnAu − AnPnu| → 0 for all u in a dense subset of dom(A). Corollary Let the assumptions of Theorem hold. The statement (1) of Theorem is equivalent to (4.43) and the followings:

(C.1) there exists a subset D of doma(A) such that D = X and (λ0 I − A)D = X.

(C.2) for all u ∈ D there exists a sequenceu ¯n ∈ dom(An) such that lim Enu¯n = u and lim EnAnu¯n = Au.

Proof: Without loss of generality we can assume λ0 = 0. First we assume that condition (1) hold. We set D = dom(A) and thus AD = X. For u ∈ dom(A) we setu ¯n = −1 An PnAu and −1 −1 Enu¯n − u = EnAn Pnx − A x → 0 and −1 −1 EnAnu¯n − Au = EnAnAn Pnx − AA x = EnPnx − x → 0 as n → ∞. Hence condition (C.2) hold. Conversely, we assume conditions (C.1)–(C.2) hold.

−1 −1 −1 −1 −1 EnAn Pn − A = En(An PnA − Pn)A + (EnPn − I)A .

−1 −1 For x ∈ AD we choose u ∈ D such that x = Au and set un = An Pnx = An PnAu. We then for u we chooseu ¯n according to (C.2). Thus, we obtain

|u¯n − Pnu| = |Pn(Enu¯n − u)| ≤ M |Enu¯n − u| → 0 as n → ∞ and

−1 −1 |u¯n − un| ≤ |An ||Anu¯n + PnAu| ≤ |An ||Pn|EnAnu¯n + Au| → 0

98 as n∞. It thus follows that

−1 −1 |EnAn Pnx−A x| ≤ |En(un −Pnu)|+|EnPnu−u| ≤ M (|un −Pnu|+|EnPnu−u| → 0 as n → ∞ for all x ∈ AD.  Example 1 (Trotter-Kato theoarem) Consider the heat equation on Ω = (0, 1) × (0, 1):

d u(t) = ∆u, u(0, x) = u (x) dt 0 with boundary condition u = 0 at the boundary ∂Ω. We use the central difference 1 approximation on uniform grid points: (i h, j h) ∈ Ω with mesh size h = n : d 1 u − u u − u 1 u − u u − u u (t) = ∆ u = ( i+1,j i,j − i,j i−1,j ) + ( i,j+1 i,j − i,j i,j−1 ) dt i,j h h h h h h h for 1 ≤ i, j ≤ n1, where ui,0 = ui,n = u1, j = un,j = 0 at the boundary node. First, (n−1)2 let X = C(Ω) and Xn = R with sup norm. Let Enui,j = the piecewise linear interpolation and (Pnu)i,j = u(i h, j h) is the point-wise evaluation. We will prove that ∆h is dissipative on Xn. Suppose uij = |un|∞. Then, since

λ ui,j − (∆hu)i,j = fij and 1 −(∆ u) = (4u − u − u − u − u ) ≥ 0 h i,j h2 i,j i+1,j i,j+1 i−1,j i,j−1 we have f 0 ≤ u ≤ i,j . i,j λ 2 2 Thus, ∆h is dissipative on Xn with sup norm. Next X = L (Ω) and Xn with ` norm. Then, X ui,j − ui−1,j ui,j − ui,j−1 (−∆ u , u ) = | |2 + | |2 ≥ 0 h n n h h i,j 2 and thus ∆h is dissipative on Xn with ` norm. Example 2 (Galerkin method) Let V ⊂ H = H∗ ⊂ V ∗ is the Gelfand triple. Consider the parabolic equation d ρ( u , φ) = a(u , φ) (4.45) dt n n for all φ ∈ V , where the ρ is a symmetric mass form

2 ρ(φ, φ) ≥ c |φ|H and a is a bounded coercive bilnear form on V × V such that

2 a(φ, φ) ≥ δ |φ|V . Define A by ρ(Au, φ) = a(u, φ) for all φ ∈ V. By the Lax-Milgram theorem

(λI − A)u = f ∈ H

99 has a unique solution satisfying

λρ(u, φ) − a(u, φ) = (f, φ)H for all φ ∈ V . Let dom(A) = (I − A)−1H. Assume

X n n Vn = {u = akφk , φk ∈ V } is dense in V

Consider the Galerkin method, i.e. un(t) ∈ Vn satisfies d ρ( u (t), φ) = a(u , φ). dt n n

Since for u = (λ I − A)u andu ¯n ∈ Vn

λρ(un, φ) + a(un, φ) = (f, φ) for φ ∈ Vn

λρ(¯un, φ) + a(¯un, φ) = λρ(¯un − u, φ) + a(¯un − u, φ) + (f, φ) for φ ∈ Vn

λρ(un − u¯n, φ) + a(un − u¯n, φ) = λρ(¯un − u, φ) + a(¯un − u, φ). Thus, M |u − u¯ | ≤ |u¯ − u| . n n δ n V Example 2 (Discontinuous Galerkin method) Consider the parabolic system for u = ~u ∈ L2(Ω)d ∂ u = ∇ · (a(x)∇u) + c(x)u ∂t where a ∈ Rd × d is symmetric and

2 2 d a|ξ| ≤ (ξ, a(x), ξ)Rd ≤ a|ξ| , ξ ∈ R for 0 < a ≤ a < ∞. The region Ω is dived into n non-overlapping sub-domains Ωi with boundaries ∂Ωi such that Ω = ∪Ωi. At the interface Γij = ∂Ωi ∩ ∂Ωj define

[[u]] = u| − u| ∂Ωi ∂Ωj

<< u >>= 1 (u| + u| ). 2 ∂Ωi ∂Ωj

The approximate solution uh(t) in

2 Vh = {uh ∈ L (Ω) : uh is linear on Ωi}.

Define the bilinear for on Vh × Vh X Z X Z β a (u, v) = (a(x)∇u, ∇v) dx− (<< n·(a∇u) >> [[v]]± << n·(a∇v) >> [[u]]+ [[u]][[v]] ds), h h i Ωi i>j Γij whee h is the meshsize and β > 0 is sufficiently large. If + on the third term ah is symmetric and for the case − then ah enjoys the coercivity X Z ah(u, u) ≥ (a(x)∇u, ∇u) dx, ∈ u ∈ Vh, i Ωi

100 regardless of β > 0. Example 3 (Population dynaims) The transport equation

∂p ∂p ∂t + ∂x + m(x)p(x, t) = 0

p(0, t) = R β(x)p(x, t) dx

Define the difference approximation

pi − pi−1 X A p = (− − m(x )p , 1 ≤ i ≤ n), p = β p n h i i 0 i i i Then, X (Anp, sign0(p)) ≤ ( mi − βi)|pi| ≤ 0

1 Thus, An on L (0, 1) is dissipative.

(A, Enφ) − (PnAn, φ). Example 4 (Yee’s scheme) Consider the two dimensional Maxwell’s equation. Consider the staggered grid; i.e. 1 2 E = (E 1 ,E 1 ) is defined at the the sides and H = Hi− 1 ,j− 1 is defined at the i− 2 ,j i,j− 2 2 2 center of the cell Ωi,j = ((i − 1)h, ih) × ((j − 1)h, jh).

H 1 1 −H 1 1 d 1 i− 2 ,j+ 2 i− 2 ,j− 2 i− 1 ,j dt E 1 = − h 2 i− 2 ,j

H 1 1 −H 1 1 d 2 i+ 2 ,j+ 2 i− 2 ,j−+ 2 i,j+ 1 dt E 1 = h (4.46) 2 i,j+ 2

2 2 1 1 E 1 −E 1 E 1 −E 1 d i,j− 2 i−1,j− 2 i− 2 ,j i− 2 ,j−1 µ 1 1 H 1 1 = − , i− 2 ,j− 2 dt i− 2 ,j− 2 h h 1 2 where E 1 = 0, j = 0, j = N and E 1 = 0, i = 0, j = N. i− 2 ,j i,j− 2 ,j Since

N N 2 H 1 1 − H 1 1 H 1 1 − H 1 1 h X X i− 2 ,j+ 2 i− 2 ,j− 2 1 i+ 2 ,j+ 2 i− 2 ,j−+ 2 − E 1 + i− ,j 1 h 2 E i,j+ i=1 j=1 2

2 2 1 1 E 1 − E 1 E 1 − E 1 i,j− 2 i−1,j− 2 i− 2 ,j i− 2 ,j−1 +( − )Hi− 1 ,j− 1 = 0 h h 2 2

(4.46) is uniformly dissipative. The range condition λ I −Ah = (f, g) ∈ Xh is equivalent to the minimization for E 2 2 1 1 E 1 − E 1 E 1 − E 1 1 1 2 1 1 i,j− 2 i−1,j− 2 2 i− 2 ,j i− 2 ,j−1 2 min (i− 1 ,jEi− 1 ,j + i,j+ 1 Ei,j+ 1 ) + (| | + | | ) 2 2 2 2 2 2 µi,j h h

g 1 1 −g 1 1 H 1 1 −H 1 1 1 1 i− 2 ,j+ 2 i− 2 ,j− 2 1 2 1 i+ 2 ,j+ 2 i− 2 ,j−+ 2 2 −(f 1 − ,E 1 − (f 1 + )E 1 . µi,j h µi,j h i− 2 ,j i− 2 ,j i,j+ 2 i,j+ 2 Example 5 Legende-Tau method

101 5 Dissipative Operators and Semigroup of Non- linear Contractions

In this section we consider du ∈ Au(t), u(0) = u ∈ X dt 0 for the dissipative mapping A on a Banach space X. Definition (Dissipative) A mapping A on a Banach space X is dissipative if

|x1 − x2 − λ (y1 − y2)| ≥ |x1 − x2| for all λ > 0 and [x1, y1], [x2, y2] ∈ A, or equivalently

hy1 − y2, x1 − x2i− ≤ 0 for all [x1, y1], [x2, y2] ∈ A. and if in addition R(I − λA) = X, then A is m-dissipative. In particular, it follows that if A is dissipative, then for λ > 0 the operator (I − λ A)−1 is a single-valued and nonexpansive on R(I − λ A), i.e.,

|(I − λ A)−1x − (I − λ A)−1y| ≤ |x − y| for all x, y ∈ R(I − λ A).

Define the resolvent and Yosida approximation A by

−1 Jλx = (I − λ A) x ∈ dom (A), x ∈ dom (Jλ) = R(I − λ A) (5.1) −1 Aλ = λ (Jλx − x), x ∈ dom (Jλ).

We summarize some fundamental properties of Jλ and Aλ in the following theorem. Theorem 1.4 Let A be an ω− dissipative subset of X × X, i.e.,

|x1 − x2 − λ (y1 − y2)| ≥ (1 − λω) |x1 − x2|

−1 for all 0 < λ < ω and [x1, y1], [x2, y2] ∈ A and define kAxk by

kAxk = inf{|y| : y ∈ Ax}.

Then for 0 < λ < ω−1, −1 (i) |Jλx − Jλy| ≤ (1 − λω) |x − y| for x, y ∈ dom (Jλ). (ii) Aλx ∈ AJλx for x ∈ R(I − λ A). −1 (iii) For x ∈ dom(Jλ) ∩ dom (A) |Aλx| ≤ (1 − λω) kAxk and thus |Jλx − x| ≤ λ(1 − λω)−1 kAxk. (iv) If x ∈ dom (Jλ), λ, µ > 0, then µ λ − µ x + J x ∈ dom (J ) λ λ λ µ and µ λ − µ  J x = J x + J x . λ µ λ λ λ

102 −1 (v) If x ∈ dom (Jλ)∩dom (A) and 0 < µ ≤ λ < ω , then (1−λω)|Aλx| ≤ (1−µω) |Aµy| −1 −1 −1 (vi) Aλ is ω (1 − λω) -dissipative and for x, y ∈ dom(Jλ) |Aλx − Aλy| ≤ λ (1 + (1 − λω)−1) |x − y|.

Proof: (i) − −(ii) If x, y ∈ dom (Jλ) and we set u = Jλx and v = Jλy, then there exist uˆ andv ˆ such that x = u − λ uˆ and y = v − λ vˆ. Thus, from (1.8)

−1 −1 |Jλx − Jλy| = |u − v| ≤ (1 − λω) |u − v − λ (ˆu − vˆ)| = (1 − λω) |x − y|.

−1 Next, by the definition Aλx = λ (u − x) =u ˆ ∈ Au = AJλx. (iii) Let x ∈ dom (Jλ)∩dom (A) andx ˆ ∈ Ax be arbitrary. Then we have Jλ(x−λ xˆ) = x since x − λ xˆ ∈ (I − λ A)x. Thus,

−1 −1 −1 −1 −1 |Aλx| = λ |Jλx−x| = λ |Jλx−Jλ(x−λ xˆ)| ≤ (1−λω) λ |x−(x−λ xˆ)| = (1−λω) |xˆ|. which implies (iii). (iv) If x ∈ dom (Jλ) = R(I − λ A) then we have x = u − λ uˆ for [u, uˆ] ∈ A and thus u = Jλx. For µ > 0 µ λ − µ µ λ − µ x + J x = (u − λ uˆ) + u = u − µ uˆ ∈ R(I − µ A) = dom (J ). λ λ λ λ λ µ and µ λ − µ  J x + J x = J (u − µ uˆ) = u = J x. µ λ λ λ µ λ (v) From (i) and (iv) we have

λ |Aλx| = |Jλx − x| ≤ |Jλx − Jλy| + |Jµx − x|   µ λ − µ ≤ Jµ x + Jλx − Jµx + |Jµx − x| λ λ

−1 µ λ − µ ≤ (1 − µω) x + Jλx − x + |Jµx − x| λ λ

−1 = (1 − µω) (λ − µ) |Aλx| + µ |Aµx|, which implies (v) by rearranging. (vi) It follows from (i) that for ρ > 0 ρ ρ |x − y − ρ (A x − A y)| = |(1 + )(x − y) − (J x − J y)| λ λ λ λ λ λ ρ ρ ≥ (1 + ) |x − y| − |J x − J y| λ λ λ λ ρ ρ ≥ ((1 + ) − (1 − λω)−1) |x − y| = (1 − ρ ω(1 − λω)−1) |x − y|. λ λ

The last assertion follows from the definition of Aλ and (i).  Theorem 1.5 (1) A dissipative set A ⊂ X × X is m-dissipative, if and only if

R(I − λ0 A) = X for some λ0 > 0.

103 (2) An m-dissipative mapping is maximal dissipative, i.e., all dissipative set containing A in X × X coincide with A. (3) If X = X∗ = H is a Hilbert space, then the notions of the maximal dissipative set and m-dissipative set are equivalent.

Proof: (1) Suppose R(I − λ0 A) = X. Then it follows from Theorem 1.4 (i) that Jλ0 is contraction on X. We note that λ  λ  I − λ A = 0 I − (1 − 0 ) J (I − λ A) (5.2) λ λ λ0 0 for 0 < λ < ω−1. For given x ∈ X define the operator T : X → X by λ T y = x + (1 − 0 ) J y, y ∈ X λ λ0 Then λ |T y − T z| ≤ |1 − 0 | |y − z| λ where |1 − λ | < 1 if 2λ > λ . By Banach fixed-point theorem the operator T has a λ0 0 λ0 unique fixed point z ∈ X, i.e., x = (I − (1 − λ Jλ0 )z. Thus, λ x ∈ (I − (1 − 0 ) J )(I − λ A)dom (A). λ λ0 0

λ0 and it thus follows from (5.2) that R(I − λ A) = X if λ > 2 . Hence, (1) follows from applying the above argument repeatedly. (2) Assume A is m-dissipative. Suppose A˜ is a dissipative set containing A. We need to show that A˜ ⊂ A. Let [x, xˆ] ∈ A˜ Since x − λ xˆ ∈ X = R(I − λ A), for λ > 0, there exists a [y, yˆ] ∈ A such that x − λ xˆ = y − λ yˆ. Since A ⊂ A˜ it follows that [y, yˆ] ∈ A and thus |x − y| ≤ |x − y − λ (ˆx − yˆ)| = 0. Hence, [x, xˆ] = [y, yˆ] ∈ A. (3) It suffices to show that if A is maximal dissipative, then A is m-dissipative. We use the following extension lemma, Lemma 1.6. Let y be any element of H. By Lemma 1.6, taking C = H, we have that there exists x ∈ H such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. and thus (ξ − x, η − (x − y)) ≤ 0 for all [ξ, η] ∈ A. Since A is maximal dissipative, this implies that [x, x − y] ∈ A, that is x − y ∈ Ax, and therefore y ∈ R(I − H).  Lemma 1.6 Let A be dissipative and C be a closed, convex, non-empty subset of H such that dom (A) ∈ C. Then for every y ∈ H there exists x ∈ C such that (ξ − x, η − x + y) ≤ 0 for all [ξ, η] ∈ A. Proof: Without loss of generality we can assume that y = 0, for otherwise we define Ay = {[ξ, η + y]:[ξ, η] ∈ A} with dom (Ay) = dom (A). Since A is dissipative if and only if Ay is dissipative , we can prove the lemma for Ay. For [ξ, η] ∈ A, define the set C([ξ, η]) = {x ∈ C :(ξ − x, η − x) ≤ 0}. T Thus, the lemma is proved if we can show that [ξ,η]∈A C([ξ, η]) is non-empty.

104 5.0.1 Properties of m-dissipative operators In this section we discuss some properties of m-dissipative sets. Lemma 1.7 Let X∗ be a strictly convex Banach space. If A is maximal dissipative, then Ax is a closed convex set of X for each x ∈ dom (A). Proof: It follows from Lemma that the duality mapping F is single-valued. First, we show that Ax is convex. Letx ˆ1, xˆ2 ∈ Ax and setx ˆ = αxˆ1 + (1 − α)x ˆ2 for 0 ≤ α ≤ 1. Then, Since A is dissipative, for all [y, yˆ] ∈ A

Re hxˆ − y,ˆ F (x − y)i = α Re hxˆ1 − y,ˆ F (x − y)i + (1 − α) Re hxˆ2 − y,ˆ F (x − y)i ≤ 0. Thus, if we define a subset A˜ by   Az if z ∈ dom (A) \{x} Az˜ =  Ax ∪ {xˆ} if z = x, then A˜ is a dissipative extension of A and dom (A˜) = dom (A). Since A is maximal dissipative, it follows that Ax˜ = Ax and thusx ˆ ∈ Ax as desired. Next, we show that Ax is closed. Letx ˆn ∈ Ax and limn→∞ xˆn =x ˆ. Since A is dissipative, Re hxˆn − y,ˆ x − yi ≤ 0 for all [y, yˆ] ∈ A. Letting n → ∞, we obtain Re hxˆ − y,ˆ x − yi ≤ 0. Hence, as shown abovex ˆ ∈ Ax as desired. 

Definition 1.4 A subset A of X × X is said to be demiclosed if xn → x and yn * y and [xn, yn] ∈ A imply that [x, y] ∈ A. A subset A is closed if [xn, yn], xn → x and yn → y imply that [x, y] ∈ A. Theorem 1.8 Let A be m-dissipative. Then the followings hold. (i) A is closed. + (ii) If {xλ} ⊂ X such that xλ → x and Aλxλ → y as λ → 0 , then [x, y] ∈ A.

Proof: (i) Let [xn, xˆn] ∈ A and (xn, xˆn) → (x, xˆ) in X × X. Since A is dissipative Re hxˆn − y,ˆ xn − yii ≤ 0 for all [y, yˆ] ∈ A. Since h·, ·ii is lower semicontinuous, letting n → ∞, we obtain Re hxˆ − y,ˆ x − yii ≤ for all [y, yˆ] ∈ A. Then A1 = [x, xˆ] ∪ A is a dissipative extension of A. Since A is maximal dissipative, A1 = A and thus [x, xˆ] ∈ A. Hence, A is closed. (ii) Since {Aλx} is a bounded set in X, by the definition of Aλ, lim |Jλxλ − xλ| → 0 + and thus Jλxλ → x as λ → 0 . But, since Aλxλ ∈ AJλxλ, it follows from (i) that [x, y] ∈ A.  Theorem 1.9 Let A be m-dissipative and let X∗ be uniformly convex. Then the followings hold. (i) A is demiclosed. + (ii) If {xλ} ⊂ X such that xλ → x and {|Aλx|} is bounded as λ → 0 , then x ∈ dom (A). Moreover, if for some subsequence Aλn xn * y, then y ∈ Ax. (iii) limλ→0+ |Aλx| = kAxk.

Proof: (i) Let [xn, xˆn] ∈ A be such that lim xn = x and w − limx ˆn =x ˆ as n → ∞. Since X∗ is uniformly convex, from Lemma the duality mapping is single-valued and uniformly continuous on the bounded subsets of X. Since A is dissipative Re hxˆn − y,ˆ F (xn − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, letting n → ∞, we obtain Re hxˆ − y,ˆ F (x − y)i ≤ 0 for all [y, yˆ] ∈ A. Thus, [x, xˆ] ∈ A, by the maximality of A.

105 Definition 1.5 The minimal section A0 of A is defined by

A0x = {y ∈ Ax : |y| = kAxk} with dom (A0) = {x ∈ dom (A): A0x is non-empty}.

Lemma 1.10 Let X∗ be a strictly convex Banach space and let A be maximal dissi- pative. Then, the followings hold. (i) If X is strictly convex, then A0 is single-valued. (ii) If X reflexible, then dom (A0) = dom (A). (iii) If X strictly convex and reflexible, then A0 is single-valued and dom (A0) = dom (A). Theorem 1.11 Let X∗ is a uniformly convex Banach space and let A be m-dissipative. Then the followings hold. 0 (i) limλ→0+ F (Aλx) = F (A x) for each x ∈ dom (A). Moreover, if X is also uniformly convex, then 0 (ii) limλ→0+ Aλx = A x for each x ∈ dom (A). Proof: (1) Let x ∈ dom (A). By (ii) of Theorem 1.2

|Aλx| ≤ kAxk

∗ Since {Aλx} is a bounded sequence in a reflexive Banach space (i.e., since X is uniformly convex, X∗ is reflexive and so is X), there exists a weak convergent sub- sequence {Aλn x}. Now we set y = w − limn→∞ Aλn x. Since from Theorem 1.2

Aλn x ∈ AJλn x and limn→∞ Jλn x = x and from Theorem 1.10 A is demiclosed, it follows that [x, y] ∈ A. Since by the lower-semicontinuity of norm this implies

kAxk ≤ |y| lim inf |Aλn x| ≤ lim sup |Aλn x| ≤ kAxk, n→∞ n→∞

0 we have |y| = kAxk = limn→∞ |Aλn x| and thus y ∈ A x. Next, since |F (Aλn x)| = ∗ |Aλn x| ≤ kAxk, F (Aλn x) is a bounded sequence in the reflexive Banach space X ∗ and has a weakly convergent subsequence F (Aλk x) of F (Aλn x). If we set y = w − limk→∞ F (Aλk x), then it follows from the dissipativity of A that

−1 Re hAλk x − y, F (Aλk x)i = λn Re hAλk x − y, F (Jλk xx)i ≤ 0,

2 2 ∗ or equivalently |Aλk x| ≤ Re hy, F (Aλn x)i. Letting k → ∞, we obtain |y| ≤ Re hy, y i. Combining this with

∗ |y | ≤ lim |F (Aλ x)| = lim |Aλ x| = |y|, k→∞ k k→∞ k we have |y∗| ≤ Re hy, y∗i ≤ |hy, y∗i| ≤ |y||y∗| ≤ |y|2. Hence, hy, y∗i = |y|2 = |y∗|2 ∗ and we have y = F (y). Also, limk→∞ |F (Aλk x)| = |y| = |F (y)|. It thus follows from the uniform convexity of X∗ that

lim F (Aλ x) = F (y). k→∞ k

106 Since Ax is a closed convex set of X from Theorem , we can show that x → F (A0x) is single-valued. In fact, if C is a closed convex subset of X, the y is an element of minimal norm in C, if and only if |y| ≤ |(1 − α)y + α z| for all z ∈ C and 0 ≤ α ≤ 1. Hence, hz − y, yi+ ≥ 0. and from Theorem 1.10 0 ≤ Re hz − y, fi = Re hz, fi − |y|2 (5.3)

0 for all z ∈ C and f ∈ F (y). Now, let y1, y2 be arbitrary in A x. Then, from (5.3) 2 |y1| ≤ Re hy2,F (y1)i ≤ |y1||y2| 2 which implies that hy2,F (y1)i = |y2| and |F (y1)| = |y2|. Therefore, F (y1) = F (y2) as desired. Thus, we have shown that for every sequence {λ} of positive numbers that converge to zero, the sequence {F (Aλx)} has a subsequence that converges to the same 0 0 limit F (A x). Therefore, limλ→0 F (Aλx) → F (A x). Furthermore, we assume that X is uniformly convex. We have shown above that for x ∈ dom (A) the sequence {Aλ} contains a weak convergent subsequence {Aλn x} 0 and if y = w − limn→∞ Aλn x then [x, y] ∈ A and |y| = limn→∞ |Aλn x|. But since X is uniformly convex, it follows from Theorem 1.10 that A0 is single-valued and thus 0 0 0 y = A x. Hence, w − limn→∞ Aλn x = A x and limn→∞ |Aλn x| = |A x|. Since X is 0 uniformly convex, this implies that limn→∞ Aλn x = A x.  Theorem 1.12 Let X is a uniformly convex Banach space and let A be m-dissipative. Then dom (A) is a convex subset of X. Proof: It follows from Theorem 1.4 that

|Jλx − x| ≤ λ kAxk for x ∈ dom (A) + Hence |Jλx − x| → 0 as λ → 0 . Since Jλx ∈ dom (A) for X ∈ X, it follows that + dom (A) = {x ∈ X : |Jλx − x| → 0 as λ → 0 }.

Let x1, x2 ∈ dom (A) and 0 ≤ α ≤ 1 and set

x = α x1 + (1 − α) x2. Then, we have |Jλx − x1| ≤ |x − x1| + |Jλx1 − x1| (5.4) |Jλx − x2| ≤ |x − x2| + |Jλx2 − x2| where x − x1 = (1 − α)(x2 − x1) and x − x2 = α (x1 − x2). Since {Jλx} is a bounded set and a uniformly convex Banach space is reflexive, it follows that there exists a subsequence {Jλn x} that converges weakly to z. Since the norm is weakly lower semi- continuous, letting n → ∞ in (5.4) with λ = λn, we obtain

|z − x1| ≤ (1 − α) |x1 − x2|

|z − x2| ≤ α |x1 − x2|

107 Thus, |x1 − x2| = |(x1 − z) + (z − x2)| ≤ |x1 − z| + |z − x2| ≤ |x1 − x2| and therefore |x1 −z| = (1−α) |x1 −x2|, |z −x2| = α |x1 −x2| and |(x1 −z)+(z −x2)| =

|x1 − x2|. But, since X is uniformly convex we have z = x and w − lim Jλn x = x as n → ∞. Since we also have

|x − x1| ≤ lim inf |Jλ x − Jλ x1| ≤ |x − x1| n→∞ n n

|Jλn x − Jλn x1| → |x − x1| and w − lim Jλn x − Jλn x1 = x − x1 as n → ∞. Since X is uniformly convex, this implies that limλ→0+ Jλx = x and x ∈ dom (A). 

5.0.2 Generation of Nonlinear Semigroups In this section, we consider the generation of nonlinear semigroup by Crandall-Liggett on a Banach space X. Definition 2.1 Let X0 be a subset of X. A semigroup S(t), t ≥ of nonlinear contractions on X0 is a function with domain [0, ∞) × X0 and range in X0 satisfying the following conditions:

S(t + s)x = S(t)S(s)x and S(0)x = x for x ∈ X0, t, s ≥ 0

t → S(t)x ∈ X is continuous

|S(t)x − S(t)y| ≤ |x − y| for t ≥ 0, x, y ∈ X0

−1 Let A be a ω-dissipative operator and Jλ = (I − λ A) is the resolvent. The following estimate plays an essential role in the Crandall-Liggett generation theory.

Lemma 2.1 Assume a sequence {an,m} of positive numbers satisfies

an,m ≤ α an−1,m−1 + (1 − α) an−1,m (5.5)

µ and a0,m ≤ m λ and an,0 ≤ n µ for λ ≥ µ > 0 and α = λ . Then we have the estimate

2 2 1 2 1 an,m ≤ [(mλ − nµ) + mλ ] 2 + [(mλ − nµ) + nλµ] 2 . (5.6)

Proof: From the assumption, (7.2) holds for either m = 0 or n = 0. We will use the induction in n, m, that is if (7.2) holds for (n + 1, m) when (7.2) is true for (n, m) and (n, m − 1), then (7.2) holds for all (n, m). Let β = 1 − α. We assume that (7.2) holds for (n, m) and (n, m − 1). Then, by (7.1) and Cauchy-Schwarz inequality 1 2 2 1 αx + βy ≤ (α + β) 2 (αx + βy ) 2

an+1,m ≤ α an,m−1 + β an,m

2 2 1 2 1 ≤ α ([((m − 1)λ − nµ) + (m − 1)λ ] 2 + [((m − 1)λ − nµ) + nλµ] 2 ) 2 2 1 2 1 +β ([(mλ − nµ) + mλ ] 2 + [(mλ − nµ) + nλµ] 2 )

1 2 2 2 2 1 = (α + β) 2 (α [((m − 1)λ − nµ) + (m − 1)λ ] + β [(mλ − nµ) + mλ ]) 2 1 2 2 1 +(α + β) 2 (α [((m − 1)λ − nµ) + nλµ] + β [(mλ − nµ) + nλµ]) 2

2 2 1 2 1 ≤ [(mλ − (n + 1)µ) + mλ ] 2 + [(λ − (n + 1)µ) + (n + 1)λµ] 2 .

108 Here, we used α + β = 1, αλ = µ and α [((m − 1)λ − nµ)2 + (m − 1)λ2] + β [(mλ − nµ)2 + mλ2])

≤ (mλ − nµ)2 + mλ2 − αλ(mλ − nµ) = (mλ − (n + 1)µ)2 + mλ2 − µ2

α [((m − 1)λ − nµ)2 + nλµ] + β [(mλ − nµ)2 + nλµ]

2 2 2 ≤ (mλ − nµ) + (n + 1)λµ − 2αλ(mλ − nµ) ≤ (mλ − (n + 1)µ) + (n + 1)λµ − µ . Theorem 2.2 Assume A be a dissipative subset of X × X and satisfies the range condition dom (A) ⊂ R(I − λ A) for all sufficiently small λ > 0. (5.7) Then, there exists a semigroup of type ω on S(t) on dom (A) that satisfies for x ∈ dom (A) −[ t ] S(t)x = lim (I − λ A) λ x, t ≥ 0 (5.8) λ→0+ and |S(t)x − S(t)x| ≤ |t − s| kAxk for x ∈ dom (A), and t, s ≥ 0.

Proof: First, note that from (7.3) dom (A) ⊂ dom (Jλ). Let x ∈ dom (A) and set n m an,m = |Jµ x − Jλ x| for n, m ≥ 0. Then, from Theorem 1.4

m 2 m−1 m a0,m = |x − Jλ x| ≤ |x − Jλx| + |Jλx − Jλx| + ··· + |Jλ x − Jλ x|

≤ m |x − Jλx| ≤ mλ kAxk.

n Similarly, an,0 = |Jµ x − x| ≤ nµ kAxk. Moreover, µ λ − µ  a = |J nx − J mx| ≤ |J nx − J J m−1x + J mx | n,m µ λ µ µ λ λ λ λ

µ λ − µ ≤ |J n−1x − J m−1x| + |J n−1x − J mx| λ µ λ λ µ λ

= α an−1,m−1 + (1 − α) an−1,m. It thus follows from Lemma 2.1 that

t t [ µ ] [ ] 1 λ 2 2 |Jµ x − Jλ x| ≤ 2(λ + λt) kAxk. (5.9)

t [ λ ] + Thus, Jλ x converges to S(t)x uniformly on any bounded intervals, as λ → 0 . Since t [ λ ] Jλ is non-expansive, so is S(t). Hence (5.8) holds. Next, we show that S(t) satisfies the semigroup property S(t + s)x = S(t)S(s)x for x ∈ dom (A) and t, s ≥ 0. Letting µ → 0+ in (??), we obtain

[ t ] 1 λ 2 2 |T (t)x − Jλ x| ≤ 2(λ + λt) . (5.10)

s [ λ ] for x ∈ dom (A). If we let x = Jλ z, then x ∈ dom (A) and

[ s ] [ t ] [ s ] 1 λ λ λ 2 2 |S(t)Jλ z − Jλ Jλ z| ≤ 2(λ + λt) kAzk (5.11)

109 s [ λ ] t+s t s where we used that kAJλ zk ≤ kAzk for z ∈ dom (A). Since [ λ ] − ([ λ ] + [ λ ]) equals 0 or 1, we have t+s t s [ λ ] [ λ ] [ λ ] |Jλ z − Jλ Jλ z| ≤ |Jλz − z| ≤ λ kAzk. (5.12) It thus follows from (5.10)–(5.12) that

t+s t+s t s [ λ ] [ λ ] [ λ ] [ λ ] |S(t + s)z − S(t)S(s)z| ≤ |S(t + s)x − Jλ z| + |Jλ z − Jλ Jλ z|

t s s s [ λ ] [ λ ] [ λ ] [ λ ] +|Jλ Jλ z − S(t)Jλ z| + |S(t)Jλ z − S(t)S(s)z| → 0 as λ → 0+. Hence, S(t + s)z = S(t)S(s)z. Finally, since [ t ] t |J λ x − x| ≤ [ ] |J x − x| ≤ t kAxk λ λ λ we have |S(t)x − x| ≤ t kAxk for x ∈ dom (A). From this we obtain |S(t)x − x| → 0 as t → 0+ for x ∈ dom (A) and also

|S(t)x − S(s)x| ≤ |S(t − s)x − x| ≤ (t − s) kAxk for x ∈ dom (A) and t ≥ s ≥ 0. 

6 Cauchy Problem

Definition 3.1 Let x0 ∈ X and ω ∈ R. Consider the Cauchy problem d u(t) ∈ Au(t), u(0) = x . (6.1) dt 0 (1) A continuous function u(t) : [0,T ] → X is called a strong solution of (6.1) if u(t) is Lipschitz continuous with u(0) = x0, strongly differentiable a.e. t ∈ [0,T ], and (6.1) holds a.e. t ∈ [0,T ]. (2) A continuous function u(t) : [0,T ] → X is called an integral solution of type ω of (6.1) if u(t) satisfies

Z t |u(t) − x| − |u(tˆ) − x| ≤ (ω |u(s) − x| + hy, u(s) − xi+) ds (6.2) tˆ for all [x, y] ∈ A and t ≥ tˆ∈ [0,T ]. Theorem 3.1 Let A be a dissipative subset of X × X. Then, the strong solution to (6.1) is unique. Moreover if the range condition (7.3) holds, then then the strong solution u(t) : [0, ∞) → X to (6.1) is given by

−[ t ] u(t) = lim (I − λ A) λ x for x ∈ dom (A) and t ≥ 0. λ→0+

Proof: Let ui(t), i = 1, 2 be the strong solutions to (6.1). Then, t → |u1(t) − u2(t)| is Lipschitz continuous and thus a.e. differentiable t ≥ 0. Thus d |u (t) − u (t)|2 = 2 hu0 (t) − u0 (t), u (t) − u (t)i dt 1 2 1 2 1 2 i

110 0 a.e. t ≥ 0. Since ui(t) ∈ Aui(t), i = 1, 2 from the dissipativeness of A, we have d 2 dt |u1(t) − u2(t)| ≤ 0 and therefore Z t 2 d 2 |u1(t) − u2(t)| ≤ |u1(t) − u2(t)| dt ≤ 0, 0 dt which implies u1 = u2. −[ t ] −1 For 0 < 2λ < s let uλ(t) = (I − λ A) λ x and define gλ(t) = λ (u(t) − u(t − λ)) − 0 u (t) a.e. t ≥ λ. Since limλ→0+ |gλ| = 0 a.e. t > 0 and |gλ(t)| ≤ 2M for a.e. t ∈ [λ, s], R s where M is a Lipschitz constant of u(t) on [0, s], it follows that limλ→0+ λ |gλ(t)| dt = 0 by Lebesgue dominated convergence theorem. Next, since

0 u(t − λ) + λ gλ(t) = u(t) − λ u (t) ∈ (I − λ A)u(t),

−1 we have u(t) = (I − λ A) (u(t − λ) + λ gλ(t)). Hence,

−[ t−λ ] |uλ(t) − u(t)| ≤ |(I − λ A) λ x − u(t − λ) − λ gλ(t)|

≤ |uλ(t − λ) − u(t − λ)| + λ |gλ(t)| a.e. t ∈ [λ, s]. Integrating this on [λ, s], we obtain

Z s Z λ Z s −1 −1 λ |uλ(t) − u(t)| dt ≤ λ |uλ(t) − u(t)| dt + |gλ(t)| dt. s−λ 0 λ

Letting λ → 0+, it follows from Theorem 2.2 that |S(s)x−u(s)| = 0 since u is Lipschitz continuous, which shows the desired result.  In general, the semigroup S(t) generated on dom (A) in Theorem 2.2 in not neces- sary strongly differentiable. In fact, an example of an m-dissipative A satisfying (7.3) is given, for which the semigroup constructed in Theorem 2.2 is not even weakly dif- ferentiable for all t ≥ 0. Hence, from Theorem 3.1 the corresponding Cauchy problem (6.1) does not have a strong solution. However, we have the following. Theorem 3.2 Let A be an ω-dissipative subset satisfying (7.3) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, the followings hold. (1) u(t) = S(t)x on dom (A) defined in Theorem 2.2 is an integral solution of type ω to the Cauchy problem (6.1). (2) If v(t) ∈ C(0,T ; X) be an integral of type ω to (6.1), then |v(t)−u(t)| ≤ eωt |v(0)− u(0)|. (3) The Cauchy problem (6.1) has a unique solution in dom (A) in the sense of Definition 2.1.

Proof: A simple modification of the proof of Theorem 2.2 shows that for x0 ∈ dom (A)

−[ t ] S(t)x0 = lim (I − λ A) λ x0 λ→0+ exists and defines the semigroup S(t) of nonlinear ω-contractions on dom (A), i.e.,

|S(t)x − S(t)y| ≤ eωt |x − y| for t ≥ 0 and x, y ∈ dom (A).

111 For x0 ∈ dom (A) we define for λ > 0 and k ≥ 1

k −1 k k−1 k−1 k yλ = λ (Jλ x0 − Jλ x0) = AλJλ x0 ∈ AJλ . (6.3)

k k Since A is ω-dissipative, hyλ,k − y, Jλ x0 − x)i− ≤ ω |Jλ x0 − x)| for [x, y] ∈ A. Since from Lemma 1.1 (4) hy, xi− − hz, xi+ ≤ hy − z, xi−, it follows that

k k k k hyλ,Jλ x0 − xi− ≤ ω |Jλ x0 − x)| + hy, Jλ x0 − xi+ (6.4)

Since from Lemma 1.1 (3) hx + y, xi− = |x| + hy, xi−, we have

k k k k−1 k k k−1 hλyλ,Jλ x0 − x)i− = |Jλ x0 − x| + h−(Jλ − x),Jλ x0 − xi≥|Jλ x0 − x)| − |Jλ x0 − x|. It thus follows from (6.4) that

k k−1 k k |Jλ x0 − x| − |Jλ x0 − x| ≤ λ (ω |Jλ x0 − x| + hy, Jλ x0 − xi+).

[ t ] k Since J λ = Jλ on t ∈ [kλ, (k + 1)λ), this inequality can be written as

Z (k+1)λ t t k k−1 [ λ [ λ ] |Jλ x0 − x| − |Jλ x0 − x| ≤ (ω |Jλ x0 − x| + hy, Jλ x0 − xi+ dt. kλ

tˆ t Hence, summing up this in k from k = [ λ ] + 1 to [ λ ] we obtain

t t tˆ Z [ λ ]λ s s [ λ ] [ λ ] [ λ ] [ λ ] |J x0 − x| − |J x0 − x| ≤ (ω |J x0 − x| + hy, J x0 − xi+ ds. λ λ tˆ λ λ [ λ ]λ

s [ λ ] −k kλω Since |Jλ x0| ≤ (1 − λω) |x0| ≤ e |x0|, by Lebesgue dominated convergence the- + orem and the upper semicontinuity of h·, ·i+, letting λ → 0 we obtain Z t |S(t)x0 − x| − |S(tˆ)x0 − x| ≤ (ω |S(s)x0 − x| + hy, S(t)x0 − xi+) ds (6.5) tˆ for x0 ∈ dom (A). Similarly, since S(t) is Lipschitz continuous on dom (A), again by Lebesgue dominated convergence theorem and the upper semicontinuity of h·, ·i+, (6.5) holds for all x0 ∈ dom (A). λ k (2) Let v(t) ∈ C(0,T ; X) be an integral of type ω to (6.1). Since [Jk x0, yλ] ∈ A, it follows from (6.2) that

Z t k ˆ k k k k |v(t) − Jλ x0| − |v(t) − Jλ x0| ≤ (ω |v(s) − Jλ x0| + hyλ, v(s) − Jλ x0i+) ds. (6.6) tˆ

k k k−1 Since λ yλ = −(v(s) − Jλ x0) + (v(s) − Jλ x0) and from Lemma 1.1 (3) h−x + y, xi+ = −|x| + hy, xi+, we have

k k k k−1 k k k−1 hλyλ, v(s)−Jλ x0i = −|v(s)−Jλ x0|+hv(s)−Jλ x0, v(s)−Jλ x0i+ ≤ −|v(s)−Jλ x0|+|v(s)−Jλ x0|. Thus, from (6.6)

Z t k ˆ k k k k−1 (|v(t)−Jλ x0|−|v(t)−Jλ x0|)λ ≤ (ωλ |v(s)−Jλ x0|−|v(s)−Jλ x0|+|v(s)−Jλ x0|) ds tˆ

112 τˆ τ Summing up the both sides of this in k from [ λ ] + 1 to [ λ ], we obtain

τ Z [ λ ]λ σ σ [ λ ] [ λ ] (|v(t) − J x0| − |v(tˆ− J x0| dσ τˆ λ λ [ λ ]λ

τ Z t τ τˆ Z [ λ ]λ σ [ λ ] [ λ ] [ λ ] ≤ (−|v(s) − Jλ x0| + |v(s) − Jλ x0| + ω |v(s) − Jλ x0| dσ) ds. ˆ τˆ t [ λ ]λ Now, by Lebesgue dominated convergence theorem, letting λ → 0+ Z τ Z t (|v(t) − u(σ)| − |v(tˆ− u(σ)|) dσ + (|v(s) − u(τ)| − |v(s) − u(ˆτ)|) ds τˆ tˆ (6.7) Z t Z τ ≤ ω |v(s) − u(σ)| dσ ds. tˆ τˆ

For h > 0 we define Fh by Z t+h Z t+h −2 Fh(t) = h |v(s) − u(σ)| dσ ds. t t d Then from (6.7) we have F (t) ≤ ω F (t) and thus F (t) ≤ eωtF (0). Since u, v are dt h h h h + continuous we obtain the desired estimate by letting h → 0 .  Lemma 3.3 Let A be an ω-dissipative subset satisfying (7.3) and S(t), t ≥ 0 be the semigroup on dom (A), as constructed in Theorem 2.2. Then, for x0 ∈ dom (A) and [x, y] ∈ A

Z t 2 2 2 |S(t)x0 − x| − |S(tˆ)x0 − x| ≤ 2 (ω |S(s)x0 − x| + hy, S(s)x0 − xis ds (6.8) tˆ and for every f ∈ F (x0 − x)

S(t)x0 − x0 2 lim sup Re h , fi ≤ ω |x0 − x| + hy, x0 − xis. (6.9) t→0+ t

λ k Proof: Let yk be defined by (6.3). Since A is ω-dissipative, there exists f ∈ F (Jλ x0 −x) k k 2 such that Re (yλ − y, fi ≤ ω |Jλ x0 − x| . Since

k −1 k k−1 Re hyλ, fi = λ Re hJλ x0 − x − (Jλ x0 − x), fi

−1 k 2 k−1 k −1 k 2 k−1 2 ≥ λ (|Jλ x0 − x| − |Jλ x0 − x||Jλ x0 − x|) ≥ (2λ) (|Jλ x0 − x| − |Jλ x0 − x| ), we have from Theorem 1.4

k 2 k−1 2 k k 2 k |Jλ x0 − x| − |Jλ x0 − x| ≤ 2λ Re hyλ, fi ≤ 2λ (ω |Jλ x0 − x| + hy, Jλ x0 − xis.

t [ λ ] k Since Jλ x0 = Jλ x0 on [kλ, (k + 1)λ), this can be written as

Z (k+1)λ t t k 2 k−1 2 [ λ 2 [ λ ] |Jλ x0 − x| − |Jλ x0 − x| ≤ (ω |Jλ x0 − x| + hy, Jλ x0 − xis) dt. kλ

113 Hence,

t t tˆ Z [ λ ]λ s s [ λ ] 2 [ λ ] 2 [ λ ] 2 [ λ ] |J x0 − x| − |J x0 − x| ≤ 2λ (ω |J x0 − x| + hy, J x0 − xis) ds. λ λ tˆ λ λ [ λ ]λ

s [ λ ] −k kλω Since |Jλ x0| ≤ (1 − λω) |x0| ≤ e |x0|, by Lebesgue dominated convergence the- + orem and the upper semicontinuity of h·, ·is, letting λ → 0 we obtain (6.8). Next, we show (6.9). For any given f ∈ F (x0 − x) as shown above

2 2 2Re hS(t)x0 − x0, fi ≤ |S(t)x0 − x| − |x0 − x| .

Thus, from (6.8)

Z t 2 Re hS(t)x0 − x0, fi ≤ (ω |S(s)x0 − x| + hy, S(s)x0 − xis) ds 0

Since s → S(s)x0 is continuous, by the upper semicontinuity of h·, ·is, we have (6.9).  Theorem 3.4 Assume that A is a close dissipative subset of X × X and satisfies the range condition (7.3) and let S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then, if S(t)x is strongly differentiable at t0 > 0 then d S(t )x ∈ dom (A) and S(t)x | ∈ AS(t)x, 0 dt t=t0 and moreover

0 d 0 S(t0)x ∈ dom (A ) and S(t)x t=t = A S(t0)x. dt 0

d |o(λ)| Proof: Let dt S(t)x |t=t0 = y . Then S(t0 − λ)x − (S(t0)x − λ y) = o(λ), where λ → + 0 as λ → 0 . Since S(t0 − λ)x ∈ dom (A), there exists a [xλ, yλ] ∈ A such that S(t0 − λ)x = xλ − λ yλ and

λ (y − yλ) = S(t0)x − xλ + o(λ). (6.10)

If we let x = xλ, y = yλ and x0 = S(t0)x in (6.9), then we obtain

2 Re hy, fi ≤ ω |S(t0)x − xλ| + hyλ,S(t0)x − xλis. for all f ∈ (S(t0)x − xλ). It follows from Lemma 1.3 that there exists a g ∈ F (S(t0)x − xλ) such that hyλ,S(t)x − xλis = Re hyλ, gi and thus

2 Re hy − yλ, gi ≤ ω |S(t0)x − xλ| .

From (6.10)

|o(λ)| λ−1|S(t )x − x |2 ≤ |S(t )x − x | + ω |S(t )x − x |2 0 λ λ 0 λ 0 λ and thus S(t0)x − xλ + (1 − λω) → 0 as λ → 0 . λ

114 Combining this with (6.10), we obtain

xλ → S(t0)x and yλ → y + as λ → 0 . Since A is closed, it follows that [S(t0)x, y] ∈ A, which shows the first assertion. Next, from Theorem 2.2

|S(t0 + λ)x − S(t0)x| ≤ λ kAS(t0)xk for λ > 0

This implies that |y| ≤ kAS(t0)xk. Since y ∈ AS(t0)x, it follows that S(t0)x ∈ 0 0 dom (A ) and y ∈ A S(t0)x.  Theorem 3.5 Let A be a dissipative subset of X × X satisfying the range condition (7.3) and S(t), t ≥ 0 be the semigroup on dom (A), defined in Theorem 2.2. Then the followings hold. (1) For ever x ∈ dom (A)

lim |Aλx| = lim inf |S(t)x − x|. λ→0+ t→0+

(2) Let x ∈ dom (A). Then, limλ→0+ |Aλx| < ∞ if and only if there exists a sequence {xn} in dom (A) such that limn→∞ xn = x and supn kAxnk < ∞.

Proof: Let x ∈ dom (A). From Theorem 1.4 |Aλx| is monotone decreasing and t [ λ ] lim |Aλx| exists (including ∞). Since |Jλ − x| ≤ t |Aλx|, it follows from Theorem 2.2 that |S(t)x − x| ≤ t lim |Aλx| λ→0+ thus 1 lim inf |S(t)x − x| ≤ lim |Aλx|. t→0+ t λ→0+ Conversely, from Lemma 3.3 1 − lim inf |S(t)x − x||x − u| ≤ hv, x − uis t→0+ t for [u, v] ∈ A. We set u = Jλx and v = Aλx. Since x − u = −λ Aλx,

1 2 − lim inf |S(t)x − x|λ|Aλx| ≤ −λ |Aλx| t→0+ t which implies 1 lim inf |S(t)x − x| ≥ |Aλx|. t→0+ t Theorem 3.7 Let A be a dissipative set of X × X satisfying the range condition (7.3) and let S(t), t ≥ 0 be the semigroup defined on dom (A) in Theorem 2.2. (1) If x ∈ dom (A) and S(t)x is differentiable a.e., t > 0, then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). (2) If X is reflexive. Then, if x ∈ dom (A) then u(t) = S(t)x, t ≥ 0 is a unique strong solution of the Cauchy problem (6.1). Proof: The assertion (1) follows from Theorems 3.1 and 3.5. If X is reflexive, then since an X-valued absolute continuous function is a.e. strongly differentiable, (2) follows from (1). 

115 6.0.1 Infinitesimal generator

Definition 4.1 Let X0 be a subset of a Banach space X and S(t), t ≥ 0 be a semigroup −1 of nonlinear contractions on X0. Set Ah = h (T (h) − I) for h > 0 and define the strong and weak infinitesimal generators A0 and Aw by

A0x = limh→0+ Ahx with dom (A0) = {x ∈ X0 : limh→0+ Ahx exists}

A0x = w − limh→0+ Ahx with dom (A0) = {x ∈ X0 : w − limh→0+ Ahx exists}, (6.11) respectively. We define the set Dˆ by

Dˆ = {x ∈ X0 : lim inf |Ahx| < ∞} (6.12) h→0 Theorem 4.1 Let S(t), t ≥ 0 be a semigroup of nonlinear contractions defined on a closed subset X0 of X. Then the followings hold. ∗ ∗ (1) hAwx1 − Awx2, x i for all x1, x2 ∈ dom (Aw) and x ∈ F (x1 − x2). In particular, A0 and Aw are dissipative. (2) If X is reflexive, then dom (A0) = dom (Aw) = Dˆ. (3) If X is reflexive and strictly convex, then dom(Aw) = Dˆ. In addition, if X is uniformly convex, then dom (Aw) = dom (A0) = Dˆ and Aw = A0. ∗ Proof: (1) For x1, x2 ∈ X0 and x ∈ F (x1 − x2) we have

∗ −1 ∗ 2 hAhx1 − Ahx2, x i = h (hS(h)x1 − S(h)x2, x i − |x1 − x2| )

−1 2 ≤ h (|S(h)x1 − S(h)x2||x1 − x2| − |x1 − x2| ) ≤ 0.

Letting h → 0+, we obtain the desired inequality. (2) Obviously, dom (A0) ⊂ dom (Aw) ⊂ Dˆ. Let x ∈ Dˆ. It suffices to show that x ∈ dom (A0). We can show that t → S(t)x is Lipschitz continuous. In fact, there exists a monotonically decreasing sequence {tk} of positive numbers and L > 0 such that tk → 0 as k → ∞ and |S(tk)x − x| ≤ L tk. Let h > 0 and nk be a nonnegative integer such that 0 ≤ h − nktk < tk. Then we have

|S(t + h)x − S(t)x| ≤ |S(t)x − x| = |S(h − nktk + nktk)x − x|

≤ |S(h − nktk)x − x| + L nktk ≤ |S(h − nktk)x − x| + L h

By the strong continuity of S(t)x at t = 0, letting k → ∞, we obtain |S(t + h)x − S(t)x| ≤ L h. Now, since X is reflexive this implies that S(t)x is a.e. differentiable d on (0, ∞). But since S(t)x ∈ dom (A0) whenever dt S(t)x exists, S(t)x ∈ dom (A0) a.e. + t > 0. Thus, since |S(t)x − x| → 0 as t → 0 , it follows that x ∈ dom (A0). (3) Assume that X is reflexive and strictly convex. Let x0 ∈ Dˆ and Y be the set of all −1 + weak cluster points of t (S(t)x0 − x0) as t → 0 . Let A˜ be a subset of X × X defined by A˜ = A0 ∪ [x0, co Y ] and dom (A˜) = dom (A0) ∪ {x0} where co Y denotes the closure of the convex hull of Y . Note that from (1)

∗ ∗ hA0x1 − A0x2, x i ≤ 0 for all x1, x2 ∈ dom (A0) and x ∈ F (x1 − x2)

116 and for every y ∈ Y

∗ ∗ hA0x1 − y, x i ≤ 0 for all x1 ∈ dom (A0) and x ∈ F (x1 − x0).

This implies that A˜ is a dissipative subset of X×X. But, since X is reflexive, t → S(t)x0 is a.e. differentiable and d S(t)x = A S(t)x ∈ AS˜ (t)x , a.e., t > 0. dt 0 0 0 0

It follows from the dissipativity of A˜ that d h (S(t)x − x ), x∗i ≤ hy, x∗i, a.e, t > 0 and y ∈ Ax˜ (6.13) dt 0 0 0 ∗ for x ∈ F (S(t)x0 − x0). Note that for h > 0

−1 ∗ −1 ∗ ∗ hh (S(t+h)x0−S(t)x0), x i ≤ h (|S(t+h)x0−x0|−|S(t)x0−x0|)|x | for x ∈ F (S(t)x0−x0).

Letting h → 0+, we have d d h (S(t)x − x ), x∗i ≤ |S(t)x − x | |S(t)x − x |, a.e. t > 0 dt 0 0 0 0 dt 0 0 The converse inequality follows much similarly. Thus, we have d d |S(t)x − x | |S(t)x − x | = h (S(t)x − x ), x∗i for x∗ ∈ F (S(t)x − x ). (6.14) 0 0 dt 0 0 dt 0 0 0 0 d It follows from (6.13)–(6.14) that |S(t)x − x | ≤ |y| for y ∈ Y and a.e. t > 0 and dt 0 0 thus |S(t)x0 − x0| ≤ t kAx˜ 0k for all t > 0. (6.15)

Note that Ax˜ 0 = co Y is a closed convex subset of X. Since X is reflexive and strictly convex, there exists a unique element y0 ∈ Ax˜ 0 such that |y0| = kAx˜ 0k. Hence, (6.15) implies that co Y = y0 = Awx0 and therefore x0 ∈ dom (Aw). Next, we assume that X is uniformly convex and let x0 ∈ dom(Aw) = Dˆ. Then S(t)x − x w − lim 0 0 = y as t → 0+. t 0 From (6.15) −1 |t (S(t)x0 − x0)| ≤ |y0|, a.e. t > 0. Since X is uniformly convex, these imply that

S(t)x − x lim 0 0 = y as t → 0+. t 0 which completes the proof.  Theorem 4.2 Let X and X∗ be uniformly convex Banach spaces. Let S(t), t ≥ 0 be the semigroup of nonlinear contractions on a closed subset X0 and A0 be the infinitesimal generator of S(t). If x ∈ dom (A0), then

117 (i) S(t)x ∈ dom (A0) for all t ≥ 0 and the function t → A0S(t)x is right continuous on [0, ∞). d+ d+ (ii) S(t)x has a right derivative dt S(t)x for t ≥ 0 and dt S(t)x = A0S(t)x, t ≥ 0. d (iii) dt S(t)x exists and is continuous except a countable number of values t ≥ 0.

Proof: (i) − (ii) Let x ∈ dom (A0). By Theorem 4.1, dom (A0) = Dˆ and thus S(t)x ∈ dom (A0) and d+ S(t)x = A S(t)x for t ≥ 0. (6.16) dt 0 d Moreover, t → S(t)x a.e. differentiable and dt S(t)x = A0S(t)x a.e. t > 0. We next prove that A0S(t)x is right continuous. For h > 0 d (S(t + h)x − S(t)x) = A S(t + h)x − A S(t)x, a.e. t > 0. dt 0 0 From (6.14)

d |S(t + h)x − S(t)x| |S(t + h)x − S(t)x| = hA S(t + h)x − A S(t)x, x∗i ≤ 0 dt 0 0 ∗ for all x ∈ F (S(t + h)x − S(t)x), since A0 is dissipative. Integrating this over [s, t], we obtain |S(t + h)x − S(t)x| ≤ |S(s + h)x − S(s)x| for 0 ≤ s ≤ t and therefore d+ d+ | S(t)x| ≤ | S(s)x|. dt ds

Hence t → |A0S(t)x| is monotonically non-increasing function and thus it is right continuous. Let t0 ≥ 0 and let {tk} be a decreasing sequence of positive numbers such that tk → t0. Without loss of generality, we may assume that w−limk→∞ A0S(tk) = y0. The right continuity of |A0S(t)x| at t = t0, thus implies that

|y0| ≤ |A0S(t0)x| (6.17) since norm is weakly lower semicontinuous. Let A˜0 be the maximal dissipative exten- sion of A0. It then follows from Theorem 1.9 that A˜ is demiclosed and thus y0 ∈ AS˜ (t)x. On the other hand, for x ∈ dom (A0) and y ∈ Ax˜ , we have d h (S(t)x − x), x∗i ≤ hy, x∗i for all x∗ ∈ F (S(t)x − x) dt ˜ d ˜ a.e. t > 0, since A is dissipative and dt S(t)x = A0S(t)x ∈ AS(t)x a.e. t > 0. From (6.14) we have t−1|S(t)x − x| ≤ |Ax˜ | = kA˜0xk for t ≥ 0 0 0 where A˜ is the minimal section of A˜. Hence A0x = A˜ x. It thus follows from (6.17) that y0 = A0S(t0)x and limk→∞ A0S(tk)x = y0 since X is uniformly convex. Thus, we have proved the right continuity of A0S(t)x for t ≥ 0. (iii) Integrating (6.16) over [t, t + h], we have

Z t+h S(t + h)x − S(t)x = A0S(s)x ds t

118 for t, h ≥ 0. Hence it suffices to prove that the function t → A0S(t)x is continuous except a countable number of t > 0. Using the same arguments as above, we can show that if |A0S(t)x| is continuous at t = t0, then A0S(t)x is continuous at t = t0. But since |A0S(t)x| is monotone non-increasing, it follows that it has at most countably many discontinuities, which completes the proof.  Theorem 4.3 Let X and X∗ be uniformly convex Banach spaces. If A be m-dissipative, then A is demiclosed, dom (A) is a closed convex set and A0 is single-valued operator with dom (A0) = dom (A). Moreover, A0 is the infinitesimal generator of a semigroup of contractions on dom (A). Proof: It follows from Theorem 1.9 that A is demiclosed. The second assertion follows from Theorem 1.12. Also, from Theorem 3.4 d S(t)x = A0S(t)x, a.e. t > 0 dt and |S(t)x − x| ≤ t |A0x|, t > 0 (6.18) for x ∈ dom (A). Let A0 be the infinitesimal generator of the semigroup S(t), t ≥ 0 generated by A defined in Theorem 2.2. Then, (6.18) implies that by Theorem 4.1 d+ x ∈ dom (A0) and by Theorem 4.2 dt S(t)x = A0S(t)x and A0S(t)x is right continuous in t. Since A is closed, A0x = lim A0S(t)x ∈ Ax. t→0+ 0 Hence, (6.17) implies that A0x = A x. When X is a Hilbert space we have the nonlinear version of Hille-Yosida theorem as follows. Theorem 4.4 Let H be a Hilbert space. Then, (1) The infinitesimal generator A0 of a semigroup of contractions S(t), t ≥ 0 on a closed convex set X0 has a dense domain in X0 and there exists a unique maximal 0 dissipative operator A such that A = A0. Conversely, (2) If A0 is a maximal dissipative operator, then dom (A) is a closed convex set and A0 is the infinitesimal generator of contractions on dom (A). Proof: (2) Since from Theorem 1.7 the maximal dissipative operator in a Hilbert space is m-dissipative, (2) follows from Theorem 4.3.  Example (Nonlinear Diffusion) Consider the nonlinear diffusion equation of the form

ut = Au = ∆γ(u) − β(u) on X = L1(Ω). Assume γ : R → R is maximal monotone. and γ : R → R is monotone. Let

dom(A) = {there exists a v ∈ W 1,1(Ω) such that v ∈ γ(u) and ∆v ∈ X}

Thus, sign(x − y) = sign(γ(x) − γ(y)). Let ρ ∈ C2(R) be a monotonically increasing x function satisfying ρ(0) = 0 and ρ(x) = sign(x), |x| ≥ 1 and ρ(x) = ρ(  ) for  > 0. Note that for u ∈ X

(u, ρ(u)) → |u| and (ψ, ρ(u)) → (ψ, sign0(u)) for ψ ∈ X

119 as  → 0+.

0 (Au1 − Au2, ρ(γ(u1) − γ(u2))) = −(∇(γ(u1) − γ(u2))ρ)∇(γ(u1) − γ(u2)))

−(β(u1) − β(u2), ρ(γ(u1) − γ(u2)) ≤ 0

Letting  → 0∗ we obtain

(Au1 − Au2, sign0(u1 − u2)) ≤ 0 for all u1, u2 ∈ dom(A). For the range condition: we consider v ∈ γ(u) such that

λ γ−1(v) − ∆v + β(γ−1(v)) = f, (6.19)

Let ∂j = λ γ−1(·) − β(γ−1(·)). For f ∈ L2(Ω) consider the minimization

1 Z (|∇v|2 + j(v) − f(x)v) dx 2 Ω

1 2 1 over v ∈ H0 (Ω). It has a unique solution v ∈ H (Ω) ∩ H0 (Ω) such that

∆v + f ∈ ∂j(v).

2 For f ∈ X we choose fn ∈ L (Ω) such that |fn − f|X → 0 as n → ∞. As show above un = ∆vn + fn, |un − um|X ≤ M |fn − fm|X

Thus, there exists u ∈ X such that ∆vn → u − f. Moreover, there exists a v ∈ 1,q d W , 1 < q < d−1 such that vn → v in X and ∆v = u − f. In fact, let p > d. For all p p d h0 ∈ L (Ω) and ~h ∈ L (Ω) −∆φ = h0 + ∇ · ~h has a unique solution u ∈ W 1,p(Ω) and ~ |φ|W 1,p ≤ M (|h|p + |h|p).

By the Green’s formula

|(h0, vn) − (h, ∇vn)| = | − (∆vn, φ)| ≤ M (|h|p + |~h|p|)|∆vn|1.

p p d Since h0 ∈ L (Ω) and h ∈ L (Ω) are arbitraly

|vn|W 1,q ≤ M |∆vn|,

Since W 1,q(Omega) is compactly embedded into L1(Ω),

1,q vn → v, v ∈ W0 (Ω).

Since ∂j is maximal monotone u ∈ ∂j(v) equivalently u ∈ γ−1(v) and ∆v + f ∈ ∂j(v).

120 Example (Conservation law) We consider the scalar conservation law

d ut + (f(u))x + f0(x, u) = 0, t > 0 u(x, 0) = u0(x), x ∈ R (6.20) where f : R → Rd is C1. Let X = L1(Rd) and define

Au = −(f(u))x, where we assume f0 = 0 for the sake of simplicity of our presentation. Define Let

C = {φ ∈ Z : φ ≥ 0}.

Since Ac = 0 for all constant c, it follows that

φ − c ∈ C =⇒ (I − λ A)−1φ − c ∈ C.

Similarly, c − φ ∈ C =⇒ c − (I − λ A)−1φ ∈ C. Thus, without loss of generality, one can assume f is bounded. We use the following lemma. 1 d 2 2 d d 2 d Lemma 2.1 For ψ ∈ H (R ) and φ ∈ Ldiv = {φ ∈ (L (R )) : ∇ · φ ∈ L (R )} we have (φ, ∇ψ) + (∇ · φ, ψ) = 0 ∞ d Proof: Note that for ζ ∈ C0 (R ) (φ, ∇(ζ ψ)) + (∇ · φ, ζ ψ) = 0

∞ d x Let g ∈ C (R ) satisfying g = 1 for |x| ≤ 1 and g = 0 if |x| ≥ 2 and set ζ = g( r ). Then we have 1 (φ, ζ∇ψ + ψ∇g) + (∇ · φ, ζ ψ) = 0. r x d Since g( r ) → 1 a.e. in R as r → ∞ thus the lemma follows from Fatou’s lemma. . Note that

0 −(f(u1)x − f(u2)x, ρ(u1 − u2)) = (f(u1) − f(u2), ρ (u1 − u2)(u1 − u2)x),

R x 0 x If we define Ψ(x) = 0 σρ (σ) dσ and ρ(x) = ρ(  ) for  > 0. then u − u |(η (u − u ), ρ0 (u − u )(u − u ) )| =  (Ψ( 1 2 ), η ) ≤ M  |η | → 0 1 2  1 2 1 2 x  x x 1 as  → 0. Note that for u ∈ L1(Rd)

1 d (u, ρ(u)) → |u| and (ψ, ρ(u)) → (ψ, sign0(u)) for ψ ∈ L (R ) as  → 0+. Thus, hAu1 − Au2, sign0(u1 − u2)i ≤ 0 and A is dissipative. It will be show that range(λ I − A) = X,

121 i.e., for any g ∈ X there exists an entropy solution satisfying

(sign(u − k)(λu − g), ψ) ≤ (sing(u − k)(f(u) − f(k)), ψx)

1 d 1 d for all ψ ∈ C0 (R ) and k ∈ R. Hence A has a maximal monotone extension in L (R ). In fact, for  > 0 consider the viscous equation

λ u −  ∆u + f(u)x = g (6.21)

First, assume f is Lipschitz continuos, one can show that

1 d 1 d ∗ λ u − ∆u − f(u)x : H (R ) → (H (R )) is monotone, hemi-continuous and coercive. It thus follows from [] that (6.21) has a 1 d 2 d 1 d 2 d 2 d solution u ∈ H (R ) for all g ∈ L (R ) ∩ L (R ). Since f(u)x ∈ L (R ), u ∈ H (R ). L∞(Rd) estimate From (6.21)

p−2 p−2 (λ u −  ∆u + f(u)x, |u| u) = (g, |u| u).

Since p 0 p−2 p −1 p −1 λ|u|p + (f (u)ux, |u| u) − (p − 1)(|u| 2 uux, |u| 2 uux)

δ |f 0| ≤ ((1 − − ∞ p − 1)|u|p, 2 2δ p we have 1 |f 0| |u ≤ ( λ − ∞ p − 1)−1|g| . p 2 2λ p By letting p → ∞ 1 |u| ≤ |g| . ∞ λ ∞ Thus, without loss of generality, f is C1(R)d. 1,1 d 11 d W (R ) estimate Assuming g ∈ W (R ), v = ux satisfies

0 λ v − ∆v + (f (u)v)x = gx.

Using the same arguments as above

|v| ≤ |gx|1.

∞ d Entropy condition We show that for every k ∈ R and nonnegative function ψ ∈ C0 (R )

(sign(u−k)(λu−g, , ψ)−(∆u, ψ) ≤ (sign(u−k)(f(x)−f(k)), ψx)+ (|u−k|, ∆ψ) ≥ 0 (6.22) 2 d It suffices to prove for u ∈ C0 (R ). Note that Z u 0 (ρ(u − k)f(u)x, ψ) = (( ρ(s − k)f (u) ds)x, ψ) k Z u = −( ρ(s − k)fu(t, x, u) ds, ψx) k

122 R k+ R 1 Since k ρ(s − k) ds =  0 ρ(s) ds → 0 as  → 0, letting  → 0 we obtain

(sgn(u − k)f(u)x, ψ) = −(sgn(u − k)(f(u) − f(k)), ψx)

Next, 0 −(ρ(u − k)∆u, ψ) = (ρ(u − k)ux, ψ ux) + (ρ(u − k)ux, ψx) where

(ρ(u − k)ux, ψx) = (−Ψ(u − k), De|taψ) → −(|u − k|, ∆ψ) as  → 0.

Thus, we obtain −(ρ(u − k)∆u, ψ) ≥ −(|u − k|, ∆ψ) and (6.22). Example (Hamilton Jacobi equation) Let u is a solution to a scalar conservation in R1, then v = R x u dx satisfies the the Hamilton-Jacobi equation

vt + f(vx) = 0. (6.23)

d Let X = C0(R ) and

Av = −f(vx) dom(A) = {f(vx) ∈ X}

1 d Then, for v1, v2 ∈ C( R )

hA(v1 − v2), δx0 i = −(f((v1)x(x0)) − f((v2)x(x0))) = 0

n where x0 ∈ R such that |v|X = |v(x0)|. We prove the range condition

range(λ I − A) = X for λ > 0.

1 That is, there exists a unique viscosity solution to λv − f(vx) = g; for all φ ∈ C (Ω) if d v − φ attains a local maximum at x0 ∈ R , then

λ v(x0) − g(x0) + f(φx(x0)) ≤ 0

d and if v − φ attains a local minimum at x0 ∈ R , then

λ v(x0) − g(x0) + f(φx(x0)) ≥ 0.

Thus, A is maximal monotone and (6.23) has an intgral solution. Consider the equation of the form

λ V + Hˆ (x, Vx) − ν ∆V = ω f.

1 2 d Assume that Hˆ is C and there existc ˜1 > 0, c˜2 > 0 andc ˜3 ∈ L (R ) such that

2 d |Hˆ (x, p) − Hˆ (x, q)| ≤ c˜1 |p − q| and Hˆ (t, x, 0) ∈ L (R ) and |Hˆx(x, p)| ≤ c˜3(x) +c ˜2 |p|

123 Define the Hilbert space H = H1(Rd) by

2 d 2 d H = {φ ∈ L (R ): φx ∈ L (R )} with inner product Z (φ, ψ)H = φ(x)ψ(x) + φx(x) · ψx(x) dx. Rd Define the single valued operator A on H by

AV = −Hˆ (x, Vx) +  ∆V with dom (A) = H3(Rd). We show that A − λ I is m-dissipative for some λ. First, 2 c˜1 A − ωˆ I with λ = 2 is monotone since

2 2 (Aφ − Aψ, φ − ψ)H = − (|φx − ψx|2 + |∆(φ − ψ)|2)

−(Hˆ (·, φx) − Hˆ (·, ψx), φ − ψ − ∆(φ − ψ))  ≤ λ |φ − ψ|2 − |φ − ψ |2 , H 2 x x H where we used the following lemma. 1 d 2 2 d d 2 d Lemma 2.1 For ψ ∈ H (R ) and φ ∈ Ldiv = {φ ∈ (L (R )) : ∇ · φ ∈ L (R )} we have (φ, ∇ψ) + (∇ · φ, ψ) = 0 ∞ d Proof: Note that for ζ ∈ C0 (R )

(φ, ∇(ζ ψ)) + (∇ · φ, ζ ψ) = 0

∞ d x Let g ∈ C (R ) satisfying g = 1 for |x| ≤ 1 and g = 0 if |x| ≥ 2 and set ζ = g( r ). Then we have 1 (φ, ζ∇ψ + ψ∇g) + (∇ · φ, ζ ψ) = 0. r x d Since g( r ) → 1 a.e. in R as r → ∞ thus the lemma follows from Fatou’s lemma. . ν 3 d Let us define the linear operator T on X by T φ = 2 ∆. with dom (T ) = H (R ). Then T is a self-adjoint operator in H. Moreover, if let X˜ = H2(Rd) then X˜ ∗ = L2(Rd) where H = H1(Rd) is the pivoting space and H = H∗ and T ∈ L(X,˜ X˜ ∗) is hermite and coercive. Thus, T is maximal monotone. Hence equation ω V − AV = ω f in L2(Rd) has a unique solution V ∈ H. Note that from (2.7) for φ ∈ H2(Rd)

2 d Hˆ (x, φx)x = Hˆx(x, φx) + Hˆp(x, φx)φxx ∈ L (R ).

Thus, if f ∈ H then the solution V ∈ dom (A) and thus A is maximum monotone in H. Step 3: We establish W 1,∞(Rd) estimate of solutions to

ω V + H(x, Vx) − ν ∆V = ω f,

124 ∞ d when fx,Hx(x, 0) ∈ L (R ) and H satisfies

p 2 (2.9) |Hx(x, p) − Hx(x, 0)| ≤ M1 |p| and |Hp(x, p)| ≤ M2 1 + |x| .

Consider the equation of the form

(2.10) ω V + ψ H(x, Vx) − ν ∆V = ω f where c2 ψ(x) = for c ≥ 1. c2 + |x|2 Then ψ(x) H(x, p) satisfies (2.6)–(2.7) and thus there exists a unique solution V ∈ 3 d H (R ) to (2.10) for sufficiently large ω > 0. Define U = Vx. Then, from (2.10) we have

(2.11) ω (U, φ) + (ψx H(x, U) + ψ Hp(x, U) · Ux, φ) + ν (Ux, φx) = ω (fx, φ) for φ ∈ H1(Rd)d. Let

1 q Z  p 2 2 p |U| = U1 + ··· + Un and |U|p = |U(x)| dx Rd Define the functions Ψ, Φ: R+ → R+ by

 p 2  p −1 2  r 2 for r ≤ R  r 2 for r ≤ R Ψ(r) = and Φ(r) =  Rp−2r for r ≥ R2  Rp−2 for r ≥ R2.

Setting φ = Φ(|U|2)U ∈ H1(Rd)d in (2.11), we obtain

2x · U ω |Ψ(|U|2)| − ( Φ(|U|2), ψ H(x, U)) 1 c2 + |x|2

1 (2.12) +(ψ (H (x, U) − ω f ), Φ(|U|2)U) + (ψ H (x, U), Φ(|U|2)( |U|2) ) x x p 2 x 1 1 +ν {2 (Φ0(|U|2)( |U|2) , ( |U|2) ) + (Φ(|U|2)U ,U ))} = 0. 2 x 2 x x x Since from (2.9) p |H(x, p)| ≤ const 1 + |x|2 (1 + |p|), there exists constants k1, k2 independent of c ≥ 1 and R > 0 such that 2x · U ( Φ(|U|2), ψ H(x, U)))) ≤ k (ψ, Ψ(|U|2)) + k |Φ(|U|2)U| c2 + |x|2 1 2 q

p where q = p−1 . It thus follows from (2.9) and (2.12) that 1 ω |Ψ(|U|2)| + (Φ(|U|2) U ,U ) ≤ ω˜|Ψ(|U|2)| +(k +|H (x, 0)| +ω |f | ) |Φ(|U|2)U| . 1 ν x x 1 2 x p x p q

125 2 for some constantω ˜ > 0 independent of c ≥ 1 and R > 0. Since |Ψ(|U| )|1 ≥ 2 q |Φ(|U| )U|q, it follows that for ω ≥ ω˜ 1 |Ψ(|U|2)| + (Φ(|U|2) U ,U ) 1 2ν x x is uniformly bounded in R > 0. Letting R → ∞, it follows from Fatou’s lemma that U ∈ Lp(Rd) and from (2.12) 2x · U 1 ω (|U| − |f | ) + (ψ |U|p−2,H(x, U) + (ψ H (x, U), |U|p−2( |U|2) ) p x p c2 + |x|2 p 2 x

1 1 +(ψ H (x, U), |U|p−2U) + ν {(p − 2) (|U|p−4, ( |U|2) , ( |U|2) ) + (|U|p−2U ,U )}. x 2 x 2 x x x Thus we have 1 |U| ≤ |f | + (σ |U| + k + |H (x, 0| ) p x p ω p p 2 x p where |ψ H (x, U)|2 σ = k + M + p ∞ . p 1 1 (p − 2)ν Letting p → ∞, we obtain 1 (2.13) |U| ≤ |f | + ((k + M ) |U| + k + |H (x, 0| ). ∞ x ∞ ω 1 1 ∞ 2 x ∞ For c ≥ 1 let us denote by V c, the unique solution of (2.10). Let ζ(x) = χ(x/r) ∈ C2(Rd), r ≥ 1 where χ = χ(|x|) ∈ C2(Rd) is a nonincreasing function such that

χ(s) = 1 for 0 ≤ s ≤ 1 and χ(s) = exp(−|s|) for s ≥ 2. and we assume −∆χ ≤ k3 χ. Then

(2.14) ω (V c, ζ ξ) + (ψ H(x, U c), ζ ξ) − ν (∆V c, ζ ξ) = ω (f, ζ ξ)

2 d c c for all ξ ∈ L (R ), and U = Vx satisfies

c c c (2.15) ω (U , ζ φ) + (ψ H(x, U ), ∇ · (ζ φ)) − ν (tr Ux, ∇ · ζ φ) = ω (fx, ζ φ) for all φ ∈ H1(Rd)d. Setting φ = U c in (2.15), we obtain

c c c c c ω (ζ U ,U ) + (ψ H(x, U ),U · ζx + ζ∇ · U )

1 +ν {(ζ U c,U c) − (∆ζ, |U c|2)} = ω (f , ζ U). x x 2 x

c d Since |U |∞ is uniformly bounded in c ≥ 1, it follows that for any compact set Ω in R there exists a constant MΩ independent of c ≥ 1 such that

c c c c ω (ζ U ,U ) + ν (ζ Ux,Ux) ≤ MΩ.

d 2 Hence for every compact set Ω of R Since if Ωr = {|x| < r}, then H (Ωr) is compactly 1 c embedded into H (Ωr), it follows that there exists a subsequence of {V } which con- 1 verges strongly in H (Ωr). By a standard diagonalization process, we can construct

126 a subsequence {V cˆ} which converges to a function V strongly in H1(Ω))ˆ and weakly 2 d cˆ in H (Ω)ˆ for every compact set Ωˆ in R . Let U = Vx. Then, U converges weakly in H1(Ω)ˆ d and strongly in L2(Ω).ˆ Since L2(Ω)ˆ convergent sequence has an a.e. pointwise convergent subsequence, without loss of generality we can assume that U cˆ converges to U a.e. in Rd. Hence, by Lebesgue dominated convergence theorem

cˆ cˆ Hp(·,U ) → Hp(·,U) and Hx(·,U ) → Hx(·,U) strongly in L2(Ω)ˆ d. It follows from (2.14)–(2.15) that the limit V satisfies

(2.16) ω (V, ζ ξ) + (H(x, Vx), ζ ξ) − ν(∆V, ζ ξ) = ω (V, ζ ξ)

2 d for all ξ ∈ Lloc(R ) and

(2.17) ω (U, ζ φ) + (H(x, U), ∇ · (ζ φ)) − ν (tr Ux, (ζ φ)x) = ω (fx, ζ φ)

1 d p−2 for all φ ∈ Hloc(R ). Setting φ = |U| U in (2.17), we obtain (2.18) 1 ω (ζ, |U|p) + (ζ H (x, U), |U|p−2( |U|2) ) + (ζ H (x, U), |U|p−2U) − ω (f , ζ |U|p−2U) p 2 x x x 1 1 1 +ν {(p − 2) (ζ |U|p−4( |U|2) , ( |U|2) ) + (ζ |U|p−2U ,U )) − (∆ζ, |U|p)} = 0 2 x 2 x x x p for all ζ = χ(x/r). Thus,

1 1 1 1 1 (2.19) (ζ, |U|p) p ≤ (ζ, |f |p) p + (c (ζ, |U|p) p + (ζ, |H (x, 0)|p) p ) x ω p x

2 k3 |ζ Hp(x,U)|∞ where cp = M1 + p + 2ν (p−2) and

2 d (2.20) (Hx(x, 0) − Hx(x, p), p) ≤ M1 |p| all x ∈ R .

Now, letting p → ∞ we obtain from (2.19)

(ω − M1) sup |U| ≤ ω |fx|∞ + |Hx(x, 0)|∞. |x|≤r

Since r ≥ 1 is arbitrary, we have

(2.21) (ω − M1) |U|∞ ≤ ω |fx|∞ + |Hx(x, 0)|∞

Setting ξ = V in (2.16), we obtain 1 ω (ζ, |V |2) + (ζ, H (x, U)) − ω (f, ζ V ) + ν { (ζ |V |2) , − (∆ζ, |U|2)} = 0. p x 2 Also, from (2.18)

2 ω (ζ, |U| ) + (ζ Hp(x, U),U · Ux) + (ζ Hx(x, U),U) − ω (fx, ζ U)

1 +ν {(ζ U ,U )) − (∆ζ, |U|2)} = 0 x x 2

127 2 Since |U|∞ is bounded, thus V ∈ H`oc. In fact 2 ω (ζ V, V ) + ν (ζ Ux,Ux) ≤ a1 |U|∞ + a2ω ((ζf, f) + (ζfx, fx) for some constants a1, a2. Step 4: Next we prove that for ω > max(M1, ωα) equation

ω V − Aν(t)V = ω f

2 d has a unique solution satisfying (2.21). We define the sequence {Vk} in H`oc(R ) by the successive iteration

(2.22) ω Vk+1 − Aν(t)Vk+1 − (ω − ω0)Vk = ω0 f.

From Step 3 (2.22) has a solution Vk+1 satisfying

(2.23) (ω − M1) |Uk+1|∞ ≤ (ω − ω0) |Uk|∞ + ω0 |fx|∞ + |Hx(x, 0)|∞

Thus,

ω − ω0 −1 (ω0 |fx|∞ + |Hx(x, 0)|∞) ω0 |fx|∞ + |Hx(x, 0)|∞ |Uk|∞ ≤ (1 − ) = = α ω − M1 ω − M1 ω0 − M1

1,∞ d 2 d for all k ≥ 1. {Vk} is bounded sequence in W (R ) and thus in H`oc(R ). Moreover, we have from (2.4) ω − ω0 |Vk+1 − Vk|X ≤ |Vk − Vk−1|X . ω − ωα

Thus {Vk} is a Cauchy sequence in X and {Vk} converges to V in X. Let us define the single-valued operator B on X by Bφ = −H(x, φx). Since {BVk} is bounded in ∞ d ∞ d L (R ) we may assume that BVk converges weakly star to w in L (R ). If we show 2 d that w = Bu, then V ∈ H (R ) solves the desired equation. Since {Vk} is bounded in 2 1 H (Ωr), {Vk} is strongly precompact in H (Ωr) for each r > 0. Hence BVk → BV a.e. in Ωr and thus w = BV . Step 5: Next, we consider the case when H ∈ C1 is without the Lipschitz bound in p but satisfies (2.2). Consider the cut-off function of H(x, p) by

 H (x, p )(p − M p ) + H(M p ) if |p| ≥ M  p |p| |p| |p| HM (x, p) =  H(x, p) if |p| ≤ M.

From Step 3 and (2.20) equation

M ω V + H (x, Vx) − ν ∆V = ω f has the unique solution V and U = Vx satisfies (2.21) with M1 = β M + a. Let M > 0 be so that (ω − (β M + a)) M ≤ ω |fx|∞ + |Hx(x, 0)|∞. Then, |U|∞ ≤ M. and thus 2 d 1,∞ d V ∈ Hloc(R ) ∩ W (R ) satisfies

(2.24) ω V + H(x, Vx) − ν ∆V = ω f and

(2.25) (ω − (β M + a)) M ≤ ω |fx|∞ + |Hx(x, 0)|∞

128 for ω > M1. If β = 0, then ϕ(V ) − ϕ(f) ≤ a ϕ(V ) + b ω where b = |Hx(x, 0)|∞. Step 6: We prove that the solution V ν to (2.24) converges to a viscosity solution V to

(2.26) ω V + H(x, Vx) = ω f as ν → 0+. First we show that for all φ ∈ C2(Ω) if V ν − φ attains a local maximum n (minimum, respectively) at x0 ∈ R , then

(2.27) ω (V (x0) − f(x0)) + H(x0, φx(x0)) − ν (∆V )(x0) ≤ 0 (≥ 0, respectively).

2 ν d For φ ∈ C (Ω) we assume that V − φ attains a local maximum at x0 ∈ R . Let d Ω = {x ∈ R : |x − x0| < 1} and Γ denote its boundary. Then without loss of ν generality we can assume that V − φ attains the unique global maximum 1 at x0 and V ν − φ ≤ 0 on Γ. In fact we can choose ζ ∈ C∞(Ω) such that V ν − (φ − ζ) attains ν the unique global maximum 1 at x0, V − (φ − ζ) ≤ 0 on Γ and ζx(x0) = 0. Let ν 1,∞ p−1 ψ = sup(0,V − φ) ∈ W0 (Ω). Multiplying ψ to (2.26) and integrating over Ω, we obtain

p ν p−1 p−2 p−1 ω |ψ|p + (η · (V − φ)x, ψ ) + ν (p − 1) (ψx, ψ ψx) = −(δ ψ, ψ ), where Z 1 ν η = Hp(x, φx + θ (V − φ)x) dθ 0 and δ = ω (φ − f) + H(x, φx) − ν ∆φ. Since p−1 1 2 p p −1 2 |(η · ψ , ψ )| ≤ |η| ∞ |ψ| + ν(p − 1) |ψ 2 ψ | x 4ν(p − 1) L (Ω) p x 2 it follows that |η|2 (ω − ∞ ) |ψ|p ≤ −(δψ, ψp−1). 4ν(p − 1) p

Letting p → ∞, we can conclude that δ(x0) ≤ 0 since δ ∈ C(Ω). Setting ψ = ν ν inf(0,V − φ)) and assuming V − φ has the unique global minimum −1 at x0 and V ν − φ ≥ 0 on Γ, the same argument as above shows the second assertion. Next, we show that there exists a subsequence of {V ν} that converges to a viscosity solution V to

(2.28) ω V + H(x, Vx) = ω f,

1 d i.e., for all φ ∈ C (Ω) if V − φ attains a local maximum at x0 ∈ R , then

(2.29a) ω (V (x0) − f(x0)) + H(x0, φx(x0)) ≤ 0

h and if V − φ attains a local minimum at x0 ∈ R , then

(2.29b) ω (V (x0) − f(x0)) + H(x0, φx(x0)) ≥ 0.

129 It follows from Step 5 that for some γ > 0 independent of ν

ν |V |W 1,∞(Ω) ≤ γ

Thus there exists a subsequence of {V ν} (denoted by the same) that converges weakly star to V in W 1,∞(Rd), and thus the convergence is uniform in Ω. We prove (2.29a) 2 2 ν first for φ ∈ C (Ω). Assume that for φ ∈ C (Ω) V −φ has a local maximum at x0 ∈ Ω. ∞ ν We can choose ζ ∈ C (Ω) such that ζx(x0) = 0 and V − (φ − ζ) has a strict local ν maximum at x0. For ν > 0 sufficiently small, V − (φ − ζ) has a local maximum at + some xν ∈ Ω and xν → x0 as ν → 0 . From (2.27)

ν ω (V (xν) − f(xν)) + H(xν, φx(xν)) − ν (∆φ)(xν) ≤ 0

ν We conclude (2.29a), since V (xν) → V (x0), φx(xν) − ζx(xν) → φx(x0) − ζx(x0) = + 1 φx(x0) and ν ∆φ(xν) → 0 as ν → 0 . For φ ∈ C (Ω) exactly the same argument is 2 1 applied to the convergent sequence φn ∈ C (Ω) to φ in C (Ω) to prove (2.29a).

Step 7: We show that if V,W ∈ Dα are viscosity solutions to ω (V −f)+H(x, Vx) = 0 and ω (W − g) + H(x, Wx) = 0, respectively, then

(3.30) (ω − ωα) |u − v|X ≤ ω |f − g|X .

For δ > 0 let 1 ψ(x) = p1 + |x|2+δ

If u, v ∈ Dα then

(2.31) lim ψ(x)V (x) = lim ψ(x)W (x) = 0. |x|→∞ |x|→∞

We choose a function β ∈ C∞(Rd) satisfying

0 ≤ β ≤ 1, β(0) = 1, β(x) = 0 if |x| > 1.

n n Let M = max (|u|Xδ , |W |Xδ ). Define the function Φ : R × R → R by

(2.32) Φ(x, y) = ψ(x)V (x) − ψ(y)W (y) + 3Mβ(x − y) where x β (x) = β( ) for x ∈ Rd.  

Off the support of β(x − y), Φ ≤ 2M, while if |x| + |y| → ∞ on this support, then |x|, |y| → ∞ and thus from (2.30) lim|x|+|y|→∞ Φ ≤ 3M. We may assume that V (¯x) − W (¯x) > 0 for somex ¯. Then,

Φ(¯x, x¯) = ψ(¯x)(V (¯x) − W (¯x)) + 3M β(0) > 3M.

d d Hence Φ attains its maximum value at some point (x0, y0) ∈ R × R . Moreover, |x0 − y0| ≤  since β(x0 − y0) > 0. Now x0 is a maximum point of

 ψ(y )W (y ) − 3Mβ (x − y ) + Φ(x , y ) ψ(x) V (x) − 0 0  0 0 0 ψ(x)

130 and since ψ > 0 the function ψ(y )W (y ) − 3Mβ (x − y ) + Φ(x , y ) x → V (x) − 0 0  0 0 0 ψ(x) attains a maximum 0 at x0. Since

ψ(y0)W (y0) − 3Mβ(x0 − y0) + Φ(x0, y0) = ψ(x0)V (x0) and V is a viscosity solution

(2.33) ψ(x0)(ω (V (x0) − f(x0)) + H(x0, p)) ≤ 0, where 0 2 + δ δ 3Mβ(x0 − y0) p = ψ(x0)V (x0)ψ(x0)|x0| x0 − . 2 ψ(x0) 2+δ 0 δ and we used the fact that (|x| ) = (2 + δ)|x| x. Moreover since V ∈ Dα (2.34) |p| ≤ α

Similarly, the function ψ(x )V (x ) + 3Mβ (x − y) − Φ(x , y ) y → W (y) − 0 0  0 0 0 ψ(y) attains a minimum 0 at y0 and since W is a viscosity solution

(2.35) ψ(y0)(ω (W (y0) − g(y0)) + H(y0, q)) ≥ 0, where 0 2 + δ δ 3Mβ(x0 − y0) q = ψ(y0)W (y0)ψ(y0)|y0| y0 − 2 ψ(y0) and |q| ≤ α. Thus by (2.33) and (2.35) we have

ω (ψ(x0)V (x0) − ψ(y0)W (y0)) (2.36) ≤ ψ(y0) H(y0, q) − ψ(x0) H(x0, p) + ω(ψ(x0)f(x0) − ψ(y0)g(y0))

Since Φ(x0, y0) ≥ Φ(¯x, x¯) we have

ψ(x0)V (x0) − ψ(y0)W (y0) ≥ ψ(¯x)(V (¯x) − W (¯x)) + 3M (1 − β(x0 − y0)) and thus

(ψ(x0)−ψ(y0)) V (x0)+ψ(y0)(V (x0)−W (y0)) ≥ ψ(¯x)(V (¯x)−W (¯x))+3M (1−β(x0−y0)). Since (2.37) q q 2+δ 2+δ |(ψ(x0)−ψ(y0)) V (x0)| = ψ(y0)ψ(x0)|V (x0)|( 1 + |y0| − 1 + |x0| ) ≤ const |x0−y0|, it follows that V (x0) ≥ W (y0) for sufficiently small  > 0. Note that

ψ(y0)H(y0, q) − ψ(x0)H(x0, p) = (ψ(y0) − ψ(x0))H(x0, p)

+ψ(y0)(H(y0, p) − H(x0, p)) + ψ(y0)(H(y0, q) − H(y0, p)).

131 From (2.36)–(2.37) we have that

ω (ψ(x0)V (x0) − ψ(y0)W (y0) − (ψ(x0)f(x0) − ψ(y0)g(y0))) (2.38) ≤ O() + ψ(y0)(c1(y0), p − q) + c2 |p − q|). where O() → 0 as  → 0. Now we evaluate p − q, i.e.,

2 + δ p − q = (ψ(x )V (x ) − ψ(y )W (y ))ψ(y )|y |δ y 2 0 0 0 0 0 0 0 2 + δ + ψ(x )V (x )(ψ(x ) |x |δx − ψ(y )|y |δy ) 2 0 0 0 0 0 0 0 0 q q 0 2+δ 2+δ +3Mβ(x0 − y0)( 1 + |x0| − 1 + |y0| ).

Since δp 2 δ |x0| 1 + |x0| |ψ(x0)V (x0)ψ(x0)|x0| x0| ≤ |u|X 2+δ |x0| ≤ M3 1 + |x0| for some M3 > 0, it follows from (2.34) that q 0 2+δ 3|β(x0 − y0) 1 + |x0| | ≤ M4 for some M4 > 0. Thus, q q 0 2+δ 2+δ |3Mβ(x0 − y0) | 1 + |x0| − 1 + |y0| | ≤ const (2 + δ)MM4|x0 − y0|. and therefore 2 + δ ψ(x )V (x )(ψ(x )|x |δx −ψ(y )|y |δy )+3Mβ (x −y )(|x |2+δ−|y |2+δ) = O(). 2 0 0 0 0 0 0 0 0  0 0 0 0 In the right-hand side of (2.38) we have

ψ(y0)((c1(y0), p − q) + c2 |p − q|)

2+δ 1+δ 2 + δ β |y0| + c2α |y0| ≤ O() + 2+δ (ψ(x0)u(x0) − ψ(y0)v(y0)). 2 1 + |y0| Hence from (2.38) we conclude

(2.39) ωy0 (ψ(x0)u(x0) − ψ(y0)v(y0)) ≤ ψ(x0)f(x0) − ψ(y0)g(y0) + O() where 2+δ 1+δ 2 + δ β |y0| + c2α |y0| ωδ = sup (ω − 2+δ ). y0 2 1 + |y0| d Assume that ω > λα. For x ∈ R we have

ψ(x)(u(x) − v(x)) + 3M = Φ(x, x) ≤ Φ(x0, y0) ≤ ψ(x0)u(x0) − ψ(y0)v(y0) + 3M

132 and so by (2.39)

+ ωδ sup ψ(x)(u(x) − v(x)) ≤ ω (ψ(x0)u(x0) − ψ(y0)v(y0)) ≤ ψ(x0)f(x0) − ψ(y0)g(y0) + O() Rd

+ ≤ sup ψ(f − g) + |ψ(x0)g(x0) − ψ(y0)g(y0)| + O() Rd

+ ≤ sup ψ(f − g) + ωψg() + O() Rd where ωψg(·) is the modulus of continuity of ψg. Letting  → 0, we obtain

+ + (3.40) ωδ sup ψ(x)(u(x) − v(x)) ≤ sup ψ(f − g) Rd Rn

+ + Since |ψ|Xδ → |ψ|X as δ → 0 for ψ ∈ X we obtain (2.30) by taking limit δ → 0 in (2.40). Example (Plastic equations) Consider the visco-plastic equation of the form

vt + div(σ) = 0 where the stress σ() = sigmat minimizes

h(σ) −  : σ and the strain  is given by 1 ∂ ∂ i,j = ( vj + vj 2 ∂xi ∂xi That is, σ∂h∗() where h∗ is the conjugate of h

h∗() = sup{ : σ − h(σ)} σ∈C For the case of the linear elastic system

σ11

6.1 Evolution equations In this section we consider the evolution equation of the form d x(t) ∈ A(t)u(t) (6.24) dt in a Banach space X ∩ D, where D is a closed set. We assume the dissipativity: there + + exist a constant ω = ωD and continuous functions f : [0,T ] → X and L : R → R independent of t, s ∈ [0,T ] such that

(1 − λω) |x1 − x2| ≤ |x1 − x2 − λ (y1 − y2)| + λ |f(t) − f(s)|L(x2)|K(|y2|)

133 for all x1 ∈ D ∩ dom (A(t)) x2 ∈ D ∩ dom (A(s))and y1 ∈ A(t)x1, y2 ∈ A(s)x2, and A(t), t in[0,T ] is m-dissipative and

−1 Jλ(t) = (λ I − A) : D → D ∩ dom(A(tk)).

Thus, one constructs the mild solution as

[t/λ] u(t) = lim Πk=1 Jλ(tk)u0, tk = kλ. λ→0+ Theorem 1.1 Let (A(t),X) satisfy (H.1)–(H.4). Then,

t−s [ λ ] U(t, s) = lim Πi=1 Jλ(s + iλ)x λ→0+ exists for x ∈ dom (A(0)) and 0 ≤ s ≤ t ≤ T . The U(t, s) for 0 ≤ s ≤ t ≤ T defines an evolution operator on dom A(0) and moreover satisfies

|U(t, s)x − U(t, s)y| ≤ eω (t−s) |x − y| for 0 ≤ s ≤ t ≤ T and x, y ∈ dom (A(0)).

Proof: Let hi = λ and ti = s + iλ in (). Then xi = Jλ(ti)xi−1, where we dropped the m −m 2ω(T −s) superscript λ and xm = (Πi=1Jλ(ti))x. Thus |xi| ≤ (1 − ωλ) ≤ e |x| = M1 for 0 < mλ ≤ T −s. Letx ˆj is the approximation solution corresponding to the stepsize hˆ − j = µ and tˆj = s + jµ. Define am,n = |xm − xˆn|. We first evaluate a0,n, am,0.

m X m m am,0 = |xm − x| ≤ |(Πi=kJλ(ti))x − (Πi=k+1Jλ(ti))x| k=1

m X −(m−k+1) 2ω(T −s) ≤ (1 − ωλ) λ |||A(ti)x||| ≤ e mλ M(x). k=1 where M(x) = supt∈[0,T ] |||A(t)x|||. Similarly, we have

2ω(T −s) a0,n ≤ e nµ M(x).

Next, we establish the recursive formula for ai,j. For λ ≥ µ > 0

ai,j = |xi − xˆj| ≤ |Jλ(ti)xi−1 − Jµ(tˆj)ˆxj−1|

≤ |Jλ(ti)xi−1 − Jµ(ti)ˆxj−1| + |Jµ(ti)ˆxj−1 − Jµ(tˆj)ˆxj−1|

From Theorem 1.10

|Jλ(ti)xi−1 − Jµ(ti)ˆxj−1|

µ λ − µ = |J ( x + J (t )x ) − J (t )ˆx | µ λ i−1 λ λ i i−1 µ i j−1 µ λ − µ ≤ (1 − ω µ)−1( |x − xˆ | + |x − xˆ |) λ i−1 j−1 λ i j−1

134 Hence for i, j ≥ 1 we have

−1 ai,j ≤ (1 − ωµ) (α ai−1,j−1 + β ai,j−1) + bi,j where µ λ − µ α = , and β = λ λ and

bi,j = |Jµ(ti)ˆxj−1 − Jµ(tˆj)ˆxj−1| ≤ µ |f(ti) − f(tˆj)|L(|xˆj|)K(|Aµ(tˆj)ˆxj−1|).

We show that if yi = Aλ(ti)xi−1, then there exists a constant M2 = M2(x0,T, ||A(s)x0||) such that |yi| ≤ M2 Since xi = Jλ(ti)xi−1 and Aλ(ti)xi−1 ∈ A(ti)xi, from (H.4) and (1.2)–(1.3) we have

|||A(ti)xi||| = |Aλ(ti)xi−1| ≤ |Aλ(ti−1)xi−1| + |f(ti) − f(ti−1)|L(M1)(1 + |Ah(ti−1)xi−1|))

−1 ≤ (1 − λ) (|||A(ti−1)xi−1||| + |f(ti) − f(ti−1)|L(M1)(1 + |||A(ti−1)xi−1|||)

If we define ai = |||A(ti)xi|||, then

(1 − ω λ) ai ≤ ai−1 + bi(1 + ai−1). where bi = L(M1)|f(ti) − f(ti−1)|. Thus, it follows from the proof of Lemma 2.4 that |||A(ti)xi||| ≤ M2, for some constant M2 = M(x0,T ). Since

−1 (1.∗) |yi| ≤ (1 − ωλ) (|||A(ti−1)xi−1||| + |f(ti) − f(ti−1)|L(M1)(1 + |||A(ti−1)xi−1|||), thus |yi| is uniformly bounded. It follows from Theorem and [] that

2ω(T −s) 2 1/2 2 1/2 am,n ≤ e M(x) [((nµ − mλ) + nµ(λ − µ)) + ((nµ − mλ) + mλ(λ − µ)) ]

n−1 min(m−1,j) X X  j  +(1 − ωµ)−n βj−1αi b , i m−i,n−j j=0 i=0 where mλ, nµ ≤ T − s. et ρ be the modulus of continuity of f on [0,T ], i.e.,

ρ(r) = sup {|f(t) − f(τ) : 0 ≤ t, τ ≤ T and |t − τ| ≤ r}

Then ρ is is nondecreasing and subadditive; i.e., ρ(r1 +r2) ≤ ρ(r1)+ρ(r2) for r1, r2 ≥ 0. Thus

n−1 min(m−1,j) X X  j  J = βj−1αi b i m−i,n−j j=0 i=0

 n−1  X X  j  ≤ Cµ n ρ(|nµ − mλ)|) + (m − 1)j βj−1αi ρ(|jµ − iλ)|) .  i  j=0 i=0

135 where we used () with C = L(M1)K(M2), the of of ρ and the estimate

min(m−1,j) X  j  βj−1αi ≤ 1. i i=0 Next, let δ > 0, be given and write

n−1 min(m−1,j) X X  j  βj−1αi ρ(|jµ − iλ)|) = I + I i 1 2 j=0 i=0 where I1 is the sum over indecies such that |jµ − iλ| < δ, while I2 is the sum over indecies satisfying |jµ − iλ| ≥ δ. Clearly I1 ≤ nρ(δ), but

n−1 min(m−1,j) X X  j  |jµ − iλ|2 ρ(T ) ρ(T )n2 I ≤ ρ(T ) βj−1αi = n(n−1)(λµ−µ2) ≤ µ(λ−µ) 2 i δ2 δ2 δ2 j=0 i=0 Therefore,  ρ(T )  J ≤ Cnµ ρ(|nµ − mλ| + ρ(δ) + nµ(λ − µ) . δ2 Combining ()–(), we obtain

2ω(T −s) 2 1/2 2 1/2 am,n ≤ e M(x) [((nµ − mλ) + nµ(λ − µ)) + ((nµ − mλ) + mλ(λ − µ)) ]

 ρ(T )  +e2ω(T −s)C nµ ρ(|nµ − mλ| + ρ(δ) + nµ(λ − µ) . δ2

2 √ Now, we can choose, e.g, δ = λ − µ and it follows from () that an,m as function of m, n and λ, µ, tends to zero as |nµ − mλ| → 0 and n, m → ∞, subject to 0 < nµ, mλ ≤ T − s, and the convergence is uniform in s. Theorem 1.1 Let (A(t),X) satisfy (H.1)–(H.4). Then,

t−s [ λ ] U(t, s) = lim Πi=1 Jλ(s + iλ)x λ→0+ exists for x ∈ dom (A(0)) and 0 ≤ s ≤ t ≤ T . The U(t, s) for 0 ≤ s ≤ t ≤ T defines an evolution operator on A(0) and moreover satisfies

|U(t, s)x − U(t, s)y| ≤ eω (t−s) |x − y| for 0 ≤ s ≤ t ≤ T and x, y ∈ dom (A(0)). Moreover, we have the following lemma. Lemma 1.2 If x ∈ Dˆ, then there exists a constant L such that

|U(s + r, s)x − U(ˆs + r, sˆ)x| ≤ L ρ(|s − sˆ|) for r ≥ 0 and s, s,ˆ s + r, sˆ + r ≤ T .

Proof: Let ak = |xk − xˆk| where

k −1 k −1 xk = Πi=1(I − λ A(s + i λ)) x andx ˆk = Πi=1(I − λ A(ˆs + i λ)) x.

136 r Then, for λ = n

ak = |Jλ(s + k λ)xk−1 − Jλ(ˆs + k λ)ˆxk−1|

≤ |Jλ(s + k λ)xk−1 − Jλ(ˆs + k λ)xk−1| + |Jλ(ˆs + k λ)xk−1 − Jλ(ˆs + k λ)ˆxk−1|

−1 ≤ λCρ (|s − sˆ|) + (1 − λω) ak−1 for some C. Thus, we obtain e2ωr − 1 |a | ≤ ρ(|s − sˆ|), k 2ω and letting λ → 0 we obtain the desired result. 

6.2 Applications We consider the Cauchy problem of the form d (1.1) u(t) = A(t, u(t))u(t) u(s) = u dt 0 where A(t, u), t ∈ [0,T ] is a maximal dissipative linear operator in a Banach space X for each u belonging to D. Define the nonlinear operator A(t) in X by A(t)u = A(t, u)u. We assume that dom (A(t, u)) is independent of u ∈ D and for each α ∈ R there exists an ωα ∈ R such that

(1.2) hA(t, u)x1−A(s, u)x2, x1−x2i− ≤ ωα |x1−x2|+|f(t)−f(s)|L(|x2|)K(|A(s, u)x2|) for u ∈ Dα and x1 ∈ dom (A(t, u)), x2 ∈ dom (A(s, u)). Moreover, A(t) satisfies (C.2), i.e., (1.3) (1−λ ωα) |u1 −u2| ≤ |(u1 −u2)−λ (A(t)u1 −A(s)u2)|+λ |f(t)−f(s)|L(|u2|)K(|A(s)u2|) for u1 ∈ dom (A(t)) ∩ Dα, u2 ∈ dom (A(s)) ∩ Dα. We consider the finite difference λ approximation of (1.1); for sufficiently small λ > 0 there exists a family {ui } in D such that λ λ ui − ui−1 λ λ λ λ λ = A(ti , ui−1)ui with u0 = u0 hi (1.4) λ λ ϕ(ui ) − ϕ(ui−1) λ λ ≤ a ϕ(ui ) + b. hi Then, it follows from Theorems 2.5–2.7 that if the sequence

λ λ λ λ λ λ λ PNλ λ λ + (1.5) i = A(ti , ui )ui − A(ti , ui−1)ui satisfy i=1 hi |i | → 0 as λ → 0 , then (1.1) has the unique integrable solution. We have the following theorem.

Theorem 5.0 Assume (1.2) holds and for each α ∈ R there exist a cα ≥ 0 such that

(1.6) |(A(t, u)u − A(t, v)u| ≤ cα |u − v|, for u, v ∈ dom (A(t)) ∩ Dα.

137 Let f be of bounded variation and u0 ∈ dom (A(s)) ∩ D. Then (1.3) and (1.5) are sat- isfied and thus (1.1) has a unique integrable solution u and limλ→0+ uλ = u uniformly on [s, T ]. λ λ −1 λ λ Proof: If yi = (hi ) (ui − ui−1), then we have

λ λ λ λ λ λ λ λ yi+1 − yi = A(ti+1, ui )ui+1 − A(ti , ui )ui

λ λ λ λ λ λ +A(ti , ui )ui − A(ti , ui−1)ui and thus from (1.2) and (1.6)

λ λ λ λ λ λ (1 − λωα) |yi+1| ≤ (1 + λcα) |yi | + |f(ti+1) − f(ti )|L(|ui |)K(|yi |)

λ 0 0 with y0 = A(s, u )u = A(s)u0. Thus by the same arguments as in the proof of Lemma λ 2.4, we obtain |yi | ≤ M for some constant M and therefore from (1.6)

PNλ λ + i=1 |i |λ ≤ MT λ → 0 as λ → 0 . We also note that (1.3) follows from (1.2) and (1.6).

6.3 Navier Stokes Equation, Revisited We consider the incompressible Navier-Stokes equations (5.1). We use exactly the same notation as in Section.. Define the evolution operator A(t, u) by w = A(t, u)v ∈ H, where

(5.1) (w, φ)H + ν σ(u, φ) + b(u, v, φ) − (f(t), φ)H = 0 for φ ∈ V , with dom (A(t)) = dom (A0). We let D = V and define the functional as below. Theorem 5.1 The evolution operator (A(t, u), ϕ, D, H) defined above satisfies the conditions (1.2)–(1.5) with g(r) = b, a suitably chosen positive constant.

Proof: For u1, u2 ∈ dom (A0).

2 (A(t, v)u1 − A(s, v)u2, u1 − u2)H + ν |u1 − u2|V = (f(t) − f(s), u1 − u2)

δ since b(u, v1 − v2, v1 − v2) = 0, which implies (1.2). The existence of u ∈ V for the equation: δ−1(uδ − u0) = A(t, u0)uδ for u0 ∈ V , δ > 0 and t ∈ [0,T ] follows from Step 2. of Section NS and we have δ 2 0 2 |u |H − |u |H δ 2 1 2 (5.2) + ν |u | ≤ |f(t)| ∗ . δ V ν V δ We also have the estimate of |u |V . 1 ν 27M 4 1 (5.3) (|uδ|2 − |u0|2 ) + |A uδ|2 ≤ 1 |u0|2 |u0|2 |uδ|2 + |P f(t)|2 . 2δ V V 2 0 4ν3 H V V ν H δ 2 0 2 for the two dimensional case. Multiplying the both side of (5.2) by |u |H + |u |H , we have δ 4 0 4 |u |H − |u |H δ 2 0 2 δ 2 1 δ 2 0 2 2 + ν (|u | + |u | )|u | ≤ (|u | + |u | ) |f(t)| ∗ δ H H V ν H H V

138 Since s → log(1 + s) is concave

log(1 + |uδ|2 ) − log(1 + |u0|2 ) c ν4 V V 0 δ

δ 2 0 2 0 2 0 2 δ 2 −1 2 4 −1 |u |V − |u |V |u |H |u |V |u |V + 2ν |P f(t)|H ≤ c0ν δ 2 ≤ 2ν 0 2 1 + |u0|V 1 + |u |V 4 where we set c0 = 4 . Thus, if we define 27M1 2 4 2 ϕ(u) = 2 |u|H + c0ν log(1 + |u|V ) then for every δ > 0 ϕ(uδ) − ϕ(u0) ≤ b δ λ λ λ −1 λ for some constant b ≥ 0, since |ui |H is uniformly bounded. For (1.5) if yi = (hi ) (ui − λ ui−1), then

λ λ λ λ λ 2 λ λ λ λ λ λ λ (yi+1 − yi , yi+1) + νhi+1 |yi+1| + hi b(yi , ui , yi+1) + (f(ti+1) = f(ti ), yi+1) Note that

1 1 1 1 λ λ λ λ λ 2 λ 2 λ 2 λ 2 |b(yi , ui , yi+1)| ≤ M1|ui |V |yi |H |yi |V |yi+1|H |yi+1|V (5.4) ν ν M 2|uλ|2 M 2|uλ|2 ≤ |yλ |2 + |yλ |2 + 1 i V |yλ |2 + 1 i V |yλ |2 . 2 i+1 V 2 i+1 V 8ν i+1 V 8ν i+1 V λ For simplicity of our discussions we assume hi = h Then we have (5.5) λ 2 λ 2 λ 2 λ 2 λ λ λ (1−h ωα) |yi+1|H +hν |yi+1|V + ≤ (1+h ωα) |yi+1|H +ν |yi+1|V +|f(ti+1−f(ti )|H |yi+1|H

λ which implies that |yi |H ≤ M for some constant independent of h > 0. Note that

1 1 1 1 2 2 2 2 |A(t, u)u − A(t, v)u|H = sup |b(u, u, φ) − b(v, u, φ)| ≤ |u − v|V |u − v|H |u|V |A0u|H |phi|H ≤1 and from (5.3) N ν X hλ|A uλ|2 ≤ C 2 i 0 i H i=1 λ λ λ λ λ for some constant C independent of N and hi . Since |yi |H ≤ M, |ui − ui−1| ≤ M hi . Thus (1.5) holds and (1.1) has a unique integrable solution u and limλ→0+ uλ = u uniformly on [s, T ]. Next, we consider the three dimensional problem (d = 3). We show that there exists a locally defined solution and a global solution exists when the data (u0, f(·)) are small. We have the corresponding estimate of (5.3):

1 ν 27M 4 1 (5.6) (|uδ|2 − |u0|2 ) + |A uδ|2 ≤ 2 |u0|4 |uδ|2 + |P f(t)|2 . 2δ V V 2 0 4ν3 V V ν H

139 We define the functional ϕ by

2 −1 ϕ(u) = 1 − (1 + |u|V ) −1 λ Since s → 1 − (1 + s) is concave and |ui |H is uniformly bounded, it follows from (5.2) and (5.6) that for every δ > 0

ϕ(uδ) − ϕ(u0) 27M 4 1 (5.6) ≤ ( 2 |u0|4 |uδ|2 + |P f(t)|2 )(1 + |u0|2)−2. δ 4ν3 V V ν H Thus λ λ 4 ϕ(ui ) − ϕ(ui−1) 27M2 0 4 δ 2 1 2 0 2 −2 λ ≤ ( 3 |u |V |u |V + |P f(t)|H )(1 + |u | ) ≤ bi hi 4ν ν

4 27M2 λ 2 c1 where bi = ( 4ν3 |ui |V + ν and from (5.3)

N X 1 c1T hλ |uλ|2 ≤ (|u |2 + ). i i V ν 0 H ν i=1 The estimate (5.4) is replaced by

1 3 1 3 λ λ λ λ λ 4 λ 4 λ 4 λ 4 |b(yi , ui , yi+1)| ≤ M1|ui |V |yi |H |yi |V |yi+1|H |yi+1|V

ν ν 3M 2|uλ|2 3M 2|uλ|2 ≤ |yλ |2 + |yλ |2 + 1 i V |yλ |2 + 1 i V |yλ |2 . 2 i+1 V 2 i+1 V 32ν i+1 V 32ν i+1 V λ λ Thus, if hi = h then we obtain (5.5) and thus |yi |H ≤ M for some constant independent of h > 0 and (1.5) holds. 

6.4 Approximation theory In this section we discuss the approximation theory of the mild solution to (6.24). Consider an evolution equation in Xn: d (3.1) u (t) ∈ A (t)u (t) t > s; u (s) = x dt n n n n where Xn is a linear closed subspace of X. We consider the family of approximating sequences (An(t), dom (An(t)), ϕ, D) on Xn satisfying the uniform dissipativity: there + + exist constant ω = ωα continous functions f : [0,T ] → X and L : R → R indepen- dent of t, s ∈ [0,T ] and n such that

(3.2) (1 − λωα) |x1 − x2| ≤ |x1 − x2 − λ (y1 − y2)| + λ |f(t) − f(s)|L(x2)|K(|y2|) for all x1 ∈ Dα ∩ dom (An(t)) x2 ∈ Dα ∩ dom (An(s))and y1 ∈ An(t)x1, y2 ∈ An(s)x2, and the consistency:

for β > 0, t ∈ [0,T ], and [x, y] ∈ A(t) with x ∈ Dβ,

(3.3) there exists [xn, yn] ∈ An(t) with xn ∈ Dα(β) such that

lim |xn − x| + |yn − y| = 0 as n → ∞.

140 where α(β) ≥ β and α : R+ → R+ is an increasing function. Lemma 3.1 Let (A(t),X) satisfy (A.1) − (A.2) and either (R) − (C.1) or (R.1) − (C.2) λ λ λ λ on [0,T ]. For each λ > 0, we assume that (ti , xi , yi , i ) satisfies

λ λ λ xi − xi−1 λ λ λ yj = λ λ − i ∈ A(ti )xi ti − ti−1

λ λ λ with t0 = s and x0 = x and x ∈ Dα for 0 ≤ i ≤ Nλ. For the case of (R.1) − (C.2) l we assume that x ∈ D × dom (A(s)), f is of bounded variation, and |y ambdai| is uniformly bounded in 1 ≤ i ≤ Nλ and λ > 0. Then the step function uλ(t; s, x) defined λ λ λ by uλ(t, s, x) = xi on (ti−1, ti ], satisfies

2ω(2t+d ) |uλ(t; s, x) − u(t; s, x)| ≤ e λ (2|x − u| + dλ (|v| + Mρ˜ (T ))

−1 +M˜ (T − s)(c ρ(T )dλ + ρ(δ) + δλ).

Proof: It follows from Lemms 2.6–2.7 that there exists a DS-approximation sequence µ µ µ µ (tj , xj , yj , j ) as defined in (2.12). For the case of (C.2) − (R.1), from Lemma 2.4 we µ ˜ have |yj | ≤ K for 1 ≤ j ≤ Nµ uniformly in µ. It thus follows from Theorem 2.5 that there exists a constnat M˜ such that

2ω(2t+d ) |uλ(t; s, x) − uµ(t; s, x)| ≤ e λ (2|x − u| + dλ (|v| + Mρ˜ (T ))

−1 +M˜ (T − s)(c ρ(T )dλ + ρ(δ) + δλ − δµ).

Theorem 3.2 Let (An(t), dom (An(t)), ϕ, D) be approximating sequences satisfying (R) or (R.1) (resp.) and (3.2)–(3.3) and we assume (A(t), dom (A(t)), D, ϕ) satisfies (A.1)–(A.2) and (R) or (R.1) (resp.). Then for every x ∈ D ∩ dom (A(s)) and xn ∈ D ∩ Xn such that lim xn = x as n → ∞ we have

lim |un(t; s, xn) − u(t; s, x)| = 0, as n → ∞ uniformly on [s, T ], where u(t; s, x) and un(t; s, x) is the unique mild solution to (2.1) and (3.1), respectively.

Proof: Let [xi, yi] ∈ A(ti) and xi ∈ Dα for i = 1, 2. From (3.3) we can choose n n n n n [xi , yi ] ∈ A (ti), i = 1, 2 with xi ∈ Dα0 such that |xi − xi| + |yi − yi| → 0 as n → ∞ λ λ λ for i = 1, 2. Thus, letting n → ∞ in (3.2), we obtain (C.1) or (C.2). Let (ti , xi , yi ) be a DS–approximation sequence of (2.1). We assume that there exits a β > 0 such λ that xi ∈ Dβ for all λ and 1 ≤ i ≤ Nλ. By the consistency (3.3) for any  > 0 there exists an integer n = n() such that for n ≥ n()

λ,n λ λ,n λ (3.4) |xi − xi | ≤ , |yi − yi | ≤ 

λ,n and xi ∈ Dα for 1 ≤ i ≤ Nλ.

PNλ λ,n λ,n λ λ,n i=1 |xi − xi−1 − hi yi | (3.5) PNλ λ λ λ λ ≤ i=1 |xi − xi−1 − hi yi | + (Nλ + T )  + |xn − x| = δn,λ,

141 By Theorems 2.5 and 2.8 that

2ω(2t+dλ) |uλ,n(t, xn) − un(t; xn)| ≤ e (2|xn − un| + dλ (|vn| + Mρ˜ (T )) (3.6) −1 +M˜ (T − s)(c ρ(T )dλ + ρ(δ) + δλ,n,) for all 0 < c < δ < T and [un, vn] ∈ An(s) with un ∈ Dα. From the definition of function uλ, uλ,n and (3.4)

|uλ(t; s, x) − uλ,n(t; s, xn)| ≤ , t ∈ (s, T ], n ≥ n()

Thus, we have

2ω(2t+dλ) |un(t; s, xn) − u(t; , s, x)| ≤ e (2|xn − un| + 2|x − u| + dλ (|vn| + |v| + 2Mρ˜ (T ))

−1 +2M˜ (T − s)(c ρ(T )dλ + ρ(δ)) + δλ,n,) +  for t ∈ (s, T ], n ≥ n(), where [u, v] ∈ A(s) with u ∈ Dβ. From the consistency (3.3) we can take [un, vn] ∈ An(s) such that un → u and vn → v as n → ∞. It thus follows that

2ω(2t+d ) limn→∞ |un(t; s, xn) − u(t; s, x)| = e λ (4|x − u| + 2dλ (|v| + Mρ˜ (T ))

−1 +2M˜ (T − s)(c ρ(T )dλ + ρ(δ)) + δλ,) + 

+ + for t ∈ [s, T ], where δλ, = (Nλ +T ). Now, letting  → 0 and then λ → 0 , we obtain

4ωt lim |un(t; s, xn) − u(t; s, x)| = e (4|x − u| + ρ(δ)). n→∞

Since u ∈ Dβ ∩dom (A(s)) and δ > 0 are arbitrary it follows that limn→∞ |un(t; s, xn)− u(t; s, x)| = 0 uniformly on [s, T ].  The following theorems give the equivalent characterization of the consistency con- dition (3.3). Theorem 3.2 Let (An,Xn), n ≥ 1 and (A, X) be dissipative operators satisfying the condition dom (An) ⊂ R(I − λ An) and dom (A) ⊂ R(I − λ A) n −1 −1 and set Jλ = (I − λ An) and Jλ = (I − λ A) for λ > 0. Also, let B be the operator −1 −1 that has {[Jλx, λ (Jλx − x)] : x ∈ dom (A), 0 < λ < ω } as its graph. Then the following statements (i) and (ii) are equivalent. (i) B ⊂ limn→∞ An (i.e., for all [x, y] ∈ B there exists a sequence {(xn, yn)}| such that [xn, yn] ∈ An and lim |xn − x| + |yn − y| → 0 as n → ∞.) (ii) dom (A) ⊂ limn→∞ dom (An) (equivalently, for all x ∈ dom (A) there exists a sequence {xn} such that xn ∈ dom (An) and lim |xn − x| → 0 as n → ∞.) and for all n xn ∈ dom (An) and x ∈ dom (A) with x = limn→∞ xn, we have limn→∞ Jλ xn → Jλx for each 0 < λ < ω−1. In particular, if dom (A) ⊂ dom (An) for all n, then the above are equivalent to n −1 (iii) limn→∞ Jλ x → Jλx for all 0 < λ < ω and x ∈ dom (A). Proof: (i) → (ii). Assume (i) holds. Then it is easy to prove that dom (B) ⊂ + limn→∞ dom (An). Since Jλx → x as lambda → 0 for all x ∈ dom (A), it follows that

142 dom (B) = dom (A). Thus, the first assertion of (ii) holds. Next, we let xn ∈ dom (An), x ∈ dom (A) and limn→∞ xn = x. From (i), we have I −λ B ⊂ limn→∞ I −λ An. Thus, −1 R(I − λ B) ⊂ limn→∞ R(I − λ An). Since x = Jλx − λ{λ (Jλx − x)} ∈ (I − λ B)Jλx for x ∈ dom (A), we have dom (A) ⊂ R(I − λ B). From (i) we can choose [un, vn] ∈ An −1 n such that lim |un −Jλx)|+|vn −λ (Jλx−x)| → 0 as n → ∞. If zn = Jλ xn, then there exists a [zn, yn] ∈ An such that xn = zn − λyn. It thus follows from the dissipativity of An that

|un − zn| ≤ |un − zn − λ(vn − yn)| = |un − λvn − xn| → |Jλx − (Jλx − x) − x| = 0 as n → ∞. Hence, we obtain

lim Jλxn = lim zn = lim un = Jλx as n → ∞. (ii) → (i). If [u, v] ∈ B, then from the definition of B, there exist x ∈ dom (A) −1 −1 and 0 < λ < ω such that u = Jλx and v = λ (Jλx − x). Since x ∈ dom (A) ⊂ limn→∞ dom (An) we can choose xn ∈ dom (An) such that lim xn = x. Thus from the n −1 n −1 assumption, we have lim Jλ xn = jλx = u and lim λ (Jλ xn − xn) = λ (Jλx − x) = v as n → ∞. Hence we obtain [u, v] ∈ limn→∞ An and thus B ⊂ limn→∞ An. Finally, if dom (A) ⊂ dom (An), then (ii) → (iii) is obvious. Conversely, if (iii) holds, then

n n |Jλ xn − Jλx| ≤ |xn − x| + |Jλ x − Jλx| → 0 as n → ∞ when lim xn = x, and thus (ii) holds. 

Theorem 3.3 Let (An,Xn), n ≥ 1 and (A, X) be m-dissipative operators, i.e.,

Xn = R(I − λ An) and X = R(I − λ A)

n −1 −1 −1 and set Jλ = (I − λ An) and Jλ = (I − λ A) for 0 < λ < ω . Then the following statements are equivalent. (i) A = limn→∞ An (ii) A ⊂ limn→∞ An n (iii) For all xn, x ∈ X such that limn→∞ xn = x, we have limn→∞ Jλ xn → Jλx for each 0 < λ < ω−1. −1 n (iv) For all x ∈ X and 0 < λ < ω limn→∞ Jλ x → Jλx. −1 n (v) For some 0 < λ0 < ω , limn→∞ Jλ x → Jλx for all x ∈ X. Proof: (i) → (ii) and (iv) → (v) are obvious. (ii) → (iii) follows from the proof of (i) → (ii) in Theorem 3.2. (v) → (ii). If [x, y] ∈ A, then from (v), lim J n (x − λ y) = J (x − λ y) = x and λ0 0 λ0 0 lim λ−1(J n (x−λ y)−(x−λ y)) = y as n → ∞. Since λ−1(J n (x−λ y)−(x−λ y)) ∈ 0 λ0 0 0 0 λ0 0 0 A J n (x − λ y), [x, y] ∈ lim A and thus (ii) holds. n λ0 0 n→∞ n (ii) → (i). It suffices to show that limn→∞ An ⊂ A. If [x, y] ∈ limn→∞ An then there exists a [xn, yn] ∈ An such that lim |xn − x| + |yn − y| = 0 as n → ∞. Since n xn − λ yn → x − λ y and n → ∞, it follows from (iii) that xn = Jλ (xn − λ yn) → Jλ(x − λ y) as n → ∞ and hence x = Jλ(x − λ y) ∈ dom (A). But, since

−1 −1 λ (x − (x − λ y)) = λ (Jλ(x − λ y) − (x − λ y)) ∈ AJλ(x − λ y) = Ax we have [x, y] ∈ A and thus (i) holds. 

143 The following corollary is an immediate consequence of Theorems 3.1 and 3.3.

Corollary 3.4 Let (An(t),Xn) and (A(t),X) be m-dissipative operators for t ∈ [0,T ], (i.e., Xn = R(I − λ An(t)) and X = R(I − λ A(t)) and (C.2) is satisfied), and let Un(t, s),U(t, s) be the nonlinear semigroups generated n −1 −1 by An(t),A(t), respectively. We set Jλ (t) = (I −λ An(t)) and Jλ(t) = (I −λ A(t)) for 0 < λ < ω−1. Then, if

n Jλ0 xn → Jλ0 x as n → ∞ for all sequence {xn} satisfying xn → x as n → ∞, then

|Un(t, s)xn − U(t, s)x| → 0 as n → ∞ where the convergence is uniform on arbitrary bounded subintervals. The next corollary is an extension of the Trotter-Kato theorem on the convergence of linear semigroups to the nonlinear evolution operators.

Corollary 3.5 Let (An(t),X) be m-dissipative operators and {Un(t, s), t ≥ s ≥ 0} n −1 be the semigroups generated by An(t). We set Jλ (t) = (I − λ An(t)) . For some )λ < ω−1, we assume that there exists lim J n (t)x exits as n → ∞ for all x ∈ X and 0 λ0 t ≥ 0 and denote the limit by Jλ0 x. Then we have −1 (i) The operator A(t) defined by the graph {[Jλ0 (t)x, λ0 (Jλ0 (t)x − x)] : x ∈ X} is an m-dissipative operator. Hence (A(t),X) generates a nonlinear semigroup U(t, s) on X.

(ii) For every x ∈ R(Jλ0 (s)) (= dom (A(s))), there exist a xn ∈ dom (An(s)) such that lim xn = x and lim U(t, s)xn = U(t, s)x for t ≥ s ≥ 0 as n → ∞. Moreover, the above convergence holds for every xn ∈ dom (An(s)) = X satisfying lim xn = x, and the convergence is uniform on arbitrary bounded intervals.

Proof: (i) Let [ui, vi] ∈ A(ti), i = 1, 2. Then from the definition of A(t) we have

−1 ui = Jλ0 (ti)xi and vi = λ0 (Jλ0 (ti)xi − xi) for xi, i = 1, 2. By the assumption

n −1 n −1 lim Jλ0 (ti)xi = Jλ0 (ti)xi lim λ0 (Jλ0 (ti)xi − xi) = λ0 (Jλ0 (ti)xi − xi) as n → ∞ for i = 1, 2. Since λ−1(J n (t )x − x ) ∈ A (t )J n(J n (t )x ), it follows from 0 λ0 i i i n i λ0 i i (C.2) that

n n (1 − λω) |Jλ0 (t1)x1 − Jλ0 (t2)x2|

n n −1 n −1 n ≤ |Jλ0 (t1)x1 − Jλ0 (t2)x2 − λ (λ0 (Jλ0 (t1)x1 − x1) − λ0 (Jλ0 (t2)x2 − x2)|

n −1 n +λ |f(t) − f(s)|L(|Jλ0 (t2)x2|)(1 + |λ0 (Jλ0 (t2)xs − x2)|). Letting n → ∞, we obtain

(1 − λω) |u1 − u2| ≤ |u1 − u2 − λ (v1 − v2)| + λ |f(t) − f(s)|L(|u2|)(1 + |v2|).

144 Hence, (A(t),X) is dissipative. Next, from the definition of A(t), for every x|inX we −1 have x = Jλ0 (t)x − λ0(λ0 (Jλ0 (t)x − x)) ∈ (I − λ0 A(t))Jλ0 x, which implies Jλ0 (t)x = −1 (I − λ A(t)) x. Hence, we obtain R(I − λ0 A(t)) = X and thus A(t) is m-dissipative. −1 (ii) Since A(t) is an m-dissipative operator and Jλ0 x = (I−λ0 A(t)) and dom (A(t)) =

R((Jλ0 (t)), it follows from Corollary 3.4 that Un(t, s)xn → U(t, s)x as n → ∞ if x ∈ dom (A(s)) and xn → x as n → ∞. 

6.5 Chernoff Theorem In this section we discuss the Chernoff theorem for the evolution equation (2.1).

Lemma 2.12 Let {Tρ(t)}, t ∈ [0,T ] for ρ > 0 be a family of mapping from D into itself satisfying Tρ(t): Dβ → Dψ(ρ,β), for t ∈ [0,T ] and β ≥ 0,

|Tρ(t)x − Tρ(t)y| ≤ (1 + ωαρ) |x − y| for x, y ∈ Dα, and

|Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)| for x ∈ D and t, s ∈ [0,T ], where 1 A (t)x = (T (t)x − x), x ∈ D ρ ρ ρ

Then, the evolution operator Aρ(t), t ∈ [0,T ] satisfies (C.2).

Proof: For x1, x2 ∈ Dα

|x1 − x2 − λ (Aρ(t)x1 − Aρ(s)x2)|

λ λ ≥ (1 + ) |x − x | − (|T (t)x − T (t)x |) + |T (t)x − T (s)x | ρ 1 2 ρ ρ 1 ρ 2 ρ 2 ρ 2

≥ (1 − λωα) |x1 − x2| − λ L(|x2|)(1 + |Aρ(s)x2|) |f(t) − f(s)|.

Lemma 2.14 Let X0 be a closed convex subset of a Banach space X and α ≥ 1. Suppose C(t): X0 → X0, t ∈ [0,T ] be a family of evolution operators satisfying

(2.30) |C(t)x − C(t)y| ≤ α |x − y| for x, y ∈ X0, and

(2.31) |C(t)x − C(s)x| ≤ L(|x|)(δ + |(C(s) − I)x|) |g(t) − g(s)|. where δ > 0 and g : [0,T ] → X is continuous. Then, there exists a unique function 1 u = u(t; x) ∈ C (0,T ; X0) satisfying d (2.32) u(t) = (C(t) − I)u(t), u(0) = x, dt

145 and we have

(2.33) |u(t; x) − u(t; y)| ≤ e(α−1)t |x − y|.

Proof: It suffices to prove that there exists a unique function u ∈ C(0,T ; X0) satisfying Z t (2.34) u(t) = e−tx + e−(t−s)C(s)u(s) ds, t ∈ [0,T ]. 0

First, note that if v(t) : [0,T ] → X0 is continuous, then for t ∈ [0,T ]

−t R t −(t−s) e x + 0 e C(s)v(s) ds

R t −(t−s) ! 0 e C(s)v(s) ds = (1 − λ)x + λ ∈ X0 R t −(t−s) 0 e ds where Z t λ = e−(t−s) ds, 0 since X0 is closed and convex. Define a sequence of functions un ∈ C(0,T ; X0) by Z t −t −(t−s) un(t) = e x + e C(s)un−1(s) ds 0 with u0(t) = x ∈ X0. By induction we can show that

(αt)n Z t |un+1(t) − un(t)| ≤ |C(s)x − x| ds n! 0 and thus {un} is a Cauchy sequence in C(0,T ; X0). It follows that un converges to a unique limit u ∈ C(0,T ; X0) and u satisfies (2.34). Also, it is easy to show that there exists a unique continuous function that satisfies (2.34). Moreover since for x, y ∈ X0 Z t et |u(t; x) − u(t; y)| ≤ α es|u(s; x) − u(s; y)| ds, 0 by Gronwall’s inequality we have (2.33). Since s → C(s)u(s) ∈ X0 is continuous, 1 u ∈ C (0,T ; X0) satisfies (2.32). 

Theorem 2.15 Let α ≥ 1 and let C(t): Dβ → Dβ, t ∈ [0,T ] be a family of evolution operators satisfying (2.30)–(2.31). Assume that Xβ is a closed convex subset of X and g is Lipschitz continuous with Lipschitz constant Lg. Then, if we let u(t) = u(t; x) be the solution to d u(t; x) = (C(t) − I)u(t; x), u(0; x) = x, dt then there exist some constants Mτ such that for t ∈ [0, τ] (2.35) n n (α−1)t 2 1 |u(t; x) − Πi=1Cix| ≤ α e [(n − αt) + αt] 2 (Cn,τ + |C(0)x − x|)

Z t (α−1)(t−s) 2 1 +LgL(Mτ ) e [(n − s) − α(t − s) + α(t − s)] 2 K(|C(s)u(s) − u(s)|) ds 0

146 where Ci = C(i), i ≥ 0 and Cn,τ = L(|x|)Lg(δ + |C(0)x − x|) max(n, τ). Proof: Note that Z t (2.36) u(t; x) − x = es−t(C(s)u(s; x) − x) ds 0 Thus, from (2.30)–(2.31)

Z t |u(t; x) − x| ≤ es−t(|C(s)u(s; x) − C(0)u(s, x)| + |C(0)u(s; x) − C(0)x| + |C(0)x − x|) ds 0

Z t ≤ es−t(L(|u(s, x)|)|g(s) − g(0)|K(|C(0)x − x|) + |C(0)x − x| + α |u(s; x) − x|) ds 0 where K(s) = δ + s. Or, equivalently

Z t et |u(t; x)−x| ≤ es(L(M)|g(s)−g(0)|K(|C(0)x−x|)+|C(0)x−x|+α es|u(s; x)−x|) ds 0 since |u(t, x)| ≤ M on [0, τ] for some M = Mτ . Hence by Gronwall’s inequality Z t (2.37) |u(t; x)−x| ≤ e(α−1)(t−s)(L(M)|g(s)−g(0)|K(|C(0)x−x|)+|C(0)x−x|) ds. 0 It follows from (2.36) that

Z t n −t n s−t n u(t; x) − Πi=1Cix = e (x − Πi=1Cix) + e (C(s)u(s; x) − Πi=1Cix) ds. 0 Thus, by assumption Z t n −t n s−t n−1 |u(t) − Πi=1Cix| ≤ e |x − Πi=1Cix| + e (|C(s)u(s) − Cnu(s)| + |Cnu(s) − CnΠi=1 Cix|) ds 0

Z t −t n s−t n−1 ≤ e α δn + e (L(M)|g(s) − g(n)| |C(s)x − x| + α |u(s) − Πi=1 Cix| ds 0 Pn where δn = i=1 |Cix − x|. If we define

−n t n ϕn(t) = α e |u(t, x) − Πi=1Cix| then Z t s −n ϕn(t) ≤ δn + (Me α |g(s) − g(n)| + ϕn−1(s)) ds 0 By induction in n we obtain from (2.36)–(2.37) (2.38) n−1 k Z t X δn−kt 1 ϕ (t) ≤ + (t − s)n−1αseαs ds (L(|x|)L τK(|C(0)x − x|) + |C(0)x − x|) n k! (n − 1)! g k=0 0

n X Z t (t − s)k +L(M)α−n esαk |C(s)u(s) − u(s)| |g(s) − g(n − k)| ds k! k=0 0

147 on t ∈ [0, τ], where we used

Z t e(α−1)s ds ≤ αte−teαt. 0 Since Z t (k + 1)!(n − 1)! (t − s)n−1sk+1 ds = tk+n+1 . 0 (k + n + 1)! we have ∞ Z t X αk Z t (t − s)n−1seαs ds = (t − s)n−1sk+1 ds k! 0 k=0 0

∞ ∞ X (k + 1)αktk+n+1 X (k − n)αk−1tk = (n − 1)! = (n − 1)! . (k + n + 1)! k! k=0 k=n+1

Note that from (2.31)

|C(k)x − x| ≤ L(|x|)LgkK(|C(0)x − x|) for 0 ≤ k ≤ n.

Let Cn,τ = L(|x|)LgK(|C(0)x − x|)max(n, τ). Then we have (2.39) n−1 k Z t X δn−kt 1 + (t − s)n−1αseαs ds (L(|x|)L τK(|C(0)x − x|) + |C(0)x − x|) k! (n − 1)! g k=0 0

∞ k k ! X |k − n|α t αt 2 1 ≤ (C + |C(0)x − x|) ≤ Ce [(n − αt) + αt] 2 (C + |C(0)x − x|), k! n,τ n,τ k=0 where we used

1 ∞ k k ∞ 2 k k ! 2 X |k − n|α t αt X |k − n| α t αt 2 1 ≤ ≤ e [(n − αt) + αt] 2 k! 2 k! k=0 k=0 Moreover, we have

n X (t − s)k esαk |g(s) − g(n − k)| k! k=0

n X αk(t − s)k ≤ L es |s − (n − k)| g k! (2.40) k=0

1 ∞ 2 k k ! 2 s α(t−s) X |s − (n − k)| α (t − s) ≤ L e e 2 g k! k=0

s α(t−s) 2 1 ≤ Lge e [(n − s) − α(t − s) + α(t − s)] 2

Hence (2.35) follows from (2.36)–(2.40). 

148 Theorem 2.16 Let {Tρ(t)}, t ∈ [0,T ] for ρ > 0 be a family of mapping from D into itself satisfying

(2.41) Tρ(t): Dβ → Dψ(ρ,β), for t ∈ [0,T ] and β ≥ 0,

(2.42) |Tρ(t)x − Tρ(t)y| ≤ (1 + ωαρ) |x − y| for x, y ∈ Dα, and

(2.43) |Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)| for x ∈ D and t, s ∈ [0,T ], where 1 A (t)x = (T (t)x − x), x ∈ D ρ ρ ρ

Assume that Dβ is a closed convex subset in X for each β ≥ 0 and f is Lipschitz continuous on [0,T ] with Lipschitz constant Lf . Then, if we let uρ(t) = u(t; x) be the solution to d (2.44) u(t; x) = A (t)u(t; x), u(0; x) = x ∈ D , dt ρ α then there exist constant M and ω = ωα such that

t [ ρ ] 1 ωt 2 2 ωt |uρ(t) − Πk=1Tρ(kρ)x| ≤≤ e [(1 + ωt) ρ + ωtρ + t] (e (L(|x|Lf t(1 + |Aρ(0)x|) + |Aρ(0)x|)

Z t ω(t−σ) √ +Lf L(M) e (1 + |Aρ(σ)uρ(σ)|) dσ) ρ. 0 for x ∈ dom (A(0)) and t ∈ [0,T ].

Proof: Let α = 1 + ωρ and we let C(t) = Tρ(ρ t), g(t) = f(ρt) and δ = ρ. Then, C(t) satisfies (2.30)–(2.31). Next, note that |uρ(t)| ≤ M. It thus follows from Lemma 2.14 and Theorem 2.15 that

t [ ρ ] 1 √ 2ωt 2 2 |uρ(t) − Πk=1Tρ(kρ)x| ≤ e [(1 + ωt) ρ + ωtρ + t] ρ ((|Aρ(0)x| + L(|x|)Lf t(1 + |Aρ(0)x|))

Z t ωt ω(t−σ) 2 1 √ +Lf L(M)e e [(1 + ω(t − σ)) ρ + ω(t − σ)ρ + (t − σ)] 2 ρ(1 + |Aρ(σ)uρ(σ)|) dσ. 0 σ where we set s = ρ , since Lg = Lf ρ.  Theorem 2.17 Assume that (A(t), dom (A(t)), D, ϕ) satisfies (A.1) − (A.2) and either (R) − (C.1) or (2.3) − (C.2) and that Dβ is a closed convex subset in X for each β ≥ 0 and f is Lipschitz continuous on [0,T ]. Let {Tρ(t)}, t ∈ [0,T ] for ρ > 0 be a family of mapping from D into itself satisfying (2.41)–(2.43) and assume the consistency

for β ≥ 0, t ∈ [0,T ], and [x, y] ∈ A(t) with x ∈ Dβ

there exists xρ ∈ Dα(β) such that lim |xρ − x| + |Aρ(t)xρ − y| = 0 as ρ → 0.

149 Then, for 0 ≤ s ≤ t ≥ T and x ∈ D ∩ dom (A(s))

[ t−s ] ρ + (2.45) |Πk=1 Tρ(kρ + s)x − u(t; s, x)| → 0 as ρ → 0 uniformly on [0,T ], where u(t; s, x) is the mild solution to (2.1), defined in Theorem 2.10. Proof: Without loss of generality we can assume that s = 0. It follows from Lemma 2.4 and Lemma 2.12 that

(α−1) t+L(M) |f|BV (0,T ) (2.46) |Aρ(t)u(t)| ≤ e (|Aρ(0)x| + L(M) |f|BV (0,T )) on [0,T ]. Note that uρ(t) satisfies u (iλ) − u ((i − 1)λ) ρ ρ = A (iλ)u (iλ) + λ λ ρ ρ i where Z iλ λ 1 i = (Aρ(s)uρ(s) − Aρ(iλ)uρ(iλ)) ds. λ (i−1)λ

Since Aρ(t) : [0,T ] × X → X and t → uρ(t) ∈ X are Lipschitz continuous, it follows that PNλ λ + i=1 λ |i | → 0 as λ → 0 .

Hence uρ(t) is the integral solution to (2.44). It thus follows from Theorem 3.2 that

lim |uρ(t) − u(t; s, x)|X → 0 as ρ → 0 for x ∈ D ∩ dom (A(s)). Now, from Theorem 2.16 and (2.46)

t−s [ ρ ] √ |Πk=1 Tρ(kρ + s)x − uρ(t)| ≤ M ρ for x ∈ Dβ ∩ dom (A(s)). Thus, (2.45) holds for all for x ∈ Dβ ∩ dom (A(s)). By the continuity of the right-hand side of (2.45) with respect to the initial condition x ∈ X (2.45) holds for all x ∈ Dβ ∩ dom (A(s)).  The next corollary follows from Theorem 2.17 and is an extension of the Chernoff theorem of the autonomous nonlinear semigroups to the evolution case. Corollary 2.18 (Chernoff Theorem Let C be a closed convex subset of X. We assume that the evolution operator (A(t),X) satisfies (A.1) with Lipschtz continuous f and C ⊂ R(I − λ A(t)) and (I − λA(t))−1 ∈ C for 0 < λ ≤ δ and t ≥ 0. Let {Tρ(t)}, t ≥ 0 for ρ > 0 be a family of mapping from C into itself satisfying |Tρ(t)x − Tρ(t)y| ≤ (1 + ωρ) |x − y| and |Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)| for x, y ∈ C. If A(t) ⊂ limρ→0+ Aρ(t), or equivalently T (t) − I (I − λ ρ )−1x → (I − λ A(t))−1x ρ

150 for all x ∈ C and 0 < λ ≤ δ, then for x ∈ C and 0 ≤ s ≤ t

[ t−s ] ρ + |Πk=1 Tρ(s + kρ)x − U(t, s)x| → 0 as ρ → 0 , where U(t, s) is the nonlinear semigroup generated by A(t) and the convergence is uniform on arbitrary bounded intervals. Corollary 2.19 (Chernoff Theorem) Assume that (A(t),X) is m-dissipative and satisfy (A.1) with Lipschitz continuous f, and that dom (A(t)) are independent of t ∈ [0,T ] and convex. Let {Tρ(t)}, t ≥ 0 for ρ > 0 be a family of mapping from X into itself satisfying |Tρ(t)x − Tρ(t)y| ≤ (1 + ωρ) |x − y| for x, y ∈ X, and

|Aρ(t)x − Aρ(s)x| ≤ L(|x|)(1 + |Aρ(s)x|) |f(t) − f(s)|

If A(t) ⊂ limρ→0+ Aρ(t), or equivalently

T (t) − I (I − λ ρ )−1x → (I − λ A(t))−1x 0 ρ 0

−1 for all x ∈ X, t ≥ 0 and some 0 < λ0 < ω , then for x ∈ X and t ≥ s ≥ 0

[ t−s ] ρ + |Πk=1 Tρ(s + kρ)x − U(t, s)x| → 0 as ρ → 0 , where U(t, s) is the nonlinear semigroup generated by A(t) and the convergence is uniform on arbitrary bounded intervals. Theorem 2.20 Let (A(t),X) be m-dissipative subsets of X ×X and satisfy (A.1) with Lipschitz continuous f and assume that dom (A(t)) are independent of t ∈ [0,T ] and −1 convex. Let Aλ(t) = λ (Jλ(t)−I) for λ > 0 and t ∈ [0,T ] and u(t; s, x) = U(t, s; Aλ)x be the solution to d u(t) = A (t)u(t), u(s) = x ∈ dom (A(s)) dt λ Then, we have

t [ λ ] U(t, s)x = lim Πk=1Jλ(s + k λ) = lim U(t, s; Aλ)x. λ→0+ λ→0+ 6.6 Operator Splitting ∗ Theorem 3.1 Let X and X be uniformly convex and let An, n ≥ 1 and A be m- 0 dissipative subsets of X × X. If for all [x, y] ∈ A there exists [xn, yn] ∈ An such that |xn − x| + |yn − y| → 0 as n → ∞, then

Sn(t)xn → S(t)x as n → ∞ for every sequence xn ∈ dom (An) satisfying xn → x ∈ dom (A), where the convergence is uniform on arbitrary bounded intevals.

151 Proof: It follows from Theorem 1.4.2–1.4.3 that for x ∈ dom (A), S(t)x ∈ dom (A0), d+ 0 0 dt S(t)x = A S(t)x, and t → A S(t)x ∈ X is continuous except a countable number λ λ λ of values t ≥ 0. Hence if we define xi = S(ti )x, ti = i λ, then

λ λ λ Z ti xi − xi−1 0 λ λ −1 0 λ λ − A xi = i = λ (A (t)x(t) − A(ti )x(ti )) dt λ λ ti−1 where Z T t PNλ λ 0 0 i=1 λ |i | ≤ |A S(t)x − A S(([ ] + 1)λ| dt. 0 λ 0 0 t + Since |A S(t)x − A S(([ λ ] + 1)λ)x| → 0 a.e. t ∈ [0,T ] as λ → 0 , by Lebesgue PNλ λ dominated convergence theorem i=1 λ |i | → 0 as λ → 0. Hence it follows from the proof of Theorem 2.3.2 that |Sn(t)xn − S(t)x| → 0 as λ → 0 for all xn ∈ dom (A) satisfying xn → x ∈ dom (A). The theorem follows from the fact that S(t) and Sn(t) are of cotractions.  Theorem 3.2 Let X and X∗ be uniformly convex and let A and B be two m-dissipative subests of X × X. Assume that A + B is m-dissipative and let S(t) be the semigroup generated by A + B. Then we have

[ t ] S(t)x = lim (I − ρ A)−1(I − ρ B)−1 ρ x ρ→0+ for x ∈ dom (A) ∩ dom (B), uniformly in any t-bounded intervals.  A B B Proof: Define Tρ = Jρ Jρ and let xρ = x−ρ b where b ∈ Bx. Then, since Jρ (x−ρ b) = x T x − x J Ax − x ρ ρ ρ = ρ + b. ρ ρ −1 0 + 0 Hence ρ (Tρxρ − xρ) → A x + b as ρ → 0 . If we choose b ∈ Bx such that A x + b = 0 −1 0 + (A + B) x, then ρ (Tρxρ − xρ) → (A + B) x as ρ → 0 . Thus, the theorem follows from Theorem 3.1 and Corollay 2.3.6.  Theorem 3.3 Let X and X∗ be uniformly convex and let A and B be two m-dissipative subests of X × X. Assume that A + B is m-dissipative and let S(t) be the semigroup generated by A + B. Then we have

 ρ ρ [ t ] S(t)x = lim (2(I − A)−1 − I)(2(I − B)−1 − I) ρ x ρ→0+ 2 2 for x ∈ dom (A) ∩ dom (B), uniformly in any t-bounded intervals. A B Proof: Define T2ρ = (2Jρ − I)(2Jρ − I) and let xρ = x − ρ b where b ∈ Bx. Then, B B since Jρ (x − ρ b) = x, it follows that 2Jρ (xρ) − xρ = x + ρ b and

T x − x 2J A(x + ρ b) − (x + ρ b) − (x − ρ b) J A(x + ρ b) − x 2ρ ρ ρ = ρ = ρ . 2ρ 2ρ ρ If we define the subset E of X × X by Ex = Ax + b, then E is m-dissipative and A E −1 0 Jρ (x+ρ b) = Jρ x. Thus, it follows from Theorem 1.6 that ρ (Tρxρ −xρ) → (A+B) x + as ρ → 0 and hence the theorem follows from Theorem 3.1 and Corollary 2.3.6. 

152 Theorem 3.4 Let X and X∗ be uniformly convex and let A and B be two m-dissipative subests of X ×X. Assume that A, B are single-valued and A+B is m-dissipative. Let SA(t),SB(t) and S(t) be the semigroups generated by A, B and A + B, respectively. Then we have [ t ] S(t)x = lim (SA(ρ)SB(ρ)) ρ x ρ→0+ for x ∈ dom (A) ∩ dom (B), uniformly in any t-bounded intervals.

Proof: Clearly Tρ = SA(ρ)SB(ρ) is nonexpansive on C = dom (A) ∩ dom (B). We −1 0 show that limρ→0+ h (Tρx − x) = Ax + B x for every dom (A) ∩ dom (B). Note that

T x − x S (ρ)x − x S (ρ)S (ρ)x − S (ρ)x ρ = A + A B A . ρ ρ ρ

−1 Since A is single-valued, it follows from Theorem 1.4.3 that limρ→0+ h (SA(ρ)x−x) = Ax. Thus, it suffices to show that

S (ρ)S (ρ)x − S (ρ)x y = A B A → B0x as ρ → 0+. ρ ρ

Since SA(ρ) is nonexpansive and B is dissipative, it follows from Theorem 1.4.3 that S (ρ)x − x (3.1) |y | ≤ | B | ≤ |B0x| for all ρ > 0. ρ ρ On the other hand,

S (ρ)u − u S (ρ)x − x S (ρ)x − x (3.2) h A + B − A − y ,F (u − S (ρ)x)i ≥ 0 ρ ρ ρ ρ B

since SA(ρ) − I is dissipative and F is singled-valued. We choose a subsequence {yρn } that converges weakly to y in X. It thus follows from (3.2) that since A is single-valued

hAu + B0x − Ax − y, F (u − x)i ≥ 0.

0 Since A is maximal dissipstive, this implies that y = B x and thus yρn converges 0 weakly to B x. Since X is uniformly convex, (3.2) implies that yρn converges strongly 0 + 0 0 to B x as ρn → 0 . Now, since C = A + B is m-dissipative and C x = Ax + B x, the theorem follows from Corollary 2.3.6. 

7 Monotone Operator Theory and Nonlinear semi- group

In this section we discuss the nonlinear operator theory that extends the Lax-Milgram theory for nonlinear monotone operators and mappings from Banach space from X to X∗. Let us denote by F : X → X∗, the duality mapping of X, i.e.,

F (x) = {x∗ ∈ X∗ : hx, x∗i = |x|2 = |x∗|2}.

153 By Hahn-Banach theorem, F (x) is non-empty. In general F is multi-valued. Therefore, when X is a Hilbert space, h·, ·i coincides with its inner product if X∗ is identified with X and F (x) = x. Let H be a Hilbert space with scalar product (φ, ψ) and X be a real, reflexive Banach space and X ⊂ H with continuous dense injection. Let X∗ denote the strong dual space of X. H is identified with its dual so that X ⊂ H = H∗ ⊂ X∗. The dual product hφ, ψi on X × X∗ is the continuous extension of the scalar product of H restricted to X × H. The following proposition contains some further important properties of the duality mapping F . Theorem (Duality Mapping) (a) F (x) is a closed convex subset. (b) If X∗ is strictly convex (i.e., balls in X∗ are strictly convex), then for any x ∈ X, F (x) is single-valued. Moreover, the mapping x → F (x) is demicontinuous, i.e., if ∗ xn → x in X, then F (xn) converges weakly star to F (x) in X . (c) Assume X be uniformly convex (i.e., for each 0 <  < 2 there exists δ = δ() > 0 such that if |x| = |y| = 1and |x − y| > , then |x + y| ≤ 2(1 − δ)). If xn converges weakly to x and lim supn→∞ |xn| ≤ |x|, then xn converges strongly to x in X. (d) If X∗ is uniformly convex, then the mapping x → F (x) is uniformly continuous on bounded subsets of X. Proof: (a) Closeness of F (x) is an easy consequence of the follows from the continuity ∗ ∗ of the duality product. Choose x1, x2 ∈ F (x) and α ∈ (0, 1). For arbitrary z ∈ X ∗ ∗ ∗ ∗ ∗ ∗ we have (using |x1| = |x2| = |x|) hz, αx1 + (1 − α)x2i ≤ α|z| |x1| + (1 − α)|z| |x2| = ∗ ∗ ∗ ∗ 2 |z| |x|, which shows |αx1 + (1 − α)x2| ≤ |x|. Using hx, x i = hx, x1i = |x| we get ∗ ∗ ∗ ∗ 2 ∗ ∗ hx, αx1 + (1 − α)x2i = αhx, x1i + (1 − α)hx, x2i = |x| , so that |αx1 + (1 − α)x2| = |x|. ∗ ∗ This proves αx1 + (1 − α)x2 ∈ F (x). ∗ ∗ ∗ ∗ ∗ (b) Choose x1, x2 ∈ F (x), α ∈ (0, 1) and assume that |αx1 +(1−α)x2| = |x|. Since X ∗ ∗ is strictly convex, this implies x1 = x2. Let {xn} be a sequence such that xn → x ∈ X. ∗ From |F (xn)| = |xn| and the fact that closed balls in X are weakly star compact we ∗ see that there exists a weakly star accumulation point x of {F (xn)}. Since the closed ball in X∗ is weakly star closed, thus

hx, x∗i = |x|2 ≥ |x∗|2.

Hence hx, x∗i = |x|2 = |x∗|2 and thus x∗ = F (x). Since F (x) is single-valued, this implies F (xn) converges weakly to F (x). (c) Since lim inf |xn| ≤ |x|, thus limn→∞ |xn| = |x|. We set yn = xn/|xn| and y = x/|x|. Then yn converges weakly to y in X. Suppose yn does not converge strongly to y in X. Then there exists an  > 0 such that for a subsequence yn˜ |yn˜ − y| > . Since ∗ X is uniformly convex there exists a δ > 0 such that |yn˜ + y| ≤ 2(1 − δ). Since the norm is weakly lower semicontinuos, lettingn ˜ → ∞ we obtain |y| ≤ 1 − δ, which is a contradiction. (d) Assume F is not uniformly continuous on bounded subsets of X. Then there exist constants M > 0,  > 0 and sequences {un}, {vn} in X satisfying

|un|, |vn| ≤ M, |un − vn| → 0, and |F (un) − F (vn)| ≥ .

Without loss of the generality we can assume that, for a constant β > 0, we have in

154 addition |un| ≥ β, |vn| ≥ β. We set xn = un/|un| and yn = vn/|vn|. Then we have 1 |xn − yn| = ||vn|un − |un|vn| |un| |vn|

1 2M ≤ (|v ||u − v | + ||v | − |u || |v |) ≤ |u − v | → 0 as n → ∞. β2 n n n n n n β2 n n

Obviously we have 2 ≥ |F (xn) + F (yn)| ≥ hxn,F (xn) + F (yn)i and this together with

2 2 hxn,F (xn) + F (yn)i = |xn| + |yn| + hxn − yn,F (yn)i

= 2 + hxn − yn,F (yn)i ≥ 2 − |xn − yn| implies lim |F (xn) + F (yn)| = 2. n→∞

Suppose there exists an 0 > 0 and a subsequence {nk} such that |F (xnk )−F (ynk )| ≥ 0. ∗ Observing |F (xnk )| = |F (ynk )| = 1 and using uniform convexity of X we conclude that there exists a δ0 > 0 such that

|F (xnk ) + F (ynk )| ≤ 2(1 − δ0), which is a contradiction to the above. Therefore we have limn→∞ |F (xn) − F (yn)| = 0. Thus |F (un) − F (vn)| ≤ |un| |F (xn) − F (yn)| + ||un| − |vn|| |F (yn)| which implies F (un) converges strongly to F (vn). This contradiction proves the result. 

7.1 Monotone equations and Minty–Browder Theorem Definition (Monotone Mapping) (a) A mapping A ⊂ X × X∗ be given. is called monotone if

hx1 − x2, y1 − y2i ≥ 0 for all [x1, y1], [x2, y2] ∈ A.

(b) A monotone mapping A is called maximal monotone if any monotone extension of A coincides with A, i.e., if for [x, y] ∈ X × X∗, hx − u, y − vi ≥ 0 for all [u, v] ∈ A then [x, y] ∈ A. (c) The operator A is called coercive if for all sequences [xn, yn] ∈ A with limn→∞ |xn| = ∞ we have hx , y i lim n n = ∞. n→∞ |xn| (d) Assume that A is single-valued with dom(A) = X. The operator A is called hemicontinuouson X if for all x1, x2, x ∈ X, the function defined by

t ∈ R → hx, A(x1 + tx2)i is continuous on R.

155 For example, let F be the duality mapping of X. Then F is monotone, coercive and hemicontinuous. Indeed, for [x1, y1], [x2, y2] ∈ F we have 2 2 2 hx1 − x2, y1 − y2i = |x1| − hx1, y2i − hx2, y1i + |x2| ≥ (|x1| − |x2|) ≥ 0, (7.1) which shows monotonicity of F . Coercivity is obvious and hemicontinuity follows from the continuity of the duality product. Lemma 1 Let X be a finite dimensional Banach space and A be a hemicontinuous monotone operator from X to X∗. Then A is continuous. Proof: We first show that A is bounded on bounded subsets. In fact, otherwise there exists a sequence {xn} in X such that |Axn| → ∞ and xn → x0 as n → ∞. By monotonicity we have

Axn Ax hxn − x, − i ≥ 0 for all x ∈ X. |Axn| |Axn|

Without loss of generality we can assume that Axn → y in X∗ as n → ∞. Thus |Axn| 0

hx0 − x, y0i ≥ 0 for all x ∈ X and therefore y0 = 0. This is a contradiction and thus A is bounded. Now, assume {xn} converges to x0 and let y0 be a cluster point of {Axn}. Again by monotonicity of A hx0 − x, y0 − Axi ≥ 0 for all x ∈ X.

Setting x = x0 + t (u − x0), t > 0 for arbitrary u ∈ X, we have

hx0 − u, y0 − A(x0 + t (u − x0)) ≥ 0i for all u ∈ X. Then, letting limit t → 0+, by hemicontinuity of A we have

hx0 − u, y0 − Ax0i ≥ 0 for all u ∈ X, which implies y0 = Ax0.  Lemma 2 Let X be a reflexive Banach space and A : X → X∗ be a hemicontinuous monotone operator. Then A is maximumal monotone. ∗ Proof: For [x0, y0] ∈ X × X

hx0 − u, y0 − Aui ≥ 0 for all u ∈ X.

+ Setting u = x0 + t (x − x0), t > 0 and letting t → 0 , by hemicontinuity of A we have

hx0 − x, y0 − Ax0i ≥ 0 for all x ∈ X.

Hence y0 = Ax0 and thus A is maximum monotone.  The next theorem characterizes maximal monotone operators by a range condition.

Minty–Browder Theorem Assume that X, X∗ are reflexive and strictly convex. Let F denote the duality mapping of X and assume that A ⊂ X × X∗ is monotone. Then A is maximal monotone if and only if Range (λF + A) = X∗

156 for all λ > 0 or, equivalently, for some λ > 0.

Proof: Assume that the range condition is satisfied for some λ > 0 and let [x0, y0] ∈ X × X∗ be such that

hx0 − u, y0 − vi ≥ 0 for all [u, v] ∈ A.

Then there exists an element [x1, y1] ∈ A with

λF x1 + y1 = λF x0 + y0. (7.2)

From these we obtain, setting [u, v] = [x1, y1],

hx1 − x0, F x1 − F x0i ≤ 0.

By monotonicity of F we also have the converse inequality, so that

hx1 − x0, F x1 − F x0i = 0.

2 2 From (7.1) this implies that |x1| = |x0| and hx1, F x0i = |x1| , hx0, F x1i = |x0| . Hence F x0 = F x1 and 2 2 hx1, F x0i = hx0, F x0i = |x0| = |F x0| . If we denote by F ∗ the duality mapping of X∗ (which is also single-valued), then the last ∗ equation implies x1 = x0 = F (F x0). This and (7.2) imply that [x0, y0] = [x1, y1] ∈ A, which proves that A is maximal monotone.  In stead of the detailed proof of ”only if’ part of Theorem, we state the following results.  Corollary Let X be reflexive and A be a monotone, everywhere defined, hemicontinous operator. If A is coercive, then R(A) = X∗. ∗ Proof: Suppose A is coercive. Let y0 ∈ X be arbitrary. By the Appland’s renorming theorem, we may assume that X and X∗ are strictly convex Banach spaces. It then follows from Theorem that every λ > 0, equation

λ F xλ + Axλ = y0 has a solution xλ ∈ X. Multiplying this by xλ,

2 λ |xλ| + hxλ, Axλi = hy0, xλi. and thus hxλ, Axλi ≤ |y0|X∗ |xλ|X + Since A is coercive, this implies that {xλ} is bounded in X as λ → 0 . Thus, we may ∗ assume that xλ converges weakly to x0 in X and Axλ converges strongly to y0 in X as λ → 0+. Since A is monotone

hxλ − x, y0 − λ F xλ − Axi ≥ 0, and letting λ → 0+, we have

hx0 − x, y0 − Axi ≥ 0,

157 for all x ∈ X. Since A is maximal monotone, this implies y0 = Ax0. Hence, we ∗ conclude R(A) = X .  Theorem (Galerkin Approximation) Assume X is a reflexive, separable Banach space and A is a bounded, hemicontinuous, coercive monotone operator from X into ∗ n X . Let Xn = span{φ}i=1 satisfies the density condition: for each ψ ∈ X and any  > 0 there exists a sequence ψn ∈ Xn such that |ψ − ψn| → 0 as n → ∞. The xn be the solution to hψ, Axni = hψ, fi for all ψ ∈ Xn, (7.3) then there exists a subsequence of {xn} that converges weakly to a solution to Ax = f.

Proof: Since hx, Axi/|x|X → ∞ as |x|X → ∞ there exists a solution xn to (7.3) and |xn|X is bounded. Since A is bounded, thus Axn bounded. Thus there exists a subsequence of {n} (denoted by the same) such that xn converges weakly to x in X ∗ and Axn converges weakly in X . Since

lim hψ, Axni = lim (hψn, fi + hψ − ψn, Axni) = hψ, fi n→∞ n→∞

Axn converges weakly to f. Since A is monotone

hxn − u, Axn − Aui ≥ 0 for all u ∈ X

Note that lim hxn, Axni = lim hxn, fi = hx, fi. n→∞ n→∞ Thus taking limit n → ∞, we obtain

hx − u, f − Aui ≥ 0 for all u ∈ X.

Since A is maximum monotone this implies Ax = f.  The main theorem for monotone operators applies directly to the model problem involving the p-Laplace operator

−div(|∇u|p−2∇u) = f on Ω

(with appropriate boundary conditions) and ∂ −∆u + c u = f, − u ∈ β(u). at ∂Ω ∂n with β maximal monotone on R. Also, nonlinear problems of non-variational form are applicable, e.g., Lu + F (u, ∇u) = f on Ω where L(u) = −div(σ(∇u) −~b u) 1,p and we are looking for a solution u ∈ W0 (Ω), 1 < p < ∞. We assume the following conditions: (i) Monotonicity for the principle part L(u):

n (σ(ξ) − σ(η), ξ − η)Rn ≥ 0 for all ξ, η ∈ R .

158 (ii) Monotonicity for F = F (u):

(F (u) − F (v), u − v) ≥ 0 for all u, v ∈ R.

(iii) Coerciveness and Growth condition: for some c, d > 0

(σ(ξ), σ) ≥ c |ξ|p, |σ(ξ)| ≤ d (1 + |ξ|p−1) hold for all ξ ∈ Rn.

7.2 Pseudo-Monotone operator Next, we examine a somewhat more general class of nonlinear operators, called pseudo- monotone operators. In applications, it often occurs that the hypotheses imposed on monotonicity are unnecessarily strong. In particular, the monotonicity assumption F (u) involves both the first-order derivatives and the function itself. In general the compactness will take care of the lower-order terms. Definition (Pseudo-Monotone Operator) Let X be a reflexive Banach space. An operator T : X → X∗ is said to be pseudo-monotone operator if T is a bounded operator (not necessarily continuous) and if whenever un converges weakly to u in X and lim suphun − u, T (un)i ≤ 0, n→∞ it follows that, for all v ∈ X,

lim infhun − v, T (un)i ≥ hu − v, T (u)i. (7.4) n→∞ For example, a monotone and hemi-continuous operator is pseudo-monotone. Lemma If T is maximal monotone, then T is pseudo-monotone. Proof: Since T is monotone,

hun − u, T (un)i ≥ hun − u, T (u)i → 0 it follows from (7.4) that limn hun − u, T (un)i = 0. Assume un → u and T (un) → y weakly in X and X∗, respectively. Thus,

0 ≤ hun − v, T (un) − T (v)i = hun − u + u − v, T (un) − T (v)i → hu − v, y − T (v)i.

Since T is maximal monotone, we have y = T (u). Now,

limhun − v, T (un)i = limhun − u + u − v, T (un)i = hu − v, T (u)i. n n  A class of pseudo-monotone operators is intermediate between monotone and hemi- continuous. The sum of two pseudo-monotone operators is pseudo-monotone. More- over, the sum of a pseudo-monotone and a strongly continuous operator (xn converges weakly to x, then T xn → T x strongly) is pseudo-monotone. Property of Pseudo-monotone operator (1) Letting v = u in (7.4)

0 ≤ lim infhun − u, T (un)i ≤ lim suphun − u, T (un)i ≤ 0

159 and thus limnhun − u, T (un)i = 0. (2) From (1) and (7.4) hu − v, T ui ≤ lim infhun − v, T (un)i + lim infhu − un − u, T (un)i = lim infhu − v, T (un)i

(3) T (un) → T (u) weakly. In fact, from(2)

hv, T (u)i ≤ lim infhv, T (un)i ≤ lim suphv, T (un)i

= lim suphun − u, T (un)i + lim suphu + v − un,T (un)i

= − lim infhun − (u + v),T (un)i ≤ hv, T (u)i.

(4) hun − u, T (un) − T (u)i → 0 By the hypothesis

0 ≤ lim infhun − u, T (un)i = lim infhun − u, T (un)i − lim infhun − u, T (u)i

= lim infhun − u, T (un) − T (u)i ≤ lim suphun − u, T (un) − T (u)i

= lim suphun − u, T (un) − T (u)i ≤ 0

(5) hun,T (un)i → hu, T (u)i. Since

hun,T (un)i = hun − u, T (un) − ui − hu, T (un)i − hun,T (u)i + hun,T (un)i the claim follows from (3)–(4). (6) Conversely, if T (un) → T (u) weakly and hun,T (un)i → hu, T (u)i → 0, then the hypothesis holds. Using a very similar proof to that of the Browder-Minty theorem, one can show the following. Theorem (Bresiz) Assume, the operator A : X → X∗ is pseudo-monotone, bounded and coercive on the real separable and reflexive Banach space X. Then, for each f ∈ X∗ the equation A(u) = f has a solution.

7.3 Generalized Pseudo-Monotone Mapping In this section we discuss pseudo-monotone mappings. We extend the notion of the maximal monotone mapping, i.e., the maximal monotone mapping is a generalized pseudo-monotone mappings. Definition (Pseudo-monotone Mapping) (a) The set T (u) is nonempty, bounded, closed and convex for all u ∈ X. (b) T is upper semicontinuous from each finite-dimensional subspace F of X to the on X∗, i.e., to a given element u+ ∈ F and a weak neighborhood V ∗ of T (u), in X there exists a neighborhood U of u0 in F such that T (u) ⊂ V for all u ∈ U. (c) If {un} is a sequence in X that converges weakly tou ∈ X and if wn ∈ T (un) is such that lim suphun − u, wni ≤ 0, then to each element v ∈ X there exists w ∈ T (u) with the property that

lim infhun − v, wni ≥ hu − v, wi.

160 Definition (Generalized Pseudo-monotone Mapping) A mapping T from X into X∗ is said to be generalized pseudo-monotone if the following is satisfied. For any se- ∗ quence {un} in X and a corresponding sequence {wn} in X with wn ∈ T (xn), if un converges weakly to u and wn weakly to w such that

lim suphun − u, wn)i ≤ 0, n→∞ then w ∈ T (u) and hun, wni → hu, wi. Let X be a reflexive Banach space, T a pseudo-monotone mapping from X into X∗. Then T is generalized pseudo-monotone. Conversely, suppose T is a bounded generalized pseudo-monotone mapping from X into X∗. and assume that for each u ∈ X, T u is a nonempty closed convex subset of X∗. Then T is pseudo-monotone. PROPOSITION 2. A maximal monotone mapping T from the Banach space X into X∗ is generalized pseudo-monotone.

Proof: Let {un} be a sequence in X that converges weakly to u ∈ X, {wn} a se- ∗ ∗ quence in X with wn ∈ T (un) that converges weakly to w ∈ X . Suppose that lim suphun − u, wni ≤ 0 Let [x, y] ∈ T be an arbitrary element of the graph G(T ). By the monotonicity of T , hun − x, wn − yi ≥ 0 Since hun, wni = hun − x, wn − yi + hx, wni + hun, yi − hx, yi, where hx, wni + hun, yi → hx, wi + hu, yi − hx, yi, we have hu, wi ≤ lim suphun, wni ≥ hx, wi + hu, yi − hx, yi, i.e., hu − x, w − yi ≥ 0 Since T is maximal, w ∈ T (u). Consequently,

hun − u, wn − wi ≥ 0.

Thus, lim infhun, wni ≥ lim inf(hun, wihu, wni + hu, wi = hu, wi,

It follows that limhun, wni → hu, wi.  DEFINITION 5 (1) A mapping T from X into X∗ is coercive if there exists a function + c : R+ → R with limr∞ c(r) = ∞ such that

hu, wi ≥ c(|u|)|u| for all [u, w] ∈ G(T ).

(1) An operator T from X into X∗ is called smooth if it is bounded, coercive, maximal monotone, and has effective domain D(T ) = X. (2) Let T be a generalized pseudo-monotone mapping from X into X∗. Then T is said ∗ to be regular if R(T + T2) = X for each smooth operator T2.

161 (3) T is said to be quasi-bounded if for each M > 0 there exists K(M) > 0 such that whenever [u, w] ∈ G(T ) of T and

hu, wi ≤ M |u|, |u| ≤ M, then |w| ≤ K(M). T is said to be strongly quasi-bounded if for each M > 0 there exists K(M) > 0 such that whenever [u, w] ∈ G(T ) of T and

hu, wi ≤ M |u| ≤ M, then |w| ≤ K(M).

Theorem 1 Let T = T1 + T0, where T1 is maximal monotone with 0 ∈ D(T ), while T0 is a regular generalized pseudo-monotone mapping such that for a given constant k, hu, wik |u| for all [u, w] ∈ G(T ). If either T0 is quasi-bounded or T1 is strongly quasi-bounded, then T is regular. Theorem 2 (Browder). Let X be a reflexive Banach space and T be a generalized pseudo-monotone mapping from X into X∗. Suppose that R( F + T ) = X∗ for  > 0 and T is coercive. Then R(T ) = X∗. Lemma 1 Let T be a generalized pseudo-monotone mapping from the reflexive Banach space X into X∗, and let C be a bounded weakly closed subset of X. Then T (C) = {w ∈ X∗ : w ∈ T (u)for some u ∈ C} is closed in the strong topology of X∗. ∗ Proof: Let Let {wn} be a sequence in T (C) converging strongly to some w ∈ X . For each n, there exists un ∈ C such that wn ∈ T (un). Since C is bounded and weakly closed, without loss of the generality we assume un → u ∈ C, weakly in X. It follows that limhun − u, wni = 0. The generalized pseudo-monotonicity of T implies that w ∈ T (u), i.e., w lies in T (C).  Proof of Theorem 2: Let J denote the normalized Let F denote the duality mapping ∗ of X into X . It is known that F is a smooth operator. Let w0 be a given element of ∗ X . We wish to show that w0 ∈ R(T ). For each  > 0, it follows from the regularity of T that there exists an element u ∈ X such that

w0 −  p ∈ T (u) for some p ∈ F (u).

2 Let w = w0 −  p. Since T is coercive and hu, wi = |u| , we have for some k

2  |u| ≤ k |u| + |u||w0|.

Hence  |u| =  |p| ≤ |w0|. and |w| ≤ |w0| +  |p| ≤ k + 2|w0|

Since T is coercive, there exits M > 0 such that |u| ≤ M and thus

+ |w − w0| ≤  |p| =  |u| → 0 as  → 0

It thus follows from Lemma 1 that T (B(0,M)) is closed and thus w0 ∈ T (B(0,M)), i.e., w0 ∈ R(T ). 

162 THEOREM 3 Let X be a reflexive Banach space and T a pseudo-monotone mapping from X into X∗. Suppose that T is coercive, R(T ) = X∗. PROPOSITION 10. Let F be a finite-dimensional Banach space, a mapping from F into F ∗ such that for each u ∈ F , T(u) is a nonempty bounded closed convex subset of F ∗. Suppose that T is coercive and upper semicontinuous from F to F ∗. Then R(T ) = F ∗. ∗ ∗ Proof: Since for each w0 ∈ F define the mapping Tw0 : F → F by

Tw0 (u) = T (u) − w0 satisfies the same hypotheses as the original mapping T . It thus suffices to prove that 0 ∈ R(T ). Suppose that 0 does not lie in R(T ). Then for each u ∈ F , there exists v(u) ∈ F such that inf hv(u), wi > 0 w∈T (u) Since T is coercive, there exists a positive constant R such that c(R) > 0, and hence for each u ∈ F with |u| = R and each w ∈ T (u), hu, wi ≥ λ > 0 where λ = c(R)R, For such points u ∈ F , we may take v(u) = u. For a given nonzero v0 in F let

Wv0 = {u ∈ F : inf hv0, wi > 0}. w∈T (u)

The family {Wv0 : v0 ∈ F, v0 6= 0} forms a covering of the space F . By the upper ∗ semicontinuity of T , from F to F each Wv0 is open in F . Hence the family {Wv0 : v0 ∈ F, v0 6= 0} forms an open covering of F . We now choose a finite open covering {V1, ··· ,Vm} of the closed ball B(0,R) in F of radius R about the origin, with the property that for each k there exists an element vk in F such that Vk ⊂ Wvk and the additional condition that if Vk intersects the boundary sphere S(0,R), then vk is a R point of Vk ∩ S(0,R) and the diameter of such Vk is less than 2 .. We next choose a partition of unity {α1, ··· , αm} subordinated to the covering {V1, ··· ,Vm} where each αk is a continuous mapping of F into [0,I] which vanishes outside the corresponding P set Vk, and with the property that k αk(x) = 1 for each x ∈ B(0,R). Using this partition of unity, we may define a continuous mapping f of B(0,R) into F by setting X f(x) = α(x)vk. k

We note that for each k for which alphak(x) > 0 and for any winT0x, we have hvk, wi > 0 since x must lie in Vk, which is a subset of Wk. Hence for each x ∈ B(0,R) and any w ∈ T x 0 X hf(x), wi = α(x)hvk, wi > 0 k Consequently, f(x) 6= 0 for any x ∈ B(0,R). This implies that the degree of the mapping f on B(0,R) with respect to 0 equals 0. For x ∈ S(0,R), on the other hand, f(x) is a convex linear combination of points vk ∈ S(0,R), each of which is at distance R at most 2 from the point x. We infer that R |f(x) − x| ≤ , x ∈ S(0,R). 2

163 By this inequality, f considered as a mapping from B(0,R) into F is homotopic to the identity mapping. Hence the degree of f on B(0,R) with respect to 0 is 1. This contradiction proves that 0 must lie in T0u for some u ∈ B(0,R).  Proof of Theorem 3 Let Λ be the family of all finite-dimensional subspaces F of X, ordered by inclusion. For F ∈ Λ, let jF : F → X denote the inclusion mapping of F ∗ ∗ ∗ ∗ ∗ into X, and jF : X → F the dual projection mapping of X onto F . The operator

∗ TF = jF T jF

∗ ∗ then maps F into F . For each u ∈ F , TF u is a weakly compact convex subset of X ∗ ∗ ∗ and jF is continuous from the weak topology on X to the (unique) topology on F . ∗ Hence TF u is a nonempty closed convex subset of F . Since T is upper semicontinuous ∗ ∗ from F to F with X given its weak topology, TF is upper semicontinuous from F ∗ to F . By Proposition 10, to each F ∈ Λ there exists an element uF ∈ F such that ∗ jF w0 = wF for some wF ∈ TF uF . The coerciveness of T implies that wF ∈ TF uF , i.e.

∗ huF , jF w0i = huF , wF i ≥ c(|uF |)|uF |.

Consequently, the elements {|uF |} are uniformly bounded by M for all F ∈ Λ. For F ∈ Λ, let [ VF = {uF 0 }. F 0∈Λ,F 0⊂F

Then the set VF is contained in the closed ball B(0,M) in X Since B(0,M) is weakly compact, and since the family {weak closure(VF )} has the finite intersection property, the intersection ∩F ∈Λ{weak closure(VF )} is not empty. Let u0 be an element contained in this intersection.

The proof will be complete if we show that 0 ∈ Tw0 u0. Let v ∈ X be arbitrarily given. We choose F ∈ Λ such that it contains u0, v ∈ F Let {uFk }denote a sequence in V converging weakly to u . Since j∗ w = 0, we have F 0 Fk Fk

huFk − u0, wFk i = 0 for all k.

Consequently, by the pseudo-monotonicity of T , to given v ∈ X there exists w(v) ∈ T (u0) with

limhuFk − v, wFk i ≥ hu0 − v, wi. (7.5)

Suppose now that 0 ∈/ T (u0). Then 0 can be separated from the nonempty closed convex set T (u0), i.e., there exists an element x = u − v ∈ X such that

inf hu0 − v, zi > 0. z∈T (u0)

But this is a contradiction to (7.5).  Navier-Stokes equations Example Consider the constrained minimization

min F (y) subject to E(y) = 0. where E(y) = E0y + f(y).

164 The necessary optimality condition for x = (y, p)

 ∂F − E0(y)∗p  A(y, p) ∈ . E(y)

Let  ∗  ∂F − E0 p A0(y, p) ∈ , E0y and  −f 0(y)p  A (y, p) = . 1 f(y)

7.4 Fixed Point Theorems In this section we discuss the fixed point theorem. Banach Fixed Point Theorem. Let (X, d) be a non-empty complete metric space with a contraction mapping T : X → X. Then T admits a unique fixed-point x∗ in X ∗ (i.e. T (x∗) = x∗). Moreover, x is a fixed point of the fixed point iterate xn = T (xn−1) with arbitrary initial condition x0. Proof: d(xn+1, xn) = d(T (xn),T (xn−1) ≤ ρ d(xn, xn−1. By the induction n d(xn+1, xn) ≤ ρ d(x1, x0) and ρn−1 d(x , x ) ≤ d(x , x ) → 0 m n 1 − ρ 1 0 ∗ as m ≥ n → ∞. Thus, {xn} is a Cauchy sequence and {xn} converges a fixed point x ∗ ∗ of T . Suppose x1, x2 are two fixed point. Then,

∗ ∗ ∗ ∗ ∗ ∗ d(x1, x2) = d(T (x1),T (x2) ≤ ρ d(x1, x2)

∗ ∗ ∗ ∗ Since ρ < 1, d(x1, x2) = 0 and thus x1 = x2.  Brouwer fixed point theorem: is a fundamental result in topology which proves the existence of fixed points for continuous functions defined on compact, convex subsets of Euclidean spaces. Kakutani’s theorem extends this to set-valued functions. Kakutani’s theorem states: Let S be a non-empty, compact and convex subset of some Euclidean space Rn. Let T : S → S be a set-valued function on S with a closed graph and the property that T (x) is non-empty and convex for all x ∈ S. Then T has a fixed point. The Schauder fixed point theorem is an extension of the Brouwer fixed point theorem to topological vector spaces. Schauder fixed point theorem Let X be a locally convex , and let K ⊂ Xbe a compact, and convex set. Then given any continuous mapping T : K → K has a fixed point.

165 Proof: Given  > 0 there exists the family of open balls {B(x): x ∈ K} is open covering of K. Since K is compact, there exists a finite open sub-cover, i.e. there exist finite many points {xk} in K such that

n [ K = B(xk). k=1

Define the functions {gk} by gk(x) = max(0,  − |x − xk|). It is clear that each gk is P continuous, gk(x) ≥ 0 and k gk(x) > 1 for all x ∈ K. Thus, if we define a function on K by P k gk(x) xk g(x) = P k gk(x) n and then g is a continuous function from K to the convex hull K0 of points {xk}k=1. It is easily shown that |g(x) − x| ≤  for x ∈ K. Then g maps K0 to K0. Since K0 is compact convex subset of a finite dimensional vector space, we can apply the Brouwer fixed point theorem to assure the existence of z ∈ K0 such that z = g(z). We have the estimate |z − T (z)| = |g(T (z)) − T (z)| ≤ .

Since K is compact, the exits a subsequence n → 0 such that xn → x0 ∈ K and thus

|xn − T (xn )| ≤ |g(T (xn )) − T (xn )| ≤ n.

Since T is continuous, T (x0) = x0.  Schaefer’s fixed point theorem: Let T be a continuous and compact mapping of a Banach space X into itself, such that the set

{x ∈ X : x = λT x for some 0 ≤ λ ≤ 1} is bounded. Then T has a fixed point. Proof: Let |x| ≤ M for {x ∈ X : x = λT x for some 0 ≤ λ ≤ 1}. Define

MT (x) T˜(x) = max(M, |T (x)|)

Note that T˜ : B(0,M) → B(0,M).. Let K be the closed convex hull go T˜(B(0,M)). Since T is compact, K is a compact convex subset of X. Since TK˜ → K, it follows from Shauder’s fixed point theorem that there exits x ∈ K such that x = T˜(x). We now claim that x is a fixed point of T . If not, we should have |T (x)| > M and

M x = λ T (u) for λ = < 1 |T (x)|

˜ But, since |x| = |T (x)| ≤ M, which is a contradiction.  The advantage of Schaefer’s theorem over Schauder’s for applications is that it is not necessary to determine an explicit convex, compact set K. Krasnoselskii theorem The sum of two operators T = A + B has a fixed point in a nonempty closed convex subset C of a real Banach space X under conditions such

166 that T (C) ⊂ C, A is continuous on C, A(C) is a relatively compact subset of X, and B is a strict contraction on C. Note that the proof of Kransnoselskiis fixed point theorem combines the Banach contraction theorem and Schauders fixed point theorem, i.e., y ∈ C, Ay + Bx = x has the fixed point map x = ψ(y) and use Schauders fixed point theorem for ψ. Kakutani-Glicksberg-Fan theorem Let S be a non-empty, compact and convex subset of a locally convex topological vector space X. Let T : S → S be a Kakutani map, i.e., it is upper hemicontinuous, i.e., if for every open set W ⊂ X, the set {x ∈ X : T (x) ∈ W } is open in X and T (x) is non-empty, compact and convex for all x ∈ X. Then T has a fixed point. The corresponding result for single-valued functions is the Tychonoff fixed-point theorem.

8 Convex Analysis and Duality

This chapter is organized as follows. We present the basic convex analysis and du- ality theory for the optimization in Banach spaces. The subdifferential operator and monotone operator equation in Banach spaces are presented.

8.1 Convex Functional and Subdifferential Definition (Convex Functional) (1) A proper convex functional on a Banach space X is a function ϕ from X to (−∞, ∞], not identically +∞ such that

ϕ((1 − λ) x1 + λ x2) ≤ (1 − λ) ϕ(x1) + λ ϕ(x2) for all x1, x2 ∈ X and 0 ≤ λ ≤ 1. (2) A functional ϕ : X → R is said to be lower-semicontinuous if

ϕ(x) ≤ lim inf ϕ(y) for all x ∈ X. y→x

(3) A functional ϕ : X → R is said to be weakly lower-semicontinuous if

ϕ(x) ≤ lim inf ϕ(xn) n→∞ for all weakly convergent sequence {xn} to x. (4) The subset D(ϕ) = {x ∈ X; ϕ(x) < ∞} of X is called the domain of ϕ. (5) The epigraph of ϕ is defined by epi(ϕ) = {(x, c) ∈ X × R : ϕ(x) ≤ c}. Lemma 3 A convex functional ϕ is lower-semicontinuous if and only if it is weakly lower-semicontinuous on X. Proof: Since the level set {x ∈ X : ϕ(x) ≤ c} is a closed convex subset if ϕ is lower- semicontinuous. Thus, the claim follows the fact that a convex subset of X is closed if and only if it is weakly closed. Lemma 4 If ϕ be a proper lower-semicontinuous, convex functional on X, then ϕ is bounded below by an affine functional, i.e., there exist x∗ ∈ X∗ and c ∈ R such that

ϕ(x) ≥ hx∗, xi + β, x ∈ X.

167 Proof: Let x0 ∈ X and β ∈ R be such that ϕ(x0) > c. Since ϕ is lower-semicontinuous on X, there exists an open neighborhood V (x0) of X0 such that ϕ(x) > c for all x ∈ V (x0). Since the ephigraph epi(ϕ) is a closed convex subset of the product space X ×R. It follows from the separation theorem for convex sets that there exists a closed hyperplane H ⊂ X × R;

∗ ∗ ∗ H = {(x, r) ∈ X × R : hx0, xi + r = α} with x0 ∈ X , α ∈ R, that separates epi(ϕ) and V (x0) × (−∞, c). Since {x0} × (−∞, c) ⊂ {(x, r) ∈ X × R : ∗ hx0, xi + r < α} it follows that

∗ hx0, xi + r > α for all (x, c) ∈ epi(ϕ) which yields the desired estimate. Theorem C.6 If F : X → (−∞, ∞] is convex and bounded on an open set U, then F is continuous on U. Proof: We choose M ∈ R such that F (x) ≤ M −1 for all x ∈ U. Letx ˆ be any element in U. Since U is open there exists a δ > 0 such that the open ball {x ∈ X : |x−xˆ| < δ is  contained in U. For any epsilon ∈ (0, 1), let θ = . Then for x ∈ X satisfying M − F (ˆx) |x − xˆ| < θ δ x − xˆ |x − xˆ| | +x ˆ − xˆ| = < δ θ θ x − xˆ Hence +x ˆ ∈ U. By the convexity of F θ x − xˆ F (x) ≤ (1 − θ)F (ˆx) + θ F ( +x ˆ) ≤ (1 − θ)F (ˆx) + θ M θ and thus F (x) − F (ˆx) < θ(M − F (ˆx) =  xˆ − x Similarly, +x ˆ ∈ U and θ θ xˆ − x 1 θM 1 F (ˆx) ≤ F ( +x ˆ) + F (x) < + F (x) 1 + θ θ 1 + θ 1 + θ 1 + θ which implies F (x) − F (ˆx) > −θ(M − F (ˆx) = −

Therefore |F (x) − F (¯x)| <  if |x − xˆ| < θδ and F is continuous in U.  Definition (Subdifferential) Given a proper convex functional ϕ on a Banach space X the subdifferential of ∂ϕ(x) is a subset in X∗, defined by

∂ϕ(x) = {x∗ ∈ X∗ : ϕ(y) − ϕ(x) ≥ hx∗, y − xi for all y ∈ X}.

∗ ∗ Since for x1 ∈ ∂ϕ(x1) and x2 ∈ ∂ϕ(x2),

∗ ϕ(x1) − ϕ(x2) ≤ hx2, x1 − x2i

∗ ϕ(x2) − ϕ(x1) ≤ hx1, x2 − x1i

168 ∗ ∗ it follows that hx1 − x2, x1 − x2i ≥ 0. Hence ∂ϕ is a monotone operator from X into X∗. Example 1 Let ϕ be Gateaux differentiable at x. i.e., there exists w∗ ∈ X∗ such that

ϕ(x + t v) − ϕ(x) lim = hw∗, hi for all h ∈ X t→0+ t and w∗ is the Gateaux differential of ϕ at x and is denoted by ϕ0(x). If ϕ is convex, then ϕ is subdifferentiable at x and ∂ϕ(x) = {ϕ0(x)}. Indeed, for v = y − x

ϕ(x + t (y − x)) − ϕ(x) ≤ ϕ(y) − ϕ(x), 0 < t < 1 t Letting t → 0+ we have

ϕ(y) − ϕ(x) ≥ hϕ0(x), y − xi for all y ∈ X, and thus ϕ0(x) ∈ ∂ϕ(x). On the other hand if w∗ ∈ ∂ϕ(x) we have for y ∈ X and t > 0

ϕ(x + t y) − ϕ(x) ≥ hw∗, yi. t Taking limit t → 0+, we obtain

hϕ0(x) − w∗, yi ≥ 0 for all y ∈ X.

This implies w∗ = ϕ0(x). 1 2 Example 2 If ϕ(x) = 2 |x| then we will show that ∂ϕ(x) = F (x), the duality map- ping. In fact, if x∗ ∈ F (x), then 1 hx∗, x − yi = |x|2 − hy, x∗i ≥ (|x|2 − |y|2) for all y ∈ X. 2 Thus x∗ ∈ ∂ϕ(x). Conversely, if x∗ ∈ ∂ϕ(x), then 1 (|y|2 − |x|2) ≥ hx∗, y − xi for all y ∈ X (8.1) 2 We let y = t x, 0 < t < 1 and obtain 1 + t |x|2 ≤ hx, x∗i 2 and thus |x|2 ≤ hx, x∗i. Similarly, if t > 1, then we conclude |x|2 ≥ hx, x∗i and therefore |x|2 = hx, x∗i and |x∗| ≥ |x|. On the other hand, letting y = x + λ u, λ > 0 in (8.1), we have 1 λ hx∗, ui ≤ (|x + λ u|2 − |x|2) ≤ λ |u||x| + λ |u|2, 2 which implies hx∗, ui ≤ |u||x|. Hence |x∗| ≤ |x| and we obtain |x|2 = |x∗|2 = hx∗, xi.

Example 3 Let K be a closed convex subset of X and IK be the indicator function of K, i.e.,  0 if x ∈ K I (x) = K ∞ otherwise.

169 Obviously, IK is convex and lower-semicontinuous on X. By definition we have for x ∈ K ∗ ∗ ∗ ∂IK (x) = {x ∈ X : hx , x − yi ≥ 0 for all y ∈ K}

Thus D(IK ) = D(∂IK ) = K and ∂K (x) = {0} for each interior point of K. Moreover, if x lies on the boundary of K, then ∂IK (x) coincides with the cone of normals to K at x. Note that ∂F (x) is closed and convex and may be empty. Theorem C.10 If a convex function F is continuous atx ¯ then ∂F (¯x) is non empty.

Proof: Since F is continuous at x for any  > 0 there exists a neighborhood U ofx ¯ such that F (x) ≤ F (¯x) + , x ∈ U.

Then U × (F (¯x) + , ∞) is an open set in X × R and is contained in epi F . Hence (epi F )o is non empty. Since F is convex epi F is convex and (epi F )o is convex. For any neighborhood of O of (¯x, F (¯x)) there exists a t < 1 such that (¯x, tF (¯x)) ∈ O. But, tF (¯x) < F (¯x) and so (¯x, tF (¯x)) ∈/ epi F . Thus (¯x, F (¯x)) ∈/ (epi F )o. By the Hahn Banach separation theorem, there exists a closed hyperplane S = {(x, a) ∈ X × R : hx∗, xi + α a = β} for nontrivial (x∗, α) ∈ X∗ × R and β ∈ R such that hx∗, xi + α a > β for all (x, a) ∈ (epi F )o (8.2) hx∗, x¯i + α F (¯x) = β.

Since (epi F )o = epi F every neighborhood of (x, a) ∈ epi F contains an element of (epi ϕ)o. Suppose hx∗, xi + α a < β. Then {(x0, a0) ∈ X × R : hx∗, x0i + α a0 < β} is an neighborhood of (x, a) and contains an element of (epi F )o, which contradicts to (8.2). Hence hx∗, xi + α a ≥ β for all (x, a) ∈ epi F. (8.3)

Suppose α = 0. For any u ∈ U there is an a ∈ R such that F (u) ≤ a. Then from (8.3) hx∗, ui = hx∗, ui + α a ≥ β and thus ∗ hx , u − x¯i ≥ 0 for all u ∈ U. Choose a δ such that |u − x¯| ≤ δ implies u ∈ U. For any nonzero element x ∈ X let δ t = |x| . Then |(tx +x ¯) − x¯| = |tx| = δ so that tx +x ¯ ∈ U. Hence hx∗, xi = hx∗, (tx +x ¯) − x¯i/t ≥ 0.

Similarly, −t x +x ¯ ∈ U and hx∗, xi = hx∗, (−tx +x ¯) − x¯i/(−t) ≤ 0. Thus, hx∗, xi and x∗ = 0, which is a contradiction. Therefore α is nonzero. It now follows from (8.2)–(8.3) that x∗ h− , x − x¯i + F (¯x) ≤ F (x) α

170 x∗ for all x ∈ X and thus − α ∈ ∂F (¯x).  Definition (Conjugate function) Given a proper functional ϕ on a Banach space X, the conjugate functional on X∗ is defined by

ϕ∗(x∗) = sup{hx∗, xi − ϕ(x)}. x Examples (Conjugate functions) (1) If ϕ(x) = ha, xi − b is a bounded afine functional, ∗ ∗ then ϕ (x ) = Ix∗=a. 1 p ∗ ) 1 ∗ p (2) If ϕ(x) = p |x| , then ϕ (x = q |x | . ∗ ∗ (3) If ϕ(x) = |x|, then ϕ (x ) = I|x∗|≤1. Theorem 2.1 The conjugate function ϕ∗ is convex, lower semi-continuous and proper.

∗ ∗ ∗ Proof: For 0 < t < 1 and x1, x2 ∈ X

∗ ∗ ∗ ∗ ∗ ϕ (t x1 + (1 − t) x2) = supx{ht x1 + (1 − t) x2, xi − ϕ(x)}

∗ ∗ ≥ t supx{hx1, xi − ϕ(x)} + (1 − t) supx{hx2, xi − ϕ(x)}

∗ ∗ ∗ ∗ = t ϕ (x1) + (1 − t) ϕ (x2). Thus, ϕ∗ is convex. For  > 0 arbitrary let y ∈ X be

ϕ∗(x∗) −  ≤ hx∗, yi − ϕ(y)

∗ For xn → x ∗ ∗ ∗ ϕ (xn) ≥ hxn, yi − ϕ(y) and lim inf ϕ∗(x∗ ) ≥ hx∗, yi − ϕ(y) n→∞ n ∗ Since  > arbitrary, ϕ is lower semi-continuous.  Definition (Bi-Conjugate Function) For any proper functional ϕ on X the bi- conjugate function of ϕ is ϕ∗∗ : X → (−∞, ∞] is defined by

ϕ∗∗(x) = sup {hx∗, xi − ϕ∗(x∗)}. x∗∈X∗

Theorem C.3 For any F : X → (−∞, ∞] F ∗∗ is convex and l.s.c. and F ∗∗ ≤ F . Proof: Note that F ∗∗(¯x) = sup {hx∗, x¯i − F ∗(x∗)}. x∗∈D(F ∗) Since for x∗ ∈ D(F ∗) hx∗, xi − F ∗(x∗) ≤ F (x) ∗∗ for all x, F (¯x) ≤ F (¯x).  Theorem If F is a proper l.s.c. convex function, then F = F ∗∗. If X is reflexive, then (F ∗)∗ = F ∗∗.

171 Proof: By Theorem if F is a proper l.s.c. convex function, then F is bounded below by an affine functional hx∗, xi − a, i.e. a ≥ hx∗, xi − F (x) for all x ∈ X. Define

a(x∗) = inf{a ∈ R : a ≥ hx∗, xi − F (x) for all x ∈ X}

Then for a ≥ a(x∗) hx∗, xi − a ≤ hx∗, xi − a(x∗) for all x ∈ X. By Theorem again

F (¯x) = sup {hx∗, x¯i − a(x∗)}. x∗∈X∗ Since a(x∗) = sup{hx∗, xi − F (x)} = F ∗(x) x∈X we have ∗ ∗ ∗ F (x) = sup {hx , xi − F (x )}. x∗∈X∗ We have the duality property. Theorem 2.2 Let F be a proper convex function on X. (1) For F : X → (−∞, ∞] x∗ ∈ ∂F (¯x) is if and only if

F (¯x) + F ∗(x∗) = hx∗, x¯i.

(2) Assume X is reflexive. If x∗ ∈ ∂F (¯x), thenx ¯ ∈ ∂F ∗(x∗). If F is convex and lower semi-continuous, then x∗ ∈ ∂F (¯x) if and only ifx ¯ ∈ ∂F ∗(x∗). Proof: (1) Note that x∗ ∈ ∂F (¯x) is if and only if

hx∗, xi − F (x) ≤ hx∗, x¯i − F (¯x) (8.4) for all x ∈ X. By the definition of F ∗ this implies F ∗(x∗) = hx∗, x¯i−F (¯x). Conversely, if F (x∗) + F (¯x) = hx∗, x¯i then (8.4) holds for all x ∈ X. (2) Since F ∗∗ ≤ F by Theorem C.3, it follows from (1)

F ∗∗(¯x) ≤ hx∗, x¯i − F ∗(x∗)

But by the definition of F ∗∗

F ∗∗(¯x) ≥ hx∗, x¯i − F ∗(x∗).

Hence F ∗(x∗) + F ∗∗(¯x) = hx∗, x¯i. Since X is reflexive, F ∗∗ = (F ∗)∗. If we apply (1) to F ∗ we havex ¯ ∈ ∂F ∗(x∗). In addition if F is convex and l.s.c, it follows from Theorem C.1 that F ∗∗ = F . Thus if x∗ ∈ ∂F ∗(¯x) then

F (¯x) + F ∗(x∗) = F ∗(x∗) + F ∗∗(x) = hx∗, x¯i

∗ ∗ by applying (1) for F . Therefore x ∈ ∂F (¯x) again by (1). 

172 Theorem(Rockafellar) Let X be real Banach space. If ϕ is lower-semicontinuous proper convex functional on X, then ∂ϕ is a maximal monotone operator from X into X∗. Proof: We prove the theorem when X is reflexive. By Apuland theorem we can assume that X and X∗ are strictly convex. By Minty-Browder theorem ∂ϕ it suffices to prove ∗ ∗ ∗ ∗ that R(F + ∂ϕ) = X . For x0 ∈ X we must show that equation x0 ∈ F x + ∂ϕ(x) has at least a solution x0 Define the proper convex functional on X by 1 f(x) = |x|2 + ϕ(x) − hx∗, xi. 2 X 0

Since f is lower-semicontinuous and f(x) → ∞ as |x| → ∞ there exists x0 ∈ D(f) such that f(x0) ≤ f(x) for all x ∈ X. Since F is monotone

∗ ϕ(x) − ϕ(x0) ≥ hx0, x − x0, i − hx − x0,F (x)i.

Setting xt = x0 + t (u − x0) and since ϕ is convex, we have 1 ϕ(u) − ϕ(x ) ≥ (ϕ(x ) − ϕ(x )) ≥ hx∗, u − x , i − hF (x ), u − x i. 0 t t 0 0 0 t 0 Taking limit t → 0+, we obtain

∗ ϕ(u) − ϕ(x0) ≥ hx0, u − x0i − hF (x0), u − x0i,

∗ which implies x0 − F (x0) ∈ ∂ϕ(x0).  We have the perturbation result. Theorem Assume that X is a real Hilbert space and that A is a maximal monotone operator on X. Let ϕ be a proper, convex and lower semi-continuous functional on X satisfying dom(A) ∩ dom(∂ϕ) is not empty and

ϕ((I + λ A)−1x) ≤ ϕ(x) + λ M, for all λ > 0, x ∈ D(ϕ), where M is some non-negative constant. Then the operator A + ∂ϕ is maximal mono- tone. We use the following lemma. Lemma Let A and B be m-dissipative operators on X. Then for every y ∈ X the equation y ∈ −Ax − Bλx (8.5) has a unique solution x ∈ dom (A).

Proof: Equation (10.15) is equivalent to y = xλ − wλ − Bλxλ for some wλ ∈ A(xλ). Thus, λ λ 1 x − w = y + (x + λ B x ) λ λ + 1 λ λ + 1 λ + 1 λ λ λ

λ 1 = y + (I − λ B)−1. λ + 1 λ + 1

173 Since A is m-dissipative, we conclude that (10.15) is equivalent to that xλ is the fixed point of the operator λ λ 1 F x = (I − A)−1( y + (I − λ B)−1x). λ λ + 1 λ + 1 λ + 1 By m-dissipativity of the operators A and B their resolvents are contractions on X and thus λ |F x − F x | ≤ |x − x | for all λ > 0, x , x ∈ X. λ 1 λ 2 λ + 1 1 2 1 2 Hence, Fλ has the unique fixed point xλ and xλ ∈ dom (A) solves (10.15). 

Proof of Theorem: From Lemma there exists xλ for y ∈ X such that

y ∈ xλ − (−A)λxλ + ∂ϕ(xλ)

Moreover, one can show that |xλ| is bounded uniformly. Since

y − xλ + (−A)λxλ ∈ ∂ϕ(xλ) for z ∈ X ϕ(z) − ϕ(xλ) ≥ (z − xλ, y − xλ + (−A)λxλ) −1 Letting λ(I + λA) x, so that z − xλ = λ(−A)λxλ and we obtain

−1 (λ(−A)λxλ, y − xλ + (−A)λxλ) ≤ ϕ((I + λ A) ) − ϕ(xλ) ≤ λ M, and thus 2 |(−A)λxλ| ≤ |(−A)λxλ||y − xλ| + M.

Since xλ| is bounded and so that |(−A)λxλ|. 

8.2 Duality Theory In this section we discuss the duality theory. We call

(P ) inf F (x) x∈X is the primal problem, where X is a Banach space and F : X → (−∞, ∞] is a proper l.s.c. convex function. We have the following result for the existence of minimizer. Theorem 2 Let X be reflexive and F be a lower semicontinuous proper convex func- tional defined on X. Suppose F

lim F (x) = ∞ as |x| → ∞. (8.6)

Then there exists anx ¯ ∈ X such that

F (¯x) = inf {F (x); x ∈ X}.

Proof: Let η = inf{F (x); x ∈ X} and let {xn} be a minimizing sequence such that lim F (xn) = η. The condition (8.6) implies that {xn} is bounded in X. Since X is reflexive there exists a subsequence that converges weakly tox ¯ in X and it follows from Lemma 1 that F (¯x) = η. 

174 We imbed (P ) into a family of perturbed problem

(Py) inf Φ(x, y) x∈X where y ∈ Y , a Banach space is an imbedding variable and Φ : X × Y → (−∞, ∞] is a proper l.s.c. convex function with Φ(x, 0) = F (x). Thus (P0) = (P ). For example in terms of (13.1) we let F (x) = f(x) + ϕ(Λx) (8.7) and with Y = H Φ(x, y) = f(x) + ϕ(Λx + y) (8.8) Definition The dual problem of (P ) with respect to Φ is

(P ∗) sup (−Φ∗(0, y∗)). y∗∈Y

The value function of (Py) is defined by

h(y) = inf Φ(x, y), y ∈ Y. x∈X

We assume that h(y) > −∞ for all y ∈ Y . Then we have Theorem D.0 inf (P ) ≥ sup (P ∗). Proof. For any (x, y) ∈ X × Y and (x∗, y∗) ∈ X∗ × Y ∗

Φ∗(x∗, y∗) ≥ hx∗, xi + hy∗, yi − Φ(x, y).

Thus, letting y = 0 and x∗ = 0

F (x) + Φ∗(0, y∗) ≥ h0, xi + hy∗, 0i = 0 for all x ∈ X and y∗ ∈ Y ∗. Therefore

∗ ∗ ∗ inf (P ) = inf F (x) ≥ sup (−Φ (0, y )) = sup (P ). y∗∈Y ∗

In what follows we establish the equivalence of the primal problem (P ) and the dual problem (P ∗). We start with the properties of h. Lemma D.1 h is convex.

Proof. The proof is by contradiction. Suppose there exist y1, y2 ∈ Y and θ ∈ (0, 1) such that h(θ y1 + (1 − θ) y2) > θ h(y1) + (1 − θ) h(y2). Then there exist c and  > 0 such that

h(θ y1 + (1 − θ) y2) > c > c −  > θ h(y1) + (1 − θ) h(y2).

 Let a1 = h(y1) + θ > h(y1) and c − θ a c −  − θ h(y ) a = 1 = 1 > h(y ). 2 1 − θ 1 − θ 2

175 By definition of h there exist x1, x2 ∈ X such that

h(y1) ≤ Φ(x1, y1) ≤ a1 and h(y2) ≤ Φ(x2, y2) ≤ a2.

Thus h(θ y1 + (1 − θ) y2) ≤ Φ(θ x1 + (1 − θ) x2, θ y1 + (1 − θ) y2)

≤ θ Φ(x1, y1) + (1 − θ) Φ(x2, y2) ≤ θ a1 + (1 − θ) a2 = c, which is a contradiction. Hence h is convex.  Lemma D.2 For all y∗ ∈ Y ∗, h∗(y∗) = Φ∗(0, y∗). Proof. ∗ ∗ ∗ ∗ h (y ) = supy∈Y (hy , yi − h(y)) = supy∈Y (hy , yi − infx∈X Φ(x, y))

= sup sup ((hy∗, yi − Φ(x, y)) = sup sup (h0, xi + hy∗, yi − Φ(x, y)) y∈Y x∈X y∈Y x∈X

∗ ∗ ∗ = sup(x,y)∈X×Y (h(0, y ), (x, y)i − Φ(x, y)) = Φ (0, y ). Note that h is not necessarily l.s.c.. Theorem D.3 If h is l.s.c. at 0, then inf (P ) = sup (P ∗).

Proof. Since F is proper, h(0) = infx∈X F (x) < ∞. Since h is convex by Lemma D.1, it follows from Theorem C.4 that h(0) = h∗∗(0). Thus

∗ ∗ ∗ ∗ ∗ sup (P ) = supy∗∈Y ∗ (−Φ(0, y )) = supy∗∈Y ∗ ((hy , 0i − h (y ))

∗∗ = h (0) = h(0) = inf (P ) Theorem D.4 If h is subdifferentiable at 0, then inf (P ) = sup (P ∗) and ∂h(0) is the set of solutions of (P ∗). Proof. By Lemma D.2y ¯∗ solves (P ∗) if and only if

∗ ∗ ∗ ∗ −h (¯y ) = −Φ(0, y¯ ) = supy∗∈Y ∗ (−Φ(0, y ))

∗ ∗ ∗ ∗∗ = supy∗ (hy , 0) − h (y )) = h (0). By Theorem 2.2 h∗∗(0) + h∗∗∗(¯y∗) = hy¯∗, 0) = 0 if and only ify ¯∗ ∈ ∂h∗∗(0). Since h∗∗∗ = h∗ by Theorem C.5, y∗ solves (P ∗) if and only if y∗ ∈ ∂h∗∗(y∗). Since ∂h(0) is not empty, ∂h(0) = ∂h∗∗(0). Therefore ∂h(0) is the set of all solutions of (P ∗) and (P ∗) has at least one solution. Let y∗ ∈ ∂h(0). Then hy∗, xi + h(0) ≤ h(x) for all x ∈ X. Let {xn} is a sequence in X such that xn → 0.

∗ lim inf h(xn) ≥ limhy , xni + h(0) = h(0) n→∞

∗ and h is l.s.c. at 0. By Theorem D.3 inf (P ) = sup (P ). 

176 Corollary D.6 If there exists anx ¯ ∈ X such that Φ(¯x, ·) is finite and continuous at 0, h is continuous on an open neighborhood U of 0 and h = h∗∗. Moreover,

inf (P ) = sup (P ∗) and ∂h(0) is the set of solutions of (P ∗). Proof. First show that h is continuous. Clearly, Φ(X,¯ ·) is bounded above on an open neighborhood U of 0. Since for all y ∈ Y

h(y) ≤ Φ(¯x, y), his bounded above on U. Since h is convex by Lemma D.1 and h is continuous by Theorem C.6. Hence h = h∗∗ by Theorem C. Now, h is subdifferential at 0 by Theorem C.10. The conclusion follows from Theorem D.4.  Example D.4 Consider the example (8.7)–(8.8), i.e.,

Φ(x, y) = f(x) + ϕ(Λx + y) (8.9) for (13.1). Let us calculate the conjugate of Φ.

∗ ∗ ∗ ∗ ∗ Φ (x , y ) = supx∈X supy∈Y {hx , xi + hy , yi − Φ(x, y)}

∗ ∗ = supx∈X supy∈Y {hx , xi + hy , yi − f(x) − ϕ(Λx + y)}

∗ ∗ = supx∈X {hx , xi − f(x) supy∈Y [hy , yi − ϕ(Λx + y)]} where

∗ ∗ ∗ supy∈Y [hy , yi − ϕ(Λx + y)] = supy∈Y [hy , Λx + yi − ϕ(Λx + y) − hy , Λxi]

∗ ∗ ∗ ∗ = supz∈Y [hy , zi − G(z)] − hy , Λxi = ϕ(y ) − hy , Λxi. Thus

∗ ∗ ∗ ∗ ∗ ∗ Φ (x , y ) = supx∈X {hx , xi − hy , Λxi − f(x) + ϕ(y )}

= sup{hx∗ − Λ∗y∗, xi − f(x) + ϕ(y∗)} = f ∗(x∗ − Λy∗) + ϕ∗(y∗). x∈X Theorem D.7 For anyx ¯ ∈ X andy ¯∗ ∈ Y ∗, the following statements are equivalent. (1)x ¯ solves (P ),y ¯∗ solves (P ∗), and min (P ) = max (P ∗). (2) Φ(¯x) + Φ∗(0, y¯∗) = 0. (3) (0, y¯∗) ∈ ∂Φ(¯x, 0). Proof: Suppose (1) holds, then obviously (2) holds. Suppose (2) holds, then since

Φ(¯x, 0) = F (¯x) ≥ inf (P ) ≥ sup (P ∗) ≥ −Φ(0, y¯∗) by Theorem D.2, thus

Φ(¯x, 0) = min (P ) = sup (P ∗) = −Φ∗(0, y¯∗)

177 ∗ Therefore (2) implies (1). Since h(0, y¯ ), (¯x, 0)i = 0, (2) and (3) by Theorem C.7.  Any solution y∗ of (P ∗) is called a Lagrange multiplier associated with Φ. In fact for the example (8.9) the optimality condition implies

0 = Φ(¯x, 0) + Φ∗(0, y¯∗) = f(¯x) + f ∗(−Λ∗y¯∗) + ϕ(Λ¯x) + ϕ(¯y∗) (8.10) = [f(¯x) + f ∗(−Λ∗y¯∗) − h−Λ∗y¯∗, x¯i] + [ϕ(Λ¯x) + ϕ∗(¯y∗) − hy¯∗, Λ¯x)]

Since each expression of (8.10) in square brackets is nonnegative, it follows that

f(¯x) + f ∗(−Λ∗y¯∗) − h−Λ∗y¯∗, x¯i = 0

ϕ(Λ¯x) + ϕ∗(¯y∗) − hy¯∗, Λ¯x) = 0

By Theorem 2.2 0 ∈ ∂f(¯x) + Λ∗y¯∗ (8.11) y¯∗ ∈ ∂ϕ(Λ¯x). The function L : X × Y ∗ → (−∞, ∞] defined by

L(x, y∗) = inf {Φ(x, y) − hy∗, yi} (8.12) y∈Y is called the Lagrangian. Note that

Φ∗(x∗, y∗) = sup {hx∗, xi + hy∗, yi − Φ(x, y)} x∈X, y∈Y

= sup {hx∗, xi + sup {hy∗, yi − Φ(x, y)}} = sup {hx∗, xi − L(x, y∗)} x∈X y∈Y x∈X

Thus, −Φ∗(0, y∗) = inf L(x, y∗). (8.13) x∈X Hence the dual problem (P ∗) is equivalently written as

sup inf L(x, y∗). y∗∈Y ∗ x∈X

Similarly, if Φ is a proper convex l.s.c. function, then the bi-conjugate of Φ in y, ∗∗ Φx = Φ and ∗∗ ∗ ∗ ∗ Φ(x, y) = Φx (y) = supy∗∈Y ∗ {hy , yi − Φx(y )}

= sup {hy∗, yi + L(x, y∗)}. y∗∈Y ∗ Hence Φ(x, 0) = sup L(x, y∗) (8.14) y∗∈Y ∗ and the primal problem (P ) is equivalently written as

inf sup L(x, y∗). x∈X y∗∈Y

178 By introducing the Lagrangian L, the Problems (P ) and (P ∗) are formulated as min- max problems, which arise in the game theory. By Theorem D.0

sup inf L(x, y∗) ≤ inf sup L(x, y∗). y∗ x x y∗

Theorem D.8 (Saddle Point) Assume Φ is a proper convex l.s.c. function. Then the followings are equivalent. (1) (¯x, y¯∗) ∈ X × Y ∗ is a saddle point of L, i.e.,

L(¯x, y∗) ≤ L(¯x, y¯∗) ≤ L(x, y¯∗) for all x ∈ X, y∗ ∈ Y ∗. (8.15)

(2)x ¯ solves (P ),y ¯∗ solves (P ∗), and min (P ) = max (P ∗). Proof: Suppose (1) holds. From (8.13) and (8.15)

L(¯x, y¯∗) = inf L(x, y¯∗) = −Φ∗(0, y¯∗) x∈X and form (8.14) and (8.15)

L(¯x, y¯∗) = sup L(¯x, y∗) = Φ(¯x, 0) y∗∈Y ∗

Thus, Φ∗(¯x, 0) + Φ(0, y¯∗) = 0 and (2) follows from Theorem D.7. Conversely, if (2) holds, then from (8.13) and (8.14)

∗ ∗ ∗ ∗ −Φ (0, y¯ ) = infx∈X L(x, y¯ ) ≤ L(¯x, y¯ )

∗ ∗ Φ (¯x, 0) = supy∗∈Y ∗ L(¯x, y) ≥ L(¯x, y¯ ).

∗ ∗ ∗ By Theorem D.7 −Φ (0, y¯ ) = Φ (¯x, 0) and (8.15) holds.  Theorem D.8 implies that no duality gap between (P ) and (P ∗) is equivalent to the saddle point property of the pair (¯x, y¯∗). For the example

L(x, y∗) = inf {f(x) + ϕ(Λx + y) − hy∗, yi} = f(x) + hy∗, Λxi − ϕ∗(y∗) (8.16) y∈Y

If (¯x, y¯∗) is a saddle point, then from (8.15)

0 ∈ ∂f(¯x) + Λ∗y¯∗ (8.17) 0 ∈ Λ¯x − ∂ϕ∗(¯x).

It follows from Theorem 2.2 that the second equation is equivalent to

y¯∗ ∈ ∂ϕ(Λ¯x) and (8.17) is equivalent to (8.11).

8.3 p-Laplacian Z Φ(u, p) = ϕ(p) + (λ, ∆u − p)

179 9 Optimization Algorithms

In this section we discuss iterative methods for finding a solution to the optimization problem. First, consider the unconstrained minimization:

min J(x) over a Hilbert space X.

Assume J : X → R+ is continuously differentiable. If u∗ ∈ X minimizes J, then the necessarily optimality is given by J 0(u∗) = 0. The gradient method is given by

0 xn+1 = xn − α J (xn) (9.1) where α > 0 is a stepsize chosen so that the decent J(xn+1) < J(xn) holds. Let 0 gn = J (xn) and since

0 2 J(xn − α gn) − J(xn) = −α |J (xn)| + o(α), (9.2) there exits a α > 0 satisfies the decent property. A stepsize α > 0 can be determined by min J(xn − α gn) (9.3) α>0 or by a line search method (e.g, Amijo-rule) Next, consider the constrained minimization

min J(x) over x ∈ C. (9.4) where C is a closed convex set in X. It follows from Theorem 7 that if J : X → R is C1 and x∗ ∈ C is a minimizer, then

(J 0(x∗), x − x∗) ≥ 0 for all x ∈ C. (9.5)

∗ Define the orthogonal projection operator PC of X onto C, i.e., z = PCx minimizes

|z − x|2 over z ∈ C.

From Theorem 7, z∗ ∈ C satisfies

(z∗ − x, z − z∗) ≥ 0 for all z ∈ C. (9.6)

Note that (9.5) is equivalent to

(x∗ − (x∗ − α J 0(x∗)), x − x∗) ≥ 0 for all x ∈ C and α > 0.

Thus, it is follows from (9.6) that (9.5) is equivalent to

∗ ∗ 0 ∗ x = PC(x − α J (x )).

The projected gradient method for (9.4) is defined by

0 xn+1 = PC(xn − αn J (xn)) (9.7) with a stepsize αn > 0.

180 00 Define the the Hessian H = J (x0) ∈ L(X,X) of J at x0 by 1 J(x) − J(x ) − (J 0(x ), x − x ) = (x − x ,H(x − x )) + o(|x − x |2). 0 0 0 2 0 0 0 Thus, we minimize over x 1 J(x) ∼ J(x ) + (J 0(x ), x − x ) + (x − x ,H(x − x )) 0 0 0 2 0 0 we obtain the damped-Newton method for equation given by

−1 0 xn+1 = xn − α H(xn) J (xn), (9.8) where α > 0 is a stepsize and α = 1 corresponds to the Newton method. In the case of min J(x) over x ∈ C we obtain the projected Newton method

−1 0 xn+1 = P rojC(xn − α H(xn) J (xn)). (9.9) Computing the Hessian can be expensive and numerical evaluation can also be less accurate. The finite rank scant update (BFGS and GFP) is used to substitute H (Quasi-Newton method). It is a variable merit method and such updates are pre- conditioner for the gradient method. Consider the (regularized) nonlinear least square problem 1 α min |F (x)|2 + (P x, x) (9.10) 2 Y 2 X where F : X → Y is C1 and P is a nonnegative self-adjoint operator on a Hilbert space X. The Gauss-Newton method is an iterative method of the form 1 α x = argmin{ |F 0(x )(x − x ) + F (x )|2 + (P x, x) }, n+1 2 n n n Y 2 X i.e., 0 ∗ −1 0 ∗ xn+1 = xn − (F (xn) F (xn) + α P ) (F (xn) F (xn)). (9.11)

9.1 Constrained minimization and Lagrange multiplier method Consider the constrained minimization

min J(x, u) = F (x) + H(u) subject to E(x, u) = 0 and u ∈ C. (9.12)

We use the implicit function theory for developing algorithms for (9.12). Implicit Function Theory Let E : X × U → X is C1. Suppose a pair (¯x, u¯) satisfies E(x, u) = 0 and Ex(¯x, u¯) is bounded invertible. Then the exists a δ > 0 such that for |u − u¯| < δ equation E(x, u) = 0 has a locally defined unique solution x = Φ(u). Moreover, Φ : U → X is continuously differentiable atu ¯ andx ˙ = Φ0(¯x)(d) satisfies

Ex(¯x, u¯)x ˙ + Eu(¯x, u¯)d = 0.

Theorem (Lagrange Calculus) Assume that E(x, u) = 0 and Ex(x, u) is bounded invertible. Then, by the implicit function theory x = Φ(u) and for J(u) = J(Φ(u), u)

0 0 ∗ (J (u), d) = (H (u), d) + (Eu(x, u) λ, d)

181 where λ ∈ Y satisfies ∗ 0 Ex(x, u) λ + F (x) = 0. Proof: From the implicit function theory

(J 0, d) = (F 0(x), x˙) + (H0(u), d), where Ex(x, u)x ˙ + Eu(x, u) = 0. Thus, the claim follows from

0 ∗ (F (x), x˙) = −(Exλ, x˙) = −(λ, Exx˙) = (λ, Eud). Hence we obtain the gradient method for equality constraint problem (9.12); Gradient method

1. Solve for xn E(xn, un) = 0

2. Solve for λn ∗ 0 Ex(xn, un) λn + F (xn) = 0 3. Compute the gradient by

0 ∗ gn = H (un) + Eu(xn, un) λn

4. Gradient Step: Update un+1 = PC(un − α gn) with a line search for α > 0.

9.2 Conjugate gradient method In this section we develop the conjugate gradient and conjugate residual method for the saddle point problem. Consider the quadratic programing 1 J(x) = (Ax, x) − (b, x) 2 X X where A is positive self-adjioint operator on a Hilbert space X. Then, the negative gradient of J is given by r = −J 0(x) = b − Ax. The gradient method is given by

xk+1 = xk + αk pk, pk = rk, where the step size αk is selected by

min J(xk + α rk) over α > 0.

182 That is, the Cauchy step is given by

0 (pk, rk) 0 = (J (xk+1), pk) = α (Apk, pk) − (pk, rk) ⇒ αk = . (9.13) (Apk, pk) It is known that the gradient method with Cauchy step searches the solution on a subspace spanned by the major and minor axes of A dominantly, and exhibits a zig- zag pattern of its iterates. If the condition number

max σ(A) ρ = min σ(A) is large, then it is very slow convergent. In order to improve the convergence we introduce the conjugacy of the search directions pk and it can be achieved by the conjugate gradient method. For example, a new search direction pk is extracted for the residual rk = b − Axk by pk = rk − β pk−1 and the orthogonality

0 = (Apk, pk−1) = (rk, Apk−1) − β(Apk−1, pk−1) implies (rk, Apk−1) βk = . (9.14) (Apk−1, pk−1) Let P is the pre-conditioner, self-adjiont operator so that the condition number of − 1 − 1 P 2 AP 2 is much smaller that the one for A. If we apply (9.13)(9.14) with the pre-conditioner P , we have the pre-conditioned conjugate gradient method:

• Initialize x0 and set

−1 r0 = b − Ax0, z0 = P r0, p0 = z0

• Iterate on k until (rk, zk) is sufficiently small

(rk, zk) αk = , xk+1 = xk + αk pk, rk+1 = rk − αk Apk. (pk, Apk)

−1 (zk+1, rk+1) zk+1 = P rk+1, βk = , pk+1 := zk+1 + βk pk (zk, rk)

−1 Here, zk is the pre-conditioned residual and (rk, zk) = (rk,P rk). The above formulation is equivalent to applying the conjugate gradient method without precon- ditioned system − 1 − 1 − 1 P 2 AP 2 xˆ = P 2 b, 1 wherex ˆ = P 2 x. We have the conjugate gradient theory. Theorem (Conjugate gradient method) (1) We have conjugate property

(rk, zj) = 0, (pk, Apj) = 0 for 0 ≤ j < k.

183 (2) xm+1 minimizes J(x) over x ∈ x0 + span{p0, ··· , pm}. Proof: For k = 1 (r1, z0) = (r0, z0) − α0 (Az0, z0) = 0 and since (r1, z1) = (r0, z1) − α0 (r1, Az0) 1 (p1, Ap0) = (z1, Az0) + β0(z0, Az0) = − (r1, z1) + β0(z0, Az0) = 0 α0

Note that span{p0, ··· , pk} = span{z0, ··· , zk}. In induction in k, for j < k

(rk+1, zj) = (rk, zj) − αk(Apk, pj) = 0 and ((rk+1, zk) = (rk, zk) − αk(Apk, pk) = 0 Also, for j < k 1 (pk+1, Apj) = (zk+1, Apj) + βk(pk, Apj) = − (zk+1, rj+1 − rj) = 0 αj and 1 (pk+1, Apk) = (zk+1, Apk) + βk(pk, Apk) = − (rk+1, zk) + βk(pk, Apk) = 0. αk Hence (1) holds. For (2) we note that

0 (J (xm+1), pj) = (rm+1, pj) = 0 for j ≤ m and xm+1 = x0 + span{p0, ··· , pm}. Thus, (2) holds.  Remark Note that k−1 X −1 rk = r0 − αjP Apj j=0 Thus, k−1 2 2 X −1 |rk| = |r0| − (2 αj=0P Apj, r0) + |αj|.

9.3 Conjugate residual method Consider the constrained quadratic programing 1 min (Qy, y) − (a, y) subject to Ey = b in Y. 2 X X Then the necessary optimality condition is written for x = (y, λ) ∈ X × Y :

 QE∗  A(y, λ) = (a, b) where A =   , E 0 where A is self-adjoint but is indefinite.

184 In general we assume A is self-adjoint and invertible. Consider the least square problem 1 1 1 |Ax − b|2 = (AAx, x) − (Ab, x) + |b|2. 2 X 2 X X 2 We consider the conjugate directions {pk} satisfying (Apk, Apj) = 0, j < k. The pre- conditioned residual method may be derived in the same way as done for the conjugate gradient method:

• Initialize x0 and set −1 r0 = P (b − Ax0), p0 = r0, (Ap)0 = Ap0

• Iterate with k until (rk, rk) is sufficiently small

(rk, Ark) −1 αk = −1 , xk+1 = xk + αk pk, rk+1 = rk − αk P (Ap)k. ((Ap)k,P (Ap)k)

(rk+1, Ark+1) βk = , pk+1 = rk+1 + βk pk, (Ap)k+1 = Ark+1 + βk (Ap)k (rk, Ark)

Theorem (Conjugate residual method) Assuming αk 6= 0, (1) We have conjugate property −1 (rj, Ark) = 0, (Apk,P Apj) = 0 for 0 ≤ j < k. 2 (2) xm+1 minimizes |Ax − b| over x ∈ x0 + span{p0, ··· , pm}. Proof: For k = 1 −1 (Ar0, r1) = (Ar0, r0) − α0 (Ap0,P Ap0) = 0 −1 −1 −1 −1 and since (Ap0,P Ap1) = (P Ap0, Ar1) + β0 (Ap0,P Ap0) and (P Ap0, Ar1) = 1 (r − r , Ar ) = − 1 (r , Ar ), we have α0 0 1 1 α0 1 1

−1 1 −1 (Ap0,P Ap1) = − (r1, Ar1) + β0(Ap0,P Ap0) = 0. α0 −1 Note that span{p0, ··· , pk} = span{r0, ··· , rk} and rk = r0+P A span{p0, ··· , pk−1}. In induction in k, for j < k −1 (rk+1, Arj) = (rk, Arj) − αk(P Apk, Arj) = 0 and −1 (Ark+1, rk) = (Ark, rk) − αk(Apk,P Apk) = 0 Also, for j < k

−1 −1 −1 1 (Apk+1,P Apj) = (Ark+1,P Apj) + βk(Apk,P Apj) = − (Ark+1, rj+1 − rj) = 0 αj and

−1 −1 −1 1 (Apk+1,P Apk) = (Ark+1,P Apk)+βk(Apk,P Apk) = − (rk+1, Ark+1)+βk(pk, Apk) = 0. αk Hence (1) holds. For (2) we note that −1 −1 (Axm+1 − b, P Apj) = −(rm+1,P Apj) = 0 for j ≤ m and xm+1 = x0 + span{p0, ··· , pm}. Thus, (2) holds. 

185 9.4 MINRES The conjugate gradient method can be viewed as a special variant of the Lanczos method for positive definite symmetric systems. The minimal residual method (MIN- RES) and symmetric LQ method (SYMMLQ) methods are variants that can be applied to symmetric indefinite systems. The vector sequences in the conjugate gradient method correspond to a factorization of a tridiagonal matrix similar to the coefficient matrix. Therefore, a breakdown of the algorithm can occur corresponding to a zero pivot if the matrix is indefinite. The CG method is for the minimization

min(x, Ax) − (b, x)

Thus, for indefinite matrix A the minimization property of the conjugate gradient method is no longer well-defined. The MINRES methods is a variant of the conjugate gradient method that avoids the LU decomposition and does not suffer from breakdown. MINRES minimizes the residual in the 2-norm

min |Ax − b|2

The convergence behavior of the conjugate gradient and MINRES methods for indefi- nite systems was analyzed by Paige et al. (1995). When A is not positive definite, but symmetric, we can still construct an orthogonal basis for the Krylov subspace by three-term recurrence relations. Eliminating the search directions in the equations of the conjugate gradient method gives a recurrence

Ari = ri+1ti+1,i + riti,i + ri−1ti−1,i, which can be written in matrix form as

ARi = Ri+1Ti, where Ti is an (i+1)×i is the tridiagonal matrix. That is, given a symmetric matrix A and a vector b, the Lanczos process [30] computes vectors rk and tridiagonal matrices Tk according to r0 = 0 and β1v1 = b, and then

pi = Ari, αi = (ri, pi), βi+1ri+1 = pi − αi ri − βi ri−1 where ti+1,i = βi+1, ti,i = αi, ti−1,i = βi. In exact arithmetic, the columns of Ri are orthonormal and the process stops with i = ` and β`+1 = 0. In this case we have the problem that (·, ·)A no longer defines an inner product. However we can still try to minimize the residual in the 2-norm by obtaining

(i) i−1 (i) x ∈ Krylov subspace = {r0, Ar0, ..., A r0}, i.e. x = Riy that minimizes (i) |Ax − b|2 = |ARiy − b|2 = |Ri+1Tiy − b|2. Now we exploit the fact that if

(0) (1) (i) Di+1 = diag(|r |2, |r |2, ..., |r |2),

186 −1 then Ri+1Di+1 is an orthonormal transformation with respect to the current Krylov subspace (i) i (0) ((1) |Ax − b|2 = |Di+1T y − r |2|e |2, and this final expression can simply be seen as a minimum norm least squares problem. The element in the (i + 1, i) position of Ti can be annihilated by a simple Givens rotation and the resulting upper bidiagonal system (the other sub-diagonal elements having been removed in previous iteration steps) can simply be solved, which leads to the MINRES method (Paige and Saunders 1975).

9.5 DIIS method DIIS (direct inversion in the iterative subspace or direct inversion of the iterative sub- space), also known as Pulay mixing, is an extrapolation technique. DIIS was developed by Peter Pulay in the field of computational quantum chemistry with the intent to ac- celerate and stabilize the convergence of the Hartree-Fock self-consistent field method. Approximate error vectors ei = b − Api corresponding to the sequences of the trial m vectors {pi}i=1 is evaluated. With sufficient large m, we consider a linear combination m {pi}i=1 m X p = cipi. Then, we have m X X r = b − Ap = ci(b − Api) = ciri i i=1 P m for i ci = 1. The DIIS method seeks to minimize the norm of r = b−Ap over {ci}i=1, i.e. 1 min |r|2 subject to P c = 1. 2 i i The necessary optimality condition is given by X Bc = λ, ci = 1, i where the symmetric matrix B is given by

Bij = (ri, rj) P and a constant λ is a constant Lagrange multiplier for the constraint i ci = 1.

9.6 Nonlinear Conjugate Residual method One can extend the conjugate residual method for the nonlinear equation. Consider the equality constrained optimization

min J(y) subject to E(y) = 0.

The necessary optimality system for x = (y, λ) is

 J 0(y) + E0(y)∗λ  F (y, λ) =   = 0. E(y)

187 Multiplication of vector by the Jacobian F 0 can be approximated by the difference of two residual vectors by

F (x + t y) − F (x ) k k ∼ Ay with t |y| ∼ 1. t Thus, we have the nonlinear version of the conjugate residual method: Nonlinear Conjugate Residual method

• Calculate sk for approximating Ark by

F (xk + t rk) − F (xk) 1 sk = , t = . t |rk|

• Update the direction by

−1 (sk, rk) pk = P rk − βk pk−1, βk = . (sk−1, rk−1)

• Calculate qk = sk − βk qk−1. • Update the solution by

(sk, rk) xk+1 = xk + αk pk, αk = −1 . (qk,P qk)

• Calculate the residual rk+1 = F (xk+1).

9.7 Newton method Next, we develop the Newton method for J 0(u) = 0. Define the Lagrange functional

L(x, u, λ) = F (x) + H(u) + (λ, E(x, u)).

Note that 0 J (u) = Lu(x(u), u, λ(u)) where (x(u), λ(u)) satisfies

E(x(u), u) = 0,E0(x(u))∗λ(u) + F 0(x(u)) = 0 where we assumed Ex(x, u) is bounded invertible. By the implicit function theory

00 ∗ J (u) = Lux(x(u), u, λ(u))x ˙ + Eu(x(u), u, λ(u)) λ˙ + Luu(x(u), u, λ(u)), where (x, ˙ λ˙ ) satisfies

 ∗      Lxx(x, u, λ) Ex(x, u) x˙ Lxu(x, u, λ)     +   = 0. Ex(x, u) 0 λ˙ Eu(x, u)

188 Thus,

−1 ∗ −1 x˙ = −(Ex(x, u)) Eu(x, u), λ˙ = −(Ex(x, u) ) (Lxx(x, u, λ)x ˙ + Lxu(x, u, λ)) and 00 ∗ ∗ −1 J (u) = Luxx˙ − Eu(x, u) (Ex(x, u) ) (Luxx˙ + Lxu) + Luu(x, u, λ). Theorem (Lagrange Calculus II) The Newton step

J 00(u)(u+ − u) + J 0(u) = 0 is equivalently written as

 ∗      Lxx(x, u, λ) Lxu(x, u, λ) Ex(x, u) ∆x 0        ∗     ∗ 0   Lux(x, u, λ) Luu(x, u, λ) Eu(x, u)   ∆u  +  Eu(x, u) λ + H (u)  = 0             Ex(x, u) Eu(x, u, λ) 0 ∆λ 0 where ∆x = x+ − x, ∆u = u+ − u and ∆λ = λ+ − λ. Proof: Define the linear operator T by

 −1  −Ex(x, u) Eu(x, u) T (x, u) =   . I

Then, the Newton method is equivalently written as     ∗ 00 ∆x ∗ 0 T (x, u) L (x, u, λ) + T (x, u) ∗ 0 = 0. ∆u Euλ + H (u)

Since E0(x, u) is surjective, it follows from the closed range theorem

N(T (x, u)∗) = R(T (x, u))⊥ = N(E0(x, u)∗)⊥ = R(E0(x, u)).

Hence, we obtain the claimed update.  Thus, we have the Newton method for the equality constraint problem (9.12). Newton method

1. Given u0, initialize (x0, λ0) by

∗ 0 E(x0, u0) = 0,Ex(x0, u0) λ0 + F (x0) = 0

2. Newton Step: Solve for (∆x, ∆u, ∆λ)

 00 0 ∗      L (xn, un, λn) E (xn, un) ∆x 0 ∗ 0    ∆u  +  Eu(xn, un) λn + H (un)  = 0 0 E (xn, un) 0 ∆λ 0

and update un+1 = P rojC(un + α ∆u) with a stepsize α > 0.

189 3. Feasible Step: Solve for (xn+1, λn+1)

∗ 0 E(xn+1, un+1) = 0,Ex(xn+1, un+1) λn+1 + F (xn+1) = 0.

4. Stop or set n = n + 1 and return to step 2. where  00  F (xn) + (λn,Exx(xn, un)) (λn,Exu(xn, un)) 00 L (xn, un, λn) =   . 00 (λn,Eux(xn, un)) H (un) + (λn,Euu(xn, un)) and 0 E (xn, un) = (Ex(xn, un),Eu(xn, un)). Consider the general equality constraint minimization;

min J(y) subject to E(y) = 0

The necessary optimality condition is given by

J 0(y) + E0(y)∗λ = 0,E(y) = 0, (9.15) where we assume E0(y) is surjective. Problem (9.12) is a specific case of this, i.e.,

y = (x, u),J(y) = F (x) + H(u),E = E(x, u).

The Newton method applied to the necessary optimality (9.15) for (y, λ) is as follows.

One can apply the Newton method for the necessary optimality system

(Lx(x, u, λ),Lu(x, u, λ, E(x, u)) = 0 and results in the SQP (sequential quadratic programing): Newton method applied to Necessary optimality system (SQP) 1. Newton direction: Solve for (∆y, ∆λ)

 00 0 ∗     0 0 ∗  L (yn, λn) E (yn) ∆y J (yn) + E (yn) λn     +   = 0. (9.16) 0 E (yn) 0 ∆λ E(yn)

2. Update yn+1 = yn + α ∆y and λn+1 = λm + α ∆λ. 3. Stop or set n = n + 1 and return to step 1. where L00(y, λ) = J 00(y) + E00(y)∗λ. The both Newton methods solve the same linear system (9.16) with the different right hand side. But, the first Newton method we iterate on the variable u only but each step uses the feasible step E(yn) = E(xn, un) = 0 and Lx(xn, un, λn) = 0 for (xn, λn), which are the steps for the gradient method. The, it computes the Newton direction for J 0(u) and updates u. In this sense the second Newton’s method is a relaxed version of the first one.

190 10 Sequential programing

In this section we discuss the sequential programming. It is an intermediate method between the gradient method and the Newton method. Consider the constrained op- timization F (y) subject to E(y) = 0, y ∈ C. We linearize the equality constraint and consider a sequence of the constrained opti- mization;

0 min F (y) subject to E (yn)(y − yn) + E(yn) = 0, y ∈ C. (10.1)

( The necessary optimality (if E yn) is surjective) is given by

 0 0 ∗  (F (y) + E (yn) λ, y˜ − y) ≥ 0 for ally ˜ ∈ C (10.2) 0  E (yn)(y − yn) + E(yn) = 0.

Note that the necessary condition for (y∗, λ∗) is written as

 0 ∗ 0 ∗ ∗ 0 ∗ 0 ∗ ∗ ∗ ∗  (F (y ) + E (yn) λ − (E (yn) − E (y ) )λ , y˜ − y ) ≥ 0 for ally ˜ ∈ C (10.3) 0 ∗ 0 ∗ ∗  E (yn)(y − yn) + E(yn) = E (yn)(y − yn) + E(yn) − E(y ).

Suppose we have the Lipschitz continuity of solutions to (10.2) and (10.3) with respect 0 ∗ 0 ∗ ∗ ∗ 0 ∗ to the perturbation; ∆1 = (E (yn) − E (y ) )λ and ∆2 = E (yn)(y − yn) + E(yn) − E(y∗), we have

∗ 0 ∗ 0 ∗ ∗ ∗ 0 ∗ ∗ |y − y | ≤ c1 |(E (yn) − E (y ) )λ | + c2 |E (yn)(y − yn) + E(yn) − E(y )|.

Thus, for the (damped) update:

yn+1 = (1 − α)yn + α y, α ∈ (0, 1), (10.4) we have the estimate

∗ ∗ ∗ 2 |yn+1 − y | ≤ (1 − α + αγ)|yn − y | + cα |yn − y | , (10.5) for some constants γ and c > 0. In fact, for the case C = X let zn = (yn, λn), ∗ ∗ ∗ z = (y , λ ) and define Gn by

 00 0 ∗  F (yn) E (yn) Gn =   . 0 E (yn) 0

For the update: zn+1 = (1 − α)zn + α z, α ∈ (0, 1] we have  (E0(y )∗ − E0(y∗)∗)λ∗  z − z∗ = (1 − α)(z − z∗) + α G−1( n + δ ) n+1 n n 0 n

191 with  0 ∗ 0 00 ∗  F (y ) − F (yn+1) − F (yn)(y − yn+1) δn = ∗ 0 ∗ . E(y ) − E (yn)(y − yn) − E(yn). 00 Note that if F (yn) = Q, then

 ∆   Q−1(∆ − E0(y )∗e)  G−1 1 = 1 n , e = (E0(y )Q−1E0(y )∗)−1E0(y )Q−1∆ n 0 e n n n 1

Thus, we have  (E0(y )∗ − E0(y∗)∗)λ∗  |G−1 n | ≤ γ |yn − y∗| n 0 Since −1 ∗ 2 |Gn δn| ≤ c |yn − y | , we have ∗ ∗ ∗ 2 |zn+1 − z | ≤ (1 − α + αγ) |zn − z | + αc |zn − z | which imply (10.5). When either the quadratic variation of E is dominated by the linearization E0 of E, i.e.,

|E0(y∗)†(E0(yn) − E0(y∗))| ≤ β |yn − y| with small β > 0 or the multiplier |λ∗| is small, γ > 0 is sufficiently small. The estimate (10.5) implies that if |y − y∗|2 is dominating we use a small α > 0 to the damp out and if |y − y∗| is dominating we use α → 1.

10.1 Second order version The second order sequential programming is given by

0 min F (y) + hλn,E(y) − (E (yn)(y − yn) + E(yn))i (10.6) 0 subject to E (yn)(y − yn) + E(yn) = 0 and y ∈ C.

The necessary optimality condition for (10.6) is given by

 0 0 ∗ 0 ∗ 0 ∗  (F (y) + (E (y) − E (yn) )λn + E (yn) λ, y˜ − y) ≥ 0 for ally ˜ ∈ C (10.7) 0  E (yn)(y − yn) + E(yn) = 0.

It follows from (10.3) and (10.7) that

∗ ∗ ∗ 2 ∗ 2 |y − y | + |λ − λ | ≤ c (|yn − y | + |λn − λ | ).

In summary we have Sequential Programming I

1. Given yn ∈ C, we solve for

0 min F (y) subject to E (yn)(y − yn) + E(yn) = 0, over y ∈ C.

192 2. Update yn+1 = (1 − α) yn + α y, α ∈ (0, 1). Iterate until convergence.

Sequential Programming II

1. Given yn ∈ C, λn and we solve for y ∈ C:

0 0 min F (y)+(λn,E(y)−(E (yn)(y−yn)+E(yn)) subject to E (yn)(y−yn)+E(yn) = 0, y ∈ C.

2. Update (yn+1, λn+1) = (y, λ). Iterate until convergence.

10.2 Examples Consider the constrained optimization of the form

min F (x) + H(u) subject to E(x, u) = 0 and u ∈ C. (10.8)

Suppose H(u) is nonsmooth. We linearize the equality constraint and consider a se- quence of the constrained optimization;

0 min F (x) + H(u) + hλn,E(x, u) − (E (xn, un)(x − xn, u − un) + E(xn, un)i

0 subject to E (xn, un)(x − xn, u − un) + E(xn, un) = 0 and u ∈ C. (10.9) Note that if (x∗, u∗) is an optimizer of (10.8), then

0 ∗ 0 ∗ ∗ 0 0 ∗ 0 0 ∗ ∗ ∗ F (x ) + (E (x , u ) − E (xn, un))λn + E (xn, un)λ = (E (xn, un) − E (x , u ))(λ − λn) = ∆1

0 ∗ ∗ ∗ 2 ∗ 2 E (xn, un)(x − xn, u − un) + E(xn, un) = ∆2, |∆2| ∼ M (|xn − x | + |un − u | ). (10.10) Thus, (x∗, u∗) is the solution to the perturbed problem of (10.9):

0 min F (x) + H(u) + hλn,E(x, u) − (E (xn, un)(x − xn, u − un) + E(xn, un)i

0 subject to E (xn, un)(x − xn, u − un) + E(xn, un) = ∆ and u ∈ C. (10.11) Let (xn+1, un+1) be the solution of (10.9). One can assume the Lipschitz continuity of solutions to (10.11):

∗ ∗ ∗ |xn+1 − x | + |un+1 − u | + |λn − λ | ≤ C |∆| (10.12) From (10.10) and assumption (10.12) we have the quadratic convergence of the sequen- tial programing method (10.15):

∗ ∗ ∗ ∗ 2 ∗ 2 ∗ |xn+1 − x | + |un+1 − u | + |λ − λ | ≤ M (|xn − x | + |un − u | + |λn − λ |). (10.13) Assuming H is convex, the necessary optimality given by

 0 ∗ ∗ ∗ F (x) + (Ex(x, u) − Ex(xn, un) )λn + Ex(xn, un) λ = 0    ∗ ∗ ∗ H(v) − H(u) + ((Eu(x, u) − Eu(xn, un) )λn + Eu(x, u) λ, v − u) ≥ 0 for all v ∈ C    Ex(xn, un)(x − xn) + Eu(xn, un)(u − un) + E(xn, un) = 0. (10.14)

193 As in the Newton method we use

∗ ∗ 00 (Ex(x, u) − Ex(xn, un) )λn ∼ E (xn, un)(x − xn, u − un)λn

0 00 0 F (xn+1) ∼ F (xn)(xn+1 − xn) + F (xn) and obtain

00 0 ∗ F (xn)(x − xn) + F (xn) + λn(Exx(xn, un)(x − xn) + Exu(xn, un)(u − un)) + Ex(xn, un) λ = 0

Ex(xn, un)x + Eu(xn, un)u = Ex(xn, un)xn + Eu(xn, un)un − E(xn, un), which is a linear system of equations for (x, λ). Thus, one has x = x(u), λ = λ(u), a continuous affine function in u;

   00 ∗ −1 x(u) F (xn) + λnExx(xn, un) Ex(xn, un)   =   RHS (10.15) λ(u) Ex(xn, un) 0 with  F 00(x )x − F 0(x ) + λ E (x , u )x − E (x , u )(u − u ))λ  RHS = n n n n xx n n n xu n n n n Ex(xn, un)xn − Eu(xn, un)(u − un) − E(xn, un) and then the second inequality of (10.14) becomes the variational inequality for u ∈ C. Consequently, we obtain the following algorithm: Sequential Programming II 1. Given u ∈ C, solve (10.15) for (x(u), λ(u)). 2. Solve the variational inequality for u ∈ C:

∗ ∗ ∗ H(v) − H(u) + ((Eu(x(u), u) − Eu(xn, un) )λn + Eu(xn, un) λ(u), v − u) ≥ 0, (10.16) for all v ∈ C.

3. Set un+1 = u and xn+1 = x(u). Iterate until convergence.

Example (Optimal Control Problem) Let (x, u) ∈ H1(0,T ; Rn) × L2(0,T ; Rm). Con- sider the optimal control problem:

Z T min (`(x(t)) + h(u(t))) dt 0 subject to the dynamical constraint d x(t) = f(x(t)) + Bu(t), x(0) = x dt 0 and the control constraint

u ∈ C = {u(t) ∈ U, a.e.},

194 where U is a closed convex set in Rm. Then, the necessary optimality condition for (10.9) is

d E0(x , u )(x − x , u − u ) + E(x , u ) = − x + f 0(x )(x − x ) + Bu = 0 n n n n n n dt n n d F 0(x) + E (x , u )λ = λ + f 0(x )tλ + `0(x) = 0 x n n dt n

t u(t) = argminv∈U {h(v) + (B λ(t), v)}.

α 2 m Btλ(t) If h(u) = 2 |u| and U = R , then u(t) = − α . Thus, (10.9) is equivalent to solving the two point boundary value for (x, λ):  d 0 1 t  x = f (xn)(x − xn) − BB λ, x(0) = x0  dt α  d  − λ = (f 0(x) − f 0(x ))∗λ + f 0(x )∗λ + `0(x), λ(T ) = 0.  dt n n n

Example (Inverse Medium problem) Let (y, v) ∈ H1(Ω) × H(Ω). Consider the inverse medium problem: Z Z 1 2 α 2 min |y − z| dsx + (β |v| + |∇v| ) dx Γ 2 Ω 2 subject to ∂y −∆y + vy = 0 in Ω, = g at the boundary Γ ∂n and v ∈ C = {0 ≤ v ≤ γ < ∞}, where Ω is a bounded open domain in R2, z is a boundary measurement of the voltage y and g ∈ L2(Γ) is a given current. Problem is to determine the potential function v ∈ C from z ∈ L2(Γ) in a class of sparse and clustered media. Then, the necessary optimality condition for (10.9) is given by

 ∂y  −∆y + vn(y − yn) + vyn = 0, = g  ∂n    ∂λ −∆λ + v λ = 0, = y − z n ∂n    Z   (α (∇v, ∇φ − ∇v) + β (|φ| − |v|) + (λyn, v)) dx ≥ 0 for all φ ∈ C Ω 10.3 Exact penalty method Consider the penalty method for the inequality constraint minimization

min F (x), subject to Gx ≤ c (10.17)

195 where G ∈ L(X,L2(Ω)). If x∗ is a minimizer of (10.17), the necessary optimality condition is given by

F 0(x∗) + G∗µ = 0, µ = max(0, µ + Gx∗ − c) a.e. in Ω. (10.18)

For β > 0 the penalty method is defined by

min F (x) + β ψ(Gx − c) (10.19) where Z ψ(y) = max(0, y) dω. Ω The necessary optimality condition is given by

−F 0(x) ∈ βG∗∂ψ(Gx − c). (10.20) where   0 s < 0 ∂ψ(s) = [0, 1] s = 0  1 s > 0 Suppose the Lagrange multiplier µ satisfies

sup |µ(ω)| ≤ β, (10.21) ω∈Ω then µ ∈ β ∂ψ(Gx∗ −c) and thus x∗ satisfies (10.20). Moreover, if (10.19) has a unique minimizer in a neighborhood of x∗, then x = x∗ where x is a minimizer of (10.19). Due to the singularity and the non-uniqueness of the subdifferential of ∂ψ, the direct treatment of the condition (10.19) many not be feasible for numerical computation. We define a regularized functional of max(0, s); for  > 0   s ≤ 0  2   1 2  max(0, s) = 2 |s| + 2 0 ≤ s ≤  .    s s ≥ 0 and consider the regularized problem of (10.19);

min J(x) = F (x) + β ψ(Gx − c), (10.22) whee Z ψ(y) = max(0, y) dω Ω Theorem (Consistency) Assume F is weakly lower semi-continuous. For an arbi- trary β > 0, any weak cluster point of the solution x,  > 0 of the regularized problem (10.22) converges to a solution x of the non-smooth penalized problem (10.19) as  → 0.  Proof: First we note that 0 ≤ max (s) − max(0, s) ≤ for all s ∈ R. Let x be a  2 solution to (10.19) and x be a solution of the regularized problem (10.22). Then we have F (x) + β ψ(x) ≤ F (x) + β ψ(x)

F (x) + β ψ(x) ≤ F (x) + β ψ(x).

196 Adding these inequalities,

ψ(x) − ψ(x) ≤ ψ(x) − ψ(x).

Thus, we have for all cluster pointsx ¯ of x

F (¯x) + β ψ(¯x) ≤ F (x) + β ψ(x), since F is weakly lower semi-continuous and

ψ(x) − ψ(¯x) = ψ(x) − ψ(x) + ψ(x) − ψ(¯x)

≤ ψ(x) − ψ(x) + ψ(x) − ψ(¯x). and lim (ψ(x) − ψ(x) + ψ(x) − ψ(¯x)) ≤ 0. →0+ As a consequence of Theorem, we have Corollary If (10.19) has the unique minimizer in a neighborhood of x∗ and (10.21) ∗ holds, then x converges to the solution x of (10.17) as  → 0. The necessary optimality for (10.22) is given by

0 ∗ 0 F (x) + G ψ(Gx − c) = 0. (10.23) Although the non-uniqueness for concerning subdifferential in the optimality condition in (10.20) is now bypassed through regularization, the optimality condition (10.23) 0 is still nonlinear. For this objective, we first observe that max has an alternative representation, i.e.,  0 0 s ≤ 0 max(s) = χ(s)s, χ(s) = 1 (10.24) max(,s) s > 0 This suggests the following semi-implicit fixed point iteration;

k+1 k 0 k ∗ k α P (x − x ) + F (x ) + β G χk(Gx − c) = 0, χk = χ(Gxk − c), (10.25) where P is positive, self-adjoint and serves a pre-conditioner for F 0 and α > 0 serves a stabilizing and acceleration stepsize (see, Theorem 2). Let dk = xk+1 − xk. Equation (10.25) for xk+1 is equivalent to the equation for dk

k 0 k ∗ k k α P d + F (x ) + βG χk(Gx − c + Gd ) = 0, which gives us ∗ k 0 k (α P + β G χkG)d = −F(x ). (10.26) k k The direction d is a decent direction for J(x) at x , indeed,

0 k ∗ k k k k k (dk,J(x )) = −((α P + β G χkG)dk, d ) = −α(Dd , d ) − β(χkGd , Gd ) < 0. Here we used the fact that P is strictly positive definite. So, the iteration (10.25) can be seen as a decent method:

197 Algorithm (Fixed point iteration)

Step 0. Set parameters: β, α, , P and 0 < ρ ≤ 1

∗ k 0 k Step 1. Compute the direction by (α P + βG χkG)d = −F (x )

Step 2. Update xk+1 =)xk + ρdk.

0 k Step 3. If |J(x )| < T OL, then stop. Otherwise repeat Step 1 - Step 2.

Let us make some remarks on the Algorithm. In many applications, the structure of F 0 and G are sparse block diagonals, and the resulting system (10.26) for the direction dk then becomes a linear system with a sparse symmetric positive-definite matrix; and can be efficiently solved by the Cholesky decomposition method, for example. If 1 0 F (x) = 2 (x, Ax) − (b, x), then we have F (x) = Ax − b. For this case we may use the alternative update

k+1 k k+1 ∗ k+1 α P (x − x ) + Ax − b + β G χk(Gx − c) = 0, assuming that it doesn’t cost much to perform this fully implicit step. For the bilateral inequality constraint g ≤ Gx ≤ c the update (10.25) becomes

k+1 k 0 k ∗ k+1 k+1 α P (x − x ) + F (x ) + β G (χk(Gx − c) + + (g − Gx ) − (10.27) Ak Ak where the active index set Ak and the diagonal matrix χk on Ak are defined by

+ k 1 if j ∈ A = {j :(Gx − c)j ≥ 0}, (χk)jj = k . k max(,(Gx −c)j )

− k 1 if j ∈ A = {j :(Gx − g)j ≤ 0}, (χk)jj = k k max(,(g−Gx )j ) The damping parameter 0 < ρ ≤ 1 is used to globalizing the convergence of the proposed algorithm, i.e., we select ρ to have the decent property for J(x). The algo- rithm is globally convergent with ρ = 1 practically and the following results justify the fact. Theorem 3 Let

R(x, xˆ) = −(F (x) − F (ˆx) − F 0(ˆx)(x − xˆ)) + α (P (x − xˆ), x − xˆ) ≥ ω |x − xˆ|2 for some ω > 0 and all x, xˆ. Then, we have

k+1 k k+1 k β k k R(x , x ) + F (x ) − F (x ) + 2 ((χkGd , Gd )

X k+1 2 k 2 X k+1 2 k 2 + (χk, |Gx − c| − |Gx − c| ) + (χk, |Gx − g| − |Gx − g| ) = 0, k=j0 k=j00

0 k 00 k where {j :(Gx − c)j0 ≥ 0} and {j :(Gx − g)j00 ≤ 0}. Proof: Multiplying (10.27) by dk = xk+1 − xk

α (P dk, dk) − (F (xk+1) − F (xk) − F 0(xk)dk)

k+1 k +F (x ) − F (x ) + Ek = 0,

198 where

k+1 k β t k k Ek = β(χk(G(j, :)x − c),G(j, :)d ) = 2 ((χkG(j, :) d ,G(j, :)d )

β X X + (χ , |Gxk+1 − c|2 − |Gxk − c|2) + (χ , |Gxk+1 − g|2 − |Gxk − g|2)), 2 k k k=j0 k=j00

Corollary 2 Suppose all inactive indexes {χk = 0} remain inactive, then

k+1 k 2 k+1 k ω |x − x | + J(x ) − J(x ) ≤ 0 and {xk} is globally convergent. Proof: Since s → ψ(p|s − c|) on s ≥ c is and s → ψ(p|s − g|) on s ≤ g are concave, we have

k+1 2 k 2 k+1 k (χk, |(Gx − c)j0 | − |(Gx − c)j0 | ) ≥ ψ((Gx − c)j0 ) − ψ((Gx − c)j0 ) and

k+1 2 k 2 k+1 k (χk, |(g − Gx )j00 | − |(g − Gx )j00 | ) ≥ ψ((g − Gx )j00 ) − ψ((g − Gx )j00 )

Thus, we obtain β F (xk+1)+β ψ (xk+1)+R(xk+1, xk)+ (χ G(j, :)(xk+1−xk),G(j, :)(xk+1−xk)) ≤ F (xk)+β ψ (xk).  2 k 

2 k k If we assume R(x, xˆ) ≥ ω |x − xˆ| for some ω > 0, then F (x ) + ψ(Gx − c) is monotonically decreasing and

X k+1 k 2 |x − x | < ∞.

c k k+1 Corollary 3 Suppose i ∈ Ak = I is inactive but ψ(Gxi − c) > 0. Assume

β X (χ G(xk+1 − xk),G(xk+1 − xk)) − β ψ (Gxk+1 − c) ≥ −ω0 |xk+1 − xk|2 (10.28) 2 k  i i∈Ik with 0 ≤ ω0 < ω, then the algorithm is globally convergent. The condition (10.28) holds if n=5000; m=50; A=rand(n,n); A=A+A’; A=A+70*eye(n); C=rand(m,n); f=[ones(n,1);.5*ones(m,1)]; u=A\f(1:n); k=find(C*u(1:n)>=.5*ones(m,1)); AA=[A C(k,:)’;C(k,:) -1e-8*eye(size(k,1))]; uu=u; u=AA\[f(1:n);.5*ones(size(k,1),1)]; norm(u(1:n)-uu(1:n))

199 11 Sensitivity analysis

In this section the sensitivity analysis is discussed for the parameter-dependent opti- mization. Consider the parametric optimization problem;

F (x, p) x ∈ C. (11.1)

For given p ∈ P , a complete matrix space, let x = x(p) ∈ C is a minimizer of (11.1). Forp ¯ ∈ P and p =p ¯ + tp˙ ∈ P with t ∈ R and incrementp ˙ we have

F (x(p), p) ≤ F (x(¯p), p),F (x(¯p), p¯) ≤ F (x(p), p¯).

Thus, we have

F (x(p), p) − F (x(¯p), p¯) = F (x(p), p) − F (x(¯p), p) + F (x(¯p), p) − F (x(¯p), p¯) ≤ F (x(¯p), p) − F (x(¯p), p¯)

F (x(p), p) − F (x(¯p), p¯) = F (x(p), p¯) − F (x(¯p), p¯) + F (x(p), p) − F (x(p), p¯) ≥ F (x(p), p) − F (x(p), p¯) and

F (x(p), p) − F (x(p), p¯) ≤ F (x(p), p) − F (x(¯p), p¯) ≤ F (x(¯p), p) − F (x(¯p), p¯).

Hence for t > 0 we have F (x(¯p + tp˙), p¯ + tp˙) − F (x(¯p + tp˙), p¯) F (x(¯p + tp˙), p¯ + tp˙) − F (x(¯p), p¯) F (x(¯p), p¯ + tp˙) − F (x(¯p), p¯) ≤ ≤ t t t F (x(¯p), p¯) − F (x(¯p), p¯ − tp˙) F (x(¯p), p¯) − F (x(¯p − tp˙), p¯ − tp˙) F (x(¯p − tp˙), p¯) − F (x(¯p − t p˙), p¯ − tp˙) ≤ ≤ . t t t Based on this inequality we have Theorem (Sensitivity I) Assume (H1) there exists the continuous graph p → x(p) ∈ C in a neighborhood of (x(¯p), p¯) ∈ C × P and define the value function

V (p) = F (x(p), p).

(H2) in a neighborhood of (x(¯p), p¯)

(x, p) ∈ C × Q → Fp(x, p) is continuous.

Then, the G-derivative of V atp ¯ exists and is given by

0 V (¯p)p ˙ = Fp(x(¯p, p¯)p. ˙ (11.2)

Conversely, if V (p) is differentiable atp ¯ then (11.2) holds.

Lemma (Sensitivity I) Suppose V (p) = infx∈C F (x, p) where C is a closed convex set. If p ∈ F (x, p) is convex, then V (p) is concave and Fp(¯x, p¯) ∈ −∂V (¯p). Proof: Since for 0 ≤ t ≤ 1

F (x, t p1 + (1 − t) p2) ≥ t F (x, p1) + (1 − t) F (t, p2)

200 we have V (t p1 + (1 − t) p2) ≥ t V (p1) + (1 − t) V (p2). and V is concave. Since

V (p) − V (¯p) = F (x, p) − F (¯x, p) + F (¯x, p) − F (¯x, p¯) ≤ F (¯x, p) − F (¯x, p¯) we have Fp(¯x, p¯) ∈ −∂V (¯p). Next, consider the implicit function case. Consider a functional defined by

V (p) = F (x(p), p),E(x(p), p) = 0 where we assume that the constraint E is C1 and E(x, p) = 0 defines a continuous solution graph p → x(p) in a neighborhood of (x(¯p), p¯). Theorem (Sensitivity II) Assume there exists λ that satisfies the adjoint equation

∗ Ex(x(¯p), p¯) λ + Fx(x(¯p), p¯) = 0 and assume the Lagrange functional

L(x, p, λ) = F (x) + (E(x), λ) satisfies

L(x(p), p,¯ λ) − L(x(¯p), p,¯ λ) − Lx(x(¯p), p,¯ λ)(x(p) − x(¯p)) = o(|p − p¯|). (11.3)

Then, p → V (p) is differentiable atp ¯ and

0 V (¯p)p ˙ = (Lp(x(¯p), p,¯ λ), p˙).

Proof: If we let

0 F (x(p), p) − F (x(¯p, (¯p) − F (x(¯p))(x(p) − x(ˆp)) = 1, we have

V (p) − V (¯p) − Fx(x(¯p))(x(p) − x(ˆp)) = 1 + Fp(x(¯p), p¯)(p − p¯) + o(|p − p¯|).

Note that

(Ex(x(¯p), p¯)(x(p) − x(¯p)), λ) + Fx(x(¯p))(x(p) − x(¯p)) = 0.

Let 2 = (E(x(p), p¯) − E(x(¯p), p¯) − Ex(x(¯p), p¯)(x(p) − x(¯p)), λ), where

0 = (E(x(p), p) − E(x(¯p), p¯), λ) = 2 − Ep(x(¯p), p¯)(p − p¯)), λ) + o(|p − p¯|)

Thus, summing these we obtain

V (p) − V (¯p) = Lp(x(¯p), p¯)(p − p¯) + 1 + 2 + o(|p − p¯|)

201 Since

L(x(p), p,¯ λ) − L(x(¯p), p,¯ λ) − Lx(x(¯p), p,¯ λ)(x(p) − x(¯p)) = 1 + 2, p → V (x(p)) is differentiable atp ¯ and the desired result.  Lemma (Sensitivity II) If F (x, p) = F (x), p → E(x, p) is linear and x → L(x, p, λ) is convex, then p → V (p) is concave and Lp(¯x, p,¯ λ) ∈ −∂V (¯p). If we assume that E and F is C2 in x, then it is sufficient to have the Holder continuity 1 2 |x(p) − x(¯p)|X ∼ o(|p − p¯|Q) for condition (11.3) holding. Next, consider the parametric constrained optimization:

F (x, p) subject to E(x, p) ∈ K.

We assume (H3) λ ∈ Y satisfies the adjoint equation

∗ Ex(x(¯p), p¯) λ + Fx(¯p), p¯) = 0.

(H4) Conditions (11.3) holds. (H5) In a neighborhood of (x(¯p), p¯).

(x, p) ∈ C × P → Ep(x, p) ∈ Y is continuous.

Eigenvalue problem Let A(p) be a linear operator in a Hilbert space X. For x = (µ, y) ∈ R × X F (x) = µ, subject to A(p)y = µ y. That is, (µ, y) is an eigen pair of A(p).

A(p)∗λ − µ λ = 0, (y, λ) = 1

Thus, d µ0(p ˙) = ( A(p)py, ˙ λ). dp Next, consider the parametric constrained optimization; given p ∈ Q

min F (x, u, p) subject to E(x, u, p) = 0. (11.4) x,u

Let (¯x, u¯) is a minimizer givenp ¯ ∈ Q. We assume (H3) λ ∈ Y satisfies the adjoint equation

∗ Ex(¯x, u,¯ p¯) λ + Fx(¯x, u,¯ p¯) = 0.

(H4) Conditions (11.3) holds. (H5) In a neighborhood of (x(¯p), p¯).

(x, p) ∈ C × Q → Ep(x, p) ∈ Y is continuous.

202 Theorem (Sensitivity III) Under assumptions (H1)–(H5) the necessary optimality for (13.18) is given by ∗ Exλ + Fx(¯x, u,¯ p¯) = 0

∗ Euλ + Fu(¯x, u,¯ p¯) = 0

E(¯x, u,¯ p¯) = 0 and 0 V (¯p)p ˙ = Fp(¯x, u,¯ p¯)p ˙ + (λ, Ep(¯x, u,¯ p¯)p ˙) = 0. Proof: F (x(p), p) − F (x(¯p), p¯) = F (x(p), p¯) − F (x(¯p), p¯) + F (x(p), p) − F (x(p), p¯)

≥ Fx(x(¯p), p¯)(x(p) − x(¯p) + F (x(p), p) − F (x(p), p¯)

F (x(p), p) − F (x(¯p), p¯) = F (x(p), p) − F (x(¯p), p) + F (x(¯p), p) − F (x(¯p), p¯)

≤ Fx(x(p), p)(x(p) − x(¯p)) + F (x(¯p), p) − F (x(¯p), p¯).

Note that

E(x(p), p)−E(x(¯p), p¯) = Ex(x(¯p))(x(p)−x(¯p)+E(x(p), p)−E(x(p), p¯) = o(|x(p)−x(¯p|) and (λ, Ex(¯x, p¯)(x(p) − x(¯p)) + Fx(¯x, p¯)(x(p) − x(¯p)i = 0 Combining the above inequalities and letting t → 0, we have

12 Augmented Lagrangian Method

In this section we discuss the augmented Lagrangian method for the constrained min- imization; min F (x) subject to E(x) = 0,G(x) ≤ 0 (12.1) over x ∈ C. For the equality constraint case we consider the augmented Lagrangian functional c L (x, λ) = F (x) + (λ, E(x)) + |E(x)|2. c 2 for c > 0. The augmented Lagrangian method is an iterative method with the update

xn+1 = argminx∈C Lc(x, λn)

λn+1 = λn + c E(xn).

It is a combination of the multiplier method and the penalty method. That is, if c = 0 and λn+1 = λn + α E(xn) with step size α > 0 is the multiplier method and if λn = 0 is the penalty method. The multiplier method converges (locally) if F is (locally) uniformly convex. The penalty method requires c → ∞ and may suffer if c > 0 is too

203 large. It can be shown that the augmented Lagrangian method is locally convergent 00 ¯ provided that Lc (x, λ) is uniformly positive near the solution pair (¯x, λ). Since L00(¯x, λ¯) = F 00(¯x) + (λ,¯ E00(¯x)) + c E0(¯x)∗E0(¯x), the augmented Lagrangian method has an enlarged convergent class compared to the multiplier method. Unlike the penalty method it is not necessary to c → ∞ and the Lagrange multiplier update speeds up the convergence. For the inequality constraint case G(x) ≤ 0 we consider the equivalent formulation: c min F (x) + |G(x) − z|2 + (µ, G(x) − z) over (x, z) ∈ C × Z 2 subject to G(x) = z, and z ≤ 0. For minimizing this over z ≤ 0 we have that µ + c G(x) z∗ = min(0, ) c attains the minimum and thus we obtain 1 min F (x) + (| max(0, µ + c G(x))|2 − |µ|2) over x ∈ C. 2c For (12.1), given (λ, µ) and c > 0, define the augmented Lagrangian functional c 1 L (x, λ, µ) = F (x) + (λ, E(x)) + |E(x)|2 + (| max(0, µ + c G(x))|2 − |µ|2). (12.2) c 2 2c

The first order augmented Lagrangian method is a sequential minimization of Lc(x, λ, µ) over x ∈ C; Algorithm (First order Augmented Lagrangian method)

1. Initialize (λ0, µ0)

2. Let xn be a solution to

min Lc(x, λn, µn) over x ∈ C.

3. Update the Lagrange multipliers

λn+1 = λn + c E(xn), µn+1 = max(0, µn + c G(xn)). (12.3)

Remark (1) Define the value function

Φ(λ, µ) = min Lc(x, λ, µ). x∈C Then, one can prove that

Φλ = E(x), Φµ = max(0, µ + c G(x))

(2) The Lagrange multiplier update (12.3) is a gradient method for maximizing Φ(λ, µ).

204 (3) For the equality constraint case c L (x, λ) = F (x) + (λ, E(x)) + |E(x)|2 c 2 and 0 0 Lc(x, λ) = L0(x, λ + c E(x)). The necessary optimality for max min Lc(x, λ) λ x is given by 0 0 Lc(x, λ) = L0(x, λ + c E(x)) = 0,E(x) = 0 (12.4) If we apply the Newton method to system (12.4), we obtain

 00 0 ∗ 0 0 ∗   +   0  L0(x, λ + c E(x)) + c E (x) E (x) E (x) x − x L0(x, λ + c E(x))     = −   . E0(x) 0 λ+ − λ E(x)

00 The c > 0 adds the coercivity on the Hessian L0 term.

12.1 Primal-Dual Active set method For the inequality constraint G(x) ≤ 0 (e.g., Gx − c˜ ≤ 0); 1 L (x, µ) = F (x) + (| max(0, µ + c G(x)|2 − |µ|2) c 2c and the necessary optimality condition is

F 0(x) + G0(x)∗µ, µ = max(0, µ + c G(x)) where c > 0 is arbitrary. Based on the complementarity condition we have primal dual active set method. Primal Dual Active Set method For the affine case G(x) = Gx − c˜ we have the Primal Dual Active Set method as: 1. Define the active index and inactive index by

A = {k ∈ (µ + c (Gx − c˜))k > 0}, I = {j ∈ (µ + c (Gx − c˜))j ≤ 0}

2. Let µ+ = 0 on I and Gx+ =c ˜ on A. + + 3. Solve for (x , µA)

00 + 0 ∗ + + F (x)(x − x) + F (x) + GAµ = 0,GAx − c = 0

4. Stop or set n = n + 1 and return to step 1,

205 where GA = {Gk}, k ∈ A. For the nonlinear G

0 + G (x)(x − x) + G(x) = 0 on A = {k ∈ (µ + c G(x))k > 0}.

The primal dual active set method is a semismooth Newton method. That is, if we define a generalized derivative of s → max(0, s) by 0 on (−∞, 0] and 1 on (0, ∞). Note that s → max(0, s) is not differentiable at s = 0 and we define the derivative at s = 0 as the limit of derivative for s < 0. So, we select the generalized derivative of max(0, s) as 0, s ≤ 0 and 1, s > 0 and thus the generalized Newton update

µ+ − µ + 0 = µ if µ + c (Gx − c) ≤ 0

c G(x+ − x) + c (Gx − c˜) = Gx+ − c˜ = 0 if µ + c (Gx − c˜) > 0, which results in the active set strategy

+ + µj = 0, j ∈ I and (Gx − c˜)k = 0, k ∈ A. We will introduce the class of semismooth functions and the semismooth Newton method in Chapter 7 in function spaces. Specifically, the pointwise (coordinate) oper- ation s → max(0, s) defines a semismooth function from Lp(Ω) → Lq(Ω), p > q.

13 Lagrange multiplier Theory for Nonsmooth convex Optimization

In this section we present the Lagrange multiplier theory for the nonsmmoth optimiza- tion of the form min f(x) + ϕ(Λx) over x ∈ C (13.1) where X is a Banach space, H is a Hilbert space lattice, f : X → R is C1,Λ ∈ L(X,H) and C is a closed convex set in X. The nonsmoothness is represented by ϕ and ϕ is a proper, lower semi continuous convex functional in H. This problem class encompasses a wide variety of optimization problems including variational inequalities of the first and second kind [?, ?]. We will specify those to our specific examples. Moreover, the exact penalty formulation of the constraint problem is + min f(x) + c |E(x)|L1 + c |(G(x)) |L1 (13.2) where (G(x))+ = max(0,G(x)). We describe the Lagrange multiplier theory to deal with the non-smoothness of ϕ. Note that (13.1) is equivalent to

min f(x) + ϕ(Λx − u) (13.3) subject to x ∈ C and u = 0 in H.

Treating the equality constraint u = 0 in (13.3) by the augmented Lagrangian method, results in the minimization problem

c 2 min f(x) + ϕ(Λx − u) + (λ, u)H + |u|H , (13.4) x∈C,u∈H 2

206 where λ ∈ H is a multiplier and c is a positive scalar penalty parameter. By (13.6) we have min Lc(y, λ) = f(x) + ϕc(Λx, λ). (13.5) y∈C where c 2 ϕc(u, λ) = infv∈H {ϕ(u − v) + (λ, v)H + 2 |v|H } (13.6) c 2 = inf {ϕ(z) + (λ, u − z)H + |u − z|H } (u − v = z). z∈H 2 13.1 Yosida-Moreau approximations and Monotone Op- erators

For u, λ ∈ H and c > 0, ϕc(u, λ) is called the generalized Yoshida-Moreau approxima- tions ϕ. It can be shown that u → ϕc(u, λ) is continuously Fr´echet differentiable with Lipschitz continuous derivative. Then, the augmented Lagrangian functional [?, ?, ?] of (13.1) is given by Lc(x, λ) = f(x) + ϕc(Λx, λ). (13.7) Let ϕ∗ is the convex conjugate of ϕ

∗ c 2 ϕc(u, λ) = infz∈H {supy∈H {(y, z) − ϕ (y)} + (λ, u − z)H + 2 |u − z|H } 1 (13.8) = sup{− |y − λ|2 + (y, u) − ϕ∗(y)}. y∈H 2c where we used

c 2 1 2 inf {(y, z)H + (λ, u − z)H + |u − z|H } = (y, u) − |y − λ|H z∈H 2 2c Thus, from Theorem 2.2

0 1 2 ∗ ϕc(u, λ) = yc(u, λ) = argmaxy∈H {− 2c |y − λ| + (y, u) − ϕ (y)} (13.9) = λ + c vc(u, λ) where c ∂ v (u, λ) = argmin {ϕ∗(u − v) + (λ, v) + |v|2} = ϕ (u, λ) c v∈H 2 ∂λ c Moreover, if 1 p (λ) = argmin { |p − λ|2 + ϕ∗(p)}, (13.10) c p∈H 2c then (13.9) is equivalent to 0 ϕc(u, λ) = pc(λ + c u). (13.11) Theorem (Lipschitz complementarity) 0 (1) If λ ∈ ∂ϕ(x) for x, λ ∈ H, then λ = ϕc(x, λ) for all c > 0. 0 (2) Conversely, if λ = ϕc(x, λ) for some c > 0, then λ ∈ ∂ϕ(x). Proof: If λ ∈ ∂ϕ(x), then from (13.6) and (13.8)

∗ ϕ(x) ≥ ϕc(x, λ) ≥ hλ, xi − ϕ (λ) = ϕ(x).

207 0 Thus, λ ∈ H attains the supremum of (13.9) and we have λ = ϕc(x, λ). Conversely, 0 if λ ∈ H satisfies λ = ϕc(x, λ) for some c > 0, then vc(x, λ) = 0 by (13.9). Hence it follows from that ∗ ϕ(x) = ϕc(x, λ) = (λ, x) − ϕ (λ). which implies λ ∈ ∂ϕ(x). 

If xc ∈ C denotes the solution to (13.5), then it satisfies

0 ∗ hf (xc) + Λ λc, x − xciX∗,X ≥ 0, for all x ∈ C (13.12) 0 λc = ϕc(Λxc, λ).

Under appropriate conditions [?, ?, ?] the pair (xc, λc) ∈ C × H has a (strong-weak) cluster point (¯x, λ¯) as c → ∞ such thatx ¯ ∈ C is the minimizer of (13.1) and that λ¯ ∈ H is a Lagrange multiplier in the sense that 0 ∗¯ hf (¯x) + Λ λ, x − x¯iX∗,X ≥ 0, for all x ∈ C, (13.13) with the complementarity condition ¯ 0 ¯ λ = ϕc(Λ¯x, λ), for each c > 0. (13.14) System (13.13)–(13.14) is for the primal-dual variable (¯x, λ¯). The advantage here is that the frequently employed differential inclusion λ¯ ∈ ∂ϕ(Λ¯x) is replaced by the equivalent nonlinear equation (13.14). The first order augmented Lagrangian method for (13.1) is given by First order augmented Lagrangian method • Select λ0 and set n = 0.

• Let xn+1 = argminx∈CLc(x, λn). 0 • Update the Lagrange multiplier by λn+1 = ϕc(Λxn+1, λn). • Stop or set n = n + 1 and return to step 1.

The step minx∈C Lc(x, λn) can be spitted into the coordinate-wise as Coordinate splitting method • Select λ0 and set n = 0 c 2 • zn+1 = argminz{ϕ(v) + (λn, Λxn − z)H + 2 |Λxn − z|H }. c 2 • xn+1 = argminx∈C {f(x) + (λn, Λx − zn+1)H + 2 |Λx − zn+1|H }. • Update the Lagrange multiplier by λn+1 = pc(λn + c Λxn+1). • Stop or set n = n + 1 and return to step 1. In many applications, the convex conjugate functional ϕ∗ of ϕ is given by

∗ ϕ (v) = IK∗ (v),

∗ where K is a closed convex set in H and IS is the indicator function of a set S. From (13.8) 1 2 ϕc(u, λ) = sup {− |y − λ| + (y, u)} y∈K∗ 2c

208 and it follows from (13.10)–(13.11) that (13.14) is equivalent to ¯ ¯ λ = P rojK∗ (λ + c Λ¯x). (13.15) which is the basis of our approach.

13.2 Examples In this section we discuss the applications of the Lagrange multiplier theorem and the complementarity. 2 2 Example (Inequality) Let H = L (Ω) and ϕ = IC with C = {s ∈ L (Ω) : z ≤ 0, a.e., i.e., min f(x) over x ∈ Λx − c ≤ 0. (13.16) In this case ∗ ∗ ϕ (v) = IK∗ (v) with K = −C and the complementarity (15.26) implies that

f 0(¯x) + Λ∗λ¯ = 0

λ¯ = max(0, λ¯ + c (Λ¯x − c)). Z Example (L1(Ω) optimization) Let H = L2(Ω) and ϕ(v) = |v| dx, i.e. Ω Z min f(y) + |Λy|2 dx. (13.17) Ω In this case

∗ ∗ 2 ϕ (v) = IK∗ (v) with K = {z ∈ L (Ω) : |z|2 ≤ 1, a.e.} and the complementarity (15.26) implies that

f 0(¯x) + Λ∗λ¯ = 0

¯ λ¯ = λ + c Λ¯x a.e.. max(1, |λ¯ + c Λ¯x|2) Exaples (Constrained optimization) Let X be a Hilbert space and consider

min f(x) subject to Λx ∈ C (13.18) where f : X → R is C1 and C is a closed convex set in X. In this case we let f = J and Λ is the natural injection and ϕ = IC.

c 2 ϕc(u, λ) = inf {(λ, u − z)H + |u − z|H }. (13.19) z∈C 2 and λ z∗ = P roj (u + ) (13.20) C c

209 attains the minimum in (13.19) and

0 ∗ ϕc(u, λ) = λ + c (u − z ) (13.21) If H = L2(Ω) and C is the bilateral constraint {y ∈ X : φ ≤ Λy ≤ ψ}, the complemen- tarity (13.20)–(13.21) implies that for c > 0

f 0(x∗) + Λ∗µ = 0 (13.22) µ = max(0, µ + c (Λx∗ − ψ) + min(0, µ + c (Λx∗ − φ)) a.e..

If either φ = −∞ or ψ = ∞, it is unilateral constrained and defines the generalized obstacle problem. The cost functional J(y) can represent the performance index for the optimal control and design problem, the fidelity of data-to-fit for the inverse problem and deformation and restoring energy for the variational problem.

13.3 Monotone Operators and Yosida-Morrey approxi- mations 13.4 Lagrange multiplier Theory In this action we introduce the generalized Lagrange multiplier theory. We establish the conditions for the existence of the Lagrange multiplier λ¯ and derive the optimal- ity condition (1.7)–(1.8). In Section 1.5 it is shown that the both Uzawa and aug- mented Lagrangian method are fixed point methods for the complementarity condition (??))and the convergence analysis is presented . In Section 1.6 we preset the concrete examples and demonstrate the applicability of the results in Sections 1.4–1.5.

14 Semismooth Newton method

In this section we present the semismooth Newton method for the nonsmooth equation in Banach spaces, especially the necessary optimality condition, e.g., the complemen- tarity condition (13.12): 0 λ = ϕc(Λx, λ). Consider the nonlinear equation F (y) = 0 in a Banach space X. The generalized Newton update is given by

k+1 k −1 k y = y − Vk F (y ), (14.1)

k where Vk is a generalized derivative of F at y . In the finite dimensional space for a locally Lipschitz continuous function F let DF denote the set of points at which F is n differentiable. For x ∈ X = R we define the Bouligand derivative ∂BF (x) as   ∂BF (x) = J : J = lim ∇F (xi) . (14.2) xi→x, xi∈DF

Here, DF is dense by Rademacher’s theorem which states that every locally Lipschitz continuous function in the finite dimensional space is differentiable almost everywhere. k Thus, we take Vk ∈ ∂BF (y ).

210 In infinite dimensional spaces notions of generalized derivatives for functions which are not C1 cannot rely on Rademacher’s theorem. Here, instead, we shall mainly utilize a concept of generalized derivative that is sufficient to guarantee superlinear convergence of Newton’s method [?]. This notion of differentiability is called Newton derivative and is defined below. We refer to [?, ?, ?, ?] for further discussion of the notions and topics. Let X,Z be real Banach spaces and let D ⊂ X be an open set. Definition (Newton differentiable) (1) F : D ⊂ X → Z is called Newton dif- ferentiable at x, if there exists an open neighborhood N(x) ⊂ D and mappings G: N(x) → L(X,Z) such that |F (x + h) − F (x) − G(x + h)h| lim Z = 0. |h|→0 |h|X The family {G(y): y ∈ N(x)} is called a N-derivative of F at x. (2) F is called semismooth at y, if it is Newton differentiable at y and

lim G(y + t h)h exists uniformly in |h| = 1. t→0+ Semi-smoothness was originally introduced in [?] for scalar-valued functions. Con- vex functions and real-valued C1 functions are examples for such semismooth functions [?, ?] in the finite dimensional space. For example, if F (y)(s) = ψ(y(s)), point-wise, then G(y)(s) ∈ ψB(y(s)) is an N- derivative in Lp(Ω) → Lq(Ω) under appropriate conditions [?]. We often use ψ(s) = |s| and max(0, s) for the necessary optimality. Suppose F (y∗) = 0, Then, y∗ = yk + (y∗ − yk) and

k+1 ∗ −1 ∗ k ∗ k −1 k ∗ |y − y | = |Vk (F (y ) − F (y ) − Vk(y − y )| ≤ |Vk |o(|y − y |). Thus, the semismooth Newton method is q-superlinear convergent provided that the k ∗ Jacobian sequence Vk is uniformly invertible as y → y . That is, if one can select ∗ k ∗ a sequence of quasi-Jacobian Vk that is consistent, i.e., |Vk − G(y )| as y → y and invertible, (14.1) is still q-superlinear convergent.

14.1 Bilateral constraint Consider the bilateral constraint problem

min f(x) subject to φ ≤ Λx ≤ ψ. (14.3)

We have the optimality condition (13.22):

f 0(x) + Λ∗µ = 0

µ = max(0, µ + c (Λx − ψ) + min(0, µ + c (Λx − φ)) a.e.. Since ∂B max(0, s) = {0, 1}, ∂B min(0, s) = {−1, 0}, at s = 0, the semismooth Newton method for the bilateral constraint (14.3) is of the form of the primal-dual active set method: Primal dual active set method

211 1. Initialize y0 ∈ X and λ0 ∈ H. Set k = 0. 2. Set the active set A = A+ ∪ A− and the inactive set I by

A+ = {x ∈ Ω: µk+c (Λyk−ψ) > 0}, A− = {x ∈ Ω: µk+c (Λyk−φ) < 0}, I = Ω\A

3. Solve for (yk+1, λk+1)

J 00(yk)(yk+1 − yk) + J 0(yk) + Λ∗µk+1 = 0

Λyk+1(x) = ψ(x), x ∈ A+, Λyk−1(x) = φ(x), x ∈ A−, µk+1(x) = 0, x ∈ I

4. Stop, or set k = k + 1 and return to the second seep. Thus the algorithm involves solving the linear system of the form

 ∗ ∗      A ΛA+ ΛA− y f  ΛA+ 0 0   µA+  =  ψ  . ΛA− 0 0 µA− φ In order to gain the stability the following one-parameter family of the regularization can be used [?, ?]

µ = α (max(0, µ + c (Λy − ψ)) + min(0, µ + c (Λy − φ))), α → 1+.

14.2 L1 optimization R 1 As an important example, we discuss the case of ϕ(v) = Ω |v|2 dx (L -type optimiza- tion). The complementarity is reduced to the form

λ max(1 + , |λ + c v|) = λ + c v (14.4) for  ≥ 0. It is often convenient (but not essential) to use very small  > 0 to avoid the singularity for the implementation of algorithm. Let

 2 s +  , if |s| ≤ ,  2 2 ϕ(s) = (14.5) |s|, if |s| ≥ .

Then, (14.4) corresponds to the regularized one of (4.31):

min J(y) = F (y) + ϕ(Λy).

The semismooth Newton method is given by

F 0(y+) + Λ∗λ+ = 0  v+ λ+ = if |λ + v| ≤ 1 + ,    ! (14.6)   λ + c v t |λ + c v| λ+ + λ (λ+ + c v+)  |λ + c v|   = λ+ + c v+ + |λ + c v| λ if |λ + c v| > 1 + .

212 There is no guarantee that (14.6) is solvable for λ+ and is stable. In order to obtain the compact and unconditionally stable formula we use the damped and regularized algorithm with β ≤ 1;  v λ+ = if |λ + c v| ≤ 1 + ,      !  λ  λ + c v t |λ + c v| λ+ − β (λ+ + c v+)  max(1, |λ|) |λ + c v|   λ  = λ+ + c v+ + β |λ + c v| if |λ + c v| > 1 + .  max(1, |λ|) (14.7) λ Here, the purpose of the regularization |λ|∧1 is to automatically constrain the dual variable λ into the unit ball. The damping factor β is automatically selected to achieve the stability. Let λ λ + c v d = |λ + c v|, η = d − 1, a = , b = ,F = abt. |λ| ∧ 1 |λ + c v| Then, (14.7) is equivalent to

λ+ = (η I + β F )−1((I − β F )(c v+) + βd a), where by Sherman–Morrison formula 1 β (η I + β F )−1 = (I − F ). η η + β a · b Then, β d (η I + β F )−1βd a = a. η + β a · b Since F 2 = (a · b) F , 1 βd (η I + β F )−1(I − β F ) = (I − F ). η η + β a · b In order to achieve the stability, we let β d d − 1 = 1, i.e., β = ≤ 1. η + β a · b d − a · b Consequently, we obtain a compact Newton step 1 λ λ+ = (I − F )(c v+) + , (14.8) d − 1 |λ| ∧ 1 which results in; Primal-dual active set method (L1-optimization) 1. Initialize: λ0 = 0 and solve F 0(y0) = 0 for y0. Set k = 0.

2. Set inactive set Ik and active set Ak by

k k k k Ik = {|λ + c Λy | > 1 + }, and Ak = {|λ + c Λy | ≤ 1 + }.

213 3. Solve for (yk+1, λk+1) ∈ X × H:

 0 k+1 ∗ k+1  F (y ) + Λ λ = 0     k  k+1 1 k k+1 λ λ = k (I − F )(c Λy ) + k in Ak and  d − 1 |λ | ∧ 1     k+1 k+1  Λy =  λ in Ik.

4. Convergent or set k = k + 1 and Return to Step 2. This algorithm is unconditionally stable and is rapidly convergent forour test examples. Remark (1) Note that |v+|2 − (a · v+)(b · v+) (λ+, v+) = + a · v+, (14.9) d − 1 which implies stability of the algorithm. In fact, since  F 0(yk+1) − F 0(yk) + Λ∗(λk+1 − λk)    1  λk+1 − λk = (I − F k)(c Λyk+1) + ( − 1) λk,  |λk| we have 1 (F 0(yk+1) − F 0(yk), yk+1 − yk) + c ((I − F k)Λyk+1, Λyk+1) + ( − 1) (λk, Λyk+1). |λk| Suppose 0 2 F (y) − F (x) − (F (y), y − x) ≥ ω |y − x|X , we have k X 1 F (yk) + ω |yj − yj−1|2 + ( − 1) (λj−1, vj) ≤ F (y0). |λj−1| ∧ 1 j=1 where vj = Λyj. Note that (λk, vk+1) > 0 implies that (λk+1, vk+1) > 0 and thus inactive at k+1-step, pointwise. This fact is a key step for proving a global convergence of the algorithm. (2) If a · b → 1+, then β → 1. Suppose (λ, v) is a fixed point of (14.8), then  1  c λ + c v  1 1 − 1 − · v λ = v. |λ| ∧ 1 d − 1 |λ + c v| d − 1 Thus, the angle between λ and λ + c v is zero and 1 c  |v| |v|  1 − = − , |λ| ∧ 1 |λ| + c |v| − 1 |λ| |λ| ∧ 1 which implies |λ| = 1. It follows that λ + c v λ + c v = . max(|λ + c v|, 1 + )

214 That is, if the algorithm converges, a · b → 1 and |λ| → 1 and it is consistent. (3) Consider the substitution iterate:  F 0(y+) + Λ∗λ+ = 0,   (14.10)  + 1 +  λ = v , v = Λy.  max(, |v|) Note that  v  |v+|2 − |v|2 + |v+ − v|2  , v+ − v = dx |v| 2|v|

 (|v+| − |v|)2 + |v+ − v|2  = |v+| − |v| − . 2|v| Thus, + J(v ) ≤ J(v), where the equality holds only if v+. This fact can be used to prove the iterative method (14.10) is globally convergent [?, ?]. It also suggests to use the hybrid method; for 0 < µ < 1 µ 1 λ λ+ = v+ + (1 − µ)( (I − F )v+ + ), max(, |v|) d − 1 max(|λ|, 1) in order to gain the global convergence property without loosing the fast convergence of the Newton method.

15 Non-convex nonsmooth optimization

In this section we consider a general class of non-convex, nonsmooth optimization problems. Let X be a Banach space and continuously embed to H = L2(Ω)m. Let J : X → R+ is C1 and N is nonsmooth functional on H is given by Z N(y) = h(y(ω)) dω. Ω where h : Rm → R is lower semi-continuous. Consider the minimization on X; min J(x) + N(x) (15.1) subject to x ∈ C = {x(ω) ∈ U a.e. } where U is a closed convex set in Rm. α 2 For example, we consider h(u) = 2 |u| + |u|0 where |s|0 = 1, s 6= 0 and |0|0 = 0. For an integrable function u Z |u(ω)|0 dω = meas({u 6= 0}). Ω Thus, one can formulate the volume control for material design and and time scheduling problem for the optimal control problem as (15.1).

215 15.1 Existence of Minimizer In this section we develop a existence method for a general class of minimizations

min G(x) (15.2) on a Banach space X, including (15.1). Here, the constrained problem x ∈ C is formulated as G(x) = G(x) + IC(x) using the indicator function of the constrained set ∗ C. Let X be a reflexible space or X = X˜ for some normed space X˜. Thus xn * x means the weak or weakly star convergence in X. To prove the existence of solutions to (15.2) we modify a general existence result given in [?]. Theorem (Existence) Let G be an extended real-valued functional on a Banach space X satisfying the following properties: (i) G is proper, weakly lower semi-continuous and G(x) +  |x|p for each  > 0 is coercive for some p. (ii) for any sequence {x } in X with |x | → ∞, xn * x¯ weakly in X, and {G(x )} n n |xn| n bounded from above, we have xn → x¯ strongly in X and there exists ρ ∈ (0, |x |] |xn| n n and n0 = n0({xn}) such that

G(xn − ρnx¯) ≤ G(xn), (15.3)

for all n ≥ n0.

Then minx∈X G(x) admits a minimizer.

Proof: Let {n} ⊂ (0, 1) be a sequence converging to 0 from above and consider the family of auxiliary problems p min G(x) + n |x| . (15.4) x∈x By (i), every minimizing sequence for (15.1) is bounded. Extracting a weakly conver- gent subsequence, the existence of a solution xn ∈ X for (15.1) can be argued in a standard manner. Suppose {xn} is bounded. Then there exists a weakly convergent subsequence, ∗ ∗ + denoted by the same symbols and x such that xn * x . Passing to the limit n → 0 in   G(x ) + n |x |p ≤ G(x) + n |x|p for all x ∈ X n 2 n 2 and using again (i) we have

G(x∗) ≤ G(x) for all x ∈ X and thus x∗ is a minimizer for G. We now argue that {xn} is bounded, and assume to the contrary that limn→∞ |xn| = ∞. Then a weakly convergent subsequence can be extracted from xn such that, again |xn| dropping indices, xn * x¯, for somex ¯ ∈ X. Since G(x ) is bounded from (ii) |xn| n x n → x¯ strongly in X. (15.5) |xn|

Next, choose ρ > 0 and n0 according to (15.3) and thus for all n ≥ n0

p p p G(xn) + n |xn| ≤ G(xn − ρx¯) + n |xn − ρx¯| ≤ G(xn) + n |xn − ρx¯| .

216 It follows that

xn xn ρ xn |xn| ≤ |xn − ρx¯| = |xn − ρn + ρn( − x¯)| ≤ |xn|(1 − ) + ρn| − x¯|. |xn| |xn| |xn| |xn| This implies that x 1 ≤ | n − x¯|, |xn| which give a contradiction to (15.5), and concludes the proof.  The first half of condition (ii) is a compactness assumption and (15.3) is a compat- ibility condition [?]. Remark (1) If G is not bounded below. Define the (sequential) recession functional G∞ associated with G. It is the extended real-valued functional defined by 1 G∞(x) = inf lim inf G(tnxn). (15.6) xn*x,tn→∞ n→∞ tn

2 and assume G∞(x) ≥ 0. Then, G(x) +  |x| is coercive. If not, there exists a sequence {x } satisfying G(x ) +  |x |2 is bounded but |x | → ∞. Let w − lim xn =x ¯. n n n n |xn|

1 xn G(|xn| ) + |xn| → 0. |xn| |xn|

Since G (¯x) ≤ 1 G(|x | xn ), which yields a contradiction. ∞ |xn| n |xn| Moreover, in the proof note that

1 1 xn G∞(¯x) ≤ lim inf G(xn) = lim inf G( |xn|) ≤ 0. n→∞ |xn| n→∞ |xn| |xn|

If we assume G∞ ≥ 0, then G∞(¯x) = 0. Thus, (15.3) must hold, assuming G∞(¯x) = 0. In [?] (15.3) is assumed for all x ∈ X and G∞(¯x) = 0, i.e.,

G(x − ρx¯) ≤ G(x).

Thus, our condition is much weaker. In [?] we find a more general version of (15.3): there exists a ρn and zn such that ρn ∈ (0, |xn|] and zn ∈ X

G(xn − ρn zn) ≤ G(xn), |zn − z| → 0 and |z − x¯| < 1.

(2) We have also proved that a sequence {xn} that minimizes the regularized problem for each n has a weakly convergent subsequence to a minimizer of G(x).

Example 1 If ker(G∞) = 0, thenx ¯ = 0 and the compactness assumption implies |x¯| = 1, which is a contradiction. Thus, the theorem applies. Example 2 (`0 optimization) Let X = `2 and consider `0 minimization:

1 G(x) = |Ax − b|2 + β |x| , (15.7) 2 2 0 where 2 |x|0 = the number of nonzero elements of x ∈ ` ,

217 the counting measure of x. Assume N(A) is finite and R(A) is closed. The, one can apply Theorem to prove the existence of solutions to (15.7). That is, suppose xn |xn| → ∞, G(xn) is bounded from above and |xn| * z. First, we show that z ∈ N(A) xn and |xn| → z. Since {G(xn)} is bounded from above, here exists M such that 1 J(x ) = |Ax − b|2 + β |x | ≤ M, for all n. (15.8) n 2 n 2 n 0

∗ 1 2 Since R(A) closed, by the closed range theorem X = R(A )+N(A) and let xn = xn+xn 1 1 2 1 Consequently, 0 ≤ |Axn|2 − 2(b, Axn)2 + |b|2 ≤ K and |Axn| is bounded.

1 xn 2 1 ∗ xn 2 0 ≤ A( ) 2 − 2 ( A b, )2 + |b|2 → 0. (15.9) |xn|2 |xn|2 |xn|2

1 By the closed range theorem this implies that |xn is bounded and xn → x¯1 = 0 in `2. i |xn|2 2 Since xn * x¯2 and by assumption dim N(A) < ∞ it follows that xn → z =x ¯2 |xn|2 |xn|2 strongly in `2. Next, (15.3) holds. Since

xn |xn|0 = | |0 |xn| and thus |z|0 < ∞ (15.3) is equivalent to showing that

1 2 1 2 |(xn + xn − ρ z)i|0 ≤ |(xn + xn)i|0 for all i,

1 2 and for all n sufficiently large. Only the coordinates for which (xn + xn)i = 0 with zi 6= 0 require our attention. Since |z|0 < ∞ there exists ˜i such that zi = 0 for all i > ˜i. ˜ 1 2 For i ∈ {1,..., i} we define Ii = {n :(xn + xn)i = 0, zi 6= 0}. These sets are finite. 1 1 2 In fact, if Ii is infinite for some i ∈ {1,...,˜i}, then lim (x + x )i = 0. n→∞,n∈Ii |xn|2 n n 1 1 Since lim (x )i = 0 this implies that zi = 0, which is a contradiction. n→∞ |xn|2 n Example 3 (Obstacle problem) Consider

Z 1 1 2 min G(u) = ( |ux| − fu) dx (15.10) 0 2 subject to u( 1 ) ≤ 1. Let X = H1(0, 1). If |u | → ∞ and v = un * v. By the 2 n n |un| assumption in (ii)

Z 1 Z 1 n 1 2 1 G(u ) |(vn)x| dx ≤ fvn dx + 2 → 0 2 0 |un| 0 |un| and thus (v ) → 0 and v → v = c for some constant c. But, since v ( 1 ) ≤ 1 → 0, n x n n 2 |un| 1 c = v( 2 ) ≤ 0 Thus, Z 1 G(un − ρ v) = G(un) + ρvf(x) dx ≤ G(un) 0 R 1 R 1 if 0 f(x) dx ≥ 0. That is, if 0 f(x) dx ≥ 0 (15.10) has a minimizer.

218 Example 4 (Friction Problem) Z 1 1 2 min ( |ux| − fu) dx + |u(1)| (15.11) 0 2

Using exactly the same argument above. we have (vn) → 0 and vn → v = c. But, we have Z 1 G(un − ρv) = G(un) + ρc f(x) dx + |un(1) − ρc| − |un(1)| 0 where |un(1) − ρc| − |un(1)| ρ = |vn(1) − c| − |vn(1)| ≤ 0 |un| |un| for sufficiently large n since |u (1) − ρc| − |u (1)| n n → |c − ρcˆ | − |c| = −ρˆ|c| |un| whereρ ˆ = lim ρn . Thus, if | R 1 f(x) dx| ≤ 1, then (15.11) has a minimizer. n→∞ |un| 0 Example 5 (L∞ Laplacian). Z 1 min |u − f| dx subject to |ux| ≤ 1 0

Similarly, we have (vn)x → 0 and vn → v = c.

G(un − ρc) − G(un) f f = G(vn − ρcˆ − ) − G(vn − ) ≤ 0 |un| |un| |un| for sufficiently large n. Example 6 (Elastic contact problem) Let X = H1(Ω)d and ~u is the deformation field. Given, the boundary body force ~g, consider the elastic contact problem: Z 1 Z Z min  : σ dx − ~g · ~udsx + |τ · ~u| dsx, subject to n · ~u ≤ ψ, (15.12) Ω 2 Γ1 Γ2 where we assume the linear strain: 1 ∂u ∂u (u) = ( i + j ) 2 ∂xj ∂xi and the Hooke’s law: σ = µ + λ trI with positive Lame constants µ, λ. If |~un| → ∞ and G(un) is bounded we have

(vn) → 0 ∇(vn) → 0 for ~v = ~un * v. Thus, one can show that ~v → v = (v1, v2) ∈ {(−Ax + C , Ax + n | ~un| n 2 1 1 2 C2)} for the two dimensional problem. Consider the case when Ω = (0, 1) and Γ1 = {x2 = 1} and Γ2 = {x2 = 0}. As shown in the previous examples Z Z G(~un − ρ v) − G(~un) 2 1 = ρnˆ · ~gv dsx + ρτˆ · ~gv dsx |~un| Γ1 Γ1 Z 1 1 1 + (|(vn) − ρvˆ | − |(vn) |) dsx Γ2

219 2 1 Since v ∈ C = {Ax1 + C2 ≤ 0} and for v = C1 1 1 1 1 2 1 1 |(vn) − ρvˆ | − |(vn) | → |(v) − ρvˆ | − |(v) | = −ρˆ|v |, if we assume | R τ · ~gds | ≤ 1 and R n · ~gv ds ≥ 0 for all v ∈ C, then (15.12) has Γ1 x Γ1 2 x 2 a minimizer. Example 7 Z 1 1 2 2 min ( (a(x)|ux| + |u| ) − fu) dx 0 2 1 with a(x) > 0 except a( 2 ) = 0.

15.2 Necessary Optimality Let X be a Banach space and continuously embed to H = L2(Ω)m. Let J : X → R+ is C1 and N is nonsmooth functional on H = L2(Ω)m is given by Z N(y) = h(y(ω)) dω. Ω where h : Rm → R is a lower semi-continuous functional. Consider the minimization on H; min J(y) + N(y) (15.13) subject to y ∈ C = {y(ω) ∈ U a.e. } where U is a closed convex set in Rm. Given y∗ ∈ C, define needle perturbations of (the optimal solution) y∗ by  y∗ on |ω − s| > δ y = u on |ω − s| ≤ δ for δ > 0, s ∈ ω, and u ∈ U. Assume that |J(y) − J(y∗) − (J 0(y∗), y − y∗)| ∼ o(meas({|s| < δ}). (15.14) Let H is the Hamiltonian defined by H(y, λ) = (λ, y) − h(y), y, λ ∈ Rm. Then, we have the pointwise maximum principle. Theorem (Pointwise Optimality) If an integrable function y∗ attains the minimum and (15.14) holds, then for a.e. ω ∈ Ω J 0(y∗) + λ = 0 (15.15) H(u, λ(ω)) ≤ H(y∗(ω), λ(ω)) for all u ∈ U. Proof: Since 0 ≥ J(y) + N(y) − (J(y∗) + N(y∗))

Z = J(y) − J(y∗) − (J 0(y∗), y − y∗) + (J 0(y∗), u − y∗) + h(u) − h(y∗)) dω |ω−s|<δ

220 Since h is lower semicontinuos, it follows from (15.14) that at a Lubegue point s = ω ∗ of y (15.15) holds.  For λ ∈ Rm define a set-valued function by

Φ(λ) = argmaxy∈U {(λ, y) − h(y)}. Then, (15.15) is written as y(ω) ∈ Φ(−J 0(y(ω)) The graph of Φ is monotone:

(λ1 − λ2, y1 − y2) ≥ 0 for all y1 ∈ Φ(λ1), y2 ∈ Φ(λ2). but it not necessarily maximum monotone. Let h∗ be the conjugate function of h defined by h∗(λ) = max H(λ, y) = max{(λ, u) − h(u)}. u∈U u∈U Then, h∗ is necessarily convex. Since if u ∈ Φ(λ), then

h∗(λˆ) ≥ (λ,ˆ y) − h(u) = (λˆ − λ, y) + h∗(u), thus u ∈ ∂h∗(λ). Since h∗ is convex, ∂h∗ is maximum monotone and thus ∂h∗ is the maximum monotone extension of Φ. Let h∗∗ be the bi-conjugate function of h;

h∗∗(x) = sup{(x, λ) − sup H(y, λ)} λ y whose epigraph is the convex envelope of the epigraph of h, i.e.,

∗∗ epi(h ) = {λ1 y1 + (1 − λ) y2 for all y1, y2 ∈ epi(h) and λ ∈ [0, 1]}.

Then, h∗∗ is necessarily convex and is the convexfication of h and

λ ∈ ∂h∗∗(u) if and only if u ∈ ∂h∗(λ) and h∗(λ) + h∗∗(u) = (λ, u). Consider the relaxed problem: Z min J(y) + h∗∗(y(ω)) dω. (15.16) Ω Since h∗∗ is convex, Z N ∗∗(y) = h∗∗(y) dω Ω is weakly lower semi-continuous and thus there exits a minimizer y of (15.16). It thus follows from Theorem (Pointwise Optimality) that the necessary optimality of (15.16) is given by −J 0(y)(ω) ∈ ∂h∗∗(y(ω)), or equivalently y ∈ ∂h∗(−J 0(y)). (15.17)

221 Lemma 2.2 Assume

J(ˆy) − J(y) − J 0(y)(ˆy − y) ≥ 0 for ally ˆ ∈ C, where y(ω) ∈ Φ(−J 0(y(ω))). Then, y minimizes the original cost functional (15.13). Proof: Since Z (J 0(y), yˆ − y) + h(ˆy) − h(y) ≥ 0, Ω we have Z Z J(ˆy) + h(ˆy) dω ≥ J(y) + h(y) dω for ally ˆ ∈ C. Ω Ω One can construct a C1 realization of h∗ by  h∗(λ) = min{ |p − λ|2 + h∗(p)}.  p 2

∗ 1 ∗ 0 ∗ 0 ∗ That is, h is C and (h ) is Lipschitz and (h ) is the Yosida approximation of ∂h . ∗ ∗ If we let h = (h ) , then h is convex and 1 h (u) = min{ |y − u|2 + h∗∗(y)}.  2 Consider the relaxed problem Z min J(y) + h(y(ω)) dω. (15.18) Ω The necessary optimality condition becomes

∗ 0 0 y(ω) = (h ) (−J (y(ω))). (15.19)

Define the proximal approximation by  p (λ) = argmin{ |p − λ|2 + h∗(p)}  2 Then, ∗ 0 (h ) =  (λ − p(λ)). Thus, one can develop the semi smooth Newton method to solve (15.19). ∗∗ Since h ≤ h , one can argue that a sequence of minimizers y of (15.18) to a minimizer of (15.16). In fact, Z Z J(y) + h(y) dω ≤ J(y) + h(y) dω. Ω Ω for all y ∈ C. Suppose |y| is bounded, then y has a weakly convergent subsequence to y¯ ∈ C and Z Z J(¯y) + h∗∗(¯y) dω ≤ J(y) + h∗∗(y) dω. Ω Ω for all y ∈ C.

222 Example (L0(Ω) optimization) Based on our analysis one can analyze when h involves the volume control by L0(Ω), i.e., Let h on R be given by α h(u) = |u|2 + β|u| , (15.20) 2 0 where ( 0 if u = 0 |u|0 = 1 if u 6= 0. That is, Z |u|0 dω = meas({u(ω) 6= 0}). Ω In this case Theorem applies with √  q for |q| ≥ 2αβ  α Φ(q) := argminu∈R(h(u) − qu) = √ (15.21)  0 for |q| < 2αβ. and the conjugate function h∗ of h;

 1 2 √ − 2α |q| + β for |q| ≥ 2αβ ∗  −h (q) = h(Φ(q)) − q Φ(q) = √  0 for |q| < 2αβ.

The bi-conjugate function h∗∗(u) is given by  q  α |u|2 + β |u| > 2β  2 α h∗∗(u) = q  √ 2β  2αβ |u| |u| ≤ α .

Clearly, Φ : R → R is monotone, but it is not maximal monotone. The maximal monotone extension Φ˜ = (∂h∗∗)−1 = ∂h∗ of Φ and is given by √  q for |q| > 2αβ  α   √  0 for |q| < 2αβ ˜  Φ(q) ∈ √  q  [ α , 0] for q = − 2αβ   √  q [0, α ] for q = 2αβ. Thus, λ − p (λ) (h∗)0 =    where 1 p (λ) = argmin{ |p − λ|2 + h∗(p)}.  2

223 We have √  λ |λ| < 2αβ    √ √ √  2αβ 2αβ ≤ λ ≤ (1 +  ) 2αβ  α p(λ) = √ √  √  − 2αβ 2αβ ≤ −λ ≤ (1 + ) 2αβ  α   λ  √   |λ| ≥ (1 + ) 2αβ. 1+ α α √ (15.22)  0 |λ| < 2αβ    √ √ √  λ− 2αβ 2αβ ≤ λ ≤ (1 +  ) 2αβ   α (h∗)0(λ) =  √ √ √  λ+ 2αβ 2αβ ≤ −λ ≤ (1 +  ) 2αβ   α   √  λ  α+ |λ| ≥ (1 + α ) 2αβ. Example (Authority optimization) We analyze the authority optimization of two un- knowns u1 and u2, i.e., the at least one of u1 and u2 is zero. We formulate such a problem by using α h(u) = (|u|2 + |u |2) + β|u u | , (15.23) 2 1 2 1 2 0 The, we have √  1 2 2 (|p1| + |p2| ) − β |p1|, |p2| ≥ 2αβ  2α  ∗  1 2 √ h (p1, p2) = 2α |p1| |p1| ≥ |p2|, |p2| ≤ 2αβ   √  1 2 2α |p2| |p2| ≥ |p1|, |p1| ≤ 2αβ

224 Also,we have √  1  1+  (λ1, λ2) |λ1|, |λ2| ≥ (1 + α ) 2αβ  α    1 √  √ √  √  ( 1+  λ1, ± 2αβ) |λ1| ≥ (1 + α ) 2αβ, 2αβ ≤ |λ2| ≤ (1 + α ) 2αβ  α   √ √ √ √  1    (± 2αβ, 1+  λ2) |λ2| ≥ (1 + α ) 2αβ, 2αβ ≤ |λ1| ≤ (1 + α ) 2αβ  α   √ √ √  (λ , λ ) |λ | ≤ |λ | ≤ (1 +  ) 2αβ, 2αβ ≤ |λ | ≤ (1 +  ) 2αβ  2 2 2 1 α 2 α   √ √ √ p (λ , λ ) =    1 2 (λ1, λ1) |λ1| ≤ |λ2| ≤ (1 + α ) 2αβ, 2αβ ≤ |λ1| ≤ (1 + α ) 2αβ   √  λ1   ( 1+  , λ2) |λ1| ≥ (1 + α )| ≥ |λ2|, |λ2| ≤ 2αβ  α   √  λ2   (λ1, 1+  ) |λ2| ≥ (1 + α )| ≥ |λ1|, |λ1| ≤ 2αβ  α   √    (λ2, λ2) |λ2| ≤ |λ1| ≤ (1 + α )|λ2|, |λ2| ≤ 2αβ   √   (λ1, λ1) |λ1| ≤ |λ2| ≤ (1 + α )|λ1|, |λ1| ≤ 2αβ (15.24)

15.3 Augmented Lagrangian method Given λ ∈ H and c > 0, consider the augmented Lagrangian functional; c L (x, y, λ) = J(x) + (λ, x − y) + |x − y|2 + N(y). c H 2 H

Since Z c 2 Lc(x, y, λ) = J(x) + ( |x − y| + (λ, x − y) + h(y)) dω Ω 2 it follows from Theorem that if (¯x, y¯) minimizes Lc(x, y, λ), then the necessary opti- mality condition is given by

−J 0(¯x) = λ + c (¯x − y¯)

y¯ ∈ Φc(λ + c x¯). where c Φ (p) = argmax{(p, y) − |y|2 − h(y)} c 2 Thus, the augmented Lagrangian update is given by

λn+1 = λn + c (xn − yn) where (xn, yn) solves −J 0(xn) = λn + c (xn − yn)

n n n y ∈ Φc(λ + c x ).

225 ∗ Let L (λ) = inf(x,y) Lc(x, y, λ). Since c L∗(λ) = − sup{(x, −λ) + (y, λ) − J(x) − h(y) − |x − y|2|} (x,y) 2 and thus L∗(λ) is concave and

∗ y − x ∈ ∂L (λ) for (x, y) = argmin(x,y)Lc(x, y, λ).

Let λ¯ is a maximizer of L∗(λ). Since

L∗(λ) ≤ L∗(λ¯) + (λ − λ,¯ x¯ − y¯) and L∗(λ) − L∗(λ¯) ≤ 0 we have y − x = 0 and thus we obtain the necessary optimality

−J 0(x) = λ

x ∈ Φc(λ + c x).

Note that 1 2 ∗ supp{− 2c |p − λ| + (x, p) − h (p)} 1 = sup{− sup{(y, p) − h(y)} − |p − λ|2 + (x, p)} p y 2c

c 2 = inf{h(y) + (λ, x − y) + |y − x| } = ϕc(x, λ). y 2 Let c L (x, y, λ) = J(x) + (λ, x − y) + |y − x|2 + h(y). c H 2 H Then 1 L∗(λ) = inf inf{J(x) + |p − λ|2 + (x, p) + h∗(p)} x p 2c If we define the proximal approximation by 1 p (λ) = argmin{ |p − λ|2 + h∗(p)}, c 2c then 0 (ϕc) (x, λ) = pc(λ + c x). (15.25) If we define the proximal approximation by 1 p (λ) = argmin{ |p − λ|2 + h∗(p)}, c 2c then 0 (ϕc) (x, λ) = pc(λ + c x). (15.26) Since 1 1 1 − |p − λ|2 + (x, p) = − |p − (λ + c x)|2 + (|λ + c x)|2 − |λ|2) 2c 2c 2c

226 Also, we have λ − p (λ) (ϕ )0(λ) = c . (15.27) c c 0 Thus, the maximal monotone extension Φ˜ c of the graph Φc is given by (ϕc) (λ), the Yosida approximation of ∂h∗(λ) and results in the update:

−J 0(xn) = λn + c (xn − yn)

n ∗ 0 n n y = (ϕc ) (λ + c x ). In the case c = 0, we have the multiplier method

λn+1 = λn + α (xn − yn), α > 0 where (xn, yn) solves −J 0(xn) = λn

yn ∈ Φ(λn) In the case λ = 0 we have the proximal iterate of the form

n+1 0 n 0 n x = (ϕc) (x − α J (x )).

15.4 Semismooth Newton method The necessary optimality condition is written as

−J 0(x) = λ

0 λ = hc(x, λ) = pc(λ + c x).

α 2 is the semi-smooth equation. For the case of h = 2 |x| + β|x|0 we have from (15.22)- (15.26) √  λ if |λ + cx| ≤ 2αβ    √ √ c √  2αβ if 2αβ ≤ λ + cx ≤ (1 + α ) 2αβ 0  hc(x, λ) = √ √ √  − 2αβ if 2αβ ≤ −(λ + cx) ≤ (1 + c ) 2αβ  α   √  λ c α+c if |λ + cx| ≥ (1 + α ) 2αβ. Thus, the semi-smooth Newton update is √  xn+1 = 0 if |λn + cxn| ≤ 2αβ   √ √ √  n+1 n n c  λ = 2αβ if 2αβ < λ + cx < (1 + α ) 2αβ 0 n+1 n+1  −J (x ) = λ , √ √ √  λn+1 = − 2αβ if 2αβ < −(λn + cxn) < (1 + c ) 2αβ  α   √  n+1 λn+1 n n c x = α if |λ + cx | ≥ (1 + α ) 2αβ.

227 α 2 2 n n n For the case h(u1, u2) = 2 (|u1| + |u2| ) + β|u1u2|0. Let µ = λ + cu . From (15.24) the semi-smooth Newton update is

 n+1 √ un+1 = λ |µn|, |µn| ≥ (1 +  ) 2αβ  α 1 2 α    λn+1 √ √ √ √  un+1 = 1 , λn+1 = ± 2αβ |µn| ≥ (1 +  ) 2αβ, 2αβ ≤ |µn| ≤ (1 +  ) 2αβ  1 α 2 1 α 2 α    √ λn+1 √ √ √  λn+1 = ± 2αβ, un+1 = 2 |µn| ≥ (1 +  ) 2αβ, 2αβ ≤ |µn| ≤ (1 +  ) 2αβ  2 α 2 α 1 α    λn+1 √ √ √  λn+1, λn+1, un+1 = 1 |µn| ≤ |µn| ≤ (1 +  ) 2αβ, 2αβ ≤ |µn| ≤ (1 +  ) 2αβ  1 2 1 α 2 1 α 2 α   λn+1 √ √ √ λn+1 = λn+1, un+1 = 2 |λ | ≤ |λ | ≤ (1 +  ) 2αβ, 2αβ ≤ |λ | ≤ (1 +  ) 2αβ  1 2 2 α 1 2 α 1 α    λn+1 √  un+1 = 1 , un+1 = 0 |µn| ≥ (1 +  )| ≥ |µn|, |µn| ≤ 2αβ  1 α 2 1 α 2 2    λn+1 √  un+1 = 2 , un+1 = 0 |µn| ≥ (1 +  )| ≥ |µn|, |µn| ≤ 2αβ  2 α 1 2 α 1 1   √  λn+1 = λn+1 un+1 = 0 |µn| ≤ |µn| ≤ (1 +  )|µn|, |µn| ≤ 2αβ  2 2 2 2 1 α 2 2   √  n+1 n+1 n+1 n n  n n λ1 = λ2 , u1 = 0 |µ1 | ≤ |µ2 | ≤ (1 + α )|µ1 |, |µ1 | ≤ 2αβ. 15.5 Constrained optimization Consider the constrained minimization problem of the form; Z min J(x, u) = (`(x) + h(u)) dω over u ∈ U, (15.28) Ω subject to the equality constraint

E˜(x, u) = Ex + f(x) + Bu = 0. (15.29)

Here U = {u ∈ L2(Ω)m : u(ω) ∈ U a.e. in Ω}, where U is a closed convex subset of Rm. Let X is a closed subspace of L2(Ω)n and E : X × U → X∗ with E ∈ L(X,X∗) bounded invertible, B ∈ L(L2(Ω)m,L2(Ω)n) and f : L2(Ω)n → L2(Ω)n Lipschitz. We assume E˜(x, u) = 0 has a unique solution x = x(u) ∈ X, given u ∈ U. We assume there exist a solution (¯x, x¯) to Problem (15.33). Also, we assume that the adjoint equation:

∗ 0 (E + fx(¯x)) p + ` (¯x) = 0. (15.30) has a solution in X. It is a special case of (15.1) when J(u) is the implicit functional of u: Z J(u) = `(x(ω)) dω, Ω where x = x(u) ∈ X for u ∈ U is the unique solution to Ex + f(x) + Bu = 0. To derive a necessary condition for this class of (nonconvex) problems we use a maximum principle approach. For arbitrary s ∈ Ω, we shall utilize needle perturbations

228 of the optimal solutionu ¯ defined by   u on {ω : |ω − s| < δ} v(ω) = (15.31)  u¯(ω) otherwise, where u ∈ U is constant and δ > 0 is sufficiently small so that {|ω − s| < δ} ⊂ Ω. The following additional properties for the optimal statex ¯ and each perturbed state x(v) will be used:

 2 1 |x(v) − x¯| = o(meas({|ω − s| < δ})) 2  L2(Ω)   R  2 (15.32) Ω `(·, x(v)) − `(·, x¯) − `x(·, x¯)(x(v) − x¯) dω = O(|x(v) − x¯|L2(Ω))    2 hf(·, x(v)) − f(·, x¯) − fx(·, x¯)(x(v) − x¯), piX∗,X = O(|x(v) − x¯|X ) Theorem Suppose (¯x, u¯) ∈ X × U is optimal for problem (15.33) , that p ∈ X satisfies the adjoint equation (15.30) and that (??), (??), and (15.32) hold. Then we have the necessary optimality condition thatu ¯(ω) minimizes h(u) + (p, Bu) pointwise a.e. in Ω.

Proof: By the second property in (15.32) we have R  0 ≤ J(v) − J(¯u) = Ω `(·, x(v)) − `(·, x(¯u)) + h(v) − h(¯u) dω

R 2 = Ω (`x(·, x¯)(x − x¯) + h(v) − h(¯u)) dω + O(|x − x¯| ), where (`x(·, x¯)(x − x¯), p) = −h(E + fx(·, x¯)(x − x¯)), pi and 0 = hE(x − x¯) + f(·, x) − f(·, x¯) + B(v − u¯), pi

2 = hE(x − x¯) + fx(·, x¯)(x − x¯) + B(v − u¯), pi + O(|x − x¯| ) Thus, we obtain Z 0 ≤ J(v) − J(¯u) ≤ ((−B∗p, v − u¯) + h(v) − h(¯u)) dω. Ω Dividing this by meas({|ω − s| < δ}) > 0 , letting δ → 0, and using the first property in (15.32) we obtain the desired result at a Lebesgue point s

ω → (−B∗p, v − u¯) + h(v) − h(¯u)) since h is lower semi continuous.  Consider the relaxed problem of (15.33): Z min J(x, u) = (`(x) + h∗∗(u)) dω over u ∈ U, (15.33) Ω

229 We obtain the pointwise necessary optimality

 Ex¯ + f 0(¯x) + Bu¯ = 0    (E + f 0(¯x))∗p + `0(¯x) = 0    u¯ ∈ ∂h∗(−B∗p), or equivalently ∗ 0 ∗ λ = (hc ) (u, λ), λ = B p. ∗ ∗∗ where hc is the Yoshida-M approximation of h .

15.6 Algoritm Based the complementarity condition (??) we have the Primal dual active method for p α 2 p L optimization (h(u) = 2 |u| + β |u| .): Primal-Dual Active method (Lp(Ω)-optimization)

1. Initialize u0 ∈ X and λ0 ∈ H. Set n = 0. 2. Solve for (yn+1, un+1, pn+1)

Eyn+1 + Bun+1 = g, E∗pn+1 + `0(yn+1) = 0, λn+1 = B∗pn+1

and λn+1 un+1 = − if ω ∈ {|λn + c un| > (µ + cu } α + βp|un|p−2 p p

n+1 n n λ = µp if ω ∈ {µp ≤ λ + cu ≤ µp + cun}.

n+1 n n λ = −µp if ω ∈ {µp ≤ −(λ + cu ) ≤ µp + cun}.

n+1 n n u = 0, if ω ∈ {|λ + cu | ≤ µp}. 3. Stop, or set n = n + 1 and return to the second seep.

16 `0 optimization

2 Consider `0 minimization in `

J(x) = F (x) + β N0(x) where X N0(x) = |xi|0 i counts the number of nonzero elements of `2. Necessary optimality: Letx ¯ be a minimizer. Then for each coordinate

Gi(x ¯i) + β |x¯i|0 ≤ Gi(xi) + β |xi|0 for all xi ∈ R,

230 where Gi(xi) = F (¯x + (xi − x¯i) ei)

Suppose xi → Gi(xi) is strict convex. If zi is a minimizer of Gi, then

Fxi (z) = 0 and Gi(z) + β < Gi(0), then zi =x ¯i is a minimizer of Gi(xi) + β |xi|0. Otherwisex ¯i = 0 is a minimizer Thus, we obtain the necessary optimality

0 F (¯x) + λ = 0, λix¯i = 0

x¯i = 0 if Gi(0) − Gi(¯xi) ≤ β, λi = 0 if Gi(0) − Gi(¯xi) > β

Note that

Z zi 00 1 00 zi 2 Gi(0) − Gi(zi) = (z − s)Gi (s) ds ∼ FGi ( )|zi| . 0 2 2 and Z z 0 0 0 00 00 zi λi = −Gi(0) = −(Gi(0) − Gi(zi)) = Gi (s) ds ∼ Gi ( )zi. 0 2 Thus, we have approximate complementarity

2 00 zi |λi| < 2βGi ( 2 ) ⇒ x¯i = 0

2 2β λi = 0 ⇒ |x¯i| ≥ 00 zi Gi ( 2 ) Equivalently, 2 λk = 0 if k ∈ {k : |λk + Λkxk| > 2βΛk} (16.1) 2 λj = 0 if j ∈ {j : |λj + Λjxj| ≤ 2βΛj}

00 zi where Λi = Gi ( 2 ). 1 2 For the quadratic case F (x) = 2 (|Ax − b| + α (x, P x) with nonnegative P and 00 zi 2 α ≥ 0. We have Gi ( 2 ) = |Ai| + Pii and thus ∗ A (Ax − b) + P x + λ = 0, λixi = 0 with complementarity condition (16.1) is the necessary optimality condition.

17 Exercise

Problem 1 If A is a symmetric matrix on Rn and b ∈ Rn is a vector. Let F (x) = 1 t t 0 2 x Ax − b x. Show that F (x) = Ax − b. 2 Problem 2 Sequences {fn}, {gn}, {hn} are Cauchy in L (0, 1) but not on C[0, 1]. Problem 3 Show that ||x| − |y|| ≤ |x − y| and thus the norm is continuous. Problem 4 Show that all norms are equivalent on a finite dimensional vector space.

231 Problem 5 Show that for a closed linear operator T , the domain D(T ) is a Banach space if it is equipped by the graph norm

2 2 1/2 |x|D(T ) = (|x|X + |T x|Y ) .

Problem 6 Complete the DC motor problem. Problem 7 Consider the spline interpolation problem (3.8). Find the Riesz representa- ˜ 2 tions Fi and Fj in H0 (0, 1). Complete the spline interpolation problem (3.8). Problem 8 Derive the necessary optimality for L1 optimization (3.27) with U = [−1, 1].

Problem 9 Solve the constrained minimization in a Hilbert space X; 1 min |x|2 − (a, x) 2 subject to (yi, x) = bi, 1 ≤ i ≤ m1, (˜yj, x) ≤ cj, 1 ≤ j ≤ m2.

m1 m2 over x ∈ X. We assume {yi}i=1, {y˜j}j=1 are linearly independent. Problem 10 Solve the pointwise obstacle problem:

Z 1 1 du min ( | |2 − f(x)u(x)) dx 0 2 dx subject to 1 2 u( ) ≤ c , u( ) ≤ c , 3 1 3 2 1 over a Hilbert space X = H0 (0, 1) for f = 1. Problem 11 Find the differential form of the necessary optimality for

Z 1 1 du 2 2 1 2 2 min ( (a(x)| | +c(x)|u| )−f(x)u(x)) dx+ (α1|u(0)| +α2|u(1)| )−(g1, u(0))−(g2, u(1)) 0 2 dx 2 over a Hilbert space X = H1(0, 1).

Problem 12 Let B ∈ Rm,n. Show that R(B) = Rm if and only if BBt is positive definite on Rm. Show that x∗ = Bt(BBt)−1c defines a minimum norm solution to Bx = c.

Problem 13 Consider the optimal control problem (3.48) (Example Control problem) m with Uˆ = {u ∈ R : |u|∞ ≤ 1}. Derive the optimality condition (3.49). Problem 14 Consider the optimal control problem (3.48) (Example Control problem) m with Uˆ = {u ∈ R : |u|2 ≤ 1}. Find the orthogonal projection PC and derive the necessary optimality and develop the gradient method.

R 1 0 Problem 15 Let J(x) = 0 |x(t)| dt is a functional on C(0, 1). Show that J (d) = R 1 0 sign(x(t))d(t) dt. Problem 16 Develop the Newton methods for the optimal control problem (3.48) with- out the constraint on u (i.e., Uˆ = Rm).

232 Problem 17 (1) Show that the conjugate functional

h∗(p) = sup{(p, u) − h(u)} u is convex. (2) The set-valued function Φ

Φ(p) = argmaxu{(p, u) − h(u)} is monotone.

Problem 18 Find the graph Φ(p) for

max{pu − |u|0} |u|≤γ and max{pu − |u|1}. |u|≤γ

233