Master’s thesis First-order augmented Lagrangian methods for state constraint problems

submitted by Jonas Siegfried Jehle

at the Faculty of Sciences, Department of Mathematics and Statistics, and the Department of Mathematics

German Supervisor and 1st Reviewer: Prof. Dr. Stefan Volkwein, University of Konstanz
Chinese Supervisor and 2nd Reviewer: Prof. Jinyan Fan, Ph.D., Shanghai Jiao Tong University

Konstanz, 6 September 2018

Konstanzer Online-Publikations-System (KOPS) URL: http://nbn-resolving.de/urn:nbn:de:bsz:352-2-1q60suvytbn4t1

Declaration of Authorship

I hereby declare that the submitted thesis

First-order augmented Lagrangian methods for state constraint problems

is my own unaided work. All direct or indirect sources used are acknowledged as references. I am aware that the thesis in digital form can be examined for the use of unauthorized aid and in order to determine whether the thesis as a whole or parts incorporated in it may be deemed as plagiarism. For the comparison of my work with existing sources I agree that it shall be entered in a database where it shall also remain after examination, to enable comparison with future theses submitted. Further rights of reproduction and usage, however, are not granted here.

This paper was not previously presented to another examination board and has not been published.

Konstanz, 6 September 2018 ...... Jonas Siegfried Jehle

Abstract

In this thesis, the augmented Lagrangian method is used to solve optimal boundary control problems. The considered optimal control problem appears in energy efficient building operation and consists of a heat equation with convection along with bilateral control and state constraints. The goal is to fit the temperature (the state) to a given prescribed temperature profile while incurring as few heating (control) costs as possible. Numerically, a first-order method is applied to solve the minimization problem occurring within the augmented Lagrangian method. To this end, we set up and solve the adjoint equation. Both partial differential equations, the state and the adjoint equation, are treated with the finite element Galerkin ansatz combined with an implicit Euler scheme in time. At the end, numerical tests of the developed first-order augmented Lagrangian method illustrate the efficiency of the proposed strategy.

Contents

1 Introduction

2 Preliminaries

3 The Augmented Lagrangian Method
  3.1 The Gradient and the Hessian Matrix
  3.2 The Advantage of the Expansion
  3.3 Numerical Approach
  3.4 Problems with Inequality Constraints

4 The Convection-Diffusion Model
  4.1 Weak Formulation
  4.2 Well-Posedness and Unique Existence

5 The Optimal Control Problem of the Convection-Diffusion Model
  5.1 The Augmented Lagrange Function of the Optimal Control Problem
  5.2 The Derivative of the Augmented Lagrange Function
  5.3 The Adjoint Equation

6 Numerical Consideration of the Optimal Control Problem
  6.1 Solving the State and the Adjoint Equation
    6.1.1 Galerkin Ansatz and Implicit Euler Method for the State Equation
    6.1.2 Galerkin Ansatz and Implicit Euler Method for the Adjoint Equation
  6.2 The Practicable Code
  6.3 Tests
    6.3.1 Test 1
    6.3.2 Test 2
    6.3.3 Test 3

7 Conclusion and Outlook

References

Chapter 1 Introduction

Starting in the year 1696, when Johann Bernoulli posed the famous brachistochrone problem (see [Str69, pages 391-399]), optimal control grew into an independent, relevant topic in the field of mathematics, and not in theory alone: after the invention of computers, calculating the solutions of optimal control problems became faster and more efficient. Optimal control asks which control law is the most favourable one for a given system such that a certain optimality criterion is reached. Typically, like other optimization problems, a control problem contains a cost function as well as constraints. The cost function possesses the state variable y and the control variable u as function arguments. The control u may be chosen arbitrarily within the given constraints, while the state y is uniquely determined in terms of u. The aim is to determine u such that the cost function is minimized and the constraints are fulfilled. Such controls are called optimal. In practice, one application area of optimal control is aviation and space technology. The example of the rocket car in the textbook [MS82, pages 10-15] by Macki and Strauss shows why: whenever an aircraft is supposed to follow a flight path, optimization problems have to be solved. Furthermore, optimal control is employed in medicine, robotics, movement sequences in sports and engineering, to name just a few areas.

Since the augmented Lagrangian method belongs to the class of penalty methods for solving constrained optimization problems, it can be used for a numerical approach to an optimal control problem. In 1969, Magnus Hestenes and Michael Powell mentioned the method in their works (see [Hes69, pages 303-320] and [Pow69, pages 283-298]) for the first time. In practice, the algorithm resembles a penalty method: instead of solving the constrained optimization problem directly, it is replaced by a series of unconstrained problems in which a term including the Lagrange multiplier as well as a penalty term is added to the objective. The Lagrange multipliers are auxiliary variables that, on the one hand, support the characterization of optimal solutions; on the other hand, they carry useful sensitivity information. The term of augmentation originates from the penalty term, since it distinguishes the common Lagrangian method from the augmented one. It is worth noting that the mathematicians Ralph Tyrrell Rockafellar and Dimitri Bertsekas worked with the augmented Lagrangian method: Rockafellar in relation to convex programming in several papers (see [Roc73, pages 555-562], [Roc74, pages 268-285], [Roc76, pages 97-116]), while Bertsekas mentions it repeatedly in his book and extends it to the exponential method of multipliers (see [Ber96]).

In this thesis, we study an optimal boundary control problem appearing in energy efficient building operation. To describe the temperature changes in a room, representing the state of the optimal control problem, we consider a convection-diffusion model, which is a parabolic partial differential equation with Robin boundary conditions. The room temperature has to be kept inside a certain region, controlled by heaters placed on the boundary of the room. The power of the heaters is restricted in real-life practice, so they are subject to bilateral control constraints. The aim is to track a prescribed temperature profile and minimize the resulting heating costs while not transgressing the selected state and control constraints, respectively.

In order to achieve our goal, the thesis is structured as follows:

• Chapter 2 deals with basic notations and definitions and provides assistance for solving optimization problems and the arising partial differential equations.

• The augmented Lagrangian method is introduced in Chapter 3, together with examples and important properties.

• In Chapter 4, a weak formulation of the employed convection-diffusion model is given and its well-posedness is proven.

• Chapter 5 is devoted to the introduction of the cost functional and the Lagrange function whose gradient is determined. To work with the gradient, we introduce the adjoint equation by which we can reformulate the gradient in a suitable way.

• The numerical realization of the theoretical work is discussed in Chapter 6. We solve the state and adjoint equation using the finite element Galerkin ansatz combined with an implicit Euler scheme in time. Within the augmented Lagrangian method we use a gradient projection method to perform the minimization.

• Lastly, Chapter 7 summarizes the thesis and the conclusions drawn from the obtained results. It also presents a brief outlook on subsequent questions and possible modifications.

Chapter 2 Preliminaries

In this chapter, we provide basic definitions, notations and results which are used throughout this thesis. The proofs for the results are omitted; instead, references to the literature, where these topics are discussed in detail, are given. First of all, we go through some common expressions. Let $m, n \in \mathbb{N}$ in this chapter.

Notation 2.1. Let $x = (x_1, \dots, x_n),\ y = (y_1, \dots, y_n) \in \mathbb{R}^n$ and $\Omega \subset \mathbb{R}^n$.

• We write
$$x \le y \ :\Longleftrightarrow\ x_i \le y_i \ \text{ for } i \in \{1, \dots, n\}, \qquad x < y \ :\Longleftrightarrow\ x_i < y_i \ \text{ for } i \in \{1, \dots, n\},$$
and on basis of that, we define
$$(x, y) := \{z \in \mathbb{R}^n \mid x < z < y\}, \qquad [x, y] := \{z \in \mathbb{R}^n \mid x \le z \le y\} \quad \text{for } x < y.$$
Analogously, we define $(x, y]$ and $[x, y)$.

• The Euclidean norm is denoted as
$$\|x\| := \|x\|_2 = \sqrt{\sum_{i=1}^n x_i^2}$$
with the corresponding inner product
$$\langle x, y\rangle := x^T y := \sum_{i=1}^n x_i y_i.$$

• $C^k(X)$, $k \in \mathbb{N}$, is the space of all $k$-times continuously differentiable functions mapping from an open, non-empty subset $X \subset \mathbb{R}^n$ to $\mathbb{R}$. Furthermore, a $C^\infty$ function is a function that is differentiable for all degrees of differentiation. $C^\infty$ functions are also called test functions.

• Let X,Y be topological vector spaces. Then, we call

L(X,Y ) := {T : X → Y | T is a linear and continuous operator}

the space of all linear and continuous operators from X to Y. In particular, $X' = L(X, \mathbb{R})$ is called the dual space of X.

• Let $\Omega \subset \mathbb{R}^n$ be a measure space with the Lebesgue measure. The common $L^p$ space for Ω is denoted as $L^p(\Omega)$ with $L^p$-norm
$$\|f\|_{L^p(\Omega)} := \Big(\int_\Omega |f(x)|^p\,dx\Big)^{\!1/p}$$
for $f \in L^p(\Omega)$ ($1 \le p < \infty$). In particular, for $f, g \in L^2(\Omega)$, it holds
$$\langle f, g\rangle_{L^2(\Omega)} = \int_\Omega f(x)g(x)\,dx.$$

• The Sobolev space $H^1(\Omega)$ is the completion of $\{u \in C^1(\Omega) : \|u\|_{H^1(\Omega)} < \infty\}$ with
$$\|u\|_{H^1(\Omega)} := \sqrt{\|u\|^2_{L^2(\Omega)} + \|\nabla u\|^2_{L^2(\Omega;\mathbb{R}^n)}}.$$

• Let U, V be Hilbert spaces and let $S : U \to V$ be a linear, bounded operator. We call $S^\star : V \to U$ the adjoint of S if it holds $\langle Su, v\rangle_V = \langle u, S^\star v\rangle_U$ for all $u \in U$, $v \in V$.

Next, we provide sufficient and necessary conditions for restricted optimization problems of the form
$$(\text{ROP})\qquad \min J(x) \quad \text{subject to (s.t.)} \quad x \in \mathbb{R}^n \text{ and } e(x) = 0,$$
where $J : \mathbb{R}^n \to \mathbb{R}$ represents the cost function and $e = (e_1, \dots, e_m)^T : \mathbb{R}^n \to \mathbb{R}^m$ the equality constraints. The following expressions and statements can also be found in [LY08] and [NW06].

Definition 2.2. Let $x^* \in \mathbb{R}^n$ and let $U(x^*) \subset \mathbb{R}^n$ be a neighbourhood of the point $x^*$.

• We call $x^*$ a local solution of (ROP) if it holds
$$e(x^*) = 0 \quad\text{and}\quad J(x^*) \le J(x) \ \text{ for all } x \in U(x^*) \text{ with } e(x) = 0.$$

• The point $x^*$ is a strict local solution of (ROP) if it holds
$$e(x^*) = 0 \quad\text{and}\quad J(x^*) < J(x) \ \text{ for all } x \in U(x^*) \setminus \{x^*\} \text{ with } e(x) = 0.$$

• We call $x^*$ a global solution of (ROP) if it holds
$$e(x^*) = 0 \quad\text{and}\quad J(x^*) \le J(x) \ \text{ for all } x \in \mathbb{R}^n \text{ with } e(x) = 0.$$

Analogously, the concept of a strict global solution is given.

Definition 2.3. Let $\bar x \in \mathbb{R}^n$ fulfil $e(\bar x) = 0$. Then, the point $\bar x$ is called regular relative to $e(x) = 0$ if and only if $\nabla e_1(\bar x), \nabla e_2(\bar x), \dots, \nabla e_m(\bar x) \in \mathbb{R}^{1\times n}$ are linearly independent in $\mathbb{R}^n$.

Theorem 2.4 (First-order Necessary Optimality Condition). Let $x^* \in \mathbb{R}^n$ be a local solution of (ROP) and regular. Then, there exists a unique Lagrange multiplier $\lambda^* \in \mathbb{R}^m$ such that
$$(2.1)\qquad \nabla J(x^*) + \sum_{i=1}^m \lambda_i^* \nabla e_i(x^*) = \nabla J(x^*) + (\lambda^*)^T \nabla e(x^*) = 0.$$

Proof. A proof can be found in [LY08, Chapter 11].
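As an illustration of Theorem 2.4 (my own toy example, not from the thesis), consider $\min x_1^2 + x_2^2$ subject to $x_1 + x_2 - 1 = 0$, whose minimizer is $x^* = (0.5, 0.5)$; the unique multiplier can be read off from (2.1):

```python
import numpy as np

# Hypothetical illustration: J(x) = x1^2 + x2^2, e(x) = x1 + x2 - 1.
# The minimizer on the line x1 + x2 = 1 is x* = (0.5, 0.5).
grad_J = lambda x: 2.0 * x
grad_e = lambda x: np.array([1.0, 1.0])   # the single row of the constraint Jacobian

x_star = np.array([0.5, 0.5])
# Solve grad_J(x*) + lam * grad_e(x*) = 0 for the unique multiplier lam.
lam = -grad_J(x_star)[0] / grad_e(x_star)[0]

residual = grad_J(x_star) + lam * grad_e(x_star)
print(lam, residual)   # lam = -1.0, residual = [0. 0.]
```

The residual being zero confirms that (2.1) holds with the computed multiplier.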

Definition and Remark 2.5. We define the common Lagrange function
$$L(x, \lambda) := J(x) + \lambda^T e(x) = J(x) + \langle\lambda, e(x)\rangle.$$
Using its gradient, we can rewrite the first-order necessary optimality condition (2.1) and the constraints as
$$\nabla_x L(x^*, \lambda^*) = \nabla J(x^*) + (\lambda^*)^T \nabla e(x^*) = 0, \qquad \nabla_\lambda L(x^*, \lambda^*) = e(x^*)^T = 0,$$
or just $\nabla L(x^*, \lambda^*) = 0$.

Theorem 2.6 (Second-order Necessary Optimality Condition). Let $x^*$ be regular and a local minimum of J under the constraint $e(x) = 0$. Then, the $n \times n$ matrix
$$\nabla^2_{xx} L(x^*, \lambda^*) = \nabla^2 J(x^*) + (\lambda^*)^T \nabla^2 e(x^*) = \nabla^2 J(x^*) + \sum_{i=1}^m \lambda_i^* \nabla^2 e_i(x^*)$$
is positive semidefinite on the nullspace of $\nabla e(x^*)$, i.e.
$$v^T \nabla^2_{xx} L(x^*, \lambda^*)\, v \ge 0 \quad \text{for all } v \in \ker(\nabla e(x^*)) := \{v \in \mathbb{R}^n \mid \nabla e(x^*) v = 0\},$$
where $\lambda^*$ represents the Lagrange multiplier from Theorem 2.4.

Proof. A proof can be found in [LY08, Chapter 11].

Theorem 2.7 (Second-order Sufficient Optimality Condition). Let $x^* \in \mathbb{R}^n$ with $e(x^*) = 0$ and let $\lambda^* \in \mathbb{R}^m$ satisfy
$$\nabla J(x^*) + (\lambda^*)^T \nabla e(x^*) = 0.$$
Further, let $x^*$ be regular and let $\nabla^2_{xx} L(x^*, \lambda^*)$ be positive definite on the nullspace of $\nabla e(x^*)$, i.e.
$$v^T \nabla^2_{xx} L(x^*, \lambda^*)\, v > 0 \quad \text{for all } v \in \ker(\nabla e(x^*)) \setminus \{0\}.$$
Then, $x^*$ is a strict local minimum of J subject to the constraint $e(x) = 0$. This means that $x^*$ is a strict local solution of (ROP).

Proof. A proof can be found in [LY08, Chapter 11].

At last, assistive tools for solving a partial differential equation, which occurs in the constraints of our optimal boundary control problem, are provided.

Theorem 2.8 (Trace Theorem). Let $\Omega \subset \mathbb{R}^n$ be bounded with $C^1$-boundary. Then, there exists a bounded linear operator
$$T : H^1(\Omega) \to L^2(\partial\Omega)$$
such that

• $Tu = u|_{\partial\Omega}$ for $u \in H^1(\Omega) \cap C(\bar\Omega)$,

• $\|Tu\|_{L^2(\partial\Omega)} \le C\|u\|_{H^1(\Omega)}$ for each $u \in H^1(\Omega)$ with a constant $C = C(\Omega)$.

Proof. A proof can be found in [Eva10, pages 258-259].

Definition 2.9. Let $(V, \langle\cdot,\cdot\rangle_V)$ be a separable Hilbert space and let $(A(t))_{t\in[0,T]}$ be a family of operators $A : [0,T] \to L(V, V')$. The corresponding bilinear form $a : [0,T] \times V \times V \to \mathbb{R}$ is defined by
$$a(t; \varphi, \psi) := -\langle A(t)\varphi, \psi\rangle_{V'\times V} \qquad (t \in [0,T],\ \varphi, \psi \in V).$$
The bilinear form is called coercive if there exist $\alpha > 0$ and $\beta \ge 0$ with
$$a(t; \varphi, \varphi) \ge \alpha\|\varphi\|_V^2 - \beta\|\varphi\|_H^2 \qquad (t \in [0,T],\ \varphi \in V).$$
In the case $\beta = 0$, we say a is strictly coercive.

Theorem 2.10. Let $(V, \langle\cdot,\cdot\rangle_V)$ and $(H, \langle\cdot,\cdot\rangle_H)$ be separable Hilbert spaces which form a Gelfand triple $V \hookrightarrow H = H' \hookrightarrow V'$. Let $A \in C([0,T], L(V,V'))$ be coercive. Let the Hilbert space $W(0,T) := L^2(0,T;V) \cap H^1(0,T;V')$ be endowed with the inner product
$$\langle\varphi, \psi\rangle_{W(0,T)} := \int_0^T \langle\varphi(t), \psi(t)\rangle_V + \langle\varphi_t(t), \psi_t(t)\rangle_{V'}\,dt \qquad (\varphi, \psi \in W(0,T)).$$
Then, for all $f \in L^2(0,T;V')$ and $y_0 \in H$, there exists a unique solution $y \in W(0,T)$ of the abstract evolution equation
$$y_t(t) - A(t)y(t) = f(t) \quad (t \in (0,T)), \qquad y(0) = y_0.$$
In particular, the mapping
$$W(0,T) \to L^2(0,T;V') \times H, \qquad y \mapsto (y_t - Ay,\, y(0))$$
is an isomorphism of Hilbert spaces. Furthermore, the inverse mapping, which is the so-called solution mapping, is linear and continuous. Hence, there is a constant $C > 0$ such that
$$\|y\|_{W(0,T)} \le C\big(\|f\|_{L^2(0,T;V')} + \|y_0\|_H\big)$$
is true for all $(f, y_0) \in L^2(0,T;V') \times H$, where $y \in W(0,T)$ is the solution of the abstract evolution equation.

Proof. A proof can be found in [Paz92, Chapter 5] or in [Den13, Chapter 8]. The fact that $(W(0,T), \langle\cdot,\cdot\rangle_{W(0,T)})$ is a Hilbert space is proven in [Trö10, pages 146-148].

Remark 2.11. Choosing $V = H^1(\Omega)$ and $H = L^2(\Omega)$ for a domain $\Omega \subset \mathbb{R}^n$, we obtain a Gelfand triple $V \hookrightarrow H \hookrightarrow V'$. For the precise definition of a Gelfand triple see [Trö10, page 147].
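The evolution equation of Theorem 2.10 is exactly what the implicit Euler scheme of Chapter 6 discretizes in time. A minimal scalar sketch (my own toy example with $A = -1$, $f = 0$, so that $y'(t) = -y(t)$ and the exact solution is $e^{-t}$):

```python
import math

# Toy instance of the abstract evolution equation y_t - A y = f with
# A = -1, f = 0, y(0) = 1, i.e. y'(t) = -y(t); exact solution y(t) = exp(-t).
T, N = 1.0, 1000
dt = T / N
y = 1.0
for _ in range(N):
    # implicit (backward) Euler step: (y_new - y)/dt = -y_new
    y = y / (1.0 + dt)
print(y)  # close to exp(-1) ≈ 0.3679
```

The backward step is unconditionally stable, which is why the implicit variant is preferred for parabolic problems like the convection-diffusion model of Chapter 4.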

Chapter 3 The Augmented Lagrangian Method

In this chapter, we introduce the augmented Lagrangian method. First, the method is applied to solve an optimization problem with equality constraints. Properties of the method are shown and the numerical realization is presented. At the end of the chapter, we show how to deal with additional inequality constraints. The chapter is oriented on [Ber99, Chapters 3 and 4]. Let $m, n \in \mathbb{N}$ and $m \le n$. We consider the restricted optimization problem
$$(\text{ROP})\qquad \min J(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n \text{ and } e(x) = 0,$$
for the cost function $J : \mathbb{R}^n \to \mathbb{R}$ and the equality constraints $e = (e_1, \dots, e_m)^T : \mathbb{R}^n \to \mathbb{R}^m$.

While the common Lagrangian method uses the Lagrange function $L(x, \lambda) = J(x) + \lambda^T e(x)$ with $\lambda \in \mathbb{R}^m$, choosing $c > 0$ and adding the term $\frac{c}{2}\|e(x)\|^2 = \frac{c}{2}e(x)^T e(x)$ gives us the augmented Lagrange function
$$(3.1)\qquad L_c(x, \lambda) = J(x) + \lambda^T e(x) + \frac{c}{2}\|e(x)\|^2.$$

We realize that $L_c(x, \lambda)$ is the common Lagrange function for the problem
$$\min J(x) + \frac{c}{2}\|e(x)\|^2 \quad \text{s.t.} \quad x \in \mathbb{R}^n \text{ and } e(x) = 0.$$
As a consequence, this problem has the same minima as (ROP).

3.1 The Gradient and the Hessian Matrix

Now, we calculate the first and second derivative of the augmented Lagrange function (3.1). For the gradient we obtain
$$\nabla L_c(x, \lambda) = \big(\nabla_x L_c(x, \lambda),\, e(x)^T\big) \in \mathbb{R}^{1\times(n+m)},$$
where the product rule yields
$$\nabla_x L_c(x, \lambda) = \nabla J(x) + \underbrace{(\lambda + c\,e(x))}_{=:\tilde\lambda}{}^{T}\, \nabla e(x) = \nabla_x L(x, \tilde\lambda) \in \mathbb{R}^{1\times n}.$$
We write the Hessian matrix in block matrix form, namely
$$\nabla^2 L_c(x, \lambda) = \begin{pmatrix} \nabla^2_{xx} L_c(x, \lambda) & \nabla e(x)^T \\ \nabla e(x) & 0 \end{pmatrix} \in \mathbb{R}^{(n+m)\times(n+m)},$$
where we use the product rule again to get
$$\nabla^2_{xx} L_c(x, \lambda) = \underbrace{\nabla^2 J(x) + (\lambda + c\,e(x))^T \nabla^2 e(x)}_{=\nabla^2_{xx} L(x, \tilde\lambda)} + c\,\nabla e(x)^T \nabla e(x) \in \mathbb{R}^{n\times n}.$$
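The identity $\nabla_x L_c(x, \lambda) = \nabla_x L(x, \tilde\lambda)$ with the shifted multiplier $\tilde\lambda = \lambda + c\,e(x)$ can be verified numerically by finite differences (illustrative code; the sample J and e are my own choice, not from the thesis):

```python
import numpy as np

# Check grad_x L_c(x, lam) = grad_x L(x, lam + c e(x)) on a sample problem
# (own choice): J(x) = x1^2 + x2^2, e(x) = x1*x2 - 1.
J = lambda x: x[0]**2 + x[1]**2
e = lambda x: np.array([x[0]*x[1] - 1.0])

def L(x, lam):          # common Lagrange function
    return J(x) + lam @ e(x)

def Lc(x, lam, c):      # augmented Lagrange function (3.1)
    return L(x, lam) + 0.5 * c * (e(x) @ e(x))

def num_grad(f, x, h=1e-6):   # central finite differences
    g = np.zeros_like(x)
    for i in range(len(x)):
        d = np.zeros_like(x); d[i] = h
        g[i] = (f(x + d) - f(x - d)) / (2 * h)
    return g

x, lam, c = np.array([1.5, -0.3]), np.array([0.7]), 10.0
lhs = num_grad(lambda z: Lc(z, lam, c), x)
rhs = num_grad(lambda z: L(z, lam + c * e(x)), x)   # shifted multiplier, e frozen at x
print(np.max(np.abs(lhs - rhs)))   # agreement up to finite-difference error
```

Note that the shifted multiplier is evaluated at the fixed point x; the identity holds pointwise, not as an identity of functions of z.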

3.2 The Advantage of the Expansion

For determining the minima of (ROP), we make use of

Lemma 3.1. Let $P, Q \in \mathbb{R}^{n\times n}$ be symmetric. Let Q be positive semidefinite and P be positive definite on the nullspace of Q, i.e. it holds $x^T P x > 0$ for all $x \in \mathbb{R}^n$ with $x \ne 0$ and $x^T Q x = 0$. Then there exists a $\bar c > 0$ such that the matrix $P + cQ$ is positive definite for all $c > \bar c$.

Proof. A proof can be found in [Ber99, Chapter 3].

If $x^* \in \mathbb{R}^n$ and $\lambda^* \in \mathbb{R}^m$ satisfy the second-order sufficient optimality condition (see Theorem 2.7)
$$\nabla L(x^*, \lambda^*) = 0, \qquad y^T \nabla^2_{xx} L(x^*, \lambda^*)\, y > 0 \ \text{ for all } y \ne 0 \text{ with } \nabla e(x^*)\, y = 0,$$
then it holds
$$\nabla_x L_c(x^*, \lambda^*) = \nabla J(x^*) + \big(\lambda^* + c\,\underbrace{e(x^*)}_{=0}\big)^T \nabla e(x^*) = \nabla_x L(x^*, \lambda^*) = 0$$
as well as
$$v^T \nabla^2_{xx} L_c(x^*, \lambda^*)\, v = \underbrace{v^T \nabla^2_{xx} L(x^*, \lambda^*)\, v}_{\text{p.d. on the nullspace of } \nabla e(x^*)} + \ c\,\underbrace{v^T \nabla e(x^*)^T \nabla e(x^*)\, v}_{=\|\nabla e(x^*) v\|^2\ \ge\ 0}$$
for $v \in \mathbb{R}^n \setminus \{0\}$. According to Lemma 3.1, we get that $\nabla^2_{xx} L_c(x^*, \lambda^*)$ is positive definite for $c > \bar c$ and, simultaneously, $x^*$ is a strict local minimum of $L_c(\cdot, \lambda^*)$.

Example 3.2. We compare the behaviour of the common Lagrangian method and the augmented Lagrangian method on the basis of the problem

$$\min J(x) = -(x_1x_2 + x_2x_3 + x_1x_3) \quad \text{s.t.} \quad e(x) = x_1 + x_2 + x_3 - 3 = 0.$$
At first, we set up the Lagrange functions and calculate their gradients.

Common method:
$$L(x, \lambda) = J(x) + \lambda e(x), \qquad \nabla L(x, \lambda)^T = \begin{pmatrix} -x_2 - x_3 + \lambda \\ -x_1 - x_3 + \lambda \\ -x_1 - x_2 + \lambda \\ e(x) \end{pmatrix}.$$

Augmented method:
$$L_c(x, \lambda) = J(x) + \lambda e(x) + \tfrac{c}{2}e(x)^2, \qquad \nabla L_c(x, \lambda)^T = \begin{pmatrix} -x_2 - x_3 + \lambda + c\,e(x) \\ -x_1 - x_3 + \lambda + c\,e(x) \\ -x_1 - x_2 + \lambda + c\,e(x) \\ e(x) \end{pmatrix}.$$

By setting the gradients equal to zero, we get the same candidate for a minimum for both methods with the values
$$x_1^* = x_2^* = x_3^* = 1 \quad \text{and} \quad \lambda^* = 2.$$

To check the sufficient condition of second order, we need the Hessian matrices.

Common method:
$$\nabla^2_{xx} L(x^*, \lambda^*) = \begin{pmatrix} 0 & -1 & -1 \\ -1 & 0 & -1 \\ -1 & -1 & 0 \end{pmatrix}$$

Augmented method:
$$\nabla^2_{xx} L_c(x^*, \lambda^*) = \begin{pmatrix} c & -1+c & -1+c \\ -1+c & c & -1+c \\ -1+c & -1+c & c \end{pmatrix}$$

For the common method, it remains to verify the positive definiteness on the nullspace of $\nabla e(x^*)$. On the contrary, it can be easily proven that the augmented Hessian is positive definite on all of $\mathbb{R}^3 \setminus \{0\}$ for $c > \bar c = \frac{2}{3}$.
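As a quick numerical cross-check of Example 3.2 (illustrative code, not part of the thesis), the eigenvalues of the augmented Hessian are $3c - 2$ (once) and 1 (twice), so positive definiteness holds exactly for $c > 2/3$:

```python
import numpy as np

# Hessian of the augmented Lagrange function at x* = (1, 1, 1) from
# Example 3.2; eigenvalues are 3c - 2 (once) and 1 (twice).
def hessian_aug(c):
    return np.array([[c,     c - 1, c - 1],
                     [c - 1, c,     c - 1],
                     [c - 1, c - 1, c]], dtype=float)

is_pd = {c: bool(np.linalg.eigvalsh(hessian_aug(c)).min() > 1e-12)
         for c in (0.5, 1.0, 5.0)}
print(is_pd)  # {0.5: False, 1.0: True, 5.0: True}
```

For c = 0.5 the smallest eigenvalue is 3·0.5 − 2 = −0.5, so the threshold $\bar c = 2/3$ is sharp.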

3.3 Numerical Approach

In practice, instead of considering (ROP), we solve the sequence of unrestricted optimization problems
$$(\text{OP}^k)\qquad \min L_{c^k}(x, \lambda^k) \quad \text{s.t.} \quad x \in \mathbb{R}^n.$$
By doing so, we obtain the wanted solution. This is evidenced by

Lemma 3.3. Let J and e be continuous and let $\{x \in \mathbb{R}^n \mid e(x) = 0\}$ be non-empty. Let $(x^k)_{k\in\mathbb{N}}$ be the sequence of global minima of the problems $(\text{OP}^k)$, where $(\lambda^k)_{k\in\mathbb{N}}$ is bounded, $0 < c^k < c^{k+1}$ holds for all $k \in \mathbb{N}$ and $c^k \to \infty$. Then, every limit point of the sequence $(x^k)_{k\in\mathbb{N}}$ is a global minimum of the original problem (ROP).

Proof. A proof can be found in [Ber99, Chapter 4].

Our new problem $(\text{OP}^k)$ has no constraints and it contains the penalty parameter $c^k > 0$. To find $x^*$, we increase $c^k$ in every iteration. A large $c^k$ makes sense because it implies high costs of inadmissibility; as a result, $x^*$ is admissible. For an admissible x, we get

$$L_{c^k}(x, \lambda) = J(x).$$
Furthermore, it holds
$$L_{c^k}(\tilde x, \lambda) \approx J(\tilde x)$$
for a nearly admissible $\tilde x$. The x update is made by a gradient method (e.g. gradient descent, the conjugate gradient method, etc.) to avoid bad conditioning. We set $c^{k+1} = \beta c^k$ with $\beta > 1$ ($\beta \in [4, 10]$ in practice, see [Ber99, page 405]) for the next iteration. We choose

$$\|\nabla_x L_{c^k}(x^k, \lambda^k)\| \le \varepsilon^k \quad \text{with } \varepsilon^k \to 0$$
as the stopping criterion. Last but not least, we update λ by means of

Theorem 3.4. Let J and e be continuously differentiable. Let $0 < c^k < c^{k+1}$ and $c^k \to \infty$ for $(c^k)_{k\in\mathbb{N}}$, and let $\varepsilon^k \ge 0$ and $\varepsilon^k \to 0$ for $(\varepsilon^k)_{k\in\mathbb{N}}$. Let the sequence $(\lambda^k)_{k\in\mathbb{N}}$ be bounded and let $(x^k)_{k\in\mathbb{N}}$ satisfy
$$\|\nabla_x L_{c^k}(x^k, \lambda^k)\| \le \varepsilon^k.$$
Moreover, let the subsequence $(x^{k_j})_{j\in\mathbb{N}}$ converge to $x^*$ such that $\nabla e(x^*)$ has full rank. Then, $(\lambda^{k_j} + c^{k_j} e(x^{k_j}))_{j\in\mathbb{N}}$ converges to $\lambda^*$ as well as
$$\nabla J(x^*) + (\lambda^*)^T \nabla e(x^*) = 0 \quad \text{and} \quad e(x^*) = 0.$$

Proof. A proof can be found in [Ber99, Chapter 4].

From that, and because the x update performed in the previous step provides $x^{k+1}$, the λ update is implemented by
$$\lambda^{k+1} = \lambda^k + c^k e(x^{k+1}).$$
The procedure discussed above can be implemented in the following way:

Algorithm 1: Augmented Lagrangian method

1: Data: initial guess $x^0$, initial multiplier $\lambda^0$, weight $c^0$, increment β, tolerance ε;
2: begin
3: set $k = 0$;
4: while $\|\nabla L_{c^k}(x^k, \lambda^k)\| \ge \varepsilon$ do
5:   solve for $x^{k+1}$ the problem $(\text{OP}^k)$;
6:   update the Lagrange multiplier $\lambda^{k+1} = \lambda^k + c^k e(x^{k+1})$;
7:   set $c^{k+1} = \beta c^k$ and $k = k + 1$;
8: end.
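A minimal Python sketch of Algorithm 1, applied to the problem of Example 3.2 (my own illustration, not the thesis code; since $L_c(\cdot, \lambda)$ is a convex quadratic there for $c > 2/3$, step 5 is solved exactly via its linear optimality system instead of a gradient method):

```python
import numpy as np

# Augmented Lagrangian method (Algorithm 1) for the problem of Example 3.2:
#   min -(x1 x2 + x2 x3 + x1 x3)  s.t.  e(x) = x1 + x2 + x3 - 3 = 0.
e = lambda x: x.sum() - 3.0

lam, c, beta = 0.0, 1.0, 4.0
x = np.zeros(3)
for k in range(15):
    # step 5: grad_x L_c(x, lam) = 0 reduces to the linear system
    #   (I + (c - 1) * ones(3,3)) x = (3c - lam) * ones(3)
    H = np.eye(3) + (c - 1.0) * np.ones((3, 3))
    x = np.linalg.solve(H, (3.0 * c - lam) * np.ones(3))
    lam = lam + c * e(x)          # step 6: multiplier update
    if abs(e(x)) < 1e-10:         # stop once the constraint is met
        break
    c *= beta                     # step 7: penalty update
print(x, lam)   # approaches x* = (1, 1, 1), lam* = 2
```

The iterates match the hand computation of Example 3.2: the multiplier error contracts by the factor $-2/(3c^k - 2)$ per iteration, so convergence is very fast once $c^k$ is large.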

Example 3.5. Let us solve the problem
$$\min J(x) = \tfrac12(x_1^2 + x_2^2) \quad \text{s.t.} \quad e(x) = x_1 - 1 = 0.$$
First, we calculate the solution by hand. The augmented Lagrange function reads
$$L_c(x, \lambda) = \tfrac12(x_1^2 + x_2^2) + \lambda(x_1 - 1) + \tfrac{c}{2}(x_1 - 1)^2.$$
We obtain its gradient
$$\nabla L_c(x, \lambda)^T = \begin{pmatrix} x_1 + \lambda + c(x_1 - 1) \\ x_2 \\ x_1 - 1 \end{pmatrix} \overset{!}{=} 0.$$
This yields the candidate $x^* = (1, 0)$ with $\lambda^* = -1$. Finally, the Hessian matrix is
$$\nabla^2_{xx} L_c(x, \lambda) = \begin{pmatrix} 1 + c & 0 \\ 0 & 1 \end{pmatrix},$$
which is positive definite for all $c > 0$. Hence, $x^*$ is a minimum. Due to the contour lines we can guess the solution. In addition, one can see the effect of a constantly increasing c.

[Figure 1 shows the contour lines of the augmented Lagrange function of Example 3.5 on $[-5, 5]^2$.]

Figure 1: Contour lines (left: $L_{0.5}(\cdot, -3)$, right: $L_{20}(\cdot, -3)$)

The star in Figure 1 represents the known solution x∗. Now, we show that we get the same result for our numerical scheme. Therefore, we treat step 5 of Algorithm 1 with

Algorithm 2: Gradient descent method (for $f \in C^1$)

1: Data: initial guess $x^0$;
2: begin
3: set $k = 0$;
4: while no convergence and $k < k_{\max}$ do
5:   set descent direction $d^k = -\nabla f(x^k)$;
6:   choose step size $s_k \in (0, 1)$;
7:   calculate $x^{k+1} = x^k + s_k d^k$;
8:   set $k = k + 1$;
9: end.

We take $\|x^{n+1} - x^n\| < \varepsilon_{gd}$ for a fixed $\varepsilon_{gd} > 0$ as the stopping criterion and, for the gradient descent method, choose the least integer m such that the step size $s_k = \zeta^m$ with $\zeta \in (0, 1)$ fulfils the general sufficient decrease condition
$$(3.2)\qquad f(x^{k+1}) - f(x^k) < -\alpha\, s_k \|\nabla f(x^k)\|^2$$
(α is typically set to $10^{-4}$; see [Kel99, Chapter 3.2]). Furthermore, by taking $\varepsilon = 10^{-5}$, $\varepsilon_{gd} = 0.2\varepsilon$ and $\zeta = 0.8$, we attain Table 1.

    x0                λ0      c0   β   Iterations (gradient loops)
1   (2, 1)            −2      1    10  {28, 80, 92, 6, 2}
2   (2, 1)            −2      1    4   {28, 9, 18, 48, 12, 3}
3   (2, 1)            −2      50   4   {112, 45, 6, 2}
4   (2, 1)            −2      100  4   {1714, 22, 5, 2}
5   (−2·10^5, −10^5)  2·10^5  1    4   {28, 9, 18, 48, 12, 3}
6   (−2·10^5, −10^5)  2·10^5  100  10  {129, 8, 2}

Table 1: Augmented Lagrangian method with different initial guesses

We note that the number of iterations in a single gradient loop (see Table 1, last column) tells us how long the gradient descent method needs to solve step 5 of Algorithm 1. The total number of gradient loops shows how often the method repeats the while loop (step 4) until the stopping criterion is reached. For this example, every combination converges (rounded) to $x^* = (1, 0)$, $\lambda^* = -1$; the algorithm also converges for large $x^0$ and $\lambda^0$. A good coordination of the parameters $c^0$ and β reduces the number of iterations in the single gradient loops (compare row 1 to row 4).
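Step 5 of Algorithm 1, i.e. Algorithm 2 with the sufficient decrease condition (3.2), can be sketched as follows (illustrative Python, not the thesis code; shown for $L_{20}(\cdot, -3)$ from Figure 1, whose exact minimizer is $x_1 = (c - \lambda)/(1 + c) = 23/21$, $x_2 = 0$):

```python
import numpy as np

# Gradient descent with backtracking (Algorithm 2 + condition (3.2)),
# applied to L_c(., lam) of Example 3.5 with lam = -3, c = 20.
lam, c = -3.0, 20.0
f = lambda x: 0.5*(x[0]**2 + x[1]**2) + lam*(x[0]-1) + 0.5*c*(x[0]-1)**2
grad = lambda x: np.array([x[0] + lam + c*(x[0]-1), x[1]])

x = np.array([2.0, 1.0])
alpha, zeta = 1e-4, 0.8
for _ in range(10000):
    g = grad(x)
    if np.linalg.norm(g) < 1e-6:          # convergence test
        break
    s = 1.0
    # backtracking: shrink s = zeta^m until (3.2) holds
    while f(x - s*g) - f(x) >= -alpha * s * np.linalg.norm(g)**2 and s > 1e-12:
        s *= zeta
    x = x - s*g
print(x)   # close to (23/21, 0) ≈ (1.0952, 0)
```

The stiff penalty direction (curvature $1 + c = 21$) forces small accepted steps early on, which is exactly the bad conditioning for large c mentioned in Section 3.3.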

3.4 Problems with Inequality Constraints

The augmented Lagrangian method is also applicable to problems with inequality constraints. Let us consider the problem
$$\min J(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n,\ e(x) = 0 \text{ and } g(x) \le 0,$$
where $g : \mathbb{R}^n \to \mathbb{R}^p$ appears. First, using slack variables, we transform the inequality constraints into equality constraints and obtain the new problem
$$\min J(x) \quad \text{s.t.} \quad x \in \mathbb{R}^n,\ z \in \mathbb{R}^p,\ e(x) = 0 \text{ and } g_i(x) + z_i^2 = 0 \ \text{ for } i \in \{1, \dots, p\}.$$
The corresponding augmented Lagrange function has the form
$$\bar L_c(x, z, \lambda, \mu) = J(x) + \lambda^T e(x) + \frac{c}{2}e(x)^T e(x) + \sum_{i=1}^p \mu_i\big(g_i(x) + z_i^2\big) + \frac{c}{2}\sum_{i=1}^p \big(g_i(x) + z_i^2\big)^2$$
with the Lagrange multipliers $\lambda \in \mathbb{R}^m$ and $\mu \in \mathbb{R}^p$. Numerically, we apply the procedure above, i.e. we solve the problem
$$\min_{x, z}\ \bar L_c(x, z, \lambda, \mu).$$
At first, we minimize with respect to z. Differentiating $\bar L_c(x, z, \lambda, \mu)$ and setting the derivative to zero leads to the p equations
$$\frac{\partial}{\partial z_i}\bar L_c(x, z, \lambda, \mu) = 2\mu_i z_i + c\big(g_i(x) + z_i^2\big)\,2z_i \overset{!}{=} 0 \quad \text{for } i = 1, \dots, p.$$
Dividing by $2z_i$ (in case $z_i \ne 0$) and reordering yields
$$(z_i^*)^2 = -\Big(\frac{\mu_i}{c} + g_i(x)\Big) \quad \text{for } i = 1, \dots, p.$$
Adding the case $z_i = 0$, we get
$$(z_i^*)^2 = \max\Big\{0,\ -\Big(\frac{\mu_i}{c} + g_i(x)\Big)\Big\} \quad \text{for } i = 1, \dots, p.$$
Now, we can insert $(z_i^*)^2$ in $\bar L_c(x, z, \lambda, \mu)$ to obtain
$$L_c(x, \lambda, \mu) := \bar L_c(x, z^*, \lambda, \mu) = J(x) + \lambda^T e(x) + \frac{c}{2}e(x)^T e(x) + \sum_{i=1}^p \Big(-\frac{\mu_i^2}{2c}\Big)\,\chi_{\{-(\frac{\mu_i}{c} + g_i(x)) > 0\}}(x) + \sum_{i=1}^p \Big(\mu_i g_i(x) + \frac{c}{2}g_i^2(x)\Big)\,\chi_{\{-(\frac{\mu_i}{c} + g_i(x)) \le 0\}}(x),$$
where for a set A, it holds $\chi_A(x) = 1$ for $x \in A$ and $\chi_A(x) = 0$ otherwise. Finally, we minimize $L_c(x, \lambda, \mu)$ with respect to x as is usual.
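The case distinction via the indicator functions can be collapsed into the single expression $\frac{1}{2c}\big(\max\{0,\ \mu_i + c\,g_i(x)\}^2 - \mu_i^2\big)$ per inequality constraint, which is a standard equivalent form. A small check (illustrative code, not from the thesis):

```python
# Each inequality term of the eliminated function L_c(x, lam, mu) equals
#     ( max(0, mu_i + c * g_i(x))^2 - mu_i^2 ) / (2c),
# reproducing both branches of the chi-indicator formula above.
def ineq_term_piecewise(g, mu, c):
    if -(mu / c + g) > 0:                  # branch with (z_i*)^2 > 0
        return -mu**2 / (2.0 * c)
    return mu * g + 0.5 * c * g**2         # branch with z_i* = 0

def ineq_term_compact(g, mu, c):
    return (max(0.0, mu + c * g)**2 - mu**2) / (2.0 * c)

checks = [abs(ineq_term_piecewise(g, mu, 4.0) - ineq_term_compact(g, mu, 4.0))
          for g in (-2.0, -0.1, 0.0, 0.5) for mu in (0.0, 0.3, 1.5)]
print(max(checks))   # zero up to rounding
```

The compact form is also continuously differentiable in $g_i$, which is what makes the first-order minimization in x practicable.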

Chapter 4 The Convection-Diffusion Model

We consider the following state equation: for given
$$u \in U_{ad} := \{\tilde u \in L^2(0,T;\mathbb{R}^m) \mid u_a(t) \le \tilde u(t) \le u_b(t) \text{ in } [0,T] \text{ a.e.}\}$$
with $u_a, u_b \in L^2(0,T;\mathbb{R}^m)$, the function y is governed by the linear parabolic differential equation with Robin boundary conditions
$$(\text{SEa})\qquad y_t(t,x) - \lambda\Delta y(t,x) + v(t,x)\cdot\nabla y(t,x) = f(t,x) \quad \text{in } Q := (0,T)\times\Omega \text{ a.e.},$$
$$(\text{SEb})\qquad \lambda\frac{\partial y}{\partial\eta}(t,x) + \gamma_c\, y(t,x) = \gamma_c\sum_{i=1}^m u_i(t)b_i(x) \quad \text{in } \Sigma_c := (0,T)\times\Gamma_c \text{ a.e.},$$
$$(\text{SEc})\qquad \lambda\frac{\partial y}{\partial\eta}(t,x) + \gamma_{out}\, y(t,x) = \gamma_{out}\, y_{out}(t) \quad \text{in } \Sigma_{out} := (0,T)\times\Gamma_{out} \text{ a.e.},$$
$$(\text{SEd})\qquad y(0,x) = y_0(x) \quad \text{in } \Omega \text{ a.e.}$$

The state equation (SE) is called convection-diffusion equation, heat-convection phenomenon or heat equation with convection term. As its names indicate, the equation models a combination of diffusion and convection. It characterizes physical phenomena where particles, energy or other physical quantities are conveyed inside a physical system due to the two aforementioned processes. In our case, the convection-diffusion equation models the temperature distribution in a room Ω. There are m heaters in the room, which are placed by the functions $b_1, \dots, b_m$. The respective usage of every heater is represented by the space-independent control functions $u_1, \dots, u_m$. Together, the right-hand side of (SEb) describes the heating process. The boundary $\Gamma := \partial\Omega$ of the domain Ω is split into the disjoint subsets $\Gamma_c$ and $\Gamma_{out}$, where it holds that
$$\Gamma = \partial\Omega = \Gamma_c \cup \Gamma_{out}.$$
In particular, we assume that $\Gamma_c \ne \emptyset$.

4.1 Weak Formulation

At first, we introduce a weak formulation of (SE). For that, we multiply (SEa) by an arbitrary test function $\varphi \in C^\infty(\Omega)$ and integrate over Ω with the result
$$\int_\Omega y_t(t,x)\varphi(x)\,dx - \lambda\int_\Omega \Delta y(t,x)\varphi(x)\,dx + \int_\Omega v(t,x)\cdot\nabla y(t,x)\,\varphi(x)\,dx = \int_\Omega f(t,x)\varphi(x)\,dx.$$
For the second term in (SEa), using Green's first identity
$$\int_\Omega \varphi\,\Delta\psi + \nabla\varphi\cdot\nabla\psi\,dx = \int_{\partial\Omega} \varphi\,\frac{\partial\psi}{\partial\eta}\,dA(x) \qquad (\psi \in C^2(\Omega))$$
in combination with inserting the boundary conditions (SEb), (SEc), we get
$$\lambda\int_\Omega \Delta y(t,x)\varphi(x)\,dx = -\lambda\int_\Omega \nabla y(t,x)\cdot\nabla\varphi(x)\,dx + \int_\Gamma \lambda\frac{\partial y}{\partial\eta}(t,x)\,\varphi(x)\,dA(x)$$
$$= -\lambda\int_\Omega \nabla y(t,x)\cdot\nabla\varphi(x)\,dx + \int_{\Gamma_c}\Big(\gamma_c\sum_{i=1}^m u_i(t)b_i(x) - \gamma_c\,y(t,x)\Big)\varphi(x)\,dA(x) + \int_{\Gamma_{out}}\big(\gamma_{out}\,y_{out}(t) - \gamma_{out}\,y(t,x)\big)\varphi(x)\,dA(x).$$
This yields
$$(4.1)\qquad \int_\Omega y_t\varphi\,dx + \lambda\int_\Omega \nabla y\cdot\nabla\varphi\,dx + \gamma_c\int_{\Gamma_c} y\,\varphi\,dA(x) + \gamma_{out}\int_{\Gamma_{out}} y\,\varphi\,dA(x) + \int_\Omega v\cdot\nabla y\,\varphi\,dx = \int_\Omega f\,\varphi\,dx + \gamma_c\sum_{i=1}^m u_i(t)\int_{\Gamma_c} b_i\,\varphi\,dA(x) + \gamma_{out}\,y_{out}(t)\int_{\Gamma_{out}} \varphi\,dA(x).$$
Since $C^\infty(\Omega) \subset H^1(\Omega)$ is dense, (4.1) also holds for all $\varphi \in H^1(\Omega)$. Let $V := H^1(\Omega)$ with $V' = H^1(\Omega)'$ and $H := L^2(\Omega)$ form a Gelfand triple. By introducing the non-symmetric, time-dependent bilinear form $a(t;\cdot,\cdot) : V\times V \to \mathbb{R}$, $t \in (0,T)$, and the mapping $g : (0,T)\times V \to \mathbb{R}$ defined by
$$a(t;\psi,\varphi) := \lambda\int_\Omega \nabla\psi\cdot\nabla\varphi\,dx + \gamma_c\int_{\Gamma_c}\psi\,\varphi\,dA(x) + \gamma_{out}\int_{\Gamma_{out}}\psi\,\varphi\,dA(x) + \int_\Omega v(t,\cdot)\cdot\nabla\psi\,\varphi\,dx$$
and
$$g(t,\varphi) := \int_\Omega f(t,\cdot)\,\varphi\,dx + \gamma_c\sum_{i=1}^m u_i(t)\int_{\Gamma_c} b_i\,\varphi\,dA(x) + \gamma_{out}\,y_{out}(t)\int_{\Gamma_{out}}\varphi\,dA(x),$$
we obtain the weak formulation
$$(\text{SEwfa})\qquad y_t(t) + a(t; y(t), \cdot) = g(t, \cdot) \quad \text{in } V' \text{ a.e.},$$
$$(\text{SEwfb})\qquad y(0) = y_0 \quad \text{in } H$$
of (SE) with the unknown
$$y \in W(0,T) = \{\varphi \in L^2(0,T;H^1(\Omega)) \mid \varphi_t \in L^2(0,T;H^1(\Omega)')\}.$$

4.2 Well-Posedness and Unique Existence

Under the assumptions

• $\Omega \subset \mathbb{R}^n$ is a bounded domain with $C^1$-boundary Γ,

• the diffusion parameter satisfies $\lambda > 0$,

• the convection $v \in L^\infty(Q;\mathbb{R}^n)$ is time-dependent,

• $b = (b_1, \dots, b_m)$ with $b_i \in L^2(\Gamma)$ for all $i \in \{1, \dots, m\}$,

• the isolation coefficients satisfy $\gamma_c, \gamma_{out} \ge 0$,

• $f \in L^2(Q)$,

• the control u is an element of the real Hilbert space $U \subset L^2(0,T;\mathbb{R}^m)$,

• the outer temperature $y_{out} \in L^2(0,T)$ is space-independent,

• the initial temperature satisfies $y_0 \in L^2(\Omega)$,

we will show that (SEwf) has a unique solution using the following properties of the bilinear form a and the right-hand side g.

Lemma 4.1. The mapping $\tilde a : V \to V'$, $v \mapsto a(t; v, \cdot)$, $t \in (0,T)$, is well-defined and one has $\tilde a \in L(V, V')$.

Proof. Since a is bilinear, the operator $\tilde a(\varphi)$ is linear for any $\varphi \in V$. For any $\varphi, \psi \in V$ and $t \in (0,T)$, using the Trace Theorem 2.8 and $\gamma_{max} := \max\{\gamma_c, \gamma_{out}\}$, we have

$$|a(t;\varphi,\psi)| = \Big|\lambda\int_\Omega \nabla\varphi\cdot\nabla\psi\,dx + \int_\Omega v(t,x)\cdot\nabla\varphi\,\psi\,dx + \gamma_c\int_{\Gamma_c}\varphi\,\psi\,dA(x) + \gamma_{out}\int_{\Gamma_{out}}\varphi\,\psi\,dA(x)\Big|$$
$$\le \lambda\Big|\int_\Omega \nabla\varphi\cdot\nabla\psi\,dx\Big| + \Big|\int_\Omega v(t,x)\cdot\nabla\varphi\,\psi\,dx\Big| + \gamma_{max}\Big|\int_\Gamma \varphi\,\psi\,dA(x)\Big|$$
$$\le \lambda\|\nabla\varphi\|_{L^2(\Omega;\mathbb{R}^n)}\|\nabla\psi\|_{L^2(\Omega;\mathbb{R}^n)} + \|v(t)\cdot\nabla\varphi\|_{L^2(\Omega)}\|\psi\|_{L^2(\Omega)} + \gamma_{max}\|\varphi\|_{L^2(\Gamma)}\|\psi\|_{L^2(\Gamma)}$$
$$\le \lambda\|\varphi\|_V\|\psi\|_V + \|v\|_{L^\infty(Q;\mathbb{R}^n)}\|\nabla\varphi\|_{L^2(\Omega;\mathbb{R}^n)}\|\psi\|_{L^2(\Omega)} + C_{TT}^2\gamma_{max}\|\varphi\|_V\|\psi\|_V$$
$$\le \big(\lambda + \|v\|_{L^\infty(Q;\mathbb{R}^n)} + C_{TT}^2\gamma_{max}\big)\|\varphi\|_V\|\psi\|_V = C\|\varphi\|_V\|\psi\|_V.$$
This means that $\tilde a$ is well-defined and $\tilde a \in L(V, V')$ with
$$\|\tilde a\|_{L(V,V')} \le C := \lambda + \|v\|_{L^\infty(Q;\mathbb{R}^n)} + C_{TT}^2\gamma_{max},$$
where $C_{TT}$ denotes the constant of the Trace Theorem 2.8.

Lemma 4.2. The bilinear form a is coercive with coercivity constants $\alpha = \frac{\lambda}{2}$ and $\beta = \frac{\|v\|^2_{L^\infty(Q;\mathbb{R}^n)}}{2\lambda} + \frac{\lambda}{2}$.

Proof. We take an arbitrary $\varphi \in V$ and define $C_v := \|v\|_{L^\infty(Q;\mathbb{R}^n)}$. Since it holds $\gamma_c, \gamma_{out} \ge 0$ and by using Young's inequality $\big(ab \le \frac{a^2}{2\varepsilon} + \frac{\varepsilon b^2}{2}$ for $a, b \in \mathbb{R}_{\ge 0}$, $\varepsilon > 0\big)$, we get
$$a(t;\varphi,\varphi) \ge \lambda\int_\Omega |\nabla\varphi(x)|^2\,dx + \int_\Omega v(t,x)\cdot\nabla\varphi(x)\,\varphi(x)\,dx$$
$$\ge \lambda\|\nabla\varphi\|^2_{L^2(\Omega;\mathbb{R}^n)} - \|v(t)\cdot\nabla\varphi\|_{L^2(\Omega)}\|\varphi\|_{L^2(\Omega)} \ \ge\ \lambda\|\nabla\varphi\|^2_{L^2(\Omega;\mathbb{R}^n)} - C_v\|\nabla\varphi\|_{L^2(\Omega;\mathbb{R}^n)}\|\varphi\|_{L^2(\Omega)}$$
$$\ge \lambda\|\nabla\varphi\|^2_{L^2(\Omega;\mathbb{R}^n)} - C_v\Big(\frac{\lambda}{2C_v}\|\nabla\varphi\|^2_{L^2(\Omega;\mathbb{R}^n)} + \frac{C_v}{2\lambda}\|\varphi\|^2_{L^2(\Omega)}\Big) = \frac{\lambda}{2}\|\nabla\varphi\|^2_{L^2(\Omega;\mathbb{R}^n)} - \frac{C_v^2}{2\lambda}\|\varphi\|^2_{L^2(\Omega)}$$
$$= \frac{\lambda}{2}\|\varphi\|^2_V - \Big(\frac{C_v^2}{2\lambda} + \frac{\lambda}{2}\Big)\|\varphi\|^2_H$$
for $t \in (0,T)$.

Lemma 4.3. The right-hand side $\tilde g : (0,T) \to V'$, $t \mapsto g(t, \cdot)$, is well-defined and it holds $\tilde g \in L^2(0,T;V')$.

Proof. When we look at the definition of g, it is obvious that $\tilde g(t) \in V'$ is a linear operator for all $t \in (0,T)$. Applying the Trace Theorem 2.8, for almost any $t \in (0,T)$ and any $\varphi \in V$, it holds

$$|\tilde g(t)\varphi| = \Big|\int_\Omega f(t,x)\varphi(x)\,dx + \gamma_c\sum_{i=1}^m u_i(t)\int_{\Gamma_c} b_i(x)\varphi(x)\,dA(x) + \gamma_{out}\,y_{out}(t)\int_{\Gamma_{out}}\varphi\,dA(x)\Big|$$
$$\le \|f\|_{L^2(\Omega)}\|\varphi\|_{L^2(\Omega)} + \gamma_c\sqrt{\mathrm{Leb}(\Gamma)}\sum_{i=1}^m |u_i(t)|\,\|b_i\|_{L^2(\Gamma)}\|\varphi\|_{L^2(\Gamma)} + \gamma_{out}\sqrt{\mathrm{Leb}(\Gamma)}\,|y_{out}(t)|\,\|\varphi\|_{L^2(\Gamma)}$$
$$\le \Big(\|f\|_{L^2(\Omega)} + \gamma_c C_{TT}\sqrt{\mathrm{Leb}(\Gamma)}\sum_{i=1}^m |u_i(t)|\,\|b_i\|_{L^2(\Gamma)} + \gamma_{out} C_{TT}\sqrt{\mathrm{Leb}(\Gamma)}\,|y_{out}(t)|\Big)\|\varphi\|_V =: C(t)\|\varphi\|_V,$$
where Leb(Γ) is the Lebesgue measure of Γ. Furthermore, we get $\|\tilde g(t)\|_{V'} \le C(t)$ and
$$\|\tilde g\|^2_{L^2(0,T;V')} = \int_0^T \|\tilde g(t)\|^2_{V'}\,dt \le \int_0^T C(t)^2\,dt$$
$$\le 4T\|f\|^2_{L^2(\Omega)} + 4\gamma_c^2 C_{TT}^2\,\mathrm{Leb}(\Gamma)\,\|b\|^2_{L^2(\Gamma;\mathbb{R}^m)}\|u\|^2_U + 2\gamma_{out}^2 C_{TT}^2\,\mathrm{Leb}(\Gamma)\,\|y_{out}\|^2_{L^2(0,T)} < \infty,$$
where we used $(r+s)^2 \le 2r^2 + 2s^2$ ($r, s \in \mathbb{R}_{\ge 0}$) twice and the Cauchy-Schwarz inequality. This shows the well-definedness of $\tilde g$ and $\tilde g \in L^2(0,T;V')$.

Theorem 4.4. Under the given assumptions, (SE) has a unique weak solution $y \in W(0,T)$ satisfying
$$(4.2)\qquad \|y\|_{W(0,T)} \le C\big(\|y_0\|_H + \|\tilde g\|_{L^2(0,T;V')}\big).$$

Proof. This follows directly from Theorem 2.10, since $V = H^1(\Omega)$ and $H = L^2(\Omega)$ form a Gelfand triple, the bilinear form a is coercive, and it holds $\tilde g \in L^2(0,T;V')$ and $y_0 \in L^2(\Omega)$.

This leads us to

Definition and Remark 4.5. We define the solution operator $T : U \to W(0,T)$ with $y = Tu$ the weak solution to (SE). Moreover, we split the solution
$$(Tu)(t,x) = y(t,x) := \hat y(t,x) + (Su)(t,x)$$
into the inhomogeneous part $\hat y$ and the homogeneous part $Su$, where $\hat y$ solves (SE) for $u = 0$ and $Su$ solves (SE) for $f = 0$, $\gamma_{out}y_{out} = 0$ and $y_0 = 0$ in the weak sense, respectively. We notice that S is well-defined, linear and bounded. Hence, the adjoint $S^\star : W(0,T) \to U$ exists.

Chapter 5  The Optimal Control Problem of the Convection-Diffusion Model

The goal is to track the temperature $y$ to given desired temperature profiles $y_Q \in L^2(Q)$ and $y_T \in H$ while utilizing as little control input as possible. Therefore, we introduce the cost function $J : W(0,T) \times U \to \mathbb{R}$ with
\[
(5.1) \qquad J(y,u) = \frac{\sigma_Q}{2} \int_0^T \|y(t) - y_Q(t)\|_H^2\,dt + \frac{\sigma_T}{2}\|y(T) - y_T\|_H^2 + \frac{1}{2}\sum_{i=1}^m \sigma_i \|u_i\|_{L^2(0,T)}^2,
\]
where $\sigma_Q, \sigma_T \ge 0$ and $\sigma_i > 0$, $i = 1,\dots,m$, are weighting parameters. We define the reduced cost functional by
\[
\hat J(u) = J(\mathcal{T}u, u) \quad \text{for } u \in U_{ad}.
\]

This makes sense since the room temperature $y$ is controlled by $u$: the control specifies how much the respective heater is heating at each point in time. One can also say that the control input $u$ represents the heating and the controlled outcome is the temperature $y$ in the room.

In order to achieve the goal stated above, we consider the optimal boundary control problem with state constraints
\[
(\mathrm{OCP}) \qquad \min \hat J(u) \quad \text{s.t.} \quad u \in U_{ad}, \quad y_a \le \mathcal{T}u \le y_b \text{ in } Q \text{ a.e.},
\]

where $y_a, y_b \in L^2(Q)$ are given lower and upper bounds, respectively. We define the slack variables $s_a := \mathcal{T}u - y_a$ and $s_b := y_b - \mathcal{T}u$. As a consequence, we have
\[
(5.2) \qquad y_a - \mathcal{T}u + s_a = 0 \quad \text{in } Q \text{ a.e.},
\]
\[
(5.3) \qquad \mathcal{T}u - y_b + s_b = 0 \quad \text{in } Q \text{ a.e.},
\]
together with $s_a \ge 0$ and $s_b \ge 0$. By doing so, (OCP) turns into an optimization problem with equality constraints of the kind we faced in Chapter 3. For this kind of problem, we introduced the augmented Lagrangian method. In the following, we set up the augmented Lagrange function and compute its derivative. In the upcoming algorithm, we minimize the augmented Lagrange function with a first-order method, which means that we only require the already determined first derivative. To deal with the gradient, we introduce the adjoint equation. This adjoint equation is a partial differential equation which comprises some parts of the state equation and whose solution is one part of the gradient, namely the part which is numerically not computable without the adjoint equation.

5.1 || The Augmented Lagrange Function of the Optimal Control Problem

According to (3.1), for $c > 0$ and considering (5.2), (5.3) as equality constraints, the augmented Lagrange function is given as
\[
\begin{aligned}
L_c(u, s_a, s_b, \mu_a, \mu_b) = \hat J(u) &+ \langle \mu_a, y_a - \mathcal{T}u + s_a \rangle_{L^2(Q)} + \langle \mu_b, \mathcal{T}u - y_b + s_b \rangle_{L^2(Q)} \\
&+ \frac{c}{2}\|y_a - \mathcal{T}u + s_a\|_{L^2(Q)}^2 + \frac{c}{2}\|\mathcal{T}u - y_b + s_b\|_{L^2(Q)}^2.
\end{aligned}
\]
From the optimality conditions

\[
\frac{\partial L_c}{\partial s_a}(u, s_a, s_b, \mu_a, \mu_b)\delta_{s_a} = \langle \mu_a, \delta_{s_a} \rangle_{L^2(Q)} + c\langle y_a - \mathcal{T}u + s_a, \delta_{s_a} \rangle_{L^2(Q)} \overset{!}{=} 0
\]
and
\[
\frac{\partial L_c}{\partial s_b}(u, s_a, s_b, \mu_a, \mu_b)\delta_{s_b} = \langle \mu_b, \delta_{s_b} \rangle_{L^2(Q)} + c\langle \mathcal{T}u - y_b + s_b, \delta_{s_b} \rangle_{L^2(Q)} \overset{!}{=} 0
\]
for $\delta_{s_a}, \delta_{s_b} \in L^2(Q)$, we derive the two equalities
\[
\mu_a + c(y_a - \mathcal{T}u + s_a) = 0 \quad \text{in } Q \text{ a.e.},
\]
\[
\mu_b + c(\mathcal{T}u - y_b + s_b) = 0 \quad \text{in } Q \text{ a.e.}
\]

To ensure $s_a \ge 0$ and $s_b \ge 0$ we set
\[
(5.4) \qquad s_a = \max\Big(0,\, \mathcal{T}u - y_a - \frac{1}{c}\mu_a\Big),
\]
\[
(5.5) \qquad s_b = \max\Big(0,\, y_b - \mathcal{T}u - \frac{1}{c}\mu_b\Big).
\]
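For intuition, the updates (5.4) and (5.5) are simple pointwise maximum operations on the discretized quantities. A minimal NumPy sketch (the function and variable names are our own illustrations, not the thesis code):

```python
import numpy as np

def slack_variables(Tu, ya, yb, mu_a, mu_b, c):
    """Pointwise slack updates (5.4)-(5.5); max(0, .) enforces s_a, s_b >= 0."""
    s_a = np.maximum(0.0, Tu - ya - mu_a / c)
    s_b = np.maximum(0.0, yb - Tu - mu_b / c)
    return s_a, s_b
```

Wherever the state satisfies the respective bound with sufficient margin, the corresponding slack is positive; otherwise it is clipped to zero.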

Let $Q_a^+, Q_b^+, Q_a^-, Q_b^-$ be defined as
\[
\begin{aligned}
Q_a^+ &= \Big\{ (t,x) \,\Big|\, \Big(\mathcal{T}u - y_a - \tfrac{1}{c}\mu_a\Big)(t,x) > 0 \text{ a.e.} \Big\}, \\
Q_b^+ &= \Big\{ (t,x) \,\Big|\, \Big(y_b - \mathcal{T}u - \tfrac{1}{c}\mu_b\Big)(t,x) > 0 \text{ a.e.} \Big\}, \\
Q_a^- &= Q \setminus Q_a^+, \qquad Q_b^- = Q \setminus Q_b^+.
\end{aligned}
\]
Then, inserting (5.4) and (5.5), we find
\[
\begin{aligned}
L_c(u, s_a, s_b, \mu_a, \mu_b) = \hat J(u) &+ \int_{Q_a^+} \mu_a \Big( y_a - \mathcal{T}u + \mathcal{T}u - y_a - \tfrac{1}{c}\mu_a \Big)\,dV(t,x) + \frac{c}{2}\int_{Q_a^+} \Big( y_a - \mathcal{T}u + \mathcal{T}u - y_a - \tfrac{1}{c}\mu_a \Big)^2 dV(t,x) \\
&+ \int_{Q_a^-} \mu_a (y_a - \mathcal{T}u)\,dV(t,x) + \frac{c}{2}\int_{Q_a^-} (y_a - \mathcal{T}u)^2\,dV(t,x) \\
&+ \int_{Q_b^+} \mu_b \Big( \mathcal{T}u - y_b + y_b - \mathcal{T}u - \tfrac{1}{c}\mu_b \Big)\,dV(t,x) + \frac{c}{2}\int_{Q_b^+} \Big( \mathcal{T}u - y_b + y_b - \mathcal{T}u - \tfrac{1}{c}\mu_b \Big)^2 dV(t,x) \\
&+ \int_{Q_b^-} \mu_b (\mathcal{T}u - y_b)\,dV(t,x) + \frac{c}{2}\int_{Q_b^-} (\mathcal{T}u - y_b)^2\,dV(t,x) \\
= \hat J(u) &- \frac{1}{2c}\int_Q \mu_a^2\,dV(t,x) - \frac{1}{2c}\int_Q \mu_b^2\,dV(t,x) \\
&+ \int_{Q_a^-} \frac{1}{2c}\mu_a^2 + \mu_a(y_a - \mathcal{T}u) + \frac{c}{2}(y_a - \mathcal{T}u)^2\,dV(t,x) \\
&+ \int_{Q_b^-} \frac{1}{2c}\mu_b^2 + \mu_b(\mathcal{T}u - y_b) + \frac{c}{2}(\mathcal{T}u - y_b)^2\,dV(t,x) \\
= \hat J(u) &- \frac{1}{2c}\|\mu_a\|_{L^2(Q)}^2 - \frac{1}{2c}\|\mu_b\|_{L^2(Q)}^2 + \frac{c}{2}\int_{Q_a^-} \Big( y_a - \mathcal{T}u + \frac{\mu_a}{c} \Big)^2 dV(t,x) + \frac{c}{2}\int_{Q_b^-} \Big( \mathcal{T}u - y_b + \frac{\mu_b}{c} \Big)^2 dV(t,x).
\end{aligned}
\]
Now, we use that $y_a - \mathcal{T}u + \frac{\mu_a}{c} < 0$ on $Q_a^+$ and $\mathcal{T}u - y_b + \frac{\mu_b}{c} < 0$ on $Q_b^+$. Hence, it holds
\[
\begin{aligned}
L_c(u, s_a, s_b, \mu_a, \mu_b) &= \hat J(u) - \frac{1}{2c}\big( \|\mu_a\|_{L^2(Q)}^2 + \|\mu_b\|_{L^2(Q)}^2 \big) \\
&\quad + \frac{c}{2}\Big\| \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big\|_{L^2(Q)}^2 + \frac{c}{2}\Big\| \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} \Big\|_{L^2(Q)}^2 \\
&=: \hat L_c(u, \mu_a, \mu_b).
\end{aligned}
\]

Having this at hand, we can formulate

Algorithm 3: First-order augmented Lagrangian method
1: Data: initial pair $(\mu_a^0, \mu_b^0) \in L^2(Q) \times L^2(Q)$, initial weight $c_0 > 0$, increment $\beta > 0$ for $c_n$, tolerance $\varepsilon > 0$, maximum number of iterations $n_{\max}$;
2: begin
3: set $n = 0$ and FLAG = true;
4: while FLAG and $n < n_{\max}$ do
5:   for fixed $(\mu_a^n, \mu_b^n)$ find $u^{n+1}$ solving the problem
\[
\min\ \hat L_{c_n}(u, \mu_a^n, \mu_b^n) = \hat J(u) + \frac{c_n}{2}\Big\| \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a^n}{c_n}\Big\} \Big\|_{L^2(Q)}^2 + \frac{c_n}{2}\Big\| \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b^n}{c_n}\Big\} \Big\|_{L^2(Q)}^2 - \underbrace{\frac{1}{2c_n}\big( \|\mu_a^n\|_{L^2(Q)}^2 + \|\mu_b^n\|_{L^2(Q)}^2 \big)}_{\text{independent of } u}
\]
  s.t. $u \in U_{ad}$;
6:   update the Lagrange multipliers
\[
\mu_a^{n+1} = \max\{0,\, \mu_a^n + c_n(y_a - \mathcal{T}u^{n+1})\}, \qquad \mu_b^{n+1} = \max\{0,\, \mu_b^n + c_n(\mathcal{T}u^{n+1} - y_b)\};
\]
7:   if $\|u^n - u^{n+1}\|_U \le \varepsilon$ then
8:     FLAG = false;
9:   set $c_{n+1} = \beta c_n$, $n = n + 1$;
10: end.

Let us verify the multipliers' update. After introducing the slack variables, we have the equality constraints (5.2) and (5.3). We insert (5.4) into (5.2) and (5.5) into (5.3), respectively, and obtain
\[
\begin{aligned}
e_a(u) &:= y_a - \mathcal{T}u + \max\Big(0,\, \mathcal{T}u - y_a - \frac{1}{c}\mu_a\Big) = 0, \\
e_b(u) &:= \mathcal{T}u - y_b + \max\Big(0,\, y_b - \mathcal{T}u - \frac{1}{c}\mu_b\Big) = 0,
\end{aligned}
\]
which in turn allows us to write
\[
(5.6) \qquad \begin{aligned}
\mu_a^{n+1} &= \mu_a^n + c_n e_a(u^{n+1}) \\
&= \mu_a^n + c_n\Big( y_a - \mathcal{T}u^{n+1} + \max\Big\{0,\, \mathcal{T}u^{n+1} - y_a - \frac{1}{c_n}\mu_a^n\Big\} \Big) \\
&= \begin{cases} \mu_a^n + c_n(y_a - \mathcal{T}u^{n+1}) & \text{if } \max\big\{0,\, \mathcal{T}u^{n+1} - y_a - \frac{1}{c_n}\mu_a^n\big\} = 0, \\ 0 & \text{otherwise} \end{cases} \\
&= \max\{0,\, \mu_a^n + c_n(y_a - \mathcal{T}u^{n+1})\}
\end{aligned}
\]
and
\[
(5.7) \qquad \mu_b^{n+1} = \max\{0,\, \mu_b^n + c_n(\mathcal{T}u^{n+1} - y_b)\},
\]
analogously.
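The closed-form updates (5.6) and (5.7) are again pointwise maximum operations. A small NumPy sketch of the discretized update (names are our own):

```python
import numpy as np

def update_multipliers(mu_a, mu_b, Tu, ya, yb, c):
    """Multiplier updates (5.6)-(5.7): a multiplier grows where its
    bound is violated and is clipped at zero otherwise."""
    mu_a_new = np.maximum(0.0, mu_a + c * (ya - Tu))
    mu_b_new = np.maximum(0.0, mu_b + c * (Tu - yb))
    return mu_a_new, mu_b_new
```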

5.2 || The Derivative of the Augmented Lagrange Function

Now, let us compute the derivatives of
\[
\hat L_c(u, \mu_a, \mu_b) = \hat J(u) + \frac{c}{2}\Big\| \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big\|_{L^2(Q)}^2 + \frac{c}{2}\Big\| \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} \Big\|_{L^2(Q)}^2 - \frac{1}{2c}\big( \|\mu_a\|_{L^2(Q)}^2 + \|\mu_b\|_{L^2(Q)}^2 \big).
\]

Differentiation with respect to u yields

\[
\begin{aligned}
\frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta &= \hat J'(u)u^\delta + c\Big\langle \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\},\, -\mathcal{T}'(u)u^\delta \Big\rangle_{L^2(0,T;H)} + c\Big\langle \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\},\, \mathcal{T}'(u)u^\delta \Big\rangle_{L^2(0,T;H)} \\
&= \Big\langle \nabla\hat J(u) + c\,\mathcal{T}'(u)^\star\Big( \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} - \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big),\, u^\delta \Big\rangle_U
\end{aligned}
\]

for $u^\delta \in U$. Let $e_i$ ($i = 1,\dots,m$) be the standard basis of the Euclidean vector space $\mathbb{R}^m$. Looking more precisely at the first summand, we have
\[
\begin{aligned}
\big\langle \nabla\hat J(u), u^\delta \big\rangle_U &= \big\langle \nabla_y J(y,u),\, \mathcal{T}'(u)u^\delta \big\rangle_{L^2(0,T;H)} + \big\langle \nabla_u J(y,u),\, u^\delta \big\rangle_U \\
&= \sigma_Q \int_0^T \int_\Omega (y(t,x) - y_Q(t,x))\, \mathcal{T}'(u)u^\delta(t)\,dx\,dt + \sigma_T \int_\Omega (y(T,x) - y_T(x))\, \mathcal{T}'(u)u^\delta(T)\,dx + \Big\langle \sum_{i=1}^m \sigma_i u_i e_i,\, u^\delta \Big\rangle_U \\
&= \big\langle \sigma_Q(y - y_Q),\, \mathcal{T}'(u)u^\delta \big\rangle_{L^2(0,T;H)} + \big\langle \sigma_T(y(T) - y_T),\, \mathcal{T}'(u)u^\delta(T) \big\rangle_H + \big\langle (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \big\rangle_U,
\end{aligned}
\]
which combined with $\mathcal{T}'(u)u^\delta = \mathcal{S}u^\delta$ yields
\[
\begin{aligned}
\frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta = \Big\langle \sigma_Q(y - y_Q) &+ c\max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} - c\max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\},\, \mathcal{S}u^\delta \Big\rangle_{L^2(0,T;H)} \\
&+ \big\langle \sigma_T(y(T) - y_T),\, \mathcal{S}u^\delta(T) \big\rangle_H + \big\langle (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \big\rangle_U.
\end{aligned}
\]

Using the adjoint operator $\mathcal{S}^\star : W(0,T) \to U$ of the solution operator $\mathcal{S} : U \to W(0,T)$, we obtain
\[
(5.8) \qquad \frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta = \Big\langle \mathcal{S}^\star\Big( \sigma_Q(y - y_Q) + c\max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} - c\max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big) + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U + \big\langle \sigma_T(y(T) - y_T),\, \mathcal{S}u^\delta(T) \big\rangle_H
\]
for $u^\delta \in U$. In the previous computation we made use of the Hilbert adjoint $\mathcal{S}^\star : W(0,T) \to U$ of the homogeneous solution operator $\mathcal{S}$. Since we neither know what the adjoint $\mathcal{S}^\star$ looks like nor how to evaluate it, we introduce the adjoint equation in the next section.

5.3 || The Adjoint Equation

For given $u$, we solve the state equation (SE) to get $y = \mathcal{T}u$ and define
\[
\begin{aligned}
Q_a &= \Big\{ (t,x) \in Q \,\Big|\, \Big( y_a - y + \frac{\mu_a}{c} \Big)(t,x) > 0 \Big\}, \\
Q_b &= \Big\{ (t,x) \in Q \,\Big|\, \Big( y - y_b + \frac{\mu_b}{c} \Big)(t,x) > 0 \Big\}.
\end{aligned}
\]
For computing parts of $\frac{\partial \hat L_c}{\partial u}$, we introduce a second partial differential equation in

Definition 5.1. For the weak formulation (SEwf) and a given $u \in U_{ad}$, the end value problem
\[
(\mathrm{AE}) \qquad \begin{aligned}
-p_t(t) + a(t;\cdot,p(t)) &= \sigma_Q(y_Q(t) - \mathcal{T}u(t)) + c\Big( y_a(t) - \mathcal{T}u(t) + \frac{\mu_a(t)}{c} \Big)\chi_{Q_a} - c\Big( \mathcal{T}u(t) - y_b(t) + \frac{\mu_b(t)}{c} \Big)\chi_{Q_b} \quad \text{in } V' \text{ a.e.}, \\
p(T) &= \sigma_T(y_T - \mathcal{T}u(T)) \quad \text{in } H,
\end{aligned}
\]
is called the adjoint equation.

The reason why the right-hand side of the adjoint equation has this form is the derivative $\frac{\partial \hat L_c}{\partial u}$ of the augmented Lagrange function: we want to get rid of the first term, the one which includes the adjoint $\mathcal{S}^\star$ in $\frac{\partial \hat L_c}{\partial u}$. In the following we will see how the adjoint equation helps to achieve that. We point out that the adjoint equation has an additional right-hand side compared to the one in [MV18, pages 7-8].

Lemma 5.2. The adjoint equation (AE) admits a unique solution $p \in W(0,T)$ for each $u \in U_{ad}$. In particular, there exists a $C > 0$ such that
\[
\|p\|_{W(0,T)} \le C\Big( \Big\| \sigma_Q(y_Q - \mathcal{T}u) + c\Big( y_a - \mathcal{T}u + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \mathcal{T}u - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big\|_{L^2(0,T;V')} + \|\sigma_T(y_T - \mathcal{T}u(T))\|_H \Big).
\]

Proof. We define $\tilde p(t) := p(T - t)$ for all $t \in [0,T]$ to transform (AE) into the initial value problem
\[
(5.9) \qquad \begin{aligned}
\tilde p_t(t) + a(t;\cdot,\tilde p(t)) &= \sigma_Q(y_Q(T-t) - \mathcal{T}u(T-t)) + c\Big( y_a(T-t) - \mathcal{T}u(T-t) + \frac{\mu_a(T-t)}{c} \Big)\chi_{Q_a} \\
&\quad - c\Big( \mathcal{T}u(T-t) - y_b(T-t) + \frac{\mu_b(T-t)}{c} \Big)\chi_{Q_b} \quad \text{in } V' \text{ a.e.}, \\
\tilde p(0) &= p(T) = \sigma_T(y_T - \mathcal{T}u(T)) \quad \text{in } H.
\end{aligned}
\]
Lemma 4.2 tells us that $a$ is coercive. Since
\[
\sigma_Q(y_Q(T-\cdot) - \mathcal{T}u(T-\cdot)) + c\Big( y_a(T-\cdot) - \mathcal{T}u(T-\cdot) + \frac{\mu_a(T-\cdot)}{c} \Big)\chi_{Q_a} - c\Big( \mathcal{T}u(T-\cdot) - y_b(T-\cdot) + \frac{\mu_b(T-\cdot)}{c} \Big)\chi_{Q_b} \in L^2(0,T;V')
\]
and $\sigma_T(y_T - \mathcal{T}u(T)) \in H$, we can apply Theorem 2.10. Consequently, there exists a unique solution $\tilde p$ to (5.9) for each $u \in U_{ad}$ with
\[
\|\tilde p\|_{W(0,T)} \le C\Big( \Big\| \sigma_Q(y_Q - \mathcal{T}u) + c\Big( y_a - \mathcal{T}u + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \mathcal{T}u - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big\|_{L^2(0,T;V')} + \|\sigma_T(y_T - \mathcal{T}u(T))\|_H \Big).
\]

But then, there is a unique solution $p$ of (AE) which fulfils the same inequality.

Definition and Remark 5.3. We split the right-hand side $g(t,\varphi) := r(t)\varphi + (\mathcal{B}u)(t)\varphi$ of (SEwf), analogously to Definition and Remark 4.5, into the inhomogeneous part $r : (0,T) \to V'$,
\[
r(t)\varphi := \int_\Omega f(t,x)\varphi(x)\,dx + \gamma_{out} y_{out}(t) \int_{\Gamma_{out}} \varphi(x)\,dA(x) \qquad (t \in (0,T),\ \varphi \in V),
\]
and the homogeneous part $\mathcal{B} : U \to L^2(0,T;V')$,
\[
(\mathcal{B}u)(t)\varphi := \gamma_c \sum_{i=1}^m u_i(t) \int_{\Gamma_c} b_i(x)\varphi(x)\,dA(x) \qquad (t \in (0,T),\ u \in U,\ \varphi \in V).
\]
Considering the proof of Lemma 4.3, we can similarly compute that there exists a $C > 0$ such that
\[
\|\mathcal{B}u\|_{L^2(0,T;V')} \le C\|b\|_{L^2(\Gamma,\mathbb{R}^m)}\|u\|_U.
\]

Definition 5.4. We define the linear and continuous solution operator $\mathcal{R}_1 : L^2(0,T;V') \to W(0,T)$ which maps an arbitrary right-hand side $f \in L^2(0,T;V')$ of
\[
(5.10) \qquad \begin{aligned}
-p_t(t) + a(t;\cdot,p(t)) &= f(t) \quad \text{in } V' \text{ a.e.}, \\
p(T) &= \sigma_T(y_T - \hat y(T)) \quad \text{in } H,
\end{aligned}
\]

to its unique solution. Furthermore, we define another linear and continuous solution operator $\mathcal{R}_2 : L^2(0,T;V') \to W(0,T)$ which maps an arbitrary right-hand side $f \in L^2(0,T;V')$ of
\[
(5.11) \qquad \begin{aligned}
-p_t(t) + a(t;\cdot,p(t)) &= f(t) \quad \text{in } V' \text{ a.e.}, \\
p(T) &= -\sigma_T \mathcal{S}u(T) \quad \text{in } H,
\end{aligned}
\]
to its unique solution.

Definition and Remark 5.5. Using the previous Definition 5.4 and $\mathcal{T}u = \mathcal{S}u + \hat y$, we split the right-hand side of the adjoint equation (AE) into two parts and define
\[
\hat p := \mathcal{R}_1\Big( \sigma_Q(y_Q - \hat y) + c\Big( y_a - \hat y + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \hat y - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big) \in W(0,T)
\]
and $\mathcal{A} : U \to W(0,T)$ with
\[
\mathcal{A}u := \mathcal{R}_2\big( -\sigma_Q \mathcal{S}u - c\,\mathcal{S}u\,\chi_{Q_a} - c\,\mathcal{S}u\,\chi_{Q_b} \big).
\]
We notice that $\mathcal{A}$ is linear and continuous. Furthermore, it holds that
\[
p = \mathcal{A}u + \hat p = \mathcal{R}_2\big( -\sigma_Q \mathcal{S}u - c\,\mathcal{S}u\,\chi_{Q_a} - c\,\mathcal{S}u\,\chi_{Q_b} \big) + \mathcal{R}_1\Big( \sigma_Q(y_Q - \hat y) + c\Big( y_a - \hat y + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \hat y - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big)
\]
solves the adjoint equation (AE) for a given $u \in U_{ad}$.

Lemma 5.6. Let $u, v \in U$ and $y := \mathcal{S}u$, $w := \sigma_Q \mathcal{S}v + c\,\mathcal{S}v\,\chi_{Q_a} + c\,\mathcal{S}v\,\chi_{Q_b}$, $p := \mathcal{A}v \in W(0,T)$. Then, it holds
\[
\int_0^T \langle (\mathcal{B}u)(t), p(t) \rangle_{V',V}\,dt = -\int_0^T \langle w(t), y(t) \rangle_H\,dt - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H.
\]

Proof. If $\mathcal{B}u$ is the only right-hand side of the weak formulation (SEwf), we have $y = \mathcal{S}u$ as the solution. Since $p(t) \in V$ holds, we obtain
\[
\int_0^T \langle (\mathcal{B}u)(t), p(t) \rangle_{V',V}\,dt = \int_0^T \langle y_t(t), p(t) \rangle_{V',V} + a(t; y(t), p(t))\,dt.
\]
Integration by parts, using $y(0) = 0$ and $p(T) = -\sigma_T \mathcal{S}v(T)$, yields
\[
\begin{aligned}
\int_0^T \langle y_t(t), p(t) \rangle_{V',V} + a(t; y(t), p(t))\,dt &= \int_0^T \langle -p_t(t), y(t) \rangle_{V',V} + a(t; y(t), p(t))\,dt + \langle p(T), y(T) \rangle_H - \langle p(0), y(0) \rangle_H \\
&= \int_0^T \langle -p_t(t), y(t) \rangle_{V',V} + a(t; y(t), p(t))\,dt - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H.
\end{aligned}
\]
Moreover, it holds that $y(t) \in V$ and $p$ solves (5.11) with right-hand side $-w$, which means
\[
\int_0^T \langle -p_t(t), y(t) \rangle_{V',V} + a(t; y(t), p(t))\,dt = -\int_0^T \langle w(t), y(t) \rangle_H\,dt.
\]
Together we get
\[
\int_0^T \langle (\mathcal{B}u)(t), p(t) \rangle_{V',V}\,dt = -\int_0^T \langle w(t), y(t) \rangle_H\,dt - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H.
\]

Lemma 5.7. For $u, v \in U$, it holds
\[
\langle \mathcal{B}^\star \mathcal{A}v, u \rangle_U + \sigma_T \langle \mathcal{S}v(T), \mathcal{S}u(T) \rangle_H = -\langle \mathcal{S}^\star \mathcal{S}(\sigma_Q + c\chi_{Q_a} + c\chi_{Q_b})v, u \rangle_U
\]
and
\[
\langle \mathcal{B}^\star \hat p, u \rangle_U - \sigma_T \langle y_T - \hat y(T), \mathcal{S}u(T) \rangle_H = \Big\langle \mathcal{S}^\star\Big( \sigma_Q(y_Q - \hat y) + c\Big( y_a - \hat y + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \hat y - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big), u \Big\rangle_U.
\]

Proof. Let $u, v \in U$ and $y, w, p$ be as in the proof of Lemma 5.6. Applying Lemma 5.6, it holds
\[
\begin{aligned}
\langle \mathcal{S}^\star \mathcal{S}(\sigma_Q + c\chi_{Q_a} + c\chi_{Q_b})v, u \rangle_U &= \langle \mathcal{S}^\star w, u \rangle_U = \langle w, \mathcal{S}u \rangle_{L^2(0,T;H)} = \int_0^T \langle w(t), y(t) \rangle_H\,dt \\
&= -\int_0^T \langle (\mathcal{B}u)(t), p(t) \rangle_{V',V}\,dt - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H \\
&= -\langle \mathcal{B}u, p \rangle_{L^2(0,T;V'),L^2(0,T;V)} - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H \\
&= -\langle u, \mathcal{B}^\star p \rangle_U - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H \\
&= -\langle \mathcal{B}^\star \mathcal{A}v, u \rangle_U - \sigma_T \langle \mathcal{S}v(T), y(T) \rangle_H.
\end{aligned}
\]
Next, using the definition of $\hat p$ and integration by parts, we conclude
\[
\begin{aligned}
&\Big\langle \mathcal{S}^\star\Big( \sigma_Q(y_Q - \hat y) + c\Big( y_a - \hat y + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \hat y - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big), u \Big\rangle_U \\
&\qquad = \int_0^T \Big\langle \sigma_Q(y_Q - \hat y) + c\Big( y_a - \hat y + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \hat y - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b},\, \mathcal{S}u \Big\rangle_H\,dt \\
&\qquad = \int_0^T \langle -\hat p_t(t), y(t) \rangle_H + a(t; y(t), \hat p(t))\,dt \\
&\qquad = \int_0^T \langle y_t(t), \hat p(t) \rangle_H + a(t; y(t), \hat p(t))\,dt - \sigma_T \langle y_T - \hat y(T), y(T) \rangle_H \\
&\qquad = \int_0^T \langle (\mathcal{B}u)(t), \hat p(t) \rangle_{V',V}\,dt - \sigma_T \langle y_T - \hat y(T), y(T) \rangle_H \\
&\qquad = \langle \mathcal{B}^\star \hat p, u \rangle_U - \sigma_T \langle y_T - \hat y(T), \mathcal{S}u(T) \rangle_H,
\end{aligned}
\]
where $y = \mathcal{S}u$ is the solution to (5.10) with right-hand side $\mathcal{B}u$.

Corollary 5.8. Let $p = \mathcal{A}u + \hat p$ be the solution of the adjoint equation (AE) with end value $p(T) = \sigma_T(y_T - \mathcal{T}u(T))$ and let $y = \mathcal{T}u = \mathcal{S}u + \hat y$ be the solution of the weak form of the state equation (SEwf) for a given $u \in U$. Then, for $u^\delta \in U$, Lemma 5.7 implies
\[
\begin{aligned}
&\Big\langle \mathcal{S}^\star\Big( \sigma_Q(y_Q - \mathcal{T}u) + c\Big( y_a - \mathcal{T}u + \frac{\mu_a}{c} \Big)\chi_{Q_a} - c\Big( \mathcal{T}u - y_b + \frac{\mu_b}{c} \Big)\chi_{Q_b} \Big), u^\delta \Big\rangle_U \\
&\qquad = \langle \mathcal{B}^\star \hat p, u^\delta \rangle_U + \langle \mathcal{B}^\star \mathcal{A}u, u^\delta \rangle_U - \sigma_T \langle y_T - \hat y(T), \mathcal{S}u^\delta(T) \rangle_H + \sigma_T \langle \mathcal{S}u(T), \mathcal{S}u^\delta(T) \rangle_H \\
&\qquad = \langle \mathcal{B}^\star(\mathcal{A}u + \hat p), u^\delta \rangle_U - \sigma_T \langle y_T - y(T), \mathcal{S}u^\delta(T) \rangle_H \\
&\qquad = \langle \mathcal{B}^\star p, u^\delta \rangle_U - \langle p(T), \mathcal{S}u^\delta(T) \rangle_H.
\end{aligned}
\]

Lemma 5.9. Let $p = \mathcal{A}u + \hat p$ be the solution of the adjoint equation (AE) with end value $p(T) = \sigma_T(y_T - \mathcal{T}u(T))$ and let $y = \mathcal{T}u = \mathcal{S}u + \hat y$ be the solution of the weak form of the state equation (SEwf) for a given $u \in U$. We can write the gradient of $\hat L_c$ in the form
\[
\frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta = \Big\langle -\mathcal{B}^\star p + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U,
\]
where $\mathcal{B}^\star : L^2(0,T;V) \to U$ is the adjoint operator of $\mathcal{B}$ given by
\[
\mathcal{B}^\star v(t) = \Big( \gamma_c \int_{\Gamma_c} b_1(x)v(t,x)\,dA(x),\ \dots,\ \gamma_c \int_{\Gamma_c} b_m(x)v(t,x)\,dA(x) \Big)^\top.
\]

Proof. Plugging the results of Corollary 5.8 into (5.8) and using $p(T) = \sigma_T(y_T - \mathcal{T}u(T))$, we get the representation
\[
\begin{aligned}
\frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta &= \Big\langle \mathcal{S}^\star\Big( \sigma_Q(y - y_Q) + c\max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} - c\max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big) + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U \\
&\quad + \big\langle \sigma_T(\mathcal{T}u(T) - y_T),\, \mathcal{S}u^\delta(T) \big\rangle_H \\
&= \Big\langle -\mathcal{B}^\star p + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U - \big\langle p(T), \mathcal{S}u^\delta(T) \big\rangle_H + \big\langle p(T), \mathcal{S}u^\delta(T) \big\rangle_H \\
&= \Big\langle -\mathcal{B}^\star p + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U.
\end{aligned}
\]

It remains to determine $\mathcal{B}^\star$. Let $u \in U$ and $v \in L^2(0,T;V)$ be arbitrary. Then, it holds
\[
\int_0^T \langle (\mathcal{B}u)(t), v(t) \rangle_{V',V}\,dt = \gamma_c \sum_{i=1}^m \int_0^T u_i(t) \int_{\Gamma_c} b_i(x)v(t,x)\,dA(x)\,dt = \langle u, \mathcal{B}^\star v \rangle_U
\]
with
\[
\mathcal{B}^\star v(t) = \Big( \gamma_c \int_{\Gamma_c} b_1(x)v(t,x)\,dA(x),\ \dots,\ \gamma_c \int_{\Gamma_c} b_m(x)v(t,x)\,dA(x) \Big)^\top.
\]

Since we are able to solve the adjoint equation (AE), we obtain $p = \mathcal{A}u + \hat p$ for a given control $u \in U$. Hence, we can numerically deal with $\frac{\partial \hat L_c}{\partial u}$.

Chapter 6  Numerical Consideration of the Optimal Control Problem

In the previous chapters, we made the theoretical preparations for the practical realization of the optimal control problem. Now, in this chapter, we show numerical results. Using finite elements and the Galerkin ansatz, we convert the state and the adjoint equation into ordinary differential equations. To obtain the solution of the arising ordinary differential equations, we apply the implicit Euler method. Moreover, to solve the minimization problem within the augmented Lagrangian method, we apply the gradient projection method, a first-order method. To recall, we want to solve the problem

\[
\min\ J(y,u) = \frac{\sigma_Q}{2} \int_0^T \|y(t) - y_Q(t)\|_H^2\,dt + \frac{\sigma_T}{2}\|y(T) - y_T\|_H^2 + \frac{1}{2}\sum_{i=1}^m \sigma_i \|u_i\|_{L^2(0,T)}^2
\]
subject to the state equation
\[
\begin{aligned}
y_t(t,x) - \lambda\Delta y(t,x) + v(t,x)\cdot\nabla y(t,x) &= f(t,x) && \text{in } Q = (0,T)\times\Omega \text{ a.e.}, \\
\lambda\frac{\partial y}{\partial\eta}(t,x) + \gamma_c y(t,x) &= \gamma_c \sum_{i=1}^m u_i(t)b_i(x) && \text{in } \Sigma_c = (0,T)\times\Gamma_c \text{ a.e.}, \\
\lambda\frac{\partial y}{\partial\eta}(t,x) + \gamma_{out} y(t,x) &= \gamma_{out} y_{out}(t) && \text{in } \Sigma_{out} = (0,T)\times\Gamma_{out} \text{ a.e.}, \\
y(0,x) &= y_0(x) && \text{in } \Omega \text{ a.e.},
\end{aligned}
\]
as well as
\[
u \in U_{ad} \quad \text{and} \quad y_a \le \mathcal{T}u \le y_b \text{ in } Q \text{ a.e.}
\]
For the numerical treatment and all the tests, we choose the unit square $\Omega := (0,1)\times(0,1)$ as our domain and, consequently, as the representative of our room. Furthermore, we set the final time $T = 1$.

6.1 || Solving the State and the Adjoint Equation

In the previous chapters, we dealt with two partial differential equations as part of the augmented Lagrangian method. In the following, we solve the two equations numerically. For reasons of simplicity, we rewrite the weak form of both equations. For the state equation (SE) we have
\[
\begin{aligned}
\int_\Omega y_t(t,x)\varphi(x)\,dx + a(t; y(t,x), \varphi(x)) &= g(t, \varphi(x)) \quad \text{for } \varphi \in V = H^1(\Omega), \\
y(0) &= y_0 \quad \text{in } H = L^2(\Omega).
\end{aligned}
\]
Besides, for the adjoint equation (AE), for $\varphi \in V = H^1(\Omega)$, it holds
\[
\begin{aligned}
-\int_\Omega p_t(t,x)\varphi(x)\,dx + a(t; \varphi(x), p(t,x)) &= \int_\Omega \sigma_Q(y_Q(t,x) - \mathcal{T}u(t))\varphi(x) + c\Big( y_a(t,x) - \mathcal{T}u(t) + \frac{\mu_a(t,x)}{c} \Big)\chi_{Q_a}\varphi(x) - c\Big( \mathcal{T}u(t) - y_b(t,x) + \frac{\mu_b(t,x)}{c} \Big)\chi_{Q_b}\varphi(x)\,dx, \\
p(T) &= \sigma_T(y_T - \mathcal{T}u(T)) \quad \text{in } H = L^2(\Omega).
\end{aligned}
\]
The state and the adjoint equation contain the terms
\[
a(t;\psi,\varphi) = \lambda\int_\Omega \nabla\psi(x)\cdot\nabla\varphi(x)\,dx + \gamma_c\int_{\Gamma_c} \psi(x)\varphi(x)\,dA(x) + \gamma_{out}\int_{\Gamma_{out}} \psi(x)\varphi(x)\,dA(x) + \int_\Omega v(t,x)\cdot\nabla\psi(x)\,\varphi(x)\,dx
\]
and
\[
g(t,\varphi) = \int_\Omega f(t,x)\varphi(x)\,dx + \gamma_c \sum_{i=1}^m u_i(t)\int_{\Gamma_c} b_i(x)\varphi(x)\,dA(x) + \gamma_{out} y_{out}(t)\int_{\Gamma_{out}} \varphi(x)\,dA(x).
\]

6.1.1 Galerkin Ansatz and Implicit Euler Method for the State Equation

Given a triangulation $\triangle$ of $\Omega$, where $\partial\triangle_c$ and $\partial\triangle_{out}$ belong to the boundary parts $\Gamma_c$ and $\Gamma_{out}$, respectively, the finite element space on $\triangle$ is denoted as $V_\triangle$. Let $V_\triangle \subset V$ be spanned by the piecewise linear functions $\varphi_1, \dots, \varphi_N$ with $N \in \mathbb{N}$, where it holds that
\[
(6.1) \qquad \varphi_i(v_j) = \begin{cases} 1 & \text{for } i = j, \\ 0 & \text{for } i \ne j, \end{cases} \qquad \text{for } 1 \le i,j \le N,
\]
with $v_j$ as the $j$-th node of the triangulation $\triangle$. In this way, involving the Galerkin ansatz $y(t,x) \approx \sum_{i=1}^N \tilde y_i(t)\varphi_i(x)$ for suitable $\tilde y_1, \dots, \tilde y_N$ (see [ZTZ05, Chapter 3] for instance), we approximate
\[
\int_\Omega y_t(t,x)\varphi(x)\,dx + a(t; y(t,x), \varphi(x)) = g(t, \varphi(x)) \quad \text{for } \varphi \in V = H^1(\Omega)
\]
by
\[
\sum_{i=1}^N \dot{\tilde y}_i(t) \int_\Omega \varphi_i(x)\varphi_j(x)\,dx + \sum_{i=1}^N \tilde y_i(t)\,a(t; \varphi_i(x), \varphi_j(x)) = g(t, \varphi_j(x)) \quad \text{for } j = 1,\dots,N.
\]
Before we solve this semidiscrete system of ordinary differential equations in time with the unknown $\tilde y = (\tilde y_1, \dots, \tilde y_N)$, we give the resulting matrices and explain how to calculate

them. At first, there are the entries
\[
M_{i,j} := \int_\Omega \varphi_i(x)\varphi_j(x)\,dx = \sum_{\tau\in\triangle} \int_\tau \varphi_i(x)\varphi_j(x)\,dx
\]
of the mass matrix $M$. Moreover, inside the bilinear form $a$ and the right-hand side $g$, we have the terms
\[
\begin{aligned}
A_{i,j}^1 &:= \int_\Omega \nabla\varphi_i(x)\cdot\nabla\varphi_j(x)\,dx = \sum_{\tau\in\triangle}\int_\tau \nabla\varphi_i(x)\cdot\nabla\varphi_j(x)\,dx, \\
A_{i,j}^2 &:= \int_{\Gamma_c} \varphi_i(x)\varphi_j(x)\,dA(x) = \sum_{\tau\in\partial\triangle_c}\int_\tau \varphi_i(x)\varphi_j(x)\,dA(x), \\
A_{i,j}^3 &:= \int_{\Gamma_{out}} \varphi_i(x)\varphi_j(x)\,dA(x) = \sum_{\tau\in\partial\triangle_{out}}\int_\tau \varphi_i(x)\varphi_j(x)\,dA(x), \\
A_{i,j}^4(t) &:= \int_\Omega v(t,x)\cdot\nabla\varphi_i(x)\,\varphi_j(x)\,dx = \sum_{\tau\in\triangle}\int_\tau v(t,x)\cdot\nabla\varphi_i(x)\,\varphi_j(x)\,dx, \\
G_j^1(t) &:= \int_\Omega f(t,x)\varphi_j(x)\,dx = \sum_{\tau\in\triangle}\int_\tau f(t,x)\varphi_j(x)\,dx, \\
G_{k,j}^2 &:= \int_{\Gamma_c} b_k(x)\varphi_j(x)\,dA(x) = \sum_{\tau\in\partial\triangle_c}\int_\tau b_k(x)\varphi_j(x)\,dA(x) \qquad (k \in \{1,\dots,m\}), \\
G_j^3 &:= \int_{\Gamma_{out}} \varphi_j(x)\,dA(x) = \sum_{\tau\in\partial\triangle_{out}}\int_\tau \varphi_j(x)\,dA(x).
\end{aligned}
\]
In this thesis, we choose the linear finite element space
\[
V_\triangle := \{ \varphi \in C(\bar\Omega) : \varphi|_\tau \in P_1 \text{ for all } \tau \in \triangle \},
\]
where $P_1 := \{ f(x) = a_1 x + a_0 \mid a_1, a_0 \in \mathbb{R} \}$ describes the space of linear polynomials. To get values for every entry of the matrices, we use the reference triangle and quadrature formulas. We show this exemplarily for the entries
\[
\sum_{\tau\in\triangle}\int_\tau \varphi_i(x)\varphi_j(x)\,dx
\]
of the mass matrix $M$. For each simplex $\tau$, we define the local mass matrix $M|_\tau$ as

\[
(M|_\tau)_{i_\tau,j_\tau} := \int_\tau \varphi_{i_\tau}(x)\varphi_{j_\tau}(x)\,dx
\]
for $i_\tau, j_\tau \in \{1,2,3\}$. Restricted to one simplex, the basis $\varphi_i$ is identical to the local one $\varphi_{i_\tau}$. Hence, we determine all $3\times 3$ local mass matrices $M|_\tau$ and sum each local contribution into the global one. Thereto, we use a reference triangle $\hat\tau$, which is a triangle with vertices
\[
\hat v_1 = (0,0), \quad \hat v_2 = (1,0), \quad \hat v_3 = (0,1).
\]
Every triangle $\tau$ with vertices $v_{i_\tau}$ is supposed to be the image of $\hat\tau$ under the affine mapping $R : \hat\tau \to \tau$ such that
\[
R(\hat v_{i_\tau}) = v_{i_\tau} = (x_{i_\tau}, y_{i_\tau}), \quad i_\tau \in \{1,2,3\}, \qquad \text{with} \qquad R(\hat p) = \Lambda^\top \hat p + r,
\]
where $\hat p = (\hat x, \hat y)$ represents the coordinates of a point in the reference system of the reference triangle, $r = (x_1, y_1)^\top$ and
\[
\Lambda = \begin{pmatrix} x_2 - x_1 & y_2 - y_1 \\ x_3 - x_1 & y_3 - y_1 \end{pmatrix}.
\]
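The affine transformation can be checked numerically. The following sketch (with our own helper names) builds $\Lambda$ and $R$ from three vertices and lets us verify that the reference vertices are mapped onto the given triangle:

```python
import numpy as np

def affine_map(verts):
    """Return R(p) = Lambda^T p + r, mapping the reference triangle with
    vertices (0,0), (1,0), (0,1) onto the triangle given by `verts`."""
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in verts)
    Lam = np.array([v2 - v1, v3 - v1])   # rows (x2-x1, y2-y1), (x3-x1, y3-y1)
    r = v1
    return (lambda p: Lam.T @ np.asarray(p, dtype=float) + r), Lam
```

Note that $|\det\Lambda|$ equals twice the area of $\tau$; this is the Jacobian factor used below.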


Figure 2: The mapping $R$ transforms the reference triangle $\hat\tau$ into $\tau$

This yields
\[
(M|_\tau)_{i_\tau,j_\tau} = \int_\tau \varphi_{i_\tau}(x,y)\varphi_{j_\tau}(x,y)\,dx\,dy = |\det(\Lambda)| \int_{\hat\tau} \varphi_{i_\tau}(R(\hat x,\hat y))\varphi_{j_\tau}(R(\hat x,\hat y))\,d\hat x\,d\hat y.
\]
In the last step, we use a Gaussian quadrature rule with one node to obtain
\[
(M|_\tau)_{i_\tau,j_\tau} = |\det(\Lambda)| \int_{\hat\tau} \varphi_{i_\tau}(R(\hat x,\hat y))\varphi_{j_\tau}(R(\hat x,\hat y))\,d\hat x\,d\hat y \approx \frac{1}{2}|\det(\Lambda)|\,\varphi_{i_\tau}(R(\xi,\eta))\,\varphi_{j_\tau}(R(\xi,\eta))
\]
with $(\xi,\eta) = (\frac{1}{3},\frac{1}{3})$. What we get at the end is an approximation of all entries of the single matrices and vectors which build the ordinary differential equation
\[
(6.2) \qquad \tilde M \dot{\tilde y}(t) + A(t)\tilde y(t) = G(t),
\]
\[
(6.3) \qquad \tilde y(0) = \tilde y^0,
\]
with the unknown $\tilde y$, where the matrices $\tilde M, A(t) \in \mathbb{R}^{N\times N}$ and the time-dependent right-hand side vector $G(t) \in \mathbb{R}^N$ are given by
\[
\tilde M_{i,j} \approx M_{i,j}, \qquad A_{i,j}(t) \approx \lambda A_{i,j}^1 + \gamma_c A_{i,j}^2 + \gamma_{out} A_{i,j}^3 + A_{i,j}^4(t), \qquad G_j(t) \approx G_j^1(t) + \gamma_c \sum_{k=1}^m u_k(t)G_{k,j}^2 + \gamma_{out} y_{out}(t)G_j^3
\]

for $i,j \in \{1,\dots,N\}$. Due to $y_0$, the initial value $\tilde y^0 \in \mathbb{R}^N$ can be determined by
\[
y_0 \approx \sum_{i=1}^N \tilde y_i^0 \varphi_i
\]
with the aid of (6.1).
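As a sanity check of the one-point quadrature above: each linear nodal basis function takes the value $1/3$ at the centroid $R(\frac{1}{3},\frac{1}{3})$, so every entry of the approximate local mass matrix equals $\frac{1}{2}|\det\Lambda| \cdot \frac{1}{9}$, i.e. $\mathrm{area}(\tau)/9$. A sketch with our own names (for comparison, the exact local mass matrix for linear elements would be $\frac{\mathrm{area}}{12}\begin{psmallmatrix}2&1&1\\1&2&1\\1&1&2\end{psmallmatrix}$):

```python
import numpy as np

def local_mass_onepoint(verts):
    """One-point (centroid) Gauss approximation of the local mass matrix
    on the triangle with vertices `verts` (3x2). Every linear nodal basis
    function equals 1/3 at the centroid."""
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in verts)
    det = abs(np.linalg.det(np.array([v2 - v1, v3 - v1])))  # = 2 * area
    phi = np.full(3, 1.0 / 3.0)                             # basis values at centroid
    return 0.5 * det * np.outer(phi, phi)                   # each entry = area / 9
```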

It remains to solve the ordinary differential equation (6.2) with initial condition (6.3). Therefore, we apply the implicit Euler method. In the following part, we introduce the method for a general ordinary differential equation
\[
\dot{\tilde y} = f(t, \tilde y), \qquad \tilde y(t_0) = \tilde y^0
\]
with Lipschitz-continuous right-hand side $f$. Choosing a discretization step size $h > 0$ and considering the discrete time steps $t_k = t_0 + kh$ for $k = 1, 2, \dots, N_t$ ($N_t \in \mathbb{N}$), the method produces
\[
\tilde y^{k+1} = \tilde y^k + h f(t_{k+1}, \tilde y^{k+1}) \quad \text{for } k \in \{0, 1, 2, \dots, N_t - 1\},
\]
where we set $\tilde y^k := \tilde y(t_k)$. In our case, for every time step starting from $t_0 = 0$, we have to solve the linear system
\[
(\tilde M + hA(t_{k+1}))\tilde y^{k+1} = \tilde M\tilde y^k + hG(t_{k+1}) \in \mathbb{R}^N.
\]
The implicit Euler method, which is a first-order method, is chosen because it does not have any stability restrictions with respect to the time steps.
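The resulting time-stepping loop can be sketched as follows. This is a minimal dense-algebra illustration with our own names (the thesis uses MATLAB's sparse backslash solver); `A` and `G` are assumed to be callables returning the matrix and the right-hand side at a given time:

```python
import numpy as np

def implicit_euler(M, A, G, y0, h, Nt):
    """Implicit Euler for M y'(t) + A(t) y(t) = G(t): in each step solve
    (M + h A(t_{k+1})) y^{k+1} = M y^k + h G(t_{k+1})."""
    y = np.asarray(y0, dtype=float)
    traj = [y.copy()]
    for k in range(Nt):
        t_next = (k + 1) * h
        y = np.linalg.solve(M + h * A(t_next), M @ y + h * G(t_next))
        traj.append(y.copy())
    return np.array(traj)
```

For the scalar test equation $\dot y = -y$, each step multiplies by $1/(1+h)$, which is easy to verify.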

6.1.2 Galerkin Ansatz and Implicit Euler Method for the Adjoint Equation

Analogously, we solve the adjoint equation. One should note that the implicit Euler method is thereby utilized backwards in time. At the end, we have to solve
\[
(6.4) \qquad \begin{aligned}
(\tilde M + hA(t_k)^\top)\tilde p^k &= \tilde M\tilde p^{k+1} + h\Big( \sigma_Q(y_Q(t_k) - \tilde y^k) + c\Big( y_a(t_k) - \tilde y^k + \frac{\mu_a(t_k)}{c} \Big)\chi_{Q_a} - c\Big( \tilde y^k - y_b(t_k) + \frac{\mu_b(t_k)}{c} \Big)\chi_{Q_b} \Big) \quad \text{for } k \in \{N_t - 1, N_t - 2, \dots, 0\}, \\
\tilde p^{N_t} &= \sigma_T(y_T - \tilde y^{N_t})
\end{aligned}
\]
for the unknown $\tilde p \in \mathbb{R}^N$. We note that $\tilde y$ is the solution of the state equation from the foregoing section.
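The backward-in-time sweep of (6.4) mirrors the forward loop; a sketch under the same assumptions as before, with the right-hand side of (6.4) abstracted into a callable `rhs(k)` (names are ours):

```python
import numpy as np

def backward_implicit_euler(M, AT, rhs, pT, h, Nt):
    """Implicit Euler applied backwards in time as in (6.4): starting from
    p^{Nt} = pT, solve (M + h A(t_k)^T) p^k = M p^{k+1} + h rhs(k)
    for k = Nt-1, ..., 0."""
    p = np.asarray(pT, dtype=float)
    traj = [None] * (Nt + 1)
    traj[Nt] = p.copy()
    for k in range(Nt - 1, -1, -1):
        p = np.linalg.solve(M + h * AT(k * h), M @ p + h * rhs(k))
        traj[k] = p.copy()
    return np.array(traj)
```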

6.2 || The Practicable Code

Numerically, we can solve our problem, which is a constrained optimization problem and an optimal control problem at the same time, with the augmented Lagrangian method. We already presented a pseudo code for this method in Algorithm 3. When we have a look inside Algorithm 3, we find that every iteration requires minimizing the function
\[
\hat L_c(u, \mu_a, \mu_b) = \hat J(u) + \frac{c}{2}\Big\| \max\Big\{0,\, y_a - \mathcal{T}u + \frac{\mu_a}{c}\Big\} \Big\|_{L^2(Q)}^2 + \frac{c}{2}\Big\| \max\Big\{0,\, \mathcal{T}u - y_b + \frac{\mu_b}{c}\Big\} \Big\|_{L^2(Q)}^2 - \frac{1}{2c}\big( \|\mu_a\|_{L^2(Q)}^2 + \|\mu_b\|_{L^2(Q)}^2 \big).
\]

For solving this minimization problem, we use the first-order

Algorithm 4: Gradient projection method (for $f \in C^1$)
1: Data: initial guess $u^0$, projection function $\mathcal{P}$;
2: begin
3: set $k = 0$;
4: while no convergence and $k < k_{\max}$ do
5:   set descent direction $d^k = -\nabla f(u^k)$;
6:   choose step size $s_k \in (0,1)$;
7:   calculate $u^{k+1} = \mathcal{P}(u^k + s_k d^k)$;
8:   set $k = k + 1$;
9: end.

A detailed description of this algorithm can be found in [Kel99, Chapter 5.4]. We choose the step size by means of the sufficient decrease condition for line searches (compare with (3.2))
\[
(6.5) \qquad f(u^{k+1}) - f(u^k) \le -\alpha s_k \|u^{k+1} - u^k\|^2,
\]
where $\alpha$ is typically set to $10^{-4}$. We choose a $\zeta \in (0,1)$ and try to find the least integer $m$ such that $s_k = \zeta^m$ fulfils condition (6.5). The projection function $\mathcal{P} : U \to U_{ad}$ is the orthogonal projection onto the control constraints' set $U_{ad}$, i.e. if we have $u < u_a$ or $u > u_b$, then $\mathcal{P}$ projects $u$ to $u_a$ or $u_b$, respectively. In this way, we guarantee that the control bounds are satisfied.
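Algorithm 4 together with the backtracking rule (6.5) and the box projection can be sketched as follows; the function, its arguments, and the default parameters are our own illustrative choices, not the thesis implementation:

```python
import numpy as np

def gradient_projection(f, grad, u0, ua, ub, alpha=1e-4, zeta=0.8,
                        tol=1e-10, kmax=200):
    """Projected gradient method: backtrack s_k = zeta^m until the
    sufficient decrease condition (6.5) holds, then project onto [ua, ub]."""
    project = lambda u: np.clip(u, ua, ub)          # orthogonal projection P
    u = project(np.asarray(u0, dtype=float))
    for _ in range(kmax):
        d = -grad(u)
        s = 1.0
        u_new = project(u + s * d)
        # shrink until f(u_new) - f(u) <= -alpha * s * ||u_new - u||^2
        while f(u_new) - f(u) > -alpha * s * np.dot(u_new - u, u_new - u) and s > 1e-14:
            s *= zeta
            u_new = project(u + s * d)
        if np.linalg.norm(u_new - u) <= tol:
            return u_new
        u = u_new
    return u
```

On a quadratic whose unconstrained minimizer lies outside the box, the iterates are driven to the nearest bound, as one expects from the projection.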

The gradient projection method requires the gradient of the function which shall be minimized. In line with this, we recall the gradient of the augmented Lagrange function
\[
(6.6) \qquad \frac{\partial \hat L_c}{\partial u}(u, \mu_a, \mu_b)u^\delta = \Big\langle -\mathcal{B}^\star p + (\sigma_1 u_1, \dots, \sigma_m u_m)^\top,\, u^\delta \Big\rangle_U,
\]
specified in Lemma 5.9, where $p$ is the solution of the adjoint equation (AE).

After combining Algorithm 3 with Algorithm 4, we obtain a more practicable algorithm, recorded in

Algorithm 5: Gradient projection augmented Lagrangian method
1: Data: $(\mu_a^0, \mu_b^0)$, $u^{0,0}$, $c^0 > 0$, $\beta > 1$, $\varepsilon$, $\varepsilon_{gp}$;
2: begin
3: set $n, k = 0$, choose $\tilde u, \hat u$ large enough to enter the loops;
4: solve the state equation for $u^{n,k}$ to get $\mathcal{T}u^{n,k}$;
5: solve the adjoint equation to get $p^{n,k}$ (needed for (6.6));
6: while $\|\tilde u - u^{n,k}\| > \varepsilon$ and $n < n_{\max}$ do
7:   set $\tilde u = u^{n,k}$, $c = c^n$;
8:   while $\|\hat u - u^{n,k}\| > \varepsilon_{gp}$ and $k < k_{\max}$ do
9:     set $\hat u = u^{n,k}$;
10:    % Step 5 from Algorithm 3 turns into:
11:    calculate descent direction $d^k = -\frac{\partial \hat L_c}{\partial u}(u^{n,k}, \mu_a^n, \mu_b^n)$ using $p^{n,k}$;
12:    choose step size $s_k \in (0,1)$ using (6.5);
13:    do one gradient step to get $u^{n,k+1} = \mathcal{P}(u^{n,k} + s_k d^k)$;
14:    solve the state equation for $u^{n,k+1}$ to get $\mathcal{T}u^{n,k+1}$;
15:    solve the adjoint equation to get $p^{n,k+1}$ (needed for (6.6));
16:    set $k = k + 1$;
17:  set $u^{n+1,0} = u^{n,k}$;
18:  update the Lagrange multipliers via (5.6) and (5.7) to get $\mu_a^{n+1}, \mu_b^{n+1}$;
19:  update $c^{n+1} = \beta c^n$;
20:  set $n = n + 1$, $k = 0$, choose $\hat u$ large enough to enter the inner loop again;
21: end.
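To illustrate the interplay of the inner minimization with the multiplier and penalty updates, consider a toy scalar instance, $\min u^2$ subject to $u \ge y_a$ (so $\mathcal{T}u = u$ and only a lower bound is present). In this setting the inner problem can even be minimized in closed form; everything below is a hypothetical illustration with our own names, not the thesis code:

```python
def augmented_lagrangian_toy(ya=1.0, c=1.0, beta=4.0, n_iter=20):
    """Toy version of the outer loop: min u^2 s.t. u >= ya.
    Inner problem: min_u u^2 + c/2 * max(0, ya - u + mu/c)^2
    (the constant -mu^2/(2c) is dropped, cf. Algorithm 3)."""
    mu, u = 0.0, 0.0
    for _ in range(n_iter):
        # closed-form inner minimizer: if the max-term is active,
        # 2u = c*(ya - u + mu/c), i.e. u = (c*ya + mu) / (2 + c)
        u = (c * ya + mu) / (2.0 + c) if ya + mu / c > 0 else 0.0
        mu = max(0.0, mu + c * (ya - u))   # multiplier update as in (5.6)
        c *= beta                          # penalty increase c_{n+1} = beta * c_n
    return u, mu
```

The iterates converge to the KKT point of this toy problem, $u^\ast = y_a$ with multiplier $\mu^\ast = 2y_a$, which is exactly the behaviour the outer loop of Algorithm 5 is designed to produce.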

6.3 || Tests

All tests in this section are implemented in the programming language MATLAB and executed on an Acer Aspire E15 E5-71G-717X notebook with an Intel® Core™ i7-5500U CPU at 2.4 GHz (with Turbo Boost up to 3.0 GHz) and 8 GB RAM.

In the following, we call the inner loop of Algorithm 5 the gradient loop and the outer loop the Lagrangian loop. For Test 1, we look at different discretizations for the arising equations using linear finite elements with $N_x^1 = 185$, $N_x^2 = 320$ and $N_x^3 = 712$ nodes. The time interval is divided into $N_t = 101$ equidistantly distributed time steps. To create the geometry and the mesh, we employ the MATLAB routines createpde, geometryFromEdges and generateMesh. For solving the linear systems, MATLAB's backslash command is utilized. In Figure 3, we can see the different generated meshes.


Figure 3: Meshes of $\Omega$ with $N_x^1$, $N_x^2$ and $N_x^3$ nodes from left to right, respectively

Furthermore, we note that the operator $\mathcal{B}^\star$, which is given by the representation in Lemma 5.9, can be obtained by transposing the matrix $\gamma_c G^2$ appearing in the right-hand side $G$ of the ordinary differential equation (6.2).

In our tests, we basically choose the same prescribed functions and parameters as Luca Mechelli and Stefan Volkwein employ in their tests in Chapter 6 of the paper "POD-Based Economic Optimal Control of Heat-Convection Phenomena" (see [MV18]). The right-hand side $f$ of (SEa) is fixed to be zero. We have four controls with shape functions
\[
\begin{aligned}
b_1(x) &= \begin{cases} 1 & \text{if } x_1 = 0,\ 0 \le x_2 \le 0.25, \\ 0 & \text{otherwise}, \end{cases} \qquad &
b_2(x) &= \begin{cases} 1 & \text{if } 0.25 \le x_1 \le 0.5,\ x_2 = 1, \\ 0 & \text{otherwise}, \end{cases} \\
b_3(x) &= \begin{cases} 1 & \text{if } x_1 = 1,\ 0.5 \le x_2 \le 0.75, \\ 0 & \text{otherwise}, \end{cases} \qquad &
b_4(x) &= \begin{cases} 1 & \text{if } 0.5 \le x_1 \le 0.75,\ x_2 = 0, \\ 0 & \text{otherwise}. \end{cases}
\end{aligned}
\]

The physical parameters are $\lambda = 1.0$, $\gamma_c = 1.0$ and $\gamma_{out} = 0.03$. We set
\[
y_0 = |\sin(2\pi x_1)\cos(2\pi x_2)| \qquad (x = (x_1,x_2) \in \Omega)
\]
as the initial condition of the state equation. The entries of the velocity field $v(t,x) = (v_1(t,x), v_2(t,x))$ ($t \in [0,T]$) are given by
\[
(6.7) \qquad v_1(t,x) = \begin{cases} -1.6 & \text{if } t < 0.5,\ x \in V_1, \\ -0.6 & \text{if } t \ge 0.5,\ x \in V_2, \\ 0 & \text{otherwise}, \end{cases} \qquad v_2(t,x) = \begin{cases} 0.5 & \text{if } t < 0.5,\ x \in V_1, \\ 1.5 & \text{if } t \ge 0.5,\ x \in V_2, \\ 0 & \text{otherwise}, \end{cases}
\]
where we have

V1 = {x = (x1, x2) ∈ Ω | 4x1 + 12x2 ≥ 3, 4x1 + 12x2 ≤ 13},

V2 = {x = (x1, x2) ∈ Ω | x1 + x2 ≥ 0.5, x1 + x2 ≤ 1.5}.


Figure 4: The shape functions (blue) and the regions of the velocity field (red) in the domain $\Omega$; (a) $t < 0.5$, $x \in V_1$, (b) $t \ge 0.5$, $x \in V_2$

38 Furthermore, we define the outside temperature

\[
y_{out}(t) = \begin{cases} -1 & \text{if } 0 \le t < 0.5, \\ 1 & \text{if } 0.5 \le t \le T, \end{cases}
\]
and the targets

yQ(t, x) = min(2.0 + t, 3.0) as well as yT (x) = yQ(T, x).

The state constraints are fixed as

ya(t) = 0.5 + min(2t, 2.0) and yb = 3.0.

On the one hand, our state variable $y$ should track the temperature profile $y_Q$ as accurately as possible during the optimization procedure. On the other hand, $y$ is not allowed to be lower than $y_a$ or bigger than $y_b$. Regarding the next figure, $y$ always has to stay between the green and the red line.


Figure 5: The temperature profile yQ and the state constraints ya, yb

We restrict our control by setting ua(t) = 0 and ub(t) = 7 for t ∈ [0,T ].

6.3.1 Test 1

Test 1 deals with exactly the same functions and initial guesses as Test 1 in [MV18]: Let $m = 4$. As starting control we set
\[
u_i^0(t) = 3.5 \quad \text{for } t \in [0,T] \text{ and } i \in \{1,\dots,m\}.
\]

The cost functional (5.1) weights are $\sigma_T = \sigma_Q = 0$ and $\sigma_i = 1.0$ for $i \in \{1,\dots,m\}$. In order to solve the partial differential equations (SE) and (AE), we use the implicit Euler method with equidistant time step $\Delta t = 0.01$ as mentioned in Sections 6.1.1 and 6.1.2. We start our algorithm with
\[
(6.8) \qquad \mu_a^0 = \max\{0,\, c_0(y_a - \mathcal{T}u^0)\},
\]
\[
(6.9) \qquad \mu_b^0 = \max\{0,\, c_0(\mathcal{T}u^0 - y_b)\},
\]
and set $\varepsilon = 10^{-5}$ and $\varepsilon_{gp} = 0.2\varepsilon$ as the stopping criterion. For finding the step size by (6.5), we start with $\zeta = 0.8$. Results for different settings can be found in Table 2.

 #  | N_x | c    | β   | Iterations (gradient loops)
 1  | 185 | 1    | 4   | {2}
 2  | 185 | 1    | 190 | {2}
 3  | 185 | 0.1  | 190 | {3, 3, 4, 575, 4, 2}
 4  | 185 | 0.1  | 4   | {3, 2}
 5  | 185 | 40   | 1.2 | {2}
 6  | 185 | 50   | 1.2 | {328, 498, 3, 2}
 7  | 320 | 1    | 4   | {2}
 8  | 320 | 1    | 350 | {2}
 9  | 320 | 0.1  | 350 | {3, 3, 387, 208, 2}
 10 | 320 | 0.1  | 10  | {3, 2}
 11 | 320 | 50   | 10  | {2}
 12 | 320 | 100  | 10  | {693, 3, 2}
 13 | 712 | 1    | 4   | {2}
 14 | 712 | 1    | 350 | {2}
 15 | 712 | 0.01 | 350 | {3, 3, 415, 415, 3, 2}
 16 | 712 | 0.01 | 10  | {3, 2}
 17 | 712 | 100  | 10  | {2}
 18 | 712 | 350  | 1.2 | {127, 39, 26, 20, 19, 19, 21, 28, 43, 81, 273, 2}

Table 2: Results for the method with different settings for Test 1

We note that the number of listed gradient-loop counts equals the number of iterations of the Lagrangian loop. Depending on the combination of $c$ and $\beta$, the algorithm either executes more than two Lagrangian-loop steps or it does not. When we have a look at the plots of the control for different settings, we see problems.

(a) c = 1, β = 4    (b) c = 50, β = 1.2    (c) c = 0.1, β = 190
(d) c = 100, β = 10    (e) c = 0.1, β = 10    (f) c = 0.1, β = 350
(g) c = 0.01, β = 10    (h) c = 0.01, β = 350    (i) c = 350, β = 1.2

Figure 6: The control ui, i ∈ {1, 2, 3, 4}, for different settings (First row: Nx = 185, second row: Nx = 320, third row: Nx = 712)

The algorithm does not produce the anticipated results. Either it stops too soon or the control value at the ending time T is not equal to zero. However, this value has to be zero: since the gradient ∂L̂c/∂u tends to zero in theory, (6.6) and p(T) = 0 yield

(∂L̂c/∂u)(T) = −B* p(T) + (σ1u1, ..., σmum)ᵀ(T) = −B* 0 + u(T) = u(T) → 0

for the last computed control at the ending time T (using σi = 1). The reason behind the problem u(T) ≠ 0 in the algorithm is that (6.5) permanently produces very small step sizes. In Table 2 we can see that, for none of our discretizations, c = 1 lets the algorithm run for more than one Lagrangian loop, no matter which β we choose. Considering Nx = 712, we either have to start with c ≤ 0.01 in combination with β ≥ 350, but then β is very big for the total procedure, which implies that the penalty increases too fast, or we initiate with c ≥ 350 with the result u(T) ≠ 0 for the computed u. What causes this problem? A small c in the beginning (0.01 < c < 350 for Nx = 712, to be precise) does not affect the multipliers (see (6.8), (6.9)) and the solution of the adjoint equation (see (6.4)) enough for the new u to differ sufficiently from the old u, so the algorithm leaves the loop too early. In contrast, a very small c, namely c ≤ 0.01, shrinks the solution of the adjoint equation since the right-hand side of (6.4) becomes smaller. Plugging in this small adjoint solution, which in turn yields bigger step sizes from (6.5), the difference between the new u and the old one is big enough to remain in the loops. We observe that c then has to be increased a lot (see Table 3) and kept away from the region where it does not influence the new u as described above, because otherwise the algorithm stagnates.

To improve the method's results, we start with a small c and a big β. After the first iteration, we make β smaller. This means that we start our iteration with a small penalty. Then, we penalize a lot in the next step to remain in the loop, and in the following steps, we again increase the penalty but at a lower rate. More precisely, we use a big β0 for the first c-update, and for the remaining iterations we take β ∈ [4, 10], which is common for Lagrangian methods. This is recorded in Table 3 and the new results can be seen in Figure 7.
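The adapted c-update can be summarized in a few lines. The function below is a hypothetical sketch of the rule just described (one large increment β0 after the first Lagrangian iteration, then a common β ∈ [4, 10]), not the actual implementation.

```python
# Adapted penalty update: the first Lagrangian iteration uses the large
# increment beta0, all later iterations the moderate increment beta.
def update_penalty(c, iteration, beta0=350.0, beta=4.0):
    return c * (beta0 if iteration == 1 else beta)

c = 0.01                 # small starting penalty (cf. Table 3)
history = [c]
for k in range(1, 4):    # a few Lagrangian-loop iterations
    c = update_penalty(c, k)
    history.append(c)
# c is pushed hard once, then grows at the usual Lagrangian rate:
# 0.01, 3.5, 14.0, 56.0
```

The single large jump moves c out of the stagnation region identified above, while the later moderate factor keeps the penalty from growing too fast.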

    Nx    c     β0    β    Iterations (Gradient loops)
 1  185   0.1   190   4    {3, 3, 5, 4, 258, 95, 5, 2}
 2  185   0.1   190   8    {3, 3, 377, 597, 3, 3, 3, 331, 2}
 3  185   0.1   190   10   {3, 3, 4, 67, 3, 146, 530, 3, 2}
 4  320   0.05  350   4    {3, 3, 4, 3, 3, 154, 4, 2}
 5  320   0.05  350   8    {3, 3, 4, 52, 41, 250, 362, 4, 168, 2}
 6  320   0.05  350   10   {3, 3, 4, 2}
 7  712   0.01  350   4    {3, 3, 680, 265, 274, 160, 118, 351, 378, 12, 3, 2}
 8  712   0.01  350   8    {3, 3, 4, 3, 159, 233, 183, 21, 2}
 9  712   0.01  350   10   {3, 3, 4, 2}

Table 3: Results for the method with the adapted c-update with different settings for Test 1

(a) Nx = 185, β = 4   (b) Nx = 185, β = 8   (c) Nx = 185, β = 10
(d) Nx = 320, β = 4   (e) Nx = 320, β = 8   (f) Nx = 320, β = 10
(g) Nx = 712, β = 4   (h) Nx = 712, β = 8   (i) Nx = 712, β = 10

Figure 7: The control ui, i ∈ {1, 2, 3, 4}, for different numbers of nodes and values for β for the method with the adapted c-update corresponding to Table 3

After comparing our results for the control with the ones from [MV18], we infer that the setting Nx = 712, c = 0.01, β0 = 350, β = 4 (Row 7, Table 3) is the most similar. We note that the tests in [MV18] are implemented in the programming language C. The

authors apply a primal-dual active set strategy which can be interpreted as a semismooth Newton method (see [MV18, pages 9-11]). In addition, a reduced-order approach based on proper orthogonal decomposition is used to speed up their method (see [MV18]).

(a) Augmented Lagrangian method results for Nx = 712, c = 0.01, β0 = 350, β = 4   (b) Primal-dual active set strategy results in [MV18]

Figure 8: Comparison of the control u for the different methods

We observe a jump of all controls at time t = 0.5. It is caused by the sudden change of the velocity field at that time step (see (6.7)). In addition, we provide the solutions of the state equation for different time steps for the last computed control u - compare with Figure 8 (a).

(a) time step 1 of 101   (b) time step 33 of 101   (c) time step 66 of 101   (d) time step 101 of 101

Figure 9: Solutions for the state equation in the setting Nx = 712, c = 0.01, β0 = 350, β = 4 (Row 7 of Table 3) for different time steps

Furthermore, the solutions of the adjoint equation for different time steps for the last computed control u are provided.

(a) time step 1 of 101   (b) time step 5 of 101   (c) time step 10 of 101   (d) time steps 11 to 101 of 101

Figure 10: Solutions for the adjoint equation in the setting Nx = 712, c = 0.01, β0 = 350, β = 4 (Row 7 of Table 3) for different time steps

6.3.2 Test 2

By taking the starting control

ui⁰(t) = 3.5 for t ∈ [0,T] and i ∈ {1, ..., m},

the method needed the adapted c-update to produce acceptable results. Now, for Test 2, every function and all initial data except for the initial control u⁰ stay the same. This time, we run our algorithm with

ui⁰(t) = 0 for t ∈ [0,T] and i ∈ {1, ..., m}.

In this way, the problem of a nonzero control value at time T does not occur: The gradient step including (6.6) shows that the controls ui, i ∈ {1, ..., m}, will always be zero at time T in every step of the algorithm. So, inductively, the last computed control is zero at time T. The results are listed in Table 4.

   Nx   c  β  Iterations (Gradient loops)
1  712  1  4  {4, 4, 507, 61, 269, 166, 133, 425, 51, 419, 3, 4, 2}
2  712  1  6  {4, 4, 582, 3, 78, 171, 722, 348, 2}
3  712  1  8  {4, 4, 515, 3, 2}

Table 4: Results for the method with different settings for Test 2

(a) β = 4   (b) β = 6   (c) β = 8

Figure 11: The control ui, i ∈ {1, 2, 3, 4}, for different values of β corresponding to Table 4

Table 4 and Figure 11 show that for β = 4 and β = 6 the algorithm produces results similar to those of our adapted method in the test above. In the case β = 8, the method stops too early. Here, we can start with c = 1. The new starting control does not require the adapted c-update, i.e. no strong increase of c is needed after the first iteration in order to obtain the expected results. All in all, the algorithm works without the adapted c-update when we initiate with the new u⁰.

In fact, when all initial controls ui⁰, i ∈ {1, ..., m}, have equal values at every time and these values are less than or equal to 0.323, the algorithm does not demand the adapted c-update. All other starting controls, more precisely those whose equal values are greater than 0.323, require the adapted c-update. We note that u⁰ has to lie in Uad. This means that ui⁰(t) < 0 or ui⁰(t) > 7 is not admissible.

6.3.3 Test 3

In the previous tests, we wanted to find the economic model predictive control (see [GP17, Chapter 8]): Since we chose σQ, σT = 0, we did not want to reach the temperature targets

yQ(t, x) = min(2.0 + t, 3.0) and yT (x) = yQ(T, x).

Our focus was to fulfil the state constraints and push the controls as low as possible. Now, we activate the temperature targets by setting σQ, σT = 1. This also means that we have p(T) = σT(yT − y(T)) instead of p(T) = 0 (compare with (AE)). We test again with

ui⁰(t) = 0 for t ∈ [0,T] and i ∈ {1, ..., m}.

All unmentioned values are the same as in the other tests, i.e. σi = 1 for i ∈ {1, ..., m}, ∆t = 0.01, µa⁰ = max{0, c0(ya − T u⁰)}, µb⁰ = max{0, c0(T u⁰ − yb)}, ε = 10⁻⁵, εgp = 0.2ε and ζ = 0.8. We try different settings for β while c is set to one. The results are recorded in Table 5 and Figure 12.

   Nx   c  β  Iterations (Gradient loops)
1  712  1  4  {5, 8, 447, 3, 270, 172, 126, 430, 46, 327, 6, 11, 6, 2}
2  712  1  6  {5, 321, 516, 3, 80, 158, 627, 3, 3, 5, 2}
3  712  1  8  {5, 513, 472, 3, 2}

Table 5: Results for the method with different settings for Test 3

(a) β = 4   (b) β = 6   (c) β = 8

Figure 12: The control ui, i ∈ {1, 2, 3, 4}, for different values of β corresponding to Table 5

We observe that the method works for β = 4 and β = 6. If β is too big, e.g. β = 8, the method stops too early. In Figure 12 (b), we can see a jump in the values of the control u4 between the times t = 0.55 and t = 0.6. In this setting, too, the value of β is too big and causes this abnormality. As a consequence, one can say that the setting in row 1 of Table 5, i.e. β = 4, produces the best results.

Figure 13: The differences in the controls ui, i ∈ {1, 2, 3, 4}, between σQ, σT = 0 and σQ, σT = 1 for the setting Nx = 712, c = 1 and β = 4

If we compare the result for σQ, σT = 0 with the result for σQ, σT = 1, i.e. Figure 11 (a) with Figure 12 (a), we can see that the shapes of the graphs are similar. The biggest

difference is that for tracking the temperature profiles yQ and yT, we obviously have to pay higher control costs between t = 0.07 and t = 0.5. After t = 0.5, there is no big difference in the costs: For σQ, σT = 1, they are higher until t = 0.87. Then, the costs for σQ, σT = 0 even exceed the costs for σQ, σT = 1 slightly. These statements can be seen in Figure 13.

Chapter 7 Conclusion and Outlook

In this thesis, the application of the augmented Lagrangian method combined with the first-order gradient projection method to the optimal boundary control problem (OCP) governed by the heat-convection phenomena (SE) was investigated. Let us recall the procedure that led to our results.

In the first place, we introduced the augmented Lagrangian method for equality constrained optimization problems and derived its properties. Advantages compared to the common Lagrangian method were revealed and, after presenting the numerical realization of the augmented Lagrangian method, first examples were given to make the reader familiar with the algorithm. How to deal with inequality constraints, which are part of the optimal control problem, was shown at the end of Chapter 3.

We proceeded with the convection-diffusion model. This linear parabolic differential equation with Robin boundary conditions is the state equation of our optimal control problem. To be able to work with the equation theoretically and numerically, we determined its weak formulation, followed by the proof of well-posedness and of the existence of a unique solution using the coercivity of the bilinear form from the weak formulation. In this way, we could define the solution operator of the heat-convection phenomena and split it into the inhomogeneous part ŷ and the homogeneous part Su (see Definition and Remark 4.5).

Up next, we considered the optimal boundary control problem. Chapter 3 taught us how to construct the corresponding augmented Lagrange function. After some computations, we obtained the gradient of the augmented Lagrange function, which includes the Hilbert adjoint of S. In practice, we cannot evaluate this adjoint directly. Therefore, we set up the adjoint equation (AE). The unique existence of the solution of this end value partial differential equation was established by transforming it into an initial value problem. We defined the solution operators R1 and R2 in Definition 5.4 to obtain a representation of the solution of the adjoint equation. In this way, we could rewrite the gradient of the augmented Lagrange function such that it was numerically practicable.

Regarding the numerical implementation, our first step was to solve the two arising partial differential equations with the finite element Galerkin ansatz combined with an implicit Euler scheme in time, i.e. we chose the linear finite element space and converted the partial differential equations into ordinary differential equations, which were then treated by the implicit Euler method. Inside the augmented Lagrangian algorithm, we minimized the augmented Lagrange function. Having the gradient, we were able to apply a first-order method, in our case the gradient projection method. The projection of this method also enforced the control constraints. An algorithm for solving the optimal control problem was created and tested in the last part of the thesis. We basically solved the same test as Luca Mechelli and Stefan Volkwein use in their paper (see [MV18]). By taking the very same functions and initial guesses, we obtained

similar results if we used a certain technique: The first iteration had to be done with a small penalty parameter c. For the next iteration, we were forced to increase this parameter a lot with the increment β, while we had to decrease β in the following loops and choose it within the common range [4, 10] for Lagrangian methods (see [Ber99, page 405]). Without this technique, the method could not converge to the results Mechelli and Volkwein obtained. In the second test, however, we only modified the initial guess for the control. This changed the execution of the augmented Lagrangian method: we were able to start with c = 1 and, without the technique above but using a common constant β, we obtained similar results. The previous tests concentrated on finding the so-called economic model predictive control (see [GP17, Chapter 8]), and in the last test, we actually tried to track the prescribed temperature profiles. If we chose the right initial control, we obtained reliable results without the explained c-update. Summarizing, using the augmented Lagrangian method to solve our problem was efficient.

Nevertheless, there is still room for several further investigations: It is possible to apply a second-order method to minimize the augmented Lagrange function. Of course, the second derivative is then required. The question is whether the presumably lower number of iterations can save enough computing time to outweigh the extra time of calculating the Hessian matrix and solving the bigger system. Furthermore, utilizing the finite element method for optimal control problems can quickly become expensive because finite element spaces are normally high-dimensional. Model order reduction may help to speed up the computations. Its concept is to create a low-dimensional subspace which still retains the most central aspects of the investigated problem. We refer to the book [SvdVR08] by Schilders, van der Vorst and Rommes for an overview and exemplary applications. In particular, the specific method of proper orthogonal decomposition can be found there. In general, this method seems to be the most widely used in optimal control of parabolic partial differential equations (see [TV09, pages 83-115], [ABK01, pages 1311-1330] and [AFS00, Report no. 2000-25] for instance).

References

[ABK01] J. A. Atwell, J. T. Borggaard, and B. B. King. Reduced order controllers for Burgers’ equation with a nonlinear observer. International Journal of Applied Mathematics and Computer Science 11, no. 6, 2001.

[AFS00] E. Arian, M. Fahl, and E. W. Sachs. Trust-region Proper Orthogonal Decom- position for Flow Control. ICASE, 2000.

[Ber96] D. P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. Athena Scientific, Belmont, Massachusetts, 1996.

[Ber99] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, Massachusetts, second edition, 1999.

[Den13] R. Denk. Skript zur Vorlesung Theorie partieller Differentialgleichungen II. Lecture manuscript, Konstanz, Baden-Württemberg, 2013. https://www.mathematik.uni-konstanz.de/denk/forschung/publikationen/skripten/, Accessed: February 11, 2018.

[Eva10] L. Evans. Partial Differential Equations. American Mathematical Society, Providence, Rhode Island, second edition, 2010.

[GP17] L. Grüne and J. Pannek. Nonlinear Model Predictive Control: Theory and Algorithms. Springer, London, second edition, 2017.

[Hes69] M.R. Hestenes. Multiplier and Gradient Methods. Journal of Optimization Theory and Applications 4, 1969.

[Kel99] C.T. Kelley. Iterative Methods for Optimization. Society for Industrial and Applied Mathematics, Raleigh, North Carolina, 1999.

[LY08] D. G. Luenberger and Y. Ye. Linear and Nonlinear Programming. Springer, Stanford, California, fourth edition, 2008.

[MS82] J. Macki and A. Strauss. Introduction to Optimal Control Theory. Springer, Berlin, Berlin, 1982.

[MV18] L. Mechelli and S. Volkwein. POD-Based Economic Optimal Control of Heat- Convection Phenomena. Numerical methods for optimal control problems, eds. Falcone, Ferretti, Gruene, McEneaney. Springer INdAM Series, 2018. http://nbn-resolving.de/urn:nbn:de:bsz:352-2--au6ei3apyzpv0, Ac- cessed: April 22, 2018.

[NW06] J. Nocedal and S. J. Wright. Numerical Optimization. Springer, New York City, New York, second edition, 2006.

[Paz92] A. Pazy. Semigroups of linear operators and applications to partial differential equations. Springer, New York City, New York, 1992.

[Pow69] M. J. D. Powell. A method for nonlinear constraints in minimization problems. In Optimization, ed. by R. Fletcher. Academic Press, New York City, New York, 1969.

[Roc73] R. T. Rockafellar. The multiplier method of Hestenes and Powell applied to convex programming. Journal of Optimization Theory and Applications 12, 1973.

[Roc74] R. T. Rockafellar. Augmented Lagrange multiplier functions and duality in nonconvex programming. SIAM Journal on Control 12, 1974.

[Roc76] R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research 1, no. 2, 1976.

[Str69] D. Struik. A Source Book in Mathematics, 1200-1800. Harvard University Press, Cambridge, Massachusetts, 1969.

[SvdVR08] W. H. Schilders, H. A. van der Vorst, and J. Rommes. Model Order Reduction: Theory, Research Aspects and Applications. Springer, Berlin, Berlin, 2008.

[Trö10] F. Tröltzsch. Optimal Control of Partial Differential Equations: Theory, Methods and Applications. American Mathematical Society, Providence, Rhode Island, 2010.

[TV09] F. Tröltzsch and S. Volkwein. POD a-posteriori error estimates for linear-quadratic optimal control problems. Computational Optimization and Applications 44, 2009.

[ZTZ05] O. C. Zienkiewicz, R. L. Taylor, and J. Z. Zhu. The finite element method: its basis and fundamentals. Elsevier Butterworth-Heinemann, Oxford, Oxfordshire, sixth edition, 2005.
