Penalty, Barrier and Augmented Lagrangian Methods
Jesús Omar Ocegueda González

Abstract— Infeasible-interior-point methods shown in previous homeworks are well behaved when the number of constraints is small and the dimension of the domain of the energy function is also small. This is easily seen, since each iteration of such methods requires solving a linear system of equations whose size depends precisely on the number of constraints and the dimension of the search space. In addition, the energy functions that we could optimize with the previous approaches are restricted to be linear, with linear constraints. In this homework I describe three new methods that deal with these inconveniences.

I. INTRODUCTION

The main idea of the kind of methods described here is to construct a sequence of unconstrained optimization problems whose approximate solutions converge to a solution of the original constrained problem. Such a sequence penalizes every non-feasible point. The sequence starts with a small penalty term which is sequentially increased with time. At each iteration k we find an approximation x*_k to the solution of the modified unconstrained problem within a given tolerance τ_k. We expect the sequence {x*_k} to converge to a solution of the original problem, i.e. lim_{k→∞} x*_k = x*.

II. THE PROBLEM

In order to test the performance of the three methods described here, I will use the energy function given by

U(f) = Σ_{<r,s>∈C2} (f_r − f_s)²   subject to   |f_r − g_r| ≤ δ.

These constraints are equivalent to

f_r − g_r + δ ≥ 0,
g_r − f_r + δ ≥ 0.

In this context r, s ∈ S, where S is the set of "sites" on which the functions f and g are defined, C2 is the set of "cliques" of order 2, and δ is a fixed positive constant. Intuitively, we are looking for the "smoothest" f for which the constraints hold. The expected result is an "edge-preserving regularization" of the observed image g.

III. THE QUADRATIC PENALTY METHOD

A. Equality-constrained problem

We will first examine the case in which we have only equality constraints:

min_x f(x)   subject to   c_i(x) = 0, i ∈ E.

The quadratic penalty method constructs the following unconstrained function:

Q(x, µ_k) = f(x) + (1/(2µ_k)) Σ_{i∈E} c_i(x)²,

where µ_k > 0 is the "penalty parameter". If µ_k → 0 then the infeasibilities are increasingly penalized, forcing the solution to be "almost feasible".

B. The general problem

In the general case, we would like to penalize a point x whenever c_i(x) < 0 but not when c_i(x) ≥ 0. To achieve this we define the operator [·]⁻ as

[x]⁻ = max{−x, 0}.

Using this operator we define the function (see figure (1))

Q(x, µ) = f(x) + (1/(2µ)) Σ_{i∈E} c_i(x)² + (1/(2µ)) Σ_{i∈I} ([c_i(x)]⁻)².

Fig. 1. Quadratic penalty function for inequality constraints for different values of µ: a) Effect of the [·]⁻ operator. b) Quadratic penalty function.

C. Implementation

In this case, the unconstrained problem is

min Q(f, µ) = Σ_{<r,s>∈C2} (f_r − f_s)² + (1/(2µ)) Σ_{r∈S} ([f_r − g_r + δ]⁻)² + ([g_r − f_r + δ]⁻)².

Taking the first derivative of Q with respect to f_r we have

∂Q/∂f_r (f, µ) = 2 Σ_{s∈N_r} (f_r − f_s) − (1/µ) [f_r − g_r + δ]⁻ I(g_r − f_r − δ) + (1/µ) [g_r − f_r + δ]⁻ I(f_r − g_r − δ),

where N_r is the first-order neighbourhood of the site r, and I is the "step function" given by

I(x) = 0 if x ≤ 0,   I(x) = 1 if x > 0.

TABLE I
NUMERICAL RESULTS OBTAINED WITH THE QUADRATIC PENALTY METHOD (USING NORMALIZED IMAGES).

Image            Iter.    Optimum value   Mean constraint violation
bola, δ = 0.2    20 × 5   56.161          0.01454
bola, δ = 0.1    20 × 5   144.788         0.01157
bola, δ = 0.05   20 × 5   228.742         0.00893
taza, δ = 0.2    20 × 5   9.185           0.00433
taza, δ = 0.1    20 × 5   27.744          0.00540
taza, δ = 0.05   20 × 5   46.801          0.00481
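To make the construction above concrete, here is a minimal Python/NumPy sketch (illustrative only, not from the original homework) of the [·]⁻ operator and the penalized objective Q(f, µ) of section III, using a 1-D signal so the cliques are just consecutive pairs:

```python
import numpy as np

def neg_part(x):
    # [x]^- = max(-x, 0): nonzero exactly where the constraint x >= 0 is violated
    return np.maximum(-x, 0.0)

def Q(f, g, delta, mu):
    # Smoothness term over cliques <r, r+1> plus the quadratic penalty on the
    # constraints f_r - g_r + delta >= 0 and g_r - f_r + delta >= 0.
    smooth = np.sum((f[1:] - f[:-1]) ** 2)
    penalty = np.sum(neg_part(f - g + delta) ** 2) + np.sum(neg_part(g - f + delta) ** 2)
    return smooth + penalty / (2.0 * mu)

g = np.array([0.0, 0.0, 1.0, 1.0])
feasible = Q(g.copy(), g, delta=0.1, mu=0.01)    # f = g: no penalty, only smoothness
infeasible = Q(g + 0.5, g, delta=0.1, mu=0.01)   # |f - g| = 0.5 > delta everywhere
```

As µ decreases, the penalty term dominates and minimizers of Q are pushed toward the feasible set.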

Since

d([x]⁻)²/dx = 2[x]⁻ (−I(−x)) = −2[x]⁻ I(−x) = 2x I(−x),

we have

∂Q/∂f_r (f, µ) = 2 Σ_{s∈N_r} (f_r − f_s) + (1/µ)(f_r − g_r + δ) I(g_r − f_r − δ) − (1/µ)(g_r − f_r + δ) I(f_r − g_r − δ).

Then, setting ∂Q/∂f_r (f, µ) = 0, we obtain:

( 2|N_r| + I(g_r − f_r − δ)/µ + I(f_r − g_r − δ)/µ ) f_r =
  2 Σ_{s∈N_r} f_s + (1/µ)(g_r − δ) I(g_r − f_r − δ) + (1/µ)(g_r + δ) I(f_r − g_r − δ).

Fig. 2. Results obtained for δ = 0.2 (first row), δ = 0.1 (second row) and δ = 0.05 (third row): a) Optimal image found. b) λ. c) λ̃.
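As a quick sanity check of the derivative just derived (my own verification sketch, not part of the original derivation), the closed form can be compared against a centered finite difference of Q on a random 1-D instance:

```python
import numpy as np

def Q(f, g, delta, mu):
    # Quadratic-penalty objective of section III-C (1-D signal, cliques <r, r+1>)
    neg = lambda x: np.maximum(-x, 0.0)
    smooth = np.sum((f[1:] - f[:-1]) ** 2)
    pen = np.sum(neg(f - g + delta) ** 2) + np.sum(neg(g - f + delta) ** 2)
    return smooth + pen / (2.0 * mu)

def dQ_dfr(f, g, delta, mu, r):
    # Closed-form partial derivative at site r, with I the step function
    I = lambda x: 1.0 if x > 0 else 0.0
    nbrs = [s for s in (r - 1, r + 1) if 0 <= s < len(f)]
    grad = 2.0 * sum(f[r] - f[s] for s in nbrs)
    grad += (f[r] - g[r] + delta) * I(g[r] - f[r] - delta) / mu
    grad -= (g[r] - f[r] + delta) * I(f[r] - g[r] - delta) / mu
    return grad

rng = np.random.default_rng(0)
g = rng.normal(size=8)
f = g + rng.normal(scale=0.3, size=8)   # some sites will violate |f - g| <= delta
delta, mu, h, r = 0.1, 0.05, 1e-6, 3

fp, fm = f.copy(), f.copy()
fp[r] += h
fm[r] -= h
numeric = (Q(fp, g, delta, mu) - Q(fm, g, delta, mu)) / (2.0 * h)
analytic = dQ_dfr(f, g, delta, mu, r)
```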

Since the value of I is not given in terms of f_r (it is evaluated at the previous iterate), we can use the Gauss-Seidel iterative scheme given by

f_r = [ 2 Σ_{s∈N_r} f_s + (1/µ)(g_r − δ) I_1 + (1/µ)(g_r + δ) I_2 ] / [ 2|N_r| + I_1/µ + I_2/µ ],

where

I_1 = I(g_r − f_r − δ),
I_2 = I(f_r − g_r − δ).

D. Experimental results

For the experiments below I show the original image, the solution found, the estimated Lagrange multipliers, the numerical value of the energy function, and the average violation of the constraints v, defined as

v = (1/|S|) Σ_{r∈S} [c_r(x*)]⁻.

E. Conclusions

The quadratic penalty method leads us to a simplification of the constrained optimization problem which can be solved using conventional methods. In the case of quadratic energy functions, the Gauss-Seidel approach can be used with excellent results, at least visually; it is difficult even to know the optimal value of the energy function, so we cannot know how good the obtained result is. The convergence is fast and the results are good, but the constraints are slightly violated. In image processing a slight violation of the constraints is not important, but in cases where it is, a correction must be applied to x* to make it feasible. Finally, the results on noisy images are not as good as those on non-perturbed images.

Fig. 3. Results obtained by the quadratic penalty method on noisy images: a) Noisy image, σ = 0.1. b) Result with δ = 0.17.

IV. LOGARITHMIC BARRIER METHOD

As we can see, the quadratic penalty method has the inconvenience of slightly violating the constraints. We saw that in the case of image processing this is not too important. Barrier methods are used in cases in which it is very important for the solution to be feasible. The general formulation of these methods begins by defining a barrier function T, which is a function defined only on F_0 for which the following properties hold:

• lim_{x→x_0} T(x) = ∞ for all x_0 ∈ ∂F_0,
• T is smooth inside F_0,

where F_0 is the strictly feasible set:

F_0 = {x ∈ R^n | c_i(x) > 0 ∀i ∈ I}.

The most important barrier function is the logarithmic barrier function, defined as

T(x) = − Σ_{i∈I} log(c_i(x)).

The unconstrained optimization problem is, then, to minimize

P(x, µ) = f(x) − µ Σ_{i∈I} log(c_i(x)),

where µ is the barrier parameter, which is in fact written as µ_k with lim_{k→∞} µ_k = 0. As we can see, it is necessary to have a feasible starting point and, clearly, it is necessary for F_0 to be non-empty. The presence of the log function makes this function harder to minimize than the quadratic penalty function.

A. Implementation

Once again, we will try to solve the problem described in section II, now using the log-barrier method. We can make use of any of the conventional strategies for unconstrained optimization, taking care to avoid leaving the feasible set F_0. For this homework I implemented a simple gradient descent. In this case we have

∇P(x, µ) = ∇f(x) − µ Σ_{i∈I} ∇c_i(x)/c_i(x),

which leads us to the iterative scheme

x^{t+1} = x^t − h [ ∇f(x^t) − µ Σ_{i∈I} ∇c_i(x^t)/c_i(x^t) ].

Our problem becomes

min P(f, µ_k) = Σ_{<r,s>∈C2} (f_r − f_s)² − µ_k Σ_{r∈S} [ log(f_r − g_r + δ) + log(g_r − f_r + δ) ],

with partial derivatives

∂P/∂f_r (f, µ_k) = 2 Σ_{s∈N_r} (f_r − f_s) − µ_k [ 1/(f_r − g_r + δ) − 1/(g_r − f_r + δ) ].

The modification we have to make to avoid leaving the strictly feasible set is the following. Let φ_r^t = −h ∇P_r(f^t, µ_t); we will leave the strictly feasible set if either

f_r^{t+1} − g_r + δ ≤ 0 ⇔ f_r^t + φ_r^t − g_r + δ ≤ 0 ⇔ φ_r^t ≤ g_r − f_r^t − δ

or

g_r − f_r^{t+1} + δ ≤ 0 ⇔ g_r − f_r^t − φ_r^t + δ ≤ 0 ⇔ φ_r^t ≥ g_r − f_r^t + δ.

Let Ω ⊂ S be the subset of sites for which one of the conditions above holds. We choose α = 1 if Ω is empty, and if Ω is non-empty,

α = min_{r∈Ω} (g_r − f_r^t − δ)/φ_r^t

if the first condition holds, and

α = min_{r∈Ω} (g_r − f_r^t + δ)/φ_r^t

if the second condition holds. To ensure the next point is strictly feasible, we must set f^{t+1} = f^t + α(1 − ε)φ^t for some small ε > 0.

B. Experimental results

This method has many problems converging. The minimum value of the energy function is greater than the one obtained with the quadratic penalty method. Visually, the regularization is notably worse than the one obtained with the previous method. I will show the results for the same cases, except the noisy images; I think that, given the results, we cannot expect good news on those. The Lagrange multipliers are difficult to estimate because the constraints quickly take values close to zero.

The method described next is based on the Lagrangian with a quadratic penalty term on the constraints.
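The safeguarded gradient-descent step of the log-barrier implementation above can be sketched as follows (a Python sketch under my own assumptions: 1-D signal, step size h and safety margin ε chosen arbitrarily; none of these values come from the original text):

```python
import numpy as np

def barrier_step(f, g, delta, mu, h=1e-3, eps=0.05):
    # One descent step on P(f, mu) that keeps f strictly feasible,
    # i.e. |f_r - g_r| < delta at every site (1-D signal, cliques <r, r+1>).
    grad = np.zeros_like(f)
    grad[1:] += 2.0 * (f[1:] - f[:-1])          # smoothness term
    grad[:-1] += 2.0 * (f[:-1] - f[1:])
    grad -= mu * (1.0 / (f - g + delta) - 1.0 / (g - f + delta))  # barrier term
    phi = -h * grad                              # proposed step phi_r

    lo = phi <= (g - f - delta)                  # would cross f - g + delta <= 0
    hi = phi >= (g - f + delta)                  # would cross g - f + delta <= 0
    alpha = 1.0
    if lo.any():
        alpha = min(alpha, np.min((g[lo] - f[lo] - delta) / phi[lo]))
    if hi.any():
        alpha = min(alpha, np.min((g[hi] - f[hi] + delta) / phi[hi]))
    return f + alpha * (1.0 - eps) * phi

g = np.zeros(6)
f = np.linspace(-0.09, 0.09, 6)                  # strictly feasible start, delta = 0.1
for _ in range(100):
    f = barrier_step(f, g, delta=0.1, mu=0.01)
```

By construction every iterate stays strictly inside the band |f_r - g_r| < δ, which is exactly the property the α safeguard is meant to guarantee.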

Fig. 4. Results using the log-barrier method: a) δ = 0.2. b) δ = 0.1. c) δ = 0.05.

TABLE II
NUMERICAL RESULTS OBTAINED WITH THE log-barrier METHOD (USING NORMALIZED IMAGES).

Image            Iter.      Optimum value   Mean constraint violation
bola, δ = 0.2    100 × 10   115.036         1.2 × 10⁻⁸
bola, δ = 0.1    100 × 10   183.857         1.87 × 10⁻⁸
bola, δ = 0.05   100 × 10   281.524         1.01 × 10⁻⁸
taza, δ = 0.2    100 × 10   29.763          1.57 × 10⁻⁸
taza, δ = 0.1    100 × 10   43.534          1.31 × 10⁻⁸
taza, δ = 0.05   100 × 10   54.810          5.59 × 10⁻⁹

C. Conclusions

Given the experimental results, it is clear that the method used is not a good choice: the optimal values found are very poor, both numerically and visually. This method rapidly becomes unstable, and we need to monitor the numerical values of the variables to keep them from becoming zero or infinity. The constraint violation is clearly smaller than the one obtained with the quadratic penalty, and in fact it is due only to numerical error, because the way in which we avoid leaving the strictly feasible set is theoretically correct.

V. AUGMENTED LAGRANGIAN METHOD

When we use the quadratic penalty method, the equality constraints are not satisfied exactly. Instead, their values are

c_i(x^k) = −µ_k λ_i*   ∀i ∈ E,

so that only as µ_k → 0 does c_i(x^k) → 0, making the constraints satisfied in the limit. The Lagrangian method avoids these infeasibilities through the estimation of the Lagrange multipliers. The Augmented Lagrangian function is defined as

L_A(x, λ, µ) = f(x) − Σ_{i∈E} λ_i c_i(x) + (1/(2µ)) Σ_{i∈E} c_i(x)².

Taking the first derivative of this function, we have

∇L_A(x, λ, µ) = ∇f(x) − Σ_{i∈E} λ_i ∇c_i(x) + (1/µ) Σ_{i∈E} c_i(x) ∇c_i(x)
              = ∇f(x) − Σ_{i∈E} ∇c_i(x) [ λ_i − c_i(x)/µ ].

In an iterative scheme, we can see from the previous expression that

λ_i* ≈ λ_i^k − c_i(x^k)/µ_k

when x^k ≈ x*. This property motivates the Method of Multipliers (equality constraints), which consists of the same scheme as before, generating a sequence of partial solutions {x^k} that minimize the Augmented Lagrangian function leaving λ^k fixed; at each iteration we set

λ_i^{k+1} = λ_i^k − c_i(x^k)/µ_k.

Now we need an extension of this idea for inequality constraints. This is seen in the next subsection.

A. Augmented Lagrangian for inequality constraints

The technique we will use to handle inequality constraints is the same one we used when we studied the Simplex method, that is, we will introduce slack variables. First, assume that we have just inequality constraints. Given the problem

min f(x)   subject to   c_i(x) ≥ 0, i ∈ I,

we reformulate it as

min f(x)   subject to   c_i(x) − s_i = 0, i ∈ I,   s_i ≥ 0.

This expression seems to be hopeless due to the presence of the new constraints s_i ≥ 0, i ∈ I. The difference is that now the inequality constraints are linear and, as we will see, it is easier to handle these constraints. By writing the Augmented Lagrangian function, our new problem is

min L_A(x, λ, µ) = f(x) − Σ_{i∈I} λ_i (c_i(x) − s_i) + (1/(2µ)) Σ_{i∈I} (c_i(x) − s_i)²

subject to s_i ≥ 0. Now let us see which values of s_i minimize this function:

∂L_A/∂s_i (x, λ, µ) = λ_i − (1/µ)(c_i(x) − s_i).

This function has a critical point at

s_i = c_i(x) − µλ_i,

and since the function is quadratic with respect to s_i, the restricted minimizer is

s_i = max{c_i(x) − µλ_i, 0}.

Using this expression, and substituting its value into the original problem, we see that

−λ_i^k (c_i(x) − s_i) + (1/(2µ))(c_i(x) − s_i)² =
  −λ_i^k c_i(x) + (1/(2µ)) c_i(x)²   if c_i(x) − µλ_i^k ≤ 0,
  −(µ/2)(λ_i^k)²                     otherwise.

In order to make this expression clearer, we introduce the function Ψ given by

Ψ(t, σ, µ) = −σt + (1/(2µ)) t²   if t − µσ ≤ 0,
             −(µ/2)σ²            otherwise.

Finally, we obtain the transformed problem

min L_A(x, λ^k, µ_k) = f(x) + Σ_{i∈I} Ψ(c_i(x), λ_i^k, µ_k).

If we compare the derivative of this function,

∇_x L_A(x_k, λ^k, µ_k) = ∇f(x_k) − Σ_{i∈I | c_i(x^k) ≤ µ_k λ_i^k} [ λ_i^k − c_i(x^k)/µ_k ] ∇c_i(x^k) ≈ 0,

with the first KKT condition for an optimal point,

∇f(x*) − Σ_{i∈I | c_i(x*) = 0} λ_i* ∇c_i(x*) = 0,

we see that the values of the Lagrange multipliers should be

λ_i* ≈ λ_i^k − c_i(x^k)/µ_k.

Then, keeping their values nonnegative (we know that the Lagrange multipliers must be nonnegative), we can construct the sequence of partial solutions {x^k} as before, but with the extra step of updating the values of the Lagrange multipliers using the formula

λ_i^{k+1} = max{ λ_i^k − c_i(x^k)/µ_k, 0 }.

B. Implementation

For the problem stated in section II, the Augmented Lagrangian for inequality constraints has the form

L_A(f, λ, λ̃, µ) = Σ_{<r,s>∈C2} (f_r − f_s)² + Σ_{r∈S} [ Ψ(c_r(f), λ_r, µ) + Ψ(c̃_r(f), λ̃_r, µ) ],

where

c_r(f) = g_r − f_r + δ,
c̃_r(f) = f_r − g_r + δ.

Taking the first derivative with respect to f_r, we have

∂L_A/∂f_r (f, λ, λ̃, µ) = 2 Σ_{s∈N_r} (f_r − f_s) + [ λ_r − (g_r − f_r + δ)/µ ] I_1 + [ −λ̃_r + (f_r − g_r + δ)/µ ] I_2,

where I is the step function defined above and

I_1 = I(µλ_r − (g_r − f_r + δ)),
I_2 = I(µλ̃_r − (f_r − g_r + δ)).

Again, the expression for I is not given in terms of f_r, so we can apply the Gauss-Seidel iterative scheme to minimize the Augmented Lagrangian:

f_r = [ 2 Σ_{s∈N_r} f_s + ( (g_r + δ)/µ − λ_r ) I_1 + ( λ̃_r − (δ − g_r)/µ ) I_2 ] / [ 2|N_r| + I_1/µ + I_2/µ ].

C. Experimental results

In figure (5) I show the optimal images found by applying the Augmented Lagrangian method on the same images as before. The quality of the images seems to be equal to that obtained with the quadratic penalty method, but by evaluating the energy function of the result we can see that the Augmented Lagrangian method is slightly better (see table III).

TABLE III
NUMERICAL RESULTS OBTAINED WITH THE AUGMENTED LAGRANGIAN METHOD (USING NORMALIZED IMAGES).

Image            Iter.    Optimum value   Mean constraint violation
bola, δ = 0.2    20 × 5   53.719          2.74 × 10⁻⁷
bola, δ = 0.1    20 × 5   140.345         2.39 × 10⁻⁷
bola, δ = 0.05   20 × 5   223.900         1.03 × 10⁻⁷
taza, δ = 0.2    20 × 5   8.956           1.04 × 10⁻⁷
taza, δ = 0.1    20 × 5   27.143          3.02 × 10⁻⁷
taza, δ = 0.05   20 × 5   46.016          2.50 × 10⁻⁷

The main difference is seen in the Lagrange multipliers. As we said above, the problem can be seen as an edge-preserving regularization, so the Lagrange multipliers are expected to be considerably bigger at the edges of the image than at non-edge sites. This property cannot be seen in the images given by the quadratic penalty method, but the images obtained using the Augmented Lagrangian show it clearly. Since an edge is defined by two sites, say <r, s>, if the Lagrange multiplier corresponding to the inequality c_r(f) is activated at r then the multiplier corresponding to the inequality c̃_s(f) should be activated at s. This property can also be clearly seen in the image formed by the sum of the Lagrange multipliers (the "edges" are wider than in either of the two individual multiplier images).

The results for the test on noisy images are similar to those obtained with the quadratic penalty method.

Fig. 5. Results obtained for δ = 0.2 (rows 1 & 4), δ = 0.1 (rows 2 & 5) and δ = 0.05 (rows 3 & 6): a) Result. b) λ. c) λ̃. d) λ + λ̃.

Fig. 6. Results obtained by the Augmented Lagrangian method on noisy images: a) Noisy image, σ = 0.1. b) Result with δ = 0.2.
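The outer multiplier-update loop described above can be illustrated on a toy problem, min x² subject to x − 1 ≥ 0, whose KKT solution is x* = 1, λ* = 2 (this one-dimensional example and its closed-form inner solver are my own illustration, not from the homework):

```python
import numpy as np

def psi(t, sigma, mu):
    # The function Psi from section V-A
    if t - mu * sigma <= 0:
        return -sigma * t + t * t / (2.0 * mu)
    return -(mu / 2.0) * sigma ** 2

def solve_inner(lam, mu):
    # Closed-form minimizer of x^2 + Psi(x - 1, lam, mu) for this toy problem
    return (mu * lam + 1.0) / (2.0 * mu + 1.0)

lam, mu = 0.0, 0.1
for k in range(50):
    x = solve_inner(lam, mu)
    lam = max(lam - (x - 1.0) / mu, 0.0)   # multiplier update, kept nonnegative

# Grid check that solve_inner really minimizes x^2 + Psi(x - 1, lam, mu)
xs = np.linspace(0.0, 2.0, 2001)
vals = xs ** 2 + np.array([psi(xi - 1.0, lam, mu) for xi in xs])
x_grid = xs[np.argmin(vals)]
```

Note that µ stays fixed here; the multiplier updates alone drive x^k toward 1 and λ^k toward 2, which is exactly the advantage over the pure quadratic penalty.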

Another important characteristic that can be seen in table III is that the mean constraint violation is considerably smaller than that obtained with the quadratic penalty method.
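For reference, a mean-violation figure v like the one reported in the tables can be computed as in the following sketch (Python; here I aggregate the negative parts of both constraints at each site, which is one reasonable reading of the definition given in the experimental-results section):

```python
import numpy as np

def mean_violation(f, g, delta):
    # v: average over sites of the negative parts [c]^- of the two constraints
    neg = lambda x: np.maximum(-x, 0.0)
    return float(np.mean(neg(f - g + delta) + neg(g - f + delta)))

g = np.zeros(4)
v_ok = mean_violation(g, g, delta=0.1)          # feasible: zero violation
v_bad = mean_violation(g + 0.2, g, delta=0.1)   # each site violates by 0.1
```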

D. Conclusion

Given the results (both numerical and visual) and the characteristics mentioned in the previous section, there is not much left to say: this method is considerably better than the other two methods developed in this homework. In fact, this method corrects all the deficiencies presented by its competitors.